MediaPipe
Open-source AI framework by Google
From Wikipedia, the free encyclopedia
MediaPipe is an open-source framework developed by Google that provides libraries for a range of artificial intelligence and machine learning solutions. These solutions cover generative artificial intelligence,[2][3] real-time computer vision,[4][5][6] natural language processing,[7] and audio processing.[8][9][10] They can be used on several platforms, including Android,[11][12] JavaScript for the web,[13] Python,[14][15] and iOS,[16] and they support edge devices.[17][18][19]
History
Google has long used MediaPipe in its products and services. Since 2012, it has been used for real-time analysis of video and audio on YouTube. Over time, MediaPipe has been incorporated into many more products, such as Gmail and Google Home.[20]
MediaPipe's first stable release was version 0.5.0.[21] Google Research made it open source in June 2019 at the Conference on Computer Vision and Pattern Recognition in Long Beach, California. The initial release included only five pipeline examples: Object Detection, Face Detection, Hand Tracking, Multi-hand Tracking, and Hair Segmentation.[22] Between the initial release and April 2023, numerous additional pipelines were built. In May 2023, MediaPipe Solutions was introduced, a transition that offered more capabilities for on-device machine learning.[23] MediaPipe is now maintained under Google AI Edge, a subdivision of Google.
Solutions
MediaPipe's available solutions are:
- LLM Inference API
- Object detection
- Image classification
- Image segmentation
- Interactive segmentation
- Hand landmark detection
- Gesture recognition
- Image embedding
- Face detection
- Face landmark detection
- Pose landmark detection
- Image generation
- Text classification
- Text embedding
- Language detector
- Audio classification
MediaPipe's legacy solutions are:
- Face Detection
- Face Mesh
- Iris
- Hands
- Pose
- Holistic
- Selfie segmentation
- Hair segmentation
- Object detection
- Box tracking
- Instant motion tracking
- Objectron
- KNIFT
- AutoFlip
- MediaSequence
- YouTube 8M
Programming Language
MediaPipe is written primarily in C++, although it is not the only programming language used. Other notable languages in its source code include Python, Starlark, and Java.[21]
Because MediaPipe is organized as a system of separate components, it can be customized. Pre-built solutions are also available, and it may help to start with one of these and tune it slightly for the desired output.[24]
How MediaPipe Works
MediaPipe is built from many components that work together as a general-purpose computer vision framework. Each component has its own architecture and works in its own way.
Hand Tracking
MediaPipe includes a hand tracking system designed to run efficiently on devices with limited computational resources. It estimates a set of 3D landmarks for each detected hand and is intended to remain stable across a wide range of conditions, including different poses, lighting conditions, and motions.[25]
MediaPipe first runs a pre-trained deep learning detector, named BlazePalm, to locate the palm area of a human hand.[24][26] The position of the palm is then used as input to a second model, which predicts the positions of key landmarks that represent the hand's structure.[25]
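The two-stage flow described above can be sketched in a few lines of Python. This is an illustrative outline only, not MediaPipe's actual code: the `Box` type, the `crop_to_image` helper, and the two stand-in model functions are hypothetical, and the assumption that the second model reports landmarks relative to the palm region is made purely for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class Box:
    """Axis-aligned region in normalized image coordinates (0..1)."""
    x: float
    y: float
    width: float
    height: float


def crop_to_image(point: Point3D, box: Box) -> Point3D:
    """Map a landmark given relative to the region back into image coordinates."""
    px, py, pz = point
    return (box.x + px * box.width, box.y + py * box.height, pz)


def two_stage_landmarks(
    image: object,
    detect_palm: Callable[[object], Box],
    predict_landmarks: Callable[[object, Box], List[Point3D]],
) -> List[Point3D]:
    """Stage 1 locates the palm; stage 2 predicts landmarks inside that region."""
    palm_box = detect_palm(image)
    region_landmarks = predict_landmarks(image, palm_box)
    return [crop_to_image(p, palm_box) for p in region_landmarks]


# Hypothetical stand-ins for the two models, used only to exercise the flow.
def fake_palm_detector(image: object) -> Box:
    return Box(x=0.25, y=0.5, width=0.5, height=0.25)


def fake_landmark_model(image: object, box: Box) -> List[Point3D]:
    return [(0.0, 0.0, 0.0), (0.5, 0.5, -0.1), (1.0, 1.0, 0.0)]


landmarks = two_stage_landmarks(None, fake_palm_detector, fake_landmark_model)
```

Passing the detector and the landmark model in as functions keeps the sketch independent of any particular model; in a real system both would be neural networks operating on pixel data.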


MediaPipe continuously monitors the confidence of its predictions and re-runs detection when needed to maintain its accuracy, while temporal smoothing helps reduce the jitter between frames. For scenes with more than one hand, the process is repeated independently for each detected region.[25][24]
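A minimal sketch of this confidence-gated loop follows, assuming a simple threshold on tracking confidence and exponential smoothing as the temporal filter. The `HandTracker` class, its parameters, and the dictionary-based frames are hypothetical; the source does not say which filter or threshold MediaPipe actually uses.

```python
from typing import Callable, List, Optional, Tuple

Point3D = Tuple[float, float, float]


def smooth(previous: List[Point3D], current: List[Point3D], alpha: float) -> List[Point3D]:
    """Exponential smoothing: blend the new landmarks with the previous estimate."""
    return [
        tuple(alpha * c + (1.0 - alpha) * p for p, c in zip(prev_pt, cur_pt))
        for prev_pt, cur_pt in zip(previous, current)
    ]


class HandTracker:
    """Runs the detector only when no region is currently being tracked."""

    def __init__(self, detect: Callable, track: Callable,
                 min_confidence: float = 0.5, alpha: float = 0.5) -> None:
        self.detect = detect              # frame -> region
        self.track = track                # (frame, region) -> (landmarks, confidence, next_region)
        self.min_confidence = min_confidence
        self.alpha = alpha
        self.region = None
        self.landmarks: Optional[List[Point3D]] = None
        self.detector_runs = 0

    def process(self, frame) -> Optional[List[Point3D]]:
        if self.region is None:
            # No hand is being tracked, so run the (expensive) detector.
            self.region = self.detect(frame)
            self.detector_runs += 1
        landmarks, confidence, next_region = self.track(frame, self.region)
        if confidence < self.min_confidence:
            # Tracking was lost: drop state so the next frame re-runs detection.
            self.region = None
            self.landmarks = None
            return None
        if self.landmarks is None:
            self.landmarks = landmarks
        else:
            self.landmarks = smooth(self.landmarks, landmarks, self.alpha)
        self.region = next_region
        return self.landmarks


# Hypothetical frames that carry their own "model output" for demonstration.
frames = [
    {"points": [(0.0, 0.0, 0.0)], "confidence": 0.9},
    {"points": [(1.0, 1.0, 0.0)], "confidence": 0.9},
    {"points": [(5.0, 5.0, 0.0)], "confidence": 0.1},  # low confidence: tracking lost
    {"points": [(2.0, 2.0, 0.0)], "confidence": 0.9},
]
tracker = HandTracker(
    detect=lambda frame: "region",
    track=lambda frame, region: (frame["points"], frame["confidence"], region),
)
outputs = [tracker.process(frame) for frame in frames]
```

In this run the detector is invoked on the first frame and again on the fourth, after the low-confidence third frame resets the tracker. For multiple hands, one such tracker per detected region would follow the "repeated independently" behavior described above.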
Human Pose Estimation
Another area MediaPipe covers is human pose estimation, in particular recognizing changes in body posture. MediaPipe can support the creation of body posture analysis systems, which can aid fields such as ergonomics, the arts, sports, and entertainment.[24]
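As an illustration of what a posture analysis system might compute from pose landmarks, the sketch below measures the angle at a joint from three 3D points. The `joint_angle` function and the sample coordinates are hypothetical and are not part of MediaPipe; they only assume that a pose model supplies 3D landmark positions.

```python
import math
from typing import Tuple

Point3D = Tuple[float, float, float]


def joint_angle(a: Point3D, b: Point3D, c: Point3D) -> float:
    """Return the angle in degrees at point b, between segments b->a and b->c."""
    ba = tuple(ai - bi for ai, bi in zip(a, b))
    bc = tuple(ci - bi for ci, bi in zip(c, b))
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    if norm == 0.0:
        raise ValueError("joint angle is undefined when two landmarks coincide")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# Made-up landmark positions for a bent arm and a fully extended arm.
shoulder, elbow, wrist = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)
bent_elbow = joint_angle(shoulder, elbow, wrist)
straight_elbow = joint_angle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0))
```

An ergonomics or sports application could track such angles over time and flag postures that fall outside a chosen range.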
References