Vision-language-action model
Foundation model allowing control of robot actions
A vision-language-action model (VLA) is a foundation model that allows control of robot actions through vision and language commands.[1]
One method of constructing a VLA is to fine-tune a vision-language model (VLM) on robot trajectory data combined with large-scale visual-language data[2] or Internet-scale vision-language tasks.[3]
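A concrete ingredient of this recipe, as described for RT-2, is to represent robot actions in the same token space as text: each continuous action dimension is discretized into a fixed number of bins, so the fine-tuned VLM can emit actions as ordinary output tokens. The sketch below illustrates only that discretization step under assumed bounds; the bin count, action bounds, and function names are illustrative assumptions, not the actual RT-2 implementation.

```python
# A minimal sketch of casting continuous robot actions as discrete tokens,
# in the style of RT-2. The bin count, action bounds, and 7-DoF action
# layout are illustrative assumptions.
import torch

NUM_BINS = 256  # assumed per-dimension discretization resolution

def actions_to_tokens(actions: torch.Tensor,
                      low: float = -1.0, high: float = 1.0) -> torch.Tensor:
    """Discretize continuous actions (e.g. end-effector deltas) into
    integer bins that can serve as 'action tokens' in a VLM vocabulary."""
    normalized = (actions.clamp(low, high) - low) / (high - low)
    return (normalized * (NUM_BINS - 1)).round().long()

def tokens_to_actions(tokens: torch.Tensor,
                      low: float = -1.0, high: float = 1.0) -> torch.Tensor:
    """Invert the discretization at inference time to recover
    executable continuous actions."""
    return tokens.float() / (NUM_BINS - 1) * (high - low) + low

# Example: a hypothetical 7-DoF action (xyz delta, rotation delta, gripper)
# round-trips through the token representation with small quantization error.
action = torch.tensor([0.10, -0.25, 0.05, 0.0, 0.3, -0.1, 1.0])
tokens = actions_to_tokens(action)
print(tokens)                    # integer token ids in [0, 255]
print(tokens_to_actions(tokens)) # approximately the original action
```

With actions expressed this way, fine-tuning reduces to standard next-token prediction on trajectory data, which is why a pretrained VLM can be adapted without architectural changes.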
Examples of VLAs include RT-2 from Google DeepMind.[4]
References