Vision-language-action model

Foundation model allowing control of robot actions

A vision-language-action model (VLA) is a foundation model that allows control of robot actions through vision and language commands.[1]
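Conceptually, a VLA maps a camera observation and a natural-language instruction to a low-level robot action. The following minimal Python sketch illustrates that interface; the class, field, and method names are hypothetical placeholders, not taken from any particular VLA implementation.

```python
# Illustrative sketch of the VLA interface: image + instruction -> action.
# All names here are hypothetical placeholders, not a real library's API.
from dataclasses import dataclass
import numpy as np

@dataclass
class Action:
    delta_position: np.ndarray  # (3,) end-effector translation, metres
    delta_rotation: np.ndarray  # (3,) axis-angle rotation, radians
    gripper_open: bool          # binary gripper command

class VisionLanguageActionModel:
    """Placeholder for a VLA, typically backed by a fine-tuned vision-language model."""

    def predict(self, image: np.ndarray, instruction: str) -> Action:
        # A real model would run the VLM backbone here and decode an action.
        raise NotImplementedError

# Usage (hypothetical): model.predict(camera_frame, "pick up the red block")
```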

One method for constructing a VLA is to fine-tune a pre-trained vision-language model (VLM) on robot trajectory data, combined either with large-scale visual-language data[2] or with Internet-scale vision-language tasks.[3]
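For instance, RT-2 represents robot actions as sequences of discrete tokens, so that the pre-trained VLM's existing language head can be fine-tuned to emit actions as if they were text. The sketch below illustrates this discretization idea under simple assumptions (actions normalized to [-1, 1] and 256 bins per dimension, as described for RT-2); it is a minimal illustration, not DeepMind's actual code.

```python
# Minimal sketch of action tokenization for VLA fine-tuning, loosely following
# the discretization scheme described for RT-2 (256 bins per action dimension).
# Assumptions: actions are normalized to [-1, 1]; all names are illustrative.
import numpy as np

N_BINS = 256

def action_to_tokens(action, low=-1.0, high=1.0):
    """Discretize a continuous action vector into bin indices rendered as text,
    so a VLM's language head can predict actions like ordinary tokens."""
    clipped = np.clip(np.asarray(action, dtype=float), low, high)
    bins = np.round((clipped - low) / (high - low) * (N_BINS - 1)).astype(int)
    return " ".join(map(str, bins))

def tokens_to_action(text, low=-1.0, high=1.0):
    """Invert the discretization: decode generated tokens back into an action."""
    bins = np.array([int(t) for t in text.split()], dtype=float)
    return bins / (N_BINS - 1) * (high - low) + low

# Example: a 7-DoF command (xyz translation, axis-angle rotation, gripper).
cmd = [0.1, -0.2, 0.05, 0.0, 0.3, -0.1, 1.0]
text = action_to_tokens(cmd)        # -> "140 102 134 128 166 115 255"
approx = tokens_to_action(text)     # ~= cmd, up to quantization error
```

During fine-tuning, such action strings can be mixed with ordinary vision-language examples, so the model retains the web-scale knowledge acquired during pre-training.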

Examples of VLAs include RT-2 from Google DeepMind.[4]

References
