Veo (text-to-video model)
Video-generating machine learning model
From Wikipedia, the free encyclopedia
Veo is a text-to-video model developed by Google DeepMind and announced in May 2024. As a generative AI model, it creates videos based on user prompts. Veo 3, released in May 2025, can also generate accompanying audio.
Development
Google announced Veo, a multimodal video generation model, at Google I/O in May 2024.[1] Google claimed that it could generate 1080p videos longer than a minute.[1] In December 2024, Google released Veo 2, available via VideoFX. It supports 4K resolution video generation and has an improved understanding of physics.[2] In April 2025, Google announced that Veo 2 was available to advanced users of the Gemini app.[3] In May 2025, Google released Veo 3, which generates not only video but also synchronized audio (including dialogue, sound effects, and ambient noise) to match the visuals.[4][5] Google also announced Flow, a video-creation tool powered by Veo and Imagen.[6]
A key innovation of Veo 3 was its ability to generate music and voice that closely match the video.[5] Google DeepMind CEO Demis Hassabis described the release as the moment when AI video generation left the era of the silent film.[5]
Reactions
A reporter for Gizmodo reacted to the release of Veo 3 by observing that users were directing the model to generate low-quality content, such as man-on-the-street interviews and haul videos of people unboxing products.[7] Another media commentator reported that the tool tended to repeat the same joke in response to different prompts.[8]
Commentators speculated that Google had trained the service on YouTube videos[5] or Reddit posts.[8] Google itself had not stated the source of its training content.[5]