Veo (text-to-video model)

Veo
A video generated by Veo 3 of an owl and badger
Developer	Google DeepMind
Initial release	May 2024; 1 year ago (2024-05)

Stable release	Veo 3.1 / 15 October 2025; 48 days ago (2025-10-15)

Type	Text-to-video model
Website	deepmind.google/models/veo/

Development

In May 2024, a multimodal video generation model called Veo was announced at Google I/O 2024.^[1] Google claimed that it could generate 1080p videos over a minute long.^[1] In December 2024, Google released Veo 2, available via VideoFX. It supports 4K resolution video generation and has an improved understanding of physics.^[2] In April 2025, Google announced that Veo 2 had become available to Gemini Advanced users in the Gemini app.^[3]

In May 2025, Google released Veo 3, which not only generates videos but also creates synchronized audio — including dialogue, sound effects, and ambient noise — to match the visuals.^[4] Google also announced Flow, a video-creation tool powered by Veo and Imagen.^[5]^[6] Google DeepMind CEO Demis Hassabis described the release as the moment when AI video generation left the era of the silent film.^[6]

Capabilities and limitations

An LGBTQ romantic thriller short film generated by Google Veo 3. The video is an example of detailed, diverse, realistic character models; continuity of characters and environments between cuts; music; voice acting; subtitles; and product placement.

Veo is available through several subscription tiers and through Google "AI credits". It can be used through two interfaces, Google Gemini and Google Flow. Gemini, built around the Gemini AI chat model, is geared toward shorter, quicker projects. Flow is essentially a movie editor that lets users assemble longer projects and maintain continuity by reusing the same characters and actors. Each clip is limited to a maximum length of eight seconds.^[7]
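Because each clip is capped at eight seconds, a longer project has to be assembled from several clips. The arithmetic can be sketched as follows. This is only an illustration: the function names are hypothetical and nothing here calls Veo, Gemini, or Flow.

```python
import math

MAX_CLIP_SECONDS = 8  # per-clip limit described in the article


def clips_needed(total_seconds: int, max_clip: int = MAX_CLIP_SECONDS) -> int:
    """Return how many clips are needed to cover total_seconds of footage."""
    if total_seconds <= 0:
        return 0
    return math.ceil(total_seconds / max_clip)


def clip_lengths(total_seconds: int, max_clip: int = MAX_CLIP_SECONDS) -> list[int]:
    """Split total_seconds into full-length clips plus one shorter final clip."""
    if total_seconds <= 0:
        return []
    full, remainder = divmod(total_seconds, max_clip)
    lengths = [max_clip] * full
    if remainder:
        lengths.append(remainder)
    return lengths


print(clips_needed(60))   # a one-minute project
print(clip_lengths(60))
```

Under these assumptions, a one-minute project needs eight clips: seven of eight seconds and a final clip of four seconds.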

Veo has a relatively simple interface and dashboard. However, users with little or no experience in writing or filmmaking may find that the software misunderstands what they intended, no matter how detailed the prompt. Prompts, which are central to the software, need to be short and to the point but also very specific if the output is to match the user's vision. They must accurately describe the places, people, and things in each scene, and knowledge of film and camera terminology, such as panning, zooming, and the names of camera angles, is also important. For human characters, Veo can generate a range of ethnicities and body types. It can also generate stand-up comedy routines, music videos, animals, cartoons, and animation.^[8]
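One way to keep a prompt short yet specific is to fill in the same fields for every shot: who or what is on screen, what they do, where, and how the camera moves. The sketch below is a hypothetical helper for composing such text. The `ShotPrompt` class and its field names are assumptions made for illustration, not part of any Google tool, and the resulting string would still have to be pasted into Gemini or Flow by hand.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical container for the details a single shot should specify."""
    subject: str      # people or things in the scene
    action: str       # what the subject is doing
    setting: str      # the place
    camera: str = ""  # film terms such as "slow pan left" or "low angle"
    style: str = ""   # for example "cartoon animation"

    def render(self) -> str:
        """Join the fields into one compact prompt string."""
        parts = [f"{self.subject} {self.action} in {self.setting}."]
        if self.camera:
            parts.append(f"Camera: {self.camera}.")
        if self.style:
            parts.append(f"Style: {self.style}.")
        return " ".join(parts)


shot = ShotPrompt(
    subject="An elderly fisherman in a yellow raincoat",
    action="mends a net",
    setting="a foggy harbor at dawn",
    camera="slow pan left, low angle",
    style="cartoon animation",
)
print(shot.render())
```

Keeping the camera and style fields optional lets the same structure serve both quick drafts and more carefully directed shots.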

Veo applies strict content guidelines and restrictions. Before a clip is generated, the software reviews the request and will not generate the clip if it is deemed inappropriate, sexually graphic, or illegal; if it shows graphic abuse, assault, or fighting (unless the prompt specifies that it is, for example, a fictitious martial arts scene); or if it depicts gross behavior, antisemitism, racism, homophobia, reigning regimes, rioting, blood, gore, or warfare. In some cases such a clip may still be generated if the prompt specifies that it is a fictitious period drama. Veo also will not generate characters that look identical to celebrities or other real-life individuals. Users have primarily complained that, however descriptive and detailed their prompts are, Veo often misunderstands the input and produces completely different output. Common issues include incorrect subtitles and captions, complex scenes left incomplete because of the eight-second limit, garbled and nonsensical speech, and character models that appear deformed in both appearance and movement. Users have also reported that their prompts and generated content are falsely flagged as violating guidelines. Trial and error may be needed to obtain optimal results.^[9]

Reactions

A reporter for Gizmodo reacted to the release of Veo 3 by observing that users were directing the model to generate low-quality content, such as man-on-the-street interviews or haul videos of people unboxing products.^[10] Another media commentator reported that the tool tended to repeat the same joke in response to different prompts.^[11]

Commentators speculated that Google had trained the service on YouTube videos^[6] or Reddit posts.^[11] Google itself had not stated the source of its training content.^[6]

In July 2025, Media Matters for America reported that racist and antisemitic videos generated using Veo 3 were being uploaded to TikTok.^[12]^[13] Ryan Whitwam of Ars Technica commented, "In a perfect world, Veo 3 would refuse to create these videos, but vagueness in the prompt and the AI's inability to understand the subtleties of racist tropes (i.e., the use of monkeys instead of humans in some videos) make it easy to skirt the rules."^[13]
