Five months after releasing its AI filmmaking tool Flow, Google updated the underlying video generation model Veo and added new capabilities to Flow.
Google introduced Veo 3.1, which builds on Veo 3, on Oct. 15. The updated model provides improved audio generation and support, more narrative control, and better realism, the vendor said.
The upgraded Veo and Flow capabilities reflect the growing interest in video generation among generative AI users in media, entertainment and other industries.
Earlier this month, independent generative AI vendor OpenAI launched the latest version of its video generation model Sora 2. The model maker also created a new Sora social media app, which is still available only by invitation.
While Sora 2 and Veo 3.1 compete, the strategies OpenAI and Google employed with their respective models differ, said Arun Chandrasekaran, an analyst at Gartner.
“OpenAI is trying something very different” with Sora, Chandrasekaran said. “They’re also making Sora more of a consumer app, whereas Google is focused on the B2B space.”
Incremental changes
Google made some incremental improvements with Veo 3.1, such as making the model more perceptive to prompts and better synchronizing video and audio.
“They’re trying to make these models a lot more perceptive to the prompts,” Chandrasekaran said. “They clearly want to understand the intent of the prompt, the kind of experience that the user wants to see from the prompt so that they can actually translate that into realistic videos or realistic output.”
Google said Veo 3.1’s improved audio capabilities include native audio support for features such as dialogue, ambient sounds, sound effects and music.
Another upgraded capability is the ability to craft the look of a scene by adding reference images to control the characters, objects and style.
Flow uses these various “ingredients” — images, added objects and style — to create a final scene.
New Flow capabilities include the ability to upload first and last frames and have Flow generate the transition video between them. Users can also add or remove elements from existing clips, extend a video clip with audio support or use precision editing capabilities.
Some challenges
The changes Google made demonstrate its intention to help users create a richer video experience, Chandrasekaran said.
“Given the fact that a lot of the early interest around the video model is coming from industries like media and entertainment, gaming … the ability to have good cinematic experiences or richer experiences, that’s really important,” he said.
Despite the improved prompting capabilities and synchronization, Google will still need to pay attention to the authenticity of what the model creates, Chandrasekaran said.
“The ability to create inauthentic content or deepfake-like content — the propensity for doing that is just a lot higher,” he said.
So, Google will need to know who is using its product and ensure that the model and video creation platform don’t fall into the wrong hands, Chandrasekaran continued.
“A lot of these technologies are what we call dual-use technologies,” he said. “There is always a good use case. But it can also be misused quite easily.”