AI-generated videos and images used to be so easy to spot (remember Will Smith eating spaghetti?). But the latest AI video models are getting good — scary good.
Naturally, generating video with AI is a whole lot trickier than generating images. While there are dozens of good to great AI image generators, in the video space, you can count on one hand how many tools can do it convincingly. Two of the most popular are Google's Veo 3 and OpenAI's Sora 2.
So, which AI video model wins out in a head-to-head contest? If you've been closely following this footrace, the answer probably won't surprise you.
What are Veo 3 and Sora 2?
Veo 3 is the name of Google's cutting-edge generative AI video model. Not only was Veo 3 a dramatic improvement over the previous generation, Veo 2, but it also kicked off a whole new era of AI video. Veo 3 can generate realistic videos based on text prompts rather than simply animating existing images. Crucially, it can also create dialogue and other realistic sounds. You can access Veo 3 in Google's AI chatbot Gemini or via other Google tools like Flow, an experimental AI filmmaking tool.
Veo 3 is available in two flavors — Veo 3 Fast and Veo 3 Quality. Because we wanted to test the quality of the videos, we chose the latter for this test.
OpenAI launched Sora 2 on Sept. 30 in a standalone iOS app called Sora. Sora 2 is the successor to the company's first AI video model, also called Sora. At the time of writing, Sora 2 is only available via the invite-only Sora app. The app also offers a social media-style feed of videos from the community, like TikTok for AI videos (because we didn't have enough of those already).
Notes on comparisons
Appropriately, we used AI (in this case, ChatGPT) to help create the prompts for our AI video tests. ChatGPT drafted the prompts, which we then tweaked and refined. Each one was designed to test a different aspect of video creation, from audio to animation.
A handheld camera follows a young woman walking through a crowded street in Tokyo at night during a light rain. Neon signs reflect off wet asphalt and umbrellas. The camera stays fixed on her from behind as she glances toward a glowing billboard, then continues walking. The scene should feel cinematic and hyper-real, like shot on a mirrorless camera with shallow depth of field.
A superhero in a red and silver suit lands hard on a rooftop at sunset, cracking the concrete under their feet. The cape ripples in the wind as the camera orbits around them in slow motion. In the distance, drones fly between skyscrapers with glowing windows. The overall tone should feel like a live-action blockbuster.
A cyberpunk-inspired 3D animation of Times Square filled with holographic ads and flying cars. A large digital billboard lights up with the word ‘MASHABLE’ in bold white type. The animation should have crisp text, glowing reflections, and dynamic lighting reminiscent of Into the Spider-Verse’s visual energy.
A hand-drawn, painterly 2D animation of two friends sitting by a café window on a rainy afternoon. Soft watercolor-style lighting and visible brush strokes. One says gently: ‘You know, sometimes the smallest step can change everything.’ The other smiles and nods. Include subtle mouth animation matching the line, light rain sound outside, and quiet clinking of cups in the background.
Photorealistic street scene where [the subject] dances freely down a tree-lined city sidewalk, loose casual clothes, upbeat tempo. Ambient street sounds (distant traffic, footsteps), cinematic lighting at golden hour.
I also created a prompt designed to generate a video of a copyrighted character, as well as a second prompt in case the generator refused. I'm choosing not to share this prompt so as to not encourage creating AI videos that blatantly use copyrighted material, which has been a sore point for OpenAI and Sora so far.
Prompt 1: A woman in Tokyo
This prompt was generally straightforward in terms of creativity, but the hope was that the video generators would be able to create a cinematic and lively feel through things like reflections in water. So how'd they do?
Both Sora 2 and Veo 3 created nice-looking videos. But there were some clear differences. The video that Sora 2 generated had a much tighter crop than Veo 3's, meaning images and details in the background of the shot were much less visible. Veo 3 had a wider angle, resulting in a more immersive video. The tighter crop may be partially a point in Sora's favor, given that the prompt specifically mentioned a shallow depth of field; Sora 2's video showed a much shallower depth of field than the video created by Veo 3.
It was fascinating to see the choices that the generators made about the young woman. Sora 2 gave its subject an umbrella even though the prompt didn't direct it to do so, although the prompt did mention umbrellas elsewhere in the scene. While the video created by Sora 2 wasn’t incorrect, the video created by Veo 3 was more interesting, more detailed, and better overall.
Winner: Veo 3
Prompt 2: A superhero landing
We pushed the two video generators to create videos of copyrighted characters, but not in this prompt. As a result, I was a little surprised when Sora 2 refused to create this video, citing copyrighted material. After all, the concept of a superhero isn't copyrighted. This seems to be part of a post-launch crackdown on intellectual property infringement.
While Veo 3 did produce a video, the result wasn't quite what we ordered. For one thing, the prompt specifically mentions live-action, but the superhero's face, or what's visible of it, looked more animated than real.
The generator also struggled with physics. For most of the video, our superhero is standing on what appears to be a hole in the concrete, while the concrete pieces created when the superhero lands seemingly disappear into thin air. More prompt engineering could surely solve this problem, but it's annoying all the same.
Google also gets the win here, but only by forfeit — its opponent didn't show up.
Winner: Veo 3
Prompt 3: Cyberpunk Times Square
This prompt, thankfully, was easy for both generators to follow. Both Veo 3 and Sora 2 were able to create an approximation of what Times Square might look like in the future, complete with skyscrapers and billboards. Both also followed the instruction to have one billboard show particular words.
Sora 2 did a slightly better job at recreating the Into the Spider-Verse aesthetic, though neither of the two could be rated excellent.
Still, Veo 3's video was more interesting than Sora 2's. It had movement instead of a single static image. (The generators often added moving details to static images, and it made for boring results.)
While Sora 2 followed the prompt a little better, Veo 3’s video was much more interesting. I’m giving this one to both.
Winner: Tie
Prompt 4: Two friends talking
This prompt was designed to test the generators' ability to create audio that goes along with the video. Both Veo 3 and Sora 2 have the ability to add dialogue and sound effects.
First, the visuals. The prompt specified 2D animation, and only Veo 3 actually followed that. Sora 2 created something in a 3D animation style instead.
The audio that Sora 2 generated was a little strange. The dialogue sounded off, as if both of the characters were sleep-talking or hypnotized. Veo 3's dialogue was much more lively and realistic. The background sound effects were similar in both videos. In both, you can hear rain, but neither followed the prompt in adding the sounds of clinking cups.
The winner here is pretty clear. Again, it’s Veo 3.
Winner: Veo 3
Prompt 5: Dancing in the street
One of the headline features of OpenAI's Sora 2 is cameos, or the ability to make videos featuring the likeness of real people (who have explicitly given permission for this use). For this prompt, I attempted to create a video of myself dancing in the street.
On Sora 2, this was easy; it's a feature that's explicitly supported by the app. In Veo, however, it was much more difficult. Google offers a feature called Ingredients to Video, where you can upload things like images for the generator to use in creating the video. However, Ingredients to Video is not supported by Veo 3, just the lower-quality Veo 2 Fast. You can only create portrait orientation videos with the feature.
On top of that, in our testing of Veo 3, we found that Gemini will often refuse to make videos based on pictures featuring people. This is done to prevent deepfakes, which is great, but animating still images is one of the most common uses of AI video, and Veo 3 makes it unnecessarily difficult.
Both videos were a little strange, and I say that as the subject. The face in the video created by Veo 2 was glitchy, and for some reason, Veo 2 decided that I should be dancing backwards. The video created by Sora 2 was a little more creative, and it gave me clothes that I don't think I could pull off in real life.
Sora 2 did a better job at making me actually dance than Veo 2 did. I have no idea why Sora 2 had me say “this feels good,” but it's… not terrible.
Winner: Sora 2
Prompt 6: Copyrighted material
This prompt was designed to test whether or not the generators could create video of copyrighted characters. As we saw in the superhero prompt, Sora 2 is extremely sensitive when it comes to this, so it came as no surprise when it refused to respond to the first and second prompts — even though the second prompt doesn't mention a character by name, only alluding to them.
Veo 3 had no problem generating a video of a copyrighted character, however. This worked with multiple characters, too.
There's no winner or loser in this category. We're not going to wade into the debate around generating content of copyrighted characters — at least, not here. Still, it's worth keeping in mind that if you're looking to create videos of characters you know and love, you won't be able to do it with Sora while the app is under such scrutiny from rights holders.
The winner: It's Veo 3, and it's not close
A screenshot from a photorealistic AI video generated by Google to promote Veo 3. AI-GENERATED IMAGE. Credit: Google
OpenAI's Sora 2 is making headlines for its social approach and its ability to create videos with you in them. However, beyond making memes, it's extremely limited.
Google's Veo 3 generates much better and higher-quality videos overall. Of the two models, if you want to use generative AI video for professional purposes — for filmmaking, gaming, social media, or, most likely, in advertising — only Veo 3 is a truly viable option.
Sora 2 did excel at creating a video of me, and that's the biggest advantage it has to offer right now. But Veo 3, when used in the Google Flow app, is both higher quality and more versatile, offering features for horizontal and portrait orientations and settings for creating multiple videos at a time.
Disclosure: Ziff Davis, Mashable’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.