TL;DR
- OpenAI shared a first glimpse of its new tool Sora, which generates videos from a single line of text.
- Sora's apparent capabilities make it well suited to stock footage, presentations, and commercials, with further development likely to lead to longer-form films.
- AI tools that can generate video indistinguishable from footage shot with a real camera renew concerns about content creation industry jobs and about misuse through the spread of deepfakes.
OpenAI seems to delight in pulling rabbits from a hat and was more than aware of the stir its latest research project would cause when it alerted the internet.
Everyone’s gone wild for Sora, a new diffusion model being tested which can generate one-minute video clips from a single text input. To prove what it can do, OpenAI dropped some videos online generated by Sora “without modification.” One clip highlighted a photorealistic woman walking down a rainy Tokyo street.
The Sora prompt for this video was: “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.”
“Every single one of [them] is AI-generated, and if this doesn’t concern you at least a little bit, nothing will,” tweeted YouTube tech journalist Marques Brownlee. “This is simultaneously really impressive and really frightening at the same time,” he added on his YouTube channel.
A blog post on the website of nonlinear editing software Lightworks declared, “Sora’s almost magical powers represents yet another seismic shift in the possibilities of content creation.”
READ MORE: Sora: The Future of Filmmaking? Exploring the Pros, Cons, and Ethical Considerations (Lightworks)
“It’s incredible and scary,” says Erik Naso of Newsshooter.
“Sora is a glimpse into a future where the lines between creation, imagination, and AI blur into something truly extraordinary,” says Conor Jewiss at Stuff.
READ MORE: How you’ll be able to create truly insane videos from text thanks to OpenAI’s Sora tool (Stuff)
Benj Edwards of Ars Technica thinks OpenAI is on track to deliver a “cultural singularity” — the moment when truth and fiction in media become indistinguishable.
“Technology like Sora pulls the rug out from under that kind of media frame of reference. Very soon, every photorealistic video you see online could be 100 percent false in every way. Moreover, every historical video you see could also be false.”
READ MORE: OpenAI collapses media reality with Sora, a photorealistic AI video generator (Ars Technica)
What has excited the AI and artistic community so much is the cinematic photorealism of the videos produced by OpenAI’s algorithm, which seems “to understand how things like reflections, and textures, and materials, and physics, all interact with each other over time,” said Brownlee.
In its research paper, OpenAI states the model deeply understands language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions.
READ MORE: Video generation models as world simulators (OpenAI)
Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.
OpenAI further states it is teaching the AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction.
Two videos in particular grabbed attention. “This is one of the most convincing AI-generated videos I’ve ever seen,” says Brownlee of a video made with this text prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.”
“This looks like it could be an actual film trailer,” says Theoretically Media’s Tim Simmonds. “I mean that there’s nothing really in here to majorly indicate that this is AI generated.”
The other, featuring an aerial flyover, was spun up from the prompt: “Historical footage of California during the gold rush.”
“The drone footage of an old California mining town looks really, really pretty great,” Simmonds says. “And even as the camera makes this turn here, the buildings stay intact, they don’t start to shift and warp and morph into weird things.”
Brownlee thinks it demonstrates “all sorts of implications for the drone pilot that no longer needs to be hired, and all the photographers and videographers whose footage no longer needs to be licensed to show up in the ad that’s being made.”
“It’s also very capable of historical themed footage,” he adds. “This is supposed to be California during the gold rush. It’s AI generated but it could totally pass for the opening scene in an old western.”
Which raises the inevitable question: how long until an entire ad, every single shot of it, is completely generated with AI? Or an entire YouTube video, or an entire movie?
Simmonds still thinks we are some way off from that “because [Sora] still has flaws and there’s no sound [no audio/dialogue sync] and there’s a long way to go with the prompt engineering to iron these things out.”
Naso agrees that Sora “could change the game for stock footage,” adding that the next stage for AI prompt filmmaking is dialogue-based scenes. “So far, these examples are more like b-roll.”
READ MORE: OpenAI Sora – The most realistic AI-generated video to date (Newsshooter)
Nonetheless, even at the pace of AI development it seems OpenAI has caught everyone napping.
Rachel Tobac, a member of the technical advisory council of the Cybersecurity and Infrastructure Security Agency (CISA), posted on X (formerly known as Twitter) that “we need to discuss the risks” of the AI model.
“My biggest concern is how this content could be used to trick, manipulate, phish, and confuse the general public,” she said.
OpenAI also says it is aware of defamation or misinformation problems arising from this technology and plans to apply the same content filters to Sora as the company does to DALL-E 3 that prevent “extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others,” as Aminu Abdullahi reports at TechRepublic.
READ MORE: Behind the Controversy: Why Artists Hate AI Art (TechRepublic)
Others flagged concerns about copyright and privacy, with Ed Newton-Rex, CEO of non-profit AI certification company Fairly Trained, maintaining: “You simply cannot argue that these models don’t or won’t compete with the content they’re trained on, and the human creators behind that content.”
Anticipating these concerns, OpenAI plans to watermark content created with Sora with C2PA metadata. However, OpenAI doesn’t currently have anything in place to prevent users of its image generator, DALL-E 3, from removing that metadata.
OpenAI said it is engaging with artists, policymakers and others to ensure safety before releasing the new tool to the public. However, its get-out clause is that despite extensive research and testing, “we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it.”
The Microsoft-backed company is valued at $80 billion after a recent injection of VC funds. “It will become impossible for human beings to detect AI-generated content,” Gartner analyst Arun Chandrasekaran warned TechRepublic. “VCs are making investments in startups building deepfake detection tools, however, there is a need for public-private partnerships to identify, often at the point of creation, machine-generated content.”
READ MORE: OpenAI Completes Deal That Values the Company at $80 Billion (The New York Times)
READ MORE: OpenAI’s Sora Generates Photorealistic Videos (TechRepublic)
Sora joins a chorus of other text-to-video generators such as Runway and Fliki, Meta’s Make-A-Video generator, and the yet-to-be-released Google Lumiere.
Question: Has Apple taken its eye off the ball? Answer: Maybe not. Its researchers have just published a paper about Keyframer, a design tool for animating static images with natural language.
As Emilia David at The Verge points out, Keyframer is one of several generative AI innovations that Apple has announced in recent months. In December, the company introduced Human Gaussian Splats (HUGS), which can create animation-ready human avatars from video clips. Apple also released MGIE, an AI model that can edit images using text-based descriptions.