TL;DR
- A new wave of AI video generators coming to market brings long-form, quality narrative video one step closer.
- Google Lumiere is a new video AI model, not yet publicly released, that uses space-time diffusion to generate more coherent and realistic videos.
- But we’re not there yet. Despite the incredible leaps, the video AI models have considerable limitations, and using them without knowledge of their training data could open users up to legal challenges.
Google Lumiere has yet to be released, but the company has published video clips it says were created by the new technology. Reviewers have gone wild.
AI video generation has gone from uncanny valley to near realistic in just a few years, and the latest wave of tools, including OpenAI’s Sora and Google’s Lumiere, brings the prospect of coherent AI-generated features a step closer.
“An AI video generator that looks to be one of the most advanced text-to-video models yet,” says Matt Growcoot at PetaPixel. “Pretty amazing,” assesses Sabrina Ortiz at ZDNet. “Revolutionary,” judges Jace Dela Cruz at TechTimes. “Can render cute animals in implausible situations,” writes Benj Edwards at Ars Technica.
READ MORE: Google’s New AI Video Generator Looks Incredible (PetaPixel)
READ MORE: Google’s AI video generator tech is pretty amazing. See for yourself (ZDNet)
READ MORE: Google Unveils Lumiere: A Revolutionary New AI-Powered Text-to-Video Generator (TechTimes)
READ MORE: Google’s latest AI video generator can render cute animals in implausible situations (Ars Technica)
YouTuber Matt Wolfe predicts that 30- to 60-minute, completely AI-generated films “that are coherent and enjoyable” are coming in the next few months.
Calling the news a “bombshell,” Tim Simmons, founder of the AI commentary channel Theoretically Media, notes that Lumiere isn’t perfect but it’s a startling advance nonetheless, catapulting Google into the front ranks of AI video generators.
“I don’t think I’ve seen before this [idea] you can give the model a reference image and then it will generate videos in the style of that reference image,” Wolfe says.
That’s because Google has taken a different approach to its model.
As The Verge explains, Lumiere uses a new diffusion model called Space-Time U-Net, or STUNet, that figures out where things are in a video (space) and how they simultaneously move and change (time).
“Other models stitch videos together from generated key frames where the movement already happened, while STUNet lets Lumiere focus on the movement itself based on where the generated content should be at a given time in the video,” says The Verge reporter Emilia David.
Lumiere starts with creating a base frame from the prompt. Then, it uses the STUNet framework to begin approximating where objects within that frame will move to create more frames that flow into each other, creating the appearance of seamless motion.
READ MORE: Google’s Lumiere brings AI video closer to real than unreal (The Verge)
“By handling both the placement of objects and their movement simultaneously,” Google claims Lumiere “can create consistent, smooth and realistic movement across a full video clip,” reports Ryan Morrison at Tom’s Guide.
READ MORE: Google claims new Lumiere AI video generator uses space and time together to create stunning clips (Tom’s Guide)
Or as Simmons puts it, “Basically, it all comes down to this space time unit which allows for the video to be created all at once, as opposed to other models which begin with an input frame, an output frame and then generates key frames between those. [With Lumiere] the video is generated all at once.”
Beyond text-to-video generation, Lumiere will also allow for image-to-video generation; stylized generation, which lets users make videos in a specific style; cinemagraphs, which animate only a portion of a video; and inpainting, which masks out an area of the video to change its color or pattern.
It can generate 80 frames at 16fps — or five seconds of video — putting it on par with, and even ahead of, its competitors. But Google’s research paper describes “a new inflation scheme which includes learning to downsample the video in both space and time,” which Google says can pave the way to longer (suggesting even “full-length”) clips.
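To make the “space and time” idea a little more concrete, here is a minimal, purely illustrative PyTorch sketch of what downsampling a clip in both dimensions at once can look like. This is not Lumiere’s architecture or code (Google has not released either); the class name, layer sizes and tensor shapes are assumptions chosen only to show how a single 3D convolution can halve both the frame count and the spatial resolution of a video volume, rather than treating frames one at a time.

```python
# Conceptual sketch only -- not Google's code. Illustrates downsampling a video
# in BOTH space and time, the idea behind a space-time U-Net encoder stage.
import torch
import torch.nn as nn

class SpaceTimeDownBlock(nn.Module):
    """Halves the temporal and spatial resolution of a video tensor
    shaped (batch, channels, frames, height, width) in one step."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # A 3D convolution mixes information across frames (time) and pixels
        # (space); stride 2 in every dimension downsamples them together.
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.conv(x))

# A five-second clip at 16fps is 80 frames, the figure quoted above.
video = torch.randn(1, 3, 80, 128, 128)   # (batch, channels, frames, H, W)
coarse = SpaceTimeDownBlock(3, 64)(video)
print(coarse.shape)  # torch.Size([1, 64, 40, 64, 64]) -- half the frames, half the resolution
```

In a full space-time U-Net, stacks of blocks like this (with matching upsampling stages) would let a model reason about the whole clip as one volume, rather than generating keyframes and filling in the motion afterwards.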
CineD carries an overview of AI video generators to highlight their current capabilities and limitations. This excludes Lumiere but includes leaders like Runway’s Gen-2, Pika 1.0 and Stability AI’s Stable Video Diffusion (SVD).
After testing, Mascha Deikova concludes that AI video generators haven’t yet reached the point where they can take over our jobs as cinematographers or 2D/3D animators.
“The frame-to-frame consistency is not there, the results often have a lot of weird artifacts, and the motion of the characters does not feel even remotely realistic,” she says.
Deikova also finds the overall process still requires “way too much effort” to get a decent generated video that’s close to your initial vision. “It seems easier to take a camera and get the shot that you want ‘the ordinary way,’” she says.
“At the same time, it is not like AI is going to invent its own ideas or carefully work on framing that is the best one for the story. Nor is that something non-filmmakers will be constantly aware of while generating videos.”
There are also other limitations that users of AI video generators should be aware of so that they don’t fall foul of the law. For example, as noted by Deikova, SVD doesn’t allow its models to be used for commercial purposes. The same restriction applies to the free tiers of Runway and Pika. Once you move to a paid subscription, however, Pika will remove its watermark and grant commercial rights.
Lumiere is not available for independent testing, and there is no word as to Google’s timeline for potential deployment. This is perhaps because, as Google’s Lumiere paper noted, “there’s a risk of misuse for creating fake or harmful content with our technology, and we believe that it is crucial to develop and apply tools for detecting biases and malicious use cases to ensure a safe and fair use.”
CineD’s roundup warns that nobody knows what data most AI video models were trained on. Most likely, it suggests, the training data consists of anything to be found online, including images and works of artists who have neither given their permission nor received any attribution.
READ MORE: AI Video Generators Tested – Why They Won’t Replace Us Anytime Soon (CineD)
One company taking a different tack is Adobe, with its content-credentialed image generator Firefly, trained on sources that Adobe either owns or that artists have given the company permission to use. The company is also developing an AI video generator but has yet to release it.
READ MORE: Adobe teases generative AI video tools (Ars Technica)
As of today — and we really mean right now — text- or image-to-video generators could become a quick solution for ideation (storyboarding), previsualization and animation.
But applying visual storytelling tools and “crafting beautiful evolving cinematography” is, right now, a job for human minds.
For short films with a decent story we’re probably looking at 18 months to two years out, estimates Tim Simmons of Theoretically Media, speaking on the Curious Refuge podcast. For cinematic quality — equivalent to something you’d watch in a movie theater — we could be only three years away, he predicts.
“These innovations are happening really fast and what I really think is important is for very talented storytellers to take these tools, and to craft their own stories. It’s always going to be a back and forth process but very different from the normal creative process you may be familiar with.”
“AI is elevating everyone into the position of curator, where you can pick through a wide variety of outputs and select the best one for whichever story you’re trying to put together,” Caleb Ward asserted on the Curious Refuge podcast. “Essentially, your creative potential has now been multiplied by 50 times using artificial intelligence.”
He noted that, in the past, a big part of an artist’s endeavor was spending time figuring out how to master software like Adobe After Effects, Illustrator or Houdini.
Now, however, there are a variety of AI tools to select from, ranging from general AI video generators to specific tools for specific jobs. The role of the artist is changing to one in which the skill lies in knowing which tools to use to achieve the vision.
“So essentially, it’s going to be more important than ever before for everybody to have their own creative taste, and to be able to disseminate from all of the outputs that AI is giving you,” Ward said. “It’s like having a team of creative assistants that are working for you. But that team is also just a little hung over. And so you have to be able to push them in the right direction.”
Shelby Ward, on the same podcast episode, makes the point that AI tools are being developed so rapidly that their output will soon be indistinguishable from Hollywood content. It therefore could democratize the whole content creation industry.
“Think about the types of people who are able to create compelling content, films and stories,” he says. “You don’t have to be networked in Hollywood now to create a film that’s really interesting. And you don’t have to be born into a certain socio-economic status. Really, anybody with average creative tools has the power to create something really, really interesting. And that barrier to entry is only going to shrink and the quality is only going to get better.”