Make-A-Video: Text-to-Video Generation’s Next... Generation?

The text prompt for this Make-A-Video was “a confused grizzly bear in calculus class” (and that looks about right).

ALSO ON NAB AMPLIFY:

What Will DALL-E Mean for the Future of Creativity?

“Generative AI research is pushing creative expression forward by giving people tools to quickly and easily create new content,” Meta stated in a blog post announcing the new AI tool. “With just a few words or lines of text, Make-A-Video can bring imagination to life and create one-of-a-kind videos full of vivid colors and landscapes.”

In a Facebook post, Meta CEO Mark Zuckerberg described the work as “amazing progress,” adding, “It’s much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they’ll change over time.”

Examples on Make-A-Video’s announcement page include “a young couple walking in heavy rain” and “a teddy bear painting a portrait.” It also showcases Make-A-Video’s ability to take a static source image and animate it. For example, a still photo of a sea turtle, once processed through the AI model, can appear to be swimming.

The Make-A-Video prompt here was: “A teddy bear painting a portrait.” Video courtesy of Meta

The key to Make-A-Video, and the reason it has arrived sooner than some experts anticipated, is that it builds on existing text-to-image synthesis work of the kind used in image generators like OpenAI’s DALL-E. Meta announced its own text-to-image AI model in July.

ALSO ON NAB AMPLIFY:

AI Can Produce Visuals We Can’t Even Imagine, So Maybe We Should Just Enjoy It

According to Benj Edwards at Ars Technica, rather than training the Make-A-Video model on labeled video data (for example, captioned descriptions of the actions depicted), Meta took image synthesis data (still images paired with captions) and added unlabeled video training data so the model learns a sense of where a text or image prompt might exist in time and space. It can then predict what comes after the image and display the scene in motion for a short period.
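To make that idea concrete, here is a deliberately tiny Python sketch. It is not Meta's model or code. It is a toy under loud assumptions: a "frame" is just a short list of pixel values, "learning motion" means averaging how each pixel changes between consecutive frames of uncaptioned clips, and the helper names `learn_motion` and `animate` are hypothetical. It only illustrates the split the article describes: appearance comes from a still image, while motion is learned from video that carries no labels.

```python
from typing import List

Frame = List[float]  # toy "image": a flat list of pixel intensities (assumption)


def learn_motion(unlabeled_clips: List[List[Frame]]) -> Frame:
    """Average per-pixel change between consecutive frames.

    No captions are used: the clips are unlabeled, which is the point.
    (Hypothetical helper; a real model learns far richer temporal structure.)
    """
    n_pixels = len(unlabeled_clips[0][0])
    totals = [0.0] * n_pixels
    pairs = 0
    for clip in unlabeled_clips:
        for prev, nxt in zip(clip, clip[1:]):
            for i in range(n_pixels):
                totals[i] += nxt[i] - prev[i]
            pairs += 1
    if pairs == 0:
        raise ValueError("need at least one clip with two or more frames")
    return [t / pairs for t in totals]


def animate(still: Frame, motion: Frame, n_frames: int) -> List[Frame]:
    """Start from a still image and roll the learned motion forward."""
    frames = [list(still)]
    for _ in range(n_frames - 1):
        frames.append([p + d for p, d in zip(frames[-1], motion)])
    return frames


# One uncaptioned clip of three 2-pixel frames, then a new still to animate.
clips = [[[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]]
motion = learn_motion(clips)
video = animate([10.0, 10.0], motion, n_frames=3)
```

In this toy, the still image stands in for what a text-to-image model would produce from a prompt, and the motion step stands in for what the unlabeled video data contributes.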

ALSO ON NAB AMPLIFY:

Recognizing Ourselves in AI-Generated Art

Cracking the code to create photorealistic video on demand — and then drive it with a narrative — is exercising other minds too.

Chinese researchers are behind another text-to-video model, CogVideo; OpenAI is also thought to be working on one; and no doubt numerous other initiatives are in the works.

EXPLORING ARTIFICIAL INTELLIGENCE:

With nearly half of all media and media tech companies incorporating Artificial Intelligence into their operations or product lines, AI and machine learning tools are rapidly transforming content creation, delivery and consumption. Find out what you need to know with these essential insights curated from the NAB Amplify archives:

READ MORE: Introducing Make-A-Video: An AI system that generates videos from text (Meta)

READ MORE: Meta announces Make-A-Video, which generates video from text (Ars Technica)

READ MORE: Meta’s new text-to-video AI generator is like DALL-E for video (The Verge)

READ MORE: Make-A-Video: Text-To-Video Generation Without Text-Video Data (Meta)
