The Ways AI Is Going to Revolutionize Filmmaking

READ MORE: Get ready for the next generation of AI (MIT Technology Review)

Only a few months ago the art world was agog at breakthroughs in text-to-image synthesis, and already new models capable of text-to-video generation have arrived. Advances in the field have come so quickly that Meta’s Make-A-Video, announced just three weeks ago, now looks basic.

Another generative AI, called Phenaki, can create video not only from a text prompt but also from a still image combined with a prompt. It can also make far longer clips: users can create videos several minutes long from a series of prompts that together form the script for the video. The example given by MIT Technology Review’s Melissa Heikkilä is this sequence: “a photorealistic teddy bear is swimming in the ocean at San Francisco. The teddy bear goes underwater. The teddy bear keeps swimming under the water with colorful fishes. A panda bear is swimming underwater.”

Phenaki can create video clips from still images and text prompts. (Credit: Phenaki)
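To make the multi-prompt idea concrete, here is a minimal Python sketch of how a “script” of prompts might be chained into one video, with each new segment conditioned on the last few frames of the previous one. This loosely follows the idea of carrying context from one prompt to the next; the function names, the context length and the use of string labels as stand-in “frames” are all assumptions for illustration, and `fake_generator` is a hypothetical stub, not Phenaki’s actual model or API.

```python
from typing import Callable, List

# Assumption: a "frame" is just a string label so the control flow can be
# inspected without a real video model.
Frame = str
SegmentGenerator = Callable[[str, List[Frame]], List[Frame]]


def fake_generator(prompt: str, context: List[Frame], frames_per_prompt: int = 4) -> List[Frame]:
    # Hypothetical stub: a real model would condition on `context`;
    # this one only labels the frames it "generates".
    return [f"{prompt}#{i}" for i in range(frames_per_prompt)]


def generate_story(prompts: List[str], generate: SegmentGenerator, context_len: int = 2) -> List[Frame]:
    """Chain prompts into one video, passing the tail of the video so far as context."""
    video: List[Frame] = []
    for prompt in prompts:
        context = video[-context_len:]  # last few frames carried over (empty for the first prompt)
        video.extend(generate(prompt, context))
    return video


script = [
    "a teddy bear is swimming in the ocean",
    "the teddy bear goes underwater",
    "a panda bear is swimming underwater",
]
story = generate_story(script, fake_generator)
print(len(story))  # 12
```

The design point is simply that each prompt extends the same sequence instead of starting a fresh clip, which is what lets a list of prompts read as one continuous scene.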

Heikkilä writes that “a technology like this could revolutionize filmmaking and animation. It’s frankly amazing how quickly this happened. DALL-E was launched just last year. It’s both extremely exciting and slightly horrifying to think where we’ll be this time next year.”

The research paper “Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions” explains that generating video from text is particularly challenging because of the computational cost, the limited quantity of high-quality text-video data and the variable length of videos. To address these issues, Phenaki compresses each video into a small representation made up of “discrete tokens.” According to the paper, “this tokenizer uses causal attention in time, which allows it to work with variable-length videos.”
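Why would causal attention in time help with variable-length video? Under a causal mask, each time step can only look at itself and earlier steps, so the representation of the opening frames never depends on how long the clip turns out to be. The minimal Python sketch below illustrates that property with single-head self-attention over a list of per-step vectors. It is a simplification under stated assumptions (no learned projections, toy two-dimensional vectors standing in for frame features) and does not reproduce Phenaki’s actual tokenizer architecture.

```python
import math
from typing import List

Vector = List[float]


def causal_attention(steps: List[Vector]) -> List[Vector]:
    """Single-head self-attention over time with a causal mask.

    Simplification (assumption): queries, keys and values are the input
    vectors themselves, with no learned projections. Step t may only
    attend to steps 0..t, never to later frames.
    """
    dim = len(steps[0])
    outputs: List[Vector] = []
    for t, query in enumerate(steps):
        visible = steps[: t + 1]  # the causal mask: only past and present steps
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in visible]
        peak = max(scores)  # subtract the max before exp() for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Values are the same visible vectors, averaged with the softmax weights.
        outputs.append([
            sum(w * vec[d] for w, vec in zip(weights, visible)) / total
            for d in range(dim)
        ])
    return outputs


short_clip = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
longer_clip = short_clip + [[1.0, 1.0], [0.2, 0.8]]  # same opening, two extra steps

# Outputs for the shared opening steps are identical, whatever follows them.
print(causal_attention(short_clip) == causal_attention(longer_clip)[:3])  # True
```

With full (non-causal) attention, appending frames would change every earlier output, so a clip’s tokens would depend on its total length.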

“As the technology develops, there are fears it could be harnessed as a powerful tool to create and disseminate misinformation. It’s only going to become harder and harder to know what’s real online, and video AI opens up a slew of unique dangers that audio and images don’t, such as the prospect of turbo-charged deepfakes.”
— Melissa Heikkilä

The paper goes on to explain how Phenaki achieves a compressed representation of video. “Previous work on text to video either use per-frame image encoders or fixed length video encoders. The former allows for generating videos of arbitrary length, however in practice, the videos have to be short because the encoder does not compress the videos in time and the tokens are highly redundant in consecutive frames.

“The latter is more efficient in the number of tokens but it does not allow to generate variable length videos,” the paper says. “In Phenaki, our goal is to generate videos of variable length while keeping the number of video tokens to a minimum so they can be modeled… within current computational limitations.”
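Some rough arithmetic shows why compressing in time matters. The Python sketch below compares the token count of a per-frame image encoder with a scheme that encodes the first frame on its own and then groups later frames into short temporal chunks. The grouping scheme and every number used (128x128 frames, 8x8 patches, chunks of two frames) are hypothetical parameters chosen for illustration, loosely modeled on the idea the paper describes rather than taken from it.

```python
def per_frame_tokens(frames: int, height: int, width: int, patch: int) -> int:
    """Tokens from an image encoder applied to every frame independently."""
    return frames * (height // patch) * (width // patch)


def temporally_compressed_tokens(frames: int, height: int, width: int,
                                 patch: int, time_patch: int) -> int:
    """Tokens when the first frame is encoded alone and later frames are grouped
    into chunks of `time_patch` frames (an assumed, illustrative scheme)."""
    spatial = (height // patch) * (width // patch)  # tokens per encoded time step
    later = frames - 1
    chunks = -(-later // time_patch)  # ceiling division covers a partial last chunk
    return (1 + chunks) * spatial


# Hypothetical settings: 128x128 frames, 8x8 spatial patches, 2-frame chunks.
print(per_frame_tokens(11, 128, 128, 8))                 # 2816
print(temporally_compressed_tokens(11, 128, 128, 8, 2))  # 1536
print(per_frame_tokens(101, 128, 128, 8))                # 25856
print(temporally_compressed_tokens(101, 128, 128, 8, 2)) # 13056
```

Under these assumptions the saving approaches the chunk size as clips get longer, while a fixed-length video encoder would compress just as well but only for one predetermined number of frames.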

READ MORE: Get ready for the next generation of AI (MIT Technology Review)

READ MORE: Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions (Phenaki)

READ MORE: DreamFusion: Text-to-3D using 2D Diffusion (DreamFusion)

READ MORE: Video fake news believed more, shared more than text and audio versions (Pennsylvania State University)

READ MORE: A quick guide to the most important AI law you’ve never heard of (MIT Technology Review)
