READ MORE: Artificial Intelligence in Video Production (ProVideo Coalition)
Automatic captioning, audio cleanup, and image generation from a text prompt are just some of the time-saving, creativity-boosting AI tools now available in smartphone apps and mainstream NLE software, often for free.
“The AI/ML genie isn’t going back in its bottle, and for some specialties like painting, drawing, and concept art, a lot of work is about to dry up,” says Iain Anderson at ProVideo Coalition. “But it’s not all doom and gloom. While an ML system can get a lot done very quickly, humans still bring unique skills and creativity, and an artist who knows how to drive an image generation machine can use it to their advantage rather than bemoaning its existence.”
Going forward, the smart artist should focus on what an AI-augmented toolkit can do for them. Here are some of the tools already available in this rapidly evolving part of the industry.
Automatic captioning (available now via Premiere Pro, iOS, YouTube and Vimeo) is a neat AI trick and a big time saver. This is only going to get better, and it’s essentially free, today.
To Anderson, the interesting thing is not saving a bit of time captioning your finished edit; it’s getting free, accurate transcriptions of every source clip to make the editing job easier. Paid solutions like Builder and ScriptSync have offered this service in various forms, but if AI-derived, time-coded transcriptions become free, accurate, and quick enough, text-assisted editing will become a much more widely used workflow.
Need a tool to automatically recognize people and remove the background behind them? It’s called Keyper, and it works in Final Cut Pro, Motion, Premiere Pro and After Effects right now.
“While it’s not 100% perfect, it’s good enough to let you color-correct a person separate from their backgrounds, or to place text partly behind a person,” reviews Anderson. “If you can control the shot at least a little, it’s like a virtual green screen you don’t have to set up, and can work wonders.”
Simply cloning one part of an image over another is mechanical, but Photoshop’s Content-Aware Fill is a fair bit smarter. Known more generally as “inpainting,” it’s an impressive technique that’s essential for many VFX tasks. After Effects incorporated a video-ready version of this tech a few versions back. Runway can do that too.
“Soon, you should also be able to use ML-based techniques to automatically select an object, a trick which even your iPhone can do now. Another feature to be added is text recognition of your requests, so you’ll just be able to ask to ‘remove the bin in that shot’ and it will.”
Bad audio is easier to clean up than ever. iZotope RX’s Voice Denoise and the new Voice Isolation feature in Final Cut Pro use ML-trained models of what voices should and shouldn’t sound like. But beyond cleanup, modern AI methods can transform one person’s voice into another’s.
Koe Recast isn’t perfect but it’s still crazily good, according to Anderson. It lets you turn your voice (or a recording) into one of a selection of replacement voices, with a decent amount of emotion, and with far better results than the terrible robotic nonsense used by cheap YouTube ads today.
DALL-E 2, Midjourney and Stable Diffusion create realistic or artistic images from text prompts.
These automatic image generation engines have been trained on millions of images from the internet, identified by their surrounding text or alt tags, and do their magic essentially by guided random chance. Starting from noise, the image is randomly changed many, many times. Each new generation is assessed to find the closest match for the text prompt you’ve provided, and as the process repeats, it eventually produces a coherent image.
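The "guided random chance" idea described above can be sketched in a few lines of Python. This is a toy illustration only, and not how DALL-E 2, Midjourney, or Stable Diffusion are actually implemented: real systems rely on trained neural networks rather than a known target. Here the "image" is just a short list of brightness values, and the hypothetical `TARGET` list stands in for whatever the text prompt describes. The names `score`, `mutate`, and `generate` are invented for this sketch.

```python
import random

def score(image, target):
    """Higher is better: negative squared distance to the target."""
    return -sum((p - t) ** 2 for p, t in zip(image, target))

def mutate(image, rng, step=0.1):
    """Return a copy with a small random change to every value, kept in [0, 1]."""
    return [min(1.0, max(0.0, p + rng.uniform(-step, step))) for p in image]

def generate(target, generations=500, candidates=8, seed=0):
    rng = random.Random(seed)
    image = [rng.random() for _ in target]  # start from pure noise
    for _ in range(generations):
        # Make several random variations, then keep whichever candidate
        # (including the current image) best matches the target.
        pool = [image] + [mutate(image, rng) for _ in range(candidates)]
        image = max(pool, key=lambda c: score(c, target))
    return image

# Hypothetical stand-in for "what the prompt describes".
TARGET = [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]
result = generate(TARGET)
print([round(p, 2) for p in result])
```

Because the current image always stays in the pool, each generation can only match the target as well as or better than the last, which is the sense in which the randomness is "guided."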
If you’re determined enough, you can also use these techniques to create an animation, as artist Paul Trillo did using DALL-E 2 and Runway’s AI-powered morphing.
In fact, Runway promises to integrate this tech directly into their editing solution for a seamless “replace the background of this shot with a Japanese garden” experience.
This is just the beginning. Combine image generation with other open-source projects (for seamless texture generation, AI-based upscaling, and deepfake-style face mapping) and the potential becomes clear.