Innovations like ChatGPT and DALL·E 2 highlight the incredible advances that have taken place with AI, causing professionals in countless fields to wonder whether such innovations mean the end of thought leadership, or whether they should instead focus on the opportunities presented by such tools. Even more recently, ProVideo Coalition (PVC) writers have detailed why we need these AI tools as well as how they can be turned into unexpected services.
What do filmmakers and other creative professionals really think about these developments, though? What are the top concerns, questions and viewpoints surrounding the surge of generative AI technologies that have recently hit the open market? Should we be worried, or simply embrace the technology, forge ahead and let the bodies fall in the wake?
Below is how various PVC writers explored those answers in a conversation that took shape over email. You can keep the discussion going in the comments section or on Twitter.
I’m definitely not unbiased, as I’m currently engaging with as much of it on a user level as I can get my hands on (and have time to experiment with), sorting out the useful from the useless noise so I can share my findings with the ProVideo community.
But with that said, I do see some lines being crossed, and there may be legitimate concerns that producers and editors will have to keep in mind as we forge ahead so we don’t paint ourselves into a corner – either legally or ethically.
Sure, most of the tools available out there are just testing the waters – especially the AI image and animation generators. Some are getting really good (except for too many fingers and huge breasts), but when the output becomes indistinguishable from reality, we may see some pushback.
So the question arises: are people generating AI images IN THE STYLE OF [noted artist] or PHOTOGRAPHED BY [noted photographer] in fact infringing on those artists’ copyrights/styles, or simply mimicking published works?
This is already being addressed in the legal system in a few lawsuits against certain AI tool developers, which will eventually shake out exactly how their tools gather the diffusion data they create from (it’s not just copy/paste). That will either settle the direct copyright infringement argument against artists, or it will be a nail in the coffin for many developers and forbid further access to available online libraries.
The next identifiable technology that raises potential concern, IMO, is the set of AI tools that will regenerate facial imagery in film/video for the purpose of dubbing and ratings controls – ripe for possible misuse and misinformation.
On that note, I’ve mentioned ElevenLabs in my last article as a highly advanced TTS (Text To Speech) generator that not only allows you to customize and modify voices and speech patterns reading scripted text with astounding realism, but also lets you sample ANY recorded voice and then generate new voice recordings from your text inputs. For example, you could potentially use any A-list celebrity to say whatever marketing blurb you want in a VO, or make a politician actually tell the truth (IT COULD HAPPEN!).
But if you combine those last two technologies, then we have the potential for a flood of misuse.
I’ve been actively using AI for a feature documentary I’ve been working on for the past few years, and it’s made a huge difference across the 1100+ archival images I’ve retouched and enhanced, so I already see the benefits for filmmakers. It adds a lot of value to the finished piece, and I’m seeing much cleaner productions in high-end feature docs these days.
As recently demonstrated, some powerful tools and (rather complex) workflows are being developed specifically for video & film, to benefit on-screen dubbing and translations without the need for subtitles. It’s only a matter of time before these tools are ready and available for use by the general public.
As the saying goes – with great power comes great responsibility, and sadly, I think that may not end well for many developers who can’t control the who/where/how the end users utilize these amazing technologies.
I am not sure we will see a sudden shift in the production process regarding AI and documentary filmmaking. There is something about being on location with a camera in hand, finding the emotional thread, and framing up to tell a good story. It is nearly impossible to replace the person holding the camera or directing the scene. I think the ability of a director or photographer to light a scene, light multi-camera interviews, and be with a subject through times of stress is irreplaceable.
Yet AI can easily slip into the pre-production and post-production process for documentary filmmaking. For example, I already use Rev.com for its automatic transcription of interviews and captions. Any technology that makes collaborating easier and editing faster will run through post-production work like wildfire. I can remember when we paid production assistants to log reality TV footage. Not only was the transcription tedious, it was also expensive to pay for throughout the shoot. Any opportunity to save a production company money will be used.
Then we get to the type of documentary filmmaking that may require the recreation of scenes to tell the story of something that happened sometime before the documentary shoot. I could see documentary producers and editors turning to whatever AI tool to recreate a setting or scenes, or even an influential person’s voice. The legal implications are profound, though, and I can see a waterfall of new laws giving a notable person’s family intellectual property rights to that person’s image and voice no matter how long ago they passed – or at the very least 100 years of control of that image and voice. Whenever there is money to be made from a person’s image or voice, there will be bad actors and those who ask for forgiveness instead of permission, but I bet the legal system will eventually catch up and protect those who want it.
The rights issues are extremely knotty (I’ve recently written about this). On one hand, the extant claims that a trained AI contains “copies of images” are factually incorrect. The trained state of an AI such as Stable Diffusion, which is at the centre of recent legal action, is represented by something like the weights of interconnections in a neural network, which is not image data. In fact, it’s notoriously difficult to interpret the internal state of a trained AI. Doing that is a major research topic, and our lack of understanding is why, for instance, it’s hard to show why an AI made a certain decision.
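To make the “weights, not images” point concrete, here’s a toy sketch – nothing like Stable Diffusion, just a single-layer classifier trained by gradient descent on random data, with all numbers invented for illustration. What persists after training is a small set of weights, not the training data itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "training images": 100 samples of 64 pixel values each.
images = rng.random((100, 64))
labels = rng.integers(0, 2, 100)

# A single-layer model: after training, all that persists is one
# weight vector plus a bias -- 65 floats, not the 6,400 pixel values.
w = np.zeros(64)
b = 0.0
lr = 0.1
for _ in range(200):  # plain logistic-regression gradient descent
    z = images @ w + b
    p = 1.0 / (1.0 + np.exp(-z))      # sigmoid
    grad = p - labels
    w -= lr * images.T @ grad / len(images)
    b -= lr * grad.mean()

print(w.size + 1)   # trained state: 65 numbers
print(images.size)  # training data: 6,400 numbers
```

The 65 stored numbers obviously cannot contain the 6,400 training values verbatim, which mirrors – at vastly smaller scale – why the “copies of images” claim is technically wrong, even though the question of what essence of the data those weights do encode remains open.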
It could reasonably be said that the trained state of the AI contains something of the essence of an artist’s work, and the artist might reasonably have rights in whatever that essence is. Worse, once an AI becomes capable of convincingly duplicating the style of an artist, it probably encompasses a bit more than just the essence of that artist’s work, and our inability to be specific about what that essence really is doesn’t change the fact that the artist really should have rights in it. What makes this really hard is that most jurisdictions do not allow people to copyright a style of artwork, so if a human artist learns how to duplicate someone else’s style, so long as they’re upfront about what they’re doing, that’s fine. What rubs people the wrong way is doing it with a machine which can easily learn to duplicate anyone’s work – or everyone’s work – and which can then flood the market with images in that style, which might realistically begin to affect the market for the original artist’s work.
In a wider sense this interacts with the broad issues of employment in general falling off in the face of AI, which is a society-level issue that needs to be addressed. Less skilled work might go first, although perhaps not – the AI can cut a show, but it can’t repair the burst water main without more robotics than we currently have. One big issue coming up, which probably doesn’t even need AI, is self-driving vehicles. Driving is a massive employer. No plans have been made for the mass unemployment that’s going to cause. Reasonable responses might include universal basic income but that’s going to require some quite big thinking economically, and the idea that only certain, hard-to-automate professions have to get up and go to work in the morning is not likely to lead to a contented society.
This is just one of a lot of issues workers might have with AI, and so the recent legal action might be seen as an early skirmish in what could be a quite significant war. I think Brian’s right about this not creating sudden shifts in most areas of production. To some extent the film and TV industry already does a lot of things it doesn’t really need to do, such as shooting things on 65mm negative. People do these things because it tickles them. It’s art. That’s not to say there won’t be pressure to use more efficient techniques when they are available, as has been the case with photochemical film, and that will create another tension (as if there aren’t already a lot) between “show” and “business”. As a species we tend to be blindsided by this sort of thing more than we really should be. We tend to assume things won’t change. Things change.
I do think that certain types of AI information might end up being used to guide decision-making. For instance, it’s quite plausible to imagine NLE software gaining analysis tools which might create the same sort of results that test screenings would. Whether that’s good or not depends how we use this stuff. Smart application of it might be great. Allowing it to become a slave driver might be a disaster, and I think we can all imagine that latter circumstance arising as producers get nervous.
While AI has a lot to offer, and will cause a great deal of change in our field and across society, I don’t think it’ll cause broad, sweeping changes just yet. Artificial Intelligence has been expected to be the next big thing for decades now, and (finally!) some recent breakthroughs are starting to have a more obvious impact. Yet, though ChatGPT, Stable Diffusion, DALL·E and Midjourney can be very impressive, they can also fail badly.
ChatGPT seems really smart, but if you ask it about a specialist subject that you know well, it’s likely to come up short. What’s worse than ChatGPT not knowing the answer? Failing to admit it, but instead guessing wrong while sounding confident. Just for fun, I asked it “Who wrote Final Cut Pro Efficient Editing” because that’s the modern equivalent of Googling yourself, right? It’s now told me that both Jeff Greenberg and Michael Wohl wrote the book I wrote in 2020, and I’m not as impressed as I once was.
Don’t get me wrong: if you’re looking for a surface level answer, or something that’s been heavily discussed online, you can get lucky. It can certainly write the script for a very short, cheesy film. (Here’s one it wrote: https://vimeo.com/795582404/b948634f34.) Lazy students are going to love it, but it remains to be seen if it’s really going to change the way we write. My suspicion is that it’ll be used for a lot of low-value content, as AI-based generators like Jasper are already used today, but the higher-value jobs will still go to humans. And that’s a general theme.
Yes, there will be post-production jobs (rotoscoping, transcription) done by humans today which will be heavily AI-assisted tomorrow. Tools like Keyper can mask humans in realtime, WhisperAI does a spectacular job of transcription on your own computer, and there are a host of AI-based tools like Runway which can do amazing tricks. These tasks are mostly technical, though, and decent AI art is something novel. Image generators can create impressive results, albeit with many failures, too many fingers, and lingering ethical and copyright issues. But I don’t think any of these tools are going away now. Technology always disrupts, but we adapt and find a new normal. Some succeed, some fail.
A saving grace is that it’s easy to get an AI model about 95% of the way there, but the last 5% gets a bit harder, and the final 1% is nearly impossible. Sometimes that 5% doesn’t matter – a voice recording that’s 95% better is still way better, and a transcription that’s nearly right is easy to clean up. But a roto job where someone’s ears keep flicking in and out of existence is not a roto job the client will accept, and it’s not necessarily something that can be easily amended.
So, if AI is imperfect, it won’t totally replace humans at all the jobs we’re doing today. Many will be displaced, but we’ll get new jobs too. AI will certainly make it into consumer products, where people don’t care if a result is perfect, but to be part of a professional workflow, it’s got to be reliable and editable. There are parallels in other creative fields, too: after all, graphic designers still have a livelihood despite the web-based templated design tool Canva. Yes, Canva took away a lot of boring small jobs, but it doesn’t scale to an annual report or follow brand guidelines. The same amount of good work is being done by the same number of professionals, and there are a lot more party invitations that look a little better.
For video, there will be a lot more AI-based phone apps that will perform amazing gimmicks. More and better TikTok filters too. There will also be better professional tools that will make our jobs easier and some things a lot quicker — and some, like the voice generation and cleanup tools, will find fans across the creative world. Still, we are a long, long way from clients just asking Siri 2.0 to make their videos for them.
Beyond video, the imperfection of AI is going to heavily delay any society-wide move to self-driving cars. The world is too unpredictable – my Tesla still likes to brake for parked cars on bends – and to move beyond “driver assistance”, self-driving tech has to be perfect. A capability to deal with 99.9999% of situations is not enough if the remaining 0.0001% kills someone. There have been some self-driving successes where the environment is more carefully mapped and controlled, but a general solution is still a way off. That said, I wouldn’t be surprised to see self-driving trucks limited to predictable highway runs arrive soon. And yes, that will put some people out of work.
So what to do? Stay agile, be ready for change. There’s nothing more certain than change. And always remember, as William Gibson said: “The future is already here – it’s just not very evenly distributed.”
AI audio tools keep growing. Some that come to mind are Accusonus ERA (currently being bought), Adobe Speech Enhancement, AI Mastering, AudioDenoise, Audo.ai, Auphonic, Descript, Dolby.io, iZotope RX, Krisp, Murf AI Studio, Veed.io and AudioAlter. Of those, I have personally tested Accusonus ERA, Adobe Speech Enhancement, Auphonic, Descript and iZotope RX6.
I have published articles or reviews about a few of those in ProVideo Coalition.
There’s a lot of use of AI and “smart” tools in the audio space. I often think a lot of it is really just snake oil – using “AI” as a marketing term. But in any case, there are some cool products that get you to a solid starting point quickly.
Unfortunately, Accusonus is gone and has seemingly been bought by Meta/Facebook. If not directly bought, then they’ve gone into internal development for Facebook and are no longer making retail plug-ins.
In terms of advanced audio tools, Sonible is making some of the best new plug-ins. Another tool to look at is Adobe’s Podcast application, which is going into public beta. Their voice enhancement feature is available to be used now through the website. Processing is handled in the cloud without any user control. You have to take or leave the results, without any ability to edit them or set preferences.
AI and Machine Learning tools offer some interesting possibilities, but they all suffer from two biases. The first is the bias of the developers and the libraries used to train the software. In some cases that will be personal bias and in others it will be the bias of the available resources. Plenty has been written about the accuracy of dog images versus cat images created by AI tools, or about facial recognition’s flaws with darker skin, including tattoos.
The second large bias is one of recency – mainly the internet. Far more data, both general and specific, is available from the last 10-20 years through internet resources than from before. If you want to find niche information from before the advent of the internet – let’s say before 1985 – it can be a very difficult search, and that’s not something AI will likely access. For example, if you tried to have AI mimic the exact way that Cinedco’s Ediflex software and UI worked, I doubt it would happen, because the available internet data is sparse and it’s so niche.
I think the current state of the software is getting close enough to fool many people and could probably pass the famous Turing test criteria. However, it’s still derivative. AI can take A+B and create C or maybe D and E. What it can’t do today (and maybe never), is take A+B and create K in the style of P and Q with a touch of Z. At least not without some clear guidance to do so. This is the realm of artists to be able to make completely unexpected jumps in the thought process. So maybe we will always be stuck in that 95% realm and the last 1-5% will always be another 5 years out.
Another major flaw in AI and Machine Learning – in spite of the name – is that it does not “learn” based on user training. For instance, Pixelmator Pro uses image recognition to name layers. If I drag in a photo of the Eiffel Tower it will label it generically as tower or building. If I then correct that layer name by changing it to Eiffel Tower, the software does nothing to “learn” from my correction. The next time I drag in the same image, it still gets a generic name, based on shape recognition. So there’s no iterative process of “training” the library files that the software is based on.
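For illustration only – none of this reflects how Pixelmator Pro actually works, and every name here is hypothetical – a sketch of the kind of correction layer that would make such a tool “learn” from the user: remember the user’s fix, keyed to the image, and prefer it over the generic guess next time:

```python
# Hypothetical sketch: a thin "correction layer" that remembers user
# fixes on top of a generic recognizer -- the iterative learning the
# text notes is missing. Corrections are keyed by a hash of the bytes.
import hashlib

def generic_label(image_bytes: bytes) -> str:
    # Stand-in for a shape-recognition model that only ever
    # produces generic names, no matter what you feed it.
    return "tower"

class CorrectingLabeler:
    def __init__(self):
        self.corrections = {}  # image hash -> user-supplied name

    def _key(self, image_bytes):
        return hashlib.sha256(image_bytes).hexdigest()

    def label(self, image_bytes):
        # Prefer a remembered correction over the generic guess.
        return self.corrections.get(self._key(image_bytes),
                                    generic_label(image_bytes))

    def correct(self, image_bytes, name):
        self.corrections[self._key(image_bytes)] = name

labeler = CorrectingLabeler()
photo = b"...eiffel tower jpeg bytes..."
first = labeler.label(photo)           # "tower" -- generic
labeler.correct(photo, "Eiffel Tower")
second = labeler.label(photo)          # "Eiffel Tower" -- remembered
print(first, "->", second)
```

This only catches the exact same image again; genuinely learning from the correction would mean updating the underlying model, which is precisely what today’s shipped tools don’t let users do.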
I do think that AI will be a good assistant in many cases, but it won’t be perfect. Rotoscoping will still require human finesse (at least for a while). When I do interviews for articles, I record them via Skype or Zoom and then use speech-to-text to create a transcript. From that I write the article, cleaning up the conversation as needed. Since the software is trying to create a transcription faithful to what the speaker said, I often find that the clean-up effort takes more time and care than if I’d simply listened to the audio and transcribed it myself, editing as I went along. So AI is not always a time-saver.
There are certainly legal questions. At what point is an AI-generated image an outright forgery? How will college professors know whether the student’s paper is original versus something created through ChatGPT? I heard yesterday that actual handwriting is being pushed in some schools again, precisely because of such concerns (along with the general need to have legible writing). Certainly challenging ethical times ahead.
I think that in the world of film we have a bit of breathing room when it comes to advances in AI bringing significant changes – and perhaps a bit of an early warning of what might be to come. Our AI tools are largely technical rather than creative, and the creative ones are less well developed than the image and text creation tools, so they don’t yet pose much of a challenge to our livelihoods, and the legal issues aren’t as complicated. Take AI noise reduction or upscaling: they are effectively fixing our mistakes, and there isn’t much need for the models to be trained on data they might not have legal access to (though I imagine behind the scenes this is an important topic for them, as getting access to high quality training data would improve their product).
I see friends who are writers or artists battling to deal with the sudden changes in the AI landscape. I know copywriters whose clients are asking whether they can’t just use ChatGPT now to save money, and others whose original writing has been falsely flagged as AI-generated by an AI analysis tool – and while I’m sure the irony is not lost on them, it doesn’t lessen their stress. So in terms of livelihoods and employment I think there are real ethical issues, though I have no idea how they can be solved, aside from trusting that creative people will always adapt. That takes time, though, and the suddenness of all this has been hard for many.
On the legal side, I feel like there is a massive amount of catching up to do and it will be fascinating to see how these current cases work out. It feels like we need a whole new set of legal precedents to deal with emerging AI tools, aside from just what training data the models can access. Looking at the example of deepfakes, I love what a talented comedian and voice impersonator like Charlie Hopkinson can do with it – I love watching Gandalf or Obi-Wan roasting their own shows – but every time I watch, I wonder what Sir Ian McKellen would think – though somehow I think he would take it quite well. Charlie does put a brief disclaimer on the videos, but that doesn’t feel enough to me. I would have thought the bare minimum would be a permanent disclaimer watermark, let alone a signed permission from the owner of that face! I think YouTube has put some work into this, focusing more on the political or the even less savoury uses, which of course are more important, but more needs to be done.
I think we in the worlds of production and post would be wise to keep an eye on all the changes happening so we can stay ahead and make them work to our advantage.
I have been experiencing a sense of excitement and wonderment over the most recent developments in AI.
It’s accelerating. And at the same time, I’m cynical – I’ve read/watched exciting research (sometimes from SIGGRAPH, sometimes from some smaller projects) that never seems to see the light of day.
About six years ago, I did some consulting work around machine learning, and I’ve felt like a child in a candy store ever since, discovering something new and fascinating around every corner.
Am I worried about AI from a professional standpoint?
Nope. Not until they can handle clients.
If the chatbots I encounter are any indicators? It’s going to be a while.
For post-production? It’s frustrating when the tools don’t work, because there’s no workaround that will fix things when they fail.
ChatGPT is an excellent example of this. It’s correct (passing the bar, passing the MCAT) – until it’s confidently incorrect. It has given me answers that just don’t exist or aren’t possible. How is someone to evaluate this?
If you use ChatGPT as your lawyer, and it’s wrong, where does the liability live?
That’s the key in many aspects – it needs guidance, a professional who knows what they’re doing.
In creating something from nothing, there are a couple of areas in the crosshairs:
- Text2image. That works sorta well. Video is a little harder.
- Music generation. I totally expect this to be a legal nightmare. When the AI generates something close to an existing set of chords, who (if anyone) gets a payment? If you use it in your video, who owns the rights to that synthetic music?
- Speech generation. We’ve been cloning voices decently for a while (see Descript’s Lyrebird and the newer ElevenLabs voice synthesis). ElevenLabs has at least priced it heavily – but suddenly, audiobook generation with different voices for different characters will make it more difficult to make a living as a voice artist.
- Deepfakes. It’s still a long way from easy face replacement.
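On the music-generation question above, the eventual legal tests will be far subtler, but even a naive sketch shows how one might start measuring overlap between a generated progression and an existing one. The function, run length, and example progressions here are all invented for illustration:

```python
# Hypothetical similarity check: count n-chord runs of a generated
# progression that also appear in an existing song's progression --
# the sort of overlap a rights dispute might try to quantify.
def shared_runs(a, b, n=4):
    """Count n-chord runs of `a` that also appear in `b`."""
    runs_b = {tuple(b[i:i + n]) for i in range(len(b) - n + 1)}
    return sum(tuple(a[i:i + n]) in runs_b
               for i in range(len(a) - n + 1))

existing  = ["C", "G", "Am", "F", "C", "G", "F", "C"]
generated = ["C", "G", "Am", "F", "Dm", "G", "C", "C"]
print(shared_runs(generated, existing))  # -> 1 (the opening C-G-Am-F run)
```

Of course, chord progressions alone are generally not protectable (C-G-Am-F underlies thousands of songs), which is exactly why the payment question is so murky: the overlap is easy to measure and hard to assign meaning to.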
These tools excite me most in the functional areas instead of the “from scratch” perspective.
Taking painful things and reducing the difficulty.
That’s what good tools should always do, especially when they leave the artist the ability to influence the guidance.
- OpenAI’s Whisper really beats the pants off other speech-to-text tools. I’m dying just to edit the text. Descript does this, which is close to what I want.
- Colourlab.ai‘s matching models – 100% what I’m talking about. Different matching models, a quick pick, and you’re on your way. (Disclaimer: I do some work for Colourlab.)
- Adobe’s Remix is a great example of this. It’s totally workable for nearly anyone and is like magic. It takes this painful act of splicing music to itself (shorter or longer) and makes it easy.
The brightest future.
You film an interview. You read the text, clean it up, and tell a great story.
Except there’s an issue – something is unclear in the statements made. You get clearance from the interviewee about the rephrasing of a statement. Then use an AI voice model of their voice to form the words. And another to re-animate the lips to look like the subject said it.
This is “almost here.”
The dark version of this?
It’s society-level scary (but so are auto-driving cars that can’t really recognize children – something one automaker is struggling with).
Here’s a scary version: You get a phone call, and you think it’s a parent or significant other. It’s not. It’s a cloned voice and something like ChatGPT trained on content that can actually respond in near-real time. I’ll leave the “creepy” factor up to you here.
Jeff Foster brings up this question – what happens when we can convincingly make people say what we want?
At some level, we’ve had that power for over a decade. Just the fact that we could take a word out of someone’s interview gives us that power. AI will just make that easier and more accessible – as well as making “I didn’t say that; it was AI” a defense.
It’s going to be ugly because our lawmakers, and our judicial system, can’t educate themselves quickly enough if the past is any indication.
Generative AI isn’t “one-click”
As Iain pointed out in the script he had ChatGPT write, it did the job, it found the format, but it wasn’t very good.
I wonder how it might help me get around writer’s block.
Generative text is pretty scary – and may disrupt Google.
Since Google is based on inbound/outbound links – it’s going to be very soon that the blog spam will explode even more, and it’ll be harder to tell what content is well written and what is not.
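The inbound/outbound-link ranking referred to here is essentially PageRank. A minimal sketch – toy graph and a standard damping factor, nothing Google-specific – shows why pages that attract links accumulate rank, and therefore why farms of interlinked spam pages can try to game it:

```python
# Minimal PageRank sketch. Each page's rank is repeatedly redistributed
# along its outbound links, damped by d; pages with many inbound links
# accumulate rank. Graph and damping value are illustrative only.
def pagerank(links, d=0.85, iters=50):
    pages = list(links)
    rank = {p: 1 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / len(pages) for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

# Three pages: "a" receives links from both "b" and "c".
links = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # "a" collects the most link weight
```

Because rank flows purely along links, machine-generated pages that link to each other (and to a target) can inflate scores – which is why a flood of AI-written blog spam puts pressure on this kind of ranking.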
Unless it comes from a specific person you trust.
And as Oliver pointed out, it’s problematic until I can train it with my data – it needs an artist.
The lack of being able to re-train will mean that failures will consistently fail. Then we’re in workaround hell.
Personally I believe that AI technologies are going to cause absolutely massive disruption not just to the production and post-production industries, but across the entire gamut of human activity in ways we can’t even imagine.
In the broadest sense, the course of evolution has been one of increasing complexity, often with exponential jumps (e.g., Big Bang, Cambrian explosion, Industrial Revolution). AI is a vehicle for another exponential leap. It is extraordinarily exciting and terrifying, fraught with danger, yet it will also create huge new opportunities.
How do we position ourselves to benefit from, or at least survive, this next revolution?
I’d suggest moving away from any task or process that AI is likely to take over in the short term. Our focus should be on what humans (currently) do better than AI. Billy Oppenheimer, in his article on The Coffee Cup Theory of AI, calls this Taste and Discernment. Your ability to connect to other humans through your storytelling, to tell the difference between the great and the good, to choose the line of dialog, the lighting, the composition, the character, the blocking, the take, the edit, the sound design…and use AI along the way to create all the scenarios from which you use your developed sense of taste to discern what will connect with an audience.
AI has already generated huge legal and ethical issues that I suspect will only grow larger. But the genie is out of the bottle – indeed he or she emerged at the Big Bang itself – so let’s work together to figure out how to work with this fast-emerging reality to continue to be storytellers that speak to the human condition.
(These words written by me with no AI assistance :-))