Worries about digital actors replacing the real thing remain on the horizon, but closer to home, synthetic voices are already “playing” video-game characters and narrating corporate videos. Could they put human voice talent out of a job?
As Karen Hao puts it in an article for MIT Technology Review, “AI voices are also cheap, scalable, and easy to work with.”
We’re all used to having Alexa, Siri, or the digital navigator in our cars talk to us. Often the dialogue is a little clunky, and getting these voices to sound more natural has been a laborious manual task.
Advances in deep learning have changed that. “Voice developers no longer needed to dictate the exact pacing, pronunciation, or intonation of the generated speech,” says Hao. “Instead, they could feed a few hours of audio into an algorithm and have the algorithm learn those patterns on its own.”
READ MORE: AI voice actors sound more human than ever—and they’re ready to hire (MIT Technology Review)
A number of startups are leveraging this to create artificial voice actors for hire.
Seattle’s WellSaid Labs claims to create voiceover “with AI voices indistinguishable from real ones.” It invites you to audition different voices based on style, gender, and the type of production you’re working on.
Capturing these nuances involves finding the right voice actors to supply the appropriate training data and fine-tune the deep-learning models. WellSaid tells Technology Review that the process requires at least an hour or two of audio and a few weeks of labor to develop a realistic-sounding synthetic replica.
It also points out that every voice on its platform is built with the “written consent of the talent who lent us their voice to create an AI likeness.” The company will never clone someone’s voice without their approval, WellSaid adds.
Sonantic.io makes voices for video-game characters. “Reduce production timelines from months to minutes by rapidly transforming scripts into audio,” it claims. Users can create “highly expressive, nuanced performances” with “full control over voice performance parameters.”
It is also at pains to point out the ethical use of its technology. In accordance with the Ethics Guidelines for Trustworthy AI, “we make sure our algorithms are never trained on publicly available data without the voice owner’s permission.”
Unlike a recording of a human voice actor, AI voices can also update their script in real time, opening up new opportunities to personalize advertising.
VOCALiD builds custom voices that match a company’s brand identity. “Brands have thought about their colors,” says VOCALiD founder and CEO Rupal Patel. “They’ve thought about their fonts. Now they’ve got to start thinking about the way their voice sounds as well.”
Sonantic says many of its clients use the synthesized voices only in pre-production and switch to real voice actors for the final production. But it also says a few have started using them throughout the process, perhaps for characters with fewer lines. Resemble.ai says it has worked with film and TV producers to patch up actors’ performances when words get garbled or mispronounced.
“Our characters are all about emotional performance,” says Guy Gadney, CEO of Charisma AI, an interactive storytelling platform. “Siri, Alexa and other voices are monotonous, but Charisma characters come to life, get happy, sad, angry. Resemble’s capabilities in this regard are awesome and their markup language gave us the flexibility we needed to achieve our goals.”
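Resemble’s markup language is proprietary, but the general idea it builds on is familiar from SSML, the W3C’s Speech Synthesis Markup Language, which many text-to-speech engines support. A minimal, illustrative sketch (the line content is invented for this example and is not from any of the products mentioned) shows how markup can steer pacing, pitch, and emphasis rather than leaving the engine’s defaults to decide:

```xml
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis">
  <p>
    <!-- Slow the delivery and drop the pitch two semitones for a somber read -->
    <prosody rate="slow" pitch="-2st">Something is wrong,</prosody>
    <!-- Insert a deliberate dramatic pause -->
    <break time="400ms"/>
    <!-- Stress the payoff of the line -->
    <emphasis level="strong">and you know it.</emphasis>
  </p>
</speak>
```

Tags like `prosody`, `break`, and `emphasis` are the kind of “voice performance parameters” the vendors describe; platform-specific markup typically layers finer emotional controls on top of this base.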
Then news broke that the producers of Roadrunner: A Film About Anthony Bourdain had used AI to simulate the television host’s voice for three lines of synthetic audio — lines he wrote in text but never spoke aloud. In an interview with The New Yorker, the film’s director, Morgan Neville, revealed that he fed roughly 12 hours of Bourdain’s voiceovers into an AI model to narrate emails Bourdain wrote, totaling about 45 seconds. Reaction to the news was startlingly, if perhaps predictably, angry. Some dismissed the film outright because it had misled the audience in this way.
Hao reports that the actors’ union SAG-AFTRA says some of its members have grown increasingly worried about their livelihoods. They fear being compensated unfairly or losing control over their voices, which constitute their brand and reputation.
This is now the subject of a lawsuit against TikTok brought by the Canadian voice actor Bev Standing, who alleges that the app’s built-in voice-over feature uses a synthetic copy of her voice without her permission.
Some companies are looking to be more accountable in how they engage with the voice-acting industry. The best ones, says SAG-AFTRA’s rep, have approached the union to figure out the best way to compensate and respect voice actors for their work.