OpenAI has paused a voice mode option for ChatGPT-4o, Sky, after backlash accusing the AI company of intentionally ripping off Scarlett Johansson’s critically acclaimed voice-acting performance in the 2013 sci-fi film Her.
In a blog defending its casting decision for Sky, OpenAI went into great detail explaining its process for choosing the individual voice options for its chatbot. But ultimately, the company seemed pressed to admit that Sky’s voice was just too similar to Johansson’s to keep using it, at least for now.
“We believe that AI voices should not deliberately mimic a celebrity’s distinctive voice—Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice,” OpenAI’s blog said.
OpenAI is not naming the actress, or any of the ChatGPT-4o voice actors, to protect their privacy.
After this story was published, Johansson released a statement saying she was approached by OpenAI CEO Sam Altman to be the AI voice.
Last September, I received an offer from Sam Altman, who wanted to hire me to voice the current ChatGPT 4.0 system. He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and Al. He said he felt that my voice would be comforting to people.
After much consideration and for personal reasons, declined the offer.
Nine months later, my friends, family and the general public all noted how much the newest system named “Sky” sounded like me.
When I heard the released demo, I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference. Mr. Altman even insinuated that the similarity was intentional, tweeting a single word “her” – a reference to the film in which I voiced a chat system, Samantha, who forms an intimate relationship with a human.
Two days before the ChatGPT 4.0 demo was released, Mr. Altman contacted my agent, asking me to reconsider. Before we could connect, the system was out there.
As a result of their actions, I was forced to hire legal counsel, who wrote two letters to Mr. Altman and OpenAl, setting out what they had done and asking them to detail the exact process by which they created the “Sky” voice. Consequently, OpenAl reluctantly agreed to take down the “Sky” voice.
In a time when we are all grappling with deepfakes and the protection of our own likeness, our own work, our own identities, I believe these are questions that deserve absolute clarity. I look forward to resolution in the form of transparency and the passage of appropriate legislation to help ensure that individual rights are protected.
A week ago, Altman seemed to invite this controversy by posting “her” on X (formerly Twitter) after announcing the ChatGPT audio-video features that he said made it more “natural” for users to interact with the chatbot.
Altman has said that Her, a movie about a man who falls in love with his virtual assistant, is among his favorite movies. He told conference attendees at Dreamforce last year that the movie “was incredibly prophetic” when depicting “interaction models of how people use AI,” The San Francisco Standard reported. And just last week, Altman touted GPT-4o’s new voice mode by promising, “it feels like AI from the movies.”
But OpenAI’s chief technology officer, Mira Murati, has said that GPT-4o’s voice modes were less inspired by Her than by studying the “really natural, rich, and interactive” aspects of human conversation, The Wall Street Journal reported.
In 2013, of course, critics praised Johansson’s Her performance as expressively capturing a wide range of emotions, which is exactly what Murati described as OpenAI’s goals for its chatbot voices. Rolling Stone noted how effectively Johansson naturally navigated between “tones sweet, sexy, caring, manipulative, and scary.” Johansson achieved this, the Hollywood Reporter said, by using a “vivacious female voice that breaks attractively but also has an inviting deeper register.”
Her director/screenwriter Spike Jonze was so intent on finding the right voice for his film’s virtual assistant that he replaced British actor Samantha Morton late in the film’s production. According to Vulture, Jonze realized that Morton’s “maternal, loving, vaguely British, and almost ghostly” voice didn’t fit his film as well as Johansson’s “younger,” “more impassioned” voice, which he said brought “more yearning.”
Late-night shows had fun mocking OpenAI’s demo featuring the Sky voice. The demo showed the chatbot seemingly flirting with engineers, giggling through responses like “Oh, stop it. You’re making me blush.” Where The New York Times described these demo interactions as Sky being “deferential and wholly focused on the user,” The Daily Show‘s Desi Lydic joked that Sky was “clearly programmed to feed dudes’ egos.”
This is the best (and funniest!) take I’ve seen on the whole GPT-4o situation so far 😂 https://t.co/4CAJ9e1Vxh
— Sasha Luccioni, PhD 🦋🌎✨🤗 (@SashaMTL) May 20, 2024
OpenAI is likely hoping to avoid any further controversy amid plans to roll out more voices soon that its blog said will “better match the diverse interests and preferences of users.”
OpenAI did not immediately respond to Ars’ request for comment.
Voice actors versus AI
The OpenAI controversy arrives at a moment when many are questioning AI’s impact on creative communities, triggering early lawsuits from artists and book authors. Just this month, Sony opted all of its artists out of AI training to stop voice clones from ripping off top talents like Adele and Beyoncé.
Voice actors, too, have been monitoring increasingly sophisticated AI voice generators, waiting to see what threat AI might pose to future work opportunities. Recently, two actors sued an AI startup called Lovo that they claimed “illegally used recordings of their voices to create technology that can compete with their voice work,” The New York Times reported. According to that lawsuit, Lovo allegedly used the actors’ actual voice clips to clone their voices.
“We don’t know how many other people have been affected,” the actors’ lawyer, Steve Cohen, told The Times.
Rather than replacing voice actors, OpenAI’s blog said that they are striving to support the voice industry when creating chatbots that will laugh at your jokes or mimic your mood. On top of paying voice actors “compensation above top-of-market rates,” OpenAI said they “worked with industry-leading casting and directing professionals to narrow down over 400 submissions” to the five voice options in the initial rollout of audio-video features.
Their goals in hiring voice actors were to hire talents “from diverse backgrounds or who could speak multiple languages,” casting actors who had voices that feel “timeless” and “inspire trust.” To OpenAI, that meant finding actors who have a “warm, engaging, confidence-inspiring, charismatic voice with rich tone” that sounds “natural and easy to listen to.”
For ChatGPT-4o’s first five voice actors, the gig lasted about five months before leading to more work, OpenAI said.
“We are continuing to collaborate with the actors, who have contributed additional work for audio research and new voice capabilities in GPT-4o,” OpenAI said.
Arguably, these actors are helping to train AI tools that could one day replace them, though. Backlash defending Johansson—one of the world’s highest-paid actors—perhaps shows that fans won’t take direct mimicry of any of Hollywood’s biggest stars lightly, though.
While criticism of the Sky voice seemed widespread, some fans think that OpenAI has overreacted by pausing the Sky voice.
NYT critic Alissa Wilkinson wrote that it was only “a tad jarring” to hear Sky’s voice because “she sounded a whole lot” like Johansson. And replying to OpenAI’s X post announcing its decision to pull the voice feature for now, a clump of fans protested the AI company’s “bad decision,” with some complaining that Sky was the “best” and “hottest” voice.
At least one fan noted that OpenAI’s decision seemed to hurt the voice actor behind Sky most.
“Super unfair for the Sky voice actress,” a user called Ate-a-Pi wrote. “Just because she sounds like ScarJo, now she can never make money again. Insane.”
This story and headline were updated to include Scarlett Johansson’s statement.