Accused by Scarlett Johansson of deliberately imitating her voice, OpenAI will stop using the voice

17:20, May 22, 2024, 21st Century Business Herald

21st Century Business Herald reporter Feng Runge reports from Guangzhou

The science fiction film "Her," released 11 years ago, told the story of a protagonist who converses daily with Samantha, an AI assistant able to feel, think, learn and make decisions independently, and who eventually falls in love with her. GPT-4o, recently released by OpenAI, seems to confirm that this fantasy could become reality: on the other end of the screen, AI really can perceive human emotions and talk with people.

This generation of products may well have been inspired by "Her." After the launch event, Sam Altman, who did not appear at the presentation itself, posted a one-word tweet: "her."

However, this meeting of fiction and reality has not been peaceful. Recently, Scarlett Johansson, who voiced Samantha in "Her," said that one of the voices in OpenAI's new product was too similar to her own, which left her "shocked, angered and in disbelief."

Each side sticks to its story

OpenAI's GPT-4o is more powerful than the previous generation. Beyond text, it can also reason over audio and video in real time. It can not only add emotion to its voice on request, but also infer a person's emotional state from a selfie video of the user's face. For audio interaction, ChatGPT offers five voices: Breeze, Cove, Ember, Juniper and Sky.

The voice Scarlett called out and asked to have withdrawn is "Sky." She believes Sky's voice closely resembles that of Samantha, the AI assistant in "Her."

According to Scarlett's statement, when ChatGPT launched its voice mode last September, OpenAI approached her to voice it, but she declined at the time for personal reasons. Two days before this May's launch event, OpenAI again tried to contact Scarlett about a collaboration, without success.

After the product launched, however, many people felt that Sky sounded very much like Scarlett's voice in the film. Scarlett therefore believes OpenAI may have deliberately hired someone to imitate her voice.

In her statement, Scarlett said that, citing the alleged infringement, she has asked OpenAI through legal counsel to explain in detail how the "Sky" voice was created.

OpenAI insists the voice came from another professional voice actor. "Sky's voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice," OpenAI said in its response blog post. The company added, however, that to protect her privacy it would not disclose the names of its voice actors.

In the same post, OpenAI described in detail how ChatGPT's five voices were created: the company received submissions from more than 400 voice actors, reviewed all participants' work for emotion and vocal characteristics, and made its final selections.

Between June and July last year, the selected voice actors completed their recordings in Los Angeles; in September, the five voices were released in ChatGPT.

In light of Scarlett's accusation, however, OpenAI has suspended use of the "Sky" voice. "We believe that AI voices should not deliberately mimic a celebrity's distinctive voice," OpenAI said in the same clarification post.

So far, the two sides have each held to their positions, with no further response or litigation.

"On the whole, you can't take the" hitchhiker "of celebrities arbitrarily." In the opinion of You Yunting, senior partner of Shanghai Dabang Law Firm, if you use the voice of people who are similar to celebrities for training, and also use the movie marketing played by celebrities, it is "borderline".

Not an isolated case

OpenAI has also recently seen another round of personnel changes. In addition to chief scientist Ilya Sutskever, Jan Leike, head of the company's superalignment team, announced his resignation. One of the superalignment team's main functions at OpenAI is to guard against AI risks by evaluating and supervising the AI the company develops, so the departures have heightened industry concern about the risks accompanying AI technology.

In fact, Scarlett's wariness and her accusation of AI "voice plagiarism" are not the groundless worry of a few.

Recently, American voice actors Paul Lehrman and Linnea Sage accused AI startup Lovo of using and selling their voices in its AI dubbing technology without their permission, seeking $5 million in damages. Beyond this lawsuit, they are also seeking to join with others whose voices were taken to bring a class action.

Also in May, Sony sent a "warning letter" to more than 700 technology companies and music platforms worldwide, asking them not to use its works to train AI without explicit permission. According to media reports, Google, Microsoft, OpenAI and other AI giants were among those warned. Sony later issued a separate statement emphasizing that unauthorized use of the group's musical works (including songs, lyrics, album covers and more) for anything AI-related is prohibited.

Earlier, in October 2023, Universal Music and two other record labels jointly sued Anthropic for using musical works to train its model without authorization, claiming $75 million in damages.

In China, judicial practice already offers precedent on the compliant use of personal voices in AI services.

In April, the Beijing Internet Court announced its judgment in the country's first "AI voice infringement case." The defendant had sold an AI-processed version of the plaintiff voice actor's voice. The court found infringement, emphasizing that authorization of a sound recording does not imply authorization to AI-process the voice in it: using, or licensing others to use, the voice in a recording without the rights holder's permission constitutes infringement.

In You Yunting's view, the key to keeping individual rights from being infringed in the AI industry lies with service providers and developers: first, using others' voices or likenesses for training requires authorization; second, generated content should be labeled as AI-generated. "In addition, under China's current laws and regulations, AI service providers are obligated to review the content generated on their platforms," he noted.
