OpenAI has unveiled an audio feature that reads text and clones human voices.
A spokesperson said that OpenAI is sharing initial demos and use cases from a small-scale preview of a text-to-speech model called Voice Engine, which it has shared with about 10 developers so far.

OpenAI's release of the first test results of a programme that can read words aloud in a human voice highlights a new use for artificial intelligence, and raises the risk of deepfakes.

OpenAI has decided against a wide rollout of the feature, which it had briefed reporters on at the beginning of this month.

Before opting to scale back the release, OpenAI took into account input from stakeholders such as academics, artists, CEOs, and politicians, according to a company official. At an earlier press conference, the company had said the tool would be available to more than 100 developers through an application process.

In a blog post published on Friday, the company stated: “We believe there are serious risks in creating speech that closely mimics people’s voices, which are especially top of mind during an election year. We are connecting with American and international partners across government, media, entertainment, education, civil society, and beyond to ensure that their feedback is also included as we build.”

Other AI technologies have already been used to imitate voices in some contexts. In January, a fake but realistic-sounding phone call purporting to be from President Joe Biden encouraged people in New Hampshire not to vote in the primaries – an incident that heightened fears about AI ahead of significant global elections.

Unlike OpenAI’s earlier audio offerings, Voice Engine can generate speech that sounds like specific people, complete with their own tone and cadence. To replicate a speaker’s voice, the software needs only a 15-second recording of their speech.

During the demonstration of the tool, Bloomberg listened to a clip of Sam Altman, CEO of OpenAI, in which the technology was briefly explained in a voice that seemed indistinguishable from his actual speech, but was entirely AI-generated.

According to OpenAI’s head of product Jeff Harris, “if you have the right audio setup, it essentially has the capability of a human-like voice.” “It’s quite an impressive technical quality.” Nonetheless, Harris pointed out that “there are clearly a lot of security nuances around the ability to accurately mimic human speech.”

One of OpenAI’s current developer partners, the Norman Prince Neuroscience Institute at the non-profit health system Lifespan, is using the technology to help patients recover their voices. According to a company blog post, the tool restored the voice of a young patient who had lost the ability to speak clearly due to a brain tumor, by replicating their speech from a recording made for a school project.

OpenAI’s voice model can also translate audio into multiple languages, a capability of interest to Spotify Technology SA and other audio companies. In an experiment, Spotify used the technology to translate podcasts from well-known hosts such as Lex Fridman. OpenAI has also pointed to other beneficial uses of the technology, such as giving children’s educational materials a variety of voices.

OpenAI’s testing programme requires its partners to accept its usage policies, obtain the original speaker’s permission before using their voice, and disclose to listeners that the audio they hear is generated by artificial intelligence (AI). The company is also using an inaudible audio stamp to identify whether a section of audio was created with its tool.
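OpenAI has not disclosed how its inaudible audio stamp works. One classic technique such a system could use is spread-spectrum watermarking: a low-amplitude pseudo-random signal keyed by a secret seed is mixed into the audio, and the same seed is later used to detect the mark by correlation. The sketch below is purely illustrative – the seed, strength, and threshold are invented for the demo, and the amplitude is exaggerated so detection is reliable on a short clip:

```python
import numpy as np

# Illustrative spread-spectrum watermark (not OpenAI's actual method).
SEED = 12345      # secret key shared by embedder and detector (assumed)
STRENGTH = 0.01   # mark amplitude; exaggerated here for a robust demo

def embed_watermark(audio: np.ndarray, seed: int = SEED) -> np.ndarray:
    """Add a low-amplitude pseudo-random ±1 sequence to the audio."""
    rng = np.random.default_rng(seed)
    mark = rng.choice([-1.0, 1.0], size=audio.shape)
    return audio + STRENGTH * mark

def detect_watermark(audio: np.ndarray, seed: int = SEED) -> bool:
    """Correlate with the keyed sequence; the score is close to
    STRENGTH when the mark is present and near zero otherwise."""
    rng = np.random.default_rng(seed)
    mark = rng.choice([-1.0, 1.0], size=audio.shape)
    score = float(np.dot(audio, mark)) / audio.size
    return score > STRENGTH / 2

# Usage: mark a stand-in for one second of 16 kHz audio, then test both.
rng = np.random.default_rng(0)
clean = 0.1 * rng.standard_normal(16_000)
marked = embed_watermark(clean)
print(detect_watermark(marked), detect_watermark(clean))  # True False
```

Real systems are far more elaborate – they must survive compression, resampling, and clipping – but the principle of a keyed, statistically detectable signal is the same.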

OpenAI said that before deciding whether to make the tool more widely available, it is consulting with outside experts. “It’s important that people around the world understand where this technology is headed, whether we ultimately deploy it widely ourselves or not,” the company stated in a blog post.

According to OpenAI, the software preview “motivates the need to bolster societal resilience” against the challenges posed by increasingly sophisticated AI technology. For instance, the company urged banks to gradually phase out voice-based authentication as a security measure for accessing accounts and other sensitive data. It also aims to educate the public about deceptive AI content and to develop better techniques for determining whether audio is real or AI-generated.

