This AI cloned my voice using just three minutes of audio

Tech For Change
This story is part of Tech for Change: an ongoing series in which we shine a spotlight on positive uses of technology, and showcase how they're helping to make the world a better place.

There’s a scene in Mission Impossible 3 that you might recall. In it, our hero Ethan Hunt (Tom Cruise) tackles the movie’s villain, holds him at gunpoint, and forces him to read a bizarre series of sentences aloud.

“The pleasure of Busby’s company is what I most enjoy,” he reluctantly reads. “He put a tack on Miss Yancy’s chair, and she called him a horrible boy. At the end of the month, he was flinging two kittens across the width of the room ...”

The words sound random and unimportant, but it quickly becomes clear that they aren’t random at all: they’re deliberately designed to help a software program clone his voice. Once he finishes the passage, the software parses the audio and instantly gives Hunt the ability to speak and sound exactly like the bad guy, the final piece of his near-perfect disguise.

Now if you take that scene and subtract all the espionage, guns, and dramatic tension, you’re left with a pretty solid example of what I experienced at CES today during a demo of My Own Voice, an AI-powered “voice banking” service from a French startup called Acapela Group.

The company’s raison d’être is to help people who will eventually lose the ability to speak. This typically happens as a result of injury or of illnesses like ALS, Huntington’s disease, and laryngeal cancer. Whatever the cause may be, the company’s My Own Voice platform allows a person to synthetically clone their voice and preserve the unique tone, timbre, and personality that make it theirs, something that’s typically lost with most text-to-speech software (think Stephen Hawking).

Now to be fair, voice cloning tech isn’t necessarily new or technologically groundbreaking at this point. Such services have existed for years, and thanks in part to the advent of deepfakes, there are currently dozens of other companies that can do the same thing that Acapela Group does. But there are two big things that set My Own Voice apart from the rest of the pack: speed and purpose.

My Own Voice is impressively quick. Unlike other services, which often require hours of reference audio to create a realistic-sounding clone, My Own Voice’s AI can spin up an astonishingly good synthetic voice after hearing just 50 short sentences, or roughly three minutes of recorded audio. It’s basically just like that Mission Impossible scene: the company has developed a streamlined set of reference sentences that make it easier for its AI to learn how you sound, so instead of manually recording every conceivable word, all you have to do is talk through a handful of easy phrases.

Arguably more important than the software’s speed, though, is its purpose. Again, this tech isn’t particularly novel. A handful of noteworthy startups have spun up similar voice-cloning tech, such as the Canadian startup Lyrebird and the London-based firm Sonantic. But both of those startups were quickly acquired, and their voice-cloning tech ended up being used for AI overdubbing in movies and video-editing software.

That’s not to say that those aren’t good uses of voice cloning tech. They absolutely are, and they’re probably quite profitable ones to boot — but that’s precisely what makes My Own Voice so cool. It’s not often that you encounter such a powerful technology that, rather than being built for entertainment or productivity, was developed specifically to help disadvantaged people and quite literally give them a voice.

Drew Prindle
Former Digital Trends Contributor
Drew Prindle is an award-winning writer, editor, and storyteller who currently serves as Senior Features Editor for Digital…