
Google is learning to differentiate between your voice and your friend’s

Looking to Listen: Stand-up

We may be able to pick out our best friend’s or our mother’s voice from a crowd, but can the same be said for our smart speakers? For the time being, the answer may be “no.” Smart assistants aren’t always right about who’s speaking, but Google is looking to change that with a pretty elegant solution.

In a paper titled “Looking to Listen at the Cocktail Party,” Google researchers explain how a new deep learning system can pick out individual voices simply by looking at people’s faces as they speak.

“People are remarkably good at focusing their attention on a particular person in a noisy environment, mentally ‘muting’ all other voices and sounds,” Inbar Mosseri and Oran Lang, software engineers at Google Research, noted in a blog post. And while this ability is innate to human beings, “automatic speech separation — separating an audio signal into its individual speech sources — while a well-studied problem, remains a significant challenge for computers.”

Mosseri and Lang, however, have created a deep learning audio-visual model capable of isolating speech signals from a variety of other auditory inputs, like additional voices and background noise. “We believe this capability can have a wide range of applications, from speech enhancement and recognition in videos, through video conferencing, to improved hearing aids, especially in situations where there are multiple people speaking,” the duo said.

So how did they do it? The first step was training the system to identify individual voices (paired with their faces) speaking uninterrupted in an aurally clean environment. The researchers presented the system with about 2,000 hours of video, all of which featured a single person in the camera frame with no background interference. Once this was complete, they began adding virtual noise, such as other voices, to teach the system to differentiate among audio tracks and identify which track is which.
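For readers curious what "adding virtual noise" might look like in practice, here is a minimal sketch of the general idea: clean single-speaker tracks are summed into one synthetic mixture, and the clean originals are kept as the answers a system would learn to recover. This is an illustration only, not Google's code. The function name `mix_clean_tracks`, the `noise_gain` parameter, and the sine-wave stand-ins for speech are all assumptions made for the example.

```python
import numpy as np


def mix_clean_tracks(speech_a, speech_b, noise=None, noise_gain=0.3):
    """Sum two clean tracks (plus optional noise) into one synthetic mixture.

    Hypothetical helper for illustration; not taken from the paper.
    Returns the mixture and the stacked clean tracks it was built from.
    """
    # Trim everything to the shortest input so the arrays line up.
    n = min(len(speech_a), len(speech_b))
    if noise is not None:
        n = min(n, len(noise))

    # The clean tracks are what a separation system would try to recover.
    targets = np.stack([speech_a[:n], speech_b[:n]])

    # The mixture is the single audio track the system would actually hear.
    mixture = targets.sum(axis=0)
    if noise is not None:
        mixture = mixture + noise_gain * noise[:n]
    return mixture, targets


# Stand-in "voices": two sine tones at different pitches, one second each.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
voice_a = np.sin(2 * np.pi * 220 * t)
voice_b = np.sin(2 * np.pi * 330 * t)
background = np.random.default_rng(0).normal(size=sample_rate)

mixture, targets = mix_clean_tracks(voice_a, voice_b, noise=background)
print(mixture.shape, targets.shape)
```

Because the mixture is built from known clean tracks, the correct separation is available for every training example, which is what makes this kind of synthetic data convenient for teaching a model.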

Ultimately, the researchers were able to train the system to “split the synthetic cocktail mixture into separate audio streams for each speaker in the video.” As you can see in the video, the A.I. can identify the voices of two comedians even as they speak over one another, simply by looking at their faces.

“Our method works on ordinary videos with a single audio track, and all that is required from the user is to select the face of the person in the video they want to hear, or to have such a person be selected algorithmically based on context,” Mosseri and Lang wrote.

We’ll just have to see how this new methodology is ultimately implemented in Google products.

Lulu Chang
Former Digital Trends Contributor