Google’s AI Can Zero in on One Voice in Crowded, Noisy Room


Google AI can pick out a single speaker in a crowd: Expect to see it in tons of products

Google researchers have developed a deep-learning audio-visual model that can isolate one speaker’s voice in a cacophony of noise.

The ‘cocktail party effect’ — the ability to mute all voices in a crowd and focus on a single person’s voice — comes easily to humans but not machines.

It’s an obstacle to an application of the Google Glass smart glasses that I personally would like to see developed one day: a real-time speech-recognition and live-transcription system to support hearing-aid wearers.

Apparently, voice separation is a hard nut to crack, but Google’s AI researchers may have part of the answer to my Glass dream in the form of a deep-learning audio-visual model that can isolate speech from a mixture of sounds.

The scenario they present is two speakers standing side by side, jabbering simultaneously. The technique hasn’t been proven in a real-world crowd, but it does work on a video with two speakers on a single audio track.

Video: Google’s research combines the auditory and visual signals to separate speakers. Source: Google/YouTube

About Paul Gordon 3009 Articles
Paul Gordon is the publisher and editor of iState.TV. He has published and edited newspapers, poetry magazines and online weekly magazines. He is the director of Social Cognito, an SEO/Web Marketing Company. You can reach Paul at