When we talk, the skin on our faces vibrates too slightly for the naked eye to notice. While experimenting with an instrument known as an interferometer, VocalZoom CEO Tal Bakish and his team noticed that it produced peculiar measurements. “When it measures the face, we found out that the vibrations were caused only by the speaker’s voice and were not affected at all by any background voice,” he told Digital Trends. “At this point we realized that we have a disruptive technology to extract the voice of [the] speaker in any noisy condition.”
By focusing on a speaker’s face from a few feet away, the VocalZoom sensor can measure changes in velocity and distance down to the micrometer. “These vibrations are then translated into an acoustic signal,” Bakish said. “This acoustic output is then fed into a typical speech enhancement or noise-reduction [program] to be fused with an acoustic microphone to create a practically noise-free signal that is fed to an automatic speech recognition [program].”
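VocalZoom has not published how it fuses the two signals, so the following is only a rough illustration of the general idea, not the company's method. It assumes the vibration sensor and the microphone are sampled at the same rate and aligned in time, and it uses the energy of the vibration signal as a simple gate: microphone frames in which the face is not vibrating are turned down. The function names, frame length, threshold, and attenuation floor are all hypothetical.

```python
def frame_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def gate_microphone(mic, vibration, frame_len=4, threshold=0.01, floor=0.1):
    """Attenuate microphone frames where the vibration sensor sees no speech.

    Assumes `mic` and `vibration` share a sample rate and are time-aligned.
    All parameter values are illustrative, not taken from VocalZoom.
    """
    energies = frame_energy(vibration, frame_len)
    out = list(mic)
    for k, energy in enumerate(energies):
        if energy < threshold:  # face is still, so treat the frame as noise
            start = k * frame_len
            for i in range(start, min(start + frame_len, len(out))):
                out[i] = mic[i] * floor
    return out


# Toy data: the first four samples are background noise only (no facial
# vibration); the last four contain the speaker's voice.
vibration = [0.0, 0.0, 0.0, 0.0, 0.5, -0.5, 0.5, -0.5]
mic = [0.2, -0.2, 0.2, -0.2, 0.7, -0.7, 0.7, -0.7]
cleaned = gate_microphone(mic, vibration)
```

In this toy case the noise-only frame is scaled down to a tenth of its amplitude while the frame containing speech passes through unchanged. A real system would presumably do something far more refined than hard gating, as Bakish's mention of speech enhancement and noise-reduction software suggests.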
According to VocalZoom, the result is a signal with limited background noise that is clearer than one recorded with conventional microphones and noise-reduction units. Matching facial vibrations with speech may also make voice verification software more secure. Bakish also hopes the measurement will improve speech recognition in areas that advances in artificial intelligence have not yet reached.
“Over the past decade, solutions have relied only on data collected for training and strong processing technologies, such as neural networks and deep learning,” he said. “Now it is clear that to reach the 100 percent performance required for user adoption, the microphone technology needs to improve.”
With partners including Motorola Solutions, Intel, and 3M, VocalZoom hopes to have its technology featured in consumer devices as early as next year. The startup also said it is in talks with major carmakers to include the feature for voice control in vehicles.