
Headphones Use AI to Isolate Specific Speaker’s Voice

Revolutionary Technology Unveiled

Engineers at the University of Washington have developed an artificial intelligence system that lets headphone wearers selectively listen to a specific person in a noisy environment. With the “Target Speech Hearing” (TSH) system, the user enrolls a speaker simply by looking at them for three to five seconds. The system then isolates that speaker’s voice and cancels out surrounding noise and other conversations, giving the wearer a personalized listening experience.

A Leap Beyond Noise Cancellation

Traditional noise-canceling headphones, such as the latest Apple AirPods Pro, can adjust sound levels based on environmental noise, but they give users little control over which sounds to prioritize. TSH goes further by letting users selectively hear a chosen speaker in real time, even as the wearer moves around and changes orientation.

How It Works

The TSH system works with off-the-shelf headphones fitted with built-in microphones. To use it, the wearer looks at the target speaker and taps a button, which triggers a brief enrollment process during which the AI captures the speaker’s vocal characteristics by analyzing the distinctive frequency and amplitude patterns of their voice. Because the wearer is facing the speaker, the sound waves from that voice reach the microphones on both sides of the headset at roughly the same time, and the resulting signals are passed to an onboard embedded computer running machine learning software that learns the speaker’s vocal patterns. Once enrolled, the system continues to isolate that voice even as the wearer moves around and no longer faces the speaker.
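To make the flow concrete, here is a minimal, hypothetical sketch of an enrollment-then-extraction pipeline of the kind described above, written in PyTorch. The module names (SpeakerEncoder, TargetExtractor), layer choices, and tensor shapes are illustrative assumptions, not the team’s released code.

```python
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    """Maps a short binaural enrollment clip to a fixed-size speaker embedding."""
    def __init__(self, emb_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, 64, kernel_size=16, stride=8),   # 2 channels: left/right microphones
            nn.ReLU(),
            nn.Conv1d(64, emb_dim, kernel_size=16, stride=8),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                       # average over time
        )

    def forward(self, clip):                               # clip: (batch, 2, samples)
        return self.net(clip).squeeze(-1)                  # -> (batch, emb_dim)

class TargetExtractor(nn.Module):
    """Extracts the enrolled speaker from a noisy binaural mixture, conditioned on the embedding."""
    def __init__(self, emb_dim=256):
        super().__init__()
        self.encode = nn.Conv1d(2, 256, kernel_size=16, stride=8)
        self.condition = nn.Linear(emb_dim, 256)           # gate features with the speaker embedding
        self.decode = nn.ConvTranspose1d(256, 2, kernel_size=16, stride=8)

    def forward(self, mixture, speaker_emb):               # mixture: (batch, 2, samples)
        feats = torch.relu(self.encode(mixture))
        gate = torch.sigmoid(self.condition(speaker_emb)).unsqueeze(-1)
        return self.decode(feats * gate)                   # target-speaker-only audio

# Enrollment: a few seconds of audio captured while the wearer faces the speaker.
sample_rate = 16_000
enroll_clip = torch.randn(1, 2, 4 * sample_rate)           # stand-in for the real microphone capture
noisy_chunk = torch.randn(1, 2, sample_rate)               # stand-in for one chunk of live audio

encoder, extractor = SpeakerEncoder(), TargetExtractor()
speaker_emb = encoder(enroll_clip)                          # computed once per enrollment
clean_chunk = extractor(noisy_chunk, speaker_emb)           # run continuously on incoming audio
print(clean_chunk.shape)                                    # binaural, target-only audio
```

In a setup like this, the enrollment embedding is computed once from the brief look at the speaker, and the extractor then runs on each incoming audio buffer, which is why the isolated voice keeps playing even after the wearer turns away.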

Technical Presentation and Findings

The team presented its findings on May 14, 2024, at the ACM CHI Conference on Human Factors in Computing Systems in Honolulu. The code for the proof-of-concept device has been released for other researchers and developers to build upon, but the system is not yet commercially available.

According to Shyam Gollakota, a professor at the Paul G. Allen School of Computer Science & Engineering at the University of Washington, “This project uses AI to modify auditory perception based on user preferences. With our system, you can hear a single speaker clearly even in noisy environments with many other people talking.”

User Experience and Testing

The research team tested the system with 21 participants using off-the-shelf headphones fitted with microphones. On average, participants rated the clarity of the enrolled speaker’s voice nearly twice as high as that of the unfiltered audio. During enrollment, the headphones send the captured audio to an embedded computer, which processes the sound with machine learning algorithms. The system’s focus on the enrolled voice also sharpens as the speaker keeps talking, since continued speech gives the model more data to learn from.
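One way to picture that refinement step is a running average over speaker embeddings: each newly captured chunk of the target’s speech nudges the stored embedding, so later chunks are separated with a sharper model of the voice. The sketch below, including the momentum update and the stand-in encoder, is an illustrative assumption rather than the published implementation.

```python
import torch
import torch.nn as nn

# Tiny stand-in encoder so the snippet runs on its own; in practice this would be
# the same speaker encoder used during enrollment.
encoder = nn.Sequential(
    nn.Conv1d(2, 128, kernel_size=400, stride=200),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
)

def refine_embedding(current_emb, new_clip, momentum=0.9):
    """Blend the stored speaker embedding with one computed from newly captured target speech."""
    with torch.no_grad():
        fresh_emb = encoder(new_clip)
    return momentum * current_emb + (1.0 - momentum) * fresh_emb

sample_rate = 16_000
speaker_emb = encoder(torch.randn(1, 2, 4 * sample_rate))   # initial ~4 s enrollment clip
for _ in range(5):                                           # each additional second of the speaker's voice
    new_speech = torch.randn(1, 2, sample_rate)              # stand-in for freshly captured target audio
    speaker_emb = refine_embedding(speaker_emb, new_speech)
print(speaker_emb.shape)                                     # the refined embedding, e.g. torch.Size([1, 128])
```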

Previous Research and System Evolution

This work builds on the team’s previous research in “semantic hearing,” which allowed users to select specific sound classes, such as birds or voices, and filter out other environmental sounds. The TSH system represents a significant advancement by focusing on individual speakers rather than sound classes.

Current Limitations

Currently, the TSH system can enroll only one speaker at a time, and enrollment works only when no other loud voice is coming from the same direction as the target speaker. If the sound quality is unsatisfactory, the user can re-enroll the speaker to improve the clarity of their voice. The system also may not eliminate all background noise, particularly in extremely loud environments.
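As a minimal sketch of that re-enrollment option, the logic amounts to repeating the three-to-five-second capture until the wearer is happy with the result. The function names and control flow here are hypothetical placeholders, not part of the released code.

```python
# Hypothetical re-enrollment loop; `encoder`, `capture_enrollment_clip`, and
# `user_is_satisfied` are placeholders for the real capture hardware and headset controls.
def enroll_until_satisfied(encoder, capture_enrollment_clip, user_is_satisfied, max_tries=3):
    """Repeat the 3-5 second enrollment until the wearer confirms the voice sounds clear."""
    speaker_emb = None
    for _ in range(max_tries):
        clip = capture_enrollment_clip()       # ~3-5 s of audio while facing the speaker
        speaker_emb = encoder(clip)            # recompute the speaker embedding from the new clip
        if user_is_satisfied():                # e.g. the wearer taps a button to confirm clarity
            break
    return speaker_emb                         # the most recent attempt is used going forward
```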

Future Developments

The team is working to extend the TSH system to earbuds and hearing aids, aiming to make the technology more widely accessible. That could reshape personal audio in settings such as guided tours, crowded public places, and classrooms where clear communication is essential.

Practical Applications

The AI-driven TSH system could be highly beneficial in real-world scenarios. For example, during guided tours, users could focus solely on the tour guide’s narration, even amidst a noisy environment. In busy public places like cafes or streets, the system would allow users to have clear conversations without being overwhelmed by surrounding noise. In educational environments, the system could be used to enhance classroom discussions, ensuring that all students can hear the teacher’s voice clearly, regardless of their seating position.

Acknowledgments and Further Research

The University of Washington team has open-sourced the proof-of-concept code, inviting further development and collaboration from researchers and developers. The research was supported in part by a Moore Inventor Fellow award, the Thomas J. Cable Endowed Professorship, and the UW CoMotion Innovation Gap Fund.

The team sees this technology as a significant step toward intelligent hearables that augment human auditory perception with artificial intelligence. By letting users modify their acoustic surroundings in real time based on characteristics such as a particular speaker’s voice, TSH offers a customizable listening experience and opens new possibilities for human-computer interaction, with devices that adapt to the wearer’s auditory needs. Future work may further refine and expand the system’s capabilities.

Paper: Bandhav Veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, and Shyamnath Gollakota. “Look Once to Hear: Target Speech Hearing with Noisy Examples.” Proceedings of the ACM CHI Conference on Human Factors in Computing Systems, 2024. DOI: 10.1145/3613904.3642057

