Using WaveNet technology to reunite speech-impaired users with their original voices

This post details a recent project we undertook with Google and ALS campaigner Tim Shaw, as part of Google’s Euphonia project. We demonstrate an early proof of concept of how text-to-speech technologies can synthesise a high-quality, natural-sounding voice using minimal recorded speech data.

As a teenager, Tim Shaw put everything he had into football practice: his dream was to join the NFL. After he played for Penn State in college, his ambitions were finally realised: the Carolina Panthers drafted him at age 23, and he went on to play for the Chicago Bears and Tennessee Titans, where he broke records as a linebacker. After six years in the NFL, on the cusp of greatness, his performance began to falter. He couldn’t tackle like he once had; his arms slid off the pull-up bar. At home, he dropped bags of groceries, and his legs began to buckle underneath him. In 2013, Tim was cut from the Titans, but he resolved to make it onto another team. Tim practiced harder than ever, yet his performance continued to decline. Five months later, he finally discovered the reason: he was diagnosed with amyotrophic lateral sclerosis (ALS, commonly known as Lou Gehrig’s disease). In ALS, the neurons that control a person’s voluntary muscles die, eventually leading to a total loss of control over one’s body. ALS has no known cause, and, as of today, has no cure.

Today, Tim is a powerful advocate for ALS research. Earlier this year, he published a letter to his younger self advising acceptance: “otherwise, you’ll grieve yourself to death.” Now a wheelchair user, he lives under the constant care of his parents. People with ALS have trouble moving, and the disease makes speaking, swallowing, and even breathing on their own difficult and then impossible. Not being able to communicate can be one of the hardest aspects for people with ALS and their families. As Tim put it: “it’s beyond frustrating not to be able to express what’s going on in my mind. I’m smarter than ever but I just can’t get it out.”

Losing one’s voice can be socially devastating. Today, the main option available to people to preserve their voice is message banking, wherein people with ALS can digitally record and store personally meaningful phrases using their natural inflection and intonation. Message banking is a source of great comfort for people with ALS and their families, helping to preserve a core part of their identity – their voice – through a deeply challenging time. But message banking lacks flexibility, resulting in a static dataset of phrases. Imagine being told you will never be able to speak again. Now imagine that you were given the chance to preserve your voice by recording as much of it as possible. How would you decide what to record? How would you capture what you most want to be able to say in the future? Would it be a meaningful story, a favorite phrase or a simple “I love you”? The process can be time-consuming and emotionally draining, especially as someone’s voice degrades. And people who aren’t able to record phrases in time are left to choose a generic computer-synthesized voice that lacks the same power of connection as their own.

Source: https://deepmind.com/blog/article/Using-WaveNet-technology-to-reunite-speech-impaired-users-with-their-original-voices