Amazon released an open-source natural-language dataset covering 51 languages on Wednesday, encouraging developers to build more third-party apps and services for its voice assistant Alexa.
Speech recognition and natural language understanding (NLU) algorithms have steadily improved, paving the way for voice-activated digital assistants like Siri, Alexa or Google Assistant. Unfortunately, the technology is still limited to a few select languages.
Alexa, for instance, currently supports English, German, Portuguese, French, Hindi, Italian, Japanese, Spanish, and Arabic. With this 51-language punt, Amazon is hoping to kickstart a global NLU translation system, which could also be very profitable for the company.
“Imagine that all people around the world could use voice AI systems like Alexa in their native tongues,” it wrote in a blog post.
As part of Amazon’s efforts to expand to more languages, its researchers have published a dataset – Multilingual Amazon SLURP for Slot Filling, Intent Classification, and Virtual-assistant Evaluation, or MASSIVE for short – containing one million labeled utterances across 51 languages, as well as open-source code to help developers train multilingual AI models.
Amazon is also hosting a competition dubbed Massively Multilingual NLU 2022 (MMNLU-22), challenging researchers to build the best multilingual NLU systems using the dataset. The results from the competition will be presented in a workshop at the Conference on Empirical Methods in Natural Language Processing (EMNLP), an academic conference taking place in December.
MASSIVE was compiled by having professional translators translate an English-only dataset into numerous languages spoken across Africa, Europe, Latin America, and Asia. The dataset is unsurprisingly tailored for communication with devices – it’s mostly made up of questions or common commands like asking to play a song by a specific artist or inquiring about the weather.
The system works by converting speech into text first. The text is then passed on to a series of NLU models, which analyze keywords to figure out what a user is asking the device to do. “For instance, given the utterance ‘what is the temperature in new york’, an NLU model might classify the intent as ‘weather_query’ and fill the slots as ‘weather_descriptor: temperature and place_name: new york’,” according to a paper describing the MASSIVE dataset in more detail.
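To make the two steps concrete, here is a toy Python sketch of intent classification and slot filling applied to the paper's example utterance. It is not Amazon's approach: production NLU systems use trained models, whereas this uses hand-written keyword rules. The rule tables and the function names (classify_intent, fill_slots, parse) are hypothetical, invented purely for illustration.

```python
import re

# Hypothetical keyword rules, invented for this illustration only.
# Real NLU systems learn these mappings from labeled data such as MASSIVE.
INTENT_KEYWORDS = {
    "weather_query": ("temperature", "weather", "forecast"),
    "play_music": ("play", "song"),
}
WEATHER_DESCRIPTORS = ("temperature", "humidity", "rain")

# Assumes the place name is whatever follows the word "in" at the end.
PLACE_PATTERN = re.compile(r"\bin ([a-z ]+)$")


def classify_intent(utterance):
    """Return the first intent whose keywords appear in the utterance."""
    words = utterance.lower().split()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return "unknown"


def fill_slots(utterance):
    """Pull out slot values using the toy rules above."""
    text = utterance.lower()
    words = text.split()
    slots = {}
    for descriptor in WEATHER_DESCRIPTORS:
        if descriptor in words:
            slots["weather_descriptor"] = descriptor
            break
    match = PLACE_PATTERN.search(text)
    if match:
        slots["place_name"] = match.group(1).strip()
    return slots


def parse(utterance):
    """Combine both steps into one result, mirroring the paper's example."""
    return {"intent": classify_intent(utterance), "slots": fill_slots(utterance)}


print(parse("what is the temperature in new york"))
```

Hand-written rules like these break on the first unfamiliar phrasing, and would have to be rewritten for every new language, which is the gap a labeled multilingual dataset is meant to help close.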
Amazon hopes that the dataset and competition will encourage more developers to build third-party apps for the company’s Alexa Skills platform. Its ambitions are not small – the release hints at scaling natural language technology “to every language on Earth.” That’s a lot of languages – more than 7,000.
“NLU is a key component of Alexa Skills, which anyone can develop using the Alexa Skills Kit. Massively multilingual NLU technology, the development of which MASSIVE will help spur, is a promising method for making services like Alexa Skills available in many more languages,” Jack FitzGerald, a senior applied scientist at Amazon’s Alexa AI Natural Understanding unit, told The Register.
“Internationalization of all of our products and services is incredibly important – Alexa and Echo are no different. It’s our vision for Alexa to be everywhere our customers are and on all the devices they want it to be,” he concluded. ®