(Cross-posted from the Google Research Blog)
Welkom*!
Today we’re introducing Voice Search support for Zulu and Afrikaans, as well as South African-accented English. The addition of Zulu in particular represents our first effort to build Voice Search for an underrepresented language.
We define underrepresented languages as those that, while spoken by millions, have little presence in electronic and physical media such as webpages, newspapers and magazines. Underrepresented languages have also often received little attention from the speech research community. Their phonetics, grammar, acoustics and other properties haven’t been extensively studied, which makes developing automatic speech recognition (ASR) systems for Voice Search challenging.
We believe the speech research community needs to start working on many of these underrepresented languages in order to build speech recognition, translation and other Natural Language Processing (NLP) technologies for them. Developing NLP technologies in these languages is critical to enabling information access for everybody. Indeed, these technologies have the potential to break down language barriers.
We also think it’s important that researchers in these countries take a leading role in advancing the state of the art in their own languages. To this end, we’ve collaborated with the Multilingual Speech Technology group at South Africa’s North-West University, led by Prof. Etienne Barnard (also of the Meraka Research Institute), an authority on speech technology for South African languages. Our development effort was spearheaded by Charl van Heerden, a South African intern and a student of Prof. Barnard. With the help of Prof. Barnard’s team, we collected acoustic data in the three languages and developed lexicons and grammars, which Charl and others then used to build the three Voice Search systems. A team of language specialists traveled to several cities to collect audio samples from hundreds of speakers in a range of acoustic conditions, such as street noise and background speech. Speakers were asked to read typical search queries into an Android app designed specifically for audio data collection.
For Zulu, we faced the additional challenge of having few text sources on the web. We often analyze search queries from local versions of Google to build our lexicons and language models. For Zulu, however, there weren’t enough queries to build a useful language model. Furthermore, because there are few online data sources in Zulu, native speakers have learned to use a mix of Zulu and English when searching for information on the web. So for our Zulu Voice Search product, we had to build a truly hybrid recognizer that allows free mixing of both languages. Our phonetic inventory covers both English and Zulu, and our grammars allow natural switching from Zulu to English, emulating speaker behavior.
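The post doesn’t describe how the hybrid lexicon and grammars were built, so the following is only a toy sketch of the general idea under our own assumptions. It merges two per-language pronunciation lexicons into one, takes the union of their phones as a shared inventory, and scores mixed Zulu/English queries with a simple linearly interpolated unigram model so that words from either language can follow each other freely. The function names, lexicon entries, phone strings, word counts and the 50/50 weights are all hypothetical placeholders, not Google’s actual data or method.

```python
from math import log

# Hypothetical toy lexicons (word -> phone sequence). The phone strings are
# illustrative placeholders, not a real Zulu or English phone set.
ZU_LEXICON = {"izindaba": ["i", "z", "i", "n", "d", "a", "b", "a"]}
EN_LEXICON = {"news": ["n", "uw", "z"], "johannesburg": ["jh", "ow", "hh", "ae", "n", "ih", "s", "b", "er", "g"]}


def merge_lexicons(lexicons):
    """Combine per-language lexicons into one, tagging each pronunciation with its language."""
    merged = {}
    for lang, lexicon in lexicons.items():
        for word, phones in lexicon.items():
            merged.setdefault(word, []).append((lang, tuple(phones)))
    return merged


def phone_inventory(merged):
    """Shared phone inventory: the sorted union of phones across all languages."""
    return sorted({p for prons in merged.values() for _, phones in prons for p in phones})


def interpolated_unigram(counts_by_lang, weights):
    """Return P(w) = sum over languages l of weights[l] * count_l(w) / total_l."""
    totals = {lang: sum(counts.values()) for lang, counts in counts_by_lang.items()}

    def prob(word):
        return sum(
            weights[lang] * counts.get(word, 0) / totals[lang]
            for lang, counts in counts_by_lang.items()
        )

    return prob


def score(query, prob):
    """Log-probability of a whitespace-split query, or None if any word is out of vocabulary."""
    total = 0.0
    for word in query.split():
        p = prob(word)
        if p == 0:
            return None
        total += log(p)
    return total


# Hypothetical toy counts standing in for per-language query logs.
ZU_COUNTS = {"izindaba": 3, "namuhla": 1}
EN_COUNTS = {"news": 2, "weather": 2, "johannesburg": 1}
PROB = interpolated_unigram({"zu": ZU_COUNTS, "en": EN_COUNTS}, {"zu": 0.5, "en": 0.5})
MERGED = merge_lexicons({"zu": ZU_LEXICON, "en": EN_LEXICON})
```

With these toy numbers, a code-switched query such as `"izindaba johannesburg"` gets a finite score because each word is covered by one of the two component models, while a query containing a word neither model has seen returns `None`. A production recognizer would use far larger lexicons and higher-order models; the interpolation here only illustrates one simple way to let both languages share a single search space.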
This is our first release of Voice Search in a native African language, and we hope that it won’t be the last. We’ll continue to work on technology for languages that have until now received little attention from the speech recognition community.
Salani kahle!**
* “Welcome” in Afrikaans
** “Stay well” in Zulu
Tuesday, November 9, 2010