The general media has gone mad over Google’s announcement that, in a few years, they hope to be able to translate speech instantly.
This is already available, though in a limited way (spoken English to Japanese or Spanish), on the iPhone using a third-party app.
I’d expect that if and when Google release such a thing it will also have limitations. To believe the media and think that it will support several thousand languages, however, is a mistake.
For specific needs there are already alternative solutions. The LAPD use a device which stores thousands of pre-recorded messages, covering their standard questions and general announcements, in multiple languages. The same device has also been used in Afghanistan and Iraq by American soldiers needing to communicate in other languages.
The device was developed through DARPA funding, and DARPA have been investing heavily in automated language solutions for years, so it will be interesting to see how quickly Google can bring a high-quality technical solution to market. Alternatively, if you need a professional interpreter while waiting for Google’s solution, you can get one on your mobile phone at any time: just give us a call and we can help set you up with our Instant Telephone Interpreting (ITI) system.
From a technical viewpoint, to deliver automatic speech-to-speech translation to a mass market there are three key components that need to be perfected:
1. The system would first need to support good-quality voice recognition, be able to differentiate between thousands of accents and dialects, and be able to turn the speech into text. Anyone who has used speech recognition over the last few years will probably agree that it’s come a long way; however, it’s still a long way from where it needs to be, especially when dealing with accents and fast-spoken languages.
2. The system is probably going to rely on machine translation to turn the text into its translated equivalent. Currently, there is a lot of work being done in this area using language models, statistical models or hybrid systems. Ultimately, while there have been some fantastic gains over recent years, there is still a phenomenal amount of work to be done, especially when dealing with translations in a specialised subject area such as medicine.
3. Finally, the system is going to turn the translated text back into speech. This should be the easiest part of the system to prepare. Text-to-speech has been around for a long time, and users would ultimately accept, maybe even expect, an accented digitised voice.
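To make the three components above a little more concrete, here is a minimal sketch of how they might chain together. It is purely illustrative: the class name, the function names and the tiny phrase table are all made up for this example, and the three stages are toy stand-ins rather than real speech or translation engines. It says nothing about how Google or anyone else would actually build such a system.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures, invented for this illustration.
Recognizer = Callable[[bytes, str], str]      # (audio, source_lang) -> text
Translator = Callable[[str, str, str], str]   # (text, source_lang, target_lang) -> text
Synthesizer = Callable[[str, str], bytes]     # (text, target_lang) -> audio


@dataclass
class SpeechTranslationPipeline:
    """Chains the three components: recognise, translate, synthesise."""
    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer

    def run(self, audio: bytes, source_lang: str, target_lang: str) -> bytes:
        text = self.recognize(audio, source_lang)                     # step 1
        translated = self.translate(text, source_lang, target_lang)   # step 2
        return self.synthesize(translated, target_lang)               # step 3


# Toy stand-ins so the sketch runs; real engines would replace each one.
def toy_recognize(audio: bytes, source_lang: str) -> str:
    # Pretend the "audio" is just UTF-8 text.
    return audio.decode("utf-8")


TOY_PHRASES = {("en", "es", "hello"): "hola"}  # made-up phrase table


def toy_translate(text: str, source_lang: str, target_lang: str) -> str:
    key = (source_lang, target_lang, text.strip().lower())
    if key not in TOY_PHRASES:
        raise KeyError(f"no translation available for {key}")
    return TOY_PHRASES[key]


def toy_synthesize(text: str, target_lang: str) -> bytes:
    # Pretend the synthesised "speech" is just the encoded text.
    return text.encode("utf-8")


pipeline = SpeechTranslationPipeline(toy_recognize, toy_translate, toy_synthesize)
result = pipeline.run(b"Hello", "en", "es")
print(result.decode("utf-8"))
```

Because each stage simply consumes the previous stage's output, a mistake in recognition is passed straight into translation and then spoken aloud, which is one reason all three components need to be strong before the whole thing is usable.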
From the announcement, I don’t think that Google is suggesting that it will be able to translate every language to every other language. Nor will it impact the language services industry, unless your core business is interpreting for consumers. Google is merely highlighting the possibility.
The concept is an engineering challenge, and if anyone is set up to attack these kinds of challenges, it’s Google.