Top minds taxed by translation challenge
Unraveling the languages
GALE’s goal is to deliver, by 2010, software that can almost instantly translate Arabic and Mandarin Chinese with 90 to 95 percent accuracy.
Fortunately for the GALE teams, they didn’t have to be near 95 percent right away. In the first year, they were expected to translate Arabic and Mandarin speech with 65 percent accuracy; with text the goal was 75 percent.
How hard was that? Before GALE, BBN boasted that it could automatically translate foreign news broadcasts with better than 80 percent accuracy. But DARPA wants translations from more than just such controlled, well-articulated sources. GALE also incorporates man-on-the-street interviews and raucous colloquial chats on the Web.
That’s where things get tricky. Background noise, dialects, accents, slang, short words like “on” or “of” that most speakers don’t bother to clearly enunciate — these are the stuff of nightmares for speech-recognition and machine-translation engineers.
Not to mention that Chinese and Arabic are structured very differently than English, making them a pain to translate.
“Arabic has this property: ‘He gave it to her’ would be one word. Little pieces in the one word capture lots of meaning,” said Salim Roukos, IBM’s GALE chief. Chinese, meanwhile, does not mark tense or gender grammatically.
To wring improvements from their translation software, the GALE teams fed their computers huge pools of sample broadcasts and texts in Arabic and Chinese. As the machines were exposed to more and more foreign sentences, they analyzed the content and structure, compiling an ever-deeper library of how words are spoken and the rules governing the languages.
Or so the researchers hoped. The name of the game is to fine-tune the computer process, known as an algorithm, that does the language analysis. Programming missteps can cause a computer to gain minimal insight from the new language data it is fed. It could even get worse at its translation task.
“It’s sort of trial and error, guided by intuitions and some knowledge,” BBN’s Schwartz said.
Though that’s not how it gets described in computer scientists’ meetings. “Rewrote the forward pass of the decoder algorithm to be a recursive transversal over the hypergraph, rather than a loop over spans,” one BBN programmer assured his team in a May presentation.

