Top minds taxed by translation challenge


Unraveling the languages
GALE’s goal is to deliver, by 2010, software that can almost instantly translate Arabic and Mandarin Chinese with 90 to 95 percent accuracy.

That might be impossible. Humans might not even be that precise. Consider all the ways we mishear each other, or fail to grasp idioms, or apply one subjective interpretation instead of another. Why else do new translations of “Don Quixote” keep emerging, 400 years after it was written?

Fortunately for the GALE teams, they didn’t have to be near 95 percent right away. In the first year, they were expected to translate Arabic and Mandarin speech with 65 percent accuracy; with text the goal was 75 percent.

How hard was that? Before GALE, BBN boasted that it could automatically translate foreign news broadcasts with better than 80 percent accuracy. But DARPA wants translations from more than just such controlled, well-articulated sources. GALE incorporates man-on-the-street interviews and raucous colloquial chats on the Web.

That’s where things get tricky. Background noise, dialects, accents, slang, short words like “on” or “of” that most speakers don’t bother to clearly enunciate — these are the stuff of nightmares for speech-recognition and machine-translation engineers.

Not to mention that Chinese and Arabic are structured very differently than English, making them a pain to translate.

“Arabic has this property: ‘He gave it to her’ would be one word. Little pieces in the one word capture lots of meaning,” said Salim Roukos, IBM’s GALE chief. Meanwhile, tense and gender are absent in Chinese.

To wring improvements from their translation software, the GALE teams fed their computers huge pools of sample broadcasts and texts in Arabic and Chinese. As the machines were exposed to more and more foreign sentences, they analyzed the content and structure, compiling an ever-deeper library of how words are spoken and the rules governing the languages.

Or so the researchers hoped. The name of the game is to fine-tune the computer process, known as an algorithm, that does the language analysis. Programming missteps can cause a computer to gain minimal insight from the new language data it is fed. It could even get worse at its translation task.

“It’s sort of trial and error, guided by intuitions and some knowledge,” BBN’s Schwartz said.

Though that’s not how it gets described in computer scientists’ meetings. “Rewrote the forward pass of the decoder algorithm to be a recursive transversal over the hypergraph, rather than a loop over spans,” one BBN programmer assured his team in a May presentation.

