Top minds taxed by translation challenge


Tweaking the algorithms
Speech recognition, machine translation and language distillation don’t harbor many secret recipes. Everyone knows what everyone else is trying to do — tweak algorithms over and over.

The defining element of GALE — the government’s evaluation — was on the honor system, in keeping with the field’s open nature. The teams got the test in June — thousands of hours of audio and millions of pages in Arabic and Mandarin — and were expected to turn in their results later.

DARPA judges scored the computer translations by counting the number of human edits that the sentences needed in order for them to have the correct meaning. By this measure, the results largely met DARPA’s demands of 75 percent accuracy for text translation and 65 percent for speech.
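
The article does not give DARPA's exact formula, so the following is only a rough sketch of how an edit-count score of this kind could work. It assumes accuracy is one minus the number of word-level edits divided by the length of the human-corrected sentence; the function names and the sample sentences are hypothetical.

```python
def word_edit_distance(hypothesis, reference):
    """Minimum number of word insertions, deletions and substitutions
    needed to turn `hypothesis` into `reference` (Levenshtein distance)."""
    hyp, ref = hypothesis.split(), reference.split()
    # previous[j] = edits to turn the hypothesis words seen so far into ref[:j]
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]  # turning i hypothesis words into nothing takes i deletions
        for j, ref_word in enumerate(ref, start=1):
            substitution = previous[j - 1] + (hyp_word != ref_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def edit_based_accuracy(machine_output, corrected_output):
    """Assumed formula (not from the article):
    1 - edits / words in the corrected sentence, floored at 0."""
    ref_len = len(corrected_output.split())
    if ref_len == 0:
        return 1.0 if not machine_output.split() else 0.0
    edits = word_edit_distance(machine_output, corrected_output)
    return max(0.0, 1.0 - edits / ref_len)


# Hypothetical example: one missing word and one wrong word form.
machine = "police clashed with the militant"
corrected = "police have clashed with the militants"
print(word_edit_distance(machine, corrected))
print(round(edit_based_accuracy(machine, corrected), 3))
```

Under this assumed formula, a sentence that needs two edits to reach a six-word corrected version would fall below a 75 percent target.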

The BBN-led team produced 75.3 percent accuracy with Arabic text and 75.2 percent with Chinese text. It scored 69.4 percent with Arabic speech and 67.1 percent with Mandarin speech. IBM scored higher with Arabic text and SRI scored higher in Mandarin.

Then came the distillation section: open-ended questions posed to each team’s computers — based on 600,000 documents in Arabic, Chinese and English.

“How did Israel react to the Hamas election victory?” was one such question. “Describe attacks in Kuwait,” was another.

DARPA wanted to see how well the computers replicated human performance on such questions, including how precisely they could recall certain facts.

Here, too, the computers managed some articulate responses. “Since Jan. 10 (2005), police have clashed with Muslim fundamentalists and pursued them around the country, killing eight militants and arresting scores of others,” went one BBN response to the Kuwait question.

But it was not until three months later — after all three teams began working on year two of GALE in case they were picked to continue — that the researchers got DARPA’s ruling about who passed.

Tightening the screws
So who got rejected? No one.

At least not yet.

DARPA Director Anthony Tether and GALE program manager Joseph Olive decided each team had shown significant progress worth continuing to track.

But they did tighten the screws. In addition to expecting better translation accuracy in each of GALE’s four remaining years, DARPA will measure that performance more stringently. Now a high level of accuracy must be sustained over a very high percentage of a document. A bad patch of computer translation cannot be averaged away.
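
The difference between an averaged score and a sustained one can be illustrated with a small sketch. The article gives no thresholds for the stricter measure, so the per-segment scores, the 75 percent floor and the 90 percent coverage requirement below are all hypothetical.

```python
def mean_accuracy(segment_scores):
    """Average accuracy over all segments of a document."""
    return sum(segment_scores) / len(segment_scores)


def fraction_meeting(segment_scores, min_accuracy):
    """Share of segments whose accuracy is at or above `min_accuracy`."""
    good = sum(score >= min_accuracy for score in segment_scores)
    return good / len(segment_scores)


def passes_sustained(segment_scores, min_accuracy, min_coverage):
    """Assumed rule (not from the article): the floor must be met on at
    least `min_coverage` of the document's segments."""
    return fraction_meeting(segment_scores, min_accuracy) >= min_coverage


# Hypothetical document: mostly good output with one bad patch.
scores = [0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.20, 0.20, 0.95]

print(mean_accuracy(scores) >= 0.75)         # the average hides the bad patch
print(passes_sustained(scores, 0.75, 0.90))  # the sustained check does not
```

In this made-up case the document clears a 75 percent bar on average, yet only eight of its ten segments meet that bar, so it fails a rule requiring 90 percent coverage.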

Just days after being informed of the new framework, Makhoul already had his eye on the next GALE evaluation, in June, and how his team would deliver the performance DARPA — and BBN — needed.

“It’s the same feeling again,” he said. “The pressure — it’s not off. It’s higher, in fact. Now the goals are harder for the second year than they were before.”

© 2009 The Associated Press. All rights reserved. This material may not be published, broadcast, rewritten or redistributed.

