New endeavors aim to build a better Internet

The Web of the future
Exactly how the Web of the future will process and distribute information has yet to be fully resolved. Some efforts, like San Francisco-based Metaweb Technologies’ Freebase, are converting Wikipedia entries and other existing databases to machine-readable formats. To date, Freebase has stored information on about 30,000 movies and 580,000 famous people, according to Robert Cook, Metaweb’s co-founder and executive vice president of products.

The database “tends to be of higher quality” than that compiled through natural language processing, Cook said, though he conceded that it covers a narrower knowledge set. The information can also skew toward the interests of the avid community helping to curate some of Freebase’s newest entries, including a summary of every “X-Files” episode, the causes of death for various celebrities, statistical links between fantasy football and real NFL teams, and an annotated map of all human genes with links to relevant research articles.

At IBM’s Almaden Research Center in San Jose, Calif., a project known as Avatar Semantic Search has taken an approach more akin to Etzioni’s, though with a focus on the intranet systems that host companies’ e-mail and messaging. As an example of the project’s utility, Shiv Vaithyanathan, the center’s manager of Unstructured Information Mining, recalled his frustration in locating the phone number of a student who had included the digits in just one or two e-mails out of several hundred sent. “I was trying to guess when it was that he sent me the e-mail that contained his number,” Vaithyanathan said.

The IBM system aims to take guesswork out of similar queries. “Ideally, a semantic search needs to do several things: identify the sequence as a phone number and realize from the way the sentence is written that the phone number belongs to the person who sent the e-mail,” he said. “Once we know what is it that you want, then packaging it up and delivering it to you is not that hard of a job.”
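The two steps Vaithyanathan describes, recognizing a digit sequence as a phone number and deciding from the sentence that it belongs to the sender, can be illustrated with a toy sketch. This is not IBM's method: the regular expression, the first-person cue words, the `Email` type and the `find_sender_phone` function are all assumptions made up for illustration, and a real system would rely on far richer language analysis.

```python
import re
from dataclasses import dataclass

# Assumption: US-style numbers such as 408-555-0199 or (408) 555-0199.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b")

# Assumption: a first-person word in the same sentence is a crude signal
# that the number belongs to the person who wrote the e-mail.
FIRST_PERSON_RE = re.compile(r"\b(my|me|mine)\b", re.IGNORECASE)


@dataclass
class Email:
    sender: str
    body: str


def find_sender_phone(emails, sender):
    """Return phone numbers that the given sender appears to claim as their own."""
    hits = []
    for mail in emails:
        if mail.sender != sender:
            continue
        # Split the body into rough sentences at ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", mail.body):
            if FIRST_PERSON_RE.search(sentence):
                hits.extend(m.group() for m in PHONE_RE.finditer(sentence))
    return hits


emails = [
    Email("student@example.edu", "Thanks for the notes. You can reach me at 408-555-0199."),
    Email("student@example.edu", "The office line is 408-555-0100."),
    Email("other@example.edu", "My number is 408-555-0111."),
]
print(find_sender_phone(emails, "student@example.edu"))
```

In this sketch the office line is skipped because its sentence carries no first-person cue, and the third message is skipped because it came from a different sender.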

Raising the bar
In September, Etzioni’s group revealed another peek at what search engines might deliver in the future with the debut of a research prototype called PanImages. Unlike typical image search engines, PanImages allows word translations across hundreds of languages and sends a query for files tagged with the appropriate word to both Flickr and Google Images, then displays results from the sites on a split screen.

“It’s an order of magnitude more languages than people have supported in translation systems before,” Etzioni said. “And where it’s really a boon is for people who speak less popular languages.” Slovenian or Hungarian speakers might be constrained if searching for flower images tagged only in their native tongue (“cvet” and “virag,” respectively). But running the same search with PanImages can retrieve blooms from around the world.
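The idea of translating a search word and sending it to more than one image service can be sketched in a few lines. This is only an illustration under stated assumptions, not PanImages' implementation: the one-entry `TRANSLATIONS` table, the function names and the backend labels are hypothetical, and no real search service is contacted.

```python
# Assumption: a tiny hand-made table standing in for the translation
# database; the spellings follow the article's examples.
TRANSLATIONS = {
    "flower": {"en": "flower", "sl": "cvet", "hu": "virag"},
}


def expand_query(word, lang, table):
    """Return the word plus its known translations, without duplicates."""
    for forms in table.values():
        if forms.get(lang) == word:
            # dict.fromkeys drops duplicates while keeping insertion order.
            return list(dict.fromkeys(forms.values()))
    # Unknown word: search for it as typed.
    return [word]


def plan_searches(word, lang, table, backends=("flickr", "google_images")):
    """Pair every translated term with every (hypothetical) backend label."""
    return [(backend, term)
            for term in expand_query(word, lang, table)
            for backend in backends]


for backend, term in plan_searches("cvet", "sl", TRANSLATIONS):
    print(backend, term)
```

A Slovenian speaker's search for “cvet” is planned here as six lookups, three terms across two services, rather than one.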

Word translations are based on automated readings of hundreds of dictionaries and wiktionaries on the Internet with some sophisticated reasoning added in, “so the whole is greater than the sum of the parts,” he said. Like Wikipedia, the site hasn’t been immune from mischievous intent. But its availability to the public has yielded translations for the site’s main interface in nearly 50 languages. Word by word, the translation database is expanding and evolving in what Etzioni describes as “Web 2.0 meets Web 3.0.”

As Internet applications mature, Etzioni says, they are raising the bar on the quality and transparency of available information. Beyond that is the promise of helping people keep better track of RSS feeds, blogs, social networking sites and other chunks of data that could easily consume every waking moment. “So my belief is that we need technology – those of us who want to have a chance to take a walk outside and breathe the fresh air once in a while or be with our kids – to help us manage the flow of information,” he said.

For the countless surfers tethered to the Internet, that act of liberation may be one of Web 3.0’s most promising aspects yet.

© 2009 msnbc.com