Google Sets Sights on Clustering, Translation
"[We're] trying to go just beyond keywords and the linking structure of the Web, the innovation that we brought to search, and get behind the deeper meaning," Norvig said during his presentation.
In clustering, Norvig demonstrated a six-month-old project called "named entities abstraction," where Google's researchers are analyzing the company's large Web index to extract entities—such as the name of a company—from the structure of content and then decipher their relationship to one another.
With word clustering, the focus is on making the search engine better at understanding the multiple meanings of a word, Norvig said. Google started working on word clustering about three years ago.
Apropos of the heated U.S. presidential election, Norvig demonstrated a prototype of word clustering with results both for President Bush and for his Democratic contender, Sen. John Kerry.
Bush appeared in clusters for words around "president" and "White House," to name some examples, but the results drew laughter when he also appeared in descriptive categories such as "idiot" and "chimp."
"This is what the Web says, not my opinion," Norvig said following the laughter.
Kerry appeared within groups for "senator" and for his wife, "Teresa Heinz Kerry," as well as for "Bob Kerry," a former senator with whom some people may confuse him.
A growing number of search startups have targeted the automatic clustering of search results. Vivisimo Inc., one of the best-known of these startups, recently launched the Clusty search site, which groups results gathered from other search engines into clusters, or categories, as a way of drilling down into results.
While it might make sense for startups to deploy clustering technology today, Norvig said, Google still views the technology as too immature. It is useful for only a small percentage of search results, he said, so Google is focusing on improving the technology and increasing its usefulness.
"Our take is that the state of the art is not there yet," Norvig said.
