Correlator showcases our ability to go “beyond document ranking”, beyond the usual title and snippet proposed by most search engines. For example, Correlator searches things “inside” documents: it finds the people, dates and locations that are mentioned in sentences. It locates and groups sentences of interest.

Correlator also searches “outside” of documents. For example, the first time you type a query you are taken to a “virtual Wikipedia page”: for any subject, we try to build on the fly a page of Wikipedia content showing different aspects of your query. Here is what Borkur Sigurbjornsson, the creator of this page, had to say about it:

"Correlator presents search results in a different way from the major search engines. The aim is to create a synthetic document which gives a better overview of the search results. Take for example the query tennis olympic games. Correlator identifies that this is a query composed of two concepts tennis and Olympic Games. The synthetic document gives an overview of the results by first presenting a summary of the two concepts (tennis and Olympic Games) and then gives a list of Wikipedia pages that discuss the two concepts, e.g., pages about Tennis at the Olympics, pages about Chinese table tennis players (who have won medals at the Olympics), etc."
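To make the idea of splitting a query into concepts concrete, here is a minimal sketch in Python. It is not how Correlator actually does it: the `CONCEPTS` dictionary and the `segment_query` function are hypothetical names invented for this illustration, and the greedy longest-match strategy is just one simple way to approximate the behaviour described above.

```python
# Hypothetical concept dictionary (an assumption for this sketch):
# lowercase phrase -> title of the page that describes the concept.
CONCEPTS = {
    "tennis": "Tennis",
    "olympic games": "Olympic Games",
    "table tennis": "Table tennis",
}


def segment_query(query, concepts=CONCEPTS):
    """Split a query into known concepts using greedy longest-match."""
    tokens = query.lower().split()
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first.
        for j in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:j])
            if phrase in concepts:
                found.append(concepts[phrase])
                i = j  # continue after the matched phrase
                break
        else:
            i += 1  # no known concept starts here; skip this token
    return found


print(segment_query("tennis olympic games"))
```

With the two concepts in hand, a synthetic page could first show a summary of each concept and then list the pages that mention both.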

Is this the Semantic Web? This is what Peter Mika, a Semantic Web expert and one of the creators of Correlator, had to say on this topic:

"This demo is going back to the origins of the Semantic Web and trying to 'understand' natural language text as a source of resources, rather than expecting information to be already in a database. Unlike mashups using DBpedia or Freebase, we are showcasing here our ability to extract entities which are described in natural language (i.e. in the Wikipedia text, not in the infoboxes!) Of course, both sources of information are compatible and should be used in combination; we are working on that right now."

The core of Correlator is a search engine capable of returning not only relevant documents, but also relevant sentences and entities. This search engine is the fruit of work at the Yahoo! Research Barcelona lab, where we have been trying to improve search (web search, news search, social search, ad search...) by applying ideas from computational linguistics. Although the idea may be simple and appealing, this problem has been haunting researchers in Computer Science since the early days of computers and search, and it is still far from solved...

One of the techniques we use to understand human queries is to break text (language) down into basic components like “entities” (proper names such as a person's name, a country, an organization...). As we worked with these components to improve the quality of search in general, we noticed that we were getting very interesting “lists” of entities. For a while we kept these lists on the side, as if they were crumbs left over from our “main experiments”. But I couldn’t resist coming back to them over and over again, amazed at their quality. One day, tired of some failed experiments, I built a small demo to look at these lists, wondering if they could be used for anything. As I showed it around, people seemed to like the lists, so we started to work more on generating them fast, grouping them properly, improving their ranking, etc. Around the same time we had an intern (Henning Rode) who really liked the idea and worked with us on it. He built an evaluation framework so we could experiment with these lists and find the algorithms that would give us the lists best liked by human subjects. We even set up a challenge in the lab (the prize was one bottle of champagne!) to force our colleagues to play with the lists, tell us about their quality, etc. This work would later become the core of “Correlator”...
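For readers who want a feel for what such entity “lists” look like, here is a toy sketch in Python. It assumes the sentences have already been tagged with entities (the hard part, which this sketch does not attempt), and it ranks entities simply by how often they appear in sentences matching the query. The `SENTENCES` data and the `entity_lists` function are hypothetical, and the frequency ranking is a stand-in for illustration, not the ranking algorithm used in Correlator.

```python
from collections import Counter, defaultdict

# Toy corpus, annotated by hand for this sketch: each item is a sentence
# plus the (entity, type) pairs a semantic tagger might have found in it.
SENTENCES = [
    ("Rafael Nadal won the tennis gold medal in Beijing in 2008.",
     [("Rafael Nadal", "person"), ("Beijing", "location"), ("2008", "date")]),
    ("Elena Dementieva won the tennis gold medal in Beijing in 2008.",
     [("Elena Dementieva", "person"), ("Beijing", "location"), ("2008", "date")]),
    ("Steffi Graf won the tennis gold medal in Seoul in 1988.",
     [("Steffi Graf", "person"), ("Seoul", "location"), ("1988", "date")]),
]


def entity_lists(query, sentences=SENTENCES, top_k=3):
    """Return, per entity type, the entities most often seen in matching sentences."""
    terms = query.lower().split()
    counts = defaultdict(Counter)
    for text, entities in sentences:
        lowered = text.lower()
        # Keep a sentence only if it contains every query term (naive substring match).
        if all(term in lowered for term in terms):
            for name, etype in entities:
                counts[etype][name] += 1
    return {etype: [name for name, _ in counter.most_common(top_k)]
            for etype, counter in counts.items()}


for etype, names in entity_lists("tennis gold").items():
    print(etype, names)
```

Grouping by entity type gives one list of people, one of locations and one of dates for the same query, each ordered by how strongly it is associated with the matching sentences.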

There is much more state-of-the-art research hidden behind this demo than we can talk about in a blog post... but we welcome you to go and check the work of some of the other people involved in the creation of Correlator: Massimiliano Ciaramita and Jordi Atserias, who worked on the semantic tagging technology; Jacob Leatherman and George Levchenko, who created the amazing user interface; Tejaswi Kasturi and Pras Sarkar, who worked on the architecture; Carlos Castillo, who works on web mining; and my own work on natural language search.

Hugo Zaragoza.
