Carnegie Mellon University was the birthplace of Lycos, one of the oldest search engines on the web. Stanford University can boast of being the birthplace of two of the most widely used search engines, Google and Yahoo. So no one should be surprised that, as the challenges of search have changed over the past few years, universities are moving to stay at the forefront of research and development related to search.
The University of California at Berkeley brought this point home recently by announcing the creation of an interdisciplinary center for advanced search technologies. The university is talking with a number of search companies to interest them in the project, including Google. Robert Wilensky, the center’s director and a professor of computer science and information management at the university, speaks about the center with infectious enthusiasm. “If you have 20 researchers interested in search, then getting them together where they are cross-fertilizing ideas, you make something bigger than its parts. You can create a nuclear reaction,” he said in an interview with CNet. Professor Wilensky hopes to open the interdisciplinary center early next year.
Berkeley’s center is just one project among many that have been inspired by the booming growth of internet search. With $5 billion spent on search advertising every year, the major search engines have been funding their own research projects, and smaller start-up companies have looked to cash in by carving out niches in specialty search areas. Even with the tremendous amount of activity in the private sector, however, those who want to see what search will look like in five or ten years would be wise to see what is brewing in the universities today. “A big source of new ideas comes out of universities,” notes Geoff Yang, a venture capitalist whose firm has backed such companies as Ask Jeeves and TiVo.
In the earlier days of the internet, most web surfers were happy to find websites with textual content related to their queries. At that time, most of what was available on the internet was in the form of text. Today, while text still makes up the lion’s share of almost any website’s content, users can find a great deal more. Books, scholarly articles and dissertations, television programs and other videos, music, and images are all being digitized and made available online. Since the source material is so varied and so different from traditional text, different techniques are required, and search engines need to be that much better to help users find what they’re looking for.
Modern search raises other issues as well. For example, with several major search engines offering personalized search options, privacy concerns arise over how the engines handle individual search histories. Older questions are also resurfacing in new forms: how can a user determine whether to trust the information he finds, and how can users ask (and search engines handle) more complex questions?
Carnegie Mellon University’s Language Technologies Institute is working on an interesting project to deal with one of the privacy issues. The application is designed to act as an adjunct to (and possibly a substitute for) the personalized search histories that Google and Yahoo already collect and store on their own networks. Users download the program to their own PCs, where it lets them maintain and modify personal information, search preferences, and search history within a search profile. Search engines would then query the profile, which never leaves the user’s PC. Not only would such an add-on keep personal information off the network, but used properly, it could work with multiple search engines. The technology could be ready as soon as the middle of next year.
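The article does not describe the CMU application's internals, but the idea of a profile that stays on the user's PC while engines see only derived signals can be sketched. Everything below (the class, its methods, the notion of "interest hints") is a hypothetical illustration, not the actual project:

```python
# Hypothetical sketch of a client-side search profile. The raw query
# history is stored only on the user's machine; a search engine querying
# the profile receives aggregate hints, never the history itself.
from collections import Counter

class LocalSearchProfile:
    def __init__(self):
        self.history = []                       # raw queries, kept local
        self.preferences = {"language": "en", "safe_search": True}

    def record_query(self, query):
        self.history.append(query)

    def interest_hints(self, top_n=3):
        """Return only aggregate interest terms, not the raw history."""
        terms = Counter()
        for query in self.history:
            terms.update(query.lower().split())
        return [term for term, _ in terms.most_common(top_n)]

profile = LocalSearchProfile()
for q in ["python search engines", "python tutorials", "search ranking"]:
    profile.record_query(q)
print(profile.interest_hints())  # only these derived terms leave the PC
```

The design choice worth noting is the one the article emphasizes: personal data never crosses the network boundary, and because the engine-facing interface is generic, any cooperating search engine could query the same profile.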
Another fascinating project at Carnegie Mellon, dubbed Javelin (which stands for Justification-based Answer Valuation through Language Interpretation), will take longer to mature. The project, funded by a government grant, examines question-and-answer search technology. Most of the major search engines can now answer simple factual questions such as “What is the population of New York?” Try to ask a more complicated question, such as “Which university has the largest computer science department?” and you run into some serious problems.
Jaime Carbonell, director of CMU’s Language Technologies Institute, explains the difference between the two types of questions. “This is dynamic information. You must parse the question, look for answers in multiple places and do a comparison. There are multiple steps, and we’re looking at how to do it one step at a time and provide a trace for the user.”
The Javelin Project home page, located here (http://www.lti.cs.cmu.edu/Research/JAVELIN/), provides a comprehensive overview of the issue. Using a diagram and a mathematical statement that bears a passing resemblance to the Drake equation, it explains the factors that go into answering a complicated question such as “What are the consequences of the Sudanese civil war?” Whether a search engine offers any particular item of information in answer to this question depends on a number of factors, including:
- The item’s relevance to the requested information.
- The likelihood that the person asking the question doesn’t already know this particular item.
- The veracity of the item’s source, and how well the item supports the conclusion.
- The diversity of the item’s source, which matters when the person requesting the information wants contrasting or reinforcing points of view.
- Whether the person asking the question is likely to understand the item.
- The amount of time it will take for the person asking the question to digest the item.
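The factors above could, in principle, be combined into a single score per item. The sketch below is purely illustrative: the factor names, the [0, 1] scale, and the weights are this article's invention, not Javelin's actual model:

```python
# A naive weighted-sum sketch of the factors listed above. Each factor is
# assumed to be scored in [0, 1]; names and weights are illustrative only.
FACTOR_WEIGHTS = {
    "relevance": 0.30,        # relevance to the requested information
    "novelty": 0.15,          # user likely doesn't already know it
    "source_trust": 0.20,     # veracity of the item's source
    "support": 0.15,          # how well it supports the conclusion
    "diversity": 0.10,        # contrasting vs. reinforcing viewpoints
    "readability": 0.10,      # understandable and quick to digest
}

def score_item(factors):
    """Combine per-factor scores into one number; missing factors count as 0."""
    return sum(FACTOR_WEIGHTS[name] * factors.get(name, 0.0)
               for name in FACTOR_WEIGHTS)

item = {"relevance": 0.9, "novelty": 0.6, "source_trust": 0.8,
        "support": 0.7, "diversity": 0.4, "readability": 0.9}
print(round(score_item(item), 3))  # prints 0.755
```

A real system would estimate each factor with its own model (relevance from retrieval scores, trust from source reputation, and so on); the linear combination here only shows how the separate judgments might be reconciled.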
How the search engine presents the information also matters. It would defeat the purpose, and possibly cause information overload, to present everything all at once. To make it easier on the searcher, the project home page suggests that the search engine or application might start by “interactively presenting the main consequences, permitting him or her to initiate interactive strategy refinement.” Carbonell believes this technology will not be ready for widespread consumer usage for another four or five years.
Some university search projects have shown so much promise that they found themselves purchased by their commercial counterparts. Over the past two years, Google has purchased at least two projects started at Stanford. One may already be bearing fruit for the search giant: Kaltix, a personalized search tool whose technology Google may be using to power its own personalized search histories.
Stanford professors are working on other search-related projects. Associate professor Andrew Ng and others are working on a project titled “Learning to Make Textual Inferences.” Its home page, located here (http://forum.stanford.edu/research/project.php?id=306), explains that the project is dedicated to using AI techniques to allow an algorithm to make inferences as to whether one English sentence logically follows from another (for example, one can infer from the sentence “Guerrillas killed 400 peasants” that “Guerrillas killed a civilian”). In a similar vein, other algorithms are capable of reading a large amount of text and deriving knowledge from it. For example, from the sentence “… heavy water rich in the doubly heavy hydrogen atom called deuterium,” the algorithm can infer that deuterium is a type of atom.
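The core of the peasant-to-civilian inference is an is-a relation between words. The Stanford project learns such relations from text; the toy below hand-codes them in a lookup table just to show the shape of the reasoning, and is not the project's method:

```python
# Toy entailment check over a hand-built hypernym table. Real systems
# learn these is-a relations from large text corpora; only the lookup
# step is sketched here.
HYPERNYMS = {
    "peasant": "civilian",
    "civilian": "person",
    "deuterium": "atom",
}

def is_a(word, category):
    """Follow hypernym links to decide whether `word` is a kind of `category`."""
    while word in HYPERNYMS:
        word = HYPERNYMS[word]
        if word == category:
            return True
    return False

print(is_a("peasant", "civilian"))  # True: killing peasants entails killing a civilian
print(is_a("civilian", "peasant"))  # False: the inference runs only one way
```

The asymmetry in the last two calls is the whole point: "Guerrillas killed 400 peasants" entails "Guerrillas killed a civilian," but not the reverse.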
Another Stanford professor founded SearchFox, a privately held company with a take on search that is somewhat reminiscent of Yahoo’s MyWeb 2.0. It offers a search engine toolbar that lets users share favorite links. Those interested in this new model of search can check it out at the SearchFox home page here (http://www.searchfox.com/index.php).
The Massachusetts Institute of Technology and the World Wide Web Consortium have teamed up to create technology that will allow searchers to combine information in new ways. An MIT graduate student working under the umbrella of this partnership developed an interesting Firefox browser plug-in. Dubbed Piggybank, the tool allows users to combine information from several different sites and browse it all together. The site for the tool uses the example of integrating data from Boston.com, a movie web page and Google Maps. The tool can then show users where coffee shops, movie theaters and restaurants are located in relation to each other – very useful information when planning a date, for example, especially if you’re looking for something a little different that isn’t too out of the way.
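Piggybank's trick is getting items from unrelated sites into one schema so they can be browsed together. The snippet below is a loose sketch of that idea with invented listings and a deliberately crude distance test; it is not Piggybank's actual data model:

```python
# Hypothetical sketch of Piggybank-style aggregation: items scraped from
# different sites are normalized to a shared schema, so a coffee shop, a
# theater, and a restaurant can be compared by location.
listings = [
    {"source": "Boston.com", "kind": "restaurant", "name": "North End Cafe",
     "lat": 42.365, "lon": -71.054},
    {"source": "movie site", "kind": "theater", "name": "Downtown Cinema",
     "lat": 42.353, "lon": -71.062},
    {"source": "maps", "kind": "coffee shop", "name": "Beacon Beans",
     "lat": 42.357, "lon": -71.060},
]

def near(a, b, radius=0.02):
    """Crude closeness test in raw degrees -- enough for a sketch."""
    return abs(a["lat"] - b["lat"]) < radius and abs(a["lon"] - b["lon"]) < radius

theater = listings[1]
nearby = [item["name"] for item in listings
          if item is not theater and near(item, theater)]
print(nearby)  # ['North End Cafe', 'Beacon Beans']
```

Once everything shares one schema, the date-planning query from the article (what is near the theater?) becomes a simple filter, which is exactly the kind of cross-site browsing the plug-in enables.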
This is just a sampling of the search-related research projects brewing at the major universities. The internet and search engines have changed the way we look for, use, and think about information. Those changes are likely to continue for the foreseeable future as research goes on, with the goal of presenting information in forms that are ever more useful to searchers.