How Search Engines Deliver Results Pages
In a previous article titled “How Search Engines Work (and Sometimes Don’t),” I discussed the four main tasks that search engines perform: web crawling, document indexing, query processing, and ranking results. I then went into detail about the kinds of things that can trip up a web crawler when it comes to indexing a web page. In this article, I will cover the other processes in greater detail.
It’s interesting that the ever-so-modern search engines many of us use every day have their roots in a decades-old science called information retrieval. When the science sprang up about forty years ago, it mainly served large organizations such as libraries, research facilities, and government labs. Back then, scientists realized that two components were critical to a search’s success. Both components have parallels in modern search.
The first of these components is relevance, which is how closely the contents of documents returned by a search match the searcher’s query. If the searcher’s query terms show up multiple times in a document, particularly in important parts of the document such as the title or subheadings, the document is judged to be very relevant to the query. In modern search parlance, this is known as document analysis. Modern search engines check important areas of web pages, such as the title, the metadata, the heading tags, and the body text, to see how closely they match the search query.
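To make document analysis a little more concrete, here is a minimal sketch in Python. It counts how often the query terms appear in each area of a page and gives more weight to the important areas. The field names, the weights, and the `relevance_score` function are all assumptions made up for illustration; no real search engine publishes its weights, and real engines do far more than count words.

```python
import re
from collections import Counter

# Hypothetical weights: illustrative assumptions only, not values
# used by any real search engine.
FIELD_WEIGHTS = {"title": 3.0, "headings": 2.0, "meta": 1.5, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance_score(query, page):
    """Sum weighted counts of query terms across the page's fields.

    `page` is a dict mapping a field name (such as "title" or "body")
    to that field's text. Missing fields count as empty.
    """
    terms = set(tokenize(query))
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(page.get(field, "")))
        score += weight * sum(counts[term] for term in terms)
    return score


page = {
    "title": "Search engine basics",
    "body": "A search engine crawls pages. Search is hard.",
}
print(relevance_score("search engine", page))
```

In this toy scheme, a query term in the title counts three times as much as the same term in the body, which mirrors the idea that matches in important parts of a document signal greater relevance.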
The second component is popularity. Those of you who remember doing research papers in college may have run into this. Do you recall your professor telling you to pay close attention to the footnotes and bibliographies of your sources, and in particular to keep an eye out for any work that was cited by several of your sources? Such a work would be considered “popular,” and by implication important to the subject at hand. On the Internet, this translates into link analysis, where search engines measure who is linking to a site or page, how many incoming links the site or page has from outside sources, and even what these outside sources are saying about that site or page.
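The simplest piece of link analysis, counting incoming links from outside sources, can be sketched in a few lines of Python. The `inbound_link_counts` function and the example URLs are hypothetical, and the choice to count each outside host only once is an assumption of this sketch rather than a documented rule of any search engine.

```python
from collections import defaultdict
from urllib.parse import urlparse


def inbound_link_counts(links):
    """Count distinct outside hosts linking to each target URL.

    `links` is an iterable of (source_url, target_url) pairs.
    Links between pages on the same host are ignored, since they
    are not "outside sources".
    """
    sources = defaultdict(set)
    for source_url, target_url in links:
        source_host = urlparse(source_url).hostname
        target_host = urlparse(target_url).hostname
        if source_host and target_host and source_host != target_host:
            sources[target_url].add(source_host)
    return {target: len(hosts) for target, hosts in sources.items()}


links = [
    ("http://a.example/x", "http://c.example/"),
    ("http://b.example/y", "http://c.example/"),
    ("http://c.example/about", "http://c.example/"),  # internal, ignored
    ("http://a.example/z", "http://c.example/"),      # same host as first
]
print(inbound_link_counts(links))
```

This is the web equivalent of noticing that several of your sources cite the same work. It says nothing yet about who is doing the linking or what they say about the page, which are the other two things link analysis looks at.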
In addition to relevance and popularity, search engines must take into consideration how much they can trust their sources. In an academic environment, it is assumed that commercial interests aren’t influencing the results for document searches (whether or not they are is beyond the scope of this article). But that assumption cannot be made on the Internet; quite the opposite, in fact! This is one reason why links from .edu and .gov web pages are generally counted as more valuable; they’re considered not to be commercial as a matter of course. Together, link and document analysis techniques examine hundreds of factors, which the search engine algorithms weigh to determine the order in which results are presented to the searcher.
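As a rough illustration of how trust might enter the picture, the Python sketch below counts a link from an .edu or .gov host as worth more than an ordinary link. The weights and the function names are invented for this example; how much extra value real search engines assign to such links, if any, is not public.

```python
from urllib.parse import urlparse

# Hypothetical values: assumptions for illustration only.
TRUSTED_SUFFIXES = (".edu", ".gov")
TRUSTED_WEIGHT = 2.0
DEFAULT_WEIGHT = 1.0


def link_weight(source_url):
    """Return a higher weight when the linking host ends in .edu or .gov."""
    host = urlparse(source_url).hostname or ""
    if host.endswith(TRUSTED_SUFFIXES):
        return TRUSTED_WEIGHT
    return DEFAULT_WEIGHT


def weighted_link_score(source_urls):
    """Add up the weights of all pages linking to a single target."""
    return sum(link_weight(url) for url in source_urls)


incoming = [
    "http://www.example.edu/reading-list",
    "http://shop.example.com/partners",
    "http://agency.example.gov/resources",
]
print(weighted_link_score(incoming))
```

A score like this would be only one of the hundreds of factors a search engine combines when it orders results.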
More By Terri Wells