Google: World's Best Search Engine? - Google's Indexer and Query Processor
Googlebot gives the indexer the full text of the pages it finds. These pages are stored in Google's index database, which is organized as an inverted index. The index is sorted alphabetically by search term, and each entry stores the list of documents in which the term appears, along with the locations within the text where it occurs. This data structure allows rapid access to the documents that contain a user's query terms.
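As a rough illustration of the data structure described above, here is a minimal inverted index in Python. It is only a sketch: the names build_index and lookup and the two sample documents are made up for this example, and Google's real index is far more elaborate.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: [word positions where the term occurs]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def lookup(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # One set of document ids per query term; a missing term gives an empty set.
    postings = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*postings))

# Hypothetical sample documents, keyed by document id.
docs = {
    1: "google indexes the full text of pages",
    2: "the query processor matches pages to a query",
}
index = build_index(docs)
```

With this sample data, lookup(index, "pages") finds both documents, while lookup(index, "query pages") narrows the result to document 2. Because each entry also records word positions, the same structure can support phrase and proximity matching.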
To improve search performance, Google ignores common words called stop words (such as the, is, on, or, of, how, and why, as well as certain single digits and single letters). Stop words are so common that they do little to narrow a search, so they can safely be discarded. To the same end, the indexer also drops some punctuation, collapses multiple spaces, and converts all letters to lowercase.
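The clean-up steps above can be sketched in a few lines of Python. The stop-word set below contains only the example words listed in this article, not Google's actual list, and the function name normalize is made up for this illustration.

```python
import re

# Illustrative stop words taken from the examples in the text above.
STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why"}

def normalize(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    text = text.lower()
    # Replace every character that is not a word character or whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    # split() with no arguments also collapses runs of spaces.
    return [term for term in text.split() if term not in STOP_WORDS]
```

For example, normalize("How is  the Index sorted?") keeps only the terms index and sorted. Applying the same function to both pages and queries keeps the two sides consistent.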
Google's Query Processor
The query processor has several parts, including the user interface (search box), the "engine" that evaluates queries and matches them to relevant documents, and the results formatter.
Google considers over a hundred factors in determining which documents are most relevant to a query, including the popularity of the page, the position and size of the search terms within the page, and the proximity of the search terms to one another on the page. Google also applies machine-learning techniques to improve its performance automatically by learning relationships and associations within the stored data. For example, the spelling-correcting system uses such techniques to figure out likely alternative spellings. Google closely guards the formulas it uses to calculate relevance, and tweaks them to improve quality and performance, and to outwit the latest devious techniques used by spammers.
Indexing the full text of the web allows Google to go beyond simply matching single search terms. Google gives more priority to pages that have search terms near each other and in the same order as the query. Google can also match multi-word phrases and sentences. Because Google indexes HTML code in addition to the text on the page, users can restrict searches by where the query words appear: in the title, in the URL, in the body, or in links to the page. These options are available through the Advanced Search page and through search operators.
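To make the idea of proximity and ordering concrete, here is a toy scoring function in Python. Google does not publish its formulas, so the weights below (full credit for in-order pairs, half credit for reversed pairs, divided by the distance between the terms) are invented for this illustration, as is the name proximity_score.

```python
def proximity_score(doc_terms, query_terms):
    """Score a document higher when consecutive query terms sit close
    together and appear in the same order as in the query."""
    positions = {}
    for pos, term in enumerate(doc_terms):
        positions.setdefault(term, []).append(pos)

    score = 0.0
    for first, second in zip(query_terms, query_terms[1:]):
        if first == second:
            continue  # a repeated term has no meaningful gap
        if first not in positions or second not in positions:
            return 0.0  # a query term is missing from the document
        # Smallest gap between the two terms; on ties prefer the in-order gap.
        gap = min(
            (b - a for a in positions[first] for b in positions[second]),
            key=lambda g: (abs(g), g < 0),
        )
        weight = 1.0 if gap > 0 else 0.5  # assumed penalty for reversed order
        score += weight / abs(gap)
    return score

query = ["search", "engine"]
adjacent = proximity_score("fast search engine".split(), query)
far_apart = proximity_score("search the web with this engine".split(), query)
reversed_order = proximity_score("engine for fast search".split(), query)
```

Here the page with the two terms adjacent and in order scores highest, the page with the terms far apart scores lower, and the page with the terms in reverse order scores lowest. A single-term query scores zero in this sketch, since it only measures the relationship between pairs of terms.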
Let's see how Google processes a query.

More By Atul Davare