Topic Sensitive PageRank - How is Topic-Sensitive PageRank calculated?
Like the PageRank system Google currently uses, Topic-Sensitive PageRank would be pre-computed to save time during search query processing. Because there are multiple themes to calculate, each page would be scored against every topic. Instead of one PR number, a page would have as many numbers as there are themes in the computation.
At the time of the search query, all of a page's pre-calculated PR numbers are combined into a single composite number for that specific query. Because each topic is weighted differently for each query, and each page scores differently on each topic, a page's composite PR number could vary widely from search to search.
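As a rough illustration of that query-time step, the sketch below blends one page's per-topic scores into a single number using a weighted sum. The function name, the topic names, and every number are hypothetical, and the weighted sum is only one plausible way to combine the scores, not a confirmed Google formula.

```python
def composite_score(topic_weights, page_topic_ranks):
    """Blend a page's per-topic PageRank values into one query-specific score."""
    return sum(weight * page_topic_ranks.get(topic, 0.0)
               for topic, weight in topic_weights.items())


# Hypothetical pre-computed scores for one page, one number per topic.
page_ranks = {"Sports": 0.004, "Business": 0.010, "Health": 0.001}

# Hypothetical topic weights for two different queries (each set sums to 1).
query_a = {"Sports": 0.7, "Business": 0.2, "Health": 0.1}
query_b = {"Sports": 0.1, "Business": 0.8, "Health": 0.1}

score_a = composite_score(query_a, page_ranks)  # sports-heavy query
score_b = composite_score(query_b, page_ranks)  # business-heavy query
```

The same page earns a higher composite score for the business-heavy query than for the sports-heavy one, which is the search-to-search variation described above.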
The first step in the calculation process is to generate the weighted topic vectors. These would be calculated offline, prior to updates. The exact mathematical formula is very complex and may not even be the one considered for use. As always, Google won't discuss its formula in public. Because Google could use a completely different formula from the one in the Stanford presentation, there is no point in studying it precisely.
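For intuition only, here is a minimal sketch of how one topic-weighted vector could be pre-computed offline. It assumes the textbook approach to biasing PageRank, in which the random jump lands only on pages listed under the topic. The function name, the damping value of 0.85, the iteration count, and the tiny link graph are all assumptions made for illustration, not details taken from Google or from the Stanford presentation.

```python
def topic_pagerank(links, topic_pages, damping=0.85, iterations=50):
    """Power iteration in which random jumps land only on the topic's pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    seeds = {p for p in pages if p in topic_pages}
    if not seeds:
        raise ValueError("topic_pages must contain at least one known page")
    # Jump vector: uniform over the topic's pages, zero everywhere else.
    jump = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links hands its rank to the jump vector.
                for p in pages:
                    new_rank[p] += damping * rank[page] * jump[p]
        rank = new_rank
    return rank


# Hypothetical four-page web: each key links to the pages in its list.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

sports = topic_pagerank(links, {"a"})    # pretend only page "a" is listed under Sports
business = topic_pagerank(links, {"d"})  # pretend only page "d" is listed under Business
```

Running the function once per directory category yields one score per page per topic, which is why each page ends up with many PR numbers instead of one.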
What is important to know, however, is that the system proposed in the Stanford presentation is heavily based on the categories found in the Open Directory Project (DMOZ). The confidence placed in DMOZ rests on the assumption that the data in that directory lacks bias because its editors are volunteers. Many observers might question that assumption. Despite the best efforts of the volunteer editors to provide the best possible directory, intentional or unintentional errors can still be made.
The initial bias toward themes, as found in the Open Directory Project, is only the first part of the calculation process. That first part creates the weighted PageRanks. The second part is computed for each individual search engine query.
What is also important to know is that each web page will end up with multiple PageRanks, and its effective score will depend upon the keywords being searched.
The calculation for each individual search query could be performed in one of two ways. Again, keep in mind that we can't be certain which way Google would choose to make the tabulation.
The first way, and the one used as the example in the research paper, is to make the calculation uniform. All users searching for a particular keyword, or combination of keywords, would receive similar results. A system based on uniformity would be easier to implement.
The second way would be to individualize the results for each search engine user. By taking into consideration a user's prior searches and surfing habits, that person's query could be personalized. The results returned would be based on that user's individual interests. Such a system would presuppose the use of surfer tracking techniques.
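To make the difference between the two ways concrete, the sketch below shows one hypothetical way a personalized system could adjust topic weights: it blends the weights suggested by the query with weights drawn from a user's tracked interests. The function name, the `profile_share` parameter, and the numbers are invented for illustration. A uniform system would simply skip this step and use the query's weights for everyone.

```python
def personalized_weights(query_weights, profile_weights, profile_share=0.3):
    """Blend query-derived topic weights with a user's interest profile."""
    topics = set(query_weights) | set(profile_weights)
    blended = {
        topic: (1.0 - profile_share) * query_weights.get(topic, 0.0)
        + profile_share * profile_weights.get(topic, 0.0)
        for topic in topics
    }
    total = sum(blended.values())
    if total == 0:
        raise ValueError("at least one topic weight must be positive")
    # Normalize so the blended weights sum to 1.
    return {topic: weight / total for topic, weight in blended.items()}


# Hypothetical inputs: an ambiguous query and a user who mostly reads sports pages.
query = {"Sports": 0.5, "Business": 0.5}
profile = {"Sports": 1.0}

weights = personalized_weights(query, profile)
```

With these made-up inputs, the blended weights lean toward Sports, so the same query would rank sports-related pages higher for this user than for a user with no profile.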
More By Wayne Hurlbert