Microsoft's Live Search Patents and Algorithms Related to Blogs - Ranking method using hyperlinks in blogs patent
On March 30, 2007, Microsoft filed another patent application titled "Ranking method using hyperlinks in blogs." It listed Steve Chien and Dennis Fetterly as the inventors and Microsoft as the assignee. It was published on October 2, 2008, as US patent application number 20080243812.
A method for static ranking of web documents is disclosed. Search engines are typically configured such that search results having a higher PageRank® score are listed first. A modified scoring technique is provided whereby the score includes a reset vector that is biased toward web pages linked to blogs. This requires identifying web pages as either blogs or non-blogs.
This algorithm identifies blogs and then recalculates PageRank based on links coming from those blogs.
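To make the idea of a biased reset vector concrete, here is a minimal Python sketch. It is not the patent's implementation: the function names, the toy link graph, the damping value of 0.85, and the reading of "linked to blogs" as "pages that a blog links to" are all assumptions made for illustration. It also assumes every link target appears as a key in the graph.

```python
def pagerank(links, reset, damping=0.85, iterations=50):
    """Power iteration for PageRank with a custom reset (teleport) vector.

    links: dict mapping each page to the list of pages it links to.
    reset: dict mapping each page to its reset probability (sums to 1).
    """
    pages = list(links)
    rank = dict(reset)  # start from the reset distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * reset[p] for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: hand its rank back through the reset vector.
                for p in pages:
                    new_rank[p] += damping * rank[page] * reset[p]
        rank = new_rank
    return rank


def blog_biased_reset(links, blogs):
    """Hypothetical helper: spread the reset mass over pages a blog links to."""
    favored = {target for blog in blogs for target in links.get(blog, [])}
    if not favored:  # no blog links at all: fall back to a uniform vector
        favored = set(links)
    return {p: (1.0 / len(favored) if p in favored else 0.0) for p in links}


# Toy graph (invented): "blog" links to "a", a non-blog "spam" page links to "b".
links = {"blog": ["a"], "spam": ["b"], "a": ["blog"], "b": ["spam"]}
uniform_reset = {p: 1.0 / len(links) for p in links}

uniform_rank = pagerank(links, uniform_reset)
biased_rank = pagerank(links, blog_biased_reset(links, blogs={"blog"}))

print("uniform:", uniform_rank)
print("biased: ", biased_rank)
```

In this toy graph, the page that a blog links to gains rank under the biased vector, while the page reachable only from the non-blog page loses it. Whether that helps on the real web depends entirely on how reliably blogs can be identified.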
The problem that Microsoft set out to fix with this algorithm is the one that many SEOs rely on, namely, getting high-PR links for the sole purpose of manipulating search results.
Microsoft states that most blogs are run by humans, thus most links from blogs are editorial and can be trusted more than regular links. This assumption does not hold up, since many blogs are auto-generated and there is a small industry built around "pay-per-post." No doubt many bloggers are genuine, but many are not. Link buying is still a part of the economy that drives the SEO industry, and blog posts offer a perfect opportunity to camouflage bought links as natural ones.
SEO by the Sea reports that the authors of this patent ran experiments on 472 million pages and found the results to be cleaner than with PageRank alone. The authors also state that they may put weight on blog subscribers.
Google has also modified PR with Hilltop, LocalRank, TrustRank, Topic-Sensitive TrustRank and other measures to combat the problem Microsoft is facing. The PageRank system can be easily manipulated with links from high-PR pages, so more emphasis must go towards other indicators.
Microsoft's Vision Based Document Segmentation
On September 23, 2008, Microsoft was granted a patent titled "Vision-based document segmentation." The patent was filed on July 28, 2003, and listed Ji-Rong Wen, Shipeng Yu, Deng Cai, and Wei-Ying Ma as the inventors (it was assigned to Microsoft, of course). It was awarded US patent number 7,428,700.
Many web pages contain useful but unrelated pieces of information together on one page. This information may be featured in blocks of text, located on different parts of the page (top/bottom/left/right) or presented in other forms. Microsoft aims to separate these pieces, and then possibly rank them individually, by identifying which content is unrelated. It uses the following cues:
HTML tags
Font size and font types
Color of fonts
Background colors
Other unique identifiers
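As a rough illustration of how cues like these could split a page into blocks, here is a toy Python sketch. It is not the algorithm described in the patent: the `Node` record, the `SEPARATOR_TAGS` set, and the rule "start a new block whenever the font size, font color, or background changes, or a separator tag appears" are simplifying assumptions, and the sample page is invented.

```python
from dataclasses import dataclass

# Assumption: a horizontal rule is treated as an explicit block separator.
SEPARATOR_TAGS = {"hr"}


@dataclass(frozen=True)
class Node:
    """Hypothetical flattened page element with its rendered visual cues."""
    tag: str
    font_size: int
    color: str
    background: str
    text: str


def segment(nodes):
    """Group consecutive nodes that share the same visual signature."""
    blocks = []
    previous = None
    for node in nodes:
        if node.tag in SEPARATOR_TAGS:
            previous = None  # force a new block after a separator
            continue
        signature = (node.font_size, node.color, node.background)
        if signature != previous:
            blocks.append([])
            previous = signature
        blocks[-1].append(node.text)
    return blocks


nodes = [
    Node("a", 11, "#00f", "#eee", "Home"),
    Node("a", 11, "#00f", "#eee", "About"),
    Node("p", 14, "#000", "#fff", "First paragraph."),
    Node("p", 14, "#000", "#fff", "Second paragraph."),
    Node("hr", 0, "", "", ""),
    Node("p", 14, "#000", "#fff", "Unrelated note."),
    Node("p", 10, "#666", "#fff", "Copyright notice."),
]

blocks = segment(nodes)
for block in blocks:
    print(block)
```

The sample page comes out as four blocks: the navigation links, the two article paragraphs, the note after the rule, and the small-print footer. A real system would also need layout positions and a way to merge or split blocks at different granularities.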
The patent does not mention analysis of CSS, which is the dominant styling language. Many websites formatted with CSS have the simplest layouts in pure HTML, which makes it impossible to identify background information by its font size and position on the page (other than by the div tags that contain it). I do not know if Microsoft has technology that can take style sheet information into account.
Search engines are stylesheet-blind and view pages in their simplest form. You can get a sense of how search engines view pages by turning off CSS support in your browser.
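The same effect can be approximated in a few lines of Python. This is only a sketch of a stylesheet-blind view, not a description of how any search engine actually parses pages: the `PlainView` class, the `plain_view` helper, and the sample page are invented for illustration, and only the standard library's `html.parser` module is used.

```python
from html.parser import HTMLParser


class PlainView(HTMLParser):
    """Collects text in source order, ignoring <style> and <script> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Style attributes in attrs are simply never looked at.
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def plain_view(html):
    """Return the page's text chunks as a stylesheet-blind reader would see them."""
    parser = PlainView()
    parser.feed(html)
    parser.close()
    return parser.chunks


page = """<html><head><style>.nav { font-size: 9px; }</style></head>
<body><div class="nav" style="color:#999">Home</div><p>Article text.</p></body></html>"""

print(plain_view(page))  # ['Home', 'Article text.']
```

Once the styling is ignored, the navigation link and the article text look alike, which is exactly the difficulty described above.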
You can download the Vision-based document segmentation white paper for more information on this algorithm. It is full of mathematical equations, so it is one for the algebra whizzes.
More By Ivan Strouchliak