Search Engines and Algorithms: Optimizing for MSN’s RankNet Technology - Clues About RankNet Technology
The clues we have about this technology come from the patent filings for RankNet. The first patent identified, “Method for scanning, analyzing and handling various kinds of digital information content,” mentions the neural net concept in its abstract:
Computer-implemented methods are described for, first, characterizing a specific category of information content–pornography, for example–and then accurately identifying instances of that category of content within a real-time media stream, such as a web page, e-mail or other digital dataset. This content-recognition technology enables a new class of highly scalable applications to manage such content, including filtering, classifying, prioritizing, tracking, etc. An illustrative application of the invention is a software product for use in conjunction with web-browser client software for screening access to web pages that contain pornography or other potentially harmful or offensive content. A target attribute set of regular expression, such as natural language words and/or phrases, is formed by statistical analysis of a number of samples of datasets characterized as “containing,” and another set of samples characterized as “not containing,” the selected category of information content. This list of expressions is refined by applying correlation analysis to the samples or “training data.” Neural-network feed-forward techniques are then applied, again using a substantial training dataset, for adaptively assigning relative weights to each of the expressions in the target attribute set, thereby forming a weighted list that is highly predictive of the information content category of interest.
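To make the last step of this abstract more concrete, here is a minimal Python sketch of assigning a weight to each expression in a target attribute set by training a feed-forward unit on labeled “containing” and “not containing” samples. Everything in it is an assumption for illustration: the regular expressions in `PATTERNS`, the toy `SAMPLES` (neutral spam-like text standing in for the patent's content category), and the choice of a single logistic unit trained by gradient descent. The abstract does not specify its network architecture, and the correlation-analysis refinement step is omitted here.

```python
import math
import re

# Hypothetical target attribute set. In the patent's scheme these expressions
# would come from statistical and correlation analysis; here they are hand-picked.
PATTERNS = [r"\bfree\b", r"\bwinner\b", r"\bclick here\b", r"\bmeeting\b", r"\breport\b"]
COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]

# Hypothetical training data: 1 = "containing", 0 = "not containing".
SAMPLES = [
    ("Click here to claim your free prize, winner!", 1),
    ("You are a winner, click here now", 1),
    ("Free offer: click here", 1),
    ("Agenda for Monday's meeting", 0),
    ("Quarterly report attached before the meeting", 0),
    ("Please review the report", 0),
]


def features(text):
    """Binary feature vector: 1.0 if an expression occurs in the text, else 0.0."""
    return [1.0 if rx.search(text) else 0.0 for rx in COMPILED]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=500, lr=0.5):
    """Learn one weight per expression (plus a bias) with a single-layer
    feed-forward unit, updated by stochastic gradient descent on log loss."""
    weights = [0.0] * len(COMPILED)
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            x = features(text)
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - label  # gradient of log loss w.r.t. the unit's input sum
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def score(text, weights, bias):
    """Predicted probability that the text belongs to the target category."""
    x = features(text)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


if __name__ == "__main__":
    weights, bias = train(SAMPLES)
    # Print the resulting "weighted list", highest weight first.
    for pattern, w in sorted(zip(PATTERNS, weights), key=lambda pw: pw[1], reverse=True):
        print(f"{w:+.2f}  {pattern}")
    print(score("Click here, you are a winner", weights, bias))
```

Sorting the expressions by learned weight gives something like the weighted list the abstract describes: expressions that appear only in “containing” samples end up with positive weights, and those that appear only in “not containing” samples end up negative.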
Chris Burges, who is mentioned in the MSN Search Blog post and is the lead author of “Learning to Rank using Gradient Descent” (one of the RankNet white papers), was also a co-author of a second patent application that describes a neural network: “System and method for identifying content and managing information corresponding to objects in a signal.” The patent abstract states:
An “interactive signal analyzer” provides a framework for sampling one or more signals, such as, for example, one or more channels across the entire FM radio spectrum in one or more geographic regions, to identify objects of interest within the signal content and associate attributes with that content. The interactive signal analyzer uses a signal fingerprint extraction algorithm, i.e., a “fingerprint engine,” for deriving traces from segments of one or more signals. These traces are referred to as “fingerprints” since they are used to uniquely identify the signal segments from which they are derived. These fingerprints are then used for comparison to a database of fingerprints of known objects of interest. Information describing the identified content and associated object attributes is then provided in an interactive user database for viewing and interacting with information resulting from the comparison of the fingerprints to the database.
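The fingerprint-and-lookup idea in this abstract can be illustrated with a deliberately simple Python sketch. The abstract does not say how its “fingerprint engine” derives traces, so the scheme below is an assumption: each frame's energy is compared with the next frame's to produce one bit, and a segment is identified as the known object whose fingerprint has the smallest Hamming distance, provided that distance is within a threshold. The names `synth`, `fingerprint`, `identify`, and the `DATABASE` entries are hypothetical.

```python
import math


def synth(envelope, frame_size=64):
    """Hypothetical test signal: a sine wave whose amplitude follows `envelope`,
    one amplitude value per frame (period of 16 samples, 4 cycles per frame)."""
    out = []
    for amp in envelope:
        out.extend(amp * math.sin(2 * math.pi * k / 16) for k in range(frame_size))
    return out


def frame_energies(signal, frame_size):
    """Sum of squared samples for each complete, non-overlapping frame."""
    return [
        sum(s * s for s in signal[i:i + frame_size])
        for i in range(0, len(signal) - frame_size + 1, frame_size)
    ]


def fingerprint(signal, frame_size=64):
    """Toy fingerprint: one bit per adjacent frame pair, 1 if the energy rises."""
    e = frame_energies(signal, frame_size)
    return tuple(1 if b > a else 0 for a, b in zip(e, e[1:]))


def hamming(a, b):
    """Number of differing bits, counting any length mismatch as differences."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def identify(segment, database, max_distance=2):
    """Return the name of the closest known object, or None if none is close enough."""
    fp = fingerprint(segment)
    best_name, best_dist = None, max_distance + 1
    for name, known_fp in database.items():
        d = hamming(fp, known_fp)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name


# Hypothetical database of fingerprints for known objects of interest.
DATABASE = {
    "jingle_a": fingerprint(synth([1, 2, 3, 2, 1, 2, 3, 2, 1])),
    "jingle_b": fingerprint(synth([3, 2, 1, 1, 2, 3, 3, 2, 1])),
}

if __name__ == "__main__":
    # A quieter copy of jingle_a is still matched; an unrelated pattern is not.
    print(identify(synth([0.5, 1, 1.5, 1, 0.5, 1, 1.5, 1, 0.5]), DATABASE))
    print(identify(synth([1, 2, 1, 2, 1, 2, 1, 2, 1]), DATABASE))
```

Because each bit records only whether energy rises between frames, a uniformly quieter copy of a segment yields the same fingerprint; a practical fingerprint engine would also need features that survive noise and time offsets, which this sketch does not attempt.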
More By Jennifer Sullivan Cassidy