The History of Search and Search Technology - Basic Relevancy Factors
(Page 4 of 4 )
Stop Words
"Stop words" are words that are frequently used in the English language. Those words include a, the, all, also, but, down, full, much etc. They are words that are used by everyone regardless of the topic. Generally, search engines ignore "stop" words and will usually correct your search to exclude them. For example, when you search for "cat and dog" search engines will exclude "and" and only search for "cat" "dog."
Google does use stop words to an extent.
Keyword Density
Keyword density is a measure of how often a word appears on the page in relation to other words. It is an over-hyped measurement that doesn't help in search rankings. Search engines use far more than keyword density for on-page analysis. Their technology includes the location of terms on the page, word proximity and natural language processing.
Google has purchased Applied Semantics for its AdSense Network, but may also be using this technology for on-page analysis. Additionally, please keep in mind that one of Google's current projects involves scanning thousands of books, from which it may learn more about natural language patterns.
Location of Terms on The Page
By analyzing how terms are located in relation to each other on the page, search engines can determine partial relevancy of the page. The closer terms are to each other, the more relevant a page is.
In many cases, keywords appear separately from each other throughout the page. This is considered normal in most cases, but be sure to include a term together at least once in the title, heading or paragraph.
Link Analysis
Link analysis is at the core of all search engine relevancy. Apart from Page Rank and general link popularity, Google looks at: link anchor text, the page from which the link comes, age of the link, location of the link, title of the page from which the link comes, authority of the linking page and more.
Links are the biggest quality indicators that search engines have at the moment. Before search engines existed, and before the web was commercialized it was much harder to find information. All you had to rely on was links. There were few if any spammers, and people who found interesting sites shared those sites with others by placing a link. Also, the first web pages and servers were universities and colleges; this is why Google is biased toward .edu domains - they were the first on the scene, and usually contain quality content and resources.
As the web became commercial and Google's Page Rank well known, links became a form of advertising, where a link could be bought or artificially made by spammers. This is the reason for Google's bias toward older links and links from trusted domains.
Yahoo put less weight on link analysis than Google, while Ask.com is more about "authoritative hubs." Ask.com generally has a harder time ranking documents unless there's a community around a topic.
Size and Length of the Page
There's no "best" page copy length for ranking on search results. Search engines have specifically addressed this issue, and both long content and short content have equal chances to rank.
Behavioral Feedback
All major search engines such as Google, Yahoo, Live and Ask collect user feedback about web pages. They look at search queries, prior search queries, time interval between those queries and semantic relationships in order to learn more about intent. They also track click through rates for different listings. If, for example, users click on a listing and then go back right away, search engines may remove that listing and artificially lower its position for one or more keywords.
This brings up the fact that user experience is becoming an important part of SEO. As search engines collect more data, they are constantly learning to interpret it. As they get better at it, retaining users on your pages for a certain time period (maybe a benchmark for an industry) may become an important factor in the SEO game.
Behavior feedback is currently used in personalized search.
Bibliography
http://www.searchenginehistory.com/
http://www.webmasterworld.com/forum5/1008.htm
http://en.wikipedia.org/wiki/Web_search_engine
| DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware. |