Search Engines and Algorithms: Search Engine Algorithms Explored

In this article, we are going to look at search engine algorithms, how diverse they are, what they have in common, why it’s important to know their differences, and how to make this information work for you in SEO. There is something for everyone, from the novice to the expert. Over the course of this series, we will look at optimizing your site for specific search engines, as well. The top six major players we will look at in this series are AOL Search, Google, and AskJeeves in the first article; Yahoo! and AltaVista in part 2; MSN in part 3; and in the last article, part 4, we’ll look at MetaSearch Engines.

Just about everyone knows what a search engine is.  Whenever you have a question, want to look up the address of your favorite restaurants or need to make a qualified online purchase, chances are, you visit a search engine on the Internet.

If you’ve ever used two different search engines to conduct the same search query, then you will have noticed that the results weren’t the same.  So why will the same query on different search engines produce different results? Part of the answer is that no two search engine indexes are exactly the same, since each depends on what its spiders find or what information humans have submitted to the database. More importantly, not every search engine uses the same algorithm to search through its database.  An algorithm is what a search engine uses to determine the relevance of the information in the database to what the user is searching for.

What is a Search Engine Algorithm?

A search algorithm is defined as a math formula that takes a problem as input and returns a solution to the problem, usually after evaluating a number of possible solutions.  A search engine algorithm uses keywords as the input problem, and returns relevant search results as the solution, matching these keywords to the results stored in its database.  These keywords are determined by search engine spiders that analyze web page content and keyword relevancy based on a math formula that will vary from one search engine to the next.
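To make the "keywords in, relevant results out" idea concrete, here is a minimal sketch in Python. It assumes a toy database that maps each keyword to the pages a spider has filed under it; the `search` function, the index contents, and the example.com URLs are all made up for illustration, and real engines weigh far more than a simple match count.

```python
from typing import Dict, List


def search(index: Dict[str, List[str]], query: str) -> List[str]:
    """Rank pages by how many of the query's keywords they are indexed under."""
    scores: Dict[str, int] = {}
    for keyword in query.lower().split():
        for url in index.get(keyword, []):
            scores[url] = scores.get(url, 0) + 1
    # Highest score first; ties are broken alphabetically so the order is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical index: keyword -> pages the spider associated with it.
index = {
    "pet": ["example.com/pets", "example.com/supplies"],
    "feeding": ["example.com/supplies"],
    "supplies": ["example.com/supplies", "example.com/office"],
}

print(search(index, "pet feeding supplies"))
```

The page that matches all three keywords comes out on top. Swap the scoring rule and the same index produces a different ordering, which is the heart of why engines disagree.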

Types of Information that Factor into Algorithms

Some services collect information on the queries individual users submit to search services, the pages they look at subsequently, and the time spent on each page. This information is used to return results pages that most users visit after initiating the query. For this technique to succeed, large amounts of data need to be collected for each query. Unfortunately, the potential set of queries to which this technique applies is small, and this method is open to spamming.
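As a rough sketch of this usage-based approach, the snippet below counts which pages users visited after each query and returns the most-visited ones. The log format (query, visited URL) and the function name `rank_by_clicks` are assumptions for illustration; it ignores time spent on each page, and it shows why the method needs a lot of data per query and why fake clicks can skew it.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_by_clicks(click_log: Iterable[Tuple[str, str]], query: str, top_n: int = 3) -> List[str]:
    """Return the pages most often visited after `query`, most visited first."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for logged_query, url in click_log:
        counts[logged_query.lower()][url] += 1
    return [url for url, _ in counts[query.lower()].most_common(top_n)]


# Hypothetical log entries: (query the user typed, page the user then visited).
click_log = [
    ("pets", "example.com/pets"),
    ("pets", "example.com/adopt"),
    ("pets", "example.com/pets"),
    ("cars", "example.com/cars"),
]

print(rank_by_clicks(click_log, "pets"))
```

A query that never appears in the log returns an empty list, which is the "small set of queries" limitation in practice.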

Another approach involves analyzing the links between pages on the web, on the assumption that pages on the same topic link to each other, and that authoritative pages tend to point to other authoritative pages.  By analyzing how pages link to each other, an engine can determine both what a page is about and whether that page is considered relevant.  Similarly, some search engine algorithms figure internal link navigation into the picture.  Search engine spiders follow internal links to weigh how each page relates to another, and consider the ease of navigation.  If a spider runs into a dead-end page with no way out, this can be weighed into the algorithm as a penalty.
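You can check your own site for dead-end pages before a spider finds them. The sketch below assumes you already have a map of each page to the internal links it contains; the `find_dead_ends` name and the sample paths are hypothetical, and how (or whether) any given engine penalizes such pages is not public.

```python
from typing import Dict, List


def find_dead_ends(links: Dict[str, List[str]]) -> List[str]:
    """Return every known page that has no outgoing internal links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)  # pages that are linked to but never listed as a source
    return sorted(page for page in pages if not links.get(page))


# Hypothetical internal link map: page -> pages it links to.
site_links = {
    "/": ["/products", "/about"],
    "/products": ["/", "/products/widget"],
    "/about": ["/"],
    "/products/widget": [],
}

print(find_dead_ends(site_links))
```

Here the widget page is the only one a spider could not leave by following a link.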

Original search engine databases were made up of all human classified data.  This is a fairly archaic approach, but there are still many directories that make up search engine databases, like the Open Directory (also known as DMOZ), that are entirely classified by people.  Some search engine data are still managed by humans, but after the algorithmic spiders have collected the information.

One of the elements that a search engine algorithm scans for is the frequency and location of keywords on a web page. Pages on which a keyword appears more frequently are typically considered more relevant to it.  This is referred to as keyword density.  Some search engine algorithms also factor in where the keywords are located on a page.
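Keyword density is easy to estimate yourself. The sketch below uses one common definition, the share of a page's words taken up by occurrences of the phrase, expressed as a percentage. Definitions vary between tools, so treat the `keyword_density` function and its formula as an assumption rather than the calculation any particular engine uses; it also ignores where on the page the keywords appear.

```python
import re


def keyword_density(text: str, phrase: str) -> float:
    """Percentage of the words in `text` accounted for by occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase appears as a run of consecutive words.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return 100.0 * hits * n / len(words)


page_text = "Pet feeding supplies for every pet. Our pet store ships fast."
print(round(keyword_density(page_text, "pet"), 2))
```

Running it on your own copy gives a quick sense of whether a phrase is barely present or suspiciously overused.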

Like keywords and usage information, meta tag information has been abused.  Many search engines do not factor in meta tags any longer, due to web spam.  But some still do, and most look at Title and Descriptions.  There are many other factors that search engine algorithms figure into the calculation of relevant results.  Some utilize information like how long the website has been on the Internet, and still others may weigh structural issues, errors encountered, and more.

Why are Search Engines so different?

Search engine algorithms are highly secret, competitive things.  No one knows exactly what each search engine weighs and what importance it attaches to each factor in the formula, which leads to a lot of assumption, speculation, and guesswork.  Each search engine employs its own filters to remove spam, and each even has its own guidelines for determining what web spam is!

Search engines generally implement two or three major updates every year.  One simply has to follow the patent filings to know this.  Even if you are not interested in a patent itself, the filing may give you a heads-up about changes that are likely to follow in a search engine algorithm.

Another reason that search engines are so diverse is the widespread use of technology filters to sort out web spam.  Some search engines change their algorithms to include certain filters, while others don’t change the basic algorithms, yet implement filters on top of the basic calculations.  According to the dictionary, filters are essentially “higher-order functions that take a predicate and a list and returns those elements of the list for which the predicate is true.”  A simpler way to think of a search engine filter is as a water purifier:  the water passes through a device made of porous material that removes unwanted impurities.  A search engine filter also seeks to remove unwanted “impurities” from its results.
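The dictionary definition maps almost directly onto code. Below is a small sketch of a filter applied on top of an already ranked result list, leaving the basic ranking untouched. The `apply_filter` helper, the `not_keyword_stuffed` predicate, and the 10% threshold are invented for illustration; no search engine publishes its actual spam rules.

```python
from typing import Any, Callable, Dict, List

Result = Dict[str, Any]


def apply_filter(predicate: Callable[[Result], bool], results: List[Result]) -> List[Result]:
    """Keep only the results for which the predicate is true, preserving rank order."""
    return [result for result in results if predicate(result)]


def not_keyword_stuffed(result: Result) -> bool:
    """Hypothetical spam rule: reject pages with a keyword density above 10%."""
    return result["keyword_density"] <= 10.0


# Hypothetical ranked results, best first.
ranked_results = [
    {"url": "example.com/guide", "keyword_density": 2.5},
    {"url": "example.com/stuffed", "keyword_density": 35.0},
    {"url": "example.com/shop", "keyword_density": 4.0},
]

clean_results = apply_filter(not_keyword_stuffed, ranked_results)
print([result["url"] for result in clean_results])
```

Because the filter sits after the ranking step, an engine can add, remove, or tighten predicates without touching its core formula, which is one reason the same page can vanish from one engine and stay put in another.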

How to achieve good SEO among all the mainstream search engines

It may seem like a daunting task to please all of the many search engines out there, as there are thousands. Still, there are several pieces of advice I would give in order to streamline your SEO efforts among all of the major search engines.

Do your keyword research.  This means learning how many times your keywords are being searched for every day, what competition you have, and how these relate to your content in each page.

Select 3 – 5 phrases to optimize each page, instead of a whole slew of keywords.  The more keywords you try to use, the more diluted your keyword density becomes.  Use keywords for each page, not geared toward the entire site.  (Keyword density ranges for all of the above websites run from .07% for Google to 1.7% for Yahoo.)

Write unique, compelling titles for each page.  Titles are still important to all of the top search engines.

Focus on writing unique content that adds value to users and incorporates valuable keywords.  Write for ideas, not keywords.  When you are finished with your ideas, your keywords should result from the content, and not the content from the keywords.

Ensure site architecture and design do not prohibit thorough search engine crawling.  Clear navigation, readable code, no broken links, and validated markup not only make it easier for the search engine spiders to crawl your website, but also make your pages stickier for your visitors.

Build high-quality, relevant, in-bound links.  Since all search engines rely upon in-bound links to rank the relevancy of a site, it is good to concentrate on this area.  Don’t inflate your backlinks with artificial links.  Build organic, natural links, and keep the sites you link to relevant.  Avoid link directories on your website.  In the case of all search engines, the more links, the better.

Be familiar with how the top search engines work. When you do your homework and understand the workings of the search engines, you can better determine what they look for in a website they consider relevant, and what practices to stay away from.

Stay with it. SEO is not a one-time project. Continual growth in content, links, and pages is required for long-term success.  The key to keeping your site relevant to search engines is fresh and unique content.  This applies to keywords, links and content.

Don’t sweat the small stuff.  Changes in search engine algorithms should not affect you too much if you follow the basic principles of SEO.  While tweaking may be necessary, there is certainly no cause for alarm.

So why is understanding the search engines so important?  As my son would say, “Mom, you gotta know your enemy.”  Well, not necessarily the enemy in this case, but what he says has a grain of truth in it when it comes to search engines.  You have to be familiar with the search engines, if you ever hope to optimize for them.  The more you familiarize yourself with the way they work and treat websites in relevancy of search results, the better your chances will be for ranking in those search engines.

If you optimize only for Google or Yahoo, however, you’ll be missing out on two-thirds of all of the search engine traffic, and only optimizing for MSN means you’ll lose about 85% of potential search traffic.  Optimizing for several search engines is going to be more easily done if you understand the basic concepts of each search engine.

Does this mean that any time a search engine changes you have to go running to change your website?  No.  In fact, one of the most reasonable pieces of advice I can give you when a search engine algorithm changes is not to panic.  However, when a major update is implemented, being familiar with the search engines is your best possible defense; that way, the new filters don’t force upon you a huge learning curve.

So while trying to figure out the exact formulas for each search engine’s algorithm will be a never-ending, insurmountable source of frustration for any SEO, focusing instead on ways to make all of them happy at once seems like a better use of your time, if not just as frustrating.  But if you follow the basic concepts of SEO, then you’ll not only be found in the search engine results, but be a more relaxed individual in the process.

Differences in the Mainstream Search Engines

Knowing your “enemy” will go a long way in helping you understand why your website may be performing the way it is in a particular search engine.  Search engine algorithms change constantly, some daily.  There is no way to know exactly when or how a search engine will change, but there are trends to follow.  There are some major factors that every engine figures into its relevancy scores, but each engine weights them differently.

The breakdown of the market share of search queries that these six search engines currently control is listed below.  These figures were taken from Forbes.com for August, 2005.

  • Google: 37.3%, up from 36.5% in July
  • Yahoo: 29.7%, down from 30.3% in July
  • MSN: 15.8%, up from 15.5% in July
  • AOL: 9.6%, down from 9.9% in July
  • AskJeeves: 3%
  • AltaVista: 1.6%
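These shares are also where the earlier “two-thirds” and “85%” estimates come from. The short sketch below does the arithmetic: it subtracts the share of the engines you optimize for from 100% of search queries. The `missed_share` function is just an illustration, and it assumes each query goes to exactly one engine.

```python
from typing import Iterable

# August 2005 market share of search queries, in percent, from the list above.
SHARES = {
    "Google": 37.3,
    "Yahoo": 29.7,
    "MSN": 15.8,
    "AOL": 9.6,
    "AskJeeves": 3.0,
    "AltaVista": 1.6,
}


def missed_share(optimized_for: Iterable[str]) -> float:
    """Percentage of search queries made on engines you did not optimize for."""
    covered = sum(SHARES[name] for name in optimized_for)
    return round(100.0 - covered, 1)


for engine in ("Google", "Yahoo", "MSN"):
    print(engine, missed_share([engine]))
```

Optimizing for Google alone leaves roughly 63% of queries on the table, Yahoo alone about 70%, and MSN alone about 84%.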

AOL Search

AOL claims to be the first major search engine featuring “clustering” technology. Search results will be automatically clustered into relevant topics and displayed alongside the list of general results, using technology licensed by Vivisimo.  Today, however, 80% of searches done on AOL use Google’s databases.  For all intents and purposes, optimizing for AOL Search is currently very similar to optimizing for Google.  Of the top ten results for “pets” in AOL Search and Google, none differ, and for the search term “pet feeding supplies,” only one does.  Similar results were achieved for other randomly chosen keywords and phrases.  AOL Search queries make up approximately 16% of searches on the web.

Google

Google uses what is commonly known as the Hilltop Algorithm, or the “Austin Update.”  Hilltop emphasizes the voting power of what it considers “authority sites.” These are websites or pages that Google assesses to be of strong importance on a particular keyword topic.

The concept of link popularity was developed to improve the quality of search results and, in particular, to make search engines resistant to automatically generated web pages built around content-specific ranking criteria (doorway pages).  Link popularity weighs very heavily in Google’s PageRank algorithm.

According to Wikipedia, “PageRank is a family of algorithms for assigning numerical weightings to hyperlinked documents (or web pages) indexed by a search engine.”
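To see how links can turn into numerical weightings, here is a sketch of the simplified, textbook form of PageRank, computed by repeated passes over a tiny link graph. It uses the commonly cited damping factor of 0.85 and spreads the rank of pages with no outgoing links evenly across all pages. This is an illustration only: the `pagerank` function and the four-page graph are assumptions, and whatever Google actually runs is not public.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Simplified PageRank by power iteration over a page -> outgoing links map."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared evenly by all pages.
        dangling = sum(ranks[page] for page in pages if not links.get(page))
        new_ranks = {page: (1.0 - damping) / n + damping * dangling / n for page in pages}
        for page, targets in links.items():
            if targets:
                # Each page passes an equal share of its rank to every page it links to.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical graph: pages a, b, and d all link to c; c links back to a.
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Page c collects the most "votes" and ends up on top, and page a ranks well simply because the heavily linked page c points to it. That is link popularity in miniature.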

There are over 100 factors that are calculated into Google’s PageRank algorithm.  The exact weight given to each factor is unknown, but we do know that backlinks are probably among the most heavily weighted factors when determining relevancy.  Some other factors might be: keyword density, date of domain registration or age of the website, clean design, error-free pages, text navigation, absence of spam, and more.  We aren’t sure if these things are factored into the algorithm, or are just used in filters employed by Google.

Google is probably the most mysterious of all of the top search engines, and is currently the most popular search engine.  Google is estimated to have about 35% of the searches made.  While currently only ranking at #3, we could easily expect this to change in the near future.

AskJeeves

AskJeeves is a directory built entirely by human editors, with results presented as a set of related questions for which “answers,” or links, exist.  Enter your question in plain English into the text box, then click on the “Ask” button and Jeeves does the rest.  Now, the search engine’s being rebranded, and its mascot, the butler, is soon to be history.

This search engine is powered by Teoma.  Fellow SEOs have made a few observations.  Teoma holds orphans in its index longer than any other search engine, so if you use redirects, whether temporary (302) or permanent (301), there is a high chance that your old URLs will stay in the index for a long time.  In the past, the search engine’s re-index schedule was between three and six months, with an occasional couple of crawls in one month, but re-indexing has always been fairly sporadic and mainly shallow, and sponsored listings seem to receive far more attention than any site listed organically in the index.  There is no way to submit your website to Teoma’s or AskJeeves’ index unless you pay for sponsored listings; otherwise, you just wait for the robot, Ask, to crawl your site.  This can take a long time.

So for a pay-per-click platform, AskJeeves seems to be a good place to advertise.  AskJeeves displays sponsored listings from Google’s AdWords program.  If you have pretty good qualified traffic from your AdWords or you have an Amazon.com store, then you’ll have better luck with AskJeeves.  The key word here is “qualified.”  There seems to be a higher amount of click fraud, by nature, with AskJeeves.  I believe the primary reason for this is the way that AskJeeves SERPs are shown.  In Google, you have sponsored listings that are distinctly away from the organic listings: either highlighted in blue at the top, or down the right side.  With AskJeeves, your organic results look just like pay-for-placement listings.

In the next article, we will look at AltaVista and Yahoo’s search engines in depth, showing you what they specifically look for in a web page to deem it relevant to their search results.  Stay tuned!
