How Search Engines Work (and Sometimes Don’t)
By: Terri Wells
    2005-12-26

    Table of Contents:
  • How Search Engines Work (and Sometimes Don’t)
  • Stumbling Instead of Crawling
  • More Stumbling Blocks
  • What Do Spiders See in a Hyperlink?

    You know how important it is to score high in the SERPs. But your site isn't reaching the first three pages, and you don't understand why. It could be that you're confusing the web crawlers that are trying to index it. How can you find out? Keep reading.

    You have a masterful website, with lots of relevant content, but it isn’t coming up high in the search engine results pages (SERPs). You know that if your site isn’t on those early pages, searchers probably won’t find you. You can’t understand why you’re apparently invisible to Google and the other major search engines. Your rivals hold higher spots in the SERPs, and their sites aren’t nearly as nice as yours.

    Search engines aren’t people. To handle the tens of billions of pages that make up the World Wide Web, search engine companies have almost completely automated their processes. A software program isn’t going to look at your site with the same “eyes” as a human being. This doesn’t mean that you can’t have a website that is a joy for your visitors to behold. But it does mean that you need to be aware of the ways in which search engines “see” your site differently, and plan around them.

    Despite the complexity of the web and the need to process all that data at speed, search engines perform only a short list of operations to return relevant results to their users. Each of these operations can go awry in certain ways. It isn’t so much that the search engine itself has malfunctioned; it may simply have encountered something it was not programmed to deal with, or the way it was programmed to deal with what it encountered led to less than desirable results.

    Understanding how search engines operate will help you understand what can go wrong. All search engines perform the following four tasks:

    • Web crawling. Search engines send out automated programs, sometimes called “bots” or “spiders,” which follow the web’s hyperlink structure to “crawl” from page to page. By some estimates, search engine spiders have crawled only about half of the pages that exist on the web.

    • Document indexing. After spiders crawl a page, its content needs to be put into a format that makes it easy to retrieve when a user queries the search engine. Thus, pages are stored in a giant, tightly managed database that makes up the search engine’s index. These indexes contain billions of documents, which are delivered to users in mere fractions of a second.

    • Query processing. When a user queries a search engine, which happens hundreds of millions of times each day, the engine examines its index to find documents that match. Queries that look superficially the same can yield very different results. For example, searching for the phrase “field and stream magazine,” without quotes around it, yields more than four million results in Google. Do the same search with the quote marks, and Google returns only 19,600 results, because the quotes ask for pages that contain those words together as an exact phrase. This is just one of many modifiers a searcher can use to give the search engine a better idea of what should count as a relevant result.

    • Ranking results. Google isn’t going to show you all 19,600 results on the same page, and even if it did, it would still need some way to decide which ones should show up first. Thus, the search engine runs an algorithm on the results to calculate which ones are most relevant to the query. These are shown first, with all the others following in descending order of relevance.
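To make the indexing, query processing, and ranking steps above more concrete, here is a minimal sketch in Python. It is not how Google or any real engine is implemented. The four-page corpus, the file names, and the function names are all hypothetical, the unquoted query is assumed to require every word somewhere on the page (a simplification), and "relevance" is reduced to a simple count of query words.

```python
from collections import defaultdict

# Hypothetical toy corpus; a real index holds billions of documents.
PAGES = {
    "a.html": "field and stream magazine covers fishing",
    "b.html": "the stream ran past the field and a magazine rack",
    "c.html": "field and stream magazine subscription field and stream magazine gifts",
    "d.html": "a magazine about gardening",
}

def build_index(pages):
    """Document indexing: map each word to {url: [positions of the word]}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(url, []).append(pos)
    return index

def match_all_words(index, query):
    """Unquoted query (simplified): every word must appear somewhere on the page."""
    words = query.lower().split()
    sets = [set(index.get(w, {})) for w in words]
    return set.intersection(*sets) if sets else set()

def match_phrase(index, query):
    """Quoted query: the words must appear next to each other, in order."""
    words = query.lower().split()
    hits = set()
    for url in match_all_words(index, query):
        starts = index[words[0]][url]
        if any(all(s + i in index[w][url] for i, w in enumerate(words))
               for s in starts):
            hits.add(url)
    return hits

def rank(index, query, urls):
    """Ranking: order pages by how often the query words occur (assumed score)."""
    words = query.lower().split()

    def score(url):
        return sum(len(index.get(w, {}).get(url, [])) for w in words)

    # Highest score first; ties broken by URL so the order is stable.
    return sorted(urls, key=lambda u: (-score(u), u))

if __name__ == "__main__":
    index = build_index(PAGES)
    query = "field and stream magazine"
    print("unquoted:", rank(index, query, match_all_words(index, query)))
    print("quoted:  ", rank(index, query, match_phrase(index, query)))
```

In this toy corpus the unquoted query matches three pages, while the quoted version matches only the two that contain the exact phrase. That is the same narrowing effect as in the "field and stream magazine" example, at a much smaller scale.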

    Now that you have some idea of the processes involved, it’s time to take a closer look at each one. This should help you understand how things go right, and how and why these tasks can go “wrong.” This article will focus on web crawling, while a later article will cover the remaining processes.
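Since crawling is the focus here, a rough sketch of what "using the web's hyperlink structure" means may help. The example below is an illustration only. The `SITE` dictionary stands in for a real website so the code runs without network access, and the `LinkExtractor` class and `crawl` function are hypothetical names. A real spider also fetches pages over HTTP, resolves relative URLs, and honors robots.txt, none of which is shown.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, the only links this toy spider follows."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical site held in memory: URL path -> HTML of that page.
SITE = {
    "/": '<a href="/products">Products</a> <a href="/about">About</a>',
    "/products": '<a href="/">Home</a> <a href="/products/widget">Widget</a>',
    "/products/widget": "<p>A widget.</p>",
    "/about": '<a href="/">Home</a>',
    "/orphan": "<p>No other page links here.</p>",
}

def crawl(site, start="/"):
    """Breadth-first crawl: visit a page, queue every new link found on it."""
    seen = [start]          # pages discovered so far, in discovery order
    queue = deque([start])  # pages still waiting to be read
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(site.get(url, ""))
        for link in parser.links:
            if link in site and link not in seen:
                seen.append(link)
                queue.append(link)
    return seen

if __name__ == "__main__":
    crawled = crawl(SITE)
    print("crawled:", crawled)
    print("missed: ", sorted(set(SITE) - set(crawled)))
```

The `/orphan` page exists but is never reached, because no crawled page links to it. A spider that finds pages only by following hyperlinks cannot index a page that nothing points to.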

    © 2003-2008 by Developer Shed. All rights reserved.