Advanced Use of Robots.txt
By: Jennifer Sullivan Cassidy
    2005-11-28

    Table of Contents:
  • Advanced Use of Robots.txt
  • MSNbot, Slurp, Googlebot, IA
  • Advanced Robots.txt Commands and Features
  • Meta Tag Instructions and Bandwidth
  • Using Robots.txt for Corporate Security



    Advanced Use of Robots.txt - MSNbot, Slurp, Googlebot, IA


    (Page 2 of 5)

    MSNbot

    MSN’s search engine robot is called MSNbot. It has a voracious appetite for spidering websites. Some webmasters love it and try to feed it as much as possible; others see no reason to spend bandwidth on a search engine that doesn't bring them traffic. Either way, MSNbot will not spider your website unless a robots.txt file is present. Once it finds your robots.txt, it wanders the site, almost timidly at first. Then MSNbot builds up courage and indexes files rapidly, so rapidly that using the crawl-delay directive is recommended with this robot. I’ll cover this in more detail later.
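    As a rough illustration of the crawl-delay idea, the sketch below embeds a hypothetical robots.txt that asks MSNbot to wait between requests and reads the value back with Python's standard `urllib.robotparser` module. The 10-second delay and the helper name `crawl_delay_for` are assumptions made for this example, not values recommended in the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the 10-second delay is an illustrative value only.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


msnbot_delay = crawl_delay_for(ROBOTS_TXT, "msnbot")
other_delay = crawl_delay_for(ROBOTS_TXT, "Googlebot")
print(msnbot_delay, other_delay)
```

    Only the msnbot record carries a delay here, so other robots fall through to the catch-all record, which sets none. Crawl-delay is a non-standard extension, so whether a given robot honors it depends on that robot.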

    Recent events could be the cause of this. Several months ago, MSN received many complaints that MSNbot was ignoring directives written into robots.txt files, for example by crawling directories it had been instructed to stay out of. Engineers looked into the problem, and I believe they changed a few things to bring this behavior under control.

    In the process, they may have changed the robot so that it follows robots.txt to the letter; on websites that didn’t have one, it probably got confused and just left, having no letter of the law to go by. This is speculation on my part, but the spidering behavior of the robot seems to fit this assessment.
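    To make "stay out of this directory" concrete, here is a minimal sketch of how a well-behaved robot could evaluate Disallow rules before fetching a page. The robots.txt contents, the example.com URLs, and the helper `is_allowed` are hypothetical; this uses Python's `urllib.robotparser` and is not a description of how MSNbot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping msnbot out of two directories.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /private/
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = is_allowed(ROBOTS_TXT, "msnbot", "http://www.example.com/private/report.html")
open_page = is_allowed(ROBOTS_TXT, "msnbot", "http://www.example.com/index.html")
print(blocked, open_page)
```

    A robot that respects the file would skip the first URL and fetch the second; the complaints described above were about requests of the first kind being made anyway.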

    Yahoo’s Inktomi Slurp

    Yahoo adopted Inktomi’s search engine crawler, which is now known as Slurp. Slurp seems to gobble greedily for a couple of days, disappear, come back, gobble more, and disappear again. Without a robots.txt file, however, it crawls fairly slowly until it just fades away, unless it finds great, unique content. Even then, without a robots.txt file it may not crawl very deeply into your website.

    Googlebot

    On its website, Google instructs webmasters on the use of robots.txt and recommends that you use one. SEOs know that Google’s “guidelines” for webmasters are actually more like step-by-step directions on how to optimize for the search engine. So if Google mentions robots.txt, I would definitely follow those recommendations to a T.

    Google will crawl a site sporadically whether or not a robots.txt file exists, but it will heed the instructions in the file if it is there. Without a robots.txt file, Googlebot has been known to crawl only one or two levels deep.

    IA_Archiver

    Alexa’s search engine robot is called ia_archiver. It is an aggressive spider with a big appetite, but it is also very polite. It tends to limit its crawls to a couple hundred pages at a time, without using excessive bandwidth, and crawls slowly enough not to overload the server. It continues its crawl over a couple of days and then comes back fairly consistently, so much so that by analyzing your web stats you can almost predict when ia_archiver will perform its next crawl. Alexa’s ia_archiver obeys the robots.txt commands and directives.
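    One way to look for such a crawl pattern in your web stats is to count a robot's requests per day. The sketch below assumes an Apache-style combined access log; the sample lines, IP addresses, and the helper `hits_per_day` are made up for illustration, and month names are parsed assuming an English locale.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical access-log excerpt in Apache combined format.
SAMPLE_LOG = """\
192.0.2.10 - - [01/Nov/2005:02:14:07 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "ia_archiver"
192.0.2.10 - - [01/Nov/2005:02:14:39 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" "ia_archiver"
198.51.100.7 - - [01/Nov/2005:09:30:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0"
192.0.2.10 - - [02/Nov/2005:02:20:11 +0000] "GET /news.html HTTP/1.1" 200 4096 "-" "ia_archiver"
"""

# Captures the date inside [...] and the last quoted field (the user agent).
LINE_RE = re.compile(r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\].*"(?P<agent>[^"]*)"\s*$')


def hits_per_day(log_text, agent_substring):
    """Count requests per calendar day whose user agent contains agent_substring."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and agent_substring.lower() in match.group("agent").lower():
            day = datetime.strptime(match.group("day"), "%d/%b/%Y").date()
            counts[day] += 1
    return dict(sorted(counts.items()))


daily = hits_per_day(SAMPLE_LOG, "ia_archiver")
for day, count in daily.items():
    print(day.isoformat(), count)
```

    Run over a few weeks of real logs, the per-day counts would show the bursts and gaps between visits, which is the pattern you would use to estimate the next crawl.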

    There are many other spiders and robots that exhibit particular behaviors when crawling your site. The good ones follow the robots.txt directives; many of the bad ones do not. Later, I’ll show you a few ways to use your robots.txt to help prevent some of the problems search engine robots can cause.
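    To tell the good robots from the bad ones, you could compare what each robot actually requested against what your robots.txt permits. The sketch below is one possible approach, assuming you have already extracted (user agent, path) pairs from your logs; the robots.txt rules, the agent names such as `BadBot/1.0`, and the helper `find_violators` are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that closes one directory to every robot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical (user agent, requested path) pairs, e.g. pulled from an access log.
REQUESTS = [
    ("ia_archiver", "/index.html"),
    ("BadBot/1.0", "/private/notes.html"),
    ("msnbot/1.0", "/products.html"),
    ("BadBot/1.0", "/index.html"),
]


def find_violators(robots_txt, requests):
    """Return the sorted user agents that requested a path robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violators = set()
    for agent, path in requests:
        if not parser.can_fetch(agent, path):
            violators.add(agent)
    return sorted(violators)


violators = find_violators(ROBOTS_TXT, REQUESTS)
print(violators)
```

    Any agent that shows up in the result requested something it was told not to, which makes it a candidate for blocking by other means, since robots.txt alone will not stop it.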

    © 2003-2009 by Developer Shed. All rights reserved. DS Cluster 2 Hosted by Hostway