SEARCH OPTIMIZATION

Advanced Use of Robots.txt
By: Jennifer Sullivan Cassidy
    2005-11-28

    Table of Contents:
  • Advanced Use of Robots.txt
  • MSNbot, Slurp, Googlebot, IA
  • Advanced Robots.txt Commands and Features
  • Meta Tag Instructions and Bandwidth
  • Using Robots.txt for Corporate Security




    (Page 3 of 5)

    Advanced Robots.txt Commands and Features

    While a basic robots.txt file is built from just two kinds of lines, User-agent and Disallow, there are some additional commands and features that can be used.  I should let you know, however, that not all search engine spiders understand these commands.  It’s important to know which ones do and which do not.

    Crawl Delay

    Some robots have been known to crawl web pages at lightning speed, forcing webmasters to ban the robots’ IP addresses or to disallow them from crawling their websites.  Some web servers have automatic flood triggers implemented, with automatic IP-banning software in place.  If a search engine spider crawls too quickly, it can trigger these IP bans, blocking the search engine’s subsequent crawling.  While some of these robots deserve a ban, there are others that you most likely do not want banned.
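
    To make the idea of a flood trigger concrete, here is a minimal sketch of how such a mechanism might work.  It is not taken from any particular server or firewall product; the FloodTrigger class, its thresholds and the example IP addresses are all hypothetical, and real IP-banning software is considerably more involved.

```python
from collections import defaultdict, deque


class FloodTrigger:
    """Hypothetical sliding-window flood detector keyed by client IP."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 1.0):
        self.max_requests = max_requests      # requests tolerated per window
        self.window_seconds = window_seconds  # length of the sliding window
        self.hits = defaultdict(deque)        # ip -> timestamps of recent requests
        self.banned = set()                   # IPs that tripped the trigger

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now`; return False if the IP is banned."""
        if ip in self.banned:
            return False
        recent = self.hits[ip]
        recent.append(now)
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) > self.max_requests:
            self.banned.add(ip)
            return False
        return True


# Assumed thresholds: more than 3 requests within 1 second triggers a ban.
trigger = FloodTrigger(max_requests=3, window_seconds=1.0)

# A spider hitting the server every 0.1 seconds gets banned on its 4th request.
fast = [trigger.allow("192.0.2.1", t) for t in (0.0, 0.1, 0.2, 0.3)]

# A spider waiting 20 seconds between requests never trips the trigger.
slow = [trigger.allow("192.0.2.2", t) for t in (0.0, 20.0, 40.0, 60.0)]

print(fast)
print(slow)
```

    Once an address is banned in this sketch it stays banned, which is exactly why an overeager but otherwise welcome spider can lock itself out of a site.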

    Instead of the following example, which bans the robot from crawling any of your pages, another solution to this problem was offered: the crawl delay command.

    User-agent: MSNbot
    Disallow: /

    MSNbot was probably the most notorious offender.  In an SEO forum, “msndude” gave some insight into this:  “With regards to aggressiveness of the crawl: we are definitely learning and improving. We take politeness very seriously and we work hard to make sure that we are fixing issues as they come up… I also want to make folks aware of a feature that MSNbot supports…what we call a crawl delay. Basically it allows you to specify via robots.txt an amount of time (in seconds) that MSNbot should wait before retrieving another page from that host. The syntax in your robots.txt file would look something like:

    User-Agent: MSNbot
    Crawl-Delay: 20

    “This instructs MSNbot to wait 20 seconds before retrieving another page from that host. If you think that MSNbot is being a bit aggressive this is a way to have it slow down on your host while still making sure that your pages are indexed.”
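
    The difference between the two approaches can be checked with a short script.  This is only a sketch: it uses Python’s standard urllib.robotparser module to read the two rule sets shown above, and the summarize helper and the /index.html path are assumptions made for illustration.  It shows how one parser interprets the rules, not how MSNbot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from the article: an outright ban and a crawl delay.
BAN_RULES = """\
User-agent: MSNbot
Disallow: /
"""

DELAY_RULES = """\
User-Agent: MSNbot
Crawl-Delay: 20
"""


def summarize(robots_txt: str, user_agent: str, path: str = "/index.html"):
    """Hypothetical helper: return (may_fetch, crawl_delay) for one robot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # crawl_delay() returns None when no Crawl-delay line applies.
    return parser.can_fetch(user_agent, path), parser.crawl_delay(user_agent)


ban_result = summarize(BAN_RULES, "MSNbot")
delay_result = summarize(DELAY_RULES, "MSNbot")

print("Disallow: /      ->", ban_result)
print("Crawl-Delay: 20  ->", delay_result)
```

    With the ban, the robot may not fetch the page at all; with the crawl delay, it may still fetch every page but is asked to wait 20 seconds between requests.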

    Other search engine spiders that support this command include Slurp, Ocelli, Teoma/AskJeeves, Spiderline and many others.  Googlebot does not officially support this command; however, it is usually fairly well-mannered and doesn’t need it.  If you are not sure which robots understand this command, a simple question to the search engine’s support team should settle it.  There is a good list of search engine robots at RobotsTxt.org, with contact information in case you are unsure how to reach them.  It’s not always easy to know which search engine a robot belongs to.  You may not know, for example, that Slurp belongs to Yahoo, or that Scooter belonged to AltaVista.

    © 2003-2008 by Developer Shed. All rights reserved.