Advanced Use of Robots.txt
By: Jennifer Sullivan Cassidy
    2005-11-28

    Table of Contents:
  • Advanced Use of Robots.txt
  • MSNbot, Slurp, Googlebot, IA
  • Advanced Robots.txt Commands and Features
  • Meta Tag Instructions and Bandwidth
  • Using Robots.txt for Corporate Security



    Advanced Use of Robots.txt - Meta Tag Instructions and Bandwidth


    (Page 4 of 5)

    Meta Tag Instructions

    There are thousands of search engine robots, far too many to list along with their capabilities and shortcomings.  Many of the lesser-known robots don’t even attempt to read your robots.txt.  So what do you do then?  Many webmasters find it handy to place a few instructions for robots directly in their meta tags.  These tags go in the <head> section like any other meta tags.

    <meta name="robots" content="noindex">

    This meta tag tells the robot not to index this page.

    <meta name="robots" content="noindex,nofollow">

    This tag tells a robot that it should neither index this document nor analyze it for links.

    Other tags you might find useful are:

    <meta name="robots" content="index,follow">
    <meta name="robots" content="noindex,follow">
    <meta name="robots" content="index,nofollow">
    <meta name="robots" content="all">
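
    To make these directives concrete, here is a minimal Python sketch of how a polite robot might read them from a page.  It is an illustration only, not how any particular search engine works; the names RobotsMetaParser, robots_directives, may_index and may_follow are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for item in attr_map.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_index(directives):
    # With no directive at all, the default is to index the page.
    return "noindex" not in directives


def may_follow(directives):
    # With no directive at all, the default is to follow links.
    return "nofollow" not in directives


page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
found = robots_directives(page)
print(sorted(found), may_index(found), may_follow(found))
```

    A page with no robots meta tag yields an empty set, which this sketch treats the same as "index,follow".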

    Unfortunately, there is no way to guarantee that these less-than-polite robots will follow the instructions in your meta tags any more than they follow your robots.txt.  In these extreme cases, your best option is to check your server logs, find the IP address of the erring robot, and ban it.
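
    Finding the offending IP address in your logs can be scripted.  The following Python sketch assumes your server writes the common "combined" access log layout (IP address first, user agent as the last quoted field); the log lines, the IP addresses and the "BadBot" user agent are invented for illustration, and requests_per_ip is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed layout: client IP first, user agent as the last quoted field.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .*"(?P<agent>[^"]*)"\s*$')


def requests_per_ip(log_lines, agent_substring):
    """Count requests per IP for user agents containing agent_substring."""
    counts = Counter()
    needle = agent_substring.lower()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("agent").lower():
            counts[match.group("ip")] += 1
    return counts


# Made-up sample data; in practice you would iterate over your real log file.
SAMPLE_LOG = [
    '203.0.113.9 - - [28/Nov/2005:10:00:00 +0000] "GET /private/a.html HTTP/1.1" 200 512 "-" "BadBot/1.0"',
    '203.0.113.9 - - [28/Nov/2005:10:00:01 +0000] "GET /private/b.html HTTP/1.1" 200 640 "-" "BadBot/1.0"',
    '198.51.100.7 - - [28/Nov/2005:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Googlebot/2.1"',
]

for ip, hits in requests_per_ip(SAMPLE_LOG, "badbot").most_common():
    print(ip, hits)
```

    How you then ban the address depends on your server or firewall, so that step is left out here.  Keep in mind that a user agent string can be faked, so treat the match as a starting point rather than proof.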

    Bandwidth Limitations

    Another problem with letting a search engine spider crawl uninstructed is bandwidth.  A search engine spider could easily eat up a gigabyte of bandwidth in a single crawl.  For those of you paying for a limited amount of bandwidth, this could be a big, or at least expensive, problem.

    If you have no robots.txt file, search engine spiders will request it anyway, and your server will answer with a 404 error.  If you serve a custom 404 Page Not Found page, each of those requests wastes bandwidth on it.  A robots.txt file is small, so having one generally costs less bandwidth than not having one.  The crawl-delay directive can also help, for robots that support it.

    Some webmasters believe that another good way to keep a search engine spider from using too much bandwidth is the revisit-after tag.  However, many consider this a myth.
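
    As a rough illustration of crawl-delay, the Python sketch below parses a small robots.txt with the standard library's urllib.robotparser and reads the delay back the way a cooperating robot might.  The rules themselves (a 10 second delay for Slurp, a blocked /cgi-bin/ directory) are example values, not recommendations.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; adjust the robot names, delay and paths for your site.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
Disallow: /cgi-bin/

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds a robot matching "Slurp" is asked to wait between requests.
slurp_delay = parser.crawl_delay("Slurp")

# Robots covered only by the "*" section get no delay in this example.
other_delay = parser.crawl_delay("SomeOtherBot")

print("Slurp delay:", slurp_delay)
print("Other delay:", other_delay)
print("Slurp may fetch /cgi-bin/search:", parser.can_fetch("Slurp", "/cgi-bin/search"))
```

    Crawl-delay is only a request.  A robot that ignores robots.txt will ignore the delay as well.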

    <meta name="revisit-after" content="15 days">

    Most major search engine robots, including Googlebot, do not honor this tag.  If you feel that Googlebot is crawling too frequently and using too much bandwidth, you can visit Google’s help pages and fill out a form requesting that Googlebot crawl your site less often.

    You can also block all robots except the ones you specify, as well as provide different sets of instructions for different robots.  The robots.txt file is very flexible in this way.
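
    One way this might look is sketched below in Python: a robots.txt that lets Googlebot crawl everything, keeps msnbot out of one directory, and turns every other robot away, checked with the standard library's urllib.robotparser.  The choice of robots and the /images/ path are assumptions made for the example, and allowed is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Example policy: an empty Disallow means "nothing is off limits".
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow: /images/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the rules above let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


for agent in ("Googlebot", "msnbot", "SomeOtherBot"):
    for path in ("/index.html", "/images/logo.gif"):
        print(agent, path, allowed(agent, path))
```

    Each robot follows the most specific section that names it and falls back to the "*" section only when no section does, which is why the catch-all block does not affect Googlebot or msnbot here.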

    © 2003-2008 by Developer Shed. All rights reserved. DS Cluster 2 hosted by Hostway