SEARCH OPTIMIZATION

I, Robots.txt
By: Jamesp
    2008-03-26

    Table of Contents:
  • I, Robots.txt
  • A Few Pointers
  • Creating Your First Robots.txt
  • Limiting Bots
  • Commenting and Loosely Supported Extensions


    I, Robots.txt - Creating Your First Robots.txt
    (Page 3 of 5 )


    If you don't want any bots to scan your site, type the following into your text file:


    User-agent: *
    Disallow: /

    In this example, the “*” is known as a wildcard, and on the User-agent line it says that the rule applies to all bots. A wildcard is a special character that can stand for anything. In typical usage, if you write d*ng, a computer can interpret this as “ding,” “dang,” “dong,” “dung,” “dzing” and so forth. Simply put, the “*” could be anything.

    The Disallow line says that no directory or file should be scanned. It's important to note how this works. The value in a Disallow line is compared with the beginning of each URL's path (a prefix comparison, not a search anywhere in the string). The robot sees what is written there and asks, “Does the path of this directory or file start with this?” For instance, let's say our site is www(dot)sample(dot)com. If I have a directory called “images,” it would be listed as www(dot)sample(dot)com/images/, and its path is /images/.

    In this instance, the bot sees that the path /images/ starts with “/” and will skip it. Since every path on the site starts with “/”, the bot skips the whole site.
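    As an illustration only, here is a minimal sketch that uses Python's standard urllib.robotparser module (not something the article relies on) to check a few URLs against this file. The bot names and the URLs on www(dot)sample(dot)com are hypothetical, and real crawlers use their own parsers, so treat this as an approximation of how a compliant bot reads the rules.

```python
from urllib.robotparser import RobotFileParser

# The "keep every bot out" file, one directive per line. Python's parser
# treats a blank line as the end of a record, so none appears between them.
rules = [
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# Hypothetical bot names: the "*" record applies to all of them.
bots = ["Googlebot", "Bingbot", "SomeOtherCrawler"]
# Hypothetical URLs: every path starts with "/", so every URL matches.
urls = [
    "http://www.sample.com/",
    "http://www.sample.com/images/",
    "http://www.sample.com/aboutus/index.html",
]

for bot in bots:
    for url in urls:
        print(bot, url, "allowed" if parser.can_fetch(bot, url) else "blocked")
```

    Every combination prints “blocked”: the “*” record covers all three bot names, and every path begins with “/”.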

    To allow all bots to visit every file and directory, you would write this in your file:


    User-agent: *
    Disallow:

    Again, the User-agent line uses the wildcard to say that whatever is in the Disallow line applies to all bots. Since the Disallow value is blank, there is nothing to match, and so all files and directories are available.
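    The same kind of check, again a sketch using Python's urllib.robotparser with hypothetical URLs, shows the empty Disallow letting everything through:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# An empty Disallow value matches no path, so nothing is off limits.
parser.parse(["User-agent: *", "Disallow:"])

for url in ["http://www.sample.com/",
            "http://www.sample.com/images/biggorillaonatricycle.jpg"]:
    print(url, parser.can_fetch("AnyBot", url))  # True for both
```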

    If you want every bot to ignore one directory, you would write:


    User-agent: *
    Disallow: /images/

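    Again, the wildcard says all bots should follow the Disallow. The Disallow asks the bots to stay away from /images/. If the bots are compliant, they won't scan this directory or the files in it. Note that I wrote “/images/” and not “/images”. You always want to include that final forward slash (/) when you mean a directory: because the value is matched against the beginning of each path, “/images” would also block paths such as /imagestwo/ and /images.html.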
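    A small sketch makes the trailing-slash difference concrete. It assumes Python's urllib.robotparser, which compares each Disallow value with the start of the URL path; blocked() is a hypothetical helper, and the paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def blocked(rule, url):
    """Return True if a single Disallow rule for all bots blocks url.

    Hypothetical helper built on the standard-library parser.
    """
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return not parser.can_fetch("AnyBot", url)


# With the trailing slash, only the /images/ directory is matched.
print(blocked("/images/", "http://www.sample.com/images/cat.jpg"))     # True
print(blocked("/images/", "http://www.sample.com/imagestwo/cat.jpg"))  # False
# Without it, the prefix also catches /imagestwo/ and /images.html.
print(blocked("/images", "http://www.sample.com/imagestwo/cat.jpg"))   # True
print(blocked("/images", "http://www.sample.com/images.html"))         # True
```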

    To tell all bots not to scan a specific file, we use this code:


    User-agent: *
    Disallow: /images/biggorillaonatricycle.jpg

    Now all bots should scan everything except the biggorillaonatricycle image. When a compliant bot finds that picture in the /images/ directory, it looks away, even though, let's face it, who wouldn't want to see that? An important thing to note here is that if we had, say, a second directory (named “imagestwo” perhaps) that held some photos, including a copy of the same picture, the bots would still scan that copy unless you told them otherwise.
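    To confirm that only the one file is affected, here is a sketch with Python's urllib.robotparser; the second image name (othergorilla.jpg) and the URLs are hypothetical:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /images/biggorillaonatricycle.jpg",
])

base = "http://www.sample.com"  # hypothetical site
print(parser.can_fetch("AnyBot", base + "/images/biggorillaonatricycle.jpg"))     # False
print(parser.can_fetch("AnyBot", base + "/images/othergorilla.jpg"))              # True
print(parser.can_fetch("AnyBot", base + "/imagestwo/biggorillaonatricycle.jpg"))  # True
```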

    Here is how you could make sure that neither picture of our buddy the gorilla riding his tricycle gets scanned:


    User-agent: *
    Disallow: /images/biggorillaonatricycle.jpg
    Disallow: /imagestwo/biggorillaonatricycle.jpg
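    Checking both copies with Python's urllib.robotparser (a sketch; the URLs are hypothetical) shows that each Disallow line covers only the path it names, so both lines are needed. That particular parser tests the lines in order and uses the first one that matches the path.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /images/biggorillaonatricycle.jpg",
    "Disallow: /imagestwo/biggorillaonatricycle.jpg",
])

base = "http://www.sample.com"  # hypothetical site
for folder in ["images", "imagestwo"]:
    url = f"{base}/{folder}/biggorillaonatricycle.jpg"
    print(url, parser.can_fetch("AnyBot", url))  # False for both copies
```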

    The same approach works for directories; just give each one its own Disallow line:


    User-agent: *
    Disallow: /images/
    Disallow: /imagestwo/
    Disallow: /aboutus/

    The above tells all bots to ignore the three directories. Note that we can also mix our directories and files together:
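    If you keep the list of directories somewhere else, a short script can write the file for you. The build_robots_txt() function below is a hypothetical helper, not part of any tool mentioned here; it writes a single record for all bots and adds one Disallow line per path, with no blank lines inside the record, because some parsers treat a blank line as the end of a record.

```python
def build_robots_txt(disallowed_paths):
    """Return robots.txt text asking every bot to skip the given paths.

    Hypothetical helper: one record for all bots ("*"), one Disallow
    line per path, and no blank lines inside the record.
    """
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


text = build_robots_txt(["/images/", "/imagestwo/", "/aboutus/"])
print(text, end="")
```

    The printed text matches the three-directory example; you would save it as robots.txt in the root directory of the site.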


    User-agent: *
    Disallow: /images/
    Disallow: /imagestwo/
    Disallow: /aboutus/wearereallyevil.html
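    Here is a sketch that checks the mixed file with Python's urllib.robotparser. The page contact.html and the other URLs are hypothetical; the point is that blocking one page in /aboutus/ leaves the rest of that directory open.

```python
from urllib.robotparser import RobotFileParser

# The mixed file, as it would appear on disk.
robots_txt = """User-agent: *
Disallow: /images/
Disallow: /imagestwo/
Disallow: /aboutus/wearereallyevil.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

base = "http://www.sample.com"  # hypothetical site
for path in ["/images/cat.jpg",                # directory rule
             "/imagestwo/cat.jpg",             # directory rule
             "/aboutus/wearereallyevil.html",  # single-file rule
             "/aboutus/contact.html"]:         # hypothetical page, not listed
    print(path, parser.can_fetch("AnyBot", base + path))
```

    Only the last path prints True.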





    © 2003-2008 by Developer Shed. All rights reserved. DS Cluster 4 hosted by Hostway