Search Optimization
I, Robots.txt
By: Jamesp
    2008-03-26

    Table of Contents:
  • I, Robots.txt
  • A Few Pointers
  • Creating Your First Robots.txt
  • Limiting Bots
  • Commenting and Loosely Supported Extensions



    I, Robots.txt - Creating Your First Robots.txt


    (Page 3 of 5 )


    If you don't want any bots to index your site, you would type the following into your text file:


    User-agent: *
    Disallow: /

    In this example, the “*” is known as a wildcard and says that the rule applies to all bots. A wildcard is a special character that could stand for anything. In typical usage, if you write d*ng, a computer can interpret this as being: “ding”, “dang”, “dong”, “dung”, “dzing” and so forth. Simply put, the “*” could be anything.

    The Disallow part says that no directory or file should be scanned. It's important to note how this works. The pattern in a Disallow line is matched with a prefix comparison against the path portion of the URL. The robot sees what is written there and asks, “Does the path to this directory or file begin with this?” For instance, let's say our site is www(dot)sample(dot)com. If I have a directory called “images,” it would be listed as www(dot)sample(dot)com/images/, and its path is /images/.

    In this instance, the bot sees that the path /images/ begins with “/” and will skip it. Since every path on the site begins with “/”, the rule covers the whole site.
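
    The comparison can be sketched in a few lines of Python. This is only an illustration of the matching idea under the original robots.txt convention, where a Disallow value is compared against the start of the URL path; the function name is_disallowed is made up for this example, and real bots may apply extra rules of their own.

```python
def is_disallowed(path, disallow_values):
    """Return True if the URL path starts with any non-empty Disallow value.

    Hypothetical helper for illustration; not part of any library.
    """
    for value in disallow_values:
        # An empty Disallow value matches nothing, so skip it.
        if value and path.startswith(value):
            return True
    return False


# "Disallow: /" blocks everything, because every URL path starts with "/".
print(is_disallowed("/images/", ["/"]))     # True
print(is_disallowed("/index.html", ["/"]))  # True

# An empty Disallow value blocks nothing.
print(is_disallowed("/images/", [""]))      # False
```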

    To allow all bots to visit every file and directory, you would write this in your file:


    User-agent: *
    Disallow:

    Again, User-Agent uses the wildcard to say that whatever is in the Disallow line applies to all bots. Since the Disallow is blank, there is nothing to match, and so all files and directories are available.
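
    If you would like to check the two files side by side, Python's standard library ships a robots.txt parser, urllib.robotparser. The sketch below feeds it each file as a string rather than fetching anything; the bot name ExampleBot and the helper load_rules are assumptions made up for this example, and the URL is the article's placeholder site.

```python
from urllib.robotparser import RobotFileParser


def load_rules(text):
    """Parse robots.txt text held in a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The "block everything" file and the "allow everything" file.
block_all = load_rules("User-agent: *\nDisallow: /\n")
allow_all = load_rules("User-agent: *\nDisallow:\n")

url = "http://www.sample.com/images/"
print(block_all.can_fetch("ExampleBot", url))  # False
print(allow_all.can_fetch("ExampleBot", url))  # True
```

    Keep in mind that this only tells you how one parser reads the file. Each bot decides for itself whether to honor it.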

    If you want every bot to ignore one directory, you would write:


    User-agent: *
    Disallow: /images/

    Again, the wildcard says all bots should follow the Disallow. The Disallow asks the bots to stay away from /images/. If the bots are compliant, they won't scan this directory or the files therein. Note that I wrote “/images/” and not “/images”. You always want to include that final forward slash (/) when you mean a directory; without it, the rule would also match any other path that begins with /images, such as /imagestwo/.
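
    To see why the final slash matters, here is a small sketch using Python's urllib.robotparser, which compares each Disallow value against the start of the path. The file name photo.jpg, the bot name ExampleBot, and the helper load_rules are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def load_rules(text):
    """Parse robots.txt text held in a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


with_slash = load_rules("User-agent: *\nDisallow: /images/\n")
without_slash = load_rules("User-agent: *\nDisallow: /images\n")

inside = "http://www.sample.com/images/photo.jpg"
neighbor = "http://www.sample.com/imagestwo/photo.jpg"

# With the trailing slash, only the /images/ directory is off limits.
print(with_slash.can_fetch("ExampleBot", inside))       # False
print(with_slash.can_fetch("ExampleBot", neighbor))     # True

# Without it, /imagestwo/ is caught too, since it also begins with "/images".
print(without_slash.can_fetch("ExampleBot", neighbor))  # False
```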

    To tell all bots not to scan a specific file, we use this code:


    User-agent: *
    Disallow: /images/biggorillaonatricycle.jpg

    Now all bots should scan everything except the biggorillaonatricycle image. When they find that picture in the “images” directory, they look away, even though, let's face it, who wouldn't want to see that? An important thing to note here is that if we had, say, a secondary directory (named "imagestwo" perhaps) that held some photos and included the same picture, the bots would still scan that copy, unless you told them otherwise.
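
    A quick check with Python's urllib.robotparser shows the same thing: the rule covers one exact path, so a copy of the picture in another directory is still fair game. This is a sketch only; ExampleBot, load_rules, and the second file name banana.jpg are made up for this example.

```python
from urllib.robotparser import RobotFileParser


def load_rules(text):
    """Parse robots.txt text held in a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(
    "User-agent: *\n"
    "Disallow: /images/biggorillaonatricycle.jpg\n"
)

site = "http://www.sample.com"

# The named picture is blocked.
print(rules.can_fetch("ExampleBot", site + "/images/biggorillaonatricycle.jpg"))     # False
# Other files in the same directory are not.
print(rules.can_fetch("ExampleBot", site + "/images/banana.jpg"))                    # True
# Neither is the copy sitting in a different directory.
print(rules.can_fetch("ExampleBot", site + "/imagestwo/biggorillaonatricycle.jpg"))  # True
```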

    Here is how you could make it so that neither picture of our buddy the gorilla riding on his tricycle gets scanned:


    User-agent: *
    Disallow: /images/biggorillaonatricycle.jpg
    Disallow: /imagestwo/biggorillaonatricycle.jpg

    This rule applies to directories as well:


    User-agent: *
    Disallow: /images/
    Disallow: /imagestwo/
    Disallow: /aboutus/

    The above tells all bots to ignore the three directories. Note that we can also mix our directories and files together:


    User-agent: *
    Disallow: /images/
    Disallow: /imagestwo/
    Disallow: /aboutus/wearereallyevil.html
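
    As a rough way to test a mixed file like this one before uploading it, you could run a list of paths through Python's urllib.robotparser and see which ones survive. This is a sketch under a few assumptions: the bot name ExampleBot and the paths logo.png, banner.png, and index.html are invented for the example, and the result reflects this one parser, not every bot.

```python
from urllib.robotparser import RobotFileParser

robots_txt = (
    "User-agent: *\n"
    "Disallow: /images/\n"
    "Disallow: /imagestwo/\n"
    "Disallow: /aboutus/wearereallyevil.html\n"
)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Paths a bot might want to visit on the sample site.
paths = [
    "/images/logo.png",
    "/imagestwo/banner.png",
    "/aboutus/wearereallyevil.html",
    "/aboutus/index.html",
    "/index.html",
]

# Keep only the paths the rules leave open.
allowed = [
    path for path in paths
    if parser.can_fetch("ExampleBot", "http://www.sample.com" + path)
]
print(allowed)  # ['/aboutus/index.html', '/index.html']
```

    Both directories are closed off entirely, while /aboutus/ stays open apart from the one page named in the file.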





    © 2003-2009 by Developer Shed. All rights reserved.