I, Robots.txt - Creating Your First Robots.txt
If you don't want any bots to scan your site, type the following into your text file:
User-agent: *
Disallow: /
In this example, the “*” is a wildcard, and it says that the rule applies to all bots. A wildcard is a special character that can stand for anything. In typical usage, if you write d*ng, a computer can interpret it as “ding”, “dang”, “dong”, “dung”, “dzing”, and so forth. Simply put, the “*” could be anything.
The Disallow line says which directories and files should not be scanned. It's important to understand how this works. The value in a Disallow line is matched against the beginning of each URL path, which is a prefix comparison. Before fetching a file or directory, the robot asks, “Does this path start with what is written here?” For instance, say our site is www(dot)sample(dot)com. If I have a directory called “images,” its address would be www(dot)sample(dot)com/images/, and its path is /images/.
That path begins with “/”, as every path on the site does, so the bot matches the rule and skips /images/ along with every other file and directory on the site.
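To see the prefix rule in action, here is a minimal sketch using Python's standard-library `urllib.robotparser` module. This is one parser's reading of the rules, and real crawlers may differ in the details. The crawler name “ExampleBot” and the sample paths are hypothetical, and www.sample.com stands in for the article's example site.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" robots.txt from the example, one string per line.
rules = ["User-agent: *", "Disallow: /"]

parser = RobotFileParser()
parser.parse(rules)

# "ExampleBot" is a hypothetical crawler name; any name matches "*".
for url in ("https://www.sample.com/",
            "https://www.sample.com/images/",
            "https://www.sample.com/aboutus/index.html"):
    # Every path begins with "/", so each check prints False.
    print(url, parser.can_fetch("ExampleBot", url))
```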
To allow all bots to visit every file and directory, you would write this in your file:
User-agent: *
Disallow:
Again, User-agent uses the wildcard to say that the Disallow line applies to all bots. Because the Disallow value is blank, there is nothing to match, so all files and directories are available.
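Here is a quick check of the empty Disallow case, again sketched with Python's standard-library `urllib.robotparser`. Other crawlers may interpret edge cases differently. “ExampleBot” is a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

# "Allow everything": the Disallow line is present but empty.
rules = ["User-agent: *", "Disallow:"]

parser = RobotFileParser()
parser.parse(rules)

# "ExampleBot" is a hypothetical crawler name.
# An empty Disallow value blocks nothing, so both checks print True.
print(parser.can_fetch("ExampleBot", "https://www.sample.com/"))
print(parser.can_fetch("ExampleBot", "https://www.sample.com/images/"))
```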
If you want every bot to ignore one directory, you would write:
User-agent: *
Disallow: /images/
Again, the wildcard says all bots should follow the Disallow line, which asks them to stay away from /images/. Compliant bots won't scan this directory or any of the files in it. Note that I wrote “/images/” and not “/images”. Because matching is by prefix, “/images” would also block anything else whose path starts with those characters, such as /imagestwo/ or /images.html. When you mean a directory, always include that final forward slash (/).
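The trailing-slash point is easy to verify. The sketch below uses Python's standard-library `urllib.robotparser` and compares the two spellings. The helper `allowed`, the crawler name “ExampleBot”, and the file `gorilla.jpg` are hypothetical, chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

def allowed(rules, url, agent="ExampleBot"):
    """Hypothetical helper: True if `agent` may fetch `url` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(agent, url)

with_slash = ["User-agent: *", "Disallow: /images/"]
without_slash = ["User-agent: *", "Disallow: /images"]

# The /images/ directory is blocked either way.
print(allowed(with_slash, "https://www.sample.com/images/gorilla.jpg"))        # False
# With the slash, the unrelated /imagestwo/ directory stays open...
print(allowed(with_slash, "https://www.sample.com/imagestwo/gorilla.jpg"))     # True
# ...but without it, the prefix "/images" also catches /imagestwo/.
print(allowed(without_slash, "https://www.sample.com/imagestwo/gorilla.jpg"))  # False
```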
To tell all bots not to scan a specific file, we use this code:
User-agent: *
Disallow: /images/biggorillaonatricycle.jpg
Now all bots should scan everything except the biggorillaonatricycle image. When they find that picture in the “images” directory, they look away, even though, let's face it, who wouldn't want to see that? Note, however, that the rule names only that one path. If we had a second directory (named “imagestwo”, perhaps) holding a copy of the same picture, the bots would still scan that copy unless you told them otherwise.
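A short sketch with Python's standard-library `urllib.robotparser` shows that blocking one file leaves its neighbours and any copies elsewhere reachable. “ExampleBot” and “otherphoto.jpg” are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

rules = ["User-agent: *", "Disallow: /images/biggorillaonatricycle.jpg"]

parser = RobotFileParser()
parser.parse(rules)

# "ExampleBot" and "otherphoto.jpg" are hypothetical, for illustration.
paths = [
    "/images/biggorillaonatricycle.jpg",     # the blocked file -> False
    "/images/otherphoto.jpg",                # rest of the directory -> True
    "/imagestwo/biggorillaonatricycle.jpg",  # same picture, other dir -> True
]
for path in paths:
    print(path, parser.can_fetch("ExampleBot", "https://www.sample.com" + path))
```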
Here is how to keep both pictures of our buddy the gorilla riding his tricycle from being scanned:
User-agent: *
Disallow: /images/biggorillaonatricycle.jpg
Disallow: /imagestwo/biggorillaonatricycle.jpg
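With two Disallow lines, both copies are covered. Here is a minimal check using Python's standard-library `urllib.robotparser`, with “ExampleBot” as a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /images/biggorillaonatricycle.jpg",
    "Disallow: /imagestwo/biggorillaonatricycle.jpg",
]
parser = RobotFileParser()
parser.parse(rules)

# With only Disallow lines, a URL is blocked as soon as one of them
# matches its path. "ExampleBot" is a hypothetical crawler name.
for path in ("/images/biggorillaonatricycle.jpg",
             "/imagestwo/biggorillaonatricycle.jpg"):
    print(path, parser.can_fetch("ExampleBot", "https://www.sample.com" + path))  # False
```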
The same approach works for directories; just add one Disallow line for each:
User-agent: *
Disallow: /images/
Disallow: /imagestwo/
Disallow: /aboutus/
The above tells all bots to ignore the three directories. Note that we can also mix our directories and files together:
User-agent: *
Disallow: /images/
Disallow: /imagestwo/
Disallow: /aboutus/wearereallyevil.html
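Mixing directory rules and file rules works the same way. Here is a sketch using Python's standard-library `urllib.robotparser` that checks a few paths against the last example. The crawler name “ExampleBot” and the path /aboutus/index.html are hypothetical.

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /images/",
    "Disallow: /imagestwo/",
    "Disallow: /aboutus/wearereallyevil.html",
]
parser = RobotFileParser()
parser.parse(rules)

# Sample paths; /aboutus/index.html and "ExampleBot" are hypothetical.
expected = {
    "/images/biggorillaonatricycle.jpg": False,     # whole directory blocked
    "/imagestwo/biggorillaonatricycle.jpg": False,  # whole directory blocked
    "/aboutus/wearereallyevil.html": False,         # single file blocked
    "/aboutus/index.html": True,                    # rest of /aboutus/ allowed
}
for path, want in expected.items():
    got = parser.can_fetch("ExampleBot", "https://www.sample.com" + path)
    print(f"{path}: allowed={got} (expected {want})")
```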
More By Jamesp