I, Robots.txt - A Few Pointers
To stop web spiders and web robots (as opposed to the real-world kind, which cannot be stopped) from accessing and indexing every inch of your website, you use a file known as robots.txt. As the filename suggests, robots.txt is a plain text file. It contains rules that tell a robot which areas of your site it may and may not access. Whether a robot abides by your wishes is another matter, since the file is a request rather than an enforcement mechanism, but, as you will see in a bit, most of the big search engines presently do honor it.
You store the file in the top-level directory of your site. If you have sub-domains, then each one requires its own robots.txt file. If you leave one out, the rules in your main file apply to yoursite(dot)com but not to, say, sample(dot)yoursite(dot)com.
Some examples of where the file sits in a top-level directory:
www(dot)sample(dot)com/robots.txt
www(dot)devshed(dot)com/robots.txt
www(dot)nerditup(dot)com/robots.txt
Examples of sub-domain locations where you would store the robots.txt file:
www(dot)your(dot)sample(dot)com/robots.txt
www(dot)some(dot)sample(dot)com/robots.txt
www(dot)bad(dot)sample(dot)net/robots.txt
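The one-file-per-host rule can be sketched in a few lines of Python. This is only an illustration, not something from the original article: the helper name robots_url is hypothetical, and the page URLs are made up from the sample hosts listed above. It keeps the scheme and host of any page address (sub-domain included) and swaps the path for /robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the host that serves page_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any sub-domain); replace the
    # path with /robots.txt and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.sample.com/articles/page2.html"))
# http://www.sample.com/robots.txt
print(robots_url("http://some.sample.com/blog/"))
# http://some.sample.com/robots.txt
```

Because the host is carried over unchanged, two pages on different sub-domains always map to two different robots.txt files.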
Your First Steps
Your first step is to create a new text file. I use Notepad, but Word and OpenOffice work too, so long as you save the file as plain text (.txt) rather than in their native document formats. The robots.txt file uses two basic lines, User-agent and Disallow. User-agent names the spider or bot that the rules which follow apply to. Disallow lists the directory or filename you do not want that bot or spider to crawl; leaving its value empty lets the bot crawl everything.
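To see how a well-behaved robot might read those two lines, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the /private/ directory, and the bot name BadBot are all made up for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every bot is kept out of /private/,
# and a bot calling itself BadBot is kept out of the whole site.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(rules_text: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


parser = build_parser(SAMPLE_RULES)
print(parser.can_fetch("Googlebot", "/index.html"))          # True
print(parser.can_fetch("Googlebot", "/private/notes.html"))  # False
print(parser.can_fetch("BadBot", "/index.html"))             # False
```

Note that the file only answers the question "may I fetch this?"; nothing in it prevents a robot that never asks from fetching the page anyway.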
More By Jamesp