ROBOTS.TXT Primer

There is often confusion as to the role and usage of the robots.txt file. I thought it would be a good idea to dispel some myths and highlight what robots.txt files are all about.

Firstly, a robots.txt file is NOT there to tell search engine robots and other crawlers which pages they are allowed to spider (enter); it is primarily there to tell them which pages (and directories) they may NOT spider.

The majority of websites do not have a robots.txt file, and they do not suffer for not having one. The robots.txt file does not influence ranking in any way. Its goal is simply to keep certain spiders from visiting your site and taking away pages you do not want them to have.
Reasons To Use A ROBOTS.TXT File

Below are a few reasons why one would use a robots.txt file.

1. Not all robots that visit your website have good intentions! There are many, many robots out there whose sole purpose is to scan your website and extract your email addresses for spamming purposes. A couple of these “evil” ones are named in the examples later on.

2. You may not be finished building your website (under construction), or sections may be date-sensitive. I, for example, excluded all robots from every page of my website whilst I was designing it. I did not want a half-complete, un-optimized page with an incomplete link structure to be indexed, as, if found, it would reflect badly on me and ABAKUS. I only let the robots in when the site was ready. This is useful not only for new websites being built but also for old ones being re-launched.

3. You may well have a membership area that you do not wish to be visible in Google’s cache. Not letting the robot in is one way to stop this.

4. There are certain things you may wish to keep private. If you have a look at the ABAKUS robots.txt file (http://www.abakus-internet-marketing.de/robots.txt), you will notice I use it to stop indexation of unnecessary forum files/profiles for privacy reasons. Some webmasters also block robots from their cgi-bin or image directories. An example covering points 2 to 4 follows below.
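To make points 2 to 4 more concrete, here are two short sketches written in the same format as the example analysed in the next section. The directory names (/members/, /forum/profiles/, /cgi-bin/) are only placeholders; substitute the paths used on your own site. While a site is under construction, you can keep every robot out of every page:

User-agent: *
Disallow: /

Once the site is live, you would instead block only the private areas, again for all robots (the * matches any robot that reads the file):

User-agent: *
Disallow: /members/
Disallow: /forum/profiles/
Disallow: /cgi-bin/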

ROBOTS.TXT Analysis

So let’s analyse the syntax of a very simple robots.txt file:

User-agent: EmailCollector
Disallow: /

If you were to copy and paste the above into Notepad, save the file as robots.txt and then upload it to the root directory of your server (where you will find your home page), you would have told a nasty email collector to keep out of your website. That is good news, as it may mean less spam!
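A robots.txt file can contain several records, separated by blank lines, so you can shut out more than one robot at once. The sketch below makes one assumption: EmailSiphon is another user-agent string that commonly appears on lists of email-harvesting robots, so check a current list before relying on the name. The final record uses an empty Disallow line, which means “disallow nothing”, i.e. every other robot is welcome to spider the whole site.

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: *
Disallow:

Because an empty Disallow permits everything, that last record changes nothing for well-behaved robots; it is only there to make the intent of the file explicit.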

I do not have the space here for a fully-fledged robots.txt tutorial; however, there is a good one at
http://www.robotstxt.org/wc/exclusion-admin.html

Or simply use the robotsbeispiel.txt file I have uploaded for you: copy and paste it into Notepad, save it as robots.txt and upload it to your server root directory.
http://www.abakus-internet-marketing.de/robotsbeispiel.txt
