Write a Robots.txt File - More uses for Disallows and User-agents in Robots.txt
You can list multiple Disallow lines under one User-agent. In the following example, all spiders are told not to crawl the cgi-bin and images directories.
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
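One way to check rules like these before uploading them is to run them through Python's standard urllib.robotparser module. The following is only a minimal sketch: the bot name "ExampleBot" and the sample paths are made up for illustration, and real spiders may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, held in a string for testing.
MULTI_DISALLOW_RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


multi_parser = build_parser(MULTI_DISALLOW_RULES)

# "ExampleBot" and these paths are hypothetical, chosen only for the demo.
for path in ("/cgi-bin/search.cgi", "/images/logo.gif", "/index.html"):
    allowed = multi_parser.can_fetch("ExampleBot", path)
    print(path, "allowed" if allowed else "blocked")
```

Under these rules the first two paths should be reported as blocked and the third as allowed, since both Disallow lines belong to the same User-agent record.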
We can also use the robots.txt file to help improve the search engine rankings of a site built with dynamic pages, such as PHP pages. Googlebot may have problems crawling such pages if their URLs contain too many variables, such as session IDs.
A URL with session IDs looks similar to the following:
http://www.yourcoolsite.com/cat.php?par=887&show=subcats?=0431Tr
If your website is written in PHP and is converted into HTML pages for Googlebot to index, the robot will still try to index the PHP pages. After copying the pages from PHP to HTML, keep the two sets of pages apart. Place all the PHP pages into their own folder and give it a name that is easy for you to remember; in this example it is named "php." This lets you leave the HTML pages under the root directory, where the spiders can index them easily.
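The folder layout described above could be set up by hand or with a short script. The sketch below is one possible approach, not a required step: the function name move_php_pages is hypothetical, it only handles .php files sitting directly in the site root, and it runs against a throwaway temporary directory rather than a real site. Moving pages changes their URLs, so any internal links to them would need updating separately.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def move_php_pages(site_root: Path, folder_name: str = "php") -> List[Path]:
    """Move every top-level .php file in site_root into site_root/folder_name."""
    target = site_root / folder_name
    target.mkdir(exist_ok=True)
    moved = []
    for page in sorted(site_root.glob("*.php")):
        destination = target / page.name
        shutil.move(str(page), str(destination))
        moved.append(destination)
    return moved


# Demo on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cat.php").write_text("<?php ?>")
    (root / "cat.html").write_text("<html></html>")

    moved_names = [p.name for p in move_php_pages(root)]
    html_still_in_root = (root / "cat.html").exists()
    php_left_in_root = (root / "cat.php").exists()
    php_in_folder = (root / "php" / "cat.php").exists()

print(moved_names, html_still_in_root, php_left_in_root, php_in_folder)
```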
Then, using what you have learned so far, add the following to your robots.txt file:
User-agent: googlebot
Disallow: /php/
Now we have kept Googlebot out of the PHP pages, which the bot often has problems crawling. That leaves the spider to crawl the friendlier HTML pages, and it will not see your original content duplicated on your site between the PHP and HTML versions. If the pages are cleanly coded, this will often result in improved rankings. Because the rule above names only googlebot, apply it to every spider with User-agent: * if you want the same effect in all three of the major search engines.
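The effect of the googlebot rule can be checked the same way, with Python's standard urllib.robotparser module. This is a rough sketch under a few assumptions: the page paths and the second bot name "OtherBot" are invented for the demo, and the parser's matching is only an approximation of how Googlebot itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# The googlebot-only rule from the example above.
GOOGLEBOT_RULES = """\
User-agent: googlebot
Disallow: /php/
"""

googlebot_parser = RobotFileParser()
googlebot_parser.parse(GOOGLEBOT_RULES.splitlines())

# Hypothetical paths: one PHP page inside /php/ and its HTML copy in the root.
checks = [
    ("Googlebot", "/php/cat.php"),
    ("Googlebot", "/cat.html"),
    ("OtherBot", "/php/cat.php"),
]

for agent, path in checks:
    allowed = googlebot_parser.can_fetch(agent, path)
    print(agent, path, "allowed" if allowed else "blocked")
```

Only the first check should come back blocked. The third shows that a spider not named in the record is unaffected by it, which is why a User-agent: * record is needed to keep every spider out of the folder.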
More By Clint Dixon