Polite Bots - The Basics
You need to be able to control what search engines see and do not see on your web site. Some pages on your site may contain sensitive information, or content that visitors must pay for before viewing it. There may also be personal information that you simply don't want archived by the search engines. The most common way to handle this is the robots exclusion protocol. The basic form of a robots.txt file for Googlebot is this:
User-Agent: Googlebot
Disallow: /logs/
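To see how a crawler that honours these rules would read them, Python's standard-library `urllib.robotparser` can parse the two lines above. This is a minimal sketch, not a description of how Googlebot itself works. The example.com site, the sample paths, and the helper names `robots_url` and `build_parser` are hypothetical. Crawlers request the file from the root of the host, which is what `robots_url` computes.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# The rules from the article, exactly as they would appear in robots.txt.
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /logs/
"""


def robots_url(page_url: str) -> str:
    """Hypothetical helper: the root-level robots.txt URL for any page on the host."""
    return urljoin(page_url, "/robots.txt")


def build_parser(robots_txt: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt text held in memory instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    # Hypothetical site used only for illustration.
    site = "https://www.example.com"
    print(robots_url(site + "/articles/polite-bots.html"))
    rules = build_parser(ROBOTS_TXT)
    print(rules.can_fetch("Googlebot", site + "/logs/access.log"))     # False: under /logs/
    print(rules.can_fetch("Googlebot", site + "/index.html"))          # True: no rule matches
    print(rules.can_fetch("SomeOtherBot", site + "/logs/access.log"))  # True: only Googlebot is named
```

Because the record names only Googlebot, other crawlers are unaffected; a `User-agent: *` record would address all of them.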
Apart from this form of the exclusion protocol (you save the lines above in a plain-text editor such as Notepad as robots.txt and upload the file to your site's root directory), you can also put a meta tag in a page's head that tells a particular bot not to index that page:
<html>
<head>
<meta name="googlebot" content="noindex">
...
</head>
</html>
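A quick way to check which pages carry such a tag is to scan their HTML for meta tags aimed at a given bot. The sketch below uses Python's standard-library `html.parser` and assumes the page HTML is already in hand. The names `MetaRobotsParser` and `may_index` are hypothetical. It treats `name="robots"` as applying to every crawler and `name="googlebot"` as applying to Googlebot only, and it simply merges their directives; it does not try to model how Google resolves conflicting values.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Hypothetical helper: collect directives from robots and bot-specific meta tags."""

    def __init__(self, bot_name: str) -> None:
        super().__init__()
        self.bot_name = bot_name.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        name = attributes.get("name", "").strip().lower()
        # "robots" addresses all crawlers; e.g. "googlebot" addresses one.
        if name in ("robots", self.bot_name):
            for directive in attributes.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def may_index(html: str, bot_name: str = "googlebot") -> bool:
    """Hypothetical helper: False if any applicable meta tag says noindex."""
    parser = MetaRobotsParser(bot_name)
    parser.feed(html)
    parser.close()
    return "noindex" not in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="googlebot" content="noindex"></head></html>'
    print(may_index(page))             # False: the tag targets Googlebot
    print(may_index(page, "bingbot"))  # True: the tag does not name this bot
```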
This covers the basics. Now we will delve into some of the details of the robots exclusion protocol. Note that we are dealing specifically with Googlebot; for a list of bots from other search engines, see http://www.robotstxt.org/. This article simply covers the robots exclusion protocol as explained on the Google Webmaster Blog, hopefully in a much simpler manner.
Why bother with another article on the robots exclusion protocol? This one is all about Googlebot as described by the people at the Googleplex. It should clarify a few interesting questions, such as the issues webmasters have with "conflicting values", and explain exactly how search engines (especially Google) handle meta tags.
More By Akinola Akintomide