Search Engine Optimization, Website Development and Search Engine Spiders - Robots.txt Use
Robots.txt is the file that tells search engine robots how to crawl your website. You can block search engine spiders from crawling your entire website, or exclude only specific folders and pages.
Even if you want search engines to spider your entire site, including a robots.txt file is worthwhile: well-behaved robots request it before crawling, so having one keeps your error log clear of unsuccessful robots.txt requests.
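Crawlers look for the file at one fixed place, the root of the host, which is why a missing file shows up as failed requests in the log. The short Python sketch below shows how a crawler might derive that address from any page URL it wants to fetch. The function name `robots_txt_url` and the example.com addresses are illustrative only.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for page_url's host."""
    parts = urlsplit(page_url)
    # robots.txt lives at the root of the host (scheme + host + port),
    # so the page's own path and query string are discarded.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_txt_url("https://www.example.com/blog/post.html?id=7"))
# https://www.example.com/robots.txt
```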
Examples:

Everything is allowed for all robots:
    User-agent: *
    Disallow:

Everything is disallowed for one specific bot:
    User-agent: [botname]
    Disallow: /

Everything is disallowed for all bots:
    User-agent: *
    Disallow: /

The following directories are disallowed for all bots:
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /private

The following directories are disallowed for Googlebot only:
    User-agent: googlebot
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /private

A single page is disallowed for all search engines:
    User-agent: *
    Disallow: /Folder/some-content.html

A single page is disallowed for Yahoo's bot (Slurp) only:
    User-agent: Slurp
    Disallow: /Folder/some-content.html
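One way to check what a set of rules actually blocks is to feed them to a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser` on a hypothetical file that combines the Googlebot-only directory rules with the all-bots page rule. It also illustrates a point that is easy to miss: a bot that has its own `User-agent` group follows only that group, so the `*` rule does not apply to Googlebot here. Python's parser is a simplified model (it applies rules in file order), so individual search engines may resolve conflicting rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining two of the patterns above.
RULES = """\
User-agent: googlebot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private

User-agent: *
Disallow: /Folder/some-content.html
"""

BASE = "https://www.example.com"  # illustrative host

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parse rules from text instead of fetching them

for agent, path in [
    ("googlebot", "/tmp/cache.html"),
    ("googlebot", "/Folder/some-content.html"),
    ("Slurp", "/Folder/some-content.html"),
    ("Slurp", "/tmp/cache.html"),
]:
    print(agent, path, parser.can_fetch(agent, BASE + path))
# googlebot /tmp/cache.html False
# googlebot /Folder/some-content.html True
# Slurp /Folder/some-content.html False
# Slurp /tmp/cache.html True
```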
SEO Chat has an automatic robots.txt generator: specify the bots you would like to exclude, or paste in a specific URL that should be kept private.
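A generator like the one described needs only two inputs: the bots to exclude and the paths to keep private. The following is a minimal Python sketch of that idea, not SEO Chat's actual tool; `build_robots_txt` and the bot name `BadBot` are hypothetical. Each excluded bot gets its own group with `Disallow: /`, and every other robot falls into a `*` group that lists the private paths (or an empty `Disallow:` when there are none, which allows everything).

```python
def build_robots_txt(blocked_bots, private_paths):
    """Return robots.txt text: block some bots entirely, hide some paths from all."""
    lines = []
    for bot in blocked_bots:
        # One group per excluded bot: everything is disallowed for it.
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    # Catch-all group for every other robot.
    lines.append("User-agent: *")
    if private_paths:
        lines += [f"Disallow: {path}" for path in private_paths]
    else:
        # An empty Disallow line means everything is allowed.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


print(build_robots_txt(["BadBot"], ["/private/", "/Folder/some-content.html"]), end="")
# User-agent: BadBot
# Disallow: /
#
# User-agent: *
# Disallow: /private/
# Disallow: /Folder/some-content.html
```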
Robot-Specific Meta Commands
Apart from robots.txt, you can give commands to search engine spiders on each individual page, using meta tags.
Noindex - the page is not indexed by Google, Yahoo, MSN and Ask.
Nofollow - the links on the page are not followed by Google, Yahoo, MSN and Ask.
Noarchive - the page is not cached by Google, Yahoo, MSN and Ask.
Noodp - stops Google, Yahoo and MSN from using the page description from DMOZ (the Open Directory Project).
Noydir - stops Yahoo from using the page description from the Yahoo Directory.
Nosnippet - stops Google from showing a snippet (a description generated from the page's text) in its search results.
Add these commands to the <head> section of each web page in this format:
<meta name="ROBOT NAME" content="Noindex" />
To address all robots at once, use:
<meta name="robots" content="nofollow"/>
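To see how the two tag forms interact, here is a hypothetical reader, written with Python's standard `html.parser`, that collects the directives a given bot should obey: those in `name="robots"` tags plus those in tags carrying that bot's own name. The class name `RobotsMetaParser` and the sample page are illustrative assumptions. This sketch simply takes the union of all matching directives; real search engines may handle combinations or conflicts in their own ways.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and bot-specific meta tags."""

    def __init__(self, bot_name):
        super().__init__()
        self.bot_name = bot_name.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        # A tag applies if it targets all robots or this bot specifically.
        if name in ("robots", self.bot_name):
            content = attrs.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


page = """<html><head>
<meta name="robots" content="nofollow"/>
<meta name="googlebot" content="Noindex, noarchive" />
</head><body>...</body></html>"""

for bot in ("googlebot", "slurp"):
    reader = RobotsMetaParser(bot)
    reader.feed(page)
    print(bot, sorted(reader.directives))
# googlebot ['noarchive', 'nofollow', 'noindex']
# slurp ['nofollow']
```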
Search Engine Pitfalls
Spiders have trouble with:
Session IDs
Frames
Logins
Forms
That said, Google can fill out some forms, make selections and click some buttons in order to reach the pages behind the form.
...when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML... - Google Webmaster Central
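The quote describes two rules: text boxes are filled with words taken from the site, and menus are filled with values already present in the HTML. The rough Python sketch below mirrors that idea for a simple GET form using only the standard library. It is not Google's implementation; `FormReader`, `sample_queries`, the sample form and the word list are all hypothetical. It handles only text inputs and `<select>` menus (not check boxes or radio buttons) and caps the output with a `limit`, echoing the "small number of queries" in the quote.

```python
from html.parser import HTMLParser
from itertools import islice, product
from urllib.parse import urlencode, urljoin


class FormReader(HTMLParser):
    """Collect a form's action, its text inputs, and its <select> option values."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.text_fields = []  # names of <input type="text"> boxes
        self.selects = {}      # <select> name -> list of option values
        self._select = None    # name of the <select> currently open

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.action = a.get("action") or ""
        elif tag == "input" and a.get("name"):
            # Inputs without a type attribute default to text boxes.
            if (a.get("type") or "text").lower() == "text":
                self.text_fields.append(a["name"])
        elif tag == "select" and a.get("name"):
            self._select = a["name"]
            self.selects[self._select] = []
        elif tag == "option" and self._select and a.get("value"):
            self.selects[self._select].append(a["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def sample_queries(page_url, form_html, site_words, limit=5):
    """Yield up to `limit` GET URLs that fill the form with candidate values."""
    form = FormReader()
    form.feed(form_html)
    # Text boxes get words taken from the site; menus get their own option values.
    fields = {name: site_words for name in form.text_fields}
    fields.update(form.selects)
    names = list(fields)
    base = urljoin(page_url, form.action)
    for values in islice(product(*(fields[n] for n in names)), limit):
        yield base + "?" + urlencode(dict(zip(names, values)))


form_html = """
<form action="/search" method="get">
  <input type="text" name="q">
  <select name="color"><option value="red">Red</option><option value="blue">Blue</option></select>
</form>"""

for url in sample_queries("https://www.example.com/shop/", form_html, ["shoes", "hats"], limit=3):
    print(url)
# https://www.example.com/search?q=shoes&color=red
# https://www.example.com/search?q=shoes&color=blue
# https://www.example.com/search?q=hats&color=red
```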
More By Ivan Strouchliak