Polite Bots



If you've ever wondered how to get a little better control over what parts of your web site get crawled by the search engines, how they crawl your pages, and how to encourage them to visit, keep reading. This article will explain the various protocols that the search engine robots (particularly Google's) follow. It will also touch upon ways to help you guard against scraper bots.


There have been quite a number of primers on robots.txt, and all of them explain the basics of the robots exclusion protocol. Recently, while working on removing some pages from Google's archives, I browsed through Google's Webmaster Central Blog on Blogspot and saw some posts by Dan Crow and Vanessa Fox. These posts explain in detail how Googlebot works.

Besides explaining the robots exclusion protocol in detail, Google has released new tools that allow the removal of cached pages through the Webmaster Dashboard. We will cover those only briefly in this piece, since I go into detail about them in a different article. This article looks at how the robots exclusion protocol applies specifically to Googlebot, quoting Dan Crow, a Google product manager. Google's bot is incredibly polite when it is indexing pages; we will compare its behavior to that of some malicious scraper bots.

Googlebot has several quirks, as all bots do. We will look at a few of them before we discuss the basics of search engine bots. For example, if your web site is down temporarily and you want Googlebot to come back later, you can return an HTTP 503 (Service Unavailable) status code to tell the bot (and your users) that the site is temporarily unavailable. Without this status code, Googlebot will probably index your "this website is down for maintenance" page. You can get more information on the HTTP 503 status code at askapache.com.
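To make the 503 idea concrete, here is a minimal sketch of a stand-in maintenance server written with Python's standard library. The names `MaintenanceHandler`, `make_server`, and `RETRY_AFTER_SECONDS` are made up for this illustration, and the one-hour value is only an assumption. `Retry-After` is a standard HTTP header that hints at when a client should try again; how any particular crawler treats it is not something the posts quoted here spell out.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed length of the maintenance window, in seconds (illustrative only).
RETRY_AFTER_SECONDS = 3600


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD request with 503 instead of 200."""

    def _send_maintenance(self, include_body):
        body = b"This website is down for maintenance. Please check back soon.\n"
        # 503 marks the outage as temporary, so the notice page itself
        # should not be treated as the real content of the URL.
        self.send_response(503, "Service Unavailable")
        self.send_header("Retry-After", str(RETRY_AFTER_SECONDS))
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if include_body:
            self.wfile.write(body)

    def do_GET(self):
        self._send_maintenance(include_body=True)

    def do_HEAD(self):
        self._send_maintenance(include_body=False)


def make_server(host="127.0.0.1", port=8080):
    """Build (but do not start) the maintenance server."""
    return HTTPServer((host, port), MaintenanceHandler)
```

Calling `make_server().serve_forever()` starts the server. In practice you would more likely configure your existing web server to return the 503, but the status line and headers it needs to send are the same.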

Also note that if Googlebot is crawling your site too frequently (and hence grabbing all your bandwidth), you can contact Google Support; they should work with you to ensure that the bot doesn't overload your servers. According to Vanessa Fox, there will probably be a tool that allows you to adjust Googlebot's crawl rate on your site.

Googlebot is Google's primary agent for crawling and indexing pages on the web. The web is incredibly large, truly living up to the name World Wide Web; as Dan Crow puts it, it's "really, really big." Not everyone on the public web wants particular pages crawled, however. Some pages contain client information or inflammatory material. Some site owners don't mind the crawling but, for whatever reason, don't want their pages cached in Google's database.
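As a rough illustration of how a polite bot decides what it may crawl, the sketch below feeds a small robots.txt file to Python's standard `urllib.robotparser` module and asks whether a given user agent may fetch a given URL. The rules, the `/clients/` path, the `example.com` URLs, and the `SomeOtherBot` name are all invented for this example. Note that robots.txt only governs crawling; it does not by itself address caching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /clients/,
# and every other bot is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /clients/

User-agent: *
Disallow: /
"""


def build_parser(robots_text):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


rules = build_parser(ROBOTS_TXT)
for agent, url in [
    ("Googlebot", "http://example.com/index.html"),
    ("Googlebot", "http://example.com/clients/acme.html"),
    ("SomeOtherBot", "http://example.com/index.html"),
]:
    print(agent, url, is_allowed(rules, agent, url))
```

A well-behaved bot runs a check like this before every request. A scraper bot simply skips it, which is why robots.txt on its own is no defense against scrapers.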

By Akinola Akintomide

© 2003-2012 by Developer Shed. All rights reserved.