Polite Bots - Is There Any Need to Trigger Google's Bot?
If you check your server logs for the main bots of Google, Yahoo, and MSN, and then for those of the alternative search engines Ask and Snap, you will discover that the ones you see least often are Ask's and Snap's. In fact, Ask is notoriously hard to trigger if your site is obscure. Google is pretty much "all over the place," and the same is true for MSN and Yahoo.
An ODP listing or running AdSense on your site will bring Googlebot over. Google will almost always index your site; maybe I am a bit relaxed about this because I have never had trouble getting my pages indexed. But if all else fails, put an AdSense unit on your page, or create a blog on Blogspot and link to your web page; the robot will follow.
Malicious Bots
Robots are created by humans, so a robot simply does what its human programmer wants it to do. Some human beings are more immoral than others, and write impolite scraper bots. Scraper bots are programs that crawl the hypertext structure of the web, looking for security flaws in order to access sensitive files.
I have long been fascinated by protecting web sites against malicious bots, especially on sites where access to the content is restricted to members. Someday, when I have perfected a good system to keep scraper bots away, I will write a piece on it. A good way to protect your files is to put them in a directory that requires a user name and password (and that sets a cookie on the user's PC) every time a request is made to it. Another way to protect your super-sensitive information is to keep it in a directory whose password changes with each request (apart from your own admin password).
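As a rough illustration of the "password that changes with each request" idea, here is a minimal Python sketch. The `RotatingToken` class and its method names are hypothetical, invented for this example; it assumes your application hands the current token to the legitimate client after each successful request, and it is not a complete access-control system.

```python
import hmac
import secrets


class RotatingToken:
    """Hypothetical helper: a single-use token replaced after each successful check."""

    def __init__(self) -> None:
        # Random, URL-safe token generated from a cryptographic source.
        self._current = secrets.token_urlsafe(32)

    def issue(self) -> str:
        """Return the token that the next request must present."""
        return self._current

    def verify(self, presented: str) -> bool:
        """Compare in constant time; rotate the token only when the check succeeds."""
        ok = hmac.compare_digest(presented.encode(), self._current.encode())
        if ok:
            self._current = secrets.token_urlsafe(32)
        return ok
```

The token rotates only on success, so a bot guessing wrong values cannot lock the legitimate user out by forcing a rotation. A captured token is useless once it has been spent.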
Make sure you track your users' behavior. If you notice a malicious bot, you can list it on http://www.robotstxt.org/ or check whether you can file a complaint about a malicious program hacking your site (don't forget to note the host and the IP address!). If you don't keep an eye on your server logs, you may never notice that you have been crawled by a malicious bot.
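Keeping an eye on the logs can be partly automated. The sketch below assumes your server writes the common "combined" access log format; the `KNOWN_CRAWLERS` list, the `suspicious_clients` function, and the threshold are illustrative choices for this example, not a standard. It counts requests per host and user agent, skips agents that identify themselves as well-known search engine bots, and reports the busiest of the rest.

```python
import re
from collections import Counter

# Combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative substrings of search engine user agents (an assumption).
KNOWN_CRAWLERS = ("googlebot", "slurp", "msnbot")


def suspicious_clients(lines, threshold=100):
    """Return (host, agent, hits) for unknown clients with at least `threshold` hits."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent")
        if any(name in agent.lower() for name in KNOWN_CRAWLERS):
            continue  # self-identified search engine bot
        hits[(match.group("host"), agent)] += 1
    return [
        (host, agent, count)
        for (host, agent), count in hits.most_common()
        if count >= threshold
    ]
```

A user agent string is easy to fake, so treat the result as a list of hosts worth a closer look rather than as proof. It does give you the host and user agent to note down if you decide to complain.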
Note that you don't need to go to these levels to protect your files against the search engine bots. They are extremely polite and will definitely back off at the first sign of a restriction.
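The usual "sign of a restriction" for a polite bot is a robots.txt rule. The sketch below uses Python's standard `urllib.robotparser` to show how a well-behaved crawler would interpret such a rule; the `/members/` path, the `ExampleBot` name, and the `allowed` helper are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical): keep every bot out of /members/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("ExampleBot", "http://www.example.com/members/secret.html"))
print(allowed("ExampleBot", "http://www.example.com/index.html"))
```

The first call reports that the members page is off limits and the second that the home page is fine. A malicious bot simply ignores this file, which is why robots.txt is enough for search engines but not a security measure.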