Search Engine Spiders

Protect Against Invaders by SPAM-Proofing Your Website
By: Benjamin Pfeiffer
    2004-05-05

    Table of Contents:
  • Protect Against Invaders by SPAM-Proofing Your Website
  • How to Use the JavaScript Method
  • Using mod_rewrite
  • Blocking Malicious "Good for Nothing" Robots

    Protect Against Invaders by SPAM-Proofing Your Website - Blocking Malicious "Good for Nothing" Robots


    (Page 4 of 4)

    Which robots you block will depend on your preferences and on which bots visit your website regularly. Cutting bandwidth costs, keeping robots from harvesting your email address, and keeping them from collecting other information about you or your website are all good reasons to block a robot.

    The best way to decide which robots to block is to do some quick research on the robots that like to take up residence on your site. If you cannot find reliable information about a robot, or you find that it is used for something you would not approve of, simply block it with a robots.txt file. If the robot does not obey the robots.txt file, pull out the big guns and use mod_rewrite to stop it dead in its tracks.

    Example Robots

    Several bots turn up frequently. One is "Microsoft URL Control", a robot that ignores the robots.txt file and fetches as many pages as it can before leaving the site. Many different people run this SPAMbot, all under the same name.

    A second robot that frequents websites is the NameProtect robot (NPbot). Its job is to collect information about websites that may be infringing on the brand names of NameProtect's clients. This robot does not obey the robots.txt file, responds to emails sent to the NameProtect company, and serves no good purpose as far as we have determined.

    To Block the Microsoft URL Control Robot by User Agent:

    RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control"
    RewriteRule .* - [F,L]

    To Block the NameProtect Robot by User Agent:

    RewriteCond %{HTTP_USER_AGENT} "NPbot"
    RewriteRule .* - [F,L]

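    Furthermore, once you have settled on the bots you would like to block with mod_rewrite, you can compile them into a single list, joining the conditions with [OR], and add comments as well. Apache's documentation states that a comment may not share a line with a directive, so put each comment on its own line, like so:

    # bad bot
    RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [OR]
    RewriteCond %{HTTP_USER_AGENT} "NPbot"
    RewriteRule .* - [F,L]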
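
    Because the RewriteCond pattern is a regular expression, the list above could also be collapsed into a single condition. The following is only a sketch of that idea, not a rule from the original article: the alternation groups the two User Agent strings already discussed, and the [NC] flag (an addition made here for illustration) makes the match case-insensitive.

    ```apache
    # Sketch: block both example bots with one condition.
    # The pattern is an unanchored regular expression, so it matches
    # anywhere inside the User-Agent header.
    # [NC] = case-insensitive match (an assumption added for illustration).
    RewriteCond %{HTTP_USER_AGENT} "(Microsoft URL Control|NPbot)" [NC]
    # Answer every matching request with 403 Forbidden.
    RewriteRule .* - [F,L]
    ```

    A single pattern is shorter, but separate RewriteCond lines leave room for a comment above each bot, which may be easier to maintain as the list grows.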

    One thing to note about the examples here: make sure you know how to add these directives to your mod_rewrite configuration (typically an .htaccess file or the server configuration, with RewriteEngine On set), and that you place them correctly among your existing rules so the technique is effective. Also note that mod_rewrite rules are not an ultimate solution to SPAM and malicious bot problems. You can, however, effectively block a good majority of bots out there and dramatically cut down on the amount of SPAM you receive. If you use both the JavaScript methods and mod_rewrite, not only will your website be a heavily guarded anti-SPAM site, but you may actually enjoy downloading all your email messages and finding them SPAM free.
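
    As a rough illustration of how the pieces might fit together, here is a minimal sketch of an .htaccess fragment. It assumes that mod_rewrite is loaded and that your host allows rewrite directives in .htaccess files. The <IfModule> wrapper is an optional safeguard added here, not something the examples above require.

    ```apache
    # Sketch of a self-contained .htaccess fragment (assumptions noted above).
    <IfModule mod_rewrite.c>
        # The rewrite engine must be switched on before any rule takes effect.
        RewriteEngine On

        # bad bot: ignores robots.txt
        RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [OR]
        # NameProtect crawler
        RewriteCond %{HTTP_USER_AGENT} "NPbot"
        # Refuse matching requests with 403 Forbidden and stop processing.
        RewriteRule .* - [F,L]
    </IfModule>
    ```

    Placing the block near the top of the file, ahead of other rewrite rules, means blocked bots are turned away before any further rules are evaluated.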


    © 2003-2009 by Developer Shed. All rights reserved.