Protect Against Invaders by SPAM-Proofing Your Website

Benjamin Pfeiffer discusses how to SPAM-proof your website. He explains how to use JavaScript and mod_rewrite to stop SPAMbots and Spybots from finding email addresses on your website. He also talks about how to find and set up the .htaccess file and gives examples of robots and how to block them.

Despite recent improvements in the tools and programs used to fight SPAM, most of us cannot escape the menace that plagues our inboxes on a regular basis. Each day most of us probably receive more SPAM than real email, and with Spammers getting more and more creative in circumventing traditional anti-SPAM tactics, it’s vital that webmasters arm themselves with some anti-SPAM tactics for their own websites.

In this article I will discuss a few ways to SPAM-proof your website against malicious SPAM robots that collect your email address to be sold by the thousands to Spammers worldwide, who may use your information inappropriately or for other no-good purposes. These tactics are so effective that within a month of implementing them, you should see a dramatic drop in the amount of SPAM that makes it through to your website email addresses, not to mention a decrease in bandwidth usage.

How to Stop SPAMbots Dead in Their Tracks

1. Using JavaScript
2. Using Mod_Rewrite

Both of these techniques are effective in blocking SPAMbots and Spybots from finding your email address or other personal information on your website. While JavaScript is an easier solution, using mod_rewrite to block SPAMbots is more technical and requires knowledge of editing your .htaccess file. It’s best to try the JavaScript method first, and then venture into using mod_rewrite to further block SPAMbots from hitting your website.

Using JavaScript

To understand how to use JavaScript to block SPAMbots from harvesting your email, let’s examine the ways that they find your email in the first place.

1. Mailto: Links – these are common links placed in the HTML code of a website, offering a potential visitor the ability to send an email to the webmaster of the site.  A visitor clicks on the email link and it opens an email client with the To: field already filled in with the address specified in the code.  These links are the prime target of SPAMbots harvesting your email address, and simple use of JavaScript can cut down on email harvesters hitting your inbox with SPAM.  The main objective with using JavaScript is to change the appearance of your email address so that email harvesters do not recognize your email, but still retain complete functionality for legitimate visitors to send you an email.

2. Contact Forms - this is another prime location for SPAMbots to leave their tracks, steal your email address and be gone, ready to report back with fresh email addresses. These forms are another common feature on websites, and a hidden field like the following is what most often lets SPAMbots find your email.

<input type="hidden" name="recipient" value="support@example.com">
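
To see why both of these patterns are easy prey, here is a minimal sketch of how a naive harvester might scan raw page source. The function name findEmails and the regular expression are assumptions made for illustration, not the code of any real SPAMbot, and real harvesters will vary.

```javascript
// Hypothetical sketch of a naive email harvester: it scans raw page
// source for anything shaped like an address. Real bots differ.
function findEmails(source) {
  const pattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  return source.match(pattern) || [];
}

// A plain mailto: link and a hidden form field both expose the address.
const plainLink = '<a href="mailto:support@example.com">Email us</a>';
const hiddenField =
  '<input type="hidden" name="recipient" value="support@example.com">';

// The same address split into JavaScript string pieces never appears
// as one contiguous token in the source.
const obfuscated = 'document.write("support" + "@" + "example.com");';

console.log(findEmails(plainLink));   // one address found
console.log(findEmails(hiddenField)); // one address found
console.log(findEmails(obfuscated));  // empty list
```

The address is still assembled correctly in the visitor's browser; it is only the raw source that no longer contains it in one piece.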

The following are examples of JavaScript that you can use to make your email address appear different in the code but still perform the same function as if it were coded in regular HTML (i.e., mailto:support@example.com). To use these examples, just copy and paste the code into your HTML document and replace the required field(s) with your email address.

1. Basic Email Script

<script type="text/javascript">
<!--
document.write("support" + "@" + "example.com");
//-->
</script>

Result:  support@example.com

2. Basic Mailto: Email Script with Link Text

<script type="text/javascript">
<!--
var username = "support";
var hostname = "example.com";
var linktext = username + "@" + hostname;
document.write('<a href="' + "mail" + "to:" + username +
"@" + hostname + '">' + linktext + "</a>");
//-->
</script>

Result: support@example.com

3. Inline JavaScript

<a href="#" onclick="window.location='mailto:'+'support'+'@'+'example'+'.com'; return false;">Link Text</a>

Result: Link Text

The three script options above should give you some flexibility in how you choose to use them on your website. Remember to insert your own email address into the fields where the support@example.com email address is located.
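
If you use the second option in several places, the string assembly can be wrapped in a small helper. The sketch below is one possible way to do that; buildMailtoLink is a hypothetical name, not part of any library, and in a page its return value would be passed to document.write.

```javascript
// Hypothetical helper: assembles a mailto: anchor from separate pieces so
// the full address never appears as one token in the page source.
function buildMailtoLink(username, hostname, linkText) {
  const address = username + "@" + hostname;
  // Fall back to showing the address itself when no link text is given.
  const text = linkText || address;
  return '<a href="' + "mail" + "to:" + address + '">' + text + "</a>";
}

// In a page: document.write(buildMailtoLink("support", "example.com"));
console.log(buildMailtoLink("support", "example.com", "Contact support"));
// prints: <a href="mailto:support@example.com">Contact support</a>
```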

Problems Associated with JavaScript

There don’t appear to be many problems with using the above scripts in the HTML code of your documents. The biggest issues are likely to be coding errors in the scripts and visitors whose browsers do not support JavaScript or have it disabled, since they will not see the address at all. One last issue that may yet see its day is harvester programmers teaching their bots to find email addresses within JavaScript code. While this may become a reality sooner than we expect, for the most part JavaScript should be SPAM-proof enough to block most malicious SPAMbots.
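
That last concern is easy to picture: a harvester only needs to undo the string concatenation before searching. The sketch below shows one assumed way this could be done. joinStringPieces is a hypothetical function written for illustration and is not taken from any known harvester.

```javascript
// Hypothetical sketch: collapse "a" + "b" style concatenations in page
// source, then look for an address in the result.
function joinStringPieces(source) {
  // Remove a closing quote, a plus sign, and the next opening quote.
  return source.replace(/["']\s*\+\s*["']/g, "");
}

const script = 'document.write("support" + "@" + "example.com");';
const joined = joinStringPieces(script);
const found = joined.match(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/);

console.log(joined);
console.log(found ? found[0] : "nothing found");
```

This is why the JavaScript method is best treated as one layer of defense rather than a guarantee.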

Using Mod_Rewrite

Using mod_rewrite is very successful in blocking the SPAMbots and other spybots that visit your website with a mission to either steal your email address or grab information from your website without your permission. Consider this method a step above using JavaScript, because it stops the bots before they ever read the webpage itself. So if you are thinking of using JavaScript on the page to block bots from finding your email, consider the use of mod_rewrite as a primary defense weapon against SPAM and other malicious robots.

One note to readers: The use of mod_rewrite requires that you have it installed on your server, and that you have the ability to edit the .htaccess file. Below is a simple way to locate the .htaccess file using a program such as CuteFTP (or a similar FTP client that performs the same functions). If you are unsure whether you have mod_rewrite installed, you should first consult the server administrator at your hosting company. Ask them whether you have mod_rewrite and permission to edit the .htaccess file.

How to Find .htaccess in a Common FTP Client

To locate the .htaccess file, most often you need to display all hidden files present when connecting to your hosting account.

To enable your FTP client to display all hidden files (.htaccess and many other files not normally seen by the user), follow these steps:

  1. First locate your saved site properties.

  2. Right click on the profile of the website whose hidden files you want to display. This is most often located in the “FTP Sites” section of most clients.

  3. Once you right click on the FTP site, select “SITE PROPERTIES” from the menu.

  4. An option box will load up displaying the site properties of your site. Look for a tab called “ACTIONS” and click on it.

  5. It will display the actions of the site. Locate a gray box called “FILTERS” and click on it.

  6. This will display the “Filters” properties of the site.

  7. Locate the “Enable Filtering” from the options available. Make sure this box is checked.

  8. Once you have checked the enable filtering box, a small box at the bottom of the options will be displayed.

  9. It should say something similar to “Enable Server Side Filtering”. Make sure this box is checked as well.

  10. Now enter the following into the “Remote Filter” box: -a

Once you have entered the filtering options, make sure to click “Ok” or “Apply” in order to save your changes. Then start a new connection, and you should be able to see all hidden files on the server. If you are still having trouble viewing all your files and can’t seem to locate the .htaccess file, don’t give up, but ask the system administrator of your hosting account for assistance.

How to Set Up Your .htaccess File

Once you have confirmed that you do have a .htaccess file, and mod_rewrite is turned on, add the following lines to your .htaccess file:

Options +FollowSymLinks
RewriteEngine On
RewriteBase /

The robots that you will want to block will depend on your preferences, as well as any bots that frequent your website on a regular basis.  Cutting down on bandwidth costs, preventing robots from collecting your email address, and preventing robots from collecting information from you or your website are all good reasons to block a potential robot.

The best method of deciding which robots to block is to do some quick research about the robots that like to take up residence on your site. If you cannot find reliable information about a robot, or you find that it is used for something you would not approve of, simply block the robot by using a robots.txt file. If you find that a robot does not obey the robots.txt file, pull out the big guns and use mod_rewrite to stop it dead in its tracks.

Example Robots

There are several common bots that one might run into frequently, such as “Microsoft URL Control”, a robot that ignores the robots.txt file and fetches as many pages as it can before leaving the site. This SPAMbot is used by many different people, all under the same name.

The second robot that frequents websites is the NameProtect (NPbot) robot. This robot’s job is to collect information about websites that are potentially violating the brand names of NameProtect’s clients. This robot does not obey the robots.txt file, responds to emails sent to the NameProtect company, and serves no good purpose as far as we have determined.

To Block the Microsoft URL Control Robot by User Agent:

RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control"
RewriteRule .* - [F,L]

To Block the NameProtect Robot by User Agent:

RewriteCond %{HTTP_USER_AGENT} "NPbot"
RewriteRule .* - [F,L]

Furthermore, once you establish a good number of bots that you would like to block using mod_rewrite, you can compile a list and add comments as well. Note that in an .htaccess file a comment must sit on its own line, not at the end of a directive, like so:

# bad bot
RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [OR]
RewriteCond %{HTTP_USER_AGENT} "NPbot"
RewriteRule .* - [F,L]
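
To check what a rule set like this would catch before deploying it, the matching logic can be approximated outside Apache. The sketch below is only a rough model: it treats each RewriteCond pattern as an unanchored, case-sensitive regular expression joined by [OR], and the names blockedAgents and isBlocked, along with the sample user agent strings, are assumptions made for illustration.

```javascript
// Rough model of the rules above: each pattern is tested against the
// User-Agent header, and a match on ANY pattern ([OR]) means the
// request would be refused with 403 Forbidden ([F]).
const blockedAgents = [/Microsoft URL Control/, /NPbot/];

function isBlocked(userAgent) {
  return blockedAgents.some(function (pattern) {
    return pattern.test(userAgent);
  });
}

// Sample User-Agent strings, chosen only for this example.
console.log(isBlocked("Microsoft URL Control - 6.00.8862")); // true
console.log(isBlocked("NPbot (example crawler)"));           // true
console.log(isBlocked("Mozilla/5.0 (Windows NT 10.0)"));     // false
```

Because the patterns are unanchored, a bot is caught wherever the name appears in its user agent string, but a bot that changes its name slips through, which is one reason to review your logs regularly.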

One thing to note about using the examples here: make sure that you know how to insert the rules into your .htaccess file correctly, and that you place them in the proper order for this technique to be effective. Additionally, keep in mind that mod_rewrite rules are not an ultimate solution to SPAM and malicious bot problems. You can, however, effectively block a good majority of the bots out there and dramatically cut down on the amount of SPAM you receive. If you use both the JavaScript methods and mod_rewrite, not only will your website be a heavily guarded anti-SPAM site, but you may actually enjoy downloading all your email messages and finding them SPAM-free.
