In Terri’s article, “Click Fraud: What it is, What You Can Do About it”, she gave the following bit of advice: if you want to do the analysis yourself, you need to obtain your website’s server logs. You should be able to get these from your Web hosting company. Once you have them, there are a number of data points to examine. These include the following:
- Repetition of IP addresses
- A large or irregular number of clicks from the same geographic area
- IP addresses that belong to cloaking software companies
- A change in the amount of traffic seen for certain keywords, particularly a rapid increase
- A doubling or tripling of clickthroughs without any bid changes or rank changes (or, possibly, a corresponding increase in business)
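Several of these data points can be pulled straight from the logs. The following minimal Python sketch counts how often each IP address arrives from a pay per click referrer. It assumes your server writes the common Apache/NGINX "combined" log format, which includes the Referer field. The domain list in `PPC_DOMAINS` and the function names are hypothetical placeholders, not anything from Terri's article.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Apache/NGINX "combined" format (assumed):
# ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical pay per click referrer domains; replace with the ones you see.
PPC_DOMAINS = ("googleadservices.com", "overture.com")


def is_ppc_referer(referer):
    """True if the referrer's host is a PPC domain or one of its subdomains."""
    host = urlsplit(referer).hostname or ""
    return any(host == d or host.endswith("." + d) for d in PPC_DOMAINS)


def ppc_clicks_per_ip(lines):
    """Count PPC-referred requests per IP address in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        if is_ppc_referer(m.group("referer")):
            counts[m.group("ip")] += 1
    return counts
```

You can pass an open log file straight to `ppc_clicks_per_ip`. Calling `.most_common()` on the result lists the busiest addresses first.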
This is not something you can do once and then forget. Compiling statistical data takes time and effort, but it is the kind of data that Google and Overture will take seriously. You cannot approach them with a suspicion and expect them to listen unless you can back it up with hard data. You should monitor your bids daily and review them in more depth weekly or monthly. Remember, nobody is going to be as interested in making sure your business succeeds as you are.
At this point, there is something I want to point out. People who engage in click fraud simply click the pay per click ad, land on your site, and leave. They don’t do anything on your site. They don’t stick around reading your page or visit other pages. You can use this behavior to drastically narrow down the possible fraud.
To that end, it would probably be good to actually know HOW to get this done. In this article I’m going to avoid going into a specific programming language, but if you like, I will be happy to provide you with some PHP code to accomplish what will be discussed below. Simply leave a comment about this article and let me know you’re interested.
For this you will want to create a simple four-column table in whatever database your Web host uses. That is probably MySQL, so I am going to assume MySQL when describing the fields below. For this example we are going to call the table “fraudcheck,” and the four columns will be:
- FraudKey (int), auto-increment primary key
- IPAddress (varchar 20)
- a date/time field (datetime) recording when the click arrived
- AnotherPage (char 3), defaulting to “no”
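This is roughly what that table looks like as SQL. It is run through Python's built-in `sqlite3` module so the sketch is self-contained. The syntax is SQLite, not MySQL; in MySQL an integer key would usually be written `INT AUTO_INCREMENT PRIMARY KEY` and the timestamp `DATETIME`. `ClickTime` is a hypothetical name for the date and time field, which the article describes but does not name.

```python
import sqlite3

# SQLite version of the fraudcheck table. ClickTime is a hypothetical name
# for the date/time column; dates are stored as 'YYYY-MM-DD HH:MM:SS' text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fraudcheck (
    FraudKey    INTEGER PRIMARY KEY AUTOINCREMENT,
    IPAddress   VARCHAR(20) NOT NULL,
    ClickTime   TEXT NOT NULL,
    AnotherPage CHAR(3) NOT NULL DEFAULT 'no'
)
"""


def create_fraudcheck(conn):
    """Create the fraudcheck table if it does not already exist."""
    conn.execute(SCHEMA)
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
create_fraudcheck(conn)
print([row[1] for row in conn.execute("PRAGMA table_info(fraudcheck)")])
# prints ['FraudKey', 'IPAddress', 'ClickTime', 'AnotherPage']
```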
Don’t worry about the column names. Their meanings will become apparent in a moment.
To make the final code easier to understand, allow me to explain what will happen within the code itself.
At the very top of your page, check whether the referrer (the address in the browser’s HTTP Referer header) belongs to one of the companies with which you have a pay per click account.
If it does, capture the visitor’s IP address and insert a new row into your table. All you need to insert is the IP address and the date and time. The FraudKey and AnotherPage fields take care of themselves: one auto-increments, and the other defaults to “no.”
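This minimal sketch covers those two steps. It assumes your page code can already read the Referer header and the visitor's IP address; how you get them depends on your server setup. `PPC_DOMAINS`, `ClickTime`, and `record_ppc_click` are hypothetical names. The schema is repeated only so the example runs on its own.

```python
import sqlite3
from datetime import datetime
from urllib.parse import urlsplit

# Hypothetical PPC referrer domains; use the engines you actually advertise with.
PPC_DOMAINS = ("googleadservices.com", "overture.com")

SCHEMA = """CREATE TABLE IF NOT EXISTS fraudcheck (
    FraudKey INTEGER PRIMARY KEY AUTOINCREMENT,
    IPAddress VARCHAR(20) NOT NULL,
    ClickTime TEXT NOT NULL,
    AnotherPage CHAR(3) NOT NULL DEFAULT 'no')"""


def is_ppc_referer(referer):
    """True if the Referer header's host is a PPC domain or a subdomain of one."""
    host = urlsplit(referer or "").hostname or ""
    return any(host == d or host.endswith("." + d) for d in PPC_DOMAINS)


def record_ppc_click(conn, referer, ip, now=None):
    """Insert a row for a PPC visitor. Returns True if a row was inserted."""
    if not is_ppc_referer(referer):
        return False  # not from a PPC engine, so nothing to record
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    # Only IPAddress and ClickTime are supplied; FraudKey auto-increments
    # and AnotherPage falls back to its default of 'no'.
    conn.execute(
        "INSERT INTO fraudcheck (IPAddress, ClickTime) VALUES (?, ?)",
        (ip, stamp),
    )
    conn.commit()
    return True
```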
Next, you’ll need to read that record back out, because you need its FraudKey to track one thing about the visitor: whether they do anything on the site besides landing on the main page.
To do that, select all rows from the table where IPAddress equals the address you just wrote, ordered by the FraudKey field in descending order. The first record in that result set is the one you are after.
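The first function below sketches that lookup, using hypothetical names. `LIMIT 1` simply stops after the first record. There is one caveat. If two clicks from the same IP address arrive almost simultaneously, the newest row may not be the one this request inserted. Most database APIs can return the key they just generated instead, for example MySQL's `LAST_INSERT_ID()` or `cursor.lastrowid` in Python's `sqlite3`. The second function shows that alternative.

```python
import sqlite3

SCHEMA = """CREATE TABLE IF NOT EXISTS fraudcheck (
    FraudKey INTEGER PRIMARY KEY AUTOINCREMENT,
    IPAddress VARCHAR(20) NOT NULL,
    ClickTime TEXT NOT NULL,
    AnotherPage CHAR(3) NOT NULL DEFAULT 'no')"""


def latest_fraud_key(conn, ip):
    """The approach in the text: newest FraudKey for this IP, or None."""
    row = conn.execute(
        "SELECT FraudKey FROM fraudcheck WHERE IPAddress = ? "
        "ORDER BY FraudKey DESC LIMIT 1",
        (ip,),
    ).fetchone()
    return row[0] if row else None


def insert_and_get_key(conn, ip, click_time):
    """Alternative: insert the row and take the key the database just assigned."""
    cur = conn.execute(
        "INSERT INTO fraudcheck (IPAddress, ClickTime) VALUES (?, ?)",
        (ip, click_time),
    )
    conn.commit()
    return cur.lastrowid
```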
With that done (though it sounds wordy, all of this happens very quickly, typically in a small fraction of a second), display the page to the user.
From here I can’t give you any code, because it depends heavily on exactly how your site is set up. All that needs to happen is this: if the user clicks any link on your site, update the record with that FraudKey so that AnotherPage is set to “yes.”
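How the FraudKey travels from page to page depends on your site. A session variable, a cookie, or a query-string parameter are common choices, but the text does not specify one. Whatever the mechanism, the database side is a single `UPDATE`. In this minimal sketch the names are hypothetical, and the key is assumed to have already been recovered by your page code.

```python
import sqlite3

SCHEMA = """CREATE TABLE IF NOT EXISTS fraudcheck (
    FraudKey INTEGER PRIMARY KEY AUTOINCREMENT,
    IPAddress VARCHAR(20) NOT NULL,
    ClickTime TEXT NOT NULL,
    AnotherPage CHAR(3) NOT NULL DEFAULT 'no')"""


def mark_another_page(conn, fraud_key):
    """Set AnotherPage to 'yes' for one visit. Returns True if a row changed."""
    if fraud_key is None:
        return False  # visitor didn't arrive via PPC, so there is no record
    cur = conn.execute(
        "UPDATE fraudcheck SET AnotherPage = 'yes' WHERE FraudKey = ?",
        (fraud_key,),
    )
    conn.commit()
    return cur.rowcount == 1
```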
At that point, all you have to do is pull, once a day, all rows from the table where AnotherPage is still “no.” That gives you every visitor who reached your main page from the pay per click engine but didn’t do anything else on your site. This drastically narrows down the traffic you have to wade through.
Now, order the results by IP address.
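The daily query might look like this sketch. It assumes the hypothetical `ClickTime` column holds text in `YYYY-MM-DD HH:MM:SS` form, so that string comparison matches date order. Sorting IP addresses as text keeps identical addresses next to each other. It also keeps together addresses that share their first three numbers, though the result is not true numeric order.

```python
import sqlite3

SCHEMA = """CREATE TABLE IF NOT EXISTS fraudcheck (
    FraudKey INTEGER PRIMARY KEY AUTOINCREMENT,
    IPAddress VARCHAR(20) NOT NULL,
    ClickTime TEXT NOT NULL,
    AnotherPage CHAR(3) NOT NULL DEFAULT 'no')"""


def single_page_visits(conn, since):
    """PPC visits since `since` that never went past the main page, by IP."""
    return conn.execute(
        "SELECT FraudKey, IPAddress, ClickTime FROM fraudcheck "
        "WHERE AnotherPage = 'no' AND ClickTime >= ? "
        "ORDER BY IPAddress, ClickTime",
        (since,),
    ).fetchall()
```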
You are looking for two things:
- More than a couple of clicks from the same IP address. That is potential fraud.
- More than a few clicks (ten or so) from IP addresses that are very similar, such as addresses that differ only in the last number (192.0.2.10 and 192.0.2.57, for example). That is also potential fraud.
If you are seeing a lot of either pattern, you almost certainly have fraud.
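You can apply these two rules of thumb mechanically to the IP addresses from the daily query. This sketch interprets "very similar" as "in the same /24 block", meaning the same first three numbers of an IPv4 address. That interpretation is an assumption; the text does not define the term. The thresholds (more than 2 and more than 10) follow the text's "more than a couple" and "ten or so", so tune them to your traffic. A malformed address raises `ValueError`.

```python
import ipaddress
from collections import Counter


def flag_suspects(ips, same_ip_limit=2, subnet_limit=10, prefix=24):
    """Apply the two checks to IPs of single-page PPC visits.

    Returns (repeat_ips, busy_subnets):
      repeat_ips   - addresses seen more than same_ip_limit times
      busy_subnets - /prefix networks seen more than subnet_limit times
    """
    per_ip = Counter(ips)
    # strict=False lets "1.2.3.4/24" collapse to the network 1.2.3.0/24
    per_net = Counter(
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in ips
    )
    repeat_ips = sorted(ip for ip, n in per_ip.items() if n > same_ip_limit)
    busy_subnets = sorted(str(net) for net, n in per_net.items() if n > subnet_limit)
    return repeat_ips, busy_subnets
```

For example, `flag_suspects(["1.2.3.4"] * 3 + ["9.9.9.9"])` returns `(['1.2.3.4'], [])`. The repeated address is flagged, but neither /24 block reaches the subnet threshold.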
You see, people who come to your site legitimately from a search engine are searching for something. Because of the nature of pay per click ads, most people who click on them know they are ads, and they know what kind of site they are visiting.
A legitimate visitor should be doing more than landing on your main page and going nowhere else. Thus any traffic that enters your site from a pay per click engine and doesn’t visit any other pages is automatically suspect. If the same IP address visits more than once or twice and never goes anywhere except your main page, that is almost certainly fraud. The same is true of any significant number of such visits from IP addresses that are very close to each other.
You can go one step further with code that only runs after a visitor has been on the page for 15 seconds. Someone who is committing click fraud won’t stay around that long, so the code won’t run. On the other hand, someone who is actually visiting your site will almost certainly stay at least that long, if not longer. This helps you further separate the legitimate traffic from the fraud.
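The text doesn't say how the 15-second code is built. One common approach, assumed here, is a small script on the page that makes a request back to your server after 15 seconds. If that request never arrives, the visitor left early. The server-side decision is then a simple time comparison, sketched below with hypothetical names.

```python
from datetime import datetime, timedelta

MIN_DWELL = timedelta(seconds=15)  # threshold from the text


def stayed_long_enough(arrived_at, ping_at, min_dwell=MIN_DWELL):
    """True if a 'still here' ping came in at least min_dwell after arrival.

    ping_at is None when no ping ever arrived (the visitor left early).
    """
    if ping_at is None:
        return False
    return ping_at - arrived_at >= min_dwell


arrived = datetime(2004, 10, 10, 13, 55, 36)
print(stayed_long_enough(arrived, arrived + timedelta(seconds=20)))  # True
print(stayed_long_enough(arrived, None))                             # False
```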
Another option, if you have access to your Web server’s log files, is to simply use one of the free tools that are already available. A fantastic one is aWebVisit.
This tool will give you all kinds of information about your visitors. It is useful for fraud tracking: simply look for similarities among visitors who arrive from pay per click engines and don’t stay more than a few seconds. Beyond that, it provides robust user tracking features that can make running your site easier. You can see what people are doing, how long they spend doing it, what their click paths are (which will help you make your site design as efficient as possible for your users), and so on.
If you want, you can find several tools like aWebVisit on the Logging and Statistics page of the CGI Resource Index. Several of them create their own log files, so you can use them even if you don’t have access to your server’s log file. Some of the programs there are free, and others usually carry a fairly nominal charge.
The benefit of using a program like aWebVisit is all of the other “stuff” you can learn. The drawback is that you don’t have the same flexibility in filtering the data that you would have with a PHP program written to the specifications I provided earlier.