Statistical Process Control Implementation in Web Analytics: Key Concepts

Many of you reading this article may not know what “statistical process control” or “SPC” is. It is a problem-detection technique that can be applied to any measurable process. Keep reading as we delve into the appropriate methods and show how to apply them to SEO.

The word “statistical” implies that the data gathered is analyzed to determine whether a given event can be classified as a “common cause” or a “special cause.” “Common causes” are the normal, random noise present in any process.

One example of a “process” is a search engine: its algorithm does the computation and the search results are the output. Random noise always occurs in the rankings (e.g. a minor algorithmic adjustment), which explains the day-to-day fluctuations in ranking and, in turn, the fluctuations in website traffic.

There is no such thing as a constant ranking in Google or constant daily website traffic. You would never expect your website to receive exactly 105 unique visitors a day for an entire month, or to rank at position 9 all year long. Fluctuations in the most common web marketing metrics (unique website traffic, conversion rate and so on) are caused either by so-called “common causes” or “special causes.”

On the other hand, a “special cause,” as the term suggests, is a rare event in the process that results in a substantial change, often a problem. In search engine optimization, a good example of a “special cause” is a complete overhaul of Google’s search ranking algorithm, or a major update, which can cause a substantial shift in rankings and traffic.

Another special cause is internal, such as accidentally tagging your entire website with <meta name="robots" content="noindex, follow">, which causes all of your pages indexed in Google to disappear, along with the website traffic associated with them.
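To show how you might catch that particular special cause early, here is a minimal sketch (not from this article) written in Python using only the standard library; it checks a few URLs for a noindex robots meta tag. The URLs are placeholders for your own pages.

# Minimal sketch: check pages for a "noindex" robots meta tag using only the
# Python standard library. The URLs below are placeholders, not real pages.
from html.parser import HTMLParser
from urllib.request import urlopen


class RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name="robots" ...> tag on a page."""

    def __init__(self):
        super().__init__()
        self.robots_directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name == "robots":
            self.robots_directives.append((attrs.get("content") or "").lower())


def has_noindex(url):
    """Return True if the page at `url` carries a noindex robots directive."""
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in directive for directive in parser.robots_directives)


for page in ["https://www.example.com/", "https://www.example.com/about.html"]:
    try:
        print(page, "-> noindex found!" if has_noindex(page) else "-> OK")
    except OSError as error:
        print(page, "-> could not fetch:", error)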

This article covers “common cause” and “special cause” issues in detail, so you will learn whether a given trend in your website’s data can be classified as normal, along with the general definitions of each type of cause.

Identifying the nature and type of these causes is beneficial to both SEO practitioners and clients. Reacting to a scenario without clear evidence (that is, without using statistical process control) can lead to “over-control” (an excessive reaction to ranking or traffic fluctuations that are in fact normal, like getting position 6 yesterday and position 9 today) and “under-control” (underestimating a problem that is in fact serious, such as 10 consecutive days of down-trending traffic).

By understanding the concepts of statistical process control, any webmaster, business owner or SEO can properly identify real problems on the spot. This will prevent you from spending a lot of money on situations that are not really problems at all. A good example is immediately making on-site tweaks (or paying for an SEO investigation by an on-site worker or SEO professional) because a ranking fell from position 6 to position 9 or traffic dropped from 1,000 to 950 visitors. This might sound crazy, but it happens a lot with paranoid clients and over-reactive SEOs.

The objective of this article is to illustrate the key concepts of how statistical process control can be used to monitor search engine marketing variables, such as web traffic and conversion rate, in order to spot special causes (real problems) occurring on the website from day to day.

A process is said to be “stable” if only “common causes” appear. To detect special causes, you use a quality-monitoring chart called a “control chart.” Every control chart has three vital components.

The first component is the upper control limit (UCL). This is the statistical maximum of the data points, computed from historical data. In search engine marketing, any webmaster can obtain historical data from Google Analytics, StatCounter or any other reputable web analytics package.

Computing these limits is outside the scope of this article. A detailed discussion of how to construct control charts will be covered on ASP Free in an upcoming tutorial that will show you how to build them with MS Excel for search engine marketing.

A point lying above the UCL signals an “out of control” process. If you are monitoring the stability of your website traffic, exceeding the UCL is a special cause, but this does not necessarily mean a problem. For example, if your website makes its way to the front page of Digg, that could be the reason for a traffic spike. In search engine marketing, a point above the upper control limit is often desirable, but not always, since it could also indicate a series of denial-of-service attacks.

The second component is the mean, which is the average of the historical data. Naturally, any metric (web traffic, for example) should swing above and below the mean because of normal variation (common causes). A sustained shift (for example, web traffic falling below the average for more than 10 consecutive days) is a sign that a special cause has occurred.

The third component is the lower control limit (LCL). This is the statistical minimum of the data points, again computed from historical data. A point below the lower control limit means a “special cause” has occurred that requires investigation and corrective action.
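The detailed computation is left to the upcoming tutorial, but a rough preview may help. The sketch below shows one common approach (an individuals chart with 3-sigma limits, where sigma is estimated from the average moving range divided by the d2 constant 1.128), not necessarily the method that tutorial will use, and the visitor counts are made up for illustration.

# Minimal sketch (one common approach, not necessarily the upcoming tutorial's
# method): control limits for an individuals (I) chart. Sigma is estimated from
# the average moving range divided by the d2 constant 1.128. Data is made up.
daily_visitors = [105, 98, 112, 101, 95, 108, 117, 99, 104, 110,
                  96, 103, 109, 100, 107, 94, 111, 102, 106, 98]

mean = sum(daily_visitors) / len(daily_visitors)

# Average moving range between consecutive days.
moving_ranges = [abs(b - a) for a, b in zip(daily_visitors, daily_visitors[1:])]
average_moving_range = sum(moving_ranges) / len(moving_ranges)

sigma = average_moving_range / 1.128      # d2 constant for subgroups of size 2
ucl = mean + 3 * sigma                    # upper control limit
lcl = max(mean - 3 * sigma, 0)            # lower control limit (traffic cannot go negative)

print(f"Center line (mean): {mean:.1f}")
print(f"Upper control limit: {ucl:.1f}")
print(f"Lower control limit: {lcl:.1f}")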

Say you are monitoring daily web traffic and, some days after you start, it drops below the lower control limit; a real problem has occurred on the website. In this situation it is recommended that you execute an OCAP (out-of-control action plan): the webmaster investigates the root causes of the out-of-control events and immediately formulates corrective actions.
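The article does not prescribe a specific monitoring tool, but as an illustration, a daily check along these lines could raise the OCAP alert. The limit values and today's visitor count below are placeholders.

# Illustrative sketch only: compare the newest day's traffic against
# precomputed control limits. The numbers below are placeholders.
def check_today(todays_visitors, lcl, ucl):
    """Return a short status message for today's traffic."""
    if todays_visitors < lcl:
        return "Below LCL: execute the OCAP, find the root cause, apply corrective action."
    if todays_visitors > ucl:
        return "Above UCL: special cause; confirm it first (feature, viral link, or attack?)."
    return "Within limits: common-cause variation only, no action needed."


print(check_today(todays_visitors=72, lcl=80.5, ucl=127.3))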

Below is a sample control chart implementing SPC techniques:

In the above screenshot only one special cause is found (a point above the UCL). This is not exactly a problem, however; in fact, something good happened on that day (the website was featured on a related high-traffic site). Aside from that, the daily unique visitors to the website appear “stable” and “in control.” There is no need to investigate or execute any out-of-control action plan.

The red line is the upper control limit, the green line is the center line (or mean), and the yellow line is the lower control limit. These limits were computed from three months of daily unique visitor data gathered before the control chart was set up.

Of course, if an improvement action is implemented on the website (such as a Six Sigma project), it is worth recalculating the limits to reflect the new stable (improved) condition.

So far we have only discussed the basics of the control chart. How do we detect special causes occurring on the chart? Below is a short guide to realistically implementing out-of-control rules in web analytics (excerpted from “Use and interpretation of statistical quality control charts” by James C. Benneyan); a short code sketch applying these rules follows the list:

1. Any point above the UCL – as you saw in the sample chart above. This is not usually a problem for the most common web analytics variables, but you still need to confirm the real cause.

2. Any point below the LCL – This means a real problem has occurred, such as daily unique visitors falling below the lower control limit. This is worth investigating. Once the root cause has been determined, corrective action should be formulated and implemented.

3. More than eight consecutive points above the mean – This usually happens when an improvement has been permanently implemented, resulting in a substantial traffic increase. Once this cause has been confirmed, recalculating the control limits is recommended to reflect the new stable condition.

4. More than eight consecutive points below the mean – This is a sign of a problem slowly affecting the website. One example is an indexing problem with inner pages; it may not be immediately noticeable, especially if your website has thousands of indexable URLs, but pages slowly drop out of the Google index and web traffic falls with them.

5. Six consecutive points with either an increasing or decreasing trend – If traffic shows six consecutive points trending downward, it is worth investigating whether something on the website has changed (e.g. pages accidentally blocked in robots.txt). An upward trend is usually not a sign of a problem but a sign of progress (improving website traffic).

6. No variation – If there is no variation at all, that is also worth investigating, because your chart should show normal variation. It may mean something is wrong with your web analytics software or with the accuracy of your data.
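As promised above, here is a minimal sketch of these six rules applied to a series of daily visitor counts. The run and trend thresholds follow the wording of the list (more than eight points for a run, six for a trend); the data and limit values are made up for illustration.

# Minimal sketch (not the article's code): scan a daily-visitors series for the
# six out-of-control signals described above. Data and limits are made up.
def longest_run(values, predicate):
    """Length of the longest run of consecutive values satisfying `predicate`."""
    best = current = 0
    for value in values:
        current = current + 1 if predicate(value) else 0
        best = max(best, current)
    return best


def longest_trend(values, direction):
    """Longest run of consecutive points strictly increasing or decreasing."""
    best = current = 1
    for prev, curr in zip(values, values[1:]):
        step_matches = curr > prev if direction == "up" else curr < prev
        current = current + 1 if step_matches else 1
        best = max(best, current)
    return best


def scan_for_special_causes(data, mean, lcl, ucl):
    signals = []
    if any(v > ucl for v in data):
        signals.append("1. Point above the UCL (confirm the real cause).")
    if any(v < lcl for v in data):
        signals.append("2. Point below the LCL (investigate and correct).")
    if longest_run(data, lambda v: v > mean) > 8:
        signals.append("3. More than eight consecutive points above the mean.")
    if longest_run(data, lambda v: v < mean) > 8:
        signals.append("4. More than eight consecutive points below the mean.")
    if longest_trend(data, "up") >= 6 or longest_trend(data, "down") >= 6:
        signals.append("5. Six consecutive points trending up or down.")
    if len(set(data)) == 1:
        signals.append("6. No variation at all (check the analytics setup).")
    return signals or ["No out-of-control signals: the process looks stable."]


daily_visitors = [105, 98, 112, 101, 95, 108, 117, 99, 104, 110,
                  96, 103, 141, 100, 107, 94, 111, 102, 106, 98]
for signal in scan_for_special_causes(daily_visitors, mean=103.7, lcl=80.5, ucl=127.3):
    print(signal)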

If the control charts of your web analytics data do not show any of the six patterns above, your process (and your website) is said to be “stable.”

See the sample screenshot below of out-of-control scenarios:

Image credits: GMcGlinn (http://commons.wikimedia.org/wiki/User:GMcGlinn)
