Estimate Website Traffic with Compete.com Using Regression Analysis

As a webmaster who competes with other websites or has an interest in entering a new niche, you might want traffic numbers for sites that you don’t own. You can’t get them from Google Analytics, but Compete.com data, calibrated against Google Analytics, may give you a reasonable estimate. Keep reading to find out how.

It is important as a webmaster to at least estimate the number of actual unique visitors to any website. Of course, you know that you can get accurate data using Google Analytics and other tools. However, you need to be the owner of the website in order to see those data.

If you are not the verified owner of the website, then you cannot obtain its traffic data from Google Analytics or similar tools such as StatCounter.

A feasible, though not entirely accurate, approach is to use free online tools that estimate the unique visitors of any website. One of these tools is Compete.com.

However, the main problem with such tools is the accuracy of the result. The figures given by Compete.com will rarely match the Google Analytics figures.

While the tool provides you with some data, it gives no indication of how those numbers relate to Google Analytics, which is a standard in web analytics.

This study aims to estimate the unique visitors of a website as if measured by Google Analytics but using Compete.com’s raw data.

At the end of this study, any webmaster will be able to estimate the number of unique monthly visitors that Google Analytics would report for any website, given its Compete traffic value, to a certain confidence level (83%, for example).

The main objective is that, even if you do not have access to the Google Analytics account of a certain website, you will still be able to estimate the number of unique visitors it receives, using Compete.com data.

Methodology of the Study

In order to estimate traffic, a model needs to be generated using regression analysis. To conduct a regression analysis, the following steps are employed:

Step 1: Select a website with at least one full year of Google Analytics data.

Step 2: Gather the Compete.com unique visitors data for the website. Compete.com by default provides one full year of data (12 months maximum).

To do this, you need to go to this URL: http://compete.com/. Click “Site Profile,” enter the domain name and then hit “Go.”

[Screenshot: Compete.com Site Profile]

The unique visitors data can be read from the resulting unique visitors plot.

Step 3: Gather the equivalent unique visitors data in Google Analytics for the months covered by the Compete.com data.

To get the absolute unique visitors data in Google Analytics, first click “View Report” after logging in to your Google Analytics account. In the “Dashboard,” adjust the date range to match the date range covered by the Compete.com data.

For example, if Compete.com provides September 2009 to September 2010 data, then adjust the date period to September 1, 2009 to September 30, 2010 in Google Analytics.

Finally, click “Visitors” and then “Absolute Unique Visitors.” To get monthly data, click the “Month” option beside “Graph by:”.

[Screenshot: Google Analytics Absolute Unique Visitors report, graphed by month]

Step 4: Summarize all the data gathered in an Excel spreadsheet.

Step 5: Perform regression data analysis.

Step 6: Make conclusions and consider recommendations/case examples.

[Screenshot: Excel spreadsheet containing the Compete.com and Google Analytics data, with a scatter plot]

You can also download the Excel regression analysis as discussed in this study at the link.  

The plot shows that Compete and Google Analytics are “positively” correlated. Refer to this page for the definition of positive correlation. This means that a high number of unique visitors in Google Analytics relates to a high number of unique visitors in Compete.com.

In the scatter plot, the x-axis is the Compete.com data and the y-axis is the Google Analytics data.
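For readers who prefer to check the fit outside a spreadsheet, here is a minimal Python sketch of an ordinary least-squares line with Compete.com on the x-axis and Google Analytics on the y-axis. The function name fit_line and the sample numbers are illustrative assumptions; the numbers are placeholders and are not the data used in this study.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept; also returns R squared."""
    x_bar, y_bar = mean(x), mean(y)
    # Sums of squares and cross-products around the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical placeholder values, NOT the study's data
compete = [1000, 1500, 1200, 1800, 2000, 1700]
analytics = [4300, 5200, 4500, 5600, 5400, 5500]

slope, intercept, r_squared = fit_line(compete, analytics)
print(f"Y = {slope:.4f}X + {intercept:.1f}, R squared = {r_squared:.2f}")
```

A positive slope corresponds to the positive correlation seen in the scatter plot. This sketch does not compute the p-value or the confidence bounds; those still come from a full regression tool such as Excel.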

To do regression analysis in Excel, the “Analysis ToolPak” add-in must be installed.

[Screenshot: Excel regression analysis results]

The R squared is around 0.38, or 38%, meaning the model explains about 38% of the variation in the Google Analytics data. To test whether the regression model is significant, the p-value is compared to an acceptable error.

Suppose our confidence level is 83%. The acceptable error, then, is 17%. If the p-value is less than 0.17, then you can say that the relationship between the Google Analytics and Compete data is significant at an 83% confidence level.

Otherwise, if the p-value is greater than 0.17, then the relationship is not significant.

Conclusion

Based on the analysis, the p-value of the ANOVA (analysis of variance) is 0.05638, or 5.638%, which is less than 17%. Therefore the regression model is significant at an 83% confidence level.
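The decision rule used here can be written as a short Python sketch. The function name is_significant is an illustrative assumption; the p-value and confidence level are the ones quoted above.

```python
def is_significant(p_value, confidence_level):
    """Return True when the p-value is below the acceptable error."""
    # Acceptable error is 1 minus the confidence level, e.g. 0.17 for 83%
    acceptable_error = 1.0 - confidence_level
    return p_value < acceptable_error


# Values from this study: ANOVA p-value 0.05638 at an 83% confidence level
print(is_significant(0.05638, 0.83))
```

Note that the same p-value would not pass a stricter 95% confidence level, where the acceptable error is 0.05.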

If this is your first encounter with regression analysis, it is recommended that you read this linear regression analysis tutorial for details on doing regression analysis in MS Excel, as well as interpreting the results.

The pink line in the graph in the screenshot above follows this regression model:

Y = 1.2113X + 3226.8, where X is the Compete.com data and Y is the predicted Google Analytics equivalent.

However, for real-world applications, estimating the 83% confidence interval is much more useful and meaningful.

Based on the regression analysis, the following are the 83% confidence interval equations:

Upper 83%: Y = 2.031X + 4477.2945
Lower 83%: Y = 0.392X + 1976.291
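As a rough sketch, the three equations above can be wrapped in a small Python helper. The function name estimate_unique_visitors and the input of 10,000 visitors are illustrative assumptions; the coefficients are copied from the equations in this article.

```python
# Coefficients copied from the article's regression model and 83% equations
MID_SLOPE, MID_INTERCEPT = 1.2113, 3226.8
UPPER_SLOPE, UPPER_INTERCEPT = 2.031, 4477.2945
LOWER_SLOPE, LOWER_INTERCEPT = 0.392, 1976.291


def estimate_unique_visitors(compete_visitors):
    """Return (lower, predicted, upper) Google Analytics estimates."""
    lower = LOWER_SLOPE * compete_visitors + LOWER_INTERCEPT
    predicted = MID_SLOPE * compete_visitors + MID_INTERCEPT
    upper = UPPER_SLOPE * compete_visitors + UPPER_INTERCEPT
    return lower, predicted, upper


# Hypothetical input: a site with 10,000 Compete.com unique visitors
lower, predicted, upper = estimate_unique_visitors(10000)
print(f"{lower:.0f} to {upper:.0f} (predicted {predicted:.0f})")
```

Returning all three values keeps the point estimate next to its bounds, which makes the width of the range obvious.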

Website Traffic/Unique Visitors Estimation Examples

Now that the regression analysis is done, it is time to use it to estimate Google Analytics website traffic from Compete.com’s unique visitors data.

Case Example 1: Estimate the latest monthly unique visitors, as measured by Google Analytics, of the seochat.com domain.

Step 1. Get the latest Compete.com (http://compete.com/) unique visitors data for the seochat.com domain. Go to “Site Profile,” enter seochat.com in the text box, and then click “Go.”

According to Compete, the seochat.com domain’s unique visitors as of September 2010 (latest month in their report) are around 186,389.

Let’s use the 83% confidence interval equations in the regression analysis model to estimate the upper and lower Google Analytics equivalent number of unique visitors.

Upper 83%: Y = 2.031X + 4477.2945 = 2.031 * 186389 + 4477.2945 ≈ 383,033
Lower 83%: Y = 0.392X + 1976.291 = 0.392 * 186389 + 1976.291 ≈ 75,041

Interpretation of Results: The seochat.com domain’s September 2010 unique visitors, as Google Analytics would measure them, are estimated at somewhere between 75,041 and 383,033 at an 83% confidence level. Of course, there is a 17% chance that the prediction is wrong, because the confidence level is 83%.

As this example shows, Compete.com data becomes much more useful and meaningful when it is used to compute the upper and lower limits of the equivalent Google Analytics unique visitors.

Case Example 2: Estimate the January 2010 monthly unique visitors as measured by Google Analytics for Americantowns.com using Compete.com data.

Given: According to the Compete.com Site Profile tool, Americantowns.com had around 1,558,548 unique visitors in January 2010.

Solution: Using the 83% confidence interval equations:

Upper 83%: Y = 2.031X + 4477.2945 = 2.031 * 1558548 + 4477.2945 ≈ 3,169,888
Lower 83%: Y = 0.392X + 1976.291 = 0.392 * 1558548 + 1976.291 ≈ 612,927

For comparison, this press release claims that Americantowns.com got 3 million unique visitors in January 2010, according to Google Analytics.

This means that the “actual” website traffic falls within the range estimated from Compete.com’s data, that is, between 612,927 and 3,169,888 unique visitors.
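The same check can be sketched in Python. The function name within_estimated_range is an illustrative assumption, and the 3,000,000 figure is the rounded number quoted from the press release, not an exact count.

```python
def within_estimated_range(compete_visitors, reported_visitors):
    """Check a reported figure against the article's 83% bounds."""
    # Lower and upper 83% equations from this article
    lower = 0.392 * compete_visitors + 1976.291
    upper = 2.031 * compete_visitors + 4477.2945
    inside = lower <= reported_visitors <= upper
    return inside, round(lower), round(upper)


# Compete.com value for January 2010 and the rounded press release figure
inside, lower, upper = within_estimated_range(1_558_548, 3_000_000)
print(inside, lower, upper)
```

The reported figure sits close to the upper bound, which suggests that for this site Compete.com counts far fewer visitors than Google Analytics does.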

If you need an online tool for this regression model for quick calculations, you can find it here: http://www.php-developer.org/estimateuniquevisitors/  

Of course, the accuracy of this regression model can be improved further by adding more samples and data to the calculation. The sample size analyzed is around 10, determined by this stat sampling tool:

http://www.danielsoper.com/statcalc/calc01.aspx  

With only about 10 samples, the 83% confidence interval is wide, which explains the large gap between the lower and upper limits of the Google Analytics unique visitors prediction.

 

 

 
