Definition of the Factors
To quantify keyword difficulty, associate it with search volume: in most cases, the higher a keyword’s search volume, the more competitive the keyword. Beyond search volume, you can also look at the number of pages indexed in Google that contain the exact term. The higher the number of competing pages, the more competitive the keyword tends to be.
Therefore, to give a mathematical definition, keyword difficulty can be approximated by the following formula:
Keyword Difficulty = Keyword Search Volume (Exact match) x Google competing pages
In the above definition, it is obvious that if a certain keyword has a very high search volume and many competing pages, it is a difficult keyword for which to rank in Google.
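As a minimal sketch of the formula above, the difficulty score can be computed and used to rank a keyword list. The function name, the keyword labels and the numbers below are illustrative assumptions; they are not taken from the study's data.

```python
def keyword_difficulty(exact_match_volume: int, competing_pages: int) -> int:
    """Keyword difficulty as defined above: search volume x competing pages."""
    return exact_match_volume * competing_pages


# Hypothetical inputs (exact-match search volume, Google competing pages).
# These are placeholders, not figures from the study.
keywords = {
    "example head term": (1_000_000, 50_000_000),
    "example long tail term": (500, 20_000),
}

# Score every keyword, then sort from most to least difficult.
scores = {kw: keyword_difficulty(vol, pages) for kw, (vol, pages) in keywords.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for kw in ranked:
    print(kw, scores[kw])
```

Because the score is a plain product, it grows very quickly for popular terms, so the values are mostly useful for comparing keywords against each other rather than as absolute numbers.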
The size of the domain can be measured by the number of indexed URLs in Google using the site: operator.
So if the ranking URL is http://www.seochat.com , the site operator query will be:
The main objective is to determine whether Google prefers to rank big websites for highly competitive terms. This study needs a statistically valid sample size, so a sample of 30 keywords is selected. The 30 randomly sampled keywords break down as follows:
~10 highly competitive keywords
~10 medium competitive keywords
~10 very easy (non-competitive) keywords
Below is the procedure for gathering the data:
Step 1: Make a final list of keywords selected – a total of around 30 keywords based on the sampling distribution, with varying difficulty. Put the data in a table format for easier organization. Use the Google Keyword Tool and the Google search engine itself to get the raw data for keyword difficulty computation.
Step 2: Using Google.com, get the top 10 ranking URLs for each of those keywords on the list.
Step 3: Get the number of indexed URLs for each of those top 10 ranking domains, starting from position 1 and going through to position 10, removing duplicate domains and outliers (refer to spreadsheet for details; link provided in a later section). Tabulate the data in the Excel spreadsheet.
Step 4: Compute the average number of indexed URLs for each keyword from position 1 through position 10 domains.
Step 5: Make a correlation plot between average number of indexed URLs vs. keyword difficulty.
Step 6: Make a regression analysis on the data.
Step 7: Draw conclusions and make recommendations.
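Steps 3 and 4 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, the indexed-URL counts are assumed to have been collected by hand into a dictionary, and treating "www.example.com" and "example.com" as the same domain is a choice made here, not something the procedure specifies.

```python
from urllib.parse import urlparse


def unique_domains(ranking_urls):
    """Return the domains of the ranking URLs in rank order, dropping repeats."""
    seen = set()
    domains = []
    for url in ranking_urls:
        host = urlparse(url).netloc.lower()
        # Assumption: a leading "www." does not make a different domain.
        if host.startswith("www."):
            host = host[4:]
        if host and host not in seen:
            seen.add(host)
            domains.append(host)
    return domains


def average_indexed_urls(domains, indexed_counts):
    """Average the indexed-URL counts of the given domains (Step 4)."""
    counts = [indexed_counts[d] for d in domains]
    return sum(counts) / len(counts)


# Hypothetical top-ranking URLs and manually collected indexed-URL counts.
urls = [
    "http://www.example.com/a",
    "http://example.com/b",
    "http://example.org/",
]
counts = {"example.com": 1200, "example.org": 300}

domains = unique_domains(urls)
print(domains, average_indexed_urls(domains, counts))
```

Outlier removal, which Step 3 also mentions, is left out of this sketch.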
Keyword Selection List and Keyword Difficulty Data
Data has been gathered according to the methodology set in the previous section, and then keyword difficulty has been calculated. Below you’ll see a screen shot of the data gathered:
You’ve probably noticed that the highly competitive keywords (the first ten) have very high keyword difficulty values as compared to medium and easy keywords.
After the top 10 domain name data is gathered from Google for the selected keywords, the indexed pages of the domain are gathered as well. How many pages are indexed correlates with the size of the domain.
However, after the data has been gathered, the outliers are removed to ensure the accuracy of the results. An outlier is defined as a data point that lies beyond the following threshold:
Outlier threshold = average + standard deviation
In layman’s terms, an outlier represents noise or a special case that is not considered “normally occurring.” After removal of the outliers, the final average data will be used for the correlation plot. Below is a screen shot of the data table containing an outlier:
Data Tabulation of the Overall Results
Once the outlier has been removed and the average numbers of indexed pages are re-computed, a new column will be added to the original data table labeled “Average indexed URLs of Top Ranking domains.”
For example, in the above screen shot, for the competitive keyword “books,” the average indexed URLs for all of the top 10 ranking domains for “books” (position 1 to position 10) is 7,571,714. This number of indexed pages signifies “very large ranking domains” for Google’s top 10 positions.
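The outlier rule and the re-computed average can be sketched as follows. The sketch rests on several assumptions: the threshold is applied per keyword across that keyword's top ranking domains, only values above "average + standard deviation" are dropped, and the sample standard deviation is used. The article does not say which standard deviation was used, and the numbers below are made up for illustration.

```python
from statistics import mean, stdev


def remove_outliers(counts):
    """Drop values above average + one standard deviation."""
    # Assumption: sample standard deviation (statistics.stdev).
    threshold = mean(counts) + stdev(counts)
    return [c for c in counts if c <= threshold]


def average_without_outliers(counts):
    """Re-compute the average after the outliers have been removed."""
    return mean(remove_outliers(counts))


# Hypothetical indexed-URL counts for one keyword's top ranking domains.
indexed_counts = [100, 120, 90, 110, 5000]

print(remove_outliers(indexed_counts))
print(average_without_outliers(indexed_counts))
```

With a one-standard-deviation cutoff, a single very large domain is enough to be excluded, so the re-computed average describes the typical ranking domain rather than the largest one.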
Since you are interested in finding out whether Google prefers to rank “big” sites for “difficult” keywords, you will need to make a correlation plot between these two factors/variables (keyword difficulty and average number of indexed URLs of top ranking domains).
To download all of the data sets, tables and computations used in this article for your own evaluation purposes, you can go here:
Finally, once the data table has been finalized, you are ready to make a correlation plot. Using the MS Excel correlation plot feature generates a chart that looks like the one below:
It is surprising to see that there exists a strong correlation (78%) between “average indexed URLs of Top Ranking domains” and “Keyword difficulty.” In short, big domains are the ones ranking for difficult keywords. Stating this in another way, “Google typically prefers to rank big websites for highly competitive terms.”
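For readers who want to reproduce this kind of analysis outside a spreadsheet, here is a minimal sketch of a correlation coefficient and a least-squares regression line in plain Python. The function names and the toy data are assumptions for illustration only. The article does not state whether its 78% figure is the correlation coefficient r or the squared value, so the sketch prints both.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data, NOT the study's data: x stands for keyword difficulty and
# y for the average indexed URLs of the top ranking domains.
difficulty = [1, 2, 3, 4, 5]
avg_indexed = [1, 3, 2, 5, 4]

r = pearson_r(difficulty, avg_indexed)
slope, intercept = least_squares_line(difficulty, avg_indexed)
print(r, r ** 2, slope, intercept)
```

Because keyword difficulty and indexed-URL counts both span several orders of magnitude, a few very large values can dominate the result, which is one reason to settle the outlier rule before computing the correlation.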
For easier keywords (low keyword difficulty value), Google returns smaller websites with a lower number of indexed URLs. However, for highly difficult keywords like “LCD” and “SEO,” Google prefers ranking big websites.
Conclusions and Recommendations
The study indicates that Google typically favors big websites for very competitive terms. Size alone is probably not the whole explanation. Age could be another factor: these websites tend to be old and are continually updating their content. As a result, they become very large and earn a lot of trusted, natural links, which form a strong ranking foundation in Google for competitive terms.
You should consider the following recommendations:
First, if a website is small, this often means that the website is still young or underdeveloped. Such a website should not target medium to highly competitive keywords, because Google apparently prefers to rank big websites for competitive terms. Avoiding those terms prevents wasting a lot of resources on something that would be very hard or impossible to achieve.
Second, for big websites, it might be feasible to target competitive terms, provided other important search engine ranking factors are also considered. These include such items as the link profile of the domain, content and authority.
Third, if there is one important thing you can do to your website now which can substantially help search engine rankings in the future, it is adding and updating content. Doing this can significantly increase the size of your website, which is a positive factor for ranking competitive terms. Also, it can be a big factor for attracting long tail traffic and natural deep links.
Fourth, big websites do have some advantages in earning natural links and earning long tail traffic. Once you start earning deep links, it helps you to attain a natural link profile for your website (your inner pages have value or importance), which is a positive factor. And then, these pages will also rank for deep searches and contribute to your overall traffic.
Fifth, as for internal link development, if you link all of these inner pages to your home page, it will substantially strengthen the home page’s ability to rank for highly competitive terms. This is similar to what http://www.nasdaq.com does to rank in the first position for “stock market.”