Getting SEO Information from Google’s Cache

Google’s cache has been available in the search results for a long time, yet it is often ignored in SEO strategy and analysis. Using it can provide a lot of information that helps increase leads, sales and user satisfaction, and it can even offer clues to existing problems with your website. This article examines Google’s cache page in detail and recommends ways to use the information it provides in your search engine optimization strategy.

Basic components of Google’s cache

Before we dive deeper, a short introduction to Google’s cache is helpful. You can see the Google cache of your website/web pages in three different ways.

Method 1: While visiting the URL whose cached copy you want to view, click the Google toolbar (e.g., in the Firefox browser); in the drop-down, click "Google snapshot of page."

Method 2: In the Google search result showing the URL that you need to view, click the "Cache" link.

Method 3: In the Google search box, type:

cache:www.thisisyourdomain.com/thisisthepage.htm

Or if you are checking for your home page:

cache:www.thisisyourdomain.com/
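
If you check many pages, the query used in Method 3 can be assembled programmatically. The Python sketch below is only an illustration: the function names cache_query and cache_search_url are invented here, and it assumes the query is submitted through the usual https://www.google.com/search?q= address.

```python
from urllib.parse import urlencode, urlsplit


def cache_query(page_url: str) -> str:
    """Build the 'cache:' query for a page, dropping the scheme if present."""
    # urlsplit only recognizes the host when the string contains '//'.
    parts = urlsplit(page_url if "//" in page_url else "//" + page_url)
    target = parts.netloc + (parts.path or "/")  # home page becomes 'domain/'
    if parts.query:
        target += "?" + parts.query
    return "cache:" + target


def cache_search_url(page_url: str) -> str:
    """Return a Google search URL carrying the cache query (assumed URL pattern)."""
    return "https://www.google.com/search?" + urlencode({"q": cache_query(page_url)})


print(cache_query("http://www.thisisyourdomain.com/thisisthepage.htm"))
print(cache_query("www.thisisyourdomain.com"))
```

The first call prints the query for an inner page and the second prints the home page form with its trailing slash.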

Details are shown in the screenshot below:

The Google cache page shows the following important information:

  • The date Google cached the page

  • A link to the text-only version

  • The Google cache content in HTML view

You can find the Google cache header information (containing the cache date and a link to the "Text only version") in the topmost portion of the cache result.

The cached content in HTML view appears in the same place as it would in an ordinary browser view.

Some website owners do not want their content to be cached by Google and other search engines. They add the following meta tag to their pages:

<meta name="robots" content="noarchive">

When search engine users type keywords for which your site ranks very well, they click on your result, hoping to read your content or learn about your services. Sometimes, however, the server is down or the page loads too slowly at that particular moment. The most common reaction is to press the browser’s back button and look for the cache link next to your website in the Google search results.

If you use the meta tag shown above, Google does not show the cached link for your website. The result is a poor user experience, because potential readers or customers cannot read your content or learn about your services. Had they been able to do so, they could have become repeat or loyal customers in the future.

This is why the Google cache link is always helpful as a backup in case the live version of your web page fails to load properly. In this case, users can take note of your website and may come back the next time it is live to make a product purchase, an inquiry or avail themselves of your services.
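
Whether a page carries the noarchive tag can be checked with a few lines of Python. The sketch below is a minimal illustration that uses only the standard library. The class and function names are hypothetical, and it inspects only robots meta tags in the HTML, not HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in robots meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        # "robots" addresses all crawlers; "googlebot" addresses Google only.
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def is_cacheable(html: str) -> bool:
    """True when no robots meta tag asks for 'noarchive'."""
    return "noarchive" not in robots_directives(html)


page = '<html><head><meta name="robots" content="noarchive"></head></html>'
print(is_cacheable(page))  # False
```

In practice you would pass in the HTML of each important page and flag any page for which is_cacheable returns False.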

Recommendations:

1. If you do not want your web page to be cached by Google because it presents some confidential information, think twice before publishing that content online. If it poses some risk, or there is no need for users to read that content or for Google to index it, you can block the URLs in robots.txt or add a meta noindex tag. This is the safer approach. You can even remove the URLs using the Google URL removal tool, provided they have been blocked, return a 404 or carry a meta noindex tag.

2. To improve the user experience, keep the most important pages of your website (especially those that you want to rank or that currently rank in Google) cacheable by Google.

3. Content is very important in SEO. Cached pages give users an alternative way to read your content, even if your server or website is down. In this case, they can take note of the URL and could come back to your site in the future, perhaps even link to it, which helps the SEO of your website.
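
For recommendation 1, Python’s standard urllib.robotparser module can confirm that a robots.txt rule really blocks the URLs you intend to block. This is a sketch under assumptions: the robots.txt content, the /private/ path and the function name is_blocked are made up for illustration, and the module’s rule matching may not mirror Googlebot’s own parser in every detail.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True when the given robots.txt text disallows the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network access
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that hides a confidential directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

print(is_blocked(ROBOTS_TXT, "http://www.thisisyourdomain.com/private/report.htm"))  # True
print(is_blocked(ROBOTS_TXT, "http://www.thisisyourdomain.com/thisisthepage.htm"))   # False
```

Running the check against your real robots.txt before requesting a URL removal helps catch a rule that is too narrow or too broad.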

The text-only version of Google’s cache provides important clues as to whether Google is picking up the right textual content of your web page. You can view it by clicking the "text-only version" link in the Google cache result of your website (see screenshot below). If you can see your text during normal browsing but not in the text-only version, then Googlebot most likely cannot see that text either. The text might be generated by JavaScript or displayed using Adobe Flash, which is common. In short, Googlebot is not properly indexing your website’s text content.

In fact, the text-only version of Google’s cache delivers results that are highly similar to what the Lynx text browser shows. This makes sense, since Google admits that Googlebot and that browser behave similarly.

Recommendations:

1. If you have any doubt as to whether the hyperlinks in your website are crawlable, view the Google cache text-only version and check whether those hyperlinks appear and can be clicked.

2. The text-only version is useful for diagnosing whether your website has hidden text, JavaScript cloaking or Flash-based content. If the content seen by Googlebot is very different from what shows up in a normal browser, you need to improve your content presentation so that what Googlebot sees matches what your human visitors see.
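
A rough local approximation of the text-only view can help with both recommendations. The Python sketch below is not how Google produces its text-only version. It simply keeps the text and hyperlinks present in the raw HTML and ignores anything inside script and style elements, so text that only JavaScript would write never appears. The class name TextOnlyView and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collect visible text and link targets from raw HTML."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def text_only_view(html: str):
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


SAMPLE = ('<html><body><h1>Widgets</h1>'
          '<script>document.write("Hidden offer")</script>'
          '<a href="/contact.htm">Contact us</a></body></html>')
text, links = text_only_view(SAMPLE)
print(text)   # Widgets Contact us
print(links)  # ['/contact.htm']
```

If a phrase or link that you can see in your browser is missing from this output, it is worth checking how it appears in the Google cache text-only version as well.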

According to Aaron Wall, Google’s cache date is, at least theoretically, the new Google Page Rank. Out of curiosity, and to put that claim to the test, it is easy to gather samples of Google cache dates and correlate them with Google Page Rank’s measurement of importance.

Below are the results:

Based on the data provided above, authoritative and trusted websites are indexed more frequently by Google than less well-known websites. So the Google cache date does reveal some information about Google’s trust in your website.

If you’re having a hard time understanding the data table provided above, see the correlation plot below between Google Page Rank and Google cache date:

It shows clearly that high-PR websites are indexed frequently: their "hours since last indexed" value, calculated as the date of the check minus the Google cache date, is small. By contrast, a typical low-PR site such as a PR 3 website was last visited around 250 hours before the check (250/24, or roughly 10 days).
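
The calculation behind the plot can be sketched in Python. This is an illustration under assumptions, not the method used to produce the data above: the two dates are hypothetical and were chosen only so that the gap equals the 250 hours mentioned, and the function names are invented here.

```python
from datetime import datetime
from math import sqrt


def hours_since_cached(date_checked: datetime, cache_date: datetime) -> float:
    """Difference = date of check minus Google cache date, in hours."""
    return (date_checked - cache_date).total_seconds() / 3600


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical dates (not taken from the study).
checked = datetime(2010, 1, 11, 10, 0)
cached = datetime(2010, 1, 1, 0, 0)
hours = hours_since_cached(checked, cached)
print(hours, round(hours / 24, 1))  # 250.0 10.4
```

To repeat the comparison with your own samples, pass the list of Page Rank values and the matching list of hours to pearson. A value close to -1 would mean that higher Page Rank goes with fewer hours since the last cache date.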

Recommendations:

1. According to this study, Google Page Rank and back links are highly correlated. In addition, according to the study above (between Google’s cache date and Google Page Rank), websites that possess authority and trust are indexed frequently, so it makes sense to work hard on content that attracts quality back links.

2. Google Page Rank and Google cache date, as shown in those studies, do not directly translate to "good rankings," so even if you are aiming for reputation and trust, you still need the "relevance" factor to rank well in Google.

The relevance factor involves the way your website presents content. Does it present content in such a way that Google finds it relevant to a specific search query? Once you have the trust, ranking becomes easier, provided your website also has the relevance factor.
