Google 101 - Cached and Similar Pages
Cached
As Google tracks Web pages, it keeps copies of them on its servers in a repository called a cache. While the page title link takes you to the current site, the Cached link delivers you to the copy Google made when it recorded the page. Google rerecords most pages every few weeks. This time difference is significant because if a page has changed recently, you can still see a slightly older version, which might include the nugget you’re looking for or some info you remember from a previous visit.
Note: Webmasters can set up a site so that Google won’t cache it. As a result, you might not be able to reach a previous version of every page you find in a list of Google results. In such cases, you simply won’t see a Cached link. For a discussion of setting up your site to ward off caching, see page 233.
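If you’re curious whether a particular page has opted out, here is a minimal sketch in Python. It assumes the page uses the widely documented "noarchive" directive in a robots meta tag, which is one common way to ask search engines not to keep a cached copy; the book’s own instructions are on page 233 and may differ. The names RobotsMetaParser and blocks_caching are made up for this illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in robots-style <meta> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        # Assumption: "robots" addresses all crawlers, "googlebot" only Google.
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def blocks_caching(html_text):
    """Return True if the page carries a noarchive directive."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noarchive" in parser.directives
```

Feed it the source code of a page (for example, what you get from your browser’s View Source command). A True result suggests why a result has no Cached link, though a site can also opt out in ways this sketch doesn’t check.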
Google’s cache is also handy when a page you need has been deleted or its link is broken. Just click the Cached link, and Google takes you into its time machine. Figure 1-7 shows you what a cached page looks like.
Google’s cache feature is notorious for bringing deleted Web pages back from the grave. For example, in early 2003, Microsoft accidentally published activation codes on its site that let people use its software. Googlers are still hailing the cache feature for helping them find those codes for a couple of weeks after Microsoft pulled down the offending pages.
The cache can save you not only when you’re searching for something you recall seeing earlier or something you’ve heard somebody else wants to hide, but also when you’ve deleted a page from your own site by mistake. Just hit the Cached link, right-click the page to get the source code, copy it, upload it, and you’re back in business.
But the cache isn’t a cure-all for Web staleness. First of all, a cached page only lasts until Google rerecords the live page, usually every few weeks. Second, cached pages often include dead links. So if you’re reading a hot article on a cached page, and it flows to a second page, clicking the Next Page link may get you nowhere. And third, sometimes Google updates the cache before it updates the snippet, so your result listing may include some text you want but that isn’t even in the cache anymore. Consider yourself forewarned.
Tip: The Web Archive’s Wayback Machine (http://web.archive.org) is a public archive of the Web. Unlike Google, however, it keeps track of Web sites in perpetuity—making it kind of a permanent cache. It’s a great resource when you need to find a site that’s been defunct for more than a few weeks and has therefore fallen off Google’s radar.
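If you look up archived pages often, you can build the Wayback Machine address yourself. The sketch below assumes the archive’s usual address pattern (the site, then /web/, then a date stamp, then the page you want); the function name wayback_url is invented for this example.

```python
def wayback_url(page_url, timestamp=""):
    """Build a Wayback Machine address for page_url.

    timestamp is a string of digits in YYYYMMDDhhmmss form. It may be
    shortened (for example, "2003"), in which case the archive picks a
    capture near that date.
    """
    if timestamp and not timestamp.isdigit():
        raise ValueError("timestamp must contain digits only")
    # With no timestamp, "*" asks for the listing of all saved captures.
    stamp = timestamp or "*"
    return f"https://web.archive.org/web/{stamp}/{page_url}"
```

For instance, wayback_url("www.nyrrc.org", "2003") produces an address that asks for a copy of the New York Road Runners’ site from around 2003, provided the archive saved one.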

Similar pages
The “Similar pages” link searches the Web for pages that fall into the same general category as that result (often, pages of a feather link to each other, which is part of how Google determines similarity). For example, the pages related to ConsumerReports.org include ConsumerWorld.org, the site for the Better Business Bureau, and other consumer advocacy groups and agencies. Or, if you want to find a particular marathon training program, and you’ve clicked through to the New York Road Runners’ site (www.nyrrc.org) to no avail, try “Similar pages” to get links to Runner’s World, the Boston Athletic Association, and more. In short, “Similar pages” is a really good way to find pages in a category, including those that don’t necessarily contain your original keywords.
Indented results
When you run a search and Google finds more than one page with your terms within the same Web site, it lists what it thinks is the most important page first, and then it indents less relevant pages, as shown in Figure 1-8.

File format
Web sites often store documents that you can download by clicking a link. Google searches those documents, provided they’re in one of the common formats listed below, and tells you if something you’re looking for is in such a file.
When your query matches words that Google finds in a formatted document, it lets you know by placing a little format marker before the page title, as Figure 1-9 shows.

Here’s a list of the file types Google recognizes, along with their abbreviations (page 50 offers tips on searching for specific file types):
- Adobe Portable Document Format (pdf)
- Adobe PostScript (ps)
- Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
- Lotus WordPro (lwp)
- Macromedia Flash (swf)
- MacWrite (mw)
- Microsoft Excel (xls)
- Microsoft PowerPoint (ppt)
- Microsoft Word (doc)
- Microsoft Works (wks, wps, wdb)
- Microsoft Write (wri)
- Rich Text Format (rtf)
- Text (ans, txt)
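If you’d rather let a script decode those abbreviations, the list above translates directly into a lookup table. This is only a sketch: the names FORMATS and formats_for are invented here, and the table contains nothing beyond the entries in the list. Notice that wks appears under both Lotus 1-2-3 and Microsoft Works, so the lookup returns every format that matches rather than guessing.

```python
import os

# File types and abbreviations, copied from the list above.
FORMATS = {
    "Adobe Portable Document Format": ["pdf"],
    "Adobe PostScript": ["ps"],
    "Lotus 1-2-3": ["wk1", "wk2", "wk3", "wk4", "wk5", "wki", "wks", "wku"],
    "Lotus WordPro": ["lwp"],
    "Macromedia Flash": ["swf"],
    "MacWrite": ["mw"],
    "Microsoft Excel": ["xls"],
    "Microsoft PowerPoint": ["ppt"],
    "Microsoft Word": ["doc"],
    "Microsoft Works": ["wks", "wps", "wdb"],
    "Microsoft Write": ["wri"],
    "Rich Text Format": ["rtf"],
    "Text": ["ans", "txt"],
}


def formats_for(filename):
    """Return the names of every listed format that uses the file's extension."""
    # splitext gives (".xls"); strip the dot and ignore upper/lower case.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return [name for name, exts in FORMATS.items() if ext in exts]
```

A file named budget.XLS comes back as Microsoft Excel, ledger.wks comes back as both Lotus 1-2-3 and Microsoft Works, and a file whose extension isn’t on the list comes back with no match at all.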
Note: PDF, Adobe’s Portable Document Format, lets anyone, using any operating system, create documents that anyone else, no matter his or her operating system, can read. To read a PDF, you need Adobe’s free Acrobat Reader program, which you can download from www.adobe.com, or Mac OS X’s built-in Preview program.
But what if the page with your critical gem is a PowerPoint document, and you don’t happen to own PowerPoint? You’re in luck. Google not only keeps track of documents on the Web, it also converts them to HTML—a code your browser can read (see box on page 33)—and keeps a copy of the HTML for your viewing pleasure. Below the page title is the unassuming link, “View as HTML,” which might as well be called “Life Saver.” Just click the link, and in a split second, you’re reading the file on your browser as a normal Web page.
The “View as HTML” link is also ideal when you don’t want to spend half the morning waiting for a file to download and open. For example, if you want to view an Excel spreadsheet, your computer first has to open Excel, and then it has to download and open the actual spreadsheet file. Under the best circumstances, this process can take 10 or 20 long seconds. And if the file contains a lot of graphics, it can take a couple of semesters. Click the HTML link to bypass this morass, and you’re reading the file immediately. (If the HTML version of a document appears in a font too small for you to read, look around your browser for a feature that lets you zoom in on a page, something like View→Text Size.)
Note: Sometimes the HTML version of a file appears super-scrambled. If that’s the case, no harm, no foul: you can always go back and download the primary version. But if you don’t have the right program to read it, you can probably glean most of the info you need from the scrambled HTML version.