Danny Sullivan noted the release of several new tools to allow webmasters to remove content from the SERPs. I saw this article after I noticed that my brother put some personal information on one of his sites. I am finding the tools quite useful, though I have not removed the offending page yet.
The tool can be used by webmasters to remove specific pages and also images, complementing your Robots Exclusion Protocol. If some files escape your robots.txt file and your meta tag values, simply stroll over to Google's webmaster tools page and click on the Dashboard option. Note that you must be a verified site owner to use the Dashboard; to verify ownership you will be asked to upload a specific file to your website (or add a meta tag to your home page).
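For reference, blocking a stray page via robots.txt is a one-line rule. A minimal sketch (the file path here is hypothetical) might look like this:

```text
# robots.txt, placed at the root of the site
User-agent: *
Disallow: /personal-info.html
```

Keep in mind that robots.txt only asks well-behaved crawlers to stay away; the removal tool is what actually pulls an already-indexed page out of the SERPs.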
I will do an overview first, since I assume that there are some people who do not know about this set of tools and would like to make use of their functions.
When you get to Webmaster Central you will be provided with six links; one is called "Webmaster tools (including site maps)." It is this particular aspect of Webmaster Central we are interested in. Incidentally, the others are quite useful too; they feature discussions and a means of checking how many of your pages are indexed by Google, among other things.
The webmaster tools allow you to submit your site map, check how frequently Googlebot has crawled your pages, see when it last requested pages from your site, and review any errors it encountered while crawling. They also allow you to run diagnostics and checks on your site.
Webmaster Central will require you to sign in before you can use any of these tools, and you will need a Gmail account to do so (coercion, I say!). I believe that there are very few Google users out there without Gmail by now.
After you sign in, the Dashboard (labeled at the top of the page when you click through to it) will require you to add your website URL. Note that the Dashboard is also where you report spammers and submit re-inclusion requests if your site has been removed from the Google index.
Google "auto detected" my IP address and logged me into the Dashboard without my filling in anything; it probably helped that my Gmail account was up and running. After giving me a form with which to add my site URL, it asked me to verify the URL by either uploading a particular HTML file to the site or by adding a meta tag to the site's home page. Ergo I called up my brother and harangued him to pass the cPanel password to me pronto; then I remembered that I handled the hosting and started digging into the archives.
Once you verify your ownership of the site by following the instructions, you can start having some serious fun with the Dashboard. Google gives you comprehensive statistics about your web site on the Webmaster Tools page. To remove pages from the index or from the cache, navigate to "Diagnostics" and then to the "Remove URL" page; the link is on the left.
Once you are on the "Remove URL" page, fill in the options to remove an individual file, a directory, a cached copy, or even an entire site. These tools are a vast improvement over the old methods of removing pages from Google. Back then you pretty much had to wait until Google refreshed its index; that's a factor over which you have little or no control.
Please take note: to make sure that these tools work as advertised, you will have to modify your robots.txt file, add a meta tag to block the desired file(s), or delete the pages manually, so that when Googlebot requests the pages it is served a 404 response from the server (server to requester: "I can't find this file; it has been moved or deleted"). Once the request for the files to be removed from the index or from the cache has been submitted (and the required robots exclusion steps followed), the tool will process the request within a couple of days.
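If you go the deletion route, it is worth confirming that your server really does answer with a 404 before you submit the removal request. Here is a small sketch using only Python's standard library (the URL in the comment is a hypothetical example):

```python
import urllib.error
import urllib.request

def http_status(url):
    """Return the HTTP status code a crawler would see for a URL."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is on the exception
        return err.code

# You want http_status("http://example.com/deleted-page.html") to come back
# as 404 for every page you deleted, before asking Google to drop it.
```

Nothing fancy, but it beats finding out two days later that your host was quietly serving a custom "not found" page with a 200 status, which would make the removal request fail.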
You can monitor the progress of your request over time on your Dashboard, and you can also withdraw your request before it is accepted. Progress is shown under the "current requests" tab on the "Remove URL" page, where each request's status is listed (Pending, Removed, or Denied). Once removed, the file stays out of the index for the next six months, no matter what (unless you specifically request re-inclusion from the Dashboard).
After six months Google returns to normal crawling and indexing, so if you want the pages permanently removed, you must keep your existing protocols excluding bots from those pages in place. Alternatively, permanently deleting the pages (so your server returns an HTTP 404 response) will of course ensure that the files are not included again.
Using the "noindex" meta tag is all you need to make sure a page is not listed on the SERPs. If you just want to remove the cached copy of a page on your site, use the "noarchive" meta tag and fill in the necessary options on the "Remove URL" page. Once the cached copy is removed, no description or cached copy is available for viewing. Again, Google excludes the page from its cache for six months and then resumes crawling, indexing, and caching as normal, so make sure that your robots exclusion protocols are in place.
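For completeness, both directives are plain meta tags in the page's head; a minimal sketch looks like this:

```html
<!-- inside the <head> of the page in question -->
<meta name="robots" content="noindex">   <!-- keep the page off the SERPs -->
<meta name="robots" content="noarchive"> <!-- keep a cached copy from being shown -->
```

You can also combine them into a single tag with `content="noindex, noarchive"` if you want both behaviors at once.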
Third Party Removal and Other Interesting Horror Stories
Now this is the most interesting bit. You can actually wipe out cached copies of another site owner's pages using these tools (provided certain conditions are met, of course). Let's assume the content has changed, as in the case of an SEO professional removing a negative article from his client's web site because it generates bad publicity. Now let's say your suddenly (gasp!) inquisitive and web-savvy client checks out the cached copy and discovers that you have been sleeping on the job, and the SERPs still contain cached copies of the offending page.
However, instead of griping about your tardiness ("I changed it," you feebly explain), he goes to Webmaster Central himself and, from the "Remove URL" page, asks for the removal of that particular cached copy instead of just waiting for Google to refresh it whenever. While filling out the form to remove the cached copy, he notes that certain words have been removed from the page, and voila! Googlebot checks the page, confirms it is true, and wipes out the old cached copy.
Note that this means anyone (and I do mean anyone) can wipe out the cached copy of your page for SIX months if you have made any changes to it! That means no description on the SERPs for that page, and nothing for the searcher to check before jumping head first into your page. Cached-copy removal has been possible for quite some time on Webmaster Central, but the new tool makes the whole process much faster for third parties.
You can also remove pages which contain any sensitive information. For example, if somebody puts up my credit card information or a copy of my signature online, I can as a third party request that Google remove it from the SERPs. Also if another site violates my copyright I can request the removal of that page without asking for the site owner’s permission (scrapers beware!).
Bloggers have been discussing this issue somewhat tamely, in part because there is hardly any information on it. Still, some people claim that Google, as a gatekeeper, should seek to cache as much as possible for historical purposes. This is a moot point, judging from the outrage displayed when Google replaced post-Katrina imagery of New Orleans with pre-Katrina images in Google Maps; it was pretty phenomenal. Basically the "owners" of the content complained that they were being misrepresented. Google never claimed to be historians, and nobody is buying them extra servers to save all that lovely history.
Danny Sullivan at Search Engine Land argued that Google should not give third parties so much power over cached copies, since people could easily wipe out cached copies whenever they notice changes in a page. Google says that's the point: if the page content changes, they want their cached copies to reflect those changes. Danny Sullivan has been following these new tools extremely closely, and I have to acknowledge that my first whiff of them came when I checked out Search Engine Land. His article remains the only resource (apart from this one) on the SERPs for "Google Content Removal Tools"; do check his article out here.
I believe that Google is right in wanting accurate and timely information, but they really should separate caches from descriptions. Not having a description for a page is not cool, believe me: all you see is a bunch of links.
I am having a party "blasting" various cached pages which have been changed (bad, bad me). I found Danny’s article when I was looking for a way to quickly remove some listings from the SERPs. I was especially happy to have the tool when I discovered cached copies of pages I had deleted close to three months ago! I particularly love the tool for removal of copyright infringing material.
I’m still monitoring the ripple effects of this tool. I like its ergonomics; it is extremely easy to use. If you ever want to remove a page from Google’s index or cache, this tool will make it a simple task.