How Google's Quality Raters Treat Web Spam

This is the second part of a two-part article on the leaked Google document "Google Guidelines for Quality Raters." In the first part, we reviewed how Google raters are told to treat things such as relevance depending on the country of the query, as well as the actual rating scale. The document also revealed what raters consider "Useful Content" and showed us what it's like "Inside the Raters HUB." In this part, we'll cover the web spam guidelines.

If you want to read the entire 44-page document yourself, here’s the link: Google Guidelines for Quality Raters.

What is Considered Web Spam by Google

The document defines it this way: "Webspam is the term for web pages that are designed by webmasters to trick search engine robots and direct traffic to their websites."

Google treats spam as the enemy, even if the “spam” page is considered relevant or useful to the search query. Google states: “It is possible for a page to receive a very high rating – even a Vital rating – and also be assigned a Spam label.”

Once a page is identified as spam it goes to the “recycle bin.” If you use deceptive search engine optimization techniques, read the web spam section of the document very carefully.

Types of Spam

Here’s what Google Raters consider to be spam.

  • PPC pages

  • Parked domains

  • Thin affiliates

  • Hidden text and hidden links

  • JavaScript redirects

  • Keyword stuffing

  • 100% frame

  • Sneaky redirects



PPC pages are pages camouflaged as search results. Often they are blog posts whose content is surrounded by PPC ads and provides little value to users. Here are the types of "PPC pages" raters identify:

Fake Directories with PPC Ads: refers to PPC listings appearing as search results or directory listings. Here’s an example: www.fico.ca

Fake Blogs with PPC Ads: blogs with auto-generated or scraped content surrounded by PPC ads.

Fake Message Boards with PPC Ads: fake forums filled with worthless content scraped from other forums. They plug PPC links into that content or surround it with PPC ads.

Scraped or Copied Content with PPC Ads: Google is very good at detecting duplicate content. If the algorithms are unsure, the page is flagged for manual review by human raters. To make sure no one steals your content, use CopyScape. The document mentions that if webmasters give credit to the original source, that is enough to remove the "spam" label from the page.

An entire scraped website will be spotted, but if you copy an article once in a blue moon and give credit to the original source, there’s nothing to worry about. Check the search results for “worst SEO mistakes.” There are several sites that rank on the first and second pages with duplicate content. I also found this with several other search queries.

In short, copied content is okay if your website is trusted and has plenty of original content. Copied content is not okay if it’s your strategy.

To identify duplicate content, Google quality raters copy a sentence from the page and paste it into Google's search box, something you can do yourself.
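The same idea can be approximated programmatically. The sketch below is a rough illustration (not Google's actual method): it breaks two texts into overlapping word "shingles" and compares them with Jaccard similarity. The shingle size and threshold are assumptions chosen for readability.

```python
def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def likely_duplicate(text1, text2, k=5, threshold=0.5):
    # threshold is an illustrative assumption; real systems tune this value
    return jaccard(shingles(text1, k), shingles(text2, k)) >= threshold
```

Pasting a whole sentence into Google in quotes is effectively an exact-shingle lookup against the entire index; this function just does the comparison for two documents you already have.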

The document also says: "You can look for suspicious 'computer-manufactured' grammar," which brings us to the topic of writing your own content.

A while ago a member of the Digital Point forums (I couldn't find the thread when I looked) offered some innovative software that took text and made it look unique. It passed CopyScape and delighted many webmasters, who gave it rave reviews. The secret behind its efficiency was very simple: it took unique pages, translated them into a foreign language using Google Translate (or another service), and then translated them back into English. This produced perfectly "different" pages and provided webmasters with "fresh and unique content" to make Google's algorithms happy.

As an occasional writer, I couldn’t stop laughing. I am not sure if they got caught (because Matt Cutts lurks on those forums), but the lesson is – don’t be so cheap. Content production is expensive, but please don’t pervert writing like that. It doesn’t work long term, and deletion from Google’s index is far more expensive than paying for original content.

Google does not consider these items to be duplicate content:

  • Lyrics

  • Poems

  • Ring tones

  • Quotes

  • Proverbs


The document also mentions content positioned below the fold, with nothing but ads on top. Google seems to be okay with this technique as long as the content is ORIGINAL.

The important thing to remember is that if the scraped (copied) content on the page is removed and all that remains is ads, it is Spam.

A parked domain is a domain that expired, but was purchased by a scammer before the owner could renew it. Spammers put their own content on the site and benefit from the domain’s link power.

I believe that Google monitors WHOIS renewal dates and can easily correlate expiration dates with radical changes in content. This means that if parked domains get through the algorithms, they can still be flagged for human review.
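That speculation boils down to a simple heuristic, sketched below. Everything here is an assumption of mine (including the 60-day window), not anything the leaked document describes; it just makes the correlation concrete.

```python
from datetime import date

def looks_like_parked_takeover(whois_expiry, content_change, window_days=60):
    """Hypothetical heuristic: flag a site whose content changed radically
    shortly AFTER its WHOIS registration expired -- the pattern a scammer
    leaves when grabbing an expired domain and replacing its content.
    window_days is an assumed tuning parameter."""
    delta = (content_change - whois_expiry).days
    return 0 <= delta <= window_days
```

A hit from a check like this would not prove spam on its own; it would only queue the site for the kind of human review the rater guidelines describe.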

Quality raters are instructed to use http://www.waybackmachine.org to see how a site looked in the past. Parked domains are usually easy to spot because their pages are machine generated, for example: http://www.dasonet.com/todahfzkdk.htm

Thin Affiliates

If you rely on affiliate networks for branding and income, I have some bad news: Google doesn't like low-quality affiliates, which it calls "Thin Affiliates."

Thin affiliates copy a merchant's text and images without adding much unique value. Google considers those pages low quality and instructs raters to label them as spam. As a merchant you are protected, because raters cannot ban original merchants (unless you break other rules).

Thin affiliates may pass filters if they offer unique value like reviews, customer feedback or price comparison.

Hidden Text and Hidden Links / JavaScript Redirect / 100% Frames

This refers to cloaking. If you’re new to SEO, stay away from cloaking. There are Black Hat masters out there who rely on it, but I do not know what they know, so I can’t recommend it. You need to know what you’re doing to make cloaking work.

Building a business model around cloaking is foolish. Google always patches the holes, so relying on cloaking software to fool search engines is kind of risky for your wallet. Software vendors may not be as fast to come up with new solutions.

Keyword Stuffing

This method is worse than useless. Not only can it flag your site for manual rater review or get your site deleted from Google's index, it also has zero effect on your search engine rankings. "Keyword density" measurement is a thing of the past: keyword density has no impact on rankings (assuming you have a healthy amount of keywords on your pages), and too many keywords can actually hurt you.

DO NOT put excessive keywords in your tags, ALT attributes, image tags, text, and headlines. Google's algorithms can easily spot overly stuffed pages and flag them for human review.
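To make "overly stuffed" concrete, here is a toy density check. The metric and the 10% threshold are purely illustrative assumptions; they are not Google's algorithm, which (as the article notes) no longer uses simple density at all.

```python
import re
from collections import Counter

def keyword_density(text, keyword):
    """Fraction of words in `text` that equal `keyword` (case-insensitive).
    A rough, illustrative metric -- not Google's actual signal."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)

def looks_stuffed(text, keyword, threshold=0.10):
    # Assumed threshold: flag pages where one keyword exceeds 10% of all words.
    return keyword_density(text, keyword) > threshold
```

Run it on a spammy snippet like "cheap shoes cheap shoes buy cheap shoes now" and the single word "cheap" already makes up more than a third of the text, which is exactly the kind of page a rater would flag on sight.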

Today’s SEO game is about social branding (LINKS) and MEGA-useful content.

Sneaky Redirects

The document describes it like this: "A sneaky redirect takes place when a page redirects the user to a different URL on a different domain. While being redirected, you might observe the page being redirected through several URLs before ending up on the landing page. Search engines index and score the content on the first domain, yet the user is redirected to a different domain. Again, the webmaster is presenting different content to the search engine robot and the user."

Google raters aren't cool with this, but there are times when you have a legitimate reason to redirect users, perhaps because you moved your site to a new domain or bought another company.

When you do a Google-friendly 301 (permanent) redirect, part or all of the link power of the old domain is passed to the new destination, so it is CRUCIAL to do the redirect right. This is especially true if you've been building links to the old domain for a while. Who wants to lose months or maybe even years of hard work?

Make sure that the old domain and new domain contain related WHOIS data, so Google and its raters know that it’s you.

The topic of redirection is too large to discuss in this article, so do some research or ask around the forums if you want to learn how to do redirects right.
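As a starting point for that research, here is a minimal sketch of a permanent redirect using Python's standard library. The path mapping and the `new-domain.example` URL are hypothetical placeholders; in practice you would usually configure this in your web server (Apache, Nginx) rather than in application code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from old paths to their new-domain URLs.
REDIRECTS = {
    "/old-article": "https://new-domain.example/articles/old-article",
}

def redirect_target(path):
    """Return the new URL for a moved path, or None if the path is unknown."""
    return REDIRECTS.get(path)

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target:
            # 301 = Moved Permanently: the Google-friendly signal that
            # passes the old URL's link power to the new destination.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

# To serve the redirects (blocking call):
# HTTPServer(("", 8080), RedirectHandler).serve_forever()
```

The key point is the status code: a 301 tells search engines the move is permanent, which is what lets the old domain's authority transfer, unlike a temporary redirect or a sneaky client-side one.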

Conclusion

The Google Raters document describes how Google wants its search engine to work. The Rater Guidelines are built entirely with the aim of returning exceptional quality sites for end users, which is Google’s goal. As for you, small business owner, the best way to play the SEO game is to play it by Google’s rules – deliver exceptional quality to users. Of course there are links to worry about, but you can outsource link building to a good SEO company while you focus on delivering MEGA useful and exceptional content.

Spam techniques won't get past Google for long. They pay off for those who know how to use them, but if you're new to SEO, stay away from "black hat" methods. It's sometimes tempting to try something and get instant results, especially if you have been working on your standing for a while and don't see progress yet. Don't do it; newbies get burned faster than veterans.
