Content Scraping Gets Political

Every good SEO knows that content scraping is a bad idea. It infringes copyright and leads to Google penalties. Now we’ll need to add “embarrasses political office seekers” to the list.

No, neither Barack Obama nor Mitt Romney has been caught in a content scraping scandal – at least, not yet. But across the pond, over in the UK, it appears to be a different story. Danny Sullivan, writing for Search Engine Land, reported that the new co-chair of the UK’s Conservative Party, the Right Honorable Grant Shapps, apparently started a rather less-than-honorable business about seven years ago. This business sells a content scraping tool.

By all means, check out the link above to Sullivan’s full article; it’s a long and wonderfully comprehensive piece. You can also check out the Guardian’s coverage of this story. Sadly, I only have room to report the essentials here.

The company Shapps founded is named HowToCorp, and the software package it sells for nearly US$500 is named TrafficPaymaster. According to the Guardian, this software steals content from other websites and then “spins” it to make it look original. The whole idea behind using such software is to quickly generate web pages and whole websites on which the software’s users can place – and earn money from – AdSense ads from Google.

This software blatantly violates Google’s terms of service, assuming it works as advertised. Sullivan is cautious about his own statements concerning TrafficPaymaster, since he hasn’t actually tried it out or reviewed all the websites using it. But if it does work like similar software packages, there are other reasons not to use it. Sites created with the software give visitors bad experiences.

To demonstrate this point, Sullivan goes to a website touted by “Sebastian Fox” in a blog entry at HowToCorp. (For various reasons, Sullivan suspects this is Shapps using a pseudonym, but again, I don’t have the space to go into the details here.) This site was supposedly put together in only 10 minutes. In two weeks, the blog post claims, it got more than 200 pages in Google’s index. Sullivan found more than 1,000 pages using a “site:” search on the URL. Interestingly, I did the same search late on Thursday and found no matches, though I’m pretty sure that at least some of the site’s pages are still in Google’s index.

So the pages apparently got indexed. As Sullivan points out, that doesn’t prove anything. “Anyone can get pages listed. But if the pages don’t rank for actual searches that people do, they might as well be invisible,” Sullivan explains. And those pages didn’t rank for the keywords you’d expect them to, such as “golf grip” or “golf lessons.”

Worse, the content these pages offer after spinning is less useful than the stuff they spread on the fairways to make the grass grow. Sullivan quotes the first sentence of one of the site’s pages on free golf lessons: “A free of charge golf swing lesson appears a very little as well superior to be accurate.” Huh? I’ve edited some difficult work, but rarely have I encountered a sentence that confusing. No native English speaker, and very few who speak English reasonably well as their second (or even third) language, would write like that.

What would produce that sentence, though, as Sullivan explains, is software that copies content from elsewhere on the web and then changes the words around or replaces them with synonyms. You’re left with content that may not be a direct copy – but also doesn’t make any sense. You’re not going to hold any visitors with bogey shots like that. Of course, for those kinds of pages, that’s not the idea; the plan is to put ads prominently on the page, and generate income from visitors clicking on the ads.
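
To see how a sentence like that can come about, here is a minimal Python sketch of naive synonym swapping. To be clear, this is not TrafficPaymaster’s code, which I haven’t seen; the synonym table is one I made up for illustration, and the “original” sentence is only my guess at what the source text might have looked like.

```python
import re

# Hypothetical, hand-picked synonym table (an assumption for illustration).
# Real spinning tools presumably use much larger thesaurus lists.
SYNONYMS = {
    "free": "free of charge",
    "seems": "appears",
    "little": "very little",
    "too": "as well",
    "good": "superior",
    "true": "accurate",
}


def spin(text: str) -> str:
    """Replace each word that has a table entry, ignoring context entirely."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Look the word up case-insensitively; keep it unchanged if not found.
        return SYNONYMS.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", swap, text)


# A guess at a plausible source sentence, not confirmed by Sullivan's article.
original = "A free golf swing lesson seems a little too good to be true."
print(spin(original))
```

Each swap looks defensible in isolation (“too” can mean “as well,” “good” can mean “superior”), but because the lookup ignores the idiom “too good to be true,” the result is word salad much like the sentence Sullivan quoted.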

The Guardian cited sources at Google who confirmed that TrafficPaymaster violated its policies, and that its search engine’s algorithms could drop the ranking of any web pages made with HowToCorp’s software. Perhaps the fact that none of the web pages touted by “Sebastian Fox” turned up when I did a “site:” search indicates that the search giant has already taken action.

To be fair, Shapps claims that he’s not involved in the business, and that his wife runs it entirely. Indeed, a Shapps spokesman claims that the politician “derives no income, dividends, or other income from this business, which is run by his wife Belinda…He is quite simply not involved in this business.” There’s no telling what effect (if any) these revelations will have on Shapps’s political career – but no one should be involved in this kind of business. It’s bad for the web and bad for visitors.

I’ll get off my soapbox now. What do you think? Feel free to post in the comments.
