Spamming the Blogosphere: the Spread of Splogs

It’s a black hat tactic for scoring high in the SERPs, and it takes the fun out of searching for new and interesting blogs. It’s called splogging, and it’s proving to be no easy matter to wipe out. Keep reading to learn more, including what can be done about it.

I can only plead the disadvantage of not being a full-time SEO as an excuse for this topic not hitting my radar sooner and harder. It’s been around since at least 2003, according to Wikipedia. The actual slang for it hit the press around August 2005, just in time for a major spike in the phenomenon in late October. I’m talking about splogs, and if you’re an SEO who hasn’t heard of this black hat trick for increasing your standing in the SERPs, you’re probably in the minority.

Normally I wouldn’t care to write about black hat SEO tactics. The thing is, this particular one really gums up the works of search engines, and frustrates users looking for real content. It takes something that many people enjoy—discovering web logs with new and cool information or a fresh point of view—and turns it into an exercise in commercialization and content scraping. It’s not fair to the folks who are looking, and it’s not fair to the folks who write real blogs with good content. And these days there are programs to automate the process.

I’m not saying there’s anything wrong with commercialization. Some of the websites I enjoy visiting regularly are all about commerce. But it’s wrong to deceive the search engines and web surfers by setting them up to expect one kind of content and then giving them something entirely different. That’s exactly what splogging does, and why it is classified as a black hat SEO tactic. That said, let’s take a closer look at this scourge, why people do it, and what about it, exactly, causes problems. I’ll also discuss some of the ways that people are fighting back, so that splogs (hopefully) won’t end up taking over the Internet.

So what exactly is splogging, anyway? Wikipedia defines splogs as “weblog sites which the author uses only for promoting affiliated websites. The purpose is to increase the PageRank of the affiliated sites, get ad impressions from visitors, and/or use the blog as a link outlet to get new sites indexed.” In other words, sploggers basically create tons of weblogs and have them host ads and/or include links pointing to the sites they want to promote. The promoted sites soak up the PageRank those links pass along, and the sploggers laugh all the way to the bank on the income from people clicking ads, both on the splogs themselves and on the promoted sites once the search engines start directing searchers there.

So where do they get the content? In many cases, they simply take it from other blogs. If you have your own online journal, you may be a victim of splogging and never even know it. Wikipedia is also reportedly very popular with the splogging set. And there are sites online that feature “private label articles.” These kinds of articles are purchased under a special type of license that legally allows you to edit and publish the article as your own, right down to putting in your own name as the author. Aside from the lack of original content, one of the big differences between a splog and a regular blog is that the splog often contains the same word or phrase repeated over and over (to score well under that keyword in the search engines) and lots of particular kinds of ads in the sidebar, usually promoting porn, gambling, tobacco, Viagra, or mortgage loan websites.
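To make the repeated-keyword tell a little more concrete, here is a minimal Python sketch of how you might measure it. This is my own illustration, not the method any search engine or splog hunter is known to use. The function names, the stop-word list, and the 20 percent threshold are all assumptions picked for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real checker would need a far longer one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "for", "on", "with"}


def top_keyword_share(text):
    """Return (word, share) for the most frequent non-stop word in text.

    share is that word's count divided by the number of non-stop words.
    """
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOP_WORDS]
    if not words:
        return None, 0.0
    word, count = Counter(words).most_common(1)[0]
    return word, count / len(words)


def looks_keyword_stuffed(text, threshold=0.2):
    """Flag text whose single most common word makes up at least
    `threshold` of its non-stop words. The threshold is an assumption."""
    _, share = top_keyword_share(text)
    return share >= threshold
```

A post where one word makes up a large share of the text gets flagged, while ordinary prose rarely concentrates that heavily on a single term. On its own this would misfire on short posts and on blogs that are legitimately about one topic, so treat it as one signal among several (the ad mix in the sidebar being another), not a verdict.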

In late October 2005, Google’s Blogger and BlogSpot hosting service were hit with what one commentator called a “splogsplosion.” It led to clogged RSS readers and many bloggers suffering from overflowing in-boxes. In the aftermath of the activity, Google deleted 13,000 fake blogs. It also installed new software designed to prevent this from happening again.

Who or what caused the flood? Analysts agreed it wasn’t a sweatshop of low-paid laborers being forced to blog. A black hat SEO must have used some kind of software to automate the process. Tim Bray, web technologies director at Sun, seemed both horrified and impressed by the results: “The total numbers (of fake sites) must be mind-boggling…The software that’s generating these things is pretty sophisticated; you might think (the sites) were real at first glance.”

I admit that this is what first made me take notice of this topic. Doing lots of fake blogs, even if you’re cutting and pasting, would clearly take a long time. But The Register recently reported on Brian Adams of Blue Diamond Enterprises, hailing him with tongue in cheek as the Gutenberg and Jefferson of the digital age, all rolled into one.

Adams created the Blog Mass Installer, a software tool that can create 100 fake blogs in under half an hour. You don’t even have to monitor the software too closely while it does its job. In fact, it’s even designed to get around the programs that would normally trip up an automated blog maker.

Google’s Blogger now uses a word verification system called a CAPTCHA as part of the blog creation process. You’ve seen this kind of thing before; it shows you some distorted letters that you have to type in to prove you’re a human and not a machine. The Blog Mass Installer chimes when it needs a human to enter the CAPTCHA word. So it may not be totally automated, but you can work on other tasks while the process is going on, and far more blogs can be created in the same period of time.

This doesn’t guarantee that your blog won’t be deleted. It does, however, get you over what is supposed to be one of the biggest hurdles to proving that you’ve just created an actual blog, not a splog. And Adams is offering this software for less than $200.

For his part, Adams doesn’t see any real harm in his program. A tool is only a tool. “I wouldn’t say that the tools are just polluting [the web]. It’s the responsibility of the webmaster to put up content that’s actually useful. If they don’t do that, Google will delete them,” he said in an interview with The Register.

And Google would certainly be right to. Using the Blog Mass Installer in the way most black hat SEOs would be inclined to use it is a violation of the rules surrounding Google’s own AdSense program. Those guidelines include the statement that no AdSense ad may be hosted on a page “published specifically for the purpose of showing ads, whether or not the page is relevant.” Assuming Google itself actually adheres to these guidelines and enforces them, it would be a case of “not doing evil” winning out over the profit motive. Think about it; Google makes its money from advertising. Doesn’t disallowing this kind of thing mean it’s shooting itself in the foot?

The Blog Mass Installer isn’t the only automated splog producer on the market; it’s just the one that happened to hit my radar first. This has been going on for a while. Most people know that Google’s index is about eight billion pages strong; what not everyone knows is that an estimated one third of those pages have been generated by machines. Wikipedia estimates that one in five blogs may in fact be splogs. Back in October 2005, blog search provider Technorati estimated that 5.8 percent of new blogs overall are fake or potentially fake. That’s 50,000 posts. As of November 2005, SplogSpot’s database listed 41,000 splogs. A check of recently updated splogs at SplogSpot showed 500, but keep in mind that this is only one resource (so it probably didn’t catch all of them), and even then the list only covers the ones that have been updated most recently.

These fake blogs use up disk space, pollute search engine results (especially blog search engine results), and damage networking within the blog community. Well-known blogger Chris Pirillo pointed out “What happens when all the search terms become infested with these splogs? It makes it that much harder to find the stuff you really want to look for.” This hurts everyone on the web.

So what is being done to fight splogging? I’ve already mentioned one of the actions that Google took. There are also resources online that hunt down splogs so that blog aggregators don’t list them. SplogSpot (http://www.splogspot.com) is one; this company maintains a database of splogs which it makes available to the public via APIs. Splog Reporter (http://www.splogreporter.com) also maintains a list of splogs. A2B’s (http://www.a2b.cc/) main trick lets you find websites by geographic location, but it also blocks web server IP addresses that splog URLs resolve to.
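The idea of blocking the server IP addresses that splog URLs resolve to can be sketched in a few lines of Python. To be clear, this is only my guess at the general shape of such a filter, not A2B’s actual code. The function names are hypothetical, and the resolver is passed in as a parameter so the logic can be tried out without touching the network.

```python
import socket
from urllib.parse import urlparse


def resolve_host(host):
    """Resolve a hostname to an IPv4 address string, or None on failure."""
    try:
        return socket.gethostbyname(host)
    except OSError:  # covers DNS lookup failures (socket.gaierror)
        return None


def build_ip_blocklist(splog_urls, resolver=resolve_host):
    """Collect the IP addresses that a list of known splog URLs resolve to."""
    blocked = set()
    for url in splog_urls:
        host = urlparse(url).hostname
        ip = resolver(host) if host else None
        if ip:
            blocked.add(ip)
    return blocked


def is_blocked(url, blocked_ips, resolver=resolve_host):
    """True if the URL's host resolves to an address on the blocklist."""
    host = urlparse(url).hostname
    ip = resolver(host) if host else None
    return ip is not None and ip in blocked_ips
```

Blocking by address rather than by URL means a brand-new splog on the same server is caught without ever having been reported. The obvious cost is that real blogs sharing that server get blocked too, which matters a great deal on a big shared host.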

Most of these resources (including Google’s Blogger) have ways for you to report splogs. If a splog has Google ads on it, it can also be reported to Google’s AdSense program. If Google determines that it is a splog, the splogger could then lose their AdSense account. In short, the best way to fight splogs is to be vocal to those who may be inadvertently hosting them; let them know the splog is there, and more than likely they’ll take it down. After all, it’s in everyone’s best interest to help keep the blogosphere, and the search engines, truly relevant.
