Thwarting Content Theft

Not long ago I found several complete articles I wrote posted online, on someone else’s web site. We never granted permission to republish those articles. If you’ve experienced the same problem with your original work, you’ll want to keep reading. Content theft is no minor issue.

The most common form of content theft happens when the thief sends a scraper bot to grab original articles from web sites, and then throws a bunch of them together onto a single site. The thief then sets up the site to make money from AdSense. If you’re a blogger and you don’t make money from your content, this can be particularly infuriating.

The thief may or may not realize that search engines penalize sites for duplicate content, so he may be penalized, or he may not. Search engines can’t always tell who published an article first, and content thieves often work very fast; I’ve seen articles I’ve written published in full on other sites within a week or two of their going live here. So it’s entirely possible that you will suffer an unjustified penalty for content you created!

Fortunately, you don’t have to live with this problem. You can not only detect content theft, you can also make the brazen bandits take your content down. As the original content creator, you have the law on your side, especially in the US.

I won’t lie to you; policing your content and getting thieves to take it down requires work. But it’s a small price to pay, and it may not take as long as you’d think. Charles Fagundes, editorial manager here at SEO Chat for years before he moved back to the programming side, notes that how long it takes depends on whether the theft was committed unknowingly or on purpose. “If it’s unknowingly, it can take about an hour of work, and the content comes down in a few days,” he explained. “If it’s on purpose, it can take multiple hours of searching and threatening and a week or two to have it fixed.”

Some people manage it in even less time. Jonathan Bailey, writing about content theft myths for Blog Herald, observed from his experience with more than 600 cases of plagiarism that “an average case of plagiarism should never take longer than twenty minutes to resolve. Most, in fact, can be resolved in less than ten.” So let’s take a look at what you need to do to keep your content where it belongs.

{mospagebreak title=Detection}

A poster to a DigitalPoint thread suggested that content theft might be the culprit if your site drops for a keyword in Google and then doesn’t recover. Enter the URL of the page on your site that targets that keyword into Copyscape, and it will return a list of potential matches. You can use the free service to check for 10 copies per month; the premium service costs five cents a search but allows you to perform unlimited searches, and it also helps you track cases of plagiarism. Copyscape also offers Copysentry, a paid service that monitors your content automatically, checking pages weekly or daily depending on the level of monitoring you choose.
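Copyscape doesn’t publish exactly how it decides that two pages match, but if you already have the text of a suspect page in hand, you can get a rough second opinion yourself. The Python sketch below swaps in a common, generic technique (not Copyscape’s): split each text into overlapping five-word “shingles” and measure how many the two texts share using Jaccard similarity. The function names and the five-word window are illustrative assumptions.

```python
import re


def shingles(text: str, k: int = 5) -> set:
    """Return the set of overlapping k-word windows ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(original: str, suspect: str, k: int = 5) -> float:
    """Shared shingles divided by all distinct shingles: 0.0 (nothing shared) to 1.0."""
    a, b = shingles(original, k), shingles(suspect, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A scraped copy wrapped in extra navigation or ad text will score below 1.0 but still far above an unrelated page; where you draw the line between “quoted” and “copied” is your call.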

If you already use Google Alerts, you know how convenient it can be to have items of interest mailed to your inbox. Why not put those alerts to work catching content thieves? A writer creating unique content is, in effect, leaving a fingerprint; you can take advantage of that by having Google Alerts watch for a phrase that stands out as coming specifically from the item you’re trying to protect. Then you only need to wait for Google to email the suspects to you, which it can do as often as once a day. Bailey recommends this approach for important and/or static pages on your web site.
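Picking the phrase is the only real decision here. One hypothetical heuristic, sketched in Python below, is to slide an eight-word window across each sentence and keep the run with the most non-filler words, then wrap it in quotation marks so Google treats it as an exact phrase. The stopword list, window size, and function name are all assumptions; Google Alerts itself doesn’t choose phrases for you, so you would paste the result in as the alert’s query.

```python
import re

# A deliberately tiny stopword list; an illustrative assumption, not Google's.
STOPWORDS = {"a", "an", "and", "are", "as", "be", "for", "in", "is", "it",
             "of", "on", "or", "that", "the", "this", "to", "was", "with", "you"}


def fingerprint_phrase(text: str, window: int = 8) -> str:
    """Return the most distinctive run of `window` words, quoted for an exact-phrase alert.

    Runs never cross sentence boundaries. "Distinctive" here means the most
    non-stopwords, with ties broken by the total length of those words.
    """
    candidates = []
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[A-Za-z0-9']+", sentence)
        if len(words) >= window:
            candidates += [words[i:i + window] for i in range(len(words) - window + 1)]
        elif words:
            candidates.append(words)  # short sentence: consider it whole
    if not candidates:
        return ""

    def score(run):
        content = [w for w in run if w.lower() not in STOPWORDS]
        return (len(content), sum(len(w) for w in content))

    return '"' + " ".join(max(candidates, key=score)) + '"'
```

Long, specific words tend to win, which is what you want: a phrase that shows up in your article and almost nowhere else.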

Mahalo boasts an immensely cool page on Plagiarism (Literary). On the right-hand side of the page, in a Guide Note, you’ll find a link to their Plagiarism Detection Tool. It’s a JavaScript bookmarklet, and they include the script. Once you add the bookmarklet to your browser’s toolbar, you can highlight some text on a page, click the button, and watch Google turn up results that contain that text.
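Mahalo’s script isn’t reproduced here, but the idea behind a tool like this is simple: take the highlighted text, wrap it in quotation marks so Google treats it as an exact phrase, and open the resulting search URL. A Python equivalent of that step might look like the following; `phrase_search_url` is a hypothetical name.

```python
from urllib.parse import urlencode


def phrase_search_url(selected_text: str) -> str:
    """Build a Google search URL that looks for the selection as an exact phrase."""
    # Selections often carry line breaks and runs of spaces; collapse them.
    phrase = " ".join(selected_text.split())
    # urlencode percent-escapes the quotes and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": f'"{phrase}"'})
```

Passing the result to Python’s standard `webbrowser.open()` would open the search in your default browser, much as clicking a bookmarklet does.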

Another poster to the DigitalPoint thread I mentioned earlier offers an interesting suggestion for detecting content theft. He said he tried for months to stop his content from being stolen automatically. Then he started planting a link to his site in the middle of his articles, “and then a few months later I could tell exactly who had been stealing my articles.” From there, it must have been a simple matter to trace the copies and ban the appropriate IP addresses.
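The poster didn’t say how he spotted his planted link on other sites, but once you have a suspect page’s HTML, checking for it is easy to automate. This sketch uses only Python’s standard library; `example.com` stands in for your own domain, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # HTMLParser reports tag and attribute names in lowercase
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back_to(html: str, my_domain: str) -> list:
    """Return the links in html that point at my_domain or any of its subdomains."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    my_domain = my_domain.lower()
    matches = []
    for href in parser.hrefs:
        host = (urlparse(href).hostname or "").lower()
        if host == my_domain or host.endswith("." + my_domain):
            matches.append(href)
    return matches
```

Relative links have no host and are ignored, which is why a planted link should use your full, absolute URL.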

Once you’ve spotted the stolen content, you might want to preserve a snapshot or archive of the page before you take any further action. This is to protect you in case the owner changes the site and disputes your claim. You can do this yourself, but it’s even better to get a third party to do it. The Internet Archive, which runs the Wayback Machine, is a great resource for this. You might also want to check out Yahoo’s MyWeb, currently in beta.
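If you’d rather script the archiving step, the Internet Archive offers two simple web endpoints that this sketch assumes you can call directly: a “Save Page Now” address (`https://web.archive.org/save/` followed by the page URL) that asks the Wayback Machine to capture a page, and an availability API (`https://archive.org/wayback/available` with a `url` parameter) that reports the closest existing snapshot as JSON. Check the Archive’s current documentation before relying on either; the function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def save_page_now_url(page_url: str) -> str:
    """URL that, when requested, asks the Wayback Machine to capture page_url."""
    return "https://web.archive.org/save/" + page_url


def availability_url(page_url: str) -> str:
    """URL of the Wayback availability API query for page_url."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url})


def closest_snapshot(response_body: str) -> Optional[str]:
    """Pull the closest snapshot's URL out of an availability API response."""
    closest = json.loads(response_body).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def fetch(url: str, timeout: float = 60.0) -> str:
    """Plain HTTP GET; captures can take a while, hence the generous timeout."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch(save_page_now_url(suspect))` and, a little later, `closest_snapshot(fetch(availability_url(suspect)))` would request a capture and then hand you a dated Wayback link to keep with your records.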

{mospagebreak title=Making Contact}

Once you’ve determined that someone has stolen your content, you need to find out who did it so you can contact them and tell them to take it down. If the thief posted the content on his own domain, you can track him down with a Whois search. If the domain is registered through an anonymous (proxy) registration service, don’t let that stop you; those services often forward email to a real account, so the owner should still get your message.
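Most people use a web-based Whois lookup, but the protocol underneath (described in RFC 3912) is simple enough to script: open a TCP connection to port 43 on a Whois server, send the query followed by a carriage return and line feed, and read until the server closes the connection. In this Python sketch, the default server `whois.iana.org` typically answers with a `refer:` line naming the registry’s own Whois server, which you would query next. Field names vary from registry to registry, so the ones you ask for are assumptions to adjust; the function names are hypothetical.

```python
import socket


def whois_query(query: str, server: str = "whois.iana.org",
                port: int = 43, timeout: float = 10.0) -> str:
    """Send one Whois query and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def whois_fields(response: str, wanted: tuple) -> dict:
    """Collect 'Key: value' lines whose key matches one of `wanted` (case-insensitive)."""
    wanted_lower = {w.lower() for w in wanted}
    found = {}
    for line in response.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        key = key.strip()
        if sep and key.lower() in wanted_lower and value.strip():
            found.setdefault(key, []).append(value.strip())
    return found
```

For example, `whois_fields(whois_query("example.com"), ("refer",))` would tell you which server to ask next, and a second query there could be filtered for fields such as the registrar’s abuse contact.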

If the content is on a social network, you’ll probably have to create an account on the network yourself so you can contact the member who posted it. Bailey recommends keeping your personal email address private when you do this, and suggests using 10 Minute Mail to create a temporary email account for registration emails.

Even with these options, you may find yourself unable to contact the plagiarist quite so directly. In that case, if it’s possible, you’ll need to leave a comment on the site. And you may need to monitor the site for a little while for a reply.

So what should you say? Tell the plagiarist that you own the copyright to the article. Give the name of the item and the link at which it appears on the plagiarist’s web site. Include the link where it appears on your web site and the date it was originally published. Point out that you have not authorized the plagiarist’s use of your work. Tell them to remove the work from their site, and explain that if they fail to do so within a set period of time (as short as 72 hours), you plan to take action, which may include contacting the site’s administrators and/or your lawyer. If you are already dealing with the site’s administrator, you can tell them that you will contact their web host. Plagiarism Today has some excellent form letters for this situation and for others, such as those I’ll discuss in the next section.
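If you send these notes often, filling the pieces in from a template helps make sure nothing gets left out. Here is a minimal Python sketch that assumes the 72-hour deadline mentioned above; the wording, the class, and the function name are hypothetical, not one of Plagiarism Today’s form letters.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class Infringement:
    """The facts the letter needs (hypothetical structure)."""
    work_title: str
    infringing_url: str
    original_url: str
    published: date


def removal_request(case: Infringement, sent: datetime, hours: int = 72) -> str:
    """Draft a first-contact removal request with a deadline `hours` after `sent`.

    `sent` is a naive datetime; state your time zone in the letter you actually send.
    """
    deadline = sent + timedelta(hours=hours)
    return (
        f'I own the copyright to the article "{case.work_title}", which I originally '
        f"published at {case.original_url} on {case.published:%B %d, %Y}.\n"
        f"It appears on your site at {case.infringing_url}, and I have not "
        f"authorized that use.\n"
        f"Please remove it by {deadline:%B %d, %Y %H:%M}. If it is still online after "
        f"that, I plan to take further action, which may include contacting your "
        f"site's administrators and/or my lawyer.\n"
    )
```

Keeping the facts in one structure also means the same record can feed a formal notice later if the site owner doesn’t respond.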

{mospagebreak title=Escalation}

Usually that’s as far as you need to take it; most site owners you’ll encounter are reasonable and will do as you requested. But some aren’t. In that case, you’ll need to follow through and contact their web host.

Whois can help you once again here: a lookup on the site’s IP address usually identifies the hosting company (a lookup on the domain name may only show the registrar), and you can use the same tool to get the web host’s contact information. The person you need to contact is whoever handles DMCA claims at the web host. Check the host’s Terms of Service or “legal” page. If you can’t find it there, the US Copyright Office has a list of designated agents. There’s a good chance the web host has registered there, because a host can’t claim safe-harbor protection under the DMCA without registering. For example, at the time of writing, Go Daddy’s entry on this list named Ben Butler as its designated DMCA agent and gave the postal address, phone number, and email address at which to contact him about DMCA issues.

If you want to try hitting the content thief in the pocketbook, you may be able to contact his advertisers. This is much easier said than done. AdSense, for example, has a DMCA policy: Google will not take action unless you send a DMCA takedown notice. Unfortunately, it accepts notices of this kind only by fax or postal mail.

DMCA stands for Digital Millennium Copyright Act, of course, and a DMCA takedown notice should include the following information:

  • That your letter is a Notice of Infringement as authorized in § 512(c) of US copyright law.
  • That you are reporting, in good faith, what you believe to be an instance of copyright infringement (this may be more important than you think; DMCA takedown notices filed in bad faith can open the filer to a lawsuit).
  • The title(s) of the work under contention.
  • The web site address where the plagiarism appears.
  • The web site address where the original material appears.
  • Your contact information: name, address, phone number, and email address.
  • A statement that the information in the notice is accurate and, under penalty of perjury, that you are the copyright owner or are authorized to act on the owner’s behalf.
  • Your physical or electronic signature.
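To make sure a notice covers every item in the list above before you mail or fax it, you could generate it from a small data structure that refuses to render while anything is blank. This is a minimal Python sketch; the class name and wording are hypothetical. It also includes the accuracy statement under penalty of perjury and the signature that 17 U.S.C. § 512(c)(3) calls for, with the signature left as a line to sign by hand. Compare the output against a vetted template, such as Plagiarism Today’s, before sending.

```python
from dataclasses import dataclass, fields


@dataclass
class TakedownNotice:
    work_titles: list
    infringing_url: str
    original_url: str
    name: str
    address: str
    phone: str
    email: str

    def missing(self) -> list:
        """Names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def render(self) -> str:
        gaps = self.missing()
        if gaps:
            raise ValueError("missing required fields: " + ", ".join(gaps))
        return "\n".join([
            "NOTICE OF INFRINGEMENT under 17 U.S.C. § 512(c)",
            "",
            "Copyrighted work(s): " + "; ".join(self.work_titles),
            f"Original location: {self.original_url}",
            f"Infringing material: {self.infringing_url}",
            "",
            "I have a good faith belief that the use described above is not "
            "authorized by the copyright owner, its agent, or the law.",
            # Accuracy statement and signature required by § 512(c)(3)(A).
            "The information in this notice is accurate, and under penalty of "
            "perjury, I am the copyright owner or am authorized to act on the "
            "owner's behalf.",
            "",
            self.name, self.address, self.phone, self.email,
            "Signature: ______________________",
        ])
```

Printing `render()` gives you a page to sign and send; `missing()` tells you what’s left to fill in when it refuses.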

Again, Plagiarism Today has a great template with blanks in the right places, so you can simply fill in the details of your own case.

The ultimate level involves going to the search engines. To keep their protection under the DMCA, they must remove infringing URLs from their indexes once they receive a valid notice. This doesn’t actually remove the content from the web, but it does help prevent other people from finding it. Google has a Site Status Wizard that you can use to find out whether the site is listed in the first place and, later, to make sure it has been removed from the search engine’s index. Be aware that Google requires a handwritten signature on a DMCA takedown notice before it will take action. But with any luck, you’ll never have to go this far.
