Make Money Without Doing Evil - A Lesson in Content Scraping
Google regularly clears out scraper sites and directories built for the sole purpose of generating AdSense revenue. In the process, Google has also knocked a few legitimate websites out of its index. Penalties aimed at the few who abuse the rules often hurt those who were playing fair, and the results have not been pretty.
This penalty has its roots in duplicate content and in attempts to manipulate search engines with scripts that repackage other people's content as supposedly new pages. To Google, duplicate content is a bad thing. It is bad for the search engines, it wastes the hosting resources of the various search engines, and, most importantly, it is bad for users. As I am sure every reader of this article will agree, when we run a search we do not want ten exact copies of the one page that matches our query.
Many will argue that Google could not possibly catch duplicate content that easily, and that trying to do so would strain its resources, but I have news for you. I can assure you that Google eliminates duplicate content from its general index very easily; not only can it filter duplicates out, it can also choose to leave certain duplicate content in the index.
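To see why exact duplicates are cheap to catch, consider a minimal Python sketch that fingerprints each page's normalized text with a hash and keeps one representative per group while flagging the rest. This is an illustration under stated assumptions, not Google's implementation: the function names `fingerprint` and `dedupe_exact`, the normalization rules (lowercase, collapse whitespace), and the example URLs are all hypothetical.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash the page text after a hypothetical normalization:
    lowercase everything and collapse runs of whitespace."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe_exact(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group URLs whose text has the same fingerprint.

    Returns a mapping from the URL kept for each group (the first one
    seen) to the duplicate URLs that would be filtered out."""
    groups: dict[str, list[str]] = {}
    for url, text in pages.items():
        groups.setdefault(fingerprint(text), []).append(url)
    return {urls[0]: urls[1:] for urls in groups.values()}


if __name__ == "__main__":
    pages = {
        "a.example/story": "Storm hits coast.  Residents evacuate.",
        "b.example/copy": "storm hits coast. residents evacuate.",
        "c.example/other": "Local team wins final.",
    }
    print(dedupe_exact(pages))
```

Because each page is hashed once and grouped by a dictionary lookup, the cost grows roughly linearly with the number of pages, and keeping the first URL of each group shows how one copy can stay in an index while the others are filtered.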
This is actually a very important issue, and Google co-founder Sergey Brin had the foresight to see the problem coming and had the algorithm built to handle it before it ever became a major concern. There are duplicate pages on the Internet, and there always will be, because news sources gather their information from the same feeds.
In a patent titled "Detecting duplicate and near-duplicate files," filed in January 2001, Google describes an invention for detecting duplicate content. The patent explains how the search engine identifies duplicate content and decides which copies to filter out of its general index.
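Near-duplicates, such as a scraped page with a word or two changed, defeat a plain hash. A common textbook way to catch them is word shingling: break each page into overlapping k-word sequences and measure how much two pages' shingle sets overlap with Jaccard similarity. The Python sketch below shows that general technique; it is not necessarily the method the patent claims, and the function names, the shingle size k=3, the 0.8 threshold, and the example URLs are hypothetical choices for illustration.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(
    pages: dict[str, str], threshold: float = 0.8, k: int = 3
) -> list[tuple[str, str, float]]:
    """Compare every pair of pages and report pairs whose shingle
    sets overlap at or above the (hypothetical) threshold."""
    sets = {url: shingles(text, k) for url, text in pages.items()}
    urls = list(sets)
    pairs = []
    for i in range(len(urls)):
        for j in range(i + 1, len(urls)):
            score = jaccard(sets[urls[i]], sets[urls[j]])
            if score >= threshold:
                pairs.append((urls[i], urls[j], score))
    return pairs


if __name__ == "__main__":
    pages = {
        "a.example/story": "the quick brown fox jumps over the lazy dog "
                           "near the river bank today",
        "b.example/scraped": "the quick brown fox jumps over the lazy dog "
                             "near the river bank yesterday",
        "c.example/other": "local team wins the championship final "
                           "after extra time",
    }
    for a, b, score in near_duplicates(pages):
        print(f"{a} ~ {b}: {score:.2f}")
```

In this example the first two pages share 11 of their 13 distinct shingles, a similarity of about 0.85, so they are flagged even though one word differs. Comparing every pair grows quadratically with the number of pages; at web scale, sampling schemes such as MinHash are commonly used to avoid exhaustive comparison.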
By Clint Dixon