Before search engines came to play such a dominant role in helping you and me find what we are looking for, a web site could only succeed if it offered unique content you couldn’t find anywhere else. Unique content accomplished several things: it ensured repeat traffic, brought in new traffic by word of mouth, might even land a mention in a print publication, and so on. Those days have ended.
These days a site cannot survive for any substantial amount of time unless it is at least somewhat optimized for the search engines. That is sad but true. If you don’t mind your site existing in a vacuum where its readership consists of family and friends, this is a non-issue, and there is nothing wrong with that. But if your site is of a commercial nature, you can see how the lack of traffic might be a hindrance, to say the least.
Pseudo-Blogging Web Sites
The big trend/fad these days seems to be these pseudo-blogging web sites. These sites have no original content; rather, they search other sites for original content, write one or two sentences about it, and then link to the source. How lame is that? Then they run Google ads and other types of ads around the stolen/reprinted content. It’s one thing if it’s a site for personal gratification, but to create this type of site to make money by leeching off of other sites is deplorable. Off the top of my head I can think of several SEO-related web sites that do nothing but visit the forums of other SEO sites, copy threads they find there, and add one or two opinionated sentences, and that’s their entire web site. Sound crazy? It’s true. And they make money doing it.
Have you seen the kinds of sites I am talking about? I bet you have, even if you did not recognize them at the time. Maybe you search Google for some help on optimizing your site, and Google returns a fistful of what appear to be hits for your query, but in reality they are web sites that have stuffed the word “optimization” into their pages and offer no original content whatsoever.
Is this what passes for quality content on the Internet these days? I guess so. I have seen more of these lazy sites than I care to count. Why should we give a crap? If they can create vapid sites with no intrinsic value whatsoever and make money doing it, more power to them! But here is the problem: all these sites do is add to the confusion and congestion of the search engines, making it much more difficult to find what you are looking for.
Let us say you type in “help with xyz widget.” Let us assume there is a forum with a thread titled “help with xyz widget.” Now let us also assume one of the leech sites we discussed decides that thread is worth “covering,” so they write their one or two sentences and then link off to the thread. The difference is that the leech site is thoroughly optimized, so it actually shows up ahead of the original thread it copied in the results for “help with xyz widget.” This scenario happens a lot, and these thieving sites count on it. There is big money at stake.
This type of “Internet stuffing” is becoming a large problem, and there are some major repercussions looming for the entire Internet population. As long as search engines pay more attention to these leech sites than to the true originators of content, it will become increasingly difficult to find what you are looking for. That, in turn, will only encourage these bogus sites to grow in size and number.
Imagine if, for every query you type into a search engine, there were ten bogus hits for every valid hit, and the bogus hits showed first. While this might seem a slight exaggeration, we are most definitely heading down that path unless there are significant changes to both the algorithms the search engines employ and the mentality of the average Internet user.
What would be awesome is if Google and the others could figure out a way to parse through all of these sites, fraudulent and valid alike, and determine which site is the originator and which is the copier of the content in question. Granted, this is a daunting task, but one well worth undertaking. It would make it much easier to find what you are looking for, offer just deserts to the sites providing original content, and remove most of the aforementioned thieving web sites from the Internet. After all, what would be the point of copying content if the copy showed up on page 9 of the results?
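To make the idea concrete, here is a rough sketch (purely hypothetical, not anything Google actually does) of how an engine might flag a copied page: break each page’s text into overlapping word shingles and measure the Jaccard overlap between the two sets. A scraped page that embeds the original text, plus its one or two added sentences, scores very high against the original.

```python
def shingles(text, k=5):
    """Break text into overlapping k-word shingles for fingerprinting."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity between two shingle sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical example: a forum thread and a leech site that pasted it in.
original = "help with xyz widget my widget will not start after the firmware update"
scraper = ("great thread here help with xyz widget my widget will not start "
           "after the firmware update")

sim = jaccard(shingles(original), shingles(scraper))
print(f"similarity: {sim:.2f}")  # → similarity: 0.75
```

A plain duplicate check would miss this, since the leech page is not byte-for-byte identical; shingling catches near-duplicates. Of course, similarity alone still does not tell you which page came first.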
Another byproduct of this “Internet stuffing” is simple overcrowding of data. Envision an article about global warming being published tomorrow. Now envision, a week from tomorrow, 10,000 web sites covering that same article by adding a sentence or two and regurgitating the same information. One single piece of content, posted once by its original creator, has now been replicated 10,000 times over.
At the rate at which information of every kind is being added to the Internet daily, it stands to reason that search engines will have to create much more stringent rules about which sites to index and which not to. At first glance this kind of thinking sounds like blasphemy; it raises questions such as who decides what gets indexed and what does not, what guidelines or rules would govern such determinations, whether the small guy/site would get screwed in the process, and so on.
All of these questions are valid ones, and I don’t pretend to have all the answers. But that should not preclude us from taking a closer look at the issue. Google has publicly stated that it is experiencing disk space issues, which is one of the reasons it created a “supplemental index”: any content or site not deemed Google-worthy gets tossed into the supplemental index, making it much harder to find. Logic dictates that such a crude solution is only a stopgap, and that Google will eventually implement a much more widespread plan to weed authority sites out from non-authority sites, to the point where a non-authority site won’t show up on page 9; it simply won’t show up at all, because it will not be indexed.
One idea would be for Google to make some significant modifications to how Google Sitemaps currently works. For instance, since it grabs your data on a nightly basis, it could timestamp all the data it collects and keep track of those timestamps. Then, if another site attempts to copy an article or feed off of it, as many leech sites do, Google would recognize the original author of the article and give it preferential treatment when deciding rank and position for queries. Obviously I would recommend all of the other search engines offer a similar service as well. I am sure there are contingencies this simplistic solution does not account for, but it would at least be a move in the right direction when deciding who the rightful originator of content is.
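A minimal sketch of that timestamp idea (everything here, the registry, the URLs, and the crawl times, is hypothetical): hash each page’s normalized text so identical copies collide on the same fingerprint, then credit whichever URL was seen earliest with that fingerprint.

```python
import hashlib

# Hypothetical first-seen registry: content fingerprint -> (timestamp, url)
first_seen = {}

def fingerprint(text):
    """Normalize whitespace and case, then hash, so identical copies collide."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def record_crawl(url, text, crawled_at):
    """Record a crawl and return the URL credited as originator of this content."""
    fp = fingerprint(text)
    if fp not in first_seen or crawled_at < first_seen[fp][0]:
        first_seen[fp] = (crawled_at, url)
    return first_seen[fp][1]

# The forum thread is crawled first; the leech site shows up a week later.
record_crawl("forum.example/help-with-xyz-widget", "Help with xyz widget ...", crawled_at=100)
originator = record_crawl("leechsite.example/widget-tips", "help with XYZ widget ...", crawled_at=700)
print(originator)  # → forum.example/help-with-xyz-widget
```

One obvious contingency this does not handle: if the copier happens to get crawled before the original, it wrongly gets the credit, which is exactly why I said the simplistic version would need work.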
As I mentioned earlier, I believe a joint approach is necessary if we are going to reduce the amount of clutter that currently exists on the Internet and slow the future growth of crap permeating our online experience. Search engines must do their part, but so must we. Wouldn’t it be cool if there were some collaborative effort across the Internet that allowed anyone to report a duplicate site, or one that is thieving or simply leeching off of original content? Conceptually I envision something along the lines of MySpace, except for the benefit of the entire Internet population rather than for monetary purposes: a greater-good focus, if you will.
It is unclear how many people or businesses would participate, but I imagine anyone, whether an individual or a large entity, would have a shared interest in weeding out copiers from originators. The way the Internet in general, and search engines in particular, currently operate, one thing is certain: this landscape will not be able to sustain itself indefinitely. Extreme modifications will have to be made to preserve some semblance of the effectiveness we have grown accustomed to when doing research on the Internet. My big concern is that these space restrictions and their implications for the Internet as a whole will not be recognized until things reach a critical phase, at which point only the most drastic measures will be deemed even moderately effective. And when that day comes, any trimming of the data indexed by the search engines will have very far-reaching implications.
You know how they say never to go food shopping on an empty stomach, because you will most likely purchase things you don’t want or need, and more of them than you normally would? I envision this cluttering scenario as the same type of situation. If things continue the way they are now, with search engines not only indexing these duplicate sites but ranking many of them higher than the originators of the content in question, then that space threshold is going to be reached sooner rather than later. When it happens there will be a quick and hectic scramble by the search engines to determine which data should be omitted, because things will be in such dire straits.
It is my opinion that not much acute attention will be paid to exactly how that determination is made; stuff will get sliced and diced, and damn the consequences. Well, the consequences are going to impact folks like you and me. Remember, at the end of the day most of the large search engines are for-profit entities; they will do whatever is necessary to maintain their profit margins and return shareholder value. Public companies experience acute pressure to perform, and often make rash decisions that help the bottom line in the short term but do not truly look toward the future. I am willing to bet that if Google hasn’t seen the space crunch coming yet, it will shortly. And when that happens, there will be several stopgap measures used to stop the bleeding before they can brainstorm from a more macro standpoint and come up with a permanent solution. My fear (which should be your fear as well) is that with these short-term solutions (e.g., supplemental listings), it will almost always be the smaller sites that are impacted the most.