Preventing Duplicate Content on an E-Commerce Site from Session IDs
It's very important for your e-commerce website to show up well in the search engines. Unfortunately, you may be fighting duplicate content issues you've never even heard of -- issues that can cause your site to rank poorly in Google and never even be seen by searchers. If your site uses session IDs to track visitors, you need to read this article. We'll show you the problem, and provide you with several good solutions.
E-commerce keeps getting more popular. According to the latest survey by Nielsen (source: http://www.nielsen.com/media/2008/pr_080128b.html ), the number of Internet shoppers increased by 40 percent in just two years.
This growth has encouraged the creation of many e-commerce web sites selling all kinds of products online. To make sales, these sites need a lot of visitors coming to them from popular search engines such as Google.
To track visitors, these sites use session IDs: long, random-looking strings of characters appended to their URLs. Every shopper on an e-commerce web site receives a unique session ID.
The session ID stays in use from the time the visitor starts shopping until they complete their check-out. Session IDs also serve a security purpose: they ensure that all website transactions are traceable.
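As a rough illustration of the mechanism just described, the following Python sketch generates a random session ID and appends it to a URL as a query parameter. It is not how any particular shopping cart implements sessions. The function names `new_session_id` and `append_session_id` are hypothetical, and the parameter name `osCsid` is borrowed from the example URLs used in this article.

```python
import secrets
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode


def new_session_id() -> str:
    """Return a random 32-character hexadecimal string (hypothetical ID format)."""
    return secrets.token_hex(16)


def append_session_id(url: str, session_id: str, param: str = "osCsid") -> str:
    """Append the session ID to the URL's query string.

    The default parameter name "osCsid" is an assumption taken from the
    article's example URLs.
    """
    parts = urlparse(url)
    # Keep any query parameters the URL already has, then add the session ID.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, session_id))
    return urlunparse(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Each call produces a different URL for the same page.
    print(append_session_id("http://www.mywebsite.com/buymymusic.html", new_session_id()))
```

Because every visitor gets a fresh ID, the same page ends up reachable under as many URLs as there are sessions.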
osCommerce, a suite of open source shopping software, is one of the most popular e-commerce applications. It relies heavily on session IDs for its day-to-day shopping operation.
This article focuses on correcting the duplicate content that session IDs cause on osCommerce-based websites.
The Problem with Session IDs
Because session IDs are appended to the URL, they cause serious problems with search engines. Say you have this one canonical URL on your website:
http://www.mywebsite.com/buymymusic.html
This is the URL that needs to be indexed by Google, since it is the official version. But when Googlebot visits the site, your web server may serve the page under a URL like this one:
http://www.mywebsite.com/buymymusic.html?osCsid=4e2f1
Google indexes this URL, and from its point of view the site now has duplicate content, because the same page is available at two different URLs. The next time Googlebot visits the site, your web server might assign yet another session ID, for example:
http://www.mywebsite.com/buymymusic.html?osCsid=5c3g1
This process repeats on every crawl, and it makes your site very difficult for search engines to understand, because they now see many URLs that all contain the same content.
This means you face a serious duplicate content issue. The side effects of duplicate content caused by session IDs include:
- The number of URLs from your site in the Google index increases. Google then takes longer to determine the important pages on your site, because its index is cluttered with duplicate content URLs.
- The relevance score of your canonical URL drops, because its strength is diluted across many duplicate content URLs that carry session IDs.
- A low relevance score means lower rankings in Google, lower rankings mean less traffic, and less traffic means fewer sales online.
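To make the duplication concrete, here is a small Python sketch that strips the session ID from the example URLs above and shows that they all collapse to the same canonical page. It only illustrates why the URLs count as duplicates. It is not a fix you apply to the site, and the helper name `strip_session_id` is hypothetical.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode


def strip_session_id(url: str, param: str = "osCsid") -> str:
    """Return the URL with the session ID parameter removed.

    "osCsid" is the parameter name seen in the article's example URLs.
    """
    parts = urlparse(url)
    # Keep every query parameter except the session ID.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != param]
    return urlunparse(parts._replace(query=urlencode(kept)))


# The three URLs a crawler might collect for one and the same page.
crawled = [
    "http://www.mywebsite.com/buymymusic.html",
    "http://www.mywebsite.com/buymymusic.html?osCsid=4e2f1",
    "http://www.mywebsite.com/buymymusic.html?osCsid=5c3g1",
]

# A set keeps only distinct values, so duplicates collapse.
unique_pages = {strip_session_id(url) for url in crawled}

if __name__ == "__main__":
    print(len(crawled), "crawled URLs map to", len(unique_pages), "canonical page(s)")
```

A search engine has no reliable way to know that the `osCsid` parameter is irrelevant to the page content, which is why it treats each variant as a separate URL.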
More By Codex-M