Preventing Duplicate Content on an E-Commerce Site from Session IDs - Robots.txt and Sitemap (XML, dynamic and static versions)
(Page 2 of 4)
The easiest and fastest way to stop Googlebot from indexing URLs that contain session IDs is to block those URLs in the robots.txt file. This approach is recommended only at the launch stage of your website, for two reasons:
- At launch, Googlebot still has to discover the canonical URLs on your site, so it is wise to give it crawling directions: block the session ID URLs, and provide a list of canonical URLs in a sitemap or in navigational links.
- If you do this at a later stage (when Googlebot has already indexed thousands of URLs with session IDs), you will probably lose some of the search engine traffic that currently arrives through those indexed URLs.
Using robots.txt, we can write a rule that prevents robots from crawling session ID URLs.
In the previous example, the osCommerce session ID appears in the URL in this form:
http://www.mywebsite.com/buymymusic.html?osCsid=5c3g1
The robots.txt rules that block this URL are shown below. The * wildcard is honored by Googlebot, though not necessarily by every crawler:
User-agent: *
Disallow: /*osCsid
Sitemap: http://www.mywebsite.com/sitemap.xml
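To see which URLs a rule like this would catch, here is a minimal Python sketch. It assumes the wildcard behavior Google documents for robots.txt (* matches any run of characters, and a trailing $ anchors the end of the URL). The function names rule_to_regex and is_disallowed are made up for this illustration, and the sketch checks a single rule only; it is not a full robots.txt parser.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule):
    """Translate a robots.txt path rule containing '*' (and an optional
    trailing '$') into a compiled regular expression.
    Hypothetical helper for illustration only."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(url, rule="/*osCsid"):
    """Return True if the path-plus-query of `url` matches `rule`."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return rule_to_regex(rule).match(target) is not None


# The session ID URL from the example is caught; the clean URL is not.
print(is_disallowed("http://www.mywebsite.com/buymymusic.html?osCsid=5c3g1"))
print(is_disallowed("http://www.mywebsite.com/buymymusic.html"))
```

Running a handful of real URLs from your own site through a check like this before uploading the file is a cheap way to confirm that the rule blocks session ID URLs without also blocking canonical ones.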
Remember the key rule for robots.txt: it must be uploaded to the root directory of the site, so that it is reachable at http://www.mywebsite.com/robots.txt. Read more about robots.txt in this article.
What about the sitemap.xml file? It also needs to be updated and uploaded to the root directory of your site. Follow these rules when selecting the URLs to list in sitemap.xml:
- It should not contain session IDs.
- It should list only canonical URLs (do not include URLs whose content duplicates or closely resembles the content at a canonical URL).
- It should list only the most important pages on your site (excluding low-value pages, such as thousands of Contact Us pages).
- No URL listed in the sitemap.xml file should be blocked in robots.txt.
The same applies to a dynamically generated sitemap, such as sitemap.php or sitemap.asp: it should not list URLs containing session IDs either.
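As an illustration of the sitemap rules above, the following Python sketch strips the session ID parameter from a list of URLs, removes the duplicates that result, and writes a minimal sitemap. The parameter name osCsid comes from the example earlier; the function names strip_session_id and build_sitemap are hypothetical. Removing the session ID does not by itself guarantee that a URL is canonical, so treat this as a starting point, not a complete generator.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
from xml.sax.saxutils import escape

SESSION_PARAM = "osCsid"  # session ID parameter from the example URL


def strip_session_id(url, param=SESSION_PARAM):
    """Return `url` with the session ID query parameter removed.
    Any other query parameters are kept; the fragment is dropped."""
    parts = urlsplit(url)
    kept = [(k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != param]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))


def build_sitemap(urls):
    """Build sitemap XML from `urls`, listing each cleaned URL once."""
    cleaned = []
    for url in urls:
        clean = strip_session_id(url)
        if clean not in cleaned:  # skip duplicates, keep original order
            cleaned.append(clean)
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    lines += ["  <url><loc>%s</loc></url>" % escape(u) for u in cleaned]
    lines.append("</urlset>")
    return "\n".join(lines)


print(build_sitemap([
    "http://www.mywebsite.com/buymymusic.html?osCsid=5c3g1",
    "http://www.mywebsite.com/buymymusic.html",
]))
```

Both input URLs collapse to the same clean address, so the resulting sitemap contains a single entry with no session ID.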
More By Codex-M