It is always desirable to have your new product pages indexed as quickly as possible. This can give you an edge over competitors, for example if you are both racing to launch a similar line of products. Having pages well-indexed in Google also maximizes your chances of capturing web traffic, assuming no other issues are affecting your rankings.
In fact, if you have a large, poorly-indexed website, you are losing a lot of valuable traffic. Losing traffic from the search engines means losing sales, which affects the profitability of your online business.
Quick Basics and Tools
You can quickly check the indexed pages of your website in Google by typing: site:domainname
So for example, if your domain is www.exampleonly.com, to check indexed pages you would type in: site:exampleonly.com
However, this result will give the indexed pages for the whole domain, including both www and non-www URLs and any subdomains (if, for instance, you have a subdomain such as test.exampleonly.com).
If you only need to know the www URLs, then you should include www in the Google site command: site:www.exampleonly.com
If you need to find out the indexed pages of your sub domain named test.exampleonly.com, then the site command will be: site:test.exampleonly.com
A detailed discussion of search queries is out of the scope of this tutorial. You may, however, refer to the Google document on the topic.
Googlebot will not crawl deep into a vertical website structure if the deeper pages are just duplicate content or contain no substantial or useful information worth indexing. Even if they are indexed, these highly similar pages will be filtered in search engine results (and cannot rank).
A vertical website structure is the most common structure for big ecommerce websites, because it is the most logical structure.
Based on the screen shot above, the most valuable pages for sales conversions (products) are buried deep in the structure. On almost all websites, the home page is the most frequently indexed URL because it has the highest Page Rank. So if Googlebot always lands on the home page, it starts crawling from the top of the structure to the bottom.
It also follows that the deeper the page, the lower its internal Page Rank, since fewer pages will link to it. Page Rank is a measure of a page’s importance, so if a low-importance page also contains duplicate content, there is a greater chance that Googlebot will ignore it or that the page will hardly rank in search results due to filtering.
Recommended techniques to solve this particular issue include:
When adding new products, add enough description to minimize page-to-page similarities. This will not only reduce the duplicate content problem but also help potential customers decide whether to buy a product. A study by allurent.com a few years ago pointed out that 67 percent of potential buyers will end up leaving a site due to lack of information.
Many make the mistake of copying content from other websites. This will not work in the long run because it introduces a conflict with other websites’ content (Googlebot can flag your site as not authoritative if the site you copied ranks higher or is more trusted than your website). Copying content can also lead to DMCA cases, which can be costly.
The concept is to block unimportant URLs on your website that could contribute to the duplicate content issue. Remember, this issue will slow down the indexing of big websites. A detailed explanation of blocking procedures is beyond the scope of this article; it is highly recommended that you read this one for that information.
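As a rough sketch of the idea, a robots.txt file can disallow the kinds of thin or duplicate URLs discussed here. The paths and parameter name below are hypothetical examples; match them to your own site's structure before using anything like this (note that the "*" wildcard is a Google extension, not part of the original robots.txt standard):

```text
# Hypothetical robots.txt sketch -- adjust paths to your own site
User-agent: *
Disallow: /search/          # internal search result pages (thin/duplicate)
Disallow: /print/           # printer-friendly duplicates of product pages
Disallow: /*?sessionid=     # URLs carrying session IDs (Google wildcard syntax)
```

Each Disallow line here targets a class of non-canonical or low-value URLs so Googlebot spends its crawl on the pages that matter.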
Do not let session IDs contribute to duplicate content issues. Ecommerce websites use session IDs to track customers and purchases. However, many websites handle this badly and end up with thousands of URLs indexed with session IDs. URLs with session IDs are not the canonical URLs of the website, so it is best to prevent Google from crawling and indexing them.
You’ll find an article on preventing duplicate content due to session ID here on SEO Chat.
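To make the canonical-URL idea concrete, here is a minimal sketch in Python that strips session-ID query parameters from a URL before it is used in links or sitemaps. The parameter names in SESSION_PARAMS ("sessionid", "sid", "PHPSESSID") are common examples, not something prescribed by the article; adjust them to whatever your ecommerce platform actually uses:

```python
# Sketch: strip session-ID query parameters so internal links and sitemaps
# always use the canonical URL. The parameter names below are assumptions;
# change them to match your platform.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}

def canonicalize(url: str) -> str:
    parts = urlsplit(url)
    # Keep every query parameter except the session trackers.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

print(canonicalize("http://www.exampleonly.com/product?id=42&sessionid=abc123"))
# http://www.exampleonly.com/product?id=42
```

Running every outgoing internal link through a function like this (server-side, before the page is rendered) keeps session tracking out of the URLs that Googlebot sees.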
Bear in mind, however, that accidental mistakes in robots.txt or meta robots tags can be costly. They’re one of the big reasons that deeper pages are not indexed in the first place. So you’d better DOUBLE CHECK everything before uploading your robots.txt or meta robots edits.
You already know the weakness of a vertical website structure when it comes to getting low exposure due to lack of Page Rank and internal links. It follows that a good marketing strategy is to start getting fresh links from other related websites pointing to your important products. Truthfully, the impact depends on the PR of the links; the higher the PR, the more frequently Google will follow those links.
This encourages the Googlebot’s crawling and indexing, not only starting from the home page, but on the deeper pages. This increases the chances of getting new product pages indexed as soon as possible.
There are a lot of techniques for this. For example, if you have some industry contacts on other websites, you can let them review your product and include a link pointing to your product/category/brand pages.
Another popular technique is to point a link from the home page (which has the highest PR) to new product URLs. You can then label these links "New Products" on the home page. This will act as a "shortcut" to your new product URLs.
Additionally, make sure you double check all URLs for "dead links," which can also cause crawling issues. You should also make sure that links pointing to deeper pages do not include a "rel=nofollow."
Take advantage of a sitemap
Google recommends that you use a sitemap on your website. A lot of website owners, particularly beginners, are confused by the term "sitemap." There are two types of sitemap, namely the normal web sitemap (using a .htm, .html, .php, or .asp extension) and the XML sitemap (for example: sitemap.xml).
A common mistake is to use the XML sitemap as the "web" sitemap and vice versa. I have seen this in practice, and the problem is a lack of proper information.
The web version of your sitemap should be easily visible to your visitors and placed on your website; all pages should link to it. It should NOT use .XML or any other RSS-style format.
Only place canonical URLs in your sitemap. Do not include URLs which you have blocked in robots.txt or URLs with session IDs.
Do not make a sitemap with more than 100 links per page. This comes from Google’s technical guidelines. If you are not sure how many links are in your sitemap, you can use the SEO Chat site link analyzer.
To configure the site link analyzer to provide accurate results, enter your sitemap URL, set "type of links to return" to "both types," and check "show nofollow links?" To count the number of links in your sitemap, simply add up the total indexable links (the sum of indexable internal and external links).
Total links in the page = Total indexable internal links + Total indexable external links
In the example above, total links in the page = 57 + 0 = 57 links in the page.
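If you prefer to count locally instead of using an online tool, a quick sketch with Python's built-in html.parser can tally the anchors on a sitemap page. Note this is only an approximation of what the analyzer reports: it counts every <a href> anchor and does not separate internal from external or nofollow links:

```python
# Sketch: count <a href> links on a web sitemap page, to help stay under
# the 100-links-per-page guideline. Counts raw anchors only.
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Count only anchors that actually carry an href attribute.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1

def count_links(html: str) -> int:
    parser = LinkCounter()
    parser.feed(html)
    return parser.count

page = ('<ul><li><a href="/brands/acme">Acme</a></li>'
        '<li><a href="/brands/widgetco">WidgetCo</a></li></ul>')
print(count_links(page))  # 2
```

Fetch your sitemap page's HTML, feed it to count_links, and compare the result against the 100-link guideline.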
If you have a very big website, it will be impossible to list all URLs in the web sitemap. In this case, list only the categories or brands. This makes sense: if you use the exact, descriptive category or brand name as the anchor text of each sitemap link, you’ll find that this is a more user-friendly approach.
How to Make and Use an XML Sitemap
An XML sitemap is more detailed; you can list ALL canonical URLs here, products and everything. Remember that this should be a live document: with every website update (for example, when you add new products), sitemap.xml should automatically update too.
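One way to keep sitemap.xml live is to regenerate it from your list of canonical URLs on every update. Here is a minimal sketch using Python's standard xml.etree.ElementTree; in a real ecommerce site the URL list would come from your product database, and the exampleonly.com URLs below are just placeholders:

```python
# Sketch: regenerate sitemap.xml from the current list of canonical URLs.
# In practice the list would be queried from the product database;
# the URLs here are placeholders.
import xml.etree.ElementTree as ET

def build_sitemap(urls, path="sitemap.xml"):
    # The sitemaps.org namespace is required by the sitemap protocol.
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for loc in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = loc
    ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)

build_sitemap([
    "http://www.exampleonly.com/",
    "http://www.exampleonly.com/brands/acme",
    "http://www.exampleonly.com/products/new-widget",
])
```

Hooking a function like this into your "add product" workflow means every new product URL reaches the sitemap the moment it goes live, with no manual editing.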
Only submit an XML sitemap (sitemap.xml) to Google’s webmaster tools sitemap section.
The XML sitemap should be located in the root directory, and NO link from your website’s pages should point to this file. It does not need to be viewed by users or indexed by Googlebot.
However, do not block this file in robots.txt.