In this article, we will explain why database-driven websites cause indexing problems for search engines, and describe some useful ways to optimize them and achieve ranking in the major search engines.
Why are dynamic websites difficult for search engines to index?
On today’s Internet, database-driven (dynamic) websites are popular and necessary, because they pass information between the database and the user in the easiest way. Database-driven websites often need certain information before they can return page contents: a session id, cookie data, or a query string. A single page written in a server-side scripting language such as ASP can handle thousands or millions of records.
URLs containing a query string use a question mark (?). The problem is that web crawlers are not trained to read a dynamic website’s URLs containing question marks (?), equal signs (=), or other characters such as #, &, and !. These characters are referred to as “spider traps.” Any of them in a URL spells bad news for a dynamic website: most search engine spiders check the URL for these characters and, on finding them, ignore the page.
Here is an example of a dynamic website’s URL:
http://www.mysite.com/products.asp?cid=7
Additionally, dynamic websites contain dozens of functions in a single page, and all the optimized HTML code lies in these functions. Search engines do not like to execute these functions, because repeated requests for pages can crash the server. So what is the benefit of optimized code if the search engine will never see it? And surely no one will find that site.
Many programmers of database-driven websites make the following complaint: “I can develop dynamic websites efficiently, but why does my website average only 10 views per day, while my partner’s static websites get 10,000 views per day?” This is a question I have heard from many programmers. The reason is that the partner spends time optimizing code and thinking about search engine ranking; static websites can rank higher than dynamic ones.
What is going wrong? Please notice these URLs (illustrative examples in the pattern the article describes):
http://www.mysite.com/products.asp?cid=1
http://www.mysite.com/products.asp?cid=2
http://www.mysite.com/products.asp?cid=3
These web pages show you the product lines from the database. They simply change the query string (category id), and a separate page opens with the corresponding products. That looks great: only one page handles all the products, and it is easy and cheap to develop. What’s wrong with this approach? You lose three pages of content by re-using one. This makes life easy for the developer, but does not give you a good search engine ranking.
The best way to rank a dynamic website is to place a mask of a static URL (a rewritten, static-looking URL) over the dynamic one, for example:
http://www.mysite.com/products/category/A
http://www.mysite.com/products/category/B
http://www.mysite.com/products/category/C
Interesting: now we have three different pages for three different product lines, and these pages can be highly ranked in search engines. But where is my dynamic programming skill? Where is the database-driven website? It may not seem like it, but trust me, they are still there. Your Web pages are still dynamic and database-driven. You have only placed a mask of a static URL over the dynamic URL: when you open the static page, you actually open the dynamic Web page, query strings and all.
But how is this possible? Well, it is, but it is a technically demanding solution. You can select from several options, depending on the type of Web server you use.
Use CGI/Perl scripts
One of the easiest ways to get your dynamic site indexed by search engines is to use a CGI/Perl script. PATH_INFO (or SCRIPT_NAME) is a variable in dynamic applications that contains the complete URL address, including the query-string information. Write a script that strips out everything before the query string and assigns the query-string part of the dynamic URL to a variable; in the above example, “?cid=7” is assigned to a variable, say “V”. You can then use that variable in your URL address.
Now the dynamic URL changes to http://www.mysite.com/products/category/A, which search engines can easily index.
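The parsing step can be sketched in a few lines. The example below uses Python rather than Perl purely for brevity; the /category/&lt;id&gt; layout and the cid parameter name are assumptions taken from the example URLs above.

```python
# Sketch of the PATH_INFO technique: the web server runs the script at
# /products, and the trailing part of the URL ("/category/A") arrives
# in the PATH_INFO environment variable. The "cid" parameter name is
# assumed from the examples above.
def params_from_path(path_info):
    """Turn a static-looking trailing path like /category/A into the
    query parameters the dynamic page expects."""
    parts = [p for p in path_info.split("/") if p]
    if len(parts) == 2 and parts[0] == "category":
        return {"cid": parts[1]}
    return {}

# In a real CGI script this would be:
#   params = params_from_path(os.environ.get("PATH_INFO", ""))
print(params_from_path("/category/A"))  # {'cid': 'A'}
```

The same idea translates directly to Perl using `$ENV{'PATH_INFO'}`.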
Use Apache’s mod_rewrite
Apache has a special rewrite module (mod_rewrite) that lets you translate URLs containing query strings into addresses that search engine spiders can easily index. mod_rewrite is not installed with Apache by default, so you will need to check with your web host to see whether it is available on your server.
For more information about configuring Apache server for mod_rewrite, please visit: http://httpd.apache.org/docs-2.1/en/mod/mod_rewrite.html
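As a sketch, a rule like the following (the products.asp file name and cid parameter are assumptions carried over from the examples above) would map the static-looking URLs onto the dynamic page:

```apache
# Sketch only: adapt the pattern and target to your own URLs.
RewriteEngine On
# Map the static-looking /products/category/A to the real dynamic page.
RewriteRule ^products/category/([A-Za-z0-9]+)$ /products.asp?cid=$1 [L]
```

The [L] flag stops rule processing once a match is found; in an .htaccess file the leading slash on the target may need adjusting.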
Use URL rewriting programs
There are several programs available that can change dynamic URLs to static-looking ones. The most common are XQASP (http://www.xde.net) and LinkFreeze (http://www.isapirewrite.com). They remove the “?” in the query string and replace it with “/”, so that, for example,
http://www.mysite.com/products.asp?cid=7
will change to
http://www.mysite.com/products.asp/cid/7
With this technique, search engine spiders can index our dynamic Web pages.
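The transformation these programs perform can be illustrated with a short sketch (the example URL and cid parameter are taken from the examples above; real products handle edge cases such as encoded characters):

```python
from urllib.parse import urlsplit, parse_qsl

def staticize(url):
    """Rewrite a dynamic URL into a static-looking one by folding the
    query string into the path: .../products.asp?cid=7 becomes
    .../products.asp/cid/7."""
    parts = urlsplit(url)
    if not parts.query:
        return url  # nothing to rewrite
    path = parts.path
    for key, value in parse_qsl(parts.query):
        path += f"/{key}/{value}"
    return f"{parts.scheme}://{parts.netloc}{path}"

print(staticize("http://www.mysite.com/products.asp?cid=7"))
# http://www.mysite.com/products.asp/cid/7
```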
Another useful program is ISAPI_Rewrite, a powerful URL manipulation engine. It works much like Apache’s mod_rewrite, but is designed specifically for Microsoft’s Internet Information Server (IIS). It can make a dynamic URL look like a static file. For example, a URL such as
http://www.mysite.com/products.asp?cid=7
can change to something like
http://www.mysite.com/products/7.htm
(the exact form depends on the rewrite rules you configure).
Use static entry pages
Many webmasters have adopted another technique for search engine ranking: create static Web pages and link your dynamic pages from them. Place all the appropriate keywords, meta tags, and titles in the static pages (optimize the code), submit these static pages to the search engines, and get your ranking. This approach is very effective, especially if you run a small online store selling a few products. It is not suitable, however, for large websites with thousands or millions of dynamic pages.
Use a 404 trick
Many people do not know what an error 404 message is. A 404 error simply means that the Web server could not find the file you requested; it is termed a “page not found” error.
But we can turn this ugly error to our advantage for search engine ranking purposes by creating a custom 404 page. Note that you cannot customize your 404 error page unless your Web host has enabled this facility for your website.
The trick is that when a page is not found, the server redirects the search engine to the custom error page. In the custom error page we examine the requested URL and, based on it, redirect to the corresponding dynamic page.
The user wants to open a page that does not exist, say:
http://www.mysite.com/web/camera.htm
The request is redirected to the custom 404 error page, say “custom.asp,” where we examine the URL. In the URL we find two keywords, “web” and “camera,” so we redirect from the custom page to the corresponding dynamic page, which actually contains information about web cameras.
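The logic inside the custom error page can be sketched as follows. This is a minimal illustration in Python (the original would be ASP); the keyword-to-category mapping and the cid=7 value are invented for the example.

```python
# Sketch of the custom-404 trick: inspect the path that was not found,
# look for known keywords, and choose a dynamic URL to redirect to.
# The keyword-to-cid mapping below is hypothetical.
KEYWORD_TO_CID = {
    ("web", "camera"): 7,  # hypothetical category id for web cameras
}

def redirect_target(missing_path):
    """Return the dynamic URL to redirect a 404 hit to, or None."""
    words = set(missing_path.strip("/").replace(".htm", "").lower().split("/"))
    for keywords, cid in KEYWORD_TO_CID.items():
        if set(keywords) <= words:
            return f"/products.asp?cid={cid}"
    return None  # no match: show the normal 404 page

print(redirect_target("/web/camera.htm"))  # /products.asp?cid=7
```

In a real custom error page, the script would issue an HTTP redirect (or serve the content directly) instead of printing the target.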
In this way we can achieve search engine ranking.
To learn more about how to configure custom 404 error page in Apache, please visit: How to Set up a Custom 404 Not Found Page (http://www.thesitewizard.com/archive/custom404.shtml)
For IIS, please visit: Custom Error Pages with IIS 4.0 (http://www.15seconds.com/issue/980210.htm)
Not all search engines hate dynamic websites
The spiders of InfoSeek and HotBot can index dynamic Web pages, but they don’t do it automatically; you have to invite them. They will index dynamic pages that you submit, but won’t crawl through your dynamic website on their own. You only need to select your keywords and submit the corresponding dynamic URLs, no matter how many query strings they contain.
With Google having started to index dynamic URLs a few months ago, the picture is going to change quickly. But other big search engines such as MSN, Yahoo, and Overture are still coming to grips with dynamic websites.
We can use any of the above techniques to make a dynamic URL look like a static one. Do not modify your page design or URL addresses after submitting them to search engines. Your dynamic pages still need good content, meta tags, and appropriate keywords, and should be fully optimized.