Below is a list of five reasons search engine spiders may not crawl your site:
- Flash or Java Links
- robots.txt
- Too Many Links on the Page
- Links in Forms
- Links in Frames
Flash or Java Links
While Google assures us that Flash is becoming more crawlable, for the time being you will probably want to avoid relying on links embedded in Flash. If you do use them, be sure to have a link outside of Flash on the page as well. The same goes for Java, Silverlight, and any other type of plug-in.
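As a quick self-check, here is a rough Python sketch that flags a page which uses a plug-in but has no ordinary link anywhere on it. The `needs_plain_link` helper, the tag list, and the sample markup are my own assumptions for illustration, not anything a search engine publishes.

```python
from html.parser import HTMLParser

# Tags commonly used to embed Flash, Java, Silverlight, and similar plug-ins.
PLUGIN_TAGS = {"object", "embed", "applet"}


class PluginLinkAudit(HTMLParser):
    """Record whether a page embeds a plug-in and which plain links it has."""

    def __init__(self):
        super().__init__()
        self.uses_plugin = False
        self.plain_links = []

    def handle_starttag(self, tag, attrs):
        if tag in PLUGIN_TAGS:
            self.uses_plugin = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.plain_links.append(href)


def needs_plain_link(html: str) -> bool:
    """True when the page relies on a plug-in and offers no <a href> at all."""
    audit = PluginLinkAudit()
    audit.feed(html)
    audit.close()
    return audit.uses_plugin and not audit.plain_links
```

A page that comes back `True` is one where a spider has nothing but the plug-in to follow.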
Robots.txt
I wrote an article a while back about the uses of robots.txt that may be worth checking out if your pages are not being indexed. Basically, the robots.txt file tells the spiders which pages they may not access (i.e., crawl).
There can be many reasons why you would not want a page to be available to the engines, but still want it accessible to certain individuals. For instance, you may wish to let only certain people view your portfolio page via a direct link, and not have it turn up for anyone searching Google.
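If you suspect robots.txt is the culprit, you can test a URL against your rules the same way a well-behaved spider would. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the example.com URLs, and the `is_crawlable` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask every spider to stay out of /portfolio/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /portfolio/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if these robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The portfolio page is blocked; the blog is still open to spiders.
print(is_crawlable(ROBOTS_TXT, "https://example.com/portfolio/index.html"))
print(is_crawlable(ROBOTS_TXT, "https://example.com/blog/"))
```

Paste in your real robots.txt and the URL that is missing from the index to see whether you blocked it by accident.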
Too Many Links on the Page
For what should be obvious reasons, piling a million links on a page is a bad idea. First, it is confusing to the user. Second, it is usually a sign that you are a spammer or trying to put one over on the search engines. As a rough rule of thumb, spiders crawl maybe 100 or so links on any given page. Any more than that puts the extra links at risk of not getting crawled.
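To find out where a page stands, you can count its links with a few lines of Python. This is only a sketch: the `LINK_BUDGET` of 100 is the rough figure from this article rather than an official limit, and the helper names are hypothetical.

```python
from html.parser import HTMLParser

LINK_BUDGET = 100  # rough figure from the article, not an official limit


class LinkCounter(HTMLParser):
    """Count <a> tags that actually carry an href."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" and value for name, value in attrs):
            self.count += 1


def count_links(html: str) -> int:
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    return counter.count


def over_link_budget(html: str, budget: int = LINK_BUDGET) -> bool:
    """True when the page has more links than the chosen budget."""
    return count_links(html) > budget
```

Run `over_link_budget` on the HTML of your busiest pages, such as sitemaps and footers stuffed with links, to spot the ones worth trimming.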
Links in Forms
If you have users fill out a form prior to viewing a page, and that is the only way to access the page, be forewarned: the page is not likely to be crawled. Bots do not like to provide personal information, nor do they like to click on submit buttons. Call it bigotry if you like, but always be sure to include a link to your content that does not require a form to be submitted.
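One way to catch this is to compare where your forms submit to against where your ordinary links point. The sketch below does that for a single page; `form_only_targets` is a hypothetical helper, and it assumes the hidden content lives at the form's `action` URL, which will not be true of every site.

```python
from html.parser import HTMLParser


class FormAndLinkCollector(HTMLParser):
    """Collect form action URLs and plain link URLs from one page."""

    def __init__(self):
        super().__init__()
        self.form_actions = set()
        self.link_targets = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form" and attributes.get("action"):
            self.form_actions.add(attributes["action"])
        elif tag == "a" and attributes.get("href"):
            self.link_targets.add(attributes["href"])


def form_only_targets(html: str) -> list:
    """Return URLs reachable through a form but not through any plain link."""
    collector = FormAndLinkCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.form_actions - collector.link_targets)
```

Anything this returns is a candidate for a plain link somewhere on the page.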
Links in Frames
Sometimes when you use frames on your page, Google, Yahoo, and Bing get confused by the structure of your site, and that confusion can lead to a traffic jam. Besides ruining your SEO, frames, to me, are just hideous anyway. Steer clear of them and you sidestep this common pitfall.
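If you have inherited an older site and are not sure whether it uses frames, a short Python sketch like this one will list them. The `find_frames` helper and the sample file names are assumptions for illustration.

```python
from html.parser import HTMLParser


class FrameFinder(HTMLParser):
    """Collect the src of every <frame> and <iframe> on a page."""

    def __init__(self):
        super().__init__()
        self.frame_sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            # Frames without a src are still recorded, as an empty string.
            self.frame_sources.append(dict(attrs).get("src") or "")


def find_frames(html: str) -> list:
    """Return frame sources in the order they appear in the page."""
    finder = FrameFinder()
    finder.feed(html)
    finder.close()
    return finder.frame_sources


sample = '<frameset><frame src="nav.html"><frame src="main.html"></frameset>'
print(find_frames(sample))
```

An empty list means the page is frame-free; anything else is content a spider may struggle to tie back to the page that frames it.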
