How Search Engines Work (and Sometimes Don’t) - Stumbling Instead of Crawling
You’re probably thinking chiefly of your human visitors when you set up your website’s navigation, as well you should. But certain kinds of navigation structures will trip up spiders, making it less likely that those visitors will find your site in the first place. As an added bonus, many of the changes that make it easier for a spider to find content will also make it easier for visitors to navigate your site.
It’s worth keeping in mind, by the way, that you might not want spiders to be able to index everything on your site. If you own a site with content that users pay a fee to access, you probably don’t want Googlebot to grab that content and show it to anyone who enters the right keywords. There are ways to deliberately block spiders from such content. In keeping with the rest of this article, which is intended mainly as an introduction, they will be mentioned only briefly here.
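One widely used way to ask spiders to stay away from part of a site is a robots.txt file. The sketch below uses Python's standard urllib.robotparser module to show how a well-behaved spider might interpret such a file. The rules, the /premium/ path, and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: ask every spider to skip /premium/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /premium/
"""


def build_parser(robots_txt):
    """Parse robots.txt text the way a polite spider would before crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def spider_may_fetch(parser, url, user_agent="Googlebot"):
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


rules = build_parser(ROBOTS_TXT)
for url in (
    "https://example.com/premium/report.html",
    "https://example.com/articles/intro.html",
):
    print(url, spider_may_fetch(rules, url))
```

Keep in mind that robots.txt is a request rather than a lock. It keeps polite spiders out, but paid content still needs real access control on the server.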
Dynamic URLs are one of the biggest stumbling blocks for search engine spiders. In particular, pages with two or more dynamic parameters will give a spider fits. You know a dynamic URL when you see it; it usually has a lot of “garbage” in it such as question marks, equal signs, ampersands (&) and percent signs. These pages are great for human users, who usually get to them by setting certain parameters on a page. For example, typing a zip code into a box at weather.com will return a page that describes the weather for a particular area of the US – and a dynamic URL as the page location.
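To make the “two or more dynamic parameters” rule of thumb concrete, here is a minimal sketch that counts the parameters in a URL's query string. The function names and the example URLs are hypothetical, and the threshold of two comes from the rule of thumb above rather than from any documented spider behavior.

```python
from urllib.parse import parse_qsl, urlsplit


def dynamic_param_count(url):
    """Count the name=value pairs in the URL's query string."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def may_trouble_spider(url, limit=2):
    """Flag URLs with `limit` or more dynamic parameters (assumed threshold)."""
    return dynamic_param_count(url) >= limit


for url in (
    "https://example.com/forecast?zip=90210&units=f",  # two parameters
    "https://example.com/weather/90210.html",          # static-looking URL
):
    print(url, dynamic_param_count(url), may_trouble_spider(url))
```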
Spiders dislike complexity in other ways, too. For example, a page with more than 100 unique links to other pages on the same site can overwhelm a spider at first glance, and it may not follow every link. If you are trying to build a site map, there are better ways to organize it.
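If you want to check a page against the 100-link guideline, a short script can do the counting. The sketch below uses Python's standard html.parser module. The class and function names are invented for this example, and it treats two links that differ only in their #fragment as the same link.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the unique same-site URLs that a page's <a> tags point to."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.site = urlsplit(base_url).netloc
        self.internal_links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        # Keep only links to *other* pages on the same site.
        if urlsplit(absolute).netloc == self.site and absolute != self.base_url:
            self.internal_links.add(absolute)


def count_internal_links(html, base_url):
    """Return how many unique same-site pages the HTML links to."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return len(collector.internal_links)


sample = '<a href="/a">A</a> <a href="/a#top">A again</a> <a href="/b">B</a>'
print(count_internal_links(sample, "https://example.com/"))
```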
Pages that are buried more than three clicks from your website’s home page also might not be crawled. Spiders don’t like to go that deep. For that matter, many humans can get “lost” on a website with that many levels of links if there isn’t some kind of navigational guidance.
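Click depth is easy to measure once you have a map of which pages link to which. The sketch below runs a breadth-first search from the home page over a hypothetical link map (a plain dictionary) and lists the pages that sit more than three clicks down. The site structure and function names are assumptions made for the example.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: fewest clicks needed to reach each page from home."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links, home, max_clicks=3):
    """Return the pages buried more than max_clicks from the home page."""
    depths = click_depths(links, home)
    return sorted(page for page, depth in depths.items() if depth > max_clicks)


# Hypothetical site: each page links only to the next level down.
site = {
    "/": ["/products"],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/products/widgets/blue"],
    "/products/widgets/blue": ["/products/widgets/blue/specs"],
}
print(too_deep(site, "/"))
```

A link from the home page straight to a deep page shortens its depth, which is one reason a well-organized site map helps.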
Pages that require a “Session ID” or cookie to enable navigation also might not be spidered. Spiders aren’t browsers, and don’t have the same capabilities. They may not be able to retain these forms of identification.
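Session IDs often ride along in the URL itself, so one rough way to audit your own links is to look for session-style parameters in the query string. This sketch is only an illustration: the parameter names in SESSION_PARAMS are common conventions assumed for the example, not a complete list, and the function name is made up.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed, non-exhaustive list of session-style parameter names (lowercase).
SESSION_PARAMS = {"sid", "sessionid", "session_id", "phpsessid", "jsessionid"}


def has_session_id(url):
    """Return True if the URL's query string carries a session-style parameter."""
    query = urlsplit(url).query
    names = (name.lower() for name, _value in parse_qsl(query, keep_blank_values=True))
    return any(name in SESSION_PARAMS for name in names)


for url in (
    "https://example.com/catalog?item=42&PHPSESSID=abc123",
    "https://example.com/catalog?item=42",
):
    print(url, has_session_id(url))
```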
Another stumbling block for spiders is pages that are split into “frames.” Many web designers like frames because they keep page navigation in one place even when a user scrolls through content. But spiders find pages with frames confusing. To them, content is content, and they have no way of knowing which pages should go in the search results. Frankly, many users don’t like pages with frames either; rather than providing a cleaner interface, such pages often look cluttered.
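To see what a spider is up against, it helps to look at a framed page the way a program does. The sketch below, again built on Python's standard html.parser, lists the separate documents that a frameset pulls in. Each one is a page in its own right, with nothing to say which should appear in search results. The class name, function name, and sample markup are invented for the example.

```python
from html.parser import HTMLParser


class FrameFinder(HTMLParser):
    """Record the src of every <frame> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.frame_sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "frame":
            self.frame_sources.append(dict(attrs).get("src"))


def find_frames(html):
    """Return the list of documents a framed page is assembled from."""
    finder = FrameFinder()
    finder.feed(html)
    return finder.frame_sources


framed_page = """
<frameset cols="20%,80%">
  <frame src="nav.html">
  <frame src="content.html">
</frameset>
"""
print(find_frames(framed_page))
```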
More By Terri Wells