Study Results: Search Engines, Meta Robots Tag and Robots.txt - Answering the sixth and seventh questions


(Page 5 of 5)

For our sixth question, we wanted to know whether any search engine bots crawl and index "Page 2," which is blocked in robots.txt (see the illustration below).

Likewise, for our seventh question, we asked whether any search engine bots crawl and index "Page 3," which is linked from the blocked Page 2 (see the same illustration).

Affected URLs:

http://www.php-developer.org/blockedbyrobots.php

http://www.php-developer.org/blockedrobotslink.php

Illustration of the question:

Page 1 (home page) --links to--> Page 2 (blocked by robots.txt) --links to--> Page 3 (crawlable and indexable page)

Crawling results:

As expected, no search engine bot crawled the blocked URL, http://www.php-developer.org/blockedbyrobots.php, so all of them obey robots.txt.

Since http://www.php-developer.org/blockedrobotslink.php is referenced only by the blocked URL, which is not crawled, it was not crawled by any of the three main search engine bots either.
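To make the crawling result concrete, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt rules in it are an assumption, since the article does not show the site's actual file; the helper name `is_allowed` is also ours. It illustrates that Page 3 is not blocked by any rule, so the bots skipped it only because the sole link to it sits on a page they were not allowed to fetch.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: the article does not show the real robots.txt of the test site.
# This is a plausible rule set that blocks only "Page 2".
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /blockedbyrobots.php
"""

PAGE_2 = "http://www.php-developer.org/blockedbyrobots.php"
PAGE_3 = "http://www.php-developer.org/blockedrobotslink.php"


def is_allowed(url, user_agent="*", robots_txt=ASSUMED_ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print("Page 2 allowed:", is_allowed(PAGE_2))
    print("Page 3 allowed:", is_allowed(PAGE_3))
```

Under these assumed rules, a well-behaved bot may fetch Page 3 but not Page 2, which matches the observation that Page 3 stayed uncrawled simply because it could not be discovered.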

Indexing results:

Surprisingly, Google indexed both URLs:

http://www.php-developer.org/blockedbyrobots.php

http://www.php-developer.org/blockedrobotslink.php

Neither Yahoo nor MSN indexed either of the above URLs. This means Google treats blocked URLs differently. Even though the blocked URL never shows up as crawled in the server logs (see the crawling results above), Google alone can index these URLs purely on the strength of the links that reference them.
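The "crawled URL in the logs" check can be sketched roughly as follows. This is not the script used in the study. It assumes the server writes the common Apache/Nginx "combined" log format, and the function name `bot_hits`, the token list, and the sample lines are all made up for illustration.

```python
import re

# User-Agent tokens for the three bots discussed in the article.
BOT_TOKENS = ("Googlebot", "Slurp", "msnbot")

# ASSUMPTION: access log uses the "combined" format:
# ip - - [date] "METHOD path PROTO" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def bot_hits(log_lines, path):
    """Return the bot tokens that requested `path`, in first-seen order."""
    seen = []
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match or match.group("path") != path:
            continue
        for token in BOT_TOKENS:
            if token in match.group("agent") and token not in seen:
                seen.append(token)
    return seen


# Made-up sample lines, NOT data from the experiment.
SAMPLE_LOG = [
    '192.0.2.1 - - [01/Jan/2010:00:00:00 +0000] "GET /index.php HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
    '192.0.2.2 - - [01/Jan/2010:00:00:05 +0000] "GET /robots.txt HTTP/1.1" '
    '200 64 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
]

if __name__ == "__main__":
    print(bot_hits(SAMPLE_LOG, "/index.php"))
    print(bot_hits(SAMPLE_LOG, "/blockedbyrobots.php"))
```

An empty result for a path means none of the listed bots requested it in the scanned lines. It says nothing about whether the URL is indexed, which has to be checked in the search engines themselves.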

Conclusions and Recommendations

The results of this experiment have many applications, but the most important are as follows:

Preventing duplicate content issues in osCommerce and similar CMS-powered templates and websites. These templates generate many product/content category pages and pagination pages that are highly similar to each other and do not need to be indexed, so any SEO professional can simply suggest placing <meta name="robots" content="noindex"> on the category and pagination URLs. Search engines will then ignore the duplicate content URLs (categories/pagination) but can still index the product URLs and other important inner content or posts (with the exception of the Bing search engine; see results above).

Completely preventing search engines from indexing a particular page with sensitive content. Now that we know that URLs blocked with robots.txt can still appear in search engine results, the best method is to place <meta name="robots" content="noindex, nofollow"> on any URL that you never want indexed by search engines. The page must remain crawlable for this to work, because a bot that is blocked by robots.txt never fetches the page and so never sees the tag. Do note that search engines will also not follow the links on that page, so if important, indexable URLs deeper in the site's structure are reachable only through it, search engines may never crawl and index them.

Saving bandwidth consumed by search engine bots on unimportant URLs. The best approach for this is to use robots.txt, because the three main search engine bots (Google, Yahoo and MSN) did not crawl URLs blocked by robots.txt in this experiment. Bear in mind that Google can still index the blocked URLs, and the URLs they link to, on the basis of links alone, so if the pages hold sensitive data, this may concern you.
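The three recommendations above can be summarized in a small Python sketch that maps each goal to the directive the article suggests. The goal names, the function `recommend`, and the default path `/example-path/` are hypothetical and exist only for illustration; the meta tags are the ones quoted in the text, and the robots.txt rule uses the standard `User-agent` / `Disallow` syntax.

```python
# Goal names below are hypothetical labels for the three recommendations.
META_NOINDEX = '<meta name="robots" content="noindex">'
META_NOINDEX_NOFOLLOW = '<meta name="robots" content="noindex, nofollow">'


def recommend(goal, path="/example-path/"):
    """Return the directive suggested in the article for a given goal.

    `path` is only used for the robots.txt case and is a placeholder.
    """
    if goal == "avoid_duplicate_content":
        # Category/pagination pages: keep out of the index, links still followed.
        return META_NOINDEX
    if goal == "hide_sensitive_page":
        # Never index the page and do not follow its links.
        return META_NOINDEX_NOFOLLOW
    if goal == "save_bandwidth":
        # Stops crawling, but per the experiment Google may still index the URL.
        return "User-agent: *\nDisallow: " + path
    raise ValueError("unknown goal: " + repr(goal))


if __name__ == "__main__":
    for goal in ("avoid_duplicate_content", "hide_sensitive_page", "save_bandwidth"):
        print(goal)
        print(recommend(goal))
        print()
```

Note the trade-off the sketch encodes: the meta tags control indexing but still cost a crawl, while robots.txt saves the crawl but does not by itself keep a URL out of Google's index.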

© 2003-2012 by Developer Shed. All rights reserved.