Blocking Complicated URLs with Robots.txt
By: Codex-M
    2008-10-28

    Table of Contents:
  • Blocking Complicated URLs with Robots.txt
  • Benefits of Proper Robots.txt Usage
  • Google Webmaster tools Robots.txt Analysis Tool
  • More Rules
  • Folder Name

    Blocking Complicated URLs with Robots.txt - Google Webmaster Tools Robots.txt Analysis Tool


    (Page 3 of 5 )

    Google Webmaster Tools includes a robots.txt analysis tool that lets webmasters test a robots.txt file before uploading it to the server. The objective is to confirm that the intended URLs are blocked and to check the file for syntax errors.

    One advantage is that the tool can test any web site's robots.txt, even if the site is not verified in your Webmaster Tools account. All you need is a Google Webmaster Tools account.

    To use the robots.txt tool, first add the website URL (only the homepage URL) on the dashboard, then click "add site." After the site is added, click the website in the dashboard, then "Tools," and finally, "Analyze Robots.txt."

    [Screenshot: the robots.txt analysis tool in Google Webmaster Tools]



    It is important to know that Google is case sensitive when blocking URLs. If you have blocked /Folder, then /folder can still be indexed by Google, because the "f" is lower case and you have only blocked the path with the upper case "F."

    Also, since the homepage URL is the most important URL of any web site, it is highly recommended that you always include it in the "Test URLs against this robots.txt file" analysis.
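The case sensitivity described above can be checked locally before uploading. The sketch below uses Python's standard `urllib.robotparser` as a stand-in; it is not Google's own matcher, and the `www.example.com` URLs and the `/Folder` rule are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration: only the capitalised path is blocked.
rules = """\
User-agent: *
Disallow: /Folder
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Paths are compared as case-sensitive prefixes.
upper_allowed = parser.can_fetch("Googlebot", "http://www.example.com/Folder/page.asp")
lower_allowed = parser.can_fetch("Googlebot", "http://www.example.com/folder/page.asp")

print("/Folder/page.asp crawlable:", upper_allowed)  # False: matches the rule
print("/folder/page.asp crawlable:", lower_allowed)  # True: the lower case path is not blocked
```

If both spellings exist on the site, each one needs its own Disallow line.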

    What are complicated URLs? What are the rules for blocking items with robots.txt?

    Complicated URLs are often dynamic URLs, the type of URL that cannot be blocked with ordinary robots.txt syntax. Below I've listed difficult URLs commonly found in e-commerce sites and blogs:

    1. Blocking /folder/ to avoid duplicate content with /folder/default.asp when there are other files under /folder. This looks uncomplicated but is tricky, because it creates a conflict with /folder. In the Microsoft IIS structure, /folder and /folder/default.asp are one and the same page. Assume you have other files in /folder, such as:

    /folder/fileone.asp

    /folder/filetwo.asp


    If you use the syntax below, it blocks the entire contents of /folder. None of the files will be indexed, which is not what you want.


    User-agent: *

    Disallow: /folder
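To see how far this prefix rule reaches, it can be tested locally. This is a minimal sketch using Python's standard `urllib.robotparser` as an approximation of a crawler's prefix matching; the host name and `/index.asp` are placeholders, and the other paths are the ones used in this article.

```python
from urllib.robotparser import RobotFileParser

# The over-broad rule from the article.
rules = ["User-agent: *", "Disallow: /folder"]

parser = RobotFileParser()
parser.parse(rules)

paths = [
    "/folder",
    "/folder/",
    "/folder/default.asp",
    "/folder/fileone.asp",
    "/folder/filetwo.asp",
    "/index.asp",  # placeholder page outside the folder
]

# Collect every path the rule blocks.
blocked = [p for p in paths
           if not parser.can_fetch("*", "http://www.example.com" + p)]

for p in blocked:
    print("blocked:", p)
```

Every path that begins with /folder is blocked, including the files that should stay indexed; only /index.asp remains crawlable.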


    To block only what you intend, you need to use the Allow directive.


    User-agent: *

    Disallow: /folder/

    Allow: /folder/default.asp

    Allow: /folder/fileone.asp

    Allow: /folder/filetwo.asp


    The above syntax should block /folder/ itself while leaving the listed files under it crawlable. Please note, however, that Google may not find these files on its own. Therefore, you should always include a consistent navigation link on your homepage pointing to those files so that they can be crawled.

    The only disadvantage of this technique is that whenever you add new files under /folder, you will need to update your robots.txt file so that they can be indexed by Google.
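How the Disallow and Allow lines interact can be sketched in a few lines of Python. The helper `is_allowed` below is hypothetical and written for this illustration. It assumes the precedence Google documents for its crawler: the longest matching path wins, and Allow wins a tie. It handles plain prefixes only, not wildcards. Note that Python's own `urllib.robotparser` applies the first matching rule instead, so it would give a different answer for this file.

```python
def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, prefix) pairs. Assumed precedence:
    the longest matching prefix wins, and Allow wins a tie.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The rules from the article.
rules = [
    ("Disallow", "/folder/"),
    ("Allow", "/folder/default.asp"),
    ("Allow", "/folder/fileone.asp"),
    ("Allow", "/folder/filetwo.asp"),
]

# /folder/newfile.asp is a hypothetical file that has no Allow line yet.
results = {
    path: is_allowed(path, rules)
    for path in ["/folder/", "/folder/default.asp",
                 "/folder/fileone.asp", "/folder/newfile.asp"]
}

for path, ok in results.items():
    print(path, "->", "crawlable" if ok else "blocked")
```

Under these assumptions /folder/ is blocked and the listed files stay crawlable, while any file without its own Allow line remains blocked.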
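One way to reduce that maintenance burden is to generate the block from a list of file names rather than editing it by hand. The function `build_rules` below is a hypothetical helper, not part of any tool mentioned in this article; in practice the file list might come from a directory listing on the server, which is an assumption here.

```python
def build_rules(folder, filenames):
    """Build a robots.txt block that blocks `folder` but allows each listed file."""
    folder = "/" + folder.strip("/") + "/"
    lines = ["User-agent: *", f"Disallow: {folder}"]
    # One Allow line per file, sorted so the output is stable between runs.
    lines += [f"Allow: {folder}{name}" for name in sorted(filenames)]
    return "\n".join(lines) + "\n"


# File names taken from the article's example.
robots_txt = build_rules("folder", ["default.asp", "fileone.asp", "filetwo.asp"])
print(robots_txt)
```

Re-running the script after adding a file regenerates the Allow lines, and the result can then be checked in the analysis tool before uploading.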

    © 2003-2010 by Developer Shed. All rights reserved. DS Cluster 8 Hosted by Hostway