
Blocking Complicated URLs with Robots.txt
By: Codex-M
    2008-10-28

    Table of Contents:
  • Blocking Complicated URLs with Robots.txt
  • Benefits of Proper Robots.txt Usage
  • Google Webmaster tools Robots.txt Analysis Tool
  • More Rules
  • Folder Name



    Blocking Complicated URLs with Robots.txt - More Rules


    (Page 4 of 5 )

    2. Blocking all of the dynamic URLs of a single file that takes different query strings

    Suppose you need to block all /product.asp pages, and you find that the URLs are all dynamic, with query strings such as:

    /product.asp?idproduct=1

    /product.asp?idproduct=5

    /product.asp?idproduct=3

    /product.asp?idproduct=4

    /product.asp?idproduct=8


    This list is small, but on a real dynamic website it could grow to thousands of URLs, and listing every one of them in robots.txt would be impractical. The following approach is therefore NOT recommended:

    User-agent: *
    Disallow: /product.asp?idproduct=1
    Disallow: /product.asp?idproduct=5
    Disallow: /product.asp?idproduct=3
    Disallow: /product.asp?idproduct=4
    Disallow: /product.asp?idproduct=8


    A simpler technique blocks them all in one line: disallow the file itself, without any query string. Because a Disallow value matches every URL path that begins with it, the correct robots.txt syntax is just:


    User-agent: *
    Disallow: /product.asp


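    The above syntax blocks /product.asp itself and every URL that begins with it, including all of its query-string variants. Note that it also blocks any other path that starts with the same characters, such as /product.aspx.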
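    Python's standard-library urllib.robotparser applies this same prefix rule, so it offers a quick way to sanity-check the record before deploying it. This is a minimal sketch; the host example.com is a placeholder, and only the path and query string affect the result.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt record from above, one list item per line.
rules = [
    "User-agent: *",
    "Disallow: /product.asp",
]

parser = RobotFileParser()
parser.parse(rules)

# example.com is a placeholder host.
urls = [
    "http://example.com/product.asp?idproduct=1",
    "http://example.com/product.asp?idproduct=8",
    "http://example.com/product.asp",
    "http://example.com/product.aspx",   # also begins with /product.asp
    "http://example.com/products.html",  # different prefix, so crawlable
]

for url in urls:
    print(url, "allowed" if parser.can_fetch("*", url) else "blocked")
```

    The first four URLs are reported as blocked and only /products.html is allowed, which shows both the intended effect and the side effect on /product.aspx.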


    3. Blocking a specific folder name that may occur at different directory levels, associated with different categories and different dates in a blog structure.


    The best example of this issue is WordPress feed URLs; WordPress trackback URLs follow the same structure. Consider the examples below:


    http://www.thisisasampledomain.com/blog/2007/10/20/post1/feed/

    http://www.thisisasampledomain.com/blog/2007/10/20/post2/feed/

    http://www.thisisasampledomain.com/blog/2007/10/20/feed/

    http://www.thisisasampledomain.com/blog/feed

    http://www.thisisasampledomain.com/feed


    These cannot be blocked reliably with an explicit list like the one below:


    User-agent: *
    Disallow: /blog/2007/10/20/post1/feed
    Disallow: /blog/2007/10/20/post2/feed
    Disallow: /blog/2007/10/20/feed
    Disallow: /blog/feed
    Disallow: /feed


    This is a more challenging problem, as /feed is associated with different posts, different dates and different categories. The above syntax blocks only the feed URLs it lists explicitly. But what if you add another post? You would have to edit the robots.txt file every time, which is not practical.
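    The gap can be shown with Python's urllib.robotparser, which performs plain prefix matching on explicit paths. The post3 URL dated 2007/10/21 below is a hypothetical new post published after the robots.txt file was written.

```python
from urllib.robotparser import RobotFileParser

# The hand-maintained list of feed URLs from above.
rules = [
    "User-agent: *",
    "Disallow: /blog/2007/10/20/post1/feed",
    "Disallow: /blog/2007/10/20/post2/feed",
    "Disallow: /blog/2007/10/20/feed",
    "Disallow: /blog/feed",
    "Disallow: /feed",
]

parser = RobotFileParser()
parser.parse(rules)

base = "http://www.thisisasampledomain.com"
# Listed explicitly, so it is blocked.
old_feed = base + "/blog/2007/10/20/post1/feed/"
# Hypothetical post added later; no rule covers its feed.
new_feed = base + "/blog/2007/10/21/post3/feed/"

print(old_feed, parser.can_fetch("*", old_feed))  # False: blocked
print(new_feed, parser.can_fetch("*", new_feed))  # True: slips through
```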

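    The correct approach uses wildcard pattern matching in the robots.txt file. Strictly speaking these are not regular expressions: major crawlers such as Googlebot and Bingbot support * (match any sequence of characters) and $ (match the end of the URL), but these are extensions that not every robot honors. All /feed URLs can be blocked with the syntax below: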

    User-agent: *
    Disallow: */feed


    A similar scenario can be applied to Wordpress trackback URLs:


    User-agent: *
    Disallow: */trackback


    Combining the two robots.txt items into one will look like this:


    User-agent: *
    Disallow: */feed
    Disallow: */trackback


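    These rules block all feed and trackback URLs in the WordPress blog, regardless of post title or directory level. Because each rule is still a prefix match, */feed will also catch paths such as /feedback, so check that no unrelated URLs share the pattern.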
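    Python's urllib.robotparser does not understand the * wildcard, so the sketch below translates patterns into regular expressions by hand. It assumes simplified Google-style semantics (* matches any run of characters, a trailing $ anchors the end, everything else is a prefix matched from the start of the path) and deliberately ignores Allow rules, longest-match precedence and percent-encoding; it is an illustration, not a full robots.txt parser. The helper names pattern_to_regex and is_blocked are hypothetical.

```python
import re
from urllib.parse import urlsplit


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*', a trailing '$' anchors the end, and all other
    characters are matched literally.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored_end else ""))


def is_blocked(url: str, disallow_patterns: list[str]) -> bool:
    """Return True if any Disallow pattern matches the URL's path and query."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start of the string, mirroring prefix matching.
    return any(pattern_to_regex(p).match(target) for p in disallow_patterns)


disallow = ["*/feed", "*/trackback"]
base = "http://www.thisisasampledomain.com"
for path in [
    "/blog/2007/10/20/post1/feed/",
    "/blog/2007/10/20/feed/",
    "/feed",
    "/blog/2007/10/20/post1/trackback/",
    "/blog/2007/10/20/post1/",
]:
    print(path, "blocked" if is_blocked(base + path, disallow) else "allowed")
```

    Every feed and trackback path is reported as blocked at any depth, while the post page itself stays crawlable.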

    © 2003-2009 by Developer Shed. All rights reserved.