Blocking Complicated URLs with Robots.txt


(Page 1 of 5 )

If you have a large web site, you might have some content that you do not want the search engines to index -- perhaps for duplicate content reasons or you simply don't want someone casually stumbling across it. You know you can use robots.txt, but what if you need to block thousands of pages or block only certain files within a folder? This article will explain some of the more advanced uses of robots.txt. You will even learn how to block dynamic pages!

An Overview of the Robots.txt File

Robots.txt is one of the most important files to place on a web server. Its main uses are:

  1. It tells the bots which URLs they should not crawl.

  2. Once the bots have read these restrictions, they focus on crawling the parts of your web site that are not restricted.

The main uses are very simple, but actually using robots.txt can be complex, and a mistake in it can banish your site from the search engine index. The objective of this article is to provide advanced robots.txt techniques for blocking complicated URLs.
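To make the risk concrete, here is a minimal sketch of how small such a mistake can be. It uses the robots.txt parser from Python's standard library rather than any search engine's own parser, and the /drafts/ folder, the ExampleBot name and the example.com URLs are assumptions made up for illustration. A Disallow line holding only a slash blocks every URL on the site, while the intended rule blocks a single folder.

```python
from urllib.robotparser import RobotFileParser

def build_parser(robots_txt):
    """Parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

# Intended rule: block only the hypothetical /drafts/ folder.
intended = build_parser("User-agent: *\nDisallow: /drafts/\n")

# Mistake: the folder name was left out, so the bare slash
# matches every path on the site.
mistake = build_parser("User-agent: *\nDisallow: /\n")

home = "http://www.example.com/index.html"
draft = "http://www.example.com/drafts/notes.html"

print(intended.can_fetch("ExampleBot", home))   # home page stays crawlable
print(intended.can_fetch("ExampleBot", draft))  # only the draft is blocked
print(mistake.can_fetch("ExampleBot", home))    # the whole site is blocked
```

Running a quick check like this before uploading a new robots.txt is one way to catch a rule that blocks more than you meant it to.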

To use a robots.txt file you need server access or FTP access. This tutorial assumes you satisfy the following requirements:

  1. You have a web site and you have full control of it with FTP access for the root directory.

  2. You have registered your web site with Google Webmaster Tools.

Any webmaster can control which parts of their website can be crawled; the difficulty lies in the syntax of the robots.txt file, which is sometimes hard to get right without proper training and tools. After you finish reading this tutorial, you should know how to handle robots.txt with confidence.

The basic syntax of robots.txt is:


User-agent: *
Disallow: /file or folder to be blocked
Allow: /file or folder to be allowed


This file must be uploaded to the root directory of your website to function properly and avoid conflicts. The first line, User-agent: *, means that the rules that follow apply to all bots. To avoid serious problems, it is highly recommended that you use "*" as the user-agent: writing separate rules for several bots, instead of one set for all of them at once, increases the risk of mistakes and may make the search engines doubtful of your site content.
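The two points above, the root location and the "*" user-agent, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the robots_url and allowed helpers, the /reports/ paths, the OtherBot name and the example.com domain are hypothetical, and the parser is the one from Python's standard library, not Google's. That parser applies the first matching rule in file order, so the Allow line is listed before the Disallow line; search engines may resolve overlapping rules differently.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

def robots_url(page_url):
    """Return the root-level robots.txt location for any page URL."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical rules following the basic syntax: one group for all bots.
rules = """\
User-agent: *
Allow: /reports/summary.html
Disallow: /reports/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

def allowed(agent, path):
    """Check a path on the (hypothetical) example.com site for one bot."""
    return parser.can_fetch(agent, "http://www.example.com" + path)

# Bots look for the file at the root, whatever page they start from.
print(robots_url("http://www.example.com/blog/post.html?id=3"))

# Because the group uses "*", every bot name gets the same answers.
for agent in ("Googlebot", "OtherBot"):
    print(agent,
          allowed(agent, "/reports/summary.html"),
          allowed(agent, "/reports/q1.html"),
          allowed(agent, "/index.html"))
```

Both bot names receive identical results here, which is the practical effect of keeping a single "*" group instead of maintaining one set of rules per bot.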

This discussion focuses on using the robots.txt file for Google. The principles in this article can still be applied to other search engines, such as Yahoo; however, the practical examples are tested in Google.

More By Codex-M

© 2003-2012 by Developer Shed. All rights reserved.