Advanced Use of Robots.txt
By: Jennifer Sullivan Cassidy
    2005-11-28

    Table of Contents:
  • Advanced Use of Robots.txt
  • MSNbot, Slurp, Googlebot, IA
  • Advanced Robots.txt Commands and Features
  • Meta Tag Instructions and Bandwidth
  • Using Robots.txt for Corporate Security


    Advanced Use of Robots.txt - Using Robots.txt for Corporate Security


    (Page 5 of 5 )

    Using Robots.txt for Corporate Security

    While some of you are familiar with a company called Perfect 10 and its security issues, others are not.  Perfect 10 is an adult entertainment company that publishes copyrighted pictures of models.  It filed a motion for a preliminary injunction against Google in August of 2005.  According to BusinessWire.com, “The motion for preliminary injunction seeks to enjoin Google from copying, displaying, and distributing Perfect 10 copyrighted images. Perfect 10 filed a complaint against Google, Inc. for copyright infringement and other claims in November of 2004. It is Perfect 10's contention that Google is displaying hundreds of thousands of adult images, from the most tame to the most exceedingly explicit, to draw massive traffic to its web site, which it is converting into hundreds of millions of dollars of advertising revenue. Perfect 10 claims that under the guise of being a "search engine," Google is displaying, free of charge, thousands of copies of the best images from Perfect 10, Playboy, nude scenes from major movies, nude images of supermodels, as well as extremely explicit images of all kinds. Perfect 10 contends that it has sent 35 notices of infringement to Google covering over 6,500 infringing URLs, but that Google continues to display over 3,000 Perfect 10 copyrighted images without authorization.”

    What is interesting in this situation is that the blame actually lies with Perfect 10, Inc.  The company failed to direct the search engine to stay out of its image directory.  Two simple lines in a robots.txt file on its web server would have easily barred Google from indexing these images in the first place, a practice that Google itself mentions in its guidelines for webmasters.

    # Keep Google's image crawler out of every path that begins with /images
    User-agent: Googlebot-Image
    Disallow: /images
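
    To see what these two lines do in practice, the rule can be checked with the robots.txt parser in Python's standard library. This is only a sketch: the host www.example.com, the file names, and the helper is_allowed are placeholders invented for illustration, and Python's parser is one interpretation of the file, not Google's own crawler logic.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the article, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /images
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of downloading them from a live server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # www.example.com and the paths below are placeholders.
    checks = [
        ("Googlebot-Image", "http://www.example.com/images/photo.jpg"),
        ("Googlebot-Image", "http://www.example.com/index.html"),
        ("Googlebot", "http://www.example.com/images/photo.jpg"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(agent, url))
```

    Note that the rule names only Googlebot-Image. Under this parser, a crawler that identifies itself as plain Googlebot is not covered by it, so a site that wants every robot kept out of the directory would need a rule for those robots as well.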

    One good piece of advice given in an SEO forum is this: “If you want to keep something private on the web, .htaccess and passwords are your friends. If you want to keep something out of Google (or any other search engine), robots.txt and meta tags are your friends. If someone can type a URL into a browser and find your page, don't count on a secret URL remaining secret. Use passwords or robots.txt to protect data.”

    Using robots.txt to keep search engines out of sensitive areas is a simple task, and a step that every webmaster can take.  Search engines have been known to index members-only areas, development documents, and even employee personnel records.  It is the responsibility of the webmaster to ensure the protection of their sensitive data and copyrighted material.  A search engine spider cannot be expected to know the difference between copyrighted material and other data, especially when the search engine has made clear what would easily deter this type of behavior.  Unwanted indexing is one of the many consequences a webmaster will face if they do not utilize their robots.txt file.
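
    As a rough illustration of covering several sensitive areas at once, the sketch below builds a robots.txt for all robots and then checks it with Python's standard parser. The directory names, the host www.example.com, and the helpers build_robots_txt and blocked are all hypothetical. Keep in mind the forum advice quoted above: robots.txt is a request that well-behaved spiders honor, and the file itself is publicly readable, so truly private data still needs password protection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical directory names standing in for the kinds of areas the
# article mentions (members-only pages, development documents, personnel files).
SENSITIVE_PATHS = ["/members/", "/dev-docs/", "/hr/personnel/"]


def build_robots_txt(paths, user_agent="*"):
    """Build robots.txt text that disallows each path for user_agent."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"


def blocked(robots_txt, url, user_agent="AnyBot"):
    """Return True if a robot that obeys robots_txt should skip url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    text = build_robots_txt(SENSITIVE_PATHS)
    print(text)
    # www.example.com is a placeholder host.
    print(blocked(text, "http://www.example.com/members/login.html"))
    print(blocked(text, "http://www.example.com/about.html"))
```

    The generated text would be saved as robots.txt at the root of the web server, since that is where spiders look for it.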

    Between Clint’s article and this one, I hope you understand the importance of using a robots.txt file on your web server.  Ultimately, it’s up to you to help control the behavior of search engine robots when they spider your site’s pages.  Using robots.txt is easy, and there is no excuse for poor security, spider bandwidth problems, or pages that fail to get indexed because you neglected this simple step.  If you need help generating a robots.txt file, there are many websites that give you step-by-step instructions, or can even generate the file for you.  With this powerful tool at your disposal, you need to make use of it.  It’s your own fault if you don’t.


    DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.


    © 2003-2008 by Developer Shed. All rights reserved.