Search Engine Spiders

The Yahoo SLURP Crawler
By: Akinola Akintomide
    2006-08-08

    Table of Contents:
  • The Yahoo SLURP Crawler
  • The Robot
  • Stonewalling
  • Getting SLURP to Come Over


    The Yahoo SLURP Crawler - The Robot


    (Page 2 of 4)

    SLURP crawls websites, scans their content and meta tags, and follows the links on each page. It then brings that information back for the search engine to index. Yahoo SLURP 2.0 stores the full text of each page it crawls and returns it to Yahoo’s searchable database. This is one of SLURP's more distinctive traits; not every search engine crawler stores the entire text of the pages it crawls.
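
    As a rough illustration of the scan step described above (content, meta tags, links), here is a minimal Python sketch built on the standard library's html.parser. It is not how SLURP itself is implemented; the PageScanner class, the scan helper, and the sample page are all hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collects meta tags, outgoing links, and visible text from one page."""

    def __init__(self):
        super().__init__()
        self.meta = {}        # meta name -> content
        self.links = []       # href values, in document order
        self.text_parts = []  # non-empty text fragments

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_parts.append(text)


# Hypothetical page used only for this illustration.
SAMPLE_PAGE = """<html><head>
<title>Sample</title>
<meta name="description" content="A sample page">
</head><body>
<p>Hello crawler.</p>
<a href="/about.html">About</a>
<a href="/contact.html">Contact</a>
</body></html>"""


def scan(html):
    """Parse one HTML document and return the populated scanner."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


page = scan(SAMPLE_PAGE)
print(page.meta)
print(page.links)
print(" ".join(page.text_parts))
```

    A real crawler would then queue each collected link for its own fetch, which is the "travels down the links" part of the description.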

    While SLURP has some features of its own, it also obeys the rules in a site's robots.txt file. This file is very important, since it gives you control over which pages the crawler fetches and indexes. You can use it to keep out of the index any pages you consider sensitive, or pages you simply don’t want indexed at all (for whatever reason). Bear in mind that robots.txt is itself publicly readable and that only well-behaved crawlers honor it, so pages holding information you would rather not have in the hands of hackers still need real access controls.

    Another advantage of the robots.txt file is that it lets you exclude specific robots, so you can block Googlebot from a particular page while still letting SLURP crawl it. This is useful if you have optimized different pages for different search engines. Separate versions give you flexibility, but a search engine that sees both may treat them as duplicate pages and penalize you. Careful use of the robots.txt file therefore belongs on any list of ways to make your website more search engine friendly. So how do you use the robots.txt file? Open a plain text editor such as Notepad and type in the following lines:

      User-Agent: Slurp
      Disallow: /whatsisname.html
      Disallow: /page_optimized_for_google.html
      Disallow: /credit_card_list.html
      Disallow: /whatnot.html

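    Save the file as robots.txt and upload it to your site's root directory. Each Disallow path is written relative to that root and begins with a slash. You can disallow as many pages as you want for each crawler. To disallow pages for another crawler, start a new record with its own User-Agent line, conventionally separated from the previous record by a blank line.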

      User-Agent: Slurp
      Disallow: /whatsisname.html
      Disallow: /page_optimized_for_google.html
      Disallow: /credit_card_list.html
      Disallow: /whatnot.html

      User-Agent: Googlebot
      Disallow: /page_optimized_for_yahoo.html
      Disallow: /credit_card_list.html
      Disallow: /whatnot.html
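
    Before uploading a file like this, it can help to check that each crawler is blocked from the pages you intended. The sketch below uses Python's standard urllib.robotparser module for that; the paths are written with a leading slash, as the robots exclusion convention expects, and the build_parser helper is a hypothetical name introduced for this example.

```python
from urllib.robotparser import RobotFileParser

# Two records, one per crawler, mirroring the example file.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /whatsisname.html
Disallow: /page_optimized_for_google.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html

User-agent: Googlebot
Disallow: /page_optimized_for_yahoo.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html
"""


def build_parser(text):
    """Load robots.txt rules from a string instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
pages = (
    "/page_optimized_for_google.html",
    "/page_optimized_for_yahoo.html",
    "/credit_card_list.html",
)
for agent in ("Slurp", "Googlebot"):
    for page in pages:
        verdict = "allowed" if parser.can_fetch(agent, page) else "blocked"
        print(f"{agent:10} {page:35} {verdict}")
```

    The report should show each crawler blocked from the page optimized for its rival and both blocked from the shared pages. This only reflects how Python's parser reads the rules; individual crawlers may interpret edge cases differently.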

    If you want to disallow all crawlers, replace the name of the user agent with the wildcard character (*).
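
    A wildcard record can be sketched and checked the same way. In this minimal Python example the rules apply to every crawler that has no record of its own; the is_blocked helper and the "ExampleBot" agent name are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# One record whose user agent is the wildcard, so it covers all crawlers.
WILDCARD_RULES = """\
User-agent: *
Disallow: /credit_card_list.html
Disallow: /whatnot.html
"""


def is_blocked(agent, path, rules=WILDCARD_RULES):
    """Return True if the given rules forbid this agent from fetching path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, path)


for agent in ("Slurp", "Googlebot", "ExampleBot"):
    print(agent, is_blocked(agent, "/whatnot.html"))
```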

    Robots.txt helps you avoid being penalized or banned by search engines, and it can also be used to spot crawlers when they come calling. Well-behaved crawlers request robots.txt before fetching anything else, while ordinary browsers almost never do, and these requests show up in the server logs.
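
    One way to act on this is to scan the access log for robots.txt requests and count them by user agent. The Python sketch below assumes the log uses the common Apache-style "combined" format; the robots_txt_requesters function and the sample log lines are hypothetical and made up for illustration, so adjust the pattern to whatever format your server actually writes.

```python
import re
from collections import Counter

# Assumption: Apache-style "combined" log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def robots_txt_requesters(lines):
    """Count requests for /robots.txt, grouped by user-agent string."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        path = match.group("path").split("?")[0].lower()
        if path == "/robots.txt":
            hits[match.group("agent")] += 1
    return hits


# Illustrative log lines, not taken from a real server.
SAMPLE_LOG = [
    '192.0.2.10 - - [08/Aug/2006:10:00:00 +0000] "GET /robots.txt HTTP/1.0" '
    '200 120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '192.0.2.10 - - [08/Aug/2006:10:00:02 +0000] "GET /index.html HTTP/1.0" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '198.51.100.7 - - [08/Aug/2006:10:05:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

hits = robots_txt_requesters(SAMPLE_LOG)
for agent, count in hits.most_common():
    print(count, agent)
```

    A user-agent string can be forged, so a hit here is a hint that a crawler visited rather than proof of which one it was.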

    © 2003-2008 by Developer Shed. All rights reserved.