Search Engine News

Learning to Crawl: an Investigation of the Personal Web Crawler
By: Bruce Coker
    2008-10-21

    Table of Contents:
  • Learning to Crawl: an Investigation of the Personal Web Crawler
  • A better way?
  • Examples
  • Copernic Agent



    A better way?



    There is, of course, another way to approach all this. Why not cut out the middleman? Rather than relying on Google or another search engine to work out exactly what they need, users can run their own personal web crawler to seek out the information themselves. This approach has several significant advantages.

    • Targeted information
      The main advantage is that, unlike the crawlers run by the search engines, your own web crawler can be configured to target precisely what you need. Consider, for example, media research organizations whose business is to find media reports about their clients. These clients are typically famous individuals or well-known corporations who need to locate information about themselves, usually as the basis for commercial decision making. The Web is obviously a major repository for such information, but a vast amount of irrelevant material is sure to turn up alongside the useful data: blog references, passing mentions in articles about other subjects, and references to namesakes are just some of the kinds of results that would typically need to be filtered out to make such research worthwhile.

      A personal web crawler offers one answer to this problem, since it can be configured to ignore certain types of references, or to search only certain sites or types of sites. This gives a high degree of control over the information returned for a particular search and greatly increases the likelihood that it will be relevant.

    • Background operation
      Another key advantage of the personal web crawler is that it can work in the background. Unlike a typical search, which must be carried out actively by typing terms into a search engine’s interface, a web crawler can keep monitoring the Web (or at least the areas of it you specify), revisiting pages and reporting results as they are uncovered. This is a far more efficient approach for anyone who looks for similar types of information regularly or on an ongoing basis.

    • Privacy
      The privacy benefits of personal crawlers shouldn’t be underestimated, especially in these times of increasing concern over the amount of personal data gathered and retained by Google and other major search engines. This data has commercial value, allowing the delivery of precisely targeted advertising, but many individuals quite justifiably believe that what they search for on the web is nobody’s business but their own.

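      With your own crawler, privacy becomes much less of a concern: your search terms are never sent to a third-party search engine, so the record of what you are looking for stays on your own machine or local network (the sites being crawled will, of course, still see the crawler’s requests). On a similar theme, personal crawlers are not subject to the result filtering that a public search engine may apply, which makes them useful in places where web access is restricted for political or social reasons.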
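    The targeting and background-monitoring ideas above can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's method: `CrawlProfile` and `Monitor` are hypothetical names, relevance is reduced to simple case-insensitive substring matching against required and excluded terms, and the pages are assumed to have already been fetched as a mapping from URL to text.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class CrawlProfile:
    """Hypothetical configuration for a targeted personal crawler."""
    allowed_hosts: set[str]                 # only these sites (and subdomains) count
    required_terms: list[str]               # e.g. a client's name
    excluded_terms: list[str] = field(default_factory=list)  # e.g. namesakes

    def is_relevant(self, url: str, text: str) -> bool:
        # Reject pages from sites outside the configured list.
        host = urlparse(url).hostname or ""
        if not any(host == h or host.endswith("." + h) for h in self.allowed_hosts):
            return False
        lowered = text.lower()
        # Every required term must appear somewhere in the page text.
        if not all(term.lower() in lowered for term in self.required_terms):
            return False
        # Any excluded term disqualifies the page.
        return not any(term.lower() in lowered for term in self.excluded_terms)


class Monitor:
    """Remembers which pages it has reported, so each check returns only new hits."""

    def __init__(self, profile: CrawlProfile):
        self.profile = profile
        self.reported: set[str] = set()

    def check(self, pages: dict[str, str]) -> list[str]:
        new_hits = []
        for url, text in pages.items():
            if url in self.reported:
                continue  # already reported on an earlier pass
            if self.profile.is_relevant(url, text):
                self.reported.add(url)
                new_hits.append(url)
        return new_hits
```

    In use, `Monitor.check` would be called on each scheduled pass over freshly crawled pages (from a cron job, say), so the user hears only about matches that have not been reported before.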

    Of course, personal web crawlers have their limitations. A major constraint is the sheer scale of the Web: crawling the entire Internet with any degree of efficiency would require a server farm approaching the size of Google’s, which is clearly out of reach for an individual or small organization. For this reason, personal crawlers are best suited to searches that can usefully be restricted to very specific areas of the Web, such as newspaper and media sites in the media-research example above.
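    Restricting a crawl to a few sites, and capping how much it fetches, is straightforward to express in code. The sketch below is a breadth-first crawl using only Python's standard library; `crawl`, `in_scope`, `fetch_html` and the `max_pages` limit are illustrative names, not taken from any particular tool. It follows links only while they stay within `allowed_hosts` (or their subdomains) and stops after `max_pages` fetch attempts. A real crawler should also honour robots.txt (the standard library's `urllib.robotparser` can help) and pause between requests; both are omitted here for brevity.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url: str) -> Optional[str]:
    """Default fetcher: a plain HTTP GET; returns None on any failure."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def in_scope(url: str, allowed_hosts: set[str]) -> bool:
    host = urlparse(url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)


def crawl(seeds: list[str], allowed_hosts: set[str], max_pages: int = 100,
          fetch: Callable[[str], Optional[str]] = fetch_html) -> dict[str, str]:
    """Breadth-first crawl that never leaves allowed_hosts and stops after
    max_pages fetch attempts. Returns a mapping of URL -> HTML."""
    queue = deque(seeds)
    visited: set[str] = set()
    pages: dict[str, str] = {}
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited or not in_scope(url, allowed_hosts):
            continue
        visited.add(url)
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so pages aren't revisited.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in visited:
                queue.append(absolute)
    return pages
```

    Because the fetcher is a parameter, the crawl can be exercised offline by passing something like a dictionary's `get` method in place of `fetch_html`.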

    There are also limits on the volume of information a personal crawler can reasonably be expected to gather and index. Again, attempting to index the entire Web would be both impractical and pointless; large-scale search engines do that job far more efficiently than any individual could. The core strength of the personal crawler lies in accurately indexing large amounts of very specific information. Used appropriately, it can remove much of the drudgery from this kind of task, leaving users free to spend their time using the information it gathers rather than looking for it.
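    Once pages have been gathered, a small inverted index is one way to search them locally, so queries never leave your own machine. The following is a minimal sketch with a hypothetical `LocalIndex` class; it uses plain lowercase word tokens and boolean AND matching, with no ranking, stemming or HTML stripping, all of which a real indexer would add.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return TOKEN.findall(text.lower())


class LocalIndex:
    """Minimal in-memory inverted index: term -> set of URLs containing it."""

    def __init__(self):
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(url)

    def search(self, query: str) -> set[str]:
        """Return the URLs that contain every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

    Pages returned by a crawl could be fed to `LocalIndex.add` one at a time, after which `search("acme corp")` answers from the local index alone.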

    © 2003-2009 by Developer Shed. All rights reserved.