Choosing Keywords
  Home arrow Choosing Keywords arrow Page 4 - Matching Strings and Algorithms
SEO Chat Forums  
Choosing Keywords  
Google Optimization  
Link Trading  
MSN Optimization  
Search Engine News  
Search Engine Spiders  
Search Optimization  
Web Directories  
Website Marketing  
Website Promotion  
Website Submission  
Yahoo Optimization  
SEO Tools
Adsense Calculator
AdSense Preview
Advanced Meta-Tags
Alexa Rank Tool
Check Server Headers
Class C Checker
Code to Text Ratio
CPM Calculator
Domain Age Check
Domain Typos
Future PageRank
Google Dance
Google Keywords
Google Search
Google Suggest
Google vs Yahoo
Indexed Pages
Keyword Cloud
Keyword Density
Keyword Difficulty
Keyword Optimizer
Keyword Position
Keyword Typos
Link Popularity
Link Price Calculator
Meta Analyzer
Meta Tag Generator
Multiple Link Popularity
Page Comparison
Page Size
PageRank Lookup
PageRank Search
Robots.txt Generator
ROI Calculator 
S.E. Comparison 
S.E. Keyword Position 
Site Link Analyzer 
Spider Simulator 
URL Redirect Check 
URL Rewriting 
Mobile Linux 
APP Generation ROI 
IBM® developerWorks 
SEO Weekly Newsletter
 
Developer Updates  
Free Website Content 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us Get Paid 
Request Media Kit
Contact Us 
Site Map 
Privacy Policy 
Support 
 USERNAME
 
 PASSWORD
 
 
  >>> SIGN UP!  
  Lost Password? 
CHOOSING KEYWORDS

Matching Strings and Algorithms
By: Simon White
  • Search For More Articles!
  • Disclaimer
  • Author Terms
  • Rating: 4 stars4 stars4 stars4 stars4 stars / 20
    2004-03-15

    Table of Contents:
  • Matching Strings and Algorithms
  • Equivalence Methods
  • Wildcards and Regular Expressions
  • Similarity Ranking Methods
  • Conclusions

  • Rate this Article: Poor Best 
      ADD THIS ARTICLE TO:
      Del.ici.ous Digg
      Blink Simpy
      Google Spurl
      Y! MyWeb Furl
    Email Me Similar Content When Posted
    Add Developer Shed Article Feed To Your Site
    Email Article To Friend
    Print Version Of Article
    PDF Version Of Article
     
     
    ADVERTISEMENT


    Matching Strings and Algorithms - Similarity Ranking Methods


    (Page 4 of 5 )

    Similarity Ranking Methods
    Similarity ranking methods compare a given string to a set of strings and rank those strings in order of similarity. To produce a ranking, we need a way of saying that one match is better than another. This is done by returning a numeric measure of similarity as the result of each comparison. Alternatively, you can think of the distance between two strings, instead of their similarity. Strings with a large distance between them have low similarity, and vice versa.

    Two very common methods for ranking similarity are the Longest Common Sub-string and Edit Distance.

    Longest Common Substring
    The longest common substring between two strings is the longest contiguous chain of characters that exists in both strings. The longer the substring, the better the match between the two strings. This simple approach can work very well in practice.

    A disadvantage of this approach is that the position of an ‘error’ in the input affects the computed similarity between the two strings. If the error occurs in the middle of the string, then the distance between the two strings will be greater than if the error occurred at one end. For example, suppose we make a simple typing error on the keyboard by pressing the key adjacent to the one intended. With a word such as ‘PINEAPPLE’, typing ‘PINESPPLE’ gives a longest common substring of length 4, whereas ‘OINEAPPLE’ gives a value of 8. The problem is that ‘PINESPPLE’ is deemed to be just as good a match with ‘PINEAPPLE’ as the string ‘PINE’, which is probably not what you want.

    Edit Distance
    This method focuses on the most common typing errors, namely character omissions, insertions, substitutions and reversals. The idea is to compute the minimum number of such operations that it would take to transform one string into another. This number gives an indication of the similarities of the strings. A value of 0 indicates that the two strings are identical

    The algorithm can be described more generally by associating a cost with each of the operations, and deriving the distance between two strings as the minimum cost that transforms one string into another. There are two widely recognized variations of the edit distance. The Levenshtein Edit Distance is the most common variation and allows insertion, deletion or substitution of a single character where the cost of each operation is 1. The Damerau Edit Distance is identical to the Levenshtein edit distance, except that it also allows the operation of transposing (swapping) two adjacent characters.
    Implementations of the Levenshtein edit distance can be found on the here.

    Hamming Distance
    Although I do not recommend the Hamming distance for the majority of string-based information retrieval tasks, I should mention it for completeness. The Hamming distance between two character strings is the number of positions in which the characters of the two strings are different. So, for example, the distance between ‘REPAIR’ and ‘REPOSE’ is 3, whereas the distance between ‘WORK’ and ‘REST’ is 4. (According to the literature, strings of different lengths have an infinite Hamming distance between them, so when comparing strings of different lengths, you may decide to ‘cheat’ by padding the shorter of the two strings on the right with extra spaces before comparing the strings.)
    The problem with this metric is that apparently very similar strings can be given a high Hamming distance. Consider that there are no two pairs of characters that match positionally in the following two strings:

    • ‘THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG’
    • ‘MY QUICK BROWN FOX JUMPED OVER THE LAZY DOGS’

    More Choosing Keywords Articles
    More By Simon White


     

    CHOOSING KEYWORDS ARTICLES

    - Increase Your AdSense Revenue Through Keywor...
    - The Lowdown on Keyword Density
    - Using Calendar-Based Keywords
    - Encourage Conversion: More Advanced Keyword ...
    - Advanced Keyword Research Strategies
    - Keyword Research Tips
    - Think Like a Searcher to Increase Your Traff...
    - Using Search Tools for SEO
    - Effective Keyword Choice Strategy and Useful...
    - Content is King: Information Architecture
    - The Hard Line Keyword Sales Pitch
    - Web Development: Keyword Themes Increase Vis...
    - Integrating Your Keywords into Your Content
    - How to Effectively Choose Your Web Site`s Ke...
    - Thinking About Keywords for PPC Ads



     



    © 2003-2009 by Developer Shed. All rights reserved. DS Cluster 4 Hosted by Hostway
    For more Enterprise Application Development news, visit eWeek