module Amatch
Constants
- DiceCoefficient
The pair distance between two strings is based on the number of adjacent character pairs that are contained in both strings. The similarity metric of two strings s1 and s2 is
2 * |intersection(pairs(s1), pairs(s2))| / (|pairs(s1)| + |pairs(s2)|)
If the value is 1.0, the two strings are an exact match; the further it falls below 1.0, the more dissimilar they are. The advantage of considering adjacent characters is that the metric takes into account not only the characters themselves but also their ordering in the original strings.
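As an illustration of the formula, here is a minimal pure-Ruby sketch. It is not Amatch's own implementation: the helper names `pairs` and `pair_similarity` are hypothetical, pairs are counted as a multiset, and the sketch assumes no tokenizing or case folding, so its results may differ from what the library returns.

```ruby
# Hypothetical helper (not part of Amatch): list the adjacent character pairs.
def pairs(string)
  string.chars.each_cons(2).map(&:join)
end

# Hypothetical helper (not part of Amatch): Dice coefficient over character
# pairs, 2 * |shared pairs| / (|pairs(s1)| + |pairs(s2)|).
def pair_similarity(s1, s2)
  p1 = pairs(s1)
  p2 = pairs(s2)
  total = p1.size + p2.size
  # Assumption of this sketch: strings too short to form a pair score 0.0.
  return 0.0 if total.zero?

  # Count shared pairs, consuming each match so duplicates are not
  # counted more often than they occur in both strings.
  remaining = p2.dup
  shared = p1.count do |pair|
    index = remaining.index(pair)
    remaining.delete_at(index) if index
  end

  2.0 * shared / total
end

puts pair_similarity("FRANCE", "FRENCH")
puts pair_similarity("night", "night")
```

"FRANCE" and "FRENCH" each yield five pairs and share two of them (FR and NC), so the formula gives 2 * 2 / (5 + 5) = 0.4, while identical strings score 1.0.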
This metric is well suited to finding similarities in natural-language text. It is explained in more detail in Simon White's article “How to Strike a Match”, located at this URL: www.catalysoft.com/articles/StrikeAMatch.html It is also closely related to (a special case of) the method described under citeseer.lcs.mit.edu/gravano01using.html in “Using q-grams in a DBMS for Approximate String Processing.”
- VERSION
Amatch version