Class LevenshteinDistance
- java.lang.Object
-
- org.apache.commons.text.similarity.LevenshteinDistance
-
- All Implemented Interfaces:
EditDistance<java.lang.Integer>
,SimilarityScore<java.lang.Integer>
public class LevenshteinDistance extends java.lang.Object implements EditDistance<java.lang.Integer>
An algorithm for measuring the difference between two character sequences.This is the number of changes needed to change one sequence into another, where each change is a single character modification (deletion, insertion or substitution).
This code has been adapted from Apache Commons Lang 3.3.
- Since:
- 1.0
-
-
Field Summary
Fields Modifier and Type Field Description private static LevenshteinDistance
DEFAULT_INSTANCE
Default instance.private java.lang.Integer
threshold
Threshold.
-
Constructor Summary
Constructors Constructor Description LevenshteinDistance()
This returns the default instance that uses a version of the algorithm that does not use a threshold parameter.LevenshteinDistance(java.lang.Integer threshold)
If the threshold is not null, distance calculations will be limited to a maximum length.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description java.lang.Integer
apply(java.lang.CharSequence left, java.lang.CharSequence right)
Finds the Levenshtein distance between two Strings.static LevenshteinDistance
getDefaultInstance()
Gets the default instance.java.lang.Integer
getThreshold()
Gets the distance threshold.private static int
limitedCompare(java.lang.CharSequence left, java.lang.CharSequence right, int threshold)
Find the Levenshtein distance between two CharSequences if it's less than or equal to a given threshold.private static int
unlimitedCompare(java.lang.CharSequence left, java.lang.CharSequence right)
Finds the Levenshtein distance between two Strings.
-
-
-
Field Detail
-
DEFAULT_INSTANCE
private static final LevenshteinDistance DEFAULT_INSTANCE
Default instance.
-
threshold
private final java.lang.Integer threshold
Threshold.
-
-
Constructor Detail
-
LevenshteinDistance
public LevenshteinDistance()
This returns the default instance that uses a version of the algorithm that does not use a threshold parameter.- See Also:
getDefaultInstance()
-
LevenshteinDistance
public LevenshteinDistance(java.lang.Integer threshold)
If the threshold is not null, distance calculations will be limited to a maximum length. If the threshold is null, the unlimited version of the algorithm will be used.- Parameters:
threshold
- If this is null then distances calculations will not be limited. This may not be negative.
-
-
Method Detail
-
getDefaultInstance
public static LevenshteinDistance getDefaultInstance()
Gets the default instance.- Returns:
- The default instance
-
limitedCompare
private static int limitedCompare(java.lang.CharSequence left, java.lang.CharSequence right, int threshold)
Find the Levenshtein distance between two CharSequences if it's less than or equal to a given threshold.This implementation follows from Algorithms on Strings, Trees and Sequences by Dan Gusfield and Chas Emerick's implementation of the Levenshtein distance algorithm from http://www.merriampark.com/ld.htm
limitedCompare(null, *, *) = IllegalArgumentException limitedCompare(*, null, *) = IllegalArgumentException limitedCompare(*, *, -1) = IllegalArgumentException limitedCompare("","", 0) = 0 limitedCompare("aaapppp", "", 8) = 7 limitedCompare("aaapppp", "", 7) = 7 limitedCompare("aaapppp", "", 6)) = -1 limitedCompare("elephant", "hippo", 7) = 7 limitedCompare("elephant", "hippo", 6) = -1 limitedCompare("hippo", "elephant", 7) = 7 limitedCompare("hippo", "elephant", 6) = -1
- Parameters:
left
- the first CharSequence, must not be nullright
- the second CharSequence, must not be nullthreshold
- the target threshold, must not be negative- Returns:
- result distance, or -1
-
unlimitedCompare
private static int unlimitedCompare(java.lang.CharSequence left, java.lang.CharSequence right)
Finds the Levenshtein distance between two Strings.A higher score indicates a greater distance.
The previous implementation of the Levenshtein distance algorithm was from https://web.archive.org/web/20120526085419/http://www.merriampark.com/ldjava.htm
This implementation only need one single-dimensional arrays of length s.length() + 1
unlimitedCompare(null, *) = IllegalArgumentException unlimitedCompare(*, null) = IllegalArgumentException unlimitedCompare("","") = 0 unlimitedCompare("","a") = 1 unlimitedCompare("aaapppp", "") = 7 unlimitedCompare("frog", "fog") = 1 unlimitedCompare("fly", "ant") = 3 unlimitedCompare("elephant", "hippo") = 7 unlimitedCompare("hippo", "elephant") = 7 unlimitedCompare("hippo", "zzzzzzzz") = 8 unlimitedCompare("hello", "hallo") = 1
- Parameters:
left
- the first CharSequence, must not be nullright
- the second CharSequence, must not be null- Returns:
- result distance, or -1
- Throws:
java.lang.IllegalArgumentException
- if either CharSequence input isnull
-
apply
public java.lang.Integer apply(java.lang.CharSequence left, java.lang.CharSequence right)
Finds the Levenshtein distance between two Strings.A higher score indicates a greater distance.
The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm
Chas Emerick has written an implementation in Java, which avoids an OutOfMemoryError which can occur when my Java implementation is used with very large strings.
This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htmdistance.apply(null, *) = IllegalArgumentException distance.apply(*, null) = IllegalArgumentException distance.apply("","") = 0 distance.apply("","a") = 1 distance.apply("aaapppp", "") = 7 distance.apply("frog", "fog") = 1 distance.apply("fly", "ant") = 3 distance.apply("elephant", "hippo") = 7 distance.apply("hippo", "elephant") = 7 distance.apply("hippo", "zzzzzzzz") = 8 distance.apply("hello", "hallo") = 1
- Specified by:
apply
in interfaceEditDistance<java.lang.Integer>
- Specified by:
apply
in interfaceSimilarityScore<java.lang.Integer>
- Parameters:
left
- the first string, must not be nullright
- the second string, must not be null- Returns:
- result distance, or -1
- Throws:
java.lang.IllegalArgumentException
- if either String inputnull
-
getThreshold
public java.lang.Integer getThreshold()
Gets the distance threshold.- Returns:
- The distance threshold
-
-