Class IntersectionSimilarity<T>

  • Type Parameters:
    T - the type of the elements extracted from the character sequence
    All Implemented Interfaces:
    SimilarityScore<IntersectionResult>

    public class IntersectionSimilarity<T>
    extends java.lang.Object
    implements SimilarityScore<IntersectionResult>
    Measures the intersection of two sets created from a pair of character sequences.

    The type T is assumed to satisfy the requirements for storage in a Set or HashMap. Ideally the type is immutable and implements Object.equals(Object) and Object.hashCode().
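As an illustration of an element type that meets these requirements, the following minimal sketch defines an immutable value class whose equals and hashCode agree with each other. The class name Bigram and its fields are hypothetical and are not part of the library.

```java
import java.util.Objects;

// Hypothetical element type: an immutable pair of adjacent characters.
final class Bigram {
    private final char first;
    private final char second;

    Bigram(char first, char second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Bigram)) {
            return false;
        }
        Bigram that = (Bigram) other;
        return first == that.first && second == that.second;
    }

    // Equal bigrams produce equal hash codes, as Set and HashMap require.
    @Override
    public int hashCode() {
        return Objects.hash(first, second);
    }
}
```

Because the fields are final and both methods depend only on those fields, an instance keeps the same hash code for as long as it is stored in a Set or used as a HashMap key.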

    Since:
    1.7
    See Also:
    Set, HashMap
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private java.util.function.Function<java.lang.CharSequence,java.util.Collection<T>> converter
      The converter used to create the elements from the characters.
    • Constructor Summary

      Constructors 
      Constructor Description
      IntersectionSimilarity(java.util.function.Function<java.lang.CharSequence,java.util.Collection<T>> converter)
      Create a new intersection similarity using the provided converter.
    • Field Detail

      • converter

        private final java.util.function.Function<java.lang.CharSequence,java.util.Collection<T>> converter
        The converter used to create the elements from the characters.
    • Constructor Detail

      • IntersectionSimilarity

        public IntersectionSimilarity(java.util.function.Function<java.lang.CharSequence,java.util.Collection<T>> converter)
        Create a new intersection similarity using the provided converter.

        If the converter returns a Set then the intersection result will not include duplicates. Any other Collection is used to produce a result that will include duplicates in the intersection and union.

        Parameters:
        converter - the converter used to create the elements from the characters
        Throws:
        java.lang.IllegalArgumentException - if the converter is null
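To make the difference between the two kinds of converter concrete, here is a minimal sketch of two functions with the type the constructor expects. The class name ConverterSketch and the constants TO_SET and TO_LIST are hypothetical names chosen for illustration, and splitting the input into single characters is only one possible way to extract elements.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper class; not part of the library.
final class ConverterSketch {

    // Returns a Set, so each distinct character appears at most once.
    static final Function<CharSequence, Collection<Character>> TO_SET = cs -> {
        Set<Character> set = new HashSet<>();
        for (int i = 0; i < cs.length(); i++) {
            set.add(cs.charAt(i));
        }
        return set;
    };

    // Returns a List, so repeated characters are kept as duplicates.
    static final Function<CharSequence, Collection<Character>> TO_LIST = cs -> {
        List<Character> list = new ArrayList<>();
        for (int i = 0; i < cs.length(); i++) {
            list.add(cs.charAt(i));
        }
        return list;
    };
}
```

For the input "aab", TO_SET yields two elements and TO_LIST yields three. Either function has the shape required by the constructor parameter, so which one is passed decides whether duplicates are counted.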
    • Method Detail

      • getIntersection

        private static <T> int getIntersection(java.util.Set<T> setA,
                                               java.util.Set<T> setB)
        Computes the intersection between two sets. This is the count of all the elements that are within both sets.
        Type Parameters:
        T - the type of the elements in the set
        Parameters:
        setA - the set A
        setB - the set B
        Returns:
        The size of the intersection
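The count described above can be sketched as follows. This is an illustrative version rather than the library's private method; the class name SetIntersectionSketch and the choice to iterate over the smaller set are assumptions.

```java
import java.util.Set;

// Hypothetical helper class; not part of the library.
final class SetIntersectionSketch {

    // Counts the elements that are present in both sets.
    static <T> int intersectionSize(Set<T> setA, Set<T> setB) {
        // Iterate over the smaller set and probe the larger one.
        Set<T> smaller = setA.size() <= setB.size() ? setA : setB;
        Set<T> larger = smaller == setA ? setB : setA;
        int count = 0;
        for (T element : smaller) {
            if (larger.contains(element)) {
                count++;
            }
        }
        return count;
    }
}
```

For the sets {a, b, c} and {b, c, d} the count is 2.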
      • apply

        public IntersectionResult apply(java.lang.CharSequence left,
                                        java.lang.CharSequence right)
        Calculates the intersection of two character sequences passed as input.
        Specified by:
        apply in interface SimilarityScore<IntersectionResult>
        Parameters:
        left - first character sequence
        right - second character sequence
        Returns:
        The intersection result
        Throws:
        java.lang.IllegalArgumentException - if either input sequence is null
      • getIntersection

        private int getIntersection(IntersectionSimilarity.TinyBag bagA,
                                    IntersectionSimilarity.TinyBag bagB)
        Computes the intersection between two bags. This is the sum of the minimum count of each element that is within both bags.
        Parameters:
        bagA - the bag A
        bagB - the bag B
        Returns:
        The size of the intersection
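The "sum of the minimum count" rule can be sketched as below. The TinyBag API is not shown in this page, so the sketch assumes a plain Map from element to count as a stand-in for a bag; the class name BagIntersectionSketch is likewise hypothetical.

```java
import java.util.Map;

// Hypothetical helper class; a Map<T, Integer> stands in for the private TinyBag type.
final class BagIntersectionSketch {

    // Sums, over every element present in both bags, the smaller of its two counts.
    static <T> int intersectionSize(Map<T, Integer> bagA, Map<T, Integer> bagB) {
        int total = 0;
        for (Map.Entry<T, Integer> entry : bagA.entrySet()) {
            Integer countB = bagB.get(entry.getKey());
            if (countB != null) {
                total += Math.min(entry.getValue(), countB);
            }
        }
        return total;
    }
}
```

For bag A = {a: 2, b: 1} and bag B = {a: 1, b: 3, c: 1}, the result is min(2, 1) + min(1, 3) = 2. The element c contributes nothing because it is absent from bag A.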
      • toBag

        private IntersectionSimilarity.TinyBag toBag(java.util.Collection<T> objects)
        Converts the collection to a bag. The bag will contain the count of each element in the collection.
        Parameters:
        objects - the objects
        Returns:
        The bag
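A minimal sketch of this conversion is shown below, again assuming a Map from element to count in place of the private TinyBag type. The class name ToBagSketch is hypothetical.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; a Map<T, Integer> stands in for the private TinyBag type.
final class ToBagSketch {

    // Builds a map from each distinct element to the number of times it occurs.
    static <T> Map<T, Integer> toBag(Collection<T> objects) {
        Map<T, Integer> bag = new HashMap<>();
        for (T object : objects) {
            // Insert 1 for a new element, otherwise add 1 to the existing count.
            bag.merge(object, 1, Integer::sum);
        }
        return bag;
    }
}
```

For the collection [a, b, a] the resulting bag holds a with count 2 and b with count 1.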