Class UnicodeMap<T>

  • All Implemented Interfaces:
    StringTransform, Transform<java.lang.String,java.lang.String>, Freezable<UnicodeMap<T>>, java.lang.Cloneable, java.lang.Iterable<java.lang.String>

    public final class UnicodeMap<T>
    extends java.lang.Object
    implements java.lang.Cloneable, Freezable<UnicodeMap<T>>, StringTransform, java.lang.Iterable<java.lang.String>
    Class for mapping Unicode characters and strings to values, optimized for single code points, where ranges of code points have the same value. It uses much less storage than a HashMap, and is much faster and more compact than a list of UnicodeSets. The API design mimics Map but cannot implement it because of some necessary changes (much as UnicodeSet mimics Set). Note that nulls are not permitted as values; that is, put(x, null) is the same as remove(x).
    At present the empty string "" is also not allowed as a key, although that may change.
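        The range-based storage described above can be pictured with a small sketch. It is an illustration under stated assumptions, not the ICU implementation. The class name RangeMapSketch and its methods are hypothetical. It keeps ranges in ArrayLists and finds them with a linear scan, rather than using the transitions/values arrays with binary search, and it handles only single code points (no string keys). It does show two behaviors documented here: adjacent code points with equal values share one range, and putting null acts as a removal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch (not ICU code): code points grouped into ranges that share a value.
// starts.get(i) is the first code point of range i; the range ends just before
// starts.get(i + 1), or at U+10FFFF for the last range. A null value means "unmapped".
final class RangeMapSketch<T> {
    private static final int LIMIT = 0x110000; // one past U+10FFFF
    private final List<Integer> starts = new ArrayList<>();
    private final List<T> values = new ArrayList<>();

    RangeMapSketch() {
        starts.add(0);    // one range covering every code point,
        values.add(null); // initially unmapped
    }

    // Index i with starts[i] <= cp < starts[i + 1]; a linear scan keeps the sketch short.
    private int indexOf(int cp) {
        int i = 0;
        while (i + 1 < starts.size() && starts.get(i + 1) <= cp) {
            i++;
        }
        return i;
    }

    // Ensure a range begins exactly at cp by splitting the range that contains it.
    private void splitAt(int cp) {
        if (cp >= LIMIT) {
            return;
        }
        int i = indexOf(cp);
        if (starts.get(i) != cp) {
            starts.add(i + 1, cp);
            values.add(i + 1, values.get(i));
        }
    }

    // Map every code point in [start, end] to value; a null value removes the mapping.
    RangeMapSketch<T> putRange(int start, int end, T value) {
        splitAt(start);
        splitAt(end + 1);
        for (int i = indexOf(start); i < starts.size() && starts.get(i) <= end; i++) {
            values.set(i, value);
        }
        // Merge neighbouring ranges that now hold equal values, keeping the list minimal.
        for (int i = starts.size() - 1; i > 0; i--) {
            if (Objects.equals(values.get(i), values.get(i - 1))) {
                starts.remove(i);
                values.remove(i);
            }
        }
        return this;
    }

    T get(int cp) {
        return values.get(indexOf(cp));
    }

    // Number of ranges, counting unmapped gaps as ranges too.
    int rangeCount() {
        return starts.size();
    }
}
```

        When large blocks of code points share a value, splitting on put and merging equal neighbours afterwards keeps the number of ranges small.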
    • Field Detail

      • length

        private int length
      • transitions

        private int[] transitions
      • values

        T[] values
      • availableValues

        private java.util.LinkedHashSet<T> availableValues
      • staleAvailableValues

        private transient boolean staleAvailableValues
      • errorOnReset

        private transient boolean errorOnReset
      • locked

        private transient volatile boolean locked
      • lastIndex

        private int lastIndex
      • stringMap

        private java.util.TreeMap<java.lang.String,T> stringMap
    • Constructor Detail

      • UnicodeMap

        public UnicodeMap()
      • UnicodeMap

        public UnicodeMap(UnicodeMap other)
    • Method Detail

      • equals

        public boolean equals(java.lang.Object other)
        Overrides:
        equals in class java.lang.Object
      • areEqual

        public static boolean areEqual(java.lang.Object a,
                                       java.lang.Object b)
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • cloneAsThawed

        public UnicodeMap<T> cloneAsThawed()
        Standard clone, returning an unfrozen copy. Warning: as with Collections, this does not do a deep clone.
        Specified by:
        cloneAsThawed in interface Freezable<T>
      • _checkInvariants

        void _checkInvariants()
      • _findIndex

        private int _findIndex(int c)
        Finds an index i such that inversionList[i] <= c < inversionList[i+1], where the inversion list is the transitions array. Assumes that 0 <= c <= 0x10FFFF.
        Parameters:
        c - the code point to look up
        Returns:
        the index
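        A binary search that satisfies the _findIndex contract could look like the following sketch. The explicit length parameter and the sentinel 0x110000 at the end of the used part of the array are illustrative assumptions. The class does have private length and transitions fields, but their exact layout is not documented here. FindIndexSketch is a hypothetical name.

```java
// Hypothetical helper (not the ICU source) following the contract documented for _findIndex.
final class FindIndexSketch {
    // transitions[0..length-1] is strictly increasing, starts with 0 and, by assumption,
    // ends with the sentinel 0x110000. Returns i with transitions[i] <= c < transitions[i + 1].
    static int findIndex(int[] transitions, int length, int c) {
        int lo = 0;
        int hi = length - 1; // invariant: transitions[lo] <= c < transitions[hi]
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (transitions[mid] <= c) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

        Each iteration halves the search interval, so a lookup costs O(log n) in the number of ranges rather than in the number of code points.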
      • _checkFind

        private void _checkFind(int codepoint,
                                int value)
      • __findIndex

        private int __findIndex(int codepoint)
      • _removeAt

        private void _removeAt(int index,
                               int count)
        Removes the items from index through index+count-1, logically reducing the size of the internal arrays.
        Parameters:
        index - the first index to remove
        count - the number of items to remove
      • _insertGapAt

        private void _insertGapAt(int index,
                                  int count)
        Adds a gap from index through index+count-1. The values there are undefined and must be set by the caller. Logically grows the arrays to accommodate the gap. Actual growth is limited.
        Parameters:
        index - the first index of the gap
        count - the number of slots to insert
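        The two gap operations on the parallel arrays can be sketched with System.arraycopy, which copies correctly even when the source and destination ranges overlap within the same array. ParallelArraysSketch is hypothetical. Its doubling growth policy is also an assumption, because the documentation only says that actual growth is limited.

```java
import java.util.Arrays;

// Hypothetical sketch of gap insertion/removal on parallel arrays with a logical length.
final class ParallelArraysSketch {
    int[] transitions = new int[4];
    Object[] values = new Object[4];
    int length = 0; // number of slots in use; the rest is spare capacity

    // Open count undefined slots at index..index+count-1, growing capacity if needed.
    void insertGapAt(int index, int count) {
        int needed = length + count;
        if (needed > transitions.length) {
            int newCap = Math.max(needed, transitions.length * 2); // assumed doubling policy
            transitions = Arrays.copyOf(transitions, newCap);
            values = Arrays.copyOf(values, newCap);
        }
        // Shift the tail right; the opened slots keep stale contents until the caller sets them.
        System.arraycopy(transitions, index, transitions, index + count, length - index);
        System.arraycopy(values, index, values, index + count, length - index);
        length = needed;
    }

    // Drop slots index..index+count-1 and shift the tail left.
    void removeAt(int index, int count) {
        System.arraycopy(transitions, index + count, transitions, index, length - index - count);
        System.arraycopy(values, index + count, values, index, length - index - count);
        length -= count;
        Arrays.fill(values, length, length + count, null); // release references to removed values
    }
}
```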
      • _put

        private UnicodeMap _put(int codepoint,
                                T value)
        Associates the code point with the value, replacing any previous association. All code that calls this MUST check for frozen first!
        Parameters:
        codepoint - the code point to set
        value - the value to associate with it
        Returns:
        this, for chaining
      • _putAll

        private UnicodeMap _putAll(int startCodePoint,
                                   int endCodePoint,
                                   T value)
      • put

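        public UnicodeMap<T> put(int codepoint,
                                 T value)
        Sets the value for the code point. A null value removes any existing mapping.
        Parameters:
        codepoint - the code point to set
        value - the value to set, or null to remove the mapping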
        Returns:
        this (for chaining)
      • put

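        public UnicodeMap<T> put(java.lang.String string,
                                 T value)
        Sets the value for the string key, which may be a single code point or a sequence of code points. A null value removes any existing mapping.
        Parameters:
        string - the key to set
        value - the value to set, or null to remove the mapping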
        Returns:
        this (for chaining)
      • putAll

        public UnicodeMap<T> putAll(UnicodeSet codepoints,
                                    T value)
        Maps every element of codepoints to value; otherwise like put.
        Parameters:
        codepoints - the keys to set
        value - the value to set
        Returns:
        this (for chaining)
      • putAll

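        public UnicodeMap<T> putAll(int startCodePoint,
                                    int endCodePoint,
                                    T value)
        Maps every code point from startCodePoint through endCodePoint to value; otherwise like put.
        Parameters:
        startCodePoint - the first code point of the range
        endCodePoint - the last code point of the range
        value - the value to set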
        Returns:
        this (for chaining)
      • putAll

        public UnicodeMap<T> putAll(UnicodeMap<T> unicodeMap)
        Adds all the (main) values from another UnicodeMap.
        Parameters:
        unicodeMap - the map whose mappings are added to this map
        Returns:
        this (for chaining)
      • putAllFiltered

        public UnicodeMap<T> putAllFiltered(UnicodeMap<T> prop,
                                            UnicodeSet filter)
        Adds the (main) values from a Unicode property, restricted to the keys in filter.
        Parameters:
        prop - the property to add to the map
        filter - the set of keys whose values are copied
        Returns:
        this (for chaining)
      • setMissing

        public UnicodeMap<T> setMissing(T value)
        Set the currently unmapped Unicode code points to the given value.
        Parameters:
        value - the value to set
        Returns:
        this (for chaining)
      • keySet

        public UnicodeSet keySet(T value,
                                 UnicodeSet result)
        Returns the set of all keys that map to the given value. If result is not null, the keys are added to it. It is not cleared first, so clear it if you want only the new keys.
      • keySet

        public UnicodeSet keySet(T value)
        Returns the set of all keys that map to the given value.
      • keySet

        public UnicodeSet keySet()
        Returns the set of all keys that have a (non-null) value.
      • values

        public <U extends java.util.Collection<T>> U values(U result)
        Returns the possible values. Each non-null value is added to result, which is created if it is null. The collection is not cleared first, so clear it if you do not want to append to its existing contents.
        Parameters:
        result - the collection to add to, or null
        Returns:
        result
      • values

        public java.util.Set<T> values()
        Convenience method that returns the possible values as a Set.
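        As a rough illustration of keySet(value) and values(result), the hypothetical helpers below work over transitions/values-style parallel arrays. This sketch makes two assumptions of its own. It returns inclusive [start, end] ranges instead of a UnicodeSet, and it uses a LinkedHashSet when no collection is supplied. Like the documented methods, it appends to an existing collection without clearing it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch over parallel arrays: transitions[i] is the first code point of
// range i, which ends just before transitions[i + 1] (or at U+10FFFF for the last range).
final class KeysAndValuesSketch {
    private static final int LIMIT = 0x110000;

    // Inclusive [start, end] ranges whose value equals target, in the spirit of keySet(value).
    static <T> List<int[]> keyRanges(int[] transitions, T[] values, T target) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < transitions.length; i++) {
            if (values[i] != null && values[i].equals(target)) {
                int end = (i + 1 < transitions.length ? transitions[i + 1] : LIMIT) - 1;
                result.add(new int[] {transitions[i], end});
            }
        }
        return result;
    }

    // Adds each non-null value to result (without clearing it), creating a set if result is null.
    static <T> Set<T> collectValues(T[] values, Set<T> result) {
        if (result == null) {
            result = new LinkedHashSet<>();
        }
        for (T v : values) {
            if (v != null) {
                result.add(v);
            }
        }
        return result;
    }
}
```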
      • get

        public T get(int codepoint)
        Returns the value associated with the given code point, or null if there is none.
        Parameters:
        codepoint - the code point to look up
        Returns:
        the value
      • get

        public T get(java.lang.String value)
        Returns the value associated with the given string key, or null if there is none.
        Parameters:
        value - the key to look up (a single code point or a multi-code-point string)
        Returns:
        the value
      • transform

        public java.lang.String transform(java.lang.String source)
        Returns a new string built from the source string according to the mappings. For each code point cp, if getValue(cp) is null, the code point itself is appended; otherwise getValue(cp).toString() is appended. TODO: extend to strings.
        Specified by:
        transform in interface StringTransform
        Specified by:
        transform in interface Transform<java.lang.String,java.lang.String>
        Parameters:
        source - the string to transform
        Returns:
        the transformed string
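        The transform loop described above can be sketched for code points only, which matches the TODO about strings. The lookup function is a hypothetical stand-in for getValue(cp), and TransformSketch is not an ICU class. Stepping through the string with codePointAt and Character.charCount keeps supplementary characters intact.

```java
import java.util.function.IntFunction;

// Hypothetical sketch of the documented transform loop; lookup stands in for getValue(cp).
final class TransformSketch {
    static String transform(String source, IntFunction<Object> lookup) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); ) {
            int cp = source.codePointAt(i); // walk by code point, not by char
            Object mapped = lookup.apply(cp);
            if (mapped == null) {
                out.appendCodePoint(cp);       // unmapped: copy the code point unchanged
            } else {
                out.append(mapped.toString()); // mapped: append the value's string form
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

        For example, with a lookup that maps 'a' to "A" and U+1F600 to ":)", the input "a\uD83D\uDE00b" becomes "A:)b".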
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • toString

        public java.lang.String toString(java.util.Comparator<T> collected)
      • getErrorOnReset

        public boolean getErrorOnReset()
        Returns:
        the errorOnReset value (see setErrorOnReset)
      • setErrorOnReset

        public UnicodeMap<T> setErrorOnReset(boolean errorOnReset)
        When errorOnReset is true, puts the UnicodeMap into a state in which new mappings are accepted but changes to existing mappings cause an exception.
        Parameters:
        errorOnReset - true to make changes to existing mappings throw an exception
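        A minimal sketch of the errorOnReset behavior, using a HashMap in place of the real storage. ErrorOnResetSketch is hypothetical, and two details are assumptions rather than documented facts. Putting an equal value again is treated as no change, and the exception type is IllegalArgumentException.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the errorOnReset idea; a HashMap replaces the real range storage.
final class ErrorOnResetSketch<T> {
    private final Map<Integer, T> map = new HashMap<>();
    private boolean errorOnReset;

    ErrorOnResetSketch<T> setErrorOnReset(boolean errorOnReset) {
        this.errorOnReset = errorOnReset;
        return this;
    }

    ErrorOnResetSketch<T> put(int codepoint, T value) {
        T old = map.get(codepoint);
        // New mappings are accepted; changing (or removing) an existing one is an error.
        // Assumption: putting an equal value again does not count as a change.
        if (errorOnReset && old != null && !Objects.equals(old, value)) {
            throw new IllegalArgumentException(
                    "Attempt to reset value for U+" + Integer.toHexString(codepoint).toUpperCase());
        }
        if (value == null) {
            map.remove(codepoint);
        } else {
            map.put(codepoint, value);
        }
        return this;
    }

    T get(int codepoint) {
        return map.get(codepoint);
    }
}
```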
      • isFrozen

        public boolean isFrozen()
        Description copied from interface: Freezable
        Determines whether the object has been frozen or not.
        Specified by:
        isFrozen in interface Freezable<T>
      • freeze

        public UnicodeMap<T> freeze()
        Description copied from interface: Freezable
        Freezes the object.
        Specified by:
        freeze in interface Freezable<T>
        Returns:
        the object itself.
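        The freeze contract usually pairs a locked flag with a check in every mutator. This is why _put warns that its callers must check for frozen first. The sketch below is hypothetical: FreezableSketch is not part of ICU, and UnsupportedOperationException is an assumed exception type. It shows freeze, isFrozen, and a cloneAsThawed that returns a mutable copy of the container without deep-copying its elements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Freezable pattern: a volatile lock flag checked by mutators.
final class FreezableSketch {
    private final List<Integer> items;
    private volatile boolean locked; // plays the role of the documented "locked" field

    FreezableSketch() {
        this(new ArrayList<>());
    }

    private FreezableSketch(List<Integer> items) {
        this.items = items;
    }

    boolean isFrozen() {
        return locked;
    }

    FreezableSketch freeze() {
        locked = true;
        return this;
    }

    // Copies the container (not its elements) into a new, unfrozen object.
    FreezableSketch cloneAsThawed() {
        return new FreezableSketch(new ArrayList<>(items));
    }

    // Every mutator checks the lock first.
    FreezableSketch add(int value) {
        if (locked) {
            throw new UnsupportedOperationException("Attempt to modify frozen object");
        }
        items.add(value);
        return this;
    }

    int size() {
        return items.size();
    }
}
```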
      • findCommonPrefix

        public static int findCommonPrefix(java.lang.String last,
                                           java.lang.String s)
        Utility to find the length of the longest common prefix of two strings. TODO: fix supplemental support.
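        The TODO about supplemental support refers to strings that contain supplementary characters. In such strings, a char-by-char comparison can stop between the two halves of a surrogate pair. The hypothetical sketch below assumes the result is a length in UTF-16 code units. It backs off by one unit when the prefix would otherwise end inside a pair. It is not the ICU implementation.

```java
// Hypothetical, surrogate-aware common-prefix helper; not the ICU implementation.
final class CommonPrefixSketch {
    // Length, in UTF-16 code units, of the longest common prefix of a and b that does
    // not end between the high and low surrogate of a pair.
    static int findCommonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        // If the last shared unit is a high surrogate followed by a low surrogate in
        // either string, back off so the pair is not split.
        if (i > 0 && Character.isHighSurrogate(a.charAt(i - 1))
                && ((i < a.length() && Character.isLowSurrogate(a.charAt(i)))
                    || (i < b.length() && Character.isLowSurrogate(b.charAt(i))))) {
            i--;
        }
        return i;
    }
}
```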
      • getRangeCount

        public int getRangeCount()
        Returns the number of ranges, for use with getRangeStart, getRangeEnd, and getRangeValue. Together the ranges cover all of the single-code-point keys in the UnicodeMap; the other keys can be obtained with getNonRangeStrings().
      • getRangeStart

        public int getRangeStart(int range)
        Returns the first code point of the given range. All code points from the start through the end of the range are in the UnicodeMap's keyset.
      • getRangeEnd

        public int getRangeEnd(int range)
        Returns the last code point of the given range. All code points from the start through the end of the range are in the UnicodeMap's keyset.
      • getRangeValue

        public T getRangeValue(int range)
        Returns the value shared by all code points in the given range.
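        The range accessors are meant to be used together in an indexed loop. To keep the example self-contained, the sketch defines a hypothetical RangeView interface that declares the four method names documented here. With a real UnicodeMap, the same loop would call the map's own methods directly. The hex output format is an arbitrary choice for illustration.

```java
// Hypothetical interface declaring the four range accessors named in this class.
interface RangeView<T> {
    int getRangeCount();
    int getRangeStart(int range);
    int getRangeEnd(int range);
    T getRangeValue(int range);
}

final class RangeDump {
    // One line per range: START..END=value, with inclusive bounds in hex.
    static <T> String dump(RangeView<T> map) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < map.getRangeCount(); r++) {
            sb.append(String.format("%04X..%04X=%s",
                    map.getRangeStart(r), map.getRangeEnd(r), map.getRangeValue(r)));
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

        Iterating by range rather than by code point visits each run of equal values once, which suits a structure optimized for long runs.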
      • getNonRangeStrings

        public java.util.Set<java.lang.String> getNonRangeStrings()
        Returns the strings that are not covered by the ranges, or null if there are none.
        Returns:
        the non-range string keys, or null
      • containsKey

        public boolean containsKey(java.lang.String key)
      • containsKey

        public boolean containsKey(int key)
      • containsValue

        public boolean containsValue(T value)
      • isEmpty

        public boolean isEmpty()
      • putAll

        public UnicodeMap<T> putAll(java.util.Map<? extends java.lang.String,? extends T> map)
      • putAllIn

        public UnicodeMap<T> putAllIn(java.util.Map<? super java.lang.String,? super T> map)
        Deprecated.
        Utility for copying all the mappings into map.
      • putAllInto

        public <U extends java.util.Map<java.lang.String,T>> U putAllInto(U map)
        Utility that copies all the mappings into map and returns it.
      • putAllCodepointsInto

        public <U extends java.util.Map<java.lang.Integer,T>> U putAllCodepointsInto(U map)
        Utility that copies the single-code-point mappings into map, keyed by code point, and returns it.
      • remove

        public UnicodeMap<T> remove(java.lang.String key)
      • size

        public int size()
      • entrySet

        public java.lang.Iterable<java.util.Map.Entry<java.lang.String,T>> entrySet()
      • entryRanges

        public java.lang.Iterable<UnicodeMap.EntryRange<T>> entryRanges()
        Returns an Iterable over EntryRange objects, designed for efficient for-loops over a UnicodeMap. Caution: for efficiency, the same EntryRange object may be reused, so its contents may change on each iteration; copy any fields you need to keep. The value is guaranteed never to be null. Entries with a non-null entryRange.string (the string keys) come after all the code point ranges.
        Returns:
        an Iterable of entry ranges, for use in for-loops
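        The caution about reuse matters whenever entries are kept after the current iteration. The hypothetical ReusedEntry and ReuseDemo classes below only mimic the documented behavior. They are not EntryRange, and their field names are assumptions. Keeping references to the entries leaves every element pointing at one shared object, while copying fields inside the loop preserves each range.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical mutable entry loosely modeled on EntryRange; field names are assumptions.
final class ReusedEntry {
    int codepoint;
    int codepointEnd;
    String value;
}

final class ReuseDemo {
    // Hands out the SAME ReusedEntry on every step, overwriting its fields each time.
    static Iterable<ReusedEntry> ranges(int[] starts, int[] ends, String[] values) {
        return () -> new Iterator<ReusedEntry>() {
            private final ReusedEntry entry = new ReusedEntry();
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < starts.length;
            }

            @Override
            public ReusedEntry next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                entry.codepoint = starts[i];
                entry.codepointEnd = ends[i];
                entry.value = values[i];
                i++;
                return entry;
            }
        };
    }
}
```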
      • iterator

        public java.util.Iterator<java.lang.String> iterator()
        Specified by:
        iterator in interface java.lang.Iterable<T>
      • getValue

        public T getValue(java.lang.String key)
        Old form, kept for compatibility; see get(java.lang.String).
      • getValue

        public T getValue(int key)
        Old form, kept for compatibility; see get(int).
      • getAvailableValues

        public java.util.Collection<T> getAvailableValues()
        Old form, kept for compatibility; see values().
      • getAvailableValues

        public <U extends java.util.Collection<T>> U getAvailableValues(U result)
        Old form, kept for compatibility; see values(U).
      • getSet

        public UnicodeSet getSet(T value)
        Old form, kept for compatibility; see keySet(T).
      • stringKeys

        public final java.util.Set<java.lang.String> stringKeys()
        Returns the keys that consist of multiple code points.
        Returns:
        the multi-code-point string keys
      • addInverseTo

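        public <U extends java.util.Map<T,UnicodeSet>> U addInverseTo(U target)
        Adds the inverse of this map to target: each value is mapped to the set of keys that have that value. Like putAllIn, this adds to the given map rather than creating a new one.
        Returns:
        target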
      • freeze

        public static <T> java.util.Map<T,UnicodeSet> freeze(java.util.Map<T,UnicodeSet> target)
        Freezes an inverse map, such as one filled by addInverseTo.
        Parameters:
        target - the inverse map to freeze
        Returns:
        the frozen map
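        addInverseTo and the static freeze can be sketched with JDK collections. In this sketch, TreeSet<Integer> is a hypothetical stand-in for UnicodeSet, and a plain Map<Integer, T> plays the forward mapping. The documentation does not say whether the real freeze freezes the UnicodeSets in place or copies them. This sketch returns an unmodifiable copy of the map whose sets are unmodifiable views.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: TreeSet<Integer> stands in for UnicodeSet.
final class InverseSketch {
    // In the spirit of addInverseTo: for each value, collect the code points mapped to it,
    // adding to target rather than creating a new map.
    static <T, M extends Map<T, Set<Integer>>> M addInverseTo(Map<Integer, T> forward, M target) {
        for (Map.Entry<Integer, T> e : forward.entrySet()) {
            if (e.getValue() != null) {
                target.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return target;
    }

    // In the spirit of the static freeze: an unmodifiable copy whose sets are unmodifiable views.
    static <T> Map<T, Set<Integer>> freezeInverse(Map<T, Set<Integer>> target) {
        Map<T, Set<Integer>> copy = new LinkedHashMap<>();
        for (Map.Entry<T, Set<Integer>> e : target.entrySet()) {
            copy.put(e.getKey(), Collections.unmodifiableSet(e.getValue()));
        }
        return Collections.unmodifiableMap(copy);
    }
}
```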
      • putAllInverse

        public UnicodeMap<T> putAllInverse(java.util.Map<T,UnicodeSet> source)
        Parameters:
        source - an inverse map from values to the sets of keys that have them
        Returns:
        this (for chaining)