Class UnicodeSetStringSpan


  • public class UnicodeSetStringSpan
    extends java.lang.Object
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      private static class  UnicodeSetStringSpan.OffsetList
      Helper class for UnicodeSetStringSpan.
    • Constructor Summary

      Constructors 
      Constructor Description
      UnicodeSetStringSpan​(UnicodeSetStringSpan otherStringSpan, java.util.ArrayList<java.lang.String> newParentSetStrings)
      Constructs a copy of an existing UnicodeSetStringSpan.
      UnicodeSetStringSpan​(UnicodeSet set, java.util.ArrayList<java.lang.String> setStrings, int which)
      Constructs an instance for all variants of span(), or for only one variant.
    • Method Summary

      Modifier and Type Method Description
      private void addToSpanNotSet​(int c)
      Adds a starting or ending string character to the spanNotSet so that a character span ends before any string.
      boolean contains​(int c)
      For fast UnicodeSet::contains(c).
      (package private) static short makeSpanLengthByte​(int spanLength)  
      private static boolean matches16​(java.lang.CharSequence s, int start, java.lang.String t, int length)  
      (package private) static boolean matches16CPB​(java.lang.CharSequence s, int start, int limit, java.lang.String t, int tlength)
      Compare 16-bit Unicode strings (which may be malformed UTF-16) at code point boundaries.
      boolean needsStringSpanUTF16()
      Do the strings need to be checked in span() etc.?
      int span​(java.lang.CharSequence s, int start, UnicodeSet.SpanCondition spanCondition)
      Spans a string.
      int spanAndCount​(java.lang.CharSequence s, int start, UnicodeSet.SpanCondition spanCondition, OutputInt outCount)
      Spans a string and counts the smallest number of set elements on any path across the span.
      int spanBack​(java.lang.CharSequence s, int length, UnicodeSet.SpanCondition spanCondition)
      Spans a string backwards.
      private int spanContainedAndCount​(java.lang.CharSequence s, int start, OutputInt outCount)  
      private int spanNot​(java.lang.CharSequence s, int start, OutputInt outCount)
      Algorithm for spanNot() == span(SpanCondition.NOT_CONTAINED).
      private int spanNotBack​(java.lang.CharSequence s, int length)  
      (package private) static int spanOne​(UnicodeSet set, java.lang.CharSequence s, int start, int length)
      Does the set contain the next code point? If so, return its length; otherwise return its negative length.
      (package private) static int spanOneBack​(UnicodeSet set, java.lang.CharSequence s, int length)  
      private int spanWithStrings​(java.lang.CharSequence s, int start, int spanLimit, UnicodeSet.SpanCondition spanCondition)
      Synchronized method for complicated spans using the offsets.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • FWD_UTF16_NOT_CONTAINED

        public static final int FWD_UTF16_NOT_CONTAINED
        See Also:
        Constant Field Values
      • BACK_UTF16_NOT_CONTAINED

        public static final int BACK_UTF16_NOT_CONTAINED
        See Also:
        Constant Field Values
      • ALL_CP_CONTAINED

        static final short ALL_CP_CONTAINED
        Special spanLength short values (shorts are used because Java has no unsigned byte type). All code points in the string are contained in the parent set.
        See Also:
        Constant Field Values
      • spanSet

        private UnicodeSet spanSet
        Set for span(). Same as parent but without strings.
      • spanNotSet

        private UnicodeSet spanNotSet
        Set for span(not contained). Same as spanSet, plus characters that start or end strings.
      • strings

        private java.util.ArrayList<java.lang.String> strings
        The strings of the parent set.
      • spanLengths

        private short[] spanLengths
        The lengths of span(), spanBack() etc. for each string.
      • maxLength16

        private final int maxLength16
        Maximum length of relevant strings.
      • someRelevant

        private boolean someRelevant
        Are there strings that are not fully contained in the code point set?
      • all

        private boolean all
        Set up for all variants of span()?
    • Constructor Detail

      • UnicodeSetStringSpan

        public UnicodeSetStringSpan​(UnicodeSet set,
                                    java.util.ArrayList<java.lang.String> setStrings,
                                    int which)
        Constructs an instance for all variants of span(), or for only one variant. Initializes as little as possible, for single use.
      • UnicodeSetStringSpan

        public UnicodeSetStringSpan​(UnicodeSetStringSpan otherStringSpan,
                                    java.util.ArrayList<java.lang.String> newParentSetStrings)
        Constructs a copy of an existing UnicodeSetStringSpan. Assumes which==ALL for a frozen set.
    • Method Detail

      • needsStringSpanUTF16

        public boolean needsStringSpanUTF16()
        Do the strings need to be checked in span() etc.?
        Returns:
        true if strings need to be checked (call span() here), false if not (use a BMPSet for best performance).
      • contains

        public boolean contains​(int c)
        For fast UnicodeSet::contains(c).
      • addToSpanNotSet

        private void addToSpanNotSet​(int c)
        Adds a starting or ending string character to the spanNotSet so that a character span ends before any string.
      • span

        public int span​(java.lang.CharSequence s,
                        int start,
                        UnicodeSet.SpanCondition spanCondition)
        Spans a string.
        Parameters:
        s - The string to be spanned
        start - The index at which the span begins
        spanCondition - The span condition
        Returns:
        the limit (exclusive end) of the span
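
        To illustrate the return convention (the limit, i.e. the exclusive end of the span), here is a minimal sketch of a forward span over code points only. It is not the ICU implementation and ignores set strings entirely; the class SpanLimitSketch, the method spanContained, and the use of an IntPredicate in place of a UnicodeSet are assumptions made for illustration.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch: an IntPredicate stands in for the set's code points.
final class SpanLimitSketch {
    /** Returns the exclusive end of the longest run of contained code points starting at start. */
    static int spanContained(CharSequence s, int start, IntPredicate contains) {
        int i = start;
        while (i < s.length()) {
            int c = Character.codePointAt(s, i);
            if (!contains.test(c)) {
                break; // first code point outside the set ends the span
            }
            i += Character.charCount(c); // advance by 1 or 2 code units
        }
        return i;
    }

    public static void main(String[] args) {
        IntPredicate vowels = c -> "aeiou".indexOf(c) >= 0;
        System.out.println(spanContained("aeixyz", 0, vowels)); // prints 3
    }
}
```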
      • spanWithStrings

        private int spanWithStrings​(java.lang.CharSequence s,
                                    int start,
                                    int spanLimit,
                                    UnicodeSet.SpanCondition spanCondition)
        Synchronized method for complicated spans using the offsets. Avoids synchronization for simple cases.
        Parameters:
        spanLimit - Equal to spanSet.span(s, start, CONTAINED)
      • spanAndCount

        public int spanAndCount​(java.lang.CharSequence s,
                                int start,
                                UnicodeSet.SpanCondition spanCondition,
                                OutputInt outCount)
        Spans a string and counts the smallest number of set elements on any path across the span.

        For proper counting, we cannot ignore strings that are fully contained in code point spans.

        If the set does not have any fully-contained strings, then we could optimize this like span(), but such sets are likely rare, and this is at least still linear.

        Parameters:
        s - The string to be spanned
        start - The index at which the span begins
        spanCondition - The span condition
        outCount - The count
        Returns:
        the limit (exclusive end) of the span
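
        The idea of "the smallest number of set elements on any path across the span" can be sketched with a simple dynamic program. This is an illustrative approximation, not the ICU algorithm: it assumes well-formed UTF-16, treats the span as the furthest index reachable by concatenating set elements, and uses the hypothetical names SpanCountSketch and spanAndCount with an IntPredicate and a List<String> standing in for the set.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch: best[i] is the fewest set elements needed to reach index i.
final class SpanCountSketch {
    /** Returns {limit, count}: the furthest reachable index and the fewest elements to reach it. */
    static int[] spanAndCount(CharSequence s, int start, IntPredicate containsCp, List<String> strings) {
        int n = s.length();
        int[] best = new int[n + 1];
        Arrays.fill(best, Integer.MAX_VALUE); // MAX_VALUE marks "not reachable"
        best[start] = 0;
        int limit = start;
        for (int i = start; i <= n; i++) {
            if (best[i] == Integer.MAX_VALUE) {
                continue;
            }
            limit = i; // i is reachable, so the span extends at least this far
            if (i == n) {
                break;
            }
            int next = best[i] + 1;
            int c = Character.codePointAt(s, i);
            if (containsCp.test(c)) {
                int j = i + Character.charCount(c);
                best[j] = Math.min(best[j], next);
            }
            for (String t : strings) {
                int j = i + t.length();
                if (!t.isEmpty() && j <= n && t.contentEquals(s.subSequence(i, j))) {
                    best[j] = Math.min(best[j], next);
                }
            }
        }
        return new int[] {limit, best[limit]};
    }

    public static void main(String[] args) {
        int[] r = spanAndCount("ababx", 0, c -> c == 'a' || c == 'b', Arrays.asList("ab"));
        System.out.println("limit=" + r[0] + " count=" + r[1]); // prints limit=4 count=2
    }
}
```

        In the usage above, the four code units "abab" could be covered by four single code points, but the fully contained string "ab" covers them in two elements, which is why such strings cannot be ignored when counting.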
      • spanContainedAndCount

        private int spanContainedAndCount​(java.lang.CharSequence s,
                                          int start,
                                          OutputInt outCount)
      • spanBack

        public int spanBack​(java.lang.CharSequence s,
                            int length,
                            UnicodeSet.SpanCondition spanCondition)
        Spans a string backwards.
        Parameters:
        s - The string to be spanned
        spanCondition - The span condition
        Returns:
        The string index which starts the span (i.e. inclusive).
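
        As a rough illustration of the backward direction, the following sketch walks from the end of the string toward the beginning and returns the inclusive start index of the span. It covers code points only (no set strings) and is not the ICU implementation; SpanBackSketch, spanBackContained, and the IntPredicate stand-in for the set are hypothetical, and length is assumed to be the exclusive end index from which the backward span starts.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch: an IntPredicate stands in for the set's code points.
final class SpanBackSketch {
    /** Returns the inclusive start of the longest run of contained code points ending at length. */
    static int spanBackContained(CharSequence s, int length, IntPredicate contains) {
        int i = length;
        while (i > 0) {
            int c = Character.codePointBefore(s, i);
            if (!contains.test(c)) {
                break; // first code point outside the set ends the backward span
            }
            i -= Character.charCount(c); // step back by 1 or 2 code units
        }
        return i;
    }

    public static void main(String[] args) {
        IntPredicate vowels = c -> "aeiou".indexOf(c) >= 0;
        System.out.println(spanBackContained("xyzaei", 6, vowels)); // prints 3
    }
}
```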
      • spanNot

        private int spanNot​(java.lang.CharSequence s,
                            int start,
                            OutputInt outCount)
        Algorithm for spanNot() == span(SpanCondition.NOT_CONTAINED).

        Theoretical algorithm:
        - Iterate through the string, and at each code point boundary:
          + If the code point there is in the set, then return with the current position.
          + If a set string matches at the current position, then return with the current position.

        Optimized implementation: (Same assumption as for span() above.)
        Create and cache a spanNotSet which contains all of the single code points of the original set but none of its strings. For each set string, add its initial code point to the spanNotSet. (Also add its final code point for spanNotBack().)
        - Loop:
          + Do spanLength=spanNotSet.span(SpanCondition.NOT_CONTAINED).
          + If the current code point is in the original set, then return the current position.
          + If any set string matches at the current position, then return the current position.
          + If there is no match at the current position, neither for the code point there nor for any set string, then skip this code point and continue the loop. This happens for set-string-initial code points that were added to spanNotSet when there is not actually a match for such a set string.
        Parameters:
        s - The string to be spanned
        start - The index at which the span begins
        outCount - If not null: Receives the number of code points across the span.
        Returns:
        the limit (exclusive end) of the span
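
        The theoretical algorithm described above (not the optimized one with a cached spanNotSet) can be sketched as follows. This is a simplified illustration under stated assumptions: well-formed UTF-16 input, an IntPredicate and a List<String> standing in for the set's code points and strings, and the hypothetical names SpanNotSketch, spanNot, and matchesAt. It does not compute outCount.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch of the theoretical (unoptimized) spanNot algorithm.
final class SpanNotSketch {
    /** Returns the first index at or after start where a set code point or set string begins. */
    static int spanNot(CharSequence s, int start, IntPredicate containsCp, List<String> strings) {
        int i = start;
        while (i < s.length()) {
            int c = Character.codePointAt(s, i);
            if (containsCp.test(c)) {
                return i; // the code point here is in the set
            }
            for (String t : strings) {
                if (matchesAt(s, i, t)) {
                    return i; // a set string matches at the current position
                }
            }
            i += Character.charCount(c); // move to the next code point boundary
        }
        return s.length();
    }

    /** True if the non-empty string t occurs in s starting at index i. */
    static boolean matchesAt(CharSequence s, int i, String t) {
        int j = i + t.length();
        return !t.isEmpty() && j <= s.length() && t.contentEquals(s.subSequence(i, j));
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("cd");
        System.out.println(spanNot("abcdx", 0, c -> c == 'x', strings)); // prints 2
    }
}
```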
      • spanNotBack

        private int spanNotBack​(java.lang.CharSequence s,
                                int length)
      • makeSpanLengthByte

        static short makeSpanLengthByte​(int spanLength)
      • matches16

        private static boolean matches16​(java.lang.CharSequence s,
                                         int start,
                                         java.lang.String t,
                                         int length)
      • matches16CPB

        static boolean matches16CPB​(java.lang.CharSequence s,
                                    int start,
                                    int limit,
                                    java.lang.String t,
                                    int tlength)
        Compare 16-bit Unicode strings (which may be malformed UTF-16) at code point boundaries. That is, each edge of a match must not be in the middle of a surrogate pair.
        Parameters:
        s - The string to match in.
        start - The start index of s.
        limit - The limit of the subsequence of s being spanned.
        t - The substring to be matched in s.
        tlength - The length of t.
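
        The boundary rule can be sketched as a plain code-unit comparison followed by two surrogate checks, one at each edge of the match. This is an illustrative sketch rather than the ICU source: MatchCpbSketch and matchesAtBoundaries are hypothetical names, limit is assumed not to exceed s.length(), and an empty t is treated as non-matching for simplicity.

```java
// Hypothetical sketch: reject matches whose edges fall inside a surrogate pair.
final class MatchCpbSketch {
    static boolean matchesAtBoundaries(CharSequence s, int start, int limit, String t) {
        int end = start + t.length();
        if (t.isEmpty() || end > limit || !t.contentEquals(s.subSequence(start, end))) {
            return false; // code units do not match (or t is empty in this sketch)
        }
        // The match starts in the middle of a pair if a lead surrogate precedes a trail at start.
        boolean splitsAtStart = start > 0
                && Character.isHighSurrogate(s.charAt(start - 1))
                && Character.isLowSurrogate(s.charAt(start));
        // The match ends in the middle of a pair if it stops right after a lead surrogate.
        boolean splitsAtEnd = end < limit
                && Character.isHighSurrogate(s.charAt(end - 1))
                && Character.isLowSurrogate(s.charAt(end));
        return !splitsAtStart && !splitsAtEnd;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', one supplementary code point, 'b'
        System.out.println(matchesAtBoundaries(s, 1, s.length(), "\uD83D\uDE00")); // prints true
        System.out.println(matchesAtBoundaries(s, 2, s.length(), "\uDE00b"));      // prints false
    }
}
```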
      • spanOne

        static int spanOne​(UnicodeSet set,
                           java.lang.CharSequence s,
                           int start,
                           int length)
        Does the set contain the next code point? If so, return its length; otherwise return its negative length.
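
        The signed-length convention can be illustrated with a small sketch. It is not the ICU source: SpanOneSketch is a hypothetical class, an IntPredicate stands in for the UnicodeSet, and length is assumed to be the number of code units available from start.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch: returns +n if the next code point is contained, -n otherwise,
// where n is the code point's length in UTF-16 code units (1 or 2).
final class SpanOneSketch {
    static int spanOne(IntPredicate contains, CharSequence s, int start, int length) {
        char c = s.charAt(start);
        if (Character.isHighSurrogate(c) && length >= 2) {
            char c2 = s.charAt(start + 1);
            if (Character.isLowSurrogate(c2)) {
                int supplementary = Character.toCodePoint(c, c2);
                return contains.test(supplementary) ? 2 : -2;
            }
        }
        // BMP code point, or an unpaired surrogate treated as a single unit.
        return contains.test(c) ? 1 : -1;
    }

    public static void main(String[] args) {
        System.out.println(spanOne(c -> c == 'a', "ab", 0, 2));                 // prints 1
        System.out.println(spanOne(c -> c == 'a', "ba", 0, 2));                 // prints -1
        System.out.println(spanOne(c -> c == 0x1F600, "\uD83D\uDE00", 0, 2));   // prints 2
    }
}
```

        Encoding the outcome in the sign lets a caller advance by the absolute value in either case while still learning whether the code point was contained.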
      • spanOneBack

        static int spanOneBack​(UnicodeSet set,
                               java.lang.CharSequence s,
                               int length)