Class BMPSet


  • public final class BMPSet
    extends java.lang.Object
    Helper class for frozen UnicodeSets; implements contains() and span() optimized for BMP code points. Latin-1: Look up one boolean per character. 2-byte characters (in UTF-8, up to U+07FF): Bits organized vertically. 3-byte characters: Use zero/one/mixed data per 64-block in U+0000..U+FFFF, with mixed for illegal ranges. Supplementary characters: Binary search over the supplementary part of the parent set's inversion list.
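    The tiers described above can be pictured as a dispatch on the code point value. The following is only a sketch: the class name, the enum, and the handling of negative or out-of-range values are assumptions made for illustration, not part of BMPSet.

```java
class BmpSetTierSketch {
    /** Hypothetical labels for the lookup strategies named in the class description. */
    enum Tier { LATIN1_TABLE, TABLE_7FF, BMP_BLOCK_BITS, INVERSION_LIST, OUT_OF_RANGE }

    /** Picks the lookup strategy for code point c. */
    static Tier tierFor(int c) {
        if (c < 0) {
            return Tier.OUT_OF_RANGE;       // assumption: negative values are rejected
        }
        if (c <= 0xff) {
            return Tier.LATIN1_TABLE;       // one boolean per Latin-1 character
        }
        if (c <= 0x7ff) {
            return Tier.TABLE_7FF;          // one bit per code point, vertical layout
        }
        if (c < 0xd800 || (c >= 0xe000 && c <= 0xffff)) {
            return Tier.BMP_BLOCK_BITS;     // zero/one/mixed data per 64-block
        }
        if (c <= 0x10ffff) {
            // Surrogates and supplementary code points. The description treats
            // surrogate blocks as mixed, which also ends in an inversion list search.
            return Tier.INVERSION_LIST;
        }
        return Tier.OUT_OF_RANGE;
    }

    public static void main(String[] args) {
        System.out.println(tierFor('A'));
        System.out.println(tierFor(0x3B1));
        System.out.println(tierFor(0x4E00));
        System.out.println(tierFor(0x1F600));
    }
}
```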
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private int[] bmpBlockBits
      One bit per 64 BMP code points.
      private boolean[] latin1Contains
      One boolean ('true' or 'false') per Latin-1 character.
      private int[] list
      The inversion list of the parent set, for the slower contains() implementation for mixed BMP blocks and for supplementary code points.
      private int[] list4kStarts
      Inversion list indexes for restricted binary searches in findCodePoint(), from findCodePoint(U+0800, U+1000, U+2000, .., U+F000, U+10000).
      private int listLength  
      private int[] table7FF
      One bit per code point from U+0000..U+07FF.
      static int U16_SURROGATE_OFFSET  
    • Constructor Summary

      Constructors 
      Constructor Description
      BMPSet​(int[] parentList, int parentListLength)  
      BMPSet​(BMPSet otherBMPSet, int[] newParentList, int newParentListLength)  
    • Method Summary

      Modifier and Type Method Description
      boolean contains​(int c)  
      private boolean containsSlow​(int c, int lo, int hi)  
      private int findCodePoint​(int c, int lo, int hi)
      Same as UnicodeSet.findCodePoint(int c) except that the binary search is restricted for finding code points in a certain range.
      private void initBits()  
      private static void set32x64Bits​(int[] table, int start, int limit)
      Set bits in a bit rectangle in "vertical" bit organization.
      int span​(java.lang.CharSequence s, int start, UnicodeSet.SpanCondition spanCondition, OutputInt outCount)
      Span the initial substring for which each character c has spanCondition==contains(c).
      int spanBack​(java.lang.CharSequence s, int limit, UnicodeSet.SpanCondition spanCondition)
      Symmetrical with span().
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • U16_SURROGATE_OFFSET

        public static int U16_SURROGATE_OFFSET
      • latin1Contains

        private boolean[] latin1Contains
        One boolean ('true' or 'false') per Latin-1 character.
      • table7FF

        private int[] table7FF
        One bit per code point from U+0000..U+07FF. The bits are organized vertically; consecutive code points correspond to the same bit positions in consecutive table words. With code point parts lead=c{10..6} and trail=c{5..0}, set.contains(c)==(table7FF[trail] bit lead). Bits for 0..FF are unused (0).
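        A minimal sketch of the vertical layout, assuming a 64-word table so that trail (6 bits) selects the word and lead (5 bits) selects the bit. The class and method names are hypothetical.

```java
class Table7FFSketch {
    /** Marks code point c (0..0x7FF) as contained. */
    static void setBit(int[] table, int c) {
        int lead = c >> 6;     // c{10..6}: bit position within the word (0..31)
        int trail = c & 0x3f;  // c{5..0}: index of the table word (0..63)
        table[trail] |= 1 << lead;
    }

    /** Tests the bit for code point c (0..0x7FF). */
    static boolean contains(int[] table, int c) {
        return (table[c & 0x3f] & (1 << (c >> 6))) != 0;
    }

    public static void main(String[] args) {
        int[] table = new int[64];  // assumed size: 64 words x 32 bits = 0x800 bits
        setBit(table, 0x3B1);       // GREEK SMALL LETTER ALPHA
        System.out.println(contains(table, 0x3B1)); // true
        System.out.println(contains(table, 0x3B2)); // false
    }
}
```

        Consecutive code points land in consecutive words at the same bit position, which is what "vertical" means here.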
      • bmpBlockBits

        private int[] bmpBlockBits
        One bit per 64 BMP code points. The bits are organized vertically; consecutive 64-code point blocks correspond to the same bit position in consecutive table words. With code point parts lead=c{15..12} and t1=c{11..6}, test bits (lead+16) and lead in bmpBlockBits[t1]. If the upper bit is 0, then the lower bit indicates whether contains(c) is true for all code points in the 64-block. If the upper bit is 1, then the block is mixed and set.contains(c) must be called. Bits for 0..7FF are unused (0).
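        The two-bit test can be sketched as follows. The class, constants, and marker helpers are hypothetical; in particular, marking a mixed block by setting both bits is an assumption, since the description only requires the upper bit to be 1.

```java
class BmpBlockBitsSketch {
    static final int NONE = 0;   // no code point of the 64-block is in the set
    static final int ALL = 1;    // every code point of the 64-block is in the set
    static final int MIXED = 2;  // the block must be resolved by a slower lookup

    /** Classifies the 64-block that holds BMP code point c (U+0800..U+FFFF). */
    static int classify(int[] bmpBlockBits, int c) {
        int lead = c >> 12;        // c{15..12}: bit position
        int t1 = (c >> 6) & 0x3f;  // c{11..6}: table word
        // Keep bit `lead` (lower) and bit `lead+16` (upper) of the word.
        int twoBits = (bmpBlockBits[t1] >> lead) & 0x10001;
        return twoBits <= 1 ? twoBits : MIXED;
    }

    /** Marks the block of c as entirely contained. */
    static void markAll(int[] bmpBlockBits, int c) {
        bmpBlockBits[(c >> 6) & 0x3f] |= 1 << (c >> 12);
    }

    /** Marks the block of c as mixed (assumption: both bits are set). */
    static void markMixed(int[] bmpBlockBits, int c) {
        bmpBlockBits[(c >> 6) & 0x3f] |= 0x10001 << (c >> 12);
    }

    public static void main(String[] args) {
        int[] bits = new int[64];
        markAll(bits, 0x4E00);
        markMixed(bits, 0x3040);
        System.out.println(classify(bits, 0x4E3F)); // 1 (same 64-block as U+4E00)
        System.out.println(classify(bits, 0x4E40)); // 0 (next block, untouched)
        System.out.println(classify(bits, 0x3041)); // 2 (mixed)
    }
}
```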
      • list4kStarts

        private int[] list4kStarts
        Inversion list indexes for restricted binary searches in findCodePoint(), from findCodePoint(U+0800, U+1000, U+2000, .., U+F000, U+10000). U+0800 is the first 3-byte-UTF-8 code point. Code points below U+0800 are always looked up in the bit tables. The last pair of indexes is for finding supplementary code points.
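        One way such an index table could be filled is sketched below. The 18-entry size, the final entry listLength-1, and the linear-scan helper are assumptions for illustration; the sample inversion list is made up.

```java
class List4kStartsSketch {
    /** Smallest i in lo..hi with c < list[i]; a linear scan, for clarity only. */
    static int findCodePoint(int[] list, int c, int lo, int hi) {
        int i = lo;
        while (i < hi && c >= list[i]) {
            ++i;
        }
        return i;
    }

    /** Computes start indexes for U+0800, U+1000, U+2000, .., U+F000, U+10000. */
    static int[] buildStarts(int[] list, int listLength) {
        int[] starts = new int[18];  // assumed size: 17 boundaries plus an end index
        starts[0] = findCodePoint(list, 0x800, 0, listLength - 1);
        for (int i = 1; i <= 0x10; ++i) {
            // Each search resumes where the previous boundary was found.
            starts[i] = findCodePoint(list, i << 12, starts[i - 1], listLength - 1);
        }
        starts[0x11] = listLength - 1;  // closes the pair used for supplementary code points
        return starts;
    }

    public static void main(String[] args) {
        // Made-up set: [A-Z], U+3040..U+309F, U+20000..U+2A6DF, then the terminator.
        int[] list = {0x41, 0x5B, 0x3040, 0x30A0, 0x20000, 0x2A6E0, 0x110000};
        int[] starts = buildStarts(list, list.length);
        // A search for U+3xxx can then be limited to starts[3]..starts[4].
        System.out.println(starts[3] + ".." + starts[4]); // 2..4
    }
}
```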
      • list

        private final int[] list
        The inversion list of the parent set, for the slower contains() implementation for mixed BMP blocks and for supplementary code points. The list is terminated with list[listLength-1]=0x110000.
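        For readers unfamiliar with inversion lists: the array holds the sorted boundaries at which membership flips, so a code point is contained when the index of the first boundary greater than it is odd. A small sketch with a made-up list and hypothetical names:

```java
class InversionListSketch {
    /** Returns true if c lies in one of the ranges encoded by the inversion list. */
    static boolean contains(int[] list, int listLength, int c) {
        int i = 0;
        // Find the first boundary greater than c; the terminator 0x110000
        // at list[listLength-1] stops the scan for every valid code point.
        while (i < listLength - 1 && c >= list[i]) {
            ++i;
        }
        return (i & 1) != 0;  // odd index: c is inside a range
    }

    public static void main(String[] args) {
        int[] list = {0x41, 0x5B, 0x110000};  // the set [A-Z]
        System.out.println(contains(list, list.length, 'A'));  // true
        System.out.println(contains(list, list.length, 'Z'));  // true
        System.out.println(contains(list, list.length, '['));  // false
    }
}
```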
      • listLength

        private final int listLength
    • Constructor Detail

      • BMPSet

        public BMPSet​(int[] parentList,
                      int parentListLength)
      • BMPSet

        public BMPSet​(BMPSet otherBMPSet,
                      int[] newParentList,
                      int newParentListLength)
    • Method Detail

      • contains

        public boolean contains​(int c)
      • span

        public final int span​(java.lang.CharSequence s,
                              int start,
                              UnicodeSet.SpanCondition spanCondition,
                              OutputInt outCount)
        Span the initial substring for which each character c has spanCondition==contains(c). spanCondition must be 0 or 1.
        Parameters:
        start - The start index
        outCount - If not null: Receives the number of code points in the span.
        Returns:
        the limit (exclusive end) of the span.
        NOTE: To reduce the overhead of calling contains(c), it is manually inlined here. The code checks that a trail unit is available for each surrogate pair and handles single surrogates as surrogate code points, as usual in ICU.
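        The contract can be illustrated without the inlined tables. In this sketch a boolean stands in for the two allowed span conditions, an IntPredicate stands in for contains(c), and a one-element int array stands in for OutputInt; all of these are assumptions, not the BMPSet signature.

```java
import java.util.function.IntPredicate;

class SpanSketch {
    /**
     * Returns the limit (exclusive end) of the span starting at `start` in which
     * contains.test(c) == spanContained for every code point c.
     * If outCount is not null, outCount[0] receives the number of code points spanned.
     */
    static int span(CharSequence s, int start, boolean spanContained,
                    IntPredicate contains, int[] outCount) {
        int i = start;
        int count = 0;
        while (i < s.length()) {
            // codePointAt combines a valid surrogate pair and returns an
            // unpaired surrogate as its own code point.
            int c = Character.codePointAt(s, i);
            if (contains.test(c) != spanContained) {
                break;
            }
            i += Character.charCount(c);
            ++count;
        }
        if (outCount != null) {
            outCount[0] = count;
        }
        return i;
    }

    public static void main(String[] args) {
        IntPredicate letters = c -> Character.isLetter(c);
        System.out.println(span("abc123", 0, true, letters, null));   // 3
        System.out.println(span("abc123", 3, false, letters, null));  // 6
    }
}
```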
      • spanBack

        public final int spanBack​(java.lang.CharSequence s,
                                  int limit,
                                  UnicodeSet.SpanCondition spanCondition)
        Symmetrical with span(). Span the trailing substring for which each character c has spanCondition==contains(c). Requires s.length() >= limit and spanCondition==0 or 1.
        Returns:
        The string index which starts the span (i.e. inclusive).
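        A backward counterpart can be sketched the same way. The boolean condition and the IntPredicate are stand-ins chosen for illustration, not the actual parameters.

```java
import java.util.function.IntPredicate;

class SpanBackSketch {
    /**
     * Returns the inclusive start index of the span that ends at `limit` and in
     * which contains.test(c) == spanContained for every code point c.
     */
    static int spanBack(CharSequence s, int limit, boolean spanContained,
                        IntPredicate contains) {
        int i = limit;
        while (i > 0) {
            // codePointBefore combines a valid surrogate pair ending at i-1 and
            // returns an unpaired surrogate as its own code point.
            int c = Character.codePointBefore(s, i);
            if (contains.test(c) != spanContained) {
                break;
            }
            i -= Character.charCount(c);
        }
        return i;
    }

    public static void main(String[] args) {
        IntPredicate digits = c -> Character.isDigit(c);
        System.out.println(spanBack("abc123", 6, true, digits));   // 3
        System.out.println(spanBack("abc123", 6, false, digits));  // 6
    }
}
```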
      • set32x64Bits

        private static void set32x64Bits​(int[] table,
                                         int start,
                                         int limit)
        Set bits in a bit rectangle in "vertical" bit organization. Requires start < limit <= 0x800.
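        The effect on the table can be reproduced with a deliberately naive loop that sets one bit per code point. This is not the method's implementation, which presumably works on whole columns and rectangles of bits; the names are hypothetical and the table is assumed to have 64 words.

```java
class VerticalBitsSketch {
    /** Sets the bit of every code point in start..limit-1 (limit at most 0x800). */
    static void setBitsNaive(int[] table, int start, int limit) {
        for (int c = start; c < limit; ++c) {
            // Word index from the low 6 bits, bit position from the upper 5 bits.
            table[c & 0x3f] |= 1 << (c >> 6);
        }
    }

    public static void main(String[] args) {
        int[] table = new int[64];
        // U+0100..U+017F covers two complete 64-blocks (bit positions 4 and 5),
        // so every table word receives the same two bits: a 2-bit-wide rectangle.
        setBitsNaive(table, 0x100, 0x180);
        System.out.println(Integer.toHexString(table[0]));   // 30
        System.out.println(Integer.toHexString(table[63]));  // 30
    }
}
```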
      • initBits

        private void initBits()
      • findCodePoint

        private int findCodePoint​(int c,
                                  int lo,
                                  int hi)
        Same as UnicodeSet.findCodePoint(int c) except that the binary search is restricted for finding code points in a certain range. To restrict the search to the range start..end, pass in lo=findCodePoint(start) and hi=findCodePoint(end) with 0<=lo<=hi.
        Parameters:
        c - a character in a subrange of MIN_VALUE..MAX_VALUE
        lo - The lowest index to be returned.
        hi - The highest index to be returned.
        Returns:
        the smallest integer i in the range lo..hi, inclusive, such that c < list[i]
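        A restricted binary search meeting this contract could look like the sketch below. It takes the list as a parameter so that it stands alone, and the early exit for c >= list[hi-1] is an optimization assumed here rather than something the description states.

```java
class RestrictedSearchSketch {
    /** Returns the smallest i in lo..hi, inclusive, such that c < list[i]. */
    static int findCodePoint(int[] list, int c, int lo, int hi) {
        if (c < list[lo]) {
            return lo;
        }
        // Assumed shortcut: c at or beyond the last boundary below hi.
        if (lo >= hi || c >= list[hi - 1]) {
            return hi;
        }
        // Invariants from here on: list[lo] <= c < list[hi].
        for (;;) {
            int i = (lo + hi) >>> 1;
            if (i == lo) {
                return hi;  // lo and hi are adjacent
            } else if (c < list[i]) {
                hi = i;
            } else {
                lo = i;
            }
        }
    }

    public static void main(String[] args) {
        // Made-up set: [A-Z], U+3040..U+309F, U+20000..U+2A6DF, then the terminator.
        int[] list = {0x41, 0x5B, 0x3040, 0x30A0, 0x20000, 0x2A6E0, 0x110000};
        // Search for U+3041 only between the indexes that bracket U+3000..U+4000.
        int i = findCodePoint(list, 0x3041, 2, 4);
        System.out.println(i);             // 3
        System.out.println((i & 1) != 0);  // true: an odd index means contained
    }
}
```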
      • containsSlow

        private final boolean containsSlow​(int c,
                                           int lo,
                                           int hi)