Class TrieBuilder

  • Direct Known Subclasses:
    IntTrieBuilder

    public class TrieBuilder
    extends java.lang.Object
    Builder class to manipulate and generate a trie. This is useful for ICU data in primitive types. Provides a compact way to store information that is indexed by Unicode values, such as character properties, types, keyboard values, etc. This is very useful when you have a block of Unicode data that contains significant values while the rest of the Unicode data is unused in the application or when you have a lot of redundance, such as where all 21,000 Han ideographs have the same value. However, lookup is much faster than a hash table. A trie of any primitive data type serves two purposes:
    • Fast access of the indexed values.
    • Smaller memory footprint.
    This is a direct port from the ICU4C version
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static interface  TrieBuilder.DataManipulate
      Character data in com.ibm.impl.Trie have different user-specified format for different purposes.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected static int BMP_INDEX_LENGTH_
      Length of the BMP portion of the index (stage 1) array.
      static int DATA_BLOCK_LENGTH
      Number of data values in a stage 2 (data array) block.
      protected static int DATA_GRANULARITY_
      The alignment size of a stage 2 data block.
      protected static int INDEX_SHIFT_
      Shift size for shifting left the index array values.
      protected int m_dataCapacity_  
      protected int m_dataLength_  
      protected int[] m_index_
      Index values at build-time are 32 bits wide for easier processing.
      protected int m_indexLength_  
      protected boolean m_isCompacted_  
      protected boolean m_isLatin1Linear_  
      protected int[] m_map_
      Map of adjusted indexes, used in utrie_compact().
      protected static int MASK_
      Mask for getting the lower bits from the input index.
      private static int MAX_BUILD_TIME_DATA_LENGTH_
      Maximum length of the build-time data (stage 2) array.
      protected static int MAX_DATA_LENGTH_
      Maximum length of the runtime data (stage 2) array.
      protected static int MAX_INDEX_LENGTH_
      Length of the index (stage 1) array before folding.
      protected static int OPTIONS_DATA_IS_32_BIT_
      If set, then the data (stage 2) array is 32 bits wide.
      protected static int OPTIONS_INDEX_SHIFT_
      Shifting to position the index value in options
      protected static int OPTIONS_LATIN1_IS_LINEAR_
      If set, then Latin-1 data (for U+0000..U+00ff) is stored in the data (stage 2) array as a simple, linear array at data + DATA_BLOCK_LENGTH.
      protected static int SHIFT_
      Shift size for shifting right the input index.
      protected static int SURROGATE_BLOCK_COUNT_
      Number of index (stage 1) entries per lead surrogate.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected static boolean equal_int​(int[] array, int start1, int start2, int length)
      Compare two sections of an array for equality.
      protected static int findSameIndexBlock​(int[] index, int indexLength, int otherBlock)
      Finds the same index block as the otherBlock
      protected void findUnusedBlocks()
      Set a value in the trie index map to indicate which data block is referenced and which one is not.
      boolean isInZeroBlock​(int ch)
      Checks if the character belongs to a zero block in the trie
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • DATA_BLOCK_LENGTH

        public static final int DATA_BLOCK_LENGTH
        Number of data values in a stage 2 (data array) block. 2, 4, 8, .., 0x200
        See Also:
        Constant Field Values
      • m_index_

        protected int[] m_index_
        Index values at build-time are 32 bits wide for easier processing. Bit 31 is set if the data block is used by multiple index values (from setRange()).
      • m_indexLength_

        protected int m_indexLength_
      • m_dataCapacity_

        protected int m_dataCapacity_
      • m_dataLength_

        protected int m_dataLength_
      • m_isLatin1Linear_

        protected boolean m_isLatin1Linear_
      • m_isCompacted_

        protected boolean m_isCompacted_
      • m_map_

        protected int[] m_map_
        Map of adjusted indexes, used in utrie_compact(). Maps from original indexes to new ones.
      • SHIFT_

        protected static final int SHIFT_
        Shift size for shifting right the input index. 1..9
        See Also:
        Constant Field Values
      • MAX_INDEX_LENGTH_

        protected static final int MAX_INDEX_LENGTH_
        Length of the index (stage 1) array before folding. Maximum number of Unicode code points (0x110000) shifted right by SHIFT.
        See Also:
        Constant Field Values
      • BMP_INDEX_LENGTH_

        protected static final int BMP_INDEX_LENGTH_
        Length of the BMP portion of the index (stage 1) array.
        See Also:
        Constant Field Values
      • SURROGATE_BLOCK_COUNT_

        protected static final int SURROGATE_BLOCK_COUNT_
        Number of index (stage 1) entries per lead surrogate. Same as number of indexe entries for 1024 trail surrogates, ==0x400>>UTRIE_SHIFT 10 - SHIFT == Number of bits of a trail surrogate that are used in index table lookups.
        See Also:
        Constant Field Values
      • MASK_

        protected static final int MASK_
        Mask for getting the lower bits from the input index. DATA_BLOCK_LENGTH - 1.
        See Also:
        Constant Field Values
      • INDEX_SHIFT_

        protected static final int INDEX_SHIFT_
        Shift size for shifting left the index array values. Increases possible data size with 16-bit index values at the cost of compactability. This requires blocks of stage 2 data to be aligned by UTRIE_DATA_GRANULARITY. 0..UTRIE_SHIFT
        See Also:
        Constant Field Values
      • MAX_DATA_LENGTH_

        protected static final int MAX_DATA_LENGTH_
        Maximum length of the runtime data (stage 2) array. Limited by 16-bit index values that are left-shifted by INDEX_SHIFT_.
        See Also:
        Constant Field Values
      • OPTIONS_INDEX_SHIFT_

        protected static final int OPTIONS_INDEX_SHIFT_
        Shifting to position the index value in options
        See Also:
        Constant Field Values
      • OPTIONS_DATA_IS_32_BIT_

        protected static final int OPTIONS_DATA_IS_32_BIT_
        If set, then the data (stage 2) array is 32 bits wide.
        See Also:
        Constant Field Values
      • OPTIONS_LATIN1_IS_LINEAR_

        protected static final int OPTIONS_LATIN1_IS_LINEAR_
        If set, then Latin-1 data (for U+0000..U+00ff) is stored in the data (stage 2) array as a simple, linear array at data + DATA_BLOCK_LENGTH.
        See Also:
        Constant Field Values
      • DATA_GRANULARITY_

        protected static final int DATA_GRANULARITY_
        The alignment size of a stage 2 data block. Also the granularity for compaction.
        See Also:
        Constant Field Values
      • MAX_BUILD_TIME_DATA_LENGTH_

        private static final int MAX_BUILD_TIME_DATA_LENGTH_
        Maximum length of the build-time data (stage 2) array. The maximum length is 0x110000 + DATA_BLOCK_LENGTH + 0x400. (Number of Unicode code points + one all-initial-value block + possible duplicate entries for 1024 lead surrogates.)
        See Also:
        Constant Field Values
    • Constructor Detail

      • TrieBuilder

        protected TrieBuilder()
      • TrieBuilder

        protected TrieBuilder​(TrieBuilder table)
    • Method Detail

      • isInZeroBlock

        public boolean isInZeroBlock​(int ch)
        Checks if the character belongs to a zero block in the trie
        Parameters:
        ch - codepoint which data is to be retrieved
        Returns:
        true if ch is in the zero block
      • equal_int

        protected static final boolean equal_int​(int[] array,
                                                 int start1,
                                                 int start2,
                                                 int length)
        Compare two sections of an array for equality.
      • findUnusedBlocks

        protected void findUnusedBlocks()
        Set a value in the trie index map to indicate which data block is referenced and which one is not. utrie_compact() will remove data blocks that are not used at all. Set - 0 if it is used - -1 if it is not used
      • findSameIndexBlock

        protected static final int findSameIndexBlock​(int[] index,
                                                      int indexLength,
                                                      int otherBlock)
        Finds the same index block as the otherBlock
        Parameters:
        index - array
        indexLength - size of index
        otherBlock -
        Returns:
        same index block