Class TrieIterator

  • All Implemented Interfaces:
    RangeValueIterator

    public class TrieIterator
    extends java.lang.Object
    implements RangeValueIterator

    Class enabling iteration of the values in a Trie.

    2015-sep-03 TODO: Only used in test code, move there.

    Result of each iteration contains the interval of codepoints that have the same value type and the value type itself.

    The comparison of each codepoint value is done via extract(), which the default implementation is to return the value as it is.

    Method extract() can be overwritten to perform manipulations on codepoint values in order to perform specialized comparison.

    TrieIterator is designed to be a generic iterator for the CharTrie and the IntTrie, hence to accommodate both types of data, the return result will be in terms of int (32 bit) values.

    See com.ibm.icu.text.UCharacterTypeIterator for examples of use.

    Notes for porting utrie_enum from icu4c to icu4j:
    Internally, icu4c's utrie_enum performs all iterations in its body. In Java sense, the caller will have to pass a object with a callback function UTrieEnumRange(const void *context, UChar32 start, UChar32 limit, uint32_t value) into utrie_enum. utrie_enum will then find ranges of codepoints with the same value as determined by UTrieEnumValue(const void *context, uint32_t value). for each range, utrie_enum calls the callback function to perform a task. In this way, icu4c performs the iteration within utrie_enum. To follow the JDK model, icu4j is slightly different from icu4c. Instead of requesting the caller to implement an object for a callback. The caller will have to implement a subclass of TrieIterator, fleshing out the method extract(int) (equivalent to UTrieEnumValue). Independent of icu4j, the caller will have to code his own iteration and flesh out the task (equivalent to UTrieEnumRange) to be performed in the iteration loop.

    There are basically 3 usage scenarios for porting:

    1) UTrieEnumValue is the only implemented callback then just implement a subclass of TrieIterator and override the extract(int) method. The extract(int) method is analogous to UTrieEnumValue callback.

    2) UTrieEnumValue and UTrieEnumRange both are implemented then implement a subclass of TrieIterator, override the extract method and iterate, e.g

    utrie_enum(&normTrie, _enumPropertyStartsValue, _enumPropertyStartsRange, set);
    In Java :

     class TrieIteratorImpl extends TrieIterator{
         public TrieIteratorImpl(Trie data){
             super(data);
         }
         public int extract(int value){
             // port the implementation of _enumPropertyStartsValue here
         }
     }
     ....
     TrieIterator fcdIter  = new TrieIteratorImpl(fcdTrieImpl.fcdTrie);
     while(fcdIter.next(result)) {
         // port the implementation of _enumPropertyStartsRange
     }
     

    3) UTrieEnumRange is the only implemented callback then just implement the while loop, when utrie_enum is called

     // utrie_enum(&fcdTrie, NULL, _enumPropertyStartsRange, set);
     TrieIterator fcdIter  = new TrieIterator(fcdTrieImpl.fcdTrie);
     while(fcdIter.next(result)){
         set.add(result.start);
     }
     

    Since:
    release 2.1, Jan 17 2002
    See Also:
    Trie
    • Field Detail

      • BMP_INDEX_LENGTH_

        private static final int BMP_INDEX_LENGTH_
        Size of the stage 1 BMP indexes
        See Also:
        Constant Field Values
      • LEAD_SURROGATE_MIN_VALUE_

        private static final int LEAD_SURROGATE_MIN_VALUE_
        Lead surrogate minimum value
        See Also:
        Constant Field Values
      • TRAIL_SURROGATE_MIN_VALUE_

        private static final int TRAIL_SURROGATE_MIN_VALUE_
        Trail surrogate minimum value
        See Also:
        Constant Field Values
      • TRAIL_SURROGATE_COUNT_

        private static final int TRAIL_SURROGATE_COUNT_
        Number of trail surrogate
        See Also:
        Constant Field Values
      • TRAIL_SURROGATE_INDEX_BLOCK_LENGTH_

        private static final int TRAIL_SURROGATE_INDEX_BLOCK_LENGTH_
        Number of stage 1 indexes for supplementary calculations that maps to each lead surrogate character. See second pass into getRawOffset for the trail surrogate character. 10 for significant number of bits for trail surrogates, 5 for what we discard during shifting.
        See Also:
        Constant Field Values
      • DATA_BLOCK_LENGTH_

        private static final int DATA_BLOCK_LENGTH_
        Number of data values in a stage 2 (data array) block.
        See Also:
        Constant Field Values
      • m_trie_

        private Trie m_trie_
        Trie instance
      • m_initialValue_

        private int m_initialValue_
        Initial value for trie values
      • m_currentCodepoint_

        private int m_currentCodepoint_
        Next element results and data.
      • m_nextCodepoint_

        private int m_nextCodepoint_
      • m_nextValue_

        private int m_nextValue_
      • m_nextIndex_

        private int m_nextIndex_
      • m_nextBlock_

        private int m_nextBlock_
      • m_nextBlockIndex_

        private int m_nextBlockIndex_
      • m_nextTrailIndexOffset_

        private int m_nextTrailIndexOffset_
    • Constructor Detail

      • TrieIterator

        public TrieIterator​(Trie trie)
        TrieEnumeration constructor
        Parameters:
        trie - to be used
        Throws:
        java.lang.IllegalArgumentException - throw when argument is null.
    • Method Detail

      • next

        public final boolean next​(RangeValueIterator.Element element)

        Returns true if we are not at the end of the iteration, false otherwise.

        The next set of codepoints with the same value type will be calculated during this call and returned in the argument element.

        Specified by:
        next in interface RangeValueIterator
        Parameters:
        element - return result
        Returns:
        true if we are not at the end of the iteration, false otherwise.
        Throws:
        java.util.NoSuchElementException - - if no more elements exist.
        See Also:
        RangeValueIterator.Element
      • reset

        public final void reset()
        Resets the iterator to the beginning of the iteration
        Specified by:
        reset in interface RangeValueIterator
      • extract

        protected int extract​(int value)
        Called by next() to extracts a 32 bit value from a trie value used for comparison. This method is to be overwritten if special manipulation is to be done to retrieve a relevant comparison. The default function is to return the value as it is.
        Parameters:
        value - a value from the trie
        Returns:
        extracted value
      • setResult

        private final void setResult​(RangeValueIterator.Element element,
                                     int start,
                                     int limit,
                                     int value)
        Set the result values
        Parameters:
        element - return result object
        start - codepoint of range
        limit - (end + 1) codepoint of range
        value - common value of range
      • calculateNextBMPElement

        private final boolean calculateNextBMPElement​(RangeValueIterator.Element element)
        Finding the next element. This method is called just before returning the result of next(). We always store the next element before it is requested. In the case that we have to continue calculations into the supplementary planes, a false will be returned.
        Parameters:
        element - return result object
        Returns:
        true if the next range is found, false if we have to proceed to the supplementary range.
      • calculateNextSupplementaryElement

        private final void calculateNextSupplementaryElement​(RangeValueIterator.Element element)
        Finds the next supplementary element. For each entry in the trie, the value to be delivered is passed through extract(). We always store the next element before it is requested. Called after calculateNextBMP() completes its round of BMP characters. There is a slight difference in the usage of m_currentCodepoint_ here as compared to calculateNextBMP(). Though both represents the lower bound of the next element, in calculateNextBMP() it gets set at the start of any loop, where-else, in calculateNextSupplementary() since m_currentCodepoint_ already contains the lower bound of the next element (passed down from calculateNextBMP()), we keep it till the end before resetting it to the new value. Note, if there are no more iterations, it will never get to here. Blocked out by next().
        Parameters:
        element - return result object
      • checkBlockDetail

        private final boolean checkBlockDetail​(int currentValue)
        Internal block value calculations Performs calculations on a data block to find codepoints in m_nextBlock_ after the index m_nextBlockIndex_ that has the same value. Note m_*_ variables at this point is the next codepoint whose value has not been calculated. But when returned with false, it will be the last codepoint whose value has been calculated.
        Parameters:
        currentValue - the value which other codepoints are tested against
        Returns:
        true if the whole block has the same value as currentValue or if the whole block has been calculated, false otherwise.
      • checkBlock

        private final boolean checkBlock​(int currentValue)
        Internal block value calculations Performs calculations on a data block to find codepoints in m_nextBlock_ that has the same value. Will call checkBlockDetail() if highlevel check fails. Note m_*_ variables at this point is the next codepoint whose value has not been calculated.
        Parameters:
        currentBlock - the initial block containing all currentValue
        currentValue - the value which other codepoints are tested against
        Returns:
        true if the whole block has the same value as currentValue or if the whole block has been calculated, false otherwise.
      • checkTrailBlock

        private final boolean checkTrailBlock​(int currentValue)
        Internal block value calculations Performs calculations on multiple data blocks for a set of trail surrogates to find codepoints in m_nextBlock_ that has the same value. Will call checkBlock() for internal block checks. Note m_*_ variables at this point is the next codepoint whose value has not been calculated.
        Parameters:
        currentValue - the value which other codepoints are tested against
        Returns:
        true if the whole block has the same value as currentValue or if the whole block has been calculated, false otherwise.
      • checkNullNextTrailIndex

        private final boolean checkNullNextTrailIndex()
        Checks if we are beginning at the start of a initial block. If we are then the rest of the codepoints in this initial block has the same values. We increment m_nextCodepoint_ and relevant data members if so. This is used only in for the supplementary codepoints because the offset to the trail indexes could be 0.
        Returns:
        true if we are at the start of a initial block.