Class TrieIterator
- java.lang.Object
-
- com.ibm.icu.impl.TrieIterator
-
- All Implemented Interfaces:
RangeValueIterator
public class TrieIterator extends java.lang.Object implements RangeValueIterator
Class enabling iteration of the values in a Trie.
2015-sep-03 TODO: Only used in test code, move there.
Result of each iteration contains the interval of codepoints that have the same value type and the value type itself.
The comparison of each codepoint value is done via extract(), which the default implementation is to return the value as it is.
Method extract() can be overwritten to perform manipulations on codepoint values in order to perform specialized comparison.
TrieIterator is designed to be a generic iterator for the CharTrie and the IntTrie, hence to accommodate both types of data, the return result will be in terms of int (32 bit) values.
See com.ibm.icu.text.UCharacterTypeIterator for examples of use.
Notes for porting utrie_enum from icu4c to icu4j:
Internally, icu4c's utrie_enum performs all iterations in its body. In Java sense, the caller will have to pass a object with a callback function UTrieEnumRange(const void *context, UChar32 start, UChar32 limit, uint32_t value) into utrie_enum. utrie_enum will then find ranges of codepoints with the same value as determined by UTrieEnumValue(const void *context, uint32_t value). for each range, utrie_enum calls the callback function to perform a task. In this way, icu4c performs the iteration within utrie_enum. To follow the JDK model, icu4j is slightly different from icu4c. Instead of requesting the caller to implement an object for a callback. The caller will have to implement a subclass of TrieIterator, fleshing out the method extract(int) (equivalent to UTrieEnumValue). Independent of icu4j, the caller will have to code his own iteration and flesh out the task (equivalent to UTrieEnumRange) to be performed in the iteration loop.There are basically 3 usage scenarios for porting:
1) UTrieEnumValue is the only implemented callback then just implement a subclass of TrieIterator and override the extract(int) method. The extract(int) method is analogous to UTrieEnumValue callback.
2) UTrieEnumValue and UTrieEnumRange both are implemented then implement a subclass of TrieIterator, override the extract method and iterate, e.g
utrie_enum(&normTrie, _enumPropertyStartsValue, _enumPropertyStartsRange, set);
In Java :
class TrieIteratorImpl extends TrieIterator{ public TrieIteratorImpl(Trie data){ super(data); } public int extract(int value){ // port the implementation of _enumPropertyStartsValue here } } .... TrieIterator fcdIter = new TrieIteratorImpl(fcdTrieImpl.fcdTrie); while(fcdIter.next(result)) { // port the implementation of _enumPropertyStartsRange }
3) UTrieEnumRange is the only implemented callback then just implement the while loop, when utrie_enum is called
// utrie_enum(&fcdTrie, NULL, _enumPropertyStartsRange, set); TrieIterator fcdIter = new TrieIterator(fcdTrieImpl.fcdTrie); while(fcdIter.next(result)){ set.add(result.start); }
- Since:
- release 2.1, Jan 17 2002
- See Also:
Trie
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface com.ibm.icu.util.RangeValueIterator
RangeValueIterator.Element
-
-
Field Summary
Fields Modifier and Type Field Description private static int
BMP_INDEX_LENGTH_
Size of the stage 1 BMP indexesprivate static int
DATA_BLOCK_LENGTH_
Number of data values in a stage 2 (data array) block.private static int
LEAD_SURROGATE_MIN_VALUE_
Lead surrogate minimum valueprivate int
m_currentCodepoint_
Next element results and data.private int
m_initialValue_
Initial value for trie valuesprivate int
m_nextBlock_
private int
m_nextBlockIndex_
private int
m_nextCodepoint_
private int
m_nextIndex_
private int
m_nextTrailIndexOffset_
private int
m_nextValue_
private Trie
m_trie_
Trie instanceprivate static int
TRAIL_SURROGATE_COUNT_
Number of trail surrogateprivate static int
TRAIL_SURROGATE_INDEX_BLOCK_LENGTH_
Number of stage 1 indexes for supplementary calculations that maps to each lead surrogate character.private static int
TRAIL_SURROGATE_MIN_VALUE_
Trail surrogate minimum value
-
Constructor Summary
Constructors Constructor Description TrieIterator(Trie trie)
TrieEnumeration constructor
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description private boolean
calculateNextBMPElement(RangeValueIterator.Element element)
Finding the next element.private void
calculateNextSupplementaryElement(RangeValueIterator.Element element)
Finds the next supplementary element.private boolean
checkBlock(int currentValue)
Internal block value calculations Performs calculations on a data block to find codepoints in m_nextBlock_ that has the same value.private boolean
checkBlockDetail(int currentValue)
Internal block value calculations Performs calculations on a data block to find codepoints in m_nextBlock_ after the index m_nextBlockIndex_ that has the same value.private boolean
checkNullNextTrailIndex()
Checks if we are beginning at the start of a initial block.private boolean
checkTrailBlock(int currentValue)
Internal block value calculations Performs calculations on multiple data blocks for a set of trail surrogates to find codepoints in m_nextBlock_ that has the same value.protected int
extract(int value)
Called by next() to extracts a 32 bit value from a trie value used for comparison.boolean
next(RangeValueIterator.Element element)
Returns true if we are not at the end of the iteration, false otherwise.void
reset()
Resets the iterator to the beginning of the iterationprivate void
setResult(RangeValueIterator.Element element, int start, int limit, int value)
Set the result values
-
-
-
Field Detail
-
BMP_INDEX_LENGTH_
private static final int BMP_INDEX_LENGTH_
Size of the stage 1 BMP indexes- See Also:
- Constant Field Values
-
LEAD_SURROGATE_MIN_VALUE_
private static final int LEAD_SURROGATE_MIN_VALUE_
Lead surrogate minimum value- See Also:
- Constant Field Values
-
TRAIL_SURROGATE_MIN_VALUE_
private static final int TRAIL_SURROGATE_MIN_VALUE_
Trail surrogate minimum value- See Also:
- Constant Field Values
-
TRAIL_SURROGATE_COUNT_
private static final int TRAIL_SURROGATE_COUNT_
Number of trail surrogate- See Also:
- Constant Field Values
-
TRAIL_SURROGATE_INDEX_BLOCK_LENGTH_
private static final int TRAIL_SURROGATE_INDEX_BLOCK_LENGTH_
Number of stage 1 indexes for supplementary calculations that maps to each lead surrogate character. See second pass into getRawOffset for the trail surrogate character. 10 for significant number of bits for trail surrogates, 5 for what we discard during shifting.- See Also:
- Constant Field Values
-
DATA_BLOCK_LENGTH_
private static final int DATA_BLOCK_LENGTH_
Number of data values in a stage 2 (data array) block.- See Also:
- Constant Field Values
-
m_trie_
private Trie m_trie_
Trie instance
-
m_initialValue_
private int m_initialValue_
Initial value for trie values
-
m_currentCodepoint_
private int m_currentCodepoint_
Next element results and data.
-
m_nextCodepoint_
private int m_nextCodepoint_
-
m_nextValue_
private int m_nextValue_
-
m_nextIndex_
private int m_nextIndex_
-
m_nextBlock_
private int m_nextBlock_
-
m_nextBlockIndex_
private int m_nextBlockIndex_
-
m_nextTrailIndexOffset_
private int m_nextTrailIndexOffset_
-
-
Constructor Detail
-
TrieIterator
public TrieIterator(Trie trie)
TrieEnumeration constructor- Parameters:
trie
- to be used- Throws:
java.lang.IllegalArgumentException
- throw when argument is null.
-
-
Method Detail
-
next
public final boolean next(RangeValueIterator.Element element)
Returns true if we are not at the end of the iteration, false otherwise.
The next set of codepoints with the same value type will be calculated during this call and returned in the argument element.
- Specified by:
next
in interfaceRangeValueIterator
- Parameters:
element
- return result- Returns:
- true if we are not at the end of the iteration, false otherwise.
- Throws:
java.util.NoSuchElementException
- - if no more elements exist.- See Also:
RangeValueIterator.Element
-
reset
public final void reset()
Resets the iterator to the beginning of the iteration- Specified by:
reset
in interfaceRangeValueIterator
-
extract
protected int extract(int value)
Called by next() to extracts a 32 bit value from a trie value used for comparison. This method is to be overwritten if special manipulation is to be done to retrieve a relevant comparison. The default function is to return the value as it is.- Parameters:
value
- a value from the trie- Returns:
- extracted value
-
setResult
private final void setResult(RangeValueIterator.Element element, int start, int limit, int value)
Set the result values- Parameters:
element
- return result objectstart
- codepoint of rangelimit
- (end + 1) codepoint of rangevalue
- common value of range
-
calculateNextBMPElement
private final boolean calculateNextBMPElement(RangeValueIterator.Element element)
Finding the next element. This method is called just before returning the result of next(). We always store the next element before it is requested. In the case that we have to continue calculations into the supplementary planes, a false will be returned.- Parameters:
element
- return result object- Returns:
- true if the next range is found, false if we have to proceed to the supplementary range.
-
calculateNextSupplementaryElement
private final void calculateNextSupplementaryElement(RangeValueIterator.Element element)
Finds the next supplementary element. For each entry in the trie, the value to be delivered is passed through extract(). We always store the next element before it is requested. Called after calculateNextBMP() completes its round of BMP characters. There is a slight difference in the usage of m_currentCodepoint_ here as compared to calculateNextBMP(). Though both represents the lower bound of the next element, in calculateNextBMP() it gets set at the start of any loop, where-else, in calculateNextSupplementary() since m_currentCodepoint_ already contains the lower bound of the next element (passed down from calculateNextBMP()), we keep it till the end before resetting it to the new value. Note, if there are no more iterations, it will never get to here. Blocked out by next().- Parameters:
element
- return result object
-
checkBlockDetail
private final boolean checkBlockDetail(int currentValue)
Internal block value calculations Performs calculations on a data block to find codepoints in m_nextBlock_ after the index m_nextBlockIndex_ that has the same value. Note m_*_ variables at this point is the next codepoint whose value has not been calculated. But when returned with false, it will be the last codepoint whose value has been calculated.- Parameters:
currentValue
- the value which other codepoints are tested against- Returns:
- true if the whole block has the same value as currentValue or if the whole block has been calculated, false otherwise.
-
checkBlock
private final boolean checkBlock(int currentValue)
Internal block value calculations Performs calculations on a data block to find codepoints in m_nextBlock_ that has the same value. Will call checkBlockDetail() if highlevel check fails. Note m_*_ variables at this point is the next codepoint whose value has not been calculated.- Parameters:
currentBlock
- the initial block containing all currentValuecurrentValue
- the value which other codepoints are tested against- Returns:
- true if the whole block has the same value as currentValue or if the whole block has been calculated, false otherwise.
-
checkTrailBlock
private final boolean checkTrailBlock(int currentValue)
Internal block value calculations Performs calculations on multiple data blocks for a set of trail surrogates to find codepoints in m_nextBlock_ that has the same value. Will call checkBlock() for internal block checks. Note m_*_ variables at this point is the next codepoint whose value has not been calculated.- Parameters:
currentValue
- the value which other codepoints are tested against- Returns:
- true if the whole block has the same value as currentValue or if the whole block has been calculated, false otherwise.
-
checkNullNextTrailIndex
private final boolean checkNullNextTrailIndex()
Checks if we are beginning at the start of a initial block. If we are then the rest of the codepoints in this initial block has the same values. We increment m_nextCodepoint_ and relevant data members if so. This is used only in for the supplementary codepoints because the offset to the trail indexes could be 0.- Returns:
- true if we are at the start of a initial block.
-
-