Package com.ibm.icu.impl.coll
Class CollationIterator
java.lang.Object
com.ibm.icu.impl.coll.CollationIterator
Direct Known Subclasses:
CollationDataBuilder.DataBuilderCollationIterator, IterCollationIterator, UTF16CollationIterator
Collation element iterator and abstract character iterator.
When a method returns a code point value, it must be in 0..10FFFF,
except it can be negative as a sentinel value.
-
Nested Class Summary
private static final class CollationIterator.CEBuffer
private static final class CollationIterator.SkippedState
-
Field Summary
private CollationIterator.CEBuffer ceBuffer
private int cesIndex
protected final CollationData data
private boolean isNumeric
protected static final long NO_CP_AND_CE32
private int numCpFwd
private CollationIterator.SkippedState skipped
protected final Trie2_32 trie
-
Constructor Summary
CollationIterator
    Partially constructs the iterator.
CollationIterator(CollationData d, boolean numeric)
-
Method Summary
protected final void appendCEsFromCE32(CollationData d, int c, int ce32, boolean forward)
private final void appendNumericCEs(int ce32, boolean forward)
    Turns a string of digits (bytes 0..9) into a sequence of CEs that will sort in numeric order.
private final void appendNumericSegmentCEs(CharSequence digits)
    Turns 1..254 digits into a sequence of CEs.
protected abstract void backwardNumCodePoints(int num)
private final void backwardNumSkipped(int n)
(package private) final void clearCEs()
final void clearCEsIfNoneRemaining()
boolean equals(Object other)
final int fetchCEs()
    Fetches all CEs.
protected boolean forbidSurrogateCodePoints()
protected abstract void forwardNumCodePoints(int num)
final long getCE(int i)
protected int getCE32FromBuilderData(int ce32)
private final int getCE32FromPrefix(CollationData d, int ce32)
final long[] getCEs()
final int getCEsLength()
protected int getDataCE32(int c)
    Returns the CE32 from the data trie.
abstract int getOffset()
protected char handleGetTrailSurrogate()
    Called when handleNextCE32() returns a LEAD_SURROGATE_TAG for a lead surrogate code unit.
protected long handleNextCE32()
    Returns the next code point and its local CE32 value.
int hashCode()
protected static final boolean isLeadSurrogate(int c)
private static final boolean isSurrogate(int c)
protected static final boolean isTrailSurrogate(int c)
protected long makeCodePointAndCE32Pair(int c, int ce32)
final long nextCE()
    Returns the next collation element.
private final int nextCE32FromContraction(CollationData d, int contractionCE32, CharSequence trieChars, int trieOffset, int ce32, int c)
private final int nextCE32FromDiscontiguousContraction(CollationData d, CharsTrie suffixes, int ce32, int lookAhead, int c)
private final long nextCEFromCE32(CollationData d, int c, int ce32)
abstract int nextCodePoint()
    Returns the next code point (with post-increment).
private final int nextSkippedCodePoint()
final long previousCE(UVector32 offsets)
    Returns the previous collation element.
private final long previousCEUnsafe(int c, UVector32 offsets)
    Returns the previous CE when data.isUnsafeBackward(c, isNumeric).
abstract int previousCodePoint()
    Returns the previous code point (with pre-decrement).
protected final void reset()
protected final void reset(boolean numeric)
    Resets the state as well as the numeric setting, and completes the initialization.
abstract void resetToOffset(int newOffset)
    Resets the iterator state and sets the position to the specified offset.
(package private) final void setCurrentCE(long ce)
    Overwrites the current CE (the last one returned by nextCE()).
-
Field Details
-
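NO_CP_AND_CE32
protected static final long NO_CP_AND_CE32
-
trie
protected final Trie2_32 trie
-
data
protected final CollationData data
-
ceBuffer
private CollationIterator.CEBuffer ceBuffer
-
cesIndex
private int cesIndex
-
skipped
private CollationIterator.SkippedState skipped
-
numCpFwd
private int numCpFwd
-
isNumeric
private boolean isNumeric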
-
-
Constructor Details
-
CollationIterator
Partially constructs the iterator. In Java, we cache partially constructed iterators and finish their setup when starting to work on text (via reset(boolean) and the setText(numeric, ...) methods of subclasses). This avoids memory allocations for iterators that remain unused.
In C++, there is only one constructor, and iterators are stack-allocated as needed.
-
CollationIterator
-
-
Method Details
-
equals
-
hashCode
public int hashCode()
-
resetToOffset
public abstract void resetToOffset(int newOffset)
Resets the iterator state and sets the position to the specified offset. Subclasses must implement, and must call the parent class method, or CollationIterator.reset().
-
getOffset
public abstract int getOffset()
-
nextCE
public final long nextCE()
Returns the next collation element.
-
fetchCEs
public final int fetchCEs()
Fetches all CEs.
Returns: getCEsLength()
-
setCurrentCE
final void setCurrentCE(long ce)
Overwrites the current CE (the last one returned by nextCE()).
-
previousCE
public final long previousCE(UVector32 offsets)
Returns the previous collation element.
-
getCEsLength
public final int getCEsLength()
-
getCE
public final long getCE(int i)
-
getCEs
public final long[] getCEs()
-
clearCEs
final void clearCEs()
-
clearCEsIfNoneRemaining
public final void clearCEsIfNoneRemaining()
-
nextCodePoint
public abstract int nextCodePoint()
Returns the next code point (with post-increment). Public for identical-level comparison and for testing.
-
previousCodePoint
public abstract int previousCodePoint()
Returns the previous code point (with pre-decrement). Public for identical-level comparison and for testing.
-
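The post-increment and pre-decrement contracts of nextCodePoint() and previousCodePoint(), together with the negative sentinel mentioned in the class description, can be illustrated over a plain CharSequence. This is a minimal sketch, not the ICU implementation: the class CodePointCursor and the choice of -1 as the sentinel are assumptions made for illustration.

```java
/** Hypothetical stand-in for a text iterator; not an ICU class. */
final class CodePointCursor {
    private final CharSequence text;
    private int pos; // UTF-16 code unit offset into text

    CodePointCursor(CharSequence text) {
        this.text = text;
    }

    /** Returns the code point at the current offset, then advances past it; -1 at the end. */
    int nextCodePoint() {
        if (pos == text.length()) {
            return -1; // assumed sentinel: any negative value signals "no code point"
        }
        int c = Character.codePointAt(text, pos);
        pos += Character.charCount(c);
        return c;
    }

    /** Moves back over one code point, then returns it; -1 at the start. */
    int previousCodePoint() {
        if (pos == 0) {
            return -1;
        }
        int c = Character.codePointBefore(text, pos);
        pos -= Character.charCount(c);
        return c;
    }

    int getOffset() {
        return pos;
    }
}
```

A caller can loop with `while ((c = cursor.nextCodePoint()) >= 0)` and never needs a separate hasNext() check, which is the practical benefit of the sentinel.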
reset
protected final void reset()
-
reset
protected final void reset(boolean numeric)
Resets the state as well as the numeric setting, and completes the initialization. Only exists in Java where we reset cached CollationIterator instances rather than stack-allocating temporary ones. (See also the constructor comments.)
-
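The cache-then-reset pattern described for reset(boolean) and in the constructor comments might look roughly like the following. This is a sketch under stated assumptions: ReusableCursor, its fields, and the buffer capacity are hypothetical and are not taken from the ICU source.

```java
/** Hypothetical iterator with two-phase setup; the names are illustrative, not ICU API. */
final class ReusableCursor {
    private long[] ceBuffer;         // allocated lazily, so unused instances stay cheap
    private CharSequence text = "";
    private boolean numeric;
    private int pos;

    /** Phase 1: partial construction, no buffer allocation. */
    ReusableCursor() {}

    /** Phase 2: completes initialization and clears state left over from earlier use. */
    void reset(boolean numeric) {
        if (ceBuffer == null) {
            ceBuffer = new long[40]; // arbitrary capacity chosen for this sketch
        }
        this.numeric = numeric;
        pos = 0;
    }

    /** Subclass-style entry point: reset first, then attach the new text. */
    void setText(boolean numeric, CharSequence s) {
        reset(numeric);
        text = s;
    }

    boolean isInitialized() { return ceBuffer != null; }
    boolean isNumeric() { return numeric; }
    int getOffset() { return pos; }
    CharSequence getText() { return text; }
}
```

The same instance can then be reused for many strings by calling setText() again, which trades a small amount of retained memory for fewer allocations.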
handleNextCE32
protected long handleNextCE32()
Returns the next code point and its local CE32 value. Returns Collation.FALLBACK_CE32 at the end of the text (c<0) or when c's CE32 value is to be looked up in the base data (fallback). The code point is used for fallbacks, context and implicit weights. It is ignored when the returned CE32 is not special (e.g., FFFD_CE32). Returns the code point in bits 63..32 (signed) and the CE32 in bits 31..0.
-
makeCodePointAndCE32Pair
protected long makeCodePointAndCE32Pair(int c, int ce32)
-
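The bit layout stated for handleNextCE32() (code point in bits 63..32, signed; CE32 in bits 31..0) can be packed and unpacked as shown below. This is a sketch of that layout only. The class CodePointCE32Pair and its method names are hypothetical, and the body of makeCodePointAndCE32Pair() itself is not shown on this page.

```java
/** Hypothetical helper illustrating the documented 64-bit layout; not ICU code. */
final class CodePointCE32Pair {
    private CodePointCE32Pair() {}

    /** Puts c into bits 63..32 and ce32 into bits 31..0. */
    static long pack(int c, int ce32) {
        // Mask the CE32 so that a negative int does not sign-extend into the upper half.
        return ((long) c << 32) | (ce32 & 0xffffffffL);
    }

    /** Extracts the code point; the arithmetic shift preserves a negative sentinel. */
    static int codePoint(long pair) {
        return (int) (pair >> 32);
    }

    /** Extracts the CE32 from the low 32 bits. */
    static int ce32(long pair) {
        return (int) pair;
    }
}
```

The mask in pack() matters because many CE32 values have the top bit set; without it, the sign extension of ce32 would overwrite the code point.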
handleGetTrailSurrogate
protected char handleGetTrailSurrogate()
Called when handleNextCE32() returns a LEAD_SURROGATE_TAG for a lead surrogate code unit. Returns the trail surrogate in that case and advances past it, if a trail surrogate follows the lead surrogate. Otherwise returns any other code unit and does not advance.
-
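The "advance only if a trail surrogate follows" behavior of handleGetTrailSurrogate() can be sketched over a CharSequence as follows. TrailSurrogateReader is a hypothetical class, and returning 0 at the end of the text is an assumption made for this sketch, since the description above does not say what happens there.

```java
/** Hypothetical reader positioned just after a lead surrogate; not an ICU class. */
final class TrailSurrogateReader {
    private final CharSequence text;
    private int pos;

    TrailSurrogateReader(CharSequence text, int pos) {
        this.text = text;
        this.pos = pos;
    }

    /** Returns the next code unit, advancing past it only when it is a trail surrogate. */
    char getTrailSurrogate() {
        if (pos == text.length()) {
            return 0; // assumption for this sketch: no code unit available
        }
        char next = text.charAt(pos);
        if (Character.isLowSurrogate(next)) {
            ++pos; // consume the trail surrogate
        }
        return next;
    }

    int position() {
        return pos;
    }
}
```

The caller can test the returned unit with Character.isLowSurrogate() and, if it is one, combine it with the lead surrogate using Character.toCodePoint(lead, trail).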
forbidSurrogateCodePoints
protected boolean forbidSurrogateCodePoints()
Returns: false if surrogate code points U+D800..U+DFFF map to their own implicit primary weights (for UTF-16), or true if they map to CE(U+FFFD) (for UTF-8)
-
forwardNumCodePoints
protected abstract void forwardNumCodePoints(int num)
-
backwardNumCodePoints
protected abstract void backwardNumCodePoints(int num)
-
getDataCE32
protected int getDataCE32(int c)
Returns the CE32 from the data trie. Normally the same as data.getCE32(), but overridden in the builder. Call this only when the faster data.getCE32() cannot be used.
-
getCE32FromBuilderData
protected int getCE32FromBuilderData(int ce32)
-
appendCEsFromCE32
protected final void appendCEsFromCE32(CollationData d, int c, int ce32, boolean forward)
-
isSurrogate
private static final boolean isSurrogate(int c)
-
isLeadSurrogate
protected static final boolean isLeadSurrogate(int c)
-
isTrailSurrogate
protected static final boolean isTrailSurrogate(int c)
-
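The three surrogate predicates take an int, so they also have to behave sensibly for supplementary code points and for negative sentinel values. One common way to write such range checks is with bit masks, sketched below. The method bodies are not shown on this page, so the class SurrogateChecks is an illustrative assumption rather than the actual implementation.

```java
/** Hypothetical mask-based range checks for U+D800..U+DFFF; not copied from ICU. */
final class SurrogateChecks {
    private SurrogateChecks() {}

    /** True for U+D800..U+DFFF: all bits above the low 11 must equal 0xD800. */
    static boolean isSurrogate(int c) {
        return (c & 0xfffff800) == 0xd800;
    }

    /** True for lead (high) surrogates U+D800..U+DBFF. */
    static boolean isLeadSurrogate(int c) {
        return (c & 0xfffffc00) == 0xd800;
    }

    /** True for trail (low) surrogates U+DC00..U+DFFF. */
    static boolean isTrailSurrogate(int c) {
        return (c & 0xfffffc00) == 0xdc00;
    }
}
```

Because the masks cover all upper bits of the int, values such as -1 or U+1D800 are rejected without a separate range test.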
nextCEFromCE32
private final long nextCEFromCE32(CollationData d, int c, int ce32)
-
getCE32FromPrefix
private final int getCE32FromPrefix(CollationData d, int ce32)
-
nextSkippedCodePoint
private final int nextSkippedCodePoint()
-
backwardNumSkipped
private final void backwardNumSkipped(int n)
-
nextCE32FromContraction
private final int nextCE32FromContraction(CollationData d, int contractionCE32, CharSequence trieChars, int trieOffset, int ce32, int c)
-
nextCE32FromDiscontiguousContraction
private final int nextCE32FromDiscontiguousContraction(CollationData d, CharsTrie suffixes, int ce32, int lookAhead, int c)
-
previousCEUnsafe
private final long previousCEUnsafe(int c, UVector32 offsets)
Returns the previous CE when data.isUnsafeBackward(c, isNumeric).
-
appendNumericCEs
private final void appendNumericCEs(int ce32, boolean forward)
Turns a string of digits (bytes 0..9) into a sequence of CEs that will sort in numeric order. Starts from this ce32's digit value and consumes the following/preceding digits. The digits string must not be empty and must not have leading zeros.
-
appendNumericSegmentCEs
private final void appendNumericSegmentCEs(CharSequence digits)
Turns 1..254 digits into a sequence of CEs. Called by appendNumericCEs() for each segment of at most 254 digits.
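The split into segments of at most 254 digits can be illustrated with a simple chunking loop. This sketch covers only the segmentation step and uses an ASCII digit string for readability, whereas the description above speaks of digit values 0..9. NumericSegments and its split() method are hypothetical, and how each segment is then turned into CEs is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical chunker for long digit strings; not the ICU implementation. */
final class NumericSegments {
    /** Maximum segment length, taken from the documented 1..254 range. */
    static final int MAX_SEGMENT_LENGTH = 254;

    private NumericSegments() {}

    /** Splits a non-empty digit string into consecutive segments of at most 254 digits. */
    static List<String> split(String digits) {
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("digits must not be empty");
        }
        List<String> segments = new ArrayList<>();
        for (int start = 0; start < digits.length(); start += MAX_SEGMENT_LENGTH) {
            int end = Math.min(start + MAX_SEGMENT_LENGTH, digits.length());
            segments.add(digits.substring(start, end));
        }
        return segments;
    }
}
```

For example, a run of 600 digits yields three segments of 254, 254 and 92 digits, each of which would be passed to a per-segment routine.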
-