Class MutableCodePointTrie
- java.lang.Object
-
- com.ibm.icu.util.CodePointMap
-
- com.ibm.icu.util.MutableCodePointTrie
-
- All Implemented Interfaces:
java.lang.Cloneable
,java.lang.Iterable<CodePointMap.Range>
public final class MutableCodePointTrie extends CodePointMap implements java.lang.Cloneable
Mutable Unicode code point trie. Fast map from Unicode code points (U+0000..U+10FFFF) to 32-bit integer values. For details see https://icu.unicode.org/design/struct/utrieSetting values (especially ranges) and lookup is fast. The mutable trie is only somewhat space-efficient. It builds a compacted, immutable
CodePointTrie
.This trie can be modified while iterating over its contents. For example, it is possible to merge its values with those from another set of ranges (e.g., another @{link CodePointMap}): Iterate over those source ranges; for each of them iterate over this trie; add the source value into the value of each trie range.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private static class
MutableCodePointTrie.AllSameBlocks
private static class
MutableCodePointTrie.MixedBlocks
-
Nested classes/interfaces inherited from class com.ibm.icu.util.CodePointMap
CodePointMap.Range, CodePointMap.RangeOption, CodePointMap.StringIterator, CodePointMap.ValueFilter
-
-
Field Summary
Fields Modifier and Type Field Description private static byte
ALL_SAME
private static int
ASCII_I_LIMIT
private static int
ASCII_LIMIT
private static int
BMP_I_LIMIT
private static int
BMP_LIMIT
private int[]
data
private int
dataLength
private int
dataNullOffset
private int
errorValue
private byte[]
flags
private int
highStart
private int
highValue
private static int
I_LIMIT
private static byte
I3_16
private static byte
I3_18
private static byte
I3_BMP
private static byte
I3_NULL
private int[]
index
private static int
INDEX_3_18BIT_BLOCK_LENGTH
private char[]
index16
Temporary array while building the final data.private int
index3NullOffset
private static int
INITIAL_DATA_LENGTH
Start with allocation of 16k data entries.private int
initialValue
private static int
MAX_DATA_LENGTH
Maximum length of the build-time data array.private static int
MAX_UNICODE
private static int
MEDIUM_DATA_LENGTH
Grow about 8x each time.private static byte
MIXED
private int
origInitialValue
private static byte
SAME_AS
private static int
SMALL_DATA_BLOCKS_PER_BMP_BLOCK
private static int
UNICODE_LIMIT
-
Constructor Summary
Constructors Constructor Description MutableCodePointTrie(int initialValue, int errorValue)
Constructs a mutable trie that initially maps each Unicode code point to the same value.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private int
allocDataBlock(int blockLength)
private static boolean
allValuesSameAs(int[] p, int pi, int length, int value)
private CodePointTrie
build(CodePointTrie.Type type, CodePointTrie.ValueWidth valueWidth)
CodePointTrie
buildImmutable(CodePointTrie.Type type, CodePointTrie.ValueWidth valueWidth)
Compacts the data and builds an immutableCodePointTrie
according to the parameters.private void
clear()
MutableCodePointTrie
clone()
Clones this mutable trie.private int
compactData(int fastILimit, int[] newData, int dataNullIndex, MutableCodePointTrie.MixedBlocks mixedBlocks)
Compacts a build-time trie.private int
compactIndex(int fastILimit, MutableCodePointTrie.MixedBlocks mixedBlocks)
private int
compactTrie(int fastILimit)
private int
compactWholeDataBlocks(int fastILimit, MutableCodePointTrie.AllSameBlocks allSameBlocks)
private void
ensureHighStart(int c)
private static boolean
equalBlocks(char[] s, int si, char[] t, int ti, int length)
private static boolean
equalBlocks(char[] s, int si, int[] t, int ti, int length)
private static boolean
equalBlocks(int[] s, int si, int[] t, int ti, int length)
private void
fillBlock(int block, int start, int limit, int value)
private static int
findAllSameBlock(int[] p, int start, int limit, int value, int blockLength)
private int
findHighStart()
Finds the start of the last range in the trie by enumerating backward.private static int
findSameBlock(char[] p, int pStart, int length, char[] q, int qStart, int blockLength)
Search for an identical block.static MutableCodePointTrie
fromCodePointMap(CodePointMap map)
Creates a mutable trie with the same contents as theCodePointMap
.int
get(int c)
Returns the value for a code point as stored in the map, with range checking.private static int
getAllSameOverlap(int[] p, int length, int value, int blockLength)
private int
getDataBlock(int i)
No error checking for illegal arguments.private static int
getOverlap(char[] p, int length, char[] q, int qStart, int blockLength)
private static int
getOverlap(char[] p, int length, int[] q, int qStart, int blockLength)
private static int
getOverlap(int[] p, int length, int[] q, int qStart, int blockLength)
Look for maximum overlap of the beginning of the other block with the previous, adjacent block.boolean
getRange(int start, CodePointMap.ValueFilter filter, CodePointMap.Range range)
Sets the range object to a range of code points beginning with the start parameter.private static boolean
isStartOfSomeFastBlock(int dataOffset, int[] index, int fastILimit)
private void
maskValues(int mask)
private static int
maybeFilterValue(int value, int initialValue, int nullValue, CodePointMap.ValueFilter filter)
void
set(int c, int value)
Sets a value for a code point.void
setRange(int start, int end, int value)
Sets a value for each code point [start..end].private void
writeBlock(int block, int value)
-
Methods inherited from class com.ibm.icu.util.CodePointMap
getRange, iterator, stringIterator
-
-
-
-
Field Detail
-
MAX_UNICODE
private static final int MAX_UNICODE
- See Also:
- Constant Field Values
-
UNICODE_LIMIT
private static final int UNICODE_LIMIT
- See Also:
- Constant Field Values
-
BMP_LIMIT
private static final int BMP_LIMIT
- See Also:
- Constant Field Values
-
ASCII_LIMIT
private static final int ASCII_LIMIT
- See Also:
- Constant Field Values
-
I_LIMIT
private static final int I_LIMIT
- See Also:
- Constant Field Values
-
BMP_I_LIMIT
private static final int BMP_I_LIMIT
- See Also:
- Constant Field Values
-
ASCII_I_LIMIT
private static final int ASCII_I_LIMIT
- See Also:
- Constant Field Values
-
SMALL_DATA_BLOCKS_PER_BMP_BLOCK
private static final int SMALL_DATA_BLOCKS_PER_BMP_BLOCK
- See Also:
- Constant Field Values
-
ALL_SAME
private static final byte ALL_SAME
- See Also:
- Constant Field Values
-
MIXED
private static final byte MIXED
- See Also:
- Constant Field Values
-
SAME_AS
private static final byte SAME_AS
- See Also:
- Constant Field Values
-
INITIAL_DATA_LENGTH
private static final int INITIAL_DATA_LENGTH
Start with allocation of 16k data entries.- See Also:
- Constant Field Values
-
MEDIUM_DATA_LENGTH
private static final int MEDIUM_DATA_LENGTH
Grow about 8x each time.- See Also:
- Constant Field Values
-
MAX_DATA_LENGTH
private static final int MAX_DATA_LENGTH
Maximum length of the build-time data array. One entry per 0x110000 code points.- See Also:
- Constant Field Values
-
I3_NULL
private static final byte I3_NULL
- See Also:
- Constant Field Values
-
I3_BMP
private static final byte I3_BMP
- See Also:
- Constant Field Values
-
I3_16
private static final byte I3_16
- See Also:
- Constant Field Values
-
I3_18
private static final byte I3_18
- See Also:
- Constant Field Values
-
INDEX_3_18BIT_BLOCK_LENGTH
private static final int INDEX_3_18BIT_BLOCK_LENGTH
- See Also:
- Constant Field Values
-
index
private int[] index
-
index3NullOffset
private int index3NullOffset
-
data
private int[] data
-
dataLength
private int dataLength
-
dataNullOffset
private int dataNullOffset
-
origInitialValue
private int origInitialValue
-
initialValue
private int initialValue
-
errorValue
private int errorValue
-
highStart
private int highStart
-
highValue
private int highValue
-
index16
private char[] index16
Temporary array while building the final data.
-
flags
private byte[] flags
-
-
Constructor Detail
-
MutableCodePointTrie
public MutableCodePointTrie(int initialValue, int errorValue)
Constructs a mutable trie that initially maps each Unicode code point to the same value. It uses 32-bit data values untilbuildImmutable(com.ibm.icu.util.CodePointTrie.Type, com.ibm.icu.util.CodePointTrie.ValueWidth)
is called. buildImmutable() takes a valueWidth parameter which determines the number of bits in the data value in the resultingCodePointTrie
.- Parameters:
initialValue
- the initial value that is set for all code pointserrorValue
- the value for out-of-range code points and ill-formed UTF-8/16
-
-
Method Detail
-
clone
public MutableCodePointTrie clone()
Clones this mutable trie.- Overrides:
clone
in classjava.lang.Object
- Returns:
- the clone
-
fromCodePointMap
public static MutableCodePointTrie fromCodePointMap(CodePointMap map)
Creates a mutable trie with the same contents as theCodePointMap
.- Parameters:
map
- the source map or trie- Returns:
- the mutable trie
-
clear
private void clear()
-
get
public int get(int c)
Returns the value for a code point as stored in the map, with range checking. Returns an implementation-defined error value if c is not in the range 0..U+10FFFF.- Specified by:
get
in classCodePointMap
- Parameters:
c
- the code point- Returns:
- the map value, or an implementation-defined error value if the code point is not in the range 0..U+10FFFF
-
maybeFilterValue
private static final int maybeFilterValue(int value, int initialValue, int nullValue, CodePointMap.ValueFilter filter)
-
getRange
public boolean getRange(int start, CodePointMap.ValueFilter filter, CodePointMap.Range range)
Sets the range object to a range of code points beginning with the start parameter. The range start is the same as the start input parameter (even if there are preceding code points that have the same value). The range end is the last code point such that all those from start to there have the same value. Returns false if start is not 0..U+10FFFF. Can be used to efficiently iterate over all same-value ranges in a map. (This is normally faster than iterating over code points and get()ting each value, but may be much slower than a data structure that stores ranges directly.)If the
CodePointMap.ValueFilter
parameter is not null, then the value to be delivered is passed through that filter, and the return value is the end of the range where all values are modified to the same actual value. The value is unchanged if that parameter is null.Example:
int start = 0; CodePointMap.Range range = new CodePointMap.Range(); while (map.getRange(start, null, range)) { int end = range.getEnd(); int value = range.getValue(); // Work with the range start..end and its value. start = end + 1; }
The trie can be modified between calls to this function.
- Specified by:
getRange
in classCodePointMap
- Parameters:
start
- range startfilter
- an object that may modify the map data value, or null if the values from the map are to be used unmodifiedrange
- the range object that will be set to the code point range and value- Returns:
- true if start is 0..U+10FFFF; otherwise no new range is fetched
-
writeBlock
private void writeBlock(int block, int value)
-
set
public void set(int c, int value)
Sets a value for a code point.- Parameters:
c
- the code pointvalue
- the value
-
fillBlock
private void fillBlock(int block, int start, int limit, int value)
-
setRange
public void setRange(int start, int end, int value)
Sets a value for each code point [start..end]. Faster and more space-efficient than setting the value for each code point separately.- Parameters:
start
- the first code point to get the valueend
- the last code point to get the value (inclusive)value
- the value
-
buildImmutable
public CodePointTrie buildImmutable(CodePointTrie.Type type, CodePointTrie.ValueWidth valueWidth)
Compacts the data and builds an immutableCodePointTrie
according to the parameters. After this, the mutable trie will be empty.The mutable trie stores 32-bit values until buildImmutable() is called. If values shorter than 32 bits are to be stored in the immutable trie, then the upper bits are discarded. For example, when the mutable trie contains values 0x81, -0x7f, and 0xa581, and the value width is 8 bits, then each of these is stored as 0x81 and the immutable trie will return that as an unsigned value. (Some implementations may want to make productive temporary use of the upper bits until buildImmutable() discards them.)
Not every possible set of mappings can be built into a CodePointTrie, because of limitations resulting from speed and space optimizations. Every Unicode assigned character can be mapped to a unique value. Typical data yields data structures far smaller than the limitations.
It is possible to construct extremely unusual mappings that exceed the data structure limits. In such a case this function will throw an exception.
- Parameters:
type
- selects the trie typevalueWidth
- selects the number of bits in a trie data value; if smaller than 32 bits, then the values stored in the trie will be truncated first- See Also:
fromCodePointMap(CodePointMap)
-
ensureHighStart
private void ensureHighStart(int c)
-
allocDataBlock
private int allocDataBlock(int blockLength)
-
getDataBlock
private int getDataBlock(int i)
No error checking for illegal arguments. The Java version always returns non-negative values.
-
maskValues
private void maskValues(int mask)
-
equalBlocks
private static boolean equalBlocks(int[] s, int si, int[] t, int ti, int length)
-
equalBlocks
private static boolean equalBlocks(char[] s, int si, int[] t, int ti, int length)
-
equalBlocks
private static boolean equalBlocks(char[] s, int si, char[] t, int ti, int length)
-
allValuesSameAs
private static boolean allValuesSameAs(int[] p, int pi, int length, int value)
-
findSameBlock
private static int findSameBlock(char[] p, int pStart, int length, char[] q, int qStart, int blockLength)
Search for an identical block.
-
findAllSameBlock
private static int findAllSameBlock(int[] p, int start, int limit, int value, int blockLength)
-
getOverlap
private static int getOverlap(int[] p, int length, int[] q, int qStart, int blockLength)
Look for maximum overlap of the beginning of the other block with the previous, adjacent block.
-
getOverlap
private static int getOverlap(char[] p, int length, int[] q, int qStart, int blockLength)
-
getOverlap
private static int getOverlap(char[] p, int length, char[] q, int qStart, int blockLength)
-
getAllSameOverlap
private static int getAllSameOverlap(int[] p, int length, int value, int blockLength)
-
isStartOfSomeFastBlock
private static boolean isStartOfSomeFastBlock(int dataOffset, int[] index, int fastILimit)
-
findHighStart
private int findHighStart()
Finds the start of the last range in the trie by enumerating backward. Indexes for code points higher than this will be omitted.
-
compactWholeDataBlocks
private int compactWholeDataBlocks(int fastILimit, MutableCodePointTrie.AllSameBlocks allSameBlocks)
-
compactData
private int compactData(int fastILimit, int[] newData, int dataNullIndex, MutableCodePointTrie.MixedBlocks mixedBlocks)
Compacts a build-time trie. The compaction - removes blocks that are identical with earlier ones - overlaps each new non-duplicate block as much as possible with the previously-written one - works with fast-range data blocks whose length is a multiple of that of higher-code-point data blocks It does not try to find an optimal order of writing, deduplicating, and overlapping blocks.
-
compactIndex
private int compactIndex(int fastILimit, MutableCodePointTrie.MixedBlocks mixedBlocks)
-
compactTrie
private int compactTrie(int fastILimit)
-
build
private CodePointTrie build(CodePointTrie.Type type, CodePointTrie.ValueWidth valueWidth)
-
-