Class Trie
- java.lang.Object
-
- com.ibm.icu.impl.Trie
-
public abstract class Trie extends java.lang.Object
A trie is a kind of compressed, serializable table of values associated with Unicode code points (0..0x10ffff).
This class defines the basic structure of a trie and provides methods to retrieve the offsets to the actual data.
Data will be the form of an array of basic types, char or int.
The actual data format will have to be specified by the user in the inner static interface com.ibm.icu.impl.Trie.DataManipulate.
This trie implementation is optimized for getting offset while walking forward through a UTF-16 string. Therefore, the simplest and fastest access macros are the fromLead() and fromOffsetTrail() methods. The fromBMP() method are a little more complicated; they get offsets even for lead surrogate codepoints, while the fromLead() method get special "folded" offsets for lead surrogate code units if there is relevant data associated with them. From such a folded offsets, an offset needs to be extracted to supply to the fromOffsetTrail() methods. To handle such supplementary codepoints, some offset information are kept in the data.
Methods in com.ibm.icu.impl.Trie.DataManipulate are called to retrieve that offset from the folded value for the lead surrogate unit.
For examples of use, see com.ibm.icu.impl.CharTrie or com.ibm.icu.impl.IntTrie.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static interface
Trie.DataManipulate
Character data in com.ibm.impl.Trie have different user-specified format for different purposes.private static class
Trie.DefaultGetFoldingOffset
-
Field Summary
Fields Modifier and Type Field Description protected static int
BMP_INDEX_LENGTH
Length of the BMP portion of the index (stage 1) array.protected static int
DATA_BLOCK_LENGTH
Number of data values in a stage 2 (data array) block.protected static int
HEADER_LENGTH_
Size of Trie header in bytesprotected static int
HEADER_OPTIONS_DATA_IS_32_BIT_
protected static int
HEADER_OPTIONS_INDEX_SHIFT_
protected static int
HEADER_OPTIONS_LATIN1_IS_LINEAR_MASK_
Latin 1 option maskprivate static int
HEADER_OPTIONS_SHIFT_MASK_
Header option formattingprotected static int
HEADER_SIGNATURE_
Constant number to authenticate the byte blockprotected static int
INDEX_STAGE_1_SHIFT_
Shift size for shifting right the input index.protected static int
INDEX_STAGE_2_SHIFT_
Shift size for shifting left the index array values.protected static int
INDEX_STAGE_3_MASK_
Mask for getting the lower bits from the input index.protected static int
LEAD_INDEX_OFFSET_
Lead surrogate code points' index displacement in the index array.protected int
m_dataLength_
Length of the data arrayprotected Trie.DataManipulate
m_dataManipulate_
Internal TrieValue which handles the parsing of the data value.protected int
m_dataOffset_
Start index of the data portion of the trie.protected char[]
m_index_
Index or UTF16 charactersprivate boolean
m_isLatin1Linear_
Flag indicator for Latin quick access data blockprivate int
m_options_
Trie options field.protected static int
SURROGATE_BLOCK_BITS
Number of bits of a trail surrogate that are used in index table lookups.protected static int
SURROGATE_BLOCK_COUNT
Number of index (stage 1) entries per lead surrogate.protected static int
SURROGATE_MASK_
Surrogate mask to use when shifting offset to retrieve supplementary values
-
Constructor Summary
Constructors Modifier Constructor Description protected
Trie(char[] index, int options, Trie.DataManipulate dataManipulate)
Trie constructorprotected
Trie(java.nio.ByteBuffer bytes, Trie.DataManipulate dataManipulate)
Trie constructor for CharTrie use.
-
Method Summary
All Methods Instance Methods Abstract Methods Concrete Methods Modifier and Type Method Description private boolean
checkHeader(int signature)
Authenticates raw data header.boolean
equals(java.lang.Object other)
Checks if the argument Trie has the same data as this Trie.protected int
getBMPOffset(char ch)
Gets the offset to data which the BMP character points to Treats a lead surrogate as a normal code point.protected int
getCodePointOffset(int ch)
Internal trie getter from a code point.protected abstract int
getInitialValue()
Gets the default initial valueprotected int
getLeadOffset(char ch)
Gets the offset to the data which this lead surrogate character points to.protected int
getRawOffset(int offset, char ch)
Gets the offset to the data which the index ch after variable offset points to.int
getSerializedDataSize()
Gets the serialized data file size of the Trie.protected abstract int
getSurrogateOffset(char lead, char trail)
Gets the offset to the data which the surrogate pair points to.protected abstract int
getValue(int index)
Gets the value at the argument indexint
hashCode()
protected boolean
isCharTrie()
Determines if this is a 16 bit trieprotected boolean
isIntTrie()
Determines if this is a 32 bit trieboolean
isLatin1Linear()
Determines if this trie has a linear latin 1 arrayprotected void
unserialize(java.nio.ByteBuffer bytes)
Parses the byte buffer and creates the trie index with it.
-
-
-
Field Detail
-
LEAD_INDEX_OFFSET_
protected static final int LEAD_INDEX_OFFSET_
Lead surrogate code points' index displacement in the index array. 0x10000-0xd800=0x2800 0x2800 >> INDEX_STAGE_1_SHIFT_- See Also:
- Constant Field Values
-
INDEX_STAGE_1_SHIFT_
protected static final int INDEX_STAGE_1_SHIFT_
Shift size for shifting right the input index. 1..9- See Also:
- Constant Field Values
-
INDEX_STAGE_2_SHIFT_
protected static final int INDEX_STAGE_2_SHIFT_
Shift size for shifting left the index array values. Increases possible data size with 16-bit index values at the cost of compactability. This requires blocks of stage 2 data to be aligned by DATA_GRANULARITY. 0..INDEX_STAGE_1_SHIFT- See Also:
- Constant Field Values
-
DATA_BLOCK_LENGTH
protected static final int DATA_BLOCK_LENGTH
Number of data values in a stage 2 (data array) block.- See Also:
- Constant Field Values
-
INDEX_STAGE_3_MASK_
protected static final int INDEX_STAGE_3_MASK_
Mask for getting the lower bits from the input index. DATA_BLOCK_LENGTH - 1.- See Also:
- Constant Field Values
-
SURROGATE_BLOCK_BITS
protected static final int SURROGATE_BLOCK_BITS
Number of bits of a trail surrogate that are used in index table lookups.- See Also:
- Constant Field Values
-
SURROGATE_BLOCK_COUNT
protected static final int SURROGATE_BLOCK_COUNT
Number of index (stage 1) entries per lead surrogate. Same as number of index entries for 1024 trail surrogates, ==0x400>>INDEX_STAGE_1_SHIFT_- See Also:
- Constant Field Values
-
BMP_INDEX_LENGTH
protected static final int BMP_INDEX_LENGTH
Length of the BMP portion of the index (stage 1) array.- See Also:
- Constant Field Values
-
SURROGATE_MASK_
protected static final int SURROGATE_MASK_
Surrogate mask to use when shifting offset to retrieve supplementary values- See Also:
- Constant Field Values
-
m_index_
protected char[] m_index_
Index or UTF16 characters
-
m_dataManipulate_
protected Trie.DataManipulate m_dataManipulate_
Internal TrieValue which handles the parsing of the data value. This class is to be implemented by the user
-
m_dataOffset_
protected int m_dataOffset_
Start index of the data portion of the trie. CharTrie combines index and data into a char array, so this is used to indicate the initial offset to the data portion. Note this index always points to the initial value.
-
m_dataLength_
protected int m_dataLength_
Length of the data array
-
HEADER_LENGTH_
protected static final int HEADER_LENGTH_
Size of Trie header in bytes- See Also:
- Constant Field Values
-
HEADER_OPTIONS_LATIN1_IS_LINEAR_MASK_
protected static final int HEADER_OPTIONS_LATIN1_IS_LINEAR_MASK_
Latin 1 option mask- See Also:
- Constant Field Values
-
HEADER_SIGNATURE_
protected static final int HEADER_SIGNATURE_
Constant number to authenticate the byte block- See Also:
- Constant Field Values
-
HEADER_OPTIONS_SHIFT_MASK_
private static final int HEADER_OPTIONS_SHIFT_MASK_
Header option formatting- See Also:
- Constant Field Values
-
HEADER_OPTIONS_INDEX_SHIFT_
protected static final int HEADER_OPTIONS_INDEX_SHIFT_
- See Also:
- Constant Field Values
-
HEADER_OPTIONS_DATA_IS_32_BIT_
protected static final int HEADER_OPTIONS_DATA_IS_32_BIT_
- See Also:
- Constant Field Values
-
m_isLatin1Linear_
private boolean m_isLatin1Linear_
Flag indicator for Latin quick access data block
-
m_options_
private int m_options_
Trie options field.
options bit field:
9 1 = Latin-1 data is stored linearly at data + DATA_BLOCK_LENGTH
8 0 = 16-bit data, 1=32-bit data
7..4 INDEX_STAGE_1_SHIFT // 0..INDEX_STAGE_2_SHIFT
3..0 INDEX_STAGE_2_SHIFT // 1..9
-
-
Constructor Detail
-
Trie
protected Trie(java.nio.ByteBuffer bytes, Trie.DataManipulate dataManipulate)
Trie constructor for CharTrie use.- Parameters:
bytes
- data of an ICU data file, containing the triedataManipulate
- object containing the information to parse the trie data
-
Trie
protected Trie(char[] index, int options, Trie.DataManipulate dataManipulate)
Trie constructor- Parameters:
index
- array to be used for indexoptions
- used by the triedataManipulate
- object containing the information to parse the trie data
-
-
Method Detail
-
isLatin1Linear
public final boolean isLatin1Linear()
Determines if this trie has a linear latin 1 array- Returns:
- true if this trie has a linear latin 1 array, false otherwise
-
equals
public boolean equals(java.lang.Object other)
Checks if the argument Trie has the same data as this Trie. Attributes are checked but not the index data.- Overrides:
equals
in classjava.lang.Object
- Parameters:
other
- Trie to check- Returns:
- true if the argument Trie has the same data as this Trie, false otherwise
-
hashCode
public int hashCode()
- Overrides:
hashCode
in classjava.lang.Object
-
getSerializedDataSize
public int getSerializedDataSize()
Gets the serialized data file size of the Trie. This is used during trie data reading for size checking purposes.- Returns:
- size size of serialized trie data file in terms of the number of bytes
-
getSurrogateOffset
protected abstract int getSurrogateOffset(char lead, char trail)
Gets the offset to the data which the surrogate pair points to.- Parameters:
lead
- lead surrogatetrail
- trailing surrogate- Returns:
- offset to data
-
getValue
protected abstract int getValue(int index)
Gets the value at the argument index- Parameters:
index
- value at index will be retrieved- Returns:
- 32 bit value
-
getInitialValue
protected abstract int getInitialValue()
Gets the default initial value- Returns:
- 32 bit value
-
getRawOffset
protected final int getRawOffset(int offset, char ch)
Gets the offset to the data which the index ch after variable offset points to. Note for locating a non-supplementary character data offset, callinggetRawOffset(0, ch);
will do. Otherwise if it is a supplementary character formed by surrogates lead and trail. Then we would have to call getRawOffset() with getFoldingIndexOffset(). See getSurrogateOffset().- Parameters:
offset
- index offset which ch is to start fromch
- index to be used after offset- Returns:
- offset to the data
-
getBMPOffset
protected final int getBMPOffset(char ch)
Gets the offset to data which the BMP character points to Treats a lead surrogate as a normal code point.- Parameters:
ch
- BMP character- Returns:
- offset to data
-
getLeadOffset
protected final int getLeadOffset(char ch)
Gets the offset to the data which this lead surrogate character points to. Data at the returned offset may contain folding offset information for the next trailing surrogate character.- Parameters:
ch
- lead surrogate character- Returns:
- offset to data
-
getCodePointOffset
protected final int getCodePointOffset(int ch)
Internal trie getter from a code point. Could be faster(?) but longer with if((c32)<=0xd7ff) { (result)=_TRIE_GET_RAW(trie, data, 0, c32); } Gets the offset to data which the codepoint points to- Parameters:
ch
- codepoint- Returns:
- offset to data
-
unserialize
protected void unserialize(java.nio.ByteBuffer bytes)
Parses the byte buffer and creates the trie index with it.
The position of the input ByteBuffer must be right after the trie header.
This is overwritten by the child classes.
- Parameters:
bytes
- buffer containing trie data
-
isIntTrie
protected final boolean isIntTrie()
Determines if this is a 32 bit trie- Returns:
- true if options specifies this is a 32 bit trie
-
isCharTrie
protected final boolean isCharTrie()
Determines if this is a 16 bit trie- Returns:
- true if this is a 16 bit trie
-
checkHeader
private final boolean checkHeader(int signature)
Authenticates raw data header. Checking the header information, signature and options.- Parameters:
signature
- This contains the options and type of a Trie- Returns:
- true if the header is authenticated valid
-
-