Class UCharacterProperty
Internal class used for the Unicode character property database.
This class stores binary data read from uprops.icu. It cannot parse the data into higher-level information; it only returns bytes of information when required.
Because of the form most commonly used for retrieval, an array of char is used to store the binary data.
UCharacterPropertyDB also contains information on accessing indexes to significant points in the binary data.
Responsibility for molding the binary data into a more meaningful form lies with UCharacter.
- Since:
- release 2.1, February 1st 2002
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate class
private class
private class
private class
private class
private class
private static final class
private static final class
private class
private class
private class
-
Field Summary
FieldsModifier and TypeFieldDescriptionprivate static final int
Age value shiftprivate static final int
private static final int
(package private) UCharacterProperty.BinaryProperty[]
private static final int
Integer properties mask and shift values for blocks.private static final int
Integer properties mask and shift values for blocks.private static final int
private static final int
private static final int
private static final String
Default name of the datafileprivate static final int
private static final int
Integer properties mask for decomposition type.private static final int
private static final int
private static final int
private static final int
private static final int
Integer properties mask and shift values for East Asian cell width.private static final int
Integer properties mask and shift values for East Asian cell width.private static final int
private static final int
private static final int
First nibble shiftprivate static final int
private static final int
private static final int
private static final int
Mask constant for multiple UCharCategory bits (Z Separators).private static final int
private static final int
private static final int
private static final int
private static final int
private static final int[]
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int[]
Ranges (start/limit pairs) of ID_Compat_Math_Continue (only), from UCD PropList.txt.private static final int[]
ID_Compat_Math_Start characters, from UCD PropList.txt.private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
static final UCharacterProperty
(package private) UCharacterProperty.IntProperty[]
private static final int
Second nibble maskstatic final char
Latin capital letter i with dot abovestatic final char
Latin small letter dotless istatic final char
Latin lowercase iprivate static final int
private static final int
private static final int
(package private) int
Number of additional columns(package private) Trie2_16
Extra property trie(package private) int[]
Extra property vectors, 1st column for age and second for binary properties.(package private) int
Maximum values for block, bits used as in vector word 0(package private) int
Maximum values for script, bits used as in vector word 0char[]
Script_Extensions dataTrie dataUnicode versionprivate static final int
static final int
(package private) static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
Sexagesimal numbers: ((ntv>>2)-0xbf) * 60^((ntv&3)+1) = (1..9)*(60^1..60^4)private static final int
Decimal digits: nv=0..9private static final int
Other digits: nv=0..9private static final int
Fractions: ((ntv>>4)-12) / ((ntv&0xf)+1) = -1..17 / 1..16private static final int
Fraction-20 values: frac20 = ntv-0x324 = 0..0x17 -> 1|3|5|7 / 20|40|80|160|320|640 numerator: num = 2*(frac20&3)+1 denominator: den = 20<<(frac20>>2)private static final int
Fraction-32 values: frac32 = ntv-0x34c = 0..15 -> 1|3|5|7 / 32|64|128|256 numerator: num = 2*(frac32&3)+1 denominator: den = 32<<(frac32>>2)private static final int
Large integers: ((ntv>>5)-14) * 10^((ntv&0x1f)+2) = (1..9)*(10^2..10^33) (only one significant decimal digit)private static final int
No numeric value.private static final int
Small integers: nv=0..154private static final int
No numeric value (yet).private static final int
Numeric types and values in the main properties words.private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
static final int
static final int
static final int
Integer properties mask and shift values for scripts.static final int
Script_Extensions: mask includes Scriptstatic final int
static final int
static final int
static final int
From ubidi_props.c/ubidi.icustatic final int
From ucase.c/ucase.icustatic final int
From ucase.c/ucase.icu as well as unorm.cpp/unorm.icustatic final int
From uchar.c/uprops.icu main triestatic final int
From uchar.c/uprops.icu main trie as well as properties vectors triestatic final int
One more than the highest UPropertySource (SRC_) constant.static final int
static final int
static final int
static final int
static final int
static final int
From unames.c/unames.icustatic final int
From normalizer2impl.cpp/nfc.nrmstatic final int
From normalizer2impl.cpp/nfc.nrm canonical iterator datastatic final int
From normalizer2impl.cpp/nfkc.nrmstatic final int
From normalizer2impl.cpp/nfkc_cf.nrmstatic final int
No source, not a supported property.static final int
From uchar.c/uprops.icu properties vectors triestatic final int
private static final int
private static final int
static final int
Character type maskprivate static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
private static final int
Additional properties used in internal trie dataprivate static final int
private static final int
private static final int
private static final int
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type  Method  Description
int digit(int c)
int getAdditional(int codepoint, int column): Gets the Unicode additional properties.
getAge(int codepoint): Get the "age" of the code point.
static int getEuropeanDigit(int ch): Returns the digit values of characters like 'A' - 'Z', normal, half-width and full-width.
int getIntPropertyMaxValue(int which)
int getIntPropertyValue(int c, int which)
static final int getMask(int type): Gets the type mask
int getMaxValues(int column): Gets the maximum values for some enum/int properties.
private static final int getNumericTypeValue(int props)
int getNumericValue(int c)
final int getProperty(int ch): Gets the main property value for code point ch.
(package private) final int getSource(int which)
int getType(int c)
double getUnicodeNumericValue(int c)
boolean hasBinaryProperty(int c, int which)
private static final boolean isgraphPOSIX(int c): Checks if c is in [^\p{space}\p{gc=Control}\p{gc=Surrogate}\p{gc=Unassigned}] with space=\p{Whitespace} and Control=Cc.
(package private) static void
static final int mergeScriptCodeOrIndex(int scriptX)
private static final int ntvGetType(int ntv)
(package private) static UnicodeSet ulayout_addPropertyStarts(int src, UnicodeSet set)
void
-
Field Details
-
INSTANCE
-
m_trie_
Trie data
-
m_unicodeVersion_
Unicode version
-
LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE_
public static final char LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE_
Latin capital letter I with dot above
-
LATIN_SMALL_LETTER_DOTLESS_I_
public static final char LATIN_SMALL_LETTER_DOTLESS_I_
Latin small letter dotless i
-
LATIN_SMALL_LETTER_I_
public static final char LATIN_SMALL_LETTER_I_
Latin lowercase i
-
TYPE_MASK
public static final int TYPE_MASK
Character type mask
-
SRC_NONE
public static final int SRC_NONE
No source, not a supported property.
-
SRC_CHAR
public static final int SRC_CHAR
From uchar.c/uprops.icu main trie
-
SRC_PROPSVEC
public static final int SRC_PROPSVEC
From uchar.c/uprops.icu properties vectors trie
-
SRC_NAMES
public static final int SRC_NAMES
From unames.c/unames.icu
-
SRC_CASE
public static final int SRC_CASE
From ucase.c/ucase.icu
-
SRC_BIDI
public static final int SRC_BIDI
From ubidi_props.c/ubidi.icu
-
SRC_CHAR_AND_PROPSVEC
public static final int SRC_CHAR_AND_PROPSVEC
From uchar.c/uprops.icu main trie as well as properties vectors trie
-
SRC_CASE_AND_NORM
public static final int SRC_CASE_AND_NORM
From ucase.c/ucase.icu as well as unorm.cpp/unorm.icu
-
SRC_NFC
public static final int SRC_NFC
From normalizer2impl.cpp/nfc.nrm
-
SRC_NFKC
public static final int SRC_NFKC
From normalizer2impl.cpp/nfkc.nrm
-
SRC_NFKC_CF
public static final int SRC_NFKC_CF
From normalizer2impl.cpp/nfkc_cf.nrm
-
SRC_NFC_CANON_ITER
public static final int SRC_NFC_CANON_ITER
From normalizer2impl.cpp/nfc.nrm canonical iterator data
-
SRC_INPC
public static final int SRC_INPC
-
SRC_INSC
public static final int SRC_INSC
-
SRC_VO
public static final int SRC_VO
-
SRC_EMOJI
public static final int SRC_EMOJI
-
SRC_IDSU
public static final int SRC_IDSU
-
SRC_ID_COMPAT_MATH
public static final int SRC_ID_COMPAT_MATH
-
SRC_COUNT
public static final int SRC_COUNT
One more than the highest UPropertySource (SRC_) constant.
-
MY_MASK
static final int MY_MASK
-
GC_CN_MASK
private static final int GC_CN_MASK
-
GC_CC_MASK
private static final int GC_CC_MASK
-
GC_CS_MASK
private static final int GC_CS_MASK
-
GC_ZS_MASK
private static final int GC_ZS_MASK
-
GC_ZL_MASK
private static final int GC_ZL_MASK
-
GC_ZP_MASK
private static final int GC_ZP_MASK
-
GC_Z_MASK
private static final int GC_Z_MASK
Mask constant for multiple UCharCategory bits (Z Separators).
-
ID_COMPAT_MATH_CONTINUE
private static final int[] ID_COMPAT_MATH_CONTINUE
Ranges (start/limit pairs) of ID_Compat_Math_Continue (only), from UCD PropList.txt.
-
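A table of start/limit pairs such as ID_COMPAT_MATH_CONTINUE can be searched with a simple linear scan. The following is a minimal sketch, not the class's own code: it assumes the pairs are sorted in ascending order and that each limit is exclusive. The class name and the sample data are hypothetical and do not reproduce the real table.

```java
final class RangeLookupSketch {
    // Illustrative sample only (assumption): two short ranges, limits exclusive.
    static final int[] SAMPLE_RANGES = { 0x00B2, 0x00B4, 0x00B9, 0x00BA };

    /** Returns true if c falls inside any [start, limit) pair of the table. */
    static boolean contains(int[] ranges, int c) {
        for (int i = 0; i + 1 < ranges.length; i += 2) {
            if (c < ranges[i]) {
                return false; // table is sorted, so no later range can match
            }
            if (c < ranges[i + 1]) {
                return true;  // start <= c < limit
            }
        }
        return false;
    }
}
```

With the sample table, `contains(SAMPLE_RANGES, 0x00B3)` is true and `contains(SAMPLE_RANGES, 0x00B4)` is false, because the limit is treated as exclusive.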
ID_COMPAT_MATH_START
private static final int[] ID_COMPAT_MATH_START
ID_Compat_Math_Start characters, from UCD PropList.txt.
-
binProps
UCharacterProperty.BinaryProperty[] binProps
-
gcbToHst
private static final int[] gcbToHst
-
intProps
UCharacterProperty.IntProperty[] intProps
-
m_additionalTrie_
Trie2_16 m_additionalTrie_
Extra property trie
-
m_additionalVectors_
int[] m_additionalVectors_
Extra property vectors, 1st column for age and second for binary properties.
-
m_additionalColumnsCount_
int m_additionalColumnsCount_
Number of additional columns
-
m_maxBlockScriptValue_
int m_maxBlockScriptValue_
Maximum values for block, bits used as in vector word 0
-
m_maxJTGValue_
int m_maxJTGValue_
Maximum values for script, bits used as in vector word 0
-
m_scriptExtensions_
public char[] m_scriptExtensions_
Script_Extensions data
-
DATA_FILE_NAME_
Default name of the data file
-
NUMERIC_TYPE_VALUE_SHIFT_
private static final int NUMERIC_TYPE_VALUE_SHIFT_
Numeric types and values in the main properties words.
-
NTV_NONE_
private static final int NTV_NONE_
No numeric value.
-
NTV_DECIMAL_START_
private static final int NTV_DECIMAL_START_
Decimal digits: nv=0..9
-
NTV_DIGIT_START_
private static final int NTV_DIGIT_START_
Other digits: nv=0..9
-
NTV_NUMERIC_START_
private static final int NTV_NUMERIC_START_
Small integers: nv=0..154
-
NTV_FRACTION_START_
private static final int NTV_FRACTION_START_
Fractions: ((ntv>>4)-12) / ((ntv&0xf)+1) = -1..17 / 1..16
-
NTV_LARGE_START_
private static final int NTV_LARGE_START_
Large integers: ((ntv>>5)-14) * 10^((ntv&0x1f)+2) = (1..9)*(10^2..10^33) (only one significant decimal digit)
-
NTV_BASE60_START_
private static final int NTV_BASE60_START_
Sexagesimal numbers: ((ntv>>2)-0xbf) * 60^((ntv&3)+1) = (1..9)*(60^1..60^4)
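The fraction, large-integer, and sexagesimal formulas documented for NTV_FRACTION_START_, NTV_LARGE_START_, and NTV_BASE60_START_ can be written out directly. The sketch below transcribes those three formulas only. The class and method names are hypothetical, and it assumes the caller has already determined which encoding range the ntv value belongs to.

```java
final class NumericTypeValueSketch {
    /** Fractions: ((ntv>>4)-12) / ((ntv&0xf)+1). */
    static double fraction(int ntv) {
        int numerator = (ntv >> 4) - 12;
        int denominator = (ntv & 0xf) + 1;
        return (double) numerator / denominator;
    }

    /** Large integers: ((ntv>>5)-14) * 10^((ntv&0x1f)+2). */
    static double large(int ntv) {
        int mantissa = (ntv >> 5) - 14;
        int exponent = (ntv & 0x1f) + 2;
        return mantissa * Math.pow(10, exponent);
    }

    /** Sexagesimal numbers: ((ntv>>2)-0xbf) * 60^((ntv&3)+1). */
    static long base60(int ntv) {
        long value = (ntv >> 2) - 0xbf;
        int exponent = (ntv & 3) + 1;
        for (int i = 0; i < exponent; i++) {
            value *= 60;
        }
        return value;
    }
}
```

Working the formulas by hand, an ntv of 0xD1 decodes as the fraction 1/2, 0x1E0 as 1 * 10^2, and 0x300 as 1 * 60^1.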
-
NTV_FRACTION20_START_
private static final int NTV_FRACTION20_START_
Fraction-20 values: frac20 = ntv-0x324 = 0..0x17 -> 1|3|5|7 / 20|40|80|160|320|640
numerator: num = 2*(frac20&3)+1
denominator: den = 20<<(frac20>>2)
-
NTV_FRACTION32_START_
private static final int NTV_FRACTION32_START_
Fraction-32 values: frac32 = ntv-0x34c = 0..15 -> 1|3|5|7 / 32|64|128|256
numerator: num = 2*(frac32&3)+1
denominator: den = 32<<(frac32>>2)
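The numerator and denominator rules for the Fraction-20 and Fraction-32 encodings can be checked with a small sketch. It takes the already offset-subtracted index (frac20 or frac32) as input, so it makes no assumption about the start constants. The class and method names are hypothetical.

```java
final class SmallFractionSketch {
    /** Returns {numerator, denominator} for a Fraction-20 index in 0..0x17. */
    static int[] fraction20(int frac20) {
        int num = 2 * (frac20 & 3) + 1;   // 1, 3, 5 or 7
        int den = 20 << (frac20 >> 2);    // 20, 40, ..., 640
        return new int[] { num, den };
    }

    /** Returns {numerator, denominator} for a Fraction-32 index in 0..15. */
    static int[] fraction32(int frac32) {
        int num = 2 * (frac32 & 3) + 1;   // 1, 3, 5 or 7
        int den = 32 << (frac32 >> 2);    // 32, 64, 128 or 256
        return new int[] { num, den };
    }
}
```

For example, index 0x17 in the Fraction-20 encoding yields 7/640, and index 5 in the Fraction-32 encoding yields 3/64.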
-
NTV_RESERVED_START_
private static final int NTV_RESERVED_START_
No numeric value (yet).
-
SCRIPT_X_MASK
public static final int SCRIPT_X_MASK
Script_Extensions: mask includes Script
-
SCRIPT_HIGH_MASK
public static final int SCRIPT_HIGH_MASK
-
SCRIPT_HIGH_SHIFT
public static final int SCRIPT_HIGH_SHIFT
-
MAX_SCRIPT
public static final int MAX_SCRIPT
-
EAST_ASIAN_MASK_
private static final int EAST_ASIAN_MASK_
Integer properties mask and shift values for East Asian cell width. Equivalent to icu4c UPROPS_EA_MASK
-
EAST_ASIAN_SHIFT_
private static final int EAST_ASIAN_SHIFT_
Integer properties mask and shift values for East Asian cell width. Equivalent to icu4c UPROPS_EA_SHIFT
-
BLOCK_MASK_
private static final int BLOCK_MASK_
Integer properties mask and shift values for blocks. Equivalent to icu4c UPROPS_BLOCK_MASK
-
BLOCK_SHIFT_
private static final int BLOCK_SHIFT_
Integer properties mask and shift values for blocks. Equivalent to icu4c UPROPS_BLOCK_SHIFT
-
SCRIPT_LOW_MASK
public static final int SCRIPT_LOW_MASK
Integer properties mask and shift values for scripts. Equivalent to icu4c UPROPS_SHIFT_LOW_MASK.
-
SCRIPT_X_WITH_COMMON
public static final int SCRIPT_X_WITH_COMMON
-
SCRIPT_X_WITH_INHERITED
public static final int SCRIPT_X_WITH_INHERITED
-
SCRIPT_X_WITH_OTHER
public static final int SCRIPT_X_WITH_OTHER
-
WHITE_SPACE_PROPERTY_
private static final int WHITE_SPACE_PROPERTY_
Additional properties used in internal trie data
-
DASH_PROPERTY_
private static final int DASH_PROPERTY_
-
HYPHEN_PROPERTY_
private static final int HYPHEN_PROPERTY_
-
QUOTATION_MARK_PROPERTY_
private static final int QUOTATION_MARK_PROPERTY_
-
TERMINAL_PUNCTUATION_PROPERTY_
private static final int TERMINAL_PUNCTUATION_PROPERTY_
-
MATH_PROPERTY_
private static final int MATH_PROPERTY_
-
HEX_DIGIT_PROPERTY_
private static final int HEX_DIGIT_PROPERTY_
-
ASCII_HEX_DIGIT_PROPERTY_
private static final int ASCII_HEX_DIGIT_PROPERTY_
-
ALPHABETIC_PROPERTY_
private static final int ALPHABETIC_PROPERTY_
-
IDEOGRAPHIC_PROPERTY_
private static final int IDEOGRAPHIC_PROPERTY_
-
DIACRITIC_PROPERTY_
private static final int DIACRITIC_PROPERTY_
-
EXTENDER_PROPERTY_
private static final int EXTENDER_PROPERTY_
-
NONCHARACTER_CODE_POINT_PROPERTY_
private static final int NONCHARACTER_CODE_POINT_PROPERTY_
-
GRAPHEME_EXTEND_PROPERTY_
private static final int GRAPHEME_EXTEND_PROPERTY_
-
GRAPHEME_LINK_PROPERTY_
private static final int GRAPHEME_LINK_PROPERTY_
-
IDS_BINARY_OPERATOR_PROPERTY_
private static final int IDS_BINARY_OPERATOR_PROPERTY_
-
IDS_TRINARY_OPERATOR_PROPERTY_
private static final int IDS_TRINARY_OPERATOR_PROPERTY_
-
RADICAL_PROPERTY_
private static final int RADICAL_PROPERTY_
-
UNIFIED_IDEOGRAPH_PROPERTY_
private static final int UNIFIED_IDEOGRAPH_PROPERTY_
-
DEFAULT_IGNORABLE_CODE_POINT_PROPERTY_
private static final int DEFAULT_IGNORABLE_CODE_POINT_PROPERTY_
-
DEPRECATED_PROPERTY_
private static final int DEPRECATED_PROPERTY_
-
LOGICAL_ORDER_EXCEPTION_PROPERTY_
private static final int LOGICAL_ORDER_EXCEPTION_PROPERTY_
-
XID_START_PROPERTY_
private static final int XID_START_PROPERTY_
-
XID_CONTINUE_PROPERTY_
private static final int XID_CONTINUE_PROPERTY_
-
ID_START_PROPERTY_
private static final int ID_START_PROPERTY_
-
ID_CONTINUE_PROPERTY_
private static final int ID_CONTINUE_PROPERTY_
-
GRAPHEME_BASE_PROPERTY_
private static final int GRAPHEME_BASE_PROPERTY_
-
S_TERM_PROPERTY_
private static final int S_TERM_PROPERTY_
-
VARIATION_SELECTOR_PROPERTY_
private static final int VARIATION_SELECTOR_PROPERTY_
-
PATTERN_SYNTAX
private static final int PATTERN_SYNTAX
-
PATTERN_WHITE_SPACE
private static final int PATTERN_WHITE_SPACE
-
PREPENDED_CONCATENATION_MARK
private static final int PREPENDED_CONCATENATION_MARK
-
LB_MASK
private static final int LB_MASK
-
LB_SHIFT
private static final int LB_SHIFT
-
SB_MASK
private static final int SB_MASK
-
SB_SHIFT
private static final int SB_SHIFT
-
WB_MASK
private static final int WB_MASK
-
WB_SHIFT
private static final int WB_SHIFT
-
GCB_MASK
private static final int GCB_MASK
-
GCB_SHIFT
private static final int GCB_SHIFT
-
DECOMPOSITION_TYPE_MASK_
private static final int DECOMPOSITION_TYPE_MASK_
Integer properties mask for decomposition type. Equivalent to icu4c UPROPS_DT_MASK.
-
FIRST_NIBBLE_SHIFT_
private static final int FIRST_NIBBLE_SHIFT_
First nibble shift
-
LAST_NIBBLE_MASK_
private static final int LAST_NIBBLE_MASK_
Second nibble mask
-
AGE_SHIFT_
private static final int AGE_SHIFT_
Age value shift
-
DATA_FORMAT
private static final int DATA_FORMAT
-
TAB
private static final int TAB
-
CR
private static final int CR
-
U_A
private static final int U_A
-
U_F
private static final int U_F
-
U_Z
private static final int U_Z
-
U_a
private static final int U_a
-
U_f
private static final int U_f
-
U_z
private static final int U_z
-
DEL
private static final int DEL
-
NL
private static final int NL
-
NBSP
private static final int NBSP
-
CGJ
private static final int CGJ
-
FIGURESP
private static final int FIGURESP
-
HAIRSP
private static final int HAIRSP
-
RLM
private static final int RLM
-
NNBSP
private static final int NNBSP
-
WJ
private static final int WJ
-
INHSWAP
private static final int INHSWAP
-
NOMDIG
private static final int NOMDIG
-
U_FW_A
private static final int U_FW_A
-
U_FW_F
private static final int U_FW_F
-
U_FW_Z
private static final int U_FW_Z
-
U_FW_a
private static final int U_FW_a
-
U_FW_f
private static final int U_FW_f
-
U_FW_z
private static final int U_FW_z
-
ZWNBSP
private static final int ZWNBSP
-
-
Constructor Details
-
UCharacterProperty
Constructor
- Throws:
IOException
- thrown when data reading fails or the data is corrupted
-
-
Method Details
-
getProperty
public final int getProperty(int ch)
Gets the main property value for code point ch.
- Parameters:
ch
- code point whose property value is to be retrieved
- Returns:
- property value of code point
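A caller typically splits the word returned by getProperty into its packed fields using constants such as TYPE_MASK and NUMERIC_TYPE_VALUE_SHIFT_. The sketch below shows that kind of unpacking. This page lists the constants without their values, so the values used here (0x1F and 6) are assumptions, and the class and method names are hypothetical.

```java
final class PropertyWordSketch {
    // Assumed values: the page names these constants but does not show them.
    static final int TYPE_MASK = 0x1F;
    static final int NUMERIC_TYPE_VALUE_SHIFT = 6;

    /** Character type (general category index) held in the low bits. */
    static int type(int props) {
        return props & TYPE_MASK;
    }

    /** Numeric type and value held in the bits above the shift. */
    static int numericTypeValue(int props) {
        return props >>> NUMERIC_TYPE_VALUE_SHIFT;
    }
}
```

Under these assumptions, a word built as `(0xD1 << 6) | 9` unpacks to type 9 and numeric type value 0xD1.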
-
getAdditional
public int getAdditional(int codepoint, int column)
Gets the Unicode additional properties. Java version of C u_getUnicodeProperties().
- Parameters:
codepoint
- code point whose additional properties are to be retrieved
column
- the column index
- Returns:
- Unicode properties
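One plausible reading of getAdditional, given the m_additionalTrie_, m_additionalVectors_, and m_additionalColumnsCount_ fields, is a two-step lookup: the trie maps a code point to the start of a row in a flat vector array, and the column selects a word within that row. The sketch below illustrates that idea only. The trie is replaced by a plain function, out-of-range columns are assumed to return 0, and all names are hypothetical.

```java
import java.util.function.IntUnaryOperator;

final class AdditionalVectorsSketch {
    private final IntUnaryOperator rowStartLookup; // stand-in for the extra property trie
    private final int[] vectors;                   // flat array of rows
    private final int columnsCount;                // words per row

    AdditionalVectorsSketch(IntUnaryOperator rowStartLookup, int[] vectors, int columnsCount) {
        this.rowStartLookup = rowStartLookup;
        this.vectors = vectors;
        this.columnsCount = columnsCount;
    }

    /** Returns the word in the given column of the code point's row, or 0 if the column is out of range. */
    int getAdditional(int codepoint, int column) {
        if (column < 0 || column >= columnsCount) {
            return 0; // assumption: unknown columns carry no data
        }
        return vectors[rowStartLookup.applyAsInt(codepoint) + column];
    }
}
```

Sharing one row among many code points is what keeps such a table compact: every code point with identical properties maps to the same row start.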
-
getAge
Get the "age" of the code point.
The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character.
This can be useful to avoid emitting code points to receiving processes that do not accept newer characters.
The data is from the UCD file DerivedAge.txt.
This API does not check the validity of the codepoint.
- Parameters:
codepoint
- The code point.
- Returns:
- the Unicode version number
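The constants AGE_SHIFT_, FIRST_NIBBLE_SHIFT_, and LAST_NIBBLE_MASK_ suggest that the age is packed as two nibbles (major and minor version) in the high bits of a vector word. The sketch below decodes such a word. The concrete values (shift 24, nibble shift 4, nibble mask 0xF) are assumptions, since this page does not show them, and the class and method names are hypothetical.

```java
final class AgeSketch {
    // Assumed values: the page names these constants but does not show them.
    static final int AGE_SHIFT = 24;
    static final int FIRST_NIBBLE_SHIFT = 4;
    static final int LAST_NIBBLE_MASK = 0xF;

    /** Returns {major, minor} decoded from the age bits of a vector word. */
    static int[] decodeAge(int vectorWord) {
        int version = vectorWord >>> AGE_SHIFT;                         // isolate the age byte
        int major = (version >> FIRST_NIBBLE_SHIFT) & LAST_NIBBLE_MASK; // first nibble
        int minor = version & LAST_NIBBLE_MASK;                         // second nibble
        return new int[] { major, minor };
    }
}
```

Under these assumptions, a word of 0x32000000 decodes to version 3.2.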
-
isgraphPOSIX
private static final boolean isgraphPOSIX(int c)
Checks if c is in [^\p{space}\p{gc=Control}\p{gc=Surrogate}\p{gc=Unassigned}] with space=\p{Whitespace} and Control=Cc. Implements UCHAR_POSIX_GRAPH.
-
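The character class documented for isgraphPOSIX can be approximated outside ICU with java.util.regex, which supports the White_Space binary property and gc= category syntax. This is a sketch using the JDK's own Unicode data, not the ICU implementation, so results may differ for code points assigned in different Unicode versions. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class PosixGraphSketch {
    // Everything except White_Space, Control (Cc), Surrogate (Cs) and Unassigned (Cn).
    private static final Pattern GRAPH = Pattern.compile(
            "[^\\p{IsWhite_Space}\\p{gc=Cc}\\p{gc=Cs}\\p{gc=Cn}]");

    /** Returns true if the code point c matches the POSIX graph class above. */
    static boolean isGraph(int c) {
        String s = new String(Character.toChars(c));
        return GRAPH.matcher(s).matches();
    }
}
```

Note that U+00A0 (no-break space) is excluded here because it has the White_Space property, even though `Character.isWhitespace` returns false for it.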
hasBinaryProperty
public boolean hasBinaryProperty(int c, int which)
-
getType
public int getType(int c)
-
getIntPropertyValue
public int getIntPropertyValue(int c, int which)
-
getIntPropertyMaxValue
public int getIntPropertyMaxValue(int which)
-
getSource
final int getSource(int which)
-
getMaxValues
public int getMaxValues(int column)
Gets the maximum values for some enum/int properties.
- Returns:
- maximum values for the integer properties.
-
getMask
public static final int getMask(int type)
Gets the type mask
- Parameters:
type
- character type
- Returns:
- mask
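A type mask of this kind is commonly a single bit at the position of the category index, which lets several categories be combined into one constant (as GC_Z_MASK does for the Z separators). The sketch below assumes that getMask is equivalent to `1 << type`, which this page does not confirm, and uses the java.lang.Character category constants as a stand-in for ICU's. The class name and the helper isSeparator are hypothetical.

```java
final class CategoryMaskSketch {
    /** Assumption: the mask for a category is the bit at its index. */
    static int getMask(int type) {
        return 1 << type;
    }

    // Combined mask for the three separator categories (Zs, Zl, Zp).
    static final int Z_MASK = getMask(Character.SPACE_SEPARATOR)
            | getMask(Character.LINE_SEPARATOR)
            | getMask(Character.PARAGRAPH_SEPARATOR);

    /** Tests one code point against the combined separator mask. */
    static boolean isSeparator(int c) {
        return (getMask(Character.getType(c)) & Z_MASK) != 0;
    }
}
```

Testing against a combined mask replaces three separate category comparisons with a single bitwise AND.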
-
getEuropeanDigit
public static int getEuropeanDigit(int ch)
Returns the digit values of characters like 'A' - 'Z', normal, half-width and full-width. This method assumes that the other digit characters are checked by the calling method.
- Parameters:
ch
- character to test
- Returns:
- -1 if ch is not a character of the form 'A' - 'Z'; otherwise its corresponding digit is returned.
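The behaviour described for getEuropeanDigit can be sketched as a range check over the ASCII and full-width Latin letters, mapping each letter to a digit value starting at 10. This is an illustrative reimplementation, not the class's code. Accepting lowercase letters is an assumption based on the U_a..U_z and U_FW_a..U_FW_z constants listed on this page, and the class and method names are hypothetical.

```java
final class EuropeanDigitSketch {
    /** Returns 10..35 for Latin letters (ASCII or full-width), or -1 otherwise. */
    static int europeanDigit(int ch) {
        if (ch >= 'A' && ch <= 'Z') {
            return ch - 'A' + 10;
        }
        if (ch >= 'a' && ch <= 'z') {
            return ch - 'a' + 10;
        }
        if (ch >= 0xFF21 && ch <= 0xFF3A) { // full-width A..Z
            return ch - 0xFF21 + 10;
        }
        if (ch >= 0xFF41 && ch <= 0xFF5A) { // full-width a..z
            return ch - 0xFF41 + 10;
        }
        return -1; // decimal digits are left to the caller, as the description states
    }
}
```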
-
digit
public int digit(int c)
-
getNumericValue
public int getNumericValue(int c)
-
getUnicodeNumericValue
public double getUnicodeNumericValue(int c)
-
getNumericTypeValue
private static final int getNumericTypeValue(int props)
-
ntvGetType
private static final int ntvGetType(int ntv)
-
mergeScriptCodeOrIndex
public static final int mergeScriptCodeOrIndex(int scriptX)
-
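The name mergeScriptCodeOrIndex, together with SCRIPT_HIGH_MASK, SCRIPT_HIGH_SHIFT, and SCRIPT_LOW_MASK, suggests that a script code or index is stored in two separate bit fields that must be joined into one number. The sketch below shows one way such a merge could work. The constant values (0x00300000, 12, 0xFF) are assumptions, since this page does not show them, and the class and method names are hypothetical.

```java
final class ScriptMergeSketch {
    // Assumed values: the page names these constants but does not show them.
    static final int SCRIPT_HIGH_MASK = 0x00300000;
    static final int SCRIPT_HIGH_SHIFT = 12;
    static final int SCRIPT_LOW_MASK = 0x000000FF;

    /** Joins the high and low bit fields of scriptX into one contiguous value. */
    static int merge(int scriptX) {
        int high = (scriptX & SCRIPT_HIGH_MASK) >> SCRIPT_HIGH_SHIFT; // moves the high bits down to bits 8..9
        int low = scriptX & SCRIPT_LOW_MASK;                          // bits 0..7
        return high | low;
    }
}
```

Under these assumptions, `merge(0x00100025)` yields 0x125, and bits outside both masks are discarded.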
addPropertyStarts
-
upropsvec_addPropertyStarts
-
ulayout_addPropertyStarts
-
mathCompat_addPropertyStarts
-