Class CollationFastLatin

java.lang.Object
com.ibm.icu.impl.coll.CollationFastLatin

public final class CollationFastLatin extends Object
  • Field Details

    • VERSION

      public static final int VERSION
      Fast Latin format version (one byte 1..FF). Must be incremented for any runtime-incompatible changes, in particular, for changes to any of the following constants. When the major version number of the main data format changes, we can reset this fast Latin version to 1.
    • LATIN_MAX

      public static final int LATIN_MAX
    • LATIN_LIMIT

      public static final int LATIN_LIMIT
    • LATIN_MAX_UTF8_LEAD

      static final int LATIN_MAX_UTF8_LEAD
    • PUNCT_START

      static final int PUNCT_START
    • PUNCT_LIMIT

      static final int PUNCT_LIMIT
    • NUM_FAST_CHARS

      static final int NUM_FAST_CHARS
    • SHORT_PRIMARY_MASK

      static final int SHORT_PRIMARY_MASK
    • INDEX_MASK

      static final int INDEX_MASK
    • SECONDARY_MASK

      static final int SECONDARY_MASK
    • CASE_MASK

      static final int CASE_MASK
    • LONG_PRIMARY_MASK

      static final int LONG_PRIMARY_MASK
    • TERTIARY_MASK

      static final int TERTIARY_MASK
    • CASE_AND_TERTIARY_MASK

      static final int CASE_AND_TERTIARY_MASK
    • TWO_SHORT_PRIMARIES_MASK

      static final int TWO_SHORT_PRIMARIES_MASK
    • TWO_LONG_PRIMARIES_MASK

      static final int TWO_LONG_PRIMARIES_MASK
    • TWO_SECONDARIES_MASK

      static final int TWO_SECONDARIES_MASK
    • TWO_CASES_MASK

      static final int TWO_CASES_MASK
    • TWO_TERTIARIES_MASK

      static final int TWO_TERTIARIES_MASK
    • CONTRACTION

      static final int CONTRACTION
      Contraction with one fast Latin character. Use INDEX_MASK to find the start of the contraction list after the fixed table. The first entry contains the default mapping. Otherwise use CONTR_CHAR_MASK for the contraction character index (in ascending order). Use CONTR_LENGTH_SHIFT for the length of the entry (1=BAIL_OUT, 2=one CE, 3=two CEs). Also, U+0000 maps to a contraction entry, so that the fast path need not check for NUL termination. It usually maps to a contraction list with only the completely ignorable default value.
    • EXPANSION

      static final int EXPANSION
      An expansion encodes two CEs. Use INDEX_MASK to find the pair of CEs after the fixed table. The higher a mini CE value, the easier it is to process. For expansions and higher, no context needs to be considered.
    • MIN_LONG

      static final int MIN_LONG
      Encodes one CE with a long/low mini primary (there are 128). All potentially-variable primaries must be in this range, to make the short-primary path as fast as possible.
    • LONG_INC

      static final int LONG_INC
    • MAX_LONG

      static final int MAX_LONG
    • MIN_SHORT

      static final int MIN_SHORT
      Encodes one CE with a short/high primary (there are 60), plus a secondary CE if the secondary weight is high. Fast handling: At least all letter primaries should be in this range.
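The descriptions of CONTRACTION, EXPANSION, MIN_LONG and MIN_SHORT suggest that a mini CE can be classified by comparing it against these thresholds in descending order. The following is a minimal sketch of that idea, not the class's actual code. The numeric values and the helper name kind are assumptions made for illustration; this page does not list the constant values.

```java
final class MiniCeKindSketch {
    // Assumed threshold values, for illustration only.
    static final int CONTRACTION = 0x400;
    static final int EXPANSION = 0x800;
    static final int MIN_LONG = 0xc00;
    static final int MIN_SHORT = 0x1000;

    /** Hypothetical helper: names the range that a 16-bit mini CE falls into. */
    static String kind(int miniCE) {
        // Test the highest range first: those values are the cheapest to handle.
        if (miniCE >= MIN_SHORT) { return "short primary"; }
        if (miniCE >= MIN_LONG) { return "long primary"; }
        if (miniCE >= EXPANSION) { return "expansion"; }
        if (miniCE >= CONTRACTION) { return "contraction"; }
        // Anything below CONTRACTION, such as special values like BAIL_OUT.
        return "other";
    }

    public static void main(String[] args) {
        int[] samples = { 0x1400, 0xc08, 0x805, 0x412, 1 };
        for (int ce : samples) {
            System.out.println(Integer.toHexString(ce) + " -> " + kind(ce));
        }
    }
}
```

Checking the highest range first mirrors the remark that higher mini CE values are easier to process: the common letter case is decided by a single comparison.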
    • SHORT_INC

      static final int SHORT_INC
    • MAX_SHORT

      static final int MAX_SHORT
      The highest primary weight is reserved for U+FFFF.
    • MIN_SEC_BEFORE

      static final int MIN_SEC_BEFORE
    • SEC_INC

      static final int SEC_INC
    • MAX_SEC_BEFORE

      static final int MAX_SEC_BEFORE
    • COMMON_SEC

      static final int COMMON_SEC
    • MIN_SEC_AFTER

      static final int MIN_SEC_AFTER
    • MAX_SEC_AFTER

      static final int MAX_SEC_AFTER
    • MIN_SEC_HIGH

      static final int MIN_SEC_HIGH
    • MAX_SEC_HIGH

      static final int MAX_SEC_HIGH
    • SEC_OFFSET

      static final int SEC_OFFSET
      Lookup: Add this offset to secondary weights, except for completely ignorable CEs. Must be greater than any special value, e.g., MERGE_WEIGHT. The exact value is not relevant for the format version.
    • COMMON_SEC_PLUS_OFFSET

      static final int COMMON_SEC_PLUS_OFFSET
    • TWO_SEC_OFFSETS

      static final int TWO_SEC_OFFSETS
    • TWO_COMMON_SEC_PLUS_OFFSET

      static final int TWO_COMMON_SEC_PLUS_OFFSET
    • LOWER_CASE

      static final int LOWER_CASE
    • TWO_LOWER_CASES

      static final int TWO_LOWER_CASES
    • COMMON_TER

      static final int COMMON_TER
    • MAX_TER_AFTER

      static final int MAX_TER_AFTER
    • TER_OFFSET

      static final int TER_OFFSET
      Lookup: Add this offset to tertiary weights, except for completely ignorable CEs. Must be greater than any special value, e.g., MERGE_WEIGHT. Must be greater than the case bits as well, so that with combined case+tertiary weights plus the offset the tertiary bits do not spill over into the case bits. The exact value is not relevant for the format version.
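The offset rule above can be illustrated with a small sketch. All numeric values below, the helper name caseAndTertiary, and the convention that a mini CE of 0 stands for a completely ignorable CE are assumptions made for illustration; they are not taken from this page.

```java
final class TertiaryOffsetSketch {
    // Assumed bit layout and values, for illustration only.
    static final int CASE_MASK = 0x18;      // assumed: bits 4..3
    static final int TERTIARY_MASK = 7;     // assumed: bits 2..0
    static final int CASE_AND_TERTIARY_MASK = CASE_MASK | TERTIARY_MASK;
    static final int TER_OFFSET = 0x20;     // assumed: lies above the case bits
    static final int MERGE_WEIGHT = 3;      // assumed special value

    /** Hypothetical helper: combined case+tertiary weight of one mini CE. */
    static int caseAndTertiary(int miniCE) {
        if (miniCE == 0) {
            return 0;  // completely ignorable (assumed encoding): no offset added
        }
        // The offset sits above both fields, so adding it cannot carry
        // tertiary bits into the case bits.
        return (miniCE & CASE_AND_TERTIARY_MASK) + TER_OFFSET;
    }

    public static void main(String[] args) {
        int weight = caseAndTertiary(0x1409);
        System.out.println(Integer.toHexString(weight));
        System.out.println(weight > MERGE_WEIGHT);
    }
}
```

Under these assumptions every offset weight compares greater than the special values, while the low five bits still hold the unchanged case and tertiary fields.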
    • COMMON_TER_PLUS_OFFSET

      static final int COMMON_TER_PLUS_OFFSET
    • TWO_TER_OFFSETS

      static final int TWO_TER_OFFSETS
    • TWO_COMMON_TER_PLUS_OFFSET

      static final int TWO_COMMON_TER_PLUS_OFFSET
    • MERGE_WEIGHT

      static final int MERGE_WEIGHT
    • EOS

      static final int EOS
    • BAIL_OUT

      static final int BAIL_OUT
    • CONTR_CHAR_MASK

      static final int CONTR_CHAR_MASK
      Contraction result first word bits 8..0 contain the second contraction character, as a char index 0..NUM_FAST_CHARS-1. Each contraction list is terminated with a word containing CONTR_CHAR_MASK.
    • CONTR_LENGTH_SHIFT

      static final int CONTR_LENGTH_SHIFT
      Contraction result first word bits 10..9 contain the result length: 1=bail out, 2=one mini CE, 3=two mini CEs.
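The bit ranges given for CONTR_CHAR_MASK and CONTR_LENGTH_SHIFT are enough to sketch how the first word of a contraction entry might be decoded and how a contraction list might be searched. The constant values below are inferred from those bit ranges. The helper names, the sample list contents, and the choice to return index 0 for the default mapping are assumptions for illustration, not the class's actual code.

```java
final class ContractionListSketch {
    // Inferred from "bits 8..0" and "bits 10..9" in the field descriptions.
    static final int CONTR_CHAR_MASK = 0x1ff;
    static final int CONTR_LENGTH_SHIFT = 9;

    /** Character index stored in the first word of an entry. */
    static int charIndex(int head) {
        return head & CONTR_CHAR_MASK;
    }

    /** Entry length in words: 1=bail out, 2=one mini CE, 3=two mini CEs. */
    static int length(int head) {
        return (head >> CONTR_LENGTH_SHIFT) & 3;
    }

    /**
     * Hypothetical search: returns the index of the entry for character
     * index c2, or 0 (the default mapping) if the list has no such entry.
     * Relies on ascending order and on the terminator word, whose character
     * field (all ones) is greater than any valid character index.
     */
    static int findEntry(char[] list, int c2) {
        int i = 0;
        int head = list[i];  // first entry: the default mapping
        int x;
        do {
            i += length(head);  // skip over the current entry
            head = list[i];
            x = charIndex(head);
        } while (x < c2);
        return x == c2 ? i : 0;
    }

    public static void main(String[] args) {
        // Made-up list: default (one CE), 0x65 (one CE), 0x70 (bail out), terminator.
        char[] list = { 0x400, 0x1000, 0x465, 0x1400, 0x270, 0x1ff };
        int entry = findEntry(list, 0x70);
        System.out.println(entry + " length=" + length(list[entry]));
    }
}
```

Because the terminator's character field exceeds every valid index, the loop needs no separate bounds check.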
    • BAIL_OUT_RESULT

      public static final int BAIL_OUT_RESULT
      Comparison return value when the regular comparison must be used. The exact value is not relevant for the format version.
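A caller is expected to treat BAIL_OUT_RESULT as a signal to fall back to the regular comparison. The following sketch shows that control flow with stand-in comparison functions. The value -2 and the names compare, fastPath and regularPath are assumptions for illustration; the sketch does not call the real collation classes.

```java
import java.util.function.ToIntBiFunction;

final class BailOutFallbackSketch {
    // Assumed value, for illustration only; any value that is not a
    // valid comparison result would serve.
    static final int BAIL_OUT_RESULT = -2;

    /**
     * Hypothetical wrapper: tries the fast path first and runs the
     * regular comparison only when the fast path bails out.
     */
    static int compare(CharSequence left, CharSequence right,
            ToIntBiFunction<CharSequence, CharSequence> fastPath,
            ToIntBiFunction<CharSequence, CharSequence> regularPath) {
        int result = fastPath.applyAsInt(left, right);
        if (result != BAIL_OUT_RESULT) {
            return result;
        }
        return regularPath.applyAsInt(left, right);
    }

    public static void main(String[] args) {
        // Stand-in fast path that always bails out; stand-in regular path.
        int r = compare("a", "b",
                (l, rr) -> BAIL_OUT_RESULT,
                (l, rr) -> Integer.signum(l.toString().compareTo(rr.toString())));
        System.out.println(r);
    }
}
```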
  • Constructor Details

    • CollationFastLatin

      private CollationFastLatin()
  • Method Details

    • getCharIndex

      static int getCharIndex(char c)
    • getOptions

      public static int getOptions(CollationData data, CollationSettings settings, char[] primaries)
      Computes the options value for the compare functions and writes the precomputed primary weights into primaries. Returns -1 if the Latin fast path is not supported for the data and settings. The length of the primaries array must be LATIN_LIMIT.
    • compareUTF16

      public static int compareUTF16(char[] table, char[] primaries, int options, CharSequence left, CharSequence right, int startIndex)
    • lookup

      private static int lookup(char[] table, int c)
    • nextPair

      private static long nextPair(char[] table, int c, int ce, CharSequence s16, int sIndex)
      Java returns a negative result (use the '~' operator) if sIndex is to be incremented. C++ modifies sIndex.
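The negative-result convention can be sketched as follows. Since Java cannot modify the caller's index, the callee returns the bitwise complement of a non-negative pair to signal that the index must be advanced, and the caller undoes it with '~'. The helper names encode, pairOf and advance are hypothetical; this is an illustration of the convention, not the method's actual body.

```java
final class NegatedResultSketch {
    /**
     * Hypothetical callee side: a non-negative pair is returned as-is, or
     * complemented when the caller must also increment its string index.
     */
    static long encode(long pair, boolean incrementIndex) {
        return incrementIndex ? ~pair : pair;
    }

    /** Caller side: recovers the pair from either form. */
    static long pairOf(long result) {
        return result < 0 ? ~result : result;
    }

    /** Caller side: applies the signalled index increment. */
    static int advance(long result, int sIndex) {
        return result < 0 ? sIndex + 1 : sIndex;
    }

    public static void main(String[] args) {
        long result = encode(0x14001000L, true);
        int sIndex = advance(result, 5);
        long pair = pairOf(result);
        System.out.println(sIndex + " " + Long.toHexString(pair));
    }
}
```

The complement is used rather than arithmetic negation because it also distinguishes a pair of 0, whose complement is -1.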
    • getPrimaries

      private static int getPrimaries(int variableTop, int pair)
    • getSecondariesFromOneShortCE

      private static int getSecondariesFromOneShortCE(int ce)
    • getSecondaries

      private static int getSecondaries(int variableTop, int pair)
    • getCases

      private static int getCases(int variableTop, boolean strengthIsPrimary, int pair)
    • getTertiaries

      private static int getTertiaries(int variableTop, boolean withCaseBits, int pair)
    • getQuaternaries

      private static int getQuaternaries(int variableTop, int pair)