Class CollationData


  • public final class CollationData
    extends java.lang.Object
    Collation data container. Immutable data created by a CollationDataBuilder, or loaded from a file, or deserialized from API-provided binary data. Includes data for the collation base (root/default), aliased if this is not the base.
    • Field Detail

      • REORDER_RESERVED_BEFORE_LATIN

        static final int REORDER_RESERVED_BEFORE_LATIN
        See Also:
        Constant Field Values
      • REORDER_RESERVED_AFTER_LATIN

        static final int REORDER_RESERVED_AFTER_LATIN
        See Also:
        Constant Field Values
      • MAX_NUM_SPECIAL_REORDER_CODES

        static final int MAX_NUM_SPECIAL_REORDER_CODES
        See Also:
        Constant Field Values
      • EMPTY_INT_ARRAY

        private static final int[] EMPTY_INT_ARRAY
      • ce32s

        int[] ce32s
        Array of CE32 values. At index 0 there must be CE32(U+0000) to support U+0000's special-tag for NUL-termination handling.
      • ces

        long[] ces
        Array of CE values for expansions and OFFSET_TAG.
      • contexts

        java.lang.String contexts
        Array of prefix and contraction-suffix matching data.
      • base

        public CollationData base
        Base collation data, or null if this data itself is a base.
      • jamoCE32s

        int[] jamoCE32s
        Simple array of JAMO_CE32S_LENGTH=19+21+27 CE32s, one per canonical Jamo L/V/T. They are normally simple CE32s, rarely expansions. For fast handling of HANGUL_TAG.
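        The 19+21+27 layout can be illustrated with the standard arithmetic decomposition of a precomposed Hangul syllable. The following is a minimal sketch, not ICU4J code: the class and method names are hypothetical, and it assumes the array stores the L entries first, then V, then T. The arithmetic uses 28 trailing slots because slot 0 means "no trailing consonant", which needs no array entry.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of ICU4J.
// Assumes the array layout is: 19 L entries, then 21 V entries, then 27 T entries.
final class JamoIndexSketch {
    static final int HANGUL_BASE = 0xAC00;   // first precomposed syllable
    static final int HANGUL_LIMIT = 0xD7A4;  // one past the last syllable U+D7A3
    static final int L_COUNT = 19;
    static final int V_COUNT = 21;
    static final int T_COUNT = 28;           // includes the "no trailing consonant" slot

    /** Returns the array indexes of the Jamo that make up the given syllable. */
    static int[] jamoIndexes(int syllable) {
        if (syllable < HANGUL_BASE || syllable >= HANGUL_LIMIT) {
            throw new IllegalArgumentException("not a precomposed Hangul syllable");
        }
        int s = syllable - HANGUL_BASE;
        int t = s % T_COUNT;   // 0 means no trailing consonant
        s /= T_COUNT;
        int v = s % V_COUNT;
        int l = s / V_COUNT;
        if (t == 0) {
            return new int[] {l, L_COUNT + v};
        }
        // T entries start at 19 + 21 = 40; t is in 1..27 here.
        return new int[] {l, L_COUNT + v, L_COUNT + V_COUNT - 1 + t};
    }

    public static void main(String[] args) {
        // U+AC01 decomposes into the first L, the first V, and the first T.
        System.out.println(Arrays.toString(jamoIndexes(0xAC01))); // prints [0, 19, 40]
    }
}
```

        The largest index produced is 66, which matches an array length of 19+21+27 = 67.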
      • numericPrimary

        long numericPrimary
        The single-byte primary weight (xx000000) for numeric collation.
      • compressibleBytes

        public boolean[] compressibleBytes
        256 flags for which primary-weight lead bytes are compressible.
      • unsafeBackwardSet

        UnicodeSet unsafeBackwardSet
        Set of code points that are unsafe for starting string comparison after an identical prefix, or in backwards CE iteration.
      • fastLatinTable

        public char[] fastLatinTable
        Fast Latin table for common-Latin-text string comparisons. For the data structure, see class CollationFastLatin.
      • fastLatinTableHeader

        char[] fastLatinTableHeader
        Header portion of the fastLatinTable. In C++, these are one array, and the header is skipped for mapping characters. In Java, two arrays work better.
      • numScripts

        int numScripts
        Data for scripts and reordering groups. Uses include building a reordering permutation table and providing script boundaries to AlphabeticIndex.
      • scriptsIndex

        char[] scriptsIndex
        The length of scriptsIndex is numScripts+16. It maps from a UScriptCode or a special reorder code to an entry in scriptStarts. 16 special reorder codes (not all used) are mapped starting at numScripts. Up to MAX_NUM_SPECIAL_REORDER_CODES are codes for special groups like space/punct/digit. There are special codes at the end for reorder-reserved primary ranges.

        Multiple scripts may share a range and index, for example Hira & Kana.

      • scriptStarts

        char[] scriptStarts
        Start primary weight (top 16 bits only) for a group/script/reserved range indexed by scriptsIndex. The first range (separators & terminators) and the last range (trailing weights) are not reorderable, and no scriptsIndex entry points to them.
      • rootElements

        public long[] rootElements
        Collation elements in the root collator. Used by the CollationRootElements class. The data structure is described there. null in a tailoring.
    • Method Detail

      • getCE32

        public int getCE32(int c)
      • getCE32FromSupplementary

        int getCE32FromSupplementary(int c)
      • isDigit

        boolean isDigit(int c)
      • isUnsafeBackward

        public boolean isUnsafeBackward(int c,
                                        boolean numeric)
      • isCompressibleLeadByte

        public boolean isCompressibleLeadByte(int b)
      • isCompressiblePrimary

        public boolean isCompressiblePrimary(long p)
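        The relationship between the 256 compressibleBytes flags and these two methods can be sketched as follows. This is a simplified model, not the ICU4J source: the class name and the lead-byte values are hypothetical, and it assumes a primary weight is a 32-bit value whose lead byte sits in the top 8 bits, in line with the xx000000 notation used for numericPrimary.

```java
// Hypothetical illustration, not part of ICU4J.
// Assumes a 32-bit primary weight with its lead byte in bits 31..24.
final class CompressibleBytesSketch {
    // One flag per possible primary-weight lead byte.
    private final boolean[] compressibleBytes = new boolean[256];

    CompressibleBytesSketch(int... compressibleLeadBytes) {
        for (int b : compressibleLeadBytes) {
            compressibleBytes[b & 0xff] = true;
        }
    }

    boolean isCompressibleLeadByte(int b) {
        return compressibleBytes[b];
    }

    boolean isCompressiblePrimary(long p) {
        // Extract the lead byte and reuse the per-byte lookup.
        return isCompressibleLeadByte((int) ((p >>> 24) & 0xff));
    }

    public static void main(String[] args) {
        // 0x2A is an arbitrary example lead byte, not real root collation data.
        CompressibleBytesSketch data = new CompressibleBytesSketch(0x2A);
        System.out.println(data.isCompressiblePrimary(0x2A123400L)); // prints true
        System.out.println(data.isCompressiblePrimary(0x2B000000L)); // prints false
    }
}
```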
      • getCE32FromContexts

        int getCE32FromContexts(int index)
        Returns the CE32 stored in two words of the contexts data, starting at index. Provides access to the defaultCE32 for contraction and prefix matching.
      • getIndirectCE32

        int getIndirectCE32(int ce32)
        Returns the CE32 for an indirect special CE32 (e.g., with DIGIT_TAG). Requires that ce32 is special.
      • getFinalCE32

        int getFinalCE32(int ce32)
        Returns the CE32 for an indirect special CE32 (e.g., with DIGIT_TAG) if ce32 is special; otherwise returns ce32 unchanged.
      • getCEFromOffsetCE32

        long getCEFromOffsetCE32(int c,
                                 int ce32)
        Computes a CE from c's ce32 which has the OFFSET_TAG.
      • getSingleCE

        long getSingleCE(int c)
        Returns the single CE that c maps to. Throws UnsupportedOperationException if c does not map to a single CE.
      • getFCD16

        int getFCD16(int c)
        Returns the FCD16 value for code point c. c must be >= 0.
      • getFirstPrimaryForGroup

        long getFirstPrimaryForGroup(int script)
        Returns the first primary for the script's reordering group.
        Returns:
        the primary with only the first primary lead byte of the group (not necessarily an actual root collator primary weight), or 0 if the script is unknown
      • getLastPrimaryForGroup

        public long getLastPrimaryForGroup(int script)
        Returns the last primary for the script's reordering group.
        Returns:
        the last primary of the group (not an actual root collator primary weight), or 0 if the script is unknown
      • getGroupForPrimary

        public int getGroupForPrimary(long p)
        Finds the reordering group which contains the primary weight.
        Returns:
        the first script of the group, or -1 if the weight is beyond the last group
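        The three group lookups (getFirstPrimaryForGroup, getLastPrimaryForGroup, getGroupForPrimary) can be pictured with a small model of scriptsIndex and scriptStarts. This is a minimal sketch under stated assumptions, not the ICU4J implementation: the class, the toy table values, and the script numbering are hypothetical; a scriptsIndex value of 0 is assumed to mean "unknown script"; the last primary is assumed to be one less than the start of the next range; and special reorder codes are left out for brevity.

```java
// Hypothetical illustration, not part of ICU4J. All table values are toy data.
final class ScriptRangesSketch {
    final int numScripts;
    final char[] scriptsIndex;  // script code -> index into scriptStarts (0 = unknown, assumed)
    final char[] scriptStarts;  // top 16 bits of each range's first primary, ascending

    ScriptRangesSketch(int numScripts, char[] scriptsIndex, char[] scriptStarts) {
        this.numScripts = numScripts;
        this.scriptsIndex = scriptsIndex;
        this.scriptStarts = scriptStarts;
    }

    /** Builds toy data: scripts 1 and 2 share one range, script 3 has no data. */
    static ScriptRangesSketch toy() {
        int numScripts = 4;
        char[] scriptsIndex = new char[numScripts + 16];
        scriptsIndex[0] = 1;
        scriptsIndex[1] = 2;
        scriptsIndex[2] = 2;
        // Range 0 (separators) and the last range (trailing weights) have no index entry.
        char[] scriptStarts = {0x0200, 0x0500, 0x2A00, 0x6000, 0xFF00};
        return new ScriptRangesSketch(numScripts, scriptsIndex, scriptStarts);
    }

    int scriptIndex(int script) {
        return (script < 0 || script >= numScripts) ? 0 : scriptsIndex[script];
    }

    long firstPrimaryForGroup(int script) {
        int index = scriptIndex(script);
        return index == 0 ? 0 : (long) scriptStarts[index] << 16;
    }

    long lastPrimaryForGroup(int script) {
        int index = scriptIndex(script);
        if (index == 0) {
            return 0;
        }
        long limit = scriptStarts[index + 1];
        return (limit << 16) - 1;  // just below the next range's first primary
    }

    int groupForPrimary(long p) {
        int top16 = (int) (p >>> 16);
        if (top16 < scriptStarts[1] || top16 >= scriptStarts[scriptStarts.length - 1]) {
            return -1;  // outside all reorderable ranges
        }
        int index = 1;
        while (top16 >= scriptStarts[index + 1]) {
            ++index;
        }
        for (int i = 0; i < numScripts; ++i) {
            if (scriptsIndex[i] == index) {
                return i;  // first script that maps to this range
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ScriptRangesSketch data = toy();
        System.out.println(Long.toHexString(data.firstPrimaryForGroup(1))); // prints 2a000000
        System.out.println(Long.toHexString(data.lastPrimaryForGroup(1)));  // prints 5fffffff
        System.out.println(data.groupForPrimary(0x30000000L));              // prints 1
    }
}
```

        In the toy data, scripts 1 and 2 return the same first primary because they share a range, in the same way the scriptsIndex description says Hira and Kana may share one.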
      • getScriptIndex

        private int getScriptIndex(int script)
      • getEquivalentScripts

        public int[] getEquivalentScripts(int script)
      • makeReorderRanges

        void makeReorderRanges(int[] reorder,
                               UVector32 ranges)
        Writes the permutation of primary-weight ranges for the given reordering of scripts and groups. The caller checks for illegal arguments and takes care of [DEFAULT] and memory allocation.

        Each list element will be a (limit, offset) pair as described for the CollationSettings.reorderRanges. The list will be empty if no ranges are reordered.

      • makeReorderRanges

        private void makeReorderRanges(int[] reorder,
                                       boolean latinMustMove,
                                       UVector32 ranges)
      • addLowScriptRange

        private int addLowScriptRange(short[] table,
                                      int index,
                                      int lowStart)
      • addHighScriptRange

        private int addHighScriptRange(short[] table,
                                       int index,
                                       int highLimit)
      • scriptCodeString

        private static java.lang.String scriptCodeString(int script)