Class CollationFCD


  • public final class CollationFCD
    extends java.lang.Object
    Data and functions for the FCD check fast path.

    The fast path looks at a pair of 16-bit code units and checks whether there is an FCD boundary between them. There is one if the first unit has a trailing ccc=0 (!hasTccc(first)), if the second unit has a leading ccc=0 (!hasLccc(second)), or both. When the fast path finds a possible non-boundary, the FCD check slow path examines the actual sequence of FCD values.

    This is a pure optimization. The fast path must find every possible non-boundary; if it is too pessimistic, it costs performance but does not affect correctness.

    For a pair of BMP characters, the fast path tests are precise (1 bit per character).

    For a supplementary code point, the two units are its lead and trail surrogates. We set hasTccc(lead)=true if any of its 1024 associated supplementary code points has lccc!=0 or tccc!=0, and hasLccc(trail)=true for all trail surrogates. As a result, we leave the fast path if the lead surrogate might start a supplementary code point that is not FCD-inert. (So the fast path need not detect that there is a surrogate pair, nor look ahead to the next full code point.)

    hasLccc(lead)=true if any of its 1024 associated supplementary code points has lccc!=0; this allows fast boundary checking between a BMP character and a following supplementary code point.

    hasTccc(trail)=false: it should only be tested for unpaired trail surrogates, which are FCD-inert.
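    The fast-path rule can be sketched in Java as follows. This is a hypothetical stand-in, not ICU's code: the class FcdFastPathSketch, its helper methods, and the tiny hand-picked character sets are assumptions for illustration only. It shows the pair test !hasTccc(first) || !hasLccc(second), and how the surrogate rules send the lead surrogate of a non-inert supplementary code point (U+1D165 is D834 DD65) to the slow path without decoding the pair.

```java
import java.util.Set;

// Hypothetical sketch of the FCD fast-path test; the sets below are tiny
// hand-picked stand-ins, NOT ICU's generated lccc/tccc tables.
final class FcdFastPathSketch {
    // BMP code units with leading ccc != 0 (U+0300, U+0301 have ccc=230).
    private static final Set<Integer> BMP_LCCC = Set.of(0x0300, 0x0301);
    // BMP code units with trailing ccc != 0; U+00C0 = A + U+0300 has lccc=0, tccc=230.
    private static final Set<Integer> BMP_TCCC = Set.of(0x00C0, 0x0300, 0x0301);
    // Supplementary code points with lccc != 0 (U+1D165 has ccc=216).
    private static final Set<Integer> SUPP_LCCC = Set.of(0x1D165);
    // Supplementary code points with lccc != 0 or tccc != 0 (not FCD-inert).
    private static final Set<Integer> SUPP_NOT_INERT = Set.of(0x1D165);

    static boolean isLead(int unit) { return unit >= 0xD800 && unit <= 0xDBFF; }
    static boolean isTrail(int unit) { return unit >= 0xDC00 && unit <= 0xDFFF; }

    // True if this lead surrogate starts any code point in cps
    // (each lead surrogate covers 1024 supplementary code points).
    static boolean leadCovers(int lead, Set<Integer> cps) {
        for (int cp : cps) {
            if (0xD800 + ((cp - 0x10000) >> 10) == lead) {
                return true;
            }
        }
        return false;
    }

    static boolean hasTccc(int unit) {
        if (isLead(unit)) return leadCovers(unit, SUPP_NOT_INERT);
        if (isTrail(unit)) return false; // only tested for unpaired, FCD-inert trails
        return BMP_TCCC.contains(unit);
    }

    static boolean hasLccc(int unit) {
        if (isTrail(unit)) return true;  // pessimistic for every trail surrogate
        if (isLead(unit)) return leadCovers(unit, SUPP_LCCC);
        return BMP_LCCC.contains(unit);
    }

    // True if there is certainly an FCD boundary between the two units;
    // false means "possible non-boundary": hand off to the slow path.
    static boolean fastPathBoundary(int first, int second) {
        return !hasTccc(first) || !hasLccc(second);
    }

    public static void main(String[] args) {
        System.out.println(fastPathBoundary('A', 'B'));       // true
        System.out.println(fastPathBoundary(0x00C0, 0x0301)); // false
        System.out.println(fastPathBoundary(0xD834, 0xDD65)); // false (U+1D165)
        System.out.println(fastPathBoundary(0xD800, 0xDC00)); // true (U+10000)
    }
}
```

    Because every trail surrogate reports hasLccc=true, the outcome for a surrogate pair depends only on the lead surrogate's bit, which is why the fast path never needs to look ahead.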
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static int[] lcccBits  
      private static byte[] lcccIndex  
      private static int[] tcccBits  
      private static byte[] tcccIndex  
    • Constructor Summary

      Constructors 
      Constructor Description
      CollationFCD()  
    • Method Summary

      Modifier and Type Method Description
      static boolean hasLccc(int c)
      static boolean hasTccc(int c)
      (package private) static boolean isFCD16OfTibetanCompositeVowel(int fcd16)
      Tibetan composite vowel signs (U+0F73, U+0F75, U+0F81) must be decomposed before reaching the core collation code, or else some sequences including them, even ones passing the FCD check, do not yield canonically equivalent results.
      (package private) static boolean maybeTibetanCompositeVowel(int c)
      Tibetan composite vowel signs (U+0F73, U+0F75, U+0F81) must be decomposed before reaching the core collation code, or else some sequences including them, even ones passing the FCD check, do not yield canonically equivalent results.
      (package private) static boolean mayHaveLccc(int c)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • lcccIndex

        private static final byte[] lcccIndex
      • lcccBits

        private static final int[] lcccBits
      • tcccIndex

        private static final byte[] tcccIndex
      • tcccBits

        private static final int[] tcccBits
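        The field names suggest a two-level lookup: a byte index that selects a shared 32-bit word of flags. The sketch below assumes that layout (index[c >> 5] selects a word, and bit c & 0x1f within it holds the flag) and builds it for the BMP from a set of code units. The class TwoLevelBitSet and its sizing are hypothetical and are not taken from ICU's generated data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-level bitset in the spirit of lcccIndex/lcccBits.
// Assumed layout: index[c >> 5] picks a 32-bit word in bits; word 0 is all zeros.
final class TwoLevelBitSet {
    private final byte[] index; // one entry per 32-unit block of the BMP
    private final int[] bits;   // distinct 32-bit words, shared between blocks

    // members must be BMP code units (0..0xFFFF)
    TwoLevelBitSet(int... members) {
        int[] words = new int[0x10000 >> 5]; // 2048 blocks
        for (int c : members) {
            words[c >> 5] |= 1 << (c & 0x1f);
        }
        List<Integer> distinct = new ArrayList<>();
        distinct.add(0); // shared empty block
        index = new byte[words.length];
        for (int block = 0; block < words.length; block++) {
            int pos = distinct.indexOf(words[block]);
            if (pos < 0) {
                pos = distinct.size();
                distinct.add(words[block]);
            }
            if (pos > 0xff) {
                throw new IllegalStateException("more than 256 distinct words");
            }
            index[block] = (byte) pos; // read back with & 0xff
        }
        bits = distinct.stream().mapToInt(Integer::intValue).toArray();
    }

    boolean contains(int c) {
        if (c < 0 || c > 0xffff) return false;
        int i = index[c >> 5] & 0xff;
        return i != 0 && (bits[i] & (1 << (c & 0x1f))) != 0;
    }

    int wordCount() { return bits.length; }

    public static void main(String[] args) {
        TwoLevelBitSet lccc = new TwoLevelBitSet(0x0300, 0x0301, 0x0302);
        System.out.println(lccc.contains(0x0301)); // true
        System.out.println(lccc.contains(0x0303)); // false
        System.out.println(lccc.wordCount());      // 2
    }
}
```

        Blocks with identical flag words share one entry in the bits array, and most blocks map to the empty word 0, which keeps such tables small.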
    • Constructor Detail

      • CollationFCD

        public CollationFCD()
    • Method Detail

      • hasLccc

        public static boolean hasLccc(int c)
      • hasTccc

        public static boolean hasTccc(int c)
      • mayHaveLccc

        static boolean mayHaveLccc(int c)
      • maybeTibetanCompositeVowel

        static boolean maybeTibetanCompositeVowel(int c)
        Tibetan composite vowel signs (U+0F73, U+0F75, U+0F81) must be decomposed before reaching the core collation code, or else some sequences including them, even ones passing the FCD check, do not yield canonically equivalent results. This is a fast and imprecise test.
        Parameters:
        c - a code point
        Returns:
        true if c is U+0F73, U+0F75 or U+0F81 or one of several other Tibetan characters
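        One way to meet this "fast and imprecise" contract is a single mask test. All three target code points are odd and lie in U+0F00..U+0FFF, so the hypothetical sketch below accepts every odd code point in that block (128 values) and rejects everything else, including negative inputs. It is an illustration consistent with the documented behavior, not a claim about ICU's exact implementation.

```java
// Hypothetical sketch: accept every odd code point in U+0F00..U+0FFF.
final class TibetanCheckSketch {
    static boolean maybeTibetanCompositeVowel(int c) {
        // Keep bits 8..20 and bit 0; they must read 0x0F?? with the low bit set.
        return (c & 0x1fff01) == 0xf01;
    }

    public static void main(String[] args) {
        int count = 0;
        for (int c = 0; c <= 0x10ffff; c++) {
            if (maybeTibetanCompositeVowel(c)) count++;
        }
        System.out.println(count);                               // 128
        System.out.println(maybeTibetanCompositeVowel(0x0F73));  // true
        System.out.println(maybeTibetanCompositeVowel(0x0F72));  // false
    }
}
```

        False positives are acceptable here because a match only triggers a more precise check, such as comparing the character's FCD value.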
      • isFCD16OfTibetanCompositeVowel

        static boolean isFCD16OfTibetanCompositeVowel(int fcd16)
        Tibetan composite vowel signs (U+0F73, U+0F75, U+0F81) must be decomposed before reaching the core collation code, or else some sequences including them, even ones passing the FCD check, do not yield canonically equivalent results. They have distinct lccc/tccc combinations: 129/130 or 129/132.
        Parameters:
        fcd16 - the FCD value (lccc/tccc combination) of a code point
        Returns:
        true if fcd16 is from U+0F73, U+0F75 or U+0F81
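        The two combinations follow from the canonical decompositions: U+0F73 is U+0F71 U+0F72 (ccc 129, 130), U+0F75 is U+0F71 U+0F74 (ccc 129, 132), and U+0F81 is U+0F71 U+0F80 (ccc 129, 130). The sketch below assumes the usual 16-bit packing with lccc in the high byte and tccc in the low byte, so the test reduces to two constant comparisons; the helper fcd16 and the class name are hypothetical.

```java
// Hypothetical sketch, assuming fcd16 = (lccc << 8) | tccc.
final class TibetanFcd16Sketch {
    static int fcd16(int lccc, int tccc) {
        return (lccc << 8) | tccc;
    }

    static boolean isFCD16OfTibetanCompositeVowel(int fcd16) {
        // 129/130 -> 0x8182 (U+0F73, U+0F81); 129/132 -> 0x8184 (U+0F75)
        return fcd16 == 0x8182 || fcd16 == 0x8184;
    }

    public static void main(String[] args) {
        System.out.println(isFCD16OfTibetanCompositeVowel(fcd16(129, 130))); // true
        System.out.println(isFCD16OfTibetanCompositeVowel(fcd16(129, 132))); // true
        System.out.println(isFCD16OfTibetanCompositeVowel(fcd16(0, 230)));   // false
    }
}
```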