Class CharsetUTF8

  • All Implemented Interfaces:
    java.lang.Comparable<java.nio.charset.Charset>
    Direct Known Subclasses:
    CharsetCESU8

    class CharsetUTF8
    extends CharsetICU
    • Field Detail

      • fromUSubstitution

        private static final byte[] fromUSubstitution
      • BITMASK_FROM_UTF8

        private static final int[] BITMASK_FROM_UTF8
      • isCESU8

        private final boolean isCESU8
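        The three fields above are private, and this page does not document their contents. As a hedged illustration only: a table named BITMASK_FROM_UTF8 would plausibly hold, per sequence length, the mask that keeps the payload bits of a UTF-8 lead byte, and fromUSubstitution would plausibly hold the UTF-8 bytes of U+FFFD REPLACEMENT CHARACTER. The Java sketch below assumes both layouts; the class Utf8MaskDecodeSketch and its helpers lengthOf and decodeAt are hypothetical and not part of ICU4J.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: decodes one well-formed UTF-8 sequence using a
// length-indexed mask table. The table layout is an assumption suggested by
// the field name BITMASK_FROM_UTF8, not copied from ICU4J.
final class Utf8MaskDecodeSketch {
    // Index = sequence length; each mask keeps the payload bits of the lead byte.
    static final int[] BITMASK_FROM_UTF8 = { 0, 0x7F, 0x1F, 0x0F, 0x07 };

    // Assumed content of fromUSubstitution: U+FFFD encoded as UTF-8 (EF BF BD).
    static final byte[] FROM_U_SUBSTITUTION = { (byte) 0xEF, (byte) 0xBF, (byte) 0xBD };

    // Sequence length implied by a lead byte (RFC 3629 ranges), or 0 if invalid.
    static int lengthOf(int lead) {
        if (lead < 0x80) return 1;
        if (lead >= 0xC2 && lead <= 0xDF) return 2;
        if (lead >= 0xE0 && lead <= 0xEF) return 3;
        if (lead >= 0xF0 && lead <= 0xF4) return 4;
        return 0;
    }

    // Decodes the sequence starting at bytes[i]; trail bytes are NOT validated.
    static int decodeAt(byte[] bytes, int i) {
        int lead = bytes[i] & 0xFF;
        int len = lengthOf(lead);
        if (len == 0) throw new IllegalArgumentException("not a lead byte: " + lead);
        int cp = lead & BITMASK_FROM_UTF8[len];
        for (int k = 1; k < len; k++) {
            cp = (cp << 6) | (bytes[i + k] & 0x3F); // each trail byte carries 6 bits
        }
        return cp;
    }

    public static void main(String[] args) {
        byte[] euro = "\u20AC".getBytes(StandardCharsets.UTF_8); // E2 82 AC
        System.out.printf("U+%04X%n", decodeAt(euro, 0));        // prints U+20AC
        System.out.println(Arrays.equals(FROM_U_SUBSTITUTION,
                "\uFFFD".getBytes(StandardCharsets.UTF_8)));      // prints true
    }
}
```

        Because decodeAt trusts the trail bytes, a real decoder must also validate them (for example the restricted second-byte ranges after E0, ED, F0 and F4) before combining bits; that validation is omitted here.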
    • Constructor Detail

      • CharsetUTF8

        public CharsetUTF8(java.lang.String icuCanonicalName,
                           java.lang.String javaCanonicalName,
                           java.lang.String[] aliases)
    • Method Detail

      • encodeHeadOf1

        private static final byte encodeHeadOf1(int char32)
      • encodeHeadOf2

        private static final byte encodeHeadOf2(int char32)
      • encodeHeadOf3

        private static final byte encodeHeadOf3(int char32)
      • encodeHeadOf4

        private static final byte encodeHeadOf4(int char32)
      • encodeThirdToLastTail

        private static final byte encodeThirdToLastTail(int char32)
      • encodeSecondToLastTail

        private static final byte encodeSecondToLastTail(int char32)
      • encodeLastTail

        private static final byte encodeLastTail(int char32)
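        The seven private helpers above split a code point into UTF-8 bytes: a head byte whose high bits encode the sequence length, followed by up to three tail bytes of the form 10xxxxxx. Their bodies are not shown on this page, so the Java sketch below re-creates them from the standard UTF-8 bit layout (RFC 3629), assuming char32 is a valid code point. The class Utf8EncodeSketch, its encode method and the cesu8 parameter are hypothetical; cesu8 only illustrates the difference the isCESU8 field suggests, where a supplementary code point becomes two 3-byte surrogate sequences instead of one 4-byte sequence.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical re-creation of the encode helpers using the standard UTF-8
// bit layout; ICU4J's private implementations may differ in detail.
final class Utf8EncodeSketch {
    static byte encodeHeadOf1(int c) { return (byte) c; }                  // 0xxxxxxx
    static byte encodeHeadOf2(int c) { return (byte) (0xC0 | (c >> 6)); }  // 110xxxxx
    static byte encodeHeadOf3(int c) { return (byte) (0xE0 | (c >> 12)); } // 1110xxxx
    static byte encodeHeadOf4(int c) { return (byte) (0xF0 | (c >> 18)); } // 11110xxx
    static byte encodeThirdToLastTail(int c)  { return (byte) (0x80 | ((c >> 12) & 0x3F)); }
    static byte encodeSecondToLastTail(int c) { return (byte) (0x80 | ((c >> 6) & 0x3F)); }
    static byte encodeLastTail(int c)         { return (byte) (0x80 | (c & 0x3F)); }

    // Encodes one BMP value (including a lone surrogate, as CESU-8 needs) in 1 to 3 bytes.
    static void encodeBmp(int c, ByteArrayOutputStream out) {
        if (c < 0x80) {
            out.write(encodeHeadOf1(c));
        } else if (c < 0x800) {
            out.write(encodeHeadOf2(c));
            out.write(encodeLastTail(c));
        } else {
            out.write(encodeHeadOf3(c));
            out.write(encodeSecondToLastTail(c));
            out.write(encodeLastTail(c));
        }
    }

    // Hypothetical flag: cesu8 = true writes a supplementary code point as its
    // two UTF-16 surrogates, each encoded as a 3-byte sequence.
    static byte[] encode(int cp, boolean cesu8) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (cp < 0x10000) {
            encodeBmp(cp, out);
        } else if (cesu8) {
            for (char unit : Character.toChars(cp)) encodeBmp(unit, out);
        } else {
            out.write(encodeHeadOf4(cp));
            out.write(encodeThirdToLastTail(cp));
            out.write(encodeSecondToLastTail(cp));
            out.write(encodeLastTail(cp));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(encode(cp, false), jdk)); // prints true
        System.out.println(encode(cp, true).length);                // prints 6
    }
}
```

        Splitting head and tail computation into separate one-line helpers keeps each byte's bit arithmetic independent, so the 2-, 3- and 4-byte paths differ only in which helpers they call.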
      • newDecoder

        public java.nio.charset.CharsetDecoder newDecoder()
        Specified by:
        newDecoder in class java.nio.charset.Charset
      • newEncoder

        public java.nio.charset.CharsetEncoder newEncoder()
        Specified by:
        newEncoder in class java.nio.charset.Charset
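        newDecoder() and newEncoder() return ordinary java.nio.charset coders, so client code drives them through the standard CharsetDecoder and CharsetEncoder API. The sketch below uses the JDK's StandardCharsets.UTF_8 as a stand-in so that it runs without ICU4J; because the helpers accept any Charset, an ICU4J instance (for example one returned by CharsetICU.forNameICU("UTF-8"), assuming the icu4j-charset module is on the classpath) could be passed instead, though its replacement and error behavior may not match the JDK's exactly. CoderUsageSketch, decodeLenient and encodeStrict are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class CoderUsageSketch {
    // Decodes leniently: malformed bytes are replaced by the decoder's replacement string.
    static String decodeLenient(Charset cs, byte[] bytes) throws CharacterCodingException {
        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return dec.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Encodes strictly: an unpaired surrogate raises an exception instead of being replaced.
    static byte[] encodeStrict(Charset cs, String s) throws CharacterCodingException {
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        Charset cs = StandardCharsets.UTF_8; // stand-in for an ICU4J UTF-8 charset
        byte[] bad = { 'h', 'i', (byte) 0xFF };
        System.out.println(decodeLenient(cs, bad).equals("hi\uFFFD")); // prints true
        try {
            encodeStrict(cs, "\uD800");
            System.out.println("encoded");
        } catch (CharacterCodingException e) {
            System.out.println("rejected"); // prints rejected
        }
    }
}
```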
      • getUnicodeSetImpl

        void getUnicodeSetImpl(UnicodeSet setFillIn,
                               int which)
        Fills setFillIn with the set of Unicode code points that this converter can convert; which selects the kind of set (for example CharsetICU.ROUNDTRIP_SET). This is the implementation hook behind the public CharsetICU.getUnicodeSet(UnicodeSet, int) method.
        Specified by:
        getUnicodeSetImpl in class CharsetICU
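        getUnicodeSetImpl fills a caller-supplied UnicodeSet (setFillIn), presumably with the code points the converter can handle. For standard UTF-8 the natural candidate is every Unicode scalar value: all code points from U+0000 to U+10FFFF except the surrogates U+D800 to U+DFFF. Whether ICU4J fills exactly this set for CharsetUTF8 (and for its subclass CharsetCESU8) is an assumption here. The sketch models the set as inclusive ranges instead of an ICU UnicodeSet, so it runs with the JDK alone, and cross-checks membership against the JDK encoder's canEncode. Utf8SetSketch is a hypothetical name.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class Utf8SetSketch {
    // Assumed set contents: every code point except surrogates, stored as
    // inclusive [start, end] ranges (a stand-in for an ICU UnicodeSet).
    static final int[][] RANGES = { { 0x0000, 0xD7FF }, { 0xE000, 0x10FFFF } };

    static boolean contains(int cp) {
        for (int[] r : RANGES) {
            if (cp >= r[0] && cp <= r[1]) return true;
        }
        return false;
    }

    // Number of code points in the set.
    static long size() {
        long n = 0;
        for (int[] r : RANGES) n += r[1] - r[0] + 1L;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(size()); // prints 1112064
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        // Cross-check a few points against the JDK's UTF-8 encoder.
        for (int cp : new int[] { 0x41, 0xD800, 0xFFFD, 0x10FFFF }) {
            boolean jdk = enc.canEncode(new String(Character.toChars(cp)));
            System.out.println(Integer.toHexString(cp) + " " + contains(cp) + " " + jdk);
        }
    }
}
```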