Class UConverterDataReader


  • final class UConverterDataReader
    extends java.lang.Object
    ucnvmbcs.h

    ICU conversion (.cnv) data file structure, following the usual UDataInfo header.

    Format version: 6.2

    struct UConverterStaticData -- struct containing the converter name, IBM CCSID,
                                   min/max bytes per character, etc.
                                   see ucnv_bld.h

    --------------------

    The static data is followed by conversionType-specific data structures.
    At the moment, there are only variations of MBCS converters. They all have
    the same toUnicode structures, while the fromUnicode structures for SBCS
    differ from those for other MBCS-style converters.

    _MBCSHeader.version 4.2 adds an optional conversion extension data structure.
    If it is present, then an ICU version reading header versions 4.0 or 4.1
    will be able to use the base table and ignore the extension.

    The unicodeMask in the static data is part of the base table data structure.
    Especially, the UCNV_HAS_SUPPLEMENTARY flag determines the length of the
    fromUnicode stage 1 array.
    The static data unicodeMask refers only to the base table's properties if
    a base table is included.
    In an extension-only file, the static data unicodeMask is 0.
    The extension data indexes have a separate field with the unicodeMask flags.

    MBCS-style data structure following the static data.
    Offsets are counted in bytes from the beginning of the MBCS header structure.
    Details about usage in comments in ucnvmbcs.c.

    struct _MBCSHeader (see the definition in this header file below)
    contains 32-bit fields as follows:
    8 values:
     0   uint8_t[4]  MBCS version in UVersionInfo format (currently 4.2.0.0)
     1   uint32_t    countStates
     2   uint32_t    countToUFallbacks
     3   uint32_t    offsetToUCodeUnits
     4   uint32_t    offsetFromUTable
     5   uint32_t    offsetFromUBytes
     6   uint32_t    flags, bits:
                         31.. 8 offsetExtension -- _MBCSHeader.version 4.2 (ICU 2.8) and higher
                                                   0 for older versions and if
                                                   there is not extension structure
                          7.. 0 outputType
     7   uint32_t    fromUBytesLength -- _MBCSHeader.version 4.1 (ICU 2.4) and higher
                     counts bytes in fromUBytes[]

    if(outputType==MBCS_OUTPUT_EXT_ONLY) {
        -- base table name for extension-only table
        char baseTableName[variable]; -- with NUL plus padding for 4-alignment

        -- all _MBCSHeader fields except for version and flags are 0
    } else {
        -- normal base table with optional extension

        int32_t stateTable[countStates][256];

        struct _MBCSToUFallback { (fallbacks are sorted by offset)
            uint32_t offset;
            UChar32 codePoint;
        } toUFallbacks[countToUFallbacks];

        uint16_t unicodeCodeUnits[(offsetFromUTable-offsetToUCodeUnits)/2];
                     (padded to an even number of units)

        -- stage 1 tables
        if(staticData.unicodeMask&UCNV_HAS_SUPPLEMENTARY) {
            -- stage 1 table for all of Unicode
            uint16_t fromUTable[0x440]; (32-bit-aligned)
        } else {
            -- BMP-only tables have a smaller stage 1 table
            uint16_t fromUTable[0x40]; (32-bit-aligned)
        }

        -- stage 2 tables
           length determined by top of stage 1 and bottom of stage 3 tables
        if(outputType==MBCS_OUTPUT_1) {
            -- SBCS: pure indexes
            uint16_t stage 2 indexes[?];
        } else {
            -- DBCS, MBCS, EBCDIC_STATEFUL, ...: roundtrip flags and indexes
            uint32_t stage 2 flags and indexes[?];
        }

        -- stage 3 tables with byte results
        if(outputType==MBCS_OUTPUT_1) {
            -- SBCS: each 16-bit result contains flags and the result byte, see ucnvmbcs.c
            uint16_t fromUBytes[fromUBytesLength/2];
        } else {
            -- DBCS, MBCS, EBCDIC_STATEFUL, ... 2/3/4 bytes result, see ucnvmbcs.c
            uint8_t fromUBytes[fromUBytesLength]; or
            uint16_t fromUBytes[fromUBytesLength/2]; or
            uint32_t fromUBytes[fromUBytesLength/4];
        }
    }

    -- extension table, details see ucnv_ext.h
    int32_t indexes[>=32]; ...
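The eight-field header layout described above can be illustrated with a minimal sketch. It is not the ICU4J implementation: the class name MbcsHeaderSketch and its members are hypothetical, and it assumes the caller has already set the buffer's byte order to match the data file. It shows how field 6 splits into offsetExtension (bits 31..8) and outputType (bits 7..0), and how the stage 1 length depends on UCNV_HAS_SUPPLEMENTARY.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: decodes the eight 32-bit _MBCSHeader fields. */
final class MbcsHeaderSketch {
    final byte[] version = new byte[4]; // field 0: UVersionInfo, e.g. 4.2.0.0
    int countStates;                    // field 1
    int countToUFallbacks;              // field 2
    int offsetToUCodeUnits;             // field 3
    int offsetFromUTable;               // field 4
    int offsetFromUBytes;               // field 5
    int offsetExtension;                // field 6, bits 31..8
    int outputType;                     // field 6, bits 7..0
    int fromUBytesLength;               // field 7, meaningful for version 4.1 and higher

    /** Reads 32 bytes from the buffer's current position. */
    static MbcsHeaderSketch read(ByteBuffer buf) {
        MbcsHeaderSketch h = new MbcsHeaderSketch();
        buf.get(h.version);
        h.countStates = buf.getInt();
        h.countToUFallbacks = buf.getInt();
        h.offsetToUCodeUnits = buf.getInt();
        h.offsetFromUTable = buf.getInt();
        h.offsetFromUBytes = buf.getInt();
        int flags = buf.getInt();
        h.offsetExtension = flags >>> 8; // unsigned shift keeps the upper 24 bits
        h.outputType = flags & 0xff;     // low byte
        h.fromUBytesLength = buf.getInt();
        return h;
    }

    /** Stage 1 length in 16-bit units: 0x440 for all of Unicode, 0x40 for BMP-only. */
    static int stage1Length(boolean hasSupplementary) {
        return hasSupplementary ? 0x440 : 0x40;
    }
}
```

A caller would position the buffer at the start of the MBCS header before calling read; offsets in the result are then relative to that position, as the description states.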
    • Field Detail

      • posAfterStaticData

        private int posAfterStaticData
        The buffer position after the static data.
      • byteBuffer

        private java.nio.ByteBuffer byteBuffer
        Buffer holding the ICU data file input.
      • DATA_FORMAT_ID

        private static final int DATA_FORMAT_ID
        File format version that this class understands. No guarantees are made if an older version is used. See store.c of gennorm for more information and values.
        See Also:
        Constant Field Values
    • Constructor Detail

      • UConverterDataReader

        protected UConverterDataReader​(java.nio.ByteBuffer bytes)
                                throws java.io.IOException

        Protected constructor.

        Parameters:
        bytes - ICU conversion data file
        Throws:
        java.io.IOException - thrown if the data file fails authentication
    • Method Detail

      • readStaticData

        protected void readStaticData​(UConverterStaticData sd)
                               throws java.io.IOException
        Throws:
        java.io.IOException
      • bytesReadAfterStaticData

        int bytesReadAfterStaticData()
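This method has no description in the source. One plausible reading, given the posAfterStaticData field, is that it reports how far the buffer has advanced since the static data ended. The sketch below illustrates that idea only; PositionTrackingSketch and skipStaticData are hypothetical names, and the real class may compute the value differently.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: tracks how many bytes were consumed after the static data. */
final class PositionTrackingSketch {
    private final ByteBuffer byteBuffer;
    private int posAfterStaticData;

    PositionTrackingSketch(ByteBuffer byteBuffer) {
        this.byteBuffer = byteBuffer;
    }

    /** Stands in for reading the static data: skips it and records the position. */
    void skipStaticData(int staticDataSize) {
        byteBuffer.position(byteBuffer.position() + staticDataSize);
        posAfterStaticData = byteBuffer.position();
    }

    /** Assumed meaning: current position minus the position recorded above. */
    int bytesReadAfterStaticData() {
        return byteBuffer.position() - posAfterStaticData;
    }

    ByteBuffer buffer() {
        return byteBuffer;
    }
}
```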
      • readMBCSHeader

        protected void readMBCSHeader​(CharsetMBCS.MBCSHeader h)
                               throws java.io.IOException
        Throws:
        java.io.IOException
      • readBaseTableName

        protected java.lang.String readBaseTableName()
                                              throws java.io.IOException
        Throws:
        java.io.IOException
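The class description says an extension-only file stores baseTableName as a NUL-terminated string padded for 4-alignment. A minimal sketch of reading such a field follows. BaseTableNameSketch is a hypothetical name, the code assumes the name starts at a 4-aligned position so padding can be computed from the bytes consumed, and the actual readBaseTableName may leave the padding for the caller to handle.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: reads a NUL-terminated name and skips 4-alignment padding. */
final class BaseTableNameSketch {
    static String readName(ByteBuffer buf) {
        int start = buf.position();
        StringBuilder name = new StringBuilder();
        byte b;
        // get() throws BufferUnderflowException if no NUL terminator is found.
        while ((b = buf.get()) != 0) {
            name.append((char) (b & 0xff)); // treat each byte as one character
        }
        int consumed = buf.position() - start;   // name bytes plus the NUL
        int padding = (4 - (consumed & 3)) & 3;  // bytes needed to reach a multiple of 4
        buf.position(buf.position() + padding);
        return name.toString();
    }
}
```

For a 5-character name, the name and its NUL occupy 6 bytes, so 2 padding bytes are skipped and the buffer ends up 8 bytes past the start.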
      • dataFormatHasUnicodeMask

        boolean dataFormatHasUnicodeMask()
        Data formatVersion 6.1 and higher has a unicodeMask.
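The "6.1 and higher" condition amounts to a two-part version comparison. The sketch below shows that comparison in isolation; FormatVersionSketch and its parameters are hypothetical, and it assumes the major and minor numbers have already been read from the data file header as unsigned bytes.

```java
/** Hypothetical sketch: tests whether a format version is 6.1 or higher. */
final class FormatVersionSketch {
    static boolean hasUnicodeMask(int major, int minor) {
        // Any major version above 6 qualifies; within 6, the minor must be at least 1.
        return major > 6 || (major == 6 && minor >= 1);
    }
}
```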