Package com.ibm.icu.charset
Class UConverterDataReader
java.lang.Object
    com.ibm.icu.charset.UConverterDataReader
final class UConverterDataReader extends java.lang.Object
ucnvmbcs.h

ICU conversion (.cnv) data file structure, following the usual UDataInfo header.

Format version: 6.2

struct UConverterStaticData -- struct containing the converter name, IBM CCSID,
                               min/max bytes per character, etc.
                               see ucnv_bld.h

--------------------

The static data is followed by conversionType-specific data structures.
At the moment, there are only variations of MBCS converters. They all have
the same toUnicode structures, while the fromUnicode structures for SBCS
differ from those for other MBCS-style converters.

_MBCSHeader.version 4.2 adds an optional conversion extension data structure.
If it is present, then an ICU version reading header versions 4.0 or 4.1
will be able to use the base table and ignore the extension.

The unicodeMask in the static data is part of the base table data structure.
In particular, the UCNV_HAS_SUPPLEMENTARY flag determines the length of the
fromUnicode stage 1 array.
The static data unicodeMask refers only to the base table's properties if
a base table is included.
In an extension-only file, the static data unicodeMask is 0.
The extension data indexes have a separate field with the unicodeMask flags.

MBCS-style data structure following the static data.
Offsets are counted in bytes from the beginning of the MBCS header structure.
Details about usage are in the comments in ucnvmbcs.c.

struct _MBCSHeader (see the definition in this header file below)
contains 32-bit fields as follows:
8 values:
 0  uint8_t[4]  MBCS version in UVersionInfo format (currently 4.2.0.0)
 1  uint32_t    countStates
 2  uint32_t    countToUFallbacks
 3  uint32_t    offsetToUCodeUnits
 4  uint32_t    offsetFromUTable
 5  uint32_t    offsetFromUBytes
 6  uint32_t    flags, bits:
                    31.. 8 offsetExtension -- _MBCSHeader.version 4.2 (ICU 2.8) and higher
                                              0 for older versions and if
                                              there is no extension structure
                     7.. 0 outputType
 7  uint32_t    fromUBytesLength -- _MBCSHeader.version 4.1 (ICU 2.4) and higher
                counts bytes in fromUBytes[]

if(outputType==MBCS_OUTPUT_EXT_ONLY) {
    -- base table name for extension-only table
    char baseTableName[variable]; -- with NUL plus padding for 4-alignment

    -- all _MBCSHeader fields except for version and flags are 0
} else {
    -- normal base table with optional extension

    int32_t stateTable[countStates][256];

    struct _MBCSToUFallback { (fallbacks are sorted by offset)
        uint32_t offset;
        UChar32 codePoint;
    } toUFallbacks[countToUFallbacks];

    uint16_t unicodeCodeUnits[(offsetFromUTable-offsetToUCodeUnits)/2];
                 (padded to an even number of units)

    -- stage 1 tables
    if(staticData.unicodeMask&UCNV_HAS_SUPPLEMENTARY) {
        -- stage 1 table for all of Unicode
        uint16_t fromUTable[0x440]; (32-bit-aligned)
    } else {
        -- BMP-only tables have a smaller stage 1 table
        uint16_t fromUTable[0x40]; (32-bit-aligned)
    }

    -- stage 2 tables
       length determined by top of stage 1 and bottom of stage 3 tables
    if(outputType==MBCS_OUTPUT_1) {
        -- SBCS: pure indexes
        uint16_t stage 2 indexes[?];
    } else {
        -- DBCS, MBCS, EBCDIC_STATEFUL, ...: roundtrip flags and indexes
        uint32_t stage 2 flags and indexes[?];
    }

    -- stage 3 tables with byte results
    if(outputType==MBCS_OUTPUT_1) {
        -- SBCS: each 16-bit result contains flags and the result byte, see ucnvmbcs.c
        uint16_t fromUBytes[fromUBytesLength/2];
    } else {
        -- DBCS, MBCS, EBCDIC_STATEFUL, ... 2/3/4 bytes result, see ucnvmbcs.c
        uint8_t fromUBytes[fromUBytesLength]; or
        uint16_t fromUBytes[fromUBytesLength/2]; or
        uint32_t fromUBytes[fromUBytesLength/4];
    }
}

-- extension table, details see ucnv_ext.h
int32_t indexes[>=32]; ...
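The header layout above can be illustrated with a small sketch. This is not the ICU implementation: the class MbcsHeaderSketch and its methods are hypothetical names invented here, the buffer is assumed to already be positioned at the start of the MBCS header with the correct byte order set by the caller, and the values written in main are made up for the demonstration.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper (not part of ICU): decodes the eight 32-bit _MBCSHeader fields. */
final class MbcsHeaderSketch {
    final int[] version = new int[4]; // field 0: UVersionInfo bytes, e.g. 4.2.0.0
    int countStates;                  // field 1
    int countToUFallbacks;            // field 2
    int offsetToUCodeUnits;           // field 3
    int offsetFromUTable;             // field 4
    int offsetFromUBytes;             // field 5
    int flags;                        // field 6: offsetExtension (bits 31..8) and outputType (bits 7..0)
    int fromUBytesLength;             // field 7

    /** Reads 32 bytes from the buffer's current position. */
    static MbcsHeaderSketch read(ByteBuffer in) {
        MbcsHeaderSketch h = new MbcsHeaderSketch();
        for (int i = 0; i < 4; i++) {
            h.version[i] = in.get() & 0xff;
        }
        h.countStates = in.getInt();
        h.countToUFallbacks = in.getInt();
        h.offsetToUCodeUnits = in.getInt();
        h.offsetFromUTable = in.getInt();
        h.offsetFromUBytes = in.getInt();
        h.flags = in.getInt();
        h.fromUBytesLength = in.getInt();
        return h;
    }

    /** Bits 31..8 of flags; 0 when there is no extension structure. */
    int offsetExtension() {
        return flags >>> 8;
    }

    /** Bits 7..0 of flags. */
    int outputType() {
        return flags & 0xff;
    }

    /** Number of uint16_t entries in the fromUnicode stage 1 table. */
    static int stage1Length(boolean hasSupplementary) {
        return hasSupplementary ? 0x440 : 0x40;
    }

    public static void main(String[] args) {
        // Made-up header values, written in the buffer's default (big-endian) order.
        ByteBuffer buf = ByteBuffer.allocate(32);
        buf.put(new byte[] {4, 2, 0, 0});
        buf.putInt(3).putInt(0).putInt(3104).putInt(4000).putInt(5000);
        buf.putInt((0x1234 << 8) | 0x01).putInt(512);
        buf.flip();

        MbcsHeaderSketch h = read(buf);
        System.out.printf("version %d.%d, countStates=%d%n", h.version[0], h.version[1], h.countStates);
        System.out.printf("offsetExtension=0x%x, outputType=%d%n", h.offsetExtension(), h.outputType());
        System.out.printf("stage 1 entries (BMP only)=0x%x%n", stage1Length(false));
    }
}
```

The unsigned shift (`>>>`) matters for offsetExtension: the flags word is a uint32_t in the file, and an arithmetic shift would sign-extend when bit 31 is set.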
-
Nested Class Summary
Nested Classes
Modifier and Type  Class  Description
private static class UConverterDataReader.IsAcceptable
-
Field Summary
Fields
Modifier and Type  Field  Description
private java.nio.ByteBuffer byteBuffer
    ICU data file input stream
private static int DATA_FORMAT_ID
    File format version that this class understands.
private static UConverterDataReader.IsAcceptable IS_ACCEPTABLE
private int posAfterStaticData
    The buffer position after the static data.
-
Constructor Summary
Constructors
Modifier  Constructor  Description
protected UConverterDataReader(java.nio.ByteBuffer bytes)
    Protected constructor.
-
Method Summary
Methods
Modifier and Type  Method  Description
(package private) int bytesReadAfterStaticData()
(package private) boolean dataFormatHasUnicodeMask()
    Data formatVersion 6.1 and higher has a unicodeMask.
protected java.lang.String readBaseTableName()
protected java.nio.ByteBuffer readExtIndexes(int skip)
protected void readMBCSHeader(CharsetMBCS.MBCSHeader h)
protected void readMBCSTable(CharsetMBCS.MBCSHeader header, CharsetMBCS.UConverterMBCSTable mbcsTable)
protected void readStaticData(UConverterStaticData sd)
-
Field Detail
-
IS_ACCEPTABLE
private static final UConverterDataReader.IsAcceptable IS_ACCEPTABLE
-
posAfterStaticData
private int posAfterStaticData
The buffer position after the static data.
-
byteBuffer
private java.nio.ByteBuffer byteBuffer
ICU data file input stream
-
DATA_FORMAT_ID
private static final int DATA_FORMAT_ID
File format version that this class understands. No guarantees are made if an older version is used. See store.c of gennorm for more information and values.
- See Also:
- Constant Field Values
-
Method Detail
-
readStaticData
protected void readStaticData(UConverterStaticData sd) throws java.io.IOException
- Throws:
java.io.IOException
-
bytesReadAfterStaticData
int bytesReadAfterStaticData()
-
readMBCSHeader
protected void readMBCSHeader(CharsetMBCS.MBCSHeader h) throws java.io.IOException
- Throws:
java.io.IOException
-
readMBCSTable
protected void readMBCSTable(CharsetMBCS.MBCSHeader header, CharsetMBCS.UConverterMBCSTable mbcsTable) throws java.io.IOException
- Throws:
java.io.IOException
-
readBaseTableName
protected java.lang.String readBaseTableName() throws java.io.IOException
- Throws:
java.io.IOException
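The format notes for this class describe the base table name of an extension-only file as a NUL-terminated char array padded for 4-alignment. A minimal sketch of reading such a field follows. It is not the ICU implementation: BaseTableNameSketch and readPaddedName are hypothetical names, the name is assumed to contain only ASCII characters, and alignment is assumed to be measured from the position where the name starts.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper (not part of ICU): reads a NUL-terminated, 4-byte-padded name. */
final class BaseTableNameSketch {
    static String readPaddedName(ByteBuffer in) {
        int start = in.position();
        StringBuilder name = new StringBuilder();
        byte b;
        // Collect bytes up to, but not including, the terminating NUL.
        while ((b = in.get()) != 0) {
            name.append((char) (b & 0xff));
        }
        // Skip padding so that name + NUL + padding is a multiple of 4 bytes.
        int consumed = in.position() - start;
        int padding = (4 - (consumed & 3)) & 3;
        in.position(in.position() + padding);
        return name.toString();
    }

    public static void main(String[] args) {
        // "base" is a made-up table name: 4 chars + NUL = 5 bytes, so 3 padding bytes follow.
        ByteBuffer buf = ByteBuffer.allocate(12);
        buf.put(new byte[] {'b', 'a', 's', 'e', 0, 0, 0, 0});
        buf.putInt(42); // stand-in for whatever data follows the name
        buf.flip();

        System.out.println(readPaddedName(buf));
        System.out.println(buf.getInt());
    }
}
```

If the buffer ends before a NUL byte is found, `ByteBuffer.get()` throws `BufferUnderflowException`; a production reader would likely translate that into a format error.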
-
readExtIndexes
protected java.nio.ByteBuffer readExtIndexes(int skip) throws java.io.IOException, InvalidFormatException
- Throws:
java.io.IOException
InvalidFormatException
-
dataFormatHasUnicodeMask
boolean dataFormatHasUnicodeMask()
Data formatVersion 6.1 and higher has a unicodeMask.
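The "6.1 and higher" rule amounts to a major/minor version comparison. The sketch below shows one way to express it, assuming the format version is available as a 4-byte array in the style of UDataInfo (major, minor, milli, micro). FormatVersionSketch and hasUnicodeMask are hypothetical names and may not match how the class stores or checks the version.

```java
/** Hypothetical helper (not part of ICU): tests whether a format version is at least 6.1. */
final class FormatVersionSketch {
    static boolean hasUnicodeMask(byte[] formatVersion) {
        // Mask to treat the version bytes as unsigned.
        int major = formatVersion[0] & 0xff;
        int minor = formatVersion[1] & 0xff;
        return major > 6 || (major == 6 && minor >= 1);
    }

    public static void main(String[] args) {
        System.out.println(hasUnicodeMask(new byte[] {6, 0, 0, 0}));
        System.out.println(hasUnicodeMask(new byte[] {6, 2, 0, 0}));
    }
}
```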
-