Class UnicodeCompressor
- All Implemented Interfaces:
SCSU
SCSU works by using dynamically positioned windows, each consisting of 128 consecutive Unicode characters. During compression, characters within a window are encoded in the compressed stream as the bytes 0x80 - 0xFF. SCSU provides transparency for the characters (bytes) from U+0000 to U+00FF. SCSU approximates the storage size of traditional character sets: for example, 1 byte per character for ASCII or Latin-1 text, and 2 bytes per character for CJK ideographs.
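The window mechanism described above can be sketched in a few lines. This is a minimal illustration, not the ICU4J implementation: the class name `WindowByteSketch`, the method `encodeInWindow`, and the window position `0x0380` are hypothetical, and the sketch assumes that the 128 characters of a window map one-to-one onto the byte values 0x80 through 0xFF.

```java
// Illustrative sketch only; names and the sample offset are assumptions.
final class WindowByteSketch {
    // A window covers 128 consecutive characters.
    static final int WINDOW_SIZE = 128;

    /**
     * Returns the single byte value (0x80..0xFF) for character c if it lies
     * inside the window starting at windowOffset, or -1 if it does not.
     */
    static int encodeInWindow(int c, int windowOffset) {
        int delta = c - windowOffset;
        if (delta < 0 || delta >= WINDOW_SIZE) {
            return -1; // c is outside this window
        }
        return 0x80 + delta;
    }

    public static void main(String[] args) {
        int windowOffset = 0x0380; // hypothetical window position covering Greek
        // GREEK SMALL LETTER ALPHA (U+03B1) is 0x31 positions into the window.
        System.out.printf("0x%02X%n", encodeInWindow(0x03B1, windowOffset)); // prints 0xB1
    }
}
```

With a window positioned this way, a run of Greek text costs one byte per character, which is the "traditional character set" storage size the description refers to.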
USAGE
The static methods on UnicodeCompressor may be used in a straightforward manner to compress simple strings:
    String s = ... ; // get string from somewhere
    byte[] compressed = UnicodeCompressor.compress(s);
The static methods have a fairly large memory footprint. For finer-grained control over memory usage, UnicodeCompressor offers more powerful APIs allowing iterative compression:
    // Compress an array "chars" of length "len" using a buffer of 512 bytes
    // to the OutputStream "out"
    UnicodeCompressor myCompressor = new UnicodeCompressor();
    final int BUFSIZE = 512;
    byte[] byteBuffer = new byte[BUFSIZE];
    int bytesWritten = 0;
    int[] unicharsRead = new int[1];
    int totalCharsCompressed = 0;
    int totalBytesWritten = 0;

    do {
        // do the compression
        bytesWritten = myCompressor.compress(chars, totalCharsCompressed, len,
                                             unicharsRead, byteBuffer, 0, BUFSIZE);

        // do something with the current set of bytes
        out.write(byteBuffer, 0, bytesWritten);

        // update the no. of characters compressed
        totalCharsCompressed += unicharsRead[0];

        // update the no. of bytes written
        totalBytesWritten += bytesWritten;
    } while (totalCharsCompressed < len);

    myCompressor.reset(); // reuse compressor
-
Field Summary
Fields
Modifier and Type | Field | Description
private int | fCurrentWindow | Alias to current dynamic window
private int[] | fIndexCount | Keeps count of times character indices are encountered
private int | fMode | Current compression mode
private int[] | fOffsets | Dynamic compression window offsets
private int | fTimeStamp | The current time stamp
private int[] | fTimeStamps | The time stamps indicate when a window was last defined
private static boolean[] | sSingleTagTable | For quick identification of a byte as a single-byte mode tag
private static boolean[] | sUnicodeTagTable | For quick identification of a byte as a unicode mode tag
Fields inherited from interface com.ibm.icu.text.SCSU
ARMENIANINDEX, COMPRESSIONOFFSET, GREEKINDEX, HALFWIDTHKATAKANAINDEX, HIRAGANAINDEX, INVALIDCHAR, INVALIDWINDOW, IPAEXTENSIONINDEX, KATAKANAINDEX, LATININDEX, MAXINDEX, NUMSTATICWINDOWS, NUMWINDOWS, RESERVEDINDEX, SCHANGE0, SCHANGE1, SCHANGE2, SCHANGE3, SCHANGE4, SCHANGE5, SCHANGE6, SCHANGE7, SCHANGEU, SDEFINE0, SDEFINE1, SDEFINE2, SDEFINE3, SDEFINE4, SDEFINE5, SDEFINE6, SDEFINE7, SDEFINEX, SINGLEBYTEMODE, sOffsets, sOffsetTable, SQUOTE0, SQUOTE1, SQUOTE2, SQUOTE3, SQUOTE4, SQUOTE5, SQUOTE6, SQUOTE7, SQUOTEU, SRESERVED, UCHANGE0, UCHANGE1, UCHANGE2, UCHANGE3, UCHANGE4, UCHANGE5, UCHANGE6, UCHANGE7, UDEFINE0, UDEFINE1, UDEFINE2, UDEFINE3, UDEFINE4, UDEFINE5, UDEFINE6, UDEFINE7, UDEFINEX, UNICODEMODE, UQUOTEU, URESERVED
-
Constructor Summary
Constructors
Constructor | Description
UnicodeCompressor() | Create a UnicodeCompressor.
-
Method Summary
Modifier and Type | Method | Description
static byte[] | compress(char[] buffer, int start, int limit) | Compress a Unicode character array into a byte array.
int | compress(char[] charBuffer, int charBufferStart, int charBufferLimit, int[] charsRead, byte[] byteBuffer, int byteBufferStart, int byteBufferLimit) | Compress a Unicode character array into a byte array.
static byte[] | compress(String buffer) | Compress a string into a byte array.
private int | findDynamicWindow(int c) | Determine if a dynamic window for a certain character is defined.
private static int | findStaticWindow(int c) | Determine if a static window for a certain character is defined.
private int | getLRDefinedWindow() | Find the least-recently defined window.
private boolean | inDynamicWindow(int c, int whichWindow) | Determine if a character is in a dynamic window.
private static boolean | inStaticWindow(int c, int whichWindow) | Determine if a character is in a static window.
private static boolean | isCompressible(int c) | Determine if a character is compressible.
private static int | makeIndex(int c) | Create the index value for a character.
void | reset() | Reset the compressor to its initial state.
-
Field Details
-
sSingleTagTable
private static boolean[] sSingleTagTable
For quick identification of a byte as a single-byte mode tag
-
sUnicodeTagTable
private static boolean[] sUnicodeTagTable
For quick identification of a byte as a unicode mode tag
-
fCurrentWindow
private int fCurrentWindow
Alias to current dynamic window
-
fOffsets
private int[] fOffsets
Dynamic compression window offsets
-
fMode
private int fMode
Current compression mode
-
fIndexCount
private int[] fIndexCount
Keeps count of times character indices are encountered
-
fTimeStamps
private int[] fTimeStamps
The time stamps indicate when a window was last defined
-
fTimeStamp
private int fTimeStamp
The current time stamp
-
-
Constructor Details
-
UnicodeCompressor
public UnicodeCompressor()
Create a UnicodeCompressor. Sets all windows to their default values.
-
-
Method Details
-
compress
public static byte[] compress(String buffer)
Compress a string into a byte array.
- Parameters:
buffer - The string to compress.
- Returns:
A byte array containing the compressed characters.
-
compress
public static byte[] compress(char[] buffer, int start, int limit)
Compress a Unicode character array into a byte array.
- Parameters:
buffer - The character buffer to compress.
start - The start of the character run to compress.
limit - The limit of the character run to compress.
- Returns:
A byte array containing the compressed characters.
-
compress
public int compress(char[] charBuffer, int charBufferStart, int charBufferLimit, int[] charsRead, byte[] byteBuffer, int byteBufferStart, int byteBufferLimit)
Compress a Unicode character array into a byte array. This function will only consume input that can be completely output.
- Parameters:
charBuffer - The character buffer to compress.
charBufferStart - The start of the character run to compress.
charBufferLimit - The limit of the character run to compress.
charsRead - A one-element array. If not null, on return it holds the number of characters read from charBuffer.
byteBuffer - A buffer to receive the compressed data. This buffer must be at minimum four bytes in size.
byteBufferStart - The starting offset to which to write compressed data.
byteBufferLimit - The limiting offset for writing compressed data.
- Returns:
The number of bytes written to byteBuffer.
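The contract above (consume a character only if its whole output fits, report the characters consumed through a one-element array, return the bytes written) can be illustrated without ICU4J. The sketch below is a stand-in, not SCSU: the class `ChunkedOutputSketch` and its toy encoding (1 byte for ASCII, a marker byte plus 2 bytes otherwise) are assumptions made purely to show the calling pattern.

```java
// Illustrative stand-in for the buffer contract; the encoding is NOT SCSU.
final class ChunkedOutputSketch {

    /** Same parameter shape as the instance compress method documented above. */
    static int encode(char[] in, int inStart, int inLimit, int[] charsRead,
                      byte[] out, int outStart, int outLimit) {
        int ip = inStart;
        int op = outStart;
        while (ip < inLimit) {
            char c = in[ip];
            int need = (c < 0x80) ? 1 : 3;      // toy encoding sizes
            if (op + need > outLimit) {
                break;                           // would not fit: leave c unconsumed
            }
            if (need == 1) {
                out[op++] = (byte) c;
            } else {
                out[op++] = (byte) 0xFF;         // hypothetical marker byte
                out[op++] = (byte) (c >> 8);
                out[op++] = (byte) c;
            }
            ip++;
        }
        if (charsRead != null) {
            charsRead[0] = ip - inStart;         // characters fully consumed
        }
        return op - outStart;                    // bytes written
    }

    public static void main(String[] args) {
        char[] chars = "ab\u4E2Dc".toCharArray();
        byte[] buf = new byte[4];                // deliberately tiny buffer
        int[] read = new int[1];
        int done = 0;
        int total = 0;
        while (done < chars.length) {
            total += encode(chars, done, chars.length, read, buf, 0, buf.length);
            done += read[0];
        }
        System.out.println(total + " bytes written");
    }
}
```

Because a character is never split across calls, the caller can flush the buffer after every call and resume from `done` without tracking partial state.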
-
reset
public void reset()
Reset the compressor to its initial state.
-
makeIndex
private static int makeIndex(int c)
Create the index value for a character. For more information on this function, refer to table X-3 of UTR #6.
- Parameters:
c - The character in question.
- Returns:
An index for c.
-
inDynamicWindow
private boolean inDynamicWindow(int c, int whichWindow)
Determine if a character is in a dynamic window.
- Parameters:
c - The character to test.
whichWindow - The dynamic window to test.
- Returns:
true if c will fit in whichWindow, false otherwise.
-
inStaticWindow
private static boolean inStaticWindow(int c, int whichWindow)
Determine if a character is in a static window.
- Parameters:
c - The character to test.
whichWindow - The static window to test.
- Returns:
true if c will fit in whichWindow, false otherwise.
-
isCompressible
private static boolean isCompressible(int c)
Determine if a character is compressible.
- Parameters:
c - The character to test.
- Returns:
true if c is compressible, false otherwise.
-
findDynamicWindow
private int findDynamicWindow(int c)
Determine if a dynamic window for a certain character is defined.
- Parameters:
c - The character in question.
- Returns:
The dynamic window containing c, or INVALIDWINDOW if not defined.
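A lookup of this kind can be sketched as a linear scan over the window offsets. The code below is a guess at the general shape, not the private ICU4J method: the class `DynamicWindowLookupSketch`, the sentinel value `-1` standing in for INVALIDWINDOW, the scan order, and the sample offsets are all assumptions.

```java
// Illustrative sketch; sentinel value, scan order, and offsets are assumptions.
final class DynamicWindowLookupSketch {
    static final int WINDOW_SIZE = 128;   // a window spans 128 consecutive characters
    static final int INVALID_WINDOW = -1; // hypothetical stand-in for SCSU.INVALIDWINDOW

    /** True if c falls inside the window that starts at offset. */
    static boolean inWindow(int c, int offset) {
        return c >= offset && c < offset + WINDOW_SIZE;
    }

    /** Index of the first window containing c, or INVALID_WINDOW if none does. */
    static int findWindow(int c, int[] offsets) {
        for (int i = 0; i < offsets.length; i++) {
            if (inWindow(c, offsets[i])) {
                return i;
            }
        }
        return INVALID_WINDOW;
    }

    public static void main(String[] args) {
        int[] offsets = { 0x0080, 0x0400, 0x3040 }; // sample window positions
        System.out.println(findWindow(0x0416, offsets)); // Cyrillic letter: window 1
        System.out.println(findWindow(0x4E2D, offsets)); // CJK ideograph: no window
    }
}
```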
-
findStaticWindow
private static int findStaticWindow(int c)
Determine if a static window for a certain character is defined.
- Parameters:
c - The character in question.
- Returns:
The static window containing c, or INVALIDWINDOW if not defined.
-
getLRDefinedWindow
private int getLRDefinedWindow()
Find the least-recently defined window.
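Given the fTimeStamps field described earlier, which records when each window was last defined, one plausible way to pick the least-recently defined window is to select the index with the smallest time stamp. The sketch below assumes exactly that; the class name `LeastRecentlyDefinedSketch` and the tie-breaking rule (lowest index wins) are hypothetical and may differ from the actual private method.

```java
// Illustrative sketch; tie-breaking and naming are assumptions.
final class LeastRecentlyDefinedSketch {

    /** Returns the index holding the smallest time stamp (lowest index on ties). */
    static int leastRecentlyDefined(int[] timeStamps) {
        int oldest = 0;
        for (int i = 1; i < timeStamps.length; i++) {
            if (timeStamps[i] < timeStamps[oldest]) {
                oldest = i;
            }
        }
        return oldest;
    }

    public static void main(String[] args) {
        // Window 1 was defined at time 2, earlier than any other window.
        int[] timeStamps = { 5, 2, 9, 7 };
        System.out.println(leastRecentlyDefined(timeStamps)); // prints 1
    }
}
```

A compressor that needs to define a new window could reuse the window at the returned index, on the reasoning that it is the least likely to still be in use.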
-