Package com.ibm.icu.text
Class CharsetRecog_mbcs
java.lang.Object
  com.ibm.icu.text.CharsetRecognizer
    com.ibm.icu.text.CharsetRecog_mbcs

Direct Known Subclasses:
CharsetRecog_mbcs.CharsetRecog_big5, CharsetRecog_mbcs.CharsetRecog_euc, CharsetRecog_mbcs.CharsetRecog_gb_18030, CharsetRecog_mbcs.CharsetRecog_sjis

abstract class CharsetRecog_mbcs extends CharsetRecognizer
CharsetRecognizer implementation for Asian double-byte and multi-byte charsets. A match is determined mainly by how closely the input data adheres to the encoding scheme of the charset and, optionally, by the frequency of occurrence of characters. Instances of this class are singletons, one per encoding being recognized. They are created in the main CharsetDetector class and kept in the global list of available encodings to be checked. The specific encoding being recognized is determined by the subclass.
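
The two kinds of evidence named above (adherence to the encoding scheme, and optional character frequency) can be combined into a confidence value in many ways. The following is a minimal sketch under stated assumptions, not the formula this class uses: the class name MatchScoreSketch, the method score, its parameters, and every threshold in it are hypothetical and chosen only for illustration.

```java
public class MatchScoreSketch {
    /**
     * Hypothetical scoring rule (NOT the ICU formula).
     *
     * @param doubleByteCount number of well-formed multi-byte characters seen
     * @param badCount        number of ill-formed byte sequences seen
     * @param commonCount     number of multi-byte characters found in a
     *                        table of frequently used characters
     * @param useFrequency    whether a common-character table is available
     * @return a confidence in the range 0..100
     */
    static int score(int doubleByteCount, int badCount,
                     int commonCount, boolean useFrequency) {
        // No multi-byte characters at all: no evidence for this charset.
        if (doubleByteCount == 0) {
            return 0;
        }
        // Assumed cutoff: too many ill-formed sequences rules the charset out.
        if (badCount * 5 >= doubleByteCount) {
            return 0;
        }
        // Structural evidence: reward well-formed characters, penalize bad ones.
        int structural = 30 + doubleByteCount - 20 * badCount;
        structural = Math.max(0, Math.min(100, structural));
        if (!useFrequency) {
            return structural;
        }
        // Frequency evidence: percentage of characters that are "common".
        int frequency = 100 * commonCount / doubleByteCount;
        return (structural + frequency) / 2;
    }

    public static void main(String[] args) {
        System.out.println(score(50, 0, 0, false));
        System.out.println(score(50, 0, 40, true));
    }
}
```

The point of the sketch is only that structural validity dominates: a handful of ill-formed sequences drives the result to zero regardless of how many common characters were seen.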
Nested Class Summary
Nested Classes
Modifier and Type                 Class                                     Description
(package private) static class    CharsetRecog_mbcs.CharsetRecog_big5       Big5 charset recognizer.
(package private) static class    CharsetRecog_mbcs.CharsetRecog_euc        EUC charset recognizers.
(package private) static class    CharsetRecog_mbcs.CharsetRecog_gb_18030   GB-18030 recognizer.
(package private) static class    CharsetRecog_mbcs.CharsetRecog_sjis       Shift-JIS charset recognizer.
(package private) static class    CharsetRecog_mbcs.iteratedChar

Constructor Summary
Constructors
Constructor            Description
CharsetRecog_mbcs()

Method Summary
All Methods   Instance Methods   Abstract Methods   Concrete Methods
Modifier and Type                             Method                                                            Description
(package private) abstract java.lang.String   getName()                                                         Get the IANA name of this charset.
(package private) int                         match(CharsetDetector det, int[] commonChars)                     Test the match of this charset with the input text data which is obtained via the CharsetDetector object.
(package private) abstract boolean            nextChar(CharsetRecog_mbcs.iteratedChar it, CharsetDetector det)  Get the next character (however many bytes it is) from the input data. Subclasses for specific charset encodings must implement this function to get characters according to the rules of their encoding scheme.

Methods inherited from class com.ibm.icu.text.CharsetRecognizer
getLanguage, match

Method Detail
getName
abstract java.lang.String getName()
Get the IANA name of this charset.
Specified by:
getName in class CharsetRecognizer
Returns:
the charset name.

match
int match(CharsetDetector det, int[] commonChars)
Test the match of this charset with the input text data which is obtained via the CharsetDetector object.
Parameters:
det - The CharsetDetector, which contains the input text to be checked for being in this charset.
Returns:
Two values packed into one int:
bits 0-7: the match confidence, ranging from 0 to 100
bits 8-15: the match reason, an enum-like value.
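
The bit layout described for the return value can be illustrated with plain shifts and masks. This is a minimal sketch, assuming only the layout stated above; the class PackedMatchResult and the helpers pack, confidence, and reason are hypothetical names and are not part of ICU.

```java
public class PackedMatchResult {
    /** Packs a confidence (0..100) and a reason code (0..255) into one int. */
    static int pack(int confidence, int reason) {
        if (confidence < 0 || confidence > 100) {
            throw new IllegalArgumentException("confidence must be 0..100");
        }
        if (reason < 0 || reason > 0xFF) {
            throw new IllegalArgumentException("reason must fit in 8 bits");
        }
        // Reason occupies bits 8-15, confidence occupies bits 0-7.
        return (reason << 8) | confidence;
    }

    /** Extracts bits 0-7: the match confidence. */
    static int confidence(int packed) {
        return packed & 0xFF;
    }

    /** Extracts bits 8-15: the match reason. */
    static int reason(int packed) {
        return (packed >>> 8) & 0xFF;
    }

    public static void main(String[] args) {
        int packed = pack(85, 3);
        System.out.println("confidence = " + confidence(packed));
        System.out.println("reason = " + reason(packed));
    }
}
```

Packing both values into one int lets a method return two small results without allocating a holder object, at the cost of callers needing to know the layout.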
nextChar
abstract boolean nextChar(CharsetRecog_mbcs.iteratedChar it, CharsetDetector det)
Get the next character (however many bytes it is) from the input data. Subclasses for specific charset encodings must implement this function to get characters according to the rules of their encoding scheme. This function is not a method of class iteratedChar only because that would require a lot of extra derived classes, which is awkward.
Parameters:
it - The iteratedChar "struct" into which the returned char is placed.
det - The charset detector, which is needed to get at the input byte data being iterated over.
Returns:
True if a character was returned, false at end of input.
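
The contract above (fill a struct-like object with the next character, return false at end of input) can be sketched for a Shift-JIS-style encoding. This is an illustration under assumptions, not the ICU implementation: the class NextCharSketch, the stand-in IteratedChar type with its fields, and the count helper are hypothetical, a plain byte array replaces the CharsetDetector, and the byte ranges are a simplified version of the Shift-JIS rules.

```java
public class NextCharSketch {
    /** Stand-in for the iteratedChar "struct"; field names are assumptions. */
    static final class IteratedChar {
        int charValue;   // value of the character just read (1 or 2 bytes)
        int nextIndex;   // index of the next unread input byte
        boolean error;   // true if the byte sequence just read was ill-formed

        /** Returns the next byte as 0..255, or -1 at end of input. */
        int nextByte(byte[] input) {
            return nextIndex < input.length ? (input[nextIndex++] & 0xFF) : -1;
        }
    }

    /** Reads one character; returns false at end of input. */
    static boolean nextChar(IteratedChar it, byte[] input) {
        it.error = false;
        int first = it.nextByte(input);
        if (first < 0) {
            return false;
        }
        it.charValue = first;
        // ASCII and half-width katakana are single-byte characters.
        if (first <= 0x7F || (first >= 0xA1 && first <= 0xDF)) {
            return true;
        }
        // Simplification: every other byte is treated as a lead byte.
        int second = it.nextByte(input);
        if (second < 0) {
            return false; // input ended in the middle of a character
        }
        it.charValue = (first << 8) | second;
        // Trail bytes must be 0x40..0xFC, excluding 0x7F.
        if (second < 0x40 || second == 0x7F || second > 0xFC) {
            it.error = true;
        }
        return true;
    }

    /** Returns { total characters, well-formed double-byte, ill-formed }. */
    static int[] count(byte[] input) {
        IteratedChar it = new IteratedChar();
        int total = 0, doubleByte = 0, bad = 0;
        while (nextChar(it, input)) {
            total++;
            if (it.error) {
                bad++;
            } else if (it.charValue > 0xFF) {
                doubleByte++;
            }
        }
        return new int[] { total, doubleByte, bad };
    }

    public static void main(String[] args) {
        byte[] input = { 0x41, (byte) 0x82, (byte) 0xA0, (byte) 0xB1 };
        int[] c = count(input);
        System.out.println(c[0] + " chars, " + c[1] + " double-byte, " + c[2] + " bad");
    }
}
```

Keeping the iteration state in a separate object and the decoding rule in the recognizer means each encoding only overrides one method, which is the design choice the description above alludes to.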