java.lang.Object
com.ibm.icu.text.BreakIterator
com.ibm.icu.text.RuleBasedBreakIterator
com.ibm.icu.text.RuleBasedBreakIterator_Old
com.ibm.icu.text.DictionaryBasedBreakIterator
public class DictionaryBasedBreakIterator
A subclass of RuleBasedBreakIterator_Old that adds the ability to use a dictionary to further subdivide ranges of text beyond what is possible using just the state-table-based algorithm. This is necessary, for example, to handle word and line breaking in Thai, which doesn't use spaces between words. The state-table-based algorithm used by RuleBasedBreakIterator_Old is used to divide up text as far as possible, and then contiguous ranges of letters are repeatedly compared against a list of known words (i.e., the dictionary) to divide them up into words.

DictionaryBasedBreakIterator uses the same rule language as RuleBasedBreakIterator_Old, but adds one more special substitution name: _dictionary_. This substitution name is used to identify characters in words in the dictionary. The idea is that if the iterator passes over a chunk of text that includes two or more characters in a row that are included in _dictionary_, it goes back through that range and derives additional break positions (if possible) using the dictionary.

DictionaryBasedBreakIterator is also constructed with the filename of a dictionary file. It uses Class.getResource() to locate the dictionary file. The dictionary file is in a serialized binary format. We have a very primitive (and slow) BuildDictionaryFile utility for creating dictionary files, but aren't currently making it public. Contact us for help.
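The two-phase idea described above (rule-based breaks first, then a dictionary pass over a run of letters) can be illustrated with a small sketch. This is not the ICU implementation: the class name DictionarySplitSketch, the method dictionaryBreaks, and the in-memory word set are all hypothetical, and run-together English stands in for Thai. It assumes the range [start, end) has already been isolated by the first phase and simply looks for one way to cut it into known words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of ICU4J.
public class DictionarySplitSketch {

    /**
     * Returns the interior break offsets of text[start, end) such that every
     * piece between consecutive breaks is a word in the given set. Returns an
     * empty list if the range cannot be fully covered by known words (or is
     * itself a single word).
     */
    public static List<Integer> dictionaryBreaks(String text, int start, int end, Set<String> words) {
        int n = end - start;
        boolean[] reachable = new boolean[n + 1]; // reachable[i]: first i chars split into words
        int[] prev = new int[n + 1];              // prev[i]: previous cut on the path reaching i
        reachable[0] = true;
        for (int i = 0; i < n; i++) {
            if (!reachable[i]) {
                continue;
            }
            for (int j = i + 1; j <= n; j++) {
                // Record only the first path found to each position.
                if (!reachable[j] && words.contains(text.substring(start + i, start + j))) {
                    reachable[j] = true;
                    prev[j] = i;
                }
            }
        }
        List<Integer> breaks = new ArrayList<>();
        if (!reachable[n]) {
            return breaks; // no complete segmentation; leave the range undivided
        }
        // Walk back from the end of the range, collecting interior cut points.
        for (int i = prev[n]; i > 0; i = prev[i]) {
            breaks.add(start + i);
        }
        Collections.reverse(breaks);
        return breaks;
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(Arrays.asList("the", "cat", "cats", "sat", "at"));
        // Assume the first phase already isolated the letter run at offsets [2, 11).
        System.out.println(dictionaryBreaks("a thecatsat b", 2, 11, words)); // prints [5, 8]
    }
}
```

The sketch returns no extra breaks when the range cannot be fully segmented, which mirrors the "if possible" wording above; how the real class resolves ambiguous or partial matches is not specified on this page.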
Nested Class Summary | |
---|---|
protected class |
DictionaryBasedBreakIterator.Builder
The Builder class for DictionaryBasedBreakIterator inherits almost all of its functionality from the Builder class for RuleBasedBreakIterator_Old, but extends it with extra logic to handle the DICTIONARY_VAR token |
Field Summary |
---|
Fields inherited from class com.ibm.icu.text.RuleBasedBreakIterator_Old |
---|
IGNORE |
Fields inherited from class com.ibm.icu.text.RuleBasedBreakIterator |
---|
WORD_IDEO, WORD_IDEO_LIMIT, WORD_KANA, WORD_KANA_LIMIT, WORD_LETTER, WORD_LETTER_LIMIT, WORD_NONE, WORD_NONE_LIMIT, WORD_NUMBER, WORD_NUMBER_LIMIT |
Fields inherited from class com.ibm.icu.text.BreakIterator |
---|
DONE, KIND_CHARACTER, KIND_LINE, KIND_SENTENCE, KIND_TITLE, KIND_WORD |
Constructor Summary | |
---|---|
DictionaryBasedBreakIterator(String description,
InputStream dictionaryStream)
Constructs a DictionaryBasedBreakIterator. |
Method Summary | |
---|---|
int |
first()
Sets the current iteration position to the beginning of the text. |
int |
following(int offset)
Sets the current iteration position to the first boundary position after the specified position. |
protected int |
handleNext()
This is the implementation function for next(). |
int |
last()
Sets the current iteration position to the end of the text. |
protected int |
lookupCategory(char c)
Looks up a character category for a character. |
protected RuleBasedBreakIterator_Old.Builder |
makeBuilder()
Returns a Builder that is customized to build a DictionaryBasedBreakIterator. |
int |
preceding(int offset)
Sets the current iteration position to the last boundary position before the specified position. |
int |
previous()
Advances the iterator one step backwards. |
void |
setText(CharacterIterator newText)
Set the iterator to analyze a new piece of text. |
void |
writeTablesToFile(FileOutputStream file,
boolean littleEndian)
Write the RBBI runtime engine state transition tables to a file. |
Methods inherited from class com.ibm.icu.text.RuleBasedBreakIterator_Old |
---|
checkOffset, clone, current, debugDumpTables, debugPrintln, equals, getRuleStatus, getRuleStatusVec, getText, handlePrevious, hashCode, isBoundary, lookupBackwardState, lookupState, next, next, toString, writeSwappedInt, writeSwappedShort |
Methods inherited from class com.ibm.icu.text.RuleBasedBreakIterator |
---|
getInstanceFromCompiledRules |
Methods inherited from class java.lang.Object |
---|
finalize, getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public DictionaryBasedBreakIterator(String description, InputStream dictionaryStream) throws IOException
description
- Same as the description parameter on RuleBasedBreakIterator_Old, except for the special meaning of DICTIONARY_VAR. This parameter is just passed through to RuleBasedBreakIterator_Old's constructor.
dictionaryStream
- The stream containing the dictionary data.
IOException
Method Detail |
---|
protected RuleBasedBreakIterator_Old.Builder makeBuilder()
makeBuilder
in class RuleBasedBreakIterator_Old
public void writeTablesToFile(FileOutputStream file, boolean littleEndian) throws IOException
RuleBasedBreakIterator_Old
writeTablesToFile
in class RuleBasedBreakIterator_Old
IOException
public void setText(CharacterIterator newText)
RuleBasedBreakIterator_Old
setText
in class RuleBasedBreakIterator_Old
newText
- An iterator over the text to analyze.
public int first()
first
in class RuleBasedBreakIterator_Old
public int last()
last
in class RuleBasedBreakIterator_Old
public int previous()
previous
in class RuleBasedBreakIterator_Old
public int preceding(int offset)
preceding
in class RuleBasedBreakIterator_Old
offset
- The position to begin searching from
public int following(int offset)
following
in class RuleBasedBreakIterator_Old
offset
- The position to begin searching forward from
protected int handleNext()
handleNext
in class RuleBasedBreakIterator_Old
protected int lookupCategory(char c)
lookupCategory
in class RuleBasedBreakIterator_Old
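The navigation methods listed above (setText, first, last, preceding, following, and the inherited next) follow the usual BreakIterator calling pattern. The sketch below shows that pattern using the JDK's java.text.BreakIterator as a stand-in, since it exposes methods with the same names and building a DictionaryBasedBreakIterator requires a rule description and dictionary stream not shown on this page. The class name BoundaryWalkSketch and the helper boundaries are hypothetical, and the sample text is English, so no dictionary pass is involved.

```java
import java.text.BreakIterator;
import java.text.StringCharacterIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration using the JDK BreakIterator as a stand-in.
public class BoundaryWalkSketch {

    /** Collects every boundary from first() until next() returns DONE. */
    public static List<Integer> boundaries(BreakIterator it, String text) {
        // setText(CharacterIterator) matches the signature documented above.
        it.setText(new StringCharacterIterator(text));
        List<Integer> out = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            out.add(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        System.out.println(boundaries(it, "Hello world"));
        System.out.println(it.following(2)); // first boundary after offset 2
        System.out.println(it.preceding(6)); // last boundary before offset 6
        System.out.println(it.last());       // end of the text
    }
}
```

With an ICU iterator the loop would be written the same way, comparing against com.ibm.icu.text.BreakIterator.DONE, which appears among the inherited fields listed earlier.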