Package com.ibm.icu.impl.breakiter
Class MlBreakEngine
- java.lang.Object
  - com.ibm.icu.impl.breakiter.MlBreakEngine
public class MlBreakEngine extends java.lang.Object
-
-
Field Summary
Fields
Columns: Modifier and Type, Field, Description
private UnicodeSet
fClosePunctuationSet
private UnicodeSet
fDigitOrOpenPunctuationOrAlphabetSet
private java.util.List<java.util.HashMap<java.lang.String,java.lang.Integer>>
fModel
private int
fNegativeSum
private static int
MAX_FEATURE
-
Constructor Summary
Constructors
Columns: Constructor, Description
MlBreakEngine(UnicodeSet digitOrOpenPunctuationOrAlphabetSet, UnicodeSet closePunctuationSet)
Constructor for Chinese and Japanese phrase breaking.
-
Method Summary
All Methods, Instance Methods, Concrete Methods
Columns: Modifier and Type, Method, Description
int
divideUpRange(java.text.CharacterIterator inText, int startPos, int endPos, java.text.CharacterIterator inString, int codePointLength, int[] charPositions, DictionaryBreakEngine.DequeI foundBreaks)
Divide up a range of characters handled by this break engine.
private void
evaluateBreakpoint(java.lang.String inputStr, int[] indexList, int startIdx, int numCodeUnits, java.util.ArrayList<java.lang.Integer> boundary)
Evaluate whether the breakpointIdx is a potential breakpoint.
private int
initIndexList(java.text.CharacterIterator inString, int[] indexList, int codePointLength)
Initialize the index list from the input string.
private void
initKeyValue(UResourceBundle rb, java.lang.String keyName, java.lang.String valueName, java.util.HashMap<java.lang.String,java.lang.Integer> map)
Load the features and their scores from the machine learning model file, given the names of the key and value entries.
private void
loadMLModel()
Load the machine learning model file.
private java.lang.String
transform(java.text.CharacterIterator inString)
Transform a CharacterIterator into a String.
-
-
-
Field Detail
-
MAX_FEATURE
private static final int MAX_FEATURE
- See Also:
- Constant Field Values
-
fDigitOrOpenPunctuationOrAlphabetSet
private UnicodeSet fDigitOrOpenPunctuationOrAlphabetSet
-
fClosePunctuationSet
private UnicodeSet fClosePunctuationSet
-
fModel
private java.util.List<java.util.HashMap<java.lang.String,java.lang.Integer>> fModel
-
fNegativeSum
private int fNegativeSum
-
-
Constructor Detail
-
MlBreakEngine
public MlBreakEngine(UnicodeSet digitOrOpenPunctuationOrAlphabetSet, UnicodeSet closePunctuationSet)
Constructor for Chinese and Japanese phrase breaking.
- Parameters:
- digitOrOpenPunctuationOrAlphabetSet - A Unicode set containing digits, open punctuation, and alphabetic characters.
- closePunctuationSet - A Unicode set containing close punctuation.
-
-
Method Detail
-
divideUpRange
public int divideUpRange(java.text.CharacterIterator inText, int startPos, int endPos, java.text.CharacterIterator inString, int codePointLength, int[] charPositions, DictionaryBreakEngine.DequeI foundBreaks)
Divide up a range of characters handled by this break engine.
- Parameters:
- inText - The input text.
- startPos - The start index of the input text.
- endPos - The end index of the input text.
- inString - An input string normalized from the range startPos to endPos of inText.
- codePointLength - The number of code points in inString.
- charPositions - An array that maps each code point index of inString to its code unit index.
- foundBreaks - A list in which to store the breakpoints.
- Returns:
- The number of breakpoints.
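The charPositions parameter maps code point indices to code unit indices, which matters whenever inString contains supplementary characters stored as surrogate pairs. The following is a minimal sketch of how such a map could be built for a plain String; the class and method names are hypothetical, and the extra trailing entry holding the end offset is an assumption rather than something this page documents.

```java
import java.util.Arrays;

// Hypothetical helper, not part of MlBreakEngine.
final class CharPositionsSketch {
    // Returns an array where entry i is the UTF-16 code unit offset of
    // the i-th code point of s. One extra trailing entry holds s.length()
    // (an assumption made here so that ranges can be expressed as pairs).
    static int[] codePointToCodeUnit(String s) {
        int codePoints = s.codePointCount(0, s.length());
        int[] positions = new int[codePoints + 1];
        int unit = 0;
        for (int cp = 0; cp < codePoints; cp++) {
            positions[cp] = unit;
            // A supplementary code point occupies two code units.
            unit += Character.charCount(s.codePointAt(unit));
        }
        positions[codePoints] = unit;
        return positions;
    }

    public static void main(String[] args) {
        // U+20BB7 is encoded as the surrogate pair D842 DFB7.
        String s = "a\uD842\uDFB7b";
        System.out.println(Arrays.toString(codePointToCodeUnit(s))); // prints [0, 1, 3, 4]
    }
}
```

With a map like this, a breakpoint found at code point index i can be translated back to an offset in the UTF-16 text with a single array lookup.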
-
transform
private java.lang.String transform(java.text.CharacterIterator inString)
Transform a CharacterIterator into a String.
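Since transform is private and its body is not shown on this page, the following is only a sketch of one way to copy a CharacterIterator into a String using the standard java.text API. The class name is hypothetical, and restoring the iterator position afterwards is a design choice of this sketch, not documented behavior of MlBreakEngine.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// Hypothetical helper, not part of MlBreakEngine.
final class TransformSketch {
    // Copies every char between the iterator's begin and end index into
    // a String. Indexing by position avoids confusing a literal U+FFFF
    // in the text with CharacterIterator.DONE.
    static String transform(CharacterIterator it) {
        int saved = it.getIndex();
        StringBuilder sb = new StringBuilder(it.getEndIndex() - it.getBeginIndex());
        for (int i = it.getBeginIndex(); i < it.getEndIndex(); i++) {
            sb.append(it.setIndex(i));
        }
        it.setIndex(saved); // leave the iterator where the caller had it
        return sb.toString();
    }

    public static void main(String[] args) {
        CharacterIterator it = new StringCharacterIterator("hello", 1, 4, 1);
        System.out.println(transform(it)); // prints ell
    }
}
```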
-
evaluateBreakpoint
private void evaluateBreakpoint(java.lang.String inputStr, int[] indexList, int startIdx, int numCodeUnits, java.util.ArrayList<java.lang.Integer> boundary)
Evaluate whether the breakpointIdx is a potential breakpoint.
- Parameters:
- inputStr - An input string to be segmented.
- indexList - A code unit index list of the inputStr.
- startIdx - The start index of the indexList.
- numCodeUnits - The current code unit boundary of the indexList.
- boundary - A list including the index of the breakpoint.
-
initIndexList
private int initIndexList(java.text.CharacterIterator inString, int[] indexList, int codePointLength)
Initialize the index list from the input string.
- Parameters:
- inString - An input string to be segmented.
- indexList - A code unit index list of the inString.
- codePointLength - The number of code points in the input string.
- Returns:
- The number of code units in the first six characters of inString.
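The return value counts code units, not characters, so it exceeds six when any of the first six characters is a supplementary code point. A minimal sketch of that count for a plain String is shown below; it assumes "characters" means code points, and the class and method names are hypothetical. It does not reproduce how the real method fills indexList.

```java
// Hypothetical helper, not part of MlBreakEngine.
final class FirstSixSketch {
    // Returns the number of UTF-16 code units occupied by the first six
    // code points of s, or by all of s if it is shorter than that.
    static int codeUnitsOfFirstSix(String s) {
        int codePoints = Math.min(6, s.codePointCount(0, s.length()));
        return s.offsetByCodePoints(0, codePoints);
    }

    public static void main(String[] args) {
        System.out.println(codeUnitsOfFirstSix("abcdefgh"));             // prints 6
        System.out.println(codeUnitsOfFirstSix("a\uD842\uDFB7bcdefg")); // prints 7
    }
}
```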
-
loadMLModel
private void loadMLModel()
Load the machine learning model file.
-
initKeyValue
private void initKeyValue(UResourceBundle rb, java.lang.String keyName, java.lang.String valueName, java.util.HashMap<java.lang.String,java.lang.Integer> map)
Load the features and their scores from the machine learning model file, given the names of the key and value entries.
- Parameters:
- rb - A ResourceBundle corresponding to the model file.
- keyName - The key name in the model file.
- valueName - The value name in the model file.
- map - A HashMap to store the pairs of the feature and its score.
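The description above amounts to pairing each feature string with an integer score and storing the pairs in the supplied map. The sketch below illustrates that pairing with two parallel arrays standing in for the key and value entries of the model file. The arrays, the class name, and the sample feature strings are all hypothetical; the real method reads from a UResourceBundle, whose layout is not described on this page.

```java
import java.util.HashMap;

// Hypothetical helper, not part of MlBreakEngine.
final class KeyValueSketch {
    // Stores keys[i] -> values[i] in map. The parallel arrays are a
    // stand-in for the key and value entries of the model file.
    static void initKeyValue(String[] keys, int[] values, HashMap<String, Integer> map) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], values[i]);
        }
    }

    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        // Made-up features and scores, for illustration only.
        initKeyValue(new String[] {"ab", "cd"}, new int[] {3, -5}, map);
        System.out.println(map.get("ab")); // prints 3
    }
}
```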
-
-