Package com.ibm.icu.text
Class TransliteratorParser
- java.lang.Object
  - com.ibm.icu.text.TransliteratorParser
-
class TransliteratorParser extends java.lang.Object
-
-
Nested Class Summary
Nested Classes (modifier and type, class, description)
private class TransliteratorParser.ParseData
    This class implements the SymbolTable interface.
private static class TransliteratorParser.RuleArray
    RuleBody subclass for a String[] array.
private static class TransliteratorParser.RuleBody
    A private abstract class representing the interface to rule source code that is broken up into lines.
private static class TransliteratorParser.RuleHalf
    A class representing one side of a rule.
-
Field Summary
Fields (modifier and type, field, description)
private static char ALT_FORWARD_RULE_OP
private static char ALT_FUNCTION
private static char ALT_FWDREV_RULE_OP
private static char ALT_REVERSE_RULE_OP
private static char ANCHOR_START
UnicodeSet compoundFilter
    PUBLIC data member containing the parsed compound filter, if any.
private static char CONTEXT_ANTE
private static char CONTEXT_POST
private RuleBasedTransliterator.Data curData
    The current data object for which we are parsing rules.
private static char CURSOR_OFFSET
private static char CURSOR_POS
java.util.List<RuleBasedTransliterator.Data> dataVector
    PUBLIC data member.
private int direction
private static char DOT
private static java.lang.String DOT_SET
private int dotStandIn
    The stand-in character for the 'dot' set, represented by '.' in patterns.
private static char END_OF_RULE
private static char ESCAPE
private static char FORWARD_RULE_OP
private static char FUNCTION
private static char FWDREV_RULE_OP
private static java.lang.String HALF_ENDERS
private static java.lang.String ID_TOKEN
private static int ID_TOKEN_LEN
java.util.List<java.lang.String> idBlockVector
    PUBLIC data member.
private static UnicodeSet ILLEGAL_FUNC
private static UnicodeSet ILLEGAL_SEG
private static UnicodeSet ILLEGAL_TOP
private static char KLEENE_STAR
private static char ONE_OR_MORE
private static java.lang.String OPERATORS
private TransliteratorParser.ParseData parseData
    Temporary symbol table used during parsing.
private static char QUOTE
private static char REVERSE_RULE_OP
private static char RULE_COMMENT_CHAR
private static char SEGMENT_CLOSE
private static char SEGMENT_OPEN
private java.util.List<StringMatcher> segmentObjects
    Vector of StringMatcher objects for segments.
private java.lang.StringBuffer segmentStandins
    String of standins for segments.
private java.lang.String undefinedVariableName
    When we encounter an undefined variable, we do not immediately signal an error, in case we are defining this variable, e.g., "$a = [a-z];".
private static char VARIABLE_DEF_OP
private char variableLimit
    The last available stand-in for variables.
private java.util.Map<java.lang.String,char[]> variableNames
    Temporary table of variable names.
private char variableNext
    The next available stand-in for variables.
private java.util.List<java.lang.Object> variablesVector
    Temporary vector of set variables.
private static char ZERO_OR_ONE
-
Constructor Summary
Constructors (constructor, description)
TransliteratorParser()
    Constructor.
-
Method Summary
All Methods (modifier and type, method, description)
private void appendVariableDef(java.lang.String name, java.lang.StringBuffer buf)
    Append the value of the given variable name to the given StringBuffer.
private void checkVariableRange(int ch, java.lang.String rule, int start)
    Assert that the given character is NOT within the variable range.
(package private) char generateStandInFor(java.lang.Object obj)
    Generate and return a stand-in for a new UnicodeMatcher or UnicodeReplacer.
(package private) char getDotStandIn()
    Return the stand-in for the dot set.
char getSegmentStandin(int seg)
    Return the standin for segment seg (1-based).
void parse(java.lang.String rules, int dir)
    Parse a set of rules.
private int parsePragma(java.lang.String rule, int pos, int limit)
    Parse a pragma.
private int parseRule(java.lang.String rule, int pos, int limit)
    MAIN PARSER.
(package private) void parseRules(TransliteratorParser.RuleBody ruleArray, int dir)
    Parse an array of zero or more rules.
private char parseSet(java.lang.String rule, java.text.ParsePosition pos)
    Parse a UnicodeSet out, store it, and return the stand-in character used to represent it.
private void pragmaMaximumBackup(int backup)
    Set the maximum backup to 'backup', in response to a pragma statement.
private void pragmaNormalizeRules(Normalizer.Mode mode)
    Begin normalizing all rules using the given mode, in response to a pragma statement.
(package private) static boolean resemblesPragma(java.lang.String rule, int pos, int limit)
    Return true if the given rule looks like a pragma.
(package private) static int ruleEnd(java.lang.String rule, int start, int limit)
void setSegmentObject(int seg, StringMatcher obj)
    Set the object for segment seg (1-based).
private void setVariableRange(int start, int end)
    Set the variable range to [start, end] (inclusive).
(package private) static void syntaxError(java.lang.String msg, java.lang.String rule, int start)
    Throw an exception indicating a syntax error.
-
-
-
Field Detail
-
dataVector
public java.util.List<RuleBasedTransliterator.Data> dataVector
PUBLIC data member. A list of RuleBasedTransliterator.Data objects, one for each discrete group of rules in the rule set.
-
idBlockVector
public java.util.List<java.lang.String> idBlockVector
PUBLIC data member. A list of Strings containing all of the ID blocks in the rule set.
-
curData
private RuleBasedTransliterator.Data curData
The current data object for which we are parsing rules.
-
compoundFilter
public UnicodeSet compoundFilter
PUBLIC data member containing the parsed compound filter, if any.
-
direction
private int direction
-
parseData
private TransliteratorParser.ParseData parseData
Temporary symbol table used during parsing.
-
variablesVector
private java.util.List<java.lang.Object> variablesVector
Temporary vector of set variables. When parsing is complete, this is copied into the array data.variables. As with data.variables, element 0 corresponds to character data.variablesBase.
-
variableNames
private java.util.Map<java.lang.String,char[]> variableNames
Temporary table of variable names. When parsing is complete, this is copied into data.variableNames.
-
segmentStandins
private java.lang.StringBuffer segmentStandins
String of standins for segments. Used during the parsing of a single rule. segmentStandins.charAt(0) is the standin for "$1" and corresponds to StringMatcher object segmentObjects.elementAt(0), etc.
-
segmentObjects
private java.util.List<StringMatcher> segmentObjects
Vector of StringMatcher objects for segments. Used during the parsing of a single rule. segmentStandins.charAt(0) is the standin for "$1" and corresponds to StringMatcher object segmentObjects.elementAt(0), etc.
-
variableNext
private char variableNext
The next available stand-in for variables. This starts at some point in the private use area (discovered dynamically) and increments up toward variableLimit. At any point during parsing, available variables are variableNext..variableLimit-1.
-
variableLimit
private char variableLimit
The last available stand-in for variables. This is discovered dynamically. At any point during parsing, available variables are variableNext..variableLimit-1. During variable definition we use the special value variableLimit-1 as a placeholder.
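As a rough illustration of the variableNext / variableLimit bookkeeping described above, the following sketch hands out stand-in characters from a fixed range and keeps the objects they represent in a parallel list. The class name StandInAllocator, its methods, and the use of U+E000 as a range start are hypothetical; the real parser discovers its range dynamically, and this is not its implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of stand-in allocation; not the ICU implementation. */
class StandInAllocator {
    private final char base;   // first stand-in; element 0 of the list maps to it
    private char next;         // next available stand-in
    private final char limit;  // one past the last available stand-in
    private final List<Object> objects = new ArrayList<>();

    /** Range is [start, end] inclusive; end is assumed to be below U+FFFF. */
    StandInAllocator(char start, char end) {
        base = start;
        next = start;
        limit = (char) (end + 1);
    }

    /** Stores obj and returns the stand-in character that now represents it. */
    char allocate(Object obj) {
        if (next >= limit) {
            throw new IllegalStateException("Variable range exhausted");
        }
        objects.add(obj);
        return next++;
    }

    /** Maps a stand-in back to its object by its offset from the base. */
    Object lookup(char standIn) {
        return objects.get(standIn - base);
    }

    /** Number of stand-ins still available, i.e. next..limit-1. */
    int available() {
        return limit - next;
    }
}
```

With a range of U+E000..U+E001, two calls to allocate succeed and a third throws, since no stand-ins remain.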
-
undefinedVariableName
private java.lang.String undefinedVariableName
When we encounter an undefined variable, we do not immediately signal an error, in case we are defining this variable, e.g., "$a = [a-z];". Instead, we save the name of the undefined variable, and substitute in the placeholder char variableLimit - 1, and decrement variableLimit.
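The deferred-error behavior described above can be sketched roughly as follows. This is a simplified, hypothetical model: the class name, the initial range values, and the exception types are assumptions, and only one undefined name is allowed to be pending at a time.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of deferring an "undefined variable" error. */
class DeferredVariableSketch {
    final Map<String, char[]> variableNames = new HashMap<>();
    char variableNext = '\uE000';   // illustrative start of the range
    char variableLimit = '\uE010';  // illustrative exclusive upper bound
    String undefinedVariableName;   // null when no definition is pending

    void appendVariableDef(String name, StringBuilder buf) {
        char[] value = variableNames.get(name);
        if (value != null) {
            buf.append(value);      // known variable: append its value
            return;
        }
        if (undefinedVariableName != null) {
            // A second unknown name cannot be the one being defined.
            throw new IllegalArgumentException("Undefined variable $" + name);
        }
        if (variableNext >= variableLimit) {
            throw new IllegalStateException("Variable range exhausted");
        }
        // Remember the name and emit the placeholder variableLimit - 1,
        // shrinking the available range by one.
        undefinedVariableName = name;
        buf.append(--variableLimit);
    }
}
```

A caller that finishes parsing "$a = [a-z];" would then be expected to resolve or clear the pending name; that step is left out of this sketch.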
-
dotStandIn
private int dotStandIn
The stand-in character for the 'dot' set, represented by '.' in patterns. This is allocated the first time it is needed, and reused thereafter.
-
ID_TOKEN
private static final java.lang.String ID_TOKEN
- See Also:
- Constant Field Values
-
ID_TOKEN_LEN
private static final int ID_TOKEN_LEN
- See Also:
- Constant Field Values
-
VARIABLE_DEF_OP
private static final char VARIABLE_DEF_OP
- See Also:
- Constant Field Values
-
FORWARD_RULE_OP
private static final char FORWARD_RULE_OP
- See Also:
- Constant Field Values
-
REVERSE_RULE_OP
private static final char REVERSE_RULE_OP
- See Also:
- Constant Field Values
-
FWDREV_RULE_OP
private static final char FWDREV_RULE_OP
- See Also:
- Constant Field Values
-
OPERATORS
private static final java.lang.String OPERATORS
- See Also:
- Constant Field Values
-
HALF_ENDERS
private static final java.lang.String HALF_ENDERS
- See Also:
- Constant Field Values
-
QUOTE
private static final char QUOTE
- See Also:
- Constant Field Values
-
ESCAPE
private static final char ESCAPE
- See Also:
- Constant Field Values
-
END_OF_RULE
private static final char END_OF_RULE
- See Also:
- Constant Field Values
-
RULE_COMMENT_CHAR
private static final char RULE_COMMENT_CHAR
- See Also:
- Constant Field Values
-
CONTEXT_ANTE
private static final char CONTEXT_ANTE
- See Also:
- Constant Field Values
-
CONTEXT_POST
private static final char CONTEXT_POST
- See Also:
- Constant Field Values
-
CURSOR_POS
private static final char CURSOR_POS
- See Also:
- Constant Field Values
-
CURSOR_OFFSET
private static final char CURSOR_OFFSET
- See Also:
- Constant Field Values
-
ANCHOR_START
private static final char ANCHOR_START
- See Also:
- Constant Field Values
-
KLEENE_STAR
private static final char KLEENE_STAR
- See Also:
- Constant Field Values
-
ONE_OR_MORE
private static final char ONE_OR_MORE
- See Also:
- Constant Field Values
-
ZERO_OR_ONE
private static final char ZERO_OR_ONE
- See Also:
- Constant Field Values
-
DOT
private static final char DOT
- See Also:
- Constant Field Values
-
DOT_SET
private static final java.lang.String DOT_SET
- See Also:
- Constant Field Values
-
SEGMENT_OPEN
private static final char SEGMENT_OPEN
- See Also:
- Constant Field Values
-
SEGMENT_CLOSE
private static final char SEGMENT_CLOSE
- See Also:
- Constant Field Values
-
FUNCTION
private static final char FUNCTION
- See Also:
- Constant Field Values
-
ALT_REVERSE_RULE_OP
private static final char ALT_REVERSE_RULE_OP
- See Also:
- Constant Field Values
-
ALT_FORWARD_RULE_OP
private static final char ALT_FORWARD_RULE_OP
- See Also:
- Constant Field Values
-
ALT_FWDREV_RULE_OP
private static final char ALT_FWDREV_RULE_OP
- See Also:
- Constant Field Values
-
ALT_FUNCTION
private static final char ALT_FUNCTION
- See Also:
- Constant Field Values
-
ILLEGAL_TOP
private static UnicodeSet ILLEGAL_TOP
-
ILLEGAL_SEG
private static UnicodeSet ILLEGAL_SEG
-
ILLEGAL_FUNC
private static UnicodeSet ILLEGAL_FUNC
-
-
Method Detail
-
parse
public void parse(java.lang.String rules, int dir)
Parse a set of rules. After the parse completes, examine the public data members for results.
-
parseRules
void parseRules(TransliteratorParser.RuleBody ruleArray, int dir)
Parse an array of zero or more rules. The strings in the array are treated as if they were concatenated together, with rule terminators inserted between array elements if not present already. Any previous rules are discarded. Typically this method is called exactly once, during construction. The member this.data will be set to null if there are no rules.
- Throws:
- IllegalIcuArgumentException - if there is a syntax error in the rules
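To make the "concatenated with terminators inserted" behavior concrete, here is a naive sketch. It assumes ';' is the rule terminator and ignores quoting, escapes, and comments, all of which a real parser has to handle; the class and method names are hypothetical.

```java
/** Hypothetical sketch: join rule lines, adding a missing ';' terminator. */
class RuleJoinSketch {
    static String join(String[] lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;               // blank lines contribute nothing
            }
            sb.append(trimmed);
            if (!trimmed.endsWith(";")) {
                sb.append(';');         // insert the terminator if absent
            }
        }
        return sb.toString();
    }
}
```

For example, the lines "a > b" and "c > d;" would be treated as the single rule string "a > b;c > d;".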
-
parseRule
private int parseRule(java.lang.String rule, int pos, int limit)
MAIN PARSER. Parse the next rule in the given rule string, starting at pos. Return the index after the last character parsed. Do not parse characters at or after limit. Important: The character at pos must be a non-whitespace character that is not the comment character. This method handles quoting, escaping, and whitespace removal. It parses the end-of-rule character. It recognizes context and cursor indicators. Once it does a lexical breakdown of the rule at pos, it creates a rule object and adds it to our rule list. This method is tightly coupled to the inner class RuleHalf.
-
setVariableRange
private void setVariableRange(int start, int end)
Set the variable range to [start, end] (inclusive).
-
checkVariableRange
private void checkVariableRange(int ch, java.lang.String rule, int start)
Assert that the given character is NOT within the variable range. If it is, signal an error. This is necessary to ensure that the variable range does not overlap characters used in a rule.
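A minimal sketch of how an inclusive variable range and the corresponding overlap check might fit together. The class name, the -1 "unset" sentinel, the bounds validation, and the exception type are assumptions made for illustration only.

```java
/** Hypothetical sketch of an inclusive variable range and its overlap check. */
class VariableRangeSketch {
    private int rangeStart = -1;  // inclusive; -1 means "not set"
    private int rangeEnd = -1;    // inclusive

    void setVariableRange(int start, int end) {
        if (start < 0 || start > end || end > 0xFFFF) {
            throw new IllegalArgumentException("Invalid variable range");
        }
        rangeStart = start;
        rangeEnd = end;
    }

    /** Signals an error if ch, a literal character in the rule, falls in the range. */
    void checkVariableRange(int ch, String rule, int start) {
        if (ch >= rangeStart && ch <= rangeEnd) {
            throw new IllegalArgumentException(
                "Variable range character in rule at offset " + start + ": " + rule);
        }
    }
}
```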
-
pragmaMaximumBackup
private void pragmaMaximumBackup(int backup)
Set the maximum backup to 'backup', in response to a pragma statement.
-
pragmaNormalizeRules
private void pragmaNormalizeRules(Normalizer.Mode mode)
Begin normalizing all rules using the given mode, in response to a pragma statement.
-
resemblesPragma
static boolean resemblesPragma(java.lang.String rule, int pos, int limit)
Return true if the given rule looks like a pragma.
- Parameters:
- pos - offset to the first non-whitespace character of the rule.
- limit - pointer past the last character of the rule.
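A rough sketch of such a check, assuming that pragmas begin with the keyword "use" followed by whitespace (as in rule text like "use maximum backup 16;"). The case-insensitive comparison and the class name are assumptions, and the actual matching logic may differ.

```java
/** Hypothetical sketch of a cheap "does this look like a pragma?" test. */
class PragmaSketch {
    static boolean resemblesPragma(String rule, int pos, int limit) {
        final String keyword = "use";
        int end = pos + keyword.length();
        // Need the keyword plus at least one following whitespace character
        // before limit; limit is assumed to be at most rule.length().
        return end < limit
            && rule.regionMatches(true, pos, keyword, 0, keyword.length())
            && Character.isWhitespace(rule.charAt(end));
    }
}
```

Because the test only inspects the leading keyword, a rule can resemble a pragma and still fail to parse as one, which is why the two steps are separate methods.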
-
parsePragma
private int parsePragma(java.lang.String rule, int pos, int limit)
Parse a pragma. This method assumes resemblesPragma() has already returned true.
- Parameters:
- pos - offset to the first non-whitespace character of the rule.
- limit - pointer past the last character of the rule.
- Returns:
- the position index after the final ';' of the pragma, or -1 on failure.
-
syntaxError
static final void syntaxError(java.lang.String msg, java.lang.String rule, int start)
Throw an exception indicating a syntax error. Search the rule string for the probable end of the rule. Of course, if the error is that the end of rule marker is missing, then the rule end will not be found. In any case the rule start will be correctly reported.
- Parameters:
- msg - error description
- rule - pattern string
- start - position of first character of current rule
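The "search for the probable end of the rule" step can be sketched as a scan for the first ';' that is neither backslash-escaped nor inside single quotes. This is an illustrative approximation: the class name, the exception type, and the message format are assumptions.

```java
/** Hypothetical sketch of locating a rule's end for error reporting. */
class RuleEndSketch {
    /** Returns the index of the first unquoted, unescaped ';', or limit if none. */
    static int ruleEnd(String rule, int start, int limit) {
        for (int i = start; i < limit; i++) {
            char c = rule.charAt(i);
            if (c == '\\') {
                i++;                        // skip the escaped character
            } else if (c == '\'') {
                // skip ahead to the closing quote, if there is one
                while (++i < limit && rule.charAt(i) != '\'') { }
            } else if (c == ';') {
                return i;
            }
        }
        return limit;                       // terminator missing
    }

    /** Always throws; the message quotes the rule text from start to its end. */
    static void syntaxError(String msg, String rule, int start) {
        int end = ruleEnd(rule, start, rule.length());
        throw new IllegalArgumentException(
            msg + " in \"" + rule.substring(start, end) + '"');
    }
}
```

When the terminator is missing, the quoted text runs to the end of the string, but the reported start is still accurate.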
-
ruleEnd
static final int ruleEnd(java.lang.String rule, int start, int limit)
-
parseSet
private final char parseSet(java.lang.String rule, java.text.ParsePosition pos)
Parse a UnicodeSet out, store it, and return the stand-in character used to represent it.
-
generateStandInFor
char generateStandInFor(java.lang.Object obj)
Generate and return a stand-in for a new UnicodeMatcher or UnicodeReplacer. Store the object.
-
getSegmentStandin
public char getSegmentStandin(int seg)
Return the standin for segment seg (1-based).
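The 1-based indexing can be illustrated with a small sketch in which slot seg-1 of a growable buffer holds the stand-in for segment seg, allocated on first request. The class name, the starting code point U+E000, and the omission of range-exhaustion checks are simplifications assumed here.

```java
/** Hypothetical sketch of lazily assigned, 1-based segment stand-ins. */
class SegmentStandinSketch {
    private final StringBuilder segmentStandins = new StringBuilder();
    private char variableNext = '\uE000';  // illustrative next free stand-in

    char getSegmentStandin(int seg) {
        if (segmentStandins.length() < seg) {
            segmentStandins.setLength(seg);  // new slots are filled with NUL (0)
        }
        char c = segmentStandins.charAt(seg - 1);
        if (c == 0) {
            c = variableNext++;              // first request: allocate a stand-in
            segmentStandins.setCharAt(seg - 1, c);
        }
        return c;
    }
}
```

Segments may be requested out of order; each keeps the stand-in it received on its first request.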
-
setSegmentObject
public void setSegmentObject(int seg, StringMatcher obj)
Set the object for segment seg (1-based).
-
getDotStandIn
char getDotStandIn()
Return the stand-in for the dot set. It is allocated the first time and reused thereafter.
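One plausible reading of "allocated the first time and reused thereafter", and of why the dotStandIn field is an int rather than a char, is a sentinel value meaning "not yet allocated". The sketch below assumes -1 as that sentinel; the class name and the simplified allocation are hypothetical.

```java
/** Hypothetical sketch of lazy, one-time allocation of the dot stand-in. */
class DotStandInSketch {
    private int dotStandIn = -1;   // -1 means "not allocated yet" (assumed sentinel)
    char variableNext = '\uE000';  // illustrative next free stand-in

    char getDotStandIn() {
        if (dotStandIn == -1) {
            dotStandIn = variableNext++;  // allocate exactly once
        }
        return (char) dotStandIn;
    }
}
```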
-
appendVariableDef
private void appendVariableDef(java.lang.String name, java.lang.StringBuffer buf)
Append the value of the given variable name to the given StringBuffer.
- Throws:
- IllegalIcuArgumentException - if the name is unknown.
-
-