Class TransliteratorParser


  • class TransliteratorParser
    extends java.lang.Object
    • Field Detail

      • dataVector

        public java.util.List<RuleBasedTransliterator.Data> dataVector
        PUBLIC data member. A List of RuleBasedTransliterator.Data objects, one for each discrete group of rules in the rule set.
      • idBlockVector

        public java.util.List<java.lang.String> idBlockVector
        PUBLIC data member. A List of Strings containing all of the ID blocks in the rule set.
      • compoundFilter

        public UnicodeSet compoundFilter
        PUBLIC data member containing the parsed compound filter, if any.
      • direction

        private int direction
      • variablesVector

        private java.util.List<java.lang.Object> variablesVector
        Temporary vector of set variables. When parsing is complete, this is copied into the array data.variables. As with data.variables, element 0 corresponds to character data.variablesBase.
      • variableNames

        private java.util.Map<java.lang.String, char[]> variableNames
        Temporary table of variable names. When parsing is complete, this is copied into data.variableNames.
      • segmentStandins

        private java.lang.StringBuffer segmentStandins
        String of stand-ins for segments. Used during the parsing of a single rule. segmentStandins.charAt(0) is the stand-in for "$1" and corresponds to StringMatcher object segmentObjects.get(0), etc.
      • segmentObjects

        private java.util.List<StringMatcher> segmentObjects
        List of StringMatcher objects for segments. Used during the parsing of a single rule. segmentStandins.charAt(0) is the stand-in for "$1" and corresponds to StringMatcher object segmentObjects.get(0), etc.
      • variableNext

        private char variableNext
        The next available stand-in for variables. This starts at some point in the private use area (discovered dynamically) and increments up toward variableLimit. At any point during parsing, available variables are variableNext..variableLimit-1.
      • variableLimit

        private char variableLimit
        The last available stand-in for variables. This is discovered dynamically. At any point during parsing, available variables are variableNext..variableLimit-1. During variable definition we use the special value variableLimit-1 as a placeholder.
      • undefinedVariableName

        private java.lang.String undefinedVariableName
        When we encounter an undefined variable, we do not immediately signal an error, in case we are defining this variable, e.g., "$a = [a-z];". Instead, we save the name of the undefined variable, and substitute in the placeholder char variableLimit - 1, and decrement variableLimit.
      • dotStandIn

        private int dotStandIn
        The stand-in character for the 'dot' set, represented by '.' in patterns. This is allocated the first time it is needed, and reused thereafter.
      • ILLEGAL_TOP

        private static UnicodeSet ILLEGAL_TOP
      • ILLEGAL_SEG

        private static UnicodeSet ILLEGAL_SEG
      • ILLEGAL_FUNC

        private static UnicodeSet ILLEGAL_FUNC
    • Constructor Detail

      • TransliteratorParser

        public TransliteratorParser()
        Constructor.
    • Method Detail

      • parse

        public void parse(java.lang.String rules,
                          int dir)
        Parse a set of rules. After the parse completes, examine the public data members for results.
      • parseRules

        void parseRules(TransliteratorParser.RuleBody ruleArray,
                        int dir)
        Parse an array of zero or more rules. The strings in the array are treated as if they were concatenated together, with rule terminators inserted between array elements if not present already. Any previous rules are discarded. Typically this method is called exactly once, during construction. The member this.data will be set to null if there are no rules.
        Throws:
        IllegalIcuArgumentException - if there is a syntax error in the rules
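        The concatenation rule described above can be illustrated with a small helper. This is a hypothetical sketch, not ICU code: RuleJoiner and join are invented names, the input is a plain String[] rather than a RuleBody, and the check only looks at whether an element's last character is ';' (trailing whitespace, comments and escaped terminators are ignored).

```java
/**
 * Hypothetical helper: treats an array of rule strings as one rule source by
 * concatenating the elements and inserting a ';' after any non-empty element
 * that does not already end with one. Simplification: trailing whitespace,
 * comments and escaped terminators are not considered.
 */
final class RuleJoiner {
    private RuleJoiner() {}

    static String join(String[] rules) {
        StringBuilder sb = new StringBuilder();
        for (String r : rules) {
            sb.append(r);
            // Insert a terminator only if this element lacks one.
            if (!r.isEmpty() && r.charAt(r.length() - 1) != ';') {
                sb.append(';');
            }
        }
        return sb.toString();
    }
}
```

For example, {"a > b", "c > d;"} joins to "a > b;c > d;".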
      • parseRule

        private int parseRule(java.lang.String rule,
                              int pos,
                              int limit)
        MAIN PARSER. Parse the next rule in the given rule string, starting at pos. Return the index after the last character parsed. Do not parse characters at or after limit. Important: The character at pos must be a non-whitespace character that is not the comment character. This method handles quoting, escaping, and whitespace removal. It parses the end-of-rule character. It recognizes context and cursor indicators. Once it does a lexical breakdown of the rule at pos, it creates a rule object and adds it to our rule list. This method is tightly coupled to the inner class RuleHalf.
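        Because parseRule() requires pos to point at a non-whitespace character that is not the comment character, a caller has to skip whitespace and comments first. A minimal sketch of that skipping loop, assuming '#' is the comment character (as in ICU transliterator rules) and using Character.isWhitespace as a stand-in for ICU's own whitespace definition; RuleScanner and nextRuleStart are hypothetical names.

```java
/**
 * Hypothetical sketch: finds the next position at which a rule parser such as
 * parseRule() may be called. Assumes '#' starts a comment that runs to the end
 * of the line; Character.isWhitespace approximates the real whitespace set.
 */
final class RuleScanner {
    private RuleScanner() {}

    /** Returns the index of the next rule start in [pos, limit), or limit if none. */
    static int nextRuleStart(String rules, int pos, int limit) {
        while (pos < limit) {
            char c = rules.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (c == '#') {
                // Skip the comment up to and including the next newline.
                int nl = rules.indexOf('\n', pos);
                pos = (nl < 0 || nl >= limit) ? limit : nl + 1;
            } else {
                return pos;
            }
        }
        return limit;
    }
}
```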
      • setVariableRange

        private void setVariableRange(int start,
                                      int end)
        Set the variable range to [start, end] (inclusive).
      • checkVariableRange

        private void checkVariableRange(int ch,
                                        java.lang.String rule,
                                        int start)
        Assert that the given character is NOT within the variable range. If it is, signal an error. This is necessary to ensure that the variable range does not overlap characters used in a rule.
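        The bookkeeping behind setVariableRange() and checkVariableRange() amounts to a half-open interval check. The class below is an illustrative sketch: variableNext and variableLimit follow the field descriptions above and variablesBase follows the data.variablesBase mention, but the validation details and the use of IllegalArgumentException are assumptions. It rejects end = 0xFFFF so that end + 1 still fits in a char.

```java
/**
 * Illustrative sketch of variable-range bookkeeping, not ICU code.
 * Available stand-ins are variableNext .. variableLimit - 1.
 */
class VariableRange {
    char variablesBase; // first stand-in in the range
    char variableNext;  // next unused stand-in
    char variableLimit; // one past the last stand-in

    /** Sets the range to [start, end], inclusive. */
    void setVariableRange(int start, int end) {
        // Rejecting end == 0xFFFF keeps end + 1 representable as a char.
        if (start < 0 || start > end || end >= 0xFFFF) {
            throw new IllegalArgumentException("Invalid variable range " + start + ", " + end);
        }
        variablesBase = (char) start;
        variableNext = (char) start;
        variableLimit = (char) (end + 1);
    }

    /** Throws if ch, a character used literally in a rule, lies inside the range. */
    void checkVariableRange(int ch, String rule, int start) {
        if (ch >= variablesBase && ch < variableLimit) {
            throw new IllegalArgumentException(
                "Variable range character in rule at offset " + start + ": " + rule);
        }
    }
}
```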
      • pragmaMaximumBackup

        private void pragmaMaximumBackup(int backup)
        Set the maximum backup to 'backup', in response to a pragma statement.
      • pragmaNormalizeRules

        private void pragmaNormalizeRules(Normalizer.Mode mode)
        Begin normalizing all rules using the given mode, in response to a pragma statement.
      • resemblesPragma

        static boolean resemblesPragma(java.lang.String rule,
                                       int pos,
                                       int limit)
        Return true if the given rule looks like a pragma.
        Parameters:
        pos - offset to the first non-whitespace character of the rule.
        limit - index just past the last character of the rule.
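        In ICU's transliterator rule syntax, pragmas start with the keyword "use" (for example "use maximum backup 1;" or "use nfd rules;"). A rough approximation of the check, assuming a case-sensitive "use" followed by whitespace; PragmaCheck is a hypothetical name, and ICU's real matcher may accept or reject slightly different input.

```java
/**
 * Hypothetical approximation of resemblesPragma(): a rule resembles a pragma
 * when it starts with "use" followed by whitespace.
 */
final class PragmaCheck {
    private PragmaCheck() {}

    static boolean resemblesPragma(String rule, int pos, int limit) {
        String kw = "use";
        // Need at least the keyword plus one more character before limit.
        if (limit - pos <= kw.length() || !rule.startsWith(kw, pos)) {
            return false;
        }
        // Whitespace must follow, so a rule such as "user > x;" is not a pragma.
        return Character.isWhitespace(rule.charAt(pos + kw.length()));
    }
}
```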
      • parsePragma

        private int parsePragma(java.lang.String rule,
                                int pos,
                                int limit)
        Parse a pragma. This method assumes resemblesPragma() has already returned true.
        Parameters:
        pos - offset to the first non-whitespace character of the rule.
        limit - index just past the last character of the rule.
        Returns:
        the position index after the final ';' of the pragma, or -1 on failure.
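        A sketch of parsePragma() that handles just two pragma forms with regular expressions and returns the index just past the terminating ';', or -1 if nothing matches. The chosen forms, the regexes and the fields maximumBackup and normalization are assumptions of this sketch; the real method dispatches to pragmaMaximumBackup() and pragmaNormalizeRules() and may recognize other forms as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of pragma parsing for two forms only:
 *   use maximum backup <n>;
 *   use nfd rules;   or   use nfc rules;
 * Settings are recorded in fields instead of calling pragma* methods.
 */
class PragmaSketch {
    private static final Pattern BACKUP =
        Pattern.compile("use\\s+maximum\\s+backup\\s+(\\d+)\\s*;");
    private static final Pattern NORMALIZE =
        Pattern.compile("use\\s+(nfd|nfc)\\s+rules\\s*;");

    int maximumBackup = -1;
    String normalization;

    /** Assumes resemblesPragma(rule, pos, limit) already returned true. */
    int parsePragma(String rule, int pos, int limit) {
        Matcher m = BACKUP.matcher(rule).region(pos, limit);
        if (m.lookingAt()) {
            maximumBackup = Integer.parseInt(m.group(1));
            return m.end(); // index just past ';'
        }
        m = NORMALIZE.matcher(rule).region(pos, limit);
        if (m.lookingAt()) {
            normalization = m.group(1);
            return m.end();
        }
        return -1; // unrecognized pragma
    }
}
```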
      • syntaxError

        static final void syntaxError(java.lang.String msg,
                                      java.lang.String rule,
                                      int start)
        Throw an exception indicating a syntax error. Search the rule string for the probable end of the rule. Of course, if the error is that the end of rule marker is missing, then the rule end will not be found. In any case the rule start will be correctly reported.
        Parameters:
        msg - error description
        rule - pattern string
        start - position of first character of current rule
      • ruleEnd

        static final int ruleEnd(java.lang.String rule,
                                 int start,
                                 int limit)
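        The error-reporting strategy described for syntaxError() can be sketched as follows: the reported context runs from the start of the current rule to the next ';', or to the end of the string if no terminator is found. This is a hypothetical sketch; the behavior given to ruleEnd() here is an assumption (it has no description above), it does not skip ';' characters inside quotes or after a backslash, and IllegalArgumentException stands in for IllegalIcuArgumentException.

```java
/**
 * Hypothetical sketch of syntax-error reporting with rule context.
 */
final class SyntaxErrors {
    private SyntaxErrors() {}

    /** Returns the index of the next ';' in [start, limit), or limit if none. */
    static int ruleEnd(String rule, int start, int limit) {
        int end = rule.indexOf(';', start);
        if (end < 0 || end >= limit) {
            end = limit; // missing terminator: report up to the limit
        }
        return end;
    }

    /** Always throws; the message quotes the offending rule. */
    static void syntaxError(String msg, String rule, int start) {
        int end = ruleEnd(rule, start, rule.length());
        throw new IllegalArgumentException(msg + " in \"" + rule.substring(start, end) + '"');
    }
}
```

For example, syntaxError("Missing operator", "a > b; c d; e > f;", 7) throws with the message Missing operator in "c d".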
      • parseSet

        private final char parseSet(java.lang.String rule,
                                    java.text.ParsePosition pos)
        Parse a UnicodeSet out, store it, and return the stand-in character used to represent it.
      • generateStandInFor

        char generateStandInFor(java.lang.Object obj)
        Generate and return a stand-in for a new UnicodeMatcher or UnicodeReplacer. Store the object.
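        A sketch of stand-in generation consistent with the field descriptions above: each object is stored once and stand-ins are handed out sequentially from variableNext, so stand-in c maps back to element c - variablesBase of the stored list. StandInTable and lookup are hypothetical names, Object replaces UnicodeMatcher/UnicodeReplacer to keep the example self-contained, and returning the existing stand-in for an object stored earlier (compared by identity) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of sequential stand-in allocation with reverse lookup.
 */
class StandInTable {
    private final char variablesBase;
    private char variableNext;
    private final char variableLimit;
    private final List<Object> variables = new ArrayList<>();

    StandInTable(char base, char limit) {
        variablesBase = base;
        variableNext = base;
        variableLimit = limit;
    }

    char generateStandInFor(Object obj) {
        // Reuse the stand-in of an identical object stored earlier.
        for (int i = 0; i < variables.size(); i++) {
            if (variables.get(i) == obj) {
                return (char) (variablesBase + i);
            }
        }
        if (variableNext >= variableLimit) {
            throw new IllegalStateException("Variable range exhausted");
        }
        variables.add(obj);
        return variableNext++;
    }

    /** Maps a stand-in back to its object, or null if it is not one. */
    Object lookup(char standIn) {
        int i = standIn - variablesBase;
        return (i >= 0 && i < variables.size()) ? variables.get(i) : null;
    }
}
```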
      • getSegmentStandin

        public char getSegmentStandin(int seg)
        Return the stand-in for segment seg (1-based).
      • setSegmentObject

        public void setSegmentObject(int seg,
                                     StringMatcher obj)
        Set the object for segment seg (1-based).
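        The 1-based segment convention (segment n, written "$n", lives at index n - 1 of segmentStandins and segmentObjects) can be sketched as below. This is a hypothetical illustration: SegmentTable and getSegmentObject are invented names, Object stands in for StringMatcher, and allocating a segment's stand-in lazily on first request is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of 1-based segment bookkeeping.
 * A stand-in value of 0 in segmentStandins means "not allocated yet".
 */
class SegmentTable {
    private final StringBuilder segmentStandins = new StringBuilder();
    private final List<Object> segmentObjects = new ArrayList<>();
    private char variableNext;
    private final char variableLimit;

    SegmentTable(char next, char limit) {
        variableNext = next;
        variableLimit = limit;
    }

    char getSegmentStandin(int seg) {
        if (segmentStandins.length() < seg) {
            segmentStandins.setLength(seg); // new slots are filled with NUL (char 0)
        }
        char c = segmentStandins.charAt(seg - 1);
        if (c == 0) {
            if (variableNext >= variableLimit) {
                throw new IllegalStateException("Variable range exhausted");
            }
            c = variableNext++;
            segmentStandins.setCharAt(seg - 1, c);
        }
        return c;
    }

    void setSegmentObject(int seg, Object obj) {
        while (segmentObjects.size() < seg) {
            segmentObjects.add(null);
        }
        getSegmentStandin(seg); // ensure the segment has a stand-in
        segmentObjects.set(seg - 1, obj);
    }

    Object getSegmentObject(int seg) {
        return seg <= segmentObjects.size() ? segmentObjects.get(seg - 1) : null;
    }
}
```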
      • getDotStandIn

        char getDotStandIn()
        Return the stand-in for the dot set. It is allocated the first time and reused thereafter.
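        The allocate-once behavior of the dot stand-in can be sketched with a -1 sentinel, matching the int type of dotStandIn. This is a hypothetical illustration: DotStandInSketch is an invented name, the private generateStandInFor is a simplified allocator, and the dot set used here (everything except CR, LF, U+2028 and U+2029, expressed as an IntPredicate) is illustrative only; the exact set ICU uses is not given in this description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch of lazily allocating the '.' stand-in.
 */
class DotStandInSketch {
    private final List<Object> variables = new ArrayList<>();
    private final char base;
    private int dotStandIn = -1; // -1 means "not yet allocated"

    DotStandInSketch(char base) {
        this.base = base;
    }

    // Simplified allocator: stores obj and returns the next stand-in.
    private char generateStandInFor(Object obj) {
        variables.add(obj);
        return (char) (base + variables.size() - 1);
    }

    char getDotStandIn() {
        if (dotStandIn == -1) {
            // Illustrative dot set: any char except CR, LF, LINE SEPARATOR, PARAGRAPH SEPARATOR.
            IntPredicate dot = c -> c != '\r' && c != '\n' && c != 0x2028 && c != 0x2029;
            dotStandIn = generateStandInFor(dot);
        }
        return (char) dotStandIn;
    }

    int storedObjects() {
        return variables.size();
    }
}
```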
      • appendVariableDef

        private void appendVariableDef(java.lang.String name,
                                       java.lang.StringBuffer buf)
        Append the value of the given variable name to the given StringBuffer.
        Throws:
        IllegalIcuArgumentException - if the name is unknown.
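        Combining this method with the undefinedVariableName description gives the following sketch: a defined variable expands to its stored char[] value, the first undefined name is tolerated (so that a definition such as "$a = [a-z];" can be parsed) and expands to a placeholder taken from the top of the variable range, and a second undefined name is an error. VariableDefs is a hypothetical class, and IllegalArgumentException stands in for ICU's IllegalIcuArgumentException.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of variable expansion with one tolerated undefined name.
 */
class VariableDefs {
    final Map<String, char[]> variableNames = new HashMap<>();
    String undefinedVariableName;
    char variableNext;
    char variableLimit;

    VariableDefs(char next, char limit) {
        variableNext = next;
        variableLimit = limit;
    }

    void appendVariableDef(String name, StringBuffer buf) {
        char[] ch = variableNames.get(name);
        if (ch != null) {
            buf.append(ch); // defined: append its value
            return;
        }
        if (undefinedVariableName != null) {
            throw new IllegalArgumentException("Undefined variable $" + name);
        }
        if (variableNext >= variableLimit) {
            throw new IllegalStateException("Variable range exhausted");
        }
        undefinedVariableName = name;
        buf.append(--variableLimit); // placeholder = old variableLimit - 1
    }
}
```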