Class TransliterationRule
- java.lang.Object
-
- com.ibm.icu.text.TransliterationRule
-
class TransliterationRule extends java.lang.Object
A transliteration rule used by RuleBasedTransliterator. TransliterationRule is an immutable object.
A rule consists of an input pattern and an output string. When the input pattern is matched, the output string is emitted. The input pattern consists of zero or more characters which are matched exactly (the key) and optional context. Context must match if it is specified. Context may be specified before the key, after the key, or both. The key, preceding context, and following context may contain variables. Variables represent a set of Unicode characters, such as the letters a through z. Variables are detected by looking up each character in a supplied variable list to see if it has been so defined.
A rule may contain segments in its input string and segment references in its output string. A segment is a substring of the input pattern, indicated by an offset and limit. The segment may be in the preceding or following context. It may not span a context boundary. A segment reference is a special character in the output string that causes a segment of the input string (not the input pattern) to be copied to the output string. The range of special characters that represent segment references is defined by RuleBasedTransliterator.Data.
Example: The rule "([a-z]) . ([0-9]) > $2 . $1" will change the input string "abc.123" to "ab1.c23".
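The effect of the segment references in this example can be approximated with java.util.regex, which uses the same $1 and $2 notation for captured groups. This is only an analogy, not how TransliterationRule is implemented: the class name and the applyOnce helper are hypothetical, the dot is assumed to be a literal character, and the rule is applied once at the first match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SegmentReferenceSketch {
    // Hypothetical helper: applies a regex analogue of the rule
    // "([a-z]) . ([0-9]) > $2 . $1" at the first match only.
    static String applyOnce(String text) {
        // Group 1 is the letter segment, group 2 is the digit segment.
        Pattern input = Pattern.compile("([a-z])\\.([0-9])");
        Matcher m = input.matcher(text);
        // The output swaps the two segments around the literal dot.
        return m.replaceFirst("$2.$1");
    }

    public static void main(String[] args) {
        System.out.println(applyOnce("abc.123")); // prints ab1.c23
    }
}
```

The segments are copied from the matched input text ("c" and "1"), not from the pattern itself, which is the distinction the description above draws between the input string and the input pattern.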
Copyright © IBM Corporation 1999. All rights reserved.
-
-
Field Summary
Fields
Modifier and Type | Field | Description
(package private) static int | ANCHOR_END |
(package private) static int | ANCHOR_START | Flag attributes.
private StringMatcher | anteContext | The match that must occur before the key, or null if there is no preceding context.
private int | anteContextLength | The length of the string that must match before the key.
private RuleBasedTransliterator.Data | data | An alias pointer to the data for this rule.
(package private) byte | flags | Miscellaneous attributes.
private StringMatcher | key | The matcher object for the key.
private int | keyLength | The length of the key.
private UnicodeReplacer | output | The object that performs the replacement if the key, anteContext, and postContext are matched.
private java.lang.String | pattern | The string that must be matched, consisting of the anteContext, key, and postContext, concatenated together, in that order.
private StringMatcher | postContext | The match that must occur after the key, or null if there is no following context.
(package private) UnicodeMatcher[] | segments | An array of matcher objects corresponding to the input pattern segments.
-
Constructor Summary
Constructors
Constructor | Description
TransliterationRule(java.lang.String input, int anteContextPos, int postContextPos, java.lang.String output, int cursorPos, int cursorOffset, UnicodeMatcher[] segs, boolean anchorStart, boolean anchorEnd, RuleBasedTransliterator.Data theData) | Construct a new rule with the given input, output text, and other attributes.
-
Method Summary
Modifier and Type | Method | Description
(package private) void | addSourceTargetSet(UnicodeSet filter, UnicodeSet sourceSet, UnicodeSet targetSet, UnicodeSet revisiting) | Find the source and target sets, subject to the input filter.
int | getAnteContextLength() | Return the preceding context length.
(package private) int | getIndexValue() | Internal method.
boolean | masks(TransliterationRule r2) | Return true if this rule masks another rule.
int | matchAndReplace(Replaceable text, Transliterator.Position pos, boolean incremental) | Attempt a match and replacement at the given position.
(package private) boolean | matchesIndexValue(int v) | Internal method.
(package private) static int | posAfter(Replaceable str, int pos) |
(package private) static int | posBefore(Replaceable str, int pos) |
java.lang.String | toRule(boolean escapeUnprintable) | Create a source string that represents this rule.
java.lang.String | toString() | Return a string representation of this object.
-
-
-
Field Detail
-
anteContext
private StringMatcher anteContext
The match that must occur before the key, or null if there is no preceding context.
-
key
private StringMatcher key
The matcher object for the key. If null, then the key is empty.
-
postContext
private StringMatcher postContext
The match that must occur after the key, or null if there is no following context.
-
output
private UnicodeReplacer output
The object that performs the replacement if the key, anteContext, and postContext are matched. Never null.
-
pattern
private java.lang.String pattern
The string that must be matched, consisting of the anteContext, key, and postContext, concatenated together, in that order. Some components may be empty (zero length).
See Also: anteContextLength, keyLength
-
segments
UnicodeMatcher[] segments
An array of matcher objects corresponding to the input pattern segments. If there are no segments this is null. N.B. This is a UnicodeMatcher for generality, but in practice it is always a StringMatcher. In the future we may generalize this, but for now we sometimes cast down to StringMatcher.
-
anteContextLength
private int anteContextLength
The length of the string that must match before the key. If zero, then there is no matching requirement before the key. Substring [0,anteContextLength) of pattern is the anteContext.
-
keyLength
private int keyLength
The length of the key. Substring [anteContextLength, anteContextLength + keyLength) is the key.
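The two length fields partition pattern into its three components. A minimal sketch of that layout follows, assuming plain strings stand in for the matcher objects; the class and the split helper are hypothetical and are not part of TransliterationRule.

```java
class PatternLayoutSketch {
    // Hypothetical helper: returns { anteContext, key, postContext }.
    static String[] split(String pattern, int anteContextLength, int keyLength) {
        int keyEnd = anteContextLength + keyLength;
        return new String[] {
            // [0, anteContextLength) is the ante context.
            pattern.substring(0, anteContextLength),
            // [anteContextLength, anteContextLength + keyLength) is the key.
            pattern.substring(anteContextLength, keyEnd),
            // Whatever remains is the post context.
            pattern.substring(keyEnd)
        };
    }

    public static void main(String[] args) {
        String[] parts = split("abcdefgh", 3, 3);
        System.out.println(String.join(" | ", parts)); // prints abc | def | gh
    }
}
```

Any of the three components may be empty, in which case the corresponding substring has zero length.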
-
flags
byte flags
Miscellaneous attributes.
-
ANCHOR_START
static final int ANCHOR_START
Flag attributes.
See Also: Constant Field Values
-
ANCHOR_END
static final int ANCHOR_END
See Also: Constant Field Values
-
data
private final RuleBasedTransliterator.Data data
An alias pointer to the data for this rule. The data provides lookup services for matchers and segments.
-
-
Constructor Detail
-
TransliterationRule
public TransliterationRule(java.lang.String input, int anteContextPos, int postContextPos, java.lang.String output, int cursorPos, int cursorOffset, UnicodeMatcher[] segs, boolean anchorStart, boolean anchorEnd, RuleBasedTransliterator.Data theData)
Construct a new rule with the given input, output text, and other attributes. A cursor position may be specified for the output text.
Parameters:
input - input string, including key and optional ante and post context
anteContextPos - offset into input to end of ante context, or -1 if none. Must be <= input.length() if not -1.
postContextPos - offset into input to start of post context, or -1 if none. Must be <= input.length() if not -1, and must be >= anteContextPos.
output - output string
cursorPos - offset into output at which cursor is located, or -1 if none. If less than zero, then the cursor is placed after the output; that is, -1 is equivalent to output.length(). If greater than output.length() then an exception is thrown.
cursorOffset - an offset to be added to cursorPos to position the cursor either in the ante context, if < 0, or in the post context, if > 0. For example, the rule "abc{def} > | @@@ xyz;" changes "def" to "xyz" and moves the cursor to before "a". It would have a cursorOffset of -3.
segs - array of UnicodeMatcher corresponding to input pattern segments, or null if there are none
anchorStart - true if the rule is anchored on the left to the context start
anchorEnd - true if the rule is anchored on the right to the context limit
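The parameter constraints above can be read as the following normalization of the position arguments. This is a sketch inferred from the parameter descriptions, not the constructor's actual code: the class and helper names are hypothetical, and IllegalArgumentException is an assumed exception type, since the documentation does not name one.

```java
class RuleArgumentSketch {
    // Hypothetical helper: -1 (or any negative value) means "no ante context".
    static int anteContextLength(String input, int anteContextPos) {
        if (anteContextPos < 0) return 0;
        if (anteContextPos > input.length()) {
            throw new IllegalArgumentException("anteContextPos past end of input");
        }
        return anteContextPos;
    }

    // Hypothetical helper: the key runs from the end of the ante context
    // to the start of the post context, or to the end of input if there is none.
    static int keyLength(String input, int anteContextPos, int postContextPos) {
        int ante = anteContextLength(input, anteContextPos);
        if (postContextPos < 0) return input.length() - ante;
        if (postContextPos < ante || postContextPos > input.length()) {
            throw new IllegalArgumentException("postContextPos out of range");
        }
        return postContextPos - ante;
    }

    // Hypothetical helper: a negative cursorPos is equivalent to output.length().
    static int resolveCursorPos(String output, int cursorPos) {
        if (cursorPos < 0) return output.length();
        if (cursorPos > output.length()) {
            throw new IllegalArgumentException("cursorPos past end of output");
        }
        return cursorPos;
    }

    public static void main(String[] args) {
        // Input "abcdef" with ante context "abc" and no post context.
        System.out.println(anteContextLength("abcdef", 3)); // prints 3
        System.out.println(keyLength("abcdef", 3, -1));     // prints 3
        System.out.println(resolveCursorPos("xyz", -1));    // prints 3
    }
}
```

For the rule "abc{def} > | @@@ xyz;" the input would be "abcdef" with anteContextPos 3, and the cursorOffset of -3 is then applied on top of the resolved cursor position to move the cursor back into the ante context.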
-
-
Method Detail
-
getAnteContextLength
public int getAnteContextLength()
Return the preceding context length. This method is needed to support the Transliterator method getMaximumContextLength().
-
getIndexValue
final int getIndexValue()
Internal method. Returns 8-bit index value for this rule. This is the low byte of the first character of the key, unless the first character of the key is a set. If it's a set, or otherwise can match multiple keys, the index value is -1.
-
matchesIndexValue
final boolean matchesIndexValue(int v)
Internal method. Returns true if this rule matches the given index value. The index value is an 8-bit integer, 0..255, representing the low byte of the first character of the key. It matches this rule if it matches the first character of the key, or if the first character of the key is a set, and the set contains any character with a low byte equal to the index value. If the rule contains only ante context, as in foo)>bar, then it will match any key.
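The index-value scheme described for getIndexValue and matchesIndexValue can be sketched as follows. The representation is an assumption made for illustration: the first key position is modelled as an array of the characters it can match (one element for a literal, several for a set, empty when the rule has no key), and the class is hypothetical.

```java
class IndexValueSketch {
    // Returns the low byte of the first key character, or -1 when the first
    // position is a set or the key is empty (so it can match multiple keys).
    static int getIndexValue(char[] firstKeyChars) {
        if (firstKeyChars.length != 1) return -1;
        return firstKeyChars[0] & 0xFF;
    }

    // Returns true if any character the first key position can match has a
    // low byte equal to v. A rule with no key matches every index value.
    static boolean matchesIndexValue(char[] firstKeyChars, int v) {
        if (firstKeyChars.length == 0) return true;
        for (char c : firstKeyChars) {
            if ((c & 0xFF) == v) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(getIndexValue(new char[] {'a'}));                // prints 97
        System.out.println(getIndexValue(new char[] {'a', 'b'}));           // prints -1
        System.out.println(matchesIndexValue(new char[] {'a', 'b'}, 0x62)); // prints true
    }
}
```

Because only the low byte is compared, distinct characters such as U+0061 and U+0161 share an index value; the index narrows the candidate rules but does not replace the full match.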
-
masks
public boolean masks(TransliterationRule r2)
Return true if this rule masks another rule. If r1 masks r2 then r1 matches any input string that r2 matches. If r1 masks r2 and r2 masks r1 then r1 == r2. Examples: "a>x" masks "ab>y". "a>x" masks "a[b]>y". "[c]a>x" masks "[dc]a>y".
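For rules made only of literal characters, the masking relation in these examples can be sketched as a suffix test on the ante context and a prefix test on the key plus post context. This is a simplified illustration, not the masks implementation: it ignores anchors, sets, and variables, the class is hypothetical, and it assumes the bracketed parts of the examples denote context rather than character sets.

```java
class MaskSketch {
    // ante1/rest1 describe r1, ante2/rest2 describe r2, where "rest" is the
    // key followed by the post context. r1 masks r2 if everything r1 requires
    // is also required by r2 at the same position relative to the key start.
    static boolean masks(String ante1, String rest1, String ante2, String rest2) {
        return ante2.endsWith(ante1) && rest2.startsWith(rest1);
    }

    public static void main(String[] args) {
        System.out.println(masks("", "a", "", "ab"));   // "a>x" vs "ab>y": prints true
        System.out.println(masks("c", "a", "dc", "a")); // "[c]a>x" vs "[dc]a>y": prints true
        System.out.println(masks("", "ab", "", "a"));   // "ab>y" vs "a>x": prints false
    }
}
```

A rule that masks a later rule makes the later rule unreachable, which is why a rule set needs to detect the relation.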
-
posBefore
static final int posBefore(Replaceable str, int pos)
-
posAfter
static final int posAfter(Replaceable str, int pos)
-
matchAndReplace
public int matchAndReplace(Replaceable text, Transliterator.Position pos, boolean incremental)
Attempt a match and replacement at the given position. Return the degree of match between this rule and the given text. The degree of match may be mismatch, a partial match, or a full match. A mismatch means at least one character of the text does not match the context or key. A partial match means some context and key characters match, but the text is not long enough to match all of them. A full match means all context and key characters match. If a full match is obtained, perform a replacement, update pos, and return U_MATCH. Otherwise both text and pos are unchanged.
Parameters:
text - the text
pos - the position indices
incremental - if true, test for partial matches that may be completed by additional text inserted at pos.limit.
Returns:
one of U_MISMATCH, U_PARTIAL_MATCH, or U_MATCH. If incremental is false then U_PARTIAL_MATCH will not be returned.
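The three degrees of match can be illustrated with a literal pattern compared against plain text. This is a minimal sketch under stated assumptions, not the matchAndReplace implementation: the class and the Degree enum are hypothetical stand-ins for the U_MISMATCH, U_PARTIAL_MATCH, and U_MATCH constants, the pattern is the concatenated context and key, and no replacement is performed.

```java
class MatchDegreeSketch {
    enum Degree { MISMATCH, PARTIAL_MATCH, MATCH }

    // Compares pattern against text beginning at index start.
    static Degree degreeOfMatch(String pattern, String text, int start, boolean incremental) {
        for (int i = 0; i < pattern.length(); i++) {
            int t = start + i;
            if (t >= text.length()) {
                // The text ran out before the pattern did. More text could
                // still complete the match, but only incremental callers are
                // told so. Assumption: this holds even if nothing matched yet.
                return incremental ? Degree.PARTIAL_MATCH : Degree.MISMATCH;
            }
            if (text.charAt(t) != pattern.charAt(i)) {
                return Degree.MISMATCH;
            }
        }
        return Degree.MATCH;
    }

    public static void main(String[] args) {
        System.out.println(degreeOfMatch("abc", "xxab", 2, true));   // prints PARTIAL_MATCH
        System.out.println(degreeOfMatch("abc", "xxab", 2, false));  // prints MISMATCH
        System.out.println(degreeOfMatch("abc", "xxabc", 2, false)); // prints MATCH
    }
}
```

The incremental flag matters when text arrives a keystroke at a time: a partial match tells the caller to wait for more input instead of giving up on the rule.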
-
toRule
public java.lang.String toRule(boolean escapeUnprintable)
Create a source string that represents this rule. Append it to the given string.
-
toString
public java.lang.String toString()
Return a string representation of this object.
Overrides:
toString in class java.lang.Object
Returns:
string representation of this object
-
addSourceTargetSet
void addSourceTargetSet(UnicodeSet filter, UnicodeSet sourceSet, UnicodeSet targetSet, UnicodeSet revisiting)
Find the source and target sets, subject to the input filter. There is a known issue with filters containing multiple characters.
-
-