Class UnicodeBidiAlgorithm
java.lang.Object
org.apache.fop.complexscripts.bidi.UnicodeBidiAlgorithm
- All Implemented Interfaces:
BidiConstants
The UnicodeBidiAlgorithm
class implements functionality prescribed by
the Unicode Bidirectional Algorithm, Unicode Standard Annex #9.
This work was originally authored by Glenn Adams (gadams@apache.org).
-
Field Summary
Fields
Modifier and Type  Field  Description
private static final org.apache.commons.logging.Log  log  logging instance
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type  Method  Description
private static int  convertToScalar(int chHi, int chLo)  Convert UTF-16 surrogate pair to Unicode scalar value.
private static boolean  convertToScalar(CharSequence cs, int[] chars)  Convert character sequence (a UTF-16 encoded string) to an array of Unicode scalar values expressed as integers.
private static int[]  copySequence(int[] ta)
private static int  directionOfLevel(int level)
private static void
private static int  findNextNonRetainedFormattingLevel(int[] wca, int[] ea, int start, int lPrev)
private static int[]  getClasses(int[] chars)
private static String  getClassName(int bc)
private static int  getLevelRunLength(int[] ea, int start)
private static int  getRetainedFormattingRunLength(int[] wca, int start)
private static boolean  isNeutral(int bc)
private static boolean  isRetainedFormatting(int bc)
private static boolean  isRetainedFormatting(int[] ca, int s, int e)
private static boolean  isStrong(int bc)
private static int  levelOfEmbedding(int embedding)
private static int[]  levelsFromEmbeddings(int[] ea, int[] la)
private static int  max(int x, int y)
private static String  padLeft(int n, int width)
private static String
private static String
private static void  resolveAdjacentBoundaryNeutrals(int[] wca, int start, int end, int index, int bcNew)
private static void  resolveExplicit(int[] wca, int defaultLevel, int[] ea)
private static void  resolveImplicit(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor)
static int[]  resolveLevels(int[] chars, int[] classes, int defaultLevel, int[] levels, boolean useRuleL1)  Resolve the directionality levels of each character in a character sequence.
static int[]  resolveLevels(int[] chars, int defaultLevel, int[] levels)  Resolve the directionality levels of each character in a character sequence.
static int[]  resolveLevels(CharSequence cs, Direction defaultLevel)  Resolve the directionality levels of each character in a character sequence.
private static void  resolveNeutrals(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor)
private static int  resolveRun(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int levelPrev)
private static void  resolveRuns(int[] wca, int defaultLevel, int[] ea, int[] la)
private static void  resolveSeparators(int[] ica, int[] wca, int dl, int[] la)  Resolve separators and boundary neutral levels to account for UAX#9 3.4 L1 while taking into account retention of formatting codes (5.2).
private static void  resolveWeak(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor)
private static boolean  startsWithRetainedFormattingRun(int[] wca, int[] ea, int start)
private static boolean  triggersBidi(int ch)  Determine whether character CH triggers bidirectional processing.
-
Field Details
-
log
private static final org.apache.commons.logging.Log log
logging instance
-
-
Constructor Details
-
UnicodeBidiAlgorithm
private UnicodeBidiAlgorithm()
-
-
Method Details
-
resolveLevels
Resolve the directionality levels of each character in a character sequence. If a character is encoded in the character sequence as a Unicode surrogate pair, then both members of the pair receive the same directionality level.
- Parameters:
cs
- input character sequence representing a UTF-16 encoded string
defaultLevel
- the default paragraph level, which must be zero (LR) or one (RL)
- Returns:
- null if bidirectional processing is not required; otherwise, returns an array of integers, where each integer corresponds to exactly one UTF-16 encoding element present in the input character sequence, and where each integer denotes the directionality level of the corresponding encoding element
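The surrogate pair guarantee can be illustrated with a small sketch. It assumes the convention documented for convertToScalar, where the second code unit of a pair is represented by -1 in the scalar array. The class and method names below (SurrogateLevelSketch, fillPairLevels) are hypothetical and are not part of this class; how the real implementation propagates levels is not shown in this page.

```java
final class SurrogateLevelSketch {

    /**
     * Returns a copy of levels in which every slot whose scalar entry is the
     * placeholder -1 (second half of a surrogate pair) takes the level of the
     * slot before it, so both UTF-16 code units of a pair share one level.
     */
    static int[] fillPairLevels(int[] chars, int[] levels) {
        int[] out = levels.clone();
        for (int i = 1; i < chars.length; i++) {
            if (chars[i] == -1) {
                out[i] = out[i - 1];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hebrew alef, U+1D538 (occupying two slots), Latin 'a'
        int[] chars = {0x05D0, 0x1D538, -1, 0x61};
        int[] levels = {1, 2, 0, 2};
        System.out.println(java.util.Arrays.toString(fillPairLevels(chars, levels)));
    }
}
```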
-
resolveLevels
public static int[] resolveLevels(int[] chars, int defaultLevel, int[] levels)
Resolve the directionality levels of each character in a character sequence.
- Parameters:
chars
- array of input characters represented as Unicode scalar values
defaultLevel
- the default paragraph level, which must be zero (LR) or one (RL)
levels
- array to receive levels, one for each character in chars array
- Returns:
- null if bidirectional processing is not required; otherwise, returns an array of integers, where each integer corresponds to exactly one UTF-16 encoding element present in the input character sequence, and where each integer denotes the directionality level of the corresponding encoding element
-
resolveLevels
public static int[] resolveLevels(int[] chars, int[] classes, int defaultLevel, int[] levels, boolean useRuleL1)
Resolve the directionality levels of each character in a character sequence.
- Parameters:
chars
- array of input characters represented as Unicode scalar values
classes
- array containing one bidi class per character in chars array
defaultLevel
- the default paragraph level, which must be zero (LR) or one (RL)
levels
- array to receive levels, one for each character in chars array
useRuleL1
- true if rule L1 should be used
- Returns:
- null if bidirectional processing is not required; otherwise, returns an array of integers, where each integer corresponds to exactly one UTF-16 encoding element present in the input character sequence, and where each integer denotes the directionality level of the corresponding encoding element
-
copySequence
private static int[] copySequence(int[] ta) -
resolveExplicit
private static void resolveExplicit(int[] wca, int defaultLevel, int[] ea) -
directionOfLevel
private static int directionOfLevel(int level) -
levelOfEmbedding
private static int levelOfEmbedding(int embedding) -
levelsFromEmbeddings
private static int[] levelsFromEmbeddings(int[] ea, int[] la) -
resolveRuns
private static void resolveRuns(int[] wca, int defaultLevel, int[] ea, int[] la) -
findNextNonRetainedFormattingLevel
private static int findNextNonRetainedFormattingLevel(int[] wca, int[] ea, int start, int lPrev) -
getLevelRunLength
private static int getLevelRunLength(int[] ea, int start) -
startsWithRetainedFormattingRun
private static boolean startsWithRetainedFormattingRun(int[] wca, int[] ea, int start) -
getRetainedFormattingRunLength
private static int getRetainedFormattingRunLength(int[] wca, int start) -
resolveRun
private static int resolveRun(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int levelPrev) -
resolveWeak
private static void resolveWeak(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor) -
resolveNeutrals
private static void resolveNeutrals(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor) -
resolveAdjacentBoundaryNeutrals
private static void resolveAdjacentBoundaryNeutrals(int[] wca, int start, int end, int index, int bcNew) -
resolveImplicit
private static void resolveImplicit(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor) -
resolveSeparators
private static void resolveSeparators(int[] ica, int[] wca, int dl, int[] la)
Resolve separators and boundary neutral levels to account for UAX#9 3.4 L1 while taking into account retention of formatting codes (5.2).
- Parameters:
ica
- original input class array (sequence)
wca
- working copy of original input class array (sequence), as modified by prior steps
dl
- default paragraph level
la
- array of output levels to be adjusted, as produced by bidi algorithm
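As a rough illustration of rule L1 with retained formatting codes, the sketch below resets separators, and any run of whitespace or retained formatting codes that precedes a separator or the end of the text, to the paragraph level. It is a simplified stand-in, not the FOP implementation: it treats the whole array as a single line, ignores isolate formatting characters, and uses the bidi class constants from java.lang.Character rather than those in BidiConstants. The names RuleL1Sketch, applyL1, and classesOf are hypothetical.

```java
final class RuleL1Sketch {

    /** True for segment separators (S) and paragraph separators (B). */
    static boolean isSeparator(byte bc) {
        return bc == Character.DIRECTIONALITY_SEGMENT_SEPARATOR
            || bc == Character.DIRECTIONALITY_PARAGRAPH_SEPARATOR;
    }

    /** True for whitespace plus the codes that UAX#9 5.2 groups with it. */
    static boolean isWhitespaceOrRetained(byte bc) {
        return bc == Character.DIRECTIONALITY_WHITESPACE
            || bc == Character.DIRECTIONALITY_BOUNDARY_NEUTRAL
            || bc == Character.DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING
            || bc == Character.DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE
            || bc == Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING
            || bc == Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE
            || bc == Character.DIRECTIONALITY_POP_DIRECTIONAL_FORMAT;
    }

    /** Adjusts levels in place, using the original (unresolved) classes. */
    static void applyL1(byte[] classes, int paragraphLevel, int[] levels) {
        // Scan backwards; the end of the text acts like a separator boundary.
        boolean resetting = true;
        for (int i = classes.length - 1; i >= 0; i--) {
            if (isSeparator(classes[i])) {
                levels[i] = paragraphLevel;
                resetting = true;
            } else if (resetting && isWhitespaceOrRetained(classes[i])) {
                levels[i] = paragraphLevel;
            } else {
                resetting = false;
            }
        }
    }

    /** Looks up one bidi class per UTF-16 code unit (BMP-only helper). */
    static byte[] classesOf(String s) {
        byte[] classes = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            classes[i] = Character.getDirectionality(s.charAt(i));
        }
        return classes;
    }

    public static void main(String[] args) {
        int[] levels = {1, 1, 1, 1, 1, 1, 1, 1};
        applyL1(classesOf("ab \tcd  "), 0, levels);
        System.out.println(java.util.Arrays.toString(levels));
    }
}
```

Scanning from the end keeps the sketch to a single pass: a separator or the end of the text switches resetting on, and the first character that is neither a separator nor whitespace-like switches it off.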
-
isStrong
private static boolean isStrong(int bc) -
isNeutral
private static boolean isNeutral(int bc) -
isRetainedFormatting
private static boolean isRetainedFormatting(int bc) -
isRetainedFormatting
private static boolean isRetainedFormatting(int[] ca, int s, int e) -
max
private static int max(int x, int y) -
getClasses
private static int[] getClasses(int[] chars) -
convertToScalar
private static boolean convertToScalar(CharSequence cs, int[] chars) throws IllegalArgumentException
Convert character sequence (a UTF-16 encoded string) to an array of Unicode scalar values expressed as integers. If a valid UTF-16 surrogate pair is encountered, it is converted to two integers, the first being the equivalent Unicode scalar value, and the second being negative one (-1). This special mechanism is used to track the use of surrogate pairs while working with Unicode scalar values, and permits maintaining indices that apply both to the input UTF-16 and output scalar value sequences.
- Parameters:
cs
- a UTF-16 encoded character sequence
chars
- an integer array to accept the converted scalar values, where the length of the array must be the same as the length of the input character sequence
- Returns:
- a boolean indicating that content is present that triggers bidirectional processing
- Throws:
IllegalArgumentException
- if the input sequence is not a valid UTF-16 string, e.g., if it contains an isolated UTF-16 surrogate
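The -1 placeholder scheme can be sketched as follows. This is a minimal illustration built on java.lang.Character, not the private method itself: the hypothetical toScalars below returns nothing, whereas the documented method also reports whether bidi-triggering content is present.

```java
final class ScalarConversionSketch {

    /**
     * Fills chars (same length as cs) with Unicode scalar values. The slot for
     * the second code unit of each surrogate pair is set to -1, so index i in
     * chars always refers to the same position as index i in cs.
     */
    static void toScalars(CharSequence cs, int[] chars) {
        if (chars.length != cs.length()) {
            throw new IllegalArgumentException("chars must be as long as cs");
        }
        for (int i = 0; i < cs.length(); i++) {
            char c = cs.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 >= cs.length() || !Character.isLowSurrogate(cs.charAt(i + 1))) {
                    throw new IllegalArgumentException("isolated high surrogate at index " + i);
                }
                chars[i] = Character.toCodePoint(c, cs.charAt(i + 1));
                chars[++i] = -1; // placeholder for the low surrogate's slot
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("isolated low surrogate at index " + i);
            } else {
                chars[i] = c;
            }
        }
    }

    public static void main(String[] args) {
        String s = "a\uD835\uDD38b"; // 'a', U+1D538, 'b'
        int[] chars = new int[s.length()];
        toScalars(s, chars);
        System.out.println(java.util.Arrays.toString(chars));
    }
}
```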
-
convertToScalar
private static int convertToScalar(int chHi, int chLo)
Convert UTF-16 surrogate pair to Unicode scalar value.
- Parameters:
chHi
- high (most significant or first) surrogate
chLo
- low (least significant or second) surrogate
- Returns:
- a Unicode scalar value
- Throws:
IllegalArgumentException
- if one of the input surrogates is not valid
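The conversion follows the standard UTF-16 decoding arithmetic, sketched below. The class and method names are hypothetical, and the exact validation performed by the private method is an assumption based on the Throws clause.

```java
final class SurrogatePairSketch {

    /** Combines a high and a low surrogate into one Unicode scalar value. */
    static int toScalar(int chHi, int chLo) {
        if (chHi < 0xD800 || chHi > 0xDBFF) {
            throw new IllegalArgumentException("not a high surrogate: " + Integer.toHexString(chHi));
        }
        if (chLo < 0xDC00 || chLo > 0xDFFF) {
            throw new IllegalArgumentException("not a low surrogate: " + Integer.toHexString(chLo));
        }
        // Each surrogate carries 10 bits; the result is offset past the BMP.
        return ((chHi - 0xD800) << 10) + (chLo - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(toScalar(0xD835, 0xDD38)));
    }
}
```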
-
triggersBidi
private static boolean triggersBidi(int ch)
Determine whether character CH triggers bidirectional processing. Bidirectional processing is triggered if CH is a strong right-to-left character, an Arabic letter or number, or a right-to-left embedding or override character.
- Parameters:
ch
- a Unicode scalar value
- Returns:
- true if character triggers bidirectional processing
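The trigger condition maps onto five bidi classes: R, AL, AN, RLE, and RLO. The sketch below expresses that test with java.lang.Character.getDirectionality. This is an assumption made for illustration; the class itself presumably consults FOP's own bidi class data, and BidiTriggerSketch is a hypothetical name.

```java
final class BidiTriggerSketch {

    /** True if the scalar value belongs to a class that forces bidi processing. */
    static boolean triggersBidi(int ch) {
        switch (Character.getDirectionality(ch)) {
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:           // R: strong right-to-left
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:    // AL: Arabic letter
            case Character.DIRECTIONALITY_ARABIC_NUMBER:           // AN: Arabic number
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING: // RLE
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE:  // RLO
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(triggersBidi('A'));    // Latin letter
        System.out.println(triggersBidi(0x05D0)); // Hebrew letter alef
    }
}
```

A check like this lets a caller skip the full algorithm for text that contains no right-to-left content, which matches the documented null return of resolveLevels when bidirectional processing is not required.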
-
dump
-
getClassName
-
padLeft
-
padLeft
-
padRight
-