Class SpoofChecker
- java.lang.Object
  - com.ibm.icu.text.SpoofChecker
public class SpoofChecker extends java.lang.Object
This class, based on Unicode Technical Report #36 and Unicode Technical Standard #39, has two main functions:
- Checking whether two strings are visually confusable with each other, such as "desparejado" and "ԁеѕрагејаԁо".
- Checking whether an individual string is likely to be an attempt at confusing the reader (spoof detection), such as "pаypаl" spelled with Cyrillic 'а' characters.
Although originally designed as a method for flagging suspicious identifier strings such as URLs, SpoofChecker has a number of other practical use cases, such as preventing attempts to evade bad-word content filters.
Confusables
The following example shows how to use SpoofChecker to check for confusability between two strings:
    SpoofChecker sc = new SpoofChecker.Builder().setChecks(SpoofChecker.CONFUSABLE).build();
    int result = sc.areConfusable("desparejado", "ԁеѕрагејаԁо");
    System.out.println(result != 0);  // true
SpoofChecker uses a builder paradigm: options are specified within the context of a lightweight SpoofChecker.Builder object, and upon calling SpoofChecker.Builder.build(), expensive data loading operations are performed, and an immutable SpoofChecker is returned.
The first line of the example creates a SpoofChecker object with confusable-checking enabled; the second line performs the confusability test. For best performance, the instance should be created once (e.g., upon application startup), and the more efficient areConfusable(java.lang.String, java.lang.String) method can be used at runtime.
If the paragraph direction used to display the strings is known, it should be passed to areConfusable(int, java.lang.CharSequence, java.lang.CharSequence):
    // These strings look identical when rendered in a left-to-right context.
    // They look distinct in a right-to-left context.
    String s1 = "A1א";  // A1א
    String s2 = "Aא1";  // Aא1
    SpoofChecker sc = new SpoofChecker.Builder().setChecks(SpoofChecker.CONFUSABLE).build();
    int result = sc.areConfusable(Bidi.DIRECTION_LEFT_TO_RIGHT, s1, s2);
    System.out.println(result != 0);  // true
UTS 39 defines two strings to be confusable if they map to the same skeleton. A skeleton is a sequence of families of confusable characters, where each family has a single exemplar character. getSkeleton(java.lang.CharSequence) computes the skeleton for a particular string, so the following snippet is equivalent to the example above:
    SpoofChecker sc = new SpoofChecker.Builder().setChecks(SpoofChecker.CONFUSABLE).build();
    boolean result = sc.getSkeleton("desparejado").equals(sc.getSkeleton("ԁеѕрагејаԁо"));
    System.out.println(result);  // true
If you need to check if a string is confusable with any string in a dictionary of many strings, rather than calling areConfusable(java.lang.String, java.lang.String) many times in a loop, getSkeleton(java.lang.CharSequence) can be used instead, as shown below:
    // Setup:
    String[] DICTIONARY = new String[]{ "lorem", "ipsum" };  // example
    SpoofChecker sc = new SpoofChecker.Builder().setChecks(SpoofChecker.CONFUSABLE).build();
    HashSet<String> skeletons = new HashSet<String>();
    for (String word : DICTIONARY) {
        skeletons.add(sc.getSkeleton(word));
    }
    // Live Check:
    boolean result = skeletons.contains(sc.getSkeleton("1orern"));
    System.out.println(result);  // true
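To make the skeleton idea concrete without the ICU data, here is a minimal, self-contained sketch using only the Java standard library. The class name ToySkeleton and its six-entry mapping table are hypothetical and chosen for illustration; the real mapping is the UTS 39 confusables data loaded by SpoofChecker, and this sketch is not how ICU implements getSkeleton.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy illustration of skeleton-based lookup; NOT the ICU implementation or data. */
final class ToySkeleton {
    // Hypothetical, hand-picked mappings from a character to its exemplar text.
    // The real table is the UTS 39 confusables data shipped with ICU.
    private static final Map<Integer, String> PROTOTYPES = Map.of(
            (int) '1', "l",   // digit one resembles lowercase L
            (int) 'm', "rn",  // "m" resembles the sequence "rn"
            0x0430, "a",      // CYRILLIC SMALL LETTER A
            0x0435, "e",      // CYRILLIC SMALL LETTER IE
            0x043E, "o",      // CYRILLIC SMALL LETTER O
            0x0440, "p");     // CYRILLIC SMALL LETTER ER

    /** Normalizes to NFD, replaces each code point by its exemplar, normalizes again. */
    static String skeleton(String input) {
        String nfd = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder();
        nfd.codePoints().forEach(cp -> {
            String mapped = PROTOTYPES.get(cp);
            if (mapped != null) {
                out.append(mapped);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return Normalizer.normalize(out, Normalizer.Form.NFD);
    }

    /** Precomputes the skeletons of a dictionary so each lookup is one hash probe. */
    static Set<String> index(List<String> dictionary) {
        Set<String> skeletons = new HashSet<>();
        for (String word : dictionary) {
            skeletons.add(skeleton(word));
        }
        return skeletons;
    }

    public static void main(String[] args) {
        Set<String> known = index(List.of("lorem", "ipsum"));
        // "1orern" and "lorem" both reduce to the skeleton "lorern".
        System.out.println(known.contains(skeleton("1orern")));
    }
}
```

The design point is the same as in the dictionary example: the skeleton of each known string is computed once, so a live check costs one skeleton computation plus a set lookup instead of one pairwise comparison per dictionary entry.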
Note: Since the Unicode confusables mapping table is frequently updated, confusable skeletons are not guaranteed to be the same between ICU releases. We therefore recommend that you always compute confusable skeletons at runtime and do not rely on creating a permanent, or difficult to update, database of skeletons.
Spoof Detection
The following snippet shows a minimal example of using SpoofChecker to perform spoof detection on a string:
    SpoofChecker sc = new SpoofChecker.Builder()
        .setAllowedChars(SpoofChecker.RECOMMENDED.cloneAsThawed().addAll(SpoofChecker.INCLUSION))
        .setRestrictionLevel(SpoofChecker.RestrictionLevel.MODERATELY_RESTRICTIVE)
        .setChecks(SpoofChecker.ALL_CHECKS & ~SpoofChecker.CONFUSABLE)
        .build();
    boolean result = sc.failsChecks("pаypаl");  // with Cyrillic 'а' characters
    System.out.println(result);  // true
As in the case for confusability checking, it is good practice to create one SpoofChecker instance at startup, and call the cheaper failsChecks(java.lang.String, com.ibm.icu.text.SpoofChecker.CheckResult) at runtime. In the setAllowedChars call, we specify the set of allowed characters to be those with type RECOMMENDED or INCLUSION, according to the recommendation in UTS 39. In the setChecks call, the CONFUSABLE checks are disabled. It is good practice to disable them if you won't be using the instance to perform confusability checking.
To get more details on why a string failed the checks, use a SpoofChecker.CheckResult:
    SpoofChecker sc = new SpoofChecker.Builder()
        .setAllowedChars(SpoofChecker.RECOMMENDED.cloneAsThawed().addAll(SpoofChecker.INCLUSION))
        .setRestrictionLevel(SpoofChecker.RestrictionLevel.MODERATELY_RESTRICTIVE)
        .setChecks(SpoofChecker.ALL_CHECKS & ~SpoofChecker.CONFUSABLE)
        .build();
    SpoofChecker.CheckResult checkResult = new SpoofChecker.CheckResult();
    boolean result = sc.failsChecks("pаypаl", checkResult);
    System.out.println(checkResult.checks);  // 16
The checks field of the CheckResult is a bitmask of the checks that failed. In this case, there was one check that failed: RESTRICTION_LEVEL, corresponding to the fifth bit (16). The possible checks are:
- RESTRICTION_LEVEL: flags strings that violate the Restriction Level test as specified in UTS 39; in most cases, this means flagging strings that contain characters from multiple different scripts.
- INVISIBLE: flags strings that contain invisible characters, such as zero-width spaces, or character sequences that are likely not to display, such as multiple occurrences of the same non-spacing mark.
- CHAR_LIMIT: flags strings that contain characters outside of a specified set of acceptable characters. See SpoofChecker.Builder.setAllowedChars(com.ibm.icu.text.UnicodeSet) and SpoofChecker.Builder.setAllowedLocales(java.util.Set<com.ibm.icu.util.ULocale>).
- MIXED_NUMBERS: flags strings that contain digits from multiple different numbering systems.
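A failed-check bitmask is easier to log when it is turned into names. The sketch below shows one way to decode it with plain Java. The class CheckMaskDecoder is hypothetical, and only the value 16 for RESTRICTION_LEVEL is stated in the text above; the other numeric values are assumptions made for illustration, so real code should reference the SpoofChecker constants rather than copy these literals.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that names the bits set in a failed-check bitmask. */
final class CheckMaskDecoder {
    // Insertion order is kept so names come out in ascending bit order.
    private static final Map<String, Integer> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put("RESTRICTION_LEVEL", 16);  // value stated in the text above
        FLAGS.put("INVISIBLE", 32);          // assumed value; use SpoofChecker.INVISIBLE
        FLAGS.put("CHAR_LIMIT", 64);         // assumed value; use SpoofChecker.CHAR_LIMIT
        FLAGS.put("MIXED_NUMBERS", 128);     // assumed value; use SpoofChecker.MIXED_NUMBERS
    }

    /** Returns the names of all known flags whose bit is set in the mask. */
    static List<String> decode(int checks) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> flag : FLAGS.entrySet()) {
            if ((checks & flag.getValue()) != 0) {
                names.add(flag.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(16));  // [RESTRICTION_LEVEL]
    }
}
```

With ICU on the classpath, the same loop could test `checkResult.checks & SpoofChecker.RESTRICTION_LEVEL` and so on, which avoids hard-coding any numeric value.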
These checks can be enabled independently of each other. For example, if you were interested in checking for only the INVISIBLE and MIXED_NUMBERS conditions, you could do:
    SpoofChecker sc = new SpoofChecker.Builder()
        .setChecks(SpoofChecker.INVISIBLE | SpoofChecker.MIXED_NUMBERS)
        .build();
    boolean result = sc.failsChecks("৪8");
    System.out.println(result);  // true
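To show why a string such as a Bengali four followed by an ASCII eight trips the MIXED_NUMBERS check, the following standard-library sketch identifies each decimal digit's numbering system by the code point of its zero digit and reports whether more than one system occurs. The class and method names are hypothetical, and this is an approximation of the idea in UTS 39 section 5.3, not ICU's code.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch of mixed-number detection using only java.lang.Character. */
final class MixedNumbersSketch {
    /**
     * Returns true if the text contains decimal digits from more than one
     * numbering system. Each system is represented by its zero digit, found by
     * subtracting the digit's value from its code point.
     */
    static boolean mixesNumberingSystems(String text) {
        Set<Integer> zeroDigits = new HashSet<>();
        text.codePoints().forEach(cp -> {
            if (Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER) {
                zeroDigits.add(cp - Character.digit(cp, 10));
            }
        });
        return zeroDigits.size() > 1;
    }

    public static void main(String[] args) {
        // U+09EA BENGALI DIGIT FOUR followed by ASCII '8'
        System.out.println(mixesNumberingSystems("\u09EA8"));  // true
        System.out.println(mixesNumberingSystems("2024"));     // false
    }
}
```

The subtraction works because the ten decimal digits of each numbering system occupy consecutive code points in Unicode.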
Note: The Restriction Level is the most powerful of the checks. The full logic is documented in UTS 39, but the basic idea is that strings are restricted to contain characters from only a single script, except that most scripts are allowed to have Latin characters interspersed. Although the default restriction level is HIGHLY_RESTRICTIVE, it is recommended that users set their restriction level to MODERATELY_RESTRICTIVE, which allows Latin mixed with all other scripts except Cyrillic, Greek, and Cherokee, with which it is often confusable. For more details on the levels, see UTS 39 or SpoofChecker.RestrictionLevel. The Restriction Level test is aware of the set of allowed characters set in SpoofChecker.Builder.setAllowedChars(com.ibm.icu.text.UnicodeSet). Note that characters which have script code COMMON or INHERITED, such as numbers and punctuation, are ignored when computing whether a string has multiple scripts.
Advanced bidirectional usage
If the paragraph direction with which the identifiers will be displayed is not known, there are multiple options for confusable detection depending on the circumstances.
In some circumstances, the only concern is confusion between identifiers displayed with the same paragraph direction.
An example is the case where identifiers are usernames prefixed with the @ symbol. That symbol will appear to the left in a left-to-right context, and to the right in a right-to-left context, so that an identifier displayed in a left-to-right context can never be confused with an identifier displayed in a right-to-left context:
- The usernames "A1א" (A one aleph) and "Aא1" (A aleph 1) would be considered confusable, since they both appear as @A1א in a left-to-right context, and the usernames "אA_1" (aleph A underscore one) and "א1_A" (aleph one underscore A) would be considered confusable, since they both appear as A_1א@ in a right-to-left context.
- The username "Mark_" would not be considered confusable with the username "_Mark", even though the latter would appear as Mark_@ in a right-to-left context, and the former as @Mark_ in a left-to-right context.
In that case, the caller should check for both LTR-confusability and RTL-confusability:
    // areConfusable returns an int bitmask, so compare each result with 0.
    boolean confusableInEitherDirection =
        sc.areConfusable(Bidi.DIRECTION_LEFT_TO_RIGHT, id1, id2) != 0 ||
        sc.areConfusable(Bidi.DIRECTION_RIGHT_TO_LEFT, id1, id2) != 0;
In cases where confusability between the visual appearances of an identifier displayed in a left-to-right context with another identifier displayed in a right-to-left context is a concern, the LTR skeleton of one can be compared with the RTL skeleton of the other. However, this very broad definition of confusability may have unexpected results; for instance, it treats the ASCII identifiers "Mark_" and "_Mark" as confusable.
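The effect of paragraph direction on the examples above can be reproduced with the JDK alone. The sketch below reorders a string into visual order with java.text.Bidi (note: the JDK class, not com.ibm.icu.text.Bidi) and compares the results. The class BidiVisualOrder is hypothetical, handles only BMP characters, and performs only the reordering step; a real bidiSkeleton also applies the confusable mapping and other steps described in UTS 39.

```java
import java.text.Bidi;

/** Sketch of visual reordering with java.text.Bidi (BMP characters only). */
final class BidiVisualOrder {
    /**
     * Returns the characters of text in left-to-right visual order for a
     * paragraph with the given base direction, which is expected to be
     * Bidi.DIRECTION_LEFT_TO_RIGHT or Bidi.DIRECTION_RIGHT_TO_LEFT.
     */
    static String visualOrder(String text, int direction) {
        if (text.isEmpty()) {
            return text;
        }
        Bidi bidi = new Bidi(text, direction);
        int n = text.length();
        byte[] levels = new byte[n];
        Character[] chars = new Character[n];
        for (int i = 0; i < n; i++) {
            levels[i] = (byte) bidi.getLevelAt(i);
            chars[i] = text.charAt(i);
        }
        // Reverses runs of characters according to their resolved levels.
        Bidi.reorderVisually(levels, 0, chars, 0, n);
        StringBuilder out = new StringBuilder(n);
        for (Character c : chars) {
            out.append(c.charValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s1 = "A1\u05D0";  // A, one, aleph
        String s2 = "A\u05D01";  // A, aleph, one
        int ltr = Bidi.DIRECTION_LEFT_TO_RIGHT;
        int rtl = Bidi.DIRECTION_RIGHT_TO_LEFT;
        System.out.println(visualOrder(s1, ltr).equals(visualOrder(s2, ltr)));
        System.out.println(visualOrder(s1, rtl).equals(visualOrder(s2, rtl)));
        // Cross-direction comparison: "Mark_" shown RTL versus "_Mark" shown LTR.
        System.out.println(visualOrder("Mark_", rtl).equals(visualOrder("_Mark", ltr)));
    }
}
```

Comparing the visual order from one direction with the visual order from the other is the broad cross-direction notion of confusability, which is why it pairs plain ASCII strings such as "Mark_" and "_Mark".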
Additional Information
A SpoofChecker instance may be used repeatedly to perform checks on any number of identifiers.
Thread Safety: The methods on SpoofChecker objects are thread safe. The test functions for checking a single identifier, or for testing whether two identifiers are potentially confusable, may be called concurrently from multiple threads using the same SpoofChecker instance.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
- static class SpoofChecker.Builder: SpoofChecker Builder.
- static class SpoofChecker.CheckResult: A struct-like class to hold the results of a Spoof Check operation.
- private static class SpoofChecker.ConfusableDataUtils
- static class SpoofChecker.RestrictionLevel: Constants from UTS 39 for use in setRestrictionLevel.
- (package private) static class SpoofChecker.ScriptSet
- private static class SpoofChecker.SpoofData
-
Field Summary
Fields (Modifier and Type, Field, Description):
- static int ALL_CHECKS: Enable all spoof checks.
- static int ANY_CASE: Deprecated. ICU 58 Any case confusable mappings were removed from UTS 39; the corresponding ICU API was deprecated.
- (package private) static UnicodeSet ASCII
- static int CHAR_LIMIT: Check that an identifier contains only characters from a specified set of acceptable characters.
- static int CONFUSABLE: Enable this flag in SpoofChecker.Builder.setChecks(int) to turn on all types of confusables.
- private UnicodeSet fAllowedCharsSet
- private java.util.Set<ULocale> fAllowedLocales
- private int fChecks
- private SpoofChecker.RestrictionLevel fRestrictionLevel
- private SpoofChecker.SpoofData fSpoofData
- static int HIDDEN_OVERLAY: Check that an identifier does not have a combining character following a character in which that combining character would be hidden; for example 'i' followed by a U+0307 combining dot.
- static UnicodeSet INCLUSION: Security Profile constant from UTS 39 for use in SpoofChecker.Builder.setAllowedChars(com.ibm.icu.text.UnicodeSet).
- static int INVISIBLE: Check an identifier for the presence of invisible characters, such as zero-width spaces, or character sequences that are likely not to display, such as multiple occurrences of the same non-spacing mark.
- static int MIXED_NUMBERS: Check that an identifier does not mix numbers from different numbering systems.
- static int MIXED_SCRIPT_CONFUSABLE: When performing the two-string areConfusable(java.lang.String, java.lang.String) test, this flag in the return value indicates that the two strings are visually confusable and that they are not from the same script, according to UTS 39 section 4.
- private static Normalizer2 nfdNormalizer
- static UnicodeSet RECOMMENDED: Security Profile constant from UTS 39 for use in SpoofChecker.Builder.setAllowedChars(com.ibm.icu.text.UnicodeSet).
- static int RESTRICTION_LEVEL: Check that an identifier satisfies the requirements for the restriction level specified in SpoofChecker.Builder.setRestrictionLevel(com.ibm.icu.text.SpoofChecker.RestrictionLevel).
- static int SINGLE_SCRIPT: Deprecated. ICU 51 Use RESTRICTION_LEVEL
- static int SINGLE_SCRIPT_CONFUSABLE: When performing the two-string areConfusable(java.lang.String, java.lang.String) test, this flag in the return value indicates that the two strings are visually confusable and that they are from the same script, according to UTS 39 section 4.
- static int WHOLE_SCRIPT_CONFUSABLE: When performing the two-string areConfusable(java.lang.String, java.lang.String) test, this flag in the return value indicates that the two strings are visually confusable and that they are not from the same script but both of them are single-script strings, according to UTS 39 section 4.
-
Constructor Summary
Constructors (Modifier, Constructor, Description):
- private SpoofChecker(): Private constructor; a SpoofChecker has to be built by the builder.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- int areConfusable(int direction, java.lang.CharSequence s1, java.lang.CharSequence s2): Check whether two specified strings are visually confusable when displayed in a paragraph with the given direction.
- int areConfusable(java.lang.String s1, java.lang.String s2): Check whether two specified strings are visually confusable.
- boolean equals(java.lang.Object other): Equality function.
- boolean failsChecks(java.lang.String text): Check the specified string for possible security issues.
- boolean failsChecks(java.lang.String text, SpoofChecker.CheckResult checkResult): Check the specified string for possible security issues.
- (package private) int findHiddenOverlay(java.lang.String input)
- UnicodeSet getAllowedChars(): Get a UnicodeSet for the characters permitted in an identifier.
- java.util.Set<java.util.Locale> getAllowedJavaLocales(): Get a set of Locale instances for the scripts that are acceptable in strings to be checked.
- java.util.Set<ULocale> getAllowedLocales(): Get a read-only set of locales for the scripts that are acceptable in strings to be checked.
- private static void getAugmentedScriptSet(int codePoint, SpoofChecker.ScriptSet result): Computes the augmented script set for a code point, according to UTS 39 section 5.1.
- java.lang.String getBidiSkeleton(int direction, java.lang.CharSequence str): Get the "bidiSkeleton" for an identifier string and a direction.
- int getChecks(): Get the set of checks that this Spoof Checker has been configured to perform.
- private void getNumerics(java.lang.String input, UnicodeSet result): Computes the set of numerics for a string, according to UTS 39 section 5.3.
- private void getResolvedScriptSet(java.lang.CharSequence input, SpoofChecker.ScriptSet result): Computes the resolved script set for a string, according to UTS 39 section 5.1.
- private void getResolvedScriptSetWithout(java.lang.CharSequence input, int script, SpoofChecker.ScriptSet result): Computes the resolved script set for a string, omitting characters having the specified script.
- SpoofChecker.RestrictionLevel getRestrictionLevel(): Deprecated. This API is ICU internal only.
- private SpoofChecker.RestrictionLevel getRestrictionLevel(java.lang.String input): Computes the restriction level of a string, according to UTS 39 section 5.2.
- java.lang.String getSkeleton(int type, java.lang.String id): Deprecated. ICU 58
- java.lang.String getSkeleton(java.lang.CharSequence str): Get the "skeleton" for an identifier string.
- int hashCode(): Overrides Object.hashCode().
- (package private) boolean isIllegalCombiningDotLeadCharacter(int cp, java.lang.StringBuilder sb)
- (package private) boolean isIllegalCombiningDotLeadCharacterNoLookup(int cp)
-
-
-
Field Detail
-
INCLUSION
public static final UnicodeSet INCLUSION
Security Profile constant from UTS 39 for use in SpoofChecker.Builder.setAllowedChars(com.ibm.icu.text.UnicodeSet).
-
RECOMMENDED
public static final UnicodeSet RECOMMENDED
Security Profile constant from UTS 39 for use in SpoofChecker.Builder.setAllowedChars(com.ibm.icu.text.UnicodeSet).
-
SINGLE_SCRIPT_CONFUSABLE
public static final int SINGLE_SCRIPT_CONFUSABLE
When performing the two-string areConfusable(java.lang.String, java.lang.String) test, this flag in the return value indicates that the two strings are visually confusable and that they are from the same script, according to UTS 39 section 4.
- See Also:
- Constant Field Values
-
MIXED_SCRIPT_CONFUSABLE
public static final int MIXED_SCRIPT_CONFUSABLE
When performing the two-string areConfusable(java.lang.String, java.lang.String) test, this flag in the return value indicates that the two strings are visually confusable and that they are not from the same script, according to UTS 39 section 4.
- See Also:
- Constant Field Values
-
WHOLE_SCRIPT_CONFUSABLE
public static final int WHOLE_SCRIPT_CONFUSABLE
When performing the two-string areConfusable(java.lang.String, java.lang.String) test, this flag in the return value indicates that the two strings are visually confusable and that they are not from the same script but both of them are single-script strings, according to UTS 39 section 4.
- See Also:
- Constant Field Values
-
CONFUSABLE
public static final int CONFUSABLE
Enable this flag in SpoofChecker.Builder.setChecks(int) to turn on all types of confusables. You may set the checks to some subset of SINGLE_SCRIPT_CONFUSABLE, MIXED_SCRIPT_CONFUSABLE, or WHOLE_SCRIPT_CONFUSABLE to make areConfusable(java.lang.String, java.lang.String) return only those types of confusables.
- See Also:
- Constant Field Values
-
ANY_CASE
@Deprecated public static final int ANY_CASE
Deprecated. ICU 58 Any case confusable mappings were removed from UTS 39; the corresponding ICU API was deprecated.
This flag is deprecated and no longer affects the behavior of SpoofChecker.
- See Also:
- Constant Field Values
-
RESTRICTION_LEVEL
public static final int RESTRICTION_LEVEL
Check that an identifier satisfies the requirements for the restriction level specified in SpoofChecker.Builder.setRestrictionLevel(com.ibm.icu.text.SpoofChecker.RestrictionLevel). The default restriction level is SpoofChecker.RestrictionLevel.HIGHLY_RESTRICTIVE.
- See Also:
- Constant Field Values
-
SINGLE_SCRIPT
@Deprecated public static final int SINGLE_SCRIPT
Deprecated. ICU 51 Use RESTRICTION_LEVEL
Check that an identifier contains only characters from a single script (plus characters from the common and inherited scripts). Applies only to checks of a single identifier.
- See Also:
- Constant Field Values
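As a rough illustration of what "a single script, ignoring common and inherited characters" means, the following standard-library sketch collects the scripts that occur in a string. The class ScriptSketch is hypothetical, and the approach is deliberately coarser than ICU's: UTS 39 works with augmented script sets based on Script_Extensions, which java.lang.Character.UnicodeScript does not expose.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.Set;

/** Coarse single-script test based on the JDK's Script property data. */
final class ScriptSketch {
    /** Returns the scripts used in the text, skipping COMMON and INHERITED. */
    static Set<UnicodeScript> scriptsOf(String text) {
        Set<UnicodeScript> scripts = EnumSet.noneOf(UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            if (script != UnicodeScript.COMMON && script != UnicodeScript.INHERITED) {
                scripts.add(script);
            }
        });
        return scripts;
    }

    /** True if at most one script remains after ignoring COMMON and INHERITED. */
    static boolean isSingleScript(String text) {
        return scriptsOf(text).size() <= 1;
    }

    public static void main(String[] args) {
        // Latin "p", "y", "l" mixed with U+0430 CYRILLIC SMALL LETTER A
        System.out.println(scriptsOf("p\u0430yp\u0430l"));    // [LATIN, CYRILLIC]
        System.out.println(isSingleScript("paypal-123"));     // true
    }
}
```

Digits and the hyphen have script COMMON, which is why "paypal-123" still counts as a single-script string.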
-
INVISIBLE
public static final int INVISIBLE
Check an identifier for the presence of invisible characters, such as zero-width spaces, or character sequences that are likely not to display, such as multiple occurrences of the same non-spacing mark. This check does not test the input string as a whole for conformance to any particular syntax for identifiers.
- See Also:
- Constant Field Values
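The "multiple occurrences of the same non-spacing mark" condition can be sketched with the standard library as follows. The class RepeatedMarkSketch is hypothetical and covers only that half of the description; it does not look for zero-width or other invisible characters, and it is an approximation rather than ICU's implementation.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Set;

/** Sketch: detect the same non-spacing mark twice within one run of marks. */
final class RepeatedMarkSketch {
    static boolean hasRepeatedMark(String text) {
        // Decompose first so that precomposed letters expose their marks.
        String nfd = Normalizer.normalize(text, Normalizer.Form.NFD);
        Set<Integer> marksInRun = new HashSet<>();
        for (int i = 0; i < nfd.length(); ) {
            int cp = nfd.codePointAt(i);
            if (Character.getType(cp) == Character.NON_SPACING_MARK) {
                if (!marksInRun.add(cp)) {
                    return true;  // this mark already occurred in the current run
                }
            } else {
                marksInRun.clear();  // a non-mark character starts a new run
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedMark("a\u0301\u0301"));  // true: two acute accents
        System.out.println(hasRepeatedMark("a\u0301b\u0301")); // false: separate runs
    }
}
```

A doubled mark usually renders on top of itself, so the second copy is effectively invisible to the reader.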
-
CHAR_LIMIT
public static final int CHAR_LIMIT
Check that an identifier contains only characters from a specified set of acceptable characters. See SpoofChecker.Builder.setAllowedChars(com.ibm.icu.text.UnicodeSet) and SpoofChecker.Builder.setAllowedLocales(java.util.Set<com.ibm.icu.util.ULocale>). Note that a string that fails this check will also fail the RESTRICTION_LEVEL check.
- See Also:
- Constant Field Values
-
MIXED_NUMBERS
public static final int MIXED_NUMBERS
Check that an identifier does not mix numbers from different numbering systems. For more information, see UTS 39 section 5.3.
- See Also:
- Constant Field Values
-
HIDDEN_OVERLAY
public static final int HIDDEN_OVERLAY
Check that an identifier does not have a combining character following a character in which that combining character would be hidden; for example 'i' followed by a U+0307 combining dot.
More specifically, the following characters are forbidden from preceding a U+0307:
- Those with the Soft_Dotted Unicode property (which includes 'i' and 'j')
- Latin lowercase letter 'l'
- Dotless 'i' and 'j' ('ı' and 'ȷ', U+0131 and U+0237)
- Any character whose confusable prototype ends with such a character (Soft_Dotted, 'l', 'ı', or 'ȷ')
This list and the number of combining characters considered by this check may grow over time.
- See Also:
- Constant Field Values
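A simplified version of this rule can be sketched in plain Java. The class HiddenOverlaySketch is hypothetical, and its lead-character list is an abbreviated assumption ('i', 'j', 'l' and the two dotless letters) standing in for the full Soft_Dotted property and the confusable-prototype rule, neither of which the JDK exposes. It also checks only the immediately preceding character.

```java
/** Simplified hidden-overlay check: a listed lead character directly before U+0307. */
final class HiddenOverlaySketch {
    private static final int COMBINING_DOT_ABOVE = 0x0307;
    // Abbreviated, assumed lead set: i, j, l, dotless i (U+0131), dotless j (U+0237).
    private static final String LEADS = "ijl\u0131\u0237";

    /** Returns the index of the first hidden U+0307, or -1 if there is none. */
    static int findHiddenDot(String text) {
        int previous = -1;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp == COMBINING_DOT_ABOVE && previous >= 0 && LEADS.indexOf(previous) >= 0) {
                return i;
            }
            previous = cp;
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findHiddenDot("i\u0307"));  // 1
        System.out.println(findHiddenDot("a\u0307"));  // -1
    }
}
```

The dot on 'a' is visible and therefore harmless, whereas the dot on 'i' merges with the letter's own dot and disappears.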
-
ALL_CHECKS
public static final int ALL_CHECKS
Enable all spoof checks.
- See Also:
- Constant Field Values
-
ASCII
static final UnicodeSet ASCII
-
fChecks
private int fChecks
-
fSpoofData
private SpoofChecker.SpoofData fSpoofData
-
fAllowedLocales
private java.util.Set<ULocale> fAllowedLocales
-
fAllowedCharsSet
private UnicodeSet fAllowedCharsSet
-
fRestrictionLevel
private SpoofChecker.RestrictionLevel fRestrictionLevel
-
nfdNormalizer
private static Normalizer2 nfdNormalizer
-
-
Method Detail
-
getRestrictionLevel
@Deprecated public SpoofChecker.RestrictionLevel getRestrictionLevel()
Deprecated. This API is ICU internal only.
Get the Restriction Level that is being tested.
- Returns:
- The restriction level
-
getChecks
public int getChecks()
Get the set of checks that this Spoof Checker has been configured to perform.
- Returns:
- The set of checks that this spoof checker will perform.
-
getAllowedLocales
public java.util.Set<ULocale> getAllowedLocales()
Get a read-only set of locales for the scripts that are acceptable in strings to be checked. If no limitations on scripts have been specified, an empty set will be returned. setAllowedChars() will reset the list of allowed locales to be empty. The returned set may not be identical to the originally specified set that is supplied to setAllowedLocales(); the information other than languages from the originally specified locales may be omitted.
- Returns:
- A set of locales corresponding to the acceptable scripts.
-
getAllowedJavaLocales
public java.util.Set<java.util.Locale> getAllowedJavaLocales()
Get a set of Locale instances for the scripts that are acceptable in strings to be checked. If no limitations on scripts have been specified, an empty set will be returned.
- Returns:
- A set of locales corresponding to the acceptable scripts.
-
getAllowedChars
public UnicodeSet getAllowedChars()
Get a UnicodeSet for the characters permitted in an identifier. This corresponds to the limits imposed by the Set Allowed Characters functions. Limitations imposed by other checks will not be reflected in the set returned by this function. The returned set will be frozen, meaning that it cannot be modified by the caller.
- Returns:
- A UnicodeSet containing the characters that are permitted by the CHAR_LIMIT test.
-
failsChecks
public boolean failsChecks(java.lang.String text, SpoofChecker.CheckResult checkResult)
Check the specified string for possible security issues. The text to be checked will typically be an identifier of some sort. The set of checks to be performed was specified when building the SpoofChecker.
- Parameters:
text - A String to be checked for possible security issues.
checkResult - Output parameter, indicates which specific tests failed. May be null if the information is not wanted.
- Returns:
- True if any issue is found with the input string.
-
failsChecks
public boolean failsChecks(java.lang.String text)
Check the specified string for possible security issues. The text to be checked will typically be an identifier of some sort. The set of checks to be performed was specified when building the SpoofChecker.
- Parameters:
text - A String to be checked for possible security issues.
- Returns:
- True if any issue is found with the input string.
-
areConfusable
public int areConfusable(java.lang.String s1, java.lang.String s2)
Check whether two specified strings are visually confusable. The types of confusability to be tested (single script, mixed script, or whole script) are determined by the check options set for the SpoofChecker. The tests to be performed are controlled by the flags SINGLE_SCRIPT_CONFUSABLE, MIXED_SCRIPT_CONFUSABLE, and WHOLE_SCRIPT_CONFUSABLE. At least one of these tests must be selected. ANY_CASE is a modifier for the tests. Select it if the identifiers may be of mixed case. If identifiers are case folded for comparison and display to the user, do not select the ANY_CASE option.
- Parameters:
s1 - The first of the two strings to be compared for confusability.
s2 - The second of the two strings to be compared for confusability.
- Returns:
- Non-zero if s1 and s2 are confusable. If not 0, the value will indicate the type(s) of confusability found, as defined by spoof check test constants.
-
areConfusable
public int areConfusable(int direction, java.lang.CharSequence s1, java.lang.CharSequence s2)
Check whether two specified strings are visually confusable when displayed in a paragraph with the given direction. The types of confusability to be tested (single script, mixed script, or whole script) are determined by the check options set for the SpoofChecker. The tests to be performed are controlled by the flags SINGLE_SCRIPT_CONFUSABLE, MIXED_SCRIPT_CONFUSABLE, and WHOLE_SCRIPT_CONFUSABLE. At least one of these tests must be selected. ANY_CASE is a modifier for the tests. Select it if the identifiers may be of mixed case. If identifiers are case folded for comparison and display to the user, do not select the ANY_CASE option.
- Parameters:
direction - The paragraph direction with which the identifiers are displayed. Must be either Bidi.DIRECTION_LEFT_TO_RIGHT or Bidi.DIRECTION_RIGHT_TO_LEFT.
s1 - The first of the two strings to be compared for confusability.
s2 - The second of the two strings to be compared for confusability.
- Returns:
- Non-zero if s1 and s2 are confusable. If not 0, the value will indicate the type(s) of confusability found, as defined by spoof check test constants.
-
getBidiSkeleton
public java.lang.String getBidiSkeleton(int direction, java.lang.CharSequence str)
Get the "bidiSkeleton" for an identifier string and a direction. Skeletons are a transformation of the input string; two identifiers are LTR-confusable if their LTR bidiSkeletons are identical; they are RTL-confusable if their RTL bidiSkeletons are identical. See Unicode Technical Standard #39 for additional information: https://www.unicode.org/reports/tr39/#Confusable_Detection. Using skeletons directly makes it possible to quickly check whether an identifier is confusable with any of some large set of existing identifiers, by creating an efficiently searchable collection of the skeletons. Skeletons are computed using the algorithm and data described in UTS #39.
- Parameters:
direction - The paragraph direction with which the string is displayed. Must be either Bidi.DIRECTION_LEFT_TO_RIGHT or Bidi.DIRECTION_RIGHT_TO_LEFT.
str - The input string whose bidiSkeleton will be generated.
- Returns:
- The output skeleton string.
-
getSkeleton
public java.lang.String getSkeleton(java.lang.CharSequence str)
Get the "skeleton" for an identifier string. Skeletons are a transformation of the input string; two strings are confusable if their skeletons are identical. See Unicode UTS 39 for additional information. Using skeletons directly makes it possible to quickly check whether an identifier is confusable with any of some large set of existing identifiers, by creating an efficiently searchable collection of the skeletons. Skeletons are computed using the algorithm and data described in Unicode UTS 39.
- Parameters:
str - The input string whose skeleton will be generated.
- Returns:
- The output skeleton string.
-
getSkeleton
@Deprecated public java.lang.String getSkeleton(int type, java.lang.String id)
Deprecated. ICU 58
Calls getSkeleton(CharSequence id). Starting with ICU 55, the "type" parameter has been ignored, and starting with ICU 58, this function has been deprecated.
- Parameters:
type - No longer supported. Prior to ICU 55, was used to specify the mapping table SL, SA, ML, or MA.
id - The input identifier whose skeleton will be generated.
- Returns:
- The output skeleton string.
-
equals
public boolean equals(java.lang.Object other)
Equality function. Return true if the two SpoofChecker objects incorporate the same confusable data and have enabled the same set of checks.
- Overrides:
equals in class java.lang.Object
- Parameters:
other - the SpoofChecker being compared with.
- Returns:
- true if the two SpoofCheckers are equal.
-
hashCode
public int hashCode()
Overrides Object.hashCode().
- Overrides:
hashCode in class java.lang.Object
-
getAugmentedScriptSet
private static void getAugmentedScriptSet(int codePoint, SpoofChecker.ScriptSet result)
Computes the augmented script set for a code point, according to UTS 39 section 5.1.
-
getResolvedScriptSet
private void getResolvedScriptSet(java.lang.CharSequence input, SpoofChecker.ScriptSet result)
Computes the resolved script set for a string, according to UTS 39 section 5.1.
-
getResolvedScriptSetWithout
private void getResolvedScriptSetWithout(java.lang.CharSequence input, int script, SpoofChecker.ScriptSet result)
Computes the resolved script set for a string, omitting characters having the specified script. If UScript.CODE_LIMIT is passed as the second argument, all characters are included.
-
getNumerics
private void getNumerics(java.lang.String input, UnicodeSet result)
Computes the set of numerics for a string, according to UTS 39 section 5.3.
-
getRestrictionLevel
private SpoofChecker.RestrictionLevel getRestrictionLevel(java.lang.String input)
Computes the restriction level of a string, according to UTS 39 section 5.2.
-
findHiddenOverlay
int findHiddenOverlay(java.lang.String input)
-
isIllegalCombiningDotLeadCharacterNoLookup
boolean isIllegalCombiningDotLeadCharacterNoLookup(int cp)
-
isIllegalCombiningDotLeadCharacter
boolean isIllegalCombiningDotLeadCharacter(int cp, java.lang.StringBuilder sb)
-
-