metadata {
  authority_id: odni
  id: 2017
  language: ics-630-01:ara
  source_script: Arab
  destination_script: Latn
  name: ICS-630-01 Romanization of Arabic Personal Names (2015)
  source: ICS-630-01 Annex A
  creation_date: 2017
  confirmation_date: 2018-06
  description: |
    This system, adapted from the Board on Geographic Names, is the
    Intelligence Community (IC) standard for the transliteration of Arabic
    names that will be applied to all final written reports and products for
    IC consumers. It is not intended to eliminate variations of a name that
    can contribute forensic information. Rather, it is to provide an IC
    standard Romanized (English) transliteration from Modern Standard Arabic
    that can then be linked to forensic information in ways that will help
    identify the referent of the name.

    Ambiguities can result from the Romanization of Arabic names because the
    Arabic source generally omits short vowel markings, double consonant
    marks, and other diacritics that would clearly distinguish the name.
    Linguists use their experience with the language and aids such as online
    tools and name dictionaries to determine the exact Arabic and the
    appropriate transliteration into the Roman alphabet.

    In cases where an individual's name has already been transliterated, that
    is to be indicated -- as found -- in parentheses immediately following its
    rendition in the transliteration standard (e.g., Muhammad Khulud (Mohamed
    Khulood)). In addition, if the original Arabic-script spelling is known,
    that spelling should also appear in parentheses following the name, if
    possible, following best practices of the issuing organization and taking
    into consideration information system capabilities. This convention is
    designed to ensure that vital forensic information is not lost.

    For names of persons who are known to not be part of the Arabic-speaking
    community, use the relevant IC transliteration standard for names from
    that language (e.g., Mikhail, Yitzhak). A translator’s note may be used to
    clarify the known origin of the person. Spell names of individuals from
    languages that are written in Roman letters as they are spelled in those
    languages (e.g., George Clooney, Jorge Garcia, Georges Pompidou).

    In the case of active senior government officials in the online CIA World
    Factbook and the online directory of Chiefs of State and Cabinet Members
    of Foreign Governments, the spellings given in these online reference
    works should be used in place of the IC Standard. For any individual who
    has at one time been listed in the Factbook or Chiefs of State directory
    but who no longer appears in those resources (i.e., is no longer a
    government official), the IC Standard spelling should appear first, with
    the spelling, if known, as it previously appeared in those resources
    listed within parentheses at the first usage.

    The primary goal of this system is to produce a consistent Romanized
    transcription of the name that is readable to the non-specialist. The
    system uses the 26 letters of the standard (English) Roman alphabet plus
    the apostrophe. Some ambiguities in the Romanized form will occur without
    the use of diacritics. However, within the context of a report, where
    additional information about the individual is provided, the referent
    will be clearly identified. This system will be used in conjunction with
    online tools, name dictionaries, and lists containing conventional
    spellings of names of well-known individuals.

  notes: |
    - Long/Short Vowels: Long and short vowels are not distinguished in this
      system: Samir (could be Saamir or Samiir in Arabic).
    - Double consonants: Double consonants represented by the Arabic shaddah
      are shown in most cases (e.g., Hassan, Muhammad). Exceptions: ’ayn and
      consonants represented by digraphs are not doubled (e.g., al-Qadhafi
      [not al-Qadhdhafi], Mubashir [not Mubashshir]).
    - Hamzah (glottal stop): The hamzah is represented by an apostrophe (’).
      Note that this is the same symbol used to represent another consonant,
      the ’ayn.
    - Ta’ marbutah (feminine ending marker): In the construct form or when
      pronounced “t”, it is represented with a Roman t. In all other cases,
      it is represented with an h.
    - Digraphs: No distinction is made between digraphs such as sh and single
      contiguous letters (e.g., s followed by h).
    - Definite article “al” (‘the’): Follows Arabic spelling rather than
      pronunciation. That is, sun letter assimilation is not shown in the
      Romanized form (e.g., ’Abd-al-Rahman, not ’Abd-ar-Rahman).
    - Diphthongs: The second element of the diphthong is represented by a y
      or a w (rather than an i or a u): Haytham, Faysal, Tawfiq, Rawdah.
    - Hyphens: Hyphens (-) are used to connect name elements within a name:
      ’Abd-al-Rahman, Abu-al-Bashar, Bin-Ladin. Exceptions: Names that
      incorporate “Allah” as part of the name (e.g., ’Abdallah, Nasrallah)
      and names marked by the lineage/family marker “Al” (e.g., Al Thani) are
      not hyphenated.
    - The definite article, “al”, within name phrases, is Romanized as al and
      not as ul: Nur-al-Din (not Nur-ul-Din). It is not capitalized when
      name-initial.
    - Names that incorporate Allah as part of the name retain the a of Allah
      rather than a grammatical marker u: ’Abdallah (not ’Abdullah).
    - Foreign names borrowed or appearing in Arabic are spelled according to
      the standard Western tradition: Georges, Michel. However, names of
      non-Arabic origin no longer considered foreign by Arabic speakers
      follow the IC conventions: Butrus (not Peter).
    - Prefix بن (bin ‘son of’) is Romanized Bin unless written with an alif,
      in which case it is Romanized as Ibn. The colloquial form Bu (‘father’)
      should not be standardized as Abu. These prefixes are capitalized.
    - In general, Romanization follows the Modern Standard Arabic (MSA) form
      rather than local pronunciation standards. For example, the letter ج
      (jim) is represented as a j even when pronounced as a “g” (e.g.,
      Egyptian Gamal is Romanized as Jamal).
}
tests {
  test "مِصر", "Miṣr"
  test "قَطَر", "Qaṭar"
  test "المَغرِب", "Al Maghrib"
  test "الجُمهُورِيَّة العِراقِيَّة", "Al Jumhuriyah al ’Iraqiyah"
  test "جُمهُورِيَّة العِراق", "Jumhuriyat al ’Iraq"
  test "جُمهُورِيَّة مِصر العَرَبِيَّة", "Jumhuriyat Miṣr al ’Arabiyah"
  test "بَغداد", "Baghdad"
  test "تُونِس", "Tunis"
  test "حَسّان", "Hassan"
  test "مُحَمَّد", "Muhammad"
  test "القَذَّافِي", "Al Qadhafi"
  test "مُبَشِّر", "Mubashir"
  test "الجَزائِر", "Al Jaza’ir"
  test "عَبدالرَحمَن", "’Abd al Rahman"
  test "هَيْثَم", "Haytham"
  test "فَيْصَل", "Fayṣal"
  test "تَوْفِيق", "Tawfiq"
  test "رَوْضَة", "Rawḍah"
  test "نُورُالدِين", "Nur al Din"
  test "عَبدُاللَّه", "’Abdallah"
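  # Additional check, sketched from the Long/Short Vowels note in the metadata:
  # vowel length is not shown, so the long i is expected to come out as a plain i.
  # The vocalized Arabic spelling below is an assumption, not taken from ICS-630-01.
  test "سَمِير", "Samir"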
}
stage {
  # CHARACTERS
  parallel {
    # Tool used for Unicode finding:
    # https://www.branah.com/unicode-converter

    # pointing
    sub "\u064e", "a" # َ fatha
    sub "\u064e", "", after: "\u0629" # َ fatha followed by ta' marboota
    sub "\u064e", "", after: "a" + any("ht") # َ fatha followed by ta' marboota, handling different order of conversion
    sub "\u0650", "i" # ِ kasra
    sub "\u064f", "u" # ُ damma
    sub "\u0652", "" # ْ sokoon, see note A below
    sub "\u0650\u064a", "i" # ـِي kasra followed by ي
    sub "\u0650\u064a\u0651\u064e", "iy" # ـِيَّ
    sub "\u0650\u064a", "iy", after: any(["\u064e", "\u064f"]) # ـِي kasra followed by ي, itself followed by fatha or damma
    sub "\u064f\u0648", "u" # ـُو damma followed by و
    sub "\u064e\u0627", "a" # ـَا fatha followed by ا
    sub "\u064e\u0649", "á" # ـَى fatha followed by ى which is ا not ي
    sub "\u064e\u0648\u0652", "aw" # ـَوْ
    sub "\u064e\u064a\u0652", "ay" # ـَيْ
    sub "\u0622", "a" # آ

    # ta' marboota
    sub "\u0629", "at" # ة in the middle of the sentence
    sub "\u0629" + line_end, "ah"
    # ة closing a word that begins with ال: one rule per word length (2 to 13 characters after the article)
    sub "\u0629", "ah", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "ah", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "ah", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "ah", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "ah", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "ah", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "ah", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "ah", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "ah", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "ah", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "ah", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "ah", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")

    # shadda
    sub "\u0628\u0651", "bb" # ب
    sub "\u062a\u0651", "tt" # ت
    sub "\u062b\u0651", "th" # ث
    sub "\u062c\u0651", "jj" # ج
    sub "\u062d\u0651", "hh" # ح
    sub "\u062e\u0651", "kh" # خ
    sub "\u062f\u0651", "dd" # د
    sub "\u0630\u0651", "dh" # ذ
    sub "\u0631\u0651", "rr" # ر
    sub "\u0632\u0651", "zz" # ز
    sub "\u0633\u0651", "ss" # س
    sub "\u0634\u0651", "sh" # ش
    sub "\u0635\u0651", "ṣṣ" # ص
    sub "\u0636\u0651", "ḍḍ" # ض
    sub "\u0637\u0651", "ṭṭ" # ط
    sub "\u0638\u0651", "ẓẓ" # ظ
    sub "\u063a\u0651", "gh" # غ
    sub "\u0641\u0651", "ff" # ف
    sub "\u0642\u0651", "qq" # ق
    sub "\u0643\u0651", "kk" # ك
    sub "\u0644\u0651", "ll" # ل
    sub "\u0645\u0651", "mm" # م
    sub "\u0646\u0651", "nn" # ن
    sub "\u0647\u0651", "hh" # ه
    sub "\u0648\u0651", "ww" # و
    sub "\u064a\u0651", "yy" # ي

    sub "\u0626", "’" # ئ
    sub boundary + "\u0627\u0644\u0644\u0651\u064e\u0647", "Allah"
    sub non_word_boundary + maybe("\u064f") + "\u0627\u0644\u0644\u0651\u064e\u0647", "allah"
    sub "\u0621", any(["’", ""]) # ء
    sub boundary + "\u0627\u0644", "al " # ال
    sub non_word_boundary + maybe("\u064f") + "\u0627\u0644", " al " # ال in middle of composite name

    # '\uFE8E' : '' # ﺎ
    sub "\u0623", "" # أ
    sub boundary + "\u0627", "" # ا
    sub "\u0627", "a" # ا
    sub "\u0628", "b" # ب
    sub "\u062a", "t" # ت
    sub "\u062b", "th" # ث
    sub "\u062c", "j" # ج
    sub "\u062d", "h" # ح
    sub "\u062e", "kh" # خ
    sub "\u062f", "d" # د
    sub "\u0630", "dh" # ذ
    sub "\u0631", "r" # ر
    sub "\u0632", "z" # ز
    sub "\u0633", "s" # س
    sub "\u0634", "sh" # ش
    sub "\u0635", "ṣ" # ص
    sub "\u0636", "ḍ" # ض
    sub "\u0637", "ṭ" # ط
    sub "\u0638", "ẓ" # ظ
    sub "\u0639", "’" # ع
    sub "\u063a", "gh" # غ
    sub "\u0641", "f" # ف
    sub "\u0642", "q" # ق
    sub "\u0643", "k" # ك
    sub "\u0644", "l" # ل
    sub "\u0645", "m" # م
    sub "\u0646", "n" # ن
    sub "\u0647", "h" # ه
    sub "\u0648", "w" # و
    sub "\u064a", "y" # ي
  }

  # POSTRULES
  sub any("\u0061".."\uFFFF"), upcase, before: boundary, not_before: boundary + any("‘’'")
  sub " Al ", " al " # ال # don't capitalize the definite article in the middle of a sentence
}