metadata {
  authority_id: bgnpcgn
  id: 2007
  language: iso-639-3:prs
  source_script: Arab
  destination_script: Latn
  name: Pashto Standardized Transliteration System for Personal Names (2011)
  creation_date: 2011
  confirmation_date: 2011-06
  description: |
    Pashto is a language characterized by its variability. The challenges of
    reaching a universally acceptable transliteration standard for personal names
    are posed both by the dialect spread trending roughly from the southwest to
    the northeast parts of its geographic area, and also by the competing
    influences of Dari and Urdu.

    Given the lack of any dominant standard on some points, it has been
    considered necessary to define a certain degree of flexibility within this
    standardization system in order to enable it to capture the variability of
    the language it seeks to reflect. A great effort has also been made to
    research and reflect common usage in the Pashto-speaking areas of Pakistan
    and Afghanistan, and to define rules which capture these trends to the best
    extent possible.

    However, a respect for the norms of usage has also been balanced with the
    need for consistency and intelligibility. Therefore, there are cases where a
    less common spelling will be the preferred usage under the guidance of this
    system.

    Lastly, specific names that cannot be incorporated gracefully into the rules
    described below are included with case-by-case preferred spellings in a list
    near the end of this document.
  notes:
    # Special rules: Consonants
    - Pashto letter ge (ږ)
      The two common renderings for this letter are 'zh' and 'g.' The preferred
      option will be 'zh' (consistent with the choice of southern 'sh' for ښ).
      However, when referring to communities that consistently render the name
      with a 'g' as opposed to a 'zh,' then 'g' will be the preferred option. In
      these cases, the inclusion of a variant spelling with 'zh' is strongly
      encouraged.
    - Double consonants
      Double consonants represented by the tashdid (shaddah) are shown in most
      cases regardless of whether they are clearly enunciated in speech.
      Examples: Muhammad Hassan, Izzatullah. However, consonants represented by
      digraphs are not doubled. Example: Mubashir (not Mubashshir). Special care
      should be taken when possible to discriminate between doubled and
      non-doubled letters in names that are otherwise indistinguishable in their
      transliterated forms: Hasan (حسن) vs. Hassan (حسان)
    - Digraphs
      No distinction is made between digraphs such as 'sh' and single contiguous
      letters such as 's' followed by 'h.'
    # Special rules: Vowels
    - Short vowels zair and pesh
      The preferred options for the short vowels represented by the zair and
      pesh will be 'i' and 'u.' However, in cases where there is a mixed Dari and
      Pashto environment, then the use of 'e' and 'o' is accepted in
      consideration of Dari norms.
    - Long/short vowels
      Long and short vowels are not distinguished in the system (with the
      exception of certain spellings driven by Dari influence as discussed
      above). In this and other systems, the borrowed Arabic name Salim could
      represent two distinct names, one with a long /a/ (Saalim - سالم) and one
      with a long /i/ (Saliim - سلیم). This is known as a collision. This and many
      other prevailing standardization systems do not distinguish between these
      types of collisions. However, in cases like these, it is recommended that a
      vigorous effort be made to include variant spellings in order to eliminate
      ambiguity as to which name is intended, as in the following examples:
        Hamid (var. Hameed) – حمید
        Hamid (var. Hamed) – حامد
    - Izafat
      The linking vowel of Persian origin known as the izafat will be written
      with a hyphen and then 'e' and then a following space. Example: Koh-e Nur
      ("mountain of light"). There will be no special accommodation for when the
      initial word ends in a vowel.
    # Special rules: Arabic
    - The Arabic article al ( ال )
      The Arabic article will be written with a lowercase 'a' and followed by a
      hyphen, with the obvious exception that an uppercase 'a' should be used
      where required by English orthographic conventions, e.g., at the beginning
      of a sentence. Example: Karim al-Afghani
    - Genitive constructions
      Multi-part Arabic names that follow the Arabic genitive construction will
      be written with a lowercase ul joined to the last part of the name by a
      hyphen. Arabic sun letter assimilation generally will not be shown.
    # Genitive construction exceptions
    - In deference to widespread usage, the name Abd will be combined with the
      genitive article ul, and the rest of the name will be written separately
      (specifically, one of the ninety-nine "names of God").
      Examples: Abdul Haq, Abdul Rahman
    - Names incorporating "--ullah" will be written as a single unit.
      Examples: Abdullah, Rahmatullah
    - Names incorporating "--din" will be written as a single unit with sun
      letter assimilation shown, causing ul to change to ud. This type of name is
      the only case in which sun letter assimilation will be shown.
      Examples: Jamaluddin, Shamsuddin
    - Note, no effort will be made to force names into the genitive construction
      if they are not linked by the article. For example, both names
      Fazl ul-Rahman and Fazl Rahman would be acceptable, depending on whether
      the article was included in the individual's name.
    - Consonant clusters
      Traditional Arabic names ending in a final consonant cluster will be
      spelled with the consonant cluster intact. Although most native Pashto
      speakers will break up the cluster in conversational speech, general usage
      in the written transliteration of names still favors the preservation of
      the consonant cluster in print. This type of name should not be confused
      with Arabic names whose orthography includes a zabar (short 'a' vowel)
      prior to the final consonant.
      -- NOT DONE
    # Special rules: Multiple-part Pashto names
    - TODO
    # Special rules: Glides versus consonants
    - The unwritten phonetic "glides," also known as semi-consonants (sounding
      similar to /y/ and /w/), will generally not be shown between two vowels.
      Examples: Rauf, Said (سعيد)
    - Care should be taken to distinguish the above rule from cases in which wao
      or ya are a written part of the name and function as true consonants,
      including cases where they are doubled.
      Examples: Fayiz ( فایز ), Fayyaz ( فیّاض )
    - A common form where ya will be shown in its role as a consonant is with
      the Arabic nisba (suffix showing origin, relation, etc.) appended to names
      that end in a vowel. Examples: Ziayi, Shafiyi, Mirzayi, Paktiayi
    # Specific rules: Exceptions
    - TODO
      In spite of best intentions, the unbending application of rules in any
      transliteration system is likely to produce some forms that fly in the
      face of accepted use. Therefore, the following names will be spelled as
      follows, in spite of minor variance with the rules described above.
        Aurangzeb (not Awrangzeb)
        Eid (not Id)
        Faizad (not Faizzad)
        Javed (not Jawed)
        Parvez (not Parwez)
        Qureshi (not Quraishi)
        Saad (not Sad)
        Sherpao (not Sherpaw)
    - The following names do not constitute exceptions to the rules of this
      standard, but they are names that pose significant challenges to
      standardization and are therefore listed here to ensure consensus:
        Bahadur (not Bahadar, Bahader)
        Feroz (not Firoz, Fairuz, etc.)
        Firdaws (not Firdos)
        Husain (not Hussain)
        Isfandyar (not Asfandyar)
        Ismail (not Ismael)
        Khushhal (not Khushal)
        Niamat (not Nimat)
        Numan (not Nauman)
        Raza (not Reza)
        Sherzad (not Shirzad)
        Tor Jan (not Tur Jan)
        Uwais (not Awais)
    # Special rules: Titles
    - We treat the spelling of commonly used titles differently from the
      handling of names, given that titles are subject to norms of English as
      they are accepted into the English language.
      Though not specifically covered by the scope of this transliteration
      standard, the following spellings are recommended for the sake of
      consistency:
        Akhund
        Amir
        Commander (not Commandan)
        Hafiz
        Haji
        Mawlana
        Mawlawi
        Mullah
        Qari
        Qazi
        Sahib
        Sheikh
        Syed ( سیّد )
        Ustad
}
tests {
  test "حَسّان", "Hassan"
  test "حَسَن", "Hasan"
  test "صَفّار", "Saffar"
  test "صَفَر", "Safar"
  # collision
  test "حَمِيد", "Hamid"
  # collision
  test "حامِد", "Hamid"
  test "كَرِيم الأَفغَانِي", "Karim al-Afghani"
  test "عَبداللَّه", "Abdullah"
  test "جَمَال الدين", "Jamaluddin"
  test "شَمسُ الدين", "Shamsuddin"
  test "فَيَّاض", "Fayyaz"
  test "فايِز", "Fayiz"
  test "ا", "A"
  test "رَؤوف", "Rauf"
  test "سَعِيد", "Said"
  test "قَيُّوم", "Qayyum"
}
stage {
  # CHARACTERS
  parallel {
    sub "\u0650", "i" # ِ kasra
    sub "\u064f", "u" # ُ damma
    sub "\u0650" + boundary, "-e" # ِ kasra
    sub "\u0652", "" # ْ sokoon
    sub "\u0659", "ê"

    # Sun letters
    sub maybe("\u064f") + maybe(space) + "\u0627\u0644\u062f" + any("\u064a\u0649") + "\u0646", "uddin" # الدين

    sub "\u0626", "êy" # ئ
    sub "\u0628", "b" # ب
    sub "\u067E", "p" # پ
    sub any("\u062a\u067C\u0637"), "t" # ت/ټ/ط
    sub "\u062c", "j" # ج
    sub "\u0686", "ch" # چ
    sub "\u0681", "dz" # ځ
    sub "\u0685", "ts" # څ
    sub any("\u062d\u0647"), "h" # ح/ه
    sub "\u062e", "kh" # خ
    sub any("\u062f\u0689"), "d" # د/ډ
    sub any("\u0631\u0693"), "r" # ر/ړ
    sub any("\u0630\u0632\u0636\u0638"), "z" # ذ/ز/ض/ظ
    sub any("\u0696\u0698"), "zh" # ژ/ږ
    sub any("\u062B\u0633\u0635"), "s" # س/ث/ص
    sub any("\u0634\u069A"), "sh" # ښ/ش
    sub any("\u0621\u0639"), "" # ع/ء
    sub "\u063a", "gh" # غ
    sub "\u0641", "f" # ف
    sub "\u0642", "q" # ق
    sub "\u0643", "k" # ك
    sub "\u06A9", "k" # ک
    sub any("\u06AF\u06AB"), "g" # گ/ګ
    sub "\u0644", "l" # ل
    sub "\u0645", "m" # م
    sub any("\u0646\u06BC"), "n" # ن/ڼ
    sub "\u0648", "w" # و
    sub "\u064a", "y" # ي
    sub "\u0649", "y" # ى

    sub "\u064e" + maybe("\u0627"), "a" # َ fatha
    sub "\u0650", any("ie")
    sub "\u064f", any("uo") # ُ damma
    sub "\u0622", "a" # آ
    sub "\u0627", "a" # ا
    sub "\u0648", "o" # و
    sub "\u064e\u0648\u0652", "aw" # ـَوْ
    sub "\u064f\u0648", "u" # ـُو
    sub "\u064e\u064a", "ai" # ـي
    sub "\u0650" + any("\u064a\u0649"), "i"
    sub "\u06D0", "e" # ې
    sub "\u06CD", "ey" # ۍ
    sub "\u06CC", "a" # ی
    sub "\u064e\u06CC" + any("\u0647\u0627"), "aya" # َيا / َيه
    sub "\u0650\u06CC" + any("\u0647\u0627"), "ia" # ِيا /ِيه
    sub "\u0652\u06CC\u0627", "ya" # ْيا
    sub any("\u06D0\u06D2") + boundary, "ey" # ے / ې
    sub "\u0648\u064A" + boundary, "oy" # وي
    sub "\u064f\u0648\u064A" + boundary, "uy" # ُوي
    sub "\u0623", "" # أ

    # Double consonants
    sub "\u0628\u0651", "bb" # ب
    sub "\u067E\u0651", "pp" # پ
    sub any("\u062a\u067C\u0637") + "\u0651", "tt" # ت/ټ/ط
    sub "\u062c\u0651", "jj" # ج
    sub any("\u062d\u0647") + "\u0651", "hh" # ح/ه
    sub any("\u062f\u0689") + "\u0651", "dd" # د/ډ
    sub any("\u0631\u0693") + "\u0651", "rr" # ر/ړ
    sub any("\u0630\u0632\u0636\u0638") + "\u0651", "zz" # ذ/ز/ض/ظ
    sub any("\u062B\u0633\u0635") + "\u0651", "ss" # س/ث/ص
    sub "\u0641\u0651", "ff" # ف
    sub "\u0642\u0651", "qq" # ق
    sub "\u0643\u0651", "kk" # ك
    sub "\u06A9\u0651", "kk" # ک
    sub any("\u06AF\u06AB") + "\u0651", "gg" # گ/ګ
    sub "\u0644\u0651", "ll" # ل
    sub "\u0645\u0651", "mm" # م
    sub any("\u0646\u06BC") + "\u0651", "nn" # ن/ڼ
    sub "\u0648\u0651", "ww" # و
    sub "\u064a\u0651", "yy" # ي
    sub "\u0649\u0651", "yy" # ى

    sub boundary + "\u0627\u0644", "al-", before: space # ال
    sub maybe(space) + "\u0627\u0644\u0644\u0651\u064e\u0647", "ullah"
    sub "\u0624\u0648", "u"
  }

  # POSTRULES
  sub any("\u0061".."\uFFFF"), upcase, before: boundary, not_before: boundary + any("‘’'")

  # don't capitalize definite article in the middle of a sentence
  sub " Al-", " al-" # الن
}
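
# Usage sketch (not part of the map itself; a hedged illustration only).
# Assumptions: the interscript Ruby gem and its maps are installed, and this
# file is registered under the hypothetical system code
# "bgnpcgn-prs-Arab-Latn-2007", which is inferred from the metadata fields
# above (authority_id, language, scripts, id). The actual filename is not
# given here, so substitute the real code when running.
#
#   require "interscript"
#
#   # Transliterate a vocalized name taken from the tests block above.
#   puts Interscript.transliterate("bgnpcgn-prs-Arab-Latn-2007", "حَسَن")
#
# The entries in the tests block document the intended result for each input;
# the sketch simply feeds one of those inputs through the same map.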