metadata {
  authority_id: bgnpcgn
  id: 1968
  language: iso-639-3:prs
  source_script: Arab
  destination_script: Latn
  name: Romanization of Pashto (1968)
  url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693760/ROMANIZATION_OF_PASHTO.pdf
  creation_date: 1968
  confirmation_date: 2017-11
  description: |
    Pashto is an Indo-Iranian language and is one of two nationally official
    languages in Afghanistan and one of five regionally recognised languages in
    Pakistan. The romanization system presented here may be applied to all
    Pashto geographical names. Although the BGN/PCGN policy for geographical
    names in Afghanistan is to apply the BGN/PCGN national system of
    romanization for Afghanistan (2007), which incorporates Dari elements, when
    applied to a Pashto geographical name, the romanized results of the
    BGN/PCGN national system for Afghanistan are the same as those of this
    Pashto romanization system.

    The Pashto alphabet uses a modified form of the Perso-Arabic script, and
    contains twelve additional consonants not present in standard Arabic, as
    well as three additional vowel characters and an additional vowel point.

    Consonants: پ ټ چ څ ځ ډ ړ ږ ژ ښ گ ڼ
    Vowels: ى ۍ ې
    Vowel point: ٙ

    The points used in Arabic to mark short vowels and certain other
    diacritical marks are not written in Pashto. Consequently, a reference
    source may sometimes be required to aid correct identification of the
    standard spellings and proper vowels and elimination of dialectal and
    idiosyncratic variations. In the interests of clarity, a column showing
    vowel pointing from Arabic to indicate short vowels has been included in
    the examples below, alongside the unpointed form that will usually be
    encountered. However, it should be noted that the pronunciation of short
    vowels will vary.

    (Note: it is recommended that a font such as Scheherazade, available from
    www.sil.org, which includes the Unicode extended Arabic sub-range, be used
    to view this system.)

  notes:
    - 1. Alif (ا) should be romanized as follows
      a. Initially, it indicates that the word begins with a vowel or
      diphthong; the alif itself is not romanized, but rather the short vowel
      it “carries” is romanized; e.g., ميړ أَسلَم ژرندَه → Mīṟ Aslam Zhrandah.
      b. When it carries a maddah (آ) (see vowel table, row 3), it represents
      ā; e.g., آب بَند → Āb Band.
      c. Medially and finally it represents ā (see table 2, row 2); e.g.,
      ماڼۍ → Māṉêy.
      d. Medially and finally in words of Arabic origin, alif may serve as the
      bearer of hamzah, e.g. رأس → ra’s. See also note 4.

    - 2. The characters tsē (څ) and dzē (ځ) may be romanized t͡s and d͡z (the
      combining double breve (Unicode 0361) appearing over the digraph) when
      for special reasons it is desired that confusion be avoided between
      ت (t) plus س (s) and between د (d) plus ز (z), respectively.

    - 3. Occasionally the character sequences كه, زه, سه, and گه occur. They
      may be romanized k·h, z·h, s·h, and g·h in order to differentiate these
      romanizations from the digraphs kh, zh, sh, and gh, which are used to
      represent the characters خ, ژ, ش, and غ respectively.

    - 4. Hamzah (ء) should be romanized as follows
      a. In word-initial position, where it will appear either above or below
      alif (أ or إ), it indicates a short vowel and should not itself be
      romanized. In other positions it should be romanized by an apostrophe,
      e.g. جُزء → juz’.
      b. Yeh with hamzah (ئ) should be romanized êy, unless it represents the
      compound (iẕāfah) morpheme, in which case it is romanized according to
      note 9 below.

    - 5. The division of words utilized in Pashto writing is followed in
      romanization, except that the elements -ābād, -khwā, -shahr, -zādah,
      -zay and -ullāh are always romanized as part of the preceding word, e.g.
      رَحْمَت آباد → Raḩmatābād and رَحْمَت الله → Raḩmatullāh. However, when the
      word for God (الله) appears as a standalone word it should be written
      Allāh. Note also the “dagger alif” ( ٰ ) above the second ل (lām) in the
      word الله ; this, like the short vowels, is not written in Pashto but
      should be romanized ā, like a full-size alif. Persian derivational
      endings such as -vand and endings of Turkish origin such as -lar, -lī,
      -lū, -i, -u, -si, and -su, should be written together with the preceding
      word.

    - 6. The Pashto preposition د should be romanized dê in agreement with its
      pronunciation, despite the fact that it is sometimes pointed with
      kasrah ( ِ ).

    - 7. In names of Arabic origin, the l of the definite article al/ul is
      assimilated before the ‘sun letters’ t, s̄, d, z̄, r, z, s, sh, ş, ẕ, ţ,
      z̧, l and n. In romanization, the article will be written al or its
      assimilated equivalent in name-initial position but ul or its assimilated
      equivalent elsewhere; the article should be separated from the name it
      precedes and should not be capitalized, except at the beginning of a
      name, e.g. جَبَل السَرَاج → Jabal us Sarāj.

    - 8. In Arabic names, a shaddah ( ّ ) is used to denote the doubling of a
      particular consonant character, e.g. مُحَمَّد → Muḩammad. However, in Pashto
      this ‘doubling’ is frequently omitted in both Perso-Arabic script and the
      resulting romanization. Guidance on doubling may be taken from an
      authoritative names source, such as an Afghan government source or Pashto
      dictionary; for example, it is usual to see Ḩājī without and ‘Abbās with
      the doubled consonant. The doubled y consonant is almost always retained,
      as in Sayyid or Qayyūm.

    - 9. The iẕāfah morpheme is not a grammatical feature of Pashto and, if
      encountered in a linguistically hybrid geographical name (i.e. combining
      features of both Pashto and Dari), it should be treated according to the
      BGN/PCGN national system of romanization for Afghanistan, 2007, as -e,
      unless the preceding word ends with a silent heh (ه) or a vowel, when it
      should be shown as -ye, e.g. غر حِصار → Ghar-e Ḩişār;
      قَلعَهٔ نَو → Qal‘ah-ye Now.

    - 10. The character sequence خو, when followed by ا or ی, should be
      romanized khw, although the w is either not pronounced, or only weakly
      pronounced; e.g. خواجه → khwājah.

    - 11. An inventory of letter-diacritic combinations in addition to the
      unmodified letters of the basic Roman script is
      ‘ (U+2018) ’ (U+2019)
      Ā (U+0100) ā (U+0101) Á (U+00C1) á (U+00E1)
      Ḏ (U+0044+0331) ḏ (U+0064+0331)
      Ē (U+0112) ē (U+0113) Ê (U+00CA) ê (U+00EA)
      Ḩ (U+1E28) ḩ (U+1E29)
      Ī (U+012A) ī (U+012B)
      N̄ (U+004E+0304) n̄ (U+006E+0304)
      Ō (U+014C) ō (U+014D)
      Ṟ (U+0052+0331) ṟ (U+0072+0331)
      Ş (U+015E) ş (U+015F) S̄ (U+0053+0304) s̄ (U+0073+0304)
      Ṯ (U+0054+0331) ṯ (U+0074+0331) Ţ (U+0162) ţ (U+0163)
      Ū (U+016A) ū (U+016B)
      Z̧ (U+005A+0327) z̧ (U+007A+0327) Z̄ (U+005A+0304) z̄ (U+007A+0304)
      Ẕ (U+005A+0331) ẕ (U+007A+0331)
      Z͟H (U+005A+035F+0048) z͟h (U+007A+035F+0068)
}
tests {
  test "بَغْلان", "Baghlān"
  test "پُوټَكَى", "Pōṯakay"
  test "شِيرِين تَگَاب", "Shīrīn Tagāb"
  test "کُوْټ", "Kōṯ"
  test "ثَابِر", "S̄ābir"
  test "جَلال آبَاد", "Jalālābād"
  test "چَارِيكَار", "Chārīkār"
  test "ځَدْرَاڼ", "Dzadrāṉ"
  test "څَوکۍ", "Tsowkêy"
  test "حَضْرَتِ إِمَام", "Ḩaẕrat-e Imām"
  test "خُوْسْت", "Khōst"
  test "سْپِين بُوْلْدَک", "Spīn Bōldak"
  test "ډَنْډ وَ پَتَان", "Ḏanḏ Wa Patān"
  test "كَنْدَهَار", "Kandahār"
  test "أَنْدَړ", "Andaṟ"
  test "كُنْدُز", "Kunduz"
  test "مِير أَسْلَم ژْرَنْدَه", "Mīr Aslam Zhrandah"
  test "ږِيرَه", "Z͟hīrah"
  test "سَمَنْگَان", "Samangān"
  test "كښٙتَه كَلا", "Ks͟hêtah Kalā"
  test "قَيْصَار", "Qayşār"
  test "فَيض آبَاد", "Faīẕābād"
  test "حَضْرَتِ سُلْطَان", "Ḩaẕrat-e Sulţān"
  test "ظَاهِر كَلا", "Z̧āhir Kalā"
  test "پُلِ عَلَم", "Pul-e ‘Alam"
  test "غَزْنِي", "Ghaznī"
  test "مَزَارِ شَرِيف", "Mazār-e Sharīf"
  test "قَيْصَار", "Qayşār"
  test "كَنْدَهَار", "Kandahār"
  test "گَرْدېز", "Gardēz"
  test "کَابُل", "Kābul"
  test "مَيمَنَه", "Maīmanah"
  test "خَان آبَاد", "Khānābād"
  test "مَاڼۍ", "Māṉêy"
  test "وَاخَان", "Wākhān"
  test "يَنْگِي قَلعَه", "Yangī Qal‘ah"
  test "جَلال آبَاد", "Jalālābād"
  test "مُرْغَاب کَابُل", "Murghāb Kābul"
  test "گٙردُون", "Gêrdōn"
  test "آب بَنْد", "Āb Band"
  test "سْپِين بُوْلْدَک", "Spīn Bōldak"
  test "جَوزجَان", "Jowzjān"
  test "گَرْدېز", "Gardēz"
  test "مَیدان شَهْر", "Maīdān Shahr"
  test "ډَنْډِ سُفْلىٰ", "Ḏanḏ-e Suflá"
  test "جَبَل السَرَاج", "Jabal us Sarāj"
}
dependency "bgnpcgn-prs-Arab-Latn-2007", as: arablatn
stage {
  run map.arablatn.stage.main

  # CHARACTERS
  parallel {
    sub "\u0650", "i" # ِ kasra
    sub "\u064f", "u" # ُ damma
    sub "\u0650" + boundary, "-e" # ِ kasra

    sub space + "\u0627\u0644\u0644\u0651\u064e\u0647", "ullāh" # Note5
    sub space + "\u062E\u0648\u0627", "khwā" # Note5
    sub space + "\u0634\u064E\u0647\u0631", "shahr" # Note5
    sub space + "\u0632\u0627\u062F\u0629", "zādah" # Note5
    sub space + "\u0632\u064E\u064a", "zay" # Note5

    sub "\u0652", "" # ْ sokoon
    sub "\u0659", "ê"

    # Sun letters
    sub boundary + "\u0627\u0644\u062a" + maybe("\u0651"), "ut t" # الت
    sub boundary + "\u0627\u0644\u062b" + maybe("\u0651"), "us̄ s̄" # الث
    sub boundary + "\u0627\u0644\u062f" + maybe("\u0651"), "ud d" # الد
    sub boundary + "\u0627\u0644\u0630" + maybe("\u0651"), "uz̄ z̄" # الذ
    sub boundary + "\u0627\u0644\u0631" + maybe("\u0651"), "ur r" # الر
    sub boundary + "\u0627\u0644\u0632" + maybe("\u0651"), "uz z" # الز
    sub boundary + "\u0627\u0644\u0633" + maybe("\u0651"), "us s" # الس
    sub boundary + "\u0627\u0644\u0634" + maybe("\u0651"), "ush sh" # الش
    sub boundary + "\u0627\u0644\u0635" + maybe("\u0651"), "uş ş" # الص
    sub boundary + "\u0627\u0644\u0636" + maybe("\u0651"), "uẕ ẕ" # الض
    sub boundary + "\u0627\u0644\u0637" + maybe("\u0651"), "uţ ţ" # الط
    sub boundary + "\u0627\u0644\u0638" + maybe("\u0651"), "uz̧ z̧" # الظ
    sub boundary + "\u0627\u0644\u0644" + maybe("\u0651"), "ul l" # الل
    sub boundary + "\u0627\u0644\u0646" + maybe("\u0651"), "un n" # الن

    sub "\u0626", "êy" # ئ
  }

  # POSTRULES
  sub any("\u0061".."\uFFFF"), upcase, before: boundary, not_before: boundary + any("‘’'-")

  # don't capitalize the definite article in the middle of a name
  sub " Ut T", " ut T" # الت
  sub " Us̄ S̄", " us̄ S̄" # الث
  sub " Ud D", " ud D" # الد
  sub " Uz̄ Z̄", " uz̄ Z̄" # الذ
  sub " Ur R", " ur R" # الر
  sub " Uz Z", " uz Z" # الز
  sub " Us S", " us S" # الس
  sub " As S", " us S" # needed to add it after porting, why?
  sub " Ush Sh", " ush Sh" # الش
  sub " Uş Ş", " uş Ş" # الص
  sub " Uẕ Ẕ", " uẕ Ẕ" # الض
  sub " Uţ Ţ", " uţ Ţ" # الط
  sub " Uz̧ Z̧", " uz̧ Z̧" # الظ
  sub " Ul L", " ul L" # الل
  sub " Un N", " un N" # الن

  compose
}