metadata {

authority_id: odni
id: 2004
language: iso-639-2:fas
source_script: Arab
destination_script: Latn
name: Intelligence Community (IC) Standard for the Transliteration of Farsi (Persian) Personal Names (2004)
creation_date: 2004
confirmation_date: 2004-11
description: |

notes:
  - Long/short vowels:- There is no distinction made in Roman
    between long and short a:-  E.g., Parvas (first a is short,
    second is long).
  - Double consonants:-  Double consonants represented by the
    tashdid are shown by doubling the Roman letter:-
    Mo'azzami.  Exceptions:-  Ain and consonants represented by
    Roman digraphs (e.g., sh, ch) are not doubled:-  Mobasher [
    not:-  Mobashsher].  Double letters are only used for
    tashdid (thus, Hosein [not Hossein]) or to reflect the ‘sun
    letter’ assimilation (see beelow).
  - Hamzeh:-  The hamzeh is represented name-internally by an
    apostrophe, as is the ain.  Name-initially, however,
    neither hamzeh nor ain are indicated in transliteration (
    e.g., Abdorrahman, not 'Abdorrahman).
  - Digraphs:-  No distinction is drawn in Roman between
    digraphs such as sh and single contiguous letters (e.g., s
    followed by h).
  - Arabic definite article "al" ('the'):-  Common in many
    names borrowed from Arabic, the transliteration should
    follow the Arabic rules for “sun letter” assimilation in
    spoken form and reflect the nominative case.  That  is:-
    Abdorrahman, not Abd al-Rahman.  Note also that the
    “Abdollah” and “Abdol + attribute of Allah” names are
    written as one unanalyzed word, as are other names that
    contain the definite article:-  Shamsoddin (not Shams al-
    Din), Nezamoddin, etc.
  - Diphthongs:-  Diphthongs are written ei and ow, as in,
    respectively:-  Hosein; Khosrow.
  - Yeh maqsura (final yeh pronounced as “a”):-  should be
    written as “a” as in “Musa”.

  - Special Rules

  - Hyphens:-  A hyphen is used to indicate the ezafeh
    construction:-  Arshad-e Ameri
  - Borrowed names that incorporate the name of God (Allah)
    are transliterated as one word, with the letter "o":-  E.g.,
    Abdollah, Ayatollah, Azizollah.
  - Foreign names borrowed or appearing in Farsi are spelled
    according to the standard Western tradition (even if there
    is an Arabic or Farsi version of the same name):-  Joseph,
    Michael.
  - Common suffixes, such as nia, pur, fard, far, abad,
    zadeh, khah, and nezhad as well as nesbeh (‘relationship’ (
    to place of birth, etc.)) names derived with these
    suffixes  (e.g., nezhadi, abadi) are written as part of the
    name:-

    asa               Mehrasa
    baksh             Tajbaksh
    dust              Rafighdust
    far               Parvizfar
    fard              Akhavanfard
    gar               Fuladgar
    gol               Zarringol
    kar               Parhizkar
    khah              Vatankhah
    khu               Nikkhu
    mand              Purmand
    mehr              Zadmehr
    nezhad            Niknezhad
    nia               Montajebnia
    parast            Khodaparast
    parvar            Golparvar
    pur               Mohteshemipur
    tabar             Shayestehtbar
    yar               Mohammadyar
    zadeh             Vakilzadeh

    abadi             Salehabadi
    khani             Alikhani
    nezhadi   Niknezhadi

  - Note also that yar can function as a prefix and, as such,
    should be affixed directly to the name:-

    yar               Yarmohammadi, Yarshater

  - This is in contrast with hyphenated names such as Raja’i-
    Khorasani, Tabataba’i-Shirazi, Soleimani-Maimandi, etc.

}

tests {

test "مُوسَى", "musa"
test "مُؤمِن", "mo’men"
test "رِضايي", "reza’i"
test "مُبَشِّر", "mobasher"
test "حَسَّان", "hassan"
test "حَسَن", "hasan"
test "صَفَّار", "saffar"
test "صَفَر", "safar"

}

stage {

# CHARACTERS
parallel {
  # special rules

  sub space, "", after: "\u0622\u0628\u064E\u0627\u062F" # space followed by abad is removed
  sub "\ufdf2", "Allah" # See note 5
  sub space + "\u0627\u0644\u0644\u0651\u064e\u0647", "ollah" # NOTE 9

  sub "\u0652", "" # ْ sokoon
  sub "\u0659", "ê"

  sub "\u064e\u064a\u0652", "ay" # ـَيْ
  sub "\u0649\u0670", "á" # ىٰ
  sub "\u0674", "-e" # ٴ
  sub "\u0654", "-e" #  ٔ
  # - '-ye'

  # ta' marboota
  sub "\u0629", "eh"

  sub "\u0626", "’" # ئ
  sub "\u0624", "’" # ؤ
  sub "\u0623", "" # أ
  sub "\u0625", "" # إ

  # See note B
  sub boundary + "\u0627\u0644", "al " # ال
  sub boundary + "\u0622\u0644", "Al " # ‫آل‬
  # '\uFE8E' : ''  # ﺎ

  # Sun letters
  sub boundary + "\u0627\u0644\u062a" + maybe("\u0651"), "at t" # الت
  sub boundary + "\u0627\u0644\u062b" + maybe("\u0651"), "as s" # الث
  sub boundary + "\u0627\u0644\u062f" + maybe("\u0651"), "ad d" # الد
  sub boundary + "\u0627\u0644\u0630" + maybe("\u0651"), "az z" # الذ
  sub boundary + "\u0627\u0644\u0631" + maybe("\u0651"), "ar r" # الر
  sub boundary + "\u0627\u0644\u0632" + maybe("\u0651"), "az z" # الز
  sub boundary + "\u0627\u0644\u0633" + maybe("\u0651"), "as s" # الس
  sub boundary + "\u0627\u0644\u0634" + maybe("\u0651"), "ash sh" # الش
  sub boundary + "\u0627\u0644\u0635" + maybe("\u0651"), "as s" # الص
  sub boundary + "\u0627\u0644\u0636" + maybe("\u0651"), "az z" # الض
  sub boundary + "\u0627\u0644\u0637" + maybe("\u0651"), "at t" # الط
  sub boundary + "\u0627\u0644\u0638" + maybe("\u0651"), "az z" # الظ
  sub boundary + "\u0627\u0644\u0644" + maybe("\u0651"), "al l" # الل
  sub boundary + "\u0627\u0644\u0646" + maybe("\u0651"), "an n" # الن

  # Farsi Vowel (Pointing)
  sub "\u0622", "a" # آ alef maddeh
  sub "\u064e", "a" # َ fatha
  sub "\u0627", "", before: "\u064e" # ا
  sub "\u0627", "a", not_before: boundary # ا
  sub boundary + "\u0627\u064e", "a" # ا initial followed by fatha
  sub boundary + "\u0627\u064f", "o" # ا initial followed by damma
  sub boundary + "\u0627\u0650", "e" # ِ ا initial followed by kasra

  sub "\u064f", "o" # damma
  sub "\u064f\u0648", "u" # ـُو damma followed by و
  # '\u064e\u0648' : 'ow'  # ـَو
  # '\u064e\u0648\u0652' : 'aw'  # ـَوْ

  sub "\u0650", "e" # kasra
  sub "\u0650\u064a", "i" # ـِي kasra followed by ي
  sub "\u0650\u06cc", "i" # ـِي kasra followed by ي
  sub "\u0650\u064a\u0651\u064e", "iy" # ـِيَّ
  sub "\u0650\u06cc\u0651\u064e", "iy" # ـِيَّ
  sub "\u0650\u064a", "iy", after: any(["\u064e", "u064f"]) # ـِي kasra followed by ي
  # '\u064e\u064a' : 'aī'  # ـَي
  # '\u064e\u06cc' : 'aī'  # ـَي
  # '\u064e\u0649' : 'ay'  # ـَى fatha followed by ى which is ا not ي

  # additional symbols

  # shadda

  sub "\u0628\u0651", "bb" # ب
  sub "\u062a\u0651", "tt" # ت
  sub "\u062b\u0651", "ss" # ث
  sub "\u062c\u0651", "jj" # ج
  sub "\u062d\u0651", "hh" # ح
  sub "\u062e\u0651", "kh" # خ
  sub "\u062f\u0651", "dd" # د
  sub "\u0630\u0651", "zz" # ذ
  sub "\u0631\u0651", "rr" # ر
  sub "\u0632\u0651", "zz" # ز
  sub "\u0633\u0651", "ss" # س
  sub "\u0634\u0651", "sh" # ش
  sub "\u0635\u0651", "ss" # ص
  sub "\u0636\u0651", "zz" # ض
  sub "\u0637\u0651", "tt" # ط
  sub "\u0638\u0651", "zz" # ظ
  sub "\u063a\u0651", "gh" # غ
  sub "\u0641\u0651", "ff" # ف
  sub "\u0642\u0651", "gh" # ق
  sub "\u0643\u0651", "kk" # ك
  sub "\u0644\u0651", "ll" # ل
  sub "\u0645\u0651", "mm" # م
  sub "\u0646\u0651", "nn" # ن
  sub "\u0647\u0651", "hh" # ه
  sub "\u0648\u0651", "vv" # و
  sub "\u064a\u0651", "yy" # ي

  sub "\u0621", "", before: boundary # ء
  sub "\u0621", "’" # ء

  # FROM NOTES

  sub "\u064e\u0649", "a" # ـَى fatha followed by ى which is ا not ي
  sub "\u0649", "a" # ى alef maqsura NOTE-1

  sub "\u064a\u064a", "’i" # NOTE 4 (2)
  sub "\u06cc\u06cc", "’i"

  sub "\u0627\u064a" + boundary, "’i" # NOTE 4 (3)
  sub "\u0627\u06cc" + boundary, "’i"

  # Farsi consonant characters

  sub "\u0628", "b" # ب
  sub "\u067E", "p" # پ
  sub "\u062a", "t" # ت
  sub "\u062B", "s" # ث
  sub "\u062c", "j" # ج
  sub "\u0686", "ch" # ‫چ‬
  sub "\u062d", "h" # ح
  sub "\u062e", "kh" # خ
  sub "\u062f", "d" # د
  sub "\u0630", "z" # ذ
  sub "\u0631", "r" # ر
  sub "\u0632", "z" # ز
  sub "\u0698", "zh" # ‫ژ‬
  sub "\u0633", "s" # س
  sub "\u0634", "sh" # ش
  sub "\u0635", "s" # ص
  sub "\u0636", "z" # ض
  sub "\u0637", "t" # ط
  sub "\u0638", "z" # ظ
  sub "\u0639", "‘" # ع
  sub "\u0639", "", before: boundary # ع not represented initially
  sub "\u063a", "gh" # غ
  sub "\u0641", "f" # ف
  sub "\u0642", "gh" # ق
  sub "\u0643", "k" # ك
  sub "\u06A9", "k" # ک
  sub "\u06AF", "g" # ‫گ‬
  sub "\u0644", "l" # ل
  sub "\u0645", "m" # م
  sub "\u0646", "n" # ن
  sub "\u0647", "h" # ه
  sub "\u0648", "v" # و
  sub "\u064a", "y" # ي
  sub "\u0649", "y" # ي
  sub "\u06D0", "ē" # ې
  sub "\u06CD", "êy" # ‫ۍ‬
}

}