metadata {

authority_id: un
id: 1971
language: iso-639-2:ara
source_script: Arab
destination_script: Latn
name: Romanization of Arabic -- Beirut system (1971)
url: https://unstats.un.org/unsd/geoinfo/UNGEGN/docs/2nd-uncsgn-docs/E_Conf61_4_Add1_e.pdf
creation_date: 1971
confirmation_date: 2018-06
description: |
  The current United Nations recommended romanization
  system was approved in 2017 (resolution XI/3), based on
  the system adopted by Arabic experts at the conference
  held in Beirut in 2007, the Unified Arabic
  Transliteration System, taking into account the
  practical amendments and corrections carried out and
  agreed upon by the representatives of the Arabic-
  speaking countries at the Fourth Arab Conference on
  Geographical Names, held in Beirut in 2008, and some
  clarifications and amendments agreed in Riyadh in 2017.
  Previously, the United Nations had approved a
  romanization system in 1972 (resolution II/8), based on the
  system adopted by Arabic experts at the conference
  held at Beirut in 1971 with the practical amendments carried out
  and agreed upon by the representatives of the Arabic-speaking
  countries at their conference. The table was published in volume
  II of the conference report.
  In UN resolution XI/3 it is specifically stated that the
  system was recommended for the “romanization of the
  geographical names within those Arabic-speaking countries
  where this system is officially adopted”. There is
  evidence of its partial implementation in Jordan, Oman and
  Saudi Arabia. The UNGEGN Working Group on Romanization
  Systems intends to continue monitoring the UN system’s
  implementation across Arabic-speaking countries.
  In some countries there exist local romanization schemes
  or practices. The geographical names of Algeria, Djibouti,
  Mauritania, Morocco and Tunisia are generally rendered in
  the traditional manner which conforms to the principles of
  the French orthography.
  The previous UN-approved system is still found in
  considerable international usage.
  Arabic is written from right to left. The Arabic script
  usually omits vowel points and diacritical marks from
  writing, which makes it difficult to obtain uniform results
  in the romanization of Arabic. It is essential to identify
  correctly the words which appear in any particular name
  and to know the standard Arabic-script spelling including
  the relevant vowels. One must also take into account
  dialectal and idiosyncratic deviations. The romanization
  is generally reversible though there may be some ambiguous
  letter sequences (dh, kh, sh, th) which may also point to
  combinations of Arabic characters in addition to the
  respective single characters.
notes:
  - |
    ث is t͟h (th with sub-macron)
    خ is k͟h (kh with sub-macron)
    ذ is d͟h (dh with sub-macron)
    ش is s͟h (sh with sub-macron)
    ظ is z͟h (zh with sub-macron)
    غ is g͟h (gh with sub-macron)
    The previous UN 1972 System had the following differences:
    the character (ظ) was romanized as z̧ instead of d͟h;
    the cedilla (¸) was used instead of sub-macron (_) in all characters with sub-macrons.

}

tests {

# Examples taken from:
# https://unstats.un.org/unsd/geoinfo/UNGEGN/docs/2nd-uncsgn-docs/E_Conf61_4_Add1_e.pdf
# page 31 (38 digital)
test "خَيبَر", "K͟haybar"
test "ظَهران", "Z͟hahrān"
test "القُدس", "Al Quds"
test "شَرم الشَيْخ", "S͟harm as͟h S͟hayk͟h"

}

dependency "un-ara-Arab-Latn-2017", as: arablatn

stage {

# CHARACTERS
parallel {

  # sun letters
  sub boundary + "\u0627\u0644\u062b" + maybe("\u0651"), "at͟h t͟h" # الث
  sub boundary + "\u0627\u0644\u0630" + maybe("\u0651"), "ad͟h d͟h" # الذ
  sub boundary + "\u0627\u0644\u0634" + maybe("\u0651"), "as͟h s͟h" # الش
  sub boundary + "\u0627\u0644\u0638" + maybe("\u0651"), "az͟h z͟h" # الظ

  # shadda
  sub "\u062e\u0651", "k͟hk͟h" # خ
  sub "\u0630\u0651", "d͟hd͟h" # ذ
  sub "\u0634\u0651", "s͟hs͟h" # ش
  sub "\u0638\u0651", "z͟hz͟h" # ظ
  sub "\u063a\u0651", "g͟hg͟h" # غ

  sub "\u062b", "t͟h" # ث
  sub "\ufe9b", "t͟h" # ﺛ
  sub "\ufe9c", "t͟h" # ﺜ
  sub "\ufe9a", "t͟h" # ﺚ

  sub "\u062e", "k͟h" # خ
  sub "\ufea7", "k͟h" # ﺧ
  sub "\ufea8", "k͟h" # ﺨ
  sub "\ufea6", "k͟h" # ﺦ

  sub "\u063a", "g͟h" # غ
  sub "\ufecf", "g͟h" # ﻏ
  sub "\ufed0", "g͟h" # ﻐ
  sub "\ufece", "g͟h" # ﻎ

  sub "\u0630", "d͟h" # ذ
  sub "\ufeac", "d͟h" # ﺬ

  sub "\u0634", "s͟h" # ش
  sub "\ufeb7", "s͟h" # ﺷ
  sub "\ufeb8", "s͟h" # ﺸ
  sub "\ufeb6", "s͟h" # ﺶ

  sub "\u0638", "z͟h" # ظ
  sub "\ufec7", "z͟h" # ﻇ
  sub "\ufec8", "z͟h" # ﻈ
  sub "\ufec6", "z͟h" # ﻆ
}

run map.arablatn.stage.main

# POSTRULES
sub " At͟h T͟h", " at͟h T͟h" # الث
sub " Ad͟h D͟h", " ad͟h D͟h" # الذ
sub " As͟h S͟h", " as͟h S͟h" # الش
sub " Az͟h Z͟h", " az͟h Z͟h" # الظ

}