metadata {
  authority_id: iso
  id: 233-3
  language: iso-233-3:prs
  source_script: Arab
  destination_script: Latn
  name: "ISO 233-3:1999 Persian language -- Simplified transliteration"
  url: https://web.archive.org/web/20200920064754/http://www.freeprotocols.org/content/republished/doc.public/standards/communication/iso/iso-233/iso-233-3.pdf
  creation_date: 1999
  confirmation_date: 1999-01-15
  description: |
    This part of ISO 233 is one of a series of International Standards,
    dealing with the conversion of systems of writing. The aim of this part
    of ISO 233 and others in the series is to provide a means for
    international communication of written messages in a form which permits
    the automatic transmission and reconstitution of these, by men or
    machines. The system of conversion, in this case, must be univocal and
    entirely reversible.

    This means that no consideration should be given to phonetic and
    aesthetic matters or to certain national customs: all these
    considerations are, indeed, ignored by the machine performing the
    function.

    The adoption of this part of ISO 233 for international communication
    leaves every country free to adopt for its own use a national standard
    which may be different, on condition that it is compatible with this
    part of ISO 233. The system proposed herein should make this possible
    and be acceptable to international use if the graphisms it creates are
    such that they may be converted automatically into the graphisms used in
    any strict national systems.

    This part of ISO 233 may be used by anyone who has a clear understanding
    of the system and is certain that it can be applied without ambiguity.
    The result obtained will not give a correct pronunciation of the
    original text in a person’s own language, but it will serve as a means
    of finding automatically the original graphism and thus allow anyone who
    has knowledge of the original language to pronounce it correctly.
    Similarly, one can only pronounce correctly a text written in, for
    example, English or Polish, if one has a knowledge of English or Polish.

    The adoption of national standards compatible with this part of ISO 233
    will permit the representation, in an international publication, of the
    morphemes of each language according to the customs of the country where
    it is spoken. It will be possible to simplify this representation in
    order to take into account the number of the character sets available on
    different kinds of machines.

    1-Scope:
    This part of ISO 233 establishes a simplified system for the
    transliteration of Persian characters into Latin characters. This
    simplification of the stringent rules established by ISO 233:1984 is
    especially intended to facilitate the processing of bibliographic
    information (e.g. catalogues, indices, citations, etc.).

    2-Normative references:
    The following normative documents contain provisions which, through
    reference in this text, constitute provisions of this part of ISO 233.
    For dated references, subsequent amendments to, or revisions of, any of
    these publications do not apply. However, parties to agreements based on
    this part of ISO 233 are encouraged to investigate the possibility of
    applying the most recent editions of the normative documents indicated
    below. For undated references, the latest edition of the normative
    document referred to applies. Members of ISO and IEC maintain registers
    of currently valid International Standards.

    ISO 233-2, Information and documentation -- Transliteration of Arabic
    characters into Latin characters -- Part 2: Arabic language --
    Simplified transliteration.

    ISO/IEC 10646-1, Information Technology -- Universal Multiple-Octet
    Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual
    Plane.
  notes: |
    TODO
}
tests {
  test "آذَر", "âẕar"
  test "سَم", "sam"
  test "پُر", "por"
  test "پِدَر", "pedar"
  test "مَثَلاً", "mas̱alâ´´"
  test "جزء", "jz’"
  test "رأس", "râ’s"
  test "سؤال", "sv’âl"
  test "مسئلة", "msy’lh"
}
stage {
  # CHARACTERS
  parallel {
    # word-medial or word-final form where so appearing in a word.
    # '\u0627': '-'

    # # Vowel, Diphthong and Diacritical Characters
    # '\u064E': 'a'
    # # Both e and i are available to romanize this short vowel,
    # # depending on local usage and/or root language. In cases where the sound
    # # is uncertain, i is the default romanization in BGN/PCGN standardization
    # # procedures.
    # '\u0650':
    #   - 'e'
    #   - 'i'
    # # Both o and u are available to romanize this short vowel,
    # # depending on local usage and/or root language. In cases where the sound
    # # is uncertain, u is the default romanization in BGN/PCGN standardization
    # # procedures.
    # '\u064F':
    #   - 'o'
    #   - 'u'
    # '\u0659': 'ê'
    # # An alif with mad ( آ ) is written only in the initial position by
    # # BGN/PCGN standardization procedures, in keeping with Persian language
    # # family standards of use of the Arabic alphabet. The same letter written
    # # in a medial or final position is written . . .
    # '\u0622': 'ā' # pending issue #442
    # '\u0648': 'ō'
    # '\u0648': 'ū'
    # '\u0648': 'ow'
    # '\u06CC': 'ī'
    # # Or 'ē'. The character ی should be romanized ay or ē according to
    # # its root language or local pronunciation. In case of uncertainty a
    # # reference source (such as the Fairchild Aerial Surveys map series, or a
    # # BGN/PCGN approved policy document/list of recommended spellings) should
    # # be consulted.
    # '\u06CC': 'ay'
    # '\u06D0': 'ē'
    # # Or 'aī'. Both the combination ay and aī are available to romanize
    # # this character according to its root language or local pronunciation.
    # # In cases where the sound is uncertain ay is the default romanization in
    # # BGN/PCGN standardization procedures
    # '\u06CC':
    #   - 'ay'
    #   - 'á'
    # '\u06CD': 'êy'
    # '\u0621': '’'
    # '\u0674':
    #   - '-e'
    #   - '-ye'

    # # Other Diacritical Marks and Language Conventions
    # '\u0627': 'āy'
    # '\u0648': 'w'
    # '\u0626': '’'
    # '\u06C0': ''
    # '\u0651': ''

    # special rules
    sub space, "", after: "\u0622\u0628\u064E\u0627\u062F" # space followed by abad is removed
    sub "\ufdf2", "Allāh" # See note 5

    # pointing
    sub "\u064e", "a" # َ fatha
    sub "\u0650", any(["e", "i"])
    sub "\u0650" + boundary, "-e" # ِ kasra
    sub "\u064f", any(["o", "u"]) # ُ damma
    sub "\u0652", "" # ْ sokoon
    sub "\u0659", "ê"

    # special pointed letters
    sub "\u0639\u064e", "‘a" # عَ
    sub "\u0639\u0650", "‘i" # عِ
    sub "\u0639\u064f", "‘ū" # عُ
    # handle MacOS regex difference
    sub "\u0639\u064f\u0648", "‘ū" # عُو damma followed by و
    sub "\u0650\u064a", "ī" # ـِي kasra followed by ي
    sub "\u0650\u06cc", "ī" # ـِي kasra followed by ي
    sub "\u0650\u064a\u0651\u064e", "īy" # ـِيَّ
    sub "\u0650\u064a", "iy", after: any(["\u064e", "\u064f"]) # ـِي kasra followed by ي
    sub "\u064f\u0648", "ō" # ـُو damma followed by و
    sub "\u064e\u0627", "ā" # ـَا fatha followed by ا
    sub "\u064e\u0649", "ay" # ـَى fatha followed by ى which is ا not ي
    sub "\u064e\u0648\u0652", "aw" # ـَوْ
    sub "\u064e\u0648", "ow" # ـَو
    sub "\u064e\u064a\u0652", "ay" # ـَيْ
    sub "\u0650\u06cc\u0651\u064e", "īy" # ـِيَّ
    sub "\u064e\u064a", "aī" # ـَي
    sub "\u064e\u06cc", "aī" # ـَي
    sub "\u0649\u0670", "á" # ىٰ
    sub "\u0674", "-e" # ٴ
    sub "\u0654", "-e" # ٔ
    # - '-ye'
    sub "\u0622", "â" # آ

    # ta' marboota
    sub "\u0629", "t" # ة in the middle of the sentence
    sub "\u0629" + line_end, "h"
    # TODO: simplify this
    sub "\u0629", "h", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "h", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "h", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "h", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "h", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "h", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "h", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "h", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "h", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "h", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "h", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")
    sub "\u0629", "h", before: boundary + "\u0627\u0644" + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff") + any("\u0600".."\u06ff")

    # Tanvin
    sub "\u064b", "´´" # ً
    sub "\u064c", "" # ٌ
    sub "\u064d", "" # ٍ

    # hamzeh
    sub "\u0621", "’" # ء
    sub "\u0623", "â’" # أ
    sub "\u0624", "v’" # ؤ
    sub "\u0626", "y’" # ئ

    # punctuation
    sub "\u060c", "," # vavak comma
    sub "\u061b", ";" # nogteh vavak semicolon
    sub "\u061f", "?" # neshane-ye porsesh question mark

    sub "\u0625", "" # إ
    sub "\u0627", "â" # ا
    # See note B
    sub boundary + "\u0627\u0644", "al " # ال
    # '\uFE8E' : '' # ﺎ

    # Sun letters
    sub boundary + "\u0627\u0644\u062a" + maybe("\u0651"), "at t" # الت
    sub boundary + "\u0627\u0644\u062b" + maybe("\u0651"), "as̄ s̄" # الث
    sub boundary + "\u0627\u0644\u062f" + maybe("\u0651"), "ad d" # الد
    sub boundary + "\u0627\u0644\u0630" + maybe("\u0651"), "az̄ z̄" # الذ
    sub boundary + "\u0627\u0644\u0631" + maybe("\u0651"), "ar r" # الر
    sub boundary + "\u0627\u0644\u0632" + maybe("\u0651"), "az z" # الز
    sub boundary + "\u0627\u0644\u0633" + maybe("\u0651"), "as s" # الس
    sub boundary + "\u0627\u0644\u0634" + maybe("\u0651"), "ash sh" # الش
    sub boundary + "\u0627\u0644\u0635" + maybe("\u0651"), "aş ş" # الص
    sub boundary + "\u0627\u0644\u0636" + maybe("\u0651"), "aẕ ẕ" # الض
    sub boundary + "\u0627\u0644\u0637" + maybe("\u0651"), "aţ ţ" # الط
    sub boundary + "\u0627\u0644\u0638" + maybe("\u0651"), "az̧ z̧" # الظ
    sub boundary + "\u0627\u0644\u0644" + maybe("\u0651"), "al l" # الل
    sub boundary + "\u0627\u0644\u0646" + maybe("\u0651"), "an n" # الن

    # consonant characters
    sub "\u0628", "b" # ب
    sub "\u067E", "p" # پ
    sub "\u062a", "t" # ت
    # '\u067C': 'ṯ' # ټ
    sub "\u062B", "s̱" # ث
    sub "\u062c", "j" # ج
    sub "\u0686", "c" # چ
    # # The variant form ج is seen infrequently and does not have a
    # # single Unicode encoding.
    # '\u0681': 'dz' # Note 2 # ځ
    # '\u0685': 'ts' # Note 2 # څ
    sub "\u062d", "ḥ" # ح
    sub "\u062e", "ḵ" # خ
    sub "\u062f", "d" # د
    sub "\u0689", "ḏ" # ډ
    sub "\u0630", "ẕ" # ذ
    sub "\u0631", "r" # ر
    # '\u0693' : 'ṟ' # ړ
    sub "\u0632", "z" # ز
    sub "\u0698", "z" # ژ
    # '\u0696' : 'z͟h' # ږ
    sub "\u0633", "s" # س
    # '\u069A' : 's͟h' # ښ
    sub "\u0634", "š" # ش
    sub "\u0635", "ṣ" # ص
    sub "\u0636", "ż" # ض
    sub "\u0637", "ṭ" # ط
    sub "\u0638", "z" # ظ
    sub "\u0639", "‘" # ع
    sub "\u063a", "gh" # غ
    sub "\u0641", "f" # ف
    sub "\u0642", "q" # ق
    # '\u0643' : 'k' # ك
    sub "\u06A9", "k" # ک
    sub "\u06AF", "g" # گ
    sub "\u0644", "l" # ل
    sub "\u0645", "m" # م
    sub "\u0646", "n" # ن
    # '\u06BC' : 'ṉ' # ڼ
    sub "\u0648", "v" # و
    sub "\u0647", "h" # ه
    sub "\u064a", "y" # ي
    sub "\u0649", "y" # ي
    sub "\u06D0", "ē" # ې
    sub "\u06CD", "êy" # ۍ

    # shadda
    # TODO: these patterns repeat the consonant rules above without "\u0651";
    # presumably each source string should end in the shadda. Verify before relying on them.
    sub "\u0628", "bb" # ب
    sub "\u067E", "pp" # پ
    sub "\u062a", "tt" # ت
    sub "\u062B", "s̱s̱" # ث
    sub "\u062c", "jj" # ج
    sub "\u0686", "č̱č̱" # چ
    sub "\u062d", "ḥḥ" # ح
    sub "\u062e", "ḵḵ" # خ
    sub "\u062f", "dd" # د
    sub "\u0689", "ḏḏ" # ډ
    sub "\u0630", "ẕẕ" # ذ
    sub "\u0631", "rr" # ر
    sub "\u0632", "zz" # ز
    sub "\u0698", "zz" # ژ
    sub "\u0633", "ss" # س
    sub "\u0634", "šš" # ش
    sub "\u0635", "ṣṣ" # ص
    sub "\u0636", "żż" # ض
    sub "\u0637", "ṭṭ" # ط
    sub "\u0638", "zz" # ظ
    sub "\u0639", "‘" # ع
    sub "\u063a", "gh" # غ
    sub "\u0641", "ff" # ف
    sub "\u0642", "qq" # ق
    sub "\u06A9", "kk" # ک
    sub "\u06AF", "gg" # گ
    sub "\u0644", "ll" # ل
    sub "\u0645", "mm" # م
    sub "\u0646", "nn" # ن
    sub "\u0648", "vv" # و
    sub "\u0647", "hh" # ه
    sub "\u064a", "yy" # ي
    sub "\u0649", "yy" # ي
    sub "\u06D0", "ēē" # ې
    sub "\u06CD", "êy" # ۍ
  }
}