Decipherment

From Wikipedia, the free encyclopedia

In philology, decipherment is the discovery of the meaning of the symbols found in extinct languages and/or alphabets.[1] Today, at least a dozen languages remain undeciphered.[2] A notable recent decipherment was that of the Linear Elamite script.[3]

Decipherment overlaps with another technical field known as cryptanalysis, which aims to decipher writings used in secret communication, known as ciphertext. A famous case was the cryptanalysis of the Enigma during World War II. Many other ciphers from past wars have only recently been cracked.[4] Unlike in language decipherment, however, actors using ciphertext intentionally lay obstacles to prevent outsiders from uncovering the meaning of the communication system.[5]

Categories

According to Gelb and Whiting, the approach to decipherment depends on which of four situations applies to an undeciphered language:[5][6]

  • Type O: known writing and known language. Although decipherment in this case is trivial, useful information can be gleaned when a known language is written in an alphabet other than the one it is commonly written in. Studying the writing of the Phoenician or Sumerian languages in the Greek alphabet yields information about pronunciation and vocalization that cannot be obtained from the normal writing systems of these languages.
  • Type I: unknown writing and known language. Deciphered languages in this category include Phoenician, Ugaritic, Cypriot, and Linear B. In this situation, alphabetic systems are the easiest to decipher, followed by syllabic systems, with logo-syllabic systems being the most difficult.
  • Type II: known writing and unknown language. An example is Linear A. Strictly speaking, this situation is not one of decipherment but of linguistic analysis. Decipherment in this category is considered extremely difficult to achieve on the basis of internal information only.
  • Type III: unknown writing and unknown language. Examples include the Archanes script and the Archanes formula, the Phaistos disk, Cretan hieroglyphs, and the Cypro-Minoan syllabary. When this situation occurs in an isolated culture and without the availability of outside information, decipherment is typically considered impossible.
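
The four situations above form a two-by-two grid over whether the writing and the language are known. A minimal sketch of that grid follows; the names `Situation` and `classify` are illustrative and do not come from Gelb and Whiting.

```python
from enum import Enum


class Situation(Enum):
    """The four situations described above (labels follow the list)."""
    TYPE_O = "known writing, known language"
    TYPE_I = "unknown writing, known language"
    TYPE_II = "known writing, unknown language"
    TYPE_III = "unknown writing, unknown language"


def classify(writing_known: bool, language_known: bool) -> Situation:
    """Map the two yes/no questions onto one of the four situations."""
    if writing_known and language_known:
        return Situation.TYPE_O
    if language_known:
        return Situation.TYPE_I
    if writing_known:
        return Situation.TYPE_II
    return Situation.TYPE_III


# Linear B before its decipherment: unknown writing, known language (Greek).
print(classify(writing_known=False, language_known=True).name)
```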

Methods

Several methods are available for deciphering an extinct writing system or language. These can be divided into approaches utilizing external or internal information.[5]

External information

Many successful decipherments have proceeded from the discovery of external information, a common example being the use of multilingual inscriptions such as the Rosetta Stone (with the same text in three scripts: Demotic, hieroglyphic, and Greek), which enabled the decipherment of Egyptian hieroglyphs. In principle, a multilingual text may be insufficient for a decipherment, as translation is not a linear and reversible process but instead represents an encoding of the message in a different symbolic system. Translating a text from one language into a second, and then from the second language back into the first, rarely reproduces exactly the original writing. Likewise, unless a significant number of words are contained in the multilingual text, limited information can be gleaned from it.[5]

Internal information

Internal approaches are multi-step. One must first ensure that the writing under examination represents real writing, as opposed to a grouping of pictorial representations or a modern-day forgery without further meaning. This is commonly approached with methods from the field of grammatology. Prior to decipherment of meaning, one can then determine the number of distinct graphemes (which, in turn, indicates whether the writing system is alphabetic, syllabic, or logo-syllabic, because such writing systems typically do not overlap in the number of graphemes they use[6]), the direction of writing (left to right, right to left, top to bottom, etc.), and whether individual words are segmented (such as with a space or a special mark) or not.

If a repetitive schematic arrangement can be identified, this can help in decipherment. For example, if the last line of a text has a small number, it can reasonably be guessed to refer to the date, where one of the words means "year" and, sometimes, a royal name also appears. Another case is when the text contains many small numbers, followed by a word, followed by a larger number; here, the word likely means "total" or "sum".

After exhausting the information that can be inferred from probable content, one must transition to the systematic application of statistical tools. These include methods concerning the frequency of each symbol, the order in which symbols typically appear, whether some symbols appear at the beginning or end of words, and so on. In some situations, orthographic features of a language make specific features difficult if not impossible to decipher (especially without outside information), such as when an alphabet does not express double consonants. Additional, more complex methods also exist. Eventually, the application of such statistical methods becomes exceedingly laborious, at which point computers can be used to apply them automatically.[5]
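
The basic statistics mentioned here (sign frequency, sign order, and word-initial or word-final position) can be sketched in a few lines. The following is only an illustration: it assumes the corpus has already been segmented into words and transcribed as sign identifiers, and the function names and the cut-off values in `guess_system_type` are assumptions made for this example, not figures taken from the cited sources.

```python
from collections import Counter


def sign_statistics(words):
    """Count signs overall, in word-initial and word-final position,
    and as adjacent pairs. `words` is a list of sign sequences."""
    total, initial, final, bigrams = Counter(), Counter(), Counter(), Counter()
    for word in words:
        if not word:
            continue
        total.update(word)
        initial[word[0]] += 1
        final[word[-1]] += 1
        bigrams.update(zip(word, word[1:]))  # adjacent sign pairs
    return total, initial, final, bigrams


def guess_system_type(n_signs, alphabet_max=40, syllabary_max=100):
    """Rough guess from inventory size. The thresholds are illustrative
    assumptions; real inventories need expert judgment."""
    if n_signs <= alphabet_max:
        return "alphabetic"
    if n_signs <= syllabary_max:
        return "syllabic"
    return "logo-syllabic"


# Invented toy corpus: each tuple is one word, each letter one sign.
corpus = [("A", "B", "C"), ("A", "C"), ("B", "A", "C"), ("A", "B")]
total, initial, final, bigrams = sign_statistics(corpus)
print(total.most_common())
print(initial.most_common(1), final.most_common(1))
print(guess_system_type(len(total)))
```

In this toy corpus the sign "C" appears only at the end of words, the kind of positional regularity that could hint at a suffix or a word divider in a real text.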

Computational approaches

Computational approaches towards the decipherment of unknown languages began to appear in the late 1990s.[7] Typically, there are two types of computational approaches used in language decipherment: approaches meant to produce translations in known languages, and approaches used to detect new information that might enable future efforts at translation. The second approach is more common, and includes things such as the detection of cognates or related words, discovery of the closest known language, word alignments, and more.[6]
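
As a rough illustration of the second kind of approach, the sketch below ranks candidate known languages by how closely their words match tentatively transliterated words from an unknown text. It uses plain edit distance as a crude stand-in for cognate detection; it is not the method of any cited work, and the word lists and the names `language_X` and `language_Y` are invented for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer word's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def closest_language(unknown_words, lexicons):
    """Score each candidate language by the mean distance from every
    unknown word to its best match; lower means closer."""
    scores = {}
    for language, lexicon in lexicons.items():
        best = [min(normalized_distance(w, c) for c in lexicon)
                for w in unknown_words]
        scores[language] = sum(best) / len(best)
    return min(scores, key=scores.get), scores


# Invented toy data, for illustration only.
unknown = ["mlk", "bt", "ym"]
lexicons = {
    "language_X": ["malk", "bet", "yam"],
    "language_Y": ["reks", "domo", "mare"],
}
best, scores = closest_language(unknown, lexicons)
print(best, scores)
```

Published systems replace the fixed edit costs with learned sound correspondences and must also cope with unsegmented text and unknown sign values, so this sketch only conveys the general shape of the comparison.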

Artificial intelligence

In recent years, there has been a growing emphasis on methods utilizing artificial intelligence for the decipherment of lost languages, especially through natural language processing (NLP) methods. Proof-of-concept methods have independently re-deciphered Ugaritic and Linear B using data from similar languages, in this case Hebrew and Ancient Greek, respectively.[8]

Deciphering pronunciation

Related to attempts to decipher the meaning of languages and alphabets are attempts to determine how extinct writing systems, or older versions of contemporary writing systems (such as English in the 1600s), were pronounced. Several methods and criteria have been developed in this regard. Important criteria include (1) rhymes and the testimony of poetry; (2) evidence from occasional spellings and misspellings; (3) interpretations of material in one language from authors in foreign languages; (4) information obtained from related languages; and (5) grammatical changes in spelling over time.[9]

For example, analysis of poetry focuses on the use of wordplay or literary techniques between words that have a similar sound. Shakespeare's play Romeo and Juliet contains wordplay that relies on a similar sound between the words "soul" and "soles", allowing confidence that the similar pronunciation between the terms today also existed in Shakespeare's time. Another common source of information on pronunciation is the use of rhyme in earlier texts, such as when consecutive lines in poetry end in similar or identical sounds. This method has some limitations, however, as texts may use rhymes that rely on visual similarities between words (such as 'love' and 'remove') as opposed to auditory similarities, and rhymes can be imperfect. Another source of information about pronunciation comes from explicit descriptions of pronunciation in earlier texts, as in the case of the Grammatica Anglicana, which includes the following comment about the letter <o>: "In the long time it naturally soundeth sharp, and high; as in chósen, hósen, hóly, fólly [...] In the short time more flat, and a kin to u; as còsen, dòsen, mòther, bròther, lòve, pròve".[10] Another example comes from detailed comments on pronunciations of Sanskrit from the surviving works of Sanskrit grammarians.[9]

Challenges

Many challenges exist in the decipherment of languages, including the following:[2][6]

  • When it is not known which known language is closest to the language of the script.
  • When the words in the script are not clearly segmented, as in some Iberian languages.
  • When the writing system is not known. Specifically, if there is little certainty about the number of graphemes in a writing system, it cannot be determined whether that system is an alphabet, a syllabary, a logosyllabary, or something else.
  • When the reading direction is not known. For example, it may not be clear if a writing system is meant to be read from left to right, or from right to left.
  • When it is not known if a script uses punctuation or spaces between words.
  • When the language of a script subject to decipherment efforts is not known.
  • When there is a small dataset available to learn about the properties of a script. This could lead to issues such as an incomplete vocabulary being known for the script.
  • When the typical order between subjects, objects, and verbs is not known.
  • When it is not known whether or how certain words can change their form.
  • When it is not known whether multiple symbols are used to represent the same sound, syllable, word, concept, or idea (allographs).
  • When it is not clear how the penmanship or the style of writing of a particular scribe relates to the style of writing of another scribe working in the same text (the same letters or words might be written in a way that looks different), in which case it is difficult to correlate information across multiple examples of the use of the writing system.
  • When it is not known if certain words change their meaning depending on the context they appear in (homonyms).
  • When the context of discovery of a writing is not known. This is because information about the location a writing system came from can provide valuable information about its relationship to known languages.
  • When adequate digital datasets for documented writing systems are not available, limiting the ability to use computational methods for decipherment.
  • When sufficient hardware resources, such as high performance computing, are not available (which might be necessary for more computationally intensive methods).
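
Two of the challenges above, uncertainty about the number of graphemes and a small dataset, interact: with few inscriptions, the observed sign inventory may still be growing. A minimal sketch of one way to check this is shown below. The function names and the stopping rule are assumptions made for illustration, and a flat curve does not prove that the inventory is complete.

```python
def inventory_growth(signs, step=1):
    """Return (tokens_read, distinct_signs_seen) checkpoints taken
    every `step` tokens while reading the corpus in order."""
    seen = set()
    curve = []
    for n, sign in enumerate(signs, start=1):
        seen.add(sign)
        if n % step == 0:
            curve.append((n, len(seen)))
    return curve


def looks_saturated(curve, window=3):
    """Heuristic: True if the last `window` checkpoints added no new
    signs. The window size is an arbitrary illustrative choice."""
    if len(curve) <= window:
        return False
    return curve[-1][1] == curve[-1 - window][1]


# Invented toy sequence of signs, one letter per sign.
curve = inventory_growth(list("ABCABDABCDAB"), step=2)
print(curve)
print(looks_saturated(curve))
```

If the count of distinct signs is still climbing at the end of the corpus, any classification of the system based on inventory size should be treated as provisional.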

Notable decipherers

Name of scholar | Script deciphered | Date
Magnus Celsius | Staveless runes | 1674
Jón Ólafsson of Grunnavík | Cipher runes | 1740s
Jean-Jacques Barthélemy | Palmyrene alphabet | 1754
Jean-Jacques Barthélemy | Phoenician alphabet | 1758
Antoine-Isaac Silvestre de Sacy | Pahlavi script | 1791
Jean-François Champollion | Egyptian hieroglyphs (Decipherment) | 1822
Georg Friedrich Grotefend, Eugène Burnouf, and Henry Rawlinson | Old Persian cuneiform (Decipherment) | 1823
Thomas Young | Demotic script |
Manuel Gómez-Moreno | Northeastern Iberian script |
James Prinsep | Brahmi, Kharosthi |
Edward Hincks | Mesopotamian cuneiform |
Bedřich Hrozný | Hittite cuneiform |
Vilhelm Thomsen | Old Turkic |
George Smith and Samuel Birch, et al.[11] | Cypriot syllabary |
Hans Bauer and Édouard Paul Dhorme[12] | Ugaritic alphabet |
Wáng Yìróng, Liú È, Sūn Yíràng, et al. | Oracle bone script |
Aleksei Ivanovich Ivanov, Nikolai Aleksandrovich Nevsky, et al. | Tangut script |
Michael Ventris, John Chadwick, and Alice Kober | Linear B |
Yuri Knorozov and Tatiana Proskouriakoff, et al. | Maya |
Louis Félicien de Saulcy | Libyco-Berber script (almost fully) |
Jan-Olof Tjäder | "Enlarged opening script" of Ravenna (variant of the Latin alphabet) |
Zaza Alexidze | Caucasian Albanian alphabet |
François Desset[3] | Linear Elamite |

References

  a. ^ Although the script, Libyco-Berber, has been almost fully deciphered, the language has not.
  1. ^ Trask, R. L. (2000). The Dictionary of Historical and Comparative Linguistics. Fitzroy Dearborn Publishers, p. 82 ("The process of determining the relation between an extinct and unknown writing system and the language it represents. Strictly, decipherment is the elucidation of the script, that is, determining the values of the written characters")
  2. ^ a b Luo, Jiaming; Hartmann, Frederik; Santus, Enrico; Barzilay, Regina; Cao, Yuan (2021). "Deciphering Undersegmented Ancient Scripts Using Phonetic Prior". Transactions of the Association for Computational Linguistics. 9: 69–81. doi:10.1162/tacl_a_00354. ISSN 2307-387X.
  3. ^ a b Desset, François; Tabibzadeh, Kambiz; Kervran, Matthieu; Basello, Gian Pietro; Marchesi, Gianni (2022-07-01). "The Decipherment of Linear Elamite Writing". Zeitschrift für Assyriologie und vorderasiatische Archäologie. 112 (1): 11–60. doi:10.1515/za-2022-0003. ISSN 1613-1150.
  4. ^ Bauer, Craig P. (2023-03-04). "The new golden age of decipherment". Cryptologia. 47 (2): 97–100. doi:10.1080/01611194.2023.2170158. ISSN 0161-1194.
  5. ^ a b c d e Gelb, I. J.; Whiting, R. M. (1975). "Methods of Decipherment". Journal of the Royal Asiatic Society. 107 (2): 95–104. doi:10.1017/S0035869X00132769. ISSN 2051-2066.
  6. ^ a b c d Braović, Maja; Krstinić, Damir; Štula, Maja; Ivanda, Antonia (2024-06-01). "A Systematic Review of Computational Approaches to Deciphering Bronze Age Aegean and Cypriot Scripts". Computational Linguistics. 50 (2): 725–779. doi:10.1162/coli_a_00514. ISSN 0891-2017.
  7. ^ Knight, Kevin; Yamada, Kenji (1999). "A Computational Approach to Deciphering Unknown Scripts" (PDF). Unsupervised Learning in Natural Language Processing.
  8. ^ Luo, Jiaming; Cao, Yuan; Barzilay, Regina (2019). "Neural Decipherment via Minimum-Cost Flow: From Ugaritic to Linear B". arXiv. Association for Computational Linguistics: 3146–3155. doi:10.18653/v1/P19-1303.
  9. ^ a b Campbell, Lyle (2021). Historical linguistics: an introduction (4th ed.). MIT Press. pp. 372–375. ISBN 978-0-262-53159-7.
  10. ^ Burridge, Kate; Bergs, Alexander (2017). Understanding language change. Understanding language series. London New York: Routledge, Taylor & Francis Group. pp. 234–235. ISBN 978-0-415-71339-9.
  11. ^ "Cypro-Syllabic".
  12. ^ "Anatomy of a Decipherment", http://images.library.wisc.edu/WI/EFacs/transactions/WT1966/reference/wi.wt1966.adcorre.pdf

Further reading

  • Daniels, Peter T. (2020). "The Decipherment of Ancient Near Eastern Languages". In Hasselbach-Andee, Rebecca (ed.). A Companion to Ancient Near Eastern Languages. Wiley. pp. 1–25.
  • Ferrara, Silvia; Tamburini, Fabio (2022). "Advanced techniques for the decipherment of ancient scripts". Lingue e linguaggio: 239–259.