Pangloss Collection

The Pangloss Collection is a digital library whose objective is to store and facilitate access to audio recordings in endangered languages of the world. Developed by the LACITO centre of CNRS in Paris, the collection provides free online access to documents of connected, spontaneous speech, in otherwise little-documented languages of all continents.[1]

Principles

A sound archive with synchronized transcriptions

For the science of linguistics, language is first and foremost spoken language, and the medium of spoken language is sound. The Pangloss Collection gives access to original recordings simultaneously with transcriptions and translations, as a resource for further research. After being recorded in their cultural context, the texts were transcribed in collaboration with native speakers.

A structured, open architecture

The archived data is structured in accordance with the latest data-processing standards, as open architecture, in an open format, and may be downloaded under a Creative Commons license. The software used to prepare and disseminate it is open-source. The Pangloss Collection is a member of the OLAC network of archival repositories and of the Digital Endangered Languages and Music Archive Network (DELAMAN).

History

The collection was initially called the LACITO Archive.[2][3] The project originated in 1996 from the collaboration of Boyd Michailovsky, linguist at LACITO, with John B. Lowe, engineer;[4]: 15 they were later joined by Michel Jacobson, engineer, who developed some tools for the project and brought it online.[1]: 124 [4]

The purpose of the archive was “to conserve, and to make available for research, recorded and transcribed oral traditions and other linguistic materials in (mainly) unwritten languages, giving simultaneous access to sound recordings and text annotation.”[4] The earliest archived corpora in the collection were languages from Nepal, from New Caledonia, from eastern Africa and French Guiana.[5]

The archive has grown steadily since the early 2000s,[6] incorporating corpora from various linguists, whether members of LACITO or not. In 2009, the archive had 200 recordings in 45 languages.[7] In 2014, the (newly renamed) Pangloss Collection had 1,400 recordings in 70 languages.[1]: 121

As of April 2021, the Pangloss archive contains 5,038 recordings[8] in 196 languages,[9] totalling 780 hours of audio and video recordings.[6]

Languages in the Pangloss Collection include Mwotlap (Austronesian; Vanuatu),[10] Japhug (Sino-Tibetan; Southwest China),[11] Ersu (Sino-Tibetan; Southwest China),[12] Naxi (or Yongning Na: Sino-Tibetan; Southwest China),[13] and Cèmuhî (Austronesian; New Caledonia).[14]

References

  1. Michailovsky, Boyd, Martine Mazaudon, Alexis Michaud, Séverine Guillaume, Alexandre François & Evangelia Adamou. 2014. Documenting and researching endangered languages: the Pangloss Collection. Language Documentation & Conservation 8, pp. 119-135.
  2. Jacobson, Michel; Michailovsky, Boyd (2002). The LACITO Archive: its purpose and implementation. Int'l Workshop on Resources and Tools in Field Linguistics. Las Palmas, Canary Is., Spain.
  3. Screen capture of LACITO's archive homepage, 27 February 2001.
  4. Jacobson, Michel; Michailovsky, Boyd; Lowe, John B. (2001). "Linguistic documents synchronizing sound and text". Speech Communication. Special issue: “Speech Annotation and Corpus Tools”. 33 (1–2): 79–96. CiteSeerX 10.1.1.467.490. doi:10.1016/S0167-6393(00)00070-4.
  5. Screen capture of LACITO's archive contents, 22 April 2002.
  6. “About us” section of the Pangloss Collection (retrieved 24 April 2021).
  7. Screen capture of LACITO's archive contents, 26 November 2009.
  8. Source: list of all Pangloss resources on the Cocoon homepage (retrieved 10 January 2022).
  9. Source: number of language entries in its list of corpora (retrieved 24 April 2021).
  10. Mwotlap corpus: 564 resources.
  11. Japhug corpus: 551 resources.
  12. Ersu corpus: 363 resources.
  13. Yongning Na corpus: 301 resources.
  14. Cèmuhî corpus: 230 resources.

External links