Text Encoding Initiative

The Text Encoding Initiative (TEI) is a text-centric community of practice in the academic field of digital humanities, operating continuously since the 1980s. The community currently runs a mailing list, meetings and a conference series, and maintains the TEI technical standard, a journal,^[1] a wiki, a GitHub repository and a toolchain.

TEI guidelines

The TEI Guidelines collectively define a type of XML format, and are the defining output of the community of practice. The format differs from other well-known open formats for text (such as HTML and OpenDocument) in that it is primarily semantic rather than presentational: the semantics and interpretation of every tag and attribute are specified. There are some 500 different textual components and concepts: word,^[2] sentence,^[3] character,^[4] glyph,^[5] person,^[6] etc. Each is grounded in one or more academic disciplines, and examples are given.

Technical details

The standard is split into two parts: a discursive textual description with extended examples and discussion, and a set of tag-by-tag definitions. Schemata in most of the modern formats (DTD, RELAX NG and XML Schema (W3C)) are generated automatically from the tag-by-tag definitions. A number of tools support the production of the guidelines and the application of the guidelines to specific projects.

A number of special tags are used to work around restrictions imposed by the underlying Unicode: glyph, to allow the representation of characters that do not qualify for Unicode inclusion,^[2] and choice, to overcome the required strict linearity.^[7]

Most users of the format do not use the complete range of tags, but produce a customization using a project-specific subset of the tags and attributes defined by the Guidelines. The TEI defines a sophisticated customization mechanism known as ODD for this purpose. In addition to documenting and describing each TEI tag, an ODD specification specifies its content model and other usage constraints, which may be expressed using Schematron.

TEI Lite is an example of such a customization. It defines an XML-based file format for exchanging texts. It is a manageable selection from the extensive set of elements available in the full TEI Guidelines.
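To make the idea of a subset concrete, the following sketch reads a small ODD-style fragment and reports which modules it selects and which elements each module is restricted to. It is only an illustration: the identifier "myTEI" and the choice of elements are hypothetical, the TEI namespace is omitted for brevity, and the Python function name is made up. The fragment assumes the schemaSpec and moduleRef elements with the key and include attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical customization: the identifier "myTEI" and the element
# selection are invented for illustration. A real ODD file would also
# declare the TEI namespace, which is left out here for brevity.
ODD_FRAGMENT = """
<schemaSpec ident="myTEI" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="core" include="p hi title"/>
</schemaSpec>
"""


def selected_modules(odd_xml):
    """Map each referenced module to the elements it is restricted to.

    A value of None means the module is referenced without an include
    attribute, i.e. it is taken whole.
    """
    spec = ET.fromstring(odd_xml.strip())
    selection = {}
    for ref in spec.findall("moduleRef"):
        include = ref.get("include")
        selection[ref.get("key")] = include.split() if include else None
    return selection


if __name__ == "__main__":
    for module, elements in selected_modules(ODD_FRAGMENT).items():
        print(module, elements if elements is not None else "(all elements)")
```

In practice a tool such as Roma, rather than hand-written code, turns a specification like this into a schema.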

As an XML-based format, TEI cannot directly deal with overlapping markup and non-hierarchical structures. A variety of options to represent this sort of data is suggested by the guidelines.^[8]
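The restriction can be observed with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module and is an illustration under stated assumptions: the sample text is invented, and the second string shows only one possible workaround, namely recording line boundaries as empty lb milestone elements so that they cannot overlap the sentence elements.

```python
import xml.etree.ElementTree as ET

# Invented sample: a sentence (s) starts in one verse line (l) and ends
# in the next, so the two hierarchies overlap.
OVERLAPPING = (
    "<lg>"
    "<l>A line ends here. <s>A sentence starts here</l>"
    "<l>and runs into the next line.</s></l>"
    "</lg>"
)

# Same text with sentences kept as elements and line boundaries recorded
# as empty milestone elements (lb), which cannot overlap anything.
MILESTONES = (
    "<p>"
    "<lb/><s>A line ends here.</s> <s>A sentence starts here"
    "<lb/>and runs into the next line.</s>"
    "</p>"
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def count_milestones(xml_text, tag="lb"):
    """Count the empty boundary markers with the given tag name."""
    return len(list(ET.fromstring(xml_text).iter(tag)))


if __name__ == "__main__":
    print(is_well_formed(OVERLAPPING))
    print(is_well_formed(MILESTONES))
    print(count_milestones(MILESTONES))
```

The trade-off is that the milestone version no longer represents a line as an element with content; software has to reconstruct each line from the text between two markers.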

Examples

The text of the TEI guidelines is rich in examples. There is also a samples page on the TEI wiki,^[9] which gives examples of real-world projects that expose their underlying TEI.

Prose tags

TEI allows texts to be marked up syntactically at any level of granularity, or a mixture of granularities. For example, the following passage has been marked up into sentences (s) and clauses (cl).^[10]

<s>
  <cl>It was about the beginning of September, 1664,
    <cl>that I, among the rest of my neighbours,
      heard in ordinary discourse
      <cl>that the plague was returned again to Holland;</cl>
    </cl>
  </cl>
  <cl>for it had been very violent there, and particularly at
    Amsterdam and Rotterdam, in the year 1663,</cl>
  <cl>whither, <cl>they say,</cl> it was brought,
    <cl>some said</cl> from Italy, others from the Levant, among some goods
    <cl>which were brought home by their Turkey fleet;</cl>
  </cl>
  <cl>others said it was brought from Candia;
    others from Cyprus.</cl>
</s>
<s>
  <cl>It mattered not <cl>from whence it came;</cl>
  </cl>
  <cl>but all agreed <cl>it was come into Holland again.</cl>
  </cl>
</s>
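Markup of this kind can be queried by software at whichever granularity is needed. The following sketch takes the second sentence of the passage, wrapped in a p element added for the illustration, recovers its plain text, and measures how deeply the clauses nest. It assumes Python's standard xml.etree.ElementTree module and omits the TEI namespace that a complete TEI document would declare; the function names are made up.

```python
import xml.etree.ElementTree as ET

# Second sentence of the passage, wrapped in a p element for illustration.
SAMPLE = """
<p>
  <s>
    <cl>It mattered not <cl>from whence it came;</cl>
    </cl>
    <cl>but all agreed <cl>it was come into Holland again.</cl>
    </cl>
  </s>
</p>
"""


def plain_text(elem):
    """Return the text of elem with markup removed and whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def clause_depth(elem):
    """Return the deepest nesting of cl elements at or below elem."""
    deepest_child = max((clause_depth(child) for child in elem), default=0)
    return deepest_child + (1 if elem.tag == "cl" else 0)


if __name__ == "__main__":
    paragraph = ET.fromstring(SAMPLE.strip())
    for sentence in paragraph.findall("s"):
        print(plain_text(sentence))
        print("clauses:", len(list(sentence.iter("cl"))))
        print("deepest nesting:", clause_depth(sentence))
```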

Verse

TEI has tags for marking up verse. This example (taken from the French translation of the TEI Guidelines) shows a sonnet.^[11]

<div type="sonnet">
  <lg type="quatrain">
    <l>Les amoureux fervents et les savants austères</l>
    <l>Aiment également, dans leur mûre saison,</l>
    <l>Les chats puissants et doux, orgueil de la maison,</l>
    <l>Qui comme eux sont frileux et comme eux sédentaires.</l>
  </lg>
  <lg type="quatrain">
    <l>Amis de la science et de la volupté</l>
    <l>Ils cherchent le silence et l'horreur des ténèbres;</l>
    <l>L'Érèbe les eût pris pour ses coursiers funèbres,</l>
    <l>S'ils pouvaient au servage incliner leur fierté.</l>
  </lg>
  <lg type="tercet">
    <l>Ils prennent en songeant les nobles attitudes</l>
    <l>Des grands sphinx allongés au fond des solitudes,</l>
    <l>Qui semblent s'endormir dans un rêve sans fin;</l>
  </lg>
  <lg type="tercet">
    <l>Leurs reins féconds sont pleins d'étincelles magiques,</l>
    <l>Et des parcelles d'or, ainsi qu'un sable fin,</l>
    <l>Étoilent vaguement leurs prunelles mystiques.</l>
  </lg>
</div>
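Because stanzas (lg) and lines (l) are explicit, the shape of a poem can be read off the markup. The sketch below does this for an excerpt of the sonnet (the first quatrain and the first tercet). It is an illustration only, assuming Python's standard xml.etree.ElementTree module and, as in the example, no namespace declaration; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Excerpt of the sonnet above: first quatrain and first tercet only.
EXCERPT = """
<div type="sonnet">
  <lg type="quatrain">
    <l>Les amoureux fervents et les savants austères</l>
    <l>Aiment également, dans leur mûre saison,</l>
    <l>Les chats puissants et doux, orgueil de la maison,</l>
    <l>Qui comme eux sont frileux et comme eux sédentaires.</l>
  </lg>
  <lg type="tercet">
    <l>Ils prennent en songeant les nobles attitudes</l>
    <l>Des grands sphinx allongés au fond des solitudes,</l>
    <l>Qui semblent s'endormir dans un rêve sans fin;</l>
  </lg>
</div>
"""


def stanza_shape(div_xml):
    """Return (stanza type, number of lines) for each lg directly inside the div."""
    div = ET.fromstring(div_xml.strip())
    return [(lg.get("type"), len(lg.findall("l"))) for lg in div.findall("lg")]


if __name__ == "__main__":
    for stanza_type, line_count in stanza_shape(EXCERPT):
        print(stanza_type, line_count)
```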

Choice tag

The choice tag is used to represent sections of text that might be encoded or tagged in more than one possible way. In the following example, based on one in the standard, choice is used twice, once to indicate an original and a corrected number, and once to indicate an original and a regularised spelling.^[12]

<p xml:id="p23">Lastly, That, upon his solemn oath to observe all the above
articles, the said man-mountain shall have a daily allowance of
meat and drink sufficient for the support of <choice>
  <sic>1724</sic>
  <corr>1728</corr>
</choice> of our subjects,
with free access to our royal person, and other marks of our
<choice>
  <orig>favour</orig>
  <reg>favor</reg>
</choice>.</p>
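Recording both alternatives lets software produce either reading from the same source. The sketch below flattens a shortened version of the example to plain text, keeping only the alternatives named in a preferred set. It is an illustration under assumptions: it uses Python's standard xml.etree.ElementTree module, the render function is hypothetical, and the sample omits the TEI namespace.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example above, written without line breaks.
SAMPLE = (
    '<p xml:id="p23">sufficient for the support of '
    "<choice><sic>1724</sic><corr>1728</corr></choice> of our subjects, "
    "and other marks of our "
    "<choice><orig>favour</orig><reg>favor</reg></choice>.</p>"
)


def render(elem, preferred):
    """Flatten elem to text, keeping only preferred alternatives in each choice."""
    if elem.tag == "choice":
        # Keep the alternatives whose tag is preferred and ignore any
        # whitespace between the alternatives.
        return "".join(
            render(alt, preferred) for alt in elem if alt.tag in preferred
        )
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, preferred))
        # Text that follows a child element belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    paragraph = ET.fromstring(SAMPLE)
    print(render(paragraph, {"corr", "reg"}))   # edited reading
    print(render(paragraph, {"sic", "orig"}))   # reading as in the source text
```

Passing a set of tag names keeps the decision about which reading to show outside the encoded text, which is the point of the choice element.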

ODD

One Document Does it all ("ODD") is a literate programming language for XML schemas.^[13]^[14]^[15]^[16]

In literate-programming style, ODD documents combine human-readable documentation and machine-readable models using the Documentation Elements module of the Text Encoding Initiative. Tools generate localised and internationalised human-readable output (HTML, ePub, or PDF) and machine-readable output (DTDs, W3C XML Schema, Relax NG Compact Syntax, or Relax NG XML Syntax).

The Roma web application^[17] is built around the ODD format and can use it to generate schemas in DTD, W3C XML Schema, Relax NG Compact Syntax, or Relax NG XML Syntax formats, as used by many XML validation tools and services.

ODD is the format used internally by the Text Encoding Initiative for the TEI technical standard.^[18] Although ODD files generally describe the difference between a customized XML format and the full TEI model, ODD can also be used to describe XML formats that are entirely separate from the TEI. One example of this is the W3C's Internationalization Tag Set, which uses the ODD format to generate schemas and document its vocabulary.^[19]^[20]

TEI customizations

TEI customizations are specializations of the TEI XML specification for use in particular fields or by specific communities.

EpiDoc (Epigraphic Documents)
Charters Encoding Initiative^[21]
Medieval Nordic Text Archive (Menota)^[22]

Customization in the TEI is done through the ODD mechanism mentioned above. In fact, since the P5 version, all so-called 'TEI Conformant' uses of the TEI Guidelines are based on a TEI customization documented in a TEI ODD file. Even when users choose one of the off-the-shelf pre-generated schemas to validate against, these have been created from freely available customization files.

Projects

The format is used by many projects worldwide. Practically all projects are associated with one or more universities. Some well-known projects that encode texts using TEI include:

TEI projects
Project	URL	Subject(s)
British National Corpus	http://www.natcorp.ox.ac.uk	100-million-word snapshot of current English-language usage
Oxford Text Archive	https://ota.bodleian.ox.ac.uk/repository/xmlui/	More than 1 GB of linguistic data and electronic texts in 25 languages
Perseus Project	https://www.perseus.tufts.edu/	Greek and Latin texts
EpiDoc	https://sourceforge.net/p/epidoc/wiki/Home/	Epigraphy and papyrology
Women Writers Project	https://wwp.northeastern.edu/	Early modern women writers (Margaret Cavendish, Eliza Haywood, etc.)
New Zealand Electronic Text Centre	http://www.nzetc.org/	New Zealand and Pacific Islands texts
The SWORD Project	https://www.crosswire.org/sword/	Bible software, dictionaries, Christian literature
FreeDict	https://freedict.org/	Bilingual dictionaries
Text Creation Partnership	https://textcreationpartnership.org/	Early British and American books
CELT	https://celt.ucc.ie/publishd.html	Ancient and medieval Irish manuscripts
ISTEX	https://www.istex.fr/	Archives of scientific publications
CAB	https://cab.geschkult.fu-berlin.de/	An edition of the Zoroastrian rituals of the Avesta, in the Avestan languages

History

Prior to the creation of TEI, humanities scholars had no common standards for encoding electronic texts in a manner that would serve their academic goals (Hockey 1993, p. 41). In 1987, a group of scholars representing fields in humanities, linguistics, and computing convened at Vassar College to put forth a set of guidelines known as the "Poughkeepsie Principles". These guidelines directed the development of the first TEI standard, "P1".^[23]^[24]

1987 – Work started by the Association for Computers and the Humanities,^[25] the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing on what would become the TEI.^[26] This culminated in the Closing statement of the Vassar Planning Conference.^[27]
1994 – TEI P3 released,^[28] co-edited by Lou Burnard (at Oxford University) and Michael Sperberg-McQueen (then at the University of Illinois at Chicago, later at the W3C).
1999 – TEI P3 updated.
2002 – TEI P4 released, moving from SGML to XML; adoption of Unicode, which XML parsers are required to support.^[29]
2007 – TEI P5 released, including integration with the xml:lang and xml:id attributes from the W3C^[30] (these had previously been attributes in the TEI namespace), regularization of local pointing attributes to use the hash (as used in HTML), and unification of the ptr and xptr tags. Together with many other additions, these changes make P5 more regular and bring it closer to current XML practice as promoted by the W3C and as used by other XML variants. Maintenance and feature update versions of TEI P5 have been released at least twice a year since 2007.
2011 – TEI P5 v2.0.1 released with support for genetic editing^[31] (among many other additions, the genetic-editing features allow encoding of texts without interpretation as to their specific semantics).
2017 – TEI was awarded the Antonio Zampolli Prize from the Alliance of Digital Humanities Organizations.^[32]

References

[1] "Journal of the Text Encoding Initiative". Open Edition Journals. Retrieved 29 June 2022.
[2] "TEI element w (word)". tei-c.org.
[3] "TEI element s (s-unit)". tei-c.org.
[4] "TEI element c (character)". tei-c.org.
[5] "TEI element g (character or glyph)". tei-c.org.
[6] "TEI element person (person)". tei-c.org.
[7] "Element choice". www.tei-c.org.
[8] "20 Non-hierarchical Structures - TEI P5: Guidelines for Electronic Text Encoding and Interchange". tei-c.org. 2019. Retrieved 19 March 2019.
[9] "Samples of TEI texts". wiki.tei-c.org. 2011. Retrieved 17 April 2012.
[10] "17 Simple Analytic Mechanisms - TEI P5: Guidelines for Electronic Text Encoding and Interchange". tei-c.org. 2012. Retrieved 15 April 2012.
[11] "TEI element lg (groupe de vers)". tei-c.org. 2012. Archived from the original on 6 June 2012. Retrieved 15 April 2012.
[12] "TEI element choice". tei-c.org. 2012. Retrieved 15 April 2012.
[13] Bauman, Syd; Flanders, Julia (2004), "ODD customizations", Extreme Markup Languages 2004, archived from the original on 2012-03-29, retrieved 2012-04-15.
[14] Burnard, Lou; Rahtz, Sebastian (2004), "RelaxNG with Son of ODD", Extreme Markup Languages 2004, archived from the original on 2012-03-29, retrieved 2012-04-15.
[15] Reiss, Kevin M. (2007), Literate Documentation for XML (PDF), Urbana-Champaign, Illinois: Digital Humanities 2007, archived from the original (PDF) on 2016-03-03, retrieved 2012-04-15.
[16] Burnard, Lou; Rahtz, Sebastian (June 2013). "A complete schema definition language for the Text Encoding Initiative". XML London 2013: 152–161. doi:10.14337/XMLLondon13.Rahtz01. ISBN 978-0-9926471-0-0.
[17] Roma web application.
[18] Burnard, Lou; Bauman, Syd, eds. (2007), TEI P5: Guidelines for Electronic Text Encoding and Interchange, Charlottesville, Virginia, USA: TEI Consortium.
[19] W3C ITS and TEI ODD file. Archived 2017-07-15 at the Wayback Machine.
[20] Savourel, Yves; Kosek, Jirka; Ishida, Richard, eds. (2008), "5.2 ITS and TEI", Best Practices for XML Internationalization, W3C Working Group.
[21] "Charters Encoding Initiative - Ludwig-Maximilians-Universität München". www.cei.lmu.de.
[22] "Medieval Nordic Text Archive (Menota)". www.menota.org.
[23] Ahronheim, J.R. (1998). "Descriptive metadata: Emerging standards". Journal of Academic Librarianship. 24 (5): 395–403. doi:10.1016/S0099-1333(98)90079-9.
[24] Cantara, L. (2005). "The text-encoding initiative: Part 1". OCLC Systems & Services. 21 (1): 36–39. doi:10.1108/10650750510578136.
[25] "The Association for Computers and the Humanities". ach.org.
[26] "Historical background", section iv.2 of TEI P5: Guidelines for Electronic Text Encoding and Interchange.
[27] "Closing statement of the Vassar Planning Conference". tei-c.org. 2009. Retrieved 15 April 2012.
[28] "TEI Guidelines". Retrieved 2010-06-18.
[29] "2", XML Basics, retrieved 2011-07-09.
[30] "Extensible Markup Language (XML) 1.0 (Fifth Edition)". w3.org.
[31] "P5 version 2.0.1 release notes". tei-c.org. 2012. Retrieved 15 April 2012.
[32] "TEI: Text Encoding Initiative".

External links

TEI Consortium Web site with a list of TEI projects, a form for adding your project (archived 2017-03-05 at the Wayback Machine) and wiki
Journal of the TEI (archived 2019-01-18 at the Wayback Machine)
TEI Lite: An Introduction to Text Encoding for Interchange
TEI @ Oxford (archived 2021-04-13 at the Wayback Machine; hosted at Oxford University) with development and backup versions of much of the core content.
TEI GitHub site (hosted at GitHub) with repository and issue tracker
Larger list of TEI Projects
What is the TEI? (Introductory overview by Lou Burnard)
