TEI - Text Encoding Initiative

Executive Summary

TEI stands for Text Encoding Initiative, and it is used to mark up literary documents such as poetry and prose. Through TEI it is possible to denote not only the descriptive aspects of works but also the analytical aspects. Used to its fullest extent, TEI provides the means for a great deal of textual analysis.

What It Is

TEI is an XML markup language. Rooted in SGML, it is used to denote the structural as well as the interpretive aspects of literary works. In other words, people who mark up documents in TEI denote things like parts, chapters, paragraphs, line groups, and lines. At the same time it is possible to denote things like marginalia and typographical layout, and, most importantly, commentary on the text itself. Think of the TEI markup process as analogous to cataloging every internal aspect of a book. Every single part can be analyzed to some degree. To what degree depends on the markup policies of the person or institution doing the work.
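As a rough illustration of the structural markup described above, the following Python sketch parses a small TEI fragment and counts its divisions, line groups, and lines. The poem and the marginal note are invented for the example, and the fragment leaves out the teiHeader that a complete TEI document would carry; the element names (div, head, lg, l, note) and the namespace are standard TEI.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; the "tei" prefix is just a local label for the queries below.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# An invented fragment, not a complete TEI document (no teiHeader).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="poem">
        <head>An Invented Poem</head>
        <lg type="stanza">
          <l n="1">The first line of the stanza,</l>
          <l n="2">and then the second line.</l>
          <note place="margin">A marginal comment on the stanza.</note>
        </lg>
      </div>
    </body>
  </text>
</TEI>"""


def summarize(tei_xml):
    """Count a few structural elements and collect the text of any notes."""
    root = ET.fromstring(tei_xml)
    return {
        "divisions": len(root.findall(".//tei:div", TEI_NS)),
        "line_groups": len(root.findall(".//tei:lg", TEI_NS)),
        "lines": len(root.findall(".//tei:l", TEI_NS)),
        "notes": [note.text for note in root.findall(".//tei:note", TEI_NS)],
    }


summary = summarize(SAMPLE)
print(summary)
```

How much of this structure is available in a real file depends on how thoroughly the encoder marked up the text.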

What Can Be Done With It

When TEI documents are combined with XML-aware analysis tools, it is possible to extract themes from sets of texts and compare them. While this is the vision, in practice most markup is simply used to render documents in Web browsers, populate databases, or produce PDF versions of the documents.
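One way such a comparison might look is sketched below in Python. It assumes the encoders have wrapped passages in seg elements and used the ana attribute to point at theme identifiers; the two texts and the theme names (nature, loss, memory) are hypothetical, and a real project would define its own interpretive categories.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# ElementTree spells namespaced tags as "{namespace}name".
TEI = "{http://www.tei-c.org/ns/1.0}"

# Two invented fragments; the theme identifiers are hypothetical.
TEXT_A = """<text xmlns="http://www.tei-c.org/ns/1.0"><body>
  <p><seg ana="#nature">The river ran high</seg> and
     <seg ana="#loss">nobody came back</seg>.</p>
  <p><seg ana="#nature">Frost lay on the field.</seg></p>
</body></text>"""

TEXT_B = """<text xmlns="http://www.tei-c.org/ns/1.0"><body>
  <p><seg ana="#loss #memory">She kept the empty chair.</seg></p>
</body></text>"""


def theme_counts(tei_xml):
    """Count how often each theme identifier is referenced by a seg element."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for seg in root.iter(TEI + "seg"):
        # ana may hold several space-separated pointers such as "#loss #memory".
        for pointer in seg.get("ana", "").split():
            counts[pointer.lstrip("#")] += 1
    return counts


counts_a = theme_counts(TEXT_A)
counts_b = theme_counts(TEXT_B)

# Themes that were tagged in both texts.
shared = sorted(set(counts_a) & set(counts_b))
print(counts_a, counts_b, shared)
```

The comparison is only as good as the markup: themes that nobody tagged cannot be found this way.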

Examples

Who Should Be Using It

Just about any library that wants to provide electronic versions of its texts should consider using TEI. By doing so, libraries can provide increased access to their "special collections" content while preserving the original items.

Related Technologies

  • DocBook - a similar XML markup language designed specifically for technical documentation such as computer manuals.
  • XSL (Extensible Stylesheet Language) - used to transform XML documents into other text-based formats, such as HTML or other XML.

More Information