The Text Encoding Initiative


Eric Johnson

In the summer of 1994, in the "Editor's Notes" in TEXT Technology, I said, "For those of us who are interested in text analysis, all electronic texts can be useful, of course, and we are glad to have them in almost any form, but some formats are preferable to others." I described the formats that are commonly available, and I concluded that the most valuable and useful texts are encoded with the Standard Generalized Markup Language (SGML) following the guidelines of the Text Encoding Initiative (TEI).

TEI-conformant encoded texts can indicate the parts of a work: that is, they can mark the starts and ends of lines, paragraphs, pages, chapters, acts, and so on; thus such texts can be processed to produce accurate indexes and concordances. In addition, the speakers in a novel can be identified, tags can be added to show the part of speech of homographs, and many other features of a text can be marked explicitly. Such texts allow students and scholars to do many kinds of research that would not be possible without them. Indeed, they both enable and stimulate precise textual and linguistic research. Also, SGML-encoded texts created for one computing platform or environment can usually be used on all others; one of the primary goals of the TEI is to recommend a standard format for the interchange of texts.
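The following minimal Python sketch illustrates how explicit line markup supports indexes and concordances. It makes several assumptions. It uses an XML serialization of TEI-style verse markup: TEI P3 itself is defined in SGML, and later editions of the Guidelines moved to XML. The fragment is simplified and omits the TEI header and namespace. The sample lines and the function names build_index and concordance are illustrative only and are not part of the TEI Guidelines. Because every <l> element records its line number in the n attribute, the program can report exact line references without having to guess where lines begin and end.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified TEI-style fragment (no teiHeader, no namespace):
# <lg> is a line group, and each <l> carries its line number in n="".
SAMPLE = """<text>
  <body>
    <div type="poem" n="1">
      <lg>
        <l n="1">Shall I compare thee to a summer's day?</l>
        <l n="2">Thou art more lovely and more temperate:</l>
      </lg>
    </div>
  </body>
</text>"""

WORD = re.compile(r"[a-z']+")


def build_index(xml_source):
    """Map each lower-cased word to the line numbers (from n="") where it occurs."""
    root = ET.fromstring(xml_source)
    index = defaultdict(list)
    for line in root.iter("l"):
        number = int(line.get("n"))
        text = "".join(line.itertext())
        for word in WORD.findall(text.lower()):
            if number not in index[word]:  # record each line only once per word
                index[word].append(number)
    return dict(index)


def concordance(xml_source, keyword):
    """Return (line number, full line text) for every line containing keyword."""
    root = ET.fromstring(xml_source)
    hits = []
    for line in root.iter("l"):
        text = "".join(line.itertext()).strip()
        if keyword.lower() in WORD.findall(text.lower()):
            hits.append((int(line.get("n")), text))
    return hits
```

For example, concordance(SAMPLE, "lovely") returns [(2, "Thou art more lovely and more temperate:")], and build_index(SAMPLE)["more"] is [2] because the index records each line only once per word.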

For those of us who are bungling users of SGML, it is difficult to properly appreciate the enormous amount of thought and effort that has gone into the preparation of the TEI guidelines for the use of SGML. We should be thankful to Lou Burnard and C. M. Sperberg-McQueen, the Editors of the TEI Guidelines for Electronic Text Encoding and Interchange, and to the many others who worked with them.

The special issue of TEXT Technology edited by Lou Burnard gives a wealth of information about using SGML according to the TEI guidelines. After reading the articles in this issue, anyone who is serious about using SGML will want to have the complete TEI Guidelines: 1300 pages of detailed description, analysis, and examples of the use of SGML.

The current edition of the Guidelines, often called TEI P3, is available in two formats: two 600-page printed volumes or one CD-ROM (for the Macintosh or for PCs running Windows). The cost of either format is $75.00. Copies may be ordered from

     C. M. Sperberg-McQueen
     University of Illinois at Chicago
     Academic Computing Center (M/C 135)
     1940 W. Taylor Rm. 124
     Chicago IL 60612-7352

or from

     TEI Orders
     Oxford University Computing Services
     13 Banbury Road
     Oxford OX2 6NN

In addition, the TEI Guidelines may be obtained free of charge by anonymous ftp from any of the following: (in pub/tei and its subdirectories) (in pub/SGML/tei/p3) (in pub/SGML/TEI) (in /TEI)

I am confident that readers of the special issue of TEXT Technology will enjoy perusing its articles about SGML and the TEI and will profit from them. The articles point the way for text processing in the future.
