SGML: Lou Burnard Intro to TextTech

Lou Burnard's Introduction

[CR: 19950807]

This is "Introduction to Electronic Texts and the Text Encoding Initiative," by Lou Burnard, from a special issue [5.3] of TEXT Technology: The Journal of Computer Text Processing.


This special issue of Text Technology collects a variety of papers reporting the experience of some intrepid pioneers in the most rapidly growing area of text technology: the use of application-independent markup, specifically the recommendations of the Text Encoding Initiative, to create a new way to "do electronic text". What are the salient features of this "new way", and why is it likely to be more than a passing fad?

One way to begin answering these questions might be to read Jeffery Triggs' contribution on the varieties of electronic text. Triggs shows through a series of examples how easily an electronic text can fall short of realizing the full potential offered by its new medium. He draws our attention to the ways in which preconceived ideas of electronic text as a substitute for the printed page can obstruct the goal of multi-purpose plasticity that attracted us to texts in electronic form in the first place. He also warns us of the dangers of locking away the results of our hard editorial endeavours within a proprietary format, thus limiting their use to particular software systems.

If we have learned nothing else in ten years of unprecedented activity in computer-based text processing, it must be that today's operating system and today's killer application program last only a little longer than today's newspaper. Yet even now, software vendors continue to assure us that our data is proof against time, so long as we continue to use their software. We are inclined to distrust such assurances.

The TEI Guidelines have much to offer any researcher engaged in creating, manipulating, or managing electronic resources, but two research communities in particular seem to have the most to gain from their application. These are the library community, represented here by a wide-ranging paper from John Price-Wilkin, and the creators and users of language corpora, represented here by Laurent Romary and colleagues, who report on an exciting new multilingual corpus project involving researchers from many different countries.

Price-Wilkin provides an excellent account of both the strengths and the limitations of the World Wide Web's approach to the use of SGML, spelling out in practical detail exactly how the TEI approach may be used to address those weaknesses without losing the benefits of the success story that is the World Wide Web today. His paper should be required reading for anyone aiming to set up an electronic text centre, virtual library, or whatever name we finally settle on for the entity that will replace (or complement) the traditional scholarly library within the next decade.

Romary's paper is representative of the extraordinary spurt of activity in the European research community concerned with language engineering, the application of information technology to the age-old problems of language understanding and learning. The need is felt perhaps most keenly in the European Union, with its nine official languages. The research described by Romary and his colleagues is typical of many projects now under way, in which multilingual corpora and ancillary software are seen as essential new tools in language pedagogy. The promise of long-term portability and re-usability offered by standards such as the TEI is naturally seen as central to all such endeavours.

If they are to be successful, the TEI Guidelines must be usable by the widest range of text technologists. For that reason, it is particularly important that they be taken up in the teaching environments where tomorrow's technologists acquire their basic skills. Two valuable papers in this collection present some early experiences of pioneers in this effort: one, by Susan Kruse, from the teacher's perspective, and the other, by James Tauber, from the student's. In both cases, the overall tone is one of pleasant surprise that the task of converting to the TEI way, initially so forbidding, turns out in fact to be relatively easy. As more and better tools become available, it is to be hoped that the initial learning curve will become gentler still.

Because of their intended generality, the Guidelines often offer many different ways of encoding a particular textual feature. Using them to the full may thus require careful consideration of the implications of each of many possibilities, and a clear statement of the policy to be adopted in each application. For those concerned with choosing the best way of encoding a text, rather than simply the most expedient or the easiest, Syd Bauman's thoughtful contribution provides an excellent role model: his paper exhaustively considers several different ways in which a simple table of contents might be encoded in a TEI-conformant manner, analysing the implications and significance of each. Not every application may need to be quite so meticulous; the strength of the TEI approach, however, is that those which do can find in the Guidelines all the flexibility they need.

The volume concludes with a simple introduction to the TEI scheme, intended to whet the reader's appetite for a more detailed and thorough exposition. Written by my esteemed colleague and co-editor of the TEI Guidelines, Michael Sperberg-McQueen, it presents the bare essentials of the TEI encoding scheme in a copiously illustrated and very accessible form, designed specifically for the novice text encoder.

A word about the preparation of this issue may be of interest. At the request of the editor of Text Technology, Eric Johnson, I posted a call for contributions on the TEI's Internet-based mailing list, TEI-L. I was most agreeably surprised by the reaction: within a few weeks, I received offers of contributions from seven or eight people, many of whom were personally unknown to me. I stipulated no particular format for submissions, and consequently received papers in LaTeX, Microsoft Word, formatted ASCII, HTML and TEI-conformant SGML. I converted all of these to the same TEI-conformant form, using my favourite workhorse, the indispensable Emacs. For publication, I was asked to supply copy in WordPerfect format; to accomplish this, I wrote about half a dozen WordPerfect macros. Fully SGML-aware versions of both WordPerfect and Microsoft Word have since been announced, which encourages me to believe that in the not very distant future this method will seem as archaic as the card punch on which I first prepared an electronic text.

I would like to record here my gratitude to Eric Johnson for offering the TEI this platform, and also to all the contributors to this issue, whose work it has been a pleasure and a privilege to bring to you.

Lou Burnard
Oxford, January 1995