 
                               TEI EDW26
             An Introduction to the Text Encoding Initiative
                              Lou Burnard
                               6 Aug 1991
                          (Revised, May 1992)
 
   Abstract
   --------
 
This paper [1] outlines the origins and goals of the Text Encoding
Initiative, an international research project which aims to provide
guidelines for the standardisation of electronic texts in research.  It
begins with a re-statement of the goals of the project, placed in the
current research context, and then gives a brief description of its
nature and scope, with attention to its potential for the handling of
the full diversity of textual materials likely to be of interest to
social historians.
 
 
   1. What is the Text Encoding Initiative?
   ----------------------------------------
 
The Text Encoding Initiative is an international research project, the
aim of which is to develop and to disseminate guidelines for the
encoding and interchange of machine-readable texts. It is sponsored by
the Association for Computers and the Humanities (ACH), the Association
for Computational Linguistics (ACL), and the Association for Literary
and Linguistic Computing (ALLC).
The project is funded by the U.S. National
Endowment for the Humanities, DG XIII of the Commission of the European
Communities, the Canadian Social Science and Humanities Research Council
and the Andrew W. Mellon Foundation.  Equally important has been the
donation of time and expertise by the many members of the international
research community who have served on the TEI's various Working
Committees and Working Groups.
 
The project originated in a conference held at Vassar College in the
winter of 1987. [2] The conference was organised by the ACH with funding
from NEH. Thirty-one invited scholars, representatives of scholarly
organizations and archive directors from North America, Europe and the
Middle East participated in two days of intense discussion from which
emerged a consensus, expressed in the "Poughkeepsie Principles"
discussed below. The success of this conference resulted in the
establishment of the Text Encoding Initiative as a co-operative venture
sponsored jointly by the ACH, the ACL and the ALLC and directed by a
steering committee comprising two representatives from each of these
associations.
 
The TEI Steering Committee formulated a work plan  to achieve the goals
proposed at the Vassar Conference, and then approached fifteen leading
scholarly organizations in North America and Europe to nominate
representatives to an Advisory Board. This Board met for the first time
in Chicago in February 1989, approved the work plan  with some
modifications and nominated some members to the four working committees
of the TEI. It will meet again in 1992 to review the final product of
the TEI.
 
During the first funding cycle of the TEI (June 1988-June 1990), work
was carried out in four large working committees, with membership drawn
from Europe and North America, and from a wide range of disciplines. The
members of the largest committee, for example, had backgrounds in
computer science, textual editing, theology, software development,
linguistics, philosophy and history and  were from Norway, Belgium,
Germany, Spain and the UK as well as the USA. Each of the four
committees proposed a variety of recommendations in distinct areas of
text encoding practices. These found expression in the first draft of
the TEI Guidelines, a large (300 page) report which was widely
distributed in Europe, North America and elsewhere in November 1990.
As noted above, however, the present article will not attempt to
summarise these Recommendations [3] themselves, as a number of summary
articles have already appeared. [4]
 
During the second TEI funding cycle (June 1990-June 1992) the initial
recommendations are being reviewed and extended by about a dozen
different specialist working groups. The membership of these is, again,
as broadly based as possible, while the topics they are currently
addressing include character sets; textual criticism; hypermedia;
mathematical formulae and tables; language corpora and spoken texts;
physical description of printed and manuscript sources; formal
characteristics of literary genres in prose, drama and verse; general
linguistics; electronic dictionaries, lexica and terminology banks; and
last, but by no means least, historical sources. Each work group is
charged with the production of a set of recommendations for change to
the Guidelines and the testing of their provisions in the Group's chosen
field. These reports will form one of three sources of input to the
final revision of the current draft proposals, for review during the
coming winter.
 
In the summer of 1991, a series of TEI Workshops was organised at
Tempe, Oxford and Providence. These attracted a large number of
vociferous discussants, from a wide variety of backgrounds, including
professional software developers, librarians, computer scientists as
well as researchers and teachers from most of the disciplines in the
humanities. The emphasis here was on using the Guidelines as they stood
with currently available software and on putting their proposals to the
test of real data.
 
Similarly pragmatic concerns underlie the second source of modifications
to the current draft: the experiences of the "affiliated projects". Some
fifteen major research projects, on both sides of the Atlantic, have
signed affiliation agreements with the TEI which involve, amongst other
things, the testing of the Guidelines against realistically sized
samples of the various textual resources which each project is engaged
in creating and a detailed report on the problems encountered and
solutions found.  These projects include major linguistic corpus
building ventures such as the British National Corpus and the ACL Data
Collection Initiative; projects engaged in the creation of large general
purpose historical or literary corpora such as the Brown University
Women Writers Project or Harvard University's Perseus Project; as well
as smaller individual projects aiming to produce scholarly electronic
editions of major authors such as Middleton, Milton or Nietzsche.
 
The final source of comment and revision is the public at large, which
has reacted with surprising and occasionally embarrassing enthusiasm to
the original draft proposals. Over a hundred sets of individual comments
and criticisms, some quite detailed, have already been received, and it
is hoped that this public discussion will continue during the remainder
of the project's lifetime. It has often been remarked that standards
cannot be enforced by fiat: they must be accepted voluntarily if they
are to achieve any permanent standing. To that end, the TEI has always
been anxious to stimulate informed discussion of its proposals in as
many and as diverse forums as there are listeners willing to hear.
 
The task of co-ordinating the working groups and committees, and of
combining their drafts for publication is carried out by two editors,
one European and one American, while the project as a whole is managed
by the steering committee. The final deliverables of the project, which
will include a substantial reference manual and a number of tutorial
guides, will be presented to the Advisory Board in the spring of 1992,
in time for final publication at the end of the second funding cycle in
June of that year.
 
 
   2. TEI Design Goals
   -------------------
 
As mentioned above, the basic design goals of the TEI Guidelines were
determined by the results of a planning conference held at Vassar
College in Poughkeepsie, New York, at the outset of the project.  That
planning conference agreed on the following statement of principles:
 
     1) The guidelines are intended to provide a standard format for
        data interchange in humanities research.
 
     2) The guidelines are also intended to suggest principles for
        the encoding of texts in the same format.
 
     3) The guidelines should
        3.1) define a recommended syntax for the format,
        3.2) define a metalanguage for the description of
             text-encoding schemes,
        3.3) describe the new format and representative existing
             schemes both in that metalanguage and in prose.
 
     4) The guidelines should propose sets of coding conventions
        suited for various applications.
 
     5) The guidelines should include a minimal set of conventions
        for encoding new texts in the format.
 
     6) The guidelines are to be drafted by committees on text
        documentation; text representation; text interpretation and
        analysis; metalanguage definition and description of existing
        and proposed schemes. These committees will be coordinated by
        a steering committee of representatives of the principal
        sponsoring organizations.
 
     7) Compatibility with existing standards will be maintained as
        far as possible.
 
     8) A number of large text archives have agreed in principle to
        support the guidelines in their function as an interchange
        format.  We encourage funding agencies to support development
        of tools to facilitate this interchange.
 
     9) Conversion of existing machine-readable texts to the new format
        involves the translation of their conventions into the syntax
        of the new format.  No requirements will be made for the
        addition of information not already coded in the texts.
 
The mandate of creating a common interchange format requires the
specification of a specific markup syntax as well as the definition of a
large predefined tag set and the provision of mechanisms for extending
the markup scheme.  The mandate to provide guidance for new text
encodings ("suggest principles for text encoding") requires that
recommendations be made as to what textual features should be recorded
in various situations.  The TEI Guidelines were thus, from the start,
concerned both with `how' encoding should be preserved for interchange
purposes and `what' should in fact be encoded, a point to which I return
below.
 
As well as balancing the membership of the original committees
geographically and by discipline, an attempt was made to focus the
work of each committee on areas where substantial agreement
existed among the different parties, rather than to revive or rehearse
well recognised disagreements.  The guiding principle was to facilitate
consensus rather than controversy.  Inevitably, this approach runs the
risk of pleasing no-one, by being void of substantial content, or
refusing to take a stand on any issue. In practice however, the
reception of the Guidelines shows that these dangers were safely
avoided. Moreover, despite the diversity of backgrounds of those
involved, a pleasing discovery was the number of fundamentally identical
encoding problems to be found in apparently different types of material.
The task of encoding (for example) the different strata of a manuscript
tradition poses problems formally identical to those of encoding
multiple parallel linguistic analyses of a given sentence. The task of
representing hypertextual links in a document is strikingly similar to
that of representing literary allusions or cross references. The problem
of preserving the physical appearance of a text together with multiple
interpretations derived from it is common to almost every text-based
discipline.
 
 
   3. The need for interchange
   ---------------------------
 
The goal of the TEI is to develop and disseminate a set of Guidelines
for the interchange of machine-readable texts among researchers, so as
to allow easier and more efficient sharing of resources for textual
computing and natural language processing.  It needs to be stressed at
the outset that the phrase "machine-readable texts" is to be understood
in the widest possible sense. What, for example, do historians do with
texts which is fundamentally different from what linguists or social
science researchers or literary analysts do? In each case, the same
problems arise: an existing source must be represented accurately, a set
of interpretations about its component parts must be specified, and some
processing carried out. Sometimes the processing will focus more on
representation than on analysis (as in the production of a new edition);
at other times, the focus will be on a particular kind of analysis
(as in the extraction of statistical data or the construction of a
database). However, as I and others have argued elsewhere, [5] a crucial
benefit of computer-aided research is the way in which it supports both
approaches. The creator of an appropriately marked-up electronic text
can have her cake and eat it too. Standardisation efforts which do not
address the issue of a neutral, processing-independent, view of textual
objects, but instead focus on standards in particular application areas,
thus fall somewhat short of realising the full potential of electronic
texts.
 
Of course, in a world of unlimited resources, there would be no
particular need for interchange standards at all: each creator of
electronic texts could work in glorious isolation. However, in several
areas of the research community, notably the expanding field of natural
language processing, a need for interchangeable electronic resources is
already widely recognised, on both economic and methodological grounds.
Economically, it is widely accepted that the heavy cost of creating such
resources as language corpora and electronic lexica can only be
justified if the resources can be re-used by many projects. [6]
Methodologically, the repeatability of research results which forms an
essential aspect of any empirical research can best be guaranteed by the
continued availability of data sets for secondary analysis.
Standardisation is easily achieved where there is a broad consensus
about the kinds of data to be processed and the particular software
packages to be used (as has been, for example, the case for many years
in social science survey research). It is less simple where essentially
identical kinds of data resources (such as textual corpora) contain
matter of interest to distinct research communities characterised by an
immense variety of theoretical positions and methods. Yet the same
conceptual model of what texts actually are should be applicable to all.
This can only be achieved if standardisation is expressed at a
sufficiently general level to allow for the widest variety of views.
 
The TEI arose from a perceived need within one, comparatively small,
research community: that concerned with the encoding and manipulation of
purely textual data for purposes of descriptive or corpus linguistics,
stylistic analysis, textual editing and other forms of what is broadly
called `Literary and Linguistic Computing' (LLC). There has recently
been an interesting convergence between the needs and abilities of that
community with those of the somewhat larger body of researchers
concerned with the computational analysis of natural language (NLP)
whether for natural language understanding, generation or translation
systems. Straddling the two communities are those concerned with the
creation of better, objectively derived, models of language in use,
whose methods have transformed current practices in lexicography and
language teaching. What links all of these researchers is the need to
process large amounts of textual data in a wide variety of different
styles. What the TEI offers them, and others, is a model for the
standardisation of textual data resources for interchange.
 
It is helpful, when considering standardisation of electronic resources,
to distinguish the objects of standardisation (the `what') from the
particular representation recommended for it (the `how'). Like other
standardisation efforts, the TEI Guidelines include both recommendations
about which textual features should be distinguished when encoding texts
from scratch (if the resulting text is to be of maximal usefulness to
the research community), and recommendations about the specific
practices to be followed in encoding them. The `how' chosen by the TEI
is based on the international standard Standard Generalized Markup
Language (SGML), an informal introduction to which is provided
elsewhere in this volume. The `what' is rather more difficult
to summarise in a short document of this nature, but some general
remarks and a few specific examples are provided below. Distinguishing
these two aspects of standardisation is particularly important for
electronic resources, because of the ease with which their
representations may be changed.
 
What is sometimes forgotten is that ease of conversion is crucially
dependent on the prior existence of an agreed set of distinctions. The
TEI attempts to make such an agreed set of distinctions, by proposing an
abstract data model consisting of those features for which a consensus
can be reached as to their importance in a wide range of automatic
analyses. To identify these features, particular software systems may
use entirely different representation schemata: different
representations will be appropriate for different hardware environments,
for different software packages, for archival storage and in particular
for the exchange of data across particular networks.
 
 
   4. Structure and interpretation: the TEI topoi
   ----------------------------------------------
 
I suggested above that the primary function of markup was to make
explicit an interpretation of a text. Any standardisation effort such as
the TEI must therefore at some time grasp the nettle of deciding which
interpretations are to be favoured over others. To put it another way,
the TEI must at least attempt to address the question as to which
aspects or features of a text should be made explicit by its markup.
 
For some scholars, this is a simple issue. There are some features of a
text which are "obvious" and "objective" -- examples usually include
major structural subdivisions such as chapters or verse lines or entries
in a charter. There are others which are equally obviously "pure
interpretation" -- such as whether or not a passage in a prose text
belongs to some stylistic category, or is in a foreign language, or is a
personal name. As this last list perhaps indicates, for the present
writer this is a far from clear cut distinction. In almost every kind of
material, and especially in the kinds of materials studied by
historians, there is a continuum of categorisations, from things about
which almost everyone will agree almost all of the time, down to things
which almost no-one will identify in the same way ever.
 
The TEI therefore adopts a liberal policy. It proposes for consideration
a set of categories which wide consultation has demonstrated to be of
use to a broad consensus of researchers. It proposes ways in which
instances of those categories may be marked up (as discussed in the last
section). Researchers in agreement as to the use of the categories so
defined can thus interchange texts, or (if you wish) interpreted texts.
They can do so moreover in a format which allows the disentangling of
the interpretation from the text stream, or its enrichment in a
controlled way. No claim is made as to the feasibility or desirability
of making such interpretations in a given case -- all that the TEI can
or does offer is a way of making explicit what has been done.
 
The remainder of this paper discusses some concrete instances of the
kinds of textual feature which typify the current TEI proposals.
 
 
   4.1. The structure of a TEI text
   --------------------------------
 
   4.1.1. The TEI header
   ---------------------
 
All TEI-conformant texts contain (a) a "TEI header" and (b) the
transcription of the text proper.  The TEI header provides information
analogous to that provided by the title page of a printed text. It
contains a description of the machine-readable text, a description of
the way it has been encoded, and a revision history; these are delimited
by the <file.description>, the <encoding.declarations>, and the
<revision.history> tags, respectively.  The first of these identifies
the electronic text as an object in its own right, independent of its
source or sources (which must however be documented within it). The
second supplies details of the particular encoding practices or
variations which characterise the text, for example any special
codebooks or other values used within the body of the text and
descriptions of the referencing scheme or editorial principles applied.
The header is, perhaps surprisingly, the `only' part of a TEI text which
is mandatory.
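 
A skeletal header, using the three elements just named, might look
something like the following sketch. The enclosing <tei.header> tag and
the prose content shown within each element are invented here purely
for illustration: the draft itself defines further sets of tags for the
file, its source and its encoding.
 
     <tei.header>
       <file.description>
         <!-- identifies the electronic text as an object in its own
              right, and documents its source                        -->
         A machine-readable transcript of an (imaginary) parish
         register, prepared from the printed edition of 1898.
       </file.description>
       <encoding.declarations>
         <!-- codebooks, reference scheme, editorial principles -->
         All abbreviations have been silently expanded; personal
         names are tagged throughout.
       </encoding.declarations>
       <revision.history>
         <!-- a record of changes made to the electronic text -->
         May 1992: transcription checked against the source.
       </revision.history>
     </tei.header>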
 
   4.1.2. Marking Divisions within a Text
   --------------------------------------
 
The TEI recommendations categorise document elements as either
"structural" or "floating".  Structural elements are constrained as to
where they may appear in a document; for example a <head> or heading may
not appear in the middle of a <list>. Floating elements, as the name
suggests, are less constrained and may appear almost anywhere in a text:
examples include <note> or <date>. Intermediate between the two
categories are so-called "crystals": these are floating features the
contents of which have an inherent structure, for example <list>
or <citn> elements.
 
   4.1.2.1. Structural features
   ----------------------------
 
The current recommendations define a general purpose hierarchic
structure, which has been found to be suitable for a very large (perhaps
surprisingly large) variety of textual sources. In this, a text is
divided into an  optional <front>, a <body> and an optional
<back>. The body of a text may be a series of paragraphs (marked with
<p>...</p>), or it may be divided into chapters, sections, subsections,
etc.  In the latter case, the <body> is divided into a series of
elements known generically as "div"s.  The largest subdivision of a
given text is tagged <div1>, the next level down <div2>, and so on.
Written prose texts may also be further subdivided into <p>s
(paragraphs).  For
verse texts, metrical lines are tagged with the <l> tag.
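 
The following sketch shows the overall shape of a simple prose text
marked up in this way; the headings and content are of course invented,
and the arrangement of front and back matter is much simplified. A
verse text would use the same divisional structure, with <l> tags
marking the metrical lines.
 
     <text>
       <front>
         <head>An Imaginary Treatise</head>
       </front>
       <body>
         <div1>
           <head>Book I</head>
           <div2>
             <head>Chapter 1</head>
             <p>The first paragraph of the first chapter.</p>
             <p>The second paragraph.</p>
           </div2>
           <div2>
             <head>Chapter 2</head>
             <p>Another paragraph.</p>
           </div2>
         </div1>
       </body>
       <back>
         <p>Back matter, such as an afterword, goes here.</p>
       </back>
     </text>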
 
   4.1.2.2. Floating features
   --------------------------
 
As mentioned above, the current Guidelines propose names and definitions
for a wide variety of floating features.  Examples include <head> for
titles and captions (not properly floating, since they are generally
tied to a particular structural element); <q> for quoted matter and
direct speech; <list> for lists and <item> for the items within them;
<note> for footnotes etc.; <corr> for editorial corrections of the
original source made by the encoder; and, optionally, a variety of
lexically `awkward' items such as <abbr>eviations, <acronym>s, <number>s,
<name>s, <date>s, <citn> for bibliographic or other citations, <address>
for street addresses and <foreign> for non-English words or phrases.
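 
A single paragraph might then contain several such elements, somewhat
as follows. The wording is invented, and any attributes which the
Guidelines define for these tags are omitted here.
 
     <p>The letter, dated <date>1 May 1851</date>, is addressed to
     <name>Mr Brown</name> at <address>14 High Street, Oxford</address>.
     It acknowledges <q>a parcel of books</q> sent some
     <number>three</number> weeks earlier <note>Probably the parcel
     recorded in the accompanying day-book.</note> and closes with a
     tag in <foreign>dog Latin</foreign>.</p>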
 
   4.2. Reference scheme
   ---------------------
 
The advantage of using a single hierarchic scheme, as outlined above,
is that a referencing scheme based on it can be automatically generated.
For example, a given paragraph will acquire a number indicating its
sequence within the enclosing <div>, itself identified by its number
within any enclosing <div> above it, and ultimately within the
enclosing <text>. Thus the value "T98.1.9/12" might identify the 12th
paragraph in chapter 9 of book 1 of the text with number T98.
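 
The derivation of such a reference can be pictured as in the sketch
below. The "id" and "n" attributes shown are purely illustrative: the
reference itself is generated from the position of the elements in the
hierarchy, not entered by hand.
 
     <text id="T98">
       <body>
         <div1 n="1">                  <!-- book 1 -->
           <div2 n="9">                <!-- chapter 9 -->
             <!-- eleven earlier paragraphs ... -->
             <p>This, the twelfth paragraph of the chapter, would
                acquire the reference T98.1.9/12.</p>
           </div2>
         </div1>
       </body>
     </text>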
 
To complement this kind of internal referencing system, the Guidelines
provide two distinct methods of marking other reference schemes, such as
page and line numbers.  The hierarchy of volume, page, and line can be
neatly expressed with a concurrent markup stream separate from the main
markup hierarchy (see P1 section 5.6); for data entry purposes, however,
the simpler scheme we describe here may be more convenient. After data
entry, this markup can be transformed mechanically into that required
for a concurrent markup hierarchy, if that is supported by the software
in use.
 
Page breaks, column breaks, and line breaks may be marked with empty
"milestone" elements: that is, tags such as <line.break> or <page.break>
which mark a single point in the text, not a span of text, and therefore
have no corresponding end-tags. Such tags may have an "n" attribute to
supply the number of the page, column, or line beginning at the tag
explicitly, or may give only the number of the first if subsequent ones
can be calculated automatically.  This mechanism also allows for the
pagination etc. of more than one edition to be specified by using an
"ed" attribute.
 
   4.3. Descriptive vs presentational markup
   -----------------------------------------
 
A matter of considerable controversy (and associated misunderstanding)
has been the question of whether or not aspects of a text directly
related to its physical appearance can or should be marked up. For some
researchers, and in many applications, typographic features such as
lineation or font are of little or no importance. For others, they are
the very subject of interest. Because SGML focuses attention on
"describing" a text, rather than attempting to simulate its appearance,
the TEI recommendations have proposed that where it is possible to
identify a structural (or floating) feature by its function, then that
is what should be primarily tagged. This does not however mean that they
provide no support for cases where the exact purpose of some
distinctly-rendered part of a text cannot be determined. It is
recognised that in many cases it may be neither desirable nor possible
to interpret changes of rendering in this way.
 
A global attribute "rendition" may be specified for every tag in the TEI
scheme, the value of which is a user-specified string descriptive of
the way that the current element is rendered in the source being
transcribed. [7] In most cases, a change in rendering and a change of
element coincide: this mechanism therefore reduces the amount of tagging
from what would be required if a separate set of tags were used for
rendering. Further reduction in tagging is provided by the fact that the
default value for a "rendition" attribute is that of the immediately
surrounding element (if any).
 
In cases where a renditional change is not associated with any
discernible element, a special tag <highlighted> may be used, the sole
function of which is to carry the "rendition" attribute.
 
No recommendations about the form of value to be supplied for
rendition attributes have yet been made: these are the subject of
current work in two working groups. Similar considerations apply
to the use of quotation marks and quoted passages within a text.
 
 
   4.4. Scope and coverage of P1
   -----------------------------
 
As an example of the scope and range of facilities which SGML can
support, I close with a brief summary of the full contents of the
current draft and a more detailed description of a few of the more
specialised kinds of textual features for which tags are already
proposed in the draft Guidelines.
 
It should be stressed that the first draft of the Guidelines, despite
its weighty appearance (nearly 300 pages of closely printed A4), is very
much a discussion paper and far from being complete or definitive.  Some
characteristics of the TEI approach are however already discernible
which are unlikely to change.  One is a focus on the encoding of the
content of text, rather than its appearance -- as discussed above, this
is also a characteristic of SGML.  Another is the rigorous application
of Occam's razor: the TEI approach to the immense variety of text types
in the real world is to attempt to define a comparatively small number
of features which all texts share, and to allow for these to be used in
combination with user-definable sets of more specialised features. [8]
 
The current draft has eight main sections, which are briefly summarized
below.
 
Chapter 1 outlines the purpose and scope of the TEI scheme.  As outlined
above, its main goals are both to facilitate data interchange and to
provide guidance for those creating new texts.  The desiderata of
simplicity, clarity, formal rigour, sufficient power for research
purposes, conformance to international standards, and independence of
software, hardware or application alike are stressed.
 
Chapter 2 provides a gentle introduction to the basic concepts of SGML
and also contains some more technical information about the ways in
which the TEI scheme uses the standard.
 
Chapter 3 addresses the problems of character encoding and translation
in a world dominated by the rival claims of ASCII and EBCDIC.  If the
goal is to provide machine-independent support for all writing systems
of all languages, these problems are far from trivial.  The specific
recommendations made are that only a subset of the ISO-646 character set
(sometimes known as ASCII) can currently be relied on for data
interchange, and that this should be extended either by using the entity
reference mechanism provided by SGML or by using transliteration
schemes.  It proposes a powerful but economical way of documenting such
transliteration schemes by a formal Writing System Declaration.
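 
For example, accented characters falling outside that subset may be
written as entity references, as in the line below. The names shown
are drawn from the public entity sets published with the SGML standard
(ISO 8879); the choice of entity sets to be used for interchange is
itself a matter for the Guidelines.
 
     M&uuml;ller's caf&eacute; stood near the &Eacute;cole Militaire.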
 
Chapter 4 contains recommendations for in-file documentation of
electronic texts adequate to the bibliographic needs of researchers,
data archivists and librarians.  It recommends that a special header be
added to each file to perform a function analogous to that of the title
page of a non-electronic text, and proposes sets of tags for information
about the file itself, the source from which it was derived and how it
was encoded.
 
Chapter 5, the largest chapter, attempts to define a set of
general-purpose structural and floating tags for continuous prose texts.
Its basic idea of text as an ordered hierarchy of objects, within which
floating features and crystals may appear, was discussed above.  This
chapter of the Guidelines also proposes tags for features such as lists,
notes, names, abbreviations, numbers, foreign or emphasised phrases,
cross references, and hypertextual links.  Sections deal with the kinds
of textual element commonly found in front and back matter of printed
texts, title pages etc. Other sections discuss ways of encoding textual
variation and critical apparatus and of recording the rendering of
arbitrary textual fragments within this overall framework.  There is
also some discussion of different ways of maintaining multiple
referencing schemes within the same text.
 
Chapter 6 outlines a number of theory-independent mechanisms for
representing all kinds of linguistic analyses of running text.  It is
probably the most daunting chapter for the non-specialist reader, though
much of its content is of very wide relevance.  It argues that most,
if not all, linguistic analyses can be represented as bundles of named,
value-bearing, `feature structures', which may be nested and grouped
into sets or lists.  It proposes ways of supporting multiple and
independently aligned analyses, chiefly by means of the ID/IDREF pointer
mechanism native to SGML.  It also contains some tagsets for such
commonly occurring formalisms as tree structures and parts of speech.
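 
By way of illustration only, the analysis of a single word might take
something like the following shape. The element and attribute names
used here (<w>, <fs>, <f>, "target") are stand-ins invented for this
sketch rather than the names defined in the draft; the essential point
is the shape: a bundle of named, value-bearing features attached to a
span of text by the SGML ID/IDREF mechanism.
 
     <w id=w42>loves</w>          <!-- the word under analysis -->
 
     <fs target=w42>              <!-- one analysis, pointing at it -->
       <f name=lemma>love</f>
       <f name=pos>verb</f>
       <f name=person>third</f>
       <f name=number>singular</f>
     </fs>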
 
Chapter 7 considers in more detail particular aspects of some specific
types of text.  The text-types discussed in this draft are: language
corpora and collections; verse, drama, and narrative; dictionaries; and
office documents.  In each case, an overview of the problems specific to
these types of discourse is given, with some preliminary proposals for
tags appropriate to them.  This chapter is one that will be considerably
revised and extended over the coming months, as its initial proposals
are firmed up and as its scope is extended to other types of text.
 
Chapter 8 outlines a method by which the current Guidelines may be
modified and extended, largely by introducing indirection into the
Document Type Definitions (DTDs), the formal SGML specifications for
the TEI encoding scheme.  Extension and modification of the TEI
proposals is an important design goal, since this is both expected and
intended, and the final form of the Guidelines will facilitate it.
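 
In SGML terms, such indirection is typically achieved with parameter
entities: content models refer to a named entity rather than to a
fixed list of elements, so a local extension can supply its own
declaration of that entity (the first declaration encountered being
the binding one). The sketch below shows the general mechanism only;
the parameter entity name and the added <regnal.year> element are
invented, and the declarations are not those of the TEI DTDs
themselves.
 
     <!-- local extension, read before the main DTD: because the
          first declaration of an entity is binding, this adds a
          hypothetical <regnal.year> element to the list           -->
     <!ENTITY % floating "name | date | number | abbr | foreign | regnal.year">
     <!ELEMENT regnal.year - - (#PCDATA)>
 
     <!-- in the main DTD (sketch): a default list of floating
          elements, referred to indirectly wherever it is needed   -->
     <!ENTITY % floating "name | date | number | abbr | foreign">
     <!ELEMENT p - - (#PCDATA | %floating;)*>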
 
Preliminary versions of a number of technical appendixes are provided in
the current draft.  These include annotated examples, illustrating the
application of the TEI encoding scheme to a wide range of texts, formal
SGML document type declarations (DTDs) for all the tags and groups of
tags defined in the TEI scheme, and code pages for some commonly used
character sets.  Later drafts will extend and improve these initial
versions considerably, and will also contain an alphabetical reference
section with a summary of each tag, its attributes, its usage, and an
example of its use, as well as full Writing System Declarations for a
range of commonly used alphabets.
 
Space precludes an exhaustive discussion of the various tags and
associated features suggested by the current TEI draft proposals.
Further proposals from the specialist working groups currently
discussing extensions in a wide range of subject areas will be included
in the final TEI report in a year's time.  However, it is hoped that
enough detail has been provided to give some indication of the general
ideas underlying the scheme.
 
 
------------------Notes--------------------------------------------
 
[1] First published in Greenstein, D.I., ed., Modelling Historical Data
(Goettingen, Max-Planck-Inst. fuer Geschichte, 1991).
 
[2] See "Report of Workshop on Encoding Principles"
Literary and Linguistic Computing 3.2 (1988).
---------------------------------
 
[3] ACH-ACL-ALLC Guidelines for the encoding and interchange of
machine-readable texts, edited by Lou Burnard and C.M. Sperberg-McQueen
(Chicago and Oxford, Text Encoding Initiative, October, 1990), hereafter
"Guidelines" or "P1".
 
[4] Examples include Humanistiske Data 3-90; ACH Newsletter 12 (3-4);
EPSIG News 3(3); SGML Users Group Newsletter 18; ACLS Newsletter 2/4;
and elsewhere. A fuller summary is to appear as Burnard, "The TEI: a
progress report", in Proceedings of the 11th ICAME Conference, Berlin,
1990, ed. G. Leitner. Encoding principles of the TEI are discussed in
Sperberg-McQueen, "Texts in the electronic age: textual study and text
encoding, with examples from medieval texts", Literary and Linguistic
Computing 6.1 (1991).
 
[5] "Primary to Secondary" in Peter Denley and Deian Hopkins, eds.
History and Computing (Manchester: Manchester University Press, 1987).
---------------------
 
[6] A recent EUROTRA-funded study reported on this and other aspects of
`re-usability of lexical resources' in considerable detail: its
findings are equally relevant to other disciplines. See U. Heid,
"Eurotra-7: Feasibility and Project Definition Study on the Reusability
of Lexical and Terminological Resources in Computerised Applications.",
January, 1991.
 
[7] This implies of course that the markup describes a single source.
 
[8] This has been termed the "pizza model", by contrast with either the
"table d'h^ote" or the "`a la carte" models. A choice of a small number
of bases is offered, each of which may be combined with a large number
of toppings.