[This local archive copy mirrored from: http://www.stg.brown.edu/webs/tei10/tei10.papers/Simonspaper.html; see the canonical version of the document.]
Text Encoding Initiative
Gary F. Simons
Summer Institute of Linguistics
gary_simons@sil.org
This paper develops a solution to the problem of importing existing TEI data into an existing object-oriented database schema without changing the TEI data or the database schema. After investigating the general problem of where the mismatch lies between the SGML model and the object model, the paper proposes a solution based on architectural processing. Two meta-DTDs are used, one to define the architectural forms for the object model and another to map the existing SGML data onto those forms. A full example using a critical text in TEI markup is developed.
Much of the promise of SGML lies in the fact that descriptively marked-up data can be interchanged freely and used by multiple applications for analytical processing or publication formatting. Indeed, this is part of the motivation behind the Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange [TEI94]. Since an SGML DTD has much in common with the conceptual model that results from an object-oriented analysis of a problem domain, it is logical to conclude that SGML data should be particularly amenable to being imported into software that uses an object-oriented data model. This is not a trivial task, however, since there are some fundamental differences between the SGML model of data and the object model.
The paper explores this general problem as it develops a solution to a more specific problem, namely, how to import existing SGML data into an existing object-oriented database schema without changing either the SGML data or the database schema. The target system is an object-oriented database system named CELLAR (for Computing Environment for Linguistic, Literary, and Anthropological Research [RST93]). The solution uses architectural processing to map the SGML data onto architectural forms that the CELLAR system can use to construct the appropriate structure of objects.
Section 1 of the paper discusses the basic differences between the SGML model of data and the object model, and illustrates why the mapping from SGML elements to objects is not a trivial one. Section 2 introduces the DTD for an architecture that maps SGML data onto objects. Section 3 gives a complete example of the automated process by which the SGML data are mapped onto this architectural DTD via an intermediate meta-DTD that encodes the mapping. The example used is that of a critical text edition encoded in TEI format. Finally, section 4 discusses the implementation and the results that have been achieved thus far.
The problems inherent in importing SGML data into an object database stem from the differences between the SGML model of data and the object model of data. In speaking of the "object model of data," I am referring specifically to the way object databases [Cat97] and conceptual modeling languages [Bor85] represent information. Such systems replace the simple instance variables of an object-oriented programming language with attributes that encapsulate integrity constraints and the semantics of relationships to other objects.
In SGML, the fundamental unit of data representation is the element. Each element must have a generic identifier; it may optionally have a number of attributes or content or both. Each attribute has a name and a value; the value is represented by a string of characters. The content of an element may consist of character data or embedded elements or a combination of both. These generalizations may be expressed in terms of the following declarations:
<!ELEMENT element - - (attr* & content?) >
<!ATTLIST element gi    NAME  #REQUIRED >
<!ELEMENT attr    - O EMPTY >
<!ATTLIST attr    name  NAME  #REQUIRED
                  value CDATA #IMPLIED >
<!ELEMENT content - - (#PCDATA | element)* >
In the object model, the fundamental unit of data representation is the object. Each object must have a class, and is either a primitive object that stores primitive data like a string or a number, or is a complex object that has attributes. Each attribute has a name and a value; the value consists of embedded objects. These generalizations may be expressed in terms of the following declarations:
<!ELEMENT object          - - (attr)* >
<!ATTLIST object          class NAME #REQUIRED >
<!ELEMENT attr            - - (primitiveObject | object)* >
<!ATTLIST attr            name  NAME #REQUIRED >
<!ELEMENT primitiveObject - - (#PCDATA) >
<!ATTLIST primitiveObject class NAME #REQUIRED >
Element and object are superficially similar: generic identifier corresponds to class, both have attributes, and both occur recursively. They differ fundamentally, however, in the nature of the attributes and the recursion. With elements, the attributes cannot contain embedded structure; the recursion of elements is allowed only within the content of an element. With objects, there is no specialized notion of content; rather, the recursive embedding of further objects takes place within the attributes.
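The contrast can be made concrete by transcribing the two nutshell models as Python types. This is an illustrative sketch only: the type and field names are hypothetical (cls stands in for class, which is reserved in Python), and none of it is CELLAR code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

@dataclass
class Element:
    """SGML element (nutshell model of section 1.1)."""
    gi: str
    # Attribute values are plain strings; they cannot hold structure.
    attrs: Dict[str, str] = field(default_factory=dict)
    # Recursion happens only here, in the content.
    content: List[Union[str, "Element"]] = field(default_factory=list)

@dataclass
class PrimitiveObject:
    """Primitive object (section 1.2): a class plus a primitive value."""
    cls: str
    value: str

@dataclass
class Object:
    """Complex object (section 1.2): no separate content; every attribute
    value is a sequence of embedded objects."""
    cls: str
    attrs: Dict[str, List[Union[PrimitiveObject, "Object"]]] = field(default_factory=dict)

# An element such as <phrase rend="ital">an italic phrase</phrase> in each model.
phrase_elem = Element("phrase", {"rend": "ital"}, ["an italic phrase"])
phrase_obj = Object("phrase", {
    "rend": [PrimitiveObject("String", "ital")],
    "content": [PrimitiveObject("String", "an italic phrase")],
})
```

In Element the attribute values are strings, while in Object they are lists of objects; bridging that difference is the whole job of the mapping.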
An SGML document following the model of section 1.1 can be automatically mapped onto the object model of section 1.2 by making four transformations:
1. Change <element gi=X>...</element> to <object class=X>...</object>.
2. Change <attr name=X value=Y> to <attr name=X><primitiveObject class="String">Y</primitiveObject></attr>.
3. Change <content>...</content> to <attr name="content">...</attr>.
4. Enclose #PCDATA within the tags <primitiveObject class="String">...</primitiveObject>.
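The four transformations can be sketched in a few lines of Python. The sketch assumes the input is also well-formed XML (true of the sample element below, but not of SGML in general, which allows omitted tags and unquoted attribute values), so that it can use the standard xml.etree.ElementTree parser; the function names are hypothetical, and whitespace-only text is skipped.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def primitive(text):
    # Transformation 4: character data becomes a primitive String object.
    return f'<primitiveObject class="String">{escape(text)}</primitiveObject>'

def default_map(elem):
    """Apply the four default transformations to one element, recursively."""
    # Transformation 1: the generic identifier becomes the object's class.
    parts = [f"<object class={quoteattr(elem.tag)}>"]
    # Transformation 2: each string-valued attribute becomes an attr
    # holding a primitive String object.
    for name, value in elem.attrib.items():
        parts.append(f"<attr name={quoteattr(name)}>{primitive(value)}</attr>")
    # Transformation 3: text and subelements, in document order, go into
    # a single attr named "content".
    content = []
    if elem.text and elem.text.strip():
        content.append(primitive(elem.text))
    for child in elem:
        content.append(default_map(child))
        if child.tail and child.tail.strip():
            content.append(primitive(child.tail))
    if content:
        parts.append('<attr name="content">' + "".join(content) + "</attr>")
    parts.append("</object>")
    return "".join(parts)

print(default_map(ET.fromstring('<phrase rend="ital">an italic phrase</phrase>')))
```

The printed result is the object markup for the sample element given below, without line breaks or indentation.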
For example, the following sample SGML element contains an instance of each of the four conditions listed above:
<phrase rend="ital">an italic phrase</phrase>
Following the nutshell model of SGML in section 1.1, this corresponds to the following semantic representation:
<element gi="phrase">
  <attr name="rend" value="ital">
  <content>an italic phrase</content>
</element>
This would be converted into the following object representation by the proposed default mapping:
<object class="phrase">
  <attr name="rend">
    <primitiveObject class="String">ital</primitiveObject>
  </attr>
  <attr name="content">
    <primitiveObject class="String">an italic phrase</primitiveObject>
  </attr>
</object>
The default transformation described in the preceding section can easily be done on any SGML document, but it will seldom yield a result that actually fits the conceptual model of a target object database. Consider, for instance, the following simplistic SGML document:
<document>
  <creationDate>12-Jun-97</creationDate>
  <title>
    <maintitle>The main title</maintitle>
    <subtitle>a subtitle</subtitle>
  </title>
  <authors>
    <author>
      <name>First Author</name>
      <affil>Some Company</affil>
    </author>
    <author>
      <name>Second Author</name>
      <affil>Another Company</affil>
    </author>
  </authors>
  <p>An introductory paragraph</p>
  <div1><!-- The first section --></div1>
  <div1><!-- The second section --></div1>
</document>
The above represents a typical approach to encoding a document in SGML. But compare it with the following, which is equally typical of how a Document class might be defined in an object database:
class Document has
    creationDate : Date
    title        : TitleStatement
    authors      : sequence of Person
    content      : sequence of Paragraph or Division
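In programming-language terms, the Document class corresponds roughly to the following Python dataclasses. This is only a sketch: the attributes of TitleStatement follow the description later in this section, while the attributes of Person, Paragraph and Division are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Union

@dataclass
class TitleStatement:
    maintitle: str
    subtitle: str = ""

@dataclass
class Person:                      # attributes are hypothetical
    name: str
    affil: str = ""

@dataclass
class Paragraph:                   # hypothetical
    text: str

@dataclass
class Division:                    # hypothetical
    contents: List[object] = field(default_factory=list)

@dataclass
class Document:
    creationDate: date             # a Date object, not a string
    title: TitleStatement          # one embedded object
    authors: List[Person] = field(default_factory=list)
    # The attribute that is not explicitly tagged in the SGML markup.
    content: List[Union[Paragraph, Division]] = field(default_factory=list)
```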
The default mapping proposed in section 1.3 would first go wrong by putting all the subelements within the document in a single attribute named content; instead we want to map them into four different attributes. The first three subelements (<creationDate>, <title>, and <authors>) correspond to Document attributes of the same name. The remaining subelements (<p> and two instances of <div1>) correspond to objects that go into the Document attribute named content (which happens not to be explicitly tagged).
Though the first three subelements correspond to attributes, they differ significantly in the way they do so. <creationDate> additionally carries the information that the embedded PCDATA content should be mapped onto a primitive object of class Date. <title> corresponds not only to the attribute title but also to an object of class TitleStatement (which in turn has attributes maintitle and subtitle). By contrast, <authors> corresponds to the attribute and nothing more; each embedded <author> element corresponds to an object of class Person.
This example illustrates the following fundamental result when comparing the SGML model to the object model: some SGML elements encode an object, some encode an attribute, and still others simultaneously encode both. The basic challenge of importing SGML data into an object database is to determine which of these cases holds for each of the element types occurring in the data, and then to express formally how each maps onto the corresponding classes and attributes of the target database schema.
The HyTime standard [ISO92] first introduced the concept of architectural forms as a way to associate standardized semantics with elements in user-defined DTDs [DD94]. Now that this notion has been generalized in the SGML Extended Facilities (defined in Annex A of the revised HyTime standard [ISO97]), we can use it to good advantage in solving the problem at hand. Architectural forms provide a mechanism we can use to express the semantics of how SGML elements map onto the object model. See [Cov97] for pointers to other applications of architectural forms.
There are two basic element forms in the architecture, <object> and <attr>. Rather than having a third form for the case when an element corresponds to both an object and an attribute, this case is treated as a mapping to an object, and the object form adds an architectural attribute to name the attribute it also maps to. A third form, <ignore>, is used when the SGML element does not correspond to anything in the target object model; the element content is then processed as though the start and end tags were not there. The definitions of these three forms are given below. (The definition of the architecture is abridged for the sake of this presentation; see [Sim97b] and [Sim97c] for the full definition.)
<!-- CELLAR.DTD (abridged version)
     Meta-DTD of the CELLAR architecture for mapping SGML data
     into CELLAR's object model of data -->

<!ENTITY % content "object | attr | ignore | #PCDATA" >

<!-- OBJECT: the element corresponds to an object in CELLAR -->
<!ELEMENT object - - (%content;)* >
<!ATTLIST object
   class       -- Create this class of CELLAR object --          CDATA #REQUIRED
   parentAttr  -- Put the object in this attr of its parent --   CDATA #IMPLIED
   contentAttr -- Put embedded objects in this attribute --      CDATA #IMPLIED
   pcdataClass -- Create this class for embedded PCDATA --       CDATA "String"
   encoding    -- Put embedded strings in this encoding --       CDATA #IMPLIED
   id          -- A unique identifier for this object --         ID    #IMPLIED
   attrName    -- Set this attribute of the object ... --        CDATA #IMPLIED
   attrValue   -- ... to this value --                           CDATA #IMPLIED
   attrType    -- The value is an IDREF or of named class --     CDATA "String" >

<!-- ATTR: the element corresponds to an attribute in CELLAR -->
<!ELEMENT attr - - (%content;)* >
<!ATTLIST attr
   contentAttr -- Put embedded objects in this attribute --      CDATA #IMPLIED
   pcdataClass -- Create this class for embedded PCDATA --       CDATA "String"
   encoding    -- Put embedded strings in this encoding --       CDATA #IMPLIED >

<!-- IGNORE: the element corresponds to nothing in CELLAR;
     ignore it at this level, but process its content -->
<!ELEMENT ignore - - (%content;)* >
The easiest way to explain these forms is by example. In the illustrative document in section 1.4, the <document> element corresponds to an object of class Document; the element content (unless an embedded element names a specific target attribute) goes into the content attribute of the object. The <document> element would be augmented as follows to indicate its mapping into the object model:
<document cellar=object class="Document" contentAttr="content">
This says that in the architecture named cellar, this <document> element corresponds to an <object> element whose class is "Document" and whose contentAttr is "content".
The <creationDate> element corresponds to an attribute. Its content goes into the creationDate attribute, and the embedded PCDATA needs to be converted into Date objects. Thus,
<creationDate cellar=attr contentAttr="creationDate" pcdataClass="Date">
The <title> element corresponds to a TitleStatement object, but it also corresponds to an attribute in that it maps into the title attribute of its parent object (that is, the Document). Thus,
<title cellar=object class="TitleStatement" parentAttr="title">
Finally, the <authors> element corresponds to the authors attribute; thus,
<authors cellar=attr contentAttr="authors">
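The effect of the three forms can be simulated on the sample document of section 1.4. In the Python sketch below, a dictionary stands in for the cellar attributes: the first four entries are taken from the augmented start tags above, while the rest (for <maintitle>, <author>, <p>, and so on) are hypothetical choices. Objects are plain dictionaries from attribute names to lists of values, PCDATA becomes {"class": ..., "value": ...} records, and elements with no entry are treated as <ignore>. The sketch assumes XML-compatible input so that xml.etree.ElementTree can parse it; it is not the CELLAR implementation.

```python
import xml.etree.ElementTree as ET

# Stand-in for the cellar architectural attributes, keyed by generic identifier.
MAPPING = {
    # From the augmented start tags in the text:
    "document":     {"cellar": "object", "class": "Document", "contentAttr": "content"},
    "creationDate": {"cellar": "attr", "contentAttr": "creationDate", "pcdataClass": "Date"},
    "title":        {"cellar": "object", "class": "TitleStatement", "parentAttr": "title"},
    "authors":      {"cellar": "attr", "contentAttr": "authors"},
    # Hypothetical choices for the remaining elements:
    "maintitle":    {"cellar": "attr", "contentAttr": "maintitle"},
    "subtitle":     {"cellar": "attr", "contentAttr": "subtitle"},
    "author":       {"cellar": "object", "class": "Person"},
    "name":         {"cellar": "attr", "contentAttr": "name"},
    "affil":        {"cellar": "attr", "contentAttr": "affil"},
    "p":            {"cellar": "object", "class": "Paragraph", "contentAttr": "contents"},
    "div1":         {"cellar": "object", "class": "Division", "contentAttr": "contents"},
}

def add(obj, attr, value):
    """Append value to attribute attr of obj; attribute values are sequences."""
    obj.setdefault(attr, []).append(value)

def walk(elem, obj, attr, pcdata, mapping):
    """Map elem into obj, given the inherited target attribute and PCDATA class."""
    # Elements without a mapping entry are treated as <ignore> (an assumption).
    form = mapping.get(elem.tag, {"cellar": "ignore"})
    if form["cellar"] == "object":
        new = {"class": form["class"]}
        # parentAttr overrides the attribute the parent would otherwise use.
        add(obj, form.get("parentAttr", attr), new)
        obj, attr = new, form.get("contentAttr")
        pcdata = form.get("pcdataClass", "String")
    elif form["cellar"] == "attr":
        # Same object as before; only the target attribute changes.
        attr = form["contentAttr"]
        pcdata = form.get("pcdataClass", "String")
    # For "ignore", the enclosing object, attribute and PCDATA class carry over.

    def text(s):
        # Non-blank character data becomes a primitive object of class pcdata.
        if s and s.strip():
            add(obj, attr, {"class": pcdata, "value": s.strip()})

    text(elem.text)
    for child in elem:
        walk(child, obj, attr, pcdata, mapping)
        text(child.tail)

def import_doc(xml_text, mapping):
    holder = {}                      # dummy parent for the top-level object
    walk(ET.fromstring(xml_text), holder, "result", "String", mapping)
    return holder["result"][0]

SAMPLE = """<document>
  <creationDate>12-Jun-97</creationDate>
  <title><maintitle>The main title</maintitle><subtitle>a subtitle</subtitle></title>
  <authors>
    <author><name>First Author</name><affil>Some Company</affil></author>
    <author><name>Second Author</name><affil>Another Company</affil></author>
  </authors>
  <p>An introductory paragraph</p>
  <div1><!-- The first section --></div1>
  <div1><!-- The second section --></div1>
</document>"""

doc = import_doc(SAMPLE, MAPPING)
print([c["class"] for c in doc["content"]])   # ['Paragraph', 'Division', 'Division']
```

The four subelements land in four different attributes of the Document, which is exactly what the default mapping of section 1.3 could not achieve.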
As stated in the introduction, the goal of this work is to import existing SGML data into an existing object-oriented database schema without changing the SGML data or the database schema. This section demonstrates a full example of the process. The SGML data file is a critical edition in TEI markup of a passage from the Second Epistle of Clement. A fuller treatment of this sample text along with examples of what can be done with it in the CELLAR environment is given in [Sim97a].
The file for the critical text is as follows. Note that a significant portion of the content has been elided in the interest of brevity. The Greek text is encoded in TLG beta code.
<!DOCTYPE TEI.2 SYSTEM "textcrit.dtd">
<TEI.2>
<text>
<front>
<docTitle>2 Clement, chapter 7</docTitle>
<witlist>
<wit id=A type=Manuscript>Codex Alexandrinus
  <bibl>A Greek uncial of the fifth century. Housed in the British Museum.
  Published in: The Codex Alexandrinus in reduced photographic facsimile,
  with an introduction by F. G. Kenyon, London 1909.
  </bibl></wit>
<wit id=C type=Manuscript>Codex Constantinopolitanus <bibl> . . . </bibl></wit>
<wit id=S type=Manuscript>Syriac Version <bibl> . . . </bibl></wit>
<wit id=L type=Edition>Lightfoot 1890
  <bibl>Lightfoot, J. B. 1890. The Apostolic Fathers: Clement, Ignatius,
  Polycarp (2nd edition). Part One: Clement, volume 2, pages 210-261.
  Macmillan. (Reprinted 1989 by Hendrickson Publishers, Peabody, MA)
  </bibl></wit>
<wit id=Lb type=Edition>Loeb edition <bibl> . . . </bibl></wit>
<wit id=B type=Edition>Bihlmeyer 1970 <bibl> . . . </bibl></wit>
<wit id=W type=Edition>Wengst 1984 <bibl> . . . </bibl></wit>
</witlist>
</front>
<body>
<div n=7>
<!-- ***************** Verse 1 ********************* -->
<s n=1>
w(/ste
<app><rdg wit='A L Lb B'>ou)=n</rdg>
     <rdg wit='C S W'><omit></rdg></app>
a)delfoi/
<app><rdg wit='A L Lb B'>mou</rdg>
     <rdg wit='C W'><omit></rdg></app>
a)gwnisw/meqa ei)do/tej, o(/ti e)n xersi\n o(
<app><rdg wit='C S L Lb B W'>a)gw\n</rdg>
     <rdg wit='A'>ai)w/n</rdg></app>
kai\ o(/ti ei)j tou\j fqartou\j a)gw=naj kataple/ousin polloi/,
a)ll' ou) pa/ntej stefanou=ntai,
<app><rdg wit='C L Lb B W'>ei) mh\</rdg>
     <rdg wit='A'>oi( mh/</rdg>
     <rdg wit='S'>ei) mh\ mo/non</rdg></app>
oi( polla\ kopia/santej kai\ kalw=j a)gwnisa/menoi.
</s>
<!-- and so forth for remaining verses -->
</div>
</body></text>
</TEI.2>
The DTD for this file is the following:
<!-- TextCrit.DTD
     A DTD for encoding a text critical edition. All tags are from the
     TEI guidelines (Text Encoding Initiative). The content models have
     been simplified to deal only with the tags needed for the sample
     text of II Clement. The aim is to faithfully represent the TEI
     scheme of markup without having to deal with the huge TEI DTD.
     This DTD reflects the "Parallel segmentation method" of encoding.
     See section 19.2.3 of the TEI Guidelines.
     Gary Simons, Summer Institute of Linguistics
     Last revised: 18 October 1997 -->

<!ELEMENT TEI.2    - - ( text ) >
<!ELEMENT text     - - ( front, body ) >
<!ELEMENT front    - - ( docTitle, witList ) >
<!ELEMENT docTitle - - (#PCDATA) >
<!ELEMENT witList  - - ( wit+ ) >
<!ELEMENT wit      - - ( #PCDATA, bibl? ) >
<!ATTLIST wit      id   ID     #REQUIRED
                   type CDATA  #REQUIRED >
<!ELEMENT bibl     - - (#PCDATA) >
<!ELEMENT body     - - ( div+ ) >
<!ELEMENT div      - - ( s+ ) >
<!ATTLIST div      n    CDATA  #IMPLIED >
<!ELEMENT s        - - ( #PCDATA | app )+ >
<!ATTLIST s        n    CDATA  #IMPLIED >
<!ELEMENT app      - - ( rdg+ ) >
<!ELEMENT rdg      - - ( #PCDATA | omit ) >
<!ATTLIST rdg      wit  IDREFS #REQUIRED >
<!ELEMENT omit     - O EMPTY >
The conceptual model for the objects and attributes into which we want to import the input file is diagrammed below. The notation and the model are explained in [Sim97a]. Here suffice it to say that solid arrows mean "contains" and the dotted arrow means "holds pointers to."
[Figure: conceptual model of the critical text (Textcrit.gif)]
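At the outset, two DTDs are given. For this example they are the client DTD, textcrit.dtd (shown above), which the input document uses, and the architectural DTD, cellar.dtd (section 2), which defines the CELLAR architectural forms.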
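To perform the automatic mapping from the client DTD to the architectural DTD, two additional DTDs must be defined: a version of the client DTD that invokes architectural processing (my-textcrit.dtd) and a meta-DTD that encodes the mapping of client elements onto the architectural forms (map-textcrit.dtd).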
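The process for automatically mapping a client document onto its corresponding architectural document follows these steps:
1. Define the version of the client DTD that invokes architectural processing.
2. Define the meta-DTD that maps the client elements onto the architectural forms.
3. Change the client document instance to use the modified DTD.
4. Run the architecture engine to translate the client document instance into the architectural document instance.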
This process is illustrated in the subsections which follow.
The input file we are using (from section 3.1) uses a DTD in the file textcrit.dtd. The first step is to define an alternate version of this DTD which invokes the desired architectural processing features. The result is as follows:
<!-- my-textcrit.dtd
     This is a version of textcrit.dtd that invokes the mapping
     to CELLAR architectural forms. -->

<?ArcBase mapping>
<!ENTITY % mappingDTD SYSTEM "map-textcrit.dtd" >
<!NOTATION mapping SYSTEM>
<!ATTLIST #NOTATION mapping
   ArcDocF NAME  #FIXED "TEI.2"
   ArcDTD  CDATA #FIXED "%mappingDTD" >

<!ENTITY % originalDTD SYSTEM "textcrit.dtd" >
%originalDTD;
Note that this DTD does not modify the original declarations for the elements and attributes of the client DTD in any way. Rather, it duplicates them exactly by including the original DTD in full at the end. The purpose of this version of the DTD is to declare that the architecture named mapping is to be used. This is done with the <?ArcBase mapping> processing instruction. Following this is the architectural support declaration. It consists of a notation declaration followed by an attribute definition list that sets options which control the architecture engine. In this case, ArcDocF specifies the generic identifier for the document (top-level) element of the architectural document, and ArcDTD names the parameter entity (mappingDTD) that points to the file containing the architectural DTD. For this step in the process, the architectural document is a new version of the <TEI.2> document that adds the attributes for the CELLAR architectural forms.
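Because this wrapper has the same shape for any client DTD and architecture, it can be produced mechanically. The Python function below is a hypothetical helper (not part of SP or CELLAR) that emits a DTD following the pattern of my-textcrit.dtd; it sets only the two support attributes used there, ArcDocF and ArcDTD.

```python
def wrapper_dtd(arch, doc_elem, arc_dtd_file, client_dtd_file):
    """Return a DTD that invokes architecture `arch` on top of an unchanged client DTD."""
    entity = f"{arch}DTD"            # parameter entity naming the architectural DTD
    return "\n".join([
        f"<?ArcBase {arch}>",
        f'<!ENTITY % {entity} SYSTEM "{arc_dtd_file}" >',
        f"<!NOTATION {arch} SYSTEM>",
        f"<!ATTLIST #NOTATION {arch}",
        f'   ArcDocF NAME  #FIXED "{doc_elem}"',
        f'   ArcDTD  CDATA #FIXED "%{entity}" >',
        # Include the client DTD unchanged, as the last thing in the file.
        f'<!ENTITY % originalDTD SYSTEM "{client_dtd_file}" >',
        "%originalDTD;",
    ])

print(wrapper_dtd("mapping", "TEI.2", "map-textcrit.dtd", "textcrit.dtd"))
```

A meta-DTD such as map-textcrit.dtd needs further support attributes (ArcFormA and ArcNamrA), which this helper does not emit.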
The second DTD to be created is a meta-DTD that defines the mapping of the elements in the client DTD onto the elements of the architectural DTD. The result for our example is as follows:
<!-- map-textcrit.dtd
     This maps textcrit.dtd onto CELLAR arc forms
     Gary Simons, SIL, 18 Oct 1997 -->

<!afdr "ISO/IEC 10744:1992" --Allow multiple ATTLIST declarations-->

<?ArcBase cellar>
<!ENTITY % cellarDTD SYSTEM "cellar.dtd" >
<!NOTATION cellar SYSTEM>
<!ATTLIST #NOTATION cellar
   arcDocF  NAME  #FIXED object
   arcFormA NAME  #FIXED cellar
   arcNamrA NAME  #FIXED cellarNames
   ArcDTD   CDATA #FIXED "%cellarDTD" >

<!ATTLIST TEI.2    cellar      NAME  #FIXED object
                   class       CDATA #FIXED CriticalText >
<!ATTLIST text     cellar      NAME  #FIXED ignore >
<!ATTLIST front    cellar      NAME  #FIXED ignore >
<!ATTLIST docTitle cellar      NAME  #FIXED attr
                   contentAttr CDATA #FIXED title >
<!ATTLIST witList  cellar      NAME  #FIXED attr
                   contentAttr CDATA #FIXED authorities >
<!ATTLIST wit      cellar      NAME  #FIXED object
                   cellarNames CDATA #FIXED "class type attrValue id"
                   attrName    CDATA #FIXED siglum
                   attrType    CDATA #FIXED String
                   contentAttr CDATA #FIXED description
                   -- id automatically preserved from client attr of same name -- >
<!ATTLIST bibl     cellar      NAME  #FIXED attr
                   contentAttr CDATA #FIXED source >
<!ATTLIST body     cellar      NAME  #FIXED attr
                   contentAttr CDATA #FIXED body >
<!ATTLIST div      cellar      NAME  #FIXED object
                   class       CDATA #FIXED CriticalTextChapter
                   contentAttr CDATA #FIXED contents
                   attrName    CDATA #FIXED n
                   attrType    CDATA #FIXED String
                   cellarNames CDATA #FIXED "attrValue n" >
<!ATTLIST s        cellar      NAME  #FIXED object
                   class       CDATA #FIXED CriticalTextVerse
                   contentAttr CDATA #FIXED contents
                   attrName    CDATA #FIXED n
                   attrType    CDATA #FIXED String
                   cellarNames CDATA #FIXED "attrValue n"
                   encoding    CDATA #FIXED GKOb >
<!ATTLIST app      cellar      NAME  #FIXED object
                   class       CDATA #FIXED TextVariation
                   contentAttr CDATA #FIXED readings >
<!ATTLIST rdg      cellar      NAME  #FIXED object
                   class       CDATA #FIXED Reading
                   contentAttr CDATA #FIXED text
                   attrName    CDATA #FIXED witnesses
                   attrType    CDATA #FIXED IDREFS
                   cellarNames CDATA #FIXED "attrValue wit" >
<!ATTLIST omit     cellar      NAME  #FIXED object
                   class       CDATA #FIXED String >

<!ENTITY % originalDTD SYSTEM "textcrit.dtd" >
%originalDTD;
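This DTD declares cellar as the name of its base architecture. The architectural support attributes for this architecture declare that:
- the document element of the architectural document is <object> (ArcDocF),
- the client attribute that names an element's architectural form is cellar (ArcFormA),
- the client attribute that renames client attributes to architectural attributes is cellarNames (ArcNamrA; see below for an explanation), and
- the architectural DTD is the one in the file cellar.dtd (ArcDTD).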
Like my-textcrit.dtd, this meta-DTD describes documents of type <TEI.2>; thus the original DTD is included in full, without modification, at the end. The <!AFDR> declaration at the beginning instructs the SGML parser to permit duplicate ATTLIST declarations in this meta-DTD; otherwise it would be an error for the DTD both to define an ATTLIST for an element and to read another one for the same element from the original DTD.
The bulk of this meta-DTD consists of duplicate ATTLIST declarations for the elements in the client DTD. Their purpose is to add declarations for the attributes of the cellar architecture.
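Since every one of these duplicate ATTLIST declarations has the same shape, the meta-DTD could be generated from a table of mapping rules. The helper below is a hypothetical Python sketch of that idea, not part of the published tools; like map-textcrit.dtd, it declares the form attribute as cellar NAME #FIXED and every other attribute as CDATA #FIXED.

```python
import re

# In the reference concrete syntax, a value made only of name characters
# (letters, digits, "." and "-") may be given without quotes.
NAME_CHARS = re.compile(r"[A-Za-z0-9.-]+")

def cellar_attlist(gi, form, **fixed):
    """Return a duplicate ATTLIST giving element gi fixed CELLAR attributes."""
    lines = [f"<!ATTLIST {gi} cellar NAME #FIXED {form}"]
    for name, value in fixed.items():
        shown = value if NAME_CHARS.fullmatch(value) else f'"{value}"'
        lines.append(f"  {name} CDATA #FIXED {shown}")
    return "\n".join(lines) + " >"

# A few rules from map-textcrit.dtd, expressed as a table.
TABLE = [
    ("TEI.2", "object", {"class": "CriticalText"}),
    ("text", "ignore", {}),
    ("docTitle", "attr", {"contentAttr": "title"}),
    ("rdg", "object", {"class": "Reading", "contentAttr": "text",
                       "attrName": "witnesses", "attrType": "IDREFS",
                       "cellarNames": "attrValue wit"}),
]

for gi, form, attrs in TABLE:
    print(cellar_attlist(gi, form, **attrs))
```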
The mapping rules use these features that have not already been illustrated or discussed:
- The cellarNames attribute (the ArcNamrA attribute, used in <wit>) takes a list of paired names. The architectural attribute that is the first member of a pair takes on the value of the client attribute that is the second member. Thus the first pair defined for <wit> says that the name for the class of the object to create comes from the type attribute of the client element.
- When attrType is IDREFS (as in <rdg>), the resulting value is a set of pointers to the objects associated with the given IDs.
- The encoding attribute in the mapping for <s> says that all of the strings in the content of <s> (including all its subelements) should be created with the CELLAR language encoding named "GKOb" (for ancient Greek).
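The attribute renaming can be sketched in Python as a simplified model of what the architecture engine does for one element. Each architectural attribute takes the value of the client attribute named for it in cellarNames or, failing that, the client attribute of the same name (which is how id survives in the <wit> output shown below). In this sketch the #FIXED attributes from the meta-DTD count as client attributes, defaults from cellar.dtd (such as pcdataClass="String") are not filled in, and the function and list names are hypothetical.

```python
from typing import Dict, List

# Architectural attributes of the <object> form in the abridged cellar.dtd.
OBJECT_ATTRS = ["class", "parentAttr", "contentAttr", "pcdataClass", "encoding",
                "id", "attrName", "attrValue", "attrType"]

def arch_attributes(client: Dict[str, str], renamer: str,
                    arch_names: List[str]) -> Dict[str, str]:
    """Derive one element's architectural attributes from its client attributes."""
    tokens = renamer.split()
    # cellarNames holds (architectural, client) pairs.
    renamed = dict(zip(tokens[0::2], tokens[1::2]))
    result = {}
    for arch in arch_names:
        source = renamed.get(arch, arch)   # default: the same-named client attribute
        if source in client:
            result[arch] = client[source]
    return result

# <wit id=A type=Manuscript>, plus the #FIXED attributes from map-textcrit.dtd.
wit = {"id": "A", "type": "Manuscript", "cellar": "object",
       "attrName": "siglum", "attrType": "String", "contentAttr": "description",
       "cellarNames": "class type attrValue id"}
print(arch_attributes(wit, wit["cellarNames"], OBJECT_ATTRS))
```

The result gives class="Manuscript", id="A", attrValue="A", attrName="siglum", attrType="String" and contentAttr="description", matching the <object> generated for witness A.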
Before performing the final step of automatic translation, the client document instance must be changed to use the modified DTD defined in section 3.3.1. That is,
<!DOCTYPE TEI.2 SYSTEM "my-textcrit.dtd">
<TEI.2>
<!-- the content is as in section 3.1 -->
</TEI.2>
The final step is to run the architecture engine, which translates the client document instance into an architectural document instance. The parsers in the SP family [Cla97] can do this. For instance, the following command line
nsgmls -Amapping clement.sgm
applies only the mapping architecture and produces output that adds the architectural attributes to the original document instance. The command line
nsgmls -Amapping -Acellar clement.sgm
applies the cellar architecture as well and performs the translation of the client document instance into the corresponding document that uses the object markup system of the CELLAR architecture.
For instance, performing this translation step on the sample Clement text (from section 3.1) yields a document like the following (note that most of the content is elided to avoid excessive detail):
<object class="CriticalText">
  <attr contentAttr="title" pcdataClass="String">
    2 Clement, chapter 7</attr>
  <attr contentAttr="authorities">
    <object class="Manuscript" id="A" contentAttr="description"
        attrName="siglum" attrType="String" attrValue="A"
        pcdataClass="String">
      Codex Alexandrinus
      <attr contentAttr="source" pcdataClass="String">
        A Greek uncial of the fifth century. . .
      </attr>
    </object>
    <!-- The other six authorities -->
  </attr>
  <attr contentAttr="body">
    <!-- The CriticalTextChapter and its contents -->
  </attr>
</object>
The last step of the overall import is to run a method of the CELLAR system that invokes a data input parser to convert the architectural document instance into the corresponding structure of objects. The input to the CELLAR parser is the ESIS output of the nsgmls parser. At the heart of the implementation is a recursive function of 125 lines that processes one element at a time from the ESIS stream. This function relies on another 125 lines of code in smaller supporting functions. The source code for this parser is listed in full and explained in an electronic working paper [Sim97b].
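The core of such an importer can be sketched in Python. This is a hypothetical, much simplified reimplementation, not the CELLAR code described above: it reads the ESIS that nsgmls writes for the architectural document, turns <object>, <attr> and <ignore> elements into nested dictionaries, and resolves IDREFS values into pointers once the whole document has been read. It handles only attribute, start-tag, end-tag and data records, decodes only the backslash, \n and octal \nnn data escapes, ignores the encoding attribute, and lowercases names because nsgmls may report them in upper case under the default SGML declaration.

```python
import re

def unescape(data):
    # ESIS data escapes: a doubled backslash stands for one backslash, \n is a
    # record end, \| delimits SDATA (dropped here), \nnn is an octal code.
    def repl(m):
        s = m.group(1)
        if s == "\\":
            return "\\"
        if s == "n":
            return "\n"
        if s == "|":
            return ""
        return chr(int(s, 8))
    return re.sub(r"\\(\\|n|\||[0-7]{3})", repl, data)

def events(lines):
    """Yield ("start", gi, attrs), ("end", gi, {}) and ("data", text, {})."""
    attrs = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":                      # "Aname TYPE value" precedes "(gi"
            name, _, spec = rest.partition(" ")
            atype, _, value = spec.partition(" ")
            if atype != "IMPLIED":
                attrs[name.lower()] = unescape(value)
        elif code == "(":
            yield ("start", rest.lower(), attrs)
            attrs = {}
        elif code == ")":
            yield ("end", rest.lower(), {})
        elif code == "-":
            yield ("data", unescape(rest), {})
        # Other ESIS records (C, ?, L, ...) are ignored in this sketch.

def build(it, form, a, obj, attr, pcdata, ids):
    """Consume events up to the end of the current element, building objects."""
    if form == "object":
        new = {"class": a["class"]}
        obj.setdefault(a.get("parentattr", attr), []).append(new)
        if "id" in a:
            ids[a["id"]] = new
        if "attrname" in a and "attrvalue" in a:
            atype = a.get("attrtype", "String")
            if atype == "IDREFS":            # keep the IDs; resolved after import
                new[a["attrname"]] = [("ref", t) for t in a["attrvalue"].split()]
            else:
                new[a["attrname"]] = [{"class": atype, "value": a["attrvalue"]}]
        obj, attr = new, a.get("contentattr")
        pcdata = a.get("pcdataclass", "String")
    elif form == "attr":
        attr, pcdata = a.get("contentattr", attr), a.get("pcdataclass", "String")
    # "ignore" (and any unknown form) keeps the enclosing context.
    for kind, value, child_attrs in it:
        if kind == "end":
            return
        if kind == "start":
            build(it, value, child_attrs, obj, attr, pcdata, ids)
        elif value.strip():
            obj.setdefault(attr, []).append({"class": pcdata, "value": value.strip()})

def resolve(node, ids):
    """Replace ("ref", id) placeholders with the objects that carry those IDs."""
    for key, values in node.items():
        if isinstance(values, list):
            node[key] = [ids[v[1]] if isinstance(v, tuple) else v for v in values]
            for v in values:                 # descend into contained objects only
                if isinstance(v, dict):
                    resolve(v, ids)

def import_esis(text):
    """Build the object tree for one architectural document given as ESIS text."""
    it = events(text.splitlines())
    holder, ids = {}, {}                     # holder is a dummy parent object
    for kind, gi, a in it:
        if kind == "start":
            build(it, gi, a, holder, "result", "String", ids)
            break
    root = holder["result"][0]
    resolve(root, ids)
    return root
```

Deferring IDREFS to resolve() lets a reading point to a witness that occurs anywhere in the file. A call might look like import_esis(open("clement.esis").read()), where clement.esis is a hypothetical file holding the output of nsgmls -Amapping -Acellar clement.sgm.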
The CELLAR architecture that has been implemented is actually richer than what is presented above. It also handles cases where:
The full architecture, its implementation, and a number of complete examples (including all the files needed to run them) are presented in an electronic working paper [Sim97b].
The results to date have been promising. The goal of developing a general solution to the problem of importing SGML data into an existing object database schema has been achieved. Because the method permits superfluous markup to be ignored and unmappable elements to be discarded altogether, it is always possible to translate an SGML file into a structure of objects in the database. The usefulness of the result depends on the degree of congruence between the conceptual model underlying the SGML markup of the source data and the schema of the target object database.
I am deeply indebted to my colleague Robin Cover who has helped in many ways over the course of this project. He has gone the extra mile in helping me to find resources and in offering useful feedback and encouragement.
[Bor85] Borgida, A. (1985) Features of languages for the development of information systems at the conceptual level. IEEE Software 2(1): 63-72.
[Cat97] Cattell, R.G.G., et al. (1997) The Object Database Standard 2.0. San Francisco: Morgan Kaufmann.
[Cla97] Clark, J. (1997) SP: An SGML System Conforming to International Standard ISO 8879 -- Standard Generalized Markup Language, version 1.2. <http://jclark.com/sp/>. See especially "Architectural form processing," <http://jclark.com/sp/archform.htm>.
[Cov97] Cover, R. (1997) Architectural Forms and SGML Architectures, in The SGML/XML Web Page. <http://www.sil.org/sgml/topics.html#archForms>.
[DD94] DeRose, S. and Durand, D. (1994) Making Hypermedia Work: A User's Guide to HyTime. Boston: Kluwer Academic Publishers. See especially pages 79-90.
[ISO92] International Organization for Standardization. (1992) ISO/IEC 10744. Hypermedia/Time-based Structuring Language: HyTime.
[ISO97] International Organization for Standardization. (1997) Architectural Form Definition Requirements (AFDR), Annex A.3 of ISO/IEC N1920, Information Processing--Hypermedia/Time-based Structuring Language (HyTime), Second edition 1997-08-01. <http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.3.html>.
[RST93] Rettig, M., Simons, G., and Thomson, J. (1993) Extended Objects. Communications of the ACM 36(8):19-24.
[Sim97a] Simons, G. (1997) Conceptual modeling versus visual modeling: a technological key to building consensus. Computers and the Humanities 30(4):303-319.
[Sim97b] Simons, G. (1997) Importing SGML data into CELLAR by means of architectural forms. <http://www.sil.org/cellar/import/>.
[Sim97c] Simons, G. (1997) Using architectural forms to map SGML data into an object-oriented database, in Proceedings of SGML/XML '97, Washington, D. C., 8-11 December 1997. See <http://www.gca.org/conf/sgml97/> for conference information.
[ST97] Simons, G., and Thomson, J. (in press) Multilingual data processing in the CELLAR environment. To appear in John Nerbonne (ed.), Linguistic Databases. Stanford, CA: Center for the Study of Language and Information. (The original working paper is available at <http://www.sil.org/cellar/mlingdp/mlingdp.html>.)
[TEI94] Sperberg-McQueen, C. M. and Burnard, L. (1994) Guidelines for Electronic Text Encoding and Interchange. Chicago and Oxford: Text Encoding Initiative.
Last Modified: Wednesday, 19-Nov-97 15:26:57 EST
This page is hosted by the STG.