SGML: MLA '94 Trip Report (Sperberg-McQueen)

Date:         Tue, 3 Jan 1995 16:40:55 CST
Reply-To: "C. M. Sperberg-McQueen" <U35395%UICVM.bitnet@UTARLVM1.UTA.EDU>
Sender: Text Encoding Initiative public discussion list
From: "C. M. Sperberg-McQueen" <U35395%UICVM.bitnet@UTARLVM1.UTA.EDU>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject:      trip report:  MLA '94
To: Multiple recipients of list TEI-L <TEI-L%UICVM.bitnet@UTARLVM1.UTA.EDU>

                              Trip Report
                           MLA '94, San Diego

                         C. M. Sperberg-McQueen
                             December 1994

   The annual convention of the Modern Language Association was held
this year in San Diego, California, between Christmas and New Year's (as
always), and I was able to attend not quite two days of it.  I am told
that the attendance at MLA is lower when the site is in California rath-
er than New York, Chicago, or other eastern locations, which only goes
to show that even professors of language and literature, possessed of
degrees and academic positions, can be very foolish.  San Diego (need it
be said?) was gorgeous, and I now regret not having extended my stay for
another day. When I arrive back in Chicago, where it has apparently
decided to become winter again, I may regret not having extended my stay
for another four months.

   Arriving at the end of the second day of the conference, I was not
able to attend all of the sessions I would have liked, but the sessions
I did manage to get to made this, for me, one of the more pleasant and
informative MLA conferences I can remember.  Here, for what they are
worth, are some notes on the conference and brief paraphrases of those
papers which struck me most (or at least of those which moved me to take
the most coherent notes) and which seem most likely to be of interest to
readers of this list.

   In an unusual fit of industry, I spent some time on the plane to San
Diego reading the publishers' advertisements in the conference issue of
PMLA and noting down things I wanted to look at.  As a result, when I
went to the book exhibits, I had some things to look for, and the four
acres or so of exhibits were a bit less intimidating than they usually
are. (Of course, I did not see everything I wanted to; still less did I
browse the other stands at random, as I had planned.)  At the Chadwyck-
Healey booth, I saw one of the few computers in evidence (do you remem-
ber when IBM exhibited at the MLA?), but the CD-ROMs had been shipped
separately from the machine, with the results predicted by Murphy's Law,
so I did not actually get to see the current state of the Patrologia
Latina or the English Poetry Database.  I did have a pleasant conversa-
tion with Doug Roesemann, who says the PL and EP are doing very well,
thankyouverymuch, and I did pick up some interesting brochures for sev-
eral of their increasingly numerous other SGML-based full-text products,
including a collection of English verse drama, a Database of African-
American Poetry 1760-1900, a collection of various editions and adapta-
tions of Shakespeare, and a similar collection of thirteen different
editions of the Bible in English.  At the Stanford University Press
booth I found a book on the Germanic dialects written by my old teacher
of historical linguistics, Rob Robinson, which became my bedtime reading
for the conference (a bad choice, perhaps:  it's very interesting and
didn't put me to sleep).

   At the University of Michigan Press, a new book on German editorial
theory edited by Hans Walter Gabler, George Bornstein, and Gillian Bor-
land Pierce turned out not to have appeared yet.  But the Michigan bro-
chure had a nice page on the Society for Early English and Norse Elec-
tronic Texts (SEENET), a subscription series organized by Hoyt Duggan of
Virginia and edited by him and Thorlac Turville-Petre of Nottingham, on
lines similar to those of the Early English Text Society, which has done
so much for the publication of medieval English texts.  The first item,
a diplomatic transcript of one manuscript (Corpus Christi 201) of Lang-
land's PIERS PLOWMAN, will be out within a few months; I was pleased to
see it advertised as a TEI-conformant document. At Michigan I also found
Richard Bailey's book IMAGES OF ENGLISH, a very interesting survey of
attitudes toward English over the past fifteen hundred years or so.  And
finally, at the University Press of Virginia, I saw and bought a recent
reprint of Jerome McGann's CRITIQUE OF TEXTUAL CRITICISM.  This seems
particularly appropriate since for me the central theme of this MLA
seems to have been the electronic scholarly edition:  how to create it,
how to disseminate it, what it ought to be like.

   On the evening of Wednesday, the 28th of December, after seeing the
book exhibits, I got to a session on computers and Germanic philology.
James Marchand of the University of Illinois at Urbana/Champaign began
the session with a characteristically enthusiastic and diverting survey
of the resources and tools available for the philologist interested in
computers, covering (I abbreviate heavily) discussion lists, texts for
the asking, ftp, character encoding problems, fonts, Unicode, Archie,
library catalogues, HyTelnet, Gopher, Mosaic, and the future.  "You can
know more about these texts than your teachers did; you can know more
about them than Hermann Paul did."  (Note for non-Germanists:  Hermann
Paul, a prominent Germanic philologist of the late nineteenth century,
knew everything there was to know about the older Germanic dialects and
their extant texts.)  He was followed by Geoffrey B. Muckenhirn, also
from Urbana, who described in simple, coherent terms the difficulties
facing academics who hope to collaborate with each other over the
network--in particular, some of the character-set disasters lying in
wait for the unwary.  Antonette diPaolo Healey, the editor of the Dic-
tionary of Old English at Toronto, then gave an illuminating and lively
discussion of the technical setup of the Dictionary project at the
present, and of its plans for the future.  She described the publication
of the DOE corpus, pointing out with fully justified pride that the
Dictionary of Old English is the first scholarly dictionary in history
to publish the entirety of its citation base, so that reviewers and
users of the dictionary have access to the same body of primary informa-
tion as the lexicographers.  She described how through the efforts of
Lou Burnard at Oxford the DOE corpus had become one of the first major
electronic resources to be made TEI-conformant, and discussed the vari-
ous forms in which the corpus is now available.  After an overview of
the in-house tools used by the Dictionary staff, she concluded by
outlining the project's plans for the future, in particular their desire to
move from their current typesetting-based system of text markup toward a
structural markup of the dictionary entries based on the TEI encoding
scheme.  This, as one might imagine, made me very happy indeed.  Randall
Jones of Brigham Young University concluded the session with a report on
the BYU Spoken Corpus of Modern German.  Modeled on the Basic German
materials collected by J. Alan Pfeffer in 1960, the BYU corpus includes
400 interviews (about 80 hours) with native German speakers, performed
at over sixty locations throughout western and eastern Germany, Switzer-
land, and Austria.  Mr. Jones outlined the goals of the project and
described briefly some of the more startling results, e.g. that in
almost half of its occurrences, the conjunction WEIL introduces a
coordinate rather than a subordinate clause.  The interviews are
transcribed in orthographic form, using WordCruncher markup.
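
   Mr. Muckenhirn's character-set warnings are easy to demonstrate.  The
Python sketch below is only an illustration, and its details are
assumptions rather than anything from his talk: the Old English sample
phrase, a sender who stores Latin-1 bytes, and a reader whose software
assumes IBM PC code page 437.  It shows bytes from one 8-bit code page
silently turning into unrelated symbols in another, and one common
defense: replacing non-ASCII characters with SGML entity references, which
survive any ASCII-clean channel.  The helper `to_entities` is
hypothetical; it borrows Python's HTML entity table, whose Latin-1 names
were taken over from the ISO 8879 ISOlat1 entity set.

```python
from html.entities import codepoint2name

# Hypothetical Old English sample; every character fits in ISO 8859-1 (Latin-1).
original = "Þæt wæs gód cyning"

# Assumed scenario: the sender stores Latin-1 bytes, the reader's software
# decodes them as IBM PC code page 437.  Nothing fails; the text is just wrong.
wire_bytes = original.encode("latin-1")
garbled = wire_bytes.decode("cp437")


def to_entities(text: str) -> str:
    """Replace each non-ASCII character with an SGML entity reference.

    Named references come from Python's HTML table, whose Latin-1 names
    (THORN, aelig, oacute, ...) match ISO 8879's ISOlat1 set; anything else
    falls back to a numeric character reference.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")
        else:
            out.append("&#" + str(cp) + ";")
    return "".join(out)


portable = to_entities(original)

print(ascii(garbled))  # escaped, so printing works on any terminal
print(portable)        # &THORN;&aelig;t w&aelig;s g&oacute;d cyning
```

The entity form is clumsy to read, but unlike the raw 8-bit bytes it means
the same thing on every machine that receives it.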

   The next day, I attended among other sessions a discussion of a draft
statement on the importance of preserving research materials in original
forms.  A committee chaired by G. Thomas Tanselle has drafted this
statement in an attempt to make clear that reproductions of existing
material in new forms (in particular in microfilm or in electronic form)
do not mean the original material has no further interest for scholars.
Libraries or others who discard original materials once they are micro-
filmed or digitized are thus discarding potentially significant sources
of information regarding our cultural heritage.  There was substantial
agreement that a statement of some sort was needed, though a fair amount
of niggling over the details of content and tone.  Copies of the draft
statement may be obtained from the MLA; comments should go to Thomas
Tanselle by the end of February.  It would be useful if more people
knowledgeable about electronic texts were able to read the draft and
comment on it.  In particular, I am lukewarm about the proposal that no
electronic texts should be created which do not include scanned images
of the original pages: even if scanned images are included, the elec-
tronic version is not a full substitute for the original, and if scanned
images are not included, the chances are much smaller that anyone in
possession of their senses will suggest that the original can be
destroyed once the electronic version is created.

   That afternoon, the MLA Committee on Scholarly Editions held the
first of two sessions it organized, on "Practice and Ideal in Electronic
Scholarly Editions". Hoyt Duggan of Virginia described the Piers Plowman
Electronic Archive he is now working on.  We are on the verge, Mr. Dug-
gan suggested, of a major reconceptualization of text, but we are not
yet in a golden age.  Too many electronic editions are transcribed not
from the best available source but from one conveniently out of copy-
right.  Too many are quick and dirty transcriptions for quick and dirty
research; too few reflect the standards which ought to prevail in elec-
tronic as in other types of philological work.  An electronic archive,
Duggan argued, can finally render obsolete the acrid disputes of the
past decades over whether medieval works should be edited from the best
extant manuscript, or from an archetype reconstructed from a stemma of
manuscripts, or whether to emend at all, or whether to do so conserva-
tively or not.  Such debates are not intrinsic to the task of editing,
it turns out, but rooted in the characteristics of print publication, in
which it is economically infeasible to print more than one or two ver-
sions of any sizable text, and which has some difficulty representing
the protean shifts of the medieval manuscript text in clear and vivid
ways.  The editor of an electronic text, unlike the editor bound to
print, need not choose between DOCUMENTATION and RECONSTRUCTION, but may
produce as many documentary and reconstructive
editions as energy and longevity permit.  The Piers Plowman electronic
archive will exploit the opportunities offered by electronic editions to
provide both scanned color images and diplomatic transcriptions of all
54 manuscripts and three early printings of the work, with codicological
and paleographical descriptions, links between the transcriptions and
the images, and eventually manuscript collations, a stemma, and a recon-
structed archetypal text.  "SGML markup allows editors of documentary
editions to have their cake and eat it too," Mr. Duggan said, with par-
ticular reference to the ability of SGML encodings to give both a tran-
scription of an erroneous reading in a manuscript and an emendation
based on other manuscripts.  The archive as a whole will provide a firm-
er foundation than has hitherto been available for text-critical stud-
ies, dialectology, instruction in paleography, study of Middle English
meter, and a whole host of other fields.
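
   The "cake and eat it too" point can be made concrete with a small
Python sketch.  It assumes a hypothetical verse line encoded with a
`<sic>` element that records the manuscript reading and carries the
emendation in a `corr` attribute, loosely following TEI P3 practice.  The
scribal error shown ("sone" for "sonne") is invented for illustration, and
the `render` function is not taken from the Piers Plowman archive.  A
single encoding yields either a diplomatic or an emended reading on demand.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: the error "sone" is invented for illustration.
# The <sic corr="..."> pattern loosely follows TEI P3 practice.
LINE = '<l n="1">In a somer seson whan softe was the <sic corr="sonne">sone</sic></l>'


def render(xml_text: str, view: str) -> str:
    """Return the 'diplomatic' or the 'emended' reading of one encoded line.

    Only direct children of the line are handled; nested markup would
    need a recursive walk.
    """
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "sic" and view == "emended":
            parts.append(child.get("corr", child.text or ""))
        else:
            parts.append(child.text or "")
        parts.append(child.tail or "")
    return "".join(parts)


print(render(LINE, "diplomatic"))  # In a somer seson whan softe was the sone
print(render(LINE, "emended"))     # In a somer seson whan softe was the sonne
```

Neither reading is privileged in the encoding itself; the choice is made
at display time.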

   Peter Robinson, of Oxford University, then described the Canterbury
Tales project at Oxford, which similarly intends to make that work
available in transcriptions and scanned images of every manuscript wit-
ness, together with collations and secondary materials.  The first
CD-ROM (of a projected series of thirty) should appear in 1995, with the
Wife of Bath's Prologue in diplomatic transcripts and images of all 58
manuscripts, collations, a database of variant spellings, a bibliography
of secondary literature, and new descriptions of all manuscripts based
on fresh inspections.  The images on this CD-ROM are derived from exist-
ing black and white microfilms, Mr. Robinson said, and are intended pri-
marily to allow readers to assess the accuracy of the transcriptions,
not for use with image enhancement programs or other similar digital
magic.  (The next CD-ROM, by contrast, will contain new full-color
(24-bit) scanned images of a single manuscript.)  "Encoding is vital:
encoding is what makes this edition possible," Mr. Robinson said; as
many as three-quarters of the bytes on the CD-ROM will be markup rather
than "content".  The density of information represented by such markup
makes encoding on this scale expensive; Mr. Robinson estimated that
costs of creating the CD-ROM might run as high as forty dollars per page
of original material, but I did not follow all of his arithmetic.  One
consequence of such a cost is that editions of this type must be
collaborative: "The age of the heroic individual editor is over."  The
logistics alone can be a challenge; in this project, Cambridge University
Press has played a large role, and the experience has shown
that the difference between an individual user's word-processing file
and a publishable CD-ROM is every bit as great as that between a type-
script made on a manual typewriter and a printed letterpress book.  He
concluded by observing ironically that we have spent six centuries disa-
greeing over how to read a single text of the Canterbury Tales; now we
will have fifty-eight texts to read and disagree over.
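
   How markup can come to outweigh "content" is easy to see with a rough
measure.  This Python sketch is an illustration under stated assumptions,
not Mr. Robinson's method: `markup_share` is a hypothetical helper that
counts the UTF-8 bytes inside start and end tags (ignoring entity
references, comments, and marked sections), and the word-by-word tagging
of the sample line, using TEI-style `<w>` and `<c>` elements, is invented.
Even this light tagging accounts for more than half the bytes.

```python
import re

# An SGML start or end tag (comments, marked sections, and entity
# references are not counted as markup here).
TAG = re.compile(r"<[^>]*>")


def markup_share(encoded: str) -> float:
    """Return the fraction of UTF-8 bytes that belong to tags."""
    total = len(encoded.encode("utf-8"))
    if total == 0:
        return 0.0
    tag_bytes = sum(len(m.group().encode("utf-8")) for m in TAG.finditer(encoded))
    return tag_bytes / total


# Sample: first line of the Wife of Bath's Prologue, with hypothetical
# word-level tagging.
SAMPLE = ('<l n="1"><w>Experience</w><c>,</c> <w>though</w> '
          '<w>noon</w> <w>auctoritee</w></l>')

print(f"{markup_share(SAMPLE):.0%} of the sample is markup")  # 59% of the sample is markup
```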

   Ian Lancashire, of Toronto, then reported on his experiences creating
electronic versions of Renaissance English texts, for use in undergradu-
ate education, for the study of Renaissance word meanings and lexicogra-
phy, and for eventual integration into a hypertextually connected
knowledge-base of the period.  He provided a usefully concrete view of
one practitioner's methods of dealing with markup in practice, with an
extensive handout of examples showing, among other things, a sample text
in his private data-capture format (designed for simple keyboarding), a
sed script for translating that data-capture format into SGML, the sam-
ple text in SGML after translation, and a Perl program to insert line
numbers in the appropriate SGML attributes of a poem.  Other portions of
the handout show the same text in various SGML and non-SGML guises into
which it can be translated by software.  His work with Renaissance Eng-
lish texts has forced him, Mr. Lancashire said, to extend the TEI encod-
ing scheme in two ways.  First, he has developed new tags to provide a
fuller record of the physical structure of a book, and second, he has
developed conventions for transcribing special characters for which
there are no publicly defined SGML entities.  (I am not entirely certain
that this last need be regarded as an extension of the TEI scheme, but
it does illustrate very well one type of application for which the TEI's
writing system declaration was designed.)  Mr. Lancashire ended his talk
by describing some specific ways in which his work on electronic
editions had led him to new understandings of Renaissance English texts
which he would otherwise not have gained:  in developing his conventions
for character encoding, he found that Renaissance English writers
disagree even over the letters of the alphabet, and have in fact no
clear concept of an alphabet, as distinct from a script.  And his study
of Renaissance English dictionaries has radically changed his views on
Renaissance word meanings.
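
   Mr. Lancashire's line-numbering program was written in Perl; what
follows is only a rough Python analogue, assuming that line numbers belong
in an `n` attribute on each `<l>` start tag and that no line is already
numbered.  The function name `number_lines`, the `id` value, and the use
of the opening of Sonnet 18 in old spelling as sample text are
illustrative choices, not details from his handout.

```python
import itertools
import re

# A verse-line start tag: <l> or <l ...>.  Does not match </l> or <lg>.
START_L = re.compile(r"<l(\s[^>]*)?>")


def number_lines(sgml: str, start: int = 1) -> str:
    """Insert n="..." into every <l> start tag, counting from `start`.

    Assumes no <l> tag already carries an n attribute.
    """
    counter = itertools.count(start)

    def add_n(match):
        attrs = match.group(1) or ""  # existing attributes, with leading space
        return f'<l n="{next(counter)}"{attrs}>'

    return START_L.sub(add_n, sgml)


SONNET = ('<lg type="sonnet">\n'
          '<l>Shall I compare thee to a Summers day?</l>\n'
          '<l id="S18.2">Thou art more louely and more temperate:</l>\n'
          '</lg>')

print(number_lines(SONNET))
```

Numbering by program rather than by hand means the numbers can simply be
regenerated whenever lines are added or split during proofreading.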

   In the concluding talk of the session, John Unsworth of Virginia gave
a big-picture account of the cultural context within which the rise of
the network is taking place.  The first half of his talk was devoted to
a loving dissection of some unfortunate writer's lament on the rise of
electronic culture, and to a scorched-earth refutation of that writer's
incredible misapprehension (or was it supposed to have been a mispri-
sion?) of Walter Benjamin's essay on the artwork in the age of its tech-
nological reproducibility. The second half took off from there, and
defies any attempt at paraphrase, at least by me.  The kind of analysis
involved can easily lose all detectable touch with hands-on practical
issues, but Mr. Unsworth seemed to me to keep his high-level cultural
discussion well connected to the day-to-day reality of computer and
network use.  It was a good talk, and a good conclusion to the session.

   The next morning, the Committee on Scholarly Editions had its second
session, entitled simply "Electronic Scholarly Editions".  Unfortunately,
my notes from this session are less complete, so I cannot do the papers
full justice. Charles Faulhaber of Berkeley began his talk on "Before the
Computer:  the Search for Non-Linearity" with a quick overview of the
history of textual computing, from the "paleo-computing" period, in which
the computer was used solely as a tool for the construction of printed
products, to the current period, in which the electronic text is
viewed as a product in itself.  The problem with treating the printed
product of a computing project as primary, he said, is that we have to
give up one of the primary and most valuable characteristics of the
electronic text:  its non-linearity.  We read texts sequentially, or
syntagmatically, but we study them non-sequentially, or paradigmatically.
He then proceeded to an exhilarating review of the development, over
centuries, of methods of easing the burden of research by providing
tools to support non-sequential reading.  He began with the accretion of
glosses on the text of the Bible, led through the development of the
glossa ordinaria to that of the alphabetical index, and eventually to
the creation of a word list, and then a concordance, for the Bible.  He
also reviewed the gradual creation of topically arranged surveys of
Christian teaching (in Peter Lombard's SENTENTIAE) and of canon law (in
Gratian's DECRETUM).  Finally, he discussed the development of concor-
dances at greater length, from the first hand-made concordances to texts
other than the Bible (1787, for Shakespeare) to the first computer-
generated concordances, and on to the present.  It was a beautiful
demonstration of the long, long search for ways to do, in print, things that
are very hard to do in print, and very easy to do with computers.
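
   What took concordance-makers years by hand is, for a computer, a few
lines of work.  Here is a minimal keyword-in-context (KWIC) sketch in
Python, assuming a crude tokenizer (runs of letters and apostrophes) and
case-folded headwords; the function `concordance` and its `width`
parameter are hypothetical.  Real concordance programs also handle
spelling variants, lemmatization, and chapter-and-verse references, all of
which this ignores.

```python
import re
from collections import defaultdict


def concordance(text: str, width: int = 3) -> dict:
    """Map each case-folded word to its (left context, word, right context) hits."""
    words = re.findall(r"[A-Za-z']+", text)  # crude tokenizer: letters and apostrophes
    hits = defaultdict(list)
    for i, word in enumerate(words):
        left = " ".join(words[max(0, i - width):i])
        right = " ".join(words[i + 1:i + 1 + width])
        hits[word.lower()].append((left, word, right))
    return dict(sorted(hits.items()))  # alphabetical, as in a printed concordance


GENESIS = ("In the beginning God created the heaven and the earth. "
           "And the earth was without form, and void; "
           "and darkness was upon the face of the deep.")

for left, word, right in concordance(GENESIS)["earth"]:
    print(f"{left:>20}  [{word}]  {right}")
```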

   Susan Hockey gave the second talk of the session, describing some
problems confronting us as a community as we seek to create the next
generation of editions using computers.  The vast quantities of material
which can be put into computerized archives -- one has only to think
back to the talks on the Piers Plowman and Canterbury Tales projects --
will completely overwhelm us if we do not find strong methods of organ-
izing it and navigating through it.  There is a great deal of intellec-
tual effort involved not just in transcribing texts into electronic form
but in creating the links needed to organize the material, and in
retaining the integrity and authority possessed by good printed edi-
tions.  As John Unsworth had pointed out the day before (quoting Willard
McCarty), many computing projects begin by trying to imitate the
characteristics of print.  The TEI Guidelines may be the most
significant development in humanities computing, in part because
they provide an incremental, extensible system for markup which is NOT
tied to the characteristics of the printed page but works directly with
the characteristics of texts as abstract objects.  Three aspects seemed
to her particularly important:  the TEI header, with its provision of
METADATA describing the electronic document itself; the work done on
diplomatic transcription of manuscripts; and the facilities for analysis
and interpretation of texts.

   On software issues, she warned that new users (and funding agencies)
are often attracted by glitzy interfaces, but that functionality is more
important in the long run, and harder to evaluate quickly.  Mechanisms
for the delivery of electronic texts still require work.  CD-ROMs, in
particular, are not in themselves the answer:  they are a closed system,
typically dependent on one particular piece of software.  Network-based
delivery mechanisms have more promise for the long term, but the rapid
progression of fashions, from Gopher to WAIS to WWW to what-next, indi-
cates persuasively how far we still have to go before having a system
that is satisfactory in itself, and not just better than anything else.

   In the future, Ms. Hockey continued, the largest task for computing
humanists will be the development of high-function software capable of
exploiting our software-independent data, to integrate text with digital
images and with metadata.  She expects that, like TACT and OCP, all
software truly suitable for scholarly research will be developed within
the scholarly community rather than commercially.  Such software needs to go
beyond the string-search mechanisms currently used, to integrate much
fuller knowledge about morphology and the lexicon of the languages being
transcribed: for this, a dynamically extensible lexical database seems
the most necessary resource.  We need prototypes of electronic editions
with appropriate software, with which we can experiment to see what
works and what does not work; they should exploit the long-established
technology necessary to make group annotation work; they should exploit
the TEI's extended-pointer mechanisms to allow references to portions of
documents rather than just to whole files; and they must work with the
authentication mechanisms now being developed for network transactions.

   I gave the concluding talk of the session myself (and now you know
why my notes for the others are spotty).  In it, I first described
briefly the main requirements I think any electronic edition must meet:
accessibility, longevity (electronic editions must not become technical-
ly obsolete before they become intellectually obsolete), and intellectu-
al integrity.  A successful choice of useful software, I argued, is
emphatically not one of the requirements, since electronic editions
ought not to be limited to any one piece of software but should be
designed from the ground up to be software-independent.  After reviewing
the work and results of the TEI, with particular attention to the tags
most relevant to critical and scholarly editions, I concluded with a set
of recommendations for those setting out to create electronic scholarly
editions.  To some, these will seem painfully obvious, but it sometimes
seems that stating the obvious is a very good way to start heated
arguments.

*   Strive for software- and hardware-independence.
*   Distinguish firmly between the intellectual requirements of the
    edition and the requirements for convenient distribution and use of
    the edition.
*   Create the edition in a software- and hardware-independent notation.
    Derive platform-specific versions of that archival form for distri-
    bution when and as necessary.  Never confuse the edition itself with
    the temporary forms it takes for distribution and use.
*   Use documented, publicly defined, non-proprietary formats for the
    archival version of the edition.  At this time, there is no serious
    alternative to SGML for this purpose.  Use proprietary formats only
    for distribution versions.
*   Exploit the ability of the computer to provide multiple views of the
    same material.

   At this point in one of my trip reports, I believe it must be safe to
assume that I am talking only to those people who find reading my trip
reports useful or enjoyable.  If you are one of these, could you do me a
favor?  Drop me a line at saying "Yes, I read the MLA '94
trip report."  I'd like to know whether there is in fact anyone out there
at all.  Thanks.

   After the conclusion of the second session on electronic scholarly
editions, I was just in time to hear the second and third papers in a
session on early German women's writing.  I missed a paper on Hrotsvita
von Gandersheim, but heard most of a talk by Valerie Hotchkiss on
Schwesternbücher, collections of stories about the women in German
convents, which shed a great deal of light both on sex roles and on late
medieval mysticism and devotional practices.  The third paper was an
informative report by Sara Westphal-Wihl on her current research regard-
ing women as owners and collectors of books in the medieval and late
medieval periods.  It was a pleasure to end the conference with talks
which so successfully applied traditional philological virtues to the
treatment of non-traditional problems, and which reminded me so plea-
santly of the reasons we are striving to make computers into serviceable
tools for literary and historical research.  I left the conference feel-
ing an optimism about the study of literature that many other MLA con-
ferences have not given me.

   In the meantime, I have arrived back in Chicago, and reports of win-
ter's arrival prove to have been greatly exaggerated.  It was good to go
to San Diego, and it is good to be back home.  Happy New Year, all!

-C. M. Sperberg-McQueen
 ACH / ACL / ALLC Text Encoding Initiative
 University of Illinois at Chicago

P.S. The concluding paragraph was written before the wind chill dropped
to below zero degrees Fahrenheit.  Winter HAS arrived, and it's hard to
resist the thought that it wouldn't have hurt at all to stay in San Diego
for a while longer!  -CMSMcQ