A Report on the 1994 CETH Summer Seminar

Electronic Texts in the Humanities: Methods and Tools

by Mary Mallery, MLS Candidate at the School of Communication, Information and Library Studies, Rutgers University

The third CETH Summer Seminar, co-sponsored by the Centre for Computing in the Humanities, University of Toronto, was held at Princeton University in the final two weeks of June. The thirty participants came from seven countries: the United States, Spain, Sweden, Canada, Australia, New Zealand, and Hong Kong. They also came from a variety of disciplines and professions: humanities scholarship, computing, publishing, and the library community. For two weeks we shared the facilities at Princeton University and tried to speak each other's language and to see electronic texts from one another's point of view.

In the first week, the two principal instructors, Susan Hockey, the Director of CETH, and Willard McCarty, Assistant Director at the Centre for Computing in the Humanities, University of Toronto, conducted morning lecture/discussion sessions on text preparation and markup. In the afternoons the participants manned the terminals for hands-on work with text-processing applications (MTAS, Micro-OCP and TACT). Gregory Murphy, the CETH Text Systems Manager, Peter Batke, the Humanities Consultant in the Princeton University Instructional and Media Services Group, and Alan Goldberg, Manager of Public Computing Facilities at Princeton, also assisted at these sessions, so a good deal of one-on-one help was available. All hands-on work was based on a text. After a trial session with MTAS, we traced thematic developments in Shakespeare's sonnets with TACT. As Willard noted in his lecture, this is a difficult task, since when a poet speaks of love, for example, he rarely uses the word itself. On another day, building on Susan Hockey's morning introduction to the various types of concordances and the sorting power of Micro-OCP, we practiced manipulating Micro-OCP's output.

Once we understood the basics of computer-assisted text analysis, Robert Hollander, Director of the Dartmouth Dante Project (DDP), demonstrated the DDP database online. He showed us how tracking the interpretation of a slight variation in the language of a text as well-studied as Dante's Divine Comedy can reveal the conventions of literary criticism and also open large questions about how we perceive Dante's satiric and poetic vision today. Lisa Horowitz, a CETH staff member, conducted a tour of the new ARTFL interface on Mosaic and provided all participants with a guide to PhiloLogic, the ARTFL search engine used at the University of Chicago's telnet gateway site.

At the next session, Susan gave a guided tour through the world of corpus linguistics and the special problems posed by these very large databases of textual information. Then C. Michael Sperberg-McQueen, Editor-in-Chief of the Text Encoding Initiative (TEI) and Senior Research Programmer at the University of Illinois at Chicago, ended the week with an introduction to SGML (Standard Generalized Markup Language) and the TEI. With Michael leading us, the group worked together to construct a TEI header and document type definition (DTD) for a particularly thorny example from Gibbon's Decline and Fall of the Roman Empire (chapter 30). The discussion covered how to tag titles, subtitles, footnotes, marginalia, and various Latin and Greek quotations. Two pages of the text took us more than an hour to tag. In this way, we began to understand why TEI P3 (which contains the rules for all kinds of text, including poetry, prose and drama) is 1,300 pages long. The practical, hands-on session featured work with SoftQuad's Author/Editor program.

It was here that we first began to talk about hypertext, a subject that dominated the second week of the seminar. Gregory Murphy gave a talk that provided the segue from SGML to HTML (HyperText Markup Language), and he demonstrated his own Web server. Many of the participants from libraries had come to the seminar intending to set up a Web server or to use the Mosaic interface for a multimedia teaching environment.

Every participant at the seminar received a large binder full of documentation, much of it specially prepared by the instructors for this seminar. In fact, the comprehensiveness of the reference materials that CETH provides is not fully appreciated until after you leave the seminar and can no longer count on Michael to answer any TEI question, Susan at the next terminal to help with problems in Micro-OCP, or Willard as the TACT trouble-shooter.

During the second week of the seminar, Peter Robinson shared his expertise in preparing scholarly electronic editions with Collate 2, a widely used computer collation program that he developed at the Oxford University Centre for Humanities Computing. Later in the day, the participants had the opportunity to work with Collate on their own at the Princeton University Computing Center. As an example of current work in scholarly editing, Karl Uitti, the John N. Woodhull Professor of Modern Languages at Princeton University, demonstrated his Charrette Manuscript Project, in which a team of researchers at Princeton is building a database that documents a thirteenth-century manuscript of Chrétien de Troyes' poem "Lancelot" in both text and digitized images. Peter Robinson also talked about the digitization of images, and on the following day we saw an example from art history, when Kirk Alexander demonstrated Princeton University's "Piero Project," a three-dimensional model of a chapel used to study fresco cycles of the Italian Renaissance.

Electronic dictionaries were the focus of another session. After an introduction by Susan, Willard showed us his work to date on building an online lexicon of Ovid's Metamorphoses with TACT. George Miller, James S. McDonnell Distinguished University Professor of Psychology, Emeritus, and Senior Research Psychologist at Princeton University, presented WordNet, an online lexical database for English organized by semantic relations. Gregory Murphy then walked us through the OED2 (the second edition of the Oxford English Dictionary), with PAT/LECTOR as the search engine.

Then we turned our attention entirely to hypertext. Elli Mylonas, lead project analyst in the Scholarly Technology Group at Brown University, and David Durand, a member of the TEI hypertext committee and co-author of a book on HyTime, gave a joint presentation on hypertext that covered the background and history of the field and surveyed the document models and link models that designers in this emerging field have used to date. Elli Mylonas demonstrated Storyspace and then the Perseus Project, for which she served as managing editor. David Durand introduced HyTime, an international standard (ISO/IEC 10744:1992) that defines an SGML application for hypermedia.

Toward the end of the final week, Susan and Willard shared the podium to present advanced analytical tools that are not commonly available, such as pattern recognizers, lemmatization systems, morphological analyzers, and parsers. The contributions of computational linguistics and artificial intelligence were also discussed. One question that came up was: when providing access to a large number of texts, how do you deal with too much information? The need for resource development was stressed. More consistent implementation of standards such as the TEI is also important, so that project designers can build on one another's accomplishments.

After the seminar participants had presented their projects (see below for a list of the seminar participants), Susan and Willard brought the discussion back to some basic questions: Is humanities computing merely an efficient facilitator of traditional work, or a fundamental component for pursuing new questions? Where do we go from here with software and its applications? How can the machine better assist us in educating the imagination?

There was also plenty of time for socializing. Many in the group dined out together at night; on Sunday the group organized a hike in the Delaware Water Gap; and there was a banquet on the last evening of the seminar. But many evenings were spent in front of a computer. Everyone who came to the seminar brought an electronic-text project that they had already prepared (through scanning or typing). Each night we worked on the presentations of our projects, which were scheduled for the final two days of the seminar. There was such an open exchange of ideas and challenges at the terminals that some participants combined forces to give joint presentations.

Do you walk away from a CETH Summer Seminar speaking a different language than you did when you arrived? At the first gathering of the seminar participants, two people remarked to me that they expected the seminar to change their lives. At the time, I thought they were being picturesque, but the camaraderie that emerges from such a diverse group of people during two weeks of steady, intense work on such difficult and intriguing problems does lead to a transformation of sorts. As Elizabeth Burr, a participant from Rice University, put it, "Willard's work on the Metamorphoses is relevant to more than just Ovid's text." Though many questions remain to be answered, perhaps the most important one is answered by such an affirmative experience: yes, we can work together to ensure that electronic texts and hypermedia are used well in humanities scholarship, with the kind of careful thought that will secure their utility for many years to come.

