REPORT ON CETH SUMMER SEMINAR '94

Lewis M. Barth
Hebrew Union College - Los Angeles

This is a summary report on my experiences at the CETH Summer Seminar (June 19-July 1, 1994) held at Princeton. The report concludes with an assessment of the implications of the seminar for my own work on an edition of Pirkei d'Rabbi Eliezer (PRE) and possibly for other areas of Judaic Studies.

I. BACKGROUND

CETH (Center for Electronic Texts in the Humanities) is a joint center at Rutgers and Princeton. To quote its director, Susan Hockey, CETH has "three major activities: an Inventory of Machine-Readable Texts in the Humanities which is held on RLIN [Research Libraries Information Network], the establishment of focused collections of scholarly texts for access over the Internet via appropriate software, and an annual summer seminar on methods and tools for electronic texts in the humanities." The Center was funded in 1993 by the NEH "to act as a national focus for the scholarly use of electronic texts in the humanities within the USA."

The seminar was taught by Susan Hockey, Director of the Center for Electronic Texts in the Humanities; Willard McCarty, Assistant Director, Centre for Computing in the Humanities, University of Toronto, and the founding editor of Humanist; C. M. Sperberg-McQueen, editor in chief of the Text Encoding Initiative; Peter Robinson, Executive Officer for the Canterbury Tales Project, and developer of COLLATE; Elli Mylonas, Research Associate in Classics at Harvard University, and former Managing Editor of the Perseus Project; David Durand, Ph.D. candidate in Computer Science at Boston University and an active member of the Text Encoding Initiative and the HyTime committee (hypertext); plus members of the staff of Computing and Information Technology, Princeton.

Thirty participants were selected for this summer's seminar. We represented a wide diversity of backgrounds, scholarly interests and computing skills. Participants came from colleges, universities and research institutes in Hong Kong, New Zealand, Canada, Spain and Sweden, in addition to the largest group from the United States. The participants had e-text projects in the following natural (human!) languages: Chinese, English, French, Galician, Greek, Hebrew, Medieval English/French, Slavic languages, Spanish, and Swedish. Among them were text scholars working on critical editions as well as librarians responsible for electronic text archives at their universities. The group included graduate students and scholars at all levels of the academic ladder.

The program consisted of presentations on issues relating to electronic texts, demonstrations of software tools and humanities databases, and hands-on time learning the software and applying it to one's own project. Among the areas covered: simple to complex concordancing, optical scanning, text encoding, stylistic comparisons and authorship studies using concordance tools, introductions to the SGML (Standard Generalized Markup Language), TEI (Text Encoding Initiative) and HTML markup languages, and tools for preparing critical editions (using COLLATE). In addition, there was some discussion of new techniques for the digitization of primary source materials and the preparation of hypertexts. Among the projects demonstrated: the Canterbury Tales Project, the Dartmouth Dante Project (a database of the Divine Comedy and commentaries), the Charrette Project (a database of a medieval French manuscript tradition), WordNet (a lexical database of English), and Perseus (the CD-ROM of classical Greek civilization). There was also considerable discussion of the value of making e-material (texts, critical editions, images, etc.) available for scholarly research on the World Wide Web [WWW].

The program concluded with nearly two days of reports by the participants on the implications of electronic media for their various long-range projects. A brief sample of the projects: the Papers of Elizabeth Cady Stanton and Susan B. Anthony, the Corpus of Modern Swedish, the Writings of Charles S. Peirce, an e-text archive of Chinese classics, a legal database of human rights documents, a critical edition of the Anglo-Norman Fables of Marie de France, the encoding of modern written Galician (N.W. Spain) texts, and an electronic edition of Don Quixote. My own presentation was on issues relating to a critical edition of PRE, an eighth(?) century C.E. narrative midrashic text. There was also a report on the development of electronic archives at Harvard and Yale.

I think it is fair to say that most participants engaged in textual projects or working with large linguistic corpora expressed a growing interest in, or commitment to, the use of TEI markup and tagging for their documents.

II. ASSESSMENT FOR MY OWN PROJECT AND POSSIBLY OTHER AREAS IN JUDAIC STUDIES

I went to the seminar with two goals: 1) to learn how to set up files of PRE for COLLATE in order to collate one chapter across sixteen manuscripts as a test project, and 2) to learn more about TEI in order to determine whether it would be a useful markup language for the texts I was preparing. In fact, the seminar opened up far more for me than I had anticipated. By the end of the two weeks I felt that I had been exposed to a number of areas of electronic text manipulation which could be useful or vital for work on a critical edition regardless of natural language.

Naturally, not all topics were dealt with in the same depth or detail. Nevertheless, I became convinced that the electronic encoding of all (relevant) manuscripts and witnesses of a particular work would not only provide the basis for a critical edition (more on that in a moment) but could also serve as a database for all future scholarly research on that text. The Dante Project and the Canterbury Tales Project, for example, demonstrate the value of having all relevant materials available online for the scholarly community. It is easy to imagine the importance of having the same type of resource for any significant Hebrew text.

How does this relate to the concept of a "critical edition"? Such electronic databases of manuscripts may be the raw materials for a critical edition in the usual understanding of that term, or they may make such printed editions obsolete. The potential here is for any scholar to view graphics files of the manuscripts (far better than microfilms or Xerox copies), to see and evaluate the text editor's transcription on screen next to the digitized image of the manuscript, to click on a particular word and immediately see the variants from all other manuscripts, to click on a variant and call up the graphic image of the manuscript and the transcription from which the variant comes, and so on. By building the proper links it is possible to encode references to parallel passages, and also to make available significant comments on a particular passage. Further, if a text is thoroughly marked up, then with the appropriate software one can do word searches, concordancing on the fly, and a variety of other tasks which provide the basis for the evaluation and interpretation of a text.
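
To make this concrete, here is a minimal sketch of how a single point of variation might be encoded with the TEI's critical apparatus tags (the sigla, readings and witness descriptions below are invented for illustration; they are not from PRE):

    <!-- witnesses are declared once, in the front matter -->
    <witList>
      <witness sigil="A">MS A (hypothetical witness)</witness>
      <witness sigil="B">MS B (hypothetical witness)</witness>
    </witList>

    <!-- at the point of variation in the running text -->
    <app>
      <lem wit="A">reading of the base text</lem>
      <rdg wit="B">variant reading</rdg>
    </app>

Once witnesses and readings are tagged in this way, software can assemble an apparatus automatically, display the variants for a word the reader clicks on, or follow the sigla back to the source transcriptions and images.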

Is this possible to do with Hebrew and associated texts? Some of it has been done for a long time. I'm thinking of the Lieberman Talmud Project, the various CD-ROM programs (ATM, Davka, Sh"ut = Responsa), the CAL project for Aramaic texts, as well as numerous resources (scholarly and popular) for the study of the Hebrew Bible, LXX and New Testament. There also exist extensive electronic databases of Hebrew texts in Israel, at the Academy of the Hebrew Language and at Bar Ilan University, and there may be more. In addition, there are now the exceptionally important printed editions and concordances of Hekhalot literature and the Talmud Yerushalmi produced by Peter Schaefer and his colleagues and students in Germany using TUSTEP, as well as Gottfried Reeg's edition of the texts of the Ten Martyrs legend. Some of what I have listed is available commercially, some not. Some is available for research purposes on an individual basis, some not. My purpose in lumping these items together is merely to indicate that a significant body of Hebrew electronic texts already exists. To my knowledge none of this extensive material is available online in the way the Dante Project is. Reasonable questions, of course, are: should it be available on the network, and should there be costs?

Incidentally, using Hebrew Kermit (available at no cost via ftp), it is now possible for an individual scholar anywhere to access the catalogues of all major libraries in Israel and do bibliographical searches in English or Hebrew characters. This is a very important contribution to Jewish studies internationally. In addition, with Hebrew Kermit it is possible to visit the Jerusalem Mosaic on WWW, though not to view graphics, etc. I assume that people who can access WWW through Mosaic can do that and more. All of which suggests that the capacity exists to make transcriptions and graphic images of Hebrew manuscripts available through WWW. However, I'm not aware of (and would be delighted to hear about) projects for the development of online databases of Hebrew manuscripts accessible for research in this way.

For me, one of the most significant aspects of presenting manuscript material electronically is the open-ended nature of the medium and the potential for teamwork. First, there is a typical difficulty for anyone involved in the textual editing of ancient or medieval materials: a critical edition is no sooner published than someone discovers another manuscript which alters our understanding of the tradition. In the electronic medium this poses no problem: the new manuscript need only be transcribed, linked up and entered into the electronic database. Second, transcriptions of manuscripts can be revised and updated, with credit given to contributors through an appropriate revision history on screen. Finally, there is no need to worry that the mortality of one editor will leave his or her work unusable for others. Because of standardized encoding procedures it is possible to build on e-texts previously produced, improve them, and continue the process of textual criticism as well as interpretation.

The potential elegance and ready accessibility of the end product should not mask the difficulty or time commitment involved in preparing electronic texts. Readily available and sharable DTDs (Document Type Definitions) and headers for Hebrew texts with TEI and HTML markup need to be prepared so that we do not all have to reinvent the wheel. Some such HTML DTDs must already exist, since it is possible to access Hebrew documents from Israel on WWW, as previously noted. (Note: the standard for network graphic presentation of Hebrew is ISO 8859-8 [Latin/Hebrew], which is widely accepted in Israel and Europe and is defined for TEI Writing System Declarations to be used in transliteration along with other ISO alphabet standards.)
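
As a sketch of what a shared starting point might look like, here is a skeletal TEI header of the kind such a DTD could be distributed with (the element names are those of the TEI Guidelines; the title, descriptions and language identifier are placeholders of my own, not part of any existing DTD for Hebrew):

    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title>Pirkei d'Rabbi Eliezer: a sample electronic transcription</title>
        </titleStmt>
        <publicationStmt>
          <p>Draft for scholarly circulation.</p>
        </publicationStmt>
        <sourceDesc>
          <p>Transcribed from manuscript [siglum to be supplied].</p>
        </sourceDesc>
      </fileDesc>
      <profileDesc>
        <langUsage>
          <language id="HE">Hebrew</language>
        </langUsage>
      </profileDesc>
    </teiHeader>

In the TEI scheme the language identifier ("HE" above) is the hook to which a Writing System Declaration is attached, so a single well-made declaration for Hebrew could be shared by every project using such a header.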

For the individual scholar preparing e-texts, it is relatively easy but very time-consuming to convert Hebrew texts created on a standard Hebrew or multilingual word processor from upper ASCII to lower ASCII for portability between PC, Macintosh and UNIX environments. Such conversion is especially important in preparing texts for processing by COLLATE, which at present runs only on the Macintosh. One can convert the results of COLLATE processing back to upper ASCII for importing into Hebrew word processors, but that takes time as well. And in every conversion there is the potential for loss of information.
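
As a concrete illustration, assuming the byte values of ISO 8859-8 for the upper-ASCII form and a Michigan-Claremont-style transliteration (one common lower-ASCII convention for Hebrew) for the portable form:

    upper ASCII (one 8-bit byte per letter): bet resh alef shin yod tav, "bereshit"
                                             (bytes E1 F8 E0 F9 E9 FA)
    lower ASCII (7-bit, portable anywhere):  BR)$YT

The mapping must be strictly one-to-one and reversible in both directions, so that nothing is lost when the file is converted back for display in a Hebrew word processor.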

Consequently, it is desirable for word-processing tools to be created that do all this: working in Hebrew characters right-to-left, with a simple transformation (a keystroke or two) to lower-ASCII left-to-right when necessary for various types of processing or network transmission. Perhaps such software already exists, but it needs to be in the public domain and available to scholars and students on any platform. This is not a criticism of some very good commercial Hebrew word processors, but a reflection of the special needs of scholars using electronic media. As an example of the issues involved: will it be possible to use Author/Editor, an SGML-aware editor, for right-to-left languages, with clear explanations of how this is to be done? (Note: TUSTEP provides a very rich programming environment for doing much of this, but it requires very extensive instruction to master its commands and is designed for eventual printing rather than for making SGML/TEI-conformant e-texts available over the network.)

As one participant concluded, the CETH Summer Seminar raised a great many issues and forced all of us to reflect on, if not reevaluate, how we intend to develop our projects. I still have many questions about the implications of what I learned for Hebrew text processing for research purposes. For me, and perhaps for many who use computers all the time but are not technically trained in computer programming, this type of seminar was invaluable.


Please send comments to: lbarth@bcf.usc.edu