Computers & Texts No. 13, December 1996
The University of Western Australia Library
The question of how to provide institution-wide access to large text databases, such as those published by Chadwyck-Healey Ltd, has become particularly significant. This article describes one solution and details some of the associated issues and problems.
The Scholars' Centre in The University of Western Australia Library has recently established an Electronic Text Service for academic staff and students in the humanities. The vehicle for this service is the DynaWeb software of Electronic Book Technologies, Inc., which enables large full-text databases to be accessed by Web browsers across the campus network. Magnetic tapes of databases published by Chadwyck-Healey Ltd have been loaded onto a central server and made available through DynaWeb. This process has not been without its difficulties, but the results are very encouraging for the provision of large-scale humanities databases.
The Scholars' Centre was established in 1994 as a unit within the University of Western Australia Library providing a range of services to academic staff, postgraduates and honours students in the humanities and social sciences. These services include document delivery and inter-library loan, the management of the Library's Special Collections and Microform Collection, and administration of a large study area with personal study rooms and desks. But an area of major interest has also been the application of electronic techniques to the process of scholarship in the humanities. The Scholars' Centre coordinates both the Library's World Wide Web server (http://www.library.uwa.edu.au) and the University's Campus Wide Information Service (CWIS) (http://www.uwa.edu.au), as well as maintaining its own electronic mail group with over 400 subscribers. An extensive programme of instruction in the use of the Internet and of locally networked electronic resources has also been undertaken, mostly on an individual basis (Burrows, 1995).
The latest addition to the Centre's range of services is an Electronic Text Service, which is being officially launched in November 1996. Funded in part by a grant from the Faculty of Arts, and modelled to a considerable extent on the University of Virginia's Electronic Text Center (http://www.lib.virginia.edu/etext/ETC.html), this service has the initial aim of providing a group of full-text scholarly databases to staff and students in the humanities, over the campus network. Unlike Virginia, however, the software used as the basis of the service is the DynaWeb and DynaText package, produced by Electronic Book Technologies, Inc. (http://www.ebt.com). The Scholars' Centre is a participant in this company's Higher Education Grant Program.
Electronic Book Technologies is well known for its DynaText software, which is widely used for publishing electronic books on CD-ROM and over local networks. Among the most notable scholarly examples of its use are Chadwyck-Healey's English Poetry Full-Text Database, and Peter Robinson's edition of the Prologue to the Wife of Bath's Tale (http://www.shef.ac.uk/uni/projects/ctp/index.html). DynaWeb is a more recent product, which provides World Wide Web server software to make DynaText books available to Web browsers, over intranets as well as the Internet. The current release of DynaWeb (version 1.0) lacks some of the functionality of DynaText, particularly its ability to scroll a table of contents and the full text in parallel in the same window. But DynaWeb has the considerable advantage of being much easier to deliver over a network, making substantial humanities texts available to anyone with a standard Web browser. This includes text-only software like Lynx, as well as Netscape and other graphical browsers.
DynaText works from files with SGML-compliant markup, which are run through a 'make-book' program to construct a series of index files. Once this has been done, a utility called InStEd is used to construct and edit stylesheets. These control the formatting for various views of a DynaText book, most importantly for the table of contents and the full text. To create the DynaWeb view of a book requires the construction of an additional stylesheet (or set of stylesheets) specifically to translate elements in the DynaText DTD into HTML markup. This is done by assigning an individual element to a group representing an HTML element; a DIV1 element in TEI Lite might be assigned to the <H2> group, for example. Much of the routine translation is done automatically by InStEd, but structural elements in particular need to be mapped individually. Unsurprisingly, perhaps, it is difficult to translate lists and tables into an acceptable DynaWeb display.
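As an illustration of the kind of element-to-group mapping described above, the following Python sketch translates a small fragment into HTML using a lookup table. It is not InStEd's stylesheet format: the table GROUPS, the functions group_for and to_html, and the use of lowercase XML element names (Python's standard library cannot parse SGML) are all assumptions made for the example. The DIV1 case is approximated by sending the HEAD of a DIV1 to <h2>, since wrapping a whole division in a heading tag would not give sensible HTML.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping table. Keys are "parent/child" paths or bare element
# names; values are HTML tags. Elements not listed pass their content
# through with no tag of their own.
GROUPS = {
    "div1/head": "h2",
    "div2/head": "h3",
    "p": "p",
    "hi": "i",
}

def group_for(elem, parent):
    """Prefer a parent/child mapping, then fall back to the bare element name."""
    if parent is not None:
        path = f"{parent.tag}/{elem.tag}"
        if path in GROUPS:
            return GROUPS[path]
    return GROUPS.get(elem.tag)

def to_html(elem, parent=None):
    """Translate an element and its descendants into an HTML string."""
    inner = escape(elem.text or "")
    for child in elem:
        inner += to_html(child, elem)
        inner += escape(child.tail or "")  # text that follows the child
    tag = group_for(elem, parent)
    return f"<{tag}>{inner}</{tag}>" if tag else inner

doc = ET.fromstring("<div1><head>Inferno</head><p>Nel <hi>mezzo</hi> del cammin</p></div1>")
print(to_html(doc))  # <h2>Inferno</h2><p>Nel <i>mezzo</i> del cammin</p>
```

A one-to-one table like this handles headings and paragraphs easily; lists and tables are harder precisely because they need several coordinated HTML elements rather than a single tag.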
DynaWeb allows both browsing and searching. Since the current version of the software does not make use of frames, only one page can be displayed at a time. This means that tables of contents must be browsed sequentially until the first layer of the full text is reached. If the book's structure is a particularly complicated one, the browsing process can take several screens to reach the text. But users can also take advantage of the navigational features of the Web browser client, as well as of navigational buttons within DynaWeb itself.
Searching in DynaWeb and DynaText is controlled by query files which can be customized for each book. These enable the publisher to construct ready-made queries, by specifying particular combinations of elements and attributes in a comparatively simple syntax. In DynaWeb, each of these queries appears as a search form or combination of forms, which the user can fill in without needing to know the syntax or tags involved. This is a powerful tool which can be employed in sophisticated ways.
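A ready-made query of this kind can be pictured as a fixed element and attribute restriction plus one blank for the user's search term. The sketch below models that idea in Python. It does not use DynaText's query syntax, and the class CannedQuery, its fields, and the sample element names (sp, who, stage) are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from html import escape
from typing import List, Optional

@dataclass
class CannedQuery:
    """A hypothetical ready-made query: the user's term is sought only inside
    elements named `element`, optionally restricted to those whose attribute
    `attr` equals `value`."""
    label: str                    # caption shown beside the form field
    element: str
    attr: Optional[str] = None
    value: Optional[str] = None

    def form_html(self, field_name: str) -> str:
        """Render the query as a one-field HTML search form."""
        return (f'<form><label>{escape(self.label)} '
                f'<input type="text" name="{escape(field_name)}"></label></form>')

    def run(self, root: ET.Element, term: str) -> List[ET.Element]:
        """Return matching elements, using a case-insensitive substring test."""
        hits = []
        for elem in root.iter(self.element):
            if self.attr is not None and elem.get(self.attr) != self.value:
                continue
            # itertext() yields the element's own text and that of its descendants.
            if term.lower() in "".join(elem.itertext()).lower():
                hits.append(elem)
        return hits
```

The user sees only the label and a text box; the element and attribute restrictions stay hidden in the query definition, which is the point made above about not needing to know the tags involved.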
A demonstration of DynaWeb is available from Electronic Book Technologies' WWW site at http://dynatext.ebt.com/docs/ebthome.html.
The first database to be published through the Electronic Text Service, in June 1996, was the Italian text of the Divine Comedy of Dante Alighieri, in the Petrocchi edition. This was marked up in TEI Lite by James Tauber, the CWIS Officer attached to the Scholars' Centre (Tauber, 1995). As a comparatively small file (about 700K), with a fairly simple structure, it provided a suitable pilot to test the basic procedures for making books and refining stylesheets, in both DynaText and DynaWeb.
The major initial components of the Electronic Text Service, however, are five literary databases published by Chadwyck-Healey Ltd (http://www.chadwyck.co.uk): The Bible in English, Editions and Adaptations of Shakespeare, English Verse Drama, Goethes Werke, and the Patrologia Latina Database. As well as containing numerous image files and having an elaborate structure, they are all comparatively large even as raw source files, ranging from 200MB for The Bible in English to more than 1GB for the Patrologia Latina. Chadwyck-Healey supplied them in this raw form, accompanied only by the publisher's own Document Type Definition. Because the CD-ROM versions of these databases use DynaText, we were able to obtain the DynaText stylesheet files from the publisher separately. These included both the main full-text and table of contents stylesheets, as well as various subsidiary stylesheets for notes and lower levels of text.
All these databases are derived from previous printed editions. The Patrologia Latina Database, for instance, attempts to mirror in detail the physical structure of its printed predecessor. This means not only that the table of contents begins with a list of 221 volumes, but also that page numbers, column numbers and section numbers are included in the text. The Goethe database imitates the printed Weimar edition, even down to its index volumes. This can be quite distracting to the user, especially when the database is a combination of many different printed editions, each with its own approach to pagination and structure, as is the case with the Bible, Shakespeare, and English Verse Drama. DynaWeb seems less adept than DynaText at mimicking printed materials. Instead of the continuous scrolling text of DynaText, the hypertext architecture of DynaWeb can only supply separate chunks of text at a time, in varying sizes depending on the structural markup.
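To illustrate why the size of each DynaWeb chunk depends on the structural markup, this Python sketch splits an XML document at chosen division elements and reports the amount of text in each chunk, leaving out nested divisions that form chunks of their own. It is a simplified model, not DynaWeb's actual algorithm; the function names and the assumption that each division's title is in a head element are hypothetical.

```python
import xml.etree.ElementTree as ET

def own_text(elem: ET.Element, chunk_tags: set) -> str:
    """Text belonging to `elem` itself, skipping child divisions that
    become separate chunks."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in chunk_tags:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)

def chunk_sizes(root: ET.Element, chunk_tags: set) -> list:
    """Return (title, character count) for each chunk, in document order."""
    return [
        (elem.findtext("head", default="").strip(), len(own_text(elem, chunk_tags)))
        for elem in root.iter()
        if elem.tag in chunk_tags
    ]

doc = ET.fromstring("<body><div1><head>I</head><p>ab</p>"
                    "<div2><head>II</head><p>cdef</p></div2></div1></body>")
print(chunk_sizes(doc, {"div1"}))          # [('I', 9)]
print(chunk_sizes(doc, {"div1", "div2"}))  # [('I', 3), ('II', 6)]
```

Splitting only at div1 yields one large chunk, while splitting at div2 as well yields smaller ones; a database that follows the varied pagination and structure of several printed editions will therefore produce chunks of very uneven size.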
Turning these large and complicated files into DynaWeb books has been a fairly lengthy process. Though good support was received from the Sydney office of Electronic Book Technologies, this appears to be the first attempt to apply DynaWeb to Chadwyck-Healey databases. The learning curve was comparatively steep, with a considerable amount of trial and error.
The Electronic Text Service uses a Sun SPARCstation 20 server running SunOS 5.5, with 12 GB of disk space and 64 MB of RAM. The size of the source files caused some complications in running DynaText's 'make-book' program. The solution involved building the index files in a separate partition from the DynaWeb server software, and providing virtual links between the two.
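Assuming the 'virtual links' were ordinary Unix symbolic links, the arrangement might be sketched in Python as follows. The directory layout and the function link_book are hypothetical, and how DynaWeb itself is configured to find its books is not covered here.

```python
from pathlib import Path

def link_book(index_dir: Path, collection_dir: Path, book_name: str) -> Path:
    """Make a book whose index files were built in `index_dir` (for example
    on a large data partition) visible inside the server's `collection_dir`
    through a symbolic link. All paths here are hypothetical."""
    if not index_dir.is_dir():
        raise FileNotFoundError(f"index directory not found: {index_dir}")
    collection_dir.mkdir(parents=True, exist_ok=True)
    link = collection_dir / book_name
    if link.is_symlink():
        link.unlink()  # replace a stale link left by an earlier build
    link.symlink_to(index_dir, target_is_directory=True)
    return link
```

The indexes can then be rebuilt on the larger partition without moving anything under the server's own directory tree; only the link needs to exist there.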
Query files were written for each book. As far as possible, the queries provided in the DynaText CD-ROM version were replicated. Chadwyck-Healey have extensively customized the search screens in their DynaText CD-ROMs, however, to provide unique, non-standard functionality for each book, and some of this cannot be reproduced in the standard search forms.
The image files for each book were supplied in bitmap format. They had to be converted to GIF for display in Netscape, and the file reference in each <FIGURE> tag in the source files had to be amended accordingly. This was done by writing a simple Perl script.
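The Perl script itself is not reproduced here. The following is a rough Python equivalent of the reference-rewriting step only, assuming that each <FIGURE> start tag names its image in a quoted attribute value ending in .bmp; both the extension and the quoting are assumptions, and converting the images themselves would be a separate job for a graphics tool.

```python
import re

# A FIGURE start tag, matched case-insensitively up to its closing ">".
FIGURE_TAG = re.compile(r"<FIGURE\b[^>]*>", re.IGNORECASE)
# Inside such a tag: a quoted file name ending in .bmp (assumed extension).
BMP_REF = re.compile(r'(["\'])([^"\']+?)\.bmp\1', re.IGNORECASE)

def retarget_figures(sgml: str) -> str:
    """Point every <FIGURE> file reference at a .gif instead of a .bmp,
    leaving text outside FIGURE tags untouched."""
    def fix_tag(m: re.Match) -> str:
        return BMP_REF.sub(r"\1\2.gif\1", m.group(0))
    return FIGURE_TAG.sub(fix_tag, sgml)
```

Restricting the substitution to FIGURE tags avoids altering any occurrence of ".bmp" that happens to appear in the running text.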
Stylesheet editing was done by using X-Windows emulation software (eXcursion) running on a networked Pentium PC. This has proved somewhat unsatisfactory, mainly because of the large size of the books involved. The construction of HTML stylesheets was made difficult by the elaborate structure of the databases, as well as by the need to mimic printed editions, as mentioned above. This has turned out to be the most time-consuming part of what has otherwise been a reasonably straightforward process. The result, however, is a collection of large humanities databases which can be browsed and searched comparatively easily over the campus network.
The immediate goal is to promote the Electronic Text Service as a tool for teaching and research in the humanities. A programme of demonstrations and training sessions is planned. These will aim at encouraging academic staff to explore ways in which the use of the service can be integrated into taught courses, initially at postgraduate and honours level. This will not just be a case of linking the content of a database to the subject-matter of a course. It will be at least as important to consider the form of the databases, and to investigate the methodology of electronic texts and their implications for scholarly communication and publishing.
The service will be extended with the addition of texts drawn from commercial and public-domain sources. But attention will also be given to encouraging staff and students to construct and publish their own materials, either as part of a course or as an outcome of research. In this sense the Electronic Text Service will, it is hoped, make a significant contribution to the methods of scholarship and learning, as well as providing a resource of considerable depth and richness.
Burrows, Toby. (1995). 'Educating for the Internet in an academic library: the Scholars' Centre at the University of Western Australia', Education for Information 13, 229-242.
Tauber, James K. (1995). 'Abandon all hope, ye who enter: a TEI novice recounts his experiences marking up La Divina Commedia and the Greek New Testament', Text Technology 5, 225-233.
Computers & Texts 13 (1996), 15. Not to be republished in any form without the author's permission.
HTML Author: Michael Fraser
Document Created: January 7, 1997
The URL of this document is http://info.ox.ac.uk/ctitext/publish/comtxt/ct13/burrows.html