Initiatives Towards a Standard Encoding for Manuscript Descriptions

Dr Peter Robinson (De Montfort University) peter.robinson@dmu.ac.uk
Lou Burnard (Humanities Computing Unit, University of Oxford) lou.burnard@computing-services.oxford.ac.uk
Merrilee Proffitt (Bancroft Library, University of California at Berkeley) mproffit@library.berkeley.edu
Matthew Driscoll (Arnamagnaean Institute, Copenhagen) mjd@coco.ihi.ku.dk

Over the last five years, there has been increasing interest in developing an agreed standard for encoding machine-readable manuscript descriptions. Several factors have driven this interest. One is the rise of the World Wide Web, with its promise of a single point of access to online resources. A second is the success of the Text Encoding Initiative (TEI) in defining encodings for a wide variety of humanities texts, and the promise that SGML/TEI-based manuscript descriptions might find common acceptance where the many database formats for manuscript descriptions developed over recent decades have not. A third is the advent of digital imaging, and the recognition that combining online manuscript descriptions with digital images of the manuscripts could vastly increase access to the manuscripts, and to their study, for scholars and others.

This panel will report on major initiatives, in both Europe and North America, towards a standard encoding for manuscript descriptions.

The MASTER project in outline
Peter Robinson, Centre for Technology and the Arts, De Montfort University

MASTER (Manuscript Access through Standards for Electronic Records) is a European Union-funded project to create a single online catalogue of medieval manuscripts in European libraries. In partnership with the TEI, the project will develop a single standard for computer-readable descriptions of manuscripts. It will create software for making these records, test both the standard and the software on at least 5000 manuscripts, and mount the records in a single networked catalogue available to everyone. The catalogue will also include images of many manuscripts. MASTER is funded under the Framework IV Telematics for Libraries call.

The partners in MASTER are: the Centre for Technology and the Arts, De Montfort University (leader); the Royal Library, The Hague; the Arnamagnaean Institute, Copenhagen; L'Institut de recherche et d'histoire des textes, Paris; the National Library of the Czech Republic, Prague; and the University of Oxford. In addition, several other major libraries are associated partners in the project, including the British Library, the Vatican Library, the Biblioteca Ambrosiana, and the Bodleian Library, Oxford.

An important part of MASTER is the development of software to help libraries make the records, so that even small libraries can produce them. The project has been designed to bring together experts in text encoding and experts in manuscript cataloguing, so that the developing standard can be tested in use, with direct feedback to its designers. An independent expert group will also meet periodically to review the standard and provide further comment to the designers. The standard will go through two stages of development and testing, with the first phase of testing scheduled to begin in October 1999.

Developing an SGML-based standard for manuscript descriptions
Lou Burnard, Humanities Computing Unit, University of Oxford

The documents whose production the MASTER project is intended to facilitate sit precisely in that uncertain no man's land, long the despair of database analysts, between highly structured and predictable database records on the one hand and loosely organized discursive prose on the other. Every organization engaged in detailed manuscript cataloguing knows perfectly well how its own manuscript descriptions should be organized (what they must contain, and how it must be expressed); yet every organization seems to adopt different practices. Typical manuscript descriptions mix formal language, expressing such features as collation formulae or physical dimensions, with informal detail covering a whole gamut of provenance information, impressionistic description, and historical summary. As such they present a splendid challenge to the abilities of formal document description in general, and to the claimed generality of the TEI approach in particular. In this part of the session, I will outline the approach we have proposed in the MASTER project.

We have taken as a starting point the idea that our descriptions must be coherent textual objects. They should be able to appear as objects in their own right within a collection of such things, as in a traditional manuscript catalogue; but it should also be possible to embed the metadata of which they are composed in the formal structure of a TEI Header attached to a digital transcription of the manuscript. We have further identified a gross structure common to all such documents, with discrete areas for identification of the manuscript, a summary of its intellectual content, its physical description, its origin and provenance, and administrative information about its current location. Within these areas, and their topic-specific subdivisions, we further propose more specialized, retrieval-focused access points for such features as proper names and iconographic terms.
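To make the five-area gross structure concrete, the following minimal Python sketch uses the standard library's `xml.etree.ElementTree` to build a skeleton description with one child element per area, plus a proper-name access point inside the content summary. It emits XML rather than SGML purely for convenience. The element names are an assumption, borrowed from the manuscript description module later adopted by the TEI rather than taken from MASTER's preliminary DTD, and every value passed in is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

# Element names are an assumption (borrowed from the later TEI manuscript
# description module), not MASTER's preliminary DTD. Values are placeholders.

def build_ms_description(settlement, repository, idno,
                         author, title, extent, origin, availability):
    """Build an <msDesc> element with one child per major area."""
    ms = ET.Element("msDesc")

    # Area 1: identification of the manuscript
    ident = ET.SubElement(ms, "msIdentifier")
    ET.SubElement(ident, "settlement").text = settlement
    ET.SubElement(ident, "repository").text = repository
    ET.SubElement(ident, "idno").text = idno

    # Area 2: summary of intellectual content; <persName> serves as a
    # retrieval-focused access point for a proper name
    item = ET.SubElement(ET.SubElement(ms, "msContents"), "msItem")
    ET.SubElement(ET.SubElement(item, "author"), "persName").text = author
    ET.SubElement(item, "title").text = title

    # Area 3: physical description (free prose in this sketch)
    ET.SubElement(ET.SubElement(ms, "physDesc"), "p").text = extent

    # Area 4: origin and provenance
    ET.SubElement(ET.SubElement(ms, "history"), "origin").text = origin

    # Area 5: administrative information about the current location
    admin = ET.SubElement(ET.SubElement(ms, "additional"), "adminInfo")
    ET.SubElement(ET.SubElement(admin, "availability"), "p").text = availability

    return ms


# Usage with hypothetical placeholder values
record = build_ms_description(
    "Example City", "Example Library", "MS 1",
    "Example Author", "Example Title",
    "Example extent", "Example origin", "Example availability note",
)
print(ET.tostring(record, encoding="unicode"))
```

Keeping each area as a separate top-level element is what allows the same block to stand alone in a catalogue or be embedded in a header, since consumers can address each area directly.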

We have defined a preliminary DTD describing these and other aspects of the manuscript description, expressed as a set of extensions to the basic TEI architecture, and have begun to validate it by encoding a variety of existing descriptions from the many institutions collaborating in the MASTER project. The next stage will be to compare and consolidate cataloguing practices against this agreed set of definitions, moving towards an environment in which records can be catalogued collaboratively and shared. The final step in this process will be the production of software to support the creation, analysis, and display of these records of Europe's rich documentary heritage.
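In practice, validating encoded descriptions against the DTD would be done with an SGML parser. As a rough stand-in, the Python sketch below checks only the top-level areas of a record against an ordered content model of the kind a DTD declares, with the identification area required and the other areas optional. The content model and element names are hypothetical illustrations, not MASTER's actual declarations.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model, loosely mirroring an SGML declaration such as
#   <!ELEMENT msDesc - - (msIdentifier, msContents?, physDesc?, history?, additional?)>
# Each entry is (element name, required?). Names are illustrative only.
CONTENT_MODEL = [
    ("msIdentifier", True),
    ("msContents", False),
    ("physDesc", False),
    ("history", False),
    ("additional", False),
]


def check_areas(record):
    """Return a list of problems with the record's top-level areas (empty if valid)."""
    tags = [child.tag for child in record]
    problems = []
    pos = 0
    for name, required in CONTENT_MODEL:
        if pos < len(tags) and tags[pos] == name:
            pos += 1  # area present in its expected position
        elif required:
            problems.append(f"missing required area <{name}>")
    # Any remaining children are out of order or not allowed at this level
    for tag in tags[pos:]:
        problems.append(f"unexpected or out-of-order area <{tag}>")
    return problems


good = ET.fromstring("<msDesc><msIdentifier/><physDesc/><history/></msDesc>")
bad = ET.fromstring("<msDesc><physDesc/></msDesc>")
print(check_areas(good))  # []
print(check_areas(bad))   # ['missing required area <msIdentifier>']
```

Unlike a real SGML parser, this checks a single level only and ignores the content of each area; it is meant to show the shape of the test, not to replace DTD validation.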

Manuscript initiatives in America: the Digital Scriptorium
Merrilee Proffitt, Bancroft Library, University of California at Berkeley

The Digital Scriptorium is a joint project of the Bancroft Library (UC Berkeley) and the Rare Book and Manuscript Library (RBML) of Columbia University to digitize the two universities' medieval and early Renaissance manuscript holdings and make them available on the World Wide Web. Funded by the Andrew W. Mellon Foundation, the Digital Scriptorium comprises a database of some 10,000 images. The database contains at least one image (usually more) from each of the nearly 700 codices and 2000 documents (eighth to sixteenth century) held by the Bancroft Library and RBML. In addition, the database includes entries from Barnard College, Teachers College, Union Theological Seminary, UC Berkeley's Music Library, and the Robbins Collection. The project is intended as a prototype that will eventually allow other institutions in the United States and Europe to participate, and provide virtual access to information about medieval manuscripts. The project also intends to investigate the feasibility of continuing as a full or partial cost-recovery operation.

MASTER: the view from a library
Matthew Driscoll, Arnamagnaean Institute, Copenhagen

The Arnamagnaean Institute (AMI) is a full partner in the MASTER project, and intends to use the encoding defined by MASTER, together with the project's record-entry software, to make a web-accessible catalogue of all the medieval Icelandic manuscripts in its possession. It also proposes to use this catalogue to achieve a 'virtual reunification' of the two halves of the Arnamagnaean collection, now divided between Reykjavik and Copenhagen. The AMI has limited resources, so the input of manuscript descriptions must be efficient. It is also important that the descriptions themselves remain under the Institute's control, so that its staff can modify and correct them as necessary. The AMI is also planning to digitize all the manuscripts in its possession, and the catalogue records should link to these images as they become available.

The AMI is leading MASTER's 'record entry and standard testing' work packages. The key aim of this work is to test the efficiency and utility of the standard and of the software developed for record input. This will provide direct experience of the standard in use, and vital information for both the standard developers and the software implementers. Although full testing of the standard is not scheduled to begin until October 1999, AMI's cataloguers will already have experimented considerably with the standard, and this paper will report on that work.

