The Use of SGML in the VUBIS-Antwerpen Library Network

Paper presented at the 2nd Annual Conference on the Practical Use of SGML - Antwerp, October 25, 1995

Jan Corthouts, Deputy librarian
University of Antwerp UIA Library
PB 13
Antwerpen
B-2610
jcort@lib.uia.ac.be
Richard Philips, System Manager
University of Antwerp
PB 13
Antwerpen
B-2610
rphilips@lib.ua.ac.be

Abstract

The VUBIS-Antwerpen library network has in recent years adopted the SGML standard as a key standard for the exploitation of its catalogues and for the creation of what we believe to be new and innovative services. This paper presents a brief report on ongoing SGML-projects at the University of Antwerp: CCB, Antilope, Impala and HyperLib.


Introduction

In the age of computer mediated communication, libraries feel a clear need to reorganise and reorient their services and operations. Libraries are being transformed from depositories of documents into broad, co-ordinated, easy to use structures for accessing networks and networked information. To date, librarians have mainly tried to provide local access to local ownership; henceforth, they are going to have to create an informational metastructure, largely based on telecommunications and on the new electronic documentary resources. This metastructure is usually referred to as the 'virtual library': a library or service that provides organised access to information existing at a number of locations on the net, but almost universally accessible, and that provides a presentation of that information to the user as though it were local and unified. Faced with rapidly rising costs and increasingly inelastic budgets, librarians look to computers and networks as part of the solution to restore an economic balance and to provide a high level of end user services. (see footnote 1)

To obtain this level of information integration it is necessary to develop a standard way of communicating between information systems. It is obvious that in recent years the importance of the relationship between information technology standards and librarianship has grown considerably. SGML is one of these standards, although it has been discovered by libraries only very recently. However, it is our belief that its importance will grow very fast. For VUBIS-Antwerpen SGML has become one of the key standards in library operations.

VUBIS-Antwerpen Library Network

VUBIS-Antwerpen (see footnote 2) is a library network using VUBIS software at the University of Antwerp. Partners in this network are ten academic or special libraries, mostly situated in the Antwerp region. Together they produce a union catalogue of approximately 900,000 titles, growing at a rate of 75,000 titles per year.

The VUBIS-Antwerpen library network has in recent years adopted the SGML standard as one of the key standards for the exploitation of its catalogues and for the creation of what we believe to be new and innovative services. This paper presents a brief report on ongoing SGML-projects at the University of Antwerp.

Exchange of Bibliographic Data

The CCB Project

The CCB project was initiated by the Belgian National Conference of University Chief Librarians and is managed by the universities of Gent and Leuven (see footnote 3). The CCB is the union catalogue of approximately 40 university and special libraries; it contains ca. 4 million bibliographic records on two CD-ROM disks.

Although MARC or UNIMARC formats are usually used for exchanging bibliographic records between library systems, SGML was selected for this project and a DTD for CCB records was defined. The following considerations led to the choice of SGML:

As a result of this project, all the universities in Belgium are now exchanging their catalogue records in SGML format for the yearly production of the CCB on CD-ROM.

The Antilope Catalogue

Antilope (see footnote 4) is a database created at the UIA. It constitutes the Belgian union catalogue of periodicals held by the Royal Library and the university, research and special libraries in Belgium. Periodical collections from some important foreign libraries, such as the Technische Universiteit Delft, the Landbouwuniversiteit Wageningen, the Koninklijke Nederlandse Akademie voor Wetenschappen, the British Library Document Supply Centre and the Institut National d'Information Scientifique et Technique, have been added and match-merged. The Antilope database contains approximately 75,000 titles with ca. 230,000 locations in ca. 80 libraries.

Recently the Flemish Government approved a project which aims to complete the Antilope catalogue with descriptions of older journals not yet present in the database. These journals are being catalogued in the various local library systems. On a regular basis the newly catalogued records will be sent to the UIA by FTP for match-merging with the existing database. The format for these descriptions is once again based on SGML (see footnote 5).

This approach will soon allow the replacement of printed control listings by electronic versions, thus allowing all participating libraries (especially the smaller ones) to immediately correct their journal holdings from a computer file using an off-the-shelf SGML editor.
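The match-merge step mentioned above can be illustrated with a minimal Python sketch. It is not the procedure used at the UIA: the record fields, the matching rule (ISSN when present, otherwise a normalised title) and the names `Periodical`, `match_key` and `match_merge` are all assumptions made for the example.

```python
import re
import unicodedata
from dataclasses import dataclass, field


@dataclass
class Periodical:
    """A simplified periodical description (hypothetical field set)."""
    title: str
    issn: str = ""                                   # may be empty for older journals
    holdings: set[str] = field(default_factory=set)  # codes of holding libraries


def match_key(record: Periodical) -> str:
    """Return a key under which duplicate descriptions should collide.

    Assumption: the ISSN decides when present; otherwise a normalised title.
    """
    if record.issn:
        return "issn:" + record.issn.replace("-", "").upper()
    # Strip accents, lowercase, and keep only letters and digits.
    plain = unicodedata.normalize("NFKD", record.title)
    plain = plain.encode("ascii", "ignore").decode("ascii")
    return "title:" + re.sub(r"[^a-z0-9]+", "", plain.lower())


def match_merge(union: dict[str, Periodical], incoming: list[Periodical]) -> None:
    """Merge incoming records into the union catalogue in place."""
    for record in incoming:
        key = match_key(record)
        if key in union:
            union[key].holdings |= record.holdings   # match: add the new locations
        else:
            union[key] = record                      # no match: a new title
```

A real match-merge would need a more forgiving rule, for instance to reconcile a record that carries an ISSN with an older description of the same journal that does not.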

USMARC Formats

As noted above, formats based on USMARC or UNIMARC are normally used to exchange bibliographic records. Various experiments are currently under way to map USMARC structures to SGML DTDs and vice versa. The University of Berkeley Library has developed a variety of utilities for transforming USMARC records into an SGML format and vice versa (sgml2marc and marc2sgml) (see footnote 6). Interest in SGML-based cataloguing approaches has increased due to several projects and initiatives aimed at integrating electronic resources in on-line library catalogues:

The Email Gateway for the Impala Document Ordering System

Access versus ownership is a familiar phrase in current library literature. The discussion of access versus ownership centres on the crisis in materials acquisition, especially of periodicals, in academic libraries. Factors contributing to this crisis are rising prices, shrinking budgets and increasing scholarly production (see footnote 9). As a result, libraries are cancelling subscriptions to less-used or very expensive journals, and demand for these publications can no longer be met from local holdings. In these cases ownership is being replaced by access to journal holdings in other libraries, often ones specialised in document delivery. Rapid and reliable document delivery can only be guaranteed if an adequate technological and organisational infrastructure exists. In Belgium, this infrastructure is provided by Impala, an on-line electronic document ordering system managed by the UIA. It allows the electronic forwarding and management of document requests between libraries (see footnote 10).

A system like Impala must be able to talk to other document delivery applications in Belgium (LIBIS-Net) and to the systems of large document suppliers abroad like the BLDSC (UK) or the TU Delft (NL). To enhance this communication we decided to use email as the transmission mechanism and SGML as the structuring language for the document requests (see Figure 1). All this is transparent to the end user: the Royal Library in Belgium using the Impala system can request a photocopy of a journal article held by the KU Leuven. Because the KU Leuven is known by the Impala system as an email supplier, the article request is captured by Impala, transformed to SGML and transmitted to the KU Leuven using email over Belnet. The KU Leuven receives this message and processes the information within its own on-line system. This solution has been implemented not only for the KU Leuven, but also for TU Delft (NL), KNAW (NL), and BLDSC (UK). BLUW (NL) and INIST (F) will soon follow. The accompanying DTD has been set up so that a complete peer-to-peer solution can now be developed (see footnote 11). This should result in a true bi-directional gateway.


Figure 1. Email gateway for Impala
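The gateway idea, a request captured, transformed to SGML and handed to email, can be sketched in a few lines of Python. This is an illustration only: the element names, the addresses and the functions `request_to_sgml` and `build_order_mail` are hypothetical, and the real structure is the one defined in the Impala DTD.

```python
from email.message import EmailMessage
from html import escape


def request_to_sgml(request: dict[str, str]) -> str:
    """Serialise a document request as a small SGML instance.

    The element names are hypothetical; the real ones come from the Impala DTD.
    """
    lines = ["<request>"]
    for name, value in request.items():
        # Escape &, < and > so that field content cannot be read as markup.
        lines.append(f"  <{name}>{escape(value, quote=False)}</{name}>")
    lines.append("</request>")
    return "\n".join(lines)


def build_order_mail(request: dict[str, str], sender: str, supplier: str) -> EmailMessage:
    """Wrap the SGML request in a plain-text email addressed to the supplier."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = supplier
    message["Subject"] = "Document request"
    message.set_content(request_to_sgml(request))
    return message
```

The resulting message could be handed to any mail transport, for example `smtplib.SMTP.send_message`. The receiving system only has to parse the message body against the shared DTD, which is what keeps the two on-line systems independent of each other.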

SDI: Serving the End Users

Printed acquisition lists have existed for quite some time at the University of Antwerp. Not only are these lists printed and distributed every month, but they are also available through the World Wide Web (see footnote 12).

The purpose of the SDI service is to personalise these acquisition lists. SDI or Selective Dissemination of Information is a service that automatically notifies the end user by email about new publications that he might find interesting.

Every SDI user can choose in what format he would like to receive the bibliographic descriptions in his email messages. At this moment a user can choose between:

The SGML format clearly has the advantage of structured output allowing users to import the records in personal bibliographic software like ProCite, Cardbox, Reference Manager etc.
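To make the import scenario concrete, the following Python sketch reads a flat, fully tagged record and rewrites it in a tagged layout modelled on RIS, a format that several bibliographic packages can import. The element names (`author`, `doctitle`, `journal`, `year`), the sample record and the mapping are assumptions for the example; they are not taken from the SDI DTD.

```python
from html.parser import HTMLParser

# Hypothetical SDI record; the real element names are defined in the SDI DTD.
SAMPLE = """<record>
<author>Tomer, Christinger</author>
<doctitle>Information technology standards for libraries</doctitle>
<journal>Journal of the American Society for Information Science</journal>
<year>1992</year>
</record>"""

# Assumed mapping from record elements to RIS-style tags.
RIS_TAGS = {"author": "AU", "doctitle": "TI", "journal": "JO", "year": "PY"}


class RecordParser(HTMLParser):
    """Collect the text of each element of a flat record into a dict of lists."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, list[str]] = {}
        self._current = None  # name of the element being read, if any

    def handle_starttag(self, tag, attrs):
        self._current = tag

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.fields.setdefault(self._current, []).append(text)


def parse_record(sgml: str) -> dict[str, list[str]]:
    parser = RecordParser()
    parser.feed(sgml)
    parser.close()
    return parser.fields


def to_ris(fields: dict[str, list[str]]) -> str:
    """Write the fields as RIS-style lines: two-letter tag, two spaces, hyphen, space."""
    lines = ["TY  - JOUR"]
    for element, tag in RIS_TAGS.items():
        for value in fields.get(element, []):
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")
    return "\n".join(lines)
```

The sketch relies on every element having an explicit end tag. An SGML instance that uses tag omission or other minimisation features would need a real SGML parser working from the DTD.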

HyperLib: Creating a Hypertext Library Information Infrastructure

HyperLib (see footnote 14) is an EC-funded project of Loughborough University (UK) and the University of Antwerp (B). This project aims to improve access to the services of the libraries of the University of Antwerp, thereby enhancing the library's utility to its users. In order to achieve a system that is both effective and easy to use it was decided to investigate how the library could benefit from a hypertext approach. The design focused primarily on hypertext guides (for library end-users) and manuals (for library staff) (see footnote 15), and on database-related resources: an academic bibliography (see footnote 16) and a new hypertext catalogue (see footnote 17).

User guides are documents specifically intended to inform the user about library services. Now that the library catalogue is often accessed through the campus network and the Internet, it becomes necessary to provide a method to support these distant users. Manuals, on the other hand, are intended for library staff members. Because system enhancements are implemented regularly and often at short notice, printed manuals rapidly become obsolete, and reprinting them for a large user base has proven to be expensive. The development of hypertext manuals accessible through the network guarantees permanent access to updated versions. These manuals should also make it possible to print on demand (e.g., for training purposes) and to inform users about changed or new chapters (current awareness).

Making hypertext manuals and guides available in a heterogeneous network requires adequate solutions for problems with regard to text encoding and client/server access. The library manuals and guides have been in existence for quite a while and are being produced using a variety of word processing and desktop publishing software. These applications are not hypertext editing tools as such, and the transport and reuse of information represented by such texts from one program or application to another remains difficult. The project needed a standard way of encoding the hypertext structures, thus enabling flexible transport. The project therefore decided to use SGML or Standard Generalised Markup Language. At about the same time the World Wide Web became extremely popular. WWW uses HTML (HyperText Markup Language) to structure its documents. HTML is an SGML application and converting an SGML HyperLib instance to the HTML format proved to be feasible. WWW browsers now cover all operating systems, thus guaranteeing client-server access to the HyperLib products.

A HyperLib manual or guide is an SGML instance conforming to the HyperLib Document Type Definition (DTD) (see footnote 18) and it is built from a sequence of topics. A topic is the smallest addressable unit in a hypertext network and it contains structured text (headings, paragraphs, lists), embedded elements (foreground and background images) and hypertext references to other topics within the same manual, to other HyperLib manuals or guides, to other local or remote HTML documents on the Web or to Internet services in general (telnet, ftp, Gopher...). All topics are organised in a hierarchical network of parents and siblings (see Figure 2). As a result all topics are defined with a unique location in the hypertext network.


Figure 2. Tree structure of the hypertext topics
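The idea that the hierarchy gives every topic a unique location can be illustrated with a small Python sketch. The `Topic` class, the dotted numbering scheme and the function `locations` are assumptions made for the example, not part of the HyperLib DTD.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A hypothetical in-memory form of a HyperLib topic."""
    ident: str
    heading: str
    children: list["Topic"] = field(default_factory=list)  # ordered left to right


def locations(root: Topic, prefix: str = "1") -> dict[str, str]:
    """Map each topic identifier to its position in the tree, e.g. '1.2.1'."""
    result = {root.ident: prefix}
    for number, child in enumerate(root.children, start=1):
        result.update(locations(child, f"{prefix}.{number}"))
    return result
```

Because each topic has exactly one parent and a fixed position among its siblings, the dotted path is unique, which is the property the hypertext network depends on.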

The production of a HyperLib document goes through different phases (see Figure 3).


Figure 3. The HyperLib production process

Text encoding: The author creates the HyperLib document in SGML format using native SGML authoring tools. These tools include validating parsers guaranteeing that the document instance conforms to the requirements of the DTD. The hyperlinks are added by the author during the editing of the instance, but some of them (implicit references like a link to the table of contents, to a browsable index, to the next or previous topic...) are created automatically during the parsing processes.

Syntactical parser: The SGML instance is validated by the SGMLS parser. This parser checks whether the document instance satisfies all formal specifications of the HyperLib DTD. If errors are reported, the document goes back to the author for correction.

Semantical parser: A second parser, developed in-house, checks whether all topics are defined with their correct relations to other topics in the network. The parser will also sort the topics in their correct sequence (top to bottom, left to right) and will add the implicit hypertext links.

Transformation: The resulting SGML instance is used for conversion to HTML for presentation on the Web. The implicit hypertext references (link to table of contents, to next/previous topics...) are presented as clickable icons (see Figure 4).
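The work of the semantical parser can be sketched as follows in Python. This is not the in-house parser: the input representation (a mapping from each topic to its parent), the reading of "top to bottom, left to right" as a depth-first pre-order walk, and the names `order_topics` and `implicit_links` are assumptions made for the example.

```python
from typing import Optional


def order_topics(parents: dict[str, Optional[str]]) -> list[str]:
    """Check the parent relations and return the topics in reading order.

    `parents` maps each topic to its parent (None for the root); siblings keep
    the order in which they appear in the mapping.
    """
    roots = [topic for topic, parent in parents.items() if parent is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root topic, found {len(roots)}")

    children: dict[str, list[str]] = {topic: [] for topic in parents}
    for topic, parent in parents.items():
        if parent is None:
            continue
        if parent not in parents:
            raise ValueError(f"topic {topic!r} refers to unknown parent {parent!r}")
        children[parent].append(topic)

    ordered: list[str] = []
    stack = [roots[0]]
    while stack:
        topic = stack.pop()
        ordered.append(topic)
        stack.extend(reversed(children[topic]))  # leftmost child is visited first
    if len(ordered) != len(parents):
        raise ValueError("some topics are not reachable from the root")
    return ordered


def implicit_links(ordered: list[str]) -> dict[str, dict[str, Optional[str]]]:
    """Derive the contents, previous and next links for every topic."""
    links = {}
    for index, topic in enumerate(ordered):
        links[topic] = {
            "contents": ordered[0],
            "previous": ordered[index - 1] if index > 0 else None,
            "next": ordered[index + 1] if index + 1 < len(ordered) else None,
        }
    return links
```

Deriving these links from the sorted sequence, rather than asking the author to type them, means they stay correct whenever topics are added, moved or removed.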


Figure 4. A HyperLib topic on the WWW
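The transformation phase can be illustrated with a minimal Python sketch that turns one topic into an HTML page. It is not the HyperLib converter: the function `topic_to_html`, the file naming scheme (`<identifier>.html`) and the use of text links instead of the clickable icons described above are assumptions made to keep the example short.

```python
from html import escape
from typing import Optional


def topic_to_html(heading: str, paragraphs: list[str],
                  nav: dict[str, Optional[str]]) -> str:
    """Render one topic as an HTML page with its implicit navigation links.

    `nav` maps a link label (e.g. 'contents', 'next') to a topic identifier,
    or to None when the link does not apply to this topic.
    """
    parts = [
        "<html>",
        f"<head><title>{escape(heading)}</title></head>",
        "<body>",
        f"<h1>{escape(heading)}</h1>",
    ]
    parts += [f"<p>{escape(text)}</p>" for text in paragraphs]
    anchors = [
        f'<a href="{escape(target)}.html">{escape(label)}</a>'
        for label, target in nav.items()
        if target is not None
    ]
    if anchors:
        parts.append("<p>" + " | ".join(anchors) + "</p>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)
```

Keeping the SGML instance as the master copy and generating the HTML from it means that the same source can also feed other outputs, such as the printed manuals mentioned earlier.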

Conclusion

SGML has helped VUBIS-Antwerpen to create new library services in response to changing academic demands. But there is more to be expected. With the move towards electronic publication and distribution of documents, traditional models of production, distribution and use of scientific information are being re-evaluated by both publishers and libraries. This will advance access to and manipulation of full-text SGML-formatted documents. If it is true that within three years 82% of publishers will change over to SGML, libraries will receive their information in a different format. That may be an occasion to come back to you within a year or two.


Footnotes:

footnote 1

Christinger Tomer, Information technology standards for libraries. In: Journal of the American Society for Information Science, 43 (8) September 1992, p. 566-570.

footnote 2

For more information on the VUBIS-Antwerpen library network, see http://www.ua.ac.be/vubis.html

footnote 3

For a more complete report on the CCB project, please see the following publications: Jan Corthouts and Herbert Van de Sompel, CCB: tweede editie van de collectieve catalogus van België op CD-ROM. In: Van geautomatiseerd beheer van archieven en bibliotheken naar geautomatiseerde informatie, (Bibliotheekkunde; 51), p. 65-82. Jan Corthouts and Herbert Van de Sompel, CCB: de collectieve catalogus van België op CD-ROM. In: Bibliotheek- en archiefgids, 69 (3), 1993, p. 119-127. This project was reported on before at the 1st SGML Belux Conference, Brussels, 22 March 1994. A presentation of the project in electronic form is available in Dutch, English and French at [http://www.libis.kuleuven.ac.be/libis/ccb/index.html]. The CCB DTD can be found at [ftp://lib.ua.ac.be/pub/ccb/ccb.dtd].

footnote 4

For more information on Antilope see [http://www.ua.ac.be/MAN/ANTILOPEE/root.html]

footnote 5

The Antilope DTD can be found at [ftp://lib.ua.ac.be/pub/antilope/atp/atp.dtd]

footnote 6

These utilities are available from [ftp://library.berkely.edu/pub/sgml/marcdtd/]

footnote 7

Richard Giordano, Documentation of electronic texts using Text Encoding Initiative Headers : an introduction. In: Library Resources and Technical Services, 38 (4), 1994, p. 5-27. Edward Gaynor, Cataloging electronic texts : the University of Virginia Library experience. In: Library Resources and Technical Services, 38 (4), p. 405-413. C.M. Sperberg-McQueen and Lou Burnard, TEI P3: Guidelines for electronic text encoding and interchange .- Oxford; Chicago : TEI, 1994. This work contains a special section (24.3) titled "Header elements and their relationship to the MARC record", offering suggestions about how to transfer header information to a MARC record.

footnote 8

Alan D. Harrison, Frank A. Roos and R. Eric Thomas, (Semi)automatic capturing of bibliographic information from journal contents pages for inclusion in online library catalogues: the RIDDLE project. In: Electronic library, 13 (1), 1995, p. 15-19.

footnote 9

Cheryl B. Truesdell, Is access a viable alternative to ownership: A review of access performance. In: The journal of academic librarianship, September 1994, p. 200-206.

footnote 10

Online documentation on Impala can be found at [http://www.ua.ac.be/MAN/IMPALAN/root.html] for the Dutch version and [http://www.ua.ac.be/MAN/IMPALAF/root.html] for the French version. See also: K. Clara and R. Philips, Impala, het documentbestelsysteem. In: Bibliotheek- en archiefgids, 69 (1993), p. 180-185. K. Clara, Impala, het Belgische elektronische bestelsysteem. In: Van geautomatiseerd beheer van archieven en bibliotheken naar geautomatiseerde informatie, (Bibliotheekkunde; 51), p. 31-39.

footnote 11

The Impala DTD can be found at [ftp://lib.ua.ac.be/pub/impala/impala.dtd]. We are currently working on complete online documentation. The URL reference will be [http://www.ua.ac.be/MAN/IMAIL/root.html].

footnote 12

See [http://www.ua.ac.be/AWL/awl.html]

footnote 13

The SDI DTD can be found at [ftp://lib.ua.ac.be/pub/sdi/SDI.dtd]. Online documentation about the SDI service is available in English [http://www.ua.ac.be/MAN/SDIE/root.html] and Dutch [http://www.ua.ac.be/MAN/SDIN/root.html].

footnote 14

The complete documentation on the HyperLib project is available online at [http://www.ua.ac.be/docstore.html].

footnote 15

A complete overview of HyperLib manuals and guides can be found at [http://www.ua.ac.be/MAN/man.html].

footnote 16

See [http://www.ua.ac.be/AB/ab.html]

footnote 17

See [http://www.ua.ac.be/WWWOPAC/wwwopac.html]

footnote 18

Documentation on the HyperLib DTD can be found at [http://www.ua.ac.be/MAN/WP31/root.html]. A report on the transformation of HyperLib instances to HTML format is documented at [http://www.ua.ac.be/MAN/WP32/root.html].