The evolving use of SGML in Electronic Journals

Judith Wusteman, Computing Laboratory, University of Kent

Although the SGML standard has been in existence for over ten years, its use in both paper and electronic journals has become widespread only in the last two years. Initial experiments by publishers and others are raising a whole raft of issues that need to be resolved. I will discuss some of these issues in the light of my experience in the Infobike/JournalsOnline project, which uses SGML as an intermediate format through which publishers supply bibliographic information to the system.

I will first give some examples of the use of SGML in current e-journal projects, both for bibliographic information and for the full texts of articles. This will be followed by comments on this use, including the need for standardisation. I will discuss some "standard" DTDs currently available for header and full-text information and will comment on their usefulness in practice.

Despite the increasing prominence of SGML in the journals arena, only three of the sixty-plus projects funded by the UK's eLib (Electronic Libraries) programme use SGML to any appreciable extent: SuperJournal 2, Infobike/JournalsOnline and CLIC. Commercially based systems, such as Blackwells Electronic Navigator, are also using SGML. I will describe the use of SGML in these projects and share some of the lessons learned in Infobike/JournalsOnline, among them the need for standardisation and the difficulty of achieving it.

I will also compare the use of SGML in eLib projects with its more ambitious use in the University of Illinois Digital Libraries project in the US, where a large testbed of full-text SGML journal articles is being developed. The problems this project has encountered in rendering and displaying SGML will become increasingly relevant to UK publishers as they begin to provide full-text articles in SGML.

DTDs, be they header or full-text, vary significantly between publishers. I will describe the "standard" DTDs available and discuss their relative advantages and disadvantages. The current standard header DTDs are ISO 12083, MAJOUR (Modular Application for JOURnals) and the more recent SSSH (Simplified SGML for Serial Headers), which was developed in 1996. As well as describing our experience with SSSH in Infobike, I will give an update on its subsequent progress and development.

ISO 12083 and the Elsevier Science article DTD are both publicly available "standard" full-text DTDs. There is currently less choice of full-text DTDs than of header DTDs. Until recently, publishers were not particularly interested in pursuing standards in this area because they could not envisage a reason for exchanging full text. As real SGML projects develop, this view is changing. I will discuss the current state of full-text DTD standards development.

With the arrival of SGML in the '80s, the myth arose that simply coding information in SGML meant that the information was future-proofed. Now that this myth is being debunked, a new assumption is emerging: that the use of standard DTDs future-proofs information. But this is not true either; the use of information-poor or badly structured DTDs can result in loss of information. For example, if a DTD records an author's name as a single undivided string, the distinction between surname and forenames present in a richer source cannot survive conversion to it. I will comment on the practical implications of such DTDs.

