Making a Digital Library:
The Chemistry Online Retrieval Experiment

- A Summary of the CORE Project (1991-1995) -

December, 1995

Richard Entlich (Cornell University)
Lorrin Garson (American Chemical Society)
Michael Lesk (Bellcore)
Lorraine Normore (Chemical Abstracts Service)
Jan Olsen (Cornell University)
Stuart Weibel (OCLC)

The CORE project was an electronic library prototype of primary journal articles in chemistry, containing about four years of twenty primary journals published by the American Chemical Society (about 400,000 pages). CORE included both scanned images and an SGML (Standard Generalized Markup Language) marked-up version for on-the-fly rendering for screen display. Each page was scanned and segmented, with graphical units isolated and linked to figure references in the articles. The original machine-readable typography was converted to SGML format and the results were used to build databases with indexes for full-text Boolean searching.
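The kind of index that supports full-text Boolean searching can be illustrated with a minimal sketch. This is not the CORE implementation, whose search software is not described here; it is a generic inverted index. The function names (`tokenize`, `build_index`, `search_and`, `search_or`, `search_not`) and the article identifiers are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_and(index, *terms):
    """Documents containing every term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, *terms):
    """Documents containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def search_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())
```

A query such as "benzene AND kinetics" then reduces to a set intersection over the postings of the two terms, so no article text has to be scanned at query time.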

Each page was stored as a 300 dpi bitonal image for printing and as a 100 dpi greyscale image for screen display. All text data and the most recent page images were kept on Unix-based magnetic storage at all times, with older page images stored on a WORM (Write Once, Read Many) jukebox.
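To give a rough sense of scale for the two image formats described above, the sketch below computes the uncompressed size of one page at each resolution. The page dimensions (8.5 by 11 inches) and the 8 bits per greyscale pixel are assumptions made for illustration, not figures from the project, and the stored images may well have been compressed, so these numbers are upper bounds at best. The function name `raw_image_bytes` is hypothetical.

```python
def raw_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scanned page image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Assumed page size: 8.5 x 11 inches (not stated in the source).
PAGE_W, PAGE_H = 8.5, 11

# 300 dpi bitonal: 1 bit per pixel.
print_bytes = raw_image_bytes(PAGE_W, PAGE_H, 300, 1)

# 100 dpi greyscale: assumed 8 bits per pixel.
screen_bytes = raw_image_bytes(PAGE_W, PAGE_H, 100, 8)

print(print_bytes, screen_bytes)
```

Under these assumptions the two versions of a page are of comparable raw size, roughly one megabyte each, because the lower resolution of the screen image is offset by its greater bit depth.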

Complex scientific material (superscripts, tables, equations, special fonts and symbols, etc.) presents substantial problems for representation and display, especially when the material is being converted from previously published information, as were these journals.

The tasks of building and maintaining electronic journal databases remain formidable, especially if conversion from older formats is involved. However, experience with chemists in this project suggests that electronic publishing will be popular with scholars, even though significant disadvantages and impediments to adoption remain.

Analysis of user studies and transaction logs is ongoing and will be submitted for publication in the near future.

Further information on the CORE Project can be found at:

Acknowledgments: The CORE project thanks Sony of America, Digital Equipment Corporation, Sun Microsystems, and the Cornell Theory Center (which receives major funding from the National Science Foundation and New York State, and additional funding from ARPA, the National Institutes of Health, and IBM Corporation).