SGML: UM Making of America breaks the 1,000 volume barrier

From owner-humanist@lists.Princeton.EDU Wed Jul  9 14:40:18 1997
        Date: Mon, 7 Jul 1997 19:40:37 -0400 (EDT)
        From: John Price-Wilkin <>
        Subject: UM Making of America breaks the 1,000 volume barrier


	UM Making of America breaks the 1,000 volume barrier

Several hundred new volumes were recently added to the University of
Michigan's Making of America site,
bringing the total number of books to 1,402.  That's an average of 258
pages per volume, and a total size of 742Mb of searchable text.  This
represents a significant body of materials for research, 85% of the size
of the English Poetry Database, now accessible freely on the Internet. 
Nearly 200 more monographic titles will be added in the coming months,
bringing the size of the monographic portion to nearly 1Gb. 
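
As a rough check on the figures above, the following sketch works through the arithmetic.  It assumes that "Mb" means megabytes and that 1 Mb is 1024 Kb; because the 258-page average is rounded, the derived totals are approximate.

```python
# Figures quoted in the announcement
volumes = 1402        # books currently on the site
avg_pages = 258       # average pages per volume (rounded)
text_mb = 742         # searchable text, assumed to be megabytes

# Derived values (approximate, since the average is rounded)
total_pages = volumes * avg_pages            # about 361,716 pages
kb_per_page = text_mb * 1024 / total_pages   # assumes 1 Mb = 1024 Kb
epd_mb = text_mb / 0.85                      # implied size of the English Poetry Database

print(f"total pages:       {total_pages:,}")
print(f"text per page:     {kb_per_page:.1f} Kb")
print(f"implied EPD size:  {epd_mb:.0f} Mb")
```

On these assumptions the collection holds roughly 360,000 pages at about 2 Kb of text each.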

The UM Making of America site will soon feature Michigan's first
periodical titles in the project.  Indexers are currently adding article
separators and accurately keyed bibliographic information to the MoA
periodicals.  Several volumes have been prepared, and material will be
added as indexing is completed.  By the end of the summer, approximately
250,000 pages of 19c periodicals will be accessible through the project. 

The UM portion of the project combines automatically generated OCR
with a low level of SGML encoding that follows the TEI Guidelines.
This strategy provides us with a means by which we can inexpensively build
access mechanisms while at the same time building a consistent upgrade
path.  Volumes begin as lightly encoded materials, with a TEIHeader, body,
and page breaks.  As funding is found to improve materials in the
collection, the OCR is proofed and corrected, and the full structure of
the volume (e.g., chapters, sections, paragraphs) is encoded.  Initial
materials are made available in a relatively "raw" form, displaying page
images at various resolutions dynamically generated from 600dpi TIFF
images.  The improved, or "cooked", materials are used to display
on-the-fly generated HTML, with links to the same page images for
consultation.  An announcement and fuller discussion of this capability
will be made available late in the summer.  We hope that over time
resources can be found to improve large numbers of volumes in the MoA
collection.
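
To make the "lightly encoded" starting point concrete, here is a minimal sketch of what wrapping per-page OCR text in a header, a body, and page breaks might look like.  It is an illustration only, not the DLPS production code: the function name and sample data are hypothetical, the element names (TEI.2, teiHeader, pb) are taken from the TEI Guidelines, wrapping each page in a single p element is our assumption, and the header is far smaller than a valid TEI header would be.  The output uses XML-style empty tags so it can be checked with a standard parser; SGML would write the page break as <pb n="1">.

```python
from xml.sax.saxutils import escape


def lightly_encode(title, pages):
    """Wrap per-page OCR text in a minimal TEI-style skeleton.

    `pages` is a list of raw OCR strings, one per page image.
    Hypothetical helper for illustration; not a DTD-valid TEI document.
    """
    parts = [
        "<TEI.2>",
        "<teiHeader><fileDesc><titleStmt>",
        f"<title>{escape(title)}</title>",
        "</titleStmt></fileDesc></teiHeader>",
        "<text><body>",
    ]
    for number, ocr_text in enumerate(pages, start=1):
        # One page-break marker per page image, followed by the raw OCR.
        parts.append(f'<pb n="{number}"/>')
        parts.append(f"<p>{escape(ocr_text)}</p>")
    parts.append("</body></text>")
    parts.append("</TEI.2>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(lightly_encode("Sample Volume", ["OCR text of page 1", "OCR text of page 2"]))
```

Because every page break is recorded from the start, later proofing and structural encoding can be layered on without losing the link between the text and its page image.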

Access mechanisms for this collection were developed by the University of
Michigan Digital Library Production Service, a federated organization with
funding, staff, and computing resources contributed by the UM's Library,
its Information Technology Division, and the Media Union.  For more
information about DLPS, see 
The collection itself was built by the hard work of collection development
librarians and Preservation staff.  Generous funding was provided by the
Mellon Foundation for the original page conversion.  A sister site with
many significant 19c periodicals has been made available by Cornell
University at:

For more information or to provide feedback about the University of
Michigan MoA site, please e-mail