With financial support from the Mellon Foundation, the Harvard University Library's Office for Information Systems E-Journal Archiving Project commissioned a feasibility study to investigate the development of a common markup formalism that can be used to "reasonably represent the intellectual content (text, tables, formulas, still images, and links) of archived journal articles." The study was carried out by Inera Incorporated, using input from ten publishers who were asked to provide their existing DTDs, documentation, and sample SGML documents for analysis. The Inera investigative team reviewed the materials to determine whether such a structure can be developed and to assess the challenges that SGML transformation would pose; they also examined the challenges faced by organizations that have worked with DTDs from multiple publishers. A 65-page report documenting the E-Journal Archival DTD Feasibility Study has now been published. The report recommends creating an XML DTD or Schema, concluding that one "can be developed, allowing successful conversion of significant intellectual content from publisher SGML and XML files into a common format for archival purposes." The authors of the recommendation deferred the choice between XML DTD and W3C XML Schema as the formal notation. The Harvard project team now "hopes to finalize the conceptual agreement with its publishing partners, to document technical development, operations, and staffing of the archive, and to refine the business model that will sustain the archive over time."
Bibliographic information: E-Journal Archival DTD Feasibility Study. Commissioned by the Harvard University Library, Office for Information Systems, E-Journal Archiving Project. Prepared by Inera Incorporated. December 5, 2001. 65 pages. Edited by Bruce Rosenblum (Inera). With contributions from Bob Hollowell (American Institute of Physics), Kristine Schnebly (BioOne), David Sommer and Richard O'Beirne (Blackwell Science), Karen Hunter and Jos Migchielsen (Elsevier Science), John Sack, Maureen Phayer, and Diana Robinson (Highwire Press), Stephen Cohen and Ken Rawson (IEEE), Y Kathy Kwan and Ed Sequeira (Pubmed Central), Howard Ratner and Heather Rankin (Nature Publishing Group), Evan Owens and John Muenning (University of Chicago Press), and Margaret Wallace (John Wiley & Sons).
Summary from D-Lib Magazine: "In the Fall of 2001, under the auspices of a Mellon Grant to explore ejournal archiving, Harvard University Library contracted with Inera, Inc. to review a variety of DTDs from selected publishers. The study focused on two key questions: Can a common DTD be designed and developed into which publishers' proprietary SGML files can be transformed to meet the requirements of an archiving institution? If such a structure can be developed, what are the issues that will be encountered when transforming publishers' SGML files into the archive structure for deposit into the archive? The requirement of the archival article DTD was defined as ability to represent the intellectual content of journal articles."
Project Methodology: "Harvard and Inera selected ten DTDs for review... The goal was inclusion of a sufficient number of DTDs to allow most significant issues to be identified during the course of the study. All publishers asked to participate in this study accepted. They include: American Institute of Physics (AIP), BioOne (BioOne), Blackwell Science (Blackwell), Elsevier Science (Elsevier), Highwire Press (HWP), Institute of Electrical and Electronics Engineers (IEEE), Nature Publishing Group (Nature), Pubmed Central (PMC), University of Chicago Press (UCP), John Wiley & Sons (Wiley). All participating publishers were asked to submit the current version of their DTD, DTD documentation, and twenty to twenty-five sample document instances from multiple journals and issues... All of the reviewed DTDs owe their legacy, directly or indirectly, to the ISO 12083 Serial DTD..."
Status report of 2001-12-05 from DLF (via Marilyn Geller, Harvard Project Manager): "Harvard has completed a first round of business meetings and technical meetings with our publisher-partners, Blackwell, John Wiley, and University of Chicago Press. We have also received a report from Inera, Inc. on the feasibility of developing a common archival article DTD... The significant conclusions drawn from this study are that it is possible to create a common archival article DTD that would represent the intersection and the union of several existing publisher DTDs and that thorough documentation and quality assurance tools would be essential to insure that conversion is successful. Because this study has so much potential for resolving ingest, storage and delivery issues, it is being made available to the entire scholarly communications community. We are optimistic that this will encourage discussion and progress in the technical aspects of e-journal preservation. In the coming months, we hope to finalize the conceptual agreement with our publishing partners, document technical development, operations, and staffing of the archive, and refine the business model that will sustain this archive over time..."
Principal references:
- E-Journal Archival DTD Feasibility Study.
- "Feasibility of an Archival Article DTD" quoted in 'In Brief', based upon a report from Marilyn Geller, Project Manager, Harvard University. In D-Lib Magazine Volume 7 Number 12 (December 2001).
- "Harvard University Library: A Study of Electronic Journal Archiving." From a Digital Library Federation (DLF) 'Summary of the Projects and their Progress'.
- Summary of recommendations in the DTD Feasibility Study
- Contacts: Marilyn Geller (Project Manager) and Bruce Rosenblum (Primary Author, DTD Feasibility Study).
- See also: "ISO 12083 XML DTDs"
- "Harvard University E-Journal Archive Project" - Main reference page.