Cover Pages: Harvard University E-Journal Archive Project

[January 07, 2002] The Harvard University E-Journal Archive Project was initiated through an October 2000 request to the Andrew W. Mellon Foundation for funding to create a plan for the archiving of electronic journals. In May 2001, the Harvard University Library and three major publishers of scholarly journals (Blackwell Publishing; John Wiley & Sons, Inc.; the University of Chicago Press) announced an agreement to work together on a plan to develop an experimental archive for electronic journals. In December 2001, the Harvard E-Journal Archive project team published a Version 1.0 draft Submission Information Package (SIP) Specification, which "defines acceptable data formats, file naming conventions, bibliographic and technical metadata, etc." The SIP Normative Data Formats, including several XML DTDs and Schemas, are given in appendices A and B of the specification. An E-Journal Archival DTD Feasibility Study commissioned by Harvard University has also been released. It recommends the creation of an XML DTD or Schema which "can be developed, allowing successful conversion of significant intellectual content from publisher SGML and XML files into a common format for archival purposes."

E-Journal Archive description from the Draft Submission Information Package (SIP) Specification [2001-12-19]:

The purpose of the Harvard University E-Journal Archive is to preserve the significant intellectual content of journals independent of the form in which that content was originally delivered in order to assure that this content will be available to the scholarly community for the indefinite future. Functionally, the archive is designed to render text and still images and other formats as practical with no significant loss in intellectual content. The archive reserves the right to freely manipulate the internal format of the manifestation over time as long as the plain meaning of the intellectual content is preserved.

The framework for discussing the architecture and operation of the archive is provided by the Open Archival Information System (OAIS) Reference Model. Under the OAIS model, material from a content provider is transmitted to the archive in a form called a Submission Information Package (SIP). The format of the SIP acceptable to the Harvard archive is described normatively by this specification... The archive Ingest function accepts the SIP and potentially transforms its contents into an internal form called an Archival Information Package (AIP) for long-term preservation.

[May 30, 2003] NLM Releases XML Tagset and DTDs for Journal Publishing, Archiving, and Interchange. An announcement from the US National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) describes the release of a Tagset and two XML DTDs designed to "simplify journal publishing and increase the accuracy of the archiving and exchange of scholarly journal articles. In April 2002, representatives from NCBI, Mulberry Technologies, Inc., Inera, Inc., the Harvard University E-Journal Archiving Project, and the Mellon Foundation (supporting the Harvard project and Inera) met in Bethesda, MD to discuss what changes needed to be made to the PMC DTD to reach the target of the common DTD format for archiving. The [resulting] Journal Publishing DTD and the Archiving and Interchange DTD have been created from the Archiving and Interchange Tagset, a set of XML elements and attributes that can be used to define many other types of documents, including textbooks and online documentation. The Tagset provides a set of XML modules that defines elements and attributes for describing the textual and graphical content of journal articles as well as some nonarticle material such as letters, editorials, and book reviews. The purpose of the Tagset is to preserve the intellectual content of journals independently of the form in which that content was originally created. The Tagset has been written as a set of XML DTD modules, each of which is a separate file. No module is a complete DTD by itself, but these modules can be combined to create any number of new DTDs." The NLM Tagset represents an open specification: the DTDs and the Tagset are in the public domain so that any organization wishing to create its own DTD from the Tagset may do so without permission from NLM. NLM is forming an XML Interchange Structure Advisory Board to assist in development and maintenance of the Tagset. An Archiving and Interchange Tagset Secretariat will collect feedback and will physically maintain the files and documentation.

December 5, 2001 update from Marilyn Geller, Project Manager for the Harvard University E-Journal Archive: "Harvard has completed a first round of business meetings and technical meetings with our publisher-partners, Blackwell, John Wiley, and University of Chicago Press. We have also received a report from Inera, Inc. on the feasibility of developing a common archival article DTD... The project's technical team has met with each of the publishers regarding the principles of technical development and the specifications for ingesting content. The most significant technical development in the last few months has been the delivery of the Inera study on the feasibility of creating a common archival DTD that would allow the archive to receive material from all publishing partners tagged in the same manner. Ten publishers participated in this study by contributing their DTDs, documentation, and samples for review. The significant conclusions drawn from this study are that it is possible to create a common archival article DTD that would represent the intersection and the union of several existing publisher DTDs and that thorough documentation and quality assurance tools would be essential to insure that conversion is successful. Because this study has so much potential for resolving ingest, storage and delivery issues, it is being made available to the entire scholarly communications community. We are optimistic that this will encourage discussion and progress in the technical aspects of e-journal preservation... In the coming months, we hope to finalize the conceptual agreement with our publishing partners, document technical development, operations, and staffing of the archive, and refine the business model that will sustain this archive over time."

Project description from DLF:

Harvard is basing its archive on the architectural framework provided by the Open Archival Information System (OAIS) Reference Model. Under the OAIS model, material from a content producer is transmitted to the archive in a form called a Submission Information Package, or SIP. We have put together a tentative draft proposal for the technical specifications of the SIP that defines acceptable data formats, file naming conventions, bibliographic and technical metadata, and so forth. We are scheduling a round of meetings with technical representatives of our publishing partners to discuss and refine this proposal.

One of the key ideas we are exploring on the technical side is whether it is practical to design a common XML DTD that will reasonably represent the intellectual content of archival e-journal articles. Such a common DTD would simplify the work of gathering content from a variety of publishers using different DTDs. In this study, we have contracted with Inera because of their substantial background in this area and will look at the article DTDs being used by our publishing partners as well as a sampling of other DTDs representing large volumes of content and interesting elements. After determining the common elements of these DTDs, we hope to analyze the usefulness of this approach paying attention to what information is common to all DTDs and what information may be lost by using this common DTD. [description from August 31, 2001]
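The comparison described above, finding what is common to all DTDs and what a common DTD might lose, can be pictured as simple set arithmetic over element names. The following is a minimal sketch only: the publisher labels and element inventories are hypothetical placeholders, not taken from any publisher's DTD or from the Inera study, and a real analysis would also have to compare content models and attributes rather than bare element names.

```python
# Hypothetical element inventories for illustration; the real publisher
# DTDs examined in the study are not reproduced here.
publisher_elements = {
    "publisher_a": {"article", "title", "abstract", "para", "figure", "sidebar"},
    "publisher_b": {"article", "title", "abstract", "para", "figure", "dataset"},
    "publisher_c": {"article", "title", "abstract", "para", "table"},
}


def common_elements(inventories):
    """Return the elements present in every inventory (the intersection)."""
    sets = list(inventories.values())
    return set.intersection(*sets) if sets else set()


def all_elements(inventories):
    """Return every element used by at least one inventory (the union)."""
    return set().union(*inventories.values())


def lost_elements(inventories, common_dtd):
    """For each publisher, list the elements a given common DTD would not represent."""
    return {name: sorted(elems - common_dtd) for name, elems in inventories.items()}


common = common_elements(publisher_elements)
print("Common to all:", sorted(common))
print("Union of all:", sorted(all_elements(publisher_elements)))
print("Lost if only the common set is kept:", lost_elements(publisher_elements, common))
```

A common DTD built from the intersection alone loses the most information, while one built from the full union loses none but is the largest to document and maintain. A practical design would presumably sit somewhere between the two.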

In the Harvard University draft Submission Information Package (SIP) Specification, the issue-level metadata file (issue-md.xml) is XML-encoded according to the METS XML schema. This file contains descriptive, administrative, and structural metadata related to the issue and all issue-level SIP components... METS (Metadata Encoding and Transmission Standard) is an XML-formatted metadata framework for encoding descriptive, administrative, and structural metadata of digital library objects. It was developed as an initiative of the Digital Library Federation, and is built upon work previously performed for the DLF-funded Making of America II project coordinated at the University of California, Berkeley. The METS mechanisms for defining structural metadata and synchronization were derived in part from TEI and SMIL. METS is a metadata framework capturing structural relationships and providing containers for descriptive and administrative metadata encoded according to standards external to METS itself." See "Metadata Encoding and Transmission Standard (METS)."
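To make the three kinds of metadata concrete, here is a minimal sketch that builds a bare METS skeleton with Python's standard library. It is an illustration under stated assumptions, not the layout of the actual issue-md.xml file: the function name, the identifiers (DMD1, AMD1, FILE1, ...), the "issue" division type, and the file names are all hypothetical, and the result has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_issue_metadata(issue_id, file_names):
    """Build a skeletal METS tree (hypothetical layout, not the real issue-md.xml)."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)

    root = ET.Element(q("mets"), {"OBJID": issue_id})

    # Descriptive metadata: METS only wraps it; the content follows an external schema.
    dmd = ET.SubElement(root, q("dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q("mdWrap"), {"MDTYPE": "OTHER"})
    ET.SubElement(wrap, q("xmlData"))

    # Administrative metadata container (left empty in this sketch).
    ET.SubElement(root, q("amdSec"), {"ID": "AMD1"})

    # File inventory and the structural map that points into it.
    file_grp = ET.SubElement(ET.SubElement(root, q("fileSec")), q("fileGrp"))
    struct_map = ET.SubElement(root, q("structMap"))
    div = ET.SubElement(struct_map, q("div"), {"TYPE": "issue", "DMDID": "DMD1"})

    for index, name in enumerate(file_names, start=1):
        file_id = f"FILE{index}"
        file_el = ET.SubElement(file_grp, q("file"), {"ID": file_id})
        ET.SubElement(
            file_el,
            q("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name},
        )
        ET.SubElement(div, q("fptr"), {"FILEID": file_id})

    return root


issue = build_issue_metadata("issue-001", ["item-001.xml", "item-002.xml"])
print(ET.tostring(issue, encoding="unicode"))
```

The descriptive and administrative sections are only containers here, which mirrors the point made above: METS records structure and leaves the metadata content to schemas defined outside METS.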

SIP XML Schemas and DTDs: Appendix B of the draft SIP specification documents the SIP Normative Data Formats for representing content components within the SIP. XML 1.0 is identified for use in the Metadata, Issue and Item-Level Text, and Item-Level Linkage components; beyond conformance to the XML 1.0 standard, certain SIP components must also conform to specific XML schemas. For example, (1) Structural Metadata is governed by the METS XML schema with namespace http://www.loc.gov/METS/ [http://www.loc.gov/standards/mets/mets.xsd]; (2) Descriptive and Administrative Metadata use the EJAR schema with namespace http://hul.harvard.edu/EJAR/METADATA [http://hul.harvard.edu/EJAR/metadata.xsd]; (3) Issue-level data conforms to the EJAR-ISSUE schema with namespace http://hul.harvard.edu/EJAR/ISSUE/ [http://hul.harvard.edu/EJAR/issue.xsd]; (4) Item-level data conforms to the EJAR-ITEM schema with namespace http://hul.harvard.edu/EJAR/ITEM/ [http://hul.harvard.edu/EJAR/item.xsd]; (5) Item reference links use the EJAR-LINKS schema with namespace http://hul.harvard.edu/EJAR/LINKS/ [http://hul.harvard.edu/EJAR/links.xsd]. The document also specifies the use of W3C SVG and the DTD for MathML, Version 2.0.
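The namespace-to-schema pairs listed above lend themselves to a small lookup table. The sketch below, using only Python's standard library, reads the namespace of a document's root element and returns the schema location named for it. The namespaces and schema URLs are those quoted above; the function names and the sample root element name ("issue") are hypothetical. The code only selects a schema and does not perform XML Schema validation, which would need a validating parser.

```python
import xml.etree.ElementTree as ET

# Namespace -> schema location, as listed in the draft SIP specification.
SIP_SCHEMAS = {
    "http://www.loc.gov/METS/": "http://www.loc.gov/standards/mets/mets.xsd",
    "http://hul.harvard.edu/EJAR/METADATA": "http://hul.harvard.edu/EJAR/metadata.xsd",
    "http://hul.harvard.edu/EJAR/ISSUE/": "http://hul.harvard.edu/EJAR/issue.xsd",
    "http://hul.harvard.edu/EJAR/ITEM/": "http://hul.harvard.edu/EJAR/item.xsd",
    "http://hul.harvard.edu/EJAR/LINKS/": "http://hul.harvard.edu/EJAR/links.xsd",
}


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree writes qualified names as "{namespace}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def schema_for(xml_text):
    """Return the schema location for a SIP component, or None if unrecognized."""
    return SIP_SCHEMAS.get(root_namespace(xml_text))


# The root element name "issue" is a placeholder, not taken from the EJAR schemas.
sample = '<issue xmlns="http://hul.harvard.edu/EJAR/ISSUE/"/>'
print(schema_for(sample))
```

An ingest step could use such a lookup to decide which schema a submitted component should be checked against before it is accepted.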

References:

"Report on the Planning Year Grant For the Design of an E-journal Archive." Presented by the Harvard University Library Mellon Project Steering Committee to The Andrew W. Mellon Foundation. April 1, 2002. 33 pages. [cache]
"XML in the Harvard Libraries." By Stephen Abrams (Harvard University Library). December 16, 2002. 30 slides.
Submission Information Package (SIP) Specification. Harvard E-Journal Archive project. Draft Version 1.0 (first public draft). December 19, 2001. From Harvard University Library, Office for Information Systems. 40 pages. Contact for comments: Stephen Abrams or Marilyn Geller. [cache]
SIP display examples from Submission Information Package (SIP) Specification, 2001-12-19. For [1] SIP Issue-Level Metadata File (issue-md.xml) and [2] SIP Item-Level Metadata File (item-md.xml). Extracted from the draft specification.
"Harvard University Library: A Study of Electronic Journal Archiving." From a Digital Library Federation (DLF) 'Summary of the Projects and their Progress'. [cache]
The Andrew W. Mellon Foundation's e-Journal archiving program. Summary of the Mellon program, including the Harvard University electronic journal archiving proposal.
News item of 2002-01-04: "Harvard University Library Feasibility Study Recommends XML DTD/Schema for E-Journal Archives."
E-Journal Archival DTD Feasibility Study. Commissioned by the Harvard University Library, Office for Information Systems, E-Journal Archiving Project. Prepared by Inera Incorporated. December 5, 2001. 65 pages. Edited by Bruce Rosenblum (Inera). [cache]
Summary of recommendations in the DTD Feasibility Study 2001-12-05.
"Feasibility of an Archival Article DTD" quoted in 'In Brief', based upon a report from Marilyn Geller, Project Manager, Harvard University. In D-Lib Magazine Volume 7 Number 12 (December 2001).
[May 14, 2001] "Harvard University Library Joins Forces with Key Publishers for Electronic Journal Archive." Announcement May 14, 2001. "The Harvard University Library and three major publishers of scholarly journals -- Blackwell Publishing, John Wiley & Sons, Inc., and the University of Chicago Press -- have agreed to work together on a plan to develop an experimental archive for electronic journals. The preservation and the archiving of electronic journals - which are increasingly "born digital" and for which, in many cases, no paper copies exist - present unique, long-term challenges to librarians, publishers, and, ultimately, to the scholars and researchers who will seek to access to them over time..." [source]
"Harvard University Library E-Journal Archiving Project." By Dale Flecker. Harvard University Library E-Journal Archiving Project. February, 2001. 14 slides.
Proposal For a Study Of Electronic Journal Archiving. Submitted to the Andrew W. Mellon Foundation. October 13, 2000. "The Harvard University Library is requesting funding from the Andrew W. Mellon Foundation to create a plan for the archiving of electronic journals, in response to a letter of invitation received from the Foundation in August 2000."
Reference Model for an Open Archival Information System (OAIS). July 2001. Consultative Committee for Space Data Systems. "The purpose of this document is to define the International Organization for Standardization (ISO) Reference Model for an Open Archival Information System (OAIS). An OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community... The reference model addresses a full range of archival information preservation functions including ingest, archival storage, data management, access, and dissemination. It also addresses the migration of digital information to new media and forms, the data models used to represent the information, the role of software in information preservation, and the exchange of digital information among archives. It identifies both internal and external interfaces to the archive functions, and it identifies a number of high-level services at these interfaces..."
Contacts: Marilyn Geller (Harvard E-Journal Archive Project Manager) and Bruce Rosenblum (Primary Author, DTD Feasibility Study).

