E-Journal Archival DTD Feasibility Study: Summary of Recommendations
From E-Journal Archival DTD Feasibility Study section 12.1, "DTD Design and Development":
Based on the findings in this report, Inera believes that a DTD or Schema can be developed that will allow successful conversion of significant intellectual content from publisher SGML and XML files into a common format for archival purposes. We recommend that the DTD and accompanying use policies have the following characteristics:
- The archive should use XML, not SGML, because there is a wider range of tools available for XML.
- The archive DTD should be less restrictive in structure and more streamlined in element selection than the specific DTDs created by individual publishers because it must accommodate a wide range of journal styles.
- The archive DTD should fall between the intersection and the union of structural elements found in the surveyed publisher DTDs. It should be broad enough to capture the key structural elements common to most publishers; however, it will exclude many elements that are unique to specific publishers because the intellectual content of these elements can be adequately rendered through less specific markup.
- The archive DTD should make use of public standards rather than proprietary standards wherever possible because work based on public standards is more likely to be easy to access when articles are eventually retrieved from the archive.
- Archive XML files should include generated text and face markup. We believe that this solution presents the most effective method for an archive to render content.
- The archive DTD must include comprehensive documentation to ensure that the DTD is correctly and consistently interpreted. In addition, the archive should consider providing quality assurance tools to organizations that transform content into the archive DTD.
- Quality control tools should be developed in conjunction with the archive DTD. They should be used to validate all content submitted to the archive. Creators of archival SGML should be encouraged (if not required) to use the tools.
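As an illustration only, the kind of quality control check recommended above might start with something like the following Python sketch. It is not part of the study: the root element name `article` and the required paths `front/article-title` and `body` are hypothetical placeholders, since the summary does not define the archive DTD's element names. The sketch tests only well-formedness and the presence of a few elements; full validation against a DTD would need a validating parser, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names chosen for illustration; the study summary
# does not specify the archive DTD's actual vocabulary.
EXPECTED_ROOT = "article"
REQUIRED_PATHS = ("front/article-title", "body")


def check_archive_file(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A file that is not well-formed XML cannot be checked any further.
        return [f"not well-formed: {exc}"]

    problems = []
    if root.tag != EXPECTED_ROOT:
        problems.append(f"unexpected root element: {root.tag}")
    for path in REQUIRED_PATHS:
        if root.find(path) is None:
            problems.append(f"missing required element: {path}")
    return problems
```

A depositor could run such a check before submission and receive the list of problems as feedback, which is one simple way to picture the tools being shared with organizations that transform content.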
From section 12.2, "Transformation Deposit and Retrieval":
While we are confident that the design and development of an archive DTD can be successfully completed, we believe there are significant challenges to be faced with its deployment and use. These issues include:
- The quality and consistency of incoming SGML cannot be ensured. Some content that is converted to an archive DTD and archived without immediate use or adequate quality checks may prove to be unusable, in part or in full, when it is accessed at an unknown future date.
- Quality can be significantly improved through a validation process and ongoing feedback loops to archive depositors. Setup and maintenance of quality checks will impact the archive budget.
- Some manual intervention may be necessary during SGML transformation. It is possible that under some circumstances the degree of manual intervention required may exceed the capacity of an archiving institution.
- Publishers should not be asked to add granularity when transforming SGML to an archive DTD because this places an increased burden on their ability to deposit content. However, in cases where adding structure is critical to enable linking and can be done with pattern recognition, this step should be encouraged.
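To make the last point concrete, here is a minimal Python sketch of adding link-enabling structure by pattern recognition. It is an assumption-laden illustration, not a method described in the study: it picks DOIs as the pattern to recognize, uses a simplified regular expression that will not match every valid DOI, and wraps matches in a hypothetical `<doi>` element that is not taken from any archive DTD.

```python
import re
from xml.sax.saxutils import escape

# Simplified DOI pattern (an assumption for this sketch): "10.", a 4-9 digit
# registrant code, a slash, then any run of characters that are not
# whitespace, angle brackets, or double quotes.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s<>\"]+")


def tag_dois(reference_text):
    """Wrap each recognized DOI in a hypothetical <doi> element."""
    parts = []
    last = 0
    for match in DOI_PATTERN.finditer(reference_text):
        # Trailing sentence punctuation is almost never part of the DOI.
        doi = match.group(0).rstrip(".,;")
        end = match.start() + len(doi)
        # Escape the untouched text so the result stays valid XML content.
        parts.append(escape(reference_text[last:match.start()]))
        parts.append(f"<doi>{escape(doi)}</doi>")
        last = end
    parts.append(escape(reference_text[last:]))
    return "".join(parts)
```

Because the rule is mechanical, it adds structure without manual effort from the depositor, which is the situation in which the recommendation encourages the extra step.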
Prepared by Robin Cover for The XML Cover Pages archive. See: "Harvard University Library Feasibility Study Recommends XML DTD/Schema for E-Journal Archives." Main reference: "Harvard University E-Journal Archive Project."