Issue II                                  Volume I                       August

Table of Contents
  1. The DOE Electronic Exchange Initiative
  2. SGML Technical Working Group
  3. SGML Pilot Projects
  4. Standard Generalized Markup Language
  5. SGML is an International Standard - ISO 8879
  6. Document Type Definitions
  7. SGML Systems
  8. Creating SGML Formatted Output
  9. The Electronic Exchange Vision

The DOE Electronic Exchange Initiative

The Department of Energy's initiative to develop standards for the electronic exchange of scientific and technical information (STI) was announced in August 1991 by the Office of Information Resources Management (IRM) Policy, Plans, and Oversight. At that time the Department adopted ISO 8879, as defined in Federal Information Processing Standards (FIPS) Publication 152, as the DOE standard. The Office of Scientific and Technical Information (OSTI), which coordinates STI management within the Department, was given responsibility for managing the transition. The initiative assumes an open systems environment in which international standards play the predominant role in the electronic exchange of STI. Its vision is the electronic creation, storage, transfer, retrieval, and exchange of full-text documents throughout the STI community. In February 1993 the Department issued The Electronic Exchange of Scientific and Technical Information: Strategic Plan. The Strategic Plan was developed by a team of information managers and professionals, and computer scientists, from DOE and contractor sites across the country. These stakeholders set the goal, identified in the Strategic Plan, of making electronic exchange of full-text Department of Energy scientific and technical information the norm by the year 2000.

SGML Technical Working Group

The vehicle for this collaborative effort within the Department is the Standard Generalized Markup Language (SGML) Technical Working Group. The Working Group is a cooperative undertaking of DOE and the Commerce, Energy, National Library of Medicine, National Aeronautics and Space Administration, Defense Information (CENDI) community to facilitate the use of SGML and other standards for the electronic exchange of STI. Its members, drawn from DOE, DOE contractors, and CENDI organizations, serve as a core resource and advisory group for implementing SGML applications.

The SGML Technical Working Group plans to meet on October 24, 1994, during the Office of Scientific and Technical Information's INFOTECH activities. Topics for the meeting will include Document Type Definition development, the Bibliographic Record Project with the Savannah River Site, and the full-text exchange projects planned for early next year. Information about the meeting is available from Bob Donohue by telephone at (615) 241-3849 or by e-mail at Bob.Donohue@CCMAIL.OSTI.GOV.

SGML Pilot Projects

A number of pilot projects are being conducted within the Department. A pilot project is currently under way with the Savannah River Site to exchange bibliographic information electronically using SGML-encoded data. This pilot will be completed by the end of July. A full-text and bibliographic data exchange is envisioned with the Oak Ridge National Laboratory. Additionally, OSTI is planning a pilot with Brookhaven National Laboratory to develop and exchange an electronic document with hypermedia characteristics.

Standard Generalized Markup Language

SGML was designed by Dr. Charles Goldfarb to facilitate the exchange of documents in environments where computing platforms and software applications are expected to differ, and to maximize the informational value of those documents in automated processing. One of the primary driving forces behind the development of SGML was to free publishers from the autocratic and arbitrary rule of expensive proprietary languages. Another impetus was the desire to create electronic products without repetitious editorial effort, that is, to prepare content once in a form suitable for both printed and electronic documents. A third was the growing recognition that information may well outlast the computers and software used to produce it.

SGML is an International Standard - ISO 8879

SGML is an international standard (ISO 8879) and also a Federal Information Processing Standard (FIPS 152). It has been adopted by the legal and financial communities, the automotive industry, the commercial airline industry, the Securities and Exchange Commission, the pharmaceutical industry, the Department of Defense and, of course, the Department of Energy, to name a few. SGML is not a specific set of tags; rather, it is a methodology and a language for describing the structure of a document. Any document can be described in terms of its structure, and SGML can describe any document that has structure. An SGML document consists of only three parts: the SGML Declaration, the Document Type Definition, and the Document Instance. The SGML Declaration establishes the character set and syntax used to describe the document; it lets the user determine, for example, the base character set, the data encoding method (ASCII is the most commonly used), and the maximum length of tag names.
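To make the SGML Declaration's role more concrete, here is a minimal Python sketch that flags tag names longer than a name-length limit of the kind a declaration sets. It assumes a limit of 8 characters (the NAMELEN value in the ISO 8879 reference quantity set; a real declaration may set another value). The function name `names_too_long` and the sample tags are hypothetical, and the regular expression is a deliberate simplification, not an SGML parser.

```python
import re

# Illustrative limit: NAMELEN in the ISO 8879 reference quantity set is 8.
# An actual SGML Declaration may specify a different value.
NAMELEN = 8

# Matches start and end tags such as <title> or </title> and captures the name.
# Simplified on purpose: ignores comments, marked sections, and short tags.
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9.-]*)")

def names_too_long(instance: str, namelen: int = NAMELEN) -> list[str]:
    """Return the distinct tag names in the instance longer than namelen."""
    too_long = []
    for name in TAG_NAME.findall(instance):
        if len(name) > namelen and name not in too_long:
            too_long.append(name)
    return too_long

sample = "<report><title>Pilot</title><bibliographic>Savannah River</bibliographic></report>"
print(names_too_long(sample))  # ['bibliographic']
```

Under these assumptions, a document using a 13-character name like `bibliographic` would need a declaration with a larger name-length limit, or a shorter tag name.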

Document Type Definitions

The Document Type Definition, or DTD, describes the structure and content elements of a document and establishes the relationships among the tags that occur in it. An application that complies with the SGML standard uses the DTD to interpret and process the tags and text contained in the document. Attributes, specified within a tag, qualify the meaning of an element and add intelligence to the data in an SGML document. For example, an attribute may assign a classification to a particular portion of text, so that in an electronic document only readers with the commensurate level of clearance and need-to-know can read that portion. The Document Instance is the marked-up document itself. These three parts can reside physically together or separately. While the SGML standard addresses text only, graphical information is handled through external entity references.
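The classification example can be sketched in Python. Because Python's standard library has no SGML parser, the sample instance below is written in XML syntax (a later, simplified profile of SGML) so that `xml.etree.ElementTree` can read it. The `para` element, the `class` attribute, and the three clearance levels are all hypothetical; the sketch only illustrates how an application might act on an attribute value.

```python
import xml.etree.ElementTree as ET

# Hypothetical clearance ordering; the levels are illustrative only.
LEVELS = {"public": 0, "internal": 1, "restricted": 2}

# Sample instance with a hypothetical "class" attribute on each paragraph.
DOC = """<report>
  <para class="public">Project overview.</para>
  <para class="restricted">Detailed test results.</para>
  <para>Unmarked text defaults to public.</para>
</report>"""

def readable_text(xml_text: str, clearance: str) -> list[str]:
    """Return the text of <para> elements the reader's clearance permits."""
    limit = LEVELS[clearance]
    root = ET.fromstring(xml_text)
    visible = []
    for para in root.iter("para"):
        # A missing attribute is treated as "public" in this sketch.
        level = LEVELS[para.get("class", "public")]
        if level <= limit:
            visible.append(para.text or "")
    return visible

print(readable_text(DOC, "internal"))
# ['Project overview.', 'Unmarked text defaults to public.']
```

The markup carries the classification; the decision about who may read what stays in the application, which can change without touching the document.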

SGML Systems

Establishing a productive SGML system requires attention to several components. An SGML system is a suite of computer applications that address the creation, processing, and production of SGML documents.

An input system is required for creating SGML documents. This can be a simple ASCII text editor or one of the editing applications designed specifically for creating SGML document instances. A number of these are on the market today; WordPerfect, Datalogic, GRIF, ArborText, and SoftQuad all make SGML editing applications that are easy to use and keep the SGML encoding behind the scenes, where it belongs.

An SGML parser is also necessary for validating the SGML document instance against the structural rules established in the Document Type Definition. We have been using a public domain parser called SGMLS, and most SGML editors have a parser bundled with their editing software.

A Document Type Definition describes the structural rules associated with a specific class of documents. A number of Document Type Definition creation applications are also being marketed today. For example, Near & Far by Microstar is a Document Type Definition utility whose user is called a Computer Aided Document Engineer. SoftQuad, Frame, and Interleaf, to name a few, also offer Document Type Definition utilities that ease this process.
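A parser's validation step can be illustrated with a small Python sketch. An SGML content model is essentially a regular expression over element names, so a hypothetical declaration such as `<!ELEMENT report - - (title, author+, abstract?, body)>` can be hand-translated into a Python regular expression and used to check the order of an element's children. The element names and the function `report_children_valid` are illustrative assumptions; this covers a single sequence model and is not how SGMLS or any particular parser is implemented.

```python
import re

# Hand-translated from the hypothetical declaration:
#   <!ELEMENT report - - (title, author+, abstract?, body)>
# ","  (sequence)     -> names in the given order, separated by one space
# "+"  (one or more)  -> author followed by zero or more further authors
# "?"  (optional)     -> abstract appears once or not at all
REPORT_MODEL = re.compile(r"title author(?: author)* (?:abstract )?body")

def report_children_valid(child_names: list[str]) -> bool:
    """Check the ordered child element names of <report> against the model."""
    return REPORT_MODEL.fullmatch(" ".join(child_names)) is not None

print(report_children_valid(["title", "author", "author", "abstract", "body"]))  # True
print(report_children_valid(["title", "body"]))  # False: at least one author is required
```

A full parser builds such checks automatically from every declaration in the DTD and also handles tag minimization, attributes, and entities.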

Creating SGML Formatted Output

A means of translating document instances into a specific output application is also necessary: that is, a way to take the SGML document instance and produce formatted output. The formatted output could be a paper document or an electronic document viewed over the Internet. Although an output specification standard has yet to be resolved by the international standards bodies, a number of approaches are being taken in this area. Electronic Book Technologies, Interleaf's Worldview, and Adobe Acrobat, to name just a few, all have the technology needed to create formatted representations of SGML data. We are finding that SGML allows multiple output alternatives without much difficulty in the conversion process. For example, from a single SGML source document we created a Hypertext Markup Language (HTML) document for viewing through Mosaic, processed the same document through the Interactive Authoring and Display System (a hypermedia SGML viewer), and produced a WordPerfect paper document.
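The single-source approach can be sketched in Python as a recursive walk that maps each source element to an HTML tag. The element names and the `TAG_MAP` table are hypothetical, and the source is written in XML syntax so that `xml.etree.ElementTree` can parse it; a production conversion would start from a validated SGML instance and handle attributes and entities as well.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from source element names to HTML tags.
TAG_MAP = {"report": "body", "title": "h1", "para": "p", "emph": "em"}

def to_html(elem: ET.Element) -> str:
    """Recursively render an element tree as HTML using TAG_MAP."""
    tag = TAG_MAP.get(elem.tag, "div")  # unmapped elements fall back to <div>
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child))
        parts.append(escape(child.tail or ""))  # text following the child
    return f"<{tag}>{''.join(parts)}</{tag}>"

src = "<report><title>Pilot Results</title><para>Exchange was <emph>successful</emph>.</para></report>"
print(to_html(ET.fromstring(src)))
# <body><h1>Pilot Results</h1><p>Exchange was <em>successful</em>.</p></body>
```

Producing a different output, such as input for a word processor, would mean supplying a different mapping while leaving the source document untouched.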

The Electronic Exchange Vision

The vision of the Electronic Exchange Initiative is information that is freely exchanged, unencumbered by proprietary publishing formats that lock up information and render it inaccessible over time. Use and reuse of information are critical components of information life-cycle management, and only one standard addresses these components: Standard Generalized Markup Language. Any future information systems architecture must address electronic exchange standards. The exchange environment in the Department of Energy is now characterized by autonomous islands of information-rich organizations that lack the means to share the results of their labor easily or inexpensively with the Federal community and the American public.

