ELECTRONIC EXCHANGE
Issue II, Volume I, August 1994
- The DOE Electronic Exchange Initiative
- SGML Technical Working Group
- SGML Pilot Projects
- Standard Generalized Markup Language
- SGML is an International Standard - ISO 8879
- Document Type Definitions
- SGML Systems
- Creating SGML Formatted Output
- The Electronic Exchange Vision
The initiative to develop standards for the electronic exchange of
scientific and technical information (STI) within the Department of
Energy was announced in August 1991 by the Office of Information
Resources Management (IRM) Policy, Plans, and Oversight. At that time
the Department adopted ISO 8879, as defined in the Federal Information
Processing Standards (FIPS) Publication 152, as the DOE standard. The
Office of Scientific and Technical Information (OSTI), which coordinates
STI management within the Department, was given responsibility to manage
the transition.
The initiative assumes an open systems environment in which
international standards play the predominant role for electronic
exchange of STI. The vision of this initiative is the electronic
creation, storage, transfer, retrieval, and exchange of full text
documents throughout the STI community.
In February 1993 the Department issued The Electronic Exchange of
Scientific and Technical Information: Strategic Plan. The Strategic Plan
was developed by a team of information managers, information
professionals, and computer scientists from DOE and contractor sites
across the country. These stakeholders established the goal, identified
in the Strategic Plan, of making electronic exchange of full-text
Department of Energy scientific and technical information the norm by
the year 2000.
The vehicle for this collaborative effort within the Department is the
Standard Generalized Markup Language (SGML) Technical Working Group. The
SGML Technical Working Group is a cooperative undertaking of the DOE and
the Commerce, Energy, National Library of Medicine, National Aeronautics
and Space Administration, Defense Information (CENDI) community to
facilitate the use of SGML, and other standards, for the electronic
exchange of STI. The Working Group, with membership drawn from DOE, DOE
contractors, and CENDI organizations, serves as a nucleus resource and
advisory group for facilitating the implementation of SGML applications.
The SGML Technical Working Group plans to meet on October 24, 1994,
during the Office of Scientific and Technical Information's INFOTECH
activities. Topics for this meeting will be Document Type Definition
development, the Bibliographic Record Project with Savannah River Site,
and the full-text exchange projects envisioned early next year.
Information about the SGML Technical Working Group meeting may be
obtained by phoning Bob Donohue at (615) 241-3849 or by email at
Bob.Donohue@CCMAIL.OSTI.GOV.
A number of pilot projects are being conducted within the Department. A pilot
project is currently under way with the Savannah River Site to electronically
exchange bibliographic information using SGML-encoded data. This pilot will
be completed by the end of July. A full text and bibliographic data exchange is
envisioned with the Oak Ridge National Laboratory. Additionally, OSTI is
planning a pilot with the Brookhaven National Laboratory for the development
and exchange of an electronic document that encompasses hypermedia
characteristics.
SGML was designed by Dr. Charles Goldfarb to enable and facilitate the
exchange of documents in an environment where the expectation is that
computing platforms and software applications are different and the
informational value of these documents can be maximized in an automated
processing environment. One of the primary driving forces behind the
development of SGML was to free publishers from the autocratic and arbitrary
rule of expensive proprietary languages. Another impetus was the desire to
create electronic products without repetitious editorial effort, that is, to
prepare content once in a form that is suitable for both printed and electronic
documents. A further impetus was the growing recognition that information may
well last longer than the computer or software used to produce it.
SGML is an international standard (ISO 8879) and it is also a Federal
Information Processing Standard (FIPS 152). It has been adopted by the legal
and financial community, the automotive industry, the commercial airline
industry, the Securities and Exchange Commission, the pharmaceutical
industry, the Department of Defense and, of course, the Department of Energy,
to name a few.
SGML is not a specific set of tags; rather, it is a methodology
and a language for describing the structure of a document. Any document
can be described in terms of structure and SGML can describe any
document that has structure. An SGML document consists of only three
parts: the SGML Declaration, the Document Type Definition and the
Document Instance. The SGML Declaration establishes the character set
and syntax that will be used to describe the document; for example, it
lets the user determine the base character set, the data encoding method
(ASCII is the most commonly used) and the maximum length of the tag
names.
The Document Type Definition, or DTD, describes the structure and
content elements of a document. An application that complies with the
SGML standard interprets and processes the tags and text contained in
the document. The DTD also establishes the relationship between tags
that occur in a document. Attributes attached to a tag may qualify its
meaning, adding intelligence to the data in an SGML document. For
example, an attribute may carry a classification for a particular
portion of text, so that in an electronic document only those portions
with an appropriate classification may be read by someone with the
commensurate level of clearance and need-to-know. The Document Instance
is the marked-up document. These pieces of an SGML application can
reside physically together or separately. While the SGML standard
addresses text only, graphical information is dealt with via external
entity references.
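As a minimal sketch of how an attribute can qualify marked-up text, the
Python example below keeps only the portions of a document whose
classification is at or below a reader's clearance. The element names
(report, para), the attribute name (class), the level names, and the
function readable_text are all hypothetical. The sample also assumes
fully explicit start and end tags so that Python's
xml.etree.ElementTree module, which is not an SGML parser, can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical ordering of classification levels, lowest to highest.
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2}

# Hypothetical document instance; "class" is an attribute on each paragraph.
SAMPLE = """
<report>
  <para class="unclassified">Reactor status is nominal.</para>
  <para class="secret">Shipment leaves at dawn.</para>
  <para class="confidential">Budget review is pending.</para>
</report>
"""

def readable_text(document, clearance):
    """Return the paragraphs a reader with the given clearance may see."""
    allowed = LEVELS[clearance]
    root = ET.fromstring(document)
    visible = []
    for para in root.iter("para"):
        # Assumption: a missing attribute is treated as the highest level.
        level = para.get("class", "secret")
        if LEVELS[level] <= allowed:
            visible.append(para.text.strip())
    return visible

print(readable_text(SAMPLE, "confidential"))
```

A real application would also have to account for need-to-know, which
this sketch leaves out.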
To establish a productive SGML system, a number of things must be considered.
An SGML system refers to a suite of computer applications that address the
creation, processing and production of SGML documents. An input system is
required for creating SGML documents. This can entail a simple ASCII text
editor or one of the editing applications specifically designed for creating
SGML document instances. There are a number of these on the market today:
WordPerfect, Datalogic, GRIF, ArborText and SoftQuad are all making SGML
editing applications that are easy to use and keep the SGML encoding aspects
of the process behind the scenes where they belong. An SGML parser is also
necessary for validating the SGML document instance against the structural
rules established in the Document Type Definition. We have been using a
public domain parser called SGMLS, and most SGML editors have a parser
bundled with their editing software.
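The validation step can be illustrated with a deliberately simplified
sketch in Python. It is not SGMLS and not a real SGML parser. The content
models below are a hypothetical stand-in for a Document Type Definition,
written as regular expressions over child element names, and the instance
is assumed to use fully explicit tags so that xml.etree.ElementTree can
read it. All element and function names are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD: each element maps to a regular expression
# that its sequence of child element names must match.
CONTENT_MODELS = {
    "report": r"title (author )+(para )+",  # one title, then authors, then paras
    "title": r"",                           # text only, no child elements
    "author": r"",
    "para": r"",
}

def validate(element, errors=None):
    """Collect a message for every element that breaks its content model."""
    if errors is None:
        errors = []
    model = CONTENT_MODELS.get(element.tag)
    # Child names joined with trailing spaces, e.g. "title author para ".
    children = "".join(child.tag + " " for child in element)
    if model is None:
        errors.append(f"undeclared element: {element.tag}")
    elif re.fullmatch(model, children) is None:
        errors.append(f"bad content in {element.tag}: {children.strip()}")
    for child in element:
        validate(child, errors)
    return errors

GOOD = "<report><title>T</title><author>A</author><para>P</para></report>"
BAD = "<report><author>A</author><title>T</title><note>N</note></report>"

print(validate(ET.fromstring(GOOD)))
print(validate(ET.fromstring(BAD)))
```

The first call reports no errors; the second reports that the report
element has its children in the wrong order and that note is not declared.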
A Document Type Definition describes the structural rules associated with
specific classes of documents. There are a number of Document Type Definition
creation applications also being marketed today. For example, Near & Far by
Microstar is a Document Type Definition utility in which the practitioner is
called a Computer Aided Document Engineer. SoftQuad, Frame and Interleaf, to
name a few, also have Document Type Definition utilities that ease this
process.
A means to translate the document instances into a specific output
application is also necessary. This means taking the SGML document instance
and creating a formatted output. The formatted output could be a paper
document or an electronic document viewed over the Internet. While the output
specification standard is yet to be resolved by the international standards
bodies, there are a number of approaches being taken to address this area.
Electronic Book Technologies, Interleaf's Worldview and Adobe Acrobat, to name
a few, all have the necessary technology to create formatted representations
of the SGML data. We are finding that SGML does allow multiple output
alternatives without a great deal of difficulty in the conversion process.
For example, we took an SGML document and created a Hypertext Markup Language
(HTML) document for viewing through Mosaic, processed the same SGML document
through the Interactive Authoring and Display System (a hypermedia SGML
viewer) and, in addition, created a WordPerfect paper document, all from the
same SGML source document.
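The single-source idea can be sketched in Python as follows. This is not
the conversion process described above. The element names, the tag
mapping, and both rendering functions are hypothetical, and the source is
assumed to use fully explicit tags so that xml.etree.ElementTree, which
is not an SGML parser, can read it.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document using structural, not presentational, tags.
SOURCE = (
    "<report><title>Pilot Results</title>"
    "<para>Exchange worked.</para>"
    "<para>Costs &amp; savings follow.</para></report>"
)

# Hypothetical mapping from structural element names to HTML tags.
HTML_TAGS = {"title": "h1", "para": "p"}

def to_html(root):
    """Render each child element with its mapped HTML tag."""
    parts = []
    for child in root:
        tag = HTML_TAGS[child.tag]
        parts.append(f"<{tag}>{escape(child.text)}</{tag}>")
    return "\n".join(parts)

def to_plain_text(root):
    """Render the same elements as plain text with an upper-case title."""
    lines = []
    for child in root:
        if child.tag == "title":
            lines.append(child.text.upper())
            lines.append("")  # blank line after the title
        else:
            lines.append(child.text)
    return "\n".join(lines)

root = ET.fromstring(SOURCE)
print(to_html(root))
print(to_plain_text(root))
```

Because the source records structure rather than appearance, each output
format needs only its own small rendering function.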
The vision of the Electronic Exchange Initiative is information that is freely
exchanged without being encumbered by proprietary publishing formats that
lock up information, rendering it inaccessible over time. Use and reuse of
information are critical components of life cycle management of information,
and there is only one standard that addresses these critical components:
Standard Generalized Markup Language. Any future information systems
architecture must address electronic exchange standards; the exchange
environment in the Department of Energy is now characterized by autonomous
islands of information-rich organizations that do not have the means to
share, easily or inexpensively, the result of their labor with the Federal
community and the American public.