[Mirrored from: http://www.gcn.com/]

26 May 1997

Florence Olsen

Enterprise Computing
Energy puts SGML to work

Language lets office offer desktop access to multimedia document archives

The Energy Department is spending $825,000 to exploit little-known capabilities of the Standard Generalized Markup Language for managing distributed archives.

Energy's Office of Scientific and Technical Information (OSTI) recognized the value of SGML several years ago when it adopted the language as its standard for document exchange. Now Energy is laying the groundwork for a distributed multimedia archive that agency scientists and academic researchers can access from any desktop computer.

With the new architecture, OSTI officials want to give government scientists desktop access to the full text and multimedia output of Energy's more than 60 research sites and program offices that conduct basic materials research and other high-interest investigations.

Exchange course

OSTI has had modest success so far with electronic document exchange: only 17 research sites currently use an electronic format to transfer their lab output to OSTI. "We're trying to stay in an environment where we're not locked into a proprietary representation," said Earl Smith, computer specialist in OSTI's Information Systems Development branch at Oak Ridge National Laboratory in Tennessee.

Smith said unlocking the capabilities of SGML can benefit all organizations that manage information. "SGML gives you room to do lots of interesting things," he said.

Under the new plan, the labs themselves would be relieved of transferring copies of their research to OSTI. Instead, OSTI would maintain an entity registry at each research site by exploiting a little-used entity naming mechanism in the SGML standard.

Document du jour

The Intelligent Document Exchange Architecture (IDEA) under development incorporates commercial products that transform non-SGML documents into SGML.

Despite its preference for the SGML format, OSTI has also been accepting documents in Adobe Systems Inc.'s PostScript and Portable Document Format, TIFF Group 4 and HyperText Markup Language.

Older documents likely will still be stored as TIFF, PostScript or PDF image files, with links to an SGML bibliographic header or metadata file that identifies elements such as author, title, chapters, figures and mathematical equations.

In an SGML-based archive, applications referencing SGML documents don't have to know whether those documents are stored in a file system or a database or whether they are out somewhere on the Internet, said Charles F. Goldfarb, principal architectural consultant on the DOE project and chief inventor of the SGML standard.

System developers using SGML can, for example, avoid direct, hard-wired addressing with Uniform Resource Locators, or URLs.

"This is a part of SGML that only recently is being explored in any depth," Goldfarb said.
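
The indirection Goldfarb describes can be illustrated with a minimal sketch. It is not OSTI's or Soph-Ware's implementation; the class names, the storage categories and the identifier below are hypothetical, chosen only to show how a reference by entity name can stay fixed while the registry decides where the document actually lives.

```python
# Hypothetical sketch: resolve an SGML-style entity name through a registry
# instead of hard-coding a URL in the referencing document.
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLocation:
    kind: str     # "file", "database" or "url" (illustrative categories)
    address: str  # path, record key or URL


class EntityRegistry:
    """Maps entity names to storage locations (assumed design, not OSTI's)."""

    def __init__(self):
        self._entries = {}

    def register(self, name, location):
        # Registering the same name again replaces the earlier location.
        self._entries[name] = location

    def resolve(self, name):
        try:
            return self._entries[name]
        except KeyError:
            raise LookupError(f"no registry entry for entity {name!r}") from None


registry = EntityRegistry()
name = "-//EXAMPLE//DOCUMENT Report 0001//EN"  # made-up identifier

registry.register(name, StorageLocation("file", "/archive/report0001.sgm"))
first = registry.resolve(name)

# Moving the document changes only the registry entry, not the reference.
registry.register(name, StorageLocation("url", "http://example.org/report0001.sgm"))
second = registry.resolve(name)
```

An application that refers to the document by name gets whichever location the registry currently holds, so relocating a document from a file system to a database or a remote server does not require editing the documents that reference it.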

The SGML entities, or components, will be stored in a relational database and managed by commercial document management systems.

The software that will complete OSTI's distributed document archive also derives from SGML.

That software is a scalable, SGML-based message-oriented middleware and data collection system.

Until now, no vendor has used the open SGML standard to build tools that do this, said Ron Turner, co-owner of Soph-Ware Associates in Spokane, Wash., and principal developer for OSTI's distributed archive project. "Until the tools are built, the open standard really isn't doing much," he said.

OSTI officials said they hope to use the new SGML tools and the discipline that goes into analyzing and creating SGML document types to manage the interactions between all the components in the distributed archive.

Soph-Ware Associates will publish its distributed archive specification and offer a commercial product based on that specification.

Goldfarb said the OSTI architecture is interesting because it doesn't rely on elaborate application programming interfaces for applications to communicate with one another.

"The idea of replacing a software way of thinking with a document way of thinking I find very innovative," Goldfarb said.

Although the architecture as envisioned does not use the mandatory Government Information Locator Service protocol, the American National Standards Institute's Z39.50, it performs a similar type of transaction, Turner said, and could have links to GILS.

Stronger connections

OSTI's distributed archive initially would support Energy's bench scientists, most of whom are connected directly to the high-speed Energy Sciences Network (ESnet) backbone.

The multimedia delivery infrastructure will depend on the increased bandwidth from the National Science Foundation's Internet2 initiative to upgrade ESnet, Smith said.

In addition to the government's $825,000 Small Business Innovation Research Grant, the SGML project has received in-kind funding from many different sources.

They include SoftQuad Inc.; Saros Corp., a business unit of FileNet Corp.; and One Room Systems.


Copyright © 1996-97 Government Computer News and Millstar.
All rights reserved.