[Mirrored from: http://www.gcn.com/]
26 May 1997
Language lets office offer desktop access to multimedia document archives
The Energy Department is spending $825,000 to exploit little-known capabilities of the Standard Generalized Markup Language for managing distributed archives.
Energy's Office of Scientific and Technical Information (OSTI) recognized the value of SGML several years ago when it adopted the language as its standard for document exchange. Now Energy is laying the groundwork for a distributed multimedia archive that agency scientists and academic researchers can access from any desktop computer.
With the new architecture, OSTI officials want to give government scientists desktop access to the full text and multimedia output of Energy's more than 60 research sites and program offices that conduct basic materials research and other high-interest investigations.
OSTI has had modest success so far with electronic document exchange. Currently, only 17 research sites transfer their lab output to OSTI in an electronic format. "We're trying to stay in an environment where we're not locked into a proprietary representation," said Earl Smith, computer specialist in OSTI's Information Systems Development branch at Oak Ridge National Laboratory in Tennessee.
Smith said unlocking the capabilities of SGML can benefit all organizations that manage information. "SGML gives you room to do lots of interesting things," he said.
Under the new plan, the labs themselves would be relieved of transferring copies of their research to OSTI. Instead, OSTI would maintain an entity registry at each research site by exploiting a little-used entity naming mechanism in the SGML standard.
Document du jour
The Intelligent Document Exchange Architecture (IDEA) under development incorporates commercial products that transform non-SGML documents into SGML.
Despite its preference for the SGML format, OSTI has been accepting Adobe Systems Inc.'s PostScript and Portable Document Format, TIFF Group 4 and HyperText Markup Language documents as well.
Older documents likely still will be stored as TIFF, PostScript or PDF image files with links to an SGML bibliographic header or metadata file identifying author, title, chapters, figures and mathematical equations, for example.
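As a rough illustration only, a bibliographic header of the kind described above could be modeled as a small record that names the image files it describes. The class name, field names and sample values below are assumptions made for this sketch; the article does not describe OSTI's actual header layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BibliographicHeader:
    """Hypothetical metadata record for a document stored as page images."""
    title: str
    authors: List[str]
    chapters: List[str] = field(default_factory=list)
    figures: List[str] = field(default_factory=list)
    # Names of the TIFF, PostScript or PDF files this header points to.
    image_files: List[str] = field(default_factory=list)


def has_author(header: BibliographicHeader, name: str) -> bool:
    """Case-insensitive substring match against the header's author list."""
    return any(name.lower() in author.lower() for author in header.authors)


# Sample values are invented for the example.
header = BibliographicHeader(
    title="Example technical report",
    authors=["A. Researcher"],
    chapters=["Introduction", "Results"],
    image_files=["report-0001.pdf"],
)
```

The point of the sketch is that a search can run against the header fields alone, without opening the image files the header links to.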
In an SGML-based archive, applications referencing SGML documents don't have to know whether those documents are stored in a file system or a database or whether they are out somewhere on the Internet, said Charles F. Goldfarb, principal architectural consultant on the DOE project and chief inventor of the SGML standard.
System developers using SGML can, for example, avoid direct, hard-wired addressing with Uniform Resource Locators, or URLs.
"This is a part of SGML that only recently is being explored in any depth," Goldfarb said.
The SGML entities, or components, will be stored in a relational database and managed by commercial document management systems.
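The indirection Goldfarb describes can be sketched in a few lines. The example below is a minimal illustration, not the IDEA design: the `EntityRegistry` class, the entity name and the storage locations are all hypothetical. It shows an application asking for an entity by name while a registry decides where that entity currently lives.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Location:
    """Where an entity is stored right now (illustrative fields)."""
    kind: str     # for example "file", "database" or "network"
    address: str  # path, record key or URL


class EntityRegistry:
    """Hypothetical registry mapping entity names to storage locations."""

    def __init__(self) -> None:
        self._entries: Dict[str, Location] = {}

    def register(self, name: str, location: Location) -> None:
        # Re-registering a name replaces its previous location.
        self._entries[name] = location

    def resolve(self, name: str) -> Location:
        if name not in self._entries:
            raise LookupError(f"no registry entry for entity {name!r}")
        return self._entries[name]


def fetch_reference(registry: EntityRegistry, name: str) -> str:
    """Application code: it knows the entity name, never the location."""
    loc = registry.resolve(name)
    return f"{loc.kind}:{loc.address}"


registry = EntityRegistry()
registry.register("lab-report-0001", Location("file", "/archive/lab-report-0001.sgm"))
before = fetch_reference(registry, "lab-report-0001")

# The archive moves the entity into a database; the application is unchanged.
registry.register("lab-report-0001", Location("database", "documents/0001"))
after = fetch_reference(registry, "lab-report-0001")
```

Only the registry entry changes when the document moves, which is the practical difference from embedding a fixed URL in every reference.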
The software that will complete OSTI's distributed document archive also derives from SGML.
That software is a scalable SGML messaging-oriented middleware and data collection system.
Until now, no vendor has used the open SGML standard to build tools that do this, said Ron Turner, co-owner of Soph-Ware Associates in Spokane, Wash., and principal developer for OSTI's distributed archive project. "Until the tools are built, the open standard really isn't doing much," he said.
OSTI officials said they hope to use the new SGML tools and the discipline that goes into analyzing and creating SGML document types to manage the interactions between all the components in the distributed archive.
Soph-Ware Associates will publish its distributed archive specification and offer a
Goldfarb said the OSTI architecture is interesting because it doesn't rely on elaborate application programming interfaces for applications to communicate with one another.
"The idea of replacing a software way of thinking with a document way of thinking I find very innovative," Goldfarb said.
Although the architecture as envisioned does not use the American National Standards Institute's Z39.50, the mandatory protocol for the Government Information Locator Service (GILS), it performs a similar type of transaction, Turner said, and could have links to GILS.
OSTI's distributed archive initially would support Energy's bench scientists, most of whom are connected directly to the high-speed Energy Sciences backbone network.
The multimedia delivery infrastructure will depend on the increased bandwidth from the National Science Foundation's Internet2 initiative to upgrade ESnet, Smith said.
In addition to the government's $825,000 Small Business Innovation Research Grant, the SGML project has received in-kind funding from many different sources.
They include SoftQuad Inc.; Saros Corp., a business unit of FileNet Corp.; and One Room Systems.