Standards for Interoperability

By Mary Fletcher Laplante, Executive Director, SGML Open

The Standard Generalized Markup Language (SGML) is an international standard (ISO 8879) for the open interchange of documents and document-based information.

SGML defines a scheme for tagging information in a neutral, non-proprietary format that describes its content and structure. SGML-encoded data contains no application- or platform-specific processing instructions that constrain its use. SGML therefore increases the value of an organization's greatest assets -- the knowledge and information in its documents -- by making them accessible and reusable across platforms, applications, users, and time. It maximizes the return on the significant investments that we make in generating and maintaining document-based information. It also gives an enterprise real ownership of its data, because that data is not locked into a proprietary format controlled by a vendor.

Where does it fit in the document management process?

As a data format, SGML applies to information itself rather than to a particular component of or function within the document management process. As such, SGML "fits" the entire process, from capture to information dissemination.

Information can be created in SGML or converted to SGML from other front-end data-capture systems. The document, enriched with structural information through the use of SGML, can then be processed as a collection of related information objects. These objects can be stored in a database and managed individually. They can be shared, accessed, and manipulated independently of their use within any one document. To disseminate information, documents can be generated dynamically by pulling SGML objects from the database to fill a document "container" that is appropriate for the target platform. Because SGML separates content from format, presentation characteristics can be applied upon delivery, which offers maximum publishing flexibility.
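To make the object model concrete, here is a minimal Python sketch, assuming a toy in-memory "repository" (a plain dictionary) in place of a real database. It is not an SGML parser; structure is modeled with ordinary Python objects. The element names (procedure, title, warning, step), the object identifiers, and the rendering rules are all hypothetical, chosen only for illustration. The sketch shows content stored as independent objects, assembled into a document container on demand, with presentation chosen per target only at delivery time.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Element:
    """A structural information object: an element type plus its content."""
    name: str                                    # element type, e.g. "title"
    content: list = field(default_factory=list)  # text strings or nested Elements


# Hypothetical repository: each information object is stored and managed on its own.
REPOSITORY = {
    "warn-power": Element("warning", ["Disconnect power before servicing."]),
    "step-panel": Element("step", ["Remove the access panel."]),
    "step-filter": Element("step", ["Replace the filter."]),
}


def assemble(title, object_ids):
    """Fill a document 'container' with objects pulled from the repository by ID."""
    parts = [REPOSITORY[oid] for oid in object_ids]
    return Element("procedure", [Element("title", [title])] + parts)


# Presentation rules are kept apart from the content and chosen per target at delivery.
TEXT_RULES = {"title": str.upper, "warning": lambda s: "WARNING: " + s}
HTML_RULES = {"procedure": "div", "title": "h1", "warning": "strong", "step": "p"}


def render_text(el):
    """Render for a plain-text target: one output line per text node."""
    lines = []
    for item in el.content:
        if isinstance(item, Element):
            lines.extend(render_text(item))
        else:
            lines.append(TEXT_RULES.get(el.name, str)(item))
    return lines


def render_html(el):
    """Render for a Web target by mapping each element type to an HTML tag."""
    tag = HTML_RULES.get(el.name, "span")
    inner = "".join(
        render_html(item) if isinstance(item, Element) else escape(item)
        for item in el.content
    )
    return f"<{tag}>{inner}</{tag}>"


doc = assemble("Filter Replacement", ["warn-power", "step-panel", "step-filter"])
notes = assemble("Safety Notes", ["warn-power"])  # reuses the same warning object

print("\n".join(render_text(doc)))
print(render_html(doc))
# <div><h1>Filter Replacement</h1><strong>Disconnect power before servicing.</strong><p>Remove the access panel.</p><p>Replace the filter.</p></div>
```

Because the warning is stored once and referenced by identifier, both assembled documents share the same object, so a correction made in the repository reaches every document that uses it. Neither renderer alters the stored content; only the rules applied at delivery differ.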

How does it contribute to interoperability?

SGML is one of the most important document management investments that an organization can make because it ensures the interoperability of its information.

Technology changes roughly every eighteen months, which is one reason we invest in open systems based on de jure and de facto standards. We want to be sure that the hardware and software we buy today will work with the other systems we have now or will have in the future. Investments in open systems platforms and architectures may increase the value of a corporation's physical assets, but those systems are certain to be replaced eventually. The best investment that we can make in open systems is the development and maintenance of open information, which is what SGML enables.

Besides these strategic benefits, SGML offers the tactical advantage of allowing an organization to share the same data across multiple document repositories, thereby supporting enterprise-wide document management. Organizations can choose the products and technologies best suited to their needs, knowing that their documents remain interchangeable and accessible to anyone, even across repositories.

How does it relate to other standards?

As an open, de jure, non-proprietary standard, SGML has no direct competition. Confusion about its relationship to other standards arises in three general areas: document publishing standards, HTML (Hypertext Markup Language), and compound document application standards such as OpenDoc.

Until recently, it was common to compare SGML to other de facto document processing standards such as the PostScript page description language developed by Adobe Systems Inc. and Microsoft RTF (Rich Text Format). There are several key differences, however, that clearly distinguish them. PostScript and RTF encode documents with processing instructions for rendering their format -- their typefaces, the positions of characters and graphics in a physical space, and so on. They are highly page-oriented. SGML, on the other hand, encodes documents with intelligence about their structure, not with instructions about their appearance or format. Presentation characteristics for either page or pageless rendering are applied when the information is published, not when it is created, archived, or transported, as with other publishing standards. This means that the most important aspect of a document -- its content -- is readily accessible and reusable, which makes SGML an information management standard rather than a publishing standard.

There is another important difference between SGML and publishing standards like PostScript and RTF that is often overlooked. The PostScript and RTF specifications are open in the sense that they are published and freely distributed, but they are proprietary to the companies that develop them. They are vendor-controlled, subject to change in order to protect market share. As an ISO standard, SGML is both open and non-proprietary.

One of the most commonly asked questions in the SGML industry today is how SGML relates to HTML, the format used to encode documents for delivery on the World-Wide Web. HTML (versions 2.0 and higher) is an application of SGML, which means that the two are complementary, not competing. Business organizations can maintain their data repositories in SGML format, then translate to HTML when the delivery channel is the Internet.
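As a rough illustration of that translation step, the Python sketch below renames the elements of a small document instance to HTML elements using the standard library's html.parser.HTMLParser. It assumes the source is fully tagged, with no omitted tags, short references, or other SGML markup minimization (which HTMLParser does not understand), and that a simple one-to-one element mapping is enough. The element names in ELEMENT_MAP (memo, subject, para, emph) are hypothetical and not taken from any real DTD; a production conversion would rely on a validating SGML parser and the organization's actual DTD.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping from a house DTD's element types to HTML elements.
# Elements not listed here are dropped, but their text content is kept.
ELEMENT_MAP = {"memo": "body", "subject": "h1", "para": "p", "emph": "em"}


class SgmlToHtml(HTMLParser):
    """Re-emit a fully tagged instance with element names translated to HTML."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entity references arrive decoded
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ELEMENT_MAP:
            self.out.append(f"<{ELEMENT_MAP[tag]}>")

    def handle_endtag(self, tag):
        if tag in ELEMENT_MAP:
            self.out.append(f"</{ELEMENT_MAP[tag]}>")

    def handle_data(self, data):
        # Re-escape decoded text so characters such as "&" stay valid in HTML.
        self.out.append(escape(data, quote=False))


def translate(source):
    """Translate one document instance to an HTML string."""
    parser = SgmlToHtml()
    parser.feed(source)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


sgml = ("<memo><subject>Q1 Results</subject>"
        "<para>Sales rose <emph>sharply</emph> in <region>Europe</region>.</para></memo>")
print(translate(sgml))
# <body><h1>Q1 Results</h1><p>Sales rose <em>sharply</em> in Europe.</p></body>
```

The repository copy keeps its richer structure (here, the hypothetical region element), while the HTML delivery copy carries only what the Web format can express. This is one reason to keep SGML as the master format and treat HTML as an output.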

Regarding compound document application standards, SGML data can be one of the types of information objects in a compound document. Again, SGML is complementary to, rather than competitive with, OpenDoc and OLE.

What is SGML's position in the market, in terms of market share and vendor support?

SGML is not a "standard in progress." It was adopted by ISO in 1986, which means that it is mature and stable. Businesses can implement SGML with confidence, as it is not a moving target that is subject to extensive and repeated revision.

In its initial implementations, SGML served primarily as a vendor-neutral substitute for proprietary publishing system markup. Today's SGML applications go well beyond publishing, however, into the broader arena of information management. SGML is being used to manage large-scale databases of information, deliver electronic documents with hypertext facilities across the Internet, transmit news stories via news wires for real-time processing, and even design user interfaces for software products.

In the past, SGML has been referred to as a niche market, characterized primarily by small, moderately successful independent software and service providers. That is changing rapidly, however, as SGML becomes less of a "boutique" application and moves into the mainstream of the information technology market. Annual sales of SGML products and services are now in the hundreds of millions of dollars, and software giants Microsoft and Novell have entered the market -- a sure sign that SGML is a standard to take seriously. Because SGML enables document management, electronic document delivery, and Internet publishing, three areas of technology in which business enterprises are making significant investments, its adoption will only accelerate in the future.

What are the main issues going forward to ensure success and usefulness of the standard?

There are two sets of issues that the SGML industry is addressing in order to ensure its continued success and usefulness:

"Standards for Interoperability" was written by Mary Fletcher Laplante, Executive Director of SGML Open.

An international consortium, SGML Open is dedicated to accelerating the widespread adoption of ISO 8879, the Standard Generalized Markup Language. Members include vendors providing a broad range of SGML software and services, augmented by an advisory board of industry leaders and analysts and liaison relationships with customer user groups.

This article appeared in the Jan/Feb 1995 issue of The Gilbane Report.