[Archive copy mirrored from: http://www.grif.fr/newsref/xml.html, text only]

A White Paper from Grif


Adding intelligence to your business-critical documents

Documents on corporate intranets vary greatly in type and complexity, from the simplest single-page memo to the most complex technical documentation. As an alternative to the simple HTML format and the full SGML standard, the newly specified "eXtensible Markup Language", XML, offers an approach that takes advantage of the powerful content-based markup and scalability of SGML while enabling documents to be published as easily as is currently possible with HTML. The XML standard will enable all of these types of business-critical documents to be created, edited, stored, published and accessed on corporate intranets.

Intranet Documentation

The corporate intranet is set to become the infrastructure of choice for producing and managing all corporate information and documentation. Intranet documentation systems are based around the SGML standard, which is used to represent and structure the information and documents shared over corporate intranets. The main advantage of an intranet-based documentation system is increased productivity. Productivity improvements arise from the ability to create, edit, exchange and manage all types of documents using the same set of tools, formats and protocols. This improves the exchange of information among the various authors, reviewers and users of these documents. SGML-based systems also ensure that all documents in circulation, from the simplest unstructured memos to the most complex professional documents, comply with the predefined corporate document model for the type of document in question (contracts, illustrated parts catalogs, maintenance manuals, etc.).

The World Wide Web and HTML

On corporate intranets, as well as on the Internet, the most popular way to publish information is via the World Wide Web. The Web provides an effective way to link together all types of information, whether text, graphics, video or sound, using HTML (HyperText Markup Language) as the "glue" code between the different parts. The hypertext linking mechanism offered by HTML has been one of its major contributions, enabling the construction of easy and intuitive user interfaces for accessing published information. This user interface is implemented in browsers, which are now the universal application on desktops.

The second major contribution of HTML has been to rapidly educate a very large audience as to the main advantages of distributed information systems: the structuring of information; the separation of the content and layout of documents; the use of standard formats and protocols that enable the exchange of information between heterogeneous systems and applications; the power of the hypertext metaphor to organize a set of documents to be searched, accessed and consulted interactively.

Intranet Documentation and SGML

HTML is based on SGML (Standard Generalized Markup Language), the ISO standard for defining and using content-based markup of information. SGML is used to specify the tag set for a particular type of document. The tag set, as well as the hierarchical relationships between those tags, is defined in a DTD (Document Type Definition), which provides the grammar for the structural representation with which the document must comply. HTML is an extremely simplified tag set that is well suited to the structural representation and layout of simple Web pages.
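
The idea of a DTD as a grammar of tags and their permitted nesting can be illustrated with a small sketch. The Python example below is only an approximation under stated assumptions: the "memo" document model and its element names are hypothetical, the model is written as a plain dictionary rather than in DTD syntax, and it checks only which elements may appear inside which others, ignoring the order and repetition rules that a real DTD also expresses.

```python
import xml.etree.ElementTree as ET

# Hypothetical document model for a "memo" class of documents: each element
# name maps to the set of child elements it may contain. A real DTD also
# constrains order and repetition, which this sketch deliberately ignores.
MEMO_MODEL = {
    "memo": {"to", "from", "body"},
    "to": set(),
    "from": set(),
    "body": {"para"},
    "para": set(),
}


def violations(element, model):
    """Return messages for elements that the model does not allow."""
    problems = []
    if element.tag not in model:
        problems.append(f"unknown element <{element.tag}>")
        return problems
    for child in element:
        if child.tag not in model[element.tag]:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        else:
            problems.extend(violations(child, model))
    return problems


good = ET.fromstring(
    "<memo><to>All staff</to><from>Documentation</from>"
    "<body><para>Meeting at noon.</para></body></memo>"
)
bad = ET.fromstring("<memo><to>All staff</to><para>Misplaced.</para></memo>")

print(violations(good, MEMO_MODEL))  # empty list: the memo fits the model
print(violations(bad, MEMO_MODEL))   # reports the misplaced <para>
```

The first document fits the model, while the second places a paragraph directly inside the memo and is reported as a violation. A simple class of documents needs only a small model like this one; a maintenance manual would need a much larger one.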

The principal advantage of SGML is that it lets you define the most appropriate format for a given class of documents and specify document structures that are as simple or as complex as necessary. Using SGML for documentation on corporate intranets provides two main advantages: first, the guarantee of being able to handle all corporate documents, whatever their complexity; and second, the ability to use, for each class of documents, the document structure model most appropriate to the complexity of that class. Very simple documents can thus be created with correspondingly simple document models, while complex documents can be created with complex document models. There is no limit on the number of document models that can be used on a documentation intranet, and there is no obligation to know in advance all the document models that will be needed to handle all possible documents. A new document model can be defined at any time to handle a new class of documents. However, it is good practice not to multiply the number of DTDs at will: one of the strengths of SGML is that it helps classify documents into well-specified types, and this advantage disappears if we tend toward one DTD per document.


The simplicity of HTML structure, although obviously an important factor in the success of this format, is a severe limitation for the representation of professional documents. On the other hand, the richness of SGML makes it the format of choice for large mission-critical applications, but its complexity is a limitation when building applications to be used by a large number of non-expert users over the Web, whether on the Internet or on an intranet.

The goal of XML, the eXtensible Markup Language, is to bridge the gap between the two, providing the richness of SGML for specifying the content-based structure and for editing documents, but providing the ease of use of HTML for publishing and accessing these documents on the Web.

To achieve these goals, XML has been designed by a working group of the World Wide Web Consortium (W3C) as a simplified subset of SGML specially designed for Web applications. The initial specification of XML was published in November 1996. The final version of this specification is due in April 1997. Future developments will include the specification of hyperlinks to be used in SGML applications on the Web and the application of DSSSL style sheets in Web browsers. An important decision has been to use the Unicode Standard to define the encoding of the characters that make up the text data used in XML documents.

What makes XML particularly suitable for Web applications, though, is its ability to handle document fragments without the need to know the structure model that was used to create these fragments. An XML document can thus be published without its DTD, the only requirement being that the document should be "well formed", that is to say that document elements are nested so as to enable the creation of a tree structure.
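
The "well formed" requirement can be checked mechanically, with no DTD at hand. The following is a minimal sketch using the XML parser from the Python standard library; the helper name `is_well_formed` and the sample fragments are illustrative assumptions, not part of the XML specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML without reference to any DTD."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments; the tag names are illustrative only.
nested = "<section><title>Brakes</title><para>Check the pads.</para></section>"
overlapping = "<section><title>Brakes</para></title></section>"

print(is_well_formed(nested))       # properly nested, so a tree can be built
print(is_well_formed(overlapping))  # tags overlap, so no tree can be built
```

The parser knows nothing about what `section`, `title` or `para` mean, yet it can still build a tree from the first fragment. The second fragment is rejected only because its elements are not properly nested.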

With SGML, a document must be distributed in its complete form, and it must fully comply with the DTD that was used to create it. It is impossible to read or process an SGML document without knowing the DTD. To exchange, edit, publish and access SGML documents on the Web, all clients must therefore know in advance all the DTDs used to produce the documents they need to work with. This restricts the use of SGML on the Web to well-delimited environments and applications where all possible users are part of the same organization or working group and thus have access to the DTD. This is not really a constraint in industrial projects, where SGML documents are often mission-critical and, in any case, created, edited and managed through well-defined procedures on secure intranets. But this characteristic of SGML prevents it from being freely used to publish documents on the Internet at large, where users on the client side have no way to know the DTD used to create these documents.

By enabling the publishing of document fragments (or pages) without the description of the tag set used to create them (the DTD), XML removes the main obstacle to the direct exchange, editing, publishing and accessing of SGML documents on the Web and on corporate intranets. The XML standard provides all the advantages of the powerful content-based markup and scalability of SGML while at the same time enabling business-critical documents to be published as easily as is currently possible with HTML.

Last updated: 30/4/1997 Copyright © 1997 GRIF S.A.