SGML ARCHITECTURES:

Implications and Opportunities for Industry


by Steven R. Newcomb, President

TechnoTeacher, Inc.


Introduction

SGML has always offered the means whereby the syntax and semantics associated with a given information construct (an element type) can be expressed; these are expressed in element definitions in document type definitions (DTD). Differing document types may contain some similar or identical element definitions, and sets of software applications can be made to contain or use similar or identical software modules for processing such similar or identical element types. In this way, anyone who controls some set of DTDs can heighten the application-neutrality of the information contained in documents conforming to those DTDs, save money on software development, and reduce expensive confusion in general, by maximizing the generality of each information construct (element type), and by avoiding, insofar as possible, any duplication of semantics which do not also duplicate syntax. However, until quite recently, with the advent of the HyTime (ISO/IEC 10744) international standard, there was no agreed-upon formalism for the expression of similarity in structure and semantics. Now these things can be expressed formally, and enforced, at least to some extent, automatically, in a new, more abstract kind of document type definition called a ``meta-DTD.'' A meta-DTD describes the structure and semantics of a class of documents which therefore conforms to an ``SGML architecture.''

The need for modularity, consistency, and reusability is addressed in similar ways in the context of good object-oriented system design, and the similarity between SGML element types, on the one hand, and object classes, on the other, is not coincidental. The notion of ``inheritance'' of the structure and functionality of one class by another is at the heart of the innovation of SGML architectures; SGML architectures make it possible for element types to ``inherit'' the characteristics of the meta element types defined in SGML architectures.

SGML Architectures vs. Current Practices

In the absence of SGML architectures, syntactical similarities between constructs used in some set of DTDs could be represented and enforced by SGML's parameter entity definition/reference feature. This permits any arbitrary portion of a DTD, such as an attribute definition list, to appear in multiple places by a text substitution mechanism. Another tool for enforcing similarity in some set of DTDs is simply to create a single DTD entity which contains all of the elements used in all of the DTDs in the set. This single entity can define all of the element types common to all of the DTDs, and all of the other element types as well, so long as each document type is uniquely associated with a single element type as its document element. All documents conforming to all of the DTDs in the set refer to the same DTD entity, but each document's document type declaration names the document element appropriate to its own document type.

While the method of using common DTD fragments that are inserted into actual DTDs is usually workable, this method can bring with it some undesirable rigidity. Wherever a DTD fragment is inserted into a DTD, it is inserted verbatim. Even if parameter entities are not used, or if they are used in complex ways (such as having the inserted text of a parameter entity contain a reference to a previously-defined parameter entity), the insertion of DTD fragments can result in the propagation of unnecessary and unnatural constraints on the structure of documents, or, alternatively, less structural constraint than is desired by the architect, and than can be usefully validated by an SGML parser or SGML database engine. Moreover, the impact of a change in a parameter entity on any given DTD can be surprising and confusing to everyone but a computer.
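The verbatim nature of parameter entity substitution can be illustrated with a minimal sketch. It is not an SGML parser: it assumes only the simplest form of parameter entity declaration (a quoted literal), and the entity and element names (``common.atts'', ``memo'', ``report'') are hypothetical, invented for this illustration.

```python
import re

# Hypothetical DTD fragment; entity and element names are invented.
DTD = """\
<!ENTITY % common.atts "id ID #IMPLIED  status CDATA #IMPLIED">
<!ATTLIST memo   %common.atts;>
<!ATTLIST report %common.atts;>
"""

# Only the simplest declaration form is recognized (an assumption of this sketch).
ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+(\S+)\s+"([^"]*)"\s*>')
ENTITY_REF = re.compile(r'%([A-Za-z][\w.-]*);')
MAX_PASSES = 10  # guards against self-referencing entities


def expand_parameter_entities(dtd: str) -> str:
    """Replace each %name; reference with the literal text of its declaration."""
    entities = dict(ENTITY_DECL.findall(dtd))
    body = ENTITY_DECL.sub("", dtd)
    # Repeat so that entities whose text refers to other entities are expanded too.
    for _ in range(MAX_PASSES):
        expanded = ENTITY_REF.sub(
            lambda m: entities.get(m.group(1), m.group(0)), body
        )
        if expanded == body:
            break
        body = expanded
    return body.strip()


expanded = expand_parameter_entities(DTD)
print(expanded)
```

Both attribute definition lists receive exactly the same text, so editing the one entity declaration silently changes every declaration that refers to it.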

The use of SGML architectures, rather than sets of DTD fragments, for formalizing the structures needed to permit interchange of information common to several document types, permits the architects of DTDs to retain full and optimal control of SGML's validation apparatus, while still guaranteeing information interchangeability. DTDs can be permitted to change in any way that does not violate the constraints imposed by the SGML architecture. Each SGML architecture can be developed in such a way that it imposes no constraints on document structure that are not actually necessary for interchange of the information with which the SGML architecture is concerned. The DTDs themselves can then be used to impose as much further constraint on actual document instances as desired, so none of the constraining power of SGML is lost to any particular document type simply because of a need to allow the interchange of information elements whose general outline must appear in more than one type of document.

HyTime is the Pioneer SGML Architecture

Like all SGML architectures, HyTime is formally described by a meta-DTD consisting of a set of meta element types, called ``architectural forms.'' In a document instance conforming to HyTime, any element that inherits the characteristics of a HyTime architectural form is recognized by a HyTime application by means of the value of that element's ``HyTime'' attribute, which is always the name of the architectural form. The name of the attribute, ``HyTime,'' corresponds to the HyTime architecture, and its value (e.g., ``ilink'') corresponds to the ilink architectural form (the ilink meta element type) in the HyTime meta-DTD. (Dave Peterson has aptly said that the value of an architecture attribute, such as ``ilink'' used as the value of a ``HyTime'' attribute, can be considered a ``meta generic identifier'' or ``meta-GI.'') Any element whose ``HyTime'' attribute's value is ``ilink'' is universally known to HyTime applications as an element that expresses a relationship of some kind, and that has certain HyTime-defined syntactic characteristics.
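The recognition step described above can be sketched in a few lines. The sketch makes several assumptions: it uses an XML-syntax instance and Python's standard XML parser as a stand-in for an SGML parser, the element type names (``xref'', ``seealso'') and the helper function are hypothetical, and the ``linkends'' attribute is included only for illustration. Only the ``HyTime'' attribute name and the ``ilink'' form name come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: element names are invented for illustration.
DOC = """\
<manual>
  <xref HyTime="ilink" linkends="a b"/>
  <para>Ordinary text.</para>
  <seealso HyTime="ilink" linkends="c d"/>
</manual>
"""


def elements_by_form(root, arch_attr, form):
    """Return the elements whose architecture attribute names the given form."""
    return [el for el in root.iter() if el.get(arch_attr) == form]


root = ET.fromstring(DOC)
links = elements_by_form(root, "HyTime", "ilink")
print([el.tag for el in links])  # prints ['xref', 'seealso']
```

The application never consults the generic identifiers ``xref'' or ``seealso''; the meta-GI alone is enough to locate every element that claims to be an ilink.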

HyTime is a very important architecture for several reasons:

SGML Architectures Other Than HyTime

A limitless number of SGML architectures can be usefully developed. It now appears that business-context-specific SGML architectures, rather than DTDs, will be the most effective and, at the same time, the least costly method of allowing information interchange within any specific context, such as an entire enterprise, in which there are multiple types of documents. For the context of the Gourd Motor Company (no relation to any existing corporate entity), for example, a company-wide SGML architecture can be developed in which the value of a so-called ``Gourd'' attribute identifies the Gourd architectural form to which any given element is intended to conform. For example, if Gourd=requisition, then the element conforms to the constraints universally specified, throughout the Gourd organization, for requisition documents. This allows each division, department, or other subunit to define its own subclass(es) of Gourd-standard requisitions, each kind meeting all the syntactic and semantic requirements of all who need to use the information contained in it. In the case of a Gourd unit that, unlike any other Gourd unit, purchases radioactive materials, this kind of flexibility can be extremely desirable. In such a case, all the government and environmental paperwork can become part of the requisition document, organized in a fashion determined locally by that unit, but still processable by the purchasing department and all other concerned units at Gourd.
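A rough sketch of how a Gourd-wide check might coexist with local variation follows. Everything specific in it is an assumption: the text names the ``requisition'' form but not its content, so the required attributes, the element type names, and the helper function are all hypothetical, and XML syntax with Python's standard parser stands in for SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical constraints for the Gourd "requisition" form (invented here).
GOURD_FORMS = {"requisition": {"required_attrs": {"requester", "amount"}}}

# Three locally defined element types that all declare Gourd="requisition".
DOC = """\
<orders>
  <officereq Gourd="requisition" requester="pat" amount="40"/>
  <isotopereq Gourd="requisition" requester="lee" amount="9000" license="N-17">
    <permit>Local paperwork, structured as the unit sees fit.</permit>
  </isotopereq>
  <badreq Gourd="requisition" requester="sam"/>
</orders>
"""


def check_conformance(root, arch_attr, forms):
    """Map each architectural element's tag to the required attributes it lacks."""
    problems = {}
    for el in root.iter():
        form = el.get(arch_attr)
        if form in forms:
            missing = forms[form]["required_attrs"] - set(el.attrib)
            problems[el.tag] = sorted(missing)
    return problems


report = check_conformance(ET.fromstring(DOC), "Gourd", GOURD_FORMS)
print(report)
```

Under these assumptions the radioactive-materials unit's ``isotopereq'' carries extra attributes and content yet still passes the company-wide check, while ``badreq'' is reported as lacking ``amount''.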

The notion that SGML architectures can and should be used to achieve enterprise integration has a number of interesting and profound implications for the information processing industry. Among these implications are:

The Loci of Control in Organizations

The distribution of control among the hierarchical levels of human organizations is a tricky balancing problem. Control which is too centralized reduces the initiative and adaptability of organizational subunits and individuals. Adaptability is survivability; overly centralized organizations court their own demise by reducing the value of their most important assets: the accumulated experiences of their personnel. On the other hand, control which is overly distributed makes organization-wide cooperation difficult or impossible. Opportunities afforded by combining the experiential and other assets of several subunits can be missed, unless by some lucky accident all of the subunits involved spontaneously cooperate with one another. It is always an open question, in any given organization, whether the distribution of control occurs in an optimal fashion, and constantly changing conditions demand constant re-evaluation of the mechanisms whereby control is distributed.

Control, by definition, is the way in which decisions are taken and implemented, and decisions must always be based on information. The way in which information is structured both reflects and determines the way in which management (at whatever level) perceives the information on which decisions will be based. Control of information architecture is a fundamental aspect of management. For example, while it may not be a manager's job to maintain an organization's records, it is definitely the job of some manager to decide exactly how those records will be organized and maintained. The way in which responsibility for the structure of records is distributed throughout an organization, therefore, may reveal much about how control in general is distributed throughout that organization.

Some general observations follow from all this:

The ongoing evolutionary development and use of SGML architectures will help all organizations meet the above challenges more efficiently. Using SGML architectures, control over various aspects of the organization's information architectures can be formally distributed, and, when necessary, redistributed. In general, managers already have most of the information architecture skills they need. They have lacked only a formal, standard, and sufficiently flexible way to express and propose evolutionary changes in the way information is structured. The SGML architecture formalism set forth in Annex C of the HyTime standard will increasingly fill this need.

[Copyright (C) 1994-5 TechnoTeacher, Inc. All Rights Reserved. This article first appeared in , the SGML Newsletter, Volume 8, Number 8 (August, 1995), pp. 1-5. (SGML Associates, Inc., 6360 S. Gibraltar Circle, Aurora, Colorado 80016-1212 USA. +1 303 680 0875; fax +1 303 680 4906.)]