[Archive copy mirrored from: http://tag.sgml.com/5020101.htm, text and partial links only]

® The SGML Newsletter

Article from February, 1992

SGML Architectural Forms

by Ludo Van Vooren and Eric C. Severson

Abstract

One of the persistent concerns voiced by SGML users and would-be users revolves around attaching semantic information to the elements in an SGML document. This is sometimes a formidable problem since it can involve anticipating and describing the processing that is to be applied to the information in the document.

A concept emerging from the HyTime Committee, called "SGML Architectural Forms," provides SGML users with a new tool for describing document semantics. The essence of the Architectural Forms idea is that it allows users to extend the attribute set for an element without doing violence to the basic processing, parsing and integrity of the DTD or associated document instances. Extending the attribute set allows users to express and preserve information that would otherwise require use of external files. The attraction of the approach is that it does not require use of new structures and processes; it uses the SGML parser and an extended form of the existing DTD to convey the desired semantic information.

Background

SGML systems must couple an SGML parser with an application. Although ISO 8879 does not explicitly define the "result" a parser should produce, the parser is implicitly required to pass, in some form, the information contained in the document instance to the application. As far as the parser is concerned, an SGML file is a series of nested containers. Some containers hold data, some hold other containers and some hold both. However, all the containers are of the same nature. They are distinguished by their "generic identifiers," commonly known as their "tag names." The generic identifier attaches a DTD-defined name to each container. SGML attributes and their values are also attached to containers and provide extra information about particular containers.

From the application's point of view, containers are processed based on the information attached to them. If an application is "hard-wired" to a particular DTD, processing decisions can be based directly on the container's name and attributes. For example, a particular application could recognize the container name "T" to be a table with an attribute "NC" indicating the number of columns. In this kind of application, processing semantics are determined directly by the container names.

However, this is not a flexible application of SGML. The strength of SGML is realized when multiple sets of processing semantics can be attached to a single set of DTD named containers.

Attaching Semantics to SGML Elements

When an SGML application is designed, its process can be expressed as a list of semantics. For example, hypertext delivery systems process semantics such as links, anchors and graphics. Each of these semantics operates on a defined list of arguments. To process a link, for example, the application needs to know what the target of the link is. The challenge for a fully generic SGML application is to assign semantics and capture the data required by these semantics from any DTD. In other words, a generic application should be able to process a chapter regardless of whether it is marked with an SGML element named "CHAPTER," "CHP" or "XDFGT."

A specific semantic assignment and the data that it requires can be expressed as an object architecture. For example, in a hypertext system one possible element is a context link. The object definition begins with the name of the object, "clink." To be processed, the object "clink" must include a value for its anchor point "target." The object definition could be expressed as:

object:   clink
          target: identifier for the anchor
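
As a rough illustration only, the object definition above could be modeled as a small data structure that names the object and lists the arguments it requires. The class name `ObjectArchitecture` and the method `missing` are hypothetical names chosen for this sketch; they do not come from any SGML tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectArchitecture:
    """Hypothetical model of an object definition: a name plus required arguments."""
    name: str
    arguments: tuple

    def missing(self, supplied):
        """Return the required arguments that are absent from `supplied`."""
        return [arg for arg in self.arguments if arg not in supplied]


# The "clink" object from the text: it needs a value for "target".
CLINK = ObjectArchitecture(name="clink", arguments=("target",))

print(CLINK.missing({"target": "sec2"}))
print(CLINK.missing({}))
```

An application could run a check like `missing` before processing an object, so that a link without a target is reported rather than silently ignored.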

There are many options when it comes to attaching semantics to an SGML DTD. Whichever option is chosen, this "connection" must be established for each new DTD handled by the system. Therefore, for each new DTD, a phase of DTD "installation" is required.

In some existing SGML systems, this "connection" is established through a proprietary syntax stored in a separate file, a "style-sheet" containing processing semantics. For example, to assign link semantics to "XREF" elements with link targets expressed as "IDREF" attributes, an SGML-style encoding might be:

<style name="XREF">
       <object> link   </>
       <process script> linkscript tvalue=@ (IDREF) </>
</>

where XREF is assigned to an <object> named "link" (in this case the object may have screen display capabilities attached), and <process script> attaches a program called "linkscript" to the object "link." When this instance of "link" is activated, "linkscript" is called with the value in "tvalue" which is assigned the contents of IDREF.
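
To make the mechanism concrete, here is a minimal sketch of how an application might consume such a style-sheet once it has been read into memory. The dictionary layout, the key names and the function `dispatch` are assumptions made for this illustration; real systems each use their own proprietary representation.

```python
# Hypothetical in-memory form of the style-sheet entry shown above:
# element name -> the object it becomes, the script to run, and the
# attribute whose value is passed to the script as "tvalue".
STYLE_SHEET = {
    "XREF": {"object": "link", "script": "linkscript", "tvalue_from": "IDREF"},
}


def dispatch(element_name, attributes, style_sheet=STYLE_SHEET):
    """Return (script, tvalue) for an element, or None if no style applies."""
    style = style_sheet.get(element_name)
    if style is None:
        return None
    return style["script"], attributes[style["tvalue_from"]]


print(dispatch("XREF", {"IDREF": "fig3"}))
```

Note that the table lives outside the DTD, so a separate reader for the style-sheet syntax is needed in addition to the SGML parser.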

This kind of solution is common in SGML applications. CALS, for example, uses a similar semantic assignment in the Formatting Output Specification Instance (FOSI) standard. Because the problem of assigning processing semantics is a general one, ISO is working on a way to attach semantics to SGML elements in its Document Style Semantics and Specification Language (DSSSL).

Semantic assignment for SGML applications is a processing burden as well as an encoding burden. If the assignments are made with a proprietary syntax, a special module will be needed to read this additional code. This is currently the case with FOSI and the latest version of DSSSL. Recently, however, some SGML applications experts have begun to use SGML itself to provide semantic assignments in a way that does not compromise the generality of SGML. This solution has become known as "SGML Architectural Forms."

SGML Architectural Forms

"SGML Architecture" (or "architectural forms") is a concept invented by the HyTime Committee. It is a significant step forward in generic SGML processing. For hard-wired SGML applications that work with only one DTD, the SGML architecture approach also provides a simple, low-cost way to connect to other SGML applications.

The idea behind SGML architectural forms is to code the relationship between SGML elements and the target application's semantics directly in the DTD of the document instance to be converted. An application following the architectural forms model will consist of a set of named semantics, or architectural forms, each of which will have a list of named arguments that it requires. The definition of the architectural form can be expressed in SGML. For example, a link similar to the illustration above could be defined as:

name:    ilink
         target: identifier of the target
         traverse: 0=one way, 1=both ways

This information can be coded directly in the existing DTD using an extended set of #FIXED attributes:

<!ATTLIST   xref
            idref    IDREF #REQUIRED
            one-way  (0|1) "0"
            semantic CDATA #FIXED "ilink"
            target   CDATA #FIXED "idref"
            traverse CDATA #FIXED "one-way"
>

By using extensions to the DTD, the architectural forms approach allows "semantic" attributes to be attached to containers that are processed by an SGML parser and passed to an application. Attributes declared with #FIXED default values need not appear in the document instance, yet the parser still reports them to the application. Thus, the application can identify the corresponding architectural form and obtain the attribute information needed for processing. As illustrated above, the value of each added attribute names the attribute in which the required information is found. For example, the value of the attribute "target" indicates that the attribute "idref" contains the required information.
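
The lookup just described can be sketched in a few lines. This is only an illustration under stated assumptions: the parser is assumed to hand the application a plain dictionary of attribute names and values (with #FIXED defaults already filled in), and the function name `resolve_form` is hypothetical. Attribute names are kept in lower case as written in the declaration; a real parser may normalize their case.

```python
def resolve_form(attributes, form_attr="semantic", arguments=("target", "traverse")):
    """Return (form_name, resolved_arguments), or None if no form is attached.

    Each argument attribute holds the NAME of another attribute; the value of
    that other attribute is the data the architectural form needs.
    """
    form = attributes.get(form_attr)
    if form is None:
        return None
    resolved = {}
    for arg in arguments:
        source = attributes.get(arg)  # e.g. "target" -> "idref"
        if source is not None:
            resolved[arg] = attributes.get(source)  # e.g. value of "idref"
    return form, resolved


# Attributes as a parser might report them for <xref idref="sec2">.
xref = {
    "idref": "sec2",
    "one-way": "0",
    "semantic": "ilink",
    "target": "idref",
    "traverse": "one-way",
}
print(resolve_form(xref))
```

The application never needs to know that the element is called "xref"; it only looks at the form name and the resolved arguments.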

To avoid any conflict in attribute names, a single reserved attribute name is used to make alternative assignments. When that attribute is present, its value gives the attribute name mapping for the conflicting attributes. In the illustration, if the attribute "traverse" already existed in the original DTD, the coding of the architectural form would look like this:

<!ATTLIST   xref
            idref    IDREF         #REQUIRED
            one-way  (0|1)         "0"
            traverse (slow | fast) #IMPLIED
            ARCFORM  CDATA         #FIXED "traverse directn"
            semantic CDATA         #FIXED "ilink"
            target   CDATA         #FIXED "idref"
            directn  CDATA         #FIXED "one-way"
>
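
A sketch of how an application might honor this renaming follows. It assumes that the ARCFORM value is a sequence of pairs, each giving a form argument name followed by the substitute attribute name used in this DTD; that reading is an interpretation of the example above, and the function names are hypothetical.

```python
def parse_renames(arcform_value):
    """Turn 'traverse directn' into {'traverse': 'directn'} (pairs of names)."""
    tokens = arcform_value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("ARCFORM value must contain name pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


def resolve_with_renames(attributes, arguments=("target", "traverse")):
    """Resolve form arguments, using ARCFORM to bypass conflicting names."""
    renames = parse_renames(attributes.get("ARCFORM", ""))
    resolved = {}
    for arg in arguments:
        pointer_attr = renames.get(arg, arg)    # "traverse" -> "directn"
        source = attributes.get(pointer_attr)   # "directn" -> "one-way"
        if source is not None:
            resolved[arg] = attributes.get(source)
    return attributes.get("semantic"), resolved


# Attributes as a parser might report them for
# <xref idref="sec2" one-way="1" traverse="fast">.
xref = {
    "idref": "sec2",
    "one-way": "1",
    "traverse": "fast",  # the DTD's own, unrelated attribute
    "ARCFORM": "traverse directn",
    "semantic": "ilink",
    "target": "idref",
    "directn": "one-way",
}
print(resolve_with_renames(xref))
```

The DTD's own "traverse" value ("fast") is left untouched, while the form's "traverse" argument is taken from "one-way" by way of "directn".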

Warning

To take advantage of SGML, you don't want to write a DTD that describes the form of a document. Elements like BOLD or INDENT-P are bad SGML practice. You have to be careful not to fall into the same trap with architectural forms. It would be bad practice to attach an architectural form "BOLD" to a chapter title. Instead, you want to attach a "Chapter Title" architectural form to the corresponding tags in all your DTDs.

Conclusion

The advantage of the architectural forms method is that it uses an SGML parser to decode and transmit both the original SGML information and the semantics to be used by an application. This way, only one tool (the SGML parser) and one known syntax (the SGML DTD) are used. In many practical situations, this is a very attractive solution for generic SGML processing.

We believe that this approach will greatly benefit SGML. An immediate application could address an area that has been overlooked by developers: SGML searches across multiple document types. For example, the user might want to find the chapter titles but does not know how these are tagged in the various DTDs. A search engine could look for architectural forms instead of tag names. But this is the subject of a different article...
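
As a small illustration of the search idea, the sketch below filters elements by architectural form rather than by tag name. The element representation (tuples of tag name, attributes and text), the form name "chapter-title" and the function `find_by_form` are all assumptions made for this example.

```python
def find_by_form(elements, form_name, form_attr="semantic"):
    """Return the text of every element carrying the given architectural form."""
    return [text for _gi, attrs, text in elements if attrs.get(form_attr) == form_name]


# Elements from documents that follow three different DTDs.
elements = [
    ("TITLE", {"semantic": "chapter-title"}, "Introduction"),
    ("CHP", {"semantic": "chapter-title"}, "Getting Started"),
    ("P", {}, "Some body text."),
    ("XDFGT", {"semantic": "chapter-title"}, "Reference"),
]
print(find_by_form(elements, "chapter-title"))
```

The query never mentions "TITLE", "CHP" or "XDFGT", which is the point of searching on forms.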

Bio: Ludo Van Vooren is Director of Applications, and Eric Severson is Executive Vice President of Avalanche Development Corporation in Boulder, Colorado.

Abstract: An introduction to the HyTime concept of Architectural Forms, used to attach semantic information to the elements in an SGML document.


<TAG> is a registered trademark of SGML Associates, Inc.
All copy and information on this site is copyright © 1996 by SGML Associates, Inc.
Last modified 6-September, 1996.