[This local archive copy was mirrored from the canonical site on 980113: http://www.isogen.com/papers/archintro.html; links may not have complete integrity, so use the canonical document at this URL if possible.]
Author: W. Eliot Kimber
ISOGEN International Corp
Provides a brief tutorial introduction to SGML architectures.
Copyright (c) 1997 W. Eliot Kimber and ISOGEN International Corp.
SGML architectures are really nothing more than plain old SGML document types that are used in a slightly different way. The difference is that an SGML document type defines the rules for a specific document while an architecture defines the rules for a class of documents. Architectures are roughly (and I stress roughly) analogous to supertypes in object-oriented programming in that they usually define general element types and attributes that can be specialized in individual documents. For example, an architecture might define the general element form "list", which your document then specializes into two distinct kinds of list, "ordered list" and "unordered list".
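To make the supertype analogy concrete, here is a minimal Python sketch (a model, not SGML) under stated assumptions. A hypothetical architecture named `myarch` defines a general `list` form. The document's own `ol` and `ul` element types each declare that they are kinds of `list` through an attribute named after the architecture, which is loosely modeled on the AFDR architectural form attribute. A processor written only against the architecture can then handle both element types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """A parsed element: its name (generic identifier), attributes, and child elements."""
    name: str
    attrs: dict[str, str] = field(default_factory=dict)
    children: list[Element] = field(default_factory=list)


# Hypothetical architecture name; the attribute with this name holds the
# architectural element type ("form") that a document element specializes.
ARCH_NAME = "myarch"


def architectural_form(elem: Element) -> Optional[str]:
    """Return the architectural form an element claims, or None if it claims none."""
    return elem.attrs.get(ARCH_NAME)


def count_lists(elem: Element) -> int:
    """A processor that knows only the architecture's general 'list' form."""
    here = 1 if architectural_form(elem) == "list" else 0
    return here + sum(count_lists(child) for child in elem.children)


# The document specializes 'list' into two distinct element types.
doc = Element("doc", children=[
    Element("ol", {ARCH_NAME: "list"}, [Element("li")]),
    Element("ul", {ARCH_NAME: "list"}, [Element("li")]),
    Element("para"),
])
print(count_lists(doc))  # 2
```

The processor never mentions `ol` or `ul`. That is the (rough) sense in which the architecture acts as a supertype of the document's element types.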
Conceptually, SGML document types and architectures are the same: they define the rules for a class of documents. The rules include both formal SGML-defined specifications using DTD syntax (the "DTD declarations") and other specifications using some combination of prose description and non-SGML formalisms. The SGML DTD declarations enable SGML parsing and validation of the structure and syntax of document instances. The rest of the specification completes the full set of rules and may enable validation beyond what SGML or XML parsers provide.
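As an illustration of a rule that DTD declarations cannot capture, consider a hypothetical prose rule: every `xref` element's `linkend` attribute must point to a `figure`. Declaring `linkend` as IDREF lets an SGML parser confirm that the referenced ID exists, but not what type of element carries it. The Python sketch below (all element and attribute names are assumptions for illustration) shows the kind of extra check that the non-DTD part of a specification can enable.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Element:
    name: str
    attrs: dict[str, str] = field(default_factory=dict)
    children: list[Element] = field(default_factory=list)


def walk(elem: Element) -> Iterator[Element]:
    """Yield an element and all of its descendants in document order."""
    yield elem
    for child in elem.children:
        yield from walk(child)


def check_xref_targets(root: Element) -> list[str]:
    """Enforce the (hypothetical) prose rule that xref/@linkend must name a figure."""
    by_id = {e.attrs["id"]: e for e in walk(root) if "id" in e.attrs}
    errors = []
    for e in walk(root):
        if e.name != "xref":
            continue
        target = by_id.get(e.attrs.get("linkend", ""))
        # An IDREF check only asks "does the ID exist?"; this also checks the target's type.
        if target is None or target.name != "figure":
            errors.append(f"xref to {e.attrs.get('linkend')!r} does not point to a figure")
    return errors


doc = Element("doc", children=[
    Element("figure", {"id": "f1"}),
    Element("para", {"id": "p1"}),
    Element("xref", {"linkend": "f1"}),
    Element("xref", {"linkend": "p1"}),
])
print(check_xref_targets(doc))  # ["xref to 'p1' does not point to a figure"]
```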
The only difference between document types and architectures is the syntax by which documents use them. For SGML document types, the DTD declarations are syntactically part of the document and directly define the element and attribute names used in the instance. For architectures, the declarations are used by reference, with elements and attributes in the instance mapped to elements and attributes in the architecture using a simple mapping mechanism.
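The mapping can be sketched in a few lines of Python. This is a simplified model, not an AFDR implementation, and it rests on several assumptions. A hypothetical architecture `myarch` has a `list` form that declares a `type` attribute. The form attribute is named `myarch`. A hypothetical renamer attribute, `myarch-names`, holds pairs of architectural and document attribute names (loosely modeled on AFDR's attribute renamer; the pair syntax here is an assumption). Document attributes that the architecture does not declare are dropped. Non-architectural elements disappear, but their architectural descendants are kept.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    name: str
    attrs: dict[str, str] = field(default_factory=dict)
    children: list[Element] = field(default_factory=list)


# Hypothetical architecture: each architectural element type -> attributes it declares.
ARCH_DECLS = {"list": {"type"}, "item": set()}
FORM_ATTR = "myarch"           # names the architectural element type
RENAMER_ATTR = "myarch-names"  # "archname docname ..." pairs (assumed syntax)


def map_children(elem: Element) -> list[Element]:
    """Map each child; a non-architectural child is replaced by its mapped children."""
    result = []
    for child in elem.children:
        mapped = map_element(child)
        if mapped is not None:
            result.append(mapped)
        else:
            result.extend(map_children(child))
    return result


def map_element(elem: Element) -> Optional[Element]:
    """Return the architectural view of elem, or None if it is not architectural."""
    form = elem.attrs.get(FORM_ATTR)
    if form not in ARCH_DECLS:
        return None
    names = elem.attrs.get(RENAMER_ATTR, "").split()
    renames = dict(zip(names[0::2], names[1::2]))  # architectural -> document name
    arch_attrs = {}
    for arch_attr in sorted(ARCH_DECLS[form]):
        doc_attr = renames.get(arch_attr, arch_attr)
        if doc_attr in elem.attrs:
            arch_attrs[arch_attr] = elem.attrs[doc_attr]
    return Element(form, arch_attrs, map_children(elem))


ol = Element("ol", {"myarch": "list", "myarch-names": "type numbering",
                    "numbering": "arabic", "compact": "yes"},
             [Element("li", {"myarch": "item"}), Element("note")])
arch = map_element(ol)
print(arch.name, arch.attrs, [c.name for c in arch.children])
# list {'type': 'arabic'} ['item']
```

The document keeps its own names (`ol`, `li`, `numbering`), while a processor reading the mapped result sees only the architecture's names (`list`, `item`, `type`).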
SGML architectures are then a form of document type intended to be used by reference so as to allow specialization. Just like a document type, an SGML architecture defines a set of element types and attributes.
The SGML architecture mechanism is formally defined as part of the SGML Extended Facilities in ISO/IEC 10744:1997, Annex A.3, Architectural Form Definition Requirements (AFDR), published in August 1997 (but in use well before then, because James Clark's SP parser implemented the mechanism in early 1996). The AFDR annex is available for online review at the ISO/IEC JTC1/WG4 Web site.
This paper has tried to provide a brief but informative introduction to the SGML architecture mechanism, how it is used with documents, and how processors can take advantage of architectures. I have not explored all the details of the architecture mechanism, nor have I plumbed the depths of esoteric subtleties such as fine control of architectural mapping and unmapping. It is not necessary to understand any of these details in order to make immediate, productive use of architectures.
For most uses of architectures, simple mappings that rely heavily on the automapping mechanism will be the order of the day and will meet most requirements. In particular, using what would otherwise be document-level DTDs as architectures can provide significant benefits for "DTD-less" documents at minimal cost, letting document creators enable validation without burdening instances with otherwise unnecessary DTD declarations.
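A rough Python model of automapping-based validation follows, under these assumptions. A hypothetical architecture declares four element types and the child types each allows. An element with an explicit form attribute (named `myarch` here) is mapped by that attribute. Otherwise, when automapping is on, it is mapped to the architectural element type with the same name. A "DTD-less" instance that simply uses the architecture's own element names therefore needs no mapping declarations at all.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    name: str
    attrs: dict[str, str] = field(default_factory=dict)
    children: list[Element] = field(default_factory=list)


# Hypothetical architecture: each element type -> child types it allows.
ARCH_CONTENT = {
    "doc": {"para", "list"},
    "list": {"item"},
    "item": {"para"},
    "para": set(),
}
FORM_ATTR = "myarch"


def arch_type(elem: Element, automap: bool = True) -> Optional[str]:
    """An explicit form attribute wins; otherwise automap by matching element name."""
    if FORM_ATTR in elem.attrs:
        return elem.attrs[FORM_ATTR]
    if automap and elem.name in ARCH_CONTENT:
        return elem.name
    return None


def validate(elem: Element, automap: bool = True) -> list[str]:
    """Check each element's children against the architecture's allowed child types."""
    errors = []
    parent = arch_type(elem, automap)
    if parent not in ARCH_CONTENT:
        errors.append(f"<{elem.name}> has no architectural mapping")
    for child in elem.children:
        kid = arch_type(child, automap)
        if parent in ARCH_CONTENT and kid in ARCH_CONTENT and kid not in ARCH_CONTENT[parent]:
            errors.append(f"<{child.name}> ({kid}) not allowed in {parent}")
        errors.extend(validate(child, automap))
    return errors


# A "DTD-less" instance that uses the architecture's element names directly.
good = Element("doc", children=[
    Element("para"),
    Element("list", children=[Element("item", children=[Element("para")])]),
])
bad = Element("doc", children=[Element("item")])
print(validate(good))  # []
print(validate(bad))   # ['<item> (item) not allowed in doc']
```

Real DTD content models also constrain order and occurrence; this sketch checks only which child types are allowed. With `automap=False`, every element in `good` is reported as unmapped, which shows how much work automapping saves for documents that reuse the architecture's names.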
Finally, when thinking about the SGML architecture mechanism, keep two important things in mind. First, the DTD declaration part of architectures is only part of the whole picture. An architecture is always bigger than the SGML-defined declarations used for it, so you should expect to have (or provide) additional definitions and documentation for the architectures you are using, including prose descriptions as well as other formal specifications, such as object models or database schemas. No useful architecture can be completely defined by DTD declarations alone.
Second, remember that an architecture is, ultimately, a bag of rules that you can give a universally unique name to. The public ID or URN you give to an architecture names the entire set of rules, however they might be defined, not just the SGML-syntax formalisms. This gives documents a way to point unambiguously to the rules that govern them. By pointing to those rules, a document makes clear to both human readers and processing programs what the intended rules are, without the need to pass that information "out of band". This alone can be as beneficial as facilities such as architectural validation and name remapping.
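A processor can take advantage of that naming by keeping a table of the architectures it knows, keyed by public identifier, and dispatching on the identifier a document declares. The Python sketch below is hypothetical throughout. The public ID, file name, URL, and the single prose rule are invented for illustration. In practice, much of a rule set lives in prose and cannot be executed at all, which is why the record also points to the full specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A rule check takes element names in document order and returns error messages.
Check = Callable[[List[str]], List[str]]


@dataclass
class Architecture:
    """Everything an architecture's name stands for, not only its DTD declarations."""
    public_id: str
    dtd_system_id: str  # the SGML-syntax part of the rules
    spec_url: str       # prose specification, object models, and so on
    checks: list[Check] = field(default_factory=list)


REGISTRY: Dict[str, Architecture] = {}


def register(arch: Architecture) -> None:
    REGISTRY[arch.public_id] = arch


def process(public_id: str, element_names: List[str]) -> List[str]:
    """Look up the rules a document names and apply the ones that are executable."""
    arch = REGISTRY.get(public_id)
    if arch is None:
        return [f"unknown architecture {public_id!r}"]
    errors: List[str] = []
    for check in arch.checks:
        errors.extend(check(element_names))
    return errors


def starts_with_title(names: List[str]) -> List[str]:
    # A hypothetical prose rule: "every report begins with a title".
    return [] if names[:1] == ["title"] else ["a report must begin with a title"]


REPORT = "-//Example//NOTATION Report Architecture//EN"
register(Architecture(REPORT, "report.dtd", "https://example.com/report-spec",
                      [starts_with_title]))
print(process(REPORT, ["title", "para"]))  # []
print(process(REPORT, ["para"]))           # ['a report must begin with a title']
```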