[This local archive copy mirrored from the canonical site on 980113: http://www.isogen.com/papers/archintro.html; links may not have complete integrity, so use the canonical document at this URL if possible.]

A Tutorial Introduction to SGML Architectures

Author: W. Eliot Kimber

ISOGEN International Corp

Provides a brief, tutorial introduction to SGML architectures.

Copyright (c) 1997 W. Eliot Kimber and ISOGEN International Corp.



1. A Tutorial Introduction to SGML Architectures

SGML architectures are really nothing more than plain old SGML document types that are used in a slightly different way. The difference is that an SGML document type defines the rules for a specific document while an architecture defines the rules for a class of documents. Architectures are roughly (and I stress roughly) analogous to supertypes in object-oriented programming in that they usually define general element types and attributes that can be specialized in individual documents. For example, an architecture might define the general element form "list", which your document then specializes into two distinct kinds of list, "ordered list" and "unordered list".

Conceptually, SGML document types and architectures are the same: they define the rules for a class of documents. The rules include both formal SGML-defined specifications using DTD syntax (the "DTD declarations") and other specifications using some combination of prose description and non-SGML-defined formalisms. The SGML DTD declarations enable SGML parsing and validation of the structure and syntax of document instances. The rest of the specification documents the total set of rules and may, in addition, enable additional validation beyond that provided by SGML or XML parsers.

The only difference between document types and architectures is the syntax of how they are used from documents. For SGML document types, the DTD declarations are syntactically part of the document and directly define the element and attribute names used in the instance. For architectures, the declarations are used by reference, with elements and attributes in the instance mapped to elements and attributes in the architecture using a simple mapping mechanism.

SGML architectures are then a form of document type intended to be used by reference so as to allow specialization. Just like a document type, an SGML architecture defines a set of element types and attributes.

The SGML architecture mechanism is formally defined as part of the SGML Extended Facilities in ISO/IEC 10744:1997, Annex A.3, Architectural Form Definition Requirements, published in August, 1997 (but in use much earlier because the mechanism was implemented by the SP parser from James Clark in early 1996). The AFDR annex is available for online review at the ISO/IEC JTC1/WG4 Web site.

1.1. SGML Architecture Jargon

Like any formal mechanism, the SGML architecture mechanism has its own jargon, which, while internally consistent, may not always be intuitively descriptive in a general context. The key architecture-related terms are:

SGML architecture

A set of rules that govern a class or superclass of documents.

Here the term "architecture" is used in the sense of a general governing plan, such as the plans for a house. For all intents and purposes the term SGML architecture is synonymous with the term SGML document type. Any architecture can be used as a document type and any document type can be used as an architecture (although not all document types make good architectures).

architectural form

An element type or attribute defined by an architecture.

The term "form" is used to distinguish element types and attributes defined by documents in their DTDs from element types and attributes defined by architectures.

Many people use the term "architectural form" to mean the entire architecture, which is incorrect but understandable.

There is no difference between an element type and an element form except for where it is declared (DOCTYPE declarations vs. architectural DTDs).

meta-DTD

The SGML-syntax declarations for an SGML architecture, i.e., the architecture DTD.

As Henry Thompson has pointed out, the term meta-DTD is really not correct and offends mathematicians and logicians who know that a meta-DTD would be a DTD for DTDs. For this reason, we are trying to replace the term meta-DTD with the term architectural DTD, which is more descriptive and technically accurate.

1.2. Jargon From Other Domains

SGML document types are essentially a form of data model or schema for individual documents. Therefore, SGML architectures are data models or schemas for classes of documents. There are any number of disciplines that provide various forms of schema and data model definition, all of which are more or less analogous to SGML architectures. In general, terms like "schema" and "data model" are synonymous with the term "architecture" as used here. Of course, any given schema or data model formalism will differ in expressive power, syntax, and other details.

One key difference between architectures and other schema methods is that SGML architectures can use a variety of forms and formalisms in their definition. Because the only formalism the architecture mechanism itself requires is DTD declarations, and because DTD declarations are really useful only for defining syntactic rules, additional forms of definition will be needed to completely define the rules for architectures. Thus, any methodology for defining schemas and data models may be used as part of an architecture's total definition. For example, you might use a modeling language like UML or EXPRESS to define the general data model and constraints the documents must conform to.

In other words, an SGML architecture is nothing more than a bag of rules for documents, with some of the rules definable using DTD syntax and the rest of the rules defined using some other mechanism or mechanisms.

1.3. How Architectures Work

The basic architecture mechanism is a simple one: you define the set of element types and attributes that make up your architecture (general document type) and then map elements and attributes in individual documents to the element types and attributes in the architecture. This mapping can be explicit or it can be automatic, taking advantage of the default mapping rules defined by the formal SGML architecture mechanism.

The process of creating and using an architecture is as follows:
  1. Define the architecture by creating some element type declarations. For this example, the architecture is very simple, defining a trivial document structure:
    
<!-- Declarations for simple architecture -->
    <!-- Architecture name is "simplearch" -->
    <!ELEMENT simpledoc 
      (title,
       paragraph+)
    >
    <!ELEMENT title
      (#PCDATA)
    >
    <!ELEMENT paragraph
      (#PCDATA)
    >
    <!-- End of simple architecture declarations -->
  2. Create a document that will be mapped to the architecture:
    
<?XML version="1.0" ?>
    <trip.report>
    <destination>XML Developer's Day</destination>
    <p>I attended the XML Developer's day...</p>
    </trip.report>
  3. Point from the document to the architecture you want to map it to by adding an architecture use declaration:
    
<?XML version="1.0" ?>
    <?IS10744:arch name="simplearch" 
                   dtd-system-id="simplearch.dtd" ?>
    <trip.report>
    <destination>XML Developer's Day</destination>
    <p>I attended the XML Developer's day...</p>
    </trip.report>

    The architecture use declaration PI connects the document to the architecture and gives the architecture a name within the document ("simplearch" in the example). This name is then used to do the mapping from the document to the architecture.

    Notice that the architecture use declaration is very much like a document type declaration except that the architecture DTD is not a syntactic part of the document, meaning that the XML or SGML parser doesn't have to parse the architecture declarations in order to parse the document. However, because the declarations are pointed to, an architecture-aware processor can validate the document against the architectural DTD if requested to do so.

  4. Define the mapping from the document to the architecture by adding attributes to the elements:
    
<?XML version="1.0" ?>
    <?IS10744:arch name="simplearch" 
                   dtd-system-id="simplearch.dtd" ?>
    <trip.report
      simplearch="simpledoc"
    >
    <destination
      simplearch="title"
    >XML Developer's Day</destination>
    <p
      simplearch="paragraph"
    >I attended the XML Developer's day...</p>
    </trip.report>

    The simplearch attribute defines the mapping from each element to its corresponding element form in the simplearch architecture. In this example, the attributes are in the elements' start tags, but the mapping can also be defined by declaring the attributes and fixing their values:

    
<?XML version="1.0" ?>
    <?IS10744:arch name="simplearch" 
                   dtd-system-id="simplearch.dtd" ?>
    <!DOCTYPE trip.report [
     <!ATTLIST trip.report simplearch NAME #FIXED "simpledoc" >
     <!ATTLIST destination simplearch NAME #FIXED "title" >
     <!ATTLIST p           simplearch NAME #FIXED "paragraph" >
    ]>
    <trip.report>
    <destination>XML Developer's Day</destination>
    <p>I attended the XML Developer's day...</p>
    </trip.report>

    The two forms of the document are equivalent, but the second requires processing of the attribute list declarations, while the first version does not. Of course, for long documents, declaring the attributes saves the cost of repeating the attributes explicitly for every element instance. And of course, given a document with fixed declarations, it is easy to generate a document with explicitly-specified attributes.
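The conversion from fixed declarations to explicit attributes can be sketched in a few lines of Python. This is only an illustration: the dictionary FIXED_SIMPLEARCH stands in for the three #FIXED declarations (a real tool would read them from the DOCTYPE), the function name make_explicit is made up for this example, and the XML declaration and architecture use declaration are left out so that the sample parses with the Python standard library.

```python
import xml.etree.ElementTree as ET

DOC = """<trip.report>
<destination>XML Developer's Day</destination>
<p>I attended the XML Developer's day...</p>
</trip.report>"""

# Stands in for the three #FIXED simplearch declarations in the DOCTYPE.
FIXED_SIMPLEARCH = {
    "trip.report": "simpledoc",
    "destination": "title",
    "p": "paragraph",
}


def make_explicit(root, arch_att, fixed_values):
    """Write each element's fixed mapping value into its start tag, in place."""
    for element in root.iter():
        if element.tag in fixed_values and arch_att not in element.attrib:
            element.set(arch_att, fixed_values[element.tag])
    return root


explicit = make_explicit(ET.fromstring(DOC), "simplearch", FIXED_SIMPLEARCH)
print(ET.tostring(explicit, encoding="unicode"))
```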

The mapping defined by the simplearch attribute establishes a correspondence between the document and the architecture. This correspondence enables processing the document in terms of the architecture. Instead of defining processing in terms of element types, you can define it in terms of element forms in a particular architecture. Because the mapping is done using attributes, the processing can be defined simply by keying off of the value of the simplearch attribute, e.g.:

<rule>
switch (attval('simplearch')) {
  case 'simpledoc':
   ; Set up page or screen layout
   break;
  case 'title':
    set font="bold";
    break;
  case 'paragraph':
    set break-before="yes";
    break;
}
</rule>
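
The same idea can be written as a small runnable Python sketch. The style properties in STYLE_BY_FORM are hypothetical placeholders, not part of any real style language, and the sample document omits the XML declaration and the architecture use declaration so that it parses with the standard library. The point is only that the lookup is keyed on the value of the simplearch attribute rather than on the element type name.

```python
import xml.etree.ElementTree as ET

# The step-4 document with explicit mapping attributes.
DOC = """<trip.report simplearch="simpledoc">
<destination simplearch="title">XML Developer's Day</destination>
<p simplearch="paragraph">I attended the XML Developer's day...</p>
</trip.report>"""

# Hypothetical formatting properties, one entry per element form.
STYLE_BY_FORM = {
    "simpledoc": {"layout": "page"},
    "title": {"font": "bold"},
    "paragraph": {"break-before": "yes"},
}


def style_for(element, arch_att="simplearch"):
    """Look up formatting by architectural form, ignoring the element's own name."""
    form = element.get(arch_att)
    return STYLE_BY_FORM.get(form, {})


root = ET.fromstring(DOC)
styles = [(el.tag, style_for(el)) for el in root.iter()]
for tag, style in styles:
    print(tag, style)
```

Any other document that maps its elements to the same forms would be formatted by the same table, whatever its own element type names are.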

Because the architecture is defined as a set of DTD declarations, you can create documents that use it directly. You can create such a document by resolving the mapping of a document to an architecture in order to create a new document in which the elements reflect the architecture but the order and content reflect the original document. Such a document is called an architectural instance. For example, the simpledoc architectural instance of the document shown in the example above is:


<simpledoc>
<title>XML Developer's Day</title>
<paragraph>I attended the XML Developer's day...</paragraph>
</simpledoc>
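
Resolving the mapping can be sketched as a short recursive copy. This is a minimal illustration, not how any particular architecture engine works: it assumes every element carries an explicit simplearch attribute, it drops all attributes from the copy, and the function name architectural_instance is made up for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<trip.report simplearch="simpledoc">
<destination simplearch="title">XML Developer's Day</destination>
<p simplearch="paragraph">I attended the XML Developer's day...</p>
</trip.report>"""


def architectural_instance(element, arch_att):
    """Copy an element, renaming it to the form named by its mapping attribute.

    Assumes every element is explicitly mapped; the mapping attribute and
    all other attributes are dropped from the copy.
    """
    copy = ET.Element(element.attrib[arch_att])
    copy.text, copy.tail = element.text, element.tail
    for child in element:
        copy.append(architectural_instance(child, arch_att))
    return copy


instance = architectural_instance(ET.fromstring(DOC), "simplearch")
print(ET.tostring(instance, encoding="unicode"))
```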

That's all there is to the basic mechanism: establishing a mapping from elements and attributes in a document to elements and attributes in an architecture's DTD. To do architecture-based processing of documents all you need to know is which attribute defines the mapping. However, to create documents and document types that take advantage of architectures, you need to understand a few more details about how the architectural mapping mechanism works. The SGML architecture mechanism provides two important facilities: architectural validation and automatic mapping. Architectural validation lets you validate documents against the architectures from which they are derived. Automatic mapping lets you do some or all of the mapping from a document to an architecture automatically, mostly by matching element and attribute names.

1.4. Automatic Architectural Mapping

The formal SGML architecture mechanism provides an optional architectural automapping mechanism that lets you avoid the cost of doing explicit mappings. In SGML architecture jargon, something that is mapped to something in the architectural instance is said to be architectural. Not everything in a document need map to something in the architectural instance, so not everything in a document needs to be architectural. For example, it is possible to suppress the architectural mapping of elements and data, which is sometimes necessary if they would cause architectural validation errors if they were mapped. By default, elements, except the document element, are not architectural and data is, subject to the automatic mapping rules explained here. In other words, elements are not architectural unless either explicitly mapped or automatically mapped, while data is architectural unless explicitly or automatically unmapped. In general the automatic mapping mechanism behaves in the way you would intuitively expect it to. (Of course, as in any such system, there are always a few cases which make sense logically but are not necessarily immediately intuitive; however, the mechanism was designed to behave intuitively as much as possible.)

The automatic mappings are:
  • Element and attribute names in the document are automatically mapped to elements and attributes of the same name in the architecture, unless explicitly unmapped. For example, if the architecture declares an element named "para" with an attribute named "ID", then a "para" element with an "ID" attribute will automatically be taken as architectural, including the value of the ID attribute.
  • The document element is automatically mapped to the architectural document element, unless explicitly mapped or unmapped. The architectural document element form is declared as part of the architecture use declaration. This mapping ensures that the architectural instance always has a document element. If no architectural document element is declared by the architecture use declaration, then the document element must be explicitly mapped.
  • Elements with ID attributes (that is, attributes whose declared value is "ID") are mapped to the architectural bridging form if not otherwise mapped. The architectural bridging form is declared as part of the architecture use declaration. (See below for more about the architectural bridging form.) This mapping ensures that ID references have something to point to in the architectural instance.
  • Data entities are mapped to the default architectural notation if their notation is not also declared in the architecture. The default architectural notation is declared as part of the architecture use declaration. This mapping ensures that data entity references have something to point to in the architectural instance.
  • Data that occurs where data is allowed in the architecture is taken as architectural unless explicitly unmapped. This means that data will automatically "flow through" to the architectural instance only where it's allowed and will not cause architectural validation errors if it occurs in the document where the architecture doesn't allow it. This means you don't normally have to worry about how data in documents will map to the architecture, except to ensure that data does occur where it's expected.

The automatic mapping mechanism is active by default. You can control the automatic mapping of elements, attributes, and data using the automapping control attribute of the architecture use declaration. Data mapping can be controlled using the "ignore data" attribute, which you specify on individual elements to control the architectural mapping of their data.
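
The element-mapping rules above can be summarized as a small decision function. This is a sketch under stated assumptions, not the normative algorithm: the order in which the fallbacks are tried is my reading of the rules, the function and parameter names are invented for the example, and an unspecified #IMPLIED mapping attribute is modeled as a dictionary entry whose value is None. Data entities and data content are not covered.

```python
def resolve_form(name, attrs, arch_forms, arch_att,
                 is_doc_element=False, doc_elem_form=None,
                 bridge_form=None, id_atts=()):
    """Return the architectural form for an element, or None if unmapped.

    name        element type name in the document
    attrs       dict of the element's attributes (after defaulting)
    arch_forms  set of element form names declared by the architecture
    id_atts     names of this element's attributes whose declared value is ID
    """
    if arch_att in attrs:
        # Explicit mapping; None or an empty value explicitly unmaps the element.
        return attrs[arch_att] or None
    if name in arch_forms:
        return name                      # same-name automapping
    if is_doc_element and doc_elem_form:
        return doc_elem_form             # document element default
    if bridge_form and any(att in attrs for att in id_atts):
        return bridge_form               # keeps ID reference targets alive
    return None                          # not architectural


FORMS = {"simpledoc", "title", "paragraph"}
print(resolve_form("title", {}, FORMS, "simplearch"))
print(resolve_form("title", {"simplearch": None}, FORMS, "simplearch"))
print(resolve_form("trip.report", {}, FORMS, "simplearch",
                   is_doc_element=True, doc_elem_form="simpledoc"))
```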

Automatic mapping of elements can be turned off for specific elements by declaring the architectural mapping attribute with a default value of #IMPLIED or by specifying it with a null value. You sometimes need to do this when an element in your document happens to have the same name as an element in the architecture but doesn't actually correspond to that element. For example, say the architecture declares a "title" element for division titles, but you use "title" for the professional titles of people. You would need to unmap your title element, like so:


<?IS10744:arch name="somearch" ?>
<!DOCTYPE MyDoc [
  <!ATTLIST title 
    somearch NAME #IMPLIED
  >
]>
<MyDoc>...

1.5. Automapping and Document DTDs

The architectural automatic mapping facility provides a simple little trick that is particularly useful in an environment where documents may not have explicit DTDs, such as XML. The trick is this: if you declare what would have been a document's DTD as an architecture, then you can validate the document against the architecture with no explicit architectural mappings because everything will be mapped automatically. In other words, if a document conforms to its own declarations, then it will conform to those declarations when used as an architecture. This can be useful when transforming SGML documents with DOCTYPE declarations into XML or SGML documents without DOCTYPE declarations, letting you omit the declarations without losing the connection from the documents to their governing document type.

To use this trick, all you have to do is provide the necessary architecture use declaration. For example, given this original SGML document:


<!DOCTYPE MyDoc SYSTEM "mydtd.dtd" >
<MyDoc>
  ...
</MyDoc>

You can generate this equivalent, DTD-less XML document:

<?XML version="1.0" ?>
<?IS10744:arch name="mydtd" dtd-system-id="mydtd.dtd" ?>
<MyDoc>
 ...
</MyDoc>

The XML document now lacks DTD declarations but still maintains its connection back to the original DTD and can be validated against its original declarations using architectural validation. Because the instance hasn't changed, there's no need for any explicit architectural mappings.

This trick can also be useful when a document has partial declarations or when you simply want to be crystal clear about what the rules governing a document are.

1.6. Architectural Bridging Elements

When doing architectural mapping you often want some of the elements in your document to simply be architectural, meaning that they are part of the architectural instance, without mapping them to a specific semantic in the architecture. You often want to do this simply to preserve the element boundaries in the original document or to take advantage of attributes defined by the architecture. Thus, you need to have an element type in the architecture to serve as a generic mapping target. This element is referred to generically as an architectural bridging form. The term "bridging" comes from the idea that the form acts as a bridge between the elements defined by the architecture and elements and data unique to your document. You can think of it as an "other" category to which you can map any element that doesn't have a better mapping.

For example, consider an architecture that provides generic elements for structuring people's names and addresses. It defines general structures like "name" and "address" without defining any specific substructure. However, the documents that map to this architecture may have lots of substructure. When doing the architectural mapping, you need to preserve the original element boundaries but there's nothing specific to map those elements to, so you need something generic in the architecture, i.e., an "other" category. Here is an architectural DTD for names and addresses:


<!-- Person Name and addresses architecture ("personarch")-->
<!ELEMENT person 
  (name,
   address?)
>
<!ELEMENT name 
   (#PCDATA | archbridge)*
>
<!ELEMENT address
   (#PCDATA | archbridge)*
>
<!ELEMENT archbridge
   (#PCDATA | archbridge)*
>

Here is a document to be mapped to the architecture:


<?XML version="1.0" ?>
<customer.record>
 <cust.name><last>Kimber</last><first>William</first></cust.name>
 <cust.address>
 <street>1234 Maple St.</street>
 <city>Austin</city><state>TX</state><zip>78757</zip>
 </cust.address>
</customer.record>

Note that the name and address have detailed substructure. Note also that there is no punctuation between the elements in the substructure; it is presumably autogenerated by style sheets. This means that if the original element boundaries are lost, the boundaries between different data items will be lost too (the data content would all run together). Thus, when doing the architectural mapping, you need to preserve the element boundaries. You use the architectural bridging element for this.

First, you declare the use of the architecture and specify the name of the bridging element:


<?XML version="1.0" ?>
<?IS10744:arch name="personarch"
  bridge-form="archbridge"
?>
<customer.record>
 <cust.name><last>Kimber</last><first>William</first></cust.name>
 <cust.address>
 <street>1234 Maple St.</street>
 <city>Austin</city><state>TX</state><zip>78757</zip>
 </cust.address>
</customer.record>

Now you define the mapping from elements in the document to elements in the architecture, here done using attribute list declarations for clarity:


<?XML version="1.0" ?>
<?IS10744:arch name="personarch"
  bridge-form="archbridge"
?>
<!DOCTYPE customer.record [
 <!ATTLIST customer.record personarch NAME #FIXED "person" >
 <!ATTLIST  cust.name      personarch NAME #FIXED  "name"   >
 <!ATTLIST   last          personarch NAME #FIXED   "archbridge" >
 <!ATTLIST   first         personarch NAME #FIXED   "archbridge" >
 <!ATTLIST  cust.address   personarch NAME #FIXED  "address" >
 <!ATTLIST   street        personarch NAME #FIXED   "archbridge" >
 <!ATTLIST   city          personarch NAME #FIXED   "archbridge" >
 <!ATTLIST   state         personarch NAME #FIXED   "archbridge" >
 <!ATTLIST   zip           personarch NAME #FIXED   "archbridge" >
]>
<customer.record>
 <cust.name><last>Kimber</last><first>William</first></cust.name>
 <cust.address>
 <street>1234 Maple St.</street>
 <city>Austin</city><state>TX</state><zip>78757</zip>
 </cust.address>
</customer.record>

Now all the elements in the document are accounted for in the architectural mapping and no element boundaries will be lost. The personarch architectural instance for the sample document is:


<person>
 <name><archbridge>Kimber</archbridge><archbridge>William</archbridge></name>
 <address>
 <archbridge>1234 Maple St.</archbridge>
 <archbridge>Austin</archbridge><archbridge>TX</archbridge><archbridge>78757</archbridge>
 </address>
</person>
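
A rough Python sketch of how this instance could be produced follows. The dictionary FIXED_MAPPING stands in for the #FIXED personarch attributes declared in the DOCTYPE above, and the function name to_personarch is made up for the example; a real architecture engine would read the declarations itself.

```python
import xml.etree.ElementTree as ET

DOC = """<customer.record>
 <cust.name><last>Kimber</last><first>William</first></cust.name>
 <cust.address>
 <street>1234 Maple St.</street>
 <city>Austin</city><state>TX</state><zip>78757</zip>
 </cust.address>
</customer.record>"""

# Stands in for the #FIXED personarch attributes declared in the DOCTYPE.
FIXED_MAPPING = {
    "customer.record": "person",
    "cust.name": "name",
    "cust.address": "address",
}
FIXED_MAPPING.update(
    {name: "archbridge"
     for name in ("last", "first", "street", "city", "state", "zip")}
)


def to_personarch(element):
    """Rename each element to the form its fixed mapping attribute names."""
    copy = ET.Element(FIXED_MAPPING[element.tag])
    copy.text, copy.tail = element.text, element.tail
    for child in element:
        copy.append(to_personarch(child))
    return copy


person = to_personarch(ET.fromstring(DOC))
name_parts = [part.text for part in person.find("name")]
print(name_parts)
```

Because each bridged element survives as its own archbridge element, the family name and given name stay separate data items in the result.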

Architectural bridging forms are also useful for enabling the use of common attributes provided by the architecture, where what's important is the attributes, not the element structure. Elements that need to use the attributes are mapped to the bridging form, which lets them use the architectural attributes without implying any other architectural semantic. For example, say you have an architecture that provides a generic "security" attribute that you want to use with any element in the document. The architecture enables this by providing a bridging form that takes the security attribute, thus making it available to any element in the document. Because of their general nature, bridging elements usually have a very general content model, e.g., "ANY".

An example of an architecture with a bridging form with a security attribute is:


<!-- Architecture with general-use attributes and bridging form
     for security: "securearch"

     This architecture might govern security-related document 
     metadata as well as providing a generic "security" attribute
     for use by individual elements.  The document metadata parts
     of this architecture have been omitted for clarity in this
     example.
  -->
<!ELEMENT security.info 
  (doc.metadata,
   (#PCDATA | securearch.bridge)*)
>
<!-- securearch.bridge is the architectural bridging form for
     this architecture.  It provides generic security attributes. -->
<!ELEMENT securearch.bridge
  ANY 
>
<!-- security.level attribute identifies minimum security level
     of the element.  If omitted, the security level is the 
     highest security level of any subelements, if any specify
     a value, otherwise the security level is determined by the
     security level of the element's parent. 
  -->
<!ATTLIST securearch.bridge
   security.level (unclassified|internal|confidential|topsecret) #IMPLIED
>

A document that uses the security architecture uses the securearch.bridge form to associate security.level attributes in the document with the same attribute in the architecture:


<?XML version="1.0" ?>
<?IS10744:arch name="securearch" doc-elem-form="security.info"
  bridge-form="securearch.bridge" ?>
<!DOCTYPE MyDoc [
  <!ATTLIST Division
     myatt      CDATA #IMPLIED
     security.level (unclassified|internal|confidential|topsecret) #IMPLIED
     securearch NAME #FIXED "securearch.bridge"
  >
  <!ATTLIST para
     someotheratt  CDATA #IMPLIED
     security.level (unclassified|internal|confidential|topsecret) #IMPLIED
     securearch NAME #FIXED "securearch.bridge"
  >
]>
<MyDoc>
 ...
<Division security.level="internal" myatt="foo">
  ...
 <para security.level="confidential" someotheratt="bar">Oooh, don't look
 </para>
  ...
</Division>
</MyDoc>

The architectural instance of the above is:


<security.info>
  ...
 <securearch.bridge security.level="internal">
  ...
  <securearch.bridge security.level="confidential">Oooh, don't look
  </securearch.bridge>
  ...
 </securearch.bridge>
</security.info>

A processor that understands the security architecture (for example, to calculate the highest security level within a particular context) knows for sure that the security.level attribute it finds on the Division and para elements is the same attribute defined in the security architecture, and not one that happens to have the same name. The processor doesn't care what the original element types were because it's only interested in the value of the security.level attribute.
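
As an illustration, such a processor might look like the following Python sketch. It works on the architectural instance, so it never sees the names Division or para. Two things here are assumptions of the example rather than anything the architecture states: that the enumerated values are listed in ascending order of sensitivity, and that "highest level within a context" means the maximum value specified on the context element or any of its descendants.

```python
import xml.etree.ElementTree as ET

# Assumed to run from least to most sensitive, following the declaration order.
LEVELS = ["unclassified", "internal", "confidential", "topsecret"]

INSTANCE = """<security.info>
 <securearch.bridge security.level="internal">
  <securearch.bridge security.level="confidential">Oooh, don't look
  </securearch.bridge>
 </securearch.bridge>
</security.info>"""


def highest_level(context):
    """Return the highest security.level specified on context or below it.

    Returns None when no element in the context specifies a level.
    """
    specified = [el.get("security.level") for el in context.iter()
                 if el.get("security.level") is not None]
    return max(specified, key=LEVELS.index) if specified else None


root = ET.fromstring(INSTANCE)
print(highest_level(root))
```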

Some architectures only exist to provide attributes. In that case, you need only define a single architectural element type, which serves as both the architectural document element form and the bridging form.

1.7. Architectural Attribute Name Remapping

Through the automatic mapping mechanism, attributes of elements that are architectural are mapped to the architecture if they have the same name as an attribute in the architecture. For example, in the security architecture example in the previous section, the security.level attribute on the Division element is automatically mapped to the security.level attribute of the securearch.bridge form because the names are the same. However, you often want to map attributes in your documents to attributes in the architecture even though the names don't match. In addition, you may occasionally need to turn off coincidental mappings. Thus, you need a way to map the names of attributes declared in the architecture to the attribute names used in your documents.

You map attributes in the architecture to your attribute names using the architectural attribute renaming attribute, which goes on the elements in your document for which the attributes are to be renamed. The name of this attribute is declared as part of the architecture use declaration in your document. For example, say you want to use the security.level attribute but you want to provide a shorter name for your documents, say sec. You first declare the name of an attribute renaming attribute as part of the architecture use declaration for the security architecture:


<?XML version="1.0" ?>
<?IS10744:arch name="securearch" doc-elem-form="security.info"
  bridge-form="securearch.bridge" 
  renamer-att="securearch.names"
?>
...

You now use the attribute securearch.names to map the architectural attribute name security.level to your preferred name, sec:

<?XML version="1.0" ?>
<?IS10744:arch name="securearch" doc-elem-form="security.info"
  bridge-form="securearch.bridge" 
  renamer-att="securearch.names"
?>
<!DOCTYPE MyDoc [
  <!ATTLIST Division
     myatt      CDATA #IMPLIED
     sec        (unclassified|internal|confidential|topsecret) #IMPLIED
     securearch NAME #FIXED "securearch.bridge"
     securearch.names CDATA #FIXED "security.level sec"
  >
  <!ATTLIST para
     someotheratt  CDATA #IMPLIED
     sec           (unclassified|internal|confidential|topsecret) #IMPLIED
     securearch NAME #FIXED "securearch.bridge"
     securearch.names CDATA #FIXED "security.level sec"
  >
]>
<MyDoc>
 ...
<Division sec="internal" myatt="foo">
  ...
 <para sec="confidential" someotheratt="bar">Oooh, don't look
 </para>
  ...
</Division>
</MyDoc>

The architectural instance for this document is the same as in the previous section. The architecture-aware processor looks for the renaming attribute, sees that security.level has been remapped to sec, gets the value of the sec attribute, and uses that for the value of the security.level attribute in the architectural instance.

The value of the attribute renaming attribute is a list of name pairs, where the first name is the architectural attribute name and the second name is the local attribute name.

You can also use the attribute renamer to map an attribute to content and content to an attribute, as well as remapping the names used for attribute value name tokens.
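
The simple case of plain name pairs can be sketched in Python as follows. The function names are invented for the example, attributes are represented as ordinary dictionaries, and the sketch deliberately ignores the content and name-token remapping just mentioned.

```python
def parse_renamer(value):
    """Split a renamer value into {architectural name: local name} pairs."""
    names = value.split()
    if len(names) % 2:
        raise ValueError("renamer value must be a list of name pairs")
    return dict(zip(names[0::2], names[1::2]))


def architectural_attributes(attrs, renamer_att, arch_atts):
    """Return the element's architectural attributes under their architectural names.

    arch_atts is the set of attribute names the architecture declares for
    the form; attributes not declared there are not architectural.
    """
    renames = parse_renamer(attrs.get(renamer_att, ""))
    result = {}
    for arch_name in arch_atts:
        local_name = renames.get(arch_name, arch_name)
        if local_name in attrs:
            result[arch_name] = attrs[local_name]
    return result


# Attributes of the Division element, including the fixed ones.
division = {
    "sec": "internal",
    "myatt": "foo",
    "securearch": "securearch.bridge",
    "securearch.names": "security.level sec",
}
print(architectural_attributes(division, "securearch.names", {"security.level"}))
```

Without a renamer entry the lookup falls back to the architectural name itself, which is the same-name automatic mapping described at the start of this section.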

1.8. Architectural Validation

For many uses of architectures, especially very general ones, simply establishing a correspondence between elements in the document and elements in the architecture is sufficient to the task at hand. In other words, often all you need to do is map local element names to the names defined by an architecture. However, because SGML architectures are defined using DTD syntax, you can validate documents against architectural DTDs just as you can validate them against their own DTDs. This can be very handy when the documents themselves have no DTD declarations, such as for well-formed XML documents. When combined with automatic architectural mapping, architectural validation can provide rigorous validation for documents that don't otherwise need to carry around their own DTD declarations, letting you defer validation until after parsing without losing the ability to validate altogether.

Architectural validation is no different from normal SGML or XML validation except that it is the architectural instance that is validated, not the original document. In other words, architectural validation can always be implemented by first generating a literal architectural instance and then validating that instance against the architectural DTD declarations. However, most tools that do architectural validation, such as the SP parser from James Clark, do it transparently.

Thus, the real question is "what constraints do the architectural DTD declarations impose on documents?" Architectural DTDs impose constraints as follows:

  • Elements required by the architectural content models must have corresponding elements in the document mapped to them.
  • Elements that are optional in the architectural content model need never be mapped to.
  • Documents may include or allow elements that are not mapped to anything in the architecture. For example, if the architectural content model is "a,b", the document can have the content model "a,x,b" as long as the element "x" is not mapped to something in the architecture.
  • For attributes, only those attributes declared as #REQUIRED in the architectural DTD need be provided by the document. All other attributes are either implied or have architecture-defined defaults and therefore need never be present in the document (and need not even be declared if there are explicit attribute list declarations for the document).

Note that it is not an error to declare content models in your document that allow documents that are not valid with respect to an architecture. This is because architectural validation is of instances, not declarations. Of course, your document's declarations must also allow architecturally valid documents or validation errors are unavoidable.
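
The "a,b" versus "a,x,b" case can be made concrete with a toy checker. This is a sketch only: it looks at a single level of content, it encodes the architectural content model as a regular expression over space-terminated form names (an encoding chosen just for this example), and the attribute name arch and the element names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Architectural content model "a,b" for the form "doc", written as a regular
# expression over a space-terminated sequence of form names.
ARCH_MODELS = {"doc": re.compile(r"a b ")}


def architectural_children(element, arch_att):
    """Forms of the child elements that are mapped; unmapped ones are skipped."""
    return [child.get(arch_att) for child in element if child.get(arch_att)]


def is_architecturally_valid(element, arch_att):
    """Check the mapped children of one element against its form's content model."""
    forms = architectural_children(element, arch_att)
    sequence = "".join(form + " " for form in forms)
    model = ARCH_MODELS[element.get(arch_att)]
    return model.fullmatch(sequence) is not None


# "x" is not mapped, so the architectural content is still "a,b".
ok = ET.fromstring('<d arch="doc"><a arch="a"/><x/><b arch="b"/></d>')
# Here "x" is mapped to "a", so the architectural content becomes "a,a,b".
bad = ET.fromstring('<d arch="doc"><a arch="a"/><x arch="a"/><b arch="b"/></d>')
print(is_architecturally_valid(ok, "arch"), is_architecturally_valid(bad, "arch"))
```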

1.9. Using Multiple Architectures

A single document can be derived from several architectures at once. This is because each architecture is processed independently of any other architectures, meaning that architectures cannot interfere with each other in the same document. One way to think of architectural mappings is as providing a "view" of a document where the architectural mapping is a filter applied to the document. As in other environments, you can apply a variety of filters to a single document.

The only potential problem with using multiple architectures is name collisions. However, the SGML architecture mechanism provides a complete set of renaming facilities so that any name collisions can be resolved in the document. For example, if two architectures happen to use the same name for some attribute, the attribute renaming mechanism can be used to map one or both attributes to different names on the elements that use them. Because each architecture has its own mapping attribute, there can be no conflict between different architectures in the mapping of elements in documents to elements in the different architectures.
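
The independence of mapping attributes can be seen in a short Python sketch. It assumes, as a simplification, that every mapping is given explicitly by an attribute named after its architecture. The report document and the function name are hypothetical, and the attribute renaming mechanism itself is not shown.

```python
import xml.etree.ElementTree as ET

def view(root, arch_name):
    """List (document element name, architectural form) pairs for one architecture."""
    return [(el.tag, el.get(arch_name))
            for el in root.iter() if el.get(arch_name) is not None]

# Hypothetical document derived from two architectures at once.
DOC = ET.fromstring(
    '<report simplearch="simpledoc" securearch="security.info">'
    '<heading simplearch="title" securearch="securearch.bridge">Status</heading>'
    '<note securearch="securearch.bridge">Internal only</note>'
    '</report>'
)

# Each architecture reads only its own attribute, so the views cannot interfere.
simple_view = view(DOC, "simplearch")
secure_view = view(DOC, "securearch")
```

Here the note element takes part in the security view but is simply invisible to the simple document view.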

Thus, the only real concern when using multiple architectures is ensuring that a given document is valid with respect to all the architectures it uses. However, this is more a concern for architecture designers, who need to ensure that their architectures are sufficiently flexible to allow them to be used with other architectures as appropriate. As a rule, the more general and widely applicable an architecture is, the fewer attribute and context constraints it should impose.

The effort of combining architectures can be hidden from document authors by creating a new architecture that does the combination. Documents then use this new architecture. This is possible because architectures can be derived from other architectures. For example, say you want to combine the trivial document architecture with the security architecture. You could do it simply by deriving your documents from both architectures or you could create a new architecture that is itself derived from these two architectures, simplifying the mapping documents have to do. Such a combination architecture might look like this:


<!-- Combo architecture that combines the trivial document architecture
     with the security info architecture.  New architecture is called
     "securedoc"
  -->
<?IS10744:arch name="simplearch" doc-elem-form="simpledoc"?>
<?IS10744:arch name="securearch" doc-elem-form="security.info"
  bridge-form="securearch.bridge" 
?>
<!ELEMENT simpledoc 
  (title,
   paragraph+)
>
<!ATTLIST simpledoc
  security.level (unclassified|internal|confidential|topsecret) #IMPLIED
  securearch NAME #FIXED "security.info"
>
<!ELEMENT title
  (#PCDATA)
>
<!ATTLIST title
  security.level (unclassified|internal|confidential|topsecret) #IMPLIED
  securearch NAME #FIXED "securearch.bridge"
>
<!ELEMENT paragraph
  (#PCDATA)
>
<!ATTLIST paragraph
  security.level (unclassified|internal|confidential|topsecret) #IMPLIED
  securearch NAME #FIXED "securearch.bridge"
>
<!-- End of securedoc architecture declarations -->

Documents can now be derived from this new architecture, which will then automatically connect them to the original architectures (because you can follow the chain of architecture use declarations). A typical document derived from the securedoc architecture might be:


<?XML version="1.0" ?>
<?IS10744:arch name="securedoc" doc-elem-form="simpledoc" ?>
<trip.report security.level="internal">
<destination securedoc="title">XML Developer's Day</destination>
<p securedoc="paragraph">I attended the XML Developer's day...</p>
</trip.report>

Note how much simpler the document is, as it need only reflect a single architectural mapping, even though there are really two architectures at work. Because there are three architectures in this system, there are three possible architectural instances that can be generated.

The first architectural instance is the one for the securedoc architecture, that is, the architecture from which the document is directly derived:


<simpledoc security.level="internal">
<title>XML Developer's Day</title>
<paragraph>I attended the XML Developer's day...</paragraph>
</simpledoc>

The second and third architectural instances are created from the first instance, as the securedoc architecture is itself derived from two architectures, simpledoc and security info. The simpledoc architectural instance is:


<simpledoc>
<title>XML Developer's Day</title>
<paragraph>I attended the XML Developer's day...</paragraph>
</simpledoc>

This instance lacks only the security.level attribute, which is provided by the security architecture.

The security info architectural instance is:


<security.info security.level="internal">
<securearch.bridge>XML Developer's Day</securearch.bridge>
<securearch.bridge>I attended the XML Developer's day...</securearch.bridge>
</security.info>

Because these two architectural instances were generated from the same document (the securedoc architectural instance), there is a well-defined correspondence between the elements and data in the simpledoc instance and in the security.info instance, namely the original elements and data from which they were generated. This means that an architecture-aware processor that remembers the correspondence can process the instances independently yet still navigate to things in the other instances or in the original document. Of course, architecture-aware processors need never be implemented in terms of processing architectural instances; they can simply key off attribute values.
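
One way a processor might remember the correspondence is sketched below in Python. This is an assumption-laden illustration rather than a prescribed design: the two tables restate the #FIXED mapping attribute values and the automapped names from the securedoc declarations, the security.level attribute is assumed to belong only to the security architecture, and the names derive, origin, and counterpart are hypothetical.

```python
import xml.etree.ElementTree as ET

# Forms each securedoc element maps to in the two base architectures.
SIMPLEARCH_FORMS = {"simpledoc": "simpledoc", "title": "title",
                    "paragraph": "paragraph"}  # automapped by name
SECUREARCH_FORMS = {"simpledoc": "security.info", "title": "securearch.bridge",
                    "paragraph": "securearch.bridge"}  # from the #FIXED values

def derive(source, form_for, keep_attrs, origin):
    """Copy source into a derived instance, recording each new element's origin."""
    out = ET.Element(form_for[source.tag])
    for name in keep_attrs:
        if name in source.attrib:
            out.set(name, source.attrib[name])
    out.text, out.tail = source.text, source.tail
    origin[out] = source
    for child in source:
        out.append(derive(child, form_for, keep_attrs, origin))
    return out

securedoc = ET.fromstring(
    '<simpledoc security.level="internal">'
    "<title>XML Developer's Day</title>"
    "<paragraph>I attended the XML Developer's day...</paragraph>"
    "</simpledoc>"
)

origin = {}  # derived element -> securedoc element it was generated from
simple = derive(securedoc, SIMPLEARCH_FORMS, (), origin)
secure = derive(securedoc, SECUREARCH_FORMS, ("security.level",), origin)

def counterpart(elem, other_root):
    """Find the element in another instance generated from the same source."""
    return next(e for e in other_root.iter() if origin[e] is origin[elem])
```

With the origin table in hand, a processor working on the title in the simpledoc instance can call counterpart to reach the matching securearch.bridge element in the security instance.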

Note, however, that if an architecture-aware processor is to support multiple levels of architecture, it does need to be prepared to traverse the architecture use chain.
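
Traversing the architecture use chain amounts to a simple graph walk, as the following Python sketch suggests. It assumes the processor has already collected, for each architecture, the names of the architectures declared in that architecture's own declarations. The table and function name are hypothetical, and the table contents restate the securedoc example.

```python
from collections import deque

# For each architecture, the architectures it is itself derived from.
BASE_ARCHITECTURES = {
    "securedoc": ["simplearch", "securearch"],
    "simplearch": [],
    "securearch": [],
}

def architecture_chain(declared, base_architectures):
    """Return every architecture reachable from the ones a document declares."""
    seen, order, queue = set(), [], deque(declared)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # an architecture reached by two routes is listed once
        seen.add(name)
        order.append(name)
        queue.extend(base_architectures.get(name, []))
    return order

# The sample document declares only securedoc, yet three architectures apply.
chain = architecture_chain(["securedoc"], BASE_ARCHITECTURES)
```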

Finally, given a combination architecture like the securedoc example, implementors always have the choice of implementing support at the securedoc level or reusing, by virtue of the mapping to the original architectures, the processing for those architectures (i.e., relying on an existing simpledoc or security info processor). How you do it depends on implementation needs and constraints. In other words, the use of multiple levels of architecture does not require that processors designed for one architecture be able to handle multiple levels of derivation.

This paper has tried to provide a brief but informative introduction to the SGML architecture mechanism, how it is used with documents, and how processors can take advantage of architectures. I have not explored all the details of the architecture mechanism, nor have I plumbed the depths of esoteric subtleties such as fine control of architectural mapping and unmapping. It is not necessary to understand any of these details in order to make immediate, productive use of architectures.

For most uses of architectures, simple mappings that rely heavily on the automapping mechanism will be the order of the day and will meet most requirements. In particular, the use of what would otherwise be document-level DTDs as architectures can provide significant benefits for "DTD-less" documents at a minimum cost, letting document creators enable validation without burdening instances with otherwise unnecessary DTD declarations.

Finally, when thinking about the SGML architecture mechanism, keep two important things in mind. First, the DTD declaration part of architectures is only part of the whole picture. An architecture is always bigger than the SGML-defined declarations used for it, so you should expect to have (or provide) additional definitions and documentation for the architectures you are using, including prose descriptions as well as other formal specifications, such as object models or database schemas. No useful architecture can be completely defined by DTD declarations alone.

Second, remember that an architecture is, ultimately, a bag of rules that you can give a universally-unique name to. The public ID or URN you give to an architecture names the entire set of rules, however they might be defined, not just the SGML-syntax formalisms. This provides a way for documents to point unambiguously to the rules that govern them. This pointing to the rules makes it clear to both human observers and processing programs what the intended rules are without the need to pass that information "out of band". This alone can have as much benefit as facilities like architectural validation and name remapping provide.
