[Archive copy mirrored 971002 from the URL: http://home.sprynet.com/sprynet/dmeggins/xml-arch.html; see this canonical/updated version of the document.]

Proposal: Architectural Forms for XML

Author:
David Megginson <dmeggins@microstar.com>
Date:
XML Dev Day, 21 August 1997

Contents

  1. Advantages of Architectural Forms
  2. Examples of XML Architectural Forms
  3. Syntactic Overview

Advantages of Architectural Forms

Business Advantages

  1. Less disruption to corporate culture: you can design XML document types best suited to your needs, rather than forcing employees to adapt to an outside exchange standard.
  2. Reusability: a single document can contain many different types of information for many different purposes, all in the same place.
  3. Safer, more trustworthy data: you can (optionally) perform many different validations on the same document, and can exchange compatible validation requirements with other companies and organisations easily.

Technical Advantages

  1. Proven technology: the methodology (with slightly more complicated syntax) is already in use in full-SGML production systems.
  2. Simplicity: the mappings are simple and elegant, and in a DTD-driven environment, they can be hidden entirely from authors.
  3. Requires no syntactic extensions to XML: the architecture base declaration is a processing instruction, and all mappings are handled by attributes; as a result, you can do basic architectural processing with existing XML tools.
  4. Allows multiple inheritance: a document, and even a single element, can belong to more than one base architecture at the same time.
  5. Moots the name-space issue: documents can use any element type names they want, as long as they provide mappings to the base architecture(s).

Examples of XML Architectural Forms

This section develops the example of a simple order document type for online purchasing. In addition to its regular structure, the document uses two base architectures: invoice, containing invoicing information for the supplier, and part-list, containing part information for the customer.

DTD-Aware Example

In this example, the document will be processed by a DTD-aware XML parser (like NXP, MSXML, or SP). All of the architectural information is contained in the DTD itself, using #FIXED attribute declarations for the mappings:

Client DTD (for DTD-aware processing):

<?XML-ArcBase invoice ArcAuto="nArcAuto"?>
<?XML-ArcBase part-list ArcAuto="nArcAuto"?>

<!ELEMENT order (sender, recipient, item+, price)>

<!ELEMENT sender (#PCDATA)>
<!ATTLIST sender
  invoice    NMTOKEN #FIXED "customer">

<!ELEMENT recipient (#PCDATA)>
<!ATTLIST recipient
  invoice    NMTOKEN #FIXED "supplier"
  part-list  NMTOKEN #FIXED "source">

<!ELEMENT item (#PCDATA)>
<!ATTLIST item
  part-list  NMTOKEN #FIXED "part"
  quantity   NMTOKEN #REQUIRED
  partno     NMTOKEN #REQUIRED>

<!ELEMENT price (#PCDATA)>
<!ATTLIST price
  invoice    NMTOKEN #FIXED "billable">

The document itself shows no evidence that you are using architectural forms. The DTD silently ensures that every valid document also conforms to the two base architectures:

Client document (for DTD-aware processing):

<?XML version="1.0"?>
<!DOCTYPE order SYSTEM "http://www.acme.com/dtds/order.dtd">
<order>
<sender>Wile E. Coyote</sender>
<recipient>ACME Parts Inc.</recipient>
<item quantity="1" partno="516">Giant slingshot</item>
<item quantity="1" partno="18">Electro-magnet</item>
<item quantity="1" partno="774">Jet engine</item>
<price>USD1,789.57</price>
</order>

Well-Formed Example

For XML parsers that are capable only of well-formed parsing (like LARK), mappings must appear as attribute values in the document itself. I have disabled automatic mapping here so that every mapping is visible; where automatic mapping makes sense (see ArcAuto below), you will not need to specify all of the mappings explicitly:

Client document (for non-DTD processing):

<?XML version="1.0"?>
<?XML-ArcBase invoice ArcAuto="nArcAuto"?>
<?XML-ArcBase part-list ArcAuto="nArcAuto"?>
<order>
<sender invoice="customer">Wile E. Coyote</sender>
<recipient invoice="supplier" 
  part-list="source">ACME Parts Inc.</recipient>
<item part-list="part" quantity="1"
  partno="516">Giant slingshot</item>
<item part-list="part" quantity="1"
  partno="18">Electro-magnet</item>
<item part-list="part" quantity="1"
  partno="774">Jet engine</item>
<price invoice="billable">USD1,789.57</price>
</order>

Note that the root element always maps to the root element of the base architecture (in this case, invoice for the invoice architecture and part-list for the part-list architecture), so there is no need to provide an explicit mapping.

Architectural Documents

Normally, processing software will simply note the values of the architectural form attributes rather than actually constructing the architectural documents, but the following two figures show what the above mappings represent conceptually:

Architectural document for invoice base architecture:

<?XML version="1.0"?>
<invoice>
<customer>Wile E. Coyote</customer>
<supplier>ACME Parts Inc.</supplier>
<billable>USD1,789.57</billable>
</invoice>

Architectural document for part-list base architecture:

<?XML version="1.0"?>
<part-list>
<source>ACME Parts Inc.</source>
<part quantity="1" partno="516">Giant slingshot</part>
<part quantity="1" partno="18">Electro-magnet</part>
<part quantity="1" partno="774">Jet engine</part>
</part-list>
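
The derivation of these two conceptual documents can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal: the function names, the ARCHITECTURES tuple, and the rule that attributes named after a base architecture are dropped from the result are all hypothetical, automatic mapping is treated as disabled (nArcAuto), and the remaining support variables are ignored.

```python
import xml.etree.ElementTree as ET

# The well-formed client document from above, without the declarations
# (the base architecture names are passed in explicitly instead).
CLIENT_DOCUMENT = """\
<order>
<sender invoice="customer">Wile E. Coyote</sender>
<recipient invoice="supplier" part-list="source">ACME Parts Inc.</recipient>
<item part-list="part" quantity="1" partno="516">Giant slingshot</item>
<item part-list="part" quantity="1" partno="18">Electro-magnet</item>
<item part-list="part" quantity="1" partno="774">Jet engine</item>
<price invoice="billable">USD1,789.57</price>
</order>
"""

# Assumption: attributes named after a base architecture are architectural
# form attributes and are not copied into any architectural document.
ARCHITECTURES = ("invoice", "part-list")


def derive_architectural_document(client_root, architecture):
    """Build the conceptual architectural document for one base architecture."""
    # The root element always maps to the root of the base architecture,
    # which defaults to the architecture name (ArcDocF is not modelled).
    arch_root = ET.Element(architecture)
    _map_children(client_root, arch_root, architecture)
    return arch_root


def _map_children(client_parent, arch_parent, architecture):
    for child in client_parent:
        form = child.get(architecture)
        if form is None:
            # No mapping for this architecture: skip the element and its
            # data, but still look at its descendants (an assumption).
            _map_children(child, arch_parent, architecture)
            continue
        arch_child = ET.SubElement(arch_parent, form)
        arch_child.text = child.text
        for name, value in child.attrib.items():
            if name not in ARCHITECTURES:
                arch_child.set(name, value)
        _map_children(child, arch_child, architecture)


client_root = ET.fromstring(CLIENT_DOCUMENT)
invoice = derive_architectural_document(client_root, "invoice")
part_list = derive_architectural_document(client_root, "part-list")
```

Calling ET.tostring(invoice, encoding="unicode") serialises the result for inspection. As noted above, real processing software would normally just record the form of each element rather than build a second tree.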

Syntactic Overview

For every XML base architecture, there is a single architecture base declaration, possibly containing architecture support variables for configuration. Then, within the DTD or the document, you use architecture control attributes to perform the actual mappings.

Architecture Base Declaration

The architecture base declaration is simply a processing instruction beginning with the string XML-ArcBase followed by the name of the base architecture:

<?XML-ArcBase biblio?>

You use a separate declaration for each base architecture. If you wish to customise the architectural processing in some way, or to require the use of a meta-DTD for strong validation, you can specify architecture support variables after the base architecture name:

<?XML-ArcBase biblio 
  ArcURL="http://www.w3.org/dtds/biblio.dtd"?>
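
As a rough illustration of how processing software might read such a declaration, the following Python sketch splits the data of an XML-ArcBase processing instruction into the base architecture name and its support variables. The function name parse_arcbase is hypothetical, and the sketch assumes that support variables are always written as Name="value" with double quotes, as in the examples here.

```python
import re

# Matches one support variable of the form Name="value"
# (assumption: double-quoted values only).
_VARIABLE = re.compile(r'([A-Za-z][\w.-]*)\s*=\s*"([^"]*)"')


def parse_arcbase(pi_data):
    """Split the text that follows the XML-ArcBase target.

    Returns (base_architecture_name, {variable_name: value}).
    """
    parts = pi_data.strip().split(None, 1)
    if not parts:
        raise ValueError("missing base architecture name")
    name = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    variables = dict(_VARIABLE.findall(rest))
    return name, variables


name, variables = parse_arcbase(
    'biblio \n  ArcURL="http://www.w3.org/dtds/biblio.dtd"'
)
```

Most XML parsers hand the application the processing instruction target and the remaining data separately, so a helper like this would only need to deal with the data part.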

Architecture Control Attributes

There are four architecture control attributes that you can use to control mapping within the DTD or document. In many cases, you will need to be concerned only with the architectural form attribute, but the others are available for more complex applications.

Architectural form attribute
Map an element to an architectural form. By default, this attribute has the same name as the base architecture, but you can change it with the ArcFormA architecture support variable.
Architectural attribute renamer attribute
Map an element's attributes to attributes with different names in the base architecture. The value is a list of pairs of names: the first in each pair is the name of an attribute form in the base architecture, and the second is the name of an actual attribute in the client document. By default, there is no attribute renaming unless you provide an attribute name with the ArcNamrA architecture support variable.
Architectural suppressor attribute
Suppress or enable architectural processing for an element's children. This attribute may have one of the following values:
  sArcAll
    Suppress all architectural processing, even if a descendant attempts to re-enable it.
  sArcForm
    Suppress all architectural processing, unless a descendant explicitly re-enables it.
  sArcNone
    Re-enable architectural processing if possible.
You may suppress architectural processing only if you specify an attribute name with the ArcSuprA architecture support variable.
Architecture ignore data attribute
Suppress or enable data recognition within an element and its descendants. This attribute may have one of the following values:
  nArcIgnD
    Never ignore data. When validating against the meta-DTD, report an error if data appears where it is not allowed.
  cArcIgnD
    Conditionally ignore data. When using a meta-DTD, recognise data only where the DTD allows it; when not using a meta-DTD, ignore data only within non-architectural elements.
  ArcIgnD
    Always ignore data.
You may suppress or enable data recognition only if you specify an attribute name with the ArcIgnDA architecture support variable.
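
The pair-list convention of the renamer attribute is easy to get backwards, so here is a small Python sketch of it. It assumes that the names in the value are separated by white space and that attributes not listed keep their own names; the function name and the architectural attribute names count and id are made up for the illustration.

```python
def rename_attributes(renamer_value, client_attributes):
    """Apply an architectural attribute renamer value to an element's attributes.

    renamer_value holds pairs of names: first the attribute name in the
    base architecture, then the actual attribute name in the client document.
    """
    names = renamer_value.split()
    if len(names) % 2:
        raise ValueError("renamer value must contain pairs of names")
    # Index the pairs by the client-side name for lookup.
    client_to_arch = {names[i + 1]: names[i] for i in range(0, len(names), 2)}
    # Attributes without a pair keep their names (an assumption).
    return {
        client_to_arch.get(name, name): value
        for name, value in client_attributes.items()
    }


# Hypothetical architecture in which 'quantity' is called 'count'
# and 'partno' is called 'id'.
renamed = rename_attributes(
    "count quantity id partno",
    {"quantity": "1", "partno": "516"},
)
```

The sketch does not remove the renamer attribute or the architectural form attribute themselves; a fuller implementation would presumably filter those out first.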

Architecture Support Variables

Within the architecture base declaration, you may specify any of the following architecture support variables (the first four configure the architecture control attributes):

ArcFormA
Specify the name of the architectural form attribute (defaults to the name of the base architecture).
ArcNamrA
Specify the name of the architectural attribute renamer attribute (none by default).
ArcSuprA
Specify the name of the architectural suppressor attribute (none by default).
ArcIgnDA
Specify the name of the architecture ignore data attribute (none by default).
ArcDocF
Specify the name of the root element in the base architecture (defaults to the name of the base architecture).
ArcURL
Specify a URL for the base architecture's meta-DTD, and require the processing software to parse the meta-DTD to obtain defaulted attribute values and other information (see ArcDTD as well).
ArcDTD
Specify the name of an entity containing the base architecture's meta-DTD, and require the processing software to parse the meta-DTD to obtain defaulted attribute values and other information (if both ArcURL and ArcDTD are specified, ArcURL takes precedence; if neither is specified, the processing software will not use a meta-DTD).
ArcDataF
Specify the name of a default notation to use for data entities whose notations do not appear in the base architecture (by default, those entities will be ignored if the processing software is reading the meta-DTD).
ArcBridF
Specify an architectural form to use for mapping elements with IDs, when those elements would otherwise be ignored (by default, those elements will not be mapped).
ArcAuto
If the value is ArcAuto, automatically map all elements to architectural forms with the same name, unless they have explicit mappings of their own. If the processing software is using the meta-DTD, map only elements with names that appear in the meta-DTD. If the value is nArcAuto, perform no automatic mapping except for the root element (the default value is ArcAuto).
ArcOptSA
Specify the names of additional, architecture-specific support variables that can be included in the architecture base declaration (by default, allow the ArcOpt variable given below).
ArcOpt
Specify a list of parameter entities that should be assigned the value INCLUDE in the meta-DTD (useful only together with ArcURL or ArcDTD).
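
The defaults listed above can be summarised in a short Python sketch. It covers only the variables whose defaults are stated explicitly, plus the precedence of ArcURL over ArcDTD; the function names and the use of None for "no attribute name given" are assumptions made for the illustration.

```python
def resolve_support_variables(architecture, declared):
    """Fill in default values for the support variables of one base architecture.

    declared maps the variable names given in the architecture base
    declaration to their values.
    """
    resolved = {
        "ArcFormA": architecture,  # form attribute named after the architecture
        "ArcNamrA": None,          # no attribute renaming
        "ArcSuprA": None,          # no suppression of processing
        "ArcIgnDA": None,          # no control over data recognition
        "ArcDocF": architecture,   # root element named after the architecture
        "ArcAuto": "ArcAuto",      # automatic mapping enabled
    }
    resolved.update(declared)
    if resolved["ArcAuto"] not in ("ArcAuto", "nArcAuto"):
        raise ValueError("ArcAuto must be ArcAuto or nArcAuto")
    return resolved


def meta_dtd_source(declared):
    """Return where the meta-DTD comes from, or None if none is used."""
    if "ArcURL" in declared:       # ArcURL takes precedence over ArcDTD
        return ("url", declared["ArcURL"])
    if "ArcDTD" in declared:
        return ("entity", declared["ArcDTD"])
    return None


settings = resolve_support_variables("invoice", {"ArcAuto": "nArcAuto"})
```

For the order example, settings["ArcFormA"] is "invoice", which is why the mappings there are written as invoice="customer" and so on.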
