http://www.personal.u-net.com/~sgml/af.htm; please refer to the canonical source document if possible.]

The Role of Architectural Forms in XML/EDI Applications

Martin Bryan, The SGML Centre

This paper suggests how some of the Architectural Form Definition Requirements expressed in Annex A: SGML Extended Facilities of ISO/IEC 10744:1997 could be used to simplify the creation and management of XML document type definitions (DTDs) that are designed to be used for business-to-business electronic data interchange (EDI).

Background to architectural forms

Architectural forms (AFs) were introduced to SGML in 1997 to allow the development of meta-DTDs that could be used to create classes of elements which could share common processing characteristics. There are four basic types of AFs:

Element type architectural forms, which define a meta-model for a particular class of elements.
Attribute type architectural forms, which can be used to assign process control attributes to specific types of elements.
Notation type architectural forms, used to identify notation processors that understand how to handle specific meta-data classes.
Data attribute type architectural forms, which can be used to assign processing control attributes to notation processors.

Note: As data attributes were not included in the set of SGML features selected for the Extensible Markup Language (XML) it is not possible to use all the features of SGML architectural forms within an XML application.

The formal definition of SGML Architectural Forms (AFs) is based on the use of an identifying processing instruction at the start of the document prolog and the use of notation declarations that identify each architecture. Architecture support declarations are based on the use of data attributes associated with individual notations, but as noted above this functionality cannot be used in an XML environment.

The processing instruction that identifies the use of SGML AFs has the form:

  <?IS10744 ArcBase AF1 AF2 AF3>

Notations designed to be used to manage architectures can be identified by the presence of the keyword AFDR in the public identifier of the notation. For example:

  <!NOTATION AF1 PUBLIC "-//www.myco.org//NOTATION AFDR New AF Processor//EN">
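
As a minimal sketch of how a tool might recognise these two declarations, the following Python fragment extracts the architecture names from the text of the processing instruction and tests a notation's public identifier for the AFDR keyword. The function names are illustrative assumptions and are not defined by ISO/IEC 10744; the string handling is deliberately simplistic and assumes a well-formed formal public identifier.

```python
def architecture_names(pi_text):
    """Return the architecture names listed in an ArcBase processing instruction.

    pi_text is the text between '<?' and '>', i.e. an identifying token,
    the keyword ArcBase, and then one or more architecture names.
    """
    tokens = pi_text.split()
    # Expect: <standard identifier> ArcBase <name> [<name> ...]
    if len(tokens) < 3 or tokens[1] != "ArcBase":
        return []
    return tokens[2:]


def is_afdr_notation(public_id):
    """True if the public text class is NOTATION and the description starts with AFDR."""
    # A formal public identifier has the form: owner//class description//language
    parts = public_id.split("//")
    if len(parts) < 3:
        return False
    words = parts[-2].split()  # e.g. ['NOTATION', 'AFDR', 'New', 'AF', 'Processor']
    return len(words) >= 2 and words[0] == "NOTATION" and words[1] == "AFDR"
```

A tool could call architecture_names() on each processing instruction found in the prolog and then look for a notation declaration, checked with is_afdr_notation(), for every name returned.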

Note: The full AFDR specification provides a set of data attributes that can be used to manage many aspects of AF processing, including methods for avoiding name clashes between architectures. As these facilities cannot be used in an XML environment they are not described here.

The role of namespaces to identify XML-based architectural forms

The XML Namespaces extension to XML provides XML-based implementations with a simpler mechanism for identifying the use of architectural forms, one that was not available to the designers of the SGML AFDR extensions.

Note: As a large number of applications of XML are now based on the use of namespaces it is expected that most XML tools will support namespaces.

An XML namespace declaration uses a reserved attribute to associate a name qualifier (prefix) with a particular Internet URL. This URL identifies where the user should go to find rules for processing information identified as belonging to the namespace. For example, to associate the namespace prefix MTB with this document you would add the following attribute definition to DTDs based on rules specified in this document:

  <!ATTLIST doctype-x xmlns:MTB CDATA #FIXED "http://www.sgml.u-net.com/af.htm">

Namespaces can be used to qualify both element and attribute names. They can, therefore, be used to create sets of XML elements and attributes that correspond to a particular notation, without requiring a separate notation declaration for the AF, or the use of a special processing instruction that would only have meaning to XML tools that have been extended to use the SGML Extended Facilities options.
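
The effect of qualifying names in this way can be sketched with Python's standard ElementTree parser, which reports each qualified name together with the URL bound to its qualifier. The instance below is hypothetical: the attribute MTB:Form and the element MTB:item are invented for illustration, and the namespace declaration is written inline rather than supplied through a #FIXED attribute in a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance; MTB:Form and MTB:item are illustrative names only.
DOC = """<doctype-x xmlns:MTB="http://www.sgml.u-net.com/af.htm"
            MTB:Form="example" local="unqualified">
  <MTB:item/>
</doctype-x>"""

MTB = "http://www.sgml.u-net.com/af.htm"

root = ET.fromstring(DOC)

# ElementTree expands each qualified name to '{namespace-URL}local-name',
# so membership of the architecture can be tested on the URL alone.
marker = "{" + MTB + "}"
qualified_attrs = {k: v for k, v in root.attrib.items() if k.startswith(marker)}
qualified_children = [child.tag for child in root if child.tag.startswith(marker)]

print(qualified_attrs)
print(qualified_children)
```

Because the test is made on the URL rather than on the qualifier, a document that binds a different qualifier to the same URL would be treated identically.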

The role of attribute type architectural forms

By assigning a namespace qualifier to one or more of the attributes in an XML DTD you can indicate that these attributes are to be processed applying rules specified in a separate document. These rules can be used by multiple DTDs, and can be associated with specific processes that can be accessed through the xsl:import option of the Extensible Stylesheet Language (XSL).

The following example shows how the rules specified in the ISO Basic Semantics Register can be associated with an XML message that contains a SIMPL-EDI order:

  <!ELEMENT Order (MessageID, Date+, RefersTo*,
                           Buyer, Supplier, OtherParty+, Item+) >
  <!ATTLIST Order
            xmlns:EDIFACT        CDATA   #FIXED   "http://www.unece.org/trade/untdid/"
            xmlns:BSR            CDATA   #FIXED   "http://www.iso.ch/BSR/"
            EDIFACT:Name         CDATA   #FIXED   "UNH"
            BSR:Name             CDATA   #FIXED   "DocumentType Concept=828"
            BSR:Attributes       CDATA   #FIXED   "SequenceNo  Document.Identifier BSU2140
                                                   UN-ID       DocumentType.Version BSU1234"
            SequenceNo           CDATA   #IMPLIED
            UN-ID                CDATA   #FIXED   "ORDERS:D:96A:UN:SIMP01">

As this Order is the outermost element of this DTD two namespaces are defined at the start of the attribute list. The first indicates that elements or attributes whose names are qualified by EDIFACT: are to follow the rules laid down in the UN Trade Data Interchange Directory. The second indicates that elements and attributes whose names are qualified by BSR: conform to the rules specified in the ISO Basic Semantics Register (BSR).

The referenced ISO BSR document will have defined two attributes that make up the BSR architectural form:

The Name attribute is used to identify the name assigned within the register to the ISO BSR equivalent of this element. In this example that name is qualified by a name/property pair which indicates that the element, while conforming to the general class of a Document Type, also conforms to the subclass whose concept identifier is 828, which is the BSR identifier for a purchase order.
The Attributes attribute identifies which of the unqualified attributes for the element can be mapped to equivalents in the BSR. The rules for defining this attribute require that each such attribute be identified using a triplet that defines:
- the local name of the attribute
- the name of the equivalent BSR construct
- the unique identifier assigned to that construct in the ISO BSR database.
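
A minimal sketch of how an application might decode these two attribute values is given below. It assumes that the tokens are separated by white space and that none of the names themselves contain white space; the class and function names are illustrative and are not defined by the ISO BSR.

```python
from typing import NamedTuple


class BsrMapping(NamedTuple):
    local_name: str  # attribute name used in the local DTD
    bsr_name: str    # name of the equivalent BSR construct
    bsr_id: str      # identifier assigned to that construct in the BSR database


def parse_bsr_attributes(value):
    """Split a BSR:Attributes value into (local name, BSR name, BSR id) triplets."""
    tokens = value.split()
    if len(tokens) % 3 != 0:
        raise ValueError("BSR:Attributes must contain whole triplets")
    return [BsrMapping(*tokens[i:i + 3]) for i in range(0, len(tokens), 3)]


def parse_bsr_name(value):
    """Split a BSR:Name value into its class name and its name/property pairs."""
    class_name, *pairs = value.split()
    properties = dict(pair.split("=", 1) for pair in pairs)
    return class_name, properties


mappings = parse_bsr_attributes(
    "SequenceNo  Document.Identifier BSU2140 "
    "UN-ID       DocumentType.Version BSU1234"
)
for m in mappings:
    print(m.local_name, "->", m.bsr_name, m.bsr_id)

print(parse_bsr_name("DocumentType Concept=828"))
```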

Using these two attributes application designers can develop tools that can, for example:

ask for explanatory information on the role of these attributes from the ISO BSR
validate that the contents of the element/attribute conform to any datatype restrictions specified in the ISO BSR
load elements of the particular type into an appropriate database in a format that will allow them to be shared by all applications that use this type of ISO-defined data.

Note that the role of these two architectural form attributes is to decouple the laconic local naming conventions used within this targeted DTD from the more verbose standardized naming conventions that need to be used to uniquely identify the constructs in the wider ranging general-purpose register.

XML allows more than one attribute list declaration to be associated with each element. This facility can be used to manage the association of attribute type architectural form definitions with XML elements. The following example, taken from the CEN TC251 ProvideEhcr healthcare informatics DTD, illustrates this point:

  <!ELEMENT RelatedDate  (#PCDATA) >
  <!ATTLIST RelatedDate
            Cen251:Type CDATA                   #FIXED "TOCD"
            Accuracy    (ACCURATE|APPROXIMATE)  #IMPLIED
            Role        (*ROLES)                #REQUIRED    >
  <!ATTLIST RelatedDate
            Cen251:Noi  CDATA                   #FIXED "1478:related date" >
  <!--A date and time, other than the originating date and time, which
      is related to an EHCR message component. The relevance of the date is
      specified by the attribute related date role.-->

In this example there are two attributes qualified by the Cen251: namespace identifier. The first one, defining the type of the date as the type defined as TOCD in the specification identified by the URL assigned to the namespace, forms part of the main attribute definition list. The second one, identifying the number and name assigned to this object in the CEN TC251 data model, is defined as a separate attribute list declaration. The advantage of this is that the set of mappings can then be maintained in a separate file, which need only be invoked for those applications that need it. Typically this would be done through a reference to an external parameter entity, which could take the following form:

  <!DOCTYPE ProvideEhcr PUBLIC "+//CEN TC251//DTD Provide Healthcare Version 10//EN"
   [<!ENTITY % Help-links SYSTEM "Cen251-names.ent">
    %Help-links;
   ]>
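
The way a parser merges several attribute list declarations for one element can be sketched with the non-validating expat parser supplied with Python. The declarations below are a simplified, hypothetical version of the RelatedDate example, placed in the internal subset so that no external entity needs to be fetched. Namespace processing is left switched off because this fragment does not declare the Cen251 qualifier.

```python
import xml.parsers.expat

# Simplified, hypothetical version of the RelatedDate declarations.
DOC = """<?xml version="1.0"?>
<!DOCTYPE RelatedDate [
  <!ELEMENT RelatedDate (#PCDATA)>
  <!ATTLIST RelatedDate
            Cen251:Type CDATA #FIXED "TOCD"
            Accuracy    (ACCURATE|APPROXIMATE) #IMPLIED>
  <!ATTLIST RelatedDate
            Cen251:Noi  CDATA #FIXED "1478:related date">
]>
<RelatedDate Accuracy="APPROXIMATE">1999-05-06</RelatedDate>"""

seen = {}


def start_element(name, attrs):
    # attrs holds the attributes specified in the instance together with
    # the default values taken from every ATTLIST declared for the element.
    seen[name] = dict(attrs)


parser = xml.parsers.expat.ParserCreate()  # no namespace processing
parser.StartElementHandler = start_element
parser.Parse(DOC, True)

print(seen["RelatedDate"])
```

The application receives the values from both declarations even though the instance specifies only Accuracy, and even though the parser performs no validation.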

The role of element type architectural forms

In an XML environment, namespaces are used to identify the XML equivalent of SGML element type architectural forms. By associating a namespace identifier with an element type name a DTD can be linked to a meta-DTD or to an XML schema. While XML applications will have no built-in processes for validating a namespace against a meta-DTD definition that conforms to the rules in the SGML Extended Facilities annex, the fact that the element name has been so qualified can be used to identify shared processing rules for elements.

Typically the presence of a namespace qualified element within a DTD will indicate the need to import one or more sharable processing modules into an XSL specification or to invoke one or more processes within the application. For example, if an XML document includes a diagram encoded using the XML-based W3C Scalable Vector Graphics (SVG) specification, the SVG namespace could be declared in the root element and then used at appropriate points, as shown in the following simplified example:

  <?xml version="1.0"?>
  <AccidentReport
     xmlns="http://www.insurance.com/Accidents/Reports"
     xmlns:svg="http://www.w3.org/Graphics/SVG/1.0">
  <!-- Elements in the parent namespace go here -->
  <svg:svg width="40%" height="40%">
     <svg:rect width="43.6" height="31.5"/>
     <!-- Other elements of the SVG graphic go here -->
   </svg:svg>
   <!-- Rest of document in parent namespace goes here -->
   </AccidentReport>

Whilst not strictly conformant with the SGML rules, as namespaces are not linked directly to notation processors in the way specified in the Architectural Form Definition Requirements, this mechanism does, nevertheless, conform with the spirit of the AFDR specification in that it clearly associates specific types of elements with specific processes. This is the underlying philosophy behind all SGML architectural forms.
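
One way an application might exploit this association is to route each element to a processing module selected by its namespace. The following Python sketch assumes a cut-down version of the accident report; the Summary element and the two handler functions are hypothetical stand-ins for real processing modules.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/Graphics/SVG/1.0"
REPORT_NS = "http://www.insurance.com/Accidents/Reports"

# Cut-down instance; Summary is a hypothetical element in the parent namespace.
DOC = f"""<AccidentReport xmlns="{REPORT_NS}" xmlns:svg="{SVG_NS}">
  <Summary>Rear collision</Summary>
  <svg:svg width="40%" height="40%"/>
</AccidentReport>"""


def split_name(tag):
    """Split ElementTree's '{url}local' form into (url, local)."""
    if tag.startswith("{"):
        url, local = tag[1:].split("}", 1)
        return url, local
    return "", tag


# Hypothetical processing modules, one per namespace.
def render_graphic(elem):
    return "graphic:" + split_name(elem.tag)[1]


def file_report(elem):
    return "report:" + split_name(elem.tag)[1]


HANDLERS = {SVG_NS: render_graphic, REPORT_NS: file_report}


def process(root):
    """Pass every element, in document order, to the handler for its namespace."""
    results = []
    for elem in root.iter():
        url, _ = split_name(elem.tag)
        handler = HANDLERS.get(url)
        if handler is not None:
            results.append(handler(elem))
    return results


print(process(ET.fromstring(DOC)))
```

Elements whose namespace has no registered handler are simply skipped, which mirrors the way an application can ignore architectures it does not support.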

Relationship between architectural forms and XML schemas

The first drafts of XML Schemas: Structure and XML Schemas: Datatypes, issued in May 1999, suggest alternative mechanisms for a) defining SGML data models and b) associating data types with elements and attributes. The techniques proposed in these drafts, however, result in a much more verbose specification format than that proposed in this paper. For example, the Order element defined above would have the following format (in its minimalist form) when defined in an XML Schema:

  <import schemaAbbrev='BSR'
          schemaName='http://www.iso.ch/BSR/schema1.xsd'
          datatypes='true' />
  <elementType name='Order'>
   <sequence>
    <elementTypeRef name='MessageID'/>
    <elementTypeRef name='Date' minOccur='1' maxOccur='3'/>
    <elementTypeRef name='RefersTo' minOccur='0' maxOccur='10'/>
    <elementTypeRef name='Buyer'/>
    <elementTypeRef name='Supplier'/>
    <elementTypeRef name='OtherParty' minOccur='1' maxOccur='97'/>
    <elementTypeRef name='Item' minOccur='1' maxOccur='200000'/>
   </sequence>
   <attrDecl name='Name' schemaName='http://www.unece.org/trade/untdid/schema2.xsd'>
    <fixed>UNH</fixed>
   </attrDecl>
   <attrDecl name='Name' schemaName='http://www.iso.ch/BSR/schema3.xsd'>
    <datatypeRef name='ObjectID' schemaAbbrev='BSR'/>
    <fixed>DocumentType Concept=828</fixed>
   </attrDecl>
   <attrDecl name='Attributes' schemaName='http://www.iso.ch/BSR/schema3.xsd'>
    <datatypeRef name='AttributeTypes' schemaAbbrev='BSR'/>
    <fixed>SequenceNo Document.Identifier BSU2140
           UN-ID DocumentType.Version BSU1234</fixed>
   </attrDecl>
   <attrDecl name='SequenceNo'>
    <datatypeRef name='integer'/>
   </attrDecl>
   <attrDecl name='UN-ID'>
    <fixed>ORDERS:D:96A:UN:SIMP01</fixed>
   </attrDecl>
  </elementType>

Whilst the new specifications will doubtless form the basis for a new series of datatype-aware XML tools, their adoption in the near term is problematic. It will be some time before schema-aware XML parsers become available. Even when they are available, not all XML parsers will be schema-aware. While schema-aware parsers are also likely to be DTD-aware validating XML parsers, not all DTD-aware validating parsers will also be schema-aware.

One of the key advantages of using AFs rather than schemas is that the techniques proposed in this paper can be used by any DTD-aware XML parser, whether or not it is a fully validating parser. For example, the parser used in Internet Explorer 5.0 is DTD-aware, but is not a validating parser. A secondary advantage is that XSLT is based on the data passed to the application by a DTD-aware parser: it currently has no facilities for using information stored in a schema to manage the transformation. The AF approach ensures that the required information is available with the specifications as they stand today, rather than waiting for the next update of the relevant standards, and the harmonization of the many specifications involved, which is likely to take us into the next century.