This paper suggests how some of the Architectural Form Definition Requirements expressed in Annex A: SGML Extended Facilities of ISO/IEC 10744:1997 could be used to simplify the creation and management of XML document type definitions (DTDs) that are designed to be used for business-to-business electronic data interchange (EDI).
Architectural forms (AFs) were introduced to SGML in 1997 to allow the development of meta-DTDs that could be used to create classes of elements which could share common processing characteristics. There are four basic types of AFs:
Note: As data attributes were not included in the set of SGML features selected for the Extensible Markup Language (XML) it is not possible to use all the features of SGML architectural forms within an XML application.
The formal definition of SGML Architectural Forms (AFs) is based on the use of an identifying processing instruction at the start of the document prolog and the use of notation declarations that identify each architecture. Architecture support declarations are based on the use of data attributes associated with individual notations, but as noted above this functionality cannot be used in an XML environment.
The processing instruction that identifies the use of SGML AFs has the form:
<?IS10744 ArcBase AF1 AF2 AF3>
Notations designed to be used to manage architectures can be identified by the presence of the keyword AFDR in the public identifier of the notation. For example:
<!NOTATION AF1 PUBLIC "-//www.myco.org//NOTATION AFDR New AF Processor//EN">
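As an illustration of the keyword test described above, the following minimal Python sketch scans DTD text for notation declarations and reports those whose public identifier carries the AFDR keyword. The function names, the regular expression and the second (GIF) notation are assumptions made for this example only; they are not part of ISO/IEC 10744.

```python
import re
from typing import List

# Matches declarations such as:
#   <!NOTATION AF1 PUBLIC "-//owner//NOTATION AFDR description//EN">
NOTATION_DECL = re.compile(r'<!NOTATION\s+(\S+)\s+PUBLIC\s+"([^"]*)"')


def is_afdr_public_id(public_id: str) -> bool:
    """True if the public text description contains the keyword AFDR."""
    fields = public_id.split("//")
    if len(fields) < 3:
        return False
    words = fields[2].split()  # e.g. ["NOTATION", "AFDR", "New", ...]
    return bool(words) and words[0] == "NOTATION" and "AFDR" in words[1:]


def afdr_notation_names(dtd_text: str) -> List[str]:
    """Names of notations whose public identifier marks an architecture."""
    return [name for name, public_id in NOTATION_DECL.findall(dtd_text)
            if is_afdr_public_id(public_id)]


# The second declaration is a hypothetical non-AF notation for contrast.
DTD_TEXT = '''
<!NOTATION AF1 PUBLIC "-//www.myco.org//NOTATION AFDR New AF Processor//EN">
<!NOTATION GIF PUBLIC "-//www.myco.org//NOTATION Hypothetical Image Format//EN">
'''

print(afdr_notation_names(DTD_TEXT))  # prints ['AF1']
```

A regular expression is sufficient here because only the declared name and the public identifier are needed; a full DTD parser would be required to handle parameter entities or system identifiers.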
Note: The full AFDR specification provides a set of data attributes that can be used to manage many aspects of AF processing, including methods for avoiding name clashes between architectures. As these facilities cannot be used in an XML environment they are not described here.
The XML Namespaces extension to XML provides XML-based implementations with a simpler mechanism for identifying the use of architectural forms, one that was not available to the designers of the SGML AFDR extensions.
Note: As a large number of applications of XML are now based on the use of namespaces it is expected that most XML tools will support namespaces.
An XML namespace declaration uses a reserved attribute to associate a name qualifier with a particular Internet URL. This URL identifies where the user should go to find rules for processing information identified as belonging to the namespace. For example, to associate the namespace MTB with this document you would add the following attribute definition to DTDs based on rules specified in this document:
<!ATTLIST doctype-x xmlns:MTB CDATA #FIXED "http://www.sgml.u-net.com/af.htm">
Namespaces can be used to qualify both element and attribute names. They can, therefore, be used to create sets of XML elements and attributes that correspond to a particular notation, without requiring a separate notation declaration for the AF, or the use of a special processing instruction that would only have meaning to XML tools that have been extended to use the SGML Extended Facilities options.
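To show what a namespace-aware tool sees when names are qualified in this way, the sketch below uses Python's standard xml.etree.ElementTree module, which reports each qualified name as "{URL}local-name". The MTB:Form and id attributes, and the helper function name, are hypothetical and serve only to give the parser something to report.

```python
import xml.etree.ElementTree as ET

MTB_URI = "http://www.sgml.u-net.com/af.htm"

# MTB:Form and id are hypothetical attributes used only for illustration.
SAMPLE = (
    '<doctype-x xmlns:MTB="http://www.sgml.u-net.com/af.htm" '
    'MTB:Form="Example" id="d1"/>'
)


def attributes_in_namespace(element, uri):
    """Return {local name: value} for attributes qualified by the namespace."""
    prefix = "{" + uri + "}"
    return {name[len(prefix):]: value
            for name, value in element.attrib.items()
            if name.startswith(prefix)}


root = ET.fromstring(SAMPLE)
mtb_attributes = attributes_in_namespace(root, MTB_URI)
print(mtb_attributes)  # prints {'Form': 'Example'}
```

Because the parser resolves the qualifier to the URL, a tool can select the architecture's attributes without knowing which prefix a particular DTD chose.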
By assigning a namespace qualifier to one or more of the attributes in an XML DTD you can indicate that these attributes are to be processed applying rules specified in a separate document. These rules can be used by multiple DTDs, and can be associated with specific processes that can be accessed through the xsl:import option of the Extensible Stylesheet Language (XSL).
The following example shows how the rules specified in the ISO Basic Semantics Register can be associated with an XML message that contains a SIMPL-EDI order:
<!ELEMENT Order (MessageID, Date+, RefersTo*, Buyer, Supplier, OtherParty+, Item+) >
<!ATTLIST Order
  xmlns:EDIFACT  CDATA #FIXED "http://www.unece.org/trade/untdid/"
  xmlns:BSR      CDATA #FIXED "http://www.iso.ch/BSR/"
  EDIFACT:Name   CDATA #FIXED "UNH"
  BSR:Name       CDATA #FIXED "DocumentType Concept=828"
  BSR:Attributes CDATA #FIXED "SequenceNo Document.Identifier BSU2140 UN-ID DocumentType.Version BSU1234"
  SequenceNo     CDATA #IMPLIED
  UN-ID          CDATA #FIXED "ORDERS:D:96A:UN:SIMP01">
As Order is the outermost element of this DTD, two namespaces are defined at the start of the attribute list. The first indicates that elements or attributes whose names are qualified by EDIFACT: are to follow the rules laid down in the UN Trade Data Interchange Directory. The second indicates that elements and attributes whose names are qualified by BSR: conform to the rules specified in the ISO Basic Semantics Register (BSR).
The referenced ISO BSR document will have defined two attributes that make up the BSR architectural form:
The Name attribute is used to identify the name assigned within the register to the ISO BSR equivalent of this element. In this example that name is qualified by a name/property pair which indicates that the element, while conforming to the general class of a Document Type, also conforms to the subclass whose concept identifier is 828, which is the BSR identifier for a purchase order.
The Attributes attribute identifies which of the unqualified attributes for the element can be mapped to equivalents in the BSR. The rules for defining this attribute require that each such attribute be identified using a triplet that defines:
Using these two attributes application designers can develop tools that can, for example:
Note that the role of these two architectural form attributes is to decouple the laconic local naming conventions used within this targeted DTD from the more verbose standardized naming conventions that need to be used to uniquely identify the constructs in the wider ranging general-purpose register.
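The decoupling of local names from register names can be sketched in a few lines of Python. The sketch assumes, from the Order example alone, that each triplet in the BSR:Attributes value consists of the local attribute name, the BSR name and the BSR identifier, in that order; this ordering is an inference made for illustration, not a statement of the BSR rules, and the class and function names are hypothetical.

```python
from typing import Dict, NamedTuple


class BsrMapping(NamedTuple):
    local_name: str   # attribute name used in the DTD (assumed field order)
    bsr_name: str     # standardized name in the register (assumed)
    bsr_id: str       # register identifier (assumed)


def parse_bsr_attributes(value: str) -> Dict[str, BsrMapping]:
    """Split a whitespace-separated list of triplets, keyed by local name."""
    tokens = value.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected a whole number of triplets")
    return {tokens[i]: BsrMapping(*tokens[i:i + 3])
            for i in range(0, len(tokens), 3)}


def to_bsr_names(attributes: Dict[str, str],
                 mappings: Dict[str, BsrMapping]) -> Dict[str, str]:
    """Rename mapped local attributes to BSR names; leave others unchanged."""
    return {mappings[name].bsr_name if name in mappings else name: value
            for name, value in attributes.items()}


mappings = parse_bsr_attributes(
    "SequenceNo Document.Identifier BSU2140 "
    "UN-ID DocumentType.Version BSU1234")
renamed = to_bsr_names(
    {"SequenceNo": "17", "UN-ID": "ORDERS:D:96A:UN:SIMP01"}, mappings)
print(renamed)
```

Keeping the mapping in a single attribute value means that a tool needs no knowledge of the local DTD beyond the fixed value supplied by the parser.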
XML allows more than one attribute list declaration to be associated with each element. This facility can be used to manage the association of attribute-type architectural form definitions with XML elements. The following example, taken from the CEN TC251 ProvideEhcr healthcare informatics DTD, illustrates this point:
<!ELEMENT RelatedDate (#PCDATA) >
<!ATTLIST RelatedDate
  Cen251:Type CDATA #FIXED "TOCD"
  Accuracy (ACCURATE|APPROXIMATE) #IMPLIED
  Role (*ROLES) #REQUIRED >
<!ATTLIST RelatedDate
  Cen251:Noi CDATA #FIXED "1478:related date" >
<!--A date and time, other than the originating date and time, which is related to an EHCR message component. The relevance of the date is specified by the attribute related date role.-->
In this example there are two attributes qualified by the Cen251: namespace identifier. The first, which defines the type of the date as the type identified as TOCD in the specification referenced by the URL assigned to the namespace, forms part of the main attribute definition list. The second, which identifies the number and name assigned to this object in the CEN TC251 data model, is defined in a separate attribute list declaration. The advantage of this is that the set of mappings can then be maintained in a separate file, which need only be invoked for those applications that need it. Typically this would be done through a reference to an external parameter entity, which could take the following form:
<!DOCTYPE ProvideEhcr PUBLIC "+//CEN TC251//DTD Provide Healthcare Version 10//EN" [
  <!ENTITY % Help-links SYSTEM "Cen251-names.ent">
  %Help-links;
]>
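The effect of splitting the attribute definitions across two declarations can be observed with a non-validating, DTD-aware parser. The following sketch uses Python's xml.etree.ElementTree, which is built on expat; it reads the internal DTD subset and supplies declared default and #FIXED values, but it does not fetch external entities. For that reason both attribute lists are placed in the internal subset here rather than in Cen251-names.ent. The namespace URL is a placeholder (the paper does not give the Cen251 URL), the Role attribute is omitted, and the date value is invented for the example.

```python
import xml.etree.ElementTree as ET

# Placeholder URL: the real Cen251 namespace URL is not given in the paper.
CEN251_URI = "http://example.org/cen251"

DOCUMENT = """<!DOCTYPE RelatedDate [
<!ELEMENT RelatedDate (#PCDATA)>
<!ATTLIST RelatedDate
    xmlns:Cen251 CDATA #FIXED "http://example.org/cen251"
    Cen251:Type  CDATA #FIXED "TOCD"
    Accuracy     (ACCURATE|APPROXIMATE) #IMPLIED>
<!ATTLIST RelatedDate
    Cen251:Noi   CDATA #FIXED "1478:related date">
]>
<RelatedDate Accuracy="APPROXIMATE">1999-05-06</RelatedDate>"""


def qualified(local_name):
    """Build the '{URL}local' form that ElementTree uses for qualified names."""
    return "{" + CEN251_URI + "}" + local_name


related_date = ET.fromstring(DOCUMENT)

# Attributes from both declarations arrive together on the element,
# although the instance itself only specifies Accuracy.
date_type = related_date.get(qualified("Type"))
noi = related_date.get(qualified("Noi"))
accuracy = related_date.get("Accuracy")
print(date_type, noi, accuracy)
```

An application that does not need the data model mappings could simply leave out the second declaration; the element would then carry only Cen251:Type and the attributes specified in the instance.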
In an XML environment, namespaces are used to identify the XML equivalent of SGML element type architectural forms. By associating a namespace identifier with an element type name a DTD can be linked to a meta-DTD or to an XML schema. While XML applications will have no built-in processes for validating a namespace against a meta-DTD definition that conforms to the rules in the SGML Extended Facilities annex, the fact that the element name has been so qualified can be used to identify shared processing rules for elements.
Typically the presence of a namespace qualified element within a DTD will indicate the need to import one or more sharable processing modules into an XSL specification or to invoke one or more processes within the application. For example, if an XML document includes a diagram encoded using the XML-based W3C Scalable Vector Graphics (SVG) specification, the SVG namespace could be declared in the root element and then used at appropriate points, as shown in the following simplified example:
<?xml version="1.0"?>
<AccidentReport xmlns="http://www.insurance.com/Accidents/Reports"
                xmlns:svg="http://www.w3.org/Graphics/SVG/1.0">
  <!-- Elements in the parent namespace go here -->
  <svg:svg width="40%" height="40%">
    <svg:rectangle width="43.6" height="31.5"/>
    <!-- Other elements of the SVG graphic go here -->
  </svg:svg>
  <!-- Rest of document in parent namespace goes here -->
</AccidentReport>
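One way an application might decide which processing module applies to each element is to group element names by the namespace that qualifies them. The sketch below does this with Python's xml.etree.ElementTree for a cut-down version of the accident report; the Summary element and the function names are hypothetical, and the grouping stands in for whatever dispatch mechanism a real application would use.

```python
import xml.etree.ElementTree as ET

REPORT_URI = "http://www.insurance.com/Accidents/Reports"
SVG_URI = "http://www.w3.org/Graphics/SVG/1.0"

# Summary is a hypothetical element standing in for the report content.
REPORT = """<AccidentReport xmlns="http://www.insurance.com/Accidents/Reports"
                xmlns:svg="http://www.w3.org/Graphics/SVG/1.0">
  <Summary>Rear collision at junction</Summary>
  <svg:svg width="40%" height="40%">
    <svg:rectangle width="43.6" height="31.5"/>
  </svg:svg>
</AccidentReport>"""


def split_qname(tag):
    """Split ElementTree's '{URL}local' form into (URL, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def names_by_namespace(root):
    """Group element local names, in document order, by namespace URL."""
    groups = {}
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue  # skip comments and processing instructions
        uri, local = split_qname(element.tag)
        groups.setdefault(uri, []).append(local)
    return groups


groups = names_by_namespace(ET.fromstring(REPORT))
print(groups[SVG_URI])  # prints ['svg', 'rectangle']
```

The elements listed under the SVG URL are the ones that would be handed to a shared graphics module, while the remainder are processed by rules specific to the report.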
Whilst not strictly conformant with the SGML rules, as namespaces are not linked directly to notation processors in the way specified in the Architectural Form Definition Requirements, this mechanism does, nevertheless, conform with the spirit of the AFDR specification in that it clearly associates specific types of elements with specific processes. This is the underlying philosophy behind all SGML architectural forms.
The first drafts of XML Schemas: Structure and XML Schemas: Datatypes, issued in May 1999, suggest alternative mechanisms for a) defining SGML data models and b) associating data types with elements and attributes. The techniques proposed in these papers, however, result in a much more verbose specification format than that proposed in this paper. For example, the Order element defined above would have the following format (in its minimalist form) when defined in an XML Schema:
<import schemaAbbrev='BSR' schemaName='http://www.iso.ch/BSR/schema1.xsd' datatypes='true' />
<elementType name='Order'>
  <sequence>
    <elementTypeRef name='MessageID'/>
    <elementTypeRef name='Date' minOccur='1' maxOccur='3'/>
    <elementTypeRef name='RefersTo' minOccur='0' maxOccur='10'/>
    <elementTypeRef name='Buyer'/>
    <elementTypeRef name='Supplier'/>
    <elementTypeRef name='OtherParty' minOccur='1' maxOccur='97'/>
    <elementTypeRef name='Item' minOccur='1' maxOccur='200000'/>
  </sequence>
  <attrDecl name='Name' schemaName='http://www.unece.org/trade/untdid/schema2.xsd'>
    <fixed>UNH</fixed>
  </attrDecl>
  <attrDecl name='Name' schemaName='http://www.iso.ch/BSR/schema3.xsd'>
    <datatypeRef name='ObjectID' schemaAbbrev='BSR'/>
    <fixed>DocumentType Concept=828</fixed>
  </attrDecl>
  <attrDecl name='Attributes' schemaName='http://www.iso.ch/BSR/schema3.xsd'>
    <datatypeRef name='AttributeTypes' schemaAbbrev='BSR'/>
    <fixed>SequenceNo Document.Identifier BSU2140 UN-ID DocumentType.Version BSU1234</fixed>
  </attrDecl>
  <attrDecl name='SequenceNo'>
    <datatypeRef name='integer'/>
  </attrDecl>
  <attrDecl name='UN-ID'>
    <fixed>ORDERS:D:96A:UN:SIMP01</fixed>
  </attrDecl>
</elementType>
Whilst the new specifications will doubtless form the basis for a new series of datatype-aware XML tools, their adoption in the near term is problematic. It will be some time before schema-aware XML parsers become available, and even then not all XML parsers will be schema-aware. While schema-aware parsers are also likely to be DTD-aware validating XML parsers, not all DTD-aware validating parsers will be schema-aware.
One of the key advantages of using AFs rather than schemas is that the techniques proposed in this paper can be used by any DTD-aware XML parser, whether or not it is a fully validating parser. For example, the parser used in Internet Explorer 5.0 is DTD-aware, but is not a validating parser. A secondary advantage is that XSLT is based on the data passed to the application by a DTD-aware parser: it currently has no facilities for using information stored in a schema to manage the transformation. The AF approach ensures that the required information is available with the specifications as they stand today, rather than waiting for the next update of the relevant standards, and the harmonization of the many specifications involved, which is likely to take us into the next century.