The role of attributes in context determinancy

[From: http://www.personal.u-net.com/~sgml/contexts.doc.]

Martin Bryan, The SGML Centre

One of the most commonly asked questions in the SGML/XML world relates to when you should use attributes rather than elements to store data. This paper suggests that one of the primary reasons for using attributes should be the need to control the contexts in which elements are processed.

There are three main types of attributes:

Those relating to element identification (ID and IDREF type attributes, and those attributes of type CDATA that have application-specific identification rules, such as the name attribute of the A element in HTML).
Those containing tokens that identify one or more contexts in which the element applies, or which identify one or more options to be used during processing of the element (entity names, notation names, name tokens or values from a predefined set of tokens).
Those that carry data to be used as part of the application (typically CDATA type attributes).

This paper does not concern itself with the last type of attribute, and only mentions the first type in passing. It concentrates on the role of tokenised attributes in controlling the processing of elements.

Types of tokenised attributes

While SGML distinguishes between numbers and number tokens, and names and name tokens, XML restricts itself to using name tokens. Both SGML and XML provide separate rules for the identification and processing of named entities and notations, and for defining lists of permitted values expressed as name tokens.

With the exception of notations and IDs, there is a plural form of each token type that permits multiple tokens to be specified, with one or more space, tab or line ending codes between adjacent tokens. Because of this, name tokens cannot contain spaces.
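As a rough illustration of the plural token types, the following Python sketch splits an attribute value into its individual tokens. The function name split_tokens and the sample value are assumptions made for this example; a parser would normally perform this normalisation itself.

```python
def split_tokens(value: str) -> list[str]:
    """Split a plural tokenised attribute value (for example NMTOKENS
    or IDREFS) into its individual tokens.

    Tokens are separated by one or more spaces, tabs or line endings,
    which is why a single token can never contain white space.
    """
    # str.split() with no argument splits on any run of white space
    # and discards leading and trailing white space.
    return value.split()


# Hypothetical attribute value mixing spaces, a tab and a line ending.
tokens = split_tokens("draft  internal\treview\nfinal")
print(tokens)
```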

The values of notation and entity attributes are restricted to the set of names assigned to notations and entities within the accompanying DTDs. To allow attribute minimization, SGML also requires that the same name token appear in only one enumerated list within each attribute list declaration. Whilst the XML specification does not enforce this latter restriction, it does recommend that it be followed.

The roles of attributes

Attributes should indicate the type of processing that an SGML or XML element requires. In some cases this is obvious. For example, notation attributes define the coding of the data within the element, and so directly control the processing of its contents. Similarly, entity attributes identify external, unparsed entities that will need to be processed according to the rules applicable to the notation named in the entity declaration.

The role of ID and IDREF type attributes is also clearly related to processing, in that each IDREF value must match the value of an ID attribute elsewhere in the same document, and IDs can be used to select specific instances of an element that need to be processed in a different manner from others.
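The link between an IDREF and its ID can be sketched in a few lines of Python using the standard library. The document below is hypothetical, and the attribute names id and ref are assumptions: without a DTD the parser cannot know which attributes are declared as ID or IDREF, so the code simply treats them as such.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: 'id' plays the role of an ID attribute and
# 'ref' the role of an IDREF attribute (assumed names, no DTD is read).
DOC = """
<report>
  <figure id="fig1"><caption>Sales by region</caption></figure>
  <figure id="fig2"><caption>Sales by quarter</caption></figure>
  <para>See <figref ref="fig2"/> for details.</para>
</report>
"""

root = ET.fromstring(DOC)

# Index every element that carries an id, keyed by the id value.
by_id = {elem.get("id"): elem for elem in root.iter() if elem.get("id")}

# Resolve each reference to the element it points at.
for ref in root.iter("figref"):
    target = by_id[ref.get("ref")]
    print(ref.get("ref"), "->", target.findtext("caption"))
```

Once the index exists, a specific instance of an element can be singled out for special processing simply by looking up its identifier.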

For other forms of name token the link between processing and the attribute value may be less clear. In many cases it is not the processing of the data by an SGML or XML parser that the attribute is designed to control, but some processing to be undertaken at a later stage in the application-dependent process. In such cases the reason for using attribute values rather than subelements to record the controlling property is less obvious, but still relevant. To understand why this is so, you need to understand how attributes can control context.

How attributes determine context

The XML Path specification helps to make clear the relationship between attributes and elements. An XML Path consists of a series of element names optionally qualified by conditional tests, which can include tests of attribute values or the contents of elements. For example, to select all pictures within a report that are wider than 600 pixels for printing landscape you could use an XML Path of the form:

report//picture[@width>600]

Attribute values can also be used to control the processing of elements lower in the hierarchy. The following example shows how it is possible to identify paragraphs within a chapter identified by attributes as being part of an annex so that they can be processed in a different way from paragraphs in normal chapters:

book/chapter[@type="annex"]//para
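The two selections above can be tried out with Python's standard xml.etree.ElementTree module, which implements only a subset of XML Path. The sample documents and the src attribute are assumptions made for this sketch. Because that subset has no numeric comparisons, the width test is applied in Python rather than inside the path.

```python
import xml.etree.ElementTree as ET

# Hypothetical book with one normal chapter and one annex.
BOOK = """
<book>
  <chapter type="normal"><para>Body text</para></chapter>
  <chapter type="annex">
    <section><para>Annex text</para></section>
  </chapter>
</book>
"""
book = ET.fromstring(BOOK)

# ElementTree paths are relative to the element they are called on, so
# the leading 'book/' step of the path in the text is implied by 'book'.
annex_paras = book.findall("chapter[@type='annex']//para")
print([p.text for p in annex_paras])

# Hypothetical report containing pictures at different depths.
REPORT = """
<report>
  <section><picture width="800" src="wide.png"/></section>
  <picture width="400" src="narrow.png"/>
</report>
"""
report = ET.fromstring(REPORT)

# iter() visits pictures at any depth, like report//picture; the
# numeric width comparison is then made in Python.
wide = [p for p in report.iter("picture") if int(p.get("width", "0")) > 600]
print([p.get("src") for p in wide])
```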

The same technique can have more general application. For example, a generalized structure for a message can be used to drive multiple different processes. Each process can generate its own set of rules for the presentation or processing of the data. For example, an Order message could have a number of qualifying types assigned to it using an attribute declared in the following manner:

<!ATTLIST Order type (Purchase|Production|Delivery) "Purchase">

A series of processing rules can then be defined, based on different attribute values. These rules can apply to different levels in the message. For example, rules could be defined for the following:

Order[@type="Purchase"]

Order[@type="Production"][Supplier="ABC Supplies"]

Order[@type="Delivery"]/Item[PartNo="123526252"]

The first rule would affect the whole of each Purchase Order. The second would only affect those Production Orders specifically identified as being for ABC Supplies. The third would apply to Delivery Orders that included a request for an item with a particular part number.
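A minimal sketch of such rule-driven processing, assuming a hypothetical Messages wrapper element and using labels in place of real processing steps, might look like this in Python. Note that a parse without the DTD will not supply the declared default value, so every Order in the sample states its type explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical message set; element content is invented for the sketch.
MESSAGES = """
<Messages>
  <Order type="Purchase"><Supplier>XYZ Ltd</Supplier></Order>
  <Order type="Production"><Supplier>ABC Supplies</Supplier></Order>
  <Order type="Delivery">
    <Item><PartNo>123526252</PartNo></Item>
    <Item><PartNo>999</PartNo></Item>
  </Order>
</Messages>
"""
root = ET.fromstring(MESSAGES)

# Each rule pairs a path with a label standing in for real processing.
RULES = [
    ("Order[@type='Purchase']", "purchase rule"),
    ("Order[@type='Production'][Supplier='ABC Supplies']", "ABC production rule"),
    ("Order[@type='Delivery']/Item[PartNo='123526252']", "part 123526252 rule"),
]

# Record which rule fired for which element.
matches = []
for path, label in RULES:
    for elem in root.findall(path):
        matches.append((label, elem.tag))
print(matches)
```

The first two rules fire at the level of the Order, while the third fires on an individual Item, showing how one message structure can drive processing at different levels.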

Using attributes for property inheritance

One of the criticisms that programmers often make of SGML and XML elements is that they do not exhibit class inheritance of the type used by objects in structured programming languages. Whilst this is, strictly speaking, true, it fails to take into account the advantages that a hierarchical structure brings to structured data. Within any XML element it is possible to refer to the attribute value of any ancestor element as part of any processing that is required. Because such values can be referenced directly, there is no need to explicitly declare type inheritance within SGML or XML data models: inheritance can be implied from the model.
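Implied inheritance of this kind can be sketched as a walk up the element hierarchy. The helper name inherited, the parent_of map and the sample document are assumptions made for this example; they are one possible way of doing the lookup, not a prescribed one.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which only the outer elements carry attributes.
DOC = """
<book lang="en">
  <chapter type="annex">
    <section><para>Inherits type and lang</para></section>
  </chapter>
</book>
"""
root = ET.fromstring(DOC)

# ElementTree elements do not record their parent, so build a
# child -> parent map once for the whole tree.
parent_of = {child: parent for parent in root.iter() for child in parent}


def inherited(elem, name):
    """Return the value of attribute `name` from `elem` or, failing
    that, from its nearest ancestor that carries it; None if absent."""
    while elem is not None:
        if name in elem.attrib:
            return elem.attrib[name]
        elem = parent_of.get(elem)
    return None


para = root.find(".//para")
print(inherited(para, "type"), inherited(para, "lang"))
```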

XML Path statements need not, however, be restricted to controlling the transformation of XML elements within XSLT statements. They can be applied anywhere that is appropriate for the application. Where inheritance needs to be explicitly specified an XML Path statement can be used as part of an XPointer definition to record the inheritance within a Document Type Definition (DTD) or XML Schema:

<!ATTLIST element-x
  property-a %XPointer; #FIXED "xpointer(../@property-a)">

Using this technique, inheritance need not be confined to the immediate parent, or even to direct ancestors, as the following examples suggest:

<!ATTLIST element-y
  type %XPointer; #FIXED
  "xpointer(ancestor::*[self::chapter or self::section][@type][last()]/@type)">

<!ATTLIST payment
  source %XPointer; #FIXED "xpointer(/Order/buyer/country/@code)">
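A parser treats such a fixed value as an ordinary string, so it falls to the application to resolve the pointer. The following Python sketch shows one way an application might do so for the payment example. The sample Order, the amount attribute and the function name payment_source are assumptions, and the pointer is resolved by hand rather than by an XPointer processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical order message; the root element is Order itself.
ORDER = """
<Order type="Purchase">
  <buyer><country code="GB"/></buyer>
  <payment amount="100"/>
</Order>
"""
order = ET.fromstring(ORDER)


def payment_source(order_elem):
    """Return the value that /Order/buyer/country/@code points at,
    or None if the order has no buyer country. The path used below is
    relative to the Order element passed in."""
    country = order_elem.find("buyer/country")
    return None if country is None else country.get("code")


# Copy the resolved value onto each payment, as an application might
# when it treats the fixed pointer as an instruction to look it up.
source = payment_source(order)
for payment in order.iter("payment"):
    if source is not None:
        payment.set("source", source)

print(order.find("payment").attrib)
```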

By using XML pointers to reference instance-specific data of the type shown in the last example it is possible to control processes indirectly through the contents of documents, as well as directly through the structures defined within the DTD.

As users become aware of the benefits of using attributes to control the processing of specific parts of an XML message, they will come to appreciate the power that XPath provides to those seeking to process subcomponents of a structured data stream of the type provided in XML-encoded documents.