Use cases for DSDL

[Note: This document was posted to the RELAX-NG mailing list by Martin Bryan (The SGML Centre; BSI IST/41) on 2001-10-05. Comment: "In talking to James Clark earlier today about the relationship between RELAX NG and the proposed new ISO Document Structure Definition Language (DSDL) James asked if I could provide some use cases that would justify the initial set of requirements that the DSDL proposal contained. The attached document starts by listing the requirements identified as being essential for DSDL, and then provides a set of use case statements that seeks to justify each requirement. It also contains brief use cases for supporting three optional features of SGML that are not supported by XML, and not listed as being requirements for DSDL, but for which cases can be made within data streams being used by businesses..." See: "Document Schema Definition Language (DSDL) Proposed as ISO New Work Item." (June 12, 2001).]

The requirements statement in the DSDL new work item proposal lists the following initial set of requirements:

The standard shall provide a means of expressing, in SGML/XML instance format, all of the markup declarations permitted by the WebSGML profile of ISO/IEC 8879 and in Version 1.0 of the W3C Extensible Markup Language (XML).
The standard shall in particular provide a means of expressing markup declarations that have the following functions:
- identifying external data resources that may validly be included within document instances that conform to the model, including data instances that are in notations other than that defined in this standard (external general entity declarations)
- importing parts of models from external sources (external parameter entity declarations)
- specifying the identification of notations required to process those parts of document instances that are not encoded according to the standard (notation declarations)
This standard shall allow the representation within document instances of data in a clearly identified notation that is not intended to be processed by programs that are conformant with this standard.
The standard shall provide a means of constraining the number of times a particular element may occur at a given point in a document model to be within a range with specified minimum and/or maximum values.
The standard shall provide a means of identifying the character set to be used to constrain the contents of elements or attributes.
The standard shall provide a means of constraining the content of attribute values and elements to conform to a particular datatype or pattern based on a formally named, standardized, set of datatyping rules.
The standard shall provide a means of identifying a set of permitted values against which the content of a particular element or attribute value shall be checked for validity. The set of permitted values may be provided as an external resource, or by reference to an external service using a standardized API.
The standard shall provide a means by which any part of a model may have alternative specifications that are conditional upon the outcome of evaluating pattern-matching tests upon a document instance. Changes may be made to the model group of the current element or to that of any of its children, the list of attributes that can be associated with the current element, to the datatype specification of the element's contents or the value of one or more of its attributes, or to the default value to be assigned to an attribute or the contents of the element if none is supplied. Changes may be specified as part of the contents of the definition, or by reference to an external entity that is to be incorporated at the time a positive test is completed.
The standard shall provide facilities for defining "model types" that can form the basis for the models of elements in multiple document type definitions in such a way that users can restrict the use of parts of the model and add application-specific elements to the models at those points at which they are appropriate.
The standard shall provide a means by which the authority responsible for defining part or all of a document structure can be uniquely identified, with elements defined by different authorities being identifiable as such within document instances.
The standard shall provide a means by which sections of a document structure can be temporarily disabled without having to define a new document structure.
The standard shall provide a means by which the rationale for an element, attribute or other information component can be recorded as an annotation to its declaration.
The standard shall be designed in such a way that it can be extended to include the functions of ISO/IEC 8879 not included in the normative part of this standard.

This document seeks to provide use cases for each of these requirements and to suggest some other options that it might be nice to be able to meet.

Requirement 1 - Use of Instance Format for Document Structure Definition

Users have complained that having to learn one language for reading/writing DTDs and another for reading/writing document instances is confusing. The new range of "schema languages" that have replaced DTDs are all expressed as document instances that can be parsed and transformed using parsing techniques in general use rather than DTD-specific parsers. DSDL document type specifications should be presentable as valid XML instances.

Requirement 2 - External Entity Declaration using Document Instance Notation

Users need a mechanism for importing well-controlled sets of data into document instances, and for incorporating already defined elements and their attribute definitions into document type declarations. Where data being embedded into instances is not conformant with the standard, the relevant notation should be clearly identified in a way that allows the relevant rendering service to be invoked. (This may include a reference to the source of software capable of providing the rendering service in a stated environment.)

Entity declarations should not require the use of a separate Doctype parser, or of special notation. It should be sufficient to identify the processes being controlled via a namespace declaration.

Note: The same mechanism would probably also be able to associate parsing constraints with elements in instances.

Requirement 3 - Switching off parsing

It should be possible to incorporate data that is not to be parsed in an instance, rather than having to store it as an external entity. This is necessary to allow instances to document valid code sequences, and to allow for data streams that use sequences that would be valid within code streams. Such data may not be structured, or may be structured using an undefined language. It should be possible to identify areas of the data stream that do not need parsing, and to associate with such areas information that can be used to identify the type of rendering module that is intended to process them.

Requirement 4 - Constraining the number of occurrences

Many e-commerce applications require an upper or lower limit to be applied to the number of times an element can be ordered. For example, a book club may require customers to order no fewer than 4 books when they renew their membership, or no more than 3 books when taking advantage of a special offer. In some cases the rules state that members must order at least 3 items and no more than 20. While it is difficult to place an upper limit on maximum occurrences (1023 would seem reasonable in the majority of cases), it should be possible to come up with some form of system declaration that could be used to constrain the maximum values.
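The book club rules above can be sketched as a range check on child elements. This is only an illustration, not DSDL syntax: the element names "order" and "book" and the function check_occurrences are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def check_occurrences(parent, child_tag, minimum=0, maximum=None):
    """Return True if parent has between minimum and maximum child_tag children.

    A maximum of None means that no upper limit is applied.
    """
    count = len(parent.findall(child_tag))
    if count < minimum:
        return False
    if maximum is not None and count > maximum:
        return False
    return True


# Hypothetical order instance with three books.
order = ET.fromstring("<order><book>A</book><book>B</book><book>C</book></order>")

# Rule from the text: at least 3 items and no more than 20.
print(check_occurrences(order, "book", minimum=3, maximum=20))  # True
# Renewal rule from the text: no fewer than 4 books.
print(check_occurrences(order, "book", minimum=4))  # False
```

Leaving either bound unset corresponds to the "minimum and/or maximum" wording of the requirement.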

Requirement 5 - Character set for element content

For compatibility with backend applications it may be necessary to restrict the character set used for element contents or attributes. For example, the backend system may only support the ISO 8859 or EBCDIC character set and be unable to handle Japanese or Chinese Unicode characters. Forms should be able to constrain the set of characters that are accepted in response to a question.
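One way to picture this check is to test whether a value can be encoded in the character set the backend supports. The sketch below is an assumption-laden illustration: it uses Python's "iso-8859-1" and "cp500" codecs as stand-ins for "ISO 8859" and "EBCDIC" (both of which are really families of character sets), and the function name fits_charset is invented for the example.

```python
def fits_charset(text, encoding):
    """Return True if every character in text can be encoded in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# "M\u00fcller" contains u-umlaut, which ISO 8859-1 can represent.
print(fits_charset("M\u00fcller", "iso-8859-1"))  # True
# "\u6771\u4eac" is a pair of Japanese characters, outside ISO 8859-1.
print(fits_charset("\u6771\u4eac", "iso-8859-1"))  # False
# Plain Latin letters fit the EBCDIC code page chosen here (cp500).
print(fits_charset("Smith", "cp500"))  # True
```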

Requirement 6 - Constraining element and attribute contents

The document type declaration must be able to fully declare the constraints to be placed on element and attribute contents, in terms of a predefined set of permitted values, a range of permitted numeric values (e.g. positive integers, or fractions that are restricted to .25, .50 and .75), data that conforms to formally declared patterns (e.g. those for ISBN numbers), or data that conforms to specific datatypes (e.g. dates, times, currency values, etc.).
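The kinds of constraint listed above can be sketched as small checking functions. This is a rough illustration only: the function names are invented, the ISBN check assumes the 10-digit form with its modulo 11 check digit, and the date check assumes the ISO 8601 calendar form YYYY-MM-DD.

```python
import re
from datetime import date


def is_positive_integer(value):
    """Numeric range: a string of digits whose value is greater than zero."""
    return re.fullmatch(r"[0-9]+", value) is not None and int(value) > 0


def is_quarter_fraction(value):
    """Permitted values: fractions restricted to .25, .50 and .75."""
    return value in {".25", ".50", ".75"}


def is_isbn10(value):
    """Pattern: ten characters (hyphens ignored) with a valid check digit."""
    digits = value.replace("-", "")
    if re.fullmatch(r"[0-9]{9}[0-9X]", digits) is None:
        return False
    # Weights run from 10 down to 1; a final "X" stands for the value 10.
    total = sum(
        (10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(digits)
    )
    return total % 11 == 0


def is_iso_date(value):
    """Datatype: a calendar date that really exists."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


print(is_positive_integer("12"), is_positive_integer("0"))
print(is_quarter_fraction(".50"), is_quarter_fraction(".33"))
print(is_isbn10("0-306-40615-2"), is_isbn10("0-306-40615-3"))
print(is_iso_date("2001-10-05"), is_iso_date("2001-13-05"))
```

Each line of output pairs an accepted value with a rejected one.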

Requirement 7 - External value verification

It should be possible for the validation routine to invoke another process to check the validity of an attribute value or element content. For example, the set of permitted values may be stored in a database under the control of another organization, which is responsible for its validation. Examples of such elements include those recording DUNS numbers, numeric code identifiers, credit card numbers, etc. In some cases the validating system may be required to respond by providing additional information for use in place of, or alongside, the input data in the resulting application. For example, transmitting a DUNS number may require the presentation tool to display the name of the referenced organization, as returned from the external resource, in place of the entered data.
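The delegation described above can be sketched by passing the validator a lookup function that stands in for the external service. Everything here is hypothetical: the in-memory registry replaces a database run by another organization, the identifier "000000000" and the name "Example Organization" are placeholders rather than real data, and no standardized API is implied.

```python
def make_registry_lookup(registry):
    """Build a lookup function over a dict that stands in for a remote service."""

    def lookup(value):
        # Returns the registered name, or None when the value is unknown.
        return registry.get(value)

    return lookup


def validate_externally(value, lookup):
    """Return (is_valid, display_text) for an element or attribute value.

    When the external lookup recognizes the value, the text it returns is
    offered for display in place of the entered data.
    """
    name = lookup(value)
    if name is None:
        return (False, value)
    return (True, name)


# Placeholder registry contents, invented for this example.
lookup = make_registry_lookup({"000000000": "Example Organization"})

print(validate_externally("000000000", lookup))
print(validate_externally("999999999", lookup))
```

Keeping the lookup as a parameter means the validation routine itself does not need to know where the permitted values are held.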

Requirement 8 - Conditional models

The classic use case for this occurs within administration data within hospitals, government departments, etc., where different responses are required for men and women, and for married and unmarried people. For example, you don't need to ask men or young children questions related to pregnancy, or children under 16 about their marital status. It should be possible to check that when one element has a specific answer, or an answer within a predefined range, another element either has a value in a relevant range or is not present.
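The two examples above can be sketched as conditional checks over an instance. The element names ("person", "sex", "age", "pregnant", "marital-status") and the function check_record are assumptions made for this illustration, and the rule for "young children" is left out because the text gives no age threshold for it.

```python
import xml.etree.ElementTree as ET


def check_record(record):
    """Return a list of problems found by two conditional rules."""
    problems = []
    sex = record.findtext("sex")
    age = int(record.findtext("age"))
    # Rule 1: the pregnancy question must be absent when the respondent is male.
    if sex == "male" and record.find("pregnant") is not None:
        problems.append("pregnant must be absent when sex is male")
    # Rule 2: marital status must be absent for children under 16.
    if age < 16 and record.find("marital-status") is not None:
        problems.append("marital-status must be absent when age is under 16")
    return problems


adult = ET.fromstring(
    "<person><sex>female</sex><age>34</age>"
    "<marital-status>married</marital-status><pregnant>no</pregnant></person>"
)
child = ET.fromstring(
    "<person><sex>male</sex><age>9</age>"
    "<marital-status>single</marital-status></person>"
)

print(check_record(adult))  # []
print(check_record(child))
```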

Requirement 9 - Types

Extensible "models" of sets of elements are required in many business applications. For example, postal addresses are a commonly used feature of data models that should not have to be redeclared in each model. It should be possible to declare submodels in such a way that any model that refers to them is automatically updated whenever the submodel is updated. This requirement extends the basic import mechanism defined by Requirement 2 by requiring that it should be possible to apply modifying constraints to the imported definitions, and to be able to specify points in the imported model where other constructs are to be added (for example, by stating that at a particular point in the imported model another element or attribute definition is to be added to the type definition).
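As a rough sketch of restricting and extending an imported submodel, the example below treats a postal address model as an ordered list of element names. The names in ADDRESS_MODEL, the function derive_model and its parameters are all invented for illustration; a real model type would also carry occurrence and datatype information.

```python
# Hypothetical shared submodel, maintained in one place.
ADDRESS_MODEL = ["street", "city", "postcode", "country"]


def derive_model(base, omit=(), add_after=None):
    """Derive an application model from a shared base model.

    omit lists base elements whose use is restricted (dropped);
    add_after maps a base element name to extra elements inserted after it.
    """
    add_after = add_after or {}
    derived = []
    for name in base:
        if name not in omit:
            derived.append(name)
        derived.extend(add_after.get(name, []))
    return derived


# A domestic order form drops "country" and adds "county" after "city".
domestic = derive_model(
    ADDRESS_MODEL, omit={"country"}, add_after={"city": ["county"]}
)
print(domestic)  # ['street', 'city', 'county', 'postcode']
```

Because the derived model is computed from the base rather than copied by hand, deriving it again after the base changes picks up the update, which is the behaviour the requirement asks for.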

Requirement 10 - Authority to modify submodels

Sometimes the authority to alter specific parts of document models rests with specific bodies. For example, the format for specifying tax information in an invoice may be fixed by a government. In such cases the application of extensions to imported submodels may need to be inhibited, or restricted to the organization responsible for the maintenance of that part of the data stream. Alternatively there may simply need to be a mechanism for recording who is responsible for the submodel, e.g. by assigning it a namespace.

Requirement 11 - Temporary sections

Some data may only be collected for the purposes of commenting on data streams, or for viewing under controlled circumstances. Examples of such data include annotations, confidential data only accessible by staff with a specific clearance level, data that has been deleted from the latest version of a file, or data that will only become valid on a certain date. Mechanisms for uniquely identifying such data should include means to pass the identifier off to an external routine to determine whether or not to include the temporary section within the displayed data.
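The hand-off to an external routine can be sketched as a filter that asks a callback whether each identified section should be shown. The attribute name "section-id", the identifiers, and the function filter_sections are assumptions for this example; the callback stands in for whatever clearance or date check an application would supply.

```python
import xml.etree.ElementTree as ET


def filter_sections(root, include):
    """Remove every element whose section-id is rejected by include().

    include is the external routine: it receives the identifier and
    returns True if the temporary section should be displayed.
    """
    # Snapshot the tree first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            section_id = child.get("section-id")
            if section_id is not None and not include(section_id):
                parent.remove(child)
    return root


doc = ET.fromstring(
    "<report><para>Public text</para>"
    '<note section-id="staff-only">Internal</note>'
    '<para section-id="from-2002">Later</para></report>'
)

# Hypothetical decision: only the "from-2002" section is currently visible.
allowed = {"from-2002"}
filter_sections(doc, lambda section_id: section_id in allowed)
print(ET.tostring(doc, encoding="unicode"))
```

The document structure itself is unchanged; only the displayed view differs, which matches the aim of disabling sections without defining a new structure.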

Requirement 12 - Annotations

Document type definitions need to be self-documenting, and to contain appropriate version identification information. Such data should be specifiable in instance format, rather than having to be specified using comment declarations. It should be possible to insert annotations within any element used to declare document structure, so that individual attributes, and individual attribute value specifications, can have annotating information associated with their definitions.

Requirement 13 - Modularity

Not all applications will support all features. Elements that identify specified features of ISO 8879 should be separately specifiable. Other features should also be dependent on local support for features such as datatypes, notation processing, etc.

Note that this requirements statement does not support all the optional features in ISO 8879, only the subset that is considered vital to the next generation of systems. Other features that would be nice, but which are not strict requirements, include:

Option A - Tag Omission

This is the ability to omit tags whose presence can be implied from the model, typically where two consecutive elements start or end. It should be possible to specify that certain tags do not need to occur at certain points in the model at which they can be implied from the immediately following tag.

Option B - Data as Tags

Many documents contain data that forms a natural markup tag (e.g. "Abstract:", "Chapter 12", "Appendix 1:"). It should be possible to identify strings that, in the current context of the model, identify the omission of a specified piece of markup.
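A minimal sketch of this idea maps leading strings to the markup they imply. The patterns are taken from the three examples above, but the element names ("abstract", "chapter", "appendix") and the function implied_tag are invented, and the sketch ignores the "current context of the model" that a real implementation would need to consult.

```python
import re

# Each pattern is tried at the start of a line; the second item is the
# element that the string is assumed to stand for.
IMPLIED_TAGS = [
    (re.compile(r"Abstract:"), "abstract"),
    (re.compile(r"Chapter [0-9]+"), "chapter"),
    (re.compile(r"Appendix [0-9]+:"), "appendix"),
]


def implied_tag(line):
    """Return the element implied by the start of line, or None."""
    for pattern, tag in IMPLIED_TAGS:
        if pattern.match(line):
            return tag
    return None


print(implied_tag("Abstract: This paper describes a model."))  # abstract
print(implied_tag("Chapter 12 The Model"))  # chapter
print(implied_tag("Plain paragraph text"))  # None
```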

Option C - Subdocuments

It should be possible to identify parts of a document for which a document model specified in another file is to be applied. This is particularly important where a document includes data that has been copied or referenced via links that embed the data into the document. It should not be necessary to store such data in a separate file that is referenced via an external entity.

Martin Bryan
BSI IST/41