[Note: This document was posted to the RELAX-NG mailing list by Martin Bryan (The SGML Centre; BSI IST/41) on 2001-10-05. Comment: "In talking to James Clark earlier today about the relationship between RELAX NG and the proposed new ISO Document Structure Definition Language (DSDL), James asked if I could provide some use cases that would justify the initial set of requirements that the DSDL proposal contained. The attached document starts by listing the requirements identified as being essential for DSDL, and then provides a set of use case statements that seeks to justify each requirement. It also contains brief use cases for supporting three optional features of SGML that are not supported by XML, and not listed as being requirements for DSDL, but for which cases can be made within data streams being used by businesses..." See: "Document Schema Definition Language (DSDL) Proposed as ISO New Work Item." (June 12, 2001).]
The requirements statement in the DSDL new work item proposal lists the following initial set of requirements:
This document seeks to provide use cases for each of these requirements and to suggest some other options it might be nice to be able to meet.
Users have complained that having to learn one language for reading/writing DTDs and another for reading/writing document instances is confusing. The new range of "schema languages" that have replaced DTDs are all described as document instances that can be parsed and transformed using parsing techniques in general use rather than DTD-specific parsers. DSDL document type specifications should be presentable as a valid XML instance.
Users need a mechanism for importing well-controlled sets of data into document instances, and for incorporating already defined elements and their attribute definitions into document type declarations. Where data being embedded into instances is not conformant with the standard the relevant notation should be clearly identified in a way that allows the relevant rendering service to be invoked. (This may include a reference to the source of software capable of providing the rendering service in a stated environment.)
Entity declarations should not require the use of a separate Doctype parser, or of special notation. It should be sufficient to identify the processes being controlled via a namespace declaration.
Note: The same mechanism would probably also be able to associate parsing constraints with elements in instances.
It should be possible to incorporate data that is not to be parsed in an instance, rather than having to store it as an external entity. This is necessary to allow instances to document valid code sequences, and to allow for data streams that use sequences that would be valid within code streams. Such data may not be structured, or may be structured using an undefined language. It should be possible to identify areas of the data stream that do not need parsing, and to associate with such areas information that can be used to identify the type of rendering module it is intended to be processed by.
Many e-commerce applications require an upper or lower limit to be applied to the number of times an element can be ordered. For example, a book club may require customers to order no fewer than 4 books when they renew their membership, or no more than 3 books when taking advantage of a special offer. In some cases the rules state that members must order at least 3 items and no more than 20. While it is difficult to set a general upper limit for maximum occurrences (1023 would seem reasonable in the majority of cases), it should be possible to come up with some form of system declaration that could be used to constrain the maximum values.
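The book club rule above can be illustrated with a small check. This is only a sketch in Python, not a proposed DSDL syntax: the element names `order` and `book` and the function `check_occurrences` are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

def check_occurrences(parent, child_tag, min_count, max_count):
    """Return True if parent has between min_count and max_count
    direct children named child_tag (both limits inclusive)."""
    count = len(parent.findall(child_tag))
    return min_count <= count <= max_count

# Hypothetical order containing three books.
order = ET.fromstring("<order><book/><book/><book/></order>")

print(check_occurrences(order, "book", 3, 20))  # at least 3, at most 20: True
print(check_occurrences(order, "book", 4, 20))  # renewal rule, at least 4: False
```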
For compatibility with backend applications it may be necessary to restrict the character set used for element contents or attributes. For example, the backend system may only support the ISO 8859 or EBCDIC character set and be unable to handle Japanese or Chinese Unicode characters. Forms should be able to constrain the set of characters that are accepted in response to a question.
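One way to picture such a restriction is to test whether a value can be encoded in the backend's character set. The sketch below is in Python and is illustrative only; the function name `fits_charset` is hypothetical, and `iso-8859-1` and `cp500` (one EBCDIC code page) are used as stand-ins for whatever the backend actually supports.

```python
def fits_charset(text, encoding="iso-8859-1"):
    """Return True if every character of text can be encoded in the
    given character set, False otherwise."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

print(fits_charset("Café"))               # representable in ISO 8859-1: True
print(fits_charset("東京"))                # Japanese text is not: False
print(fits_charset("ORDER 42", "cp500"))  # plain text in an EBCDIC code page: True
```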
The document type declaration must be able to fully declare the constraints to be placed on element and attribute contents, in terms of one of a predefined set of permitted values, a range of permitted numeric values (e.g. positive integers, or fractions that are restricted to .25, .50 and .75), data that conforms to formally declared patterns (e.g. those for ISBN numbers), or data that conforms to specific datatypes (e.g. dates, times, currency values, etc.).
It should be possible for the validation routine to invoke another process to check the validity of an attribute value or element content. For example, the set of permitted values may be stored in a database under the control of another organization, which is responsible for its validation. Examples of such elements include those recording DUNS numbers, numeric code identifiers, credit card numbers, etc. In some cases the validating system may be required to respond by providing additional information for use in place of, or alongside, the input data in the resulting application. For example, transmitting a DUNS number may require the presentation tool to display the name of the referenced organization, as returned from the external resource, in place of the entered data.
The classic use case for this occurs within administration data within hospitals, government departments, etc., where different responses are required for men and women, and for married and unmarried people. For example, you don't need to ask men or young children questions related to pregnancy, or children under 16 about their marital status. It should be possible to check that when one element has a specific answer, or an answer within a predefined range, another element either has a value in a relevant range or is not present.
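The four kinds of constraint named above (enumerated values, numeric restrictions, patterns and datatypes) can each be sketched as a small predicate. The Python below is illustrative only: all function names are hypothetical, and the ISBN check tests only the shape of a ten-character ISBN, not its check digit.

```python
import re
from datetime import date

def is_quarter_fraction(value):
    """Enumeration: only the listed fractions are permitted."""
    return value in {".25", ".50", ".75"}

def is_positive_integer(value):
    """Numeric range: ASCII digits only, and greater than zero."""
    return re.fullmatch(r"[0-9]+", value) is not None and int(value) > 0

def is_isbn10_shaped(value):
    """Pattern: nine digits plus a digit or X, hyphens ignored.
    (Shape only; the check digit is not verified in this sketch.)"""
    return re.fullmatch(r"[0-9]{9}[0-9X]", value.replace("-", "")) is not None

def is_iso_date(value):
    """Datatype: the value must parse as a calendar date."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True

print(is_quarter_fraction(".50"))         # True
print(is_positive_integer("0"))           # False
print(is_isbn10_shaped("0-306-40615-2"))  # True
print(is_iso_date("2001-13-05"))          # month 13 does not exist: False
```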
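A minimal sketch of this delegation, in Python, passes the value to a lookup routine and returns both a validity flag and any additional information supplied. Everything here is hypothetical: `FAKE_REGISTRY` is an in-memory stand-in for a database run by another organization, and the number and organization name in it are invented.

```python
def validate_with_lookup(value, lookup):
    """Ask an external routine about value.

    lookup is any callable that returns display information for a
    known value, or None for an unknown one.
    Returns (is_valid, display_info)."""
    info = lookup(value)
    return (info is not None, info)

# Stand-in for an externally maintained registry (invented data).
FAKE_REGISTRY = {"123456789": "Example Trading Ltd"}

valid, name = validate_with_lookup("123456789", FAKE_REGISTRY.get)
print(valid, name)  # True Example Trading Ltd

valid, name = validate_with_lookup("000000000", FAKE_REGISTRY.get)
print(valid, name)  # False None
```

Passing the lookup in as a callable keeps the validator independent of where the permitted values are stored.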
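The two rules in this example can be written as checks that relate one element's value to another element's presence. The Python sketch below is illustrative only; the element names `sex`, `age`, `pregnant` and `maritalStatus` are hypothetical, and the form is assumed always to contain `sex` and a numeric `age`.

```python
import xml.etree.ElementTree as ET

def check_form(form):
    """Return a list of co-occurrence errors found in a form element."""
    errors = []
    sex = form.findtext("sex")
    age = int(form.findtext("age"))
    # Rule 1: the pregnancy question must be absent for men.
    if sex == "male" and form.find("pregnant") is not None:
        errors.append("pregnant must not be present when sex is male")
    # Rule 2: marital status must be absent for children under 16.
    if age < 16 and form.find("maritalStatus") is not None:
        errors.append("maritalStatus must not be present when age is under 16")
    return errors

ok = ET.fromstring(
    "<form><sex>female</sex><age>30</age><pregnant>no</pregnant></form>")
bad = ET.fromstring(
    "<form><sex>male</sex><age>12</age><pregnant>no</pregnant>"
    "<maritalStatus>single</maritalStatus></form>")

print(check_form(ok))        # []
print(len(check_form(bad)))  # 2
```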
Extensible "models" of sets of elements are required in many business applications. For example, postal addresses are a commonly used feature of data models that should not have to be redeclared in each model. It should be possible to declare submodels in such a way that any model that refers to them is automatically updated whenever the submodel is updated. This requirement extends the basic import mechanism defined by Requirement 2 by requiring that it should be possible to apply modifying constraints to the imported definitions, and to be able to specify points in the imported model where other constructs are to be added (for example, by stating that at a particular point in the imported model another element or attribute definition is to be added to the type definition).
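As a rough illustration of adding a construct at a stated point in an imported model, the Python sketch below treats a content model as an ordered list of element names. This is a deliberate simplification, not a proposed DSDL mechanism; `ADDRESS_MODEL`, `extend_model` and the element names are all hypothetical.

```python
# Hypothetical shared submodel, maintained in one place.
ADDRESS_MODEL = ["street", "city", "postcode"]

def extend_model(base, insert_after, new_elements):
    """Return a copy of base with new_elements added immediately
    after the element named insert_after. base is not modified."""
    model = list(base)
    position = model.index(insert_after) + 1
    model[position:position] = new_elements
    return model

# A referring model adds "region" after "city".
print(extend_model(ADDRESS_MODEL, "city", ["region"]))
# ['street', 'city', 'region', 'postcode']
```

Because the extended model is derived from the shared submodel each time it is needed, a later change to the submodel carries through to every model that refers to it.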
Sometimes responsibility for altering specific parts of document models rests with specific bodies. For example, the format for specifying tax information in an invoice may be fixed by a government. In such cases the application of extensions to imported submodels may need to be inhibited, or restricted to the organization responsible for the maintenance of that part of the data stream. Alternatively there may simply need to be a mechanism for recording who is responsible for the submodel, e.g. by assigning it a namespace.
Some data may only be collected for the purposes of commenting on data streams, or for viewing under controlled circumstances. Examples of such data include annotations, confidential data only accessible by staff with a specific clearance level, data that has been deleted from the latest version of a file, or data that will only become valid on a certain date. Mechanisms for uniquely identifying such data should include means to pass the identifier off to an external routine to determine whether or not to include the temporary section within the displayed data.
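Handing the identifier to an external routine might look like the following Python sketch. It is illustrative only: the `condition` attribute, the function `strip_hidden` and the `staff-only` identifier are hypothetical, and any text following a removed element is dropped along with it.

```python
import xml.etree.ElementTree as ET

def strip_hidden(elem, is_visible):
    """Remove, in place, every descendant whose 'condition' attribute
    names a section that the is_visible callback rejects."""
    for child in list(elem):  # copy, since children are removed while looping
        condition = child.get("condition")
        if condition is not None and not is_visible(condition):
            elem.remove(child)
        else:
            strip_hidden(child, is_visible)

doc = ET.fromstring(
    '<report><para>Public text</para>'
    '<note condition="staff-only">Internal remark</note></report>')

# External routine: this reader has no staff clearance.
strip_hidden(doc, lambda condition: condition != "staff-only")

print(ET.tostring(doc, encoding="unicode"))
# <report><para>Public text</para></report>
```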
Document type definitions need to be self-documenting, and to contain appropriate version identification information. Such data should be specifiable in instance format, rather than having to be specified using comment declarations. It should be possible to insert annotations within any element used to declare document structure, so that individual attributes and individual attribute value specifications can have annotating information associated with their definitions.
Not all applications will support all features. Elements that identify specified features of ISO 8879 should be separately specifiable. Other features should also be dependent on local support for features such as datatypes, notation processing, etc.
Note that this requirements statement does not support all the optional features in ISO 8879, only the subset that is considered vital to the next generation of systems. Other features that would be nice, but which are not strict requirements include:
The ability to omit tags when their presence can often be implied from the model where two consecutive elements start or end. It should be possible to specify that certain tags do not need to occur at certain points in the model at which they can be implied from the immediately following tag.
Many documents contain data that forms a natural markup tag (e.g. "Abstract:", "Chapter 12", "Appendix 1:"). It should be possible to identify strings that, in the current context of the model, identify the omission of a specified piece of markup.
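The idea can be sketched as a table that maps leading strings to the markup they imply. The Python below is illustrative only and ignores the model context that a real mechanism would take into account; the table `IMPLIED_MARKUP`, the function `mark_up`, the element names and the fallback to `para` are all hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Leading strings and the (hypothetical) element each one implies.
IMPLIED_MARKUP = [
    (re.compile(r"Abstract:\s*"), "abstract"),
    (re.compile(r"Appendix [0-9]+:\s*"), "appendix"),
]

def mark_up(line):
    """Wrap a line in the element implied by its leading string,
    or in <para> if no implied markup is recognised."""
    for pattern, tag in IMPLIED_MARKUP:
        match = pattern.match(line)
        if match:
            body = escape(line[match.end():])
            return f"<{tag}>{body}</{tag}>"
    return f"<para>{escape(line)}</para>"

print(mark_up("Abstract: Use cases for DSDL"))
# <abstract>Use cases for DSDL</abstract>
print(mark_up("Plain text & more"))
# <para>Plain text &amp; more</para>
```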
It should be possible to identify parts of a document for which a document model specified in another file is to be applied. This is particularly important where a document includes data that has been copied or referenced via links that embed the data into the document. It should not be necessary to store such data in a separate file that is referenced via an external entity.