[Cache from http://www.y12.doe.gov/sgml/sc34/document/0275.htm; please use this canonical URL/source if possible.]
Title: | CD 19757-0 - DSDL Part 0 - Overview |
Source: | G. Ken Holman |
Project: | DSDL |
Project editor: | G. Ken Holman |
Status: | Draft for comment |
Action: | |
Date: | 11 December 2001 |
Summary: | |
Distribution: | SC34 and Liaisons |
Refer to: | |
Supersedes: | |
Reply to: | Dr. James David Mason (ISO/IEC JTC1/SC34 Chairman), Y-12 National Security Complex, Information Technology Services, Bldg. 9113 M.S. 8208, Oak Ridge, TN 37831-8208 U.S.A. Telephone: +1 865 574-6973 Facsimile: +1 865 574-1896 E-mail: mxm@y12.doe.gov http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm; Ms. Sara Hafele, ISO/IEC JTC 1/SC 34 Secretariat, American National Standards Institute, 25 West 43rd Street, New York, NY 10036 Tel: +1 212 642-4937 Fax: +1 212 840-2298 E-mail: shafele@ansi.org |
Author: G. Ken Holman
Date: $Date: 2001/12/11 18:46:03 $(UTC)
Copyright (C) 2001 SC34
Document Schema Definition Language (DSDL) is a multipart International Standard defining a modular set of specifications for describing the document structures, data types, and data relationships in structured information resources. Two kinds of integrated specifications are included: specifications for describing aspects of validity of a document, and rules for combining and packaging a collection of processes applicable to the task of validating a document. This integration makes DSDL applicable to both business and publishing applications of structured information resources. This applicability reflects the expansion of Extensible Markup Language (XML) applications beyond the publishing environment in which XML and its foundation, the Standard Generalized Markup Language (SGML), were first developed.
In addition to this overview document, which is Part 0, this International Standard comprises the following parts.
The information items in structured information resources are not always ordered as a simple hierarchy. The abstraction representing all of the relationships within the information items is often a graph where there exist some links relating items in ways that are non-hierarchical. Expressing such a graph of items in a tree structure breaks those links that cannot be expressed by the grammar of the hierarchical representation. Reconstituting the structured information resource restores the cut links to make the resource intact.
Validating the hierarchical representation of the graph involves checking the tree structure against the grammar of the structure with cut links, as well as checking the links that were cut. This IS describes the task of validating all of the relationships in a document using a pipeline of consecutively applied composite processes of validation and simple transformation. This IS includes a standardized set of basic composite processes and an extensible mechanism of referencing other possible composite processes.
An example of a pipeline of three consecutively applied processes could include validating the structural hierarchy of an information resource, followed by ascribing default values for absent information items in the structure, followed by validating the non-hierarchical links between the resulting information items. This pipeline would produce different results than first validating non-hierarchical links between information items in the resource, followed by supplying default values for absent items, followed by validating the structural hierarchy of the result of the manipulation.
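The effect of step order described above can be illustrated with a minimal Python sketch. It is not DSDL syntax: the three step functions, the `doc`/`item` vocabulary, the `ref` and `id` attributes, and the default value `"a"` are all hypothetical, chosen only so that the two orderings from the paragraph above give different answers for the same input.

```python
import xml.etree.ElementTree as ET

def validate_structure(root):
    """Hypothetical grammar: a <doc> root containing only <item> children."""
    if root.tag != "doc" or any(child.tag != "item" for child in root):
        raise ValueError("structure does not match the grammar")
    return root

def supply_defaults(root):
    """Hypothetical default: an absent 'ref' attribute on <item> becomes 'a'."""
    for item in root.iter("item"):
        item.attrib.setdefault("ref", "a")
    return root

def validate_links(root):
    """Every 'ref' value present must match the 'id' of some element."""
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    for item in root.iter("item"):
        ref = item.get("ref")
        if ref is not None and ref not in ids:
            raise ValueError(f"dangling reference {ref!r}")
    return root

def run_pipeline(xml_text, steps):
    """Apply the steps consecutively; stop at the first failed validation."""
    root = ET.fromstring(xml_text)
    try:
        for step in steps:
            root = step(root)
    except ValueError as err:
        return f"invalid: {err}"
    return "valid"

DOC = "<doc><item/></doc>"

# Structure, then defaults, then links: the defaulted 'ref' is checked.
structure_first = run_pipeline(DOC, [validate_structure, supply_defaults, validate_links])

# Links, then defaults, then structure: the defaulted 'ref' is never checked.
links_first = run_pipeline(DOC, [validate_links, supply_defaults, validate_structure])
```

In this sketch the first ordering reports a dangling reference, because the defaulted value is added before links are checked, while the second ordering reports the same document as valid.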
The DSDL framework includes:
a method of identifying the validation processes to be applied in pipelined paths of discrete steps
a language for choreographing the validation processes as a set of available pipelines
a description of the set of information items applicable to these validation processes
Portions of this Part will be initially based on RELAX Namespaces.
Grammar-oriented schema languages validate that the structure of information items in an instance conforms to a set of constraints described by a tree grammar. This includes constraining the text found at the terminal symbols of the grammar to the data types and parameters described in Part 3 of this IS.
This Part includes a syntax for specifying and identifying:
the grammar of the hierarchy
the identity of data types, their parameters and the parameter values standardized by DSDL
the identity of non-DSDL data types
This Part is initially based on RELAX NG.
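As a rough illustration of validating a hierarchy against a grammar, the Python sketch below checks each element's children against a content model written as a regular expression over child names. This is a deliberately simplified, DTD-like local grammar and is not RELAX NG; the `GRAMMAR` table, the `memo`/`title`/`para` vocabulary, and the `conforms` function are hypothetical, and text content and attributes are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical grammar: element name -> regular expression over the
# space-terminated names of its children.
GRAMMAR = {
    "memo": r"title (para )+",   # one title followed by one or more paras
    "title": r"",                # no child elements
    "para": r"",                 # no child elements
}

def conforms(element):
    """Return True if the element and all of its descendants match GRAMMAR."""
    model = GRAMMAR.get(element.tag)
    if model is None:
        return False  # element name not declared in the grammar
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in element)

good = ET.fromstring("<memo><title/><para/><para/></memo>")
bad = ET.fromstring("<memo><para/><title/></memo>")
```

Here `conforms(good)` holds and `conforms(bad)` does not, since the second document places `para` before `title`.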
Terminal symbols of text in the hierarchical tree may represent values of a data type.
This Part defines:
a set of standardized named data types (e.g. integer)
a set of parameters and their values for each data type (e.g. minimum and maximum values)
a set of constraints describing a possibly infinite set of strings representing values of the data type
This Part is initially based on a subset of primitive data types and their facets from Part 2 of W3C XML Schema.
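The three items above (a named data type, its parameters, and the set of strings that represent its values) can be sketched in Python for the integer example. The function name `is_valid_integer`, the parameter names `minimum` and `maximum`, and the lexical pattern are assumptions made for illustration; they are not taken from DSDL or from W3C XML Schema.

```python
import re

# Assumed lexical space: an optional sign followed by ASCII decimal digits.
# The set of matching strings is infinite, as the text above anticipates.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def is_valid_integer(text, minimum=None, maximum=None):
    """Check that text represents an integer within the optional bounds."""
    stripped = text.strip()
    if INTEGER_LEXICAL.fullmatch(stripped) is None:
        return False  # not in the lexical space of the data type
    value = int(stripped)
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True
```

For instance, `is_valid_integer("42", minimum=0, maximum=100)` holds, while `"101"` fails the same bounds and `"4.2"` fails the lexical check regardless of bounds.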
The non-hierarchical links between information items in a structured resource can be reconstituted by addressing the items and expressing the relationship between them found in the original graph of information. The addressing mechanism includes hierarchy-based paths of steps along the tree to the information item being addressed.
This Part defines:
a method of identifying information items based on
the ancestry of the information item
other available mechanisms not based on the tree (e.g. ID attribute values)
an extensible basis supporting mechanisms not currently available
a method of describing relationships that are not hierarchical
This Part is initially based on Schematron.
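The two addressing styles listed above, by ancestry and by a mechanism not based on the tree, can be sketched with Python's standard `xml.etree.ElementTree` module. This is not Schematron: the `book`/`chapter`/`xref` vocabulary, the `target` attribute, and the treatment of any attribute named `id` as an ID are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<book>"
    '<chapter id="c1"><title>Intro</title></chapter>'
    '<chapter id="c2"><xref target="c1"/><xref target="c9"/></chapter>'
    "</book>"
)

def by_path(root, path):
    """Address items by ancestry: a path of steps along the tree."""
    return root.findall(path)

def by_id(root, id_value):
    """Address an item independently of the tree, by its 'id' value."""
    for element in root.iter():
        if element.get("id") == id_value:
            return element
    return None

def dangling_targets(root):
    """Return the xref targets that do not resolve to any item."""
    return [
        xref.get("target")
        for xref in root.iter("xref")
        if by_id(root, xref.get("target")) is None
    ]

titles = [t.text for t in by_path(DOC, "./chapter/title")]
unresolved = dangling_targets(DOC)
```

In this sketch the reference to `c1` is a non-hierarchical link that resolves, and the reference to `c9` is one that a link-checking step would report.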
Object-oriented schema languages validate that the structure of information items in an instance conforms to a set of constraints described using inheritance. These constraints can be useful when using XML in conjunction with object-oriented concepts used widely in modern programming languages (e.g. Java) and modern modeling languages (e.g. UML).
This Part is initially based on Part 1 of W3C XML Schema and the sections of Part 2 of W3C XML Schema describing the derivation of new simple types and describing the syntax for referring to primitive data types.
Structured information resources may need to be augmented, reduced, or have information items otherwise manipulated as part of the validation process. XML Document Type Definitions (DTDs) and HyTime include methods of defaulting attributes and information item renaming that characterize the changes that are sometimes necessary.
These highly limited micro-transformations can:
supply default values and/or structures for absent information items
substitute alternative names for information items
elide information items
This Part will be declarative in nature and will not attempt to provide totally general purpose transformation requirements.
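A minimal Python sketch of the three micro-transformations follows. The rules are plain data tables, loosely mirroring the declarative intent stated above, and a single function applies them. The function name, the table shapes, and the `memo` vocabulary are hypothetical, and the sketch covers default attribute values only, not default structures.

```python
import xml.etree.ElementTree as ET

def apply_micro_transformations(root, defaults, renames, elided):
    """Elide named items, supply default attributes, then rename items.

    defaults: {element name: {attribute name: default value}}
    renames:  {old element name: new element name}
    elided:   set of element names to remove, together with their content
    """
    # Elide: snapshot the tree first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in elided:
                parent.remove(child)
    # Defaults are keyed by the original name, so apply them before renaming.
    for element in root.iter():
        for name, value in defaults.get(element.tag, {}).items():
            element.attrib.setdefault(name, value)
        element.tag = renames.get(element.tag, element.tag)
    return root

source = ET.fromstring("<memo><para>Hi</para><draft-note>x</draft-note></memo>")
result = apply_micro_transformations(
    source,
    defaults={"memo": {"status": "final"}},
    renames={"para": "p"},
    elided={"draft-note"},
)
serialized = ET.tostring(result, encoding="unicode")
```

With these tables the serialized result is `<memo status="final"><p>Hi</p></memo>`. Note that `ElementTree` removes an elided element's trailing text along with it, which a real implementation would need to handle.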
Ed. Note: could this be titled "Syntax-oriented schema language" or "Legacy-oriented schema language"?
Existing structural constraints on, and defaulted values for, information items in a structured resource may already be described using XML Document Type Definition (DTD) syntax. These constraints could be interpreted in a way that accommodates namespaces. They need not be directly coupled to the XML document through a document type declaration.
This Part will address:
the semantics of the validation of a tree according to the syntax of a DTD
decoupling the specification of the DTD from the instance to be validated by the DTD
ISO/IEC JTC 1/SC 34 N-0275 - CD 19757-0 - DSDL Part 0 - Overview G. Ken Holman Copyright (C) 2001 SC34 $Date: 2001/12/11 18:46:03 $(UTC)