The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: December 15, 2003.

RELAX NG XML Schema Language Published as an ISO Standard (DSDL Part 2).

A posting from James Clark announces the publication of the RELAX NG specification as an ISO standard: Part 2, 'Regular-Grammar-Based Validation', of the multi-part ISO/IEC 19757 Document Schema Definition Language (DSDL). In Clark's vision, the RELAX NG schema language is "based firmly on the labelled-tree abstraction" and is distinguished from other XML schema languages by what it leaves out; in RELAX NG, "the syntax and minimal labelled-tree abstraction implicit in that syntax" are at the center of XML processing. According to the DSDL Part 2 abstract, ISO/IEC 19757-2:2003 "specifies RELAX NG, a schema language for XML. A RELAX NG schema specifies a pattern for the structure and content of an XML document. The pattern is specified by using a regular tree grammar. A RELAX NG schema is itself an XML document. ISO/IEC 19757-2:2003 specifies (1) when an XML document is a correct RELAX NG schema and (2) when an XML document is valid with respect to a correct RELAX NG schema." RELAX NG is supported by a growing collection of software tools, including validators, conversion utilities, code generators, and XML editors. ISO/IEC 19757-2:2003 is Part 2 of a planned ten-part ISO standard, which will also include "Rule-Based Validation: Schematron" (Part 3). The goal of ISO SC34/WG1 (Document Description and Processing Languages, Information Description) in developing Document Schema Definition Languages (DSDL) is "to create a framework within which multiple validation tasks of different types can be applied to an XML document in order to achieve more complete validation results than just the application of a single technology."

Bibliographic Information

ISO/IEC 19757-2:2003. Information technology -- Document Schema Definition Language (DSDL) -- Part 2: Regular-grammar-based validation -- RELAX NG. Edition #1 (Monolingual, available in English only). Produced by ISO/IEC JTC 1/SC 34, Document description and processing languages. 34 pages. Purchase price (PDF or paper): CHF 120,00 [approximately 77 € or $86 USD, 2003-12].

ISO/IEC FDIS 19757-2:2002(E). Information technology -- Document Schema Definition Languages (DSDL) -- Part 2: Regular grammar-based validation -- RELAX NG. Date: 2002-12-12. Final Draft International Standard (FDIS). Produced by ISO/IEC JTC 1/SC 34/WG 1 (Information Technology, Subcommittee SC 34, Document Description and Processing Languages, Information Description). 40 pages. Available for download (free).

DSDL Multi-part standard:

  • ISO/IEC 19757-1: Document Schema Definition Language (DSDL) Part 1: Overview
  • ISO/IEC 19757-2: Document Schema Definition Language (DSDL) Part 2: Regular-grammar-based validation: RELAX NG
  • ISO/IEC 19757-3: Document Schema Definition Language (DSDL) Part 3: Rule-based validation: Schematron
  • ISO/IEC 19757-4: Document Schema Definition Language (DSDL) Part 4: Namespace-based Validation Candidate Selection
  • ISO/IEC 19757-5: Document Schema Definition Language (DSDL) Part 5: Datatypes
  • ISO/IEC 19757-6: Document Schema Definition Language (DSDL) Part 6: Path-based integrity constraints
  • ISO/IEC 19757-7: Document Schema Definition Language (DSDL) Part 7: Character repertoire validation
  • ISO/IEC 19757-8: Document Schema Definition Language (DSDL) Part 8: Declarative document manipulation
  • ISO/IEC 19757-9: Document Schema Definition Language (DSDL) Part 9: Datatype- and namespace-aware DTDs
  • ISO/IEC 19757-10: Document Schema Definition Language (DSDL) Part 10: Validation management

Overview of RELAX NG (DSDL Part 2) Specification

Clause 5 of ISO/IEC 19757 "describes the data model, which is the abstraction of an XML document used throughout the rest of the document. Clause 6 describes the syntax of a RELAX NG schema. Clause 7 describes a sequence of transformations that are applied to simplify a RELAX NG schema, and also specifies additional requirements on a RELAX NG schema. Clause 8 describes the syntax that results from applying the transformations; this simple syntax is a subset of the full syntax. Clause 9 describes the semantics of a correct RELAX NG schema that uses the simple syntax; the semantics specify when an element is valid with respect to a RELAX NG schema. Clause 10 describes requirements that apply to a RELAX NG schema after it has been transformed into simple form. Finally, Clause 11 describes conformance requirements for RELAX NG validators...

This part of ISO/IEC 19757 uses EBNF notation to describe the full syntax and the simple syntax of RELAX NG... Although the EBNF notation is based on the XML representation of an RELAX NG schema as a sequence of characters, the grammar operates at the data model level. For example, although the syntax uses <text/>, an instance or schema can use <text></text> instead, because they both represent the same element at the data model level...

Data model: RELAX NG deals with XML documents representing both schemas and instances through an abstract data model. XML documents representing schemas and instances shall be well-formed in conformance with W3C XML and shall conform to the constraints of W3C XML-Names..." [from FDIS]
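The point about <text/> and <text></text> can be illustrated with a minimal Python sketch. It is not taken from the standard: the helper name to_model and the tuple representation are assumptions made for illustration, and the standard library's ElementTree parser stands in for the RELAX NG data model. Both spellings of the empty element reduce to the same value once the document is viewed as a tree rather than as a sequence of characters.

```python
import xml.etree.ElementTree as ET


def to_model(elem):
    """Reduce a parsed element to (name, attributes, children).

    Hypothetical helper: children are strings or nested tuples, and empty
    strings are dropped, so the way an empty element was written leaves
    no trace in the result.
    """
    children = []
    if elem.text:
        children.append(elem.text)
    for child in elem:
        children.append(to_model(child))
        if child.tail:
            children.append(child.tail)
    return (elem.tag, tuple(sorted(elem.attrib.items())), tuple(children))


# The two character-level spellings of an empty element.
empty_tag = to_model(ET.fromstring("<text/>"))
start_end = to_model(ET.fromstring("<text></text>"))

# At the tree level they are indistinguishable.
same_model = empty_tag == start_end
print(same_model)
```

A grammar written against this kind of tree never needs separate rules for the two spellings, which is the sense in which the EBNF in the standard "operates at the data model level."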

Quintessence of RELAX NG

James Clark, principal designer of RELAX NG, has written the foreword for the RELAX NG book being prepared by Eric van der Vlist. An excerpt captures some key elements of the RELAX NG design:

RELAX NG is based on a very clear vision of XML processing. XML is useful only because XML processing components can interoperate. Most XML processing components do not input and output arbitrary XML documents. To combine XML processing components reliably, it is therefore essential to be able to specify the inputs and outputs of XML processing components and to verify mechanically that components are behaving according to their specifications. The most important issue in doing this is choosing which abstraction of XML to use for specifying the inputs and outputs of XML processing components.

XML standardizes only a syntax, but if you constrain XML documents directly in terms of the sequences of characters that represent them, the syntactic noise is deafening. On the other hand, if you use an abstraction that incorporates concepts such as object-orientation that have no basis in the syntax, then you are coupling your XML processing components more tightly than necessary. What then is the right abstraction? The W3C XML Infoset Recommendation provides a menu of abstractions, but the items on the menu are of wildly differing importance.

I would argue that the right abstraction is a very simple one. The abstraction is a labelled tree of elements. Each element has an ordered list of children where each child is a Unicode string or an element. An element is labelled with a two-part name consisting of a URI and local part. Each element also has an unordered collection of attributes where each attribute has a two-part name, distinct from the name of the other attributes in the collection, and a value, which is a Unicode string. That is the complete abstraction. The core ideas of XML are this abstraction, the syntax of XML and how the syntax and abstraction correspond. If you understand this, then you understand XML...

RELAX NG is based firmly on the labelled-tree abstraction. All a RELAX NG schema does is provide a way to specify a class of XML documents in terms of this abstraction. Other schema languages, including W3C XML Schema, also provide this capability. Where RELAX NG differs from most other schema languages is in what it leaves out. It leaves out alternative abstractions of XML (such as W3C XML Schema's PSVI) that compete with the fundamental labelled-tree abstraction. It leaves out anything for transforming the document no matter how simple (such as default attributes). It leaves out anything used for parsing the document (such as entity declarations). It leaves out anything for mapping between XML and programming language data structures or relational databases. Just like XML itself, much of the advantage of RELAX NG stems from what it leaves out.

RELAX NG's vision of XML processing is not one which puts RELAX NG at the center of XML processing to the exclusion of other technologies. Rather the RELAX NG vision is one in which XML, or more precisely the syntax and minimal labelled-tree abstraction implicit in that syntax, is at the center of XML processing. The only thing you are locked into with RELAX NG is XML. This is why a lack of vendor support need not prevent you from using RELAX NG... [excerpted]
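The labelled-tree abstraction described in the excerpt (elements with a two-part name, an ordered list of children that are strings or elements, and an unordered collection of uniquely named attributes) is small enough to write down as a data structure. The following Python sketch is an illustration only, not code from Clark or from the standard: the class name Element, the helpers split_name and from_etree, and the example.com namespace are all assumptions, and ElementTree is used simply as a convenient parser.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# A two-part name: (namespace URI, local part). "" means no namespace.
Name = Tuple[str, str]


@dataclass
class Element:
    """One node of the labelled tree (hypothetical class for illustration)."""
    name: Name
    # Unordered attributes; dict keys enforce distinct two-part names.
    attributes: Dict[Name, str] = field(default_factory=dict)
    # Ordered children: each is a Unicode string or another Element.
    children: List[Union[str, "Element"]] = field(default_factory=list)


def split_name(tag: str) -> Name:
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return (uri, local)
    return ("", tag)


def from_etree(elem: ET.Element) -> Element:
    """Convert an ElementTree element into the labelled-tree form."""
    node = Element(
        name=split_name(elem.tag),
        attributes={split_name(k): v for k, v in elem.attrib.items()},
    )
    if elem.text:
        node.children.append(elem.text)
    for child in elem:
        node.children.append(from_etree(child))
        if child.tail:
            node.children.append(child.tail)
    return node


# Hypothetical sample document; the namespace URI is a placeholder.
doc = from_etree(ET.fromstring(
    '<note xmlns="http://example.com/ns" lang="en">Hello <b>world</b>!</note>'
))
print(doc.name)
print(doc.attributes)
print([c if isinstance(c, str) else c.name for c in doc.children])
```

Nothing in this structure records entity declarations, default attribute values, or type annotations, which mirrors the list of things the excerpt says RELAX NG leaves out.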

What is RELAX NG?

John Cowan's tutorial "RELAX NG: DTDs on Warp Drive" is available online. According to its abstract:

The RELAX NG schema language [is] an alternative schema language for XML. RELAX NG allows easy and intuitive descriptions of just what is and what is not allowed in an XML document. It is simple enough to learn in a few hours, and rich and flexible enough to support the design and validation of every kind of document from the very simple to the very complex. Once RELAX NG's concepts have crossed the blood-brain barrier, you will never be able to take any other schema language quite seriously again.

RELAX NG is an evolution and generalization of XML DTDs, and it shares the same basic paradigm. Based on experience with SGML and XML, RELAX NG both adds and subtracts features from DTDs. XML DTDs can be automatically converted into RELAX NG. Experts in designing SGML and XML DTDs will find their skills transfer easily to designing RELAX NG. Design patterns that are used in XML DTDs can be used in RELAX NG. Overall, RELAX NG is much more mature (and it is possible to have a higher degree of confidence in its design) than it would be if it were based on a completely new and different paradigm.

A major goal of RELAX NG is that it be easy to learn and easy to use. Schemas can be patterned after the structure of the documents they describe, but need not be: definitions can be composed from other definitions in a variety of ways. Attributes and elements are treated uniformly as much as possible. RELAX NG supports pluggable simple datatype libraries, from a trivial one that describes only strings and tokens to the full XML Schema Part 2; new ones can be readily designed and built as needed. RELAX NG provides full support for namespaces. RELAX NG provides two interconvertible syntaxes, an XML one for processing, and a compact non-XML one for human authoring.

RELAX NG has been standardized in OASIS by the RELAX NG Technical Committee, and is Part 2 of ISO DSDL, the Document Schema Definition Languages umbrella... [adapted]

Robin Cover, Editor: