A framework for combining more than one schema language

Date:      Thu, 04 Jan 2001 10:37:35 +0900
From:      Murata Makoto <mura034@attglobal.net>
To:        reldeve@egroups.com
Subject:   [reldeve] A framework for combining more than one schema language

I wrote the attached message a long time ago, but it is still a basis for the design of RELAX Namespace. This proposal is based on Takahashi-san's very old proposal at http://www.uic.edu/~cmsmcq/tech/n1873.html.

My motivation for fragment validation is summarized by Dave Hollander in the very first TR for XML namespaces.

http://www.w3.org/TR/1998/NOTE-xml-names-0119#sec-example
http://www.w3.org/TR/1998/NOTE-xml-names-0119#validation

Cheers,

Makoto

A framework for combining more than one schema language
Murata Makoto (makoto@tpost1.netspace.or.jp)
Date: Tue, 08 Jun 1999 03:20:35 +0900

1. Introduction

The XML Schema specification describes a complicated language. One reason is that it tries to address different requirements from different applications at the same time. If we attempt to address the requirements of RDF-schema authors as well, the language will become even more complicated.

Instead of pursuing a gigantic "unified" language or a patchwork, this memo suggests an alternative approach. It presents a framework in which different modules can be written in different languages. It is hoped that such a framework significantly simplifies each language without reducing interoperability.

2. Original proposal for modules

Probably, the best way to introduce this framework is to go back to the original proposal.

TAKAHASHI Toru [1] proposed to introduce modules to SGML. A module is a sequence of markup declarations as well as import declarations and export declarations.

Modules are invoked by internal or external DTD subsets. Modules do not nest. That is, modules do not invoke other modules.

An export declaration is a sequence of element types. Such exported element types can be used outside of the defining module.

Unlike "import" in the XSDL draft, an import declaration in the original proposal merely enumerates parameter entity names. These parameter entities are referenced within the module but are not declared there.

When invoking a module, a DTD provides the definition of imported parameter entities. The defining text contains element types declared in the internal or external DTD subset as well as those exported from other modules.

A DTD also attaches a namespace prefix to each module. All markup declarations in the module are assumed to be in this namespace.

Observe that modules are almost black boxes. They interact with the outer world only by importing parameter entities and exporting element types.

3. A multi-language framework

Since it minimizes the interaction of modules, the original proposal opens the door for multi-language frameworks. We only have to allow more than one language for writing modules.

3.1 Syntax

Since we are trying to define a new syntax, we cannot use internal or external DTD subsets. Instead, we introduce module compositors.

A schema consists of a module compositor as well as one or more modules. Different modules may be written in different languages.

1) Module

A module roughly corresponds to a sequence of markup declarations as well as import/export declarations. A MathML module provides all markup declarations required for writing mathematical expressions. An I18N module provides a set of common attributes (e.g., xml:lang and xml:bidi) for I18N.

A module is described in a module definition language (MDL). There may exist more than one MDL. The first element of a module must identify the MDL by specifying a URI. The syntax for import and export is common to all MDLs, but different MDLs may use different syntax for defining markup declarations.

A module is referenced by a module compositor. A module does not reference a module compositor or other modules.

2) Module compositor

A module compositor is described by a module compositor language (MCL). There exists only one MCL.

A module compositor references one or more modules. These modules are combined, and they collectively provide a single schema. For example, we might want to combine a TEI module, some XHTML modules, and an I18N module to form a schema.

A module compositor references a module by specifying the location URI of this module. The module compositor also specifies a URI as the namespace of the module. Namespace URIs declared in XML instances and namespace URIs specified by module compositors must coincide.

For information factoring, a module compositor may reference other module compositors. Referenced module compositors are merely expanded. (Hence, this provides "include".)
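The expansion of referenced module compositors can be sketched as follows. This is only a minimal illustration in Python, not a proposed syntax: the names ModuleRef, Compositor, and expand are hypothetical, and the sketch assumes that each namespace URI is bound to exactly one module, which the memo implies but does not state.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleRef:
    location: str   # location URI of the module
    namespace: str  # namespace URI the compositor assigns to the module


@dataclass
class Compositor:
    modules: list = field(default_factory=list)   # ModuleRef entries
    includes: list = field(default_factory=list)  # other Compositor objects


def expand(compositor, _seen=None):
    """Flatten a compositor and everything it includes into {namespace: location}."""
    seen = set() if _seen is None else _seen
    table = {}
    if id(compositor) in seen:        # guard against circular includes
        return table
    seen.add(id(compositor))
    refs = list(compositor.modules)
    for included in compositor.includes:
        for namespace, location in expand(included, seen).items():
            refs.append(ModuleRef(location, namespace))
    for ref in refs:
        # Assumption: one module per namespace, so a second binding is an error.
        if table.setdefault(ref.namespace, ref.location) != ref.location:
            raise ValueError(f"namespace {ref.namespace!r} is bound to two modules")
    return table
```

The resulting table is what a validator would consult to find the module for a namespace URI declared in an instance.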

3) Export and Import

An export declaration exports element types by listing their names. Furthermore, it may also export a named attribute group. Exported element types and named attribute groups can be used by the module compositor.

An import declaration imports model groups and an attribute group. The module compositor provides the definitions of these model groups and of the attribute group. Those element types and attribute groups which are exported from other modules can be used in these definitions.
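As a rough illustration of how a module compositor could check these declarations, the Python sketch below verifies that every imported model group has a definition and that each definition uses only exported element types. The names Module and check_bindings and the shape of the bindings table are assumptions made for this example. Attribute groups are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    namespace: str
    exported_elements: set = field(default_factory=set)
    imported_model_groups: set = field(default_factory=set)


def check_bindings(modules, bindings):
    """Check the compositor's definitions of imported model groups.

    bindings maps (namespace, group name) to a list of
    (namespace, element type name) pairs supplied by the compositor.
    Returns a list of problem descriptions; an empty list means consistent.
    """
    by_namespace = {m.namespace: m for m in modules}
    problems = []
    for module in modules:
        for group in sorted(module.imported_model_groups):
            if (module.namespace, group) not in bindings:
                problems.append(f"{module.namespace}: no definition for {group!r}")
    for (namespace, group), members in bindings.items():
        for target_namespace, name in members:
            target = by_namespace.get(target_namespace)
            if target is None or name not in target.exported_elements:
                problems.append(
                    f"{namespace}: {group!r} uses unexported {name!r} "
                    f"from {target_namespace}"
                )
    return problems
```

For instance, a document module importing a model group named inline.extra could have it defined as the math element type exported from a mathematics module.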

3.2 Validation

Validation consists of the following three steps:
  step 1: decomposition of documents into element subtrees 
          and attribute groups,
  step 2: validation of element subtrees, and
  step 3: validation of attribute groups

Step 1) decomposition of documents into element subtrees and attribute groups

In this step, a document is decomposed into element subtrees and attribute groups. Each element subtree belongs to only one namespace, and is validated by step 2. Each attribute group also belongs to only one namespace, and is validated by step 3.

An element is detached from its parent element only when they belong to different namespaces. Then, a placeholder element is instead created, and is used in step 2.

An attribute is detached from its containing element only when they belong to different namespaces. Then, a placeholder attribute is instead created, and is used in step 3.

Partial validation is done by skipping validation of some element subtrees or attribute groups. For example, we might want to validate mathematical expressions without validating the top-level document structure.
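The decomposition of elements can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration under stated assumptions: the placeholder tag name and its "ns" attribute are hypothetical (the memo does not define a placeholder syntax), and attributes are copied untouched here rather than split into groups.

```python
import xml.etree.ElementTree as ET

PLACEHOLDER = "placeholder"  # hypothetical tag name; the memo does not name it


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag such as '{urn:x}name'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def decompose(root):
    """Split a document into (namespace, subtree) pairs, one namespace per subtree."""
    subtrees = []

    def detach(elem):
        slot = len(subtrees)
        subtrees.append(None)          # reserve the slot to keep document order
        clone = copy(elem)
        clone.tail = None              # the tail text belongs to the old parent
        subtrees[slot] = (namespace_of(elem.tag), clone)

    def copy(elem):
        clone = ET.Element(elem.tag, dict(elem.attrib))
        clone.text, clone.tail = elem.text, elem.tail
        for child in elem:
            if namespace_of(child.tag) == namespace_of(elem.tag):
                clone.append(copy(child))
            else:
                # The child is in another namespace: leave a placeholder
                # behind and start a new subtree for it.
                marker = ET.SubElement(
                    clone, PLACEHOLDER, {"ns": namespace_of(child.tag)}
                )
                marker.tail = child.tail
                detach(child)
        return clone

    detach(root)
    return subtrees
```

Partial validation then amounts to filtering the returned list, for example keeping only the pairs whose namespace is that of the mathematics module.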

Step 2) validation of element subtrees

Each element subtree belongs to a single namespace. This subtree is validated against the module to which the namespace resolves. The top-level element of the subtree is of an exported element type, but other elements may or may not be exported.

A placeholder element created in step 1 matches imported model groups.
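Step 2 can be pictured as a dispatch on the namespace of each subtree. The Python sketch below assumes that every module is represented by a set of exported element type names together with an opaque validation function. That representation, and the names validate_subtrees and local_name, are hypothetical. What the validation function does inside is up to the MDL in which the module is written.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' part from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def validate_subtrees(subtrees, modules, skip=()):
    """Validate (namespace, root) pairs against {namespace: (exported, is_valid)}.

    Namespaces listed in skip are not validated (partial validation).
    Returns a list of problem descriptions.
    """
    problems = []
    for namespace, root in subtrees:
        if namespace in skip:
            continue
        if namespace not in modules:
            problems.append(f"{root.tag}: no module for namespace {namespace!r}")
            continue
        exported, is_valid = modules[namespace]
        if local_name(root.tag) not in exported:
            problems.append(f"{root.tag}: top-level element type is not exported")
        elif not is_valid(root):
            problems.append(f"{root.tag}: rejected by its module")
    return problems


# Usage with a hand-built subtree and a toy module that accepts any
# non-empty 'math' element.
math = ET.Element("{urn:math}math")
ET.SubElement(math, "{urn:math}mi")
toy_modules = {"urn:math": ({"math"}, lambda root: len(root) > 0)}
problems = validate_subtrees([("urn:math", math)], toy_modules)
```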

Step 3) validation of attribute groups

Each attribute group belongs to a single namespace. Its attributes are validated against the module to which the namespace resolves. Each attribute belongs to an exported attribute group.

A placeholder attribute created in step 1 matches imported attribute groups.
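The attribute side of steps 1 and 3 can be sketched in the same way. The Python code below makes several assumptions that the memo leaves open: unqualified attributes are treated as belonging to their element, one placeholder attribute is created per detached namespace, and the placeholder naming ("placeholder:" followed by the namespace URI) is purely illustrative.

```python
import xml.etree.ElementTree as ET


def namespace_of(name):
    """Return the namespace URI of an ElementTree name such as '{urn:x}name'."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def split_attributes(elem):
    """Return (kept, groups) for one element.

    kept holds the attributes that stay on the element plus placeholders;
    groups maps each foreign namespace to its detached attributes.
    """
    own = namespace_of(elem.tag)
    kept, groups = {}, {}
    for name, value in elem.attrib.items():
        namespace = namespace_of(name)
        if namespace in ("", own):
            kept[name] = value                       # stays with the element
        else:
            groups.setdefault(namespace, {})[name] = value
    for namespace in groups:
        kept["placeholder:" + namespace] = ""        # hypothetical placeholder
    return kept, groups


def check_group(group, exported_names):
    """Return the attributes of a group that the module does not export."""
    return sorted(n for n in group if n.rsplit("}", 1)[-1] not in exported_names)
```

With this split, an xml:lang attribute on an element of a document module is detached and checked against the I18N module, while the document module only sees the placeholder.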

4. Summary

The original proposal for modules scales very nicely to a multi-language framework. We can combine different languages, each of which is best suited to some application. For example, we can use the RDF schema language to describe RDF schemata, and use another language for document structural schemata. We can even combine a top-level document structure module, an RDF module, and a phrase-level document structure module.

Compared to a gigantic unified language, this framework is apparently easier to implement and learn. Nevertheless, we can create a schema which takes advantage of more than one schema language. We only have to use module boundaries as the points at which schema languages are switched.

We might want to separate our language into two separate languages, for example one without archetypes and refinement, and one without model groups, attribute groups, and complicated content models. This framework allows for the coexistence of two such languages.

References

[1] Takahashi Toru. "A Proposal to Introduce "Module" Structures into SGML", 12 November 1996, document ISO/IEC JTC1/SC18/WG8 N1873. A copy in Word format is available from the SC34 Web site (at http://www.ornl.gov/sgml/wg8/document/1873.doc); copies in TEI Lite and HTML are available from MSM's Web site (at http://www.uic.edu/~cmsmcq/tech/n1873.sgml and http://www.uic.edu/~cmsmcq/tech/n1873.html).

Prepared by Robin Cover for The XML Cover Pages archive.
