[Archive copy mirrored from: http://www.uic.edu/~cmsmcq/tech/n1873.html]

ISO/IEC JTC1/SC18/WG8 N1873


ISO/IEC JTC1/SC18/WG8

Document Processing and Related Communication

Document Description and Processing Languages

12 November 1996

JTC1.18.15.1


1 Introduction

Designing a large, complex DTD is a very difficult job. One reason for this difficulty is SGML's restriction on name spaces.

For element type names, SGML allows only one name space per document. This restriction means that if you intend to design a new DTD, you have to be familiar with all the element types you wish to use in it, and you have to select their names very carefully to avoid name conflicts.

This lack of modularity makes it difficult to combine separately designed declaration sets (parts of DTDs) into a complete DTD. For example, if you want to use pre-defined declaration sets for "tables" and "math expressions" together to construct your own "report" DTD, you have to examine whether there are any name conflicts between them. If there are, you have to modify several declarations. This restriction on name spaces makes it impossible to treat these declaration sets as public (read-only) texts. Similar problems may occur with parameter entity names.

To solve these problems, we propose to introduce the concept of "Module" into SGML.

Author's Note: This document is a revised version of my previous proposal, WG8 N1677. In the discussion at the SGML RG meeting during the WG8 meeting held in Alexandria (April 1994), and in discussions with Japanese WG8 members, some problems with the previous proposal were pointed out. These are:

The new proposal shown in this document is still incomplete and needs further refinement, but I believe the issues listed above are solved in principle.

2 Concept of "Module"

2.1 Separation of Name Spaces

A module is a set of declarations which has its own name spaces. Each module has a name space for element type names and has another one for parameter entity names.

All names declared in a module are registered in the corresponding name space of that module. In principle (with some exceptions explained below), all names declared in a module are hidden from the outside, and all names declared outside a module are hidden from the inside of the module.

With this separation of name spaces, most name-conflict problems can be avoided.

2.2 Import and Export of Names

The following lines (exp.1) show an example of a module.

 
    <!IMPORT (mg1, mg2)>
    <!EXPORT (a, c)>
    <!ELEMENT a  - O  (b, c) >
    <!ELEMENT b  - O  (d | e)* >
    <!ELEMENT c  - O  (%mg1;) >
    <!ELEMENT d  - O  (#PCDATA) >
    <!ELEMENT e  - O  (%mg2;)+ >

exp.1: An Example of a Module

The declaration in the first line is called an "import declaration". The declaration in the second line is called an "export declaration".

2.3 Import Declaration

An import declaration declares the names of parameter entities whose values are imported from the outside. In exp.1, the names "mg1" and "mg2" are declared in an import declaration, so these names can be used as parameter entity names without defining the corresponding entity values. Each module may have zero or one import declaration. As with ordinary parameter entity declarations, the import declaration has to appear before the first reference to any of the declared names.

Unlike ordinary parameter entities, the names declared in an import declaration can be referenced only in the content models of element types defined in the module. References to those names in any other place are prohibited.

Author's Note: I believe that this restriction is essential for distinguishing the element type names imported through an import declaration from the names of element types defined inside the module. The alternative way to enable this distinction is to introduce a new construct for referring to the parameters imported from the outside.

2.4 Export Declaration

Conversely, an export declaration declares the element type names which are made visible to the outside. All element type names which are not declared in an export declaration are hidden from the outside. Each module must have exactly one export declaration. The export declaration has to declare one or more element type names to export.

3 Module Declaration

The construct "module declaration" declares and invokes a module. The following lines (exp.2) show a typical example of a module declaration.

 
    <!MODULE mod ("(#PCDATA)", "(x, y)")
                    SYSTEM "moddef1.mod" >

exp.2: An Example of a Module Declaration

This module declaration invokes a module defined in an external entity, referred to by the system identifier "moddef1.mod", under the module name "mod". Furthermore, this module declaration gives two parameter values ("(#PCDATA)" and "(x, y)") to the invoked module. Inside the module, these parameter values become the values of the parameter entities declared in the import declaration.

The number of parameter values must be the same as the number of parameter entity names specified in the import declaration of the invoked module. Each character string given as a parameter value must satisfy the syntax of "model group".

Author's Note: The second restriction is also required to distinguish the element type names imported through the import declaration from the names of element types defined in the module.
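
The two restrictions above can be illustrated with a small Python model. This is only a sketch, not part of the proposal: the function names are hypothetical, the pairing of values to names by position is an assumption (it agrees with the expansion shown for exp.3), and the "model group" test is a rough approximation of the SGML production rather than a full parser.

```python
def looks_like_model_group(text):
    """Rough stand-in for the SGML "model group" production:
    an opening parenthesis whose matching close ends the string,
    optionally followed by one occurrence indicator (?, * or +)."""
    text = text.strip()
    if text and text[-1] in "?*+":
        text = text[:-1]
    if not text.startswith("("):
        return False
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # The first group must close exactly at the end.
                return i == len(text) - 1
    return False


def bind_parameters(import_names, values):
    """Pair each imported parameter entity name with a parameter value."""
    if len(import_names) != len(values):
        raise ValueError(
            f"module imports {len(import_names)} name(s) "
            f"but {len(values)} value(s) were given"
        )
    for value in values:
        if not looks_like_model_group(value):
            raise ValueError(f"not a model group: {value!r}")
    # Assumption: values are matched to names by position.
    return dict(zip(import_names, values))


bindings = bind_parameters(["mg1", "mg2"], ["(#PCDATA)", "(x, y)"])
```

With the values of exp.2, "mg1" is bound to "(#PCDATA)" and "mg2" to "(x, y)"; giving only one value, or a value such as "x" that is not parenthesized, raises an error in this sketch.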

Instead of referring to an external entity which contains the definition of a module, a module declaration may contain the definition of the invoked module inside the declaration itself. If the external entity "moddef1.mod" has the content shown in exp.1, the following declaration (exp.3) has the same effect as exp.2.

 
    <!MODULE mod ("(#PCDATA)", "(x, y)") [
      <!IMPORT (mg1, mg2)>
      <!EXPORT (a, c)>
      <!ELEMENT a  - O  (b, c) >
      <!ELEMENT b  - O  (d | e)* >
      <!ELEMENT c  - O  (%mg1;) >
      <!ELEMENT d  - O  (#PCDATA) >
      <!ELEMENT e  - O  (%mg2;)+ >
    ]>

exp.3: Another Example of a Module Declaration

As a result of interpreting the module declaration shown above, the element types which refer to the imported parameters will have the following definitions. (Redundant parentheses are deleted.)

 
    <!ELEMENT c  - O  (#PCDATA) >
    <!ELEMENT e  - O  (x, y)+ >
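
The expansion above can be reproduced mechanically. The following Python sketch is an illustration under stated assumptions, not the proposal's definition: the helper names are hypothetical, references are substituted textually, and "redundant parentheses" is read narrowly as an outer pair that wraps exactly one inner group with nothing between the two pairs.

```python
import re


def matching_paren(text, start):
    """Return the index of the ')' that closes the '(' at `start`."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses")


def expand_imports(model, bindings):
    """Replace each %name; reference with its bound parameter value."""
    return re.sub(r"%([^;\s]+);", lambda m: bindings[m.group(1)], model)


def drop_redundant_parens(model):
    """Remove an outer pair of parentheses that only wraps one inner group."""
    i = 0
    while i < len(model) - 1:
        if model[i] == "(" and model[i + 1] == "(":
            inner = matching_paren(model, i + 1)
            outer = matching_paren(model, i)
            if outer == inner + 1:
                # Keep the inner group, drop the outer pair.
                model = model[:i] + model[i + 1:outer] + model[outer + 1:]
                continue  # re-check the same position
        i += 1
    return model


bindings = {"mg1": "(#PCDATA)", "mg2": "(x, y)"}
for name, model in [("c", "(%mg1;)"), ("e", "(%mg2;)+")]:
    print(name, drop_redundant_parens(expand_imports(model, bindings)))
# prints:
# c (#PCDATA)
# e (x, y)+
```

Note that a purely textual expansion like this does not record where a name came from; the semantic distinction between imported and locally defined names described in the proposal would need additional bookkeeping.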

The element type names imported from the outside through the import declaration are semantically distinguished from the element type names defined in the module. In this case, even if the module contains a declaration of element type "x", an instance of the element type "x" defined in the module can't appear in the content of element "e". Only instances of the element type "x" defined outside the module can appear in the content of an instance of element type "e".

Module declarations may appear in a document type declaration subset, intermixed with other declarations (element type declarations, attribute definition list declarations, general entity declarations, etc.).

In a document type declaration subset, the element type names exported from a module invoked by a module declaration in that subset may be used in the content models of element type declarations in the subset. In principle, these element type names must be qualified by a "module prefix". A module prefix is constructed from a module name and the module prefix connector "::". Module prefixes may be omitted if there are no ambiguities.

In the following example (exp.4), the element types "a" and "c", defined in module "mod" and exported to the outside, are used to construct the content models of element types "doc" and "p" respectively.

 
    <!DOCTYPE doc [
      <!ELEMENT doc   - -   (fm, (p | mod::a)+, bm) >
      <!ELEMENT p     - O   (q | hp | mod::c)* >
      <!ELEMENT x     - O   (#PCDATA) >
      <!ELEMENT y     - O   (#PCDATA) >
      ...
      <!MODULE mod ("(#PCDATA)", "(x, y)") [
        <!IMPORT (mg1, mg2)>
        <!EXPORT (a, c)>
        <!ELEMENT a  - O  (b, c, x) >
        <!ELEMENT b  - O  (d | e)* >
        <!ELEMENT c  - O  (%mg1;) >
        <!ELEMENT d  - O  (#PCDATA) >
        <!ELEMENT e  - O  (%mg2;)+ >
        <!ELEMENT x  - O  (#PCDATA) >
      ]>
      ...
    ]>

exp.4: An Example Using the Element Type Names Defined in a Module

All other element types defined in the module (in this case, "b", "d", "e" and "x") are hidden from the outside. They can't be used outside the module.
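
The visibility rule can be modelled in a few lines of Python. This is a minimal sketch with hypothetical names (the class `Module` and its methods are not part of the proposal); it records only the element type names of the module in exp.4 and derives which of them are hidden.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    declared: frozenset  # element type names declared inside the module
    exported: frozenset  # names listed in the export declaration

    def __post_init__(self):
        # The export declaration has to name one or more element types.
        if not self.exported:
            raise ValueError("export declaration must name at least one element type")

    def hidden(self):
        """Names that cannot be used outside the module."""
        return self.declared - self.exported

    def visible_outside(self, element_name):
        return element_name in self.exported


mod = Module(
    name="mod",
    declared=frozenset({"a", "b", "c", "d", "e", "x"}),
    exported=frozenset({"a", "c"}),
)
print(sorted(mod.hidden()))  # prints ['b', 'd', 'e', 'x']
```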

4 Modules in the Document Instances

In principle, all instances of element types defined in a module are marked up by start-tags and end-tags which have an "extended GI". An extended GI is an element type name qualified by a module prefix (defined above). For example, an instance of a document which has the document type declaration shown in exp.4 may be as follows.

 
<doc>
 <fm>...
  ...
 <mod::a>
   <mod::b>
     <mod::d>content of "d" defined in the module</mod::d>
     <mod::e>
       <x>content of "x" defined outside of the module</x>
       <y>content of "y" defined outside of the module</y>
     <mod::d>content of "d" defined in the module</mod::d>
     ...
   </mod::b>
   <mod::c>content of "c" defined in the module</mod::c>
   <mod::x>content of "x" defined in the module</mod::x>
 </mod::a>
 <bm>...
</doc>

exp.4: An Example of a Document Instance That Contains an Instance of a Module

If no element type with the same element type name is declared outside of the module (in other words, there are no ambiguities), the module prefix in an extended GI may be omitted. Therefore, the example above can also be represented as follows.

 
<doc>
 <fm>...
  ...
 <a>
   <b>
     <d>content of "d" defined in the module</d>
     <e>
       <x>content of "x" defined outside of the module</x>
       <y>content of "y" defined outside of the module</y>
     <d>content of "d" defined in the module</d>
     ...
   </b>
   <c>content of "c" defined in the module</c>
   <mod::x>content of "x" defined in the module</mod::x>
 </a>
 <bm>...
</doc>

exp.5: Another Representation of the Document Instance shown in exp.4
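
The omission rule can be sketched in Python as follows. The function names are hypothetical, and one point goes beyond the text: the proposal speaks only of names declared outside the module, so treating a clash with another invoked module as an ambiguity is an assumption (it is consistent with the two "table" modules in section 5.2). Only the element types explicitly declared in exp.4 are listed as outer names.

```python
CONNECTOR = "::"  # the module prefix connector


def split_gi(gi):
    """Split an extended GI into (module name or None, element type name)."""
    prefix, sep, name = gi.partition(CONNECTOR)
    return (prefix, name) if sep else (None, gi)


def may_omit_prefix(module, name, outer_names, modules):
    """True if `name` from `module` is unambiguous without its prefix."""
    if name in outer_names:
        return False
    # Assumption: a clash with another invoked module is also an ambiguity.
    return not any(
        name in names for other, names in modules.items() if other != module
    )


outer_names = {"doc", "p", "x", "y"}
modules = {"mod": {"a", "b", "c", "d", "e", "x"}}

print(split_gi("mod::x"))                                 # ('mod', 'x')
print(may_omit_prefix("mod", "a", outer_names, modules))  # True
print(may_omit_prefix("mod", "x", outer_names, modules))  # False
```

This matches exp.5, where every tag except "mod::x" is written without its prefix.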

5 More Concrete (but still much simplified) Examples

5.1 A Module for Math Expressions

The following is an example of a module which has definitions for math expressions. This module exports two element type names ("f" for inline formulae, and "df" for displayed formulae).

 
    
    <!EXPORT (f, df) >
    <!ENTITY % math.expr "#PCDATA|power|root|sq|sqrt|frac|..." >
    <!ELEMENT f      - O  (%math.expr;) >
    <!ELEMENT df     - O  (%math.expr;)+ >
    <!ELEMENT power  - O  (coef, of) >
    <!ELEMENT root   - O  (coef, of) >
    <!ELEMENT coef   O O  (%math.expr;) >
    <!ELEMENT of     - O  (%math.expr;) >
    ...

If this module is invoked from a DTD with the module name "math", and if the names "f" and "root" are used for other element types in the DTD, the document instance may be written as follows.

 
    <p>
    bra bra bra <math::f><power>3<of>(a + b)</math::f> bra bra...
    bra bra bra...
    <df>
    x = <math::root>4<of>c - d</math::root>
    </df>
    <p>
    bra bra bra bra...

5.2 Two "Table" Modules in a Document

If there are two separately designed modules for tables (with different structures), and you intend to use them together to construct your own DTD, you can do so even if both modules use the same element type name (for example, "table") for the top-level element of tables.

Suppose that these modules are defined as follows.

In an external entity whose system identifier is "table-a.mod":

 
    <!IMPORT (cell-cont) >
    <!EXPORT (table) >
    <!ELEMENT table   - -   (row)+ >
    <!ELEMENT row     - O   (cell)+ >
    <!ELEMENT cell    - O   (%cell-cont;)* >

In an external entity whose system identifier is "table-b.mod":

 
    <!IMPORT (field-cont) >
    <!EXPORT (table) >
    <!ELEMENT table   - -   (field)+ >
    <!ATTLIST table
                nrow   NUMBER  #REQUIRED
                ncell  NUMBER  #REQUIRED
    >
    <!ELEMENT field   - O   (%field-cont;)*>
    <!ATTLIST field
                r  NUMBER  #REQUIRED
                c  NUMBER  #REQUIRED
    >

An example of a document whose DTD contains both table modules may be written as follows.

 
    <!DOCTYPE doc [
      <!ENTITY % para  "p|lq|ul|ol|cell-tab::table|field-tab::table">
      <!ENTITY % ph    "#PCDATA|q|hp">
      <!MODULE cell-tab  ("(%ph;)*") SYSTEM "table-a.mod" >
      <!MODULE field-tab ("(%ph;)*") SYSTEM "table-b.mod" >
      <!ELEMENT doc   - -  (fm, body, bm) >
      ...
      <!ELEMENT h1    - O   (h1t, (%para;)*, h2*) >
      <!ELEMENT h2    - O   (h2t, (%para;)*, h3*) >
      ...
    ]>
    <doc>...
    <h1><h1t>bra bra bra...</h1t>
    <p>
    bra bra bra...
    </p>
    <cell-tab::table>
     <row>
      <cell>123<cell>456<cell><hp>0</hp></cell>
     </row>
     <row>
      <cell>abc<cell>def<cell><hp>Z</hp></cell>
     </row>
     ...
    </cell-tab::table>
    <p>
    bra bra bra...
    </p>
    <field-tab::table nrow=5 ncell=3>
     <field r=1 c=1>123<field r=1 c=2>456<field r=1 c=3><hp>0</hp></field>
     <field r=2 c=1>abc<field r=2 c=2>def<field r=2 c=3><hp>Z</hp></field>
     ...
    </field-tab::table>
    ...

6 Conclusion

As shown above, DTD designers can use modules as building blocks for their own DTDs. This enables the structured design of DTDs and reduces the cost of designing complex DTDs. It also enables DTD designers to use simple and natural element type names without worrying about name conflicts.

Furthermore, the syntax proposed in this document for modules requires no changes to the interpretation of any existing constructs. It can therefore be introduced into SGML while fully preserving upward compatibility.




Convenor: Dr. James D. Mason
Integrated Information Management and Dissemination Systems,
Information Management Services
Oak Ridge National Laboratory
Bldg. 2506, M.S. 6302, P.O. Box 2008
Oak Ridge, Tennessee 37831-6302 U.S.A.
Telephone, +1 615 574 6973; Fax, +1 615 574 6983; E-mail, masonjd@ornl.gov