[This local archive copy mirrored from the canonical site: http://www.ornl.gov/sgml/wg8/document/1987.htm; links may not have complete integrity, so use the canonical document at this URL if possible.]

ISO/IEC JTC1/WG4 N1987

TITLE:	A Proposal to Introduce "Module" Structures into SGML
SOURCE:	Toru Takahashi
PROJECT:	JTC1.18.15.1
PROJECT EDITOR:	Charles F. Goldfarb
STATUS:	Personal Contribution to the SGML RG in the WG4 Meeting at Paris
PURPOSE:	For Discussion about the Revision of SGML
SUPERSEDES:	WG8 N1873
REFERENCES:	Charles F. Goldfarb, "Draft Module Proposal for SGML Revision", 4 February 1998
DATE:	12 May 1998

1 Introduction

Designing a large, complex DTD is a very difficult job. One source of this difficulty is SGML's restriction on namespaces.

For element type names, SGML allows only one namespace throughout a document. This restriction means that if you intend to design a new DTD, you have to be familiar with all the element types you wish to use to construct it, and you have to choose their names very carefully to avoid name conflicts.

This lack of modularity makes it difficult to combine separately designed declaration sets (DTD fragments) into a complete DTD. For example, if you want to use predefined declaration sets for "tables" and "math expressions" together to construct your own "report" DTD, you have to check whether there are any name conflicts between them. If there are, you have to modify several declarations. This restriction on namespaces therefore makes it impossible to treat such declaration sets as public (read-only) texts. Similar problems may occur with parameter entity names.

To solve these problems, I propose to introduce the concept of "Module" into SGML.

2 Concept of "Module"

2.1 A Module as an External Parameter Entity

A module is declared as an external parameter entity. The newly introduced reserved name "MODULE" indicates that the entity is to be referred to as a module. For example, the following declaration declares the content of the storage object located at "some location" as a module entity.

<!ENTITY % module1 SYSTEM "some location" MODULE>
%module1;
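A processor would first have to recognize the new keyword. The following Python sketch shows one possible way to do this for the simple declarations used in this paper. The function name `parse_entity_decl` and the dictionary it returns are hypothetical. Public identifiers, entity types that take a notation name, and other forms of entity declaration are deliberately not handled. The sketch also enforces the restriction, stated with production [108] in section 5, that MODULE may only be given for a parameter entity.

```python
import re

# Only the simple form used in this paper is recognized (an assumption):
#   <!ENTITY [% ]name SYSTEM "system identifier" [keyword]>
ENTITY_DECL = re.compile(
    r'<!ENTITY\s+(?P<pe>%\s+)?(?P<name>[A-Za-z][A-Za-z0-9.-]*)\s+'
    r'SYSTEM\s+"(?P<sysid>[^"]*)"(?:\s+(?P<keyword>[A-Z]+))?\s*>'
)


def parse_entity_decl(text):
    """Return the parts of an external entity declaration, flagging modules."""
    m = ENTITY_DECL.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unsupported entity declaration: {text!r}")
    is_parameter = m.group("pe") is not None
    is_module = m.group("keyword") == "MODULE"
    if is_module and not is_parameter:
        # Production [108]: MODULE only for a parameter entity name.
        raise ValueError("MODULE may be specified only for a parameter entity")
    return {"name": m.group("name"), "system_id": m.group("sysid"),
            "parameter": is_parameter, "module": is_module}


print(parse_entity_decl('<!ENTITY % module1 SYSTEM "some location" MODULE>'))
# {'name': 'module1', 'system_id': 'some location', 'parameter': True, 'module': True}
```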

2.2 Namespaces

2.2.1 Element Types, General Entities, Notations

If a module contains declarations of element types, general entities or notations, the names of these objects are identified from outside the module by "qualified names". A qualified name is a single name token prefixed (qualified) by one or more module entity names. The sequence of module entity names unambiguously specifies the context in which the named object is defined. (This context is specified relative to the context in which the qualified name appears.)

For example, if the internal subset of your document contains the following lines,

<!ENTITY % module1 SYSTEM "some location" MODULE>
%module1;

<!ELEMENT foo - - (a|b)*>

and if the module entity "module1" contains the following line,

<!ELEMENT foo - - (#PCDATA)>

then the latter element type will be identified by the qualified name "module1:foo" in your document, and distinguished from the element type "foo" declared in the internal subset. In this case, you can put the following declaration in the internal subset.

<!ELEMENT doc - - (foo, module1:foo)>

Note 1: the ":" in a qualified name represents the newly introduced delimiter qns (qualified name separator). Throughout this document, I will use ":" as the delimiter string for qns.

Note 2: A qualified name can appear wherever a simple name is permitted.

Note 3: If there is no ambiguity, simple (unqualified) names may be used to refer to objects defined in modules. For example, if module1 in the above example defines the element type "bar" and no "bar" element type is defined outside the module, the simple name "bar" can be used instead of the qualified name "module1:bar".

When two or more modules are nested, an object defined in an inner module is identified by a qualified name prefixed with the sequence of module entity names, in order from outermost to innermost. For example, suppose that the internal subset contains a reference to the module entity "module1" as follows,

<!ENTITY % module1 SYSTEM "some location" MODULE>
%module1;

and module1 in turn contains a reference to another module as follows,

<!ENTITY % module2 SYSTEM "another location" MODULE>
%module2;

In this case, the element type "x" defined in module2 can be specified by the qualified name "module1:module2:x" in your document, and by the qualified name "module2:x" in module1. If module1 also defines an element type "x", it will be specified by the names "module1:x" (in your document) and "x" (in module1 itself), so there will be no name conflict.
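The naming rules of this section can be summarized in a small Python sketch. It is an illustration only, not a specification. The `Context` class and the `resolve` function are hypothetical, and only element type names are modelled. The handling of simple names follows one reading of Note 3: a simple name is accepted if it is declared in the current context itself, or if exactly one module nested within that context declares it. The paper does not define "ambiguity" more precisely than this.

```python
from dataclasses import dataclass, field

QNS = ":"  # qualified name separator, as used throughout this paper


@dataclass
class Context:
    """Declarations made directly in one context (document or module entity)."""
    names: set = field(default_factory=set)      # element type names declared here
    modules: dict = field(default_factory=dict)  # module entity name -> Context


def definitions(ctx, path=()):
    """Yield (module path, name) for every name in ctx and its nested modules."""
    for name in ctx.names:
        yield path, name
    for module_name, sub in ctx.modules.items():
        yield from definitions(sub, path + (module_name,))


def resolve(ctx, ref):
    """Resolve a simple or qualified name relative to the context ctx."""
    *path, name = ref.split(QNS)
    if path:                                  # qualified: walk outermost to innermost
        target = ctx
        for module_name in path:
            target = target.modules[module_name]
        if name not in target.names:
            raise KeyError(ref)
        return tuple(path), name
    if name in ctx.names:                     # declared in this context itself
        return (), name
    matches = [d for d in definitions(ctx) if d[1] == name]
    if len(matches) != 1:                     # undefined, or ambiguous (Note 3)
        raise LookupError(f"{name!r} is undefined or ambiguous here")
    return matches[0]


module2 = Context({"x"})
module1 = Context({"foo", "bar", "x"}, {"module2": module2})
doc = Context({"doc", "foo"}, {"module1": module1})

print(resolve(doc, "module1:foo"))        # (('module1',), 'foo')
print(resolve(doc, "foo"))                # ((), 'foo')
print(resolve(doc, "bar"))                # (('module1',), 'bar')
print(resolve(doc, "module1:module2:x"))  # (('module1', 'module2'), 'x')
print(resolve(module1, "module2:x"))      # (('module2',), 'x')
```

With these declarations, resolving the simple name "x" from the document raises an error, because both module1 and module2 declare "x".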

2.2.2 Parameter Entities

On the other hand, a module entity defines a distinct namespace for parameter entities. That is, parameter entity names declared outside the module entity are not recognized within it, and vice versa. For example, if the internal subset of a document contains the following lines,

<!ENTITY % model "(a|b)*">
<!ELEMENT foo - - %model;>

<!ENTITY % module1 SYSTEM "some location" MODULE>
%module1;

and if the module entity "module1" contains the following lines,

<!ENTITY % model "(c,d)">
<!ELEMENT bar - - %model;>

then the element type "foo" declared in the internal subset will have the content model "(a|b)*", and the element type "bar" declared in the module entity "module1" will have the content model "(c,d)".
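Separate parameter entity namespaces can be modelled by giving each context its own table of parameter entities and expanding each reference only against the table of the context in which it appears. The Python sketch below does this for the example above. The function `expand_pe` is hypothetical. It assumes every reference is closed by ";" and does not rescan replacement text for further references.

```python
import re

# pero "%", a name, then refc ";" (simplification: refc is always present)
PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);")


def expand_pe(text, pe_table):
    """Expand %name; using only the parameter entities of the current context."""
    def lookup(match):
        name = match.group(1)
        if name not in pe_table:
            raise KeyError(f"parameter entity {name!r} is not declared in this context")
        return pe_table[name]
    return PE_REF.sub(lookup, text)


# One table per context: the document and module1 do not share entries.
document_pes = {"model": "(a|b)*"}
module1_pes = {"model": "(c,d)"}

print(expand_pe("<!ELEMENT foo - - %model;>", document_pes))  # <!ELEMENT foo - - (a|b)*>
print(expand_pe("<!ELEMENT bar - - %model;>", module1_pes))   # <!ELEMENT bar - - (c,d)>
```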

Note 4: I am not sure how attribute names should be handled. Currently, I think that attribute names should not be qualified, because every element type has its own namespace for attribute names.

2.3 Passing "parameter"s to Modules

One or more "parameters" can be specified in a reference to a module entity. These parameters are passed to the module and can be referred to from within it. For example, if your document contains the following lines, the two parameter strings "a,b" and "c" are passed to the module "module1".

<!ENTITY % module1 SYSTEM "some location" MODULE>
%module1("a,b", "c");

Note 5: Calling these strings "parameters" may not be appropriate, because SGML uses the word "parameter" with a different meaning. If so, a more appropriate term should be invented.

Note 6: A parameter string is represented as a "parameter literal".

These parameters can be referred to from within the module entity "module1" as follows.

<!ELEMENT foo - O ($1;)>
<!ELEMENT bar - O (#PCDATA|$2;)*>

"$1;" and "$2;" in the above example are "numbered parameter references". A numbered parameter reference "$n;" refers to the n-th parameter string passed to the module. In this case, the element type "foo" will have the content model "(a,b)", and "bar" will have the content model "(#PCDATA|c)*".
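Viewed purely as text substitution, a numbered parameter reference can be sketched in Python as below. The function name `substitute` is hypothetical. The sketch deliberately ignores the question of which context the substituted names belong to. It only shows how "$n;" selects the n-th parameter string.

```python
import re

# npro "$", a number, then refc ";" (the delimiter strings used in this paper)
NUMBERED_REF = re.compile(r"\$(\d+);")


def substitute(text, params):
    """Replace each $n; with the n-th parameter string (1-based)."""
    def lookup(match):
        n = int(match.group(1))
        if not 1 <= n <= len(params):
            raise IndexError(f"${n}; has no corresponding parameter")
        return params[n - 1]
    return NUMBERED_REF.sub(lookup, text)


params = ["a,b", "c"]  # as passed by %module1("a,b", "c");
print(substitute("<!ELEMENT foo - O ($1;)>", params))          # <!ELEMENT foo - O (a,b)>
print(substitute("<!ELEMENT bar - O (#PCDATA|$2;)*>", params))  # <!ELEMENT bar - O (#PCDATA|c)*>
```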

When a numbered parameter reference appears in a module, the names in the corresponding parameter string are qualified in the context in which the reference to the module appears (not in the context of the referenced module). In other words, parameter strings are passed together with their own namespaces. In this case, the interpreted result is equivalent to writing the following lines in your document.

<!ELEMENT module1:foo - O (a,b)>
<!ELEMENT module1:bar - O (#PCDATA|c)*>

A numbered parameter reference can appear wherever a parameter entity reference is permitted. As a result, a numbered parameter reference can itself be passed to a nested module as part of a parameter string. For example, suppose that your document refers to a module with a parameter string as follows,

<!ENTITY % module1 SYSTEM "some location" MODULE>
%module1("a,b");

and the module entity "module1" contains the following lines,

<!ENTITY % module2 SYSTEM "some location" MODULE>
%module2("($1;)|c");

If module2 contains the element type declaration shown below,

<!ELEMENT bar - O ($1;)*>

then the result is equivalent to writing the following declaration in your document.

<!ELEMENT module1:module2:bar - O ((a,b)|module1:c)*>
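The qualification rule for parameter strings can be checked against the worked example with a short Python sketch. This is an illustration under stated assumptions, not part of the proposal. The function name `qualify` is hypothetical. Every name written inside a module is assumed to be declared in that module or in a module nested within it. Reserved names such as #PCDATA are assumed never to be qualified. Each parameter string is rewritten in its caller's context before it is passed on, so the text substituted for a numbered parameter reference already carries its caller's qualification.

```python
import re

QNS = ":"  # qualified name separator
# A numbered parameter reference ($n;) or a name token. Reserved names such
# as #PCDATA start with "#"; ":" is allowed so a qualified name is one token.
TOKEN = re.compile(r"\$(\d+);|#?[A-Za-z][A-Za-z0-9.:-]*")


def qualify(text, ctx, params):
    """Rewrite text written in context ctx as it would be seen from the document.

    ctx:    tuple of module entity names, outermost first (() = the document)
    params: parameter strings passed to ctx, already rewritten for the document
    """
    def rewrite(m):
        if m.group(1) is not None:            # $n;: caller's names, kept as-is
            return params[int(m.group(1)) - 1]
        word = m.group(0)
        if word.startswith("#"):              # reserved name: never qualified
            return word
        return QNS.join(ctx + (word,))
    return TOKEN.sub(rewrite, text)


# %module1("a,b"); in the document, then %module2("($1;)|c"); in module1.
doc_params = [qualify(p, (), []) for p in ["a,b"]]
m1_params = [qualify(p, ("module1",), doc_params) for p in ["($1;)|c"]]
ctx2 = ("module1", "module2")
print(f"<!ELEMENT {qualify('bar', ctx2, m1_params)} - O "
      f"{qualify('($1;)*', ctx2, m1_params)}>")
# <!ELEMENT module1:module2:bar - O ((a,b)|module1:c)*>
```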

3 Usage Examples

3.1 Self Contained (Closed) Modules

The following is an example of a module entity that contains declarations for math expressions.

<!ENTITY % math.expr "#PCDATA|power|root|sq|sqrt|frac|...">

<!ELEMENT f      - O (%math.expr;)   -- inline formula -->
<!ELEMENT df     - O (%math.expr;)+ -- display formula -->
<!ELEMENT power  - O (coef, of)>
<!ELEMENT root   - O (coef, of)>
<!ELEMENT coef   O O (%math.expr;)>
<!ELEMENT of     - O (%math.expr;)>
...

This module entity is self-contained (it includes all the declarations needed to describe math expressions) and can be referred to from a document without passing any parameter strings. Here is an example:

<!DOCTYPE doc SYSTEM "location for main DTD" [
<!ENTITY % math SYSTEM "some location" MODULE>
%math;
...
]>
<doc>
...
<p>
blah blah blah <math:f><math:power>3<math:of>(a + b)</math:f> blah blah...
blah blah blah...
<math:df>
x = <math:root>4<math:of>c - d</math:root>
</math:df>
<p>
blah blah blah blah...
...
</doc>

3.2 Parameterized (Open) Modules

The following is an example of a module entity that contains declarations for a very simple table.

<!ENTITY % title-model "($1;)">
<!ENTITY % cell-model "($2;)">

<!ELEMENT table - - (title, body)>
<!ELEMENT title - O %title-model;>
<!ELEMENT body   - - (row)+>
<!ELEMENT row    - O (cell)+>
<!ELEMENT cell   - O %cell-model;>

This module defines the framework of a table structure, but does not define what kinds of elements can appear in the content of a title element or a cell element, because these matters should be left open to the users of the module. Users can specify the content models of the element types left open in the module by passing parameter strings. For example, the following reference to this module,

<!ENTITY % ph "#PCDATA|q|em">
<!ENTITY % tbl SYSTEM "some location" MODULE>
%tbl("#PCDATA", "(%ph;)*");

will result in the following interpretation:

<!ELEMENT tbl:table - - (tbl:title, tbl:body)>
<!ELEMENT tbl:title - O (#PCDATA)>
<!ELEMENT tbl:body   - - (tbl:row)+>
<!ELEMENT tbl:row    - O (tbl:cell)+>
<!ELEMENT tbl:cell   - O ((#PCDATA|q|em)*)>

A parameterized module behaves as a template that partially defines its structure. Its final structure can vary according to the parameter values, even within a single document. For example, suppose that your document contains two references to the same table module in its internal subset as follows:

<!ENTITY % tbl1 SYSTEM "some location" MODULE>
%tbl1("#PCDATA", "(%ph;)*");
<!ENTITY % tbl2 SYSTEM "some location" MODULE>
%tbl2("#PCDATA", "(%para;)*");

In this case, a single entity is referred to as two table modules with different structures. In the first invocation, it has a structure similar to the previous example, but in the second invocation, it has a structure that permits paragraphs, lists, and tables themselves in the content of a table cell element. The interpreted result is as follows:

<!ELEMENT tbl1:table - - (tbl1:title, tbl1:body)>
<!ELEMENT tbl1:title - O (#PCDATA)>
<!ELEMENT tbl1:body   - - (tbl1:row)+>
<!ELEMENT tbl1:row    - O (tbl1:cell)+>
<!ELEMENT tbl1:cell   - O ((#PCDATA|q|em)*)>

<!ELEMENT tbl2:table - - (tbl2:title, tbl2:body)>
<!ELEMENT tbl2:title - O (#PCDATA)>
<!ELEMENT tbl2:body   - - (tbl2:row)+>
<!ELEMENT tbl2:row    - O (tbl2:cell)+>
<!ELEMENT tbl2:cell   - O ((p|list|tbl1:table|tbl2:table)*)>
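The two invocations can be reproduced with a short Python sketch that treats the table module as a template. It is an illustration resting on several assumptions, not a definition:

- `TABLE_MODULE`, `qualify` and `invoke` are hypothetical names.
- The module's internal parameter entities %title-model; and %cell-model; are inlined as "($1;)" and "($2;)".
- Only module references made directly from the document are handled, so names in the parameter strings stay unqualified.
- The paper does not define %para;. The value used here is hypothetical and is chosen only so that the output matches the interpreted result shown above.

```python
import re

TOKEN = re.compile(r"\$(\d+);|#?[A-Za-z][A-Za-z0-9.:-]*")  # $n; or a name
PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);")         # %name;

# The table module as (element type, minimization, content model) triples;
# %title-model; and %cell-model; are inlined as ($1;) and ($2;).
TABLE_MODULE = [
    ("table", "- -", "(title, body)"),
    ("title", "- O", "($1;)"),
    ("body", "- -", "(row)+"),
    ("row", "- O", "(cell)+"),
    ("cell", "- O", "($2;)"),
]


def qualify(text, prefix, params=()):
    """Prefix names with the module entity name; substitute $n; unchanged."""
    def rewrite(m):
        if m.group(1) is not None:            # caller's text: not re-qualified
            return params[int(m.group(1)) - 1]
        word = m.group(0)
        return word if word.startswith("#") else f"{prefix}:{word}"
    return TOKEN.sub(rewrite, text)


def invoke(module, entity_name, raw_params, document_pes):
    """Interpret a module reference made in the document's internal subset."""
    # Parameter literals are interpreted in the caller's (the document's)
    # context: its parameter entities are expanded and its names stay as-is.
    params = [PE_REF.sub(lambda m: document_pes[m.group(1)], p) for p in raw_params]
    return [f"<!ELEMENT {qualify(name, entity_name)} {mini} "
            f"{qualify(model, entity_name, params)}>"
            for name, mini, model in module]


pes = {"ph": "#PCDATA|q|em",
       "para": "p|list|tbl1:table|tbl2:table"}  # hypothetical value for %para;
print("\n".join(invoke(TABLE_MODULE, "tbl1", ["#PCDATA", "(%ph;)*"], pes)))
print("\n".join(invoke(TABLE_MODULE, "tbl2", ["#PCDATA", "(%para;)*"], pes)))
```

The same module body yields two distinct sets of element types because every name it declares is prefixed with the module entity name used in the reference.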

4 Definitions Relating to Modules

module entity: An external parameter entity that is validly declared as a module declaration set.
modular name: The name of an element type, notation or general entity that is declared within a module entity. There are two types: simple and qualified.
simple modular name: A modular name that consists of a single name token.
qualified name: A modular name that consists of a single name token prefixed by one or more module entity names.

5 Added / Modified Productions

Note 7: New productions and modified portions are represented in italic style in the following list.

[60] parameter entity reference =
                pero,
                name group?,
                name,
                reference parameter group?,
                reference end

[65] ps =
                s | Ee |
                parameter entity reference |
                numbered parameter reference

[67] replaceable parameter data =
                ( parameter entity reference |
                  character reference |
                  numbered parameter reference |
                  data character |
                  Ee )*

[70] ts =
                s | Ee |
                parameter entity reference |
                numbered parameter reference

[xx] numbered parameter reference =
               npro,
               number,
               reference end

where: npro is the numbered parameter reference open delimiter string.

[xx] reference parameter group =
               grpo,
               ts*,
               parameter literal,
               ( ts*,
                 connector,
                 ts*,
                 parameter literal )*,
               ts*,
               grpc

[108] external entity specification =
                external identifier,
                (ps+, (entity type | "MODULE"))?

where: MODULE asserts that the entity text is a module declaration set; it can be specified only if the entity name is a parameter entity name.

[108.1] module declaration set =
               (entity declaration |
                element type declaration |
                attribute definition list declaration |
                notation declaration |
                ds)*

[108.2] modular name =
                name | qualified name

where: "name" is a simple modular name.

[108.3] qualified name =
                module entity name, qns, modular name

where: "module entity name" is the name of the module in which the "modular name" is declared. qns is the qualified name separator delimiter string.
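As a cross-check of productions [60] and [xx] reference parameter group, the following Python sketch parses a parameter entity reference that may carry a reference parameter group. It returns the entity name and the list of parameter strings. The function name `parse_reference` is hypothetical, and the sketch makes these simplifying assumptions:

- Name groups are not supported.
- The reference end must be refc (";").
- ts is treated as plain white space.
- Parameter literals are returned without expanding any references they contain.
- The connector between literals is accepted but ignored, because the proposal does not say what the choice of connector means.

```python
import re

NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")
LITERAL = re.compile(r"\s*(\"[^\"]*\"|'[^']*')\s*")  # ts*, parameter literal, ts*
CONNECTOR = re.compile(r"[,|&]")


def parse_reference(text):
    """Parse a parameter entity reference per production [60] (simplified)."""
    if not text.startswith("%"):                       # pero
        raise ValueError("expected pero '%'")
    m = NAME.match(text, 1)
    if m is None:
        raise ValueError("expected an entity name")
    name, pos, params = m.group(), m.end(), []
    if text.startswith("(", pos):                      # grpo
        pos += 1
        while True:
            lit = LITERAL.match(text, pos)
            if lit is None:
                raise ValueError(f"expected a parameter literal at offset {pos}")
            params.append(lit.group(1)[1:-1])          # drop lit/lita delimiters
            pos = lit.end()
            if text.startswith(")", pos):              # grpc
                pos += 1
                break
            if CONNECTOR.match(text, pos) is None:
                raise ValueError(f"expected a connector at offset {pos}")
            pos += 1
    if not text.startswith(";", pos):                  # reference end (refc only)
        raise ValueError("expected reference end ';'")
    return name, params


print(parse_reference('%module1("a,b", "c");'))  # ('module1', ['a,b', 'c'])
print(parse_reference('%module2("($1;)|c");'))   # ('module2', ['($1;)|c'])
```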

6 Open issues

Should passing parameter strings be restricted to references to module entities? If not, should parameter strings also be allowed in references to general entities?