[This local archive copy mirrored from the canonical site: http://www.ornl.gov/sgml/wg8/document/1987.htm; links may not have complete integrity, so use the canonical document at this URL if possible.]
TITLE: | A Proposal to Introduce "Module" Structures into SGML |
SOURCE: | Toru Takahashi |
PROJECT: | JTC1.18.15.1 |
PROJECT EDITOR: | Charles F. Goldfarb |
STATUS: | Personal Contribution to the SGML RG in the WG4 Meeting at Paris |
PURPOSE: | For Discussion about the Revision of SGML |
SUPERSEDES: | WG8 N1873 |
REFERENCES: | Charles F. Goldfarb, "Draft Module Proposal for SGML Revision", 4 February 1998 |
DATE: | 12 May 1998 |
Designing a large, complex DTD is a very difficult job. One source of this difficulty is SGML's restriction on namespaces.
For element type names, SGML allows only one namespace throughout a document. This restriction means that, to design a new DTD, you have to be familiar with all the element types you wish to use in it, and you have to choose their names very carefully to avoid name conflicts.
This lack of modularity makes it difficult to combine separately designed declaration sets (DTD fragments) into a complete DTD. For example, if you want to use predefined declaration sets for "tables" and "math expressions" together to construct your own "report" DTD, you have to check whether there are any name conflicts between them. If there are, you have to modify several declarations. This namespace restriction therefore makes it impossible to treat such declaration sets as public (read-only) texts. Similar problems can occur with parameter entity names.
To solve these problems, I propose to introduce the concept of "Module" into SGML.
A module is declared as an external parameter entity. The newly introduced reserved name "MODULE" indicates that the entity is referred to as a module. For example, the following declaration declares the content of the storage object located at "some location" as a module entity.
<!ENTITY % module1
SYSTEM "some location" MODULE>
%module1;
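The only syntactic change needed to declare a module is the MODULE keyword in the external entity specification. As an illustration, the following minimal Python sketch recognizes such declarations. It is not part of the proposal. `EntityDecl` and `parse_entity_decl` are hypothetical names, and the sketch assumes a deliberately narrow syntax: a double-quoted SYSTEM identifier, no PUBLIC identifier, and no other entity types.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical, simplified pattern: an external entity declaration with an
# optional MODULE keyword. Real SGML declarations allow much more than this.
ENTITY_DECL = re.compile(
    r'<!ENTITY\s+(?P<pero>%\s+)?(?P<name>[A-Za-z][A-Za-z0-9.-]*)\s+'
    r'SYSTEM(?:\s+"(?P<sysid>[^"]*)")?'
    r'(?P<module>\s+MODULE)?\s*>'
)


@dataclass
class EntityDecl:
    name: str
    system_id: str | None
    is_module: bool


def parse_entity_decl(text: str) -> EntityDecl | None:
    """Return the parsed declaration, or None if this sketch does not recognize it."""
    m = ENTITY_DECL.fullmatch(text.strip())
    if m is None:
        return None
    is_module = m.group("module") is not None
    # MODULE may be specified only for a parameter entity (declared with "%").
    if is_module and m.group("pero") is None:
        raise ValueError(f"MODULE is not allowed for general entity {m.group('name')!r}")
    return EntityDecl(m.group("name"), m.group("sysid"), is_module)


decl = parse_entity_decl('<!ENTITY % module1\nSYSTEM "some location" MODULE>')
print(decl)
```

Declarations without MODULE are still accepted and flagged as ordinary entities, so the keyword only adds information.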
If a module contains definitions of element types, general entities, or notations, these objects are identified from outside the module by "qualified names". A qualified name is a single name token prefixed (qualified) by one or more module entity names. The sequence of module entity names unambiguously specifies the context in which the named object is defined. (This context is specified relative to the context in which the qualified name appears.)
For example, if the internal subset of your document contains the following lines,
<!ENTITY % module1
SYSTEM "some location" MODULE>
%module1;
<!ELEMENT foo - - (a|b)*>
and if the module entity "module1" contains the following line,
<!ELEMENT foo - - (#PCDATA)>
the latter element type will be identified by the qualified name "module1:foo" in your document, and will be distinguished from the element type "foo" defined in the internal subset. In this case, you can put the following declaration in the internal subset.
<!ELEMENT doc - - (foo, module1:foo)>
Note 1: The ":" in a qualified name represents the newly introduced delimiter qns (qualified name separator). Throughout this document, I use ":" as the delimiter string for qns.
Note 2: A qualified name can appear wherever a simple name is permitted.
Note 3: If there is no ambiguity, simple (unqualified) names may be used to refer to objects defined in modules. For example, if module1 in the above example defines the element type "bar" and no "bar" element type is defined outside the module, the simple name "bar" can be used instead of the qualified name "module1:bar".
When modules are nested, an object defined in an inner module can be identified by a qualified name prefixed with a sequence of module entity names, in order from outermost to innermost. For example, suppose that the internal subset contains a reference to the module entity "module1" as follows,
<!ENTITY % module1
SYSTEM "some location" MODULE>
%module1;
and module1 also contains a reference to another module as follows.
<!ENTITY % module2
SYSTEM "another location" MODULE>
%module2;
In this case, the element type "x" defined in module2 can be specified
with the qualified name "module1:module2:x" in your document, and with
the qualified name "module2:x" in module1. If module1 also defines an
element type "x", it will be specified with the names "module1:x"
(in your document) and "x" (in module1 itself), so there will be no
name conflicts.
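These naming rules, together with Note 3, can be modelled as a tree of naming contexts. The Python sketch below is an illustration only. `Module`, `visible_names`, and `resolve` are hypothetical names, and only element type names are modelled. The handling of Note 3 is one possible reading of "if there is no ambiguity": a name declared in the current context wins, because the document's own "foo" is meant in "(foo, module1:foo)". Otherwise, a simple name is accepted only when exactly one inner declaration carries it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Module:
    """One naming context: the document itself or a module entity."""
    elements: set[str] = field(default_factory=set)
    modules: dict[str, Module] = field(default_factory=dict)


def visible_names(ctx: Module, prefix: str = "") -> set[str]:
    """All element type names visible from ctx, qualified relative to ctx."""
    names = {prefix + name for name in ctx.elements}
    for module_name, inner in ctx.modules.items():
        names |= visible_names(inner, prefix + module_name + ":")
    return names


def resolve(ctx: Module, qname: str) -> str:
    """Resolve a (possibly qualified) name used in ctx; return its qualified form."""
    if ":" in qname:
        if qname not in visible_names(ctx):
            raise LookupError(f"undefined qualified name {qname!r}")
        return qname
    if qname in ctx.elements:  # declared in this very context
        return qname
    # Note 3 as read here: a unique declaration in some inner module.
    matches = sorted(q for q in visible_names(ctx) if q.rsplit(":", 1)[-1] == qname)
    if len(matches) != 1:
        raise LookupError(f"{qname!r} is undefined or ambiguous: {matches}")
    return matches[0]


# The nested example: the document uses module1, which in turn uses module2.
module2 = Module(elements={"x"})
module1 = Module(elements={"foo", "x", "bar"}, modules={"module2": module2})
document = Module(elements={"foo"}, modules={"module1": module1})

print(sorted(visible_names(document)))
print(resolve(document, "bar"))       # simple name, unique
print(resolve(module1, "module2:x"))  # qualified relative to module1
```

Both "foo" declarations and all three "x" declarations remain distinguishable. A bare "x" used in the document is rejected as ambiguous.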
On the other hand, a module entity defines a distinct namespace for parameter entities. That is, parameter entity names defined outside the module entity are not recognized within it, and vice versa. For example, if the internal subset of a document contains the following lines,
<!ENTITY % model
"(a|b)*">
<!ELEMENT foo - - %model;>
<!ENTITY % module1
SYSTEM "some location" MODULE>
%module1;
and if the module entity "module1" contains the following lines,
<!ENTITY % model
"(c,d)">
<!ELEMENT bar - - %model;>
the element type "foo" defined in the internal subset will have the content model "(a|b)*", and the element type "bar" defined in the module entity "module1" will have the content model "(c,d)".
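The separation of parameter entity namespaces can be sketched by giving each context its own entity table. In the Python sketch below, `EntityNamespace` is a hypothetical class. For brevity it does not re-scan replacement text for further references, which real SGML expansion would do.

```python
from __future__ import annotations

import re

PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);")


class EntityNamespace:
    """Parameter entities of one context (the document or one module entity).

    Each module entity gets a fresh, unrelated namespace. It neither sees the
    caller's parameter entities nor exports its own.
    """

    def __init__(self) -> None:
        self.entities: dict[str, str] = {}

    def define(self, name: str, text: str) -> None:
        # As in SGML, the first declaration of an entity name takes effect.
        self.entities.setdefault(name, text)

    def expand(self, text: str) -> str:
        # A KeyError signals a reference to an entity not declared in this namespace.
        return PE_REF.sub(lambda m: self.entities[m.group(1)], text)


document = EntityNamespace()
document.define("model", "(a|b)*")

module1 = EntityNamespace()  # deliberately not linked to `document`
module1.define("model", "(c,d)")

print("<!ELEMENT foo - -", document.expand("%model;") + ">")
print("<!ELEMENT bar - -", module1.expand("%model;") + ">")
```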
Note 4: I am not sure how to handle attribute names. Currently, I think that attribute names should not be qualified, because every element type has its own namespace for attribute names.
One or more "parameters" can be specified in the entity reference to a module entity. These parameters are passed to the module and can be referred to from within it. For example, if your document contains the following lines, the two parameter strings "a,b" and "c" are passed to the module "module1".
<!ENTITY % module1
SYSTEM "some location" MODULE>
%module1("a,b",
"c");
Note 5: Calling these strings "parameters" may not be appropriate, because SGML already uses the word "parameter" with a different meaning. If so, a more appropriate term should be invented.
Note 6: A parameter string is represented as a "parameter literal".
These parameters can be referred to from the module entity "module1" as follows.
<!ELEMENT foo - O ($1;)>
<!ELEMENT bar - O (#PCDATA|$2;)*>
"$1;" and "$2;" in the above example are "numbered parameter references". A numbered parameter reference "$n;" refers to the n-th parameter string passed to the module. In this case, the element type "foo" will have the content model "(a,b)", and "bar" will have the content model "(#PCDATA|c)*".
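Taken purely textually, a numbered parameter reference is positional substitution. The minimal Python sketch below shows only that step. It deliberately ignores how names inside the strings are qualified. `substitute` is a hypothetical helper, and "$" and ";" are assumed as the npro and reference end strings, as in the examples.

```python
from __future__ import annotations

import re

# "$" (npro), a number, then ";" (reference end), as in the examples above.
NUMBERED_REF = re.compile(r"\$(\d+);")


def substitute(text: str, params: list[str]) -> str:
    """Replace each numbered parameter reference "$n;" with the n-th parameter string."""
    def repl(m: re.Match) -> str:
        n = int(m.group(1))
        if not 1 <= n <= len(params):
            raise IndexError(f"${n}; has no matching parameter ({len(params)} passed)")
        return params[n - 1]
    return NUMBERED_REF.sub(repl, text)


params = ["a,b", "c"]                        # from %module1("a,b", "c");
print(substitute("($1;)", params))           # (a,b)
print(substitute("(#PCDATA|$2;)*", params))  # (#PCDATA|c)*
```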
When a numbered parameter reference appears in a module, the names in the corresponding parameter string are qualified in the context where the reference to the module appears (not in the context of the referenced module). In other words, parameter strings are passed together with their own namespaces. In this case, the interpreted result is equivalent to writing the following lines in your document.
<!ELEMENT module1:foo - O (a,b)>
<!ELEMENT module1:bar - O (#PCDATA|c)*>
A numbered parameter reference can appear wherever a parameter entity reference is permitted. As a result, a numbered parameter reference can itself be passed to a nested module as part of a parameter string. For example, suppose that your document refers to a module with a parameter string as follows,
<!ENTITY % module1
SYSTEM "some location" MODULE>
%module1("a,b");
and the module entity "module1" contains the following lines.
<!ENTITY % module2
SYSTEM "another location" MODULE>
%module2("($1;)|c");
If module2 contains the element type declaration shown below,
<!ELEMENT bar - O ($1;)*>
it is equivalent to writing the following declaration in your document.
<!ELEMENT module1:module2:bar - O ((a,b)|module1:c)*>
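The qualification rule can be sketched by interpreting each parameter string in the context of the reference that passes it, before the string reaches the module. In the Python sketch below, `interpret` is a hypothetical helper. Its tokenizer only knows names, "#"-prefixed reserved names such as #PCDATA, numbered parameter references, and single punctuation characters. Text written inside a module is rewritten into document-level form by prefixing the module path to every name. A numbered parameter reference is replaced by the caller's already interpreted string, which is not qualified again.

```python
from __future__ import annotations

import re

# Numbered parameter reference | reserved name | (possibly qualified) name | other char.
TOKEN = re.compile(
    r"\$(\d+);"
    r"|(#[A-Z]+)"
    r"|([A-Za-z][A-Za-z0-9.-]*(?::[A-Za-z][A-Za-z0-9.-]*)*)"
    r"|(.)",
    re.S,
)


def interpret(text: str, path: list[str], params: list[str]) -> str:
    """Rewrite text written inside the module at `path` into document-level form."""
    prefix = "".join(module + ":" for module in path)
    out = []
    for m in TOKEN.finditer(text):
        number, reserved, name, other = m.groups()
        if number is not None:
            out.append(params[int(number) - 1])  # already qualified by the caller
        elif reserved is not None:
            out.append(reserved)
        elif name is not None:
            out.append(prefix + name)
        else:
            out.append(other)
    return "".join(out)


# Document: %module1("a,b");  (string interpreted in the document context)
p1 = [interpret("a,b", [], [])]
# module1: %module2("($1;)|c");  (string interpreted in module1's context)
p2 = [interpret("($1;)|c", ["module1"], p1)]
# module2: <!ELEMENT bar - O ($1;)*>
name = interpret("bar", ["module1", "module2"], p2)
model = interpret("($1;)*", ["module1", "module2"], p2)
print(f"<!ELEMENT {name} - O {model}>")
# prints: <!ELEMENT module1:module2:bar - O ((a,b)|module1:c)*>
```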
The following is an example of a module entity that contains declarations for math expressions.
<!ENTITY % math.expr "#PCDATA|power|root|sq|sqrt|frac|...">
<!ELEMENT f
- O (%math.expr;) -- inline formula -->
<!ELEMENT df
- O (%math.expr;)+ -- display formula -->
<!ELEMENT power
- O (coef, of)>
<!ELEMENT root
- O (coef, of)>
<!ELEMENT coef
O O (%math.expr;)>
<!ELEMENT of
- O (%math.expr;)>
...
This module entity is self-contained (it includes all the declarations needed to
describe math expressions), and can be referred to from a document without
passing parameter strings. Here is an example:
<!DOCTYPE doc SYSTEM "location for main DTD" [
<!ENTITY % math SYSTEM "some location" MODULE>
%math;
...
]>
<doc>
...
<p>
blah blah blah <math:f><math:power>3<math:of>(a + b)</math:f> blah blah...
blah blah blah...
<math:df>
x = <math:root>4<math:of>c
- d</math:root>
</math:df>
<p>
blah blah blah blah...
...
</doc>
The following is an example of a module entity that contains declarations for a very simple table.
<!ENTITY % title-model
"($1;)">
<!ENTITY % cell-model
"($2;)">
<!ELEMENT table
- - (title, body)>
<!ELEMENT title
- O %title-model;>
<!ELEMENT body
- - (row)+>
<!ELEMENT row
- O (cell)+>
<!ELEMENT cell
- O %cell-model;>
This module defines the framework of a table structure, but does not define what kinds of elements can appear in the content of a title element or a cell element, because these matters should be left open to the users of this module. Users can specify the content models of the element types left open in the module by passing parameter strings. For example, the following reference to this module,
<!ENTITY % ph
"#PCDATA|q|em">
<!ENTITY % tbl
SYSTEM "some location" MODULE>
%tbl("#PCDATA",
"(%ph;)*");
will result in the following interpretation:
<!ELEMENT tbl:table
- - (tbl:title, tbl:body)>
<!ELEMENT tbl:title
- O (#PCDATA)>
<!ELEMENT tbl:body
- - (tbl:row)+>
<!ELEMENT tbl:row
- O (tbl:cell)+>
<!ELEMENT tbl:cell
- O ((#PCDATA|q|em)*)>
A parameterized module behaves as a template that partially defines its structure. Its final structure can vary according to the parameter values, even within a single document. For example, suppose that your document contains two references to the same table module in its internal subset as follows:
<!ENTITY % para "p|list|tbl1:table|tbl2:table">
<!ENTITY % ph
"#PCDATA|q|em">
<!ENTITY % tbl1
SYSTEM "some location" MODULE>
%tbl1("#PCDATA",
"(%ph;)*");
<!ENTITY % tbl2
SYSTEM "some location" MODULE>
%tbl2("#PCDATA",
"(%para;)*");
In this case, a single entity is referred to as two table modules with different structures. In the first invocation, it has a structure similar to the previous example. In the second invocation, it has a structure that permits paragraphs, lists, and tables themselves in the content of a table cell element. The interpreted result will be as follows:
<!ELEMENT tbl1:table
- - (tbl1:title, tbl1:body)>
<!ELEMENT tbl1:title
- O (#PCDATA)>
<!ELEMENT tbl1:body
- - (tbl1:row)+>
<!ELEMENT tbl1:row
- O (tbl1:cell)+>
<!ELEMENT tbl1:cell
- O ((#PCDATA|q|em)*)>
<!ELEMENT tbl2:table
- - (tbl2:title, tbl2:body)>
<!ELEMENT tbl2:title
- O (#PCDATA)>
<!ELEMENT tbl2:body
- - (tbl2:row)+>
<!ELEMENT tbl2:row
- O (tbl2:cell)+>
<!ELEMENT tbl2:cell
- O ((p|list|tbl1:table|tbl2:table)*)>
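Instantiating the same table module twice can be sketched in Python as follows. This is an illustration under several assumptions:

- The module's own parameter entities %title-model; and %cell-model; are taken as already expanded to "($1;)" and "($2;)".
- Parameter entity references in the document's parameter strings are expanded in the document's context before the strings are passed. Names in them need no prefix at document level.
- `TABLE_MODULE`, `expand_pe`, `qualify`, and `instantiate` are hypothetical names.

```python
import re

PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);")
TOKEN = re.compile(
    r"\$(\d+);|(#[A-Z]+)|([A-Za-z][A-Za-z0-9.-]*(?::[A-Za-z][A-Za-z0-9.-]*)*)|(.)",
    re.S,
)

# The table module with its internal parameter entities already expanded:
# (element name, minimization, content model).
TABLE_MODULE = [
    ("table", "- -", "(title, body)"),
    ("title", "- O", "($1;)"),
    ("body", "- -", "(row)+"),
    ("row", "- O", "(cell)+"),
    ("cell", "- O", "($2;)"),
]


def expand_pe(text, entities):
    """Expand parameter entity references using the caller's entity table."""
    return PE_REF.sub(lambda m: entities[m.group(1)], text)


def qualify(text, prefix, params):
    """Prefix every name with `prefix`; substitute numbered parameter references verbatim."""
    def repl(m):
        number, reserved, name, other = m.groups()
        if number is not None:
            return params[int(number) - 1]
        if name is not None:
            return prefix + name
        return reserved if reserved is not None else other
    return TOKEN.sub(repl, text)


def instantiate(module_name, params):
    """Return the module's declarations as seen from the document."""
    prefix = module_name + ":"
    return [
        f"<!ELEMENT {qualify(name, prefix, params)} {minimization} {qualify(model, prefix, params)}>"
        for name, minimization, model in TABLE_MODULE
    ]


doc_entities = {"para": "p|list|tbl1:table|tbl2:table", "ph": "#PCDATA|q|em"}
tbl1 = instantiate("tbl1", [expand_pe(s, doc_entities) for s in ("#PCDATA", "(%ph;)*")])
tbl2 = instantiate("tbl2", [expand_pe(s, doc_entities) for s in ("#PCDATA", "(%para;)*")])
print("\n".join(tbl1 + tbl2))
```

Each module reference produces its own set of qualified element types, so the two instances coexist in one document.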
Note 7: New productions and modified portions are represented in italic style in the following list.
[60] parameter entity reference = pero, name group?, name, reference parameter group?, reference end
[65] ps = s | Ee | parameter entity reference | numbered parameter reference
[67] replaceable parameter data = (parameter entity reference | character reference | numbered parameter reference | data character | Ee)*
[70] ts = s | Ee | parameter entity reference | numbered parameter reference
[xx] numbered parameter reference = npro, number, reference end
where: npro is a numbered parameter reference open delimiter string.
[xx] reference parameter group = grpo, ts*, parameter literal, (ts*, connector, ts*, parameter literal)*, ts*, grpc
[108] external entity specification = external identifier, (ps+, (entity type | "MODULE"))?
where: MODULE asserts that the entity text is a module declaration set; it can be specified only if the entity name is a parameter entity name.
[108.1] module declaration set = (entity declaration | element type declaration | attribute definition list declaration | notation declaration | ds)*
[108.2] modular name = name | qualified name
where: "name" is a simple modular name.
[108.3] qualified name = module entity name, qns, modular name
where: "module entity name" is the name of the module in which the "modular name" is declared. qns is the qualified name separator delimiter string.
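As a rough check of productions [60], [xx] reference parameter group, and [108.3], the following Python sketch does two things. It recognizes a parameter entity reference with an optional reference parameter group, and it splits a qualified name into its module path and simple name. It makes these assumptions:

- Delimiters follow the reference concrete syntax: pero "%", grpo "(", grpc ")", refc ";".
- ":" is used for qns, as in this document.
- The optional name group is omitted, ts* is treated as plain whitespace, and refc is required as the reference end.
- The function names are hypothetical.

```python
import re

NAME = r"[A-Za-z][A-Za-z0-9.-]*"
LITERAL = r'"[^"]*"' + r"|'[^']*'"  # parameter literal (lit or lita)
CONNECTOR = r"[,|&]"                # seq, or, and
TS = r"\s*"                         # ts*, whitespace only in this sketch

PE_REFERENCE = re.compile(
    rf"%(?P<name>{NAME})"                                  # pero, name
    rf"(?:\({TS}(?P<group>(?:{LITERAL})"                    # grpo, ts*, parameter literal
    rf"(?:{TS}{CONNECTOR}{TS}(?:{LITERAL}))*){TS}\))?"     # (ts*, connector, ts*, literal)*, ts*, grpc
    r";"                                                   # reference end (refc only here)
)
QUALIFIED_NAME = re.compile(rf"{NAME}(?::{NAME})*")


def parse_pe_reference(text):
    """Return (entity name, parameter strings) for a reference such as %m("a", "b");."""
    m = PE_REFERENCE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a parameter entity reference: {text!r}")
    group = m.group("group") or ""
    params = [lit[1:-1] for lit in re.findall(LITERAL, group)]  # strip the quotes
    return m.group("name"), params


def split_qualified_name(qname):
    """Split "m1:m2:x" into (["m1", "m2"], "x") following [108.2] and [108.3]."""
    if QUALIFIED_NAME.fullmatch(qname) is None:
        raise ValueError(f"not a modular name: {qname!r}")
    *modules, name = qname.split(":")
    return modules, name


print(parse_pe_reference('%module1("a,b",\n"c");'))
print(split_qualified_name("module1:module2:x"))
```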