Design Report for the W3C XML Specification DTD

by ArborText Inc.

Revision History
Revision 1.0.	7 April 1998.	Revised by: elm/sel.
First release of report, corresponding to the DTD with the FPI "-//W3C//DTD Specification::19980323//EN".

Contents

About This Report

Purpose and Scope
How to Use This Report
How to Read Elm Tree Diagrams

1 Introduction

1.1. Catalog of Analysis Inputs

1.2. Design Principles

1.2.1. Scope
1.2.2. Focus
1.2.3. Presentation Independence

1.3. Information for DTD Maintainers

1.3.1. Versioning and Updates
1.3.2. Naming and Coding Conventions
1.3.3. Parameter Entity Typology
1.3.4. XML Usage
1.3.5. Parameterization

1.4. Issues

2 Common Attributes

2.1. Attributes Appearing on Every Element

2.1.1. id Attribute
2.1.2. role Attribute

2.2. Attributes Appearing on Selected Elements

2.2.1. Key Attribute
2.2.2. Definition Attribute
2.2.3. Reference Attribute
2.2.4. Hypertext Reference Attribute and Source Attributes
2.2.5. Attributes Appearing on the htable Element

3 Document Hierarchy and Metadata Structures

3.1. Overall Specification Structure (spec)

Description
Processing Expectations
Rationale

3.2. Specification Header (header)

Description
Processing Expectations
Rationale

4 Standalone Element Structures

4.1. Paragraphs (p and statusp)

Description
Attributes
Processing Expectations
Rationale

4.2. Regular Lists

4.2.1. Unordered List (ulist) and Ordered List (olist)
4.2.2. Simple List (slist)
4.2.3. Glossary List (glist)

4.3. Special Lists

4.3.1. Bibliography List (blist)
4.3.2. Organization List (orglist)

4.4. Notes

4.4.1. Regular Note (note)
4.4.2. Well-Formedness Constraint Note (wfcnote) and Validity Constraint Note (vcnote)

4.5. Illustrations

4.5.1. Example (eg)
4.5.2. Graphic (graphic)
4.5.3. Code Scrap (scrap)
4.5.4. HTML-Style Table (htable)

5 Phrase-Level Structures

5.1. Annotations (footnote)

Description
Attributes
Processing Expectations

5.2. Terms and Definitions

5.2.1. Defined Term (term)
5.2.2. Term Definition (termdef)

5.3. Emphasized Text

5.3.1. Emphasized Text (emph)
5.3.2. Quote (quote)

5.4. References

5.4.1. Bibliography Reference (bibref)
5.4.2. URI Reference (loc)
5.4.3. Specification Reference (specref)
5.4.4. Term Reference (termref)
5.4.5. Title Reference (titleref)
5.4.6. External Specification Reference (xspecref)
5.4.7. External Term Definition Reference (xtermref)

5.5. Technical

5.5.1. Code Fragment (code)
5.5.2. Keyword (kw)
5.5.3. Nonterminal Reference (nt)
5.5.4. External Nonterminal Reference (xnt)

5.6. Editorial Notes (ednote)

Description
Attributes
Processing Expectations
Rationale

6 Making Connections

6.1. Linking Relationships
6.2. Data Content Notations
6.3. Characters and Symbols

7 Element Classes and Mixtures

7.1. Standalone Element Classes and Mixtures
7.2. Phrase-Level Element Mixtures

About This Report

This report documents the design of the XML specification DTD. This is the first release of the report, corresponding to the DTD with the FPI "-//W3C//DTD Specification::19980323//EN".
NOTE: This version of the report contains some graphics that have display anomalies (hidden diagonal lines showing up in GIFs). This will be corrected in a future version.

Purpose and Scope

This report describes the DTD used for World Wide Web Consortium (W3C) specifications and notes related to XML.

Following are the major contributors to the DTD design:

	Jon Bosak, Sun (XML chair)
	Tim Bray, Textuality and Netscape (XML co-editor)
	Dan Connolly, W3C (W3C staff contact)
	Eve Maler, ArborText (DTD maintainer)
	Gavin Nicol, Inso (DOM member)
	C. Michael Sperberg-McQueen, University of Illinois (XML co-editor)
	Lauren Wood, SoftQuad (DOM chair)

How to Use This Report

This report is organized as follows:

Chapter 1 lists the sources of input used during the DTD design effort, describes the project design parameters, and describes global outstanding issues. Read this to understand the basic principles underlying the design results and to review the issues.
Chapter 2 through Chapter 5 contain the markup models for the DTD, and Chapter 6 describes information linking relationships, nontextual data formats, and special symbols that the DTD encodes. Read these chapters to understand how the markup will be used with World Wide Web Consortium XML information, and the reasons for its design.
Note that, where appropriate, some processing expectations have been documented for the markup. This information is not to be considered a complete style specification; it simply records known requirements.
Chapter 7 contains the common categories of elements and the commonly used element mixtures in content models. Read this to understand the “mixture” content models described in Chapter 3, Chapter 4, and Chapter 5.

How to Read Elm Tree Diagrams

To understand the graphical “elm tree diagrams” used in this report, use the following legend.

Section 1 Introduction

Following is information on the sources of analysis input, the design principles governing the markup model, the implementation principles governing its expression in DTD form, and outstanding issues.

1.1. Catalog of Analysis Inputs

The following have been used as analysis input in designing the DTD:

Original XML specification DTD, jointly developed and revised by Michael Sperberg-McQueen, Tim Bray, and Jon Bosak
Michael's SWEB documentation
The XML specification
The XLink specifications
The XML/SGML comparison note
The DOM specifications

1.2. Design Principles

Following are the design principles governing the markup model of the DTD.

1.2.1. Scope

Although the DTD has come to be called “XMLspec,” it is intended for W3C working drafts, notes, recommendations, and all other document types that fall under the category of “technical reports.”

The DTD is responsible for covering three main aspects of XML technical reports:

Basic W3C technical report structure and content, including paragraphs, lists, cross-references, and so on
Structure specific to the XML-related family of W3C technical reports, such as EBNF productions and validity constraints
Proper headers and metadata for W3C technical reports

1.2.2. Focus

The DTD is intended to support the following functions, in order of priority:

Production of technical reports
First and foremost, the DTD should facilitate hassle-free production and publication. Many of the documents in the scope are made available in several output forms, including source XML, derived HTML, RTF, and PostScript. It is important to produce these outputs in a form that meets W3C requirements, and produce them quickly in order to speed the W3C publication release process. Also, it may be useful to extract different parts of the document content (for example, just the productions) for distribution.
Creation and modification of content
The DTD should provide an intuitive, efficient interface to the creation process. This means that the DTD shouldn't be overly large or complicated, but it should support information structures named in the jargon that authors use, and modeled to the depth at which authors tend to understand the information.
Review of content
To a lesser degree, the DTD should support the informal workflow that goes on when co-editors pass around drafts for review. To this end, the DTD should provide markup for editor “communication” inside the document source.
Proof of concept of XML publishing
Finally, where possible, this DTD and its associated applications should use good XML practice and conforming XML tools, because many will look to this application as an example.

1.2.3. Presentation Independence

The DTD should avoid presentational markup where possible. Sometimes this principle comes into conflict with the production focus, but in general, presentation independence helps serve the goal of production of multiple outputs. In any case, egregious examples of formatting-specific markup should be avoided.

1.3. Information for DTD Maintainers

The following information gives background on implementation decisions.

1.3.1. Versioning and Updates

This DTD is given a formal public identifier in the following pattern:

-//W3C//DTD XML Specification::yyyymmdd//EN

The current version is identified as:

-//W3C//DTD XML Specification::19980323//EN
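
As a minimal sketch, a document instance might reference this version of the DTD as follows. The system identifier `xmlspec.dtd` is an assumed local file name, not something this report specifies; the root element is `spec`, as described in Chapter 3.

```xml
<?xml version="1.0"?>
<!-- "xmlspec.dtd" is an assumed local path to the DTD file -->
<!DOCTYPE spec PUBLIC "-//W3C//DTD XML Specification::19980323//EN"
               "xmlspec.dtd">
<spec>
  <!-- header and body content go here -->
</spec>
```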

It is a goal to avoid backwards-incompatible changes where possible, but occasionally this is necessary. Always review the change history in any new version of the DTD carefully before deploying it.

Currently, DTD changes are at the discretion of the maintainer and the heaviest users of the DTD. A more formal procedure may be put in place by the W3C later.

1.3.2. Naming and Coding Conventions

The original element names were mostly kept; changes were made in a few cases only to rationalize the naming scheme.

Hyphens are avoided, except in the “w3c-” prefix.

Whitespace and tabs are used relatively sparingly to enhance readability; excessive whitespace is avoided in the interest of a compact and “unthreatening” DTD.

1.3.3. Parameter Entity Typology

Parameter entities are used in several different capacities in the DTD. To indicate their different roles, unique suffixes are used as follows:

descrip.att

The name, declared value, and default value specifications for a set of one or more attributes.

Some descriptions may have a sub-suffix, such as -req, which means that the attribute (or one of the attributes) in question is required.

descrip.class

A set of related elements that are typically available as options in certain content model “free mixtures” (repeatable-OR groups). These entities are referenced from within descrip.mix entity declarations, content models, and content model exceptions.

If you add a new standalone or phrase-level element, make sure that you add it to the appropriate class entity, or create a new class for it. If you create a new class, incorporate references to that class in the appropriate mixture entity declarations.

descrip.mix

A set of elements that are available to writers in certain contexts as a “free mixture” (repeatable-OR group). These entities are referenced from within content models.

descrip.pcd.mix

A set of #PCDATA and elements that are available to writers in certain contexts as a “free mixture” (repeatable-OR group). These entities are referenced from within content models. The presence of #PCDATA makes these “mixed” content models, which means that document creators can type regular text here.

local.descrip.class

An empty placeholder that is available to be used in extending an element class.

descrip.mdl

A content model fragment (other than a “free mixture”) that is common or customizable.

The goal in naming the entities was to be consistent and brief, without losing readability. The keyword indicating the entity type always appears last because the location of an entity reference will already give a clue as to the entity type, and so this is not the information that needs to be seen first when the DTD is read. This naming scheme also allows for easier searching.
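
As a minimal sketch of how these suffixes might fit together, the following DTD fragment declares one entity of each main kind and then uses them. The entity names and their contents (`common.att`, `local.list.class`, `list.class`, `obj.mix`, `p.pcd.mix`) are illustrative assumptions and are not copied from the actual DTD.

```xml
<!-- descrip.att: a set of attribute specifications (contents assumed) -->
<!ENTITY % common.att
        'id     ID      #IMPLIED
         role   NMTOKEN #IMPLIED'>

<!-- local.descrip.class: empty placeholder for extending a class -->
<!ENTITY % local.list.class "">

<!-- descrip.class: a set of related elements -->
<!ENTITY % list.class "ulist|olist|slist|glist %local.list.class;">

<!-- descrip.mix: a free mixture of elements built from classes -->
<!ENTITY % obj.mix "p|%list.class;">

<!-- descrip.pcd.mix: a free mixture that also allows #PCDATA -->
<!ENTITY % p.pcd.mix "#PCDATA|emph|code">

<!-- The mixtures are then referenced from content models -->
<!ELEMENT item (%obj.mix;)+>
<!ELEMENT p (%p.pcd.mix;)*>
<!ATTLIST p %common.att;>
```

Because the type keyword comes last, a search for `.class` or `.mix` finds every entity of that kind.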

1.3.4. XML Usage

The DTD conforms to XML V1.0. The intent is to make available an XML-compliant version of the DTD, even though some editors may choose to work and interchange in full SGML.

While XLink is used for all URL-style linking, the IDREF mechanism is still used heavily for internal links. As support for the #id(xxx) XPointer construct grows, we will consider moving to this style of link for these cases.
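
The two linking styles can be sketched in instance markup as follows. The attribute names `ref` and `href` and the ID value are assumptions made for illustration; Chapter 2 describes the actual attributes.

```xml
<!-- Internal link: the IDREF value must match an ID in the same document -->
<div1 id="sec-intro">
  <head>Introduction</head>
  <p>Introductory text.</p>
</div1>
<p>See <specref ref="sec-intro"/> for background.</p>

<!-- URL-style link: points at an external resource -->
<p>See <loc href="http://www.w3.org/TR/REC-xml">the XML specification</loc>.</p>
```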

1.3.5. Parameterization

The DTD is beginning to be used by other W3C Working Groups. While this DTD was designed with the needs of XML technical reports firmly in mind, quite a lot of the markup design would be useful for technical reports produced by others in the W3C. Therefore, the DTD has been parameterized to allow for:

Modification of certain content models that are likely to be subject to personal and Working Group preference
Addition of new elements at the “standalone” and “phrase” levels
Some limited removal of existing elements at the “standalone” and “phrase” levels

If it is found that the DTD can be made more widely useful solely by heavier parameterization, it would probably be worth it to add the new parameters.

Heed the following advice if you plan to develop a variant of the DTD:

Plan and document both the substance of your changes and the reasons for them.
Build variants only by redefining the original parameter entities, if possible; don't edit any of the original DTD files.
If you plan to interchange your files with other DTD users, favor markup changes that place tighter validation restrictions on documents (subsets), rather than changes that would allow instances that no longer conform to the standard DTD (extensions).
Avoid confusion by using a different formal public identifier for your DTD variant if you have changed any element or attribute markup characteristics. You may want to indicate the derivation with an identifier that uses the following pattern:
```
-//owner-ID//DTD XML Specification 19980323-Based your-descrip-and-version//lang
```
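
A variant built along these lines might look like the following driver file, shown only as a sketch. The owner identifier, the file names, the `local.list.class` entity, and the `checklist` element are all hypothetical. The redefinition comes before the reference to the original DTD because the first declaration of an entity is the one that takes effect.

```xml
<!-- myspec.dtd: hypothetical variant, which would be identified by an FPI such as
     "-//Example//DTD XML Specification 19980323-Based Checklist V1.0//EN" -->

<!-- Redefine a placeholder entity before the original DTD is read -->
<!ENTITY % local.list.class "|checklist">
<!ELEMENT checklist (item)+>

<!-- Pull in the unmodified original DTD ("xmlspec.dtd" is an assumed path) -->
<!ENTITY % xmlspec PUBLIC "-//W3C//DTD XML Specification::19980323//EN"
                          "xmlspec.dtd">
%xmlspec;
```

Note that this sketch is an extension, so documents that use `checklist` no longer conform to the standard DTD; for interchange, the advice above favors restrictions instead.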

1.4. Issues

Following are outstanding global issues:

Consider adding a syntactic metavariable element, so that emph doesn't get abused too badly.
Eventually remove statusp. Note that there's a bug in how it's defined: it can contain paragraphs (of both types). This bug will go away when the element goes away.
Consider adding an optional href to name and affiliation, and allowing them and email in regular text.
Revise XLink usage as necessary.
Dan's latest word on the appropriateness of external cross-references in specifications is that all references should be to a bibliography entry, and then the bibliography entry should point to the Web resource (if possible). This would suggest that we should freely allow bibref, but allow loc only in the special header fields such as “Latest Version” and in bibliography entries. Should we try to migrate over to this scheme?
Perhaps related is the fact that titleref is freely allowed in paragraphs as well as in bibl. Since titleref is like a restricted or subclassed form of loc, it may also be obsolescent. In addition, titleref appears to duplicate the hypertext function of bibl (or maybe it's the other way around, since it may be inappropriate to make the entire contents of bibl “hot”).

Dan has requested that the element type names in this DTD match HTML wherever possible. The question is, how much is possible? About 10–15 of the element types in this DTD are strongly reminiscent of element types in HTML. However, in all cases, there are subtle differences (sometimes simply amounting to different subelements allowed inside the element in question). Should the element type names be made to match?

Following are the potential correspondences:

XML specification DTD	HTML	Comments
loc	a	The semantic is slightly different
p	p	No change needed
ulist	ul
item	li
olist	ol
slist	sl	For consistency; not HTML-based
glist	dl	Contents are significantly different
gitem	dli	For consistency; not HTML-based
label	dt
def	dd
blist	bl	For consistency; not HTML-based
eg	pre	The semantic and contexts are different
graphic	img
emph	em

Do any notations need to be defined? Currently, the graphic element is defined to point to a URI, and doesn't require that an unparsed entity be declared or invoked.
If a glossary list is used to organize term definitions, how can termdef properly be used? Currently, at least in the XLink-related drafts, the contents of the label element are surrounded with a termdef element and a term element isn't provided, while the actual definition text in def goes unmarked-up as such.
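
The glossary-list pattern described in the last issue might look roughly like this in a document instance. The `id` and `term` attribute values and the sample wording are assumptions for illustration only.

```xml
<glist>
  <gitem>
    <!-- The label is wrapped in termdef, with no term element inside it -->
    <label><termdef id="dt-link" term="Link">Link</termdef></label>
    <!-- The definition text itself is not marked up as a definition -->
    <def><p>An explicit relationship between two or more resources.</p></def>
  </gitem>
</glist>
```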

Section 2 Common Attributes

This chapter describes the markup design for attributes that appear on multiple element types in substantially similar form.

2.1. Attributes Appearing on Every Element

The following attributes are truly “common”; they are available on every element type and have the same basic meaning everywhere.

	Regular note
	Well-formedness constraint note
	Validity constraint note

	Bibliography reference
	URI reference
	Specification reference
	Term reference
	Title reference
	External specification reference
	External term definition reference

	Code fragment
	Keyword
	Nonterminal reference
	External nonterminal reference

Source of Link	Target of Link	Opt or Req	Scope of Link	Processing Expectations
`email`	External resource	Req	URL	Allow traversal from email address to resource
`bibl`	External resource	Opt	URL	Allow traversal from bibliographic entry to resource
`graphic`	External resource	Req	URL	Pull in graphic data and display in place
`scrap`	`language` in `langusage`	Req	IDREF	None
`wfc`	`wfcnote`	Req	IDREF	Generate the head text of the note and other surrounding text, and output in place
`vc`	`vcnote`	Req	IDREF	Generate the head text of the note and other surrounding text, and output in place
`nt`	`prod`	Req	IDREF	Allow traversal from the nonterminal to the production that defines it.
`bibref`	`bibl`	Req	IDREF	Allow traversal from bibliographic reference to the bibliographic entry
`loc`	External resource	Req	URL	Allow traversal from the mention of the location to the location itself
`specref`	`div1`, `div2`, `div3`, `inform-div1`	Req	IDREF	Generate, in place, an italic "[n.n], Section Title" reference based on the relevant information from the referenced division
`specref`	`item` in `olist`	Req	IDREF	Generate, in place, the sequential number of the referenced item
`specref`	`prod`	Req	IDREF	Generate, in place, the number of the production in brackets
`termref`	`termdef`	Req	IDREF	Allow traversal from the mention of the term to the location where the term is defined
`titleref`	External resource	Opt	URL	Allow traversal from the mention of the document's title to the document itself
`xnt`	External resource	Req	URL	Allow traversal from the mention of the nonterminal to the (remote) production for it
`xspecref`	External resource	Req	URL	Allow traversal from the mention of the spec reference to the (remote) location where the spec is discussed.
`xtermref`	External resource	Req	URL	Allow traversal from the mention of the term to the (remote) location where the term is defined
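
Two of the IDREF relationships in the table can be sketched in instance markup as follows. The `def` and `term` attribute names, the ID values, and the sample content are assumptions for illustration; only the source and target element types come from the table.

```xml
<!-- termref points at the termdef that defines the term -->
<p><termdef id="dt-doc" term="Document">A <term>document</term> is a unit of
marked-up text.</termdef></p>
<p>Every <termref def="dt-doc">document</termref> has a root element.</p>

<!-- nt points at the production that defines the nonterminal -->
<prod id="NT-doc"><lhs>doc</lhs><rhs>prolog element</rhs></prod>
<p>A <nt def="NT-doc">doc</nt> begins with a prolog.</p>
```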

Symbol	Name	Definition	Description
&	amp	`&#38;`	Ampersand
'	apos	`'`	Apostrophe
>	gt	`>`	Greater than sign
“	ldquo	`“`	Left double quotation mark
<	lt	`&#60;`	Less than sign
—	mdash	--	Em dash
	nbsp	` `	No break (required) space
"	quot	`"`	Double quotation mark
”	rdquo	`”`	Right double quotation mark
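
For illustration, entity declarations matching this table might be written as follows. This is a sketch, not a copy of the DTD: the doubly escaped forms for `lt` and `amp` are the ones the XML 1.0 Recommendation requires when those entities are declared explicitly, and the character references for `nbsp`, `ldquo`, and `rdquo` are assumed from their Unicode code points.

```xml
<!ENTITY lt     "&#38;#60;">  <!-- less than sign -->
<!ENTITY gt     "&#62;">      <!-- greater than sign -->
<!ENTITY amp    "&#38;#38;">  <!-- ampersand -->
<!ENTITY apos   "&#39;">      <!-- apostrophe -->
<!ENTITY quot   "&#34;">      <!-- double quotation mark -->
<!ENTITY mdash  "--">         <!-- em dash, written as two hyphens -->
<!ENTITY nbsp   "&#160;">     <!-- no break space -->
<!ENTITY ldquo  "&#x201C;">   <!-- left double quotation mark -->
<!ENTITY rdquo  "&#x201D;">   <!-- right double quotation mark -->
```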

	Unordered list (`ulist`)
	Ordered list (`olist`)
	Simple list (`slist`)
	Glossary list (`glist`)

	Regular note (`note`)
	Well-formedness constraint note (`wfcnote`)
	Validity constraint note (`vcnote`)

	Example (`eg`)
	Graphic (`graphic`)
	Code scrap (`scrap`)
	HTML-style table (`htable`)

	p	statusp	list	speclist	note	illus	ednote
`%div.mix;`: `div1`, `inform-div1`, `div2`, `div3`, `div4`	X		X	X	X	X	X
`%obj.mix;`: `item`, `def`, `note`, `wfcnote`, `vcnote`, `footnote`	X		X	X	X	X	X
`%p.mix;`: `p`			X	X	X	X
`%statusp.mix;`: `status`, `statusp`	X	X	X
`%hdr.mix;`: `notice`, `abstract`, `pubstmt`, `sourcedesc`, `revisiondesc`	X		X				X
`%termdef.mix;`: `termdef`					X	X

	Bibliography reference `bibref`
	Specification reference `specref`
	Term reference `termref`
	Title reference `titleref`
	External specification reference `xspecref`
	External term definition reference `xtermref`

	`name`
	`bibl`

	`htable` (The default is “left”)
	`tr`
	`td`

	`tr`
	`td`

	`spec`
	`header`

	Keyword (`kw`)
	Nonterminal reference (`nt`)
	External nonterminal reference (`xnt`)
	Code fragment (`code`)

	#PCD	annot	termdef	emph	ref	loc	tech	ednote
`%p.pcd.mix;`: `p`, `sitem`, `td`, `quote`	X	X	X	X	X	X	X	X
`%statusp.pcd.mix;`: `statusp`	X	X	X	X	X	X	X	X
`%head.pcd.mix;`: `head`	X	X		X		X		X
`%label.pcd.mix;`: `label`	X	X	X	X		X		X
`%eg.pcd.mix;`: `eg`, `bnf`	X	X		X				X
`%termdef.pcd.mix;`: `termdef`	X		`term`	X	X	X		X
`%bibl.pcd.mix;`: `bibl`	X		X	X	X			X
`%tech.pcd.mix;`: `code`, `kw`	X							X
`rhs`	X	`nt`, `xnt`, `com`
`com`	X	`loc`, `bibref`
`title`, `subtitle`, `version`, `w3c-designation`, `w3c-doctype`, `day`, `month`, `year`, `name`, `affiliation`, `email`, `language`, `role`, `lhs`, `date`, `edtext`, `emph`, `loc`, `nt`, `term`, `termref`, `titleref`, `xnt`, `xspecref`, `xtermref`	X
