[This local archive copy mirrored from the canonical site: http://www.ccil.org/~cowan/XSchema-draft-19980601.txt, 19980603; links may not have complete integrity, so use the canonical document at this URL if possible.]

XML XSchemas: Language/Metalanguage Unification


Title:          XML XSchemas: Language/Metalanguage Unification
Source:         John Cowan <cowan@ccil.org>
Primary Author: John Cowan (no W3C affiliation)
Date:		1998-06-01
Status:         Expert contribution
Action:         For the consideration of W3C XML WG/SIG
References:	REC-xml-19980210
Distribution:   All interested parties


Synopsis

This note describes one possible method of rendering the information in
an XML DTD in the form of an XML document instance.  Such an instance is
called an XSchema.  An XSchema contains all the information necessary to
validate any XML document defined by it, and can be automatically
created from an XML DTD. It is not possible to reconstruct the exact DTD
used to create an XSchema, but a functionally equivalent DTD can be
reconstructed.

XSchemas are primarily meant for consumption by automated validators.
Non-validating XML parsers do not require them, and human beings will
find them painful to write by hand, due to their great redundancy
compared to DTDs.


1.  General Considerations

The following points explain how various constructions in DTDs are
represented or not represented in XSchemas.

1.1 XSchemas Represent Full DTDs, Not Internal Or External Subsets

When creating an XSchema corresponding to a particular DTD, both the
internal subset (contained within the document's DOCTYPE declaration)
and the external subset (referred to from the DOCTYPE declaration) are
taken into account.

1.2 Parameter Entities

Before converting a DTD to an XSchema, it is necessary to replace all
parameter entity references with their replacement texts, and to remove
parameter entity declarations from the DTD. (In principle, it would be
possible to create XSchemas that used general entities corresponding to
at least some parameter entities, but the constraint that general
entities be well-formed might prevent some parameter entities from being
so converted.)

This is the main reason why the exact DTD used to construct an XSchema
cannot be recovered: common structure expressed using parameter entities
is not easily recoverable.
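
As an illustration, consider the following DTD fragment (the entity and
element names are invented for this example):

    <!ENTITY % inline "em | strong">
    <!ELEMENT p (#PCDATA | %inline;)*>
    <!ELEMENT li (#PCDATA | %inline;)*>

After parameter entity replacement, the declarations that are actually
converted read:

    <!ELEMENT p (#PCDATA | em | strong)*>
    <!ELEMENT li (#PCDATA | em | strong)*>

The fact that both content models were built from a shared "inline"
entity is no longer visible at this point.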

1.3 DTD Comment Declarations And Processing Instructions

Since comment declarations are allowed in XML document instances, they
may be passed from DTDs to XSchemas unchanged.  Processing instructions
are likewise passed through unchanged.

1.4 Conditional Sections

All conditional sections must be removed from a DTD before converting it
to an XSchema.  INCLUDE conditional sections are replaced by their
contents, whereas IGNORE conditional sections are removed entirely.
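
For example, assuming a DTD that contains the following (hypothetical)
declarations:

    <![INCLUDE[
    <!ELEMENT note (#PCDATA)>
    ]]>
    <![IGNORE[
    <!ELEMENT draft (#PCDATA)>
    ]]>

only this declaration would remain to be converted:

    <!ELEMENT note (#PCDATA)>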

1.5 Element And Attribute Names

Many of the element names in the XSchema are the same as XML declaration
keywords (and are therefore in upper case).  There can be no confusion
between elements and declarations, however, because markup declarations
begin with "<!"  rather than simply "<".  For consistency, all other
XSchema element names are also in upper case.  All attribute names are
in lower case.

1.6 Extraneous Whitespace

Whitespace in element content is ignored in XSchemas.  Only ENTITY
elements have #PCDATA content, so whitespace is significant only within
them.



2.  The DOCTYPE Element

The DOCTYPE element is the root element of an XSchema, containing all
the other XSchema elements.  Its attributes give properties of the
XSchema as a whole.

2.1 DOCTYPE Element Content Model

The DOCTYPE element may contain any number of ELEMENT, ENTITY, and
NOTATION elements, in any order, describing respectively the element
types, general entities, and notations declared by the XSchema.

2.2 DOCTYPE Attributes

The only (and required) attribute of a DOCTYPE element is "root", which
is a name token attribute specifying the root element of all document
instances conforming to this XSchema.
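
A minimal sketch of a complete XSchema, assuming a hypothetical document
type whose root element "memo" contains only character data:

    <DOCTYPE root="memo">
        <ELEMENT name="memo">
            <MIXED/>
        </ELEMENT>
    </DOCTYPE>

This would correspond to a DTD consisting of the single declaration
<!ELEMENT memo (#PCDATA)>, used with the document type name "memo".  The
ELEMENT and MIXED elements are described in section 3.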


3.  The ELEMENT Element And Related Elements

There is a single ELEMENT element in the XSchema for every element type
in the documents described by the XSchema.  The attributes and content
of an ELEMENT element provide complete information about the element
type described.  This implies that when converting DTDs to XSchemas,
multiple ATTLIST declarations for the same element type must be
consolidated.

3.1 ELEMENT Element Content Model

The first child of an ELEMENT element indicates the model of the element
being represented: it can be an EMPTY element, an ANY element, a MIXED
element, or any of the following elements declaring element content:
NAME, CHOICE, SEQ, OPT, OPTRPT, REPEAT.  The remaining children are
optional ATT elements declaring attributes.

3.2 ELEMENT Attributes

The only (and required) attribute of an ELEMENT element is "name", which
is a name token attribute specifying the name of the element.

3.3 EMPTY elements

An EMPTY element is an empty element with no attributes.  It is used to
describe a content model of EMPTY.

3.4 ANY elements

An ANY element is also an empty element with no attributes.  It is used
to describe a content model of ANY.
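
For example, the declarations <!ELEMENT br EMPTY> and
<!ELEMENT container ANY> (both element names are invented here) might be
rendered as:

    <ELEMENT name="br">
        <EMPTY/>
    </ELEMENT>

    <ELEMENT name="container">
        <ANY/>
    </ELEMENT>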

3.5 MIXED elements

A MIXED element contains optional NAME elements and has no attributes.
It is used to describe a mixed-content model including #PCDATA (parsed
character data) and the elements named in the NAME elements.

3.6 NAME elements

A NAME element is an empty element with one required name token
attribute, "name", which is used to specify the name of an element
participating in a mixed-content or element-content model.
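
As a sketch, a mixed-content declaration such as
<!ELEMENT p (#PCDATA | em | strong)*> (the element names are only
illustrative) would be expressed as:

    <ELEMENT name="p">
        <MIXED>
            <NAME name="em"/>
            <NAME name="strong"/>
        </MIXED>
    </ELEMENT>

Neither the #PCDATA keyword nor the "|" and "*" operators appear
explicitly; all three are implied by the MIXED element.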

3.7 CHOICE elements

A CHOICE element may contain NAME, CHOICE, SEQ, OPT, OPTRPT, or REPEAT
elements, and represents a choice list as part of an element-content
model.  It has no attributes.  It is not used within a MIXED element,
where the presence of a choice list is implicit.

3.8 SEQ elements

A SEQ element may contain NAME, CHOICE, SEQ, OPT, OPTRPT, or REPEAT
elements, and represents a sequence list as part of an element-content
model.  It has no attributes.

3.9 OPT elements

An OPT element may contain NAME, CHOICE, or SEQ elements, and represents
an option (represented by the "?"  character in DTDs) as part of an
element-content model.  It has no attributes.

3.10 OPTRPT elements

An OPTRPT element may contain NAME, CHOICE, or SEQ elements, and
represents an optional repetition (represented by the "*" character in
DTDs) as part of an element-content model.  It has no attributes.  It is
not used within a MIXED element, where the presence of an optional
repetition is implicit.

3.11 REPEAT elements

A REPEAT element may contain NAME, CHOICE, or SEQ elements, and
represents a required repetition, that is, one or more occurrences
(represented by the "+" character in DTDs), as part of an
element-content model.  It has no attributes.
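
The element-content elements above combine as in the following sketch,
which assumes a hypothetical declaration
<!ELEMENT book (title, author+, (preface | foreword)?, chapter*)>:

    <ELEMENT name="book">
        <SEQ>
            <NAME name="title"/>
            <REPEAT>
                <NAME name="author"/>
            </REPEAT>
            <OPT>
                <CHOICE>
                    <NAME name="preface"/>
                    <NAME name="foreword"/>
                </CHOICE>
            </OPT>
            <OPTRPT>
                <NAME name="chapter"/>
            </OPTRPT>
        </SEQ>
    </ELEMENT>

Each occurrence operator wraps exactly one NAME, CHOICE, or SEQ child,
so a group such as (preface | foreword)? needs an explicit CHOICE
inside the OPT element.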


4.  The ATT Element And Related Elements

There is a single ATT element for each attribute described.  ATT
elements are always the children of ELEMENT elements.

4.1 ATT Element Content

An ATT element has one or two children.  The first, which is optional,
is either a TYPE element specifying a predefined XML attribute type, or
an ENUMTYPE element specifying an enumerated type.  If neither a TYPE
element nor an ENUMTYPE element is present, the attribute type is CDATA.

The last child, which is always present, is either a REQUIRED element,
an IMPLIED element, a FIXED element, or a VALUE element.

4.2 ATT Element Attributes

The only (and required) attribute of an ATT element is "name", which is
a name token attribute specifying the name of the attribute.

4.3 TYPE Element

A TYPE element is an empty element specifying that an attribute has one
of the standard XML types.  There is one required attribute of a TYPE
element, "type", which can take the values "ID", "IDREF", "IDREFS",
"ENTITY", "ENTITIES", "NMTOKEN", or "NMTOKENS".

4.4 ENUMTYPE Elements

An ENUMTYPE element specifies that the attribute it describes can take
on one of a fixed set of values.  The values are described by VALUE
elements which are the children of the ENUMTYPE element.  ENUMTYPE
elements have no attributes.

4.5 REQUIRED Element

A REQUIRED element specifies that the attribute it describes is
required.  REQUIRED elements are empty and have no attributes.

4.6 IMPLIED Element

An IMPLIED element specifies that the attribute it describes, if not
explicitly present in the document, has an application-determined value.
IMPLIED elements are empty and have no attributes.

4.7 FIXED Element

A FIXED element specifies that the attribute it describes always has the
value specified by the required "value" attribute of the FIXED element.
This attribute is character data.  FIXED elements are empty.

4.8 VALUE Element

A VALUE element is empty and has one required character data attribute,
"value".  As a child of an ATT element, it specifies a default value for
the attribute being described.  As a child of an ENUMTYPE element, it
specifies one of the possible values for the attribute being described.
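
To illustrate, assume a hypothetical element type declared in a DTD as
follows (all names and values are invented for this example):

    <!ELEMENT img EMPTY>
    <!ATTLIST img
        src     CDATA          #REQUIRED
        id      ID             #IMPLIED
        align   (left | right) "left"
        version CDATA          #FIXED "1.0">

A corresponding ELEMENT element might read:

    <ELEMENT name="img">
        <EMPTY/>
        <ATT name="src">
            <REQUIRED/>
        </ATT>
        <ATT name="id">
            <TYPE type="ID"/>
            <IMPLIED/>
        </ATT>
        <ATT name="align">
            <ENUMTYPE>
                <VALUE value="left"/>
                <VALUE value="right"/>
            </ENUMTYPE>
            <VALUE value="left"/>
        </ATT>
        <ATT name="version">
            <FIXED value="1.0"/>
        </ATT>
    </ELEMENT>

The "src" and "version" attributes have no TYPE child because their
type is CDATA.  In the "align" attribute, the VALUE elements inside
ENUMTYPE list the permitted values, while the final VALUE element gives
the default.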


5.  The ENTITY Element

An ENTITY element describes a general entity, which may be internal or
external, and if external, may be parsed or unparsed.  ENTITY elements
correspond to ENTITY declarations in DTDs, other than parameter entity
declarations.

If multiple entity declarations with the same entity name appear in a
DTD, all but the first must be ignored when converting to an XSchema.

5.1 ENTITY Element Content Model

The content of an ENTITY element is character data, and represents the
replacement text of an internal entity.  ENTITY elements declaring
external entities must be empty.

5.2 ENTITY Element Attributes

The attributes of an ENTITY element are "name", "href", "public", and
"notation".  The "name" attribute is a required name token attribute and
represents the name of the entity being described.

The other attributes are string attributes.  The "href" attribute is
required for external entities, and represents the system identifier for
the entity.  The "public" attribute is optional for external entities,
and represents the public identifier for the entity.  The "notation"
attribute is required for unparsed external entities, and represents the
notation for the entity.
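
The three kinds of general entity can be sketched as follows.  The
entity names, identifiers, and replacement text are hypothetical.
Given these declarations:

    <!ENTITY copyright "Copyright 1998">
    <!ENTITY chapter1 SYSTEM "chap1.xml">
    <!ENTITY logo PUBLIC "-//Example//Logo//EN" "logo.gif" NDATA gif>

the corresponding ENTITY elements might be:

    <ENTITY name="copyright">Copyright 1998</ENTITY>
    <ENTITY name="chapter1" href="chap1.xml"/>
    <ENTITY name="logo" public="-//Example//Logo//EN" href="logo.gif"
            notation="gif"/>

Only the internal entity has content; the two external entities are
empty and are distinguished by the presence of the "notation" attribute.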


6.  The NOTATION Element

A NOTATION element describes a notation.  NOTATION elements correspond
to NOTATION declarations in DTDs, and are always empty.

6.1 NOTATION Element Attributes

The attributes of a NOTATION element are "name", "href", and "public",
and have the same significance as the correspondingly named attributes
of an ENTITY element.
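
For instance, a declaration such as <!NOTATION gif SYSTEM "gifview.exe">
(the notation name and system identifier are invented here) might be
rendered as:

    <NOTATION name="gif" href="gifview.exe"/>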


7.  Conformance

An XSchema conforms to this document if it is valid and contains a
DOCTYPE declaration referring to an external DTD subset substantially
equivalent to that given in Appendix A.  No internal subset may appear
in the DOCTYPE declaration.

An XSchema also conforms to this document if it is well-formed, does not
contain a DOCTYPE declaration, and would be valid if it contained a
DOCTYPE declaration referring to a DTD substantially equivalent to that
given in Appendix A.
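
The two conforming forms can be sketched as follows, assuming that the
DTD of Appendix A is available under the system identifier "dsd.dtd"
(the name used in that appendix) and using a hypothetical "memo"
document type.  With a DOCTYPE declaration:

    <?xml version="1.0"?>
    <!DOCTYPE DOCTYPE SYSTEM "dsd.dtd">
    <DOCTYPE root="memo">
        <ELEMENT name="memo">
            <MIXED/>
        </ELEMENT>
    </DOCTYPE>

Without a DOCTYPE declaration:

    <?xml version="1.0"?>
    <DOCTYPE root="memo">
        <ELEMENT name="memo">
            <MIXED/>
        </ELEMENT>
    </DOCTYPE>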


Appendix A.  XML DTD for XSchemas

<!--

This is the 19980407 draft of the DTD for XSchemas,
which are XML documents that contain the same information as XML DTDs.

Typical usage:  <!DOCTYPE DOCTYPE PUBLIC
			"-//blather/XSchema/EN"
			"dsd.dtd">

-->

<!ELEMENT DOCTYPE (ELEMENT|ENTITY|NOTATION)*>
<!ATTLIST DOCTYPE
	root	NMTOKEN	#REQUIRED>

<!ENTITY % repeatable "NAME | CHOICE | SEQ">
<!ENTITY % sequenceable "%repeatable; | OPT | OPTRPT | REPEAT">
<!ENTITY % name "name NMTOKEN #REQUIRED">

<!ELEMENT ELEMENT ((EMPTY | ANY | MIXED | %sequenceable;), (ATT)*)>
<!ATTLIST ELEMENT %name;>

<!ELEMENT EMPTY EMPTY>

<!ELEMENT ANY EMPTY>

<!ELEMENT MIXED (NAME*)>

<!ELEMENT NAME EMPTY>
<!ATTLIST NAME %name;>

<!ELEMENT CHOICE (%sequenceable;)+>

<!ELEMENT SEQ (%sequenceable;)+>

<!ELEMENT OPT (%repeatable;)>

<!ELEMENT OPTRPT (%repeatable;)>

<!ELEMENT REPEAT (%repeatable;)>

<!ELEMENT ATT ((TYPE | ENUMTYPE)?, (REQUIRED | IMPLIED | FIXED | VALUE))>
<!ATTLIST ATT %name;>

<!ELEMENT TYPE EMPTY>
<!ATTLIST TYPE
	type (ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS) #REQUIRED>

<!ELEMENT ENUMTYPE (VALUE)+>

<!ELEMENT REQUIRED EMPTY>

<!ELEMENT IMPLIED EMPTY>

<!ENTITY % value "value CDATA #REQUIRED">

<!ELEMENT FIXED EMPTY>
<!ATTLIST FIXED %value;>

<!ELEMENT VALUE EMPTY>
<!ATTLIST VALUE %value;>

<!ENTITY % external
	"name NMTOKEN #REQUIRED
	href CDATA #IMPLIED
	public CDATA #IMPLIED">

<!ELEMENT ENTITY (#PCDATA)>
<!ATTLIST ENTITY
	%external;
	notation CDATA #IMPLIED>

<!ELEMENT NOTATION EMPTY>
<!ATTLIST NOTATION
	%external;>

<!-- End of DTD -->


Appendix B.  Meta-XSchema

This is the XSchema that describes XSchemas, derived from the XSchema DTD
above.  You are not expected to understand this.

<?xml version="1.0" standalone="yes"?>

<!DOCTYPE DOCTYPE SYSTEM "dsd.dtd">

<DOCTYPE root="DOCTYPE">

	<ELEMENT name="DOCTYPE">
		<OPTRPT>
			<CHOICE>
				<NAME name="ELEMENT"/>
				<NAME name="ENTITY"/>
				<NAME name="NOTATION"/>
			</CHOICE>
		</OPTRPT>
		<ATT name="root">
			<TYPE type="NMTOKEN"/>
			<REQUIRED/>
		</ATT>
	</ELEMENT>

	<ELEMENT name="ELEMENT">
		<SEQ>
			<CHOICE>
				<NAME name="EMPTY"/>
				<NAME name="ANY"/>
				<NAME name="MIXED"/>
				<NAME name="NAME"/>
				<NAME name="CHOICE"/>
				<NAME name="SEQ"/>
				<NAME name="OPT"/>
				<NAME name="OPTRPT"/>
				<NAME name="REPEAT"/>
			</CHOICE>
			<OPTRPT>
				<NAME name="ATT"/>
			</OPTRPT>
		</SEQ>
		<ATT name="name">
			<TYPE type="NMTOKEN"/>
			<REQUIRED/>
		</ATT>
	</ELEMENT>

	<ELEMENT name="EMPTY">
		<EMPTY/>
	</ELEMENT>

	<ELEMENT name="ANY">
		<EMPTY/>
	</ELEMENT>

	<ELEMENT name="MIXED">
		<OPTRPT>
			<NAME name="NAME"/>
		</OPTRPT>
	</ELEMENT>

	<ELEMENT name="NAME">
		<EMPTY/>
		<ATT name="name">
			<TYPE type="NMTOKEN"/>
			<REQUIRED/>
		</ATT>
	</ELEMENT>

	<ELEMENT name="CHOICE">
		<REPEAT>
			<CHOICE>
				<NAME name="NAME"/>
				<NAME name="CHOICE"/>
				<NAME name="SEQ"/>
				<NAME name="OPT"/>
				<NAME name="OPTRPT"/>
				<NAME name="REPEAT"/>
			</CHOICE>
		</REPEAT>
	</ELEMENT>

	<ELEMENT name="SEQ">
		<REPEAT>
			<CHOICE>
				<NAME name="NAME"/>
				<NAME name="CHOICE"/>
				<NAME name="SEQ"/>
				<NAME name="OPT"/>
				<NAME name="OPTRPT"/>
				<NAME name="REPEAT"/>
			</CHOICE>
		</REPEAT>
	</ELEMENT>

	<ELEMENT name="OPT">
		<CHOICE>
			<NAME name="NAME"/>
			<NAME name="CHOICE"/>
			<NAME name="SEQ"/>
		</CHOICE>
	</ELEMENT>

	<ELEMENT name="OPTRPT">
		<CHOICE>
			<NAME name="NAME"/>
			<NAME name="CHOICE"/>
			<NAME name="SEQ"/>
		</CHOICE>
	</ELEMENT>

	<ELEMENT name="REPEAT">
		<CHOICE>
			<NAME name="NAME"/>
			<NAME name="CHOICE"/>
			<NAME name="SEQ"/>
		</CHOICE>
	</ELEMENT>

	<ELEMENT name="ATT">
		<SEQ>
			<OPT>
				<CHOICE>
					<NAME name="TYPE"/>
					<NAME name="ENUMTYPE"/>
				</CHOICE>
			</OPT>
			<CHOICE>
				<NAME name="REQUIRED"/>
				<NAME name="IMPLIED"/>
				<NAME name="FIXED"/>
				<NAME name="VALUE"/>
			</CHOICE>
		</SEQ>
		<ATT name="name">
			<TYPE type="NMTOKEN"/>
			<REQUIRED/>
		</ATT>
	</ELEMENT>

	<ELEMENT name="TYPE">
		<EMPTY/>
		<ATT name="type">
			<ENUMTYPE>
				<VALUE value="ID"/>
				<VALUE value="IDREF"/>
				<VALUE value="IDREFS"/>
				<VALUE value="ENTITY"/>
				<VALUE value="ENTITIES"/>
				<VALUE value="NMTOKEN"/>
				<VALUE value="NMTOKENS"/>
			</ENUMTYPE>
			<REQUIRED/>
		</ATT>
	</ELEMENT>

	<ELEMENT name="ENUMTYPE">
		<REPEAT>
			<NAME name="VALUE"/>
		</REPEAT>
	</ELEMENT>

	<ELEMENT name="REQUIRED">
		<EMPTY/>
	</ELEMENT>

	<ELEMENT name="IMPLIED">
		<EMPTY/>
	</ELEMENT>

	<ELEMENT name="FIXED">
		<EMPTY/>
		<ATT name="value">
			<REQUIRED/>
		</ATT>
	</ELEMENT>

	<ELEMENT name="VALUE">
		<EMPTY/>
		<ATT name="value">
			<REQUIRED/>
		</ATT>
	</ELEMENT>

	<ELEMENT name="ENTITY">
		<MIXED/>
		<ATT name="name">
			<TYPE type="NMTOKEN"/>
			<REQUIRED/>
		</ATT>
		<ATT name="href">
			<IMPLIED/>
		</ATT>
		<ATT name="public">
			<IMPLIED/>
		</ATT>
		<ATT name="notation">
			<IMPLIED/>
		</ATT>
	</ELEMENT>

	<ELEMENT name="NOTATION">
		<EMPTY/>
		<ATT name="name">
			<TYPE type="NMTOKEN"/>
			<REQUIRED/>
		</ATT>
		<ATT name="href">
			<IMPLIED/>
		</ATT>
		<ATT name="public">
			<IMPLIED/>
		</ATT>
	</ELEMENT>

</DOCTYPE>