[This local archive copy mirrored from the canonical site: http://www.ccil.org/~cowan/XSchema-draft-19980601.txt, 19980603; links may not have complete integrity, so use the canonical document at this URL if possible.]
Title:          XML XSchemas: Language/Metalanguage Unification
Source:         John Cowan <cowan@ccil.org>
Primary Author: John Cowan (no W3C affiliation)
Date:           1998-06-01
Status:         Expert contribution
Action:         For the consideration of W3C XML WG/SI
References:     REC-xml-19980210
Distribution:   All interested parties

Synopsis

This note describes one possible method of rendering the information
in an XML DTD in the form of an XML document instance.  Such an
instance is called an XSchema.  An XSchema contains all the
information necessary to validate any XML document defined by it, and
can be automatically created from an XML DTD.  It is not possible to
reconstruct the exact DTD used to create an XSchema, but a
functionally equivalent DTD can be reconstructed.

XSchemas are primarily meant for consumption by automated validators.
Non-validating XML parsers do not require them, and human beings will
find them painful to write by hand, due to their great redundancy
compared to DTDs.

1. General Considerations

The following points explain how various constructions in DTDs are
represented or not represented in XSchemas.

1.1 XSchemas Represent Full DTDs, Not Internal Or External Subsets

When creating an XSchema corresponding to a particular DTD, both the
internal subset (contained within the document's DOCTYPE declaration)
and the external subset (referred to from the DOCTYPE declaration)
are taken into account.

1.2 Parameter Entities

Before converting a DTD to an XSchema, it is necessary to replace all
parameter entity references with their replacement texts, and to
remove parameter entity declarations from the DTD.  (In principle, it
would be possible to create XSchemas that used general entities
corresponding to at least some parameter entities, but the constraint
that general entities be well-formed might prevent some parameter
entities from being so converted.)
This is the main reason why the exact DTD used to construct an XSchema
cannot be recovered: common structure expressed using parameter
entities is not easily recoverable.

1.3 DTD Comment Declarations And Processing Instructions

Since comment declarations are allowed in XML document instances,
they may be passed from DTDs to XSchemas unchanged.  Processing
instructions are likewise passed through unchanged.

1.4 Conditional Sections

All conditional sections must be removed from a DTD before converting
it to an XSchema.  INCLUDE conditional sections are replaced by their
contents, whereas IGNORE conditional sections are removed entirely.

1.5 Element And Attribute Names

Many of the element names in the XSchema are the same as XML
declaration keywords (and are therefore in upper case).  There can be
no confusion between elements and declarations, however, because
markup declarations begin with "<!" rather than simply "<".  For
consistency, all other XSchema element names are also in upper case.
All attribute names are in lower case.

1.6 Extraneous Whitespace

All whitespace in element content is totally ignored in XSchemas.
Only ENTITY elements have #PCDATA content.

2. The DOCTYPE Element

The DOCTYPE element is the root element of an XSchema, containing all
the other XSchema elements.  Its attributes give properties of the
XSchema as a whole.

2.1 DOCTYPE Element Content Model

The DOCTYPE element may contain ELEMENT, ENTITY, and NOTATION
elements describing respectively the elements, entities, and
notations described by the XSchema.

2.2 DOCTYPE Attributes

The only (and required) attribute of a DOCTYPE element is "root",
which is a name token attribute specifying the root element of all
document instances conforming to this XSchema.

3. The ELEMENT Element And Related Elements

There is a single ELEMENT element in the XSchema for every element
type in the document described by the XSchema.
The attributes and content of an ELEMENT element provide complete
information about the element described.  This implies that when
converting DTDs to XSchemas, multiple ATTLIST declarations must be
consolidated.

3.1 ELEMENT Element Content Model

The first child of an ELEMENT element indicates the model of the
element being represented: it can be an EMPTY element, an ANY
element, a MIXED element, or any of the following elements declaring
element content: NAME, CHOICE, SEQ, OPT, OPTRPT, REPEAT.  The
remaining children are optional ATT elements declaring attributes.

3.2 ELEMENT Attributes

The only (and required) attribute of an ELEMENT element is "name",
which is a name token attribute specifying the name of the element.

3.3 EMPTY Elements

An EMPTY element is an empty element with no attributes.  It is used
to describe a content model of EMPTY.

3.4 ANY Elements

An ANY element is also an empty element with no attributes.  It is
used to describe a content model of ANY.

3.5 MIXED Elements

A MIXED element contains optional NAME elements and has no
attributes.  It is used to describe a mixed-content model including
#PCDATA (parsed character data) and the elements named in the NAME
elements.

3.6 NAME Elements

A NAME element is an empty element with one required name token
attribute, "name", which is used to specify the name of an element
participating in a mixed-content or element-content model.

3.7 CHOICE Elements

A CHOICE element may contain NAME, CHOICE, SEQ, OPT, OPTRPT, or
REPEAT elements, and represents a choice list as part of an
element-content model.  It has no attributes.  It is not used within
a MIXED element, where the presence of a choice list is implicit.

3.8 SEQ Elements

A SEQ element may contain NAME, CHOICE, SEQ, OPT, OPTRPT, or REPEAT
elements, and represents a sequence list as part of an
element-content model.  It has no attributes.

3.9 OPT Elements

An OPT element may contain NAME, CHOICE, or SEQ elements, and
represents an option (represented by the "?"
character in DTDs) as part of an element-content model.  It has no
attributes.

3.10 OPTRPT Element

An OPTRPT element may contain NAME, CHOICE, or SEQ elements, and
represents an optional repetition (represented by the "*" character
in DTDs) as part of an element-content model.  It has no attributes.
It is not used within a MIXED element, where the presence of an
optional repetition is implicit.

3.11 REPEAT Element

A REPEAT element may contain NAME, CHOICE, or SEQ elements, and
represents a required repetition, that is, one or more occurrences
(represented by the "+" character in DTDs), as part of an
element-content model.  It has no attributes.

4. The ATT Element And Related Elements

There is a single ATT element for each attribute described.  ATT
elements are always the children of ELEMENT elements.

4.1 ATT Element Content

An ATT element has one or two children.  The first child, which is
optional, is either a TYPE element specifying a predefined XML
attribute type, or an ENUMTYPE element specifying an enumerated type.
If neither a TYPE element nor an ENUMTYPE element is present, the
attribute type is CDATA.  The second child, which is always present,
is either a REQUIRED element, an IMPLIED element, a FIXED element, or
a VALUE element.

4.2 ATT Element Attributes

The only (and required) attribute of an ATT element is "name", which
is a name token attribute specifying the name of the attribute.

4.3 TYPE Element

A TYPE element is an empty element specifying that an attribute has
one of the standard XML types.  There is one required attribute of a
TYPE element, "type", which can take the values "ID", "IDREF",
"IDREFS", "ENTITY", "ENTITIES", "NMTOKEN", or "NMTOKENS".

4.4 ENUMTYPE Elements

An ENUMTYPE element specifies that the attribute it describes can
take on one of a fixed set of values.  The values are described by
VALUE elements which are the children of the ENUMTYPE element.
ENUMTYPE elements have no attributes.

4.5 REQUIRED Element

A REQUIRED element specifies that the attribute it describes is
required.
REQUIRED elements are empty and have no attributes.

4.6 IMPLIED Element

An IMPLIED element specifies that the attribute it describes, if not
explicitly present in the document, has an application-determined
value.  IMPLIED elements are empty and have no attributes.

4.7 FIXED Element

A FIXED element specifies that the attribute it describes always has
the value specified by the required "value" attribute of the FIXED
element.  This attribute is character data.  FIXED elements are
empty.

4.8 VALUE Element

A VALUE element is empty and has one required character data
attribute, "value".  As a child of an ATT element, it specifies a
default value for the attribute being described.  As a child of an
ENUMTYPE element, it specifies one of the possible values for the
attribute being described.

5. The ENTITY Element

An ENTITY element describes a general entity, which may be internal
or external, and if external, may be parsed or unparsed.  ENTITY
elements correspond to ENTITY declarations in DTDs, other than
parameter entity declarations.  If multiple entity declarations with
the same entity name appear in a DTD, all but the first must be
ignored when converting to an XSchema.

5.1 ENTITY Element Content Model

The content of an ENTITY element is character data, and represents
the replacement text of an internal entity.  ENTITY elements
declaring external entities must be empty.

5.2 ENTITY Element Attributes

The attributes of an ENTITY element are "name", "href", "public", and
"notation".  The "name" attribute is a required name token attribute
and represents the name of the entity being described.  The other
attributes are string attributes.  The "href" attribute is required
for external entities, and represents the system identifier for the
entity.  The "public" attribute is optional for external entities,
and represents the public identifier for the entity.  The "notation"
attribute is required for unparsed external entities, and represents
the notation for the entity.

6. The NOTATION Element

A NOTATION element describes a notation.  NOTATION elements
correspond to NOTATION declarations in DTDs.  NOTATION elements are
empty.

6.1 NOTATION Element Attributes

The attributes of a NOTATION element are "name", "href", and
"public", and have the same significance as the correspondingly named
attributes of an ENTITY element.

7. Conformance

An XSchema conforms to this document if it is valid and contains a
DOCTYPE declaration referring to an external DTD subset substantially
equivalent to that given in Appendix A.  No internal subset may
appear in the DOCTYPE declaration.

An XSchema also conforms to this document if it is well-formed, does
not contain a DOCTYPE declaration, and would be valid if it contained
a DOCTYPE declaration referring to a DTD substantially equivalent to
that given in Appendix A.

Appendix A. XML DTD for XSchemas

<!-- This is the 19980407 draft of the DTD for XSchemas, which are
     XML documents that contain the same information as XML DTDs.
     Typical usage:
        <!DOCTYPE DOCTYPE PUBLIC "-//blather/XSchema/EN" "dsd.dtd">
-->

<!ELEMENT DOCTYPE (ELEMENT|ENTITY|NOTATION)*>
<!ATTLIST DOCTYPE root NMTOKEN #REQUIRED>

<!ENTITY % repeatable "NAME | CHOICE | SEQ">
<!ENTITY % sequenceable "%repeatable; | OPT | OPTRPT | REPEAT">
<!ENTITY % name "name NMTOKEN #REQUIRED">

<!ELEMENT ELEMENT ((EMPTY | ANY | MIXED | %sequenceable;), (ATT)*)>
<!ATTLIST ELEMENT %name;>

<!ELEMENT EMPTY EMPTY>
<!ELEMENT ANY EMPTY>
<!ELEMENT MIXED (NAME*)>
<!ELEMENT NAME EMPTY>
<!ATTLIST NAME %name;>
<!ELEMENT CHOICE (%sequenceable;)+>
<!ELEMENT SEQ (%sequenceable;)+>
<!ELEMENT OPT (%repeatable;)>
<!ELEMENT OPTRPT (%repeatable;)>
<!ELEMENT REPEAT (%repeatable;)>

<!ELEMENT ATT ((TYPE | ENUMTYPE)?, (REQUIRED | IMPLIED | FIXED | VALUE))>
<!ATTLIST ATT %name;>
<!ELEMENT TYPE EMPTY>
<!ATTLIST TYPE
        type (ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS) #REQUIRED>
<!ELEMENT ENUMTYPE (VALUE)+>
<!ELEMENT REQUIRED EMPTY>
<!ELEMENT IMPLIED EMPTY>
<!ENTITY % value "value CDATA #REQUIRED">
<!ELEMENT FIXED EMPTY>
<!ATTLIST FIXED %value;>
<!ELEMENT VALUE EMPTY>
<!ATTLIST VALUE %value;>

<!ENTITY % external "name NMTOKEN #REQUIRED
        href CDATA #IMPLIED
        public CDATA #IMPLIED">
<!ELEMENT ENTITY (#PCDATA)>
<!ATTLIST ENTITY %external;
        notation CDATA #IMPLIED>
<!ELEMENT NOTATION EMPTY>
<!ATTLIST NOTATION %external;>

<!-- End of DTD -->

Appendix B. Meta-XSchema

This is the XSchema that describes XSchemas, derived from the XSchema
DTD above.  You are not expected to understand this.

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCTYPE SYSTEM "dsd.dtd">
<DOCTYPE root="DOCTYPE">
  <ELEMENT name="DOCTYPE">
    <OPTRPT>
      <CHOICE>
        <NAME name="ELEMENT"/>
        <NAME name="ENTITY"/>
        <NAME name="NOTATION"/>
      </CHOICE>
    </OPTRPT>
    <ATT name="root">
      <TYPE type="NMTOKEN"/>
      <REQUIRED/>
    </ATT>
  </ELEMENT>
  <ELEMENT name="ELEMENT">
    <SEQ>
      <CHOICE>
        <NAME name="EMPTY"/>
        <NAME name="ANY"/>
        <NAME name="MIXED"/>
        <NAME name="NAME"/>
        <NAME name="CHOICE"/>
        <NAME name="SEQ"/>
        <NAME name="OPT"/>
        <NAME name="OPTRPT"/>
        <NAME name="REPEAT"/>
      </CHOICE>
      <OPTRPT>
        <NAME name="ATT"/>
      </OPTRPT>
    </SEQ>
    <ATT name="name">
      <TYPE type="NMTOKEN"/>
      <REQUIRED/>
    </ATT>
  </ELEMENT>
  <ELEMENT name="EMPTY">
    <EMPTY/>
  </ELEMENT>
  <ELEMENT name="ANY">
    <EMPTY/>
  </ELEMENT>
  <ELEMENT name="MIXED">
    <OPTRPT>
      <NAME name="NAME"/>
    </OPTRPT>
  </ELEMENT>
  <ELEMENT name="NAME">
    <EMPTY/>
    <ATT name="name">
      <TYPE type="NMTOKEN"/>
      <REQUIRED/>
    </ATT>
  </ELEMENT>
  <ELEMENT name="CHOICE">
    <REPEAT>
      <CHOICE>
        <NAME name="NAME"/>
        <NAME name="CHOICE"/>
        <NAME name="SEQ"/>
        <NAME name="OPT"/>
        <NAME name="OPTRPT"/>
        <NAME name="REPEAT"/>
      </CHOICE>
    </REPEAT>
  </ELEMENT>
  <ELEMENT name="SEQ">
    <REPEAT>
      <CHOICE>
        <NAME name="NAME"/>
        <NAME name="CHOICE"/>
        <NAME name="SEQ"/>
        <NAME name="OPT"/>
        <NAME name="OPTRPT"/>
        <NAME name="REPEAT"/>
      </CHOICE>
    </REPEAT>
  </ELEMENT>
  <ELEMENT name="OPT">
    <CHOICE>
      <NAME name="NAME"/>
      <NAME name="CHOICE"/>
      <NAME name="SEQ"/>
    </CHOICE>
  </ELEMENT>
  <ELEMENT name="OPTRPT">
    <CHOICE>
      <NAME name="NAME"/>
      <NAME name="CHOICE"/>
      <NAME name="SEQ"/>
    </CHOICE>
  </ELEMENT>
  <ELEMENT name="REPEAT">
    <CHOICE>
      <NAME name="NAME"/>
      <NAME name="CHOICE"/>
      <NAME name="SEQ"/>
    </CHOICE>
  </ELEMENT>
  <ELEMENT name="ATT">
    <SEQ>
      <OPT>
        <CHOICE>
          <NAME name="TYPE"/>
          <NAME name="ENUMTYPE"/>
        </CHOICE>
      </OPT>
      <CHOICE>
        <NAME name="REQUIRED"/>
        <NAME name="IMPLIED"/>
        <NAME name="FIXED"/>
        <NAME name="VALUE"/>
      </CHOICE>
    </SEQ>
    <ATT name="name">
      <TYPE type="NMTOKEN"/>
      <REQUIRED/>
    </ATT>
  </ELEMENT>
  <ELEMENT name="TYPE">
    <EMPTY/>
    <ATT name="type">
      <ENUMTYPE>
        <VALUE value="ID"/>
        <VALUE value="IDREF"/>
        <VALUE value="IDREFS"/>
        <VALUE value="ENTITY"/>
        <VALUE value="ENTITIES"/>
        <VALUE value="NMTOKEN"/>
        <VALUE value="NMTOKENS"/>
      </ENUMTYPE>
      <REQUIRED/>
    </ATT>
  </ELEMENT>
  <ELEMENT name="ENUMTYPE">
    <REPEAT>
      <NAME name="VALUE"/>
    </REPEAT>
  </ELEMENT>
  <ELEMENT name="REQUIRED">
    <EMPTY/>
  </ELEMENT>
  <ELEMENT name="IMPLIED">
    <EMPTY/>
  </ELEMENT>
  <ELEMENT name="FIXED">
    <EMPTY/>
    <ATT name="value">
      <REQUIRED/>
    </ATT>
  </ELEMENT>
  <ELEMENT name="VALUE">
    <EMPTY/>
    <ATT name="value">
      <REQUIRED/>
    </ATT>
  </ELEMENT>
  <ELEMENT name="ENTITY">
    <MIXED/>
    <ATT name="name">
      <TYPE type="NMTOKEN"/>
      <REQUIRED/>
    </ATT>
    <ATT name="href">
      <IMPLIED/>
    </ATT>
    <ATT name="public">
      <IMPLIED/>
    </ATT>
    <ATT name="notation">
      <IMPLIED/>
    </ATT>
  </ELEMENT>
  <ELEMENT name="NOTATION">
    <EMPTY/>
    <ATT name="name">
      <TYPE type="NMTOKEN"/>
      <REQUIRED/>
    </ATT>
    <ATT name="href">
      <IMPLIED/>
    </ATT>
    <ATT name="public">
      <IMPLIED/>
    </ATT>
  </ELEMENT>
</DOCTYPE>
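
The conversion rules of sections 1 through 5 may be easier to follow
on a smaller case than the meta-XSchema.  The sketch below is only an
illustration under stated assumptions: the DTD shown in the leading
comment, the element names "memo", "title", "para", "em", and "note",
the attribute names, and the entity "co" are all hypothetical and do
not come from any real document type.  The root element is assumed to
be "memo"; a DTD alone does not say which element is the root, so
this value would have to come from the DOCTYPE declaration of a
document instance.

```xml
<!-- Hypothetical input DTD, after parameter entity references have
     been replaced by their replacement texts (section 1.2):

     <!ELEMENT memo (title, para+, note?)>
     <!ATTLIST memo id ID #REQUIRED>
     <!ATTLIST memo status (draft|final) "draft">
     <!ELEMENT title (#PCDATA)>
     <!ELEMENT para (#PCDATA | em)*>
     <!ELEMENT em (#PCDATA)>
     <!ELEMENT note EMPTY>
     <!ATTLIST note ref CDATA #IMPLIED>
     <!ENTITY co "Example Co.">
-->
<DOCTYPE root="memo">
  <!-- The two ATTLIST declarations for "memo" are consolidated
       into one ELEMENT element (section 3). -->
  <ELEMENT name="memo">
    <SEQ>
      <NAME name="title"/>
      <REPEAT><NAME name="para"/></REPEAT>
      <OPT><NAME name="note"/></OPT>
    </SEQ>
    <ATT name="id">
      <TYPE type="ID"/>
      <REQUIRED/>
    </ATT>
    <!-- Enumerated type with a default value (sections 4.4, 4.8). -->
    <ATT name="status">
      <ENUMTYPE>
        <VALUE value="draft"/>
        <VALUE value="final"/>
      </ENUMTYPE>
      <VALUE value="draft"/>
    </ATT>
  </ELEMENT>
  <ELEMENT name="title">
    <MIXED/>
  </ELEMENT>
  <!-- The choice list and the "*" are implicit in MIXED. -->
  <ELEMENT name="para">
    <MIXED>
      <NAME name="em"/>
    </MIXED>
  </ELEMENT>
  <ELEMENT name="em">
    <MIXED/>
  </ELEMENT>
  <!-- No TYPE or ENUMTYPE child: the attribute type is CDATA. -->
  <ELEMENT name="note">
    <EMPTY/>
    <ATT name="ref">
      <IMPLIED/>
    </ATT>
  </ELEMENT>
  <ENTITY name="co">Example Co.</ENTITY>
</DOCTYPE>
```

Note that the XSchema does not record whether the content model of
"title" was written as (#PCDATA) or (#PCDATA)* in the DTD.  The two
forms accept the same content, which is one small instance of the
point made in the Synopsis: a functionally equivalent DTD can be
reconstructed, but not necessarily the exact original.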