Taskforce members: D. Beech, P. Biron, A. Brown, P. Chen, D. Fallside (ed), M. Fuchs, M. Murata, J. Robie
The updates proposed in this document take the form of new text to replace text currently existing in sections 2, 3 and Appendix B of the 19 July XML Schema: Structures working draft. Within sections 2 and 3 of this proposal, subsections indicated by only headings are assumed to contain the text of the July 19 draft, although note that the heading itself of section 2.4 is updated.
1. Introduction
2. Conceptual Framework
3. Schema Definitions and Declarations
3.1 The Schema
3.2 The Document and its Root
3.3 References to Schema Constructs
3.4 Types, Elements and Attributes
3.4.1 Type Definition
3.4.2 Datatype Specification
3.4.3 Archetype Specification
3.4.4 Attribute Declaration
3.4.5 Attribute Group Definition
3.4.6 Element Content Model
3.4.7 Mixed Content
3.4.8 Element-Only Content
3.4.9 Named Model Group
3.4.10 Element Declaration
3.5 Archetype Refinement
3.6 Entities and Notations
3.6.1 Internal Parsed Entity Declaration
3.6.2 External Parsed Entity Declaration
3.6.3 Unparsed Entity Declaration
3.6.4 Notation Declaration
4. Schema Composition and Namespaces
5. Documenting schemas
6. Conformance
B. (normative) DTD for Schemas
The purpose of a schema is to define a set of XML elements and attributes and the rules for their correct combination. For this reason, schemas always contain definitions of elements and/or attributes, and they usually contain constraints, that is, the rules for which elements and attributes can be used with which other ones, under which circumstances and in which ways.
The schema language is itself a set of elements and attributes. We will describe these, and show how they are used. But first, a quick example of an XML document.
Example
  <PurchaseOrder>
    <shipTo>
      <name>Alice Smith</name>
      <street>123 Maple Street</street>
      <city>Mill Valley</city>
      <state>CA</state>
      <zip>90952</zip>
    </shipTo>
    <orderDate>1999-05-20</orderDate>
    <shipDate>1999-05-25</shipDate>
    <comments> Get these things to me in a hurry, my lawn is going wild! </comments>
    <Items>
      <Item>
        <productName>Lawnmower, model BUZZ-1</productName>
        <quantity>1</quantity>
        <price>148.95</price>
      </Item>
      <Item>
        <productName>Baby Monitor, model SNOOZE-2</productName>
        <quantity>1</quantity>
        <price>39.98</price>
      </Item>
    </Items>
  </PurchaseOrder>
The purchase order consists of a main element with several subordinate elements. Most of the subelements have simple atomic types such as "string" or "date", but some are complex. Type is the mechanism for defining a complex element structure. For example, we can define a type such as Address as follows:
Example
  <type name="Address">
    <element name="name" type="dt:String" />
    <element name="street" type="dt:String" />
    <element name="city" type="dt:String" />
    <element name="state" type="dt:String" />
    <element name="zip" type="dt:Number" />
  </type>
An Address type consists of five elements. Though each has a distinct name, four of the elements will simply contain a string in a document instance while one will contain a number. Each of the basic types (string, number, etc) is defined in another schema, whose namespace is indicated by the "dt" prefix.
Given the definition of Address as above, we can define a PurchaseOrder as:
Example
  <type name="PurchaseOrder">
    <element name="shipTo" type="Address" />
    <element name="orderDate" type="dt:Date" />
    <element name="shipDate" type="dt:Date" />
    <element name="comments" type="dt:String" />
    <element name="Items" type="Items" />
  </type>
Several of the elements of the PurchaseOrder have types defined in the datatypes namespace we saw earlier; others, such as Address and Items are types defined in the current schema, and hence are not explicitly namespace-qualified.
What we mean by an element's name is the tag name appearing in the document instance, for example:
Example
<street> </street>
By a type we mean a description of a set of attributes and a pattern of elements and contained characters. For example the type "dt:String" means that text may appear, but no subelements, while "dt:Date" means that the content is limited to strings in a particular format representing dates. In XML technical terms, the name is the Generic Identifier of an element while the type identifies the (extended) Content Model.
A definition creates a new type; a declaration enables the appearance in a document instance of an element with a specific name and type. In the schema, we see both the definition of several types, and also several elements declared as usages of these types. For example, Address is defined to be a type, while within the definition of Address we see five declarations of elements. These declarations are not themselves types. They are accessor names to content of a specific type such as dt:String.
The relation between an element's usage and its valid contents is indicated by the "type" attribute. Suppose we want to make it clear that an element contains text but no subelements. We can say:
Example
<element name="street" type="dt:String" />
At the opposite extreme, we can limit an element to only contain subelements, and specific ones at that, with a declaration such as:
Example
<element name="shipTo" type="Address" />
Because Address is defined in the schema to have certain elements as its content, any shipTo element appearing in an instance must include those elements.
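As a rough illustration of what this requirement means for an instance, the following Python sketch checks that a shipTo element has exactly the five children listed in the Address definition. It is not part of the schema language: the function name is hypothetical, the datatypes of the children are ignored, and it assumes that the elements must appear in the order in which they are declared, which the text above does not state.

```python
import xml.etree.ElementTree as ET

# Child element names taken from the Address type definition above.
ADDRESS_CHILDREN = ["name", "street", "city", "state", "zip"]


def conforms_to_address(element):
    """Return True if the element's children are exactly the Address
    elements, in declaration order (the ordering is an assumption)."""
    return [child.tag for child in element] == ADDRESS_CHILDREN


ship_to = ET.fromstring(
    "<shipTo><name>Alice Smith</name><street>123 Maple Street</street>"
    "<city>Mill Valley</city><state>CA</state><zip>90952</zip></shipTo>"
)
incomplete = ET.fromstring("<shipTo><name>Alice Smith</name></shipTo>")

print(conforms_to_address(ship_to))     # True
print(conforms_to_address(incomplete))  # False
```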
A schema contains some preamble information and a set of definitions and declarations.
[Table: Schema top level]
The preamble consists of an xmlSchemaRef specifying the URI for XML Schema: Structures; the schemaIdentity specifying the URI by which this schema is to be identified; and a schemaVersion specification for private version documentation purposes and version management.
Example
  <!DOCTYPE schema PUBLIC '-//W3C//DTD XML Schema Version 1.0//EN' SYSTEM 'http://www.w3.org/XML/Group/1999/07/schema-snapshot/xmlschema/structures/structures.dtd' >
  <schema name='file:/usr/schemas/xmlschema/mySchema.xsd'
          version='M.n'
          xmlns='http://www.w3.org/XML/Group/1999/07/schema-snapshot/xmlschema/structures/structures.xsd'>
    ...
  </schema>
Note that the abstract syntax xmlSchemaRef is realised via a default namespace declaration in the concrete syntax.
The schema's model property is discussed in Archetype Refinement (§3.5). The schema's export, import and include properties are discussed in Schema Composition and Namespaces (§4).
The schema's declarations and definitions, discussed in detail in Schema Definitions and Declarations (§3), provide for the creation of new schema components:
[Table: Summary of Definitions and Declarations]
Example
The following illustrates the basic model for declaring all XML Schema: Structures components:
  <type name='myType'> ... </type>
  <element name='myElement'> ... </element>
  <attrGroup name='myAttrGroup'> ... </attrGroup>
  <modelGroup name='myModelGroup'> ... </modelGroup>
  <notation name='myNotation' ... />
  <textEntity name='myTextEntity'> ... </textEntity>
  <externalEntity name='myExternalEntity' ... />
  <unparsedEntity name='myUnparsedEntity' ... />
  </schema>
When creating a new component, we declare that its name is associated with the specification for that component. Each new component definition creates a new entry in the symbol space for that kind of component.
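One way to picture the per-kind symbol spaces is as one name table per kind of component. The Python sketch below is only an illustration under assumptions: the class and method names are hypothetical, and treating a repeated name within one kind as an error is a guess that the text above does not confirm.

```python
from collections import defaultdict


class SymbolSpaces:
    """One name table per kind of schema component (hypothetical helper)."""

    def __init__(self):
        self._spaces = defaultdict(dict)

    def declare(self, kind, name, spec):
        """Add a new entry to the symbol space for the given kind."""
        space = self._spaces[kind]
        if name in space:
            # Assumption: a name may be declared only once per kind.
            raise ValueError(f"duplicate {kind} name: {name}")
        space[name] = spec

    def lookup(self, kind, name):
        """Return the specification associated with a name of a given kind."""
        return self._spaces[kind][name]


spaces = SymbolSpaces()
spaces.declare("type", "Address", "type spec")
# No clash: element names live in a different symbol space from type names.
spaces.declare("element", "Address", "element spec")
```

Because each kind has its own table, a type and an element can share a name without conflict.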
The (§) Constraint on Schemas obtains.
Issue (no-evolution): This draft does not deal with the requirement "for addressing the evolution of schemata" (see []).
NOTE: We have not so far seen any need to reconstruct the XML 1.0 notion of root. For the connection from document instances to schemas, see (§).
Uniform means are provided for reference to a broad variety of schema constructs, both within a single schema and to features imported ( (§)) from external schemas. The name used to reference any component of XML Schema: Structures from within a schema consists of an NCName and an optional schemaRef, a reference to an external schema. In a few cases, some qualification may be added to a reference: this is made clear as the individual reference forms are introduced below.
[Table: Example: Component Names and References]
The abstract syntax above characterizes the reference mechanisms used in this specification.
The (§) Constraint on Schemas obtains.
The (§) Constraint on Schemas obtains.
The identify definition wrt schema-validity obtains.
The (§) Constraint on Schemas may also obtain.
Like XML 1.0 DTDs, XML Schema: Structures provides facilities for constraining the contents of elements and the values of attributes, and for augmenting the information set of instances, e.g. with defaulted values and type information. [Definition:] We refer hereafter to the combination of schema constraints and information set contributions with the abbreviation SC. Compared to DTDs, XML Schema: Structures provides for a richer set of SCs, and improved capabilities for sharing SCs across sets of elements and attributes.
A type definition creates a named type. It associates the given name with either a datatype specification or an archetype specification.
[Table: Types]
We start with [Definition:] the simple datatypes whose expression in XML documents consists entirely of character data. As in the current draft of XML Schemas: Datatypes, wherever we speak of datatypes in this draft, we shall mean these simple datatypes. The treatment of aggregate datatypes (collections and structures) has not yet been resolved.
[Table: Datatypes]
XML Schema: Structures incorporates the datatype specification mechanisms defined by [] in order to express SCs on attribute values and the character data contents of elements.
The production for datatypeSpec above serves to indicate where this chapter connects with XML Schemas: Datatypes. exportControl is defined in (§).
The other productions provide for using datatypes once they have been defined, see below under contentType and attrDecl.
We assume that it is appropriate to allow for some local specialization of datatypes at the point of use, and provide for that here (specialize).
As explained in References to Schema Constructs (§3.3), a schemaRef, if included, allows the referenced definition to be located in some other schema.
The (§) Constraint on Schemas obtains.
The satisfy-dt definition wrt schema-validity obtains.
The (§) Schema Information Set Contribution obtains.
NOTE: Because of timing constraints, this text may not align completely with XML Schemas: Datatypes.
[Definition:] Archetype specifications gather together all SCs pertinent to elements in instance documents, their attributes and their contents. They are called archetypes because there may be more than one element declaration that shares the same SCs (see Element Declaration (§3.4.10)), and which therefore can be constrained by a common archetype.
[Table: Archetype Specification]
The first two productions above provide the basic structure of the specification, the last two provide for reference to the things specified. But note that the name of an archetype is not ipso facto the name of elements whose appearance in instances will be associated with the SCs of that type. The connection between an element name and an archetype is made by an elementDecl, see below.
Alongside Attribute Declaration (§3.4.4) for permitted attributes, SCs for contents are defined in an archetype (contentType). For elements which may contain only character data, content type SCs are specified by reference to a Datatype Specification (§3.4.2). Note that doing this by way of datatypeRef means that the character data SCs may provide for specialization and even defaulting in a manner similar to attribute values. For other kinds of elements, an Element Content Model (§3.4.6) is required.
Issue (elt-default): The extension of defaulting to element content is tentative.
The (§) Constraint on Schemas obtains.
The (§) Constraint on Schemas obtains.
The attr-decl-set definition wrt schema-validity obtains.
The attr-fullname definition wrt schema-validity obtains.
The (§) Constraint on Schemas obtains.
The satisfy-as definition wrt schema-validity obtains.
Issue (sic-elt-default): The above definitions do not provide for handling a default on an archetype's datatypeRef. Preferred solution: empty element items ipso facto satisfy datatypeRefs with defaults and are augmented with the default value. This would have the consequence that you cannot provide the empty string as the explicit value of an element item if it's governed by a datatypeRef with a default.
The (§) Schema Information Set Contribution obtains.
NOTE: This draft does not provide any mechanism for applying any SCs to element items whose namespace does not nominate a schema. This may be addressed in a later draft: in the meantime a workaround is possible as follows:
Suppose we wish to use some Dublin Core terms in a schema, but all we know is the URI for the Dublin Core document. Perhaps we want to schema-validate
  <mybook><dc:creator xmlns:dc='...'>Rafael Sabattini</dc:creator></mybook>
where mybook is already known to be covered by my schema. The workaround is to replace the real Dublin Core URI with a local URL for a tiny schema which simply defines creator, and references the real URI for documentation.
Attribute declarations associate a name (which will appear as an attribute in start tags in instances) with SCs for the presence and value thereof.
[Table: Attributes]
NOTE: The datatypeRef productions are repeated here for easy reference.
Attribute declarations provide for:
Example
  <attribute name='myAttribute'/>
  <attribute name='anotherAttribute' type='integer' default='42'/>
  <attribute name='yetAnotherAttribute' type='integer' minOccurs='1'/>
  <attribute name='stillAnotherAttribute' type='string' fixed='Hello world!'/>
Four attributes are declared: one with no explicit SCs at all; two defined by reference to a built-in type, one with a default and one required to be present in instances; and one with a fixed value.
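The effect of these four declarations on an instance can be sketched in Python as follows. This is an illustrative reading, not the definition given by this draft: the function name and the dictionary layout are hypothetical, datatype checking is omitted, and supplying a fixed value when the attribute is absent is assumed by analogy with XML 1.0 #FIXED attributes.

```python
def apply_attribute_decls(decls, attrs):
    """Return attrs augmented with defaults; raise ValueError on a violation.

    decls maps an attribute name to a dict with optional keys
    'default', 'fixed' and 'minOccurs', mirroring the example above.
    """
    result = dict(attrs)
    for name, decl in decls.items():
        if name not in result:
            if decl.get("minOccurs") == 1:
                raise ValueError(f"required attribute missing: {name}")
            if "default" in decl:
                result[name] = decl["default"]
            elif "fixed" in decl:
                # Assumption: an absent fixed attribute takes the fixed value.
                result[name] = decl["fixed"]
        elif "fixed" in decl and result[name] != decl["fixed"]:
            raise ValueError(f"attribute {name} must equal {decl['fixed']!r}")
    return result


decls = {
    "myAttribute": {},
    "anotherAttribute": {"default": "42"},
    "yetAnotherAttribute": {"minOccurs": 1},
    "stillAnotherAttribute": {"fixed": "Hello world!"},
}
augmented = apply_attribute_decls(decls, {"yetAnotherAttribute": "7"})
```

Here the instance supplies only the required attribute, and the result gains the default and the fixed value.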
When attribute declarations are used in an archetype specification, each archetype provides its own symbol space for attribute names. E.g. an attribute named title within one archetype need not have the same datatypeRef as one declared within another archetype.
The attr-satisfy definition wrt schema-validity obtains.
Issue (default-attr-datatype): What is the default attribute datatypeSpec?
The satisfy-attrs definition wrt schema-validity obtains.
The (§) Schema Information Set Contribution obtains.
Issue (namespace-declare): We've got a problem with namespace declarations: they're not attributes at the infoset level, so they can appear without compromising validity, EXCEPT if there is a fixed or required declaration, and defaults should have the apparently desired effect.
XML Schema: Structures can name a group of attributes so that they may be incorporated as a whole into archetype definitions:
[Table: Attribute groups]
Attribute group definitions:
Example
  <attrGroup name='myAttrGroup'>
    <attribute .../>
    ...
  </attrGroup>
  <type name='myelement' content='empty'>
    <attrGroupRef name='myAttrGroup'/>
  </type>
Define and refer to an attribute group. The effect is as if the attribute declarations in the group were present in the archetype definition.
NOTE: There needs to be a Constraint on Schema which constrains the attrDecls which appear with an attrGroupRef: the name is the same as one of the attrDecls in the group, datatype and defaulting preserves substitutability, etc.
When content of elements is not constrained by reference to a datatype (Datatype Specification (§3.4.2)), it can have any, empty, element-only or mixed content. In the latter cases, the form of the content is specified in more detail.
[Table: Content model]
A content model constrains the element content of an archetype specification: it says nothing about attributes.
Content models do not have names, but appear as a part of the definition of an archetype, which does have a name. Model groups can be named and used by name, see below.
The satisfy-cm definition wrt schema-validity obtains.
A content model for mixed content provides for mixing elements with character data in document instances. The allowed elements are named, but neither their order nor their number of occurrences is constrained.
[Table: Mixed content]
The elementRefs and elementDecls determine the elements that may appear as children along with character data.
Example
  <mixed>
    <element ref='name1'/>
    <element ref='name2'/>
    <element ref='name3'/>
  </mixed>
Allows character data mixed with any number of name1, name2 and name3 elements.
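A minimal Python sketch of the check implied by a mixed content model is given below. The function name is hypothetical, and the sketch covers only the rule stated above: child elements must be among the named ones, while character data, order and number of occurrences are left unconstrained.

```python
import xml.etree.ElementTree as ET


def satisfies_mixed(element, allowed_names):
    """Return True if every child element is one of the allowed names.

    Character data, order and number of occurrences are not checked,
    because mixed content leaves them unconstrained.
    """
    return all(child.tag in allowed_names for child in element)


allowed = {"name1", "name2", "name3"}
ok = ET.fromstring("<p>text <name2/> more <name1/> and <name2/> again</p>")
bad = ET.fromstring("<p>text <other/></p>")
text_only = ET.fromstring("<p>just text</p>")
```

With an empty set of allowed names, only an element with no child elements passes, which corresponds to an empty mixed.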
NOTE: The fact that mixed allows for there to be no elementRefs or elementDecls makes it similar to XML 1.0's Mixed production. Indeed an empty mixed is the only way a schema can allow character data content with no datatype constraint at all.
The (§) Constraint on Schemas obtains.
See Element Declaration (§3.4.10) for discussion and examples of the appearance of elementDecl above.
The satisfy-mixed definition wrt schema-validity obtains.
A content model for element-only content specifies only child elements (no immediate character data content other than white space is allowed). The content model consists of a simple grammar governing the allowed types of child elements and the order in which they must appear.
[Table: Element-only content]
The grammar for element-only content is built on model elements and model groups (modelElt and modelGroup above). A model element provides for some number of occurrences in an instance of either a single element (via elementRef or elementDecl) or a group of elements (via modelGroup or modelGroupRef). A model group is two or more model elements plus a compositor.
A compositor for a model group specifies for a given group whether it is a sequence of its model elements, a choice between its model elements or a set of its model elements which must appear in instances. These options reconstruct the XML 1.0 "," connector, the XML 1.0 "|" connector and the SGML "&" connector respectively.
In the first case (sequence) all the model elements must appear in the order given in the group; in the second case (choice), exactly one of the model elements must appear in the element content; and in the third case (all), all the model elements, which are restricted in this case only to unqualified elementRefs and elementDecls, must appear in the element content, but may appear in any order.
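The three compositors can be sketched in Python for the simplest case. The sketch assumes a flat group whose members are plain element names, each occurring exactly once; nested groups and occurrence counts are deliberately left out, and the function name is hypothetical. The strings 'seq', 'choice' and 'all' follow the order attribute values used in the examples in this document.

```python
def satisfies_group(order, members, children):
    """Check a list of child element names against a flat model group.

    order is 'seq', 'choice' or 'all'; members is the list of element
    names in the group, each assumed to occur exactly once.
    """
    if order == "seq":
        # All members, in the order given in the group.
        return children == members
    if order == "choice":
        # Exactly one of the members.
        return len(children) == 1 and children[0] in members
    if order == "all":
        # All members, in any order.
        return sorted(children) == sorted(members)
    raise ValueError(f"unknown compositor: {order}")
```

For a group with members a and b, the child list b, a satisfies 'all' but not 'seq'.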
The occurs specification governs how many times the instance material allowed by a modelElt may occur at that point in the grammar. In the absence of a collection specification, at most one occurrence is allowed: maxOccurs must have the value 1 if specified (default is 1), and minOccurs must be either 0 or 1 (default 1). If collection is specified, the default for minOccurs is 0, and the absence of a maxOccurs specification means that no upper bound is placed on the number of occurrences. The value of minOccurs must be less than or equal to the value of maxOccurs (which must be greater than 0).
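The defaulting and range rules for occurrences can be written down as a small Python function. This is a sketch only: the function name is hypothetical, and the collection specification is modelled as a simple boolean flag because its concrete form is not shown here. A maximum of None stands for "no upper bound".

```python
def resolve_occurs(min_occurs=None, max_occurs=None, collection=False):
    """Return (min, max) with defaults applied; max of None means unbounded."""
    if collection:
        lo = 0 if min_occurs is None else min_occurs
        hi = max_occurs  # None: no upper bound is placed on occurrences
    else:
        lo = 1 if min_occurs is None else min_occurs
        hi = 1 if max_occurs is None else max_occurs
        if hi != 1:
            raise ValueError("maxOccurs must be 1 without a collection specification")
        if lo not in (0, 1):
            raise ValueError("minOccurs must be 0 or 1 without a collection specification")
    if hi is not None and hi <= 0:
        raise ValueError("maxOccurs must be greater than 0")
    if hi is not None and lo > hi:
        raise ValueError("minOccurs must not exceed maxOccurs")
    return lo, hi
```

Calling resolve_occurs() with no arguments yields one required occurrence, while resolve_occurs(collection=True) yields zero or more.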
See Element Declaration (§3.4.10) for further discussion and examples of the appearance of elementDecl within modelElt above.
The satisfy-eo definition wrt schema-validity obtains.
The (§) Constraint on Schemas obtains.
NOTE: The above permits repeated use of the same elementRef, analogous to DTD usage.
NOTE: EDITORS: Add a COS for the checking of valid pairs of minOccurs and maxOccurs.
The (§) Constraint on Schemas obtains.
Issue (still-unambig): Should this compatibility constraint be preserved?
This reconstructs another common use of parameter entities.
[Table: Named model groups]
Example
  <modelGroup name='myModelGroup'>
    <element ref='myelement'/>
  </modelGroup>
  <element name='myelement'>
    <type>
      <modelGroupRef name='myModelGroup'/>
      <attribute ...>. . .</attribute>
    </type>
  </element>
  <element name='anotherelement'>
    <type>
      <group order='choice'>
        <element ref='yetAnotherelement'/>
        <modelGroupRef name='myModelGroup'/>
      </group>
      <attribute ...>. . .</attribute>
    </type>
  </element>
A minimal model group is defined and used by reference, first as the whole content model, then as one alternative in a choice.
An [Definition:] element declaration associates an element name with a type, either by reference or by incorporation.
[Table: Element declaration]
An element declaration associates a name with a typeSpec. This name will appear in tags in instance documents; the type specification provides SCs on the form of elements tagged with the specified name. An element declaration whose elementSpec is a typeSpec is comparable to an <!ELEMENT ...> declaration in an XML 1.0 DTD.
The last two productions above provide for elements to be referenced by name from content models.
As noted above, element names are in a separate symbol space from the symbol space for the names of types, so there can be (but need not be) a type with the same name as a top-level element.
The elt-fullname definition wrt schema-validity obtains.
An elementDecl may also appear within a modelElt. See above (Element-Only Content (§3.4.8) and Mixed Content (§3.4.7)) for where this is allowed. This declares a locally-scoped association between an element name and a type. As with attribute names, locally-scoped element names reside in symbol spaces local to the archetype that defines them. Note however that type names are always top-level names within a schema, even when associated with locally-scoped element names.
NOTE: It is not yet clear whether a type defined implicitly by the appearance of a typeSpec directly within an elementSpec will have an implicit name, or if so what that name would be.
Example
  <element name='myelement' type='myDatatype'/>
  <element name='et0' type='myArchetype'/>
  <element name='et1'>
    <type>
      <group order='all'>. . .</group>
      <attribute ...>. . .</attribute>
    </type>
  </element>
  <element name='et2'>
    <type content='any'/>
  </element>
  <element name='et3'>
    <type content='empty'>
      <attribute ...>. . .</attribute>
    </type>
  </element>
  <element name='et4'>
    <type>
      <group order='choice'>. . .</group>
      <attribute ...>. . .</attribute>
    </type>
  </element>
  <element name='et5'>
    <type>
      <group order='seq'>. . .</group>
      <attribute ...>. . .</attribute>
    </type>
  </element>
  <element name='et6'>
    <type model='open' content='mixed'/>
  </element>
A pretty complete set of alternatives. Note the last one is intended to be equivalent to the idea sometimes called WFXML, for Well-Formed XML: it allows any content at all, whether defined in the current schema or not, and any attributes.
  <element name='contextOne'>
    <type order='seq'>
      <element name='myLocalelement' type='myFirstArchetype'/>
      <element ref='globalelement'/>
    </type>
  </element>
  <element name='contextTwo'>
    <type order='seq'>
      <element name='myLocalelement' type='mySecondArchetype'/>
      <element ref='globalelement'/>
    </type>
  </element>
Instances of myLocalelement within contextOne will be constrained by myFirstArchetype, while those within contextTwo will be constrained by mySecondArchetype.
NOTE: The possibility that differing attribute definitions and/or content models would apply to elements with the same name in different contexts is an extension beyond the expressive power of a DTD in XML 1.0.
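The context-dependent lookup can be sketched in Python. This is a simplification under stated assumptions: local declarations are keyed by the name of the enclosing element rather than by its (here unnamed) archetype, the type name globalArchetype is invented for the illustration, and checking local declarations before top-level ones is a guess at the intended resolution order.

```python
# Top-level element declarations; 'globalArchetype' is a hypothetical type name.
GLOBAL_ELEMENTS = {"globalelement": "globalArchetype"}

# Locally-scoped declarations from the example, keyed by (context, element name).
LOCAL_ELEMENTS = {
    ("contextOne", "myLocalelement"): "myFirstArchetype",
    ("contextTwo", "myLocalelement"): "mySecondArchetype",
}


def type_for(context_name, element_name):
    """Resolve the type governing an element appearing within a context.

    Assumption: a local declaration is consulted before top-level ones.
    """
    key = (context_name, element_name)
    if key in LOCAL_ELEMENTS:
        return LOCAL_ELEMENTS[key]
    return GLOBAL_ELEMENTS[element_name]
```

The same element name thus resolves to different types in contextOne and contextTwo, while globalelement resolves identically in both.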
The (§) Constraint on Schemas obtains.
The (§) Constraint on Schemas obtains.
The satisfy-ed definition wrt schema-validity obtains.
The ind-valid definition wrt schema-validity obtains.
The satisfy-etr definition wrt schema-validity obtains.
A notation may be declared by specifying a name and an identifier for the notation. A notation may be referenced by name in a schema as part of an external entity declaration.
Example
  <notation name='jpeg' public='image/jpeg' system='viewer.exe' />
  <element name='picture'>
    <type>
      <attribute name='entity' type='NOTATION'/>
    </type>
  </element>

  <picture entity='SKU-5782-pic'/>
The notation need not ever be mentioned in the instance document.
Issue (unparsed-entity-attributes): We need to synchronise with XML Schemas: Datatypes regarding how we declare attributes as unparsed entities!
The following is the DTD for XML Schema: Structures:
[DTD for XML Schema: Structures]