XML Schema Tutorial: Additional Materials [*]
Henry S. Thompson
Language Technology Group,
Human Communication Research Centre
University of Edinburgh
London, 15 December 1999
This document contains additional illustrative and reference material to accompany the narrative presentation in the slide documents.
DISCLAIMER
Some of the material contained herein is drawn from internal working drafts of the XML Schema Working Group of the W3C. Those WG-internal drafts are due to be published in the next few days, but may change in the interim, and in any case errors may have been introduced in copying and formatting. Only material identified as W3C publications is a reliable or quotable* source of information about XML Schema or any other W3C activity.
_______
*Note that employees of W3C member companies (that's you) are enjoined from distributing W3C-internal materials to non-W3C members: these materials are covered by that injunction!
“The purpose of an XML Schema: Structures schema is to define and describe a class of XML documents by using [schema] constructs to constrain and document the meaning, usage and relationships of their constituent parts: datatypes, elements and their content, attributes and their values. Schema constructs may also provide for the specification of additional information such as default values. Schemas are intended to document their own meaning, usage, and function through a common documentation vocabulary. Thus, XML Schema: Structures can be used to define, describe and catalogue XML vocabularies for classes of XML documents”
“[XML Schema: Datatypes] addresses the need of both document authors and applications writers for a robust, extensible datatype system for XML which could be incorporated into XML processors. As discussed below, these datatypes could be used in other XML-related standards as well.”
1 Introduction
1.1 Documentation Conventions
1.2 Purpose
1.3 Relationship To Other Work
1.4 Terminology
2 Conceptual Framework
2.1 Kinds of XML Documents
2.2 On schemas, constraints and contributions
2.3 Schemas, Types and Elements
2.4 Schemas and their component parts
2.5 Names and Symbol Spaces
2.6 Referencing Schema Components
2.7 Association of components with a target namespace
2.7.1 Association of definitions with a target namespace
2.7.2 Providing a target namespace for definitions and declarations
2.8 Abstract and Concrete Syntax
3 Schema Definitions and Declarations
3.1 The Schema
3.2 The Document and its Root
3.3 References to Schema Constructs
3.4 Types, Elements and Attributes
3.4.1 Simple Type Definition
3.4.2 Complex Type Definition
3.4.3 Attribute Declaration
3.4.4 Attribute Group Definition
3.4.5 Element Content Model
3.4.6 Rich Content Models
3.4.7 Mixed Content
3.4.8 Named Model Group
3.4.9 Element Declaration
3.5 Wildcards
3.6 Deriving Type Definitions
3.6.1 Deriving type definitions by extension
3.6.2 Deriving type definitions by restriction
3.6.3 Controlling derivation
3.6.4 Reinterpreting Content Models
3.6.5 Element Equivalence Classes
3.6.6 The ur-type
3.6.7 Graveyard for stale syntax, here to avoid breaking IDREFs elsewhere *
3.7 Unique, key and key reference constraints
3.8 Notations
3.8.1 Notation Declaration
4 Schema Access and Composition
4.1 Layer 1: Summary of the schema-validation core
4.2 Layer 2: Schema definitions in XML
4.2.1 Assembling a schema for a single namespace from multiple schema definition documents
4.2.2 References to schema components across namespaces
4.3 Layer 3: Web-interoperability
4.3.1 Standards for representation and retrieval of schema definitions on the Web
4.3.2 How schema definitions are located on the Web
5 Annotating schemas
6 Conformance *
6.1 Schema Validity *
6.2 Detailed validity constraints and definitions *
6.2.1 The Schema *
6.2.2 References to Schema Constructs *
6.2.3 Types, Elements and Attributes *
6.2.4 Type Refinement *
6.2.5 Import Restrictions *
6.2.6 Schema Inclusion *
6.2.7 Schema Validity *
6.3 Responsibilities of Schema-aware processors *
6.4 Lexical representation *
6.5 Information set *
Appendices
A (normative) Schema for Schemas
B (normative) DTD for Schemas
C Glossary (normative) *
D References (normative) *
E Acknowledgments (non-normative)
F Sample Schema (non-normative)
G Tabulation of changes
H Open Issues
1 Introduction
1.1 Purpose
1.2 Requirements
1.3 Scope
1.4 Terminology
2 Type System
2.1 Datatype
2.2 Value space
2.3 Lexical space
2.4 Datatype dichotomies
2.4.1 Atomic vs. aggregate datatypes
2.4.2 Primitive vs. generated datatypes
2.4.3 Built-in vs. user-generated datatypes
2.5 Facets
2.5.1 Fundamental facets
2.5.2 Constraining or Non-fundamental facets
3 Built-in datatypes
3.1 Namespace considerations
3.2 Primitive datatypes
3.2.1 string
3.2.2 boolean
3.2.3 float
3.2.4 double
3.2.5 decimal
3.2.6 timeInstant
3.2.7 timeDuration
3.2.8 recurringInstant
3.2.9 binary
3.2.10 uri
3.3 Generated datatypes
3.3.1 language
3.3.2 NMTOKEN
3.3.3 NMTOKENS
3.3.4 Name
3.3.5 QName
3.3.6 NCName
3.3.7 ID
3.3.8 IDREF
3.3.9 IDREFS
3.3.10 ENTITY
3.3.11 ENTITIES
3.3.12 NOTATION
3.3.13 integer
3.3.14 non-negative-integer
3.3.15 positive-integer
3.3.16 non-positive-integer
3.3.17 negative-integer
3.3.18 date
3.3.19 time
4 Defining Generated Datatypes
5 Conformance
Appendices
A Schema for Datatype Definitions (normative)
B DTD for Datatype Definitions (normative)
C Datatypes and Facets
C.1 Fundamental Facets
C.2 Constraining Facets
D ISO 8601 Date and Time Formats
D.1 ISO 8601 Conventions
D.2 Truncated Formats
D.3 Deviations from ISO 8601 Formats
D.3.1 Sign Allowed
D.3.2 More Than 9999 Years
E Regular Expressions
F References
F.1 Normative
F.2 Non-normative
G Acknowledgments (non-normative)
H Open Issues
I Revisions from Previous Draft
First the instance
<PurchaseOrder orderDate="1999-05-20">
  <shipTo type="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo type="UK">
    <name>Trevor Mostyn</name>
    <street>12, The Gables</street>
    <city>Bourton-on-the-Water</city>
    <state>Glous.</state>
    <zip>GL3 2BB</zip>
  </billTo>
  <shipDate>1999-05-25</shipDate>
  <comment>Get these things to me in a hurry, my lawn is going wild!</comment>
  <Items>
    <Item pno="333-333">
      <productName>Lawnmower, model BUZZ-1</productName>
      <quantity>1</quantity>
      <price>148.95</price>
      <comment>Please confirm this is the electric model</comment>
    </Item>
    <Item pno="444-444">
      <productName>Baby Monitor, model SNOOZE-2</productName>
      <quantity>1</quantity>
      <price>39.98</price>
    </Item>
  </Items>
</PurchaseOrder>
Then the schema
<schema
    targetNamespace='http://…/PurchaseOrder'
    xmlns:po='http://…/PurchaseOrder'
    xmlns='http://www.w3.org/1999/XMLSchema'>
  <element name='PurchaseOrder'
           type='po:PurchaseOrderType'/>
  <element name='comment' type='string'/>
  <type name='PurchaseOrderType'>
    <element name='shipTo' type='po:Address'/>
    <element name='billTo' type='po:Address'/>
    <element name='shipDate' type='date'/>
    <element ref='po:comment' minOccurs='0'/>
    <element name='Items' type='po:Items'/>
    <attribute name='orderDate' type='date'/>
  </type>
  <type name='Address'>
    <element name='name' type='string'/>
    <element name='street' type='string'/>
    <element name='city' type='string'/>
    <element name='state' type='string'/>
    <!-- string rather than integer, so that non-numeric postcodes
         such as "GL3 2BB" in the instance are accepted -->
    <element name='zip' type='string'/>
    <attribute name='type' type='string'/>
  </type>
  <type name='Items'>
    <element name='Item'
             minOccurs='0' maxOccurs='*'>
      <type>
        <element name='productName'
                 type='string'/>
        <element name='quantity'>
          <datatype source='integer'>
            <minExclusive value='0'/>
          </datatype>
        </element>
        <element name='price' type='decimal'/>
        <element ref='po:comment'
                 minOccurs='0'/>
        <attribute name='pno'
                   type='string'/>
      </type>
    </element>
  </type>
</schema>
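As a quick check on what the schema above constrains, the fragment below sketches an Item that should be rejected: its quantity is 0, while the anonymous datatype on quantity sets minExclusive to 0, so only integers greater than zero are allowed. The part number, product name and price are made up for illustration and do not come from the original order.

```xml
<!-- Hypothetical invalid fragment: quantity must be an integer greater than 0 -->
<Item pno="555-555">
  <productName>Hedge Trimmer, model CLIP-3</productName>
  <quantity>0</quantity>
  <price>24.50</price>
</Item>
```

Changing the quantity to 1 or more should make the fragment acceptable as a child of Items.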
The schema
<xsd:type name='animalFriends'>
  <xsd:any equivClass='pet' maxOccurs='*'/>
</xsd:type>
<xsd:type name='pet'>
  <xsd:attribute name='name'/>
  <xsd:attribute name='owner' minOccurs='0'/>
</xsd:type>
<xsd:element name='pet' type='pet'
             abstract='yes'/>
<xsd:element name='cat' equivClass='pet'>
  <xsd:type source='pet' derivedBy='extension'>
    <xsd:element name='kittens' minOccurs='0'/>
    <xsd:attribute name='lives'/>
  </xsd:type>
</xsd:element>
<xsd:element name='dog' equivClass='pet'>
  <xsd:type source='pet' derivedBy='extension'>
    <xsd:element name='puppies' minOccurs='0'/>
    <xsd:attribute name='breed'/>
  </xsd:type>
</xsd:element>
A valid instance
<animalFriends>
  <cat name='Fluffy' lives='9'/>
  <dog name='Gromit' owner='Wallace'
       breed='mutt'/>
</animalFriends>
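By contrast, the following sketch would not be expected to validate, on the assumption that an element declared abstract can never appear in an instance itself and that only members of its equivalence class (here cat and dog) may stand in for it. The name 'Rex' is invented for illustration.

```xml
<animalFriends>
  <!-- Assumed invalid: 'pet' is declared abstract, so only
       'cat' or 'dog' should be allowed in its place -->
  <pet name='Rex'/>
</animalFriends>
```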
See disclaimer on page 2
<!-- XML Schema schema for XML Schemas: Part 2: Datatypes -->
<!-- Note this schema is NOT the normative datatypes schema - - the prose copy
in the datatypes REC is the normative version (which shouldn't differ from
this one except for this comment and entity expansions, but just in case) -->
<!DOCTYPE schema PUBLIC "-//W3C//DTD XMLSCHEMA 19991216//EN" "structures.dtd" >
<schema xmlns="http://www.w3.org/1999/XMLSchema"
targetNamespace="http://www.w3.org/1999/XMLSchema"
version="$Id: datatypes.xsd,v 1.2 1999/12/04 12:09:08 aqw Exp $">
<type name="datatype"
source="annotated"
derivedBy="extension">
<element ref="facet"
minOccurs="0"
maxOccurs="*"/>
<attribute name="name" type="NCName">
<annotation>
<info>Will be restricted to
required or forbidden</info>
</annotation>
</attribute>
<attribute name="source"
type="QName"
minOccurs="1"/>
</type>
<element name="datatype" equivClass="schemaTop">
<type source="datatype"
derivedBy="restriction">
<annotation>
<info>This is the top-level type element, as ref'ed in &lt;schema&gt;</info>
</annotation>
<attribute name="name" minOccurs="1">
<annotation>
<info>Required at the
top level</info>
</annotation>
</attribute>
</type>
</element>
<type name="facet"
source="annotated"
derivedBy="extension">
<attribute name="value" minOccurs="1"/>
</type>
<element name="facet"
type="facet"
abstract="true"/>
<element name="minBound" abstract="true"
equivClass="facet"/>
<element name="minExclusive"
equivClass="minBound"/>
<element name="minInclusive"
equivClass="minBound"/>
<element name="maxBound" abstract="true"
equivClass="facet"/>
<element name="maxExclusive"
equivClass="maxBound"/>
<element name="maxInclusive"
equivClass="maxBound"/>
<type name="numFacet" source="facet"
derivedBy="restriction">
<attribute name="value"
type="non-negative-integer"/>
</type>
<element name="precision" type="numFacet"
equivClass="facet"/>
<element name="scale" type="numFacet"
equivClass="facet"/>
<element name="length" type="numFacet"
equivClass="facet"/>
<element name="maxLength" type="numFacet"
equivClass="facet"/>
<!-- the following datatype is used to limit the
possible values for the encoding facet on
the binary datatype -->
<datatype name="encodings" source="NMTOKEN">
<enumeration value="hex">
<annotation>
<info>each (8-bit) byte is encoded as
a sequence of 2 hexadecimal digits</info>
</annotation>
</enumeration>
<enumeration value="base64">
<annotation>
<info>value is encoded in Base64 as
defined in the MIME RFC</info>
</annotation>
</enumeration>
</datatype>
<element name="encoding" equivClass="facet">
<type source="facet" derivedBy="restriction">
<attribute name="value" type="encodings"/>
</type>
</element>
<element name="period" equivClass="facet">
<type source="facet" derivedBy="restriction">
<attribute name="value" type="timeDuration"/>
</type>
</element>
<element
name="enumeration"
equivClass="facet"/>
<element name="pattern" equivClass="facet"/>
<!-- built-in generated datatypes -->
<!-- only has a few for now, eventually needs to have all of them -->
<datatype name="integer" source="decimal">
<scale value="0"/>
</datatype>
<datatype name="non-negative-integer"
source="integer">
<minInclusive value="0"/>
</datatype>
<datatype name="positive-integer"
source="non-negative-integer">
<minInclusive value="1"/>
</datatype>
<datatype name="non-positive-integer"
source="integer">
<maxInclusive value="0"/>
</datatype>
<datatype name="negative-integer"
source="non-positive-integer">
<maxInclusive value="-1"/>
</datatype>
<datatype name="date"
source="recurringInstant">
<period value="000000T2400"/>
</datatype>
<datatype name="time"
source="recurringInstant">
<period value="000000T2400"/>
</datatype>
<datatype name="NMTOKENS" source="string">
<pattern value="\c+(\s\c+)*">
<annotation>
<info source="http://www.w3.org/TR/REC-xml#NT-Nmtokens">
pattern matches production 8
from the XML spec
</info>
</annotation>
</pattern>
</datatype>
<datatype name="NMTOKEN" source="NMTOKENS">
<pattern value="\c+">
<annotation>
<info source="http://www.w3.org/TR/REC-xml#NT-Nmtoken">
pattern matches production 7 from the XML spec
</info>
</annotation>
</pattern>
</datatype>
<datatype name="Name" source="NMTOKEN">
<pattern value="\i\c*">
<annotation>
<info source="http://www.w3.org/TR/REC-xml#NT-Name">
pattern matches production 5 from the XML spec
</info>
</annotation>
</pattern>
</datatype>
<datatype name="ID" source="NCName">
<annotation>
<info source="http://www.w3.org/TR/REC-xml#id">
values of this datatype must be unique within a document
</info>
</annotation>
</datatype>
<datatype name="IDREFS" source="string">
<pattern
value="[\i-[:]][\c-[:]]*(\s[\i-[:]][\c-[:]]*)*">
<annotation>
<info source="http://www.w3.org/TR/REC-xml#NT-Names">
pattern matches production 6 from the XML spec,
modified as required by the Conformance section in Namespaces in XML
(http://www.w3.org/TR/REC-xml-names#conformance)
</info>
<info source="http://www.w3.org/TR/REC-xml#idref">
values of this datatype must have occurred within a document
as the value of some component of type ID
</info>
</annotation>
</pattern>
</datatype>
<datatype name="IDREF" source="IDREFS">
<pattern value="[\i-[:]][\c-[:]]*">
<annotation>
<info source="http://www.w3.org/TR/REC-xml-names#NT-NCName">
pattern matches production 4 from the Namespaces in XML spec
</info>
<info source="http://www.w3.org/TR/REC-xml#idref">
values of this datatype must have occurred within a document
as the value of some component of type ID
</info>
</annotation>
</pattern>
</datatype>
<datatype name="ENTITIES" source="string">
<pattern
value="[\i-[:]][\c-[:]]*(\s[\i-[:]][\c-[:]]*)*">
<annotation>
<info source="http://www.w3.org/TR/REC-xml#NT-Names">
pattern matches production 6 from the XML spec,
modified as required by the Conformance section in Namespaces in XML
(http://www.w3.org/TR/REC-xml-names#conformance)
</info>
<info source="http://www.w3.org/TR/REC-xml#entname">
values of this datatype must match the name of
an unparsed entity declared in the schema
</info>
</annotation>
</pattern>
</datatype>
<datatype name="ENTITY" source="ENTITIES">
<pattern value="[\i-[:]][\c-[:]]*">
<annotation>
<info source="http://www.w3.org/TR/REC-xml-names#NT-NCName">
pattern matches production 4 from the Namespaces in XML spec
</info>
<info source="http://www.w3.org/TR/REC-xml#entname">
values of this datatype must match the name of
an unparsed entity declared in the schema
</info>
</annotation>
</pattern>
</datatype>
<datatype name="NCName" source="Name">
<pattern value="[\i-[:]][\c-[:]]*">
<annotation>
<info source="http://www.w3.org/TR/REC-xml-names/#NT-NCName">
pattern matches production 4 from the Namespaces in XML spec
</info>
</annotation>
</pattern>
</datatype>
<datatype name="QName" source="Name">
<pattern
value="([\i-[:]][\c-[:]]*:)?[\i-[:]][\c-[:]]*">
<annotation>
<info
source="http://www.w3.org/TR/REC-xml-names/#NT-QName">
pattern matches production 6 from
the Namespaces in XML spec
</info>
</annotation>
</pattern>
</datatype>
</schema>
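The same datatype syntax is what a schema author would use to define further generated datatypes. The sketch below is illustrative only: the names percentage and partNumber are hypothetical, and the pattern assumes that the draft's regular expression language supports \d and {n} quantifiers in the usual way.

```xml
<!-- Hypothetical user-generated datatypes, written in the draft syntax
     used by the built-in definitions above -->

<!-- an integer from 0 to 100: narrows non-negative-integer with an upper bound -->
<datatype name='percentage' source='non-negative-integer'>
  <maxInclusive value='100'/>
</datatype>

<!-- part numbers shaped like 333-333 (regex dialect is an assumption) -->
<datatype name='partNumber' source='string'>
  <pattern value='\d{3}-\d{3}'/>
</datatype>
```

A schema such as the purchase order example could then refer to partNumber from its pno attribute in place of string, so that malformed part numbers are caught at validation time.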
<?xml version='1.0'?>
<!-- XML Schema schema for XML Schemas: Part 1: Structures -->
<!-- Note this schema is NOT the normative structures schema - - the
prose copy in the structures REC is the normative version (which
shouldn't differ from this one except for this comment and entity expansions,
but just in case) -->
<!DOCTYPE schema PUBLIC "-//W3C//DTD XMLSCHEMA 19991216//EN" "structures.dtd" [
<!ATTLIST schema xmlns:x CDATA #IMPLIED> <!-- keep this schema XML1.0 valid -->
]>
<schema xmlns="http://www.w3.org/1999/XMLSchema"
targetNamespace="http://www.w3.org/1999/XMLSchema"
xmlns:x="http://www.w3.org/XML/1998/namespace"
version="Id: structures.xsd,v 1.26 1999/12/10 16:08:42 aqw Exp ">
<!-- get access to the xml: attribute groups for xml:lang -->
<import namespace="http://www.w3.org/XML/1998/namespace"
schemaLocation="http://www.w3.org/XML/1998/xml.xsd"
/>
<!-- The datatype element and all of its members are defined
in XML Schema: Part 2: Datatypes -->
<include
schemaLocation="http://www.w3.org/XML/Group/xmlschema-current/datatypes/datatypes.xsd"/>
<type name="annotated">
<annotation>
<info>This type is extended by all types which allow annotation
other than &lt;schema&gt; itself</info>
</annotation>
<element ref="annotation" minOccurs="0"/>
</type>
<element name="schemaTop" abstract="true" type="annotated">
<annotation>
<info>This abstract element defines an equivalence class over the
elements which occur freely at the top level of schemas.
These are: datatype, type, element, attributeGroup, group, notation.
All of their types are based on the "annotated" type by extension.</info>
</annotation>
</element>
<!-- schema element -->
<element name="schema">
<annotation>
<info>The obnoxious duplication in the content model below is to
avoid infringing the no-ambiguity constraint while still allowing
annotation virtually anywhere.</info>
</annotation>
<type>
<group order="choice"
minOccurs="0"
maxOccurs="*">
<element ref="include"/>
<element ref="import"/>
<element ref="annotation"/>
</group>
<element ref="schemaTop"/>
<group order="choice"
minOccurs="0"
maxOccurs="*">
<element ref="annotation"/>
<element ref="schemaTop"/>
</group>
<attribute name="targetNamespace" type="uri"/>
<attribute name="version" type="string"/>
<attribute
name="finalDefault"
type="derivationSet"/>
<attribute
name="exactDefault"
type="exactSet"/>
</type>
</element>
<!-- annotation element -->
<element name="annotation">
<type>
<group order="choice"
minOccurs="0"
maxOccurs="*">
<element name="appinfo">
<type content="mixed">
<any minOccurs="0" maxOccurs="*"/>
<attribute name="source" type="uri"/>
</type>
</element>
<element name="info">
<type content="mixed">
<any minOccurs="0" maxOccurs="*"/>
<attribute name="source" type="uri"/>
<attributeGroup ref="x:lang"/>
</type>
</element>
</group>
</type>
</element>
<!-- For references to a type -->
<!-- 'element', 'attribute' and 'any' all use this -->
<attributeGroup name="typeRef">
<attribute name="type" type="QName"/>
</attributeGroup>
<!-- For 'element' and 'attribute' -->
<attributeGroup name="valueConstraint">
<attribute name="default" type="string"/>
<attribute name="fixed" type="string"/>
</attributeGroup>
<!-- for all particles -->
<attributeGroup name="occurs">
<attribute
name="minOccurs"
type="non-negative-integer"
default="1"/>
<attribute name="maxOccurs" type="string"/> <!-- allows '*', so integer won't do -->
</attributeGroup>