XML Schema Tutorial:

Additional Materials [*]

Henry S. Thompson

Language Technology Group,
Human Communication Research Centre
University of Edinburgh

 

London, 15 December 1999

 


Introduction

This document contains additional illustrative and reference material to accompany the narrative presentation in the slide documents.

DISCLAIMER

Some of the material contained herein is drawn from internal working drafts of the XML Schema Working Group of the W3C. Those WG-internal drafts are due to be published in the next few days, but they may change in the interim, and in any case errors may have been introduced in copying and formatting. Only material identified as W3C publications is a reliable or quotable* source of information about XML Schema or any other W3C activity.

 

_______

*Note that employees of W3C member companies (that's you) are enjoined from distributing W3C-internal materials to non-W3C members: these materials are covered by that injunction!

 


XML Schema Goals

“The purpose of an XML Schema: Structures schema is to define and describe a class of XML documents by using [schema] constructs to constrain and document the meaning, usage and relationships of their constituent parts: datatypes, elements and their content, attributes and their values. Schema constructs may also provide for the specification of additional information such as default values. Schemas are intended to document their own meaning, usage, and function through a common documentation vocabulary. Thus, XML Schema: Structures can be used to define, describe and catalogue XML vocabularies for classes of XML documents.”

 

“[XML Schema: Datatypes] addresses the need of both document authors and applications writers for a robust, extensible datatype system for XML which could be incorporated into XML processors. As discussed below, these datatypes could be used in other XML-related standards as well.”

 

The Structures draft TOC

1 Introduction
1.1 Documentation Conventions
1.2 Purpose
1.3 Relationship To Other Work
1.4 Terminology
2 Conceptual Framework
2.1 Kinds of XML Documents
2.2 On schemas, constraints and contributions
2.3 Schemas, Types and Elements
2.4 Schemas and their component parts
2.5 Names and Symbol Spaces
2.6 Referencing Schema Components
2.7 Association of components with a target namespace
2.7.1 Association of definitions with a target namespace
2.7.2 Providing a target namespace for definitions and declarations
2.8 Abstract and Concrete Syntax
3 Schema Definitions and Declarations
3.1 The Schema
3.2 The Document and its Root
3.3 References to Schema Constructs
3.4 Types, Elements and Attributes
3.4.1 Simple Type Definition
3.4.2 Complex Type Definition
3.4.3 Attribute Declaration
3.4.4 Attribute Group Definition
3.4.5 Element Content Model
3.4.6 Rich Content Models
3.4.7 Mixed Content
3.4.8 Named Model Group
3.4.9 Element Declaration
3.5 Wildcards
3.6 Deriving Type Definitions
3.6.1 Deriving type definitions by extension
3.6.2 Deriving type definitions by restriction
3.6.3 Controlling derivation
3.6.4 Reinterpreting Content Models
3.6.5 Element Equivalence Classes
3.6.6 The ur-type
3.6.7 Graveyard for stale syntax, here to avoid breaking IDREFs elsewhere *
3.7 Unique, key and key reference constraints
3.8 Notations
3.8.1 Notation Declaration
4 Schema Access and Composition
4.1 Layer 1: Summary of the schema-validation core
4.2 Layer 2: Schema definitions in XML
4.2.1 Assembling a schema for a single namespace from multiple schema definition documents
4.2.2 References to schema components across namespaces
4.3 Layer 3: Web-interoperability
4.3.1 Standards for representation and retrieval of schema definitions on the Web
4.3.2 How schema definitions are located on the Web
5 Annotating schemas
6 Conformance *
6.1 Schema Validity *
6.2 Detailed validity constraints and definitions *
6.2.1 The Schema *
6.2.2 References to Schema Constructs *
6.2.3 Types, Elements and Attributes *
6.2.4 Type Refinement *
6.2.5 Import Restrictions *
6.2.6 Schema Inclusion *
6.2.7 Schema Validity *
6.3 Responsibilities of Schema-aware processors *
6.4 Lexical representation *
6.5 Information set *

Appendices

A (normative) Schema for Schemas
B (normative) DTD for Schemas
C Glossary (normative) *
D References (normative) *
E Acknowledgments (non-normative)
F Sample Schema (non-normative)
G Tabulation of changes
H Open Issues

Datatypes draft TOC

1 Introduction
1.1 Purpose
1.2 Requirements
1.3 Scope
1.4 Terminology
2 Type System
2.1 Datatype
2.2 Value space
2.3 Lexical space
2.4 Datatype dichotomies
2.4.1 Atomic vs. aggregate datatypes
2.4.2 Primitive vs. generated datatypes
2.4.3 Built-in vs. user-generated datatypes
2.5 Facets
2.5.1 Fundamental facets
2.5.2 Constraining or Non-fundamental facets
3 Built-in datatypes
3.1 Namespace considerations
3.2 Primitive datatypes
3.2.1 string
3.2.2 boolean
3.2.3 float
3.2.4 double
3.2.5 decimal
3.2.6 timeInstant
3.2.7 timeDuration
3.2.8 recurringInstant
3.2.9 binary
3.2.10 uri
3.3 Generated datatypes
3.3.1 language
3.3.2 NMTOKEN
3.3.3 NMTOKENS
3.3.4 Name
3.3.5 QName
3.3.6 NCName
3.3.7 ID
3.3.8 IDREF
3.3.9 IDREFS
3.3.10 ENTITY
3.3.11 ENTITIES
3.3.12 NOTATION
3.3.13 integer
3.3.14 non-negative-integer
3.3.15 positive-integer
3.3.16 non-positive-integer
3.3.17 negative-integer
3.3.18 date
3.3.19 time
4 Defining Generated Datatypes
5 Conformance

Appendices

A Schema for Datatype Definitions (normative)
B DTD for Datatype Definitions (normative)
C Datatypes and Facets
C.1 Fundamental Facets
C.2 Constraining Facets
D ISO 8601 Date and Time Formats
D.1 ISO 8601 Conventions
D.2 Truncated Formats
D.3 Deviations from ISO 8601 Formats
D.3.1 Sign Allowed
D.3.2 More Than 9999 Years
E Regular Expressions
F References
F.1 Normative
F.2 Non-normative
G Acknowledgments (non-normative)
H Open Issues
I Revisions from Previous Draft

Simple XML Schema example

First the instance

<PurchaseOrder orderDate="1999-05-20">

    <shipTo type="US">

        <name>Alice Smith</name>

        <street>123 Maple Street</street>

        <city>Mill Valley</city>

        <state>CA</state>

        <zip>90952</zip>

    </shipTo>

    <billTo type="UK">

        <name>Trevor Mostyn</name>

        <street>12, The Gables</street>

        <city>Bourton-on-the-Water</city>

        <state>Glous.</state>

        <zip>GL3 2BB</zip>

    </billTo>

    <shipDate>1999-05-25</shipDate>

    <comment>Get these things to me in a hurry, my lawn is going wild!</comment>

    <Items>

        <Item pno="333-333">

            <productName>Lawnmower,

                  model BUZZ-1</productName>

            <quantity>1</quantity>

            <price>148.95</price>

            <comment>Please confirm this is the electric model</comment>

        </Item>

        <Item pno="444-444">

            <productName>Baby Monitor,

                model SNOOZE-2</productName>

            <quantity>1</quantity>

            <price>39.98</price>

        </Item>

    </Items>

</PurchaseOrder>


Then the schema

 <schema
  targetNamespace='http://…/PurchaseOrder'
  xmlns:po='http://…/PurchaseOrder'

  xmlns='http://www.w3.org/1999/XMLSchema'>

 

 <element name='PurchaseOrder'

          type='po:PurchaseOrderType'/>

 

 <element name='comment' type='string'/>

 

 <type name='PurchaseOrderType'>

  <element name='shipTo' type='po:Address'/>

  <element name='billTo' type='po:Address'/>

  <element name='shipDate' type='date'/>

  <element ref='po:comment' minOccurs='0'/>

  <element name='Items' type='po:Items'/>

  <attribute name='orderDate' type='date'/>

 </type>

 

 <type name='Address'>

  <element name='name' type='string'/>

  <element name='street' type='string'/>

  <element name='city' type='string'/>

  <element name='state' type='string'/>

  <element name='zip' type='string'/>

  <attribute name='type' type='string'/>

 </type>

 

 <type name='Items'>

  <element name='Item'
           minOccurs='0' maxOccurs='*'>

   <type>

    <element name='productName'
             type='string'/>

    <element name='quantity'>

     <datatype source='integer'>

      <minExclusive value='0'/>

     </datatype>

    </element>

    <element name='price' type='decimal'/>

    <element ref='po:comment' minOccurs='0'/>
    <attribute name='pno' type='string'/>

   </type>

  </element>

 </type>

</schema>
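
As a rough illustration of what schema-validating this instance involves, the following Python sketch checks two of the constraints declared above: that each quantity is an integer greater than 0 (the minExclusive facet) and that each price parses as a decimal. It is not a schema processor. The function names check_item and check_order are hypothetical, the instance is a trimmed copy of the one above, and Python's int and Decimal parsers only approximate the draft's lexical rules.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Trimmed copy of the purchase-order instance above (one Item only).
INSTANCE = """
<PurchaseOrder orderDate="1999-05-20">
  <shipDate>1999-05-25</shipDate>
  <Items>
    <Item pno="333-333">
      <productName>Lawnmower, model BUZZ-1</productName>
      <quantity>1</quantity>
      <price>148.95</price>
    </Item>
  </Items>
</PurchaseOrder>
"""


def check_item(item):
    """Return a list of problems found in one Item element (hypothetical helper)."""
    problems = []
    # quantity: derived from integer with <minExclusive value='0'/>
    try:
        if int(item.findtext("quantity", default="")) <= 0:
            problems.append("quantity must be greater than 0")
    except ValueError:
        problems.append("quantity is not an integer")
    # price: declared with type='decimal'
    try:
        Decimal(item.findtext("price", default=""))
    except InvalidOperation:
        problems.append("price is not a decimal")
    return problems


def check_order(xml_text):
    """Collect the problems of every Item in a purchase order."""
    root = ET.fromstring(xml_text)
    problems = []
    for item in root.iter("Item"):
        problems.extend(check_item(item))
    return problems
```

Calling check_order(INSTANCE) returns an empty list for the fragment shown; changing the quantity to 0 yields a single complaint about the minExclusive bound.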

Schema example: type derivation and element equivalence classes

The schema

<xsd:type name='animalFriends'>

  <xsd:any equivClass='pet' maxOccurs='*'/>

</xsd:type>

<xsd:type name='pet'>

  <xsd:attribute name='name'/>

  <xsd:attribute name='owner' minOccurs='0'/>

 </xsd:type>

 

<xsd:element name='pet' type='pet'
         abstract='true'/>

<xsd:element name='cat' equivClass='pet'>

 <xsd:type source='pet' derivedBy='extension'>

  <xsd:element name='kittens' minOccurs='0'/>

  <xsd:attribute name='lives'/>

 </xsd:type>

</xsd:element>

<xsd:element name='dog' equivClass='pet'>

 <xsd:type source='pet' derivedBy='extension'>

  <xsd:element name='puppies' minOccurs='0'/>

  <xsd:attribute name='breed'/>

 </xsd:type>

</xsd:element>


A valid instance

<animalFriends>

  <cat name='Fluffy' lives='9'/>

  <dog name='Gromit' owner='Wallace'
       breed='mutt'/>

</animalFriends>
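
The way the equivalence class decides which elements may appear inside animalFriends can be sketched in Python. This is only an informal model of the idea, not the draft's normative definition: the tables EQUIV_CLASS and ABSTRACT and the function names are hypothetical, and the assumption that an abstract element may not itself appear in an instance is my reading of the example. Attribute and content checking are left out.

```python
import xml.etree.ElementTree as ET

# Element name -> the equivalence class named by its equivClass attribute
# (None for the head element, which names no class).
EQUIV_CLASS = {"pet": None, "cat": "pet", "dog": "pet"}

# Elements declared abstract in the schema above.
ABSTRACT = {"pet"}


def in_class(name, head):
    """True if `name` is `head` itself or reaches it by following equivClass links."""
    while name is not None:
        if name == head:
            return True
        name = EQUIV_CLASS.get(name)
    return False


def allowed_in_animal_friends(name):
    """Members of the 'pet' class are allowed, except abstract elements."""
    return in_class(name, "pet") and name not in ABSTRACT


def rejected_children(xml_text):
    """Return the tags of child elements that the content model would not admit."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root if not allowed_in_animal_friends(child.tag)]
```

Following the links in a loop means a chain of classes is handled as well as the single level used here.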

 

The Schema for Datatypes

See the DISCLAIMER in the Introduction

<!-- XML Schema schema for XML Schemas:
     Part 2: Datatypes -->

<!-- Note this schema is NOT the normative
     datatypes schema - - the prose copy in the
     datatypes REC is the normative version (which
     shouldn't differ from this one except for
     this comment and entity
     expansions, but just in case) -->

<!DOCTYPE schema PUBLIC
    "-//W3C//DTD XMLSCHEMA 19991216//EN"
    "structures.dtd" >

 

<schema xmlns="http://www.w3.org/1999/XMLSchema"

targetNamespace="http://www.w3.org/1999/XMLSchema"
   version="$Id: datatypes.xsd,v 1.2 1999/12/04 12:09:08 aqw Exp $">

 

  <type name="datatype"
        source="annotated" derivedBy="extension">

    <element ref="facet"
         minOccurs="0" maxOccurs="*"/>

    <attribute name="name" type="NCName">

      <annotation>

       <info>Will be restricted to
             required or forbidden</info>

      </annotation>

    </attribute>

    <attribute name="source" type="QName"
               minOccurs="1"/>

  </type>

 

  <element name="datatype" equivClass="schemaTop">

    <type source="datatype"
          derivedBy="restriction">

     <annotation>

      <info>This is the top-level type element,
            as ref'ed in &lt;schema</info>

     </annotation>


     <attribute name="name" minOccurs="1">

      <annotation>
       <info>
Required at the top level</info>

      </annotation>

     </attribute>

   </type>

  </element>

 

   <type name="facet"
         source="annotated"
         derivedBy="extension">

     <attribute name="value" minOccurs="1"/>

   </type>

 

  <element name="facet" type="facet"
           abstract="true"/>

 

  <element name="minBound" abstract="true"

           equivClass="facet"/>

 

  <element name="minExclusive"

           equivClass="minBound"/>

  <element name="minInclusive"

           equivClass="minBound"/>

 

  <element name="maxBound" abstract="true"

           equivClass="facet"/>

 

  <element name="maxExclusive"

           equivClass="maxBound"/>

  <element name="maxInclusive"

           equivClass="maxBound"/>

 

  <type name="numFacet" source="facet"

        derivedBy="restriction">

   <attribute name="value"

              type="non-negative-integer"/>

  </type>

 

  <element name="precision" type="numFacet"

           equivClass="facet"/>

  <element name="scale" type="numFacet"

           equivClass="facet"/>

 

  <element name="length" type="numFacet"

           equivClass="facet"/>

  <element name="maxLength" type="numFacet"

           equivClass="facet"/>

 

  <!-- the following datatype is used to limit the

       possible values for the encoding facet on

           the binary datatype -->

  <datatype name="encodings" source="NMTOKEN">

    <enumeration value="hex">

      <annotation>

        <info>each (8-bit) byte is encoded as

              a sequence of 2 hexadecimal

              digits</info>

      </annotation>

    </enumeration>

    <enumeration value="base64">

      <annotation>

        <info>value is encoded in Base64 as

              defined in the MIME RFC</info>

      </annotation>

    </enumeration>

  </datatype>

 

  <element name="encoding" equivClass="facet">

   <type source="facet" derivedBy="restriction">

    <attribute name="value" type="encodings"/>

   </type>

  </element>

 

  <element name="period" equivClass="facet">

   <type source="facet" derivedBy="restriction">

    <attribute name="value" type="timeDuration"/>

   </type>

  </element>

 

  <element name="enumeration"
           equivClass="facet"/>

 

  <element name="pattern" equivClass="facet"/>

 

<!-- built-in generated datatypes -->

<!-- only has a few for now, eventually needs to have all of them -->

 

  <datatype name="integer" source="decimal">

    <scale value="0"/>

  </datatype>

       

  <datatype name="non-negative-integer"

            source="integer">

    <minInclusive value="0"/>

  </datatype>

 

  <datatype name="positive-integer"

            source="non-negative-integer">

    <minInclusive value="1"/>

  </datatype>

 

  <datatype name="non-positive-integer"

            source="integer">

    <maxInclusive value="0"/>

  </datatype>

 

  <datatype name="negative-integer"

            source="non-positive-integer">

    <maxInclusive value="-1"/>

  </datatype>

 

  <datatype name="date"
            source="recurringInstant">

    <period value="000000T2400"/>

  </datatype>

 

  <datatype name="time"
            source="recurringInstant">

    <period value="000000T2400"/>

  </datatype>

 

  <datatype name="NMTOKENS" source="string">

  <pattern value="\c+(\s\c+)*">

   <annotation>

    <info source="http://www.w3.org/TR/REC-xml#NT-Nmtokens">

     pattern matches production 8

     from the XML spec

    </info>

   </annotation>

  </pattern>

 </datatype>

 

 <datatype name="NMTOKEN" source="NMTOKENS">

  <pattern value="\c+">

   <annotation>

    <info source="http://www.w3.org/TR/REC-xml#NT-Nmtoken">

   pattern matches production 7 from the XML spec

   </info>

  </annotation>

  </pattern>

 </datatype>

 

 <datatype name="Name" source="NMTOKEN">

  <pattern value="\i\c*">

   <annotation>

    <info source="http://www.w3.org/TR/REC-xml#NT-Name">

   pattern matches production 5 from the XML spec

   </info>

   </annotation>

  </pattern>

 </datatype>

 

 <datatype name="ID" source="NCName">

  <annotation>

   <info source="http://www.w3.org/TR/REC-xml#id">
     values of this datatype must be unique
     within a document

   </info>

  </annotation>

 </datatype>

 

 <datatype name="IDREFS" source="string">

  <pattern
 value="[\i-[:]][\c-[:]]*(\s[\i-[:]][\c-[:]]*)*">

  <annotation>

  <info
  source="http://www.w3.org/TR/REC-xml#NT-Names">

   pattern matches production 6 from the XML spec

   modified as required by the

   Conformance section in

   Namespaces in XML
(http://www.w3.org/TR/REC-xml-names#conformance)

   </info>

   <info
     source="http://www.w3.org/TR/REC-xml#idref">
 values of this datatype must have occurred within
 a document as the value of some component
  of type ID

   </info>

   </annotation>

  </pattern>

 </datatype>


<datatype name="IDREF" source="IDREFS">

  <pattern value="[\i-[:]][\c-[:]]*">

   <annotation>

    <info source="http://www.w3.org/TR/REC-xml-names#NT-NCName">
   pattern matches production 4 from the
   Namespaces in XML spec

    </info>

    <info
     source="http://www.w3.org/TR/REC-xml#idref">
 values of this datatype must have occurred within
 a document as the value of some component
  of type ID

   </info>

   </annotation>

  </pattern>

 </datatype>

 

 

 <datatype name="ENTITIES" source="string">

  <pattern
 value="[\i-[:]][\c-[:]]*(\s[\i-[:]][\c-[:]]*)*">

  <annotation>

   <info
  source="http://www.w3.org/TR/REC-xml#NT-Names">

     pattern matches production 6 from
     the XML spec

     modified as required by the

     Conformance section in

     Namespaces in XML
 (http://www.w3.org/TR/REC-xml-names#conformance)

    </info>

    <info
   source="http://www.w3.org/TR/REC-xml#entname">
 values of this datatype must match the name of
  an unparsed entity declared in the schema

    </info>

   </annotation>

  </pattern>

 </datatype>

 

 <datatype name="ENTITY" source="ENTITIES">

  <pattern value="[\i-[:]][\c-[:]]*">

   <annotation>

    <info source="http://www.w3.org/TR/REC-xml-names#NT-NCName">
  pattern matches production 4 from
  the Namespaces in XML spec

    </info>

    <info
   source="http://www.w3.org/TR/REC-xml#entname">
   values of this datatype must match the name of

   an unparsed entity declared in the schema

    </info>

   </annotation>

  </pattern>

 </datatype>

 

 <datatype name="NCName" source="Name">

  <pattern value="[\i-[:]][\c-[:]]*">

   <annotation>

    <info source="http://www.w3.org/TR/REC-xml-names/#NT-NCName">
    pattern matches production 4 from
    the Namespaces in XML spec

    </info>

   </annotation>

  </pattern>

 </datatype>

 

 <datatype name="QName" source="Name">

  <pattern
  value="([\i-[:]][\c-[:]]*:)?[\i-[:]][\c-[:]]*">

   <annotation>

    <info source="http://www.w3.org/TR/REC-xml-names/#NT-QName">
     pattern matches production 6 from
     the Namespaces in XML spec

    </info>

   </annotation>

  </pattern>

 </datatype>

</schema>
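
The built-in integer datatypes in the listing above form a derivation chain, each one adding or tightening a facet on its source. The Python sketch below shows one way to read that chain: facets are collected from the base type downward, and a derived type's facet replaces the inherited one. The table is copied from the listing; the function names effective_facets and accepts are hypothetical, and treating scale 0 as "no fractional part" is a simplifying assumption rather than the draft's definition.

```python
from decimal import Decimal

# name -> (source datatype, facets declared on this datatype),
# copied from the built-in generated datatypes in the listing above.
DATATYPES = {
    "integer": ("decimal", {"scale": 0}),
    "non-negative-integer": ("integer", {"minInclusive": 0}),
    "positive-integer": ("non-negative-integer", {"minInclusive": 1}),
    "non-positive-integer": ("integer", {"maxInclusive": 0}),
    "negative-integer": ("non-positive-integer", {"maxInclusive": -1}),
}


def effective_facets(name):
    """Merge facets along the source chain; derived types override their source."""
    chain = []
    while name in DATATYPES:
        source, facets = DATATYPES[name]
        chain.append(facets)
        name = source
    merged = {}
    for facets in reversed(chain):  # base type first, most derived last
        merged.update(facets)
    return merged


def accepts(name, text):
    """Check a lexical value against the merged facets of datatype `name`."""
    value = Decimal(text)
    facets = effective_facets(name)
    if facets.get("scale") == 0 and value != value.to_integral_value():
        return False
    if "minInclusive" in facets and value < facets["minInclusive"]:
        return False
    if "maxInclusive" in facets and value > facets["maxInclusive"]:
        return False
    return True
```

For example, effective_facets("positive-integer") combines the scale facet inherited from integer with the tightened minInclusive bound of 1.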


The Schema for Schemas

See the DISCLAIMER in the Introduction

<?xml version='1.0'?>

<!-- XML Schema schema for

     XML Schemas: Part 1: Structures -->

<!-- Note this schema is NOT the normative structures schema - - the prose copy in the structures REC is the normative version (which
shouldn't differ from this one except for this comment and entity expansions, but just in case) -->

<!DOCTYPE schema PUBLIC "-//W3C//DTD XMLSCHEMA 19991216//EN" "structures.dtd" [

<!ATTLIST schema xmlns:x CDATA #IMPLIED> <!-- keep this schema XML1.0 valid -->

]>

<schema xmlns="http://www.w3.org/1999/XMLSchema"
        targetNamespace="http://www.w3.org/1999/XMLSchema"
   xmlns:x="http://www.w3.org/XML/1998/namespace"
   version="Id: structures.xsd,v 1.26 1999/12/10 16:08:42 aqw Exp ">

 

 <!-- get access to the xml: attribute
      groups for xml:lang -->

 <import namespace="http://www.w3.org/XML/1998/namespace"
         schemaLocation="http://www.w3.org/XML/1998/xml.xsd"/>

 

 

  <!-- The datatype element and all of its
       members are defined

       in XML Schema: Part 2: Datatypes -->

 

  <include
schemaLocation="http://www.w3.org/XML/Group/xmlschema-current/datatypes/datatypes.xsd"/>

 

  <type name="annotated">

   <annotation>

    <info>This type is extended by all types
          which allow annotation

          other than &lt;schema> itself</info>

   </annotation>

   <element ref="annotation" minOccurs="0"/>

  </type>

 

  <element name="schemaTop" abstract="true" type="annotated">

   <annotation>

    <info>This abstract element defines an
          equivalence class over the

          elements which occur freely at the top
          level of schemas.

          These are: datatype, type, element,
          attributeGroup, group, notation

          All of their types are based on the
          "annotated" type by extension.</info>

   </annotation>

  </element>

 

  <!-- schema element -->

 

  <element name="schema">

   <annotation>

    <info>The obnoxious duplication in the
          content model below is to avoid

          infringing the no-ambiguity constraint
          while still allowing

          annotation virtually anywhere.</info>

   </annotation>

    <type>

      <group order="choice" minOccurs="0"
             maxOccurs="*">

       <element ref="include"/>

       <element ref="import"/>

       <element ref="annotation"/>

      </group>

      <element ref="schemaTop"/>

      <group order="choice" minOccurs="0"
             maxOccurs="*">

        <element ref="annotation"/>

        <element ref="schemaTop"/>

      </group>

    <attribute name="targetNamespace" type="uri"/>

    <attribute name="version" type="string"/>

   <attribute name="finalDefault"
              type="derivationSet"/>

   <attribute name="exactDefault"
              type="exactSet"/>

   </type>

  </element>

 

  <!-- annotation element -->

 

  <element name="annotation">

   <type>

    <group order="choice" minOccurs="0"
                          maxOccurs="*">

     <element name="appinfo">

       <type content="mixed">

         <any minOccurs="0" maxOccurs="*"/>

         <attribute name="source" type="uri"/>

       </type>

     </element>

     <element name="info">

       <type content="mixed">

         <any minOccurs="0" maxOccurs="*"/>

         <attribute name="source" type="uri"/>

         <attributeGroup ref="x:lang"/>

       </type>

     </element>

    </group>

   </type>

  </element>

 

 

  <!-- For references to a type -->

  <!-- 'element', 'attribute' and 'any'
        all use this  -->

 

  <attributeGroup name="typeRef">

    <attribute name="type" type="QName"/>

  </attributeGroup>

 

  <!-- For 'element' and 'attribute' -->

  <attributeGroup name="valueConstraint">

   <attribute name="default" type="string"/>

   <attribute name="fixed" type="string"/>

  </attributeGroup>

 

 

  <!-- for all particles -->

  <attributeGroup name="occurs">

    <attribute name="minOccurs"
       type="non-negative-integer" default="1"/>

    <attribute name="maxOccurs" type="string"/> <!-- allows '*', so integer won't do -->

  </attributeGroup>