[Cache from http://redrice.com/schemavalid/faq/xml-schema.html; please use this canonical URL/source if possible.]
DTDs have several limitations, one of which is that they are not written in standard XML data syntax. This means, for instance, that while it is quite possible to write an XSLT transform to document an XML Schema, there are far fewer tools to process DTDs.
XML Schema also offers several features that data processing applications urgently need, such as a sophisticated set of basic data types, including dates, numbers and strings, with facets that can be superimposed, including regular expressions and minimum and maximum ranges and lengths.
I am therefore maintaining this FAQ at the XML Schema FAQ web site.
Anyone who has to make different systems talk to each other. I've spent years doing client-server and web application development, and I know how much time is spent on attempting to specify and debug data buffers.
XML Schema as the specification language and XML as the data language fill a gap as vast and as easily overlooked as the air we breathe.
It's an obvious question, but the answer appears to be no. [mailing list references to follow]
[Henry Thompson] DTD processing and XML Schema processing are completely independent, hence combinable: just make sure your schema processor includes a validating XML parser as its first stage.
[Jeff Rafter] The DTD takes precedence for validation purposes. Schema validation comes after DTD validation (if any). This is done to allow entities to be resolved before schema validation. Of course, some of this may be application-specific.
In a word, No. XML Data Reduced (XDR) is meant to be a subset of the final XML Schema feature set, so that upward compatibility is not a problem. See the answer on the very useful Unofficial MSXML FAQ.
Relax is positioned as being radically simpler and upwardly compatible.
James Clark, lead author of the XSLT and XPath specs, is developing TREX.
Schematron is XPath based, requires only XSLT to run, and aims to be complementary to XML Schema.
"Part 0" of the spec is a Primer.
There's an excellent tutorial at http://www.xfront.com/xml-schema.html, though you'll need PowerPoint to run it.
www.xfront.com also hosts a best practice guide looking at various design issues.
You can get a schema-aware editor/validator from XML Spy, for Windows only.
Tibco's XML Instance editor/validator product also now supports XML Schema CR. All of their client tools support Windows, Unix, and Mac (OS X). Their XML Authority also handles conversion between schema flavors, including XSD, DTD, XDR, and SOX.
Validation can be done using Oracle's Java and C products.
Apache, with IBM, has an open-source validation project, Xerces.
Henry S. Thompson (lead author of the XML Schema specification) has a web-based and downloadable structure validator, XSV.
There is a dedicated mailing list at xmlschema-dev.
If you are developing, rather than simply using, XML tools there is xml-dev.
There's an open-source tool for DTD conversion at: DTD2Schema.
See XML Spy, for Windows, and Tibco's XML Authority.
Microsoft have a beta XDR Converter.
Henry Thompson posted the following answer to this problem, using the xsd:union feature:
<xsd:simpleType name="UprnType">
  <xsd:union>
    <xsd:simpleType>
      <xsd:restriction base="xsd:positiveInteger">
        <xsd:minInclusive value="1"/>
        <xsd:maxInclusive value="99999"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType>
      <xsd:restriction base="xsd:token">
        <xsd:enumeration value=""/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:union>
</xsd:simpleType>
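As a rough illustration of what the union above accepts, the following Python sketch mimics the two member types by hand. It is not a schema validator: the function name is_valid_uprn is hypothetical, and the lexical rules it applies (an optional plus sign, whitespace collapsing) are an approximation of the positiveInteger and token types rather than a full implementation.

```python
import re


def is_valid_uprn(text):
    """Hand-written approximation of UprnType: an integer 1..99999, or the empty string."""
    # Collapse whitespace first, as both member types do (approximation).
    value = " ".join(text.split())
    if re.fullmatch(r"\+?[0-9]+", value):
        # First member: positiveInteger restricted to the range 1..99999.
        return 1 <= int(value) <= 99999
    # Second member: a token whose only enumerated value is the empty string.
    return value == ""


for sample in ["42", "", "0", "100000", "abc"]:
    print(repr(sample), is_valid_uprn(sample))
```

The point of the union is visible in the last line of the function: a value that fails the numeric member still gets a second chance against the empty-string member.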
XML Schema provides powerful dedicated validation features for things like uniqueness, referential integrity, enumerations, complex types and the various datatype facets, but it doesn't support arbitrary validation logic.
However, in answer to the question: "is it possible to define in a XML-Schema that a elements only appears if a attribute of another elements has a special entry (e.g. one of two enumeration-values)" Henry Thompson replied: "No, co-constraints like this will not be supported until v1.1"
XML Schema also provides a kind of general expansion slot in the form of the annotation / appinfo element, which allows information for other processors to be associated with a component. This might be used for Schematron rules, for instance.
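To make the expansion-slot idea concrete, here is a minimal Python sketch that pulls whatever sits inside the application-information element out of a schema so that another processor could act on it. Everything in the sample schema is an assumption made for illustration: the order element, the urn:example:rules namespace and its rule element are hypothetical and are not Schematron syntax. The schema namespace URI is the one used by the examples in this FAQ.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2000/10/XMLSchema"  # namespace URI used in this FAQ's examples

# Hypothetical schema: the <rule> payload and its namespace are invented for this sketch.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <xsd:element name="order" type="xsd:string">
    <xsd:annotation>
      <xsd:appinfo>
        <rule xmlns="urn:example:rules" test="@status = 'open'"/>
      </xsd:appinfo>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>
"""


def appinfo_payloads(schema_xml):
    """Return (component name, payload element) pairs found under annotation/appinfo."""
    root = ET.fromstring(schema_xml)
    found = []
    for component in root.iter():
        path = "{%s}annotation/{%s}appinfo" % (XSD, XSD)
        for appinfo in component.findall(path):
            for payload in appinfo:
                found.append((component.get("name"), payload))
    return found


for name, payload in appinfo_payloads(SCHEMA):
    print(name, payload.tag, payload.get("test"))
```

A schema processor ignores the payload; it is the separate tool (here, the few lines of Python) that gives it meaning.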
Yes, see the PO example in the primer.
[Noah Mendelsohn] The correct terminology would be that (1) a complex type can have a content model, which can indicate sequences of elements, choices of elements, etc., just as for DTD element types (2) some of the elements named in that content model can themselves be of one or another complex type, or simple type (3) if any of those elements have local declarations, then in the XML transfer syntax, the type declarations are lexically nested, and in that sense can appear to be contained.
Illustration of points (1) and (2):
<complexType name='ct'>
  <!-- here comes a content model -->
  <sequence>
    <!-- ...with elements that have types -->
    <element ref="A"/>
    <element ref="B"/>
  </sequence>
</complexType>
<element name="A" type="CT2"/>
<element name="B" type="CT3"/>
...and point (3):
<complexType name='ct'>
  <!-- here comes a content model -->
  <sequence>
    <!-- A has anonymous complex type: it looks like it's nested
         (but in the schema components, we just say it's scoped) -->
    <element name="A">
      <complexType>
        <sequence> ... </sequence>
      </complexType>
    </element>
    <element ref="B"/>
  </sequence>
</complexType>
[Noah Mendelsohn] I think the answer is right there in the question. With or without schemas, the vast majority of namespace-based vocabularies use unqualified attributes:
<myns:e attr1="..." xmlns:myns="someuri"/>
The namespaces recommendation facilitates such behavior by declining to apply namespace defaults to attributes:
<!-- same element as above, attr still unqualified -->
<e attr1="..." xmlns="someuri"/>
The default behavior of the schema specification is to extend that convention to locally scoped elements:
<myns:outer attr="..." xmlns:myns="someuri">
  <inner/>
</myns:outer>
both attr and inner are conceptually associated with the qualified element myns:outer; in other words, they are locally scoped.
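The qualification rules described above can be observed with any namespace-aware parser. The following is a small sketch using Python's standard xml.etree.ElementTree, which reports qualified names in {uri}local form; the describe helper is a hypothetical name introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

PREFIXED = '<myns:outer attr="..." xmlns:myns="someuri"><inner/></myns:outer>'
DEFAULTED = '<e attr1="..." xmlns="someuri"/>'


def describe(xml_text):
    """Return (root tag, attribute names, child tags) as the parser sees them."""
    root = ET.fromstring(xml_text)
    return root.tag, sorted(root.attrib), [child.tag for child in root]


# The element is qualified; the attribute and the local child element are not.
print(describe(PREFIXED))
# A default namespace qualifies the element but still not its attribute.
print(describe(DEFAULTED))
```

In both cases the attribute name comes back without a namespace, which is the convention that the schema default extends to locally scoped elements.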
I can tell you that the schema workgroup devoted literally months of concentrated effort to the issues underlying the design for local element qualification that you find in our specification. That does not necessarily mean that everyone in the workgroup is equally happy with our final decision, but it does indicate that there are many, many subtleties and that great care was taken in their consideration.
[Noah Mendelsohn] From the specification:
"[Definition:] A distinguished ur-type definition is present in each XML Schema, serving as the root of the type definition hierarchy for that schema. The ur-type definition, whose name is anyType, has the unique characteristic that it can function as a complex or a simple type definition, according to context. Specifically, restrictions of the ur-type definition can themselves be either simple or complex type definitions."
I think that says it: the ur-type is the root of the type hierarchy in each schema. It's like "Object" in Java: everything derives from it.
[Martin Duerst explains the "ur-" prefix] "I happen to speak German, and [...] Ur- appears in things such as Ursprung (origin), Urgrossvater (great-grandfather), Urknall (big bang), Ursache (cause, reason), Urheber (originator, author), Urgewalt (elemental force), and so on. The general meaning is something like 'original', 'very very old', and so on."
So, ur-type is approximately supertype or root type. I personally would not have gone for such an obscure name, but.... Anyway, the actual string name you use for it in a schema document is "anyType". e.g.
<xs:element name="envelope" type="xs:anyType"/>
The manifestation of the urType that admits only simple types is anySimpleType:
<xs:attribute name="thisAttrCanHoldAnyString" type="xs:anySimpleType"/>
The term urType shows up only when you read the specification itself (historically we called it the urType, then added the convenience name, and decided not to change the whole rest of the spec. Users see "anyType" and "anySimpleType".) There are some subtleties in the type hierarchy, but the above should get you through 99%+ of what you need to do. Hope this helps.
[FN.] The original explanation of the "ur-" prefix triggered some email, but seems near enough - see, for example, this definition.
This is discussed in the Primer Section 3.
Basically there is no direct linkage feature of any kind. The instance document can suggest a link to the parser, but the application or parser may override the choice, for instance by using a different, preferred schema, or by using a cached copy of the suggested schema.
If the instance document doesn't have a namespace, use xsi:noNamespaceSchemaLocation, for example:
[s1.xsd]
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            elementFormDefault="qualified">
  <xsd:element name="root">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="grade" type="abc"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:simpleType name="abc">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="a"/>
      <xsd:enumeration value="b"/>
      <xsd:enumeration value="c"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
[s1.xml]
<root xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="s1.xsd">
  <grade>a</grade>
</root>
If the instance document does have a namespace, use xsi:schemaLocation, for example:
[s1ns.xsd]
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:s1="http://www.schemaValid.com/s1ns"
            targetNamespace="http://www.schemaValid.com/s1ns"
            xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            elementFormDefault="qualified">
  <xsd:element name="root">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="grade" type="s1:abc"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:simpleType name="abc">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="a"/>
      <xsd:enumeration value="b"/>
      <xsd:enumeration value="c"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
[s1.xml]
<s1:root xmlns:s1="http://www.schemaValid.com/s1ns"
         xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
         xsi:schemaLocation="http://www.schemaValid.com/s1ns s1ns.xsd">
  <s1:grade>a</s1:grade>
</s1:root>
[Noah Mendelsohn] Different applications require different conventions for determining the schema to be used for validation. For example, an author of individual documents may prefer to indicate directly in the document the location of the schema to be used. Conversely, an e-commerce site might depend on a particular vocabulary, and might not trust the user to determine the schema, even if the user were prepared to do so.
The schema design therefore avoids mandating a particular policy, but it is anticipated that users will acquire and configure processors that do enforce rules appropriate to their needs. So, the schema specification formally provides xsi:schemaLocation as a hint: some processors may follow it, others not. The first user above would indeed do well to use a processor that unconditionally follows schemaLocation, and in that sense elevates it from a hint to a mandatory pointer. The second user would do well to use a processor that either (1) ignores schemaLocation or (2) chooses to reject documents containing schemaLocations or (3) rejects documents in which the schemaLocation does not match the intended schema document. Thus, by using the appropriate processor or processor mode, users can indeed rely on behavior appropriate to particular needs.
Note on options (2) and (3): the schema specification calls for no such checking or error indications, but nothing can prevent a particular processor from declining to attempt a validation based on criteria that meet the user's needs (in this case, that the user is unhappy with the schemaLocation occurrence or contents). If a validation is attempted against one schema or another, the schema specification is very specific about the required results.
One final note: certain schemas provide default values for attributes and elements. In cases where schemaLocation is not honored, users must be very careful to ensure that any such defaults supplied by the schema actually used for validation are consistent with those assumed by the document author. In typical eCommerce scenarios, the company offering the service, or some organization in which they participate, will widely publicize the schemas to be used and the implications of any defaults. At some point in the future, more robust means of protecting against unintended schema uses may be provided.
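The policies discussed in this answer can be sketched as a thin layer in front of a validator. The Python below only reads the hint from the root element and decides which schema location to use; it performs no validation. The function names, the policy names and the trusted/s1ns.xsd path are hypothetical, and the xsi namespace URI is the one used in this FAQ's examples, so it would need adjusting for documents that use a different one.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2000/10/XMLSchema-instance"  # URI used in this FAQ's examples

INSTANCE = """<s1:root xmlns:s1="http://www.schemaValid.com/s1ns"
    xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
    xsi:schemaLocation="http://www.schemaValid.com/s1ns s1ns.xsd">
  <s1:grade>a</s1:grade>
</s1:root>"""


def schema_location_hints(instance_xml, xsi_ns=XSI):
    """Map namespace URI -> hinted schema location, reading the root element only."""
    root = ET.fromstring(instance_xml)
    tokens = root.get("{%s}schemaLocation" % xsi_ns, "").split()
    # The attribute value is a whitespace-separated list of (namespace, location) pairs.
    hints = dict(zip(tokens[0::2], tokens[1::2]))
    no_ns = root.get("{%s}noNamespaceSchemaLocation" % xsi_ns)
    if no_ns is not None:
        hints[None] = no_ns  # None stands for "no namespace"
    return hints


def choose_schema(hinted, trusted, policy):
    """Pick a schema location under one of the policies described above."""
    if policy == "follow":           # the document author's hint is treated as mandatory
        return hinted
    if policy == "ignore":           # option (1): always use the trusted schema
        return trusted
    if policy == "reject-any":       # option (2): refuse documents that carry a hint
        if hinted is not None:
            raise ValueError("instance carries a schemaLocation hint")
        return trusted
    if policy == "reject-mismatch":  # option (3): refuse hints that differ from ours
        if hinted is not None and hinted != trusted:
            raise ValueError("hint %r does not match %r" % (hinted, trusted))
        return trusted
    raise ValueError("unknown policy: %r" % policy)


hints = schema_location_hints(INSTANCE)
hinted = hints.get("http://www.schemaValid.com/s1ns")
print(choose_schema(hinted, "trusted/s1ns.xsd", "ignore"))
```

Keeping the policy outside the document is what lets the same instance be accepted by an authoring tool and rejected by a stricter e-commerce front end.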
There's no intended method for doing this, though you might be able to hack it by having only one globally declared element.
The intention of the schema seems to be that of the three roles in any transaction - the schema author, the message author, and the message reader - the message reader should have the last word in specifying to a schema-validating parser what may be a root element.
[Noah Mendelsohn] Yes. As far as the schema language itself is concerned any global element can be used as a root element, but it is intended that the application or processor could be parameterized to check. For example, consider a perfectly reasonable processor that would take a command line like:
validate -instance myinstance.xml -schema myschema.xsd -rootElementName purchaseOrder
such a processor could provide the added service of checking the name of the root element. There are at least two reasons that the schema language does not take a more rigid view of roots (a) there are situations in which you truly find it useful to have different element names serve as the root of a document and (b) even if purchaseOrder is the root of the instance, you may decide that you only want to validate the shippingAddress. So, the root of the validation need not be the root of the instance document.
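A minimal sketch of the extra service such a processor might offer, written in Python with the standard library only. It does not validate anything; it merely selects the element that would be handed to a validator. The function names, the must_be_document_root parameter and the sample document are hypothetical, and the command line shown above is itself only an imagined interface.

```python
import xml.etree.ElementTree as ET

ORDER = "<purchaseOrder><shippingAddress><city>Leeds</city></shippingAddress></purchaseOrder>"


def local_name(element):
    """Strip ElementTree's '{namespace}' prefix from a tag, if present."""
    return element.tag.rsplit("}", 1)[-1]


def validation_root(instance_xml, element_name, must_be_document_root=True):
    """Return the element to hand to a validator, or raise if the policy is not met."""
    root = ET.fromstring(instance_xml)
    if must_be_document_root:
        # Policy (a): insist on a particular document root.
        if local_name(root) != element_name:
            raise ValueError("root is %r, expected %r" % (local_name(root), element_name))
        return root
    # Policy (b): validate only part of the document, wherever the element occurs first.
    for element in root.iter():
        if local_name(element) == element_name:
            return element
    raise ValueError("no %r element found" % element_name)


print(local_name(validation_root(ORDER, "purchaseOrder")))
print(local_name(validation_root(ORDER, "shippingAddress", must_be_document_root=False)))
```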
Yes. XSV, for example, will use "strict" mode if every element from the root down is schema-validatable, but "lax" mode if the root node - or any other element which is allowed to appear in some context - cannot itself be schema-validated.
[Noah Mendelsohn] From xmlschema-1: "With a schema which satisfies the conditions expressed in Errors in Schema Construction and Structure (§7.1) above, the schema-validity of an element information item can be assessed.". It then goes on to say exactly how and against which declarations. Note that it says you can validate an "element", not necessarily the root element of a document.
Net answer to your question: conforming processors can be written to validate any element you like. Not all processors need provide this service: buy or use processors that validate the information you need validated. By the way, the detailed rules give the processor a choice of validating the element against some particular identified element declaration, some particular identified complex type, or to use the mechanisms of strict, lax etc. to determine what to validate based on what declarations happen to be available. All of this is explained at xmlschema-1.
[Noah Mendelsohn] Declaring a global element does NOT directly allow that element to appear in arbitrary points in the validated XML instance document. Global elements are available for reference when building content models. Only the global elements you actually reference from a content model (or allow through use of wildcards) can appear in valid content.
<xsd:element name="a" .... />
<xsd:element name="b" .... />
<xsd:element name="c" .... />
<xsd:element name="x">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="b"/>
      <xsd:element ref="c"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
The global elements named "b" and "c" must occur as children of "x" in the instance document; global element "a" is not legal within "x". Depending how the processor is invoked, any of "a", "b", "c" or "x" might be usable as the root of the validation (which might mean as the root of the whole document, or it might mean that you are validating only part of the document.)
I think the confusion probably arises because readers see the term "global" and presume it to mean "anywhere in the instance document". In fact, it means "available for use in any content model in the schema". By contrast, "local" elements can only be used in the content model of the corresponding (scoping) complex type. Another way to think about this is that global element declarations in schemas give you the same capabilities as the element type declarations in XML 1.0 DTDs. Local declarations are an enhancement in schemas, allowing the same name to be used for different purposes according to context. If you were doing a straight conversion from a DTD to a schema, most likely all the resulting element declarations would be global.
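The distinction between "declared globally" and "allowed in a content model" can be seen by inspecting a schema document directly. The Python sketch below uses a well-formed variant of the example above in which the elided attributes are replaced by type="xsd:string"; that substitution, the SCHEMA constant and both function names are assumptions made for illustration. It ignores wildcards, prefixed references and nested local declarations, so it is not a substitute for a schema processor.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2000/10/XMLSchema"  # namespace URI used in this FAQ's examples

SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <xsd:element name="a" type="xsd:string"/>
  <xsd:element name="b" type="xsd:string"/>
  <xsd:element name="c" type="xsd:string"/>
  <xsd:element name="x">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="b"/>
        <xsd:element ref="c"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def global_elements(schema_xml):
    """Names of element declarations that are direct children of the schema element."""
    root = ET.fromstring(schema_xml)
    return [decl.get("name") for decl in root.findall("{%s}element" % XSD)]


def referenced_in(schema_xml, element_name):
    """Global elements referenced (via ref=) from the content model of element_name."""
    root = ET.fromstring(schema_xml)
    for decl in root.findall("{%s}element" % XSD):
        if decl.get("name") == element_name:
            return [e.get("ref") for e in decl.iter("{%s}element" % XSD) if e.get("ref")]
    return []


print(global_elements(SCHEMA))     # every global declaration is a candidate validation root
print(referenced_in(SCHEMA, "x"))  # only these may appear as children of x
```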
In XML Schema, the <include> element provides for schema modularity. See the XML Schema Primer [1] for more introductory information.
In XML Schema, the <xsd:redefine> element provides for schema modularity with variation; see http://www.w3.org/TR/xmlschema-1/#modify-schema
For a minimal example which works with XSV, assume that [s1.xsd]:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            elementFormDefault="qualified">
  <xsd:element name="root">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="grade" type="abc"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:simpleType name="abc">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="a"/>
      <xsd:enumeration value="b"/>
      <xsd:enumeration value="c"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
defines messages like [s1.xml]:
<root xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="s1.xsd">
  <grade>c</grade>
</root>
Now [s1_redefine.xsd]:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            elementFormDefault="qualified">
  <xsd:redefine schemaLocation="s1.xsd">
    <xsd:simpleType name="abc">
      <xsd:restriction base="abc">
        <xsd:enumeration value="a"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:redefine>
</xsd:schema>
will only permit files like [s1_redefine.xml]:
<root xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="s1_redefine.xsd">
  <grade>a</grade>
</root>
Note that the redefined type definition for the grade element of type abc applies even though it is called from a context (the root element) that is defined only in the original [s1.xsd].
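To see the narrowing at a glance, the enumeration facets of type abc can be read out of both schema documents. The Python sketch below only inspects the XML of each file: it does not follow schemaLocation, merge the redefinition into the original, or validate an instance. The enumeration_values helper is a hypothetical name, and the schema texts are copies of [s1.xsd] and [s1_redefine.xsd] with the XML declaration omitted.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2000/10/XMLSchema"  # namespace URI used in this FAQ's examples

S1 = """
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" elementFormDefault="qualified">
  <xsd:element name="root">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="grade" type="abc"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:simpleType name="abc">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="a"/>
      <xsd:enumeration value="b"/>
      <xsd:enumeration value="c"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
"""

S1_REDEFINE = """
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" elementFormDefault="qualified">
  <xsd:redefine schemaLocation="s1.xsd">
    <xsd:simpleType name="abc">
      <xsd:restriction base="abc">
        <xsd:enumeration value="a"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:redefine>
</xsd:schema>
"""


def enumeration_values(schema_xml, type_name):
    """Enumeration facet values of the first simpleType with the given name."""
    root = ET.fromstring(schema_xml)
    for simple_type in root.iter("{%s}simpleType" % XSD):
        if simple_type.get("name") == type_name:
            return [e.get("value") for e in simple_type.iter("{%s}enumeration" % XSD)]
    return []


print(enumeration_values(S1, "abc"))           # values allowed by the original type
print(enumeration_values(S1_REDEFINE, "abc"))  # values left after the redefinition
```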