XSD: Structure definitions are not DTDs

From owner-xml-dev@ic.ac.uk Mon May 25 11:37:18 1998
Date: Mon, 25 May 1998 12:28:16 -0400
From: Paul Prescod <papresco@technologist.com>
Subject: XSD: Structure definitions are not DTDs

I wanted to summarize and emphasize my arguments for throwing away the "DTD model" in creating structure definitions. I've rather dribbled them out over several days, so I thought that a coherent position paper would be useful.

The DTD concept is more than 10 years old now. The world has changed a lot since then, and we know of many flaws in DTDs. The most *obvious* complaint is that they use a notation different from XML instance syntax. Personally, I think the more subtle flaws are far more serious, and that it would be a tragedy to correct this minor (and debatable!) flaw without correcting the major (and perhaps less debatable) ones. There are also purely technical reasons why structure definition documents cannot be used as a "drop-in" replacement for DTDs.

#1. Entities have nothing to do with "document types" or "document structure." I can't think of any name that could unify entities, elements and attributes other than "bag of definitions." Entities are *declarations*, in that they must be declared before they can be used, whereas element type definitions are *definitions*, in that you can use them without declaring them. Entities also relate to the physical structure of an XML document (the mapping from various small text strings to a single combined text string). Elements relate to the logical structure (the mapping from a text string to a logical tree).

Let's also look at it practically. Schemata do not typically have to be extended on a per-document basis, but DTDs *do*, because people need to declare the entities (e.g. abbreviations, graphics) required for their document.
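To make the physical/logical split in #1 concrete, here is a minimal sketch using Python's standard-library expat binding. Python, the memo document and the function name split_layers are illustrative choices, not part of any proposal. The entity declaration arrives as its own event, the element types are used without any <!ELEMENT> declaration, and by the time character data reaches the application the entity boundary has disappeared:

```python
import xml.parsers.expat

# Hypothetical document: the internal DTD subset declares one entity
# and no element types at all.
DOC = """<!DOCTYPE memo [
  <!ENTITY co "Acme Corp.">
]>
<memo><to>Staff</to><body>Welcome to &co;</body></memo>"""


def split_layers(text):
    """Collect what expat reports for the physical and the logical layer."""
    entities, elements, chunks = {}, [], []
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_param, value, base, system_id, public_id, notation):
        # Physical structure: a named string that gets spliced into the text.
        entities[name] = value

    def on_start(name, attrs):
        # Logical structure: element types used without any declaration.
        elements.append(name)

    def on_chars(data):
        # Entity boundaries are gone here; only the combined text remains.
        chunks.append(data)

    parser.EntityDeclHandler = on_entity_decl
    parser.StartElementHandler = on_start
    parser.CharacterDataHandler = on_chars
    parser.Parse(text, True)
    return entities, elements, "".join(chunks)


entities, elements, text = split_layers(DOC)
print(entities)  # {'co': 'Acme Corp.'}
print(elements)  # ['memo', 'to', 'body']
print(text)      # StaffWelcome to Acme Corp.
```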

#2. Documents could have multiple schemata, but XML and SGML allow only a single DTD per document. For instance, a single document could conform to HTML 1.0, HTML 2.0, CML and many other HTML-like DTDs. I think that the application should be able to validate a document against as many of them as the user feels it should be validated against. An author should also be able to claim conformance to as many schemata as they want.
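As a rough illustration of checking one document against several schemata at once, here is a toy Python sketch. The two "schemata" are hypothetical stand-ins (not real HTML or CML definitions) that record only which child elements each element type may contain; real structure definitions would say far more about order, attributes and text:

```python
import xml.etree.ElementTree as ET

# Toy "schemata": element type -> set of element types allowed as children.
# Hypothetical stand-ins only; they ignore attributes, order and text.
SCHEMATA = {
    "minimal-html": {
        "html": {"head", "body"}, "head": {"title"}, "title": set(),
        "body": {"p"}, "p": set(),
    },
    "html-with-lists": {
        "html": {"head", "body"}, "head": {"title"}, "title": set(),
        "body": {"p", "ul"}, "p": set(), "ul": {"li"}, "li": set(),
    },
}


def conforms(elem, schema):
    """True if elem and all its descendants nest as the schema allows."""
    allowed = schema.get(elem.tag)
    if allowed is None:
        return False  # element type unknown to this schema
    return all(child.tag in allowed and conforms(child, schema) for child in elem)


def conforming_schemata(doc_text, schemata):
    """Names of every schema the document conforms to (possibly several)."""
    root = ET.fromstring(doc_text)
    return sorted(name for name, schema in schemata.items() if conforms(root, schema))


doc = "<html><head><title>Hi</title></head><body><p>One</p></body></html>"
print(conforming_schemata(doc, SCHEMATA))  # ['html-with-lists', 'minimal-html']
```

The same document is reported as conforming to both schemata, which is exactly the relationship a single <!DOCTYPE ...> cannot express.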

#3. DTDs cannot be reused. In the precise sense defined by XML and SGML, a DTD is a part of a document; it cannot be defined standalone. HTML 4.0 is not a DTD. It is merely a set of markup declarations. This can easily be verified in two ways. First, if you consult either the XML specification or the SGML standard, you will see that DTDs are only defined in the context of a document. Second, note that every version of HTML, like any other major DTD, is built with a set of parameter entities that can be switched on and off in the document instance. Thus even HTML 4.0 is really a set of DTDs (loose, strict, etc.).
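The parameter-entity switching in #3 can be sketched with Python's expat binding. This assumes a hypothetical shared declarations file (inlined as SHARED_DTD and served from memory under the made-up name shared.dtd) with a %strict; switch in the style of the HTML DTDs. Because the first declaration of an entity is binding and the internal subset is read before the external one, the document's own internal subset decides which DTD the shared declarations turn into:

```python
import xml.parsers.expat

# Hypothetical shared declarations: %strict; defaults to IGNORE,
# so unless a document overrides it, "variant" ends up as "loose".
SHARED_DTD = """<!ENTITY % strict "IGNORE">
<![%strict;[
<!ENTITY variant "strict">
]]>
<!ENTITY variant "loose">
"""


def render(doc_text):
    """Parse doc_text, loading SHARED_DTD as its external subset."""
    parser = xml.parsers.expat.ParserCreate()
    parser.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    chunks = []

    def on_external(context, base, system_id, public_id):
        # Every external entity resolves to the same in-memory DTD text.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(SHARED_DTD, True)
        return 1

    parser.ExternalEntityRefHandler = on_external
    parser.CharacterDataHandler = chunks.append
    parser.Parse(doc_text, True)
    return "".join(chunks)


# Same declarations, two different DTDs: the document flips the switch.
loose = '<!DOCTYPE doc SYSTEM "shared.dtd"><doc>&variant;</doc>'
strict = ('<!DOCTYPE doc SYSTEM "shared.dtd" '
          '[<!ENTITY % strict "INCLUDE">]><doc>&variant;</doc>')
print(render(loose))   # loose
print(render(strict))  # strict
```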

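#4. DTDs are already defined, by the XML specification itself. Any document with a <!DOCTYPE ...> that points to a Structure Definition Document (SDD) cannot be valid or even well-formed, because XML requires the external subset named in a <!DOCTYPE ...> to consist of markup declarations (<!ELEMENT ...>, <!ENTITY ...> and so on), not XML elements. In other words, XSDs cannot be "drop-in" replacements for XML DTDs. Period.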
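A quick way to see the problem in #4 is to hand a parser an external subset written in XML instance syntax. In this Python sketch the file name foo.sdd and the SDD vocabulary are hypothetical, and the "file" is served from memory rather than fetched. Expat is asked to read the external subset; an ordinary markup declaration is accepted, while the SDD is rejected. A non-validating parser that skips the external subset would simply never look:

```python
import xml.parsers.expat

# A hypothetical SDD written in XML instance syntax.
SDD_TEXT = '<sdd><entity name="bar" value="Hello World!"/></sdd>'
DOC = '<!DOCTYPE FOO SYSTEM "foo.sdd"><FOO/>'


def check(doc_text, external_text):
    """Parse doc_text, feeding external_text in as the external DTD subset."""
    parser = xml.parsers.expat.ParserCreate()
    parser.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def on_external(context, base, system_id, public_id):
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(external_text, True)  # raises ExpatError on non-DTD syntax
        return 1

    parser.ExternalEntityRefHandler = on_external
    try:
        parser.Parse(doc_text, True)
        return "ok"
    except xml.parsers.expat.ExpatError as err:
        return "rejected: " + xml.parsers.expat.ErrorString(err.code)


print(check(DOC, "<!ELEMENT FOO EMPTY>"))  # ok
print(check(DOC, SDD_TEXT))               # prints a "rejected: ..." line
```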

#5. SDDs cannot define entities "in time" (that is, before the parser reaches the entity reference). There is no way that an SDD can make the following XML document well-formed:

<?SDD href="http://www.my.structure.definition.document"?> <FOO>&bar;</FOO>

Of course an appropriate DTD *can* make that document well-formed (and valid). So XSDs cannot do everything that DTDs do.
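The difference in #5 is easy to check with Python's xml.etree.ElementTree, a non-validating parser built on expat (the helper name parse_or_error is made up for this sketch). The SDD processing instruction is inert as far as the parser is concerned, so &bar; stays undeclared; adding an internal DTD subset that declares bar makes the same text parse:

```python
import xml.etree.ElementTree as ET

BODY = "<FOO>&bar;</FOO>"
PI = '<?SDD href="http://www.my.structure.definition.document"?>'
DTD = '<!DOCTYPE FOO [<!ENTITY bar "Hello World!">]>'


def parse_or_error(text):
    """Return the root element's text, or a note that the input is not well-formed."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError as err:
        return f"not well-formed ({err})"


print(parse_or_error(PI + BODY))        # prints a "not well-formed" message
print(parse_or_error(PI + DTD + BODY))  # Hello World!
```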

#6. Handling external entities is *hard*. XML says that well-formedness checkers do not have to download external entities. I think this is such a black hole of headaches that I hope nobody ever uses it, and I especially hope that XSD does not depend upon it. I know of no parser that can be directed to "re-parse" a document with an entity expanded, so how would you implement "entity declaration" on top of these parsers? Should XSD depend on features that have not been implemented yet? Consider what would happen if this example *were* legal XML:

<?SDD href="http://www.my.structure.definition.document"?> <FOO>&bar;</FOO>

Now suppose my SDD contains this entity declaration: <entity name="bar" value="Hello World!"/>

How would you redirect a typical parser to reparse the document with the entity replaced by the value?
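For comparison, here is roughly what the question in #6 comes down to if the parser itself cannot be redirected: a hypothetical two-pass workaround in Python. The SDD is inlined rather than fetched from its href, its <entity> vocabulary is the hypothetical one from the example, and the "re-parse" is done by text surgery that splices a generated DTD internal subset into a copy of the document. The function name and all the stated limitations are assumptions of this sketch:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical SDD content, inlined here instead of being fetched from href.
SDD_TEXT = '<sdd><entity name="bar" value="Hello World!"/></sdd>'
DOC = '<?SDD href="http://www.my.structure.definition.document"?><FOO>&bar;</FOO>'


def parse_with_sdd_entities(doc_text, sdd_text):
    """First pass fails on the undeclared entity; the second pass re-parses
    a patched copy with the SDD's entities turned into a DTD subset."""
    try:
        return ET.fromstring(doc_text)
    except ET.ParseError:
        pass  # expected: &bar; is not declared anywhere the parser can see

    # Translate each <entity name=... value=...> into <!ENTITY name "value">.
    # Assumes values contain no double quotes, '%' or '&'.
    decls = "".join(
        '<!ENTITY {} "{}">'.format(e.get("name"), e.get("value"))
        for e in ET.fromstring(sdd_text).iter("entity")
    )

    # Naive text surgery: find the root start-tag and put a DOCTYPE before it.
    # Assumes no existing DOCTYPE and no comments or CDATA before the root.
    root_tag = re.search(r"<([A-Za-z_][\w.-]*)", doc_text)
    doctype = "<!DOCTYPE {} [{}]>".format(root_tag.group(1), decls)
    patched = doc_text[:root_tag.start()] + doctype + doc_text[root_tag.start():]
    return ET.fromstring(patched)


root = parse_with_sdd_entities(DOC, SDD_TEXT)
print(root.tag, root.text)  # FOO Hello World!
```

Every assumption listed in the comments is a place where this breaks on real documents, which illustrates why an entity mechanism that depends on re-parsing is fragile.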

In conclusion:

I feel that SDDs should be substantially different from DTDs in at least three respects:

Paul Prescod - http://itrc.uwaterloo.ca/~papresco

"You have the wrong number."
"Eh? Isn't that the Odeon?"
"No, this is the Great Theater of Life. Admission is free, but the 
taxation is mortal. You come when you can, and leave when you must. The 
show is continuous. Good-night." -- Robertson Davies, "The Cunning Man"

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)