From owner-xml-dev@ic.ac.uk Mon May 25 11:37:18 1998
Date: Mon, 25 May 1998 12:28:16 -0400
From: Paul Prescod <papresco@technologist.com>
Subject: XSD: Structure definitions are not DTDs
I wanted to summarize and emphasize my arguments for throwing away the "DTD model" in creating structure definitions. I've rather dribbled them out over several days, so I thought that a coherent position paper would be useful.
The DTD concept is more than 10 years old now. The world has changed a lot since then, and we know of many flaws in DTDs. The most *obvious* complaint is that they use a notation different from XML instance syntax. Personally, I think that the more subtle flaws are far more serious, and that it would be a tragedy to correct this minor (and debatable!) flaw without correcting the major (and perhaps less debatable) ones. There are also purely technical reasons why structure definition documents cannot be used as a "drop-in" replacement for DTDs.
#1. Entities have nothing to do with "document types" or "document structure." In fact, I can't think of any name that could unify entities, elements and attributes other than "bag of definitions." Entities are *declarations*, in that they must be declared before they can be used; element type definitions are *definitions*, in that you can use an element type without declaring it. Entities also relate to the physical structure of an XML document (the mapping from various small text strings to a single combined text string). Elements relate to the logical structure (the mapping from a text string to a logical tree).
Let's also look at it practically. Schemata do not typically have to be extended on a per-document basis, but DTDs *do*, because people need to declare the entities (e.g. abbreviations, graphics) required for their document.
#2. Documents could have multiple schemata, but XML and SGML allow only a single DTD. For instance, a single document could conform to HTML 1.0, HTML 2.0, CML and many other HTML-like DTDs. I think that an application should be able to validate a document against as many of them as the user feels it should be validated against. An author should also be able to claim conformance to as many schemata as he or she wants.
#3. DTDs cannot be reused. In the precise sense defined by XML and SGML, a DTD is a part of a document. It is not something that can be defined standalone. HTML 4.0 is not a DTD. It is merely a set of markup declarations. This can be easily verified in two ways. First, you can go to either the XML specification or the SGML standard and you will see that DTDs are only defined in the context of a document. Second, you can note that every version of HTML, like any other major DTD, is built with a bunch of parameter entities which can be turned on and off in the document instance. Thus even HTML 4.0 is a set of DTDs (loose, strict, etc.).
#4. DTDs are already defined. Any document with a <!DOCTYPE ...> that points to a Structure Definition Document (SDD) cannot be valid or even well-formed. In other words, XSDs cannot be "drop-in" replacements for XML DTDs. Period.
#5. SDDs cannot define entities "in time". There is no way that an SDD can
make the following XML document well-formed:
<?SDD href="http://www.my.structure.definition.document"?>
<FOO>&bar;</FOO>
Of course an appropriate DTD *can* make that document well-formed (and
valid). So XSDs cannot do everything that DTDs do.
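The point in #5 can be checked with any off-the-shelf parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module (my choice of parser, not something the argument depends on). It assumes the entity reference is written with its terminating semicolon, and the helper name is_well_formed is purely illustrative.

```python
import xml.etree.ElementTree as ET

# The instance from the text: only a (hypothetical) SDD processing
# instruction, so the entity "bar" is never declared to the parser.
UNDECLARED = (
    '<?SDD href="http://www.my.structure.definition.document"?>\n'
    '<FOO>&bar;</FOO>'
)

# The same instance, but with "bar" declared in an internal DTD subset.
DECLARED = (
    '<!DOCTYPE FOO [<!ENTITY bar "Hello World!">]>\n'
    '<FOO>&bar;</FOO>'
)


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(UNDECLARED))
print(is_well_formed(DECLARED))
print(ET.fromstring(DECLARED).text)
```

The first document should be rejected because the entity reference is undefined by the time the parser reaches it, while the second parses and yields the replacement text as the content of FOO.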
#6. Handling external entities is *hard*. XML says that well-formedness
checkers do not have to download external entities. I think this is such a
black hole of headaches that I hope that nobody ever uses it. I especially
hope that XSD does not depend upon it. There is no parser that I know of
that can be directed to "re-parse" a document with an entity expanded. So
how would you implement "entity declaration" on top of these parsers?
Should XSD depend on features that have not been implemented yet?
Consider, if this example *was* legal XML:
<?SDD href="http://www.my.structure.definition.document"?>
<FOO>&bar;</FOO>
Now I have an entity declaration in my SDD: <entity name="bar" value="Hello World!"/>
How would you redirect a typical parser to reparse the document with the entity replaced by the value?
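One workaround I can imagine, shown here only as a sketch, is to give up on redirecting the parser and instead run a second, fresh parse over a rewritten copy of the document with a synthesized internal DTD subset in front of it. The function name parse_with_entities and the sdd_entities dictionary are hypothetical; the sketch also assumes the root element name is already known, that the document has no XML declaration or DOCTYPE of its own, and that something other than the XML parser has already read the entity declarations out of the SDD.

```python
import xml.etree.ElementTree as ET


def parse_with_entities(document, root_name, entities):
    """Parse `document` again from scratch with `entities` declared up front.

    Assumes the document has no XML declaration and no DOCTYPE of its own,
    and that entity values contain no '"', '%' or '&' characters.
    """
    declarations = "".join(
        '<!ENTITY {} "{}">'.format(name, value)
        for name, value in entities.items()
    )
    # A DOCTYPE may legally precede processing instructions in the prolog.
    doctype = "<!DOCTYPE {} [{}]>\n".format(root_name, declarations)
    return ET.fromstring(doctype + document)


document = (
    '<?SDD href="http://www.my.structure.definition.document"?>\n'
    '<FOO>&bar;</FOO>'
)
# Hypothetical result of reading the entity declaration out of the SDD.
sdd_entities = {"bar": "Hello World!"}

root = parse_with_entities(document, "FOO", sdd_entities)
print(root.tag, root.text)
```

Note what this costs: the SDD reference and the root element name have to be found without the XML parser's help, since the parser rejects the original text, and the result is a DTD again, which is exactly what the SDD was supposed to replace.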
In conclusion:
I feel that SDDs should be substantially different from DTDs in at least three respects:
Paul Prescod - http://itrc.uwaterloo.ca/~papresco
"You have the wrong number." "Eh? Isn't that the Odeon?" "No, this is the Great Theater of Life. Admission is free, but the taxation is mortal. You come when you can, and leave when you must. The show is continuous. Good-night." -- Robertson Davies, "The Cunning Man"