A (very) selective sampling of posts to the XSL List on the topic of MSIE 5.0's validation of incoming XML. See the thread in full on the XSL List archive.
[Ken Sall 1999-03-31]
I've added an example that illustrates your point that IE5 detects DTD syntax errors.
http://members.home.com/kensall/tests/collection1bugsdtd.xml
http://members.home.com/kensall/tests/collection1bugs.dtd
However, if anyone from Microsoft can explain why IE5 doesn't actually use the DTD to validate the document (the way that IE5 Beta 2 did), I'd appreciate it. This problem will be published in an article shortly (in the larger context of positive things you can do with IE5 with XML/XSL) and it would be great to state correctly what Microsoft plans w.r.t. DTD processing.

[Jonathan Marsh 1999-03-31]
Subject: RE: Why Doesn't IE5 use the DTD to Validate?
This is as designed, not a bug. The IE5 XML parser is a validating parser, with two properties set through DOM extensions to control DTD handling:
- validateOnParse determines whether validation errors are presented to the user.
- resolveExternals determines whether the DTD or XML Schema is loaded and datatypes, default values, etc. are honored.
The values of these properties when browsing directly to XML documents are validateOnParse=false and resolveExternals=true.
When browsing XML documents on the Web, surfacing validation errors is of little apparent value. I would not expect publishers to author both a DTD or XML Schema and documents that don't conform to that DTD/Schema. So the vast majority will not generate validation errors. For those documents that declare a DTD and are invalid, it is no better to give the user a validation error instead of displaying the document; in fact, the validation error could prevent the user from viewing an otherwise perfectly good document. Also, the performance penalty for validation is significant and should not be imposed on end-users without good reason.
The only scenario we could come up with where validation is useful when browsing XML documents is when the browser is used as a development tool, allowing easy checking of well-formedness and validity for a document in progress. This scenario can be accomplished by a number of alternative mechanisms without impacting the browsing experience - a simple tool that validates an XML document could be written in a few lines of JavaScript; see http://msdn.microsoft.com/downloads/samples/internet/xml/xml_validator/default.asp for an example. We considered several mechanisms for allowing developers to "turn on" validation errors but did not find a clean solution that could be implemented in time for the IE5 release.
- Jonathan Marsh

[Daniel Austin 1999-03-31]
Ken,
While I see your point in terms of validation, I think that Jonathan's argument is sound. As a content provider, I can see little need for *client-side* validation, except in certain special cases. Validation against a DTD is a check for structural validity, and should be carried out by the author prior to serving the document. As Jonathan notes, what is the client supposed to do on receiving an invalid document? Popping up a JavaScript alert box is not acceptable, nor is refusing to show the page. We must work on the assumption that the author validated the document. Also, validation by clients would simply take too long and be too large a performance hit. Validation is for authors to check their work, not for browsers to check their input. (These arguments do not apply to well-formedness.)

[Paul Prescod 1999-04-01]
Chris Lilley wrote:
> So, it always validates, but the flag controls whether error messages
> are shown? That sounds fine, until you realise that if a validating
> parser found an error then not only do you have error messages, you
> also have no parse tree. So, what gets displayed? Presumably, some
> fixed-up, error-corrected tree.
I'm not sure what you mean here.
Consider:
    <!ELEMENT FOO (BAR, BAZ)>
    ...
    <FOO><BAZ/><BAR/></FOO>
The tree does not conform to the DTD but it is not ambiguous. In the face of errors IE5 should give you exactly the same parse tree that a well-formed-only parser gives you.
> c) If people want validation, they will make valid documents. If they
> don't, they will make well formed documents.
I would usually agree with you, but in this case there is a bug in the XML spec that causes a problem. If you have a DTD only in order to specify a single defaulted attribute or entity, then that document becomes well-formed but *not valid*. There is no concept of "has a DTD but is not meant to be valid" or "conforms to the declarations that are available but the declarations are not complete." If you look at the "Element Valid" validity constraint you will see that an element is only valid if its element type is declared and all of its child elements are also of types that have been declared. I propose a processing instruction that says that a document has a DTD but is not meant to be valid.
    <?xml:not-valid?>
Then validating applications would treat it as if it were just well-formed.

[James Clark 1999-04-02]
Chris Lilley wrote:
> Heh. What you are saying is, have some sort of switch in the document
> which says whether the document is asserted to be valid or whether it is
> just well formed?
>
> But that of course already exists, and people can choose to make just
> well-formed documents if they want.
What switch is that? This document
    <doc>foo</doc>
is well-formed but not valid. This document also is well-formed but not valid:
    <!DOCTYPE doc [
    <!ATTLIST doc a CDATA "default">
    ]>
    <doc>foo</doc>
Neither contains an assertion that it is valid. Systems that assume that a document is meant to be valid merely because it contains a DOCTYPE declaration are broken; there's nothing in the XML spec that licenses such an assumption.
> > The inclusion of a DTD could be interpreted as switch indicating to
> > the interpreter that structural integrity check has to be done on the
> > document.
>
> Not "could be"; *is*. That is the intent of the XML 1.0 spec.
That's news to me.
> That is
> what a validating parser does when encountering a document with a
> doctype declaration and an internal subset with anything other than just
> entity declarations.
So what is this switch? The DOCTYPE declaration? The DOCTYPE declaration unless it's just an internal subset containing entity declarations? What if I have default attributes declared as well? What if I have so many entities that I use an external subset instead? Where does the XML spec mention such a switch? I know Microsoft-bashing is good, clean fun, but actually they've done the right thing here.

[James Clark 1999-04-02]
Reading the DTD and validating aren't the same thing. Unless a document has standalone="yes", the browser should always read a provided DTD so that it can correctly
- default attributes
- normalize attribute values
- expand entity references
None of these things involve validation.
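
The distinction between reading the DTD and validating can be sketched outside IE5. The example below is only an illustration under stated assumptions: it uses Python's standard-library xml.etree.ElementTree (built on the non-validating expat parser) rather than the IE5 parser discussed in the thread, and the "greeting" entity is a hypothetical addition to James Clark's example document. The document has no element declarations, so it is well-formed but not valid, yet the parser should still apply the declared default attribute and expand the entity reference.

```python
import xml.etree.ElementTree as ET

# James Clark's second example document (internal subset with a defaulted
# attribute). The "greeting" entity is a hypothetical addition, used here
# only to show entity expansion alongside attribute defaulting.
DOC = """<!DOCTYPE doc [
  <!ATTLIST doc a CDATA "default">
  <!ENTITY greeting "foo">
]>
<doc>&greeting;</doc>"""

# expat is a non-validating parser: it reads the internal DTD subset but
# never checks the document against element declarations.
root = ET.fromstring(DOC)

# The attribute "a" does not appear in the start tag; it comes from the
# ATTLIST declaration in the internal subset.
print("attributes:", root.attrib)

# The entity reference in the element content is replaced by its declared text.
print("text:", root.text)
```

Parsing succeeds even though no element type is declared, which is the behaviour the thread attributes to a parser that reads declarations without reporting validity errors.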