MSIE 5 and Validation


A (very) selective sampling of posts to the XSL List on the topic of MSIE 5.0's validation of incoming XML. See the thread in full on the XSL List archive.


[Ken Sall 1999-03-31]

I've added an example that illustrates your point that IE5 detects DTD
syntax errors.

http://members.home.com/kensall/tests/collection1bugsdtd.xml
http://members.home.com/kensall/tests/collection1bugs.dtd
 
However, if anyone from Microsoft can explain why IE5 doesn't actually use
the DTD to validate the document (the way that IE5 Beta 2 did), I'd
appreciate it. This problem will be described in an article to be published
shortly (in the larger context of positive things you can do with IE5 with
XML/XSL), and it would be great to state correctly what Microsoft plans w.r.t.
DTD processing.

[Jonathan Marsh 1999-03-31]

Subject: RE: Why Doesn't IE5 use the DTD to Validate?

This is as designed, not a bug.

The IE5 XML parser is a validating parser, with two properties set through
DOM extensions to control DTD handling:
 - validateOnParse determines whether validation errors are presented to the
user.
 - resolveExternals determines whether the DTD or XML Schema is loaded and
datatypes, default values, etc. are honored.

The values of these properties when browsing directly to XML documents is
validateOnParse=false and resolveExternals=true.

When browsing XML documents on the Web, surfacing validation errors is of
little apparent value.  I would not expect publishers to author both a DTD
or XML Schema and documents that don't conform to that DTD/Schema, so the
vast majority of documents will not generate validation errors.  For those
that declare a DTD and are invalid, giving the user a validation error
instead of displaying the document is no better; in fact, the validation
error could prevent the user from viewing an otherwise perfectly good
document.  Also, the performance penalty for validation is significant and
should not be imposed on end-users without good reason.

The only scenario we could come up with where validation is useful when
browsing XML documents is when the browser is used as a development tool,
allowing easy checking of well-formedness and validity for a document in
progress.  This scenario can be accomplished by a number of alternative
mechanisms without impacting the browsing experience: a simple tool that
validates an XML document could be written in a few lines of JavaScript (see
http://msdn.microsoft.com/downloads/samples/internet/xml/xml_validator/default.asp
for an example).

We considered several mechanisms for allowing developers to "turn on"
validation errors but did not find a clean solution that could be
implemented in time for the IE5 release.

- Jonathan Marsh

[Daniel Austin 1999-03-31]


Ken,

        While I see your point in terms of validation, I think that
Jonathan's argument is sound.
As a content provider, I can see little need for *client-side* validation,
except in certain special cases. Validation against a DTD is a check for
structural validity, and should be carried out by the author prior to
serving the document. As Jonathan notes, what is the client supposed to do
on receiving an invalid document? Popping up a JavaScript alert box is not
acceptable, nor is refusing to show the page. We must work on the assumption
that the author validated the document. Also, validation by clients would
simply take too long and be too large a performance hit. Validation is for
authors to check their work, not for browsers to check their input. (These
arguments do not apply to well-formedness.)

[Paul Prescod 1999-04-01]

Chris Lilley wrote:
> 
> So, it always validates, but the flag controls whether error messages
> are shown? That sounds fine, until you realise that if a validating
> parser found an error, then not only do you have error messages, you
> also have no parse tree. So, what gets displayed? Presumably, some
> fixed-up, error-corrected tree. 

I'm not sure what you mean here. Consider:

<!ELEMENT FOO (BAR, BAZ)>

...

<FOO><BAZ/><BAR/></FOO>

The tree does not conform to the DTD, but it is not ambiguous. In the face
of errors, IE5 should give you exactly the same parse tree that a
well-formedness-only parser gives you.
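Prescod's example can be made concrete with a minimal Python sketch using the standard library's `xml.dom.minidom`, which is built on the non-validating expat parser. The `BAR` and `BAZ` declarations and the one-model sequence check are illustrative assumptions added here. This is not a general DTD validator, and it says nothing about how IE5 itself builds its tree.

```python
from xml.dom import minidom

# Prescod's example: FOO must contain BAR then BAZ, but the instance has BAZ then BAR.
# The BAR/BAZ declarations are assumed for the sketch.
PRESCOD_DOC = """<!DOCTYPE FOO [
<!ELEMENT FOO (BAR, BAZ)>
<!ELEMENT BAR EMPTY>
<!ELEMENT BAZ EMPTY>
]>
<FOO><BAZ/><BAR/></FOO>"""

# minidom reads the internal subset but never checks content models.
tree = minidom.parseString(PRESCOD_DOC)
children = [n.tagName for n in tree.documentElement.childNodes
            if n.nodeType == n.ELEMENT_NODE]
print(children)  # ['BAZ', 'BAR']

# Toy check for this single sequence model only (hypothetical, not a validator).
EXPECTED_SEQUENCE = ["BAR", "BAZ"]
print(children == EXPECTED_SEQUENCE)  # False: invalid, yet the tree is unambiguous
```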

> c) If people want validation, they will make valid documents. If they
> don't, they will make well formed documents.

I would usually agree with you, but in this case there is a bug in the XML
spec. that causes a problem. If you have a DTD only in order to specify a
single defaulted attribute or entity, then that document becomes well-formed
but *not valid*. There is no concept of "has a DTD but is not meant to be
valid" or "conforms to the declarations that are available but the
declarations are not complete."

If you look at the "Element Valid" validity constraint you will see that
an element is only valid if its element type is declared and all of its
child elements are also of types that have been declared.
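The "declared" half of the Element Valid constraint can be checked mechanically. The following minimal sketch assumes only Python's standard expat bindings. `ElementDeclHandler` collects the `<!ELEMENT ...>` declarations, and `StartElementHandler` records the element types actually used. The `memo`/`para` documents and the `undeclared_elements` helper are hypothetical. The sketch ignores content models, so it covers only part of the constraint.

```python
import xml.parsers.expat

def undeclared_elements(xml_text):
    """Return element types used in the instance but not declared in the DTD.

    Hypothetical helper: covers only the "declared" part of the
    Element Valid constraint and does not check content models.
    """
    declared = set()
    used = set()
    parser = xml.parsers.expat.ParserCreate()

    def on_element_decl(name, model):
        declared.add(name)  # one call per <!ELEMENT ...> declaration

    def on_start(name, attrs):
        used.add(name)  # every element type that appears in the instance

    parser.ElementDeclHandler = on_element_decl
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return sorted(used - declared)

# A DTD used only to default one attribute: well-formed, but nothing is declared.
ATTR_ONLY = """<!DOCTYPE memo [
<!ATTLIST memo status CDATA "draft">
]>
<memo><para>Hello</para></memo>"""

# The same instance with every element type declared.
FULLY_DECLARED = """<!DOCTYPE memo [
<!ELEMENT memo (para)>
<!ELEMENT para (#PCDATA)>
<!ATTLIST memo status CDATA "draft">
]>
<memo><para>Hello</para></memo>"""

print(undeclared_elements(ATTR_ONLY))       # ['memo', 'para']
print(undeclared_elements(FULLY_DECLARED))  # []
```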

I propose a processing instruction that says that a document has a DTD but
is not meant to be valid.

<?xml:not-valid?>

Then validating applications would treat it as if it were just
well-formed.
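An application could honor Prescod's proposal cheaply. The following minimal sketch assumes Python's standard expat bindings. `ProcessingInstructionHandler` watches for a PI whose target is `xml:not-valid`, and the hypothetical `should_report_validity_errors` function decides whether a validating application would surface validity errors. The PI is only a proposal in this thread and is not part of XML 1.0.

```python
import xml.parsers.expat

# Prescod's proposed PI target: a proposal only, not part of XML 1.0.
NOT_VALID_TARGET = "xml:not-valid"

def declares_not_valid(xml_text):
    """Return True if the document contains the proposed <?xml:not-valid?> PI."""
    found = False
    parser = xml.parsers.expat.ParserCreate()

    def on_pi(target, data):
        nonlocal found
        if target == NOT_VALID_TARGET:
            found = True

    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(xml_text, True)
    return found

def should_report_validity_errors(xml_text):
    # Hypothetical policy: when the PI is present, a validating application
    # treats the document as merely well-formed and reports no validity errors.
    return not declares_not_valid(xml_text)

MARKED = """<?xml version="1.0"?>
<?xml:not-valid?>
<!DOCTYPE doc [
<!ATTLIST doc a CDATA "default">
]>
<doc>foo</doc>"""

UNMARKED = """<?xml version="1.0"?>
<!DOCTYPE doc [
<!ATTLIST doc a CDATA "default">
]>
<doc>foo</doc>"""

print(should_report_validity_errors(MARKED))    # False
print(should_report_validity_errors(UNMARKED))  # True
```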

[James Clark 1999-04-02]

Chris Lilley wrote:

> Heh. What you are saying is, have some sort of switch in the document
> which says whether the document is asserted to be valid or whether it is
> just well formed?
> 
> But that of course already exists, and people can choose to make just
> well-formed documents if they want.

What switch is that?

This document

  <doc>foo</doc>

is well-formed but not valid.  This document also is well-formed but not
valid:

<!DOCTYPE doc [
<!ATTLIST doc a CDATA "default">
]>
<doc>foo</doc>

Neither contains an assertion that it is valid.  Systems that assume that a
document is meant to be valid merely because it contains a DOCTYPE
declaration are broken; there's nothing in the XML spec that licenses
such an assumption.

> > The inclusion of a DTD could be interpreted as a switch indicating to
> > the interpreter that a structural integrity check has to be done on the
> > document.
> 
> Not "could be"; *is*. That is the intent of the XML 1.0 spec.

That's news to me.

> That is
> what a validating parser does when encountering a document with a
> doctype declaration and an internal subset with anything other than just
> entity declarations.

So what is this switch?  The DOCTYPE declaration? The DOCTYPE
declaration unless it's just an internal subset containing entity
declarations?  What if I have default attributes declared as well? What
if I have so many entities that I use an external subset instead?  Where
does the XML spec mention such a switch?

I know Microsoft-bashing is good, clean fun, but actually they've done
the right thing here.

[James Clark 1999-04-02]

Reading the DTD and validating aren't the same thing.  Unless a document
has standalone="yes", the browser should always read a provided DTD so
that it can correctly

- default attributes
- normalize attribute values
- expand entity references

None of these things involve validation.
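Clark's distinction shows up directly in a non-validating parser. The following minimal sketch assumes Python's standard expat bindings. Expat reads the internal subset, supplies the defaulted attribute from the `ATTLIST`, and expands an internal entity. It never checks the document against the DTD; here `doc` is not even declared. The `greeting` entity is a hypothetical addition to Clark's earlier example, and attribute-value normalization is not exercised.

```python
import xml.parsers.expat

# Clark's earlier example plus a hypothetical internal entity "greeting".
CLARK_DOC = """<!DOCTYPE doc [
<!ATTLIST doc a CDATA "default">
<!ENTITY greeting "hello">
]>
<doc>&greeting;</doc>"""

def read_with_dtd(xml_text):
    """Parse with expat (non-validating); return (root attributes, text content)."""
    result = {"attrs": None, "text": []}
    parser = xml.parsers.expat.ParserCreate()
    # specified_attributes is False by default, so attributes filled in from
    # ATTLIST defaults are reported alongside those written in the instance.

    def on_start(name, attrs):
        if result["attrs"] is None:  # keep only the root element's attributes
            result["attrs"] = dict(attrs)

    def on_text(data):
        result["text"].append(data)  # internal entities arrive already expanded

    parser.StartElementHandler = on_start
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    return result["attrs"], "".join(result["text"])

attrs, text = read_with_dtd(CLARK_DOC)
print(attrs)  # {'a': 'default'}
print(text)   # hello
```

Setting `parser.specified_attributes = True` before parsing would make expat report only the attributes written in the instance, so `a` would drop out of the result.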