On May 08, 1998, Paul Prescod raised some questions about the design of the "standalone document declaration" in XML 1.0. His original posting and some followup messages (Tim Bray, David Megginson, James Clark) from XML-DEV are collected in this document. Last updated: May 09, 1998.
From owner-xml-dev@ic.ac.uk Fri May 8 08:28:49 1998
Date: Fri, 08 May 1998 09:18:22 -0400
From: Paul Prescod <papresco@technologist.com>
Subject: SDD bogus

Is the standalone document declaration bogus and perhaps dangerous?

The whole feature strikes me as over-complicated and over-specific for a language like XML, but I'm aware of the historical processes that gave rise to it.

My understanding of a typical usage scenario goes like this: a sender creates a document. It creates it specifically so that it will be standalone. It validates that this is the case (while it validates everything else) and then it sends it to the receiver who hopes to consume it without validating it.

Things already strike me as a little bizarre, because if your protocol is designed such that the consumer trusts the receiver, then couldn't the SDD be implied in your out-of-band agreement? Further, what do you do if the SDD is other than you expect? Halt the parse and start again with a validating processor?

But that's not what I'm concerned about. I'm concerned because I believe this to be a valid XML document:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE MEMO SYSTEM "http://www.sgmlsource.com/memo.dtd" [
<!ENTITY % mess-everything-up SYSTEM "mess.ent">
<!ATTLIST MEMO SECURITY CDATA "TOP-SECRET">
]>
<MEMO></MEMO>

In my opinion, section 5.1 will require the non-validating parser to skip the attribute list declaration, even if memo.dtd is an empty file.

The receiver has no way of knowing that this case has occurred if it uses a "standard parser" (since XML's semantics are, for the moment at least, imprecisely specified, I only know what that means intuitively ... SAX, Lark, Expat, etc. would not give you enough information to detect this case).

This to me suggests that applications cannot trust the SDD and it must therefore be presumed to be meaningless.

But I'm glad to be proven wrong.
Despite its reputation to the contrary, XML is intricate and deep and I may have missed something important.

[**Note on diction: Paul clarified/qualified the use of "bogus" in a later post:

> Tim Bray wrote:
>>
>> Having said that, Paul did raise a valid concern about the SDD (too bad
>> this issue wasn't pointed out before the spec was frozen).
>
> Yes, I want to point out to those who do not know the dynamics here that I
> use the word "bogus" because I was in the SIG and it is as much my fault
> as anyone's that it got through. Were I talking about someone else's spec.
> I would be more tactful.
]

Paul Prescod - http://itrc.uwaterloo.ca/~papresco

Can we afford to feed that army, while so many children are naked and hungry?
Can we afford to remain passive, while that soldier-army is growing so massive?
- "Gabby" Barbadian Calypsonian in "Boots"

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
=======================================================================
From owner-xml-dev@ic.ac.uk Fri May 8 08:39:42 1998
Date: Fri, 08 May 1998 09:28:10 -0400
From: Paul Prescod <papresco@technologist.com>
X-Mailer: Mozilla 4.04 [en] (WinNT; U)
MIME-Version: 1.0
To: xml-dev <xml-dev@ic.ac.uk>
Subject: SDD again

Let me risk another step into the language courtroom. Validating parsers must always read the whole DTD. So the SDD is only for non-validating parsers. Non-validating parsers do not read element type declarations. So what is the point of this line:

"The standalone document declaration must have the value "no" if any external markup declarations contain declarations of:"
...
"element types with element content, if white space occurs directly within any instance of those types."

First, why does a non-validating parser care about element/mixed content? It has no responsibility to do any marking of insignificant whitespace anyhow.

Second, if there is no class of processor that can reliably reproduce the intended parse tree without reading the whole DTD, then doesn't that significantly weaken the utility (okay, "purity") of the SDD?

Even if I am wrong on the last point, it seems that it does not do what it is supposed to do properly.

Paul Prescod - http://itrc.uwaterloo.ca/~papresco

Can we afford to feed that army, while so many children are naked and hungry?
Can we afford to remain passive, while that soldier-army is growing so massive?
- "Gabby" Barbadian Calypsonian in "Boots"
=======================================================================
From owner-xml-dev@ic.ac.uk Fri May 8 09:05:31 1998
Date: Fri, 08 May 1998 06:51:30 -0700
To: xml-dev <xml-dev@ic.ac.uk>
From: Tim Bray <tbray@textuality.com>
Subject: Re: SDD bogus

At 09:18 AM 5/8/98 -0400, Paul Prescod wrote:
>But that's not what I'm concerned about. I'm concerned because I believe
>this to be a valid XML document:
>
><?xml version="1.0" standalone="yes"?>
><!DOCTYPE MEMO SYSTEM "http://www.sgmlsource.com/memo.dtd" [
><!ENTITY % mess-everything-up SYSTEM "mess.ent">
><!ATTLIST MEMO SECURITY CDATA "TOP-SECRET">
>]>
><MEMO></MEMO>
>
>In my opinion, section 5.1 will require the non-validating parser to skip
>the attribute list declaration, even if memo.dtd is an empty file.
Welll, it can't be valid if memo.dtd is an empty file, because you don't have <!ELEMENT memo .. > anywhere. But yes, 5.1 suggests the attribute default shouldn't be used.

>The receiver has no way of knowing that this case has occurred if it uses a
>"standard parser"

If the sender is stupid enough to send something like this to a non-validating parser, he gets what he deserves. If it's a validating parser, then of course the emptiness of memo.dtd will be detected.

> (since XML's semantics are, for the moment at least,
>imprecisely specified, I only know what that means intuitively ... SAX,
>Lark, Expat, etc. would not give you enough information to detect this
>case).

Huh?

>This to me suggests that applications cannot trust the SDD and it must
>therefore be presumed to be meaningless.

You do raise a good question; it would seem that standalone='true' *ought* to mean that the rule of 5.1 about the effect of external PE refs could be ignored. Hmmmm

-Tim
=======================================================================
From owner-xml-dev@ic.ac.uk Fri May 8 09:16:43 1998
Date: Fri, 08 May 1998 09:51:39 -0400
From: David Megginson <ak117@freenet.carleton.ca>
Subject: SDD bogus
In-reply-to: <3553061D.BF3A3461@technologist.com>

Paul Prescod writes:

> Is the standalone document declaration bogus and perhaps dangerous?

Yes and yes.

The problem, I think, came from the mistaken idea that people (i.e. desperate Perl hackers) would write custom parsers for each XML application (like RDF), and that these people would not want to deal with seemingly difficult problems like external entity resolution.
In the end, as one might have predicted, there is an impressive range of free XML processors available in several different programming languages: someone writing an RDF tool does not need to worry about the character and entity level of XML at all, and can work with XML easily through a more abstract interface such as the DOM or SAX.

So, we should let the authors decide -- if an author creates a document referencing external entities (including an external DTD subset), then the XML parser should handle them; if the author does not want to use external entities, then she can simply avoid referencing any.

As many XML parser writers have shown, resolving external entities is one of the easiest parts of XML (especially in higher-level languages like Java or Perl, and, I presume, Python). Allowing parsers to skip external entities -- rather than simplifying XML -- ended up making it much more complicated, and as you point out, the standalone declaration really doesn't help things.

All the best,

David

-- 
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
=======================================================================
From owner-xml-dev@ic.ac.uk Fri May 8 10:17:44 1998
Date: Fri, 08 May 1998 11:00:17 -0400
From: David Megginson <ak117@freenet.carleton.ca>
Subject: SDD again
In-reply-to: <3553086A.3494D7F9@technologist.com>

Paul Prescod writes:

> Let me risk another step into the language courtroom. Validating
> parsers must always read the whole DTD. So the SDD is only for
> non-validating parsers. Non-validating parsers do not read element
> type declarations. So what is the point of this line:

Your first premise is correct, but your second one is not. The spec states that a validating parser must use the whole DTD; it does not state that a non-validating parser may not use the DTD. AElfred, for example, reads the DTD well enough that it can even flag ignorable whitespace based on an element type's content model, but it is non-validating.

That said, I still agree that the standalone declaration is wrong. Perhaps some day, if there's an XML 1.1, we can think about fixing it.

All the best,

David

-- 
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
=======================================================================
From owner-xml-dev@ic.ac.uk Fri May 8 10:31:07 1998
Date: Fri, 08 May 1998 08:17:12 -0700
From: Tim Bray <tbray@textuality.com>
Subject: Re: SDD bogus

At 09:51 AM 5/8/98 -0400, David Megginson wrote:
>As many XML parser writers have shown, resolving external entities is
>one of the easiest parts of XML

Yes, but as is well-documented, difficulty is *not* the reason we made their processing optional by non-validating processors. The prime mover behind this decision was a passionate presentation from Jean Paoli explaining that the auto-include semantic of parsed entities is just *wrong* for web browsers. I've attached an explanation of why at the end of this message, but if you want to see it in context, go to section 4.4.3 of the annotated spec and click on the "H".
Having said that, Paul did raise a valid concern about the SDD (too bad this issue wasn't pointed out before the spec was frozen).

Having said *that*, I think, for reasons that are on the record in the same place, that the problem the SDD exists to solve will essentially never arise in real operational scenarios anyhow.

-Tim

=================
From the annotated spec at http://xml.com/axml/axml.html

Why Are External Entities Included Optionally?

In discussion of external entities, we realized that the semantics of external text entities (compulsory inclusion at the point where they are encountered) are deeply incompatible with the desired behavior of Web browsers. Consider the following example of the beginning of an XML document:

<?xml version='1.0'?>
<!DOCTYPE doc [
<!ENTITY MSA SYSTEM "http://www.microsoft.com/press/311.xml">
<!ENTITY NSA SYSTEM "http://home.netscape.com/PR/x27.xml">
]>
<doc>Netscape today announced that &NSA;.
In response, Microsoft issued the following statement: &MSA;.
...

A Web browser is typically making an aggressive effort to display text to the user as soon as possible, in parallel with fetching it from the network. In the example above, if a browser were required to fetch and process all external entities, it could only display the first four words before starting another network fetch operation. To make things worse, bear in mind that the replacement text for the entity NSA could well include other external entities which in turn would need to be fetched. This type of situation is unacceptable. Hence the rule that non-validating parsers need not fetch external entities if they don't want to.
=======================================================================
From owner-xml-dev@ic.ac.uk Sat May 9 00:00:25 1998
Date: Sat, 09 May 1998 11:36:31 +0700
From: James Clark <jjc@jclark.com>
Subject: Re: SDD bogus
Sender: owner-xml-dev@ic.ac.uk
To: Paul Prescod <papresco@technologist.com>
Cc: xml-dev <xml-dev@ic.ac.uk>
Reply-to: James Clark <jjc@jclark.com>

Paul Prescod wrote:

> I'm concerned because I believe
> this to be a valid XML document:
>
> <?xml version="1.0" standalone="yes"?>
> <!DOCTYPE MEMO SYSTEM "http://www.sgmlsource.com/memo.dtd" [
> <!ENTITY % mess-everything-up SYSTEM "mess.ent">
> <!ATTLIST MEMO SECURITY CDATA "TOP-SECRET">
> ]>
> <MEMO></MEMO>
>
> In my opinion, section 5.1 will require the non-validating parser to skip
> the attribute list declaration, even if memo.dtd is an empty file.

This is a very good point. Your example isn't quite right: the entity must be referenced. Also a non-validating parser only has to skip the ATTLIST declaration if it skips the entity reference. Apart from this, your interpretation of 5.1 is the obvious one. Expat behaves consistently with this.

I think this is a serious problem, because it breaks the principle that if you declare your document as standalone=yes and validate it, then you will get the same result when you parse it with any non-validating parser (which to me is the point of the SDD).

I think a bit of creative interpretation would be in order.
Section 5.1 says:

  [Non-validating processors] must not process entity declarations or attribute-list declarations encountered after a reference to a parameter entity that is not read, since the entity may have contained overriding declarations.

The "since" clause is false when standalone=yes, so I think this can fairly be said to be an inconsistency in the spec (rather than simply a poor design choice), which should be resolved by not applying this requirement when standalone=yes. The other way to fix this would be to tweak the definition of standalone to say that declarations after the first reference to an external parameter entity count as external for the purposes of determining whether the document is standalone. This clearly needs to be fixed one way or the other.

> Despite its reputation to the contrary,
> XML is intricate and deep and I may have missed something important.

Yes. Entities and the SDD are both tricky: the interaction of the two is particularly so.

James
=======================================================================
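The section 5.1 rule that James Clark quotes, together with his two proposed fixes, can be modelled in a few lines. This is an editorial sketch in Python. It is not code from any participant, and it does not describe how Expat or any other real parser is implemented.

The sketch makes several simplifying assumptions:

- The internal subset is reduced to a list of hypothetical `PERef` items (a reference to an external parameter entity that the processor does not read) and `AttlistDefault` items.
- Entity declarations are ignored.
- Following Clark's correction, Paul's example is assumed to actually reference `%mess-everything-up;` before the ATTLIST declaration.
- The hypothetical `relax_for_standalone` flag stands for Clark's first proposal.
- The hypothetical `requires_standalone_no` function is a simplified form of his second proposal. It assumes every element named in a defaulted ATTLIST appears in the document without that attribute specified, which is true of `<MEMO></MEMO>`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class PERef:
    """Hypothetical model of a reference such as %mess-everything-up;
    to an external parameter entity that the processor does not read."""
    name: str


@dataclass(frozen=True)
class AttlistDefault:
    """Hypothetical model of an attribute-list declaration with a default."""
    element: str
    attribute: str
    default: str


Decl = Union[PERef, AttlistDefault]


def attribute_defaults(
    decls: List[Decl], standalone: bool, relax_for_standalone: bool
) -> Dict[Tuple[str, str], str]:
    """Attribute defaults applied by a non-validating processor that skips
    external parameter entities.

    relax_for_standalone=False follows the literal 5.1 rule; True follows
    Clark's first proposal (do not apply that rule when standalone=yes).
    """
    defaults: Dict[Tuple[str, str], str] = {}
    skipped_pe = False
    for decl in decls:
        if isinstance(decl, PERef):
            # Never read, so it "may have contained overriding declarations".
            skipped_pe = True
        elif skipped_pe and not (standalone and relax_for_standalone):
            # Literal 5.1: ignore ATTLIST declarations after an unread PE.
            continue
        else:
            # XML 1.0 section 3.3: the first declaration of an attribute is binding.
            defaults.setdefault((decl.element, decl.attribute), decl.default)
    return defaults


def requires_standalone_no(decls: List[Decl]) -> bool:
    """Clark's second proposal, simplified: an attribute default declared
    after the first external PE reference counts as external, so the
    document must not claim standalone="yes"."""
    seen_pe = False
    for decl in decls:
        if isinstance(decl, PERef):
            seen_pe = True
        elif seen_pe:
            return True
    return False


# Paul's internal subset, with the PE reference Clark says it needs.
memo_subset: List[Decl] = [
    PERef("mess-everything-up"),
    AttlistDefault("MEMO", "SECURITY", "TOP-SECRET"),
]

if __name__ == "__main__":
    print(attribute_defaults(memo_subset, standalone=True, relax_for_standalone=False))
    print(attribute_defaults(memo_subset, standalone=True, relax_for_standalone=True))
    print(requires_standalone_no(memo_subset))
```

Under the literal reading, the SECURITY default is dropped even though the document declares itself standalone. This is the inconsistency Clark describes. Under the first fix the default survives. Under the second fix, standalone="yes" would no longer be a correct declaration for this subset, so a validating processor would presumably report it.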