XML and non-deterministic ['ambiguous'] content models
Jim Shain asked the following question about (non-)determinism in XML content models; informative responses from Richard Tobin, James Clark, Deborah Aleyne Lapeyre, and Marcus Carr follow. See SGML/XML Notion of Ambiguity (non-deterministic content models).
Date: Wed, 10 Jan 2001 14:22 -0600
From: Jim Shain <Jim.Shain@alltel.com>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Subject: DTD and Illegal Construct

I wanted to run this past some knowledgeable people.

Is the following a legal or illegal construct for an element in a DTD?

  <!ELEMENT Request ((A, B, C?) | (A?, B, C?, D))>

Some have stated that it is illegal because "When A is present it isn't possible to determine whether or not D is required."

Jim Shain
Sr. C/S Architect
ALLTEL Information Services
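One way to explore the question is to list the child sequences the model accepts. The sketch below is not part of the thread. It transliterates the content model into a Python regular expression (each one-letter element name becomes one character) and compares it with one candidate rewrite, ((A, B, C?, D?) | (B, C?, D)), whose two branches begin with different elements. The rewrite and the helper name `accepted` are illustrative assumptions, not something proposed by the posters.

```python
import re
from itertools import product

# Content models from the question and a candidate rewrite, transliterated
# to regular expressions over the one-letter element names A, B, C, D.
ORIGINAL = re.compile(r"ABC?|A?BC?D")   # ((A, B, C?) | (A?, B, C?, D))
REWRITE = re.compile(r"ABC?D?|BC?D")    # ((A, B, C?, D?) | (B, C?, D)), hypothetical


def accepted(pattern, max_len=5):
    """Return every sequence over A..D up to max_len that the pattern fully matches."""
    return {
        "".join(seq)
        for n in range(max_len + 1)
        for seq in product("ABCD", repeat=n)
        if pattern.fullmatch("".join(seq))
    }


if __name__ == "__main__":
    # Neither model uses * or +, so no accepted sequence is longer than 4
    # and the bounded enumeration covers both languages completely.
    print(sorted(accepted(ORIGINAL)))
    print(accepted(ORIGINAL) == accepted(REWRITE))
```

Both models accept the same six sequences, so under these assumptions the rewrite keeps the set of valid documents unchanged while letting the first child element select the branch.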
========================================================================
Date: Thu, 11 Jan 2001 00:42:20 +0000 (GMT)
From: Richard Tobin <richard@cogsci.ed.ac.uk>
To: xml-dev@lists.xml.org
Subject: RE: DTD and Illegal Construct

> ((A, B, D?) | (A?, B, C?, D?, E))

The rule is that a parser, proceeding from left to right, should be able to decide which symbol to match against on the basis of the current input symbol, without lookahead. So no disjunction of the form ((A, ...) | (A?, ...)) can ever be legal, because if the input starts with A the parser won't know which branch of the disjunction to take.

My online validator, http://www.cogsci.ed.ac.uk/~richard/xml-check.html, will check that your content models are deterministic.

-- Richard

========================================================================
Date: Thu, 11 Jan 2001 11:27:07 +0700
From: James Clark <jjc@jclark.com>
To: xml-dev@lists.xml.org
Subject: Re: DTD and Illegal Construct

It seems relatively easy to rewrite individual content models to work around the restriction on ambiguity. Where I find it much more of a nuisance is the way it makes parameterization harder. For example, imagine something like

  <!ENTITY % local.emph.class "">
  <!ENTITY % emph.class "emph|phrase %local.emph.class;">
  <!ENTITY % local.tech.class "">
  <!ENTITY % tech.class "code|var %local.tech.class;">
  <!ELEMENT eg (#PCDATA|%emph.class;)*>
  <!ELEMENT p (#PCDATA|%emph.class;|%tech.class;)*>

Now suppose I want to allow the var element everywhere the emph element is allowed (i.e. inside eg). In my internal subset I do:

  <!ENTITY % local.emph.class "|var">

Then I get hit with an ambiguity in the content model for p.

TREX avoids this annoyance by fully supporting ambiguous content models. I believe RELAX does also.

James

=========================================================================
Date: Wed, 10 Jan 2001 19:56:37 -0500
From: Deborah Aleyne Lapeyre <dalapeyre@mulberrytech.com>
To: xml-dev@lists.xml.org
Subject: RE: DTD and Illegal Construct

Wow, it was so good to see Sam Wilmott quoted again, he is one of our industry's real thinkers.

My personal favorite ambiguity puzzle (which I used to give to SGML DTD-writer classes) goes something like this: (It's been a while, so please forgive any ambiguity; the wordiness is deliberate obfuscation.)

We need to write a single content model for an element. That element contains 3 elements: X, Y, and A such that:

  - All X's (if any) must come before all Y's.
  - All X's must clump together. (e.g., X+ or X*)
  - All Y's must similarly clump together.
  - Both the potentially multiple X's and the potentially multiple Y's may be followed by multiple A's. (That is, if there are any A's, they always follow the X's or the Y's.)
  - A's are always optional.
  - Need not have X's or Y's.

The most straightforward answers are ambiguous, but listing the cases and solving those will usually lead to a solution. (Parser developers see the solution intuitively, or so they claim.)

--Debbie

(By X's, I mean the plural of "X", sorry.)

--
Deborah Aleyne Lapeyre          mailto:dalapeyre@mulberrytech.com
Mulberry Technologies, Inc.     http://www.mulberrytech.com
17 West Jefferson Street        Direct Phone: 301/315-9633
Suite 207                       Phone: 301/315-9631
Rockville, MD 20850             Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML

======================================================================
Date: Thu, 11 Jan 2001 10:37:55 +1100
From: Marcus Carr <mrc@allette.com.au>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Subject: Re: DTD and Illegal Construct

Jim Shain wrote:

> Thank all of you for the input.
> Is there a place on the web that discusses this
> pitfall and potential others like this one? Most XML books don't discuss this
> much.

Here's the URL for Sam Wilmott's white paper - http://www.omnimark.com/develop/whitepapers/cma.html.

--
Regards,

Marcus Carr                      email: mrc@allette.com.au
___________________________________________________________________
Allette Systems (Australia)      www: http://www.allette.com.au
___________________________________________________________________
"Everything should be made as simple as possible, but not simpler."
       - Einstein

===================================================================
Date: Thu, 11 Jan 2001 14:00:49 +1100
From: Marcus Carr <mrc@allette.com.au>
To: xml-dev@lists.xml.org
Subject: Re: DTD and Illegal Construct

Deborah Aleyne Lapeyre wrote:

> Wow, it was so good to see Sam Wilmott quoted again,
> he is one of our industry's real thinkers.

No question - he's one of the smartest people around.

> My personal favorite ambiguity puzzle (which I used to
> give to SGML DTD-writer classes) goes something like this:

Without doubt, my favorite is:

  a, (b, a)*, b?

It's the only form of ambiguity that can't be resolved while preserving the original intent of the model - but feel free to try...

--
Regards,

Marcus Carr                      email: mrc@allette.com.au
___________________________________________________________________
Allette Systems (Australia)      www: http://www.allette.com.au
___________________________________________________________________
"Everything should be made as simple as possible, but not simpler."
       - Einstein

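The rule Richard Tobin states (pick the matching symbol from the current input element alone, without lookahead) can be checked mechanically. The following is a minimal sketch, not the algorithm of his validator or of any particular parser: it numbers each element-name occurrence in a content model as a position, computes which positions may come first and which may follow each position, and reports a conflict when two different positions with the same name compete in one context. The names `ContentModel`, `conflicts`, and `is_deterministic` are hypothetical, and the small parser accepts only names, `( ) , | ? * +`, and `#PCDATA` treated as an ordinary name.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\s*([#\w.:-]+|[()|,?*+])")


class ContentModel:
    """Positions plus first/last/follow sets of a DTD content model (sketch)."""

    def __init__(self, text):
        self.tokens, pos, text = [], 0, text.strip()
        while pos < len(text):
            m = TOKEN.match(text, pos)
            if not m:
                raise ValueError(f"cannot tokenize at offset {pos}")
            self.tokens.append(m.group(1))
            pos = m.end()
        self.i = 0
        self.names = []                  # names[p] = element name at position p
        self.follow = defaultdict(set)   # follow[p] = positions that may come after p
        self.nullable, self.first, self.last = self._group()
        if self.i != len(self.tokens):
            raise ValueError(f"unexpected token {self.tokens[self.i]!r}")

    def _peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def _take(self):
        tok = self._peek()
        self.i += 1
        return tok

    def _group(self):
        """Parse particles joined by ',' (sequence) or '|' (choice)."""
        n, f, l = self._particle()
        sep = self._peek()
        while self._peek() in (",", "|"):
            if self._take() != sep:
                raise ValueError("',' and '|' mixed in one group")
            n2, f2, l2 = self._particle()
            if sep == ",":
                for p in l:                      # what ends the left part may be
                    self.follow[p] |= f2         # followed by what starts the right
                f = f | f2 if n else f
                l = l | l2 if n2 else l2
                n = n and n2
            else:
                n, f, l = n or n2, f | f2, l | l2
        return n, f, l

    def _particle(self):
        """Parse a name or parenthesised group with an optional ?, * or +."""
        tok = self._take()
        if tok == "(":
            n, f, l = self._group()
            if self._take() != ")":
                raise ValueError("expected ')'")
        elif tok is None or tok in {")", ",", "|", "?", "*", "+"}:
            raise ValueError(f"unexpected token {tok!r}")
        else:
            p = len(self.names)
            self.names.append(tok)
            n, f, l = False, {p}, {p}
        op = self._peek()
        if op in ("?", "*", "+"):
            self._take()
            if op in ("*", "+"):                 # repetition loops back to the start
                for p in l:
                    self.follow[p] |= f
            n = n or op in ("?", "*")
        return n, f, l


def conflicts(text):
    """Return sorted (context, name) pairs where one name has two candidate positions."""
    cm = ContentModel(text)
    contexts = [("<start>", cm.first)]
    contexts += [(cm.names[p], cm.follow[p]) for p in range(len(cm.names))]
    found = set()
    for context, positions in contexts:
        seen = set()
        for p in positions:
            if cm.names[p] in seen:
                found.add((context, cm.names[p]))
            seen.add(cm.names[p])
    return sorted(found)


def is_deterministic(text):
    return not conflicts(text)


if __name__ == "__main__":
    for model in ["((A, B, C?) | (A?, B, C?, D))",
                  "a, (b, a)*, b?",
                  "(#PCDATA|emph|phrase|var|code|var)*"]:
        print(model, "->", conflicts(model) or "deterministic")
```

Applied to the models in the thread, the sketch reports a conflict on A at the start of Jim Shain's model, a conflict on b after a in Marcus Carr's model as written, and a conflict on var in James Clark's p element once the parameter entities are expanded by hand. It only judges the expression as written; it says nothing about whether an equivalent deterministic model exists.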
Prepared by Robin Cover for The XML Cover Pages archive.