XML and non-deterministic ['ambiguous'] content models
Jim Shain asked the following question about (non-)determinism in XML content models; informative responses from Richard Tobin, James Clark, Deborah Aleyne Lapeyre, and Marcus Carr follow. See SGML/XML Notion of Ambiguity (non-deterministic content models).
Date: Wed, 10 Jan 2001 14:22 -0600
From: Jim Shain <Jim.Shain@alltel.com>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Subject: DTD and Illegal Construct
I wanted to run this past some knowledgeable people. Is the following a
legal or illegal construct for an element in a DTD?
<!ELEMENT Request ((A, B, C?) | (A?, B, C?, D))>
Some have stated that it is illegal because "When A is present it isn't
possible to determine whether or not D is required."
Jim Shain
Sr. C/S Architect
ALLTEL Information Services
========================================================================
Date: Thu, 11 Jan 2001 00:42:20 +0000 (GMT)
From: Richard Tobin <richard@cogsci.ed.ac.uk>
To: xml-dev@lists.xml.org
Subject: RE: DTD and Illegal Construct
> ((A, B, D?) | (A?, B, C?, D?, E))
The rule is that a parser, proceeding from left to right, should be
able to decide which symbol to match against on the basis of the
current input symbol, without lookahead.
So no disjunction of the form
((A, ...) | (A?, ...))
can ever be legal, because if the input starts with A the parser won't
know which branch of the disjunction to take.
My online validator, http://www.cogsci.ed.ac.uk/~richard/xml-check.html,
will check that your content models are deterministic.
-- Richard
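
The rule Richard describes can be tried out mechanically. What follows is a minimal sketch, not the algorithm used by any particular validator: it assumes element content only (no #PCDATA), numbers every name occurrence in the model, and reports a clash whenever two different occurrences of the same name could match the next input symbol. The function names analyse, conflicts and is_deterministic are made up for this illustration.

```python
import re

OPERATORS = {"(", ")", ",", "|", "?", "*", "+"}

def analyse(model):
    """Parse a content model; return (symbols, first, follow) over name positions."""
    tokens = re.findall(r"[A-Za-z_][\w.\-]*|[(),|?*+]", model)
    if "".join(tokens) != re.sub(r"\s+", "", model):
        raise ValueError("unsupported character in content model")
    pos = 0
    symbols = []  # symbols[i] is the element name at position i
    follow = []   # follow[i] is the set of positions that may come right after i

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of content model")
        pos += 1
        return tokens[pos - 1]

    def particle():
        # particle := (Name | '(' group ')') ('?' | '*' | '+')?
        if peek() == "(":
            take()
            nullable, first, last = group()
            if take() != ")":
                raise ValueError("expected ')'")
        else:
            name = take()
            if name in OPERATORS:
                raise ValueError("expected an element name, got " + name)
            symbols.append(name)
            follow.append(set())
            here = len(symbols) - 1
            nullable, first, last = False, {here}, {here}
        op = peek()
        if op in ("?", "*", "+"):
            take()
            if op in ("*", "+"):   # repetition: the end can loop back to the start
                for p in last:
                    follow[p] |= first
            if op in ("?", "*"):
                nullable = True
        return nullable, first, last

    def group():
        # group := particle (',' particle)*  or  particle ('|' particle)*
        nullable, first, last = particle()
        sep = peek()
        while peek() in (",", "|"):
            if peek() != sep:
                raise ValueError("cannot mix ',' and '|' in one group")
            take()
            n2, f2, l2 = particle()
            if sep == ",":
                for p in last:
                    follow[p] |= f2
                first = first | f2 if nullable else first
                last = last | l2 if n2 else l2
                nullable = nullable and n2
            else:
                first, last = first | f2, last | l2
                nullable = nullable or n2
        return nullable, first, last

    _, first, _ = group()
    if pos != len(tokens):
        raise ValueError("unexpected token " + tokens[pos])
    return symbols, first, follow

def conflicts(model):
    """List (where, name) pairs where one input name matches two positions."""
    symbols, first, follow = analyse(model)
    found = []

    def check(where, positions):
        seen = set()
        for p in sorted(positions):
            entry = (where, symbols[p])
            if symbols[p] in seen and entry not in found:
                found.append(entry)
            seen.add(symbols[p])

    check("start", first)
    for p, nxt in enumerate(follow):
        check("after " + symbols[p], nxt)
    return found

def is_deterministic(model):
    return not conflicts(model)

print(conflicts("((A, B, C?) | (A?, B, C?, D))"))
print(is_deterministic("((A, B, C?, D?) | (B, C?, D))"))
```

For the model in the original question the sketch reports a clash on A at the start of the model, which is the situation Richard describes. The second call checks one possible rewrite, assuming the intent is exactly the union of the two original branches: once A has been seen, D becomes optional, and without A it is required.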
========================================================================
Date: Thu, 11 Jan 2001 11:27:07 +0700
From: James Clark <jjc@jclark.com>
To: xml-dev@lists.xml.org
Subject: Re: DTD and Illegal Construct
It seems relatively easy to rewrite individual content models to
work around the restriction on ambiguity. Where I find it much more of a
nuisance is the way it makes parameterization harder. For example,
imagine something like
<!ENTITY % local.emph.class "">
<!ENTITY % emph.class "emph|phrase %local.emph.class;">
<!ENTITY % local.tech.class "">
<!ENTITY % tech.class "code|var %local.tech.class;">
<!ELEMENT eg (#PCDATA|%emph.class;)*>
<!ELEMENT p (#PCDATA|%emph.class;|%tech.class;)*>
Now suppose I want to allow the var element everywhere the emph element
is allowed (i.e., inside eg). In my internal subset I do:
<!ENTITY % local.emph.class "|var">
Then I get hit with an ambiguity in the content model for p.
TREX avoids this annoyance by fully supporting ambiguous content
models. I believe RELAX does also.
James
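
To see where the clash in James's example comes from, it helps to expand the parameter entities by hand. The sketch below does this with plain textual substitution, which is a simplification: a real processor also pads each included parameter entity with a space on either side, though that does not change the names involved. The helper names expand and duplicate_names are made up for this illustration.

```python
import re
from collections import Counter

# Parameter entities as declared in the message. The internal-subset
# declaration of local.emph.class is read first, so it is the binding one.
entities = {
    "local.emph.class": "|var",
    "emph.class": "emph|phrase %local.emph.class;",
    "local.tech.class": "",
    "tech.class": "code|var %local.tech.class;",
}

def expand(text):
    """Replace %name; references repeatedly until none remain."""
    pattern = re.compile(r"%([\w.\-]+);")
    while pattern.search(text):
        text = pattern.sub(lambda m: entities[m.group(1)], text)
    return text

def duplicate_names(model):
    """Return the names that occur more than once in the expanded model."""
    names = re.findall(r"#?[A-Za-z_][\w.\-]*", expand(model))
    return sorted(name for name, count in Counter(names).items() if count > 1)

eg_model = "(#PCDATA|%emph.class;)*"
p_model = "(#PCDATA|%emph.class;|%tech.class;)*"

print(expand(p_model))
print(duplicate_names(eg_model), duplicate_names(p_model))
```

After expansion, var reaches the model for p twice, once through emph.class and once through tech.class, while the model for eg contains it only once. The override therefore does what was wanted for eg but breaks p.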
=========================================================================
Date: Wed, 10 Jan 2001 19:56:37 -0500
From: Deborah Aleyne Lapeyre <dalapeyre@mulberrytech.com>
To: xml-dev@lists.xml.org
Subject: RE: DTD and Illegal Construct
Wow, it was so good to see Sam Wilmott quoted again,
he is one of our industry's real thinkers.
My personal favorite ambiguity puzzle (which I used to
give to SGML DTD-writer classes) goes something like this:
(It's been a while, so please forgive any ambiguity;
the wordiness is deliberate obfuscation.)
We need to write a single content model for an element.
That element contains 3 elements: X, Y, and A such that:
- All X's (if any) must come before all Y's.
- All X's must clump together. (e.g., X+ or X*)
- All Y's must similarly clump together.
- Both the potentially multiple X's and the
potentially multiple Y's may be followed by
multiple A's. (That is, if there are any A's,
they always follow the X's or the Y's.)
- A's are always optional.
- Need not have X's or Y's.
The most straightforward answers are ambiguous,
but listing the cases and solving those will
usually lead to a solution. (Parser developers
see the solution intuitively, or so they claim.)
--Debbie
(By X's, I mean the plural of "X", sorry.)
--
Deborah Aleyne Lapeyre mailto:dalapeyre@mulberrytech.com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9633
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
Date: Thu, 11 Jan 2001 10:37:55 +1100
From: Marcus Carr <mrc@allette.com.au>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Subject: Re: DTD and Illegal Construct
Jim Shain wrote:
> Thank all of you for the input. Is there a place on the web that discusses this
> pitfall and potential others like this one? Most XML books don't discuss this
> much.
Here's the URL for Sam Wilmott's white paper -
http://www.omnimark.com/develop/whitepapers/cma.html.
--
Regards,
Marcus Carr email: mrc@allette.com.au
___________________________________________________________________
Allette Systems (Australia) www: http://www.allette.com.au
___________________________________________________________________
"Everything should be made as simple as possible, but not simpler."
- Einstein
===================================================================
Date: Thu, 11 Jan 2001 14:00:49 +1100
From: Marcus Carr <mrc@allette.com.au>
To: xml-dev@lists.xml.org
Subject: Re: DTD and Illegal Construct
Deborah Aleyne Lapeyre wrote:
> Wow, it was so good to see Sam Wilmott quoted again,
> he is one of our industry's real thinkers.
No question - he's one of the smartest people around.
> My personal favorite ambiguity puzzle (which I used to
> give to SGML DTD-writer classes) goes something like this:
Without doubt, my favorite is:
a, (b, a)*, b?
It's the only form of ambiguity that can't be resolved while preserving
the original intent of the model - but feel free to try...
--
Regards,
Marcus Carr email: mrc@allette.com.au
Prepared by Robin Cover for The XML Cover Pages archive.

