XML and non-deterministic ['ambiguous'] content models


Jim Shain asked the following question about (non-)determinism in XML content models; informative responses from Richard Tobin, James Clark, Deborah Aleyne Lapeyre, and Marcus Carr follow. See SGML/XML Notion of Ambiguity (non-deterministic content models).

Date: Wed, 10 Jan 2001 14:22 -0600
From: Jim Shain <Jim.Shain@alltel.com>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Subject: DTD and Illegal Construct

I wanted to run this past some knowledgeable people.  Is the following a legal
or illegal construct for an element in a DTD?

 <!ELEMENT Request ((A, B, C?) | (A?, B, C?, D))>

Some have stated that it is illegal because "When A is present it isn't possible
to determine whether or not D is required."

Jim Shain
Sr. C/S Architect
ALLTEL Information Services



Date: Thu, 11 Jan 2001 00:42:20 +0000 (GMT)
From: Richard Tobin <richard@cogsci.ed.ac.uk>
To: xml-dev@lists.xml.org
Subject: RE: DTD and Illegal Construct

> ((A, B, D?) | (A?, B, C?, D?, E))

The rule is that a parser, proceeding from left to right, should be
able to decide which symbol to match against on the basis of the
current input symbol, without lookahead.

So no disjunction of the form

  ((A, ...) | (A?, ...))

can ever be legal, because if the input starts with A the parser won't
know which branch of the disjunction to take.

My online validator, http://www.cogsci.ed.ac.uk/~richard/xml-check.html,
will check that your content models are deterministic.

-- Richard
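
[Editorial sketch, not part of the original message.] The rule stated
above can be checked mechanically. The Python below is a minimal
illustration, assuming the usual position-based construction: every
occurrence of an element name gets a number, and a model is flagged when
two different occurrences of the same name could both match the next
input symbol, either at the start or after some occurrence. The function
names (tokenize, parse, analyse, conflicts) are invented for this sketch.
It handles only element names with the operators , | ? * + and is not
claimed to be the algorithm used by any particular validator.

```python
import re
from itertools import count

def tokenize(model):
    # Element names plus the DTD operators ( ) , | ? * +
    return re.findall(r"[A-Za-z_][\w.\-]*|[(),|?*+]", model)

def parse(model):
    """Build a tree; each name occurrence becomes ('sym', name, position)."""
    toks, ids, i = tokenize(model), count(1), 0

    def particle():
        nonlocal i
        tok = toks[i]
        i += 1
        if tok == "(":
            parts, sep = [particle()], "|"
            while toks[i] != ")":
                sep = toks[i]          # ',' or '|'
                i += 1
                parts.append(particle())
            i += 1                     # consume ')'
            node = ("seq" if sep == "," else "alt", parts)
        else:
            node = ("sym", tok, next(ids))
        if i < len(toks) and toks[i] in ("?", "*", "+"):
            node = (toks[i], node)
            i += 1
        return node

    return particle()

def analyse(node, follow):
    """Return (nullable, first, last); record what may follow each position."""
    kind = node[0]
    if kind == "sym":
        p = (node[1], node[2])
        follow.setdefault(p, set())
        return False, {p}, {p}
    if kind in ("?", "*", "+"):
        nullable, first, last = analyse(node[1], follow)
        if kind != "?":                # repetition: last loops back to first
            for p in last:
                follow[p] |= first
        return nullable or kind != "+", first, last
    results = [analyse(child, follow) for child in node[1]]
    if kind == "alt":
        return (any(r[0] for r in results),
                set().union(*(r[1] for r in results)),
                set().union(*(r[2] for r in results)))
    nullable, first, last = True, set(), set()   # sequence, left to right
    for n, f, l in results:
        for p in last:
            follow[p] |= f
        if nullable:
            first |= f
        last = (last | l) if n else set(l)
        nullable = nullable and n
    return nullable, first, last

def conflicts(model):
    """List (where, name) pairs where two occurrences of name compete."""
    follow = {}
    _, first, _ = analyse(parse(model), follow)
    choices = [("start", first)]
    choices += [(f"after {n}#{p}", s) for (n, p), s in sorted(follow.items())]
    found = []
    for where, positions in choices:
        names = [name for name, _ in positions]
        found += [(where, n) for n in sorted(set(names)) if names.count(n) > 1]
    return found

print(conflicts("((A, B, C?) | (A?, B, C?, D))"))   # [('start', 'A')]
print(conflicts("((A, B, C?, D?) | (B, C?, D))"))   # []
```

The first call is the model from the original question: both branches
can begin with A, so the conflict is at the start. The second is one
possible rewrite that accepts the same six sequences (AB, ABC, ABD,
ABCD, BD, BCD) with no competing occurrences.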

========================================================================

Date: Thu, 11 Jan 2001 11:27:07 +0700
From: James Clark <jjc@jclark.com>
To: xml-dev@lists.xml.org
Subject: Re: DTD and Illegal Construct

It seems relatively easy to rewrite individual content models to
work around the restriction on ambiguity.  Where I find it much more of a
nuisance is the way it makes parameterization harder.  For example,
imagine something like

<!ENTITY % local.emph.class "">
<!ENTITY % emph.class "emph|phrase %local.emph.class;">

<!ENTITY % local.tech.class "">
<!ENTITY % tech.class "code|var %local.tech.class;">

<!ELEMENT eg (#PCDATA|%emph.class;)*>
<!ELEMENT p (#PCDATA|%emph.class;|%tech.class;)*>

Now suppose I want to allow the var element everywhere the emph element
is allowed (i.e., inside eg).  In my internal subset I do:

<!ENTITY % local.emph.class "|var">

Then I get hit with an ambiguity in the content model for p.

TREX avoids this annoyance by fully supporting ambiguous content
models.  I believe RELAX does also.

James
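
[Editorial sketch, not part of the original message.] The clash described
above can be traced by expanding the parameter entities by hand. The
Python below does a purely textual substitution, which is a
simplification: a real XML processor also pads the replacement text of a
parameter-entity reference with spaces. The helper name expand and the
dictionary layout are invented for this illustration.

```python
import re
from collections import Counter

def expand(text, entities):
    """Replace %name; references repeatedly until none remain."""
    ref = re.compile(r"%([\w.\-]+);")
    while ref.search(text):
        text = ref.sub(lambda m: entities[m.group(1)], text)
    return text

entities = {
    "local.emph.class": "|var",     # the override from the internal subset
    "local.tech.class": "",
    "emph.class": "emph|phrase %local.emph.class;",
    "tech.class": "code|var %local.tech.class;",
}

p_model = expand("(#PCDATA|%emph.class;|%tech.class;)*", entities)
names = re.findall(r"[#\w.\-]+", p_model)
duplicates = [n for n, c in Counter(names).items() if c > 1]

print(p_model)      # (#PCDATA|emph|phrase |var|code|var )*
print(duplicates)   # ['var']
```

After the override, var reaches the content model for p through both
emph.class and tech.class, which is the clash the message reports.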

=========================================================================

Date: Wed, 10 Jan 2001 19:56:37 -0500
From: Deborah Aleyne Lapeyre <dalapeyre@mulberrytech.com>
To: xml-dev@lists.xml.org
Subject: RE: DTD and Illegal Construct

Wow, it was so good to see Sam Wilmott quoted again,
he is one of our industry's real thinkers.

My personal favorite ambiguity puzzle (which I used to
give to SGML DTD-writer classes) goes something like this:
(It's been a while, so please forgive any ambiguity;
the wordiness is deliberate obfuscation.)

We need to write a single content model for an element.
That element contains 3 elements: X, Y, and A such that:

  - All X's (if any) must come before all Y's.
  - All X's must clump together. (e.g., X+ or X*)
  - All Y's must similarly clump together.
  - Both the potentially multiple X's and the
    potentially multiple Y's may be followed by
    multiple A's. (That is, if there are any A's,
    they always follow the X's or the Y's.)
  - A's are always optional.
  - Need not have X's or Y's.

The most straightforward answers are ambiguous,
but listing the cases and solving those will
usually lead to a solution.  (Parser developers
see the solution intuitively, or so they claim.)

--Debbie

(By X's, I mean the plural of "X", sorry.)
-- 

Deborah Aleyne Lapeyre               mailto:dalapeyre@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9633
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
   Mulberry Technologies: A Consultancy Specializing in SGML and XML

======================================================================

Date: Thu, 11 Jan 2001 10:37:55 +1100
From: Marcus Carr <mrc@allette.com.au>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Subject: Re: DTD and Illegal Construct


Jim Shain wrote:

> Thank all of you for the input.  Is there a place on the web that discusses this
> pitfall and potential others like this one?  Most XML books don't discuss this
> much.

Here's the URL for Sam Wilmott's white paper -
http://www.omnimark.com/develop/whitepapers/cma.html.


--
Regards,

Marcus Carr                      email:  mrc@allette.com.au
___________________________________________________________________
Allette Systems (Australia)      www:    http://www.allette.com.au
___________________________________________________________________
"Everything should be made as simple as possible, but not simpler."
       - Einstein

===================================================================

Date: Thu, 11 Jan 2001 14:00:49 +1100
From: Marcus Carr <mrc@allette.com.au>
To: xml-dev@lists.xml.org
Subject: Re: DTD and Illegal Construct


Deborah Aleyne Lapeyre wrote:

> Wow, it was so good to see Sam Wilmott quoted again,
> he is one of our industry's real thinkers.

No question - he's one of the smartest people around.

> My personal favorite ambiguity puzzle (which I used to
> give to SGML DTD-writer classes) goes something like this:

Without doubt, my favorite is:

   a, (b, a)*, b?

It's the only form of ambiguity that can't be resolved while preserving
the original intent of the model - but feel free to try...
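
[Editorial sketch, not part of the original message.] For anyone taking
up the invitation, the first thing to verify about a candidate rewrite
is that it still accepts exactly the same sequences. The Python below is
a small brute-force harness, assuming each content model is transcribed
by hand into an ordinary regular expression over the one-letter names a
and b. The function name first_difference and the length bound are
choices made for this illustration; a result of None only means no
difference was found up to that bound.

```python
import re
from itertools import product

def first_difference(original, candidate, alphabet="ab", max_len=8):
    """Return the shortest string the two patterns disagree on, or None."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if bool(re.fullmatch(original, s)) != bool(re.fullmatch(candidate, s)):
                return s
    return None

ORIGINAL = "a(ba)*b?"                                # a, (b, a)*, b?

print(first_difference(ORIGINAL, "a(b(ab)*a?)?"))    # None
print(first_difference(ORIGINAL, "(ab?)+"))          # aa
```

The first candidate, a, (b, (a, b)*, a?)?, keeps the language but only
moves the problem: after the b, an a could belong either to (a, b) or to
the trailing a?. The second candidate, (a, b?)+, is deterministic but
also accepts "aa", so it changes the intent of the model.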


--
Regards,

Marcus Carr                      email:  mrc@allette.com.au
___________________________________________________________________
Allette Systems (Australia)      www:    http://www.allette.com.au
___________________________________________________________________
"Everything should be made as simple as possible, but not simpler."
       - Einstein



Prepared by Robin Cover for The XML Cover Pages archive.



Document URL: http://xml.coverpages.org/contentModelAlgebra200101.html