[Non-] deterministic content models
Postings from June, 2001. See the XML-DEV archives for other contributions. Note that RELAX NG does not share assumptions of SGML/XML "DTD" syntax which imposes some unusual constraints (and notions of ambiguity).
Date: Thu, 14 Jun 2001 09:30:27 -0700 From: Joe English <jenglish@flightlab.com> To: xml-dev@lists.xml.org Subject: Re: Non-deterministic content model Arthur Rother wrote: > I came across this one in my first lessons on learning DTD's, where I tried > to write a DTD for a game of chess: > > ((whitemove, blackmove)*, whitemove?) That's a neat example. It's well-known that ((a,b)*,a?) is hopelessly ambiguous (i.e., there's no way to write it as deterministic content model), but I've never seen a real-world example where that construct was useful before now. > If this is the only endless case, one could hardcode this in the parser. It's not -- there's an infinite number of "hopelessly ambiguous" regular languages. > How do [XML Spy and XMetal] deal with it? There are plenty of algorithms for regular expression matching besides the one prescribed for SGML. You can convert to a DFA (lex), backtrack (Perl), use derivatives (TREX), or simulate an NDFA, to name a few. The only real problem with nondeterministic content models in SGML has to do with start-tag inference (and even that can be worked around). Since XML doesn't support this feature, it's not as big of a problem for implementations. --Joe English jenglish@flightlab.com
Date: Fri, 15 Jun 2001 14:08:18 +0100 From: Francis Norton <francis@redrice.com> To: Jeni Tennison <mail@jenitennison.com> Cc: xml-dev@lists.xml.org Subject: Re: Non-deterministic content model Jeni Tennison wrote: > > RELAX NG doesn't require deterministic patterns; the following > schema should do it: > <snip/> Yes, I got this to work with James Clark's jing (http://www.thaiopensource.com/relaxng/): <grammar xmlns="http://relaxng.org/ns/structure/0.9"> <start> <element name="game"> <element name="white"> <ref name="move"/> </element> <zeroOrMore> <element name="black"> <ref name="move"/> </element> <element name="white"> <ref name="move"/> </element> </zeroOrMore> <optional> <element name="black"> <ref name="move"/> </element> </optional> </element> </start> <define name="move"> <text/> </define> </grammar> The grammar expanded a bit because you can't simply say <element name="white/>, you have to put in some kind of content description, even if only <text/> or <empty/>, so I created a common type, move. > XML Schema forces deterministic model groups, so you can't use that. > Interesting. <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"> <xs:element name="game"> <xs:complexType> <xs:sequence> <xs:element ref="white"/> <xs:sequence minOccurs="0" maxOccurs="unbounded"> <xs:element ref="black"/> <xs:element ref="white"/> </xs:sequence> <xs:element ref="black" minOccurs="0"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="black"> <xs:complexType/> </xs:element> <xs:element name="white"> <xs:complexType/> </xs:element> </xs:schema> is an XML Schema equivalent. XSV and MSXML4 detect the ambiguous content model, XML Spy 4 beta 1 ignores it and validates correctly. Francis.
Date: Fri, 15 Jun 2001 14:08:18 +0100 From: Francis Norton <francis@redrice.com> To: Jeni Tennison <mail@jenitennison.com> Subject: Re: Non-deterministic content model Jeni Tennison wrote: > > RELAX NG doesn't require deterministic patterns; the following > schema should do it: > <snip/> Yes, I got this to work with James Clark's jing (http://www.thaiopensource.com/relaxng/): <grammar xmlns="http://relaxng.org/ns/structure/0.9"> <start> <element name="game"> <element name="white"> <ref name="move"/> </element> <zeroOrMore> <element name="black"> <ref name="move"/> </element> <element name="white"> <ref name="move"/> </element> </zeroOrMore> <optional> <element name="black"> <ref name="move"/> </element> </optional> </element> </start> <define name="move"> <text/> </define> </grammar>
Prepared by Robin Cover for The XML Cover Pages archive. See SGML/XML Notion of Ambiguity (non-deterministic content models).