SGML: RCDATA

Posting from Arijit Sengupta and response from David Megginson.


Subject: SP/sgmls bug or am I missing something?
Date: Wed, 29 May 1996 00:50:08 -0500
From: "Arijit Sengupta" <asengupt@cs.indiana.edu>
Newsgroups: comp.text.sgml
--------------------------------------------------------------------
Hi!

I think I have to get my knowledge refreshed about SGML, after 3 years
with it :)

Anyway, here is something I am puzzled about - isn't the following
code valid?

<!DOCTYPE foo [
<!ELEMENT foo - - (bar)>
<!ELEMENT bar - - RCDATA>
]>
<foo><bar><hello><world><how></world></hello></bar></foo>

With sp-1.0.11 (as well as earlier versions), I get the following:

Shouldn't the end-tags for world and hello be ignored too, since it is
in the RCDATA region?

Thanks!
Jit.

----------------------
%~/tmp>nsgmls foo.sgm
nsgmls:foo.sgm:5:36:E: end tag for element "WORLD" which is not open
nsgmls:foo.sgm:5:44:E: end tag for element "HELLO" which is not open
(FOO
(BAR
-<hello><world><how>
)BAR
)FOO
%~/tmp>
----------------------

--
asengupt@indiana.edu
http://www.cs.indiana.edu/hyplan/asengupt.html
Computer Science, LH215
Indiana University, Bloomington IN 47405
(812) 855-4318 / (812) 334-2695
=====================================================================

From david@baeda.english.uottawa.ca Wed May 29 08:38:13 1996
Date: Wed, 29 May 1996 06:37:23 -0400
Message-Id: <199605291037.GAA28552@baeda.english.uottawa.ca>
To: sp-prog@jclark.com
Subject: SP/sgmls bug or am I missing something?
Newsgroups: comp.text.sgml
From: David Megginson <dmeggins@aix1.uottawa.ca>
--------------------------

Arijit Sengupta writes:

> Anyway, here is something I am puzzled about - isn't the following
> code valid?
>
> <!DOCTYPE foo [
> <!ELEMENT foo - - (bar)>
> <!ELEMENT bar - - RCDATA>
> ]>
> <foo><bar><hello><world><how></world></hello></bar></foo>
>
>
> With sp-1.0.11 (as well as earlier versions), I get the following:
> Shouldn't the end-tags for world and hello be ignored too, since it is
> in the RCDATA region?
You might think so, but in fact SP is right, and your example shows
why "RCDATA" and "CDATA" should _never_ be used as content models.
Note the following from ISO 8879:1986, clause 7.6:

  The content of an element declared to be character data or
  replaceable character data is terminated only by an etago delimiter
  in context (which need not open a valid end-tag) or a valid net.

What this means is that ETAGO (usually "</") is still recognised as a
delimiter in CDATA and RCDATA content -- as soon as SP hits the first
instance of "</", it stops recognising data characters, even if what
follows is not a legal end tag. In fact, if you opened the element
with a net-enabling start tag (i.e. "<bar/"), then any instance of NET
("/") would end the RCDATA content.

A much easier solution is to protect the whole thing with a marked
section:

  <foo><bar>
  <![RCDATA[<hello><world><how></world></hello>]]>
  </bar></foo>

In this case, only the MSC ("]]") delimiter will end the RCDATA,
though, of course, ERO ("&") will also be recognised. SP handles this
as expected.


All the best,


David

--
David Megginson                      Department of English, University of Ottawa,
dmeggins@aix1.uottawa.ca             Ottawa, Ontario, CANADA K1N 6N5
ak117@freenet.carleton.ca            Phone: (613) 562-5800 ext.1203
WWW: http://www.uottawa.ca/~dmeggins FAX: (613) 562-5990
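The two termination rules discussed in this thread can be sketched as a tiny scanner. This is a simplified model, not how SP works internally, and the function names are hypothetical. It assumes ETAGO counts as "in context" only when "</" is followed by a letter (a name start character in the reference concrete syntax); other contexts the standard may allow, depending on the SGML declaration, are ignored, and entity references are not expanded. The marked-section function assumes that in the corrected markup bar is declared with mixed content such as (#PCDATA), since a marked section start is not recognised inside RCDATA element content. In that case the section ends at MSC ("]]") followed by MDC (">").

```python
import re

# Simplified model (an assumption, not SP's implementation): ETAGO "</" is a
# delimiter "in context" only when a letter (name start character) follows.
# Other contexts the standard may allow are ignored; "&" is not expanded.
ETAGO_IN_CONTEXT = re.compile(r"</[A-Za-z]")

# An RCDATA marked section ends at MSC ("]]") followed by MDC (">").
MARKED_SECTION_END = re.compile(r"\]\]>")


def split_rcdata_element(text: str) -> tuple[str, str]:
    """Split the text after an RCDATA element's start tag (hypothetical helper).

    Returns (data, rest). `rest` begins with whatever end tag terminated
    the data, whether or not that end tag is valid at this point.
    """
    m = ETAGO_IN_CONTEXT.search(text)
    if m is None:
        return text, ""  # no ETAGO in context: everything is data
    return text[: m.start()], text[m.start():]


def split_rcdata_marked_section(text: str) -> tuple[str, str]:
    """Split the text after '<![RCDATA[' at the end of the marked section.

    Returns (data, rest), where `rest` is everything after the closing ']]>'.
    """
    m = MARKED_SECTION_END.search(text)
    if m is None:
        raise ValueError("unterminated marked section")
    return text[: m.start()], text[m.end():]


if __name__ == "__main__":
    # Text following <bar> in Arijit's example.
    data, rest = split_rcdata_element(
        "<hello><world><how></world></hello></bar></foo>")
    print(data)  # <hello><world><how>
    print(rest)  # </world></hello></bar></foo>

    # Text following <![RCDATA[ in the marked-section version.
    data, rest = split_rcdata_marked_section(
        "<hello><world><how></world></hello>]]>\n</bar></foo>")
    print(data)  # <hello><world><how></world></hello>
```

Applied to the text after <bar> in the original example, the first function stops at "</world", which matches the data line "-<hello><world><how>" in the nsgmls output; the leftover </world> and </hello> are then parsed as end tags for elements that are not open, which is exactly what the two error messages report.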