Joe English on CDATA as declared content

Message-id: <9607240020.AA11739@trystero.art.com>
From: Joe English
To: www-html@w3.org
Date: Tue, 23 Jul 1996 17:20:33 PDT
Subject: Cougar DTD: Do not use CDATA declared content for SCRIPT

The 12-July-1996 draft of the "Cougar" HTML DTD [1] declares the SCRIPT element with CDATA declared content.

This will not work. In particular, the use of CDATA declared content is incompatible with JavaScript (which, I presume, will be one of the primary scripting languages used in HTML documents). The main reason for this is that the arguments to JavaScript's 'document.write()' method [2], which inserts text and HTML markup into a document, may contain end-tags.

Elements with CDATA declared content cannot contain any sequence of characters that "looks like" an end-tag -- ETAGO ("</") [...]

1) [...] and require all occurrences of '<', '&', and '>' in the content to be replaced with '&lt;', '&amp;', and '&gt;'. This is more consistent with the rest of HTML.

2) Use, and add browser support for, CDATA marked sections. This is the approach favored by most other SGML applications.

3) Allow scripts to be included by external reference. This approach may increase network latency, but has the advantage of better backward-compatibility with SCRIPT-unaware user agents.

* * *

CDATA declared content is in general a bad idea (it should not be used for STYLE either, and IMO the XMP and LISTING elements should be removed entirely). Of all of SGML's broken features, CDATA declared content is among the worst.

For more details, please refer to the relevant entries on Robin Cover's SGML Web Page [3] under "Other Grammar/Parsing Issues and FEATURES" [4]. Many of the issues brought up there are not particularly relevant to the Web, though there are other problems with CDATA declared content that make it especially dangerous for HTML.
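
To make the end-tag problem raised at the top of this message concrete, here is a minimal Python sketch, offered as an illustration only. It assumes the SGML rule that CDATA declared content ends at the first ETAGO ("</") followed by a name start character, approximated here as an ASCII letter. The function name `cdata_content` and the sample page are hypothetical and do not come from the Cougar draft or from any real parser.

```python
import re

# ETAGO ("</") immediately followed by a name start character.
# Assumption: name start characters are approximated as ASCII letters.
ETAGO_IN_CONTEXT = re.compile(r"</[A-Za-z]")


def cdata_content(document: str, open_tag: str = "<SCRIPT>") -> str:
    """Return what a strict parser would treat as the element's content.

    The content runs from just after the start-tag up to the first
    "</" + letter sequence, whatever element name follows it.
    """
    start = document.index(open_tag) + len(open_tag)
    match = ETAGO_IN_CONTEXT.search(document, start)
    end = match.start() if match else len(document)
    return document[start:end]


# Hypothetical page: the script writes markup that contains an end-tag.
page = '<SCRIPT>document.write("<B>hello</B>");</SCRIPT>'
print(cdata_content(page))
```

Under these assumptions the script content is cut off at the "</B" inside the string literal, well before the real SCRIPT end-tag, so the script handed to the interpreter is incomplete.
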
[I've expounded on this before on html-wg, but because of the current lack of a working archive I can't cite references :-(]

Two things that come to mind are that the presence of *any* element with CDATA or RCDATA declared content in the HTML DTD makes it much more difficult to write a Web search engine -- it becomes necessary to parse against the DTD instead of simple lexical scanning, e.g., with tools like Dan Connolly's lexical analyzer [5] -- and that it greatly increases the amount of SGML knowledge necessary for authors to construct a valid document including such elements.

[1] [2] [3] [4] [5]

--Joe English

  joe@art.com