From owner-xml-l@LISTSERV.HEANET.IE Sat Jul 18 04:51:27 1998
Date: Sat, 18 Jul 1998 10:29:50 +0100
From: Sean Mc grath <digitome@IOL.IE>
Subject: Re: SGML-XML DTD converters
Notes on SGML-to-XML DTD conversion, by Sean McGrath and by Murray Altheim.
[Sarah Delaney]

> I'm currently endeavouring to do a thesis on SGML and XML, and part of it
> involves converting an SGML DTD into an XML DTD. I know SX converts
> document instances, and I've been looking for something that does DTDs. If
> anyone happens to know of such a program, could you let me know?
> Alternatively, are there guidelines in existence for manual conversion?

I do not know of any program that converts SGML DTDs to XML DTDs. As for
guidelines, I believe Eve Maler of Arbortext (or is it Norman Walsh?) has
published some guidelines on their Web site, www.arbortext.com (sorry, no
URL handy).

Off the top of my head, here are some of the main things to watch for:

0) There is no equivalent of the SGML Declaration, so keywords, character
   set etc. are essentially fixed.

1) Tag minimization is not allowed, so the omitted-tag flags must go:

   <!ELEMENT x - O (A,B)>  -->  <!ELEMENT x (A,B)>
   <!ELEMENT x - O EMPTY>  -->  <!ELEMENT x EMPTY>

2) #PCDATA may only occur at the extreme left of an OR model, and XML
   also requires the trailing "*" on such a mixed-content group, i.e.

   <!ELEMENT x (A|B|#PCDATA|C)>  -->  <!ELEMENT x (#PCDATA|A|B|C)*>
   <!ELEMENT x (A,#PCDATA)>      -->  Illegal

3) No CDATA or RCDATA elements.

4) Some SGML attribute types are not allowed in XML, e.g. NUTOKEN. Also
   no data attributes (attributes declared on notations).

5) Some SGML attribute defaults are not allowed in XML, e.g. CONREF.

6) Comments cannot be "inline" like this:

   <!ELEMENT x (A,B) -- this is an SGML comment -->

7) A whole bunch of SGML optional features are not present in XML:

   All forms of tag minimization (omittag, datatag, shortref)
   Link Process Definitions
   Multiple DTDs per document
   ...
   And last but not least, CONCUR!

There are some important differences between the internal and external
subset portions of a DTD in XML:

8) Marked sections can only occur in the external subset.

9) In the internal subset, a parameter entity reference may only replace
   entire declarations; it cannot appear inside a declaration. E.g. this
   is invalid XML:

   <!DOCTYPE x [
   <!ENTITY % modelx "(A|B)*">
   <!ELEMENT x %modelx;>
   ]>
   <x></x>

As I say, this is off the top of my head.
I'm sure I have missed a bunch. Peter Flynn, is this in your FAQ and if
not, should it be?

Sean Mc Grath
http://www.digitome.com/sean.htm
+353 96 47391

"There are three types of people in the world - those who can count
and those who cannot."
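[Editorial sketch, not part of the original post.] Points 1, 2 and 6 above are mechanical enough to script. The Python below is a minimal sketch under stated assumptions: it handles one <!ELEMENT ...> declaration at a time, only flat (non-nested) model groups, and no parameter entity references. The function names convert_element_decl and pcdata_first are hypothetical, made up for this illustration. Note that adding "*" to a reordered mixed-content group, which XML requires, loosens the original SGML model.

```python
import re

# Omitted-tag minimization flags ("- O", "- -", "O O", "O -") after the element name.
_MINIMIZATION = re.compile(r"(<!ELEMENT\s+\S+)\s+[-Oo]\s+[-Oo]\s+", re.IGNORECASE)
# Inline "-- ... --" comment inside a declaration (point 6).
_INLINE_COMMENT = re.compile(r"\s*--.*?--", re.DOTALL)
# A flat (non-nested) model group with an optional occurrence indicator.
_FLAT_GROUP = re.compile(r"\(([^()]*)\)([*+?]?)")


def pcdata_first(decl: str) -> str:
    """Move #PCDATA to the front of a flat OR group and force the '*' suffix."""

    def fix(match: re.Match) -> str:
        body = match.group(1)
        if "#PCDATA" not in body:
            return match.group(0)  # not mixed content, leave untouched
        if "," in body or "&" in body:
            # e.g. (A,#PCDATA): no XML equivalent, needs a manual redesign
            raise ValueError(f"no XML equivalent for mixed model ({body})")
        names = [token.strip() for token in body.split("|")]
        others = [name for name in names if name != "#PCDATA"]
        if not others:
            return "(#PCDATA)"
        return "(" + "|".join(["#PCDATA"] + others) + ")*"

    return _FLAT_GROUP.sub(fix, decl)


def convert_element_decl(decl: str) -> str:
    """Apply points 6, 1 and 2 (in that order) to one element declaration."""
    decl = _INLINE_COMMENT.sub("", decl)
    decl = _MINIMIZATION.sub(r"\1 ", decl)
    return pcdata_first(decl)


if __name__ == "__main__":
    for sgml in (
        "<!ELEMENT x - O (A,B) -- this is an SGML comment -->",
        "<!ELEMENT x - O EMPTY>",
        "<!ELEMENT x (A|B|#PCDATA|C)>",
    ):
        print(sgml, " => ", convert_element_decl(sgml))
```

A declaration such as <!ELEMENT x (A,#PCDATA)> raises ValueError rather than being rewritten, since there is no automatic XML equivalent. A real converter would need a proper DTD parser; regular expressions like these will misfire on nested groups and on "--" inside attribute value literals.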
Date: Mon, 20 Jul 1998 10:50:50 -0700
From: Murray Altheim <altheim@MEHITABEL.ENG.SUN.COM>
Subject: Re: SGML-XML DTD converters

The more difficult one (in my experience so far) is the presence of 'ps+'
(parameter separator) in the SGML standard as a whitespace delimiter in
markup declarations. 'ps+' allows for whitespace, COM-delimited comments,
and *parameter entities*. The latter can open a few worm cans, since they
can be used to include-by-entity-reference almost any part of the
declaration.
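
[Editorial sketch, not part of the original post.] One way to sidestep parameter entity references inside declarations is to expand them textually before conversion, so that each declaration stands on its own. The Python below is a rough sketch of that idea, not a description of any existing tool. It assumes internal parameter entities declared with a quoted literal, references that end in ";" (SGML allows the ";" to be omitted in some contexts, which this does not handle), and no external (SYSTEM/PUBLIC) entities. The function name expand_parameter_entities is hypothetical.

```python
import re

# <!ENTITY % name "replacement text"> with either quote style.
_PE_DECL = re.compile(r'<!ENTITY\s+%\s+(\S+)\s+(["\'])(.*?)\2\s*>', re.DOTALL)
# %name; reference (the terminating ";" is required by this sketch).
_PE_REF = re.compile(r"%([A-Za-z_][\w.\-]*);")


def expand_parameter_entities(dtd: str, max_depth: int = 10) -> str:
    """Inline internal parameter entities and drop their declarations."""
    entities = {}
    for name, _quote, value in _PE_DECL.findall(dtd):
        entities.setdefault(name, value)  # the first declaration is binding

    body = _PE_DECL.sub("", dtd)
    # Repeat so that entities referring to other entities are resolved too;
    # max_depth guards against accidental self-reference.
    for _ in range(max_depth):
        expanded = _PE_REF.sub(
            lambda m: entities.get(m.group(1), m.group(0)), body
        )
        if expanded == body:
            break
        body = expanded
    return body.strip()


if __name__ == "__main__":
    dtd = '<!ENTITY % modelx "(A|B)*">\n<!ELEMENT x %modelx;>'
    print(expand_parameter_entities(dtd))
```

References to undeclared or external entities are left in place, so they can be found and dealt with by hand. The cost of this approach is that the modularity the parameter entities provided is lost in the output DTD.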