SGML-XML DTD Converters

From      owner-xml-l@LISTSERV.HEANET.IE Sat Jul 18 04:51:27 1998
Date:     Sat, 18 Jul 1998 10:29:50 +0100
From:     Sean Mc Grath <digitome@IOL.IE>
Subject:  Re: SGML-XML DTD converters

Notes on SGML-to-XML DTD conversion, by Sean McGrath and by Murray Altheim.

[Sarah Delaney]
>I'm currently endeavouring to do a thesis on SGML and XML, and part of it
>involves converting an SGML DTD into an XML DTD. I know SX converts
>document instances, and I've been looking for something that does DTDs. If
>anyone happens to know of such a program, could you let me know?
>Alternatively, are there guidelines in existence for manual conversion?

I do not know of any program that converts SGML DTDs to XML DTDs. As for
guidelines, I believe Eve Maler of Arbortext (or is it Norman Walsh?)
has published some guidelines on their Web site (sorry, no URL handy).

Off the top of my head, here are some of the main things to watch for:-

0) No equivalent of the SGML Declaration. So, keywords, character set
etc. are essentially fixed.

1) Tag minimization is not allowed:-
        <!ELEMENT x - O (A,B)> --> <!ELEMENT X (A,B)>
        <!ELEMENT x - O EMPTY> --> <!ELEMENT X EMPTY>

2) #PCDATA must only occur at the extreme left in an OR model, and in
XML the group must then be followed by "*". I.e.
        <!ELEMENT x (A|B|#PCDATA|C)> ->  <!ELEMENT x (#PCDATA|A|B|C)*>

        <!ELEMENT x (A,#PCDATA)> -> Illegal

3) No CDATA, RCDATA elements

4) Some SGML attribute types are not allowed in XML, e.g. NUTOKEN. Also
no data attributes (attributes declared for a notation)

5) Some SGML attribute defaults are not allowed in XML, e.g. CONREF

6) Comments cannot be "inline" like this:-
        <!ELEMENT x (A,B) -- this is an SGML comment -->

7) A whole bunch of SGML optional features are not present in XML
        All forms of Tag minimization (omittag, datatag, shortref)
        Link Process Definitions
        Multiple DTDs per document
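
Several of the points above are mechanical enough to sketch in code. The
following Python is a rough illustration only, not a real converter: it
handles one declaration at a time, assumes no parameter entity references
and no nested groups around #PCDATA, and keeps element names exactly as
written. The function names are made up for this example. The attribute
keywords other than NUTOKEN and CONREF are additions based on the XML 1.0
attribute grammar, as is the trailing "*" that XML requires on a mixed
content group that lists element names.

```python
import re

# One SGML element declaration: name, optional omitted-tag indicators
# ("- O", "O O", ...), content model, optional trailing "-- comment --".
ELEMENT_RE = re.compile(
    r"<!ELEMENT\s+(?P<name>\S+)\s+"
    r"(?:(?P<min>[-oO]\s+[-oO])\s+)?"
    r"(?P<model>.*?)\s*"
    r"(?:--(?P<comment>.*?)--\s*)?>$",
    re.DOTALL,
)

# Splits an ATTLIST body into words, quoted literals and (...) groups.
TOKEN_RE = re.compile(r'"[^"]*"|\'[^\']*\'|\([^)]*\)|[^\s()"\']+')

# SGML keywords with no XML counterpart (assumption: list may be incomplete).
UNSUPPORTED_TYPES = {"NAME", "NAMES", "NUMBER", "NUMBERS", "NUTOKEN", "NUTOKENS"}
UNSUPPORTED_DEFAULTS = {"#CURRENT", "#CONREF"}


def pcdata_first(model):
    """Move #PCDATA to the front of a flat OR group (point 2)."""
    if "#PCDATA" not in model:
        return model
    m = re.fullmatch(r"\(([^()]*)\)[*+?]?", model)
    tokens = [t.strip() for t in m.group(1).split("|")] if m else []
    if "#PCDATA" not in tokens:  # e.g. (A,#PCDATA): needs a human decision
        raise ValueError(f"no XML equivalent for content model {model}")
    others = [t for t in tokens if t != "#PCDATA"]
    # XML requires "*" after a mixed group that names elements.
    return "(" + "|".join(["#PCDATA"] + others) + ")" + ("*" if others else "")


def convert_element_decl(decl):
    """Rewrite one SGML element declaration in XML syntax (points 1, 2, 3, 6)."""
    m = ELEMENT_RE.match(decl.strip())
    if not m:
        raise ValueError(f"not an element declaration: {decl}")
    if m["model"] in ("CDATA", "RCDATA"):
        raise ValueError(f"{m['model']} declared content is not available in XML")
    out = f"<!ELEMENT {m['name']} {pcdata_first(m['model'])}>"
    if m["comment"] is not None:
        # An inline comment becomes a separate XML comment before the declaration.
        out = f"<!--{m['comment']}-->\n" + out
    return out


def unsupported_attributes(decl):
    """List (attribute, keyword) pairs XML rejects in one well-formed ATTLIST."""
    body = decl.strip()[len("<!ATTLIST"):].rstrip(">")
    tokens = TOKEN_RE.findall(body)
    problems = []
    i = 1  # tokens[0] is the element name
    while i < len(tokens):
        name, declared = tokens[i], tokens[i + 1].upper()
        i += 3 if declared == "NOTATION" else 2  # NOTATION is followed by a group
        default = tokens[i].upper()
        i += 2 if default == "#FIXED" else 1     # #FIXED is followed by a value
        if declared in UNSUPPORTED_TYPES:
            problems.append((name, declared))
        if default in UNSUPPORTED_DEFAULTS:
            problems.append((name, default))
    return problems
```

For instance, convert_element_decl("<!ELEMENT x - O (A,B)>") drops the
"- O" indicators, while a model such as (A,#PCDATA) raises ValueError so
that it can be reworked by hand.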

And last but not least, there are some important differences between
the internal and external subset portions of a DTD in XML:-

8) Marked sections can only occur in the external subset

9) Parameter entities may only be used to replace entire declarations
in the internal subset portion of a DTD. E.g. this is invalid XML
        <!DOCTYPE x [
        <!ENTITY % modelx "(A|B)*">
        <!ELEMENT x %modelx;>
        ]>
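
Point 9 can be checked with an XML parser. The sketch below assumes the
expat parser that ships with CPython, which should reject a parameter
entity reference placed inside a declaration in the internal subset. The
helper name is_well_formed, the entity name xdecl and both sample
documents are made up for this illustration.

```python
from xml.parsers import expat


def is_well_formed(document):
    """Return True if expat parses the document without raising an error."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except expat.ExpatError:
        return False
    return True


# Parameter entity used for part of a declaration: not allowed in the
# internal subset.
INVALID = """<!DOCTYPE x [
<!ENTITY % modelx "(A|B)*">
<!ELEMENT x %modelx;>
]>
<x/>"""

# Parameter entity standing in for a complete declaration: allowed.
VALID = """<!DOCTYPE x [
<!ENTITY % xdecl "<!ELEMENT x (A|B)*>">
%xdecl;
]>
<x/>"""

print(is_well_formed(INVALID))
print(is_well_formed(VALID))
```

expat is a non-validating parser, so this only exercises the
well-formedness rule and says nothing about whether the instance matches
the content model.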

As I say, this is off the top of my head. I'm sure I have missed
a bunch. Peter Flynn, is this in your FAQ and if not, should it be?
Sean Mc Grath
+353 96 47391

"There are three types of people in the world - those who can
count and those who cannot."

Date: Mon, 20 Jul 1998 10:50:50 -0700
From: Murray Altheim <altheim@MEHITABEL.ENG.SUN.COM>
Subject: Re: SGML-XML DTD converters

The more difficult one (in my experience so far) is the presence of 'ps+'
(parameter separator) in the SGML standard as a whitespace delimiter
in markup declarations. 'ps+' allows for whitespace, COM-delimited
comments, and *parameter entities*. The latter can open a few worm cans,
since they can be used to include-by-entity-reference almost any part of
the declaration.
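
One possible way to sidestep part of this problem, offered here only as an
assumption and not as something proposed in the message above, is to
expand internal parameter entity references textually before attempting
any conversion. The Python sketch below does that for entities declared
with a quoted literal in the same text. It ignores external (SYSTEM or
PUBLIC) entities, marked sections and comments, and it throws away the
parameterization the DTD author built in. The function name and the
max_depth limit are made up for this example.

```python
import re

# <!ENTITY % name "literal"> or <!ENTITY % name 'literal'>
ENTITY_DECL_RE = re.compile(
    r"<!ENTITY\s+%\s+(\S+)\s+(?:\"([^\"]*)\"|'([^']*)')\s*>"
)
# %name; (the closing ";" may be omitted in SGML before whitespace)
PE_REF_RE = re.compile(r"%([A-Za-z][\w.-]*);?")


def expand_parameter_entities(dtd_text, max_depth=10):
    """Replace references to internal parameter entities with their text."""
    entities = {}
    for m in ENTITY_DECL_RE.finditer(dtd_text):
        value = m.group(2) if m.group(2) is not None else m.group(3)
        entities.setdefault(m.group(1), value)  # the first declaration wins

    def substitute(match):
        # Leave references to unknown (e.g. external) entities untouched.
        return entities.get(match.group(1), match.group(0))

    # Repeat so that entities referring to other entities are resolved;
    # max_depth guards against runaway recursive definitions.
    for _ in range(max_depth):
        expanded = PE_REF_RE.sub(substitute, dtd_text)
        if expanded == dtd_text:
            break
        dtd_text = expanded
    return dtd_text


dtd = '<!ENTITY % modelx "(A|B)*">\n<!ELEMENT x - O %modelx; -- note -->'
print(expand_parameter_entities(dtd))
```

After expansion each declaration stands on its own, at the cost of a DTD
that is harder to maintain than the original.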