[This local archive copy is mirrored as an excerpt from the canonical site at http://www.berkshire.net/~norm/docbook/index.html (980310). Links may not have complete integrity, so use the canonical document at that URL if possible.]

The DocBook DTD as XML

[Excerpted from: 'Norm's Home Page: Tidbits, Oddments, and Loose Ends, DocBook Projects']


This is an excerpt from DocBook in a Nutshell:

Converting the DocBook DTD to XML is much more challenging than dealing with the instances. It is probably not possible to construct an XML DTD which is identical to DocBook. The list below identifies most of the issues which must be addressed and describes how the DocBk30 XML DTD deals with them:

Comments are not allowed inside markup declarations

Most comments have been moved into comment declarations preceding the markup declaration that used to contain them. A few small inline comments that would have been out of context if moved before the declaration were simply deleted.
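As a minimal sketch of this change, assuming a hypothetical element named sample (not a real DocBook declaration), the SGML form and its XML rewrite might look like this:

```xml
<!-- SGML form: a comment inside the markup declaration (not legal in XML) -->
<!ELEMENT sample - - (#PCDATA) -- hypothetical element, for illustration -->

<!-- XML form: the comment is moved in front of the declaration -->
<!-- hypothetical element, for illustration -->
<!ELEMENT sample (#PCDATA)>
```

The tag omission indicators (- -) also disappear in the XML form, since XML has no tag minimization.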

Name groups are not allowed in element or attribute list declarations

In the small number of places where DocBook uses name groups, the groups have been expanded into separate declarations.

One downside: DocBook uses %admon.class; in a name group to define the content model and attribute lists for all elements in the admonitions class. The DocBk30 XML DTD cannot express this convenience. If additional admonitions are added, the element and attribute list declarations will have to be copied for each of them.
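The expansion can be sketched roughly as follows. This is a simplified illustration, not the actual DocBook text: the entity value is shortened to two names, and the content model and the Role attribute are assumptions chosen to keep the example small.

```xml
<!-- SGML form: one declaration covers every name in the group -->
<!ENTITY % admon.class "Caution | Note">
<!ELEMENT (%admon.class;) - - (Title?, Para+)>
<!ATTLIST (%admon.class;) Role CDATA #IMPLIED>

<!-- XML form: the group is expanded, one declaration per element -->
<!ELEMENT Caution (Title?, Para+)>
<!ATTLIST Caution Role CDATA #IMPLIED>
<!ELEMENT Note (Title?, Para+)>
<!ATTLIST Note Role CDATA #IMPLIED>
```

Adding a new admonition to the SGML form means editing only the entity. In the XML form it means copying both declarations.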

No CDATA or RCDATA declared content

Graphic and InlineGraphic have been made EMPTY. The content model for SynopFragmentRef, the only RCDATA element in DocBook, has been changed to (arg | group)+.
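A rough sketch of the change for two of these elements is shown below. Only the element names and the new content models come from the text above. The tag omission indicators in the SGML form are assumptions, and attribute lists are omitted.

```xml
<!-- SGML form: declared content (not available in XML) -->
<!ELEMENT Graphic          - - CDATA>
<!ELEMENT SynopFragmentRef - - RCDATA>

<!-- XML form, as described above -->
<!ELEMENT Graphic          EMPTY>
<!ELEMENT SynopFragmentRef (arg | group)+>
```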

No exclusions or inclusions on element declarations

They had to be removed.

In DocBook, exclusions are used:

Removing these exclusions from DocBk30 XML DTD means that it is now valid, in the XML sense, to do some things that don't make a lot of sense (like put a footnote in a footnote). Caveat user.

Inclusions in DocBook are used to add the ubiquitous elements (IndexTerm and BeginPage) unconditionally to a large number of contexts. To make these elements available in the DocBk30 XML DTD, they have been added to most of the parameter entities that include #PCDATA. If new locations are discovered where these elements are desired, the DocBk30 XML DTD will be updated.
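The effect of dropping both kinds of exception can be sketched with simplified declarations. The content models and the entity name sample.char.mix below are hypothetical, chosen only to show the pattern, and are not taken from DocBook.

```xml
<!-- SGML form: an exclusion and an inclusion on element declarations -->
<!ELEMENT Footnote - - (Para+)        -(Footnote)>
<!ELEMENT Chapter  - - (Title, Para+) +(IndexTerm | BeginPage)>

<!-- XML form: the exceptions are gone, and the ubiquitous elements are
     listed explicitly in a parameter entity that includes #PCDATA -->
<!ENTITY % sample.char.mix
    "#PCDATA | Emphasis | Footnote | IndexTerm | BeginPage">
<!ELEMENT Footnote (Para+)>
<!ELEMENT Chapter  (Title, Para+)>
<!ELEMENT Para     (%sample.char.mix;)*>
```

In the XML form of this sketch, a Footnote containing a Para that contains another Footnote is valid, because no exclusion remains to forbid it.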

Elements with mixed content must have #PCDATA first

The content models of many elements have been changed into a repeatable OR group (alternatives separated by |) beginning with #PCDATA.
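As an illustration, assuming a hypothetical element named sample whose content mixes text with Emphasis and Link, the rewrite might look like this:

```xml
<!-- SGML form: #PCDATA may appear anywhere in the group -->
<!ELEMENT sample - - (Emphasis | #PCDATA | Link)+>

<!-- XML form: #PCDATA comes first and the group repeats with * -->
<!ELEMENT sample (#PCDATA | Emphasis | Link)*>
```

XML accepts only the * occurrence indicator on a mixed-content group with element names, so a + in the SGML form has to become * as well.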

Many declared attribute types (NAME, NUMBER, NUTOKEN, etc.) are not allowed

They have all been replaced by NMTOKEN or CDATA.
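One possible mapping is sketched below for a hypothetical element named sample. The attribute names are invented, and which SGML type maps to NMTOKEN and which to CDATA in the DocBk30 XML DTD is not stated above, so the choices here are assumptions.

```xml
<!-- SGML form: declared values that XML does not support -->
<!ATTLIST sample
    label   NAME     #IMPLIED
    columns NUMBER   #IMPLIED
    width   NUTOKEN  #IMPLIED>

<!-- XML form: one possible replacement -->
<!ATTLIST sample
    label   NMTOKEN  #IMPLIED
    columns CDATA    #IMPLIED
    width   CDATA    #IMPLIED>
```

Either way the constraint is weakened: CDATA accepts any string, so a check that columns is numeric is left to the application.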

No #CONREF attributes allowed

The #CONREF attributes on IndexTerm, GlossSee, and GlossSeeAlso were changed to #IMPLIED. The content model of IndexTerm was modified so that it can be empty.
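For GlossSee, the change might be sketched as follows. The attribute name OtherTerm and its IDREF type are assumptions made for the illustration.

```xml
<!-- SGML form: if the attribute is specified, the element has no content -->
<!ATTLIST GlossSee OtherTerm IDREF #CONREF>

<!-- XML form: the attribute is simply optional -->
<!ATTLIST GlossSee OtherTerm IDREF #IMPLIED>
```

In SGML, specifying a #CONREF attribute makes the element empty in that instance. With #IMPLIED there is no such link between the attribute and the content, which is why a content model must itself permit empty content wherever the SGML instances relied on that behavior.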

Attribute default values must be quoted

Quotes were added where necessary.
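A minimal sketch, assuming a hypothetical element sample with an enumerated attribute named choice:

```xml
<!-- SGML form: an unquoted default value -->
<!ATTLIST sample choice (opt | req | plain) opt>

<!-- XML form: the default is a quoted literal -->
<!ATTLIST sample choice (opt | req | plain) "opt">
```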

Marked sections can't have spaces in the declaration

The spaces were removed.