24 August 1994

1.1 NAME

dpp - DTD pre-processor parser


dpp [ -Iivdt ] < input-file > output-stream



DPP is a parser for SGML document type declarations, intended for use as a front end for filters which modify DTDs (e.g. filters to expand all or some parameter entity references, or to rename elements). Since DPP uses the same output format as sgmls (more on this below), many existing tools for writing filters for SGML document instances (specifically, any tools which accept input in sgmls output format) can be used with DPP to make filters for DTDs.

As time allows, I expect to write a number of such filters, using the tf (transducing filter) software developed by Lou Burnard and myself in Spitbol, Rexx, and tcl; these filters should make it easier to perform the following kinds of systematic changes on DTDs:

For now, however, I don't have these filters, just the DTD parser, which translates a valid minimal-SGML DTD into sgmls output form. Anyone who wants to write filters for DTDs may find the parser useful; those looking for software to use, rather than software to write, may want to wait a while yet. Future releases will include filters to do all the things listed above, and possibly more.
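
For anyone writing such a filter, the first step is reading the parser's output one line at a time. The sketch below is not part of the DPP distribution; it assumes only the general sgmls convention that each output line begins with a command character (`(` for a start tag, `)` for an end tag, `-` for data, `A` for an attribute). The names `parse_events`, `count_elements`, and `SAMPLE` are invented for illustration, and the sample stream is a guess at what DPP might emit, not captured output.

```python
from collections import Counter

# Command characters used by sgmls output format. Every other command
# (processing instructions, entity definitions, and so on) is reported
# as "other" and left for the caller to handle.
KINDS = {"(": "start", ")": "end", "-": "data", "A": "attribute"}


def parse_events(lines):
    """Yield (kind, payload) pairs, one per non-empty line of the stream."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        yield KINDS.get(line[0], "other"), line[1:]


def count_elements(lines):
    """Count start-tag events per element type."""
    return Counter(p for kind, p in parse_events(lines) if kind == "start")


# Hypothetical stream for one entity declaration (an assumption, not
# real DPP output). Attribute lines precede the start tag they belong to.
SAMPLE = [
    "ATYPE TOKEN GE",
    "(ENTITY",
    "(ENTNAME",
    "-tei",
    ")ENTNAME",
    "(LITERAL",
    "-Text Encoding Initiative",
    ")LITERAL",
    ")ENTITY",
]

if __name__ == "__main__":
    for kind, payload in parse_events(SAMPLE):
        print(kind, payload)
```

The sketch does not decode the backslash escapes that sgmls uses inside data lines; a real filter would need to handle those before rewriting any content.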

1.4.1 Output Format

DPP writes to its standard output an sgmls-output-format stream describing the DTD on its standard input as an SGML document instantiating a specialized document type for document type definitions. That specialized document type is defined in the file dtd.dtd, which is part of the distributed material. In dtd.dtd, for example, an entity declaration is represented as an SGML element called ENTITY, which is declared as follows:

  <!ENTITY % enttext 'literal | external'                         >
  <!ELEMENT entity        - -  (entname, (%enttext))              >
  <!ATTLIST entity
            type               (pe | ge)           #REQUIRED      >
  <!ELEMENT entname       - O  (#PCDATA)                          >
  <!ELEMENT literal       - -  (#PCDATA | peref | EE)*            >
  <!ATTLIST literal
            type               (CDATA | SDATA | PI | STARTTAG
                               | ENDTAG | MS | MD | NORMAL)
  <!-- etc. -->

An entity declaration which takes the following form in standard DTD notation:

  <!ENTITY tei 'Text Encoding Initiative' >

would take the following form in the DTD document type:

  <entity type=ge>
    <literal>Text Encoding Initiative</>

and the following form in the output from DPP:

  -Text Encoding Initiative

Entity boundaries are also represented, so that the original entity structure of the input DTD may be mirrored on output, if desired, and parameter entity references may be resolved or left unresolved by the filters.
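
As an illustration of what a filter over this format could do, the following sketch turns ENTITY elements in the stream back into entity declarations. It is a minimal example under stated assumptions, not DPP's own code: the element names ENTITY, ENTNAME, and LITERAL and the TYPE attribute come from the dtd.dtd fragment above, but their upper-case spelling and the `ATYPE TOKEN GE` attribute line follow general sgmls conventions and have not been checked against real DPP output. The function name `entity_declarations` is invented, and only plain literals are handled (no peref or EE children, no external entities, no data escapes).

```python
def entity_declarations(lines):
    """Rebuild <!ENTITY ...> declarations from an sgmls-format stream."""
    pending = {}    # attributes seen since the previous start tag
    current = None  # the ENTITY element being collected, if any
    field = None    # "name" inside ENTNAME, "text" inside LITERAL
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        cmd, rest = line[0], line[1:]
        if cmd == "A":
            # Attribute lines look like "Aname value..." and apply to
            # the next start tag.
            name, _, value = rest.partition(" ")
            pending[name] = value
        elif cmd == "(":
            if rest == "ENTITY":
                tokens = pending.get("TYPE", "").split()
                current = {"pe": "PE" in tokens, "name": "", "text": ""}
            elif current is not None and rest == "ENTNAME":
                field = "name"
            elif current is not None and rest == "LITERAL":
                field = "text"
            pending = {}
        elif cmd == "-" and current is not None and field:
            current[field] += rest
        elif cmd == ")":
            if rest in ("ENTNAME", "LITERAL"):
                field = None
            elif rest == "ENTITY" and current is not None:
                percent = "% " if current["pe"] else ""
                out.append("<!ENTITY %s%s '%s'>"
                           % (percent, current["name"], current["text"]))
                current = None
    return out


if __name__ == "__main__":
    # Hypothetical stream (an assumption, not captured DPP output).
    stream = ["ATYPE TOKEN GE", "(ENTITY", "(ENTNAME", "-tei", ")ENTNAME",
              "(LITERAL", "-Text Encoding Initiative", ")LITERAL", ")ENTITY"]
    for decl in entity_declarations(stream):
        print(decl)
```

A literal that itself contains an apostrophe would need double-quote delimiters instead; the sketch leaves that case out to stay short.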

1.5 BUGS (and other shortcomings)

Newlines among the keywords of a marked section will cause the line numbering in error messages to be off.

Parameter entities are wrongly recognized and expanded within literals used as specifications of the default value of an attribute, or as system or public identifiers.

No support for short references, DATATAG, rank groups, or other optional features.

An explicit document type declaration is needed; DPP will not accept input consisting only of a DTD subset. This will probably change in a later release.
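
Until that changes, one possible workaround is to wrap a bare DTD subset in an explicit document type declaration before handing it to DPP. The helper below is only a sketch of that idea: the function name `wrap_dtd_subset` and the default document type name `doc` are invented for illustration, and the result has not been tested against DPP itself.

```python
def wrap_dtd_subset(subset, doctype_name="doc"):
    """Return the subset enclosed in <!DOCTYPE name [ ... ]>.

    doctype_name is a placeholder; normally it would be the name of
    the element intended as the document element.
    """
    body = subset.strip("\n")
    return "<!DOCTYPE %s [\n%s\n]>\n" % (doctype_name, body)


if __name__ == "__main__":
    import sys
    # Reads a DTD subset on standard input and writes the wrapped
    # form on standard output, so it can sit in front of dpp in a pipe.
    sys.stdout.write(wrap_dtd_subset(sys.stdin.read()))
```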

In the current release, only parameter entity references between markup declarations or within parameter literals are reflected in the DPP output; parameter entity references within markup declarations are recognized and expanded silently. This will change in the next release.

1.6 Contents of Distribution

The current version of DPP dates from April 1995 (or so); it can be found at

and includes the following files.

Documentation, etc.:

Yacc/bison source, output, and related header files:

Flex source, output, and header files for lexical scanner:

Miscellaneous utility routines (these probably belong in a library):