[This local archive copy mirrored from the canonical site: http://www.via.ecp.fr/sgml-tools/archives/9802/msg00080.html; links may not have complete integrity, so use the canonical document at this URL if possible.]


The mandate of the SGML Tools project is to make SGML useful to
mainstream users who do not have the time to learn about and apply the
various SGML tools (parsers, formatters, DTDs) but who could benefit
from those tools. Of special interest is the free software community,
including the Linux Documentation Project and the Free Software
Foundation. As much as possible, the SGML Tools project aims to use
industry standards to guarantee the longevity and portability of
documents created using the package.


The project intends to provide

 * a simple distribution, using standard distribution technologies
 * a set of simple DTDs applicable to the most common document types
   created by our user population
 * formatting tools to make those documents presentable in print and
 * documentation on how to use SGML, the DTDs and the formatting tools

In other words, everything needed to make SGML useful "to the masses."



We intend to supply .tgz, .zip and .rpm packages. The code inside the
distributions should run on all major Unixes and both Win32 platforms.

  Supported Formats:

We will support the full range of SGML documents supported by James
Clark's SP parser. We encourage document authors to restrict themselves
to the XML[1] subset of SGML as that subset will have the greatest
portability to new platforms (e.g. Internet Explorer 4.0, handheld
devices with JVMs and so forth). At the same time we recognize that
where this portability is not required, XML can be overly restrictive
and verbose.

[1] http://www.w3.org/XML

  Supported DTDs:

SGML Tools will directly support a short list of DTDs but will provide a
generalized architecture that others can use to extend these DTDs. These
DTDs would all be based on two "base" document types: HTML[1] and
DocBook[2]. The former is simple and well known and the latter more
appropriate for large, technical documentation. Where appropriate, those
new extensions will be rolled back into the SGML Tools architecture.
Other DTDs could be supported through conversion to DocBook, but we will
encourage people to stick to these industry standard DTDs where

Those that want to do something far afield of our document types are
essentially "on their own." Any formatting infrastructure that attempts
to support all document types will become as generalized (and thus
complicated) as DSSSL or XSL. It makes no sense to try to reinvent that
wheel. Our goal is the opposite: to constrain SGML's infinite choices to
a simple subset that people can actually use "out of the box."

On the other hand, if someone goes to the effort to develop a new "base"
document type (for instance a TEI[3] "base") with modular stylesheets
and DTDs conforming to our architecture, then we can distribute and
support that architecture just as we do DocBook and HTML. We have no
desire to constrain people to the first document types we develop, but
we intend to always support a finite number of DTDs (plus extensions to
those DTDs, through modules).

[1] http://www.w3.org/MarkUp/
[2] http://www.oreilly.com/davenport/
[3] http://www.sil.org/sgml/acadapps.html#tei

A good Demo of DocBook is at


We recognize that the future of SGML document formatting lies in the
related standards of DSSSL and XSL. In the SGML industry other
formatting systems are being replaced by these standards and we expect
the same thing to happen in the free software community. Rather than
developing Yet Another Stylesheet Language built upon Yet Another SGML
Formatting Engine we intend to build upon these industry standards and
the high quality tools that are emerging to support them. In particular
we want to capitalize on the high quality Jade formatting engine to
achieve flexible output based on industry standards. But we also
recognize that the DSSSL syntax that Jade supports is intimidating to
some and thus intend to allow authors and "tweakers" to avoid it.

Here is how we propose to achieve this best of both possible worlds:
First, we recognize that most of the work required to support our target
audience already exists in the combination of Jade and the DSSSL
Stylesheets for DocBook. We would initiate a one-time translation of
existing LinuxDoc documents into DocBook. All of our document types
would be expressed as simpler subsets of DocBook (e.g. using DocBook
element type, or "tag", names and a similar structure). We recognize
that no one document type can serve all purposes, but feel that DocBook
is sufficiently flexible to handle the variety of documents (usually
technical
documents) that we are interested in. We also believe that a subset of
DocBook can be made which is as easy to learn as any other DTD.

Extensions to this structure could be accomplished using a) DTD
fragments included using DocBook's standard extension mechanisms
(perhaps made easier through the provision of tools and documentation)
and b) stylesheet fragments written in the increasingly popular XSL
stylesheet language (or DSSSL). We believe that XSL is as simple a
stylesheet language as is possible. Though there are simpler ones (like
CSS) they are demonstrably not powerful enough to support generalized
extension of formatting. If they were, the W3C would not have invented
XSL. XSL has the following advantages:

 * (becoming) industry standard
 * multiple independent implementations
 * declarative, which is enough for most simple tasks
 * about as simple to learn as anything else
 * tutorials already exist
 * extensible through a simple, popular scripting language (JavaScript)
 * convertible into DSSSL through free tools
 * explicitly designed to allow extension of existing stylesheets

XSL has one large disadvantage:

 * it is still under development

Still, it is no worse to build on a shifting standard than to build a
proprietary competitor to a standard. We don't have to follow every
change to the standard, and when it is complete, it will be a lot easier
to update our extensions than it would be if we had used something
completely proprietary such as Java or Perl (these tools are
"proprietary" in the sense that basic stylesheet constructs must be
reinvented to make them applicable to the task).

Another option is to create a CSS->DSSSL converter. CSS is older and
more stable than XSL, but less powerful. It would allow an author to
change, for instance, the formatting of a title from 12pt to 14pt, but
not to invent a whole new extension for formatting (e.g.) context-free
grammars or a new table model.
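
To make the idea concrete, here is a minimal Python sketch of what such
a converter might look like. It is an illustration under stated
assumptions, not a design: the function name css_to_dsssl is
hypothetical, only bare element selectors are handled, every element is
assumed to map onto a DSSSL paragraph flow object, and CSS property
names are passed through unchanged, which only works for names such as
font-size that happen to coincide with DSSSL characteristic names.

```python
import re

# Hypothetical, minimal CSS subset: "element { property: value; ... }".
RULE = re.compile(r"([A-Za-z][\w-]*)\s*\{([^}]*)\}")


def css_to_dsssl(css: str) -> str:
    """Translate simple element-selector CSS rules into DSSSL element rules.

    Assumption: each property name is reused verbatim as a DSSSL
    characteristic name. A real converter would need a mapping table.
    """
    rules = []
    for selector, body in RULE.findall(css):
        decls = []
        for decl in body.split(";"):
            if ":" not in decl:
                continue  # skip empty or malformed declarations
            prop, value = (part.strip() for part in decl.split(":", 1))
            decls.append(f"{prop}: {value}")
        if not decls:
            continue  # nothing to emit for an empty rule
        chars = "\n    ".join(decls)
        rules.append(
            f"(element {selector}\n"
            f"  (make paragraph\n"
            f"    {chars}\n"
            f"    (process-children)))"
        )
    return "\n\n".join(rules)


if __name__ == "__main__":
    print(css_to_dsssl("title { font-size: 14pt }"))
```

Given the rule title { font-size: 14pt }, the sketch emits an element
rule for title that makes a paragraph with font-size: 14pt and then
processes the element's children. Anything beyond simple property
changes like this is outside what such a converter could express.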

  Work Required:

To summarize, the work required to create a new SGML Tools based on
these industry standards boils down to:

 * a conversion script from LinuxDoc to DocBook
 * a series of simple DocBook subsets for various tasks
 * "glue" scripts to make the various tools seamless
 * lots of documentation
 * a robust definition of our modularity conventions
 * nice packaging
 * perhaps a converter from CSS to DSSSL
 * perhaps some extensions to the DocBook stylesheets
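
As a rough illustration of the first item, the simplest part of a
LinuxDoc-to-DocBook conversion is renaming element types that
correspond one-to-one. The Python sketch below shows only that step.
The RENAME table is an illustrative, incomplete assumption rather than
a vetted mapping, and the function name rename_tags is hypothetical. A
real converter would also have to deal with omitted end tags and with
structural differences between the two DTDs, so it would more likely
work from the output of an SGML parser such as SP than from regular
expressions.

```python
import re

# Illustrative and incomplete mapping of LinuxDoc element names to
# DocBook element names (an assumption, not a complete table).
RENAME = {
    "p": "para",
    "em": "emphasis",
    "tt": "literal",
    "itemize": "itemizedlist",
    "enum": "orderedlist",
}

# Matches a start or end tag: optional slash, element name, then any
# attributes up to the closing angle bracket.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")


def rename_tags(text: str) -> str:
    """Rename known element types; leave unknown tags untouched."""

    def swap(match: re.Match) -> str:
        slash, name, rest = match.groups()
        new_name = RENAME.get(name.lower(), name)
        return f"<{slash}{new_name}{rest}>"

    return TAG.sub(swap, text)


if __name__ == "__main__":
    print(rename_tags("<p>Use <tt>jade</tt> and <em>SP</em>.</p>"))
```

Tags that are not in the table pass through unchanged, which makes it
easy to see what still needs a structural rule of its own.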

These are not sexy projects and are not as exciting as developing a new
formatting infrastructure from the ground up, but the simple fact is
that the hard work to bring SGML to the masses has already been done --
but nobody has packaged it up properly. SGML Tools could start from
scratch (again) but why bother?

 Paul Prescod