[This local archive copy mirrored from the canonical site: http://www.via.ecp.fr/sgml-tools/archives/9802/msg00080.html; links may not have complete integrity, so use the canonical document at this URL if possible.]
Mandate:

The mandate of the SGML Tools project is to make SGML useful to mainstream users who do not have the time to learn about and apply the various SGML tools (parsers, formatters, DTDs) but who could benefit from those tools. Of special interest is the free software community, including the Linux Documentation Project and the Free Software Foundation. As much as possible, the SGML Tools project aims to use industry standards to guarantee the longevity and portability of documents created using the package.

Goals:

The project intends to provide

* a simple distribution, using standard distribution technologies
* a set of simple DTDs applicable to the most common document types created by our user population
* formatting tools to make those documents presentable in print and online
* documentation on how to use SGML, the DTDs and the formatting tools

In other words, everything needed to make SGML useful "to the masses."

Architecture:

Distribution:

We intend to supply .tgz, .zip and .rpm packages. The code inside the distributions should run on all major Unixes and both Win32 platforms.

Supported Formats:

We will support the full range of SGML documents supported by James Clark's SP parser. We encourage document authors to restrict themselves to the XML[1] subset of SGML, as that subset will have the greatest portability to new platforms (e.g. Internet Explorer 4.0, handheld devices with JVMs and so forth). At the same time we recognize that where this portability is not required, XML can be overly restrictive and verbose.

[1] http://www.w3.org/XML

Supported DTDs:

SGML Tools will directly support a short list of DTDs but will provide a generalized architecture that others can use to extend these DTDs. These DTDs would all be based on two "base" document types: HTML[1] and DocBook[2]. The former is simple and well known; the latter is more appropriate for large, technical documentation.
Where appropriate, those new extensions will be rolled back into the SGML Tools architecture. Other DTDs could be supported through conversion to DocBook, but we will encourage people to stick to these industry standard DTDs where possible. Those who want to do something far afield of our document types are essentially "on their own." Any formatting infrastructure that attempts to support all document types will become as generalized (and thus complicated) as DSSSL or XSL. It makes no sense to try to reinvent that wheel. Our goal is the opposite: to constrain SGML's infinite choices to a simple subset that people can actually use "out of the box."

On the other hand, if someone goes to the effort to develop a new "base" document type (for instance a TEI[3] "base") with modular stylesheets and DTDs conforming to our architecture, then we can distribute and support that architecture just as we do DocBook and HTML. We have no desire to constrain people to the first document types we develop, but we intend to always support a finite number of DTDs (plus extensions to those DTDs, through modules).

[1] http://www.w3.org/MarkUp/
[2] http://www.oreilly.com/davenport/
[3] http://www.sil.org/sgml/acadapps.html#tei

A good demo of DocBook is at: http://www.oreilly.com/davenport/samples/at1.sgm

Formatting:

We recognize that the future of SGML document formatting lies in the related standards of DSSSL and XSL. In the SGML industry other formatting systems are being replaced by these standards, and we expect the same thing to happen in the free software community. Rather than developing Yet Another Stylesheet Language built upon Yet Another SGML Formatting Engine, we intend to build upon these industry standards and the high quality tools that are emerging to support them. In particular we want to capitalize on the high quality Jade formatting engine to achieve flexible output based on industry standards.
But we also recognize that the DSSSL syntax that Jade supports is intimidating to some and thus intend to allow authors and "tweakers" to avoid it. Here is how we propose to achieve the best of both worlds:

First, we recognize that most of the work required to support our target audience already exists in the combination of Jade and the DSSSL Stylesheets for DocBook. We would initiate a one-time translation of our existing LinuxDoc documents into DocBook. All of our document types would be expressed as simpler subsets of DocBook (e.g. using DocBook element type ("tag") names and a similar structure). We recognize that no one document type can serve all purposes, but feel that DocBook is sufficiently flexible to handle the variety of documents (usually technical documents) that we are interested in. We also believe that a subset of DocBook can be made which is as easy to learn as any other DTD.

Extensions to this structure could be accomplished using a) DTD fragments included using DocBook's standard extension mechanisms (perhaps made easier through the provision of tools and documentation) and b) stylesheet fragments written in the increasingly popular XSL stylesheet language (or DSSSL).

We believe that XSL is as simple a stylesheet language as is possible. Though there are simpler ones (like CSS), they are demonstrably not powerful enough to support generalized extension of formatting. If they were, the W3C would not have invented XSL.

XSL has the following advantages:

* (becoming) an industry standard
* multiple independent implementations
* declarative, which is enough for most simple tasks
* about as simple to learn as anything else
* tutorials already exist
* extensible through a simple, popular scripting language (JavaScript)
* convertible into DSSSL through free tools
* explicitly designed to allow extension of existing stylesheets

XSL has one large disadvantage:

* it is still under development

Still, it is no worse to build on a shifting standard than to build a proprietary competitor to a standard. We don't have to follow every change to the standard, and when it is complete, it will be a lot easier to update our extensions than it would be if we had used something completely proprietary such as Java or Perl (these tools are "proprietary" in the sense that basic stylesheet constructs must be reinvented to make them applicable to the task).

Another option is to create a CSS->DSSSL converter. CSS is older and more stable than XSL, but less powerful. It would allow an author to change, for instance, the formatting of a title from 12pt to 14pt, but not to invent a whole new extension for formatting (e.g.) context free grammars or a new table model.

Work Required:

To summarize, the work required to create a new SGML Tools based on these industry standards boils down to:

* a conversion script from LinuxDoc to DocBook
* a series of simple DocBook subsets for various tasks
* "glue" scripts to make the various tools seamless
* lots of documentation
* a robust definition of our modularity conventions
* nice packaging
* perhaps a converter from CSS to DSSSL
* perhaps some extensions to the DocBook stylesheets

These are not sexy projects and are not as exciting as developing a new formatting infrastructure from the ground up, but the simple fact is that the hard work to bring SGML to the masses has already been done, but nobody has packaged it up properly.

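As a rough illustration of the optional CSS-to-DSSSL converter in the list above, here is a minimal Python sketch that handles the "title from 12pt to 14pt" kind of tweak. Everything in it is an assumption made for illustration: the function name css_to_dsssl, the tiny CSS subset it accepts (one element name per rule, no selectors or cascading), the two-entry property mapping, and the choice to emit a paragraph flow object for every element. It is not part of SGML Tools or Jade, and a real converter would need a proper CSS parser and unit handling.

```python
import re

# Hypothetical, deliberately tiny CSS subset: "element { prop: value; ... }".
RULE_RE = re.compile(r"([A-Za-z][\w-]*)\s*\{([^}]*)\}")

# Assumed mapping from CSS property names to DSSSL characteristic names.
# Values are copied through unchanged, so only absolute units that both
# languages share (such as pt) will survive the trip.
CSS_TO_DSSSL = {
    "font-size": "font-size",
    "margin-left": "start-indent",
}


def css_to_dsssl(css: str) -> str:
    """Translate simple CSS rules into DSSSL element construction rules."""
    rules = []
    for element, body in RULE_RE.findall(css):
        characteristics = []
        for declaration in body.split(";"):
            if ":" not in declaration:
                continue  # skip empty fragments such as a trailing ";"
            prop, value = (part.strip() for part in declaration.split(":", 1))
            if prop in CSS_TO_DSSSL:  # unsupported properties are dropped
                characteristics.append(f"{CSS_TO_DSSSL[prop]}: {value}")
        if characteristics:
            rules.append(
                f"(element {element}\n"
                f"  (make paragraph\n"
                f"    {' '.join(characteristics)}\n"
                f"    (process-children)))"
            )
    return "\n\n".join(rules)


print(css_to_dsssl("title { font-size: 14pt; color: red }"))
```

The unsupported color declaration is silently dropped here, which mirrors the limitation noted above: a converter of this kind can adjust existing formatting but cannot express a new table model or similar extension.
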
SGML Tools could start from scratch (again), but why bother?

Paul Prescod