Cover Pages: SGML/XML and Literate Programming

Introduction. This document provides a collection of references for literate programming techniques and style in the context of descriptive markup languages, e.g., SGML, XML, DSSSL, HyTime, etc. Numerous researchers have observed that the goals of information re-use and data normalization embraced by both literate programming and SGML-based markup languages provide the basis for using the two technologies together.

SGML and Literate Programming
XML and Literate Programming
DSSSL and Literate Programming
General Reference

SGML and Literate Programming

"SWEB: an SGML Tag Set for Literate Programming." By C. M. Sperberg-McQueen. Version 0.5, 25 September 1993; revised August 1994 and March 1995; revised and extended January - March 1996. Referenced by permission from the author. Abstract: "This document describes an SGML tag set for literate programming. First, markup is provided for embedding fragments of programming-language code into SGML documents in arbitrary order, to be recombined before compilation into the order required by the programming language's syntax. Next, tags are defined for identifiers, keywords, code fragments, and literal values occurring as phrase-level elements in the prose documentation. Finally, tags for indexing and for a general structure for reference documentation (alphabetical lists of functions and identifiers, etc.) are defined. For each type of markup, the document gives examples and describes how the markup should be processed by conventional literate-programming weave and tangle processes." [local archive copy]
"A Simple Yacc/Lex Processor for Sweb, an SGML Tag Set for Literate Programming." By C. M. Sperberg-McQueen. This represents a "version 0" implementation of Sweb using yacc and lex. Referenced by permission from the author. Abstract: "This document describes the implementation of a simple processor for Sweb, an SGML tag set for literate programming. The processor is written with the aid of yacc and lex, in several stages. The first stage merely recognizes the <scrap> elements in the input stream; the next also parses their attributes. The next uses their attribute values to build linked lists of scraps and of files of code to be written to disk at the end of the run. After several such stages, a complete working version of the program is finished; later stages add new features. Each stage is kept simple to write and thus simple to understand." [local archive copy]
Also on Sweb: note the reference to Michael Sperberg-McQueen's "Sweb" in the 'XML Specification DTD' documentation - 'Sweb, a TEI-compatible extension tag set for literate programming' as background to the development of this W3C DTD. For context, check out the 'odd' system used to produce the TEI P3 DTDs and associated documentation. See the 1995 CTS posting for a summary. [See following documentation on ODD]
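The scrap recombination described in the two Sweb abstracts above (code fragments written in arbitrary order, then reassembled into the order the compiler expects) can be illustrated with a minimal sketch. It does not use Sweb's actual tag set: the scraps are held in a plain dictionary, and the `<<name>>` reference syntax is an assumption borrowed from noweb. The sketch also assumes that no scrap refers to itself, directly or indirectly.

```python
import re

# Hypothetical reference syntax (noweb-style); Sweb itself uses SGML markup.
REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")

def expand(name, scraps, indent=""):
    """Return the lines of scrap `name` with every reference expanded in place."""
    out = []
    for line in scraps[name]:
        match = REF.match(line)
        if match:
            # A reference line: splice in the named scrap, keeping the indentation.
            out.extend(expand(match.group(2), scraps, indent + match.group(1)))
        else:
            out.append(indent + line)
    return out

# Scraps appear in the order that suits the explanation, not the compiler.
SCRAPS = {
    "body": ["print(math.pi)"],
    "main.py": ["<<imports>>", "", "def main():", "    <<body>>", "", "main()"],
    "imports": ["import math"],
}

if __name__ == "__main__":
    print("\n".join(expand("main.py", SCRAPS)))
```

A tangle step of this kind starts from a root scrap (here the hypothetical `main.py`) and writes the expanded lines to disk; a weave step would instead render the prose and the scraps in their written order.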
[June 29, 2005] "Documenting Relax NG." By Sebastian Rahtz. Posting to 'rng-users' Discussion List. June 30, 2005. Sebastian Rahtz (Information Manager, Oxford University Computing Services) reports that the Text Encoding Initiative 'P5' XML schemas "are maintained using a module of the TEI called 'tagdocs', which allows for describing elements, attributes, classes of elements, 'entities', and so on. The whole of the large TEI Guidelines are written in the same single document; hence the name of 'ODD' for this system (One Document Does it all). The ODD format reverts to Relax NG markup for specifying the content model of elements. From this we derive, as needed, (1) documentation (HTML, PDF, LaTeX, TEI XML etc); (2) Relax NG schemas (RNC using trang); (3) XSD schemas (using trang); (4) DTDs (direct translation). There is a complex and powerful system for writing customizations (additions, deletions, changes, internationalization), expressed in the same language." [Note: The 'rng-users' forum at Yahoo!Groups is a mailing list for users of RELAX NG, which is a simple but powerful schema language for XML. RELAX NG is used by Open Document, RDF syntax, SVG 1.2, XHTML 2.0, and so forth.]
TEI ODD ['One Document Does It All'] Format for DTD generation. From TEI ED W23: "[TEI] DTDs: These will be presented both (a) much as in the current appendix B, and (b) with alternation of prose commentary and DTD, roughly as in Donald Knuth's publications of Web, TeX and Metafont. It has proved feasible to make a version of Knuth's Web system to work with SGML documents and DTDs (see TEI ED W29, the documentation for the ODD system), and that system is used to generate both the DTD fragments and the full DTDs..." See TEI ODD Format and Multiple Schema language constraints (comments from Sebastian Rahtz and Lou Burnard, 2002-03). References:
- Documenting Relax NG (above)
- "One Document Does It All: Documentation for an ODD system for tag set construction." TEI ED W29. By C. M. Sperberg-McQueen and Lou Burnard. Document creation date: November-December 1991. [SGML source]
- ODD files in tar format. See also the ZIP archive.
- Simple example of ODD text TEI EDW30 ODD. 5 June 1992. By Lou Burnard and C.M. Sperberg-McQueen. In P1 ODD format with embedded commentary; see Supplementary comment in plain text (cache). See the HTML display text.
- "Processing Odd files at UICVM" [ED A16 source]
Overview of ODD from "One Document Does It All: Documentation for an ODD system for tag set construction." TEI ED W29. By C. M. Sperberg-McQueen and Lou Burnard. Document creation date: November-December 1991. From the initial section:
The Odd (One Document Does it all) system is a prototype SGML DTD-generator developed to aid in the production of version 2 of the Text Encoding Initiative's Guidelines for Text Encoding and Interchange (TEI P2). It is modeled very directly on Donald Knuth's WEB system, but substitutes TEI SGML tags for TeX as the document formatting language and SGML DTDs for Pascal as the programming language.

[Note:] WEB is distributed with the public-domain code of TeX and MetaFont, which are written in WEB. See Donald E. Knuth, Literate Programming (submitted to The Computer Journal [n.d.]) and [Donald Knuth], [WEB User Manual], Stanford Computer Science Report CS980 (September 1983). Both are distributed with (at least some versions of) TeX.

In general, Odd works like this: in a single document (the Odd document) the user describes an SGML-based markup language or tag set, using the Odd tag set. The Odd document is then used by three distinct processors to produce three very different kinds of output:
- One processor (OddP2X) produces SGML-tagged prose documentation for the tag set, embedding fragments of the DTD within it (as Pascal code is intermingled with prose in the Web system); it corresponds to Knuth's Weave processor.
- A second processor (OddRef) produces SGML fragments to be included in an alphabetic reference manual of the tags, attributes, and other items in the markup language. OddRef has no direct correspondence in WEB, though it is closer to Weave than to Tangle.
- The third processor (OddDtd) produces an SGML DTD for the markup language; it corresponds to Knuth's Tangle processor.
A fourth processor (DtdOdd) has been developed which makes the production of Odd documents easier: it takes a conventional DTD as input and produces a set of partially completed tag documentation crystals; it can be used as the first step in preparing Odd documentation for existing tag sets.

In this document, we describe the types of files used by the Odd system for input and output, the specialized tags provided by the Odd DTDs, the DTDs and their structure, and the processors which work with them. An appendix includes a brief description of the tags in Tiny.dtd, the base tag set. The document in its current form is intended for internal use within the TEI and assumes either a profound familiarity with SGML and the TEI DTDs or a great tolerance for unfamiliar technical material.
Daniel Morales Germán, Donald Cowan, and A. Ryman: "SGML-Lite -- An SGML-based Programming Environment for Literate Programming," October 1996. See also Germán, Daniel Morales: "An SGML Based Programming Environment for Literate Programming," 1994.
Hans Holger Rath and Hans-Peter Wiedling, "Making SGML Work: Introducing SGML Into an Enterprise and Using its Possibilities in Advanced Applications." In Computer Standards & Interfaces 18/1 (January 1996) 37-53 (with 11 references). See especially section 3.1 ("Integrated specification and documentation of hypermedia SGML applications", pages 41-45), where the authors define and elaborate upon a "Literate Specifying" approach, called "SWEBS: SGML Web System." As of the date of publication ("January 1996"), SWEBS was said to be "under development. The SWEBS DTD is quite stable. The graphical DTD editor, the generator of the setup data, and the interface to SGML are almost completed. . . for the moment, a self-defined layout language is used, which is very close to DSSSL. The self-description of SWEBS within SWEBS shows that SWEBS is a powerful, but easy-to-use tool to define a complex SGML application and to produce easy-to-understand documentation. The idea of 'Literate Programming' is very well applicable for the specification of SGML applications."
[July 16, 2001] DBLP: DocBook-based Literate Programming. By Mark B. Wroth. Version 25, 20 April 2001. "The DocBook-based Literate Programming system provides mechanisms to write literate programs using a minor extension of the SGML DocBook DTD, and to create both computer usable versions ("tangle") and human readable versions ("weave") of these programs. The system consists of two main parts: (1) A DTD that extends DocBook to add the logic needed for literate programming. These are relatively minor extensions to the basic DTD. (2) DSSSL style sheets that, together with a DSSSL engine that implements some of James Clark's extensions, serve as tangle and weave processors. The documentation of the project also discusses the design considerations behind the implementation, and provides a short sample document that serves as an example of how the DTD is used (and serves as a simple test case)." [main difference between version 1.4 and version 2.5 is the translation of the system into itself, rather than nuweb] This information supersedes that in the following entry.
[April 13, 2001] See preceding entry for update. "DocBook-Based Literate Programming." By Mark Wroth. Version 1.3, April 7, 2001. 35 pages. "The DocBook-based Literate Programming system provides a mechanism to write literate programs using a minor extension of the Standard Generalized Markup Language (SGML) DocBook Document Type Definition, Document Type Declaration (DTD). The system consists of two main parts: (1) A DTD that extends DocBook to add the logic needed for literate programming. These are relatively minor extensions to the basic DTD. The details are discussed in Chapter 3. (2) Document Style Semantics and Specification Language (DSSSL) style sheets that, together with a DSSSL engine that implements some of James Clark's extensions, serve as 'weave' and 'tangle' processors. These style sheets are discussed in Chapters 5 and 4, respectively. This document also discusses the design considerations behind the implementation, and provides a short sample document that serves as an example of how the DTD is used (and serves as a simple test case). ... This project creates a set of extensions to the DocBook SGML DTD to allow its use for literate programming markup. The resulting system shall (1) Provide a mechanism to extract program files from the literate programming source in appropriate forms for their use as source code in the intended programming language or languages. (2) Permit the use of existing DocBook-based tools with only minor modifications (ideally none) to produce documentation of software projects. It is the intention of this system to: a) Maintain the ability to update the extensions to new versions of DocBook as they are published. b) Make the extensions as easy as possible to move between the SGML and XML versions. c) Make it as simple as possible to add the literate programming functionality to other DocBook-based DTDs. d) Provide a basis on which other implementations of the tangle and weave functions could be built to support other tool chains. This system performs three basic functions: (1) Provides a DTD that allows the markup of literate programs, including a flexible system for describing the purpose and implementation of the computer program (based on DocBook) and markup of the program code itself to allow the literate program to produce the computer instructions. (2) A tangle mechanism that actually produces the computer instructions from the literate programming source code. This implementation, SGMLTangle.dsl is a DSSSL style sheet using extensions to the DSSSL standard as implemented in James Clark's Jade DSSSL engine. (3) A weave implementation that renders the literate programming source into useful documentation. This style specification, SGMLWeave.dsl, extends Norman Walsh's Modular DocBook Style Sheets. It provides both print and HTML output, in the style sheets print and HTML respectively." See also the posting and "An Experiment in Literate Programming Using SGML and DSSSL", Revision 0.109, December 31, 1999. [cache article, and code]

XML and Literate Programming

[October 15, 2002] "Literate Programming in XML." By Norman Walsh (XML Standards Architect, Sun Microsystems, Inc). Sun Microsystems Technical Report. October 15, 2002. Presented at the XML 2002 Conference, Friday, December 13, 2002 in Baltimore, MD, USA. See also the presentation slides. "Literate programming is a programming and documentation methodology. Its central tenet is that documentation is more important than source code and should be the focus of a programmer's activity. Literate programming facilitates this approach by combining code and documentation into a single, unified source document. One interesting aspect of this combined form is that it is neither source code nor documentation. Instead, a literate programming system provides tools that allow a user to extract the source code or documentation automatically, but neither of these extracted forms is ever modified. Using a literate programming system offers some interesting benefits for many programming styles. Because the combined format is machine processed to produce the source code, the author is no longer required to maintain or write the code in the linear fashion that the computer ultimately expects. This is clearly advantageous for top-down and bottom-up design strategies. It may also have benefits for more modern programming methodologies, such as Extreme Programming. Typical literate programming systems are quite complex. They are built on top of some underlying documentation system (such as TeX) and described in terms of the macros and other documentation markup required to describe an xweb document. However, it quickly becomes apparent that XML can greatly simplify this situation. By stipulating that the documentation format include a few (namespaced) elements, it is possible to implement literate programming in XML on top of any format that the author chooses: DocBook, TEI, XHTML, you name it. In the past few years, the number of XML vocabularies has exploded. 
Where there used to be just a few, there are now hundreds. In addition, many of these new vocabularies have all sorts of sophisticated processing expectations: XSLT, W3C XML Schema, RELAX NG, Schematron, SAML, SVG, and MathML just to name a small handful. Luckily, Literate Programming with XML applies equally well to XML, so it is possible to apply a literate programming methodology to the development of XML vocabularies. This paper describes the design and implementation of a literate programming system using XML and XSLT. The resulting system is equally capable of authoring systems in traditional programming languages and systems that are themselves built from XML. The paper includes several examples to demonstrate these features and pointers to real-world systems that are actively exploiting the power it offers..."
[January 03, 2002] xmLP: A New Literate Programming Tool for XML. A posting from Anthony B. Coates (Reuters Plc) announces the public availability of an XSLT-based literate programming tool for XML. Version 1.0 of xmLP is distributed under the Lesser GNU Public License, and may be downloaded from SourceForge. xmLP is "a literate programming tool written in XSLT and heavily influenced by experience with FunnelWeb, a non-XML literate programming tool. xmLP differs from traditional literate programming tools when it comes to weaving. Traditionally, weaving involves both generating cross-reference information and producing formatted output. However, tools like XSLT make it unnecessary for an XML literate programming tool to deal with display rendering. Hence the xmLP weaver is intentionally minimalist, and does nothing except add cross-reference information to the original literate document. This additional cross-reference information makes it much easier to build cross-reference hyperlinks using a simple rendering XSLT stylesheet." [Full context]
[August 05, 2002] "xmLP: Literate Programming tool for XML and text." By Anthony B Coates and Zarella Rendon (XML Factor). Presented at the Extreme Markup Languages Conference 2002. Wednesday, August 7, 2002. "LitProg (Literate Programming) is a technique created by Donald Knuth to make computer programs readable and maintainable: the source code for the program is interspersed in the prose documentation of the program's function, data structures, algorithms, and organization. Knuth's original WEB system used an extended form of TeX for the documentation part, and an extended version of Pascal for the compilable code. Many literate programming systems have been developed since, using markup languages other than TeX and customized for programming languages other than Pascal. The literate programming tool xmLP uses XML as the documentation language and can be used to develop literate programs (or other control file sources) whose content is XML or text..." See also "xmLP: A New Literate Programming Tool for XML."
[December 31, 2001] "SXML as a higher-order markup language and a tool for literate programming." By Oleg Kiselyov. "S-expressions, DOM trees and syntax-heavy XML documents are three different realizations of a hierarchy of containers made of strings and other containers (Infoset). Unlike DOM trees, S-expressions and XML documents both have an external representation. SXML is a S-expression-based, parsed, abstract syntax tree representation of an XML document; as such SXML is concise, expressive and more suitable for queries and transformations than the raw XML... SXML is also suitable for literate XML programming -- design of a markup format. A literate design document should permit a transformation into a well laid-out, easy-to-read hyperlinked user manual. A literate design document should be easy to write. And yet the user manual should be precise enough to allow automatical extraction of a formal specification. SXML fulfills all these roles. SXML is similar to TeX, but far easier to write and read. SXML transformations do the job of "weaving" a document type specification and of "typesetting" the user manual." See also: XML and Scheme.
[March 03, 2001] xmLP - Literate Programming in XML. The xmLP tool demonstrates how XML can be used to convert any underlying document into a literate program. At present, xmLP uses a small number of XML tags which can be included in any XML document. The supplied sample shows the xmLP tags embedded in an HTML document (which could just as easily be an XHTML document or a DocBook document). xmLP is heavily influenced by FunnelWeb, which is language-independent. FunnelWeb views all source code as plain text which is to be assembled into one or more source code files. This proves to be advantageous if you need to embed multiple file types in a document (e.g., source code + text resources). Available online, [cache]
XML-LitProg mailing list. The list is "dedicated to the development of literate programming tools that leverage XML, XSL, and XLink/XPtr libraries, so that the jobs of parsing and weaving (producing documentation), perhaps tangling (producing source code) too, are not part of the literate programming core, and do not need to be redone by each author of a literate programming tool." See a provisional task list.
Seminar - XML and Literate Programming. By Anthony B. Coates.
[May 12, 2000] Literate Programming: Experiences with Marius. By Andrew Dwelly. See also the article 'Java, XML, and Literate Programming' below.
XML and Literate Programming - FunnelWeb
[June 15, 2001] "XML-Lit." See the SourceForge project 'XML-Lit: A Simple XML-Based Literate Programming System'. A communique from Rafael R. Sevilla: "I've started a new XML literate programming project I call XML-Lit.... 'XML-Lit: A simple XML-based literate programming system' This project is somewhat inspired by a very simple program by Jonathan Bartlett called xmltangle. XML-Lit is a simple literate programming system that you can use with any XML-based markup language to make your literate program..." From the introduction to the documentation: "I recently found a simple program called xmltangle by Jonathan Bartlett that provides a simple literate programming system based on DocBook. I have been somewhat frustrated by that program though; for one thing, it did not allow program code snippets to be enclosed within CDATA sections, which would make including a program inline a lot easier to do, and easier to read on screen while you're editing it, especially with programming languages that are chock full of <'s such as the typical C program, or worse yet, an XSL stylesheet, which I planned to use Jonathan's program for. So I set off to create a complete rewrite of the program, which uses James Clark's expat XML parser. So now, I have come up with my own simple literate programming system, xml-lit which takes a similar approach, but instead of enclosing code snippets within DocBook <programlisting/> tags, I define a new namespace xml-lit which (for now) contains a single tag <xml-lit:code> which has a single attribute named file which gives the name of the file to which the code it encloses should be output. This eliminates the program's dependency on DocBook, so it can be used with any XML-based document markup language (such as XHTML). It's a very simplistic system, but it's able to do the task for which it was designed. The program is also backward-compatible with Jonathan's work given a command line switch..." 
See the source code download and online documentation.
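The approach described in the XML-Lit entry above (a single namespaced code element whose file attribute names the output file, with code allowed inside CDATA sections) can be sketched in a few lines of Python. This is an illustration only, not the XML-Lit program itself: the namespace URI `urn:example:xml-lit` is a placeholder, since the entry does not give the real one, and the sketch returns the assembled text instead of writing files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Placeholder namespace URI; the real xml-lit URI is not given in the entry above.
XML_LIT_NS = "urn:example:xml-lit"

def tangle(document: str) -> dict[str, str]:
    """Collect the text of every <xml-lit:code file="..."> element, grouped by file."""
    root = ET.fromstring(document)
    files: dict[str, list[str]] = defaultdict(list)
    for code in root.iter(f"{{{XML_LIT_NS}}}code"):
        # The parser reports CDATA content as ordinary text, so no special handling is needed.
        files[code.get("file")].append("".join(code.itertext()))
    # Chunks for the same file are concatenated in document order.
    return {name: "".join(chunks) for name, chunks in files.items()}

SAMPLE = """<html xmlns:xml-lit="urn:example:xml-lit">
  <body>
    <p>First we include the standard I/O header.</p>
    <xml-lit:code file="hello.c"><![CDATA[#include <stdio.h>
]]></xml-lit:code>
    <p>Then the main function prints a greeting.</p>
    <xml-lit:code file="hello.c"><![CDATA[int main(void) { puts("hi"); return 0; }
]]></xml-lit:code>
  </body>
</html>"""

if __name__ == "__main__":
    for name, text in tangle(SAMPLE).items():
        print(f"--- {name} ---")
        print(text, end="")
```

Because the surrounding markup is ignored, the same function works whether the host vocabulary is XHTML, DocBook, or something else, which is the independence from DocBook that the entry emphasizes.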
'The toxml Back End for noweb' - Barry MacKichan
[January 07, 2000] "Java, XML, and Literate Programming." By Andrew Dwelly. In Dr. Dobb's Journal (February 2000), pages 62-68. "Marius, the system Andrew [Dwelly] presents here, implements some of Donald Knuth's ideas about literate programs, but uses Java as its programming language, with HTML as the output. In the process, Marius leverages the power of XML. Additional resources include litjava.txt (listings) and litjava.zip (source code)." [local archive copy, listings/src] See the treatment in the Marius page at CedillaSoft: "There are a number of programmer's tools available such as parsers that make it easy to work with XML. In case of Marius the source document has to be marked up into areas that represent code, and areas that represent explanation, XML is the ideal way to accomplish this. Taking advantage of the free XML parser available from Sun, means that projects such as Marius can be created in a remarkably short time -- days rather than months. The Marius source document will contain chunks of explanatory text, called 'narrative' and chunks of Java code that will ultimately be assembled into working classes. However, since we want to present code in an order suitable for explanation, we will allow code to be presented before a class is defined, or indeed at any point in the document. We rely on our version of Weave to take the source document and produce HTML that can be read with an ordinary web browser. We will use another program to take the source document and produce syntactically correct Java. In this case, we would like the Java to be readable as well so our program is called 'Comb' rather than Knuth's original 'Tangle'..."
See JDox - "The DTD that defines the XML formatting of JDox closely follows Sun's Javadoc specification, and utilizes Sun's Doclet technology in creating the XML files. We have also expanded on the Javadoc standard by providing space for code examples. Planned extensions to JDox will further enhance the comprehensive nature of this powerful documentation tool." See: "JDox: XML Format for Sun Javadoc."
[April 16, 2001] "The Design of the DocBook XSL Stylesheets." By Norman Walsh (XML Standards Engineer Sun Microsystems, Technology Development Center). Paper presented at XSLT-UK 01 (08 Apr - 09 Apr 2001, Keble College, Oxford, England). 08-April-2001. Version 1.0. "Building stylesheets for a large, rich XML vocabulary is a challenging exercise. This paper explores some of the design issues confronted by the author in designing XSL stylesheets for DocBook, an XML DTD maintained by the DocBook Technical Committee of OASIS. It is particularly well suited to books and papers about computer hardware and software (though it is by no means limited to these applications). DocBook consists of nearly 400 tags. The HTML and Formatting Object stylesheets each consist of roughly 1000 templates spread over about 30 files. The design for the DocBook XSL Stylesheets attempts to meet the following goals: (1) Full support for all of DocBook. (2) Full support for both HTML and XSL Formatting Object presentations. (3) Utility for a wide range of users, with varying levels of technical skill. (4) Support across diverse hardware and software platforms. (5) Provide a framework on top of which additional stylesheets can be written for schemas derived from DocBook. (6) Support for internationalization. (7) Support for a wide range of projects (books, articles, online- and print-centric presentations, etc.) Although not all of these goals have been completely achieved, progress has been made on all of them. Five techniques stand out as important factors in achieving these goals: modularity, parameterization, self-customizing stylesheets, 'literate' programming, and extensions. The rest of this paper will discuss these techniques in detail... [Conclusion:] XSL Transformations and Formatting Objects are a rich platform on which to build stylesheets for large, sophisticated XML vocabularies. 
Designing stylesheets that will be adaptable and maintainable is an interesting software engineering challenge. In this paper we've examined five factors that contribute to the successful design of XSL stylesheets: modularity, parameterization, stylesheet generation, documentation, and XSLT extensions." References: (1) "DocBook XML DTD"; (2) "Extensible Stylesheet Language (XSL/XSLT)."
[July 20, 2002] XMLTangle - Literate Programming in XML. See the SourceForge project page and "XMLTangle - Literate Programming in XML." "Literate Programming is a style of programming in which the programmer writes an essay instead of a program. The essay's code fragments are then merged together to form a full program which can be compiled or interpreted. This article is a literate program designed to perform this task with XML documents... The current literate programming tools are problematic, however. They are still too wedded to individual programming languages and document formats. This program is a version of the tangle program which has the following features: (1) Works with any programming language; (2) Uses XML as the documentation language; (3) Is not tied to any specific DTD - instead it relies on processing instructions to perform its tasks; (4) It does not include every feature of literate programming - specifically it does not include any macro facility. Originally this program was written in C, only worked with the DocBook DTD, and only had a very primitive subset of the literate programming paradigm. Specifically, the code could only be broken up into files - it was not possible to include named code fragments which would be defined elsewhere - you could only append to files. This version is written in Python and captures much more of the literate paradigm... Future Developments. The following features will be implemented before releasing version 1.0: (1) Error checking and reporting to make sure no constructs are used out of order; (2) Verify that no reference is unused; (3) Verify that no section contains circular references before printing; (4) Make sure all sections belong to files; (5) Include easy-access features for DocBook (i.e. - make programlisting automatically set the first line to the section id and the remaining to lp-code if the role='literate' or something like that); (6) Include easy-access features for Python (specifically for block indentation). The following features are being thought about for some time in the future: (1) Creating section indexes for DocBook or other DTDs; (2) Implementing a backend filter interface to record all top-level declarations in an index; (3) Stylistically, I could also reduce a lot of my functions into lambda functions..." Contact Jonathan Bartlett.
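One of the planned checks listed in the XMLTangle entry above, verifying that no section contains circular references, amounts to cycle detection over the graph of section references. The following is a minimal sketch of one way to do it, using a depth-first search. The dictionary model (section name mapped to the names it references) is an assumption made for illustration and is not XMLTangle's actual data structure.

```python
from typing import Optional

def find_cycle(sections: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one circular chain of section references, or None if there is none."""
    unvisited, in_progress, done = 0, 1, 2
    state = {name: unvisited for name in sections}
    path: list[str] = []

    def visit(name: str) -> Optional[list[str]]:
        state[name] = in_progress
        path.append(name)
        for ref in sections[name]:
            if ref not in sections:
                continue  # undefined reference: a separate check, ignored here
            if state[ref] == in_progress:
                # The reference points back into the current chain: a cycle.
                return path[path.index(ref):] + [ref]
            if state[ref] == unvisited:
                found = visit(ref)
                if found:
                    return found
        path.pop()
        state[name] = done
        return None

    for name in sections:
        if state[name] == unvisited:
            found = visit(name)
            if found:
                return found
    return None

if __name__ == "__main__":
    # Hypothetical section names, for illustration only.
    print(find_cycle({"main": ["init", "loop"], "init": [], "loop": ["step"], "step": ["loop"]}))
```

Running such a check before expansion matters because a naive recursive expansion of a circular reference would never terminate.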
[See previous entry] xmltangle - literate programming with XML. By Jonathan Bartlett. "xmltangle is a program to do a somewhat literate programming style using XML DocBook. For those of you who don't know, in literate programming, you essentially write an essay about your program which also contains your program. This way, you are forced to think more clearly about the decisions you make, and how you design your program. It makes your programs better structured and better thought out. xmltangle does not include all of Knuth's web system, but it is at least a good start. The way xmltangle works, is that within a DocBook document, if it finds a <programlisting> tag, it copies the code listed there into the filename specified by the role attribute. The next version of xmltangle will probably use XML processing instructions instead of specific tags, and support more of the 'literate programming' concept." See the documentation; [cache]
[May 04, 2000] "XML, Reflective Pattern Matching, and Java." By Andrew Dwelly. In Dr. Dobb's Journal Volume 25, Issue 6 (June 2000), pages 46-54. [Special Issue on Object-Oriented Design.] "Although the pattern matching available in Hex, the program Andrew presents here, is relatively simple, it is still powerful enough to perform sophisticated XML document processing. Additional resources include xmljava.txt listings and the source code." [Alternate email: andy@cedillasoft.com] On HEX: see Chapter 3 of "The Annotated Marius." "Earlier versions of Marius use a standard technique, recursive descent, for processing the parse tree of DOM nodes that the XML parser produces. The resulting classes were large and very tightly coupled to the structure of the Marius DTD. Even the smallest change to the DTD requires a change, sometimes a substantial one, to the class that processes the tree of Node objects to produce the output. From version 0.6 a new technique is used, reflective pattern matching (rPM). This idea, original to the Marius project, uses one of the more powerful aspects of Java, reflection, combined with the pattern matching paradigm for specifying tree processing. The resulting class is marginally smaller; my experiments show a typical reduction between ten to fifteen percent. However the structure of the class is much less strongly coupled to the structure of the DTD. A change to the DTD does not necessarily imply a change to the processing class. Even if it does, a change may not be a substantial one... The need for a scheme like Hex arose because I was looking for a way to break the increasingly unwieldy Weave class into something smaller and more manageable. The use of recursive descent does not produce small classes if the DTD itself is large. I was also searching for a way to make it easy to generate a variety of outputs from a single DTD. 
I'd experimented with output templates in my "Intranet in a Box" project a few years ago, but this turned out to be rather unsatisfactory as I was constantly finding new situations that the template solution could not easily cope with. I've also spent part of this year trying to adapt Prolog style Horn clauses to this task. This idea looked promising, but after some effort on my part, did not really seem to offer the elegance and simplicity of a satisfactory solution. So Hex is my third attempt at improving XML processing, and I think I've struck gold this time. Hex has the following advantages: (1) The delegate classes tend to be smaller (experiments show this is around a 15% reduction) (2) The coupling between a delegate class and the XML is much looser. (3) Pattern matching is a powerful way of expressing tree processing..." [Why call it Hex? - Officially Hex stands for Highly Evolvable XML processing. Actually I named it after the computer Hex in the Terry Pratchett discworld novels. Ever since I first read about this I've been looking for an opportunity to name a method "redoFromStart"; and this turned out to be my chance. I haven't found a good use for an "OutOfCheeseException" yet.] Marius documentation, cache.
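The loose coupling that the Hex entry attributes to reflection can be illustrated with a rough Python analogue. This is not Hex's API and not Java: the class names, the `on_` method prefix, and the sample element names are all hypothetical, and matching on the element name alone is a simplification of whatever patterns Hex supports. The point is only that handlers are found by name at run time, so an element the processor has never heard of does not force a change to the tree-walking code.

```python
import xml.etree.ElementTree as ET


class ReflectiveProcessor:
    """Walks an element tree, locating handlers by reflection on method names."""

    def process(self, element):
        # Look up a method named after the element; fall back to a default.
        handler = getattr(self, "on_" + element.tag, self.on_default)
        return handler(element)

    def children(self, element):
        # Render the element's text, its children, and the text after each child.
        parts = [element.text or ""]
        for child in element:
            parts.append(self.process(child))
            parts.append(child.tail or "")
        return "".join(parts)

    def on_default(self, element):
        # Elements without a handler are transparent: only content is emitted.
        return self.children(element)


class PlainText(ReflectiveProcessor):
    """Hypothetical delegate class: one small method per element of interest."""

    def on_emph(self, element):
        return "*" + self.children(element) + "*"

    def on_code(self, element):
        return "`" + self.children(element) + "`"


doc = ET.fromstring(
    "<para>Use <code>tangle</code> <emph>before</emph> <new>compiling</new>.</para>"
)
result = PlainText().process(doc)
print(result)
```

Here `<new>` has no handler, yet the document still processes; supporting it later means adding one `on_new` method rather than editing a recursive-descent routine.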
Gavin Nicol wrote on DSSSList: "FYI. The DOM spec is written in XML, and the CORBA and JAVA binding automatically generated from it. I developed a DTD for literate programming for interface definitions. . ." (17 Mar 1998)

DSSSL and Literate Programming

[CR: 20010412]

"An Approach to Literate Programming With SGML Architectures," By W. Eliot Kimber.
"Using SGML Architectures and DSSSL to Do Literate Programming." By W. Eliot Kimber.
[April 13, 2001] "DocBook-based Literate Programming." Posting from Mark Wroth. April 10, 2001.
[November 03, 1998] "DocBook and Jade for Literate Programming." By W. Eliot Kimber.
[November 06, 1998] "A DTD for Literate DSSSL Programming with DocBook." From Norman Walsh. A DocBook-derived DTD with the DSSSL architecture. - 'This DTD is an extension to DocBook that conforms to the DSSSL architecture. This means that instances of this DTD can be legal DSSSL stylesheets and (almost) legal DocBook documents simultaneously.'
[November 17, 1999] On Literate Programming with SGML and XML: "...you may be interested in the self-documenting DSSSL stylesheets at http://www.oreilly.com/%7Ecrism/dsssl/orastyle/, which uses a slightly different DocBook/DSSSL approach from Norm [Walsh]'s. I'm looking towards an XML/Perl literate programming approach for a project, and if I don't end up using SWeb, I'll definitely post something about it." [Christopher R. Maden, 1999-11-18.]

General Reference

LiterateProgramming.com - Web site by Daniel Mall
Literate programming. From Wikipedia.
The nuweb system for Literate Programming
Literate Programming. By Donald E. Knuth [Wikipedia]. CSLI Lecture Notes, 27. Stanford, California: Center for the Study of Language and Information, 1992. ISBN 0-937073-80-6. "Literate programming is a methodology that combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained, and arguably more fun to write than programs that are written only in a high-level language. The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer. The program is also viewed as a hypertext document, rather like the World Wide Web."
[September 16, 2000] "Literate Modelling - Capturing Business Knowledge with the UML." By Jim Arlow, Wolfgang Emmerich, and John Quinn. "We have found during several large UML projects at British Airways, that non-technical end-users, managers and business domain experts find it difficult to understand UML visual models. This leads to problems in requirement capture and review. To solve this problem, we have developed the technique of Literate Modelling. Literate Models are UML diagrams that are embedded in texts explaining the models. In that way end-users, managers and domain experts gain slight but useful understanding of the models, whilst object-oriented analysts see exactly and precisely how the models define business requirements and imperatives. We discuss some early experiences with Literate Modelling at British Airways where it was used extensively in their Enterprise Object Modelling initiative. We explain why Literate Modelling is viewed as one of the critical success factors for this significant project. Finally, we propose that Literate Modelling may be a valuable extension to many other object-oriented and non object-oriented visual modelling languages." [cache]
[January 16, 2001] URN Namespace for Literate Programming: Anthony B. Coates. By Anthony B. Coates. Network Working Group, Internet Draft 'draft-coates-urn-namespace-01.txt'. October 17, 2000. URI: http://www.theoffice.net/abcoates/. "This document describes a URN namespace for use in identifying XML namespaces for use with applications created by the author, Anthony B. Coates. In particular, the author develops applications for literate programming. XML namespaces require a URI to identify them. While URLs have commonly been used for this purpose, the recent controversy over the use of relative URLs for namespaces has highlighted the deficiency in using locators (which are generally assumed to name an actual resource) for disambiguating XML namespaces. This has moved the author to request a URN namespace with the ID 'abc' (the author's initials). For professional reasons, the author finds himself changing continent every few years, and hence uses e-mail and Web redirection services whose domain names are not under his control, and hence are subject to change without notice. The assignment of a persistent URN would remove any inherent dependence on URLs outside of the author's control. This namespace specification is for a formal namespace." See: "Namespaces in XML." [cache]

