Jon Bosak's Revised XML Document Collections

Date: Fri, 30 Jan 1998 22:17:03 -0800
From: Jon.Bosak@eng.Sun.COM (Jon Bosak)
Subject: Revised XML Document Collections

[July 1999] See now: The Plays of Shakespeare in XML. Jon Bosak (Sun Microsystems) has announced the availability of version 2.0 for the set of plays of Shakespeare in XML format. "The Shakespeare set is the companion to a set of religious works marked up in XML that was updated almost a year ago with a revised DTD and a set of DSSSL stylesheets. [Jon says:] Both sets began as ASCII files in the public domain. I marked them up in 1992 as a beginner's exercise in SGML DTD and stylesheet design and released them in 1996 as the earliest examples of real documents in XML -- so early, in fact, that they were not completely compliant with the XML Recommendation as finally approved in February 1998. The Religion set achieved XML compliance with the release of version 2.00 in September of 1998, and now the Shakespeare set joins it in the same happy state. When taken with their corresponding DTD, the plays are both valid SGML (according to nsgmls) and valid XML (according to nsgmls, the Java Software Project X parser, and the validator developed by the Scholarly Technology Group at Brown University). I invite everyone to check these files with their own favorite parsers and let me know if anything is found amiss. . ."


Revised versions of my XML-tagged Religion and Shakespeare sets are now available at:

As usual, I note that the documents in these collections do not exercise most of the features of XML, but they are real documents of fairly considerable size that are useful in trying out certain kinds of XML tools. They are also fun to read.
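As a rough illustration of the kind of tool these files can exercise, here is a minimal Python sketch that streams an XML document and tallies how often each element name occurs. It is not part of the original announcement. The function name `count_elements` and the tiny sample document are hypothetical. The sample borrows the element names `div` and `divtitle` only as placeholders and does not reproduce the structure of the actual DTDs. The standard-library parser used here checks well-formedness only; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from io import BytesIO


def count_elements(source):
    """Tally element names in an XML document (a file path or binary file object).

    Raises xml.etree.ElementTree.ParseError if the input is not well-formed.
    """
    counts = Counter()
    # iterparse streams the input, so a large file is not held in memory at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # discard content that has already been counted
    return counts


# Hypothetical miniature document, for illustration only.
SAMPLE = b"<doc><div><divtitle>One</divtitle><p>a</p><p>b</p></div></doc>"

if __name__ == "__main__":
    for tag, n in count_elements(BytesIO(SAMPLE)).most_common():
        print(tag, n)
```

To run it against one of the distributed files, pass the path of a locally downloaded copy to `count_elements` in place of the sample.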

I have taken advantage of this revision to incorporate some corrections that have been accumulating since these collections were first made publicly available in 1994. I would like to thank everyone who has contributed to this effort, especially the anonymous workers who created the ASCII texts upon which the marked-up versions of the religious works were based; Moby Lexical Tools, for putting the ASCII versions of Shakespeare's plays into the public domain; Eve Maler, for her help in getting my old SGML DTDs into XML; Simon St. Laurent, for finding a patch of bad markup in Macbeth; and Yuichi Tanaka, for a number of small corrections to the Old Testament and in particular for drawing my attention to the missing subdivisions in Psalm 119, which have been restored and are now reflected in new div and divtitle elements in the tstmt DTD.

These files may be freely distributed as long as the integrity of the sets is maintained.


 Jon Bosak, Online Information Technology Architect, Sun Microsystems
 901 San Antonio Road, MPK17-101        |  Best is he that inuents,
 Palo Alto, California 94303            |  the next he that followes
 ISO/IEC JTC1/WG4::NCITS V1::SGML Open  |  forth and eekes out a good
 Davenport Group::W3C XML WG and SIG    |  inuention.
