Publicly Available Software for SGML/XML/DSSSL
Introduction
Priority is given to "public" SGML/XML software in this document database since the scope of interest is mainly the Internet, where the ethic of public gift is highly esteemed. The wealth of SGML software made freely available for public use is evidence of that ethos. As a supplement to the links and information provided on public SGML software below, readers should consult Steve Pepper's "Whirlwind Guide to SGML Tools and Vendors." See the main bibliographic entry for the Whirlwind Guide for a document abstract and detailed information about its contents.
See also the detailed software summary for 207 products extracted from the technical report of Eila Kuikka and Erja Nikunen [updated January 1998]: (a) the full bibliographic entry, or (b) the overview in the "Commercial SGML Software" page. NICE Technologies [November 1996] also has an online database of SGML vendors and products (local archive copy).
Primary sections in this document include the following -- however infelicitous the taxonomy for software categories. See the Contents listing to link directly to a particular description.
- SGML Parsers
- SGML/HyTime Editing, Browsing, and Searching Tools
- SGML Data Conversion, Transformation, and Manipulation
- SGML Formatting Tools
- DSSSL Software Tools
- XML/XSL/XLL Software Tools
Public SGML Software: Table of Contents
- SGML Parsers
- SP: James Clark's SGML Parser Toolkit: SP
- parseDTD - DTD parser package for SP
- Graphical Front Ends for SP
- ARC-SGML: Charles Goldfarb's Almaden Research Center SGML Parser
- ASP-SGML: Jos Warmer's Amsterdam SGML Parser
- SGMLS: James Clark's SGMLS parser
- YASP: Pierre Richard's Yorktown Advanced SGML Parser (or: 'Yet Another SGML Parser')
- YAO (Yuan-Ze--Almaden--Oslo project) Parser Materials
- SGML/HyTime Editing, Browsing, and Searching Tools - For DTDs and instances
- Lennart Staflin's PSGML
- Emacs LISP Mode - sgml-mode.el
- tdtd - Emacs Macro Package for Editing SGML/XML DTDs
- Panorama: SoftQuad's SGML Viewer for WWW
- HoTMetaL: SoftQuad's HoTMetaL editor for HTML
- HyBrick - SGML/XML Browser
- The WP Project
- GRIF Symposia: "A Collaborative Authoring Tool for the World Wide Web"(HTML and XML)
- perlSGML - Perl programs and libraries (Earl Hood)
- Carthage, dpp, and Bison tools by Michael Sperberg-McQueen
- DTDParse, by Norman Walsh
- Fred - The SGML DTD/Grammar Builder
- NORMDTD (by Richard Light)
- Babble - Synoptic Text Browsing/Searching Tool
- IADS: Integrated Authoring and Display System
- SARA (SGML-Aware Retrieval Application)
- Ispell for SGML
- Syntext -- the SGML Grammar Grapher
- MtSgmlQL, the SgmlQL interpreter
- 'sgrep' grep-like searching of structured documents
- Inside & Out, from ZGDV
- MU: Forms Assisted SGML Markup
- Markus Hoenicka's SGML/DSSSL Setup for Windows NT
- SGML Data Conversion, Transformation, and Manipulation
- Rainbow
- ICA: Integrated Chameleon Architecture
- STIL - `SGML Transformations in Lisp'
- CoST (Copenhagen SGML Tool, UNIX)
- costwish - SGML postprocessor and renderer based upon CoST
- SGMLS.pm and sgmlspl: A Simple Post-Processor for SGMLS and NSGMLS
- OmniMark LE
- LT NSL and NSL (Normalised SGML Library)
- TclYasp SGML toolkit
- Python for XML/SGML Processing
- I4I S4-Desktop V2.1 SGML middleware
- SENG: SGML/Scheme Transformation Engine
- SGML-SPGrove
- SGMLC (-Lite) products for MS-Windows
- SGML Formatting Tools
- format: Thomas Gordon's QWERTZ SGML -> LaTeX formatting package
- gf: Gary Houston's general formatter program
- Jörg Wittenberger's Typeset Package
- Jörg Wittenberger's SDC Package
- SGML-Tools [Was: Linuxdoc-SGML]
- TEItools
- MetaMorphosis - SGML/XML Tree Transformer
- gmat: an SGML Publishing System
- SGML2TeX - SGML-to-TeX converter
- Ken MacLeod's Generalized Document Objects (GDO)
- tei2latex - TEILITE to LaTeX2e
- DSSSL Software Tools
- Jade - James [Clark]'s DSSSL Engine
- Jade MIF Backend
- YADE (Yet Another DSSSL Engine)
- DSC---DSSSL Syntax Checker
- DSSSL Developer's Toolkit
- Kawa - Java-based Scheme system (SENG)
- psgml-dsssl
- panodssl
- psgml-jade
- Jadetex Package
- DSSSL editing under emacs (dsssl/scheme mode)
- SGML/DSSSL Presentation Development Application
- XML/XLink/XSL Software Tools
- Lark, an XML processor
- DXP - DataChannel XML Parser
- [NXP - Norbert's XML Parser]
- Microsoft XML parser in Java (MSXML)
- XP, an XML parser in Java (James Clark)
- expat - XML parser in C (James Clark)
- [XMLTok - XML parser in C (James Clark)]
- SX - An SP application for SGML to normalized XML
- SAX - the Simple API for XML
- FREE-DOM - W3C DOM API using SAX (formerly: SAXDOM)
- Saxon: An Open-Source XSLT Processor
- XAF - an XML Architectural Forms Processor
- XML Testbed - Java XML application environment
- DAE SDK and DAE Server SDK (Copernican Solutions)
- IBM XML for Java - validating XML processor in Java
- JUMBO - XML browser/editor
- LT XML - XML toolset
- RXP XML (SGML) parser program
- XED - A WYSIWYG XML instance editor
- Ælfred XML Parser
- DataChannel XML Development Environment (DXDE)
- Tcl XML Parsing Package
- XML Editing Mode in PSGML
- XSLJ: Jade-compatible XSL-to-DSSSL translator
- docproc - an XML + XSL document processor
- DTDGenerator - XML DTD Generator
- Near & Far Designer - DTD Design Tool
- The Ace Scripting Language
- HXA/HXP - Hubick's XML Analyzer, Parser
- Microsoft XML Notepad
- xmlproc: A Python XML parser
- xmlarch.py: An XML architectural forms processor
- DB2XML
SGML Parsers
SP: James Clark's SGML Parser
[CR: 20001011]
James Clark's SP parser toolkit is the successor to his SGMLS parser. Formally, SP is "An SGML System Conforming to International Standard ISO 8879 -- Standard Generalized Markup Language" [and] "A free, object-oriented toolkit for SGML parsing and entity management."
[October 11, 2000] SP development (OpenSP) in the OpenJade project. See the "OpenJade Source Control Repository Home Page" and the project summary page. Contact Matthias Clasen. OpenSP-1.4, cache. See also OpenSP-1.5 pre-release in CVS.
[March 2000] New Version of OpenSP from the OpenJade Team. Matthias Clasen (Mathematisches Institut, Albert-Ludwigs-Universität Freiburg) has announced the availability of a new version of OpenSP (OpenSP-1.5pre1). OpenSP is a variant of James Clark's SP SGML parser, maintained by the OpenJade team. "The OpenJade team has made a prerelease of OpenSP-1.5 available at ftp://openjade.sourceforge.net/pub/openjade/OpenSP-1.5pre1.tar.gz. Changes in version 1.5 include: (1) More of Annex K supported: Common data attributes can now be specified in external entity declarations. (2) The architecture engine supports #MAPTOKEN. (3) The multibyte version of OpenSP now uses 32bit chars and supports the full UTF-16 range 0x0000-0x10ffff." Bugs in the release should be sent to the development team at jade-bugs@infomansol.com. OpenJade "is a project undertaken by the DSSSL community to maintain and extend Jade. OpenJade is distributed under the same license as Jade. Jade is James Clark's implementation of DSSSL -- Document Style Semantics and Specification Language -- an ISO standard for formatting SGML (and XML) documents."
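The move to 32-bit characters matters because code points above 0xFFFF do not fit in a single 16-bit unit; UTF-16 encodes them as a pair of surrogate units. The following Python sketch illustrates that arithmetic only. The function name is hypothetical and nothing here is taken from the OpenSP sources.

```python
def utf16_units(code_point):
    """Return the list of UTF-16 code units for one Unicode code point.

    Illustrative helper (hypothetical name), not part of OpenSP.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the range 0x0000-0x10ffff")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not characters")
    if code_point < 0x10000:
        # Fits in one 16-bit unit.
        return [code_point]
    # Split the 20-bit offset into a high and a low surrogate.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]
```

A parser that stores one character per 16-bit unit would have to treat the two halves of such a pair as separate characters, which is presumably what the wider internal character type avoids.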
[March 10, 1998] See the announcement from James Clark for the public availability of SP version 1.3 and Jade version 1.1. "The main change in SP 1.3 is better support for XML based on the Web SGML TC. In Jade 1.1 the main changes are the experimental extensions for XSL (documented in dsssl2.htm), and the use of XML for the FOT backend's output." See Clark's Web site for detailed information. Note to SP and Jade users who depend upon the architectural processing support: the appropriate ArcBase processing instruction is now <?IS10744 ArcBase DSSSL>, and no longer <?ArcBase DSSSL>; SP and Jade will now require the former, on penalty of an error message (ca.) "jade:E: specification document does not have the DSSSL architecture as a base architecture. . ." or similarly. Thanks to Eliot Kimber (ISOGEN International) for clarification on this point. Also: Jade 1.1 and sp 1.3 for OS/2 provided by David J. Birnbaum.
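For users with many existing DSSSL specifications, the ArcBase change described above can be checked mechanically. The following Python sketch finds old-style instructions and rewrites them to the new form. The function names are hypothetical, and the patterns assume the processing instruction is written exactly as shown in the note (a single line, names separated by plain whitespace); it is not a tool from the SP or Jade distributions.

```python
import re

# Old form, no longer accepted according to the note above: <?ArcBase DSSSL>
OLD_PI = re.compile(r"<\?ArcBase\s+([^>]+?)\s*>")
# New form: <?IS10744 ArcBase DSSSL>
NEW_PI = re.compile(r"<\?IS10744\s+ArcBase\s+([^>]+?)\s*>")


def find_arcbase(text):
    """Return (new_style_names, old_style_names) found in text."""
    return NEW_PI.findall(text), OLD_PI.findall(text)


def upgrade_arcbase(text):
    """Rewrite every old-style ArcBase instruction to the IS10744 form."""
    return OLD_PI.sub(r"<?IS10744 ArcBase \1>", text)
```

Running `upgrade_arcbase` twice leaves the text unchanged, since the old-style pattern cannot match an instruction that already begins with `<?IS10744`.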
[February 16, 1998] An announcement from James Clark for a new test release of SP (version 1.2.92) and Jade (version 1.0.93). The main changes in Clark's SP package since version 1.2.91 are enhanced support for XML based on the final WebSGML Adaptations Annex (ISO 8879 Annex K) and the inclusion of the SX application (for converting SGML to normalized XML). [SP version 1.2.92 and Jade version 1.0.93, sources, archive copy]; [SP version 1.2.92 and Jade version 1.0.93, Win32 binaries, archive copy]
[October 17, 1997] An announcement from James Clark describes a test release of SP with improved XML support. This test/experimental version is available via FTP as part of a Jade test release: source, or Win 32 binaries. In this distribution, SP supports "a number of key features from the WebSGML SGML TC," including: unbundling of SHORTTAG, feature to allow elements declared EMPTY to have end-tags, duplicate enumerated attribute tokens are allowed, support for multiple ATTLIST declarations for a single element type, relaxation of rules on use of parameter entity references inside groups, feature that turns off SGML's traditional record end rules, NESTC (net-enabling start tag close) delimiter, support for predefined single character entities in the SGML declaration (lt, amp etc), etc. See the text of the announcement for full details about this SP test release.
[September 03, 1997] As of this time, the most recent version of SP is also available as part of James Clark's Jade package.
[October 28, 1997] Announcement from James Clark for a "very preliminary release of SX, an application built with the SP library for converting SGML to XML." This tool will eventually be included in the standard SP distribution. SX (the provisional name) "parses and validates the SGML document contained in sysid... and writes an equivalent XML document to the standard output. SX will warn about SGML constructs which have no XML equivalent." The distribution includes both source and Win 32 binaries (the sp120u.dll file included in the SP 1.2.1 Win32 Unicode binary distribution is required). Note that the program "does not yet provide enough to handle the situation where you want to migrate your document source from SGML to XML. In particular it doesn't try to preserve entity references; all entities are expanded."
Note: this paragraph is not up-to-date for SP version 1.2, released in September 1997; see the official documentation, and/or the links in the description of SP version 1.2. . . The current version is SP 1.1.1 (July 30, 1996). SP is a "free, object-oriented toolkit for SGML parsing and entity management." SP is written in C++, supports the LINK feature, is reentrant (a single process can use multiple parsers at the same time), is command-line compatible with SGMLS, includes an application [nsgmls] to generate sgmls-style output format, and an application [rast] to generate RAST output format (like SGMLS) conforming to ISO/IEC 13673. Other parser tools include [sgmlnorm], a simple SGML tag normalizer, and [spent], a facility for printing an SGML entity on standard output. SP supports any concrete syntax allowed by ISO 8879, and supports large character sets (can be compiled to use 16-bit characters internally; supported systems include UTF-8, Unicode/UCS-2, UJIS/EUC, and Shift-JIS). It is said to be fast for large documents. In addition to the C++ source code, binaries [nsgmls and rast] are available for MS-DOS (SP version 0.2) and several UNIX systems. The MS-DOS binaries use a 32-bit DOS extender (included in the distribution), so that the MS-DOS 640K conventional memory barrier should not be a limiting factor in the use of SP.
In the most recent releases of SP, James Clark has also issued some very useful tools that handle entities and "normalize" SGML documents in various ways, as specified in command line options. For example, SPAM (SP Add Markup) will provide canonical SGML when SHORTTAG and OMITTAG have been used in the SGML source. The output SGML is determined by the user's specification, so SPAM serves as a markup stream editor. See the documentation from the official site for complete details. Version 1.1 also supports Architectural Form Processing [mirror copy]; on this, see the following "toy example".
[April 10, 2000] XML Base Architectures in SP. Steve Newcomb writes: "You can now use SP to validate the conformance of XML documents to base architectures (meta-DTDs). TechnoTeacher has created a version of SP with full industrial-strength support for the alternative PI-based "Base Architecture Declaration" syntax. The enhancement builds on pioneering work done by Luis Martinez while he was working at TechnoTeacher, and it has recently been brought up to industrial strength by Peter Newcomb. Because of urgent need in certain industrial quarters (mortgage, healthcare, etc.), we've placed binaries of this version of SP at our FTP site: ftp://ftp.techno.com/TechnoTeacher/SPt..." [cache]
[September 1996] Commercial support for SP is provided by TechnoTeacher, Inc. - NB, James Clark himself has no commercial connection with TechnoTeacher, Inc. See the support announcement.
[November 25, 1997] See the announcement for a GC-enabled spgrove application, from Vladimir V. Tsychevski.
Other links:
- [September 03 [09], 1997] Announcement from James Clark for the release of SP version 1.2 -- the version of SP included with Jade version 1.0. New features in SP version 1.2 (other than bug fixes) are as follows: (1) "The Extended Naming Rules TC is supported. The extensions supported in external concrete syntaxes have been changed for compatibility with this [Extended Naming Rules were specified in Annex J of ISO 8879:1986, added by the 1996 TC = TC for Extended Naming Rules for SGML: N1896Rev]; (2) The handling of character sets in the multi-byte version is more sophisticated. The character sets HTML page gives more information.; (3) SP has built-in knowledge of many more base character sets; (4) nsgmls will report empty elements if the -oempty option is used." SP 1.2 etc. "adds support for (XML) documents that are merely well-formed. This is enabled by using -wno-valid. There's also an undocumented -wxml switch that warns about various things that are legal SGML but not XML." See the main SP page on James Clark's WWW server for the full documentation. SP 1.2 is available in several packages, including source code and binaries with Unicode support for Windows 95 and Windows NT.
- Hints about enhancements possibly in SP version 1.2, from test version 1.1.2; see summary on the "What's New" page (February 18, 1997). Note the update from James Clark: "A new release of SP is available as part of Jade 0.5 from ftp://ftp.jclark.com/pub/jade/jade0_5.zip. This fixes the compilation problems with gcc as well as a couple of other minor glitches. This SP release should be considered a beta release." [February 21, 1997]
- Announcement from James Clark for version 1.1.1 of SP. Version 1.1.1 represents a minor revision: "The only serious bug 1.1 is [was] the incorrect handling of colons in SGML_CATALOG_FILES on MS-DOS and Windows machines."
- For configuration of SP and Jade, note that Henry S. Thompson (HCRC Language Technology Group, University of Edinburgh) also has a 'configure' file (uses 'install' - from X11R5, mit/util/scripts/install.sh); it has been tested for Jade 1.1. [local archive copy, 1998-09-25]; [local archive copy, earlier version]
- Compilation notes: Notes on compiling and installing SP 1.0.1 for several systems, by Nelson H. F. Beebe (Email: beebe@math.utah.edu) [ mirror December 26, but use the canonical version if possible]. As of 15-November-95, Nelson Beebe had successfully compiled SP 1.0.1 on these systems: DEC Alpha OSF/1 3.0, DECstation 3100 and 5000 ULTRIX 4.3, Hewlett-Packard 9000/735 HP-UX 10.0.1, IBM RS/6000 AIX 3.2.5, Silicon Graphics Indigo/2 IRIX 5.3, Sun SPARCstation SunOS 4.1.3, and Sun SPARCstation Solaris 2.3 and 2.4.
- Programming with SP
- A mailing list for programmer-level discussions of SP. See also the entry in the lists page. Mail subscription requests to sp-prog-request@jclark.com. Messages for the list should go to sp-prog@jclark.com.
- Nelson Beebe's collection of binaries for various Unix machines
- [*NB June 10, 1996. The following links are somewhat out-of-date; see the main site] Some very accessible written description and documentation (HTML format) from a recent [June 10, 1996] version (test release 1.1). See the current and test releases (JClark FTP server) for more current and more complete information. See here provisionally (with some incomplete linking):
- Summary of SP's features
- What's new in SP?
- How to get SP
- nsgmls, a replacement for sgmls [incompletely linked here]
- spam, a sophisticated normalizer, perhaps better thought of as a markup stream editor [incompletely linked here]
- Generic API to SP
- Catalogs: Using SGML Open catalogs to generate system identifiers
- sp-1.1.1 with gcc 2.7.2 under Solaris 2.5 (binaries)
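The nsgmls switches quoted in the SP 1.2 announcement above (-oempty, -wno-valid, -wxml) can be assembled into a command line as in the following Python sketch. Only those three switches come from the announcement; the function name, its keyword arguments, and the default program name are assumptions made for illustration, and the sketch builds the argument list without running anything.

```python
def nsgmls_command(document, report_empty=False, well_formed_only=False,
                   xml_warnings=False, program="nsgmls"):
    """Build an argument list for nsgmls (hypothetical helper).

    The switches are the ones quoted in the SP 1.2 announcement; see the
    official SP documentation for their exact behaviour.
    """
    argv = [program]
    if report_empty:
        argv.append("-oempty")      # report empty elements in the output
    if well_formed_only:
        argv.append("-wno-valid")   # accept documents that are merely well-formed
    if xml_warnings:
        argv.append("-wxml")        # warn about constructs legal in SGML but not XML
    argv.append(document)
    return argv
```

On a system where SP is installed, the returned list could be passed to `subprocess.run`; whether the undocumented -wxml switch behaves the same across SP releases is not something this sketch can confirm.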
Pointers to the latest released version of the SP parser (version 1.0.1: October 21, 1995) and its description:
- SP - an SGML Parser (Official WWW Page for SP)
- SP source-code changes for "port of James Clark's SP version 1.1.1 to Mac PowerPC as a set of MPW tools, including Open Transport support for HTTP" December 1996. [Ashley Colin Yakeley]
- FTP to JClark. Data is mirrored on the SGML Repository FTP server
- FTP to Darmstadt
- WWW link to JClark
- Overview of SP (described by James Clark) with links to FTP server
- Formatted man pages for (version 0.2) nsgmls and rast applications. [Note: get version 0.3 now]
- See also: David Megginson's SGMLS.pm: A Post-Processor for SGMLS and NSGMLS
- [July 16, 1998] Porting SP 1.3 to Macintosh, by Peter Robinson
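The SP 1.1.1 release note above concerns colons in SGML_CATALOG_FILES on MS-DOS and Windows. The following Python sketch shows why a colon is an awkward list separator on those systems. It assumes, as an assumption not confirmed by this page, that the variable holds a list of catalog file names separated by colons on Unix and by semicolons on MS-DOS and Windows; the function name is hypothetical.

```python
import os


def catalog_files(value=None, sep=None):
    """Split an SGML_CATALOG_FILES-style value into a list of file names.

    Hypothetical helper: `sep` defaults to the platform path separator
    (":" on POSIX, ";" on Windows), which is an assumption about SP.
    """
    if value is None:
        value = os.environ.get("SGML_CATALOG_FILES", "")
    if sep is None:
        sep = os.pathsep
    # Drop empty entries produced by leading, trailing, or doubled separators.
    return [name for name in value.split(sep) if name]
```

Splitting a Windows path such as `C:\sgml\catalog` on ":" separates the drive letter from the rest of the path, which is the general kind of problem the release note describes; the sketch does not claim to reproduce the actual bug.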
parseDTD - DTD parser package for SP
[CR: 19980612]
[February 06, 1998] From Peter Newcomb, of TechnoTeacher Inc.: parseDtd. It parses an SGML declaration set in the absence of a document (e.g., can parse a DTD and spit out information about the elements and attributes defined in it). It is based on the SP SGML parser, version 1.2.1, written by James Clark. Peter's description: "I recently put together a small SP-based package that parses declaration sets irrespective of particular documents, returning the result as an SP DTD object."
Links:
- Information: (FTP Directory)
- Sources: ftp://ftp.techno.com/TechnoTeacher/parseDtd/parseDtd.zip
- [June 12, 1998] Patch to update parseDtd for SP 1.3
- The README file; [local archive copy, 980508]
- Some discussion about parseDTD in email
- Source, local archive copy, 980508
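To give a rough idea of the kind of report described above (information about the elements declared in a DTD, with no document instance present), here is a toy Python sketch. It does not use SP or parseDtd and is not a real SGML parser: it assumes simple declarations with no parameter entities, no name groups, and no ">" inside declaration comments. The function name is hypothetical.

```python
import re

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)>", re.DOTALL)


def list_elements(dtd_text):
    """Map each declared element name to the rest of its declaration.

    Toy scanner for simple DTDs only; a real tool would build on an
    SGML parser such as SP.
    """
    text = COMMENT_RE.sub("", dtd_text)  # ignore comment declarations
    return {name: " ".join(rest.split())
            for name, rest in ELEMENT_RE.findall(text)}
```

For a declaration such as `<!ELEMENT memo - - (to, body)>` the result maps `memo` to `- - (to, body)`, that is, the omitted-tag minimization followed by the content model.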
Graphical Front Ends for SP
[CR: 19971028]
Probably there are several such front ends. [Please let me know what's missing in the list below.]
- SP Wizard: Advertised functionality: ". . . a freeware 32 or 16 bit Windows interface using OLE Automation wrappers around NSGMLS and SPAM. (1) Allows you to interactively change settings of all command line parameters and environment variables. (2) Allows multiple files to be parsed at the press of a button. (3) Displays clickable error messages which puts the cursor in front of the offset within the line that was in error. (4) Allows you to correct errors as you find them. (5) Search and Replace. (6) Undo up to 32000 characters at multiple levels. (7) Prints reports of error messages and files that parsed with no errors. (8) OLE Automation for NSGMLS, SPAM and execution of DOS programs which can be used from Visual Basic and Visual C++. (9) All SP files were taken from the SP 1.1.1 distribution."
- Apropos of the above: Announcement from Larry Robertson for "a web page with a sample program and some notes on the Grove OLE Automation class. . . The Grove OLE Automation Class is basically intended for parsing and fully supports the 9401 catalog; it is extremely fast and easy to use." Title: How to use the Grove OLE Automation Class in Visual Basic 5.0. "The sample program will batch parse sgml and html files. It will print reports has a very simple editor." [September 13, 1997]
- CSW Parser Plus. "CSW Parser Plus is a graphical front end for the popular SP parser, running under Windows NT/95. With CSW Parser Plus, its easy to set up options for the SP parser and process SGML files one at a time, or in batches. . . CSW Parser Plus is packed with useful features to help set up and run the SP parser, including: (1) set the SGML Declaration and DTD; (2) process one document file, or a batch of files; (3) view errors on screen, or redirect to a file; (4) set warning and output options; (5) define locations for multiple catalog files; (6) launch editors and processing tools"
- RUNSP2: a user-friendly Windows shell for NSGMLS, from Richard Light. "RUNSP2 is designed to let you run the NSGMLS parser in a Windows environment. It provides standard Windows facilities for opening a file to be parsed and running the parser, but goes beyond that by 'reading' the error messages, and providing a helpful editing environment in which the user can correct the errors found. The original idea was to support all the command-line options of NSGMLS via menu options or a dialog box, and I will go on to do this if the basic idea works well enough to justify the effort. At present this program just runs the parser (NSGMLS) and the simple normalizer (SGMLNORM). Later, I may extend it to run all the programs in the SP suite." source, and local archive copy [September 18, 1997].
- See also Groves and Grove Plans in SGML/DSSSL/HyTime
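The front ends above all "read" the parser's error messages in order to place the cursor at the offending line and offset. A minimal Python sketch of that step follows. It assumes messages of the form `program:file:line:column:severity: text`, a layout consistent with the "jade:E: ..." message quoted earlier on this page but not specified here; the function and field names are hypothetical, and messages that carry no file location are simply not matched.

```python
import re

# Assumed layout: program:file:line:column:severity: text
# The file part is matched lazily so that Windows drive letters ("C:\...")
# stay inside the file name.
MESSAGE_RE = re.compile(
    r"^(?P<program>[^:]+):(?P<file>.+?):(?P<line>\d+):(?P<column>\d+):"
    r"(?:(?P<severity>[A-Z]):)? (?P<text>.*)$"
)


def parse_message(line):
    """Return a dict describing one located parser message, or None."""
    match = MESSAGE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    fields["column"] = int(fields["column"])
    return fields
```

An editor shell would then open `fields["file"]` and move the cursor to the reported line and column.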
ARC-SGML: Charles Goldfarb's Almaden Research Center SGML Parser
ARC-SGML was one of the first SGML parsers to be made publicly available, and it provided the basis for the development of SGMLS by James Clark.
- ARC-SGML from the SGML Repository
- ARC-SGML from Exeter
SGMLS: James Clark's SGMLS parser
[CR: 19970909]
SGMLS is probably the most widely used "public domain" parser as of late 1994. It has been incorporated as a validating parser into several commercial products as well. It is superseded now in part by James Clark's "SP" parser (and perhaps by the YASP and YAO parser materials) though for many simple validation tasks, SGMLS remains quite useful. SGMLS is also very fast. Its output is intended for a structure-oriented application, and this output is trivially parsable. SGMLS has been ported to many platforms, including OS/2.
- Get SGMLS Source (James Clark): Remote file ftp.jclark.com/pub/sgmls/
- SGMLS sources mirrored at SGML Repository
- SGMLS sources mirrored at Exeter
- [September 09, 1997] Macintosh versions of the SGMLS parser are available from the Brown University Scholarly Technology Group SGML Archives, maintained [September 1997] by David G. Durand. URLs: (1) 68K version: ftp://ftp.stg.brown.edu/pub/sgml/sgmls_68K.hqx; [local archive copy]; (2) FAT version: ftp://ftp.stg.brown.edu/pub/sgml/sgmls_FAT.hqx; [local archive copy]; (3) Power PC version: ftp://ftp.stg.brown.edu/pub/sgml/sgmls_PPC.hqx; [local archive copy]. [thanks to Elli Mylonas for the URLs]
- An SGMLS help file prepared by Michael Sperberg-McQueen [September 1993, revised January 1994] explains the SGMLS entity manager's use of the environment variable SGML_PATH and other strategies for locating entities. The document is available in HTML format: "Notes on sgmls handling of search for entities" [mirror copy, January 1996]. Or obtain it via FTP from the SGML Repository or via email from the TEI/UICVM Listserver. In the latter case, send email to listserv@uicvm.uic.edu with the command GET SGMLSENT DOC TEI-L in the body of the message. Or get a text version of the help file from the local WWW server.
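The claim above that SGMLS output is "trivially parsable" can be illustrated with a short Python reader. The sketch assumes the line-oriented layout in which each output line begins with a one-character command: "(" and ")" for element start and end, "A" for an attribute of the next start-tag, "-" for data, "?" for a processing instruction, and "C" for a conforming document, with backslash escapes inside data. That layout is an assumption here and should be checked against the sgmls/nsgmls documentation; the function names are hypothetical and other commands are ignored.

```python
import re

ESCAPE_RE = re.compile(r"\\(\\|n|\||[0-7]{3})")


def unescape(text):
    """Decode the backslash escapes assumed for data and attribute values."""
    def replace(match):
        token = match.group(1)
        if token == "\\":
            return "\\"
        if token == "n":
            return "\n"          # record end
        if token == "|":
            return ""            # SDATA bracket, dropped in this sketch
        return chr(int(token, 8))  # \nnn octal character code
    return ESCAPE_RE.sub(replace, text)


def parse_output(lines):
    """Yield (event, payload) tuples from sgmls-style output lines."""
    attributes = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        command, rest = line[0], line[1:]
        if command == "A":
            # "Aname TYPE value", or "Aname IMPLIED" with no value.
            name, _, tail = rest.partition(" ")
            kind, _, value = tail.partition(" ")
            attributes[name] = None if kind == "IMPLIED" else unescape(value)
        elif command == "(":
            yield ("start", (rest, attributes))
            attributes = {}
        elif command == ")":
            yield ("end", rest)
        elif command == "-":
            yield ("data", unescape(rest))
        elif command == "?":
            yield ("pi", unescape(rest))
        elif command == "C":
            yield ("conforming", None)
        # Any other command character is skipped.
```

Because attribute lines precede the start-tag they belong to, the reader only needs to remember the attributes seen since the last start event; no lookahead or nesting stack is required to tokenize the stream.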
YASP: Pierre Richard's Yorktown Advanced SGML Parser (or: 'Yet Another SGML Parser')
[CR: 19970405]
- [April 1997.] Announcement from Christophe Espert (Electricité de France, Direction des Etudes et Recherches) for a new release of the YASP SGML parser interface. YASP has been implemented as a DLL for Windows NT and Windows 95, but the source code may also be compiled on Unix and other systems. The new version of YASP (1.36) has functionality "that will help enhance GROVE building in applications. YASP now reports ELEMENT, ATTLIST, NOTATION and ENTITY declarations as it parses them. YASP still gives access to the fully resolved DTD after the document prolog has been parsed. Therefore objects of classes in the PRLGABS0, PRLGABS1 and PRLGSDS modules can be built."
- Announcement from Christophe Espert (Electricité de France, Direction des Etudes et Recherches) for the availability of YASP ('Yet Another SGML Parser', developed by Pierre G. Richard), on Windows 95 and Windows NT. August 27, 1996. URL: ftp://ftp.edf.fr/pub/SGML/YASP.
- Announcement from Christophe Espert for a new distribution package for YASP, for DOS and Windows (July 1996); [winyasp.zip, 1258734 bytes] "It includes source code, documentation and binaries for Windows. The YASP library is a Dynamic Link Library. It has been built with Visual C++. . ."
- April 1997 sources: ftp://ftp.edf.fr/pub/SGML/YASP; archive copy
- April 1997: documentation in PDF format
- FTP YASP from the SGML Repository
- FTP YASP from Exeter
- FTP: ftp://ftp.edf.fr/pub/SGML/YASP (A new package for the YASP parser, available for UNIX; from Christophe ESPERT [Christophe.Espert@der.edf.fr], February 1996)
- See also the TclYasp SGML toolkit
YAO (Yuan-Ze--Almaden--Oslo project) Parser Materials
- FTP YAO from the SGML Repository
- See the description of the Project YAO: "Project YAO Announced [December 7, 1993]," <TAG> 7/1 (January 1994) 20.
- Pekka Kataja's UNIX port of YAO
PSGML, by Lennart Staflin
[CR: 20001201]
PSGML is described as "a major mode for editing SGML and XML documents. It works with GNU Emacs 19.34, 20.3 and later or with XEmacs 19.9 and later [perhaps also Lucid Emacs 19.9, OEmacs, NTEmacs]. PSGML contains a simple SGML parser and can work with any DTD. Functions provided includes menus and commands for inserting tags with only the contextually valid tags, identification of structural errors, editing of attribute values in a separate window with information about types and defaults, and structure based editing." David Megginson's personal testimonial: "XEmacs+PSGML is my editor of choice for all of my XML and SGML work. I've used it to create probably close to 10,000 printed pages of documentation over the last few years, and have used XEmacs's regular-expression facilities for adding complex markup to e-texts. It's probably not suitable for naive users (give 'em XMetaL or WordPerfect, or maybe XED), but for the tech-savvy, it's great." [XML-DEV]
[December 06, 2001] "Using Emacs for XML Documents. Install add-ons to the powerful Emacs text editor to build a platform-independent (and free) environment for working with XML." By Brian Gillan (Software engineer, ID Technology and Design Group, IBM). From IBM developerWorks XML Zone. December 2001. ['Emacs, best known as a powerful text editor for UNIX developers, can be an ideal XML editor for MS-DOS, Windows, and MacOS. The author describes how to install the right add-on packages and modify settings to create a powerful XML/SGML editing-and-validation environment in Emacs with extensions such as PSGML and OpenSP. Most of the work involved in setting up this environment ends with downloading and installing Emacs and the individual packages, but you must also configure Emacs properly and enable the DTDs you plan to work with. The article includes sample configuration files and XHTML DTDs.'] "Though it's best known as a powerful text editor favored by UNIX developers, Emacs can be used to work with XML in non-UNIX platforms such as Windows, MS-DOS, and MacOS. Emacs works as a full-blown development environment for processing text, writing applications, and, as I'll discuss, creating structured information like XML and SGML. I use it as a general-purpose editor for creating and managing some of my programming projects, and for writing XHTML and playing around with SGML and XML. In fact, I used it to write this article. This article tells how to install Emacs and the extensions PSGML and OpenSP. It also outlines how to customize Emacs to make it function with a variety of DTDs. I present many of the Emacs customizations one piece at a time. However, you can download a zip file with sample DTDs and all of the Emacs customizations. My intent is to get you started using Emacs by providing you with just enough information for you understand what's going on. Then you'll be able to add DTDs and customize Emacs based on your needs and preferences..." 
PSGML version 1.2.3 was released on SourceForge November 8, 2001; see the download. [PSGML version 1.2.3, November 8, 2001, cache]
[December 01, 2000] Update notice 2000-10-27. "The future of PSGML: It is currently not in active development. I plan to put out one or two bug fix releases and the move the sources to source forge (possibly after restructuring the code a bit and merging in various patches and additions that has been send to me.) I will then invite others to take an active part in the future development of PSGML. To start this I have created two mailing lists on source forge. A psgml-user for general discussion and questions about PSGML and psgml-devel for discussion about the future development of PSGML. Visit the SourceForge: Mailing Lists for PSGML page for subscription information..."
- Description HTML version of PSGML
- [March 2001] See the source for PSGML version 1.2.2, from SourceForge.
- [October 14, 1999] Staflin released a beta version (1.2.0) with XML editing support. [local archive copy]
- [1999-10-14] Kai Grossjohann described a problem with incompatible system identifiers when using psgml to edit XML documents; David Megginson supplied the lisp code for a provisional fix.
- See also David Megginson's enhancements for XML Editing Mode in PSGML and psgml-dsssl (DSSSL editing mode). Updated 980223 and possibly later.
- Miyashita Hisashi has reportedly implemented a version of PSGML-XML that works on Meadow. Meadow ('Multilingual enhancement to gnu Emacs with ADvantages Over Windows') is a fully internationalized version of Emacs20 on MS Windows.
- Version 1.0.1 (November 20, 1996); [archive copy]
- [December 16, 1998] Bob DuCharme posted an announcement for the online availability of Chapter 2 of his book, SGML CD: "Editing SGML Documents with the Emacs Text Editor." This Adobe Acrobat version of Chapter 2 (99 pages) "assumes no initial knowledge of Emacs and provides a basic introduction to creating and navigating simple text files before it covers PSGML - Lennart Staflin's add-in that turns Emacs into a menu-driven, validating, SGML/XML editor." Bob says: "The SGML CD book is a tutorial and user's guide to free SGML/XML software, and you can link to all the software from the web page whether you want to buy the book or not. I have my own time- and keystroke-saving PSGML tricks (mostly in the form of .emacs lines) and I'm curious about those of other PSGML users, so I'll be posting a Web page of my own and soliciting those of others to add in a few weeks. Feel free to send them to me anytime; I'll credit all contributors."
- See Markus Hoenicka's SGML/DSSSL Setup for Windows NT - including PSGML
- Editing SGML with Emacs and PSGML - Manual
- PSGML and Fonts. David Megginson explains how to map font faces to any or all of the symbols 'comment', 'doctype', 'end-tag', 'entity', 'ignored', 'ms-end', 'ms-start', 'pi','sgml', 'short-ref', 'start-tag' and so forth. This works! [June 1997]
- Another discussion (TEI-L) on fontifying/colorizing with PSGML; see also (in greater detail) David Megginson's recipe above.
- SGML: Lysator PSGML (Remote file ftp.lysator.liu.se/pub/sgml)
- FTP PSGML from the SGML Repository
- FTP PSGML from Exeter
- Setting up PSGML and sgmls for HTML, or try: this link; (courtesy of Martijn Koster, m.koster@nexor.co.uk)
- [October 14, 1998] PSGML setup instructions, provided by Peter Flynn
- [August 09, 1997] Announcement from David Megginson (Microstar Software Ltd.) for initial enhancements of PSGML to enable an XML editing mode: ". . . I patched PSGML to add an XML mode that enables XML-specific delimiters, parsing, and error-reporting -- in other words, it's a real, native XML DTD-driven editor." The new code for XML support has not yet been incorporated into the main psgml distribution, but Megginson is requesting assistance from qualified alpha testers to help debug the code.
tdtd - Emacs Macro Package for Editing SGML/XML DTDs
[CR: 20011102]
[June 09, 2001] The web site URL for 'tdtd -- Emacs Major Mode for SGML and XML DTDs' is http://www.menteith.com/tdtd/. The latest version is 0.7.1. Features of tdtd revision 0.7.1 include: "(1) Standalone mode for editing DTDs; (2) "Goto" menu for locating declarations within the current buffer; (3) dtd-etags function for creating Emacs TAGS files for easy lookup of any element, parameter entity, or notation's definition using Emacs's built-in tag-lookup functions; (4) dtd-grep function for searching files that shares a file history with dtd-etags for easy searching of the same files with both functions; (5) Specific font lock highlighting of declarations in XML DTDs, SGML DTDs, SGML Declarations, and System Declarations so that the important information stands out; (6) XML-specific behaviour that, at user option, is triggered by automatic detection of the XML Declaration; (7) Functions for writing and editing element, attribute, internal parameter entity and external parameter entity declarations and comments to ease creating and keeping a consistent style; and (8) Elements and parameter entity names referenced in declarations are stored in minibuffer history to minimise retyping in new declarations..." [cache version 0.7.1]
In March 1999, Tony Graham (Mulberry Technologies, Inc.) released an updated version of his tdtd 'Emacs Major Mode for SGML and XML DTDs'. Features in revision 0.7: "(1) Standalone mode for editing DTDs; (2) dtd-etags function for creating Emacs TAGS files for easy lookup of any element, parameter entity, or notation's definition using Emacs's built-in tag-lookup functions; (3) dtd-grep function for searching files that shares a file history with dtd-etags for easy searching of the same files with both functions; (4) Specific font lock highlighting of declarations in XML DTDs, SGML DTDs, SGML Declarations, and System Declarations so that the important information stands out; (5) XML-specific behaviour that, at user option, is triggered by automatic detection of the XML Declaration; (6) Functions for writing and editing element, attribute, internal parameter entity and external parameter entity declarations and comments to ease creating and keeping a consistent style; (7) Elements and parameter entity names referenced in declarations are stored in minibuffer history to minimise retyping in new declarations."
[August 03, 1998] Update of the tdtd emacs macro package for editing SGML/XML DTDs.
[May 27, 1998] The tdtd Emacs Macro Package for editing SGML/XML DTDs was updated by Tony Graham on May 24, 1998. Version 0.5.1 features: "1) dtd-etags function for creating Emacs TAGS files for easy lookup of any element, parameter entity, or notation's definition using Emacs's built-in tag-lookup functions; 2) Font lock highlighting of declarations so that the important information stands out; 3) XML-specific behaviour that, at user option, is triggered by automatic detection of the XML Declaration; 4) Functions for writing and editing declarations and comments to ease both creating and keeping a consistent style."
Previously: Tony Graham (Mulberry Technologies, Inc.) announced the availability of a tdtd Emacs Macro Package for editing DTDs (revision 3, December 14, 1997). The macro package was presented in a poster session at SGML/XML '97. The macros have been developed "intermittently over the last two years." Tony says: "The tdtd macro package for an Emacs major mode for editing DTDs is available at ftp://ftp.mulberrytech.com/pub/tdtd. The package includes font lock keywords for colour highlighting of declarations and reserved words plus a collection of macros that help when writing DTDs. The dtd-mode is a derived mode that builds on sgml-mode, and the features of sgml-mode are still available." The author will gladly accept bug reports and/or enhancements.
Links:
- [March 22, 1999] tdtd Version 0.7, March 15, 1999. [local archive copy] See also the 0.7 README document.
- [August 03, 1998] Announcement for the 0.6 release of tdtd.
- The current revision is 0.6, dated August 1, 1998 [or later].
- Sources, version 0.6, archive copy
- Version 0.6 README
- [May 27, 1998] Announcement for the 0.5 release.
- [April 22, 1998] Update of the macro package to version 0.4. Changes to 'tdtd-font.el' include the addition of '(WWW)' and 'xml' as reserved words.
- Sources via FTP: ftp://ftp.mulberrytech.com/pub/tdtd
- README document
- Local archive copy, version 0.5
- Local archive copy, revision 4; April 21, 1998.
- Local archive copy, revision 3; December 1997.
- Also by Tony K. Graham of Mulberry Technologies, Inc.: xslide. The xslide package features an Emacs major mode for editing XSL stylesheets.
Panorama: SoftQuad's SGML Viewer for WWW
[CR: 19980408]
SoftQuad Panorama is a free version of SoftQuad Panorama PRO. It supports browsing (and searching?) of fully compliant SGML documents on the WWW.
- Panorama is now released (May 1995) as the "First Freeware SGML Viewer for the World Wide Web". See the public announcement and the Information Page: The Wider World of SGML on the Web. If you already have Panorama, link here
- PanoramaFree for Windows 3.1; [mirror copy]
- Register/Download Panorama Viewer [May 1997]
- The software and documentation are also available from sites in Sweden
- A list of links and resources from the University of Michigan Humanities Text Initiative: HTI Resources in support of Panorama (DTDs, style sheets, navigators, SDATA mapping files, etc.)
- University of Michigan Press ISO 12083 stylesheet (Panorama); [mirror copy]
- See provisionally the description of the commercial version, called Panorama PRO [announcement of the pre-release edition of Panorama PRO, 8-April-95]
- See a brief overview of features in Panorama's style sheet language
- Notes on Panorama for users of the EAD DTD [principles applicable to other complex DTDs]. From Stephen D. Miller. [mirror copy of help document, .ZIP file with help, catalog, and entityrc]
- See help information for use of Panorama with the TEI (Lite) DTD
- See the description of SoftQuad Panorama on SoftQuad's WWW server [from a press release, October 19, 1994]; see also the announcement and feature list in mirror copy here.
- See "SoftQuad Panorama -- A Companion for Mosaic," <TAG> 7/11 (November 1994) 9.
- Eliot Kimber: Nifty...Panorama
- Scholar's Press public domain Greek font (SPIonic), and an sdata.map for adding support for SPIonic to SoftQuad's Panorama
HoTMetaL: SoftQuad's HoTMetaL editor for HTML
HoTMetaL is an unsupported version of the commercial product HoTMetaL Pro. It provides an editor/browser for (extended) HTML documents. HoTMetaL is available on a number of platforms (UNIX, MS-Windows, etc.). A tutorial for HoTMetaL Pro teaches HTML basics, supported by an HTML Quick Reference guide. The most recent [March 1995] Windows version of HoTMetaL supports some of the Netscape extensions (e.g., <CENTER>, <BLINK>), displays graphics inline, uses a stylesheet configured to look like a standard HTML browser, and supports a filter for loading plain text files and invalid HTML documents. See the posted public announcement or the fuller description on the SoftQuad server, including FTP location. Try the FTP directory ftp://ftp.ncsa.uiuc.edu/Web/html/hotmetal/Windows, and specifically the binary file ftp://ftp.ncsa.uiuc.edu/Web/html/hotmetal/Windows/hotm1new.exe.
- FTP from SGML Repository
- FTP from Exeter
- HoTMetaL executables (Remote file ftp.ncsa.uiuc.edu/Web/contrib/SoftQuad/hotmetal)
Other mirror FTP sites list for HoTMetaL
Connect to the SoftQuad server for a recent list of FTP sites in the US, Canada, and Europe that host HoTMetaL. The FTP links below are older, but may still be alive:
- ftp.ncsa.uiuc.edu:/Mosaic/contrib/SoftQuad
- ftp.ifi.uio.no:/pub/SGML/HoTMetaL
- sgml1.ex.ac.uk:SoftQuad
- doc.ic.ac.uk:/pub/packages/WWW/ncsa/contrib/SoftQuad
- askhp.ask.uni-karlsruhe.de: /pub/infosystems/mosaic/contrib/SoftQuad
- ftp.cs.concordia.ca:/pub/www
- ftp.cc.gatech.edu:/pub/gvu/www/pitkow/misc
- ftp.sunet.se:/pub/www/Mosaic/contrib/SoftQuad
- ftp.uco.es:/www
- olymp.wu-wien.ac.at:/pub/sgml/exeter/SoftQuad
- ftp.germany.eu.net: /pub/infosystems/www/ncsa/Web/contrib/SoftQuad
- ftp.informatik.uni-freiburg.de: /pub/WWW/editors/HoTMetaL
- gatekeeper.dec.com: /pub/net/infosys/Mosaic/contrib/SoftQuad
- Email to: webmaster@sq.com
HyBrick - SGML/XML Browser
[CR: 19990304]
[March 04, 1999] Ralph E. Ferris (Fujitsu Software Corporation) has announced a new release of Fujitsu's HyBrick SGML/XML browser, with expanded support for XLink/XPointer. It is available from the Fujitsu Software Corporation's Web site. New features in HyBrick V0.82 related to XLink and XPointer include: "1) XLink/XPointer error/warning info is shown in the error list dialog; 2) A 'Document Group' sub-menu has been added in the 'XLink/XPointer' menu; users can now navigate between inter-linked documents by using Document Groups as well as through individual links; 3) In the 'select link' dialog, link element 'role' values are displayed instead of GIs. This feature, as well as the 'Document Group' display feature, are particularly useful for creating and navigating 'Topic Maps.'; 4) The mouse cursor now changes its shape over links." Also new in HyBrick 0.82 are multiple stylesheet support (if multiple stylesheet PIs are present, users are presented with a dialog box to select the stylesheet they want to use), 'Reload hubdocument' function and 'Close window' function. 'HyBrick' is "an advanced SGML/XML browser developed by Fujitsu Laboratories, the research arm of Fujitsu. 'HyBrick' is based on an architecture that supports advanced linking and formatting capabilities. HyBrick includes a DSSSL renderer and XLink/XPointer engine running on top of James Clark's SP and Jade. It supports both valid and well-formed XML documents, XLink and XPointer (XLink implemented as a subset of the HyTime property set), SGML (ISO 8879), DSSSL (ISO 10179) online specification, printing and print previewing based on DSSSL stylesheets." See more on HyBrick Support for XPointer in a posting of March 4, 1999.
[February 15, 1999] Ralph E. Ferris (Fujitsu Software Corporation) posted an update on the HyBrick V0.80 support for XLink and XPointer. HyBrick is an advanced SGML/XML browser developed by Fujitsu Laboratories, the research arm of Fujitsu. HyBrick is based on an architecture that supports advanced linking and formatting capabilities. HyBrick includes a DSSSL renderer and XLink/XPointer engine running on top of James Clark's SP and Jade. It supports "both valid and well-formed XML documents, XLink and XPointer, SGML (ISO 8879), DSSSL (ISO 10179) online specification, printing and print previewing based on DSSSL stylesheets." "To make the point [about HyBrick XLink/XPointer support, Ralph has] put some files with XLink/XPointer declarations in them up on the HyBrick Web site at http://www.fsc.fujitsu.com/hybrick/. These files are intended to be accessed over the Web. If your network access environment allows you to though, you can see XLink and XPointer at work over the Web by downloading HyBrick and pointing it at: http://www.fsc.fujitsu.com/hybrick/hubdoc-1.xml . . ." [see the posting for caveats and full details.] HyBrick Version 0.8 with XLink/XPointer support is now available for download.
[Earlier description:] "HyBrick" is 'an advanced SGML/XML browser developed by Fujitsu Laboratories, the research arm of Fujitsu. "HyBrick" is based on an architecture that supports advanced linking and formatting capabilities. HyBrick includes a DSSSL renderer and XLink/XPointer engine running on top of James Clark's SP and Jade. HyBrick supports: 1) Both valid and well-formed XML documents; 2) XLink/XPointer on the local file system [XPointer is implemented as a subset of the HyTime property set; Link traversal can use either "New" or "Replace" to display a new page]; 3) SGML (ISO 8879); 4) DSSSL (ISO 10179) online specification; 5) Printing and print previewing based on DSSSL stylesheets.'
[November 03, 1998] Ralph E. Ferris of Fujitsu Software Corporation has announced that HyBrick V0.8 with XLink/XPointer is now available for download.
Links:
- Main web site
- [Another Information Page] [English]
- Version 0.8 Announcement
- Download from CO.JP: http://www.fujitsu.co.jp/hypertext/free/HyBrick/download2.html
- Send questions or comments to: hb-staff@ml.flab.fujitsu.co.jp
The Wurd [was: WP] Project
"Wurd is an SGML capable Wurd Processor and publishing tool for multiple operating systems/platforms - although at the moment the only operating system supported is Linux." [June 1997]
[Work in progress only] WP is "a word processor being built by linux enthusiasts. . . with a native file format based on the SGML model. . .The use of SGML as the file format means that wp has an open interchange format. It will be possible to maintain World-Wide Web pages directly with wp."
- Home Page
- Home Page (http://wpprj.home.ml.org)
- Trevor Jenkins' development site
- Call for participation issued by Paul Colclough
GRIF Symposia: "A Collaborative Authoring Tool for the World Wide Web" (HTML and XML)
[CR: 19970827]
Links:
- Symposia Home Page
- Demo Version
- "Symposia doc+ - is a complete intranet publishing solution that combines a powerful WYSIWYG authoring tool, a database publishing mode and a graphical site manager in a single, easy-to-use package."
- Authoring and Formatting XML Documents
- Symposia - Welcome
- http://symposia.inria.fr/symposia/userdoc/put/writable-server.html
- Grif FAQ Document
HyBrowse HyTime Browser
[CR: 19961126]
HyBrowse is a HyTime Browser from TechnoTeacher, Inc. - a HyMinder application. "HyBrowse is a true HyTime (ISO/IEC 10744) hyperdocument browser for Windows 95 and Windows NT. It is useful for developing electronic document architectures that employ HyTime's strongly typed location-independent linking mechanisms." HyBrowse is publicly available (free) [as of November 22, 1996] for a trial period of 45 days. In addition to standard features one would expect, it supports: "(1) True HyTime independent hyperlinking; (2) User-defined strong hyperlink typing with [a] icons assignable to anchor roles over entire bounded object set (BOS), [b] rendering styles assignable to anchor roles over entire BOS; (3) HyTime-conforming address elements; (4) Aggregate location and hyperlink traversal handling; (5) Arbitrary BOS awareness allows users to add (import) a document into the current BOS; (6) Re-open browsing sessions without reparsing or reprocessing."
Eliot Kimber writes: "NOTE: HyBrowse is intended as a tool for creating prototypes and demos of HyTime features. It is not intended to be a production-quality information delivery system. The formatting features are minimal compared to Panorama or DynaText but sufficient to demonstrate the very interesting things you can do with independent links and anchors thereof. If you've been thinking of ways that HyTime hyperlinking could solve some of your information management problems but never had a way to realize or test those ideas, now you do, for free."
Links:
- Announcement for the HyBrowse HyTime Browser, from Eliot Kimber
- Announcement for HyBrowse 1.0.1, from Steven R. Newcomb
- HyBrowse description and download instructions on the TechnoTeacher server
- Supporting documentation from W. Eliot Kimber
perlSGML - Perl programs and libraries (Earl Hood)
[CR: 19970918]
perlSGML is a collection of Perl programs and libraries written by Earl Hood for processing SGML documents. The following software is available in the perlSGML distribution: dtd.pl (A Perl library to parse SGML DTDs), dtd2html (An SGML DTD documentation/navigation tool), dtddiff (a utility to list changes in a DTD), dtdtree (Generate content hierarchy trees of SGML elements), dtdview (Interactively query a DTD), sgml.pl (A Perl library to parse SGML instances), stripsgml (utility to remove SGML markup).
The 'dtd2html' tool is widely used. "What is dtd2html: dtd2html is part of the perlSGML package. dtd2html is a program that generates an HTML document (composed of several files) that documents and allows hypertext navigation of an SGML DTD."
- [September 18, 1997] Announcement from Earl Hood (University of California, Irvine) for a new release of the perlSGML toolkit. perlSGML is a collection of Perl programs and libraries for processing SGML DTDs and documents. "This release mainly includes a new set of Perl 5 modules. A new stripsgml is available and some corrections to dtd.pl are included in the release."
- perlSGML Main Page
- Documentation for perlSGML
- October 09, 1996: Announcement from Earl Hood for a new release of the perlSGML tools -- a collection of perl software for processing SGML data. These SGML software tools run under Perl versions 4 and 5. Most important changes: (a) "Hierarchical tree output of DTDprint_tree of dtd.pl modified to preserve the content model in the output. New tree format utilized by dtd2html, dtdtree, and dtdview; (b) sgml.pl rewritten to be more efficient and be useable for large files. Still more suited for simple tasks. stripsgml rewritten to utilize new sgml.pl." Available in .gz or .zip distribution format.
- December 09, 1995: Announcement for a new version of Earl Hood's perlSGML. perlSGML is a collection of Perl programs and libraries for processing SGML documents: dtd.pl (2.2.0) -- A Perl library to parse SGML DTDs; dtd2html (1.4.0) -- An SGML DTD documentation/navigation tool; dtddiff (1.1.0) -- List changes in a DTD; dtdtree (1.2.0) -- Generate content hierarchy trees of SGML elements; sgml.pl (0.1.0) -- A Perl library to parse SGML instances; stripsgml (0.1.1) -- Remove SGML markup. Changes: (1) Fixed code so it will run under Perl 4 and 5; (2) MS-DOS usage support; (3) Entity map file syntax has changed to the SGML open catalog format; (4) Support for the envariables SGML_SEARCH_PATH, SGML_CATALOG_FILES; (5) New functions added; (6) Speed improvement; (7) Bug fixes. See the text of the announcement, or link to the WWW page.
- Links on Earl Hood's page, including demos for DTDs processed (TEI, HTML 2.0, HTML 3.0).
- FTP from the SGML Repository
- FTP from Exeter
- documentation for dtd2html (Earl Hood) via CETHMAC
- documentation for dtd2html (etc) on Earl Hood's (OAC) Home Page
- FTP to Darmstadt
Carthage, dpp, and Bison tools by Michael Sperberg-McQueen
[CR: 19970122]
Several SGML grammar tools have been created and made publicly available by TEI editor Michael Sperberg-McQueen. DPP: "DPP is a parser for SGML document type declarations, intended for use as a front end for filters which modify DTDs (e.g. filters to expand all or some parameter entity references, or to rename elements, etc.). Since DPP uses the same output format as sgmls. . .many existing tools for writing filters for SGML document instance . . . can be used with DPP to make filters for DTDs." Bison tools: "The subdirectory pub/tei/grammar/bison contains files with Bison grammars and Flex scanners for SGML document type definitions, SGML document instances, and SGML declarations." See ftp://ftp-tei.uic.edu/pub/tei/sgml/grammar for fuller description of these grammar tools.
Another of the tools is a utility called Carthage. "Carthage is a yacc/lex-based parser for SGML DTDs which can delete references to undeclared elements. It can also do a few other things, depending on the run-time flags you give it." Some options include: (1) dropping or keeping marked sections; (2) warning if entities are declared twice; (3) dropping or keeping parameter entity declarations; (4) deleting named GIs from content models; (5) listing of specified classes of elements in the DTD [used, unused, default undeclared, declared]; (6) dropping or keeping comments in the output file, etc. The software is "unsupported" but "users who improve it or fix errors are requested to notify the author so he can also fix them." [extracts from the README file, dated June 17, 1996.]
- The Carthage README file; [mirror copy, made August 02, 1996]
- See the database entry in the SGML/XML Web Page for dpp and other SGML grammar tools by Sperberg-McQueen
- Main FTP directory for Sperberg-McQueen SGML grammar tools
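The "undeclared element" problem that Carthage addresses can be illustrated with a minimal Python sketch. This is not Carthage's implementation (Carthage is a yacc/lex parser); the function name `undeclared_references` is hypothetical, and the sketch assumes that parameter entities have already been expanded and that each declaration names a single element rather than a name group. It covers only the detection step, not the rewriting of content models.

```python
import re

# Simple element declarations only: <!ELEMENT name [minimization] model>
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([A-Za-z][\w.-]*)\s+(.*?)>", re.S | re.I)
NAME = re.compile(r"[A-Za-z][\w.-]*")


def undeclared_references(dtd_text):
    """Return element names used in content models but never declared."""
    declared = set()
    referenced = set()
    for name, rest in ELEMENT_DECL.findall(dtd_text):
        declared.add(name.lower())  # SGML names are case-insensitive by default
        start = rest.find("(")
        if start == -1:
            continue  # declared content such as EMPTY or ANY: nothing referenced
        model = rest[start:]
        model = re.sub(r"#\w+", " ", model)  # drop reserved names like #PCDATA
        referenced.update(token.lower() for token in NAME.findall(model))
    return sorted(referenced - declared)


sample_dtd = """
<!ELEMENT doc   - - (title, (para | note)+)>
<!ELEMENT title - O (#PCDATA)>
<!ELEMENT para  - O (#PCDATA | emph | xref)*>
<!ELEMENT emph  - - (#PCDATA)>
"""
print(undeclared_references(sample_dtd))
```

For the sample DTD the sketch reports `note` and `xref`, the two names that appear in content models without a matching declaration.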
DTDParse, by Norman Walsh
[CR: 19980409]
"DTDparse reads an SGML DTD and constructs a simple, easily parsed database of its content. This database can be examined to construct other views of the DTD. The DTDparse distribution contains several scripts which use the database to extract useful information about the DTD: (1) parents lists the parents of a particular element; (2) children lists the children of a particular element; (3) dtd2man produces DocBook RefEntry pages ('man' pages in common UNIX parlance) for the components of the DTD; (4) dtd2html [unrelated to Earl Hood's program of the same name] builds an HTML web of the components of the DTD." The documentation page provides sample output for DTDs such as DocBook 3.0, HTML 3.2, ISO 12083 DTDs, TEI Lite 1.6, and the CALS Table DTD.
- DTDParse Home Page
- Version 0.97 sources; [local mirror copy]
- Sample output for the DocBook 3.0 DTD
- Sources (will require Perl)
- Contact the author (Norman Walsh, Technical Director, Online Publishing, O'Reilly & Associates, Inc.)
- Links to some other tools that (sort of) generate DTD documentation from DTDs
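The idea behind the 'parents' and 'children' scripts can be sketched in a few lines of Python. This is only an illustration of the concept, not DTDParse's database format or code (DTDParse itself requires Perl); the function name `build_database` is hypothetical, and the sketch assumes simple element declarations with parameter entities already expanded.

```python
import re
from collections import defaultdict

DECL = re.compile(r"<!ELEMENT\s+([A-Za-z][\w.-]*)\s+(.*?)>", re.S | re.I)
NAME = re.compile(r"[A-Za-z][\w.-]*")


def build_database(dtd_text):
    """Map each declared element to its children, and each child to its parents."""
    children = {}
    parents = defaultdict(set)
    for name, rest in DECL.findall(dtd_text):
        element = name.lower()
        start = rest.find("(")
        model = rest[start:] if start != -1 else ""
        model = re.sub(r"#\w+", " ", model)  # ignore #PCDATA and similar
        kids = {kid.lower() for kid in NAME.findall(model)}
        children[element] = kids
        for kid in kids:
            parents[kid].add(element)
    return children, dict(parents)


sample_dtd = """
<!ELEMENT book (title, chapter+)>
<!ELEMENT chapter (title, para*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
"""
children, parents = build_database(sample_dtd)
print(sorted(children["book"]))   # elements allowed directly inside book
print(sorted(parents["title"]))   # elements that may contain title
```

Once the two maps exist, other views of the DTD (documentation pages, trees) are lookups and traversals over them rather than further parsing.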
Fred - The SGML Grammar Builder
[CR: 19980508]
"Fred is an ongoing research project at OCLC Online Computer Library Center, Inc. (OCLC) studying the manipulation of tagged text. As a service to the community, OCLC has decided to make several portions of Fred freely available via a WWW server." These services include (subject to documented limitations): automatic SGML DTD creation from tagged text, grammar reduction (BNF, DTD, and Four-Tuple output formats), and arbitrary transformations.
Links:
- The OCLC Fred Home Page
- Automatic DTD Creation from a URL or Sample Text - Description of the online Fred service
- Automatic DTD Creation from a URL
- Automatic DTD Creation from Sample Text
- Fred Translation Services
- OCLC SGML Grammar Builder Project - DTD and document grammar. Additional links and background.
- See also: XML DTD generation using DTDGenerator - XML DTD Generator
NORMDTD (by Richard Light)
[May 1996] "NORMDTD is a DOS (yes!) program that reads a valid SGML DTD, even a TEI-like one that uses marked sections and multiple input files, and generates a single file containing a normalized version of that DTD. The element content models in this normalized DTD will not contain any references to elements that are not declared, and so it can be used by highly-strung SGML packages such as RulesBuilder that refuse to process TEI applications (in particular) for this reason. In fact, having a normalized DTD in a single file can be helpful for a number of reasons, to a variety of SGML applications."
NORMDTD is written in Borland Pascal and runs only under DOS.
- The text of the announcement, with brief documentation
- Source for self-executing utility on OTA FTP server: ftp://ota.ox.ac.uk/pub/ota/TEI/software/normdtd1.exe
- Mirror copy, May 1996
- Also by Richard Light: The SGML Tagger, from Oxford University Press; [mirror copy]. "SGML Tagger is loaded on top of word-processing software, and allows users to insert SGML markup accurately and efficiently, without the need to learn a specialized SGML editor." See the bibliographic entry.
Babble - Synoptic Text Browsing/Searching Tool
[CR: 19970628]
"Babble, under development by Robert Bingler at the Institute for Advanced Technology in the Humanities (University of Virginia in Charlottesville), is an SGML-capable synoptic text tool that can display multiple texts in parallel windows. It uses Unicode, an ISO 16-bit character set standard, which allows multilingual texts, using mixed character sets, to be displayed simultaneously. Babble also allows users to search for strings in text or in tags, and to link open texts for scrolling and searching. Currently, Babble runs as an application, and not as an applet . . . Babble was originally prototyped in C++ and Motif++ for AIX 3.25 by Pete Yadlowsky. The current version is written in Java." [from the Home Page]
Note: Babble has been described to me as nominally but usefully SGML-aware. For example: "The search function allows you to search for strings, either in text or--if the file you're searching is marked up in SGML--within tags. When you click on the search button, a dialogue box appears, offering two choices: search in text or in tags, and a character set for the search. It is assumed that SGML tagging will be done in the Latin alphabet, but Babble will allow you to search for a non-Latin string within tags." [from the online documentation]
Links:
- [June 28, 1997] Announcement from David L. Gants and John Unsworth for Babble version 1.1.1, with several significant enhancements
- Babble Home Page
- Help for Babble: A Synoptic Unicode Browser
- Unicode and the Web
IADS: Interactive Authoring and Display System
[CR: 20011019]
"Interactive Authoring and Display System (IADS) was developed as a U.S. Army Missile Command initiative to reduce or eliminate paper documentation. IADS utilizes standard generalized markup language (SGML) to manipulate the text and graphics. The author can choose to display graphics within the text and/or in separate windows." [from the Home Page]
- New URL: iads.redstone.army.mil
- IADS version 3.0 feature list
Interactive Authoring and Display System (IADS). The IADS program distribution includes an XML DTD. The IADS Software is classified as a Class 3 IETM package; however, IADS has the capability of producing Class 4 and 5 IETMs. IADS uses SGML as its underlying text format. WYSIWYG editing is now provided, which allows text entry, graphic manipulation, tag insertion, and modification within the context of the formatted display. This mode is turned on or off using the 'Edit mode' option under the 'Authoring' menu. The DTD (if specified in the DOCTYPE) is loaded, processed, and its rules stored for use when inserting or editing tags in the document. The tag editor dialog box will only allow tags and tag attributes to be inserted that are defined in the DTD. 'Currently [2001-10], IADS is the only software able to parse and display IETMs meeting MIL-STD-40051A.' Contact: iads@redstone.army.mil ([Neil Frazier] IADS / Publications Services, US Army AMCOM).
- IADS Users Group
- IADS Software Main Page
- U.S. Army Missile Command (Sponsor Site)
- Author and Editor modes in IADS
- [FTP from the SGML Repository - probably an out-of-date version by now]
- FTP from Exeter
- IADS Version 2.0, available from the Exeter FTP Server or from the SGML Repository.
SARA (SGML-Aware Retrieval Application)
SARA (SGML-Aware Retrieval Application) is a client/server software tool allowing a central database of texts with SGML mark-up to be queried by remote clients. The system was developed at Oxford University Computing Services, with funding from the British Library Research and Development Department (1993-4) and the British Academy. The original motivation for its development was the need to provide a robust low-cost search-engine for use with the 100-million-word British National Corpus, and several features of the system design necessarily reflect this.
The SARA system has four key parts:
- the indexing program, which generates an index of tokens from an SGML marked-up text
- the server program, which accepts messages in the Corpus Query Language (see below) and returns results from the SGML text
- the SARA protocol, a formally defined set of message types which determines legal interactions between the client and server programs; this protocol makes use of a high-level query language known as CQL (for Corpus Query Language)
- one or more client programs, with which a user interacts in any appropriate platform-specific way, and which communicate with the server program using the protocol
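The division of labour between the indexing program and the server can be illustrated with a toy Python sketch. It is not SARA's index format and it does not implement CQL or the SARA protocol; `build_index` and `lookup` are hypothetical names, and the sketch assumes that tags never contain a literal ">" inside attribute values.

```python
import re
from collections import defaultdict

TAG = re.compile(r"<[^>]*>")
WORD = re.compile(r"[A-Za-z0-9']+")


def build_index(text):
    """Indexing step: map each lower-cased token to its character offsets."""
    # Blank out tags with spaces of equal length so offsets still point
    # into the original marked-up text.
    plain = TAG.sub(lambda m: " " * len(m.group()), text)
    index = defaultdict(list)
    for match in WORD.finditer(plain):
        index[match.group().lower()].append(match.start())
    return dict(index)


def lookup(index, text, word, width=10):
    """Server step: answer a one-word query with a little context per hit."""
    hits = index.get(word.lower(), [])
    return [text[max(0, pos - width): pos + len(word) + width] for pos in hits]


corpus = "<s><w>The</w> <w>cat</w> saw the <w>other</w> cat</s>"
index = build_index(corpus)
for context in lookup(index, corpus, "cat"):
    print(context)
```

A client program would send the query word and display the returned contexts; in a design like the one described above, the index is built once and the server consults only the index and the stored text at query time.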
Links:
- See the main BNC entry
- SARA Documentation
- SARA (SGML Aware Retrieval Application) Workshop 29th June 1994 [mirror copy]
Ispell for SGML
[CR: 19970225]
- Announcement from R. Alexander Milowski of Copernican Solutions Incorporated for a utility that 'spell-checks' SGML documents: Ispell for SGML. Sources are available as a patch to the standard distribution; binaries are also available for Solaris 2.5, and a WIN32 port will be provided in the future. The brief description on the COPSOL WWW site says [970225]: "Ispell for SGML is a version of the ispell spell checker distribution that has been patched to understand and ignore SGML markup. This version is a simple markup scanner that does not assume any further knowledge of the DTD. It purely relies on markup mode scanning as specified in the SGML standard."
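The approach described in the announcement (scan for markup, ignore it, and spell-check only what remains) can be sketched in Python as follows. This is not the ispell patch itself; the function name `misspelled` and the tiny word list are made up for illustration, and the pattern covers only tags, comments, and entity references, not the full set of SGML delimiters (marked sections and processing instructions, for instance, are ignored here).

```python
import re

# Comments, tags, and entity/character references are treated as markup.
MARKUP = re.compile(r"<!--.*?-->|<[^>]*>|&#?[\w.-]+;", re.S)
WORD = re.compile(r"[A-Za-z']+")


def misspelled(document, dictionary):
    """Return words outside markup that are not in the given word list."""
    text_only = MARKUP.sub(" ", document)
    return [w for w in WORD.findall(text_only) if w.lower() not in dictionary]


words = {"the", "quick", "brown", "fox"}
doc = '<para id="intro">Teh quick &amp; <emph>brown</emph> fox<!-- todo: chekc --></para>'
print(misspelled(doc, words))
```

Element names, attribute values, and comment text never reach the word check, so only the misspelling in the running text is reported. No knowledge of the DTD is needed, which matches the description of the utility as a simple markup scanner.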
Syntext -- the SGML Grammar Grapher
[CR: 19960521]
"SYNTEXT is an SGML DTD providing elements and attributes to mark up text in English for: (1) syntactic structure, including (a) X-bar based parsing, with Government and Binding-style PRO and t, (b) grammatical relations a la Quirk et al. marked as attributes; (2) cohesion; (3) coreference; (4) conjunctive relations as attributes of sentence specifiers; (5) lexical cohesion as attributes of lexical items; (6) rhetorical figures. Any text marked up for these features and identifying itself as DOCTYPE SYNTEXT is an SGML document and can be browsed in a SGML browser or viewer such as SoftQuad's free Windows browser Panorama or the costwish viewer for X Windows being developed by Peter Murray-Rust. It is an SGML application, the purpose of which is to provide markup for the analysis of syntactic and textual structure; a marked up text can be viewed as a tree and in other modes and can be searched with context sensitive and contingent scans, making it very powerful for stylistic analysis (once a passage is marked up!)."
Links:
- the text of the announcement
- Home Page: http://weber.u.washington.edu/~syntext/; [mirror copy, incomplete: graphics missing]
- SYNTEXT Documentation, including DTD
- syntext DTD Quick Reference; [mirror copy]
MtSgmlQL, the SgmlQL interpreter
[CR: 19971216]
"The SGML query language SgmlQL was developed in the context of the MULTEXT project. It is a functional language based on SQL, which enables complex operations on SGML documents, for instance: (1) extraction of parts of an SGML document that satisfy given criteria; (2) tests, counts, and various other computations on SGML elements in a document; (3) construction of new elements and documents using the result of queries. Because SgmlQL is a functional language, all data and program statements are expressions, or queries, which are recursively evaluated. It allows for manipulation of numbers, strings, (SGML) names, elements, attribute-value sets, documents, and (mixed content) lists." A free alpha version for UN*X of MtSgmlQL, the SgmlQL interpreter, can be downloaded to your system for non-commercial, non-military purposes (see the user agreement).
Links:
- Announcement for alpha release of the SGML/HTML query language
- SgmlQL - SGML Query Language
- Examples of SgmlQL usage
- The Multext SGML Query Language interpreter - reference; [archive copy, reference documents in HTML format]
- MtSgmlQL manual (including downloading instructions)
- SgmlQL reference: The Multext SGML Query Language
- See: Le Maitre, Murisasco, and Rolbert, "From Annotated Corpora to Databases: The SgmlQL Language." Papers presented at a conference held March 23-24, 1995, University of Groningen, 1995.
- Multext Home Page
'sgrep' grep-like searching of structured documents
[CR: 19981210]
Description: 'sgrep' (structured grep) "is a tool for searching text files and filtering text streams using structural criteria. The data model of sgrep is based on regions, which are nonempty substrings of text. Regions are typically occurrences of constant strings or meaningful text elements, which are recognizable through some delimiting strings. Regions can be arbitrarily long, arbitrarily overlapping, and arbitrarily nested. Sgrep is a convenient tool for making queries to almost any kind of text files with some well known structure. These include programs, mail folders, news folders, HTML, SGML, etc... With relatively simple queries you can display mail messages by their subject or sender, extract titles or links or any regions from HTML files, function prototypes from C or make complex queries to SGML files based on the DTD of the file." Sgrep is distributed under the GNU General Public License.
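The region model described above can be sketched in a few lines of Python. This is only an illustration of the idea (regions as offset pairs, found by delimiters and filtered by containment); it is not sgrep's query syntax or implementation, and the function names `delimited_regions`, `phrase_regions`, and `containing` are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # half-open [start, end) character offsets


def phrase_regions(text: str, phrase: str) -> List[Region]:
    """Every occurrence of a constant string is a region."""
    regions = []
    pos = text.find(phrase)
    while pos != -1:
        regions.append((pos, pos + len(phrase)))
        pos = text.find(phrase, pos + 1)
    return regions


def delimited_regions(text: str, start: str, end: str) -> List[Region]:
    """Regions running from a start delimiter to the nearest following end delimiter."""
    regions = []
    pos = 0
    while True:
        s = text.find(start, pos)
        if s == -1:
            break
        e = text.find(end, s + len(start))
        if e == -1:
            break
        regions.append((s, e + len(end)))
        pos = s + len(start)
    return regions


def containing(outer: List[Region], inner: List[Region]) -> List[Region]:
    """Keep the outer regions that enclose at least one inner region."""
    return [
        (os, oe)
        for (os, oe) in outer
        if any(os <= s and e <= oe for (s, e) in inner)
    ]


doc = "<li>SGML parser</li><li>XML editor</li><li>SGML viewer</li>"
items = delimited_regions(doc, "<li>", "</li>")
hits = containing(items, phrase_regions(doc, "SGML"))
matched = [doc[s:e] for (s, e) in hits]
print(matched)
```

Because a query result is itself a list of regions, results can be fed back into further containment tests, which is what makes this kind of model composable.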
[December 10, 1998] Jani Jaakkola has announced the availability of "sgrep-1.90a - An SGML and XML Search and Indexing Tool." Sgrep is a tool to search and index text, SGML, XML and HTML files using structured patterns. New features in Sgrep version 1.90a include: 1) query operators that support direct containment, so that one may query children and parents of given elements; 2) the sources are available under GPL-license for those interested in compiling sgrep; 3) Sgrep now uses GNU autoconf, so compiling sgrep under Unix-systems should be easy; 4) bug fixes. This version of Sgrep contains the sources, Win32 binaries, and binaries for HP-UX, Linux, OSF1 and Solaris. The Win32 binary also includes the m4 macro processor. For more information on Sgrep, see the README file or the overview.
[August 29, 1998] Jani.Jaakkola@cs.helsinki.fi (Department of Computer Science, University of Helsinki) posted an announcement for the release of sgrep version 1.71a as the first prerelease of sgrep-2. Sgrep is a tool to search and index text, SGML, XML and HTML files using structured patterns. Features new in version 1.71a include: "1) Indexing of both structure and content; 2) SGML/XML/HTML scanner; 3) both Win32 and i386-Linux binaries; 4) compatibility with older versions of sgrep; 5) no dependence upon 'sgtool'." Features announced for inclusion in sgrep-2 are: 1) Support for querying notations, element type declarations and attribute list declarations inside SGML/XML document prolog; 2) Parsing of all well-formed XML-documents; 3) Proper documentation.
Links:
- Announcement for version 1.90a
- Announcement for first prerelease of sgrep-2 [1998-08-28]
- HTML version of sgrep manual page
- Latest version of sgrep via FTP
- sgrep example queries
- "Using sgrep for querying structured text files" (Technical Report)
- README document; [local archive copy]
- Announcement for version 0.99 of 'sgrep' (April 30, 1996)
- Comments on sgrep by users (positive)
- Announcement for preliminary/test version of 'sgrep' (April 18, 1996)
- Home Page [Jani Jaakkola] also: Pekka Kilpeläinen
- DocMan - The Document Management Research Group (the Department of Computer Science at the University of Helsinki)
- Tarred and gzipped sgrep distribution version "sgrep-0.99.tar.gz"; [mirror copy]
- See the bibliographic entry for a technical report: Jani Jaakkola and Pekka Kilpeläinen. "Using sgrep for querying structured text files," In J. Saarela, editor, Proceedings of SGML Finland 1996, Espoo, Finland, October 1996. SGML Users Group Finland, pages 56-67.
- Email contact: Pekka.Kilpelainen@cc.helsinki.fi or Jani.Jaakkola@cc.helsinki
Inside & Out, from ZGDV
[CR: 19970522]
Inside & Out is a graphical DTD editor created by Hans Holger Rath and Ulrich von Engelberg, of the Computer Graphics Center (ZGDV) in Darmstadt, Germany. It runs under MS-Windows 3.1 (386 PC) with 4 MB RAM. The editor is designed to build SGML DTDs interactively, providing a graphical presentation of the DTD in the shape of a syntax (or railroad) diagram. Every element and parameter entity definition is shown in a single diagram. All definitions are alphabetically sorted (first all entity definitions, then all element definitions).
Links:
- Main Page (FTP directory)
- The README document; [mirror copy]
- Sample document DTD
- Binary Executable; [mirror copy, entire package]
- Contact: iout@igd.fhg.de
MU: Forms Assisted SGML Markup
"MU is a perl-based program that builds fill-out forms for SGML editing, based on simple templates. It supports lock files (for networked workgroups), and it is distributed with a TEI-lite template. Demonstrations, source code, help files, and an email list for bug reports and developers are available. . .Features: (1) Helps to automate the SGML markup process; (2) Quite general - works on various types of DTD templates; (3) Version 1.1 deals quite nicely with attributes; (4) Allows for multi-user editorial communication through the use of remarks; (5) Supports internet workgroups via lockfiles."
Markus Hoenicka's SGML/DSSSL Setup for Windows NT
[CR: 19981014]
"These pages describe how to set up a free integrated SGML editing and publishing system running under Windows NT - and, with a few modifications of the installation procedure, also on Windows 95/98 boxes." The documentation provides instructions for the installation of Emacs, Jade, PSGML, Ghostscript, Acrobat, MiKTeX, AucTeX, Jadetex, DocBook, etc.
Links:
- Main Page
- Introduction
- Email contact: Markus Hoenicka
SGML Data Conversion, Transformation, and Manipulation
At SGML'96, Boston, November 1996, Tony Graham (Mulberry Technologies, Inc.) presented "Free SGML Transformation Tools." "The criteria for selecting an SGML transformation processing tool are discussed, and the details and SGML-processing features of several free SGML transformation tools are listed."
Rainbow
Several companies have collaborated on the design of an SGML interchange language for word-processing formats. Rainbow makers produce SGML from the supported word-processing formats, preserving as much information about document structure as can be deduced reliably. The Rainbow SGML format can then be used as input to other applications. See further explanation on EBT's server or on the mirrors in the file 'rainbow.why'. Rainbow makers are now available (free) for FrameMaker/FrameBuilder MIF, RTF, Interleaf, and (possibly) Ventura. Authoritative files for the Rainbow distribution are located on EBT's FTP server (SGML Rainbow via ftp.ebt.com/pub/nv/dtd/rainbow/).
Other sources for Rainbow makers include:
- Announcement for Rainbow 2.01
- [mirror copy]
- Information: rainbow@ebt.com
- FTP from the SGML Repository
- FTP from Exeter
ICA: Integrated Chameleon Architecture
The ICA (Release 1.6, February 1994) is a toolset for generating data translators. In particular, the toolset can be used to generate translators to and from a constrained subset of instances of SGML Document Type Definitions (DTDs). There are several example translators included in the distribution. The first is a book DTD and includes specific translators for the LaTeX book documentstyle and a specific troff macro package. The second is a bibliographic DTD and includes specific translators for BibTeX and refer bibliographic database formats. Please note that the ICA is for developing translators and not providing translators. The ICA runs in the Unix environment, using the X Window System for the basis of the graphical user interfaces.
A new user's manual for ICA is also available. Published by Prentice Hall, the book is entitled The Integrated Chameleon Architecture: Translating Documents with Style, by Sandra Mamrak, Conleth S. O'Connell, and Julie Barnes. ISBN 0-13-056418-4. This book contains much new and revised material over the previously available online documentation, including a chapter on the ICA and SGML. See also description in excerpts from the release notes.
See further description in the ICA toolkit announcement, and see the network addresses for the supporting mailing list. The sources for ICA on the Internet are:
- FTP from SGML: ICA Chameleon: Remote file archive.cis.ohio-state.edu/pub/chameleon/
- FTP from the SGML Repository
- FTP from Exeter
STIL - `SGML Transformations in Lisp'
STIL is a stylesheet language developed by Joachim Schrod (Computer Science Department, Technical University of Darmstadt, Germany). "STIL (`SGML Transformations in Lisp') is a style sheet language to create structure-controlled SGML applications. In these applications you have neither access to the DTD nor to the original document source, instead you operate on a tree representation of the document. If you know CoST (the tree mode version) or SGMLSpm, STIL uses the same concept as these style sheet languages. The most obvious difference is the use of Common Lisp instead of Tcl or Perl5.
You define classes for elements that appear in a document, instances of these classes are the inner nodes of the tree. Automatic transformation of attributes to data structures more appropriate in your task domain than simple strings is available. Elaborate handling of PCDATA is supported, too.
The document tree is traversed, you can specify operations (`callbacks') that are triggered at certain points in that traversal. Within these callbacks, you have access to the full tree." [from the README, 1995/09/09]
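The tree-plus-callbacks approach described in the README can be illustrated with a small Python sketch. STIL itself is written in Common Lisp; the code below is not STIL's API, and the `Element` class and the `walk`, `depth`, and `outline` helpers are hypothetical names chosen for the example. It shows callbacks fired on entering and leaving each node, with each callback free to consult the rest of the tree (here, the parent chain).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union


@dataclass(eq=False)
class Element:
    """An inner node of the document tree; text leaves are plain strings."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Element", str]] = field(default_factory=list)
    parent: Optional["Element"] = field(default=None, repr=False)

    def add(self, child):
        """Append a child, recording the parent link for element children."""
        if isinstance(child, Element):
            child.parent = self
        self.children.append(child)
        return child


def depth(node: Element) -> int:
    """Number of ancestors, found by following parent links up the tree."""
    count = 0
    while node.parent is not None:
        count += 1
        node = node.parent
    return count


def walk(node: Element,
         enter: Callable[[Element], None],
         leave: Callable[[Element], None]) -> None:
    """Depth-first traversal that triggers a callback before and after each element."""
    enter(node)
    for child in node.children:
        if isinstance(child, Element):
            walk(child, enter, leave)
    leave(node)


def outline(root: Element) -> List[str]:
    """Build an indented outline; the enter callback uses the tree to find its depth."""
    lines: List[str] = []
    walk(root, lambda n: lines.append("  " * depth(n) + n.name), lambda n: None)
    return lines


book = Element("book")
chapter = book.add(Element("chapter", {"id": "c1"}))
chapter.add(Element("title")).add("Parsers")
chapter.add(Element("para")).add("Some text.")
print("\n".join(outline(book)))
```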
Links:
- README [mirror copy, November 1995]
- STIL 1.0 Manual, by Joachim Schrod and Christine Detig [mirror copy, November 1995]
- stil-1.0.tar.gz
CoST (Copenhagen SGML Tool, UNIX)
[CR: 19990628]
[June 28, 1999] Joe English has announced the release of Cost version 2.2, which now provides 'preliminary support for XML'. Cost is a free "structure-controlled SGML application programming tool. It is implemented as a Tcl extension, and works in conjunction with James Clark's nsgmls and/or sgmls parsers. Cost provides a flexible set of low-level primitives upon which sophisticated applications can be built. These include: (1) A powerful query language for navigating the document tree and extracting ESIS information; (2) An event-driven programming interface; (3) A specification mechanism which binds properties to nodes based on queries. Cost can be dynamically loaded into a Tcl application with the usual package mechanism, or it can be statically linked into a custom Tcl interpreter. There is also a command-line interface, costsh, which can be used interactively or as part of a command pipeline. A windowing interface, costwish, is also available for building GUI applications with Cost and Tk. New features in Cost version 2.2 include: (1) It should compile and install out-of-the-box on most Unix platforms, with any Tcl release from 7.5 through 8.1.1 - courtesy autoconf; (2) One can load more than one document at a time, and switch between them with the new 'selectDocument' and 'withDocument' commands; (3) It allows comments at certain places in specifications. (4) It provides preliminary support for XML, courtesy expat by James Clark. Note: XML support is largely untested and has a few known deficiencies (and probably several unknown ones!); I'd appreciate any feedback/bug reports. (5) It is released under a Tcl-style license instead of the 'Artistic' license. (6) Cost can now be loaded as an extension into multiple Tcl interpreters without conflicts. (7) Many minor bugfixes, enhancements, and cleanups."
[1997] "What is CoST? CoST (Copenhagen SGML Tool) is a structure-controlled SGML application programming tool. It is built on top of a public domain SGML tool: the SGMLS parser made by James Clark. With CoST you can write translation specifications for SGML document instances. CoST is purely structure driven, i.e. it gives you access to the structure of the SGML document instance. It won't, however, let you access the lexical and syntactical details in the SGML entities that represent the document instance in storage. You can write CoST programs that will translate SGML document instances or perform other processing in response to SGML documents. You program CoST using TCL - Tool Command Language." [from the Manual Introduction [March 1995]
CoST was written by Klaus Harbo (Klaus.Harbo@euromath.dk) and is maintained by Joe English (joe@flightlab.com).
Links:
- Copenhagen SGML Tool - Cost Home Page.
- Cost reference manual
- [June 16, 1998] Boris Tobotras posted a CoST patch for multiple document instance support. "Some other fixes available, contact me if you're using CoST."
- README description, '28 May 1998'. [local archive copy]
- Announcement for CoST version 2.0 beta, September 14, 1995, including a draft version of the manual in Postscript format. Version 2.0 [2.0a2, October 13, 1995] contains a new query language.
- Searching SGML structure with CoST 2.0 (Peter Murray-Rust), April 1996
- Frequently Asked Questions about CoST
- CoST 2 Reference Manual
- Sources for the CoST package; [local archive copy, snapshot 980616 for 'cost-2.1a0.tar.gz' of May 30, 1998]
- Or FTP the package from Exeter.
- Related work: ExCost. ExCost stands for 'Expat and Cost'. It uses an extension to Tcl that allows it to parse an ESIS file and handle the output in an event-driven or tree-driven manner. It provides about the same functionality as Cost, but for XML.
- See the Documentation of the Copenhagen SGML tool at http://laurel.euromath.dk/test.html.
costwish - SGML postprocessor and renderer
"Costwish is a graphical interface (SGML postprocessor and renderer) for Joe English's CoST-2 tool. From the README: "costwish is a generic graphical interface to Joe English's CoST SGML/ESIS post-processing tool. It is aimed at those who wish to: (1) run sgmls (or other ESIS-based parser) under a graphical interface; (2) browse their documents graphically (3) customise their postprocessing easily, powerfully and flexibly; (4) construct powerful searches of SGML-based documents; (5) and manage the results interactively; (6) develop interfaces to helper applications (e.g. graphical renderers)." [from the README, April 1996]
Links:
- Index Page: http://www.venus.co.uk/omf/costwish/
- Costwish Home Page
- Index of /omf/packages/binaries/
- Overview [mirror copy, April 1996]
- README file
- Documentation
- Review comments (compliments) by Len Bullard
SGMLS.pm and sgmlspl: A Simple Post-Processor for SGMLS and NSGMLS
[CR: 19980423]
SGMLS.pm and sgmlspl were written by David Megginson, and were maintained by him through 1995. The current maintainer [1998] of the SGMLS.pm Perl package is Ingo Macherius (Ingo.Macherius@tu-clausthal.de).
David's description: "SGMLSpm is a free perl5 object-oriented postprocessor for James Clark's SGMLS and NSGMLS parsers. The main part of this release is a library, SGMLS.pm, which repackages the ESIS output of (N)SGMLS into perl5 objects. On top of this, I have built a script, sgmls.pl, for formatting or processing SGML documents quickly using event patterns. Like CoST (which is several times slower), and unlike QWERTZ (etc.), SGMLSpm is a general-purpose package which can be used with any DTD. It even includes a script, skel.pl, which will write a skeleton conversion script for your document automatically!"
"sgmlspl is a sample application distributed with the SGMLS.pm perl5 class library -- you can use it to convert SGML documents to other formats by providing a specification file detailing exactly how you want to handle each element, external data entity, subdocument entity, CDATA string, record end, SDATA string, and processing instruction. sgmlspl also uses the Output.pm library (included in this distribution) to allow you to redirect or capture output."
- SGMLSpm source: http://home.sprynet.com/sprynet/dmeggins/SGMLSpm-1.03ii.tar.gz
- Source: local archive copy 1.03ii
- SGMLSpm documentation - from CPAN module documentation [Comprehensive Perl Archive Network]
- "Developing SGML Applications with Perl. Perl 5 and SGMLS.pm." [Chapter 5 and pages 260-276 in] SGML CD: A Complete SGML Toolkit, by Bob DuCharme.
- SGMLS::Handler - code to modularize SGMLS.pm's commandline driven sgmlspl tool, written by Ingo Macherius
- Home Page: http://home.sprynet.com/sprynet/dmeggins/
- Email: dmeggins@microstar.com (w); ak117@freenet.carleton.ca (pers)
- The original SGMLS.pm announcement (July 1995)
- Comparison of OmniMark and SGMLS-PM, by Lou Burnard (February 1996)
OmniMark LE
[CR: 19970923]
[September 23, 1997] Announcement for the OmniMark LE, available "at no charge for a limited time." OmniMark is a flagship industry software product -- a leading SGML based "hypertext programming language for development of on-line, Web, CD-ROM and print-on-demand publishing applications, being used for SGML conversion by a wide range of industry-leaders, including over 700 companies in 34 countries." OmniMark LE is a free product which runs utility-sized OmniMark programs. It is described as useful for: "(a) small-sized utility programs; (b) program development on the road away from your commercial licenses (since OmniMark LE will compile a large program -- it won't just run it); (c) evaluating OmniMark V3's capabilities before licensing V3." OmniMark LE is available on many platforms, including Windows NT/95 and many varieties of UNIX. See the program description for other information, or the main database entry.
"OmniMark LE will compile and execute programs that contain 200 or fewer actions in the program source. An action is a statement that OmniMark executes, distinguished from a "rule header" (e.g. an element rule) which describes an event. Within each element rule, one action is not counted towards the 200-action limit. The action count is performed at compile time, not run time; this means that any given action in a 200-action program could execute millions of times."
LT NSL and NSL (Normalised SGML Library)
[CR: 19970128]
From the Language Technology Group, Human Communication Research Centre, University of Edinburgh: the "Normalised SGML Library (NSL version 2.0 ) . . .consists of a set of C programs for manipulating SGML files and a C application program interface (API) designed to ease the writing of C programs which manipulate SGML documents."
"LT NSL is a development environment for SGML-based corpus and document processing, with support for multiple versions and multiple levels of annotation. It consists of a C-based API for accessing and manipulating SGML documents and an integrated set of SGML tools. The LT NSL initial parsing module incorporates v1.1.1 of James Clark's SP software, arguably the best SGML parser available. The basic architecture is one in which an arbitrary SGML document is parsed once, yielding two results: (1) An optimised representation of the information contained in the document's DTD; (2) A normalised version of the document instance, which can be piped through any tools built using our API for augmentation, extraction, etc.
Links:
- The main entry for the LTG in the "Academic Applications" area of this database
- January 28, 1997: Announcement from David McKelvie for the HCRC Language Technology Group's public release of LT NSL --- Normalised SGML Library, version 1.4.6. The toolkit offers significant enhancements over version 1.4.4. "LT NSL is an integrated set of SGML querying/manipulation tools and a C-language application program interface (API) designed to ease the writing of C programs which manipulate SGML documents. Its API is based on the idea of using 'normalised' SGML (i.e. an expanded, easily parsable subset of SGML) as a data format for inter-program communication of structured textual information. The API defines a powerful query language which makes it easy to access (either from the shell or in a program) those parts of an SGML document which you are interested in. Both event based and (sub-)tree based views of SGML documents are supported."
- LTG Home Page [or, no frames]
- LT NSL main page
- The LT NSL documentation
TclYasp SGML toolkit
Extracts from the announcement by David Durand: "TclYasp integrates a conforming SGML parser with the TCL scripting language. . . Unlike CoST 1.1, it uses a simplest-possible procedure call interface, rather than an elaborate object-oriented interface. . . TclYasp does have a few unique features: it's based on YASP, which is more easily portable (it's in ANSI C and not C++) and was designed to be integrated with an application. Since Yasp is fully re-entrant, more than one parser can be active at a time. It is not restricted to the information defined by the ESIS, as the full parser data is available. . . TclYasp/Mac includes a command shell, multiple-pane windows, limited on-screen text formatting, and a variety of interface features as well as the SGML processing stuff."
Links:
- Announcement, April 17, 1996 by David G. Durand
- Sources: ftp://ftp.stg.brown.edu/pub/sgml [Files: 1394391 Apr 16 21:00 TclYasp-Mac.sit.Hqx OR 2373632 Apr 16 04:24 tcl_yasp-1.0.tar] via the Scholarly Technology Group at Brown University
- Tclyasp-Mac.sit.Hqx [mirror copy]
- tcl_yasp-1.0.tar.gz, archived by the Scholarly Technology Group at Brown University
- Email contact: David G. Durand [dgd@stg.brown.edu], maintainer of the SGML archives
- More information about the use of SGML by the Scholarly Technology Group at Brown University
Python for XML/SGML Processing
[CR: 19981103]
A few people (at least) believe that Python is well suited for SGML text processing. Sean McGrath wrote that it "beats any other language I know for SGML processing hands down", and Paul Prescod said: "Python is a really easy, incredibly powerful scripting language. . . [it] combines the best features of other scripting languages and borrows many neat features from the Great Languages from history (Simula, SmallTalk, Lisp, Algol)."
Links [provisionally]:
- Documents on Paul Prescod's Home page: "SGML Processing in Python"; "Using SGML Groves from Python, Visual Basic and other OLE client scripting languages"; "PySgml: A Module (under development) for SGML Processing in Python"; "An Introduction to Groves for Python Programmers."
- Announcement from Paul Prescod for a series of documents on SGML processing using Python
- XML and Python - Database section in the XML page.
- See ParseMe.1st, by Sean McGrath: several chapters illustrate the Python framework for processing SGML information objects; [bibliographic entry].
- Python module for XML. "Extensible Markup Language Scanner, Checker, and Utilities," from Dan Connolly, May 1997; [local archive copy]
- Python XML SIG. As of March 17, 1998, a mailing list "has been created for discussing XML and Python, with the goal of developing a set of Python tools for processing XML documents."
- [November 03, 1998] "Python and SGML" - By W. Eliot Kimber. ". . .Its easy-to-use object orientation, its built-in list semantics, and the fact that it's interpreted make it really easy to create the same sorts of programs you might use DSSSL or Balise for, but with a general-purpose programming language that is easy to learn and much more familiar than DSSSL or Omnimark. Python is a free, publicly-developed language, not a commercial product. . ."
- [February 12, 1998] "XML Programming in Python," by Sean McGrath. In Dr. Dobb's Journal February 1998 [Scripting Languages]. Abstract: "XML brings to the document world what the database world has had for a long time -- interoperability via open systems. Sean shows how you can use Python as a development platform for XML programming. Additional resources include the Python web page, and PXML.TXT (listings from DDJ)." See also the bibliography entry for "XML Programming in Python."
- XMLParser class in the Python [1.5] distribution. 11.10 Standard Module xmllib: "This module defines a class XMLParser which serves as the basis for parsing text files formatted in XML (eXtended Markup Language)." [from lmg]
I4I S4-Desktop V2.1 SGML middleware
[CR: 19970212]
Educational Support Program:
- Announcement from Infrastructures for Information Inc. for an Educational Support Program. "Infrastructures for Information Inc., . . . announces the no-cost availability of its S4-Desktop V2.1 middleware to any educational institution engaged in the use of SGML for teaching or non-profit research purposes. The program will make available to qualified institutions (public schools, universities and colleges) production versions of Infrastructures S4-Desktop Developer Kit middleware. S4-Desktop is a development environment for the SGML enabling of applications. It provides to the programmer a set of high-level functions that allows, for instance, a C/C++, PowerBuilder, or BASIC programmer to develop SGML applications."
SENG: SGML transformation engine
[CR: 19960806]
SENG = Scheme engine add-in for SP. "SENG is a transformation engine based on SP 1.0. It executes an SGML document as scheme code, [using a 'style semantics' concept]. SENG provides some basic procedures (some DSSSL like) to manipulate and access information from an SGML Instance. SENG was developed as a testbed for DSSSL experiments as well as an interim transformation engine for SGML. Features: (1) Cross document transformations; (2) Access to element context and left-siblings; (3) R4RS Scheme programming environment; (4) Simple syntax for style semantics (style sheets)."
"There is a free distribution for SENG for both the WIN32 and Solaris 2.x environments."
Links:
- Announcement for SENG, a free transformation engine, from Copernican Solutions Incorporated
- SENG Transformation Engine
- Technical Information
- Examples
- Copernican Solutions Incorporated
- FTP server: ftp://ftp.winternet.com/users/sgml
SGML-SPGrove
[CR: 19980105]
"SGML-SPGrove is a Perl module that links with James Clark's SGML Parser (SP) and builds in-memory groves from SGML, XML, and HTML documents. The groves can be accessed using iterator and callback (visitor) interfaces."
"SGML::SPGrove takes a system identifier and passes it to SP to parse, as each element is parsed from the document SPGrove builds Perl objects to match. When done parsing, SPGrove returns an SGML::SPGrove object that contains the root element of the parsed document and an array (hopefully empty :-) of parser errors. Elements of the document are SGML::Element objects. Elements have a generic identifier (or name), attributes, and the contents of the element. Attributes are stored as a Perl hash, with the values as an array of scalars and SGML::SData objects. The contents of an element may be more Elements, scalars, SData objects, processing instruction (PI) objects, or Entities. SGML::SData objects are replacements for character entity references within the document. The Text::EntityMap perl module can be used to map SData replacements from common character entity sets to common output formats."
Links:
- [January 05, 1998] Announcement from Ken MacLeod for the release of SGML-SPGrove version 1.00. See the README; [archive copy]
- Announcement posted to CTS, October 05, 1997
- FTP sources
- [0.02 mirror copy]
SGMLC (-Lite) products for MS-Windows
[CR: 19980218]
[February 18, 1998] Announcement from Bruce Hunter (SGML Systems Engineering) for an updated version (1.3) of the SGMLC language tools "designed specifically for creating SGML document processing applications in the Microsoft Windows environments." New features in this release include: "1) inbuilt ODBC support (query and update databases during document processing); 2) PERL-type regular expression support; 3) binary file I/O, with associated seek, put, etc. functions; 4) widow/orphan and hyphenation control; etc. The free, unregistered version of the SGMLC Publisher Development Environment includes all the new features listed above, plus full access to all the conversion, browser and publisher facilities."
"The 16-bit SGMLC-Lite free compiler for MS-Windows is now available from http://www.dircon.co.uk/sgml. A 32-bit version will be available soon from the same source. Unix and Mac versions are still [April 1996] some way off, I'm afraid. . .SGMLC is a language designed for processing SGML documents. It is based upon the C language, with some elements of C++. It recognises events which occur when processing an SGML document. You then provide the code to tell the application how to process the event. . .SGMLC may be used, for example, for writing SGML transformation applications, for converting SGML documents into some other form; extracting selected bits of information from an SGML document; creating semantic parsers, where you need to check more than just the SGML conformance of a document (for example, that a CDATA attribute conforms to a defined criteria); creating SGML browser and IETM applications, via the SGMLC-View add-on library; formatting and printing SGML documents into various output formats (PostScript, PDF, HPGL, etc.), again via the SGMLC-View library." [from the announcement, April 1996]
Links:
- SGML Systems Engineering
- Several useful free SGML processing applications
- Announcement from Bruce Hunter of SGML Systems Engineering for the availability of the 32-bit versions (for Windows 95 and NT) of the SGMLC products.
- The previous announcement [April 1996]
SGML Formatting Tools
format: Thomas Gordon's QWERTZ SGML -> LaTeX formatting package
[CR: 19980220]
- See the bibliographic entry for the manual
- FTP from the SGML Repository
- FTP from Exeter
- Thomas F. Gordon. "The QWERTZ Synthesis of SGML and LATEX." Computer Standards and Interfaces 17/1 (January 1995) 25-33. See the bibliography entry.
gf: Gary Houston's general formatter program
[CR: 19960724]
"gf is short for "general formatter", i.e., it can work on documents which use the ISO "general" document type definition (DTD). It can convert SGML documents conforming to a small number of DTDs into various output formats: LaTeX, ASCII, RTF and Texinfo. However not every output format can be generated for every DTD."
"Apart from the general DTD, gf supports the HTML DTD used in the WWW project and the Snafu DTD I [Gary Houston] just made up. There are many other DTDs which would be worth supporting. However gf is not intended as a flexible system for hacking up a formatter for a random DTD, but as a usable document production system for a few DTDs." [from the README, version 0.46, July 1996]
Links:
- The README for gf version 0.46 (July 1996)
- gf version 0.46, mirror copy
- Announcement for version 0.45 (November 1995)
- README for version 0.45
- FTP (older version?) from Exeter
- Contact: ghouston@actrix.gen.nz
Jörg Wittenberger's Typeset Package
Wittenberger's typeset is a formatter for SGML documents. "Overview [March 1995; link to the URL given below for a July 12, 1995 update]: Typeset is an extensible formatter for documents. It transforms documents using SGML markup into various target formats. Typeset comes with a couple of document type definitions (DTD's). The DTD's feature the reuse of text, minimization of markup and readability of the SGML source. They share their elements as much as possible. The formatting differs due to the features possible in the target format and to the rules common for type of the document. This includes the automated rearrangement of text and insertion of standard parts like content sections and bibliography. The latter for instance is composed from the items of a database which are referenced in the document. Future versions will also generate an index. According to the goal and the aim to support many target formats, these DTD's don't attempt to cover each and every case possible. Instead they try to provide all elements necessary for daily use and leave the implementation of special features to extensions. It is fairly easy to coerce typeset to parse documents with other DTD's. But this implies to write rules for formatting in the desired target format(s). The transformation (formatting) is described by files of scheme code related to both, the document type and the target format. Only combinations of common value are supported by default. (For instance for letters only PostScript output is defined.)"
"Currently there are these DTD's: document (Simple ,,plain'' documents); report (Technical reports, documentation etc.); bibdata (Bibliography database); manpage (Pages for the Unix(TM) man command); brief (A letter according to DIN). Currently the following output formats are supported: PostScript (for english and german text); HTML (Hyper Text Markup Language); Info (to be included into the on-line help of emacs); man suitable for roff -man; ASCII; limited support for RTF. Future output formats will include: RTF, LaTeX."
Update on features [July 15, 1995]:
- sorted indexes
- notation handling
- Addison-Wesley like entities for math symbols and greek letters
- handling for entities of notations eps, latex, roff (tbl), xfig, lout (as far as there is a chance for a final representation)
- inlined code of aforementioned notations possible
- LaTeX 2.09 and LaTeX2e supported as backend
- limited support of RTF
"But it still lacks a native SGML table construct" [communique from Joerg Wittenberger]
Contact:
Jörg [Jerry] Wittenberger
Technische Universität Dresden
Institut für Betriebssysteme,
Datenbanken und Rechnernetze
Jörg Wittenberger's SDC Package
[CR: 19960806]
"SDC is a well featured, free system aiming to make SGML suitable for day to day use. SDC compiles SGML documents into representations as PostScript, LaTeX, HTML, man pages, (emacs) info files and is a little RTF aware. The goal of SDC is to be author friendly, easy to use without the need of special editors and to hide as much of the backend as possible. Hence the required markup is minimized, mixed content type allow you to type text (almost) everywhere and get the desired meaning. There are no `special characters' but those special to SGML (< and &)."
Links:
- Announcement (July 02, 1996)
- More Information [mirror copy, partial links only]
- FTP: ftp.inf.tu-dresden.de/pub/people/jerry/sdc-1.0beta.tar.gz
- Longer documentation [note Typeset origins]; [mirror copy (Date: June 1996)]
RATFINK SGML <--> RTF Conversion
[CR: 19970107]
"RATFINK, a library of Tcl utilities for generating RTF files, including a Cost script for converting SGML to RTF, is now available." From Joe English
Links:
- http://www.flightlab.com/cost/ratfink/index.html
- RATFINK Manual in HTML format, also in Postscript format
- information about Cost
SGML-Tools [Was: Linuxdoc-SGML]
[CR: 19980716]
SGML-Tools "is a text-formatting package based on SGML (Standard Generalized Markup Language), which allows you to produce LaTeX, HTML, GNU info, LyX, RTF, and plain ASCII (via groff) from a single source; due to the flexible nature of SGML many other target formats are possible. This system is tailored for writing technical software documentation, an example of which are the Linux HOWTO documents. However, there is nothing Linux-specific about this package; it can be used for many other types of documentation on many other systems. It should be useful for all kinds of printed and online documentation. The package was formerly called Linuxdoc-SGML because it originates from the Linux Documentation Project (LDP). The name has been changed into SGML-Tools to make it clearer that there is no Linux-specific stuff included in this package." Currently [September 1997] maintained by Cees de Groot.
[March 17, 1998] Update on SGML-Tools, from Cees de Groot: "SGML-Tools v2.0 will be an all-new version of SGML-Tools (currently in its 1.0 incarnation) that will base on DocBook and offer users migration software from linuxdoc to DocBook. We hope that this move will give authors more freedom in choosing their software, give anthologists and publishers/Linux distributors more useful raw material, and maybe even move a lot of current LaTeX-based documentation into the existing body of SGML documents. At the moment, the main target is to decide which software to base on: Quilt, by Ken MacLeod, or DSSSL. From there on, we'll attempt to build a distribution that offers functionality comparable to the current version of SGML-Tools (like support for multiple languages), and we hope to release in 98Q3. . . "
[February 27, 1998 - provisional paragraph] Announcement from Ken MacLeod for a new 'Quilt Kit' centered around Quilt, with DTDs and docs for DocBook, LinuxDoc, and TEI Lite. It has been produced in conjunction with SGML-Tools. Quilt is a "processing and formatting framework for structured documents. Quilt is intended to support processing of common, rich document elements and is tested against and comes with support for the DocBook, LinuxDoc, and TEI Lite SGML document types (DTDs) and formatting to Ascii and HTML. Quilt-Kit: Quilt bundled with all the tools, except Perl, to format DocBook, LinuxDoc, and TEI Lite documents, including user guides. The kit also includes DSSSL stylesheets for DocBook for use with Jade. The Kit includes: James Clark's Jade, Dean Roehrich's Class-Eroot, Norm Walsh's DSSSL stylesheets for DocBook (db104), The Davenport Group's DocBook v3.0 DTD (docbk30), The Davenport Group's DocBook SGML documentation (dbsset), SGML-Tools' LinuxDoc DTD and docs (sgml-tools-dtd), The Text Encoding Initiative's TEI Lite DTD (teilite), TEI's TEI Lite documentation (teiu5), Class-Visitor, PkgMaker (a utility used by the Kit to install), Quilt, SGML-Grove, SGML-SPGroveBuilder, entity-map, and iso-entities-8879.1986." See the Quilt for Perl README (1997/10/25).
Links:
- SGMLtools Home Page (New 980716)
- Description of SGML-Tools package
- README file accompanying the package
- SGML-Tools Mailing List
- SGML-Tools Mailing List Archives
- README [1998/02/22], [local archive copy]
- SGML-Tools User's Guide, by Matt Welsh. Updated by Greg Hankins [v0.99.0, 29 November 1996]
- Sources from Consultronics [September 12, 1997: 0.99.17]
- Note Paul Prescod's 980206 "Concrete Proposal" for the SGML Tools project. [local archive copy]
- See also the article "Flexible Formatting with Linuxdoc-SGML", by Christian Schwarz; [mirror copy]
- [April 09, 1998] Moving from linuxdoc DTD to the DocBook DTD
TEItools
[CR: 19981216]
[December 16, 1998] From Boris Tobotras. "TEItools is loosely coupled set of scripts, written in Tcl, which does various SGML transformations. Source file format TEItools uses is SGML, in TEI Lite incarnation. Currently they include converters: 1) from TEI Lite to HTML, RTF, TeX, DVI, PS, PDF; 2) from HTML to TEI Lite, Linuxdoc, TeX, DVI, PS, PDF; 3) from Linuxdoc to HTML, TEI Lite, DocBook, TeX, DVI, PS, PDF; 4) from DocBook to TEI Lite (very preliminary). TEItools was inspired by idea of SGMLtools (previously known as SGML-tools, previously known as linuxdoc-sgml). TEItools belongs to SGML conversion class of tools. It is part of entire document management system. You should be absolutely clear about that: just the part. Entire DMS includes also document repository, version control, access control, usage policy, documents editing, search and retrieval, to name just the few. Please don't expect TEItools will be all of that. But it can help you to build such a system, as it did for me."
References:
- TEItools Home Page
- TEI: Text Encoding Initiative - Main entry
MetaMorphosis - SGML/XML Tree Transformer
[CR: 19980910]
[September 10, 1998] An announcement was made on CTS by OVIDIUS for the release of MetaMorphosis 3.0. "MetaMorphosis is a target-driven SGML/XML tree transformer. Parsers for other input formats may easily be plugged into MetaMorphosis using the freely available tree representation API (MMdb-API). The software runs on MS Windows95/98/NT, Linux, and Solaris. Version 3.0 represents a complete redesign of MetaMorphosis. It has a modular architecture, and a set of APIs and is available as an SDK Version which allows a complete integration of MetaMorphosis into other applications. New features in version 3.0 include an enhanced query and transformation language, full XML support, support of various character encodings (Unicode, Shift-JIS, etc.), full integration of SP, etc." A demo version (Win32) and a free version for Linux (ELF) are available for download.
[Earlier description:] MetaMorphosis (from MID/Information Logistics Group) "is a modular, programmable tree transformer. It is used to convert any valid SGML instance to any other format, including SGML, arbitrary word processor formats, formats for hypertext systems, database tables, etc. MetaMorphosis has three main modules: (1) a workflow system which is highly configurable; (2) the MetaMorphosis kernel which in itself has a modular architecture [source tree generator, binary tree reader, tree transformer, tree annotator], and (3) a set of output processors. MetaMorphosis is quite different from any other SGML conversion tool. The underlying paradigm is that SGML instances are treated as trees rather than character streams. An SGML instance conceived as a character stream basically allows only sequential access to the instance. Furthermore, context information is usually limited to the elements on the root path and their left siblings. The tree model on the other hand allows random access to any node in the tree at any moment. The instance is seen as a kind of structured database each part of which can be accessed, copied, moved or deleted."
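The contrast between the stream view and the tree view described above can be sketched in a few lines. This is only an illustration of the general idea, not MetaMorphosis itself: it uses Python's standard `xml.etree.ElementTree` on a small XML string, and the sample document and both function names are assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample instance, used only for illustration.
DOC = "<doc><title>T</title><sec><p>one</p><p>two</p></sec></doc>"


def stream_paths(xml_text):
    """Stream view: events arrive in document order, so the only context
    available is what we record ourselves (here, the path from the root)."""
    path = []
    seen = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            if elem.tag == "p":
                seen.append("/".join(path))
        else:
            path.pop()
    return seen


def move_last_paragraph_first(xml_text):
    """Tree view: the whole instance is in memory, so any node can be
    reached, removed, and re-inserted elsewhere at any time."""
    root = ET.fromstring(xml_text)
    sec = root.find("sec")
    last = sec[-1]
    sec.remove(last)
    sec.insert(0, last)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(stream_paths(DOC))
    print(move_last_paragraph_first(DOC))
```

In the stream version, reordering the two paragraphs would require buffering output by hand; in the tree version it is a plain node move.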
Links:
- OVIDIUS announcement for Version 3.0
- OVIDIUS description and download
- Announcement for an updated version (2.1) of MetaMorphosis-free (May 22, 1996)
- Announcement for MetaMorphosis-free - the free version of MetaMorphosis, a tree-based conversion system for SGML document instances - by Thomas Plass
- Metamorphosis: http://www.mid-heidelberg.de/docs/mid-de/produkte/metamorphosis.html
- MetaMorphosis Reference Manual
- Download MetaMorphosis-free (or use the SGML Repository FTP server)
gmat: an SGML Publishing System
[CR: 19971113]
"gmat: an SGML Publishing System is being developed by O'Reilly & Associates. It is currently in early alpha testing and suffers most noticeably from a lack of documentation." Version 0.2.2b (gmat-0.2.2b.tar.gz) is dated July 14, 1997.
SGML2TeX - SGML-to-TeX converter
"Converts a fully-qualified, pre-parsed SGML instance to a file with TeX-style control sequences replacing the SGML tags and entity names, and writes the element, attribute and entity names to a template style file so that their TeX expansions (macro meanings) can be edited in and the file printed using TeX. Written in PCL, requires PCL.COM and PCL.SYS to run (available from Calend Ltd, Twickenham, Middlesex, England)."
Generalized Document Objects (GDO)
[CR: 19970628]
Written by Ken MacLeod, "GDO is a framework for loading, integrating, and formatting structured documents. It can be used to load all or part of a document or data structure, contain or be contained by application objects, query and iterate over elements, merge elements or other documents, and format and output a document. Documents can be manipulated in their native format, in a generalized format, or in the output format.
"Possible applications include interactive documentation, context-sensitive help, dynamic Web pages, document management, contextual data, catalogs and directories, and browser-agnostic Web servers. . . release 0.4 is written in Perl 5, and supports SGML input and output to HTML 2.0 and plain ASCII."
Links:
- Home Page (FTP directory)
- README document; [mirror copy]
- Tools related to GDO
- The 0.4 distribution; archive copy
tei2latex - TEILITE to LaTeX2e
[CR: 19971022]
TEI2LATEX and TEI2HTML version 0.2. - 'Two Perl5 Programs to Translate TEI Lite Documents into LaTeX2e and HTML documents.' "tei2latex is a Perl5 Program to Translate TEI Lite Documents into LaTeX2e documents. . . The translation process can be configured in several ways for two reasons: (1) to enhance the default translation in case TEI Lite lacks information about the presentation (as in tables for instance); (2) to personalize the presentation of a document or a set of documents." [from the announcement; see below]
- [October 22, 1997] Announcement from Jean-Daniel Fekete (Ecole des Mines de Nantes) for the availability of TEI2LATEX and TEI2HTML version 0.2. - 'Two Perl5 Programs to Translate TEI Lite Documents into LaTeX2e and HTML documents.'
- tei2latex version 0.1 - Announcement, from Jean-Daniel Fekete (Universite de Paris-Sud) for tei2latex version 0.1f
- FTP: ftp://ftp.lri.fr/LRI/soft/ihm/tei2latex-0.1f.tar.gz
DSSSL Software Tools
[CR: 19981013] [Table of Contents]
See the main DSSSL entry for fuller information about DSSSL sample application profiles, (Jade) compatible utilities, DSSSL stylesheets, DSSSL tutorials, DSSSL development tools, etc.
Jade - James [Clark]'s DSSSL Engine
[CR: 19981013]
- [October 13, 1998] James Clark announced the public availability of Jade Version 1.2.1. Most of the changes are 'to make it build better on various systems'. Windows binaries are compiled using Visual C++ 6.0, and there's a new .tar.gz source distribution for Unix systems.
- [September 24, 1998] James Clark announced the availability of Jade version 1.2. Jade (James' DSSSL Engine) is an "implementation of the DSSSL style language that features: 1) an abstract interface to groves - designed to be implementable on top of a database, in addition to simple in-memory implementations; 2) an in-memory implementation of this interface built with SP; 3) a style engine that implements the DSSSL style language; 4) command-line application, jade, that combines the style engine with the spgrove grove interface and five backends - XML representation of the flow object tree, RTF, TeX, MIF, SGML [SGML transformations]. New features in this release of Jade include: 1) a MIF backend written by Kathleen Marszalek and Paul Prescod, sponsored by ISOGEN International Corp; 2) an enhanced TeX backend written by Kathleen Marszalek, sponsored by Novaré International, and an associated LaTeX macro package by Sebastian Rahtz; 3) configure/autoconf support from Cees de Groot which should make it easier to install on Unix boxes."
- [Earlier version description]: Jade is for "James [Clark]'s DSSSL Engine." Jade (April 1997) includes the following components: (1) An abstract interface to groves; (2) An in-memory implementation of this interface built with SP; (3) A style engine that implements the DSSSL style language; (4) A command-line application, jade, that combines the style engine with the spgrove grove interface and four backends: "(a) a backend that generates an SGML representation of the flow object tree; (b) a backend that generates RTF (tested with Microsoft Word 97); (c) a backend that generates TeX; (d) a backend that generates SGML. This is used in conjunction with non-standard flow object classes to generate SGML, thus allowing Jade to be used for SGML transformations."
- [May 18, 1998] "Jade 1.1.1 is now available. There are no new features since 1.1, only bug fixes. . ." -JClark
- [March 10, 1998] See the announcement from James Clark for the public availability of SP version 1.3 and Jade version 1.1. "In Jade 1.1 the main changes are the experimental extensions for XSL (documented in dsssl2.htm), and the use of XML for the FOT backend's output." Note above the different requirement for the ArcBase PI.
- [February 16, 1998] An announcement from James Clark for a new test release of SP (version 1.2.92) and Jade (version 1.0.93). In Jade 1.0.93, "the main change since 1.0.92 is in the FOT backend. The FOT file is now well-formed XML. It has also been changed to make it closer to the action part of an XSL style-sheet. The hyperlinking information is also represented in a more straightforward way. The idea is to make it practical both to have new backends that work from the FOT file and to have other programs that generate an FOT file." [SP version 1.2.92 and Jade version 1.0.93, sources, archive copy]; [SP version 1.2.92 and Jade version 1.0.93, Win32 binaries, archive copy] Also: Jade 1.1 and sp 1.3 for OS/2 provided by David J. Birnbaum.
- For configuration of SP and Jade, note that Henry S. Thompson (HCRC Language Technology Group, University of Edinburgh) also has a 'configure' file (uses 'install' - from X11R5, mit/util/scripts/install.sh); it has been tested for Jade 1.1. [local archive copy, 1998-09-25]; [local archive copy, earlier version]
- [October 17, 1997] Jade 1.0.1 is available: ftp://ftp.jclark.com/pub/jade/jade1_0_1.zip; Win 32 binaries. Bug fixes only. James Clark has also made a new Jade test release available. Among the new Jade features are some experimental DSSSL extensions, designed and implemented in Jade "so that, with these extensions, DSSSL provides a superset of the semantics [needed in] XSL (Extensible Stylesheet Language) for flow object tree construction. Jade has a -2 option that enables these extensions." The extensions relate to (1) imperative programming features, from R4RS (e.g., assignment (set!) expressions (with restrictions), vectors (with restrictions), call-with-current-continuation (with restrictions), begin expressions, multiple expressions in procedure bodies and cond clauses, alternate in if expression optional, etc.); (2) style rules; (3) extended patterns [provide a superset of the semantics of XSL patterns]; (4) multiple patterns per rule; (5) flow object macros; (6) characteristic value conversion; (7) characteristic names.
- [September 03, 1997] Release of Jade version 1.0, intended to be reliable for production use.
- [July 24, 1997]. Announcement from James Clark for a new version of Jade. Jade version 0.9 contains mainly bug fixes, but it has two enhancements: (a) "incorporation of the latest changes to the TeX backend, from Sebastian Rahtz; (b) the RTF backend now handles the span characteristic." Jade 0.9 binaries for OS/2 from David J. Birnbaum, email: djb@clover.slavic.pitt.edu.
- Update for Jade version 0.8 [May 28, 1997]. See the announcement from James Clark for a new version of Jade - "James' DSSSL Engine." A major new feature in Jade version 0.8 is "support for the sgml-parse procedure which allows multiple source documents. . .Other more minor features include: (1) a '-G' which causes Jade to produce stack traces on error; (2) support for quasiquotation; (3) a read-entity procedure that returns the content of an external entity as a string; (4) an all-element-number procedure, which is like element-number, but which counts elements of all types and is much faster (constant time)." For 0.8, James has also merged in the source for the Grove OLE Automation interface. See also: Jade/SP 0.8 Unix binaries, from Ingo Macherius [Linux 2.0.30; Solaris 5.5; DigitialOS 3.2 (aka OSF1); Linux 2.0.30; MS-Windows NT/95]
- Update for Jade version 0.7 [April 1997]: the announcement describes several interesting aspects of Jade version 0.7: (a) "One important internal change is that backends can now add their own specialized flow objects without changing the front end. The SGML transformation backend now uses this to implement its flow objects"; (b) "The main addition in the front-end is support for much more of the query language. A lot of what is not implemented can be implemented easily with procedure definitions"; (c) The version of SP released in Jade 0.7 "now has some support for XML validation. Use -wxml to turn this on."; (d) "Microsoft have now released its free Word Viewer 97, and the Jade RTF backend now uses the Word/Word Viewer 97 hyperlink mechanism for representing links. Word Viewer 97 has several features that are significant for Jade, notably much better Unicode support and tables with vertical spans." Binaries (for Windows 95 or Windows NT) and source code are available.
- Jade Binaries, organized and/or compiled by Ingo Macherius; for (possibly) Linux 2.0.30; Solaris 5.5; DigitialOS 3.2 (aka OSF1); Linux 2.0.30; MS-Windows NT/95.
- Update from James Clark, February 21, 1997. "A new release of SP is available as part of Jade 0.5 from ftp://ftp.jclark.com/pub/jade/jade0_5.zip. This fixes the compilation problems with gcc as well as a couple of other minor glitches. This SP release should be considered a beta release. Please test it out and let me know of any problems. Jade Win32 binaries are available in ftp://ftp.jclark.com/pub/jade/jadew0_5.zip."
- Jade update November 11, 1996: Announcement for the first beta release of Jade (James' DSSSL Engine). Jade is an implementation of the DSSSL style language developed by James Clark, author of the SGMLS and SP parser tools. "Jade is freely available, with source code, with no restrictions on commercial use. The development platforms are Windows 95 and Windows NT, but it also works on Unix. Jade allows you to display and print SGML documents; you control it by specifying a DSSSL style sheet. It has a modular design that allows you to add support for new output formats by adding a new 'backend'. At the moment the most mature backend is for RTF (as supported by Microsoft Word for Windows 95). A TeX backend has been contributed by David Megginson. Jade currently supports the DSSSL Online subset of DSSSL with some additions. Jade is designed to have good performance. On the portable computer from which I am posting this it can turn the SGML source of the DSSSL standard into an RTF document of over 200 pages in under 20 seconds. On a high-end PC, it's more than twice as fast."
- Node Properties in Jade [0.7], provided by David Megginson (email: ak117@freenet.carleton.ca)
- Serialized tutorial article "Formatting Documents with DSSSL Specifications and Jade," by Bob DuCharme, published in <TAG>: The SGML Newsletter, May, June, [...] 1997. See the bibliographic entry.
- "The Simpleton's Guide to Transformations using Jade." - From Graydon Hoare (GroveWare)
- Jade 'transform' of XML to HTML (reportedly ?):
jade -Dc:\jadedir\jade -dmydsssl.dsl -thtml xml.dcl test.xml
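For scripted use, a command line like the one above can be assembled programmatically. The sketch below only rebuilds the argument list quoted in this entry; the helper name `build_jade_command` is hypothetical, the flags are simply those shown in the reported command, and nothing here verifies that a given Jade build actually accepts the `html` output type.

```python
def build_jade_command(spec, backend, inputs, search_dir=None):
    """Assemble a jade argument list in the attached-flag style used in
    the reported command (-D<dir> -d<spec> -t<backend> files...).

    The helper and its parameter names are illustrative assumptions;
    it does not run jade or check that the backend exists.
    """
    command = ["jade"]
    if search_dir is not None:
        command.append("-D" + search_dir)  # directory searched for system identifiers
    command.append("-d" + spec)            # DSSSL specification to apply
    command.append("-t" + backend)         # requested output type
    command.extend(inputs)                 # SGML declaration and/or document files
    return command


if __name__ == "__main__":
    args = build_jade_command(
        "mydsssl.dsl", "html", ["xml.dcl", "test.xml"],
        search_dir=r"c:\jadedir\jade",
    )
    print(" ".join(args))
```

Where Jade is installed, the resulting list could be handed to `subprocess.run`; keeping it as a list avoids shell quoting problems with Windows paths.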
Jade MIF Backend
[CR: 19980505]
As of May 04, 1998, 'Jade MIF Back' is based upon James Clark's Jade version 1.1, and its current version number is 1.0e.
- Main Page; [local archive snapshot, 19980505, text only]
- Contact: Kathleen Marszalek (kmarszal@watarts.uwaterloo.ca)
- Download zipped source code
YADE (Yet Another DSSSL Engine)
[CR: 19970421]
YADE (Yet Another DSSSL Engine) is a DSSSL engine being developed by Norbert H. Mikula, [previously of Philips Semiconductors]. YADE "is, as the name suggests, a project whose final outcome should be a Java-based implementation of a tool that is able to process documents conforming to ISO-IEC standard 10179. In order to reduce development complexity DSSSL-Online, a subset of DSSSL that has been specifically designed to allow early software implementers to provide a common accepted minimal conformance to ISO-IEC 10179, has been chosen as the first milestone to achieve." The current version of YADE [April 21, 1997] uses Kawa 1.2.
According to Lou Burnard's report on YADE as presented at the Third Annual Conference of the Belgium-Luxembourgian SGML Users' Group: "'Yet Another DSSSL Engine' (or YADE), uses Milowski's Kawa scheme interpreter [sic - for "Milowski's" read 'an older version of Per Bothner's Kawa Scheme implementation'], also written in Java. The context for these tools is the Philips Semiconductors Electronic Databook, an application of PCIS, the dtd Philips have developed within the Pinnacles framework, and forms the basis of Mikula's research at the University of Klagenfurt in Austria. His presentation was impressive, and although only in prototype form, the work he outlined shows great potential."
As of March 29, 1997, Mikula considered YADE not yet ready for public release, but thought it might be after the presentation at WWW6 in Boston.
"YADE... is a DSSSL engine that has been implemented using Java. Right now it is used in conjunction with my XML parser NXP (Norbert's XML Parser). YADE is using the Scheme engine Kawa, which has been developed by Per Bothner. YADE also follows the concept of having a core DSSSL engine and 'backends' for output to a certain device. As of today, YADE only supports the Java AWT (Abstract Window Toolkit) as a reference implementation of a backend." [from a description contributed by NHM]
Links:
- Description of YADE (March 29, 1997)
- Bibliographic entry for "Electronic Databooks: Proof of Concept.", in BeLux Proceedings of the 3rd Annual Conference on the Practical Use of SGML. The document briefly describes YADE. Available online in HTML format: Electronic Databooks: Proof of Concept, by Norbert H. Mikula; [mirror copy].
- Slides from poster session, at SGML '96
- Slide of YADE (PDF format), mirror copy
- Author's email address: nmikula@edu.uni-klu.ac.at [Department of Informatics, University of Klagenfurt, Austria]
- Norbert Mikula's Home Page
DSC---DSSSL Syntax Checker
[CR: 19970710]
"This tool, which embeds a full R4RS Scheme interpreter in James Clark's SP parser, is designed both to provide an online syntax checker for all DSSSL expression, style and transformation language programs, and to serve as a preprocessor for any Scheme-embedded DSSSL implementation." [from version 1.0 announcement] "Version 2.0, providing a much richer implementation framework, including the core query language, is scheduled for 2Q97."
- FTP source: ftp://www.cogsci.ed.ac.uk/pub/ht/dsc-1.0.tar.gz; [archive copy]
- Announcement from Henry S. Thompson for the "availability of the public release of version 1.0 of DSC---DSSSL Syntax Checker [text version]. This tool, which embeds a full R4RS Scheme interpreter in James Clark's SP parser, is designed both to provide an online syntax checker for all DSSSL expression, style and transformation language programs, and to serve as a preprocessor for any Scheme-embedded DSSSL implementation. Virtually the entire language as specified in chapters 8 through 12 of the standard is checked for syntactic correctness, and a nearly complete implementation of the core expression language is included. . . This is a UNIX-only release, tested so far under SunOS 4/5 and FreeBSD 2.1."
- See the previous entry. -- Announcement from Henry S. Thompson for the public availability of DSC --- DSSSL Syntax Checker version 0.7. "This tool, implemented in Scheme, is designed both to provide an offline syntax checker for all DSSSL expression, style and transformation language programs, and to serve as a preprocessor for any Scheme-embedded DSSSL implementation. Virtually the entire language as specified in chapters 8 through 12 of the standard is supported." Note: by about mid-January 1997, look here for an announcement for DSC-1.0 -- "a DSSSL syntax checker and implementation framework, which is based on an existing full R4RS Scheme interpreter and includes support for the DSSSL-O subset of the expression language, including lambdas with keywords" [from Henry S. Thompson]
- See also: "Index to all DSSSL procedures by prototype", by Henry S. Thompson. Derived automatically from the DSSSL standard using Jade. April 21, 1997. [mirror copy]
DSSSL Developer's Toolkit
[CR: 19970602]
The announcement from R. Alexander Milowski (Copernican Solutions Incorporated) describes the DSSSL Developer's Toolkit (DSSSLTK) version 1.0, available as a downloadable distribution. The toolkit "is similar in nature to the applet or serverlet architectures developed by Sun Microsystems/JavaSoft. . . a set of abstract interfaces written in Java to allow application developers to work with different Java-based DSSSL environments. . .[it] serves as an interface between different DSSSL components. It represents an architecture for building DSSSL-oriented systems using the Java programming language. . .[it] provides a means for different DSSSL implementations in Java to share components such as parsers, transformation engines and flow object semantics. The toolkit contains three Java packages: dsssl.engine, dsssl.grove, and dsssl.flowobject. . . Developed as part of the Seng DSSSL Environment from Copernican Solutions, the DSSSL Developer's Toolkit contains: (1) Full source code to the interfaces and classes; (2) Javadoc for the API reference; (3) Configuration and makefile utilities for building the distribution; (4) A prebuilt zip file containing all the classes."
Links:
- Announcement for version 1.0 [June 02, 1997]
Kawa - Java-based Scheme System (SENG)
[CR: 19970421]
"Kawa is a full Scheme implementation. It implements almost all of R4RS (for exceptions see section Features of R4RS not implemented), plus some extensions. It provides define-syntax from the R4RS appendix, and (from the draft R5RS) eval and multiple values. . . It is completely written in Java. Scheme functions and files are automatically compiled into Java byte-codes, providing reasonable speed. (However, Kawa is not an optimizing compiler, and does not perform major transformations on the code.) . . .Kawa provides the usual read-eval-print loop, as well as batch modes. . . Kawa is written in an object-oriented style. Kawa implements most of the features of the expression language of DSSSL, the Scheme-derived ISO-standard Document Style Semantics and Specification Language for SGML. Of the core expression language, the only features missing are character properties, external-procedure, the time-related procedures, and character name escapes in string literals. Also, Kawa is not generally tail-recursive, and literal unescaped symbols are case-insensitive (folded to lower-case). From the full expression language, Kawa additionally is missing format-number, format-number-list, and language objects. Quantities, keyword values, and the expanded lambda form (with optional and keyword parameters) are supported." [from the FAQ for version 1.4, updated March 31, 1997.]
Links:
- Copernican Solutions Inc., Kawa Project, contributed by R. Alexander Milowski (versions 0.1 and 0.2) and Per Bothner (versions 0.3 and later). "The Kawa Project (pronounced 'kava') is a scheme interpreter totally written in Java -- a portable object-oriented language developed by Sun Microsystems. The intentions of the projects are to have a totally portable scheme interpreter with two major extensions--access to the java object system and syntax extensions to support DSSSL."
- Kawa FAQ document
- FTP sources from CYGNUS - ftp://ftp.cygnus.com/pub/bothner
- Sources for version 1.4 from COPSOL
- Kawa recent news
- SENG DSSSL Environment -- see the entry above
- See also: DSSSL-grove package and DSSSL Grove Guide
psgml-dsssl
[CR: 19971113]
"This program generates skeleton DSSSL specifications for DTDs from within PSGML. Emacs and PSGML are required."
- From David Megginson: announcement for a utility 'psgml-dsssl.el', which works with the free 'psgml' editor under Gnu Emacs. "This small package works together with the PSGML editor in XEmacs or Gnu Emacs to produce a skeleton DSSSL style spec automatically for the current document's DTD."
- http://home.sprynet.com/sprynet/dmeggins/psgml-dsssl.el
- Source: archive copy, October 20, 1996.
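The idea of generating a skeleton style specification from a DTD can be illustrated outside Emacs. The sketch below is not psgml-dsssl.el and makes no claim about its output format: it is a minimal Python illustration that assumes simple `<!ELEMENT name ...>` declarations (name groups such as `<!ELEMENT (a|b) ...>` and parameter entities are ignored), folds names to lower case as an arbitrary choice, and emits one `(element ...)` construction rule per declared element with `(process-children)` as a placeholder body. The function name `skeleton_dsssl` is hypothetical.

```python
import re

# Matches the element name in simple declarations only; declarations that
# use name groups or parameter entity references are deliberately skipped.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([A-Za-z][\w.-]*)", re.IGNORECASE)


def skeleton_dsssl(dtd_text):
    """Return placeholder DSSSL construction rules, one per declared element,
    in declaration order and without duplicates."""
    names = []
    for name in ELEMENT_DECL.findall(dtd_text):
        lowered = name.lower()  # illustrative choice, not a DSSSL requirement
        if lowered not in names:
            names.append(lowered)
    rules = ["(element {}\n  (process-children))".format(n) for n in names]
    return "\n\n".join(rules) + "\n"


if __name__ == "__main__":
    sample_dtd = (
        "<!ELEMENT doc - - (title, para+)>\n"
        "<!ELEMENT title - - (#PCDATA)>\n"
        "<!ELEMENT para - - (#PCDATA)>\n"
    )
    print(skeleton_dsssl(sample_dtd))
```

Each generated rule is a starting point to be filled in by hand, for example by replacing `(process-children)` with a `make` expression for the desired flow object.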
panodssl
[CR: 19970307]
PANODSSL.pl version 0.2. Script for converting Panorama stylesheets to DSSSL specifications.
- From Geir O. Grønmo: announcement for 'panodssl.pl' - Perl script for converting Panorama stylesheets to DSSSL specifications. [October 20, 1996]: "a very early release; more like a pre-alpha." "It has been tested with Jade, James Clark's excellent DSSSL-engine, and it seems to work."
- URL: http://www.falch.no/people/geirog/panodssl/panodssl.pl; [mirror copy]. Update [November 01, 1996]: announcement for PANODSSL.pl version 0.2, available at: http://www.falch.no/people/geirog/panodssl/index.htm.
psgml-jade
[CR: 19980423]
Matthias Clasen (Institut für Mathematik, Albert-Ludwigs-Universität Freiburg) has contributed psgml-jade. psgml-jade is "an add-on to the psgml package for editing SGML files with Emacs which is intended to make menu-driven processing SGML files with jade and jadetex possible. It requires Gnu Emacs or XEmacs, together with Lennart Staflin's PSGML mode (tested with version 1.0.1) and David Megginson's DSSSL extensions (psgml-dsssl.el). It can also take advantage of David Love's new scheme.el which defines dsssl-mode."
Links:
- [April 22, 1998] FTP Directory for psgml-jade sources; [sources local archive copy, 980422]
- README document for psgml-jade; [local archive copy]
- [April 22, 1998] Some patches contributed by Dr. Markus Hoenicka (Hoenicka@pbmail.me.kp.dlr.de) have been integrated, which make psgml-jade work on Windows NT.
- [July 11 [14], 1997] Matthias Clasen: "I have put a new version of my add-on to psgml at http://logimac.mathematik.uni-freiburg.de/mixed/psgml-ja.tgz - or - ftp://logimac.mathematik.uni-freiburg.de/logimac/Dokumente/www/mixed/psgml-ja.tgz. This version has improved support for menu-driven customization of style sheets. You can now save and reuse customized values." [archive copy]
- Update (June 09, 1997), version 1.1.1.1 1997/06/08: "...the ability to select the SGML backend of jade and a menu entry to edit an existing style sheet associated with a sgml file." [mirror copy]
- Announcement from Matthias Clasen (clasen@netzservice.de) for psgml-jade.el -- emacs lisp code which adds jade and jadetex support to psgml. Requires Gnu Emacs 19.* or XEmacs, together with Lennart Staflin's PSGML mode (tested with version 1.0.1) and David Megginson's DSSSL extensions (psgml-dsssl.el). "Now, whenever you are editing an SGML document with PSGML, you will see an additional menu with title "DSSSL". It contains entries to Jade, JadeTeX, Xdvi, David Megginson's `sgml-dsssl-make-spec' function and two more entries to display the results of process and to kill a running process."
- URL: http://www.uni-kiel.de:8080/Logik/persons/mc/psgml-jade.el, [mirror copy]
Jadetex Package
[CR: 19981020]
The Jadetex package is an implementation of the TeX skeleton produced by "jade -t tex", built on top of LaTeX. From Sebastian Rahtz (s.rahtz@elsevier.co.uk) and David Megginson:
- [October 20, 1998] jadetex test version 2.3 - reworking of vertical spacing, support for the "score" flow object, and revised entity lists.
- [September 24, 1998] Note on Jade 1.2 and JadeTeX
- [August 06, 1998] Update from Sebastian Rahtz. Sebastian Rahtz (Elsevier) posted an announcement for an updated set of JadeTeX macros, and further (unofficial) patches to Jade itself. The Jadetex package provides an implementation of the TeX skeleton which uses James Clark's Jade DSSSL Engine with the -t tex option; it is built on top of LaTeX. "Jade's TeX backend (originally written by David Megginson, since modified by Sebastian Rahtz and Kathleen Marszalek) has a very simple model: it emits a TeX command for the start and end of every flow object, defining any changed characteristics at the start of the command. This abstract TeX markup can then be fleshed out by writing definitions for each of the flow object commands, and this is what the JadeTeX macro package provides." In connection with this new release of JadeTeX, Sebastian has also made available an (excellent!) article "The TeX Backend for Jade and the JadeTeX Macros." [local archive copy]
- [July 15, 1998] Update from Sebastian Rahtz. Announcement for Jade patches for Jadetex, and a new Jadetex package. The Jade TeX Backend (with table extensions) is now at version 1.0b5, based on Jade 1.1. This update from Sebastian includes new files added by Kathleen Marszalek and Paul Prescod for enhanced table support, new documentation, context diffs between jade1.1 source tree (jclark.orig), and revised source tree (jclark), and updated Jadetex macros. Part of the work has been "commissioned by Novare International, who have agreed that the code can be distributed under the same conditions as the rest of Jade."
- [June 16, 1998] Update from Sebastian Rahtz: Jade's TeX backend and Jadetex now properly support the processing of tabular material. See the demonstration: http://www.tug.org/applications/jadetex/tempest.pdf.
- [February 09, 1998] Announcement from Sebastian Rahtz for version 0.56, which "fixes a couple of bugs (including the one about horizontal rules in multi-columns), and adds a subdirectory './cooked' containing all the TeX macro files you need to build the format."
- [January 31, 1998] Announcement from Sebastian Rahtz for an updated version of his jadetex LaTeX macro package. The jadetex package uses James Clark's Jade DSSSL Engine with the -t tex option (i.e., tex backend). This release 0.55 of jadetex "fixes some problems with cross-referencing, gets back in sync with hyperref, and adds language support. The latter uses the LaTeX babel package, and it is up to you to compile the right hyphenation patterns into the format file. . . I have added a directory called `test' which contains a self-contained trivial table, with multi-column spanning, and cell alignment. You see [there] the .sgm file, the DTD, the .dsl, the .pdf output, and the .rtf output."
- CTAN sources [CTAN /tex-archive/macros/jadetex]
- jadetex.dtx
- Hints on using Jade and TeX (!), by Sebastian Rahtz
- July 23, 1997. Updated version of the package to go with Jade 0.9. See: the description, or ftp://ftp.tex.ac.uk/tex-archive/macros/jadetex/; [mirror]
- Notice on availability of CDROM with upgraded 'jadetex' and the DSSSL standard in PDF format generated using jadetex [970509]
- Comments [970305] by Sebastian Rahtz
- Comments on update [970424]
- Jadetex enhancements: labels, links, color, etc. "This is the DSSSL spec transformed to TeX, and run through pdftex, which generates PDF output instead of TeX's original .dvi form." [970425]
- TeXFOTBuilder: a Generic TeX backend for Jade - Information in the Jade documentation set
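The backend model quoted in the August 1998 entry above (one command at the start and one at the end of every flow object, with changed characteristics written at the start) can be illustrated with a toy emitter. This is a sketch under loud assumptions: it is not Jade's C++ backend, and the command names (`\Seq`, `\Par`), the `key=value` notation, and the `fontsize` characteristic are invented for the example rather than taken from real Jade output.

```python
from dataclasses import dataclass, field


@dataclass
class FlowObject:
    """A toy flow object: a class name, its characteristics, and content."""
    name: str
    characteristics: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # FlowObject or str items


def emit(fo, inherited=None, out=None):
    """Emit a start command and an end command for each flow object.

    Only characteristics whose value differs from the inherited value are
    written at the start command. Command and key names are hypothetical.
    """
    inherited = inherited or {}
    out = [] if out is None else out
    changed = {k: v for k, v in fo.characteristics.items()
               if inherited.get(k) != v}
    settings = ",".join("%s=%s" % (k, v) for k, v in sorted(changed.items()))
    out.append("\\%s{%s}" % (fo.name, settings))
    current = dict(inherited, **fo.characteristics)
    for child in fo.children:
        if isinstance(child, str):
            out.append(child)
        else:
            emit(child, current, out)
    out.append("\\end%s{}" % fo.name)
    return out


doc = FlowObject("Seq", {"fontsize": "10pt"}, [
    FlowObject("Par", {"fontsize": "10pt"}, ["Plain text."]),
    FlowObject("Par", {"fontsize": "14pt"}, ["Larger text."]),
])
lines = emit(doc)
print("\n".join(lines))
```

A macro package in the role of JadeTeX would then supply TeX definitions for each emitted command; the emitter itself knows nothing about typesetting.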
DSSSL editing under emacs (dsssl/scheme mode)
[CR: 19970425]
- Postings from Dave Love and David Megginson
SGML/DSSSL Presentation Development Application
[CR: 19980511]
Ken Holman (Crane Softwrights Ltd.) announced the public availability of an SGML/DSSSL Presentation Development Application. It is an SGML application for frame-based presentation slide-shows with DSSSL scripts for the rendering of the slides to HTML and RTF final forms. This shareware application may be used with James Clark's JADE DSSSL Engine "to create slide-show presentations and associated paper handouts" from SGML source documents. The tool is "based on an SGML document model (DTD) and uses two DSSSL stylesheet scripts to render the structured presentation in both HTML and RTF."
XML/XSL Software Tools
[CR: 20001011]
The main XML document in the SGML/XML Web Page contains a section with references to generally-available XML/XSL/XLink software, and a section on XML design and development resources. Some other XML (demo) applications are listed in the section XML: Miscellaneous Unevaluated Uncategorized. Software packages specific to XSL and XLink are listed on the dedicated XSL and XLink pages. For XML software tools, see also: Steve Pepper's Whirlwind Guide to SGML Tools and Vendors, and the Free XML software list from Lars Marius Garshol.
Lark, an XML processor
[CR: 19980105]
Tim Bray of Textuality (and one of the XML editors) is developing Lark, an XML processor. The name 'Lark': "Lauren's Right Knee" [ask Tim]. The Textuality server contains a document "An Introduction to XML Processing with Lark," the abstract of which says, in part: "Lark is a non-validating XML processor implemented in the Java language; it attempts to achieve good trade-offs among compactness, completeness, and performance. . . Lark is available on the Internet for general public use." Note that the Textuality Web server has a number of other resources for XML.
"Lark is a processor only; it does not attempt to validate. It does read the DTD, with parameter entity processing; it processes attribute list declarations (to find default values) and entity declarations. Lark's internationalization is incomplete; it reads UCS-2, UTF-16, and ASCII (making use of the Byte Order Marks and Encoding Declarations in the appropriate fashion), but not UTF-8. Aside from that, Lark is relatively full-featured; it implements (I think) everything in the XML spec, except conditional DTD sections, and reports violations of well-formedness." [description of October 29 1997]
- [January 05, 1998] Announcement for the 1.0 final beta of Lark, and for release 0.8 of Larval, a validating XML processor based on Lark.
- [October 29, 1997] Announcement from Tim Bray of Textuality for the release of Lark version 0.97.
- [September 09, 1997] Announcement from Tim Bray (Textuality) for release 0.92 of Lark, "a non-validating XML processor implemented in the Java language." Beginning with version 0.91, Lark processes Unicode: "It reads the BOM and thus UCS-2/UTF-16 (even byte-swaps); if there's no BOM, reads and tries to use the encoding declaration, boots it if it says anything but 'UTF-8' or 'UTF8'." Lark 0.92 is faster - 11.9 times faster than 0.91.
- [June 27, 1997] Announcement from Tim Bray for the availability of Lark version 0.90. Differences between this and the previous version include: (1) "handling of entity references in attribute values; (2) handling of &#X style hex character references; (3) draconian error handling; (4) the Handler has an element() method to serve as an element factory; (5) lots of bug fixes; (6) distributed in a package, textuality.lark." The new version also "now comes with an application named XH . . ."
- Document URL for "An Introduction to XML Processing with Lark": http://www.textuality.com/Lark/
- Code sources: http://www.textuality.com/Lark/lark.tar.gz
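The encoding logic described in the Lark notes above (look at the byte order mark first, fall back to the encoding declaration) can be sketched as follows. This is not Lark's Java code: the function name `sniff_encoding` is hypothetical, UTF-32 is ignored, and the UTF-8 fallback reflects the XML 1.0 default rather than anything stated about Lark.

```python
import codecs
import re

# Encoding declaration inside the XML declaration, e.g. encoding="UTF-8".
ENCODING_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def sniff_encoding(data):
    """Guess the encoding of an XML entity from its first bytes.

    Order of checks: byte order mark, then the encoding declaration,
    then a default of UTF-8 when neither is present.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    match = ENCODING_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


print(sniff_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
```

A processor that supports only some encodings would reject the document at this point if the sniffed name is not one it can decode.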
DXP - DataChannel XML Parser
[CR: 19980504]
"DXP is a validating XML Parser in Java. DXP is based on NXP (Norbert Mikula's XML Parser), one of the first XML parsers." The current version of DXP [19980504] is 1.0 beta1c.
"DXP is specifically aimed at providing a utility for server-side applications that need to integrate XML capabilities into existing systems and for out-of-the-browser Java-based software. DXP provides the highly sophisticated error-checking mechanisms required for XML-based data interchange. DXP has not been architected for usage in an applet context, downloaded via the Internet. DXP, due to its complexity and feature set, is too large and would cause performance problems if transferred via the Internet. DXP uses JavaCC, a Java compiler-compiler that allows for the automatic generation of a parser framework based on a formal specification of the language (XML) targeted."
NXP, Norbert's XML Parser: an XML parser written in Java
[CR: 19980504]
[The successor to NXP is DXP - DataChannel XML Parser. See immediately above.]
NXP is a public domain XML parser written in Java, by Norbert Mikula. The lexical analyzer and the grammar have been defined using the parser generator Jack. It is in the beta development stage. As of March 08, 1997, it supported: Public Identifiers, catalogs (incl. DELEGATE and CATALOG), Parameter Entities, Resolution of Name conflicts, Attribute defaults.
- NXP Main Web Page
- NXP's mission statement
- Announcement for a new beta version [March 08, 1997]
- NXP's catalog support
- README / NXP
- Sources for Beta
- URL: NXP - Norbert's XML Parser, and a small test instance, by Jon Bosak. (January 14, 1997)
Microsoft XML parser in Java (MSXML)
[CR: 20001011]
[October 11, 2000] Microsoft XML Parser (MSXML). See (1) Joshua Allen's "Unofficial MSXML XSLT FAQ." (2) "What's New in the September 2000 Microsoft XML Parser Beta Release." (3) "Internet Explorer Tools for Validating XML and Viewing XSLT Output" (March 15, 2000 or later). (4) "Installing Msxml3.dll in Replace Mode" (September 2000 or later).
[19971209] "The Microsoft XML Parser is a validating XML parser written in Java. The parser loads XML documents and builds a tree structure of Element objects, starting with the root object of type Document. Each XML tag can either represent a node or a leaf of this tree. You can then browse and edit the tree using the methods of the Element class, and you can save the tree back out in XML format."
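The load, browse, edit, and save cycle described in this 1997 text can be illustrated with any tree API. The sketch below uses Python's xml.etree.ElementTree rather than the MSXML Java classes, so every call in it belongs to the Python standard library and none is Microsoft's API; the sample document is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample document.
SOURCE = "<order><item qty='1'>pen</item><item qty='2'>ink</item></order>"

# Load: the parser builds a tree whose root element holds child elements.
root = ET.fromstring(SOURCE)

# Browse: walk the children of the root and read attributes and text.
items = [(child.get("qty"), child.text) for child in root]

# Edit: change an attribute and append a new leaf element.
root[0].set("qty", "3")
extra = ET.SubElement(root, "item", {"qty": "1"})
extra.text = "paper"

# Save: serialize the modified tree back to XML text.
saved = ET.tostring(root, encoding="unicode")
print(saved)
```

The trade-off of a tree model is that the whole document is held in memory, which is convenient for editing but costly for very large inputs.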
[December 09, 1997] Version 1.8 of the Microsoft XML Parser in Java was released on December 04, 1997. Version 1.8 of the parser implements the entire W3C working draft of the XML specification dated November 17, 1997, including support for the standalone attribute, new End-of-Line Handling, support for the xml:lang attribute on any tag regardless of ATTLIST declaration, [now] lower-casing of some generated GIs and attribute names, etc. The parser "will be revised to reflect future W3C changes to the specifications. The Microsoft XML Parser is a validating XML parser written in Java(r). The parser checks for well-formed documents and optionally permits checking of the documents' validity."
[November 01, 1997] New features were announced for version 1.6 of the Microsoft XML Parser in Java. Released October 31, 1997, the package containing the source code for the latest version of the XML Parser supersedes the XML Parser that shipped with Internet Explorer 4.0 . . . "it implements the entire W3C working draft of the XML Specification dated August 7th, 1997, and will be revised to reflect future W3C changes to the specifications. . . The Microsoft XML Parser is a validating XML parser written in Java(tm). The parser checks for well-formed documents and optionally permits checking of the documents' validity. Once parsed, the XML document is exposed as a tree through a simple set of Java methods, which [Microsoft is] working with the World Wide Web Consortium (W3C) to standardize." As elaborated in the release notes, changes in the latest version include: (1) Case sensitivity; (2) Conditional sections in the DTD (INCLUDE and IGNORE keywords); (3) Support for namespaces (see XML Namespaces document); (4) Support for the ENCODING attribute on the XML tag; (5) Support for the XML-SPACE attribute in regular XML and in the DTD; (6) Support for the RMD attribute on the XML tag; (7) New Document save options for COMPACT and PRETTY save formats; (8) Support for floating ampersands, e.g., 'This & that'; (9) Support for empty end tags, e.g., <Foo>bar</>. The main XML page from Microsoft now references several online demos for XML, and sample XML files.
[June 1997] "The XML Parser in Java (MSXML) from Microsoft Corporation is now [June 07, 1997] available for download. This is the second piece of XML technology from Microsoft, the first being the Channel Definition Format support in Internet Explorer 4.0. The Microsoft XML Parser can be installed on any machine that has the Java Development Kit (JDK 1.0.2 or JDK 1.1.1)."
Links:
- Description of the Microsoft XML Parser, MSXML
- Microsoft XML Page
- Microsoft XML Parser in Java -- Installation
- See also: Microsoft XML Parser in C++ - non-validating
- Other details on Microsoft XML Support
XP, an XML parser in Java (James Clark)
[CR: 19980813]
On January 26 1998, James Clark posted an announcement for the public availability of a new XML parser in Java, tentatively called XP, along with an expanded collection of test cases, and a specification of a subset of XML called Canonical XML (for use in testing XML parsers). The XP parser, now in alpha-test version, "is fully conforming: it detects all non well-formed documents. It is currently not a validating XML processor. However it can parse all external entities: external DTD subsets, external parameter entities and external general entities." XP's design goals are documented as follows: "1) Conformance and correctness: XP is designed to be 100% conformant to the XML specification; 2) High performance: XP aims to be the fastest conformant XML parser in Java; 3) Layered structure: In addition to a normal high-level parser API, XP provides a low-level API that supports the construction of different kinds of XML parser (such as incremental parsers)." XP is one of several XML development resources made available by James Clark; see the link to his "XML Resources" below.
[August 13, 1998] On August 13, James Clark announced the availability of XP version 0.4 - 'XML Parser in Java'. In XP version 0.4, the main change "apart from bug fixes is that XP now makes available much more information about the markup of the document (non-ESIS information) including information about comments, entity references and the document type." XP supports several encodings: UTF-8, UTF-16, ISO-8859-1, US-ASCII.
Links:
- XP Main Page
- XP API documentation (generated by javadoc)
- XP version 0.4 (archive copy)
- Announcement for XP [980126]
- "XML Resources" from James Clark
- XP sources (.ZIP); [local archive copy, 980210]; and earlier, [local archive copy, 980126]
- James Clark's Home Page
expat - XML parser in C
[CR: 20001011]
[October 11, 2000] See the announcement for the release of version 1.2 [2000-10-06]. With this release, expat development is handed over to Clark Cooper and others.
[July 02, 2000] expat - "12-May-00 01:11 145k" [cache]
[May 31, 1999] James Clark announced the release of Expat Version 1.1, which may be used under either the Mozilla Public License Version 1.1 or the GNU General Public License. "Expat (XML Parser Toolkit) is an XML 1.0 parser written in C. It aims to be fully conforming [but] is currently not a validating XML processor. New features of expat version 1.1 relative to 1.0 include: (1) Support for XML namespaces, (2) Ability to report comments, (3) Ability to report CDATA section boundaries, (4) Ability to report which attributes are defaulted, (5) Compile option to reduce object-code size at the expense of speed. Expat has built in support for the following encodings: utf-8, utf-16, iso-8859-1, and us-ascii. Additional encodings can be supported by using XML_SetUnknownEncodingHandler." For other information, see the primary expat documentation page and the document "Frequently Asked Questions about Expat." [local archive copy]
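Three of the version 1.1 features listed above (namespace support, comment reporting, CDATA section boundaries) can be observed through Python's standard xml.parsers.expat module, which wraps the expat library. This is a sketch, not C code against expat 1.1: the handler attribute names are those of the Python binding, the bundled expat is a much later release, and the sample document and namespace URI are invented.

```python
import xml.parsers.expat

events = []

# Passing a namespace_separator turns on namespace processing: element
# names are then reported as "<namespace URI><separator><local name>".
parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")

parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
parser.CommentHandler = lambda text: events.append(("comment", text))
parser.StartCdataSectionHandler = lambda: events.append(("cdata-start",))
parser.EndCdataSectionHandler = lambda: events.append(("cdata-end",))

# Invented sample document.
DOC = (b'<doc xmlns="http://example.org/ns">'
       b'<!-- note --><![CDATA[a < b]]></doc>')
parser.Parse(DOC, True)

for event in events:
    print(event)
```

No character data handler is installed here, so the text inside the CDATA section is parsed but not recorded; only its boundaries are.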
[November 23, 1998] James Clark has announced the availability of the expat - XML Parser Toolkit Version 1.0.1, containing bug fixes. Expat "is an XML 1.0 parser written in C which aims to be fully conforming, but is currently not a validating XML processor. . . [the distribution] contains the xmlwf application, which uses the xmlparse library. The arguments to xmlwf are one or more files which are each to be checked for well-formedness. An option -d dir can be specified; for each well-formed input file the corresponding canonical XML will be written to dir/f, where f is the filename (without any path) of the input file. A -x option will cause references to external general entities to be processed." A new test version of expat which adds support for checking of lexical aspects of XML namespaces specification is also available. Expat is released under the Mozilla Public License Version 1.0, but with this release of expat, 'as a special exception', one may elect to use the GNU General Public License.
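The core job of xmlwf as described above, checking each named file for well-formedness, can be approximated with Python's expat binding. This is a rough analogue under stated assumptions, not the xmlwf source: the function names are hypothetical, and the sketch neither writes canonical XML (the -d option) nor processes external general entities (the -x option).

```python
import xml.parsers.expat


def check_well_formed(data):
    """Return (True, None) if data parses, else (False, error message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        return False, str(err)
    return True, None


def check_files(paths):
    """Check each file in turn, in the spirit of passing file names to xmlwf."""
    results = {}
    for path in paths:
        with open(path, "rb") as handle:
            results[path] = check_well_formed(handle.read())
    return results


print(check_well_formed(b"<a><b/></a>"))
print(check_well_formed(b"<a><b></a>")[0])
```

A non-validating check of this kind reports syntax errors such as mismatched tags, but says nothing about whether the document conforms to its DTD.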
[August 14, 1998] A new version of James Clark's Expat is now available. Expat version 1.0 represents the first production release of this XML Parser Toolkit. Changes since the last beta version are a few minor bug fixes. Clark's Expat is an XML 1.0 parser written in C. It 'aims to be fully conforming, but is not currently a validating XML processor.' The distribution comes with Win32 executables. It also includes an "xmlwf application, which uses the xmlparse library. The arguments to xmlwf are one or more files which are each to be checked for well-formedness. An option -d dir can be specified; for each well-formed input file the corresponding canonical XML will be written to dir/f, where f is the filename (without any path) of the input file. An -x option will cause references to external general entities to be processed."
[June 21, 1998] James Clark has announced the release of a new version of expat, his "high-performance, fully conforming, non-validating XML 1.0 parser toolkit written in C." The public distribution comes with source code and Win32 binaries, and is subject to the Mozilla Public License Version 1.0. "The directory xmlwf contains the xmlwf application, which uses the xmlparse library. The arguments to xmlwf are one or more files which are each to be checked for well-formedness. An option -d dir can be specified; for each well-formed input file the corresponding canonical XML will be written to dir/f, where f is the filename (without any path) of the input file. A -x option will cause references to external general entities to be processed." Considerable new functionality has been introduced in this latest beta version of expat, including: a) a callback that allows an application to add to the set of encodings that expat supports [with an example of using this to hook into the Windows code page support]; b) expat can be compiled to pass characters to the application in Unicode (i.e., as a sequence of 16-bit codes) rather than in UTF-8; c) new callbacks to provide information about unparsed entities and notations; d) new functions that allow an application to determine the location (line number, column number, byte index) of all events; e) hooks to allow applications to have access to the raw markup of the document along with the parsed result [allows the writing of XML-to-XML filters that don't normalize the document markup].
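Item (d) above, the ability to ask for the location of each event, is exposed in Python's expat binding as parser attributes. The sketch below is an illustration using that binding, not the C functions of the 1998 release; the sample document is invented.

```python
import xml.parsers.expat

# Invented sample: element b starts on line 2 after two spaces.
DOC = b"<a>\n  <b/>\n</a>"
locations = []

parser = xml.parsers.expat.ParserCreate()


def on_start(name, attrs):
    # During a callback these attributes describe where the event began.
    locations.append((name,
                      parser.CurrentLineNumber,
                      parser.CurrentColumnNumber,
                      parser.CurrentByteIndex))


parser.StartElementHandler = on_start
parser.Parse(DOC, True)

for entry in locations:
    print(entry)
```

Line numbers count from 1 while columns and byte indexes count from 0, so the first element is reported at line 1, column 0, byte 0.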
[May 1998] James Clark's Expat (XML Parser Toolkit) is distributed under the Mozilla Public License Version 1.0. In its current beta-test version (19980504), "Expat is an XML 1.0 parser written in C. It aims to be fully conforming. It is currently not a validating XML processor. . . [the distribution contains] the xmlwf application, which uses the xmlparse library. The arguments to xmlwf are one or more files which are each to be checked for well-formedness. An option -d dir can be specified; for each well-formed input file the corresponding canonical XML will be written to dir/f, where f is the filename (without any path) of the input file." The predecessor to expat was called xmltok (see below).
Links:
- Description of expat
- Download - expat.zip [local archive copy, 19980621 version]
- [July 27, 1998] TAKAHASHI Masayoshi posted a notice for the availability of an unofficial version of expat to handle Japanese encodings, Shift_JIS and EUC-JP. The author requests feedback from testers.
- Local archive copy, Version 1.0, 980814
XMLTok - XML parser in C
[CR: 19980210]
See now the successor (expat), above. From James Clark, XMLTok "is an XML parser in C. This includes 1) a low-level XML tokenizer; 2) a non-validating XML parser built on the tokenizer; this has an API designed for integration into Web browsers; 3) a simple application xmlwf for testing the parser, which can test XML entities for well-formedness and generate canonical XML."
Links:
- XMLTok, .ZIP archive format; [local archive copy, 980210]
- James Clark's XML Resources
SX - An SP application for SGML to normalized XML
[CR: 19980216]
James Clark posted an announcement on October 28, 1997 for a "very preliminary release of SX, an application built with the SP library for converting SGML to XML." This tool will eventually be included in the standard SP distribution. SX (the provisional name) "parses and validates the SGML document contained in sysid... and writes an equivalent XML document to the standard output. SX will warn about SGML constructs which have no XML equivalent." The distribution includes both source and Win 32 binaries (the sp120u.dll file included in the SP 1.2.1 Win32 Unicode binary distribution is required). Note that the program "does not yet provide enough to handle the situation where you want to migrate your document source from SGML to XML. In particular it doesn't try to preserve entity references; all entities are expanded."
As of the February 1998 test release of SP - per an announcement from James Clark for a new test release of SP (version 1.2.92) and Jade (version 1.0.93) - SP includes the SX application.
Links:
- SX - now in the SP package
- Description and documentation
- Source (.ZIP); [local archive copy]
SAX - the Simple API for XML
[CR: 19990226]
SAX 1.0 (the Simple API for XML) was released on May 11, 1998. SAX is a common, event-based API for parsing XML documents, developed as a collaborative project of the members of the XML-DEV discussion under the leadership of David Megginson. Relative to the preliminary draft version of SAX released in January 1998, SAX Version 1.0 represents a major reimplementation, adding some important features such as the ability to read documents from byte or character streams. "SAX fills the same role for XML that the JDBC fills for SQL: with SAX, a Java application can work with any XML parser, as long as the parser has a SAX 1.0 driver available. . . The first release of SAX is in Java, but versions in other programming languages may follow. SAX is free for both commercial and non-commercial use."
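The event-based style that SAX defines can be seen in a few lines. The original SAX 1.0 release was a Java API; the sketch below uses Python's standard xml.sax module, a later binding of the same idea, and the handler class name and sample document are invented for the example.

```python
import xml.sax


class OutlineHandler(xml.sax.ContentHandler):
    """Record element starts, element ends, and non-blank character data."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        if content.strip():
            self.events.append(("text", content))


handler = OutlineHandler()
# The parser pushes events to the handler; no document tree is built.
xml.sax.parseString(b'<memo id="m1"><to>XML-DEV</to></memo>', handler)

for event in handler.events:
    print(event)
```

Because the application sees only a stream of callbacks, the same handler can be used with any parser that provides a SAX driver, which is the portability point made in the quotation above.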
Links:
- SAX Home Page, and Microstar mirror site
- [May 12, 1998] Announcement for the public release of SAX 1.0. See: http://www.megginson.com/SAX/.
- "What is an Event-Based Interface?" By David Megginson.
- [April 27, 1998] For developers only: SAX 1.0 beta, almost ready for announcement.
- [January 12, 1998] Announcement from David Megginson
- [March 20, 1998] Proposed SAX Revisions, per the posting from David Megginson
- [April 10, 1998] For developers only: test re-release beta version of SAX, and demo which does the identity transform
- Draft Interface Specification
- SAX: Java Implementation
- History and Contributors
- Parsers and Applications Currently Using SAX
- "SAX - The Simple API for XML." By David Megginson (Microstar Software Ltd). Handout from the presentation at XML Developers' Day, Seattle, WA, 27 March 1998. A Postscript version of this document is available as well.
- [February 26, 1999] Draft for Perl SAX Basic interface for a simplified Perl binding for SAX (Simple API for XML), by Ken MacLeod.
Docuverse DOM SDK. Previously 'FREE-DOM - W3C DOM API using SAX' and SAXDOM
[CR: 19980907]
Docuverse DOM SDK is an implementation of W3C Document Object Model (DOM) API in Java. As of Preview Release 2, it includes W3C DOM HTML API support. Has: "Support for W3C Proposed Recommendation for DOM (Core) Level 1, Support for SAX 1.0 compatible XML Parsers, Support for custom node implementations, Full JavaDoc documentation for the DOM API."
[Previous description, partially out-of-date, follows:] The FREE-DOM package is from Don Park. A free DOM implementation, formerly called SAXDOM, ". . .supports but is not limited to SAX. Look for AElfred and MSXML support in the near future with expanded DOM spec support (meaning XML and HTML portion of the spec). It can be used with MSXML if you combine two drivers: one for bridging FREE-DOM to SAX and another for bridging SAX to MSXML. A direct driver from FREE-DOM to MSXML as well as other popular parsers is planned."
FREE-DOM [SAXDOM], under development by Don Park, is an implementation of W3C Document Object Model (DOM) API using Simple API for XML (SAX). In Java. SAXDOM is in public domain and can be used for any commercial or non-commercial purpose.
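Building a DOM tree from SAX events, the approach this paragraph attributes to SAXDOM, comes down to keeping a stack of open elements. The sketch below shows that technique with Python's xml.sax and xml.dom.minidom; it is not Don Park's Java code, and the class name `DomBuilder` and the sample document are invented.

```python
import xml.sax
from xml.dom.minidom import Document


class DomBuilder(xml.sax.ContentHandler):
    """Build a DOM tree from SAX events, using a stack of open nodes."""

    def __init__(self):
        super().__init__()
        self.document = Document()
        self.stack = [self.document]  # the innermost open node is last

    def startElement(self, name, attrs):
        element = self.document.createElement(name)
        for key, value in attrs.items():
            element.setAttribute(key, value)
        self.stack[-1].appendChild(element)
        self.stack.append(element)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        self.stack[-1].appendChild(self.document.createTextNode(content))


builder = DomBuilder()
xml.sax.parseString(
    b'<list><item n="1">one</item><item n="2">two</item></list>', builder)
dom = builder.document
print(dom.documentElement.toxml())
```

Since the builder depends only on the SAX callbacks, any parser with a SAX driver can feed it, which is why a SAX-based DOM can sit on top of several different parsers.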
"The DOM isn't finished, so any implementation is necessarily tentative. With that warning, however, you can look at http://www.quake.net/~donpark/saxdom.html. The nice thing about Don's work is that SAXDOM will run with any SAX-conformant Java XML parser, so you can use NXP, Lark, MSXML, AElfred, and/or XP, as you wish. Don also includes some information about integrating the DOM with the new, standard Java Swing widgets." [comment from David Megginson, author of SAX, posted to XML-DEV on February 7, 1998]
Links:
- [May 06, 1998] FREE-DOM - main page
- [July 23, 1998] Don Park announced an update of the Free-DOM documentation, and a new version of Free-DOM which supports the latest version of the W3C DOM specification.
- [May 04, 1998] SAXDOM was updated 980504 to support the "04/16/98" W3C DOM spec. 'This version of SAXDOM includes NodeIterator.release method which is currently not in the spec.'
- SAXDOM Description, Updated 980406 to support the DOM specification of 03/18/98, WD-DOM-19980318. [local archive copy, incomplete links]
- Demo online, including XML-Binary example
- [April 07, 1998] Notes on the 980406 Release: "The SAXDOM package now includes both source code and classes. It is also compressed so you will have to expand it or repackage it into a form more friendlier to the Java system you are using (jar, uncompressed ZIP, or just plain files). SAXDOM package name has changed from org.xml.dom to org.xml.saxdom as of 04/06/98 release."
- Sources 980406: SAXDOM package containing SAXDOM files as well as W3C DOM files
- [February 10, 1998] See the announcement from Don Park for the public availability of a browser-based demo of SAXDOM being used from JavaScript. The demo consists of some JavaScript code which invokes the SAXDOM through an applet to retrieve the DOM objects. The returned DOM objects are used to generate XML text in color. It is currently limited to Internet Explorer 4.0. The demo "shows DOM being used by a scripting language just as it was designed for."
Saxon: An Open-Source XSLT Processor
[CR: 20020411]
Update 2002-04
"On 20-December-2001, to coincide with the publication of the first working drafts of XPath 2.0 and XSLT 2.0, Michael Kay [now of Software AG] released version 7.0 of Saxon, as an initial implementation of these drafts. Saxon 6.5 continues to be available as an XSLT 1.0 processor with some extensions based on the abortive XSLT 1.1 working draft. Since its initial release, Saxon has concentrated increasingly on its role as a highly-conformant XSLT processor, known for its good processing speed and for its library of extension functions. Meanwhile its original role as a Java class library has declined, although it can still be used this way. As well as XSLT 1.0 and XPath 1.0 support, Saxon supports the JAXP 1.1 API, and has interfaces with DOM, SAX2, and JDOM. It also integrates with the Apache FOP processor for XSL Formatting Objects." [text supplied by MKay]
References:
- SAXON: The XSLT Processor. Home page for the SAXON XSLT processor developed by Michael Kay. In this connection, note Michael Kay's highly-esteemed book XSLT Programmer's Reference, Second Edition.
- Saxon XSLT Processor Project on SourceForge
Earlier references, retained for historical purposes:
[2000-01-19] "The SAXON package is a collection of tools for processing XML documents. The main components are: (1) An XSL processor, which implements the Version 1.0 XSLT and XPath Recommendations from the World Wide Web Consortium, found at http://www.w3.org/TR/1999/REC-xslt-19991116 and http://www.w3.org/TR/1999/REC-xpath-19991116 with a number of powerful extensions (2) A Java library, which supports a similar processing model to XSL, but allows full programming capability, which you need if you want to perform complex processing of the data or to access external services such as a relational database. So you can use SAXON by writing XSL stylesheets, by writing Java applications, or by any combination of the two. If you are only interested in running the XSL interpreter, on a Windows platform, try Instant SAXON. At 250 Kb, this is a much smaller download; it excludes source code and API documentation. SAXON provides a set of services that are particularly useful when converting XML data into other formats. The output format may be XML, or HTML, or some other format such as comma separated values, EDI messages, or data in a relational database. SAXON implements the XSLT recommendation, including XPath, in its entirety. SAXON also does things that are beyond the scope of the XSL standard: for example: (1) It allows XSL processing and Java processing to be freely mixed, so you can always escape into procedural code to do something non-standard (such as accessing a database) (2) It allows multiple output files. SAXON is particularly useful for splitting a large document into page-sized chunks. You can do this without writing any Java code. (3) It allows multi-pass processing, by means of an extension function that converts a result tree fragment to a nodeset, or by chaining stylesheets together (4) It allows variables to be updated..."
[January 19, 2000] A posting from Michael Kay to the XSL-List announces the release of SAXON version 5.0, which supports the W3C XSLT and XPath Recommendations published on November 16, 1999. Kay says that the SAXON 5.0 package is now "a complete implementation of XSLT 1.0 and XPath 1.0 - If there are any parts of the spec it doesn't implement, then that's an oversight and will be treated as a bug. Apart from full conformance, the new things in this release include: (1) a number of new extension functions [intersection(), difference() and has-same-nodes() to compare node-sets; line-number() and system-id() of the current node in the source document; if(condition, then, else)]; (2) stylesheet chaining: [specify <saxon:output next-in-chain="phase2.xsl"> to send the output of this stylesheet to be the input to another stylesheet]; (3) user-definable numbering and collating sequences; (4) internal improvements to node-set handling and sorting, which should result in better performance when handling large node-sets, and should certainly reduce the load on the garbage collector." The SAXON package is "a collection of tools for processing XML documents. The main components are: (1) An XSL processor, which implements the Version 1.0 XSLT and XPath Recommendations from the World Wide Web Consortium [...] with a number of powerful extensions; (2) A Java library, which supports a similar processing model to XSL, but allows full programming capability, which you need if you want to perform complex processing of the data or to access external services such as a relational database. So you can use SAXON by writing XSL stylesheets, by writing Java applications, or by any combination of the two. If you are only interested in running the XSL interpreter, on a Windows platform, try Instant SAXON. At 241 Kb, this is a much smaller download; it excludes source code and API documentation. 
SAXON provides a set of services that are particularly useful when converting XML data into other formats. The output format may be XML, or HTML, or some other format such as comma separated values, EDI messages, or data in a relational database. SAXON implements the XSLT recommendation, including XPath, in its entirety. SAXON also does things that are beyond the scope of the XSL standard: for example: (1) It allows XSL processing and Java processing to be freely mixed, so you can always escape into procedural code to do something non-standard (such as accessing a database); (2) It allows multiple output files. SAXON is particularly useful for splitting a large document into page-sized chunks. You can do this without writing any Java code; (3) It allows multi-pass processing, by means of an extension function that converts a result tree fragment to a nodeset, or by chaining stylesheets together; (4) It allows variables to be updated." For related software, see "XSL/XSLT Software Support."
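To make the "multiple output files" idea concrete, the following is a minimal sketch of splitting one document into per-chunk outputs. It does not use SAXON or XSLT at all: it swaps in Python's standard xml.etree.ElementTree, and the function name split_into_chunks, the chunk element name, and the chunkN.xml naming scheme are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def split_into_chunks(xml_text, chunk_tag):
    """Return {output_name: xml_string}, one entry per <chunk_tag> child of the root.

    Hypothetical helper: the output names are invented for this sketch.
    """
    root = ET.fromstring(xml_text)
    outputs = {}
    for number, chunk in enumerate(root.findall(chunk_tag), start=1):
        chunk.tail = None  # drop any whitespace that followed the chunk in the source
        outputs[f"chunk{number}.xml"] = ET.tostring(chunk, encoding="unicode")
    return outputs


if __name__ == "__main__":
    book = "<book><chapter>A</chapter><chapter>B</chapter></book>"
    for name, text in split_into_chunks(book, "chapter").items():
        print(name, text)
```

A caller could write each returned string to the file named by its key; a stylesheet-driven processor would instead decide the chunk boundaries and output names inside the stylesheet.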
[February 16, 1999] Michael H. Kay (ICL) has announced the release of SAXON Version 4.0. "SAXON is a Java library for processing XML documents: it provides a number of services above the SAX and DOM level to make applications easier to write and more modular. The services are particularly useful for applications performing XML-to-XML or XML-to-HTML transformations. SAXON is available as a free download with source code included." Among the 'substantial changes' in the version 4 release of SAXON: 1) 'Improved support for processing using the DOM, in a way that is forward compatible with serial (SAX-based) applications: you can use the same element handlers in both modes; the processing model (selecting an element handler based on a pattern match) is identical to that for XSL; 2) Support for Stylesheets: you can now invoke many of SAXON's capabilities without writing any Java code. SAXON Stylesheets support a useful subset of XSL and provide two important additional features: the ability to create multiple output files, and the ability to freely mix XSL and Java code: XSL can be used to process some elements, and Java for others, or you can preprocess the element in Java before rendering it in XSL. Very useful if you are doing more than simple rendering, e.g., if you are loading a relational database.'
Links:
- [October 20, 1998] Michael Kay has announced the availability of SAXON Version 3.1. The new version has enhancements requested by users. "SAXON is a java class library that sits on top of a SAX Parser or DOM implementation, providing a variety of facilities that help the application to process an XML document. It is designed primarily to support applications handling specific document types (as opposed to general-purpose tools), especially applications doing XML-to-HTML or XML-to-XML transformations. SAXON works in principle with any SAX 1.0 conformant parser (the latest version is tested with xp, xml4j, SUN XML Library, Ælfred) or with any implementation of the latest DOM specification (tested with docuverse and xml4j; should work with SUN again when they upgrade to the latest spec). The download includes source and object code, documentation, and sample applications. One of the sample applications is a DTD Generator which has proved popular in its own right."
- [July 23, 1998] Michael Kay updated SAXON to use the new [July 22, 1998] version of Free-DOM.
- [February 17, 1999] SAXON 4.0 and XSL Queries
- [June 16, 1998] Announcement from Michael Kay for an updated version of SAXON. SAXON is a "Java class library providing a range of services on top of SAX; it is particularly useful for writing applications to process specific document types. [The author has] used SAXON to do a wide variety of XML-to-XML and XML-to-HTML transformations, and to load XML data into relational databases." Changes in this release include: "1) substantial performance improvements (factor of 2 to 3); 2) greatly improved error/exception handling; 3) minor bug fixes and documentation improvements; 4) a new (experimental) integration with the DOM, as implemented by FREE-DOM."
- Announcement from Michael H. Kay (May 12, 1998)
- Description - Main Page; [local archive copy]
- Download
XAF - an XML Architectural Forms Processor
[CR: 19980610]
On June 10, 1998, David Megginson (Megginson Technologies Ltd.) posted an announcement for the beta release of XAF, an XML Architectural Forms Processor. Accompanying the software package is detailed, tutorial-oriented documentation about XAF and architectural forms (Using the XAF package for Java), appropriate for both XML document designers and XML software designers. According to the announcement, XAF is "a Java-based XML architectural forms processor that acts as both a SAX application and a SAX parser. XAF uses any SAX 1.0-conformant parser to parse an XML document, then masquerades as a SAX parser itself: the client application sees the (virtual) architectural document instead of the actual XML document. Architectural forms are a very powerful markup facility that simplifies embedding multiple structures in a single XML document. They are especially useful for working with XML-related standards like RDF and MathML. You even can use XAF together with Don Park's FREE-DOM to create a DOM of a virtual architectural document."
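The "masquerades as a SAX parser" design can be illustrated with a SAX filter. The sketch below is not XAF (which is Java and SAX 1.0 based); it uses Python's xml.sax.saxutils.XMLFilterBase, and it implements only a much simplified notion of an architectural form: an element whose form attribute (here the hypothetical attribute name "html") names the architectural element type. The class name ArchFormFilter and the helper architectural_view are assumptions for illustration.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class ArchFormFilter(XMLFilterBase):
    """Sits between a real parser and the client, exposing a virtual document."""

    def __init__(self, parent, form_attr):
        super().__init__(parent)
        self.form_attr = form_attr
        self._stack = []  # architectural name (or None) for each open element

    def startElement(self, name, attrs):
        form = attrs.get(self.form_attr)
        self._stack.append(form)
        if form is not None:
            # Report the element under its architectural name, minus the form attribute.
            kept = {k: v for k, v in attrs.items() if k != self.form_attr}
            super().startElement(form, AttributesImpl(kept))

    def endElement(self, name):
        form = self._stack.pop()
        if form is not None:
            super().endElement(form)


def architectural_view(xml_text, form_attr):
    """Serialize the virtual document that a client of the filter would see."""
    out = io.StringIO()
    reader = ArchFormFilter(xml.sax.make_parser(), form_attr)
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


if __name__ == "__main__":
    source = '<memo html="div"><para html="p">Hi</para><internal>x</internal></memo>'
    print(architectural_view(source, "html"))
```

In this simplification, elements without the form attribute are dropped while their character data passes through; a real architectural forms processor offers finer control over that behaviour. The client only ever talks to the filter, which is why any content handler written for a plain parser can be reused unchanged.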
XML Testbed
[CR: 19980901]
On September 01, 1998, Steve Withall announced the release of an XML application environment written in Java. At the earlier XML Developers' Conference in Montréal, Withall gave a presentation "XXX - eXpandable XML eXploitation" which described a number of design ideas for flexible, expandable applications that manipulate and otherwise exploit XML documents. The details of the Java 'XML Testbed' application used to demonstrate these ideas are now documented online, and the software is available from the W3C web server. Slides from the Montréal presentation are also available.
"The software uses an XML configuration file to define the (Swing-based) user interface. It includes its own non-validating XML parser (though it can use any SAX parser instead), a nascent XSL engine (to the old/submission standard - just in time to be out of date), and a few other odds and ends. The key feature of the infrastructure is that it is intended to be easily expandable, to allow application-specific functionality to be slotted in dynamically. This is achieved by registering the classes to be instantiated for given named elements, and invoking special behaviour in a generic way by invoking a method called verify() on each element as soon as it has been parsed. The software is freely available for non-commercial use and can be downloaded, with all source code."
XML Testbed application is "written in Java, with its own supporting XML infrastructure, including an XML parser and grove. A key feature of the infrastructure is a 'node type registry', which allows dynamic control over which classes are used for particular types of elements - the element class to represent them, the parser class to parse them, the customizer class to edit them and the view class to display them (using a Swing text editor kit). The XML Testbed provides means to edit and then parse an XML source - currently going so far as to highlight the portion of the source at which any error occurs. It also allows the parsed document to be viewed in the form of a tree. The Testbed user interface is implemented using Swing. The software has been designed to be as modular as possible, to be divided into a suite of relatively small packages, each with a clear role. Each usage of XML (using the word 'usage' rather than 'application' to avoid a dual meaning of the latter) is placed in its own additional package. Three such usages are included in this release, demonstrating how to build on the basic infrastructure, and also providing some (limited!) usable functionality. These three usages are a nascent XSL engine, XML-based user interface configuration, and a database analyser for generating an XML file of the schema of a database. To run the XML Testbed requires JDK1.1 or above and Swing 1.0.2 or above. These are the only essentials. To parse using SAX requires the installation of the desired parser(s) and their drivers. By default, parsing is performed using the parser in the xe package, which is included in this release."
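The registry mechanism described above (classes registered per element name, with verify() called on each element once it has been parsed) can be sketched as follows. This is not the Testbed's Java code; it is a minimal Python illustration built on xml.sax, and the names Element, Price, and RegistryBuilder, as well as the sample validation rule, are hypothetical.

```python
import xml.sax


class Element:
    """Default node class used for any element name that is not registered."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.children = []
        self.text = ""

    def verify(self):
        return True


class Price(Element):
    """Application-specific class: a price must be a non-negative number."""

    def verify(self):
        try:
            return float(self.text) >= 0
        except ValueError:
            return False


class RegistryBuilder(xml.sax.ContentHandler):
    """Builds a tree, choosing each node's class from a name-to-class registry."""

    def __init__(self, registry):
        super().__init__()
        self.registry = registry
        self.stack = []
        self.root = None
        self.failures = []

    def startElement(self, name, attrs):
        node = self.registry.get(name, Element)(name, attrs)
        if self.stack:
            self.stack[-1].children.append(node)
        else:
            self.root = node
        self.stack.append(node)

    def characters(self, content):
        if self.stack:
            self.stack[-1].text += content

    def endElement(self, name):
        node = self.stack.pop()
        if not node.verify():  # generic hook, invoked as soon as the element is complete
            self.failures.append(node.name)


if __name__ == "__main__":
    builder = RegistryBuilder({"price": Price})
    xml.sax.parseString(b"<order><price>12.50</price><price>oops</price></order>", builder)
    print(builder.failures)
```

New behaviour is added by registering another class under an element name; the builder itself never needs to change.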
Links:
- Presentation "XXX - eXpandable XML eXploitation" at the Montréal XML Developers' Conference, August 1998
- Slides from the Montréal presentation
- Announcement posted September 01, 1998
- Steve's XML Testbed Home Page
- XML Testbed Overall Software Structure
- Sources for the software are available; [local archive copy]
- XE Wishlist - Things to be done
- Contact the author: Steve Withall
DAE SDK and DAE Server SDK (Copernican Solutions)
[CR: 19980228]
The DAE SDK is an "SGML, XML, and DSSSL technology for a Java application environment. . . DAE SDK is an implementation of the DSSSL Developer's Toolkit. Its principal features are support for XML Parsing and Groves, SDQL from Java, DSSSL Formatting, and Scheme scripting. It provides a framework for processing SGML and SGML-related documents with DSSSL and non-DSSSL constructs. The DAE Server is an integration of the DAE SDK into a web server completely written in Java. This integration provides means for the server to manage automatic access to groves and different processors." As of the January 1998 release, the DAE SDK supports: "1) The full DSSSL expression language; 2) A majority of the SDQL procedures; 3) DSSSL style language support; 4) XML Processor for building groves from XML documents; 5) A full API in Java for processing, loading groves, and applying style; 6) A full API in Scheme for processing, loading groves, and writing transformations."
Note: earlier XML development tools from Copernican Solutions were released as part of an 'XML Toolkit.' This toolkit (XDK) "provided a developer with both light weight and vigorous parsers and APIs for validating, loading, and accessing XML documents" and featured: "(1) A light weight Well-Formed XML document parser; (2) A uniform document API based on the DSSSL ISO 10179 standard; (3) Interfaces for loading and accessing XML documents in arbitrary data stores; (4) A validating XML parser for syntax checking an XML document. Programming languages supported: Java and C++."
IBM XML for Java - validating XML processor in Java
[CR: 19981009]
On February 10, 1998, XML-DEV received an announcement from Kent Tamura (Tokyo Research Laboratory, IBM Japan) for the release of `IBM XML for Java' - a validating XML processor written in Java. The processor is said to provide two main functions: 1) Parsing an XML document and construction of a Java object tree, and 2) Generation of an XML document from a Java object tree. The package requires Java 1.1, and may be downloaded from IBM alphaWorks: http://www.alphaworks.ibm.com/formula/xml. According to the developers (apparently: Kent Tamura and Hiroshi Maruyama): "XML for Java is a validating XML parser written in 100% pure Java. The package (com.ibm.xml.parser) contains classes and methods for parsing, generating, manipulating, and validating XML documents. XML for Java is believed to be the most robust XML processor currently available and conforms most closely to the XML specification proposed and recommended by W3C in December, 1997."
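The two functions named above (parsing a document into an object tree, and generating a document from an object tree) can be sketched in a few lines. This example does not use the com.ibm.xml.parser API; it uses Python's standard xml.dom.minidom, which is non-validating, and the function name round_trip and the added "note" element are assumptions for illustration.

```python
from xml.dom import minidom


def round_trip(xml_text):
    """Parse text into a DOM tree, modify the tree, and serialize it again."""
    doc = minidom.parseString(xml_text)       # 1) document -> object tree
    note = doc.createElement("note")           # hypothetical element added in memory
    note.appendChild(doc.createTextNode("added in memory"))
    doc.documentElement.appendChild(note)
    return doc.documentElement.toxml()         # 2) object tree -> document text


if __name__ == "__main__":
    print(round_trip("<memo><to>Ann</to></memo>"))
```

Because the tree is the single in-memory representation, the same objects serve for reading, editing, and writing.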
[May 14, 1998] An announcement was posted by TAMURA Kent (Tokyo Research Laboratory, IBM Japan) for an updated version of 'XML for Java'. Among the enhancements: 1) update to support the W3C DOM specification of April 16, 1998; 2) support for SAX Version 1.0; 3) support for UTF-16 encoding; 4) new factories. According to documentation packaged in this release: `XML for Java' is an XML processor written in Java, a library for parsing XML documents and generating XML documents. XML for Java runs on Java 1.1 and Java 1.2 Beta, not Java 1.0. The distribution includes some sample applications: 1) trlx, an XML syntax checker; 2) SiteOutliner - a Java application that scans a Web site and reports its profile in CDF format; 3) CDF Editor is a Java application to edit CDF files; 4) CDF Viewer is an applet that parses CDF files and visualizes their structures by using a tree; 5) Validating Generation sample - generates a valid element tree according to the specified DTD; 6) XML TreeView.
From the June 12, 1998 version README: "XPointer package: com.ibm.xml.xpointer package provides parsing XPointer expression, generating an XPointer instance from a node in a document tree, searching for nodes pointed by an XPointer instance." This version also has an XPointer sample demonstration.
[June 25, 1998] As of June 23, 1998, IBM's XML for Java, version 1.0.0, has been released with a free commercial license. Previously distributed under a 90-day trial license for commercial purposes, the Java Edition XML parser now allows developers to "use XML, create derivative works, and sell [their] products with IBM's XML parser inside." The IBM parser toolkit is still under development, under the supervision of Kent Tamura and Hiroshi Maruyama (IBM Tokyo Research Laboratory).
[September 03, 1998] IBM's XML for Java has been updated. It runs on Java 1.1.x, and some samples require Swing 1.0.x. The revision of September 2, 1998 provides support for the W3C DOM specification of 1998-08-18 (Document Object Model (DOM) Level 1 Specification, Version 1.0). It also includes an experimental implementation of the attribute-based namespace working draft (Namespaces in XML, WD-xml-names-19980802); the PI-based namespace support has been removed.
[October 09, 1998] The Alphaworks IBM Laboratory has released version 1.1.4 of the IBM XML Parser in Java (October 7, 1998). The new version of XML4J provides support for the REC-DOM-Level-1-19981001 W3C DOM Specification Version 1.0 (1-October-1998), and includes additional support for 18 different EBCDIC encodings; performance has been significantly improved (it runs 'twice as fast' as version 1.0.9), and numerous bugs have been fixed. From Kent Tamura and Hiroshi Maruyama, XML4J "is a validating XML parser written in 100% pure Java. The package (com.ibm.xml.parser) contains classes and methods for parsing, generating, manipulating, and validating XML documents."
Links:
- Original Announcement, February 10, 1998
- Updated April 16, 1998
- Updated May 13, 1998
- Updated June 12, 1998
- Sources [archive/snapshot 1.0.0] June 25, 1998
- Updated July 28, 1998 "for bug fixes, new samples, additional command line options, and updated API documentation."
- README document for VERSION: alpha-6 [12-Jun-1998]; not linked
- README document [from 980210 .ZIP package, for VERSION: alpha-3 Feb-1998]
- `IBM XML for Java' - Description
- Now dated - Source (.ZIP, 980120); [local archive copy]
JUMBO - XML browser/editor
[CR: 19980904]
JUMBO (Java Universal Markup Browser for Objects) "is a Java-based browser for XML documents, being developed by Peter Murray-Rust. JUMBO is a set of Java classes for viewing CML (and other XML) applications. It can be used in standalone mode (application), or as applets downloaded from a server to a traditional Java-enabled browser, or locally, within a Java-enabled browser, with the classes under the document tree."
[September 04, 1998] Peter Murray-Rust posted an announcement describing the release of the latest snapshot of JUMBO2 (alpha2, version 2A2) and the associated Web site, xml-cml.org. XML-CML at xml-cml.org is the home page of the nascent Chemical Markup Forum, metamorphosing from the Open Molecular Foundation. "JUMBO2 is an element-oriented XML-browser, in Java/Swing. It is an application for the demonstration of XML and CML. Its source is freely available with the normal sort of copyright. The architecture tries to follow the specs and anticipate the possible XML-related APIs. JUMBO2 is now offered to the community as a catalyst to spawn the creation of high-quality client-side tools ('browsers'). Ideally we converge towards a set of core APIs and all that remains of my code will be the elephant-specific stuff. I have already started to get some offers of help."
[May 28, 1998] An announcement was posted by Peter Murray-Rust for the release of JUMBO 2.0 (alpha). "JUMBO 2.0 is a Java-based freeware SAX-compliant XML browser/editor prototyping tool which tracks the emerging XML specs. It is a complete rewrite of JUMBO1 and has new functionality, especially for editing and exploration. JUMBO 2.0 uses the SwingSet (JFC) 1.0.1, with SAX, and your parser(s) of choice. [It] is offered as a collaborative core for Java-XML based projects. . . XML namespaces, XSL, XML-DTDs, XML-LINK, Xpointer etc. will be implemented as soon as the current [W3C] drafts firm up."
[January 29, 1998] Announcement from Peter Murray-Rust for an alpha "snapshot" (i.e. release) of his Java-based JUMBO tool. David Megginson has added JUMBO to the list of clients supporting SAX: "In Java, there are now five XML parsers with SAX support available and four publically-announced SAX clients (that makes twenty possible client-parser combinations, according to my arithmetic)." The documentation from Murray-Rust describes JUMBO as "an element-oriented system for processing XML documents. It can read and parse (with/without additional parsers, with/without the SAX interface). It creates a tree or elements and attributes with various types of content. It also supports processing instructions (PIs) in a generic manner. There is support for namespaces and XSL stylesheets, though JUMBO does not have sophisticated rendering. It has a browsing model based on a tree/TOC model, event streams or customised element display. It supports (SIMPLE) XLL navigation including NEW and REPLACE and most Xpointer syntax. It extends the latter to provide sophisticated search and navigation tools for the document. JUMBO also provides authoring and editing facilities, driven by DTD information where possible. These can be customised to provide novel types of data input other than text. JUMBO is designed to be extended, especially through subclassing or elements, and I hope that a collaborative community (cf. tcl/tk, LaTeX, Linux) will develop for its future support. . . [Among the principal features]: 1) JUMBO is 100% pure Java (1.02) and runs as an applet or application; 2) JUMBO does not knowingly deviate from the X*L specs, apart from known limitations; 3) JUMBO has an elementary XML parser, sufficient for its own configuration files; 4) JUMBO has been developed to be used with the SAX API so that any SAX-J-compliant parser [1998-01-28: AElfred, Lark, MSXML, NXP, (XP not yet done)] can be used at runtime." See http://www.vsms.nottingham.ac.uk/vsms/java/jumbo/jan9801.
[May 24, 1997] Jumbo is "a prototype XML engine primarily aimed at: (1) Providing a prototyping tool for XML developers; (2) Exploring non-textual uses of XML; (3) Specifically, but not exclusively, supporting Molecular Science; (4) Resolving semantics through hyperlinking to documents or Java methods."
"JUMBO is built from components and is not limited in what applications it can be configured for. At present it consists of these parts" [described here in abbreviated format; see the full documentation for updated information]:
- An XML parser (JUMBO will also interoperate with Lark, NXP or ESIS input)
- A TableOfContents/Tree Tool. JUMBO's main emphasis is on Structured Documents and most instances are presented as TOCs
- Generic Java class Downloader
- Applications, including TechnicalMarkupLanguage and ChemicalMarkupLanguage
[November 10, 1997] Announcement from Peter Murray-Rust (Virtual School of Molecular Sciences) for updates to JUMBO and CML1.2 (Chemical Markup Language).
Links:
- JUMBO 2.0 Alpha Source [local archive copy, 980528]
- JUMBO 9801 Alpha Home Page
- Alpha snapshot January 1998: from VSMS
- Mirror copy of JUMBO9801a1 release, 01 Feb 1998; [previous release: jumbo9801a]
- Overview of Jumbo [May 24, 1997]
- [June 28, 1997]. Announcement from Peter Murray-Rust - following the discussion of the API and the elements-as-classes - for the release of the API for JUMBO as javadoc classes
- Jumbo FAQ document
- Jumbo - publicly available XML browser
- Jumbo README; [mirror copy]
- Chemical Markup Language - CML Home Page
- CML Documentation
- XML Links from VSMS
LT XML - XML toolset
[CR: 19980626]
LT XML is issued by The Language Technology Group (Human Communication Research Centre, University of Edinburgh). "LT XML is an integrated set of XML tools and a developer's tool-kit, including a C-based API. It contains everything required to process a very wide range of conformant XML documents. The tools are intended to process all documents which are well formed according to [the XML specification]. Updated June 24, 1998 or later. LT XML is a cut down version of the LT NSL package. LT XML only processes XML files, rather than arbitrary valid SGML files. However, LT XML contains its own XML parser, thus does not require the SP SGML parser." A derived parser under development [in February 1998] is RXP (a non-validating XML parser in C); see below.
[June 26, 1998] An announcement was posted by Henry S. Thompson (HCRC Language Technology Group, University of Edinburgh) for the release of LT XML version 1.0. LT XML now meets the requirements for a fully conformant XML processor (per the XML 1.0 specification) and includes support for a wider range of character encodings for input and output (UTF-8, ISO-646, ISO-8859-n, UTF-16 and UCS-2). LT XML is both a set of command-line/console XML applications and a C language library supporting a powerful API for new application development. The new release comes in two versions: 1) a source version for UN*X platforms, with straightforward compilation and installation procedures, and 2) a source plus DLLs and executables version for WIN32 platforms. LT XML is available free for evaluation and non-commercial use. The package includes extensive documentation of the tools and the API, together with detailed examples of how to build your own application using the API. Online documentation in HTML (built using DocBook 3.0) is also available in "The XML Library LT XML version 1.0. User Documentation and Reference Guide." The LT XML API allows applications to choose, or even switch, between an event-oriented and a tree-oriented view of XML documents. The functionality of the tools in this release includes 1) Text extraction; 2) Powerful markup-aware 'grep' (search); 3) Down-translation; 4) Tokenisation; 5) Sorting; 6) Transclusion using a subset of XML-link. [adapted from the posting and Web site information]
The new release [version 0.9.5, September 01, 1997] of LT XML represents ... a "high-performance publicly available XML toolset written in C. The LT XML tool-kit includes stand-alone tools for a wide range of processing of well-formed XML documents, including searching and extracting, down-translation (e.g., report generation, formatting), tokenising and sorting... [the release] includes executable images for a range of platforms, including Windows 95 and Windows NT, FreeBSD, Linux and Solaris. A preliminary partial Macintosh version is also available. This release is restricted to 8-bit character input/output, and does NOT do validation, although it does process and make use of DTDs in documents which include them... [Tools in the new 0.9.5 release]: (1) sggrep -- extract sub-parts of XML documents, using patterns over element structure and text content; (2) textonly -- extract text content only; (3) sgsort -- reorder sub-elements within specified elements; (4) sgmltrans -- pattern+action downtranslation tool; (5) sgrpg -- Structure-based transformation tool; (6) simple, simpleq -- event- and fragment-based examples of API use."
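The difference between an event-oriented and a tree-oriented view of the same document, which the LT XML API is said to support, can be sketched with two small functions. This is not the LT XML C API; it is an illustration using Python's standard xml.etree.ElementTree, and the element name "s" and both function names are assumptions.

```python
import io
import xml.etree.ElementTree as ET


def texts_by_events(xml_text):
    """Event-oriented: handle each <s> as its end tag arrives, then discard it."""
    found = []
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "s":
            found.append(elem.text)
            elem.clear()  # free the element's content; suits very large inputs
    return found


def texts_by_tree(xml_text):
    """Tree-oriented: build the whole document first, then query it."""
    root = ET.fromstring(xml_text)
    return [s.text for s in root.iter("s")]


if __name__ == "__main__":
    doc = "<doc><s>one</s><s>two</s></doc>"
    print(texts_by_events(doc), texts_by_tree(doc))
```

The event view keeps memory use low for stream-style tools such as text extraction, while the tree view is more convenient for tasks like sorting that need to see sibling elements together.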
Links:
- [September 01, 1997] Announcement from Henry S. Thompson of The HCRC Language Technology Group for a new release of LT XML.
- Announcement from Henry S. Thompson [May 26, 1997]
- LT XML Main Page
- Documentation online
- Download instructions
- Language Technology Group Home Page
Project addresses:
Language Technology Group
Human Communication Research Centre
2, Buccleuch Place
Edinburgh EH8 9LW, UK
Tel: +44 131 650 4426
Fax: +44 131 650 4587
Email: M.Moens@ed.ac.uk
RXP XML parser program
[CR: 20000807]
[August 07, 2000] See the 'XML well-formedness checker and validator' based upon RXP - interactive and online. "Use this form to check an XML document for well-formedness and (optionally) validity. External entity references are included, even when not validating. If the document is well-formed, the parser outputs the corresponding canonical XML." Validation and namespace processing can be toggled off/on. "XML namespaces don't mesh well with DTD-based validity, so you quite likely won't want to select both validation and namespace processing. Only HTTP URLs are allowed. My HTTP code doesn't understand redirects, so be sure to put a slash on the end of URLs that refer to directories. RXP is licensed under the GNU Public Licence. It may be made available under other licensing terms; contact M.Moens@ed.ac.uk for details." The author provides Win32 binaries as well as source code; [cache]
[February 17, 1999] Richard Tobin has announced the release of RXP Version 1.0. RXP is a validating XML parser in C, developed by the Language Technology Group, Human Communication Research Centre, University of Edinburgh. A simple application (called rxp) is provided that parses and writes XML data, optionally expanding entities, defaulting attributes, and translating to a different output encoding. Some command-line options include: insertion of declared default values for omitted attributes; expansion of entity references; printing output as "bits"; XML well-formedness checking mode (vs. validation mode); treating the input as normalized SGML rather than as XML; producing output in the specified character encoding (ISO-8859-1, UTF-8, ISO-10646-UCS, UTF-16); specifying big- or little-endian byte order for 16-bit encoding names. There is an RXP web page at http://www.cogsci.ed.ac.uk/~richard/rxp.html. Bug reports should be sent to richard@cogsci.ed.ac.uk. RXP 'is used by the LT XML toolkit, and in the Festival speech synthesis system'; it also supports an online XML checking tool. "Whereas previous versions were available only for individual, research and educational use, this version is licensed under the GNU Public Licence (GPL)." Other XML parsers: see "XML Parsers and Parsing Toolkits."
[Earlier description] RXP (version .8, beta-test release, May 26, 1998) is a non-validating XML parser in C. It is maintained by Richard Tobin (Centre for Cognitive Science and the Human Communication Research Centre, Edinburgh). RXP is based on the W3C recommendation of 10th February 1998, and is free for individual, research and educational use and for evaluation. RXP comes with a sample application (parser program, 'rxp') which "reads and parses XML from the URL (or standard input if none is provided) and writes it to standard output, optionally expanding entities, defaulting attributes, and translating to a different output encoding. [. . .] It can be compiled in 8- or 16-bit character mode. In 8-bit mode, the internal encoding is a superset of ASCII, in which all characters above 0xa0 are treated as name characters. Characters are not translated on input or output. This means that well-formed documents in ASCII and ISO-8859-N should work. In 16-bit mode, the internal encoding is UTF-16 and the supported input encodings are ISO-8859-N (1 <= N <= 9), UTF-16 and UTF-8."
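As an illustration of what a well-formedness checking mode does, here is a minimal sketch of a checker that reports the position of the first error. It does not call RXP; it relies on Python's standard xml.sax (backed by a non-validating parser), and the function name is_well_formed and the message format are assumptions.

```python
import xml.sax


def is_well_formed(data):
    """Return (True, None) for well-formed input, else (False, message)."""
    try:
        # A do-nothing handler: only the parser's error reporting matters here.
        xml.sax.parseString(data, xml.sax.ContentHandler())
        return True, None
    except xml.sax.SAXParseException as exc:
        where = f"line {exc.getLineNumber()}, column {exc.getColumnNumber()}"
        return False, f"{where}: {exc.getMessage()}"


if __name__ == "__main__":
    print(is_well_formed(b"<a><b/></a>"))
    print(is_well_formed(b"<a><b></a>"))
```

A well-formedness check of this kind says nothing about validity against a DTD, which is the separate mode the RXP announcement distinguishes.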
Links:
- Download directory (FTP)
- [October 18, 1999] Available also with MSDOS/Windows32 binaries.
- RXP Description - based upon version .8
- Man page documentation for rxp, beta version .8
- [November 17, 1997] Earlier update description. RXP (alpha). Announcement from Richard Tobin (Language Technology Group, HCRC) for an alpha-test release of RXP, an XML parser in C which "will be the parser in the next release of the LT XML system." According to the documentation, the parser application "reads and parses XML from the URL (or standard input if none is provided) and writes it to standard output, optionally expanding entities, defaulting attributes, and translating to a different output encoding." [...] "RXP is based on the W3C WG draft of 7th August 1997, with some more recent changes; it is free for individual, research and educational use and for evaluation. It can be compiled in 8- or 16-bit character mode. In 8-bit mode, the internal encoding is ISO-8859-1 (Latin-1), and that is the only supported input encoding. In 16-bit mode, the internal encoding is UTF-16 and the supported input encodings are ISO-8859-1, UTF-16 and UTF-8."
- [February 16, 1998] Announcement from Richard Tobin for an updated version of "RXP - a non-validating XML parser in C, with support for UTF-8, UTF-16, and ISO-8859-1 character encodings..." URL: ftp://ftp.cogsci.ed.ac.uk/pub/richard/rxp.tar.gz; [local archive copy, 980216]
XED - An XML document instance editor
[CR: 19980715]
[July 15, 1998] An announcement from Henry S. Thompson reported on the availability of a new beta-release of the XED "XML document instance editor" from the HCRC Language Technology Group, University of Edinburgh. This new beta-level release of XED has additional features, improved installation packaging for WIN32 platforms, and bug fixes. Upgrades include: 1) refilling of text content and indenting of element content upon request; 2) accented character support [ISO-8859-1]; 3) an experimental file processing facility: processing may be invoked on the file, "and XED will then step you through any validation or application errors which are logged" (e.g., nsgmls and jade).
[March 18, 1998] Henry S. Thompson (Language Technology Group, University of Edinburgh) posted an announcement for the availability of an alpha release of 'XED: A smart XML instance editor'. As a WYSIWYG XML instance editor, "XED uses the LT XML toolset integrated with a Python-Tk user interface, to provide a free, cross-platform, well-formedness preserving editor for XML document instances. . . as a text editor for XML document instances, it is designed to support hand-authoring of small-to-medium size XML documents, and is optimised for keyboard input. It works very hard to ensure that you cannot produce a non-well-formed document. Although it neither parses DTDs in detail nor validates, it does keep track of your document structure, and provides context-based accelerators to make element and attribute entry fast and easy. XED keeps track of all the changes you make in your document, so that you can undo changes, as many as you need to, if you make a mistake. This makes it easy to learn . . ." Windows95/NT and Solaris 2.5 binaries are available now [980320]. The author solicits feedback from testers for this alpha version of XED.
An update notice for the alpha version 0.2.1.4 was posted on April 02, 1998.
- Changes and feature list for version 0.2.1.4
- XED Main Page - With a description of features and limitations.
- Announcement for the alpha version 980318
- Windows95/NT alpha binary 980318
- Solaris 2.5 alpha binary 980318
Ælfred XML Parser
[CR: 19990324]
[May 05, 1998] Announcement from David Megginson for the public release of an updated (1.2) version of Microstar's Ælfred XML parser. Ælfred is "a small, fast, DTD-aware Java-based XML parser, especially suitable for use in Java applets, free for both commercial and non-commercial use." User-visible changes in the parser since version 1.1 include: 1) The XmlParser.parse method for parsing from a URI now has a third argument, for an encoding (if known); 2) The XmlHandler.resolveEntity method is more powerful: you may return a String (for a URI), an InputStream, or a Reader. If you return null, the parser will take the default action; 3) The SAX driver has been updated to SAX 1.0gamma (released 1 May 1998), available from http://www.megginson.com/SAX/. [cache copy]
[March 09, 1998] Announcement from David Megginson (Microstar Software Ltd.) for the version 1.1 release of Microstar's free Java-based XML parser, Ælfred. From the announcement: "Ælfred is a very small, very fast XML parser optimised for use with applets, where Java 1.0.2 compatibility and download time are major requirements. Ælfred is forgiving with some errors, but otherwise supports the entire feature set of the XML 1.0 recommendation including Unicode, defaulted attribute values, external DTD subsets, external entities, and flagging of ignorable whitespace. The distribution also contains a native SAX (Simple API for XML) driver so that you can interchange Ælfred with other SAX-supported parsers without rewriting your code. Version 1.1 introduces a smaller, cleaner interface, together with some important new functionality: 1) the ability to read an XML document from an input stream as well as a URI; 2) a new, optional SAX driver; 3) a new, optional base class for deriving event handlers; 4) a new, optional exception class for reporting parsing errors; 5) use of the HTTP content-encoding parameter, if available; 6) better position-reporting for errors." See the Microstar news document for more detailed information on Ælfred 1.1 changes.
On December 09, 1997, an announcement was posted by David Megginson of Microstar Software Ltd. for the availability of a free Java-based XML parser, the AElfred XML Parser. According to the announcement, Microstar has released "Ælfred (AElfred), a small, fast, DTD-aware Java-based XML parser, especially suitable for use in Java applets. Ælfred has been designed for Java programmers who want to add XML support to their applets and applications without doubling their size: Ælfred consists of only two class files, with a total size of approximately 24K, and requires very little memory to run. Ælfred also implements Java's java.lang.Runnable interface and a zero-argument constructor, so it's easy to start Ælfred as a separate thread or to adapt it for use as a JavaBean. Ælfred is free for both commercial and non-commercial use. . ."
[December 11, 1997] 1.0beta3 release: "The new version is still interface-compatible with the first two public betas, but it adds the ability to query for content models and enumerated attribute types (both returned as normalised strings, with whitespace removed and parameter entities resolved). With the new query routines, Ælfred is now capable of producing a normalised version of an XML document's DTD; in fact, the distribution now includes a new demonstration class, DtdDemo.java, that does exactly that."
Links:
- Ælfred - Microstar's Java-Based XML Parser
- Ælfred Design Principles
- Online Demonstration
- Ælfred News document
- Original announcement - December 9, 1997
- Ælfred Version 1.1 README
- Download
- JavaDoc Documentation
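The announcements above describe a DTD-aware but non-validating parser that reports declared attribute types and supplies defaulted attribute values. The following is a minimal sketch of that behaviour, written in Python with the standard library's expat binding rather than with Ælfred's own Java API; the sample document, the function name inspect_declarations, and the returned data layout are assumptions made for illustration only.

```python
import xml.parsers.expat

# Hypothetical sample document with an internal DTD subset.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!ATTLIST memo urgent (yes|no) "no"
                 lang   CDATA    #IMPLIED>
]>
<memo>Lunch at noon.</memo>"""


def inspect_declarations(document):
    """Return (declared, seen) for a document.

    declared maps (element, attribute) to (declared type, default value);
    seen maps each element name to the attributes reported on its start tag,
    including any values the parser filled in from the DTD.
    """
    declared = {}
    seen = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_attlist(elname, attname, type_, default, required):
        # Called once per attribute declared in an ATTLIST declaration.
        declared[(elname, attname)] = (type_, default)

    def on_start(name, attrs):
        # attrs already contains defaulted values from the internal subset.
        seen[name] = dict(attrs)

    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return declared, seen


if __name__ == "__main__":
    declared, seen = inspect_declarations(SAMPLE)
    for key, value in declared.items():
        print(key, value)
    print(seen)
```

In this sketch the memo element carries no attributes in the instance, yet the start-tag callback still reports the default for urgent, while the implied lang attribute is simply absent.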
DataChannel XML Development Environment (DXDE)
[CR: 19980211]
"DXDE, with its first complete roll-out in 1998, will be a collection of XML tools including parsers, viewers, and APIs. We will also supply documentation and tutorials. Primary contributors to DXDE include Norbert Mikula and John Tigue, both XML pioneers and DataChannel's XML experts, as well as other leading XML researchers and developers." As of December 10, 1997, components available included: NXP - Norbert's XML Parser, A demo of the XML viewer, Deployment kit for the XML viewer, and Source code to an XML parser (Pax Syntactica). An addition in 1998 will be an XML Server - "a platform-independent server that supports a database schema for managing and distributing meta-data."
[February 11, 1998] See the announcement from Norbert Mikula of DataChannel for the availability of a beta version of DXP - DataChannel's XML Parser. According to the press release, "the DataChannel XML Parser is a Java-based XML parser designed for server side-based XML parsing and integration. It is a redesigned version of NXP (Norbert Mikula's XML Parser), one of the first XML parsers. DXP allows application developers to make their applications XML-aware by providing them with the ability to import XML data into their own data structure. Data can come from a database, the Web, a file, or from a local application -- whatever a URL can address. . . The DataChannel XML Parser is part of the DataChannel XML Developer Toolkit (DXDE), which will be available Q1Y98."
Links:
- DXDE - XML tools and technology
- DXP - DataChannel XML Parser
- [December 09, 1997] "DataChannel provides free online toolkit for XML developers"; [local archive copy]
- DataChannel Home Page
Tcl XML Parsing Package
[CR: 19990524]
From the Australian National University, a Tcl package has been created for parsing XML documents and DTDs. This package requires Tcl 8.0b1 or a later version. The parser has been tested with a simple DTD and several small document instances; it is said to use "the XML namespace."
Links:
- TclXML
- [May 24, 1999] Steve Ball (Zveno Pty. Ltd.) has announced that TclXML version 1.2 -- the All-Tcl XML parser -- is now available for download. "TclXML is a non-validating, event-based parser that is plug-compatible with TclExpat, a Tcl interface to James Clark's expat XML parser. This parser only works with Tcl 8.1 or later. Since it is pure Tcl, no compilation or extensions are required. TclXML will run on any platform where Tcl runs: Unix, Windows and Macintosh. . . Version 1.2 includes support for Unicode documents, using the new facilities of Tcl 8.1. There are no API-level changes."
- "XML and the Desperate Tcl Hacker." By Steve Ball [Plume Project, Australian National University]. Presented at 7th International World Wide Web Conference. See: the presentation abstract.
- Other [earlier?] links
- Contact: Steven Ball. Email: steve@cs.anu.edu.au
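TclXML is described above as a non-validating, event-based parser that is plug-compatible with an interface to James Clark's expat. As a rough illustration of what event-based parsing looks like, here is a sketch in Python (not Tcl) using the standard library's own expat binding; the function name trace_events and the tuple layout of the events are assumptions for this example and do not mirror the TclXML command set.

```python
import xml.parsers.expat


def trace_events(document):
    """Parse a document and return the stream of events as a list of tuples."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Ask expat to deliver adjacent character data in one callback.
    parser.buffer_text = True

    def on_start(name, attrs):
        events.append(("start", name, dict(attrs)))

    def on_end(name):
        events.append(("end", name))

    def on_text(data):
        # Skip whitespace-only runs so the trace stays readable.
        if data.strip():
            events.append(("text", data.strip()))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(document, True)
    return events


if __name__ == "__main__":
    sample = '<list><item id="1">tea</item><item id="2">milk</item></list>'
    for event in trace_events(sample):
        print(event)
```

The application never sees a tree: it only receives callbacks in document order, which is why two parsers that fire the same callbacks can be swapped without changing application code.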
XML Editing Mode in PSGML
[CR: 19980223]
[December 09, 1997] Announcement from David Megginson (Microstar Software Ltd.) for a new public version of the XML patches for Lennart Staflin's PSGML (an SGML mode for Emacs). Available from the author's home page. "These patches allow you to use PSGML in Emacs as a non-validating XML editor: all names will be case-sensitive, many (but not all) forbidden constructions will generate errors, all attribute values will be quoted, and PSGML will use the variant XML delimiters. There are also two changes that are useful for full SGML as well as XML: 1) these patches add support for multiple ATTLIST declarations for the same associated element type; 2) the variable sgml-namecase-general allows you to make element type names, attribute names, and keywords case-sensitive in full SGML as well."
[August 09, 1997] Public posting of an announcement from David Megginson (Microstar Software Ltd.) for initial enhancements of PSGML to enable an XML editing mode: ". . . I patched PSGML to add an XML mode that enables XML-specific delimiters, parsing, and error-reporting -- in other words, it's a real, native XML DTD-driven editor." The new code for XML support has not yet been incorporated into the main psgml distribution, but Megginson is requesting assistance from qualified alpha testers to help debug the code. Please help! The announcement contains a list of currently supported and unsupported XML features.
Links:
- [February 23, 1998] Announcement from David Megginson for additional support in the PSGML-XML patches. In addition to bug fixes, Megginson has implemented new support for the `sgml-system-path' variable: its initial value is now taken automatically from the environment variable SGML_SEARCH_PATH, allowing reference to a DTD with a (single) relative URL, independent of the current/working directory. See Megginson's home page.
- Announcement for new version - December 1997
- Miyashita Hisashi has reportedly implemented a version of PSGML-XML that works on Meadow. Meadow ('Multilingual enhancement to gnu Emacs with ADvantages Over Windows') is a fully internationalized version of Emacs20 on MS Windows. [Note from MURATA Makoto, 980220]
- Download 9712 version: http://home.sprynet.com/sprynet/dmeggins/psgmlxml-19971208.zip; local archive copy
- Announcement - August 1997
- Main entry for PSGML, with hints for installation and configuration for use with fonts
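The February 1998 entry above mentions that a search path taken from the SGML_SEARCH_PATH environment variable lets a DTD be referenced by a relative identifier regardless of the working directory. The idea can be sketched as follows in Python; this is not the Emacs Lisp used by PSGML, and the function name resolve_system_id, the use of the platform path separator, and the first-match rule are assumptions for illustration.

```python
import os


def resolve_system_id(system_id, environ=os.environ):
    """Return the first existing file matching system_id, or None.

    Relative identifiers are tried against each directory listed in
    SGML_SEARCH_PATH (assumed here to use the platform's path separator).
    """
    if os.path.isabs(system_id):
        return system_id if os.path.isfile(system_id) else None
    search_path = environ.get("SGML_SEARCH_PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # ignore empty entries such as a trailing separator
        candidate = os.path.join(directory, system_id)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Prints a path if memo.dtd is found on the search path, otherwise None.
    print(resolve_system_id("memo.dtd"))
```

Because the lookup depends only on the search path, the same DOCTYPE declaration keeps working when the document is opened from a different directory.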
XSLJ - Jade-compatible XSL-to-DSSSL translator
[CR: 19980112]
[January 12, 1998] Announcement from Henry S. Thompson (Human Communication Research Centre, University of Edinburgh) for the "final" beta release of XSLJ. XSLJ is an XSL to DSSSL Translator. Specifically, it translates from "the XML style language proposed in 'A Proposal for XSL' to the augmented version of DSSSL which is supported by the test release of JADE." Thus, xslj "translates valid XSL style sheets into valid extended DSSSL style sheets, which can then be used to render XML documents using Jade." The current release from Thompson includes bug fixes and an additional increase in conformance to the W3C proposal (e.g., mixed content is now allowed in style sheet 'actions'). How does xslj compare to Microsoft's new XSL support in MSXSL? According to the xslj documentation, Microsoft's MSXSL "does not support flow-object macros or named styles and supports only the HTML flow-objects, but can therefore be integrated more closely with a browser."
Announcement from Henry S. Thompson for the release of an alpha version of xslj, a Jade-compatible XSL-to-DSSSL translator. "XSLJ is a virtually complete implementation of XSL by way of translation into extended DSSSL, as supported by the latest test release of James Clark's DSSSL engine Jade. XSLJ translates valid XSL style sheets into valid extended DSSSL style sheets, which can then be used to render XML documents using Jade. Virtually all of XSL as described in the W3C document 'A Proposal for XSL' is supported, although some minor modifications have been necessitated by the exigencies of implementation, all of which are described in detail in material contained in the release." XSLJ development was supported by the UK Economic and Social Research Council via their support for HCRC and by a grant from Microsoft. See the University of Edinburgh Web site for details: http://www.ltg.ed.ac.uk/~ht/xslj.html.
"Major XSL features which are [now 971121] supported include: 1) template-based style rules using XML itself as the notation; 2) The pattern language: how to identify elements in style rules; 3) The rendering language: how to describe the desired appearance; 4) The expression language (based on JavaScript): when computation is required; 5) Flow-object macros; 6) Style rules (cascading). [...] XSL specifies two sets of flow objects for expressing the style of desired output: one based on DSSSL and one based on HTML/CSS. Both are supported by xslj. Using the DSSSL flow objects, output using any of the Jade backends is supported, including RTF, TeX and SGML. Using the HTML/CSS flow objects, output is to HTML using the Jade SGML backend."
[November 25, 1997] Announcement by Henry S. Thompson (Human Communication Research Centre, University of Edinburgh) for an updated version of the XSL-to-DSSSL translator xslj. Version 0.3 "includes a number of bug fixes (thanks for reports) and much improved HTML output when the CSS/HTML flow objects are used."
docproc - an XML + XSL document processor
[CR: 19980318]
Under development by Sean Russell (Department of Physics, University of Oregon), "docproc is a software package that provides processing and layout of XML documents based on XSL scripts. docproc is written in pure java, and can be used as a server-side preparser for serving XML documents on the web. . . docproc can be used in two different ways. The first, and ideal, method is to use docproc as a servlet; the other way to use docproc is to call it by hand on documents that you want to reformat."
Links:
- Announcement from Sean Russell for the beta release of docproc 2.
- URL: http://javalab.uoregon.edu/ser/software/docproc_2/docs/index.xml
- News on docproc
- Updated 08 Feb 98 to improve the HTML markup output.
- [March 18, 1998] [Snapshot of the development/documentation]
DTDGenerator - XML DTD Generator
[CR: 20000105]
"DTDGenerator is a program that takes an XML document as input and produces a Document Type Definition (DTD) as output. The aim of the program is to give you a quick start in writing a DTD. The DTD is one of the many possible DTDs to which the input document conforms. Typically you will want to examine the DTD and edit it to describe your intended documents more precisely. In a few cases you will have to edit the DTD before you can use it. DTDGenerator was written by Michael Kay of ICL. DTDGenerator is now issued as part of the SAXON XSL product. It can be used either by installing SAXON on your own machine, or as a web-based service provided by Paul Tchistopolskii at http://www.pault.com/Xmltube/dtdgen.html. If you use this service, ensure that the XML file you upload contains no references to other local files such as a DTD or an external entity."
References:
- SAXON home page
- DTD Generator home page
- Dated: Description of DTDGen [from the distribution dated '29 April 1998', 980505]
- Distribution, archive copy 980505
- Announcement for DTDGen, 30-April-1998
- See also: SGML DTD generation from tagged text, with OCLC's Fred
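The description above notes that a generated DTD is only one of the many possible DTDs to which the input document conforms. A deliberately permissive sketch of that idea, in Python, follows; it is not Michael Kay's algorithm, and the function name infer_dtd, the choice of repeatable choice groups for content models, and the declaration of every attribute as CDATA #IMPLIED are all assumptions made for this illustration. Namespaces, entities, and attribute typing are not handled.

```python
import xml.etree.ElementTree as ET


def infer_dtd(xml_text):
    """Return a loose DTD (as text) that the given document conforms to."""
    root = ET.fromstring(xml_text)
    order = []        # element names in order of first appearance
    children = {}     # element name -> child names seen, in order
    attributes = {}   # element name -> attribute names seen, in order
    has_text = {}     # element name -> True if non-blank text was seen

    for elem in root.iter():
        name = elem.tag
        if name not in children:
            order.append(name)
            children[name] = []
            attributes[name] = []
            has_text[name] = False
        if elem.text and elem.text.strip():
            has_text[name] = True
        for child in elem:
            if child.tag not in children[name]:
                children[name].append(child.tag)
            # Text following a child still belongs to the parent's content.
            if child.tail and child.tail.strip():
                has_text[name] = True
        for att in elem.attrib:
            if att not in attributes[name]:
                attributes[name].append(att)

    lines = []
    for name in order:
        kids = children[name]
        if has_text[name] and kids:
            model = "(#PCDATA | " + " | ".join(kids) + ")*"
        elif has_text[name]:
            model = "(#PCDATA)"
        elif kids:
            model = "(" + " | ".join(kids) + ")*"
        else:
            model = "EMPTY"
        lines.append("<!ELEMENT %s %s>" % (name, model))
        for att in attributes[name]:
            lines.append("<!ATTLIST %s %s CDATA #IMPLIED>" % (name, att))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = ('<book id="b1"><title>XML</title>'
              '<chapter><para>Hi <em>there</em></para></chapter>'
              '<chapter/></book>')
    print(infer_dtd(sample))
```

The output is intentionally loose: it records which children may occur, not their order or number, so it would normally be tightened by hand, much as the quoted description recommends.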
Near & Far Designer - DTD Design Tool
[CR: 19980511]
Near & Far Designer is a visual DTD design tool, especially useful for those who are new to structured information and DTD design. "DTDs can be created and modified graphically without prior knowledge of XML/SGML language syntax. With the intuitive tree representation, a DTD can be created from scratch or imported, reworked and exported as a revised DTD. Structures can be explored to any level of detail. The drag and drop interface makes working with DTDs easy." [adapted 980511]
Links:
- Tool description
- Microstar Home Page
- Demo/Evaluation version: [new URL?]
The Ace Scripting Language
[CR: 19980513]
Ace is a high-performance, strongly typed language with comprehensive support for the SGML and XML document standards. It features an extensive library of SGML and XML manipulation functions. It is part of the Structured Information Manager (SIM) product range. It is free for use in a non-commercial or commercial application, as long as you do not sell it or include it in a product for sale. "SIM includes a high performance SGML/XML database server for multi-gigabyte databases and a high performance web server. The Ace scripting language is used throughout SIM providing a high degree of configurability."
Links:
- Ace Overview; [local archive copy]
- Download
- Sample code (including SGML Normalisation, SGML Tree Walking)
HXA/HXP - Hubick's XML Analyzer, Parser
[CR: 19980723]
On July 22, 1998, Chris Hubick posted an announcement for the availability of the beta version of an 'Online XML Analysis Tool'. "HXA - Hubick's XML Analyzer is a [grammar] production based online XML parser/analysis tool. . . it is a pure Java tool built upon a low level XML parser (HXP) which breaks an XML file down into its constituent productions for analysis. HXA allows one to examine the production hierarchy for any character in an XML document or document fragment. For easy reference, HXA also provides links from each production in the analysis to its corresponding section in the XML specification." The XML parser used with HXA is said to be 'not yet' a proper XML parser.
Microsoft XML Notepad
[CR: 19980723]
On July 22, 1998, Microsoft Corporation released the Beta 1 version of a "Microsoft XML Notepad." The online description says: "Microsoft XML Notepad is a simple prototyping application for HTML authors and developers that enables the rapid building and editing of small sets of XML-based data. With XML Notepad, developers can quickly create XML prototypes in an iterative fashion, using familiar metaphors. XML Notepad offers an intuitive and simple user interface that graphically represents the tree structure of XML data. . . XML Notepad's user interface is simple and intuitive. The XML source is represented graphically. The topmost element is the root element. Every XML file can have only one root element. Elements are represented by either folder icons, if they have dependent structures (for example, attributes or other elements), or by leaf icons if they have no substructures. Attributes are represented by 3-D blocks while text and comments are represented by text icons and exclamation mark icons, respectively. The structure of the data is represented in the left column while the values of the nodes are displayed in the right column." Interesting features: 1) search and replace of text can be restricted to one or more of 'content, element type names, attribute names, attribute values, and comments'; 2) files for editing can be nominated by system (filename) or URL; 3) drag-and-drop nodes.
Links:
- Microsoft XML Notepad - Introduction
- Microsoft XML Notepad - Frequently Asked Questions
- Microsoft XML Notepad - Release Notes
- Microsoft XML Notepad - Download Page
- Microsoft XML Notepad - Getting Started
xmlproc: A Python XML parser
[CR: 19980724]
Lars Marius Garshol is developing xmlproc as part of a larger project, "Tools for parsing XML with Python," itself "a part of the ongoing effort to make Python the language of choice for XML processing." "xmlproc is an XML parser written in Python. It is a fairly complete validating parser, but does not do everything required of a validating parser, or even a well-formedness parser. The average user should not run into any omissions, though. Later releases will be more complete. xmlproc can be used both as a command-line parser and as a parser API you can use to write XML applications. xmlproc supports both SGML Open Catalogs and XCatalog 0.1." [Version 0.50, July 18, 1998]
Links:
- xmlproc Main Page
- Documentation: The xmlproc APIs
- Documentation: xmlproc DTD APIs
- Documentation: xmlproc catalog file support; [local archive copy]
- Sources [local archive copy, 980724]
- Python XML-SIG mailing list
xmlarch.py: An XML architectural forms processor
[CR: 19990323]
[March 23, 1999] In connection with the release of tmproc, note that Geir Grønmo has released a new version of xmlarch [0.25] - An XML Architectural Forms Processor. "The xmlarch module contains an XML architectural forms processor written in Python. It allows you to process XML architectural forms using any parser that uses the SAX interfaces. The module allow you to process several architectures in one parse-pass. Architectural document events for an architecture can even be broadcasted to multiple DocumentHandlers."
[July 24, 1998] Geir Ove Grønmo (STEP Infotek) has announced the 'very early release' of an XML architectural forms processor in Python. xmlarch.py is a module which contains "an XML architectural forms processor written in Python. It allows you to process XML architectural forms using any parser that uses the SAX interfaces. The module allow you to process several architectures in one parse pass. Architectural document events for an architecture can even be broadcasted to multiple DocumentHandlers. (e.g. you can have 2 handlers for the RDF architecture, 3 for the XLink architecture and perhaps one for the HyTime architecture.) The architecture processor uses the SAX DocumentHandler interface which means that you can register the architecture handler (ArchDocHandler) with any SAX 1.0 compliant parser." The online documentation contains two complete examples and links for architectural forms processing in SGML/XML. The author solicits feedback on his software.
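The announcement says that document events for an architecture can be broadcast to several handlers during a single parse pass. The sketch below shows only that broadcasting step, using the xml.sax module from the current Python standard library; it performs no architectural forms mapping and is not the xmlarch API. The class names Broadcaster, TagCounter, and TextCollector and the helper run are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Broadcaster(ContentHandler):
    """Forward every SAX event it receives to each registered handler."""

    def __init__(self, handlers):
        super().__init__()
        self.handlers = list(handlers)

    def startDocument(self):
        for handler in self.handlers:
            handler.startDocument()

    def endDocument(self):
        for handler in self.handlers:
            handler.endDocument()

    def startElement(self, name, attrs):
        for handler in self.handlers:
            handler.startElement(name, attrs)

    def endElement(self, name):
        for handler in self.handlers:
            handler.endElement(name)

    def characters(self, content):
        for handler in self.handlers:
            handler.characters(content)


class TagCounter(ContentHandler):
    """Count how often each element type starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


class TextCollector(ContentHandler):
    """Collect all character data in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def run(document):
    """Parse once, feeding two independent handlers from the same events."""
    counter, collector = TagCounter(), TextCollector()
    xml.sax.parseString(document.encode("utf-8"),
                        Broadcaster([counter, collector]))
    return counter.counts, "".join(collector.chunks)


if __name__ == "__main__":
    print(run("<doc><p>a</p><p>b</p></doc>"))
```

Since the broadcaster is itself an ordinary SAX handler, it can be registered with any SAX parser, which matches the point made in the announcement about parser independence.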
DB2XML
[CR: 19990301]
On March 01, 1999, Volker Turau (Fachhochschule Wiesbaden, Fachbereich Informatik) announced the public release of DB2XML, available now for download. "DB2XML is a tool for transforming relational databases into XML (Extensible Markup Language) documents. It is written in Java. DB2XML provides two main functions: 1) Transforming the results of database queries into XML documents; 2) Providing attributes describing the characteristics of the data. DB2XML comes with an easy to use graphical user interface and accesses databases using JDBC drivers. It requires JDK 1.1 (or higher) and a database with a JDBC driver (or a ODBC driver using the JDBC-ODBC bridge). DB2XML is well documented and can be used freely."
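The two functions named in the announcement, turning query results into an XML document and attaching attributes that describe the data, can be sketched as follows. This example is Python with the standard sqlite3 module, not Java with JDBC, and it does not reproduce DB2XML's actual output format; the function name query_to_xml and the element and attribute names (table, row, query, type, isNull) are assumptions. It also assumes that column names are valid XML names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_xml(connection, sql, root_tag="table", row_tag="row"):
    """Run a query and return its result set serialised as an XML string."""
    cursor = connection.execute(sql)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag, {"query": sql})
    for row in cursor:
        row_element = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            field = ET.SubElement(row_element, column)
            if value is None:
                # Record SQL NULL explicitly instead of writing empty text.
                field.set("isNull", "true")
            else:
                # Describe the data with the Python type of the value.
                field.set("type", type(value).__name__)
                field.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (title TEXT, year INTEGER)")
    conn.execute("INSERT INTO book VALUES ('XML Handbook', 1998)")
    print(query_to_xml(conn, "SELECT title, year FROM book"))
```

A real tool would take type information from the database metadata rather than from the values themselves; the value-based approach is used here only to keep the sketch self-contained.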