[This local archive copy mirrored from the canonical site: http://www.infotek.no/~grove/software/xmlarch/, 980724; links may not have complete integrity, so use the canonical document at this URL if possible.]

xmlarch.py: An XML architectural forms processor

Version: 0.10
Author: Geir Ove Grønmo
Email: grove@infotek.no
Released: July 24th 1998

What is xmlarch.py?

The xmlarch module contains an XML architectural forms processor written in Python. It lets you process XML architectural forms with any parser that supports the SAX interfaces. The module can process several architectures in a single parse pass, and the architectural document events for one architecture can even be broadcast to multiple DocumentHandlers (e.g. you can have two handlers for the RDF architecture, three for the XLink architecture and perhaps one for the HyTime architecture).

The architecture processor implements the SAX DocumentHandler interface, which means that you can register the architecture handler (ArchDocHandler) with any SAX 1.0 compliant parser.
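The broadcasting idea described above (one stream of events fanned out to several handlers, with the fan-out object itself registered as the parser's handler) can be sketched with Python's standard xml.sax package. This is a hypothetical illustration, not xmlarch's ArchDocHandler: it uses the SAX 2 ContentHandler interface rather than the SAX 1 DocumentHandler interface xmlarch targets, and the names BroadcastHandler, ElementCounter and broadcast_demo are made up for this example.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class BroadcastHandler(xml.sax.ContentHandler):
    """Hypothetical fan-out handler: forwards every event it receives to each
    registered downstream handler, in registration order."""

    def __init__(self, *handlers):
        super().__init__()
        self.handlers = list(handlers)

    def startDocument(self):
        for handler in self.handlers:
            handler.startDocument()

    def endDocument(self):
        for handler in self.handlers:
            handler.endDocument()

    def startElement(self, name, attrs):
        for handler in self.handlers:
            handler.startElement(name, attrs)

    def endElement(self, name):
        for handler in self.handlers:
            handler.endElement(name)

    def characters(self, content):
        for handler in self.handlers:
            handler.characters(content)

    def processingInstruction(self, target, data):
        for handler in self.handlers:
            handler.processingInstruction(target, data)


class ElementCounter(xml.sax.ContentHandler):
    """Toy downstream handler that counts start-tag events."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def broadcast_demo(xml_bytes):
    """Parse once, feeding the same events to two counters and a serializer."""
    first, second = ElementCounter(), ElementCounter()
    out = io.StringIO()
    parser = xml.sax.make_parser()
    # The fan-out handler is registered like any other SAX handler.
    parser.setContentHandler(BroadcastHandler(first, second, XMLGenerator(out)))
    parser.parse(io.BytesIO(xml_bytes))
    return first.count, second.count, out.getvalue()
```

Because BroadcastHandler is itself an ordinary handler, the parser does not need to know that its events are being duplicated; the same property is what allows an architecture processor to sit between any SAX parser and several downstream handlers.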

It does not currently process meta document type definitions (meta-DTDs). When a DTD parser module becomes available, I will use it to process meta-DTD information.

Please note that validating and non-validating (well-formedness-only) parsers may report different SAX events for the same document.
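One common source of such differences is whitespace between elements: in SAX, a validating parser must report whitespace in element content through ignorableWhitespace, while a non-validating parser typically reports it through characters. The following is a minimal sketch of one way to smooth this over, written against Python's standard xml.sax (SAX 2) rather than the SAX 1 interfaces xmlarch uses; WhitespaceNormalizer and Recorder are hypothetical names, not part of xmlarch.

```python
import xml.sax


class Recorder(xml.sax.ContentHandler):
    """Toy downstream handler that records character data and whitespace."""

    def __init__(self):
        super().__init__()
        self.events = []

    def characters(self, content):
        self.events.append(("characters", content))

    def ignorableWhitespace(self, whitespace):
        self.events.append(("ignorableWhitespace", whitespace))


class WhitespaceNormalizer(xml.sax.ContentHandler):
    """Hypothetical filter: reports ignorable whitespace as ordinary character
    data, so the downstream handler sees the same kind of event whether a
    validating or a non-validating parser produced it."""

    def __init__(self, downstream):
        super().__init__()
        self.downstream = downstream

    def startElement(self, name, attrs):
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        self.downstream.endElement(name)

    def characters(self, content):
        self.downstream.characters(content)

    def ignorableWhitespace(self, whitespace):
        # Validating parsers report whitespace in element content here;
        # forward it as characters, as non-validating parsers usually do.
        self.downstream.characters(whitespace)
```

Placing such a filter in front of the handlers makes their output less dependent on which kind of parser is used, at the cost of losing the distinction a validating parser provides.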

What does the xmlarch module contain?

xmlarch.py contains six classes: ArchDocHandler, Architecture, ArchParseState, ArchException, AttributeParser and Normalizer.

Using the xmlarch module

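Using the xmlarch module usually involves the following steps:

1. Create an architecture processor (an ArchDocHandler instance).
2. Create a SAX parser and register the ArchDocHandler with it as its document handler.
3. Register one or more document handlers for each architecture you want to process (and, optionally, a default document handler).
4. Parse the document.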

A simple example

Python code

# Import needed modules
from xml.sax import saxexts, saxlib, saxutils
import sys, xmlarch

# Create architecture processor handler
arch_handler = xmlarch.ArchDocHandler()

# Create parser and register architecture processor with it
parser = saxexts.XMLParserFactory.make_parser()
parser.setDocumentHandler(arch_handler)

# Add a document handler to process the html architecture
arch_handler.addArchDocumentHandler("html", xmlarch.Normalizer(sys.stdout))

# Parse (and process) the document
parser.parse("simple.xml")

XML document

<?xml version="1.0"?>
<?IS10744:arch name="html"?>
<doc>
<title html="h1">My first architectural document</title>
<author html="address">Geir Ove Gronmo, grove@infotek.no</author>
<para>This is the first paragraph in this document</para>
<para html="p">This is the second paragraph</para>
</doc>

Result


<html>
<h1>My first architectural document</h1>
<address>Geir Ove Gronmo, grove@infotek.no</address>

<p>This is the second paragraph</p>
</html>

See also the files simple.py and simple.xml in the distribution.
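To make the mapping in this example concrete, here is a much-simplified sketch of what an architecture processor does with simple.xml. It is a hypothetical illustration, not xmlarch's implementation, and it rests on assumptions read off the result above: the document element becomes the architecture's root element (named after the architecture), an element carrying the architecture's attribute (here html) is renamed to that attribute's value, and character data whose innermost element has no such attribute is dropped. It ignores the IS10744:arch processing instruction (the architecture name is passed in instead) and uses Python's standard xml.sax (SAX 2) package; SimpleArchMapper and map_architecture are made-up names.

```python
import io
import xml.sax
from xml.sax.saxutils import escape


class SimpleArchMapper(xml.sax.ContentHandler):
    """Hypothetical, much-simplified architecture mapper (not xmlarch's code)."""

    def __init__(self, arch, out):
        super().__init__()
        self.arch = arch    # architecture name; also the name of its attribute
        self.out = out      # text stream that receives the architectural document
        self.stack = []     # mapped name (or None) for each open element

    def startElement(self, name, attrs):
        # Assumption: the document element always maps to the architecture root.
        mapped = self.arch if not self.stack else attrs.get(self.arch)
        self.stack.append(mapped)
        if mapped is not None:
            self.out.write("<%s>" % mapped)

    def endElement(self, name):
        mapped = self.stack.pop()
        if mapped is not None:
            self.out.write("</%s>" % mapped)

    def characters(self, content):
        # Keep text only when the innermost open element is architectural.
        if self.stack and self.stack[-1] is not None:
            self.out.write(escape(content))


def map_architecture(xml_text, arch):
    """Parse xml_text and return its simplified `arch` architectural document."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), SimpleArchMapper(arch, out))
    return out.getvalue()


SIMPLE_XML = """<?xml version="1.0"?>
<?IS10744:arch name="html"?>
<doc>
<title html="h1">My first architectural document</title>
<author html="address">Geir Ove Gronmo, grove@infotek.no</author>
<para>This is the first paragraph in this document</para>
<para html="p">This is the second paragraph</para>
</doc>
"""
```

With these rules, map_architecture(SIMPLE_XML, "html") yields the same elements as the Result above: the unmarked first paragraph disappears, while the line breaks between architectural elements are kept. Because only the innermost element decides whether text is kept, an architectural element nested inside a non-architectural one is still mapped.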

If you instead process the persons architecture of this web page itself (xmlarch.html, see Example III below), you get the following output:

Result

<persons>

<author>Geir Ove Grønmo</author><mentioned>Eliot Kimber</mentioned><mentioned>David Megginson</mentioned><mentioned>Lars Marius Garshol</mentioned>
</persons>

A more complex example

Python code

# Import needed modules
from xml.sax import saxexts, saxlib, saxutils
import sys, xmlarch

# Create architecture processor handler
arch_handler = xmlarch.ArchDocHandler()

# Create parser and register architecture processor with it
parser = saxexts.XMLParserFactory.make_parser()
parser.setDocumentHandler(arch_handler)

# Add document handlers to process the html and biblio architectures
arch_handler.addArchDocumentHandler("html", xmlarch.Normalizer(open("html.out", "w")))
arch_handler.addArchDocumentHandler("biblio", saxutils.ESISDocHandler(open("biblio1.out", "w")))
arch_handler.addArchDocumentHandler("biblio", saxutils.Canonizer(open("biblio2.out", "w")))

# Register a default document handler that just passes through any incoming events
arch_handler.setDefaultDocumentHandler(xmlarch.Normalizer(sys.stdout))

# Parse (and process) the document
parser.parse("complex.xml")

Because this produces a lot of output, I have not included the XML document or the results here. Instead, see the files complex.py and complex.xml in the distribution and try it yourself.

Testing the xmlarch module

The distribution also contains a test script, archtest.py, which can be run from the command line. It takes two arguments: the name of the architecture to process and the XML document (a file name or URL) to process. The result is printed to stdout as a normalized document. You can also use the --debug flag to make it write debug information to stderr.

Example I: python archtest.py html simple.xml

Example II: python archtest.py --debug biblio complex.xml

Example III: python archtest.py persons http://www.infotek.no/~grove/software/xmlarch/xmlarch.html
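The command-line interface described above could be re-created in modern Python roughly as follows. This is a hypothetical sketch using the standard argparse module, not the code of archtest.py (which predates argparse); parse_args and debug are illustrative names, and the actual processing step is left out.

```python
import argparse
import sys


def parse_args(argv=None):
    """Hypothetical argument handling mirroring archtest.py's documented usage:
    an optional --debug flag, then the architecture name and the document."""
    parser = argparse.ArgumentParser(
        prog="archtest.py",
        description="Process one architecture of an XML document.")
    parser.add_argument("--debug", action="store_true",
                        help="write debug information to stderr")
    parser.add_argument("architecture",
                        help="name of the architecture to process, e.g. html")
    parser.add_argument("document",
                        help="file name or URL of the XML document")
    return parser.parse_args(argv)


def debug(args, message):
    """Write a debug message to stderr, but only when --debug was given."""
    if args.debug:
        print(message, file=sys.stderr)
```

For Example II, parse_args(["--debug", "biblio", "complex.xml"]) sets debug to True, architecture to "biblio" and document to "complex.xml"; keeping debug output on stderr leaves stdout free for the normalized document.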

Download

You can download it from the canonical site, http://www.infotek.no/~grove/software/xmlarch/.

Related information

Feedback -- bug reports, features and improvements

I would very much welcome any feedback on any issue regarding this piece of software. Feedback should be sent to grove@infotek.no.


July 24th 1998, 10:35 Geir O. Grønmo, grove@infotek.no