[This local archive copy mirrored from the canonical site: http://www.infotek.no/~grove/software/xmlarch/, 980724; links may not have complete integrity, so use the canonical document at this URL if possible.]
xmlarch.py: An XML architectural forms processor
Version: 0.10
Author: Geir Ove Grønmo
Email: grove@infotek.no
Released: July 24th 1998
What is xmlarch.py?
The xmlarch module contains an XML architectural forms processor written in Python. It allows you to process XML architectural forms using any parser that uses the SAX interfaces. The module allows you to process several architectures in one parse pass. Architectural document events for an architecture can even be broadcast to multiple DocumentHandlers (e.g. you can have two handlers for the RDF architecture, three for the XLink architecture and perhaps one for the HyTime architecture).
The architecture processor implements the SAX DocumentHandler interface, which means that you can register the architecture handler (ArchDocHandler) with any SAX 1.0 compliant parser.
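The broadcasting idea can be illustrated independently of xmlarch. The following is a minimal sketch, not xmlarch's own code. It assumes the xml.sax package from the current Python standard library, whose ContentHandler plays the role of the SAX 1.0 DocumentHandler. The class names Broadcaster and Recorder are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Recorder(ContentHandler):
    """Hypothetical handler that records the names of the elements it sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


class Broadcaster(ContentHandler):
    """Hypothetical handler that forwards each event to every registered handler."""

    def __init__(self):
        super().__init__()
        self.handlers = []

    def add_handler(self, handler):
        self.handlers.append(handler)

    def startDocument(self):
        for handler in self.handlers:
            handler.startDocument()

    def endDocument(self):
        for handler in self.handlers:
            handler.endDocument()

    def startElement(self, name, attrs):
        for handler in self.handlers:
            handler.startElement(name, attrs)

    def endElement(self, name):
        for handler in self.handlers:
            handler.endElement(name)

    def characters(self, content):
        for handler in self.handlers:
            handler.characters(content)


first, second = Recorder(), Recorder()
broadcaster = Broadcaster()
broadcaster.add_handler(first)
broadcaster.add_handler(second)

# One parse pass; both recorders receive the same events.
xml.sax.parseString(b"<doc><title>Hello</title><para/></doc>", broadcaster)
print(first.seen)
print(second.seen)
```

Because the broadcaster is itself a handler, the parser only ever sees one registered object, no matter how many handlers are attached behind it.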
It currently does not process meta document type definitions (meta-DTDs). When a DTD parser module is available, I will use it to process meta-DTD information.
Please note that validating and well-formed parsers may report different SAX events when parsing documents.
What does the xmlarch module contain?
xmlarch.py contains six classes: ArchDocHandler, Architecture, ArchParseState, ArchException, AttributeParser and Normalizer.
- ArchDocHandler is a subclass of the saxlib.DocumentHandler interface. This is the class used for processing an architectural document.
- Architecture contains information about an architecture.
- ArchParseState holds information about an architecture's parse state when parsing a document.
- AttributeParser parses architecture use declaration PIs (attribute strings).
- ArchException holds information about an architectural exception thrown by the ArchDocHandler.
- Normalizer is a document handler that outputs "normalized" XML.
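To make the role of an attribute-string parser more concrete, here is a minimal sketch of how the data of an architecture use declaration PI such as name="html" could be split into name/value pairs. It is not the AttributeParser class from xmlarch. The function parse_pi_attributes and its regular expression are hypothetical and only handle simple quoted values.

```python
import re

# Hypothetical pattern: a name, an equals sign, then a value in double or
# single quotes. Anything that does not match is ignored by this sketch.
_ATTR_RE = re.compile(r"""([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_pi_attributes(data):
    """Return a dict of the pseudo-attributes found in a PI data string."""
    attrs = {}
    for match in _ATTR_RE.finditer(data):
        name, double_quoted, single_quoted = match.groups()
        # Exactly one of the two value groups matched; the other is None.
        attrs[name] = double_quoted if double_quoted is not None else single_quoted
    return attrs


print(parse_pi_attributes('name="html"'))
print(parse_pi_attributes("name='biblio'"))
```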
Using the xmlarch module
Using the xmlarch module usually means that you have to do the following things:
- Import the required SAX modules: saxexts, saxlib, saxutils.
- Import the xmlarch module.
- Create a SAX compliant parser object.
- Create an XML architectures processor handler.
- Register this handler with the parser.
- Add document handlers for the architectures you want to process.
- Register a default document handler with the architecture processor handler.
- Parse a document.
A simple example
Python code:
# Import needed modules
from xml.sax import saxexts, saxlib, saxutils
import sys, xmlarch
# Create architecture processor handler
arch_handler = xmlarch.ArchDocHandler()
# Create parser and register architecture processor with it
parser = saxexts.XMLParserFactory.make_parser()
parser.setDocumentHandler(arch_handler)
# Add a document handler to process the html architecture
arch_handler.addArchDocumentHandler("html", xmlarch.Normalizer(sys.stdout))
# Parse (and process) the document
parser.parse("simple.xml")
XML document:
<?xml version="1.0"?>
<?IS10744:arch name="html"?>
<doc>
<title html="h1">My first architectual document</title>
<author html="address">Geir Ove Gronmo, grove@infotek.no</author>
<para>This is the first paragraph in this document</para>
<para html="p">This is the second paragraph</para>
</doc>
Result:
<html>
<h1>My first architectual document</h1>
<address>Geir Ove Gronmo, grove@infotek.no</address>
<p>This is the second paragraph</p>
</html>
See also the files simple.py and simple.xml in the distribution.
If you instead process the persons architecture in this document (the xmlarch.html page itself, as in Example III below), you get the following output:
Result:
<persons>
<author>Geir Ove Grønmo</author><mentioned>Eliot Kimber</mentioned><mentioned>David Megginson</mentioned><mentioned>Lars Marius Garshol</mentioned>
</persons>
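The html result shown earlier for simple.xml suggests how elements are mapped: the document element becomes the architecture's root, elements carrying an html attribute are renamed to that attribute's value, and elements without it are left out together with their text. The following is a minimal sketch that reproduces only this renaming. It is not how xmlarch is implemented, and it ignores the architecture use declaration PI. It assumes the xml.sax package from the current Python standard library. The class ArchSketch is hypothetical, and the sample document is adapted from simple.xml.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape

SIMPLE_XML = b"""<?xml version="1.0"?>
<?IS10744:arch name="html"?>
<doc>
<title html="h1">My first architectural document</title>
<author html="address">Geir Ove Gronmo, grove@infotek.no</author>
<para>This is the first paragraph in this document</para>
<para html="p">This is the second paragraph</para>
</doc>
"""


class ArchSketch(ContentHandler):
    """Hypothetical handler that renames elements via the form attribute."""

    def __init__(self, arch_name):
        super().__init__()
        self.arch_name = arch_name
        self.stack = []  # architectural name (or None) for each open element
        self.out = []

    def startElement(self, name, attrs):
        if not self.stack:
            # Assumption: the document element maps to the architecture name.
            form = self.arch_name
        else:
            # None when the element has no attribute named after the architecture.
            form = attrs.get(self.arch_name)
        self.stack.append(form)
        if form is not None:
            self.out.append("<%s>" % form)

    def endElement(self, name):
        form = self.stack.pop()
        if form is not None:
            self.out.append("</%s>" % form)

    def characters(self, content):
        # Keep text only inside mapped elements; drop whitespace-only data.
        if self.stack and self.stack[-1] is not None and content.strip():
            self.out.append(escape(content))


handler = ArchSketch("html")
xml.sax.parseString(SIMPLE_XML, handler)
result = "".join(handler.out)
print(result)
```

The output appears on a single line because the sketch discards whitespace-only character data; the first paragraph is absent because its element has no html attribute.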
A more complex example
Python code:
# Import needed modules
from xml.sax import saxexts, saxlib, saxutils
import sys, xmlarch
# create architecture processor handler
arch_handler = xmlarch.ArchDocHandler()
# Create parser and register architecture processor with it
parser = saxexts.XMLParserFactory.make_parser()
parser.setDocumentHandler(arch_handler)
# Add document handlers to process the html and biblio architectures
arch_handler.addArchDocumentHandler("html", xmlarch.Normalizer(open("html.out", "w")))
arch_handler.addArchDocumentHandler("biblio", saxutils.ESISDocHandler(open("biblio1.out", "w")))
arch_handler.addArchDocumentHandler("biblio", saxutils.Canonizer(open("biblio2.out", "w")))
# Register a default document handler that just passes through any incoming events
arch_handler.setDefaultDocumentHandler(xmlarch.Normalizer(sys.stdout))
# Parse (and process) the document
parser.parse("complex.xml")
Because this produces a lot of output, I have not included the XML document and the results here. See the files complex.py and complex.xml in the distribution and try it yourself.
Testing the xmlarch module
The distribution also contains test scripts. archtest.py can be run from the command line. It takes two arguments: the first is the name of the architecture to process and the second is the XML document to process. The result is printed on stdout as a normalized document. You can also use the --debug flag to make it write debug information to stderr.
Example I: python archtest.py html simple.xml
Example II: python archtest.py --debug biblio complex.xml
Example III: python archtest.py persons http://www.infotek.no/~grove/software/xmlarch/xmlarch.html
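A script with the same command-line shape as the examples above could read its arguments as follows. This is a minimal sketch and not the contents of archtest.py. The function parse_command_line and the usage message are hypothetical; only the --debug flag and the two positional arguments come from the description above.

```python
import getopt


def parse_command_line(argv):
    """Return (debug, architecture name, document) from an argument list."""
    # "debug" is the only long option; there are no short options.
    options, arguments = getopt.getopt(argv, "", ["debug"])
    debug = any(option == "--debug" for option, _ in options)
    if len(arguments) != 2:
        raise SystemExit("usage: [--debug] <architecture> <document>")
    arch_name, document = arguments
    return debug, arch_name, document


print(parse_command_line(["html", "simple.xml"]))
print(parse_command_line(["--debug", "biblio", "complex.xml"]))
```

In a real script the argument list would typically be sys.argv[1:].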
Download
You can get it here.
Related information
Feedback -- bug reports, features and improvements
I would very much welcome any feedback on any issue regarding this piece of software. Feedback should be sent to grove@infotek.no.
July 24th 1998, 10:35
Geir O. Grønmo, grove@infotek.no