[This local archive copy is from the official and canonical URL, http://jfinity.com/muxml/; please refer to the canonical source document if possible.]
MuXML (Version 0.1)
An XML Document Multiplexor
MuXML is a prototype Perl module that implements configurable multiplexing of XML document streams accessed via the LWPng module and parsed using the XML::Parser module. It returns its results using the XML::Grove module. Its primary purpose is to demonstrate non-blocking design approaches with XML and Perl. If it also ends up being a useful tool, then that's gravy :-).
You can find the source code here if you are interested in browsing.
One way to think about the problem domain that MuXML (and the underlying sub-systems) addresses is as the generalization of stream-oriented document processing to handling more than one document stream at a time (let's call this a stream set). It turns out that in a single-threaded environment, you don't even need more than one document to need the kinds of approaches used by LWPng and MuXML, since a blocking read on even one slow network connection stalls everything else the program is doing.
The various streams in a stream set may have very different throughput levels. This means that you may need to throttle the fast streams so that they do not overwhelm the slow ones. Single-stream processing frameworks, such as those of SAX-based XML parsers, do not generalize to stream sets. The main reason is that such frameworks push events to the application's callbacks as fast as they read their input, so they support flow control only in a cumbersome, roundabout way. MuXML can be viewed as a framework for processing stream sets with explicit flow control built in.
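To make the throttling idea concrete, here is a minimal Python sketch of pull-based flow control. It is an illustration of the concept, not MuXML's Perl API. Each stream is modelled as an iterator wrapped in a bounded read-ahead buffer, and the consumer, not the producer, decides when a stream advances. The class `FlowControlledStream` and its `capacity`, `fill`, and `take` names are hypothetical.

```python
from collections import deque
from typing import Deque, Iterator, Optional


class FlowControlledStream:
    """Hypothetical wrapper: reads ahead at most `capacity` items from its source."""

    def __init__(self, source: Iterator[str], capacity: int = 2) -> None:
        self.source = source
        self.capacity = capacity
        self.buffer: Deque[str] = deque()
        self.items_read = 0      # how many items have been pulled from the source
        self.exhausted = False

    def fill(self) -> None:
        # Pull from the source only while the buffer has room. This is the
        # explicit flow control that a push-style (SAX) callback loop lacks.
        while not self.exhausted and len(self.buffer) < self.capacity:
            try:
                self.buffer.append(next(self.source))
                self.items_read += 1
            except StopIteration:
                self.exhausted = True

    def take(self) -> Optional[str]:
        """Return the next item, or None once the stream is finished."""
        self.fill()
        return self.buffer.popleft() if self.buffer else None


# A "fast" stream with 1000 items available and a "slow" one with only two.
fast = FlowControlledStream(f"fast-{i}" for i in range(1000))
slow = FlowControlledStream(iter(["slow-0", "slow-1"]))

# The consumer sets the pace: one item from each stream per round.
rounds = [(fast.take(), slow.take()) for _ in range(3)]
print(rounds)           # [('fast-0', 'slow-0'), ('fast-1', 'slow-1'), ('fast-2', None)]
print(fast.items_read)  # 4: three consumed plus one buffered, not 1000
```

The fast stream never gets more than `capacity` items ahead of the consumer, no matter how much data it could supply.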
The most obvious use of MuXML is to multiplex record oriented XML document streams. I'll try to provide some demonstration data generated with something like XFlat/XML Convert in the next release.
One possible application of MuXML would be the emulation of the UNIX sort command's merge capability. Let's call this sample application MergeML. You would pass MergeML a list of XML documents, each of which was already sorted. It would output an XML document that interleaved the contents of the input documents based on the same sorting criteria used to sort each input.
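The merge step of this hypothetical MergeML can be sketched in Python with the standard library. The sketch does not use Perl or MuXML. `xml.etree.ElementTree.iterparse` yields each record element once it is complete, and `heapq.merge` interleaves the already-sorted streams, advancing only the input whose current head has the smallest key. The `<record key="...">` format and the function names are assumptions made for illustration.

```python
import heapq
import io
import xml.etree.ElementTree as ET
from typing import Iterator, List


def records(doc: str, tag: str = "record") -> Iterator[ET.Element]:
    """Yield each record element as soon as the parser has completed it."""
    for _event, elem in ET.iterparse(io.BytesIO(doc.encode("utf-8")), events=("end",)):
        if elem.tag == tag:
            yield elem


def merge_ml(docs: List[str], key_attr: str = "key") -> str:
    """Interleave already-sorted record streams into one sorted document."""
    streams = [records(d) for d in docs]
    merged = ET.Element("merged")
    # heapq.merge pulls lazily: it only advances the stream whose head is smallest.
    for rec in heapq.merge(*streams, key=lambda e: e.get(key_attr)):
        merged.append(rec)
    return ET.tostring(merged, encoding="unicode")


a = '<log><record key="apple"/><record key="melon"/></log>'
b = '<log><record key="banana"/><record key="zucchini"/></log>'
print(merge_ml([a, b]))
```

As with `sort -m`, correctness depends on each input already being sorted by the same key. The merge itself never sorts.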
Another sample application would be aggregating information from a distributed logging system. Let's say that you have a site that is replicated across multiple distinct servers. Each server is stand-alone and does its own logging. The servers create a nightly ordered listing of page accesses (in XML, for argument's sake :-). Your MuXML application would access the hit logs and process them in parallel. It would output the aggregate access count across all the servers.
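A minimal Python sketch of this aggregation follows. It shows the idea only, not MuXML itself, and it assumes a hypothetical log format of `<hit page="..."/>` records. Each server's log is parsed incrementally, the logs are consumed side by side in a single thread, and a `collections.Counter` accumulates the totals.

```python
import io
import itertools
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterator, List


def hits(log: str) -> Iterator[str]:
    """Yield the page of each <hit> record in one server's log, as it is parsed."""
    for _event, elem in ET.iterparse(io.BytesIO(log.encode("utf-8")), events=("end",)):
        if elem.tag == "hit":
            yield elem.get("page")
            elem.clear()  # drop the record's data once it has been counted


def aggregate(logs: List[str]) -> Counter:
    """Total page accesses across all servers' logs."""
    totals: Counter = Counter()
    # zip_longest takes one record from each log per step, so the logs are
    # consumed side by side (round-robin) in a single thread.
    for step in itertools.zip_longest(*(hits(log) for log in logs)):
        totals.update(page for page in step if page is not None)
    return totals


server1 = '<hits><hit page="/index.html"/><hit page="/about.html"/></hits>'
server2 = '<hits><hit page="/index.html"/></hits>'
print(aggregate([server1, server2]))
```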
The MuXML conceptual model is that of a filtering smart multiplexor. You hand MuXML a list of URLs. It accesses and parses them incrementally, based on a blockSize that you specify. MuXML uses HTTP partial GET requests to propagate flow control to the resource servers. It gets only as much data from a server as it needs to generate a fragment. Once it has one or more fragments queued for a particular stream, it gets more data only after the application has consumed the current fragments.
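The fetch-on-demand loop described above can be sketched in Python. This is an illustration, not MuXML's Perl code. `range_request` builds, without sending, a partial GET using the standard `Range` header. `DocumentStream` feeds each fetched block to an incremental parser (`xml.etree.ElementTree.XMLPullParser`) and asks for the next block only when no complete fragment is waiting. `DocumentStream`, `fetch(offset, size)`, and the in-memory `fake_fetch` that stands in for the network are hypothetical names.

```python
import urllib.request
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Deque, List, Optional


def range_request(url: str, offset: int, block_size: int) -> urllib.request.Request:
    """Build (but do not send) a partial GET for one block of the resource."""
    end = offset + block_size - 1  # Range byte positions are inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{end}"})


class DocumentStream:
    """Hypothetical: parse one document incrementally, fetching on demand."""

    def __init__(self, fetch: Callable[[int, int], bytes], block_size: int, tag: str) -> None:
        self.fetch = fetch            # fetch(offset, size) -> bytes; b"" at end
        self.block_size = block_size
        self.tag = tag                # element name that counts as a fragment
        self.offset = 0
        self.parser = ET.XMLPullParser(events=("end",))
        self.queue: Deque[ET.Element] = deque()
        self.done = False

    def next_fragment(self) -> Optional[ET.Element]:
        # Fetch another block only while no complete fragment is queued.
        while not self.queue and not self.done:
            block = self.fetch(self.offset, self.block_size)
            if not block:
                self.parser.close()   # end of document; flush remaining events
                self.done = True
            else:
                self.offset += len(block)
                self.parser.feed(block)
            for _event, elem in self.parser.read_events():
                if elem.tag == self.tag:
                    self.queue.append(elem)
        return self.queue.popleft() if self.queue else None


DOC = b'<log><rec n="1"/><rec n="2"/><rec n="3"/></log>'
fetched: List[int] = []


def fake_fetch(offset: int, size: int) -> bytes:
    fetched.append(offset)  # record which partial GETs would have been issued
    return DOC[offset:offset + size]


stream = DocumentStream(fake_fetch, block_size=16, tag="rec")
fragments = []
while (frag := stream.next_fragment()) is not None:
    fragments.append(frag.get("n"))
print(fragments)  # ['1', '2', '3']
```

A real fetch function would send `range_request(...)` with `urllib.request.urlopen`. Note that a server which ignores `Range` may reply with the whole resource, so production code would need to check for a `206 Partial Content` status.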
It picks out fragments of the incoming documents using a fragment recognizer that you give it. Whenever it has a complete fragment from any of the document streams that it is processing, it calls the fragment multiplexor that you have provided. Below are descriptions of these callbacks.
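As a rough illustration of how the two callbacks relate, here is a Python sketch. The signatures are assumptions, not MuXML's actual Perl interface. The recognizer decides whether a completed element is a fragment, and the multiplexor receives each fragment together with the name of the stream it came from. The driver `run` simply visits the streams round-robin.

```python
import io
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator

Recognizer = Callable[[ET.Element], bool]        # is this element a fragment?
Multiplexor = Callable[[str, ET.Element], None]  # receives (stream name, fragment)


def fragments(doc: bytes, recognize: Recognizer) -> Iterator[ET.Element]:
    """Yield each completed element that the recognizer accepts."""
    for _event, elem in ET.iterparse(io.BytesIO(doc), events=("end",)):
        if recognize(elem):
            yield elem


def run(docs: Dict[str, bytes], recognize: Recognizer, multiplex: Multiplexor) -> None:
    # Round-robin: hand the multiplexor one complete fragment from each
    # stream in turn, dropping streams as they are exhausted.
    active = {name: fragments(doc, recognize) for name, doc in docs.items()}
    while active:
        for name in list(active):
            frag = next(active[name], None)
            if frag is None:
                del active[name]
            else:
                multiplex(name, frag)


seen = []
run(
    {"a": b"<d><item>1</item><item>2</item></d>", "b": b"<d><item>x</item></d>"},
    recognize=lambda e: e.tag == "item",
    multiplex=lambda name, frag: seen.append((name, frag.text)),
)
print(seen)  # [('a', '1'), ('b', 'x'), ('a', '2')]
```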
The LWPng Perl module provides an event-driven framework for interacting with web-based resources. LWPng is the
LWPng provides programmatic access to the HTTP/1.1 protocol. Version 1.1 of HTTP has various features that enable more scalable interaction with web-based resources. MuXML doesn't make use of any of these HTTP/1.1 capabilities. Instead, it uses the event-driven framework that LWPng provides in conjunction with partial HTTP GET requests, which were already available in HTTP/1.0.
The XML::Parser Perl module provides both a low-level and a higher-level interface to James Clark's C language parser, Expat. The underlying C parser provides an incremental parsing interface by default. Up until version 2.22, XML::Parser did not expose this interface. Version 2.22 was made available by Clark Cooper (the author of all but the first few versions of XML::Parser) on