[Posted to XML-DEV by Paul Pazandak on March 10, 1998]

XML Documents Are Objects!

Killing OO Softly With XML

Paul Pazandak, Object Services and Consulting, Inc.

"Wouldn't it be nice if one could simply tell an object to serialize to XML, and then deserialize back into an object?"

As programmers, do you long for the old days when data was data and code was code? Do you buy into the idea that the behavior associated with data should be embedded within the application so as to restrict reuse of that data? Ah, the good old days of relational databases! In its current usage XML is enabling you to revisit those days again... but don't be persuaded by the dark force! Put on your OO glasses and see the light!

Sure, XML provides incredible potential, and I am all for it. But in their current form, XML documents are nothing more than mobile semi-structured non-object databases (ohhh so close! But not quite enough). Why is it that programmers have suddenly forgotten all about objects just so they could write XML? Is a return to relational databases that enticing? (Bleech!) The only practical reasoning behind such an approach is that programmers want to keep their data private. They don't want other applications to have the ability to reuse that data, and they accomplish this feat by embedding all of the code associated with that data (formally called "behaviors" in the OO era) in their own applications. [Who's running this show anyway? Is XML some kind of conspiracy to kill OO?]

Here's a simple example. You write an application that converts unformatted poems into composite poem objects rich with behavior. You want to store these poems, and share them with other applications that want to do things with poems (whatever it is you do with poems). You define an XML structure and start generating XML documents as a means to store and share the poems. Every application (including yours) that reads in your poems using an XML parser will see the poem as something similar to:

[This XML document was taken from an example accessible at the Microstar website (distributors of the AElfred XML parser). The file name is donne.xml. Below is the parse tree for this document.]

Pretty impressive right? Then every single application will need to supply its own code to understand how to navigate and interpret this structure, and provide behavior for it. This is typical if you are a C programmer, but be clear, this isn't OO. And, while DOM takes us a bit farther, you still won't get the parser to produce a poem object and its poem-specific behaviors from the XML document (but we still want DOM!).

The process of generating XML strips the behavior out of the objects; or, saying it differently, XML and related standards do not describe a mechanism by which one can attach behavior to XML documents. The parser, in turn, cannot therefore work miracles when it reads the data (which are no longer objects) back into the application. Or can it? Why can't we view XML as a serialized object representation? If we agree that this is not too far-fetched, then why can't parsers deserialize or objectify the objects contained in the XML documents, rather than simply handing us data and making the applications do all of the work? What if the parsers generated real classes (with behavior!) instead of generic Element classes? The poem above would instead look like this: (perhaps if we talked about XML documents as orders (or anything else) instead of poems it might be more motivating?)

Oh, but could it be that simple? (The answer is "yes.") Would having a parser output objects with type-specific behavior be useful? (Hmm...) Would programmers really want to share their objects if they could? (The answer should be "yes.") Even if they didn't want to share their objects, or if nobody wanted their objects, why violate the principles of OO and make the programmers' lives more difficult? Wouldn't it be nice if one could simply tell an object to serialize to XML, and then deserialize back into an object?

With some VERY simple extensions to current parsers this can occur, and already has -- we've created an extended version of the Lark XML parser which provides this capability. Our input to this extended parser is the XML document and the type-specific classes (like poem) extended with the basic ability to deserialize themselves.

Introduction

XML documents are indeed objects, or at least they could be. If we simply associate behavior with the data structures defined within the XML documents we could have normal, living, breathing objects... like we're used to in the programming world. Instead of having the parser breathe life back into our objects, as part of the deserializing or re-objectifying the object, we are forced to do this within our applications. Simply put, parsers aren't doing enough for us.

XML parsers currently support non-portable object specifications. While the XML documents themselves are portable by virtue of being written in XML, the objects represented by those documents cannot be objectified without an accompanying document-specific application which interacts with the parser.

Current XML parsers provide the ability to parse an XML document, and perhaps generate a generic object structure (parse tree) corresponding to the document. However, XML documents could potentially represent more than simple structured documents; they could describe complex objects with behavior. Common (simple) examples of XML documents include address lists. But making use of this information requires each application which desires to consume address lists to write parser-related code, as well as code to implement the behaviors of the address lists and their entries. We propose a simple extension to parsers which would all but eliminate application-parser interaction and the need for document handlers (which do not migrate with the XML document), and would facilitate objectifying XML documents into type-specific objects (like we're used to having in the programming world) having all related behaviors intact.

Background

Current XML parsers generate generic parse trees (most do anyway). These trees represent the structure of the data that was parsed. But what is missing is the behavior associated with this data. While there are methods associated with the generic parse tree elements, these are not data-specific but rather generic methods (see the sample code). This approach places the burden on the application to deserialize the document back into objects using the generic calls and a lot of validating code. This is true of all current XML parsers (which support parse tree generation).
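To make "generic methods" concrete, here is a minimal sketch of what a generic parse tree offers. It uses the DOM classes bundled with a current JDK purely as a stand-in for a 1998 tree-generating parser such as Lark. The class name GenericTreeDemo and the abbreviated poem document are assumptions made for illustration; they are not the real donne.xml or the Lark API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class GenericTreeDemo {
    // Hypothetical, abbreviated poem document (placeholder text, not donne.xml).
    static final String XML =
        "<poem><front><title>Example Title</title><author>Example Author</author></front>"
        + "<body><stanza><line>First line</line></stanza></body></poem>";

    // Parses the string into a generic tree and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException("could not parse document", e);
        }
    }

    // Renders the tree using only generic calls: every node is "just an Element",
    // and nothing here knows what a poem, a title or a stanza is.
    static String outline(Element e, String indent) {
        StringBuilder sb = new StringBuilder();
        sb.append(indent).append("Element(").append(e.getTagName()).append(")\n");
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                sb.append(outline((Element) n, indent + "  "));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(outline(parse(XML), ""));
    }
}
```

The only operations available are of the form "give me your tag name" and "give me your children"; any poem-specific meaning has to be supplied by the calling application.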

Once the XML document is parsed the information needs to be retrieved by the application, so it must access it from the parse tree (if one was generated -- see the note on problems with event-based parsing). In general, the consuming application may proceed in one of two ways to accomplish this:

Simple Extraction.

Tree Transformation / Mapping

In both cases, the application embeds the intelligence of how to access the data within itself. This is not unlike the approach used by relational database applications which separates the data from its behavior. If another application wishes to access this data, it must define its own behavior for that data.
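One plausible reading of the two approaches named above can be sketched in Java using the address-list example mentioned in the introduction. Everything here is hypothetical: GenericElement stands in for whatever generic node class a parser produces, and AddressEntry, the element names "entry", "name" and "email", and the method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parser's generic tree node (not the Lark API).
class GenericElement {
    final String type;
    final String content;
    final List<GenericElement> children;

    GenericElement(String type, String content, GenericElement... children) {
        this.type = type;
        this.content = content;
        this.children = Arrays.asList(children);
    }
}

// Application-defined class that the second approach maps the generic tree into.
class AddressEntry {
    String name;
    String email;
}

class TwoApproaches {
    // Simple extraction: read a value straight out of the generic tree when needed.
    static String firstEmail(GenericElement addressList) {
        for (GenericElement entry : addressList.children) {
            for (GenericElement field : entry.children) {
                if (field.type.equals("email")) {
                    return field.content;
                }
            }
        }
        return null;
    }

    // Tree transformation / mapping: walk the generic tree once and build a
    // second, application-specific structure that mirrors it.
    static List<AddressEntry> toEntries(GenericElement addressList) {
        List<AddressEntry> entries = new ArrayList<>();
        for (GenericElement entry : addressList.children) {
            AddressEntry mapped = new AddressEntry();
            for (GenericElement field : entry.children) {
                if (field.type.equals("name")) mapped.name = field.content;
                if (field.type.equals("email")) mapped.email = field.content;
            }
            entries.add(mapped);
        }
        return entries;
    }
}
```

Either way, the knowledge that an entry contains a "name" and an "email" lives in the application, not with the data.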

An Example

Here's an example to illustrate this. This XML document was taken from an example accessible at the Microstar website (distributors of the AElfred XML parser). The file name is donne.xml. When an XML parser generates a parse tree for this document, the resulting (informative) tree will look like the following in Lark (and similar in the other parsers as well):

The Element entries are the objects created by the XML parser corresponding to the Element Declarations in the XML document. To determine what each element is, the application must navigate the structure and inspect each Element object using a generic API. This requires that the knowledge of how to navigate the structure is embedded within the application. The interface of this object must be embedded within the application as well, which really violates the object-oriented paradigm -- yes, the data is stored in objects, but the associated type-specific behavior is stored someplace else. While this may appear similar to how objects are serialized today (without code), the distinction is that any other application that wants to access this object will not have access to the code since it is buried in the application which created it. All other applications will have to provide their own code (this, again, is how applications for relational databases are written).

There are several other problems with this approach, not the least of which is that the application should not be responsible for doing this. Furthermore, the parser-related code required to walk a complex structure is complex itself (not quite as complex as code used for event-based parsing of complex structures however), and is more difficult to maintain. Finally, the application is forced to do what the parser has already done, that is understand and navigate the structure of the document. The parser has already gone through the entire document and generated a structured instantiation of objects. The crux of the problem is that the parser generates generic objects which forces all of this additional work on the application. Worse yet, there is no reason this has to occur -- nor does the (tree-generating) parser have to be significantly modified.

Event-based parsing

An alternative to tree generation is simply to consume the structure on-the-fly as it is parsed. This requires writing an XML structure-specific handler (a document handler in SAX terms) which describes what should happen for each XML declaration that is encountered; no structure is automatically generated, so if objectification of the XML document is desired the handler is responsible for this. Using event-based parsing, the application could adopt either of the above two approaches: the first is simple consumption, and the second causes the construction of some structure corresponding to the XML document. In both cases, at least for complex XML structures, there would be a lot of conditional, segmented code which is more difficult to write and to modify when the XML structure changes. Using the proposed extension, the majority of the work is done by the tree-generating parser, empowering the application to see XML documents as objects and relieving it of the burden of event-based parsing.

Granted, when an application will only encounter one kind of XML structure, event-based parsing might be a reasonable approach from the standpoint that only one handler would need to be written. But it still suffers from some of the same problems as generic parse tree generation (see the summary section).
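The kind of conditional, segmented handler code described in this section might look like the following sketch. It uses the SAX classes shipped with a current JDK (DefaultHandler and SAXParserFactory), which postdate the SAX draft available when this note was written. The class name TitleHandler and the helper titleOf are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Extracts the poem title with event callbacks. Note how the logic for one
// piece of data is spread over three methods and guarded by state flags.
class TitleHandler extends DefaultHandler {
    private boolean inFront;
    private boolean inTitle;
    private final StringBuilder title = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("front")) {
            inFront = true;
        } else if (qName.equals("title") && inFront) {
            inTitle = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) {
            title.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            inTitle = false;
        } else if (qName.equals("front")) {
            inFront = false;
        }
    }

    String title() {
        return title.toString();
    }

    // Convenience wrapper: parse a document held in a string and return its title.
    static String titleOf(String xml) {
        try {
            TitleHandler handler = new TitleHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.title();
        } catch (Exception e) {
            throw new RuntimeException("could not parse document", e);
        }
    }
}
```

Every additional element type of interest adds another branch to each callback, which is the maintenance problem raised above.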

XML Parsers Extended

What if the output of the parser was a type-specific structure which coincided with the definition of the structure in the XML document? And, what if the resulting objects contained the type-specific behavior for the specific element type parsed? What if the resulting parse tree for the example above instead looked like:

where poem, front, body, title, author, revision-history, and stanza were all classes with type-specific behavior? Instead of writing something like the following to retrieve the title of the poem:

Vector v = root.children();
if (v != null && !v.isEmpty()) {
    // v(0) "should" be the front element, we hope
    Element front = (Element) v.elementAt(0);
    if (front != null) {
        v = front.children();
        for (int i = 0; v != null && i < v.size(); i++) {
            Element child = (Element) v.elementAt(i);
            if (child.type().equals("title")) {
                return child.content();
            }
        }
    }
}
return null;

one could simply write:

poem.getTitle();

More importantly, all of the behaviors that should be associated with each of these object types would be defined as part of the object interfaces themselves rather than embedded within the application.
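A minimal sketch of such a type-specific class follows. The Element class here is a hypothetical stand-in that borrows the method names used in the snippet above (type(), content(), children()); it is not the real Lark class, and the helpers text(), add() and firstChild() are invented so the example can run on its own.

```java
import java.util.Vector;

// Hypothetical stand-in for the parser's generic element class (not the real Lark API).
class Element {
    private final String type;
    private String content = "";
    private final Vector<Element> children = new Vector<>();

    Element(String type) { this.type = type; }

    String type() { return type; }
    String content() { return content; }
    Vector<Element> children() { return children; }

    // Builder-style helpers so a tree can be assembled by hand in this sketch.
    Element text(String s) { content = s; return this; }
    Element add(Element child) { children.addElement(child); return this; }

    // Generic helper: first direct child of the given element type, or null.
    Element firstChild(String childType) {
        for (Element child : children) {
            if (child.type().equals(childType)) {
                return child;
            }
        }
        return null;
    }
}

// Type-specific class: still an Element, but with a poem-specific interface.
// The navigation knowledge lives here, with the type, not in each application.
class Poem extends Element {
    Poem() { super("poem"); }

    String getTitle() {
        Element front = firstChild("front");
        Element title = (front == null) ? null : front.firstChild("title");
        return (title == null) ? null : title.content();
    }
}
```

An application handed a Poem then only ever calls poem.getTitle(); if the document structure changes, the fix is made once in Poem rather than in every consumer.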

Granted, an application can generate this same structure using the transformation / mapping technique above. However, this is partially a duplication of effort since it requires the application to navigate the structure generated by the parse tree, and then generate a new structure which mirrors the parse tree. The extension to Lark eliminates the need to do this because it instantiates the correct type-specific parse tree the first time. Note that this is an extension to Lark, and therefore applicable to any XML document.

Details

What occurs in the underlying implementation of an XML parser is rather straightforward. When it sees an XML element declaration, it instantiates a generic Element object (with only Element-related methods). The extension to Lark simply extends the behavior of the parser so that instead of instantiating generic Element objects, it instantiates type-specific ones.

So when the parser encounters a new element declaration, it looks for a class declaration which identifies which class to instantiate in lieu of a generic Element class object (where it looks is described below). For example, when the parser identifies the "poem" element declaration, it looks for a class declaration for poem. If it finds one, it instantiates an object of that class rather than a generic Element object. The poem class extends the interface of the Lark Element class, but in addition, adds type-specific methods relevant to a poem object.
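The instantiation step described above can be sketched as a small factory. This is an illustration only, not the code of the Lark extension: the names ElementFactory, declare and create are hypothetical, and the sketch registers a constructor reference per element type where a real parser would more likely load the declared class by name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical generic element, standing in for the parser's own Element class.
class Element {
    private final String type;
    Element(String type) { this.type = type; }
    String type() { return type; }
}

// Hypothetical type-specific class supplied alongside the document.
class Poem extends Element {
    Poem() { super("poem"); }
    // Poem-specific methods (getTitle() and so on) would be added here.
}

class ElementFactory {
    // Element type name -> how to construct the type-specific class for it.
    private final Map<String, Supplier<? extends Element>> declared = new HashMap<>();

    void declare(String elementType, Supplier<? extends Element> constructor) {
        declared.put(elementType, constructor);
    }

    // Called by the parser whenever it encounters a new element.
    Element create(String elementType) {
        Supplier<? extends Element> constructor = declared.get(elementType);
        if (constructor == null) {
            // No class declaration found: fall back to the generic Element.
            return new Element(elementType);
        }
        return constructor.get();
    }
}
```

With `factory.declare("poem", Poem::new)`, a call to `factory.create("poem")` yields a Poem, while an undeclared type such as "stanza" still yields a plain Element, so documents without class declarations parse exactly as before.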

Within a type-specific parse tree class, like poem, is code which understands how to extract the parsed information. In effect, the object understands how to investigate itself. This code is provided by the object type creator. It will travel with the object as a means to facilitate re-objectifying the XML back into an object. This enables reuse of the object by any application. Of course, as stated above, the poem class will also provide a poem-specific interface.

A method I have added to the Element class is process(). It can be called once an element has been parsed. In each implementation, for example within the poem class, the process() code handles extracting the data from the inherited generic structures of the Element class. Alternatively, poem methods could simply be written that do this directly. But, it is important to note that the object itself is doing this, and further, that no other parse trees or duplicate structures are being constructed.
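A rough sketch of how a process() implementation might look is given below. Only the method name process() comes from the text; the Element stand-in, its constructor, and the fields and accessors on Poem are assumptions made so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the parser's generic Element class.
class Element {
    private final String type;
    private final String content;
    private final List<Element> children = new ArrayList<>();

    Element(String type, String content) {
        this.type = type;
        this.content = content;
    }

    String type() { return type; }
    String content() { return content; }
    List<Element> children() { return children; }
    Element add(Element child) { children.add(child); return this; }

    // Hook called once this element and its children have been parsed.
    // The generic element has nothing to extract, so the default does nothing.
    void process() { }
}

class Poem extends Element {
    private String title;
    private int stanzaCount;

    Poem() { super("poem", ""); }

    // The object investigates itself: it reads the inherited generic structure
    // and fills in its own fields. No second tree is built.
    @Override
    void process() {
        title = null;
        stanzaCount = 0;
        for (Element section : children()) {          // front, body
            for (Element child : section.children()) {
                if (child.type().equals("title")) {
                    title = child.content();
                } else if (child.type().equals("stanza")) {
                    stanzaCount++;
                }
            }
        }
    }

    String getTitle() { return title; }
    int getStanzaCount() { return stanzaCount; }
}
```

The alternative mentioned above, writing poem methods that read the inherited structure directly on each call, trades the cached fields for a fresh walk per call.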

The location of the class declaration is not hard-coded. It could be within the XML file itself, in a DTD, in a stylesheet, or in a remote repository, for example. In addition, local class declarations may be used to override default class declarations. In the implementation of the Lark extension, I have simply embedded them in the DTD file along with the declaration of the structure of the XML file. In its current form the class declaration would look like the following for the poem example above, although there would be many ways to accomplish this:

<!ENTITY Poem-Class "http://www.objs.com/xml/poem/com.objs.ia.specification.xml.poem">
<!ENTITY Front-Class "http://www.objs.com/xml/poem/com.objs.ia.specification.xml.front">
<!ENTITY Body-Class "http://www.objs.com/xml/poem/com.objs.ia.specification.xml.body">
...
<!ENTITY ClassSuffix "-Class">

The ClassSuffix is used to avoid possible naming collisions (which may be solved otherwise using the XML namespaces proposal). So, when a new element declaration is identified by Lark it inspects this list looking for an entry matching the pattern <element type><ClassSuffix>, or in the case of the poem element declaration, "Poem-Class".
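The lookup just described can be sketched as follows. Two details are assumptions rather than facts from the text: that the element type's first letter is capitalized to form the entry name (suggested by "poem" mapping to "Poem-Class"), and that the part of the entity value after the last '/' is the class name. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ClassDeclarations {
    // Entity name -> entity value, as read from the DTD.
    private final Map<String, String> entities = new HashMap<>();
    private final String suffix;

    ClassDeclarations(String suffix) {
        this.suffix = suffix;
    }

    void declareEntity(String name, String value) {
        entities.put(name, value);
    }

    // "poem" + "-Class" -> "Poem-Class". Capitalizing the first letter is an
    // assumption inferred from the example in the text.
    String entityNameFor(String elementType) {
        if (elementType.isEmpty()) {
            return suffix;
        }
        return Character.toUpperCase(elementType.charAt(0)) + elementType.substring(1) + suffix;
    }

    // Returns the declared value, or null when no class declaration exists
    // and the parser should fall back to a generic Element.
    String lookup(String elementType) {
        return entities.get(entityNameFor(elementType));
    }

    // Assumption: everything after the last '/' in the value is the class name.
    static String classNameOf(String entityValue) {
        return entityValue.substring(entityValue.lastIndexOf('/') + 1);
    }
}
```

For the poem entry shown earlier, lookup("poem") returns the declared URL and classNameOf yields com.objs.ia.specification.xml.poem; how that class is then fetched and loaded is left to the parser.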

Caveat: Language?

Is this a language-specific extension? Not really. The class declarations could be (for example) written in ActiveX I suppose, or even wrapped in CORBA, thereby enabling any language to take advantage of the idea of XML documents as objects. It would be up to the parser to find the correct class declaration and objectify accordingly.

Implementation Experience

My experience with XML parsers began last year. As part of a DARPA-funded project I am implementing an architecture to demonstrate scalable object service architectures. I started using event-based parsing as a means to import object service specifications. These XML specifications represent real (Java and CORBA) services that are invoked by the architecture.

I noticed that by adopting an event-based approach to parsing I would have to write a lot of code which would be difficult to maintain should I have changes in the future. In addition, this code would be hard for someone else to understand since each parser callback method would include conditional statements for several types of elements, and the code would be spread across several methods. I prefer a clean separation of code whenever possible, and this didn't seem very clean.

I decided that tree parsing was a more practical route. The parser would automatically generate a structure for me. But, then I realized that I had to write all of the code to navigate this generic object structure, pull out the information I wanted, and then copy it into service specification objects having the behavior I wanted.

Since the parser was already generating classes, why not just tell it to generate the real classes to begin with? The classes themselves would handle deserialization. Sounds like OO to me! With modest changes to Lark, when it sees an XML service specification document it will generate service specification objects right away. This extension will work for any XML document which defines specializations of the Element class and makes them available to the parser. Besides asking Lark to parse the document, my application has no other parser-related code. Furthermore, any other application can use my XML service specification documents, and load them in as service specification objects with only a few lines of code.

Summary

In summary, an extension has been presented which extends the capabilities of Lark, but which could be applied to all tree-generating XML parsers. It enables type-specific composite object construction to occur within the parser which is a significant improvement over generic parse tree construction because:

We can attach behavior to XML documents.
We can therefore treat XML documents as objects.
It eliminates most of the need for the application to understand SAX or parser-specific structures, and reduces the amount of direct interaction between the application and the parser. To a certain extent DOM will accomplish this, but the extension proposed here augments this by enabling the generation of type-specific interfaces.
The application is then free to interact with the generated objects as "real" objects having type-specific structure and behavior.
More importantly, the XML documents can roam freely (like serialized objects) and can be objectified again by any application. This would not be possible with generic tree parsing or with event-based parsing (which requires specialized structure-specific parsing handlers).

If we view XML as a means to serialize an object, we should view the parser as the mechanism to deserialize (or objectify) it. Once we convert an object to an XML representation, it simply doesn't make sense to throw away its behavior or the code which understands how it should be deserialized. Embedding this knowledge within an external application is just revisiting the relational DBMS experience and ignores the principal benefits of object technology.

If this proposed extension were adopted it would benefit significantly from a standardization of the Element interface (something that will happen with DOM). In this way, the associated class files would not be parser-specific, and therefore any XML document could be objectified by any tree-generating parser.

Status

I anticipate that the extensions I have made to Lark will be incorporated into a next version of Lark (I assume this from previous dialogues I have had with Tim Bray). If not, and in the meantime, the enhanced version of Lark is freely available on request.

References & Acknowledgements

Related work in this area is described in Towards a Web Object Model by Frank Manola, Object Services and Consulting, Inc. Thanks to Frank Manola (OBJS, Inc.) and Tim Bray (Textuality, Inc.) for their useful feedback.

This research is sponsored by the Defense Advanced Research Projects Agency and managed by the U.S. Army Research Laboratory under contract DAAL01-95-C-0112. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, U.S. Army Research Laboratory, or the United States Government.

© Copyright 1998 Object Services and Consulting, Inc. Permission is granted to copy this document provided this copyright statement is retained in all copies. Disclaimer: OBJS does not warrant the accuracy or completeness of the information in this document.