[This local archive copy is mirrored from the canonical site: http://www.objectmagazine.com/frompages/9802/carlson.html; links may not have complete integrity, so use the canonical document at this URL if possible.]
OBJECTS AND THE WEB
Document Objects with Style
David Carlson, Ph.D. (firstname.lastname@example.org) provides synergistic consulting services with Ontogenics Corp. in Boulder, CO.
Technology trends come in at least two flavors: hype and revolution. Separating the two, and deciding where to apply your limited resources, can often feel like a roller coaster ride, or a trip through the haunted house! Back in August 1995, I was convinced that the Java train would lead to revolution. The Extensible Markup Language (XML) is now at a similar stage of evolution, and I'm equally convinced that it will yield a revolution in hybrid Web-Object systems. Now, I've hitched my cart to the XML train.
Although XML depends heavily on its relationship to, and inheritance from, the Standard Generalized Markup Language (SGML), I will attempt to describe XML's benefits on the basis of its own specifications. SGML (an ISO standard since 1986) contributes a lot of capabilities and prior thought, but it also brings some baggage from its history. I do not mean to criticize SGML; it will continue to be a viable choice for complex document management. But for XML to be successful in the much larger and more diverse Web community, it needs to stand on its own feet and be understandable without history lessons. So far, however, I've found it necessary to dig deeply into SGML's roots to understand XML's potential. Hopefully, this will change with maturity and with new XML guidelines and books.
First, you need to expand your notion of what constitutes a "document": an XML (or SGML) document is a composite structure of node objects, each having optional attributes. The principal subnodes are typed "elements" and blocks of uninterpreted text. From these roots, you can construct schemas defining valid node structures, and document instances that adhere to the schema. I previously discussed several XML draft standards and W3C working documents for defining XML document schemas (see reference 1). This month, I'll focus on two draft standards for processing XML document instances: the Document Object Model and the Extensible Style Language.
Document Object Models
From the developer's perspective, XML usage can be roughly separated into parsing and application processing. There are several good XML parsers written in Java, available for free download (see, for example, xmlparse.htm). However, if you define parsing narrowly as generating tokens according to a predefined grammar, you still need a useful object model for representing the "source tree" that the parser produces. One potential standard for representing this structure is the Document Object Model (DOM), defined by a W3C-sponsored working group (see www.w3.org/DOM). The DOM goes significantly beyond representing the parse tree: it proposes an interface for manipulating document objects and for constructing documents within your application program.
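Today the JDK itself includes a parser: the javax.xml.parsers package turns XML text into an org.w3c.dom.Document, a standardized descendant of the draft DOM discussed here (the final API differs from the draft in several details). The following minimal sketch parses a small, hypothetical memo document and prints the shape of the resulting source tree; the class name ParseToTree and the sample markup are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ParseToTree {
    // Parses XML text into an in-memory source tree (a DOM Document).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Prints element and text nodes, indented by depth, to show the tree's shape.
    static void dump(Node node, int depth) {
        String indent = "  ".repeat(depth);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            System.out.println(indent + "Element: " + node.getNodeName());
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            System.out.println(indent + "Text: \"" + node.getNodeValue() + "\"");
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            dump(children.item(i), depth + 1);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document, used only for illustration.
        Document doc = parse("<memo><to>Ann</to><body>Meet at <em>noon</em>.</body></memo>");
        dump(doc.getDocumentElement(), 0);
    }
}
```

Note that the text inside body becomes three sibling nodes (a Text node, the em Element, and another Text node), which is exactly the mixed element-and-text structure described earlier.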
I'll briefly summarize the core objects in the DOM, but keep in mind that this is a draft specification subject to change. The Node class defines an abstract interface for getting, inserting, and removing child nodes within a recursive structure. NodeList and NodeEnumerator classes are defined for traversing sets of Node objects. Several specialized subclasses of Node are defined for Document, Element, Attribute, Text, Comment, PI (processing instruction), and Reference. The Document object contains a pointer to the root Element in the document tree, and a DocumentContext object contains a pointer to the Document and adds metadata about that document. An Element instance is created for each markup tag in the document, and uninterpreted text is stored in the data attribute of a Text object.
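To make the composite structure concrete, here is a minimal Java sketch of a node tree in the spirit of the draft. The class names echo the draft's Node, Element, Text, and Document, but the fields and method signatures are illustrative simplifications (hypothetical), not the draft's IDL or its Java binding.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified composite model loosely following the draft DOM.
public class MiniDom {
    // Abstract node: owns an ordered list of children (the recursive structure).
    static abstract class Node {
        private final List<Node> children = new ArrayList<>();

        List<Node> getChildren() { return children; }

        // Inserts newChild before refChild; appends when refChild is null.
        Node insertBefore(Node newChild, Node refChild) {
            int i = (refChild == null) ? children.size() : children.indexOf(refChild);
            if (i < 0) throw new IllegalArgumentException("refChild is not a child of this node");
            children.add(i, newChild);
            return newChild;
        }

        Node removeChild(Node oldChild) {
            if (!children.remove(oldChild)) throw new IllegalArgumentException("not a child");
            return oldChild;
        }

        // Concatenates the uninterpreted text found anywhere beneath this node.
        String text() {
            StringBuilder sb = new StringBuilder();
            for (Node child : children) sb.append(child.text());
            return sb.toString();
        }
    }

    // One Element per markup tag, with optional attributes.
    static class Element extends Node {
        final String tagName;
        final Map<String, String> attributes = new LinkedHashMap<>();
        Element(String tagName) { this.tagName = tagName; }
    }

    // Uninterpreted character data held in a data field (treated as a leaf).
    static class Text extends Node {
        final String data;
        Text(String data) { this.data = data; }
        @Override String text() { return data; }
    }

    // The Document keeps a pointer to the root Element.
    static class Document extends Node {
        final Element root;
        Document(Element root) { this.root = root; insertBefore(root, null); }
    }

    // Builds <memo priority="high"><to>Ann</to><body>Hello</body></memo>.
    static Document sample() {
        Element memo = new Element("memo");
        memo.attributes.put("priority", "high");
        Element to = new Element("to");
        to.insertBefore(new Text("Ann"), null);
        Element body = new Element("body");
        body.insertBefore(new Text("Hello"), null);
        memo.insertBefore(to, null);
        memo.insertBefore(body, null);
        return new Document(memo);
    }

    public static void main(String[] args) {
        Document doc = sample();
        System.out.println(doc.text()); // AnnHello
        doc.root.removeChild(doc.root.getChildren().get(0)); // drop the <to> element
        System.out.println(doc.text()); // Hello
    }
}
```

Because every subclass inherits the same child-management interface, a single recursive method such as text() works uniformly across the whole tree; that uniformity is the main payoff of the composite design.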
It's interesting (and appropriate) that the DOM specification defines the object model interface using the CORBA interface definition language (IDL). The authors are careful to point out that use of IDL doesn't require using CORBA, but enables a language-independent definition that can be easily translated to implementation languages. The specification also provides an equivalent Java interface definition. I have not yet seen any implementations of the DOM interface specifications, but I expect that will change by the time you read this column.
At the time of this writing, only the Core Document Structure and Navigation specification draft has been published. Future specifications will specialize this object model to HTML and XML structures, and to object models for document schemas and stylesheets.
Flow Objects and Transformations
There is a second specification draft targeted at document stylesheet definition and document formatting. The Extensible Style Language (XSL) is itself based on XML: the stylesheets are XML document instances, specifying how other XML documents should be transformed and/or formatted for presentation (for the specification draft, see
The term "stylesheet" is somewhat misleading, because it includes a general capability for transforming the document's source tree into an output tree, based on a set of construction rules. The output tree can be another document object model defined by a different schema, and the construction rules will map target elements from the source tree into corresponding elements in the output tree. The term "tree" is used to signify the composite structure of the document object model hierarchy of nodes. Each output element is called a "flow object," and the XSL specification includes definition of a standard set of flow objects, analogous to a standard class library in Java. Initially, a set of flow objects will be defined that allow XML documents to be transformed into HTML documents, which can then be viewed in existing Web browsers.
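A construction-rule pass can be approximated by hand with the standard DOM API. In this minimal sketch, each source element is matched by name against a hypothetical rule table (memo, to, body, and em mapped to HTML div, h2, p, and i flow objects), exactly one rule builds each output element, and unmatched elements fall back to span. The rule table, the fallback, and the class name are assumptions for illustration; this is not XSL syntax.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ConstructionRules {
    // Hypothetical construction rules: source element name -> HTML flow object.
    static final Map<String, String> RULES = Map.of(
        "memo", "div", "to", "h2", "body", "p", "em", "i");

    // Recursively builds the output tree, applying one rule per source element.
    static Node construct(Node source, Document out) {
        if (source.getNodeType() == Node.TEXT_NODE) {
            return out.createTextNode(source.getNodeValue());
        }
        String tag = RULES.getOrDefault(source.getNodeName(), "span"); // assumed fallback rule
        Element flowObject = out.createElement(tag);
        NodeList kids = source.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE || kid.getNodeType() == Node.TEXT_NODE) {
                flowObject.appendChild(construct(kid, out));
            }
        }
        return flowObject;
    }

    // Minimal serializer for the output tree (elements and text only).
    static String serialize(Node n) {
        if (n.getNodeType() == Node.TEXT_NODE) {
            return n.getNodeValue().replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }
        StringBuilder sb = new StringBuilder("<").append(n.getNodeName()).append(">");
        NodeList kids = n.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) sb.append(serialize(kids.item(i)));
        return sb.append("</").append(n.getNodeName()).append(">").toString();
    }

    // Parses the source document, builds a separate output Document, and serializes it.
    static String transform(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Document output = builder.newDocument();
            output.appendChild(construct(source.getDocumentElement(), output));
            return serialize(output.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("could not transform input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<memo><to>Ann</to><body>Meet at <em>noon</em>.</body></memo>"));
        // prints <div><h2>Ann</h2><p>Meet at <i>noon</i>.</p></div>
    }
}
```

The output tree is a separate Document, so it could just as well conform to some other schema; only the rule table would need to change.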
Whereas only one construction rule can be applied to each element in the source tree, any number of style rules can be applied. Style rules do not create new flow objects, but modify the characteristics of flow objects produced by construction rules. If you are familiar with rule-based expert systems, these stylesheets look like a knowledge base for document transformation. Each construction rule contains a pattern that identifies the source element to which the rule applies, and an action that specifies the flow object to be created. There is even a conflict resolution algorithm for choosing from among multiple rules that might be applied to a particular element.
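The interplay of one construction rule and many style rules can be sketched as follows. The pattern language (an element name, optionally restricted to a parent element), the numeric priorities, the flow-object and characteristic names, and the conflict-resolution policy (highest priority first, then the more specific pattern) are all assumptions for illustration; they are not the draft's actual syntax or algorithm.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class RuleEngine {
    // A pattern matches an element name, optionally only under a given parent.
    record Pattern(String element, String parent) {
        boolean matches(String name, String parentName) {
            return element.equals(name) && (parent == null || parent.equals(parentName));
        }
        int specificity() { return parent == null ? 1 : 2; }
    }

    record ConstructionRule(Pattern pattern, int priority, String flowObject) {}
    record StyleRule(Pattern pattern, String characteristic, String value) {}
    record FlowObject(String kind, Map<String, String> characteristics) {}

    final List<ConstructionRule> constructionRules = new ArrayList<>();
    final List<StyleRule> styleRules = new ArrayList<>();

    // Assumed conflict resolution: higher priority wins, then higher specificity.
    Optional<ConstructionRule> select(String name, String parentName) {
        return constructionRules.stream()
            .filter(r -> r.pattern().matches(name, parentName))
            .max(Comparator.comparingInt(ConstructionRule::priority)
                .thenComparingInt(r -> r.pattern().specificity()));
    }

    // Exactly one construction rule creates the flow object; every matching
    // style rule then sets a characteristic on it, in declaration order.
    Optional<FlowObject> apply(String name, String parentName) {
        return select(name, parentName).map(rule -> {
            Map<String, String> chars = new LinkedHashMap<>();
            for (StyleRule s : styleRules) {
                if (s.pattern().matches(name, parentName)) chars.put(s.characteristic(), s.value());
            }
            return new FlowObject(rule.flowObject(), chars);
        });
    }

    // A small hypothetical rule base for chapter and section titles.
    static RuleEngine sample() {
        RuleEngine e = new RuleEngine();
        e.constructionRules.add(new ConstructionRule(new Pattern("title", null), 0, "paragraph"));
        e.constructionRules.add(new ConstructionRule(new Pattern("title", "chapter"), 0, "heading"));
        e.styleRules.add(new StyleRule(new Pattern("title", null), "font-weight", "bold"));
        e.styleRules.add(new StyleRule(new Pattern("title", "chapter"), "font-size", "18pt"));
        return e;
    }

    public static void main(String[] args) {
        RuleEngine engine = sample();
        System.out.println(engine.apply("title", "chapter").orElseThrow());
        System.out.println(engine.apply("title", "section").orElseThrow());
    }
}
```

For a title inside a chapter, both construction rules match and the more specific one creates a heading flow object; both style rules then contribute characteristics. A title inside a section falls back to the general paragraph rule and receives only the bold font weight.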
Although XSL is only in its first draft, I've already found two implementations available on the Net. First, xslj is, according to its developer, a "virtually complete implementation of XSL by way of translation into extended DSSSL." Xslj is a front end for processing XSL stylesheets and XML documents with existing SGML tools; DSSSL is a Scheme/Lisp-based stylesheet language used with SGML documents. Although this approach depends on SGML tooling, it has a useful advantage: you can use these existing tools to transform any XML document into other presentation formats, including SGML, HTML, RTF, and TeX. To download a copy of xslj, including its C source code, see
If you're not afraid of the bleeding edge, check out the "docproc" tool available at javalab.uoregon.edu/ser/software/docproc_2/docs/. This XSL processor is written entirely in Java and runs as a servlet, allowing any XML document to be filtered and formatted for presentation in existing HTML Web browsers. It is in the midst of development, so current features may vary, but it looks like a very interesting testbed!
An XML document, by definition, specifies only the logical structure of document elements and makes no statement about the document's formatting or presentation. An XSL stylesheet and processor would transform the document into its viewable form. Looking into the future, one can envision Web browsers that receive XML documents, parse them into DOM representations, and use a built-in XSL processor to apply the stylesheet specified by the document. Shared, standardized stylesheets could be available from centralized Web servers, and an XML document would simply refer to the URL of its preferred presentation style. Alternatively, a customized XML document structure could refer to its private stylesheet, or include the style rules directly in the document.
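XSL changed substantially after this draft: its transformation half became XSLT 1.0, a W3C Recommendation in November 1999, which the JDK implements in the javax.xml.transform package. The minimal sketch below uses that later language, not the draft syntax discussed above, to show the "stylesheet plus processor" step. The memo stylesheet and the class name StylesheetDemo are hypothetical; in the browser scenario, the stylesheet string would instead come from a shared URL or from rules embedded in the document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StylesheetDemo {
    // Hypothetical XSLT 1.0 stylesheet mapping memo markup to HTML.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>"
      + "<xsl:template match='memo'><html><body><xsl:apply-templates/></body></html></xsl:template>"
      + "<xsl:template match='to'><h2>To: <xsl:value-of select='.'/></h2></xsl:template>"
      + "<xsl:template match='body'><p><xsl:apply-templates/></p></xsl:template>"
      + "<xsl:template match='em'><i><xsl:apply-templates/></i></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML document and returns the serialized result.
    static String render(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String memo = "<memo><to>Ann</to><body>Meet at <em>noon</em>.</body></memo>";
        System.out.println(render(memo, STYLESHEET));
    }
}
```

The document itself carries no presentation information; swapping in a different stylesheet string changes the rendering without touching the memo. JAXP's TransformerFactory.getAssociatedStylesheet can also locate a stylesheet that a document names through an xml-stylesheet processing instruction, which is close to the "document refers to its preferred style" idea.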
1. Carlson, D. XML Documents Can Fit OO Apps, Object Magazine, 7(9):24-26, 1997.