BEA Systems, Inc. has announced the availability of the "first compliant preview" of the Streaming API for Java (StAX), documented in Java Specification Request (JSR) 173, Streaming API for XML. StAX is "a new Java API designed to improve developer productivity and performance by making it easier to incorporate Extensible Markup Language (XML) into Java. StAX is designed to overcome many of the disadvantages of former methods of accessing and manipulating XML documents from Java applications by providing the efficiency of streaming APIs and the control of tree-based APIs. This new method represents a next generation of APIs -- pull parsing. Unlike SAX and DOM, StAX is bidirectional, and can allow programs to both read existing XML documents and create new ones. Unlike other event-based approaches, developers using StAX for XML document parsing can pull events, rather than handling events in callback, which can enable them to stop processing, skip ahead, or get subsections and, ultimately, help to gain precise control, thereby saving time and reducing overall development costs." The BEA preview release of StAX is designed to provide developers with a suite of tests, tools and documentation designed to allow fully standardized implementation. This StAX preview is freely available online for developers.
Bibliographic Information: Streaming API for XML
"Streaming API for XML." Java Specification Request (JSR) #173. Specification Lead: Christopher Fry (BEA Systems). Produced under the Java Community Process. Expert Group Members: Arnaud Blandin, Intalio, Inc.), Andy Clark (Apache), James Clark, Christopher Fry (BEA Systems, Specification Lead), Stefan Haustein). Simon Horrell (Developmentor), K. Karun (Oracle), Glenn Marcy (IBM), Gregory M. Messner (Breeze Factor), Aleksander Slominski), David Stephenson (Hewlett-Packard), James Strachan), and Anil Vijendran (Sun Microsystems). JSR-000173 Streaming API for XML Specification 0.7. Proposed Final Draft. August 27, 2003. 61 pages. A reference implementation is included in the ZIP archive containing the draft specification. "This specification describes the Streaming API for XML (StAX), a bi-directional API for reading and writing XML. This document along with the associated API documentation is the formal specification for JSR-173... This document specifies a new API for parsing and streaming XML between applications in an efficient manner. Efficient XML processing is fundamental for several areas of computing, such as XML based RPC and Data Binding... The Streaming API for XML gives parsing control to the programmer by exposing a simple iterator based API and an underlying stream of events. Methods such as next() and hasNext() allow an application developer to ask for the next event (pull the event) rather than handling the event in a callback. This gives a developer more procedural control over the processing of the XML document. The Streaming API also allows the programmer to stop processing the document, skip ahead to sections of the document, and get subsections of the document. The Streaming API for XML consists of two styles: A low-level cursor API, designed for creating object models and a higher-level event iterator API, designed to be used in pipelines and be easily extensible..." 
Background to the StAX design: "Processing XML has become a standard function in most computing environments. Two main approaches exist: (1) the Simple API for XML processing [SAX] and (2) the Document Object Model (DOM). SAX is a low-level parsing API while DOM provides a random-access tree-like structure. One drawback to the SAX API is that the programmer must keep track of the current state of the document in the code each time they process an XML document and thus cannot iteratively process it. Another drawback to SAX is that the entire document needs to be parsed at one time. DOM provides APIs that allow random access and manipulation of an in-memory XML document. At first glance this seems like a win for the application developer. However, this perceived simplicity comes at a very high cost: performance. For very large documents one may be required to read the entire document into memory before taking appropriate actions based on the data..."
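The state-tracking burden attributed to SAX above can be made concrete with a small comparison. This is a sketch under stated assumptions: it uses the standard javax.xml.parsers and org.xml.sax classes for the SAX side and the javax.xml.stream package (as later shipped with the Java platform) for the pull side. The class name, the "title" element, and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch class contrasting callback and pull styles.
final class SaxVersusPullSketch {

    // SAX: the parser drives the callbacks, so the handler has to remember
    // where it is in the document using its own fields.
    static final class TitleHandler extends DefaultHandler {
        private boolean inTitle;
        private final StringBuilder title = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("title".equals(qName)) inTitle = true;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) inTitle = false;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Appends the text of every title element; one is assumed here.
            if (inTitle) title.append(ch, start, length);
        }

        String title() {
            return title.toString();
        }
    }

    // The SAX parse call runs through the whole document before returning.
    static String titleViaSax(String xml) throws Exception {
        TitleHandler handler = new TitleHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.title();
    }

    // Pull: the application drives the loop, so position is implicit in the
    // control flow and the method can return as soon as it has its answer.
    static String titleViaPull(String xml) throws XMLStreamException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    return reader.getElementText();
                }
            }
            return "";
        } finally {
            reader.close();
        }
    }
}
```

For a document with a single title element, both methods return the same text. The SAX version needs a flag and a buffer to reconstruct context across callbacks, while the pull version expresses the same logic as an ordinary loop.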
From the BEA Announcement
BEA Systems, Inc., the world's leading application infrastructure company, today announced the public availability of a preview release of the Streaming API for Java (StAX), a new Java API designed to improve developer productivity and performance by making it easier to incorporate Extensible Markup Language (XML) into Java. Through its leadership in the creation of StAX, as well as its breakthrough XMLBeans technology, BEA continues to drive innovation in Java community standards, helping to make developers more efficient and decrease the complexity of new technologies in enterprise and Web services applications.
"StAX is a substantial improvement in the speed and simplicity of Java development, and can offer significant value in the proliferation of Web services," said Adam Bosworth, chief architect and senior vice president of Advanced Development at BEA Systems, Inc. "This API builds upon previous standards and innovations like XMLBeans to combine the best of all approaches for easily and powerfully manipulating XML. Through decreased memory usage and increased precision and control in XML and Java application development, StAX can provide new momentum within the developer community."
With the dramatic growth of XML-based applications such as Web services, accessing and manipulating XML documents from Java applications has become increasingly critical to the enterprise. Until now, most XML APIs for Java have fallen into one of two broad classes: event-based, streaming APIs like the Simple API for XML processing (SAX) or tree-based APIs like the Document Object Model (DOM).
Both of these approaches have significant advantages and disadvantages. For instance, SAX is fast and highly efficient but strips the developer of total control over the development process, often creating excess work. Conversely, tree-based DOM provides greater developer control but can be highly memory intensive, making it inappropriate for larger documents or memory-constrained environments...
"StAX solves these problems by providing more control of XML parsing to the programmer, in particular by exposing a simple iterator-based API and an underlying stream of events. Methods such as next() and hasNext() allow an application developer to ask for the next event, or pull the event, rather than handle the event in a callback. StAX also enables the programmer to stop processing the document at any time, skip ahead to sections of the document, and get subsections of the document.
StAX helps you process XML faster and easier in these typical use cases:
- Data binding, a two-way process that reads and writes XML (unmarshaling and marshaling) to and from a programming language data structure
- SOAP message processing (SOAP is an XML message transport format used predominantly by Web services)
- Parsing a specific XML vocabulary
- Processing pipelined XML..."
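The data binding use case in the list above, reading XML into a data structure and writing it back out, can be sketched as follows. The example assumes the javax.xml.stream package as later shipped with the Java platform. The Item value object, its fields, and the item/quantity vocabulary are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch of hand-written binding code.
final class BindingSketch {

    // Hypothetical value object.
    static final class Item {
        final String sku;
        final int quantity;

        Item(String sku, int quantity) {
            this.sku = sku;
            this.quantity = quantity;
        }
    }

    // Unmarshal: pull events and copy the interesting values into an Item.
    static Item unmarshal(String xml) throws XMLStreamException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            String sku = null;
            int quantity = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("item".equals(name)) {
                        sku = reader.getAttributeValue(null, "sku");
                    } else if ("quantity".equals(name)) {
                        quantity = Integer.parseInt(reader.getElementText().trim());
                    }
                }
            }
            return new Item(sku, quantity);
        } finally {
            reader.close();
        }
    }

    // Marshal: write the same structure back out through XMLStreamWriter.
    static String marshal(Item item) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        try {
            writer.writeStartElement("item");
            writer.writeAttribute("sku", item.sku);
            writer.writeStartElement("quantity");
            writer.writeCharacters(Integer.toString(item.quantity));
            writer.writeEndElement(); // closes quantity
            writer.writeEndElement(); // closes item
            writer.flush();
        } finally {
            writer.close();
        }
        return out.toString();
    }
}
```

Calling unmarshal on the output of marshal should yield an Item with the same sku and quantity. A real binding framework would generate or generalize this code, but the symmetry between the read and write sides is what the "two-way process" in the list refers to.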
"BEA served as Specification Lead for the past two years, driving the public review of StAX through the Java Community Process (JCP) to final approval as Java Specification Request (JSR) 173. The BEA preview release of StAX is the Specification's first compliant preview, and can provide developers with a suite of tests, tools and documentation designed to allow fully standardized implementation..." [adapted from the announcement and BEA StAX web site]
Principal references:
- Announcement 2003-11-05: "BEA Systems Continues Leadership in Open Standards Innovation with Preview Release of Streaming API for Java (StAX). New API Designed to Increase Developer Performance and Productivity, Making it Easier to Incorporate XML into Java."
- Streaming API for XML (StAX). BEA dev2dev.
- Pull Parsing XML By Chris Fry. "This API grew out of the need to read and write XML in an efficient manner in the context of XML Binding and Web Services. At the time this API was created there was no standard way to read and write XML in a symmetrical way. This example shows you how to parse XML into a simple set of value objects..."
- XMLBeans: The easiest way to use XML in Java
- BEA dev2dev home page
- BEA Systems, Inc.
- Java Specification Request 173: Streaming API for XML. Reference document.
- Final Approval Ballot for JSR 173. November 03, 2003. Unanimous approval by Apache Software Foundation; Apple Computer, Inc; BEA Systems; Borland Software Corporation; Caldera Systems; Cisco Systems; Fujitsu Limited; Hewlett-Packard; IBM; IONA Technologies PLC; Lea, Doug; Macromedia, Inc; Nokia Networks; Oracle; SAP AG; Sun Microsystems, Inc.
- See also:
- Using the WebLogic XML Streaming API. From BEA. "The WebLogic XML Streaming API provides an easy and intuitive way to parse and generate XML documents. It is similar to the SAX API, but enables a procedural, stream-based handling of XML documents rather than requiring you to write SAX event handlers, which can get complicated when you work with complex XML documents. In other words, the streaming API gives you more control over parsing than the SAX API... You can parse many types of XML documents with the streaming API, such as XML files on the operating system, DOM trees, and sets of SAX events. You convert these XML documents into a stream of events, or an XMLInputStream, and then step through the stream, pulling events such as the start of an element, the end of the document, and so on, off the stack as needed..."
- Xerces XNI Pull Parser Configuration. "There are two parser configuration interfaces defined in Xerces Native Interface (XNI): the XMLParserConfiguration and the XMLPullParserConfiguration. For most purposes, the standard parser configuration will suffice. Document and DTD handler interfaces will be registered on the parser configuration and the document will be parsed completely by calling the parse(XMLInputSource) method. In this situation, the application is driven by the output of the configuration. However, the XMLPullParserConfiguration interface extends the XMLParserConfiguration interface to provide methods that allow the application to drive the configuration. Any configuration class that implements this interface guarantees that it can be driven in a pull parsing fashion but does not make any statement as to how much or how little pull parsing will be performed at each step..."
- DOM Pull Parser for Python, from Paul Prescod. "PullDOM is a really simple API for working with DOM objects in a streaming (efficient!) manner rather than as a monolithic tree..."
- XmlReader Class from the Microsoft .NET Framework Class Library. "XmlReader provides forward-only, read-only access to a stream of XML data. The current node refers to the node on which the reader is positioned. The reader is advanced using any of the read methods and properties reflect the value of the current node..."
- "Write XML Documents with StAX." By Berthold Daum. IBM developerWorks.
- "Screen XML Documents Efficiently with StAX." By Berthold Daum. IBM developerWorks.
- "An Introduction to StAX." By Elliotte Rusty Harold. From XML.com (September 17, 2003). "BEA Systems, working in conjunction with Sun, XMLPULL developers Stefan Haustein and Aleksandr Slominski, XML heavyweight James Clark, and others in the Java Community Process are on the verge of releasing StAX, the Streaming API for XML. StAX is a pull parsing API for XML which avoids most of the pitfalls noted in connection with XMLPULL; XMLPULL was a nice proof of concept, but StAX is suitable for real work. Like SAX, StAX is a parser independent, pure Java API based on interfaces that can be implemented by multiple parsers. Currently there is only one implementation, the reference implementation bundled with the draft specification."