[This local archive copy mirrored from the canonical site: http://www.balise.com/current/balxml.htm; links may not have complete integrity, so use the canonical document at this URL if possible.]

Balise and XML


XML is a standard that "simplifies" SGML and broadens its scope, especially for World Wide Web applications. XML is developed under the auspices of the World Wide Web Consortium (W3C) and is already supported by most major software vendors, including Microsoft. One of the main conceptual innovations in XML is the notion of a "well-formed document" (WFD), as distinct from "SGML validity". In short, it is now possible to perform useful processing tasks on an XML instance without knowledge of its DTD.
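As a minimal sketch of what processing without a DTD can look like, the following uses Python's standard xml.etree.ElementTree module rather than Balise itself. The helper names is_well_formed and element_names and the sample memo documents are made up for illustration. The parser checks well-formedness only and never consults a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML; no DTD is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def element_names(text: str) -> list:
    """List element names in document order, again without any DTD."""
    return [element.tag for element in ET.fromstring(text).iter()]


# Hypothetical sample instances: the second has mismatched tags.
good = "<memo><to>Ann</to><body>Hello</body></memo>"
bad = "<memo><to>Ann</body></memo>"
```

Here is_well_formed(bad) fails because the end tag does not match the open element, while element_names(good) can walk the structure of the first instance even though nothing declares what a memo is.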

Several related drafts and proposals have also been introduced. Together they define a comprehensive framework for creating, managing and distributing structured information. The main proposals are:

Balise support for XML was announced very early (see AIS Software announces XML support in Balise). A first version of a non-validating XML parser has been available as a plug-in for Balise Release 3.1 since March 1997. An updated version reflecting the latest changes in the XML specification now ships as an integral part of the new Release 4.0.

Balise Release 4.0 has a set of unique features oriented towards XML support:

Input Parsers

Several parsers can be used directly to read input data into Balise. These include:

Balise Release 4.0 can therefore read XML documents directly, with or without a DTD. It can be used both as a validating XML parser and as a non-validating XML processor.

Unicode Support

Balise has provided full support for Unicode since Release 3.1 (see AIS Software announces double-byte version of Balise). Internally, Balise uses a Unicode-based, double-byte representation of characters, which provides native support for Unicode/ISO-10646 as well as for most national character sets currently used in East Asian countries. All string operations, including string search, string sort and pattern matching, are performed on double-byte character strings.

Being able to process arbitrary UTF-8 and UTF-16 (UCS-2) encoded files is a mandatory requirement of the XML standard, and one that Balise completely fulfills.
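The encoding requirement can be illustrated with a short sketch in Python's standard library (not Balise). The same document is serialized once as UTF-8 and once as UTF-16, and the parser is expected to recover identical character data from both byte streams. The names TEXT, encode_document and read_text are hypothetical helpers introduced for this example.

```python
import xml.etree.ElementTree as ET

# Sample content mixing Latin and East Asian characters.
TEXT = "caf\u00e9 \u65e5\u672c\u8a9e"


def encode_document(encoding: str) -> bytes:
    """Serialize a one-element document in the given encoding."""
    doc = '<?xml version="1.0" encoding="%s"?><t>%s</t>' % (encoding, TEXT)
    return doc.encode(encoding)


def read_text(data: bytes) -> str:
    """Parse raw bytes and return the character data of the root element."""
    return ET.fromstring(data).text


utf8_bytes = encode_document("utf-8")
utf16_bytes = encode_document("utf-16")  # Python's codec prepends a byte order mark
```

The two byte streams differ, but read_text should yield the same string for each, which is the behaviour the XML specification asks of every conforming processor.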

DOM-compatible tree manipulation

The tree manipulation approach gives the programmer a tree abstraction, rather than an event abstraction, for manipulating XML instances. Manipulation is done through an API that supports navigating trees, modifying them, creating new ones, and so on. This paradigm suits transformations that require a global view of the manipulated documents. The forthcoming W3C DOM standard specifically addresses this tree manipulation paradigm.
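The three operations named above (navigating, modifying, creating) can be sketched with a DOM-style API. The example uses Python's xml.dom.minidom, which follows the W3C DOM interfaces. It is not the Balise API, and the sample book document is invented for illustration.

```python
from xml.dom import minidom

# Hypothetical sample instance.
doc = minidom.parseString(
    "<book><title>Draft</title><chapter>One</chapter></book>"
)
root = doc.documentElement

# Navigate: find the first <title> element and read its text node.
title = root.getElementsByTagName("title")[0]
old_title = title.firstChild.data

# Modify: replace the text of the title in place.
title.firstChild.data = "Final"

# Create: build a new <chapter> subtree and attach it to the root.
chapter = doc.createElement("chapter")
chapter.appendChild(doc.createTextNode("Two"))
root.appendChild(chapter)

# Serialize the modified tree back to markup.
result = root.toxml()
```

Every step works on the whole document held in memory, which is what makes the global view possible.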

Balise is today the only XML processor that integrates both an event abstraction and a tree abstraction for manipulating structured data. This integration provides maximum flexibility for programming XML transformations.
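To make the contrast between the two abstractions concrete, here is a hedged sketch using Python's standard xml.sax (events) and xml.dom.minidom (trees). It does not show how Balise integrates the two. ItemCounter, count_items, sorted_items and the sample list are hypothetical names chosen for the example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample instance.
SOURCE = "<list><item>b</item><item>a</item><item>c</item></list>"


class ItemCounter(xml.sax.ContentHandler):
    """Event abstraction: react to each start tag as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


def count_items(text: str) -> int:
    """Count <item> elements in a single streaming pass."""
    handler = ItemCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.count


def sorted_items(text: str) -> str:
    """Tree abstraction: sorting needs all items at once, a global view."""
    doc = minidom.parseString(text)
    root = doc.documentElement
    items = sorted(
        root.getElementsByTagName("item"),
        key=lambda element: element.firstChild.data,
    )
    for item in items:
        root.appendChild(item)  # appendChild moves an existing node to the end
    return root.toxml()
```

Counting fits the event style because each start tag can be handled and forgotten, whereas reordering cannot begin until every item has been seen.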