[This local archive copy mirrored from the canonical site: http://www.balise.com/current/balise4desc.htm; links may not have complete integrity, so use the canonical document at this URL if possible.]
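AIS Software announces the new Balise 4.0 Release and unveils its delivery schedule. This major release brings a significant set of new features:
- an RTF document scanner,
- an XML parser,
- a DTD API,
- non-ESIS SGML events and access to SGML LINK information,
- disk-based maps,
- disk-based Balise Documents.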
Below are more details about each of these features. As you will see, this represents a *major* functionality upgrade for Balise! Feel free to send us your comments or ask for more details about any of these features.
To be able to process RTF documents and convert them efficiently and easily to other formats, typically some flavour of SGML.
A new scanner in Balise that can be activated through the currently available Open Parser Interface. The RTF document scanner generates "ESIS" events corresponding to an implicit "RTF DTD", which models every logical piece of information one can find in an RTF document.
You will then be able to write Balise programs to handle the restructuring phase of the transformation. The big benefit is that low-level decoding and analysis are performed for you automatically.
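As a rough illustration of this division of labour, the sketch below is written in Python rather than in the Balise language. It assumes a hypothetical stream of ESIS-style events, as a scanner might emit them, and shows only the restructuring step. The event tuples, the `para` element, the `style` attribute and the style-to-element mapping are all invented for the example; the actual "RTF DTD" is not described here.

```python
# Hypothetical ESIS-style events: (kind, element name, attributes or text).
# These names are illustrative assumptions, not the real "RTF DTD".
EVENTS = [
    ("start", "para", {"style": "Heading 1"}),
    ("data", None, "Introduction"),
    ("end", "para", None),
    ("start", "para", {"style": "Normal"}),
    ("data", None, "Balise reads RTF."),
    ("end", "para", None),
]

# Assumed mapping from word-processor paragraph styles to target SGML elements.
STYLE_TO_ELEMENT = {"Heading 1": "TITLE", "Normal": "P"}


def restructure(events):
    """Turn style-based paragraph events into structural SGML markup."""
    out = []
    open_elements = []  # stack of target element names still to be closed
    for kind, _name, payload in events:
        if kind == "start":
            target = STYLE_TO_ELEMENT.get(payload.get("style"), "P")
            open_elements.append(target)
            out.append(f"<{target}>")
        elif kind == "data":
            out.append(payload)
        elif kind == "end":
            out.append(f"</{open_elements.pop()}>")
    return "".join(out)


result = restructure(EVENTS)
```

The point of the sketch is that the restructuring code never touches RTF control words; it only sees logical events.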
The following MS-Word features are supported:
- style sheets, all MS-Word character & paragraph properties,
- headers/footers,
- markers,
- graphics, which are extracted and saved into files (no automatic graphic conversion)
- equations, which are extracted and saved into files (but no automatic conversion to SGML or TeX)
- special characters which are mapped to SGML entities and/or Unicode character codes.
The first version will only support "Western RTF". RTF-J (Japanese) will be considered in a later phase (which might in fact be superseded by support of the new "Universal" RTF 1.5 which comes with Word 97).
To be able to handle XML documents directly in Balise and manipulate them exactly as we can today manipulate SGML documents.
Beyond XML concerns, this module will bring much flexibility in handling tagged files without explicit DTDs at hand.
A new parser in Balise that can be activated through the currently available Open Parser Interface.
A prototype of this parser is already available as a free, unsupported add-on to Balise 3.1. The official release, which will be an integral component of Balise 4.0, implements the XML-LANG specification as it has stabilized over the last quarter.
- Complete support of the XML specification, including Unicode support
- Support of DTD-less Well Formed XML Documents
- Custom extensions allowing parsing of DTD-less SGML documents ("XML documents with traditional SGML syntax")
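To illustrate what processing a DTD-less, well-formed document means in practice, here is a small sketch using Python's standard `xml.etree.ElementTree` module. It is not the Balise XML parser; it only shows that a well-formed document can be parsed and walked as a tree with no DTD at hand. The sample document and the helper names are made up for the example.

```python
import xml.etree.ElementTree as ET

# A well-formed document with no DOCTYPE declaration and no DTD.
DOC = "<memo><to>Editor</to><body>Release <b>4.0</b> notes</body></memo>"


def element_names(xml_text):
    """Return the element names of the document in document order."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter()]


def is_well_formed(xml_text):
    """Report whether the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


names = element_names(DOC)
```

Well-formedness is the only check applied: `is_well_formed("<a><b></a>")` is false because the tags do not nest properly, while any properly nested document is accepted whatever its element names.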
To be able to access and manipulate DTD information while processing an SGML instance. In particular, the DTD API gives access to the details of entity and/or element declarations beyond what is currently implemented in Balise 3.1.
A set of new functions and a predefined data type which models DTD information.
When DTD information is loaded, it becomes a Balise data structure that can be browsed and even updated using the existing Balise mechanisms.
This mechanism will subsume the Balise 3.1 entity access mechanism, which remains available for compatibility reasons.
- possibility to access the DTD of the currently parsed document,
- possibility to explicitly load any number of other DTDs from entities (files),
- possibility to update & save DTD info into a new DTD file,
- possibility to query a DTD as a model to validate the structure of a Balise document (tree) or subdocument,
- access to full DTD information at both logical and syntactic levels,
- identification of information originating specifically from the declaration subset.
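The idea of a DTD loaded as a browsable, updatable data structure that can also serve as a validation model can be sketched as follows. This is Python, not the Balise DTD API, and it is deliberately simplified: each element is assumed to map to the set of child element names it allows, ignoring ordering, occurrence indicators, attributes and entities. The `Node` type, the `DTD` dictionary and `validate` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A minimal document tree node (assumed shape, for illustration)."""
    name: str
    children: list = field(default_factory=list)


# Simplified stand-in for loaded DTD information:
# element name -> set of element names allowed as children.
DTD = {
    "doc": {"title", "para"},
    "title": set(),
    "para": {"emph"},
    "emph": set(),
}


def validate(node, dtd):
    """Return a list of structural errors for the subtree rooted at node."""
    if node.name not in dtd:
        return [f"undeclared element: {node.name}"]
    errors = []
    for child in node.children:
        if child.name not in dtd[node.name]:
            errors.append(f"{child.name} not allowed in {node.name}")
        errors.extend(validate(child, dtd))
    return errors


good = Node("doc", [Node("title"), Node("para", [Node("emph")])])
bad = Node("doc", [Node("title", [Node("para")])])

good_errors = validate(good, DTD)
bad_errors = validate(bad, DTD)

# Because the model is ordinary data, it can be updated like any other structure.
DTD["title"].add("emph")
```

Here `validate` works on any subtree, which mirrors the idea of checking a subdocument against the model rather than only a complete document.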
To be able to handle specific SGML-to-SGML transformations that require fine-grain knowledge of the SGML markup, typically beyond the ESIS information level supported by Balise 3.1.
As a related extension, Balise also now provides access to the SGML simple and implicit LINK information.
A set of new SGML event types is introduced, corresponding to this new information: external entity boundaries, marked section boundaries, SGML comments, etc.
Corresponding content rules are also added to the Balise language. For performance reasons, these events are raised only when explicitly requested.
The following non-ESIS events are introduced:
- "#COMMENT" for SGML comments,
- "#CDATA_MS_START" for the start of CDATA marked sections,
- "#IGNORE_MS_START" for the start of IGNORE marked sections,
- "#INCLUDE_MS_START" for the start of INCLUDE marked sections,
- "#EXT_ENTITY_START" for the starting point of external entities,
- "#INT_ENTITY_START" for the starting point of internal entities,
- "#CDATA_MS_END" for the end of CDATA marked sections,
- "#IGNORE_MS_END" for the end of IGNORE marked sections,
- "#INCLUDE_MS_END" for the end of INCLUDE marked sections,
- "#EXT_ENTITY_END" for the end of external entities,
- "#INT_ENTITY_END" for the end of internal entities.
SGML LINK information is transferred through Balise applicative attributes.
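To show why the non-ESIS events listed above matter for SGML-to-SGML work, the following Python sketch rebuilds markup from a hypothetical event stream. The event names are the ones listed above; everything else is an assumption made for the example, including the `(kind, payload)` tuple shape, the `#DATA` event for character data, and the idea that entity events carry the entity name. It is not Balise code.

```python
# Marked-section delimiters keyed by the event names listed above.
MS_MARKUP = {
    "#CDATA_MS_START": "<![CDATA[",
    "#IGNORE_MS_START": "<![IGNORE[",
    "#INCLUDE_MS_START": "<![INCLUDE[",
    "#CDATA_MS_END": "]]>",
    "#IGNORE_MS_END": "]]>",
    "#INCLUDE_MS_END": "]]>",
}
ENTITY_START = ("#EXT_ENTITY_START", "#INT_ENTITY_START")
ENTITY_END = ("#EXT_ENTITY_END", "#INT_ENTITY_END")


def serialize(events, keep_markup=True):
    """Rebuild text from events; keep_markup mimics the optional events."""
    out = []
    entity_depth = 0  # > 0 while inside an entity kept as a reference
    for kind, payload in events:
        if kind in ENTITY_START:
            if keep_markup:
                if entity_depth == 0:
                    out.append(f"&{payload};")
                entity_depth += 1
        elif kind in ENTITY_END:
            if keep_markup:
                entity_depth -= 1
        elif entity_depth > 0:
            continue  # replacement text is represented by the reference
        elif kind == "#COMMENT":
            if keep_markup:
                out.append(f"<!--{payload}-->")
        elif kind in MS_MARKUP:
            if keep_markup:
                out.append(MS_MARKUP[kind])
        elif kind == "#DATA":  # hypothetical character-data event
            out.append(payload)
    return "".join(out)


EVENTS = [
    ("#DATA", "See "),
    ("#INT_ENTITY_START", "prod"),
    ("#DATA", "Balise"),
    ("#INT_ENTITY_END", "prod"),
    ("#COMMENT", " check "),
    ("#CDATA_MS_START", None),
    ("#DATA", "a < b"),
    ("#CDATA_MS_END", None),
]

with_markup = serialize(EVENTS)
esis_only = serialize(EVENTS, keep_markup=False)
```

With the extra events, the entity reference, the comment and the marked section survive the transformation; without them, only the expanded character data remains, which is roughly the ESIS-level view.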
To be able to easily manipulate indexed structures that are too large to fit in memory and/or should be shared and reused through multiple Balise sessions.
A set of Balise functions that provide an interface very similar to that of Balise maps, but manipulate disk-based structures.
Such disk-based maps are indexed (access is very efficient) and are scalable from very small to very large numbers (several millions) of items.
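A comparable idea exists in Python's standard library, where `shelve` offers a dictionary-like interface over a disk file. The sketch below uses it purely as an analogy for a map that persists across sessions; it says nothing about how Balise disk-based maps are implemented or named. The file name and the helper functions are invented for the example.

```python
import os
import shelve
import tempfile


def count_terms(path, terms):
    """Update a persistent term -> count map stored at path."""
    with shelve.open(path) as index:
        for term in terms:
            index[term] = index.get(term, 0) + 1


def lookup(path, term):
    """Read one entry back, reopening the map from disk."""
    with shelve.open(path) as index:
        return index.get(term, 0)


# Each call opens and closes the file, standing in for separate sessions.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "terms")
count_terms(path, ["sgml", "xml", "sgml"])
count_terms(path, ["sgml"])
```

The calling code reads like ordinary map manipulation; only the opening step reveals that the data lives on disk and outlives the process.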
Since Release 3.0, Balise has provided an API for manipulating SGML instances (or parts of instances) as "Balise documents" represented as SGML element trees.
This has proved to be a very efficient and powerful feature for some kinds of SGML transformations involving structural manipulations. The main limitation of this tree-based manipulation approach was that SGML trees were handled in memory.
This new feature removes precisely this limit by providing a disk-based implementation of Balise Documents whose size is no longer bounded by available memory.
Disk-based Documents will be manipulated through exactly the same API as the memory-based documents available in Release 3.1. This is made possible by the Abstract Document Interface implemented in Balise.
Only a few new functions are provided, such as creating a disk-based document from an SGML instance, or opening a disk-based document for manipulation and update.
- possibility to directly create a disk-based Balise Document from an SGML instance.
- transparent access through the existing SGML tree manipulation primitives.
- support of existing SGML tree update primitives.
- scalability from very small to very large documents (millions of SGML elements).
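The principle of one tree API over two storage back-ends can be sketched in Python as below. This is only an illustration of the abstract-interface idea, under assumptions: the `Document` class and its three methods are hypothetical, and SQLite is used as a stand-in disk store. It does not describe the Balise Abstract Document Interface itself.

```python
import os
import sqlite3
import tempfile
from abc import ABC, abstractmethod


class Document(ABC):
    """Hypothetical abstract tree interface shared by both back-ends."""

    @abstractmethod
    def add(self, parent, name): ...

    @abstractmethod
    def children(self, node): ...

    @abstractmethod
    def name(self, node): ...


class MemoryDocument(Document):
    def __init__(self):
        self._names, self._kids = [], []

    def add(self, parent, name):
        node = len(self._names)
        self._names.append(name)
        self._kids.append([])
        if parent is not None:
            self._kids[parent].append(node)
        return node

    def children(self, node):
        return list(self._kids[node])

    def name(self, node):
        return self._names[node]


class DiskDocument(Document):
    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS node "
            "(id INTEGER PRIMARY KEY, parent INTEGER, name TEXT)"
        )

    def add(self, parent, name):
        cur = self._db.execute(
            "INSERT INTO node (parent, name) VALUES (?, ?)", (parent, name)
        )
        self._db.commit()
        return cur.lastrowid

    def children(self, node):
        rows = self._db.execute(
            "SELECT id FROM node WHERE parent = ? ORDER BY id", (node,)
        )
        return [row[0] for row in rows]

    def name(self, node):
        row = self._db.execute("SELECT name FROM node WHERE id = ?", (node,))
        return row.fetchone()[0]


def build(doc):
    """Create a small tree through the shared interface; return its root."""
    root = doc.add(None, "book")
    chapter = doc.add(root, "chapter")
    doc.add(chapter, "para")
    doc.add(root, "index")
    return root


def outline(doc, node, depth=0):
    """Indented element outline; works on any Document implementation."""
    lines = ["  " * depth + doc.name(node)]
    for child in doc.children(node):
        lines.extend(outline(doc, child, depth + 1))
    return lines


memory_doc = MemoryDocument()
disk_doc = DiskDocument(os.path.join(tempfile.mkdtemp(), "doc.db"))
memory_outline = outline(memory_doc, build(memory_doc))
disk_outline = outline(disk_doc, build(disk_doc))
```

`build` and `outline` are written once against the interface and run unchanged on both back-ends, which is the property claimed above for existing tree manipulation primitives.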
AIS Software
15 - 17 Rue Rémy Dumoncel
F-75014 Paris
France
Tel: (+33) 1 40 64 43 00
Fax: (+33) 1 40 64 43 10
Email: info@balise.com