The Balise language offers all the power of a fully-fledged programming language and all the flexibility the user can expect from an interpreted language.
Balise is able to handle any SGML transformation problem. Major application areas are: content level instance validation, instance enrichment, database loading, instance formatting, and SGML to SGML transformation.
The DynaText/Balise toolkit will provide (due July 1994) a direct connection between the Balise SGML processing language and DynaText databases. DT/BL Toolkit will give structured access to the SIT navigation and query capabilities through Balise functions, as well as full access to DynaText stylesheets.
Balise operates on the SGML structure of a document, rather than on the occurrence of start- and end-tags. This means that Balise recognises the start or end of an element irrespective of whether the start- or end-tag is explicit or whether it is implied because of the usage of minimisation features.
Balise can be used in three different modes:
The Balise language contains programming components such as:
In SGML event-driven mode, Balise recognises the following events:
Balise keeps track of ancestors and of the attribute values of those ancestor nodes, enabling context-sensitive decisions to be made within the program. Balise also supports `applicative attributes': user-defined variables that are attached to the current element and remain accessible until the end-tag of that element is encountered.
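The context tracking described above can be illustrated with a minimal Python sketch. It is not Balise code and does not reproduce any Balise function; the names `ContextTracker`, `Element`, `local_vars` and `ancestor_attr` are hypothetical and chosen only for illustration. The sketch assumes a stack of open elements, each carrying its SGML attributes plus a dictionary of per-element variables that disappears when the element is closed.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One open element: its name, its SGML attributes, and per-element variables."""
    name: str
    attrs: dict
    local_vars: dict = field(default_factory=dict)  # stand-in for applicative attributes


class ContextTracker:
    """Hypothetical helper that keeps a stack of the currently open elements."""

    def __init__(self):
        self.stack = []

    def start(self, name, attrs=None):
        # Called when an element starts, whether its start-tag was explicit or implied.
        self.stack.append(Element(name, dict(attrs or {})))

    def end(self, name):
        # Called when an element ends; its per-element variables go away with it.
        top = self.stack.pop()
        if top.name != name:
            raise ValueError(f"expected end of {top.name!r}, got {name!r}")
        return top

    def current(self):
        return self.stack[-1]

    def has_ancestor(self, name):
        # True if an element with this name encloses the current element.
        return any(el.name == name for el in self.stack[:-1])

    def ancestor_attr(self, element_name, attr_name, default=None):
        # Attribute value of the nearest enclosing element with the given name.
        for el in reversed(self.stack[:-1]):
            if el.name == element_name:
                return el.attrs.get(attr_name, default)
        return default
```

A handler for a paragraph could, for example, call `tracker.ancestor_attr("chapter", "id")` to decide how to label its output, and store a counter in `tracker.current().local_vars` that is discarded when the paragraph ends.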
Balise is command-line driven and allows the user to pass arguments that set variable values within the program (similar to the use of the C `argv' variable). The documentation is packaged in a single manual with a very usable Quick Reference section.
The information available to Balise is ESIS with two extensions: information is retained from the DTD about whether an element was declared EMPTY, and SDATA entity names are recorded. In most cases this set of information is sufficient for complex processing of SGML documents, but in some cases access to other, non-ESIS information may be needed. For example, in Balise it is not possible to test the declared type of an attribute (e.g. whether it is declared as a NAME, ID, NUTOKEN, etc.), nor is it possible to access information contained in comments.
Balise is easy to use, especially for experienced C or C++ programmers, and provides a great deal of help in the processing of SGML files through a comprehensive set of predefined functions. Any functionality not provided by those functions can be incorporated by the use of user-defined functions, the shell spawning program and a special development toolkit which allows extension through user-provided C or C++ libraries. The Development licence includes a compiler, enabling the creation of distributable runtime applications.
As with all SGML transformation tools, the level of SGML conformance of these products is determined by the parser which they incorporate. In the case of Balise this is SGMLS. SGMLS supports only the Reference Delimiter Set of short reference delimiters and does not support LINK (in fact, none of the commercial transformation tools support LINK).
OmniMark offers an explicit pattern matching mechanism that supports event-driven processing, based on lexical events. It provides a tight coupling with an embedded SGML parser so that pattern matching can be made dependent on the SGML context. It has its own rule-based programming language that is `English-like'. Although the language is obviously aimed at the non-programmer, people who use it will still need to know the basic concepts of writing a program (loops, macros, etc.).
OmniMark has three data types (which can be declared as global or local to an element):
All OmniMark objects are associative arrays called shelves, which may be variable or fixed in size.
The control structures that the OmniMark language contains are:
do when ... {else when ...} {else ...} done
do ... done
repeat ... again
repeat over (shelf) ... again
There is no for loop or case structure, although these can be created from the other structures. In addition there are control structures for text scanning (do scan/repeat scan) and text skipping (do skip).
OmniMark provides a general-purpose macro capability that allows a user-defined name to abbreviate a more complicated expression. Macros can also be assigned to delimiter characters so that a special character substitutes for a longer expression. Macros can be parameterized so that a repeating but variable pattern can also be shortened.
An OmniMark program consists of a set of rules and a set of actions associated with each rule. In up-translations and cross-translations the basic rule is the FIND rule, which describes a pattern of interest in a document and the actions to take whenever it is encountered. In down-translations the basic rule is the ELEMENT rule, which describes the actions to be performed whenever an element of a particular type is encountered in an SGML document. Both types of rule may be qualified by the SGML context, e.g. position in the tree hierarchy.
OmniMark can work on any SGML document instance and its associated DTD. It can access comments and processing instructions contained within the SGML document instance and act upon those instructions. OmniMark keeps track of ancestors and of the attribute values of those ancestor nodes, enabling context-sensitive decisions to be made within the program. It also keeps track of entity values (both user-defined and ISO Standard) and can act upon the values of those entities.
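The two rule styles can be approximated in Python as follows. This is only an analogue of the idea, not OmniMark syntax: `apply_find_rules`, `FIND_RULES`, `ELEMENT_RULES` and `apply_element_rule` are hypothetical names, and the sample patterns (asterisks for emphasis, blank lines for paragraph breaks) are assumptions made for the illustration. Pattern rules are tried in order at each input position; element rules are looked up by element name and receive the list of ancestors so that they can be qualified by context.

```python
import re


def apply_find_rules(text, rules):
    """Scan text left to right; at each position fire the first rule whose pattern matches."""
    out = []
    pos = 0
    while pos < len(text):
        for pattern, action in rules:
            m = pattern.match(text, pos)
            if m and m.end() > pos:        # ignore empty matches so the scan always advances
                out.append(action(m))
                pos = m.end()
                break
        else:
            out.append(text[pos])          # no rule fired: copy the character through
            pos += 1
    return "".join(out)


# Pattern rules for an up-translation (plain text to markup). Sample patterns only.
FIND_RULES = [
    (re.compile(r"\*(\w+)\*"), lambda m: f"<emph>{m.group(1)}</emph>"),
    (re.compile(r"\n\n+"), lambda m: "</p>\n<p>"),
]


def title_rule(content, ancestors):
    # Context-qualified: chapter titles are upper-cased, other titles are left alone.
    return (content.upper() if "chapter" in ancestors else content) + "\n"


# Element rules for a down-translation (markup to plain text), keyed by element name.
ELEMENT_RULES = {
    "title": title_rule,
    "para": lambda content, ancestors: content + "\n",
}


def apply_element_rule(name, content, ancestors=()):
    """Run the rule registered for this element, or pass the content through unchanged."""
    rule = ELEMENT_RULES.get(name)
    return rule(content, ancestors) if rule else content
```

For instance, `apply_find_rules("a *big* deal", FIND_RULES)` wraps the starred word in an `emph` element, while `apply_element_rule("title", "Intro", ("doc", "chapter"))` applies the context-qualified title rule.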
The documentation that comes with OmniMark is both comprehensive and useful, but in some places it could do with more examples and a more complete explanation of those examples.
Overall, OmniMark V2.4 is a very powerful package that handles all types of translation and is straightforward to use. The main features of the product are:
Whether this last feature is a benefit or a drawback is a moot point. The idea is presumably that the language should be as easy as possible for a non-programmer to learn. However, in order to write OmniMark programs the user must still be able to think as a programmer, in planning program flow, making use of the inbuilt functions and data types, etc. For experienced programmers (especially those used to C or C++) the terminology and syntax of the language can be confusing to learn and a hindrance to program development rather than an advantage. The lack of user-definable functions and the restricted range of data types can also be frustrating at times, though there are usually ways to work around such restrictions using the existing data types and macros.
SGML Hammer consists of a parser coupled to a LOUISE application language processor (the same language used by FastTag). SGML Hammer applications fall into three general categories:
The parser is based on SoftQuad's parser, and it supports OMITTAG and SHORTTAG but not SHORTREF or LINK. LOUISE is based upon procedures, or `action blocks', rather than functions, with all variables being global. User-defined procedures can be created, but these are simply named code segments without arguments. The only data types are strings (which may be contextually understood as numbers) and dynamic indexed arrays of strings. The control structures available are: if ... else; while; for; foreach.
Information exchange between the parser and LOUISE is at an extended ESIS level. In addition to the ESIS information, LOUISE is told whether an element was declared EMPTY and is given the names of SDATA and CDATA entities. SGML Hammer maintains element type information for all ancestor nodes, but attribute information only for the current element. Information on sibling elements is not maintained, though one useful feature of SGML Hammer is its ability to look ahead to test the name of the next element.
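The look-ahead facility mentioned above can be pictured with a small Python sketch. It is not LOUISE code; the class `PeekableEvents`, the method `next_element_name` and the event tuples such as `("start", "title")` are hypothetical and serve only to show one way of inspecting the next element name without consuming any input.

```python
from collections import deque


class PeekableEvents:
    """Iterator over parser events that can report the next element name in advance."""

    def __init__(self, events):
        self._it = iter(events)
        self._buffer = deque()   # events read ahead but not yet handed to the caller

    def __iter__(self):
        return self

    def __next__(self):
        if self._buffer:
            return self._buffer.popleft()
        return next(self._it)

    def next_element_name(self):
        """Return the name of the next element to start, or None if there is none."""
        for ev in self._buffer:              # first check what was already read ahead
            if ev[0] == "start":
                return ev[1]
        for ev in self._it:                  # otherwise read ahead, keeping every event
            self._buffer.append(ev)
            if ev[0] == "start":
                return ev[1]
        return None
```

A handler for a title could call `events.next_element_name()` to decide, say, whether the title is followed directly by a paragraph or by a subsection, and then continue processing the buffered events in their original order.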
SGML Hammer is only able to have one input file and one output file open at any one time, which can be a big disadvantage in certain situations. For example, for subsequent loading into a database it may be desired to despatch the content of different SGML elements into different files. Whilst this is a simple matter with Balise or OmniMark, it would require multiple passes of SGML Hammer. It can also be frustrating not being able to read from external control files or to initiate external processes on the fly during processing. This means, for example, that it would not be possible to use SGML Hammer to create a FOSI-based document processor or any other document processor which requires the reading of a specification from an external file at the same time as the document is being processed.
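The database-loading case described above, in which the content of different elements is sent to different files in a single pass, might look like this in Python. The sketch is illustrative only and is not Balise or OmniMark code: the function `split_by_element`, the `.txt` file naming and the event tuples are all assumptions. The point is simply that several output files are open at once.

```python
import os
from contextlib import ExitStack


def split_by_element(events, out_dir, targets):
    """Write the character data of each target element to <out_dir>/<name>.txt in one pass."""
    with ExitStack() as stack:
        # Open one output file per target element; all stay open for the whole pass.
        files = {
            name: stack.enter_context(
                open(os.path.join(out_dir, name + ".txt"), "w", encoding="utf-8")
            )
            for name in targets
        }
        open_elements = []
        for kind, value in events:
            if kind == "start":
                open_elements.append(value)
            elif kind == "end":
                closed = open_elements.pop()
                if closed in files:
                    files[closed].write("\n")    # one line per element occurrence
            elif kind == "data":
                # Send the text to the nearest enclosing target element, if any.
                for name in reversed(open_elements):
                    if name in files:
                        files[name].write(value)
                        break
```

A tool restricted to a single output file would need one run of this loop per target element, which is the multiple-pass cost noted above.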
Resolution of cross-references can also be difficult with SGML Hammer: the extended string data type has a maximum length of 2K, so it cannot be used for buffering large amounts of data.
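To see why buffering matters here, consider a minimal Python sketch of cross-reference resolution. It is an illustration under stated assumptions, not code for any of the reviewed products: the function `resolve_xrefs`, the element names `section`, `title` and `xref`, and the attributes `id` and `ref` are hypothetical. A reference may point forward to a section that has not been seen yet, so the sketch holds the whole event stream in memory, collects the titles first, and only then produces output. It also assumes that each section's title precedes any nested section.

```python
def resolve_xrefs(events):
    """Replace each xref with the title of the section it points to (forward refs included)."""
    events = list(events)            # buffer the whole document; forward references need it

    # Pass 1: map each section id to the text of its title.
    titles = {}
    current_id = None
    in_title = False
    for ev in events:
        if ev[0] == "start":
            _, name, attrs = ev
            if name == "section":
                current_id = attrs.get("id")
            elif name == "title":
                in_title = True
        elif ev[0] == "data" and in_title and current_id:
            titles[current_id] = titles.get(current_id, "") + ev[1]
        elif ev[0] == "end" and ev[1] == "title":
            in_title = False

    # Pass 2: emit the text, substituting the collected title at each xref.
    out = []
    for ev in events:
        if ev[0] == "data":
            out.append(ev[1])
        elif ev[0] == "start" and ev[1] == "xref":
            target = ev[2].get("ref")
            out.append(titles.get(target, f"[unresolved: {target}]"))
    return "".join(out)
```

With a string type capped at 2K, neither the buffered document nor a large table of titles could be held in one variable, which is the difficulty described above.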
Optional output procedure libraries are also available as costed extras. These contain procedures for creating output files in (currently) FrameMaker MIF, Interleaf ASCII, Microsoft RTF and WordPerfect 5.1 formats.
Generally, SGML Hammer is adequate for performing fairly simple SGML transformations. For this type of task it is possible to quickly create effective programs with a minimal learning curve. However, whilst it is possible to use SGML Hammer for more complex tasks, the limited functionality of the LOUISE language can make such procedures laborious and difficult to plan, requiring an experienced programmer to achieve the most from the product. Anything which can be written in SGML Hammer can generally be written more concisely and efficiently in either Balise or OmniMark. But it is not possible to rewrite all Balise or OmniMark programs using SGML Hammer, as much of the functionality of these other languages is not available within LOUISE, such as user-defined functions and complex data types.
The user documentation is clear and concise, allowing novice users to quickly create working transformation programs. More complex transformations require an experienced programmer in order to make the optimal use of the restricted programming environment.