Section 2.2(B): SGML-aware additions to existing word-processing/page composition systems


Product:
WordPerfect Intellitag (v.1)
Associated Products:
requires WordPerfect 5.1
Developer:
WordPerfect Corporation (USA)
UK Supplier(s):
Grafnet
Price:
Single: $495; $295 for the next, then reducing
Platforms:
MS-DOS, Unix
Description:
Intellitag is used to convert WordPerfect documents to SGML format. You can use Intellitag to do the following:

Intellitag is designed primarily to convert WordPerfect documents to SGML format. It is neither a fully-featured SGML editing environment nor a fully-featured SGML text retrieval system. Intellitag works with the user's own DTDs. It can import documents, but it does not automatically format them for printing or display.

Whether or not a document has been pre-tagged, it can be edited in Intellitag. The full editing capabilities of WordPerfect, such as fonts and format settings, are not available or necessary in Intellitag. The purpose of editing in Intellitag is primarily to insert, modify, or delete SGML tags and entity references, and to define the attributes of SGML tags. However, the user can edit, search, replace, block move and copy text and codes, edit the contents of headers, footers, and footnotes, edit and insert styles, and edit any text associated with graphics.

While it is being edited, the document can be validated. The validation process uses the Error List and Next Error features.

Version 2 is due for release in October 1994.

Assessment:
WordPerfect Intellitag is a DOS-based application which looks very similar to the WordPerfect 5.1 word processing software under DOS. It is a stand-alone application since WordPerfect is not needed in order to run Intellitag. What it does need is a binary version of the DTD which governs the converted document's structure. This binary version of the DTD is created using the dtd2lgc tool supplied with Intellitag.

Intellitag is used to convert WordPerfect 5.1 documents to SGML by allowing the user to insert, modify and delete SGML elements, attributes and entities. Intellitag does not offer the whole range of editing facilities that WordPerfect offers (eg. fonts and format settings are not available or necessary) because its primary purpose is to insert, modify, or delete SGML tags and entity references, and to define the attributes of the SGML tags.

Intellitag offers two routes for getting documents into SGML:

  1. Converting existing WordPerfect documents to SGML document instances;
  2. Authoring of new SGML documents conforming to a DTD.

Intellitag also claims to offer the facility to import an SGML document instance and convert it to a tagged WordPerfect document, though see limitations below. There are four mechanisms that allow the user to create an SGML document:

  1. a batch process tagging facility (pre-tagging);
  2. limited editing capabilities so that the user can complete the tagging;
  3. a validation mechanism (SGML parser);
  4. an output facility.

In order to accomplish the pre-tagging of WordPerfect documents, the user has to create a set of conversion rules and then apply them to a WordPerfect document. These conversion rules define the relationships between SGML tags and the text and styles used in a WordPerfect document. The rules themselves are created in Intellitag and stored as {CONVERSION RULE} codes. These rules are used to convert both ways (from WordPerfect to SGML and vice versa) and are applied in order, one at a time, to the whole WordPerfect document (a separate pass is required through the whole document for each rule).

When converting from a WordPerfect file to an SGML document, the conversion rules can search for a given string of text characters (no wild cards or regular expressions) or a WordPerfect style and once found, can replace them with an SGML element. There is a one-to-one replacement relationship between string/style and SGML element and there is no form of context-sensitive or attribute-dependent replacement.
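To make the cost of this approach concrete, the following Python sketch models it under stated assumptions: a document is reduced to a hypothetical list of (style, text) paragraphs, each rule maps one literal string or one style name to one element, and every rule makes its own complete pass over the document. It illustrates the behaviour described above; it is not Intellitag's implementation, and `ConversionRule` and `apply_rules` are invented names.

```python
from dataclasses import dataclass

@dataclass
class ConversionRule:
    """Hypothetical rule: one literal string or one style name -> one element."""
    match: str             # literal text or style name; no wildcards or regexes
    element: str           # SGML element that wraps the match
    by_style: bool = False

def apply_rules(paragraphs, rules):
    """paragraphs: list of (style_name, text) pairs standing in for a document.
    Rules are applied in order, each in its own pass over every paragraph."""
    doc = list(paragraphs)
    passes = 0
    for rule in rules:
        passes += 1                       # one full pass per rule
        start, end = f"<{rule.element}>", f"</{rule.element}>"
        next_doc = []
        for style, text in doc:
            if rule.by_style:
                if style == rule.match:   # the whole paragraph carries the style
                    text = start + text + end
            elif rule.match in text:      # plain substring match, every occurrence
                text = text.replace(rule.match, start + rule.match + end)
            next_doc.append((style, text))
        doc = next_doc
    return [text for _, text in doc], passes

if __name__ == "__main__":
    document = [("Heading1", "Introduction"), ("Body", "See SGML for details.")]
    rules = [ConversionRule("Heading1", "title", by_style=True),
             ConversionRule("SGML", "acronym")]
    print(apply_rules(document, rules))
```

The pass count always equals the number of rules, so the work grows with the number of rules multiplied by the document length. Because matching is purely literal and context-free, a later string rule in this sketch can even match text inside tags inserted by an earlier rule.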

As an editor, Intellitag is based on pop-up menus, which are invoked by particular keys. The user can obtain a list of valid elements at any time and choose which one to insert. If there are required attributes, Intellitag will prompt the user to enter them. There are three types of entities that can be inserted into the SGML document instance: Character (eg. the characters defined in external entity files); DTD Defined (those defined in the DTD, eg. common abbreviations); and Document Specified (created by the user from within Intellitag).

Once a document has been created/translated into an SGML document instance, Intellitag validates the document against the DTD and can either save the document as a WordPerfect file with the SGML tags preserved, or save the content of the tagged WordPerfect file into an ASCII file containing the SGML tags (an SGML document instance).

Intellitag needs an external entity mapping file which states exactly where the external entity files are. For example:

PUBLIC "-//TEST/ELEMENTS Body Elements//EN"
DTD C:\dtds\body.dtd
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN"
ENTITIES D:\entities\isolat1.ent

Intellitag also allows the use of an SGML declaration file, and supports OMITTAG but does not support DATATAG, RANK, SHORTTAG, CONCUR, or SUBDOC.

The printed documentation that comes with version 1.0 is almost non-existent — the licence booklet is larger! There are examples which can be worked through in order to get to know the system and there is a reasonable (if convoluted) on-line help system invoked with the F3 key (rather than the usual F1 key). The help files are very small WordPerfect documents but there is no way of printing them out (in order) for easy reference.

As an SGML editor Intellitag is a competent piece of software but, since it is ASCII based, there is no form of formatting for the elements themselves.

The main drawbacks of using Intellitag as a conversion tool are that it is limited to a one-to-one replacement of a WordPerfect style or string with an SGML element (ie. there is no form of context- or attribute-sensitive processing), and that the whole document is processed once for each conversion rule. This means that for very large documents with many conversion rules, conversion will be time-consuming.


Product:
SGML Tagger
Associated Products:
any DOS word-processor
UK Supplier(s):
Oxford University Press, Oxford, UK
Price:
£75, bulk purchase price negotiable
Platforms:
MS-DOS (requires 160k free RAM in addition to word-processor)
Description:
SGML Tagger is loaded on top of MS-DOS word-processing software, and allows users to insert SGML markup quickly and efficiently, without the need for a specialised SGML editor. The software works out what markup is allowed, and will only let the user insert a tag which is correct in that particular context. It is designed for experts and novices alike. Anyone who can operate a word-processor will be able to mark up documents with SGML Tagger.

Assessment:
SGML Tagger is a DOS TSR (Terminate and Stay Resident) program which, when loaded, can be called on top of an existing DOS editor or word-processor. It is called from within the editor by a user-definable hot key combination, and a window appears which contains three panes. One pane shows the list of currently open elements at the current cursor position, a second `Current Status' pane explains verbally what may be inserted at the current position (eg. "You can insert mixed content here. You can insert 4 subelements."), and the third pane contains the main SGML Tagger Menu. This menu contains eight possible options, though those available at any point are determined by the current context. The possible options are:
  1. add a complete element
  2. insert an entity name
  3. add an element start-tag
  4. add an element end-tag
  5. optional settings
  6. start a new document
  7. do nothing

Selecting 1, 3 or 4 results in a list of the relevant valid element names being presented. Selection of an element results in the insertion of either a pair of start and end tags, a single start tag or a single end tag. If there are any required attributes for the element then SGML Tagger will present an attribute window in which values can be provided for attributes. A user-defined option can be set to make SGML Tagger either always display the attribute window or display it only when there are required attributes. References to entities defined in the DTD can also be inserted by selecting them from a menu. There is no support for referencing entities defined in a declaration subset within a document, or for declaring such entities from within SGML Tagger.

SGML Tagger is not an SGML editor, and it is not marketed as such. It is a tool which can assist authors in the task of entering SGML markup by presenting lists of suggested choices and allowing insertion of markup characters by selection from lists. SGML Tagger works out the element choices it offers by examining the tags it can currently see on the screen. It then infers from the DTD any ancestor tags which must be open for the displayed tags to be valid.

This approach works well if there are enough tags visible on the screen for SGML Tagger to work out the current contextual position correctly. However, in the worst case there may be no tags at all visible on the screen (for example, a long paragraph consisting of just text characters). In these cases SGML Tagger cannot work out the current context and displays a message informing the user of this. In other cases there may be some tags on the screen but not enough for SGML Tagger to work out an unambiguous context; in such cases SGML Tagger can get confused and present invalid element insertion options. This is why there is a menu option `Start a new document', which forces SGML Tagger to reinitialise its stack of open elements and to parse the document from the beginning to try to work out the current context.
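The first half of that process, recovering an open-element stack from whatever tags happen to be visible, can be sketched in Python. This is an assumption-laden illustration rather than SGML Tagger's algorithm: it ignores the DTD (so it cannot infer ancestors, and it treats omitted end-tags as mismatches), and `infer_open_elements` and `context_known` are hypothetical names. It does show why a screen with no tags yields no context, and why end-tags whose start-tags have scrolled off the top leave the outer context undetermined.

```python
import re

# Start- or end-tag with a simple name; any attributes are skipped.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*>")

def infer_open_elements(visible_text):
    """Return (still_open, closed_from_above) for the text on screen.

    still_open: elements started on screen and not yet ended, outermost first.
    closed_from_above: end-tags whose start-tags are off-screen; these show
    elements that were open before the visible window began.
    """
    still_open, closed_from_above = [], []
    for m in TAG.finditer(visible_text):
        is_end, name = m.group(1) == "/", m.group(2).lower()
        if not is_end:
            still_open.append(name)
        elif still_open and still_open[-1] == name:
            still_open.pop()
        else:
            closed_from_above.append(name)
    return still_open, closed_from_above

def context_known(visible_text):
    """With no tags at all on screen there is nothing to work from."""
    return TAG.search(visible_text) is not None

if __name__ == "__main__":
    print(infer_open_elements("end</emph> more</p><p>New <emph>x"))
    print(context_known("a long paragraph consisting of just text characters"))
```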

DTDs are `compiled' by SGML Tagger into an expanded canonical form in which all parameter entity references and marked sections are resolved. User-defined element and attribute help messages are automatically made available by extracting any comments in the element or attlist declarations and compiling them into the Tagger help system.

SGML Tagger provides no way to reference an SGML Declaration. The feature set supported is hard-coded into the product and the equivalent Declaration supported conforms to the Reference Concrete Syntax, with the following exceptions:

All other features, including SHORTTAG, are not supported.

As an aide memoire to a user who is familiar with a given DTD, and who prefers to work in a DOS editor environment where all tags are visible on the screen, SGML Tagger can be a useful tool within the limits of its restrictions. The produced SGML document, however, should never be assumed to be valid. Sgmls is bundled with SGML Tagger, and all documents should also be parsed using such a parser.

A side benefit of SGML Tagger, which can be useful in trying to understand complex DTDs with multiple nested parameter entities and/or marked sections, is that the DTD compilation process produces a canonical form of the DTD in which all parameter entity references and marked sections are resolved. For those who are trying to understand exactly what is happening in some complex DTDs it may well be worth buying SGML Tagger just for this feature.


Product:
SGML Author for Word
Associated Products:
will require Word 6
Developer:
Microsoft Corporation, Inc. (USA)
UK Supplier(s):
Microsoft UK
Price:
n/a
Platforms:
To be announced
Description:
No technical description is available, but the following text has been extracted from pre-release information.

"To author an SGML document the end users simply construct their documents in Word as they normally would, except they must use styles for all formatting. To ensure that they use the styles appropriately, the users format according to an MIS provided style guide and set of Word templates. To create SGML, the user then saves the file as SGML just as they would export to any other file format.

Once the user has chosen to save an SGML representation of the file, an ASCII text file is created which contains syntactically correct (ie. parseable) SGML. To achieve this syntactically correct SGML, the converter may modify the Word file to ensure conformity to the DTD.

To ensure that the desired result is obtained, the converter has to be pre-configured to create the appropriate SGML. This is done by creating a mapping file using a provided Mapping Application. This application is geared at the SGML knowledgeable individual, and it allows this individual to build specific mappings between Word templates (ie. styles) and the structures in the SGML DTD. Where standard DTDs do exist, Microsoft will provide pre-assembled mapping files and templates.

This product is planned for commercial availability for the first half of 1994. [Note by authors of this report: It had not appeared by Sept 1994, but has now appeared.] The initial release will be for MS-Windows, and it will be followed by releases for the Apple Macintosh and Windows NT. These products will be sold as separate add-ons to Microsoft Word, and they will require Word 6.0 or later to run."

Assessment:
As noted above, this product had not appeared by Sept 1994. Given the widespread use of Word throughout UK academia, this product could be extremely valuable to users wanting to use SGML within universities. An assessment will be produced and distributed as soon as a copy of the product can be obtained and time allocated to the task.


Product:
SGML TagWizard Version 1.3
Associated Products:
requires Microsoft Word 6
Developer:
NICE Technologies, France
UK Supplier(s):
-
Price:
650 Ffrs (`site' and other general licences are available)
Platforms:
PC-Windows
Description:
SGML TagWizard is an SGML instance editor. Its user interface has been designed to fit seamlessly with Word 6. It contains its own SGML parser, and allows `WYSIWYG' formatting of SGML documents. Version 1.3 works with the American English and Australian English, French and Canadian French, German and Swiss German, Dutch, and Swedish language versions of Word. It is guaranteed to work with two DTDs: HTML and ISO 12083:1994 Book. These DTDs and formatting tables are shipped with the product.

Assessment:
SGML TagWizard is easy to install and fits well with Word 6. As with any SGML-specific editor, inputting tags through the menus is more long-winded than typing them directly would be, but using the menus is essential with SGML TagWizard because the tags are stored in the Word document as Word `quote fields'. When tags are inserted, the user is given a list of the contextually correct tags (though not in alphabetical order, so the correct one is sometimes hard to find). Start- and end-tags are added independently. This means that a document may be closed in an `incorrect' state (normally, all start-tags will be included as text is input, so the document would be correct up to the closing of all currently-open elements). The `correct' end-tag can be added with a single mouse click on the end-tag icon; however, with the HTML DTD the `correct' end-tag is not always obvious (see comments below). The parser does not currently conform completely to ISO 8879:1986. None of the features except OMITTAG are supported, and the reference concrete syntax is assumed. Since it is an instance editor, the DOCTYPE declaration and DTD must be removed from any document before importing it. Attributes are fully supported. SGML TagWizard will convert Word graphics into external GIF files, and those files can be `anchored' into the document. It will also mark up the rows and columns of Word tables with the appropriate tags.

The facility to import SGML documents works but is slow. The facility to export documents as ASCII SGML files works well, but even if the user has just parsed the document prior to issuing the export command, the document is parsed again. There is a need to go through what seems like an endless set of dialog boxes, but that is probably because of the need to satisfy both SGML TagWizard and Word. Entity references to special characters will be translated to the appropriate Windows symbol on import and automatically translated back to entity references (as defined by the DTD) on export. Parameter entities are used throughout the certified DTDs. General entity references are translated on import and export (I would not have expected entity references to be expanded in either direction, but to be indicated as `recognised' on import, and to be flagged if missing on export). The #DEFAULT general entity is not implemented.
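The character round trip described above can be pictured with a short Python sketch. The three-entry table is a hypothetical excerpt of the ISO Latin 1 set (`isolat1.ent'), not TagWizard's own table, and `on_import`/`on_export` are invented names. Unknown entity references are simply left untouched; a real exporter would also need to escape any bare `&' in the text, which the sketch ignores.

```python
import re

# Hypothetical excerpt of an ISO Latin 1 entity table (cf. isolat1.ent).
ENTITY_TO_CHAR = {"eacute": "\u00e9", "agrave": "\u00e0", "ccedil": "\u00e7"}
CHAR_TO_ENTITY = {char: name for name, char in ENTITY_TO_CHAR.items()}

ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def on_import(sgml_text):
    """Replace known entity references with the characters they stand for;
    unknown references are left as they are."""
    return ENTITY_REF.sub(lambda m: ENTITY_TO_CHAR.get(m.group(1), m.group(0)), sgml_text)

def on_export(text):
    """Turn each mapped character back into its entity reference."""
    return "".join(f"&{CHAR_TO_ENTITY[c]};" if c in CHAR_TO_ENTITY else c for c in text)

if __name__ == "__main__":
    original = "Fa&ccedil;ade du caf&eacute; &unknown;"
    shown = on_import(original)
    print(shown)
    print(on_export(shown) == original)
```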

Documents may be viewed and edited in non-formatted or formatted modes, with or without tags showing. The switching between modes is slow (about 10 seconds per page on a 25MHz 486), but isn't everything with Word 6? [Note: The author of this assessment is used to a very fast DOS-based word-processor designed for the creation of large documents.] Switching back from formatted to non-formatted is not perfect (any inserted blank lines are left in). All the normal Word facilities are available.

The key `failing' of SGML TagWizard is its formatting, which is limited to eight parameters for each element: font name, point size, colour, bold, italic, underline, blank lines before and blank lines after. Nothing beyond very simple formatting can be achieved. Ordered lists can be numbered, and unordered lists bulleted, automatically only through the Word options rather than as part of the TagWizard formatting (in which case the bullets and/or numbers are exported into the ASCII SGML file). However, the formatting is just about sufficient to get a `feel' for how a document will look when properly formatted. The product is advertised as a WYSIWYG editor, but that is a clear overstatement.
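The restriction is easiest to see by writing the per-element formatting record out in full. The Python sketch below is hypothetical (the field names, defaults and the `FORMAT_TABLE` example are invented, not TagWizard's formatting-table format); it simply shows that the eight listed parameters are all an element can carry, leaving no place for indentation, margins, numbering or anything context-dependent.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class ElementFormat:
    """The eight per-element formatting parameters listed above.
    Field names and default values are illustrative assumptions."""
    font_name: str = "Times New Roman"
    point_size: int = 12
    colour: str = "black"
    bold: bool = False
    italic: bool = False
    underline: bool = False
    blank_lines_before: int = 0
    blank_lines_after: int = 0

# Hypothetical formatting table keyed by element name.
FORMAT_TABLE = {
    "title": ElementFormat(point_size=18, bold=True, blank_lines_after=1),
    "emph": ElementFormat(italic=True),
}

def format_for(element_name):
    """Elements without an entry fall back to the defaults."""
    return FORMAT_TABLE.get(element_name, ElementFormat())

if __name__ == "__main__":
    print([f.name for f in fields(ElementFormat)])
```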

Other problems are with the DTDs that are available. The ISO 12083:1994 (ISOBOOK) DTD is large and takes considerable time to validate. Because of its flexibility and the large set of tags that could be contextually correct at any one point, creating documents that only require a few elements (the majority) becomes difficult — not being able to see the wood for the trees. The DTD as supplied does not provide access to the Math elements. On the other hand, the DTD does provide access to the additional ICADD attributes, so ensuring documents can be created for use by the partially sighted (if suitable display systems are available). The HTML DTD was an afterthought for the developers of WWW. So, although it has been correctly implemented in SGML TagWizard, there can be unusual effects where users of HTML do not necessarily understand the SGML — for example, in HTML the tags to initialize the two parts of a definition list item (DT and DD) are EMPTY elements when a regular user of SGML would have expected them to have end-tags to mark the closure of the definition term (DT) and the definition text (DD) respectively. This situation should be improved as HTML develops and as use of SGML TagWizard expands.

NICE Technologies is a small French company totally committed to the development and use of SGML tools. The company was founded by Eric van Herwijnen, who has been involved in SGML activities for many years whilst employed by CERN. Development of SGML TagWizard is continuing, and the deficiencies reported above may well be removed in due course. At about £80, the product is value for money, and would be suitable for introducing SGML into a Word community, but it is not suitable for heavy SGML use; a dedicated SGML editor would be a better purchase.


Product:
InContext Editor
Associated Products:
CAPS
Developer:
XSoft, Inc
UK Supplier(s):
XSoft Ltd
Price:
n/a
Platforms:
MS-Windows
Description/Assessment:
XSoft InContext is a Microsoft Windows-based SGML document instance editor. InContext works with any DTD and features quick and easy editing and manipulation of text, elements, attributes and entities. InContext works from the DTD itself rather than from a binary representation of the DTD, as some other SGML editing products do.

InContext allows users to:

  1. create and edit SGML documents conforming to a specified DTD;
  2. work on several documents at the same time (even if each conforms to a different DTD);
  3. create and edit graphics and tables.

InContext uses external programs to handle the editing of both graphics and tables. Graphics (bitmaps) are edited using the Paintbrush (pbrush) application supplied with Windows, but they can be created by any graphics application. Tables are created and edited with Microsoft Excel 4.0, so if Excel is not installed on the PC running InContext, tables have to be incorporated as graphics!

When InContext is invoked, two views of the SGML document instance are displayed: the first shows the logical structure of the document (element names but no text), and the second shows the text of the document. The text is displayed in a block format, and there is no attempt to display the SGML document as anything other than an SGML document: there is no concept of formatting the text in any way. Entity references are displayed as typed, delimited by & and ; (eg. `&eacute;'), instead of the actual character being displayed. In Windows, replacing the entity reference with the actual character (an acute e) is straightforward and would make the document much easier to read on screen.

InContext does ensure that the SGML document instance is always in compliance with the DTD and offers the normal facilities that a word-processor offers (cut/paste, spell checker, grammar checker and thesaurus).

InContext allows the following SGML declaration quantities to be modified through its ic.ini Windows initialisation file: ATTCNT, ATTSPLEN, BSEQLEN, DTAGLEN, DTEMPLEN, ENTLVL, GRPCNT, GRPLVL, LITLEN, NAMELEN, NORMSEP, PILEN, TAGLEN, TAGLVL.
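As an illustration of what such overrides involve, the Python sketch below reads quantity values from ini-style text and checks them against the list above. The `[SGML]' section name and the `NAME=value' key layout are assumptions, not XSoft's documented ic.ini format, and `read_quantities` is a hypothetical name. For comparison, the reference quantity set fixes NAMELEN at 8 and LITLEN at 240, values that some DTDs need to exceed.

```python
import configparser

# Quantities the assessment lists as adjustable in InContext.
ADJUSTABLE = {"ATTCNT", "ATTSPLEN", "BSEQLEN", "DTAGLEN", "DTEMPLEN", "ENTLVL",
              "GRPCNT", "GRPLVL", "LITLEN", "NAMELEN", "NORMSEP", "PILEN",
              "TAGLEN", "TAGLVL"}

def read_quantities(ini_text, section="SGML"):
    """Return {QUANTITY: value} from a hypothetical [SGML] section.
    Unknown names are rejected rather than silently ignored."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return {}
    values = {}
    for key, raw in parser.items(section):
        name = key.upper()               # configparser lower-cases keys by default
        if name not in ADJUSTABLE:
            raise KeyError(f"{name} is not an adjustable quantity")
        value = int(raw)
        if value <= 0:
            raise ValueError(f"{name} must be a positive integer")
        values[name] = value
    return values

if __name__ == "__main__":
    print(read_quantities("[SGML]\nLITLEN=1024\nNAMELEN=32\n"))
```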

InContext is good at what it does, making the creation and editing of an SGML document instance simple, but there are drawbacks. The main one is its reliance on Excel 4.0 to handle tables, since not everyone has Excel installed on their PC. There can also be problems if the user has the SGML_PATH environment variable set, which is the mechanism used by the Sgmls parser (see below) for finding public entities. InContext uses a modified form of the Sgmls parser but has its own specific way of mapping public entities to physical files. However, if the SGML_PATH environment variable is set then the internal parser seems to get confused and sometimes fails to find entities correctly. This behaviour has been reported to XSoft as a bug.

There is no form of context- or attribute-sensitive processing, since InContext is an editor with no concept of displaying the text of a document in a formatted presentation. It is a stand-alone system, with no interprocess communication facilities or internal command language by which it may be customised.

InContext is an editor, and only an editor, facilitating the creation and editing of SGML documents. Version 2.0 is due to be released in October 1994.


Product:
FrameBuilder and SGML Toolkit
Associated Products:
FrameMaker
Developer:
Frame Technology, Inc. (USA)
UK Supplier(s):
Frame Technology International Ltd
Price:
FrameBuilder: Windows 3.1 & Macintosh: £1,195,
X/Motif: (shared) £3,250, (fixed) £1,595
SGML Toolkit: Windows 3.1 & Macintosh: n/a,
X/Motif: £15,000
Platforms:
Sun and HP Unix
Description:
FrameBuilder provides a structured document environment that supports document models described in SGML DTDs. The FrameBuilder product line provides two software components: FrameBuilder, to create, edit and revise documents while maintaining document structure; and FrameBuilder Development Edition, to create custom document applications.

Assessment:
The Frame SGML ToolKit (see below) may be used to import and export both DTDs and SGML instances into and out of FrameBuilder.

FrameBuilder is an enhanced version of Frame Technology's FrameMaker, which is popular as a technical publishing application which runs on multiple platforms. FrameBuilder is not an SGML system in that it does not operate directly on an SGML file nor does it directly support some of the concepts of SGML. The SGML Toolkit is an add-on product which allows the user to specify how the structures found within an SGML document may be mapped to FrameBuilder structures and vice versa. An SGML document can then be loaded into FrameBuilder, edited, and re-exported as an SGML document, though with some restrictions. [Note that this process requires access to the Frame SGML ToolKit, currently available only on Sun SPARC systems.]

The FrameBuilder document model is a hierarchy of typed objects. Document classes are defined by an EDD (Element Definition Document). This definition is similar to the element structure defined by an SGML DTD, and the SGML Toolkit provides functions for mapping the element structure defined by a DTD into that understood by an EDD. So if your EDD `matches' your DTD, SGML can be mapped into FrameBuilder, manipulated, and exported back to SGML; the resulting instance should parse correctly against its DTD.

A major difference between the document models expressed by a DTD and those understood by FrameBuilder is that the objects (elements) defined by a DTD may also be labelled (ie. they may have attributes). FrameBuilder has no concept of labelled objects, so any attributes declared in a DTD must be mapped to pseudo-objects in FrameBuilder. The SGML Toolkit recognises two types of attribute: active and passive. `Active' attributes are those which have a direct effect on the formatting of a document (for example, ), and these are mapped to FrameBuilder formatting characteristics. `Passive' attributes are those which do not directly affect the appearance of a document (eg. ), and these are mapped to FrameBuilder objects.
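The split between the two kinds of attribute can be sketched in Python as follows. `FrameObject`, `PseudoObject`, `map_attributes` and the `list'/`type'/`id' names in the usage are all hypothetical and are not Frame's API; the set of `active' attribute names is simply passed in, standing in for whatever the Toolkit's mapping rules specify.

```python
from dataclasses import dataclass, field

@dataclass
class PseudoObject:
    """Stand-in for the object a passive attribute is mapped to."""
    attribute: str
    value: str

@dataclass
class FrameObject:
    """Stand-in for a FrameBuilder element; not Frame's actual data model."""
    name: str
    formatting: dict = field(default_factory=dict)      # from active attributes
    pseudo_objects: list = field(default_factory=list)  # from passive attributes

def map_attributes(element_name, attributes, active_names):
    """Route each SGML attribute either to formatting (active) or to a
    pseudo-object (passive), since the target has no attributes of its own."""
    target = FrameObject(element_name)
    for attr, value in attributes.items():
        if attr in active_names:
            target.formatting[attr] = value
        else:
            target.pseudo_objects.append(PseudoObject(attr, value))
    return target

if __name__ == "__main__":
    print(map_attributes("list", {"type": "bullet", "id": "L1"}, active_names={"type"}))
```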

FrameBuilder provides users with Guided Editing which Frame claims is unique. When editing a document, FrameBuilder understands the concept of information objects in a hierarchy and element content rules. So at any point in that document FrameBuilder tells the user, via an element catalog, which elements are valid at that point. The user sees no SGML syntax or markers, instead FrameBuilder offers the user a `structure view' which shows the elements in the document in their hierarchy. The document can be manipulated through either view.

The interface with SGML instances and DTDs is the SGML Toolkit and is designed to be used by application developers with SGML expertise and a good working knowledge of FrameBuilder. Note that getting some DTDs to map properly can take a lot of work, and in some cases the `document' paradigm applied by FrameBuilder is just not applicable (for instance, those DTDs which represent database structures rather than documents).

The SGML Toolkit contains:

Default mapping is provided for common situations; for example, tables are mapped using a fragment of a DTD that Frame provides. This default behaviour can be modified using the Read/Write Rules, which are also where particular notations can be associated with filters so that graphics and illustrations can be imported automatically.

The Toolkit API allows you to add custom functionality to the base product and create special handling for unusual SGML requirements. All API programming is in C.

All the standard processes that need to be looked at, from document analysis to integration with other applications, are still applicable with FrameBuilder. It is not, in most instances, a shrink-wrapped product.


Product:
Interleaf5 <SGML> and Toolkit
Associated Products:
Interleaf <SGML> Gateway, WorldView
Developer:
Interleaf, Inc, MA, USA
UK Supplier(s):
Interleaf (UK) Ltd
Price:
Platforms:
as for non-SGML Interleaf products
Description:
Interleaf5 <SGML> is an extended version of Interleaf's publishing software. It contains a WYSIWYG, context-sensitive editor with fully integrated creation, revision and validation of SGML documents. The product offers the flexibility to create a full range of information, from structured SGML documents to non-structured documents.

The Toolkit is an application development environment that will allow the user to write and extend DTDs, create and support new SGML applications, modify Interleaf5 <SGML> functions and user interface to automate tasks and to tailor the system for specific requirements.

Interleaf <SGML> Gateway is a conversion toolkit to create filters that convert Interleaf documents into SGML documents.

The WorldView system has two components — WorldViewer and WorldView Press. WorldViewer accepts Interleaf5 <SGML> documents, and WorldView Press processes most other types of file for use on WorldViewer.

Assessment:
This product has not been assessed. To use it properly, the user requires considerable help and advice from Interleaf in using the Toolkit to develop the modifications required to each installed version of Interleaf5 for each DTD required. For this reason, and because of the sheer size of the package, assessment has not been attempted.


Product:
GUIDE Professional Publisher (v3.5)
Associated Products:
GUIDE Table Viewer, GUIDE Full Text Indexer, GUIDE Image, GUIDE Author, GUIDE Reader
Developer:
Info Access, Inc. (formerly OWL International, Inc.)
UK Supplier(s):
Clark Associates, Fife, Scotland
Price:
n/a
Platforms:
MS-Windows
Description:
The GUIDE products provide a range of publishing tools for the production, indexing, dissemination and reading of complex electronic documents.

GUIDE Professional Publisher transforms existing text files into a suitable "hypertext" format for the various GUIDE reader and indexer products. It accepts text in formats from the major word-processors (Word and WordPerfect) and other formats, including SGML-tagged documents. It is not a specific SGML publishing system. Its strengths lie in being able to handle a wide variety of formats, having a range of associated products to handle all aspects of large collections of electronic manuals and similar documents, and being based on MS-Windows rather than the more usual Unix workstation for products of this complexity.