SGML '93 Conference Report, by Michael Popham


The following report was obtained from the Exeter SGML Project FTP server as Report No. 9, in UNIX "tar" and "compress" (.Z) format. It is unchanged here except for the conversion of SGML markup characters into entity references, in support of HTML.

THE SGML PROJECT                                 SGML/R25

SGML '93
BOSTON, MA, USA 6TH-9TH DECEMBER 1993           Issued by
					 Michael G Popham
				       22nd December 1993


The conference opened with a welcome from Norm Scharpf of the 
GCA.  This year, around 450 attendees were anticipated, about 
50% of whom were attending their first SGML conference.  Norm 
announced that the GCA will be taking over the maintenance of 
the AAP/ISO 12083 DTDs from EPSIG.

In his initial remarks, the Conference Chair (Yuri Rubinsky) noted 
that this was the largest SGML event in history.  The size of the 
conference meant that it was split into two tracks running 
concurrently, one for SGML novices, the other for experts.  (I 
attended the technical track for experts.)

1.      "The SGML Year in Review" - Yuri Rubinsky, B 
	Tommie Usdin, and Debbie Lapeyre
2.      Keynote Address: "TEI Vision, Answers and 
	Questions: SGML for the Rest of Us" - Lou Burnard 
	(Text Encoding Initiative)
3.      Poster Session
4.      Reports from the Front
5.      "Multi-company SGML Application Standard 
	Development Process" - Bob Yencha (National 
	Semiconductor), Patricia O'Sullivan (Intel 
	Corporation), Jeff Barton (Texas Instruments), Tom 
	Jeffery (Hitachi Micro Systems, Inc.), Alfred 
	Elkerbout (Philips Semiconductors)
6.      "Archetypal Early Adopters? Documentation of the 
	Computer Industry"
6.1     "Information Development Strategy and Tools - 
	IBM and SGML" - Eliot Kimber and Wayne Wohler 
6.2     Eve Maler (Digital Equipment Corporation)
6.3     "Implementation of a Corporate Publishing System" 
	- Beth Micksh (Intergraph)
6.4     Jon Bosak (Novell)
7.      International SGML Users' Group Meeting
8.      "Real World Publishing with SGML" - Terry Allen 
	(Digital Media Group, O'Reilly & Associates, Inc.)
9.      "HyTime Concepts and Tools" - Dr. Charles 
	Goldfarb (IBM)
10.     "HyTime: Today's Toolset" - Dr. Charles Goldfarb 
	(IBM) and Erik Naggum (Naggum Software)
11.     "Charles Goldfarb, Please Come Home: SGML, the 
	Law, and Public Interest" - Ina Schiff (Attorney at 
	Law) and Allen H Renear (Brown University) 
12.     "Online Information Distribution" - Dave Hollander 
	(Hewlett Packard)
13.     "Xhelp: What? Why? How? Who?" - Kent Summers 
	(Electronic Book Technologies)
14.     "Digital Publications Standards Development (DPSD): 
	A Modular Approach" - Horace Layton (Computer 
	Sciences Corporation)
15.     "A Practical Introduction to SGML Document 
	Transformation" - David Sklar (Electronic Book 
16.     "SGML Transformers: Five Ways" - Chair: Pam 
	Gennusa (Database Publishing Systems Limited)
17.     "The Scribner Writers Series on CD-ROM: From a 
	Great Pile of Paper to SGML and Hypertext on a 
	Platter" - Harry I. Summerfield (Zandar 
	Corporation), Anna Sabasteanski (Macmillan New 
18.     "The Attachment of Processing Information to SGML 
	Data in Large Systems" - Lloyd Harding (Mead Data 
19.     "ISO 12083 Announcement" - Beth Micksch 
	(Intergraph Corp.)
20.     Reports from the SGML Open Technical Committees
21.     "A Technical Look at Authoring in SGML" - Paul 
	Grosso (ArborText)
22.     "Implementing an Interactive Electronic Technical 
	Manual" - Geoffrey von Limbach, Bill 
	Kirk (InfoAccess Inc.)
23.     "The Conversion of Legacy Technical Documents into 
	Interactive Electronic Technical Manuals: A NAVAIR 
	Phase II SBIR Status Report" - Timothy Billington, 
	Robert F. Fye (Aquidneck Management Associates, 
24.     New Product Announcements and Product Table Top 
25.     Poster Session
26.     "Implementing SGML Structures in the Real World" 
	- Tim Bray (Open Text Corp.)
27.     "User Requirements for SGML Data Management" - 
	Paula Angerstein (Texcel)
28.     "A Document Query Language for SGML Databases" 
	- Ping-Li Pang, Bernd Nordhausen, Lim Jyh Jang, 
	Desai Narasimhalu (Institute of Systems Science, 
	National University of Singapore)
29.     Closing Keynote - Michael Sperberg-McQueen (Text 
	Encoding Initiative)

1.      "The SGML Year in Review" - Yuri Rubinsky, B Tommie Usdin, 
	and Debbie Lapeyre

The full text of "The Year in Review" will be published in <TAG> 
and also posted to comp.text.sgml.

The review was loosely split into a number of categories, the first 
of which focused on Standards Activity.  The interim report of the 
review of ISO 8879 will be published in <TAG>; however, it is 
clear that changes to the SGML Standard will be required.  ISO 
10744, the HyTime Standard, is being adopted by IBM, whilst 
TechnoTeacher are producing relevant tools; also, at least 3 
HyTime-related books are currently in preparation.  A revised 
3-level version of DSSSL will be out early next year, and the SPDL 
Standard is now ready to go to press.  Information on ISO 12083 
(the fast-tracked version of the revised AAP DTD) will be given at 
this conference.

User group activity - the Swedish group has been very active in 
the last twelve months as has the Japanese SGML forum, which 
attracted 400 people to an open meeting on SGML.  Erik Naggum 
was welcomed as the new Chair of SGML SIGhyper.

Major Public Initiatives - SGML Open was founded this year; 
more information was given later in the week.  The NAA and IPTC 
(both major news industry bodies) have been working on an 
SGML-based universal text-format, the "News Industry Markup 
Language" [?], for interchanging news service data.  Co-ordinators 
of The Text Encoding Initiative (TEI) met with people developing 
the World Wide  Web (WWW) to discuss the production of 
HTML+ (a revised version of the Hypertext Markup Language, the 
markup scheme recognized by WWW browsers).  The TEI have 
now completed all their major goals and some supplementary 
work, which is now publicly available (via ftp).  The 
DAVENPORT group made an amicable split into DAVENPORT 
and CApH; more details were given later in the conference.  In the 
US, 18 other states have followed Texas in requiring text books to 
be produced using SGML and following the ICADD 
guidelines (the International Committee for Accessible Document 
Design) - several companies have said they will be providing 
tools to handle the ICADD tagset.  By 1995, all companies in the 
US will be able to provide their financial information according to 
EDGAR.  British Airways is developing an ATA DTD-based 
system, and Lufthansa already have an SGML-based system in 
place (details of which were given at SGML '93 Europe). 

Publications - SGML has received an increasing amount of 
coverage in the mainstream computer press.  Prentice Hall will be 
publishing a series of books to do with open information 
interchange, under the guidance of Charles Goldfarb.  Kluwer will 
publish "Making Hypermedia Work", a handbook on HyTime by 
Steve de Rose and David Barnard.  A new version of Eric van 
Herwijnen's "Practical SGML" will be available early in 1994.  
Van Nostrand Reinhold will be publishing a manager's guide to 
SGML.  Exoterica released their "Compleat SGML CD-ROM" in 
1993, and will be releasing a conformance suite CD-ROM next 
year.  Eliot Kimber has written a HyQ tutorial (HyQ is the query 
language described in the HyTime standard), which is available 
via ftp.

Major corporations and government initiatives - The American 
Memory Project (run by the Library of Congress) has chosen to 
use SGML to create a text base of materials.  IBM has developed 
an internal SGML-based system, called IBMIDDOC, which they 
will use to create, manage and produce/deliver all their product 
documentation.  The OCLC have been selected to develop an 
SGML-based publishing system for use by the ACM.  The British 
National Corpus, a 100 million word tagged corpus, is to be made 
available next year.  Springer Verlag is currently producing 50 
journals a year using an SGML-based system, and next year this 
figure will rise to 100 [150?].  Various patent/trademark bodies, 
including the US Patent and Trademark Office, the European 
Patent Office and the Japanese Patent Office, are adopting SGML-
based approaches.  In France, SGML is being used by a number of 
key players in various industries (maritime, aerospace, power 
industry, publishing), whilst SGML uptake in Australia is also 
increasing.  UCLA is adopting SGML for all its campus-wide 
information publishing, both on-line and on paper, as is the Royal 
Institute of Technology in Sweden.

Miscellaneous - Adobe has an agreement with Avalanche to 
develop SGML filters to move information to/from Acrobat.  
Lotus Development Corporation is looking at incorporating SGML 
awareness into a future version of Ami Pro.  Microsoft announced 
the development of an SGML-based add-on for Word (to be called 
"SGML Author").  The American Chemical Society is updating its 
SGML DTDs, whilst the IEEE is developing a suite of DTDs for 
publishing its standards.  The Oxford Text Archive now has about 
100 works tagged with TEI-conformant SGML, which are 
available for ftp over the Internet.  The first SGML summer school 
was organized in France, and attracted 35 attendees.  Joan Smith, 
the "Godmother of SGML" and founder of the International 
SGML Users' Group, has retired this year; Yuri remarked that her 
presence will be greatly missed and thanked her for all her efforts 
over the years.  And in the "believe it or not" category, came the 
news that IBM will be producing an SGML-tagged CD-ROM of 
interviews published in Playboy magazine between 1962 and 

2.      Keynote Address: "TEI Vision, Answers and Questions: 
	SGML for the Rest of Us" - Lou Burnard (Text Encoding Initiative)

This presentation looked at the wider relevance of the Text 
Encoding Initiative (TEI) - its origins, goals and achievements. It 
included an overview (tasting?) of the TEI's DTD pizza model and 
the TEI class system, and a look at some TEI techniques and 

The TEI comes primarily from the efforts of the Humanities 
research community.  Sponsors include the Association for 
Computational Linguistics (ACL), the Association for Literary and 
Linguistic Computing (ALLC), and the Association for Computing 
and the Humanities (ACH).  Funding bodies include the US 
National Endowment for the Humanities, the  Mellon Foundation, 
DG XIII of the European Commission, and the Social Science and 
Humanities Research Council of Canada.

The TEI addresses itself to a number of the fundamental problems 
facing Humanities researchers, although their findings are widely 
applicable throughout academia and beyond.  It looks at the re-
usability of information, particularly with regard to  issues of 
platform-, application-, and language-independence.  It accepts the 
need to support varied information sources (such as text, image, 
audio, transcription, editorial, linking, analysis etc.).   The 
developers of the TEI's guidelines have also given careful 
consideration to the interchange of information  - issues such as 
what and how to encode, generality vs. specificity, providing an 
extensible architecture and so on.

The basics of the TEI's approach were published in the first draft 
of "Guidelines for the Encoding and Interchange of Machine-
Readable Texts" (also known as TEI P1).  Specialist workgroups 
were set up as part of the process to develop the guidelines.  They 
focused on identifying significant particularities, ensuring that the 
guidelines were independent of notation or realization,  avoiding 
controversy, over-delicacy, or inadequacy, and seeking 
generalizable solutions.  The second draft, TEI P2, covers such 
areas as: segmentation, feature-structures, certainty, manuscripts, 
historical sources, graphs, graphics, formulae and tables, 
dictionaries, terminology, corpora, spoken texts, and so on.  A 
consequence of the TEI's approach is that it focuses 
on content, not presentation.  The guidelines are descriptive, not 
prescriptive in nature - and any redundancy has been cut out.  
The aim of the TEI's work (in addition to producing the 
guidelines) was to create a modular, extensible DTD.

To date, the TEI has produced a number of outputs.  It has 
created a coherent set of extensible recommendations (contained 
in the guidelines).  It has made available a large number of SGML 
tagsets, which can be downloaded from several public ftp servers 
around the world.  In addition, the TEI has developed a powerful 
general purpose SGML mechanism (using global attributes, etc.).  
TEI P2 serves as a reference manual for all of these aspects.  The 
current version of TEI P2 is now available as a series of fascicles 
(available via ftp) which outline the basic architecture and core 
tagsets.  For each element, TEI P2 provides descriptive 
documentation, examples of usage, and reference information.  It 
also contains some DTD fragments (i.e. tagsets) which can be 
used to create TEI-conformant applications.

Lou then raised the more general question of "How many DTDs 
does the world really need?" - to which there are several possible 
answers.  One massive and/or general (vague) DTD might meet 
the lowest common denominator of people's needs, but the TEI 
could only develop it if they adopted a "we know what's best for 
you" philosophy.  At the other extreme, one could argue that the 
world does not need any generalized DTDs, because no-one could 
write a DTD (or set of DTDs) which could truly address all the 
specific concerns of individual users.  An intermediate alternative 
would be to develop "as many [DTDs] as it takes" to support a 
particular level of user requirements.  However, the TEI believes 
that it is possible to adopt an entirely different approach to the 
three (extremes) mentioned above, which  it calls "The Chicago 
Pizza Model".

The Pizza Model allows users to make selections from a 
prescribed variety of options, in order to meet their particular 
needs as closely as possible.  The overall model assumes that a 
pizza is made up of a choice of bases, mandatory tomato sauce and 
cheese, plus a choice of any combination from a given list of 
toppings.  In terms of the "TEI Menu", the base consists of a 
choice of one tagset to describe prose, verse, drama, transcribed 
speech, letters and memoranda, dictionaries, or terminology.  The 
TEI's header and core tag sets are mandatory (i.e. they are the 
equivalent of the cheese and tomato sauce which come with every 
pizza), after which the user can select one or more tagsets to 
describe linking, analysis, feature structures etc.
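
The selection rule behind the Pizza Model can be sketched in a few lines of Python.  This is only an illustration of the rule as reported above (one base, the mandatory header and core, any number of additional tagsets); the function and constant names are hypothetical, the list of additional tagsets is incomplete because the talk's list ended in "etc.", and nothing here is part of the TEI's own software or DTDs.

```python
# Tagset names follow the wording of this report.  The names BASES,
# MANDATORY, TOPPINGS and compose_tagsets are hypothetical.
BASES = {
    "prose", "verse", "drama", "transcribed speech",
    "letters and memoranda", "dictionaries", "terminology",
}
MANDATORY = ("header", "core")          # the "tomato sauce and cheese"
TOPPINGS = {"linking", "analysis", "feature structures"}  # partial list


def compose_tagsets(base, toppings=()):
    """Return the ordered list of tagsets making up one 'pizza'."""
    if base not in BASES:
        raise ValueError(f"unknown base tagset: {base!r}")
    unknown = set(toppings) - TOPPINGS
    if unknown:
        raise ValueError(f"unknown additional tagsets: {sorted(unknown)}")
    # Header and core are always included; toppings are de-duplicated
    # and sorted so that the result is reproducible.
    return [*MANDATORY, base, *sorted(set(toppings))]


print(compose_tagsets("verse", ["linking", "analysis"]))
```

A request for a base that is not on the menu raises an error, which mirrors the point that users choose from a prescribed variety of options rather than inventing their own.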

The mandatory parts of the TEI's model cover a number of vital 
aspects.  The TEI header consists of an AACR2-compatible 
bibliographic description of an electronic document and its 
sources.  The header also contains standardized descriptions of the 
encoding systems applied, codebooks and other metadata, and the 
document's revision status.  The core tag set provides markup for 
highlighted phrases (e.g. emphasis, technical terms, foreign 
language matter, titles, quotations, mention, glosses etc.), "data" 
(e.g. names, numbers, dates, times, addresses), editorial 
intervention (e.g. corrections, regularizations, additions, 
omissions, etc.) as well as lists of all kinds, notes, links and 
pointers, bibliographic references, page and line breaks, verse and 
drama.  Lou gave some examples of the use of core tags, and how 
to customize the TEI DTD to rename elements, undefine elements 
and select tagsets.

The TEI have also adopted a class system.  Element classes consist 
of semantically related elements which may all appear at the same 
place in a content model.  Attribute classes are made up of semantically 
related elements which share a common set of attributes.  The 
classes are implemented using parameter entities, which 
simplifies documentation and facilitates controlled modification 
of the DTD by the user.  Lou gave an example of how new 
elements could be added to a class using the TEI's technique.  
Unfortunately, due to time constraints, Lou did not present his 
slides on the TEI's use of global attributes, nor how the problems 
of alignment, using identifiers, pointers and links are dealt with.
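
As a rough illustration of why parameter entities make a class easy to extend, the Python sketch below renders an element class as a parameter entity declaration and then adds one element to it.  The class name x.phrase and the element names are invented for this example and are not taken from the TEI DTD; the slides Lou showed are not reproduced here.

```python
def class_entity(name, members):
    """Render an element class as an SGML parameter entity declaration."""
    model = " | ".join(members)
    return '<!ENTITY % ' + name + ' "' + model + '">'


# Hypothetical class of phrase-level elements.
phrase_class = ["emph", "term", "foreign"]
print(class_entity("x.phrase", phrase_class))

# Adding a member changes one declaration only; every content model
# that refers to %x.phrase; picks up the new element automatically.
extended = phrase_class + ["myelem"]
print(class_entity("x.phrase", extended))

# A content model written against the class, not the individual elements.
print("<!ELEMENT p - O (#PCDATA | %x.phrase;)*>")
```

The design point is that the content models never list the class members directly, so a user's modification stays in one documented place.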

Several major scholarly publishers (e.g. Chadwyck Healey, Oxford 
University Press, Cambridge University Press) have begun to 
adopt the TEI's guidelines and implement their techniques.  The 
same is true of a number of large academic projects - such as the 
Women Writers project (at Brown University), CURIA, the 
Wittgenstein Archive, the Oxford Text Archive, etc. - and some 
Linguistic Research and Engineering (LRE) projects (e.g. 
EAGLES, and the British National Corpus).  The TEI's work is 
also being taken up by librarians and archivists (such as in the 
Library of Congress' American Memory Project, at the Center for 
Electronic Texts in the Humanities (CETH), and so on).

Developing a TEI-conformant application still requires the 
essential processes of systems analysis/document design.  Once 
this has been done, the designers can choose the relevant TEI 
tagsets using the "pizza model" approach.  Any restrictions, 
modifications or extensions must be carefully identified and 
documented (to help ensure the usefulness of the tagged data to 
later scholars).  The TEI DTD (i.e. its various constituent tagsets) 
is now available for beta test, and can be downloaded from a 
number of sites around the world (e.g. ftp from 
[] in directories tei/p2/drafts and tei/p2/dtds; or send 
email to containing the single line: sub tei-l 
<Your Real Name>).

Once the testing of the TEI P2 DTD is complete, any revisions 
will be incorporated into a final version of the "Guidelines..." to be 
published as TEI P3.  The next phase of work will involve the 
development of application-specific tutorials (including electronic 
versions), development of appropriate software tools (e.g. TEI-
aware application packages), and the creation of new tagsets to 
extend the TEI's guidelines to support new kinds of application.

3.      Poster Session

Poster sessions formed a much larger part of this year's conference 
schedule than on previous occasions.  The idea behind these sessions 
is that they allow speakers to give a short, informal presentation on 
any topic, after which they are available for questions and 
discussion.  Each speaker can support his/her presentation with 
one or more specially created posters which may consist of 
anything from summary diagrams or a list of points, to the full-text 
of a presentation. 

It would be impossible to provide full details of all the poster 
presentations.  However, given below are the title and extracts 
from the poster abstracts for each of the presentations mentioned 
in the programme.  (N.B. Some poster sessions were put on 
impromptu, and these are not shown below).  The posters were 
loosely grouped into categories.

SGML Transformations:

"From SGML to Acrobat Using Shrinkwrapped Tools" - the 
transformational process the BCS uses to transform its documents 
into PDF files freely distributed over the Internet.  (Sam Hunting, 
BCS Magazine)

"SGML Transform GUI" - describes a language-based, syntax-
independent GUI for SGML structure and style semantic 
transformation, that supports both declarative and procedural 
processing models.  (Michael Levanthal, Oracle Corporation)

"A Tale of Two Translations" - a comparison of the development 
of translation programs using Exoterica's OmniMark, and 
Avalanche's SGML Hammer. (Peter MacHarrie, ATLIS 
Consulting Group)

"Data Conversion Mappings" - mapping old data formats to new, 
using 1 to 1, 1 to many, 0 to 1 and 0 to many conversions, and how 
these mappings can be automated.  (David Silverman, Data 
Conversion Laboratory)

"DTD to DTD Conversion for Producing Braille Automatically 
from SGML" - following the techniques created by the 
International Committee on Accessible Document Design 
(ICADD) to produce braille, large print and voice synthesized 
books from SGML source files. (David Slocombe, SoftQuad)

"Let's Play UnTag!" - untagging an SGML document to get a 
proprietary format.  (Harry Summerfield, Zandar Inc.)

"Introducing Rainbow" - a DTD archetype for representing a 
wide variety of proprietary word processor data formats to 
facilitate proprietary-to-SGML interchange and transformation.  
(Kent Summers, Electronic Book Technologies)

"Converting Tables to SGML" - converting legacy table data 
from typesetting files or formatted visual representation into 
SGML.  (Brian Travis, SGML Associates)

Business Case for SGML:

"Fear and Loathing in SGML: Life After CALS" - an overview 
of a recent study of SGML products and markets.  (Antoinette 
Azevedo, InterConsult)

"Designing Open Systems with SGML" - the business role and 
benefits of using SGML, and how to design an SGML based 
system.  (Larry Bohn, Interleaf)

"The Commercialization of SGML" - a review of the strengths 
and benefits of SGML and its current perception in the 
commercial business world.  (Allen Brown, XSoft)

"SGML: Setting up the Business Case" - an approach to making 
the business case for SGML.  (Eric Severson, Avalanche 
Development Corporation)

"Document Management Lingo: Why Executives Buy SGML" - 
a framework for selling SGML in relation to document 
management.  (Ludo Van Vooren, Interleaf)

How To......

"To 'INCLUDE' or 'EXCLUDE' That is the Question" - the use 
of INCLUSION and EXCLUSION  in DTDs.  (Bob Barlow, 
Barlow Associates)

"Communicating Table Structures Using Word-processor Ruler 
Lines" - a method for writers to indicate the structures of tables 
using simple, mnemonic "ruler line encoding".  (Gary Benson, 
Fluke Corporation)

"Pre-Fab Documents: Modularization in DTD design" - groups 
of related document types are traditionally described in large, 
monolithic DTDs.  If the related document types contain similar 
structures, they can be described as a series of related DTD 
modules.  (Michael Hahn, ATLIS Consulting Group)

"SGML + RFB = Access to Documents" - how Recording for the 
Blind, Inc. (RFB) provides E-Text materials to print-disabled 

"Handling Format/Style Information" - using FOSIs to describe 
formatting/style information, and how to develop FOSIs using the 
Output Specification DTD provided in MIL-M-28001B.  (Denise 
Kusinski and Pushpa Merchant, World Computer Systems Inc.)

"Remodeling Ambiguous Content Models Through ORs" - using 
factorization to avoid ambiguity in model groups resulting from 
improper use of occurrence indicators and OR connectors.  (John 
Oster, McAfee & McAdam, Ltd.)

"Reuse and conditional processing in IBM IBMIDDOC" - [not 
listed in programme] (Wayne Wohler and Eliot Kimber, IBM)

"An easy way to write DTDs" - [not listed in programme] 
([speaker unknown])

Case Studies:

"SGML and Natural Language Processing of Wire Service 
Articles" - the Mitre Corporation's use of Natural Language 
Processing and SGML-tagging to add value to news-wire articles 
and other kinds of document.  (John D Burger, MITRE)

"SGML Databases" - (Mike Doyle, CTMG Officesmiths)

"Active Information Sharing System (AISS) SGML Database API" 
- the applications interface to the AISS SGML database to 
produce a total solution integrated SGML environment.  (Hee Joh 
Lian, Information Technology Institute (ITI) Singapore)

"AISS Document Formatting API" - a processing model of a 
native SGML formatter, the issues involved and architectural forms 
to define them.  (Yasuhiro Okui, NIHON UNITEC CO. LTD.)

"AISS Document Management API" - controlling workflow using 
SGML and HyTime.  (Roger Connelly, Fujitsu; Steven R 
Newcomb, TechnoTeacher)

"Paperless Classroom" - building an interface to large amounts 
of information through the use of SGML-based hypermedia 
technology.  (Barbara A Morris, Navy Personnel Research and 
Development Center)

"Integrating SGML into On-line Component Information 
Delivery" - a comparison of using manual and SGML-based 
processes for database loading.  (Javier Romeu, Info Enterprises 
Inc. - A Motorola Company)

"SGML Support for Software Reuse" - using SGML to markup 
software for reuse (John Shockro, CEA Inc.)

"What is an IETM?" - what constitutes an "Interactive Electronic 
Technical Manual" (IETM)?  The different classes of IETM and 
how to build one.  (Geoffrey von Limbach, InfoAccess 

"Use of SGML to Model Semiconductor Documents" - tagging 
information on electronic components,  which can be used to 
produce printed documents and treated as machine-readable data.  
(Various speakers, Pinnacles Group)

"Producing a CALS-Compliant Technical Manual" - ensuring 
that the DTDs, Tagged Instances, and FOSIs support the users' 
needs for creating CALS compliant Air Force Technical Manual 
Standards and Specifications.  (Susan Yucker and Matthew 
Voisard, RJO Enterprises Inc.)

"SGML/HyTime and the Object Paradigm" - a comparison of the 
object-oriented and SGML/HyTime ways of representing 
information.  (Steven R Newcomb, TechnoTeacher Inc.)

"An object-oriented API to SGML/HyTime documents" - the 
design of some of the HyMinder C++ library (developed by 
TechnoTeacher), and a comparison of SGML/HyTime constructs 
and HyMinder object classes.  (Steven R Newcomb, 
TechnoTeacher Inc.)

"SlideShow:  An Application of HyTime Application" - the 
system design and architectural forms used to create a sample 
HyTime application called SlideShow.  (Lloyd Rutledge, 
University of Massachusetts).

"HyQ: HyTime's ISO-standard SGML query language" - a 
discussion of the main features and advantages of HyQ.  (Steven R 
Newcomb, TechnoTeacher/Fujitsu/ISI)

Technical Gems:

"A Document Manipulation System Based on Natural Semantics" 
- Natural Semantics, its relationship to SGML, and the results of 
some document manipulation experiments.  (Dennis S Arnon, 
Xerox PARC; Isabelle Attali, INRIA Sophia Antipolis; Poul 
Franchi-Zannettacii, University of Nice Sophia Antipolis).

"Digital Signatures Using SGML" - a schema for digital 
signatures of electronic documents using SGML.  (Bernd 
Nordhausen, Chee Yeow Meng, Roland Yeo, and Daneel Pang 
Swee Chee, National Information Infrastructure Division).

"CADE - Computer Aided Document Engineering" - a 
framework of six methodologies for the Document Development 
Life Cycle.  (G Ken Holman, Microstar Software Ltd.)

"Using SGML to Address the 'Real' Problems in Electronic 
Publishing" - using Requirements Driven Development (RDD) 
for the generation of information capture and production 
environments, so as to ensure the balance of the supply and 
demand for data.  (Barry Schaeffer, Information Strategies Inc.)

"Recursion in Complex Tables" - a recursive-row table model, 
that can model the structure of most tables with multi-row 
subheads.  (Dave Peterson)

4.      Reports from the Front

In this session, several speakers briefly outlined major SGML-
related industry activities.

Beth Micksh summarized the history and purpose behind parts of 
CALS MIL-M-28001, dealing with the use of SGML.  The latest 
version, MIL-M-28001B, was made available last summer, and it 
supports the electronic review of documents.  Early in 1994, the 
publication of a "MIL SGML Handbook" is expected; it will cover 
some of the fundamental and general aspects of SGML, as well as 
providing some valuable CALS-specific information.

Terry Allen described the work of the Davenport Group.  He 
outlined their general purpose and main members, full details of 
which are given in the publicly available material circulated by the 
Group.  The DOCBOOK DTD has been the Davenport Group's 
major accomplishment to date - and v2.1 should be ready for 
release in the week commencing December 13th 1993; 
announcements will be posted to the comp.text.sgml newsgroup.

Steve Newcomb talked about CApH, a breakaway group from 
Davenport, which focuses on Conventions for the Application of 
HyTime.  They intend to provide guidance on how to use HyTime 
for anyone who wishes to  adopt ISO 10744 for the interchange of 
multimedia and/or hypertext information.  CApH will provide 
general policies and guidelines on how to tackle typical problems 
(e.g. how to generate a master index for a set of documents which 
all have their own separate indexes), but will not deal with how to 
enforce such policies.

The next speaker discussed the joint efforts of the International 
Press and Telecommunications Council (IPTC) and the 
Newspaper Association of America (NAA) to devise a base set of 
tags for marking up newswire text.  This work will affect not only 
the providers of this kind of information (e.g. companies such as 
Reuters), but also the newspapers and broadcast services which 
make use of it, and the database/archiving specialists (such as 
Mead Data Central) who will want to store it.

Eddie Nelson spoke about the ATA DTDs, which he stressed were 
designed for interchange only.  The ATA DTDs will influence the 
documentation activities of all the manufacturers, component 
suppliers, operators etc. in the commercial aviation industry.  To 
date, the ATA has released 8-9 DTDs, mostly dealing with the 
major types of technical manual.

Dianne Kennedy discussed the Society of Automotive Engineers 
(SAE) J2008 standard, which has been prompted by the emissions 
regulations in the Clean Air Act.  By model year 1998, all new 
automobiles sold in the US must be supported by documentation 
conforming to the J2008 standard.  J2008 is actually a suite of 
standards covering such aspects as the use of text, graphics and the 
interchange of electronic information - it is not just a DTD.  
Also, J2008 includes a data modelling (database) approach, which 
is separate from the actual documentation considerations; 
information required by the former is essentially relational in 
nature, whilst that for the latter is hierarchical.  Careful thought 
has been required to ensure that the documentation DTD will 
sufficiently support mapping into (i.e. populating) the data model.  
The next meeting of those involved in developing J2008 will take 
place in January 1994.
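
The tension between hierarchical documentation and a relational data model can be made concrete with a small Python sketch.  It is not drawn from J2008 itself: the element names and the four-column row layout are assumptions made purely for illustration.  It flattens a nested document tree into rows in which each row records its parent, which is one common way of populating relational tables from hierarchical markup.

```python
from itertools import count


def flatten(node, parent_id=None, ids=None, rows=None):
    """Turn a nested element tree into (id, parent_id, tag, text) rows."""
    ids = count(1) if ids is None else ids
    rows = [] if rows is None else rows
    node_id = next(ids)
    rows.append((node_id, parent_id, node["tag"], node.get("text", "")))
    # Children are visited in document order and point back to this row.
    for child in node.get("children", []):
        flatten(child, node_id, ids, rows)
    return rows


# Hypothetical service-manual fragment; the tags are not J2008 element names.
manual = {
    "tag": "procedure",
    "children": [
        {"tag": "title", "text": "Replace sensor"},
        {"tag": "step", "text": "Disconnect battery"},
    ],
}

for row in flatten(manual):
    print(row)
```

The mapping only works if the DTD guarantees that every piece of information the data model needs is present and identifiable in the markup, which is the constraint the J2008 developers were described as working through.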

5.      "Multi-company SGML Application Standard Development Process" - 
	Bob Yencha (National Semiconductor), Patricia O'Sullivan 
	(Intel Corporation), Jeff Barton (Texas Instruments), Tom 
	Jeffery (Hitachi Micro Systems, Inc.), Alfred Elkerbout 
	(Philips Semiconductors)

This presentation described the work of the Pinnacles Group, a 
consortium of the five major semiconductor manufacturers, to 
develop a common form of electronic product datasheet for 
interchange between themselves and their customers.

Product datasheets are relatively few (typically <10,000 per 
company) and small (typically < 100 pages), but they are very 
complex in terms of their structure and content.  The decision to 
adopt a common, SGML-based electronic solution, was based on 
the need to simultaneously resolve business problems (i.e. collect 
and deliver information efficiently), and respond to market 
pressures (i.e. customers wanted the information quickly and in 
electronic form).

Joint development of the DTD by all the members of the Pinnacles 
Group ensured harmonization, distributed the development costs, 
and encouraged the development of tools for both information 
providers and users.  The speakers repeatedly stressed the 
importance of having access to the knowledge of content experts 
during the document analysis phase, and the benefits of having 
observers to ensure continuity between the various analysis 
sessions that were held.  A cumulative document analysis process 
was strongly recommended.

The draft DTD is due out at the end of 1993.  Following a period 
for review and revision, it is expected to become an industry 
standard by April 1994.  Each individual company still needs to 
consider how it will customize the DTD for its own use, and how 
the standard will be implemented within the company. 

The speakers felt that if members of any other industries were 
considering forming a similar group, companies should join early, 
plan carefully, and ensure that the anticipated benefits are 
continually "sold" to participants and stakeholders throughout the 
entire development process.

6.      "Archetypal Early Adopters? Documentation of the Computer 
	Industry"

6.1     "Information Development Strategy and Tools - 
	IBM and SGML" - Eliot Kimber and Wayne Wohler (IBM)

Wayne Wohler began by describing the "BookMaster Legacy".  
BookMaster is a GML application used by IBM Information 
Developers to create IBM product documentation.  GML is very 
like SGML, although it lacks a concept comparable to the DTDs 
of ISO 8879.  BookMaster is a fairly extensive authoring language, 
which has met IBM's information interchange and reuse 
requirements for several years.  However, now that IBM supports 
more platforms and delivery vehicles (and wishes to interchange 
information with other enterprises), and to answer the growing 
demand from users, IBM has decided to migrate its Information 
Development (ID) operations to SGML.

Eliot Kimber described how this migration is being carried out.  
The procedure first involved the design of a processing 
architecture  (InfoMaster) on which to base the application 
language and semantics (IBMIDDOC).  Tools had to be found for 
authors, editors and users, existing data had to be migrated to the 
new environment, with documentation and educational materials 
being developed along the way.

InfoMaster is an architecture for technical documentation that 
defines the base set of element classes for technical documents; it 
defines how DTDs should specify the semantics of such 
documents and how programs should use the information.  IBM 
drew on the HyTime concept of Architectural Forms to 
standardize the application semantics, and to facilitate the 
interchange of information between different DTDs.

IBMIDDOC is based on industry standards, and was designed 
without bias towards any particular processing model.  It makes 
use of explicit containment to describe all containing 
relationships and uses elements as the basis of all processing 
semantics.  All relationships between elements are treated as 
(HyTime-conforming) hyperlinks. IBMIDDOC does not use 
inclusions or exclusions, short references, or #CURRENT 
attributes.

Documents conforming to IBMIDDOC are organized into 
conventional high level structures (e.g. prolog, front-, body- and 
back-matter), each of which can contain recursive divisions.  
Below the division level, elements are classified either as 
information unit elements (e.g. paragraphs, lists, figures etc.), or as 
data pool elements (e.g. phrases and other flowed material).  
IBMIDDOC supports multimedia objects, and hyperlinks that can 
either be cross references or explicit hyperlink phrases.  Any 
element can be a hyperlink source or target anchor, and 
IBMIDDOC also supports HyTime's nameloc, dataloc, and name 
query features.

6.2     Eve Maler (Digital Equipment Corporation)

(I missed this session, as I was slogging round Boston trying to get 
my ailing laptop repaired.)

6.3     "Implementation of a Corporate Publishing System" 
	- Beth Micksh (Intergraph)

(I also missed this session, as I was still slogging round Boston 
trying to get my ailing laptop repaired - this brief write up is 
based upon the copy of Beth's overheads included in the 
conference proceedings).

Intergraph has 15 documentation departments with 120 full-time 
writers and a number of third party documentation suppliers.  They 
publish and maintain over 400,000 pages per year at an annual 
cost of $20 million.  The objectives behind developing and 
implementing a corporate publishing system were to standardize 
and facilitate the document creation and maintenance process, and 
to create a corporate documentation production system.

The new system would be required to provide a standard 
documentation source format capable of supporting four different 
document types.  The system would need to be robust enough to 
handle the production of large software and hardware manuals, (in 
all the required data formats),  and also facilitate the reuse of 
source data in a multi-platform environment.  In addition, it should 
also make possible the provision of on-line information, allow the 
translation of existing system data, and support multilingual 

SGML was the obvious solution to many of these problems, and 
was adopted accordingly.  The anticipated benefits to both the 
corporation and to users were comparable to those that have been 
outlined previously in other presentations (i.e. cost savings and 
improved productivity, consistent document structures, greater 
information interchange and re-use etc. etc.).  The new system was 
implemented in a two phase process - the first phase designed to 
prove the principal concept(s); the second to produce the 
production ready system.  Development was done jointly by three 
divisions (systems integration, electronic publishing and corporate 
publishing services) using a modular approach.

One DTD provides the structure necessary to support all the 
required variants of Intergraph user documentation.   Filters have 
been developed to allow the conversion of legacy data from 
existing tagged ASCII and FrameMaker formats.

The success of the introduction of the new system has depended 
upon the cooperative efforts of all concerned: the work of 
developers, input from users, and support from management.

6.4     Jon Bosak (Novell)

(I also missed this session.)

7.      International SGML Users' Group Meeting

This was the mid-year meeting of the International SGML Users' 
Group (ISUG), the AGM having been held at SGML Europe '93 in 
Rotterdam earlier this year.

After a welcome from Pam Gennusa (President of the ISUG), 
representatives from many of the Users' Group National Chapters 
addressed the meeting.  Most of the Chapters claimed a 
membership of around 45-70 individual members and a handful of 
corporate members.  Several people took the opportunity to 
announce the recent formation of new National/local Chapters 
(e.g. in Denmark, Sweden, US Tri-State, US Northern California), 
whilst others expressed an interest in setting up such groups (e.g. 
in US Alabama/South East [Beth Micksh] and Ottawa [Ken 
Holman]).  Pam announced that Richard Light, an independent 
consultant based in the UK, had taken over from Francis Cave as 
the Treasurer of ISUG following a vote by the ISUG Committee.

Some Chapters had been extremely active in the preceding twelve 
months, staging numerous well-attended events (often vendor-
oriented).  The first event staged by the newly formed Swedish 
Chapter attracted around 200 attendees to a special one-off SGML 
conference, although they do not yet have a clear idea of the size 
of their ordinary membership.

Several Chapter representatives/members reported feelings of 
apathy within their groups.  Some Chapters had held only one or 
two events in the past year and were having difficulties attracting 
active members or developing programmes which would re-
enthuse existing members to support Chapter activities.  Members 
from those Chapters which had organized several successful 
events outlined the strategies and policies that they had used, and a 
brief discussion ensued for the benefit of existing and newly forming 
Chapters.

Pam spoke briefly about the software which has been released 
through the ISUG (i.e. the ARCSGML Parser Materials, and  the 
IADS software).  She wished to re-emphasize that any software so 
released is not in any way endorsed, approved or checked by the 
ISUG.  The ISUG does not have the resources to undertake 
software investigation or evaluation, but is willing to  consider 
facilitating the distribution of any software which might be of 
interest to members of the SGML user community.  Pam also 
outlined the relationship that has been established between ISUG 
and the SGML Open industry consortium; the ISUG has a non-
voting place on the committee of SGML Open, and has put 
forward a proposal that ISUG members may be willing to 
participate in a case study exercise for SGML Open.

Brian Travis reminded all present that the monthly newsletter 
"<TAG>" is available at a special subscription rate to ISUG 
members.  Anyone interested should contact him for more details 
(Phone: +1 303-680-0875  Fax: +1 303-680-4906).  Daily editions 
of the <TAG> newsletter were also being circulated throughout 
the duration of the conference.

The next meeting of the ISUG will be the AGM, to be held at 
SGML Europe '94 (Montreux, Switzerland, May 15-19th 1994).

8.      "Real World Publishing with SGML" - Terry Allen 
	(Digital Media Group, O'Reilly & Associates, Inc.)

O'Reilly's online "Whole Internet Catalog" first appeared in 
printed form, produced using a troff-based approach.  The 
structure of the information was very loose, with the title being the 
only common structural element shared between all the entries.

Terry described how he managed to develop an online version of 
the "Whole Internet Catalog" by using a combination of sed and 
awk scripts to translate the source files into versions tagged with 
HTML markup.  HTML, the Hypertext Markup Language, is a 
DTD developed for providing information on the World Wide 
Web (WWW) - a global network of information servers linked 
together over the Internet.

HTML was first designed as a tag set, and although it can be used 
as a DTD there is no requirement that it should be.  The HTML 
aware browsers which are used to access information on WWW 
tend to treat HTML as if it is a procedural (rather than a 
descriptive) markup language.  It is left to the creator of the 
information to decide whether or not to validate markup against 
the HTML DTD, as current implementations of the browsers do 
not parse any document they process, and are very (perhaps too) 
forgiving of any markup discrepancies.  HTML is fundamentally 
lacking in terms of imposing any strong structural conventions (the 
occurrence and ordering of many elements are often optional), and 
its designers appear to have made some rather surprising decisions 
- such as the use of an empty <P> element to indicate the breaks 
between paragraphs.

Terry described how he has decided to filter all his files into a 
more constraining but still simple DTD.  The source files for the 
"Whole Internet Catalog" are now more like records in a database 
than narrative text files, but they can be easily converted (using 
OmniMark and an awk script) into HTML.  This approach has 
proved successful up till now, because Terry has been working 
alone to maintain and process the data.  New authors for the 
Catalog will need to be trained and provided with tools to support 
authoring with Terry's DTD.  Limited testing has shown that the 
most successful approach to adopt with new authors is likely to 
involve the use of SoftQuad's Author/Editor and a template file of 
tags to generate the source information.  Terry said that his 
experiences had shown that the authoring/editing process was less 
likely to be error prone if HTML attributes were actually 
represented as ordinary elements in his DTD - and if good use 
was made of display facilities (e.g. use of fonts and colour) to 
facilitate at-a-glance structural checks by humans.  He has also 
adopted a macro-based approach to map the source SGML files 
into gtroff  for printing on paper - although developing more 
robust filters may be required in the future.
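
The idea of holding entries as database-like records, with values 
that HTML carries in attributes held as ordinary fields, can be 
illustrated with a minimal sketch.  It is written in Python rather 
than the OmniMark and awk that Terry actually used, and the field 
names (title, url, description) are invented for the example; they 
are not taken from his DTD.

```python
from html import escape

# Hypothetical record for one catalog entry. The field names are
# illustrative only and are not taken from Terry Allen's DTD.
entry = {
    "title": "Example Gopher Archive",
    "url": "gopher://gopher.example.org/",
    "description": "Papers & reports on structured documents.",
}


def entry_to_html(record):
    """Render one record as an HTML fragment.

    In the source record the URL is an ordinary field (an element in
    the authoring DTD); it only becomes an HREF attribute at
    conversion time.
    """
    title = escape(record["title"])
    url = escape(record["url"], quote=True)
    description = escape(record["description"])
    return (
        f'<H2><A HREF="{url}">{title}</A></H2>\n'
        f"<P>{description}</P>"
    )


print(entry_to_html(entry))
```

Keeping the URL as a visible field means an author (or a human 
checker) sees it as content in the editor, which is the kind of 
at-a-glance check Terry described.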

Terry felt that the lessons he had learned were that the production 
and handling of SGML source is (currently) likely to be done in an 
heterogeneous environment (i.e. authoring using one set of tools, 
linking and processing using another, and so on).  He is still 
looking for good, cheap tools which will support the use of 
arbitrary DTDs, but an ideal future tool might be a sophisticated 
browser (such as Xmosaic) which also provided an authoring 
mode which supported user-supplied DTDs.  Terry reported that 
he has also been closely following the development of HTML+, a 
revised version of HTML, which might provide a more 
robust/constraining DTD.

9.      "HyTime Concepts and Tools" - Dr. Charles 
	 Goldfarb (IBM)

Dr. Goldfarb began by providing a very brief overview of SGML, 
and the advantages to be gained from its use.  He showed a simple 
example of some typical SGML markup, and then discussed the 
impact of SGML since its release as an ISO standard in 1986.  
SGML has become the dominant tool for producing structured 
documents, and has been widely adopted by industry, government 
and education.  However, the real impact of SGML is that 
information owners, not programs, now control the format of their 
data; information creators/providers are no longer at the mercy of 
the proprietary solutions developed by vendors.

Widespread adoption of SGML will in turn encourage the 
creation of Integrated Open Hypermedia (IOH) information and 
systems.  IOH information is integrated in as much as all 
information is linkable, whether or not it was specially prepared 
with linking in mind.  It is "Open" because the addressing of the 
linked location is not bound to a physical address until the link is 
"traversed" and the "anchor" accessed.  "Hypermedia" represents 
the union of hypertext (information which can be accessed in a 
random order) and multimedia (information communicated by 
more than one means, e.g. text + graphics + audio + animation 
etc.).

HyTime, the Hypermedia/Time-based Structuring Language 
(ISO/IEC 10744), is an application of SGML that has been 
developed to enable IOH.  HyTime standardizes the most useful 
and general constructs and concepts to do with linking 
hypermedia/time-based information.  It facilitates hyperlinking to 
any  information object (whether or not it is SGML), has a rich 
model for representing time, and supports an isomorphic 
representation of time and space.  The success of SGML and 
HyTime is being driven by the fact that users are now demanding 
products that use real Standards, rather than those which merely 
offer proprietary solutions.

HyTime standardizes the use of SGML for hypertext and 
multimedia.  It provides sets of standardized SGML attributes  
(called "architectural forms"), which convey the information used 
by the useful/general hypermedia constructs mentioned in the 
paragraph above.  Architectural forms can be recognized by 
suitable processing software, and decisions or actions taken on the 
basis of the values of the attributes.  HyTime extends SGML 
hierarchical structures by facilitating the lexical modelling of data 
content and attribute values, and also by supporting inheritance of 
attribute values.  HyTime extends SGML hyperlinking (ie. 
IDREF) capabilities, and adds co-ordinate structures (called Finite 
Co-ordinate Spaces) to handle the alignment and synchronization 
of information objects in time and space.  Using HyTime means 
that we no longer have to deal with a single SGML document, but 
can make seamless use of whole libraries of documents or pieces 
of documents.
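
The way processing software can act on an architectural form rather 
than on an element name can be sketched roughly as follows.  This is 
an illustration only, not code shown in the presentation: the 
element names "xref" and "seealso" are invented, and the elements are 
represented as plain Python dictionaries rather than the output of a 
real parser.  It assumes the usual HyTime convention of an attribute 
named "HyTime" whose value names the form (here "clink", with its 
"linkend" attribute).

```python
# Elements as they might come out of two different DTDs. The element
# names ("xref", "seealso") are invented; what they share is a
# "HyTime" attribute naming the architectural form they conform to.
elements = [
    {"name": "xref", "attrs": {"HyTime": "clink", "linkend": "fig1"}},
    {"name": "seealso", "attrs": {"HyTime": "clink", "linkend": "sec2"}},
    {"name": "para", "attrs": {}},
]


def handle_clink(element):
    # A contextual link: report the ID it points at.
    return f"link from <{element['name']}> to {element['attrs']['linkend']}"


# Processing is keyed on the architectural form, not the element name.
HANDLERS = {"clink": handle_clink}


def process(element):
    """Dispatch on the form named by the HyTime attribute, if any."""
    form = element["attrs"].get("HyTime")
    handler = HANDLERS.get(form)
    return handler(element) if handler else None


results = [process(e) for e in elements]
print(results)
```

Both link elements are handled by the same code even though their 
DTDs give them different names, whilst the plain paragraph is 
ignored.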

Dr Goldfarb then talked about the development and release of two 
Public Domain HyTime tools: ObjectSGML and POEM.  
ObjectSGML is an Object-Oriented SGML parser which supports 
incremental parsing, entity structures as they were originally 
envisaged during the development of ISO 8879, and the processing 
LINK feature.  It also offers native HyTime support - validating 
architectural forms, handling location addressing, and processing 
HyQ queries and properties.  The source code will be made 
publicly available for free.

POEM, the Portable Object-oriented Entity Manager, provides a 
platform-independent interface to real physical storage.   It 
supports ISO/IEC 9070 public identifiers for universal object 
identification, ISO/IEC 10744 Standard Bento (SBENTO) and ISO 
9069 the SGML Document Interchange Format (SDIF).  It has no 
parser dependencies, so it can be used with any of the existing 
SGML parsers.  As with ObjectSGML, the source code will be 
made publicly available for free.

ObjectSGML and POEM are the results of Project YAO, 
conducted by the International consortium for free SGML 
software development kits (SDK).  Participants included Yuan-ze 
Institute of Technology (Taiwan, ROC), IBM (in both the US and 
France), Naggum Software (Norway) and TechnoTeacher Inc (in 
the US).  Between them, the development team had extensive 
experience of developing SGML-aware tools and systems.  
Implementations of SDK will use the standard C++ class library, 
be entirely platform-independent, and be based on proven 

The architecture of ObjectSGML is built around a low-level 
"parser event" API, a variable-persistence cache, a high-level 
"information object" API, and uses POEM for entity management.   
Most existing SGML parsers use the low-level "parser event" 
approach (ie. passing all start/end tags, attributes, data etc. when 
found, whilst only retaining the current structural context).  The 
use of a high-level "information object" API means ObjectSGML 
will provide access to both the element and entity structure of a 
document; addressing can be done using the HyTime location 
model ("proploc" and HyQ).  The variable-persistence cache will 
be maintained by the application in a proprietary format; it will 
allow rapid access to information found by the parser event API, 
and can be optimized for an application.  Using such a cache 
avoids re-parsing, since it will hold such values as "next event", 
"element", "entity", "parsing context" etc.

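The difference between the two styles of API can be sketched in a 
few lines.  This is a hypothetical illustration, not ObjectSGML's 
actual interface: the event tuples, the Node class and the find 
function are all invented for the example.  The event list stands 
for what a "parser event" API delivers; the tree built from it 
stands for the retained "information objects" that can be queried 
later without re-parsing.

```python
# Event stream of the kind a low-level "parser event" API delivers.
# The tuple layout is hypothetical, not ObjectSGML's real interface.
EVENTS = [
    ("start", "doc", {}),
    ("start", "title", {"id": "t1"}),
    ("data", "SGML '93"),
    ("end", "title"),
    ("start", "para", {}),
    ("data", "Conference report."),
    ("end", "para"),
    ("end", "doc"),
]


class Node:
    """One retained element: name, attributes and ordered content."""

    def __init__(self, name, attrs):
        self.name, self.attrs = name, attrs
        self.children = []  # child Nodes and text strings, in order

    def text(self):
        return "".join(
            c if isinstance(c, str) else c.text() for c in self.children
        )


def build_tree(events):
    """Consume the events once and keep the whole structure."""
    root, stack = None, []
    for event in events:
        kind = event[0]
        if kind == "start":
            node = Node(event[1], event[2])
            if stack:
                stack[-1].children.append(node)
            else:
                root = node
            stack.append(node)
        elif kind == "data":
            stack[-1].children.append(event[1])
        elif kind == "end":
            stack.pop()
    return root


def find(node, name):
    """Return matching elements (including node itself) in document order."""
    found = [node] if node.name == name else []
    for child in node.children:
        if isinstance(child, Node):
            found.extend(find(child, name))
    return found


tree = build_tree(EVENTS)
print(find(tree, "title")[0].text())
```

An event-only consumer sees each tag once and keeps just the current 
context; the retained tree plays the part of the cache, at the cost 
of holding the structure in memory.
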
Alpha test versions of the software were shipped to test sites last 
Thursday.  The results of testing will determine what revisions 
may be required.  However, the current intention is that both 
ObjectSGML and POEM should be publicly available no later 
than the end of the first quarter of 1994.

10.     "HyTime: Today's Toolset" - Dr. Charles Goldfarb 
	(IBM) and Erik Naggum (Naggum Software)

SGML was developed for the benefit of information owners.  It 
requires that information representation should be independent of 
applications and systems, which means that more than one 
representation is always necessary: an abstract (logical) 
representation, one or more perceivable presentations, and an 
internal storage representation.  The real storage of information is 
obviously platform dependent at some stage, but SGML liberates 
information from such dependencies through the use of entities 
and an entity manager.

SGML entities can take a variety of forms.  External identifiers, 
SGMLall declare (and some also reference) entities.  A system 
identifier is really a "locator" in as much as it specifies the 
physical location where an entity can be found; it involves the use 
of a Storage Object Specification (SOS), and a storage manager.  
There is no requirement that an entity should occupy all the 
content of a storage object, so the manager must be able to extract 
substrings (as well as handle things like record boundary insertion 
and omitted system identifiers).

The use of public identifiers means that registered and formally 
identified entities can be recognized by conforming SGML 
systems.  The practical implementation of the formal registration 
procedures outlined in ISO 9070 has yet to  be finally sorted out 
by the selected registration authority (the GCA).  Until this is 
done, it is perfectly possible to formally identify public entities 
using ISBNs - and this approach has already been adopted by 
companies such as IBM.  It should also be remembered that the 
entities associated with public identifiers can also exist in different 
versions; for example some SGML software requires that DTDs 
are precompiled before the system can use them.
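
As a rough illustration of what "formally identified" means in 
practice, the sketch below splits a formal public identifier into 
its parts, following the general ISO 8879 layout (owner identifier, 
then public text class and description, then language, each 
separated by "//").  It is a simplification: it ignores the 
"unavailable text" indicator, and the ISBN-based identifier in the 
usage example is made up, written on the assumption that an ISBN is 
given as a registered ("+//") owner.

```python
from typing import NamedTuple, Optional


class PublicId(NamedTuple):
    owner_type: str  # "ISO", "registered" or "unregistered"
    owner: str
    text_class: str  # e.g. "DTD", "ENTITIES"
    description: str
    language: str  # for CHARSET text this field is a designating sequence
    display_version: Optional[str]


def parse_fpi(fpi):
    """Split a formal public identifier into its components.

    Simplified sketch: the unavailable-text indicator is not handled.
    """
    parts = fpi.split("//")
    if len(parts) < 3:
        raise ValueError(f"not a formal public identifier: {fpi!r}")
    if parts[0] == "+":
        owner_type, owner, rest = "registered", parts[1], parts[2:]
    elif parts[0] == "-":
        owner_type, owner, rest = "unregistered", parts[1], parts[2:]
    elif parts[0].startswith("ISO"):
        owner_type, owner, rest = "ISO", parts[0], parts[1:]
    else:
        raise ValueError(f"unrecognized owner identifier: {fpi!r}")
    if len(rest) not in (2, 3):
        raise ValueError(f"not a formal public identifier: {fpi!r}")
    text_class, _, description = rest[0].partition(" ")
    version = rest[2] if len(rest) == 3 else None
    return PublicId(owner_type, owner, text_class, description, rest[1], version)


# Hypothetical ISBN-based identifier, for illustration only.
example = parse_fpi("+//ISBN 0-00-000000-0//DTD Example Manual//EN")
print(example)
```

An entity manager could use the parsed owner and description as a 
key when deciding which stored version of the entity (for example a 
precompiled DTD) to hand to the parser.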

Erik Naggum spoke briefly about Formal System Identifiers 
(FSIs).  An FSI lists a Storage Object Specification, giving details 
of the storage system type, storage object identifications, record 
boundary indicators, and substring specifications.  The record 
boundary indicator was felt to be necessary, because many people 
now move files between DOS, Unix, and Mac systems, and each of 
these has a different way of indicating record boundaries.  Erik 
showed an example of how files on two different types of systems 
could be concatenated and passed to an Entity Manager as if they 
were a single unit.
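
The effect Erik described can be sketched as follows.  This is not 
FSI syntax and not the POEM interface: the StorageObject class, its 
field names and the record-based "substring" are assumptions made 
for the example.  It shows only the principle that, once each 
storage object declares its own record boundary convention, an 
entity manager can hand the parser one uniform sequence of records.

```python
from dataclasses import dataclass
from typing import Optional

# Record-boundary conventions of the three platforms named in the text.
BOUNDARIES = {"dos": b"\r\n", "unix": b"\n", "mac": b"\r"}


@dataclass
class StorageObject:
    """Hypothetical stand-in for one storage object specification."""

    data: bytes  # raw bytes as stored on the source system
    convention: str  # key into BOUNDARIES
    start: int = 0  # assumed substring: index of first record to use
    count: Optional[int] = None  # assumed substring: number of records


def records(obj):
    """Split one storage object into records, applying any substring."""
    recs = obj.data.split(BOUNDARIES[obj.convention])
    if recs and recs[-1] == b"":
        recs.pop()  # a trailing boundary does not start a new record
    end = None if obj.count is None else obj.start + obj.count
    return recs[obj.start:end]


def read_entity(objects):
    """Concatenate several storage objects into one list of records."""
    out = []
    for obj in objects:
        out.extend(records(obj))
    return out


dos_file = StorageObject(b"one\r\ntwo\r\n", "dos")
mac_file = StorageObject(b"three\rfour\rfive", "mac", start=1, count=1)
print(read_entity([dos_file, mac_file]))
```

The caller never sees which convention each file used; that 
knowledge stays with the entity manager.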

Interchange facilities involve a separation of the virtual entity 
management from the real physical storage entity management.  
SDIF (ISO 9069) details how SGML objects can be combined into 
a single stream for the purposes of interchange.  SBENTO is 
described in the HyTime standard (ISO/IEC 10744) and whereas 
conventional BENTO uses a directory-based approach to control 
the packing of objects into a single stream, SBENTO uses SGML 
entities to make the process simpler.  It is also possible to 
interchange SGML objects packaged using conventional archiving 
tools (e.g. PKZIP).

Good entity management is very important to the success of any 
SGML-based system.  Entity structure is a "virtual storage system" 
which isolates the element structure from system dependencies, 
and allows storage-related properties and processes.  Entity 
structure is (literally) the foundation of SGML; it supports both 
SGML-aware and non-SGML aware storage and programs, and 
also allows the successful interworking of both types.

The soon-to-be released POEM (Portable Object-oriented Entity 
Manager) announced in Dr Goldfarb's previous presentation, 
implements all the principles of good entity management.  A copy 
of the POEM specification (version 1.0, alpha level 1.0) was 
distributed to attendees.

11.     "Charles Goldfarb, Please Come Home: SGML, the 
	Law, and Public Interest" - Ina Schiff (Attorney at 
	Law) and Allen H Renear (Brown University) 

(Ina Schiff was unable to give her part of this presentation, which 
was given on her behalf by another speaker.)

The conventional wisdom is that SGML is best-suited to 
long-lived, structured documentation such as technical manuals.  
However, the purpose of this presentation was to suggest that it 
could be effectively applied to the handling of structured legal 
documentation of the sort regularly produced by attorneys on 
behalf of their clients.

Attorneys are used to researching and supplying the information 
content that comprises the key parts of most legal documents.  
However, they generally leave the structuring of the information 
and the inclusion of legally required ("boilerplate") text to 
paralegal and secretarial staff.  Adopting an SGML-based 
approach to the creation of such structured documents would mean 
that attorneys would no longer have to rely on other members of 
staff to correctly structure their texts and include required 
elements etc.  A well-featured SGML-based system could easily 
provide a good authoring and editing environment in which to 
create and revise these kinds of documents.

It is possible to imagine a future in which electronic multimedia 
structured documents are acceptable as submissions in court.  If 
such documents were also archived in the main legal text 
databases, it would greatly facilitate the generation, delivery, 
interchange and reuse of legal information.  This would represent 
a better service to clients, and hopefully lessen the tremendous 
amount of paperwork which is currently required as part of the 
legal process.

Allen Renear strongly argued for the collaborative development of 
any DTDs for the sorts of documents mentioned above.  He 
suggested that the legal community could usefully benefit from 
modelling the approach adopted by the academic community when 
developing the Text Encoding Initiative's (TEI) Guidelines.  

Allen gave an entertaining account of how he had been called as 
an expert witness to defend Ina Schiff's use of SGML to produce 
her own structured documents, after an opponent who had lost a 
case to Ina was contesting the size of her fees (having had costs 
awarded against them).  The case against Ina suggested that by 
entering the data content herself, she had actually been performing 
a secretarial task, and so the work should have been charged at an 
appropriate rate; they also alleged that Ina had made substantially 
less-than-average use of paralegal and secretarial assistance when 
preparing her case.  Allen argued that her use of an SGML-based, 
structured document authoring environment had allowed her to get 
on with the job of producing information content, and to do it 
more efficiently;  Ina won the case.  Allen said he would now be 
looking forward to the time when attorneys could be taken to court 
for not using SGML when preparing documents for a case.

12.     "Online Information Distribution" - Dave Hollander 
	 (Hewlett Packard)

There are several current barriers to the delivery of online 
information.  Authors only have a limited selection of authoring 
tools, they are not used to reusing information, and the information 
they receive/deliver can be inconsistent.  Publishers have no 
standard tools to process the information they receive from 
authors, whilst customers have specific hardware and software 
requirements, which may be unique to them.  This means that 
throughout the process of developing online information, different 
convertors are  required for every environment - and this creates 
particular difficulties for large companies like HP, who now 
produce 3-4 gigabytes of information each month.

The kinds of information that might be distributed online vary 
considerably.  A typical list might include such things as: context-
sensitive and task-oriented application help, online reference 
manuals, multimedia (graphics, audio, video), hypertext, 
information that is conditional on the current environment, history 
or other factors, and so on.

HP have come up with a short term solution, which is not ideal in 
SGML terms, but fits their purpose.  The Semantic Delivery 
Language (SDL) is a delivery format defined by a DTD; it 
provides an intermediate language/format in which to deliver 
information, and it also facilitates tool development and 
information reuse.  SDL's development was entirely driven by 
practicalities, rather than a wish to experiment with SGML-based 

Achieving good performance was the number one issue, so 
documents were broken down to allow for multiple entry points.  
The designers of SDL also had to give up some SGML features 
(e.g. markup minimization), use pre-calculated counters for 
chapters etc., use a DTD which would work with simple/cheap 
parsers, and use a small number of elements.  Certain parts of the 
documents (e.g. the table of contents, indexes etc.) are pre-
computed before display to improve performance.
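
The pre-computation idea can be sketched as follows.  The element 
names and the nested-tuple representation are invented for the 
example and are not taken from the SDL DTD; the point is only that 
numbering and the table of contents are worked out once, when the 
document is built, so that a simple viewer has nothing to calculate 
at display time.

```python
# A document as nested (element, title, children) tuples. The element
# names are hypothetical and are not taken from the SDL DTD.
DOC = ("book", "User Guide", [
    ("chapter", "Installing", [
        ("section", "Requirements", []),
        ("section", "Setup", []),
    ]),
    ("chapter", "Troubleshooting", []),
])


def precompute_toc(node, prefix=()):
    """Return [(number, title)] for every division below node."""
    entries = []
    for index, (_, title, children) in enumerate(node[2], start=1):
        number = prefix + (index,)
        entries.append((".".join(map(str, number)), title))
        # Recurse with a placeholder node that carries only the children.
        entries.extend(precompute_toc((None, None, children), number))
    return entries


TOC = precompute_toc(DOC)
for number, title in TOC:
    print(number, title)
```

The resulting list would be stored with the delivered document, 
trading a little build time and file size for faster display.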

SDL had to be flexible.  The system had to support font and page 
specifications which are separated out from the document content 
(but which allow flexible displays).  A normalized set of semantics 
is included, although the designers also allowed fourteen different 
types of system dependencies [?].  There are a variety of link 
types, but SDL is not (yet) HyTime compliant.  A version id is 
placed on all containing elements to support version 
control/viewing if the tools being used are powerful enough.

SDL is intended to provide a structure which can help a reader get 
to the right information at the right time.  Source semantic 
identifier attributes allow individual and/or groups of elements to 
be easily identified - but this approach is not ideal, and HP may 
adopt a HyTime based approach in future.  SDL's designers put 
a lot of thought into the development of filters; from the first, SDL 
was planned with filtering in mind.  The inclusion of display level 
semantics facilitates filtering from SDL, and from other 
procedural formats that are still being used.

SDL is a complete information model, in that it includes 
formatting (DSSSL-type) information as well as structuring 
information.  SDL hierarchical modelling also makes it easier for 
non-SGML aware programmers to develop tools quickly, and gets 
them interested in the concepts of SGML.  However, the real value 
of SDL lies in the fact that HP's customers can now get consistent 
information distributed online.

13.     "Xhelp: What? Why? How? Who?" - Kent Summers 
	(Electronic Book Technologies)

Xhelp is a standard online help solution for the Unix/X-Windows 
environment. Xhelp was not produced by EBT in isolation, but is 
the result of collaboration between numerous X/Unix developers.

The current situation is that each vendor often has their own 
solution to providing online help - which makes the whole 
situation very complex, and makes life difficult for the end users.  
Current solutions to this problem either provide less effective help, 
or are more expensive.

The people involved in producing online help (writers, designers, 
programmers, managers etc.) are not always able to collaborate, 
and any new solution has to bear this in mind.  Online help must 
be consistent, and should exist as a service which is separate from 
the client applications.  Applications and online help often have 
different release cycles, so online help should remain uncoupled 
from applications in order to support incremental updating of 
content.  Moreover, any solution to the problems of providing 
online help must be truly cross-platform for both applications and 

The Xhelp solution makes use of the existing standard 
communication layer in X.  Information representation and display 
favours using SGML, but the display system also supports 
PostScript, ASCII etc. to allow for the easy inclusion of legacy 

Kent then spoke about the Xhelp architecture (which separates the 
communication logic, data representation, formatting and display 
tool layers), and the Xhelp protocol.  He described the various 
parameters that comprise the Xhelp client message content, and 
outlined the advantages to be gained from using the Xhelp 
architecture in relation to the Xhelp protocol and Xhelp.dtd.  
Advantages to be gained from the Xhelp procedural approach 
included such factors as: programmers no longer have to do 
anything to support on-line help; time of writer/programmer 
collaboration is reduced; the fallback algorithm works to provide 
some help, even if the user's query cannot be answered directly; 
supports context-sensitive and task-oriented help; offers "fill-in-
the-blank" templates for linking to help information.  The 
Xhelp.dtd means that authors gain reusable skills, since they 
become familiar with a single content model and set of authoring 
tools.  Other advantages of Xhelp included: no installation, and 
help can be distributed independently of applications; good 
performance; encourages a good environment for 

Kent talked through the online help production process from the 
programmers' and writers' points of view, comparing the 
traditional approach with an approach based on the use of Xhelp.  
The Xhelp approach requires far fewer steps (4 as opposed to 12) 
and the programmers' input is reduced to zero.  Xhelp provides 
numerous benefits to Unix systems vendors, independent software 
vendors, and to end users.  It provides a cost-effective 
solution, with positive benefits for all involved.

(The Xhelp developers forum and DTD are maintained on the 
O'Reilly & Associates ftp site.)

14.     "Digital Publications Standards Development (DPSD): 
	 A Modular Approach" - Horace Layton (Computer 
	 Sciences Corporation)

DPSD is a three phase programme to streamline and modernize the 
acquisition, preparation, production and distribution of 
information for the US Army.  The developers hope to have the 
finished standard available next summer.

The MIL-STD-361A standard is the flagship product of the DPSD 
programme.  It is task-oriented, and consolidates six former 
technical manual standards into one.  The DTD eliminates 
chapters, sections, and most paragraph numbering requirements, 
and focuses on structuring information content rather than style or 
formatting issues.  Horace talked through the evolution of MIL-
STD-361A, and the products produced by the end of DPSD phase 
II (in June 1993) - which included several DTDs and a number 
of FOSIs.  He then looked at the programme for DPSD phase III.

The concept behind MIL-STD-361A is that the majority of 
maintenance manuals, regardless of the level of  maintenance, 
contain similar functional requirements.  Knowledge of technical 
manuals and analysis of technical manual data allows one to group 
functional requirements into modules of information.  Creation of 
DTDs for these modules allows use of the same modules wherever 
the same functional requirements are imposed.  There is only one 
DTD requirement for each technical content volume.  The 
approach can be used for all levels of maintenance (operator, unit, 
direct support/general support, and depot).  A single DTD is used 
to assemble all the required modules into a complete technical 
manual (TM).

The DTDs in MIL-STD-361A are content driven, and comply 
with both ISO 8879 and MIL-M-28001.  The DTDs are quite 
small in size (8-10 pages), and currently only seven DTDs cover 
all the Army's TM requirements contained in MIL-STD-361A.  
The DTDs are intended to be easy for authors to understand and 

Horace showed a diagram of the MIL-STD-361A technical manual 
structure, and another giving an overview of the MIL-STD-361A 
concept.  He then briefly described the relationship between the 
MIL-STD-361A and MIL-D-87269 (IETM - Interactive 
Electronic Technical Manual) standards.  Horace closed with a 
description of the field test/proof of concept objectives, and the 
test sites and schedule.  Testing is due to finish by the end of April 
1994, with the final document ready for submission by May/June.  
The DPSD programme means that the US Army is well on its way 
to achieving a full digital delivery capability.

15.     "A Practical Introduction to SGML Document 
	Transformation" - David Sklar (Electronic Book 

This presentation looked at the requirements for, and features of, 
an "SGML transformation engine" which David likened to "the 
Swiss Army pocket knife of the SGML community" (i.e. an 
extremely useful, multi-featured tool - although this description 
somewhat loses the humour of David's remarks).

David began by proposing his personal nomenclature for 
distinguishing between "conversions" and "transformations".  
Conversions are "Up and Lexical" in that they involve the 
(upward) conversion of non-SGML unstructured source 
information into SGML, on the basis of a content-identified (i.e. 
lexically-based) process; it is difficult to achieve 100% accuracy 
during conversion.  Transformations are "SGML and Down" in 
that they involve the (downward) translation of SGML, content-
identified, validated data into a non-SGML form; it is possible to 
achieve 100% accuracy during these types of transformation.  This 
presentation focused on SGML transformations (rather than 
conversions).

A dreaded "chasm" exists between the optimal in-house DTD, and 
the processor needs/distribution DTD.  An optimal in-house DTD 
is based on a content-driven design, is formatting-free, and 
supports omission of generatable data.  A "processor needs" DTD 
is useful for people who want to output hardcopy or to publish 
online - but this is still a relatively young industry, and the 
processors have limitations (e.g. a processor like DynaText cannot 
do such things as the auto-numbering of heterogeneous element 
types; it is inefficient at calculating list-adornment types (e.g. 
bullets) based on context-sensitive algorithms; it should not be 
used to auto-generate text as this will be unknown to the search 
engine).  A "distribution DTD" might be something like an 
industry-standard DTD (e.g. ATA, J2008 etc.) or a DTD required 
by a major customer.  This situation usually results in the in-house 
DTD being compromised to bridge the gap between the optimum 
and processor needs/distribution DTDs.

Compromising the optimal DTD is the wrong solution to adopt.  It 
leads to a number of additional costs (e.g. disruption of the current 
authoring/conversion process, the DTD's portability/lifespan is 
shortened etc.).  The solution is distasteful in so far as it represents 
a change in long-term strategy to compensate for temporary 
deficiencies in current SGML processors.  Ultimately, the aim 
should be to bring the SGML consumer to the data, and not to 
force the data to travel to the customer.

The correct way to bridge the chasm is to build an "Xform 
Bridge" (automated SGML transformation) - to transform data 
which conforms to the optimal in-house DTD into a form that 
complies with an interim, "process-ready" DTD.  This way, the 
author/review environment is not affected by the 
transformation process, and the possibility that the 
transformation can be 100% automated means that it can 
potentially be 100% accurate.  Moreover, the interim instance need 
not conform to an actual DTD, as this is not required by some 
processors (e.g. DynaText), and validation may be deemed 
optional if the transformation process is itself validated.

David then compared the main characteristics of an "Author-
Ready" DTD with those of a "Process-Ready" DTD.  An Author-
Ready DTD will contain no information that can or should be 
auto-generated; authors are allowed to focus on content which they 
alone can produce.  David called this approach the "noble but not 
usually practical" form of SGML (emphasizing that it is not 
riddled with "compromise" attributes designed to satisfy the 
processor).  A Process-Ready DTD will contain auto-generated 
information that is needed by the processor - and David called 
this the "real-world" form of SGML (in the sense that this is how 
SGML systems are typically implemented).
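
The step from an Author-Ready instance to a Process-Ready one can 
be illustrated with a minimal Python sketch.  It is not David's 
tool: it assumes the instance has already been normalized into a 
form that also parses as well-formed XML, and the element name 
"list", the attribute name "mark", the MARK_BY_DEPTH table and the 
function make_process_ready are all hypothetical.  The author 
leaves the attribute out; the transformation generates it from the 
nesting depth.

```python
import xml.etree.ElementTree as ET

# Hypothetical house style: the mark used at each level of list nesting.
MARK_BY_DEPTH = ["decimal", "lower-alpha", "lower-roman"]

def make_process_ready(document):
    """Add a generated 'mark' attribute to every list that lacks one."""
    root = ET.fromstring(document)

    def visit(element, depth):
        if element.tag == "list":
            # Author-supplied values win; only missing ones are generated.
            if element.get("mark") is None:
                element.set("mark", MARK_BY_DEPTH[depth % len(MARK_BY_DEPTH)])
            depth += 1
        for child in element:
            visit(child, depth)

    visit(root, 0)
    return ET.tostring(root, encoding="unicode")
```

Because the generated attribute exists only in the interim 
instance, the authoring DTD in this sketch never has to declare it 
as required.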

Transformations can also be used for a variety of other purposes, 
such as importing data from external sources (e.g. generating 
tables from information held in a database).  Transformations can 
also be used to automate "mindless" or "formatting-based" 
authoring tasks, such as calculating a list-adornment type.  They 
can be used to perform certain types of document analysis, for 
example producing a report on some data (e.g. statistics, frequency 
counts etc.).  A transformation engine can also assist the semantic 
validation of data, for example it could check the validity of the 
value of a "date-month-year" attribute (which an ordinary SGML 
validating parser will not check).  Transformations could also help 
to extract/duplicate data (e.g. generating an index or table of 
contents), and to hide data on the basis of an intended audience.  
Furthermore, a transformation engine could apply style information 
during the transformation process - which would facilitate the 
fast on-line display of the resulting SGML documents.
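
As a minimal sketch of the semantic validation mentioned above, the 
following Python function checks that an attribute value is a real 
calendar date, which is something a DTD cannot express.  The 
function name and the assumed "DD-MM-YYYY" layout are illustrative 
assumptions; the talk did not specify a format or a language.

```python
from datetime import datetime

def check_date_attribute(value, fmt="%d-%m-%Y"):
    """Return None if value is a real date in the given layout, else a message."""
    try:
        datetime.strptime(value, fmt)
    except ValueError as exc:
        return f"invalid date {value!r}: {exc}"
    return None
```

A value such as "29-02-1993" is a perfectly legal attribute string 
as far as the parser is concerned, but this check rejects it 
because 1993 was not a leap year.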

Most of the existing transformation engines fall into what David 
called the category of "Parser with a Print statement" utilities (i.e. 
the current context is maintained using a simple element stack etc.).  
This approach is limited inasmuch as it offers no lookahead 
capability, and only limited "lookbehind" (typically only to 
ancestors and left siblings).  The output side often has no SGML 
intelligence, and so it is quite easy to output invalid documents.
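
The "Parser with a Print statement" style can be sketched in a few 
lines of Python.  This is an illustration only: the Python standard 
library has no SGML parser, so the sketch assumes a normalized 
instance that also parses as well-formed XML and uses the expat 
bindings; the function name flatten and the output layout are 
hypothetical.

```python
import xml.parsers.expat

def flatten(document):
    """Return one 'ancestor/path: text' line per run of character data."""
    stack = []   # names of the currently open elements (the current context)
    lines = []

    def start(name, attrs):
        stack.append(name)

    def end(name):
        stack.pop()

    def chars(data):
        # Only ancestors are visible here; nothing later in the document is.
        if data.strip():
            lines.append("/".join(stack) + ": " + data.strip())

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True          # deliver adjacent text as one chunk
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(document, True)
    return lines
```

The output here is plain strings appended to a list, so nothing in 
the sketch would stop a script from emitting markup that is invalid 
against the target DTD, which is the weakness noted above.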

Another category of transformation is the "tree analyzer with a 
print statement" utility.  There are very few tools of this type (e.g. 
EBT System Integrators Toolkit), and although they allow 
unlimited access to the document tree (with arbitrary look-
ahead/behind, and re-ordering of input elements), there is still no 
SGML-awareness on the output side.
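
What unlimited access to the document tree buys can be shown with a 
short Python sketch that resolves a cross-reference appearing 
before its target.  It is not based on any of the products named 
here: it assumes input that parses as well-formed XML, and the 
attribute names ID and REFID and the function resolve_sectrefs are 
hypothetical.

```python
import xml.etree.ElementTree as ET

def resolve_sectrefs(document):
    """Fill each SECTREF with the title text of the SECTION it points to."""
    root = ET.fromstring(document)
    # Index every SECTION first, so forward references are no harder
    # than backward ones.
    sections = {}
    for sec in root.iter("SECTION"):
        sections.setdefault(sec.get("ID"), []).append(sec)

    problems = []
    for ref in root.iter("SECTREF"):
        targets = sections.get(ref.get("REFID"), [])
        if len(targets) != 1:
            # Non-fatal: record the problem and keep going.
            problems.append(
                f"SECTREF {ref.get('REFID')!r} matches {len(targets)} sections")
            continue
        title = targets[0].find("TITLE")
        # itertext() flattens any sub-elements inside the title.
        text = "".join(title.itertext()) if title is not None else ""
        ref.text = f"SECTREF: see '{text}'"
    return ET.tostring(root, encoding="unicode"), problems
```

The serialized result is still not checked against any DTD, so the 
sketch shares the lack of output-side awareness described above.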

There is a third category of transformations, which David called 
"tree transformation" utilities (e.g. the Polypus library extensions 
to Balise from AIS).  To date, this would appear to be the only 
product with SGML-awareness on both the input and output sides 
of the transformation process.

David then proposed a number of tips that might prove useful 
when comparing transformation engines.  Check the software 
externals, such as the range of platforms supported, speed, RAM 
requirements etc.  Also check the software internals i.e. whether or 
not the scripting language is comfortable/easy to learn/offers good 
diagnostics/is extensible, look at the script-debugging aids, the 
error recovery during parsing of an input document, access to the 
input tree, pattern matching, and access to external data.

David had hoped to talk about the role of GLTP (the General 
Language Transformation Processor) as defined in the DSSSL 
standard, but unfortunately he did not have time.  (N.B. GLTP is 
likely to be renamed as STTL, the SGML Tree Transformation 
Language, in the next draft of DSSSL).  David also wanted to talk 
about "GLTPeeWee" the demonstration prototype GLTP 
transformation engine that he hopes to release into the Public 
Domain, but this had to be left until a special evening session.

16.     "SGML Transformers: Five Ways" - Chair: Pam 
	Gennusa (Database Publishing Systems Limited)

This session was designed to give an overview of how various 
tools would solve some typical transformation problems (e.g. 
transforming document instances from one DTD to another, or 
between an SGML document instance and something else).  A 
number of tools were represented, and the problems were specified 
in advance by the speakers themselves.

The Tools:

Balise and Polypus (AIS/Berger Levrault) - Christophe LeCluse
SGML Hammer (Avalanche) - Ludo van Vooren
The Copenhagen SGML Tool [CoST] (Public Domain, University 
of Copenhagen) - Klaus Harbo
GLTPeeWee (Public Domain, David Sklar) - David Sklar 
[unfortunately, David was not able to present his GLTPeeWee 
solutions to the problems, because his notes had disappeared]. 
OmniMark (Exoterica Corporation) - Andrew Kowai
TagWrite (Zandar Corporation) - Harry Summerfield.

The Problems:
(N.B. most problems were accompanied by a sample case of the 
type of DTD outlined in the problem specification, and some also 
included sample post-transformation output.)

1) INSTANCE NORMALIZATION: Starting from an arbitrary 
source SGML instance with potential markup minimization, 
generate a non-minimized result instance in which all SGML 
information according to ESIS has been made explicit, and which 
can still be parsed using the same DTD.  The program should be 
written to be DTD independent.  This kind of processing is 
extremely frequent.  The result instance can be easily post-
processed by non-SGML systems (e.g. typesetting systems 
loaders) or by minimal SGML systems.  [Set by AIS]

dictionary made of entries, split this instance into as many files as 
there are entries, while generating an SGML skeleton which keeps 
track of the entry file names.  In a second step, perform the inverse 
operation: use this skeleton to gather all entries (stored one per 
file) and re-create the original instance.  This exercise is a 
simplified version of a very common operation in database loading 
and extraction situations.  It stresses the input/output capabilities 
of the application language.  [Set by AIS]

3) STRUCTURAL TRANSFORMATION: Starting from a source 
instance described by a hierarchical type DTD, generate a "flat" 
instance described by a "flat type" DTD.  (This kind of problem is 
very typical of problems encountered when generating input for a 
non-SGML word processor or DTP tool; "flattening" to some kind 
of "transit" DTD is often needed).  In a second step, do the 
opposite: from the generated "flat" instance, re-generate the 
original hierarchical instance.  (This kind of problem is very 
common in retroconversion).   Note: the two sample DTDs are 
designed to illustrate recursive definitions, and are not meant to be 
really useful in the real world.  [Set by AIS]

4) ID TYPE CHECKING / ID COMPUTATION: An SGML 
parser checks uniqueness of ID attributes in the instance and 
checks that IDREF attributes are resolved, but this does not 
guarantee by itself correctness of the cross-references.  In the 
cases when cross-references are "typed" (i.e. a cross-reference to a 
figure is not the same as a cross-reference to a table), checking the 
type of elements associated to the ID/IDREF attribute pairs 
provides an additional checking level.  This problem examines 
how to handle such a task.  As an auxiliary task, it is asked to fill 
implied ID attributes.

Floating (empty) elements are often used in DTDs to markup 
revisions or internationalization.  These empty elements occur in 
pairs with a "starting" and an "ending" element.  When such 
elements are declared floating, an SGML parser cannot check 
much about the way they are used.  This problem examines how to 
handle that task with an application language.

6) CALS TO AAP TABLE CONVERSION: The general problem 
consists in transforming SGML structured tables from the CALS 
table DTD to the AAP table DTD.  The transformation program 
should handle: the general structure of the table, spanning, and 
borders.  The program should be DTD independent.  That is, given 
a DTD including CALS table description, the program can take 
any instance of this DTD and output the given instance where all 
CALS tables have been replaced by AAP tables.  [Set by AIS]

XREF: Consider a DTD that contains SECTREF elements that 
represent cross-references to SECTION elements [DTD example 
omitted].  The transformation should replace each input-DTD 
SECTREF with an enhanced output-DTD SECTREF [example 
omitted].  When a SECTREF is encountered during the 
transformation, the engine should automatically generate this text 
as the content of the output:" SECTREF: see 'XXXXXX'", where 
XXXXXX is a copy of the contents of the TITLE of the 
SECTION that is referenced by the SECTREF.  Note that TITLE 
is not atomic; a TITLE instance can contain an arbitrary number of 
EMPH subelements.  Also note that it is possible for a SECTREF 
to appear before the SECTION to which it refers.  The 
transformation should perform error checking to ensure that 
exactly one SECTION is referenced, and report problems 
appropriately in a 'non-fatal' way (i.e. should continue processing 
and produce a usable document instance).  [Set by AIS]

8) EXTRACTION/SORTING: Consider a DTD that allows 
FIGURE elements to be placed anywhere in the body of the 
document.  Some figure elements have caption attributes, but not 
all of them do: [DTD example omitted].  Create a transformation 
that appends an element called FIGREVIEW to the end of the 
(FIGURE+)>.  FIGREVIEW simply contains copies of all the 
FIGURE elements found in the body of the document, sorted 
lexicographically based on the figure's caption.  (Assume all text 
is ISO 8859-1).  Uncaptioned figures should not appear at all in 
the FIGREVIEW section.  Note that each FIGURE element is 
actually the root of a subtree of arbitrary size (I intentionally don't 
show the content model for HOTSPOT); make sure the entire 
subtree of each captioned FIGURE is duplicated in the 
FIGREVIEW section.  (I am interested in knowing if that is 
possible, in your technology, without full knowledge of the 
content model of HOTSPOT).  [Set by EBT]

9) ROW COUNTING: While converting an SGML marked-up 
table to some other form (e.g. a typesetting language), count the 
number of rows and columns in the table and put the total in a 
specified position at the start of the table.  [Set by Exoterica]

10) LIST MARK FORMATTING: Given a <list> element, that 
can have other <list> elements within its content, which has an 
optional attribute whose presence determines the manner in which 
items in each list are marked or numbered, and whose absence 
indicates that the mark or numbering form is to be deduced from 
the ancestry of the <list> (e.g. a <list> within a decimal-numbered 
list is to use lower-case letters), correctly mark or number each list 
item.  Note that alignment of the text following the marks/numbers 
can be dealt with as another problem.  [Set by Exoterica]

11) DATE:  Output the current date in a form determined by an 
attribute value in the SGML document.  [Set by Exoterica]

12) LINE BREAKING: In the process of converting an SGML 
document to some other form (or even SGML), produce output 
that has a given number of text characters on each line, not 
counting any commands or tags in that other form in the total, and 
adjusting the length of each line so that breaks only occur at 
"allowed" points (such as at spaces and existing line breaks).  
Require that there be no trailing spaces at the points of the breaks 
(i.e. none of the preceding lines have trailing spaces).

or more SGML documents each of which contain references to 
objects and locations in themselves and other documents in the set, 
replace each reference by an identification of the referenced object 
or location, the identification having been defined by the 
processing of the target, whether it be in the document doing the 
referencing or some other.  [Set by Exoterica]

14) ICADD TRANSFORMATION: This transformation exercise 
gives all of the participants the opportunity to implement (and if 
necessary make suggestions for) the mapping techniques which 
have been designed to allow any DTD to carry with it the 
information needed to be turned into the input (with an instance) 
to an automated Braille translation process.  This is, admittedly, 
quite a complicated exercise by the end, and people may wish to 
build processors only for the earlier attributes.  [Yuri 

The Solutions:

1) INSTANCE NORMALIZATION:  Christophe said that 
Balise/Polypus could solve this problem in about three lines of 
code (but this was not shown).  Ludo devised a simple SGML 
Hammer module to handle any starts (and attributes), data content, 
and element ends;  a couple of other simple, very short modules 
were also required to provide a complete solution.  Klaus said that 
CoST could also cope with these requirements, and summarized 
his solution.  Andrew said that OmniMark could solve the main 
bulk of the problem, but could not provide a solution that would be 
truly DTD independent (however this would only require a simple 
fix, and it will be corrected in future versions of OmniMark).  
Harry said that TagWrite could solve the general problem using 
two or three simple rules and its notion of supertokens, however 
the solution would probably not be completely DTD independent.  
David Sklar suggested that the problems for GLTP will come 
when trying to handle empty elements.

that the key to any solution is handling the fact that the application 
will need to address several external files.  Christophe said that 
this problem would be quite simple for Balise.  SGML Hammer 
cannot do multiple file output, so Ludo suggested that this case 
might cause difficulties for the product.  CoST could handle the 
problem fairly easily, although some thought would need to be 
given as to how to make the filenames unique, and making the 
filenames out of PCDATA content might also be a bit tricky.  
OmniMark would find the problem simple to solve, whilst 
TagWrite could provide only a partial solution (the remainder 
could be solved via the use of simple WordPerfect/Word macros).

3) STRUCTURAL TRANSFORMATION and 10) LIST MARK 
FORMATTING:  These problems were combined, because they 
shared fundamental similarities.  Balise/Polypus can solve the list 
mark problem, however the nesting and un-nesting might be a 
difficult area.  A Polypus library function  could be used to do the 
structural transformation.  OmniMark could solve the list mark 
formatting issue by using pattern matching and text rebuilding, 
and could also solve the structural transformation problem 
(in each case the code used in the modules was shown).  
SGML Hammer (and the Louise scripting language) could cope 
with the list mark formatting problem (by using arrays, and acting 
on the basis of context).  TagWrite can solve the list problem 
(although Harry admitted that it entailed using hideous SGML); 
the structural transformation problem is quite familiar and solvable 
with TagWrite.

4) ID TYPE CHECKING / ID COMPUTATION:  Balise can solve 
the type checking of ID/IDREF elements - by going through the 
document building a data structure of IDs and IDREFs and then 
checking that everything matches up and resolves correctly (if not, 
it reports an error).  The Louise/SGML Hammer solution would 
be similar to that adopted by Balise.  Using an array-based 
approach, it would first read the whole file into the array (to avoid 
having to perform any lookahead), then pass through the array to 
check all ID/IDREFs resolve.  At the end of the process, 
Louise/SGML Hammer will output a report identifying any 
problems (e.g. any footnotes that have been used but not 

CoST offers two ways to solve this problem.  One is the 
Balise/Louise approach.  The other uses CoST's "tree mode", 
making use of object-oriented techniques to build a parse tree 
which it can then use to check ID/IDREFs.  This would also be a 
two pass process, in which the first pass builds the parse tree and 
the second does the checking.  OmniMark would also solve the 
problem by first constructing associative arrays, which are then 
used to check ID/IDREFs.  Any errors would be output to an error 

TagWrite does not but instead uses counters to keep track of 
things within the document.  The counter-based approach can be 
used to place IDs in a document that has no IDs, but if some IDs 
are present/missing, then TagWrite would not be able to cope (and 
it would be necessary to develop some supporting solutions).
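
The two-pass idea shared by several of these solutions (first build 
a table of IDs, then check every reference against it) can be 
sketched in Python as follows.  This is not any panellist's code: 
it assumes input that parses as well-formed XML, and the element 
names figref, tabref, figure and table, the attribute names id and 
refid, and the ALLOWED_TARGET table are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical typing rule: the element type each kind of reference may target.
ALLOWED_TARGET = {"figref": "figure", "tabref": "table"}

def check_typed_refs(document):
    """Return a list of messages for unresolved or wrongly typed references."""
    root = ET.fromstring(document)
    # Pass 1: record which element type owns each ID.
    id_owner = {el.get("id"): el.tag
                for el in root.iter() if el.get("id") is not None}
    # Pass 2: check every reference against the table built in pass 1.
    errors = []
    for el in root.iter():
        expected = ALLOWED_TARGET.get(el.tag)
        if expected is None:
            continue                      # not a reference element
        target = el.get("refid")
        actual = id_owner.get(target)
        if actual is None:
            errors.append(f"{el.tag} refers to unknown id {target!r}")
        elif actual != expected:
            errors.append(
                f"{el.tag} refers to {target!r}, which is a {actual}, "
                f"not a {expected}")
    return errors
```

A reference to a table's ID from a figure reference resolves 
correctly as far as an SGML parser is concerned; only the second 
branch above catches it.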

not appear to have any notes on the answers to this problem, other 
than what appeared in subsequent handouts.  I believe that most of 
the panel considered that the solution to this problem would be 
essentially similar in technique to that described for problem 
number 4).

6) CALS TO AAP TABLE CONVERSION:  Christophe said that 
AIS did not really expect anyone to be able to provide a simple 
solution to this common problem, as the real issue is one of tree 

Balise/Polypus solves the problem by using the Polypus library.  
Ludo said that although the problem appears complicated, it is not 
actually very difficult because all the data stays in the same order 
(therefore, SGML Hammer could provide a linear rather than a 
tree-based solution).  Klaus said that he did not even attempt to 
provide a solution using CoST, because he was not familiar with 
the DTDs and it would have taken him too long to understand 
them.  Andrew offered a skeletal OmniMark solution, which relied 
on the use of OmniMark's macro facilities rather than a wholly 
SGML-based approach.

In setting his problem, David Sklar (of EBT) wanted to see if the 
Xrefs could be resolved in a single pass.  Exoterica also wanted to 
see if it could be done in a single (or more) pass.  (Where "pass" 
means taking in the data stream only once.)

OmniMark can solve the second problem using two passes (by 
building an intermediate file and then working from that to resolve 
the external cross references).  The first problem can be solved 
using a single pass.

The solutions to both problems offered by Balise/Polypus make 
use of several functions defined in the Polypus library.  Essentially 
the problems are solved by the building and manipulation of a 
parse tree.

SGML Hammer, like OmniMark, also has to do multiple passes to 
solve the second problem.  The first problem could be solved in a 
single pass if it has been written using entity references - where 
the entity references will be resolved during the parsing process[?] 
- otherwise this would also require multiple passes.

TagWrite - the duplication issue is easy to solve using TagWrite 
supertokens.  The second problem is non-trivial, and TagWrite 
could not resolve it.

Using CoST, the first problem can be solved in a single "pass", 
because it can build and subsequently interrogate a parse tree.  
Klaus suggested that the second problem is really one of co-
ordination, which could be resolved in various ways with CoST.

8) EXTRACTION/SORTING:  David Sklar said that the GLTP 
solution to this problem is very elegant and simple, but was unable 
to demonstrate this for the reasons mentioned above.

Balise/Polypus can solve this problem using the Polypus library 
(i.e. using trees).  Christophe showed some code, including the 
function called after the document tree has been built which is 
required to perform some of the global actions.

This problem is not very difficult to solve with SGML Hammer.  
It collects the subtrees during the parse phase, and handles them 
accordingly at the end.  Ludo suggested that interesting solutions 
could also be created by using different types of DTD for 
authoring and processing the document.

CoST handles the problem by processing the figures in CoST's 
tree mode (ensuring that they will not be deleted), then extracting 
them from the tree and moving them to the appropriate place.
OmniMark also makes this problem simple to solve.  It puts the 
figures in an associative array at the end of the document, which 
can then be subsequently sorted.

TagWrite could not do this because it does not store associative 
arrays (nor build parse trees).  Harry suggested that this was really 
a question of how to use SGML not how to create/transform 
SGML, and therefore TagWrite was never designed to handle this 
kind of problem.
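
A tree-based approach to the extraction/sorting problem can be 
sketched in Python as follows.  The element names FIGURE, 
FIGREVIEW and HOTSPOT come from the problem statement; the 
attribute name CAPTION, the function append_figreview and the 
assumption that the input parses as well-formed XML are mine, and 
the sketch is not any panellist's code.

```python
import copy
import xml.etree.ElementTree as ET

def append_figreview(document):
    """Append a FIGREVIEW holding sorted copies of all captioned FIGUREs."""
    root = ET.fromstring(document)
    # Collect first, so the copies added below are not themselves revisited.
    captioned = [fig for fig in root.iter("FIGURE") if fig.get("CAPTION")]
    # Code-point order; for ISO 8859-1 text this matches byte order.
    captioned.sort(key=lambda fig: fig.get("CAPTION"))
    review = ET.SubElement(root, "FIGREVIEW")
    for fig in captioned:
        duplicate = copy.deepcopy(fig)   # copies the whole subtree
        duplicate.tail = None            # drop text that followed the original
        review.append(duplicate)
    return root
```

The deep copy never inspects what is inside each FIGURE, so the 
sketch needs no knowledge of the content model of HOTSPOT.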

9) ROW COUNTING:  All the tools seemed able to cope with this 
problem (although a TagWrite solution was not offered as Harry 
was speaking elsewhere).  OmniMark solves the problem quite 
easily in one pass of the document (the code of the solution was 
shown).  Balise can also solve the problem in a single pass 
(without recourse to using the Polypus library), using an approach 
similar in spirit to that offered for the ID/IDREF checking 
problem.  SGML Hammer would simply store the relevant 
information in an array, then output it as appropriate.  CoST would 
use a two pass solution whilst in tree mode.
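
The difficulty in this problem is that the totals must be written 
before the first row, yet they are only known after the last one, 
so the rows have to be held somewhere (an array or a tree) until 
the table ends.  A minimal Python sketch, assuming input that 
parses as well-formed XML; the element names row and cell, the 
function table_with_counts and the "TABLE rows=... cols=..." output 
line are hypothetical stand-ins for a real typesetting language.

```python
import xml.etree.ElementTree as ET

def table_with_counts(table_markup):
    """Render a table as text lines, preceded by its row and column totals."""
    table = ET.fromstring(table_markup)
    rows = table.findall("row")
    n_rows = len(rows)
    # The widest row determines the column count; an empty table has none.
    n_cols = max((len(row.findall("cell")) for row in rows), default=0)
    lines = [f"TABLE rows={n_rows} cols={n_cols}"]
    for row in rows:
        cells = ["".join(cell.itertext()).strip() for cell in row.findall("cell")]
        lines.append(" | ".join(cells))
    return "\n".join(lines)
```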

11) DATE: OmniMark is able to solve this in a single pass, taking 
advantage of OmniMark's built-in function to get the date.  Balise 
would use a system call function, the results of which would be 
processed accordingly.  SGML Hammer follows the same 
approach as Balise, but also uses a string processing function.  For 
CoST the problem lies in parsing the attribute value; obtaining the 
date information and handling the formatting can be done 

14) ICADD TRANSFORMATION: Yuri suggested that this is 
quite a complicated problem, but this is primarily due to the 
difficulties that arise from handling the multiplicity of possible 
inputs in the specification attributes.

A solution is possible using Balise, but Christophe acknowledged 
that handling all the possible things that could occur in the 
attributes would be quite hard.  Ludo suggested that this problem 
is a good demonstration of the use of architectural forms.  Since 
processing is attached to specific attributes defined in the 
architectural forms, SGML Hammer would be well-suited to 
handling this kind of thing.  Klaus said that CoST, like SGML 
Hammer, was very proficient at handling architectural forms and 
so a solution would be possible.  Andrew stated that a real-world 
solution (using OmniMark) to the problems of handling ICADD 
transformations was currently being developed at California 

At the end of the session, all the panel agreed that they would post 
their solutions in electronic form to the newsgroup 

17.     "The Scribner Writers Series on CD-ROM: From a 
	Great Pile of Paper to SGML and Hypertext on a 
	Platter" - Harry I. Summerfield (Zandar 
	Corporation), Anna Sabasteanski (Macmillan New Media)

Anna Sabasteanski works for the electronic publishing division of 
Macmillan New Media, which currently publishes about 50 
electronic titles, mostly to do with medicine.  The Scribner 
Writers Series represents a mix of writers in English - American, 
British etc. - who are also classed into numerous genres (such as 
Children's authors etc.).  The decision of which authors to put on 
the CD-ROM was based on a survey of the texts most used in 
schools (about 550 authors are represented on the CD-ROM).

Anna talked generally about the development process.  Many of 
the texts are only available in hot-metal/printed text, rather than in 
electronic form - therefore the initial data conversion costs were 
quite high.  The developers had to decide how to differentiate and 
provide added benefits to encourage use of the electronic form of a 
text over the existing paper version.  Copies of the paper books 
were physically broken up, so that the pages could be easily 

A specialist company was used to handle the conversion from 
scanned pages to SGML tagged and validated files.  The company 
guaranteed an accuracy rate of 99.5% (which was equivalent to 
about two errors per page).  Markup errors were fairly easy to find 
and correct (using SGML-aware validation tools), although 
correcting these required human editorial intervention.  The 
markup conformed to the AAP Book DTD, with a few 
corrections/amendments (e.g. to allow for extra entities necessary 
for ancient texts).  Attributes were also added to indicate genre, 
language, nationality, race, sex etc.

One of the main aims of the project was to use SGML markup to 
tag documents for inclusion in what is effectively a database (i.e. 
the header information of each text is used to facilitate organizing 
and searching).  Microsoft's Multimedia Viewer was used to 
present the information, with the SGML tagged files converted 
into RTF (the format recognized by Multimedia Viewer) by 
Zandar Corporation.

A number of problems were encountered when developing the 
CD-ROM, for example handling certain special characters (which 
could not be represented in RTF), and deciding how to 
represent links to other part or whole texts.  A particular headache 
arose when converting the original bibliographic sections - the 
designers of the CD-ROM version wanted all bibliographies to 
follow the same conventions, but senior editors at Macmillan also 
imposed the requirement that the bibliographies could be re-
presented in the style used in the original source text.  A final 
quality control procedure was necessary to check the end product, 
from the point of view of both software and content. The testers 
found several bugs in the beta version of Multimedia Viewer 
software which took some time to get corrected.

Harry Summerfield then described how his company had 
approached the project.  As described above, Zandar were 
contracted to carry out the transformation from scanned, then 
hand-edited SGML files, into RTF.  However, they first carried 
out an important feasibility study to ensure that they were capable 
of doing the job and meeting Macmillan's deadlines.

When they began to design the transformation process, Zandar did 
not just want to see the DTD being used, but also examples of live 
documents containing real tags.  Zandar was aware that DTDs 
change, and that the tagging actually used in files may or may not 
always match up with the current version of the DTD.

The conversion had to handle 50Mb of data in 510 files.  The 
actual conversion process was done in five passes, because this 
approach was cheaper to develop.  The first pass had to find any 
(SGML) invalid characters, and convert them to SGML entities.  
The second pass was to make the first letter of the text of every 
file a dropped cap (in RTF) - this was made possible by having 
used special SGML markup.  The third pass was to do the 
conversion of all special characters into RTF.  The last two passes 
[?] had to strip out SGML tagging in the main texts and 
bibliographies, and format them appropriately for RTF (converting 
the different kinds of bibliographic entries to a uniform structure 
was tricky, and intentionally omitted end-tags in the entries made 
things even harder).  
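
As a rough illustration of the multi-pass approach described above, 
the following Python sketch chains three simplified passes 
(characters to entities, entities to RTF escapes, tag stripping).  
It is not Zandar's code: the function names, the two-entry entity 
table, and the RTF escapes chosen are assumptions made purely for 
the example, and the dropped-cap and bibliography passes are left 
out.

```python
import re

# Hypothetical table: raw character -> SGML entity name (assumption for the example).
ENTITY_NAMES = {"\u00e9": "eacute", "\u00e8": "egrave"}

# Hypothetical table: SGML entity name -> RTF hex escape (assumption for the example).
RTF_ESCAPES = {"eacute": r"\'e9", "egrave": r"\'e8"}


def chars_to_entities(text):
    """Pass 1: replace listed special characters with SGML entity references."""
    return "".join(
        "&" + ENTITY_NAMES[ch] + ";" if ch in ENTITY_NAMES else ch for ch in text
    )


def entities_to_rtf(text):
    """Pass 2: replace known entity references with RTF escapes."""
    def swap(match):
        # Unknown entities are left untouched.
        return RTF_ESCAPES.get(match.group(1), match.group(0))
    return re.sub(r"&([A-Za-z0-9]+);", swap, text)


def strip_tags(text):
    """Pass 3: drop SGML start and end tags, keeping the data content."""
    return re.sub(r"</?[A-Za-z][^>]*>", "", text)


def run_passes(text, passes):
    """Apply each pass in order; every pass sees the output of the previous one."""
    for one_pass in passes:
        text = one_pass(text)
    return text


PASSES = [chars_to_entities, entities_to_rtf, strip_tags]

sample = "<p>caf\u00e9 cr\u00e8me</p>"
result = run_passes(sample, PASSES)
print(result)
```

Keeping each pass as a separate, small step mirrors the point made 
in the talk: several simple passes can be cheaper to develop than 
one pass that tries to do everything.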

The entire conversion process (excluding building hyperlinks) 
took six hours of cpu time on a 486 PC.  As part of the project, 
Zandar developed a separate tool called HyperTagWrite to handle 
the creation of the hyperlinking markup, which could be converted 
in a subsequent pass into the format used by Multimedia Viewer.

New writers will be added into future versions of the CD-ROM.  
Changes will be made in the SGML database, from which the RTF 
(or whatever future target formats might be required) can be 
generated.  Using an SGML-based approach should greatly 
facilitate the production of future editions.

In the subsequent questions and answers session, a number of 
points were raised.  Proper names in the original texts were 
identified and processed on the basis of the punctuation in the data 
content.  The conversion process was relatively cheap in terms of 
man months (i.e. Macmillan put one person on the project full-
time for only three months).  The quality control checking took 
seven people six months, and every hyperlink was hand-tested.  
When proofing uncovered errors, all corrections were made to the 
SGML source files.  The retrieval engine indexes every word, but 
they also built several specialist indexes on the basis of markup 
(e.g. index of authors, titles, genre etc.).

18.     "The Attachment of Processing Information to SGML 
	 Data in Large Systems" - Lloyd Harding (Mead Data Central)

Mead Data Central collects information from over 2000 sources, 
but has no control over the received format; they currently have 
about 200 million documents available on-line.  

Conversion/handling on this scale has to be automated as much as 
possible, bearing in mind that the target is to produce an on-line 
information system and that they are not concerned about delivery 
on paper or other media.

Lloyd compared the handling of (electronic) information, to the 
process of industrial manufacturing - especially in its infancy.  
Standardized solutions have resolved many of the problems that 
faced early manufacturing industry, but can the same be done for 
information fabrication systems?  He proposed a new paradigm in 
which the author marks up the information content, an 
"Information Fabrication System" adds value useful for the target 
system (publishing, linguistic analysis etc.), and the target systems 
use the information.  The new middle process extends the 
traditional paradigm.

Traditional SGML techniques may provide solutions within this 
new Information Fabrication paradigm. Markup standardization, 
that is agreed common DTD development along the lines of the 
work of the AAP, TEI and existing efforts at Mead Data Central, 
might help to provide markup relevant to specific target 
applications (publishing etc.) but will only ever be a partial 
solution.  The use of architectural forms may also provide some 
benefits, but adding them to existing DTDs requires skill.  Link 
Process Definitions are another possibility, but they also require 
skill, and they are not supported by many existing tools.  FOSIs 
are  fairly straightforward to use but may not be generalizable 
enough to use as part of an Information Fabrication process.  

DSSSL's Association Specification and Output DTD, combined 
with GLTP, appear to offer the greatest promise of a solution, but 
implementing them will require programmer expertise, and 
DSSSL is still "shimmerware".  As none of these traditional 
approaches really provide a complete and/or ideal solution,  Lloyd 
proposed his own Information Fabrication System Architecture.

For the Information Fabrication Paradigm, the most viable concept 
is a generalization of the FOSI to accommodate any type of 
fabrication process.  The Architecture required to do this consists 
of two components: the Application Interface DTD (AID), and the 
Processing Output Specification Instance (POSI). AID provides 
the syntax for specifying the attachment and association 
information.  POSI specifies the attachment and association 
information for a specific application and raw material.  Lloyd 
then talked through two examples of  the steps required to develop 
and use an application using this Architecture.

The Architecture-based approach solves the attachment and 
association problems, alleviates some of the expense issues, and 
reduces the skill set requirements involved.  The goal of 
developing an Information Fabrication System is to free the author 
from target system constraints, thereby permitting him/her to focus 
on content (e.g. authors should not have to worry too much about 
authoring links).  This requires Information Fabrication Systems 
that can accept any author's creation and cost effectively prepare 
that creation for a target system.  AIDs and POSIs can provide the 
basic underpinnings for such Information Fabrication Systems.

During the question and answer session, Lloyd said that his was not 
necessarily the ideal/only/whole solution, but he would like to see 
people talking about "Information Fabrication Assembly Lines" - 
where as much as possible of the process of generating marked up 
information for target systems could be automated.  His approach 
has not yet been adopted at Mead Data Central, but it will be.

19.     "ISO 12083 Announcement" - Beth Micksch (Intergraph Corp.)

This presentation was intended to provide a brief history and 
update on ISO 12083 "The Electronic Manuscript Preparation and 
Markup" Standard.  Formerly an ANSI standard (Z39.59, but 
generally referred to as the "AAP"), ISO 12083 is now being fast-
tracked through the ISO standards process.  The first ballot on the 
Draft International Standard (DIS) was in November 1992, and the 
voting went as follows: 14 positive, 5 negative, and one 

Eric van Herwijnen was asked to be the editor and to set up a 
small technical committee.  Eric was required to resolve all the 
comments received on the DIS into the Standard, as fast-tracking 
means that a second vote would not be needed before the Standard 
is approved.

The Standard is intended to  facilitate the creation and interchange 
of books, articles and serials in electronic form.  It is meant to 
provide a basic toolkit which users can pick up and modify 
according to their needs.  The Standard is meant for use by 
authors, publishers, libraries, library users, and database vendors.

Use of the Standard is indicated by its public identifier (e.g. ISO 
12083:1993//DTD Book//EN - for the Book DTD).  Elements or 
entity references may be removed or modified as needed.  Users 
can declare their own elements in external parameter entities, and 
the parameter entities defined in ISO 12083 can be overridden to 
modify order and occurrence or to specify user defined 
elements/attributes; alias elements are not permitted.  The 
Standard allows SHORTTAG and OMITTAG, although the 
revised usage examples will be fully normalized.  The application 
must conform to ISO 8879:1986.
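
The public identifier quoted above follows the SGML convention of 
fields separated by "//".  The Python sketch below splits the 
three-field form shown in this report into its parts.  It is only 
an illustration: the class and function names are invented, and it 
deliberately does not handle the other forms that ISO 8879 allows 
(such as identifiers with a "+//" or "-//" owner prefix or a 
display version).

```python
from typing import NamedTuple


class PublicIdentifier(NamedTuple):
    """Fields of a three-part public identifier (names are illustrative)."""
    owner: str
    text_class: str
    description: str
    language: str


def parse_public_identifier(identifier):
    """Split 'owner//CLASS description//language' into its fields."""
    parts = identifier.split("//")
    if len(parts) != 3:
        raise ValueError("expected exactly three '//'-separated fields")
    owner, text, language = parts
    # The text field is the public text class, a space, then the description.
    text_class, _, description = text.partition(" ")
    return PublicIdentifier(owner, text_class, description, language)


book = parse_public_identifier("ISO 12083:1993//DTD Book//EN")
print(book)
```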

ISO 12083 contains four DTDs: Book, Article, Serial, and 
Mathematics.  It has a very large Annex (A) which comments on 
the DTDs and covers such things as design philosophy, structure 
descriptions, special characters, electronic review, mathematics, 
tables, braille/large print/computer voice facilities, and HyTime 
facilities.  Annex B contains descriptions of the elements, and 
indicates how all the elements relate to one another.  Annex C 
contains examples, some of which are normalized versions of the 
examples which first appeared in the ANSI standard.

Numerous changes have been made to ANSI Z39.59.  Element 
names have been changed and additions made, to make them less 
cryptic than in the original;  there are new elements for things like 
poetry.  The Mathematics DTD is based on the work of the AAP 
update committee (which has met at a number of SGML 
conferences, and corresponded over the internet).  ISO 12083 
currently offers minimal HyTime capability, but this should be 
enough to get people started.  The Standard also  supports the use 
of ICADD's Structured Document Access (SDA) attributes, to 
facilitate mapping to braille, large-print or voice-synthesizing 
systems.  The use of  SHORTREFs is deprecated but still possible.  

An alphabet attribute has been added to title, p (paragraph) and q 
(quotes) - to allow the use of special characters in these 
elements.  Electronic review is also supported, and this was 
achieved by incorporating the  CALS Electronic review 
declaration subset.  The names and descriptions of the elements 
and attributes are now more explicit and meaningful to make the 
DTDs more "user-friendly"; the number of illustrative examples 
has also increased.

The Standard will be published very shortly.  It will be available 
from ANSI and NISO (and reserving a copy before January 15th 
can save $10).  NISO will email out electronic copies of the DTDs 
to anyone that wants them.  To get a copy contact:

	National Information Standards Organization
	P.O. Box 1056
	Bethesda, MD 20827

	Phone: (301) 975-2814
	Fax: (301) 869-8071

[This information probably only applies to people in the United 
States.  Elsewhere, people should first try contacting their own 
National standards body].

Beth closed by remarking that the second edition of Eric van 
Herwijnen's book "Practical SGML" has been produced using the 
ISO 12083 Standard (including the HyTime capabilities), and 
things seem to have worked pretty well.  The indications from 
other tests which are currently underway have been equally 

20.     Reports from the SGML Open Technical Committees

Paul Grosso (Chair of SGML Open's Technical Committee), 
reported that they had not yet met - although the first meeting would 
be held on the Friday immediately after this conference.  On this 
one occasion, anyone who wished to attend would be allowed to 
do so; future attendance at such meetings would be restricted to 
people connected with SGML Open.

Paul suggested that the general role of the Technical Committees 
will be to look at interoperability issues and make sure that SGML 
solutions work (i.e. that SGML applications can successfully 
interact).  The main Technical Committee will form specifically-
tasked/short-lived sub-committees as necessary.  The Technical 
Committee will need to get input from all the SGML Open 
member companies, and no-one should expect to be able to "piggy 
back" on the efforts of a few dedicated companies or experts.  
Particular problems which may be considered by the Technical 
Committee include things like entity management, how to package 
together and exchange SGML files (cf. SDIF, SBENTO etc.), 
handling tables and math (where many issues go beyond the area 
covered by ISO 8879), HyTime issues, and so on.

21.     "A Technical Look at Authoring in SGML" - Paul 
	Grosso (ArborText)

There are a number of ways of authoring in SGML.  Approaches 
include using a standard "ASCII" editor to author both the text and 
the markup, using a conversion program to add SGML markup to 
existing content, using an SGML-aware (non "ASCII") editor that 
provides the proper markup during the initial authoring process, 
and recreating an SGML document from a repository of existing 
SGML or SGML-type documents.

When discussing authoring in SGML, it is useful to distinguish the 
roles of the parser and the editor.  A parser turns an ASCII 
character stream into tokenized SGML constructs (it also expands 
any markup "minimization").  However, the parser also leaves 
many things to the application for checking.  A non-ASCII SGML 
editor is such an application, and only it can associate meaning to 
the SGML constructs returned by the parser.  Such an editor is not 
just an interface to the parser - it is an application optimized to 
author structured documents that represent valid SGML document 
instances.  A non-ASCII SGML editor should provide an interface 
that transcends the syntactic details; it should represent the 
document using internal data structures that are isomorphic to the 
basic constructs of SGML.

There are three levels of "understanding" an SGML document.  
The lowest is the recognition of SGML syntax (e.g. recognizing 
the individual characters in the string </para> as an end tag for a 
particular element called "para").  The middle level entails 
understanding and providing an interface to SGML semantics, for 
example what it means to be an element, an attribute, an entity, or 
a marked section, and what it means to have a content model, a 
declared value, a default etc.  The top-most level of understanding 
performs the attachment of application-specific semantics, for 
example in a composition system it determines how to format a 
paragraph element.

All non-ASCII SGML editors must convert the ASCII SGML 
input into an internal representation, but an SGML editor that 
inherently understands SGML semantics can provide much greater 
benefit to the end user than an editor - even a structured one - 
that "imports" and "exports" SGML by converting to an alternate 
view that does not maintain a real-time comprehension of and 
compliance with SGML semantics.  When it comes to measuring an 
SGML editor's performance, the parsing component of an SGML 
system is defined in Annex G of ISO 8879.  Conformance of an 
editor application is often measured by examining what it can 
import and export; it should be able to read/output a wide range of 
valid SGML.  However, real-time context checking is also 
important in an SGML-aware editing system.  The system should 
guide the author whilst creating a valid SGML document, as 
continual validation and checking will make life easier for the 
user.  However, it must be remembered that during the authoring 
process it is possible to have "incomplete" as opposed to "invalid" 
documents - the incomplete document contains nothing that is 
actually wrong, it just does not yet contain everything that is 
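
The distinction between "incomplete" and "invalid" can be made 
concrete with a toy content model.  The Python sketch below is my 
own illustration, not something shown in the talk.  It assumes a 
content model that is just an ordered sequence of element names with 
occurrence indicators ("1", "?", "*", "+"), and it reads 
"incomplete" as "would be valid once the missing required elements 
are supplied".  Real SGML content models (groups, the "|" and "&" 
connectors, inclusions and exclusions) are far richer than this.

```python
import re

# Occurrence indicators as regex quantifiers: strict, and with all requirements relaxed.
STRICT = {"1": "", "?": "?", "*": "*", "+": "+"}
RELAXED = {"1": "?", "?": "?", "*": "*", "+": "*"}


def to_regex(model, quantifiers):
    """Turn [(name, occurrence), ...] into a regex over space-terminated names."""
    return re.compile("".join(
        "(?:%s )%s" % (re.escape(name), quantifiers[occurrence])
        for name, occurrence in model
    ))


def classify(children, model):
    """Return 'valid', 'incomplete' or 'invalid' for a list of child element names."""
    sequence = "".join(name + " " for name in children)
    if to_regex(model, STRICT).fullmatch(sequence):
        return "valid"
    # Nothing present is wrong, but something required is still missing.
    if to_regex(model, RELAXED).fullmatch(sequence):
        return "incomplete"
    return "invalid"


# Hypothetical model: one title, one or more paragraphs, an optional note.
SECTION = [("title", "1"), ("para", "+"), ("note", "?")]

print(classify(["title", "para", "para"], SECTION))
print(classify(["title"], SECTION))
print(classify(["para", "title"], SECTION))
```

An editor working along these lines could accept the "incomplete" 
state while the author is still typing, and only refuse changes that 
would make the document "invalid".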

The SGML Conformance Standard uses the concept of ESIS 
(Element Structure Information Set) to define conformance for a 
parser.  However, the definition of ESIS is not inclusive enough 
to describe all that an SGML editor must do (and this has led to 
the notion of "ESIS+").  ESIS+ suggests that things of importance 
to an SGML editor could include comments, the existence of 
ignored marked sections, and the existence and name of internal 
general text entities.  An SGML editor's view of SGML is really 
dependent on its view of ESIS+.  Therefore, an SGML editor 
could/should be evaluated on the basis of what it recognizes as the 
scope of ESIS and ESIS+.

The interfaces to complex constructs can cause problems for 
SGML editors; for example handling such things as marked 
sections should be done properly, even when they are specified 
using a parameter entity reference.  The editor should allow for 
marked sections with unbalanced markup (i.e. which include the 
start tag of an element but not its corresponding end tag); it should 
also allow the synchronous changing of the values of parameter 
entities so that the final result is valid even though an intermediate 
state may be invalid.

Subdocs are another complex structure to be considered.  A 
subdoc is basically an external SGML entity with its own DTD.  
The authoring interface to a subdoc can be similar to that for a 
regular external SGML entity but there are additional 
issues to be considered, such as the different ID name/entity name 
space, a need for a potentially different presentation style for the 
subdoc, to say nothing of what it might mean to actually compose 
a document containing a subdoc.

A third type of complex structure involves the use of the 
CONCUR feature.  Using CONCUR allows a document instance to 
contain completely independent and orthogonal structural 
hierarchies in the same ASCII SGML file.  At any given time, the 
document must be parsed according to the currently active DTD.  
When a different DTD is made active, it is equivalent to reading in a 
different document from a different document type.  Although the 
character data of the different views of the document remains the 
same in both cases, some thought needs to be given as to which tag 
sets should be displayed to the user - only the active DTD, or one 
or more of the others that apply.

22.     "Implementing an Interactive Electronic Technical 
	Manual" - Geoffrey von Limbach (InfoAccess Inc.)

There are two specifications which relate to IETMs.  The first is 
GCSFUI (General Content, Style, Format, and User Interaction 
Requirement - MIL-M-87268), which specifies the on screen 
layout and behaviour of an IETM (e.g. how an IETM should handle 
warnings - their duration, iconization etc.).  The second is 
IETMDB (MIL-M-87269), but although this mentions "Data Base" 
in the title, it is primarily a set of architectural forms or templates 
for IETMDB compliant DTDs.  IETMDB also specifies a linking 
mechanism for sharing content between document instances, based 
on the use of HyTime ILINKs.

Several classes of ETMs and IETMs have been identified in a 
recent paper by Eric Jorgeson ("Classes of Automated TM 
Systems" Eric L Jorgeson, Carderock Division, Naval Surface 
Warfare Center, 11 August 1993).  In summary, these are as 

Class 1: stored page images (+ display mechanism and 
access via an index)
Class 2: as 1, but adds hypertext links
Class 3: real IETMs (display conforms to GCSFUI, file is 
tagged in accordance with IETMDB)
Class 4: as 3, but authored with a relational or Object-
oriented underlying database
Class 5: as 4, but the display is integrated with other tools 
(such as an expert system to assist diagnostics).

Geoffrey then described the implementation of a prototype IETM 
at the David Taylor Research Center (DTRC).  The DTRC 
prototype was implemented using Guide Professional Publisher 
(GPP).  GPP includes a scripting language which enabled the flow 
control required by IETMDB as well as a flexible user interface 
which can be adapted to meet the requirements of GCSFUI.  The 
DTRC provided a DTD and document instance derived from the 
architectural forms specified in IETMDB.  DTRC also specified 
the screen layout beyond the requirements of GCSFUI.  Geoffrey 
then showed an example of some sample warning source text and 

It became clear during the DTRC project that IETM production 
requires flexible software.  The user interface must be adaptable to 
user requirements.  The DTD involved does not remain constant, 
and is likely to be frequently revised.  There is also a need for 
tools which can handle things like HyTime's ILINKs.  Geoffrey 
recommended that anyone working on a similar project should try 
to use inherently flexible tools (such as an adaptable user 
interface, and a good scripting language).  Other tools which offer 
an API which can be called from a development language such as C 
or C++ are also worth considering, although they can lead to 
higher development costs and greater overheads as requirements 

23.     "The Conversion of Legacy Technical Documents into 
	Interactive Electronic Technical Manuals: A NAVAIR 
	Phase II SBIR Status Report" - Timothy Billington, 
	Robert F. Fye (Aquidneck Management Associates, Ltd.)

[This presentation followed on closely from the previous one.  It 
involved a large number of slides (all of which are included in the 
proceedings), so I shall only attempt to summarize the main 

The Navy's ETM strategy depended upon a comparison of the 
Class 2 and Class 4 ETMs/IETMs outlined above.  Class 2 ETMs 
are typically linear/sequential in nature.  An SGML instance and 
document DTD are fed into an indexing tool, and the resulting 
files are fed into an ETM SGML and graphics browser (with style 
declarations being applied to control formatted display).  In Class 
4 IETMs, the logic sequence is much more complex - since the 
underlying SGML tagged information has to be displayed on the 
basis of interactive inputs from the user (e.g. branching on the 
basis of user responses to dialogue boxes).

Aquidneck Management Associates were awarded a second phase 
Small Business Innovative Research (SBIR) contract to develop 
processes and procedures for the transition of legacy, paper-based 
NAVAIR Work Packages into Class 4 IETMs in accordance with 
Tri-Service specifications.

Timothy described the migration process.  This involves scanning 
massive amounts of legacy data, and particular attention needs to 
be given to potential problem areas - such as how to mark up 
tables to support interactive decision making.  Migration has to be 
a phased process, with increasingly sophisticated markup being 
added at each phase.

Timothy talked briefly about data enhancement and authoring, as 
well as data storage and maintenance.  He stressed that the 
subsequent data quality assurance process is very important to 
users, and should not be neglected or played down.  He identified 
some of the features of an IETM presentation system (e.g. tracking 
user interactions, setting and clearing logic states, navigating to 
the next logical node etc.), and showed some diagrams illustrating 
the operation of a frame oriented IETM.

Looking to the future of IETMs, Timothy said that there is a need 
to find a cost effective automated means of converting paper-
based legacy data to Class 4 SGML-based IETMs; this is a pre-
requisite for the widespread implementation of IETM technology.  
He noted with regret that this is something of a circular argument, 
since the IETMs will not appear without the cost-effective 
conversion technologies, but they will not be developed until there 
is sufficient demand for IETMs.

24.     New Product Announcements and Product Table Top 

[These announcements were made in quick succession, so I 
apologize in advance for anything I missed or mis-heard.]

Xyvision Publishing Systems announced that they have built an 
SGML publishing solution around their document management 
system (and also around FrameMaker and Ventura).

InContext announced the release of their Windows 3.1 based 
SGML editor (which uses Excel to handle tables).

Electronic Book Technologies (EBT) announced that DynaText 
now has a new multibyte core display language, which means that 
it can display Asian languages (e.g. Kanji).  The Rainbow DTD 
being shown at the Poster Sessions will be made publicly available 
to facilitate translations from word processors to SGML.

SoftQuad announced a new version of Application Builder which 
allows Author/Editor to be used as an interface to document 
management systems.  They would also be showing the latest 
version of Author/Editor (v.3.0).

Recording for the Blind announced the creation of their Etags 
product, to assist the  production of electronic texts that can be 
made accessible to the print-disabled.

OpenText Corporation announced the creation of new client/server 
extensions to support easy combination of hardware.  Their 
product has also been extended to support multibyte char sets (e.g. 

Datalogics announced that their WriterStation P/M product has 
now been ported to Windows 3.1.

Schaeffer [?] consultants announced that they would be showing 
some of the integrations they have done, (based on OpenText), to 
facilitate data management.

Grif SA announced that they would be showing GATE [tools to 
integrate Grif into your system?], Grif SGML Editor for Windows, 
and CAT (Corporate Authoring Tool) for authoring in BookMaster 
(GML to SGML).

Oracle announced the release of OracleBook, an online 
multimedia document delivery system.  Version 2 will be SGML-
aware, and will be demonstrated at this conference.

Texcel announced the release of Information Manager, a package 
for building document management/collaborative authoring SGML 

Tachyon Data Services announced that they would be showing the 
customizable Tagger software that they have developed to convert 
files to SGML.

Synex Information [?] announced the release of SGML Darc, an 
SGML-based document management and archiving system for 

ArborText announced the recent release of version 5 of the 
ADEPT Series of software (now including a FOSI editor).  They 
were also demonstrating Powerpaste (a set of conversion tools), 
and the Windows version of the ADEPT SGML Editor, which is 
due to be released in the second quarter of next year.

Interleaf announced the recent release of the Interleaf 5 <SGML> 
Editor, and the Interleaf 5 <SGML> Toolkit to develop 

Exoterica announced that they would be demonstrating a new 
release of OmniMark for use on Macintosh machines.

Passage Systems announced the release of PassagePro, a document 
management and production system.  Currently available on SGI 
machines, and soon on Suns, they hope to have it running under 
Windows by next year.

Frame Technology announced that they would be demonstrating 
FrameBuilder, and announcing/demonstrating their SGML Toolkit 
(which facilitates mapping between SGML and Frame's 
WYSIWYG environment).

Zandar Corporation announced that they would be demonstrating 
the latest version of their TagWrite conversion tools (currently at 

The following companies demonstrated/exhibited their products 
[this list is taken from the proceedings, so some late entries may 
have been omitted]: 

ArborText, Inc.; AIS/Berger-Levrault; CTMG/Officesmiths; Data 
Conversion Laboratory; Datalogics Inc.; Electronic Book 
Technologies; Exoterica Corporation; InContext Corporation; 
InfoAccess; Information Strategies Inc.; Information Dimensions 
Inc.; Intergraph Corporation; ISSC Publishing Consulting; 
Microstar Software Ltd; Ntergaid; Open Text Corporation; 
Recording for the Blind; Saztec International Inc.; SoftQuad Inc.; 
STEP Sturtz Electronic Publishing GmbH; Synex Information; 
Tachyon Inc.; Texcel; WordPerfect Corporation; Xyvision Inc.; 
Zandar Corporation.

25.     Poster Session

As above

26.     "Implementing SGML Structures in the Real World" 
	 - Tim Bray (Open Text Corp.)

Tim began by remarking that the number of people attending the 
conference indicates that there is a vast amount of SGML-tagged 
information being created, but where is it all being put?  Some of 
the possible technologies and systems have been shown at the 
vendor demonstrations at this conference.

At SGML'92, Neil Shapiro said "I can model SGML with a slight 
extension to SFQL and store SGML in a relational database".  In 
response, Charles Goldfarb said "SGML should be stored in a flat 
file in a native form".  This presentation looked at some of the 
different possible approaches.

Computer systems, at the operating system level, have an extremely 
linear view of the world (i.e. they see everything as a row of 
bytes).  The application program, which actually uses files, has a 
different view of the world; it sees SGML etc. as a sequence of 
characters, although it still goes through the file in a linear fashion.  
However, you might want to jump right into the middle of an 
SGML file (for example to pick up a particular entity), accessing 
information in the same way that humans do.

The design of a system that needs to access SGML-tagged 
information in this very direct way, must incorporate a number of 
goals.  The design must be open, that is it must provide full SGML 
text to any application (with or without entity management? - 
Tim commented that it is good to see that relevant PD software 
tools will soon appear, such as the POEM software announced 
earlier in this conference).  The design must make information 
highly accessible, retrievable on the basis of structure, content, 
and linkage.  It should also be "alive", allowing information to be 
updated quickly and safely.  The design should support 
"Document Management" (i.e. versioning and workflow), and it 
should be able to do all of the above quickly.

What does "Open" mean in this context?  Flat filesystem access 
(real or emulated) is the lowest common denominator shared by all 
open systems, but being able to pass SGML files between systems 
requires additional sophistication.  You really need to develop an 
"element server" (effectively an element API) to have truly open 

There are four possible strategies which can be adopted when 
developing a solution.  The first involves the use of standalone flat 
files (where the SGML tagged information sits in separate files);  
this approach offers complete SGML fidelity, maximal openness, 
and relative ease of update - however, retrieval can be hampered 
by poor tools and performance, and it is difficult to perform 
updates safely and securely.  From the standpoint of data 
management, this approach appears neutral (neither especially 
good nor bad) because although there are no tools, there are also no 

The second strategy involves the use of indexed flat files; this is 
the approach adopted by the Opentext product (i.e. it builds 
structure and content indexes which can be then used to 
access/retrieve information, support updating etc.; the indexes 
relational database.  The arguments in favour of this approach are 
that it allows complete SGML fidelity, excellent openness, and 
excellent retrieval.  However, update implementation is complex, 
as it is difficult to insert bytes into the middle of a large 
indexed flat file without creating problems.
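
The general idea of an indexed flat file can be sketched in a few 
lines of Python: the SGML text itself is left untouched, while a 
separate structure index records where each element starts and 
ends.  This is a toy illustration and not a description of how the 
Open Text product works; it assumes fully normalized markup (every 
start tag has an explicit end tag), it uses character offsets rather 
than byte offsets, and the function names are invented.

```python
import re

# Matches a start or end tag and captures the "/" (if any) and the element name.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*>")


def build_structure_index(text):
    """Return {element name: [(start offset, end offset), ...]} for normalized SGML."""
    index = {}
    open_elements = []  # stack of (name, offset of start tag)
    for match in TAG.finditer(text):
        is_end_tag = match.group(1) == "/"
        name = match.group(2)
        if not is_end_tag:
            open_elements.append((name, match.start()))
            continue
        if not open_elements or open_elements[-1][0] != name:
            raise ValueError("unbalanced markup at offset %d" % match.start())
        _, start = open_elements.pop()
        index.setdefault(name, []).append((start, match.end()))
    return index


def fetch(text, index, name, occurrence=0):
    """Jump straight to one element using the index, without rescanning the file."""
    start, end = index[name][occurrence]
    return text[start:end]


doc = "<doc><title>Report</title><p>First.</p><p>Second.</p></doc>"
index = build_structure_index(doc)
second_p = fetch(doc, index, "p", 1)
print(second_p)
```

The sketch also shows why updates are awkward: inserting text in the 
middle of the file shifts every later offset, so the index has to be 
repaired or rebuilt.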

The third strategy requires the use of a relational database (and 
Tim pointed out that 90% of the world's existing business 
information is already stored in relational databases - so there is 
a great deal of expertise with this kind of database).  In a relational 
database, SGML elements or entities are mapped into relations, 
and extra information is included to model the SGML element 
hierarchy.  Some extensions to SQL may be provided to support 
this approach.

Tim gave some examples of how relational databases have been 
used to handle SGML in some of the products demonstrated at this 
conference.  The first used a single table of 
three columns (Context representation, Properties, and SGML 
Text); record boundaries are inserted at the starts of a subset of 
"distinguished" elements.  The hierarchy is stored and rebuilt via 
Context Rep [?], whilst metadata (versioning and presentation) is 
stored in properties.  This means that the SGML can be 
reconstructed on demand, and the product can perform structure 
and/or content queries on the distinguished elements.  In his 
second example, the relational database was built from a simple 
decomposition process - breaking the documents down into 
distinguished elements (e.g. paragraphs); each table record is one 
such element, with one field of text, and the remaining fields holding attributes.  In 
his final example, the SGML text is stored in BLOBs (Binary 
Large Objects) divided purely for implementation reasons.  In this 
case, elements are stored as table entries with the fields: BLOB id, 
parent ID, sibling ID, first child ID, and attributes.
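
The last of these layouts can be sketched with an in-memory SQLite 
database in Python.  This is an illustration only: the table and 
column names, the extra element-name column, the storage of 
attributes as a plain string, and the sample rows are all 
assumptions, since the talk did not give that level of detail.  The 
sketch also ignores mixed content (text is assumed to come before 
any child elements).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE text_blob (id INTEGER PRIMARY KEY, data TEXT);
CREATE TABLE element (
    id             INTEGER PRIMARY KEY,
    name           TEXT,     -- element name (added for the example)
    blob_id        INTEGER,  -- text content, if any
    parent_id      INTEGER,
    sibling_id     INTEGER,  -- next sibling
    first_child_id INTEGER,
    attributes     TEXT
);
""")
conn.executemany("INSERT INTO text_blob VALUES (?, ?)", [
    (1, "Storing SGML"),
    (2, "Flat files are simple."),
])
conn.executemany("INSERT INTO element VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "sec", None, None, None, 2, 'id="s1"'),
    (2, "title", 1, 1, 3, None, None),
    (3, "p", 2, 1, None, None, None),
])


def rebuild(conn, element_id):
    """Rebuild the SGML text of one element by following child and sibling links."""
    name, blob_id, first_child, attrs = conn.execute(
        "SELECT name, blob_id, first_child_id, attributes FROM element WHERE id = ?",
        (element_id,),
    ).fetchone()
    parts = ["<" + name + (" " + attrs if attrs else "") + ">"]
    if blob_id is not None:
        parts.append(conn.execute(
            "SELECT data FROM text_blob WHERE id = ?", (blob_id,)
        ).fetchone()[0])
    child = first_child
    while child is not None:
        parts.append(rebuild(conn, child))
        child = conn.execute(
            "SELECT sibling_id FROM element WHERE id = ?", (child,)
        ).fetchone()[0]
    parts.append("</" + name + ">")
    return "".join(parts)


rebuilt = rebuild(conn, 1)
print(rebuilt)
```

Even in this small form, the hierarchy has to be reassembled by 
following pointers from row to row, which gives some feel for the 
fidelity and retrieval concerns raised about the relational 
approach.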

There are advantages to using the relational database approach.  
SQL is a world-standard tool; the theory and practice for safe and 
efficient updating of relational databases is well-understood.  It 
also offers the possibility of excellent integration with existing 
document management techniques.  However, using relational 
databases offers poor SGML fidelity, the openness of the 
information is compromised, and retrieval performance can also be 

With this in mind, Tim proposed his fourth strategy for developing 
an open system for handling SGML information - using a "native 
SGML" (object-oriented type) database.  In this case the Database 
Definition Language (DDL) is SGML, and the Data Manipulation 
Language (DML) is also SGML-oriented; there is no hidden or 
relational machinery.  As an example of a product that has 
implemented this approach, Tim talked about SGML/Store; the 
input is via an SGML parser, and the database essentially stores 
the resulting ESIS.  The software has API primitives for tree 
navigation and content queries, and treats both elements and 
attributes as nodes.  It uses a transaction-oriented, versioned-object 
model, and supports multiple instances and DTDs per database.

The advantages to using a native SGML database are that it allows 
for complete SGML fidelity, it gives an opportunity to implement 
commercial database security and integrity features, and it also 
makes it possible to optimize for performance.  The disadvantages 
are that it involves a proprietary implementation, and the use of 
proprietary API and/or DML.
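
To make the idea of tree-navigation and content-query primitives 
more concrete, here is a minimal Python sketch in which elements, 
attributes, and text are all nodes of one tree.  The class and 
function names (Node, descendants, content_query) are hypothetical 
and chosen for this illustration only; they do not reproduce the 
API of SGML/Store or of any other product.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """One node of the stored tree: an element, an attribute, or text."""
    kind: str                      # "element", "attribute" or "text"
    name: str = ""
    value: str = ""
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach `child` below this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def descendants(node: Node) -> Iterator[Node]:
    """Tree navigation: depth-first walk over every node below `node`."""
    for child in node.children:
        yield child
        yield from descendants(child)


def content_query(root: Node, word: str) -> List[Node]:
    """Content query: parents of the text nodes that mention `word`."""
    return [n.parent for n in descendants(root)
            if n.kind == "text" and word in n.value]


# Assumed sample instance: <doc><para id="p1">native SGML database</para></doc>
doc = Node("element", "doc")
para = doc.add(Node("element", "para"))
para.add(Node("attribute", "id", "p1"))
para.add(Node("text", value="native SGML database"))

hits = content_query(doc, "native")
print([n.name for n in hits])
```

Treating attributes as nodes means that the same navigation 
primitive reaches both markup and content, which is one way to read 
the claim of complete SGML fidelity.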

In conclusion, Tim recommended that if it is possible to get away 
with storing SGML documents in flat files, then this is the 
preferred solution as it is simple and thus safe.  He felt that there is 
still a major requirement for the development of relevant standards 
(e.g. an SQL for handling SGML documents), and hoped that this 
need would be met sooner rather than later.

27.     "User Requirements for SGML Data Management" - 
	 Paula Angerstein (Texcel)

[Paula suggested that an appropriate alternative title to this 
presentation could well have been the question, "Why do we want 
to have SGML-based approaches to data management in the first 
place?"]

The current business trend shows a growing awareness of 
documents.  Documents are at last being recognized as a corporate 
asset, although they are often not managed as well as other types 
of corporate information (such as financial data).  There also 
appears to be a gap in the corporate information infrastructure (for 
example, many large companies cannot share documents 
effectively and efficiently internally) - so strategies are needed to 
track and share information.

There are a number of common document-related problems in 
business.  For example, finding the right version of the right 
document, keeping related documents compatible with each other, 
and synchronizing documents with the latest data.  Other typical 
problems include getting a document customized for a particular 
job, reusing appropriate existing document parts, and coordinating 
the multiple access and update of documents.

An effective document repository management system should 
provide a number of benefits: it should support automated quality 
control, maintain document security, account for and trace any 
amendments, facilitate document reuse, and assist worker 
coordination.  The successful solution to providing such a system 
must offer facilities for the automation of the main business 
processes, provide for the collaborative reuse of information, and 
be easy to integrate with existing data management practices 
(although this is often more of a cultural than a technical problem).

The key to a successful information management strategy is a 
centralized information repository.  It represents a "logical vault" 
for all the documents and information relevant to a workgroup.  As 
a managed collection, documents can be browsed, queried, and 
used by all members of a workgroup to collaborate on projects.  
Versions, configurations, and status of information can be 
centrally kept up-to-date, providing automated quality control, 
accountability, and traceability.  Information can be shared and 
reused, guaranteeing integrity of data and boosting productivity.

SGML-based repository management goes beyond document 
image and/or traditional document management.  It makes it 
possible to use the rich set of information in a document  - 
namely the markup as well as the content.  Document components 
can be shared, reused, and subject to configuration management.  
Moreover, it means that automated processes can be driven by the 
document contents (i.e. by the data), and so do not have to be left 
to other tools.

This approach is different from traditional product configuration 
management, in so far as documents are structured but have 
"unpredictable" data (i.e. they contain elements which have no 
fixed size, order or content type).  The typically hierarchical 
structure of documents does not model naturally as relational 
tables, and the level of granularity required to track changes etc. 
probably does not correspond to a document as it is presented to 
the end user.  Similarly, most people probably would not wish to 
adopt a forms-based document editing environment.

Although SGML is often thought of as an interchange format only, 
it also provides a small but powerful set of semantics for document 
data modelling.  Element and attribute definitions provide a 
"schema" and attributes for objects in a repository.  Entity 
definitions provide a way to share units of information, and 
IDs/IDREFs (together with appropriate external identifier 
resolution or HyTime mechanisms) provide repository-wide 
linking.  The benefits of using SGML modelling in a data 
repository stem from the fact that SGML is optimized for 
documents.  Document markup and structure contribute to the 
process of retrieval.  It also means that you need only one model 
and language for information creation, management, and 

SGML repository management enables new business processes.  It 
becomes possible to have on-demand generation of new 
information products, through dynamic document assembly and 
"data analysis".  Element-based access control makes it easier to 
share document components amongst the members of a 
collaborative workgroup.  It also becomes possible to track the 
life-cycle of document components through element-based 
configuration control, whilst the use of structure-based queries 
allows dynamic online viewing and navigation around document 
components within the repository.

SGML repository management also facilitates existing business 
processes such as storage, filing, browsing, query, retrieval, access 
control, auditing, routing, versioning, job tracking and reporting, 
archiving and backup.  For many of these functions, an SGML 
repository enables operation on individual elements in addition to 

Documents are being recognized as an increasingly important part 
of the information infrastructure of a company.  With the 
introduction of SGML-based approaches, we should witness a 
gradual movement from the current notion of "document 
management" to that of "information management".

28.     "A Document Query Language for SGML Databases" 
	 - Ping-Li Pang, Bernd Nordhausen, Lim Jyh Jang, 
	 Desai Narasimhalu (Institute of Systems Science, 
	 National University of Singapore)
	 [This presentation was delivered by Ms Ping-Li Pang.]  

The Institute of Systems Science is one of the main research 
departments at the National University of Singapore.  Recently, 
they have been looking at managing documents, especially using 
SGML-based approaches, and this has led to the development of 
DQL, a document query language for SGML databases.

The main requirements for a language to query a database of 
SGML structured documents are that it must support queries on 
the basis of complex structural information, facilitate information 
retrieval, and assist document development.  Ping-Li talked 
through some examples of query expression in DQL, showing the 
typical form of a query (select ... from ... where ...), and the use of 
path expressions (for elements, database attributes, and SGML 
attributes).  She then discussed the DQL method for expressing the 
following types of query: a DTD structural query, a document 
instance structural query, a document instance content query, a 
query on versioning information, and a query on links.  [Readers 
who would like to see examples of these queries should probably 
contact the DQL team at the Institute of Systems Science].
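
Since no DQL examples are reproduced in this report, the following 
Python sketch only illustrates the general select/from/where shape 
of a query that combines a path expression with a structural or 
content condition.  It is emphatically not DQL syntax: the select() 
helper, the sample document, and the use of a well-formed XML 
stand-in for an SGML instance are all assumptions made for this 
illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample instance; a well-formed XML stand-in for an SGML document.
SAMPLE = """
<report>
  <section status="draft"><title>Storage</title><para>Flat files are simple.</para></section>
  <section status="final"><title>Queries</title><para>Structure and content.</para></section>
</report>
"""


def select(what, from_path, where, root):
    """Return the text of `what` in each `from_path` match that passes `where`."""
    return [hit.findtext(what)
            for hit in root.findall(from_path)
            if where(hit)]


root = ET.fromstring(SAMPLE)

# Query on an SGML attribute: titles of sections whose status is "final".
titles = select("title", "./section",
                lambda s: s.get("status") == "final", root)

# Query on content: titles of sections with a paragraph mentioning "simple".
content = select("title", "./section",
                 lambda s: "simple" in (s.findtext("para") or ""), root)

print(titles, content)
```

The path expression chooses the granularity of the query (here, 
sections), while the condition can test either markup or content.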

DQL was an attempt to implement an SQL-like language for use 
with SGML.  It has the expressive power to query on structural 
and/or content information at any granularity.  DQL is being 
implemented at the Institute of Systems Science and an initial 
prototype that has all the features of DQL will be ready in March, 

During questions, Ping-Li stated that the "database attributes" 
mentioned in her examples are defined when the database model is 
developed (i.e. the database attribute "title" maps from the content 
of the SGML element "title" in the relevant DTD[?]).  The DQL 
development team have not looked at HyQ for querying SGML 

29.     Closing Keynote - Michael Sperberg-McQueen (Text 
	Encoding Initiative)

Following the success of the presentation he gave last year, 
Michael Sperberg-McQueen had again been invited to deliver the 
closing keynote speech.  The full text of his address will be posted 
to the comp.text.sgml newsgroup.

Michael was really posing the same question that he asked last 
year, "What will happen to SGML and the market for SGML 
information in the future?"  He emphasised that he would be 
stating his personal opinions only, and they should not be taken as 
representative of the TEI or any other institution.

Michael noted that some progress had been made on several of the 
issues that he raised in his closing address at SGML '92.  The 
growing expertise in SGML has meant that improved styles of 
DTD design are being adopted.  DTDs are being developed to 
meet the users' information handling needs, not the requirements 
of a system or application (often evidenced in earlier DTDs by the 
presence of processing-motivated "tweaks").  The HyTime engines 
which are now approaching should be able to close some of the 
"gaps" of using SGML, since they will make it possible to use 
architectural forms to convey some sense of system and/or 
application awareness without compromising the SGML source.

SGML promises data portability, but it may also lead to 
application portability.  The HyTime and DSSSL standards should 
facilitate this process, and developers will need to know about 

The biggest change since SGML '92 is the amount by which the 
volume knob on the so-called "Quiet revolution" has been turned 
up.  SGML has already begun its entry into mainstream 
information processing.  It is already a standard that is being 
adopted world-wide.

Whilst the use of HTML (the Hypertext Markup Language) on the 
World Wide Web is perhaps not an ideal demonstration of what 
SGML-based approaches can achieve, it is doing a very good 
job of selling SGML to people who were previously ignorant or 

Michael said that he would like to see SGML-awareness 
embedded into as many mainstream applications as possible as a 
matter of course.  When SGML-awareness is embedded almost 
incidentally in an application, then SGML will be able to realize 
its full potential, and it could be argued that this is already 
beginning to happen.  The future is perhaps not only closer than 
we think, it may already be here now; the products demonstrated 
at this conference, and the public domain tools such as 
ObjectSGML, POEM and CoST are perhaps examples of this.

Michael predicted that it might not be too long before we see 
SGML being used in the area of literate programming.  It is clearly 
an ideal case for an SGML application.

SGML to SGML transformations have been a key topic at this 
conference.  This is important, because in future only legacy data 
that is being converted to SGML, and outputs from SGML 
systems (e.g. for printing a document), will exist in non-SGML 
form.  All other information interchange will be done using SGML 
and thus DTD to DTD transformations will be a fundamental 
issue.  The points raised by Dave Sklar in his presentation on 
GLTP (now to be re-named STTL) will become highly pertinent, 
as will the requirement to use GLTP and DSSSL in general.

Things are going to become more complicated, therefore we will 
all need new/better tools for things like DTD design and 
development, SGML systems development, DTD transformation 
design, and the actual transformations themselves.
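
At its very simplest, a DTD to DTD transformation renames elements 
from a source document type to their counterparts in a target 
document type.  The Python sketch below shows only that simplest 
case; it is not GLTP/STTL or DSSSL, and the RENAME table, the 
element names, and the XML stand-in for an SGML instance are 
assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from element names in a source DTD to a target DTD.
RENAME = {"article": "doc", "head": "title", "p": "para"}


def transform(elem):
    """Copy `elem` and its subtree, renaming elements according to RENAME."""
    out = ET.Element(RENAME.get(elem.tag, elem.tag), dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    out.extend([transform(child) for child in elem])
    return out


source = ET.fromstring(
    "<article><head>SGML</head><p id='p1'>Text</p></article>")
result = ET.tostring(transform(source), encoding="unicode")
print(result)
```

Real transformations also have to reorder, merge, and split 
elements, and to cope with content models that differ between the 
two DTDs, which is why dedicated tools are called for.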

There is clearly still much more work to do.  Mainstream vendors 
will need to understand more about SGML, and if they do not 
change their products, in the long run they will lose out as 
customers come to expect/demand embedded SGML-support.  It is 
certain that technology will keep on changing, and superficially 
attractive (non-SGML-based) solutions to managing information 
will always ultimately fail.  For when this situation comes about, 
SGML will not just be in the mainstream, it will be the mainstream.

For further details of any of the speakers or presentations, 
please contact the conference organizers:

Graphic Communications Association
100 Daingerfield Road 4th Fl
Alexandria VA 22314-2888
United States of America

Phone:    +1 (703) 519-8160
Fax:      +1 (703) 548-2867

You are free to distribute this material in any form, provided 
that you acknowledge the source and provide details of how to 
contact The SGML Project.  None of the remarks in this 
report should necessarily be taken as an accurate reflection of 
the speaker's opinions, or in any way representative of their 
employer's policies.  Before citing from this report, please 
confirm that the original speaker has no objections and has 
given permission.

Michael G Popham
SGML Project - Computer Development Officer
Computer Unit - Laver Building
North Park Road, University of Exeter
Exeter EX4 4QE, United Kingdom

email: (INTERNET)
Phone:  +44 392 263946  Fax:  +44 392 211630