Credits

The following report was obtained from the Exeter SGML Project FTP server as Report No. 9, in UNIX "tar" and "compress" (.Z) format. It is unchanged here except for the conversion of SGML markup characters into entity references, in support of HTML.

SGML '91 Conference Report, by Michael Popham

UNIVERSITY OF EXETER COMPUTER UNIT                        SGML/R9

THE SGML PROJECT

CONFERENCE REPORT:

    SGML '91
    THE OMNI BILTMORE HOTEL, PROVIDENCE,
    RHODE ISLAND, USA
    OCTOBER 20th-23rd 1991
                                                        Issued by
                                                   Michael Popham
                                                29 September 1992
_________________________________________________________________


1.  SUBJECT

The SGML '91 conference was organized by the Graphic
Communications Association (GCA).  In their promotional
literature, the GCA used the following terms to describe the
conference:

    "SGML '91 will be a high-speed, interactive, meeting of 
     the people who are using SGML right now or are just on 
     the verge.  We will hear SGML success stories and
     discuss the problems and issues that face many users.   
     The agenda includes guided discussions of common concerns.   
     Participants will be called upon to join working groups 
     and to contribute to conference documents which may
     shape standard practice in this developing field!"

In this long report, I tried to record as much about the
presentations and events at the conference as I could.   I take
full responsibility for any mistakes or misrepresentations and I
apologize in advance to all concerned.  Any text enclosed in
square brackets is mine.

This report could not have been completed without the assistance
of Neil Calton and the goodwill shown by his employers,
Rutherford Appleton Laboratory (RAL).  The SGML Project takes
full responsibility for all the facts and opinions given in this
report, none of which necessarily reflect opinion at RAL.  All
Neil Calton's contributions are indicated by [NBC];  all other
reports were written by me.

1.1.    List of Contents

    1.  Subject

    2.  Background

    3.  Programme - Day 1

        3.1  "SGML The Year In Review, A User Guide to the
             Conference" -- Yuri Rubinsky (President, SoftQuad Inc)
        3.2  "SGML - The Wonder Years" -- Pam Gennusa (Consultant, 
             Database Publishing Systems)
        3.3  "Grammar Checking and SGML" -- Eric Severson (Vice 
             President) and Ludo Van Vooren (Director of Applications,
             Avalanche Development Company)
        3.4  "Attaching Annotations" -- Steven DeRose (Senior 
             Systems Architect, Electronic Book Technologies)
        3.5  "Using Architectural Forms" -- Steve Newcomb (President,
             TechnoTeacher Inc)

        Case Studies
        3.6  "Implementing SGML for the Florida Co-operative
             Extension Service" -- Dr Dennis Watson (Assistant 
             Professor, University of Florida)
        3.7  "SGML User Case Study" -- Susan Windheim (Technology 
             Consultant, Prime Computer Technical Publications)
        3.8  "STEP and SGML" -- Sandy Ressler (National Institute 
             of Standards and Technology) [NBC]
        3.9  "Multi-vendor Integration of SGML Tools for Legal
             Publishing" -- Francois Chahuneau (AIS/Berger-Levrault)
        3.10 "Developing a Hypertext Retrieval System Based on
             SGML" -- Tom Melander (Sales Engineering Manager,
             Dataware Technologies) [NBC]

        Application Topic#1
        3.11 "Data for Interactive Electronic Technical
             Manuals (IETMs)" -- Eric Freese (Senior Technical Specialist, 
             RYO Enterprises)
        3.12 "International SGML Users' Group Meeting

     4. Programme - Day 2

        Reports from the Front - Various Speakers
        4.1 "OSF's Pursuit of DTDs" -- Fred Dalrymple (Group Manager,
            Documentation Technology, Open Software Foundation)
        4.2 "The Text Encoding Initiative: A(nother) Progress
            Report" -- Lou Burnard (Co-ordinator, Oxford Text Archive,
            Oxford Computing Service)
        4.3 "TCIF IPI SGML Implementation" -- Mark Buckley (Manager, 
            Information Technology, Bellcore)

        Application Topic#2
        4.4 "Rapid DTD Development" -- Tommie Usdin (Consultant, Atlis
            Consulting Group)

        Poster Session 1
        4.5 "Tables in the Real World" -- Various speakers
        4.6 "Handling Tables Practically" -- Joe Davidson (SoftQuad)
        4.7 "TCIF approach to Tables" -- Mark Buckley (Manager, 
            Information Technology, Bellcore)
        4.8 "Format-oriented vs content-oriented approaches
            to tables" -- Ludo Van Vooren (Director of Applications,
            Avalanche Development Company)
        4.9 "How should statistical packages import/export
            SGML tables?" -- Peter Flynn (Academic Computer Manager,
            University College, Cork)

        Formatting Issues and Strategies - Various speakers
        4.10 "Formatting - Output Specifications" -- Kathie Brown
             (Vice-President, US Lynx)
        4.11 "Formatting as an Afterthought" -- Michael Maziarka 
             (Datalogics Inc., Chicago, Illinois)
        4.12 "Native versus Structure Enforcing Editors" --
             Moira Meehan (Product Manager, CALS Interleaf)

        Poster Session 2 
        4.13 "Verification and Validation" -- Eric Severson 
             (Vice-President, Avalanche Development Company)
        4.14 AAP Math/Tables Update Committee Chair - Paul Grosso

     5. Programme - Day 3

        5.1 "Unlocking the real power in the Information" --
            Jerome Zadow (Consultant, Concord Research Associates)
        5.2 "The Design and Development of a Database Model
            to Support SGML Document Management" -- John Gawowski, 
            Information Dimensions Inc [NBC]
        5.3 "A Bridge Between Technical Publications and Design 
            Engineering Databases" -- Jeff Lankford, Northrop 
            Research and Technology Centre [NBC]
        5.4 "Marking Up a Complex Reference Work Using SGML
            Technology" -- Jim McFadden, Exoterica [NBC]
        5.5 "Nuturing SGML in a Neutral to Hostile Environment"
            -- Sam Hunting, Boston Computer Society [NBC]
        5.6 Trainers Panel [NBC]
        5.7 "Reports from the Working Sessions [NBC]
             #1 Standard Practices -- Eric Severson and Ludo Van
                                      Vooren
             #2 A Tool for Developing SGML Applications

     6. Summary


2.  BACKGROUND

This was a well-attended conference, with over 150 participants.
European interests were sadly under-represented, with only about
10 attendees from E.C. nations (of whom two-thirds were from
academic or research institutions, and the remaining third came
predominantly from Dutch publishing houses).  Japan had sent
representatives from Fujitsu International Engineering Ltd and
the Nippon Steel Corporation, but all the other attendees were
from North America.  The American delegates were a reasonable mix
of SGML software houses and consultancies, academic/research
institutions, and many large corporations;  the following were
well-represented:  AT&T, Boeing, Bureau of National Affairs Inc.,
IBM, InfoDesign Corporation, Interleaf, US Air Force and several
U.S. Departments.   Various activities preceded the start of the
conference proper, most notably a short tutorial for those who
had had little direct experience of SGML Document Type
Definitions (DTDs) and a guided tour of the Computer Center at
Brown University.   The tour highlighted Brown's success at
forging sponsorship deals with the commercial world, and we were
shown rooms full of workstations available for undergraduate use.
We were also given a demonstration of some of their research on
computer graphics - including the construction of an
'intelligent' graphic object (a cheese-seeking mouse) using a
number of logical building blocks.


3.  PROGRAMME - Day 1


3.1. "SGML The Year In Review, A User Guide to the Conference"
     -- Yuri Rubinsky (President, SoftQuad Inc.)

The conference opened with a presentation from the Chair, Yuri
Rubinsky.   He displayed a graph showing the dramatic rise in
attendance of such conferences since SGML '88 was held, and
suggested that this reflected the growth in interest in the whole
computing community.

Rubinsky then went on to talk about a range of SGML-based
activities, and noted that every initiative that he had talked
about at SGML '90 was still on-going (and thus such projects
should not be thought of as 'flash-in-the-pan').  He then listed a
large number of points, many of which I have tried to reproduce
below;  any inaccuracies or omissions are mine.

Under the general heading of SGML Applications, Rubinsky stated
that the Text Encoding Initiative now includes fifteen affiliated
projects, with a combined funding of $30+ million, and involving
approximately 100,000 people.  He also reported that the European
Workgroup on SGML (EWS) is continuing to develop its MAJOUR DTD
-- which will now incorporate parts of the AAP's work for its
body and book matter; the European Physics Association are to
adopt MAJOUR and also intend to campaign for changes to the AAP
DTD.

Rubinsky stated that the Open Software Foundation (involving such
companies as IBM, Hitachi, HP, and Honeywell) are to use SGML for
all their documentation.   Also the French Navy are to have their
suppliers provide documentation marked up with SGML, and the CALS
initiative itself remains very active.

With regard to international standards, Rubinsky noted that
HyTime was about to go forward as a Draft International Standard,
with balloting running until April 1992.  Meanwhile, the Document
Style Semantics and Specification Language (DSSSL) has been
approved as a Draft International Standard, and the Standard Page
Description Language (SPDL) was due to finish balloting at the
end of October -- after which Rubinsky expected it to move into
being a standard.  He also reminded everyone that ISO 8879, the
Standard dealing with SGML, is now entering its review period;
any subsequent version(s) will be backwardly compatible with
existing applications and systems.   Rubinsky said that the
current intention is to maintain a database of all comments made
about ISO 8879, which will be submitted with markup conforming to
a specially developed DTD.

Rubinsky mentioned several interesting projects and initiatives.
Microsoft is about to release an updated version of its
"Bookshelf" CD-ROM, which will be partially coded with SGML
markup.   The ACM is to extend use of SGML to cover its
documentation, whilst the group of banks involved in the SWITCH
scheme will produce all their internal documentation with SGML
markup.   The CURIA project at University College, Cork,
(Ireland) will spend ten years marking up Irish literary
manuscripts, and the publishers Chadwyck-Healey have recently
released a CD-ROM containing the works of 1350 poets marked up
with SGML.

Rubinsky also referred to the work of the Text Encoding Initiative
(TEI) and that of the SGML Forum (Japan) -- the Japanese chapter
of the SGML Users' Group, who have recently produced a piece of
software called "SGF", a Simple SGML Formatter.  The Canadian
Government have stated that they will be producing all their
national standards using SGML marked up text.   In France, three
dictionaries are being produced with the help of SGML, whilst a
number of car and aeroplane manufacturers have begun to use SGML.
In addition, Rubinsky spoke of the work being carried out at the
Biological Knowledge Laboratory (Northeastern University, USA) to
produce a body of on-line SGML texts and knowledge-based tools.
He also cited the work of the British National Corpus -- a
collection of around 100 million words, tagged according to the
guidelines produced by the TEI.

Rubinsky briefly noted that three relevant books had appeared in
the past year:  Goldfarb's "The SGML Handbook", a Guide to CALS
(No authors mentioned) and a bibliography on SGML (produced at
Queen's University at Kingston, Canada).   He then went on to
list the plethora of new products, technology and services that
had appeared since SGML '90, including the following [Please
forgive any omissions]: the ARC SGML Parser (now installed at
100+ sites and ported from DOS to several versions of UNIX),
AIS/Berger-Levrault's "SGML Search", AborText's "SGML
Editor/Publisher", Avalanche's "FastTag" (now available for DEC
workstations), Agfa CAP's next generation of CAPS software, Bell
Atlantic's SGML-based hypertext product, the Computer Task Group
(Consultants), Electronic Book Technologies' new version of
"Dynabook", E2S's "EASE", Exoterica Corporation's "Omnimark",
Office Workstations Limited (OWL)'s "Guide Professional
Publisher" (which creates hypertext versions of SGML
documents), PBTrans (a context-sensitive replacement program to
be released as shareware), and the SEMA Group's work on its
"Mark-It" and "Write-It" products.

On the corporate front, Rubinsky said that Framemaker are
committed to producing an SGML-based version of their product by
the end of 1992.   Thirty percent of Exoterica Corporation had
recently been sold to a large French firm.   WordPerfect
Corporation have said that they will enable users to manually tag
documents (although there is no stated time-frame) and IBM have
also announced a commitment to providing users with SGML
facilities.

Rubinsky closed by citing various published items on SGML use,
and remarking upon work to produce the Publishers' Interchange
Language (PIL), which will be based on SGML.



3.2 "SGML - The Wonder Years" -- Pam Gennusa (Consultant,
    Database Publishing Systems)

Gennusa began by reminding the conference that the fifth
anniversary of ISO 8879 had only just passed;  she stated that
her presentation would look at what had happened over  that five
year period, and how we could expect to see the use of SGML
evolve.

Indulging in a little wordplay with the third letter of the
acronym SGML, Gennusa suggested that the Standard Generalized
MARKUP Language had been the direct result of the decision to
begin using a generic tagging scheme for markup.   However,
recent practice has been to treat SGML as a MODELLING language,
where Document Type Definitions (DTDs) are written to model the
"content reality" of a type of document.  Nowadays, there is a
growing recognition that SGML provides the support of a document
MANAGEMENT language -- with the creation of DTDs that assist in
the management of document instances and applications.

Gennusa had observed that a number of similarities have emerged
in SGML usage.  Conventions for DTD construction are beginning to
appear -- for example, placing all entity declarations at the
start of a DTD, typically having attribute list declarations
immediately below element type declarations, and conventions for
naming parameter entities.   The need for modelling information
corpora rather than the information contained in individual
documents has also been recognized -- with the emergence of the
HyTime Standard, work on Interactive Electronic Technical Manuals
(IETMs) for the U.S. Department of Defense, and the AECMA 1000D
specification for technical information for the European Fighter
Aircraft.
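
[To make these layout conventions concrete, the following DTD
fragment -- which is purely illustrative and not taken from any
DTD discussed at the conference -- is arranged in the manner
Gennusa described, with parameter entity declarations gathered at
the top and each attribute list declaration placed immediately
below the element type declaration it qualifies:]

    <!-- parameter entities declared at the start of the DTD -->
    <!ENTITY % p.text    "(#PCDATA | emph | xref)*" >

    <!-- element type declarations, each with its attribute list
         immediately below it -->
    <!ELEMENT report     - -  (title, section+) >
    <!ATTLIST report
              security   (open | restricted)  open >
    <!ELEMENT title      - -  %p.text; >
    <!ELEMENT section    - -  (title, para+) >
    <!ATTLIST section
              id         ID    #IMPLIED >
    <!ELEMENT para       - -  %p.text; >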

Gennusa also noted that the use of SGML has become increasingly
mature.   She claimed that there has been "more concentration on
content over pure structure", "a more generic notion of behaviour
mapping", and "more flexibility in instance tailoring".   Also,
there were now new definitions of the term "document", and the
relationship between use of SGML and databases had begun to
strengthen and be consolidated.

Gennusa raised the "ol'saw" of content-oriented vs structure-
oriented markup, and suggested that publishing is only a
reflection or view of the real information contained within a
"document".  She suggested that information and its uses should
always be borne in mind and that the work of a DTD writer "must
reflect not just the document but [also] the application goals".
Indeed Gennusa suggested that it was strongly inadvisable to try
and write a DTD without establishing how the information will be
used;  this requires careful document and application analysis.
She briefly mentioned how early use of SGML had concentrated on
separating content from format, but noted that there now seemed
to be a movement towards the use of `architectural forms' (which
she described as " ... indicative of a class of behaviour that
would be manifested differently in different environments", and
also noted that " .. multiple elements within a DTD may have the
same architectural form, but require unique element names for
various reasons").

Gennusa stated that people's definition of a `document' is
changing, with the advent of hypertext, meta-documents, and text
bases.   However, SGML is keeping pace with these changes,
through the introduction of HyTime, meta-DTDs and modular DTDs.
She also felt that there was an increase in document instance
flexibility, through such practices as the use of parameter
entities for content models, the use of marked sections, and the
use of publicly identified declaration sets.
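
[Again purely by way of illustration (the names below are
invented), the kind of instance flexibility mentioned above can
be achieved by letting a parameter entity supply a content model,
and by wrapping optional declarations in a marked section whose
status is controlled by another parameter entity:]

    <!-- a parameter entity supplying a content model, which a
         particular declaration set may redefine -->
    <!ENTITY % warn.model  "(para+)" >
    <!ELEMENT warning  - -  %warn.model; >

    <!-- a marked section switched by a parameter entity
         (IGNORE here; INCLUDE in, say, a draft declaration set) -->
    <!ENTITY % draft  "IGNORE" >
    <![ %draft; [
      <!ELEMENT revnote  - -  (para+) >
    ]]>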

Gennusa then discussed work on the Interactive Electronic
Technical Manuals Database (IETMDB) DTD for the U.S. Department
of Defense.   The DTD is known as the Content Data Model (CDM),
and attempts to model all the data required for the maintenance
of a weapons system in such a way that selected sections can be
extracted for viewing on-line or on paper.   This is the first
use of HyTime's linking and architectural form features within a
defence environment.

AECMA 1000D is the specification for technical information
relating to the European Fighter Aircraft.   The primary object
of the system is a `data module', which contains a data module
code, management information, and content of a particular type.
The application makes extensive use of publicly identified
element and entity declaration sets (with one declaration per
element across all the declaration sets), marked sections, and
the use of parameter entity replacement text as content models.

Looking ahead, Gennusa identified a number of important
developments on the horizon.  She expected to see a strengthening
in the relationship between SGML and object-oriented programming,
systems and databases.   Data dictionaries will be used to
control the semantics used within an application, and the use of
architectural forms will increase.   The number of documents that
only ever exist within an SGML environment will grow, and
document instances will increasingly have content-oriented rather
than structure-oriented markup.   Gennusa felt that although the
work with SGML was far from complete, there was no possibility of
"turning back" [but if the President of the SGML User's Group
said anything different, I would be very worried!]

On a light-hearted note, Gennusa closed with some remarks on the
theory of the "morphogentic field" (M-field) which postulates
that once a piece of information has been acquired by a
particular portion of the population of a given species, this
knowledge will `become available' to all the member of the
species (and their descendants) via resonance across the M-field
-- whatever their physical location.   Gennusa hoped that the
number of participants at SGML '91 would be just sufficient to
spread recognition of the value of SGML as an enabling technique
for information producers and consumers, across the M-Field of
our species.



3.3 "Grammar Checking and SGML" -- Eric Severson (Vice
    President), Ludo Van Vooren (Director of Applications,
    Avalanche Development Company)

This presentation was aimed primarily at writers of (technical)
documentation within a commercial environment.   Severson and Van
Vooren began by giving the reasons for applying standard
practices to document handling procedures -- ie to provide an
impression of corporate identity and consistency (in terms of
writing style, information structure and layout etc), and also to
enable uniform and flexible processing.

Standardized formatting can be facilitated by separating document
form and content, making use of generic markup and style sheets,
and adopting appropriate international standards (such as SGML,
DSSSL, FOSI, ODA etc).  The contents of documents can also be
organized in a standard fashion -- by ensuring the separation of
content and structure, using object-oriented hierarchical
structures, and employing rule-based systems (eg with DTDs) to
enforce occurrence and sequence in structures.   The contents
themselves can also be standardized, through the adoption of
well-defined writing styles and techniques; this could involve
anything from passing text through spelling/grammar/readability
checkers, to adopting strict rules on vocabulary, sentence
construction etc (eg Simplified English).   Severson and Van
Vooren then spent some time looking at the capabilities of the
state-of-the-art tools that are available to help standardize the
writing of text -- such as spelling checkers, thesauri,
readability indexes and grammar and style checkers.  Of course
authors can also adopt self-defined rules on grammar, vocabulary,
sentence construction etc.

Using a markup system based on SGML makes it easier to apply any
standard practices and rule checkers to specific objects within a
document structure.   Moreover, any automated checking procedures
can extract information about the nature and context of an object
from the relevant (SGML) markup.  Severson and Van Vooren then
gave some examples of how markup could be used as a basis for
recognizing, and perhaps even automatically correcting, any non-
standard text occurring within a document. Thus, a poorly
written warning could be picked-up by a grammar checker suitably
primed to look for certain features within any text element
marked up as a warning;  readability checkers could be set up to
accept different levels for different elements in the text (eg to
reflect that help text should always have a much lower
readability level than, say, a technical footnote);  different
user-defined dictionaries could apply when spell-checking certain
sections of a document.
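
[The scrap of markup below is my own invention, intended only to
illustrate the idea: given content tagged in this way, a checker
could be primed to insist on complete, imperative sentences
inside any <warning> element, to hold <helptext> to a lower
readability level than ordinary <step> content, and to switch
user dictionaries for each element type:]

    <step>Disconnect the power cable from the rear panel.</step>
    <warning>Failure to disconnect the power cable may result
    in electric shock.</warning>
    <helptext>Press HELP at any time for more information.</helptext>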

On a slightly different note, Severson and Van Vooren closed with
a brief discussion on how grammatical understanding can improve
the performance of auto-tagging applications.  For example,
sentences and paragraphs can be checked for completeness before
they are marked up as such (which is particularly useful if text
is split across columns or pages).   A grammar checker can also
help to parse sentences in order to recognize (and tag) elements
such as cross-references, citations, and so on.



3.4 "Attaching Annotations" -- Steven DeRose (Senior Systems
    Architect, Electronic Book Technologies).

DeRose gave the first paper to deal with a single (technical)
issue in detail -- though without going into the minutiae of his
topic.  He began by raising the question of what do we mean by
the term "annotation", and suggested that the term could be
applied to any "text or other data supplemental to a published
document".  (By "published document", I took DeRose to be
referring to a time-fixed release of information intended for
human consumption  -- such as a particular version of a specific
article stored within a hypertext database.)   Thus "annotation"
includes anything which is not part of the document as published,
and excludes any author-created supplements to the published text
-- such as footnotes or sidebars.  This interpretation implies a
distinct separation between the work of the author, the published
document, and any reader-supplied annotation.

Since DeRose's work with Electronic Book Technologies focuses
primarily on the generation of and access to hypertext, his
presentation concentrated on the problem of annotating
electronic (hyper)texts.  Having defined what he meant by
"annotation", DeRose raised the general question of where they
should be stored -- inside or outside the document?   The main
advantages to storing annotations inside a document are that it
is easy to see where they attach, and it is easy to keep them
attached to the correct point if the text of the document is
edited.  However, he identified a number of problems with this
approach, for example the fact that readers would be allowed to
change a published document -- which may be undesirable in itself
-- but which might involve checking a reader/annotator's
authority, risk corrupting the original content or invalidating
its markup, and so on.  Moreover, it would be necessary to
identify any reader-annotation as distinct from the original
content -- which might require changes to the document's DTD, re-
validation etc -- and there would still be practical difficulties
to overcome, such as how to stop copies of the same published
document getting out of synch, how to robustly attach annotation
if the reader is given no write-permission, or the document is
stored on a read-only medium (eg CD-ROM).  Storing annotations
outside of the published document resolves many of these problems
because there is no danger of the original being changed or
corrupted, however, this approach raises difficulties of its own.
DeRose suggested it makes it harder to specify exactly where in
a document an annotation attaches to, more difficult to keep
annotations with the relevant part of the document as it is
edited, and the process is further complicated if the document's
publisher and reader are not on the same network.

DeRose then discussed several ways to attach annotation to an
SGML document. The `natural' way would be to use ID/IDREF
attributes but the problem with this is that most elements in a
document do not have IDs.  Another technique might involve
identifying a path down the document tree using named nodes (eg
BOOK, then CHAP4, then SEC3, then PARA27);  this avoids having to
take into account the actual contents of the document (such as
chapter titles etc) or the data type (eg CDATA), and is easily
specified and understood by humans.  However, changes in the
structure of the document tree might cause problems.   A related
approach would be to specify a path down the document tree using
attributes (DeRose gave the following example: "WORK with
name=Phaedo, SECTION N=3, LINE N=15"), or by using unnamed nodes
(such as "Root, then child 5, then 4, then 27") -- however there
could still be complications if the tree was altered.  Other
techniques which DeRose briefly mentioned included using element
numbering (which he described as "well-defined, but
inconvenient"), using token offset ("poorly defined") or using
byte offset (which he dismissed as "Not even well-defined in
SGML" [because?] "Nth data character is not what the file system
tells you").

DeRose felt that the main challenge to the successful handling of
annotation was how to cope with changes to a document (and thus
its underlying tree structure).  He identified two ways in which
the path through a document tree could be altered, or "broken" --    
either "perniciously" or "benignly".  Pernicious breaking occurs
when it is impossible for an application program to tell that the
path through a document tree has been broken, and this could
cause severe problems.   With benign breaking, by contrast, at least
the program would be able to recognize (and inform the user or
system) that an unrecoverable break had occurred in the path that
enabled the identification of the annotated text.

DeRose then considered in greater detail how each of the various
techniques for attaching annotation to an SGML document that he
had outlined earlier would be affected by changes made to the
document tree.  If element IDs/IDREFs had been used, DeRose
suggested that the link between the annotation and the relevant
text would remain very stable;  authoring software could help to
prevent the (accidental) re-assignment of IDs/IDREFs by the
document's publisher or annotator, and if such a link were to
fail it would do so benignly.   If the use of a path of named
nodes had been adopted,  DeRose felt that any "long-distance
breakage" should be very unlikely.  He pointed-out that breaks
would only occur if a very restricted set of elements changed --  
such as if a direct ancestor, or an elder sibling of an ancestor,
was added or deleted.   DeRose gave an example, stating that
"Named path pointers into Chapter 2 are safe no matter what
changes happen within other chapters" -- and argued that if such
path pointers broke, they would be likely to point to a node
which no longer existed, and thus would behave benignly.

DeRose believed that a path based on the attributes of document
tree nodes, would perform in a very similar way to a path of
named nodes, and would be equally benign.  However, he felt that
the technique of using a path through a document tree based on
unnamed nodes (eg  "Root, then child 5, then child 4 .... etc")
would be less likely to be benign when it broke.   For despite
the fact that breaks would only occur when changes were made to
certain elements, with mixed content models "the number of
"elements" would include CDATA chunks, and [so] the count can
change subtly".   DeRose held that the other techniques he had
discussed would all break perniciously;   thus a path based on
element numbering would break perniciously if any tagged element
was added/deleted/moved and it preceded the element the
annotation pointed to -- similarly if a preceding word was
added/deleted/moved with a token offset-based path, or a
preceding character with a byte offset-based path.

DeRose then compared the conventional (largely human-centered)
approaches to identifying sections of texts, with the techniques
he had been discussing.   He suggested that using IDs/IDREFs, or
stepping through a tree using named nodes or node attributes,
corresponded well to the way human beings typically refer to
(parts of) texts.   For example, we might use a scheme of serial
numbers to identify and refer to a particular paragraph in a
manual;  we might reference text in a book by its title, chapter,
section and paragraph number, or a piece of verse by the name of
the work, section and line number.   DeRose suggested that
identifying nodes in a document tree through a system of numbered
family relations (eg  "Root, then child 5,..... etc") was an
approach familiar to computer programmers, whilst referencing
solely on the basis of sequential numbering, token offset, or
byte offset was only used by "counting machines".

DeRose then looked at the sort of information one would want to
keep with an annotation -- chiefly details about the annotation
itself.  It would obviously be necessary to know what the
contents of the annotation were and where it attached in the
relevant document -- but it could also be very helpful to know
who made the annotation, when and what was its
purpose/function/type (eg criticism, support, correction etc).
There might also be some need to say how the annotation should be
presented -- although DeRose seemed somewhat sceptical of this.
However, he was very certain of the fact that for any document
processed electronically, it would be necessary to record the
version of the target document to which an annotation applies.

Looking to the future, DeRose discussed some of the features
encountered in the special case of annotating hypertext documents.
In a hypertext system -- where a document is stored as a web of
nodes rather than an hierarchical tree structure -- it would be
necessary to have at least two links to attach annotation:  one
between the annotation and the annotated node, and the other
between the annotation and the node which connects the annotated
node to the rest of the document web;  if there were several
other nodes which connected to the annotated node, they would all
need to know about the annotation.   [I am not sure that I
understood DeRose fully on this point, and may therefore have
misrepresented his argument].   DeRose also suggested that in a
hypertext [hypermedia?] system, each end of an annotation link
would need to specify "what application can handle the data"[?].

DeRose also noted that in a hypertext system, an annotation might
be attached to a node that "was not an element".  [By which I was
unclear whether DeRose meant "not a [single] element", "not a
[small] element", "not a [text] element", or something else];
his suggested solution was to "Point to [the] lowest element,
then [to an] offset within".

It is also conceivable that a reader might want to attach an
annotation that crosses element boundaries -- although DeRose
felt that this would be unlikely (as people usually want to refer
to elements), and could be resolved by simply having the
annotation attached to its start and end points.   Similarly, if
an annotation applied to multiple/complex/discontinuous hypertext
nodes, this could be resolved by putting the information about
each attach point in a specially created node.  In general,
DeRose's advice was for designers to implement the most flexible
and robust pointers that they can, so that any breaks will be
benign.  He also indicated that the emerging HyTime standard is
developing a range of techniques for handling annotation and
related problems (such as activity- and amendment-tracking).



3.5 "Using Architectural Forms" -- Steve Newcomb (President,
    TechnoTeacher Inc)

Newcomb has been closely involved with work to develop ISO/IEC
DIS 10744, which describes the Hypermedia/Time-based Structuring
Language (HyTime) and he is also the Chairman of the SGML Users'
Group Special Interest Group on Hypertext and Multimedia (SGML
SIGhyper).  He began his presentation by reminding listeners that
voting on ISO/IEC DIS 10744 had started only eleven days earlier,
and urging all interested parties to obtain a copy of the
document as soon as they can.   Newcomb then turned to his main
theme, the use of architectural forms, which are a direct
development of work on HyTime.  [I was unfamiliar with this
topic, and strongly suggest that any interested readers should
obtain a copy of ISO/IEC DIS 10744 and/or get in touch with SGML
SIGhyper].

Newcomb suggested that two sets of practices had recently emerged
in the computing world -- the use of OOPS (Object-Oriented
Programming Systems), and the use of generic markup (especially
SGML).  Although both came from quite different sources, they
share a fundamental concern about how data will be used.

Newcomb noted that one of the main principles underlying the
development and use of SGML was to enable the re-use of
documents.  However, he also noted that whilst a particular group
of people might find that SGML facilitates their re-use of their
own documents, different groups have different perceptions of
what is important in a document.  Thus rather than adopt, say, a
publicly registered DTD for the document type `report', many
groups of SGML users have been writing their own uniquely
tailored DTDs (or amending DTDs they have obtained from a public
source).  Such behaviour limits the re-usability of documents
which are, ostensibly, of the `same' type -- as it becomes
impossible for one group of users to easily re-use the documents
created by another.   Newcomb described the problem as one of
"creeping specialization and complexity", with the only
foreseeable result being a return to the "Tower of Babel"
document interchange scenario, which prompted the development of
SGML in the first place!

Newcomb's suggested solution, was that we (the SGML/HyTime/user
community) should only attempt to standardize the truly common
parts of a document [type].   Architectural forms enable the
creation of a set of rules at a meta-level;  all the objects
created as part of one set of rules (ie as part of a meta-class)
inherit all the properties of that meta-class.   Newcomb gave the
example of having to write a DTD for some patient medical
records.  He argued that there would be certain information that
it would always be necessary to know (eg name, sex, age, blood
type, allergies etc) but there would be other pieces of
information which would become more or less important over time.
Since the writers of the DTD have no way of knowing how the
second type of information will be re-used in the future, there
is every likelihood that they will write the DTD to reflect their
current priorities -- so making all of the information less
easily re-usable.  Newcomb's solution would be to use
architectural forms in the DTD to define those elements whose
information content will remain of fixed importance over time;
this should help to guarantee that crucial information will be
available for future re-use.   Thus, in an on-line environment,
element components (based on HyTime's architectural forms), user
requirements, and the SGML syntax are all combined to produce a
user-created DTD (with entities, elements, attributes etc, and
all the other features of a traditional SGML DTD) for the
generation and manipulation of hyperdocuments.
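
[The fragment below is my own schematic illustration of the
general idea, and is not an extract from ISO/IEC DIS 10744 or
from Newcomb's slides: an application-specific element declares,
through fixed attributes, which (hypothetical) architectural form
it conforms to, so that software written against the meta-level
rules can process it regardless of its local name:]

    <!-- a locally named element mapped onto an invented
         "fixedfield" architectural form via fixed attributes -->
    <!ELEMENT bloodtype  - -  (#PCDATA) >
    <!ATTLIST bloodtype
              patrec     NAME   #FIXED  "fixedfield"
              fieldname  CDATA  #FIXED  "blood-type" >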

Newcomb put up a series of slides, illustrating the general
HyTime approach to architectural forms, and some specific
examples taken from DTDs.   Since I cannot reproduce the slides
here and do not feel that I understood the subject well enough to
summarize Newcomb's examples and discussion, I would direct
readers to ISO/IEC DIS 10744.

In conclusion, Newcomb stressed the fact that any conforming SGML
parser that currently exists should be capable of parsing a
HyTime document.   Newcomb said that HyTime is an
application/extension of SGML (ISO 8879), and is not intended as a
replacement or hypertext equivalent.  The development of
architectural forms would appear to be another step towards
ensuring the re-usability of SGML documents, without necessarily
involving a move to a (HyTime-based) hypertext/multimedia
environment.



CASE STUDIES -- Various speakers


3.6 "Implementing SGML for the Florida Co-operative Extension
    Service"  -- Dr Dennis Watson (Assistant Professor, University 
    of Florida)

Watson gave a brief overview of the approach adopted at the
Institute of Food and Agricultural Sciences (IFAS) of the
University of Florida to solve the documentation needs of the
Florida Co-operative Extension Service (FCES).   IFAS not only
supports FCES, several hundred scientists, and dozens of County
Extension Offices, academic departments, and research and education
centres, but does so with the express purpose of educating
students and strengthening the dissemination and application of
research.  Moreover, IFAS needs to be able to distribute
documentation in a huge variety of paper-based and electronic
forms.  The IFAS approach is for authors to produce their
documents using WordPerfect, then pass them on to an editorial
team who are responsible for reviewing the documents and using
automated techniques to convert them into suitable formats.  Over
time, this approach had been growing progressively unmanageable
and inefficient.

IFAS adopted an SGML-based approach quite recently -- having
their first tutorial in October 1990.   Watson's team developed a
WordPerfect style sheet which can be incorporated into the
authoring environment through WordPerfect's style feature and
additional pop-up menus.  Authors are offered a range of options
under the general areas of `File', `Edit', `Headings', `Mark-up',
`Tools' and `Set-up'.   Using WordPerfect's built-in option to
reveal (hidden) codes, it is possible for authors to see how the
`styles' have been incorporated into the body of the text  -- with
embedded tags indicating where each style starts and stops.   It
is then possible to take (a copy of) the resulting files and
replace the embedded style tags with suitable SGML markup.  The
SGML files can then be processed for storage within a database,
from where they can be easily retrieved for publishing on paper,
on-line, as hypertext CD-ROM etc.
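
[Neither the exact WordPerfect codes nor the IFAS tag names were
shown in detail, so the before-and-after pair below is purely
schematic:]

    Revealed WordPerfect codes (schematic):

        [Style On:Heading1]Citrus Irrigation Guidelines[Style Off:Heading1]

    After automated replacement with SGML markup (element name
    invented for illustration):

        <h1>Citrus Irrigation Guidelines</h1>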

Since the summer of 1991, Watson and his team have been involved
in document analysis and DTD development.   They are aiming for
compatibility with the AAP's (Association of American Publishers)
DTD -- ideally a subset of the AAP article DTD, with as few
locally unique elements as possible.  They are intending to use a
combination of custom-built and commercially available software
to handle the process of converting to, and validating, the SGML
documents.  Whilst they have identified a number of limitations
with their approach  -- non-referenced figures, the omission of
critical data from the original WordPerfect texts (eg publication
number and date), and the requirement to add some local DTD
elements -- they feel they have gained a great many benefits.
Apart from streamlining the entire production process, the IFAS
approach encourages authors to feel they have some control over
the markup of their documents, whilst simultaneously helping to
impose and maintain documentation standards.  Having documents in
SGML format facilitates interchange and re-use, and the break-
down of documents into suitable "chunks" for database storage can
now be done in an automated and controlled fashion.



3.7 "SGML User Case Study" -- Susan Windheim (Technology
    Consultant, Prime Computer Technical Publications)

Windheim described Prime Computer's experience in attempting to
produce SGML-based technical documentation.  Since 1989,
Windheim's team had been investigating the possibility of
delivering their documentation electronically (in the form of on-
line access to quarterly-updated CD-ROMs) in order to improve
overall quality and gain cost savings.   However, if possible
they wanted to be able to maintain the existing authoring
environment, and also port existing documents into the new
system; they also had a number of other factors influencing their
choice of software (eg the filters/tools they would require).

In April 1991, Prime began training staff in document analysis
and DTD development;  at the same time, work began on producing the
necessary filters to integrate the current authoring environment
into the new system.  Text was already being produced using troff
and Interleaf -- which would both require filters -- and there was
also a need to define a processor for handling indexing and
graphics, and to develop style sheets for the delivery system,
DynaText (from Electronic Book Technologies).   The goal was for
Interleaf and troff documents to be filtered into a parsable SGML
form where errors could be corrected, and the text browsed via
DynaText;  the result would also be stored in a text database for
further editing and correction.  The system was built using a
variety of off-the-shelf software such as FastTAG, the XTRAN
parser/language, and Author/Editor, as well as custom software
such as the troff and Interleaf filters, C programs and XTRAN
programs.

Windheim said that Prime's experiences with available SGML
products had shown that there is little choice, that each tool
has an associated learning curve, and that each has limitations
and makes compromises.  For example, they found that FastTAG
could not support some of their DTD features, and that more
system variables would be required to enable it to differentiate
between all the text objects they wanted to target.   With
Dynatext, they had encountered difficulties with horizontal and
vertical lines, dynamic formatting, and processing instructions.

With regard to their DTD, Windheim's team found that it was
being altered too frequently -- with the consequence that legacy
SGML documents no longer conformed to the latest version.  The
possible solutions to this problem included default translation
of markup that did not directly map to that defined in the latest
version of the DTD, editing old documents so that they would
parse, writing a more lenient DTD, or creating alternate DTDs
(rather than relying on developing the `same' one).  Other DTD
issues included whether or not Prime should deviate from SGML's
Reference Concrete Syntax and whether they should tailor their
DTD(s) to take into account the limitations of their off-the-
shelf applications.

Prime's aim is to deliver their documentation on a CD-ROM --  
requiring a system to have a CD-ROM drive, x-terminal facilities,
and a PostScript laser printer.  Users will be offered full text
indexing with suitable search and retrieval tools, hypertext
links (that may be either automatically generated or author-generated),
user bookmarks and annotation, a WYSIWYG display, and high
quality hardcopy output.  Windheim felt that using SGML brought a
number of benefits, including the ability to search texts at the
element level, flexibility in Prime's choice of applications,
facilitating the move towards centralized document databases,
ensuring consistently marked up texts, and helping Prime meet
international requirements.



3.8 "STEP and SGML" -- Sandy Ressler (National Institute of
    Standards and Technology) [NBC]

At NIST they are looking at potential SGML uses within the STEP
standards effort -- this includes both STEP (Standard for the
Exchange of Product Model Data) and PDES (Product Data Exchange
using STEP).   As with most standards activities there are a
large number of documents from a diverse set of sources.  SGML
seems the natural choice for integration.  They are seeking
conformance with the DTD for ISO Standards.

At the moment most documents are being written using LaTeX.  In
the future they will be aiming to convert LaTeX STEP documents
into SGML and they are starting to use Author/Editor as an SGML
input tool.



3.9 "Multi-vendor Integration of SGML Tools for Legal Publishing"
    -- Francois Chahuneau (AIS/Berger-Levrault)

Chahuneau described his company's work to develop a prototype
SGML-based editorial system for the French publishing house
Editions Francais Lefebvre. Although Lefebvre pass on SGML files
to their typesetters, they are not directly concerned with
printing and chiefly view SGML as the means to support the tools
for the users of their on-line editing system.

Lefebvre have been using SGML since 1988 and have developed DTDs
for each of their main types of document - weekly periodicals,
monthly periodicals, looseleaf central publications, and books.
These DTDs share a large number of common elements and tags, and
are required to support a variety of tasks; for example, the
looseleaf publications consist of 180 mb of textual data, of
which 20% has to be updated annually.

Chahuneau presented a diagram of the system's general
architecture -- which consisted of several (currently only a few)
writer/editor workstations connected via a LAN to a database
server running BASISplus.  Mounted on each workstation are a GUI
(an OpenWindows application[?]) to manage the user interface,
LECTOR (an on-line browser), and an Editor (SoftQuad's
Author/Editor).

Using a series of slides showing workstation screen dumps,
Chahuneau talked his audience through a typical user session.
Using the software developed by Chahuneau and his team, users are
able to freely browse the textual database, cutting and pasting
selected sections of text between browsing and editing windows.
However Chahuneau coined the phrase "cut and parse", to suggest
the concept of any attempt to paste some text into a new document
being subject to validation by a parser.   The parsing actually
relies on Author/Editor's rule-checking feature (which users can
turn off), but Chahuneau and Co had developed the X-Windows
command module to support cutting and parsing, and the
communication module to relay everything to and from the database
server.



3.10 "Developing a Hypertext Retrieval System Based on SGML"   
      -- Tom Melander (Sales Engineering Manager, Dataware 
      Technologies) [NBC]

Dataware have been contracted by the United States Government
Printing Office (GPO) to develop a hypertext retrieval system for
a large database of legal texts.  The GPO currently use their own
tagging system, but intend to convert entirely to an SGML-based
approach within the next five years.

Melander based his presentation on an analysis of a small sample
of typical text taken from the current database - complete with
embedded tags, non-printing characters etc.   Dataware have
created tables which identify each of the various textual
elements and describe their function.   They have used these as
the basis for writing a collection of C routines through which
the text can be passed - to replace any non-printing characters,
and substitute the appropriate SGML conformant markup for the
original tagging scheme.  Melander gave examples of the C code of
some of these routines, and showed how the original text was
progressively translated as it was passed through each routine.
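
[Melander's sample text and C routines are not reproduced here;
the before-and-after pair below is entirely schematic -- both the
legacy codes and the SGML tags are invented -- and is meant only
to suggest the kind of substitution the routines perform:]

    Legacy tagging with non-printing characters (schematic):

        ~I01~Sec. 101.~I02~Definitions.

    After passing through the substitution routines:

        <section><secno>Sec. 101.</secno><title>Definitions.</title>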

Once all the texts have been suitably tagged, Dataware will make
them available on CD with a hypertext interface.  Melander closed
by identifying the six key additional features which the
conversion to hypertext would bring, namely a complete table of
contents, support for fielded browsing, the ability for users to
follow cross references and to insert their own bookmarks and
notes, and support for a history mechanism (though I was unsure
if Melander meant a history of document amendments, of user
activity, or both!)



3.11 APPLICATION TOPIC #1:  "Data for Interactive Electronic
     Technical Manuals (IETMs)" -- Eric Freese (Senior Technical
     Specialist, RYO Enterprises)

This presentation was primarily aimed at attendees with an
interest in CALS.  Technical documentation is estimated to cost
the USAF $7.5 billion per year, and involves in excess of 23
million pages (of which 19 million are authored and maintained by
contractors and the remaining 3 million by the USAF itself).

Freese described the evolution of the technical manual as going
through three stages:

    * The past/present situation, where documentation is created,
      distributed and used on paper.

    * The present/future situation, where digital documents are
      produced electronically and distributed on disk.

    * The future situation, where IETMs will be produced by
      creating documents electronically, stored in large 
      technical manual databases, and accessed via
      portable display devices.

Freese pointed out that even today, most documentation data is
produced as paper with only a limited amount of it accessible on-
line.  By 1999, Freese expects to see about 30% of data produced
as IETMs, distributed on optical disks and accessible on-line,
with the remainder still appearing in more traditional document-
based forms.  However, by 2010 Freese believes there will have
been a shift of emphasis towards the processing of information --
with data being held in integrated databases which can be
accessed interactively.

Freese then spoke briefly about Technical Manual Standards and
Specification (TMSS), in relation to his predictions for the
future development of manuals.   Freese felt that the current
situation led only to a lack of guidance on standards for
technical manuals, and that the CALS initiative was therefore
poorly supported.   For greater interoperability between systems,
users, Government and industry, Freese argued that there would be
a need for changes to the TMSS following a co-ordinated policy
between standards organizations to ensure consistency in
specifications and standards.

Freese then outlined the main features of the role of IETMs,
these being:

    * To provide task-oriented digital data to the user.

    * To allow for the development of a non-redundant database 
      (ie where no piece of data is duplicated in several 
      places in the database).

    * To support the development of an Integrated Weapons Systems
      Database (IWSDB).

    * To provide guidelines for the style, format, structure and
      presentation of digital data.

He then went on to talk about the chief characteristics of the
data that makes up an IETM.  The display medium will be
electronic (rather than paper, as at present), and the data
primitives will include not just text, tables and graphics, but
also audio, video, and processes.  All IETM data will be marked
up with content tagging (and absolutely no format tagging) --
where each primitive is identified on the basis of the role or
function of the data it contains, rather than its position in the
logical document structure.  The data will be organised in a
fully integrated database, with relational links and no
redundancy.  The data will also be suitable for dynamic
presentation, with context dependent filtering as well as user
interaction and branching.

Freese devoted much of the rest of his presentation to a closer
examination of the layers that make up an IETM database (IETMDB),
paying particular attention to the generic layer.   According to
Freese, the generic layer would probably be based on the HyTime
standard, which would enable developers to draw on such concepts
as architectural forms.  He closed with a discussion of some
sample SGML markup declarations for IETM, and an examination of
how such markup might appear in practice.
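
[The sample declarations themselves are not reproduced here; the
fragment below is my own sketch of what content tagging of the
kind Freese described might look like, with each element named
for the role of its data rather than for its position or
appearance -- all of the names are invented:]

    <task id="T0031">
      <title>Replace hydraulic filter</title>
      <prereq><toolref idref="TL17"></prereq>
      <step>Relieve system pressure.</step>
      <warning>System fluid may be hot.</warning>
      <step>Remove the filter housing cover.</step>
    </task>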



3.12 International SGML Users' Group Meeting

This was the SGML Users' group (SGMLUG) mid-year meeting,
following on from the AGM held at Markup '91 in Lugano in May.
Just before the meeting Pam Gennusa, the SGMLUG chair, had asked
me if I would be prepared to say a few words on the release of
the ARC SGML Parser materials - which were originally distributed
from The SGML Project at Exeter University.  I was, therefore,
somewhat surprised when I was formally introduced as the first
speaker (and it's always tough being the warm-up act).

I began by apologizing for the slight hitch that had occurred
when we first tried to release the parser over the academic
network (at the behest of the SGMLUG and the material's anonymous
donor).  I briefly outlined what I knew to be in the collection
of materials, and mentioned some of the methods and locations
from which they could be obtained.  I took the opportunity to
point out that the SGML Project is not actively engaged in any
SGML development and that our familiarity with the code and use
of the Parser materials is actually very limited.  I expressed
some regret at the fact that the original intention for the SGML
Project to act as a clearing house for bug reports and ports
seemed to have not been realized in practice -- however, I was
very gratified to learn that the materials were now widely
disseminated and widely used amongst the SGML community.  The
only port of which I had been directly notified was that
performed by James Clark to a UNIX environment (from DOS) -- but
other information had been sparse.  Following these remarks, two
other attendees said that a port to the Apple Mac environment was
nearing completion, and that there appeared to be a bug in the
ARC SGML parser relating to the use of CONREF in a HyTime
document.  [Since the conference, I have received no further
information about either of these points].

Pam Gennusa then gave a brief account of how the AGM's decision
to employ a part-time secretary for the secretariat functions of
the SGMLUG had been carried out.  Ms Gaynor West now fills this
role.  She also raised the general possibility of re-structuring
the SGMLUG in some way other than the current system of national
chapters and special interest groups (SIGs).

There then followed reports from the various chapters and SIGs,
starting with a representative from the Dutch Chapter (Mark
Woltering).  He reported that they now have about 120 members and
were able to hold two meetings in the past twelve months -- both
of which have been well-supported and active.  He also stated
that the Dutch Chapter is heavily involved in the European Work-
group on SGML (EWS) and had helped to develop the MAJOUR header
DTD which was released at Markup '91.  The EWS was now working on
the creation of a body and back matter DTD but was unsure whether
they should risk creating their own DTD `from scratch', or absorb
the relevant parts of the AAP's DTD (and try to influence future
amendments to that).

The representative from the Japanese Chapter (Makoto Yoshioka)
gave a brief outline of the rather different composition of their
Chapter, which consists of forty large companies and organisations
rather than individuals, each paying a subscription in the order
of Y500,000 (which evoked a mixture of gasps and sobs).  Yoshioka
had brought with him two products which members of the Japanese
Chapter had developed and were prepared to distribute gratis to
members of the SGMLUG.  The first is SGF (Simple SGML Formatter),
which is a DOS [Windows?] -- based application that formats
documents with embedded SGML markup for output on, say, an HP
laserjet printer (or for previewing on screen).  Yoshioka said
that SGF is still being developed, and that additional printer
drivers would probably be added, yet he hoped to be able to
distribute the executable code with documentation in English very
shortly (with the source code available at a later date).

The SGMLUG meeting overran its time allocation and was continued
the following evening -- when Yoshioka gave a demonstration of
SGF;  unfortunately, I missed much of his presentation but hope
to obtain a copy from the SGMLUG (for possible distribution, with
permission).  The other product developed by the Japanese
Chapter, vz, was described as an `input editor'; if there was
a demonstration of it, I neither heard nor saw anything. However, I
was interested to learn that the Japanese make use of the ARC
SGML Parser.

The only other reports that I heard were from various North
American groups/chapters:  The "SGML Forum of New York" reported
that they had held their first meeting in April, which had
attracted about 50 attendees.  The decision was taken to allow
both individual and corporate members, of which there are around
twenty that are fully paid-up;  they also decided to form a
non-profit company.  On September 24th, the forum held the
"SGML and Publishing Case Study", which also attracted about 50
people.  Their short-term goals include the setting up of an
electronic bulletin board to disseminate information on SGML,
sample DTDs, the ARC SGML Parser, and down-loaded postings from
the newsgroup comp.text.sgml.  They also intend to co-sponsor
a one-day introductory seminar on SGML in conjunction
with the Electronic Publishing Special Interest Group (EPSIG).

Other reports came from the Canadian Chapter, who now have a
paying membership of about 15 (mainly from the publishing and
pharmaceutical industries) and who meet quarterly; the Mid-West
Chapter -- who are sponsored by Datalogics [?] and due to start
meeting on October 29th -- and the [Washington?] DC Chapter, who
are sponsored by UNISYS [?].  Several other reports were probably
made at the second half of the SGMLUG meeting, and I refer
interested readers to the next SGML Users' Group Newsletter.



REPORTS FROM THE FRONT -- various speakers


4.1 "OSF's Pursuit of DTDs" -- Fred Dalrymple (Group Manager,
    Documentation Technology, Open Software Foundation)

Dalrymple stated that the objective of the OSF (Open Software
Foundation) was to "Develop one or more DTDs that enable document
interchange among OSF, member companies, technology providers,
licensees".  He also pointed out that although OSF had been
responsible for developments such as Motif, they no longer saw
themselves as tied only to the UNIX environment.  Following their
recognition of the need to interchange documentation, the OSF had
originally opted to use troff (with the mm and ms macro packages),
but negative feedback from their members forced them to
revise this decision in favour of SGML.

In the first quarter of 1990, OSF put forward a proposal and
began implementation of SML (the Semantic Macro Language).  This
was effectively another macro package designed to replace the mm
and ms specific macros with generic structural markup.  SML was
only intended as a provisional and temporary approach to markup,
and in December 1990 the OSF issued a request for DTDs. By April
1991, the OSF had received DTD submissions from ArborText (tables
and equations), Bull, CERN (based on IBM's BookMaster),
Datalogics (AAP), DEC, HP (HP Tag), and IBM (based on
BookMaster).

Following presentations on the DTDs, the available SGML software,
and from various industry experts, the OSF organised a series of
subgroups to examine issues in more detail and produce position
papers.  By August 1991, it was necessary to refine the position
papers and produce a requirements matrix.  Since October 1991, a
design group has been working on the OSF's DTDs -- exploring the
practicalities of certain issues, and reporting to the various
subgroups.  The group is using a combination of top-down and
bottom-up approaches to try and identify the common elements
shared by the proposed DTDs; however, their main emphasis is on
facilitating document interchange.  To this end, they have
decided to adopt the reference concrete syntax and permit no
markup minimization -- although they may extend the permitted
length for names, attributes, literals etc.
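
[By way of illustration only: such decisions are recorded in the
SGML declaration rather than in a DTD.  A fragment along the
following lines -- the quantity values shown are purely
illustrative, and not the OSF's actual figures -- would disable
the minimization features while raising the reference limits:

    QUANTITY SGMLREF
             NAMELEN   32       -- longer names --
             LITLEN    2048     -- longer literals --

    FEATURES MINIMIZE
             DATATAG NO  OMITTAG NO  RANK NO  SHORTTAG NO

(Other parameters of the declaration are omitted here.)]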

Dalrymple said that the next phases of work for the OSF will
involve the creation of an analysis matrix, the specification of
the OSF DTD, followed by its implementation, documentation and
eventual publication.  The OSF are keen that theirs should be the
DTD that people will automatically think of in connection with
writing any computer documentation, and they want to ensure that
its specification and implementation will prove satisfactory for
the requirements of OSF members.

Initially, the DTD will only be distributed amongst the OSF, but
later they may release it into the public domain; in either
case, the OSF wants to set up a body to ensure that the DTD will
be properly maintained.  The OSF are also aware that they have
yet to consider any formatting issues, such as FOSIs (Format
Output Specification Instances) or the use of DSSSL (Document
Style Semantics and Specification Language).



4.2 "The Text Encoding Initiative: A(nother) Progress Report"   
    -- Lou Burnard (Co-ordinator of the Oxford Text Archive, 
    Oxford University Computing Service)

This presentation served as both an introduction to the Text
Encoding Initiative (TEI), and a progress report for those
already familiar with its work.  Burnard had a great deal of
information to get across in a fairly limited amount of time, and
despite his energy and enthusiasm (and pleas to the conference
Chair) he was unable to get through all his slides.  [This seemed
somewhat unfortunate, given that those involved in the TEI have
devoted a great deal of effort to the problems of marking up real
texts, and many attendees might have gleaned some useful tips if
they had been given the opportunity to hear about the TEI's work
in a little more detail].

The TEI is "a major international project to establish standards
and recommendations for the encoding of machine readable textual
data" (as used by researchers largely in the Humanities).   Its
main goals are to facilitate data interchange and provide
guidance for text creators, and to this end TEI has produced
guidelines which address both what to encode, and how to encode
it.  When looking for a text encoding scheme the TEI wanted
something which had wide acceptance, was simple, clear, and
rigorous, was adequate for research needs and conformed to
international standards.  It also needed to be software,
hardware, and application independent.  As far as the TEI were
concerned, SGML was the only choice.

Burnard then gave a brief overview of the organizational
structure of the TEI, and outlined the main achievements and
activities prior to the conference.   He drew particular
attention to the publication of TEI P1 ("Guidelines for the
Encoding and Interchange of Machine-Readable Texts"), which had
provoked a variety of reactions since its first release in July
1990.  Burnard also outlined the procedures which would lead to
the publication of the second version of the "Guidelines" (TEI
P2) in January 1992, followed by TEI P3 in April 1992, and the
final version, in June 1992.  He gave an indication of the sort
of work carried out by the TEI's committees and work groups, and
noted with regret that the following areas would probably not be
covered satisfactorily:  physical description of manuscripts,
analytic bibliography, encyclopaedias, directories and other
reference books, and office documents.

Burnard stated that most of the reactions to TEI P1 fell into one
of the four following types:

    * It is too literary/linguistic/specialist etc and pays 
      too little attention to my own needs.

    * It is technically too complex/not complex enough

    * It is not didactic enough

    * It violates textual purity

He then listed the six most frequently asked questions addressed
to the TEI, along with the current standard replies.  Thus, the
TEI hope to make the "Guidelines" available in electronic form,
but have not yet done so.  People are free to use SGML's
minimization features, but in order to conform to the TEI's
"Guidelines .." must not do so in any document which is to be
exchanged between machines.  The TEI will be enforcing its
decision to adopt the ISO646 subset.  The TEI gives users the
freedom to select from its standard tag set, and does not require
people to use more than they need.  The TEI is working on
providing a simpler version of its "Guidelines .."/tag set [?]
for beginners.  The TEI have yet to make a decision on what
software it should recommend.
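
[To illustrate the point about minimization: with OMITTAG
minimization in force (and a DTD that permits it), an author
might key

    <list>
    <item>first point
    <item>second point
    </list>

whereas the fully tagged form required for interchange would be

    <list>
    <item>first point</item>
    <item>second point</item>
    </list>

The element names used here are invented for the example, and are
not drawn from the TEI tag set.]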

Burnard gave an indication of what TEI P2 would contain.  It will
contain some Tutorial Guides covering the theory and practice of
markup and how SGML can help (giving extended examples) as well
as a "barebones subset of P2".  (There will be one generic
Tutorial Guide, and several others aimed at lexicographers,
theoretical linguists, discourse analysts, textual critics, and
so on.)  A
revised version of the "Guidelines ..." (TEI P1) will form the
main part of TEI P2, and will consist of formal prose, an
alphabetical reference section, and some DTDs.  The formal prose
will introduce the basic notions, contain three major sections
(the core and default DTD base, alternate DTD bases, and DTD
toppings), and a contents section (of formal prose specifications
for related sets of textual features).  The alphabetical
reference section of TEI P2 will be modelled on FORMEX and
MAJOUR, with a generic identifier, a definition, and some
indication of optionality being given for each element and its
attributes.  Moreover, the parentage, content, defaults, and the
semantics of values are given for each attribute.  However, TEI
P2 will still not offer or recommend any software.

Burnard summarized the TEI's approach to DTD design, and defined
some of the fundamental concepts and preferences that had been
adopted (eg the notion of bound and floating elements, the
concept of crystals, the alternation style of element declaration
etc).   Discussing DTD design in more detail, Burnard gave the
rationale behind the TEI's decisions; comparing the process to
designing a restaurant menu, he made the following observations:

    * Adopting an a la carte (or mix`n'match) model -- gives 
      users maximum freedom to choose all and only the tags 
      that they want.  However, it makes document interchange
      difficult.

    * Choosing a menu (or set meal) model -- offers minimal
      freedom and is highly prescriptive.  Of course, this 
      makes document interchange very reliable.

    * Using a pizza model -- this gives users "freedom within 
      the law", ie a limited ability to add or change 
      tags, and makes document interchange fairly reliable.

Having adopted the `pizza' model approach, the TEI then decided
to offer several types of base DTDs (either the TEI core, or a
DTD suitable for spoken text, lexicography, mixed-form text etc)
in conjunction with an appropriate choice of `topping(s)' -- for
example hypertext, textual criticism or linguistic analysis.
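
[The mechanics of such a scheme might be sketched -- very loosely,
and with invented names and file names rather than the TEI's own
declarations -- using parameter entities and marked sections to
switch bases and toppings on or off:

    <!ENTITY % TEI.prose    "INCLUDE" >
    <!ENTITY % TEI.verse    "IGNORE"  >
    <!ENTITY % TEI.linking  "INCLUDE" >

    <![ %TEI.prose; [
        <!ENTITY % prosebase SYSTEM "base-prose.dtd" >
        %prosebase;
    ]]>
    <![ %TEI.linking; [
        <!ENTITY % linktags  SYSTEM "top-link.dtd" >
        %linktags;
    ]]>

Readers should consult TEI P2 itself for the mechanism actually
adopted.]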

Burnard then discussed the solutions that he believes the TEI has
to offer.  For users in general, it offers the following:

    * a single coherent framework in which interchange can be
      carried out

    * a set of tools and principles for user-defined extensions

    * a standard for documenting the content and structure of
      electronic texts.

For those involved with literary and historical applications, it
gives:

    * sets of general purpose structural tags adequate to most
      surviving written material produced in the Western world 
      during the last 2000 years

    * a way of embedding arbitrary levels of interpretative
      annotation with WYSIATI [What You See Is All There Is?] 
      texts

    * a way of implementing an electronic variorum, in which 
      every instantiation of a given text can be represented in 
      parallel

Lastly, for linguistic applications, Burnard felt that the TEI
offers

    * ways of structuring and documenting the contents of large
      textual corpora, and of guaranteeing their re-usability

    * ways of aligning and synchronising distinct components of
      transcribed speech

    * powerful general purpose tools for the unification of
      linguistic analyses.

Judging from his collection of slides, Burnard had also wanted to
talk about the TEI's approach to spoken language and the use of
feature structures, but he was not given enough time.  He would
also have liked to mention the work of the twelve work groups --
reporting on their objectives and status as of 1st October 1991.
Burnard closed by urging everyone at the conference to contact
TEI if they wished to subscribe to the distribution list
TEI-L and/or to obtain a copy of TEI P2.



4.3 "TCIF IPI SGML Implementation" -- Mark Buckley (Manager,
    Information Technology, Bellcore)

Buckley's presentation was substantially revised and bore little
relation to the material contained in his handouts (see below).
However, the following extracts from his handouts may be of
interest to some readers:

"The Information Product Information (IPI) Committee of the
Telecommunications Industry Forum (TCIF) has emerged from a
recognition of the need for members of the telecommunications
community to exchange electronic forms of technical information
and the consequent need for the voluntary adoption of standards
that facilitate such exchange.

In 1990 the TCIF IPI Committee recommended SGML for use where
appropriate to facilitate the exchange of document text and began
work on a Document Type Definition suitable for descriptive
Telecommunications Practices, a common type of technical
document.  Draft 4 of the Telecommunications Practice (TP) DTD,
the first public draft, was published in October of 1990 and
distributed at SGML '90.

Draft 6 of the TP DTD descends directly from Draft 5.  Changes
reflect the input of numerous comments and experiments with the
earlier draft and have been made primarily to allow for:

    * easier modification for use in internal corporate 
      languages and for use in various SGML tools available 
      from vendors

    * greater flexibility in tagging

To these ends,

    * We have stripped most internal comments from the DTD file.

    * We have consolidated the separate DTD and ELEMENT files into
      one DTD file.

    * We have redesigned content models to be descriptive rather
      than prescriptive.  The new content models impose very 
      little in the way of structural requirements.  It is likely
      that TCIF or specific document producers and recipients will
      evolve stricter requirements with experience.  Such constraints 
      may be reintroduced into the DTD when appropriate or may be 
      validated by processing applications.

    * We have added four general data content elements that greatly
      reduce the size of the compiled DTD in most environments.

    * We have further reduced the size of the DTD by consolidating
      what were separate elements into fewer general elements 
      with "type" attributes.

Above all, the DTD [has] been redesigned to function as a DTD
specifying the syntax of a language meant for intercorporate
document exchange.  It is not likely to serve as the foundation
of an internal SGML application without modification".
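
[The consolidation of separate elements into fewer general
elements with "type" attributes can be pictured with a
hypothetical fragment of my own, not taken from the TP DTD
itself:

    before:  <!ELEMENT (warning | caution | note)  - -  (#PCDATA) >

    after:   <!ELEMENT admonition  - -  (#PCDATA) >
             <!ATTLIST admonition
                       type  (warning | caution | note)  note >

Only one element then needs to be declared, compiled and
documented, with the distinction carried by the attribute value.]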

Rather than simply duplicate the material contained in his
handouts, Buckley chose to set the decision to design the TP DTD
in context.  He gave a detailed description of the nature and
problems of the relationships between the numerous (American)
telecommunications companies (and groups of such companies).
There are a great many restrictions on the information that can
be passed between groups/companies, and even if there were not,
most of the companies are so large that they face real problems
when trying to interchange documents between different
divisions/departments.



4.4 APPLICATION TOPIC #2: "Rapid DTD Development" -- Tommie Usdin 
    (Consultant, Atlis Consulting Group).

In her presentation, Usdin examined the problems of DTD
development, the Joint Application Development (JAD) methodology,
and how JAD can facilitate rapid DTD development.  She saw the
latter as important because DTD development is expensive, represents a
recurring cost, and can result in much wasted time and money, as
DTDs often change rapidly when put into production.

Usdin put the root cause of problems with DTD design down to the
fact that it is being left to the wrong group of experts.  She
argued that in fact it is users who know most about how documents
are created, the relevant document parts (their names and
definitions), the rules governing document structure, and the
final uses/purpose of the information.  Usdin saw the root cause
of the problem as being management's decision to ask SGML experts
to perform document analysis, rather than the genuine document
experts (namely, users).  Usdin did not wish to imply that SGML
experts are inept, merely that they are forced to rely on
interviews with users, frequently inadequate/poorly selected
samples etc, in order to produce the DTD that they believe is
required.  Not only is DTD development time-consuming and
expensive, but outsiders (such as an SGML expert) tend to use
language and examples with which users are unfamiliar.
Furthermore, since an SGML application is only as flexible as its
DTDs, these will often have to be fine-tuned to meet users'
(real) needs  -- and any rules enforced by the application may be
resented and circumvented by irritated users.

The solution would appear to be to allow users to develop the
DTDs themselves.  Yet this is clearly impractical since they do
not know, or want to learn, the intimate details of SGML syntax.
Even if this were not the case, they would almost certainly lack
the necessary experience of DTD development, and each would have
only a limited view of the document life cycle.

Usdin likened DTD development to systems development -- with
conflicting requirements coming from many users, each of whom
knows only part of the overall requirement.  The value of an
application depends heavily on the quality of the stated/perceived 
requirements, and user hostility can quash an otherwise good product.  
Usdin characterized the traditional method of systems requirements 
development in terms of the following steps:

i)      Serial interviews of users

ii)     Requirement document sent to users for comment

iii)    Conflicts in requirements resolved by systems
        analysts based on:
                - sequence in which requirements received
                - authority of requestor
                - ease of implementation

iv)     Comments incorporated as received.

She then displayed a pie-chart showing the reasons for code
changes under traditional systems development --  which showed
that about 80% were due to problems with either system design, or
the requirements and analysis (the latter pair accounting for
around 60% of all changes!).  Usdin felt that this highlighted
the important role of requirements definition -- not only at the
development stage, but also during maintenance.

As a solution to the problem of requirements definition, Usdin
offered "Joint Application Development (JAD) --  highly effective
approach to gathering user requirements completely and
efficiently, as well as reconciling differences among conflicting
user requirements.   Usdin described JAD as a "one time one
place" highly structured forum for decision making -- a workshop-
based process where requirements are stated in the users'
vocabulary, and which is led by a facilitator trained in
communications, system analysis, and group dynamics.  Usdin made
some very impressive claims for the JAD technique -- reductions
of up to 50% in the time needed to define user requirements and
functional specifications, improvements of up to 33% in the
accuracy of user requirements and design documents, increased
commitment to the software from users and management, improved
system usability and a reduction in maintenance plus enhanced
communication between systems designers and users.  She felt
that the SGML community could learn from the experiences of the
systems development community: the benefits of exploiting user
knowledge, and the advantages to be had from using meetings to
extract information, build consensus, and create a feeling of
`user ownership'.   Usdin felt this approach was faster and,
consequently, better;  it would also be more satisfactory for
management, who want to see results rather than processes.

Usdin summarized rapid DTD development as "A method of harnessing
users' knowledge of their own documents, information management
process, and needs. Instead of having consultants learn your
document structures, let your users do the document analysis".
Its main goals are to reduce the time and cost both of DTD
development, and the customization when new DTDs are brought into
production.  Based on JAD, rapid DTD development is an
interactive, highly-structured workshop-based process, in which
users are encouraged to adopt a top-down approach to document
analysis, and their consensus is sought at every stage.
Information is collected via a forms-based approach -- where
users are asked to complete forms detailing all the information
relating to the document elements that they want and/or have
identified (giving examples of the elements in context etc.)

Usdin then described the workshop process in more detail.  She
said that a rapid DTD development workshop should consist of no more than
fifteen people, and should include the SGML analyst/facilitator,
authors, editors, production and system staff.  The role of the
facilitator (normally the SGML analyst) is to plan and manage
the workshop process, create and explain the forms used, to
facilitate discussions, and help participants create acceptable
compromises and build consensus.  A typical workshop agenda (for
the work to be done by the users) involves the following steps:

        * Define the document type

        * Select application standards

        * Decompose document into elements (top-down)

        * Define, name, and document each element

        * Describe element model and presentation format

        * Identify, define and name attributes

        * Identify, define and name general entities (ie what 
          users might call "boilerplate" text).

For users who are authors and/or editors, in conjunction with the
steps mentioned above they should also be encouraged to select
tag names (since they are the people who are most likely to deal
with them directly), and to identify element relationships.

The production and systems staff need to ensure that adequate
information is captured for presentation, and also to identify
elements needed for routing, control and management.  They should
also identify any systems constraints and learn the vocabulary,
needs and concerns of other users.  The role of the SGML analyst
is to record the results of the workshop as a DTD suitable for
parsing and validation;  however, at this stage the DTD must be
revisable in accordance with user specifications (ie users must
not be forced to comply with the DTD).   The SGML analyst must
ensure that the process of document analysis is completely
thorough (reminding participants of such factors as database
requirements, future uses of the information, and non-printing
elements -- such as routing, tracking and security information).
Moreover, the analyst must ensure that the DTD conforms to any
relevant standards (eg CALS), and must provide full documentation
including a tag library, an hierarchical [tag] listing, and
alphabetical indexes to element and tag names.

Usdin noted that rapid DTD development works most effectively
when there are:

        * Established documents

        * Users who are knowledgeable of their own documents,
          applications, requirements etc.

        * An experienced SGML analyst/JAD facilitator

In the light of the above, Usdin remarked that rapid DTD
development would therefore be inappropriate for:

        * Start-up operations

        * Completely new document types

        * Where there were no experienced users/authors

        * Where there was no access to the users/authors

        * An unstructured approach to data gathering

        * When the facilitator lacks SGML knowledge or JAD
          experience

However, when it is possible to produce DTDs using the rapid DTD
development process, they have several inherent advantages.

For example, the DTDs are based on the way users create,
manipulate and use information, and employ user-defined names for
elements and attributes.  Such DTDs reflect the appropriate level
of detail needed for current and planned products and are ready
to use and fully documented in weeks (rather than months).  From
a management point-of-view, rapid DTD development techniques save
time and money by making the DTDs available sooner and by
reducing user distrust and hostility (since they feel they played
a part in the analysis and "own" the DTD).  Moreover, training
costs are reduced because staff are already familiar with the
names, vocabulary and examples used;  integration costs are cut
because systems concerns have been addressed during the
development phase (meaning that less technical fine-tuning needs
to be done).  Finally, Usdin asserted that rapid DTD development
can produce a DTD within two to six weeks -- as opposed to the
two to six months of traditional methods;  furthermore, it is
much cheaper for a business to use its own staff time than to pay
the fees of an SGML consultancy!



4.5 POSTER SESSION #1: "Tables in the Real World" -- various
    speakers

The idea of the poster sessions is for several speakers to give
simultaneous presentations on topics that relate to a general
theme.  In the time available, each speaker repeats his/her
presentation, fields questions from the audience -- who are free
to move from speaker to speaker as they wish.  Whilst the idea
seems attractive, it was difficult to give serious attention to
more than a couple of presentations; the better speakers tended
to attract (and keep) the largest audiences (whatever their
topic), and the timing of the presentations quickly got out of
synch.  What I heard was generally worthwhile, but I would put in
a plea to the conference organizers for more draconian time
management in future.

The theme for the poster session was briefly introduced by Eric
Severson, who raised the same question as he had (apparently)
asked last year -- are tables graphical objects or multi-
dimensional arrays?  The speakers, and a rough guide to their
topics were as follows:

        * Bob Barlow        CALS (DoD): The CALS table tagging
                            Scheme

        * Mark Buckley      TCIF: tables rely on being two-
                            dimensional to convey information, 
                            it is not practical to concentrate 
                            primarily on the data held in tables;  
                            the TCIF have no general solution
                            for handling `non-representational'
                            tables.

        * Joe Davidson      Handling tables practically

        * Peter Flynn       How should statistical packages
                            import/export SGML tables?

        * Thomas Talent[?]  The `Oakridge' approach to
                            producing tables.

        * Ludo Van Vooren   Format-oriented vs content-based
                            approaches to tables.



4.6 "Handling Tables Practically" -- Joe Davidson (SoftQuad Inc)

Davidson began by stating SoftQuad's view of tables -- that they
are information objects with specific visual characteristics.
The visual characteristics play an inherent part in an
information object's ability to convey information; in order to
support this assertion, Davidson cited the difficulties of
trying to describe a table to someone who cannot see it (eg over
the telephone).  [Although it did seem to me that Davidson was
rather blurring the distinction between trying to convey the
information contained in a table, and the way that the
information was formatted for clear presentation, ie moving away
from a logical view of the text/table towards a more
presentation-oriented view].

Davidson then discussed how tables can be coded in SGML for
display purposes -- using attributes to code the number of
rows/columns etc.   However, he also asserted that table design is
very diverse and individual, and it is difficult to design a
tool that can format an on-screen representation of a table for
any and all possible SGML table DTDs.  Therefore, SoftQuad have
decided to work on the premise that there are basic features
which are common to the vast majority of tables, and they have
then designed a tool to cater for these features.  Davidson then
used a series of slides to demonstrate a user session with
SoftQuad's table editing tool -- which will be available as an
add-on to Author/Editor.
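
[A much-simplified sketch of the general style of markup being
described -- the element and attribute names are my own, and not
necessarily those used by SoftQuad:

    <table cols="3">
    <row><cell>Part</cell><cell>Qty</cell><cell>Price</cell></row>
    <row><cell>Widget</cell><cell>2</cell><cell>4.00</cell></row>
    </table>

Here an attribute carries the geometry needed for display, while
the cell contents remain ordinary element content.]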

To create a table from scratch using the table editor, a dialogue
box is called up and the user is prompted to answer several
questions (eg number of rows/columns etc.)  When this process is
complete, the template of the table appears as a graphical object
on the screen, and the user is free to enter data into the
available cells.  The user also has the option to view the whole
table as data and raw SGML tags, should s/he so wish.  It is also
possible to import valid SGML marked up tables into the current
document, and have the table editor display the information in a
suitable graphical form.

Apart from entering valid data into the cells of the graphical
representation of the table, it is also possible to perform cut
and paste operations -- moving selections of cells, rows etc to
any valid new location.  It is even possible to perform invalid
cut and paste operations, but only with Author/Editor's rule
checking facility turned off (since this performs on-the-fly
validation against the document's DTD).  [If stated, I missed
quite how the validation (and, therefore, DTD extension?)
relating to the table integrates with the DTD which governs the
rest of the document -- since surely it would not always be
advisable to give users the freedom to personalize a `standard'
DTD in this way].  Using the table editor, it is possible to
alter the structure of an existing table, with the restriction
that no table can have more than 64K rows or 64K columns.  Scroll
bars make it possible to view large tables on screen, but this
might be a little tedious with very wide tables.



4.7 "TCIF Approach to Tables" -- Mark Buckley (Manager,
    Information Technology, Bellcore)

I came in mid-way through Buckley's presentation -- which took the
form of continuous discussion rather than a cycle of main points.
From the bulleted points noted on his flip chart, the TCIF's
approach to tables (in draft 5/6 of their Telecommunications
Practice DTD) was:

        * Do what you can

        * Pass any information that is cheap to produce and 
          that may be of use to the document's recipient

        * The TP DTD should be able to support this kind of
          activity, ie passing information about tables 
          in a standard way -- using attributes.



4.8 "Format-oriented vs Content-oriented Approaches to Tables"   
    -- Ludo Van Vooren (Director of Applications, Avalanche 
    Development Company)

This was another presentation which was in `full-flow' by the
time I arrived.  Van Vooren argued that the logical structure of any
table is, in fact, a multi-dimensional array.  As with an array,
any part of the table can be accessed (and the information
retrieved therefrom), by giving unique sets of co-ordinates.
Thinking of, and marking up a table in this way, makes it much
easier to manipulate the information contained in that table;
for example, it could be possible to extract the data contained
in a user-defined selection of cells from the table, and re-
present that data in the form of a pie chart.  Adopting a format-
based approach to table markup makes it much more difficult (if
not impossible) to manipulate the data contained in a table in a
way that is equally useful.
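
[A rough illustration of the distinction, using markup of my own
invention rather than anything shown by Van Vooren:

    <!-- format-oriented: the markup records rows and cells -->
    <row><cell>1991</cell><cell>Q3</cell><cell>17.4</cell></row>

    <!-- content-oriented: the markup records what each value is -->
    <sales year="1991" quarter="Q3" units="millions">17.4</sales>

In the second form the value can be addressed by its co-ordinates
(year and quarter) and re-used -- for example, to generate the pie
chart mentioned above -- whereas the first form records little
more than where the value sat on the page.]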

However, the content-oriented/multi-dimensional array approach to
table markup raises a number of significant issues.  For
example, DTD designers will need to know very clearly how and for
what purposes the information contained in a table will be used.
Moreover, it will not be an easy task to educate users to think
of tables as multi-dimensional (logical) arrays, rather than as
graphical objects.  Also separating the form and content of
tables highlights the particular difficulties of trying to
represent such logical objects in a formatted form suitable for
presentation on screen or paper.

As I left the discussion, Van Vooren had just raised the
interesting question of whether or not we should treat forms as
simply a sub-group of the type `table' -- and the implications
this might have for information processing.



4.9 "How Should Statistical Packages Import/Export SGML Tables?"
    -- Peter Flynn (Academic Computer Manager, University College,
    Cork)

I only caught a little of Flynn's remarks in passing -- and I
think it was regrettable that there were not more
statistical/spreadsheet package manufacturers attending the
conference (let alone Flynn's poster session).  He seemed to be
arguing that depressingly little attention had been paid by the
SGML community and developers to the problems involved in
importing SGML data into, and exporting it out of, statistical
packages.  Although there seems to be general interest in, say,
the problems of using SGML files with popular word-processing
packages such as
WordPerfect or Microsoft Word, Flynn said that he had heard very
little about developing SGML features for packages such as Lotus
1-2-3 or SPSS.  [Feedback to the SGML Project has been notably
sparse on this area also].



"FORMATTING ISSUES AND STRATEGIES" --  Various speakers


4.10 "Formatting -- Output Specifications"  Kathie Brown (Vice-
     President, US Lynx)

Brown began by describing where the creation of an output
specification fitted into the overall SGML composition process.
Her main theme was that many SGML implementations are being
delayed by the lack of suitable tools for creating and using
output specifications.

Brown defined an output specification as something which relates
source elements and a set of style specifications.  It uses
element tags and the source tree structure to locate places where
a style should be applied or changed, or other processing
initiated.  Furthermore, the output specification should be
written in a system-neutral language, and employ standard terms.
Brown also suggested that an output specification addresses a
number of other problems -- it describes document flow rules as
well as local formatting, it describes page layout and directs
source rules for generating and re-ordering source content, and
it defines all the logical operations necessary for resolving
values and initiating logical routines.

Brown then identified what an output specification must describe:

        assembly and ordering of final document text
        - generation of implied content
        - reordering of content
        - replication of content for extractions, strings etc.
        - suppression of content
        - denoting the style in which text elements should appear
        - assembling into text blocks
        - numbering of elements

        description of page models
        - media characteristics
        - page size, orientation, and margins
        - imposition
        - layout of main text
        - other page areas, including relative placement
        - page ruling
        - repeating page elements, including graphics and
          extractions
        - page order
        - associated blanks

        composition rules and style characteristics
        - area/text flow and rules for area/text fill
        - arbitration rules for competing values
        - rules for placement of tables and figures etc.
        - page and column balance
        - rules for handling footnotes and similar structures
        - ordering of document values

        assembly into printable document and other finishing
        operations
        - numbering pages
        - ordering of values used in page marking
        - insertion and treatment of blank pages
        - rules for generating and formatting empty elements
        - handling of separate volumes or parts
        - effects of document binding on output

        handling of local elements in the source (overrider)
        - overriding document-wide values
        - overriding element attributes values
        - rules for reformatting tables if necessary
        - changing row/column relationships
        - adjusting graphic sizes
        - suppressing source elements

        how to handle non-parsed data types

Brown went on to identify those characteristics which she
believed would form the basis of a good output specification.
It should be neutral to DTDs, and not based on assumptions
inherent in a single DTD or class of documents.  A good output
specification should also be widely used (to promote the writing
of proprietary output drivers), and also easy for an author to
understand, use, and retrace his/her steps.  It should be
structured so as to ease the building of automated authoring
tools -- and if it were machine structurable, it would be
possible to build output maps directly from the output
specification.  A good output specification would also be
characterized by stable syntax and semantics.

Brown noted that output specifications could be written to
support many structures other than those required for paper-
based printing.  For example, output specifications could be
written to support database loading, hypertext, illustrated parts
lists (and similar derived tabular structures), indexes of
abstracts or paragraphs, screen displays, or on-line database
loading and screen display.

Brown identified a number of issues directly relating to the
output of SGML.  In addition to decisions concerning the use of
character sets and fonts, there are semantic questions which
require resolution (eg the identification of "significant"
element names, passing formatting values in attributes, the
standardization of style specification labels, and the production
of proprietary specifications).  There are also logical problems
to overcome (eg counter management/output, extraction of
content/attribute values, extraction of computed values/strings,
graphic placement etc) and what Brown referred to as "fidelity
guarantees" and "language biases".  She also identified
"geometric issues" relating to SGML output -- which concludes
such matters as the declaration of a co-ordinate system,
dimensioning, the description.invocation of relative areas, and
the windowing/scaling of graphics.  She also briefly mentioned
the overheads to be considered when revising output
specifications.

In the process of moving from an output specification to final
composed pages, Brown suggested that several new elements are
introduced as part of the transformation process, which require
consideration.  These new elements included such factors as file
management, the use of proprietary syntax/coding, graphic
anchoring and graphic file format conversions, how to handle page
elements such as headers, footers and folios, how to cope with
system executable functions, document assembly routines, and the
resolution of values.  The sorts of values which might require
resolution, Brown identified as: [and] numbering, specific
references to font/character locations, spacing specifications,
entities, attribute values, cross-references, external
references, variables, footnotes, any system-invoked constraint
rules, graphic file referencing and the setup of tabular
material.

Brown then summarized the current situation, comparing the SGML
users' situation to that of the characters in Beckett's play
"Waiting for Godot" -- like Godot, DSSSL's imminent arrival is
always anticipated, but never actually happens!  Brown stated
that there are currently available a number of programming
libraries that allow links with parsers to enable custom
programming.   There are also a few commercially available
specialized languages to develop custom applications.  In addition,
there are some proprietary solutions that support specific
classes of documents, as well as FOSIs written within systems
(many of which support specific DTDs).  Lastly, there are what
Brown referred to as "Combinations of specialized languages
linked to parsers and programmatic approaches to file management
and value resolution".  However, Brown suggested that all these
approaches (and DSSSL engines) depend on knowing the composition
system at a very low level.

Looking ahead, Brown anticipated the arrival of WYSIWYG output
specification writing tools, as well as user-friendly mapping
development tools.  She also predicted the production of automatic
output-specification interpreters and DSSSL engines.  Yet Brown
stressed that each of these developments would depend on first
having a stable output specification or DSSSL.

Brown concluded by summarizing what her company (US Lynx) offered
in terms of solutions for working with SGML and output
specifications.  They offer general consultation, project
management, and custom programming solutions as well as DTD and
FOSI development.  They use/have developed [it was unclear which]
a "context-wise" SGML document tagger, which offers context-
sensitive transformation of existing documents into SGML
documents.  They use/have developed [it was unclear which] an
"instance imager output system" that offers a modular structure
-- revision and editorial station, graphic editing station, and
output driver -- and an "Instance Imager" development kit for
writing drivers.  Brown said that US Lynx also use/have developed
[it was unclear which] a technical manual output specification
DTD which can handle most paper output specifications, and offers
assistance for custom or proprietary implementations.



4.11 "Formatting as an Afterthought" -- Michael Maziarka
     (Datalogics Inc., Chicago, Illinois)

Maziarka began by proposing that early DTDs were written with
the goal of publishing in mind, and so were optimized for
presentation.  This meant that DTDs were written for almost every
type of document, purely on the basis that each `type' had a
different look.  It was unclear what should be regarded as
formatting information, and what as structural information.

The result of this confusion, Maziarka suggested, was that too
many DTDs were being written;  the different appearances of
various documents were confusing the issue of whether or not they
contained different (types of) information.  However, Maziarka
felt that people's goals were now changing -- with the
realization that the money is in the data, not the paper on which
the data is formatted and stored.   SGML provides a tool to
manage data, and Maziarka argued that publishing should be seen
as a by-product not the major goal of SGML use.  He identified
four goals for data management with SGML -- to manage information
rather than pages, to author the data once only (and then re-use
it), to provide easier access/update/use of data, and to produce
data (rather than document) repositories.

Maziarka stated that modern DTDs are written to optimize data
storage, with data stored in modules and the use of boiler-plate
text, in order to eliminate redundancy.  Moreover, multiple
document types can be derived from a single storage facility, by
using one "modular" DTD.   However, Maziarka then raised the
difficulty of moving data from its stored SGML form into an SGML
form suitable for formatting.

Is it possible to format directly from the "storage" DTD (ie is
the data sequenced correctly for linear presentation, and/or do
links in the data need to be resolved during the extraction
phase)?   Or, must you try and extract data from a "storage" DTD
into a "presentation" DTD?  To highlight the difficulties with
both approaches, Maziarka then presented a number of practical
examples, (including problems relating to the use of a "storage"
DTD, and the handling of footnotes and tables).
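
[A simple example of the kind of mismatch being described -- the
markup is my own invention, not Maziarka's:

    <!-- storage form: the footnote is held once, out of line -->
    <para>Results were inconclusive.<fnref ref="fn1"></para>
    <fnote id="fn1">Based on the 1990 survey data.</fnote>

    <!-- presentation form: the text must arrive in reading order -->
    <para>Results were inconclusive.<fnote>Based on the 1990
    survey data.</fnote></para>

An extraction utility has to resolve the reference and re-order
the content before a conventional formatter can process it.]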

Maziarka suggested that "storage" DTDs should be carefully
written so as not to hinder formatting (or movement into a
"presentation" DTD).  Similarly, "presentation" DTDs must be
written in such a way as to ensure their compatibility with
"storage" DTDs.  Maziarka also put in a plea for the production
of more sophisticated extraction utilities, since formatters
require data in a linear fashion (ie in the order it will appear
on the page) to be at their most effective.  Maziarka concluded
with a repetition of his calls for DTDs that serve both storage
and formatting purposes, and for more sophisticated extraction
routines.



4.12 "Native vs Structure Enforcing Editors" -- Moira Meehan
     (Product Manager CALS, Interleaf)

[No copies of Meehan's slides were made available, and her
presentation was not always easy to follow -- therefore I cannot
guarantee that I have represented her opinions correctly].

Meehan began by outlining "native" SGML editing, which
implies/involves the processing of SGML markup, the storing of
the SGML markup alone, and the manipulation of advanced SGML
constructs.   However, she suggested that SGML may not guarantee
consistent document structure, since identical fragments can
yield inconsistent document structures, which in turn has
implications for editing operations.   Unfortunately, none of
these assertions were supported by examples.

Meehan (citing an ANSI document [ANSI X3VI/91-04 (Appendix A -
ESIS) ??]) suggested that there are two types of SGML
application -- object-oriented systems that can control
structures, and WYSIWYG systems that facilitate processing.  She
then stated that Interleaf 5 is a fully object-oriented system,
in which the arrangement of objects can be controlled and the
parser is able to produce the data in canonical form.  WYSIWYG
systems can use formatting to imply document structure, they are
easy to use, and enable verification of the logical document, its
semantics, and compound element presentation [?].

Meehan concluded by saying that controlling document structures
(objects) is valuable, as is the process of text conforming to
ISO 8879.   She added that [on-screen] formatting facilitates the
creation of such text.  She was then subjected to a barrage of
questions about Interleaf's policy on SGML and how their products
were going to develop.  Meehan said that she knew of no plans to
re-write their engine to support SGML;  currently, Interleaf 5
does not directly process SGML markup, but instead maintains the
document structure internally and recomputes the SGML.  SGML
documents that are imported into Interleaf are mapped to
Interleaf objects in a one-to-one mapping, although this process
has been found by some users to be unacceptably slow.

[My thanks to Neil Calton, who provided most of the information
used in this write-up].



POSTER SESSION 2:  Verification and Validation -- various speakers

[I was only able to attend one of the presentations within this
session]



4.13 "Verification and Validation" -- Eric Severson (Vice
     President, Avalanche Development Co)

Severson proposed the development of a tool that relies on
weighted rules to check markup.  He suggested that a statistical
approach could be adopted in order to identify the areas where
most markup problems occur, and then assign weightings to a set
of rules which control the tool's sensitivity when checking
markup.

Severson could see two major problems with the approach he was
suggesting.  No-one has done any research to establish the
statistical distribution of incorrect markup -- and it was
unclear whether such a distribution would be affected by who or
what did the markup, and by the type of document structure.
For example, is the process of auto-tagging tables more or less
error-prone than the manual tagging of book texts?

Severson suggested two possible ways of studying the statistical
distribution of incorrect markup.  Either to perform a "global
analysis" and adopt what he called a `process control' approach,
or to perform a "local analysis" and attempt to establish a rule
base that detects the symptoms of local problems.



4.14 AAP Math/Table Update Committee -- Chair, Paul Grosso

This was an evening session for anyone interested in the AAP
Standard (Association of American Publishers) and particularly
those aspects relating to the handling of Maths and tables.

Paul Grosso opened with a brief commentary on the AAP standard,
and noted the AAP's imminent intention to revise it.  Grosso
said that the AAP still tended to think in terms of a US
audience, but was gradually recognizing that its work was quickly
becoming a de facto standard in the publishing industry world-
wide.

Discussion was vigorous and wide-ranging, and consequently not
easy to summarize.  A member of the committee that had originally
devised the AAP standard said that they were well aware of its
inherent imperfections, but this had been a by-product of the
broad policy decision to "dumb-down" the original DTD in order to
make it more suitable for non-technical and widespread
consumption.

Grosso raised the question of what people actually want to do
with the AAP tagging schemes.   He had received direct
communications from people concerned with mathematics who had
suggested several things they would like to do with the AAP
Standard and SGML, but had found it difficult/impossible with the
standard in its current form (eg actually using the mathematics
embedded in documents, easy database storage, searching etc).
However, Grosso said he had heard surprisingly little from people
who wanted to use the AAP Standard as a basis for processing
tables.  He suggested that users need to decide what their goals
are, with respect to handling mathematics and tables, and then
decide how they should be marked up.  Thus Grosso posed the
question "What is the intended purpose of the AAP DTD?"

Replies fell roughly into two broad types.  One type placed the
emphasis on making documents generic -- fully independent of
either hardware or software -- with the main intention being to
capture only information (which would be devoid of anything to do
with formatting or styles).  The other type of reply suggested
that the main purpose of the AAP DTD was so that scientists could
communicate more easily.

Grosso then asked for an indication of how many of those present
at the meeting actually used either of the AAP tagging schemes.
For both the math and table tagging schemes the answer was around
six out of approximately thirty attendees (largely the same six
in both cases).  [Of course there was no real way of telling how
significant these results were, as the attendees were all simply
interested parties rather than a representative sample of the
user community;  also, some attendees were present on behalf of
large companies or publishing concerns, whilst others were there
only as individuals].

Fred Veldmeyer, (Consultant, Elsevier Science Publishers), said
that they have been using the AAP scheme for tagging math, but
had encountered problems with the character set (especially when
handling accented characters). Elsevier are now looking at the
ISO approach to handling math and formulae, and they are also
involved in the work of the EWS (European Workgroup on SGML).

Steven DeRose (Senior Systems Architect, Electronic Book
Technologies) suggested that in its current form the AAP tag set
does not enable users to markup enough variety of tables, and it
also does not facilitate extraction of information from tables.

William Woolf (Associate Executive Director, American
Mathematical Society), spoke of his organization's involvement in
the development of TeX.   He said that he wanted to ensure that
TeX users would be able to be carried along by, and fit in with,
the transition towards SGML.  Richard Timoney (Euromath) pointed
out that any SGML-based system for handling math must be able to
do at least as much as TeX or LaTeX, otherwise mathematicians
would be very reluctant to start using it.  He said that the
Euromath DTD currently contained far fewer categories than in the
AAP math DTD, (and in a number of respects it was more like a
translation from TeX).

Someone then asked if handling math was only a question of
formatting, why not simply abandon the AAP approach and encourage
users to embed TeX in their documents?  This provoked a reply
from another of the members of the committee that had originally
worked on the AAP standard.  He said that designing the DTD[s]
had been very hard work -- and that any revisions would be
equally difficult and time consuming.  They had tried to aim for
a system of math markup which would be keyable by anyone (eg copy
typists, editors) whether or not they had specialist knowledge of
the subject.  However, he also stated that he wanted the AAP
standard to be dynamic and responsive to the comments and
requirements of users through a process of regular review and
revision.  Another speaker remarked that the situation has
changed since the AAP standard was drawn up -- with the arrival
of products such as Mathematica and Maple, and new envisaged uses
for marked up texts.

A speaker from the ACM (Association for Computing Machinery)
remarked that whilst as publishers they did not expect the AAP
Standard to be an authoring tool, they still needed a way to turn
the texts authors produce into/out of the Standard (and such
tools are not currently available).

Steve Newcomb (Chair SGML SIGhyper), suggested that future
revisions of the AAP Standard might perhaps incorporate some of
the facilities pioneered in the development of HyTime.  He
proposed the use of finite co-ordinate spaces and bounding boxes
as HyTime facilities which could assist in the process of marking
up tabular material.  He also remarked that adding these
facilities to the Standard need not entail a wholesale movement
over to HyTime.

Fred Veldmeyer (Elsevier) remarked that whatever changes are
going to be made to the AAP Standard, the process should not take
too long.  Several people at the meeting were also of the opinion
that since it would effectively be impossible to supply a DTD
that met everyone's full requirements, it was better to have an
imperfect Standard in circulation than no standard at all.  An
alternative opinion was that whilst it is plainly necessary to
get the revised version of the Standard out as soon as possible,
if it served no-one's needs adequately it would eventually be
left to wither away through lack of use.

A straw poll was then taken of the remaining people at the
meeting -- to see which of the current approaches to handling
maths should form the basis for revisions to the AAP Standard.
About five people were in favour of using ISO 9573's approach to
math, five for developing the current AAP approach, and five
for investigating the Euromath DTD.  No vote was taken on an
approach to tables.

The American Mathematical Society volunteered to set up an
electronic distribution list, so that the threads of discussion
on Math and tables could be continued over the network for anyone
who wished to take part.  [Since SGML '91, the distribution lists
have been operational.  In order to subscribe to a list, send
listserv@e-math.ams.com an email message whose body consists of
the following three lines (for math):

    subscribe sgml-math <your name>
    set sgml-math mail ack
    help

(for tables):

    subscribe sgml-tables <your name>
    set sgml-tables mail ack
    help

In each case <your name> is your full name (so in my case <your
name> = Michael G Popham).  You should receive acknowledgement
from the list server by return email.]  The next meeting of the
AAP Math/Table Update Committee is intended to co-incide with
TechDoc Winter '92.



5.1 "Unlocking the real power in the Information" -- Jerome Zadow
    (Consultant, Concord Research Associates).

Zadow opened by posing the question "What's the good of SGML?",
particularly with regard to the topic of the title of his
presentation.  He then set out his own position, aware of his own
bias towards SGML.  He suggested that most people considering
SGML systems do not understand the reach and power of the
concepts involved;  most of those already implementing SGML are
doing so for the wrong reasons.  He argued that the simple
powerful concepts of SGML still require careful thought and
analysis, and represent as many implementation considerations as
a large database.

Drawing on an article from Scientific American (L. Tesler, 9/91),
Zadow summarized computing trends for each decade from the 60's
to the 90's.  He also discussed the main features of the networks
currently in use, and how they work together to create the
existing network infrastructure.  Zadow believed that trends in
computer use and networking meant that users are now overwhelmed
with information.  Users now create and deal with much more
information than in the past and also spend more of their time
formatting the information they process.  Zadow described the
current situation as one of "information overload", where there
is too much information of too many types in too many different
places.  The result is that the cost of finding, acquiring, using
and preparing necessary information is becoming too great.

Looking ahead for the next ten years, Zadow anticipated the
increased use of "Knowledge Navigators" which facilitate
intelligent network roaming.  He foresaw a growth in the number
and size of public repository databases, and a fusion of
computers, communications, TV and personal electronics.  However,
he also accepted that many factors would continue to change --  
platforms, screen resolutions, windowing mechanisms, carriers,
speeds, operating systems etc. What remains comparatively
stable are the information sources and data -- both of which have
a long life -- and the fact that the number of information types
will continue to grow.

The initial rationale for SGML was for the publishing and re-
publishing of frequently revised information, and for the
distribution of information in a machine- and application-
independent form.  The justification for establishing SGML as a
standard was to ensure the validity, longevity and usability of
such information.  However, Zadow suggested that SGML is now
being used well beyond its initial purposes.  Information marked
up with SGML is no longer confined to publishing on paper -- as
well as supporting publishing in a variety of different media,
SGML is used in distributed databases and interactive
environments.  SGML structures have grown beyond words and
pictures to include motion, sound and hyperlinks.

Zadow then put up some schematic diagrams of the current
(typical) information flow, and the future information flow using
SGML.  He annotated each diagram with notes on the processing
stages involved in the information flow.  Instead of information
going directly from the publisher to the consumer, with some
information re-directed into a library for storage and retrieval,
Zadow predicted that all published information will flow directly
into the `library'.  The generators of information will no longer
merely research and write it, but will also be responsible for
adding intelligence (via markup), and providing additional
features such as motion, sound etc.  Information will be
distributed from the library to the consumer on the basis of
additional processing which will only be made possible by the
inclusion of intelligent markup in the information (facilitating
security checks, the matching of requests with user profiles, etc.).

Zadow suggested that all information should be analysed, so that
its purpose could be defined.  It should be established how you
want to use and share the data, who you want to share it with
(the community), and over how much of the community your
techniques will be valid.  Zadow then proposed three broad types
of data:

    a)  Information managed as pages -- hard copy or master:
        little or no intelligent coding

    b)  Information managed as (parts of) documents -- hard copy
        or electronic:  structure, form and some content encoded.

    c)  Information managed as a database:  content encoded,
        structure and form attached.

An example of type (c) data would be found in a hierarchical,
non-redundant database of re-usable data elements.  Each of these
elements would have content, relationships (with other elements)
and attributes (for use).   He then showed a schematic
representation of type (c) data -- stored in a non-redundant,
non-formatted neutral database -- being extracted and variously
used as task-oriented data, training data, data for documentation
and management data.
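
As a rough illustration of this kind of type (c) data, the Python
sketch below models a re-usable element with content,
relationships and usage attributes, and shows one task-oriented
extraction from a small neutral pool.  The field names and the
extract() function are invented for illustration and were not
part of Zadow's presentation.

    # Hypothetical sketch of a type (c) re-usable data element: content,
    # relationships to other elements, and attributes describing its use.
    from dataclasses import dataclass, field

    @dataclass
    class DataElement:
        ident: str                 # unique id in the neutral database
        content: str               # the content, stored once (non-redundant)
        relationships: list = field(default_factory=list)  # related element ids
        attributes: dict = field(default_factory=dict)     # usage attributes

    pool = {
        "proc-001": DataElement(
            "proc-001", "Remove the access panel.",
            relationships=["warn-007"],
            attributes={"use": ["task", "training"]}),
        "warn-007": DataElement(
            "warn-007", "Isolate power before servicing.",
            attributes={"use": ["task", "training", "management"]}),
    }

    def extract(pool, use):
        """Pull the elements relevant to one view of the data (eg training)."""
        return [e for e in pool.values() if use in e.attributes.get("use", [])]

    print([e.ident for e in extract(pool, "training")])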

Zadow emphasized the importance of careful information analysis
-- adopting structure-based, information tagging, and hybrid
approaches.  It is vital to establish how the information is used
now, and what facilities are missing;  it is also important to
bear in mind the extent to which any approach will meet the
requirements of the broader community.  Any new SGML application
should be defined broadly -- with the required and optional parts
clearly determined.  The application should be prototyped and
thoroughly tested against any stated purposes and goals.
Analysts must also ensure that the definition of this application
will offer the least hindrance in attaining future goals.

Zadow proposed a number of scale economies that could result from
an SGML-based approach -- primarily those of reduced costs.  He
encouraged users to "think globally, act locally" and to focus on
their immediate functional purposes whilst still planning to
maximize future benefits.  However, Zadow also sounded a note of
caution.  Simply using a proper SGML application does not mean that
everyone will be able to use your data.  He encouraged the use of
the reference concrete syntax and tag names, and urged designers
to adopt (or modify) existing public SGML applications rather
than working from scratch.

Zadow envisaged a changing realm of publishing, one that
fulfilled the definition he cited from Webster's 7th New
Collegiate Dictionary -- "[To] Publish: to make generally known,
to place before the public:  disseminate".  He saw this in terms
of a move away from paper to other media (including electronic),
from formality to informality, from active to passive, from
serial to self-organized, from the physical limitations of paper
to the extra dimensionality offered by motion, sound etc.   He
also saw traditional libraries becoming more like archives.  They
would continue their existing functions as information
repositories, providers of user access to multi-publisher
document collections, and collection management, but also take on
new and additional functions.  These would include "routing"
(distributing information from publisher to consumer), automated
electronic search and selection, information marketing and order
fulfilment, and several other diverse functions.

Zadow suggested that any organization intending to use SGML to
unlock the real power of information should be prepared to change
the scope of its information systems management;  it will also
need to adopt new production/competitive strategies, and refine
its information.  When considering the range of alternative
applications, SGML could be another way to do just what you are
doing now; or it could be part of a much broader strategy, in
which case the costs are likely to be greater, but the benefits
deeper and much longer lived.

Zadow concluded by saying that SGML applications should not just
replace current methods.  We must ensure success of early
implementations because they are the pilots for broader
applications.  Organizations must start changing now, or find
themselves lagging behind.  We must organize communities with
shared interests to agree upon and manage applications and data
dictionaries;  only in this way will our applications be broadly
usable over time.



5.2 "The Design and Development of a Database Model to Support
    SGML Document Management" -- John Gawowski, Information
    Dimensions Inc  [NBC]

This was another piece of work in progress.  He listed the
requirements for an SGML document management system and the
development of a model to support these requirements.  Such a
system must be able to manage documents at the component level:
documents of  different types must co-exist;  creation of new
documents from the components of several documents should be
possible;  and searching based on structure as well as content
should be supported.

In the model they are developing, they divide data into
contextual content and contextual criteria.  They
therefore take a DTD and separate its elements into one of the
above.  As an example, for a simple resume the following division
could be made.

        CONTEXTUAL CONTENT              CONTEXTUAL CRITERIA

        resume                          name
        emp_history                     birthdate
        job                             marital_status
        jobdes                          from
        career_goals                    until

Putting these together gives structures of the form:

        CRITERIA                VALUE           CONTENT

        name                    T. Adam         resume
        birthdate               April 12, 1950  resume
        marital_status          Single          resume
        from                    1976            job(1)
        from                    1981            job(2)
        until                   1981            job(1)
        until                   1991            job(2)

This model has been found to meet the stated requirements and can
be generalised to support arbitrary applications.  It is also
appropriate for supporting the structured full-text query
language (SFQL) currently being proposed.  Their development
system has a relational engine (Basis+) with a text retrieval
capability.
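
As a minimal sketch of how such criteria/value/content triples
might be stored and searched, the following uses SQLite purely
for illustration; the table layout is assumed from the resume
example above and is not a description of the Basis+
implementation.

    # Hypothetical sketch of the criteria/value/content structure shown above.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE criteria (criterion TEXT, value TEXT, content TEXT)")
    db.executemany("INSERT INTO criteria VALUES (?, ?, ?)", [
        ("name",           "T. Adam",        "resume"),
        ("birthdate",      "April 12, 1950", "resume"),
        ("marital_status", "Single",         "resume"),
        ("from",           "1976",           "job(1)"),
        ("until",          "1981",           "job(1)"),
        ("from",           "1981",           "job(2)"),
        ("until",          "1991",           "job(2)"),
    ])

    # Structure-based search: which job components started in 1981?
    rows = db.execute(
        "SELECT content FROM criteria WHERE criterion = 'from' AND value = '1981'")
    print([r[0] for r in rows])    # ['job(2)']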



5.3 "A Bridge Between Technical Publications and Design
    Engineering Databases" -- Jeff Lankford, Northrop Research 
    and Technology Centre [NBC]

At Northrop they are also working on a database publishing
system.  They too have come up against the problem of too much
paper and too little information.  He showed a slide of a
technician next to a fighter aircraft with a pile of manuals
taller than himself.  They are looking at the opportunities
provided by advanced technology publishing.  These include:

        * being able to retrieve the technical content from 
          on-line databases;

        * having the presentation dependent on the document,
          display and use;

        * content-sensitive document scanning;

        * electronic multi-media presentation;

        * and interactive, animated graphics.

Their approach to achieving these goals is based on the notion of
dynamic documents.  These have a number of characteristics:

        * there is no final published version; they are
          created on demand;

        * the technical content is retrieved from on-line
          databases;

        * and authoring binds markup with specific sources.

To meet these goals, the solution they have come up with is known
as TINA (TINA Is Not an Acronym).

With TINA they are trying to develop a client-server model.  The
architecture of TINA involves multiple distributed concurrent
clients and servers using a network protocol based on ISO and SQL
accessible databases.  The TINA server has drivers for the
different databases being used.  The clients provide the
information from the databases.  The server is able to decode
information from the clients and switch to the appropriate
driver.  The client is responsible for putting together a valid
SQL construct.  They have produced a successful prototype and a
production version is in progress.  Their aim is to replace the
pile of manuals next to the technician by a portable computer
terminal.
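
A minimal sketch of the client-server split described above is
given below.  It is not Northrop's code: the driver names and the
request format are invented, and only the division of labour --
the client assembles a valid SQL construct, the server decodes it
and switches to the appropriate database driver -- follows the
talk.

    # Hypothetical sketch of the TINA-style split between client and server.

    def oracle_driver(sql):            # stand-ins for per-database drivers
        return "[oracle] " + sql

    def ingres_driver(sql):
        return "[ingres] " + sql

    DRIVERS = {"oracle": oracle_driver, "ingres": ingres_driver}

    def client_request(database, table, columns, condition):
        """Client side: put together a valid SQL construct."""
        sql = "SELECT %s FROM %s WHERE %s" % (", ".join(columns), table, condition)
        return {"database": database, "sql": sql}

    def server_dispatch(request):
        """Server side: decode the request and switch to the right driver."""
        driver = DRIVERS[request["database"]]
        return driver(request["sql"])

    req = client_request("oracle", "parts", ["part_no", "descr"],
                         "assembly = 'wing'")
    print(server_dispatch(req))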

Currently they are not using SGML with TINA but the speaker
suggested that it could be used in the definition and
construction of the source databases, for data interchange among
clients and servers, and for CALS-compliant distribution to
customers.  However, he saw some barriers to the use of SGML.
Firstly, the existing application works fine without SGML so it
is hard to persuade management to change.   Secondly, the speaker
complained that there are no widespread public domain SGML
tools.  This statement drew criticism from the audience when
someone suggested that while academics might have trouble
affording commercial software, Northrop ought to be able to
afford to purchase products.  The speaker insisted that very
little money came his way to buy SGML software.  He was still
trying to educate his management about its advantages.



5.4 "Marking Up a Complex Reference Work Using SGML Technology"
    -- Jim McFadden, Exoterica [NBC]

This presentation was billed as an example of the use of "Fourth
Generation SGML".   According to McFadden, fourth generation
SGML, or fourth generation markup languages, is a new term
designed to distinguish a particular methodology.  Fourth
generation SGML documents are marked up generically ie they do
not contain any explicit information about how the data in them
will be processed or displayed.  This is the goal of third
generation SGML documents as well.  However, fourth generation
SGML documents are distinguished by two important features.
Firstly, the data are organized such that the structure being
represented is the structure of the data and not the structure of
the presentation, and the markup language defined for the
document is powerful enough to capture the data in a natural and
economical way.   Secondly, intelligence is added to the document
so that the data are unambiguous to both human coders and the
computer;  all redundancy and data dependency will be resolved.
In traditional DBMS technology this is referred to as "data
normalization".

With SGML a properly normalised data structure ensures that the
data can be easily applied to several diverse applications
without any requirement for modification of the original marked
up source.  Fourth generation SGML documents can become very
sophisticated webs of complex knowledge.  Fourth generation SGML
requires the functionality of the reference concrete syntax, in
order to support advanced markup notations, and it requires
sophisticated processing software to support the wide range of
possible data manipulations.
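
The toy example below illustrates the normalization idea in the
movie-reference setting; the data are invented and this is not
Exoterica's representation.  A fact that several source books
repeat is captured once and referenced, so any application can
reuse it without touching the marked-up source.

    # Toy illustration of "data normalization": store each fact once, refer to it.

    # Redundant form: every book entry repeats the director's details.
    redundant = [
        {"book": "Guide A", "film": "Metropolis", "director": "Fritz Lang"},
        {"book": "Guide B", "film": "Metropolis", "director": "Fritz Lang"},
    ]

    # Normalized form: the film is a single shared entity; entries point to it.
    films = {"f1": {"title": "Metropolis", "director": "Fritz Lang"}}
    entries = [
        {"book": "Guide A", "film": "f1"},
        {"book": "Guide B", "film": "f1"},
    ]

    # Applications resolve the reference instead of relying on repeated data.
    for e in entries:
        print(e["book"], "->", films[e["film"]]["director"])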

Exoterica are using these fourth generation SGML techniques to
capture the detail in complex texts, such as reference works.
The presentation described a real ongoing commercial activity
involving movie reference works.  They are taking a number of
books on the same subject and combining them into a single
computer-based book.  The customer wanted the ability to create
complex multimedia hypertext and the ability to recreate the
original book forms.

Text was received in book and typesetter form and they have used
their own OmniMark product, writing scripts to convert binary
codes to ASCII.  As the document was already very structured, an
OmniMark script could also be used to tag the document with "full
tags".  At this stage they had produced effectively a third
generation generically tagged SGML document.  Another OmniMark
script was used to translate the full markup to a simpler
language which is more readable for the coders.  Some elements
become attributes and some tags are replaced by strings.  The
coders then have to go through the whole text as a preliminary QA
step.  They have found that it takes (sometimes expert) human
intervention to resolve ambiguities for the computer.

CheckMark, a validating SGML editor, was then used to ensure that
the markup in the files was syntactically correct.  At the
completion of this phase the files were valid SGML documents
which had enough detail captured to format the documents in a
number of ways.   There is also sufficient detail captured to
create powerful hypertext documents.

The marked up files were then converted to RTF (used by Word)
with another OmniMark script.  Coders then compare the Word
display to the original text, and any necessary corrections are made.
They have had four people working on this project since October.
They reckoned on four man-months of work to do about four
(similar) books.  They are also doing a World almanac.
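
The staged conversion can be pictured as a chain of passes, each
working on the previous pass's output.  The sketch below is only
a schematic restatement in Python; the real work was done with
OmniMark scripts and the CheckMark editor, and every function
here is a stand-in.

    # Schematic sketch of the staged conversion; all functions are stand-ins.

    def typesetter_to_ascii(data):
        """Pass 1: map proprietary binary typesetter codes to plain text."""
        return data.decode("ascii", errors="replace")

    def add_full_tags(text):
        """Pass 2: use the regular structure to add full generic tags."""
        return "<entry><title>" + text.strip() + "</title></entry>"

    def simplify_for_coders(tagged):
        """Pass 3: translate full markup into a shorter, more readable form."""
        return tagged.replace("<entry><title>", "@T ").replace(
            "</title></entry>", "")

    data = b"The Third Man (1949)"
    for stage in (typesetter_to_ascii, add_full_tags, simplify_for_coders):
        data = stage(data)
        print(stage.__name__ + ":", data)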

They see many potential uses for the data.  One can:

        * form the original database for future amendments;

        * generate various hypertext documents;

        * generate books of various types based on criteria;

        * generate various indices;

        * combine the data with other books.



5.5 "Nurturing SGML in a Neutral to Hostile Environment -- Sam
    Hunting, Boston Computer Society. [NBC]

I caught only the end of this talk but picked up some of the
background material.  The BCS has 25,000 members and 800
activists who primarily deliver member services on a volunteer
basis, using software and hardware donated by industry.  They
produce over 20 publications.  They have recently set up the
Accessible Information Technology Project (AIT).  Its mission is
to develop a common electronic document standard for the BCS,
which can be rendered (viewed) in any medium.   They have chosen
SGML for the electronic format because it is:

           (a) an ISO standard,
           (b) device and media independent,
           (c) human and machine-readable,
           (d) parseable,
           (e) a means of enforcing house style unobtrusively.

The rationale behind the project is that in the past they have
not been good stewards of their knowledge base. Delivered on
paper, member services are inevitably trashed.  With their
publications they want to be able to archive, retrieve, send out,
and repackage. A small committee has been formed to oversee the
project.  Sam's background has involved him in desktop
publishing, and he has previously looked at style-sheets as a
means of automating processes.  However, the style sheet,
conceived as a collection of specifications addressing a
paragraph, lacks the notion of context, leading to anomalous
results (eg string conversions, in-line formatting).  SGML, with
its nested elements, therefore appeared to him as a solution to
automating formatting problems (or, rather, design solutions).

The project aims to show people what is possible ("If you build
it, they will come").  The 800 activists cannot be coerced, but
they can be inspired, since they are what the industry refers to
as "early adopters".  AIT's focus has been on demonstration
projects -- the proof of the concept.  Progress so far includes
the development of a magazine DTD (for their shared publications).
A print format has been designed from this (by a designer).  The
initial delivery of a print version was seen as easing the
credibility problems.  Further iterations on the DTD have been
done and back issues of magazines have been retrofitted to the
current DTD.  In addition to the already-produced print versions
they have an EBT WindowBook and a full-text retrieval bulletin
board.



5.6 "Trainers Panel" [NBC]

There was a short session when four speakers gave their tips on
how to teach SGML to people. The most enthusiastic response was
for the person who used cut-out figures complete with hats
and boots to represent different parts of the SGML syntax.
Someone else suggested playing hangman using words from the SGML
grammar and then getting people to give examples of how the word
was used in SGML.



5.7 "Reports from the Working Sessions" [NBC]

There were short reports from the two working sessions which had
been held on Monday and Tuesday evening.


#1: STANDARD PRACTICES - Eric Severson and Ludo Van Vooren

In this session people had considered whether a standard "systems
development methodology" approach could be applied to SGML
implementation.   Discussion produced the following task
list.

(1)     Preliminary assessment.  Determine the boundaries and
scope of the project.  Using a prototyping approach, define
specific milestones, deliverables and management/user sign-off
points.

(2)     Analyse the present product. This includes documenting
the current author/edit cycle and critical control points.

(3)     Define success criteria for the new product.  Identify
business objectives in implementing the new product. Evaluate the
present product's strengths and weaknesses.

(4)     Develop implementation approach for new product. There
are many areas where changes to the existing systems will need to
be described.

(5)     Perform document analysis.  Identify a small group of
qualified people to perform the document analysis.  Express the
results of the analysis in a "rigorous English" form,  not
directly in SGML.  Document analysis is not the same as DTD
development, and must precede any DTD development.   SGML syntax
is not a good form to communicate with management and users!

(6) Design functional components.  Particularly define a strategy
for converting existing documents.

(7) Implement new system. Select and install software.  Create
DTDs based on the document analysis.  Convert existing data and
implement new policies.

(8) Perform post-implementation review.  Evaluate the new
product's success against defined criteria. Refine and tune the
new system.


#2: A TOOL FOR DEVELOPING SGML APPLICATIONS

The purpose of this session was to describe the functionality of
an information management or CASE tool to support the development
of SGML applications.  People had made a first attempt  to
outline the functions necessary to support SGML application
creation and maintenance, including: gathering the application
and element level data necessary to create, compare and maintain
SGML applications; producing SGML "code" such as DTDs and FOSIs;
and, producing SGML documentation such as Tag Libraries and tree
structure diagrams.

The following functional categories were identified:  data
collection, report generation, code generation, administration
support, and house-keeping functions.  Each of these had then
been discussed in some detail, generating a lot of ideas about
what such a tool should provide under each category.

Ideas generated by the group in the brainstorming session were
categorized as follows:

Data Collection

Put information in once;  Build DTD from tagged instance;
Identify attribute value types from example;  Import BNF
structures;  Graphic data Collection;  Read DTDs;  Store
rationale for analysis;  Store source of information;  Prompt
through document or application anaylsis;  Prompt through
instance building;  Prompt through attribute creation;  Prompt-
based SHORTREF generator;  Special character recognition and
cataloging;  Support Work Group;  Support multiple simultaneous
users;  Group conference-ware capability (electronic conference).

Report Generation

List elements;  List tags;  List entities;  List attributes and
values;  List omissible tags;  Exception report
(inclusions/exclusions);  List of FQGIs;  Tree structure diagram;
Hierarchical listing;  Structure diagram;  Plain-English Tag
Library;  Quick Reference Cards;  Tag Usage Report;  Compare
instance to straw-man DTD (simultaneously);  DTD-to-DTD mapping;
Well-populated sample document;  Minimal parseable document;
Minimal partial model (eg front matter);  Flag identical
elements;  Flag similar elements;  Element source listing.
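
As a toy illustration of the first idea in the list above ("List
elements"), the fragment below pulls the declared element names
out of a small DTD.  A real CASE tool would use a proper SGML
parser; the regular expression and the sample DTD are only for
demonstration.

    # Toy "List elements" report: extract declared element names from a DTD.
    import re

    dtd = """
    <!ELEMENT resume      - - (name, emp_history, career_goals)>
    <!ELEMENT emp_history - - (job+)>
    <!ELEMENT job         - - (jobdes)>
    """

    elements = re.findall(r"<!ELEMENT\s+(\S+)", dtd)
    print(sorted(elements))    # ['emp_history', 'job', 'resume']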

Code Generation

Generate SGML Declaration;  Build DTD;  Build FOSI;  Build
"straw-man" DTD (based on partial information - for evaluation);
Automatic parameter entity generation; Create translation
filters;  Export BNF structures;  Custom editor generation; 
API for other tools (Application Profile Interface).

Administration Support

Validate DTDs;  Validate tagged instance;  Validate SGML
Declaration;  Compare DTDs;  Implement Architectural Forms;
Warnings about bad practice;  Identify errors;  Error
explanations;  Find inconsistencies (eg in content models);  On-
line Help re: SGML standard;  On-line Help re: CASE Tool;
Search/Find;  Find paths in DTD;  Access to LINK library.

House Keeping

Audit trail;  History tracking;  Version control;  Automatic
Minimization;  Disambiguator;  Error correction;  Content Model
libraries;  Structure libraries;  Shared libraries (Among
Multiple Applications or Multiple DTDs);  Fully functional
editor;  Multilingual;  User tracking.



6  Summary

This was a well-attended and lively conference, with more
technical content than can be found in the comparable
"International Markup" series of conferences that are also
organized by the GCA.  However, the audience's level of SGML
expertise was very diverse, and some of the more experienced
attendees felt that the conference had been less rewarding than
they had hoped.

I would take issue with this view of the conference, as it seemed
to me a reflection of the hidden agenda of some attendees to get
free solutions/advice on how to resolve their organizations'
particular difficulties with their implementation of SGML.  As a
general forum for discussion on some of the technical aspects of
SGML, I found the conference to be very useful.  I would urge
anyone seriously interested in implementing SGML to attend and
to present papers or poster sessions if they have encountered a
particular problem which could usefully be shared and discussed
with other attendees.

I will be attending "SGML '92" (Danvers, MA) October 25th - 29th,
and hope to produce a succinct and timely report soon after.  I
apologise to all those who have waited so long for this document
to appear.


=================================================================
For further details of any of the speakers or presentations,
please contact the conference organizers at:

Graphic Communications Association
100 Daingerfield Road, 4th Fl.
Alexandria, VA 22314-2888
United States

Phone: (703)519-8157          Fax:(703)548-2867
=================================================================
You are free to distribute this material in any form, provided
that you acknowledge the source and provide details of how to
contact The SGML Project.  None of the remarks in this report
should necessarily be taken as an accurate reflection of the
speakers' opinions, or in any way representative of their
employers' policies.  Before citing from this report, please
confirm that the original speaker has no objections and has given
permission.
=================================================================
Michael Popham
SGML Project - Computing Development Officer
Computer Unit - Laver Building
North Park Road, University of Exeter
Exeter EX4 4QE, United Kingdom

Email: sgml@exeter.ac.uk      M.G.Popham@exeter.ac.uk (INTERNET)
Phone: +44 392 263946        Fax: +44 392 211630
=================================================================