The following report was obtained from the Exeter SGML Project FTP server as Report No. 9, in UNIX "tar" and "compress" (.Z) format. It is unchanged here except for the conversion of SGML markup characters into entity references, in support of HTML. The document contains two appendices: (1) Yuri Rubinsky's "Year in Review", and (2) Michael Sperberg-McQueen's closing address.
THE SGML PROJECT
SGML/R16
CONFERENCE REPORT
SGML '92, DANVERS, MA, USA, OCTOBER 25TH-29TH 1992
Issued by Michael G Popham, 2nd December 1992
-----------------------------------------------------------------
NOTE: Thanks to Yuri Rubinsky and Michael Sperberg-McQueen for posting the texts of their speeches to comp.text.sgml; these have been reproduced in the Appendices to this report. Initially, Yuri and Michael's postings were reproduced without their permission -- so as well as a huge debt, I also owe them my profound apologies. Copyright and permission to re-use these texts remain with Yuri and Michael.

BACKGROUND

This was the tenth in a series of annual SGML conferences organized by the Graphic Communications Association (GCA). It was the best-attended conference to date, with 270 attendees drawn from a wide range of backgrounds.

SESSIONS ATTENDED

1. SGML: The Year in Review -- Yuri Rubinsky (SoftQuad, Canada) (Full text of this presentation is given in Appendix I attached)
2. I Have Seen the Future of SGML and It Is ... -- Dr Charles Goldfarb (IBM, USA)
3. Lessons Learned from the Text Encoding Initiative -- Susan Hockey (CETH, USA)
4. The Marks that Monks Make: Tagging Irish Manuscripts -- Peter Flynn (University College, Cork, Ireland)
5. Using SGML in Non-SGML Environments -- Ludo van Vooren and Eric Severson (Avalanche Development Company, USA)
6. SGML and Braille -- George Kerscher (Recording for the Blind, USA), Yuri Rubinsky (SoftQuad, Canada)
7. Standards Activity and News Briefing -- Various Speakers
8. The Novice's Guide to HyTime -- Lloyd Rutledge (University of Massachusetts, USA)
9. One Doc - Five Ways: Comparative DTD Session -- Various Speakers
10. The OSF DTD Recommendations: Lessons We Learned -- Jeanne El Andaloussi (Bull SA), Eve Maler (Digital Equipment Corporation)
11. Guidelines for Document Analysis Reports and a Tool for Maintaining Many, Varied DTDs -- Dennis O'Connor (Bureau of National Affairs Inc, USA)
12. Sharing the Lessons of the CALS SGML Activity -- Beth Micksh, Robin Tomlin (Intergraph, USA)
13. SGML: Extending and Confirming Object-Based Software -- Don Davis (Interleaf, USA)
14. Poster Session -- Various Speakers
15. International SGML Users' Group Meeting
16. The Society of Automotive Engineers J2008 Task Force -- Jim Harvey (Volt, USA)
17. The Air Transport Association/Aerospace Industries Association, Rev 100 -- Diane Kennedy (Datalogics, USA)
18. The Davenport Group for On-line Documentation -- Various Speakers
19. Implementing a HyTime System in a Research Environment -- Lloyd Rutledge (University of Massachusetts, USA)
20. Poster Session -- Various Speakers
21. The Use of SGML at the Boston Computer Society -- Sam Hunting and Irina Golfman
22. Document Management in Production Publishing Environments -- William Trippe (Xyvision, USA)
23. An SGML Pilot Project: The OSTI Reports Received List -- Norman Smith (Science Applications International Corporation, USA)
24. Frame-Based SGML -- Len Bullard (Paramax, USA)
25. SGML as Foundation for a Post-Relational Database Model -- Tim Bray (Open Text Corporation, Canada)
26. The SGML View of a Database -- Bob Barlow, Fritz Eberle (Agfa CAPS)
27. SGML Queries -- Paula Angerstein (Texcel, USA)
  27.1 Comparative Implementation of the SGML/Search Query Language -- Francois Chahuneau (AIS/Berger-Levrault, France)
  27.2 Structured Queries and Location Models: DSSSL -- Paul Grosso (ArborText, USA)
  27.3 Structured Queries and Location Models: HyQ -- Steve DeRose (Electronic Book Technologies, USA)
  27.4 Structured Queries and Location Models: SFQL -- Neil Shapiro (Scilab Inc, USA)
28. Structured Queries and Location Models, Part II -- Various Speakers
29. Transforming Airworthiness Directives from Paper to CD-ROM -- Hally Ahearn (Oster & Associates Inc, USA)
30. The Development of a Technical Support Database for On-Line Access and for Publication on CD-ROM -- Elizabeth Jackson (K.E.B. Jackson Consulting)
31. The Making of Microsoft Cinemania: SGML in a Multimedia Environment -- John McFadden (Exoterica Corporation, Canada)
32. Process of Converting from Paper Documentation to SGML-Based CD-ROM -- Ken Kershner (Silicon Graphics, USA)
33. Converting 180+ Million Pieces of Paper -- Eric Freese (Mead Data Central, USA)
34. Poster Session: Conversion -- Various Speakers
35. Back to the Frontiers and Edges -- Michael Sperberg-McQueen (University of Chicago, USA) (Full text of this presentation is given in Appendix II attached)

1. SGML: The Year in Review -- Yuri Rubinsky (SoftQuad, Canada)

Yuri Rubinsky (YR) offered his now traditional rapid run-through of the past year's main SGML events and activities. The full text of his presentation will be published in both <TAG> and The International SGML Users' Group Newsletter (it also appears in Appendix I of this report). However, some of the highlights in the areas identified by YR include:

* Standards activity -- HyTime, the Conformance Testing Initiative, the SGML review, and work on query languages.
* User Groups -- new groups in Australia, the United Kingdom, Washington DC, South Ontario, Seattle, and Colorado.
* Major public initiatives -- European Community projects promoting information access for the blind; the work of the Davenport Group; work on ISO TR 9573 at the ISO; the work of the EWS (European Workgroup on SGML) on a DTD for journal articles.
* Major corporate and government initiatives -- use of SGML by the Canadian Standards Association, the Australian Parliament, the US Navy, Fokker aircraft, Wolters Kluwer, Boeing, US Air, the US Dept. of Energy, and Silicon Graphics and Novell (for delivering hardware and software documentation on CD-ROM).
* Publications -- a new book by Joan Smith; a second print-run of The SGML Handbook; an electronic version of Practical SGML to be produced.
* Vendors -- (in addition to those that would be documented later) YR mentioned WordPerfect's "Markup", Adobe's "Carousel" (whose second release is to have SGML `smarts'), QuarkXPress (which will be able to export SGML), and the future release of TechnoTeacher's "HyMinder" (a HyTime engine).
* Miscellaneous -- the December '92 release of Exoterica Corporation's "Compleat SGML" test suite etc. on CD-ROM; the work of CURIA (the Irish Manuscript Project); a proposed SGML-aware extension to LaTeX3; the release of the ICA's software for building translators.

2. I Have Seen the Future of SGML and It Is ... -- Dr Charles Goldfarb (IBM, USA)

Dr Goldfarb (CG) began by expressing his personal disappointment at the continuing appearance of proprietary "standards" which users are still buying. He urged all attendees to promote the advantages of using SGML (and other non-proprietary standards) as widely as possible. He then began his talk proper.

The world of the isolated single document is dead. Now that there are hypertext and multimedia, the future lies in documents that conform to SGML and HyTime (which is both an application of SGML and a conceptual extension to it). CG showed some slides of pages taken from the Winchester Bible, a highly ornate, illustrated twelfth-century manuscript. He proposed that these were, in fact, multimedia documents, showing that readers have been using such techniques to access texts for several centuries. CG also remarked that time dependencies are a central feature of using hypertext and multimedia -- but again argued that readers have been familiar with such concepts for years, in the form of music manuscripts and scores. In the same way that music notation represents the relative duration of notes, HyTime extends the concept of addressing into "finite coordinate addressing", where location and time coordinates are expressed in terms of their position relative to a known address.
This enables HyTime to express almost any kind of relationship, just as SGML can express any ordered structure. A coordinate space can have any number of dimensions, and these, plus the units of measurement, are definable by the HyTime system developer. CG concluded by encouraging all new and prospective users of HyTime (and SGML) with the thought that we are all experts in the use of hypertext and multimedia already!

3. Lessons Learned from the Text Encoding Initiative -- Susan Hockey (CETH, USA)

Susan Hockey (SH) summarized the organizational structure and work of the TEI. I refer readers who are not familiar with the TEI to the account of Lou Burnard's paper at SGML '91, given in the conference report produced by The SGML Project (SGML/R9). The latest TEI news from SH is summarized below:

* TEI P2 is due by July '93, with a "final report" (P3) due soon after.
* The TEI is actively seeking additional funding to continue its work.
* Users need to be educated to realize that they will not need to adopt everything in the TEI's "Guidelines" in order to use them.
* Libraries are taking an increasing interest in handling electronic texts. Much of their attention focuses on work to create a standard TEI document header (and the software to process such headers automatically).
* The TEI is also seeking to establish guidelines for testing TEI conformance, and to determine how best to maintain and develop the "Guidelines".

4. The Marks that Monks Make: Tagging Irish Manuscripts -- Peter Flynn (University College, Cork, Ireland)

Peter Flynn (PF) described the activities of the CURIA project, which is funded for ten years to make machine-readable copies of Irish manuscript texts from the sixth century to the sixteenth century. Most of these texts are illuminated, and written in Irish, Latin, or both.
PF said they anticipated multiple uses for the electronic text archives, and that it was less important to record formatting information because the electronic texts would not be used to recreate printed or displayed copies of the original manuscripts. However, they had to provide electronic texts which were acceptable to the increasing number of scholars who wished to analyse them. CURIA's aim is to make the archive widely accessible -- via anonymous ftp, telnet, WWW browsers (also Gopher and WAIS), on CD-ROM, and via interactive Bitnet messages.

PF showed how the electronic texts were derived: rather than using an original manuscript, they scan a book version (usually nineteenth century) of the text -- which takes about two minutes per page and yields an ASCII file with 99% accuracy. The file is hand-tagged using "Author/Editor" (with Latin tag names, to suit the lingua franca of mediaeval scholars) to produce a TEI-conformant SGML file which can also be printed or displayed with user-defined options.

5. Using SGML in Non-SGML Environments -- Ludo van Vooren and Eric Severson (Avalanche Development Company, USA)

Van Vooren and Severson (V&S) gave a highly entertaining dramatised dialogue to present a discussion of how best to implement SGML in a real-world, non-SGML environment. They suggested that it would be unacceptable to try to impose the use of SGML and/or structured editors in many traditional working environments. V&S urged systems implementors to make SGML appear as simple and friendly as possible through the adoption of minimally rigid DTDs, and of "verifiers" which can be incorporated into familiar word-processing environments, so that users can be guided to produce acceptably structured documents before the files are converted into SGML.
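For illustration only (this is my sketch, not an example from the talk), a "minimally rigid" DTD of the kind V&S describe might leave almost every ordering and occurrence decision open, so that documents coming out of a word processor are easy to make valid:

```sgml
<!-- Hypothetical, deliberately loose document model: body elements
     may appear in any order and any number of times, and most
     end-tags may be omitted by the author. -->
<!ELEMENT report  - - (title?, (para | list | figure)*) >
<!ELEMENT title   - O (#PCDATA)  >
<!ELEMENT para    - O (#PCDATA)  >
<!ELEMENT list    - - (item+)    >
<!ELEMENT item    - O (#PCDATA)  >
<!ELEMENT figure  - - (#PCDATA)  >
```

A verifier embedded in a word processor need only check documents against a loose model of this sort before the files are converted to SGML proper.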
They argued that to be truly successful and widely adopted, SGML must offer solutions to the problem of managing the "infoglut" -- and that it must always appear simple, even if an implementation is, in fact, highly complicated behind the scenes.

6. SGML and Braille -- George Kerscher (Recording for the Blind, USA), Yuri Rubinsky (SoftQuad, Canada)

George Kerscher (GK) spoke of the Texas state law which has recently been passed to mandate that all school texts must be made available to the blind and other "print disabled" groups. Similar laws have subsequently been passed in eight other states. Under the Texas law, conventional publishers are required to provide braille publishers with the electronic files they use to format their printed textbooks. Braille publishers are able to strip out the formatting codes to produce a plain ASCII file which they can then mark up using SGML. However, as electronic text/manuscript standards, such as the one developed by the AAP, become widely adopted amongst publishers, the braille publishers will need to find a way of mapping from, say, the AAP DTD to one which provides sufficient structural information for their needs. Yuri Rubinsky described the tag set which has been developed for this second target DTD, and discussed some of the problems that the translation process has identified -- ie the need to change tag names, recognise contexts, and so on. This work was being carried out in conjunction with the activities of the International Committee for Accessible Document Design (ICADD).

7. Standards Activity and News Briefing -- Various Speakers

Sharon Adler (Electronic Book Technologies, USA) described the latest developments affecting SGML registration procedures and authority. After several years' delay following the appearance of ISO 9070, it now seems almost certain that the GCA will become the controlling authority for issuing Public Owner Identifiers etc.
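By way of illustration (the owner names below are invented): an owner identifier issued under ISO 9070 appears as the owner field of a formal public identifier, distinguished from an unregistered owner by its prefix:

```sgml
<!-- Hypothetical formal public identifiers.  The "+//" prefix
     marks a registered owner, "-//" an unregistered one. -->
<!DOCTYPE report PUBLIC "+//Example Publisher//DTD Technical Report//EN" >
<!DOCTYPE report PUBLIC "-//Example Publisher//DTD Technical Report//EN" >
```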
Marion Elledge (GCA, USA) described activities in the sphere of conformance testing. Harmonized test suites will be developed by the end of 1992, and in the first quarter of 1993 an SGML conformance testing laboratory will be established at the NCC in Manchester (UK). The laboratory will test parsers for conformance to the core of ISO 8879, test features (such as minimization), and test applications (such as the MIL-28001 DTDs, ATA DTDs, etc.).

Anders Berglund summarized moves towards developing a harmonized SGML math. He characterized the existing differences as follows:

* ISO TR 9573 and the AAP DTD offer meaningful element names, and aim to represent first-year university math (including basic semantics).
* The Euromath DTD offers a layout-oriented approach.
* ISO TR 9573 and the AAP DTD have more in common than they have differences.

He suggested that the work of the AAP math revision subcommittee will be closely based on the latest version of ISO TR 9573, which offers a three-layer approach to math encoding. He argued that this approach offered opportunities for both semantic and layout-oriented markup and was highly user-extensible. Work on the AAP math DTD would continue at the conference.

Sharon Adler (Electronic Book Technologies, USA) gave an update on the status of DSSSL (Document Style Semantics and Specification Language, ISO/IEC DIS 10179). The second DIS is scheduled for ballot during April 1993, and it is hoped that it will be available as a full International Standard within the following six months. A number of issues and features surrounding DSSSL still remain to be resolved, and Adler welcomed any input from users through their national standards bodies.

8. The Novice's Guide to HyTime -- Lloyd Rutledge (University of Massachusetts, USA)

Lloyd Rutledge offered a very detailed introduction to the main concepts of the HyTime standard (ISO 10744). Much of the value of the session came from extensive question and answer periods.
The content of the slides was too extensive to reproduce here, and interested readers are recommended to contact Rutledge directly (his address can be obtained from the GCA or The SGML Project).

9. One Doc - Five Ways: Comparative DTD Session -- Various Speakers

This session was chaired by Tommie Usdin (ATLIS Consulting Group, USA), who began by expressing her belief that DTD development is an art form, not a science. Five experienced DTD designers had been asked to produce a DTD for "The New Yorker" magazine. Each of the five was asked to keep a particular application or purpose in mind when writing their DTD. Usdin remarked that the five DTD designs had thrown up a number of interesting similarities and differences.

Similarities included:

* similar tag names chosen
* the influence of the AAP model
* the layout of the DTDs (they all looked physically similar)
* an orientation towards content modelling (even when the designers had been asked to focus on formatting issues)
* all the authors had identified themselves in comments near the start of the DTD.

Differences included:

* only one DTD expressly accommodates document management
* inclusion/omission of the SGML declaration
* level of detail
* only one DTD design was modularized
* only one contained "code comments"
* several different types of graphics had been identified
* documentation: internal, external, both or none
* great variation in the extent of use of attributes.

Debbie Lapeyre had been asked to produce a design based on the AAP standard. She changed the AAP DTD model by stripping out a number of tags and content models -- and tried to add as few new tags/models as possible. She modularized her DTD to reflect the split between the magazine's editorial and authorial production roles. All the base-level elements were therefore included at the "article" module level (ie the authorship level), but were accessible from the "magazine" (editorial) level module.
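A modular split of this kind is conventionally achieved with parameter entities; the sketch below is my own hypothetical rendering (the file and element names are not Lapeyre's):

```sgml
<!-- magazine.dtd: the editorial-level module pulls in the
     authorship-level module, so base elements declared there
     are accessible from the magazine level. -->
<!ENTITY % article.mod SYSTEM "article.mod" >
%article.mod;  <!-- declares article, title, para, etc. -->

<!ELEMENT magazine - - (masthead, contents, article+) >
<!ELEMENT masthead - O (#PCDATA) >
<!ELEMENT contents - O (#PCDATA) >
```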
Yuri Rubinsky's design illustrated the use of content tags and tagging techniques. He felt that content could only be modelled in consultation with end users of the DTD, and so had "cheated" by actually phoning the staff of "The New Yorker" magazine! Some content was fixed, but some was surprisingly varied -- for example, titles in the table of contents were not always the same as those actually used in the articles themselves. Rubinsky discussed how he had identified separate pieces of content, and justified his design decisions in terms of how they modelled the end users' perception of the magazine content.

Halcyon Ahearn produced a DTD that took advantage of the markup minimization features of SGML. She had scanned pages of "The New Yorker", then used regular expressions to enable an SGML parser to recognise certain character combinations in the file as markup -- ie two successive carriage returns indicated the end of one paragraph and the start of the next. She altered the SGML declaration to permit the use of long, descriptive tag names, but used parameter entities to simplify the content models used in the DTD. She had also taken advantage of SGML's SHORTREF and LINK features to further minimize markup without losing any functionality.

Dennis O'Connor had written a DTD to support print publication. Although he was concerned with formatting issues, he had found it necessary to use a surprising number of content-oriented tags. He used an empty <font> element to signify changes to the current font when printing. Dimensions of figures and graphics were expressed in terms of the number of whole column widths they were intended to span.

David Durand and Steve DeRose had produced a DTD to support hypermedia applications. They stated that their design had been based on the creation of a traditional (non-hypermedia) DTD design, with additional "milestone" tags incorporated to aid navigation and reference within the document.
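A milestone tag of this sort might be declared along the following lines (a hypothetical sketch, not the speakers' actual design):

```sgml
<!-- An empty element dropped in at points of interest; its unique
     ID gives hypertext links a stable target to point at. -->
<!ELEMENT milestone - O EMPTY >
<!ATTLIST milestone id    ID    #REQUIRED
                    unit  CDATA #IMPLIED  >
```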
Some of the hypermedia links would have to be hand-coded into a hypermedia document that conformed to their DTD, whilst other links could be derived from database versions or indexes of the document. The practical issues involved, such as dealing with different graphic types, would take a long time to resolve (and they had not tackled them in their design). Putting IDs on every tag seemed the simplest way to allow hypertext linking/searching/navigation; they suggested that IDs are the basic pointing mechanism within SGML. Their envisaged application would use an "authority list" of document object IDs to control hypermedia linking and retrieval. They remarked that the more a DTD facilitated content tagging, the easier it would be to convert a document conforming to such a DTD into some sort of hypermedia form.

10. The OSF DTD Recommendations: Lessons We Learned -- Jeanne El Andaloussi (Bull SA), Eve Maler (Digital Equipment Corporation)

Andaloussi and Maler (A&M) gave an account of their experiences developing the OSF's DTD recommendations. They had developed a DTD design methodology as part of the project, and went on to describe its benefits -- namely that it was a rigorous process enabling high-quality DTD design with a clear rationale; it also produced very thorough and understandable documentation to accompany the DTD. In order to produce the guidelines, the OSF had asked its members to send in DTDs so that they could be analyzed by a design team. The bulk of A&M's presentation focussed on the development and use of a tree-based diagram notation to reflect a DTD's overall design, element hierarchy and so forth. Making DTD designs explicit in this way had enabled them to compare and better justify DTD design decisions. Each stage of the DTD design process was also thoroughly documented through the use of standardized forms. A&M felt that their design methodology and accompanying tools bore close similarities to current software development practices.
The iterative design process had proved to be very thorough, and encouraged designers to work for and justify the functionality they really wanted. The use of the tree-diagram notation and standard documentation forms had enabled non-SGML experts to contribute effectively to the design process. A&M are hoping to produce a more generally applicable and formalized DTD design methodology. They repeatedly identified the benefits to be gained from agreeing and documenting clearly-defined design axioms and principles throughout the design process. The creation and maintenance of a glossary of key terms played a major part in ensuring that all those involved in the design process used the same terms in the same way -- thereby minimizing any ambiguities and misunderstandings between members of the DTD design and maintenance teams.

11. Guidelines for Document Analysis Reports and a Tool for Maintaining Many, Varied DTDs -- Dennis O'Connor (Bureau of National Affairs Inc, USA)

Dennis O'Connor (DC) described his work to develop guidelines for writing Document Analysis reports -- which provide a means for maintaining many, varied DTDs. The reports facilitate inter-departmental communications, permit control over the growing volume of document types, and enhance the quality of available documentation. DC's guidelines define the structure and content of Document Analysis reports, and are written in plain, non-technical English. The guidelines require a Document Analysis report to include background information -- such as "Who did the document analysis?", "Which documents were looked at?", "When was the analysis done?", and "Why was the analysis done?" The bulk of the guidelines require the author of a Document Analysis report to define the document type -- what a document is made up of (elements, attributes, character sets, special characters and other entities).
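In SGML terms, the constituents such a report must enumerate correspond to markup declarations like the following (an invented illustration, not taken from DC's guidelines):

```sgml
<!-- The things a document is "made up of": an element, its
     attributes, and an entity for a special character. -->
<!ELEMENT section - - (title, para+)     >
<!ATTLIST section  secid  ID   #IMPLIED  >
<!ENTITY  mdash    SDATA  "[mdash ]"     >
```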
The guidelines also require the recording of the history of the Document Analysis report -- when was it written? What changes were made, when, and why? A well-written Document Analysis report provides a means for those who know about documents to communicate with those who know about SGML. The report effectively provides some of the documentation which ought to accompany a DTD.

12. Sharing the Lessons of the CALS SGML Activity -- Beth Micksh, Robin Tomlin (Intergraph, USA)

The CALS (Computer-Aided Acquisition and Logistics Support) initiative, which started in 1985, aims to improve the timeliness of documentation and the quality of weapon systems. Micksh and Tomlin (M&T) felt that users in other industries could learn from the experiences of those involved in CALS -- and perhaps adopt similar terminology or methodology in their own areas. The presentation took the form of a mock newscast, with CALS celebrities being given brief interviews. The areas covered and the interviewees' remarks are summarized below:

* MIL-28001 -- the SGML part of CALS. Work was initially industry/committee driven, but now there is a need for a single central body (such as the Dept of Defense) to take over and unify the administrative functions.
* Output Specifications -- a standard way of exchanging formatting information (an appendix of MIL-28001, currently under revision). Other industries would probably benefit from producing their own output specifications.
* Electronic Review -- a machine-readable way of handling structured comments on electronic texts, ie comments are inserted in a standard way (using SGML tags and attributes) so that they can be easily searched, retrieved etc. Structured comments are quite easy to use, and are vital to any complex review project which requires careful monitoring and control of comments.
* Declaration Subsets -- declarations in addition to the external declaration subset (DTD).
Users can include things in the declaration subset which override parts of the DTD. This effectively makes a DTD modifiable/modularized -- so it would be possible, for example, to take a DTD and add electronic review functionality without affecting the main DTD.

* CALS SGML Registry and Library -- a registration/evaluation process for all the elements etc. going into CALS DTDs, FOSIs etc. The aim of the library is to allow DTD designers/users to log in and download example DTDs, entity sets and so on.

13. SGML: Extending and Confirming Object-Based Software -- Don Davis (Interleaf, USA)

Much of Don Davis' (DD) presentation looked at the particular object-oriented approach to SGML adopted within the Interleaf range of products. Such details are omitted here, and interested readers should contact either DD or their nearest Interleaf agent. DD suggested that it is useful to think of an SGML-encoded data stream as a set of objects, where the element structure provides object "handles", and element-in-context details add more information. Treating SGML-encoded data in this manner enables the use of object-oriented programming languages and techniques to support the mapping of SGML files to/from proprietary, object-based editing/processing systems. DD proposed that an object-oriented approach provides significant benefits when developing and deploying SGML-based applications.

14. Poster Session -- Various Speakers

This poster session offered software vendors and developers the opportunity to demonstrate and discuss their products. The companies present were as follows: Agfa CAPS, ArborText, Avalanche, Data Conversion Laboratory, Datalogics, Exoterica, Frame Technology Corporation, Interleaf, Open Text Corporation, Recording for the Blind, Silicon Graphics, SoftQuad, TMS Incorporated, US Lynx, Xerox, Zandar.

15.
International SGML Users' Group Meeting

Considering the number of people at the conference, this meeting was (surprisingly) poorly attended; only about 25 people came. The meeting was chaired by Steve Downie (Secretary of the International SGML Users' Group). Brief reports were heard from representatives of the following Chapters: SGML Forum of New York, Canadian SGML Users' Group, Mid-Atlantic SGML Users' Group, Mid-West SGML Forum, SGML UK, and SGML France. There was a limited discussion of what the various Chapters did (and should do), how much they charged, etc. Steve Downie reported that the Canadian Chapter will be making a proposal to the International SGML Users' Group to launch an initiative to produce reports on issues of typical concern to SGML systems implementors (ie human, contractual and technical issues). These reports would be distributed through the Users' Group membership to facilitate their work. Yet again, calls were made for the publication of conference papers -- either by the GCA or the Users' Group. A recommendation to this effect will be put to the Users' Group Committee.

16. The Society of Automotive Engineers J2008 Task Force -- Jim Harvey (Volt, USA)

J2008 is a standard for the automotive industry which resulted from the passing of the Clean Air Act. J2008 will shortly appear as a published document. Producing J2008 involved setting up a number of committees to look at: data modelling (ie what data should be brought together?), DTD development, Administration, Orientation, Communication and Graphics (TIFF, CGM). Bringing the DTD and Data Model together had revealed some problems; however, two manufacturers are now testing the DTD, and their comments will be taken into account when it is revised for publication. Manufacturers aim to use a database of J2008-conforming documents for publication; third-party information providers will be able to take J2008-conforming data and re-use the information to supply it to independent service providers.
J2008 makes no recommendations about the hardware or software anyone should use, but all adherents to the standard will want to re-use their data as much as possible. A draft version of the DTD will soon be publicly available for comment.

17. The Air Transport Association/Aerospace Industries Association, Rev 100 -- Diane Kennedy (Datalogics, USA)

ATA 100 is a written specification (cf CALS), first issued in 1956 and currently undergoing its thirty-first revision. In the 1988 revision of ATA 100, it was decided that SGML should be the standard for the interchange of electronic data (with graphics conforming to CCITT 4 and CGM). Revision 31 of ATA 100 (approved two weeks ago) includes six DTDs: for Aircraft Maintenance Manuals, Aircraft Illustrated Parts Catalogues, Engine Shop Manuals, Engine Illustrated Parts Catalogues, Service Bulletins, and Master Minimum Equipment Lists. ATA's first attempts to develop a DTD had relied on a single small group, which had proved to be very slow. Consequently, a set of controls was specified to speed up DTD development -- including the setting up of a strong committee structure to handle DTD development and approval, and the production of a DTD requirements document and an industry glossary. All six current DTDs have been harmonized for Revision 31. The DTD working groups have a majority membership of subject-matter experts (with little or no SGML experience), with the bulk of the SGML work being done by SGML experts working in conjunction with the groups. SGML DTD modelling is done using structure charts. Conformance to the ATA DTD recommendations is voluntary -- so four levels of DTD have been introduced to allow for options based on manufacturers' needs:
* Level 1 DTDs (DTD is precisely defined; no options; all tags in glossary)
* Level 2 DTDs (DTD is precisely defined; options allowed; all tags in glossary)
* Level 3 DTDs (DTD defined by document producer following the ATA framework; all tags in glossary; used where there is minimal agreement between a group of manufacturers)
* Level 4 DTDs (DTD defined by document producer; all tags in glossary; used for unique documents -- ie where one manufacturer produces a unique product)

All document revisions are strictly controlled and recorded. They have adopted the CALS approach to tables. In 1993, the number of DTDs within the ATA is expected to more than double. Many airlines are automating to take advantage of such new publication/information standards.

18. The Davenport Group for On-line Documentation -- Various Speakers

Fred Dalrymple (FD) summarized the reasons for the formation of the Davenport Group. Manufacturers, vendors and users had all experienced the frustrations of working with proprietary solutions, and wanted to take advantage of emerging standards such as SGML and HyTime. Standardizing information would enable generalized, shareable help systems, on-line documentation, virtual libraries, information webs etc. The Davenport Group had established four working groups, dealing with:

* architectural forms (producing the DASH and SOFABED standards)
* the Committee for the Common Man
* an SGML query language
* SGML resources (cf. the CALS Registry and Library)

In January 1992 a Davenport workshop on architectural forms was held, in the hope that this would enable the unification of several manufacturers' DTDs. At that time, not many people knew about HyTime, which was still a Draft International Standard.
General aims included:

* unifying DTD linking specifications (via HyTime's Architectural Forms)
* enabling equal access to diverse technical documents
* separating link descriptions from access methods
* enabling the assembly of several documents into compound documents
* enabling document publication in a variety of media from a single source document

So far, they have produced the Davenport Advisory Standard for Hypermedia (DASH) document -- which includes specifications relating to producing indexes, glossaries, bibliographies, tables of contents and cross references. Participants have also gained a working knowledge of HyTime. The workgroup responsible for producing the DASH has been very productive, and once they have published a final version of the document (due soon), the group will be disbanded.

Lar Kaufman spoke about the work of the Committee for the Common Man (where "man" alludes to the Unix command to request on-line documentation, and the set of macros used to produce such documentation). Originally a separate endeavour, the CftCM had soon decided to work within the Davenport Group. The CftCM had emerged following a discussion on the USENET newsgroup comp.text.sgml, and had worked largely through email correspondence between expert volunteers. To their surprise, they had found "man" pages to be less consistent, structured and uniform than they had first imagined. The results of their work will be posted to comp.text.sgml as a "White Paper", with a request for comments. It should be noted that the CftCM envisage completely portable "man" pages -- ie across all systems (not just UNIX). Although they have defined a tag set, they have not yet agreed a hierarchical structure for the tags -- they also still need to consider how to convert the `legacy data' of existing "man" pages, and how best to take advantage of the existing tools for viewing/manipulating "man" pages.

19.
Implementing a HyTime System in a Research Environment -- Lloyd Rutledge (University of Massachusetts, USA)

It is very difficult to report the content of Lloyd Rutledge's (LR) presentation -- since it relied very heavily on complex schematic diagrams showing a number of models. LR showed a model for his HyTime Hypermedia Presentation System, and also one for the data layers in the system. Much of his presentation focussed on a diagram showing a model of his Hyperdocument Processing System being developed at the University of Massachusetts. Built around a shared database, the model included a conventional SGML parser, through which all SGML/HyTime input to the system had to pass. The results of passing files through the parser are stored in the database. Next, a HyTime Engine takes in the output of the SGML parser and checks the HyTime-specific markup; it can query the SGML document stored in the database, if necessary. The output of the HyTime Engine is also stored in the database. The output from the HyTime Engine is also passed to a HyTime DTD (HDTD) processor (which can also query the HyTime Engine's output to the DTD). The HDTD processor outputs to the hypermedia presentation system (ie the application), and takes queries from it, which the HDTD processor in turn uses to query the database. In the case of large documents and/or interactive processing, some outputs of the HDTD processor may have to be passed back to the SGML parser -- eg if the processor needs to open a particular file (which will have to go through the SGML parser and HyTime Engine as usual, before it can be used by the HDTD processor).

20. Poster Session -- Various Speakers

This poster session looked at the problems of handling tables within SGML. There were ten presenters, covering many of the concerns about tables. Since poster sessions thrive upon ad lib discussions, I have not reported them here.

21.
The Use of SGML at the Boston Computer Society -- Sam Hunting and Irina Golfman

Sam Hunting (SH) gave some of the background to the Boston Computer Society -- which is the world's largest user group (c.25,000 members), is organized by volunteers, and publishes around 25 newsletters, maintains bulletin boards and help lines (each of which represents document content). SH identified the typical problems of maintaining information in conventional print or electronic form. Document content cannot be easily retrieved, repackaged, or delivered in a number of different media from a single source. Presentation of such information is enforceable only by editorial staff working to strict guidelines, and conventional page makeup is difficult to automate. SGML allows information content to be retrieved, repackaged and so forth; it can also permit the enforcement of house-style structures (thereby permitting automated page makeup). The Boston Computer Society has had the same difficulties implementing SGML as many other organizations have experienced -- resistance from staff, lack of SGML expertise, no easy-to-use, attractive, interactive SGML applications, no "shrink-wrapped" tools to process SGML etc.

Irina Golfman (IG) described how they had tried to overcome the problems of implementing SGML. Volunteers had had to be educated so that they could develop some enthusiasm for SGML. Vendors of SGML-aware products had been invited to give presentations. The Hypertext subgroup sponsors the "SGML Panel", to discuss the practical implementation issues of using SGML. A growing number of BCS print publications would be produced from SGML source files (at present four newsletters are done in this way). Several other projects and activities were also envisaged.

22. Document Management in Production Publishing Environments -- William Trippe (Xyvision, USA)

Xyvision sells UNIX-based production publishing systems.
William Trippe (WT) reported that they are encountering a growing interest in the use of SGML within traditional publishing organizations. Xyvision's experiences have shown that there are great production benefits to be gained from using SGML -- and that one of the keys to successful SGML system implementation and acceptance is satisfying the concerns of editorial personnel. However, WT remarked that it is important to remember that large SGML applications require lots of computing "horsepower" (eg large CPU, memory, disk storage, fast backup devices etc). In order to ensure its success, WT recommended that SGML needs:

* more of an infrastructure
* standard data modelling techniques
* standard training and documentation tools
* a more helpful and obvious vocabulary
* more helpful tools

23. An SGML Pilot Project: The OSTI Reports Received List -- Norman Smith (Science Applications International Corporation, USA)

The Office of Scientific and Technical Information (OSTI) maintains a large energy science and technical database (2 GB of data, with a 14 month maintenance window). The database can be used to generate bibliographies, reports etc. The Reports Received List is a paper document produced to accompany the microfiches of all the reports received at OSTI each week. Norman Smith's (NS) pilot project aimed to produce the Reports Received List as an SGML document. Since it was already highly structured, it was quite straightforward to alter their existing structured production process to output an SGML document. They also wrote an electronic document viewer (as a stand-alone application) -- which adds value to the SGML document and offers potential benefits for other SGML applications that might be developed at OSTI.
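For illustration, an entry in such a highly structured Reports Received List might be marked up along the following lines (a sketch only -- the element names and content here are invented for this report, and are not OSTI's actual DTD):

```sgml
<!-- Hypothetical sketch: invented element names, not OSTI's DTD -->
<report>
<repno>XX-00000</repno>
<title>Sample Report Title</title>
<corpauth>Issuing Laboratory</corpauth>
<recdate>1992-10</recdate>
<avail>Microfiche</avail>
</report>
```

Because the production process was already structured field-by-field in this way, emitting such tags was a small change to the existing system.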
NS summarized the lessons that they learned from their experiences during the pilot project:

* document analysts and application programmers must work closely together
* treat DTD development as a software development project
* SGML applications are easier to develop when approached from a database-oriented perspective
* use selective parsing
* add value to distributed documents (by producing tools etc) for non-SGML users

NS also offered the following hints for anyone about to embark on an SGML pilot project:

* pick something do-able but not trivial
* keep it simple and non-critical (if it succeeds, good; if it fails, it can be thrown away)
* approach it from a database perspective
* involve the whole organization
* work within the h/w and s/w framework provided
* provide adequate resources for the project
* don't be afraid to try unconventional approaches

24. Frame-Based SGML -- Len Bullard (Paramax, USA)

Len Bullard demonstrated the Interactive Authoring and Display System (IADS) that Paramax had developed to create and deliver frame-based SGML. The demonstration content itself discussed the nature of frame-based SGML. IADS was running under MS-Windows, taking raw SGML frames and displaying them interactively. A "frame" is an addressable SGML node in a map of SGML nodes. The map of frames is a flat-file web of frame nodes connected by links. Len Bullard showed the element declarations for the frames, buttons, hotspots etc that are hard-coded into the SGML file(s) which underlie IADS.

25. SGML as Foundation for a Post-Relational Database Model -- Tim Bray (Open Text Corporation, Canada)

Tim Bray (TB) discussed how current attitudes to text processing limited the usefulness of the entire process.
For example, current opinion tends to reflect such statements as:

* files belong to applications
* good printout = good application
* data sharing = data conversion
* no ad hoc access to information
* intolerable application backlog

He proposed that the field of MIS had been in a comparable mess prior to the adoption of a number of techniques -- which could also be usefully employed in the text processing arena. These techniques included:

* data centering via database management
* data modelling for system and language
* using a database access language
* indexing for performance
* using 4GLs and GUIs
* using administration features (eg concurrency control, a transaction model, an audit trail etc)

However, unlike MIS, TB argued that the field of text processing could not adopt a relational approach to implementation (cf the success of Relational Databases within MIS). TB argued that text processing could not follow a relational model because:

* text is not tabular and not normalized
* in text, nothing above the character level is atomic -- so entity-relationship modelling is hard
* neither the relational algebra nor the relational calculus is an effective Data Access Language for text
* it is possible to decompose a DTD into a relational schema, but it is a bad idea (except in cases where documents are of an identical, tabular structure, eg insurance claim forms)

With regard to the techniques mentioned earlier, TB proposed that SGML represents the data modelling system and language. A database access language is under development (eg DSSSL). We are already able to index for performance, use 4GLs and GUIs, and make use of database administration features. All that is now required is an SGML-based, post-Relational implementation.

26.
The SGML View of a Database -- Bob Barlow, Fritz Eberle (AGFA, CAPS)

Bob Barlow (BB) began by stating that when people are considering implementing an SGML database, they are confronted by a number of questions: "To what level of granularity should I store my data?", "How do I identify the stored objects?", "How do I accomplish revision control?" Granularity issues will affect the pieces of data that are available for use/re-use, location in the database etc. Identifying objects involves considering whether there is a relationship between SGML element IDs and database IDs, how to link to SGML objects, what information is stored in the database, and what is stored in SGML attributes. Revision control could be carried out within the SGML document or within the database environment.

Fritz Eberle offered some general approaches for resolving the sorts of questions raised by BB. For example, when considering approaches to SGML document management, it is possible to exploit structure at the element, entity or content level -- or a combination of any of these. One of the interesting cases raised in Fritz Eberle's presentation was an example where two hyperdocuments share the same entity; whilst the entity is the same, the links that each uses to retrieve the entity may have different names.

27. SGML Queries -- Paula Angerstein (Texcel, USA)

Paula Angerstein (PA), as Chair of the panel dealing with the afternoon's topic of SGML and Databases, took this opportunity to say something about SGML queries. Essentially, a query is a question about what is in an SGML document or documents. As far as users are concerned, they want access to all the information in the document which will resolve their query.
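To make the notion concrete, a structured query typically uses both markup structure and text content -- for instance, "find the titles of all sections whose author attribute is 'jsmith'" against a document such as this (a hypothetical fragment; the element names are invented for illustration and were not part of the panel's examples):

```sgml
<!-- Hypothetical fragment: invented element names, for illustration only -->
<report>
<section author="jsmith">
<title>Parsing Strategies</title>
<para>Each section carries an author attribute.</para>
</section>
<section author="akumar">
<title>Storage Models</title>
<para>Answering the query requires knowing the element
hierarchy as well as the attribute values.</para>
</section>
</report>
```

A plain full-text search cannot answer such a question; a query language over the parsed structure can.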
However, PA pointed out that developing an SGML query language raises a number of issues, namely:

* whether to query parsed or unparsed data
* whether to query the ESIS and beyond
* applying queries to classes vs instances
* the allowed "root" of a query
* user interfaces to the query language

PA had asked all of the panel speakers (who had developed query languages) to discuss their approaches to resolving a number of test queries.

27.1. Comparative Implementation of the SGML/Search Query Language -- Francois Chahuneau (AIS/Berger-Levrault, France)

SGML/Search is the name of both a query language and a system implementing this language, developed by AIS (running on top of PAT/Lector from Open Text). SGML/Search considers the document (or documents, if all conform to a single DTD) as a database. It uses some of the structural information given in the ESIS, it loads SGML documents and extracts SGML fragments, and it returns SGML fragments in response to queries. SGML/Search can use numerous filtering conditions to refine its queries -- ie element type, element hierarchical position, structural or lexical distance (from a known point), attribute constraints, logical connectives to combine queries, set operations etc. Francois Chahuneau (FC) showed how SGML/Search would handle the sample queries proposed by Paula Angerstein, then went on to discuss some queries which would be "impossible" within the syntax of SGML/Search (though the queries could be resolved in other ways). FC compared SGML/Search queries to DSSSL queries -- pointing out that SGML/Search is oriented towards fragment retrieval rather than processing, and that an SGML/Search query target can only be an SGML element. He claimed that any valid SGML document can be imported into SGML/Search -- giving full access to the SGML structure and text content, as well as high indexing and search performance via PAT's full-text engine.
SGML/Search is an open tool for systems integrators -- offering a forms-based GUI and a complete API accessible to C/C++ programmers.

27.2. Structured Queries and Location Models: DSSSL -- Paul Grosso (ArborText, USA)

Paul Grosso (PG) stated that DSSSL queries are useful for locating objects within a tree that are going to be acted upon. He also reminded everyone that the DSSSL query language is only one component of the whole DSSSL standard. The DSSSL query grammar is based upon several premises, namely that:

* a query works on a tree of objects
* objects have object attributes
* a query uses relationships based on pre-order traversal of the tree
* a query uses information from both structure and "content"
* a query can "start" from the tree root, or any object, or set of objects

PG went rapidly through a series of slides describing the DSSSL query language in more detail, then showed how DSSSL queries would be written to satisfy the test cases supplied by Paula Angerstein. Readers who wish to know more about DSSSL might try contacting PG through the conference organizers. Alternatively, they could contact the committee of their national standards body assigned to consider DSSSL.

27.3. Structured Queries and Location Models: HyQ -- Steve DeRose (Electronic Book Technologies, USA)

HyQ is the portion of HyTime concerned with querying. Querying is a central part of hypertext, and developers require systems to have both authored linking and dynamic querying. Steve DeRose (SD) characterized a query as a mechanism for specifying sets of things -- where a set of things is a "node list" (HyTime-speak for an ordered list of locations in the information world; locations can be spread across any number of documents and can be at any level from characters to the world!) HyQ provides a number of functions for operating on node lists: set operations, list processing, filtering/selection operations etc.
SD did not have the time to cover the features of HyQ in any real detail, and interested readers should try retrieving information from another source (ie see the remarks at the end of the previous section, or contact SGML SIGhyper). SD summarized the design principles that underlie the HyQ query language:

* it must be adept, not just capable, in order to deal with complex trees
* it must handle multilingual data and different SGML syntaxes
* it must provide natural access to all SGML phenomena
* it is not designed for end-users to type in (it will be hidden behind 4GLs and GUIs)
* it has a macro mechanism to make it manageable

SD then talked through the HyQ solutions to the sample queries distributed by Paula Angerstein.

27.4. Structured Queries and Location Models: SFQL -- Neil Shapiro (Scilab Inc, USA)

Neil Shapiro (NS) reported on the ATA's experiences of querying with SQL (the precursor to SFQL). SFQL is being developed in response to the drive to produce Interactive Electronic Technical Manuals (IETMs) -- which consist of both data and applications, requiring quick and intelligent access to the information they contain. Typical IETMs are distributed on CD-ROMs and involve using very large files. Current problems arise from the fact that proprietary access software (requiring proprietary indexes) leads to software dependence. A typical airline will receive several IETMs from different manufacturers, and each will necessitate the use of a different user interface (which is clearly unsatisfactory). The ATA solution is to work towards software independence, dividing the search engine (server) from the user interface (client). This means the server can represent data in a proprietary format, but the user will only see a single user interface.
NS put up the following model diagram:

            User interface (client)
                ^           |
                |           |
      ----------|-----------|----------
                |           |            can standardize this
                |           |            area (the server
                |           |            request language)
      ----------|-----------|----------
                |           v
            Search engine (server)

Possible standards available to the ATA for the server request language are SQL (Structured Query Language) and Leverage SQL. SQL has been extended to give better support for text-based queries; the new version will be called SFQL (Structured Full-text Query Language). SFQL will support such things as:

* fielded searches
* advanced searches (using fuzzy matches)
* retrieval control (relevance ranking, projection)
* extended data types (SGML, CGM, TIFF, CCITT etc)

SFQL is an abstract access model, which conceals all index and storage design differences, whilst the SFQL conceptual model enables views of data without saying how it should be stored. After reminding everyone that users should look for software independence, and not just data portability, NS went on to show how SFQL would resolve Paula Angerstein's sample query.

28. Structured Queries and Location Models, Part II -- Various

The idea of this session was to build on the query language presentations given earlier. Initial comments revealed that DSSSL will only work with SGML documents, whilst HyQ needs to be able to support both SGML and other types of document. HyQ does not have the higher-level constructs of some query languages, but this is intentional and can be circumvented by developing HyQ macros. Unfortunately, the discussion degenerated into little more than a series of accusations and counter-accusations about what the various query languages could do and how efficiently they achieved this. It was mostly impossible to distinguish facts from opinions -- and perhaps the point most clearly demonstrated during the session was that the ISO standards developers held SFQL in low esteem.

29.
Transforming Airworthiness Directives from Paper to CD-ROM -- Hally Ahearn (Oster & Associates Inc., USA)

The first half of Hally Ahearn's (HA) talk was a conventional overview of SGML theory -- looking at such issues as the SGML processing model, the SGML document, SGML as metalanguage, the role of the parser, the process of document analysis, and when to use private or public entities. In the second half of her presentation, HA demonstrated how ASCII output from a word-processing system could be successively marked up and parsed to give ever greater levels of structural encoding. The file could then be edited using SGML-aware structure editing tools, so as to fully prepare it for processing for storage in a database and/or display via a presentation system.

30. The Development of a Technical Support Database for On-Line Access and for Publication on CD-ROM -- Elizabeth Jackson (K.E.B. Jackson Consulting)

Elizabeth Jackson gave a high-level overview of the work she had done towards producing HelpdisQ -- a database of multi-vendor technical support information for PC hardware and software products (published as a CD-ROM in May 1992). Regrettably, this presentation lacked detailed information, which was only available in a handout distributed at the conference.

31. The Making of Microsoft Cinemania: SGML in a Multimedia Environment -- John McFadden (Exoterica Corporation, Canada)

Throughout this presentation, John McFadden (JM) was able to demonstrate the main features and capabilities of Microsoft's "Cinemania" CD-ROM. The disk has only been available for about three weeks and costs $79 -- although Microsoft had spent nearly $5m to produce it. Running under MS Windows, the user interface is a typical mix of windows, buttons and icons to access the Cinemania data. As well as selecting particular films, the user can call up glossary definitions, film-star biographies, academy awards lists, sound clips and video stills.
The Cinemania database contains five traditional publications converted into a single SGML document database. The CD-ROM contains 22Mb of markup and text (the DTD is 64kb) -- and a total of 220Mb of data and software (so the disk is still less than half full!). The Cinemania database also contains 1.6 million identified SGML elements, with a typical element granularity of ten characters per element (cpe), whilst name elements typically have a granularity of 2cpe. The database stores 19,000 movie listings, 300 biographies, 500 articles, 745 reviews, a glossary, 175 still video clips etc. The original paper documents were re-keyed to give an electronic form, which was automatically tagged, marked up, and quality controlled using Exoterica's "OmniMark" product. Exoterica's internal "CheckMark" product was used for any final structural editing, and to handle exceptional cases which had not been tagged automatically. The resulting SGML was stored in an SGML Knowledge Base. Microsoft then took the SGML Knowledge Base and used OmniMark to control linking, flow and formatting (together with some presentation design expertise) to get the SGML data into the Microsoft Multimedia Viewer -- which is the front-end presentation system used to create the "Cinemania" product that users actually see. Looking at some instances of conventional SGML encoding, JM showed how OmniMark could greatly simplify the markup which has to be added to a document. This simplified markup stood in for conventional raw SGML encoding -- which made it much easier for authors with little SGML experience to edit the document. Exoterica worked hard to ensure that the resulting SGML Knowledge Base was completely independent of its intended use (eg it would now be a simple matter to create new electronic/paper books based on selected extractions from the Knowledge Base).
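JM's actual examples were not distributed, but the idea of shorthand markup standing in for raw SGML can be sketched as follows (a hypothetical illustration -- the shorthand syntax, element names and content are all invented, not Exoterica's or Microsoft's):

```sgml
<!-- Hypothetical illustration only.                              -->
<!-- What an author might type (compact shorthand):               -->
*bio Bogart, Humphrey
*born 1899
*film The Maltese Falcon; 1941

<!-- What a conversion program could expand it to (full SGML):    -->
<biography id="b0001">
<name><surname>Bogart</surname><forename>Humphrey</forename></name>
<born>1899</born>
<filmography>
<film><title>The Maltese Falcon</title><year>1941</year></film>
</filmography>
</biography>
```

The author edits only the compact form; the full, validatable SGML is generated mechanically, so the Knowledge Base stays consistent without requiring SGML expertise of every contributor.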
There are no explicit links in the markup; these are generated when the data is translated for the Multimedia Viewer -- which makes updating or amending the data in the Knowledge Base, or even adding new categories, very much easier. Approaching the development in this way had proved to be very fast and economical. They have produced an extremely maintainable and adaptable information resource with guaranteed quality levels. OmniMark had been used to expedite things at almost every major point in the development of the system. JM also reserved high praise for Microsoft's Multimedia Viewer, which only costs $495.

32. The Process of Converting from Paper Documentation to SGML-Based CD-ROM -- Ken Kershner (Silicon Graphics, USA)

Ken Kershner (KK) stated that the reasoning behind Silicon Graphics' decision to adopt SGML had come from their wish to enable the company, its developers and customers to deliver technical reference information (in electronic form, for publishing on paper or on-line). This was a very visual demonstration, with the results of Silicon Graphics' efforts being displayed and discussed throughout. The benefits of using SGML included:

* reduced printing and freight costs (= money saved)
* reduced hot-line calls (= money saved)
* quick information retrieval (= money + time saved)
* creating allies in manufacturing and Customer Support

The SGML-based electronic technical reference information system produced is called IRIS InSight. It provides "one-stop shopping" for on-line documentation, support information, and tightly integrated digital media. Although all the source files are in SGML, these are compiled into books for DynaText (Electronic Book Technologies' browser) to display to the user. Usability tests of the pre-alpha release of IRIS InSight, using 16 novice users, had tested link behaviour and task times. Results showed that users preferred scrolling within books to conventional methods of displaying on-line technical information.
Task times averaged 10.2 minutes. Taking what they had learned from the first session, Silicon Graphics conducted a second usability test at the pre-beta release stage of development, also using 16 novice users. They tested user preferences for on-line vs paper documentation; they were a little surprised (based on previous research) to find that task performance using on-line documentation was equal to that using paper. Task times averaged only 4.3 minutes. The researchers believe that as users become more accustomed to using on-line documentation, their efficiency will improve, as will the users' learning curve.

KK identified a number of traps to avoid when developing such projects:

* there can be conflicts during development between sticking strictly to the standards and guidelines that have been adopted, and getting a working product
* beware of underestimating the effort involved in converting data
* beware of underpowered hardware
* avoid making any dependency changes when development is nearing a milestone point
* start licensing discussions early

KK then identified the things Silicon Graphics would do again on a similar project:

* get together a committed engineering team
* use outside experts
* survey customers
* build a multi-functional team (including authors, editors, trainers, customer support, software engineers etc)
* pick the pilot project very carefully
* document the entire development process

33. Converting 180+ Million Pieces of Paper -- Eric Freese (Mead Data Central, USA)

Mead Data Central maintain a number of massive, very diverse databases of textual information. Their holdings currently represent the equivalent of 180 million pieces of paper, to which another 40 million are added annually. Eric Freese (EF) gave some of the reasons why Mead had decided to convert all their electronic data to SGML form -- primarily to take advantage of SGML's device, application and language independence, and to use related standards such as HyTime and DSSSL.
The more structural information they can capture in their encoding scheme, the easier it will be to provide satisfactory responses to users' queries and requests. For the SGML data they already process, they have adopted the FOSI approach (from CALS) to get formatted output, but they will certainly adopt DSSSL once a stable standard and software tools have appeared. EF claimed that because of the huge range of document types stored in Mead's databases, they will ultimately need to develop between 4,000 and 7,000 DTDs (and 8,000-14,000 FOSIs, to allow for paper and on-line formatting). The conversion process will need to be almost entirely automated, as it would be unrealistic, if not impossible, to do it by hand. The conversion process is due to start in 1994 and will need to handle 1.5 million documents a day. The production of database definitions, DTDs and FOSIs is expected to take only 5 months, whilst the actual conversion of the documents themselves is scheduled to take only 6 months. The result will be one of the largest SGML applications in the world. EF believes that standards such as HyTime will become ever more important to them over time. The sort of search and retrieval systems he envisages for the future at Mead include:

* a distributed environment (possibly world-wide)
* a GUI
* multimedia -- hypertext, graphics, video, audio
* international sources and delivery (10+ source languages, multilingual documents, and possibly even automatic language translation)

34. Poster Session: Conversion -- Various Speakers

This session considered various approaches to, and experiences of, converting to and from SGML documents.

35. Back to the Frontiers and Edges -- Michael Sperberg-McQueen (University of Illinois at Chicago, USA)

This was the closing keynote speech, in which Michael Sperberg-McQueen speculated on some of the SGML-related developments he expected and/or hoped to see over the coming years.
Amongst the long list of issues that he raised were such points as "Will we find a way to encode semantics?", "Will a methodology for developing DTDs evolve?" and a number of similar questions. (The full text of this presentation is given in Appendix II.)

For further details of any of the speakers or presentations, please contact the conference organizers:

Graphic Communications Association
100 Daingerfield Road, 4th Fl.
Alexandria VA 22314-2888
United States of America
Phone: (703) 519-8157
Fax: (703) 548-2867

=================================================================
You are free to distribute this material in any form, provided that you acknowledge the source and provide details of how to contact The SGML Project. None of the remarks in this report should necessarily be taken as an accurate reflection of the speakers' opinions, or in any way representative of their employers' policies. Before citing from this report, please confirm that the original speaker has no objections and has given permission.
==================================================================

Michael Popham
SGML Project - Computing Development Officer
Computer Unit - Laver Building
North Park Road, University of Exeter
Exeter EX4 4QE, United Kingdom
Email: M.G.Popham@exeter.ac.uk (INTERNET)
Phone: +44 392 263946
Fax: +44 392 211630

APPENDIX 1: THE SGML YEAR IN REVIEW 1992
by Yuri Rubinsky, SoftQuad Inc

STANDARDS ACTIVITY

1. The highlight of the SGML year, I think most people would agree, was the adoption of HyTime as an international standard. Sort of like a child born already having been accepted into a good university, HyTime has been considered for some time a necessary component of many initiatives, including the grand old US DoD CALS. I think we're going to see an outburst of activity and creativity revolving around HyTime over the next year. The standard will be published shortly by ISO in Geneva.
Copies will be available from national standards bodies like ANSI and BSI; there will probably be a few authorized redistributors like GCA and TechnoTeacher.

2. Conformance Testing Initiative: Spearheaded by the GCA in the US and the National Computing Centre in the UK, the SGML conformance testing initiative slowly but surely attempts to gather the momentum (and money) it needs to proceed. There seems to be general agreement that independent testing of SGML capabilities is needed (with some vocal exceptions citing examples of the market deciding what conformance means) but no agreement whatsoever on where the money should come from. Nonetheless, the GCA GenCode Committee continues to explore the possibilities.

3. The SGML Review: ISO regulations call for a review of each standard around the time of its 5th birthday. That review will continue over the next few months; Dr. Goldfarb is chairing the Special Working Group on SGML and invites comments and suggestions.

4. From Jim Mason, Convenor of WG8, comes the following news clip: "There are two query languages for SGML documents being developed in ISO/IEC JTC1/SC18/WG8: HyQ, as part of HyTime, ISO 10744; and an unnamed language as part of DSSSL, ISO/DIS 10179. We are developing two languages because these two standards, while they both manipulate SGML documents, are partially complementary in scope and functionality. DSSSL deals with pure SGML files. Although HyTime requires the `hub document' to be in SGML, subsidiary documents may be in any format, including binary digitized audio or graphics. DSSSL and HyQ both operate on property sets defined in SGML. DSSSL's location model is entirely in terms of SGML structures (where an element is in the tree, its relationships to its siblings, and so on). HyTime also needs to deal with finite coordinate spaces (this happens three seconds after that).
As of the September meeting of WG8, we feel that in areas of simultaneous interest, there should be simple mappings between the languages." 5. DSSSL, which I hope will be the highlight of next year's Year in Review, is expected to go out for a second draft ballot about April 1993. USER GROUP ACTIVITY 1. Here's a good piece of "you heard it here first" news: Today, [Oct. 26, day one of SGML '92] on the other side of the planet, that is, about 12 hours from now, the Australian National SGML User Group is being formally incorporated under the name SGML OZ. Chaired by Carlyle Nagel, the group's inaugural sponsor is Xerox Australia, who, in Carlyle's words, "provide the tea and biscuits". Congratulations to that group on this exciting and coincidental day. 2. A Mid-Atlantic SGML Users' Group has been formed, catering to SGML users living in Atlantis. Well, the Washington D.C. area actually. 3. With the International SGML Users' Group having started in the United Kingdom, it was often easy for everyone to think of the International group as meeting the needs of a local chapter, but of course local chapters have a separate role to fulfill, and accordingly those big islands finally have their own U.K. SGML Users' Group with Nigel Bray as Chairman. 4. The Southern Ontario User Group (covering a broad sweep of area more or less centered on World-Series-winning Toronto, Canada) recently held a successful vendor day and continues to publish its newsletter. 5. A Seattle User Group has begun, under the sponsorship of DEC. 6. Meanwhile in Colorado, what I mentioned last year as the planned Boulder SGML User Group finally got off the ground last month with its first meeting and a working name of The Rocky Mountain SGML Entity. 7. The Dutch Chapter of the SGML Users Group reports that it had a difficult year, due to the resignation of Dieke Van Wijnen as secretary of the Group. 
In September the group found a new secretary and now activities are resuming, including, on November 25th, a one-day conference on the managerial implications of the introduction of SGML applications. On December 9th, the annual meeting of the group will be held. MAJOR PUBLIC INITIATIVES 1. The Air Transport Association/Aerospace Industries Association subcommittee responsible for text standards has just released Revision 31 of Specification 100. This standard includes six DTDs covering a range of technical maintenance publications (including Aircraft Maintenance Manuals, Engine Shop Manuals and Service Bulletins). The new spec also includes a DTD Requirements Document and an SGML Data Dictionary (an industry-wide list of reusable elements and attributes). 2. Latest news on the CALS front is that MIL-M-28001B is expected to be released about March. Revision B will include significant changes in the Appendix B Output Specification, a tagging scheme for partial document delivery requirements, and a tagging scheme meeting the requirements for electronic annotation and review. MIL-STD-1840B availability will be announced at CALS Expo '92. It is expected to provide more flexibility in data delivery, such as accommodating other data types, device-independence, and tape medium. 3. The Commission of the European Communities (CEC) is funding the TIDE (Technology Initiative for Disabled and Elderly people) Pilot Action. Within TIDE, the Communication and Access to Information for Persons with Special Needs (CAPS) project started in December 1991, and will last until the end of March 1993. This project's main objective is to provide broader access to digitally distributed documents (especially newspapers, books and public information) to a significant group of handicapped and elderly persons who have difficulty in accessing the printed word and/or electronic information. 
The print-disabled group includes the blind, the deaf-blind, the visually impaired, the dyslexic and those with motor impairments that make it difficult to physically control paper documents or to use traditional methods for computer access. Working with the CAPS committee, Manfred Kruger of MID has written a DTD for electronic delivery of newspapers, including such interesting and once-you-think-about-it-perfectly-sensible constructs as an entity for an "invisible blank". This is the character that tells a voice synthesizer to break the current word into parts which are pronounced separately. 4. In a related item, a related committee with some overlapping membership, a working sub-committee of the International Committee for Accessible Document Design, has completed a draft DTD to support the formatting of braille from SGML. This DTD was accepted by the full committee last week and will now go forth to the Texas legislature to become part of state law regarding accessible electronic versions of all textbooks approved for use in the state educational system. Anyone interested in learning how to make new or existing DTDs "braille-ready" should contact the author at SoftQuad. (+1 416 239-4801) 5. In a single year, the Davenport Group, as part of its "Davenport Advisory Standard for Hypermedia (DASH)" activity, started (in January) and by December will have completed the design and publication of a set of HyTime-based SGML architectural forms, tentatively dubbed the "Standard Open Formal Architecture for Browsable Electronic Documents" (SOFABED), for the standard representation of indexes, tables of contents, glossaries, and cross references, for use with online documentation on Unix and Unix-like Open Systems. Unix International, the Open Software Foundation, Novell and others participated in this development, and the Open Software Foundation is probably going to implement the SOFABED architectural forms immediately. 6. 
X Consortium, the people who brought you X-Windows, has been presented with a protocol, proposed by Kent Summers of EJV and Jeff Vogel of EBT, for online help servers. The proposal puts forward a scheme which takes advantage of the hierarchical tendencies of SGML models but can also support anything else that has a notion of unique identifiers. 7. As evidenced by the turnout from the drug industry at SGML '92, there is strong interest in SGML from the point of view of the manufacturers and of the US Food and Drug Administration. Simultaneous with the first two days of this conference is a CANDA (Computer-Assisted New Drug Applications) conference in Washington, at which they are discussing SGML. The FDA has said it wants electronic submissions of New Drug Applications by 1995 and is interested in experimenting with SGML. The Pharmaceutical Manufacturers Association Task Force has suggested an SGML pilot project. 8. The CAD Framework Initiative (CFI, a good example of a nested acronym) has a task force to develop a semiconductor industry SGML application for use in transferring component documentation within the industry. This group deserves special mention for its formal name: CAD Framework Initiative Design Information Technical Committee Components Information Representation Technical Subcommittee Electronic Data Book Working Group Technical Documentation Interchange Standard Task Force. 9. The US Congress' 1990 Clean Air Act requires that by 1996 car manufacturers provide all emissions system documentation to anyone who requests it. That will be done in SGML, in an application known as J2008 and created by a subcommittee of the Society of Automotive Engineers. 10. Formatting images for CD-ROM publishing and other electronic image management systems can be made easier if there is an organized scheme to follow. 
The Association for Information and Image Management C15.9 standards committee members are developing a scheme for generating image tags, based on Standard Generalized Markup Language (SGML), that will be compatible with numerous image indexing and retrieval products. The objective of the project is to assist users by providing a versatile path for converting image files into other publishing systems' proprietary formats or database files. The project is entitled Compact Disk Read Only Memory (CD-ROM) Application Profile For Electronic Image Management (EIM). 11. A healthy collection of standards bodies, including AFNOR, BSI, DIN and the IEEE, are all looking at using (and modifying as necessary) the DTD in "ISO Technical Report 9573: Techniques for Using SGML" for standards creation and production. The Canadian Standards Association is developing, with InfoDesign, a new SGML-based information and publishing system, currently in pilot test phase, to encompass all facets of the standards development process at CSA. When complete, the information and publishing system will allow all data relevant to a document to be created directly in SGML, allowing its retrieval in both view-only format and editable SGML text, and publishing to both hardcopy and CD-ROM directly from SGML. 12. The Text Encoding Initiative is nearing completion of its second version, much after the hoped-for date, but correspondingly more thorough and well thought out. A number of major commercial publishers are encoding considerable volumes of material in TEI (Chadwyck-Healey and Oxford University Press, among others). 13. The European Workgroup on SGML is working on a DTD for scientific journal articles, the so-called MAJOUR Article DTD (Modular Application for JOURnals). This DTD is based on the AAP Article DTD and is intended as an exchange format between scientific publishers, typesetters and printers, and database hosts. 
Since last year, when the MAJOUR Header DTD was presented at the International Markup Conference 1991 in Lugano, the EWS has been working on the Article DTD, particularly body, tables, figures, math, and back matter. The first draft version was finished in February '92. Work on individual parts and the documentation is still going on. The MAJOUR Article DTD is scheduled to be finished by the end of the year and will be presented at International Markup 1993. The EWS is trying to harmonize its own work as far as possible with the work and results of other initiatives in the field, such as the AAP Tables/Math Update Committee and the AAP Article DTD Update. 14. In April, Pam Gennusa of Database Publishing made a presentation on SGML to the Text Working Party of the International Press Telecommunications Council in London. In May, the Associated Press hosted a seminar to introduce SGML to North American print and broadcast media and vendors. At the June meeting of the IPTC working parties and Standards Committee in Toronto, the AP presented an initial draft of NIML, a News Industry Markup Language, intended as a first step towards a full SGML implementation for news text. The NIML draft has since been republished in SGML News in Australia, and been added to the libraries on CompuServe's Journalism Forum. The IPTC has formed a joint SGML working party with the Newspaper Association of America and the Radio-Television News Directors Association. MAJOR CORPORATIONS & GOVERNMENT INITIATIVES 1. The US Department of Energy has adopted SGML as its standard for electronic exchange of scientific and technical information, and the Office of Scientific and Technical Information in Oak Ridge, Tennessee has been selected as the facilitating organization. Various DOE organizations and contractors are already participating in this effort and proposals have been submitted for the backbone system. 2. 
The Australian Parliament has just completed a review of its publishing needs and has recommended SGML for the daily publication of Hansard and supporting documentation. The Australian Attorney General's Office is ramping up its use of SGML, and the Australian Tax Office has an SGML pilot project going which, if successful, will spread across the department. The Australian Defense Publishing Group (DPUBS) has installed a CD-ROM manufacturing facility, which is migrating to SGML. 3. The U.S. Navy Defense Printing Service purchased an SGML-based publishing system to be deployed at all printing service sites, DoD-wide, under the ADMAPS (Automated Document Management and Publishing System) program. Under the EDRADS (Electronic Document Retrieval and Distribution System) program, the Navy is populating a document database of all Military Specifications and Standards by scanning and applying SGML tagging. The U.S. Navy has begun a study on conversion of logistics support analysis material directly to Interactive Electronic Technical Manuals. 4. Agfa won the 910S award to develop DTDs and FOSIs for US Air Force Administrative material (and conversion of 10,000 pages). With Agfa's recent re-organization announcements, the fate of this award is, so to speak, up in the air. 5. JCALS, the US DoD publishing system architecture contract, was awarded this year to a project team headed by Computer Sciences Corporation. With an estimated value over 10 years of $750 million, JCALS is intended to provide systems which will then be duplicated throughout the DoD to receive contractor data encoded to CALS standards. When complete, the system will be the world's largest integrated information retrieval, document database, editing and publishing system, consisting of hundreds of sites with tens of thousands of users. 6. In Holland, Fokker Aircraft is working on a CALS-like SGML implementation. 7. 
The Dutch Petroleum Company (NAM), owned by Shell and Exxon, has begun implementing an SGML application. 8. Wolters Kluwer Law has completed conversion of the entire Law Database of Dutch legislation into SGML and is now converting its looseleaf operation. 9. Sumitomo Bank Capital Markets of New York City reports that it would not be capable of maintaining its current volume of business without the links between its structured database data and its unstructured data that SGML provides. Frank Deutschmann writes: "Over the course of the year ... we have moved ALL wordprocessing activities into the SGML environment (currently ArborText's Publisher). Our business (trading derivative financial products) involves detailed legal documentation for each trade (dozens a day), and all documentation is now generated and stored in SGML format. I believe that we are one of the first serious users of SGML ... literally our whole business is based on SGML." 10. In New York City and London, MarketScope, Standard & Poor's electronic market analysis service, is being launched simultaneously through three on-line distribution services with three different formatting requirements, all from a common SGML source (created in SoftQuad Author/Editor). 11. In France, Bull is creating its user documentation in SGML using an editorial system centered on an Ingres database, and geared to producing both paper and a CD-ROM. 12. Aerospatiale, with the help of AIS, has built a ground-based ELS (Electronic Library System) to take maintenance manuals and transform them, both automatically and manually, into documents needed by Air Inter, the national airline owned by Air France. Aerospatiale is to deliver the final system within the next month. The system will also be available to other airlines. 13. Delta's TOPS System is up and running, producing native SGML job cards. The system takes Boeing data into Datalogics' tagger, using an older version of the ATA Spec 100 DTD. 14. 
USAir is building a native SGML application for Service Bulletins and other internal documents using IBM's TextWrite. 15. Boeing will be producing Service Bulletins in SGML and has provided a complete maintenance manual as test data to the airlines. 16. At the ATA Digital Maintenance Conference last month, 160 people from 60 airlines saw Digital Service Bulletins on SGI computers with Arbortext's SGML Editor and MS Windows and Macintosh versions of SoftQuad Author/Editor. All computer platforms demonstrated the same business process: Service Bulletin content being edited and re-ordered to become Engineering Orders. 17. The Laboratory for Library and Information Science at Linköping University in Sweden is working with the Swedish Defense Research Establishment in using HyTime to model dynamic structures in crisis management systems incorporating, for instance, information from geographical information systems. 18. John Duke of Virginia Commonwealth University, and consultant George Alexander, a member of the original AAP committee, are working on a project to convert the second edition of the Anglo-American Cataloguing Rules (AACR2) to an SGML file. AACR2 is maintained by an international committee and codifies the rules that librarians throughout the world use to describe materials in their collections. The electronic version of AACR2 will be used not only to produce future print versions of the constantly changing rules, but to develop software for online versions linked to other cataloguing tools, for tutorial programs, and for other research tools. Value-added developers of AACR2-e may develop linkages to other products, such as the LC Rule Interpretations or the MARC format documents. 19. The Department of Statistics at North Carolina State University will be publishing The Journal of Statistics Education, a newly organized electronic journal that will maintain journal materials using SGML. 
Some assistance to authors will be provided in producing the SGML documents, at least in the short term. The first issue of the JSE is targeted for July 1993. The editors plan on using a modified version of the AAP journal DTD. 20. SRC, the Semiconductor Research Corporation, funds university research and "pre-publishes" the results to its member sponsors. It has said it will start delivering those findings electronically in SGML by the end of this academic year. The DTDs are done. 21. The Caterpillar Service Information System (SIM) project is based on 11 DTDs and includes file system management software developed by InfoDesign. To date, the system has received, verified and catalogued more than 350,000 pages of converted SGML text and graphics. A second project, the Caterpillar File Management System (FMS), a distributed, SGML-based information management system built with Computer Sciences Corporation, is now being implemented. 22. Microstar Software of Ottawa has received funding for a two-year research project in the area of SGML tools from the Canadian Department of National Defense. 23. This year Microsoft released a CD-ROM multimedia package, entitled "Cinemania", integrating several books about movies into one complex reference source. Microsoft and Exoterica's consulting group used SGML as an enabling technology in the preparation of the text data. 24. Mead Data Central has begun work on what may be the largest non-government SGML application in the world. Over the next 2 years, they will be converting between 200 and 250 million documents into SGML using 4,000 or more DTDs and 8,000 or more FOSIs. 25. SunSoft, the Sun software subsidiary, is using SGML in its online publishing tools. Documents in several popular electronic publishing formats will be converted into online information similar to SunSoft's AnswerBook online documentation product. 26. 
A company called FLUKE reports that it has built a filter for its "Standard Input File Format", where it attempts to keep a writer's file as close to ASCII text as possible and then infers the markup to take it into an Agfa CAPS System. 27. The Cooperative Extension System, which includes the U.S. Department of Agriculture Extension Service, 77 land-grant universities and 3100 county offices, has appointed a working group to develop a standard for encoding publications using SGML. Extension publications include technical reports, fact sheets, and pamphlets on agriculture, horticulture, home economics and youth development, many of which incorporate images, tables, charts and graphs. In addition, the USDA Extension Service is supplementing its paper distribution system with electronic distribution over the Internet. Documents will be encoded with SGML and formatted on request for a variety of display technologies. PUBLICATIONS 1. Joan Smith, leader of the CALS in Europe Special Interest Group and one of the founding fathers and mothers of SGML, has a new book just out, called SGML and Related Standards, published in the UK by Ellis Horwood and distributed in North America by Simon & Schuster. 2. Oxford University Press has indicated to Charles Goldfarb that The SGML Handbook has gone back to press for a second edition. 3. Eric van Herwijnen's book Practical SGML has sold 3000 copies. The Japanese edition of the book was published this year by the Japanese SGML Forum. Eric is now working on a second edition, which will also be available electronically in DynaText, incorporating the ARCSGML parser with buttons that will allow you to parse the book's examples. A wonderful case of using available SGML technology beyond simply representing pages. 4. SGML Inc., the editorial team behind <TAG>, the SGML Newsletter, entered into an agreement whereby the GCA publishes the newsletter. Another sign of SGML's continued growth is the fact that <TAG> is now published monthly. 5. 
The CALS Journal, a glossy colour magazine devoted to the world-wide CALS initiative and with continuous coverage of SGML in CALS, is now completing its first year of publication. 6. Interesting and exciting SGML coverage in the mainstream: BYTE magazine's June issue had a cover section on "Infoglut" which included articles devoted to SGML by Steve DeRose and Lou Reynolds of Electronic Book Technologies and Chris Locke and Haviland Wright of Avalanche. The November '92 issue of Unix World includes that magazine's first major mention of SGML in its Standards column. 7. The Seybold Report's coverage of SGML activities continues to grow, recently with Mark Walter's long piece on September 7th describing both Silicon Graphics' and Novell's commitment to SGML: "If we've been writing a lot about the Standard Generalized Markup Language (SGML) lately, it's because a lot is happening. The latest ringing endorsement of the standard: Silicon Graphics and Novell will be converting their hardware and software documentation into SGML for delivering the manuals on CD-ROM. The adoption of SGML by two big-name computer gear suppliers, both of whom had ready access to vendor-based solutions, reflects a growing awareness of the intellectual and business advantages to putting critical information in a rich, portable form. Adoption of SGML by the computer industry could spur more widespread use and change the face of electronic delivery software development." 8. The November 18th Management Edition of Newsweek, which is sent to 3/4 million management subscribers, includes an article by Chris Locke placing SGML in perspective for managers. Locke describes reasons why computer automation "has largely failed to increase productivity" and goes on to say: "A solution to both problems, universal document interchange and the explicit encoding of document structure, is rapidly arriving from a largely unheard-of quarter. 
The Standard Generalized Markup Language (SGML) is being adopted with surprising speed by companies such as WordPerfect, Novell, Frame Technologies, Interleaf, Silicon Graphics, Digital Equipment, and Sony. The reason that this open, non-proprietary international standard is situated at the heart of so many development efforts is its ability to represent a rich set of document structures and relate them to a humanly meaningful whole." VENDOR ANNOUNCEMENTS A couple of announcements this year suggest activity among vendors that signals, I think, SGML's movement into the mainstream: 1. The WordPerfect Corporation, market leaders in the wordprocessing world, demonstrated their SGML-conforming version of UNIX WordPerfect, both at TechDoc and the recent Seybold Conference. The product is in its beta test period now and will be ported to MS-DOS next year. 2. Adobe Systems has launched its Carousel product, which accurately displays PostScript fonts and pages irrespective of the computing platform they're sent to. Although there have been no formal announcements, statements made at the Seybold Conference by John Warnock and Adobe Vice President Bill Spaller indicate that sometime next year a new version of Carousel will be released with some SGML smarts. 3. TechnoTeacher, Inc. demonstrated a prototype of its "HyMinder" HyTime engine at TechDoc Winter 1992 last February. TechnoTeacher expects to release the "HyMinder" product along with its SGML document object library (called "MarkMinder") during the first quarter of 1993. 4. Quark, maker of QuarkXPress, has produced an alpha version of a filter that exports Quark files in SGML-encoded form in conformance with the ICADD Minimal DTD (for braille). Other vendors who have made announcements this year include: AIS, Arbortext, Avalanche, Datalogics, EBT, Exoterica, Frame, Intergraph, Interleaf, Oster & Associates, SoftQuad, Unifilt, Zandar, and Westinghouse. MISCELLANEOUS 1. 
One sign of a growing market is the appearance of market analysis: InterConsult has released an "SGML Software Market Report" which divides the SGML market into useful sectors and attempts to gauge both current and future sales levels. The data is available both as a published document and with InterConsult-developed software called Intuition, which allows one to build one's own assumptions into a sophisticated analytical model. 2. The next item is a reprise of one of last year's. I ended this talk in 1991 by describing Michel Bielzinski's talk at the International Markup Conference. Well, a recent issue of <TAG> includes a very interesting piece by Michel on the same theme: a comparison of space and time in HyTime and Einstein's General Theory of Relativity. 3. In December 1992 Exoterica will be releasing a CD-ROM entitled "The Compleat SGML", containing the full text of ISO 8879, integrating the 1988 amendment, in online hypertext form. Accompanying this electronic reference will be thousands of sample SGML documents, comprising Exoterica's SGML conformance test suite. Several enormous documents will also be provided for benchmarking purposes. 4. CURIA, the ancient manuscript project of the Royal Irish Academy, has 6MB of text scanned and now being encoded from printed editions of annals, sagas, poems and prose works in Irish, Latin and Old Norse. 5. DynaText is being used in a math course (differential geometry) at Brown University, with interactive 3D graphics and a whole on-line textbook. 6. This is surely one of the great tidbits of miscellaneous SGML news: At the recent Mid-Atlantic SGML Users' Group meeting, the U.S. Central Intelligence Agency announced that SGML is a strategic direction for the Agency. 7. Reaction was so good at the SGML '91 Conference to Tommie Usdin's paper cut-out dolls for modelling SGML content, that the GCA is now offering the package for sale. 8. 
An SGML and LaTeX volunteer group managed by Chris Rowley, Rainer Schöpf and Frank Mittelbach reports: "On top of this low-level typesetting engine [LaTeX] we are building a high-level language `for specifying the formatting requirements of a class of structured documents' (i.e. for prescribing how to format a document which conforms to a particular DTD) and also implementing a `formatting engine' which uses the specified formatting to convert an input document into a PDL (primarily in TeX's DVI language, but this can be translated directly into quite `low-level' PostScript or PCL or ...) This will be, like the current LaTeX, a public domain system." 9. The Integrated Chameleon Architecture, a software system for generating translators to and from SGML DTDs, was made available for public release in March 1992. A user's guide is also available. Scheduled enhancements include the addition of a capability to import already existing DTDs and to specify attributes in DTDs. 10. CITRI, the Collaborative Information Technology Research Institute, a joint research arm of the University of Melbourne and the Royal Melbourne Institute of Technology, has been researching in the area of document retrieval. A group has developed an SGML-based hypertext information retrieval system which uses tools such as Lector (University of Waterloo) and XGML (Exoterica), plus their own database engine, Atlas, to provide a platform for researching the retrieval of large, structured documents. 11. The SGML Project, based at Exeter University in the U.K., has continued to successfully promote the use of SGML within the U.K.'s academic and research community. During this last year, the members have given presentations on SGML to several universities, businesses and conferences, established a major electronic archive for SGML resources, and recently set up an email discussion list for the U.K. community. 
They are actively seeking additional funding, and over the coming year intend to establish workgroups to define criteria for evaluating SGML software, to assess the software currently available, and to write the DTDs and translators required by the academic community. 12. The Jet Propulsion Laboratory (a division of Caltech contracted to NASA) began a project in 1985 called the Planetary Data System, a project to catalog and archive planetary data. Thus far the project has only archived the plain ASCII text of accompanying documentation, but it is now looking into the best way of archiving for the long term while making the documents associated with planetary data available in multiple output forms. SGML is being put forward as a possibility. Finally, with JPL involved, SGML has the opportunity to become a truly universal standard. Or at least galactic. APPENDIX II Back to the Frontiers and Edges: Closing Remarks at SGML '92: the quiet revolution Graphic Communications Association (GCA) C. M. Sperberg-McQueen 29th October 1992 Note: This is a lightly revised version of the notes from which the closing address of the SGML '92 conference was given. Some paragraphs omitted in the oral presentation are included here; some extemporaneous additions may be missing. For the sake of non-attendees who may see this, I have added some minimal bibliographic information about SGML '92 talks referred to. I have not added bibliographic references for HyTime, DSSSL, etc. If you are reading this, I assume you already know about them, or know where to find out. (MSM) Section I INTRODUCTION What a great conference this has been! We began with a vision of the future from Charles Goldfarb,(1) and since then have had a detailed tour of a lot that is going on in the present. I want to turn your attention forward again, and outward, back to the fringes and the edges of our current knowledge. 
We've been hearing about a lot of projects in which the gains of SGML are being consolidated, implemented, put into practice. I want to talk about some areas in which I think there may still be gains to be made. Not surprisingly, some of those gains are at the periphery of our current concerns with SGML, in fringe applications, pushing the edge of the envelope. Not surprisingly, Yuri asked an academic to talk about them, because academics are by nature fringe people, and our business is to meddle with things that are already pretty good, and try to make them better. In identifying some areas as promising new results, and inviting more work, there is always the danger of shifting from "inviting more work" to "needing more work" and giving the impression of dissatisfaction with the work that has been accomplished. I want to avoid giving that impression, because it is not true, so I want to make very clear: the questions I am posing are not criticisms of SGML. On the contrary, they are its children: without ISO 8879, these questions would be very much harder to pose: harder to conceive of, and almost impossible to formulate intelligibly. SGML, that is, has created the environment within which these problems can be posed for the first time, and I think part of its accomplishment is that by solving one set of problems, it has exposed a whole new set of problems. Notation is a tool of thought, and one of my main concerns is to find ways in which markup languages can improve our thought by making it easier to find formulations for thoughts we could not otherwise easily have. I start with the simple question: what will happen to SGML and to electronic markup in the future? Charles Goldfarb told us Monday: the future of SGML is HyTime. And this is true. HyTime is certain to touch most of us and affect our use of SGML in the coming years. But HyTime is already an international standard: it's part of the present. What will happen next? What should happen next? 
What I will offer is just my personal view; it has no official standing and should be taken for what it's worth. It's an attempt to provide a slightly fractured view, a view slightly distorted in order to provoke disagreement and, I hope, some useful thought.

If you want to know what is going to happen with SGML and markup languages in the next few years, all you have to do is think about what happened in programming languages after the introduction of Cobol or Algol, and what happened in database management systems after the development of the Codasyl data model.

Section II THE MODEL OF THE PAST

The introduction of Cobol made possible vast improvements in programmer productivity and made thousands of people familiar with the notion of abstraction from the concrete machine. It is no accident that SGML is often compared to Cobol: it is having a similarly revolutionary effect.

More suggestive to me, however, is the parallel between SGML and Algol. Apart from the skill with which its designers chose their basic concepts, one of Algol's most important contributions was its clean, simply designed syntax. By the definition of its syntax, Algol made possible the formal validation of the program text, and thus rendered whole classes of programmer error (mostly typos) mechanically detectable, effectively eliminating them as debugging problems. Similarly, SGML renders whole classes of markup and data-entry error mechanically detectable and thus eliminates them as serious problems. The notion of formal validity is tremendously important.

What happened after the introduction of Algol?
Over a period of time, the intense interest in parsing and parser construction gave way to interest in the meaning of programs, and work on proofs of their correctness -- which I interpret as essentially an attempt to extend formal validation beyond the syntax of the language and allow it to detect logic or semantic errors as well, and thus eliminate further classes of programmer error by making them mechanically visible.

Formal reasoning about objects requires a clean formal specification of those objects and their characteristics, so time brought serious work on the formal specification of programming language semantics. In particular, work on type systems occupied a great deal of attention, because Algol had demonstrated that type errors can be mechanically detected for simple types. So a lot of people worked on extending those simple types and creating stronger, subtler, more flexible, more useful type schemes; from this work our current trend of object-oriented programming takes some of its strength.

All of these same issues arose in connection with database management systems after Codasyl. (No matter what Tim Bray said yesterday, this did happen well before 1978.) The work of Codasyl in defining formally the characteristics of databases led to a generation of improved database systems, and eventually the increasing complexity of those systems led to the introduction of the relational model, whose simple concepts had a clean formal model firmly grounded in mathematics, which simplified reasoning about databases and their correctness and which led to substantial progress in database work, as Tim Bray described yesterday.(2)

The database work confirmed, fueled, and strengthened the conviction that formal validity and a rational set of data types are a useful investment.
Equally important for our purposes, database work showed the importance of placing as much as possible of the burden of validation in the database schema definition and not in the application software that works with the data. If you have a logical constraint in your data, for example that the sum of columns DIRECT COST and INDIRECT COST not exceed the column PRICE TO CUSTOMER, or that the only colors you offer are GREEN and BLUE, it is better to define that constraint into the database schema, so it will be consistently enforced by the database server. You may be tempted to leave it out of the schema on the grounds that your application programs can enforce this constraint just as well as the server. And you are right -- in theory. In practice, as surely as the day is long, before the end of the year you and the two other people who were there will be transferred to new duties, your replacements will overlook the note in the documentation, the first thing they will do is write a new application which does not enforce this rule, and before another year is gone your database will be full of dirty data. In other words, to paraphrase an old Chicago election adage, constrain your data early and often.

As hardware costs declined and programmer costs increased, portability became an increasingly prominent issue, and the semantic specification of languages, abstracting away from the specifics of individual machines, proved to be an invaluable tool in helping achieve it where possible and limit the costs where device-specific code was necessary.

Since the progress on formal semantics, though promising, did not yield mechanistic logic checkers as reliable as mechanistic syntax checkers, the years after Algol and Codasyl also saw the development of the notion of programming style, and attempts to define what constitutes a good program.
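The schema-versus-application point can be made concrete with a minimal present-day sketch (not from the talk; the column names follow the examples above, the Schema class is invented for illustration): the integrity rules are declared once, in the schema, and enforced on every insert, no matter which application does the inserting.

```python
# Hypothetical sketch of "constrain early and often": constraints live in
# one schema object, so no application can forget to enforce them.

class Schema:
    def __init__(self, checks):
        self.checks = checks            # named constraint predicates

    def insert(self, table, row):
        for name, check in self.checks.items():
            if not check(row):
                raise ValueError(f"constraint violated: {name}")
        table.append(row)

orders_schema = Schema(checks={
    "cost_within_price":
        lambda r: r["direct_cost"] + r["indirect_cost"] <= r["price_to_customer"],
    "color_offered":
        lambda r: r["color"] in ("GREEN", "BLUE"),
})

orders = []
orders_schema.insert(orders, {"direct_cost": 40, "indirect_cost": 10,
                              "price_to_customer": 60, "color": "GREEN"})
try:
    orders_schema.insert(orders, {"direct_cost": 40, "indirect_cost": 30,
                                  "price_to_customer": 60, "color": "RED"})
except ValueError as e:
    rejected = str(e)               # the dirty row never reaches the table
```

A new application written next year talks to the same `orders_schema`, so the rule survives the staff turnover described above.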
At least some of these discussions appealed as much to aesthetic judgments as to empirical measures, which is as it should be, since aesthetics is a fairly reliable measure of formal simplicity and power.

Section III SGML PROBLEMS AND CHALLENGES

All of these problems are also visible in SGML and electronic text markup today, and my prediction, for what it is worth, is that they will occupy us a fair amount in the coming years. What is more, as you will have noticed in the course of the conference, they are already occupying us now. That is, the future is closer than you might think. What problems will occupy us in this uncomfortably near future? The same ones that we saw in programming languages and database management:

* style
* portability
* a large complex problem I'll call "semantics", which includes problems of validation and type checking

III.1 Style

We saw the other day in Tommie Usdin's session One Doc -- Five Ways(3) how close we already are to developing consensus on DTD style. As for external, presentational details, Tommie remarked that there is already an implicit consensus. For details of construction and approach, she remarked, rightly I think, that there is no one answer, no context-free notion of "a good DTD". Our work in coming years is to clarify a context-sensitive notion of "a good DTD".

When is it better to tag a profile of Miles Davis as a <NewYorkerProfile> and when is it better to tag it <article> or even <div>? The answer is not, I suggest to you, as some were proposing the other day: namely that it's always better to tag it <NewYorkerProfile>, but you may not always be able to afford it and so you may have to settle for <article> or <section>. For production of the New Yorker, or for a retrieval system built specifically around the New Yorker, I personally would certainly use the more specific tag.
For a production system to be used by all Condé Nast or Newhouse magazines, however, I think the elements <goings>, <TalkOfTheTown>, and so on would be problematic. Let's face it, Psychology Today and Field and Stream just do not have those as regular departments. In building a 100-million-word corpus of modern American English, it would similarly be a needless complication of the needed retrieval to provide specialized tags for each magazine and newspaper included in the corpus. One of the points of this whole exercise (i.e. SGML) is to reduce irrelevant variation in our data -- and relevance is context-dependent.

Judging by the talks we have heard, those in this community will be building and customizing an awful lot of distinct DTDs in the coming years. One of our major challenges is to learn, and then to teach each other, what constitutes good style in them: what makes a DTD maintainable, clear, useful.

III.2 Portability

Our second major challenge is, I think, portability. I can hear you asking "What?! SGML is portable. That's why we are all here." And you are right. Certainly, if SGML did not offer better portability of data than any of the alternatives, I for one would not be here. But if data portability is good, application portability is better. If we are to make good on the promises we have made on behalf of SGML to our superiors, our users, and our colleagues, about how helpful SGML can be to them, we need application portability. And for application portability, alas, SGML and the world around it so far provide very little help.

Application portability is achieved if you can move an application from one platform to another and have it process the data in "the same way". A crucial first step in this process is to define what that way is, so that the claim that Platform X and Platform Y do the same thing can be discussed and tested. But SGML provides no mechanism for defining processing semantics, so we have no vocabulary for doing so.
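A toy illustration, entirely invented and in present-day Python, of what such a vocabulary buys: once the processing semantics are stated as data, separate from any one program, two independently written processors can be tested against each other, and "Platform X and Platform Y do the same thing" becomes a checkable claim rather than a hope.

```python
# Hypothetical sketch: a tag-to-rendering specification stated as data.
# Two independently written processors consult the same specification,
# so sameness of processing is a testable property.

SPEC = {"title": ("== ", " =="), "emph": ("*", "*")}   # before/after strings

def processor_x(tag, text):
    before, after = SPEC[tag]
    return before + text + after

def processor_y(tag, text):
    # written separately, perhaps on another platform
    return "".join([SPEC[tag][0], text, SPEC[tag][1]])

# the portability claim, stated as a test:
for tag, text in [("title", "Style"), ("emph", "portable")]:
    assert processor_x(tag, text) == processor_y(tag, text)
```

The interesting content is in `SPEC`, not in either processor; that separation is the kind of thing a processing-semantics vocabulary would standardize.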
DSSSL (ISO 10179, the Document Style Semantics and Specification Language) does provide at least the beginnings of that vocabulary. So DSSSL will definitely be a major concern in our future. We have seen another bit of the future, and it is DSSSL.

Section IV THE BIG PROBLEM

But the biggest problem we face, I think, is that we need a clear formulation of a formal model for SGML. If we get such a formal model, we will be able to improve the strength of SGML in several ways.

IV.1 SGML's Strengths

SGML does provide a good, clean informal model of document structure. Like all good qualitative laws, it provides a framework within which to address and solve a whole host of otherwise insoluble problems. For the record, my personal list of the crucial SGML ideas is:

* explicitly marked or explicitly determinable boundaries of all text elements
* hierarchical arrangement/nesting of text elements
* type definitions constraining the legal contents of elements
* provision, through CONCUR and the ID/IDREF mechanism, for asynchronous spanning text features which do not nest properly -- and here I want to issue a plea to the software vendors: Make my life easier. Support CONCUR!
* use of entity references to ensure device independence of character sets

Obviously there are a number of other features important to making SGML a practical system, which I haven't listed here. What I've listed are what seem to me the crucial elements in the logical model provided by SGML. It seems to me that a properly defined subset of SGML focusing on these ideas and ruthlessly eliminating everything else could go far in helping spread the use of SGML in the technical community, which is frequently a bit put off by the complexity of the syntax specification.
I don't think a subset would pose any serious threat to the standard itself: use of a subset in practice leads to realizations of why features were added to the standard in the first place, and with a subset, the growth path to full use of the standard language is clearly given. Spreading the use of SGML among the technical community would in turn help ensure that we get the help we will need in addressing some of the challenges we face.

IV.2 Semantics

We commonly think of SGML documents as data objects, to be processed by our programs. I ask you now to participate for a moment in a thought experiment: what would the world be like, if our SGML documents were not documents, but programs? Our current programs for processing SGML documents would be compilers or interpreters for executing SGML programs. What else?

Well, first of all, we discover a tremendous gap: we have lost everything we used to know about programming language semantics, and we have no serious way of talking about the meanings of these SGML programs. And for that matter, we have no serious way of talking about what happens when we compile or execute them. In other words, we have made our programs reusable (we can run the same program / document with different compilers) and so we can use just one programming language instead of many, and this is good, but it would be nice to have a clue about the semantics of the interpretations our compilers make of the language we are using.

The clearest analogy I can think of to our situation is that in SGML we are using a language like Prolog, in which each program (document) has both a declarative interpretation and an imperative or procedural interpretation. If you ignore the procedural aspects of Prolog programs, you can reason about them as declarative structures; if you attend to the procedural aspects, you can see what is going to happen when you run the program.
The difference between Prolog and SGML is that Prolog has very straightforward semantics for both the declarative and the procedural interpretations, for which formal specifications are possible. In SGML, we have a very clear informal idea of the declarative meaning of the document, but not a very formal one. And we have no vocabulary except natural languages for talking about processing them.

Ironically, it is not easy to say exactly what ought to be meant by the term semantics. Different people use it in different ways, and if it does have a specific, consistently used meaning in formal language studies, then the practitioners have kept it a pretty well guarded secret. So I can't tell you what semantics means; I can only tell you what I mean by it today.

Imagine I am about to send you an SGML document. Included in this document are two elements I suspect you may not have encountered before: <blort> and <vuggy>. When I say I'd like to have a good specification of their semantics, I mean I would like to be able to tell you, in a useful way, what <blort> and <vuggy> mean, and what formal constraints are implied by that meaning. But we don't seem to know how to do that. The prose documentation, if there is any and if I remember to send it, may say what a <blort> is, or it may not. It may tell you what <vuggy> means, but if it does it may say only "full of vugs; the attribute TRUE takes the values YES, NO, or UNDETERMINED". Unless you are a geologist you probably don't know what a vug is, and if you are a geologist you may harbor some justifiable skepticism as to whether I know and am using the term correctly.
Even if my prose documentation does explain that a vug is an air-hole in volcanic rock, and you know how to decide how many vugs make a rock vuggy, I have probably not succeeded in specifying what follows logically from that meaning in any useful way -- probably not, that is, in a way that a human reader will understand, and almost certainly not in a way that a validating application can understand and act upon. For example, how many people here realize, given our definition of <vuggy>, that the tag <vuggy true=yes> is incompatible with the tag <rock type=metamorphic> -- since the definition of a vug is that it's an airhole in volcanic, i.e. igneous, rock? If you noticed that, congratulations. Are you right? I don't know: if some vuggy igneous rock is metamorphosed and the airholes are still there, is it still vuggy? I don't know: I'm not a geologist, I'm just a programmer. Is there a geologist in the house?(4)

It would be nice to be able to infer, from the formal definition of <vuggy>, whether or not <vuggy true=yes> is incompatible with <rock type=metamorphic>, just as we can infer, from the DTD, that <vuggy true='76.93%'> is not valid, since the attribute true can only take the values YES, NO, and UNDETERMINED. Prose is not a guaranteed way of being able to do that. So what can we manage to do by way of specifying the "semantics" of <blort> and <vuggy>? We don't seem to know how to specify meaning in any completely satisfactory way. What do we know how to do? Well, we can fake it. Or, to put it in a more positive light, we can attempt to get closer to a satisfactory specification of meaning in several ways:

IV.2.1 Prose Specification

First, we can attempt to specify the meaning in prose. Specifications in prose are of course what most of our manuals provide in practice. It is handy to formalize this as far as possible, to ensure consistent documentation of all the characteristics of the markup that we are documenting.
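To make the contrast concrete, here is a minimal present-day sketch (invented for illustration, with the geologically debatable rule hand-coded) of the two kinds of check: the first is what a DTD can already do -- the attribute value must be in a declared list -- and the second is the kind of cross-tag inference that would have to follow from a formal definition of <vuggy>.

```python
# Hypothetical sketch. dtd_valid() is DTD-style checking: values drawn
# from a declared domain. semantically_consistent() applies a rule that
# no DTD can express, here simply hand-coded rather than inferred.

ATTRIBUTE_DOMAINS = {("vuggy", "true"): {"YES", "NO", "UNDETERMINED"}}

def dtd_valid(element, attr, value):
    domain = ATTRIBUTE_DOMAINS.get((element, attr))
    return domain is None or value in domain

# The debatable rule (see note 4): vugs occur in igneous rock, so
# <vuggy true=YES> clashes with <rock type=metamorphic>.
INCOMPATIBLE = [(("vuggy", "true", "YES"), ("rock", "type", "metamorphic"))]

def semantically_consistent(tags):
    facts = set(tags)                       # (element, attribute, value)
    return not any(a in facts and b in facts for a, b in INCOMPATIBLE)

assert not dtd_valid("vuggy", "true", "76.93%")   # the DTD catches this
assert dtd_valid("vuggy", "true", "YES")          # ...but not this pair:
assert not semantically_consistent(
    [("vuggy", "true", "YES"), ("rock", "type", "metamorphic")])
```

The point of the formal-semantics program is to derive a table like INCOMPATIBLE from the definition of a vug, instead of writing it by hand.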
We've heard obliquely about a number of systems people use to generate structured documentation of SGML tag sets: Yuri Rubinsky mentioned one used internally by SoftQuad; Debby Lapeyre mentioned one; the Text Encoding Initiative (TEI) uses one; I am sure others exist too. This is already a live issue. And it will continue to occupy our attention in the coming years. Natural-language prose is, at present, the only method I know of for specifying "what something means" in a way that is intuitive to human readers. Until our colleagues in artificial intelligence make more progress, however, prose specifications cannot be processed automatically in useful ways.

IV.2.2 Synonymy

Second, we can define synonymic relationships, which specify that if one synonym is substituted for another, the meaning of the element, whatever that meaning is, remains unchanged. If we didn't know in advance what <blort> and <farble type=green> meant, we probably still don't know after being told they are synonyms. But knowing we can substitute one for the other while retaining the meaning unchanged is nevertheless comforting.

IV.2.3 Class Relationships

Third, we can define class relationships, with inheritance of class properties. This doesn't tell us everything we might need to know, but if we know that a <blort> is a kind of glossary list, or a kind of marginal note, we have some useful information, which among other things would allow us to specify fall-back processing rules for applications which haven't heard of <blort>s but do know how to process marginal notes. The fact that HyTime found it useful to invent the notion of architectural form, and the fact that the TEI has found it useful to invent a simple class system for inheritance of attribute definitions and content-model properties, both suggest that a class-based inheritance mechanism is an important topic for further work.
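The fall-back idea can be sketched directly (the tag hierarchy and handlers here are invented for illustration, not taken from HyTime or the TEI): an application that has never heard of <blort> can still process it, by walking up a declared class tree until it reaches an element class it does know.

```python
# Hypothetical sketch of class-based fall-back processing: each element
# type names its parent class; an application handles an element via the
# most specific class it knows about.

PARENT = {
    "blort":         "marginal.note",
    "marginal.note": "note",
    "goings":        "section",
    "TalkOfTheTown": "section",
}

HANDLERS = {
    "note":    lambda text: f"[note: {text}]",
    "section": lambda text: f"== {text} ==",
}

def process(tag, text):
    t = tag
    while t is not None:
        if t in HANDLERS:
            return HANDLERS[t](text)     # most specific known class wins
        t = PARENT.get(t)                # otherwise fall back to the parent
    raise KeyError(f"no handler for {tag}")

# <blort> falls back to the generic note handler:
assert process("blort", "nota bene") == "[note: nota bene]"
# both specialized section tags share one inherited handler:
assert process("goings", "Goings On") == "== Goings On =="
```

The same mechanism answers the New Yorker question from Section III.1: tag with <goings>, declare it a kind of <section>, and generic applications lose nothing.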
IV.2.4 License or Forbid Operations

Fourth, we can define rules that license or forbid particular relations upon particular objects or types of objects. We may not know what a <blort> is, but we can know that it stands in relation X to the element <granfalloon>, and we can know that no <blort> can ever stand in relation Y to any element of type <vuggy>.

In addition to relations, we can specify what operations can be applied to something: knowing that INTEGER objects can be added while DATE objects cannot, especially if one of the DATE objects is "in the reign of Nero", is part of what we mean when we say we understand integers and dates. An ability to define legal operations for SGML objects is a key requirement for using SGML in data modeling. The definition of a data type involves both the specification of the domain of values it can take on and the specification of the operations which can apply to it. Because SGML has no procedural vocabulary, it is very difficult to imagine how to specify, in SGML, the operations applicable to a data type. It would be useful to explore some methods of formal specification for legal operations upon SGML objects.

But note that "what it can do" and "what can be done to it" are not, really, specifications of "what it means". Moreover, object-oriented specifications cannot be exhaustive. In an application program, if an operation P is not defined for objects of type Q, that counts as a claim that operation P is illegal for such objects. Even if it's not illegal, you aren't going to get anywhere by trying to call it, so it might as well be illegal. In SGML, with our commitment to application independence, that isn't the case. If no definition of addition for DATE objects is provided, that could mean that it is semantically invalid: dates can never be added. Or it could mean that we just haven't got around to it yet, or haven't thought about it yet.
So the absence of a method for performing an operation doesn't tell us whether the operation is or should be legal upon a particular type of object. Obviously, instead of leaving operations undefined, we could specify explicitly that certain operations are illegal for objects of a certain class. But it is not feasible to make a list of all the things that cannot be done to DATES, or BLORTS, or GRANFALLOONS, because the list is likely to be infinite. Nevertheless, as a way of approaching the formal description of applications, object-oriented work is very promising. It's fairly obvious that in the future we need to work together with the people developing the object-oriented programming paradigm.

IV.2.5 Axiomatic Semantics

Fifth, we can specify in some logical notation what claims about the universe of our document we can make, given that it is marked up in a certain way, and we can define what inferences can be made from those claims. The synonymic relations I was talking about a moment ago are just a special case of this. Formal logic (i.e. first-order predicate calculus) certainly makes possible the kinds of inference I've been talking about, but even predicate calculus makes some concessions to the difficulty of the problem. I can infer that this value for this attribute and that value for the other one are consistent, inconsistent, etc. But since Frege and Russell and Whitehead, logic has treated itself as a purely formal game divorced from meaning; the only relation to the real world is by way of models, which involve assigning meanings to the entities of the logical system and seeing which sentences of the logical system are true under these interpretations. The problem is that "assign a meaning in the real world to an entity or operation of the logical system" is taken as a primitive operation and thus effectively undefined. We all know how to do this, right? We can't define semantics, but we know it when we see it.
In work on declarative semantics, we can learn a lot from recent experience with logic constraint programming and declarative programming. The declarative approach to SGML semantics has a certain appeal, both because it fits so well with the perceived declarative nature of SGML as it is, and because declarative information is useful. As John McCarthy said in his Turing Award lecture, "The advantage of declarative information is one of generality. The fact that when two objects collide they make a noise can be used in a particular situation to make a noise, to avoid making a noise, to explain a noise, or to explain the absence of noise. (I guess those cars didn't collide, because while I heard the squeal of brakes, I didn't hear a crash.)"

One worry about declarative semantics is that it might prove difficult to define processing procedures in a declarative way. But in fact it is possible to specify procedures declaratively, as Prolog, logic constraint languages, and the specification language Z show us. So I think a formal, axiomatic approach of some kind is very promising. But let's be real: it is very unlikely that from a description of the tag set in first-order predicate calculus you or I, let alone the authors we are working with, will understand what a <blort> is, or even what a <vug> is.

IV.2.6 Reduction Semantics

Finally, I should mention one further method of formal semantic specification: reduction semantics. Reduction works the way high-school algebra works. One expression (e.g. "(1 + 2) + 5") is semantically equivalent to that expression ("3 + 5"), that one to this other one ("8"), and so on. If you work consistently toward simpler expressions, you can solve for the value of X. There has been substantial work done on reduction semantics in programming languages, including LISP and more purely functional languages like ML.
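The algebra analogy can be made executable. This is a generic constant-folding sketch, invented for illustration and not anything defined by a standard: each step rewrites an expression to a simpler one with the same value, exactly the "(1 + 2) + 5" to "3 + 5" to "8" chain described above.

```python
# Hypothetical sketch of reduction semantics: an expression is either an
# int or a ("+", left, right) node; reduce_once() performs one rewrite
# step, and evaluate() applies steps until only a value remains.

def reduce_once(expr):
    if isinstance(expr, int):
        return expr                              # already fully reduced
    op, left, right = expr
    if isinstance(left, int) and isinstance(right, int):
        return left + right                      # the actual rewrite rule
    return (op, reduce_once(left), reduce_once(right))

def evaluate(expr):
    while not isinstance(expr, int):
        expr = reduce_once(expr)
    return expr

e = ("+", ("+", 1, 2), 5)                        # "(1 + 2) + 5"
assert reduce_once(e) == ("+", 3, 5)             # -> "3 + 5"
assert evaluate(e) == 8                          # -> "8"
```

Nothing here depends on the expressions being strings; the same machinery works on document trees, which is the move the next passage makes.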
Moreover, reduction semantics doesn't have to be defined in terms of string expressions: it is entirely possible to define reduction semantics in terms of trees and operations upon trees. Take a simple example: if we have an element <A> whose content model is "B+", does the order of <B>s matter? In SGML there is no way of saying yes or no. Reduction semantics allows you to say that this tree (gesture)

<a><b>Apples ... </b><b>Oranges ... </b></a>

is the same as that tree (gesture)

<a><b>Oranges ... </b><b>Apples ... </b></a>

so sequence is not important. Or that they are not the same, so sequence is significant. We have a good example of this type of work in the paper "Mind Your Grammar" and the grammar-based database work at the University of Waterloo by Frank Tompa and Gaston Gonnet.(5) I think this is a very important field for further work.

In summary: we have at least six areas to explore in trying to work on better semantic specification for SGML: structured documentation (the kind of thing SGML itself is good at), synonymy, classes, operation definitions, axiomatic semantics, and reduction semantics. I don't know whether these activities would constitute the specification of a semantics for SGML and for our applications, or only a substitute for such a specification, in the face of the fact that we don't really know how to say what things mean. Certainly no lexicographer, no historical linguist, would feel they constituted an adequate account of the meaning of anything. And yet I suspect that these activities all represent promising fields of activity.

IV.3 Validation and Integrity Checking

A formal model would make it possible to formulate cleanly many of the kinds of constraints not presently expressible in SGML. This is by no means an exhaustive or even a systematic list, but at least all the problems are real:

* If an attribute SCREEN-TYPE has the value BLACK-AND-WHITE, the attribute COLOR-METHOD almost certainly should have the value DOES-NOT-APPLY.
But this kind of constraint on sets of attribute values is impossible to specify for SGML attributes. It would certainly be useful sometimes to be able to define co-occurrence constraints between attribute values.

* Similarly, there are cases where one would like to constrain element content in a way I don't know how to do with content models. We have heard repeatedly in this conference about revision and version control systems which allow multiple versions of a document to be encoded in a single SGML document. For example, one might have a <revision> element which contains a series of <version> elements. The TEI defines just such a tag pair. At the moment our <version> element can contain only character and phrase elements. It would be nice to allow it to operate as well upon the kind of SGML-element-based deltas that Diane Kennedy described the other day for revision info, in which the unit of a revision was always an SGML element. If a change is made within a paragraph, the entire paragraph is treated as having been changed, and versioning consists in choosing the right copy of the paragraph.(6) But one would like to be able to specify that if the first <version> element contains a <p> element, the second had better contain one as well, and not a whole new subsection or just a phrase. Otherwise, the SGML document produced as output from a version-selection processor would not be parsable.

* It would be nice to be able to require that an element be a valid Gregorian date, or a valid ISO date, or a valid part number, etc., etc.

* It would be nice to be able to require character data to appear within a required element: i.e. to have a variant on PCDATA whose meaning is "character+" and not "character*" -- or even to require a minimum length, as for social security numbers, phone numbers, or zip codes.

* The SGML IDREF is frequently used as a generic pointer.
Many people wish they could do in SGML what we can do in programming languages, and require a given pointer to point at a particular type of object. (The pointer in a <figref> had better be pointing at a figure, or the formatter is going to be very unhappy.)

* Similarly, it would be nice to have a type system that understood classes and subclasses. The only reason we face this nasty choice between using the tag <goings> and using the tag <section> for the New Yorker's "Goings On About Town" section is that we have no way to make a processor understand that <goings> and <TalkOfTheTown> and so on are just specialized versions of <section> or <article>. If we use the specialized tags, and want to specify an operation upon all sections of all magazines, we must make an exhaustive list of all the element types which are specializations of <section>. To be sure, our application systems can handle this. But we want to constrain early and often. And never constrain in your application what you could constrain in the DTD.

Section V CONCLUSION: WHY BOTHER?

I suppose you can sum up my entire talk today this way. We want to constrain our data early and often. To do this, we need better validation methods. To express the validation we need, we need a clean formal model and a vocabulary for expressing it. The query languages described yesterday are not the final word, but they are a crucial first step.

Why do we want to do all these things? Why bother with formal specification? Because formal specification and formal validation are SGML's great strengths. Why is it, as Charles Goldfarb said on Monday, that SGML allows us to define better solutions than the ad hoc solutions built around a specific technology? It is because SGML provides a logical view of problems, not an ad hoc view based on a specific technology. Naturally, it seems to suit the technology less well than the ad hoc approach.
But when the underlying technology changes, ad hoc solutions begin to fit less well, and look less like ad hoc solutions, and more like odd hack solutions.(7) But we can improve SGML's ability to specify the logical level of our data and our applications. And so we should. A logical view is better than a technology-specific view. And so we should welcome every effort to improve the tools available to us in defining our logical view. In this connection I could mention again the work by Gonnet and Tompa on large textual databases, and the work of Anne Brüggemann-Klein, which is occasionally reported on the Netnews forum comp.text.sgml.

Success in improving our logical view of the data is what will enable the quiet revolution called SGML to succeed. And now I hope you'll join me in thanking Yuri Rubinsky, for organizing this conference and for allowing all of us co-conspirators in the revolution to get together and plot.

-----------------------------------------------------

(1) Charles Goldfarb, "I Have Seen the Future of SGML, and It Is ..." (keynote, SGML '92), 26 October 1992.
(2) Tim Bray, "SGML as Foundation for a Post-Relational Database Model," talk at SGML '92, 28 October 1992.
(3) B. Tommie Usdin (et al.), "One Doc -- Five Ways: A Comparative DTD Session" (panel discussion of five sample DTDs for the New Yorker magazine), SGML '92, 27 October 1992.
(4) There was; metamorphic rock can be vuggy too, so the initial definition was too narrow. - MSM
(5) Gaston H. Gonnet and Frank Wm. Tompa, "Mind Your Grammar: A New Approach to Modelling Text," in Proceedings of the 13th Very Large Data Base Conference, Brighton, 1987.
(6) Diane Kennedy, "The Air Transport Association / Aerospace Industries Association, Rev 100," talk at SGML '92, 28 October 1992.
(7) I owe this pun to John Schulien of UIC.