The following report was obtained from the Exeter SGML Project FTP server as Report No. 9, in UNIX "tar" and "compress" (.Z) format. It is unchanged here except for the conversion of SGML markup characters into entity references, in support of HTML. The document contains two appendices: (1) Yuri Rubinsky's "Year in Review", and (2) Michael Sperberg-McQueen's closing address.
THE SGML PROJECT
SGML/R16
CONFERENCE REPORT
SGML '92, DANVERS, MA, USA, OCTOBER 25TH-29TH 1992
Issued by Michael G Popham, 2nd December 1992
-----------------------------------------------------------------
NOTE: Thanks to Yuri Rubinsky and Michael Sperberg-McQueen for posting the texts of their speeches to comp.text.sgml; these have been reproduced in the Appendices to this report. Initially, Yuri and Michael's postings were reproduced without their permission -- so as well as a huge debt, I also owe them my profound apologies. Copyright and permission to re-use these texts remain with Yuri and Michael.

BACKGROUND

This was the tenth in a series of annual SGML conferences organized by the Graphic Communications Association (GCA). It was the best-attended conference to date, with 270 attendees drawn from a wide range of backgrounds.

SESSIONS ATTENDED

1. SGML: The Year in Review -- Yuri Rubinsky (SoftQuad, Canada) (Full text of this presentation is given in Appendix I attached)
2. I Have Seen the Future of SGML and It Is ... -- Dr Charles Goldfarb (IBM, USA)
3. Lessons Learned from the Text Encoding Initiative -- Susan Hockey (CETH, USA)
4. The Marks that Monks Make: Tagging Irish Manuscripts -- Peter Flynn (University College, Cork, Ireland)
5. Using SGML in Non-SGML Environments -- Ludo van Vooren and Eric Severson (Avalanche Development Company, USA)
6. SGML and Braille -- George Kerscher (Recording for the Blind, USA), Yuri Rubinsky (SoftQuad, Canada)
7. Standards Activity and News Briefing -- Various Speakers
8. The Novice's Guide to HyTime -- Lloyd Rutledge (University of Massachusetts, USA)
9. One Doc - Five Ways: Comparative DTD Session -- Various Speakers
10. The OSF DTD Recommendations: Lessons We Learned -- Jeanne El Andaloussi (Bull SA), Eve Maler (Digital Equipment Corporation)
11. Guidelines for Document Analysis Reports and a Tool for Maintaining Many, Varied DTDs -- Dennis O'Connor (Bureau of National Affairs Inc, USA)
12. Sharing the Lessons of the CALS SGML Activity -- Beth Micksh, Robin Tomlin (Intergraph, USA)
13. SGML: Extending and Confirming Object-Based Software -- Don Davis (Interleaf, USA)
14. Poster Session -- Various Speakers
15. International SGML Users' Group Meeting
16. The Society of Automotive Engineers J2008 Task Force -- Jim Harvey (Volt, USA)
17. The Air Transport Association/Aerospace Industries Association, Rev 100 -- Diane Kennedy (Datalogics, USA)
18. The Davenport Group for On-line Documentation -- Various Speakers
19. Implementing a HyTime System in a Research Environment -- Lloyd Rutledge (University of Massachusetts, USA)
20. Poster Session -- Various Speakers
21. The Use of SGML at the Boston Computer Society -- Sam Hunting and Irina Golfman
22. Document Management in Production Publishing Environments -- William Trippe (Xyvision, USA)
23. An SGML Pilot Project: The OSTI Reports Received List -- Norman Smith (Science Applications International Corporation, USA)
24. Frame-Based SGML -- Len Bullard (Paramax, USA)
25. SGML as Foundation for a Post-Relational Database Model -- Tim Bray (Open Text Corporation, Canada)
26. The SGML View of a Database -- Bob Barlow, Fritz Eberle (Agfa CAPS)
27. SGML Queries -- Paula Angerstein (Texcel, USA)
  27.1 Comparative Implementation of the SGML/Search Query Language -- Francois Chahuneau (AIS/Berger-Levrault, France)
  27.2 Structured Queries and Location Models: DSSSL -- Paul Grosso (ArborText, USA)
  27.3 Structured Queries and Location Models: HyQ -- Steve DeRose (Electronic Book Technologies, USA)
  27.4 Structured Queries and Location Models: SFQL -- Neil Shapiro (Scilab Inc, USA)
28. Structured Queries and Location Models, Part II -- Various Speakers
29. Transforming Airworthiness Directives from Paper to CD-ROM -- Hally Ahearn (Oster & Associates Inc, USA)
30. The Development of a Technical Support Database for On-Line Access and for Publication on CD-ROM -- Elizabeth Jackson (K.E.B. Jackson Consulting)
31. The Making of Microsoft Cinemania: SGML in a Multimedia Environment -- John McFadden (Exoterica Corporation, Canada)
32. Process of Converting from Paper Documentation to SGML-Based CD-ROM -- Ken Kershner (Silicon Graphics, USA)
33. Converting 180+ Million Pieces of Paper -- Eric Freese (Mead Data Central, USA)
34. Poster Session: Conversion -- Various Speakers
35. Back to the Frontiers and Edges -- Michael Sperberg-McQueen (University of Chicago, USA) (Full text of this presentation is given in Appendix II attached)

1. SGML: The Year in Review -- Yuri Rubinsky (SoftQuad, Canada)

Yuri Rubinsky (YR) offered his now traditional rapid run-through of the past year's main SGML events and activities. The full text of his presentation will be published in both <TAG> and The International SGML Users' Group Newsletter (it also appears in Appendix I of this report). However, some of the highlights in the areas identified by YR include:

* Standards activity -- HyTime, the Conformance Testing Initiative, the SGML review, and work on query languages.
* User Groups -- new groups in Australia, the United Kingdom, Washington DC, South Ontario, Seattle, and Colorado.
* Major public initiatives -- European Community projects promoting information access for the blind; the work of the Davenport Group; work on ISO TR 9573 at the ISO; the work of the EWS (European Workgroup on SGML) on a DTD for journal articles.
* Major corporate and government initiatives -- use of SGML by the Canadian Standards Association, the Australian Parliament, the US Navy, Fokker aircraft, Wolters Kluwer, Boeing, US Air, the US Dept. of Energy, and Silicon Graphics and Novell (for delivering hardware and software documentation on CD-ROM).
* Publications -- a new book by Joan Smith; a second print-run of The SGML Handbook; an electronic version of Practical SGML to be produced.
* Vendors -- (in addition to those that would be documented later) YR mentioned WordPerfect's "Markup", Adobe's "Carousel" (whose second release is to have SGML `smarts'), QuarkXPress (which will be able to export SGML), and the future release of TechnoTeacher's "HyMinder" (a HyTime engine).
* Miscellaneous -- the December '92 release of Exoterica Corporation's "Compleat SGML" test suite etc. on CD-ROM; the work of CURIA (the Irish Manuscript Project); a proposed SGML-aware extension to LaTeX3; the release of the ICA's software for building translators.

2. I Have Seen the Future of SGML and It Is ... -- Dr Charles Goldfarb (IBM, USA)

Dr Goldfarb (CG) began by expressing his personal disappointment at the continuing appearance of proprietary "standards" which users are still buying. He urged all attendees to promote the advantages of using SGML (and other non-proprietary standards) as widely as possible. He then began his talk proper.

The world of the isolated single document is dead. Now that there are hypertext and multimedia, the future lies in documents that conform to SGML and HyTime (which is both an application of SGML and a conceptual extension to it). CG showed some slides of pages taken from the Winchester Bible, a highly ornate, illustrated twelfth-century manuscript. He proposed that these were, in fact, multimedia documents, showing that readers have been using such techniques to access texts for several centuries. CG also remarked that time dependencies are a central feature of using hypertext and multimedia -- but again argued that readers have been familiar with such concepts for years, in the form of music manuscripts and scores. In the same way that music notation represents the relative duration of notes, HyTime extends the concept of addressing into "finite coordinate addressing", where location and time coordinates are expressed in terms of their position relative to a known address.
This enables HyTime to express almost any kind of relationship, just as SGML can express any ordered structure. A coordinate space can have any number of dimensions, and these, plus the units of measurement, are definable by the HyTime system developer. CG concluded by encouraging all new and prospective users of HyTime (and SGML) with the thought that we are all experts in the use of hypertext and multimedia already!

3. Lessons Learned from the Text Encoding Initiative -- Susan Hockey (CETH, USA)

Susan Hockey (SH) summarized the organizational structure and work of the TEI. I refer readers who are not familiar with the TEI to the account of Lou Burnard's paper at SGML '91, given in the conference report produced by The SGML Project (SGML/R9). The latest TEI news from SH is summarized below:

* TEI P2 is due by July '93, with a "final report" (P3) due soon after.
* The TEI is actively seeking additional funding to continue its work.
* Users need to be educated to realize that they will not need to adopt everything in the TEI's "Guidelines" in order to use them.
* Libraries are taking an increasing interest in handling electronic texts. Much of their attention focuses on work to create a standard TEI document header (and the software to process such headers automatically).
* The TEI is also seeking to establish guidelines for testing TEI conformance, and to determine how best to maintain and develop the "Guidelines".

4. The Marks that Monks Make: Tagging Irish Manuscripts -- Peter Flynn (University College, Cork, Ireland)

Peter Flynn (PF) described the activities of the CURIA project, which is funded for ten years to make machine-readable copies of Irish manuscript texts from the sixth century to the sixteenth century. Most of these texts are illuminated, and written in Irish, Latin, or both.
PF said they anticipated multiple uses for the electronic text archives, and that it was less important to record formatting information because the electronic texts would not be used to recreate printed or displayed copies of the original manuscripts. However, they had to provide electronic texts which were acceptable to the increasing number of scholars who wished to analyse them. CURIA's aim is to make the archive widely accessible -- via anonymous ftp, telnet, WWW browsers (also Gopher and WAIS), on CD-ROM, and via interactive Bitnet messages.

PF showed how the electronic texts were derived: rather than using an original manuscript, they scan a book version (usually nineteenth century) of the text -- which takes about two minutes per page and yields an ASCII file with 99% accuracy. The file is hand-tagged using "Author/Editor" (with Latin tag names, to suit the lingua franca of mediaeval scholars) to produce a TEI-conformant SGML file which can also be printed or displayed with user-defined options.

5. Using SGML in Non-SGML Environments -- Ludo van Vooren and Eric Severson (Avalanche Development Company, USA)

Van Vooren and Severson (V&S) gave a highly entertaining dramatised dialogue to present a discussion of how best to implement SGML in a real-world, non-SGML environment. They suggested that it would be unacceptable to try to impose the use of SGML and/or structured editors in many traditional working environments. V&S urged systems implementors to make SGML appear as simple and friendly as possible through the adoption of minimally rigid DTDs, and of "verifiers" which can be incorporated into familiar word-processing environments, so that users can be guided to produce acceptably structured documents before the files are converted into SGML.
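For illustration only (this is my sketch, not an example from the talk), a "minimally rigid" DTD of the kind V&S describe might leave almost every ordering and occurrence decision open, so that documents coming out of a word processor are easy to make valid:

```sgml
<!-- Hypothetical, deliberately loose document model: body elements
     may appear in any order and any number of times, and most
     end-tags may be omitted by the author. -->
<!ELEMENT report  - - (title?, (para | list | figure)*) >
<!ELEMENT title   - O (#PCDATA)  >
<!ELEMENT para    - O (#PCDATA)  >
<!ELEMENT list    - - (item+)    >
<!ELEMENT item    - O (#PCDATA)  >
<!ELEMENT figure  - - (#PCDATA)  >
```

A verifier embedded in a word processor need only check documents against a loose model of this sort before the files are converted to SGML proper.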
They argued that to be truly successful and widely adopted, SGML must offer solutions to the problem of managing the "infoglut" -- and that it must always appear simple, even if an implementation is, in fact, highly complicated behind the scenes.

6. SGML and Braille -- George Kerscher (Recording for the Blind, USA), Yuri Rubinsky (SoftQuad, Canada)

George Kerscher (GK) spoke of the Texas state law which has recently been passed to mandate that all school texts must be made available to the blind and other "print disabled" groups. Similar laws have subsequently been passed in eight other states. Under the Texas law, conventional publishers are required to provide braille publishers with the electronic files they use to format their printed textbooks. Braille publishers are able to strip out the formatting codes to produce a plain ASCII file which they can then mark up using SGML. However, as electronic text/manuscript standards, such as the one developed by the AAP, become widely adopted amongst publishers, the braille publishers will need to find a way of mapping from, say, the AAP DTD to one which provides sufficient structural information for their needs. Yuri Rubinsky described the tag set which has been developed for this second target DTD, and discussed some of the problems that the translation process has identified -- ie the need to change tag names, recognise contexts, and so on. This work was being carried out in conjunction with the activities of the International Committee for Accessible Document Design (ICADD).

7. Standards Activity and News Briefing -- Various Speakers

Sharon Adler (Electronic Book Technologies, USA) described the latest developments affecting SGML registration procedures and authority. After several years' delay following the appearance of ISO 9070, it now seems almost certain that the GCA will become the controlling authority for issuing Public Owner Identifiers etc.
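By way of illustration (the owner names below are invented): an owner identifier issued under ISO 9070 appears as the owner field of a formal public identifier, distinguished from an unregistered owner by its prefix:

```sgml
<!-- Hypothetical formal public identifiers.  The "+//" prefix
     marks a registered owner, "-//" an unregistered one. -->
<!DOCTYPE report PUBLIC "+//Example Publisher//DTD Technical Report//EN" >
<!DOCTYPE report PUBLIC "-//Example Publisher//DTD Technical Report//EN" >
```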
Marion Elledge (GCA, USA) described activities in the sphere of conformance testing. Harmonized test suites will be developed by the end of 1992, and in the first quarter of 1993 an SGML conformance testing laboratory will be established at the NCC in Manchester (UK). The laboratory will test parsers for conformance to the core of ISO 8879, test features (such as minimization), and test applications (such as the MIL-28001 DTDs, ATA DTDs, etc.).

Anders Berglund summarized moves towards developing a harmonized SGML math. He characterized the existing differences as follows:

* ISO TR 9573 and the AAP DTD offer meaningful element names, and aim to represent first-year university math (including basic semantics).
* The Euromath DTD offers a layout-oriented approach.
* ISO TR 9573 and the AAP DTD have more in common than they have differences.

He suggested that the work of the AAP math revision subcommittee will be closely based on the latest version of ISO TR 9573, which offers a three-layer approach to math encoding. He argued that this approach offered opportunities for both semantic and layout-oriented markup and was highly user-extensible. Work on the AAP math DTD would continue at the conference.

Sharon Adler (Electronic Book Technologies, USA) gave an update on the status of DSSSL (Document Style Semantics and Specification Language, ISO/IEC DIS 10179). The second DIS is scheduled for ballot during April 1993, and it is hoped that it will be available as a full International Standard within the following six months. A number of issues and features surrounding DSSSL still remain to be resolved, and Adler welcomed any input from users through their national standards bodies.

8. The Novice's Guide to HyTime -- Lloyd Rutledge (University of Massachusetts, USA)

Lloyd Rutledge offered a very detailed introduction to the main concepts of the HyTime standard (ISO 10744). Much of the value of the session came from extensive question and answer periods.
The content of the slides was too extensive to reproduce here, and interested readers are recommended to contact Rutledge directly (his address can be obtained from the GCA or The SGML Project).

9. One Doc - Five Ways: Comparative DTD Session -- Various Speakers

This session was chaired by Tommie Usdin (ATLIS Consulting Group, USA), who began by expressing her belief that DTD development is an art form, not a science. Five experienced DTD designers had been asked to produce a DTD for "The New Yorker" magazine. Each of the five was asked to keep a particular application or purpose in mind when writing their DTD. Usdin remarked that the five DTD designs had thrown up a number of interesting similarities and differences.

Similarities included:

* similar tag names chosen
* the influence of the AAP model
* the layout of the DTDs (they all looked physically similar)
* an orientation towards content modelling (even when the designers had been asked to focus on formatting issues)
* all the authors had identified themselves in comments near the start of the DTD.

Differences included:

* only one DTD expressly accommodates document management
* inclusion/omission of the SGML declaration
* level of detail
* only one DTD design was modularized
* only one contained "code comments"
* several different types of graphics had been identified
* documentation: internal, external, both or none
* great variation in the extent of use of attributes.

Debbie Lapeyre had been asked to produce a design based on the AAP standard. She changed the AAP DTD model by stripping out a number of tags and content models -- and tried to add as few new tags/models as possible. She modularized her DTD to reflect the split between the magazine's editorial and authorial production roles. All the base-level elements were therefore included at the "article" module level (ie the authorship level), but were accessible from the "magazine" (editorial) level module.
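A modular split of this kind is conventionally achieved with parameter entities; the sketch below is my own hypothetical rendering (the file and element names are not Lapeyre's):

```sgml
<!-- magazine.dtd: the editorial-level module pulls in the
     authorship-level module, so base elements declared there
     are accessible from the magazine level. -->
<!ENTITY % article.mod SYSTEM "article.mod" >
%article.mod;  <!-- declares article, title, para, etc. -->

<!ELEMENT magazine - - (masthead, contents, article+) >
<!ELEMENT masthead - O (#PCDATA) >
<!ELEMENT contents - O (#PCDATA) >
```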
Yuri Rubinsky's design illustrated the use of content tags and tagging techniques. He felt that content could only be modelled in consultation with end users of the DTD, and so had "cheated" by actually phoning the staff of "The New Yorker" magazine! Some content was fixed, but some was surprisingly varied -- for example, titles in the table of contents were not always the same as those actually used in the articles themselves. Rubinsky discussed how he had identified separate pieces of content, and justified his design decisions in terms of how they modelled the end users' perception of the magazine content.

Halcyon Ahearn produced a DTD that took advantage of the markup minimization features of SGML. She had scanned pages of "The New Yorker", then used regular expressions to enable an SGML parser to recognise certain character combinations in the file as markup -- ie two successive carriage returns indicated the end of one paragraph and the start of the next. She altered the SGML declaration to permit the use of long, descriptive tag names, but used parameter entities to simplify the content models used in the DTD. She had also taken advantage of SGML's SHORTREF and LINK features to further minimize markup without losing any functionality.

Dennis O'Connor had written a DTD to support print publication. Although he was concerned with formatting issues, he had found it necessary to use a surprising number of content-oriented tags. He used an empty <font> element to signify changes to the current font when printing. Dimensions of figures and graphics were expressed in terms of the number of whole column widths they were intended to span.

David Durand and Steve DeRose had produced a DTD to support hypermedia applications. They stated that their design had been based on the creation of a traditional (non-hypermedia) DTD design, with additional "milestone" tags incorporated to aid navigation and reference within the document.
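A milestone tag of this sort might be declared along the following lines (a hypothetical sketch, not the speakers' actual design):

```sgml
<!-- An empty element dropped in at points of interest; its unique
     ID gives hypertext links a stable target to point at. -->
<!ELEMENT milestone - O EMPTY >
<!ATTLIST milestone id    ID    #REQUIRED
                    unit  CDATA #IMPLIED  >
```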
Some of the hypermedia links would have to be hand-coded into a hypermedia document that conformed to their DTD, whilst other links could be derived from database versions or indexes of the document. The practical issues involved, such as dealing with different graphic types, would take a long time to resolve (and they had not tackled them in their design). Putting IDs on every tag seemed the simplest way to allow hypertext linking/searching/navigation; they suggested that IDs are the basic pointing mechanism within SGML. Their envisaged application would use an "authority list" of document object IDs to control hypermedia linking and retrieval. They remarked that the more a DTD facilitated content tagging, the easier it would be to convert a document conforming to such a DTD into some sort of hypermedia form.

10. The OSF DTD Recommendations: Lessons We Learned -- Jeanne El Andaloussi (Bull SA), Eve Maler (Digital Equipment Corporation)

Andaloussi and Maler (A&M) gave an account of their experiences developing the OSF's DTD recommendations. They had developed a DTD design methodology as part of the project, and went on to describe its benefits -- namely that it was a rigorous process enabling high-quality DTD design with a clear rationale; it also produced very thorough and understandable documentation to accompany the DTD. In order to produce the guidelines, the OSF had asked its members to send in DTDs so that they could be analyzed by a design team. The bulk of A&M's presentation focussed on the development and use of a tree-based diagram notation to reflect a DTD's overall design, element hierarchy and so forth. Making DTD designs explicit in this way had enabled them to compare and better justify DTD design decisions. Each stage of the DTD design process was also thoroughly documented through the use of standardized forms. A&M felt that their design methodology and accompanying tools bore close similarities to current software development practices.
The iterative design process had proved to be very thorough, and encouraged designers to work for and justify the functionality they really wanted. The use of the tree-diagram notation and standard documentation forms had enabled non-SGML experts to contribute effectively to the design process. A&M are hoping to produce a more generally applicable and formalized DTD design methodology. They repeatedly identified the benefits to be gained from agreeing and documenting clearly-defined design axioms and principles throughout the design process. The creation and maintenance of a glossary of key terms played a major part in ensuring that all those involved in the design process used the same terms in the same way -- thereby minimizing any ambiguities and misunderstandings between members of the DTD design and maintenance teams.

11. Guidelines for Document Analysis Reports and a Tool for Maintaining Many, Varied DTDs -- Dennis O'Connor (Bureau of National Affairs Inc, USA)

Dennis O'Connor (DC) described his work to develop guidelines for writing Document Analysis reports -- which provide a means for maintaining many, varied DTDs. The reports facilitate inter-departmental communications, permit control over the growing volume of document types, and enhance the quality of available documentation. DC's guidelines define the structure and content of Document Analysis reports, and are written in plain, non-technical English. The guidelines require a Document Analysis report to include background information -- such as "Who did the document analysis?", "Which documents were looked at?", "When was the analysis done?", and "Why was the analysis done?" The bulk of the guidelines require the author of a Document Analysis report to define the document type -- what a document is made up of (elements, attributes, character sets, special characters and other entities).
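In SGML terms, the constituents such a report must enumerate correspond to markup declarations like the following (an invented illustration, not taken from DC's guidelines):

```sgml
<!-- The things a document is "made up of": an element, its
     attributes, and an entity for a special character. -->
<!ELEMENT section - - (title, para+)     >
<!ATTLIST section  secid  ID   #IMPLIED  >
<!ENTITY  mdash    SDATA  "[mdash ]"     >
```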
The guidelines also require the recording of the history of the Document Analysis report -- when was it written? What changes were made, when, and why? A well-written Document Analysis report provides a means for those who know about documents to communicate with those who know about SGML. The report effectively provides some of the documentation which ought to accompany a DTD.

12. Sharing the Lessons of the CALS SGML Activity -- Beth Micksh, Robin Tomlin (Intergraph, USA)

The CALS (Computer-Aided Acquisition and Logistics Support) initiative, which started in 1985, aims to improve the timeliness of documentation and the quality of weapon systems. Micksh and Tomlin (M&T) felt that users in other industries could learn from the experiences of those involved in CALS -- and perhaps adopt similar terminology or methodology in their own areas. The presentation took the form of a mock newscast, with CALS celebrities being given brief interviews. The areas covered and the interviewees' remarks are summarized below:

* MIL-28001 -- the SGML part of CALS. Work was initially industry/committee driven, but now there is a need for a single central body (such as the Dept of Defense) to take over and unify the administrative functions.
* Output Specifications -- a standard way of exchanging formatting information (an appendix of MIL-28001, currently under revision). Other industries would probably benefit from producing their own output specifications.
* Electronic Review -- a machine-readable way of handling structured comments on electronic texts, ie comments are inserted in a standard way (using SGML tags and attributes) so that they can be easily searched, retrieved etc. Structured comments are quite easy to use, and are vital to any complex review project which requires careful monitoring and control of comments.
* Declaration Subsets -- declarations in addition to the external declaration subset (DTD).
Users can include things in the declaration subset which override parts of the DTD. This effectively makes a DTD modifiable/modularized -- so it would be possible, for example, to take a DTD and add electronic review functionality without affecting the main DTD.

* CALS SGML Registry and Library -- a registration/evaluation process for all the elements etc. going into CALS DTDs, FOSIs etc. The aim of the library is to allow DTD designers/users to log in and download example DTDs, entity sets and so on.

13. SGML: Extending and Confirming Object-Based Software -- Don Davis (Interleaf, USA)

Much of Don Davis' (DD) presentation looked at the particular object-oriented approach to SGML adopted within the Interleaf range of products. Such details are omitted here, and interested readers should contact either DD or their nearest Interleaf agent. DD suggested that it is useful to think of an SGML-encoded data stream as a set of objects, where the element structure provides object "handles", and element-in-context details add more information. Treating SGML-encoded data in this manner enables the use of object-oriented programming languages and techniques to support the mapping of SGML files to/from proprietary, object-based editing/processing systems. DD proposed that an object-oriented approach provides significant benefits when developing and deploying SGML-based applications.

14. Poster Session -- Various Speakers

This poster session offered software vendors and developers the opportunity to demonstrate and discuss their products. The companies present were as follows: Agfa CAPS, ArborText, Avalanche, Data Conversion Laboratory, Datalogics, Exoterica, Frame Technology Corporation, Interleaf, Open Text Corporation, Recording for the Blind, Silicon Graphics, SoftQuad, TMS Incorporated, US Lynx, Xerox, Zandar.

15.
International SGML Users' Group Meeting

Considering the number of people at the conference, this meeting was (surprisingly) poorly attended; only about 25 people came. The meeting was chaired by Steve Downie (Secretary of the International SGML Users' Group). Brief reports were heard from representatives of the following Chapters: SGML Forum of New York, Canadian SGML Users' Group, Mid-Atlantic SGML Users' Group, Mid-West SGML Forum, SGML UK, and SGML France. There was a limited discussion of what the various Chapters did (and should do), how much they charged, etc. Steve Downie reported that the Canadian Chapter will be making a proposal to the International SGML Users' Group to launch an initiative to produce reports on issues of typical concern to SGML systems implementors (ie human, contractual and technical issues). These reports would be distributed through the Users' Group membership to facilitate their work. Yet again, calls were made for the publication of conference papers -- either by the GCA or the Users' Group. A recommendation to this effect will be put to the Users' Group Committee.

16. The Society of Automotive Engineers J2008 Task Force -- Jim Harvey (Volt, USA)

J2008 is a standard for the automotive industry which resulted from the passing of the Clean Air Act. J2008 will shortly appear as a published document. Producing J2008 involved setting up a number of committees to look at: data modelling (ie what data should be brought together?), DTD development, Administration, Orientation, Communication and Graphics (TIFF, CGM). Bringing the DTD and Data Model together had revealed some problems; however, two manufacturers are now testing the DTD, and their comments will be taken into account when it is revised for publication. Manufacturers aim to use a database of J2008-conforming documents for publication; third-party information providers will be able to take J2008-conforming data and re-use the information to supply it to independent service providers.
J2008 makes no recommendations about the hardware or software anyone should use, but all adherents to the standard will want to re-use their data as much as possible. A draft version of the DTD will soon be publicly available for comment.

17. The Air Transport Association/Aerospace Industries Association, Rev 100 -- Diane Kennedy (Datalogics, USA)

ATA 100 is a written specification (cf CALS), first issued in 1956 and currently undergoing its thirty-first revision. In the 1988 revision of ATA 100, it was decided that SGML should be the standard for the interchange of electronic data (with graphics conforming to CCITT 4 and CGM). Revision 31 of ATA 100 (approved two weeks ago) includes six DTDs: for Aircraft Maintenance Manuals, Aircraft Illustrated Parts Catalogues, Engine Shop Manuals, Engine Illustrated Parts Catalogues, Service Bulletins, and Master Minimum Equipment Lists. ATA's first attempts to develop a DTD had relied on a single small group, which had proved to be very slow. Consequently, a set of controls was specified to speed up DTD development -- including the setting up of a strong committee structure to handle DTD development and approval, and the production of a DTD requirements document and an industry glossary. All six current DTDs have been harmonized for Revision 31. The DTD working groups have a majority membership of subject-matter experts (with little or no SGML experience), with the bulk of the SGML work being done by SGML experts working in conjunction with the groups. SGML DTD modelling is done using structure charts. Conformance to the ATA DTD recommendations is voluntary -- so four levels of DTD have been introduced to allow for options based on manufacturers' needs:
* Level 1 DTDs (DTD is precisely defined; no options; all tags in glossary)
* Level 2 DTDs (DTD is precisely defined; options allowed; all tags in glossary)
* Level 3 DTDs (DTD defined by document producer following the ATA framework; all tags in glossary; used where there is minimal agreement between a group of manufacturers)
* Level 4 DTDs (DTD defined by document producer; all tags in glossary; used for unique documents -- ie where one manufacturer produces a unique product)

All document revisions are strictly controlled and recorded. They have adopted the CALS approach to tables. In 1993, the number of DTDs within the ATA is expected to more than double. Many airlines are automating to take advantage of such new publication/information standards.

18. The Davenport Group for On-line Documentation -- Various Speakers

Fred Dalrymple (FD) summarized the reasons for the formation of the Davenport Group. Manufacturers, vendors and users had all experienced the frustrations of working with proprietary solutions, and wanted to take advantage of emerging standards such as SGML and HyTime. Standardizing information would enable generalized, shareable help systems, on-line documentation, virtual libraries, information webs etc. The Davenport Group had established four working groups, dealing with:

* architectural forms (producing the DASH and SOFABED standards)
* the Committee for the Common Man
* an SGML query language
* SGML resources (cf. the CALS Registry and Library)

In January 1992 a Davenport workshop on architectural forms was held, in the hope that this would enable the unification of several manufacturers' DTDs. At that time, not many people knew about HyTime, which was still a Draft International Standard.
General aims included:

* unifying DTD linking specifications (via HyTime's Architectural Forms)
* enabling equal access to diverse technical documents
* separating link descriptions from access methods
* enabling the assembly of several documents into compound documents
* enabling document publication in a variety of media from a single source document

So far, they have produced the Davenport Advisory Standard for Hypermedia (DASH) document -- which includes specifications relating to producing indexes, glossaries, bibliographies, tables of contents and cross references. Participants have also gained a working knowledge of HyTime. The workgroup responsible for producing the DASH has been very productive, and once they have published a final version of the document (due soon), the group will be disbanded.

Lar Kaufman spoke about the work of the Committee for the Common Man (where "man" alludes to the Unix command to request on-line documentation, and the set of macros used to produce such documentation). Originally a separate endeavour, the CftCM had soon decided to work within the Davenport Group. The CftCM had emerged following a discussion on the USENET newsgroup comp.text.sgml, and had worked largely through email correspondence between expert volunteers. To their surprise, they had found "man" pages to be less consistent, structured and uniform than they had first imagined. The results of their work will be posted to comp.text.sgml as a "White Paper", with a request for comments. It should be noted that the CftCM envisage completely portable "man" pages -- ie across all systems (not just UNIX). Although they have defined a tag set, they have not yet agreed a hierarchical structure for the tags -- they also still need to consider how to convert the `legacy data' of existing "man" pages, and how best to take advantage of the existing tools for viewing/manipulating "man" pages.

19.
Implementing a HyTime System in a Research Environment -- Lloyd Rutledge (University of Massachusetts, USA)

It is very difficult to report the content of Lloyd Rutledge's (LR) presentation -- since it relied very heavily on complex schematic diagrams showing a number of models. LR showed a model for his HyTime Hypermedia Presentation System, and also one for the data layers in the system. Much of his presentation focussed on a diagram showing a model of his Hyperdocument Processing System being developed at the University of Massachusetts. Built around a shared database, the model included a conventional SGML parser, through which all SGML/HyTime input to the system had to pass. The results of passing files through the parser are stored in the database. Next, a HyTime Engine takes in the output of the SGML parser and checks the HyTime-specific markup; it can query the SGML document stored in the database, if necessary. The output of the HyTime Engine is also stored in the database. The output from the HyTime Engine is also passed to a HyTime DTD (HDTD) processor (which can also query the HyTime Engine's output to the DTD). The HDTD processor outputs to the hypermedia presentation system (ie the application), and takes queries from it, which the HDTD processor in turn uses to query the database. In the case of large documents and/or interactive processing, some outputs of the HDTD processor may have to be passed back to the SGML parser -- eg if the processor needs to open a particular file (which will have to go through the SGML parser and HyTime Engine as usual, before it can be used by the HDTD processor).

20. Poster Session -- Various Speakers

This poster session looked at the problems of handling tables within SGML. There were ten presenters, covering many of the concerns about tables. Since poster sessions thrive upon ad lib discussions, I have not reported them here.

21.
The Use of SGML at the Boston Computer Society -- Sam Hunting and Irina Golfman

Sam Hunting (SH) gave some of the background to the Boston Computer Society -- which is the world's largest user group (c.25,000 members), is organized by volunteers, and publishes around 25 newsletters, maintains bulletin boards and help lines (each of which represents document content). SH identified the typical problems of maintaining information in conventional print or electronic form. Document content cannot be easily retrieved, repackaged, or delivered in a number of different media from a single source. Presentation of such information is enforceable only by editorial staff working to strict guidelines, and conventional page makeup is difficult to automate. SGML allows information content to be retrieved, repackaged and so forth; it can also permit the enforcement of house-style structures (thereby permitting automated page makeup). The Boston Computer Society has had the same difficulties implementing SGML as many other organizations have experienced -- resistance from staff, lack of SGML expertise, no easy-to-use, attractive, interactive SGML applications, no "shrink-wrapped" tools to process SGML etc.

Irina Golfman (IG) described how they had tried to overcome the problems of implementing SGML. Volunteers had had to be educated so that they could develop some enthusiasm for SGML. Vendors of SGML-aware products had been invited to give presentations. The Hypertext subgroup sponsors the "SGML Panel", to discuss the practical implementation issues of using SGML. A growing number of BCS print publications would be produced from SGML source files (at present four newsletters are done in this way). Several other projects and activities were also envisaged.

22. Document Management in Production Publishing Environments -- William Trippe (Xyvision, USA)

Xyvision sells UNIX-based production publishing systems.
William Trippe (WT) reported that they are encountering a growing interest in the use of SGML within traditional publishing organizations. Xyvision's experiences have shown that there are great production benefits to be gained from using SGML -- and that one of the keys to successful SGML system implementation and acceptance is satisfying the concerns of editorial personnel. However, WT remarked that it is important to remember that large SGML applications require lots of computing "horsepower" (eg large CPU, memory, disk storage, fast backup devices etc). In order to ensure its success, WT recommended that SGML needs:

* more of an infrastructure
* standard data modelling techniques
* standard training and documentation tools
* a more helpful and obvious vocabulary
* more helpful tools

23. An SGML Pilot Project: The OSTI Reports Received List -- Norman Smith (Science Applications International Corporation, USA)

The Office of Scientific and Technical Information (OSTI) maintains a large energy science and technical database (2 GB of data, with a 14 month maintenance window). The database can be used to generate bibliographies, reports etc. The Reports Received List is a paper document produced to accompany the microfiches of all the reports received at OSTI each week. Norman Smith's (NS) pilot project aimed to produce the Reports Received List as an SGML document. Since it was already highly structured, it was quite straightforward to alter their existing structured production process to output an SGML document. They also wrote an electronic document viewer (as a stand-alone application) -- which adds value to the SGML document and offers potential benefits for other SGML applications that might be developed at OSTI.
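For illustration, an entry in such a highly structured Reports Received List might be marked up along the following lines (a sketch only -- the element names and content here are invented for this report, and are not OSTI's actual DTD):

```sgml
<!-- Hypothetical sketch: invented element names, not OSTI's DTD -->
<report>
<repno>XX-00000</repno>
<title>Sample Report Title</title>
<corpauth>Issuing Laboratory</corpauth>
<recdate>1992-10</recdate>
<avail>Microfiche</avail>
</report>
```

Because the production process was already structured field-by-field in this way, emitting such tags was a small change to the existing system.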
NS summarized the lessons that they learned from their experiences during the pilot project:

* document analysts and application programmers must work closely together
* treat DTD development as a software development project
* SGML applications are easier to develop when approached from a database-oriented perspective
* use selective parsing
* add value to distributed documents (by producing tools etc) for non-SGML users

NS also offered the following hints for anyone about to embark on an SGML pilot project:

* pick something do-able but not trivial
* keep it simple and non-critical (if it succeeds, good; if it fails, it can be thrown away)
* approach it from a database perspective
* involve the whole organization
* work within the h/w and s/w framework provided
* provide adequate resources for the project
* don't be afraid to try unconventional approaches

24. Frame-Based SGML -- Len Bullard (Paramax, USA)

Len Bullard demonstrated the Interactive Authoring and Display System (IADS) that Paramax had developed to create and deliver frame-based SGML. The demonstration content itself discussed the nature of frame-based SGML. IADS was running under MS-Windows, taking raw SGML frames and displaying them interactively. A "frame" is an addressable SGML node in a map of SGML nodes. The map of frames is a flat-file web of frame nodes connected by links. Len Bullard showed the element declarations for the frames, buttons, hotspots etc that are hard-coded into the SGML file(s) which underlie IADS.

25. SGML as Foundation for a Post-Relational Database Model -- Tim Bray (Open Text Corporation, Canada)

Tim Bray (TB) discussed how current attitudes to text processing limited the usefulness of the entire process.
For example, current opinion tends to reflect such statements as:

* files belong to applications
* good printout = good application
* data sharing = data conversion
* no ad hoc access to information
* intolerable application backlog

He proposed that the field of MIS had been in a comparable mess prior to the adoption of a number of techniques -- which could also be usefully employed in the text processing arena. These techniques included:

* data centering via database management
* data modelling for system and language
* using a database access language
* indexing for performance
* using 4GLs and GUIs
* using administration features (eg concurrency control, a transaction model, an audit trail etc)

However, unlike MIS, TB argued that the field of text processing could not adopt a relational approach to implementation (cf the success of Relational Databases within MIS). TB argued that text processing could not follow a relational model because:

* text is not tabular and not normalized
* in text, nothing above the character level is atomic -- so entity-relationship modelling is hard
* neither the relational algebra nor the relational calculus is an effective Data Access Language for text
* it is possible to decompose a DTD into a relational schema, but it is a bad idea (except in cases where documents are of an identical, tabular structure, eg insurance claim forms)

With regard to the techniques mentioned earlier, TB proposed that SGML represents the data modelling system and language. A database access language is under development (eg DSSSL). We are already able to index for performance, use 4GLs and GUIs, and make use of database administration features. All that is now required is an SGML-based, post-Relational implementation.

26.
The SGML View of a Database -- Bob Barlow, Fritz Eberle (AGFA, CAPS)

Bob Barlow (BB) began by stating that when people are considering implementing an SGML database, they are confronted by a number of questions: "To what level of granularity should I store my data?", "How do I identify the stored objects?", "How do I accomplish revision control?" Granularity issues will affect the pieces of data that are available for use/re-use, location in the database etc. Identifying objects involves considering whether there is a relationship between SGML element IDs and database IDs, how to link to SGML objects, what information is stored in the database, and what is stored in SGML attributes. Revision control could be carried out within the SGML document or within the database environment.

Fritz Eberle offered some general approaches for resolving the sorts of questions raised by BB. For example, when considering approaches to SGML document management, it is possible to exploit structure at the element, entity or content level -- or a combination of any of these. One of the interesting cases raised in Fritz Eberle's presentation was an example where two hyperdocuments share the same entity; whilst the entity is the same, the links that each uses to retrieve the entity may have different names.

27. SGML Queries -- Paula Angerstein (Texcel, USA)

Paula Angerstein (PA), as Chair of the panel dealing with the afternoon's topic of SGML and Databases, took this opportunity to say something about SGML queries. Essentially, a query is a question about what is in an SGML document or documents. As far as users are concerned, they want access to all the information in the document which will resolve their query.
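To make the notion concrete, a structured query typically uses both markup structure and text content -- for instance, "find the titles of all sections whose author attribute is 'jsmith'" against a document such as this (a hypothetical fragment; the element names are invented for illustration and were not part of the panel's examples):

```sgml
<!-- Hypothetical fragment: invented element names, for illustration only -->
<report>
<section author="jsmith">
<title>Parsing Strategies</title>
<para>Each section carries an author attribute.</para>
</section>
<section author="akumar">
<title>Storage Models</title>
<para>Answering the query requires knowing the element
hierarchy as well as the attribute values.</para>
</section>
</report>
```

A plain full-text search cannot answer such a question; a query language over the parsed structure can.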
However, PA pointed out that developing an SGML query language raises a number of issues, namely:

* whether to query parsed or unparsed data
* whether to query the ESIS and beyond
* applying queries to classes vs instances
* the allowed "root" of a query
* user interfaces to the query language

PA had asked all of the panel speakers (who had developed query languages) to discuss their approaches to resolving a number of test queries.

27.1. Comparative Implementation of the SGML/Search Query Language -- Francois Chahuneau (AIS/Berger-Levrault, France)

SGML/Search is the name of both a query language and a system implementing this language, developed by AIS (running on top of PAT/Lector from Open Text). SGML/Search considers the document (or documents, if all conform to a single DTD) as a database. It uses some of the structural information given in the ESIS, it loads SGML documents and extracts SGML fragments, and it returns SGML fragments in response to queries. SGML/Search can use numerous filtering conditions to refine its queries -- ie element type, element hierarchical position, structural or lexical distance (from a known point), attribute constraints, logical connectives to combine queries, set operations etc. Francois Chahuneau (FC) showed how SGML/Search would handle the sample queries proposed by Paula Angerstein, then went on to discuss some queries which would be "impossible" within the syntax of SGML/Search (though the queries could be resolved in other ways). FC compared SGML/Search queries to DSSSL queries -- pointing out that SGML/Search is oriented towards fragment retrieval rather than processing, and that an SGML/Search query target can only be an SGML element. He claimed that any valid SGML document can be imported into SGML/Search -- giving full access to the SGML structure and text content, as well as high indexing and search performance via PAT's full-text engine.
SGML/Search is an open tool for systems integrators -- offering a forms-based GUI and a complete API accessible to C/C++ programmers.

27.2. Structured Queries and Location Models: DSSSL -- Paul Grosso (ArborText, USA)

Paul Grosso (PG) stated that DSSSL queries are useful for locating objects within a tree that are going to be acted upon. He also reminded everyone that the DSSSL query language is only one component of the whole DSSSL standard. The DSSSL query grammar is based upon several premises, namely that:

* a query works on a tree of objects
* objects have object attributes
* a query uses relationships based on pre-order traversal of the tree
* a query uses information from both structure and "content"
* a query can "start" from the tree root, or any object, or set of objects

PG went rapidly through a series of slides describing the DSSSL query language in more detail, then showed how DSSSL queries would be written to satisfy the test cases supplied by Paula Angerstein. Readers who wish to know more about DSSSL might try contacting PG through the conference organizers. Alternatively, they could contact the committee of their national standards body assigned to consider DSSSL.

27.3. Structured Queries and Location Models: HyQ -- Steve DeRose (Electronic Book Technologies, USA)

HyQ is the portion of HyTime concerned with querying. Querying is a central part of hypertext, and developers require systems to have both authored linking and dynamic querying. Steve DeRose (SD) characterized a query as a mechanism for specifying sets of things -- where a set of things is a "node list" (HyTime-speak for an ordered list of locations in the information world; locations can be spread across any number of documents and can be at any level from characters to the world!) HyQ provides a number of functions for operating on node lists: set operations, list processing, filtering/selection operations etc.
SD did not have the time to cover the features of HyQ in any real detail, and interested readers should try retrieving information from another source (ie see the remarks at the end of the previous section, or contact SGML SIGhyper). SD summarized the design principles that underlie the HyQ query language:

* it must be adept, not just capable, in order to deal with complex trees
* it must handle multilingual data and different SGML syntaxes
* it must provide natural access to all SGML phenomena
* it is not designed for end-users to type in (it will be hidden behind 4GLs and GUIs)
* it has a macro mechanism to make it manageable

SD then talked through the HyQ solutions to the sample queries distributed by Paula Angerstein.

27.4. Structured Queries and Location Models: SFQL -- Neil Shapiro (Scilab Inc, USA)

Neil Shapiro (NS) reported on the ATA's experiences of querying with SQL (the precursor to SFQL). SFQL is being developed in response to the drive to produce Interactive Electronic Technical Manuals (IETMs) -- which consist of both data and applications, requiring quick and intelligent access to the information they contain. Typical IETMs are distributed on CD-ROMs and involve using very large files. Current problems arise from the fact that proprietary access software (requiring proprietary indexes) leads to software dependence. A typical airline will receive several IETMs from different manufacturers, and each will necessitate the use of a different user interface (which is clearly unsatisfactory). The ATA solution is to work towards software independence, dividing the search engine (server) from the user interface (client). This means the server can represent data in a proprietary format, but the user will only see a single user interface.
NS put up the following model diagram:

            User interface (client)
                ^           |
                |           |
      ----------|-----------|----------
                |           |            can standardize this
                |           |            area (the server
                |           |            request language)
      ----------|-----------|----------
                |           v
            Search engine (server)

Possible standards available to the ATA for the server request language are SQL (Structured Query Language) and Leverage SQL. SQL has been extended to give better support for text-based queries; the new version will be called SFQL (Structured Full-text Query Language). SFQL will support such things as:

* fielded searches
* advanced searches (using fuzzy matches)
* retrieval control (relevance ranking, projection)
* extended data types (SGML, CGM, TIFF, CCITT etc)

SFQL is an abstract access model, which conceals all index and storage design differences, whilst the SFQL conceptual model enables views of data without saying how it should be stored. After reminding everyone that users should look for software independence, and not just data portability, NS went on to show how SFQL would resolve Paula Angerstein's sample query.

28. Structured Queries and Location Models, Part II -- Various

The idea of this session was to build on the query language presentations given earlier. Initial comments revealed that DSSSL will only work with SGML documents, whilst HyQ needs to be able to support both SGML and other types of document. HyQ does not have the higher-level constructs of some query languages, but this is intentional and can be circumvented by developing HyQ macros. Unfortunately, the discussion degenerated into little more than a series of accusations and counter-accusations about what the various query languages could do and how efficiently they achieved this. It was mostly impossible to distinguish facts from opinions -- and perhaps the point most clearly demonstrated during the session was that the ISO standards developers held SFQL in low esteem.

29.
Transforming Airworthiness Directives from Paper to CD-ROM -- Hally Ahearn (Oster & Associates Inc., USA)

The first half of Hally Ahearn's (HA) talk was a conventional overview of SGML theory -- looking at such issues as the SGML processing model, the SGML document, SGML as metalanguage, the role of the parser, the process of document analysis, and when to use private or public entities. In the second half of her presentation, HA demonstrated how ASCII output from a word-processing system could be successively marked up and parsed to give ever greater levels of structural encoding. The file could then be edited using SGML-aware structure editing tools, so as to fully prepare it for processing for storage in a database and/or display via a presentation system.

30. The Development of a Technical Support Database for On-Line Access and for Publication on CD-ROM -- Elizabeth Jackson (K.E.B. Jackson Consulting)

Elizabeth Jackson gave a high-level overview of the work she had done towards producing HelpdisQ -- a database of multi-vendor technical support information for PC hardware and software products (published as a CD-ROM in May 1992). Regrettably, this presentation lacked detailed information, which was only available in a handout distributed at the conference.

31. The Making of Microsoft Cinemania: SGML in a Multimedia Environment -- John McFadden (Exoterica Corporation, Canada)

Throughout this presentation, John McFadden (JM) was able to demonstrate the main features and capabilities of Microsoft's "Cinemania" CD-ROM. The disk has only been available for about three weeks and costs $79 -- although Microsoft had spent nearly $5m to produce it. Running under MS Windows, the user interface is a typical mix of windows, buttons and icons to access the Cinemania data. As well as selecting particular films, the user can call up glossary definitions, film-star biographies, academy awards lists, sound clips and video stills.
The Cinemania database contains five traditional publications converted into a single SGML document database. The CD-ROM contains 22Mb of markup and text (the DTD is 64kb) -- and a total of 220Mb of data and software (so the disk is still less than half full!). The Cinemania database also contains 1.6 million identified SGML elements, with a typical element granularity of ten characters per element (cpe), whilst name elements typically have a granularity of 2cpe. The database stores 19,000 movie listings, 300 biographies, 500 articles, 745 reviews, a glossary, 175 still video clips etc. The original paper documents were re-keyed to give an electronic form, which was automatically tagged, marked up, and quality controlled using Exoterica's "OmniMark" product. Exoterica's internal "CheckMark" product was used for any final structural editing, and to handle exceptional cases which had not been tagged automatically. The resulting SGML was stored in an SGML Knowledge Base. Microsoft then took the SGML Knowledge Base and used OmniMark to control linking, flow and formatting (together with some presentation design expertise) to get the SGML data into the Microsoft Multimedia Viewer -- which is the front-end presentation system used to create the "Cinemania" product that users actually see. Looking at some instances of conventional SGML encoding, JM showed how OmniMark could greatly simplify the markup which has to be added to a document. This simplified markup stood in for conventional raw SGML encoding -- which made it much easier for authors with little SGML experience to edit the document. Exoterica worked hard to ensure that the resulting SGML Knowledge Base was completely independent of its intended use (eg it would now be a simple matter to create new electronic/paper books based on selected extractions from the Knowledge Base).
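JM's actual examples were not distributed, but the idea of shorthand markup standing in for raw SGML can be sketched as follows (a hypothetical illustration -- the shorthand syntax, element names and content are all invented, not Exoterica's or Microsoft's):

```sgml
<!-- Hypothetical illustration only.                              -->
<!-- What an author might type (compact shorthand):               -->
*bio Bogart, Humphrey
*born 1899
*film The Maltese Falcon; 1941

<!-- What a conversion program could expand it to (full SGML):    -->
<biography id="b0001">
<name><surname>Bogart</surname><forename>Humphrey</forename></name>
<born>1899</born>
<filmography>
<film><title>The Maltese Falcon</title><year>1941</year></film>
</filmography>
</biography>
```

The author edits only the compact form; the full, validatable SGML is generated mechanically, so the Knowledge Base stays consistent without requiring SGML expertise of every contributor.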
There are no explicit links in the markup; these are generated when the data is translated for the Multimedia Viewer -- which makes updating or amending the data in the Knowledge Base, or even adding new categories, very much easier. Approaching the development in this way had proved to be very fast and economical. They have produced an extremely maintainable and adaptable information resource with guaranteed quality levels. OmniMark had been used to expedite things at almost every major point in the development of the system. JM also reserved high praise for Microsoft's Multimedia Viewer, which only costs $495.

32. The Process of Converting from Paper Documentation to SGML-Based CD-ROM -- Ken Kershner (Silicon Graphics, USA)

Ken Kershner (KK) stated that the reasoning behind Silicon Graphics' decision to adopt SGML had come from their wish to enable the company, its developers and customers to deliver technical reference information (in electronic form, for publishing on paper or on-line). This was a very visual demonstration, with the results of Silicon Graphics' efforts being displayed and discussed throughout. The benefits of using SGML included:

* reduced printing and freight costs (= money saved)
* reduced hot-line calls (= money saved)
* quick information retrieval (= money + time saved)
* creating allies in manufacturing and Customer Support

The SGML-based electronic technical reference information system produced is called IRIS InSight. It provides "one-stop shopping" for on-line documentation, support information, and tightly integrated digital media. Although all the source files are in SGML, these are compiled into books for DynaText (Electronic Book Technologies' browser) to display to the user. Usability tests of the pre-alpha release of IRIS InSight, using 16 novice users, had tested link behaviour and task times. Results showed that users preferred scrolling within books to conventional methods of displaying on-line technical information.
Task times averaged 10.2 minutes. Taking what they had learned from the first session, Silicon Graphics conducted a second usability test at the pre-beta release stage of development, also using 16 novice users. They tested user preferences for on-line vs paper documentation; they were a little surprised (based on previous research) to find that task performance using on-line documentation was equal to that using paper. Task times averaged only 4.3 minutes. The researchers believe that as users become more accustomed to using on-line documentation, their efficiency will improve, as will the users' learning curve.

KK identified a number of traps to avoid when developing such projects:

* there can be conflicts during development between sticking strictly to the standards and guidelines that have been adopted, and getting a working product
* beware of underestimating the effort involved in converting data
* beware of underpowered hardware
* avoid making any dependency changes when development is nearing a milestone point
* start licensing discussions early

KK then identified the things Silicon Graphics would do again on a similar project:

* get together a committed engineering team
* use outside experts
* survey customers
* build a multi-functional team (including authors, editors, trainers, customer support, software engineers etc)
* pick the pilot project very carefully
* document the entire development process

33. Converting 180+ Million Pieces of Paper -- Eric Freese (Mead Data Central, USA)

Mead Data Central maintain a number of massive, very diverse databases of textual information. Their holdings currently represent the equivalent of 180 million pieces of paper, to which another 40 million are added annually. Eric Freese (EF) gave some of the reasons why Mead had decided to convert all their electronic data to SGML form -- primarily to take advantage of SGML's device, application and language independence, and to use related standards such as HyTime and DSSSL.
The more structural information they can capture in their encoding scheme, the easier it will be to provide satisfactory responses to users' queries and requests. For the SGML data they already process, they have adopted the FOSI approach (from CALS) to get formatted output, but they will certainly adopt DSSSL once a stable standard and software tools have appeared. EF claimed that because of the huge range of document types stored in Mead's databases, they will ultimately need to develop between 4,000 and 7,000 DTDs (and 8,000-14,000 FOSIs, to allow for paper and on-line formatting). The conversion process will need to be almost entirely automated, as it would be unrealistic, if not impossible, to do it by hand. The conversion process is due to start in 1994 and will need to handle 1.5 million documents a day. The production of database definitions, DTDs and FOSIs is expected to take only 5 months, whilst the actual conversion of the documents themselves is scheduled to take only 6 months. The result will be one of the largest SGML applications in the world. EF believes that standards such as HyTime will become ever more important to them over time. The sort of search and retrieval systems he envisages for the future at Mead include:

* a distributed environment (possibly world-wide)
* a GUI
* multimedia -- hypertext, graphics, video, audio
* international sources and delivery (10+ source languages, multilingual documents, and possibly even automatic language translation)

34. Poster Session: Conversion -- Various Speakers

This session considered various approaches to, and experiences of, converting to and from SGML documents.

35. Back to the Frontiers and Edges -- Michael Sperberg-McQueen (University of Illinois at Chicago, USA)

This was the closing keynote speech, in which Michael Sperberg-McQueen speculated on some of the SGML-related developments he expected and/or hoped to see over the coming years.
Amongst the long list of issues that he raised were such points as "Will we find a way to encode semantics?", "Will a methodology for developing DTDs evolve?" and a number of similar questions. (The full text of this presentation is given in Appendix II.)

For further details of any of the speakers or presentations, please contact the conference organizers:

Graphic Communications Association
100 Daingerfield Road, 4th Fl.
Alexandria VA 22314-2888
United States of America
Phone: (703) 519-8157
Fax: (703) 548-2867

=================================================================
You are free to distribute this material in any form, provided that you acknowledge the source and provide details of how to contact The SGML Project. None of the remarks in this report should necessarily be taken as an accurate reflection of the speakers' opinions, or in any way representative of their employers' policies. Before citing from this report, please confirm that the original speaker has no objections and has given permission.
==================================================================

Michael Popham
SGML Project - Computing Development Officer
Computer Unit - Laver Building
North Park Road, University of Exeter
Exeter EX4 4QE, United Kingdom
Email: M.G.Popham@exeter.ac.uk (INTERNET)
Phone: +44 392 263946
Fax: +44 392 211630

APPENDIX 1: THE SGML YEAR IN REVIEW 1992
by Yuri Rubinsky, SoftQuad Inc

STANDARDS ACTIVITY

1. The highlight of the SGML year, I think most people would agree, was the adoption of HyTime as an international standard. Sort of like a child born already having been accepted into a good university, HyTime has been considered for some time a necessary component of many initiatives, including the grand old US DoD CALS. I think we're going to see an outburst of activity and creativity revolving around HyTime over the next year. The standard will be published shortly by ISO in Geneva.
Copies will be available from national standards bodies like ANSI and BSI; there will probably be a few authorized redistributors like GCA and TechnoTeacher.

2. Conformance Testing Initiative: Spearheaded by the GCA in the US and the National Computing Centre in the UK, the SGML conformance testing initiative slowly but surely attempts to gather the momentum (and money) it needs to proceed. There seems to be general agreement that independent testing of SGML capabilities is needed (with some vocal exceptions citing examples of the market deciding what conformance means) but no agreement whatsoever on where the money should come from. Nonetheless, the GCA GenCode Committee continues to explore the possibilities.

3. The SGML Review: ISO regulations call for a review of each standard around the time of its 5th birthday. That review will continue over the next few months; Dr. Goldfarb is chairing the Special Working Group on SGML and invites comments and suggestions.

4. From Jim Mason, Convenor of WG8, comes the following news clip: "There are two query languages for SGML documents being developed in ISO/IEC JTC1/SC18/WG8: HyQ, as part of HyTime, ISO 10744; and an unnamed language as part of DSSSL, ISO/DIS 10179. We are developing two languages because these two standards, while they both manipulate SGML documents, are partially complementary in scope and functionality. DSSSL deals with pure SGML files. Although HyTime requires the `hub document' to be in SGML, subsidiary documents may be in any format, including binary digitized audio or graphics. DSSSL and HyQ both operate on property sets defined in SGML. DSSSL's location model is entirely in terms of SGML structures (where an element is in the tree, its relationships to its siblings, and so on). HyTime also needs to deal with finite coordinate spaces (this happens three seconds after that).
As of the September meeting of WG8, we feel that in areas of simultaneous interest, there should be simple mappings between the languages." 5. DSSSL, which I hope will be the highlight of next year's Year in Review, is expected to go out for a second draft ballot about April 1993. USER GROUP ACTIVITY 1. Here's a good piece of "you heard it here first" news: Today, [Oct. 26, day one of SGML '92] on the other side of the planet, that is, about 12 hours from now, the Australian National SGML User Group is being formally incorporated under the name SGML OZ. Chaired by Carlyle Nagel, the group's inaugural sponsor is Xerox Australia, who, in Carlyle's words, "provide the tea and biscuits". Congratulations to that group on this exciting and coincidental day. 2. A Mid-Atlantic SGML Users' Group has been formed, catering to SGML users living in Atlantis. Well, the Washington D.C. area actually. 3. With the International SGML Users' Group having started in the United Kingdom, it was often easy for everyone to think of the International group as meeting the needs of a local chapter, but of course local chapters have a separate role to fulfill, and accordingly those big islands finally have their own U.K. SGML Users' Group with Nigel Bray as Chairman. 4. The Southern Ontario User Group (covering a broad sweep of area more or less centered on World-Series-winning Toronto, Canada) recently held a successful vendor day and continues to publish its newsletter. 5. A Seattle User Group has begun, under the sponsorship of DEC. 6. Meanwhile in Colorado, what I mentioned last year as the planned Boulder SGML User Group finally got off the ground last month with its first meeting and a working name of The Rocky Mountain SGML Entity. 7. The Dutch Chapter of the SGML Users Group reports that it had a difficult year, due to the resignation of Dieke Van Wijnen as secretary of the Group. 
In September the group found a new secretary and now activities are resuming, including, on November 25th, a one-day conference on the managerial implications of the introduction of SGML applications. On December 9th, the annual meeting of the group will be held. MAJOR PUBLIC INITIATIVES 1. The Air Transport Association/Aerospace Industries Association subcommittee responsible for text standards has just released Revision 31 of Specification 100. This standard includes six DTDs covering a range of technical maintenance publications (including Aircraft Maintenance Manuals, Engine Shop Manuals and Service Bulletins). The new spec also includes a DTD Requirements Document and an SGML Data Dictionary (an industry-wide list of reusable elements and attributes). 2. Latest news on the CALS front is that MIL-M-28001B is expected to be released about March. Revision B will include significant changes in the Appendix B Output Specification, a tagging scheme for partial document delivery requirements, and a tagging scheme meeting the requirements for electronic annotation and review. MIL-STD-1840B availability will be announced at CALS Expo '92. It is expected to provide more flexibility in data delivery, such as accommodating other data types, device-independence, and tape medium. 3. The Commission of the European Communities (CEC) is funding the TIDE (Technology Initiative for Disabled and Elderly people) Pilot Action. Within TIDE, the Communication and Access to Information for Persons with Special Needs (CAPS) project started in December 1991, and will last until the end of March 1993. This project's main objective is to provide broader access to digitally distributed documents (especially newspapers, books and public information) to a significant group of handicapped and elderly persons who have difficulty in accessing the printed word and/or electronic information. 
The print-disabled group includes the blind, the deaf-blind, the visually impaired, the dyslexic and those with motor impairments that make it difficult to physically control paper documents or to use traditional methods for computer access. Working with the CAPS committee, Manfred Kruger of MID has written a DTD for electronic delivery of newspapers, including such interesting and once-you-think-about-it-perfectly-sensible constructs as an entity for an "invisible blank". This is the character that tells a voice synthesizer to break the current word into parts which are pronounced separately. 4. In a related item, a related committee with some overlapping membership, a working sub-committee of the International Committee for Accessible Document Design, has completed a draft DTD to support the formatting of braille from SGML. This DTD was accepted by the full committee last week and will now go forth to the Texas legislature to become part of state law regarding accessible electronic versions of all textbooks approved for use in the state educational system. Anyone interested in learning how to make new or existing DTDs "braille-ready" should contact the author at SoftQuad. (+1 416 239-4801) 5. In a single year, the Davenport Group, as part of its "Davenport Advisory Standard for Hypermedia (DASH)" activity, started (in January) and by December will have completed the design and publication of a set of HyTime-based SGML architectural forms, tentatively dubbed the "Standard Open Formal Architecture for Browsable Electronic Documents" (SOFABED), for the standard representation of indexes, tables of contents, glossaries, and cross references, for use with online documentation on Unix and Unix-like Open Systems. Unix International, the Open Software Foundation, Novell and others participated in this development, and the Open Software Foundation is probably going to implement the SOFABED architectural forms immediately. 6. 
X Consortium, the people who brought you X-Windows, has been presented with a protocol, proposed by Kent Summers of EJV and Jeff Vogel of EBT, for online help servers. The proposal puts forward a scheme which takes advantage of the hierarchical tendencies of SGML models but can also support anything else that has a notion of unique identifiers. 7. As evidenced by the turnout from the drug industry at SGML '92, there is strong interest in SGML from the point of view of the manufacturers and of the US Food and Drug Administration. Simultaneous with the first two days of this conference is a CANDA (Computer-Assisted New Drug Applications) conference in Washington, at which they are discussing SGML. The FDA has said it wants electronic submissions of New Drug Applications by 1995 and is interested in experimenting with SGML. The Pharmaceutical Manufacturers Association Task Force has suggested an SGML pilot project. 8. The CAD Framework Initiative (CFI, a good example of a nested acronym) has a task force to develop a semiconductor industry SGML application for use in transferring component documentation within the industry. This group deserves special mention for its formal name: CAD Framework Initiative Design Information Technical Committee Components Information Representation Technical Subcommittee Electronic Data Book Working Group Technical Documentation Interchange Standard Task Force. 9. The US Congress' 1990 Clean Air Act requires that by 1996 car manufacturers provide all emissions system documentation to anyone who requests it. That will be done in SGML, in an application known as J2008 and created by a subcommittee of the Society of Automotive Engineers. 10. Formatting images for CD-ROM publishing and other electronic image management systems can be made easier if there is an organized scheme to follow. 
The Association for Information and Image Management C15.9 standards committee members are developing a scheme for generating image tags, based on Standard Generalized Markup Language (SGML), that will be compatible with numerous image indexing and retrieval products. The objective of the project is to assist users by providing a versatile path for converting image files into other publishing systems' proprietary formats or database files. The project is entitled Compact Disk Read Only Memory (CD-ROM) Application Profile For Electronic Image Management (EIM). 11. A healthy collection of standards bodies, including AFNOR, BSI, DIN and the IEEE, are all looking at using (and modifying as necessary) the DTD in "ISO Technical Report 9573: Techniques for Using SGML" for standards creation and production. The Canadian Standards Association is developing, with InfoDesign, a new SGML-based information and publishing system, currently in pilot test phase, to encompass all facets of the standards development process at CSA. When complete, the information and publishing system will allow all data relevant to a document to be created directly in SGML, allowing its retrieval in both view-only format and editable SGML text, and publishing to both hardcopy and CD-ROM directly from SGML. 12. The Text Encoding Initiative is nearing completion of its second version, much after the hoped-for date, but correspondingly more thorough and well thought out. A number of major commercial publishers are encoding considerable volumes of material in TEI (Chadwyck-Healey and Oxford University Press, among others). 13. The European Workgroup on SGML is working on a DTD for scientific journal articles, the so-called MAJOUR Article DTD (Modular Application for JOURnals). This DTD is based on the AAP Article DTD and is intended as an exchange format between scientific publishers, typesetters and printers, and database hosts. 
Since last year, when the MAJOUR Header DTD was presented at the International Markup Conference 1991 in Lugano, the EWS has been working on the Article DTD, particularly body, tables, figures, math, and back matter. The first draft version was finished in February '92. Work on individual parts and the documentation is still going on. The MAJOUR Article DTD is scheduled to be finished by the end of the year and will be presented at International Markup 1993. The EWS is trying to harmonize its own work as far as possible with the work and results of other initiatives in the field, such as the AAP Tables/Math Update Committee and the AAP Article DTD Update. 14. In April, Pam Gennusa of Database Publishing made a presentation on SGML to the Text Working Party of the International Press Telecommunications Council in London. In May, the Associated Press hosted a seminar to introduce SGML to North American print and broadcast media and vendors. At the June meeting of the IPTC working parties and Standards Committee in Toronto, the AP presented an initial draft of NIML, a News Industry Markup Language, intended as a first step towards a full SGML implementation for news text. The NIML draft has since been republished in SGML News in Australia, and been added to the libraries on CompuServe's Journalism Forum. The IPTC has formed a joint SGML working party with the Newspaper Association of America and the Radio-Television News Directors Association. MAJOR CORPORATIONS & GOVERNMENT INITIATIVES 1. The US Department of Energy has adopted SGML as its standard for electronic exchange of scientific and technical information, and the Office of Scientific and Technical Information in Oak Ridge, Tennessee has been selected as the facilitating organization. Various DOE organizations and contractors are already participating in this effort and proposals have been submitted for the backbone system. 2. 
The Australian Parliament has just completed a review of its publishing needs and has recommended SGML for the daily publication of Hansard and supporting documentation. The Australian Attorney General's Office is ramping up its use of SGML, and the Australian Tax Office has an SGML pilot project going which, if successful, will spread across the department. The Australian Defense Publishing Group (DPUBS) has installed a CD-ROM manufacturing facility, which is migrating to SGML. 3. The U.S. Navy Defense Printing Service purchased an SGML-based publishing system to be deployed at all printing service sites, DoD-wide, under the ADMAPS (Automated Document Management and Publishing System) program. Under the EDRADS (Electronic Document Retrieval and Distribution System) program, the Navy is populating a document database of all Military Specifications and Standards by scanning and applying SGML tagging. The U.S. Navy has begun a study on conversion of logistics support analysis material directly to Interactive Electronic Technical Manuals. 4. Agfa won the 910S award to develop DTDs and FOSIs for US Air Force Administrative material (and conversion of 10,000 pages). With Agfa's recent re-organization announcements, the fate of this award is, so to speak, up in the air. 5. JCALS, the US DoD publishing system architecture contract, was awarded this year to a project team headed by Computer Sciences Corporation. With an estimated value over 10 years of $750 million, JCALS is intended to provide systems which will then be duplicated throughout the DoD to receive contractor data encoded to CALS standards. When complete, the system will be the world's largest integrated information retrieval, document database, editing and publishing system, consisting of hundreds of sites with tens of thousands of users. 6. In Holland, Fokker Aircraft is working on a CALS-like SGML implementation. 7. 
The Dutch Petroleum Company (NAM), owned by Shell and Exxon, has begun implementing an SGML application. 8. Wolters Kluwer Law has completed conversion of the entire Law Database of Dutch legislation into SGML and is now converting its looseleaf operation. 9. Sumitomo Bank Capital Markets of New York City reports that it would not be capable of maintaining its current volume of business without the links between its structured database data and its unstructured data that SGML provides. Frank Deutschmann writes: "Over the course of the year ... we have moved ALL wordprocessing activities into the SGML environment (currently ArborText's Publisher). Our business (trading derivative financial products) involves detailed legal documentation for each trade (dozens a day), and all documentation is now generated and stored in SGML format. I believe that we are one of the first serious users of SGML ... literally our whole business is based on SGML." 10. In New York City and London, MarketScope, Standard & Poor's electronic market analysis service, is being launched simultaneously through three on-line distribution services with three different formatting requirements, all from a common SGML source (created in SoftQuad Author/Editor). 11. In France, Bull is creating its user documentation in SGML using an editorial system centered on an Ingres database, and geared to producing both paper and a CD-ROM. 12. Aerospatiale, with the help of AIS, has built a ground-based ELS (Electronic Library System) to take maintenance manuals and transform them, both automatically and manually, into documents needed by Air Inter, the national airline owned by Air France. Aerospatiale is to deliver the final system within the next month. The system will also be available to other airlines. 13. Delta's TOPS System is up and running, producing native SGML job cards. The system takes Boeing data into Datalogics' tagger, using an older version of the ATA Spec 100 DTD. 14. 
USAir is building a native SGML application for Service Bulletins and other internal documents using IBM's TextWrite. 15. Boeing will be producing Service Bulletins in SGML and has provided a complete maintenance manual as test data to the airlines. 16. At the ATA Digital Maintenance Conference last month, 160 people from 60 airlines saw Digital Service Bulletins on SGI computers with Arbortext's SGML Editor and MS Windows and Macintosh versions of SoftQuad Author/Editor. All computer platforms demonstrated the same business process: Service Bulletin content being edited and re-ordered to become Engineering Orders. 17. The Laboratory for Library and Information Science at Linköping University in Sweden is working with the Swedish Defense Research Establishment in using HyTime to model dynamic structures in crisis management systems incorporating, for instance, information from geographical information systems. 18. John Duke of Virginia Commonwealth University, and consultant George Alexander, a member of the original AAP committee, are working on a project to convert the second edition of the Anglo-American Cataloguing Rules (AACR2) to an SGML file. AACR2 is maintained by an international committee and codifies the rules that librarians throughout the world use to describe materials in their collections. The electronic version of AACR2 will be used not only to produce future print versions of the constantly changing rules, but to develop software for online versions linked to other cataloguing tools, for tutorial programs, and for other research tools. Value-added developers of AACR2-e may develop linkages to other products, such as the LC Rule Interpretations or the MARC format documents. 19. The Department of Statistics at North Carolina State University will be publishing The Journal of Statistics Education, a newly organized electronic journal that will maintain journal materials using SGML. 
Some assistance to authors will be provided in producing the SGML documents, at least in the short term. The first issue of the JSE is targeted for July 1993. The editors plan on using a modified version of the AAP journal DTD. 20. SRC, the Semiconductor Research Corporation, funds university research and "pre-publishes" the results to its member sponsors. It has said it will start delivering those findings electronically in SGML by the end of this academic year. The DTDs are done. 21. The Caterpillar Service Information System (SIM) project is based on 11 DTDs and includes file system management software developed by InfoDesign. To date, the system has received, verified and catalogued more than 350,000 pages of converted SGML text and graphics. A second project, the Caterpillar File Management System (FMS), a distributed, SGML-based information management system built with Computer Sciences Corporation, is now being implemented. 22. Microstar Software of Ottawa has received funding for a two-year research project in the area of SGML tools from the Canadian Department of National Defense. 23. This year Microsoft released a CD-ROM multimedia package, entitled "Cinemania", integrating several books about movies into one complex reference source. Microsoft and Exoterica's consulting group used SGML as an enabling technology in the preparation of the text data. 24. Mead Data Central has begun work on what may be the largest non-government SGML application in the world. Over the next 2 years, they will be converting between 200 and 250 million documents into SGML using 4,000 or more DTDs and 8,000 or more FOSIs. 25. SunSoft, the Sun software subsidiary, is using SGML in its online publishing tools. Documents in several popular electronic publishing formats will be converted into online information similar to SunSoft's AnswerBook online documentation product. 26. 
A company called FLUKE reports that it has built a filter for its "Standard Input File Format", where it attempts to keep a writer's file as close to ASCII text as possible and then infers the markup to take it into an Agfa CAPS System. 27. The Cooperative Extension System, which includes the U.S. Department of Agriculture Extension Service, 77 land-grant universities and 3100 county offices, has appointed a working group to develop a standard for encoding publications using SGML. Extension publications include technical reports, fact sheets, and pamphlets on agriculture, horticulture, home economics and youth development, many of which incorporate images, tables, charts and graphs. In addition, the USDA Extension Service is supplementing its paper distribution system with electronic distribution over the Internet. Documents will be encoded with SGML and formatted on request for a variety of display technologies. PUBLICATIONS 1. Joan Smith, leader of the CALS in Europe Special Interest Group and one of the founding fathers and mothers of SGML, has a new book just out, called SGML and Related Standards, published in the UK by Ellis Horwood and distributed in North America by Simon & Schuster. 2. Oxford University Press has indicated to Charles Goldfarb that The SGML Handbook has gone back to press for a second edition. 3. Eric van Herwijnen's book Practical SGML has sold 3000 copies. The Japanese edition of the book was published this year by the Japanese SGML Forum. Eric is now working on a second edition, which will also be available electronically in DynaText, incorporating the ARCSGML parser with buttons that will allow you to parse the book's examples. A wonderful case of using available SGML technology beyond simply representing pages. 4. SGML Inc., the editorial team behind <TAG>, the SGML Newsletter, entered into an agreement whereby the GCA publishes the newsletter. Another sign of SGML's continued growth is the fact that <TAG> is now published monthly. 5. 
The CALS Journal, a glossy colour magazine devoted to the world-wide CALS initiative and with continuous coverage of SGML in CALS, is now completing its first year of publication. 6. Interesting and exciting SGML coverage in the mainstream: BYTE magazine's June issue had a cover section on "Infoglut" which included articles devoted to SGML by Steve DeRose and Lou Reynolds of Electronic Book Technologies and Chris Locke and Haviland Wright of Avalanche. The November '92 issue of Unix World includes that magazine's first major mention of SGML in its Standards column. 7. The Seybold Report's coverage of SGML activities continues to grow, recently with Mark Walter's long piece on September 7th describing both Silicon Graphics' and Novell's commitment to SGML: "If we've been writing a lot about the Standard Generalized Markup Language (SGML) lately, it's because a lot is happening. The latest ringing endorsement of the standard: Silicon Graphics and Novell will be converting their hardware and software documentation into SGML for delivering the manuals on CD-ROM. The adoption of SGML by two big-name computer gear suppliers, both of whom had ready access to vendor-based solutions, reflects a growing awareness of the intellectual and business advantages to putting critical information in a rich, portable form. Adoption of SGML by the computer industry could spur more widespread use and change the face of electronic delivery software development." 8. The November 18th Management Edition of Newsweek, which is sent to 3/4 million management subscribers, includes an article by Chris Locke placing SGML in perspective for managers. Locke describes reasons why computer automation "has largely failed to increase productivity" and goes on to say: "A solution to both problems, universal document interchange and the explicit encoding of document structure, is rapidly arriving from a largely unheard-of quarter. 
The Standard Generalized Markup Language (SGML) is being adopted with surprising speed by companies such as WordPerfect, Novell, Frame Technologies, Interleaf, Silicon Graphics, Digital Equipment, and Sony. The reason that this open, non-proprietary international standard is situated at the heart of so many development efforts is its ability to represent a rich set of document structures and relate them to a humanly meaningful whole." VENDOR ANNOUNCEMENTS A couple of announcements this year suggest activity among vendors that signals, I think, SGML's movement into the mainstream: 1. The WordPerfect Corporation, market leaders in the wordprocessing world, demonstrated their SGML-conforming version of UNIX WordPerfect, both at TechDoc and the recent Seybold Conference. The product is in its beta test period now and will be ported to MS-DOS next year. 2. Adobe Systems has launched its Carousel product, which accurately displays PostScript fonts and pages irrespective of the computing platform they're sent to. Although there have been no formal announcements, statements made at the Seybold Conference by John Warnock and Adobe Vice President Bill Spaller indicate that sometime next year a new version of Carousel will be released with some SGML smarts. 3. TechnoTeacher, Inc. demonstrated a prototype of its "HyMinder" HyTime engine at TechDoc Winter 1992 last February. TechnoTeacher expects to release the "HyMinder" product along with its SGML document object library (called "MarkMinder") during the first quarter of 1993. 4. Quark, maker of QuarkXPress, has produced an alpha version of a filter that exports Quark files in SGML-encoded form in conformance with the ICADD Minimal DTD (for braille). Other vendors who have made announcements this year include: AIS, Arbortext, Avalanche, Datalogics, EBT, Exoterica, Frame, Intergraph, Interleaf, Oster & Associates, SoftQuad, Unifilt, Zandar, and Westinghouse. MISCELLANEOUS 1. 
One sign of a growing market is the appearance of market analysis: InterConsult has released an "SGML Software Market Report" which divides the SGML market into useful sectors and attempts to gauge both current and future sales levels. The data is available both as a published document and with InterConsult-developed software called Intuition, which allows one to build one's own assumptions into a sophisticated analytical model. 2. The next item is a reprise of one of last year's. I ended this talk in 1991 by describing Michel Bielzinski's talk at the International Markup Conference. Well, a recent issue of <TAG> includes a very interesting piece by Michel on the same theme: a comparison of space and time in HyTime and Einstein's General Theory of Relativity. 3. In December 1992 Exoterica will be releasing a CD-ROM entitled "The Compleat SGML", containing the full text of ISO 8879, integrating the 1988 amendment, in online hypertext form. Accompanying this electronic reference will be thousands of sample SGML documents, comprising Exoterica's SGML conformance test suite. Several enormous documents will also be provided for benchmarking purposes. 4. CURIA, the ancient manuscript project of the Royal Irish Academy, has 6MB of text scanned and now being encoded from printed editions of annals, sagas, poems and prose works in Irish, Latin and Old Norse. 5. DynaText is being used in a math course (differential geometry) at Brown University, with interactive 3D graphics and a whole on-line textbook. 6. This is surely one of the great tidbits of miscellaneous SGML news: At the recent Mid-Atlantic SGML Users' Group meeting, the U.S. Central Intelligence Agency announced that SGML is a strategic direction for the Agency. 7. Reaction was so good at the SGML '91 Conference to Tommie Usdin's paper cut-out dolls for modelling SGML content, that the GCA is now offering the package for sale. 8. 
An SGML and LaTeX volunteer group managed by Chris Rowley, Rainer Schöpf and Frank Mittelbach reports: "On top of this low-level typesetting engine [LaTeX] we are building a high-level language `for specifying the formatting requirements of a class of structured documents' (i.e. for prescribing how to format a document which conforms to a particular DTD) and also implementing a `formatting engine' which uses the specified formatting to convert an input document into a PDL (primarily in TeX's DVI language, but this can be translated directly into quite `low-level' PostScript or PCL or ...) This will be, like the current LaTeX, a public domain system." 9. The Integrated Chameleon Architecture, a software system for generating translators to and from SGML DTDs, was made available for public release in March 1992. A user's guide is also available. Scheduled enhancements include the addition of a capability to import already existing DTDs and to specify attributes in DTDs. 10. CITRI, the Collaborative Information Technology Research Institute, a joint research arm of the University of Melbourne and the Royal Melbourne Institute of Technology, has been researching in the area of document retrieval. A group has developed an SGML-based hypertext information retrieval system which uses tools such as Lector (University of Waterloo) and XGML (Exoterica), plus their own database engine, Atlas, to provide a platform for researching the retrieval of large, structured documents. 11. The SGML Project, based at Exeter University in the U.K., has continued to successfully promote the use of SGML within the U.K.'s academic and research community. During this last year, the members have given presentations on SGML to several universities, businesses and conferences, established a major electronic archive for SGML resources, and recently set up an email discussion list for the U.K. community. 
They are actively seeking additional funding, and over the coming year intend to establish workgroups to define criteria for evaluating SGML software, to assess the software currently available, and to write the DTDs and translators required by the academic community. 12. The Jet Propulsion Laboratory (a division of Caltech contracted to NASA) began a project in 1985 called the Planetary Data System, a project to catalog and archive planetary data. Thus far the project has only archived the plain ASCII text of accompanying documentation, but it is now looking into the best way of archiving for the long term while making the documents associated with planetary data available in multiple output forms. SGML is being put forward as a possibility. Finally, with JPL involved, SGML has the opportunity to become a truly universal standard. Or at least galactic. APPENDIX II Back to the Frontiers and Edges: Closing Remarks at SGML '92: the quiet revolution Graphic Communications Association (GCA) C. M. Sperberg-McQueen 29th October 1992 Note: This is a lightly revised version of the notes from which the closing address of the SGML '92 conference was given. Some paragraphs omitted in the oral presentation are included here; some extemporaneous additions may be missing. For the sake of non-attendees who may see this, I have added some minimal bibliographic information about SGML '92 talks referred to. I have not added bibliographic references for HyTime, DSSSL, etc. If you are reading this, I assume you already know about them, or know where to find out. (MSM) Section I INTRODUCTION What a great conference this has been! We began with a vision of the future from Charles Goldfarb,(1) and since then have had a detailed tour of a lot that is going on in the present. I want to turn your attention forward again, and outward, back to the fringes and the edges of our current knowledge. 
We've been hearing about a lot of projects in which the gains of SGML are being consolidated, implemented, put into practice. I want to talk about some areas in which I think there may still be gains to be made. Not surprisingly, some of those gains are at the periphery of our current concerns with SGML, in fringe applications, pushing the edge of the envelope. Not surprisingly, Yuri asked an academic to talk about them, because academics are by nature fringe people, and our business is to meddle with things that are already pretty good, and try to make them better. In identifying some areas as promising new results, and inviting more work, there is always the danger of shifting from "inviting more work" to "needing more work" and giving the impression of dissatisfaction with the work that has been accomplished. I want to avoid giving that impression, because it is not true, so I want to make very clear: the questions I am posing are not criticisms of SGML. On the contrary, they are its children: without ISO 8879, these questions would be very much harder to pose: harder to conceive of, and almost impossible to formulate intelligibly. SGML, that is, has created the environment within which these problems can be posed for the first time, and I think part of its accomplishment is that by solving one set of problems, it has exposed a whole new set of problems. Notation is a tool of thought, and one of my main concerns is to find ways in which markup languages can improve our thought by making it easier to find formulations for thoughts we could not otherwise easily have. I start with the simple question: what will happen to SGML and to electronic markup in the future? Charles Goldfarb told us Monday: the future of SGML is HyTime. And this is true. HyTime is certain to touch most of us and affect our use of SGML in the coming years. But HyTime is already an international standard: it's part of the present. What will happen next? What should happen next? 
What I will offer is just my personal view; it has no official standing and should be taken for what it's worth. It's an attempt to provide a slightly fractured view, a view slightly distorted in order to provoke disagreement and, I hope, some useful thought.

If you want to know what is going to happen with SGML and markup languages in the next few years, all you have to do is think about what happened in programming languages after the introduction of Cobol or Algol, and what happened in database management systems after the development of the Codasyl data model.

Section II THE MODEL OF THE PAST

The introduction of Cobol made possible vast improvements in programmer productivity and made thousands of people familiar with the notion of abstraction from the concrete machine. It is no accident that SGML is often compared to Cobol: it is having a similarly revolutionary effect.

More suggestive to me, however, is the parallel between SGML and Algol. Apart from the skill with which its designers chose their basic concepts, one of Algol's most important contributions was its clean, simply designed syntax. By the definition of its syntax, Algol made possible the formal validation of the program text, and thus rendered whole classes of programmer error (mostly typos) mechanically detectable, effectively eliminating them as debugging problems. Similarly, SGML renders whole classes of markup and data-entry error mechanically detectable and thus eliminates them as serious problems. The notion of formal validity is tremendously important.

What happened after the introduction of Algol?
Over a period of time, the intense interest in parsing and parser construction gave way to interest in the meaning of programs, and work on proofs of their correctness -- which I interpret as essentially an attempt to extend formal validation beyond the syntax of the language and allow it to detect logic or semantic errors as well, and thus eliminate further classes of programmer error by making them mechanically visible.

Formal reasoning about objects requires a clean formal specification of those objects and their characteristics, so time brought serious work on the formal specification of programming language semantics. In particular, work on type systems occupied a great deal of attention, because Algol had demonstrated that type errors can be mechanically detected for simple types. So a lot of people worked on extending those simple types and creating stronger, subtler, more flexible, more useful type schemes; from this work our current trend of object-oriented programming takes some of its strength.

All of these same issues arose in connection with database management systems after Codasyl. (No matter what Tim Bray said yesterday, this did happen well before 1978.) The work of Codasyl in defining formally the characteristics of databases led to a generation of improved database systems, and eventually the increasing complexity of those systems led to the introduction of the relational model, whose simple concepts had a clean formal model firmly grounded in mathematics, which simplified reasoning about databases and their correctness and which led to substantial progress in database work, as Tim Bray described yesterday.(2)

The database work confirmed, fueled, and strengthened the conviction that formal validity and a rational set of data types are a useful investment.
Equally important for our purposes, database work showed the importance of placing as much as possible of the burden of validation in the database schema definition and not in the application software that works with the data. If you have a logical constraint in your data, for example that the sum of columns DIRECT COST and INDIRECT COST not exceed the column PRICE TO CUSTOMER, or that the only colors you offer are GREEN and BLUE, it is better to define that constraint into the database schema, so it will be consistently enforced by the database server. You may be tempted to leave it out of the schema on the grounds that your application programs can enforce this constraint just as well as the server. And you are right -- in theory. In practice, as surely as the day is long, before the end of the year you and the two other people who were there will be transferred to new duties, your replacements will overlook the note in the documentation, the first thing they will do is write a new application which does not enforce this rule, and before another year is gone your database will be full of dirty data. In other words, to paraphrase an old Chicago election adage, constrain your data early and often.

As hardware costs declined and programmer costs increased, portability became an increasingly prominent issue, and the semantic specification of languages, abstracting away from the specifics of individual machines, proved to be an invaluable tool in helping achieve it where possible and limit the costs where device-specific code was necessary.

Since the progress on formal semantics, though promising, did not yield mechanistic logic checkers as reliable as mechanistic syntax checkers, the years after Algol and Codasyl also saw the development of the notion of programming style, and attempts to define what constitutes a good program.
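The schema-versus-application point can be made concrete with a minimal present-day sketch (not from the talk; the column names follow the examples above, the Schema class is invented for illustration): the integrity rules are declared once, in the schema, and enforced on every insert, no matter which application does the inserting.

```python
# Hypothetical sketch of "constrain early and often": constraints live in
# one schema object, so no application can forget to enforce them.

class Schema:
    def __init__(self, checks):
        self.checks = checks            # named constraint predicates

    def insert(self, table, row):
        for name, check in self.checks.items():
            if not check(row):
                raise ValueError(f"constraint violated: {name}")
        table.append(row)

orders_schema = Schema(checks={
    "cost_within_price":
        lambda r: r["direct_cost"] + r["indirect_cost"] <= r["price_to_customer"],
    "color_offered":
        lambda r: r["color"] in ("GREEN", "BLUE"),
})

orders = []
orders_schema.insert(orders, {"direct_cost": 40, "indirect_cost": 10,
                              "price_to_customer": 60, "color": "GREEN"})
try:
    orders_schema.insert(orders, {"direct_cost": 40, "indirect_cost": 30,
                                  "price_to_customer": 60, "color": "RED"})
except ValueError as e:
    rejected = str(e)               # the dirty row never reaches the table
```

A new application written next year talks to the same `orders_schema`, so the rule survives the staff turnover described above.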
At least some of these discussions appealed as much to aesthetic judgments as to empirical measures, which is as it should be, since aesthetics is a fairly reliable measure of formal simplicity and power.

Section III SGML PROBLEMS AND CHALLENGES

All of these problems are also visible in SGML and electronic text markup today, and my prediction, for what it is worth, is that they will occupy us a fair amount in the coming years. What is more, as you will have noticed in the course of the conference, they are already occupying us now. That is, the future is closer than you might think. What problems will occupy us in this uncomfortably near future? The same ones that we saw in programming languages and database management:

* style
* portability
* a large complex problem I'll call "semantics", which includes problems of validation and type checking

III.1 Style

We saw the other day in Tommie Usdin's session One Doc -- Five Ways(3) how close we already are to developing consensus on DTD style. As for external, presentational details, Tommie remarked that there is already an implicit consensus. For details of construction and approach, she remarked, rightly I think, that there is no one answer, no context-free notion of "a good DTD". Our work in coming years is to clarify a context-sensitive notion of "a good DTD".

When is it better to tag a profile of Miles Davis as a <NewYorkerProfile> and when is it better to tag it <article> or even <div>? The answer is not, I suggest to you, as some were proposing the other day: namely that it's always better to tag it <NewYorkerProfile>, but you may not always be able to afford it and so you may have to settle for <article> or <section>. For production of the New Yorker, or for a retrieval system built specifically around the New Yorker, I personally would certainly use the more specific tag.
For a production system to be used by all Condé Nast or Newhouse magazines, however, I think the elements <goings>, <TalkOfTheTown>, and so on would be problematic. Let's face it, Psychology Today and Field and Stream just do not have those as regular departments. In building a 100-million-word corpus of modern American English, it would similarly be a needless complication of the needed retrieval to provide specialized tags for each magazine and newspaper included in the corpus. One of the points of this whole exercise (i.e. SGML) is to reduce irrelevant variation in our data -- and relevance is context-dependent.

Judging by the talks we have heard, those in this community will be building and customizing an awful lot of distinct DTDs in the coming years. One of our major challenges is to learn, and then to teach each other, what constitutes good style in them: what makes a DTD maintainable, clear, useful.

III.2 Portability

Our second major challenge is, I think, portability. I can hear you asking "What?! SGML is portable. That's why we are all here." And you are right. Certainly, if SGML did not offer better portability of data than any of the alternatives, I for one would not be here. But if data portability is good, application portability is better. If we are to make good on the promises we have made on behalf of SGML to our superiors, our users, and our colleagues, about how helpful SGML can be to them, we need application portability. And for application portability, alas, SGML and the world around it so far provide very little help.

Application portability is achieved if you can move an application from one platform to another and have it process the data in "the same way". A crucial first step in this process is to define what that way is, so that the claim that Platform X and Platform Y do the same thing can be discussed and tested. But SGML provides no mechanism for defining processing semantics, so we have no vocabulary for doing so.
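A toy illustration, entirely invented and in present-day Python, of what such a vocabulary buys: once the processing semantics are stated as data, separate from any one program, two independently written processors can be tested against each other, and "Platform X and Platform Y do the same thing" becomes a checkable claim rather than a hope.

```python
# Hypothetical sketch: a tag-to-rendering specification stated as data.
# Two independently written processors consult the same specification,
# so sameness of processing is a testable property.

SPEC = {"title": ("== ", " =="), "emph": ("*", "*")}   # before/after strings

def processor_x(tag, text):
    before, after = SPEC[tag]
    return before + text + after

def processor_y(tag, text):
    # written separately, perhaps on another platform
    return "".join([SPEC[tag][0], text, SPEC[tag][1]])

# the portability claim, stated as a test:
for tag, text in [("title", "Style"), ("emph", "portable")]:
    assert processor_x(tag, text) == processor_y(tag, text)
```

The interesting content is in `SPEC`, not in either processor; that separation is the kind of thing a processing-semantics vocabulary would standardize.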
DSSSL (ISO 10179, the Document Style Semantics and Specification Language) does provide at least the beginnings of that vocabulary. So DSSSL will definitely be a major concern in our future. We have seen another bit of the future, and it is DSSSL.

Section IV THE BIG PROBLEM

But the biggest problem we face, I think, is that we need a clear formulation of a formal model for SGML. If we get such a formal model, we will be able to improve the strength of SGML in several ways.

IV.1 SGML's Strengths

SGML does provide a good, clean informal model of document structure. Like all good qualitative laws, it provides a framework within which to address and solve a whole host of otherwise insoluble problems. For the record, my personal list of the crucial SGML ideas is:

* explicitly marked or explicitly determinable boundaries of all text elements
* hierarchical arrangement/nesting of text elements
* type definitions constraining the legal contents of elements
* provision, through CONCUR and the ID/IDREF mechanism, for asynchronous spanning text features which do not nest properly -- and here I want to issue a plea to the software vendors: Make my life easier. Support CONCUR!
* use of entity references to ensure device independence of character sets

Obviously there are a number of other features important to making SGML a practical system, which I haven't listed here. What I've listed are what seem to me the crucial elements in the logical model provided by SGML. It seems to me that a properly defined subset of SGML focusing on these ideas and ruthlessly eliminating everything else could go far in helping spread the use of SGML in the technical community, which is frequently a bit put off by the complexity of the syntax specification.
I don't think a subset would pose any serious threat to the standard itself: use of a subset in practice leads to realizations of why features were added to the standard in the first place, and with a subset, the growth path to full use of the standard language is clearly given. Spreading the use of SGML among the technical community would in turn help ensure that we get the help we will need in addressing some of the challenges we face.

IV.2 Semantics

We commonly think of SGML documents as data objects, to be processed by our programs. I ask you now to participate for a moment in a thought experiment: what would the world be like, if our SGML documents were not documents, but programs? Our current programs for processing SGML documents would be compilers or interpreters for executing SGML programs. What else?

Well, first of all, we discover a tremendous gap: we have lost everything we used to know about programming language semantics, and we have no serious way of talking about the meanings of these SGML programs. And for that matter, we have no serious way of talking about what happens when we compile or execute them. In other words, we have made our programs reusable (we can run the same program / document with different compilers) and so we can use just one programming language instead of many, and this is good, but it would be nice to have a clue about the semantics of the interpretations our compilers make of the language we are using.

The clearest analogy I can think of to our situation is that in SGML we are using a language like Prolog, in which each program (document) has both a declarative interpretation and an imperative or procedural interpretation. If you ignore the procedural aspects of Prolog programs, you can reason about them as declarative structures; if you attend to the procedural aspects, you can see what is going to happen when you run the program.
The difference between Prolog and SGML is that Prolog has very straightforward semantics for both the declarative and the procedural interpretations, for which formal specifications are possible. In SGML, we have a very clear informal idea of the declarative meaning of the document, but not a very formal one. And we have no vocabulary except natural languages for talking about processing them.

Ironically, it is not easy to say exactly what ought to be meant by the term semantics. Different people use it in different ways, and if it does have a specific, consistently used meaning in formal language studies, then the practitioners have kept it a pretty well guarded secret. So I can't tell you what semantics means; I can only tell you what I mean by it today.

Imagine I am about to send you an SGML document. Included in this document are two elements I suspect you may not have encountered before: <blort> and <vuggy>. When I say I'd like to have a good specification of their semantics, I mean I would like to be able to tell you, in a useful way, what <blort> and <vuggy> mean, and what formal constraints are implied by that meaning. But we don't seem to know how to do that. The prose documentation, if there is any and if I remember to send it, may say what a <blort> is, or it may not. It may tell you what <vuggy> means, but if it does it may say only "full of vugs; the attribute TRUE takes the values YES, NO, or UNDETERMINED". Unless you are a geologist you probably don't know what a vug is, and if you are a geologist you may harbor some justifiable skepticism as to whether I know and am using the term correctly.
Even if my prose documentation does explain that a vug is an air-hole in volcanic rock, and you know how to decide how many vugs make a rock vuggy, I have probably not succeeded in specifying what follows logically from that meaning in any useful way -- probably not, that is, in a way that a human reader will understand, and almost certainly not in a way that a validating application can understand and act upon. For example, how many people here realize, given our definition of <vuggy>, that the tag <vuggy true=yes> is incompatible with the tag <rock type=metamorphic> -- since the definition of a vug is that it's an airhole in volcanic, i.e. igneous, rock? If you noticed that, congratulations. Are you right? I don't know: if some vuggy igneous rock is metamorphosed and the airholes are still there, is it still vuggy? I don't know: I'm not a geologist, I'm just a programmer. Is there a geologist in the house?(4)

It would be nice to be able to infer, from the formal definition of <vuggy>, whether or not <vuggy true=yes> is incompatible with <rock type=metamorphic>, just as we can infer, from the DTD, that <vuggy true='76.93%'> is not valid, since the attribute true can only take the values YES, NO, and UNDETERMINED. Prose is not a guaranteed way of being able to do that. So what can we manage to do by way of specifying the "semantics" of <blort> and <vuggy>? We don't seem to know how to specify meaning in any completely satisfactory way. What do we know how to do? Well, we can fake it. Or, to put it in a more positive light, we can attempt to get closer to a satisfactory specification of meaning in several ways:

IV.2.1 Prose Specification

First, we can attempt to specify the meaning in prose. Specifications in prose are of course what most of our manuals provide in practice. It is handy to formalize this as far as possible, to ensure consistent documentation of all the characteristics of the markup that we are documenting.
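To make the contrast concrete, here is a minimal present-day sketch (invented for illustration, with the geologically debatable rule hand-coded) of the two kinds of check: the first is what a DTD can already do -- the attribute value must be in a declared list -- and the second is the kind of cross-tag inference that would have to follow from a formal definition of <vuggy>.

```python
# Hypothetical sketch. dtd_valid() is DTD-style checking: values drawn
# from a declared domain. semantically_consistent() applies a rule that
# no DTD can express, here simply hand-coded rather than inferred.

ATTRIBUTE_DOMAINS = {("vuggy", "true"): {"YES", "NO", "UNDETERMINED"}}

def dtd_valid(element, attr, value):
    domain = ATTRIBUTE_DOMAINS.get((element, attr))
    return domain is None or value in domain

# The debatable rule (see note 4): vugs occur in igneous rock, so
# <vuggy true=YES> clashes with <rock type=metamorphic>.
INCOMPATIBLE = [(("vuggy", "true", "YES"), ("rock", "type", "metamorphic"))]

def semantically_consistent(tags):
    facts = set(tags)                       # (element, attribute, value)
    return not any(a in facts and b in facts for a, b in INCOMPATIBLE)

assert not dtd_valid("vuggy", "true", "76.93%")   # the DTD catches this
assert dtd_valid("vuggy", "true", "YES")          # ...but not this pair:
assert not semantically_consistent(
    [("vuggy", "true", "YES"), ("rock", "type", "metamorphic")])
```

The point of the formal-semantics program is to derive a table like INCOMPATIBLE from the definition of a vug, instead of writing it by hand.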
We've heard obliquely about a number of systems people use to generate structured documentation of SGML tag sets: Yuri Rubinsky mentioned one used internally by SoftQuad; Debby Lapeyre mentioned one; the Text Encoding Initiative (TEI) uses one; I am sure others exist too. This is already a live issue. And it will continue to occupy our attention in the coming years. Natural-language prose is, at present, the only method I know of for specifying "what something means" in a way that is intuitive to human readers. Until our colleagues in artificial intelligence make more progress, however, prose specifications cannot be processed automatically in useful ways.

IV.2.2 Synonymy

Second, we can define synonymic relationships, which specify that if one synonym is substituted for another, the meaning of the element, whatever that meaning is, remains unchanged. If we didn't know in advance what <blort> and <farble type=green> meant, we probably still don't know after being told they are synonyms. But knowing we can substitute one for the other while retaining the meaning unchanged is nevertheless comforting.

IV.2.3 Class Relationships

Third, we can define class relationships, with inheritance of class properties. This doesn't tell us everything we might need to know, but if we know that a <blort> is a kind of glossary list, or a kind of marginal note, we have some useful information, which among other things would allow us to specify fall-back processing rules for applications which haven't heard of <blort>s but do know how to process marginal notes. The fact that HyTime found it useful to invent the notion of architectural form, and the fact that the TEI has found it useful to invent a simple class system for inheritance of attribute definitions and content-model properties, both suggest that a class-based inheritance mechanism is an important topic for further work.
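The fall-back idea can be sketched directly (the tag hierarchy and handlers here are invented for illustration, not taken from HyTime or the TEI): an application that has never heard of <blort> can still process it, by walking up a declared class tree until it reaches an element class it does know.

```python
# Hypothetical sketch of class-based fall-back processing: each element
# type names its parent class; an application handles an element via the
# most specific class it knows about.

PARENT = {
    "blort":         "marginal.note",
    "marginal.note": "note",
    "goings":        "section",
    "TalkOfTheTown": "section",
}

HANDLERS = {
    "note":    lambda text: f"[note: {text}]",
    "section": lambda text: f"== {text} ==",
}

def process(tag, text):
    t = tag
    while t is not None:
        if t in HANDLERS:
            return HANDLERS[t](text)     # most specific known class wins
        t = PARENT.get(t)                # otherwise fall back to the parent
    raise KeyError(f"no handler for {tag}")

# <blort> falls back to the generic note handler:
assert process("blort", "nota bene") == "[note: nota bene]"
# both specialized section tags share one inherited handler:
assert process("goings", "Goings On") == "== Goings On =="
```

The same mechanism answers the New Yorker question from Section III.1: tag with <goings>, declare it a kind of <section>, and generic applications lose nothing.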
IV.2.4 License or Forbid Operations

Fourth, we can define rules that license or forbid particular relations upon particular objects or types of objects. We may not know what a <blort> is, but we can know that it stands in relation X to the element <granfalloon>, and we can know that no <blort> can ever stand in relation Y to any element of type <vuggy>.

In addition to relations, we can specify what operations can be applied to something: knowing that INTEGER objects can be added while DATE objects cannot, especially if one of the DATE objects is "in the reign of Nero", is part of what we mean when we say we understand integers and dates. An ability to define legal operations for SGML objects is a key requirement for using SGML in data modeling. The definition of a data type involves both the specification of the domain of values it can take on and the specification of the operations which can apply to it. Because SGML has no procedural vocabulary, it is very difficult to imagine how to specify, in SGML, the operations applicable to a data type. It would be useful to explore some methods of formal specification for legal operations upon SGML objects.

But note that "what it can do" and "what can be done to it" are not, really, specifications of "what it means". Moreover, object-oriented specifications cannot be exhaustive. In an application program, if an operation P is not defined for objects of type Q, that counts as a claim that operation P is illegal for such objects. Even if it's not illegal, you aren't going to get anywhere by trying to call it, so it might as well be illegal. In SGML, with our commitment to application independence, that isn't the case. If no definition of addition for DATE objects is provided, that could mean that it is semantically invalid: dates can never be added. Or it could mean that we just haven't got around to it yet, or haven't thought about it yet.
So the absence of a method for performing an operation doesn't tell us whether the operation is or should be legal upon a particular type of object. Obviously, instead of leaving operations undefined, we could specify explicitly that certain operations are illegal for objects of a certain class. But it is not feasible to make a list of all the things that cannot be done to DATES, or BLORTS, or GRANFALLOONS, because the list is likely to be infinite. Nevertheless, as a way of approaching the formal description of applications, object-oriented work is very promising. It's fairly obvious that in the future we need to work together with the people developing the object-oriented programming paradigm.

IV.2.5 Axiomatic Semantics

Fifth, we can specify in some logical notation what claims about the universe of our document we can make, given that it is marked up in a certain way, and we can define what inferences can be made from those claims. The synonymic relations I was talking about a moment ago are just a special case of this. Formal logic (i.e. first-order predicate calculus) certainly makes possible the kinds of inference I've been talking about, but even predicate calculus makes some concessions to the difficulty of the problem. I can infer that this value for this attribute and that value for the other one are consistent, inconsistent, etc. But since Frege and Russell and Whitehead, logic has treated itself as a purely formal game divorced from meaning; the only relation to the real world is by way of models, which involve assigning meanings to the entities of the logical system and seeing which sentences of the logical system are true under these interpretations. The problem is that "assign a meaning in the real world to an entity or operation of the logical system" is taken as a primitive operation and thus effectively undefined. We all know how to do this, right? We can't define semantics, but we know it when we see it.
In work on declarative semantics, we can learn a lot from recent experience with logic constraint programming and declarative programming. The declarative approach to SGML semantics has a certain appeal, both because it fits so well with the perceived declarative nature of SGML as it is, and because declarative information is useful. As John McCarthy said in his Turing Award lecture, "The advantage of declarative information is one of generality. The fact that when two objects collide they make a noise can be used in a particular situation to make a noise, to avoid making a noise, to explain a noise, or to explain the absence of noise. (I guess those cars didn't collide, because while I heard the squeal of brakes, I didn't hear a crash.)"

One worry about declarative semantics is that it might prove difficult to define processing procedures in a declarative way. But in fact it is possible to specify procedures declaratively, as Prolog, logic constraint languages, and the specification language Z show us. So I think a formal, axiomatic approach of some kind is very promising. But let's be real: it is very unlikely that from a description of the tag set in first-order predicate calculus you or I, let alone the authors we are working with, will understand what a <blort> is, or even what a <vug> is.

IV.2.6 Reduction Semantics

Finally, I should mention one further method of formal semantic specification: reduction semantics. Reduction works the way high-school algebra works. One expression (e.g. "(1 + 2) + 5") is semantically equivalent to that expression ("3 + 5"), that one to this other one ("8"), and so on. If you work consistently toward simpler expressions, you can solve for the value of X. There has been substantial work done on reduction semantics in programming languages, including LISP and more purely functional languages like ML.
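The algebra analogy can be made executable. This is a generic constant-folding sketch, invented for illustration and not anything defined by a standard: each step rewrites an expression to a simpler one with the same value, exactly the "(1 + 2) + 5" to "3 + 5" to "8" chain described above.

```python
# Hypothetical sketch of reduction semantics: an expression is either an
# int or a ("+", left, right) node; reduce_once() performs one rewrite
# step, and evaluate() applies steps until only a value remains.

def reduce_once(expr):
    if isinstance(expr, int):
        return expr                              # already fully reduced
    op, left, right = expr
    if isinstance(left, int) and isinstance(right, int):
        return left + right                      # the actual rewrite rule
    return (op, reduce_once(left), reduce_once(right))

def evaluate(expr):
    while not isinstance(expr, int):
        expr = reduce_once(expr)
    return expr

e = ("+", ("+", 1, 2), 5)                        # "(1 + 2) + 5"
assert reduce_once(e) == ("+", 3, 5)             # -> "3 + 5"
assert evaluate(e) == 8                          # -> "8"
```

Nothing here depends on the expressions being strings; the same machinery works on document trees, which is the move the next passage makes.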
Moreover, reduction semantics doesn't have to be defined in terms of string expressions: it is entirely possible to define reduction semantics in terms of trees and operations upon trees. Take a simple example: if we have an element <A> whose content model is "B+", does the order of <B>s matter? In SGML there is no way of saying yes or no. Reduction semantics allows you to say that this tree (gesture)

<a><b>Apples ... </b><b>Oranges ... </b></a>

is the same as that tree (gesture)

<a><b>Oranges ... </b><b>Apples ... </b></a>

so sequence is not important. Or that they are not the same, so sequence is significant. We have a good example of this type of work in the paper "Mind Your Grammar" and the grammar-based database work at the University of Waterloo by Frank Tompa and Gaston Gonnet.(5) I think this is a very important field for further work.

In summary: we have at least six areas to explore in trying to work on better semantic specification for SGML: structured documentation (the kind of thing SGML itself is good at), synonymy, classes, operation definitions, axiomatic semantics, and reduction semantics. I don't know whether these activities would constitute the specification of a semantics for SGML and for our applications, or only a substitute for such a specification, in the face of the fact that we don't really know how to say what things mean. Certainly no lexicographer, no historical linguist, would feel they constituted an adequate account of the meaning of anything. And yet I suspect that these activities all represent promising fields of activity.

IV.3 Validation and Integrity Checking

A formal model would make it possible to formulate cleanly many of the kinds of constraints not presently expressible in SGML. This is by no means an exhaustive or even a systematic list, but at least all the problems are real:

* If an attribute SCREEN-TYPE has the value BLACK-AND-WHITE, the attribute COLOR-METHOD almost certainly should have the value DOES-NOT-APPLY.
But this kind of constraint on sets of attribute values is impossible to specify for SGML attributes. It would certainly be useful sometimes to be able to define co-occurrence constraints between attribute values.

* Similarly, there are cases where one would like to constrain element content in a way I don't know how to do with content models. We have heard repeatedly in this conference about revision and version control systems which allow multiple versions of a document to be encoded in a single SGML document. For example, one might have a <revision> element which contains a series of <version> elements. The TEI defines just such a tag pair. At the moment our <version> element can contain only character and phrase elements. It would be nice to allow it to operate as well upon the kind of SGML-element-based deltas that Diane Kennedy described the other day for revision info, in which the unit of a revision was always an SGML element. If a change is made within a paragraph, the entire paragraph is treated as having been changed, and versioning consists in choosing the right copy of the paragraph.(6) But one would like to be able to specify that if the first <version> element contains a <p> element, the second had better contain one as well, and not a whole new subsection or just a phrase. Otherwise, the SGML document produced as output from a version-selection processor would not be parsable.

* It would be nice to be able to require that an element be a valid Gregorian date, or a valid ISO date, or a valid part number, etc., etc.

* It would be nice to be able to require character data to appear within a required element: i.e. to have a variant on PCDATA whose meaning is "character+" and not "character*" -- or even to require a minimum length, as for social security numbers, phone numbers, or zip codes.

* The SGML IDREF is frequently used as a generic pointer.
Many people wish they could do in SGML what we can do in programming languages, and require a given pointer to point at a particular type of object. (The pointer in a <figref> had better be pointing at a figure, or the formatter is going to be very unhappy.)

* Similarly, it would be nice to have a type system that understood classes and subclasses. The only reason we face this nasty choice between using the tag <goings> and using the tag <section> for the New Yorker's "Goings On About Town" section is that we have no way to make a processor understand that <goings> and <TalkOfTheTown> and so on are just specialized versions of <section> or <article>. If we use the specialized tags, and want to specify an operation upon all sections of all magazines, we must make an exhaustive list of all the element types which are specializations of <section>. To be sure, our application systems can handle this. But we want to constrain early and often. And never constrain in your application what you could constrain in the DTD.

Section V CONCLUSION: WHY BOTHER?

I suppose you can sum up my entire talk today this way. We want to constrain our data early and often. To do this, we need better validation methods. To express the validation we need, we need a clean formal model and a vocabulary for expressing it. The query languages described yesterday are not the final word, but they are a crucial first step.

Why do we want to do all these things? Why bother with formal specification? Because formal specification and formal validation are SGML's great strengths. Why is it, as Charles Goldfarb said on Monday, that SGML allows us to define better solutions than the ad hoc solutions built around a specific technology? It is because SGML provides a logical view of problems, not an ad hoc view based on a specific technology. Naturally, it seems to suit the technology less well than the ad hoc approach.
But when the underlying technology changes, ad hoc solutions begin to fit less well, and look less like ad hoc solutions, and more like odd hack solutions.(7) But we can improve SGML's ability to specify the logical level of our data and our applications. And so we should. A logical view is better than a technology-specific view. And so we should welcome every effort to improve the tools available to us in defining our logical view. In this connection I could mention again the work by Gonnet and Tompa on large textual databases, and the work of Anne Brüggemann-Klein, which is occasionally reported on the Netnews forum comp.text.sgml.

Success in improving our logical view of the data is what will enable the quiet revolution called SGML to succeed. And now I hope you'll join me in thanking Yuri Rubinsky, for organizing this conference and for allowing all of us co-conspirators in the revolution to get together and plot.

-----------------------------------------------------

(1) Charles Goldfarb, "I Have Seen the Future of SGML, and It Is ..." (keynote, SGML '92), 26 October 1992.
(2) Tim Bray, "SGML as Foundation for a Post-Relational Database Model," talk at SGML '92, 28 October 1992.
(3) B. Tommie Usdin (et al.), "One Doc -- Five Ways: A Comparative DTD Session" (panel discussion of five sample DTDs for the New Yorker magazine), SGML '92, 27 October 1992.
(4) There was; metamorphic rock can be vuggy too, so the initial definition was too narrow. - MSM
(5) Gaston H. Gonnet and Frank Wm. Tompa, "Mind Your Grammar: A New Approach to Modelling Text," in Proceedings of the 13th Very Large Data Base Conference, Brighton, 1987.
(6) Diane Kennedy, "The Air Transport Association / Aerospace Industries Association, Rev 100," talk at SGML '92, 28 October 1992.
(7) I owe this pun to John Schulien of UIC.