The following report was obtained from the Exeter SGML Project FTP server as Report No. 9, in UNIX "tar" and "compress" (.Z) format. It is unchanged here except for the conversion of SGML markup characters into entity references, in support of HTML.
UNIVERSITY OF EXETER COMPUTER UNIT
SGML/R9
THE SGML PROJECT
CONFERENCE REPORT: SGML '91
THE OMNI BILTMORE HOTEL, PROVIDENCE, RHODE ISLAND, USA
OCTOBER 20th-23rd 1991
Issued by Michael Popham
29 September 1992
_________________________________________________________________

1. SUBJECT

The SGML '91 conference was organized by the Graphic Communications Association (GCA). In their promotional literature, the GCA used the following terms to describe the conference:

"SGML '91 will be a high-speed, interactive, meeting of the people who are using SGML right now or are just on the verge. We will hear SGML success stories and discuss the problems and issues that face many users. The agenda includes guided discussions of common concerns. Participants will be called upon to join working groups and to contribute to conference documents which may shape standard practice in this developing field!"

In this long report, I have tried to record as much about the presentations and events at the conference as I could. I take full responsibility for any mistakes or misrepresentations, and I apologize in advance to all concerned. Any text enclosed in square brackets is mine. This report could not have been completed without the assistance of Neil Calton and the goodwill shown by his employers, Rutherford Appleton Laboratory (RAL). The SGML Project takes full responsibility for all the facts and opinions given in this report, none of which necessarily reflect opinion at RAL. All Neil Calton's contributions are indicated by [NBC]; all other reports were written by me.

1.1. List of Contents

1. Subject
2. Background
3. Programme - Day 1
   3.1 "SGML The Year In Review, A User Guide to the Conference" -- Yuri Rubinsky (President, SoftQuad Inc)
   3.2 "SGML - The Wonder Years" -- Pam Gennusa (Consultant, Database Publishing Systems)
   3.3 "Grammar Checking and SGML" -- Eric Severson (Vice President) and Ludo Van Vooren (Director of Applications, Avalanche Development Company)
   3.4 "Attaching Annotations" -- Steven DeRose (Senior Systems Architect, Electronic Book Technologies)
   3.5 "Using Architectural Forms" -- Steve Newcomb (President, TechnoTeacher Inc)
   Case Studies
   3.6 "Implementing SGML for the Florida Co-operative Extension Service" -- Dr Dennis Watson (Assistant Professor, University of Florida)
   3.7 "SGML User Case Study" -- Susan Windheim (Technology Consultant, Prime Computer Technical Publications)
   3.8 "STEP and SGML" -- Sandy Ressler (National Institute of Standards and Technology) [NBC]
   3.9 "Multi-vendor Integration of SGML Tools for Legal Publishing" -- Francois Chahuneau (AIS/Berger-Levrault)
   3.10 "Developing a Hypertext Retrieval System Based on SGML" -- Tom Melander (Sales Engineering Manager, Dataware Technologies) [NBC]
   Application Topic #1
   3.11 "Data for Interactive Electronic Technical Manuals (IETMs)" -- Eric Freese (Senior Technical Specialist, RYO Enterprises)
   3.12 International SGML Users' Group Meeting
4. Programme - Day 2
   Reports from the Front - Various Speakers
   4.1 "OSF's Pursuit of DTDs" -- Fred Dalrymple (Group Manager, Documentation Technology, Open Software Foundation)
   4.2 "The Text Encoding Initiative: A(nother) Progress Report" -- Lou Burnard (Co-ordinator, Oxford Text Archive, Oxford Computing Service)
   4.3 "TCIF IPI SGML Implementation" -- Mark Buckley (Manager, Information Technology, Bellcore)
   Application Topic #2
   4.4 "Rapid DTD Development" -- Tommie Usdin (Consultant, Atlis Consulting Group)
   Poster Session 1
   4.5 "Tables in the Real World" -- Various speakers
   4.6 "Handling Tables Practically" -- Joe Davidson (SoftQuad)
   4.7 "TCIF approach to Tables" -- Mark Buckley (Manager, Information Technology, Bellcore)
   4.8 "Format-oriented vs content-oriented approaches to tables" -- Ludo Van Vooren (Director of Applications, Avalanche Development Company)
   4.9 "How should statistical packages import/export SGML tables?" -- Peter Flynn (Academic Computer Manager, University College, Cork)
   Formatting Issues and Strategies - Various speakers
   4.10 "Formatting - Output Specifications" -- Kathie Brown (Vice-President, US Lynx)
   4.11 "Formatting as an Afterthought" -- Michael Maziarka (Datalogics Inc., Chicago, Illinois)
   4.12 "Native versus Structure Enforcing Editors" -- Moria Meehan (Product Manager, CALS Interleaf)
   Poster Session 2
   4.13 "Verification and Validation" -- Eric Severson (Vice-President, Avalanche Development Company)
   4.14 AAP Math/Tables Update Committee Chair - Paul Grosso
5. Programme - Day 3
   5.1 "Unlocking the real power in the Information" -- Jerome Zadow (Consultant, Concord Research Associates)
   5.2 "The Design and Development of a Database Model to Support SGML Document Management" -- John Gawowski, Information Dimensions Inc [NBC]
   5.3 "A Bridge Between Technical Publications and Design Engineering Databases" -- Jeff Lankford, Northrop Research and Technology Centre [NBC]
   5.4 "Marking Up a Complex Reference Work Using SGML Technology" -- Jim McFadden, Exoterica [NBC]
   5.5 "Nurturing SGML in a Neutral to Hostile Environment" -- Sam Hunting, Boston Computer Society [NBC]
   5.6 Trainers Panel [NBC]
   5.7 Reports from the Working Sessions [NBC]
       #1 Standard Practices -- Eric Severson and Ludo Van Vooren
       #2 A Tool for Developing SGML Applications
6. Summary

2. BACKGROUND

This was a well-attended conference, with over 150 participants. European interests were sadly under-represented, with only about 10 attendees from E.C. nations (of which two-thirds were from academic or research institutions, and the remaining third was predominantly Dutch publishing houses). Japan had sent representatives from Fujitsu International Engineering Ltd and the Nippon Steel Corporation, but all the other attendees were from North America. The American delegates were a reasonable mix of SGML software houses and consultancies, academic/research institutions, and many large corporations; the following were well-represented: AT&T, Boeing, Bureau of National Affairs Inc., IBM, InfoDesign Corporation, Interleaf, US Air Force and several U.S. Departments.

Various activities preceded the start of the conference proper, most notably a short tutorial for those who had had little direct experience of SGML Document Type Definitions (DTDs), and a guided tour of the Computer Center at Brown University. The tour highlighted Brown's success at forging sponsorship deals with the commercial world, and we were shown rooms full of workstations available for undergraduate use.
We were also given a demonstration of some of their research on computer graphics - including the construction of an 'intelligent' graphic object (a cheese-seeking mouse) using a number of logical building blocks.

3. PROGRAMME - Day 1

3.1. "SGML The Year In Review, A User Guide to the Conference" -- Yuri Rubinsky (President, SoftQuad Inc.)

The conference opened with a presentation from the Chair, Yuri Rubinsky. He displayed a graph showing the dramatic rise in attendance of such conferences since SGML '88 was held, and suggested that this reflected the growth in interest in the whole computing community. Rubinsky then went on to talk about a range of SGML-based activities, and noted that every initiative that he had talked about at SGML '90 was still on-going (and thus such projects should not be thought of as 'flash-in-the-pan'). He then listed a large number of points, many of which I have tried to reproduce below; any inaccuracies or omissions are mine.

Under the general heading of SGML Applications, Rubinsky stated that the Text Encoding Initiative now includes fifteen affiliated projects, with a combined funding of $30+ million, and involving approximately 100,000 people. He also reported that the European Workgroup on SGML (EWS) is continuing to develop its MAJOUR DTD -- which will now incorporate parts of the AAP's work for its body and back matter; the European Physics Association are to adopt MAJOUR and also intend to campaign for changes to the AAP DTD. Rubinsky stated that the Open Software Foundation (involving such companies as IBM, Hitachi, HP, and Honeywell) are to use SGML for all their documentation. Also the French Navy are to have their suppliers provide documentation marked up with SGML, and the CALS initiative itself remains very active.

With regard to international standards, Rubinsky noted that HyTime was about to go forward as a Draft International Standard, with balloting running until April 1992. Meanwhile, the Document Style Semantics and Specification Language (DSSSL) has been approved as a Draft International Standard, and the Standard Page Description Language (SPDL) was due to finish balloting at the end of October -- after which Rubinsky expected it to move into being a standard. He also reminded everyone that ISO 8879, the Standard dealing with SGML, is now entering its review period; any subsequent version(s) will be backwardly compatible with existing applications and systems. Rubinsky said that the current intention is to maintain a database of all comments made about ISO 8879, which will be submitted with markup conforming to a specially developed DTD.

Rubinsky mentioned several interesting projects and initiatives. Microsoft is about to release an updated version of its "Bookshelf" CD-ROM, which will be partially coded with SGML markup. The ACM is to extend use of SGML to cover its documentation, whilst the group of banks involved in the SWITCH scheme will produce all their internal documentation with SGML markup. The CURIA project at University College, Cork (Ireland) will spend ten years marking up Irish literary manuscripts, and the publishers Chadwyck-Healey have recently released a CD-ROM containing the works of 1350 poets marked up with SGML. Rubinsky also referred to the work of the Text Encoding Initiative (TEI) and that of the SGML Forum (Japan) -- the Japanese chapter of the SGML Users' Group, who have recently produced a piece of software called "SGF", a Simple SGML Formatter.
The Canadian Government have stated that they will be producing all their national standards using SGML marked-up text. In France, three dictionaries are being produced with the help of SGML, whilst a number of car and aeroplane manufacturers have begun to use SGML. In addition, Rubinsky spoke of the work being carried out at the Biological Knowledge Laboratory (Northeastern University, USA) to produce a body of on-line SGML texts and knowledge-based tools. He also cited the work of the British National Corpus -- a collection of around 100 million words, tagged according to the guidelines produced by the TEI.

Rubinsky briefly noted that three relevant books had appeared in the past year: Goldfarb's "The SGML Handbook", a Guide to CALS (no authors mentioned) and a bibliography on SGML (produced at Queen's University at Kingston, Canada). He then went on to list the plethora of new products, technology and services that had appeared since SGML '90, including the following [please forgive any omissions]: the ARC SGML Parser (now installed at 100+ sites and ported from DOS to several versions of UNIX), AIS/Berger-Levrault's "SGML Search", ArborText's "SGML Editor/Publisher", Avalanche's "FastTag" (now available for DEC workstations), Agfa CAP's next generation of CAPS software, Bell Atlantic's SGML-based hypertext product, the Computer Task Group (consultants), Electronic Book Technologies' new version of "Dynabook", E2S "EASE", Exoterica Corporation's "Omnimark", Office Workstations Limited (OWL)'s "Guide Professional Publisher" (which creates hypertext versions of SGML documents), PBTrans (a context-sensitive replacement program to be released as shareware), and the SEMA Group's work on its "Mark-It" and "Write-It" products.

On the corporate front, Rubinsky said that Framemaker are committed to producing an SGML-based version of their product by the end of 1992. Thirty percent of Exoterica Corporation had recently been sold to a large French firm. WordPerfect Corporation have said that they will enable users to manually tag documents (although there is no stated time-frame), and IBM have also announced a commitment to providing users with SGML facilities. Rubinsky closed by citing various published items on SGML use, and remarking upon work to produce the Publishers' Interchange Language (PIL), which will be based on SGML.

3.2 "SGML - The Wonder Years" -- Pam Gennusa (Consultant, Database Publishing Systems)

Gennusa began by reminding the conference that the fifth anniversary of ISO 8879 had only just passed; she stated that her presentation would look at what had happened over that five-year period, and how we could expect to see the use of SGML evolve. Indulging in a little wordplay with the third letter of the acronym SGML, Gennusa suggested that the Standard Generalized MARKUP Language had been the direct result of the decision to begin using a generic tagging scheme for markup. However, recent practice has been to treat SGML as a MODELLING language, where Document Type Definitions (DTDs) are written to model the "content reality" of a type of document. Nowadays, there is a growing recognition that SGML provides the support of a document MANAGEMENT language -- with the creation of DTDs that assist in the management of document instances and applications. Gennusa had observed that a number of similarities have emerged in SGML usage.
Conventions for DTD construction are beginning to appear -- for example, placing all entity declarations at the start of a DTD, typically having attribute list declarations immediately below element type declarations, and conventions for naming parameter entities. The need for modelling information corpora rather than the information contained in individual documents has also been recognized -- with the emergence of the HyTime Standard, work on Interactive Electronic Technical Manuals (IETMs) for the U.S. Department of Defense, and the AECMA 1000D specification for technical information for the European Fighter Aircraft.

Gennusa also noted that the use of SGML has become increasingly mature. She claimed that there has been "more concentration on content over pure structure", "a more generic notion of behaviour mapping", and "more flexibility in instance tailoring". Also, there were now new definitions of the term "document", and the relationship between use of SGML and databases had begun to strengthen and be consolidated.

Gennusa raised the "ol' saw" of content-oriented vs structure-oriented markup, and suggested that publishing is only a reflection or view of the real information contained within a "document". She suggested that information and its uses should always be borne in mind, and that the work of a DTD writer "must reflect not just the document but [also] the application goals". Indeed, Gennusa suggested that it was strongly inadvisable to try and write a DTD without establishing how the information will be used; this requires careful document and application analysis. She briefly mentioned how early use of SGML had concentrated on separating content from format, but noted that there now seemed to be a movement towards the use of `architectural forms' (which she described as "... indicative of a class of behaviour that would be manifested differently in different environments", and also noted that "... multiple elements within a DTD may have the same architectural form, but require unique element names for various reasons").

Gennusa stated that people's definition of a `document' is changing, with the advent of hypertext, meta-documents, and text bases. However, SGML is keeping pace with these changes, through the introduction of HyTime, meta-DTDs and modular DTDs. She also felt that there was an increase in document instance flexibility, through such practices as the use of parameter entities for content models, the use of marked sections, and the use of publicly identified declaration sets.

Gennusa then discussed work on the Interactive Electronic Technical Manuals Database (IETMDB) DTD for the U.S. Department of Defense. The DTD is known as the Content Data Model (CDM), and attempts to model all the data required for the maintenance of a weapons system in such a way that selected sections can be extracted for viewing on-line or on paper. This is the first use of HyTime's linking and architectural form features within a defence environment. AECMA 1000D is the specification for technical information relating to the European Fighter Aircraft. The primary object of the system is a `data module', which contains a data module code, management information, and content of a particular type. The application makes extensive use of publicly identified element and entity declaration sets (with one declaration per element across all the declaration sets), marked sections, and the use of parameter entity replacement text as content models.
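To make these conventions concrete, the fragment below is a minimal sketch of my own (the element and entity names are invented, and are not taken from AECMA 1000D or the CDM): parameter entities are declared first and reused as content models, each attribute list sits directly below its element declaration, and a marked section controlled by a parameter entity switches optional declarations on or off.

   <!-- Hedged sketch only: names are illustrative, not from any published DTD -->
   <!ENTITY % draft    "IGNORE"                 >
   <!ENTITY % bodymix  "para | note | graphic"  >

   <!ELEMENT dmodule - - (dmcode, maninfo, (%bodymix;)+) >
   <!ATTLIST dmodule
             id        ID                  #IMPLIED
             security  (open | secure)     "open"    >

   <!ELEMENT dmcode   - O (#PCDATA) >
   <!ELEMENT maninfo  - O (#PCDATA) >
   <!ELEMENT para     - O (#PCDATA) >
   <!ELEMENT note     - O (#PCDATA) >
   <!ELEMENT graphic  - O EMPTY     >

   <![ %draft; [
   <!ELEMENT dnote    - O (#PCDATA) >
   ]]>

Because %draft; is set to IGNORE, the declaration inside the marked section is skipped; redeclaring the entity as INCLUDE turns it back on. Switching parameter entities in this way is the general mechanism by which applications of the kind described above tailor publicly identified declaration sets without editing them.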
Looking ahead, Gennusa identified a number of important developments on the horizon. She expected to see a strengthening in the relationship between SGML and object-oriented programming, systems and databases. Data dictionaries will be used to control the semantics used within an application, and the use of architectural forms will increase. The number of documents that only ever exist within an SGML environment will grow, and document instances will increasingly have content-oriented rather than structure-oriented markup. Gennusa felt that although the work with SGML was far from complete, there was no possibility of "turning back" [but if the President of the SGML Users' Group said anything different, I would be very worried!]

On a light-hearted note, Gennusa closed with some remarks on the theory of the "morphogenetic field" (M-field), which postulates that once a piece of information has been acquired by a particular portion of the population of a given species, this knowledge will `become available' to all the members of the species (and their descendants) via resonance across the M-field -- whatever their physical location. Gennusa hoped that the number of participants at SGML '91 would be just sufficient to spread recognition of the value of SGML as an enabling technique for information producers and consumers, across the M-field of our species.

3.3 "Grammar Checking and SGML" -- Eric Severson (Vice President) and Ludo Van Vooren (Director of Applications, Avalanche Development Company)

This presentation was aimed primarily at writers of (technical) documentation within a commercial environment. Severson and Van Vooren began by giving the reasons for applying standard practices to document handling procedures -- ie to provide an impression of corporate identity and consistency (in terms of writing style, information structure and layout etc), and also to enable uniform and flexible processing. Standardized formatting can be facilitated by separating document form and content, making use of generic markup and style sheets, and adopting appropriate international standards (such as SGML, DSSSL, FOSI, ODA etc). The contents of documents can also be organized in a standard fashion -- by ensuring the separation of content and structure, using object-oriented hierarchical structures, and employing rule-based systems (eg with DTDs) to enforce occurrence and sequence in structures. The contents themselves can also be standardized, through the adoption of well-defined writing styles and techniques; this could involve anything from passing text through spelling/grammar/readability checkers, to adopting strict rules on vocabulary, sentence construction etc (eg Simplified English).

Severson and Van Vooren then spent some time looking at the capabilities of the state-of-the-art tools that are available to help standardize the writing of text -- such as spelling checkers, thesauri, readability indexes and grammar and style checkers. Of course, authors can also adopt self-defined rules on grammar, vocabulary, sentence construction etc. Using a markup system based on SGML makes it easier to apply any standard practices and rule checkers to specific objects within a document structure. Moreover, any automated checking procedures can extract information about the nature and context of an object from the relevant (SGML) markup.
Severson and Van Vooren then gave some examples of how markup could be used as a basis for recognizing, and perhaps even automatically correcting, any non-standard text occurring within a document. Thus, a poorly written warning could be picked up by a grammar checker suitably primed to look for certain features within any text element marked up as a warning; readability checkers could be set up to accept different levels for different elements in the text (eg to reflect that help text should always have a much lower readability level than, say, a technical footnote); different user-defined dictionaries could apply when spell-checking certain sections of a document.

On a slightly different note, Severson and Van Vooren closed with a brief discussion on how grammatical understanding can improve the performance of auto-tagging applications. For example, sentences and paragraphs can be checked for completeness before they are marked up as such (which is particularly useful if text is split across columns or pages). A grammar checker can also help to parse sentences in order to recognize (and tag) elements such as cross-references, citations, and so on.

3.4 "Attaching Annotations" -- Steven DeRose (Senior Systems Architect, Electronic Book Technologies)

DeRose gave the first paper to deal with a single (technical) issue in detail -- though without going into the minutiae of his topic. He began by raising the question of what we mean by the term "annotation", and suggested that the term could be applied to any "text or other data supplemental to a published document". (By "published document", I took DeRose to be referring to a time-fixed release of information intended for human consumption -- such as a particular version of a specific article stored within a hypertext database.) Thus "annotation" includes anything which is not part of the document as published, and excludes any author-created supplements to the published text -- such as footnotes or sidebars. This interpretation implies a distinct separation between the work of the author, the published document, and any reader-supplied annotation. Since DeRose's work with Electronic Book Technologies focuses primarily on the generation of and access to hypertext, his presentation concentrated on the problem of annotating electronic (hyper)texts.

Having defined what he meant by "annotation", DeRose raised the general question of where annotations should be stored -- inside or outside the document? The main advantages to storing annotations inside a document are that it is easy to see where they attach, and it is easy to keep them attached to the correct point if the text of the document is edited. However, he identified a number of problems with this approach, for example the fact that readers would be allowed to change a published document -- which may be undesirable in itself, but which might also involve checking a reader/annotator's authority, risk corrupting the original content or invalidating its markup, and so on. Moreover, it would be necessary to identify any reader-annotation as distinct from the original content -- which might require changes to the document's DTD, re-validation etc -- and there would still be practical difficulties to overcome, such as how to stop copies of the same published document getting out of synch, and how to robustly attach annotation if the reader is given no write-permission, or the document is stored on a read-only medium (eg CD-ROM).
Storing annotations outside of the published document resolves many of these problems because there is no danger of the original being changed or corrupted; however, this approach raises difficulties of its own. DeRose suggested it makes it harder to specify exactly where in a document an annotation attaches, more difficult to keep annotations with the relevant part of the document as it is edited, and the process is further complicated if the document's publisher and reader are not on the same network.

DeRose then discussed several ways to attach annotation to an SGML document. The `natural' way would be to use ID/IDREF attributes, but the problem with this is that most elements in a document do not have IDs. Another technique might involve identifying a path down the document tree using named nodes (eg BOOK, then CHAP4, then SEC3, then PARA27); this avoids having to take into account the actual contents of the document (such as chapter titles etc) or the data type (eg CDATA), and is easily specified and understood by humans. However, changes in the structure of the document tree might cause problems. A related approach would be to specify a path down the document tree using attributes (DeRose gave the following example: "WORK with name=Phaedo, SECTION N=3, LINE N=15"), or by using unnamed nodes (such as "Root, then child 5, then 4, then 27") -- however, there could still be complications if the tree was altered. Other techniques which DeRose briefly mentioned included using element numbering (which he described as "well-defined, but inconvenient"), using token offset ("poorly defined") or using byte offset (which he dismissed as "Not even well-defined in SGML" [because?] "Nth data character is not what the file system tells you").

DeRose felt that the main challenge to the successful handling of annotation was how to cope with changes to a document (and thus its underlying tree structure). He identified two ways in which the path through a document tree could be altered, or "broken" -- either "perniciously" or "benignly". Pernicious breaking occurs when it is impossible for an application program to tell that the path through a document tree has been broken, and this could cause severe problems. Whereas, with benign breaking, at least the program would be able to recognize (and inform the user or system) that an unrecoverable break had occurred in the path that enabled the identification of the annotated text.

DeRose then considered in greater detail how each of the various techniques for attaching annotation to an SGML document that he had outlined earlier would be affected by changes made to the document tree. If element IDs/IDREFs had been used, DeRose suggested that the link between the annotation and the relevant text would remain very stable; authoring software could help to prevent the (accidental) re-assignment of IDs/IDREFs by the document's publisher or annotator, and if such a link were to fail it would do so benignly. If the use of a path of named nodes had been adopted, DeRose felt that any "long-distance breakage" should be very unlikely. He pointed out that breaks would only occur if a very restricted set of elements changed -- such as if a direct ancestor, or an elder sibling of an ancestor, was added or deleted.
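To make the first two of these addressing schemes concrete before turning to DeRose's comparison below, here is a minimal sketch of my own (the element names, IDs and attribute names are invented for illustration, and are not taken from DeRose's talk) of an annotation held outside the published document, pointing at its target both by ID and by a named path, and recording the target version:

   <!-- Published document, with IDs assigned by the publisher -->
   <chapter id="ch2">
   <title>Installation</title>
   <para id="p27">Connect the signal cable before applying power.</para>
   </chapter>

   <!-- Reader's annotation, stored separately from the published document -->
   <annot target="p27"
          path="BOOK CHAP2 SEC1 PARA27"
          resp="reader"
          type="correction"
          version="1.2">
   <para>On model B units the power must be applied first.</para>
   </annot>

In this sketch the target ID survives most edits and fails benignly if the paragraph is removed, whereas the named path only breaks if an ancestor (or an elder sibling of an ancestor) changes -- which is exactly the behaviour DeRose goes on to compare.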
DeRose gave an example, stating that "Named path pointers into Chapter 2 are safe no matter what changes happen within other chapters" -- and argued that if such path pointers broke, they would be likely to point to a node which no longer existed, and thus would behave benignly. DeRose believed that a path based on the attributes of document tree nodes would perform in a very similar way to a path of named nodes, and would be equally benign. However, he felt that the technique of using a path through a document tree based on unnamed nodes (eg "Root, then child 5, then child 4 ... etc") would be less likely to be benign when it broke. For despite the fact that breaks would only occur when changes were made to certain elements, with mixed content models "the number of "elements" would include CDATA chunks, and [so] the count can change subtly". DeRose held that the other techniques he had discussed would all break perniciously; thus a path based on element numbering would break perniciously if any tagged element was added/deleted/moved and it preceded the element the annotation pointed to -- similarly if a preceding word was added/deleted/moved with a token offset-based path, or a preceding character with a byte offset-based path.

DeRose then compared the conventional (largely human-centered) approaches to identifying sections of texts with the techniques he had been discussing. He suggested that using IDs/IDREFs, or stepping through a tree using named nodes or node attributes, corresponded well to the way human beings typically refer to (parts of) texts. For example, we might use a scheme of serial numbers to identify and refer to a particular paragraph in a manual; we might reference text in a book by its title, chapter, section and paragraph number, or a piece of verse by the name of the work, section and line number. DeRose suggested that identifying nodes in a document tree through a system of numbered family relations (eg "Root, then child 5, ... etc") was an approach familiar to computer programmers, whilst referencing solely on the basis of sequential numbering, token offset, or byte offset was only used by "counting machines".

DeRose then looked at the sort of information one would want to keep with an annotation -- chiefly details about the annotation itself. It would obviously be necessary to know what the contents of the annotation were and where it attached in the relevant document -- but it could also be very helpful to know who made the annotation, when, and what was its purpose/function/type (eg criticism, support, correction etc). There might also be some need to say how the annotation should be presented -- although DeRose seemed somewhat sceptical of this. However, he was very certain of the fact that for any document processed electronically, it would be necessary to record the version of the target document to which an annotation applies.

Looking to the future, DeRose discussed some of the features encountered in the special case of annotating hypertext documents. In a hypertext system -- where a document is stored as a web of nodes rather than a hierarchical tree structure -- it would be necessary to have at least two links to attach annotation: one between the annotation and the annotated node, and the other between the annotation and the node which connects the annotated node to the rest of the document web; if there were several other nodes which connected to the annotated node, they would all need to know about the annotation.
[I am not sure that I understood DeRose fully on this point, and may therefore have misrepresented his argument]. DeRose also suggested that in a hypertext [hypermedia?] system, each end of an annotation link would need to specify "what application can handle the data"[?]. DeRose also noted that in a hypertext system, an annotation might be attached to a node that "was not an element". [By which I was unclear whether DeRose meant "not a [single] element", "not a [small] element", "not a [text] element", or something else]; his suggested solution was to "Point to [the] lowest element, then [to an] offset within". It is also conceivable that a reader might want to attach an annotation that crosses element boundaries, although DeRose felt that this would be unlikely (as people usually want to refer to elements), and could be resolved by simply having the annotation attached to its start and end points. Similarly, if an annotation applied to multiple/complex/discontinuous hypertext nodes, this could be resolved by putting the information about each attach point in a specially created node. In general, DeRose's advice was for designers to implement the most flexible and robust pointers that they can, so that any breaks will be benign. He also indicated that the emerging HyTime standard is developing a range of techniques for handling annotation and related problems (such as activity- and amendment-tracking).

3.5 "Using Architectural Forms" -- Steve Newcomb (President, TechnoTeacher Inc)

Newcomb has been closely involved with work to develop ISO/IEC DIS 10744, which describes the Hypermedia/Time-based Structuring Language (HyTime), and he is also the Chairman of the SGML Users' Group Special Interest Group on Hypertext and Multimedia (SGML SIGhyper). He began his presentation by reminding listeners that voting on ISO/IEC DIS 10744 had started only eleven days earlier, and urging all interested parties to obtain a copy of the document as soon as they can. Newcomb then turned to his main theme, the use of architectural forms, which are a direct development of work on HyTime. [I was unfamiliar with this topic, and strongly suggest that any interested readers should obtain a copy of ISO/IEC DIS 10744 and/or get in touch with SGML SIGhyper].

Newcomb suggested that two sets of practices had recently emerged in the computing world -- the use of OOPS (Object-Oriented Programming Systems), and the use of generic markup (especially SGML). Although both came from quite different sources, they share a fundamental concern about how data will be used. Newcomb noted that one of the main principles underlying the development and use of SGML was to enable the re-use of documents. However, he also noted that whilst a particular group of people might find that SGML facilitates their re-use of their own documents, different groups have different perceptions of what is important in a document. Thus rather than adopt, say, a publicly registered DTD for the document type `report', many groups of SGML users have been writing their own uniquely tailored DTDs (or amending DTDs they have obtained from a public source). Such behaviour limits the re-usability of documents which are, ostensibly, of the `same' type -- as it becomes impossible for one group of users to easily re-use the documents created by another.

Newcomb described the problem as one of "creeping specialization and complexity", with the only foreseeable result being a return to the "Tower of Babel" document interchange scenario which prompted the development of SGML in the first place! Newcomb's suggested solution was that we (the SGML/HyTime/user community) should only attempt to standardize the truly common parts of a document [type]. Architectural forms enable the creation of a set of rules at a meta-level; all the objects created as part of one set of rules (ie as part of a meta-class) inherit all the properties of that meta-class.

Newcomb gave the example of having to write a DTD for some patient medical records. He argued that there would be certain information that it would always be necessary to know (eg name, sex, age, blood type, allergies etc), but there would be other pieces of information which would become more or less important over time. Since the writers of the DTD have no way of knowing how the second type of information will be re-used in the future, there is every likelihood that they will write the DTD to reflect their current priorities -- so making all of the information less easily re-usable. Newcomb's solution would be to use architectural forms in the DTD to define those elements whose information content will remain of fixed importance over time; this should help to guarantee that crucial information will be available for future re-use. Thus, in an on-line environment, element components (based on HyTime's architectural forms), user requirements, and the SGML syntax are all combined to produce a user-created DTD (with entities, elements, attributes etc, and all the other features of a traditional SGML DTD) for the generation and manipulation of hyperdocuments.

Newcomb put up a series of slides, illustrating the general HyTime approach to architectural forms, and some specific examples taken from DTDs. Since I cannot reproduce the slides here, and do not feel that I understood the subject well enough to summarize Newcomb's examples and discussion, I would direct readers to ISO/IEC DIS 10744. In conclusion, Newcomb stressed the fact that any conforming SGML parser that currently exists should be capable of parsing a HyTime document. Newcomb said that HyTime is an application/extension of SGML (ISO 8879), and is not intended as a replacement or hypertext equivalent. The development of architectural forms would appear to be another step towards ensuring the re-usability of SGML documents, without necessarily involving a move to a (HyTime-based) hypertext/multimedia environment.
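Since I cannot reproduce Newcomb's slides, the fragment below is a purely illustrative sketch of my own of the general pattern (the local element names are invented; the fixed HyTime attribute and the contextual link form "clink" follow ISO/IEC DIS 10744, to the best of my understanding): two differently named elements both declare conformance to the same architectural form through a fixed attribute.

   <!-- Both local elements conform to the HyTime contextual link form
        "clink", so a HyTime engine can treat them identically even though
        their names, content models and extra attributes are local. -->
   <!ELEMENT xref     - - (#PCDATA) >
   <!ATTLIST xref
             HyTime   NAME    #FIXED "clink"
             linkend  IDREF   #REQUIRED >

   <!ELEMENT seealso  - - (#PCDATA) >
   <!ATTLIST seealso
             HyTime   NAME    #FIXED "clink"
             linkend  IDREF   #REQUIRED
             remark   CDATA   #IMPLIED  >

Only the truly common behaviour (here, linking) is standardized at the meta-level; everything else about the elements remains free for each group of users to tailor.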
CASE STUDIES -- Various speakers

3.6 "Implementing SGML for the Florida Co-operative Extension Service" -- Dr Dennis Watson (Assistant Professor, University of Florida)

Watson gave a brief overview of the approach adopted at the Institute of Food and Agricultural Sciences (IFAS) of the University of Florida to solve the documentation needs of the Florida Co-operative Extension Service (FCES). IFAS not only supports FCES, several hundred scientists, and dozens of County Extension Offices, academic departments, research and education centres, it does so with the express purpose of educating students and strengthening the dissemination and application of research. Moreover, IFAS needs to be able to distribute documentation in a huge variety of paper-based and electronic forms.
The IFAS approach is for authors to produce their documents using WordPerfect, then pass them on to an editorial team who are responsible for reviewing the documents and using automated techniques to convert them into suitable formats. Over time, this approach had been growing progressively unmanageable and inefficient. IFAS adopted an SGML-based approach quite recently -- having their first tutorial in October 1990. Watson's team developed a WordPerfect style sheet which can be incorporated into the authoring environment through WordPerfect's style feature and additional pop-up menus. Authors are offered a range of options under the general areas of `File', `Edit', `Headings', `Mark-up', `Tools' and `Set-up'. Using WordPerfect's built-in option to reveal (hidden) codes, it is possible for authors to see how the `styles' have been incorporated into the body of the text -- with embedded tags indicating where each style starts and stops. It is then possible to take (a copy of) the resulting files and replace the embedded style tags with suitable SGML markup. The SGML files can then be processed for storage within a database, from where they can be easily retrieved for publishing on paper, on-line, as hypertext, on CD-ROM etc.

Since the summer of 1991, Watson and his team have been involved in document analysis and DTD development. They are aiming for compatibility with the AAP's (Association of American Publishers) DTD -- ideally a subset of the AAP article DTD, with as few locally unique elements as possible. They are intending to use a combination of custom-built and commercially available software to handle the process of converting to, and validating, the SGML documents. Whilst they have identified a number of limitations with their approach -- non-referenced figures, the omission of critical data from the original WordPerfect texts (eg publication number and date), and the requirement to add some local DTD elements -- they feel they have gained a great many benefits. Apart from streamlining the entire production process, the IFAS approach encourages authors to feel they have some control over the markup of their documents, whilst simultaneously helping to impose and maintain documentation standards. Having documents in SGML format facilitates interchange and re-use, and the break-down of documents into suitable "chunks" for database storage can now be done in an automated and controlled fashion.

3.7 "SGML User Case Study" -- Susan Windheim (Technology Consultant, Prime Computer Technical Publications)

Windheim described Prime Computer's experience in attempting to produce SGML-based technical documentation. Since 1989, Windheim's team had been investigating the possibility of delivering their documentation electronically (in the form of on-line access to quarterly-updated CD-ROMs) in order to improve overall quality and gain cost savings. However, if possible they wanted to be able to maintain the existing authoring environment, and also port existing documents into the new system; they also had a number of other factors influencing their choice of software (eg the filters/tools they would require). In April 1991, Prime began training staff in document analysis and DTD development; at the same time, work began on producing the necessary filters to integrate the current authoring environment into the new system.
Text was already being produced using troff and Interleaf -- which would both require filters -- and there was also a need to define processors for handling indexing and graphics, and to develop style sheets for the delivery system, DynaText (from Electronic Book Technologies). The goal was for Interleaf and troff documents to be filtered into a parsable SGML form, where errors could be corrected and the text browsed via DynaText; the result would also be stored in a text database for further editing and correction. The system was built using a variety of off-the-shelf software, such as FastTAG, the XTRAN parser/language, and Author/Editor, as well as custom software such as the troff and Interleaf filters, C programs and XTRAN programs.

Windheim said that Prime's experiences with available SGML products had shown that there is little choice, that each tool has an associated learning curve, and that each has limitations and makes compromises. For example, they found that FastTAG could not support some of their DTD features, and that more system variables would be required to enable it to differentiate between all the text objects they wanted to target. With DynaText, they had encountered difficulties with horizontal and vertical lines, dynamic formatting, and processing instructions. With regard to their DTD, Windheim's team found that it was being altered too frequently -- with the consequence that legacy SGML documents no longer conformed to the latest version. The possible solutions to this problem included default translation of markup that did not directly map to that defined in the latest version of the DTD, editing old documents so that they would parse, writing a more lenient DTD, or creating alternate DTDs (rather than relying on developing the `same' one). Other DTD issues included whether or not Prime should deviate from SGML's Reference Concrete Syntax, and whether they should tailor their DTD(s) to take into account the limitations of their off-the-shelf applications.

Prime's aim is to deliver their documentation on a CD-ROM -- requiring a system to have a CD-ROM drive, X-terminal facilities, and a PostScript laser printer. Users will be offered full text indexing with suitable search and retrieval tools, hypertext links (that may be either automatically or author-generated), user bookmarks and annotation, a WYSIWYG display, and high quality hardcopy output. Windheim felt that using SGML brought a number of benefits, including the ability to search texts at the element level, flexibility in Prime's choice of applications, facilitating the move towards centralized document databases, ensuring consistently marked up texts, and helping Prime meet international requirements.

3.8 "STEP and SGML" -- Sandy Ressler (National Institute of Standards and Technology) [NBC]

At NIST they are looking at potential SGML uses within the STEP standards effort; this includes both STEP (Standard for the Exchange of Product Model Data) and PDES (Product Data Exchange using STEP). As with most standards activities, there are a large number of documents from a diverse set of sources, and SGML seems the natural choice for integration. They are seeking conformance with the DTD for ISO Standards. At the moment most documents are being written using LaTeX. In the future they will be aiming to convert LaTeX STEP documents into SGML, and they are starting to use Author/Editor as an SGML input tool.
3.9 "Multi-vendor Integration of SGML Tools for Legal Publishing" -- Francois Chahuneau (AIS/Berger-Levrault) Chahuneau described his company's work to develop a prototype SGML-based editorial system for the French publishing house Editions Francais Lefebvre. Although Lefebvre pass on SGML files to their typesetters, they are not directly concerned with printing and chiefly view SGML as the means to support the tools for the users of their on-line editing system. Lefebvre have been using SGML since 1988 and have developed DTDs for each of their main types of document - weekly periodicals, monthly periodicals, looseleaf central publications, and books. These DTDs share a large number of common elements and tags, and are required to support a variety of tasks; for example, the looseleaf publications consist of 180 mb of textual data, of which 20% has to be updated annually. Chahuneau presented a diagram of the system's general architecture -- which consisted of several (currently only writer/editor workstations connected via a LAN to a database server running BASISplus. Mounted on each workstation are a GUI (an OpenWindows application[?] to manage the user interface, LECTOR (an on-line browser), and an Editor (SoftQuad's Author/Editor). Using a series of slides showing workstation screen dumps, Chahuneau talked his audience through a typical user session. Using the software developed by Chahuneau and his team, users are able to freely browse the textual database, cutting and pasting selected sections of text between browsing and editing windows. However Chahuneau coined the phrase "cut and parse", to suggest the concept of any attempt to paste some text into a new document being subject to validation by a parser. The parsing actually relies on Author/Editor's rule-checking feature (which users can turn off), but Chahuneau and Co had developed the X-Windows command module to support cutting and parsing, and the communication module to relay everything to and from the database server. 3.10 "Developing a Hypertext Retrieval System Based on SGML" -- Tom Melander (Sales Engineering Manager, Dataware Technologies) [NBC] Dataware have been contracted by the United States Government Printing Office (GPO) to develop a hypertext retrieval system for a large database of legal texts. The GPO currently use their own tagging system, but intend to convert entirely to an SGML-based approach within the next five years. Melander based his presentation on an analysis of a small sample of typical text taken from the current database - complete with embedded tags, non-printing characters etc. Dataware have created tables which identify each of the various textual elements and describe their function. They have used these as the basis for writing a collection of C routines through which the text can be passed - to replace any non-printing characters, and substitute the appropriate SGML conformant markup for the original tagging scheme. Melander gave examples of the C code of some of these routines, and showed how the original text was progressively translated as it was passed through each routine. Once all the texts have been suitably tagged, Dataware will make them available on CD with a hypertext interface. 
Melander closed by identifying the six key additional features which the conversion to hypertext would bring, namely a complete table of contents, support for fielded browsing, the ability for users to follow cross-references and to insert their own bookmarks and notes, and support for a history mechanism (though I was unsure if Melander meant a history of document amendments, of user activity, or both!)

3.11 APPLICATION TOPIC #1: "Data for Interactive Electronic Technical Manuals (IETMs)" -- Eric Freese (Senior Technical Specialist, RYO Enterprises)

This presentation was primarily aimed at attendees with an interest in CALS. Technical documentation is estimated to cost the USAF $7.5 billion per year, and involves in excess of 23 million pages (of which 19 million are authored and maintained by contractors and the remaining 3 million by the USAF itself). Freese described the evolution of the technical manual as going through three stages:

* The past/present situation, where documentation is created, distributed and used on paper.
* The present/future situation, where digital documents are produced electronically and distributed on disk.
* The future situation, where IETMs will be produced by creating documents electronically, stored in large technical manual databases, and accessed via portable display devices.

Freese pointed out that even today, most documentation data is produced as paper, with only a limited amount accessible on-line. By 1999, Freese expects to see about 30% of data produced as IETMs, distributed on optical disks and accessible on-line, with the remainder still appearing in more traditional document-based forms. However, by 2010 Freese believes there will have been a shift of emphasis towards the processing of information -- with data being held in integrated databases which can be accessed interactively.

Freese then spoke briefly about Technical Manual Standards and Specifications (TMSS), in relation to his predictions for the future development of manuals. Freese felt that the current situation led only to a lack of guidance on standards for technical manuals, and that the CALS initiative was therefore poorly supported. For greater interoperability between systems, users, Government and industry, Freese argued that there would be a need for changes to the TMSS, following a co-ordinated policy between standards organizations to ensure consistency in specifications and standards.

Freese then outlined the main features of the role of IETMs, these being:

* To provide task-oriented digital data to the user.
* To allow for the development of a non-redundant database (ie where no piece of data is duplicated in several places in the database).
* To support the development of an Integrated Weapons Systems Database (IWSDB).
* To provide guidelines for the style, format, structure and presentation of digital data.

He then went on to talk about the chief characteristics of the data that makes up an IETM. The display medium will be electronic (rather than paper, as at present), and the data primitives will include not just text, tables and graphics, but also audio, video, and processes. All IETM data will be marked up with content tagging (and absolutely no format tagging) -- where each primitive is identified on the basis of the role or function of the data it contains, rather than its position in the logical document structure. The data will be organised in a fully integrated database, with relational links and no redundancy. The data will also be suitable for dynamic presentation, with context-dependent filtering as well as user interaction and branching.

Freese devoted much of the rest of his presentation to a closer examination of the layers that make up an IETM database (IETMDB), paying particular attention to the generic layer. According to Freese, the generic layer would probably be based on the HyTime standard, which would enable developers to draw on such concepts as architectural forms. He closed with a discussion of some sample SGML markup declarations for IETMs, and an examination of how such markup might appear in practice.
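Those closing examples are not reproduced in this report, but the fragment below is a hedged sketch of my own of what content tagging (as opposed to format tagging) of this kind can look like; the element and attribute names are invented for illustration, and are not taken from Freese's slides or from any military specification.

   <!-- Each element names the role of its data, never its appearance -->
   <task system="hydraulics">
   <warning>Relieve system pressure before disconnecting any line.</warning>
   <step><para>Remove the access panel.</para>
         <graphic ref="panelfig"></step>
   <step><para>Disconnect the return line and cap the fitting.</para></step>
   </task>

A presentation engine -- paper, screen, or portable IETM viewer -- decides how a warning or a step looks and behaves; nothing in the instance says bold, indent or page-break, which is the sense in which IETM data carries content tagging and absolutely no format tagging.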
3.12 International SGML Users' Group Meeting

This was the SGML Users' Group (SGMLUG) mid-year meeting, following on from the AGM held at Markup '91 in Lugano in May. Just before the meeting, Pam Gennusa, the SGMLUG chair, had asked me if I would be prepared to say a few words on the release of the ARC SGML Parser materials -- which were originally distributed from The SGML Project at Exeter University. I was, therefore, somewhat surprised when I was formally introduced as the first speaker (and it's always tough being the warm-up act). I began by apologizing for the slight hitch that had occurred when we first tried to release the parser over the academic network (at the behest of the SGMLUG and the material's anonymous donor). I briefly outlined what I knew to be in the collection of materials, and mentioned some of the methods and locations from which they could be obtained. I took the opportunity to point out that the SGML Project is not actively engaged in any SGML development and that our familiarity with the code and use of the Parser materials is actually very limited. I expressed some regret at the fact that the original intention for the SGML Project to act as a clearing house for bug reports and ports seemed not to have been realized in practice -- however, I was very gratified to learn that the materials were now widely disseminated and widely used amongst the SGML community. The only port of which I had been directly notified was that performed by James Clark to a UNIX environment (from DOS) -- but other information had been sparse. Following these remarks, two other attendees said that a port to the Apple Mac environment was nearing completion, and that there appeared to be a bug in the ARC SGML parser relating to the use of CONREF in a HyTime document. [Since the conference, I have received no further information about either of these points].

Pam Gennusa then gave a brief account of how the AGM's decision to employ a part-time secretary for the secretariat functions of the SGMLUG had been carried out; Ms Gaynor West now fills this role. She also raised the general possibility of re-structuring the SGMLUG in some way other than the current system of national chapters and special interest groups (SIGs). There then followed reports from the various chapters and SIGs, starting with the representative from the Dutch Chapter (Mark Woltering). He reported that they now have about 120 members and were able to hold two meetings in the past twelve months -- both of which have been well-supported and active. He also stated that the Dutch Chapter is heavily involved in the European Workgroup on SGML (EWS) and had helped to develop the MAJOUR header DTD which was released at Markup '91.
The EWS was now working on the creation of a body and back matter DTD, but was unsure whether they should risk creating their own DTD `from scratch', or absorb the relevant parts of the AAP's DTD (and try to influence future amendments to that). The representative from the Japanese Chapter (Makoto Yoshioka) gave a brief outline of the rather different composition of their Chapter -- consisting of forty large companies and organisations rather than individuals, each paying a subscription in the order of Y500,000 (which evoked a mixture of gasps and sobs). Yoshioka had brought with him two products which members of the Japanese Chapter had developed and were prepared to distribute gratis to members of the SGMLUG. The first is SGF (Simple SGML Formatter), which is a DOS [Windows?]-based application that formats documents with embedded SGML markup for output on, say, an HP LaserJet printer (or for previewing on screen). Yoshioka said that SGF is still being developed, and that additional printer drivers would probably be added, yet he hoped to be able to distribute the executable code with documentation in English very shortly (with the source code available at a later date). The SGMLUG meeting overran its time allocation and was continued the following evening -- when Yoshioka gave a demonstration of SGF; unfortunately, I missed much of his presentation but hope to obtain a copy from the SGMLUG (for possible distribution, with permission). The other product developed by the Japanese Chapter, vz, was described as an `input editor'; if there was a demonstration of it, I neither heard nor saw anything. However, I was interested to learn that the Japanese make use of the ARC SGML Parser.

The only other reports that I heard were from various North American groups/chapters. The "SGML Forum of New York" reported that they had held their first meeting in April, which had attracted about 50 attendees. The decision was taken to allow both individual and corporate members, of which there are around twenty that are fully paid-up; they also decided to form a non-profit-making company. On September 24th, the Forum held the "SGML and Publishing Case Study", which also attracted about 50 people. Their short-term goals include the setting up of an electronic bulletin board to disseminate information on SGML, sample DTDs, the ARC SGML Parser, and down-loaded postings from the newsgroup comp.text.sgml. They also intend to co-sponsor a one-day introductory seminar on SGML in conjunction with the Electronic Publishing Special Interest Group (EPSIG). Other reports came from the Canadian Chapter, who now have a paying membership of about 15 (mainly from the publishing and pharmaceutical industries) who meet quarterly, the Mid-West Chapter -- who are sponsored by Datalogics [?] and due to start meeting on October 29th -- and the [Washington?] DC Chapter, who are sponsored by UNISYS [?]. Several other reports were probably made at the second half of the SGMLUG meeting, and I refer interested readers to the next SGML Users' Group Newsletter.

4. PROGRAMME - Day 2

REPORTS FROM THE FRONT -- various speakers

4.1 "OSF's Pursuit of DTDs" -- Fred Dalrymple (Group Manager, Documentation Technology, Open Software Foundation)

Dalrymple stated that the objective of the OSF (Open Software Foundation) was to "Develop one or more DTDs that enable document interchange among OSF, member companies, technology providers, licensees".
He also pointed out that although OSF had been responsible for developments such as Motif, they no longer saw themselves as tied only to the UNIX environment. Following their recognition of the need to interchange documentation, the OSF had originally opted to use troff (with the mm and ms macro packages) but, following negative feedback from their members, they had been forced to revise this decision in favour of SGML. In the first quarter of 1990, OSF put forward a proposal and began implementation of SML (the Semantic Macro Language). This was effectively another macro package, designed to replace the mm- and ms-specific macros with generic structural markup. SML was only intended as a provisional and temporary approach to markup, and in December 1990 the OSF issued a request for DTDs. By April 1991, the OSF had received DTD submissions from ArborText (tables and equations), Bull, CERN (based on IBM's BookMaster), Datalogics (AAP), DEC, HP (HP Tag), and IBM (based on BookMaster). Following presentations on the DTDs and the available SGML software, and contributions from various industry experts, the OSF organised a series of subgroups to examine issues in more detail and produce position papers. By August 1991, it was necessary to refine the position papers and produce a requirements matrix. Since October 1991, a design group has been working on the OSF's DTDs -- exploring the practicalities of certain issues, and reporting to the various subgroups. The group is using a combination of top-down and bottom-up approaches to try to identify the common elements shared by the proposed DTDs; however, their main emphasis is on facilitating document interchange. To this end, they have decided to adopt the reference concrete syntax and permit no markup minimization -- although they may extend the permitted lengths for names, attributes, literals etc.

Dalrymple said that the next phases of work for the OSF will involve the creation of an analysis matrix and the specification of the OSF DTD, followed by its implementation, documentation and eventual publication. The OSF are keen that theirs should be the DTD that people automatically think of in connection with writing any computer documentation, and they want to ensure that its specification and implementation will prove satisfactory for the requirements of OSF members. Initially, the DTD will only be distributed amongst the OSF, but later they may release it into the public domain; in either case, the OSF wants to set up a body to ensure that the DTD will be properly maintained. The OSF are also aware that they have yet to consider any formatting issues, such as FOSIs (Formatting Output Specification Instances) or the use of DSSSL (Document Style Semantics and Specification Language).

4.2 "The Text Encoding Initiative: A(nother) Progress Report" -- Lou Burnard (Co-ordinator of the Oxford Text Archive, Oxford University Computing Service)

This presentation served as both an introduction to the Text Encoding Initiative (TEI), and a progress report for those already familiar with its work. Burnard had a great deal of information to get across in a fairly limited amount of time, and despite his energy and enthusiasm (and pleas to the conference Chair) he was unable to get through all his slides.
[This seemed somewhat unfortunate, given that those involved in the TEI have devoted a great deal of effort to the problems of marking up real texts, and many attendees might have gleaned some useful tips if they had been given the opportunity to hear about the TEI's work in a little more detail].

The TEI is "a major international project to establish standards and recommendations for the encoding of machine readable textual data" (as used by researchers largely in the Humanities). Its main goals are to facilitate data interchange and provide guidance for text creators, and to this end the TEI has produced guidelines which address both what to encode, and how to encode it. When looking for a text encoding scheme, the TEI wanted something which had wide acceptance, was simple, clear, and rigorous, was adequate for research needs, and conformed to international standards. It also needed to be software, hardware, and application independent. As far as the TEI were concerned, SGML was the only choice.

Burnard then gave a brief overview of the organizational structure of the TEI, and outlined the main achievements and activities prior to the conference. He drew particular attention to the publication of TEI P1 ("Guidelines for the Encoding and Interchange of Machine-Readable Texts"), which had provoked a variety of reactions since its first release in July 1990. Burnard also outlined the procedures which would lead to the publication of the second version of the "Guidelines" (TEI P2) in January 1992, followed by TEI P3 in April 1992, and the final version in June 1992. He gave an indication of the sort of work carried out by the TEI's committees and work groups, and noted with regret that the following areas would probably not be covered satisfactorily: physical description of manuscripts, analytic bibliography, encyclopaedias, directories and other reference books, and office documents. Burnard stated that most of the reactions to TEI P1 fell into one of the four following types:
* It is too literary/linguistic/specialist etc and pays too little attention to my own needs.
* It is technically too complex/not complex enough.
* It is not didactic enough.
* It violates textual purity.

He then listed the six most frequently asked questions addressed to the TEI, along with the current standard replies. Thus, the TEI hope to make the "Guidelines ..." available in electronic form, but have not yet done so. People are free to use SGML's minimization features, but in order to conform to the TEI's "Guidelines ..." must not do so in any document which is to be exchanged between machines. The TEI will be enforcing its decision to adopt the ISO 646 subset. The TEI gives users the freedom to select from its standard tag set, and does not require people to use more than they need. The TEI is working on providing a simpler version of its "Guidelines ..."/tag set [?] for beginners. The TEI have yet to make a decision on what software it should recommend.

Burnard gave an indication of what TEI P2 would contain. It will include some Tutorial Guides covering the theory and practice of markup and how SGML can help (giving extended examples), as well as a "barebones subset of P2". (There will be one generic Tutorial Guide, and several aimed at specialists such as lexicographers, theoretical linguists, discourse analysts, textual critics, and so on.) A revised version of the "Guidelines ..." (TEI P1) will form the main part of TEI P2, and will consist of formal prose, an alphabetical reference section, and some DTDs.
The formal prose will introduce the basic notions, contain three major sections (the core and default DTD base, alternate DTD bases, and DTD toppings), and include sections of formal prose specifications for related sets of textual features. The alphabetical reference section of TEI P2 will be modelled on FORMEX and MAJOUR, with a generic identifier, a definition, and some indication of optionality being given for each element and its attributes. Moreover, the parentage, content, defaults, and the semantics of values are given for each attribute. However, TEI P2 will still not offer or recommend any software.

Burnard summarized the TEI's approach to DTD design, and defined some of the fundamental concepts and preferences that had been adopted (eg the notion of bound and floating elements, the concept of crystals, the alternation style of element declaration etc). Discussing DTD design in more detail, Burnard gave the rationale behind the TEI's decisions; comparing the process to designing a restaurant menu, he made the following observations:
* Adopting an a la carte (or mix`n'match) model gives users maximum freedom to choose all and only the tags that they want. However, it makes document interchange difficult.
* Choosing a menu (or set meal) model offers minimal freedom and is highly prescriptive. Of course, this makes document interchange very reliable.
* Using a pizza model gives users "freedom within the law", ie a limited ability to add or change tags, and makes document interchange fairly reliable.

Having adopted the `pizza' model approach, the TEI then decided to offer several types of base DTDs (either the TEI core, or a DTD suitable for spoken text, lexicography, mixed-form text etc) in conjunction with an appropriate choice of `topping(s)' -- for example hypertext, textual criticism or linguistic analysis.

Burnard then discussed the solutions that he believes the TEI has to offer. For users in general, it offers the following:
* a single coherent framework in which interchange can be carried out
* a set of tools and principles for user-defined extensions
* a standard for documenting the content and structure of electronic texts.
For those involved with literary and historical applications, it gives:
* sets of general purpose structural tags adequate to most surviving written material produced in the Western world during the last 2000 years
* a way of embedding arbitrary levels of interpretative annotation within WYSIATI [What You See Is All There Is?] texts
* a way of implementing an electronic variorum, in which every instantiation of a given text can be represented in parallel.
Lastly, for linguistic applications, Burnard felt that the TEI offers:
* ways of structuring and documenting the contents of large textual corpora, and of guaranteeing their re-usability
* ways of aligning and synchronising distinct components of transcribed speech
* powerful general purpose tools for the unification of linguistic analyses.

Judging from his collection of slides, Burnard had also wanted to talk about the TEI's approach to spoken language and the use of feature structures, but he was not given enough time. He would also have liked to mention the work of the twelve work groups -- reporting on their objectives and status as of 1st October 1991. Burnard closed by urging everyone at the conference to contact the TEI if they wished either to subscribe to the distribution list TEI-L or to obtain a copy of TEI P2.
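[For readers unfamiliar with the `pizza model', the following sketch may help. It is my own illustration, not an extract from the TEI's actual declarations, and the element and entity names are invented; it simply shows how SGML parameter entities and marked sections allow a base DTD to be combined with optional `toppings':]

    <!ENTITY % critapp "INCLUDE">  <!-- textual-criticism `topping' switched on -->
    <!ENTITY % verse   "IGNORE">   <!-- verse `topping' switched off -->

    <!-- core (base) declarations, always present -->
    <!ELEMENT text - - (front?, body, back?)>
    <!ELEMENT body - - (div+)>

    <![ %critapp; [
      <!-- extra declarations pulled in by the textual-criticism topping -->
      <!ELEMENT app - - (rdg+)>
      <!ELEMENT rdg - O (#PCDATA)>
    ]]>

[A user would select a base and toppings simply by setting the relevant parameter entities to INCLUDE or IGNORE before the DTD is parsed.]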
4.3 "TCIF IPI SGML Implementation" -- Mark Buckley (Manager, Information Technology, Bellcore) Buckley's presentation was substantially revised and bore little relation to the material contained in his handouts (see below). However, the following extracts from his handouts may be of interest to some readers: "The Information Product Information (IPI) Committee of the Telecommunications Industry Forum (TCIS) has emerged from a recognition of the need for members of the telecommunications community to exchange electronic forms of technical information and the consequent need for the voluntary adoption of standards that facilitate such exchange. In 1990 the TCIF IPI Committee recommended SGML for use where appropriate to facilitate the exchange of document text and began work on a Document Type Definition suitable for descriptive Telecommunications Practices, a common type of technical document. Draft 4 of the Telecommunications Practice (TP) DTD, the first public draft, was published in October of 1990 and distributed at SGML '90. Draft 6 of the TP DTD descends directly from Draft 5. Changes reflect the input of numerous comments and experiments with the earlier draft and have been made primarily to allow for: * easier modification for use in internal corporate languages and for use in various SGML tools available from vendors * greater flexibility in tagging To these ends, * We have stripped most internal comments from the DTD file. * We have consolidated the separate DTD and ELEMENT files into one DTD file. * We have redesigned content models to be descriptive rather than prescriptive. The new content models impose very little in the way of structural requirements. It is likely that TCIF or specific document producers and recipients will evolve stricter requirements with experience. Such constraints may be reintroduced into the DTD when appropriate or may be validated by processing applications. * We have added four general data content elements that greatly reduce the size of the compiled DTD in most environments. * We have further reduced the size of the DTD by consolidating what were separate elements into fewer general elements with "type" attributes. Above all, the DTD [has] been redesigned to function as a DTD specifying the syntax of a language meant for intercorporate document exchange. It is not likely to serve as the foundation of an internal SGML application without modification". Rather than simply duplicate the material contained in his handouts, Buckley chose to set the decision to design the TP DTD in context. He gave a detailed description of the nature and problems of the relationships between the numerous (American) telecommunications companies (and groups of such companies). There are a great many restrictions on the information that can be passed between groups/companies, and even if there were not, most of the companies are so large that they face real problems when trying to interchange documents between different divisions/departments. 4.4 APPLICATION TOPIC #2: "Rapid DTD Development" -- Tommie Usdin (Consultant, Atlis Consulting Group). In her presentation, Usdin examined the problems of DTD development, the Joint Application Development (JAD) methodology, and how JAD can facilitate rapid DTD development. She saw the latter as important because it is expensive, represents a recurring cost and can result in much wasted time and money as DTD often change rapidly when put into production. 
Usdin put the root cause of problems with DTD design down to the fact that it is being left to the wrong group of experts. She argued that in fact it is users who know most about how documents are created, the relevant document parts (their names and definitions), the rules governing document structure, and the final uses/purpose of the information. Usdin saw the root cause of the problem as being management's decision to ask SGML experts to perform document analysis, rather than the genuine document experts (namely, users). Usdin did not wish to imply that SGML experts are inept, merely that they are forced to rely on interviews with users, frequently inadequate/poorly selected samples etc, in order to produce the DTD that they believe is required. Not only is DTD development time-consuming and expensive, but outsiders (such as an SGML expert) tend to use language and examples with which users are unfamiliar. Furthermore, since an SGML application is only as flexible as its DTDs, these will often have to be fine-tuned to meet users' (real) needs -- and any rules enforced by the application may be resented and circumvented by irritated users.

The solution would appear to be to allow users to develop any DTDs themselves. Yet this is clearly impractical since they do not know, or want to learn, the intimate details of SGML syntax. Even if this were not the case, they would almost certainly lack the necessary experience of DTD development, and each would have only a limited view of the document life cycle. Usdin likened DTD development to systems development -- with conflicting requirements coming from many users, each of whom knows only part of the overall requirement. The value of an application depends heavily on the quality of the stated/perceived requirements, and user hostility can quash an otherwise good product. Usdin characterized the traditional method of systems requirements development in terms of the following steps:
i) Serial interviews of users
ii) Requirements document sent to users for comment
iii) Conflicts in requirements resolved by systems analysts based on:
  - sequence in which requirements were received
  - authority of the requestor
  - ease of implementation
iv) Comments incorporated as received.

She then displayed a pie-chart showing the reasons for code changes under traditional systems development -- which showed that about 80% were due to problems with either system design, or the requirements and analysis (the latter pair accounting for around 60% of all changes!). Usdin felt that this highlighted the important role of requirements definition -- not only at the development stage, but also during maintenance. As a solution to the problem of requirements definition, Usdin offered Joint Application Development (JAD) -- "a highly effective approach to gathering user requirements completely and efficiently, as well as reconciling differences among conflicting user requirements". Usdin described JAD as a "one time one place" highly structured forum for decision making -- a workshop-based process where requirements are stated in the users' vocabulary, and which is led by a facilitator trained in communications, systems analysis, and group dynamics.
Usdin made some very impressive claims for the JAD technique -- reductions of up to 50% in the time needed to define user requirements and functional specifications, improvements of up to 33% in the accuracy of user requirements and design documents, increased commitment to the software from users and management, improved system usability and a reduction in maintenance, plus enhanced communication between systems designers and users. She felt that the SGML community could learn from the experiences of the systems development community: the benefits of exploiting user knowledge, and the advantages to be had from using meetings to extract information, build consensus, and create a feeling of `user ownership'. Usdin felt this approach was faster and, consequently, better; it would also be more satisfactory for management, who want to see results rather than processes.

Usdin summarized rapid DTD development as "A method of harnessing users' knowledge of their own documents, information management process, and needs. Instead of having consultants learn your document structures, let your users do the document analysis". Its main goals are to reduce the time and cost both of DTD development, and of customization when new DTDs are brought into production. Based on JAD, rapid DTD development is an iterative, highly-structured, workshop-based process, in which users are encouraged to adopt a top-down approach to document analysis, and their consensus is sought at every stage. Information is collected via a forms-based approach -- where users are asked to complete forms detailing all the information relating to the document elements that they want and/or have identified (giving examples of the elements in context etc).

Usdin then described the workshop process in more detail. She said that a rapid DTD development workshop should consist of no more than fifteen people, and should include the SGML analyst/facilitator, authors, editors, and production and systems staff. The role of the facilitator (normally the SGML analyst) is to plan and manage the workshop process, create and explain the forms used, facilitate discussions, and help participants create acceptable compromises and build consensus. A typical workshop agenda (for the work to be done by the users) involves the following steps:
* Define the document type
* Select application standards
* Decompose the document into elements (top-down)
* Define, name, and document each element
* Describe element model and presentation format
* Identify, define and name attributes
* Identify, define and name general entities (ie what users might call "boilerplate" text).

For users who are authors and/or editors, in conjunction with the steps mentioned above they should also be encouraged to select tag names (since they are the people who are most likely to deal with them directly), and to identify element relationships. The production and systems staff need to ensure that adequate information is captured for presentation, and also to identify elements needed for routing, control and management. They should also identify any system constraints and learn the vocabulary, needs and concerns of other users. The role of the SGML analyst is to record the results of the workshop as a DTD suitable for parsing and validation; however, at this stage the DTD must remain revisable in accordance with user specifications (ie users must not be forced to comply with the DTD).
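[As an illustration of what recording the workshop results as a DTD might amount to, here is a minimal sketch of my own -- the element, attribute and entity names are invented for the example, not taken from Usdin's material. It shows a user-named element, an attribute requested by production/systems staff for routing, and a piece of "boilerplate" text captured as a general entity:]

    <!-- element named by the authors during the workshop -->
    <!ELEMENT warning - - (#PCDATA)>

    <!-- attribute identified by production and systems staff for routing/control -->
    <!ATTLIST warning
              audience (field | depot | all) all>

    <!-- "boilerplate" text captured as a general entity -->
    <!ENTITY stdwarn "Disconnect the power supply before servicing.">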
The SGML analyst must ensure that the process of document analysis is completely thorough (reminding participants of such factors as database requirements, future uses of the information, and non-printing elements -- such as routing, tracking and security information). Moreover, the analyst must ensure that the DTD conforms to any relevant standards (eg CALS), and must provide full documentation including a tag library, a hierarchical [tag] listing, and alphabetical indexes to element and tag names.

Usdin noted that rapid DTD development works most effectively when there are:
* Established documents
* Users who are knowledgeable about their own documents, applications, requirements etc.
* An experienced SGML analyst/JAD facilitator
In the light of the above, Usdin remarked that rapid DTD development would therefore be inappropriate for:
* Start-up operations
* Completely new document types
* Situations where there are no experienced users/authors
* Situations where there is no access to the users/authors
* An unstructured approach to data gathering
* Situations where the facilitator lacks SGML knowledge or JAD experience

However, when it is possible to produce DTDs using the rapid DTD development process, they have several inherent advantages. For example, the DTDs are based on the way users create, manipulate and use information, and employ user-defined names for elements and attributes. Such DTDs reflect the appropriate level of detail needed for current and planned products and are ready to use and fully documented in weeks (rather than months). From a management point of view, rapid DTD development techniques save time and money by making the DTDs available sooner and by reducing user distrust and hostility (since users feel they played a part in the analysis and "own" the DTD). Moreover, training costs are reduced because staff are already familiar with the names, vocabulary and examples used; integration costs are cut because systems concerns have been addressed during the development phase (meaning that less technical fine-tuning needs to be done). Finally, Usdin asserted that rapid DTD development can produce a DTD within two to six weeks -- as opposed to the two to six months of traditional methods; furthermore, it is much cheaper for a business to use its own staff time than to pay the fees of an SGML consultancy!

4.5 POSTER SESSION #1: "Tables in the Real World" -- various speakers

The idea of the poster sessions is for several speakers to give simultaneous presentations on topics that relate to a general theme. In the time available, each speaker repeats his/her presentation and fields questions from the audience -- who are free to move from speaker to speaker as they wish. Whilst the idea seems attractive, it was difficult to give serious attention to more than a couple of presentations, the better speakers tended to attract (and keep) the largest audiences (whatever their topic), and the timing of the presentations quickly got out of synch. What I heard was generally worthwhile, but I would put in a plea to the conference organizers for more draconian time management in future. The theme for the poster session was briefly introduced by Eric Severson, who raised the same question as he had (apparently) asked last year -- are tables graphical objects or multi-dimensional arrays?
The speakers, and a rough guide to their topics, were as follows:
* Bob Barlow, CALS (DoD): the CALS table tagging scheme
* Mark Buckley, TCIF: tables rely on being two-dimensional to convey information; it is not practical to concentrate primarily on the data held in tables, and the TCIF have no general solution for handling `non-representational' tables
* Joe Davidson: handling tables practically
* Peter Flynn: how should statistical packages import/export SGML tables?
* Thomas Talent [?]: the `Oakridge' approach to producing tables
* Ludo Van Vooren: format-oriented vs content-oriented approaches to tables

4.6 "Handling Tables Practically" -- Joe Davidson (SoftQuad Inc)

Davidson began by stating SoftQuad's view of tables -- that they are information objects with specific visual characteristics. The visual characteristics play an inherent part in an information object's ability to convey information; in order to support this assertion, Davidson cited the difficulties of trying to describe a table to someone who cannot see it (eg over the telephone). [Although it did seem to me that Davidson was rather blurring the distinction between trying to convey the information contained in a table, and the way that the information was formatted for clear presentation, ie moving away from a logical view of the text/table towards a more presentation-oriented view].

Davidson then discussed how tables can be coded in SGML for display purposes -- using attributes to code the number of rows/columns etc. However, he also asserted that table design is very diverse and individual, and that it is difficult to design a tool that can format an on-screen representation of a table for any and all possible SGML table DTDs. Therefore, SoftQuad have decided to work on the premise that there are basic features which are common to the vast majority of tables, and they have designed a tool to cater for these features. Davidson then used a series of slides to demonstrate a user session with SoftQuad's table editing tool -- which will be available as an add-on to Author/Editor. To create a table from scratch using the table editor, a dialogue box is called up and the user is prompted to answer several questions (eg number of rows/columns etc). When this process is complete, the template of the table appears as a graphical object on the screen, and the user is free to enter data into the available cells. The user also has the option to view the whole table as data and raw SGML tags, should s/he so wish. It is also possible to import valid SGML marked-up tables into the current document, and have the table editor display the information in a suitable graphical form. Apart from entering valid data into the cells of the graphical representation of the table, it is also possible to perform cut and paste operations -- moving selections of cells, rows etc to any valid new location. It is even possible to perform invalid cut and paste operations, but only with Author/Editor's rule checking facility turned off (since this performs on-the-fly validation against the document's DTD). [If it was stated, I missed quite how the validation (and, therefore, DTD extension?) relating to the table integrates with the DTD which governs the rest of the document -- since surely it would not always be advisable to give users the freedom to personalize a `standard' DTD in this way]. Using the table editor, it is possible to alter the structure of an existing table, with the restriction that no table can have more than 64K rows or 64K columns.
Scroll bars make it possible to view large tables on screen, but this might be a little tedious with very wide tables.

4.7 "TCIF Approach to Tables" -- Mark Buckley (Manager, Information Technology, Bellcore)

I came in mid-way through Buckley's presentation -- which took the form of a continuous discussion rather than a cycle of main points. From the bulleted points noted on his flip chart, the TCIF's approach to tables (in drafts 5/6 of their Telecommunications Practice DTD) was:
* Do what you can
* Pass any information that is cheap to produce and that may be of use to the document's recipient
* The TP DTD should be able to support this kind of activity, ie passing information about tables in a standard way -- using attributes.

4.8 "Format-oriented vs Content-oriented Approaches to Tables" -- Ludo Van Vooren (Director of Applications, Avalanche Development Company)

This was another presentation which was in `full flow' by the time I arrived. Van Vooren argued that the logical structure of any table is, in fact, a multi-dimensional array. As with an array, any part of the table can be accessed (and the information retrieved therefrom) by giving a unique set of co-ordinates. Thinking of, and marking up, a table in this way makes it much easier to manipulate the information contained in that table; for example, it would be possible to extract the data contained in a user-defined selection of cells from the table, and re-present that data in the form of a pie chart. Adopting a format-based approach to table markup makes it much more difficult (if not impossible) to manipulate the data contained in a table in a way that is equally useful. However, the content-oriented/multi-dimensional array approach to table markup raises a number of significant issues. For example, DTD designers will need to know very clearly how, and for what purposes, the information contained in a table will be used. Moreover, it will not be an easy task to educate users to think of tables as multi-dimensional (logical) arrays, rather than as graphical objects. Also, separating the form and content of tables highlights the particular difficulties of trying to represent such logical objects in a formatted form suitable for presentation on screen or paper. As I left the discussion, Van Vooren had just raised the interesting question of whether or not we should treat forms as simply a sub-group of the type `table' -- and the implications this might have for information processing.

4.9 "How Should Statistical Packages Import/Export SGML Tables?" -- Peter Flynn (Academic Computer Manager, University College, Cork)

I only caught a little of Flynn's remarks in passing -- and I think it was regrettable that there were not more statistical/spreadsheet package manufacturers attending the conference (let alone Flynn's poster session). He seemed to be arguing that depressingly little attention had been paid by the SGML community and developers to the problems involved in importing data into, and exporting it out of, statistical packages. Although there seems to be general interest in, say, the problems of using SGML files with popular word-processing packages such as WordPerfect or Microsoft Word, Flynn said that he had heard very little about developing SGML features for packages such as Lotus 1-2-3 or SPSS. [Feedback to the SGML Project has been notably sparse in this area also].
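[By way of illustration only -- the element and attribute names below are my own invention, not part of any of the speakers' DTDs -- a content-oriented table might be marked up so that each cell carries its logical co-ordinates, which is precisely the kind of structure a statistical or spreadsheet package could import or export directly:]

    <table rows="2" cols="2">
      <cell row="1" col="1">1990</cell>  <cell row="1" col="2">17.5</cell>
      <cell row="2" col="1">1991</cell>  <cell row="2" col="2">19.2</cell>
    </table>

[A format-oriented scheme, by contrast, would record column widths, rules, spanning and so on, leaving the meaning of each cell implicit in its printed position.]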
"FORMATTING ISSUES AND STRATEGIES" -- Various speakers 4.10 "Formatting -- Output Specifications" Kathie Brown (Vice- President, US Lynx) Brown began by describing where the creation of an output specification fitted into the overall SGML composition process. Her main theme was that many SGML implementations are being delayed by the lack of suitable tools for creating and using output specifications. Brown defined an output specification as something which relates source elements and a set of style specifications. It uses element tags and the source tree structure to locate places where a style should be applied or changed, or other processing initiated. Furthermore, the output specification should be written in a system-neutral language, and employ standard terms. Brown also suggested that an output specification addresses a number of other problems -- it describes document flow rules as well as local formatting, it describes page layout and directs source rules for generating and re-ordering source content, and it defines all the logical operations necessary for resolving values and initiating logical routines. Brown then identified what an output specification must describe: assembly and ordering of final document text - generation of implied content - reordering of content - replication of content for extractions, strings etc. - suppression of content - denoting the style in which text elements should appear - assembling into text blocks - numbering of elements description of page models - media characteristics - page size, orientation, and margins - imposition - layout of main text - other page areas, including relative placement - page ruling - repeating page elements, including graphics and extractions - page order - associated blanks composition rules and style characteristics - area/text flow and rules for area/text fill - arbitration rules for competing values - rules for placement of tables and figures etc. - page and column balance - rules for handling footnotes and similar structures - ordering of document values assembly into printable document and other finishing operations - numbering pages - ordering of values used in page marking - insertion and treatment of blank pages - rules for generating and formatting empty elements - handling of separate volumes or parts - effects of document binding on output handling of local elements in the source (overrider) - overriding document-wide values - overriding element attributes values - rules for reformatting tables if necessary - changing row/column relationships - adjusting graphic sizes - suppressing source elements how to handle non-parsed data types Brown went on to identify those characteristics which she believed would form the basis of a good output specification. It should be neutral to DTDs, and not based on assumptions inherent in a single DTD or class of documents. A good output specification should also be widely used (to promote the writing of proprietary output drivers), and also easy for an author to understand, use, and retrace his/her steps. It should be structured so as to ease the building of automated authoring tools -- and if it were machine structurable, it would be possible to build output maps directly from the output specification. A good output specification would also be characterized by stable syntax and semantics. Brown noted that output specifications could be written to support many other structures than only those required for paper- based printing. 
For example, output specifications could be written to support database loading, hypertext, illustrated parts lists (and similar derived tabular structures), indexes of abstracts or paragraphs, screen displays, or on-line database loading and screen display. Brown identified a number of issues directly relating to the output of SGML. In addition to decisions concerning the use of character sets and fonts, there are semantic questions which require resolution (eg the identification of "significant" element names, passing formatting values in attributes, the standardization of style specification labels, and the production of proprietary specifications). There are also logical problems to overcome (eg counter management/output, extraction of content/attribute values, extraction of computed values/strings, graphic placement etc) and what Brown referred to as "fidelity guarantees" and "language biases". She also identified "geometric issues" relating to SGML output -- which include such matters as the declaration of a co-ordinate system, dimensioning, the description/invocation of relative areas, and the windowing/scaling of graphics. She also briefly mentioned the overheads to be considered when revising output specifications.

In the process of moving from an output specification to final composed pages, Brown suggested that several new elements are introduced as part of the transformation process, and these require consideration. These new elements include such factors as file management, the use of proprietary syntax/coding, graphic anchoring and graphic file format conversions, how to handle page elements such as headers, footers and folios, how to cope with system executable functions, document assembly routines, and the resolution of values. The sorts of values which might require resolution Brown identified as: [and] numbering, specific references to font/character locations, spacing specifications, entities, attribute values, cross-references, external references, variables, footnotes, any system-invoked constraint rules, graphic file referencing, and the setup of tabular material.

Brown then summarized the current situation, comparing the SGML users' situation to that of the characters in Beckett's play "Waiting for Godot" -- like Godot, DSSSL's arrival is always imminently anticipated, but never actually happens. Brown stated that there are currently available a number of programming libraries that allow links with parsers to enable custom programming. There are also a few commercially available specialized languages with which to develop custom applications. In addition, there are some proprietary solutions that support specific classes of documents, as well as FOSIs written within systems (many of which support specific DTDs). Lastly, there are what Brown referred to as "combinations of specialized languages linked to parsers and programmatic approaches to file management and value resolution". However, Brown suggested that all these approaches (and DSSSL engines) depend on knowing the composition system at a very low level. Looking ahead, Brown anticipated the arrival of WYSIWYG output specification writing tools, as well as user-friendly mapping development tools. She also predicted the production of automatic output-specification interpreters and DSSSL engines. Yet Brown stressed that each of these developments would depend on first having a stable output specification language or DSSSL.
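[The following is a purely schematic sketch of the kind of mapping Brown was describing -- it is not FOSI or DSSSL syntax, merely an invented SGML-like notation relating a source element, located by its context in the source tree, to a set of style specifications:]

    <!-- apply these characteristics wherever a <title> occurs inside a <chapter> -->
    <style gi="title" context="chapter">
      <font  family="Helvetica" size="14pt" weight="bold">
      <space before="24pt" after="12pt">
      <keep  with="next">
    </style>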
Brown concluded by summarizing what her company (US Lynx) offers in terms of solutions for working with SGML and output specifications. They offer general consultation, project management, and custom programming solutions as well as DTD and FOSI development. They use/have developed [it was unclear which] a "context-wise" SGML document tagger, which offers context-sensitive transformation of existing documents into SGML documents. They use/have developed [it was unclear which] an "Instance Imager" output system that offers a modular structure -- revision and editorial station, graphic editing station, and output driver -- and an "Instance Imager" development kit for writing drivers. Brown said that US Lynx also use/have developed [it was unclear which] a technical manual output specification DTD which can handle most paper output specifications, and they offer assistance for custom or proprietary implementations.

4.11 "Formatting as an Afterthought" -- Michael Maziarka (Datalogics Inc., Chicago, Illinois)

Maziarka began by proposing that early DTDs were written with the goal of publishing in mind, and so were optimized for presentation. This meant that DTDs were written for almost every type of document, purely on the basis that each `type' had a different look. It was unclear what should be regarded as formatting information, and what as structural information. The result of this confusion, Maziarka suggested, was that too many DTDs were being written; the different appearances of various documents were confusing the issue of whether or not they contained different (types of) information. However, Maziarka felt that people's goals were now changing -- with the realization that the money is in the data, not in the paper on which the data is formatted and stored. SGML provides a tool to manage data, and Maziarka argued that publishing should be seen as a by-product, not the major goal, of SGML use. He identified four goals for data management with SGML -- to manage information rather than pages, to author the data once only (and then re-use it), to provide easier access to/update/use of data, and to produce data (rather than document) repositories. Maziarka stated that modern DTDs are written to optimize data storage, with data stored in modules and the use of boiler-plate text, in order to eliminate redundancy. Moreover, multiple document types can be derived from a single storage facility by using one "modular" DTD. However, Maziarka then raised the difficulty of moving data from its stored SGML form into an SGML form suitable for formatting. Is it possible to format directly from the "storage" DTD (ie is the data sequenced correctly for linear presentation, and/or do links in the data need to be resolved during the extraction phase)? Or must you try to extract data from a "storage" DTD into a "presentation" DTD? To highlight the difficulties with both approaches, Maziarka presented a number of practical examples (including problems relating to the use of a "storage" DTD, and the handling of footnotes and tables). Maziarka suggested that "storage" DTDs should be carefully written so as not to hinder formatting (or movement into a "presentation" DTD). Similarly, "presentation" DTDs must be written in such a way as to ensure their compatibility with "storage" DTDs. Maziarka also put in a plea for the production of more sophisticated extraction utilities, since formatters are at their most effective when given data in a linear fashion (ie in the order it will appear on the page).
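[A minimal invented illustration of the `storage' versus `presentation' distinction Maziarka was drawing -- the element names are hypothetical, not Datalogics':]

    <!-- storage form: modular, non-linear, cross-references left unresolved -->
    <module id="m42" topic="access panel">
      <step>Remove the access panel. <xref refid="m17"></step>
    </module>

    <!-- presentation form: extracted, linearized, references resolved for the formatter -->
    <section>
      <title>Removing the Access Panel</title>
      <p>Remove the access panel (see Section 3.2).</p>
    </section>

[An extraction utility of the kind Maziarka called for would perform the second transformation -- pulling modules out of storage in the order required and resolving links into something a formatter can place on the page.]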
Maziarka concluded with a repetition of his calls for DTDs that serve both storage and formatting purposes, and for more sophisticated extraction routines.

4.12 "Native vs Structure Enforcing Editors" -- Moira Meehan (Product Manager CALS, Interleaf)

[No copies of Meehan's slides were made available, and her presentation was not always easy to follow -- therefore I cannot guarantee that I have represented her opinions correctly]. Meehan began by outlining "native" SGML editing, which implies/involves the processing of SGML markup, the storing of the SGML markup alone, and the manipulation of advanced SGML constructs. However, she suggested that SGML may not guarantee consistent document structure, since identical fragments can yield inconsistent document structures, which in turn has implications for editing operations. Unfortunately, none of these assertions were supported by examples. Meehan (citing an ANSI document [ANSI X3VI/91-04 (Appendix A - ESIS)??]) suggested that there are two types of SGML application: object-oriented systems that can control structures, and WYSIWYG systems that facilitate processing. She then stated that Interleaf 5 is a fully object-oriented system, in which the arrangement of objects can be controlled and the parser is able to produce the data in canonical form. WYSIWYG systems can use formatting to imply document structure, they are easy to use, and they enable verification of the logical document, its semantics, and compound element presentation [?]. Meehan concluded by saying that controlling document structures (objects) is valuable, as is the process of text conforming to ISO 8879, adding that [on-screen] formatting facilitates the creation of such text. She was then subjected to a barrage of questions about Interleaf's policy on SGML and how their products were going to develop. Meehan said that she knew of no plans to re-write their engine to support SGML; currently, Interleaf 5 does not directly process SGML markup, but instead maintains the document structure internally and recomputes the SGML. SGML documents that are imported into Interleaf are mapped to Interleaf objects in a one-to-one mapping, although this process has been found by some users to be unacceptably slow. [My thanks to Neil Calton, who provided most of the information used in this write-up].

POSTER SESSION 2: Verification and Validation -- various speakers

[I was only able to attend one of the presentations within this session]

4.13 "Verification and Validation" -- Eric Severson (Vice President, Avalanche Development Co)

Severson proposed the development of a tool that relies on weighted rules to check markup. He suggested that a statistical approach could be adopted in order to identify the areas where most markup problems occur, and that weightings could then be assigned to a set of rules which control the tool's sensitivity when checking markup. Severson could see two major problems with the approach he was suggesting. No-one has done any research to establish the statistical distribution of incorrect markup -- and it is unclear whether such a distribution would be affected by who or what did the markup, and by the types of document structure involved. For example, is the process of auto-tagging tables more or less error-prone than the manual tagging of book texts? Severson suggested two possible ways of studying the statistical distribution of incorrect markup.
One is to perform a "global analysis" and adopt what he called a `process control' approach; the other is to perform a "local analysis" and attempt to establish a rule base that detects the symptoms of local problems.

4.14 AAP Math/Table Update Committee -- Chair, Paul Grosso

This was an evening session for anyone interested in the AAP Standard (Association of American Publishers), and particularly those aspects relating to the handling of maths and tables. Paul Grosso opened with a brief commentary on the AAP standard, and noted the AAP's imminent intention to revise it. Grosso said that the AAP still tended to think in terms of a US audience, but was gradually recognizing that its work was quickly becoming a de facto standard in the publishing industry world-wide. Discussion was vigorous and wide-ranging, and consequently not easy to summarize. A member of the committee that had originally devised the AAP standard said that they were well aware of its inherent imperfections, but that these had been a by-product of the broad policy decision to "dumb down" the original DTD in order to make it more suitable for non-technical and widespread consumption.

Grosso raised the question of what people actually want to do with the AAP tagging schemes. He had received direct communications from people concerned with mathematics who had suggested several things they would like to do with the AAP Standard and SGML, but had found them difficult/impossible with the standard in its current form (eg actually using the mathematics embedded in documents, easy database storage, searching etc). However, Grosso said he had heard surprisingly little from people who wanted to use the AAP Standard as a basis for processing tables. He suggested that users need to decide what their goals are with respect to handling mathematics and tables, and then decide how such material should be marked up. Thus Grosso posed the question "What is the intended purpose of the AAP DTD?" Replies fell roughly into two broad types. One type placed the emphasis on making documents generic -- fully independent of either hardware or software -- with the main intention being to capture only information (which would be devoid of anything to do with formatting or styles). The other type of reply suggested that the main purpose of the AAP DTD was to enable scientists to communicate more easily.

Grosso then asked for an indication of how many of those present at the meeting actually used either of the AAP tagging schemes. For both the math and the table tagging schemes the answer was around six out of approximately thirty attendees (largely the same six in both cases). [Of course there was no real way of telling how significant these results were, as the attendees were all simply interested parties rather than a representative sample of the user community; also, some attendees were present on behalf of large companies or publishing concerns, whilst others were there only as individuals]. Fred Veldmeyer (Consultant, Elsevier Science Publishers) said that they have been using the AAP scheme for tagging math, but had encountered problems with the character set (especially when handling accented characters). Elsevier are now looking at the ISO approach to handling math and formulae, and they are also involved in the work of the EWS (European Workgroup on SGML).
Steven DeRose (Senior Systems Architect, Electronic Book Technologies) suggested that in its current form the AAP tag set does not enable users to mark up a sufficient variety of tables, and that it also does not facilitate the extraction of information from tables. William Woolf (Associate Executive Director, American Mathematical Society) spoke of his organization's involvement in the development of TeX. He said that he wanted to ensure that TeX users would be able to be carried along by, and fit in with, the transition towards SGML. Richard Timoney (Euromath) pointed out that any SGML-based system for handling math must be able to do at least as much as TeX or LaTeX, otherwise mathematicians would be very reluctant to start using it. He said that the Euromath DTD currently contained far fewer categories than the AAP math DTD (and in a number of respects it was more like a translation from TeX). Someone then asked, if handling math is only a question of formatting, why not simply abandon the AAP approach and encourage users to embed TeX in their documents? This provoked a reply from another of the members of the committee that had originally worked on the AAP standard. He said that designing the DTD[s] had been very hard work -- and that any revisions would be equally difficult and time-consuming. They had tried to aim for a system of math markup which would be keyable by anyone (eg copy typists, editors), whether or not they had specialist knowledge of the subject. However, he also stated that he wanted the AAP standard to be dynamic and responsive to the comments and requirements of users, through a process of regular review and revision. Another speaker remarked that the situation had changed since the AAP standard was drawn up -- with the arrival of products such as Mathematica and Maple, and newly envisaged uses for marked-up texts. A speaker from the ACM (Association for Computing Machinery) remarked that whilst, as publishers, they did not expect the AAP Standard to be an authoring tool, they still needed a way to turn the texts authors produce into/out of the Standard (and such tools are not currently available). Steve Newcomb (Chair, SGML SIGhyper) suggested that future revisions of the AAP Standard might perhaps incorporate some of the facilities pioneered in the development of HyTime. He proposed the use of finite co-ordinate spaces and bounding boxes as HyTime facilities which could assist in the process of marking up tabular material. He also remarked that adding these facilities to the Standard need not entail a wholesale movement over to HyTime. Fred Veldmeyer (Elsevier) remarked that whatever changes are going to be made to the AAP Standard, the process should not take too long. Several people at the meeting were also of the opinion that since it would effectively be impossible to supply a DTD that met everyone's full requirements, it was better to have an imperfect Standard in circulation than no standard at all. An alternative opinion was that whilst it is plainly necessary to get the revised version of the Standard out as soon as possible, if it served no-one's needs adequately it would eventually be left to wither away through lack of use. A straw poll was then taken of the remaining people at the meeting -- to see which of the current approaches to handling maths should form the basis for revisions to the AAP Standard. About five people were in favour of using ISO 9573's approach to math, five for developing the current AAP approach, and five for investigating the Euromath DTD.
No vote was taken on an approach to tables. The American Mathematical Society volunteered to set up an electronic distribution list, so that the threads of discussion on math and tables could be continued over the network by anyone who wished to take part.

[Since SGML '91, the distribution lists have become operational. In order to subscribe to a list, send an email message to listserv@e-math.ams.com whose body consists of the following three lines (for math):

    subscribe sgml-math <your name>
    set sgml-math mail ack
    help

(for tables):

    subscribe sgml-tables <your name>
    set sgml-tables mail ack
    help

In each case <your name> is your full name (so in my case <your name> = Michael G Popham). You should receive acknowledgement from the list server by return email.]

The next meeting of the AAP Math/Table Update Committee is intended to coincide with TechDoc Winter '92.

5.1 "Unlocking the real power in the Information" -- Jerome Zadow (Consultant, Concord Research Associates)

Zadow opened by posing the question "What's the good of SGML?", particularly with regard to the topic of the title of his presentation. He then set out his own position, aware of his own bias towards SGML. He suggested that most people considering SGML systems do not understand the reach and power of the concepts involved, and that most of those already implementing SGML are doing so for the wrong reasons. He argued that the simple, powerful concepts of SGML still require careful thought and analysis, and present as many implementation considerations as a large database. Drawing on an article from Scientific American (L. Tesler, 9/91), Zadow summarized computing trends for each decade from the 60's to the 90's. He also discussed the main features of the networks currently in use, and how they work together to create the existing network infrastructure. Zadow believed that trends in computer use and networking mean that users are now overwhelmed with information. Users now create and deal with much more information than in the past, and also spend more of their time formatting the information they process. Zadow described the current situation as one of "information overload", where there is too much information of too many types in too many different places. The result is that the cost of finding, acquiring, using and preparing necessary information is becoming too great.

Looking ahead over the next ten years, Zadow anticipated the increased use of "Knowledge Navigators" which facilitate intelligent network roaming. He foresaw a growth in the number and size of public repository databases, and a fusion of computers, communications, TV and personal electronics. However, he also accepted that many factors would continue to change -- platforms, screen resolutions, windowing mechanisms, carriers, speeds, operating systems etc. What remain comparatively stable are the information sources and data -- both of which have a long life -- and the fact that the number of information types will continue to grow. The initial rationale for SGML was the publishing and re-publishing of frequently revised information, and the distribution of information in a machine- and application-independent form. The justification for establishing SGML as a standard was to ensure the validity, longevity and usability of such information. However, Zadow suggested that SGML is now being used well beyond its initial purposes.
Information marked up with SGML is no longer confined to publishing on paper -- as well as supporting publishing in a variety of different media, SGML is used in distributed databases and interactive environments. SGML structures have grown beyond words and pictures to include motion, sound and hyperlinks. Zadow then put up some schematic diagrams of the current (typical) information flow, and of the future information flow using SGML. He annotated each diagram with notes on the processing stages involved in the information flow. Instead of information going directly from the publisher to the consumer, with some information re-directed into a library for storage and retrieval, Zadow predicted that all published information will flow directly into the `library'. The generators of information will no longer merely research and write it, but will also be responsible for adding intelligence (via markup), and for providing additional features such as motion, sound etc. Information will be distributed from the library to the consumer on the basis of additional processing which will only be made possible by the inclusion of intelligent markup in the information (facilitating security checks, the matching of requests with use profiles, etc).

Zadow suggested that all information should be analysed so that its purpose can be defined. It should be established how you want to use and share the data, with whom you want to share it (the community), and over how much of the community your techniques will be valid. Zadow then proposed three broad types of data:
a) Information managed as pages -- hard copy or master: little or no intelligent coding.
b) Information managed as (parts of) documents -- hard copy or electronic: structure, form and some content encoded.
c) Information managed as a database: content encoded, structure and form attached.
An example of type (c) data would be found in a hierarchical, non-redundant database of re-usable data elements. Each of these elements would have content, relationships (with other elements) and attributes (for use). He then showed a schematic representation of type (c) data -- stored in a non-redundant, non-formatted neutral database -- being extracted and variously used as task-oriented data, training data, data for documentation, and management data.

Zadow emphasized the importance of careful information analysis -- adopting structure-based, information-tagging, and hybrid approaches. It is vital to establish how the information is used now, and what facilities are missing; it is also important to bear in mind the extent to which any approach will meet the requirements of the broader community. Any new SGML application should be defined broadly -- with the required and optional parts clearly determined. The application should be prototyped and thoroughly tested against any stated purposes and goals. Analysts must also ensure that the definition of the application will offer the least hindrance to attaining future goals. Zadow proposed a number of scale economies that could result from an SGML-based approach -- primarily those of reduced costs. He encouraged users to "think globally, act locally" and to focus on their immediate functional purposes whilst still planning to maximize future benefits. However, Zadow also sounded a note of caution. Simply using a proper SGML application does not mean that everyone will be able to use your data.
He encouraged the use of the reference concrete syntax and of standard tag names, and urged designers to adopt (or modify) existing public SGML applications rather than working from scratch.

Zadow envisaged a changing realm of publishing, one that fulfilled the definition he cited from Webster's 7th New Collegiate Dictionary -- "[To] Publish: to make generally known, to place before the public: disseminate". He saw this in terms of a move away from paper to other media (including electronic), from formality to informality, from active to passive, from serial to self-organized, and from the physical limitations of paper to the extra dimensionality offered by motion, sound etc. He also saw traditional libraries becoming more like archives. They would continue their existing functions -- as information repositories, providers of user access to multi-publisher document collections, and managers of collections -- but would also take on new and additional functions. These would include "routing" (distributing information from publisher to consumer), automated electronic search and selection, information marketing and order fulfilment, and several other diverse functions.

Zadow suggested that any organization intending to use SGML to unlock the real power of information should be prepared to change the scope of its information systems management; it will also need to adopt new production/competitive strategies, and refine its information. When considering the range of alternative applications, SGML could be just another way to do what you are doing now; or it could be part of a much broader strategy, in which case the costs are likely to be greater, but the benefits deeper and much longer lived. Zadow concluded by saying that SGML applications should not merely replace current methods. We must ensure the success of early implementations, because they are the pilots for broader applications. Organizations must start changing now, or find themselves lagging behind. We must organize communities with shared interests to agree upon and manage applications and data dictionaries; only in this way will our applications be broadly usable over time.

5.2 "The Design and Development of a Database Model to Support SGML Document Management" -- John Gawowski, Information Dimensions Inc [NBC]

This was another piece of work in progress. He listed the requirements for an SGML document management system and described the development of a model to support these requirements. Such a system must be able to manage documents at the component level: documents of different types must co-exist; creation of new documents from the components of several documents should be possible; and searching based on structure as well as content should be supported.

In the model they are developing, data are divided into contextual content and contextual criteria. They therefore take a DTD and assign each of its elements to one or other of these categories. As an example, for a simple resume the following division could be made:

    CONTEXTUAL CONTENT      CONTEXTUAL CRITERIA
    resume                  name
    emp_history             birthdate
    job                     marital_status
    jobdes                  from
    career_goals            until

Putting these together gives structures of the form:

    CRITERIA          VALUE             CONTENT
    name              T. Adam           resume
    birthdate         April 12, 1950    resume
    marital_status    Single            resume
    from              1976              job(1)
    from              1981              job(2)
    until             1981              job(1)
    until             1991              job(2)

This model has been found to meet the stated requirements and can be generalised to support arbitrary applications. It is also appropriate for supporting the structured full-text query language (SFQL) currently being proposed.
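[For illustration only -- the following DTD fragment is my own reconstruction and was not shown by the speaker -- the resume example above might be declared along these lines, with the "contextual criteria" elements being the ones the system indexes as retrieval criteria:

    <!-- Hypothetical DTD fragment for the resume example -->
    <!ELEMENT resume       - - (name, birthdate, marital_status,
                                emp_history, career_goals)>
    <!ELEMENT emp_history  - - (job+)>
    <!ELEMENT job          - - (from, until, jobdes)>
    <!ELEMENT (name | birthdate | marital_status | from | until
              | jobdes | career_goals)  - - (#PCDATA)>

Each job element in an instance then contributes its own from/until criteria values, which is why the structures above distinguish job(1) from job(2).]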
Their development system has a relational engine (Basis+) with a text retrieval capability.

5.3 "A Bridge Between Technical Publications and Design Engineering Databases" -- Jeff Lankford, Northrop Research and Technology Centre [NBC]

At Northrop they are also working on a database publishing system. They too have come up against the problem of too much paper and too little information. He showed a slide of a technician next to a fighter aircraft with a pile of manuals taller than himself. They are looking at the opportunities provided by advanced technology publishing. These include:

* being able to retrieve the technical content from on-line databases;
* having the presentation dependent on the document, the display and the use;
* content-sensitive document scanning;
* electronic multi-media presentation;
* and interactive, animated graphics.

Their approach to attaining these goals is based on the notion of dynamic documents. These have a number of characteristics:

* there is no final published version; documents are created on demand;
* the technical content is retrieved from on-line databases;
* and authoring binds markup with specific sources.

To meet these goals, the solution they have come up with is known as TINA (TINA Is Not an Acronym). With TINA they are trying to develop a client-server model. The architecture of TINA involves multiple distributed concurrent clients and servers, using an ISO-based network protocol and SQL-accessible databases. The TINA server has drivers for the different databases being used: it decodes requests from the clients and switches to the appropriate driver. The client is responsible for putting together a valid SQL construct, and provides the user with the information drawn from the databases. They have produced a successful prototype and a production version is in progress. Their aim is to replace the pile of manuals next to the technician with a portable computer terminal.

Currently they are not using SGML with TINA, but the speaker suggested that it could be used in the definition and construction of the source databases, for data interchange among clients and servers, and for CALS-compliant distribution to customers. However, he saw some barriers to the use of SGML. Firstly, the existing application works fine without SGML, so it is hard to persuade management to change. Secondly, the speaker complained that there are no widespread public domain SGML tools. This statement drew criticism from the audience when someone suggested that while academics might have trouble affording commercial software, Northrop ought to be able to afford to purchase products. The speaker insisted that very little money came his way to buy SGML software. He was still trying to educate his management about its advantages.

5.4 "Marking Up a Complex Reference Work Using SGML Technology" -- Jim McFadden, Exoterica [NBC]

This presentation was billed as an example of the use of "Fourth Generation SGML". According to McFadden, "fourth generation SGML", or "fourth generation markup languages", is a new term intended to distinguish a particular methodology. Fourth generation SGML documents are marked up generically, i.e. they do not contain any explicit information about how the data in them will be processed or displayed. This is the goal of third generation SGML documents as well. However, fourth generation SGML documents are distinguished by two important features.
Firstly, the data are organized such that the structure being represented is the structure of the data and not the structure of the presentation, and the markup language defined for the document is powerful enough to capture the data in a natural and economical way. Secondly, intelligence is added to the document so that the data are unambiguous to both human coders and the computer; all redundancy and data dependency will be resolved. In traditional DBMS technology this is referred to as "data normalization". With SGML, a properly normalised data structure ensures that the data can be easily applied to several diverse applications without any requirement to modify the original marked-up source. Fourth generation SGML documents can become very sophisticated webs of complex knowledge. Fourth generation SGML requires the functionality of the reference concrete syntax, in order to support advanced markup notations, and it requires sophisticated processing software to support the wide range of possible data manipulations.

Exoterica are using these fourth generation SGML techniques to capture the detail in complex texts, such as reference works. The presentation described a real, ongoing commercial activity involving movie reference works. They are taking a number of books on the same subject and combining them into a single computer-based book. The customer wanted the ability to create complex multimedia hypertext and the ability to recreate the original book forms. Text was received in book and typesetter form, and they used their own OmniMark product, writing scripts to convert binary codes to ASCII. As the document was already very structured, an OmniMark script could also be used to tag the document with "full tags". At this stage they had effectively produced a third generation, generically tagged SGML document. Another OmniMark script was used to translate the full markup into a simpler language which is more readable for the coders: some elements become attributes and some tags are replaced by strings. The coders then have to go through the whole text as a preliminary QA step; they have found that it takes (sometimes expert) human intervention to resolve ambiguities for the computer. CheckMark, a validating SGML editor, was then used to ensure that the markup in the files was syntactically correct. At the completion of this phase the files were valid SGML documents with enough detail captured to format the documents in a number of ways; there is also sufficient detail captured to create powerful hypertext documents. The marked-up files were then converted to RTF (used by Word) with another OmniMark script. Coders then compare the Word display to the original text and make any necessary corrections.

They have had four people working on this project since October, and reckoned on four man-months of work to do about four (similar) books. They are also doing a World Almanac. They see many potential uses for the data. One can:

* form the original database for future amendments;
* generate various hypertext documents;
* generate books of various types based on criteria;
* generate various indices;
* combine the data with other books.

5.5 "Nurturing SGML in a Neutral to Hostile Environment" -- Sam Hunting, Boston Computer Society [NBC]

I caught only the end of this talk but picked up some of the background material. The BCS has 25,000 members and 800 activists who primarily deliver member services on a volunteer basis, using software and hardware donated by industry.
They produce over 20 publications. They have recently set up the Accessible Information Technology Project (AIT). Its mission is to develop a common electronic document standard for the BCS, which can be rendered (viewed) in any medium. They have chosen SGML for the electronic format because it is: (a) an ISO standard; (b) device and media independent; (c) human- and machine-readable; (d) parseable; and (e) it allows house style to be enforced unobtrusively. The rationale behind the project is that in the past they have not been good stewards of their knowledge base: delivered on paper, member services are inevitably trashed. With their publications they want to be able to archive, retrieve, send out, and repackage. A small committee has been formed to oversee the project.

Sam's background has involved him in desktop publishing, and he has previously looked at style-sheets as a means of automating processes. However, a style sheet conceived as a collection of specifications addressing a paragraph lacks the notion of context, leading to anomalous results (e.g. string conversions, in-line formatting). SGML, with its nested elements, therefore appeared to him as a solution to automating formatting problems (or, rather, design solutions). The project aims to show people what is possible ("If you build it, they will come"). The 800 activists cannot be coerced, but they can be inspired, since they are what the industry refers to as "early adopters". AIT's focus has been on demonstration projects -- the proof of the concept. Progress so far includes the development of a magazine DTD (for their shared publications). A print format has been designed from this (by a designer); the initial delivery of a print version was seen as easing the credibility problems. Further iterations on the DTD have been done, and back issues of magazines have been retrofitted to the current DTD. In addition to the already-produced print versions they have an EBT WindowBook and a full-text retrieval bulletin board.

5.6 "Trainers Panel" [NBC]

There was a short session in which four speakers gave their tips on how to teach SGML to people. The most enthusiastic response was for the person who used cut-out figures, complete with hats and boots, to represent different parts of the SGML syntax. Someone else suggested playing hangman using words from the SGML grammar and then getting people to give examples of how the word was used in SGML.

5.7 "Reports from the Working Sessions" [NBC]

There were short reports from the two working sessions which had been held on the Monday and Tuesday evenings.

#1: STANDARD PRACTICES - Eric Severson and Ludo Van Vooren

In this session people had considered whether a standard "systems development methodology" approach could be applied to SGML implementation. Discussion produced the following task list.

(1) Preliminary assessment. Determine the boundaries and scope of the project. Using a prototyping approach, define specific milestones, deliverables and management/user sign-off points.
(2) Analyse the present product. This includes documenting the current author/edit cycle and critical control points.
(3) Define success criteria for the new product. Identify business objectives in implementing the new product. Evaluate the present product's strengths and weaknesses.
(4) Develop an implementation approach for the new product. There are many areas where changes to the existing systems will need to be described.
(5) Perform document analysis. Identify a small group of qualified people to perform the document analysis.
Express the results of the analysis in a "rigorous English" form, not directly in SGML. Document analysis is not the same as DTD development, and must precede any DTD development. SGML syntax is not a good form in which to communicate with management and users!
(6) Design functional components. In particular, define a strategy for converting existing documents.
(7) Implement the new system. Select and install software. Create DTDs based on the document analysis. Convert existing data and implement new policies.
(8) Perform a post-implementation review. Evaluate the new product's success against the defined criteria. Refine and tune the new system.

#2: A TOOL FOR DEVELOPING SGML APPLICATIONS

The purpose of this session was to describe the functionality of an information management or CASE tool to support the development of SGML applications. People had made a first attempt to outline the functions necessary to support SGML application creation and maintenance, including: gathering the application- and element-level data necessary to create, compare and maintain SGML applications; producing SGML "code" such as DTDs and FOSIs; and producing SGML documentation such as Tag Libraries and tree structure diagrams. The following functional categories were identified: data collection, report generation, code generation, administration support, and house-keeping functions. Each of these had then been discussed in some detail, generating a lot of ideas about what such a tool should provide under each category. The ideas generated by the group in the brainstorming session were categorized as follows:

Data Collection
Put information in once; Build DTD from tagged instance; Identify attribute value types from example; Import BNF structures; Graphic data collection; Read DTDs; Store rationale for analysis; Store source of information; Prompt through document or application analysis; Prompt through instance building; Prompt through attribute creation; Prompt-based SHORTREF generator; Special character recognition and cataloging; Support Work Group; Support multiple simultaneous users; Group conference-ware capability (electronic conference).

Report Generation
List elements; List tags; List entities; List attributes and values; List omissible tags; Exception report (inclusions/exclusions); List of FQGIs; Tree structure diagram; Hierarchical listing; Structure diagram; Plain-English Tag Library; Quick Reference Cards; Tag Usage Report; Compare instance to straw-man DTD (simultaneously); DTD-to-DTD mapping; Well-populated sample document; Minimal parseable document; Minimal partial model (e.g. front matter); Flag identical elements; Flag similar elements; Element source listing.

Code Generation
Generate SGML Declaration; Build DTD; Build FOSI; Build "straw-man" DTD (based on partial information - for evaluation); Automatic parameter entity generation; Create translation filters; Export BNF structures; Custom editor generation; API for other tools (Application Profile Interface).

Administration Support
Validate DTDs; Validate tagged instance; Validate SGML Declaration; Compare DTDs; Implement Architectural Forms; Warnings about bad practice; Identify errors; Error explanations; Find inconsistencies (e.g. in content models); On-line Help re: SGML standard; On-line Help re: CASE Tool; Search/Find; Find paths in DTD; Access to LINK library.
House Keeping
Audit trail; History tracking; Version control; Automatic minimization; Disambiguator; Error correction; Content model libraries; Structure libraries; Shared libraries (among multiple applications or multiple DTDs); Fully functional editor; Multilingual; User tracking.

6. SUMMARY

This was a well-attended and lively conference, with more technical content than can be found in the comparable "International Markup" series of conferences that are also organized by the GCA. However, the audience's level of SGML expertise was very diverse, and some of the more experienced attendees felt that the conference had been less rewarding than they had hoped. I would take issue with this view of the conference, as it seemed to me to reflect the hidden agenda of some attendees to get free solutions/advice on how to resolve their organizations' particular difficulties with their implementations of SGML. As a general forum for discussion of some of the technical aspects of SGML, I found the conference to be very useful. I would urge anyone seriously interested in implementing SGML to attend, and to present papers or poster sessions if they have encountered a particular problem which could usefully be shared and discussed with other attendees. I will be attending "SGML '92" (Danvers, MA), October 25th - 29th, and hope to produce a succinct and timely report soon after. I apologise to all those who have waited so long for this document to appear.

=================================================================

For further details of any of the speakers or presentations, please contact the conference organizers at:

    Graphic Communications Association
    100 Daingerfield Road, 4th Fl.
    Alexandria, VA 22314-2888
    United States
    Phone: (703) 519-8157
    Fax:   (703) 548-2867

=================================================================

You are free to distribute this material in any form, provided that you acknowledge the source and provide details of how to contact The SGML Project. None of the remarks in this report should necessarily be taken as an accurate reflection of the speakers' opinions, or in any way representative of their employers' policies. Before citing from this report, please confirm that the original speaker has no objections and has given permission.

=================================================================

Michael Popham
SGML Project - Computing Development Officer
Computer Unit - Laver Building
North Park Road, University of Exeter
Exeter EX4 4QE, United Kingdom

Email: sgml@exeter.ac.uk
       M.G.Popham@exeter.ac.uk (INTERNET)
Phone: +44 392 263946
Fax:   +44 392 211630

=================================================================