This document is hosted as a local archive copy from the official and canonical URL, http://www.gca.org/conf/mt98/papers.htm; please refer to the canonical source document if possible. Several links/anchors have been added here. Last updated: January 11, 1999.
AGENDA & SCHEDULE
SCHEDULE AT A GLANCE
Thursday, November 19
7:30 - 5:00
Theory Track | Practice Track
Quin, Links in XML: Detection, Representation, and Presentation | Bidoul, Transactional Approach to SGML Storage
Hsu et al., Authoring, Managing, and Browsing Hyperlinks in Multimedia Docs | Mc Nally, Building a DTD Maintenance System
Turner, Managing Knowledge with XML | Øverby, The Challenge of Implementing SUBDOC
McDonald, Processing Composite Content | Sujansky, Williams, and Toback, Sharing and Reuse of Electronic Medical Records
Simons, Automated DTD Subsetting Using Architectural Processing | Son and Johns, SGML/XML Electronic Commerce and Document Management System
Ramalho, Lopes, and Henriques, Generating SGML-specific Editors | Duluc, Multimedia and Aeronautical Technical Documentation
8:00 - 10:00 pm
Smith, Bosak, Goldfarb et al., Standards Update: Markup Related Activity at ISO & W3C - Overview and Work Group Summaries
Friday, November 20
Theory Track | Practice Track
Graham, Unicode: What Is It and How Do I Use It? | Leal Portela, SGML Joins the Battle Against International Art Robbery
Streich, Name Resolution Using an LDAP Directory Service | Butler and Fisher, Moving from SGML to XML for Delivery of Content-Rich Text
Birnbaum, In Defense of Invalid SGML | Fontaine, SGML Boosts Standard Document Processing for a Major European Bank
Arnold-Moore and Sacks-Davis, Models for Structured Document DBMS | Shifrin, Rapid Reporting: A Case Study
Bray, Dealing with Multi-Dimensional Text
4:00 - 4:45
Usdin, View from the Chair
Thursday, November 19, 1998
Plenary (Nov. 19)
9:00 - 9:15
9:15 - 10:00
Brian Reid, Digital Equipment Corporation
Brian Reid's work with markup systems began in the 1970s. He independently invented and implemented descriptive markup and developed its theory. His Scribe system may have been the cleanest separation of structure and format ever built. He completed his dissertation on Scribe in 1981, the year he presented it at Lausanne in the same session in which Charles Goldfarb publicly presented GML; SGML was proposed about a year later. In recent years Reid has turned his attention to network systems and the Internet.
This keynote address ("20 Years of Abstract Markup - Any Progress?") was a reflection upon the early development of descriptive markup based upon a presentation made at the Conference on Research and Trends in Document Preparation Systems at Lausanne, Switzerland, February 27-28, 1981. Many of the slides in Reid's Chicago keynote presentation were taken from the 1981 paper, "The Scribe Document Specification Language and its Compiler."
The visuals from Brian Reid's keynote at the Markup Technologies '98 conference, given 19 November 1998, are available online in PowerPoint 97 format. Please note that the file is almost 10 megabytes - it has 25 full-page scanned images in it. [local archive copy, 1999-01-04]
Say What You Mean, Mean What You Say: Toward New Techniques for Defining Document Types
C. M. Sperberg-McQueen, University of Illinois at Chicago.
Many formal systems distinguish between semantic correctness, which generally requires human intervention, and formal validity, which can be checked by suitable software. XML introduces the even simpler concept of well-formedness, checkable by very simple software. By simplifying the construction of the document tree, well-formedness may make it feasible to define much more powerful constraints for document types than have been feasible before. Some problems relating to data typing and to formulating constraints for document grammars will be discussed.
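The three levels of checking can be illustrated with a small sketch. A minimal Python example (using a modern XML parser, which of course postdates the conference) showing the distinction between well-formedness, which is purely a matter of nesting and syntax, and validity, which requires a document type definition:

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Return True if the string parses as well-formed XML.

    Well-formedness checks only nesting and syntax; it says nothing
    about whether the document conforms to any DTD.
    """
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<p>Hello, <em>world</em></p>"))  # True: properly nested
print(is_well_formed("<p>Hello, <em>world</p></em>"))  # False: tags overlap
```

Because a well-formedness check is so cheap, the document tree is always available to downstream tools, which is what makes richer, post-parse constraint checking feasible.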
Theory Track (Nov. 19)
11:00 - 11:45
Links in XML: Detection, Representation, and Presentation
Liam R. E. Quin, GroveWare (Canada)
The Extensible Markup Language (XML) supports the markup of hypertext links, and uses links to associate disparate components of compound documents. A number of specifications associated with XML introduce their own forms of linking. Furthermore, there are linking conventions in both computer-based information display systems and in paper-based typographical layout. The basic ideas of many of these linking systems can be described in terms of a unifying abstract model based on the concept of linking functions which map from link sources to link targets. A single minimal terminology makes it possible to compare different linking systems directly.
Note: As of December 15, 1998, a preprint version of the presentation was available at http://www.groveware.com/~lee/papers/markup98conference/links.html, and a 'final text' version was planned for release. Quin wrote [xlxp-dev, 1998-12-15]: "My goal was to show that thinking of links as maps between sets (i.e., as mathematical functions) is a useful and productive alternative to thinking of a hyperverse (?) as a set of nodes with links between them. The main difference is that the sticks-and-nodes model is purely static, and can't easily cope with links that create the node on the other end on the fly (for example) or that have an unspecified number of end points." [local archive copy, preprint version]
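The functional view Quin describes can be sketched in a few lines. This is an illustrative Python reconstruction, not code from the paper; the glossary data and function names are invented examples:

```python
# A link modeled as a function from sources to sets of targets,
# rather than as static node-and-arc pairs.

glossary = {"SGML": "defs.xml#sgml", "XML": "defs.xml#xml"}

def glossary_link(term: str) -> set:
    """Map a source (a term occurrence) to its target set; may be empty."""
    return {glossary[term]} if term in glossary else set()

def index_link(term: str) -> set:
    """A computed link: its target is generated on the fly, not stored
    anywhere, which the static sticks-and-nodes model cannot express."""
    return {"index.xml#" + term.lower()}

print(glossary_link("XML"))    # {'defs.xml#xml'}
print(index_link("Parsing"))   # {'index.xml#parsing'}
```

Comparing linking systems then reduces to comparing the properties of their linking functions (domain, cardinality of the target set, whether targets are stored or computed).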
11:45 - 12:30
Authoring, Managing and Browsing of Large-scale Hyperlinks in Multimedia Product Documentation
Liang H. Hsu, Peiya Liu, Ben Johnson-Laird, and Subramanyam Vdaygiri, Siemens Corporate Research
Describes an R&D project for automatic hyperlinking of SGML documents. The hyperlinking process is based on formal specifications written in Hyperlink Specification Language (HSL), which represents the patterns and context of the required links. Link specifications are created using an interactive link editor. The automatic capturing of link specifications, automatic hyperlinking of documents, hyperlink management for supporting both incremental hyperlinking and logical-to-physical destination mapping, and browsing through both integrated and media-specific viewers are described.
2:00 - 2:45
Managing Knowledge with XML
Ronald Turner, Soph-Ware Associates
Entity management, content management, and knowledge management are not interchangeable terms. They represent three well-defined layers within an information architecture. We are therefore not at liberty to call any managed data set a knowledge management system. Managing entities (storage objects) is a basic requirement and should be an out-of-the-box feature of a robust, scalable XML/SGML-aware document management product. To manage content requires exposing that content to its application-specific processor. But for the system to qualify as knowledge management according to established definitions, it must provide for live human interaction with the content. XML contributes to entity management, and it is an absolute requirement for Web-enabled content and knowledge management.
2:45 - 3:30
Processing Composite Content
Marc McDonald, Design Intelligence
Describes the novel Facet Technology, a mechanism for handling some of the problems of defining composite markup that are also addressed by the namespaces specification and by Architectural Forms. After introducing the concepts and terminology of Facets, examples are worked out diagrammatically to illustrate the concepts and in an XML syntax to illustrate how processing might be performed.
4:00 - 4:45
Using Architectural Processing to Derive Small, Problem-specific XML Applications from Large, Widely-used SGML Applications
Gary F. Simons, Summer Institute of Linguistics
Abstract: The large SGML DTDs in widespread use (e.g. HTML, DocBook, CALS, EAD, TEI) offer the advantage of standardization, but for a particular project they often carry the disadvantage of being too large or too general. A given project might be better served by a DTD that is no bigger than is needed to solve the specific problem at hand, and that is even customized to meet special requirements of the problem domain. Furthermore, the project might prefer for the data it produces to meet the different syntactic constraints of XML conformity. This paper demonstrates how architectural processing can be used to develop a problem-specific XML DTD for a particular project without losing the advantage of conforming to a widely used SGML DTD. As an example, the paper develops a small XML application derived from the Text Encoding Initiative DTD. The TEI Guidelines offer a mechanism for building TEI-conformant applications; the paper concludes by proposing an alternative approach to TEI conformance based on architectures.
Keywords: computing, humanities computing, SGML, XML, architectural forms, DTD design, conformance of derived DTDs, TEI (Text Encoding Initiative), lexicography, dictionary, Sikaiana language, Solomon Islands
An online copy of this paper is available in the SIL Electronic Working Papers Series.
4:45 - 5:30
Generating SGML-specific Editors: From DTDs to Attribute Grammars
José Carlos Ramalho, Alda Reis Lopes, and Pedro Henriques, University of Minho (Portugal)
Many SGML parsers are implemented using traditional syntax-directed translation; this provides good performance for structural validation and batch processing. Problems emerge when we change the goal or the processing context, for example, to build an extension for semantic validation, or to validate online instead of in batch. In the early 1970s, a newer paradigm of semantics-directed translation, based on the formalism of attribute grammars, caught the attention of compiler developers. We have developed a DTD editor that generates attribute grammars corresponding to the DTD being edited, from which, in turn, it is possible to generate a specialized editor for the specific document type. We conclude with a glimpse of the intended environment, which will include a style editor and a semantic editor as well as the DTD editor described.
". . . An attribute grammar (AG) is a well-accepted formalism used by the compiler community to specify the syntax and semantics of languages. Introduced by Knuth, the AG appeared as an extension to the classic CFG (context-free grammar) to allow the local definition (without the use of global variables) of the meaning of each symbol in a declarative style. Terminal symbols have intrinsic attributes (which describe their lexical information) and nonterminal symbols are associated with generic attributes; semantic information can be synthesized up the tree (from the bottom to the root), but can also be inherited down the tree (from the top to the leaves), enabling explicit references to contextual dependencies. . ."
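The synthesized/inherited distinction the excerpt describes can be seen in a toy evaluator over a document tree. This is an illustrative Python sketch, not the authors' generator; the node structure and attribute names are invented:

```python
# Toy attribute evaluation over a document tree: 'depth' is inherited
# (passed down from the root), 'size' is synthesized (accumulated upward).

class Node:
    def __init__(self, name, children=(), text=""):
        self.name, self.children, self.text = name, list(children), text

def evaluate(node, depth=0):
    """Decorate the tree with one inherited and one synthesized attribute."""
    node.depth = depth                     # inherited: flows root -> leaves
    node.size = len(node.text.split())     # synthesized: flows leaves -> root
    for child in node.children:
        node.size += evaluate(child, depth + 1)
    return node.size

doc = Node("article", [Node("title", text="Attribute grammars"),
                       Node("para", text="Semantics flow up and down the tree")])
print(evaluate(doc))           # 9 (total word count, synthesized)
print(doc.children[0].depth)   # 1 (inherited from the root)
```

A validating editor generated from such a grammar can attach semantic checks to these attribute equations rather than hard-coding them into the parser.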
8:00 - 10:00 PM
Standards Update: Markup Related Activity at ISO and W3C - Overview and Work Group Summaries
Moderator: Joan Smith, SGML Technologies Group (US)
An overview of the active work items related to markup in the ISO and W3C committees and working groups. Jim Mason and Jon Bosak will provide context and guidance. Representatives of some key groups will introduce their major focus issues and plans/expectations for their next deliverable(s).
Practice Track (Nov. 19)
11:00 - 11:45
A Transactional Approach to SGML Storage:
Why You Should Ask for More from Your Repository
Stéphane Bidoul, SGML Technologies Group (Belgium)
Most SGML (Standard Generalized Markup Language) repositories are heavily oriented towards document storage. Because of this, there is a tendency to have an interface that is based on a check-out/check-in mechanism of documents or parts of documents. Such an interface is very well adapted to the way in which humans work when interacting with the repository. However, when SGML is considered as a data modelling language, and the stored data gets more complex, the document-oriented check-out/check-in approach becomes inappropriate as a data manipulation language.
In this paper the benefits of a transaction-based interface to an SGML database are presented, along the lines of the update capabilities of traditional databases. Several real-world applications of this mechanism are described. An interface of this type is then presented, and it is shown why this is a very flexible way to access any SGML database, including document-oriented information bases.
Bidoul's paper is available online in HTML format. [local archive copy]
11:45 - 12:30
Building a DTD Maintenance System
David Mc Nally, Matthew Bender & Company
Describes how to maintain a set of DTDs, and allow them to change when necessary, without compromising data and system integrity. Matthew Bender's conversion of over 2 million pages of published data took almost 3 years, and we are now in the process of instituting a DTD maintenance system. Describes the functionality of such a system, including DTD versioning and the justification, management, dissemination, and documentation of changes. In addition, discusses the various ways we want to change DTDs, the need to support changes that lay the groundwork for future products and systems, means of fostering understanding and involvement with stakeholders, testing methods, and the attitudes toward the DTDs during the Conversion Project as contrasted with the changed attitudes during the DTD Maintenance phase.
2:00 - 2:45
The Challenge of Implementing SUBDOC: With Some HyTime Support
Erlend Øverby, University of Oslo (Norway)
At the University of Oslo, we have been working with SGML since 1992. Currently, we have over 120 authors maintaining approximately 1000 SGML files. We have found the SUBDOC feature a useful way to manage SGML materials that must be edited as freestanding units, but that are combined, for publication, with other materials. SUBDOC simplifies our use of marked sections for conditional text, and eliminates many ID/IDREF name conflicts. Simple HyTime linking helps us manage cross-references between subdocuments effectively.
2:45 - 3:30
Sharing and Reuse of Electronic Medical Records via SGML
Walter Sujansky, Jason P. Williams, and Michael Toback, Oceania
Widespread access to, and multi-faceted use of, medical records is a driving force in medical informatics research and commercial development. The ability to share and to reuse electronic medical records depends on appropriate knowledge-level representations of clinical information that existing technologies do not yet support. A number of groups are exploring the use of SGML and XML as representation formats that better support sharing and reuse. Our experiments in sharing SGML-encoded clinical documents among many users via World-Wide-Web technology (based on a commercial clinical information system) have produced initial favorable results; it is possible to reuse SGML-encoded clinical documents for data analysis via a two-step query and report-generation process. More research on the appropriate design of SGML document structures to support sharing and reuse is needed.
4:00 - 4:45
SGML/XML Electronic Commerce and Document Management System
Nam Jin Son, ISOGEN International, and Betty Johns, Society of Petroleum Engineers (SPE)
The Society of Petroleum Engineers (more than 50,000 engineers, scientists, and managers in the oil and gas industry) produces an international schedule of conferences, exhibitions, and workshops as well as journals, electronic publications, books, and courses. SPE needed a system that allowed member access to an article repository containing articles from the most current back to 1951. The SPE solution required the development of an architectural DTD; complex conversion scripts for capturing legacy data; well-defined conversion criteria for incoming articles; a DMS to manage the articles; an SGML/XML authoring, editing, and publishing environment; and (most importantly) a storefront mechanism for real-time purchase and download with the convenience of electronic commerce.
[Conclusion:] In cooperation with ISOGEN International Corp., SPE members now have unlimited Web access to SPE's "Master Disc On-Line", SPE's comprehensive index of more than 30,000 technical papers. No longer limited to an archaic word search, searches of SPE technical papers can be based upon document structure or specific content fields as well as text strings. The implementation of intelligent indexing, searching, and navigational capabilities ensures that members have fast and easy access to the information they want. Additionally, the first page or abstract of a paper can be viewed as native HTML, dynamically transformed from the SGML source, by following a hyperlink from the search result. Upon selection, the full article(s) can be ordered on-line, supported by an on-line shopping cart, real-time credit card authorization, dynamic creation of a "DOWNLOAD" area, and on-line confirmation and update services to both the DMS and SPE's financial systems.
4:45 - 5:30
Multimedia and Aeronautical Technical Documentation: New Challenges, and New Issues
Franck Duluc, Aerospatiale (France)
Aeronautical technical documentation is bulky, complex, and regulated. The problems that have been solved for traditional electronic technical documentation (text and graphics) are now extended by the new types of information associated with multimedia. Most of the problems arise from a combination of the new media (videos, audio, and animation), documents revised every three months, and the use of document repositories to manage and assemble document fragments. Discusses the problems of synchronization and multimedia presentation and creation, particularly as applied within the environment of a repository. Concludes by giving views on how markup technologies can be used for solving our different issues.
8:00 - 10:00 p.m.
Moderated by Joan Smith. Speakers to include Jon Bosak on the status of XML and Dr. Charles Goldfarb on the status of the SGML standard.
Friday, November 20, 1998
Theory Track (Nov. 20)
9:00 - 9:45
Unicode: What Is It and How Do I Use It?
Tony Graham, Mulberry Technologies
The rationale for Unicode and its design goals and principles are presented. The correspondence between Unicode and ISO/IEC 10646 is discussed, and the scripts included in the two are described. Examples of how to specify a Unicode character in a variety of applications are given, and the use of Unicode in SGML and XML applications is discussed. Concludes with descriptions of the character encodings used with Unicode and ISO/IEC 10646.
The full text of this presentation is available online from Mulberry Technologies, Inc.; local archive copy.
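The mechanics of "specifying a Unicode character" that the talk covers can be shown with a one-character example. A minimal Python sketch (modern tooling, used here only for illustration) of the same character, U+00E9 LATIN SMALL LETTER E WITH ACUTE, written as decimal and hexadecimal numeric character references in XML:

```python
import xml.etree.ElementTree as ET

# &#233; (decimal) and &#xE9; (hexadecimal) both denote U+00E9;
# any XML parser must resolve them to the same character.
doc = ET.fromstring("<p>caf&#233; caf&#xE9;</p>")
print(doc.text)        # café café
print(hex(ord("é")))   # 0xe9
```

The same numeric references work in SGML documents whose declaration uses a Unicode-based document character set, which is what makes character references portable across encodings.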
9:45 - 10:30
Name Resolution Using an LDAP Directory Service
Robert Streich, Schlumberger
The Lightweight Directory Access Protocol (LDAP) is a lightweight front end to the X.500 Directory Access Protocol; LDAP clients can use most features of X.500 without incurring the heavy cost of the DAP network protocols. On top of an LDAP directory, we can build a resolution service for Uniform Resource Names (URNs). The LDAP protocol and the directory structure associated with it provide many of the features necessary for building and managing a general-purpose URN resolution service.
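The core of such a service is a name-to-locators lookup. A heavily simplified Python sketch in which a plain dictionary stands in for the LDAP directory; the URN, the mirror URLs, and the function name are all invented for illustration:

```python
# Sketch of URN resolution over a directory lookup. In a real deployment
# the dict below would be an LDAP search against the directory tree.

directory = {
    "urn:example:docs:spec-1998": [
        "http://mirror1.example.com/spec.xml",
        "http://mirror2.example.com/spec.xml",
    ],
}

def resolve(urn: str) -> list:
    """Return every locator (URL) registered for a URN; empty if unknown.

    A URN names a resource independently of location; resolution maps
    that stable name onto however many current locations exist.
    """
    return directory.get(urn, [])

print(resolve("urn:example:docs:spec-1998"))
print(resolve("urn:example:docs:unknown"))   # [] -> name not registered
```

The directory-backed design gives replication, access control, and distributed administration without building those features into the resolution service itself.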
11:00 - 11:45
In Defense of Invalid SGML
David J. Birnbaum, University of Pittsburgh
When using SGML to encode new texts, adherence to a DTD ensures that the resulting documents will observe a coherent structure. For example, if a dictionary is divided into lexical entries, which consist of a keyword, followed by a phonetic transcription, followed by examples, then users who create new dictionaries based on this DTD will be required by their SGML validating tools to include exactly one keyword, followed by exactly one transcription, followed by at least one example for each entry. Problems arise when creating an electronic edition of an existing print document, such as the Oxford English Dictionary (OED), which contains structural errors as a result of the fallibility of human editors. Because the OED is an object of study in its own right, simply correcting the errors would eliminate historical information of interest to scholars. Examines the consequences of anomalous data for SGML and XML DTD development and markup and discusses the advantages and disadvantages of different potential solutions.
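The dictionary-entry content model described above can be checked mechanically in the way validating parsers classically do: as a regular expression over the sequence of child element names. A minimal Python sketch (element names are illustrative, not from any actual OED DTD):

```python
import re

# Content model from the abstract: exactly one keyword, then exactly one
# transcription, then one or more examples -- i.e. (keyword, transcription, example+).
CONTENT_MODEL = re.compile(r"keyword transcription( example)+$")

def valid_entry(children: list) -> bool:
    """Check a child-element sequence against the entry content model."""
    return CONTENT_MODEL.match(" ".join(children)) is not None

print(valid_entry(["keyword", "transcription", "example"]))          # True
print(valid_entry(["keyword", "example"]))   # missing transcription: False
```

An entry in a legacy text that omits the transcription fails this check, which is exactly the tension the paper examines: the anomaly is historically meaningful, yet a strict DTD rejects it.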
11:45 - 12:30
Models for Structured Document Database Management System
Timothy Arnold-Moore and Ron Sacks-Davis, Royal Melbourne Institute of Technology (Australia)
A review of the models and terminology of document management databases. Discusses numerous models for handling structured documents at the three layers: the logical or data model (e.g., whole documents versus documents divided into fields or elements); the system or architectural model (e.g., string-based models, relational model, extended relational or object-oriented model, hybrid models); and the physical or implementation layer (such as inverted files, PAT trees, inference networks, etc.). Also presents a taxonomy of queries which might be made on a collection of structured documents in order to discuss how the different models handle the various categories of queries.
Practice Track (Nov. 20)
9:00 - 9:45
SGML Joins the Battle Against International Art Robbery
Jorge Leal Portela, SGML Technologies Group (Belgium)
Many organizations, such as museums, police forces, and insurance companies, are faced with the problem of identification of stolen and recovered objects of art and have difficulties in sharing relevant information. GRASP (Global Retrieval, Access and information System for Property items) is a project which addresses the problem of sharing information by demonstrating how descriptions of objects can be captured, stored in a heterogeneous database, and widely distributed across a network environment.
This paper addresses the issue of how SGML (Standard Generalized Markup Language) was successfully used for numerous aspects of the project, ranging from data storage and specification of the exchange structure, to distributed database synchronization control, in combination with a programmable processing tool.
Jorge Leal Portela's paper is available online in HTML format; [local archive copy]
9:45 - 10:30
Orlando Project: Moving from SGML to XML for Delivery of Content-rich Encoded Text
Terry Butler and Sue Fisher, University of Alberta (Canada)
The Orlando Project ['An Integrated History of Women's Writing in the British Isles'] is creating a scholarly history of British Women's Writing, in the form of written volumes of criticism and a thematic chronology. The literary history is written by team members and tagged in SGML; the electronic results will be available to end-users through a search mechanism, we hope on the Web. The project has composed its own DTDs, borrowing core tags from the TEI and adding new tags for critical interpretation and analysis. XML raises several issues for us. Our DTDs use inclusions extensively, which are not easy to migrate into XML. Our notion of electronic text involves searching, sorting, and extracting text on the fly from several document types; we wonder whether XML tools will be more capable of meeting our needs than the current generation of SGML tools.
Slides from the presentation are available in HTML format (21 slides), and in the PowerPoint source file; [local archive copy]. Presentation title: "The Orlando Project and the Question of Delivery in XML: A PowerPoint Presentation." See the longer list of publications for other description of the Orlando Project.
11:00 - 11:45
SGML Boosts Standard Document Processing for a Major European Bank
Philippe Fontaine, SGML Technologies Group (Belgium)
Abstract: Large banking organizations have always been faced with the problem of distributing information that constitutes professional knowledge to the right people, at the right time, and in the right place. With the ever-increasing arrival of new technologies - information highways - this is a hot topic again. Internet and intranet-based solutions can bring new aspects into the communication of information.
Over and above the scope of distributing information, structured documents that contain professional knowledge have to be produced, validated, stored, maintained, and eventually distributed in a traditional environment that is being turned upside down. This paper shows how an SGML-based solution successfully transforms the dream of managing core business information into reality.
Philippe Fontaine's paper is available online in HTML format; [local archive copy]
11:45 - 12:30
Rapid Reporting: A Case Study
Laurel Shifrin, Matthew Bender & Company
With changes in law and important case decisions occurring nearly every day, legal publishers must continually explore ways to push production time to rock bottom. A legal publishing company reports on its creation of a rapid reporting system, from SGML analysis and design through workflow analysis and management. We use word-processing templates as a simple universal authoring tool, augmented by Perl scripts, Jade, and DSSSL style sheets. Net result: a 90% cut in production process time.
Plenary (Nov. 20)
2:00 - 2:45
Dealing with Multi-Dimensional Text
Tim Bray, Textuality
Computers have long been put to good use in processing text, and text processing is one of the main forces driving the widespread adoption of computing. However, many of the most successful technical strategies put to use in processing text have been based on a (nearly) one-dimensional model of text. SGML and XML can sometimes, but not always, be processed in one dimension. This presentation will review some technologies that have proved successful in processing one-dimensional text, discuss the extent to which they remain useful, focus on some problem areas inherent in multidimensional text, and outline some new technical approaches that may prove useful with respect to these problems.
2:45 - 3:30
Parsing - who needs it?
D. Grune, Vrije Universiteit (The Netherlands)
The problems of SGML, HTML and XML are not exactly those for which parsing has been developed. Dr. Grune, an expert on parsers and parser theory, will discuss what kinds of parsing are useful for what kinds of input texts, with an eye to SGML, HTML and XML. Discussion questions include: Isn't parsing an artifact and can't we do without it? (It is not at all obvious that you still need parsing in the classical sense for XML, although it is clear that you do need something.) And in what fields does one need parsing?
Overview: "SGML was clearly not designed to interface seamlessly with existing computer science parsing techniques. Most of these techniques were already very well-known in the beginning of the 80s, when SGML was designed. The syntax issues of SGML in ISO 8879 are expressed in completely novel terminology and the parsing requirements do not match known algorithms and techniques. The results were not favourable: few computer scientists have worked on parsing SGML, although many software designers and programmers have. Most computer scientists found parsing SGML a less than attractive challenge, and some were daunted by its alien terminology. The situation has deteriorated with the advent of HTML, which had a syntax defined more or less by what Netscape or Microsoft could get away with. Unsurprisingly, this did not lead to a strong basis and stable software. The developers of XML got the point, and did it right, both from the parsing point of view, the terminology, and the SGML compatibility." [from the Introduction] Note that the text of Dick Grune's book Parsing Techniques - A Practical Guide is available online. The book "treats parsing in its own right, in greater depth than is found in most computer science and linguistics books. It offers a clear, accessible, and thorough discussion of many different parsing techniques with their interrelations and applicabilities, including error recovery techniques. Unlike most books, it treats (almost) all parsing methods, not just the popular ones. . ."
The slides and text from Dick Grune's presentation are now available online.
4:00 - 4:45
View from the Chair
B. T. Usdin, Mulberry Technologies
100 Daingerfield Road
Alexandria, VA 22314-2888