Bibliography: SGML '96 Conference Proceedings. Celebrating a Decade of SGML
The bibliographic records in the following document have been created for the volume SGML '96 Conference Proceedings. Celebrating a Decade of SGML. As described in the first record (for the proceedings volume itself): each bibliographic entry includes the published abstract, author contact information, an indication of the "track" in which the presentation was delivered, and additional annotations or relevant hypertext links. The published abstracts for the papers, in many cases, are considerably more detailed than the brief abstracts which accompany the online conference program. This bibliographic information is being moved into the bibliographic database of the SGML/XML Web Page [now complete, March 13, 1997]. As a convenience to readers, it will also be retained online as a single document. Kindly report any errors by electronic mail. Authors are also invited to communicate about updates, revisions, retractions, new URLs, etc.
The SGML '96 Conference celebrated a decade of SGML, reckoned from the first publication of SGML as an ISO standard in 1986. The seventy-eight (78) published papers in the proceedings volume are divided into seven major sections, and represent a majority of the eighty-five (85) papers read at the conference. The collection not only documents an impressive milestone for the ISO 8879 standard, but serves as a valuable resource for SGML users. The SGML '96 conference itself was attended by over 1400 people and included more than 120 speakers and 100+ poster sessions, in addition to conference sessions and exhibits.
Introductory essays in the proceedings volume are from the conference Co-Chairs B. Tommie Usdin and Deborah A. Lapeyre, and from Charles F. Goldfarb ("The Roots of SGML - A Personal Recollection"). The full inventory of published papers includes: Introductions (3 papers); Newcomer (11 papers), User (21 papers), Expert (16 papers), Business Management (5 papers), Case Studies (16 papers), "And More" (6 papers). The volume has complete title and author indexes. It was produced directly from the SGML source (based upon the "GCAPAPER" DTD) using ArborText's ADEPT Series SGML software.
Most of the published conference papers are referenced (by author) in the online bibliography of the SGML/XML Web Page. Each bibliographic entry includes the published abstract, author contact information, an indication of the "track" in which the presentation was delivered, and additional annotations or relevant hypertext links. The published abstracts for the papers, in many cases, are considerably more detailed than the brief abstracts which accompany the online conference program. The SGML '96 Conference Proceedings volume containing the full text of the papers may be obtained from GCA. GCA may also be reached at: GCA Publications, 100 Daingerfield Rd, Alexandria, VA 22314-2888 USA.
Abstract: "This talk looks at different approaches to introducing SGML, at different perceptions of the language and related technology, and at the changing nature of the audience for SGML. It is for those who are just being introduced to SGML and for those who must now make the case for SGML within their organization or industry."
Abstract: "The paper introduces a new initiative for SGML in the medical informatics industry. It describes the current state of information processing in medicine, gives some of the requirements for a new, SGML-based approach to medical information processing, introduces the group working for the introduction of SGML into medical informatics and gives a brief description of the umbrella medical information standard called HL7 under which the new initiative is working. The paper concludes with a summary of the challenges facing the new initiative and an invitation to all to participate and contribute. Up-to-date information on contacts and programs will be available at the conference session."
Further information on the SGML Initiative in Health Care (HL7 Health Level-7 and SGML) can be found in the main entry of the SGML Web Page.
Abstract: "The annual SGML Conference provides the opportunity to focus on technology, expand our level of knowledge, exchange ideas and experiences with others in similar or related environments. Once the week is concluded, however, we are challenged to sustain the momentum that's been attained. This does not mean that one should only wait for the next year's conference; much can be done in the interim to continue the pursuit of knowledge and exchange of information. Many communities have organized local forums, specifically designed to address these concerns. This talk will focus on some of the major issues in establishing and maintaining such an organization."
Another paper discussing the role and operation of SGML user groups was presented at SGML '96 by Holly Smith.
Abstract: "This presentation looks at using multiple DTDs for different stages in the life of a given piece of information, and examines the issues that should be taken into account when designing DTDs for a given application and deciding just how many DTDs are required.
A number of different models (e.g., a single DTD for the entire process; one DTD for authoring, another for storage, another for output, etc.) are examined, and the pros and cons for each are discussed. These considerations include the costs for each model (cost of maintaining multiple DTDs as well as the transform filters placed between them, versus the inefficiency of authoring with a single huge DTD), as well as the question of 'roll your own' versus using industry-standard DTDs."
Abstract: "Several studies have tried to address the topic of Object Orientation around SGML.
The question asked was too simple and dichotomic; the answer given far too simple 'yes' or 'no'. The SGML application aspect, that is not covered by the standard, was not considered when searching for commonalities.
This paper intends to show that some application architectures coupled with an SGML parser offer an object mechanism with embedded SGML.
The relation between the parsed tokens and the application methods shows that application objects are connected to parsing objects in a simple and efficient paradigm which fully conforms to the LINK feature of the SGML language.
Adopting this view of an SGML application makes all the facilities offered by the LINK feature suddenly self-evident and useful."
Session Abstract: "The DSSSL (Document Style Semantics and Specification Language) Online session will consist of a 45 minute orientation session followed by two or more hours of interactive discussion and a demonstration of Jade, a DSSSL engine. Since the basic motivation behind dsssl-o is the application of semantics to generic SGML documents served out over the Internet, some time will be spent reviewing the case for SGML on the Web and the need for semantic specification methods beyond those being currently developed for HTML before presenting the Application Profile itself.
It is assumed, but not required, that session participants will have already gained some familiarity with the DSSSL standard. The DSSSL tutorial on Sunday, November 17, is highly recommended for persons planning to attend the DSSSL Online workshop."
Abstract: "There are three major components to an SGML Document - the SGML Declaration, Prolog and Document Instance. An understanding of their roles, their inter-dependencies, and their arrangement within a practical working environment is essential for all users of SGML based systems. As well as describing the purpose and content of each major component of an SGML Document, this paper explains how they are managed by an entity manager, and how they integrate with a parser."
Abstract: "Extensible Markup Language (XML for short) is being designed under the auspices of the World Wide Web Consortium; the larger goal of this effort is 'to enable future Web user agents to receive and process generic SGML in the way that they are now able to receive and process HTML. As in the case of HTML, the implementation of SGML on the Web will require attention not just to structure and content (the domain of SGML per se) but also to link semantics and display semantics.' [from the W3C 'Activity' Page] As a subgoal, we are creating an SGML application profile, XML, that is designed to provide many of the benefits of SGML in a lightweight, easy-to-use, easy-to-implement dialect that omits many of the difficult or problematic features of the full standard. This paper is an interim report on the progress of the work on creating an XML specification. This work is proceeding rapidly and we anticipate a draft of the specification being available at the time of SGML '96."
Further information on XML is available in the main XML entry of the SGML Web Page.
Abstract: "A variety of SGML authoring and editing tools exist on the market today and new ones are being added all the time. Initially, there seemed to be the need for only one type of tool but as a result of market need there are now a number of different 'flavors' each best suited for a particular SGML application.
This session will discuss the role of SGML authoring within a total publishing system. It will also describe the various types of tools available today for editing and authoring and what broad category each fits into in terms of its 'flavor'. A list of all known authoring and editing tools will be provided."
Abstract: "The British National Corpus (BNC) is a rather large SGML document, comprising some 4124 samples taken from a rich variety of contemporary British English texts of every kind, written and printed, famous and obscure, learned and ignorant, spoken and written. Each of its hundred million words and six and a quarter million sentences is tagged explicitly in SGML and carries an automatically-generated linguistic analysis. Each sample carries a TEI-conformant header, containing detailed contextual and descriptive information, as well as more conventional SGML mark-up.
The corpus was created over a four year period by a consortium of leading dictionary publishers and academic research centres in the UK, with substantial funding from the British Department of Trade and Industry, the Science and Engineering Research Council, and the British Library. It is currently available under licence within the European Union only, where it is increasingly used in linguistic research and lexicography, in applications ranging from the construction of state of the art language-recognition systems, to the teaching of English as a second language.
This paper begins by describing how the corpus was constructed, and gives an overview of some of the SGML encoding issues raised during the process. A description of the special purpose SGML aware retrieval system developed to analyse the corpus is also provided."
See a longer abstract [mirror copy], and an online version of the SGML '96 presentation: Using SGML for Linguistic Analysis: the case of the BNC [mirror copy, pis aller, but see the canonical source if possible].
Abstract: "Currently, most mathematics DTDs in widespread use are presentation-based, that is the markup relates to the layout of the mathematics on the page or screen rather than to the mathematical content. Such an approach makes the interchange between different SGML applications, and between SGML applications and computational applications, very difficult. This paper proposes a semantics-based DTD for mathematics, and describes a mechanism for selection of the particular branch of maths in use and extension of the DTD to cover areas of maths not as yet covered. Issues related to presentation, and the implications for applications, are discussed. Examples of possible mappings between the DTD and notations used by a typical computational program are given.
The meeting of the ISO 12083 committee in Munich in May 1996 accepted the proposal as the basis for the Mathematics fragment of the coming revision of the 12083 Standard. The paper reviews the issues raised and the resulting implications for the Mathematics fragment.
Significant progress has been made since the Munich meeting. The DTD has evolved following comments and test cases sent to the authors. Contacts with other interested organisations, such as the OpenMath consortium and the W3 HTML mathematics group have been pursued."
Abstract: "Generation of SGML-coded documents as a result of database query processes is a commonly used practice. In most cases, however, the contents of such documents are entirely built from scratch as an SGML-formatted image of the query results. We present an extension to this practice, in cases when documents are made of a combination of human-generated parts and database originated parts. When such documents are updated, human-generated parts should remain untouched, while database originated parts (text, tables and graphics) should be regenerated or updated.
The method used here is that of SGML templates, which embed links targeted to a database. Such a technique can be used in many application fields, ranging from Web applications to industrial catalog publishing, where complex, human-generated document structures coexist with database extracts."
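Annotation: the template idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation; the `<dbref query="...">` element, the `regenerate` function, and the dictionary standing in for a database are all hypothetical names chosen for illustration. The sketch refreshes only the database-originated spans and leaves the human-written text around them untouched.

```python
import re

# Hypothetical placeholder element marking database-originated content.
DBREF = re.compile(r'<dbref query="([^"]+)">.*?</dbref>', re.DOTALL)


def regenerate(template, run_query):
    """Refresh database-originated spans; leave all other text as written."""
    def refresh(match):
        query = match.group(1)
        # Keep the link itself so the document can be regenerated again later.
        return '<dbref query="{}">{}</dbref>'.format(query, run_query(query))
    return DBREF.sub(refresh, template)


# Stand-in for a real database: a dict keyed by a made-up query name.
catalog = {"price:widget": "12.50", "stock:widget": "40"}

doc = ('<para>Our widget is hand-finished. Price: '
       '<dbref query="price:widget">11.00</dbref>; in stock: '
       '<dbref query="stock:widget">55</dbref>.</para>')

updated = regenerate(doc, catalog.__getitem__)
print(updated)
```

Because the link stays embedded in the output, running `regenerate` on an already updated document leaves it unchanged, which is the property the abstract asks for when documents are updated repeatedly.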
Abstract: "Organizational decisionmaking patterns determine SGML investment strategies and potential benefits. A framework for understanding the primary policy objectives that can influence the selection of SGML (inherent policy effects) and application design (user-defined policy goals) will be presented. Competing and often contradictory goals and perceptions of value often make the development of a business case for SGML very difficult. Methods for integrating stakeholder principles, interests, and expectations in the early stages of application conceptualization and design will increase real and perceived benefits and de-fuse potential political problems before they develop."
See the bibliography entry for a related article by Kurt Conrad, "SGML, HyTime, and Organic Information Management Models."
Abstract: "This paper/presentation is an update of the one which was delivered at SGML'95. It is intended to be a general introduction to the issues and concepts involved in the selection of software tools for the electronic delivery and retrieval of SGML (Standard Generalized Markup Language) documents. In addition, some of the issues unique to publishing to CD-ROM or via the World Wide Web will be explored."
A similar version of this paper is available online: "Tools for Implementing SGML-Based Information Systems: Viewers and Browsers, Text Retrieval Engines, and CD-ROMs," based on a paper which was presented at SGML'95, December 4-7, 1995 and published in the conference proceedings. URLs: http://www.3-cities.com/~conrad/delivery.htm, [mirror copy].
Abstract: "The joint Air Transport Association/Aerospace Industries Assn (ATA/AIA) Graphics Working Group has developed a specification for Intelligent Graphics (IGEXCHANGE) to support the interchange of graphical application structures containing information which is non-graphical in nature. This paper will cover the development of industry requirements for intelligent graphics, describe Amendment 2 to the Computer Graphics Metafile (CGM) Standard developed to support application structuring of graphics, and describe the ATA industry profile of that standard. In addition, the use of SGML syntax to describe attributes associated with application structures will be discussed."
See: ATA profile -- ATA Specification 2100 Graphics Exchange, and EPCES relationship to ATA 2100 for other information on the ATA profile.
Abstract: "Consleg Interleaf is an example of an SGML application that is used in a production environment. On a daily basis, operators use the application in order to provide lawyers from the European Community with the most accurate information on the existing legislation. As such, it is an application that illustrates how the SGML concepts can be applied in order to obtain a sophisticated document handling system."
Abstract: "There has been much discussion as well as work accomplished worldwide regarding the adoption of SGML as a methodology for the development of information standards in the pharmaceutical industry. This paper describes an example of how SGML-based tools that exist today were used to produce a complete Supplemental New Drug Submission for the Health Protection Branch, Health Canada. The submission was SGML browser-based, running on a Windows 3.1 PC. The system allowed the reviewer to navigate and comment electronically on all the textual documentation, clinical data and Case Record Form images required for the submission, and compiled all comments and relevant information collected during the review process for use in the reviewer's report. Summary tables were linked to the underlying clinical data from the browser so that tables could be verified, the underlying database queries modified and analysis redone as the document was reviewed.
A paper-based submission was made simultaneously to Health Canada to satisfy legal requirements. The electronic version used the same SGML-based instance as the paper, ensuring a one-to-one correspondence between the paper and the electronic data. This made possible, for example, the generation of Hytime hyperlinks for the table of contents and other cross-references required for navigation of the electronic version without any additional authoring or manual markup. The relative ease with which the source documents were taken from the authoring to the publishing phase greatly facilitated the incorporation of late changes to the submission resulting from electronic and manual in-house review.
It was concluded that the adoption of electronic document management and review techniques early in the submission development process greatly enhanced the quality of the final document which also eliminated unnecessary review delays at the government agency due to missing or inaccurate information. Although a government approved DTD was not available for this project, it is clear that the ability to be able to parse a document before submitting it for review is enough justification alone for using SGML in the standardization of information of this type."
Abstract: "FORMEX (FORMALIZED EXCHANGE) is one of the very first initiatives that adopted the SGML notation. Initially designed around the UNESCO CCF standard (COMMON COMMUNICATION FORMAT), the original FORMEX specification (1986) and its first revision FORMEX V2 supported both notations. This year, the Office for Official Publications of the European Communities EUR-OP released a new version of FORMEX V3 which incorporates more than ten years of experience in the SGML field. FORMEX V3 is based exclusively on the SGML notation and SDIF is the communication standard encapsulating the exchange of data. Though the FORMEX specification is able to support any kind of document, it has a specific target: Legal Publications. The set of tags exhibited in FORMEX V3 is highly semantic and can be combined into a wide variety of legal publications doctypes. FORMEX V3 is the basic mechanism of the EUR-OP editorial work and information exchange. The global workplace is articulated around specialised workshops, handling production, housekeeping, consolidation of law, etc. The consistency of the system is a reference database which links the different workshops logically for document production, archiving and distribution. Whenever required, images are embedded in the SGML tagging. The SGML-structure information is distributed via different media and can be targeted for different users. Workshops for authoring, translating, editing and proof-reading, indexing and cataloging, etc. can be specific systems; some high co-operative workshops are connected to more than 500 workstations. EUR-OP releases the FORMEX V3 specification as a PUBLIC tagging scheme that can be shared by many EUROPEAN and non-EUROPEAN legal and governmental publishers."
Abstract: "Free SGML software is available to create document instances and SGML processing applications, as well as to analyze complex DTDs. This article describes the origin and use of SGMLC-Lite, Near and Far Lite, the PSGML add-in to the Emacs text editor, the NSGMLS parser, Earl Hood's perlSGML tools, and the sgmls.pl and SGMLS.pm perl application development tools. This paper is excerpted from the book "SGML for Free," available soon from the Prentice-Hall Charles F. Goldfarb Series on Open Information Management."
Bob DuCharme maintains an online resource entitled "DBMS Support of SGML Files." It includes "information collected about database systems that present themselves as reasonable solutions for storing SGML data." See: http://cs.nyu.edu/cs_alumni/duchar96/sgmldbms.html
Abstract: "We consider the syntax and semantics of the TL (Transformation Language) in the DSSSL (Document Style Semantics and Specification Language) specification (DSSSL96). At present TEs (Transformation Expressions) are less than first-class language objects - they must all reside at the top level, and cannot be manipulated like other DSSSL/Scheme objects. In particular, there is no means of passing information among TEs, so one TE cannot take advantage of information derived by another, such as passing data about parent nodes to direct the transformation of child nodes. We propose extending the DSSSL syntax to allow a DSSSL program to better exploit the tree-like nature of the source grove by providing a semantics for nesting query expressions, allowing information to be passed around while retaining DSSSL's functional nature. The TEs would also come closer to being first-class objects. We suggest these extensions will make DSSSL programs easier to write and probably easier to optimize."
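Annotation: the abstract's example of "passing data about parent nodes to direct the transformation of child nodes" can be illustrated in general terms. The sketch below is written in Python rather than DSSSL/Scheme and does not reproduce the syntax the authors propose; the tree shape, the element names, and the `transform` function are assumptions made for illustration only. It shows a purely functional tree walk in which each child is transformed with information handed down from its parent.

```python
def transform(node, parent_name=None):
    """Transform a (name, children) tree without mutating it.

    The parent's name is passed down so that a child can be transformed
    differently depending on where it sits in the source tree.
    """
    name, children = node
    if name == "title":
        # Hypothetical rule: chapter titles and section titles map differently.
        out = "h1" if parent_name == "chapter" else "h2"
    else:
        out = name
    return (out, [transform(child, parent_name=name) for child in children])


# A tiny stand-in for a source grove: nested (name, children) tuples.
source = ("chapter", [
    ("title", []),
    ("section", [("title", []), ("para", [])]),
])

result = transform(source)
print(result)
```

The walk returns a new tree and never changes the source, which mirrors the abstract's point that information can be passed around while keeping a functional style.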
Abstract: "In this paper we report the use of SGML for the documentation of highly structured engineering data in the telecommunication area. These structures are built by using a method, called Macro Modeling Technique. Macro Modeling Technique provides means for structuring the information about complex technical domains in a most unambiguous and nonredundant way. Models built by using Macro Modeling Technique are highly modular and can be refined and aggregated without overlap. The models also allow very precise access to engineering information because of their elaborated detailed structures.
It was a challenge to use the SGML language to map structures of the Macro Models onto document structures and support certain operations on a model within a document. For this purpose we have defined an unambiguous mapping from our models to content-oriented DTDs. We have developed a systematic approach to construct specifically tailored DTDs by combining parts of various model-based DTDs.
We have successfully applied this approach to the documentation for large systems in the telecommunication area and we implemented a prototype version of the required operations."
Abstract: "During development of our first-generation online documentation conversion and delivery system, we addressed most of the obvious problems and requirements we foresaw. After the system was in place, we discovered other less obvious areas for improvement. We implemented the changes in a second-generation system and are planning additional changes in a third-generation system. This paper addresses the plans and realities of each of these systems."
Abstract: "Lately it seems that everyone is talking about HTML. Some of you SGML '96 attendees may believe that this hot topic has nothing to do with SGML. Some of you may believe it has everything to do with SGML. And the rest of you may not be sure whether it's relevant or not.
Whether you consider HTML as a critical element of your information delivery strategy or not, you are probably reacting to .... thinking about ... being asked to put your SGML content on an intranet. This brings up many challenges: how to track revisions, how to manage relationships and links between objects, how to reuse information effectively and efficiently, and how to retain your investment without transforming to HTML.
Getting the most out of your SGML source means exploiting your investment by using that source as the same source for your intranet delivery needs. There's a big payoff in combining HTML, SGML, document component management and internet technologies to achieve a diversity of document products, increase quality of customer service, and ensure accuracy and timeliness. Imagine automatically assembling pieces of information which exactly matches a customer's need, and delivering the most up-to-date information in the form and format requested. Achieving this is possible today.
To help you achieve this 'jackpot' of capabilities, this presentation will:
- describe the need and business case for intranets
- identify a roadmap for exploiting SGML
- list key capabilities of such a system
- identify key technologies that should be integrated
This presentation, aimed at a managerial audience, will examine the aspects, value and impact of several real-world intranet applications. It will describe the relevant technologies and offer guidance on enabling your current technology investment to drive this new type of information delivery. It will also discuss critical features and functions of such a system. You will leave this presentation with a deep understanding of how to build a complete information delivery strategy."
This presentation was the text of a keynote address at SGML '96, and is printed in the Introductory section of the proceedings volume.
From the Conclusion: "I like to think of the history of SGML as - what else - a tree structure. One root - from Rice to GML to my basic SGML invention - joined at the base of the trunk by the other - Tunnicliffe to Scharpf and GenCode. The trunk, of course, is the extraordinary 8-year effort to develop ISO 8879, involving hundreds of people from all over the world. The products and tools that came after are the branches, the many applications the leaves, and they are all still growing.
And in all these 30 years, while the technologies of both computers and publishing have undergone overwhelming and unpredictable changes, the tree continues to bear the fruit that I described in 1971:
The principle of separating document description from application function makes it possible to describe the attributes common to all documents of the same type. . . [The] availability of such 'type descriptions' could add new function to the text processing system. Programs could supply markup for an incomplete document, or interactively prompt a user in the entry of a document by displaying the markup. A generalized markup language then, would permit full information about a document to be preserved, regardless of the way the document is used or represented."
Note: The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "There are many hypertext authoring tools available for specific outputs. For example, software that enables HTML or Windows Help authoring. These tools provide easy to use solutions for specific outputs, but they lack the benefit of a tailored, structured environment, and of course they do not allow the creation of multiple outputs from raw content stored as SGML -- a requirement we have at Novell.
However, these tools provide distinct advantages to the author that an SGML-based authoring system should strongly consider. To ignore these capabilities is to risk the SGML system being unusable, or incapable of handling large hypertext projects. These advantages center around the management of small information objects we call topics, and the links between them that are inherent in hypertext systems. To combine the power of SGML with the advantages of off-the-shelf authoring tools, Novell has developed a hybrid, named HelpWise. Novell's goal with HelpWise is to leverage the benefits of a structured SGML authoring system, and retain the link management that is crucial while creating hypertext documentation."
Abstract: "Using simple examples concentrating on five characters from an exotic character set, the author shows techniques for describing a document's character set in the SGML Declaration and how different document character sets are treated by the parser. The presentation concludes with examples of how the techniques are used in real life."
Available online in HTML format: "Document Character Sets by Example", by Tony Graham, Consultant, Mulberry Technologies, Inc.
In the category of "Free SGML Transformation Tools" are free software packages "for transforming an SGML instance into something else, be that another SGML instance or a file in some other format." Graham discusses "the criteria for selecting an SGML transformation processing tool."
Abstract: "Book publishing is a conservative industry that relies on a tried-and-true process, characterized by a strong division between 'editorial' functions (obtaining and preparing manuscript) and 'production' functions (turning manuscript into printed books), a division commonly known as 'the wall'. SGML has been relegated to the production side in most implementations. While there is much to be gained here, this limited approach also involves a considerable sacrifice of potential benefit. This paper presents a blue-print for maximizing the benefits of SGML in a commercial book-publishing setting by showing how SGML can be leveraged on both sides of the wall, with consideration of practical implications for both process modification and the implementation of technology.
"The proposed approach for taking full advantage of SGML in a publishing setting involves the mark-up of manuscript not at the stage when it is traditionally keyboarded for typesetting, within production, but at the intake stage. Because there is no way to enforce author compliance with an SGML authoring strategy, it must be handled on submission by an 'intake unit' that is under the control of the editorial departments. This association allows those responsible for DTD creation and initial tagging of manuscript to be in direct contact with those whose job it is to dictate the structure of the documents, and who are most familiar with its content.
"Further, this connection makes the editors in charge of decisions regarding repurposing (electronic versions of existing titles on Web or CD-ROM) and reuse (ancillaries, subsequent editions) directly aware of the potential of SGML to help ease the costs of these (often low-profit) publications. If properly implemented, they also avoid the need to learn the more arcane and unfamiliar aspects of SGML; they can rely on their own staff (NOT answerable to the head of production) to supply them with the necessary technical guidance. The 'structured manuscript' allows the automation of repetitive and labor-intensive tasks in the development process, while making sample material readily available for delivery in print or on the Web for early promotion efforts and expert review.
"By the time the manuscript passes to production, many time-consuming production chores (typecoding, identification of ambiguous structural elements, consistency checks) have already been performed. The editorial departments are brought into closer touch with the realities of scheduling (a constant bone of contention between editorial and production arms), while the production department can now create the printed book at a much accelerated rate, again through automated processes enabled by SGML.
"The introduction of the SGML 'intake unit' into what is traditionally a non-technical branch of a publishing company could be a difficult change to implement; through the proper use of conversion and authoring technology for both initial tagging and subsequent development (and with appropriately designed document types), many of these challenges can be overcome. The gains realized in giving the power of SGML to those who can best make use of it will also help to enable the success of this tricky aspect of implementation.
"The antagonism between editorial and production units within a commercial publishing company has many negative effects. The proper implementation of SGML in this setting could actually help to ease these antagonisms, by adjusting the responsibilities and power that accompany the use of this technology. At the same time, such an implementation would allow publishers to realize the full promise of SGML, in terms of reuse, repurposing, and in faster time-to-market, not just in the final phases of book publication, but throughout the publication process."
See the bibliographic entry for a related article by Arofan Gregory: "Commercial Book Publishing and Author Control."
Abstract: "Success of legacy conversion might be the single most important determinant of your organization's success in a move towards an SGML environment. It can also be the single most costly aspect of the project. This session's goal will be to dispel the myths. We will present an overview of the key issues and illustrate them with real-life experience. We will discuss: keying vs. OCR vs. software conversion; what software can really accomplish; what you can expect in quality and how you measure it; what a 'ballpark' quote includes and what it doesn't; and how to improve the probability of success.
Data Conversion Laboratory prepares data and text for CD-ROM and Web publishing. Going beyond conversion, DCL specializes in enhancing your legacy documents to meet the new demands of SGML, HTML, PDF, and other structured formats. The company supports all major electronic source formats as well as paper and microfilm."
Abstract: "Information is the raw material from which information products are produced. Nowadays, new information products are needed, including CD-ROM, online databases, World Wide Web pages, and electronic browsers, in addition to printed documents, which impacts production processes. The reasons why SGML is ideal for supporting multiple outputs are discussed. Because of the many process changes involved, it is important to cost justify your SGML project. The three keys to a successful cost justification proposal are: 1) understanding your company's goals, 2) understanding your contribution, and 3) understanding your readers. Return on investment and cost/benefit analysis approaches to a cost justification proposal are discussed. Some formulas for associating cost savings with some tangible SGML benefits are presented."
Abstract: "The Electronic Publishing Solutions department at Northern Telecom (Nortel) transformed product and price publications from paper to electronic media within a short period of time. Electronic publishing radically improved Nortel's ability to control document quality and reduce information time-to-market. This department incorporated many significant production changes, such as:
- The use of Standard Generalized Markup Language (SGML)
- The sourcing of information directly from legacy and new product and price databases
- The distribution of documents in multiple forms, including CD-ROM, Nortel's Intranet, and paper across multiple systems and platforms
Nortel's previous publication production methods required the use of word processors to replicate and edit large product documents. Document publication was dependent on manual entry via word processors across several departments. Data entry errors and constantly shifting page layout due to changes, updates, and deletions created a vicious cycle of self-generated re-work and ever expanding schedules. Generally, information accuracy and update timeliness prevented consistent publication and use of resultant publications.
Publication is now produced directly from an SQL database source using SGML with embedded SQL statements. Both the source and the resultant documents are true SGML documents compliant to ISO 8879 standards. These SGML documents were created without modification of the legacy database. Replacing the existing database structure was not an option because it would have required re-engineering all of the existing processes that use the database. However, by using an internally developed toolset that expands SGML with embedded SQL statements, Nortel is able to produce SGML documents from legacy databases. These embedded SQL queries produce variable-length documents on-the-fly for printing or for display by the common Internet or CD-ROM browser.
Today, using an Internet or CD-ROM browser, Nortel's marketing and production engineers, sales support staff, distribution managers, and external distributors and customers can immediately access accurate product and price information. In addition, on-line access enables users to query and generate live reports dynamically from legacy information so that they can further target desired information. Information is kept up-to-date in an Automated Price Action application that is accessible on the Internet. Product adjustments are introduced for approval via this Internet service, and once approved, changes to product and price databases become instantaneously available for use. Although paper publishing is still required, Nortel anticipates substantial savings in time, labor, and cost by using SGML in a unique way."
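As an illustration of the embedded-SQL idea described in the abstract above, the following Python sketch expands a query element in an SGML-like template into rows fetched from a database. The `<sqlquery>` element, the `products` table, and the `<row>` output markup are all assumptions invented for this example; the abstract does not describe Nortel's internal toolset at this level of detail.

```python
import re
import sqlite3
from xml.sax.saxutils import escape

# Hypothetical template: <sqlquery> is an invented element, not Nortel's syntax.
TEMPLATE = """<pricelist>
<title>Product Prices</title>
<sqlquery>SELECT code, price FROM products ORDER BY code</sqlquery>
</pricelist>"""

QUERY_RE = re.compile(r"<sqlquery>(.*?)</sqlquery>", re.DOTALL)


def expand_queries(template: str, conn: sqlite3.Connection) -> str:
    """Replace each <sqlquery> element with one <row> element per result row."""

    def run(match: re.Match) -> str:
        cursor = conn.execute(match.group(1).strip())
        names = [col[0] for col in cursor.description]
        rows = []
        for record in cursor:
            # One child element per column, named after the column.
            cells = "".join(
                f"<{name}>{escape(str(value))}</{name}>"
                for name, value in zip(names, record)
            )
            rows.append(f"<row>{cells}</row>")
        return "\n".join(rows)

    return QUERY_RE.sub(run, template)


def demo() -> str:
    """Build a tiny in-memory stand-in for a legacy database and expand the template."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (code TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("A100", 12.5), ("B200", 99.0)],
    )
    return expand_queries(TEMPLATE, conn)


if __name__ == "__main__":
    print(demo())
```

Because the query is evaluated each time the template is expanded, the length of the resulting document follows the current contents of the database, which is the "variable-length documents on-the-fly" behaviour the abstract mentions.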
Abstract: "SGML, which is used for document interchange among various environments, is a metalanguage for describing documents. Before marking up a document, we need to prepare a DTD that defines a document structure.
In general, a DTD applicable to diverse document classes is incompatible with a DTD focusing on the semantic features of documents. If the number of DTDs grows, the costs of developing application programs for the DTDs would also skyrocket.
To apply a DTD focusing on the semantic features to diverse document classes, we developed a system which, from a base generic DTD, derives a different DTD for each document class. Our system also has a function that translates derived DTD instances to base DTD instances. This function frees us from the burden of developing application programs separately for each of the derived DTDs."
Abstract: "Implementing SGML can be an enormous task. To be successful, an implementor must have a good technical background in SGML and must have a clear understanding of data flow and SGML system functionality. Gaining an understanding of the key components of an SGML system is critical. This afternoon's presentations are designed to provide the SGML newcomer with an overview of the major classes of SGML tools and a brief review of the products commercially available today. Presenters for this session are independent SGML consultants who specialize in the design and implementation of SGML-based information systems."
Abstract: "Information access for people with disabilities is creating numerous opportunities and challenges within the SGML (Standard Generalized Markup Language) community. Additionally, as a result of the increasing paradigm shift by the publishing industry toward Internet and WWW-based document delivery systems, the importance of producing accessible information using SGML mechanisms has increased immeasurably.
The primary focus of this paper involves the production of electronic documents. However, the key principles involved in the design, production, and delivery of information apply regardless of the document medium.
In this showcase the presenters will: identify major problems in information and software design that deny access, demonstrate successful products that can be used by people with disabilities to access publications, and point to resources that assist developers in creating accessible products in the future. The goals of the showcase are to educate participants about accessible electronic text delivery systems, and direct participants toward resources which help them create or choose accessible products."
Abstract: "This paper discusses the issues of SGML re-use and shows why they can only be solved generally through the use of subdocuments. The paper explores the following general issues:
- General text entities are not re-usable
- How to enable interoperation of documents with possibly different document types?
- How to effect the cross-document addressing needed when a single document is composed of many subdocuments?
The SGML standard only defines two object types that can have independent existence: documents and subdocuments. Thus it is clear that only documents and subdocuments can be reliably re-used. In particular, external general text entities are not useful candidates for general re-use. My plea then is for tools to add the functions necessary to support the use of subdocuments for the re-use of semantic fragments. For most applications, such as browsers, this means treating the content of subdocument entities as though it had occurred in a general text entity for the purpose of processing (not parsing). For parsers, it means providing a mechanism to either parse multiple documents in parallel or to suspend the parsing of the parent document while the subdocument is parsed and then integrating the parsing result of the subdocument with the data resulting from the parsing of the parent document. For editors, it means allowing the declaration and editing of subdocument entities. Editors, in particular, may also need to provide ways to define constraints on what document types or architectures are to be allowed for subdocuments in specific application environments (families of DTDs).
I think that these conventions provide a clear and simple way to make the use of subdocuments in general less problematic and more fruitful. The full promise of SGML cannot be realized until the problem of fragment re-use is solved and I am firmly convinced that subdocuments are the key to that solution."
See the online version of the paper: "Re-Usable SGML: Why I Demand SUBDOC", SGML '96 presentation by W. Eliot Kimber of ISOGEN International Corp.; [mirror copy]. An SGML version is also accessible via the ISOGEN server, as well as a package containing HyBrowse styles and instructions for using HyBrowse.
Abstract: "Thompson Legal Publishing has re-engineered aging SGML-based systems to meet current needs. Tools were chosen from solid companies that did not expose the SGML to users, did not restrict the use of SGML in any way, that have the capacity to emulate structure and that have APIs. Users now work in an environment that does not force them to place thirty elements/attributes in the data to enter one judicial case citation. Instead, a couple of clicks of the mouse, and in goes the case cite. Our savings in output processing have been enormous; a process that used to cost $18.00 per page now costs $0.95 per page. The system's simplicity from the user's point of view will be demonstrated, and the complexity of the data created and the resulting flexible output will be shown."
Abstract: "SGML is the logical choice for encoding electronic documents, and Virginia Tech encourages (and will later require) students to submit Electronic Theses and Dissertations (ETDs) in SGML. Our DTD must work with translators as well as be usable for students preparing SGML directly. A usability test for tagging ETDs according to our DTD involved teaching SGML-novice graduate students to code using our DTD, observing them tagging their own documents, and having them narrate their thoughts during the process. Our results show that subjects require high-quality system documentation (replete with examples of correct usage), that learning to author the simplest hypermedia in SGML is inherently nonintuitive, and that our line-edited, batch-processed ETD formatting system is easy to use.
This work was funded in part by the Southeastern Universities Research Association (SURA) 1996 project, 'Development and Beta Testing of the Monticello Electronic Library Thesis and Dissertation Program'."
More detailed information on the Electronic Theses and Dissertations project may be found at: http://etd.vt.edu/etd/. See especially the brief project description [mirror copy], and a related write-up in the September 1996 issue of D-Lib Magazine [mirror copy, December 1996]
Abstract: "Ten years after SGML was adopted as an international standard, more organizations than ever before are investigating its possibilities. The reason is simple. The problems addressed by Total Quality Management in the manufacturing and general service industries are magnified enormously in knowledge work and are much more difficult to address. Accessibility and reusability of information are important, and so are the relevance and applicability of information in a particular problem-solving context. Redundant knowledge creation and information rework waste organizational effort and dollars and have a profoundly negative effect on programs, processes, and systems. To combat redundancy and rework, organizations are seeking solutions in standard tools and standard data representations."
Abstract: "Technical and Management Services Corporation (TAMSCO) and Warner Robins Air Logistics Center/LB/LU Directorate recently began a cooperative effort to develop a more efficient way to manage the data for the C-130 flight manuals. WR ALC/LB/LU recognized the tremendous cost and inefficiencies in managing the existing C-130 data. With the assistance of TAMSCO, this cooperative effort is currently reengineering the existing process for creating, distributing, accessing, and reusing the technical information. By using Standard Generalized Markup Language (SGML), this effort will realize the ability to store and reuse technical procedures more efficiently. The SGML data will be accessible to the end users through an electronic information base both digitally and hard-copy. Using SGML and the AF Standards will bring many benefits and lower maintenance costs. The future success of the USAF's C-130 Technical Manual program depends on how effectively and efficiently the existing data is identified, maintained, managed, and used."
Abstract: "Two approaches are available for specifying transformation processes on SGML documents: a declarative approach, based on context-sensitive rules triggered on SGML parsing events, and a procedural approach, based on explicit manipulation of the document tree."
"This paper shows that each approach is optimal for a certain class of problems, but that both are actually needed and that maximum expressive power is achieved when both can be combined in the same program."
The document is available online in HTML format: http://www.balise.com/current/articles/lecluse.htm; [mirror copy].
An alternative source for information presented in this paper is the Proceedings of SGML Finland '96; see the paper by François Chahuneau, "Event driven or Tree Manipulation Approaches to SGML Transformation - You Should Not Have to Choose."
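To make the contrast between the two approaches concrete, here is a minimal Python sketch. It uses XML and the standard-library `xml.sax` and `xml.etree.ElementTree` modules as stand-ins for an SGML parser, which is an assumption of this example and not the tooling discussed in the paper; the `doc`, `sec`, `title`, and `p` element names are likewise invented. The first function applies a context-sensitive rule triggered by parsing events, the second manipulates the document tree explicitly, and the two can be chained in one program.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
DOC = (
    "<doc>"
    "<sec><title>beta</title><p>two</p></sec>"
    "<sec><title>alpha</title><p>one</p></sec>"
    "</doc>"
)


class TitleCollector(xml.sax.ContentHandler):
    """Event-driven style: a rule fires on parsing events in a given context."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, i.e. the context
        self.titles = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self.stack[-2:] == ["sec", "title"]:
            self.titles.append("")

    def characters(self, content):
        # Context-sensitive rule: only text inside sec/title is collected.
        if self.stack[-2:] == ["sec", "title"]:
            self.titles[-1] += content

    def endElement(self, name):
        self.stack.pop()


def titles_by_events(text):
    """Collect section titles in a single streaming pass."""
    handler = TitleCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles


def sort_sections_by_tree(text):
    """Tree style: reordering sections needs the whole document in memory."""
    root = ET.fromstring(text)
    root[:] = sorted(root, key=lambda sec: sec.findtext("title", default=""))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(titles_by_events(DOC))
    print(titles_by_events(sort_sections_by_tree(DOC)))
```

Extracting titles is a natural fit for event rules because each decision depends only on the current context, whereas sorting sections requires access to the complete tree; chaining the two functions is one simple way of combining both styles.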
Abstract: "Maintaining large amounts of SGML data in separate files on a file system has always been a difficult proposition. Trying to coordinate a distributed workgroup environment is even more difficult. Simple mechanisms such as ID and IDREF can become a nightmare on even small projects. A database environment offers many exciting possibilities for features such as version control, sharing, validation, and distribution. The challenge is to develop a system that is capable of accepting any SGML document and flexible enough to support many different SGML database applications."
Abstract: "Developing SGML applications involves making choices driven by end user requirements and by the availability and functionality of third party SGML parsers, authoring tools, search engines, browsers, and data converters. Capabilities of HTML and the World Wide Web should factor into these decisions as well if users are geographically dispersed or have diverse computing platforms. SGML application developers typically build some or all of the following components: a DTD; legacy data conversion tools; a DTD-tailored authoring environment; a document repository; browsing and searching interfaces; and tools for producing formatted output. For each component, we discuss design and implementation alternatives, the approach we decided to use in building our SGML environment for authoring and accessing STEP product data exchange standards, and our rationale for choosing that approach."
More information on SGML and STEP (ISO 10303 Standard for the Exchange of Product Data) is available in the dedicated entry of the SGML Web Page.
Note: The above presentation was part of the "SGML Case Studies" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA. See the NIST server for an online version of the document.
Abstract: "This talk will describe how the US National Security Agency, the Central Intelligence Agency, the Defense Intelligence Agency, the National Reconnaissance Office and other top agencies that collectively are known as the United States Intelligence Community are significantly improving their intelligence gathering and reporting operations through the development and implementation of advanced technology including networking concepts and international information standards such as SGML.
The central focus of this talk will be a description and discussion of Intelink, the classified, world wide 'Intranet' for the Intelligence Community. Intelink, and the Intelink community address one of the world's largest data management problems, involving demanding requirements that are at the extreme of what normal enterprises require.
Intelink is now operational for a broad base of intelligence customers and consumers from the warfighter to the White House. Intelink is currently being used in support of several basic and key functional areas. Perhaps the most significant of these areas is the electronic publishing and distribution of our nation's intelligence reports. This talk will discuss how our 'Signals Intelligence' (SIGINT) Reports have gone from the world of reports in only ASCII text to robust multimedia formats with distribution, using SGML, over Intelink. The talk will also address other key functional areas including analytical research, collaboration facilities, and training.
The talk will address several of the unique problems, concerns, challenges and special features that distinguish Intelink from other Intranet applications. These issues include networking; architecture and standards; analyst collaboration issues; and finally encryption and other security considerations that are unique to this special environment.
The talk also will provide specific examples of Intelink SGML applications in several agencies within the US Intelligence Community. These examples will present insights into the issues, problems, and solutions for organizations desiring to take advantage of emerging technology allowing them to realize tangible cost savings as well as to enjoy significantly improved capabilities.
The talk will conclude with an examination of the future for Intelink, including plans for enhanced analyst collaboration, security boundaries/access control, and an improved Graphical User Interface."
Abstract: "SGML has been an ISO standard for ten years now. It was being adopted and implemented even before the final standard was published, and its user community is now very large, with thousands of applications. But is SGML a standard for all times?
SGML has always faced competition from systems that now are largely forgotten. Only three years ago, a distinguished consultant proclaimed that WYSIWYG was dying. Will SGML be able to continue its record of success in the face of HTML (you mean that's not SGML?), PDF, OpenDoc, OLE, and the surprising continued vitality of proprietary systems?
SGML has been a remarkably stable standard in the past decade, but will it remain so in the next? Fashions in computing and data management have changed in the years since the development of SGML was begun. In the past year, GCA's conferences have devoted an increasing amount of time to HyTime and DSSSL, new standards that may offer foreshadowings of changes to SGML itself. Perhaps the next year will bring us the long-awaited revision of the base standard. Will SGML still be SGML?
There may be no one answer for these questions. As users and proponents of SGML, we need to take a hard look at our requirements and define what we need from the standard and its implementers. More significantly, we need to understand what information is and what we expect it to do for us. Only with that understanding can we devise good SGML applications, make the right requests from vendors, make the right links between SGML data and other kinds of information - or design a good replacement for SGML."
Abstract: "Business demands are forcing more producers and publishers of information to customize documents for their end users based upon the selection of product choices, options, or configurations. This demand presents new challenges for creating, managing, and distributing our data. Occasionally the differences are at a high enough level to permit complete modularization of information. But, more often the differences are at a finer level. For example, a removal procedure may be completely rewritten for different product configurations. But more often, configuration differences may result in an extra step, or may reference a different part number within the same procedure.
The problem is more than a simple modeling problem because it affects all areas of a complete editorial and production process. To start with, the ability to edit a procedure with multiple effectivities simultaneously affects how the information model is created. It is typically not satisfactory to create different instantiations of a large information fragment (e.g., procedure) for each configuration when only one step within that editable fragment may be different for different configurations.
From a data management point of view, the goal is to save information as a Minimum Revisable Unit. This is the unit at which a piece of information has meaning regardless of the context in which it is used. As such, this often dictates that different effectivities be contained within the same storage unit.
From a delivery point of view, be it on paper, CD, or the Web, a final representation of the information must be rendered to fit the user's requirements (i.e., effectivities must be resolved for the user). This requires a tool to resolve effectivities. Based upon the chosen approach, resolution might be accomplished through a parser or through some type of data transformation which selects the appropriate information based upon attribute combinations.
This paper discusses two approaches for specifying the applicability of information using SGML. The first is use of elements combined with attributes to indicate the effectivity of the element's content. The second approach is use of SGML Marked Sections to provide a wrapper for information which can be included or ignored based upon the use. The benefits and drawbacks of both approaches will be highlighted."
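The first approach named in the abstract, attributes that mark the effectivity of an element's content, can be sketched as a small data transformation. The Python example below is an illustration only: it parses XML as a stand-in for SGML, and the `effectivity` attribute name, its space-separated value syntax, and the sample procedure are assumptions of this example, not details taken from the paper. The second approach, marked sections, is resolved by the SGML parser itself (typically through parameter entities set to INCLUDE or IGNORE) and is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical procedure; the "effectivity" attribute is invented for this sketch.
PROCEDURE = """<procedure>
<step>Disconnect the battery.</step>
<step effectivity="modelA">Remove the access panel.</step>
<step effectivity="modelB modelC">Open the service hatch.</step>
<step>Remove the pump.</step>
</procedure>"""


def resolve(element, configuration):
    """Drop, in place, every descendant whose effectivity excludes the configuration.

    Elements without an effectivity attribute apply to all configurations.
    """
    for child in list(element):
        applies = child.get("effectivity")
        if applies is not None and configuration not in applies.split():
            element.remove(child)
        else:
            resolve(child, configuration)
    return element


def steps_for(configuration):
    """Return the text of the steps that apply to one product configuration."""
    root = resolve(ET.fromstring(PROCEDURE), configuration)
    return [step.text for step in root.iter("step")]


if __name__ == "__main__":
    for config in ("modelA", "modelB"):
        print(config, steps_for(config))
```

Keeping all variants in one procedure matches the abstract's point about storage: the fragment is edited and saved once, and the configuration-specific view is produced only at delivery time.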
Abstract: "JSP (Jackson Structured Programming) and JSD (Jackson System Design) are software development methodologies from the late Seventies and early Eighties. Both concentrate heavily on the concept of data models. This talk explores the relationship between JSP/JSD and SGML and considers whether SGML's superiority as a data modelling language might make SGML useful as a general purpose Software Development tool. It also examines how some of the ideas of JSP/JSD can be usefully applied to more traditional SGML processing applications.
Many of the philosophical ideas underpinning JSP/JSD are remarkably similar to those found in SGML. Concepts such as content models, exclusions, validation, LINK, and tree transformation are all present, albeit in disguise. JSP even grapples with CONCUR!
As well as striking similarities between SGML and JSP/JSD, there are fundamental differences. JSP/JSD provides a modelling paradigm but does not provide any software to support implementation. In other words, JSP/JSD have a modelling language like SGML's DTD but no parsing/validating capabilities. Moreover, JSP/JSD has no direct support for recursive data structures.
The net result of these differences is that SGML can be shown to be a more powerful modelling system than JSP/JSD. The intriguing thing about this is that it implies that SGML may have a role in fields where these methodologies have been used to good effect. These range from library booking systems to process control applications.
SGML continues to evolve from a document markup language to a general purpose modelling tool. Related standards such as HyTime and DSSSL expand the scope of SGML based applications above and beyond 'documents' and 'publishing'.
Comparing SGML with JSP/JSD - philosophically similar, software engineering methodologies - may give us some clues as to where SGML is headed. It may also point to the sort of SGML CASE tools we are likely to see in the future."
Abstract: "Transformations allow a developer and user to think of their documents as active parts of a system. In doing so, we can re-orient our documentation systems and other document-related systems to use transformations as the means by which documents are processed or produced.
With the advent of DSSSL as a standard, we now have the means to be able to create systems that not only read both standard documents but also standard transformations. Simple tasks like editing can be re-oriented as a transformation process. Thus, transformation takes 'center stage' as the 'conductor' of the processes necessary to produce your documents.
This talk will introduce the concept of transformation as the basis of an application and cover the infrastructure necessary to produce such systems using SGML, HyTime, and DSSSL."
See, for example, the description of the SENG Transformation Engine from Copernican Solutions Incorporated as an experimental DSSSL engine that is now [December 1996] being extended to include support for the DSSSL transformation language: SDQL has been implemented, and the WIP 0.1 preview will support abstract groves.
Abstract: "RAFHS produce the Aircrew Manuals and Flight Reference Cards required by the aircrew of all three United Kingdom services - Army, Navy and Airforce. Members of the RAFHS team are specialists in the aircraft types flown by the Forces. They are not computer professionals and therefore the system acquired had to be intuitive, modern and have an excellent user interface. RAFHS produced Camera Ready Copy (CRC) using a commercial DTP application. Information was received from a variety of sources including paper and proprietary word processing format. Graphics were always provided on paper and needed to be scanned-in by the authors and saved electronically. Any changes to graphics had to be returned to the originators for amendment and the whole process started again. Management of all these documents was a manual paper based system, as was the audit trail for revisions.
We learned these lessons on the way: (1) Assemble a small in-house team who are aware of your business processes and are forward thinking; (2) Educate all concerned because a little background goes a long way; (3) Know the principles of SGML; (4) Plan for change; (5) What do you require from your system and therefore your hardware and software; (6) Work closely with the consultants to ensure they understand your requirements; (7) Insist on a thorough, comprehensive document review; (8) Plan for change; (9) Understand your document structures and graphics requirements; (10) Graphics Packages; (11) Decide on the type of DTD - modular or document based; (12) Plan for change; (13) Mark-up examples of your product; (14) Plan for change; (15) Test the DTD against examples of your documentation; (16) Don't forget the small details, like attributes; (17) Plan for change; (18) Expect to rework, again and again; (19) Sort out the problems of publication; (20) The styles (FOSI? DSSSL? System specific output?)
The RAFHS installed system provides them with an integrated solution providing SGML author/editing, document management, revision tracking to provide future proofed data, an airworthiness audit trail, and finally output formatting and pagination by a composition engine."
Abstract: "Electronic Technical Manuals (ETMs) vary from simple raster 'page turners' to complete IETMs. For each type, an overview of major aspects will be presented. SGML-based ETMs and SGML-based IETMs (Interactive Electronic Technical Manuals) will be compared and contrasted, highlighting fundamental differences in function, architecture, and applicability. An ETM display engine and sample ETM document will be used with an IETM display engine for the demonstration. As part of the presentation the information structuring capabilities of the MIL-PRF-87269 IETM DTD (Document Type Definition) will be covered."
Abstract: "SPDL was published in 1995 as a language to describe the final form of a document. The document processing model of ISO/IEC JTC1/SC18 WG8 described SPDL as the final stage of three steps; creation/edit (SGML), format (DSSSL), and presentation (SPDL). SPDL had two editors, one from Xerox and the other from Adobe. Having two editors might have had some impact on the publication schedule of SPDL.
The architecture of SPDL shows the influence of both Xerox Interpress and Adobe PostScript. Unlike PostScript, SPDL has a document structure using elements such as Picture and Pageset. This hierarchical structure defines the scope of various settings such as dictionaries, dictionary stack, and various imaging parameters. Under SPDL, Picture is a unit for imaging and can contain other Pictures. Because of the document structure, SPDL knows when to image one page by keying on the highest level of Picture. Therefore, SPDL does not require PostScript's showpage operator to notify the imaging device to perform rendering on the imaging medium. The clear text encoding of the document structure uses SGML.
The PDL in one of the subordinate elements (Token Sequence) under Picture describes the images and graphics to be rendered. This PDL is a stack-oriented language very similar to PostScript. In fact, many PDL operators of SPDL in clear text format are taken from PostScript.
SPDL uses ISO/IEC 9541 for font handling and glyph referencing. One way to reference glyphs in the mapping between integers and glyphs within SPDL is to use the registration numbers assigned by AFII. One such format is afiixxxx, where xxxx is the AFII registration number.
In addition to the clear text encoding, SPDL has a binary encoding using ASN.1 for the document structure and its own encoding for the PDL section. Except for the positions of comments in the document structure, the SPDL clear text encoding and binary encoding can map to each other easily.
One possible application of SPDL is to incorporate the Picture element as an imaging portion in an application such as HTML. Using Picture, graphs can be sent as PDL programs rather than images. The start tag of Picture contains an attribute to identify the content as ISO/IEC 10180 SPDL."
Further information on SPDL (Standard Page Description Language) is available in the main entry of the SGML Web Page.
Abstract: "The Publications Division of SAS Institute needed a way to replace the hardcopy formatting tools it had been using, and also faced the challenge of producing online documentation for its large variety of software products. After deciding to implement an SGML solution using Adept, the Institute decided to apply good software engineering and programming principles to the effort and develop a modular, maintainable store of declarative SGML structures and custom executables. This paper describes the implementation of that system."
A related presentation describing the implementation of SGML by the Publications Division of SAS Institute was given at SGML '96 by Craig R. Sampson, "SASOUT: A Context Based Table Model."
Abstract: "The presentation will be a summary of a project at the University of Oslo, where about 80 persons have been working with SGML. The way we work with SGML is a bit different from many others. We want to use SGML as an infrastructure, applied to a wide range of documents. In this presentation I will summarize the evaluation of the project, and the interviews that I have done with some of the writers."
Abstract: "The Astrophysical Journal, published by the University of Chicago Press for the American Astronomical Society, is a large and complex scientific journal of more than 25,000 pages per year. Over the last several years the production system for this publication has been re-engineered to be SGML-based, including on-screen SGML copy editing, exporting SGML for conventional typesetting, and producing an online HTML edition from the SGML archive. The most difficult part of the implementation was the use of SGML math and the problems encountered in translating complex mathematics between LaTeX, TeX, SGML, ASCII, HTML, and two different commercial typesetting systems. The key benefits of this implementation were (1) reduced conventional production costs, (2) the creation of additional electronic products, and (3) the establishment of a rigorous framework for future non-text content."
For more information on the use of SGML by the American Astronomical Society, see the main AAS entry in the SGML Web Page.
Abstract: "Wärtsilä Diesel is the largest medium speed diesel engine manufacturer in the world, with offices and factories all over the world. This is a case study where Wärtsilä Diesel Power Plant provides an editorial system for their subcontractors, so that they can easily produce content oriented information modules, based on the physical equipment breakdown structure (EBS) according to the WD Base-DTD. The study also covers the production system that is used in Wärtsilä to maintain and to produce presentation-oriented technical manuals from the content oriented information modules delivered by the subcontractors. We will also cover the background and problems of handling lots of information coming from several sources in different formats, why WD decided to implement a CALS/SGML information environment and what they have achieved so far.
The editorial system consists of the WD Base-DTD that is mapped in SGML Author to templates in Microsoft WORD and a database that is used for mapping the information modules into the correct level in the EBS. This editorial system makes it very easy to author content oriented information, because of the familiar word processor that helps the user to navigate in the DTD without having any knowledge about SGML. The key thing in the application is having an interface of a database from where the author chooses an information module and puts in information by using the next legal style, which follows the structure in the WD Base-DTD.
The production system consists of tools for navigating, searching, browsing and publishing of the technical information from the main repository. When the subcontractor delivers the technical information, it will be analysed in Wärtsilä Diesel and if it is approved it is saved into a main repository for the information. The main tool in the production system is a browser that is configured to the relational database (main repository) that holds the EBS with the associated information modules. The tool is used for searching, viewing and publishing of the information modules in a very object oriented way. By choosing publish, the user can produce information products, such as IETM, Online and/or paper manuals very easily by 'dragging and dropping'."
[A presentation based upon the author's popular "Whirlwind Guide to SGML Tools and Vendors."]
Abstract: "There are differences of opinion as to how the current SGML standard (ISO 8879 as amended in 1988) should be interpreted with respect to the handling of the characters that make up the SGML documents it describes. But a consensus has pretty well been achieved as to how the revision now being worked on will treat 'characters' and 'character strings', and how the 'character sets' described in an SGML declaration will be interpreted and used. This paper presents the character model that is being considered by the group working on the revision of ISO 8879 (the SGML Rapporteur Group of ISO/IEC JTC1 SC 18 WG8).
Characters are recognized as 'abstract' data types, just as, for example, are integers. The new model will not assume, for example, that characters of a given character repertoire are always represented by fixed-width bit strings, or that strings of characters are always represented by direct concatenation of the representations of single characters.
The new character model clarifies the relationship between the character representations being used by an SGML system, the character representations used to store external entities, and the character sets described in the SGML declarations of SGML documents. It provides for the possibility of character representation information being in the SGML declaration's 'document character set' description or the 'formal system identifier' of an entity, or even being provided via external-to-the-document, system-dependent means."
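[The distinction the abstract draws between abstract characters and their stored representations can be illustrated with a short Python sketch. The sample string and the three encodings are assumptions chosen for illustration; they do not come from the paper or from the ISO 8879 revision.]

```python
# One abstract character string, written with escapes so the example does
# not depend on how this file itself is encoded ("\u00e4" is a-umlaut).
text = "W\u00e4rtsil\u00e4"

# Three possible stored representations of the same abstract characters.
utf8 = text.encode("utf-8")        # variable width: a-umlaut takes two bytes
latin1 = text.encode("latin-1")    # fixed width: one byte per character
utf16 = text.encode("utf-16-be")   # two bytes per character for this text

# The character count belongs to the abstract string ...
char_count = len(text)
# ... while the byte counts depend on the representation chosen.
byte_counts = (len(utf8), len(latin1), len(utf16))

# Decoding each representation with the matching declaration recovers
# the same abstract characters.
same = (
    utf8.decode("utf-8")
    == latin1.decode("latin-1")
    == utf16.decode("utf-16-be")
    == text
)
```

[Here the string has 8 characters in every case, while its stored length is 10, 8, or 16 bytes depending on the representation, which is why a processing system needs to be told which representation an entity uses.]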
Abstract: "The paper provides an overview of the [following elements]: (1) Reasons which spurred the Commission of the European Union to seek and find 'the SGML solution' for their own Official Journal of the European Communities and its Supplement, including a discussion as to how the technical issues surrounding SGML and the production constraints hampered the full implementation of 'pure' SGML production systems for a decade.
(2) The decision to implement 'SGML transition systems' and an account of the consequential experience gained through their implementation, along with an insight as to how Multilingual, Multiple Media 'pure' SGML production systems will be in place before the end of 1996, thanks to the increasing availability of ever more sophisticated software tools and the recent availability of reasonably priced computer processing power.
(3) Philosophy of the new technical concepts and the names of the products which comprise the new production systems.
(4) Meeting of the technical challenges involved in providing Multilingual, Multiple Media SGML services, including overcoming the issues related to the implementation of special character sets and a review of a number of both critical personal and business decisions which were made in order to maximise the scope and optimise the Multilingual, Multiple Media SGML services being provided to the European Union."
Abstract: "As described by Charles Goldfarb during talks at SGML '95 and SGML Europe '95, the international standards committee responsible for the SGML standard, ISO/IEC JTC1/SC18/WG8, has been reviewing ISO 8879, which defines SGML. This effort has been more intense recently. After each of its meetings, WG8 makes a point of reporting on the status of the review and the technical issues that have been decided. While these reports are available on the Web at http://www.ornl.gov/sgml/wg8/docs or http://www.sgmlsource.com/8879rev/index.htm, this paper presents the decisions that have been reached to date in order to:
- Assure conference attendees who have not already studied the material that a revised standard will not affect the validity or interpretation of today's documents;
- Encourage attendees to participate in the work of WG8 by commenting on these proposals, suggesting additional possible changes, or representing their national standards bodies at WG8 meetings."
Abstract: "The SGML literature divides DTDs into two types: those that describe existing information structures and those that prescribe a fixed set of structures. A purely prescriptive approach has been in vogue for several years; however, the descriptive approach has much to offer. It is suggested that many DTDs should in fact fall somewhere between the two extremes, and could be termed suggestive. In a Suggestive DTD, certain structures are fixed, others are flexible, and still others are configured through the simple use of attributes to permit previously unexpected values. Relationships are explicitly marked where they cannot be derived."
Abstract: "SGML (Standard Generalized Markup Language) introduced the DTD (Document Type Definition) concept to formally describe document syntax and structure. One of its main characteristics is that it is purely declarative and fully independent of the document's future processing (typesetting, formatting, translation/transformation). In this context, SGML has become the international standard to be followed.
Sooner or later, a document must be processed. In order to do that we need to associate semantics with the document's structure. In the context of compilers, we normally separate semantics into two kinds, static and dynamic. Drawing a parallel with document processing, we can think of the document's decorated tree (as recognized by an SGML analyzer) as representing the static semantics and the transformation of the document's tree as the dynamic semantics.
Pursuing this idea, we will present and discuss a study of the relationship between SGML, DAST (Decorated Abstract Syntax Tree), and Algebraic Specification, in order to better understand how to formally process documents and how to specify and build generic document processing tools."
Abstract: "This composition system accepts documents coded according to multiple authoring DTDs (of many versions) and provides a maintainable method for updating the system to keep pace with DTD changes. The key is that lower level elements, such as paragraphs and phrases, are identical across DTDs while Division (section) level elements differ. The solution automatically creates a document-type-specific transformation program and creates a generic SGML file from one of multiple authoring document type SGML instances. The generic SGML file can then be input to a more structure-based composition converter to create the final composed (targeted) output."
The description of the composition system is based upon an SGML application developed by (for) National Semiconductor to support its technical publications needs -- delivering some 30,000 pages of company information in various delivery formats. The underlying database is called Powerbase, from Coris. "There are two essential parts to the PowerBase solution: database content management, and information production and distribution. National Semiconductor provides new or revised product information in SGML format. Coris produces additional, associated file formats -- for example, HTML for the Web, or video files for multimedia CDs -- and converts images to TIFF for print or GIF for Web. Every one of these pieces is then organized as an individual content object in the PowerBase database, ready to be pulled into multiple-purpose materials and in multiple media." More information on Powerbase may be found on the Coris WWW server: http://www.coris.com/corishome/pwrbase/pwp.html.
Abstract: "Recently, the SGML world has been rediscovering the database repository. Many SGML users have documents which need to be shared at different levels of granularity, distributed among workgroups, created on the fly from smaller pieces, versioned, found via queries, or managed in parts because they are too large to fit in RAM. All of these requirements suggest that documents need to be composed of smaller units, 'components', which can be exchanged among users, combined to form documents, versioned, returned as the result of queries, and validated.
The term 'component' is abstract, and does not concretely specify the relationship of components to an SGML document or DTD. However, the manner in which components are used clearly indicates certain properties which they must have. This presentation uses three specific scenarios to determine these properties: versioning in a workgroup environment, logical access to the components of a document, and dynamic document generation.
Based on these scenarios, we define components in terms of their basic properties and uses, discuss the design choices available for implementing components in an SGML repository, and outline some of the design choices made in an SGML repository system which was jointly designed by F.A.Davis, a medical textbook and multi-media publisher, and POET Software, an object database company."
Abstract: "Producing and storing medical documentation is time-consuming and costly. Studies have shown that physicians spend upwards of 35% of their time on documentation, and the documents produced yearly number in the billions. Very little of this generated medical data is recorded in a format that is computer-readable. The result is a combination of high administrative costs and the inability of clinical decision-makers to use most of the data generated during the patient care process.
Kurzweil Applied Intelligence has received a research grant from the National Institute of Standards and Technology (NIST) to build a prototype system which will use large-vocabulary voice-recognition technology to produce SGML-structured medical reports.
SGML addresses the need for a structured reporting framework for medical applications because:
- SGML enables the preservation of context and structure in medical reporting, making the information gathered more useful and accessible. The current lack of a widely accepted standard format for medical reporting has limited the benefits of computerized patient records.
- The open systems approach of SGML facilitates communication among and porting between diverse platforms.
- The SGML standard supports a wide variety of information types in addition to text; images, video and audio clips can be incorporated into the medical report. SGML formats can be extended to meet the industry's changing requirements.
Many of the issues surrounding the use of SGML in this project will be familiar to the general SGML community, particularly the advantages of tagging and structuring the data. In other respects, however, the project raises some new and interesting problems, such as the dynamic creation of SGML documents from a voice-controlled application. Another important issue is the lack of any standard DTD for clinical data. We have developed DTDs for patient demographic information, prescriptions, and primary care reports, and we are actively involved in the HL7 SGML Initiative, which is an effort to standardize healthcare DTDs."
Abstract: "The SASOUT table model was developed to support the tabular documentation needs of the Publications Division of SAS Institute Inc. SASOUT instances contain sufficient meta information to allow them to be presented in both hard and soft copy. The meta data also makes possible non-traditional and interactive online presentations of the tabular data.
In 1995, research on tables produced by SAS software and on the tables previously used in our documentation resulted in our identification of four table types: simple, intersection, drill-down, and show-all. Imaging these tables on paper, as in the past, presented no significant problems even with SGML source data. However, we anticipated problems presenting our tables in soft copy after experimenting with the capabilities of the CALS table model, which was supported by our SGML software tools.
The CALS model does not support markup for indicating relationships between cells in a table nor directly support row header formatting. These relationships are not critical for producing hard copy, but are very important to our interactive online presentations. Header formatting is important for both hard copy and online presentations from a single source.
The SASOUT table model was developed to provide a means of marking up our tabular data while preserving its characteristics. The markup supports row headers and cell relationships in addition to all CALS features, such as column heads, spanning rows and columns, and alignment of data. The SASOUT model also supports behavior characteristics that allow the specification of online presentation methods.
This paper describes our table types, our platform presentation requirements, extensions we added to the CALS model, and the processing we designed to meet our formatting requirements so far.
The SASOUT DTD is freely available and we look forward to vendors providing support for it and other table DTDs that provide the means to fully identify tabular data."
The document is also available online in SGML format: see the download instructions from Craig Sampson, which contain the associated GCAPAPER DTD. URLs for the paper are: ftp://ftp.sas.com/incoming/sasout.tar.Z (UNIX tar, compressed) or ftp://ftp.sas.com/incoming/sasout.zip (.ZIP format); [UNIX format mirror copy] and [.ZIP format mirror copy]. The SASOUT table DTD has been made available publicly by Craig Sampson on the Usenet News forum comp.text.sgml (CTS): see the local document. A related presentation describing the implementation of SGML by the Publications Division of SAS Institute was given at SGML '96 by Leonard P. Olszewski, "Modular DTD Development and Maintenance at SAS Institute: Implementing an Efficient SGML System Using Software Engineering Principles."
Abstract: "Unlocking the benefits of information in your documents may mean that you invest more than human resources, money and equipment. What about the pre-process planning time? Investment in Standard Generalized Markup Language (SGML) does not guarantee you immediate return and does not happen at the drop of a hat without a major management investment in pre-planning.
Because the top information management goal of most corporations is to produce a much richer information environment, we must make a management commitment to a document analysis process. The process should identify what information in these documents is important enough to migrate to a rich electronic format such as SGML. It may seem obvious that the way to maximize your information is to break it into intelligible chunks in a database. However, getting those chunks of information into a format that is acceptable to most applications is not a simple process. When that is complete, next comes the targeted conversion by document type.
For Learning Support the goal was to establish an information database that yielded benefits in the area of:
- document creation,
- document updating and revising,
- database review and validation,
- information reuse, and
- on-line full-text retrieval and distribution of information.
The objective was to convert annually some 300,000 pages of technical documents containing complex tables and graphics from several different authoring environments into an industry standard Document Type Definition (DTD), called the Telecommunications Industry Markup (TIM DTD).
This industry standard format, the Telecommunications Industry Markup Document Type Definition (TIM DTD), is an explicit and neutral form of markup. The BCCs, in conjunction with the Telecommunications Industry Forum consisting of representatives from telecommunication vendors such as Ericsson, Siemens, AT&T, and Northern Telecom, have unanimously endorsed it as their standard list of SGML markup tag definitions.
This paper identifies key lessons learned from project management of the SGML Implementation Plan in the Learning Support organization at Bellcore. Key outcomes determined were:
- Document analysis was critical to the success of the [project].
- The DTD writer's interpretation of the data and its structure required an iterative process with document developers and users. DTDs will change.
- It was important for acceptance to maintain the document developers' view of the textual layout and format of the data while enforcing structure.
- Management's buy-in was needed at all points in the process.
- Not everyone will be on board the train at the same time."
For more information on the TIM DTD as part of the TCIF/IPI (Telecommunications Industry Forum Information Products Interchange) standard, see the main entry in the SGML Web Page.
Abstract: "One of the most exciting applications of SGML which has emerged in recent years is its use in document databases. The structural information embedded in SGML documents makes it possible to query SGML documents and extract information in an automatic manner; however, this querying process has not been standardized. As a result, different SGML database implementations use their own query language syntax, thus making the migration from one system to another a difficult process. In the relational database domain, however, the query language SQL has been a standard for over ten years and is universally used in most relational database systems. Although originally designed for relational databases, SQL is quite powerful for specifying complex queries in a relatively easy-to-understand syntax. With a small set of extensions to take advantage of the hierarchical structure of SGML, SQL can be easily adapted for use with SGML document databases (TAG-496).
The powerful 'generalized' nature of SGML makes it easy to implement SQL as an SGML DTD, so that queries can be expressed as document instances of the SQL DTD. Current SGML authors and users can write queries expressed in this DTD without learning a different language or using a separate editor. Moreover, because of the portable nature of SGML, these queries can be used in any SGML database system and can be converted to regular SQL for use in a relational or Object-Relational/Object-Oriented database system, if necessary. Databases that support the SQL DTD can also store the queries without any extra effort, and subsequently query them for inferring optimization parameters.
This paper presents a representative DTD for the SQL query language, with extensions for use with hierarchically structured documents. It also compares this language with languages proposed and implemented, including SDQL - the query language in the DSSSL standard (DSSSL95). This paper explains the advantages of using this language as a query language in document database systems and the necessity for standardizing the querying process in document databases. Finally, it discusses some implementation issues and complexity measures."
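[The idea of expressing a query as a document instance and converting it to regular SQL can be sketched in Python. This is only an illustration under stated assumptions: the element names (`query`, `select`, `column`, `from`, `where`, `op`, `value`) and the `to_sql` helper are hypothetical and are not taken from the DTD presented in the paper, and the instance is written as well-formed XML-style markup so that Python's standard `xml.etree.ElementTree` module can read it in place of an SGML parser.]

```python
import xml.etree.ElementTree as ET

# Hypothetical query instance; the element names are illustrative only and
# are not taken from the DTD presented in the paper.
QUERY = """
<query>
  <select><column>title</column><column>author</column></select>
  <from>papers</from>
  <where><column>track</column><op>=</op><value>Expert</value></where>
</query>
"""


def to_sql(instance: str) -> str:
    """Convert the marked-up query into a regular SQL string."""
    root = ET.fromstring(instance.strip())
    columns = [c.text for c in root.findall("./select/column")]
    sql = "SELECT " + ", ".join(columns) + " FROM " + root.findtext("from")
    where = root.find("where")
    if where is not None:
        # Double any single quotes so the literal stays a valid SQL string.
        value = where.findtext("value").replace("'", "''")
        sql += " WHERE {} {} '{}'".format(
            where.findtext("column"), where.findtext("op"), value
        )
    return sql


sql_text = to_sql(QUERY)
print(sql_text)
```

[A real converter would validate column names and operators against the DTD rather than interpolating them directly, and would also need the hierarchical extensions the abstract mentions, which this sketch omits.]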
Abstract: "Making data sharing work in a publishing system is not as easy as it sounds. There is much to take into consideration. I plan on discussing key points and factors that will enable you to have a better understanding of the concept of sharing data. I will also discuss what things need to be considered in deciding whether or not to share data. Also, key components will be defined as what is needed to make sharing data successful. Real life experience implementing SGML database systems that have the capability of sharing data is the basis of the following discussion."
Abstract: "As the SGML community continues to grow, users are seeking new support structures, new sources of information, new technology, and new ways of applying SGML. The result is a number of emerging SGML interest groups, not just around the U.S., but around the world. Just over a year ago, I helped revive the defunct Rocky Mountain SGML Users' Group in Colorado. The journey to a strong, productive users' group has been long, and not without hurdles. However, the benefits are many for everyone involved, and the learning experiences have been invaluable. This paper presents ten good reasons to start an SGML users' group, who should be involved in organizing a users' group, how to get started on the right foot, what people can expect to happen during different stages of users' group development, common problems that tend to crop up and how to deal with them effectively, and the dos and don'ts of managing a users' group."
Another paper discussing the role and operation of SGML user groups was presented at SGML '96 by Richard Barth.
Abstract: "In 1991, Ericsson Inc. began implementing Standard Generalized Markup Language (SGML) in their Customer Documentation Department in Richardson, Texas. An SGML working environment for procedural documentation was created first. The second SGML working environment was developed internally for descriptive documents and was based on the first. A user's guide working environment was developed in 1994 which was different from anything done in the past. A system was also put in place for maintaining these SGML environments. Customer Documentation's SGML expertise has enabled it to be in the forefront for SGML implementation in other company groups and also to sell its services in SGML document production."
Document available online from the ISOGEN server: "Case Study: Maintaining and Developing a Dynamic SGML Environment at Ericsson", SGML '96 presentation by Renée Swank.
Abstract: "In realizing an SGML-based document processing system, it is necessary to transform the document structure and/or the data representation from a source document written in SGML to data in the format required by the application. In the real world, this transformation often becomes very complex. To solve this problem of complexity, we designed a programming language for SGML transformation (down translation) and implemented its processor. (This language is currently called "Æsop.")
The Æsop processor works on a parsed tree structure (ESIS structure), which is the output of an SGML parser. The processor automatically traverses the ESIS tree structure in depth-first order, and selects and executes a script for each node.
To realize a complex transformation with a simple and straightforward program, we designed Æsop as a language which has the following features: (1) Ability to select a script for a node, according to any complex condition satisfied by the node. (2) A rich set of built-in functions which make it possible to modify the document structure itself. (3) Ability to construct a 'process pipeline.' A 'process' is a set of scripts applied to the document tree structure through one traversal action. With Æsop, programmers can divide a complex transformation program into a series of simple processes. A typical Æsop program consists of one or more tree conversion processes and one data output process.
With a prototype processor of Æsop, we succeeded in transforming a complex SGML document (written according to a DTD which is very similar to the ISO/IEC TR 9573-11 DTD) to LaTeX. Through this work, we confirmed the effectiveness of Æsop for transformation from SGML documents containing complex math expressions and tables."
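[The processing model described in the abstract can be illustrated with a minimal Python sketch. It is not Æsop syntax, which the abstract does not show; the `Node` class, the `run_process` and `run_pipeline` helpers, and the sample rules are hypothetical names chosen for illustration. Each process is one depth-first pass in which the first rule whose condition a node satisfies supplies the script for that node, and the pipeline chains one tree conversion process with one data output process.]

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """A simplified stand-in for one element node of a parsed document tree."""
    name: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


# A rule pairs a condition on a node with the script to run for that node.
Rule = Tuple[Callable[[Node], bool], Callable[[Node], None]]


def run_process(root: Node, rules: List[Rule]) -> None:
    """One 'process': a single depth-first traversal that runs, for each node,
    the script of the first rule whose condition the node satisfies."""
    for condition, script in rules:
        if condition(root):
            script(root)
            break
    for child in list(root.children):  # copy: scripts may edit the tree
        run_process(child, rules)


def run_pipeline(root: Node, processes: List[List[Rule]]) -> None:
    """A 'process pipeline': apply each process to the tree in turn."""
    for rules in processes:
        run_process(root, rules)


def rename(new_name: str) -> Callable[[Node], None]:
    def script(node: Node) -> None:
        node.name = new_name
    return script


output: List[str] = []

# Tree conversion process: the condition looks at the node's context.
convert: List[Rule] = [
    (lambda n: n.name == "emph" and n.parent is not None
     and n.parent.name == "title", rename("strong")),
]
# Data output process: emit LaTeX-like text for each recognized node.
emit: List[Rule] = [
    (lambda n: n.name == "title", lambda n: output.append("\\section{" + n.text + "}")),
    (lambda n: n.name == "strong", lambda n: output.append("\\textbf{" + n.text + "}")),
    (lambda n: n.name == "emph", lambda n: output.append("\\emph{" + n.text + "}")),
    (lambda n: n.name == "para", lambda n: output.append(n.text)),
]

doc = Node("doc")
title = doc.add(Node("title", "Results"))
title.add(Node("emph", "new"))
para = doc.add(Node("para", "Body text."))
para.add(Node("emph", "note"))

run_pipeline(doc, [convert, emit])
print(output)
```

[Splitting the work this way keeps each rule set small: the conversion pass only restructures the tree, so the output pass does not need to know about context-dependent cases.]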
Abstract: "Too many people say 'tag' when they mean 'element'. While this might seem to be just semantic quibbling, the difference is actually important. The power of SGML-based processing lies precisely in the fact that an element is more than a tag. By examining three systems that exploit the power of SGML to allow sophisticated actions on content, this talk shows that understanding an element as more than just the tags that delimit it is a critical part of exploiting the full power of SGML."
Abstract: "In this presentation I would like to share my years of experience in SGML conversion, by reviewing several strategies for converting legacy documents to the ATA standard DTD. In particular, I would like to review practical applications of such strategies in the daily operation of Jeppesen's Maintenance Information Services.
Among the subjects covered will be input analysis, interchange DTD versus publishing DTD, manual clean-up versus automatic conversion, the "divide and conquer" approach, CALS table conversion and paper conversion. With about 1,000,000 pages of SGML converted so far, I believe we have faced most of the obstacles in this domain."
See the SGML Web Page main entry for ATA (Air Transport Association) for more information on the ATA DTD and its usage.
Abstract: "The use of SGML attributes to represent complex tabular data can help authors create and maintain large volumes of data. Smart use of attributes combined with the functionality of today's SGML processing tools can make the management and distribution of this type of data simple, effective, and more usable. SAIC has recently implemented attributes in some unique SGML applications. We consider SGML attributes as a useful extension of the "content tagging" approach that is being commonly implemented with SGML elements. This paper will describe one such application that effectively used attributes to store up to 250 pages of tabular records each with up to 70 repetitive content descriptors. The application will be described, along with the rationale for selecting an attribute solution."
Abstract: "Objectives [of the paper are]:
- To provide an overview of the information and steps necessary to convert and load manually authored DTDs into Near and Far Library.
- To provide a list of problems found when loading manually authored DTDs into Near and Far Library.
- To provide a summary of costs and benefits of using Near and Far Library for manually authored DTDs.
"In the Fall of '95, three Thomson Companies, RIA (Research Institute of America), WG&L (Warren, Gorham and Lamont) and Thomson Legal Publishing, Alexandria, VA, were merged into one company, RIAG (RIA Group). In order to share SGML DTDs and SGML data more effectively across all three companies and a variety of geographic areas, we needed a common storage, maintenance and documentation method. The three companies had various technology departments using a variety of hardware, operating systems and applications. Microstar's Near and Far Designer (previously referred to as Near and Far) and Near and Far Library (previously referred to as CADE) products were the best and only choice for this purpose.
NFD (Near and Far Designer) is a graphical editor for SGML DTDs. NFL (Near and Far Library, previously referred to as CADE Groupware) is a template for a Lotus Notes Database, and is a repository for storing definitions and descriptions of all information objects, e.g., elements, attributes, etc. in a DTD. NFD has an interface to NFL for storing and retrieving DTDs from the Lotus Notes database. Because Lotus Notes is available for a variety of platforms and works well across many geographic locations, this was a good solution to a common storage medium. A central repository for SGML DTDs was needed to provide a method for standard definitions, usage and documentation of information objects across multiple DTDs.
Because NFL is designed to protect the integrity of the DTD database, the definitions and descriptions of the elements, attributes and entities had to be consistent within a programmatic algorithm (i.e., byte-for-byte identical values). Because RIAG had 43 DTDs an automated solution was needed for this conversion. Several problems found during this process and the solutions that RIAG devised will be presented. A summary of the costs and benefits found during this project will also be presented. The presentation will also cover the unexpected benefits, and organizational impact of having a central repository for DTDs."
Abstract: "This talk is not concerned with document analysis or ways in which to turn requirements such as database connectivity into SGML. It is concerned with discussing some features of the HTML DTD and what authors of other DTDs can learn from them.
HTML has probably been the single largest experiment in structured document construction that there has been, in terms of the number of participants. DTD authors should consider some of the results of this experiment when writing their DTDs, as at least some of the lessons to be learned may be valid for any given application of SGML. Authors of HTML documents choose to use specific elements and features of the language. Knowledge of possible reasons for these choices is also important for the design of DTDs."
Abstract: "In order to determine how people are really using SGML, GCA has polled attendees at GCA conferences over the last year and conducted a mail survey of our extensive database of people interested in SGML. Results will be discussed by conference, in order to give regional perspective, as well as for the information collection as a whole. Survey topics included: current uses of SGML, user skill levels, document formats, and investment in SGML technologies. From this survey we can begin to get a more accurate picture of the markets that SGML has reached and what attracted current SGML users."