Abstract: Texts are characterized by various types of linkages, within themselves and with other documents, which may be either explicit or implicit. When texts are available in machine-readable form, the ability to trace linkages should become much easier, and more complex tracing of linkages should be possible. Hypertext is an electronic document paradigm whose distinguishing feature is machine support for the building and tracing of intra- and inter-document links; a document is viewed as a collection of nodes connected by directed links. A limitation of many hypertext systems is that all links must be created explicitly by the user. This is impractical in many situations, and it is unnecessary if the link structure is inherent in the documents themselves. The work described in our paper is motivated by the perceived need to extend the hypertext paradigm so that links can be derived from a collection of documents. We explore how a rich set of links connecting documents in a text archive can be programmatically generated, and present a set of link types that are useful, specifiable and computable. The documents in the archive are encoded using the Standard Generalized Markup Language, which views a document as a hierarchical organization of document elements. The archive, therefore, consists of a forest of document trees.
Abstract: "This case study chronicles the successes and shortcomings of an ongoing SGML implementation for electronic publishing from a legacy data conversion at a technical society. A commitment to SGML is most often couched in dollar terms; most implementors are aware of the tremendous costs in data conversion, DTD development, and editorial tools. However, implementors at this technical society realized that an additional level of commitment was needed to successfully publish using SGML. This commitment escaped notice perhaps because implementation planning most often focuses on easily measurable costs. For this quintessential 'content provider,' the need for technically skilled people was one of the unforeseen aspects of its embrace of SGML."
This paper was delivered as part of the "Business Management" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM were produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: This paper presents MarkItUp!, a system to recognize the structure of untagged electronic documents which contain sub-documents with similar format. For these kinds of documents, manual structure recognition is a highly repetitive task. On the other hand, the specification of recognition grammars requires significant intellectual effort. Our approach uses manually structured examples to incrementally generate recognition grammars by means of techniques for learning by example. Users can structure example portions of a document by inserting mark-ups. MarkItUp! then abstracts and unifies the structure of the examples. On this basis, it tries to structure another example with similar format. Users can correct or accept the produced structure.
With every accepted example, a grammar is thereby acquired and gradually refined; it can then be used to successfully structure the other portions of the document.
The article is based upon a paper delivered at the Conference on Electronic Publishing, Document Manipulation and Typography, EP94, Darmstadt, Germany, 1994. A draft version is available in Postscript format as P-94-07.ps.Z from the GMD-IPSI FTP server. Article received 15-August-1993, revised 1-December-1993.
"Abstract: Although editors make extensive use of the computer in their work, most editors still mark changes on paper using traditional editing symbols. There are, however, compelling reasons for editors to begin marking copy on the computer. We consider online editing from the perspective both of editors and their employers. We then focus on one aspect of online editing: the mark-up models embodied in various editing tools. We demonstrate that the different mark-up models and their particular implementations have major implications for the editing process, including the quality of edited material and the worklife satisfaction of editors and writers. We conclude by recommending that the technical communication community exert its influence on software developers and corporate technology planners to encourage the development and adoption of online editing tools that will be congenial to editors."
Abstract: "SGML is billed as a key to making your data vendor-independent. 'Freedom!' is a rallying cry of the SGML community. Inspired, you migrate your data to SGML, only to discover that important clients and business partners still want it in the format of their favorite word processor, WWW browser, or publishing system and they expect you to translate it for them. How will you translate your data from SGML to other formats? In this article, we discuss several solutions to this translation problem. Along the way, we visit some key features and concepts of tools that address this problem, and we relate the problem to the DSSSL standard. Finally, we investigate the translation problem and the roles of SGML and DSSSL in the context of digital libraries."
See the main document entry for the complete list of articles and contributors, as well as other bibliographic information.
Abstract: SGML, the Standard Generalized Markup Language, separates the content of a document from its format. SGML documents contain tags that describe what a text component is rather than how it should be formatted. The absence of device-dependent formatting codes means documents can be transferred across systems and formatted in various ways. The presence of tags allows for selective searching, editing and viewing of the text. However, determining what text components should be tagged can be difficult since text can be classified in various ways, depending on how the document will be used.
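The descriptive-markup principle summarized in this abstract can be illustrated with a small fragment. This sketch is not drawn from the article; the element names are hypothetical, and the formatting is assumed to be supplied separately by a stylesheet:

```sgml
<!-- Illustrative only: tags name what each component IS, not how it looks -->
<article>
  <title>Structured Markup in Practice</title>
  <section>
    <heading>Background</heading>
    <para>Because the markup carries no device-dependent formatting codes,
    the same source can be rendered for print, screen, or search.</para>
  </section>
</article>
```

A search tool can then retrieve, say, all `<heading>` elements without knowing anything about fonts or page layout.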
"Summary: The electronic version of the Oxford English Dictionary incorporates a modified SGML syntax. This article describes the conversion project and explains why the database designers chose structured markup, but not a full SGML implementation. It lists some advantages of structured markup: (1) It supports the data-searching needs of users; (2) It allows textual components to be extracted or modified to produce derivative versions; (3) It allows text to be viewed in various useful ways."
Abstract: "Effective search and retrieval, editing, and browsing tools are necessary to access the growing amount of material published online. Tools based on tagged text address this need by treating text as a database where components can be manipulated independently from each other. The University of Waterloo is developing a browsing tool that allows readers or editors to isolate a text of interest, while concealing other text from view. This can result in useful renditions of the same text. The creation of these renditions, however, required writers to become even more aware of the organization of their texts."
Abstract: SIMON is a grammar-based transformation system for restructuring documents. Its target applications include meta-level specification of document assembly, view definition and retrieval for multiview documents, and document type evolution. The internal document model is based on attribute grammars, and it interfaces with external document models such as SGML through input and output conversion. The transformation engine of SIMON is an amalgamation of syntax-directed computation and content-oriented computation: the former is through higher-order (and related) extensions of attribute grammars, whereas the latter is done by externally defined programs and it is for computation not naturally amenable to the syntax-directed paradigm. The current implementation of SIMON employs the higher-order extension proposed by H. H. Vogt, S. D. Swierstra, and M. F. Kuiper ("Higher-Order Attribute Grammars," Proceedings of the ACM SIGPLAN '89 Conference on Programming Language Design and Implementation: 131-145) for the syntax-directed computation, and C++ for the content-oriented computation.
Abstract: "Executives at the Automated Document Control Branch of the Correspondence and Directives Directorate at the Pentagon are attempting to standardize the DOD's manuals and directives under SGML. Officials believe that an electronic publishing system will significantly reduce the amount of paper required for manuals throughout the entire DOD. The Air Force and Army have responded enthusiastically to the program, but some DOD agencies are resisting adopting SGML document management, which has become a DOD standard. The electronic publishing system uses SoftQuad's Panorama Pro text processing software for online publishing and to automate SGML document preparation, and civilians can gain access to unclassified information via the Web."
Excerpt: "The tool they are using is Panorama Pro software from SoftQuad Inc. of Toronto. The software automates SGML document preparation and supports online publishing, including searching. Panorama Pro also generates a table of contents that is automatically updated whenever the document is changed... Web publishing is based on Hypertext Markup Language, a subset of SGML that lacks some of SGML's functionality. HTML is changing as the Web develops, Vercio said, so it lacks the advantages of standardization. On the other hand, HTML browsers do not support SGML. Web browsers are common, but SGML ones are not."
The article is available online from the Government Computer News WWW server: GCN Online; [archive copy, text only].
The HyTime Application Development Guide clarifies the relationship between SGML and HyTime, shows how HyTime constructs can be used to extend the basic capabilities of SGML, and describes some of the basic features of HyTime, using examples.
Version 1.2.4 of the document is available in PostScript format from TechnoTeacher's FTP server. Version 1.2 (May 1995) of the document is also available in PDF format through PHOENIX DATA LABS. See the author's announcement for a brief description.
Summary: "This paper outlines an extension of the Text Encoding Initiative encoding scheme for use by those who wish to study, interpret or otherwise use legal text. This last series of verbs is deliberately vague. It brings under one roof the academic study of legal texts, by lawyers, linguists, historians and others, as well as the use of text by lawyers in their daily work of determining the law for clients or courts. The purpose of these extensions is to provide these researchers with tools that will enable the understanding and use of a text in specifically legal ways by the encoding of those textual features that may be of particular interest in legal study. Although it deals with aspects of text that differentiate it precisely as legal text, this paper takes as a working assumption that the basic reading of text by lawyers is sufficiently like at least some of the reading practiced by humanities scholars to make the extension for lawyers of a humanities encoding system worthwhile.[...] This paper is concerned with the description of a rather basic set of extensions that will allow better representation of legal text while using the TEI encoding scheme. The completion of this set of extensions and the use of the TEI encoding scheme as so modified to mark up a significant quantity of legal text will be an endeavor that will prove both interesting and useful."
See the main database entry for University of Cincinnati College of Law, Center for Electronic Text in the Law, or the web site for CETL: The Center for Electronic Text in the Law.
The extended abstract for the document is available online: http://www.stg.brown.edu/webs/tei10/tei10.papers/finke.html; [local archive copy]. See the main database entry for additional information about the conference, or the Brown University web site.
This issue of Baskerville makes available a number of papers presented at a joint meeting of the UK TEX Users' Group and BCS Electronic Publishing Specialist Group (January 19, 1995) [mirror copy]. See the link to Baskerville, or email: baskerville@tex.ac.uk. Issue 5/2 of Baskerville has other articles on SGML: "Portable Documents: Why use SGML?" (David Barron); "Formatting SGML Documents" (Jonathan Fine); "HTML & TeX: Making them sweat" (Peter Flynn); "The Inside Story of Life at Wiley with SGML, LaTeX and Acrobat" (Geeti Granger); "SGML and LaTeX" (Horst Szillat). See the special bibliography page for other articles on SGML and (LA)TEX.
Report on the conference "SGML Technology 1996" which was held on March 27, 1996, in Ottawa. See the conference entry.
See the bibliographic entry for The Transcription of Primary Textual Sources Using SGML in this database.
The paper (online version) has two parts: (1) Transcription challenges inherent in the document: The problem of dual emendation and correction; (2) Transcription Challenges arising from the Use of SGML: The Problem of Multiple Hierarchies.
"In both of the cases discussed above, it is clear that the principles of SGML are intimately bound up with the WWP's conceptualization and solution of various transcription problems. This is to say that SGML and the TEI's implementation of it, quite apart from being either a help or a hindrance in solving a particular transcription issue, are of great use in thinking intelligently about it. It may sometimes seem that the particular formulations enforced by these systems create unnecessary complexity; however, in almost all cases this complexity is already latent in the document or the activity of transcription. What appear to be simple, natural systems (like pages with text on them) reveal their complexity when we attempt to map out their real structures in an explicit way; we see them as simple only because they are written deeply into our cultural systems. As a way of bringing such systems to the level of awareness, SGML is invaluable; we only need to remain aware of the structures it in turn creates before they too become naturalized and invisible." [from the Conclusion]
The document is available online: http://dynaweb.stg.brown.edu/wwp_books/DL/. See also the main workshop entry or the program listing for other workshop details. See also: Mah, Flanders, and Lavagnino, "Some Problems of TEI Markup and Early Printed Books," in Computers and the Humanities (CHUM), 1997.
[Excerpt]: "Now that the TEI Guidelines have been in use long enough to create a substantial base of encoded data, projects whose source material and encoding strategies are similar can benefit from comparing approaches to common problems, and assessing whether their divergences are justified by differences in data or philosophy, or merely represent unnecessary variation in the application of the TEI... One area of primary source transcription which deserves examination along these lines is the classification of proper nouns and similar words and phrases, using the elements described in Chapter 20 of the TEI Guidelines... The proposed session will present several perspectives on this problem, with several aims: first, of allowing the participating projects (and those represented in the audience) to compare practices and discuss the status of their variation; second, of situating the specific problem of encoding proper nouns within the context of scholarly analysis, so as to create a more precise sense of the needs which the encoding is intended to address; and third, to think more broadly about the pressures and constraints on classification systems in text encoding."
Abstract available online in HTML format: "Applying the TEI: Problems in the classification of proper nouns. (Session)", by Julia Flanders, Sydney Bauman, Mavis Cournane, Willard McCarty, Aara Suksi; [archive copy]
Additional information on the ACH-ALLC '97 Conference is available in the SGML/XML Web Page main conference entry, or [August 1997] via the Queen's University WWW server.
For a description of the SGML database underlying the production of the Principles of Ambulatory Medicine, see the bibliographic entry for the work edited by Randol L. Barker, et al. For further information on the use of SGML by the National Library of Medicine for database publishing, see the main entry for the NLM.
Author's abstract: "This report is a summary of the joint conference of the Association for Computing in the Humanities and the Association for Literary and Linguistic Computing, held at Georgetown University, Washington DC, 16-19 June 1993. It contains a précis of the text published in the preprints supplemented by the author's notes, but omissions occur for a few sessions for which (a) no paper was available; (b) where a panel discussion was held viva voce; or (c) where a fuller report is available from the speaker. In dealing with topics sometimes outside my own field, I will naturally have made mistakes, and I ask the authors' pardon if I have misrepresented them."
Several of the presentations at ACH/ALLC 1993 treat SGML topics. An HTML version of this report is available from the Curia WWW server. It is also available here in mirror copy.
Abstract: "[The book's] three sections deal with (1) Getting connected to the Internet and using Internet software; (2) Writing HTML (2.0) files for the WorldWideWeb; (3) Running a HTTP server and providing a Web service. Author is a member of the IETF Working Group on HTML. Text includes additional material on SGML; choice of editors, browsers and servers; copyright and intellectual property; and advance details of HTML3."
See the online description of the book, the Table of Contents, the book Foreword, or the companion HTML Quick Reference Guide.
This issue of Baskerville makes available a number of papers presented at a joint meeting of the UK TEX Users' Group and BCS Electronic Publishing Specialist Group (January 19, 1995) [mirror copy]. See the link to Baskerville, or email: baskerville@tex.ac.uk. Issue 5/2 of Baskerville has other articles on SGML: "Portable Documents: Why use SGML?" (David Barron); "Formatting SGML Documents" (Jonathan Fine); "HTML & TeX: Making them sweat" (Peter Flynn); "The Inside Story of Life at Wiley with SGML, LaTeX and Acrobat" (Geeti Granger); "SGML and LaTeX" (Horst Szillat). See the special bibliography page for other articles on SGML and (LA)TEX.
Abstract: "HTML is often criticized for its presentation-oriented conception. But it does contain sufficient structural information for many everyday purposes and this has led to its development into a more stable form. Future platforms for the World Wide Web may support other applications of SGML, and the present climate of popularity of the Web is a suitable opportunity for consolidation of the more stable features [of HTML]. TeX is pre-eminently stable and provides an ideal companion for the process of translating HTML into print."
Editorial note: Some of the notions in the article (predicated of HTML) and the supporting tools are applicable, to a degree, to SGML (documents). The paper is based upon a similar document published in Baskerville 5/2 (March 1995). Note that the author has produced a utility SGML2TEX as a general-purpose program that translates SGML-tagged text into LATEX based upon mappings specified by the end user.
For more on SGML/XML and TeX, see the dedicated database entry and the topical bibliography listing.
"This book is a practical guide to implementing SGML and XML with precise procedures for making the most of the wide range of tools available. Programs are introduced in the context of the lifecycle of a document, from creation, through validation, on-line display, searching and database, to printed delivery and repository storage. Included are many examples of the tools discussed, showing the various output stages and the methods for producing them, as well as tips and tricks for getting the most out of them. The accompanying CDROM contains a range of SGML and XML tools, including design tools, editors, parsers, formatters, databases, converters, utilities, DTDs, DSSSL/XSL and other sample style specifications."
From the author's web page: "Here's the essential book on software for handling SGML: the complete life-cycle guide to your documents. Understanding SGML and XML Tools is essential reading for anyone building or using SGML systems, and provides a valuable reference for all those times when you need the right tool for the job. Peter Flynn has many years' experience in implementing and using SGML in business and research, and provides valuable advice on the tricks and traps of working with over 50 popular commercial and public domain SGML programs. The book has seven chapters, covering the cycle of creation to final archival. The free CD-ROM includes both commercial and public-domain software, from demos to complete full systems, plus a 90-day license for Corel's WordPerfect Suite 8 SGML wordprocessor."
See the more complete volume description with chapter summaries. Or see http://imbolc.ucc.ie/~pflynn/books/sgmltools.html, which also provides a Contents listing and bibliographic data in BibTeX style.
Abstract: The relationship between TeX and SGML (Standard Generalized Markup Language, ISO 8879) has always been uneasy, with adherents of one system or the other displaying symptoms reminiscent of the religious wars popular between devotees of TeX and other word processors. SGML and TeX can in fact coexist successfully, provided features of one system are not expected of the other. This paper presents a pilot program to test one method of achieving such a cohabitation.
The text is/was online at Curia. A mirror copy is available here.
Abstract: "The World Wide Web has had over 5 years of intensive development, and has expanded from a text-only technical documentation system to a multimedia information base distributed across the planet. Although its tool for structural definition, the Hypertext Markup Language (HTML), has been under constant development throughout this period, most browsers have been slow to take advantage of all the facilities it offers. At a time when there is much debate over the public future of the Web, it is in danger of partial stagnation. Despite significant innovations in some areas, the field is still open for software developers who are capable of harvesting the benefits of SGML, the language in which HTML is written. This analysis of HTML Document Type Descriptions (DTDs) reveals where some opportunities may lie."
See the main document entry for the complete list of articles and contributors, as well as other bibliographic information.
"Abstract: The evolution of document management systems towards compound documents makes necessary the use of rational representations of information. Rather than slowing down technology development, document standards offer a new way of using document management systems. They are not only search tools, but become communication nodes for documents between electronic media. By representing the logical structure of documents and being system independent, the SGML standard especially addresses this issue."
Abstract: We propose the use of SGML 'concurrent structures' to create and tag the structure of an idealized or virtual document to be mapped onto the tagged structures from actual print dictionaries. The idealized structure is to be defined by a simplified document type definition [SGML DTD]; the elements of actual print dictionary entries will be rearranged to fit into the resulting template. We use a system of index numbers to link the elements of the generalized entries with their sources in the entries of the actual documents. We illustrate this technique by using it to merge elements from a number of different dictionaries into a generalized entry structure.
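The 'concurrent structures' mentioned in this abstract refer to SGML's optional CONCUR feature, which lets more than one document type mark up the same character data. The following sketch is purely illustrative (the DTD names and element names are invented, and real use of CONCUR requires the feature to be enabled in the SGML declaration); it suggests how an idealized dictionary structure could coexist with a print-derived one:

```sgml
<!-- Illustrative sketch of concurrent markup; names are hypothetical -->
<!DOCTYPE virtual SYSTEM "virtual.dtd">
<!DOCTYPE print SYSTEM "print.dtd">
<(virtual)entry><(print)entry>
  <(virtual)headword><(print)hw>abandon</(print)hw></(virtual)headword>
  <(virtual)sense n="1"><(print)def>to give up completely</(print)def></(virtual)sense>
</(print)entry></(virtual)entry>
```

Each start- and end-tag is qualified by the document type it belongs to, so the idealized (virtual) hierarchy and the actual print hierarchy can each be extracted independently from the same text.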
"INTRODUCTION: With funding from 1991-1995 by NSF, and support by ACM, the Envision team has prototyped a DL system for the computing literature (Fox et al., 1993b). Part of the work has involved developing SGML Document Type Definitions, converting typesetter data into an SGML archive based on those DTDs, and building a large collection of bibliographic records, review articles, full-text technical articles and video materials. Thousands of page images have been scanned in, and coupled with bibliographic records. A small collection of MPEG data has been prepared using special compression software, for use in educational activities (Fox & Abdulla, 1994).
Project activities also have included developing the Envision system. One component of that is a specialized object-oriented database system being developed by G. Averboch to replace the earlier system programmed by QiFan Chen. The largest component is the Envision backend system, that makes use of a version of MARIAN for searching. It manages data in an SGML archive, and converts documents that are selected for display to HTML, so they can be presented using a Mosaic browser. The backend talks with a specially tailored interface for query formulation, listing results, and visualizing the result set (Nowell et al., 1994). Overall, a user-centered design approach was undertaken; usability tests have shown keen appreciation of the interface."
Available online: http://fox.cs.vt.edu/NORDINFO.txt; [mirror copy, September 1995]
"Abstract: With support from four NSF awards we aim to develop a prototype digital library in computer science and apply it to improve undergraduate education. First, Project Envision, "A User-Centered Database from the Computer Science Literature", 1991-94, deals with translation, coding standards including SGML, retrieval/previewing/presentation/browsing/linking, human-computer interaction, and construction of a partial archive using text and multimedia materials provided by ACM. Second, "Interactive Learning with a Digital Library in Computer Science," 1993-96, supported by NSF and ACM with additional assistance from other publishers, focuses on improving learning through delivery of materials from the archive. Third, "Networked Multimedia File System with HyTime", funded by NSF through the SUCCEED coalition, considers networking support for distributed multimedia applications and the use of HyTime for description of such applications. Fourth, equipment support comes from the Information Access Laboratory allotment of the "Interactive Accessibility: Breaking Barriers to the Power of Computing" grant funded by NSF for 1993-98. We report on plans and work with digital video relating to these projects. In particular, we focus on our analysis of the requirements for a multimedia digital library in computer science and our experience with MPEG as it applies to that library."
Abstract: "The NSF funded (1993-96) Education Infrastructure project Interactive Learning with a Digital Library in Computer Science builds upon work in Project Envision, another NSF project (1991-94) to develop A User-Centered Database from the Computer Science Literature. Our objective is to make development of high-quality educational materials more cost effective by having that effort build upon a hypermedia digital library. In Fall 1994 we have three classes working with new courseware produced and distributed over WWW.
"This presentation describes the architecture, design, development and administration for our system. In addition to using Mosaic and the NCSA WWW server, we employ Graz University's Hyper-G server and its Harmony and Amadeus clients. Further, the Envision system aids authoring by providing searching, results visualization, and browsing of a large SGML archive and multimedia object database - and also utilizes a WWW server loaded with data from ACM and other publishers. Our materials include large bibliographies, collections of page images, SGML versions of journal articles, videos, and other resources.
"...Of particular importance is our commitment to using SGML and object-oriented database (OODB) methods. According to our investigations, users prefer to think about objects related to their domain of inquiry (e.g., algorithms, animations, source code, pseudo code, theorems, proofs, conferences, research projects, authors). Thus, we have an object in our OODB for each abstract entity (e.g., author, institution, journal), with associated attributes recorded as well as links to related objects."
Available online in HTML format; [mirror copy, September 1995]
Introduction: "On the first anniversary of funding by the U.S. Department of Education (FIPSE) for a National Digital Library of Theses and Dissertations, we review its origins, describe progress-to-date that warrants its now being called the Networked Digital Library of Theses and Dissertations (NDLTD), explain some of the controversy that has led to widespread publicity and dissemination, and explore future growth possibilities. The first workshop about electronic theses and dissertations (ETDs) took place in 1987 with a technical focus on standards, namely applying SGML to the description of research. Ten years later, we realize that the proper aim should be improving graduate education by having students enter ETDs into a digital library which facilitates much broader access. Achieving that goal calls for a sustainable, worldwide, collaborative, educational initiative of universities committed to encouraging students to prepare electronic documents and to use digital libraries - NDLTD."
The article is available online in HTML format; local archive copy. See the main URL for NDLTD: http://www.ndltd.org. Note that D-Lib Magazine frequently contains articles referencing the use of SGML encoding in digital library research.
Summary: "With the support of the National Science Foundation and the Association for Computing Machinery (ACM), the Envision project has developed a prototype digital library of computer science literature that is highly usable (from user-centered design), highly structured (from SGML and an object database), and highly integrated (from hypertext links among objects). The result is a representation of part of the computer science literature as a cohesive body of knowledge that can be searched and viewed in innovative ways. The user interface was designed with careful attention to user needs and desires (through interviews with potential users), to graphic detail (through involvement of an artist and attention to the research literature on graphical perception and psychophysics), and to usability (through an iterative process of usability evaluation). Recognizing the need to translate enormous quantities of documents in an unlimited variety of input formats into a single standard format, the project developed a flexible system for analyzing the structures (e.g., titles, authors, paragraphs, and references) within a document and translating that structure into any standard markup scheme. The Envision distributed server supports simultaneous access to the library by a number of users and in a variety of ways. The Envision software is soon to be installed at ACM headquarters and made available to ACM members. The Envision system will continue in use at Virginia Tech and Norfolk State University to support the work of a related NSF Educational Infrastructure grant."
Available online: the HTML version, or [mirror copy, September 1995]; a Postscript version is also available.
Abstract: "In numerous fields of technical documentation and publishing, paper media are giving way to electronic media. The SGML Standard (ISO 8879:1986) is gaining increasing importance. The documents processed, in particular in the aircraft industry but also in other fields like legal publishing, are generally of large size, highly structured and regularly updated. Their production in the SGML format creates a need in terms of document management systems. A repository architecture, supporting different kinds of activities such as consultation or edition, is a requirement for such a production."
"We give an overview of the requirements for such a repository, starting with those already widely recognized and used as a basis for on-going experiments and products. We then detail some unforeseen requirements that are just emerging. Finally, we will analyze the impact of these requirements on information modelling inside and outside the repository. The analysis of requirements reveals some shortcomings of the SGML exchange model itself. An evolution towards the HyTime standard enables most of these shortcomings to be offset. For the repository internal model, we propose an object-oriented data model that meets most of the requirements."
This article was published in an SGML special issue of Computer Standards & Interfaces [The International Journal on the Development and Application of Standards for Computers, Data Communications and Interfaces], under the issue title SGML Into the Nineties. It was edited by Ian A. Macleod, of Queen's University.
"Abstract: As far as technical documentation or publishing is concerned, paper media is giving way to electronic media. The SGML exchange standard is gaining increasing importance. It is more particularly used to perform electronic technical publication exchanges between aircraft manufacturers and airlines. This standard allows the document tree logical structure to be described in a computer system-independent way. As the documents produced are usually of large size, highly structured and regularly updated, their manipulation calls for new systems: document management systems. In fact, a document repository and therefore an underlying database is necessary to support document utilization. An in-depth analysis of requirements allows this repository s main features to be defined. We focus on demonstrating how database technology, and more particularly object-oriented database technology, is helpful to develop such a system. We propose an object-oriented repository data model which allows most of the requirements to be met."
Abstract: "In the context of Document Management Systems, the notion of document is becoming less and less preponderant. A document corresponds to an assembly of information objects -- SGML or non SGML objects -- that may be shared by several documents. Moreover, these information objects are interconnected by various kinds of links.
"The conventional SGML Databases offer a good support for storing and manipulating collections of independent SGML documents. They have to evolve for managing a network of SGML and non-SGML documents, i.e., hypermedia documents. SGML allows to define inter-document links by using id/idref attributes and entity sharing. HyTime goes beyond the SGML limits concerning the hyperlinking features by offering the semantic to model complex links, such as a link from a document to a very precise location inside an other one. In order to offer all the functionalities necessary for managing hypermedia documents, SGML Databases must then take into account all the above constructs. The schema of these SGML databases consists in a tree structure representing the mapping of the SGML meta-model. But it has to evolve towards a graph structure for representing the HyTime hyperlinking model. This paper presents the principles to extend an SGML Database to an HyTime Database and the functionalities of a web interface to access to the documents stored in the database."
This paper is the result of an ongoing collaboration between Aerospatiale Aircraft Business and the Research and Development Division of Electricité De France, concerning a study and research project in the field of structured electronic document databases. Although the specific industrial contexts are different, numerous common requirements may be identified in this particular field, and a large benefit may be expected from a common study. Aerospatiale and Electricité De France are two large French companies which produce, respectively, aircraft (Aerospatiale Aircraft Business) and electricity. Both need to manage a large amount of documentation in their own industrial context. As a consequence, a significant benefit is expected from powerfully computerized documents. [...] After presenting this study's industrial contexts, we succinctly present our approach for specifying an SGML database. Then, we focus on our strategy for evolving towards a HyTime hypermedia database. We show how we have chosen to implement this SGML/HyTime database. Finally, we conclude by giving the progress status of our work and the main issues which remain to be studied in depth.
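The id/idref linking mechanism mentioned in the abstract above can be sketched in miniature. The following Python fragment is a hypothetical illustration only, not code from the paper: the element and attribute names are invented, and `xml.etree.ElementTree` stands in for an SGML parser.

```python
# Minimal sketch of id/idref-style link resolution, in the spirit of
# the SGML constructs described above. All tag and attribute names
# ("manual", "section", "id") are invented for this example.
import xml.etree.ElementTree as ET

manual = ET.fromstring(
    '<manual>'
    '<section id="wing-spar"><title>Wing Spar Inspection</title></section>'
    '<section id="fuel-pump"><title>Fuel Pump Removal</title></section>'
    '</manual>'
)

def resolve(doc, idref):
    """Return the element whose id attribute matches idref, or None."""
    for elem in doc.iter():
        if elem.get("id") == idref:
            return elem
    return None

# A cross-reference elsewhere might carry an idref of "fuel-pump":
target = resolve(manual, "fuel-pump")
print(target.find("title").text)  # -> Fuel Pump Removal
```

Note that this tree-walking lookup stays within one document tree; the graph structure the abstract calls for arises when such references may land in other documents, which is where HyTime's location model comes in.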
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
Abstract: "This paper proposes a model of a hypermedia repository based on both object technologies (specification method and Database Management System) and electronic document standards (SGML and HyTime).
"SGML is an ISO standard for document structuring. It guarantees document exchange capabilities, document longevity and document reusability. Object Database Management Systems fit well for complex document storage and access. However, every time a mapping between SGML and an object model is studied, many issues arise and only a part of them are solved. We propose a complete and generic object model of SGML which eliminates all the limits. Moreover, this model should stand for a universal model, i.e., it can be used as a standard API to plug any SGML visualizer or editor on top of an object repository.
"HyTime is also an ISO standard, based on SGML, so that it ensures exchange capabilities, longevity and reusability too. Moreover, HyTime goes beyond the SGML limits concerning the hyperlinking features by offering the semantics to model complex links, such as a link from a document to a very precise location inside another one. We describe how to extend our object model to the hyperlink features of HyTime. We give an overview of the prototype we have implemented to validate our approach."
For other conference information, see the main conference entry for EP '96, or the brief history of the conference as sixth in a series since 1986. See the volume main bibliographic entry for a linked list of other EP '96 titles relevant to SGML and structured documents.
"Abstract: SGML is an excellent tool or technology for implementing modular, reusable information and documentation - but technology and tools alone are not enough! Unless a new methodology is added, the end result of a typical SGML project may be that the users get instant access to enormous amounts of completely useless data (i.e. information pollution).
"The reason is that the basic units of information that we have been using for centuries - the chapter, the section, and the paragraph, are completely undefined in terms of what function and purpose the information has for the reader! Short and/or highly structured documents may be easy to describe in a meaningful way in a DTD - but if we are looking at typical business policies and procedures manuals they will often be just as structured as a bowl of spaghetti.
"This paper will introduce the Information Mapping(reg.trade mark) method. The method provides a complete hierarchy of information types or classes that can be used to produce modular, reusable information objects - all with a precisely defined purpose and function for the information users."
The document is available online in HTML format: "Object-oriented information" [mirror copy, December 1995]. For further details on the Conference and BeLux, see the contact information for SGML BeLux.
The author reports on the fourth annual meeting of the OmniMark Users Group, held in Boston, with some 200 people in attendance. OmniMark is strongly supporting the "microdocument architectures" paradigm in the new release of the company's flagship product. On OmniMark (formerly Exoterica Corporation): see the OmniMark Home Page.
The author offers a positive review of this book, which appears in the Charles F. Goldfarb Series On Open Information Management, and is published by Prentice-Hall. It was begun by the late Yuri Rubinsky, and completed on Yuri's behalf by Murray Maloney. The book is structured as a tutorial introduction to HTML and SGML, and uses SoftQuad Inc.'s Panorama Pro SGML browser/searcher (supplied on the accompanying CDROM disc) to view the tutorial examples. Eric thinks the book is a "must-have" for SGML implementors. See further description of the book, including an online copy of the Preface and Table of Contents, via the bibliographic entry.
"Introduction: As the popularity of the World Wide Web (WWW) increases, a growing number of organizations are interested in distributing their data over the Internet. Some of this data is marked up using the Standard Generalized Markup Language (SGML). The challenge of distributing SGML over the WWW involves converting structurally marked data into a less structured format for the presentation of the data. Two major issues that must be addressed when developing a system for the publication of SGML information over the Internet include the partitioning of large SGML documents and the transformation of elements." [from the Introduction]
The document is available from the online electronic Conference Proceedings, or in mirror copy here.
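The two issues named in the introduction, partitioning large SGML documents and transforming elements for presentation, can be sketched as follows. This is a hypothetical Python illustration: the tag names and the tag-to-HTML mapping are invented, and it is not the system the paper describes.

```python
# Sketch of the two issues above: splitting a large structured
# document into chunks by top-level sections, and renaming elements
# to HTML tags for delivery. TAG_MAP and all tag names are assumed.
import xml.etree.ElementTree as ET

TAG_MAP = {"title": "h1", "para": "p", "emph": "em"}  # hypothetical mapping

def partition(doc):
    """Split a document into one chunk per top-level <section>."""
    return list(doc.findall("section"))

def to_html(elem):
    """Recursively rename elements via TAG_MAP; unknown tags become <div>."""
    out = ET.Element(TAG_MAP.get(elem.tag, "div"))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(to_html(child))
    return out

doc = ET.fromstring(
    "<book><section><title>Intro</title><para>Hello</para></section>"
    "<section><title>Next</title></section></book>"
)
chunks = [to_html(s) for s in partition(doc)]
print(len(chunks))  # -> 2
print(ET.tostring(chunks[0], encoding="unicode"))
```

A real converter would of course preserve hierarchy deeper than one level and generate navigation links between the chunks, but the structural shape of the problem is the same.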
Summary: [The subject matter is] "discussed from the perspective of the National Digital Library Program at the Library of Congress, which implemented a TEI-based DTD in 1993. The National Digital Library Program at the Library of Congress is on an ambitious course to digitize millions of primary source items from a broad range of historical collections. The Library has been digitizing historical materials since 1990, when the American Memory pilot program developed collections, initially on CD-ROM, to explore the potential audiences, uses, and enthusiasm for digital resources on American history and culture.
"In developing our SGML document type definition (DTD), we confronted some hard choices and made what might be characterized as some very library-like decisions. [...] We didn't want to have to force different document types into a single content model. Nor did we want to have a baker's dozen of DTDs and match up every document with the best suited DTD, or require that kind of sophisticated decision making from data-entry technicians who were unlikely to possess the appropriate training. We knew that we would provide digital images of the original pages of text materials and that we wanted the texts to faithfully retain original errors.[. . .] In response to these issues, we sought to cultivate some creative and flexible middle ground. Little did we realize that we would find that middle ground in the TEI. There was an uncanny congruence between the encoding principles derived during the American Memory document analysis and the TEI guidelines. This should not be so surprising, however, since both projects were firmly rooted in the careful analysis of a broad range of humanities texts. Though LC staff did not expect to become TEI converts, we knew what we wanted and what types of capabilities we had to have. It could be argued that the unexpected result--great compatibility and congruence between the American Memory DTD and the TEI--underscores the appropriateness of the TEI gestalt for use in the humanities. The descriptive flexibility afforded by the TEI is profoundly important and, this author would argue, developing a digital library of historical materials in the humanities would be impossible without it.
The extended abstract for the document is available online: http://www.stg.brown.edu/webs/tei10/tei10.papers/friedland.html; [local archive copy]. See the main database entry for additional information about the conference, or the Brown University web site.
"Abstract: Electronic Book Technologies (EBT) Inc is adding Netscape Communications Corp's Frames capabilities to EBT's DynaWeb 2.0 server software. The aim is to leverage Standard Generalized Markup Language (SGML) document structure to the World Wide Web (WWW). Users will be able to store corporate data in SGML, translating on the fly to the HyperText Markup Language (HTML). SGML is an internationally accepted standard for representing data in electronic form, and HTML is a subset of SGML designed for the Web. The addition to DynaWeb 2.0, called FreeFrames, will let users publish HTML pages with multiple live windows. Customers are currently beta testing FreeFrames. A frames demo is set up on EBT's Web site at http://www.ebt.com."
Abstract: "To allow users to truly use the Web to construct personal information systems, users must be able to write their own applications to retrieve, massage, combine, and store information from Web servers. Information providers cannot know all the ways their information can be used; that is determined by the collectivity of users. If users cannot write their own applications, then Web access will remain a tedious and manual process. After describing two small applications we show that the Web architecture, based on HTML, a display-oriented language for describing pictures, does not support client applications very well; the structure and marking of a page does not describe its information in a way easily understood by software. Nevertheless, because the information is mostly textual and was designed to convey that information to a human, it is often possible to retrieve needed information from a page. We describe our implementation, written in Scheme, which queries pages using set predicates, extracts information, and uses that to query further Web pages. Extensions of this approach can combine this information with the clients other local resources. Finally, the same tools are applicable to more sophisticated markup systems, arch as SGML or its Web-oriented offspring XML.
"To be published in HICSS31." The document is available online in Postscript format; [local archive copy]
Abstract: "The Internet provides a medium to combine human and computational entities together for ad hoc cooperative transactions. To make this possible, there must be a framework allowing all parties (human or other) to communicate with each other. The current framework makes a fundamental distinction between human agents (who use HTML) and computational agents, which use CORBA or COM. We propose domain-specific languages (DSLs) as a means to allow all kinds of agents to 'speak the same language'. In particular, we adopt some ideas (and syntax) from SGML/XML, especially the strict separation of syntax and semantics, so each agent in a collaboration is capable of applying a behavioral semantics appropriate to its role (buyer, seller, editor). We develop the example of a card game, where the syntax of the language itself implies some of the semantics of the game."
See also the following entry.
Abstract: "The Internet provides a medium to combine human and computational entities together for ad hoc cooperative transactions. To make this possible, there must be a framework allowing all parties (human or other) to communicate with each other. The current framework makes a fundamental distinction between human agents (who use HTML) and computational agents, which use CORBA or COM. We propose DSLs as a means to allow all kinds of agents to 'speak the same language.' In particular we adopt some ideas (and syntax) from SGML/XML, especially the strict separation of syntax and semantics, so each agent in a collaboration is capable of applying a behavioral semantics appropriate to its role (buyer, seller, editor). We develop the example of a card game, where the syntax of the language itself implies some of the semantics of the game."
"We have relied heavily on SGML, the Standard Generalized Markup Language, as the metagrammar for defining our various DSLs. SGML has some important characteristics which make it a candidate for the role: 1) It is an existing international standard already used to mark up terabytes of information, much of which may be interesting for the kinds of applications under consideration. 2) Although rather complex, a number of parsers are available. XML, a simplified version of SGML designed for Web delivery is designed to be simple to parse. 3) It is LL(1), as we will discuss later. 4) Most important, SGML was designed to enable a complete break between syntax and semantics through its promulgation of logical, or descriptive, markup. 5) SGML is also the metagrammar in which HTML is defined, so it will look familiar to people who have read Web document sources."
Available online in HTML format: http://cs.nyu.edu/phd_students/fuchs/dsl.html; [local archive copy]. To be published in the Proceedings of the Domain Specific Languages Workshop, held in October, 1997. Also available in Postscript or gzipped Postscript; [local archive copy].
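The separation of syntax and semantics described above, where each agent applies a behavioral semantics appropriate to its role, can be illustrated with a toy example. The `<play>` element, its attributes, and the role behaviors below are invented for illustration; the paper's actual card-game DSL is not reproduced here.

```python
# Toy illustration of role-dependent semantics over shared syntax:
# one XML-encoded message, interpreted differently by each agent.
# The markup vocabulary and roles are hypothetical.
import xml.etree.ElementTree as ET

msg = ET.fromstring('<play card="QS" player="alice"/>')

def interpret(elem, role):
    """Apply a role-appropriate semantics to the same parsed syntax."""
    card, player = elem.get("card"), elem.get("player")
    if role == "dealer":
        return f"record {player} played {card}"
    if role == "renderer":
        return f"show {card} face up"
    return f"note {player}'s move"

print(interpret(msg, "dealer"))    # -> record alice played QS
print(interpret(msg, "renderer"))  # -> show QS face up
```

The point of the descriptive-markup discipline is exactly this: the message commits only to structure, and each collaborator binds it to behavior suited to its role.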
Abstract: "Multi-user distributed applications running on heterogeneous networks must be able to display user interface components on several platforms. In wide-area public networks, such as the Internet, the mix of platforms and participants in an application will occur dynamically; the user interface will need to coexist with environments completely uncontrolled by the designer. We have dealt with this issue by considering user interfaces as a kind of document specifying the application's requirements and adopting SGML technology to process them locally. This approach provides new flexibility, with implications for the design of network browsers, such as those of the World Wide Web, and leads to an interesting class of active documents."
This article was published in an SGML special issue of Computer Standards & Interfaces [The International Journal on the Development and Application of Standards for Computers, Data Communications and Interfaces], under the issue title SGML Into the Nineties. It was edited by Ian A. Macleod, of Queen's University.
Available on the Internet in Postscript format, gzipped version: http://cs.nyu.edu/phd_students/fuchs/sgml.ps.gz, [mirror copy].
Abstract: "We consider the syntax and semantics of the TL (Transformation Language)in the DSSSL (Document Style Semantics and Specification Language) specification (DSSSL96). At present TEs (Transformation Expressions) are less than first-class language objects - they must all reside at the top level, and cannot be manipulated like other DSSSL/Scheme objects. In particular, there is no means of passing information among TEs, so one TE cannot take advantage of information derived by another, such as passing data about parent nodes to direct the transformation of child nodes. We propose extending the DSSSL syntax to allow a DSSSL program to better exploit the tree-like nature of the source grove by providing a semantics for nesting query expressions, allowing information to be passed around while retaining DSSSL's functional nature. The TEs would also come closer to being first-class objects. We suggest these extensions will make DSSSL programs easier to write and probably easier to optimize."
An online version of the presentation is available: "Why Isn't DSSSL a Tree?", in SGML format; [mirror copy]
Note: The above presentation was part of the "SGML Expert" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
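The proposal concerns DSSSL/Scheme syntax, but the underlying idea, passing information derived at a parent node down to direct the transformation of its children, can be illustrated generically. The following Python sketch is an invented analogy, not DSSSL; the context passed down here is simply the nesting depth, which selects the heading level for each child.

```python
# Generic illustration (not DSSSL syntax) of parent-to-child context
# passing during a tree transformation: each node's rendering depends
# on information derived while transforming its ancestors.
def transform(node, depth=1):
    """node is a (tag, children) pair; depth is the inherited context."""
    tag, children = node
    rendered = [f"<h{depth}>{tag}</h{depth}>"]
    for child in children:
        rendered.extend(transform(child, depth + 1))
    return rendered

tree = ("Report", [("Intro", []), ("Methods", [("Data", [])])])
print(transform(tree))
# -> ['<h1>Report</h1>', '<h2>Intro</h2>', '<h2>Methods</h2>', '<h3>Data</h3>']
```

In a purely top-level, non-nesting transformation language, each rule would see a node in isolation and could not compute the heading level; that is the limitation the paper's nested query expressions are meant to remove.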
Reports on LivePage, by Information Atrium Inc. Various components in the SGML system allow for storage of documents (document objects?) in relational databases.
"Abstract: Several research document manipulation systems have combined an interactive direct-manipulation style of use interface with a grammatically-specified structured document representation. These systems have demonstrated that the two concepts can be combined, but have not yet demonstrated that the resulting systems are usable when manipulating large or complexly-structured documents. The paper discusses some of the characteristics of such complicated documents and suggests possible approaches for handling the resulting issues. Primary attention is focussed on structurally complex documents, with additional discussion on the management of large-sized documents and on the handling of specification errors."
The dissertation is also available as Technical Report Number 86-09-08, Department of Computer Science, University of Washington. See also a related document published as a conference paper.
Abstract: "Integrated Editor/Formatters merge the document editing and formatting functions into a unified, interactive system. A common type of Integrated Editor/Formatter, the Exact-representation Editor/Formatter (also known as WYSIWYG), presents an interactive representation of the document that is identical to the printed document. Another powerful metaphor applied to documents has been to describe the document as abstract objects -- to describe the document's logical structure, not its physical makeup. The goal of the research reported here is to merge the flexibility found in the abstract object-oriented approach with the naturalness of document manipulation provided by the Exact-representation Editor/Formatters. A tree-based model of documents that allows a variety of document objects as leaves (e.g., text, tables, and mathematical equations) has been defined. I suggest a template-oriented mechanism for manipulating the document and have implemented a prototype that illustrates the mechanism. Further work has concentrated on handling user operations on arbitrary, contiguous portions of the display."
Abstract: "Document preparation systems that are oriented to an author's preparation of printed material must permit the flexible specification, modification, and reuse of the contents of the document. Interactive document preparation systems commonly have incorporated simple representations -- an unconstrained linear list of document objects in the `What You See Is What You Get' (WYSIWYG) systems. Recent research projects have been directed at the interactive manipulation of richer tree-oriented representations in which object relationships are constrained through grammatical specification. The advantage of such representations is the increased flexibility that they provide in the reusability of the document and its components and the more powerful user commands that they permit. We report on the experience gained from the design of two such systems. Although the two systems were designed independently of each other, a common set of issues, representations, and techniques has been identified. An important component of these projects has been to examine the WYSIWYG user interface, retaining the naturalness of their user interface but eliminating their dependencies on the physical-page representation. Aspects of the design of such systems remain open for further research. We describe these open research problems and indicate some of the further gains that may be achievable through investigation of these document representations." [published abstract]
The article discusses prototype systems tnt and Grif, with some treatment of SGML's influence upon these editing systems.
Abstract: "The Electronic Library Project was lunched to study the limits of new technologies like ODBMS (Object Database Management System), SGML (Standard Generalized Markup Language) / HyTime (Hypermedia/Time-based Structuring Language) structuring and a Web access interface. The result is better than any estimation we could have made. Interactive XML (EXtensible Markup Language) generation is feasible. The submitted talk is as much oriented towards a concrete demonstration as towards the conceptual explanation of the prototype."
This paper was delivered as part of the "User" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM were produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "During the period of time since the inception of MIL-PRF-87269 and MIL-PRF-87268, vendors have been developing and delivering 'MIL-PRF-87269 compliant' SGML instances to the DoD (Department of Defense) as line item deliverables under various system acquisition/maintenance contracts. Many of these SGML instances have been developed with a limited understanding of the specific information engineering requirements set forth in MIL-PRF-87269. In the absence of this understanding non-conforming IETM (Interactive Electronic Technical Manual) instances have been or are being accepted by the DoD. This has happened partially because there has been no contractual basis to support rejection. This in turn has led to some IETM providers believing that they are creating a fully conforming product when in fact their offerings may have fallen short of what the specification intended.
"It should be noted that the problems encountered with IETM implementation do not stem from any major deficiency in the MIL-PRF-87269 specification. On the contrary, MIL-PRF-87269 and its accompanying implementation guide clearly express a simple but elegant IETM traversement concept.
"This paper reviews the powerful information modeling and information exchange concepts contained in MIL-PRF-87269, and describes how well they fit with existing object-oriented analysis and design methodologies. The MIL-PRF-87269 generic layer architectural forms are discussed in terms of Class 2,3,4 and 5 ETM, ICW and IETM development and their relationship to procedural traversement data modeling. An object-oriented analysis and design approach is offered for interoperable ETM, ICW and IETM content data modeling beginning with the analysis of the users' existing old physical model, the construction of logicalized content data model object classes, the enhancement of logicalized content model object classes with inherited traversement behavior, and finally the implementation of the traversement rules enhanced content data model for ETM, ICW and IETM development. Configuration management and version control issues are addressed as an essential and embedded part of the object-oriented content data modeling approach. The proper use of parameter entities in the creation of re-usable modular DTDs is explained. Issues are raised and strategies are offered regarding the creation and/or adoption of mnemonics and nomenclature related to the naming and name space management and cataloging of information objects and rich content tagged general entities. Finally, the benefits of the object-oriented content modeling approach are revealed in terms of IETM instance acquisition, development and acceptance quality, and in terms of the interoperability that can be achieved among instances through a consolidated ETM, IETM and ICW information server."
This paper was delivered as part of the "IETM" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM were produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "Nortel is a large, international telecommunications company, with 68,000 employees, whose documentation needs are diverse and changing. Implementation of SGML in a cross-corporate fashion introduces the difficulty of meeting separate user requirements while still maintaining a level of control over document structures. Using SGML architectures confronts this by defining the relationships between diverse document types, allowing for more effective interchange of cross-corporate information. This information interchange is further enabled by generic tools that can operate on documents conforming to specific classes. This paper highlights the activities of a team tasked with implementing SGML architectures, and describes some of the technical challenges involved. In particular, we show how implementation of architectures relates to the broader area of object-oriented design. We also describe a generic transformation tool written to facilitate interchange."
This paper was delivered as part of the "Expert" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM were produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "In this paper we report the use of SGML for the documentation of highly structured engineering data in the telecommunication area. These structures are built by using a method, called Macro Modeling Technique. Macro Modeling Technique provides means for structuring the information about complex technical domains in a most unambiguous and nonredundant way. Models built by using Macro Modeling Technique are highly modular and can be refined and aggregated without overlap. The models also allow very precise access to engineering information because of their elaborated detailed structures.
"It was a challenge to use the SGML language to map structures of the Macro Models onto document structures and support certain operations on a model within a document. For this purpose we have defined an unambiguous mapping from our models to content-oriented DTDs. We have developed a systematic approach to construct specifically tailored DTDs by combining parts of various model-based DTDs.
"We have successfully applied this approach to the documentation for large systems in the telecommunication area, and we have implemented a prototype version of the required operations."
Note: The above presentation was part of the "SGML Case Studies" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "We use SGML to bridge the gap between early specification documents and analysis models in an object-oriented software development process. Introducing content-oriented DTDs for the specification documents it is possible to define a tool-based transformation from the specification documents to an initial object-oriented analysis model. Our methodology OOSDM (Object Oriented Structuring and Description Method) which structures specification documents in a canonical way and our tool OMC (Object Model Creator) which converts those specification papers to an initial object-oriented analysis model are a first step towards a seamless integration of the early specification and analysis phases into the software development and the documentation process."
"We have implemented the conversion tool OMC which converts an input document (OOSDM based SGML document) to an initial object model in the Case Tool Software through Pictures (StP) from Aonix. The user interface is integrated completely into the user interface of StP. OMC uses the OMCT+Booch notation which is supported by StP. OMC runs under UNIX (SUN OS and HP UNIX) and is implemented as an user customization to StP (using a StP specific programming and query language and C++). We have defined a default rule file tailored for the DTD of our specification documents which we use as the input files for OMC. Our customer has started to use OMC for pilot applications."
This paper was delivered as part of the "Expert" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CD-ROM were produced courtesy of Jouve Data Management (Jouve PubUser).
"Abstract: SGML (Standard Generalized Markup Language) is an ISO standard for document description (ISO 8879). The main idea in SGML is to specify document both by text and by the document's structure without reference to a particular processing system. There are very few systems of SGML that have friendly interfaces and are portable to many applications. In this paper, various approaches to implementing SGML are assessed and the transformation list for SGML application is introduced. This approach is not limited to specific application fields. It is suitable to any application domain and is friendly to users. Users can understand it without any training and can use it as easily as doing their routine work. It will accelerate the development of document interchange."
Abstract: The advantages of a standardized format for transcribing social interactions to computerized media are discussed. The chief advantage of this scheme is that transcripts can be easily exchanged among research groups and across text processing programs. An important element of a transcript is the set of conventions, called markup, that identify the metalinguistic features of texts. Conventions should employ symbols that (1) cannot be mistaken for ordinary text, (2) explicitly describe their linguistic function, and (3) obey a grammar. Social scientists who transcribe social interactions should participate in the development of a standardized scheme of descriptive markup for the encoding of machine-readable transcripts, based on the Standard Generalized Markup Language (SGML).
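The three markup criteria listed in this abstract can be illustrated with a small, hypothetical SGML-style transcript fragment (the element names here are invented for illustration and are not taken from the paper's proposed scheme):

```sgml
<!-- Hypothetical descriptive markup for a two-speaker exchange.
     The tags cannot be mistaken for ordinary text (criterion 1),
     name their linguistic function rather than an appearance
     (criterion 2), and nest according to a grammar that a DTD
     would declare (criterion 3). -->
<exchange>
  <utterance speaker="A">So you attended the <emph>second</emph> session?</utterance>
  <utterance speaker="B">Yes <pause dur="short"> well, part of it.</utterance>
</exchange>
```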
Discussion of the role played by SGML in managing multinational documentation for power plants that have long life-cycles (50-60 years).
Summary: "From the viewpoint of the medieval scholar TEI P3, as it currently stands, offers only limited facilities for the detailed encoding of manuscripts. This is particularly true with respect to the encoding of manuscript description or "metadata": that is, the detailed prose descriptions which appear in traditional manuscript catalogues and handlists. Only a few manuscript-specific elements are defined in the existing header (for example <hand> and <handshift>, which allow the recording of information on scribes and handwriting styles); consequently, the manuscript scholar is often forced to fudge the issue by using tags for unintended purposes, or to modify the DTD for specific applications (as has been done, for instance, by the Canterbury Tales Project). In January 1996, the Bodleian Library at Oxford began looking at the possibility of extending the TEI to incorporate more detailed metadata for manuscript cataloguing, as a part of a nationally-funded four year project to provide access to descriptions of previously uncatalogued western medieval manuscripts in its collection. In this paper we review the set of TEI extensions we have so far defined for this purpose, most of which which extend the TEI header, but which also include a new global attribute, and several new phrase level elements. Our intention is that the set of metadata elements defines should be rich enough both for those wishing to use the TEI as the basis for a conventional catalogue record, and for those intending to produce electronic editions of the manuscripts themselves. It should also be emphasized that both the scope and the detail of our scheme have been very much dictated by local needs within the Bodleian; other libraries with other habits or different kinds of material may well have different needs."
For more information, see the relevant web site for the Bodleian Library, University of Oxford: Information on the Bodleian Library's TEI extensions for Manuscript Cataloguing, and DTD extensions for WMSS. For greater detail, see: "The Cataloguing of Western Medieval Manuscripts in the Bodleian Library: a TEI approach with an appendix describing a TEI-conformant manuscript description," by Lou Burnard, Richard Gartner, and Peter Kidd (August 1997). [bibliography entry, with abstract], available in RTF format, or in TEI-Lite SGML source; [mirror copy, RTF version].
The extended abstract for the document is available online: http://www.stg.brown.edu/webs/tei10/tei10.papers/gartner.html; [local archive copy]. See the main database entry for additional information about the conference, or the Brown University web site.
"Abstract: The Kurdish language presents specific problems for the lexicographer. Being the language of a people without a state, it has attracted less interest than the other languages of the Middle East, and it is only recently that lexicographic studies and bilingual dictionaries of Kurdish have been published. It lacks standardization (division in dialects, and above all, use of three different writing systems). It also presents further problems in the field of computing (encoding). Paradoxically, this complicated situation means the computer may be especially useful for lexicographic work on the Kurdish language."
"This paper deals with Direji Kurdi, a project for the development of a lexicographic software environment specifically geared towards Kurdish. The different phases of the project, the reasons for the technical choices made by the author, its current state of development, the problems encountered and the planned extensions (linking with CD-ROM dictionaries, corpora, compliance with TEI-SGML) are introduced."
Abstract: "Electronic representations of data objects vary widely. The same information can typically be represented in numerous ways. These differences can be realized at various levels of abstraction.
Such variations can be the result of 1) multiple views of data; 2) differences between application data models; 3) differing proprietary formats of applications. This plethora of underlying representations of electronic data makes it extremely difficult for information to be transported between applications.
The most general response to the data translation problem is to provide a mechanism whereby, given the description of two data representations, code is automatically generated to translate between them.
In an effort to approach this ultimate goal, we propose a universal framework for data translation. This proposed framework captures certain generalities that are present in all translation tasks. In addition, we are developing an architecture that is modeled after this framework. The goal of the architecture is to validate the ideas proposed in the framework, showing that a large part of the data translation problem can be generalized." [Keywords: code generation, data translation, intermediate form, object oriented data model.]
Available online in Postscript format from OSU ([or mirror copy]).
Abstract: "Organization of information manipulated by applications vary widely in their representation. However, there does exist some commonality in the way these differences are realized. A better understanding of such commonality will allow for the development of tools that will simplify the task of reconciling those differences, thereby making it easier for applications to share information.
In an effort to achieve this understanding, we have proposed a Universal Framework for Data Transformation. This framework dissects the task of transforming data into three individual components. These three components are steps common to all transformation efforts. Furthermore, individually each of these three steps can be generalized to some degree." [Keywords: code generation, data translation, schema, data model, encoding]
Available online in Postscript format from OSU ([or mirror copy]).
This paper describes the ISO standard character sets currently in use, the use of SGML entity sets, and the TEI writing system declaration. It addresses 7-bit character sets, 8-bit character sets, the Universal Coded Character Set (UCS), SGML and TEI entity sets as relevant to the TEI writing system declaration, coded characters and glyphs, and current international standards work. [adapted from the Introduction]
Draft version of a TEI-related paper [June 24, 1994], to be published in revised format in CHUM. Author address: galiard@let.rug.nl. Available via FTP from ftp.let.rug.nl, or as a mirror copy on the SIL server. The Netherlands FTP site also stores a copy of the paper in TeX format.
Abstract: "An ongoing project at the University of Virginia Library is an effort to provide bibliographic control and access through use of the Text Encoding Initiative (TEI) header and MARC record to SGML-encoded electronic texts collected by the Library's Electronic Text Center. The Original Cataloging Unit creates both full MARC records and TEI-conformant headers for electronic texts. Gaynor discusses the development of the cataloging workflow and raises issues, both local and national, that confront libraries as they integrate electronic text cataloging into the traditional technical services operations."
"SGML, in combination with other developments, offers some additional solutions. SGML is application-independent, non- proprietary, and extremely flexible; as such, it offers a viable alternative and/or adjunct for the encoding of bibliographical information. As the projects mentioned above demonstrate, it is already possible to encode data in an SGML format. SGML and MARC are still separate formats that do not interact; MARC as it currently exists does not appear to be flexible enough to allow libraries to take full advantage of the ever-developing information retrieval technology, especially the World Wide Web." [extracted]
The document is available online: http://www.lib.virginia.edu/speccol/scdc/articles/alcts_brief.html; [mirror copy]. Related article: "From Catalog to Gateway, Briefings from the CFCC," in ALCTS Newsletter vol. 7, no. 2, 1996
Abstract: "During development of our first-generation online documentation conversion and delivery system, we addressed most of the obvious problems and requirements we foresaw. After the system was in place, we discovered other less obvious areas for improvement. We implemented the changes in a second-generation system and are planning additional changes in a third-generation system. This paper addresses the plans and realities of each of these systems."
Note: The above presentation was part of the "SGML User" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "This paper discusses, from a corporate point of view, the maintenance of an internally developed SGML environment, continual training for the maintenance team, and training and support of users. These are the practical aspects of an SGML system implementation that are quite often overlooked when making implementation decisions. The author recognizes that external SGML consultants normally provide ongoing maintenance and support of systems they design, as well as ensuring that users of the system are properly trained. However, as a user of, and trainer for, an internally developed SGML environment, the author observes that maintenance, support, and training are items that often occur as afterthoughts because of sudden need, rather than checklist items planned for from the beginning as strategic parts of any SGML system implementation."
This paper was delivered as part of the "Business Management" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CD-ROM were produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "Lately it seems that everyone is talking about HTML. Some of you SGML `96 attendees may believe that this hot topic has nothing to do with SGML. Some of you may believe it has everything to do with SGML. And the rest of you may not be sure whether it's relevant or not.
Whether you consider HTML as a critical element of your information delivery strategy or not, you are probably reacting to .... thinking about ... being asked to put your SGML content on an intranet. This brings up many challenges: how to track revisions, how to manage relationships and links between objects, how to reuse information effectively and efficiently, and how to retain your investment without transforming to HTML.
Getting the most out of your SGML source means exploiting your investment by using that source as the same source for your intranet delivery needs. There's a big payoff in combining HTML, SGML, document component management and internet technologies to achieve a diversity of document products, increase quality of customer service, and ensure accuracy and timeliness. Imagine automatically assembling pieces of information which exactly matches a customer's need, and delivering the most up-to-date information in the form and format requested. Achieving this is possible today.
To help you achieve this 'jackpot' of capabilities, this presentation will:
- describe the need and business case for intranets
- identify a roadmap for exploiting SGML
- list key capabilities of such a system
- identify key technologies that should be integrated
This presentation, aimed at a managerial audience, will examine the aspects, value and impact of several real-world intranet applications. It will describe the relevant technologies and offer guidance on enabling your current technology investment to drive this new type of information delivery. It will also discuss critical features and functions of such a system. You will leave this presentation with a deep understanding of how to build a complete information delivery strategy."
Note: The above presentation was part of the "SGML Business Management" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
"This paper focuses on the use of the Standard Generalized Markup Language (SGML) as a tool that goes beyond a neutral markup language. It investigates the possibility of using specific SGML features and conventions to help suppliers to manage and manipulate their technical data as a text database. The Standard Generalized Markup Language, SGML, is a meta-language that can be tailored to an application and that is valid regardless of the processing methods used. The language can be used to describe a text stream at varying levels of complexity, based on the needs of the application. Although SGML can be used in the financial, legal, and office publishing fields, its flexibility makes it of particular value to technical publishing." (Based upon a paper presented at MarkUp '86, Luxembourg.)
The author provides a detailed report on the first European XML Conference, "XML Ready for Prime Time?", held in London, April 22, 1997. It was sponsored by Technology Appraisals Limited, and moderated by Tim Bray. Summaries are provided for each of the presentations, including (in Gennusa's presentation) an overview of commercial and non-commercial software products which support XML.
See the main conference entry for other information, including the list of presenters, and a report by Martin Bryan.
The article describes an SGML conformance testing program proposed by NIST (US National Institute for Standards and Technology). A number of matters, including governance and appropriateness to the needs of SGML vendors, raised questions in the minds of critics about the advisability of the testing program. The article includes a letter from Yuri Rubinsky and from the International SGML Users' Group to NIST asking for amendments to the program; see also the reference to Mary Laplante's letter to NIST.
Summary: "SGML is rapidly emerging as the required format for information across many major industries. All contractors and subcontractors to the US Department of Defense and all contractors and subcontractors to the major airline and automobile manufacturers must deliver documentation electronically in SGML. Over the years, these companies have evolved elaborate procedures and conventions for the delivery and procurement of documentation on paper. Gennusa describes the new systems, procedures, contractual instruments, and other conventions that the delivery and procurement of information in electronic form calls for." [publisher's pre-publication description; title uncertain]
"Abstract: Over the past ten years, SGML has achieved great popularity. But not without some bad publicity. Comments heard in the past include: "We're glad we did it, but it did take us longer than we thought"; "We had to change our DTD late in the project and that cost us quite a bit"; "The product didn't work the way we had thought it would, so we had to redesign part of the system"; and so on. SGML is a relatively new tool, so it is still highly vulnerable to being implemented in ways that confound rather than assist the user. Because it is so vulnerable, it is important to continuously monitor project activities against business requirements and keep focused on the project's ultimate objectives. The use of SGML can yield great rewards. As business people, we must decide how much we are willing to pay for those rewards."
The document is available online in HTML format: "Using SGML: the Pain/Gain Ratio" [mirror copy, December 1995]. For further details on the 1995 Conference and BeLux, see the contact information for SGML BeLux.
Update on the liaison relationship of the ISUG (International SGML Users' Group) to ISO SC18, whereby ISUG member feedback is channeled into the ISO process, and assistance is provided to SC18 National Bodies seeking to locate SGML experts to participate in their national standards committees. The current ISUG Users' Group "has over 600 members, more than 500 of whom also belong to an independent chapter. Countries represented in the membership include: Australia, Belgium, Canada, China (including Hong Kong), Denmark, Ireland, Finland, France, Germany, India, Israel, Italy, Japan, Luxembourg, Mauritius, Norway, Portugal, Scotland, Singapore, Spain, Sweden, Switzerland, The Netherlands, the United Kingdom, and the United States of America. Membership across the independent Chapters is estimated at approximately 1200." [Extracted; see the complete text of the article in the ISUG Newsletter.]
The presentation represents an extensive summary of SGML events for the year 1994, given as a "World Report and Year in Review." Covers Standards Activities, Major Public Initiatives, User Group Activity, Corporate and Government Initiatives, News and Forthcoming Publications, Industry News, Events.
Abstract: "Literate Programming is a documentation method that attempts to maintain consistency among the various design and program documents of a software system. Unfortunately the majority of the literate programming tools do not have appropriate user interfaces and require the users to learn complicated and cryptic tagging languages. SGML is a metalanguage used to specify markup or tagging languages that can be used to encode the structure of documents. Since SGML is an ISO standard and is being widely used by both industry and government, the number of applications that use SGML as the underlying document representation language is growing rapidly. This paper describes how a markup language defined using SGML can be used as the basic method for structuring literate programming documents and can be made independent of the programming language. Furthermore, with SGML and tools to browse and edit SGML documents, literate programs can benefit from WYSIWYG editing and hypertext capabilities, and can even include pictures and other graphics. In addition, syntax-directed editors that support SGML can hide the markup tags and thus, remove the need to learn a markup language. Text databases that use SGML can also be used to store literate programs. As a result, literate programs can be browsed and queried using complex search expressions, a capability beyond most text editors. For example, the searches can involve combinations of structural and textual information. Because SGML is a popular and emerging standard, we can expect to have more powerful tools to manipulate many different forms of design and program documentation."
"This paper describes the issues involved in the development of a literate programming environment that uses SGML as the storage model."
Available in Postscript format: ftp://csg.uwaterloo.ca/pub/cascon94/moralesgerman/litprog.ps; [mirror copy].
"Abstract: Standards, if widely accepted, encourage the development of tools and techniques to process objects conforming to that standard. The paper describes a number of experiments using available tools to process text containing Z specifications adhering to the existing Z Interchange Format. The experiments resulted in tools that could be used in specific programming environments where Z was used to describe software systems."
Available on the Internet in Postscript format; [local mirror copy].
Abstract (from the proceedings volume): "Literate Programming is a documentation method that attempts to maintain consistency among the various design and program documents of a software system. Unfortunately the majority of the literate programming tools do not have appropriate user interfaces and require the users to learn complicated and cryptic tagging languages. SGML is a metalanguage used to specify markup or tagging languages that can be used to encode the structure of documents. Since SGML is an ISO standard and is being widely used by both industry and government, the number of applications that use SGML as the underlying document representation language is growing rapidly. This paper describes how a markup language defined using SGML can be used as the basic method for structuring literate programming documents and can be made independent of the programming language. Furthermore, with SGML and tools to browse and edit SGML documents, literate programs can benefit from WYSIWYG editing and hypertext capabilities and can even include pictures and other graphics. In addition, syntax-directed editors that support SGML can hide the markup tags and thus remove the need to learn a markup language. Text databases that use SGML can also be used to store literate programs. As a result, literate programs can be browsed and queried using complex search expressions, a capability beyond most text editors. For example, the searches can involve combinations of structural and textual information. Because SGML is a popular and emerging standard, we can expect to have more powerful tools to manipulate many different forms of design and program documentation."
One version is available online in Postscript format. This is the related version (with A. Ryman as a joint author) apparently published in Proceedings of the International Symposium on Applied Corporate Computing, October 1996; [local archive copy]. See also the bibliographic entry for "An SGML Based Programming Environment for Literate Programming," above.
Abstract: "This case study describes the issues involved in managing a large body of SGML-based aircraft maintenance documentation. Principal topics covered are managing appropriate granularity; sharing document components; managing multiple configurations; managing revision cycles; and impact on publications. The presentation explains the reasons motivating the technical decisions, describes the tools used to manage multiple configurations and versions, and evaluates the resulting system."
"Sogitec has developed Industrial Documentary Systems (IDS) independent of the Document Type Definitions (DTD) handled and which are based on the concept of Data Modules. These systems benefit from our dual qualification as a supplier of aircraft documentation and computer system designer. For many years, Sogitec has been handling technical documentation for all Dassault aircraft, including the Mirage and Rafale fighters and the Falcon series of business aircraft. The systems can be adapted to different user profiles.
"The documentary database is broken down into a multitude of units of documentation (UD). The final documents are compiled by concatenating individual units of documentation. This strategy has been dictated by the often overwhelming size of documents combined with the need for fine grained control of the system. Each unit of documentation has contents, an identifier and an identity card. The contents of a unit of documentation is an SGML-format text based on a DTD. Almost any DTD is possible, but they sometimes use mark-up which needs to be displayed to identify links, effectivity or any other information useful for database management.
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
"Abstract: The new Standard Generalized Markup Language (SGML) will open electronic distribution of documents on a wide scale if it becomes popular. Many vendors are planning to add SGML support to their desktop-publishing and word-processing applications. A growing number of viewing formats, many of which include SGML support, are vying for market share. Standardized document formatting will allow more re-use of information and reduce the amount of paper in offices by letting users view documents on-screen instead of always printing them. It will also provide enhanced navigation; SGML describes structural information in terms of how a document is organized, facilitating the automatic creation of hyperlinks, tables of contents and indexes. SGML is a true standard and helps ensure that information is always accessible."
Issue 4/2 of The Gilbane Report is a special issue dedicated to SGML, celebrating the 10-year anniversary of the ISO 8879:1986 SGML standard. The feature article is a record of extensive interviews with several SGML and publishing authorities on the topic of SGML's current and future role in document management and publishing. Those interviewed included: George Alexander, Larry Bohn, Mark Buckley, Charles Goldfarb, Mark Klamerus, Murray Maloney, Howard Shao, and Robin Tomlin.
Standard Generalized Markup Language (SGML) promises to meet the current demands of publishing technology - to exchange, reuse, and reformat information without constraint. SGML is an extension of earlier markup systems, but also represents a departure from earlier systems by separating format from content and making document structure explicit. This article illustrates these concepts through a simple example. SGML-enforced document structure means that writers have fewer formatting concerns and can focus on writing content. SGML systems require several components, including an input system, a parser, a document type definition (DTD), and an output system. As interest in using SGML increases, so also does development of complementary standards and improved tools for creating and processing SGML source files. The author predicts that SGML will become thoroughly integrated into our publishing systems.
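As a minimal sketch of the separation of structure from format that the article describes, a toy DTD and a conforming instance might look like this (tag names are invented for the example; formatting decisions would live entirely in the output system):

```sgml
<!-- A toy DTD declares structure only; the parser validates the
     instance against it, and an output system applies formatting. -->
<!DOCTYPE memo [
<!ELEMENT memo - - (to, from, body)>
<!ELEMENT to   - - (#PCDATA)>
<!ELEMENT from - - (#PCDATA)>
<!ELEMENT body - - (#PCDATA)>
]>
<memo>
  <to>Production staff</to>
  <from>Documentation group</from>
  <body>Structure is explicit; presentation is decided elsewhere.</body>
</memo>
```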
"Abstract: This article gives a general introduction to the form and function of the TEI header, points out some of the reasoning of the Text Documentation Committee that went into its design, and discusses some of its limitations. The TEI header's major strength is that it gives encoders the ability to document the electronic text itself, its source, its encoding principles, revisions, and characteristics of the text in an interchange format. Its bibliographical descriptions can be loaded into standard remote bibliographic databases, which should make electronic texts as easy to find for researchers as texts in other media, including print. Its major weakness is that the default header does not yet provide the ability for retrieval across texts in a networked research environment, which users may want now or in the future."
See also: "MARC (MAchine Readable Cataloging) and SGML/XML."
"Abstract: A general introduction to the form and function of the Text Encoding Initiative (TEI) header is given and its relationship to the MARC record is explained. The TEI header's major strength is that it documents electronic text in a standard interchange format that should be understandable to both librarian catalogers and text encoders outside of librarianship. It gives encoders the ability to document the electronic text itself, its source, its encoding principles, and revisions, as well as nonbibliographic characteristics of the text that can support both scholarly analysis and retrieval. Its bibliographical descriptions can be loaded into standard remote bibliographic databases, which should make electronic texts as easy to find for researchers as texts in other media, including print. Finally, it represents a long collaboration between librarians and members from a range of academic disciplines outside of librarianship, and may thus be a model of such collaboration. The header's major weakness is that the default header does not provide the ability for fine-grained retrieval within or across texts that users might want now or in the future as networked research environments improve.
A draft version of the document is available here (courtesy of the author).
The authors discuss problems of semantics in text processing where a large heterogeneous database is involved. In such cases, generic encoding of individual texts (e.g., in TEI/SGML) is not of itself an adequate solution to the problems of semantics, and particularly, with respect to equivalency (different possible encodings using varying markup strategies).
See the author's summary in a brief SGML bibliography document.
Supplies a classification scheme for seven key publications on SGML, and provides abstracts for each of the publications.
See the document on the DOE/OSTI WWW server or in mirror copy [June 02, 1995] here.
Abstract: "This paper tells how Lawrence Livermore National Laboratory enriched CRI's online documentation set by publishing local manuals using the same SGML DTD used by CRI and delivered using (a more sophisticated version of) the same World Wide Web server (DynaWeb 3.0). This approach supports flexible local content and styles, yet integrates local and CRI manuals through one access mechanism and user interface. We explain the basic strategy involved, compare the benefits of this approach with three alternatives, and discuss the problems to which it gives rise."
Available online: http://www.intrepid.net/~fpes/cugs97_proceedings/author.folders/Shuler.3C/index.htm; [archive copy].
The article is a review of Eric van Herwijnen, Practical SGML, 2nd edition.
"SGML enables an attractive and sustainable vision of information management, but achieving it takes effort and a long-term perspective. It can be hard to resist the seduction of HTML and the World Wide Web, especially when SGML and HTML share the same syntax and the latter seems so much easier to use than the former. Thus it is essential that someone who wants to learn about SGML does so from a book that isn't so focused on syntax as is Practical SGML. A book that fits this requirement is ABCD...SGML by Liora Alschuler (1995), which is an excellent first book for managers and writers because it deals with issues and case studies. Practical SGML makes a good second book on SGML from which to learn syntax after the idea of SGML is firmly understood. The ideal book would explain the ideas of SGML and its syntax at the same time, but this book does not yet exist." [from the Conclusion]
The full text of the article is available online; [mirror copy].
Summary: "The key question, then, is not whether you should adopt SGML for such information, but whether you can successfully adopt it given your current capabilities, methods and technology. My experiences with many companies who have adopted SGML suggests that a small number of factors predicts the success or failure of an SGML project. In this essay I will help you diagnose your organization so that you can either proceed with confidence in adopting SGML or be able to identify the problems you need to fix to increase your chances of a successful migration."
Available online in HTML format: "Successful Migration to SGML"; [mirror copy].
This article is a case history of the development of the Silicon Graphics IRIS InSight (tm) system, the first system for viewing online documentation from a computer vendor that uses SGML, the Standard Generalized Markup Language. We describe the SGML publishing process from the perspectives of authors, production staff, and management. We review the key decisions and turning points in four phases of the project: (1) Project initiation and requirements; (2) Design and development; (3) Process characterization and institutionalization; (4) Deployment and enhancement.
The article is available online in HTML format: http://www.sgi.com/Technology/tech_InSight.html. Or: http://www.passage.com/pubs/white/insight.htm. [Mirror copy]
"Abstract: The Pinnacles Group, a collective organization made up of semiconductor manufacturers Intel Corp, Philips, National Semiconductor Corp and Texas Instruments, will release in Feb 1994 the first recommendations for a common format for data books for integrated circuits. Data-books include information such as a part's title, number, release date, revision data and drawings. The information could eventually be transmitted to buyers electronically rather than on paper. The data book standard will be based on the International Standards Organization's (ISO) Standard Generalized Markup Language (SGML). It will include a 'tag library' of information terms and Document Type Definitions which will limit the information available in various types of documents. A series of information analysis sessions will be hosted by Pinnacle Group's four members, who have already contributed a total of $200,000 to the project and hired an SGML consultant."
This manual is organized as follows: the remainder of this section gives an overview of DREAM and its relationship to SGML, and provides an informal introduction to the development of document structure descriptions. Section 2 describes the architecture and global commands provided by DREAM. Section 3 introduces the language for describing the structure of documents. Section 4 develops an example structure description. Finally, Section 5 discusses some workarounds for problems typically encountered when working with DREAM.
DREAM is a parser that uses document structure descriptions to mark up documents whose structure is implicit or inconsistently expressed. It is designed as an extension of SGML (Standard Generalized Markup Language). The result of the markup process is a document conforming to an SGML DTD, which can then be further processed with any SGML-based tool.
Some knowledge of SGML is necessary to work successfully with DREAM. Martin Bryan's book, SGML: An Author's Guide to the Standard Generalized Markup Language (BRYAN), is recommended as a good introduction. SGML is an international standard that allows a system-independent, portable representation of documents. The structure of SGML documents is described in a DTD (Document Type Definition). SGML parsers are used to test the conformance of documents to a DTD. Editing SGML documents is supported by special editors (e.g., Author/Editor by SoftQuad). The main advantage of SGML-tagged documents is the strict separation of the logical structure of the document from its physical appearance (for example, as a print product). This separation improves the reusability of documents, e.g., by other text processors, hypertext systems, and for building document databases. Retrieval and browsing based on the logical document structure can provide better access to large sets of documents.
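The DTD/instance relationship described above can be sketched with a minimal, hypothetical example (the `memo` document type and its element names are invented for illustration, not taken from any cited DTD):

```sgml
<!DOCTYPE memo [
<!-- The DTD declares the logical structure only: a memo is a
     "to", then a "from", then a "body".  The "- O" minimization
     says the end-tags of these subelements may be omitted. -->
<!ELEMENT memo      - - (to, from, body)>
<!ELEMENT (to|from) - O (#PCDATA)>
<!ELEMENT body      - O (#PCDATA)>
]>
<memo>
<to>All staff</to>
<from>Documentation group</from>
<body>The tags record logical structure; a separate style
specification governs the printed or on-screen appearance.</body>
</memo>
```

An SGML parser can validate the instance against the DTD, while formatters, hypertext systems, or database loaders each apply their own processing to the same tagged source.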
One of the most severe 'SGML startup problems' is the huge number of untagged documents, since processing SGML documents always requires the structure of a document to be explicitly represented by tags. Most text-processing software today does not support logical tagging that can easily be converted into SGML. In addition, authors may find it burdensome to structure their documents to the depth required by some other application, and marking up the logical document structure manually can be highly repetitive and cumbersome.
DREAM uses rules to mark up a document. These rules are called the Document Structure Description (DSD). The DSD syntax is closely related to the syntax of Document Type Definitions (DTDs). A DSD describes the layout structure of a document rather than its logical structure. However, for a large class of documents, such as documents downloaded from public databases or electronic press agencies, the logical structure and the layout structure closely resemble each other. The main problem with these documents is that their structure is only implicit and partially inconsistent. A DSD provides a declarative way to make the structure of such documents explicit.
For other classes of documents tools for restructuring SGML documents can be used to further transform the layout structure into a desired logical structure.
The document is available online from the GMD-IPSI FTP server: ftp://ftp.darmstadt.gmd.de/pub/dimsys/reports/P-92-08.ps.Z
"This document contains an unofficial compilation of the formal SGML specifications from the HyTime Standard (ISO/IEC 10744:1992), cross-referenced to the text that describes them, and with known errors corrected. The current version expands on the original by including the formal specifications of useful element types, notations, and instances from Annex A. of the standard. The material in this document is excerpted from the pre-publication review edition of The HyTime Handbook, which I hope will be available for limited distribution in the fall of 1993."
See the online version of the Catalog (HTMLized by Eric Freese, January 1995) at http://www.sgmlopen.org/sgml/docs/archform.htm. Alternately, the text version is available from the SGML Repository, or from the SIL WWW server, or elsewhere.
"Abstract: The processes involved in text processing applications are analyzed without regard to artificial distinctions between "publishing" and "information retrieval" activities. Functions performed by a system are distinguished from the means of invoking them, so that areas of commonality between processes can clearly be identified. An integrated system should have: (1) a set of operators for all generally required functions, (2) a language and environment which allow combination and extension of operators to perform applications, and (3) an application-independent representation of text on which the operators can operate in any sequence."
The document was prepared (apparently) for use by ANSI Technical Committee X3J8, Computer Languages for the Processing of Text. It constitutes one of the early IBM design papers in which notions of generalized markup were elaborated.
This publication represents one of the clearest and most accessible presentations of IBM's GML (Generalized Markup Language). Many of the specific markup constructs and notation conventions which were implemented in SGML were derived from analogues in IBM's GML.
"NOTE: This paper summarizes the key aspects of entity management in SGML. It is not a tutorial on the SGML entity structure. Knowledge of entity constructs (entity declarations, ENTITY attributes, entity references, data attributes, external identifiers, etc.) is assumed. The author gratefully acknowledges the many helpful contributions of Erik Naggum, Eliot Kimber, and Wayne Wohler."
The document is available here online (or from SGML Open).
The article describes IBM's Generalized Markup Language (GML) and general principles of generalized markup.
Conference proceedings containing this paper also available as
Abstract: "Forty Years of Generalized Markup. 1997 marks the start of the second decade of ISO 8879, the International Standard that defines SGML. It also marks the start of the fourth decade of the generic coding and Generalized Markup concepts on which SGML is based. In his traditional SGML Europe keynote, SGML's inventor fearlessly predicts where we will be ten years from now, with perspective and direction gleaned from where we have come during the last thirty."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
[Summary:] "At the [New York SGML] Forum meeting on March 13th at the New York Public Library, Joe Davidson presented a proposal for a Technical Advisory Group within the SGML Forum and called for participants. Approximately fifteen (15) companies signed up on the spot. . . The Board of Directors of the Forum has endorsed the establishment of a Technical Advisory Group (provisionally dubbed "NYTAG"). The group's mandate will be to act as a conduit on behalf of the membership: gathering ideas, concerns and recommendations, and relaying them to standards committees and vendors in the form of position papers, press releases, member surveys, etc." [Extracted; see the complete text of the article in the ISUG Newsletter.]
See also the announcement for the second meeting of the New York (area) Technical Advisory Group ("NYTAG"), from Joe Davidson (Microstar).
This tribute is printed in a special issue of <TAG> dedicated to the memory of Yuri Rubinsky. See another version of the tribute by Charles Goldfarb in a CTS posting, as well as the main eulogy collection.
This presentation was the text of a keynote address at SGML '96, and is printed in the Introductory section of the proceedings volume.
From the Conclusion: "I like to think of the history of SGML as - what else - a tree structure. One root - from Rice to GML to my basic SGML invention - joined at the base of the trunk by the other - Tunnicliffe to Scharpf and GenCode. The trunk, of course, is the extraordinary 8-year effort to develop ISO 8879, involving hundreds of people from all over the world. The products and tools that came after are the branches, the many applications the leaves, and they are all still growing.
And in all these 30 years, while the technologies of both computers and publishing have undergone overwhelming and unpredictable changes, the tree continues to bear the fruit that I described in 1971:
The principle of separating document description from application function makes it possible to describe the attributes common to all documents of the same type. . . [The] availability of such 'type descriptions' could add new function to the text processing system. Programs could supply markup for an incomplete document, or interactively prompt a user in the entry of a document by displaying the markup. A generalized markup language, then, would permit full information about a document to be preserved, regardless of the way the document is used or represented."
Note: The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "The SGML Extended Facilities, part of ISO/IEC 10744:1997 (HyTime Second Edition) add significant functionality to SGML." Thus "HyTime Two" represents two standards in one document: 1) SGML Extended Facilities and 2) Enhanced HyTime architecture, where "SGML Extended Facilities will become part of ISO 8879: SGML." The presentation by Goldfarb covers: SGML Extended Facilities Overview, Architectural Form Definition Requirements (AFDR), Key AFDR Concepts, Architecture Benefits, Property Set Definition Requirements (PSDR), Graph Representation Of property ValuEs (GROVE), General Architecture, Formal System Identifier Definition Requirements (FSIDR), Lexical Type Definition Requirements (LTDR), HyTime Architecture Enhancements, Foundation for Future of SGML and XML, For Further Information ... (see the HyTime User's Group Web Site).
See also the ISO 8879 Review Index Page maintained by Charles Goldfarb, where a summary of HyTime Two and SGML Extended Facilities is available online; in outline format it substantially represents the content of the published paper; [local archive copy]. In this connection, see also the online paper "What You Need to Know About the New HyTime," by Steven R. Newcomb of TechnoTeacher, Inc.
This paper was delivered as part of the "Expert" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).
"Abstract: Some of the key concepts and features of HyTime, which is being developed as an American and an international standard (ISO/IEC 10744) for structured representation of hypermedia information, are introduced. HyTime is an application of ISO 8879 and is interchanged using ASN.1 (ISO 8824) for OSI compatibility. HyTime complements and enhances the utility of standards for individual multimedia objects, such as motion video and still pictures. HyTime is not a complete hyperdocument architecture. Its functions will be incorporated into architectures and applications designed by standards committees, industry groups, and others."
Discusses the production of a textbook by Cliff Jones (
This book, from three well-known SGML experts, represents a valuable and monumental resource for SGML and XML users. The volume subtitle "A Unique Guide to Determining Your Requirements and Choosing the Right SGML and XML Products and Services" accurately describes the principal focus and utility of the book. Its content is divided into 39 chapters in 5 parts: Part 1: Determining Your Requirements; Part 2: HARP Analysis in Depth; Part 3: SGML and XML Tools and Services; Part 4: The SGML Community; Part 5: The SGML and XML Directory. The book substantially incorporates the content of Steve Pepper's "Whirlwind Guide to SGML Tools and Vendors." A sixth section contains the "Sponsor Showcase" with informative 'white paper' advertising from 30 organizations that helped fund the publication. A unique feature of the book is a presentation and elaboration of the "HARP" technique of analysis, which helps users understand what happens to information as it passes through publishing systems from creation to final delivery; this analysis tool allows users to match candidate resources against their specific requirements. The name "HARP" (tm) signifies: Human Thought, Computer Abstraction, Computer Rendition, Physical Presentation. HARP analysis helps users to 1) assess their publishing requirements in a visual manner, 2) evaluate publishing systems based upon their methods of storing and representing information in the computer, 3) discover new ways to utilize current publishing systems better, 4) discover how workflow analysis and reengineering can yield great payoffs, and 5) determine more precisely what SGML and XML tools and services are applicable to the enterprise problem domain.
The CDROM supplement provides authoritative descriptions for more than 150 SGML and XML tools and services, categorized according to the function they fulfill, per the taxonomy used in Steve Pepper's Whirlwind Guide to SGML Tools. It features: 1) a showcase for leading SGML and XML software and service providers, featuring in-depth product and service information, white papers, SGML samples, live demos, and trialware; 2) a hand-picked collection of 45 genuine, productive, no-time-limit SGML and XML free software tools; 3) some "trialware" software resources supporting "Fun with HARP analysis."
See also the "Prentice-Hall SGML Series" web page. The book was reviewed in XML Files: The XML Magazine.
[Promotional]: "The SGML Buyer's Guide presents the most comprehensive, detailed, and up-to-date descriptions of today's SGML and XML resources available -- and a unique methodology for picking the right ones for your needs. Be among the first to discover the new HARP(tm) Analysis methodology for assessing your publishing operations in an intuitive, visual way. Includes a CD-ROM that contains extensive SGML and XML materials, live demos, trialware, and 45 great genuinely free software products." [blurb from the SGML/XML '97 promo]
"The SGML Buyer's Guide helps experts and beginners to analyze the publishing process and to evaluate and choose the best tools and services for their needs. It also presents a new methodology, developed by the authors, that simplifies and optimizes publishing systems. The CD-ROM contains a professionally chosen selection of SGML and XML freeware, a graphics package, and demos of many commercial SGML software packages." [from the Amazon.com synopsis]
The authors report on the results of the May 5 - 9 meetings of ISO SC18/WG8 in Barcelona. Highlights of the meeting related to the "WebSGML Adaptations" TC, aligned with and to be incorporated into the HyTime ("Second Edition") TC.
Some of the items addressed in the TC are: (1) duplicate name token attribute values are now permitted in the same set of attribute definitions; (2) multiple attribute definition list declarations for a single element type; (3) declaration of global attributes; (4) impliable document type definitions; (5) impliable element, attribute and entity declarations; (6) parsing without DTDs; (7) referencing of SGML declarations as external entities; (8) optional removal of capacity and quantity constraints; (9) unbundling of short tag options to allow control of individual options; (10) predefined (character set independent) entities for characters used as delimiter strings; (11) simplified white space handling; (12) optional end-tags for empty elements; (13) hexadecimal numeric character references; (14) use of Internet domain names in formal public identifiers; (15) a new conformance class: "tag-valid"; (16) new entity structure constraints, such as "integrally stored" and "reference-free". [Extracted; see the complete text of the article in the ISUG Newsletter.]
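Two of the items above can be illustrated with a small hypothetical fragment (the `para` element name is invented for illustration): item (13) allows hexadecimal numeric character references alongside the traditional decimal form, and item (10) supplies character-set-independent predefined entities for characters that would otherwise be read as markup delimiters:

```xml
<!-- (13) hexadecimal and decimal character references for the same character -->
<para>caf&#xE9; and caf&#233; denote identical text</para>

<!-- (10) predefined entities for delimiter characters such as '<' and '&' -->
<para>write &lt; and &amp; rather than the literal delimiter characters</para>
```

Both features align full SGML with the behavior XML processors already expect.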
This volume contains the full annotated text of ISO 8879 (with amendments), authored by IBM Senior Systems Analyst and acknowledged "father of SGML," Charles Goldfarb. The book was itself produced from SGML input using a DTD which is a variation of the "ISO.general" sample DTD included in the annexes to ISO 8879. The SGML Handbook includes: (1) the up-to-date amended full text of ISO 8879, extensively annotated, cross-referenced, and indexed; (2) a detailed structured overview of SGML, covering every concept; (3) additional tutorial and reference material; and (4) a unique "push-button access system" that provides paper hypertext links between the standard, annotations, overview, and tutorials. See a detailed Table of Contents listing for further description.
Abstract: "This article is a commentary -- over a quarter-century after the fact -- on the first published paper to state the need for (and hint at the existence of) what is now the Standard Generalized Markup Language. It was presented at the 33rd Annual Meeting of the American Society for Information Science in Philadelphia, October 15, 1970, and published in Volume 7 of the ASIS Proceedings. The editors of this Special Issue of JASIS felt that that meeting was worth remembering here because of its hitherto unpublicized connection with the origin of SGML. In addition, it is also worth remembering because of its closing banquet, which featured an erudite and witty speech by a professor with two doctorates, a piece balalaika orchestra, the entire Philadelphia Mummers band (replete with banjos, saxophones, and feathered headdresses), and a middle-eastern belly dancer who worked on the table tops! I've spoken at some hundred conferences since then and none of them has even come close."
See the bibliographic entry for the original publication: Goldfarb, Charles F.; Mosher, E. J.; Peterson, T. I. "An Online System for Integrated Text Processing." Proceedings of the American Society for Information Science Volume 7 (1970) 147-150.
See the main document entry for the complete list of articles and contributors, as well as other bibliographic information.
The article overviews the origins of XML and explains why XML is a good thing for the longevity of SGML. One early criticism of XML ("DTD-less documents") is interesting, since (the author says), "DTD-less processing was actually an original objective of SGML, which is why ISO 8879 defines both validating and non-validating parsers: the former were to be used when creating documents and the latter when formatting them."
"Although XML and HTML are both derivatives of full SGML, they are very different. HTML is a complete SGML application; that is, there is a DTD (several versions of it, actually) and prescribed processing for the elements, implemented by Web browsers. An HTML user is presented with a fixed "vocabulary" of element types, each of which will be rendered in a predictable way. . . . XML is different. It is an "application profile" -- a set of rules for constructing SGML applications. As a profile, it is more concerned with the syntax of SGML than the vocabulary. Users can define their own element types, DTDs, and the style sheets that govern their rendition. In other words, there can be an unlimited number of XML applications." [Extracted; see the complete text of the article in the ISUG Newsletter.]
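The contrast drawn in the extract can be sketched with a pair of hypothetical fragments (the `recipe` vocabulary is invented for illustration):

```xml
<!-- HTML: a fixed vocabulary with prescribed, predictable rendering -->
<p>A paragraph element, rendered the same way by every browser.</p>

<!-- An XML application: a user-defined vocabulary; the instance is
     well-formed, and a non-validating parser needs no DTD at all -->
<recipe id="toast">
  <title>Toast</title>
  <step>Slice the bread.</step>
  <step>Apply heat.</step>
</recipe>
```

The HTML fragment is meaningful only because browsers hard-wire the `p` element's processing; the XML fragment is meaningful because its author defined the vocabulary, and may additionally supply a DTD and style sheet to govern validation and rendition.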
Abstract/annotations pending. See provisionally the announcement from Charles Goldfarb (July 28, 1998).
Abstract: "In creating complex interactive documents, some technical communicators use software products that emphasize format and style in displaying pages. This approach limits the communicator's ability to repackage the information presented in electronic versions and increase its interactive use, which is a key benefit of the structure-based approach offered by using Standard Generalized Markup Language (SGML). In a number of projects that render mathematical, scientific, and engineering texts electronically, using SGML allows the technical communicator to make equations interactive and to automate links to references. The author sketches out problems associated with page description approaches to displaying electronic pages and discusses the comparative benefits of SGML." [Manuscript received November 1996; revised February 1997.]
Quoteworthy: "Simply put, SGML is the 'acid-free paper' of the electronic world. Despite its clear benefits, the limited acceptance of SGML in electronic publications is no mystery. The high cost of post-compositional translation of text into SGML and the resistance of typesetters and printers to retool their considerable infrastructure are real disincentives to change. Adobe Acrobat has provided an alternative that is cheap and 'good enough' for the moment, but the fundamental problems of being based on a page descriptive language prevent it from becoming a comprehensive long-term electronic publishing solution."
Available online: "Using SGML to Create Complex Interactive Documents for Electronic Publishing." By Peter Goldie. This article is part of a special issue of IEEE Transactions on Professional Communication (with an introduction by Jonathan Price): "Structuring Complex Information for Electronic Publication." See also the main entry for the IEEE Computer Society and its use of SGML.
Bob Goldstein suggests that using SGML rather than multiple versions of HTML for the WWW would make sense.
For further information on the use of SGML by the National Library of Medicine for database publishing, see the main entry for the NLM.
The paper discusses the use of SGML to structure texts in a medical database for full text retrieval. The paper has two themes: (1) an overview of the concept of an online reference work (ORW) as defined by the Online Reference Works in Medicine program, a collaboration between the National Library of Medicine (NLM) and the William H. Welch Medical Library, Johns Hopkins Medical Institutions (JHMI), and (2) an overview of the evolving concepts of full text and object-oriented retrieval from structured texts. The article addresses the concepts and general observations on the implementation of an ORW, the efforts to extend the present model, and a brief description of the object-oriented full text retrieval methodology to be applied.
For further information on the use of SGML by the National Library of Medicine for database publishing, see the main entry for the NLM.
"A program entitled Online Reference Works in Medicine (ORW) is being pursued at the Information Technology Branch (ITB) of the Lister Hill National Center for Biomedical Communications (LHNCBC), National Library of Medicine (NLM), in collaboration with the Laboratory for Applied Research in Academic Information (LARAI), William H. Welch Medical Library, Johns Hopkins Medical Institutions (JHMI). The ORW Program is predicated on the premise that the creation, production, and intellectual value of existing reference works can be improved and enhanced by being brought online as a complement to, not a surrogate for, the printed text. The medical text being used within the context of present efforts is Principles of Ambulatory Medicine (PAM), by Drs. Barker, Burton, and Zieve, Editors, and published by Williams and Wilkins."
"While the ORW program has an online emphasis in the title, the research is, in reality, targeted at the complete life cycle of a publication, from the scholarly creation, through the editing, to the publication. Publication is, within the ORW program, being generalized to include both hard-copy and online formats. In order to proceed within one environment and include both output formats, a decision was made to encode the text with SGML and, where possible, to use the Association of American Publishers (AAP) Electronic Manuscript Standard for Document Type Book. While the desirability of so encoding the text is obvious from the perspective of hardcopy publishing, the potential ramifications are even more pronounced in terms of online storage and retrieval. The greatest advantage from the perspective of the latter is the unambiguous identification of objects; independent of SGML, it was already recognized that the requirements of structured texts required an object-oriented retrieval system. The development of the latter is proceeding in parallel with the development of the editing and text processing facilities discussed herein. The three editors of PAM are staff physicians at the Francis Scott Key Hospital, Johns Hopkins Health System. The editors and their editorial assistants have been provided with Macintosh II workstations by LARAI and are linked to the Welch Medical Library. The editing of manuscripts for the third edition of PAM is now proceeding utilizing a combination of SoftQuad Author/Editor 1.0 and Microsoft Word 4.0." [from the Introduction]
See the main entry for National Library of Medicine for further information. Note: The volume editor for SGML Users' Group Bulletin 4/1 is David W. Penfold (Edgerton Publishing Services, Huddersfield, UK).
PAT and associated text processing tools are built around descriptively-marked text, even if not specifically SGML text. The tools have been created in conjunction with a large body of research conducted at the University of Waterloo Centre for the New OED and Text Research. Compare also "PAT, GOEDEL, LECTOR and More: Text-dominated Database Software," pp. 83-84 in: Tools for Humanists, 1989. A Guidebook to the Software and Hardware Fair Held in Conjunction with the Dynamic Text 6-9 June 1989 Toronto. Toronto, Ontario: Centre for Computing in the Humanities, 1989. The article describes several software tools developed at the Waterloo Centre, including TRUC (an editor for SGML or SGML-style tagged text). TRUC supports multiple views of a tagged document, based upon use of style-sheets.
The University of Waterloo has pioneered several important research efforts in the study of machine-readable lexical databases, machine transduction, and generation of descriptively marked-up electronic texts (SGML-style markup). The Centre has also developed software to search, interactively display, and format text structured with descriptive markup. These tools were developed for the New Oxford English Dictionary Project with the long-range goal of application to other texts. A Newsletter is issued by the Centre describing ongoing research, publications, software enhancements, work of visiting scholars, conferences, and other events. Persons interested in the Centre's research and publications may write for a current document list (e.g., especially the several publications and technical reports by Darrell R. Raymond, Donna L. Berg, Gaston H. Gonnet, Timothy J. Benbow, Heather J. Fawcett, Rick Kazman, Frank Wm. Tompa, and George V. J. Townsend). See Gonnet, Raymond, and Tompa in this bibliography. Address: Electronic Text Research; Centre for the New Oxford English Dictionary; Davis Centre; University of Waterloo; Waterloo, Ontario; Canada N2L 3G1; Tel: (1 519) 885-1211, extension 6183; Email (Internet): newoed@waterloo.edu.
The PAT and LECTOR tools are now supported commercially by Open Text Systems, Inc., a spin-off company working closely with the University of Waterloo Centre for the New Oxford English Dictionary and Text Research. Open Text Systems was "established to market, develop and customize the text management software created at the (NOED) Centre." The company began operations in December, 1989, and supports the Transduction Toolkit, PAT (text search system), GOEDEL (database management system), LECTOR (text display system) and TRUC software developed at the University of Waterloo Centre. The supported software was designed around and tested on one of the largest and most complex lexical databases, the Oxford English Dictionary, Second Edition. For further description, see (1) Steve Higgins, "Open Text Adds Automatic Indexing to Document Management Software," PC Week 7/32 (August 13, 1990) 38; (2) UW Centre for the NOED Newsletter 22 (December, 1989) 1-2; (3) Dale Waldt, "OpenText Search and Retrieval Tools," <TAG> 5/1 (January 1991) 9. The group may be reached at: Open Text Systems, Inc., Unit 622, Waterloo Town Square, Waterloo, Ontario, CANADA N2J 1P2; Tel: (519) 746-8288; FAX: (519) 746-3255; Email (Internet): tbray@watsol.waterloo.edu (Tim Bray).
See similarly a University of Waterloo technical report, OED-87-01 by the same authors.
Abstract: Beginning to create the New Oxford English Dictionary database has resulted in the realization that databases for reference texts are unlike those for conventional enterprises. While the traditional approaches to database design and development are sound, the particular techniques used for commercial databases have been repeatedly found to be inappropriate for text-dominated databases, such as the New OED. In the same way that the relational model was developed based on experiences gained from earlier database approaches, the grammar-based model presented here builds on the traditional foundations of computer science, and particularly database theory and practice. This new model uses grammars as schemas and "parsed strings" as instances. Operators on the parsed strings are defined, resulting in a "p-string algebra" that can be used for manipulation and view definition. The model is representation-independent and the operators are non-navigational, so that efficient implementations may be developed for unknown future hardware and operating systems. Several approaches to storage structures and efficient processing algorithms for representative hardware configurations have been investigated.
See also Gaston H. Gonnet and Frank Wm. Tompa, "Mind Your Grammar: A New Approach to Modeling Text," pp. 339-346 in the Proceedings of the 13th International Conference on Very Large Data Bases (VLDB87), Brighton, England (Sept. 1-4, 1987). See also the NOED main entry.
"Abstract: Aspects of text processing important for the scientific community are discussed, and an overview of currently available software is presented. Progress on standardization efforts in the area of document exchange (SGML), document formatting (DSSSL), document presentation (SPDL), fonts (ISO 9541) and character codes (Unicode and ISO 10646) is described. An elementary particle naming scheme for use with LATEX and SGML is proposed. LATEX, PostScript, SGML and desk-top publishing allow electronic submission of articles to publishers, and printing on demand. Advantages of standardization are illustrated by the description of a system which can exchange documents between different word processors and automatically extract bibliographic data for a library database."
See also (provisionally): Michel Goossens et Eric van Herwijnen, "Introduction à SGML, DSSSL et SPDL," Cahiers GUTenberg 12 (décembre 1991) 37-70.
Abstract: "SGML, the Standard Generalized Markup Language, deals with the structural markup of electronic documents. It was made an international standard by ISO in 1986. SGML soon became very popular thanks in particular to its enthusiastic acceptance in the editing world, by large multi-national companies, governmental organizations, and, more recently, by the ubiquity of HTML, HyperText Markup Language, the source language of structured documents on the WWW. This article discusses the basic ideas of SGML and looks at a few interesting tools. It should provide the reader with a better understanding of the latest developments in the field of electronic documents in general, and of SGML/HTML in particular."
Pages 103-123 constitute the main part of the article; pages 124-145 are appendices with printout of the HTML 2.0 DTD, sample marked-up documents, etc.
Abstract: "Markup languages are used to identify and delimit the components of manuscripts. The principal application of these languages is to provide a means for authors to markup their manuscripts with the information required by publishers for typesetting. LATEX is a popular de facto standard markup language in some technical communities, such as academic computer science. SGML is an official ISO standard for defining markup languages. The qwertz document processing system is an SGML application we have developed for our own use, intended to combine the advantages of SGML and LATEX. It consists of a model of LATEX as an SGML Document Type Declaration (DTD) and Unix tools for translating SGML documents using this DTD into LATEX, as well as troff. This article discusses our experiences in building and using the system."
For more on SGML/XML and (La)TeX, see the dedicated database entry and the topical bibliography listing.
The author relates "experiences and thoughts on using CALS and non-CALS table formatting, ID management, and generating multiple electronic delivery formats (besides SGML)." Issues of legacy data conversion are discussed.
Available online in HTML format: "Practical Issues in SGML Publishing", by Liz Gower; [mirror copy]. For further information on the conference, see: (1) the description in the conference announcement and call for papers, and (2) the full program listing, or (3) the main conference entry in the SGML/XML Web Page.
Graf observes that "a document created under the exact rules of a valid DTD may very well be invalid when passed through an instance parser." See the response of John McFadden and Sam Wilmott, "Ambiguity in the Instance: An Analysis" in
Abstract: "There are many hypertext authoring tools available for specific outputs. For example, software that enables HTML or Windows Help authoring. These tools provide easy to use solutions for specific outputs, but they lack the benefit of a tailored, structured environment, and of course they do not allow the creation of multiple outputs from raw content stored as SGML -- a requirement we have at Novell.
However, these tools provide distinct advantages to the author that an SGML-based authoring system should strongly consider. To ignore these capabilities is to risk the SGML system being unusable, or incapable of handling large hypertext projects. These advantages center around the management of small information objects we call topics, and the links between them that are inherent in hypertext systems. To combine the power of SGML with the advantages of off-the-shelf authoring tools, Novell has developed a hybrid, named HelpWise. Novell's goal with HelpWise is to leverage the benefits of a structured SGML authoring system, and retain the link management that is crucial while creating hypertext documentation."
Note: The above presentation was part of the "SGML Case Studies" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "It is unfortunately easy to confuse the terms that SGML uses when discussing characters and character sets. This graphic illustrates the relationships of characters, character sets, and related concepts." [Illustration and definitions for character, character repertoire, code set, character set, code set position, coded representation, character number].
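The distinctions the graphic draws can be sketched briefly (the example characters below are my own, not from the article): a character is an abstract unit, its character number is its position in a code set, and its coded representation is the byte sequence used for it under a particular encoding.

```python
# Character -> character number (code set position) -> coded
# representation (bytes under one encoding).  Example characters
# are arbitrary choices for illustration.
for ch in ("A", "é"):
    number = ord(ch)            # character number in Unicode/ISO 10646
    coded = ch.encode("utf-8")  # one possible coded representation
    print(f"{ch}: number={number}, coded=0x{coded.hex()}")
```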
For other articles in this issue of MLTP, see the annotated Table of Contents.
Abstract: "Using simple examples concentrating on five characters from an exotic character set, the author shows techniques for describing a document's character set in the SGML Declaration and how different document character sets are treated by the parser. The presentation concludes with examples of how the techniques are used in real life."
Available online in HTML format: "Document Character Sets by Example", by Tony Graham, Consultant, Mulberry Technologies, Inc.
Note: The above presentation was part of the "And More..." track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
In the category of "Free SGML Transformation Tools" are free software packages "for transforming an SGML instance into something else, be that another SGML instance or a file in some other format." Graham discusses "the criteria for selecting an SGML transformation processing tool."
Available online in HTML format: "Free SGML Transformation Tools", by Tony Graham, Consultant, Mulberry Technologies, Inc. [local archive copy]
Note: The above presentation was part of the "SGML User" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Web developers and programmers have access to an authoritative and well-written guide to Unicode, thanks to the publication of Unicode: A Primer. Written by Tony Graham (Mulberry Technologies), Unicode: A Primer "is the first book devoted to the Unicode Standard Version 3.0 and its applications (other than the standard itself)." The endorsement of the book by Rick McGowan, a Unicode Consortium Technical Director, speaks volumes: "For developers who plan to use the Unicode Standard, this is the best companion book so far." The Unicode standard, as described by Tony Graham on his Unicode web site, "is a character encoding standard published by Unicode Consortium. Unicode is designed to include all of the major scripts of the world in a simple and consistent manner. The Unicode Standard, Version 3.0, defines 49,194 characters from over 90 scripts. It covers alphabetic, syllabic, and ideographic scripts, including Latin scripts, Greek, Cyrillic, Thai, ideographs unified from the scripts of China, Japan, and Korea, and Hangul characters used for writing Korean. The Unicode Standard also defines properties of the characters and algorithms for use in implementations of the standard. Every major operating system, many programming languages, and many applications support the Unicode Standard." The new guide to Unicode implementation is a book that needed to be written. Tony Graham is eminently qualified to be its author: he has worked intimately with Unicode and other character encoding standards since 1994, and has written several key articles on Unicode. Part I of Unicode: A Primer includes "Introducing Unicode and ISO/IEC 10646" (a first look at the Unicode Standard, ISO/IEC 10646, and the Unicode Consortium) and "Unicode Design Basis and Principles." Part II (Chapters 3-8) gets to the heart of Unicode and related materials standardized by the Unicode Consortium. 
It provides three views of the structure of the Unicode Standard (by character block, by the files in the Unicode Character Database, and by the ISO/IEC 10646 view of the Universal Character Set); also: summaries of the features of the UCS-4, UCS-2, UTF-16, UTF-7, UTF-8, UTF-EBCDIC, and UTF-32 encodings and of endianness, transcoding, and the Standard Compression Scheme for Unicode (SCSU); an overview of the properties that a single character can have; things you need to know when working with sequences of characters; descriptions of the principles that guided encoding of the CJK ideographs and Korean Hangul characters in the Unicode Standard; conformance requirements for the Unicode Standard and ISO/IEC 10646, plus details of how to submit new scripts. Part III explains the use of the Unicode standard, particularly in Internet applications. The author includes descriptions and sample programs demonstrating Unicode support in nine programming languages. The book also has four valuable appendices (tables; descriptions of each of the character blocks in Unicode 3.0; information about the Unicode Consortium, versions of the Unicode Standard, Unicode Technical Reports, and Unicode conferences; tables of ISO/IEC 10646 amendments, blocks, and subsets), glossary, index, and bibliography. The book's complete Table of Contents, together with links to Unicode resources, is published on the companion Web site. For related resources, see "XML and Unicode."
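The encoding-form summaries above can be illustrated with a short sketch (the euro sign is my own example, not the book's): one character number produces different byte sequences under UTF-8, UTF-16, and UTF-32, and endianness reorders the bytes within each code unit.

```python
# U+20AC (EURO SIGN) under several of the encoding forms the book
# summarizes; note how endianness swaps bytes within code units.
ch = "\u20ac"
print("UTF-8:   ", ch.encode("utf-8").hex())      # e282ac
print("UTF-16BE:", ch.encode("utf-16-be").hex())  # 20ac
print("UTF-16LE:", ch.encode("utf-16-le").hex())  # ac20
print("UTF-32BE:", ch.encode("utf-32-be").hex())  # 000020ac
```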
This issue of Baskerville makes available a number of papers presented at a joint meeting of the UK TEX Users' Group and BCS Electronic Publishing Specialist Group (January 19, 1995) [mirror copy]. See the link to Baskerville, or email: baskerville@tex.ac.uk. Issue 5/2 of Baskerville has other articles on SGML: "Portable Documents: Why use SGML?" (David Barron); "Formatting SGML Documents" (Jonathan Fine); "HTML & TeX: Making them sweat" (Peter Flynn); "The Inside Story of Life at Wiley with SGML, LaTeX and Acrobat" (Geeti Granger); "SGML and LaTeX" (Horst Szillat). See the special bibliography page for other articles on SGML and (LA)TEX.
Extracts: "As a company we have monitored the progress of SGML since 1985, but have only recently used it in earnest. Our first project is a 5000 page encyclopaedia. . .in an 8-volume set. . .with more than 3 million words."
The author describes experiences with SGML at Wiley by using Encyclopaedia of Inorganic Chemistry as a case study. The article is based upon a similar document published in Baskerville 5/2 (March 1995).
Research report within the project Opéra (Outils Pour les documents Électronique, Recherche et Applications), with focus upon the structure editor Grif.
Available in Postscript format on the Internet: ftp://ftp.imag.fr/pub/OPERA/doc/RapportGranier.ps.Z [mirrored copy, November 1995].
The GCA Standard 101-1983 was characterized as a "Preliminary Trial-Use Standard." Part A of the book (pages 1-98) covers "SGML" as it existed in its sixth working draft. Part B of the book (pages 113-) contains expository material which was to serve as an aid to understanding the Document Markup Metalanguage Standard.
The SGML '96 Conference celebrated a decade of SGML, reckoned from the first publication of SGML as an ISO standard in 1986. The seventy-eight (78) published papers in the proceedings volume are divided into seven major sections, and represent a majority of the eighty-five (85) papers read at the conference. The collection not only documents an impressive milestone for the ISO 8879 standard, but serves as a valuable resource for SGML users. The SGML '96 conference itself was attended by over 1400 people, and included more than 120 speakers, and 100+ poster sessions in addition to conference sessions and exhibits.
Introductory essays in the proceedings volume are from the conference Co-Chairs B. Tommie Usdin and Deborah A. Lapeyre, and from Charles F. Goldfarb ("The Roots of SGML - A Personal Recollection"). The full inventory of published papers includes: Introductions (3 papers); Newcomer (11 papers), User (21 papers), Expert (16 papers), Business Management (5 papers), Case Studies (16 papers), "And More" (6 papers). The volume has complete title and author indexes. It was produced directly from the SGML source (based upon the "GCAPAPER" DTD) using ArborText's ADEPT Series SGML software.
Most of the published conference papers are referenced (by author) in the online bibliography of the SGML/XML Web Page. Each bibliographic entry includes the published abstract, author contact information, an indication of the "track" in which the presentation was delivered, and additional annotations or relevant hypertext links. The published abstracts for the papers, in many cases, are considerably more detailed than the brief abstracts that accompany the online conference program. The SGML '96 Conference Proceedings volume containing the full text of the papers may be obtained from GCA. GCA may also be reached at: GCA Publications, 100 Daingerfield Rd, Alexandria, VA 22314-2888 USA.
The Guide has been issued in several editions, with updates. Several SGML-related standards documents distributed by GCA are listed and annotated in this Guide. Listings of SGML suppliers are in alphabetical order and provide information on the type of business, name and description of products or services, and prices. The Guide is issued on a subscription basis in looseleaf format; updates are issued quarterly or as information is accumulated.
Abstract: "The growing complexity of automobiles, coupled with U.S. government requirements for emission-related information to be made available to independent repair technicians, is requiring major changes in the way technical information is delivered in the automobile industry. The European automotive industry also will benefit from implementation of SGML, CGM, TIFF and other standards specified by the SAE J2008 task force, which was charged with developing the recommended organization of electronic service information. This presentation describes common problems U.S. automobile manufacturers have faced in creating electronic service bay systems, and the additional challenges that arise when some of the needed data resides in foreign countries in closed, proprietary systems.
"In the last few years, the 'Big Three' U.S. car companies (GM, Ford, Chrysler) have begun to introduce electronic service information into their dealerships. These systems differ widely in their look, feel, and user-friendliness, but they all use SGML and can produce information organized according to J2008. While a few importers, such as Hyundai, already have operational SGML systems, most are still developing viewing and authoring systems simultaneously with creation of an SGML/CGM database. Development of a J2008-compliant system is even more complex for companies where much of the service information required resides in foreign locations, sometimes in foreign languages, in proprietary, non-standard publishing systems."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
"Abstract: Many of Europe's multinational companies are employing the digital Standard Generalized Markup Language (SGML) to help reduce paper usage. SGML supports a wide range of media, including paper, CD-ROM, online files and Braille. The International Organization for Standardization (ISO) established SGML as a standard for structured documentation. European companies are ahead of their US counterparts in implementing the standard, although the cost savings realized from SGML can be tremendous. Adopting the standard has provided European companies with better product maintenance and faster time to market. SGML also dovetails nicely with the demands of consortia-based cooperative work required in a global economy. US automotive companies are beginning to adopt SGML to standardize documentation and comply with the Clean Air Act."
"Abstract: HTML represents the worst of two worlds. We could have taken a formatting language and added hypertext anchors so that users had beautifully designed documents on their desktops. We could have developed a powerful document structure language so that browsers could automatically do intelligent things with Web documents. What we have got with HTML is ugly documents without formatting or structural information. I show that a standard modern novel cannot be rendered readable even in HTML level 3. I propose a document- and author-centered way of determining the simplest enhancements to HTML sufficient to capture the intent of the authors. I review Tom Malone's mid-1980's work on semistructured messages, which shows us how to add structure without sacrificing flexibility and generality. I describe how to add structure tags without breaking current Web browsers and HTTP servers. Finally, I discuss useful ideas that we can take from the KQML agent-communication language."
Available on the Internet via an MIT WWW server [mirror copy, text only, June 1995].
The volume contains an edited collection of fourteen essays on various aspects of conceptual modelling and development of standardized encoding methods for representing knowledge in historical texts. The contributions are by Manfred Thaller, Lou Burnard, Daniel I. Greenstein, Hannes D. Galter, Ingo H. Kropač, Donald A. Spaeth, Hans Jørgen Marker, Thomas Werner, Jan Oldervoll, and Kevin Schurer. The essays reflect interaction with and critique of encoding methods which emerged from the TEI phase I efforts as documented in TEI-P1; see on TEI entry and its pointers to the UICVM LISTSERVer where early TEI research documents are archived.
For other journal special issues and monographs dedicated to the Text Encoding Initiative, see the relevant subentry for TEI.
Summary: "This paper grows out of work recently carried out by the Arts and Humanities Data Service and the UK Office for Library and Information Networking on metadata for resource discovery - that is, the descriptive data which is supplied for information resources to facilitate their location or discovery by interested users (http://ahds.ac.uk/ and http://ukoln.ac.uk/ respectively). That work was built on the following assumptions: that scholars require access to information about relevant materials irrespective of where, how (e.g. as books, audio tapes, digital objects), or by whom (e.g. librarians, data archivists, museum curators) they are stored, and regardless of the manner in which they are described or catalogued. They want to query any number of information systems in parallel, and in this respect require a framework which will allow resource discovery across particular subject, curatorial, regional and other domains; a framework that will facilitate meaningful integrated access to the intellectual record generally. [. . .] We highlight the aspects of the research on which this paper dwells; notably the prospects for using the TEI to integrate resource description practices which emerge as de facto or de jure standards amongst particular groups of information specialists."
The extended abstract for the document is available online: http://www.stg.brown.edu/webs/tei10/tei10.papers/greenstein.html; [local archive copy]. See the main database entry for additional information about the conference, or the Brown University web site.
Abstract: "This paper focuses on the types of questions that are raised in the encoding of historical documents. Using the example of a 17th century Scottish Sasine, the authors show how TEI-based encoding can produce a text which will be of major value to a variety of future historical researchers. Firstly, they show how to produce a machine-readable transcription which would be comprehensible to a word-processor as a text stream filled with print and formatting instructions; to a text analysis package as [a] compilation of named text segments of some known structure; and to a statistical package as a set of observations each of which comprises a number of defined and named variables. Secondly, they make provision for a machine-readable transcription where the encoder's research agenda and assumptions are reversible or alterable by secondary analysts who will have access to a maximum amount of information contained in the original source."
Abstract: "One of the most difficult parts of implementing SGML in a commercial publishing setting is the conversion of raw manuscript into valid SGML. Because SGML is so information-rich, its creation requires a higher degree of skill on the part of keyboarders, editors, and authors than standard word-processing formats. There are a number of ways to deal with the problem of getting non-SGML manuscript into an SGML format. This problem is discussed in detail in the white paper 'Commercial Book Publishing and Author Control'."
The document is available on the Internet in HTML format; [mirror copy]
Abstract: "Book publishing is a conservative industry that relies on a tried-and-true process, characterized by a strong division between 'editorial' functions (obtaining and preparing manuscript) and 'production' functions (turning manuscript into printed books), a division commonly known as 'the wall'. SGML has been relegated to the production side in most implementations. While there is much to be gained here, this limited approach also involves a considerable sacrifice of potential benefit. This paper presents a blue-print for maximizing the benefits of SGML in a commercial book-publishing setting by showing how SGML can be leveraged on both sides of the wall, with consideration of practical implications for both process modification and the implementation of technology.
"The proposed approach for taking full advantage of SGML in a publishing setting involves the mark-up of manuscript not at the stage when it is traditionally keyboarded for typesetting, within production, but at the intake stage. Because there is no way to enforce author compliance with an SGML authoring strategy, it must be handled on submission by an 'intake unit' that is under the control of the editorial departments. This association allows those responsible for DTD creation and initial tagging of manuscript to be in direct contact with those whose job it is to dictate the structure of the documents, and who are most familiar with its content.
"Further, this connection makes the editors in charge of decisions regarding repurposing (electronic versions of existing titles on Web or CD-ROM) and reuse (ancillaries, subsequent editions) directly aware of the potential of SGML to help ease the costs of these (often low-profit) publications. If properly implemented, they also avoid the need to learn the more arcane and unfamiliar aspects of SGML; they can rely on their own staff (NOT answerable to the head of production) to supply them with the necessary technical guidance. The 'structured manuscript' allows the automation of repetitive and labor-intensive tasks in the development process, while making sample material readily available for delivery in print or on the Web for early promotion efforts and expert review.
"By the time the manuscript passes to production, many time-consuming production chores (typecoding, identification of ambiguous structural elements, consistency checks) have already been performed. The editorial departments are brought into closer touch with the realities of scheduling (a constant bone of contention between editorial and production arms), while the production department can now create the printed book at a much accelerated rate, again through automated processes enabled by SGML.
"The introduction of the SGML 'intake unit' into what is traditionally a non-technical branch of a publishing company could be a difficult change to implement; through the proper use of conversion and authoring technology for both initial tagging and subsequent development (and with appropriately designed document types), many of these challenges can be overcome. The gains realized in giving the power of SGML to those who can best make use of it will also help to enable the success of this tricky aspect of implementation.
"The antagonism between editorial and production units within a commercial publishing company has many negative effects. The proper implementation of SGML in this setting could actually help to ease these antagonisms, by adjusting the responsibilities and power that accompany the use of this technology. At the same time, such an implementation would allow publishers to realize the full promise of SGML, in terms of reuse, repurposing, and in faster time-to-market, not just in the final phases of book publication, but throughout the publication process."
See the bibliographic entry for a related article by Arofan Gregory: "Commercial Book Publishing and Author Control."
Note: The above presentation was part of the "SGML User" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "One of the main barriers to those wishing to create SGML systems is the high 'cost' of authoring: the level of expertise necessary to author in a structured editor can be prohibitive, as can the cost of up-translating documents from traditional word-processing formats. While SGML editing tools are slowly improving, they are still a far cry from the sort of applications that would enable SGML to be adopted as a mainstream technology in settings where WYSIWYG word-processors are the norm. At the same time, emerging standards in the SGML world such as HyTime and DSSSL, as well as the demand for support of such features of the standard as LINK, SUBDOC, and the internal declaration subset (among a more technical audience) require that SGML editors fill a different role, but one that is currently neglected. This paper presents a practical look at how (and how far) these goals can be met, given the existing state of SGML technology and related standards."
"By examining the different authoring paradigms, and by looking at successful word-processors, code editors, HTML authoring tools, and structured editors, a set of requirements is developed for the 'ideal' SGML authoring tool(s). The design approaches that can be taken for meeting these requirements is then considered. Emphasis is placed on utilizing architectural forms as an enabling technology within software applications, and leveraging the DSSSL and HyTime standards to enhance both application functionality and the value of the documents produced.
"Features that would be desirable for ease of integration with SGML document management systems are also addressed. The impact of XML on the development of editing tools is examined, and some approaches recommended for dealing with the impending wave of 'para-SGML' documents this new 'standard' threatens to generate.
"While an SGML editor cannot do everything for the user, the real world demands that this class of applications be significantly improved, both in terms of usability and functionality. This paper focuses on where these improvements can realistically be made, and what approaches have become possible given advances in the implementation of SGML and related technologies. In a more traditional vein, an analysis of successful authoring paradigms shows how SGML applications could be improved without requiring new technologies. It is hoped that this paper will help users vocalize what they most want in their SGML editors, and help developers understand how these features might successfully be implemented."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
"Abstract: The rural Forest of Dean in Gloucestershire seems an unlikely setting for a massive document processing operation but it is here at Rank Xerox Business Services, Document Technology Centre, in the RX Business Park at Mitcheldean, that 1,500 new patent documents a week arrive from the European Patent Office (EPO) to be processed. This new venture for Rank Xerox has led them into offering commercial services based on the technologies used in this operation including scanning, ICR conversion, SGML encoding, CD-ROM production and laser printing, although Rank Xerox does not actually press CD masters, a partner does that."
"Abstract: Presents the most interesting aspects of the authors' work in the framework of the development of a scientometric workstation. Their goal is to develop aids for information management, i.e. for decision and evaluation in the field of scientific information. They present an original implementation of the coword analysis technique from the point of view of information technology. SGML is used as a pivotal format between independent C programs running on Unix workstations. The method employs hypertext integration. The authors emphasize a highly modular implementation facilitated by the use of SGML and the relationships between an SGML-encoded corpus and hypertext. Through the use of a clusters hierarchy generated by the SDOC application, the authors intend to illustrate that the joint use of hypertext and clustering techniques permits the information analysts to navigate through relevant information by following some statistically established relations between concepts."
The article contains a sample of a formatted letter generated from an SGML source [Jan Grootenhuis to Anders Berglund]. The appendices (eleven examples) illustrate the production process.
Note: The volume editor for SGML Users' Group Bulletin 4/1 is David W. Penfold (Edgerton Publishing Services, Huddersfield, UK).
Abstract: "There are two common, but unfortunate, responses when companies are confronted by a need to do a legacy data conversion to SGML: freezing in fear and doing nothing, or jumping in head-first and creating a disaster. Obviously, there are better ways to react. This presentation discusses how to develop an effective and realistic data conversion strategy. Among the issues considered are whether to do it inhouse or to outsource, selecting a DTD, developing a workable conversion schedule, writing a conversion specification, conversion methodologies, and arriving at a realistic conversion plan."
This paper was delivered as part of the "Newcomer" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).
Most SGML publishing projects include, at some point, the conversion of existing, formatted text into SGML-tagged files. These SGML conversions differ from other conversions because SGML separates content from structure and defines structure without ambiguity. Ultimately, all source materials can be converted, but how much of the process is automated and how much is done by hand is a critical choice. This article weighs the costs and benefits of various conversion strategies for all types of source material -- from typesetting tapes to OCR. The article illustrates the challenge of converting the implicit structure of tables and other document features and concludes with tips for creating a conversion plan.
The article discusses Encyclopedia Composita, a composite electronic reference work based upon 29 large paper volumes. The authors' main point: create a plan for conversion based upon good analysis.
Abstract: "Success of legacy conversion might be the single most important determinant of your organization's success in a move towards an SGML environment. It can also be the single most costly aspect of the project. This session's goal will be to dispel the myths. We will present an overview of the key issues and illustrate them with real-life experience. We will discuss: keying vs. OCR vs. software conversion; what software can really accomplish; what you can expect in quality and how you measure it; what a 'ballpark' quote includes and what it doesn't; and how to improve the probability of success.
"Data Conversion Laboratory prepares data and text for CD-ROM and Web publishing. Going beyond conversion, DCL specializes in enhancing your legacy documents to meet the new demands of SGML, HTML, PDF, and other structured formats. The company supports all major electronic source formats as well as paper and microfilm."
Note: The above presentation was part of the "SGML Newcomer" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "The SGML standard, ISO 8879:1986, defines a meta-language for defining markup languages. The standard primarily addresses the issue of the syntactic interpretation of a complete, valid SGML document by a conforming SGML parser. It does not define how an application, such as an authoring tool, should behave. Authoring requires the solution of various problems that do not exist when only the parsing of complete documents is considered."
Available on the Internet in HTML format: from SGML Open; [mirror copy].
"Abstract: Two different but related issues pertaining to entity management impede interoperability of SGML documents: (A) that of interpreting external identifiers in entity declarations so that an SGML document can be processed by different vendors' tools on a single computer system, and (B) that of moving SGML documents to different computers in a way that preserves the association of external identifiers in entity declarations with the correct files or other storage objects.
"While there are many important issues involved and a complete solution is a long term goal, the SGML Open membership agrees upon the enclosed simple set of conventions to address a useful subset of the complete problem. To address issue A, this resolution defines an entity catalog that maps an entity's external identifier and/or name to a file name. To address issue B, this resolution defines a simple interchange packaging scheme using an interchange catalog to associate a public identifier with each interchanged file."
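The catalog mechanism the resolution defines is a plain-text file of keyword entries mapping identifiers to storage objects. A minimal sketch follows; the public identifier, entity name, and file names are invented for illustration, not taken from the resolution:

```sgml
-- Sketch of a TR 9401 entity catalog --
-- All identifiers and file names below are illustrative --
PUBLIC  "-//Acme//DTD Report//EN"  "report.dtd"
ENTITY  "chap1"                    "chap1.sgm"
DOCTYPE report                     "report.dtd"
```

A tool resolving the external identifier PUBLIC "-//Acme//DTD Report//EN" would consult the catalog and open the local file report.dtd, so the same document can be processed by different tools, or on a different machine, without editing its entity declarations.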
Available in HTML format: SGML Open - TR 9401:1995 - "Entity Management" [mirror copy, December 28, 1995]. Also available from the FTP server at Exoterica Corporation in compressed PostScript format ftp://ftp.exoterica.com/sgmlopen/9401/9401.ps.Z, [mirror copy] or in other formats (files: 9401pack.tar.Z, 9401pack.zip, 9401ps.zip). Revisions: Technical Resolution 9401:1994; Final Technical Resolution: 1994 August 9; Technical Resolution 9401:1995 (Amendment 1); Committee draft 1: 1995 March 1; Committee draft 2: 1995 March 23; Final Draft Technical Resolution: 1995 April 19; Final Technical Resolution: 1995 September 8.
Summary: "Two different but related issues pertaining to entity management impede interoperability of SGML documents: (a) that of interpreting external identifiers in entity declarations so that an SGML document can be processed by different vendors' tools on a single computer system, and (b) that of moving SGML documents to different computers in a way that preserves the association of external identifiers in entity declarations with the correct files or other storage objects."
"While there are many important issues involved and a complete solution is a long term goal, the SGML Open membership agrees upon the enclosed simple set of conventions to address a useful subset of the complete problem. To address issue A, this resolution defines an entity catalog that maps an entity's external identifier and/or name to a file name. To address issue B, this resolution defines a simple interchange packaging scheme using an interchange catalog to associate a public identifier with each interchanged file." [from the Abstract]
A copy of the abstract is available (SGML Open Technical Resolution 9401:1994). A mirror copy is also available here. A full copy of the text of the resolution may be obtained from SGML Open.
"Abstract: This paper considers special problems in providing bibliographic control of and access to electronic texts and how they are being addressed by the Anglo-American Cataloging Rules, 2d ed. 1988 rev., and the MAchine-Readable Cataloging (MARC) standards used for encoding bibliographic data on the computer. It summarizes the concepts and development of the USMARC Format for Bibliographic Data, computer files specifications, and identifies particular issues in providing bibliographic control of electronic texts including identification, description, location, and access. It explores attempts to address these difficult issues surrounding electronic texts, particularly in the MARC formats, as libraries are adapting to the growth of the Internet and the wide availability and proliferation of many types of electronic items. The paper reviews specific projects that attempt to provide better description of and access to electronic texts, including the OCLC Internet Resources Project, attempts of the USMARC Advisory Group of the American Library Association to enhance the MARC formats to provide location and access to online information resources, standards under development for locators and identifiers of Internet resources (Uniform Resource Identifiers), and some projects involving access to electronic texts. In addition, the author reviews the relationship between [the] Standard Generalized Markup Language (SGML) and MARC."
Another abstract for the article is available from ETEXTCTR Review #2 (Aurora Ioanid).
Abstract: "The XML language is a profile of the full SGML language optimized for use on the Web. XSL -- the Extensible Style Language -- is the name being given to the language designed to be used to specify a stylesheet for an XML document.
"In the original conception of the XML project, its stylesheet language was expected to be a subset of DSSSL. However, the syntax of DSSSL as defined in the 1996 version of the standard -- based on the Scheme programming language -- was seen by many as suboptimal for average use. The current XSL proposal is basically a mapping of a subset of DSSSL into what is mostly a declarative (rather than programming) language.
"The official XSL work being done under the auspices of the World Wide Web Consortium (W3C) is very much in progress. This talk mostly discusses XSL described in the joint proposal made by ArborText, INSO, and Microsoft to the W3C, acknowledged by the W3C in September 1997. Updates on the latest work will be provided at the talk."
This paper was delivered as part of the "User" track in the SGML/XML '97 Conference. See similarly: "XSL - A Proposed Stylesheet for XML. Placing XSL into Perspective." By Paul Grosso. August 29, 1997. From the 'XML and SGML Library' at ArborText. The paper discusses: "Where XSL fits into the XML specification effort, Where XSL fits into SGML and related technology, Relationship of XSL to DSSSL, Relationship of XSL to HTML and CSS," etc. [local archive copy]
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CD-ROM were produced courtesy of Jouve Data Management (Jouve PubUser).
The volume contains an introduction to SGML and implementation of the standard for electronic interchange of CEC and OPOCE documents. FORMEX unified two different approaches to text interchange: (1) Common Communication Format (CCF), PGI-84/WS/4, Paris: UNESCO, 1984, itself based upon
Abstract: "Was SGML born too early? 10 years ago, very few people understood its potential. But nowadays, surfing on the Internet wave, a number of technologies have become trendy. Buzzwords like virtual reality (VRML) or active content (Java, ActiveX) have become fashionable. How are they related to SGML? Is it not the SGML approach that paved the way to such advanced technologies?
"This paper explores these issues, examining the results of a number of research projects launched by Eurostat to review cutting-edge and emerging information technologies and to evaluate, from a user's point of view and on the basis of concrete applications, how they could help Eurostat fulfil its mission, which is to provide the European Union with a high-quality statistical information service."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
Abstract: "Sun is three years into a large project that radically revamps its online document creation, management and delivery system, from a proprietary product to an open SGML system designed to meet all current needs, including XML and Java, as well as whatever the future might bring.
"The goals are deceptively simple: take the writers' documents and, with a minimum of processing, package them and make them available to all users; support links not only within the books but also between books, independently of where the books are ultimately located; execute context-driven searches; support user manipulation of contents of book collections and installation wherever they wish; enable viewing books outside of the Solaris environment - and we want, finally, for anybody to be able to publish their own books within this environment.
"A decision was made, very early in the process, to migrate to an SGML-based delivery process, and, following consultations with others in the industry and our own research, we also decided to migrate our authoring environment and tools to SGML, rather than continue using a third party proprietary product, so as to avoid the painful problems associated with multiple conversions.
"Three processes were put in place: conversion/authoring, production, and delivery vehicles.
"This case study will concentrate on: 1) how we arrived at our DTD; 2) how SunSoft decided on an editing tool; 3) how inter-book linking in the absence of standard URNs is accomplished; 4) how books are delivered over the Net, including multiple HTML formats; 5) print-on-demand; 6) I18N/L10N. [Finally], what is the relationship between all of the above and the so-called 'production process', which controls both the automatic production of collections, and the repository/database where all meta-information is maintained?"
The source for this document is available in SGML and PostScript format, kindly supplied by the authors. See also the main database entry for docs.sun AnswerBook Documentation - Sun Microsystems.
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
Abstract: "Information is the raw material from which information products are produced. Nowadays, new information products are needed, including CD-ROM, online databases, World Wide Web pages, and electronic browsers, in addition to printed documents, which impacts production processes. The reasons why SGML is ideal for supporting multiple outputs are discussed. Because of the many process changes involved, it is important to cost justify your SGML project. The three keys to a successful cost justification proposal are: 1) understanding your company's goals, 2) understanding your contribution, and 3) understanding your readers. Return on investment and cost/benefit analysis approaches to a cost justification proposal are discussed. Some formulas for associating cost savings with some tangible SGML benefits are presented."
Note: The above presentation was part of the "SGML Business Management" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "When making a business case for SGML, one of the key arguments is justifying the cost for the transition to SGML. This presentation is designed to help you justify the cost of implementing SGML whether your objective is to support multiple outputs or to re-engineer your information production processes. This presentation covers the measurable benefits in detail, discusses the unmeasurable benefits of SGML, and provides suggestions for preparing your argument."
"A cost justification proposal is more than just a series of line items with related costs. It is an important sales tool - one that will help ensure your success as you compete against others for funding. Authors of successful cost justification proposals have three things in common. First, they have a clear understanding of their company's short- and long-term goals and objectives. Second, they have identified areas where new practices (or processes) and new technologies can contribute to reaching these goals. Third, they have adopted a production process vocabulary set to describe the changes and how these changes will benefit their customers."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
The author reviews David Megginson's book Structuring XML Documents. This book appears in the Charles F. Goldfarb Series on Open Information Management, [Subseries:] The Definitive XML Series from Charles F. Goldfarb. Hahn has some praise for the book, but thinks a more appropriate title would have been Evaluating and Using DTDs. Hahn also offers some general criticism of the books in the Goldfarb series.
The successful book
Extract: "The new HyTime standard has two parts: one covering the domain of HyTime per se (for describing time-based, hypertextual multimedia, with modules for linking and addressing, event schedules, and rendition) and another covering the SGML Extended facilities, essentially a number of separate, but important, specifications, one of which is the Architectural Form Definition Requirements. Architectural forms are building blocks in deriving documents or specializing generic architectures (similar to a superclass in object-oriented terms).
"Whereas XML is designed for simplicity, with design goals rather easily met even when starting from scratch, the standards' new facilities require product vendors to meet specific requirements at a rather profound level of complexity within the standards. Grove-based document processing defines steps for grove construction, interpretation, and providing results of such processing. Addressing development of this functionality typically requires person years. Luckily, much of this new SGML/DSSSL functionality is already implemented in the freely available sp parser and Jade DSSSL engine, respectively, both from James Clark (http://www.jclark.com/)."
Abstract: "SGML has celebrated 10 years as a standard, and although the standard is only now being revised, the use of SGML has evolved over time. This paper explores some of the features that have made SGML successful, discusses the importance of adopted conventions, and speculates on future applications as SGML transitions into the next century."
"As an international standard, SGML is subject to orderly, voted-upon change. Already a decade in adoption, it is due to be revised. In many ways, the standard was farsighted in its design -- a fact confirmed by it being applied well beyond its original publishing design intentions, and in becoming the foundation of promising standards such as ISO 10744 HyTime. Even the long delay in completing the companion standard ISO 10179 DSSSL has not significantly slowed SGML's rise to prominence. However, even if the standard itself has yet to change, the use of SGML has made a number of transitions.
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
Abstract: "Two key measurements of the success of any SGML system are cost reduction and user satisfaction. This paper examines implementation details that can affect ease of training, ease of use, and overall productivity; and suggests various techniques for enhancing both productivity and user satisfaction. It also looks at how to take advantage of emerging technologies to provide additional capabilities for leveraging and reusing information."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
Summary: The author discusses the use of the ISO 12083 DTD (SGML) by the AIP, and the transformation of SGML-encoded physics documents into HTML for publishing on the Web. Math equations are pre-processed into images for inclusion into HTML documents.
The 18th Annual Meeting of the TeX Users Group was held July 28 - August 1, 1997 at the Lone Mountain Conference Center, San Francisco, California. An entire day was devoted to "The Web and SGML", including a presentation on DSSSL by Jon Bosak, and one on 'The TeX backend for Jade' by Sebastian Rahtz. For more on SGML/XML and TeX, see the dedicated database entry and the topical bibliography listing.
Abstract: "ADAPT is a document processing system that automatically builds full-text databases from document images. The major components of the process are scanning, image segmentation, optical character recognition (OCR), layout object identification, and database building. A retrieval system and a user interface complete the functionality. The system features a general document representation that includes the document image and an SGML-tagged version. Standards are adhered to where applicable."
Abstract: Concerns a document processing model accounting for aspects of an activity usually called formatting. The core of the model, an experimental formatting language called FFL, is the central topic. FFL is a purely functional language in the style of FP and the applicative part of APL. Sequences, characters, and so-called boxes constitute the data types, and among the built-in primitives are functions for aligning/spacing, breaking, etc. Emphasis is put on presenting the language and exemplifying its use. Also considered are issues in type checking of formatting function definitions and techniques for doing incremental formatting with FFL formatting functions. FFL is currently being implemented by the BENEDICK project group led by the author.
Abstract: The heterogeneous aspect of data description in medical/health care (e.g. coding character, terminology, units, data record format, data transfer format, and so on) is a major impediment to acquiring knowledge from medical/health information systems automatically. There are some efforts being made to standardize interchangeable data and knowledge representation, but their focuses are different from each other. To overcome these bottlenecks, a meta-syntax approach should be introduced to describe the data structure. In this paper, medical/health data is considered as a readable document, which plays a key role in applying document processing techniques that are used in other fields (e.g. computational linguistics, electronic publishing, electronic data interchange for administration, commerce and transport, and so on) to medical/health data processing. SGML (Standard Generalized Markup Language) is used as the meta-syntax description language to describe the data structure and its attributes, by which the computer can parse messages transferred on networks in order to extract useful information for the system. This paper describes our meta-syntax approach, and aims at serving as a catalyst to stimulate research in automatic knowledge acquisition with standardization of data.
[Extract:] "The National Institute of Japanese Literature (NIJL) has been designing, building, managing, and maintaining databases on Japanese classical literature for academic researchers both in Japan and in foreign countries. The NIJL's database system comprises a computer and inter-network, and provides three catalogue databases (i.e., the Catalogue of Holding Microfilms of Manuscripts and Printed Books on Japanese Classical Literature, the Catalogue of Holding Manuscripts and Printed Books on Japanese Classical Literature, and the Bibliography of Research Papers on Japanese Classical Literature). . . There are several languages or standards for describing text structures, including SGML (Standard Generalized Markup Language), TeX, PostScript, and ODA (Open Document Architecture). Among these, SGML is the only language that can describe the logical structure of text. As it is established as an ISO and JIS (Japanese Industrial Standard) standard, many applications have been developed. At present, we are reconstructing the catalogue databases and full-text databases. Both kinds of data can essentially be considered as nested string fields of variable length. SGML can describe complicated text structures such as repeating groups, nests, order of appearance, and number of appearances. If a data search is regarded as 'a search for a specific string in text data,' constructing a database system that uses a string-searching device is possible. Actually, in research on Japanese literature, search by string is more common than search by numbers. Meanwhile, fast string-search devices and software are being developed and sold; all of these products are capable of handling SGML data. Consequently, we have done some projects based on SGML."
Abstract available online in HTML format: "A Digital Library System for Japanese Classical Literature", by Shoichiro Hara, Hisashi Yasunaga; [archive copy]. See also the main database entry for the National Institute of Japanese Literature.
Additional information on the ACH-ALLC '97 Conference is available in the SGML/XML Web Page main conference entry, or [August 1997] via the Queen's University WWW server.
Abstract: "This paper describes our study on the text data description rules for Japanese classical literature. We investigated the various functions needed for text data description by analyzing Japanese classical materials. As a result, we have defined and developed rules with three functions, calling these the KOKIN Rules. Many Japanese classical texts have been electronically transcribed using these rules. We have evaluated their applicability, especially to databases, CD-ROMs, and publishing. Recently, as SGML has become a popular markup language, we have conducted a study of conversion to SGML-compliant text. A full-text database system has been produced based on a string search system conforming to SGML."
The document is available online in HTML format; [archive copy, text only]. See also the main database entry for the National Institute of Japanese Literature.
The authors discuss the National Institute of Japanese Literature (NIJL). SGML is being used for the encoding of information in the electronic texts and catalogs.
"Abstract: The relatively new international HyTime standard (ISO 10744) introduced the notion of architectural forms. With architectural forms, SGML elements can be classified by means of #FIXED attributes as belonging to some class. In HyTime, architectural forms are used as a basis for processing hypermedia documents, but their use is not limited to that.
"The International Committee for Accessible Document Design (ICADD) was formed to help in making printed materials accessible to people with print disabilities (e.g., people who are blind, deaf-blind, or otherwise reading impaired). The ambition of ICADD is that documents should be made available to people with print disabilities at the same time as, and at no greater cost than, they are made available to people who can access the documents in traditional ways (usually by reading them on pages of paper). This ambition presents a significant technological challenge.
"ICADD has identified the SGML standard as an important tool in reaching its ambitious goals, and has designed a DTD that supports production both of "traditional" documents and of documents intended for people with print disabilities (e.g., in braille form, or in electronic forms that support speech synthesis).
"ICADD is aware that it is unrealistic to expect document producers and publishers to use the ICADD DTD directly for production and storage. Instead a "document architecture" has been developed that permits relatively easy conversion of SGML documents in practically any DTD to documents that conform to the ICADD DTD for easy production of accessible versions of the documents. The architecture is based on ideas that are quite similar to those of HyTime architectural forms."
"The approach of ICADD is interesting, not least because it illustrates that document portability and exchange in SGML can be achieved by other means than standardizing on a single DTD in the exchange domain. In ICADD, portability is achieved by specifying mappings onto a standardized DTD."
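The #FIXED-attribute technique behind this architecture can be sketched in a short DTD fragment. The element names below are invented, and the SDAFORM attribute follows the pattern of the SDA (SGML Document Access) fixed attributes later included in the HTML 2.0 DTD; treat the fragment as an illustrative sketch, not ICADD's actual declarations:

```sgml
<!-- Sketch: mapping elements of an arbitrary DTD onto ICADD
     element classes via #FIXED attributes, in the manner of
     HyTime architectural forms. Element names are illustrative. -->
<!ATTLIST chap-title
          SDAFORM  CDATA  #FIXED "h1"   -- treat as ICADD heading --
>
<!ATTLIST body-text
          SDAFORM  CDATA  #FIXED "para" -- treat as ICADD paragraph --
>
```

Because the mapping lives in fixed attribute values rather than in the element names themselves, a braille or speech-synthesis production tool need only read the SDAFORM values to process any conforming document as if it were an instance of the ICADD DTD.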
The document is available online in HTML format: "Document processing based on architectural forms with ICADD as an example" [mirror copy, December 1995]. For further details on the Conference and BeLux, see the contact information for SGML BeLux.
Abstract: "Information Providers add value by enabling access to information. On-line Information Providers add value by adding markup which enables access via a search engine. Improving search precision and recall is directly related to the match between the language used in a search query and the collection of searchable information. Markup that enables construction of the search query using the language of the collection is lexical and semantic, not structural. SGML parsers allow us to validate structural markup but cannot be used to determine the quality of lexical and semantic markup.
"Natural language is powerful because the rules regarding lexical and semantic information are flexible and mutable. However, the recognition and marking of lexical and semantic information is more difficult, and will be less robust than the recognition of structural and syntactical information. We, as Information Providers, need to make sure that our markup systems maintain a high level of quality.
"The manufacturing industry uses Statistical Process Control to monitor and maintain the quality of their products. This paper demonstrates how Statistical Process Control can be used within an information manufacturing system to monitor and maintain the quality of the processes adding search enabling markup."
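The control-chart idea in the abstract can be sketched in a few lines of Python. This is a minimal sketch assuming per-batch error rates; the data, the function names, and the 3-sigma rule are illustrative, not taken from the paper:

```python
# Sketch: Shewhart-style 3-sigma control limits for markup error rates.
# Assumption: each number is the proportion of sampled elements in a batch
# whose search-enabling (lexical/semantic) markup was judged incorrect.
# The data and the 3-sigma rule are illustrative, not from the paper.

def control_limits(baseline):
    """Compute (lower, upper) 3-sigma control limits from baseline batches."""
    n = len(baseline)
    mean = sum(baseline) / n
    sigma = (sum((r - mean) ** 2 for r in baseline) / n) ** 0.5
    return max(0.0, mean - 3 * sigma), mean + 3 * sigma

def flag_batches(baseline, new_rates):
    """Indices of new batches whose error rate falls outside the limits."""
    lo, hi = control_limits(baseline)
    return [i for i, r in enumerate(new_rates) if not (lo <= r <= hi)]

if __name__ == "__main__":
    baseline = [0.02, 0.03, 0.025, 0.02, 0.03, 0.025]
    print(flag_batches(baseline, [0.03, 0.08, 0.02]))
```

A batch flagged as outside the limits signals that the markup process itself, rather than ordinary random variation, has changed and should be inspected.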
This paper was delivered as part of the "User" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CD-ROM were produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "Xerox Corporation has developed an integrated electronic documentation system for field technicians to use at customer sites for diagnosis and repair of equipment. The electronic documentation system lets users access large technical documents that include text, graphics, video, and sound. Documents interact with the equipment being serviced to provide the reader with information relevant to the current situation. This article discusses how the Standard Generalized Markup Language (SGML) was used, user interface requirements and how they are addressed in our system, multimedia implementation techniques, integration strategies with other service tools, and the technologies employed in the system."
Available online: http://www2.rpa.net/~harmison/IEEE2.html; [local archive copy]. This article is part of a special issue of IEEE Transactions on Professional Communication (with an introduction by Jonathan Price): "Structuring Complex Information for Electronic Publication."
See the main entry for the RIDDLE Project. See also the full bibliographic entry for deliverable #4, RIDDLE Project. Translation of Contents Pages to On-Line Library Catalogue Format. Another abstract of the article by Mary Mallery is also online.
Hayter argues that Kaelbling's "improvements" to SGML (see the reference to Kaelbling) are based upon a misunderstanding of the intent of the SGML standard. Kaelbling's original draft known to Hayter was apparently that of 16-March-1988; Kaelbling's revised draft of 18-October-1988 responds to Hayter's comments.
Abstract: A frequently cited problem with the Standard Generalized Markup Language (SGML) is that applications using the standard have been slow in arriving. Part of this delay is because of the instability of the standard and part because of constructs of the language that are functionally redundant and/or add unnecessary complexity to both machine and human processing. This paper is based on our experience implementing an SGML parser using commonly available tools for building programming language translators. It describes the problems we encountered and suggests modifications to SGML to eliminate those problems. The modified language can be implemented using well tested tools and will be more stable and more amenable to both computer and human processing while maintaining all of the fundamental strengths of SGML.
See similarly, by the same authors, "Difficulties in Parsing: Suggestions to Improve SGML," <TAG> 10 (July 1989) 8-10.
"This paper intends to review a number of metadata formats in order to highlight their characteristics. The comparison will be done in the context of the requirements of bibliographic control, with reference to the suitability of the various record formats for this purpose. The author is a researcher working at UKOLN on the ROADS (Resource Organisation and Discovery in Subject-based services) project, part of the eLib Electronic Libraries Programme, and a special concern is to establish a comparative context in which to discuss the IAFA template which is being used in that project. The choice of formats for comparison has been limited due to practical considerations. The formats chosen for consideration (MARC, IAFA templates, TEI headers and URCs) were chosen for their particular relevance to those working within the UK eLib projects. Other formats such as GILS (US Government Information Locator Service) and Harvest SOIF (Summary Object Interchange Format) would also merit investigation in the future." [from the Introduction]
See the online version: Review of Metadata Formats, by Rachel Heery; "looks at issues surrounding the use of several metadata formats (MARC, TEI headers, IAFA templates, Dublin Core)" [mirror copy].
A detailed, informative report on the SGML-Europe '95 Conference. The document is available online ["Report on the conference SGML-Europe 16-19 May 1995, Gmunden"]; mirror copy.
"Abstract: The article describes the basics of the Standard Generalized Markup Language (SGML), a language for document representation. It concludes that: SGML is an international standard; SGML is system independent, so documents can be transferred from one computer system to another of a different manufacturer; SGML is device independent, so documents may be output to a variety of devices; SGML is language independent, usable for Latin-based and non-Latin-based alphabets; an SGML document may include any form of information; in SGML everything is governed by rules; SGML is a representation language, able to represent any type of document."
"Abstract: A simple notation for describing the internal structure of a document is presented, and contrasted with other, more conventional notations for describing documents, in particular those related to subject classification systems and document description for bibliographic purposes, as well as with document metalanguage codes such as those of SGML. It is suggested such a notation should assist the science of human messaging through (1) permitting hypotheses to be more readily expressed and/or tested concerning document structure, and (2) facilitating the formation of taxonomies of documents based on their structures. Such a notation should also be of practical value in contributing to the processes of document specification, building and testing, and could possibly also contribute to new generations of information retrieval systems which link retrieval against record databases to the search systems internal to specific documents. It is suggested that, following formative criticism, professional standards for describing document structure should be sought based on the notation. The notation is at present limited to linear documents, but extensions to it to accommodate documents in nonlinear form (e.g. hypertext documents) and/or existing in physically distributed form, could usefully be constructed. Examples of the application of the notation are provided."
"The CGM Handbook is an essential companion to the CGM standard, and an invaluable resource for anyone using or implementing the CGM. A metafile is a mechanism for retaining and transporting graphical data which contains a description of one or more pictures. The CGM is an international standard format for 2-D computer graphics storage and exchange of images. The CGM Handbook provides ample coverage of this rapid-growth area of computer graphics and will be of interest to anyone interested in CGM. CONTENTS: Is CGM For Me? Graphical Data Storage Concepts. CGM in Context. Does It Work and Who Uses It? Overview of the CGM Standard. CGM History. CGM Functionality: CGM Functional Overview. Lines and Line Attributes. Filled Areas. Text, Text Attributes and Fonts. Markers, Symbols and Segments. Raster Primitives. Further CGM Features. Beyond the Standard Elements. CGM Encodings: Overview of CGM Encodings. Clear Text Encoding. Binary Encoding. Character Encoding. Implementing the CGM: What to Implement. Profiles. CGM into Documents. File Transfer Considerations. CGM Testing. Looking Ahead. Element Reference. Appendixes. Bibliography. Index." [publisher's blurb]
Abstract [from GCA]: "The Computer Graphics Metafile is an international standard format for 2-D computer graphics storage and exchange of images. The CGM Handbook is the definitive work on the CGM for implementors, managers and consumers, explaining its context, role and technology. The book addresses CGM:1992 which is the revised CGM standard, offering extended capabilities resulting from requirements studies with input from engineering, technical documentation and graphic arts areas. It describes the strategic, management and implementation decisions and issues surrounding the use of the CGM, guiding the reader through an understanding of the encodings and programming issues. The highlighted "What's New and Different" sections point out the significant additions and changes to the original CGM:1987. (446 pages, hardcover, 1993)"
For other information on CGM, see the main database entry for Computer Graphics Metafile. See also: CGM in the Real World. Edited by Anne M. Mumford and M. W. Skall. Eurographic Seminars. Berlin: Springer Verlag, [November] 1988. ISBN: 0387192115.
"Abstract: Many writers of technical documentation must consider two different presentation media, namely traditional printed books and electronic forms. This appears to be a long-term situation, not a transitional phase: for some reading tasks, hard copy will be preferred, but for others, electronic copy will be preferred. In some settings, it is thus necessary to prepare material that is of high quality in both media, often with the constraint that a single source file be used. The problem is to specify the structure of a text so that whether it is printed or deployed electronically, neither version contains textual problems caused by its dual role. Several examples are presented to show how a writer's structuring intentions can be effective in hard copy but not in electronic copy. The difficulty of preserving structuring intentions in both media stems from declarative markup languages that are rhetorically impoverished. While standard markup languages can be used to specify what text elements comprise a text, they cannot be used to specify the intended roles of the text elements. To preserve structuring intentions, it is proposed that a rhetorical markup language is needed. Two potential advantages of such a language are improved media-transferability and improved visibility of text structure."
"The key element to full integration, however, is our expectation that our finding aids will be thoroughly encoded using the DTD developed in this project. In this form they can be fully realized in this information system with a full array of external and internal linkages. Although SGML search engines for the Web are in the early stages of development, there are already tools in use at the University of Virginia Electronic Text Center that can convert SGML to HTML "on the fly," as it were. While this is obviously only a temporary solution (after all, why go to the trouble to develop rich SGML encoding only to dilute it?), it seems clear, as I just noted, that the Web is evolving in the right direction." [extracted]
Available online in HTML format: [mirror copy]. For more on the conference, see information on the Sunsite WWW server.
"Abstract: In this paper we outline how the use of design principles had a beneficial impact on the modelling of the hierarchy of information, object information, meta-information and shape-information in electronic documents."
The document is available online in HTML format: "The IMAP-DTD" [mirror copy, text only, December 1995]. For further details on the Conference and BeLux, see the contact information for SGML BeLux.
Abstract: The author provides a "discussion of practical experiences in using SGML and HyTime for publishing to different media (hardcopy, browser, HTML) using software such as Synex ViewPort and FrameMaker+SGML."
Summary: "We believe that we have built a product which couldn't have been built without using SGML and HyTime. . . SGML helped us in strictly separating content from formatting. The advantage of this approach is that you can use the formatting rules which are best adapted to the characteristics of the final output medium. SGML also helped us in generating the content best suited to the final output medium. Thanks to HyTime we have the incredible power of independent linking, so we can offer precisely those links that are targeted towards a specific audience. . . On the other hand, we are now confronted with severe problems. Those problems are mainly a result of the use of HyTime. HyTime's syntax is a disaster, editing independent HyTime links with the current SGML editing tools is a masochistic activity, and managing those links afterwards quickly becomes a nightmare."
Available online in HTML format: "QUESTOR: Publishing social law to different media"; [mirror copy]. For further information on the conference, see: (1) the description in the conference announcement and call for papers, and (2) the full program listing, or (3) the main conference entry in the SGML/XML Web Page.
Description of the SGML BeLux conference program, and call for papers. SGML BeLux is the Belgian-Luxembourgian chapter of the International SGML Users' Group. Contact: +32-16-23-59-54.
The annotated ISO 12083 is an electronic book for Microsoft Windows (R). It includes the complete text of the ISO 12083 standard in electronic format and a history of the development of the standard. Explanations of the more complex parts of the standard are provided, including how to modify ISO 12083 to meet your specifications while remaining within the parameters of the standard. Examples of marked-up mathematical formulas are given. The electronic book is delivered through Electronic Book Technologies' SGML browser DynaText. The package includes an installation guide and online help. Requires Windows 3.1 and 4 MB RAM. [adapted from the description in the NISO Press catalog]
The author of Practical SGML answers some common questions about ISO 12083.
Eric van Herwijnen, author of Practical SGML, answers questions about ISO 12083. In this issue of ISQ: "Formatting tables continues to be problematic. Can you give me some pointers?"
This is one of the best introductory texts on SGML currently available [February 1995]. An online text file with Foreword, Preface, and Table of Contents for the second edition of the book is available here, or via FTP from world.std.com.
An electronic version of the book is also available; see The SGML Tutorial, below.
Reviews of the second edition: (1) by Nico Poppelier, (2) by Harry Gaylord: see "Review: Eric van Herwijnen, Practical SGML Second Edition", also mirrored as a copy on the local server.
First edition information: Dordrecht/Hingham, MA: Wolters Kluwer Academic Publishers, 1990. 200 pages. ISBN: 0-7923-0635-X. Reviews of the first edition: (1) by Carol Van Ess-Dykema in
SGML tutorial on computer disk. Requires IBM-PC or compatible, Microsoft Windows 3.0 or 3.1, VGA or SVGA monitor (color or monochrome), 3.5 MB hard disk space.
The SGML Tutorial is an electronic hypertext version of Eric van Herwijnen's book Practical SGML. It includes exercises in which SGML structures authored by the user are checked by the SGML parser.
"Abstract: Guidon is OCLC's graphical interface to Electronic Journals Online. The World Wide Web (WWW) is a system that OCLC and many other institutions use to offer services over the Internet. We have extended a prototype version of Guidon so that it can display HyperText Markup Language (HTML) documents and function as a WWW browser. In addition to letting Guidon view Web pages, HTML offers several powerful features that could be integrated into OCLC's electronic journal services, such as a flexible forms capability."
The document is available via Internet on the OCLC WWW server [or in mirror copy, June 1995, text only].
"Abstract: Virginia Tech is one of the universities participating in Elsevier's TULIP project. OCLC is supplying the software to electronically distribute the collection to their campus. For this purpose, we have modified the Guidon electronic journal interface client software to display page images and have written a program to load Elsevier's data into OCLC's Newton text retrieval program. To make working with image databases as natural as possible, we have made a number of enhancements, including "thumbnail" page images and a magnifying option in Guidon."
The document is available via Internet on the OCLC WWW server [or in mirror copy, June 1995, text only].
Abstract: "In a previous article, S.J. DeRose et al. (1990) stated that there are now clear signs that OHCO based text processing will soon be reaching the general text processing markets. The authors did not mean that millions of office workers and school children would be learning to type tags into their documents. Rather, they were predicting that new WYSIWYG editors would make content based markup languages transparent and easy to implement. Once these editors made content based coding as simple as using a word processor, the text producing world would give up word processing, which treats text as a stream of characters, and come to see text as it really is: an Ordered Hierarchy of Content Objects (OHCO). In 1997, it is certainly true that content based markup systems (usually referred to as descriptive markup) have become familiar to those who create or produce large, complex documents for a living. However, to the rest of the world, that is, in the thousands of offices and schoolrooms in which vast amounts of text are being produced and distributed for a variety of purposes, descriptive markup systems remain largely unknown. This may be partly because WYSIWYG editors for SGML markup systems are not widely marketed. However, the real obstacles preventing descriptive markup systems from penetrating the general text processing market may be more complex."
The article is a response (commentary) on the publication of DeRose (et al), "What is Text, Really?" reprinted from Journal of Computing in Higher Education 1/2 (Winter 1990) 3-26.
This article appeared with four others in a special issue of JCD which focused upon 'the OHCO model of text [ordered hierarchy of content objects]'. The Journal of Computer Documentation (JCD) is a quarterly publication of the Association for Computing Machinery's Special Interest Group on Systems Documentation [SIGDOC]. Editor in Chief: Tony R. Girill, Lawrence Livermore National Laboratory and University of California.
[Published] Abstract: The Center for Electronic Texts in the Humanities (CETH) was established in 1991 by Rutgers and Princeton Universities to provide a national focus for those who are involved in the creation, dissemination, and use of electronic texts and resources in the humanities. These resources may be literary works, historical documents, manuscripts, papyri, inscriptions, transcriptions of spoken texts, or dictionaries, and they may be written in any natural language. Electronic texts become much more useful when additional information, such as author, title, chapter, or features such as quotations and proper names are marked in some way. There are at least thirty different methods of encoding such features, but a new common format developed by the Text Encoding Initiative (TEI SGML) is emerging. A further issue to be addressed is that many existing texts also suffer from inadequate documentation and unclear copyright situations.
"Abstract: Electronic texts have been used for research and teaching in the humanities ever since the end of the 1940s. This paper charts the development of various applications in literary computing including concordances, text retrieval, stylistic studies, scholarly editing, and metrical analyses. Many electronic texts now exist as a by-product of these activities. Efforts to use these texts for new applications led to the need for a common encoding scheme, which has now been developed in the form of the Text Encoding Initiative's implementation of the Standard Generalized Markup Language (SGML), and to the need for commonly used procedures for documenting electronic texts, which are just beginning to emerge. The need to separate data from software is now better understood, and the variety of CD-ROM-based text and software packages currently available is posing significant problems of support for libraries as well as delivering only partial solutions to many scholarly requirements. Attention is now turning to research towards more advanced network-based delivery mechanisms."
Another abstract for the article is available from ETEXTCTR Review #2 (Jerry Caswell).
The article is part of a special issue which "focuses on the presentations of a program session on Internet-accessible scholarly resources held at the 1996 ACLS Annual Meeting." The issue theme is entitled "Internet-Accessible Scholarly Resources for the Humanities and Social Sciences." One section of Hockey's paper is on the Standard Generalized Markup Language (SGML) and the Text Encoding Initiative (TEI). She reports on The Model Editions Partnership and the Orlando projects, both of which use SGML encoding.
According to David Green (the editor)'s summary of the challenge Hockey addresses: "This challenge operates on several levels: first to be aware of the immense added value given to text that is encoded or "marked up" using Standard Generalized Markup Language (SGML). SGML, through its tagging system, provides a solid basis for describing the structural characteristics of a text (or object) and its content, irrespective of hardware or software platforms. It thus makes the text easier to query and more productive to analysis. Encoding in SGML requires a thorough analysis of the intellectual issues involved in creating an electronic object: what it is and how it may be used. SGML-encoded text can only be read in a dumbed-down version through the HyperText Markup Language (HTML) of the Web as we know it today. So there is urgent need for a more sophisticated post-Web Internet medium."
The article is available online in HTML format: http://www.acls.org/n44hock.htm; [mirror copy].
The document (a computer file) is available via the LC WWW server, or as a copy mirrored on the local Web server. See a summary of the LC seminar by Sarah E. Thomas (Director of Cataloging). Internet access to the proceedings volume's Table of Contents is in the document http://lcweb.loc.gov/catdir/semdigdocs/seminar.html.
Summary: "The chapter 'Textual Databases' [provides] a discussion of the generation, maintenance, and study of large text corpora. The availability of data collections like the Brown and LOB corpora has dramatically changed many areas of language scholarship (see for example the chapter by Bayer et al. in this volume). This chapter describes what corpora are, where they can be accessed, how they are annotated, what the various types of markup communicate, and what software is available to manipulate them. SGML and the work of the Text Encoding Initiative are concisely explained; and, in sum, the article represents a succinct and authoritative overview useful to anyone wishing to use electronic texts in their teaching or research." [editor's summary]
See also the online appendix which updates this chapter and provides links to network resources.
An introduction and overview of the book may be found on the Routledge web site and [provisionally] at the University of Michigan. An online Table of Contents is provided, as well as online appendices for each chapter in the book.
[Extract:] "The Orlando Project, based at the Universities of Alberta and Guelph, is producing the first full scholarly history of British women's writing in printed and electronic form. Under the direction of Patricia Clements, the Orlando team comprises five co-investigators, two post-doctoral fellows, a project librarian and eight graduate research assistants. Orlando has received a SSHRC Major Collaborative Research Initiative grant for $1.6 million over five years beginning in July 1996, and is also supported by the Universities of Alberta and Guelph. Orlando is using computing technology and SGML at all stages within the project, but one of its key contributions to humanities computing methodologies is the use of SGML to encode interpretive information in basic research notes as the research is being carried out. The authors of the printed volumes will draw on this database of SGML-encoded information as they write. The database will also be used to create a number of hypertext products for research and teaching."
Abstract available online in HTML format: "Orlando Project: Humanities Computing in Conversation with Literary History. (Session)", by Susan Hockey, Terry Butler, Susan Brown, and Sue Fisher; [archive copy]. See also the main database entry for The Orlando Project: An Integrated History of Women's Writing in the British Isles.
Additional information on the ACH-ALLC '97 Conference is available in the SGML/XML Web Page main conference entry, or [August 1997] via the Queen's University WWW server. See: the Orlando database entry.
[Describes TEI-SGML.] Abstract: Although the value of corpus-based research has been recognized since the compilation of the Brown and LOB corpora in the 1960s, the overall picture today is still one of access to texts being provided in many different ways, some of which are ad hoc and dependent upon individuals. Attention has thus turned to the need for reusable corpora and the establishment of procedures to guarantee that reusability. In the longer term we see the library as the place that will maintain and provide access to electronic texts and corpora, as it already does for print and other archival media. The Text Encoding Initiative's guidelines will play an important role in standardizing corpus-access procedures, in particular the TEI's proposal for an electronic text file header which will ensure that adequate information is available about the text and will provide the link with the library catalogue. We see a further need for detailed studies of the 'uses and users' of electronic texts and for research to establish a sounder methodology for the compilation of corpora.
The author describes a program and supporting methodology used within the English-Norwegian Parallel Corpus project. TEI/SGML encoding is used for the texts prepared for analysis.
Available on the Internet in Postscript format: ftp://www.hd.uib.no/pub/corpora/enpc.allc.ps [mirror copy]. For further details, see the main entry for the English-Norwegian Parallel Corpus.
"Abstract: Gipsy is an interactive document processing system based upon syntax-directed editing, where a document is viewed as an abstract syntax tree derived over a document grammar. The outer structure of the document is described by means of a formatting rule for each production in the document grammar. These rules are written in a language based on a box and glue concept. Two kinds of users exist: grammar designers and ordinary users. To allow the user to control the formatting, the grammar designer can use inherited attributes when writing the formatting rules. The values of these variables can be set by the user."
Abstract: "Excitement around XML is running high. This paper explores both the roots of XML and the market dynamics which are driving the intense excitement around this emerging standard. It is not the technical merits which will ultimately decide XML's popularity and acceptance - it will be the extent to which emerging applications properly implement XML support to provide significant solutions to important problems. Generating well formed XML does not constitute meaningful XML support. This paper attempts to give a perspective on the value that XML support should bring; XML cast as HTML++ or as SGML-lite will not deliver on the promise of XML. This paper attempts to explain why this is so and why you should care."
This paper was delivered as part of the "Newcomer" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CD-ROM were produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "A document is a written artifact. It is a designed linguistic structure rendered in a relatively stable visual medium. As a linguistic structure it has two aspects: the message itself, constructed by a writer to convey some ideas, and the notation used to express those ideas. Therefore, to understand documents, one must understand notations and how their design works in a visual medium to present a writer's message. Creating a well-motivated language for describing notations provides insight into their workings and into the construction of documents written in them. It provides writers with a logical and uniform view of their documents, serves as a conceptual tool for the rational design and analysis of notations, and becomes the basis for the creation of an integrated suite of tools for document production. Such a language for describing documents and their notations defines a document model.
This dissertation sets out a general document model. It weaves together ideas from three strands: grammar-driven structure editing, text-formatting, and linguistics. Two linguistic principles guide the overall structure. First, a document is a text plus a grammar, so the visible form of a document can be changed either by altering the text or by changing the grammar rules. Second, visible marks are not random. They exist for the express purpose of providing clues to the logical structure of the document. Editing a grammar can produce principled changes in the visible form of a document in a way that editing simple visual markup does not. Combining this observation with conventional wisdom from the other two strands leads to the division of the grammar into three parts. Abstract syntax governs logical structure, abstract geometry governs visual structure (layout), and concrete syntax mediates between them, specifying how logical elements are to be marked visually. The model uses an extension of the operator-phylum model for the abstract syntax, a generalization of boxes-and-glue for the abstract geometry, and a functional description based in part on linguistic marking theory for the concrete syntax."
Note: I have not yet seen this book. A publisher's blurb is [was 980409] available on the McGraw-Hill web site.
Abstract: "In May 1994, the Center for Electronic Texts in the Humanities (CETH) sponsored an invitational workshop on documenting electronic primary source materials in the humanities. The goal of the workshop was to work toward a clearer understanding of the relationship between the TEI header, the MARC record, and the current international cataloging rules, with an objective of establishing how far they meet the needs of scholars, librarians, publishers and software developers who work with these materials."
Also available online: CETH Newsletter, Fall 1994/Summary of CETH Workshop; [or mirror copy].
Abstract: "This article discusses a method by which documents marked up using Standard Generalised Markup Language (SGML) can be used to generate a database for use in conjunction with the World Wide Web. The tools discussed in this article and those that were used in experiments are all public domain or shareware packages. This demonstrates that the power and flexibility of SGML can be utilised by the Internet community at little or no cost. The motivation for this work stems from the lack of standardisation on display techniques for SGML presentation."
"The motivation for this study has arisen from work carried out as part of the Electronic SGML Applications Project ELSA (Electronic Library SGML Applications) at the IIELR, De Montfort University. The ELSA project is concerned with the investigation of the use of SGML as a method of delivering scientific journal articles for use in an electronic library environment. The SGML articles were provided by Elsevier Science." Tcl is used.
The document is available in electronic format on the Internet: http://www.ukoln.ac.uk/ariadne/issue6/sgml/intro.html, [mirror copy].
"Abstract: An important development in the processing and formatting of text has been the creation and use of markup languages, especially with the increased interest in electronic publishing and the Internet. An area being given particular attention has been the use of descriptive markup languages, which allow one to describe a text element or document in a way which is independent of its final output and form. One area which deserves greater attention in this regard is the creation of survey questionnaires, and any comprehensive markup language standard should include markups for supporting this application. The paper examines this need, and explains why a markup language approach would properly support the survey application and how it would extend the utility of the markup approach. A set of markups for survey creation are proposed which would serve as extensions to existing markup standards. The advantages and benefits of markup command languages as compared to traditional direct manipulation WYSIWYG approaches are also discussed."
Abstract: "This paper discusses one of the tools which may be used for representing texts in machine-readable form, i.e., encoding systems or markup languages. This discussion is at the same time a report on current tendencies in the field. An attempt is made at reconstructing some of the main conceptions of text lying behind these tendencies. It is argued that although the conceptions of texts and text structures inherent in these tendencies seem to be misguided, nevertheless text encoding is a fruitful approach to the study of texts. Finally, some conclusions are drawn concerning the relevance of this discussion to themes in text linguistics."
Abstract: There are many good reasons for adopting SGML as the method of marking up source reference information when the data is primarily of textual form. Once such a decision has been reached, then it is usually necessary to consider two data conversion issues: how to get existing documents (which may exist in several different formats) into a target SGML form (up-translation); how to produce 'documents' from the SGML source data (down-translation). 'Documents' is quoted, because in many cases the desired deliverable form of the information may actually be some type of electronic database rather than a traditional printed document. The author points out some of the issues to be considered in planning and performing these two types of conversion functions; some methodologies for carrying them out; and mentions some of the software tools which can be used for developing specific conversion applications.
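The down-translation process described in the abstract can be sketched in miniature. The element names (article, title, para) and the tag mapping below are hypothetical, chosen only for illustration; a real conversion would use a DTD-aware SGML parser and a complete mapping, not this XML-only fragment.

```python
# Sketch of a "down-translation": descriptive markup mapped to a
# delivery format (here, HTML). Hypothetical element names and mapping.
import xml.etree.ElementTree as ET

SOURCE = "<article><title>On Markup</title><para>First paragraph.</para></article>"

# Each source element maps to an (open, close) pair of output tags.
TAG_MAP = {
    "article": ("<html><body>", "</body></html>"),
    "title": ("<h1>", "</h1>"),
    "para": ("<p>", "</p>"),
}

def down_translate(elem):
    """Recursively rewrite a parsed source tree into the target markup."""
    open_tag, close_tag = TAG_MAP[elem.tag]
    inner = (elem.text or "") + "".join(
        down_translate(child) + (child.tail or "") for child in elem
    )
    return open_tag + inner + close_tag

print(down_translate(ET.fromstring(SOURCE)))
# -> <html><body><h1>On Markup</h1><p>First paragraph.</p></body></html>
```

The same tree-walk with a different mapping table yields a different target medium, which is the core of the one-source, many-outputs argument the abstract makes.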
Abstract: "The Electronic Publishing Solutions department at Northern Telecom (Nortel) transformed product and price publications from paper to electronic media within a short period of time. Electronic publishing radically improved Nortel's ability to control document quality and reduce information time-to-market. This department incorporated many significant production changes, such as:
- The use of Standard Generalized Markup Language (SGML)
- The sourcing of information directly from legacy and new product and price databases
- The distribution of documents in multiple forms, including CD-ROM, Nortel's Intranet, and paper across multiple systems and platforms
Nortel's previous publication production methods required the use of word processors to replicate and edit large product documents. Document publication depended on manual entry via word processors across several departments. Data entry errors and constantly shifting page layout due to changes, updates, and deletions created a vicious cycle of self-generated re-work and ever-expanding schedules. Problems with information accuracy and update timeliness prevented consistent publication and use of the resulting documents.
Publication is now produced directly from an SQL database source using SGML with embedded SQL statements. Both the source and the resultant documents are true SGML documents compliant to ISO 8879 standards. These SGML documents were created without modification of the legacy database. Replacing the existing database structure was not an option because it would have required re-engineering all of the existing processes that use the database. However, by using an internally developed toolset that expands SGML with embedded SQL statements, Nortel is able to produce SGML documents from legacy databases. These embedded SQL queries produce variable-length documents on-the-fly for printing or for display by the common Internet or CD-ROM browser.
Today, using an Internet or CD-ROM browser, Nortel's marketing and production engineers, sales support staff, distribution managers, and external distributors and customers can immediately access accurate product and price information. In addition, on-line access enables users to query and generate live reports dynamically from legacy information so that they can further target desired information. Information is kept up-to-date in an Automated Price Action application that is accessible on the Internet. Product adjustments are introduced for approval via this Internet service, and once approved, changes to product and price databases become instantaneously available for use. Although paper publishing is still required, Nortel anticipates substantial savings in time, labor, and cost by using SGML in a unique way."
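The embedded-SQL approach described in the abstract can be sketched in miniature. Nortel's toolset is internal and undocumented here, so the placeholder element (`<sql query="..."/>`), the table, and the column names below are all invented for illustration; the sketch only shows the general pattern of expanding query placeholders in an SGML template into literal result markup at publication time.

```python
# Sketch of SGML-with-embedded-SQL expansion. The <sql query="..."/>
# placeholder element and the price table are hypothetical, not Nortel's.
import re
import sqlite3


def expand_template(template: str, conn: sqlite3.Connection) -> str:
    """Replace each <sql query="..."/> element with <item> rows from the DB."""
    def run(match: re.Match) -> str:
        rows = conn.execute(match.group(1)).fetchall()
        return "".join(f"<item>{r[0]}: {r[1]}</item>" for r in rows)
    return re.sub(r'<sql query="([^"]+)"/>', run, template)


# Stand-in for the legacy database, left untouched by the publishing step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price (product TEXT, amount REAL)")
conn.execute("INSERT INTO price VALUES ('Widget', 9.95)")

template = '<pricelist><sql query="SELECT product, amount FROM price"/></pricelist>'
print(expand_template(template, conn))
```

Because the queries run at expansion time, the resulting document length varies with the current database contents, which matches the abstract's description of variable-length documents produced on-the-fly.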
Note: The above presentation was part of the "SGML Case Studies" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
The document is available online in HTML format: http://www.agave.com/html/newsworthy/news_nortel.htm; [local archive copy]. See also the Agave Software Design Home Page.