Announcement for a new SIG ("Special Interest Group") on the subject "SGML Design Issues Related to Applications." Topics include: CONCUR, LINK, content model ambiguity, etc. The foundational meeting for the new SIG is May 10, 1995. Contact: Arjan Loeffen at sgmltw@let.ruu.nl, or Tel. +31-30-536417, or Arthur van Horck, Tel. +31-13-662232.
Apparently based upon or being a thesis in the area of applied information science. The work apparently discusses SGML and ODA.
Abstract: "HTML 4.0 has barely been released, but to some of us it is dead on delivery. We're already looking past it to XML, the eXtensible Markup Language, which promises to add much more power, flexibility, and reliability to the web. This article serves as a great introduction to XML and, to a lesser degree, Dynamic HTML (DHTML). The online version of the article links you through to some of the essential documents on XML. If you are interested in the future of the web, listen up. As the authors of this article put it: 'Although it will require developers and user to retool, the migration to XML must begin. The future of the Web depends on it'." [from (c) '-- RT' in Current Cites 9(2) (February 1998) ISSN: 1060-2356.]
Another abstract: "We have a love/hate relationship with HTML. We love its easy learning curve and universality, but we hate its easily broken links and limited formatting. We love its simple and compact syntax, but we hate its rigid formatting and inflexibility. To keep what we love and jettison what we hate, we've scripted it, styled it, tabled it, and framed it. Yet, after more face lifts and tummy tucks than an aging Hollywood star, today's HTML is still just HTML. The broken links and formatting problems are just warts and cellulite that won't go away. It's time to find some new, fresh talent. A few new stars are about to break onto the scene with names like Extensible Markup Language (XML), cascading style sheets (CSS), and Dynamic HTML (DHTML). Each works on a slightly different set of HTML 3.2's problems: XML on helping organize and find data, CSS on Web page inheritance and presentation, and DHTML on dynamic presentation of Web content. Aided by the recent HTML 4.0 refresh, these new technologies will beat back HTML's legacy of too many dead links, slow searches, and static pages on today's Internet and intranets." [from authors? check -9804]
This article is the "Cover Story" in the March 1998 issue of Byte Magazine. Now online: "The features that made HTML so popular are causing the Web to fall apart. What's next?"
"Syntopican XIII served as the host site for the American National Standards Institutes (ANSI) meeting of the American National Standards Committee (ANSC) for Office Systems. ANSC is the advisory group to the international committee on office standards for information processing systems and text and office systems. New standards are needed since office documents are becoming more complex, incorporating practices that were initially developed for printing and publishing applications. Six task groups were formed to address the issues of user requirements, document architecture, procedures for text interchange, content architectures (including requirements for character sets and coding, videotex in office systems and text interchange via magnetic media), text processing languages and user/system interfaces and symbols. ANSC is working to make itself known to users who feel alone with their incompatibility problems. One of its accomplishments has been to facilitate development of Standard Generalized Markup Language (SGML) for those in the printing and publishing industries to edit and mark copy, which is being considered a standard at the international and national levels."
The article is a revision of a presentation given at the GCA-sponsored conference SGML '89 (Atlanta, November 1989).
Abstract: Two important international standards relating to text have emerged. One of these, SGML, describes a framework for descriptive markup. The other, and more recent, deals with a command language interface for full text retrieval. The two standards have been developed in isolation from one another and the command language can handle only the conventional view of text and not the relatively complex structures implicit in descriptive markup. It is shown how a relatively simple syntactic extension to the command language enables it to be applied to SGML databases. Some implementation issues are also discussed.
Abstract: Descriptive markup languages provide a mechanism for specifying the structure of a document. The basic premise of the work described here is that structure is an important characteristic of a document and is something more than a layout specification. For this reason, it appears important that retrieval tools should be developed which can take advantage of structural knowledge. In this paper, a query language is described which provides such a capability. The underlying implementation strategy is also discussed. [Funding: Supported by the Natural Sciences and Engineering Research Council of Canada.]
Abstract: "Descriptive markup languages provide a mechanism for specifying the structure of a document. The basic premise of the work described here is that structure is an important characteristic of a document and is something more than a layout specification. For this reason, it appears important that retrieval tools should be developed which can take advantage of structural knowledge. A query language is described which provides such a capability. The underlying implementation strategy is also discussed."
See a previous version of the document in the Queen's University technical report.
This introductory article was published in an SGML special issue of Computer Standards & Interfaces [The International Journal on the Development and Application of Standards for Computers, Data Communications and Interfaces], under the issue title SGML Into the Nineties. It was edited by Ian A. Macleod, of Queen's University.
The journal
Articles include: David T. Barnard, Lou Burnard, and C. Michael Sperberg-McQueen, "Lessons from Using SGML in the Text Encoding Initiative"; Bart Bauwens, Filip Evenepoel, and Jan Engelen, "SGML as an Enabling Technology for Access to Digital Information by Print Disabled Readers"; Franz Burger and Sigfried Reich, "Design and Implementation of an Abstract SGML Interface in Smalltalk"; Patricia Francois, "Generalized SGML Repositories: Requirements and Modelling"; Matthew Fuchs, "The User Interface as Document: SGML and Distributed Applications"; Edward Levinson, "Exchanging SGML Documents Using Internet Mail and MIME"; Ian A. Macleod, "SGML into the Nineties"; Hans Holger Rath and Hans-Peter Wiedling, "Making SGML Work: Introducing SGML Into an Enterprise and Using its Possibilities in Advanced Applications"; Darrell R. Raymond, Frank Wm. Tompa, and Derick Wood, "From Data Representation to Data Model: Meta-Semantic Issues in the Evolution of SGML".
The 'Call for Papers' read as follows, in part: "SGML (the Standardized General Markup Language) is an international standard whose importance is rapidly growing. It is fair to say that the era of electronic text has finally arrived. A large number of potential text applications are seeking solutions, and there is significant industrial interest in the technologies being developed in the SGML context. In view of the high importance of SGML,
Abstract: There have been a number of important related activities which suggest the need for a new model for text. ISO standards for document description have been recently developed. These standards view documents as hierarchical objects and it is likely that languages such as SGML will become widely used in the near future for document markup. As structured documents become available, so there will be a need to evolve tools to take advantage of structural knowledge. The goal of the work described here is to develop such tools. A conceptual model for bibliographic data has been designed. The model is known as Maestro (Management Environment for Structured Text Retrieval and Organization). It supports structured documents and provides a query language to retrieve and link information contained in these structures. In this paper an overview of Maestro is presented together with an outline of the basic implementation strategy.
Abstract: Standard Generalized Markup Language (SGML) is an international standard for markup languages. Descriptive markup is a means whereby the logical structure of a document can be explicitly encoded. Such markup can subsequently be processed to provide an appropriate physical layout of the actual document content. Additionally, the logical structure provides the information necessary for highly context sensitive retrieval. In this paper the authors describe an SGML application which can process encoded documents into a format suitable for storage and retrieval by an appropriately powerful retrieval system.
It is also possible to encode links between documents using SGML. One technique is that suggested by the Text Encoding Initiative (TEI), a co-operative international venture to promote guidelines for the encoding and interchange of machine-readable texts. This paper also describes how such links can be processed to produce the equivalent structures in a document database.
"Abstract: This paper describes Godot (Generalized Object Depository Oriented to Text), a depository for structured text. The work is heavily influenced by international standards relating to text. The physical storage model, which is built on top of an implementation of the ISO DFR (Document Filing and Retrieval) standard, is described. It is shown how structured SGML documents can be incorporated within the storage model. An overview of the underlying object-oriented implementation is given and the basic access operations described. Examples of structural queries are provided."
Abstract: SGML [C. F. Goldfarb, editor, The Standard Generalized Markup Language (ISO 8879)] is a passive standard. That is, it provides mechanisms through which descriptive markup can be applied to documents, but says nothing about how these documents are to be processed. The SGML standard refers frequently to the "application" but includes no clean mechanism for attaching applications to SGML parsers. The purpose of this paper is to present one such mechanism while maintaining compatibility with the current SGML standard.
Abstract: Recent research has sought to build document-retrieval systems on top of relational database management systems (DBMS) in order to increase the power of document retrieval. While the use of DBMS shows a more flexible approach to designing search strategies, the underlying representation of the information is inflexible and does not correspond to either the structure or the meaning of the real-world objects. This limitation can be overcome through the use of conceptual modelling techniques. The array model presented here is based on these techniques and has been designed specifically for application in document retrieval.
"Abstract: The Multimedia Information Presentation System (MIPS) allows end-users to browse multimedia information presented in a user-friendly and consistent manner. In its most powerful configuration, it will allow the end-user to formulate queries which are interpreted, analysed, and despatched by the system to heterogeneous distributed external data sources, and to view a coherent and customized presentation of the data retrieved as answers. Data are stored in, or referenced from, a set of hyperdocuments conforming to the ISO standards HyTime and SGML. The hyperdocuments constitute an information web which may be dynamically expanded to accommodate retrieved data. The web navigation structure, structure of information nodes, specification of presentation mechanisms, specification of presentation tools, and data are separable and potentially reusable for different applications, different activities within an application, or different environments. The authors outline the intended functionality and the design of MIPS, with particular reference to the structure and function of the hypermedia web and the role of the knowledge base system module in its dynamic expansion."
Abstract: "In developing new ways to publish vast amounts of information, many technical communication teams face problems that go far beyond the challenges of one book, a series of books, or even a series of CD-ROMs. Technical communicators begin to face a constellation of problems that are more like those that have plagued software development since it became a distinct profession in the 1960s. At first a project appears promising. Then, as the work begins and progresses, we become enmeshed in interlocking problems of management, purchasing, staffing, training, installation, integration, and vision. This article summarizes the lessons learned from a major effort to use the Standard Generalized Markup Language (SGML) to pull together into a single, accessible, electronic "publication" large amounts of very complicated information."
Note: This article is part of a special issue of IEEE Transactions on Professional Communication (with an introduction by Jonathan Price): "Structuring Complex Information for Electronic Publication."
Abstract: "The SGML (Standard Generalized Markup Language) world has concentrated on solving the problems of textual documentation. SGML and other information standards are rather complex to take into wide use. SGML alone is not enough to implement working solutions. There are a large number of methods, models and naming conventions developed for different application areas: microdocuments, components that contain bigger element hierarchy, etc.
"This paper describes a keep-it-simple model as a base for interactive electronic applications. The model keeps the data in life-cycle safe format (SGML), but still gives the end-user any possible view to the data and interaction with it. One design goal of the model was to separate information, functionality and user interface from each other.
"Information is managed in SGML, HyTime (Hypermedia-Time-based Structuring Language, ISO 10744), and DSSSL (Document Style Semantics and Specification Language) formats. The information packages, that travel between client and server (and between applications), are modeled with information standards. Functionality is achieved with engines on client and/or server side. The user interface language is HTML (HyperText Markup Language) and Java applets.
"XML (Extensible Markup Language) brings lightness and data format independence when HTML provides a common user interface description language. Java applets are a modular solution for interface functionality and platform independence. HyTime is the way to link the information chunks together in a standard way. Data is stored in databases that are part of the application or part of the information infrastructure.
"This presentation contains models for putting SGML and other information standards to work for wide range of interactive electronic multimedia applications."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
Abstract: "This book is meant as a guide to modern handling of legal information with the aid of standardized markup languages, in response to the well-known need for sharpened tools for managing the rapidly growing amount of legal information in combination with transborder data flows, especially on the Internet. The SGML and XML international standards for document description are becoming increasingly important for the legal domain in these respects.
"The content is based on empirical results reached in the Corpus Legis Project. This interdisciplinary research programme began in 1994 at the Faculty of Law, Stockholm University and it has led to three different IT-applications, which may be categorised according to the following profiles: (1) hypertext based systems, (2) advanced information retrieval systems, and (3) general electronic document and management systems.
"Experiences from this practical work are described in the book. Major activities associated with the development of an SGML system, e.g. document analysis, DTD-design (Document Type Definition), and markup, are described from a legal point of view. The study comprises document types originating from different national legal systems, written in various languages, and covering a broad time perspective. The book can thus be seen as a checklist of critical factors in legal document management." [from the online book description]
A related work is represented by the publication The Comparative Part of the Corpus Legis Project - Using SGML for Intelligent Information Retrieval of Legal Documents. Authors: Haider, Georg, Magnusson Sjöberg, Cecilia, Quirchmayr, Gerald, Sebald, Verena, EXPERSYS-96, Artificial Intelligence Applications. J Zarka, E. Mercier-Laurent, D.L. Crabtree, M. Narasipuram. In: Technology Transfer Series. pp. 181-186. Editor: A. Niku-Lari. Other publications by the author are listed in the Corpus Legis Final Project Documentation. On Corpus Legis, see "The Corpus Legis Project." See also the author's list of publications.
The Table of Contents for the book is available online; [local archive copy]. See also the book announcement/review at: http://www.sub.su.se/juridik/subiura/1999-1.htm.
This book may be ordered from JURE Law Books, Artillerigatan 67, SE-114 45 Stockholm, Sweden. Phone: +46-8-662 00 80; Fax: +46-8-662 0086; Email: order@jure.se.
Abstract from the online version of the paper (co-authored with Julia H. Flanders): "This paper presents two groups of text encoding problems encountered by the Brown University Women Writers Project (WWP). The WWP is creating a full-text database of transcriptions of pre-1830 printed books written by women in English. For encoding our texts we use Standard Generalized Markup Language (SGML), following the Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange. SGML is a powerful text encoding system for describing complex textual features, but a full expression of these may require very complex encoding, and careful thought about the intended purpose of the encoded text. We present here several possible approaches to these encoding problems, and analyse the issues they raise."
The TAG article focuses upon various strategies for using the TEI's so-called mirror tags for the encoding of simple variants, such as an apparent error and a correction. For example: <abbr expan=""></abbr> (storing the expansion for an abbreviation in an attribute value and the abbreviation in the content) versus the mirror, <expan abbr=""></expan> (storing the abbreviation's expansion in the content). See the TEI P3 Guidelines (1994) Chapter 6, pages 163-170. The author illustrates how nesting such elements can lead to perplexing logic, depending upon how processing is assumed to take place. The article is based upon a longer discussion published in a scholarly journal as "Some Problems of TEI Markup and Early Printed Books." See a version of the document on the STG Web server: http://dynaweb.stg.brown.edu/wwp_books/DL/1.toc. For the specific section on dual emendation: http://dynaweb.stg.brown.edu/wwp_books/DL/57; [mirror copy, partial links only].
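Note: The mirror-tag relationship described above can be illustrated with a minimal sketch. It is not taken from the article. It assumes a simple, text-only element that can be parsed as well-formed XML (rather than full SGML), and the `MIRRORS` table and `mirror` function are hypothetical names introduced here for illustration only.

```python
import xml.etree.ElementTree as ET

# Illustrative table of mirror pairs (an assumption for this sketch):
# each element carries its alternate reading in an attribute named
# after its mirror element.
MIRRORS = {"abbr": "expan", "expan": "abbr", "sic": "corr", "corr": "sic"}


def mirror(elem):
    """Return the mirror-tag form of a simple, text-only element.

    The element's content becomes the attribute value of the mirror,
    and the old attribute value becomes the mirror's content.
    """
    other = MIRRORS[elem.tag]
    flipped = ET.Element(other, {elem.tag: elem.text or ""})
    flipped.text = elem.get(other, "")
    return flipped


original = ET.fromstring('<abbr expan="Doctor">Dr.</abbr>')
flipped = mirror(original)
print(ET.tostring(flipped, encoding="unicode"))
```

Applying `mirror` twice returns the original form for such simple cases. The sketch deliberately does not handle nested mirror elements, which is where the article locates the perplexing logic.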
Abstract: "This paper presents two groups of text encoding problems encountered by the Brown University Women Writers Project (WWP). The WWP is creating a full-text database of transcriptions of pre-1830 printed books written by women in English. For encoding our texts we use Standard Generalized Markup Language (SGML), following the Text Encoding Initiative's Guidelines for Text Encoding and Interchange. SGML is a powerful text encoding system for describing complex textual features, but a full expression of these may require very complex encoding, and careful thought about the intended purpose of the encoded text. We present here several possible approaches to these encoding problems, and analyze the issues they raise."
Also apparently published as: "Some Problems of TEI Markup and Early Printed Books," Carole Mah, Julia Flanders and John Lavagnino, [forthcoming in] Revue Informatique et Statistique dans les Sciences Humaines. Université de Liège, 32.1-4 (1996). And: Julia Flanders, Some Problems of TEI Markup and Early Printed Books, Paper presented at Digital Libraries Workshop 1996, Organized by Nancy Ide and Judith Klavans, Held in conjunction with the First ACM International Conference on Digital Libraries, Bethesda, Maryland, 1996.
See the main database entry: The Brown University Women Writers Project.
Summary: Every SGML document must conform to some specified Document Type Definition (DTD). Maler and El Andaloussi explain the basics of DTD design, then present a methodology and series of techniques to help information professionals design, implement and document DTDs. [publisher's pre-publication description]
The book offers detailed treatment of the SGML "Document Type Definition (DTD) -- specifications that form the foundation for every document based on the SGML language. Therefore DTD quality is too important to be left to chance. This guide shows how to develop DTDs that work, based on a proven methodology and techniques. KEY TOPICS: The book explains how DTD development benefits from the same rigorous treatment as software development: Articulate project goals, analyze requirements, write specifications, design and implement readable and maintainable code using good programming style, perform thorough testing, and document the work along the way. MARKET: The book is appropriate for writers, editors, and other subject matter experts; software developers and other DTD implementors; and publishing managers."
Additional information on the book is accessible via the Prentice Hall WWW server: http://www.prenhall.com/allbooks/ptr_0133098818.html - [mirror copy], [formerly].
Chet Ensign published a review of this book in "Structure Rules! Why DTDs Matter After All" (Markup Languages: Theory & Practice Volume 1, Number 1 [Winter 1999]). See the abstract in the issue summary, and the expanded/annotated Table of Contents in Deborah A. Lapeyre's complementary review article.
Abstract: "The third CETH Summer Seminar, co-sponsored by the Centre for Computing in the Humanities, University of Toronto, was held at Princeton University in the final two weeks of June. The thirty participants hailed from seven countries including the United States: Spain, Sweden, Canada, Australia, New Zealand, and Hong Kong. Participants also came from a variety of disciplines: humanities scholarship, computing, publishing, and the library communities. For two weeks we shared the facilities at Princeton University and tried to speak each other's language and see electronic texts from one another's point of view."
Also available via the Internet: CETH Newsletter, Fall 1994/1994 CETH Summer Seminar Report [or mirror copy]. For further details on the seminar, see the main entry.
The report summarizes major presentations by John Price-Wilkin (University of Michigan, HTI) and Gregory Murphy (CETH). Announcements about the CORE Project and SGML Open are also given. A copy of the report is available here.
Mallery supplies a detailed report from the session, chaired by Marianne Gaunt. Major presentations -- all dealing with SGML in some detail -- were given by David Seaman, Coordinator of the University of Virginia's Electronic Text Center; Mark Day, Co-Director of LETRS (Library Electronic Text Resources Services) of the Indiana University, with Perry Willett, the Coordinator of Collection Development of LETRS and Librarian for English and American Literature, Indiana University Libraries, and Gregory Murphy, Text Systems Manager at CETH (Center for Electronic Texts in the Humanities). An online version of the report is available.
Abstract: "The Chameleon Research Project has demonstrated that for one particular class of encoding schemes the problem of data translation can be solved in a general, elegant way. A translation technology, the Integrated Chameleon Architecture, unique in its combined functionality and integration, automates the generation of translation code. The technology eliminates coding errors, reduces iterations to achieve correct translations, and increases productivity. In this manuscript we describe how the architecture is used to specify translators and we report on our experience using the technology to develop translators for a document of type 'book' and for bibliographic databases." [Keywords: code generation, data translation, intermediate form, SGML.]
Existing computer systems to support scholarly writing are inadequate to meet the needs of authors. This paper presents a new model of scholarly writing, merging elements from several models of a scholar as a composing author. The new model identifies the activities that encompass the authoring task, arranged into three stages. The middle stage, composition, is bracketed by stages that support activities peripheral to the primary writing endeavor of forming a coherent sequence out of a set of ideas, notes, figures, and so on. A computer system that implements this model would eliminate inadequacies of existing support systems. The MAnuscript Development ENvironment, or MANDEN, project is building a prototype software architecture to instantiate the model. This paper describes the motivation for and the details of the writing model and identifies components of the model that are currently under development. Companion papers describe these components in more detail.
Abstract: "A translation often is required from one specific electronic encoding of a document to another. For example, an author may wish to translate an article encoded with LATEX to the same article encoded using Scribe, or using a macro version of the troff family. Different techniques and tools exist for achieving such translations. Techniques include using a pairwise or intermediate-form approach to the translation. Tools include programming languages, code-generating tools, and integrated, code-generating toolsets.
A person faced with a translation problem must choose among the various combinations of techniques and tools, with little guidance as to the comparative effort or quality that is achievable with different approaches. In this paper we discuss the complexity of comparing the various approaches. We describe an experiment that we have undertaken to begin to generate some comparative data. And, we discuss the potential significance of the experimental data. (From the Introduction).
Keywords and categories: Software Engineering; Data Storage Representations; Text Processing; Computers in Other Systems; format and notation, publishing; data translation, pairwise translation, intermediate-form translation."
Available online in PostScript format from OSU ([mirror copy of the text]). See also the appendices in files: ftp.cis.ohio-state.edu/pub/tech-report/1994/TR08-DIR/appendix1.gz, ...appendix2.gz, ...appendix3.gz, ...appendix4.gz, ...appendix5.gz.
Abstract: The Standard Generalized Markup Language, SGML, is being adopted by various international organizations as the medium for exchange of electronically encoded documents. An exchange is accomplished by way of a Document Type Definition, DTD, that describes the content of documents targeted for an exchange. In this paper we suggest guidelines for the designers of SGML DTDs. The guidelines emphasize uniformity and simplicity without sacrificing expressive power.
There exists a huge store of electronically encoded data, comprising a broad and varied collection of document databases. Examples of such electronic stores are various corpora, dictionaries, thesauri, and databases holding legal documents, abstracts of scientific manuscripts, and catalog card information. A primary goal of creating electronic stores of these data is to make them accessible to a wide audience, for a wide variety of activities such as queries, online data delivery and display, data exchange, and data analysis. An obstacle to achieving the goal of developing a full, rich set of software tools for data access is the often undue complexity of the underlying data representations. In the past, database designers have typically chosen their own, idiosyncratic representations, leaving software developers with the task of recreating scanners and parsers, and other components common to many software tools, from scratch for each database. The situation has greatly improved with the advent of standardized data representations of document databases, as encouraged by the Standard Generalized Markup Language, SGML, for example. However, these standard representations can themselves be utilized in such a way as to leave the final data representation difficult to access by humans and machines alike. In this paper we suggest guidelines for the designer of SGML document databases that emphasize uniformity and simplicity in the data representation, with little or no necessary loss of expressive power or functionality.
There is a need for widespread exchange of electronic documents in domains as diverse as book publishing, automated offices, factories and research laboratories. The variety of data representations, and the subsequent need for data translation, is a major obstacle to this exchange. This article describes our experiences in developing translators among three specific text formatters: Scribe, LaTeX and Troff. We used a standard form approach in developing the translation capability. We chose a Standard Generalized Markup Language (SGML) data type definition (DTD) for the manuscript type article, developed by the Association of American Publishers, as the basis for our work. We describe the difficulties that we encountered in developing the translators and present guidelines for future definers of SGML DTD's and future developers of translators for these DTD's (sic!).
This is a reply to the
Abstract: "There is a need for widespread exchange of electronic documents in domains as diverse as book publishing, automated offices, factories, and research laboratories. The variety of data representations, and the subsequent need for data translation, is a major obstacle to this exchange. This paper describes a comprehensive data translation system with the following characteristics: 1) it is derived from a formal model of the translation task; 2) it supports the building of translation tools; 3) it supports the use of translation tools; and 4) it is accessible to its targeted end-users. A software architecture to achieve the translation capability is fully implemented. Translators have been generated using the architecture, both by the original software developers and by industrial associates who have installed the architecture at their own sites."
Abstract: "As electronic manuscript exchange becomes more prevalent, problems arise in translating among the wide variety of electronic representations. The optimum solution is a system that can support both the use and the creation of translation tools."
Review: "This is a somewhat long-winded yet readable paper about how to organize a collection of tools to help translate between a standard manuscript representation (based on ISO SGML) and the myriad existing document representations. The authors point out that, owing to the high-level structure of SGML, translating to other representations is straightforward, but translating in the other direction inherently requires human assistance. Unfortunately, they have not yet implemented much of what they propose. The paper would have been more valuable if it had (1) left out the general discussion of the merits of standardization, and (2) emphasized more clearly the general translation paradigm, since it also applies to other applications, such as translating between musical scores and soundtracks." [review by David Alex Lamb, in the CACM database]
Available in PostScript format or in .gz-compressed PostScript format. Mirror copy here (September 1995). Email: Pietro Mancino, piero@stdoca.ericsson.se
Abstract: "This work introduces to the Standard Generalized Markup Language (SGML, formally ISO 8879). SGML is an international standard for electronic document exchange. SGML is the basis of the highly popular HyperText Markup Language (HTML) which, together with URLs (Uniform Resource Locators) and HTTP (HyperText Transfer Protocol), is one of the foundations of the World Wide Web initiative (WWW also known as Web)."
Available online: HTML format: An Introduction to SGML by ben marchal, or text format. Author contact: ben@brainlink.com
Abstract: The Perseus Project is a large-scale hypermedia research effort based on the premise that multifaceted evaluations of interactions between such emerging technologies as hypermedia and such complex human processes as learning guide the development of specific systems and illuminate human performance in electronic environments. Perseus is intended to provide an environment that lets people work more effectively with various primary source materials, including visual and textual materials, than is possible in print. Students may learn less quantitative information than in a course that runs through a fixed and linear curriculum, but they will develop an attitude of disciplined and respectful skepticism toward published interpretations. Perseus combines texts, images, and programs comprising a set of HyperCard stacks and data files; structured data include SGML texts, relational tables for catalogs and encyclopedia, PostScript drawings, LandSat images, and 35mm slides.
The paper is also published in Actes de la conférence "Technologie SGML 1996" organisée par le Centre de recherche en droit public de l'Université de Montréal, CRDP, 1996, pages 1-13. See also the conference entry, or possibly the site page. An online version is available in HTML format; [mirror copy, text only]; or in Word format.
See the main entry for EBSI-GRDS.
Marcoux, Yves et Martin Sévigny. "Querying hierarchical text and acyclic hypertext with generalized context-free grammars." Accepté pour publication, RIAO 1997.
See: RIAO'97 CONFERENCE, conference entry.
A version of the document from 1995 (see "Pourquoi SGML? Pourquoi maintenant?" - below) is available online: http://www.droit.umontreal.ca/crdp/fr/equipes/technologie/conferences/sgmlquebec/12.html; [mirror copy]
The JASIS version of the article is to appear in a special journal issue dedicated to structured information and standards for document architectures. See the main entry for EBSI-GRDS. See also (pending more complete bibliographic work): (1) MARCOUX, Yves. "Pourquoi SGML? Pourquoi maintenant?" Actes de la conférence "SGML et inforoutes; pour la diffusion optimale de l'information gouvernementale et juridique" organisée par le Centre de recherche en droit public de l'Université de Montréal et l'EBSI, CRDP, 1995, pp. 55-69; (2) MARCOUX, Yves. "Les formats normalisés de documents électroniques." ICO Québec, vol. 6, nos 1-2, printemps 1994, pp. 56-65; (3) HUARD, Guy; MARCOUX, Yves; POULIN, Daniel. Le SGML en documentation juridique et gouvernementale: potentiel et mise en oeuvre. Éditeur officiel du Québec, Québec, 1995; (4) Marcoux, Yves et Martin Sévigny. "Querying hierarchically structured texts with generalized context-free grammars." To appear in Proceedings of the 1996 annual SIGIR conference, 1996; (5) Sévigny, Martin et Yves Marcoux. "Conception et réalisation d'une interface-utilisateurs pour l'interrogation de bases de documents structurés." Soumis pour publication, 1996. [Details on the Home Pages of Yves Marcoux and Martin Sévigny]
See the predecessor to this article in French: "Pourquoi SGML? Pourquoi maintenant?" in Actes de la conférence "SGML et inforoutes; pour la diffusion optimale de l'information gouvernementale et juridique" organisée par le Centre de recherche en droit public de l'Université de Montréal et l'EBSI, CRDP, 1995, pages 55 - 69.
See the main document entry for the complete list of articles and contributors, as well as other bibliographic information.
Abstract: "MATHS (Mathematical Access for Technology and Science) is a recently completed project of the European Community which developed a SGML-based computer-oriented approach to teaching mathematics to visually-impaired students. A document-oriented architecture was implemented which permitted the software to be used by sighted, low-vision, and blind students and which supported multiple interactive input methods. SGML was the core component as it permitted the implementation of an application specific to mathematics, supported the input, output and interaction modes defined in the architecture, and enabled implementation with an existing SGML editor in a relatively short period of time. The knowledge gained in the MATHS project is not just of use to specialists in the area of accessibility but is of general applicability to human-computer interaction. In addition to describing the architecture of MATHS and its encoding of mathematics in SGML, this presentation will suggest ways to relate the results of the MATHS project to the general problem of computer application interfaces design."
"MATHS SGML markup was partly visual/presentational and partly semantic/structural, a balance which enabled a single application to provide good visual presentation and all the hooks for software processing. In order to manipulate mathematical expressions, all active objects were available to the software, providing practical lessons to the designers of the forthcoming document object model. Voice and other means of input and of output, critical to the operation of the MATHS environment, are being designed into the next generation of operating systems in order to address problems like repetitive motion syndrome, to make the computer useful to workers in types of work where it is not convenient to rely on a keyboard, and to increase the productivity of all computer users. The recognition that individuals absorb information in many different ways and the desire in interface design to make the presentation of information more flexible highlights the importance in MATHS of the great degree of customization possible in the presentation of math and its multiple modes of operation. Finally, MATHS uses SGML to implement a learning environment which is based on an abstract layer (the DTD) and is extensible (modify or replace the DTD). This extensibility was judged to be critical since mathematics is a field where both notations and pedagogical concepts are constantly evolving. The same approach to extensibility can be used to implement and evolve metaphors used in interface design."
This paper was delivered as part of the "Expert" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: This article describes the system used for the introduction of textual data into the CELEX full-text document databases. The solution implemented is based on the establishment of a text production database for the management and validation of texts before introducing them into the CELEX dissemination databases, and the management of structured documents described with the help of an SGML syntax. Note: CELEX (Communitatis Europææ LEX) is the computerized multi-lingual documentation system for European community law.
Contact: Commission of the European Communities, Service EUROBASES, 200 rue de la Loi, B-1049 Brussels, BELGIUM; Tel: +32-235-00-01; FAX: +32-235-00-03. Another description of CELEX is Geoffrey Gudgion, "SGML Applications in the European Commission,"
[2001-01-16 note:] "Alice in the wonderland of SGML: streamlining text entry in the CELEX databases." By J. Marin-Navarro and P.E. Alevantis (CEC, JMO C2/25 Bâtiment Jean Monnet, Plateau de Kirchberg, L-2920 Luxembourg). Brought online 16/01/2001. "Abstract: This article describes the system used for the introduction of textual data into the CELEX full-text document databases. The solution implemented is based on the establishment of a text production database for the management and validation of texts before introducing them into CELEX dissemination databases, and the management of structured documents described with the help of SGML syntax." [cache]
"Abstract: ISO/IEC JTC1 SC2/WG12, known collectively as the Multimedia and Hypermedia Information Coding Experts Group (MHEG), is developing a standard titled Coded Representation of Multimedia and Hypermedia Information. ANSI group X3V1.8M, known collectively as the Music Information Processing Standards (MIPS) committee, is developing a hypermedia document interchange standard titled HyTime/SMDI. HyTime has been officially accepted as an ISO project as well, following a successful new project ballot by ISO/IEC JTC1. The authors describe the history, technical orientation and status of the MHEG and HyTime projects, as well as their relationship to multimedia (e.g. MPEG) and document interchange (e.g. ODA and SGML) standards. The relationship between the standards is also explored, with emphasis on appropriate applications and situations where they can be used together in a complementary fashion."
"Abstract: A description is given of the history, technical orientation and status of the MHEG (Multimedia and Hypermedia Information Coding Experts Group) and HyTime projects. Their relationship to multimedia (e.g. MPEG) and document interchange (e.g. ODA and SGML) standards are also discussed. The relationship between the standards is explored, with an emphasis on appropriate applications and situations where they can be used together in a complementary fashion."
Abstract: "This talk will describe the future of information management within the various organizations and agencies that collectively are known as the United States Intelligence Community, including the CIA, NSA, DIA, and the now declassified NRO. The central focus of this talk will address what the US Intelligence Community believes to be the 'information revolution' of the Third Millennium, with an impact similar to that experienced in past millennia in both the agriculture and industrial revolutions. Kept secret as classified information in all fifty previous years since its inception, the Intelligence Community of the US Government recently confirmed that its budget last year totaled $26.6 billion dollars. This talk will provide an explanation of the possible role and impact that the ITMRA (Information Technology Management Reform Act), passed by Congress in August 1996, will have on the future of information management in the Intelligence Community, and how that relates to this industry. It will describe the transition to web-centric, electronic publishing of our nation's intelligence reports, known as 'finished intelligence' into an integrated information space. Describing what the future world of 'Virtual Intelligence' will really look like, this talk will explore the concept of a more 'agile' intelligence enterprise, giving us insight into how the US Intelligence Community plans to achieve its goal of an electronically networked environment for the production and exchange of intelligence, a goal deemed essential to national security in the 21st Century."
This paper was delivered as part of the "Case Studies" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "This talk will describe how the US National Security Agency, the Central Intelligence Agency, the Defense Intelligence Agency, the National Reconnaissance Office and other top agencies that collectively are known as the United States Intelligence Community are significantly improving their intelligence gathering and reporting operations through the development and implementation of advanced technology including networking concepts and international information standards such as SGML.
The central focus of this talk will be a description and discussion of Intelink, the classified, world wide 'Intranet' for the Intelligence Community. Intelink, and the Intelink community address one of the world's largest data management problems, involving demanding requirements that are at the extreme of what normal enterprises require.
Intelink is now operational for a broad base of intelligence customers and consumers from the warfighter to the White House. Intelink is currently being used in support of several basic and key functional areas. Perhaps the most significant of these areas is the electronic publishing and distribution of our nation's intelligence reports. This talk will discuss how our "Signals Intelligence" (SIGINT) Reports have gone from the world of reports in only ASCII text to robust multimedia formats with distribution, using SGML, over Intelink. The talk will also address other key functional areas including analytical research, collaboration facilities, and training.
The talk will address several of the unique problems, concerns, challenges and special features that distinguish Intelink from other Intranet applications. These issues include networking; architecture and standards; analyst collaboration issues; and finally encryption and other security considerations that are unique to this special environment.
The talk also will provide specific examples of Intelink SGML applications in several agencies within the US Intelligence Community. These examples will present insights into the issues, problems, and solutions for organizations desiring to take advantage of emerging technology allowing them to realize tangible cost savings as well as to enjoy significantly improved capabilities.
The talk will conclude with an examination of the future for Intelink, including plans for enhanced analyst collaboration, security boundaries/access control, and an improved Graphical User Interface."
Note: The above presentation was part of the "SGML Case Studies" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "SGML has been an ISO standard for ten years now. It was being adopted and implemented even before the final standard was published, and its user community is now very large, with thousands of applications. But is SGML a standard for all times?
SGML has always faced competition from systems that now are largely forgotten. Only three years ago, a distinguished consultant proclaimed that WYSIWYG was dying. Will SGML be able to continue its record of success in the face of HTML (you mean that's not SGML?), PDF, OpenDoc, OLE, and the surprising continued vitality of proprietary systems?
SGML has been a remarkably stable standard in the past decade, but will it remain so in the next? Fashions in computing and data management have changed in the years since the development of SGML was begun. In the past year, GCA's conferences have devoted an increasing amount of time to HyTime and DSSSL, new standards that may offer foreshadowings of changes to SGML itself. Perhaps the next year will bring us the long-awaited revision of the base standard. Will SGML still be SGML?
There may be no one answer for these questions. As users and proponents of SGML, we need to take a hard look at our requirements and define what we need from the standard and its implementers. More significantly, we need to understand what information is and what we expect it to do for us. Only with that understanding can we devise good SGML applications, make the right requests from vendors, make the right links between SGML data and other kinds of information, or design a good replacement for SGML."
Note: The above presentation was part of the "SGML Expert" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "In 1995 and early 1996, the ISO standards process that includes SGML and related standards has seen a remarkable coalescence of efforts that should be beneficial to all of us. Most notably, DSSSL and HyTime are developing a shared approach to tree structures and query languages. A consequence of this may be the development of a set of general facilities that can be shared among all SGML-based standards and that, when incorporated into products, will make our documents easier to work with and more powerful in their ability to deliver information."
See also the ISO/IEC JTC1/SC18/WG8 Web Service, WWW server for 'Information Technology -- Document Processing and Related Communication.' James Mason is the Convenor for WG8.
See the main document entry for the complete list of articles and contributors, as well as other bibliographic information.
The document discusses issues facing the DOE as it incorporates SGML into the processes of generating and distributing technical reports. The report may be obtained as http://nuke.handheld.com/NIRMA_Docs/EIE/Meetings/Mason.html or in mirror copy here [copy dated April 12, 1995; original filestamp July 4, 1994].
Abstract: "The Alabama State Legislature has begun an extensive re-engineering effort to improve the process and technologies used to craft and enact legislation and to improve the means through which the public can be directly involved in the legislative process. When this project is completed, the State will have an information system that provides repository based authoring and publishing, client/server legislative operation systems with a 'real-time' Internet interface. SGML is an important component of this complex application.
"In this session we will present an overview of the application and we will discuss in more detail how we used object oriented information engineering analysis and document analysis to develop a robust information model. We will discuss the design challenges we faced integrating SGML with traditional database technologies, multi-tiered client server technologies and the Internet. Most important, we will share valuable lessons learned about designing and building repository based SGML systems."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
Abstract: "Exceptions are used in many standard DTDs, including HTML, because they add expressive power for DTD authors. However, there is a tradeoff: although they are useful, exceptions add significantly to the complexity of DTDs. Authoring DTDs is a difficult task, and existing tools are of limited use because of the lack of a suitable formal model for exceptions. This paper describes methods for constructing a static model that completely and precisely describes DTDs with exceptions. A software tool has been written to implement the methods and to demonstrate some practical applications. Examples are shown of how the tool is used for DTD authoring, and some useful extensions of the tool are described. For one example DTD, the output of the tool is converted into a regular expression grammar. Preliminary studies indicate that general case algorithms can be developed for this conversion. This would allow existing theory for the context free languages to be used in developing SGML applications. Statistical results are shown from running the software tool on a number of industry and government DTDs and for three successive versions of HTML. The results illustrate that the complexity of DTDs in practice is approaching, or has exceeded, manageable limits with existing tools. The formal model and its applications are needed for SGML and continued development of these methods may impact the evolution of HTML, XML, and related web publishing standards. Some specific projects are proposed, where continued development of the model can result in more powerful tools and new kinds of applications for SGML."
[Conclusion: The paper provides evidence to illustrate] "the complexity of DTDs with exceptions, which in turn implies high costs for DTD design and corresponding problems with quality. These results also show that the complexity of some DTDs is approaching (or has exceeded) manageable limits given existing tools for designing and understanding them. There is clearly a need for more powerful tools for DTD design and analysis and for subsequent SGML processing. The software tool described in this paper is useful for understanding (viewing) DTDs with exceptions and for detecting errors caused by the incorrect use of exceptions. Several practical extensions of the tool are described that provide other new capabilities for DTD analysis. Because exceptions are an integral part of SGML, any generalized SGML tool must support them. There are previous theoretical results for formal language models of DTDs with exceptions ([Matzen, "Model"]; [Kilpeläinen and Wood, "SGML and Exceptions"]). However, this is the first description of an implementation, and thus it provides a foundation for a new generation of applications and tools."
"The expanded DTDs output by the software tool are a powerful extension of the model; these can be used to construct DTDs without exceptions that are pseudo-equivalent to the original DTDs with exceptions. This allows authors to design DTDs using the expressive power of exceptions while managing their side-effects. Also, the methods shown for converting DTDs with exceptions to regular expression grammars provide a powerful formal foundation, the existing theory for the context free languages, to be used in developing new kinds of SGML applications. The continued development of the methods and tools described in this paper can be a significant factor in the future success of SGML, and they would affect the evolution of HTML, XML, and other standards for the World Wide Web."
The document is available online in PDF format - "A new generation of tools for SGML." [local archive copy] See also: "SGML exceptions analysis" (results from running the prototype software tool described in "A New Tool for SGML with Applications for the World Wide Web," Proceedings of the 1998 ACM Symposium on Applied Computing, February, 1998). For other articles in this issue of MLTP, see the annotated Table of Contents.
Revision: Received 22 June 1998, Revised 31 July 1998.
Abstract: "The Standard Generalized Markup Language (SGML) is an international standard (ISO 8879) for document definition and interchange. It is widely used in government and industry, and it has received increased attention from academia since HTML evolved to a formal application of SGML. SGML is a meta-language scheme for defining the structure of documents. A Document Type Definition (DTD) is a finite set of productions called element declarations; DTDs are similar to context free grammars, but the productions are more complex. One important optional feature of element declarations is called exceptions. Exceptions add expressive power for DTD authors, and thus are used in most industry and government standard DTDs, including HTML. Although exceptions are useful, they significantly add to the complexity of DTDs. Existing tools for DTD design and analysis are of limited use, because of the lack of a static formal model for exceptions. This paper describes a static model that completely and precisely describes the effects of exceptions on DTDs; a software tool has been written to implement the theory and to demonstrate some practical applications. Results are shown for three versions of the HTML DTD. The results show that the language model and its applications are needed for SGML, and that continued development of these methods may impact the evolution of HTML and related web publishing standards."
See the associated SGML exceptions analysis (results): The results shown in [this results set] are from running the prototype software tool described in the above paper. And see the authors' paper, "Unraveling Exceptions," Conference Proceedings: SGML/XML 97, Washington D.C., December, 1997. [local archive copy]
[Check Proceedings Volume 'Paper 121'?]
Available online in PostScript format. [local archive copy] See: SAC'98, the 1998 ACM Symposium on Applied Computing and the online Abstract.
Abstract: "Authoring DTDs is a difficult task: they typically contain over fifty element declarations and they are often recursive. This complexity implies high costs for DTD design and subsequent document processing. It also means that DTDs may have corresponding problems with quality.
"Exceptions are used in many standard DTDs because they add expressive power for DTD authors. However, there is a tradeoff: although they are useful, they are also a big part of the complexity problem. It is difficult to view the effects of exceptions on DTDs, primarily because of the lack of a formal static model.
"This presentation describes a static model that gives a complete and precise view of DTDs with exceptions. The model provides a foundation for new kinds of applications for processing SGML. A software tool has been developed to implement the model and to demonstrate its potential. Some specific projects are outlined, where continued development of the model and tool will have a significant impact on the success of SGML and related web publishing standards. One proposed project is the development of an automated SGML to XML converter."
This paper was delivered as part of the "How To" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "The Standard Generalized Markup Language (SGML) is an international standard for document definition (ISO 8879) that was adopted in 1986 and is rapidly gaining acceptance in industry and government. It is a meta-language system for document design rather than a specific scheme for document processing; almost any kind of document can be described using SGML. Productions called element declarations are used to define arbitrary elements of documents and the context in which they can occur. A finite set of element declarations called a document type definition (DTD) defines the high-level syntax of a set of documents. DTDs are similar to context-free grammars, but the productions are more complex. The Standard does not describe a formal language model for SGML, and there is little work in the literature on this topic."
"This article defines a formal language model for SGML; systems of finite automata from systems of regular expressions. The model is applied in two ways: a parser is constructed for DTDs, and methods are shown for automatically constructing parsers for the documents defined by a DTD. These methods for parsing SGML are new, and they include features of DTDs that have not previously been included in a static language model. The model applies directly to the syntactic constructs of SGML, and thus, the methods shown in this article have distinct advantages for parsing SGML over traditional context-free parsing methods." [online abstract]
Abstract: The Standard Generalized Markup Language (SGML) is a meta-language system for document representation that was adopted as an ISO standard in 1986. In SGML, element declarations define the logical components (elements) of documents; a content model is the part of an element declaration that defines the content of the elements. SGML defines and prohibits "ambiguous content models" but does not show a method for detecting them. Model groups, the only required components of content models, are expressions similar to regular expressions. This paper defines ambiguous model groups and gives an algorithm for detecting them. When the optional components of element declarations are not considered, the algorithm detects ambiguous content models as defined by the standard. The algorithm is based on a construction of indexed nondeterministic finite automata (NFAs) in which each arc is bound to a particular occurrence of an element symbol in a model group.
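Illustrative note: the idea of binding each automaton arc to a particular occurrence of an element symbol can be sketched with the textbook position-automaton (first/last/follow) construction. The sketch below is not the paper's algorithm; it is a minimal approximation under stated assumptions. Model groups are written as nested tuples (a hypothetical encoding chosen here), only the "," and "|" connectors and the "?", "*", "+" occurrence indicators are covered, and the "&" connector and exceptions are ignored. A group is reported as ambiguous when, from the start or from some occurrence, two different occurrences of the same element name can come next.

```python
def build(node, symbols, follow):
    """Return (nullable, first, last) for a model group and fill in
    `symbols` (position -> element name) and `follow` (position -> positions).

    Hypothetical encoding: an element occurrence is a str; groups are
    ('seq', ...), ('alt', ...), ('opt', x), ('star', x), ('plus', x).
    """
    if isinstance(node, str):
        pos = len(symbols)          # each occurrence gets its own index
        symbols.append(node)
        follow[pos] = set()
        return False, {pos}, {pos}
    kind, *kids = node
    parts = [build(k, symbols, follow) for k in kids]
    if kind == "alt":
        return (any(p[0] for p in parts),
                set().union(*(p[1] for p in parts)),
                set().union(*(p[2] for p in parts)))
    if kind == "seq":
        nullable, first, last = True, set(), set()
        for n, f, l in parts:
            for p in last:          # whatever ended the prefix may be followed by f
                follow[p] |= f
            first = first | f if nullable else first
            last = last | l if n else set(l)
            nullable = nullable and n
        return nullable, first, last
    (n, f, l), = parts              # occurrence indicators take one operand
    if kind in ("star", "plus"):
        for p in l:                 # repetition: the end loops back to the start
            follow[p] |= f
    return (n or kind in ("opt", "star")), f, l


def ambiguous_pairs(model):
    """List (source, name, pos1, pos2) where two occurrences of `name` compete."""
    symbols, follow = [], {}
    _, first, _ = build(model, symbols, follow)
    clashes = []
    for source, targets in [("start", first)] + sorted(follow.items()):
        seen = {}
        for pos in sorted(targets):
            name = symbols[pos]
            if name in seen:
                clashes.append((source, name, seen[name], pos))
            else:
                seen[name] = pos
    return clashes


# ((b, c) | (b, d)): after nothing, two different "b" occurrences are possible
print(ambiguous_pairs(("alt", ("seq", "b", "c"), ("seq", "b", "d"))))
# (b, (c | d)): accepts the same documents without the clash
print(ambiguous_pairs(("seq", "b", ("alt", "c", "d"))))
```

The first call reports one clash at the start state between the two occurrences of b; the second reports none, which matches the usual advice to factor out the common prefix.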
Report on a March 14, 1995 meeting of the Detroit Chapter Midwest SGML Forum. Contact for the Midwest Forum: maz@xyvision.com [Mike Maziarka].
Mike Maziarka of XyVision reports on the election of new officers for the Midwest SGML Forum, for 1995/1996. Contact: (Forum president) Mike Mercier, Deere & Company, email MM46100@deere.com.
Report on the election of the 1994-1995 board for the Midwest SGML Forum, and announcements for future meetings. Contact: maz@dlogics.com
Abstract: "Publishing to the Web introduces a new set of c

