Cover Pages: Markup Languages: Theory and Practice. Volume 1, Number 2: Table of Contents

Annotated Table of Contents

[CR: 19990528]

Fahrenholz-Mann, Sally. "SGML for Electronic Publishing at a Technical Society: Expectation Meets Reality." [PROJECT REPORT] Markup Languages: Theory & Practice 1/2 (Spring 1999) 1-30 (with 4 references). ISSN: 1099-6621 [MIT Press]. Author's affiliation: Electronic Publications Manager, ASM International [Materials Information Society]; Email: spfahren@po.asm-intl.org; Tel: +1 440-338-5151 X 5653.

Abstract: "This case study describes the ongoing SGML implementation for electronic publishing from a legacy data conversion at ASM International, a technical society. The promise of SGML is that it provides a non-proprietary, media-independent, structured environment for information capture, management, and publishing. A commitment to SGML is often couched in dollar terms; most implementors are aware that there are tremendous costs in data conversion, DTD development, and editorial tools. These tremendous costs drive a high level of expectation. Implementors might expect seamless management and delivery of information, and "instant" media independence from the use of SGML. Reality, however, rarely meets these expectations. The fact is, that even after all the expense and design work, SGML still makes a significant demand from an adopting organization in staffing and in the acquisition and development of technical skills. When staffing levels are increased to an adequate level, the use of SGML provides the impetus for an organization to begin widening its vision of its publishing capabilities. This article provides an overview of ASM's publishing paradigm before the deployment of SGML and offers a detailed narrative of the various decisions and strategies involved with this SGML implementation, from the initial stages of DTD development and system installation, through data conversion and verification processes to the successful publication of the ASM Handbooks on CD-ROM, released in installments throughout 1998 and 1999. There is particular emphasis on the specific challenges involved with adopting SGML as a solution to encoding, managing, and publishing complex scientific reference material."

[Received 23-June-1998. Revised 19-February-1999.]

[CR: 19990528]

Burnard, Lou. "Using SGML for Linguistic Analysis: The Case of the BNC." [ARTICLE] Markup Languages: Theory & Practice 1/2 (Spring 1999) 31-51 (with 7 references). ISSN: 1099-6621 [MIT Press]. Author's affiliation: Manager, Humanities Computing Unit, Oxford University Computing Services. WWW: http://users.ox.ac.uk/~lou; Email: lou.burnard@oucs.ox.ac.uk. Also: The British National Corpus (BNC).

Abstract: "The British National Corpus (BNC) is a rather large SGML document, comprising some 4124 samples taken from a rich variety of contemporary British English texts of every kind, written and printed, famous and obscure, learned and ignorant, spoken and written. Each of its hundred million words and six and a quarter million sentences is tagged explicitly in SGML and carries an automatically-generated linguistic analysis. Each sample carries a TEI-conformant header, containing detailed contextual and descriptive information, as well as more conventional SGML mark-up. The corpus was created over a four year period by a consortium of leading dictionary publishers and academic research centers in the UK, with substantial funding from the British Department of Trade and Industry, the Science and Engineering Research Council, and the British Library. On publication, it was made freely available under license within the European Union, where it is increasingly used in linguistic research and lexicography, in applications ranging from the construction of state of the art language-recognition systems, to the teaching of English as a second language. This paper describes how the corpus was constructed, and gives an overview of some of the SGML encoding issues raised during the process. A brief description of the special purpose SGML aware retrieval system developed to analyse the corpus and its current status is also provided."

[The SARA System: "The SARA system was designed for client-server mode operation, typically in a distributed computing environment, where one or more work-stations or personal computers are used to access a central server over a network. This is, of course, the kind of environment which is most widely current in academic (and other) computing milieux today. The success of the World Wide Web, which uses an identical design philosophy, is vivid testimony to the effectiveness of this approach. The system has four chief components: (1) the indexing program, which generates an index of tokens from an SGML marked-up text; (2) the server program, which accepts messages in the Corpus Query Language (see below) and returns results from the SGML text; (3) the SARA protocol, a formally defined set of message types which determines legal interactions between the client and server programs; this protocol makes use of a high-level query language known as CQL (for Corpus Query Language); (4) one or more client programs, with which a user interacts in any appropriate platform-specific way, and which communicate with the server program using the protocol."]

A version of this document is available online in HTML format: "Using SGML for Linguistic Analysis: the case of the BNC" (October 1996, revised May 1998, 'Automagically generated by lite2html on 9-Jun-98'). [local archive copy]

[Received 27-July-1998. Revised 12-August-1998.]

[CR: 19990528]

Nicol, Gavin Thomas. "I Met a Space That Wasn't There." [SQUIB] Markup Languages: Theory & Practice 1/2 (Spring 1999) 52. ISSN: 1099-6621 [MIT Press]. Author's affiliation: Inso Corporation; Email: gtn@eps.inso.com.

"A reminiscence of an invisible bug. The importance of native language markup, and the role the SGML declaration plays in an SGML system, are fairly well understood these days, partly due to the tireless efforts of Rick Jelliffe on the ERCS, and partly due to a lot of work done on HTML I18N (Internationalization). XML, thankfully, has a fixed SGML declaration that is designed for I18N. To give some idea of why people should be so thankful, I would like to relate a true story from personal experience. . . All because of a little zenkaku ['full-width'] space. . . ."

[CR: 19990528]

Kilpeläinen, Pekka. "SGML & XML Content Models." [ARTICLE] Markup Languages: Theory & Practice 1/2 (Spring 1999) 53-76 (with 18 references). ISSN: 1099-6621 [MIT Press]. Author's affiliation: Department of Computer Science, P. O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland. Email: Pekka.Kilpelainen@cs.Helsinki.FI; WWW.

Abstract: "The SGML and XML standards use a variation of regular expressions called content models for modeling the markup structures of document elements. SGML content models may include so called and groups, which are excluded from XML. An and group, which is a sequence of subexpressions separated by an &-operator, denotes the sequential catenation of its subexpressions in any possible order. If one wants to shift from SGML to XML in document production, one has to translate SGML content models to corresponding XML content models. The allowed content models in both SGML and XML are restricted by a requirement of determinism, which means that a parser recognizing document element contents has to be able to decide without lookahead, which content model token to match with the current input token, while processing the document from left to right. It is known that not all SGML content models can be expressed as an equivalent XML content model. It is also known that transforming an SGML content model into an equivalent XML content model may cause an exponential growth in the length of the content model. We present methods for eliminating and groups and analyze formally the circumstances where they can be applied. We also consider the length of the resulting content models. We derive a tight bound of e·n! on the number of symbols in the result of eliminating an and group of n symbols, where e = 2.71828 . . . is the base of natural logarithms. We also show that minimal deterministic automata for recognizing an and group of n distinct element names contain 2^n states and n·2^(n-1) transitions, excluding the failure state and transitions leading to it."
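
[Illustration, not taken from the paper: the growth described in the abstract can be seen in a minimal Python sketch. It assumes the simplest case, an and group of n distinct element names, and rewrites it by factoring on the first name so that every alternative begins with a different name. The function names (expand_and_group, count_name_occurrences, and_group_automaton_size) are hypothetical, and the factoring shown is one straightforward approach, not necessarily the method the article presents. The automaton count assumes one state per set of names already seen, with one transition for each name still missing.]

```python
import math


def expand_and_group(names):
    """Rewrite the and group (n1 & n2 & ...) as a nested content model
    that uses only ',' and '|'.

    Every alternative starts with a different name, so a parser can choose
    its branch from the current token alone (no lookahead needed).
    Assumes the names are distinct.
    """
    names = list(names)
    if len(names) == 1:
        return names[0]
    alternatives = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        alternatives.append(f"({first},{expand_and_group(rest)})")
    return "(" + "|".join(alternatives) + ")"


def count_name_occurrences(n):
    """Count element-name occurrences in expand_and_group() for n names."""
    return 1 if n == 1 else n * (1 + count_name_occurrences(n - 1))


def and_group_automaton_size(n):
    """Count states and transitions of an automaton whose states are the
    sets of names seen so far (failure state excluded).

    A state that has seen k names has n - k outgoing transitions.
    """
    states = 0
    transitions = 0
    for k in range(n + 1):
        subsets = math.comb(n, k)  # number of states with k names seen
        states += subsets
        transitions += subsets * (n - k)
    return states, transitions


if __name__ == "__main__":
    print(expand_and_group(["a", "b", "c"]))
    for n in range(1, 7):
        print(n, count_name_occurrences(n), and_group_automaton_size(n))
```

[In this sketch the number of name occurrences grows as 1, 4, 15, 64, ... and stays below e·n!, while the automaton sizes match 2^n states and n·2^(n-1) transitions for each n tried.]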

A version of the document is available online in Postscript format. Local archive copies: abstract, full paper.

Note other publications by the author: (1) Technical Report C-1999-2. Jani Jaakkola and Pekka Kilpelainen, Nested Text-Region Algebra, January 1999. HTML and Postscript. (2) Technical Report C-1996-83. Jani Jaakkola and Pekka Kilpelainen, Using sgrep for querying structured text files, November 1996. HTML and Postscript. See the author's Web site for other publications on SGML and structured text.

[Received 7-August-1998.]

[CR: 19990528]

Streich, Robert. "Techniques for Managing Collections of Interrelated Text Modules." [PRACTICE NOTE] Markup Languages: Theory & Practice 1/2 (Spring 1999) 77-94 (with 21 references). ISSN: 1099-6621 [MIT Press]. Author's affiliation: Schlumberger Austin Research; Email: streich@slb.com.

Abstract: "There are many advantages to breaking up complete documents into small, relatively discrete chunks or 'text modules': multiple authors can work more easily on the same document, the text modules can be served up individually as part of an on-line help or performance support system, and the modules can be reused in other documents. But how can we reuse modules in different documents with some assurances that they fit the new context? How will we track the dependencies between modules? How will we address the increased complexity of managing a library of shared text modules? To answer these questions, we need to define: (1) a way to bound a text module, (2) a way to locate the modules we want in a large library, (3) a way to combine them into a coherent document. This paper proposes solutions to the last two items based on two fields of research in software engineering: module interconnection languages and faceted classification. A proposal for the first item is left for a later work."

[Received 7-August-1998. Revised 6-November-1999.]

[CR: 19990528]

Piez, Wendell. "Review (With 'Annotated Table of Contents') of XML for Dummies." [REVIEW] Markup Languages: Theory & Practice 1/2 (Spring 1999) 95-96. ISSN: 1099-6621 [MIT Press]. Author's affiliation: Mulberry Technologies, Inc.; Email: wapiez@mulberrytech.com; WWW: http://www.mulberrytech.com/people/piez/.

Annotated Table of Contents for Tittel, Ed; Mikula, Norbert; Chandak, Ramesh. XML for Dummies. Foreword by Dan Connolly. Foster City, CA: IDG Books Worldwide, Inc., 1998. Extent: xxviii + 367 pages, CDROM. ISBN: 0-7645-0360-X.

See also the dedicated Web site for the book, with detailed chapter summaries, URL collections, examples, and other resources.

[CR: 19990528]

Piez, Wendell. "Review (With 'Annotated Table of Contents') of XML: Extensible Markup Language." [REVIEW] Markup Languages: Theory & Practice 1/2 (Spring 1999) 97-98. ISSN: 1099-6621 [MIT Press]. Author's affiliation: Mulberry Technologies, Inc.; Email: wapiez@mulberrytech.com; WWW: http://www.mulberrytech.com/people/piez/.

Annotated Table of Contents for Harold, Elliotte Rusty. XML: Extensible Markup Language. Structuring Complex Content for the Web. Foster City/Chicago/New York: IDG Books Worldwide, 1998. Extent: xxiv + 426 pages, CDROM. ISBN: 0-7645-3199-9.

[CR: 19990528]

Piez, Wendell. "Review (With 'Annotated Table of Contents') of The XML and SGML Cookbook. Recipes for Structured Information." [REVIEW] Markup Languages: Theory & Practice 1/2 (Spring 1999) 99-101. ISSN: 1099-6621 [MIT Press]. Author's affiliation: Mulberry Technologies, Inc.; Email: wapiez@mulberrytech.com; WWW: http://www.mulberrytech.com/people/piez/.

Annotated Table of Contents for Jelliffe, Rick. The XML and SGML Cookbook. Recipes for Structured Information. The Charles F. Goldfarb Series on Open Information Management. Upper Saddle River, NJ: Prentice Hall PTR, May 1998. Extent: 650 pages, CD-ROM. ISBN: 0-13-614223-0.

