Cover Pages: Markup Languages: Theory and Practice. Volume 1, Number 3: Table of Contents

Overview

This document contains an annotated Table of Contents for Markup Languages: Theory and Practice, Volume 1, Number 3 (Summer 1999). Markup Languages: Theory and Practice (ISSN: 1099-6621) is published by MIT Press Journals. Editors in Chief for MLTP are B. Tommie Usdin (Mulberry Technologies, Inc.) and C. M. Sperberg-McQueen (University of Illinois/Chicago). A journal description with an overview of the Editorial Structure is provided in a separate document. See also the annotated Table of Contents for Volume 1, Number 1 (Winter 1999) and Volume 1, Number 2 (Spring 1999).

Annotated Table of Contents

[CR: 19991002]

Smith, Joan M.; Usdin, B. Tommie; Sperberg-McQueen, C. Michael. "Interview with Joan Smith." [COMMENTARY and OPINION] Markup Languages: Theory & Practice 1/3 (Summer 1999) 1-6. ISSN: 1099-6621 [MIT Press]. Authors' affiliation: [Smith:] Chairman, SGML Technologies Group; [Usdin:] Mulberry Technologies, WWW; [Sperberg-McQueen:] University of Illinois at Chicago, WWW.

The editors-in-chief of Markup Languages: Theory & Practice interview Joan Smith, who was instrumental in promoting SGML, especially in Europe.

"Joan Smith is Chairman of the SGML Technologies Group of pan-European companies, with subsidiaries in Brussels and Luxembourg; the largest group of companies specializing in SGML in Europe. She founded the International SGML Users' Group, has written numerous books and papers on SGML, and has organized conferences on SGML. She has recently been accepted as a Freeman of the Worshipful Company of Information Technologists of the City of London. She is a Fellow of the British Computer Society, a Member of the Institute of Directors, and was the first European to receive the GCA's Tekkie award and later the GCA's International SGML Award."

Note: Some of Joan Smith's publications are referenced in the main bibliography reference collection. A larger number of Smith's publications -- over ninety-nine (99)! -- are referenced in the larger print bibliography for SGML and related standards.

[CR: 19991002]

Lubell, Joshua. "Structured Markup on the Web: A Tale of Two Sites." [ARTICLE] Markup Languages: Theory & Practice 1/3 (Summer 1999) 7-22 (with 20 references). ISSN: 1099-6621 [MIT Press]. Author's affiliation: National Institute of Standards and Technology, 100 Bureau Drive, Stop 8260, Gaithersburg, MD 20899-8260, USA. Tel: +1 (301) 975-3563. Email: [email protected]; WEB http://www.nist.gov/msidstaff/lubell.htm.

"Businesses and organizations are increasingly finding that HTML (Hyper-Text Markup Language) offers no help whatsoever in managing the information on their web sites. SGML (Standard Generalized Markup Language) provides the flexibility and reuse lacking in HTML. However, SGML alone does not address the problems involved in maintaining online document repositories. Although traditional database management systems are clumsy at managing hyperlinked documents, a system combining SGML, database technology, and the protocols of the Web can provide a reasonably robust environment for developing and maintaining a web site. Two possible site designs employing SGML are discussed and evaluated with respect to a set of design objectives and choices. The likely impact of the emerging XML (Extensible Markup Language) standard on web site design is also discussed."

"Sites 1 and 2 illustrate a dilemma that today's web site developers face when they try to take advantage of the benefits of SGML. On the one hand, they can rely heavily on SGML's ability to represent data in an application-specific, structured manner and on CGI to dynamically generate browser-ready web output in response to SGML database queries. While such a site design enables users to quickly find information through application-specific queries and is easier to maintain than a collection of HTML documents, it requires extra effort on the part of content providers, additional server overhead, and the implementation of hyperlinking if links to off-site web pages are desired. On the other hand, web site developers may choose to minimize the burden on content providers and to maximize server performance, interoperability with web search engines, and linkage with other web sites. In this case, they must sacrifice application-specific structured query capability and implement tools for managing entities and maintaining hyperlinks. The emerging XML standards promise to provide web site developers with the best of both worlds, allowing them to enjoy most of the benefits of SGML while not sacrificing the convenience of HTML and interoperability with the rest of the Web. If XML is ultimately successful, not only will it be easier for web site developers to use SGML, but also they will be able to take advantage of newly available capabilities to make their content easier for users to read and easier for web clients and other desktop applications to interpret."

More information about the work discussed in this paper is available on the Internet at http://www.nist.gov/apde.

[Received 23 June 1998. Revised 21 October 1998.]

A related version of this publication is available online.

Note: Lubell's work on XML includes the PSL (Process Specification Language) Project. Preliminary findings describing how the PSL semantic concepts may be mapped to the eXtensible Markup Language (XML) are now available. For references, see "Process Specification Language (PSL) and XML."

[CR: 19991002]

Cameron, Robert D. "REX: XML Shallow Parsing with Regular Expressions." [ARTICLE] Markup Languages: Theory & Practice 1/3 (Summer 1999) 61-88 (with 5 references, 3 appendices). ISSN: 1099-6621 [MIT Press]. Author's affiliation: Professor, School of Computing Science at Simon Fraser University; Associate Dean of the Faculty of Applied Sciences, SFU.

"The syntax of XML is simple enough that it is possible to parse an XML document into a list of its markup and text items using a single regular expression. Such a shallow parse of an XML document can be very useful for the construction of a variety of lightweight XML processing tools. However, complex regular expressions can be difficult to construct and even more difficult to read. Using a form of literate programming for regular expressions, this paper documents a set of XML shallow parsing expressions that can be used as a basis for simple, correct, efficient, robust and language-independent XML shallow parsing. Complete shallow parser implementations of less than 50 lines each in Perl, JavaScript and Lex/Flex are also given."

[From the conclusion:] "The simplicity of the shallow parsing model based on regular expressions suggests some interesting possible directions for development of XML. First of all, a shallow parsing representation such as that produced by REX could be a useful reference representation for a revised XML specification. Such a reference representation would have the advantage of providing a language-independent approach to shallow parsing encoded in the standard, with a language-independent implementation framework based on regular expressions. Furthermore, it may be possible to relax certain XML restrictions that can be easily accommodated by regular-expression processing, such as the restriction that attribute values must always be quoted. However, possibilities such as these must be carefully weighed by the overall XML development community."
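
The idea of a shallow parse, splitting a document into a flat list of markup and text items with one regular expression, can be illustrated with a minimal sketch. The pattern below is a deliberately simplified stand-in and is not the REX expression from Cameron's paper; the names `SHALLOW` and `shallow_parse` are hypothetical, DOCTYPE declarations with internal subsets are not handled, and a stray `<` that matches no alternative is silently skipped.

```python
import re

# Simplified shallow-parsing pattern (an illustration, NOT the REX expression).
# Alternatives are tried left to right, so comments, processing instructions
# and CDATA sections are recognized before the generic tag alternative.
SHALLOW = re.compile(
    r"<!--.*?-->"                          # comment
    r"|<\?.*?\?>"                          # processing instruction
    r"|<!\[CDATA\[.*?\]\]>"                # CDATA section
    r"|<(?:[^<>\"']|\"[^\"]*\"|'[^']*')*>"  # tag; quoted values may contain '>'
    r"|[^<]+",                             # run of character data
    re.DOTALL,
)


def shallow_parse(xml):
    """Return the markup and text items of `xml` in document order."""
    return SHALLOW.findall(xml)


if __name__ == "__main__":
    for item in shallow_parse('<p>Hi <!-- a > b --><b title="x>y">there</b></p>'):
        print(repr(item))
```

For well-formed input covered by these alternatives, joining the returned items reproduces the original string, which is what makes a shallow parse convenient for lightweight filters that rewrite some items and pass the rest through unchanged.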

[CR: 19991002]

Mikheev, Andrei; Grover, Claire; Moens, Marc. "XML Tools And Architecture for Named Entity Recognition." [ARTICLE] Markup Languages: Theory & Practice 1/3 (Summer 1999) 89-113 (with 13 references). ISSN: 1099-6621 [MIT Press]. Authors' affiliation: University of Edinburgh, HCRC Language Technology Group. 2 Buccleuch Place, Edinburgh EH8 9LW, UK. [Mikheev:] [email protected]; [Grover:] [email protected]; [Moens:] [email protected].

"Named Entity recognition involves identifying expressions which refer to (for example) people, organizations, locations, or artifacts in texts. This paper reports on the development of a Named Entity recognition system developed fully within the XML paradigm. In the section 'Named Entity recognition' we describe the nature of the Named Entity recognition task and the complexities involved. The system we developed was entered as part of a DARPA-sponsored competition, and we will briefly describe the nature of that competition. We then give an overview of the design philosophy behind our Named Entity recognition system and describe the various XML tools that were used both in the development of the system and that make up the runtime system (section 'LTG text handling tools'), and give a detailed description of how these tools were used to recognize temporal and numerical expressions (section 'TIMEX, NUMEX') and names of people, organizations and locations (section 'ENAMEX'). We conclude with a description of the results we achieved in the competition, and how these compare to other systems (section 'Conclusion'), and give details on the availability of the system (section 'Availability')."

[System description:] "One of the design features of the system which sets it apart from other Named Entity recognition systems is that it is designed fully within the SGML paradigm: the system is composed of several tools which are connected via a pipeline with data encoded in SGML or XML. This allows the same tool to apply different strategies to different parts of the texts using different resources. The tools do not convert from SGML into an internal format and back, but operate at the SGML or XML level. Our system does not rely heavily on lists or gazetteers but instead treats information from such lists as 'likely' and concentrates on finding contexts in which such likely expressions are definite. In fact, the first phase of the enamex analysis uses virtually no lists but still achieves substantial recall. The system is document centered. This means that at each stage the system makes decisions according to a confidence level that is specific to that processing stage, and draws on information from other parts of the document. The system is hybrid, applying symbolic rules and statistical partial matching techniques in an interleaved fashion. A runtime version of the system described here is available for free at http://www.ltg.ed.ac.uk/software/ne/. We also have a set of tools which can be used to develop a Named Entity recognition system. The tool suite is called LT TTT, and is available from http://www.ltg.ed.ac.uk/software/ttt/. LT TTT consists of lttok, ltstop and fsgmatch, a number of resource files for tokenization, for end-of-sentence disambiguation, and for the recognition of temporal expressions, and tools for extending these resource grammars or for creating new ones. It also has a visual interface which uses XSL style sheets to render the XML Named Entity annotation in a form that is easier to inspect. The part of speech tagger is available as a separate tool. See http://www.ltg.ed.ac.uk/software/pos/."

[Received 6 March 1999, Accepted 26 May 1999.]
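
The pipeline architecture described above, in which each tool reads marked-up text and emits the same text with more markup added, can be sketched in a few lines. This is a toy illustration only: the stage functions `mark_numex` and `mark_timex`, the helper `run_pipeline`, and the two matching rules are hypothetical and far cruder than the grammar-driven LTG tools, and the stages here match patterns in the serialized string rather than working on parsed XML. Only the NUMEX and TIMEX element names are taken from the abstract.

```python
import re
from functools import reduce


def mark_numex(xml):
    """Hypothetical stage: wrap dollar amounts such as $25 in NUMEX markup."""
    return re.sub(
        r"\$\d+(?:\.\d+)?",
        lambda m: '<NUMEX TYPE="MONEY">' + m.group(0) + "</NUMEX>",
        xml,
    )


def mark_timex(xml):
    """Hypothetical stage: wrap four-digit years (19xx, 20xx) in TIMEX markup."""
    return re.sub(
        r"\b(?:19|20)\d\d\b",
        lambda m: '<TIMEX TYPE="DATE">' + m.group(0) + "</TIMEX>",
        xml,
    )


def run_pipeline(xml, stages):
    """Feed the output of each stage into the next, in order."""
    return reduce(lambda doc, stage: stage(doc), stages, xml)


if __name__ == "__main__":
    print(run_pipeline("<s>The fee was $25 in 1999.</s>", [mark_numex, mark_timex]))
```

Because every stage consumes and produces the same marked-up representation, stages can be reordered, added, or run more than once without any conversion step between them, which is the property the system description emphasizes.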

[CR: 19991002]

Tidwell, Doug. "IBM's TaskGuide: An XML-Based System for Creating Wizard-Style Helps." [PROJECT REPORT] Markup Languages: Theory & Practice 1/3 (Summer 1999) 23-39. ISSN: 1099-6621 [MIT Press]. Author's affiliation: Advisory Programmer, IBM Corporation, Human Interface Group. E20D/500, P.O. Box 12195, Research Triangle Park, NC 27709. Tel: +1 (919) 254-5128; FAX +1 (919) 543-4118; Email: [email protected].

"Wizards have been a part of workstation products since the early 1990s. A wizard is a task-oriented dialog that guides the user through a given task, automating as much of that task as possible. A typical wizard panel has a graphic area on the left, a set of navigation buttons on the bottom, and an area on the right that contains any text and controls needed for the task at hand."

"IBM's TaskGuide technology gives Technical Writers and Human Factors professionals the ability to create wizards. Based on the premise that task analysis is the most difficult part of creating an effective wizard, our tools let you focus on design, not writing code. This paper discusses the basics of wizard technology, followed by a discussion of the XML-based system we have created. We cover some of the key design decisions we had to make, and introduce some of the unique features of our product. We also discuss the changes we have made to our product as technology has changed around us. Finally, we demonstrate a recursive document, a wizard that creates another wizard."

"IBM's TaskGuide technology allows technical writers to create wizard panels without programming. These panels are created dynamically based on the information in wizard scripts. Our approach lets wizard writers focus on the truly difficult tasks of task analysis and technical writing, rather than on the mundane aspects of programming a graphical interface. As our technology has grown over time, the basic skills learned to create wizards with our first driver are still useful and effective today."

[Received 3 July 1998.]

[CR: 19991002]

Catteau, Tom. "An SGML System for the Budget of the European Union." [PROJECT REPORT] Markup Languages: Theory & Practice 1/3 (Summer 1999) 41-59 (with 3 references). ISSN: 1099-6621 [MIT Press]. Author's affiliation: Software Engineer, SGML Technologies Group. 29 Boulevard Général Wahis, B-1030 Brussels, Belgium. Email: [email protected]; WEB http://www.sgmltech.com. Tel: +32 2 705 70 21; FAX +32 2 705 81 01.

"In this paper, the system used for the editorial process of the European Union's budget is described, both from a functional and a technical point of view. It will be shown how the choice of SGML as the key technology has had an impact on the overall architecture as well as on individual modules which constitute the system. The description is based on the current status of the system. Future developments are discussed briefly."

"The editorial process of the budget of the European Union is an annual, on-going process in which different players such as authors, translators, reviewers and a printer all operate in a common environment to enter, translate, and review data needed to produce the budget. The budget itself is published on paper and on the Web. The system, designed to fulfill requirements for the timely delivery of high-quality documents, together with short production times, and hence minimized costs, is entirely SGML-based. It has evolved to a complete and mature production environment. In this paper an overview of the architecture of the system is given as well as a description of the rationale behind the key technical choices that were made. It highlights certain aspects of SGML, such as concurrency and links, which are explained by illustrating their use in the budget application. The need for reliability and stability is shown to have led to a client/server system in which SGML acts as the backbone of the modules which govern the production workflow. These modules communicate with each other through SGML-formatted messages. This application has been made possible through the use of a full-featured SGML parser and an associated application language that combine to make a powerful SGML engine. In a final section, future developments, some of which are currently being developed, are briefly discussed."

[Received 23 June 1998. Revised 5 August 1998. Accepted 27 July 1998.]

See "The European Union's Budget: SGML Used to its Full Potential." by Tom Catteau. In Conference Proceedings of SGML '97, pages 645-653. Other research papers from the group are available.

[CR: 19991002]

Graham, Tony. "Whither &#38;?" [SQUIB] Markup Languages: Theory & Practice 1/3 (Summer 1999) 40. ISSN: 1099-6621 [MIT Press]. Author's affiliation: Mulberry Technologies; Home Page.

"The declarations for predefined & and < entities provided in section 4.6, Predefined Entities, of the XML Recommendation may be confusing at first sight because the leading ampersand in each numeric character reference is itself escaped as a complete numeric character reference." [shows how <!ENTITY my-amp "&#38;"> will eventually yield strings like "AT&T" (internally) in an application after reparsing...]
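
The two rounds of expansion that the squib describes can be observed with a short sketch using Python's standard `xml.etree.ElementTree` module. The entity name `amp2` and the sample document are hypothetical; the entity value follows the doubly escaped form that section 4.6 of the XML Recommendation uses for `amp`, namely `&#38;#38;`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: `amp2` is declared the way section 4.6 of the XML
# Recommendation declares `amp`, with the ampersand escaped twice.
DOC = (
    "<!DOCTYPE doc [\n"
    '  <!ENTITY amp2 "&#38;#38;">\n'
    "]>\n"
    "<doc>AT&amp2;T</doc>"
)


def expand(document):
    """Parse `document` and return the character data of its root element."""
    return ET.fromstring(document).text


if __name__ == "__main__":
    # Step 1: when the declaration is read, the first &#38; becomes "&",
    #         so the entity's replacement text is the five characters &#38;
    # Step 2: when &amp2; is referenced in content, the replacement text is
    #         reparsed and &#38; becomes a literal ampersand.
    print(expand(DOC))
```

The application therefore sees a plain ampersand between "AT" and "T", while the replacement text stored for the entity never contains an unescaped ampersand that would be misread as the start of a reference.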

[CR: 19991002]

Piez, Wendell. "Review of The XML Companion, by Neil Bradley." [BOOK REVIEW] Markup Languages: Theory & Practice 1/3 (Summer 1999) 114. ISSN: 1099-6621 [MIT Press]. Author's affiliation: Mulberry Technologies; WWW.

"Neil Bradley has been working with generic markup applications for over ten years; his offering, The XML Companion, benefits accordingly. His treatment covers the same range of issues as other overviews, but the text itself is refreshingly free of statements of unanchored principle (what XML 'should' be) and prognostication, instead presenting the actual state of things and concentrating on what is known by markup practitioners to work. Likewise, he is much more accurate and forthright than many other general references in indicating which technologies are stable (for example, the DTD syntax of XML 1.0 is not subject to change and will not suddenly be replaced by 'XML-Data', even while a new schema language is in the works) and which are soft or still under development (like XSL). He is also more consistently successful in exposing core ideas, rather than depending on examples (plucked from wherever) to be self-explanatory. . ."

References: Neil Bradley. The XML Companion. Harlow, Essex: Addison Wesley Longman, 1998. Extent: 464 pages. ISBN: 0-201-41999-8.

[CR: 19991002]

Piez, Wendell. "Review of XML: The Annotated Specification, by Bob DuCharme." [BOOK REVIEW] Markup Languages: Theory & Practice 1/3 (Summer 1999) 115. ISSN: 1099-6621 [MIT Press]. Author's affiliation: Mulberry Technologies; WWW.

"XML: The Annotated Specification is the shortest and most manageable of the books under review, and the quality of information in it is good; its scope is also narrower. Unlike the other books, it is not a general reference; Bob DuCharme concentrates exclusively on the syntax of XML languages (both instance and DTD syntaxes) as defined in the February 1998 W3C Recommendation (which appears in the book verbatim, intermixed with commentary). DuCharme, while not a member of the committee that wrote the specification itself, was party to discussions about its design when it was in progress, and is thus in a good position to present an interpretation without compromising the specification's 'actual meaning'. This book will be of greatest interest and most benefit, naturally, to the technical user who has a reason to be concerned with details of the standard itself, rather than with one or another implementation or application of it. . ."

References: Bob DuCharme. XML: The Annotated Specification. The Charles F. Goldfarb Series on Open Information Management. The Definitive XML Series from Charles F. Goldfarb. Upper Saddle River, NJ: Prentice Hall PTR, 1999. Extent: xx + 339 pages. ISBN: 0-13-082676-6.

[CR: 19991002]

Piez, Wendell. "Review of XML In Plain English, by Sandra E. Eddy." [BOOK REVIEW] Markup Languages: Theory & Practice 1/3 (Summer 1999) 116. ISSN: 1099-6621 [MIT Press]. Author's affiliation: Mulberry Technologies; WWW.

"XML In Plain English is a digest of information from available specifications presented in directory form, so that one could, for example, look up 'children' in the XML Syntax section and find out how the XML Specification uses the term. Included are sections on XML Syntax (information derived from the February 1998 XML Specification), XLink and XPointer (1998 Working Drafts), Cascading Style Sheets (CSS1 and CSS2), the DSSSL-O subset of DSSSL (August 1998), Appendixes on Unicode and XML Editors and Utilities, and a Glossary. . ."

[CR: 19991002]

Piez, Wendell. "Review of The XML Black Book, by Natanya Pitts-Moultis and Cheryl Kirk." [BOOK REVIEW] Markup Languages: Theory & Practice 1/3 (Summer 1999) 117. ISSN: 1099-6621 [MIT Press]. Author's affiliation: Mulberry Technologies; WWW.

"Pitts-Moultis and Kirk's XML Black Book, billed as a 'comprehensive reference', tries to cover the full range of XML-related issues. It contains six parts, variously approaching high- and low-level problems of document modeling, system design and implementation, style sheet technologies, application development and so on. Within these parts the chapters, with titles like 'Implementing XML in a Corporate Environment' or 'Creating Content in XML', each contain an 'In Depth' and an 'Immediate Solutions' section. . ."

Also in this issue of MLTP:

Announcement for Markup Technologies '99 [page 60]
Books Received [page 118]
Author biographies [pages 119-120]

