Cover Pages: Markup Languages: Theory and Practice. Volume 1, Number 4: Table of Contents

This document contains an annotated Table of Contents for Markup Languages: Theory and Practice, Volume 1, Number 4 (Fall 1999). See further information in (1) Journal publication information for Volume 1, Number 4; in (2) the overview in the serials document, Markup Languages: Theory & Practice; and in (3) the journal description document. An online TOC is also accessible on the MIT Press Web site.

[CR: 20000620]

Birnbaum, David J.; Mundie, David A. "The Problem of Anomalous Data: A transformational approach." [ARTICLE] Markup Languages: Theory & Practice 1/4 (Fall 1999) 1-19 (with 11 references). ISSN: 1099-6622 [MIT Press]. Authors' affiliation: [Birnbaum] University of Pittsburgh, Department of Slavic Languages and Literatures; Email: djb@clover.slavic.pitt.edu, WWW; [Mundie] DASYS Inc.; Email: mundie@dasyseda.com; WWW.

Abstract: "In the context of encoding new texts, the use of automated tools to ensure conformance to a grammar specified in a DTD has many benefits, most notably the guarantee of correctness and the simplification of downstream processing applications. In the context of producing new electronic editions of existing print documents, however, the use of those tools is problematic, because the existing paper texts may violate their underlying structure due to human error during their compilation or production. The producer of such a document is torn between the need to preserve the historical record of the original text, on the one hand, and the need to produce a correct, validated document that conforms to a DTD, on the other. Such texts raise the philosophic problem of encoding what is in essence an invalid document within a framework designed specifically to support validity. The solution we propose is to view the valid and the invalid texts as transforms of each other. By capturing the transformation rules between them, we can easily produce a correct SGML document instance while at the same time preserving the historical record."

[The problem]: "In a standard SGML authoring model, SGML tools can ensure that newly-created documents conform to a DTD developed for a particular purpose. This model is appropriate in an environment where SGML tools are used to create new structured documents, but it is less well suited to the production of electronic versions of pre-existing print (or even non-SGML electronic) documents, an extremely common enterprise in humanities computing. One reason that such transcriptions are problematic is that portions of pre-existing documents that were created outside SGML editors may, owing to the fallibility of human editors, violate the overall logical structure of those documents in general. For example, a dictionary entry might improperly omit an obligatory element, or place it out of an otherwise strict and regular position..." [Conclusion:] "Ultimately one can envisage merging this transformation-based system with a traditional SGML system, perhaps using the error-correction facilities of the parser to generate at least some of the transformation rules automatically. In an authoring environment enriched in this way, the system might query the user upon encountering a parsing error. The user would either correct the error or inform the system of how the erroneous structure should be mapped automatically to a valid structure. The interjection of this type of associative layer into the model allows the document instance to preserve the syntax of the original, it allows the DTD to model the abstract structure underlying the original even when that structure is not followed with absolute fidelity, and it provides a format where users can specify in formal ways the relationships between ideal and actual markup without compromising the integrity of either the transcription of the primary source or the DTD that purports to represent the syntactic structure underlying that source."
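
The idea of treating the transcribed text and the valid text as transforms of each other can be illustrated with a small sketch. This is not the authors' system: the three-element content model, the function names, and the sample dictionary entries are all hypothetical, and the sketch assumes each element occurs at most once in an entry. The transcription is left untouched; a separate mapping records how it relates to the structure the content model expects.

```python
# Hypothetical content model standing in for a DTD declaration such as
# <!ELEMENT entry (headword, pos, definition)>.
CONTENT_MODEL = ["headword", "pos", "definition"]


def derive_mapping(transcribed):
    """For each slot in the content model, return the index of the transcribed
    element that fills it, or None if the source omitted that element."""
    names = [name for name, _ in transcribed]
    return [names.index(slot) if slot in names else None for slot in CONTENT_MODEL]


def to_valid(transcribed, mapping):
    """Build a valid entry from the transcription plus the recorded mapping.
    Omitted elements are supplied as empty placeholders."""
    return [
        transcribed[index] if index is not None else (slot, "")
        for slot, index in zip(CONTENT_MODEL, mapping)
    ]


# An entry whose print source puts the part of speech before the headword.
misordered = [("pos", "n."), ("headword", "kniga"), ("definition", "book")]
# An entry whose print source omits the obligatory part of speech.
incomplete = [("headword", "dom"), ("definition", "house")]

for entry in (misordered, incomplete):
    mapping = derive_mapping(entry)
    print(mapping, to_valid(entry, mapping))
```

Because the mapping is stored separately, the transcription keeps the order of the original while the derived entry satisfies the content model.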

[CR: 20000620]

Sperberg-McQueen, C. Michael. "Regular expressions for dates." [SQUIB] Markup Languages: Theory & Practice 1/4 (Fall 1999) 20-26. ISSN: 1099-6622 [MIT Press]. Author's affiliation: World Wide Web Consortium/MIT Laboratory for Computer Science; Email: cmsmcq@acm.org.

"[One may hear:] 'Validation of dates, therefore, must (so goes the argument) be left to application-specific code.' While I agree that date checking is probably not best done using SGML content models, I feel compelled to point out that the claim just paraphrased, as stated, is false. It is possible to write a regular expression which recognizes dates. At the Markup Technologies '98 conference, I exhibited a deterministic finite-state automaton for recognizing Gregorian dates in the form yyyy-mm-dd (as specified by ISO 8601), which can be represented as lex code thus... The editors challenge our readers to find shorter expressions for recognizing dates, which accepts all valid dates, and only valid dates; in particular, the expression should accept a 29th day in February only in leap years, using the Gregorian rules for leap years. In particular, we are interested in (a) the shortest regular expression and (b) the shortest such expression which is unambiguous in the sense of SGML (or, synonymously, deterministic in the sense of the XML specification). For concreteness, we will specify that the expression should use the syntax of lex , and need only accept four-digit years. Variant date formats specified by ISO 8601 need not be accepted. The shortest correct expressions received by the editors before 1 July 2000 will be published in a later issue of this journal. Judgement of a panel of peer reviewers as to the correctness and length of the submissions is final."

Note 2001-08: see now the article by Eric Howland and David Niergarth in MLTP 2/2. Referenced in the online TOC for MLTP 2/2.
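
The claim in the squib, that a regular expression can accept exactly the valid Gregorian dates in yyyy-mm-dd form, can be illustrated with a short sketch. This is not the lex expression from the squib, nor a submission from the follow-up article. It uses Python's `re` syntax rather than lex, makes no attempt to be the shortest or to be deterministic in the SGML sense, and assumes that year 0000 counts as a leap year under the proleptic Gregorian rule. All names are illustrative.

```python
import re

# Two-digit multiples of 4 other than "00": 04, 08, 12, ..., 96.
MULT4 = r"(?:0[48]|[2468][048]|[13579][26])"
# Leap years: non-century years divisible by 4, or century years divisible by 400.
LEAP_YEAR = rf"(?:[0-9][0-9]{MULT4}|(?:00|{MULT4})00)"
YEAR = r"[0-9]{4}"

DATE = re.compile(
    rf"(?:{YEAR}-(?:0[13578]|1[02])-(?:0[1-9]|[12][0-9]|3[01])"  # 31-day months
    rf"|{YEAR}-(?:0[469]|11)-(?:0[1-9]|[12][0-9]|30)"             # 30-day months
    rf"|{YEAR}-02-(?:0[1-9]|1[0-9]|2[0-8])"                       # February 1-28
    rf"|{LEAP_YEAR}-02-29)"                                       # leap day only
)


def is_valid_date(text: str) -> bool:
    """Return True if text is a valid Gregorian date written as yyyy-mm-dd."""
    return DATE.fullmatch(text) is not None


print(is_valid_date("2000-02-29"))  # True: 2000 is divisible by 400
print(is_valid_date("1900-02-29"))  # False: 1900 is a century not divisible by 400
print(is_valid_date("1999-04-31"))  # False: April has 30 days
```

The pattern enumerates month lengths explicitly and isolates the leap-year test in one alternative, which keeps each branch easy to check by hand at the cost of length.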

[CR: 20000620]

Attipoe, Alfred. "Knowledge Structuring for Corporate Memory." [PROJECT REPORT] Markup Languages: Theory & Practice 1/4 (Fall 1999) 27-36 (with 8 references). ISSN: 1099-6622 [MIT Press]. Author's affiliation: SGML Technologies Group, 29 Boulevard Général Wahis 1030 Brussels, Belgium; Email: aat@sgmltech.com; WEB http://www.sgmltech.com.

Abstract: "An emerging technology is presented that leverages existing approaches of corporate memory based on document management. This technology deals with the growing and crucial needs of capitalization of the strategic and/or technical information in industrial, financial, or administrative environments. A brief summary of current methods for implementing knowledge management applications in organizations is presented, then a description of the needs for corporate memory in different domains is given. The rest of the paper focuses on SGML and XML as the most appropriated markup languages for corporate memory systems using a knowledge-based documents approach, and provides an overview of the associated system architecture as implemented in various projects of the SGML Technologies Group."

[Conclusion:] "This paper proposed an SGML-based approach to knowledge management in organizations and gave examples of applications. Traditional database and knowledge-based systems use a fine-grained structured information model for automatic processing but raw documents are not easily handled. Other approaches for knowledge management are found today; for instance data warehouses, workflow systems, and corporate document management portals (such as LIVELINK or DOCUMENTUM), bring the means for integrating the information resources of an organization and allow the users to perform complex searches. However no SGML-like fine-grained model of the information is provided in such systems. On the contrary, this approach shows the knowledge modeling facilities offered by a markup language such as SGML and XML, along with architecture organization for a value-added corporate memory system. The modeling approach is similar to research works on conceptual structures, however the architecture organization is the result of several years of experience in the development of advanced SGML document publishing system. Knowledge management requires a continual effort promoting both information sharing and timelessness of quality information resources. As a consequence, new roles should be defined for the daily management of the corporate memory system. Two types of knowledge management roles are conceivable: (1) a technical role for system management; and (2) a strategic role (on top of the technical role), for continual reinforcement of the content of the corporate memory and for motivation of the users..."

[CR: 20000620]

Holman, G. Ken. "XML Conformance (draft-19990728-1650)." [STANDARDS REPORTS] Markup Languages: Theory & Practice 1/4 (Fall 1999) 37-45 (with 4 references). ISSN: 1099-6622 [MIT Press]. Author's affiliation: Chief Technology Officer, Crane Softwrights Ltd., Box 266, Kars, Ontario CANADA K0A-2E0; Email: gkholman@CraneSoftwrights.com, WEB http://www.CraneSoftwrights.com/links/mlang-link.htm .

Abstract: "OASIS (The Organization for the Advancement of Structured Information Standards), began looking at the issue of XML (Extensible Markup Language) interoperability and conformance six months before the XML 1.0 Recommendation [Bray, Paoli, and Sperberg-McQueen, "XML 1.0"] was finalized. Nothing in the XML Recommendation is said explicitly about conformance testing, though conformance is described boldly in Section 5.0. OASIS undertook to collect and create conformance tests for use by the XML community. OASIS specifically decided to postpone any decision regarding the act of conformance testing, leaving this task to other organizations who would find the OASIS XML Conformance Test Suite [OASIS XML Conformance Technical Subcommittee, "XML Conformance Tests"] of use when providing testing tools (such as NIST (National Institute of Standards and Technology)), or for testing authorities when providing the service of XML Conformance Testing or Certification. The objective of the committee is to pool, create, and document files of data that can be used to test the behavior of XML processors against the expected behavior of conforming XML processors. These files of data include valid, well-formed, not valid and not well-formed XML instances (the recommendation does not use or define the terms "invalid" or "malformed" so they are not used in the test suite). This project report relates a brief history of the project, describes the objectives of the committee, outlines the resulting report on an XML Conformance Test Suite, and comments on how the report is and can be exploited by the XML community at large."

[Conclusion:] "A public information page at http://www.oasis-open.org/committees/xmlconf-pub.html has been created to relate the status of the committee's efforts. Links to versions of the test suite can be found through that page. Continued contributions to the OASIS Technical Committee will be considered for incorporation into the suite. Readers should consider joining OASIS to get involved in the process and work directly on the committee. Readers are encouraged to talk to vendors about the vendor's membership in OASIS and their commitment to conformance. The areas of conformance for the XML family of recommendations will continue to grow. The busy working groups described in the W3C XML Activity (http://www.w3.org/XML/Activity.html) will be producing new recommendations in which conformance testing will be an important issue. OASIS will continue to identify, develop and provide contributions related to interoperability and conformance of XML-related recommendations as it meets its mandates to its membership and the XML community at large. Your company's and your own participation in OASIS can help promote these goals."

See further references in "XML Conformance."
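
The abstract above describes files of data used to compare a processor's behavior with the behavior expected of a conforming processor. A minimal sketch of that idea, restricted to well-formedness, follows. It uses Python's standard `xml.etree.ElementTree`, which is a non-validating parser, so the "valid" and "not valid" categories are out of scope here. The sample documents, the `CASES` list, and the function names are invented for illustration and are not taken from the OASIS test suite.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Each case pairs a document with the outcome a conforming processor should report.
CASES = [
    ("<doc><p>text</p></doc>", True),   # well-formed
    ("<doc><p>text</doc>", False),      # mismatched end tag
    ("<doc a='1' a='2'/>", False),      # duplicate attribute name
    ("<doc/><doc/>", False),            # more than one root element
]


def run_suite(cases):
    """Return the cases where the observed outcome differs from the expected one."""
    return [
        (document, expected)
        for document, expected in cases
        if is_well_formed(document) != expected
    ]


failures = run_suite(CASES)
print("failures:", failures)
```

An empty list of failures means the parser under test agreed with the expected outcome for every sample.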

[CR: 20000620]

Pepper, Steve. "Navigating Haystacks and Discovering Needles: Introducing the New Topic Map Standard." [STANDARDS REPORTS] Markup Languages: Theory & Practice 1/4 (Fall 1999) 47-74 (with 19 references). ISSN: 1099-6622 [MIT Press]. Author's affiliation: Email: pepper@ontopia.net; WWW: Ontopia.

Abstract: "This article provides an introduction to the new topic map standard (ISO/IEC 13250) with particular reference to the domain of encyclopaedia publishing, and discusses the relationship between topic maps and the W3C recommendation Resource Description Framework (RDF). It is based on the author's participation in the development of the topic map standard (representing Norway in SC34, the ISO committee responsible for SGML and related standards), and two years' collaboration with the leading reference works publishers in Norway, Denmark, Sweden, Poland, and Germany."

"As the amount of information available on the WWW and elsewhere continues to grow at an almost exponential rate, it becomes increasingly difficult to locate the particular piece of information we need: precious time and resources are consumed navigating haystacks of information and those sought-after needles of information become ever more difficult to discover. Two recent standards are designed to provide ways of coping with this problem: ISO/IEC 13250, Information technology - SGML Applications - Topic maps [ISO, Topic Maps] and the Resource Description Framework (RDF) [W3C, RDF model and syntax], [W3C, RDF schema]. This article aims to provide a simple introduction to the basic concepts underlying the first of these, the topic map standard, and to discuss the relationship between topic maps and RDF. In the first section we introduce the topic map standard itself and describe its background, rationale and current status. The second section presents the topic map model along with its key concepts. This is followed by a discussion of some areas of applicability of the topic paradigm to the domain of encyclopaedia publishing. Finally we give a brief overview of RDF and discuss its relationship with topic maps."

[Conclusion:] "The new topic map standard provides a standardized way of modeling the structure of the knowledge contained in information resources in such a way as to enable new means of navigation and retrieval, and ultimately also new means of organization of that information. The applicability of topic maps extends to all spheres of information management, not least commercial reference works, and effectively 'bridges the gap' between knowledge representation and information management. Support for topic maps is currently being implemented in a number of information management tools, including STEP's document management and editorial system, SigmaLink."

See also Steve Pepper's presentation from the XML Europe 2000 Conference (Paris): "The TAO of Topic Maps." For other references, see "(XML) Topic Maps."

[CR: 20000620]

Graham, Tony. "Unicode: What is it and how do I use it?" [ARTICLE] Markup Languages: Theory & Practice 1/4 (Fall 1999) 75-102 (with 12 references). ISSN: 1099-6622 [MIT Press]. Author's affiliation: Senior Consultant, Mulberry Technologies, Inc.; Email: info@mulberrytech.com, WEB http://www.mulberrytech.com .

Abstract: "The rationale for Unicode and its design goals and detailed design principles are presented. The correspondence between Unicode and ISO/IEC 10646 is discussed, the scripts included or planned for inclusion in the two character set standards are listed. Some products that support Unicode and some applications that require Unicode are listed, then examples of how to specify Unicode characters in a variety of applications are given. Use of Unicode in SGML and XML applications is discussed, and the paper concludes with descriptions of the character encodings used with Unicode and ISO/IEC 10646, plus sources of further information are listed."

"Unicode was born from frustration by software manufacturers with the fragmented, complicated, and contradictory character encodings in use around the world. The technical difficulties of dealing with different coded character sets meant that software had to be extensively localized before being released into different markets, which meant that the 'other language' versions of software could be significantly delayed and also be significantly different internally because of the character handling requirements. Not surprisingly, the same character handling changes were regrafted onto each new version of the base software before it could be released into other markets. Of course the character encoding is not the only aspect to be changed when localizing software, but it is a large component, and it is the aspect most likely to have a custom solution for each new language... The Unicode Standard defines a fixed-width 16-bit, uniform encoding scheme for written characters and text. The Unicode Standard, Version 2.0, is also code-for-code identical with ISO/IEC 10646-1:1993 (plus the first seven amendments), and Version 3.0 will be code-for-code identical with the forth-coming ISO/IEC 10646-1:2000. The Unicode Standard, however, does define more semantics for characters than does ISO/IEC 10646. The Unicode Technical Committee has a liaison membership with the ISO/IEC Working Group responsi-ble for computer character sets."

"What are the Unicode and ISO/IEC 10646 Coded Representations? -- The Unicode Standard and ISO/IEC 10646 share a bewildering variety of coded representations for the same characters. The following sections will help in deciphering the acronyms [. . .] Which is best for me? There is no single, simple answer to this question. The choice of encoding will depend in part upon your language and in part upon the tools that you are using. For example, if you are working in English, it is simplest to use UTF-8 since UTF-8 is a superset of ASCII, but if you are working in Japanese, it is preferable to use UTF-16, since using UTF-8 would result in larger files. However, if you working with Perl, the latest version of which handles Unicode characters as UTF-8 internally, it might be simplest to only use UTF-8 for input and output. Your choice may be determined by your choice of application, since even on 'little-endian' Intel processors, some applications save Unicode text as UTF-16BE. The good news is that recent versions of web browsers can handle both UTF-8 and UTF-16 text, as can your XML software..."

See also references in (1) "XML and Unicode" and (2) the Unicode Consortium Home Page.
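
The trade-off described in the excerpt above (UTF-8 is compact for English and a superset of ASCII, while UTF-16 is more compact for Japanese) can be checked with a short sketch using Python's built-in codecs. The sample strings and the helper name `encoded_sizes` are illustrative choices, not taken from the article.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes needed to store text in UTF-8 and in UTF-16BE."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16-be": len(text.encode("utf-16-be")),
    }


english = "markup languages"   # 16 ASCII characters
japanese = "マークアップ言語"   # 8 characters, all in the Basic Multilingual Plane

print(encoded_sizes(english))   # {'utf-8': 16, 'utf-16-be': 32}
print(encoded_sizes(japanese))  # {'utf-8': 24, 'utf-16-be': 16}

# ASCII text has the same bytes in UTF-8 as in ASCII.
print(english.encode("utf-8") == english.encode("ascii"))  # True

# Byte order matters for UTF-16: the same character, big-endian vs little-endian.
print("A".encode("utf-16-be"), "A".encode("utf-16-le"))  # b'\x00A' b'A\x00'
```

Each ASCII character takes one byte in UTF-8 and two in UTF-16, while each of these Japanese characters takes three bytes in UTF-8 and two in UTF-16, which is the size difference the excerpt refers to.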

[CR: 20000620]

Flynn, Peter. "[Review of] XML by Example." [BOOK REVIEWS] Markup Languages: Theory & Practice 1/4 (Fall 1999) 103-105. ISSN: 1099-6622 [MIT Press].

XML by Example, by Seán McGrath. Charles F. Goldfarb Series on Open Information Management. Prentice-Hall PTR, 1998.

"The book is in four Parts, and assumes only a basic knowledge of HTML and the Web to begin with, such as any reasonably well up-to-date business user can be expected to have. McGrath takes you simply and clearly through the techniques XML is being used for to improve the functionality and reliability of business on the Web. The tone is light but careful and precise, and avoids the forced jokiness which some other publishers and authors believe is appropriate for business technology. Later in the book the reader needs a good grasp of computers, and a little programming experience is useful: Java, Perl, and Python are all represented. . . This book is for those who are taking e-commerce seriously, harnessing the data-moving power of the Internet to the descriptive power of XML. McGrath's own experience shows in his handling of a technical subject without making it unnecessarily complex: the level is well within the range of anyone who is comfortable with computing and the Internet. The management summary, for example, provides an excellent run-through of the principles without assuming more than a simple familiarity with the Web; yet the really technical bits remain accurate and reliable, unlike many books on HTML (although there are one or two mildly invalid HTML oddities, but they don't really count). . ."

[CR: 20000620]

Mah, Carole E. "[Review of] Understanding SGML and XML Tools: Practical Programs for Handling Structured Text." [BOOK REVIEWS] Markup Languages: Theory & Practice 1/4 (Fall 1999) 105-107. ISSN: 1099-6622 [MIT Press].

Understanding SGML and XML Tools: Practical Programs for Handling Structured Text, by Peter Flynn. Kluwer Academic Publishers, 1998. Foreword by Steven DeRose.

"...the [author's] goal is not to provide exhaustive detail (which would soon be out-of-date), but to provide a roadmap to assist readers in understanding the SGML and XML tools well enough to analyze tools in more detail on their own. Flynn provides many well-placed references to other books and to web resources which can provide more detail. This book would be ideal for someone who does not know much about SGML/XML or associated tools, providing a very solid understanding from which to move forward. Frequent internal cross-references, along with a good deal of redundant information mean that ones does not have to read from cover to cover but can flip directly to information of particular interest. The well-structured divisions, sections and subsections also make this very easy. Together, these two features modestly manage to showcase the utility of using SGML by their very existence! Flynn of course used SGML tools to produce his book, and the astute reader can see the fruit of this labor everywhere, but especially in these features. Additionally, even the reader who wishes to read the whole book from cover to cover cannot dislike the repetition, as there is just enough to usefully reinforce concepts. These features all coalesce to produce a book that is useful both to the SGML newcomer and to the old hands (particularly scholars) who have long wished for a book which brings together information heretofore available only in hearsay and fragments. Newcomers will find the introduction to SGML concepts especially lucid: I have never read a more cogent and simple explanation of the differences between parameter, character, and general entities..."

[CR: 20000620]

Lapeyre, Deborah A. "[Annotations on] Practical Transformation Using XSLT and Xpath (XSL Transformations and the XML Path Language)." [ANNOTATED TABLE OF CONTENTS] Markup Languages: Theory & Practice 1/4 (Fall 1999) 107-110. ISSN: 1099-6622 [MIT Press].

Publication by Crane Softwrights Ltd.

"[The volume Practical Transformation Using XSLT and Xpath (XSL Transformations and the XML Path Language)] provides an introduction to the conceptual models, mechanisms, and detailed syntax for XSLT (Extensible Stylesheet Language Transformations) and XPath (XML Path Language). XSLT is the portion of W3C's XSL recommendation designed to transform an XML document into another structure, such as a different XML document, an HTML document, a document composed of formatting objects, or a document in a proprietary markup system. This is tree transformation, with an XSL engine, under the direction of an XSLT stylesheet, transforming a well-formed XML document tree into a result tree in another form. XPath is the document component addressing portion of the XML recommendations family, which specifies how to address the nodes in these trees. This book, which assumes knowledge of XML but no prior knowledge of XPath or XSLT, provides extensive samples and diagrams illustrating XSLT architectures and describes the XSLT instructions and their use Since this book is published only in electronic form, Ken Holman's Practical Transformation provides what no printed volume could-nearly immediate updates whenever the XSL or XPath specifications change, as they have done 4 or 5 times in the last 6 months. The book is written as a series of slides (quite lengthy slides) rather than as textual paragraphs within chapters, since the work supports Crane Softwrights' XSL classes. This style is appropriate for instructional material and the slides stand alone without need for an instructor. An abbreviated version of the work (over 100 pages) is available for free download so that the work can be evaluated before purchase. The free download includes very useful indices to the XSLT Draft Recommendation...[summary of the twelve modules follows]"

