Markup Languages: Theory and Practice. Volume 1, Number 4: Table of Contents
This document contains an annotated Table of Contents for Markup Languages: Theory and Practice, Volume 1, Number 4 (Fall 1999). See further information in (1) Journal publication information for Volume 1, Number 4; in (2) the overview in the serials document, Markup Languages: Theory & Practice; and in (3) the journal description document. An online TOC is also accessible on the MIT Press Web site.
Birnbaum, David J.; Mundie, David A. "The Problem of Anomalous Data: A transformational approach." [ARTICLE] Markup Languages: Theory & Practice 1/4 (Fall 1999) 1-19 (with 11 references). ISSN: 1099-6622 [MIT Press].
Authors' affiliation: [Birnbaum] University of Pittsburgh, Department of Slavic Languages and Literatures;
[Mundie] DASYS Inc.;
Abstract: "In the context of encoding new texts, the use of automated tools to ensure conformance to a grammar specified in a DTD has many benefits, most notably the guarantee of correctness and the simplification of downstream processing applications. In the context of producing new electronic editions of existing print documents, however, the use of those tools is problematic, because the existing paper texts may violate their underlying structure due to human error during their compilation or production. The producer of such a document is torn between the need to preserve the historical record of the original text, on the one hand, and the need to produce a correct, validated document that conforms to a DTD, on the other. Such texts raise the philosophic problem of encoding what is in essence an invalid document within a framework designed specifically to support validity. The solution we propose is to view the valid and the invalid texts as transforms of each other. By capturing the transformation rules between them, we can easily produce a correct SGML document instance while at the same time preserving the historical record."
[The problem]: "In a standard SGML authoring model, SGML tools can ensure that newly-created
documents conform to a DTD developed for a particular purpose. This model is
appropriate in an environment where SGML tools are used to create new
structured documents, but it is less well suited to the production of electronic
versions of pre-existing print (or even non-SGML electronic) documents, an
extremely common enterprise in humanities computing. One reason that such
transcriptions are problematic is that portions of pre-existing documents that
were created outside SGML editors may, owing to the fallibility of human editors,
violate the overall logical structure of those documents in general. For example, a
dictionary entry might improperly omit an obligatory element, or place it out of
an otherwise strict and regular position..." [Conclusion:] "Ultimately one can envisage merging this transformation-based system with a
traditional SGML system, perhaps using the error-correction facilities of the
parser to generate at least some of the transformation rules automatically. In an
authoring environment enriched in this way, the system might query the user
upon encountering a parsing error. The user would either correct the error or
inform the system of how the erroneous structure should be mapped
automatically to a valid structure. The interjection of this type of associative layer
into the model allows the document instance to preserve the syntax of the
original, it allows the DTD to model the abstract structure underlying the original
even when that structure is not followed with absolute fidelity, and it provides a
format where users can specify in formal ways the relationships between ideal and
actual markup without compromising the integrity of either the transcription of
the primary source or the DTD that purports to represent the syntactic structure
underlying that source."
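Illustration (not taken from the article): the idea of treating the printed text and its valid counterpart as transforms of each other can be sketched in a few lines of Python. Everything named here is hypothetical: the three-element content model, the `Move` rule type, and the sample entry were invented for this note and do not represent the authors' rule format. The sketch only shows that a recorded, invertible rule lets one keep the transcription as printed while still deriving a conforming instance.

```python
from dataclasses import dataclass

# Hypothetical content model: an entry is headword, pos, definition, in that order.
EXPECTED_ORDER = ["headword", "pos", "definition"]


@dataclass(frozen=True)
class Move:
    """Recorded rule: the child at index `source` belongs at index `target`."""
    source: int
    target: int

    def apply(self, children):
        # Work on a copy so the transcription itself is never altered.
        items = list(children)
        items.insert(self.target, items.pop(self.source))
        return items

    def invert(self):
        # Swapping the indices undoes the move.
        return Move(self.target, self.source)


def is_valid(children):
    """True when the element names follow the assumed content model."""
    return [name for name, _ in children] == EXPECTED_ORDER


# Invented sample: the print source places the part of speech last.
PRINTED = [("headword", "book"), ("definition", "a written work"), ("pos", "noun")]
RULE = Move(source=2, target=1)

CORRECTED = RULE.apply(PRINTED)             # conforming instance
RESTORED = RULE.invert().apply(CORRECTED)   # historical record recovered
```

Storing `PRINTED` together with `RULE` keeps both views available: the valid form is derived on demand, and the original order is never lost.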
Sperberg-McQueen, C. Michael. "Regular expressions for dates." [SQUIB] Markup Languages: Theory & Practice 1/4 (Fall 1999) 20-26. ISSN: 1099-6622
Author's affiliation: World Wide Web Consortium/MIT Laboratory for Computer Science;
"[One may hear:] 'Validation of dates, therefore, must (so goes the argument) be left
to application-specific code.' While I agree that date checking is probably not best done using SGML
content models, I feel compelled to point out that the claim just paraphrased, as
stated, is false. It is possible to write a regular expression which recognizes dates. At the
Markup Technologies '98 conference, I exhibited a deterministic finite-state
automaton for recognizing Gregorian dates in the form yyyy-mm-dd (as specified
by ISO 8601), which can be represented as lex code thus... The editors challenge our readers to find shorter expressions for recognizing
dates, which accepts all valid dates, and only valid dates; in particular, the
expression should accept a 29th day in February only in leap years, using the
Gregorian rules for leap years. In particular, we are interested in (a) the shortest
regular expression and (b) the shortest such expression which is unambiguous in
the sense of SGML (or, synonymously, deterministic in the sense of the XML
For concreteness, we will specify that the expression should use the syntax of
lex , and need only accept four-digit years. Variant date formats specified by ISO
8601 need not be accepted.
The shortest correct expressions received by the editors before 1 July 2000
will be published in a later issue of this journal. Judgement of a panel of peer
reviewers as to the correctness and length of the submissions is final."
Note 2001-08: see now the article by Eric Howland and David Niergarth in MLTP 2/2. Referenced in the online TOC for MLTP 2/2.
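Illustration (not the lex expression from the squib, and not a contest entry): the claim that a regular expression can recognize valid Gregorian dates is easy to check with a small sketch. The version below uses Python's `re` syntax rather than lex, makes no attempt at brevity or at determinism in the SGML/XML sense, and all names in it are chosen for this note. It accepts four-digit years only and treats year 0000 as a leap year because it is divisible by 400.

```python
import re

# Days 01-28 are valid in every month of every year.
DAY_01_28 = r"(?:0[1-9]|1[0-9]|2[0-8])"

# Month-day pairs whose validity does not depend on the year.
MONTH_DAY = "".join([
    r"(?:",
    r"(?:0[1-9]|1[0-2])-", DAY_01_28,    # any month, days 01-28
    r"|(?:0[13-9]|1[0-2])-(?:29|30)",    # days 29-30, every month except February
    r"|(?:0[13578]|1[02])-31",           # day 31, the seven 31-day months
    r")",
])

# Two-digit numbers divisible by four, excluding 00.
DIV4 = r"(?:0[48]|[2468][048]|[13579][26])"

# Four-digit Gregorian leap years.
LEAP_YEAR = "".join([
    r"(?:",
    r"[0-9]{2}", DIV4,                   # divisible by 4, not a century year
    r"|(?:00|", DIV4, r")00",            # century years divisible by 400
    r")",
])

# Either an ordinary date in any year, or 29 February in a leap year.
DATE_RE = re.compile(
    r"(?:[0-9]{4}-" + MONTH_DAY + r"|" + LEAP_YEAR + r"-02-29)"
)


def is_valid_date(text: str) -> bool:
    """True when `text` is a valid Gregorian date written as yyyy-mm-dd."""
    return DATE_RE.fullmatch(text) is not None
```

For example, `is_valid_date("2000-02-29")` is true while `is_valid_date("1900-02-29")` is false, since 1900 is a century year not divisible by 400.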
Attipoe, Alfred. "Knowledge Structuring for Corporate Memory." [PROJECT REPORT] Markup Languages: Theory & Practice 1/4 (Fall 1999) 27-36 (with 8 references). ISSN: 1099-6622 [MIT Press]. Author's affiliation:
SGML Technologies Group, 29 Boulevard Général Wahis 1030 Brussels, Belgium;
Abstract: "An emerging technology is presented that leverages existing approaches of corporate memory based on document management. This technology deals with the growing and crucial needs of capitalization of the strategic and/or technical information in industrial, financial, or administrative environments. A brief summary of current methods for implementing knowledge management applications in organizations is presented, then a description of the needs for corporate memory in different domains is given. The rest of the paper focuses on SGML and XML as the most appropriated markup languages for corporate memory systems using a knowledge-based documents approach, and provides an overview of the associated system architecture as implemented in various projects of the SGML Technologies Group."
[Conclusion:] "This paper proposed an SGML-based approach to knowledge management in
organizations and gave examples of applications. Traditional database and
knowledge-based systems use a fine-grained structured information model for
automatic processing but raw documents are not easily handled. Other
approaches for knowledge management are found today; for instance data
warehouses, workflow systems, and corporate document management portals
(such as LIVELINK or DOCUMENTUM), bring the means for integrating the
information resources of an organization and allow the users to perform complex
searches. However no SGML-like fine-grained model of the information is
provided in such systems. On the contrary, this approach shows the knowledge
modeling facilities offered by a markup language such as SGML and XML, along
with architecture organization for a value-added corporate memory system. The
modeling approach is similar to research works on conceptual structures,
however the architecture organization is the result of several years of experience
in the development of advanced SGML document publishing system.
Knowledge management requires a continual effort promoting both
information sharing and timelessness of quality information resources. As a
consequence, new roles should be defined for the daily management of the
corporate memory system. Two types of knowledge management roles are
conceivable: (1) a technical role for system management; and
(2) a strategic role (on top of the technical role), for continual reinforcement of the
content of the corporate memory and for motivation of the users..."
Holman, G. Ken. "XML Conformance (draft-19990728-1650)." [STANDARDS REPORTS] Markup Languages: Theory & Practice 1/4 (Fall 1999) 37-45 (with 4 references). ISSN: 1099-6622 [MIT Press]. Author's affiliation:
Chief Technology Officer, Crane Softwrights Ltd., Box 266, Kars, Ontario CANADA K0A-2E0;
Abstract: "OASIS (The Organization for the Advancement of Structured Information Standards), began looking at the issue of XML (Extensible Markup Language) interoperability and conformance six months before the XML 1.0 Recommendation [Bray, Paoli, and Sperberg-McQueen, "XML 1.0"] was finalized. Nothing in the XML Recommendation is said explicitly about conformance testing, though conformance is described boldly in Section 5.0. OASIS undertook to collect and create conformance tests for use by the XML community. OASIS specifically decided to postpone any decision regarding the act of conformance testing, leaving this task to other organizations who would find the OASIS XML Conformance Test Suite [OASIS XML Conformance Technical Subcommittee, "XML Conformance Tests"] of use when providing testing tools (such as NIST (National Institute of Standards and Technology)), or for testing authorities when providing the service of XML Conformance Testing or Certification. The objective of the committee is to pool, create, and document files of data that can be used to test the behavior of XML processors against the expected behavior of conforming XML processors. These files of data include valid, well-formed, not valid and not well-formed XML instances (the recommendation does not use or define the terms "invalid" or "malformed" so they are not used in the test suite). This project report relates a brief history of the project, describes the objectives of the committee, outlines the resulting report on an XML Conformance Test Suite, and comments on how the report is and can be exploited by the XML community at large."
[Conclusion:] "A public information page at
http://www.oasis-open.org/committees/xmlconf-pub.html has been
created to relate the status
of the committee's efforts. Links to versions of the test suite can be found through
Continued contributions to the OASIS Technical Committee will be
considered for incorporation into the suite. Readers should consider joining
OASIS to get involved in the process and work directly on the committee. Readers
are encouraged to talk to vendors about the vendor's membership in OASIS and
their commitment to conformance. The areas of conformance for the XML family of recommendations will
continue to grow. The busy working groups described in the W3C XML Activity
(http://www.w3.org/XML/Activity.html) will be producing new
recommendations in which conformance testing will be an important issue.
OASIS will continue to identify, develop and provide contributions related to
interoperability and conformance of XML-related recommendations as it meets
its mandates to its membership and the XML community at large. Your
company's and your own participation in OASIS can help promote these goals."
See further references in "XML Conformance."
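Illustration (not the OASIS test suite or its file format): the committee's objective of testing processor behavior against expected behavior can be pictured as a table of instances paired with expected outcomes. The sketch below uses the non-validating parser in Python's standard library, so it can only separate well-formed from not well-formed instances; the sample documents and the function names are assumptions made for this note.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Report whether the standard-library parser accepts the instance."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Hypothetical test cases: (instance, expected well-formedness).
CASES = [
    ("<doc><p>text</p></doc>", True),
    ("<doc><p>text</doc>", False),    # mismatched end tag
    ("<doc a='1' a='2'/>", False),    # attribute specified twice
    ("<doc/><doc/>", False),          # more than one root element
]


def run_suite(cases):
    """Return (instance, expected, passed) for each case."""
    return [
        (doc, expected, is_well_formed(doc) == expected)
        for doc, expected in cases
    ]


if __name__ == "__main__":
    for doc, expected, passed in run_suite(CASES):
        print("PASS" if passed else "FAIL", repr(doc), "expected:", expected)
```

Distinguishing valid from not valid instances would need a validating processor, which this sketch does not use.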
Pepper, Steve. "Navigating Haystacks and Discovering Needles: Introducing the New Topic Map Standard." [STANDARDS REPORTS] Markup Languages: Theory & Practice 1/4 (Fall 1999) 47-74 (with 19 references). ISSN: 1099-6622 [MIT Press]. Author's affiliation: Email: email@example.com; WWW: Ontopia.
Abstract: "This article provides an introduction to the new
topic map standard (ISO/IEC 13250) with particular reference to the domain of encyclopaedia publishing, and discusses the relationship between topic maps and the W3C recommendation Resource Description Framework (RDF). It is based on the author's participation in the development of the topic map standard (representing Norway in SC34, the ISO committee responsible for SGML and related standards), and two years' collaboration with the leading reference works publishers in Norway, Denmark, Sweden, Poland, and Germany."
"As the amount of
information available on the WWW and elsewhere continues to grow at an
almost exponential rate, it becomes increasingly difficult to locate the particular
piece of information we need: precious time and resources are consumed
navigating haystacks of information and those sought-after needles of information
become ever more difficult to discover.
Two recent standards are designed to provide ways of coping with this problem:
ISO/IEC 13250, Information technology - SGML Applications - Topic
maps [ISO, Topic Maps] and the Resource Description Framework (RDF) [W3C,
RDF model and syntax], [W3C, RDF schema]. This article aims to provide a
simple introduction to the basic concepts underlying the first of these, the topic
map standard, and to discuss the relationship between topic maps and RDF. In
the first section we introduce the topic map standard itself and describe its background,
rationale and current status. The second section presents the topic map
model along with its key concepts. This is followed by a discussion of some areas
of applicability of the topic paradigm to the domain of encyclopaedia publishing.
Finally we give a brief overview of RDF and discuss its relationship with topic maps."
[Conclusion:] "The new topic map standard provides a standardized way of modeling the
structure of the knowledge contained in information resources in such a way as to
enable new means of navigation and retrieval, and ultimately also new means of
organization of that information.
The applicability of topic maps extends to all spheres of information
management, not least commercial reference works, and effectively 'bridges the
gap' between knowledge representation and information management.
Support for topic maps is currently being implemented in a number of
information management tools, including STEP's document management and
editorial system, SigmaLink."
See also Steve Pepper's presentation from the XML Europe 2000 Conference (Paris): "The TAO of Topic Maps." For other references, see "(XML) Topic Maps."
Graham, Tony. "Unicode: What is it and how do I use it?" [ARTICLE] Markup Languages: Theory & Practice 1/4 (Fall 1999) 75-102 (with 12 references). ISSN: 1099-6622 [MIT Press]. Author's affiliation:
Senior Consultant, Mulberry Technologies, Inc.;
Email: firstname.lastname@example.org, WEB
Abstract: "The rationale for Unicode and its design goals and detailed design principles are presented. The correspondence between Unicode and ISO/IEC 10646 is discussed, the scripts included or planned for inclusion in the two character set standards are listed. Some products that support Unicode and some applications that require Unicode are listed, then examples of how to specify Unicode characters in a variety of applications are given. Use of Unicode in SGML and XML applications is discussed, and the paper concludes with descriptions of the character encodings used with Unicode and ISO/IEC 10646, plus sources of further information are listed."
"Unicode was born from frustration by software manufacturers with the
fragmented, complicated, and contradictory character encodings in use around
the world. The technical difficulties of dealing with different coded character sets
meant that software had to be extensively localized before being released into
different markets, which meant that the 'other language' versions of software
could be significantly delayed and also be significantly different internally because
of the character handling requirements. Not surprisingly, the same character
handling changes were regrafted onto each new version of the base software
before it could be released into other markets. Of course the character encoding is
not the only aspect to be changed when localizing software, but it is a large
component, and it is the aspect most likely to have a custom solution for each
new language... The Unicode Standard defines a fixed-width 16-bit, uniform encoding
scheme for written characters and text. The Unicode Standard, Version 2.0, is
also code-for-code identical with ISO/IEC 10646-1:1993 (plus the first seven
amendments), and Version 3.0 will be code-for-code identical with the forthcoming
ISO/IEC 10646-1:2000. The Unicode Standard, however, does define
more semantics for characters than does ISO/IEC 10646. The Unicode Technical
Committee has a liaison membership with the ISO/IEC Working Group responsible
for computer character sets."
"What are the Unicode and ISO/IEC 10646 Coded
Representations? -- The Unicode Standard and ISO/IEC 10646 share a bewildering variety of coded
representations for the same characters. The following sections will help in
deciphering the acronyms [. . .] Which is best for me? There is no single, simple answer to this question. The choice of encoding will
depend in part upon your language and in part upon the tools that you are using.
For example, if you are working in English, it is simplest to use UTF-8 since UTF-8
is a superset of ASCII, but if you are working in Japanese, it is preferable to use
UTF-16, since using UTF-8 would result in larger files. However, if you are working
with Perl, the latest version of which handles Unicode characters as UTF-8
internally, it might be simplest to only use UTF-8 for input and output.
Your choice may be determined by your choice of application, since even on
'little-endian' Intel processors, some applications save Unicode text as UTF-16BE.
The good news is that recent versions of web browsers can handle both
UTF-8 and UTF-16 text, as can your XML software..."
See also references in (1) "XML and Unicode" and (2) the Unicode Consortium Home Page.
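Illustration (not from the article): the size trade-off described in the excerpt can be checked directly with Python's built-in codecs. The two sample strings are chosen for this note; the Japanese sample is written with escapes so the listing itself stays in ASCII.

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of `text` in UTF-8 and in big-endian UTF-16 (no BOM)."""
    return {name: len(text.encode(name)) for name in ("utf-8", "utf-16-be")}


ENGLISH = "markup"
JAPANESE = "\u65e5\u672c\u8a9e"  # three CJK characters from the Basic Multilingual Plane

# ASCII text is byte-for-byte identical in UTF-8.
ASCII_IS_UTF8 = ENGLISH.encode("ascii") == ENGLISH.encode("utf-8")

# The same character in the two UTF-16 byte orders.
BIG_ENDIAN_A = "A".encode("utf-16-be")     # b"\x00A"
LITTLE_ENDIAN_A = "A".encode("utf-16-le")  # b"A\x00"

if __name__ == "__main__":
    print("English ", encoded_sizes(ENGLISH))   # {'utf-8': 6, 'utf-16-be': 12}
    print("Japanese", encoded_sizes(JAPANESE))  # {'utf-8': 9, 'utf-16-be': 6}
```

The English sample doubles in size under UTF-16, while the Japanese sample shrinks by a third, which is the pattern the excerpt describes.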
Flynn, Peter. "[Review of] XML by Example." [BOOK REVIEWS] Markup Languages: Theory & Practice 1/4 (Fall 1999) 103-105. ISSN: 1099-6622 [MIT Press].
XML by Example, by Seán McGrath. Charles F. Goldfarb Series on Open Information Management. Prentice-Hall PTR, 1998.
"The book is in four Parts, and assumes only a basic knowledge of HTML and
the Web to begin with, such as any reasonably well up-to-date business user can
be expected to have. McGrath takes you simply and clearly through the
techniques XML is being used for to improve the functionality and reliability of
business on the Web. The tone is light but careful and precise, and avoids the
forced jokiness which some other publishers and authors believe is appropriate
for business technology. Later in the book the reader needs a good grasp of
computers, and a little programming experience is useful: Java, Perl, and Python
are all represented. . . This book is for those who are taking e-commerce seriously, harnessing the
data-moving power of the Internet to the descriptive power of XML. McGrath's
own experience shows in his handling of a technical subject without making it
unnecessarily complex: the level is well within the range of anyone who is
comfortable with computing and the Internet. The management summary, for
example, provides an excellent run-through of the principles without assuming
more than a simple familiarity with the Web; yet the really technical bits remain
accurate and reliable, unlike many books on HTML (although there are one or
two mildly invalid HTML oddities, but they don't really count). . ."
Mah, Carole E. "[Review of] Understanding SGML and XML Tools: Practical Programs for Handling Structured Text." [BOOK REVIEWS] Markup Languages: Theory & Practice 1/4 (Fall 1999) 105-107. ISSN: 1099-6622 [MIT Press].
Understanding SGML and XML Tools: Practical Programs for Handling Structured Text, by Peter Flynn. Kluwer Academic Publishers, 1998. Foreword by Steven DeRose.
"...the [author's] goal is not to provide exhaustive detail (which would soon be out-of-date), but to
provide a roadmap to assist readers in understanding the SGML and XML tools
well enough to analyze tools in more detail on their own. Flynn provides many
well-placed references to other books and to web resources which can provide
more detail. This book would be ideal for someone who does not know much
about SGML/XML or associated tools, providing a very solid understanding from
which to move forward. Frequent internal cross-references, along with a good deal of redundant
information mean that one does not have to read from cover to cover but can
flip directly to information of particular interest. The well-structured divisions,
sections and subsections also make this very easy. Together, these two features modestly manage to showcase the utility of using SGML by their very existence!
Flynn of course used SGML tools to produce his book, and the astute reader can
see the fruit of this labor everywhere, but especially in these features.
Additionally, even the reader who wishes to read the whole book from cover to
cover cannot dislike the repetition, as there is just enough to usefully reinforce
concepts. These features all coalesce to produce a book that is useful both to the
SGML newcomer and to the old hands (particularly scholars) who have long
wished for a book which brings together information heretofore available only in
hearsay and fragments. Newcomers will find the introduction to SGML concepts
especially lucid: I have never read a more cogent and simple explanation of the
differences between parameter, character, and general entities..."
Lapeyre, Deborah A. "[Annotations on] Practical Transformation Using XSLT and Xpath (XSL Transformations and the XML Path Language)." [ANNOTATED TABLE OF CONTENTS] Markup Languages: Theory & Practice 1/4 (Fall 1999) 107-110. ISSN: 1099-6622 [MIT Press].
Publication by Crane Softwrights Ltd.
"[The volume Practical Transformation Using XSLT and Xpath (XSL Transformations and the XML Path Language)] provides an introduction to the conceptual models, mechanisms, and detailed syntax for XSLT (Extensible Stylesheet Language Transformations) and XPath (XML Path Language). XSLT is the portion of W3C's XSL recommendation
designed to transform an XML document into another structure, such as a
different XML document, an HTML document, a document composed of
formatting objects, or a document in a proprietary markup system. This is tree
transformation, with an XSL engine, under the direction of an XSLT stylesheet,
transforming a well-formed XML document tree into a result tree in another
form. XPath is the document component addressing portion of the XML
recommendations family, which specifies how to address the nodes in these trees.
This book, which assumes knowledge of XML but no prior knowledge of XPath
or XSLT, provides extensive samples and diagrams illustrating XSLT
architectures and describes the XSLT instructions and their use.
Since this book is published only in electronic form, Ken Holman's Practical
Transformation provides what no printed volume could: nearly immediate
updates whenever the XSL or XPath specifications change, as they have done 4 or 5 times in the last 6 months.
The book is written as a series of slides (quite
lengthy slides) rather than as textual paragraphs within chapters, since the work
supports Crane Softwrights' XSL classes. This style is appropriate for
instructional material and the slides stand alone without need for an instructor.
An abbreviated version of the work (over 100 pages) is available for free
download so that the work can be evaluated before purchase. The free download
includes very useful indices to the XSLT Draft Recommendation...[summary of the twelve modules follows]"