The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: September 05, 2001
Markup Languages: Theory and Practice. Volume 2, Number 2: Table of Contents

This document contains an annotated Table of Contents for Markup Languages: Theory and Practice, Volume 2, Number 2 ('Spring 2000'), pages 111-204. See further information on MLTP in: (1) the journal publication statement, (2) the overview in the serials document, Markup Languages: Theory & Practice; and in (3) the journal description document. Current subscription information is also available on the MIT Press web site.

Summary of articles in MLTP issue 2/2:

  • Can a Team Tag Consistently? Experiences on the Orlando Project
  • Regular Expressions for Checking Dates
  • Demonstrational Interface for XSLT Stylesheet Generation
  • From Semistructured Data to XML: Migrating the Lore Data Model and Query Language
  • The Consultant's Toolkit
  • Marked-up Programming: Using XML to Structure Computer Program Source Code
  • A Formal Semantics of Patterns in XSLT and XPath

[CR: 20010905]

Butler, Terry James; Sue Fisher, Greg Coulombe, Patricia Clements, Isobel Grundy, Susan Brown, Jean Wood, and Rebecca Cameron. "Can a Team Tag Consistently? Experiences on the Orlando Project." [ARTICLE] Markup Languages: Theory & Practice 2/2 (Spring 2000) 111-125 (with 5 references). ISSN: 1099-6622 [MIT Press]. Authors' affiliation: University of Alberta; email:

Abstract: "The Orlando Project is creating a literary history of British women's writing; the textbase is being researched, written and tagged in SGML to empower sophisticated end-user access. The SGML tagging is rich, and marks thematic and critical issues as well as structural elements. The team has been constituted for five years; it is about a dozen strong at most times, including senior scholars, post-doctoral fellows and graduate students. All team members write and tag the textbase. With a complex tag set and a large team, issues of tagging consistency soon became pressing for us. Focussing on the core tags which will provide powerful access to our textbase (name, date, place), we report on an analysis of our tagging practice, and provide suggestions on how to reduce inconsistency in tagging. We establish tagging practice through a collaborative team conversation; introduce it through formal training sessions; and support one another with on-line mentoring, documentation, and extensive team communication. Even with these elaborate guides in place, we find that team members cannot meet our high expectations for consistency in tagging practice and usage. This result confirms other studies which find inter-tagger consistency to be difficult to achieve. To address the problem, we have adapted and developed our own software tools to process and revise the textbase, to draw out inconsistencies for human revision, and to remedy them programmatically where possible. Our sgrep wrapper program, and our strategies around databasing some portions of our SGML textbase, are described."
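The abstract mentions tools that draw out tagging inconsistencies for human revision. The sketch below is a rough illustration only. It is not the Orlando sgrep wrapper, and the `<NAME>` tag is a hypothetical stand-in for the project's actual tag set. It shows one simple consistency check: listing strings that are tagged somewhere in a text but also occur untagged elsewhere.

```python
import re

# Hypothetical tag: a real tag set (such as Orlando's) has its own names and attributes.
NAME_RE = re.compile(r"<NAME>(.*?)</NAME>", re.DOTALL)

def untagged_occurrences(text):
    """Return {name: count} for tagged names that also appear outside any <NAME> element."""
    tagged = {m.group(1).strip() for m in NAME_RE.finditer(text)} - {""}
    # Blank out tagged spans (with a space, so neighbouring words stay separated)
    # and search only the remaining, untagged text.
    untagged_text = NAME_RE.sub(" ", text)
    report = {}
    for name in tagged:
        pattern = r"\b" + re.escape(name) + r"\b"
        count = len(re.findall(pattern, untagged_text))
        if count:
            report[name] = count
    return report

sample = ("<NAME>Mary Shelley</NAME> published Frankenstein in 1818. "
          "Mary Shelley later revised it; <NAME>Percy Shelley</NAME> had edited the draft.")
print(untagged_occurrences(sample))  # {'Mary Shelley': 1}
```

A check like this only flags candidates. As the abstract stresses, deciding whether an untagged mention should be tagged remains a human judgment.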

See the related paper presented at the ACH-ALLC 1999 Conference [Charlottesville, VA] and the abstract.

References: (1) main entry "The Orlando Project: An Integrated History of Women's Writing in the British Isles"; (2) Orlando Project home page; (3) the list of project publications. Contact:

[CR: 20010905]

Howland, Eric; Niergarth, David. "Regular Expressions for Checking Dates." [SQUIB] Markup Languages: Theory & Practice 2/2 (Spring 2000) 126-132 (with 2 references). ISSN: 1099-6622 [MIT Press].

Abstract: "Below we present several regular expressions for checking dates including leap years. These expressions were inspired by the article by C.M. Sperberg-McQueen in Markup Languages. Specifically they were inspired by the challenge at the end of the article (and the date on which that challenge expires) to shorten the long regular expression generated by lex. The regular expression offered here is, unfortunately, not deterministic but it is more than an order of magnitude shorter than the regular expression generated by lex. The expression is the inverse of the lex expression and actually finds incorrect dates rather than correct dates. The proposed expression uses the \D convention of Perl and Python to detect characters that are not numbers and the .{1,8} notation to indicate a string of one to eight characters. Note that a somewhat longer (and arguably less readable) version of the regular expression is also included in case you find those conventions distasteful. Also note that about 40% of this expression is dedicated to finding poorly formed dates (dates not in the nnnn-nn-nn format where n is a digit). This implies that a much shorter total expression is possible if you allow two passes (one pass to ensure that the potential date is well-formed and the second pass to detect incorrect dates). The savings when using two passes is even larger when the Python conventions are not allowed. Using two passes is, however, a less aesthetically pleasing response to the challenge. The expression is a series of tests for errors OR'ed together. Perhaps the easiest way to understand this expression is to see how it is built up from the various types of possible errors. This approach turns out to be effective, but it is hard to guarantee that all possible errors have been found. Because the challenge specifies a well-defined (and enforceable) format for the input to be tested, it is possible to exhaustively test for errors.
A Python program has been created (the second listing below) that exhaustively tests all dates of the form nnnn-nn-nn (where n is a number) using both algorithmic and regular expression-based tests. A comparison of the results from these two methods exposes any errors in the regular expression and guarantees that the regular expression is as accurate as the algorithm. Of course, doing all 100 million comparisons does take some time (about 4 hours on a 400 MHz Celeron)."
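The cross-check the authors describe can be sketched in Python. The sketch makes several assumptions, so it should not be read as the published listing:

- **Not the authors' expression.** Theirs detects incorrect dates and relies on `\D` and `.{1,8}`. The hypothetical pattern here instead matches correct `nnnn-nn-nn` dates.
- **Calendar.** It assumes the proleptic Gregorian calendar.
- **Year 0000.** It treats year 0000 as invalid, which is an assumption of this sketch.
- **Reduced test.** The `compare` helper checks the pattern against a direct algorithm over every month/day string for a chosen set of years. This is a scaled-down version of the exhaustive 100-million-string test.

```python
import re

# Hypothetical pattern (not the published one): matches well-formed, correct
# Gregorian dates nnnn-nn-nn; year 0000 is rejected by assumption.
DATE_RE = re.compile(r"""
    (?:
        (?!0000)\d{4}-                                   # any year except 0000
        (?:
            (?:0[13578]|1[02])-(?:0[1-9]|[12]\d|3[01])   # 31-day months
          | (?:0[469]|11)-(?:0[1-9]|[12]\d|30)           # 30-day months
          | 02-(?:0[1-9]|1\d|2[0-8])                     # February 1-28
        )
      | (?:
            \d\d(?:0[48]|[2468][048]|[13579][26])        # leap: divisible by 4, not a century
          | (?:0[48]|[2468][048]|[13579][26])00          # leap: century divisible by 400
        )-02-29                                          # February 29
    )
""", re.VERBOSE | re.ASCII)

DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def valid_by_algorithm(s):
    """Algorithmic check; assumes s is already well-formed (nnnn-nn-nn)."""
    year, month, day = int(s[0:4]), int(s[5:7]), int(s[8:10])
    if year == 0 or not 1 <= month <= 12 or day < 1:
        return False
    limit = 29 if month == 2 and is_leap(year) else DAYS_IN_MONTH[month - 1]
    return day <= limit

def compare(years):
    """Return every nnnn-nn-nn string (for the given years) on which the two checks disagree."""
    mismatches = []
    for year in years:
        for month in range(100):
            for day in range(100):
                s = f"{year:04d}-{month:02d}-{day:02d}"
                if bool(DATE_RE.fullmatch(s)) != valid_by_algorithm(s):
                    mismatches.append(s)
    return mismatches

# Years chosen to exercise the leap-year rules (an assumed sample, not exhaustive).
print(compare([0, 1, 4, 100, 400, 1900, 2000, 2001, 2004, 2100, 9999]))  # []
```

Because `valid_by_algorithm` assumes well-formed input, arbitrary strings would first need a check such as `re.fullmatch(r"\d{4}-\d{2}-\d{2}", s, re.ASCII)`. That mirrors the two-pass approach the abstract describes.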

See the initial reference in "Regular Expressions for Dates." Also, "MLTP Contest on writing the shortest correct regular expressions for dates."

[CR: 20010905]

Koyanagi, Teruo; Kouichi Ono; Masahiro Hori. "Demonstrational Interface for XSLT Stylesheet Generation." [ARTICLE] Markup Languages: Theory & Practice 2/2 (Spring 2000) 133-152 (with 18 references). ISSN: 1099-6622 [MIT Press].

Abstract: "XSLT plays an important role in the data conversions between different XML representations. However, besides the transformation between XML data representations, conversion to an HTML document is one of the most practical tasks for XSLT, because it allows XML documents to be rendered in a human-readable form on Web browsers. In this paper, we present a method of XSLT stylesheet generation by demonstration. First, we introduce the paradigm of programming by demonstration, and explain a model of WYSIWYG editing. We then elaborate a process of XSLT rule generation based on the users' operation history recorded behind the WYSIWYG editor. Finally, we give some examples of XSLT rules for HTML rendering, which are created automatically by a rule generation module."

"A demonstrational interface is a promising approach to helping users with XSLT stylesheet authoring, because it allows the users to create a desired result interactively with a concrete example, rather than abstractly off line. Since users of demonstrational systems need not learn XSLT programming at all, this approach is particularly suitable for end users or novice programmers who are not familiar with XSLT. However, it is also useful for skilled XSLT programmers, because the demonstrational interface allows users to concentrate on styling Web pages without paying any attention to the XSLT programming task."

"It has been pointed out that the success of a demonstrational system depends far more on the user experience of interacting with the system than on the induction algorithms used to create the user's programs. The advantage of our XSLT authoring environment in this regard is that it relies on a conventional WYSIWYG HTML editor rather than a special editor tailored to the demonstrational interface. Further research will be needed to investigate the applicability of this technique. The main problem is to determine to what extent users are willing to accept a demonstrational system that guesses what they are doing, and that occasionally might make inadequate or inappropriate generalizations. However, the demonstrational approach investigated in this paper will be an important step beyond the existing visual interfaces for XSLT authoring."

An online version of this paper is also available from The Open University. The paper was also presented at Extreme Markup 2000.
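The idea of deriving XSLT rules from a recorded operation history can be illustrated with a deliberately naive Python sketch. It is not the authors' rule generation module, which generalizes from WYSIWYG edits.

Several details here are hypothetical:

- **History format.** The history is a list of pairs: a source element name and the HTML element the user chose for it.
- **Function name.** `generate_stylesheet` is invented for this illustration.
- **Rule generation.** Each distinct source element simply becomes one `xsl:template`, with no generalization.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

def generate_stylesheet(history):
    """Turn (source_element, html_element) pairs into an XSLT 1.0 stylesheet string.

    Hypothetical history format; a later operation on the same source element
    overrides an earlier one, mimicking a user revising a choice in the editor.
    """
    rules = {}
    for source, html in history:
        rules[source] = html
    templates = [
        f"  <xsl:template match={quoteattr(source)}>\n"
        f"    <{html}><xsl:apply-templates/></{html}>\n"
        f"  </xsl:template>"
        for source, html in rules.items()
    ]
    return (f'<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">\n'
            + "\n".join(templates)
            + "\n</xsl:stylesheet>")

history = [("title", "h2"), ("para", "p"), ("title", "h1")]
stylesheet = generate_stylesheet(history)
print(stylesheet)

# Sanity check: the output is well-formed XML with one template per source element.
root = ET.fromstring(stylesheet)
print([t.get("match") for t in root.findall(f"{{{XSL_NS}}}template")])  # ['title', 'para']
```

Applying the generated stylesheet requires an XSLT processor. The Python standard library does not include one.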

[CR: 20010905]

Goldman, Roy; Jason McHugh; Jennifer Widom. "From Semistructured Data to XML: Migrating the Lore Data Model and Query Language." [ARTICLE] Markup Languages: Theory & Practice 2/2 (Spring 2000) 153-163 (with 16 references). ISSN: 1099-6622 [MIT Press].

Abstract: "Research on semistructured data over the last several years has focused on data models, query languages, and systems where the database is modeled as some form of labeled, directed graph. The recent emergence of Extensible Markup Language (XML) as a new standard for data representation and exchange on the World-Wide Web has drawn significant attention. Researchers have casually observed a striking similarity between semistructured data models and XML. While similarities do abound, some key differences dictate changes to any existing data model, query language, or DBMS for semistructured data in order to fully support XML. This paper describes our experiences migrating the Lore database management system for semistructured data to work with XML. We present our modified data model, whose definition was a subtly challenging task given that XML itself is just a textual language. Based on this model, we describe changes to Lorel, Lore's query language. We also briefly discuss changes to Lore's dynamic structural summaries (DataGuides) and the relationship of DataGuides to XML's Document Type Definitions (DTDs)."
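Lore's DataGuides summarize a labeled, directed graph so that every label path in the data appears exactly once in the summary. The minimal Python sketch below builds a strong DataGuide under one assumption: the graph is stored as a dictionary from object ids to (label, child) edges. That representation is hypothetical and is not Lore's own storage.

The construction creates one guide node per distinct target set. It works much like converting a nondeterministic automaton into a deterministic one.

```python
def strong_dataguide(graph, root):
    """Build a strong DataGuide for a labeled, directed graph.

    graph: dict mapping an object id to a list of (label, child_id) edges.
    Returns (start, guide). Each guide node is a target set: the frozenset of
    objects reached by some label path. guide[node] maps each outgoing label
    to the guide node it leads to. Cycles terminate because each target set
    is expanded only once.
    """
    start = frozenset([root])
    guide = {}
    pending = [start]
    while pending:
        targets = pending.pop()
        if targets in guide:
            continue
        edges = {}
        for obj in targets:
            for label, child in graph.get(obj, []):
                edges.setdefault(label, set()).add(child)
        guide[targets] = {label: frozenset(kids) for label, kids in edges.items()}
        pending.extend(guide[targets].values())
    return start, guide

db = {
    "db": [("restaurant", "r1"), ("restaurant", "r2")],
    "r1": [("name", "n1"), ("price", "p1")],
    "r2": [("name", "n2")],
}
start, guide = strong_dataguide(db, "db")
restaurants = guide[start]["restaurant"]
print(sorted(guide[restaurants]))  # ['name', 'price']
```

In the sample data, only one restaurant has a price, yet the guide still records `restaurant.price` once. This is the kind of structural summary the abstract relates to DTDs.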

See the related paper by R. Goldman, J. McHugh, and J. Widom: "From Semistructured Data to XML: Migrating the Lore Data Model and Query Language," in Proceedings of the Second International Workshop on the Web and Databases (WebDB '99), Philadelphia, Pennsylvania, June 1999. [cache]

[CR: 20010905]

Slotnik, Arnold M. "The Consultant's Toolkit." [SQUIB] Markup Languages: Theory & Practice 2/2 (Spring 2000) 164. ISSN: 1099-6622 [MIT Press].

The author supplies an SGML DTD for a document of type 'complication'.

[CR: 20010905]

Lukka, Tuomas J. "Marked-up Programming: Using XML to Structure Computer Program Source Code." [PROJECT REPORT] Markup Languages: Theory & Practice 2/2 (Spring 2000) 165-182 (with 10 references). ISSN: 1099-6622 [MIT Press]. [Author's affiliation:] Helsinki and Jyväskylä, see WWW.

Abstract: "The use of markup in the context of computer languages is discussed. As with human languages, moving from procedural markup (such as plain source code) to descriptive markup in which the structures of the problem and the program are better displayed has several advantages: it is easier to reuse or mechanically manipulate the source code, and the expressive scope of the language can be expanded, allowing new kinds of structures to be expressed. An example of a system using markup with source code successfully is given by XGtk, which is a system by the author for writing GUI applications. The GUI is written with XML markup representing the widget hierarchy, and the program code is written at the appropriate places in the XML hierarchy as event callbacks, constructors, destructors, or methods of composite widgets."

"We have presented XGtk, a proof-of-principle system for programming graphical user interfaces using a traditional programming language (Perl) marked up with XML markup to provide additional structure (i.e., the widget tree) and to entangle the document with the code, making it easier to keep the documentation and the code in synch. It might be argued that XGtk is an anomalously good application of XML source code markup since widget trees happen to be hierarchical. However, 1) so is data in many other applications, and 2) XML has the ID-IDREF system by which elements can refer to other elements uniquely, allowing more complicated structures.

"There are already several different ways to represent user interfaces using markup languages. A simple and early one is HTML forms where javascript within the markup can even be used to respond to user actions interactively. The Glade user interface editor can output XML describing the user interface, and this can then be read into a program using libglade (both are in connection with the Gnome project). The XUL language used for describing browser window decorations in the Mozilla project is another example of using javascript within the markup to add actions to the user interface. However, none of these can be used to create a complete general-purpose application from just markup; most of the application lies beyond the reach of the marked-up source code. In this article, we reverse the scenario: the marked-up source file is the application. The use of markup such that the code and the documentation are at the same place is reminiscent of Knuth's literate programming; however, the existence of XML makes both the format and the tools for processing it far simpler than the original WEB language."

"We have several other projects underway for using XML in the context of source code, which we will be reporting separately. For instance, cleaning up the implementation by writing XGtk itself as an XML document seems to have potential -- the structure of the program can be brought out much better this way, by explaining all the elements through XML structures. Also, the actions and the documentation related to elements can be kept in synch by keeping them next to each other in the 'source code' -- this has sometimes been a problem when the XGtk language has been rapidly evolving. At this time, so little is known about source code markup that it is difficult to make any broad, sweeping classifications. Each non-trivial project that source markup is used for will undoubtedly uncover new ideas about this topic. However, it is quite probable that some patterns, analogous to OO Design Patterns, will start emerging; for instance, the way XGtk handles the pack element might be useful in other contexts. Likewise, it may be possible to create a toolkit of XML transformations that are useful for source code markup. In this way, new DTDs could be built in a matter of minutes to correspond with the structure of the problem being solved."

July 1999: "XGtk is currently a simple Perl script that uses the Perl XML::Parser module (available on CPAN) and XML::Grove module to create complete Gtk (and later Gnome) applications using just XML with embedded Perl (later other languages) scriptlets. This module also requires a working Gtk module for Perl (gnome-perl subdirectory in the gnome CVS repository)..."
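The central idea of XGtk can be mimicked in a few lines of Python: program code placed at the appropriate points of an XML widget hierarchy. This toy is not XGtk, which used Perl, XML::Parser and Gtk, and it uses no GUI toolkit at all.

The element names, the `callback` element, and the `Widget` class are all hypothetical. Embedded snippets are run with `exec`, so the markup must be trusted.

```python
import textwrap
import xml.etree.ElementTree as ET

UI = """
<window title="Demo">
  <button label="Quit">
    <callback event="clicked">
      log.append(widget.attrs["label"] + " clicked")
    </callback>
  </button>
</window>
"""

class Widget:
    """Stand-in for a toolkit widget: a tag name, attributes, children, handlers."""
    def __init__(self, kind, attrs):
        self.kind = kind
        self.attrs = dict(attrs)
        self.children = []
        self.handlers = {}  # event name -> list of compiled code objects

    def emit(self, event, log):
        # Run each embedded snippet with the widget and a shared log in scope.
        for code in self.handlers.get(event, []):
            exec(code, {"widget": self, "log": log})

def build(elem):
    """Recursively turn an XML element into a Widget tree, compiling callbacks."""
    widget = Widget(elem.tag, elem.attrib)
    for child in elem:
        if child.tag == "callback":
            source = textwrap.dedent(child.text or "").strip()
            code = compile(source, f"<{elem.tag} {child.get('event')} callback>", "exec")
            widget.handlers.setdefault(child.get("event"), []).append(code)
        else:
            widget.children.append(build(child))
    return widget

window = build(ET.fromstring(UI.strip()))
log = []
window.children[0].emit("clicked", log)
print(log)  # ['Quit clicked']
```

As in the article's description, the markup file carries both the structure (the widget tree) and the behaviour (the callback code). The Python side only interprets the markup.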

See the XGtk sources in the GNOME CVS repository. Compare: "Extensible User Interface Language (XUL)."

[CR: 20010905]

Wadler, Philip. "A Formal Semantics of Patterns in XSLT and XPath." [ARTICLE] Markup Languages: Theory & Practice 2/2 (Spring 2000) 183-202 (with 12 references). ISSN: 1099-6622 [MIT Press]. Author's affiliation: Avaya Labs (previously Bell Labs); email:

Abstract: "This article presents a formal semantics of the pattern language from the 16 December 1998 draft of XSLT. The semantics is clear and concise, summarizing in one page of formulas what required about ten pages of prose to describe. With the aid of the semantics one can rigorously state and prove properties of the language; these properties helped to guide future development of the XSLT design. The semantics was developed using standard techniques from the programming language community, and this article provides a tutorial introduction to these techniques. While little here will be new to the language theorist, some of what is here may be useful to the markup technologist."

"The formal semantics is given in a style known as denotational semantics. There are several textbook introductions to this subject, including those by Schmidt and Allison. We will be able to get by using some of the most basic ideas from semantics. Many of the tricky corners in semantics arise from possibly infinite behavior, such as a program that may enter an infinite loop. Since in our case we deal with finite documents and finite sets of nodes, these complexities can be avoided.

"The formal semantics also draws upon techniques from the functional programming community. Again, there are several fine introductions, including those by Bird and Paulson. The semantics was developed and debugged by transliterating it into the functional language Haskell, and a copy of the Haskell program may be had by contacting the author. In related work, Haskell programs for manipulating XML have been developed by Wallace and Runciman [Wallace/Runciman 1999].

"The same techniques used here can be extended to give a denotational semantics of the entire XPath language, and such a semantics has been written. However, XPath is considerably more powerful than the pattern language of the December 1998 XSLT, and the semantics is correspondingly more complex. The semantics given here seems more appropriate for a gentle introduction."

Note: Philip Wadler has written a book on functional programming; see the bibliography page. On Haskell: see HaXml: utilities for using XML with Haskell ('Includes an XML parser, an HTML parser, a pretty-printer, a combinator library for generic XML transformations, and two Haskell<->XML converters using type-based translation').
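The flavour of a compositional (denotational) semantics for patterns can be conveyed in Python, as a rough analogue of the executable transliteration the article mentions. This is not Wadler's semantics. It covers a hypothetical toy subset (`/`, `//`, `|`, names, `*` and a leading root step) over element-only trees.

The sketch works in two parts:

- **Meaning of a pattern.** Each pattern is mapped to a function from a context node to a set of nodes. That function is built only from the meanings of the pattern's parts.
- **Matching.** A node matches a pattern when the pattern selects it from the node itself or one of its ancestors as context. This is how the XSLT 1.0 Recommendation defines matching.

```python
class Node:
    """Element-only tree node (a toy document model)."""
    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

def ancestors_or_self(node):
    while node is not None:
        yield node
        node = node.parent

def descendants_or_self(node):
    yield node
    for child in node.children:
        yield from descendants_or_self(child)

def root_of(node):
    while node.parent is not None:
        node = node.parent
    return node

def sem(p):
    """Meaning of pattern AST p: a function from a context node to a set of nodes."""
    kind = p[0]
    if kind == "root":                      # leading /
        return lambda x: {root_of(x)}
    if kind == "name":                      # child elements with this name
        return lambda x: {c for c in x.children if c.name == p[1]}
    if kind == "star":                      # all child elements
        return lambda x: set(x.children)
    if kind in ("child", "desc", "union"):
        s1, s2 = sem(p[1]), sem(p[2])       # meanings of the parts
        if kind == "child":                 # p1/p2
            return lambda x: {z for y in s1(x) for z in s2(y)}
        if kind == "desc":                  # p1//p2 = p1/descendant-or-self::node()/p2
            return lambda x: {w for y in s1(x) for z in descendants_or_self(y) for w in s2(z)}
        return lambda x: s1(x) | s2(x)      # p1|p2
    raise ValueError(f"unknown pattern: {p!r}")

def matches(p, node):
    """XSLT-style matching: node is selected by p from node or one of its ancestors."""
    select = sem(p)
    return any(node in select(ctx) for ctx in ancestors_or_self(node))

# Toy document: doc contains sec/para and a direct para child.
p_in_sec, p_direct = Node("para"), Node("para")
doc = Node("doc", [Node("sec", [p_in_sec]), p_direct])
document_root = Node("#root", [doc])  # gives doc a parent, so "/doc" has a root to start from

sec_para = ("child", ("name", "sec"), ("name", "para"))
print(matches(sec_para, p_in_sec), matches(sec_para, p_direct))  # True False
```

Because each case of `sem` refers only to the meanings of sub-patterns, properties such as "p1|p2 matches exactly the nodes matched by p1 or by p2" can be read off the definitions directly. That is the style of reasoning the article advocates.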

An online version of this paper is available from the author; [cache]

See also the list of related papers on XML.

Hosted By
OASIS - Organization for the Advancement of Structured Information Standards

Robin Cover, Editor: