Volume Description
Abstract: "XML, a landmark in the evolution of Internet information systems, allows authors to say what they mean, rather than merely how to say it. The shift to XML will unleash a diverse range of new applications, ranging from mathematical equation structures to new browser and client tools. This issue of the Web Journal, by guest editor Dan Connolly, is your first look at the technical specifications and early applications of a new data format that will rock every aspect of the Web, including markup, linking, and exchange." [from the publisher]
The volume has been published as 'Volume 2, Issue 4' of the World Wide Web Journal, edited by Rohit Khare and published by O'Reilly & Associates.
See also online: XML: Principles, Tools, and Techniques: Full Description [archive copy]. An online table of contents is also available [local archive copy].
Editorial
Summary: "Guest Editor Dan Connolly and Series Editor Rohit Khare team up to herald the appearance of XML and discuss its evolution from the Standard Generalized Markup Language (SGML)."
XML Background
Summary: Several members of the W3C XML WG (Working Group) are interviewed by D. C. Denison: "Members of the W3C's XML Editorial Review Board talk about the 'road to XML': its history, breakthroughs, the participation of Microsoft and Netscape, and the work that remains."
Work in Progress
Summary: "In 'The Web Is Ruined and I Ruined It' self-proclaimed HTML Terrorist David Siegel discusses how proper separation of structure (HTML), style (CSS), and semantics (XML) makes content more compelling and design more effective."
A version of this document is available online in HTML format: http://webreview.com/97/04/11/feature/index.html; or http://xent.ics.uci.edu/FoRK-archive/spring97/0381.html [local archive copy, text only].
Timeline
On pages 22-25, the editors provide a description of key events in the development of XML to date. Each event has an associated URL so that the reader can visit documents which chronicle the progress of XML in the past year.
W3C Reports
Abstract: "Extensible Markup Language (XML) is an extremely simple dialect of SGML which is completely described in this document. The goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML."
A version of this document is available online in HTML format: http://www.w3.org/TR/WD-xml-970807; [local archive copy].
Abstract: "This document specifies a simple set of constructs that may be inserted into XML documents to describe links between objects and to support addressing into the internal structures of XML documents. It is a goal to use the power of XML to create a structure that can describe the simple unidirectional hyperlinks of today's HTML as well as more sophisticated multi-ended, typed, self-describing links."
A version of this document is available online in HTML format: http://www.w3.org/TR/WD-xml-link-970731; [local archive copy].
Abstract: "The HTML-Math Working Group recently released another version of its Working Draft of MathML. The full text of this Working Draft is available at http://www.w3.org/TR/WD-math. This note should serve to point the way to the proposal outlined in the full Working Draft, and will describe a little of the history, current state, and the future of the HTML-Math work."
Note: The Working Draft of July 10, 1997 clarifies the relationship of HTML-Math to XML as follows: "Mathematical Markup Language, or MathML, is an XML application for describing mathematical notation and capturing both its structure and content. The goal of MathML is to enable mathematics to be served, received, and processed on the Web, just as HTML has enabled this functionality for text. [...] The Mathematical Markup Language, or MathML, working draft defines an XML compliant mark-up language for describing equation content and presentation. Equation presentation mark up is carried out in a way which respects logical expression structure. This allows content description to be attached in a natural and effective way. For example, one might add an annotation to a superscript indicating it denotes function composition instead of power in this expression. Alternatively, authors may directly use equation content tags to mark up common things like trig functions, powers, and so on. Content tags are associated with notational conventions, for example, adding mark up for a power, which is by default rendered as a superscript."
Abstract: "This document defines the high-level requirements for the Document Object Model (DOM). References to XML and HTML documents generally denote the physical files that contain structural markup. Some requirements are not implemented in DOM Level 1 [...]."
Note that the Document Object Model Specification (W3C Working Draft 09-Oct-1997) "defines the Document Object Model, a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The Document Object Model provides a standard model of how the objects in an XML or HTML document are put together and a standard interface for accessing and manipulating these objects and their inter-relationships. Vendors can support the DOM as an interface to their proprietary data structures and APIs, and content authors can write to the standard DOM interfaces rather than product-specific APIs, thus increasing interoperability on the Web."
A later version of this document is available online in HTML format: Document Object Model Requirements, W3C Working Draft 09-October-97.
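Note: the idea of programs accessing and updating a document's content and structure through a standard interface can be illustrated with a minimal sketch. It assumes Python's standard-library xml.dom.minidom module, which postdates the 1997 drafts cited above and is used here only as a convenient DOM implementation. The sample document, the "status" attribute, and the function name revise are hypothetical.

```python
from xml.dom.minidom import parseString


def revise(xml_text: str) -> str:
    """Parse a document, then update its content and structure through DOM calls."""
    doc = parseString(xml_text)

    # Access: find the first <article> element in the document tree.
    article = doc.getElementsByTagName("article")[0]

    # Update content: replace the character data of its text node.
    article.firstChild.data = "XML and the DOM"

    # Update an attribute on the same element (hypothetical attribute name).
    article.setAttribute("status", "draft")

    # Update structure: create a new element and append it to the root.
    extra = doc.createElement("article")
    extra.appendChild(doc.createTextNode("Lark"))
    doc.documentElement.appendChild(extra)

    # Serialize the root element back to markup.
    return doc.documentElement.toxml()


print(revise("<journal><article>XML</article></journal>"))
```

The same calls (getElementsByTagName, createElement, appendChild, setAttribute) are available in other DOM implementations, which is the interoperability point the Working Draft makes.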
Technical Papers
Abstract: "This article provides a technical introduction to XML with an eye towards guiding the reader to appropriate sections of the XML specification when greater technical detail is desired. This introduction is geared towards a reader with some HTML or SGML experience, although that experience is not absolutely necessary. The XML Link and XML Style specifications are also briefly outlined."
A version of this document is available online in HTML format: http://www.berkshire.net/~norm/articles/xml/, or alternately http://www.arbortext.com/nwalsh.html; [local archive copy].
Abstract: "The simplicity of document creation was a key element in the astonishingly rapid development of the Web. This article describes XML and CSS: the 'one-two' punch that will not only bring back that level of simplicity, but also enable the construction of complex applications which are either difficult or impossible using HTML. In this article we outline the steps for using a CSS style sheet in an XML document; we discuss the limitations of CSS in complex applications, and we present a real life example."
[Lead paragraph:] "Cascading Style Sheets (CSS) is a style sheet mechanism specifically developed to meet the needs of Web designers and users. HTML provides limited possibilities for the explicit formatting and positioning of text, and the mechanisms that are provided (such as the FONT element, or the ALIGN attribute) force the page designer to embed presentation-specific information within the document making it difficult to prepare documents for a variety of screen sizes, presentation modalities (e.g., speech), and types of audiences. These limited features are not sufficient to achieve the formatting desired by many Web designers. Designers commonly resort to using tables and various HTML coding 'tricks' to obtain the desired results. Among the many negative consequences of this is that the information content of HTML documents is very hard to maintain as that content is inextricably intertwined with the format related encoding. More sophisticated formatting capabilities have been needed to support a wide range of types of documents from marketing froufrou to legal documents to scientific journals. CSS provides HTML with a real style sheet language with far greater control over document presentation in a way that is independent of document content. CSS style sheets can be used to set fonts, colors, white space, positioning, backgrounds, and many other presentational aspects of a document. It is possible for several documents to share the same style sheet, enabling you to maintain consistent presentation within a collection of related documents, without having to modify each document separately." [from the online version]
A version of this document is available online in HTML format: http://shoal.w3.org/w3j-xml/cssxml/grifcss.htm. See also in Web Review: "XML and CSS," by Stuart Culshaw, Michael Leventhal, and Murray Maloney.
Abstract: "HTML is the ubiquitous data format for Web pages; most information providers are not even aware that there are other options. But now, with the development of XML, that is about to change. Not only will the choices of data formats become more apparent, but they will become more attractive as well. Although XML succeeds HTML in time, its design is based on SGML, which predates HTML and the Web altogether. SGML was designed to give information managers more flexibility to say what they mean, and XML brings that principle to the Web. Because it allows the development of custom tagsets, we can think of XML as HTML without the 'training wheels.' In this article, we trace the history and evolution of Web data formats, culminating in XML. We evaluate the relationship of XML, HTML, and SGML, and discuss the impact of XML on the evolution of the Web."
A version of this document is available online in HTML format: http://www.cs.caltech.edu/~adam/papers/xml/ascent-of-xml.html.
Summary: "I want to discuss what I consider one of the worst mistakes of the software world, embedded markup; which is, regrettably, the heart of such current standards as SGML and HTML [...] There is no one reason this approach is wrong: I believe it is wrong in almost every respect."
Abstract: "Structured documents in XML are capable of managing complex documents with many separate information components. In this article, we describe the role of the XML-LANG specification in supporting this. Examples are supplied explaining how components can be managed and how documents can be processed, with an emphasis on scientific and technical publishing. We conclude that structured documents are sufficiently powerful to allow complex searches simply through the use of their markup."
"XML is the ideal language for the creation and transmission of database entries. The use of entities means it can manage distributed components, it maps well onto objects and it can manage complex relationships through its linking scheme. Most of the software components are already written." [conclusion, online version]
A version of this document is available online in HTML format: http://www.venus.co.uk/omf/cml/doc/tutorial/xml.html, or http://www.ch.ic.ac.uk/ectoc/echet96/+CML/epub/xml.html.
Abstract: "The following paper was given as a talk at the 'XML Mixer' in La Jolla, California in late July '97, before a combined audience of clinicians, computing professionals, and vendors of document processing software. What brought the group together was an ongoing effort to introduce markup technology into the processing of healthcare information in an ISO standard manner, using SGML (Standard Generalized Markup Language) and SGML's strict subset, XML (Extensible Markup Language). Other speakers spoke more specifically to processing topics, work flow, or business issues in the use of information systems in medicine, but the emphasis here is on some long perceived, but often overlooked problems in the semantics of communication. Both the general and the specific are important ingredients in this area, which indirectly indicates why the document format offers the appropriate middle ground between free text and excessively rigid (but easy to process) data structures."
Note: further information on the role of SGML/XML in medical informatics is found in the database section for the SGML Initiative in Health Care (HL7 Health Level-7 and SGML).
Abstract: "Is Perl a suitable language for programming XML? The use of Perl with XML is illustrated in this article with a program that checks to see if an XML document is well-formed. The relative simplicity of the program demonstrates that lightweight Perl programs may be used with XML, although Unicode and the use of entities make it difficult for Perl programmers to handle some XML files."
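Note: the article's own checker is written in Perl and is not reproduced here. As a rough illustration of what a well-formedness check does, the following sketch uses Python's standard-library xml.etree.ElementTree parser instead; the function name is_well_formed and the sample inputs are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        # The parser raises ParseError on mismatched tags, unclosed
        # elements, multiple root elements, and similar violations.
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A document whose tags nest and close properly.
print(is_well_formed("<doc><p>Hello</p></doc>"))

# A document in which <p> is never closed.
print(is_well_formed("<doc><p>Hello</doc>"))
```

This checks well-formedness only; it does not validate a document against a DTD.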
Abstract: "XML is a syntax for storing hierarchically organized data such as directories, catalogues, user manuals, etc. It can store only textual data, but that is not a severe restriction. This article defines, in some detail, how text is stored in an XML file. It also describes how an XML file is encoded for transportation over the Internet, and upon arrival, decoded again. Under the Internet model for transport of text files, the encoding/decoding may result in a 'different' file (i.e., a different sequence of bytes), but retains exactly the same text and structure."
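Note: the point that transport may yield a different sequence of bytes while preserving the text can be sketched as follows. This is an illustration in Python, not code from the article; the choice of UTF-8 and UTF-16 as the two encodings, the sample text, and the function name round_trip are assumptions.

```python
def round_trip(text: str, encoding: str) -> tuple[bytes, str]:
    """Encode text for transport, then decode it again on arrival."""
    wire_bytes = text.encode(encoding)      # what travels over the network
    received = wire_bytes.decode(encoding)  # what the receiver reconstructs
    return wire_bytes, received


# Sample text with one non-ASCII character (U+00E9).
text = "<greeting>caf\u00e9</greeting>"

utf8_bytes, utf8_text = round_trip(text, "utf-8")
utf16_bytes, utf16_text = round_trip(text, "utf-16")

# The two byte sequences differ, but both decode to the same text.
print(utf8_bytes != utf16_bytes)
print(utf8_text == utf16_text == text)
```

The receiver needs to know (or detect) which encoding was used; in the UTF-16 case Python writes a byte order mark that the decoder consumes.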
Abstract: "Lark is a non-validating XML processor implemented in the Java language; it attempts to achieve good trade-offs among compactness, completeness, and performance. This report gives an overview of the motivations for, facilities offered by, and usage of, the Lark processor. This article applies to version 0.92 of Lark, in use in early September 1997."
A later version of this document is available online in HTML format: http://www.textuality.com/Lark/.
Abstract: "Microsoft cofounded the XML working group at the W3C in July 96 and actively participated in the definition of the standard. This article describes why Microsoft implemented its first XML application and how it led to the development of two XML parsers shipping in Internet Explorer 4.0, one written in C++ and the other in Java. We describe the importance of designing an object model API and our vision of XML as a universal, open data format for the Internet."
A version of this document is available online in HTML format: http://www.w3j.com/xml/excerpt.html; [local archive copy]. See also Microsoft's XML Support Page.
Abstract: "JUMBO (Java Universal Markup Language) is an object-oriented XML browser/editor and transformation tool, written in Java. It has been developed as a development tool to explore the emerging XML-LANG and XML-LINK specifications, and implements most of the current proposals. Its emphasis is on the management of structured documents; specifically, their interpretation as trees. It provides behavior for ELEMENTS by providing Java classes for rendering and transformation. It is particularly aimed at nontextual applications where ELEMENTs (such as those in technical disciplines) require complex processing. JUMBO also implements much of the current XML-LINK spec, including TEI extended pointers and simple aspects of EXTENDED XML-LINKs."
Note: other information on JUMBO may be found on the VSMS server or on a server located in San Diego.
Abstract: "This paper discusses the challenges of capturing the state of distributed systems across time, space, and communities, and looks to XML as an effective solution. First, when recording a data structure for future reuse, XML format storage is self-descriptive enough to extract its schema and verify its validity. Second, when transferring data structures between different machines, XML's link model in conjunction with Web transport protocols reduces the burden of marshaling entire data sets. Third, when sharing collaborative data structures between disparate communities, it is easier to compose new systems and convert data definitions to the degree that XML documents are adopted for the World Wide Web. Just as previous generations of distributed system architectures emphasized relational databases or object-request brokers, the Web generation has good reason to adopt XML as its common archiving tool, because XML's sheer generic power has value in knowledge representation across time, space, and communities."
A version of this document is available online in HTML format: http://www.cs.caltech.edu/~adam/papers/xml/xml-for-archiving.html; [local archive copy].
[Introduction]: "The extraordinary growth of the World Wide Web has been fueled by the ability it gives authors to easily and cheaply distribute electronic documents to an international audience. As Web documents have become larger and more complex, however, Web content providers have begun to experience the limitations of a medium that does not provide the extensibility, structure, and data checking needed for large-scale commercial publishing. The ability of Java applets to embed powerful data manipulation capabilities in Web clients makes even clearer the limitations of current methods for the transmittal of document data."
"To address the requirements of commercial Web publishing and enable the further expansion of Web technology into new domains of distributed document processing, the World Wide Web Consortium has developed an Extensible Markup Language (XML) for applications that require functionality beyond the current Hypertext Markup Language (HTML). This paper describes the XML effort and discusses new kinds of Java-based Web applications made possible by XML."
A version of this document is available online in HTML and several other formats: HTML, [directory] http://sunsite.unc.edu/pub/sun-info/standards/xml/why/.
Abstract: "The problem of direct access to Web data from within business applications has until recently been largely ignored. The Web Interface Definition Language (WIDL) is an application of the eXtensible Markup Language (XML) which allows the resources of the World Wide Web to be described as functional interfaces that can be accessed by remote systems over standard Web protocols. WIDL provides a practical and cost-effective means for diverse systems to be rapidly integrated across corporate intranets, extranets, and the Internet."
A version of this document is available online in HTML format: http://www.webmethods.com/technology/Automating.html; [local archive copy, text only].