[This local archive copy is from the official and canonical URL, http://pages.wooster.edu/ludwigj/xml/thesis.html; please refer to the canonical source document if possible.]

An Investigation of XML with Emphasis on
Extensible Linking Language (XLL)

by Justin Ludwig

An Independent Study Thesis Presented in Partial Fulfillment of the Requirements
of the College of Wooster and the Program in Computer Science

Advisor: Dale Brown

March 23, 1999

Abstract

This paper describes Extensible Markup Language (XML) documents. It explains how to construct XML documents with Document Type Descriptions (DTDs), XML Namespaces, Extensible Linking Language (XLL), Extensible Stylesheet Language (XSL), and Cascading Style Sheets (CSS) Level 1. Each chapter includes a real-world case study of XML usage related to the chapter. This paper also describes an XML browser that can process and display XLL hyperlinks. It includes a full implementation in Java, using the Document Object Model (DOM) Core Level 1.

1 Introduction

2 Extensible Markup Language

2.1 Document Type Descriptions: 2.1.1 !DOCTYPE tag; 2.1.2 !ELEMENT tag; 2.1.3 !ATTLIST tag; 2.1.4 !ENTITY tag; 2.1.5 Special keywords

2.2 XML Namespaces

2.3 The Document Object Model (Core) Level 1: 2.3.1 Generic tree nodes; 2.3.2 Document nodes; 2.3.3 Element nodes; 2.3.4 Other nodes

2.4 Case Study: Java Speech Markup Language

3 Extensible Stylesheet Language

3.1 Cascading Style Sheets Level 1: 3.1.1 Basic rules; 3.1.2 Advanced selectors; 3.1.3 Pseudo-elements

3.2 XSL Stylesheets: 3.2.1 Stylesheet structure; 3.2.2 Template rules; 3.2.3 Ancestry patterns; 3.2.4 Qualified patterns; 3.2.5 Applying results; 3.2.6 Special result tags; 3.2.7 Extracting character data; 3.2.8 Counting; 3.2.9 Macros

3.3 Case Study: Docproc: 3.3.1 body, section, and topic tags; 3.3.2 Lists; 3.3.3 Table of contents; 3.3.4 File dates and sizes

4 Extensible Linking Language

4.1 XLink: 4.1.1 Simple links; 4.1.2 Extended links; 4.1.3 Group links

4.2 XPointer: 4.2.1 Absolute terms; 4.2.2 Relative terms; 4.2.3 String terms; 4.2.4 Spanning terms; 4.2.5 Attribute terms

4.3 Case Study: The Annotated XML Specification: 4.3.1 x tag; 4.3.2 spec tag; 4.3.3 here tag; 4.3.4 Link example

5 Application Description

5.1 Implementation: 5.1.1 Overview; 5.1.2 Link processing

5.2 Discussion

A References

B Application Listing

RetrieveURLDialog.java

RetrieveURLListener.java

VolatileInteger.java

1 Introduction

Extensible Markup Language (XML) stores diverse kinds of character data in a structured way. XML constitutes a subset of Standard Generalized Markup Language (SGML), but has simpler rules. The World Wide Web Consortium (W3C) controls the XML standard. The W3C designed XML to allow a single general parsing algorithm to read any set of XML-formatted data into memory.

Along with XML, the W3C is developing several complementary markup specifications. The XML Namespaces architecture allows XML documents to combine potentially-conflicting sets of XML tags. The Document Object Model (DOM) specifies a standard interface for applications and scripts to access XML data. Extensible Stylesheet Language (XSL) and Cascading Style Sheet (CSS) documents provide display specifications for XML documents. Extensible Linking Language (XLL) describes references among XML documents.

These technologies extend the limited capabilities of Hypertext Markup Language (HTML, also an SGML derivative), which allows one to mark up a document with human-readable tags that describe the document's data and its display format. While an individual XML document, by itself, describes only the data that it contains, and neither its formatting nor its references to external data, XSL and XLL documents fill in the gaps. With XSL and XLL technologies, XML documents can mark up and describe any kind of document or character-based data, display that document in a variety of different ways, and refer (or "link") to specific parts of other documents. Furthermore, a programmer can easily include the capability to manipulate an XML document in any application.

The availability of tools for manipulating XML data, the industry-backed standards behind XML, and its self-describing nature make it an appealing document format for storing character-based information, especially information that several different applications may wish to access. Today, various organizations and companies have published on the Internet hundreds of proposals or specifications for document types that use XML. Even so, few actually use XML on a day-to-day basis because the W3C has not yet issued the final 1.0 specification for either XSL or XLL. As a result, no applications currently exist that implement a complete XML package: data, formatting, and linking -- components that have had full implementation in HTML applications for several years.

This paper attempts to describe the details of the XML document format, along with its accompanying stylesheet and linking formats. In addition to numerous example markup sequences and subsequences, this paper offers an XML browser implementation that processes and traverses XLL links. While it covers the breadth of XML technologies, this paper cannot serve as a definitive reference for XML or its ancillary formats. These specifications may change from month to month, and still contain many un- or ill-defined details. The W3C website [1] offers the current specification drafts. As for general XML reference, Robin Cover's SGML/XML website [2] provides the most comprehensive and up-to-date reference available, on the Internet or elsewhere.

2 Extensible Markup Language

XML stores character data using tags placed within the data representing its structure. An XML document has a tree structure, where "element" tags comprise the nodes. The "root" element, implicitly present although never physically denoted in an XML file, begets all the other elements in the tree. All elements except the root have a single parent, and one can trace their ancestry to the root element. Although not required to, elements often contain character data. An element node may also possess a list of element "attribute" items. These attributes hold information about their parent elements.

The W3C XML recommendation [3] specifies the syntax of an XML document. It also specifies how an XML processor transforms an XML document from a text file on disk to a tree object in memory. In an XML file, element tags delineate data. An element tag begins with a <, followed immediately by the name of the element. A list of element attributes may follow, until the tag ends with a >. The list of attributes consists of attribute names paired by = signs with their values. The attribute names precede the values, and single or double quotes (' or ") enclose the attribute values. One concatenates the attribute name and value pairs with white space. An example myElementName tag follows, with attributes myAttribute1Name and myAttribute2Name:

<myElementName myAttribute1Name="my attribute1 value"
myAttribute2Name="my attribute2 value">

An element may enclose character data or child elements with a start tag and an end tag, or it may enclose nothing and consist solely of an empty-element tag. Element start tags and empty-element tags may contain element attribute names and corresponding value declarations; an end tag contains only the element name preceded by a slash (/). Here a start and end tag pair enclose some character data:

<myElement myAttribute="my attribute value">This is character data
interposed between the start and end tags.</myElement>

An empty-element tag, which ends with a slash, follows:

<myEmptyElement myAttribute="my attribute value"/>

An XML document that conforms to the 1.0 version of the XML specification always begins with an XML declaration consisting of the <?xml version="1.0"?> tag. In general, the <? and ?> symbols delineate processing instruction tags. An XML processor can choose whether or not to respond to the processing instruction based on the first word in the instruction, which should indicate the "target" application. The <!-- and --> symbols delineate comment tags, which an XML processor completely ignores. The example below shows an XML document which describes a poem:

<?xml version="1.0"?>
<!-- poem.xml -->
<poem>
	<title>Fire and Ice</title>
	<author>
		Robert Frost
		<!-- comment:the following shows an empty-element 
		with two attributes -->
		<dates born="1874" died="1963"/>
	</author>
	<body lines="9">
		<line>Some say the world will end in fire,</line>
		<!-- comment: the seven middle lines go here -->
		<line>And would suffice.</line>
	</body>
</poem>
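The transformation from a text file to a tree in memory can be sketched in Java. The fragment below is only an illustration and not part of this paper's implementation: it assumes the JAXP parser that ships with current Java releases (javax.xml.parsers), and the names PoemReader, parseRoot, and textOf are hypothetical. It parses an abbreviated copy of the poem and reads back some character data and two attribute values.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class PoemReader {
    // Abbreviated copy of the poem document: two of the nine lines only.
    static final String POEM =
        "<?xml version=\"1.0\"?>"
        + "<poem><title>Fire and Ice</title>"
        + "<author>Robert Frost<dates born=\"1874\" died=\"1963\"/></author>"
        + "<body lines=\"9\"><line>Some say the world will end in fire,</line>"
        + "<line>And would suffice.</line></body></poem>";

    // Parses XML text into a tree and returns its outermost element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + e.getMessage(), e);
        }
    }

    // Returns the character data of the first descendant with the given name.
    static String textOf(Element ancestor, String name) {
        return ancestor.getElementsByTagName(name).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element poem = parseRoot(POEM);
        Element dates = (Element) poem.getElementsByTagName("dates").item(0);
        System.out.println(textOf(poem, "title"));
        System.out.println(dates.getAttribute("born") + "-" + dates.getAttribute("died"));
    }
}
```

Text that breaks the syntax rules, such as a start tag with no matching end tag, makes parseRoot throw instead of returning a tree.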

2.1 Document Type Descriptions

A Document Type Description (DTD), which the XML recommendation itself calls a document type definition, establishes the logical structure of an XML document. A "valid" XML document either contains a DTD or references one, and the document conforms to the DTD. A "well-formed" XML document follows all the syntactic rules of the XML specification, whether or not it has a DTD or conforms to one. A valid XML document must be well-formed. One would call the poem.xml document above well-formed, but not valid, because while it follows all the syntactical rules of XML, it lacks a DTD.

2.1.1 `!DOCTYPE` tag

The !DOCTYPE tag always begins a DTD, and encloses the rest of the DTD. The name of the "document" element follows the !DOCTYPE keyword. All elements specified by the DTD descend from the document element. If the XML file itself contains the DTD, the DTD follows the !DOCTYPE keyword, enclosed by brackets ([]). One should set the standalone attribute of the XML declaration to yes in this case:

<?xml version="1.0" standalone="yes"?>
<!-- standalone_document.xml -->
<!DOCTYPE myDocumentElement [
	<!-- the rest of the DTD goes here -->
]>
<!-- the XML markup goes here -->

If the DTD for the document resides in an external file, a "system identifier" or a "public identifier" follows the root element name. A system identifier consists of the SYSTEM keyword followed by a Uniform Resource Identifier (URI):

<!DOCTYPE myType SYSTEM "http://www.my.org/myType.dtd">

A public identifier consists of the PUBLIC keyword followed by the descriptive name of a well-known resource. A public identifier indicates the resource kind and name, which allows an XML processor to find the most current version of the designated resource on its own. The XML recommendation does not specify a format for public identifiers, but it is common to also include the organization which produces the resource and its type:

<!DOCTYPE myType PUBLIC "-//myOrg//DTD myType 1.0//EN">

One can also refer to a resource by both a system identifier and by a public identifier:

<!DOCTYPE myType PUBLIC "-//myOrg//DTD myType 1.0//EN" "http://www.my.org/myType.dtd">

With the above example, an XML processor will try to find the resource using the public identifier first; if it cannot, it will try using the system identifier. An XML document which uses an external DTD should set the standalone attribute of the XML declaration to no. The following XML document uses an external DTD which it expects an XML processor to find located in the same directory as the XML document:

<?xml version="1.0" standalone="no"?>
<!-- document_with_external_DTD.xml -->
<!DOCTYPE myDocumentElement SYSTEM "myDocumentElement.dtd">
<!-- the XML markup goes here -->

Any time a DTD tag refers to an external file, even in tags other than !DOCTYPE, it must use either a SYSTEM or a PUBLIC identifier.
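How a processor receives the public and system identifiers can be sketched with a resolver callback. This is a rough illustration under stated assumptions: it uses the JAXP parser and the SAX EntityResolver interface from current Java releases, neither of which this paper's implementation relies on, and the names DoctypeLookup, SAMPLE, and requestedIdentifiers are hypothetical. The resolver records both identifiers and then substitutes an empty local DTD, so nothing is fetched from the (fictional) www.my.org address.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class DoctypeLookup {
    // A document that names its DTD by both a public and a system identifier.
    static final String SAMPLE =
        "<!DOCTYPE myType PUBLIC \"-//myOrg//DTD myType 1.0//EN\" "
        + "\"http://www.my.org/myType.dtd\"><myType/>";

    // Parses the document and returns every "publicId | systemId" pair
    // that the parser asked the resolver to locate.
    static List<String> requestedIdentifiers(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) -> {
                seen.add(publicId + " | " + systemId);
                // Stand-in for a local copy of the DTD: an empty one.
                return new InputSource(new StringReader(""));
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(requestedIdentifiers(SAMPLE));
    }
}
```

A real resolver would map the public identifier to a locally cached DTD and fall back to the system identifier only when no such copy exists.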

2.1.2 `!ELEMENT` tag

The !ELEMENT DTD tag declares the elements that comprise a document type. One declares each element within its own !ELEMENT tag. The !ELEMENT tag also establishes the relationship between the element it declares and the children of that element. If the element can never contain any elements or character data, one declares it with the EMPTY keyword; if it can contain any set of elements, one declares it with the ANY keyword:

<!ELEMENT a EMPTY>
<!ELEMENT b ANY>

If an element contains only character data, one uses the #PCDATA keyword, surrounded by parentheses:

<!ELEMENT c (#PCDATA)>

One may add to the #PCDATA keyword a list of elements, separated by bars (|), that compose the structure of the parent element. A list with the #PCDATA keyword must end with an asterisk, denoting that the parent element may contain any number of child elements and character data chunks. With the following example, an element of type a may contain, interspersed in any order, instances of character data, any number of elements b, and any number of elements c:

<!ELEMENT a (#PCDATA | b | c)*>

One may also specify a more structured list consisting only of elements. This list may include nested lists declaring either "sequences" (,) or "choices" (|) of elements. A sequence consists of an ordered list of elements; a choice means one and only one element in the list must appear. The following example specifies a similar element a to the one above, but this element must contain either one element b, one element c, or one element d, but not any combination of the three, nor anything else:

<!ELEMENT a (b | c | d)>

With the next example, an element a must contain one element b, one element c, and one element d, in that order. Element a cannot contain anything else:

<!ELEMENT a (b, c, d)>
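One way to check the choice and sequence declarations above is to hand them to a validating parser. The following sketch assumes the JAXP parser from current Java releases; the class ContentModelCheck and its isValid method are hypothetical helpers, and the b, c, and d elements are declared EMPTY purely for the sake of the test.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ContentModelCheck {
    // Wraps one content model for element a in an internal DTD and reports
    // whether the given a element conforms to it.
    static boolean isValid(String contentModelForA, String aElement) {
        String xml = "<!DOCTYPE a [<!ELEMENT a " + contentModelForA + ">"
            + "<!ELEMENT b EMPTY><!ELEMENT c EMPTY><!ELEMENT d EMPTY>]>"
            + aElement;
        boolean[] valid = { true };
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity violations arrive here as recoverable errors.
                public void error(SAXParseException e) { valid[0] = false; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return false; // not even well-formed
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println(isValid("(b | c | d)", "<a><c/></a>"));
        System.out.println(isValid("(b, c, d)", "<a><c/><b/><d/></a>"));
    }
}
```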

Plus signs, asterisks, and question marks respectively denote whether a given element may appear at least once in a row, zero or more times in a row, or either once or not at all. In the following example, an element a must contain zero or more element b's, followed by one or more element c's, followed by either one or no element d's:

<!ELEMENT a (b*, c+, d?)>

Two element a's that conform to this declaration (note the slashes in the b, c, and d tags indicate empty elements):

<a>
	<b/>
	<b/>
	<c/>
</a>
<a>
	<c/>
	<c/>
	<d/>
</a>

The following, more complicated, definition of element a specifies either one or no b's followed by one or no c's, or one or more lists of zero or more d's and one or no e's:

<!ELEMENT a ((b?, c?) | (d*, e?)+)>

One can derive these a's from the above declaration:

<a></a>
<a>
	<b/>
	<c/>
</a>
<a>
	<d/>
	<d/>
	<e/>
	<d/>
</a>

An external DTD follows for the poem.xml file listed near the beginning of this chapter. Note it contains only !ELEMENT declarations, even though a complete version should include other declarations describing the element attributes. Recall that the poem element subsumes all other elements, and so one should declare it as the document element in the XML file:

<?xml version="1.0"?>
<!--poem2.xml -->
<!DOCTYPE poem SYSTEM "poem.dtd">
<poem>
	<!-- see the Frost poem above -->
</poem>

The DTD for the document type poem:

<!--poem.dtd -->
<!ELEMENT poem (title, author, body)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA | dates)*>
<!ELEMENT body (line*)>
<!ELEMENT dates EMPTY>
<!ELEMENT line (#PCDATA)>

2.1.3 `!ATTLIST` tag

Rather than encoding element information as character data within an element body, one can use element attributes to encode this information within start and empty-element tags. The DTD specifier must use his or her own discretion in each case when deciding whether to use an attribute, or to encode the data within the element body. The best choice often varies depending on the application. Attributes most usefully represent data that often appears in a default configuration, or establish a unique identifier for specific elements.

The !ATTLIST tag specifies element attribute names, types, and default values as a single list of (potentially) multiple attributes. For an attribute type, one may specify that an attribute consists of character data, and apply the CDATA keyword, or that it encodes a unique token, and apply the ID keyword, or that it constitutes an enumeration, and declare the permissible character data values. One would declare an element that had two attributes, one containing a unique identifier and one containing character data, as the following (ignore the #REQUIRED and #IMPLIED keywords for the moment):

<!ELEMENT myElement EMPTY>
<!ATTLIST myElement
	myID		ID	#REQUIRED
	myCharacterData	CDATA	#IMPLIED>

Parentheses surround an enumeration and bars (|) separate it internally. The default value for an attribute closes off each individual attribute declaration. With the following example, an element of type a could only take the values b, c, or d for the myAttribute attribute. If, in the body of an XML file, one does not mention the attribute myAttribute in an a element, the XML processor will assign it the default value of c:

<!ELEMENT a EMPTY>
<!ATTLIST a
	myAttribute	(b | c | d)	"c">
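The effect of the default value can be observed from a program. The sketch below assumes the JAXP parser from current Java releases; AttributeDefaults and myAttributeOf are hypothetical names. It places the two declarations above in an internal DTD and asks the resulting tree for the value of myAttribute.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AttributeDefaults {
    // The declarations from the example above, held in an internal DTD.
    static final String DTD =
        "<!DOCTYPE a [<!ELEMENT a EMPTY>"
        + "<!ATTLIST a myAttribute (b | c | d) \"c\">]>";

    // Returns the value the parser reports for myAttribute on the given a tag.
    static String myAttributeOf(String aTag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(DTD + aTag)));
            return doc.getDocumentElement().getAttribute("myAttribute");
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(myAttributeOf("<a/>"));                   // default applies
        System.out.println(myAttributeOf("<a myAttribute=\"d\"/>")); // explicit value
    }
}
```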

If one specifies an enumeration of notations, the NOTATION keyword precedes the enumeration. An element attribute may use this keyword to establish the format of its element's data. A notation attribute takes as its value a previously defined notation, which in turn refers to an external file that specifies the format of the notation. One declares a notation with the !NOTATION tag, followed by the name of the notation, followed by a system or public identifier. The example below instructs an XML processor to use the notation defined in the local file formula.def to process elements which use the formula notation:

<!NOTATION formula SYSTEM "formula.def">

Note that each XML processor may respond to such requests differently, if at all. The next example shows the formula notation used in an attribute list declaration:

<!ELEMENT equation (#PCDATA)>
<!ATTLIST equation
	format	NOTATION	(formula)	#IMPLIED>

If an attribute may hold only one value, the #FIXED keyword must precede that value, which takes the place of a default value. If the attribute has no default, the #IMPLIED keyword replaces the default value. If an attribute has no default, but must appear in every instance of the element tag, the #REQUIRED keyword replaces the default value.

The following example declares four attributes for the element book. An XML file with the book tag must provide a character data title attribute. The title attribute doubles as the unique identifier of a particular book element. All other attributes are optional. If no author attribute appears, an XML processor will give the attribute a value of Faulkner. One can include the language attribute in a book element tag, but it must always have the value English.

<!ELEMENT book EMPTY>
<!ATTLIST book
	title		ID		#REQUIRED
	author		(Faulkner | Hemingway | Steinbeck)	"Faulkner"
	pages		CDATA		#IMPLIED
	language	NOTATION	(English)	#FIXED	"English">

An example book tag with minimal attributes follows, then one with all non-redundant attributes:

<book title="The Sound and the Fury"/>
<book title="For Whom the Bell Tolls" author="Hemingway" pages="390"/>

2.1.4 `!ENTITY` tag

An entity refers to a chunk of text in an XML file, which can include markup. One can use an "entity reference" much like a C constant. One can change the value of an entity in one place, and an XML processor will replace all its references with the entity's value during processing. This works similarly to the replacement of a "character reference." When an XML processor runs across the string &#65; (65 represents A in decimal Unicode) or &#x41; (41 represents A in hexadecimal Unicode), it replaces the reference with the actual character A. XML processors have hard-coded character references; entity references may refer to literal values defined directly in a document's DTD, or to remote resources.

An !ENTITY tag declares an identifier that an XML processor replaces with the entity's value during processing. One can use a "parameter" entity reference only within a DTD, and a "general" reference only within the main body of an XML document. In an !ENTITY declaration, the percent sign denotes a parameter reference; its absence indicates a general reference. When used outside of the !ENTITY declaration, a percent sign precedes a parameter reference name and an ampersand precedes a general reference name. A semicolon follows both.

The following example shows a parameter reference, which works only within a DTD. First, the !ENTITY declaration indicates that an XML processor should replace every instance of the myParameterReference reference with the text myElement. Then the !ELEMENT declaration invokes the myParameterReference reference by enclosing the reference identifier within a percent sign and semicolon:

<!DOCTYPE myDocument [
<!ENTITY % myParameterReference "myElement">
<!ELEMENT %myParameterReference; (#PCDATA)>]>

An XML processor will replace the %myParameterReference; in the above !ELEMENT declaration with myElement:

<!ELEMENT myElement (#PCDATA)>

The next example shows a general reference, which one declares within a DTD, but uses only in regular markup outside of the DTD. The !ENTITY declaration indicates that an XML processor should replace every instance of the myGeneralReference reference with the text some character data. Further down, the markup section invokes the myGeneralReference reference by enclosing the reference name with an ampersand and a semicolon. A general reference may appear in element content or in an attribute value, but not inside the name of a tag:

<!DOCTYPE myElement [
<!ENTITY myGeneralReference "some character data">
<!ELEMENT myElement (#PCDATA)>]>

<myElement>&myGeneralReference;</myElement>

An XML processor will replace the &myGeneralReference; reference in the above example with some character data:

<myElement>some character data</myElement>

The replacement text of an entity can include markup as well as character data, so long as the result remains well-formed. One could use an entity reference to replace the entire element in the above example. With this definition of the general entity reference myGEreference

<!DOCTYPE document [
<!ENTITY myGEreference "<myElement>some character data</myElement>">
<!ELEMENT myElement (#PCDATA)>]>

&myGEreference;

one can replace the &myGEreference; in the above example with

<myElement>some character data</myElement>
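The replacement of a general reference by markup can be observed in a parsed tree. The following is a sketch under stated assumptions: it uses the JAXP parser from current Java releases, the names EntityExpansion, DOC, and textOfFirst are hypothetical, and the reference sits inside an enclosing document element because a reference cannot stand outside the outermost element of a well-formed document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EntityExpansion {
    // The myGEreference declaration from above, referenced from inside
    // an enclosing document element.
    static final String DOC =
        "<!DOCTYPE document ["
        + "<!ENTITY myGEreference \"<myElement>some character data</myElement>\">"
        + "<!ELEMENT document (myElement)>"
        + "<!ELEMENT myElement (#PCDATA)>]>"
        + "<document>&myGEreference;</document>";

    // Returns the character data of the first element with the given name,
    // or null if the parsed tree holds no such element.
    static String textOfFirst(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(elementName);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The tree contains a myElement node even though the markup section
        // of DOC never spells that tag out.
        System.out.println(textOfFirst(DOC, "myElement"));
    }
}
```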

A stipulation in the XML specification requires that one encode quote characters (' or ") in the replacement text part of !ENTITY declarations as &#39; or &#34;, so as to avoid ambiguous replacement text terminators. For example, instead of

<!ENTITY myGEreference "myElement myAttr="five"">

one should use

<!ENTITY myGEreference "myElement myAttr=&#34;five&#34;">

Entity references can also refer to outside files, using system or public identifiers. The content of the file referenced replaces the entity identifier. External general references can also contain the NDATA keyword, followed by a notation. This example reference refers to an external picture.gif file, and the NDATA keyword declares that the gif notation specifies the format of the file:

<!ENTITY picture SYSTEM "picture.gif" NDATA gif>

2.1.5 Special keywords

The IGNORE and INCLUDE DTD keywords allow one to switch between ignoring and including a section of the DTD. One often combines them with entity references. To ignore a section, place <![IGNORE[ and ]]> around the section; to include it, place <![INCLUDE[ and ]]> around it. The following example illustrates how one can use these keywords with entity references. When working with the draft version of a book, one might define the entities draft and final as such:

<!ENTITY % draft "INCLUDE">
<!ENTITY % final "IGNORE">

When an XML processor reads the following DTD

<![%draft;[<!ELEMENT book (title, author, body)>]]>
<![%final;[<!ELEMENT book (title, author, publisher, date, body, index)>]]>

it replaces %draft; with INCLUDE and %final; with IGNORE:

<![INCLUDE [<!ELEMENT book (title, author, body)>]]>
<![IGNORE[<!ELEMENT book (title, author, publisher, date, body, index)>]]>

In this case, an XML processor will only use the draft definition of a book. When a book becomes complete, one would want to change the entity definitions to

<!ENTITY % draft "IGNORE">
<!ENTITY % final "INCLUDE">

so that an XML processor will replace %draft; with IGNORE and %final; with INCLUDE. Then the processor will use the final definition of a book:

<![IGNORE [<!ELEMENT book (title, author, body)>]]>
<![INCLUDE[<!ELEMENT book (title, author, publisher, date, body, index)>]]>

The CDATA keyword indicates that it encloses character data unconstrained by the usual limitations against special XML characters (<, >, &, ', and "). Often one can represent these characters with their Unicode character references (or with &lt;, &gt;, &amp;, &apos;, and &quot; respectively), but sometimes one might prefer the raw character data. In these cases, <![CDATA[ and ]]> should enclose that character data. In the following example, those symbols enclose the data FF00 & 20F0:

<value><![CDATA[FF00 & 20F0]]></value>
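A short Java sketch can confirm that a CDATA section, a predefined entity reference, and a character reference all yield plain character data once parsed. It assumes the JAXP parser from current Java releases; CdataDemo and contentOf are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataDemo {
    // Returns the character data of the outermost element, or null when
    // the text is not well-formed XML.
    static String contentOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Raw ampersand inside a CDATA section.
        System.out.println(contentOf("<value><![CDATA[FF00 & 20F0]]></value>"));
        // The same data written with an entity reference.
        System.out.println(contentOf("<value>FF00 &amp; 20F0</value>"));
        // A bare ampersand outside CDATA is rejected.
        System.out.println(contentOf("<value>FF00 & 20F0</value>"));
    }
}
```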

Comments and processing instructions do not have the character limitations that normal markup has.

2.2 XML Namespaces

Because different authors are likely to create different XML tags with the same name for different uses, the W3C added the namespace component to XML tags. The W3C's XML Namespaces specification [4] describes their usage. One can include elements from many different namespaces in the same document; a namespace merely distinguishes an element of one kind from an element of another kind that happens to share its name. Placing a namespace prefix and colon in front of a tag name indicates that the tag belongs to that namespace. In the following example, the html prefix shows that the H1 tag belongs to the html namespace:

<html:H1>This element is an HTML element.</html:H1>

Note that this does not indicate that all H1 tags in the document belong to the html namespace; in the following example, the first H1 tag does, while the second belongs to the hotdog namespace:

<html:H1>This element is an HTML element.</html:H1>
<hotdog:H1 bun="yes">beef frank with catsup and pickles</hotdog:H1>

Before using a namespace prefix, one must establish the formal name that the prefix represents in the XML document. One can do this in any tag that subsumes all references to the namespace. In the following example, one could either put the namespace declaration in the HTML tag, or in the BODY tag, or in any tag that might enclose the HTML tag:

<!-- example1.html -->
<HTML>
	<html:BODY>
		<html:P>Some HTML text.</html:P>
		<hotdog:H3>turkey frank in beans</hotdog:H3>
	</html:BODY>
</HTML>

The formal name of a namespace consists of a URI. In general, the URI should refer to the specification of the namespace, although it does not need to. Namespaces use URIs simply because that ensures that each namespace will have a unique designation. Therefore, each namespace must possess a unique URI. The following example designates the URI for the namespace prefix html. The xmlns prefix on the xmlns:html attribute of the HTML tag designates that attribute as one which assigns a namespace prefix:

<!-- example2.html -->
<HTML xmlns:html="http://www.w3.org/TR/REC-html40">
	<html:BODY>
		<html:P>Some HTML text.</html:P>
		<hotdog:H3>turkey frank in beans</hotdog:H3>
	</html:BODY>
</HTML>

In adding namespace declarations to example1.html, one could also put the html namespace declaration in the BODY tag. Note that the hotdog namespace also receives a declaration:

<!-- example3.html -->
<HTML>
	<html:BODY xmlns:html="http://www.w3.org/TR/REC-html40">
		<html:P>Some HTML text.</html:P>
		<hotdog:H3 xmlns:hotdog="http://www.dogs.com/hotdog_spec.html">turkey
		frank in beans</hotdog:H3>
	</html:BODY>
</HTML>

The html and hotdog namespace declarations can also coexist in the same tag. Note that now the HTML tag does belong to the html namespace:

<html:HTML xmlns:html="http://www.w3.org/TR/REC-html40"
xmlns:hotdog="http://www.dogs.com/hotdog_spec.html">

If tags appear with no namespace prefix, they belong to the default namespace. One can designate the default namespace by omitting the prefix in a namespace declaration. The next example makes html the default namespace:

<HTML xmlns="http://www.w3.org/TR/REC-html40">

The next example mimics example3.html, but uses html as the default namespace.

<!-- example4.html (default namespace: html) -->
<HTML xmlns="http://www.w3.org/TR/REC-html40"
xmlns:hotdog="http://www.dogs.com/hotdog_spec.html">
	<BODY>
		<P>Some HTML text.</P>
		<hotdog:H3>turkey frank in beans</hotdog:H3>
	</BODY>
</HTML>

The final example shows the original default namespace, html, replaced in the first P tag. Note that it only affects that P tag and that P tag's children. Also notice that multiple namespace prefixes can refer to a single namespace, as in the case of the original default and the html prefix.

<!-- example5.html -->
<!-- declares HTML the default namespace -->
<!-- (html and the default refer to the same namespace) -->

<HTML xmlns="http://www.w3.org/TR/REC-html40"
xmlns:html="http://www.w3.org/TR/REC-html40"
xmlns:hotdog="http://www.dogs.com/hotdog_spec.html">

	<BODY>
		<html:P xmlns="http://www.computers.com/computer.dtd">
			
			Some HTML text. Note that all child tags have
			a <html:EM>new</html:EM> default.
			
			<!-- the following system tag belongs -->
			<!-- to the new default namespace -->
			<system kind="Dell" processor="Pentium II"/>
			
		</html:P>
		
		<hotdog:H3>turkey frank in beans</hotdog:H3>
		
		<P>Some more HTML text. Note that html is once
		again the default namespace.</P>
	</BODY>
</HTML>
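The scoping of namespace declarations in example5.html can be checked programmatically. The sketch below is an assumption-laden illustration: it relies on the JAXP parser and on the getNamespaceURI() method from later DOM levels, neither of which belongs to DOM (Core) Level 1, and the names NamespaceLookup, DOC, and namespaceOf are hypothetical. DOC abbreviates example5.html.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceLookup {
    // Abbreviated version of example5.html.
    static final String DOC =
        "<HTML xmlns=\"http://www.w3.org/TR/REC-html40\""
        + " xmlns:html=\"http://www.w3.org/TR/REC-html40\""
        + " xmlns:hotdog=\"http://www.dogs.com/hotdog_spec.html\"><BODY>"
        + "<html:P xmlns=\"http://www.computers.com/computer.dtd\">"
        + "<system kind=\"Dell\"/></html:P>"
        + "<hotdog:H3>turkey frank in beans</hotdog:H3>"
        + "<P>Some more HTML text.</P></BODY></HTML>";

    // Returns the namespace URI of the first element whose tag, including
    // any prefix, matches qualifiedName.
    static String namespaceOf(String xml, String qualifiedName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(qualifiedName).item(0).getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] { "html:P", "system", "hotdog:H3", "P" }) {
            System.out.println(name + " -> " + namespaceOf(DOC, name));
        }
    }
}
```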

2.3 The Document Object Model (Core) Level 1

The W3C Document Object Model establishes "a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents." Various components of the DOM specify application programming interfaces (APIs) for creating, manipulating, and extending browser software like Netscape Navigator or Internet Explorer. The DOM (Core) Level 1 specification [5] describes an API for manipulating XML documents. The W3C hopes various XML processing modules will implement the DOM interface to allow other modules standard methods for access to XML data. The W3C designed the DOM so that programmers can use it with a variety of object-oriented languages like Java, C++, or ECMAScript. The DOM represents an XML document as a tree, and the object interfaces for each kind of node reflect this model. The DOM designates documents, document fragments, document type information, entity references, elements, element attributes, processing instructions, comments, text, CDATA sections, entities, and notations all as nodes.

2.3.1 Generic tree nodes

The class Node acts as a superclass for all these special kinds of nodes. A Node object has all the methods needed to manipulate a tree, along with a few methods that allow generic Node objects access to the data held by specific kinds of nodes. To traverse a tree, one can use the parentNode(), firstChild(), lastChild(), previousSibling(), and nextSibling() methods. (The Java binding of the DOM gives these accessors a get prefix, as in getParentNode() and getFirstChild().) Note that since the DOM represents a linear document, each node has a specific order relative to its siblings. The childNodes() method maintains this ordering in the NodeList object that it returns, which contains all of the children of the Node. One can determine whether or not this NodeList will actually contain any nodes by using the hasChildNodes() method of the Node. A NodeList object has only two methods: a length() method which returns the number of nodes in the list, and an item(long index) method, which returns the node in the list whose index in the list matches the integer value passed in the index field. If one passes an invalid index, the item() method returns null.
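The traversal methods above can be sketched as a recursive walk. This fragment is illustrative only: it assumes the JAXP parser and the Java binding of the DOM (org.w3c.dom), where the accessors appear as getFirstChild(), getNextSibling(), and getChildNodes(); TreeWalk and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeWalk {
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Lists element names in document order, separated by single spaces.
    static String elementNames(String xml) {
        StringBuilder out = new StringBuilder();
        walk(parseRoot(xml), out);
        return out.toString().trim();
    }

    // Depth-first walk that uses only the first-child and next-sibling links.
    private static void walk(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(node.getNodeName()).append(' ');
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            walk(child, out);
        }
    }

    // Number of direct children of the outermost element.
    static int childCount(String xml) {
        return parseRoot(xml).getChildNodes().getLength();
    }

    public static void main(String[] args) {
        String xml = "<poem><title/><author><dates/></author><body/></poem>";
        System.out.println(elementNames(xml));
        System.out.println(childCount(xml));
    }
}
```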

One can also manipulate the children of a given node by using several other methods of a Node object itself, like appendChild(Node newChild). The DOM will add the Node passed as a parameter to the end of the child node list of the Node upon which one invoked this method. The insertBefore(Node newChild, Node refChild) method inserts the Node passed as the first parameter into the list of child nodes directly in front of the Node passed as the second parameter. The replaceChild(Node newChild, Node oldChild) method replaces the Node passed in the second parameter with the Node passed in the first. To remove a child node, one can use the removeChild(Node oldChild) method, which removes the Node passed as a parameter from the child node list of the Node upon which one invoked this method. The following function, written in Java, uses several of the above Node methods to replace the first child of the first passed Node with the second passed Node:

void replaceFirstNode(Node parent, Node newFirstChild)
{
	Node oldFirstChild = parent.firstChild();
	
	parent.insertBefore(newFirstChild, oldFirstChild);
	parent.removeChild(oldFirstChild);
	
	// could have effected same result with
	// parent.replaceChild(newFirstChild, oldFirstChild);
}

All these methods may generate a DOMException exception, which may contain an error code accessible through its code() method.
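
As a hedged illustration of the error reporting, the sketch below tries to remove a node from a parent that does not contain it. It assumes the standard org.w3c.dom Java binding, in which DOMException exposes its error code as the public field code rather than through a code() method; the class name DomExceptionSketch and its helper methods are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomExceptionSketch {
    // Hypothetical helper: create an empty DOM Document.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    // Try to remove a node that is not a child of the parent.
    // Returns the DOMException error code, or -1 if no exception occurred.
    static short removeStranger() {
        Document doc = newDocument();
        Element parent = doc.createElement("parent");
        Element stranger = doc.createElement("stranger"); // never attached to parent
        try {
            parent.removeChild(stranger);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        short code = removeStranger();
        System.out.println(code == DOMException.NOT_FOUND_ERR);
    }
}
```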

A Node object also has general methods that allow a programmer to access the specific information contained by certain types of nodes. The nodeType() method returns an integer code that indicates whether the object is a special kind of node (e.g., an Element node), or just a simple Node. The nodeName() and nodeValue() methods return character strings that vary depending on the kind of node upon which one invoked these methods. For example, a Text node (discussed later) will return the string #text from nodeName() and its character data content from nodeValue(). A simple Node will return null from both nodeName() and nodeValue().
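
The sketch below shows how these three generic methods might be combined to describe an arbitrary node. It assumes the standard org.w3c.dom Java binding (getNodeType(), getNodeName(), getNodeValue()); the class NodeInfoSketch and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeInfoSketch {
    // Hypothetical helper: parse an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Summarize a node using only methods inherited from Node.
    static String describe(Node n) {
        String kind;
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE: kind = "element"; break;
            case Node.TEXT_NODE:    kind = "text";    break;
            case Node.COMMENT_NODE: kind = "comment"; break;
            default:                kind = "other";   break;
        }
        return kind + " name=" + n.getNodeName() + " value=" + n.getNodeValue();
    }

    public static void main(String[] args) {
        Node greeting = parse("<greeting>hello</greeting>").getDocumentElement();
        System.out.println(describe(greeting));                 // element name=greeting value=null
        System.out.println(describe(greeting.getFirstChild())); // text name=#text value=hello
    }
}
```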

The Node method attributes() will return a NamedNodeMap of Attr nodes when invoked on an Element node, and null for all other nodes. Note that under the DOM, attribute nodes are not part of the document tree, but kept as a separate list for each element. A NamedNodeMap object has the same length() and item() methods a NodeList has; in addition it has getNamedItem(String name), removeNamedItem(String name), and setNamedItem(Node newNode). The first returns the Node whose nodeName() method returns the same string as the string parameter passed, and the second returns the same Node as the first while also removing it from the NamedNodeMap. The setNamedItem() method replaces the existing Node in the NamedNodeMap with the same nodeName() as the passed Node parameter, or it appends the new node if none already exists.
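
A short sketch of the NamedNodeMap operations follows. It assumes the standard org.w3c.dom Java binding, where attributes() appears as getAttributes() and length() as getLength(); the class NamedNodeMapSketch, the sample img element, and the log format are illustrative assumptions only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NamedNodeMapSketch {
    // Hypothetical helper: parse an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Exercise getNamedItem, removeNamedItem, and setNamedItem on one element.
    static String demo() {
        Document doc = parse("<img src=\"a.png\" alt=\"A\"/>");
        Element img = doc.getDocumentElement();
        NamedNodeMap attrs = img.getAttributes();
        StringBuilder log = new StringBuilder();

        log.append(attrs.getLength());                                     // two attributes
        log.append(' ').append(attrs.getNamedItem("src").getNodeValue()); // look up by name

        Node removed = attrs.removeNamedItem("alt");                       // returns the removed node
        log.append(' ').append(removed.getNodeValue());
        log.append(' ').append(attrs.getLength());

        Attr width = doc.createAttribute("width");                         // no match, so it is added
        width.setValue("10");
        attrs.setNamedItem(width);
        log.append(' ').append(img.getAttribute("width"));
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2 a.png A 1 10
    }
}
```

Because the map is live, the attribute added through setNamedItem() is immediately visible through the element itself.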

The Node class also has a few utility methods like cloneNode(boolean deep), which returns a duplicate of the Node upon which it was invoked. It does not attach the returned node to the tree. If one passes it a boolean parameter of true, the method also duplicates all of the original node's descendant nodes. The ownerDocument() method of Node returns the Document node that contains the node upon which it was invoked.
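
The difference between a shallow and a deep copy can be sketched as follows, again assuming the standard org.w3c.dom Java binding (cloneNode(boolean), getOwnerDocument()). The class CloneSketch and its helper methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CloneSketch {
    // Hypothetical helper: parse an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Number of children carried over by a shallow (false) or deep (true) clone.
    static int childCountOfClone(Node original, boolean deep) {
        return original.cloneNode(deep).getChildNodes().getLength();
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b>text</b></a>");
        Node root = doc.getDocumentElement();

        System.out.println(childCountOfClone(root, false)); // 0: descendants not copied
        System.out.println(childCountOfClone(root, true));  // 1: the b element came along

        Node copy = root.cloneNode(true);
        System.out.println(copy.getParentNode() == null);    // true: not attached to the tree
        System.out.println(copy.getOwnerDocument() == doc);  // true: still owned by doc
    }
}
```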

2.3.2 Document nodes

A typical XML document begins with a Document node as its root. It will have the document element as a child, as well as any processing instructions that occur before the document element (although not necessarily the XML declaration, as the XML parsing module may consume that node itself), and perhaps some comment nodes. Document objects have a number of special utility methods: doctype() returns some DTD information in the form of a DocumentType object; implementation() returns a DOMImplementation object which may contain some information about the specific DOM implementation used; documentElement() returns the document element; and getElementsByTagName(String tagName) returns a NodeList of all the descendant elements in the document that have the same name as the string parameter.

A Document object also serves as a "factory" for special nodes. It has createElement(), createDocumentFragment(), createTextNode(), createComment(), createCDATASection(), createProcessingInstruction(), createAttribute(), and createEntityReference() methods. A DocumentFragment object serves as a "lightweight" Document, meaning that it is really just a simple Node which contains a tree fragment. It has none of the additional functionality of Document nodes.
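
The factory role of a Document might look like the following in practice. This is a sketch under the assumption of the standard org.w3c.dom Java binding; the class DocumentFactorySketch and the note element are invented for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DocumentFactorySketch {
    // Build <note><!--draft-->Hello</note> using only Document factory methods.
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element note = doc.createElement("note");
        note.appendChild(doc.createComment("draft"));
        note.appendChild(doc.createTextNode("Hello"));
        doc.appendChild(note); // note becomes the document element
        return doc;
    }

    public static void main(String[] args) {
        Document doc = build();
        System.out.println(doc.getDocumentElement().getNodeName());              // note
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // 2
        System.out.println(doc.getElementsByTagName("note").getLength());        // 1
    }
}
```

Nodes created by the factory methods belong to the document but do not appear in its tree until one attaches them with a method such as appendChild().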

2.3.3 Element nodes

Most of the methods of an Element node manipulate its attributes. One can use the getAttribute(String name), setAttribute(String name, String value), and the removeAttribute(String name) to get, set, and remove an attribute based on its name. If these methods cannot find the specified attribute, the get method will return an empty string, the set method will add the specified attribute, and the remove method will do nothing. If an attribute has a default value, the remove method will replace the attribute's current value with its default value.
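
The behavior for missing attributes can be checked with a few lines of Java. The sketch assumes the standard org.w3c.dom binding, whose getAttribute(), setAttribute(), and removeAttribute() carry the same names as above; the class AttributeSketch and the lang attribute are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;

class AttributeSketch {
    // Returns the value of "lang" before setting, after setting, and after removal.
    static String[] demo() {
        Element p;
        try {
            p = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .newDocument().createElement("p");
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        String missing = p.getAttribute("lang"); // attribute absent: empty string
        p.setAttribute("lang", "en");            // absent, so the attribute is added
        String present = p.getAttribute("lang");
        p.removeAttribute("lang");
        p.removeAttribute("lang");               // already gone: does nothing
        String after = p.getAttribute("lang");
        return new String[] {missing, present, after};
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println("[" + r[0] + "] [" + r[1] + "] [" + r[2] + "]"); // [] [en] []
    }
}
```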

One can also access and manipulate attribute (Attr) nodes directly. The getAttributeNode(String name) returns the Attr node with the name of the string parameter, or null if it finds none. The setAttributeNode(Attr newNode) replaces an Attr node with the same nodeName() as the passed node, or it adds the passed node if it finds no match. The removeAttributeNode(Attr oldNode) removes the Attr node with the same nodeName() as the passed node, or throws a DOMException if it finds none. It also returns the found Attr node. If the attribute has a default value, removeAttributeNode() will replace the current Attr with one which contains the default value.

Since Attr nodes do not constitute part of the document tree, they return null when one invokes the parentNode(), previousSibling(), and nextSibling() methods that the Attr nodes inherit from the Node superclass. Attr objects also, of course, have no children. Attr objects do have three methods of their own: name(), value(), and specified(). The first two return the attribute's name and value as character strings. The third returns true if the attribute was explicitly specified in the XML file, or if the DOM user modified the Attr node. The specified() method returns false only if the DOM implementation created it merely to assign the default value to it, and the DOM user has not modified it.
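
The specified flag can be observed by parsing an element whose DTD declares a default attribute value. The sketch below assumes the standard org.w3c.dom Java binding (getSpecified(), getValue()) and a parser that applies defaults declared in the internal DTD subset, as the parser bundled with the JDK is expected to do. The class SpecifiedSketch and the one-element DTD, loosely modeled on a BREAK element with a SIZE attribute, are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.xml.sax.InputSource;

class SpecifiedSketch {
    // Internal DTD subset giving SIZE a default value of "medium".
    static final String DTD =
            "<!DOCTYPE BREAK [\n"
          + "<!ELEMENT BREAK EMPTY>\n"
          + "<!ATTLIST BREAK SIZE (none | small | medium | large) \"medium\">\n"
          + "]>\n";

    // Parse a single BREAK element and return its SIZE attribute node.
    static Attr sizeAttr(String element) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(DTD + element)))
                    .getDocumentElement()
                    .getAttributeNode("SIZE");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Attr defaulted = sizeAttr("<BREAK/>");
        Attr explicit = sizeAttr("<BREAK SIZE=\"small\"/>");
        System.out.println(defaulted.getValue() + " " + defaulted.getSpecified());
        System.out.println(explicit.getValue() + " " + explicit.getSpecified());
        System.out.println(explicit.getParentNode()); // null: Attr nodes are outside the tree
    }
}
```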

Element objects also have three utility methods of their own. The tagName() method returns the actual string name of the element, which equals the return value of the nodeName() method it inherited from the Node class. The getElementsByTagName() method duplicates the functionality of the Document method of the same name, but only applies to descendants of the Element upon which one invoked it. The normalize() method returns nothing, but it combines any adjacent text node siblings that descend from the Element upon which one invoked it.
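
To illustrate normalize(), the following sketch appends two adjacent text nodes to an element and then merges them. It assumes the standard org.w3c.dom Java binding; the class NormalizeSketch and the sample strings are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeSketch {
    // Build an element holding two adjacent text nodes.
    static Element twoTextNodes() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element p = doc.createElement("p");
        p.appendChild(doc.createTextNode("Hello, "));
        p.appendChild(doc.createTextNode("world"));
        return p;
    }

    public static void main(String[] args) {
        Element p = twoTextNodes();
        System.out.println(p.getChildNodes().getLength()); // 2
        p.normalize();                                      // merge adjacent text nodes
        System.out.println(p.getChildNodes().getLength()); // 1
        System.out.println(p.getFirstChild().getNodeValue()); // Hello, world
    }
}
```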

The following Java function uses Node and Element methods to find in a subtree the first element that has a certain attribute name. To search an entire XML document, one could pass the Element returned by the Document method documentElement() to this function. This function returns the String tag name of the found element, or an empty string if it finds none. Note that it uses the Java String object, which contains the method equals().

// return the tag name of the found element
String elementWithAttributeName(Element e, String attrName)
{
	// if the element has an attrName attribute
	// (note: an attribute with an empty value counts as absent)
	if (!e.getAttribute(attrName).equals(""))
		return e.tagName();
	
	NodeList list = e.childNodes();
	
	for (int i = 0; i < list.length(); i++)
	{
		Node n = list.item(i);
		
		// if the child nodeType() value equals
		// the enumerated value ELEMENT_NODE
		// then recurse with the child
		if (n.nodeType() == Node.ELEMENT_NODE)
		{
			// cast is safe: the node type was just checked
			String tagName = elementWithAttributeName
			  ((Element) n, attrName);
			
			if (!tagName.equals(""))
				return tagName;
		}
	}
	
	// no such element found
	return "";
}

2.3.4 Other nodes

The DOM establishes two kinds of text nodes: Text and CDATASection. A CDATASection object derives from a Text node, and a Text object derives from a CharacterData node. A DOM implementation will create a CDATASection node to hold text escaped with the <![CDATA[]]> notation, and a Text node to hold all other character data. A CDATASection has no methods of its own, and a Text node has only the method split(long offset), which splits the text contained in the original node into two nodes at the offset passed as an integer parameter. This method also returns the second (new) node, even though it adds the node to the document tree automatically.
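
A brief sketch of splitting a text node follows. One assumption should be stated plainly: in the standard org.w3c.dom Java binding the method is named splitText(int offset), rather than split(long offset) as written above. The class SplitSketch is a hypothetical name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class SplitSketch {
    // Build <p>HelloWorld</p> and return its single text node.
    static Text sampleText() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element p = doc.createElement("p");
        Text t = doc.createTextNode("HelloWorld");
        p.appendChild(t);
        return t;
    }

    public static void main(String[] args) {
        Text head = sampleText();
        Text tail = head.splitText(5);                    // split after "Hello"
        System.out.println(head.getData());               // Hello
        System.out.println(tail.getData());               // World
        System.out.println(head.getNextSibling() == tail); // true: tail was added to the tree
    }
}
```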

The CharacterData object has two simple methods, data() and length(), which return the character data string and the length of the string, respectively. It also has several methods that allow one to manipulate the data string. The method appendData(String chars) appends the passed string onto the end of the node's current data string. The method insertData(long offset, String chars) inserts the string passed in the second parameter into the current data string beginning at the integer offset passed in the first parameter. All CharacterData methods will generate a DOMException if passed an invalid offset.

The method replaceData(long offset, long count, String chars) will delete the current data string from the integer offset passed in the first parameter for the integer number of characters passed in the second parameter, and then insert the string passed in the third parameter beginning at the integer offset passed as the first parameter. The method deleteData(long offset, long count) deletes the current data string from the integer offset passed in the first parameter for the integer number of characters passed in the second parameter. The method substringData(long offset, long count) returns a string which represents the data string from the integer offset passed as the first parameter for the integer number of characters passed as the second parameter.
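
The string-manipulation methods can be chained on a single node to see how the offsets behave. The sketch assumes the standard org.w3c.dom Java binding, which uses int offsets and getData() and getLength() in place of data() and length(); the class CharacterDataSketch and its sample text are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Text;

class CharacterDataSketch {
    // Create a detached text node holding the given string.
    static Text text(String s) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .newDocument().createTextNode(s);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    // Apply append, insert, replace, and delete in turn; return the final data.
    static String edit() {
        Text t = text("Hello world");
        t.appendData("!");              // Hello world!
        t.insertData(0, ">> ");         // >> Hello world!
        t.replaceData(3, 5, "Howdy");   // >> Howdy world!
        t.deleteData(0, 3);             // Howdy world!
        return t.getData();
    }

    // Returns true if an out-of-range offset raises INDEX_SIZE_ERR.
    static boolean badOffsetRejected() {
        try {
            text("abc").insertData(99, "x");
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.INDEX_SIZE_ERR;
        }
    }

    public static void main(String[] args) {
        System.out.println(edit());                                    // Howdy world!
        System.out.println(text("Howdy world!").substringData(6, 5)); // world
        System.out.println(badOffsetRejected());
    }
}
```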

The Comment object also descends directly from the CharacterData object. It has no additional methods. Other relatively simple node types include the ProcessingInstruction object and the Notation object. A ProcessingInstruction node has two methods, target() and data(). The target() method returns a string identifying the target application (e.g., xml in an XML declaration), and the data() method returns a string containing the rest of the processing instruction. A Notation object also has two unique methods, publicId() and systemId(). Each returns a string referring to the notation specification, the first using a public identifier, and the second using a system identifier. The method nodeName(), inherited from Node, indicates the notation name.

Finally, Entity and EntityReference nodes work together. Both contain the entity's replacement text as child nodes, and both return the entity name with the nodeName() method. Entity objects have three additional methods: publicId(), systemId(), and notationName(). The publicId() and systemId() methods mimic the corresponding methods in Notation objects. The notationName() method returns the name of the notation that one may have associated with the entity.

2.4 Case Study: Java Speech Markup Language

The Java Speech Markup Language (JSML) Specification [6] describes a set of tags that one can add to a plain text document to indicate how a speech synthesizer module should speak the text. A synthesizer receives text to speak from another application, and, while speaking, the synthesizer may send messages back to that application. All JSML elements may have a MARK attribute, the value of which the synthesizer passes to the calling application when the synthesizer reaches that element. One can also use the empty element MARKER specifically for such a task. The MARKER tag possesses only the MARK attribute.

The most basic elements in JSML consist of the PARA and SENT tags. PARA elements enclose paragraphs and SENT elements enclose sentences. PARA elements may enclose SENT elements, but neither PARA nor SENT elements may enclose PARA elements. SENT elements also cannot enclose other SENT elements. The following example illustrates the use of PARA, SENT, and MARKER tags in a paragraph. Note that, unlike other JSML markup, the MARK attributes and MARKER tags do not affect the synthesizer's speech in any way.

<PARA><SENT>As I was going to <MARKER MARK="St. Ives"/>St. 
Ives, I met a man with seven wives.</SENT> <SENT>Each wife had seven 
sacks, each sack had seven cats, each cat had seven kits.</SENT> <SENT 
MARK="question">Kits, cats, sacks, and wives, how many were going to St. 
Ives?</SENT></PARA>

While in general, speech synthesizers can recognize most paragraph and sentence endings, some structures, like abbreviations, can cause problems. The SENT and PARA tags help with the sentences and paragraphs, and the BREAK tag can help with structures like commas, parentheses, or other natural pauses. When using the empty-element BREAK tag, one must include either the MSECS attribute or the SIZE attribute. The MSECS attribute takes the integer number of milliseconds the synthesizer should pause before continuing on to the next word. The SIZE attribute takes one of four possible values: none, small, medium (the default), or large. Note that the BREAK tags (like PARA or SENT tags) do not necessarily have to accompany punctuation:

<PARA><SENT>As I was going to St. Ives, I met a man <BREAK 
MSECS="500"/>with seven wives.</SENT> <SENT>Each wife had seven 
sacks, each sack had seven cats, each cat had seven kits.</SENT> 
<SENT>Kits, <BREAK SIZE="small"/>cats, <BREAK 
SIZE="small"/>sacks, <BREAK SIZE="small"/>and wives, how many were going 
to St. <BREAK SIZE="none"/>Ives?</SENT></PARA>

The SAYAS tag can help a speech synthesizer with words and notation that people do not pronounce the way they literally spell. One can use the CLASS attribute of the SAYAS tag to inform a synthesizer whether a collection of digits stands for a date, individual digits, a single number, or a time:

<SAYAS CLASS="date">5/1/77</SAYAS>
Beverly Hills <SAYAS CLASS="digits">90210</SAYAS>
<SAYAS CLASS="number">70</SAYAS> home runs
<SAYAS CLASS="time">12:30</SAYAS>

The CLASS attribute can also inform a synthesizer that it should take initials as literal individual letters. In the following example, a synthesizer would pronounce the first USA "oosa" and the second "you ess ay":

USA or <SAYAS CLASS="literal">USA</SAYAS>

In place of the CLASS attribute, the SAYAS tag can contain a SUB attribute that informs a speech synthesizer how to speak abbreviations or unusual spellings. One can also use the PHON attribute for the same purpose, except that it takes a string of phonetic characters from the International Phonetic Alphabet.

<SAYAS SUB="december seven, nineteen forty one">Dec. 7, 1941</SAYAS>
<SAYAS SUB="saint">St.</SAYAS> Ives

The EMP tag indicates that a synthesizer should give certain words or phrases more or less emphasis. It can either contain a word or phrase, or appear as an empty element and refer to the word following it. The EMP tag takes the LEVEL attribute, which can either have a value of strong, moderate (the default), none, or reduced. In the following example, the EMP tags before going indicate that a synthesizer should not emphasize the word. The EMP tag around how many indicates that a synthesizer should give how many a moderate emphasis.

<PARA><SENT>As I was <EMP LEVEL="none"/>going to St. Ives, I 
met a man with seven wives. </SENT> <SENT>Each wife had seven sacks, 
each sack had seven cats, each cat had seven kits. </SENT> <SENT>Kits, 
cats, sacks, and wives, <EMP>how many</EMP> were <EMP 
LEVEL="none"/>going to St. Ives?</SENT></PARA>

One can use the PROS tag to alter the characteristics of the synthesizer voice. The RATE attribute takes an integer number of words to speak per minute. A number n with a plus in front of it indicates an increase of n, a number n with a minus in front indicates a decrease of n, a number n with a plus in front and a percent sign following indicates an increase of n percent, and a number n with a minus in front and a percent sign following indicates a decrease of n percent. A number n without any other characters indicates an absolute rate of n, and a value of reset resets the RATE to the default synthesizer value.
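
The rules for interpreting a RATE value can be restated as a small Java function. This is only a sketch of the rules as described above, not code from any JSML implementation; the class RateSketch, the method applyRate(), and the default rate of 150 words per minute are all assumptions made for illustration. The text does not say how to treat a percentage without a sign, so the sketch rejects it.

```java
class RateSketch {
    // Hypothetical synthesizer default, in words per minute.
    static final double DEFAULT_RATE = 150.0;

    // Compute the new speaking rate from the current rate and a RATE attribute value.
    static double applyRate(double current, String value) {
        String v = value.trim();
        if (v.equals("reset")) {
            return DEFAULT_RATE;
        }
        boolean percent = v.endsWith("%");
        if (percent) {
            v = v.substring(0, v.length() - 1);
        }
        boolean relative = v.startsWith("+") || v.startsWith("-");
        double n = Double.parseDouble(v); // accepts a leading sign
        if (!relative) {
            if (percent) {
                throw new IllegalArgumentException("unsigned percentage: " + value);
            }
            return n;                              // absolute rate
        }
        if (percent) {
            return current + current * n / 100.0;  // change by n percent
        }
        return current + n;                        // change by n words per minute
    }

    public static void main(String[] args) {
        System.out.println(applyRate(200.0, "+10"));   // 210.0
        System.out.println(applyRate(200.0, "-25%"));  // 150.0
        System.out.println(applyRate(200.0, "100"));   // 100.0
        System.out.println(applyRate(200.0, "reset")); // 150.0
    }
}
```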

The VOL attribute takes a value from 0.0 to 1.0 indicating the volume at which to speak. The VOL attribute, like the other attributes of PROS, also uses the same system of pluses, minuses, and percent signs as RATE. The PITCH attribute establishes the baseline pitch of the synthesizer's speech in Hertz, with an integer number. The RANGE attribute designates the range in pitch over which the synthesizer's speech can vary, also an integer number of Hertz. In the following example, the first PROS tag increases the volume by a tenth; the second tag decreases the baseline pitch by twenty-five percent; and the third PROS tag sets the rate of speech to one hundred words per minute, while increasing the range of pitch by thirty percent:

<PARA><SENT>As I was going to <PROS VOL="+.1">St. 
Ives</PROS>, I met a man <PROS PITCH="-25%">with seven 
wives.</PROS></SENT> <SENT><PROS RATE="100" 
RANGE="+30%">Each wife had seven sacks, each sack had seven cats, each cat had 
seven kits.</PROS></SENT> <SENT>Kits, cats, sacks, and wives, how 
many were going to St. Ives?</SENT></PARA>

One can use the ENGINE tag to indicate that a synthesizer should speak a portion of text with a specific voice engine, if the system includes it. The ENGID attribute holds the names of the requested engine or engines, delimited by commas. The DATA attribute holds the text that the engine should speak; if the system does not have any of the specified engines, the synthesizer will continue speaking the text contained within the ENGINE tag with its current voice. In the following example, a synthesizer will speak I am an android with engines HAL or Robotron if it has either. If the synthesizer has neither, it will speak I am actually human in its current voice:

<ENGINE ENGID="HAL, Robotron" DATA="I am an android">I am actually human</ENGINE>

A DTD for JSML follows. It derives from the current draft of the JSML specification, although the specification does not fully define some elements. The JSML element, the document element of a JSML document, contains all the markup in the document:

<!-- jsml.dtd -->
<!-- a JSML element contains one or more PARAs -->
<!ELEMENT JSML (PARA+)>

<!-- a PARA contains at least one SENT, followed by other SENT elements, 
possibly with a BREAK in between -->
<!ELEMENT PARA (SENT, (BREAK?, SENT)*)>
<!ATTLIST PARA
	MARK	CDATA	#IMPLIED>

<!-- a SENT can contain character data mixed in with the list elements in any order -->
<!ELEMENT SENT (#PCDATA | BREAK | EMP | SAYAS | PROS | ENGINE | MARKER)*>
<!ATTLIST SENT
	MARK	CDATA	#IMPLIED>

<!-- a BREAK can take a SIZE attribute with a value of either none, small, 
medium, or large; if omitted, SIZE defaults to medium -->
<!ELEMENT BREAK EMPTY>
<!ATTLIST BREAK
	MSECS	CDATA	#IMPLIED
	SIZE	(none | small | medium | large)	"medium"
	MARK	CDATA	#IMPLIED>

<!ELEMENT EMP (#PCDATA)>
<!ATTLIST EMP
	LEVEL	(strong | moderate | none | reduced) "moderate"
	MARK	CDATA	#IMPLIED>

<!ELEMENT SAYAS (#PCDATA)>
<!ATTLIST SAYAS
	SUB	CDATA	#IMPLIED
	CLASS	(date | digits | literal | number | time)	#IMPLIED
	PHON	CDATA	#IMPLIED
	MARK	CDATA	#IMPLIED>

<!ELEMENT PROS (#PCDATA | BREAK | EMP | SAYAS | MARKER)*>
<!ATTLIST PROS
	RATE	CDATA	#IMPLIED
	VOL	CDATA	#IMPLIED
	PITCH	CDATA	#IMPLIED
	RANGE	CDATA	#IMPLIED
	MARK	CDATA	#IMPLIED>

<!ELEMENT ENGINE (#PCDATA)>
<!ATTLIST ENGINE
	ENGID	CDATA	#IMPLIED
	DATA	CDATA	#IMPLIED
	MARK	CDATA	#IMPLIED>

<!ELEMENT MARKER EMPTY>
<!ATTLIST MARKER
	MARK	CDATA	#IMPLIED>

With the above DTD, an XML processor can process and validate JSML documents, such as this example document:

<?xml version="1.0" standalone="no"?>
<!-- speech.jsml -->
<!DOCTYPE JSML PUBLIC "-//sun//DTD JSML 1.0//EN" "http://www.javasoft.com/jsml.dtd">

<JSML>

<PARA><SENT><ENGINE ENGID="Bahh" DATA="and now, a selection from 
Voltaire's Candide">Tempest and 
Earthquake</ENGINE></SENT></PARA>

<PARA><SENT>The meal was certainly a sad affair, and the guests wept as 
they ate; but Pangloss consoled them with the <EMP/>assurance that things 
could not be otherwise:</SENT> <SENT>For all this, said he, is a 
manifestation of the rightness of things, since if there <EMP/>is a volcano 
at Lisbon <BREAK/>it <EMP>could not be</EMP> anywhere 
else.</SENT> <SENT>For it is <EMP/>impossible for things not to 
be where they are, because everything is for the best.</SENT></PARA>

<PARA><SENT>A little man in black, <BREAK SIZE="small"/>an 
officer of the Inquisition, <BREAK SIZE="small"/>who was sitting beside 
Pangloss, turned to him and politely said:</SENT> <SENT><PROS 
VOL="+20%">It appears, <BREAK SIZE="none"/>Sir, that you do not believe in 
original sin; for <EMP/>if all <EMP/>is for the best, there can be 
<EMP>no such thing</EMP> as the fall of Man and eternal 
punishment.</PROS></SENT></PARA>

<PARA><SENT><PROS VOL="+20%">I <EMP/>most <EMP 
LEVEL="strong"/>humbly beg your Excellency's pardon,</PROS> replied 
Pangloss, <BREAK SIZE="small"/>still more politely,<PROS VOL="+20%" 
RATE="-20%"> but I <EMP/>must point out<BREAK SIZE="small"/> that 
the fall of Man<BREAK SIZE="small"/> and eternal punishment enter, of 
<EMP/>Necessity, into the scheme of the best of all possible 
worlds.</PROS></SENT></PARA>

<PARA><SENT><PROS VOL="+30%" PITCH="+20%">Then you don't believe 
in Free Will, <BREAK SIZE="none"/>Sir?</PROS> said the 
officer.</SENT></PARA>

<PARA><SENT><PROS VOL="+30%">Your Excellency must excuse 
me,</PROS> said Pangloss;</SENT> <SENT><PROS VOL="+30%" 
RATE="-30%">Free Will <EMP/>is consistent with Absolute Necessity, for it 
was ordained that we should be free.</PROS></SENT> <SENT><PROS 
VOL="+20%" RATE="-20%" RANGE="-20%"><BREAK/>For the <EMP 
LEVEL="strong"/>Will that <EMP>is 
Determined</EMP>...</PROS></SENT></PARA>

</JSML>
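
To suggest how a validating XML processor might be driven from Java, the sketch below validates two small documents against a deliberately cut-down DTD. It does not use the full JSML DTD above: the internal DTD subset, the class JsmlValidationSketch, and the sample sentences are assumptions made to keep the example self-contained. It relies on the standard javax.xml.parsers and org.xml.sax interfaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class JsmlValidationSketch {
    // A reduced, JSML-like DTD carried in the internal subset (illustrative only).
    static final String DTD =
            "<!DOCTYPE JSML [\n"
          + "<!ELEMENT JSML (PARA+)>\n"
          + "<!ELEMENT PARA (SENT+)>\n"
          + "<!ELEMENT SENT (#PCDATA | BREAK)*>\n"
          + "<!ELEMENT BREAK EMPTY>\n"
          + "<!ATTLIST BREAK SIZE (none | small | medium | large) \"medium\">\n"
          + "]>\n";

    // Parse with validation switched on and collect any error messages.
    static List<String> validate(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            errors.add("fatal: " + e.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) {
        String good = "<JSML><PARA><SENT>Hi <BREAK SIZE=\"small\"/>there.</SENT></PARA></JSML>";
        String bad = "<JSML><PARA><SENT>Hi <BREAK SIZE=\"huge\"/>there.</SENT></PARA></JSML>";
        System.out.println(validate(good).isEmpty()); // true: no validity errors
        System.out.println(validate(bad).isEmpty());  // false: "huge" is not an allowed SIZE
    }
}
```

Validity errors arrive through the error() callback rather than as exceptions, so a program must install an ErrorHandler to notice them.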

3 Extensible Stylesheet Language

XSL documents describe the formatting of XML documents. XSL rules transform XML documents by adding, subtracting, or moving XML markup and data, and by adding formatting tags. A browser may then render an XSL-transformed document according to these formatting tags. One can use any set of formatting tags, although currently most XSL processors use HTML. The XSL specification [7] introduces its own set of tags, Formatting Objects, derived from the Cascading Style Sheets (CSS) language. The W3C intends to merge the syntax of the two tag sets; currently, they have some differences.

An XML document can request that a browser render it with one or more stylesheets using the xml-stylesheet processing instruction. The W3C note "Associating Stylesheets with XML Documents" [8] discusses this instruction. Each xml-stylesheet instruction must contain an href attribute, the value of which represents the URI of the stylesheet. The value can also represent an XPointer (discussed later), and the stylesheet rules can constitute part of the XML document itself. An example xml-stylesheet instruction follows:

<?xml-stylesheet href="myStylesheet.xsl"?>

The xml-stylesheet instruction may also contain the attribute type. The value text/xsl denotes an XSL stylesheet, and the value text/css denotes a CSS document. The attribute title establishes a title for the stylesheet, should the browser give the user the option to choose between two or more stylesheets. An alternate attribute value of yes indicates a browser should use the stylesheet if it cannot access the stylesheets with alternate attribute values of no. The alternate attribute defaults to no. The following shows an XSL stylesheet followed by a CSS stylesheet. It makes stylesheet1.xsl a default stylesheet, and stylesheet2.css an alternate.

<?xml-stylesheet href="stylesheet1.xsl" type="text/xsl" title="Default Stylesheet"?>
<?xml-stylesheet href="stylesheet2.css" type="text/css" title="Alternate Stylesheet" alternate="yes"?>
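
A browser that honors these instructions must first find them and read their attributes. The following sketch shows one way this might be done with the standard org.w3c.dom Java binding, where a processing instruction exposes getTarget() and getData(). The class StylesheetPiSketch is hypothetical, and its regular expression is a simplification that handles only double-quoted values.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class StylesheetPiSketch {
    // Simplified pattern for name="value" pairs inside the instruction data.
    private static final Pattern PSEUDO_ATTR =
            Pattern.compile("([\\w-]+)\\s*=\\s*\"([^\"]*)\"");

    // Hypothetical helper: parse an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Split the data of a processing instruction into its attributes.
    static Map<String, String> pseudoAttributes(String data) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = PSEUDO_ATTR.matcher(data);
        while (m.find()) {
            attrs.put(m.group(1), m.group(2));
        }
        return attrs;
    }

    // Return the href of the first stylesheet not marked alternate="yes", or null.
    static String defaultStylesheet(Document doc) {
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.PROCESSING_INSTRUCTION_NODE) {
                continue;
            }
            ProcessingInstruction pi = (ProcessingInstruction) n;
            if (pi.getTarget().equals("xml-stylesheet")) {
                Map<String, String> attrs = pseudoAttributes(pi.getData());
                if (!"yes".equals(attrs.get("alternate"))) {
                    return attrs.get("href");
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<?xml-stylesheet href=\"stylesheet1.xsl\" type=\"text/xsl\"?>"
              + "<?xml-stylesheet href=\"stylesheet2.css\" type=\"text/css\" alternate=\"yes\"?>"
              + "<doc/>");
        System.out.println(defaultStylesheet(doc)); // stylesheet1.xsl
    }
}
```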

3.1 Cascading Style Sheets Level 1

A CSS document contains formatting rules for specified tags that a browser can use to render an HTML or XML document, while maintaining the HTML or XML document's existing structure. The W3C CSS1 recommendation [9] specifies the syntax of CSS Level 1 (the simplest form of CSS). With CSS, individual HTML elements may contain formatting rules, or a CSS STYLE element may contain formatting rules for an entire HTML or XML document.

3.1.1 Basic rules

A simple CSS rule contains two parts: a "selector" and a "declaration." The selector identifies the markup tag to which the rule pertains. The declaration consists of properties of the markup tag and the corresponding values that the declaration assigns to those properties. For example, in the rule H1 {color: red}, the tag H1 represents the selector while {color: red} constitutes the declaration. This rule manipulates the color property of the H1 tag, specifying that a browser should display text enclosed by the H1 tag in red.

If a CSS rule pertains to just one individual HTML element, one encodes the rule within the HTML element itself, using the STYLE attribute of the HTML element:

<!-- example1.html -->
<H1>A browser colors the text of this heading with the browser's own default color.</H1>
<P>Some more text not influenced by a CSS rule.</P>
<H1 STYLE="color:red">A browser colors the text of this heading, and this heading only, in red</H1>
<P>Even more text not influenced by a CSS rule, and so colored with the browser's default color.</P>

The second H1 tag above contains the CSS rule that directs a browser to color its text red. If one intends for a rule to pertain to all the elements in a file, one can encode the rule within the STYLE tag, within the HEAD tag of an HTML document. In the example below, a browser will color red all text enclosed by any H1 tag (note that /* and */ delimit CSS comments):

<!-- example2.html -->
<HEAD>
	<TITLE>Example 2</TITLE>
	<STYLE TYPE="text/css">
		H1 {color: red} /* color all H1 tagged text red */
	</STYLE>
</HEAD>
<BODY>
	<H1>A browser colors the text of this heading red.</H1>
	<P>A browser colors this text with the browser's default color.</P>
	<H1>A browser also colors the text of this heading red.</H1>
</BODY>

A CSS STYLE tag always specifies the TYPE attribute with a value of text/css.

If one intends for a rule to pertain to all the elements in a file, one can also encode the rule in an external file. One can either use the @import notation within the STYLE tag, which instructs a browser to automatically render the HTML file with the imported rules, or the LINK tag, which instructs the browser to give the user the option of viewing the HTML file with the referenced stylesheet's rules. One can use both notations multiple times in one document. In the following example, the stylesheet1.css file contains:

/* stylesheet1.css */
H1 {color: red}

The stylesheet2.css file contains:

/* stylesheet2.css */
EM {color: blue}

This example3.html file imports both CSS files' rules from above, using the @import notation:

<!-- example3.html (uses @import notation) -->
<HEAD>
	<TITLE>Example Using @import Notation</TITLE>
	<STYLE TYPE="text/css">
		@import url(stylesheet1.css);
		@import url(stylesheet2.css);
	</STYLE>
</HEAD>

One passes the URI of the target stylesheet as the argument of the url() notation in the @import statement. This example4.html file refers to just the first CSS file, using the LINK tag:

<!-- example4.html (uses LINK tag) -->
<HEAD>
	<TITLE>Example Using LINK Tag</TITLE>
	<LINK REL="STYLESHEET" TYPE="text/css" 
	HREF="stylesheet1.css" TITLE="Option 1">
</HEAD>

With a CSS stylesheet, the LINK REL attribute always has the value STYLESHEET, and the TYPE attribute always has the value text/css. The HREF attribute holds the URI of the target stylesheet for its value, and the TITLE attribute holds the title specified by the document creator. One may give more than one tag the same rule by enumerating the tags in the selector section of a rule, separated by commas:

H1, H2, H3 {color: red}

Following this rule, a browser will display all text tagged with either H1, H2, or H3 tags in red. One may assign more than one property in a single declaration section by enumerating the individual declarations and separating them by semicolons:

H1 {font-weight: bold; font-size: 12pt; font-family: helvetica; font-style: normal;}

Following this rule, a browser will display all text tagged with H1 in twelve-point Helvetica Bold.

3.1.2 Advanced selectors

One can compose rules that use CLASS attributes, ID attributes, and contextual selectors for greater variety in formatting. Many individual HTML elements delineated by different tags may have the same CLASS attribute, but no two individual elements in a single document may share an ID attribute. The selector part of a rule may contain a CLASS or ID attribute, indicating to a browser that the rule applies only to elements with that attribute. The following example uses a CLASS attribute with the value of draft:

<!-- example5.html (uses CLASS attribute) -->
<HEAD>
	<TITLE>Example Using CLASS Attribute</TITLE>
	<STYLE TYPE="text/css">
		/* color red all elements */
		/* whose CLASS attribute is "draft" */
		.draft {color: red}
	</STYLE>
</HEAD>
<BODY>
	<H1 CLASS="draft">A browser colors the text of this heading red.</H1>
	<P CLASS="draft">A browser colors this text red.</P>
	<H1>A browser does not color the text of this heading red.</H1>
	<P CLASS="draft">But it does color this text red.</P>
</BODY>

Both H1 and P tags may have the CLASS attribute value draft. The period in front of draft in the STYLE tag indicates it uses the CLASS attribute. In the following example, using an ID selector, one and only one element in the document can have the ID attribute value redtext:

<!-- example6.html (uses ID attribute) -->
<HEAD>
	<TITLE>Example Using ID Attribute</TITLE>
	<STYLE TYPE="text/css">
		/* color red the element */
		/* whose ID attribute is "redtext" */
		#redtext {color: red}
	</STYLE>
</HEAD>
<BODY>
	<H1 ID="bluetext">The CSS rules say nothing about this ID attribute.</H1>
	<P ID="redtext">A browser colors this text red.</P>
	<H1>A browser does not color this text.</H1>
</BODY>

The pound sign (#) in front of redtext in the STYLE tag indicates that it uses the ID attribute. Note that a browser should produce an error processing the following HTML, because the same ID attribute value appears in two different elements:

<H1 ID="redtext">This element has the ID attribute value "redtext."</H1>
<P ID="redtext">And so does this one.</P>

One can concatenate tag names, CLASS attributes, and ID attributes in the selector part of a rule to establish the ancestry of the elements to which a rule applies. For example, to produce a rule that applies only to EM (emphasis) elements within H1 tags, one concatenates H1 and EM with white space in the selector:

H1 EM {color: blue}

The following example indicates that, according to the two rules established in the STYLE tag, a browser should color red all text tagged with H1, except for H1 text also tagged EM, which it should color blue. A browser will display other EM tagged text with its default settings.

<!-- example7.html (contextual selectors) -->
<HEAD>
	<TITLE>Example Using Contextual Selectors</TITLE>
	<STYLE TYPE="text/css">
		/* color red text tagged H1 */
		H1 {color: red}
		/* color blue H1 text tagged EM */
		H1 EM {color: blue}
	</STYLE>
</HEAD>
<BODY>
	<H1>A browser colors most of this text red. <EM>But it 
	colors this text blue.</EM> And this text red again.</H1>

	<P>A browser colors this text with its default color. <EM>It 
	also colors this text with its default color.</EM></P>
</BODY>

A browser will follow the rule of the contextual selector most specific for each instance of an element in a markup document. For example, with the following rules, one can specify that a browser should color blue EM tagged text within H1 tags, and color green EM tagged text nested within those EM tags:

H1 EM {color: blue}
H1 EM EM {color: green}

This holds true even if another tag (a CITE tag in the following example) encloses the second-tier EM tag:

<H1>Some default-colored text. <EM>A browser colors this text 
blue. <CITE>It colors this text blue, too. <EM>But it colors this text 
green.</EM> Back to blue.</CITE> Still blue text.</EM> Now some 
plain old default H1 text again.</H1>

One can mix CLASS and ID attributes into the selector in the same way:

<!-- example8.html (contextual selectors) -->
<HEAD>
	<TITLE>Example Using Contextual Selectors</TITLE>
	<STYLE TYPE="text/css">
		/* (1) color red text tagged H1 */
		H1 {color: red}
		/* (2) color green text tagged H1 */
		/* with the greenhead ID attribute */
		H1#greenhead {color: green}
		/* (3) color blue text tagged H1 */
		/* and EM with the blue CLASS attribute */
		H1 EM.blue {color: blue}
		/* (4) color yellow any text tagged CITE */
		/* whose ancestor has the blue CLASS attribute */
		.blue CITE {color: yellow}
	</STYLE>
</HEAD>
<BODY>
	<H1>A browser colors this text red (rule 1). <EM>And it 
	colors this text red too.</EM> And this text red again. 
	<EM CLASS="blue">But it colors this text blue (rule 
	3).</EM></H1>

	<P CLASS="blue">A browser colors this text with its 
	default color. <CITE>But it colors this text yellow 
	(rule 4).</CITE></P>

	<H1 ID="greenhead">A browser colors this text green 
	(rule 2). <CITE>And this text green too.</CITE></H1>
</BODY>

One can indicate that certain rules have precedence in cases of conflict with the !important notation. A browser will select a declaration with the !important notation over one without the notation in every case. In the following example, a browser will color red text within an H1 tag if no other rules involving H1 tag color have an !important keyword. Notice that in rules with more than one declaration, one must mark each individual declaration with the !important notation for it to receive precedence.

H1 {color: red ! important}
H1 {font-size: 12pt}
/* note that the above two rules are equivalent to:
H1 {color: red ! important; font-size: 12pt} */

Rules in style sheets imported via the @import keyword have precedence over rules from style sheets selected via the LINK tag. Rules from style sheets specified within the markup document have precedence over rules applied from user- or browser-specified style sheets. Within the same style sheet, rules that contain more ID selectors have precedence over those that have fewer; in the case of a tie, rules that contain more CLASS selectors have precedence over those that have fewer; in case of another tie, rules that contain more tag name selectors have precedence over those that have fewer. If all other factors are equal, a browser will use the rule specified last.
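
The tie-breaking order within a single style sheet amounts to comparing three counts and then the position of the rule. The following Python sketch illustrates this under simplifying assumptions: selectors contain only tag names, .class parts, and #id parts, and the functions specificity and winning_rule are hypothetical names. It does not model the !important notation or the differences between author, user, and browser style sheets.

```python
import re

def specificity(selector):
    """Return (id_count, class_count, tag_count) for a simplified selector."""
    ids = classes = tags = 0
    for simple in selector.split():
        ids += simple.count("#")
        classes += simple.count(".")
        # A name at the start, before any '.' or '#', is a tag name selector.
        if re.match(r"[A-Za-z][A-Za-z0-9]*", simple):
            tags += 1
    return (ids, classes, tags)

def winning_rule(rules):
    """Pick the winner among rules that all apply to the same element.

    `rules` is a list of (selector, declaration) pairs in style sheet order.
    The position comes last in the key, so a later rule wins a tie.
    """
    keyed = [(specificity(sel), pos, sel, decl)
             for pos, (sel, decl) in enumerate(rules)]
    return max(keyed)[2:]

print(specificity("H1 EM.blue"))  # (0, 1, 2)
print(winning_rule([("H1", "color: red"),
                    ("H1#greenhead", "color: green")]))
# ('H1#greenhead', 'color: green')
```

Representing the counts as a tuple lets ordinary tuple comparison do the work: ID counts are compared first, then CLASS counts, then tag name counts.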

3.1.3 Pseudo-elements

Selectors can also include "pseudo-elements": elements not encoded within the markup document, but implicitly present when the browser renders the document. These include the first-line and first-letter pseudo-elements in a block of text, and the (unvisited) link, visited, and active states of an A (anchor) tag (which the CSS1 specification calls "pseudo-classes"). This means one can specify the style for the first line of body text (small caps in the following example, as in a newspaper), or differentiate the styles of visited and unvisited links. One always adds pseudo-element selectors onto a tag, CLASS, or ID selector:

<!-- example9.html (pseudo-element selectors) -->
<HEAD>
	<TITLE>Example Using Pseudo-Element Selectors</TITLE>
	<STYLE TYPE="text/css">
		/* set first line of text tagged P in small caps */
		P.first:first-line {font-variant: small-caps}

		/* color blue unvisited links */
		A:link {color: blue}

		/* color green visited links */
		A:visited {color: green}
	</STYLE>
</HEAD>
<BODY>
	<P CLASS="first">WHEN A BROWSER LAYS OUT THIS 
	paragraph, it will render the first line in small caps. 
	If the user has visited <A HREF="been_here.html">this 
	link</A>, the browser will render it in green; if the 
	user has not visited <A HREF="not_been_here.html">this 
	link</A>, the browser will render it in blue.</P>
</BODY>

3.2 XSL Stylesheets

An XSL stylesheet contains rules which a browser can use to render specific XML elements. The XSL specification [7] describes these rules, which one encodes in XML-compatible markup. These rules transform other XML documents, potentially creating a document completely different from the original. This differs from CSS in that CSS rules merely describe how a browser should render specific XML tags; XSL rules transform specific XML tags into specific different ones, and can change the structure of the XML document. In fact, an XSL processor will actually create a new formatted document out of a plain XML file. This discussion will call the original XML document the "source" document, and the new document generated by the XSL processor the "result" document.

XSL elements belong to the namespace http://www.w3.org/TR/WD-xsl, and XSL formatting objects belong to the namespace http://www.w3.org/TR/WD-xsl/FO. The examples will always use the prefix xsl for the XSL namespace, fo for the XSL formatting objects namespace, and html for the HTML namespace. In general, one need not always use these prefixes.

3.2.1 Stylesheet structure

The stylesheet element contains all the XSL elements in an XSL document. One might declare the XSL namespaces and the formatting namespaces in this element. The result-ns attribute of the stylesheet element designates the formatting namespace. When using XSL formatting objects, an XSL document begins like this:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"
 xmlns:fo="http://www.w3.org/TR/WD-xsl/FO" result-ns="fo">

The next example shows a stylesheet tag for HTML:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"
 xmlns:html="http://www.w3.org/TR/REC-html40" result-ns="html">

One does not need to include the result-ns attribute in a stylesheet tag, and in fact, a stylesheet may include formatting tags from a combination of namespaces.

Within the body of the XSL document (i.e. within the body of a stylesheet tag) one can refer to external XSL rules. At the beginning of the stylesheet, directly after the stylesheet tag, one may import other stylesheets. Rules from imported stylesheets have less precedence than rules contained within the XSL document. Rules from stylesheets imported first have less precedence than rules imported later.

In the following example, stylesheet1.xsl imports stylesheet2.xsl and stylesheet3.xsl. Since stylesheet1.xsl imported stylesheet3.xsl last, rules in stylesheet3.xsl have precedence over rules in stylesheet2.xsl. Rules in stylesheet1.xsl have precedence over those in the other two.

<!-- stylesheet1.xsl -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"
 xmlns:fo="http://www.w3.org/TR/WD-xsl/FO" result-ns="fo">
	<xsl:import href="stylesheet2.xsl"/>
	<xsl:import href="stylesheet3.xsl"/>
	<!-- local XSL rules go here -->
</xsl:stylesheet>

Note that the import tag, as an empty element tag, ends with a slash. The href attribute takes the URI of the stylesheet to import.

At any point in an XSL document, one can include external stylesheets. An XSL processor treats included rules as if they actually occur at the location of the tag that includes them. For example, if stylesheet2.xsl contains &ruleB;, and stylesheet1.xsl includes stylesheet2.xsl, an XSL processor will insert &ruleB; into the same location at which stylesheet1.xsl included stylesheet2.xsl. In this example, one should assume that an XSL processor will substitute valid XSL rules for the entity references &ruleA;, &ruleB;, and &ruleC;.

<!-- stylesheet1.xsl -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"
 xmlns:fo="http://www.w3.org/TR/WD-xsl/FO" result-ns="fo">
	&ruleA; <!-- local XSL rule -->
	<xsl:include href="stylesheet2.xsl"/>
	&ruleC; <!-- another local XSL rule -->
</xsl:stylesheet>

An XSL processor would include &ruleB; like this:

<!-- stylesheet1.xsl -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"
 xmlns:fo="http://www.w3.org/TR/WD-xsl/FO" result-ns="fo">
	&ruleA; <!-- local XSL rule -->
	&ruleB; <!-- XSL rule included from stylesheet2.xsl -->
	&ruleC; <!-- another local XSL rule -->
</xsl:stylesheet>

3.2.2 Template rules

Within the stylesheet tag, template element tags contain the actual style rules. An XSL processor matches tags within a source XML document to the match attribute of template tags. Call the match attribute value the "match pattern." Upon finding a match, the XSL processor adds formatting tags to the result XML document. If a processor found the tag title in an XML document, it might match it to the following rule:

<xsl:template match="title">
	<html:H1 ALIGN="CENTER">
		<xsl:apply-templates/>
	</html:H1>
</xsl:template>

This rule instructs a processor to add an html:H1 tag to the result document and then to continue processing the title tag's children, applying the appropriate templates to them. The output of those templates appears between the html:H1 start and end tags.

An XSL processor treats an XML document as a tree. Like the DOM, it represents elements that enclose other elements as parent nodes to the enclosed child elements, and elements that enclose text as parent nodes to the enclosed child text. For the following example, which uses the template rule above, assume that the source XML file has this line:

<title>A Tale of Two Cities</title>

The title tag's only child is the string A Tale of Two Cities, so an XSL processor would add this string to the result document enclosed in html:H1 tags:

<html:H1 ALIGN="CENTER">A Tale of Two Cities</html:H1>

Note that the original title element does not appear in the result document, because the template did not instruct the XSL processor to add it. This example does:

<xsl:template match="title">
	<title><html:H1 ALIGN="CENTER">
		<xsl:apply-templates/>
	</html:H1></title>
</xsl:template>

In a more extended example, let the XSL file contain these rules:

<!-- stylesheet2.xsl -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"
 xmlns:html="http://www.w3.org/TR/REC-html40" result-ns="html">

<!-- rule 1: format title tag data within an H1 -->
<xsl:template match="title">
	<html:H1 ALIGN="CENTER">
		<xsl:apply-templates/>
	</html:H1>
</xsl:template>

<!-- rule 2: format body tag data within a P-->
<xsl:template match="body">
	<html:P>
		<xsl:apply-templates/>
	</html:P>
</xsl:template>

<!-- rule 3: format italic tag data with an I -->
<xsl:template match="italics">
	<html:I>
		<xsl:apply-templates/>
	</html:I>
</xsl:template>

</xsl:stylesheet>

Let the source XML file contain this:

<!-- source2.xml -->
<story>
	<title>A Thrilling Mystery</title>
	<body>It was a dark and stormy night. <italics>Suddenly,
	</italics> a shot rang out.</body>
</story>

An XSL processor would transform the source2.xml document above using the rules in the stylesheet2.xsl document to generate the following result2.html document:

<!-- result2.html -->
<html:H1 ALIGN="CENTER">A Thrilling Mystery</html:H1>
<html:P>It was a dark and stormy night. <html:I>Suddenly,
</html:I> a shot rang out.</html:P>

The XSL processor first fails to match the story tag, and so, by default, it continues processing the story tag's children as if it had found the following template in the stylesheet2.xsl file:

<xsl:template match="story">
	<xsl:apply-templates/>
</xsl:template>

Next, the processor matches the title tag with rule 1 of the XSL document. It adds the H1 tag to the result document, and then the child of the title tag: the text A Thrilling Mystery. Finding no other children, the XSL processor goes on to the body tag, which matches rule 2. It adds the P tag to the result document, and then the body tag's children: the text It was a dark and stormy night., the italics tag, and the text a shot rang out.. Finally, the processor matches the italics tag to rule 3, and adds the I tags, and its child, the text Suddenly,, to the result document.
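
The walk described above, including the default rule for unmatched elements, can be imitated with a short recursive function. This Python sketch is only an analogy, not an XSL processor: TEMPLATES, apply_templates, and process are hypothetical names, the html namespace prefix is omitted, the source text is written on one line, and the sketch does not escape special characters in the output.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: source tag -> (result tag, result attributes).
TEMPLATES = {
    "title": ("H1", ' ALIGN="CENTER"'),
    "body": ("P", ""),
    "italics": ("I", ""),
}

def apply_templates(node):
    """Process the children of `node` (text and elements) in document order."""
    out = node.text or ""
    for child in node:
        out += process(child)
        out += child.tail or ""
    return out

def process(node):
    """Apply the matching rule to `node`, or the default rule if none matches."""
    if node.tag in TEMPLATES:
        tag, attrs = TEMPLATES[node.tag]
        return "<%s%s>%s</%s>" % (tag, attrs, apply_templates(node), tag)
    # Default rule: add nothing for the element itself, keep processing children.
    return apply_templates(node)

source = ET.fromstring(
    "<story><title>A Thrilling Mystery</title>"
    "<body>It was a dark and stormy night. <italics>Suddenly,</italics>"
    " a shot rang out.</body></story>"
)
print(process(source))
```

The story element has no entry in the table, so process falls through to the default branch, just as the processor falls back on the implicit template for story.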

The match attribute can also include more complicated patterns of elements. One can designate a template to match more than one element by listing the elements in the match pattern, concatenated by bar (|) operators. For example, this template would match both element1 and element2 tags:

<xsl:template match="element1 | element2">

3.2.3 Ancestry patterns

One can also specify a template rule to match only elements with a certain ancestry. The / operator within the match pattern indicates that the left element must parent the right element. In the next example, a myGreatgrandparent tag parents a myGrandparent tag, a myGrandparent tag parents a myParent tag, and a myParent tag parents a myElement tag:

<!-- sequence1a.xml -->
<myGreatgrandparent>
	<myGrandparent>
		<myParent>
			<myElement/>
		</myParent>
	</myGrandparent>
</myGreatgrandparent>

Then an XSL processor may match the following template with the myElement tag:

<!-- template A -->
<xsl:template match="myGrandparent/myParent/myElement">

The processor will only do so if the last template processed, the one that made the apply-templates request which initiated the template match trial on this template, was a template for a myGreatgrandparent element, such as this:

<!-- template B -->
<xsl:template match="myGreatgrandparent">
	<xsl:apply-templates/>
</xsl:template>

If not denoted otherwise, patterns implicitly begin with the current node; in the above example, myGreatgrandparent represents the current node when the processor works on the template B apply-templates request. In this case, template A matches. When an XSL processor works on the apply-templates request in template C,

<!-- template C -->
<xsl:template match="myParent">
	<xsl:apply-templates/>
</xsl:template>

it will not match template A because the match pattern in template A begins with a parent of myParent, and not a child.

A dot (.) explicitly indicates the current node; the following two templates produce the same result:

<!-- templates D -->
<xsl:template match="myParent">
<xsl:template match="./myParent">

Two dots (..) indicate a parent. For sequence1a.xml, template E (below) produces the same result as the templates D above. For other XML sequences, template E differs from the first two template rules in that to match it, an XSL processor must have myGrandparent as the current node.

<!-- template E -->
<xsl:template match="../myGrandparent/myParent">

In the below sequence1b.xml, when an XSL processor has as the current element myStepgrandparent, it can match templates D, but not template E.

<!-- sequence1b.xml -->
<myStepgrandparent>
	<myParent>
		<myElement/>
	</myParent>
	<myUncle/>
</myStepgrandparent>

The ancestor() function works similarly to the parent notation; templates F all produce the same result (matching myUncle) for a current node of myElement with sequence1b.xml:

<!-- templates F -->
<xsl:template match="../../myUncle">
<xsl:template match="ancestor(myStepgrandparent)/myUncle">
<xsl:template match=".ancestor(myStepgrandparent)/myUncle">

The ancestor-or-self() function duplicates the results of the ancestor() function, but it will also match

<xsl:template match="ancestor-or-self(myStepgrandparent)/myUncle">

when the processor has myStepgrandparent as the current node. To match the root element of an XML document, one can simply use the match pattern /, as in:

<xsl:template match="/">

The // operator indicates that any number of descendant levels may intervene between the element on the left and the element on the right. The following template, with a current node of myGreatgrandparent, would also match myElement in sequence1a.xml given above:

<xsl:template match="myGrandparent//myElement">

The following matches all myElement tags in a document, no matter what the processor has as the current element:

<xsl:template match="//myElement">

The * character, acting as a wild card, can take the place of an element name in a match pattern. For example, the following two templates would match myElement in sequence1a.xml given above (again with a current node of myGreatgrandparent):

<xsl:template match="myGrandparent/*/myElement">
<xsl:template match="myGrandparent/myParent/*">

The * character only takes the place of one element level, so that myGrandparent/*/myElement matches a different sequence than does myGrandparent/*/*/myElement.

One can use an id() pattern to match an element with a specific ID, regardless of what the XSL processor has as the current element. Recall that no two individual elements in an XML document can have the same ID attribute value. To aid XSL processors that do not validate XML documents against their DTDs, the XSL id element can specify the name of the ID attribute for all elements, or just specific ones. In the following example, the first id element indicates that D elements use the identifier attribute to denote their unique ID; the second id element indicates that all other elements use the id attribute to indicate their ID:

<xsl:id attribute="identifier" element="D"/>
<xsl:id attribute="id"/>

For the next set of examples, refer to this XML sequence:

<!--sequence2.xml -->
<A>
	<B id="firstB"/>
	<C id="firstC"/>
	<B id="secondB"/>
	<D identifier="firstD">thirdB thirdC</D>
</A>
<A>
	<C id="secondC"/>
	<B id="thirdB"/>
	<C id="thirdC"/>
</A>

Using an id() pattern, the following template matches the second B tag in the sequence2.xml document above:

<xsl:template match="id('secondB')">

The next template matches both the first and second B tags:

<xsl:template match="id('firstB secondB')">

The next template contains a match pattern within the id() method that locates an element, the text of which denotes the actual ID attributes that an XSL processor should try to match. This pattern matches the only D element in the above sequence, the text of which matches the ID attributes of the third B and C elements:

<xsl:template match="id(A/D)">

This works because an XSL processor will expand the above template match pattern to:

<xsl:template match="id('thirdB thirdC')">

3.2.4 Qualified patterns

One can add a qualifier expression to any element in a match pattern. An XSL processor only matches a qualified element if the expression results in a true value. One adds qualifiers in brackets ([]) directly after the element name that the qualifiers modify. For example, the first-of-type() qualifier indicates that a processor will only match this template with a B element if the B comes first among the B children of its parent.

<xsl:template match="B[first-of-type()]">

So the first B in each A (with id values firstB and thirdB) in the above sequence2.xml document will match this template, if an XSL processor has A as the current element, because they come first among siblings of their type. If one uses the first-of-any() qualifier, such as in

<xsl:template match="B[first-of-any()]">

only the B with the id value of firstB would match in the above sequence2.xml. The B with the id value thirdB would not match because a sibling, the C with id value secondC, comes before it. One can use the last-of-type() and the last-of-any() qualifiers similarly.
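
The difference between the two qualifiers can be checked against sequence2.xml with a small Python sketch. The functions first_of_type and first_of_any are hypothetical stand-ins for the XSL qualifiers, and the root wrapper element is an assumption added only so that the two A elements form a single well-formed document.

```python
import xml.etree.ElementTree as ET

def first_of_any(parent, child):
    """True if `child` is the first element child of `parent`."""
    return len(parent) > 0 and parent[0] is child

def first_of_type(parent, child):
    """True if `child` is the first child of `parent` with its tag name."""
    return parent.find(child.tag) is child

# sequence2.xml, wrapped in a hypothetical root element.
doc = ET.fromstring(
    "<root>"
    '<A><B id="firstB"/><C id="firstC"/><B id="secondB"/>'
    '<D identifier="firstD">thirdB thirdC</D></A>'
    '<A><C id="secondC"/><B id="thirdB"/><C id="thirdC"/></A>'
    "</root>"
)

by_type = [b.get("id") for a in doc for b in a.findall("B")
           if first_of_type(a, b)]
by_any = [b.get("id") for a in doc for b in a.findall("B")
          if first_of_any(a, b)]
print(by_type)  # ['firstB', 'thirdB']
print(by_any)   # ['firstB']
```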

One can add the not() function to a qualifier expression to negate the boolean value produced within its parentheses. The template below will select the C elements with id values of firstC and thirdC, each of which has a sibling that comes before it:

<xsl:template match="C[not(first-of-any())]">

Also using boolean logic, the and and or operators combine values from segments of qualifier expressions. For example, the template below matches a C element only if it has no siblings of its type (the firstC in sequence2.xml):

<xsl:template match="C[first-of-type() and last-of-type()]">

One can also use patterns within qualifier expressions. A qualifier pattern implicitly begins with the element that the XSL processor has selected for matching, not the previous element that made the apply-templates request, as with the template match pattern. An XSL processor will match the following template only with an element C that has at least one element B as a sibling:

<xsl:template match="C[../B]">

To use a qualifier expression to select only specific elements that have character data equal to a given string, one inserts an equals sign after the pattern that selects the element, followed by the string in quotation marks. An XSL processor will match the following template only when an element C has a sibling B that contains the character data supercalafragleisticexpealidoceus:

<xsl:template match="C[../B='supercalafragleisticexpealidoceus']">

This next template tests just element C for the same string:

<xsl:template match="C[='supercalafragleisticexpealidoceus']">

A similar test also works for element attributes. In a pattern, one can indicate an attribute by prefixing the attribute name with an at sign (@). Thus the first template below will select only hotdog elements with a meat attribute, and the second will only select those with a meat attribute value of turkey:

<xsl:template match="hotdog[@meat]">
<xsl:template match="hotdog[@meat='turkey']">

One can use attributes in template match patterns as well. This causes the XSL processor to actually select an attribute, not its parent element. For the following XML sequence, two_meals.xml,

<!-- two_meals.xml -->
<plan type="A">
	<?hotdogmarkup version="12a"?>
	<meal>
		<hotdog meat="beef" bun="white"/>
	</meal>
	<meal>
		<hotdog meat="turkey"/>
	</meal>
</plan>

this XSL template

<xsl:template match="meal/hotdog@bun">
	<html:P>
		<xsl:apply-templates/>
	</html:P>
</xsl:template>

will produce this output, having selected the bun attribute of the first hotdog:

<html:P>white</html:P>

One can also select actual comments and processing instructions with the comment() and pi() patterns, respectively. This comment() example,

<xsl:template match="comment()">
	<html:I><xsl:apply-templates/></html:I>
</xsl:template>

when applied over two_meals.xml, produces the text contained in the comment:

<html:I> two_meals.xml </html:I>

This pi() example,

<xsl:template match="plan/pi('hotdogmarkup')">
	<html:H1><xsl:apply-templates/></html:H1>
</xsl:template>

when applied over two_meals.xml, produces the text in the processing instruction following the hotdogmarkup target identifier:

<html:H1>version="12a"</html:H1>

If more than one template matches a tag, an XSL processor will choose first the one that it found locally within the stylesheet, or that it included in the stylesheet, over one imported from another file. Then the processor will choose the rule that has an id() match pattern over one that has none. Next the processor will choose the rule that has more qualifiers over one that has fewer. After that, the processor will select the template that has a greater priority attribute value. The priority attribute values can range from -2147483647 to 2147483647. Finally, the processor will choose the rule that it found defined later in the stylesheet.
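
The selection order in the preceding paragraph can be read as a lexicographic comparison. The following Python sketch illustrates that reading only; the Template record, its field names, and the sample values are hypothetical, and the sketch assumes that every candidate has already been found to match the same tag.

```python
from dataclasses import dataclass

@dataclass
class Template:
    name: str             # label used only in this illustration
    imported: bool        # True if the rule came from an imported stylesheet
    has_id_pattern: bool  # True if the match pattern uses id()
    qualifiers: int       # number of qualifiers in the match pattern
    priority: int         # value of the priority attribute
    position: int         # order of definition in the stylesheet

def precedence_key(t):
    """Larger keys win; fields appear in the order the text lists them."""
    return (not t.imported, t.has_id_pattern, t.qualifiers,
            t.priority, t.position)

def choose_template(candidates):
    """Return the winning template among those that match the same tag."""
    return max(candidates, key=precedence_key)

candidates = [
    Template("imported-high-priority", True, False, 0, 100, 0),
    Template("local-plain", False, False, 0, 0, 1),
    Template("local-qualified", False, False, 1, 0, 2),
]
print(choose_template(candidates).name)  # local-qualified
```

Note that the imported rule loses despite its high priority value, because the local-versus-imported test comes before the priority test.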

3.2.5 Applying results

The most basic kind of template result adds some formatting tags, and then directs processing of child elements and data with the apply-templates tag. XSL rules can also do much more than this. The variety of different actions a template can direct makes XSL more powerful than CSS. With the select attribute, an apply-templates command can direct an XSL processor to process elements other than just the current element's children. The select attribute takes a pattern just like the match attribute of the template element. This pattern begins implicitly with the current element. In the following example, the XSL processor will next process the mySibling sibling of the current element myElement:

<xsl:template match="myElement">
	<html:H1>
		<xsl:apply-templates select="../mySibling"/>
	</html:H1>
</xsl:template>

One can also create different template rules for the same match pattern, and use the mode attribute with both the template element and the apply-templates element to apply the different rules at different points along the process. In the following example, the first time the myElement rule selects the mySibling rule, it specifies the myheading mode, and so selects the first mySibling rule below. The second time the myElement rule selects a mySibling rule, it doesn't provide a mode attribute value, and so uses the second, default, mySibling template:

<xsl:template match="myElement">
	<html:H1>
		<xsl:apply-templates select="../mySibling" mode="myheading"/>
	</html:H1>
	
	<html:H3>
		<xsl:apply-templates select="myChild" mode="mysubheading"/>
	</html:H3>
	
	<html:P>
		<xsl:apply-templates select="../mySibling" />
	</html:P>
</xsl:template>

<xsl:template match="mySibling" mode="myheading">
	<!--extracts value of 'name' attribute of mySibling-->
	<xsl:value-of select="@name"/>
</xsl:template>

<!-- the "default" mode mySibling template -->
<xsl:template match="mySibling" >
	<html:B>
		<xsl:apply-templates/>
	</html:B>
</xsl:template>

Within a template, one can also process other elements directly. Using the for-each XSL element, one can simulate a separate template rule. One can rewrite the myElement rule directly above without separate mySibling rules, like this:

<xsl:template match="myElement">
	<html:H1>
		<xsl:for-each select="../mySibling">
			<!-- value of 'name' attribute from mySibling -->
			<xsl:value-of select="@name"/>
		</xsl:for-each>
	</html:H1>

	<html:H3>
		<xsl:for-each select="myChild">
			<!--do the myChild stuff-->
		</xsl:for-each>
	</html:H3>

	<html:P>
		<xsl:for-each select="../mySibling">
			<xsl:apply-templates/>
		</xsl:for-each>
	</html:P>
</xsl:template>

An XSL processor will treat the elements contained within each for-each element as if they belonged within a separate template rule, where patterns implicitly start at the element selected with the select pattern. Hence the pattern @name selects the name attribute of the mySibling element, and not of the myElement element. In the next example, not using the for-each tag, an XSL processor uses the list-item template to format each list-item element:

<xsl:template match="list">
	<html:UL>
		<xsl:apply-templates select="list-item"/>
	</html:UL>
</xsl:template>

<xsl:template match="list-item">
	<html:LI>
		<!-- output list item data -->
		<xsl:apply-templates/>
	</html:LI>
</xsl:template>

Assume that a list element parents list-item elements. In this next example, an XSL processor formats list-item elements according to the formatting specified within the list template, using a for-each element instead of a list-item template:

<xsl:template match="list">
	<html:H3>
		<xsl:for-each select="list-item">
			<html:P>
				<!-- output list item character data -->
				<xsl:apply-templates/>
			</html:P>
		</xsl:for-each>
	</html:H3>
</xsl:template>

Using the list rule with the for-each tag, a processor encloses each list-item element's character data within HTML paragraph tags, rather than within HTML list item tags.

3.2.6 Special result tags

One can use the if XSL tag to instruct a processor to complete actions only if the source document has a tag of a certain kind in a certain place. The if tag has an attribute test, which takes a match pattern as its value. For example, the following myElement template processes its children only if it has no siblings:

<xsl:template match="myElement">
	<xsl:if test=".[first-of-any() and last-of-any()]">
		<xsl:apply-templates/>
	</xsl:if>
</xsl:template>

Note that the test attribute value of the if element begins with a period, specifying that the qualifier modifies the current element. A processor would declare an error if it found a test value consisting simply of [first-of-any() and last-of-any()] without the period.

One can build conditional structures like C "switch" statements using the choose, when, and otherwise elements. A choose element contains the when and otherwise elements. A when element takes a test attribute, similar to an if element. Each choose element can have only one otherwise element, which contains the actions a processor should take if it selects none of the when elements. In the following example, an XSL processor would perform only the actions contained within the first when element if the myElement had no siblings, or if it was first among its siblings. It would perform the actions contained within the second when element only if the myElement was listed last among its siblings. If the parent of the myElement had more than two children, and listed myElement in the middle, the XSL processor would perform the actions contained within the otherwise element:

<xsl:template match="myElement">
	<xsl:choose>
		<xsl:when test=".[first-of-any()]">
			<html:H2><xsl:apply-templates/></html:H2>
		</xsl:when>

		<xsl:when test=".[last-of-any()]">
			<html:H5><xsl:apply-templates/></html:H5>
		</xsl:when>

		<!-- if neither of the first two was selected -->
		<xsl:otherwise>
			<html:P><xsl:apply-templates/></html:P>
		</xsl:otherwise>
	</xsl:choose>
</xsl:template>

In addition to adding formatting tags like <html:H1> to a document, an XSL stylesheet can specify literal text for an XSL processor to include in the result document. The XSL text tag encloses literal text:

<xsl:template match="myName">
	<html:P>
		<xsl:text>Name: </xsl:text>
		<xsl:apply-templates/>
	</html:P>
</xsl:template>

The above example would add the string Name: in front of whatever text a myName tag contained.

Also, instead of adding formatting tags directly, one can indicate that an XSL processor should add tags of a certain type using the element tag. The element tag takes an attribute name, which specifies the new element's name. The following template rules produce the same result:

<xsl:template match="myElement">
	<html:P>
		<xsl:apply-templates/>
	</html:P>
</xsl:template>

<xsl:template match="myElement">
	<xsl:element name="html:P">
		<xsl:apply-templates/>
	</xsl:element>
</xsl:template>

The copy element works similarly to the element element, but it creates a new element of the same kind as the current element. The following template

<xsl:template match="myElement">
	<xsl:copy>
		<xsl:text>Hello!</xsl:text>
	</xsl:copy>
</xsl:template>

produces

<myElement>Hello!</myElement>

in the result document. The copy element does not copy the attributes of an element, nor its character data or child elements. One can use the attribute element within an element or copy element to add attributes to a created element. The name attribute value of the attribute element takes the attribute's name; the attribute value comes from the character data within its start and end tags:

<xsl:template match="myElement">
	<xsl:element name="html:P">
		<xsl:attribute name="ALIGN">CENTER</xsl:attribute>
		<xsl:apply-templates/>
	</xsl:element>
</xsl:template>

Similarly, one can use the comment element to create comments. One places the comment text between the start and end tags:

<xsl:template match="myElement">
	<xsl:comment> This is my comment </xsl:comment>
</xsl:template>

The above template produces

<!-- This is my comment -->

in the result document. One can use the pi tag to create processing instructions. The target application of the processing instruction constitutes the value of the name attribute; one places the other information within the start and end tags:

<xsl:template match="myElement">
	<xsl:pi name="xml">version="1.0"</xsl:pi>
</xsl:template>

The above template produces

<?xml version="1.0"?>

in the result document.

3.2.7 Extracting character data

To extract the character data contained within an element or an element attribute, one can use the value-of tag. To extract the text of an element, one sets the select attribute value of the value-of tag to a pattern which represents the element:

<xsl:template match="myElement">
	<html:I>
		<xsl:value-of select="../mySibling"/>
	</html:I><html:BR/>
</xsl:template>

When, with the above template, an XSL processor processes myElement in the following XML sequence

<myParent>
	<myElement name="Sibling Rivalry"/>
	<mySibling>I am not my brother's keeper.</mySibling>
</myParent>

it produces

<html:I>I am not my brother's keeper.</html:I><html:BR/>

One can extract the character data of an attribute in the same way. The following template, when applied to the above XML sequence,

<xsl:template match="myElement">
	<html:H3><xsl:value-of select="@name"/></html:H3>
</xsl:template>

produces

<html:H3>Sibling Rivalry</html:H3>

As with any pattern, one can use various qualifiers to select an element or an attribute more specifically.

The value-of tag can also extract the value of a string constant. One can define a constant with the tag define-constant. The name attribute specifies the name of the constant; the value attribute specifies its value. The value-of element can extract a constant value when given the select attribute value of constant(myConstant), where myConstant refers to the name of a string constant. The following example defines the constant date-revised with the value Feb 10, 1998:

<xsl:define-constant name="date-revised" value="Feb 10, 1998"/>

The next example shows the value-of tag extracting the constant date-revised:

<xsl:value-of select="constant(date-revised)"/>

An XSL processor would replace the above value-of tag with the text Feb 10, 1998.

Occasionally, one wishes to place the value of an element's text data, an attribute, or a string constant inside an attribute value. One cannot nest a value-of tag within an attribute value, as the following ill-formed example attempts to do:

<html:TD HEIGHT="<xsl:value-of select='constant(defaultHeight)'/>">

Instead, one must enclose within curly braces ({}) what one would normally place in the select attribute of the value-of element, omitting the value-of element syntax altogether. In the following example, assuming that the constant defaultHeight holds the value 100, an XSL processor would replace the {constant(defaultHeight)} notation with the string 100:

<html:TD HEIGHT="{constant(defaultHeight)}"/>

Within element markup, one still uses the value-of notation

<xsl:template match="myElement">
	<xsl:value-of select="constant(defaultHeight)"/>
</xsl:template>

to extract the same value. One uses the curly braces only within start and empty element tags themselves, never outside of them.
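The curly-brace substitution can be pictured as a simple textual replacement inside an attribute value. The following is a minimal sketch in Python, assuming only the two forms {constant(name)} and {@name} need support; the function expand_attribute_value and its parameters are hypothetical names chosen for illustration.

```python
import re

# Matches either {constant(someName)} or {@someAttribute}.
BRACE_PATTERN = re.compile(r"\{(?:constant\(([^)]+)\)|@([^}]+))\}")


def expand_attribute_value(template, constants, attributes):
    """Replace each brace expression in an attribute value with its value.

    constants  -- dict of string constants (as made by define-constant)
    attributes -- dict of the current source element's attributes
    """
    def substitute(match):
        constant_name, attribute_name = match.groups()
        if constant_name is not None:
            return constants[constant_name]
        return attributes[attribute_name]

    return BRACE_PATTERN.sub(substitute, template)
```

For instance, expand_attribute_value("{constant(defaultHeight)}", {"defaultHeight": "100"}, {}) yields the string 100.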

3.2.8 Counting

Stylesheets can also specify literal numbers to add to a result document, which a processor can generate by calculating a matched element's position in the source document. To do so, one uses the number tag, which takes several attributes. The count attribute takes a match pattern which specifies the types of elements the processor should count. Call this pattern the "count-match pattern." By default, a count-match pattern mimics the match pattern of the current template rule's match attribute.

The level attribute of the number element takes one of the values single (the default), multi, or any. The value single specifies that a processor should find the count-matched element closest to the template-matched element (the element itself or its nearest count-matched ancestor) and count that element together with the count-matched siblings that precede it. For the XML document

<!-- number_example.xml -->
<a>
	<b></b>
	<b>
		<b></b> <!-- not matched by multi -->
		<e/>
	</b>
	<e/>
	<b>
		<e/>
		<b></b>
		<b></b>

		<!-- third b in a single level, sixth in multi, 
		seventh of any (multi & any explained later) -->
		<b><c/></b>

		<b></b>
	</b>
	<b></b>
</a>

when a processor comes across this rule

<xsl:template match="c">
	<xsl:number	level="single"
			count="b"/>
</xsl:template>

the processor will add the number 3 to the result document, because the b element containing the c element on which the processor currently works has two b element siblings before it.

The level attribute value multi produces the total number of count-matched siblings of every count-matched ancestor (including itself) that comes before the template-matched element. For the XML document number_example.xml, when a processor comes across this rule:

<xsl:template match="c">
	<xsl:number	level="multi"
			count="b"/>
</xsl:template>

it will add the number 6 to the result document, because the b element containing the c element comes in third among its b siblings, and its ancestor b comes in third among its b siblings.

The level attribute value any counts all count-matched elements that physically come before the template-matched element in the document. With the number_example.xml, when a processor works on the rule:

<xsl:template match="c">
	<xsl:number	level="any"
			count="b"/>
</xsl:template>

it will add the number 7 to the result document, because seven b elements appear before the c element in the number_example.xml document.
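The three counts quoted above (3, 6, and 7) can be checked by walking the tree directly. The following is a minimal sketch in Python using xml.dom.minidom, assuming the count pattern is a plain element name; all function names are hypothetical. For the multi level the sketch returns the per-ancestor positions as a list, and summing them reproduces the figure given in the text.

```python
from xml.dom import minidom

NUMBER_EXAMPLE = """<a>
  <b></b>
  <b>
    <b></b>
    <e/>
  </b>
  <e/>
  <b>
    <e/>
    <b></b>
    <b></b>
    <b><c/></b>
    <b></b>
  </b>
  <b></b>
</a>"""


def is_named(node, name):
    return node.nodeType == node.ELEMENT_NODE and node.tagName == name


def sibling_position(node, name):
    """One plus the number of preceding siblings with the given name."""
    position = 1
    sibling = node.previousSibling
    while sibling is not None:
        if is_named(sibling, name):
            position += 1
        sibling = sibling.previousSibling
    return position


def matching_ancestors(node, name):
    """The node and its ancestors that carry the given name, nearest first."""
    found = []
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        if is_named(node, name):
            found.append(node)
        node = node.parentNode
    return found


def count_single(node, name):
    return sibling_position(matching_ancestors(node, name)[0], name)


def count_multi_positions(node, name):
    # Outermost ancestor first, one position per count-matched level.
    return [sibling_position(n, name)
            for n in reversed(matching_ancestors(node, name))]


def count_any(node, name):
    """Count named elements whose start tag precedes node in the document."""
    total = 0
    for element in node.ownerDocument.getElementsByTagName("*"):
        if element is node:
            break
        if element.tagName == name:
            total += 1
    return total


c = minidom.parseString(NUMBER_EXAMPLE).getElementsByTagName("c")[0]
single = count_single(c, "b")
multi_levels = count_multi_positions(c, "b")
any_count = count_any(c, "b")
```

The from attribute is not modelled here; it would restrict the walk to the subtree below the nearest from-matched element.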

The from attribute, with a match pattern called the "from-match pattern," indicates at what element a processor should start counting. One can think of it as selecting a subtree of the XML document for the XSL processor to consider. For level attribute values single and multi, a processor begins counting with the descendants of the nearest from-match to the count-matched element. For the level attribute value any, a processor begins counting with the nearest from-match to the count-matched element.

The first number example below shows how an XSL stylesheet can implement a numbered list. Assume that a list element has several list-item children.

<xsl:template match="list-item">
	<html:P>
		<xsl:number	level="single"
				count="list-item"
				from="list"/>
		<xsl:text>. </xsl:text>
		<xsl:apply-templates/>
	</html:P>
</xsl:template>

In the above case, the level and count attribute values duplicate their default values. The processor counts only list-items, specified by the count="list-item" attribute; it counts only one level (in this case, its own siblings), as requested by the level="single" attribute; and it counts only list-item elements parented by list elements because of the from="list" attribute. A list formatted with the above rule would look like this if the three list-item tags in the list contained First item, Second item, and Third item respectively:

1. First item
2. Second item
3. Third item

The next number example shows how a stylesheet can number chapters, sections, and subsections in the style of this document (i.e. chapter 3 Extensible Stylesheet Language, section 3.2 XSL Stylesheets, and subsection 3.2.8 Counting). Assume that a chapter element has a chapter-title child, a section element has a section-title child, and a sub-section element has a sub-section-title child. Every number tag in this example has a single level attribute value by default:

<xsl:template match="chapter-title">
	<html:P><html:B>

		<xsl:number count="chapter"/>
		<xsl:text> </xsl:text>
		<xsl:apply-templates/>

	</html:B></html:P>
</xsl:template>

<xsl:template match="section-title">
	<html:P><html:B><html:I>

		<xsl:number count="chapter"/>
		<xsl:text>.</xsl:text>

		<xsl:number count="section" from="chapter"/>
		<xsl:text> </xsl:text>
		<xsl:apply-templates/>

	</html:I></html:B></html:P>
</xsl:template>

<xsl:template match="sub-section-title">
	<html:P><html:I>

		<xsl:number count="chapter"/>
		<xsl:text>.</xsl:text>

		<xsl:number count="section" from="chapter"/>
		<xsl:text>.</xsl:text>

		<xsl:number count="sub-section" from="section"/>
		<xsl:text> </xsl:text>
		<xsl:apply-templates/>

	</html:I></html:P>
</xsl:template>

The final sub-section-title template produces a heading with three numbers: the first based on the number of chapter elements that come before its ancestor chapter, the next on the section elements in its chapter that come before its ancestor section, and the last on the sub-section elements in its section that come before its parent sub-section.

The format attribute of the number element indicates the kind of numbering format that an XSL processor should use. A format value 1 produces a sequence 1, 2, 3,..., 99, 100, 101,...; a format value 01 produces a sequence 01, 02, 03,..., 09, 10, 11,..., 99, 100, 101,...; a format value I produces a sequence I, II, III,..., IX, X, XI,..., XCIX, C, CI,...; a format value i produces a sequence i, ii, iii,..., ix, x, xi,..., xcix, c, ci,...; a format value A produces a sequence A, B, C,..., Z, AA, AB,...; a format value a produces a sequence a, b, c,..., z, aa, ab,....
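To make the six format values concrete, the following is a minimal sketch in Python of how a processor might turn a count into its formatted form. The function names are hypothetical, and the sketch assumes standard subtractive Roman numerals (so that 99 becomes XCIX) and zero-padding to the width of the format value.

```python
ROMAN_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number):
    """Upper-case Roman numeral for a positive integer."""
    parts = []
    for value, symbol in ROMAN_NUMERALS:
        while number >= value:
            parts.append(symbol)
            number -= value
    return "".join(parts)


def to_alphabetic(number):
    """A, B, ..., Z, AA, AB, ... for 1, 2, ..., 26, 27, 28, ..."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def format_number(number, format_value):
    """Format a count according to one of: 1, 01, I, i, A, a."""
    if format_value == "I":
        return to_roman(number)
    if format_value == "i":
        return to_roman(number).lower()
    if format_value == "A":
        return to_alphabetic(number)
    if format_value == "a":
        return to_alphabetic(number).lower()
    # Decimal formats: pad with zeros to the width of the format value.
    return str(number).zfill(len(format_value))
```

For example, format_number(9, "01") gives 09 and format_number(27, "A") gives AA.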

3.2.9 Macros

In XSL, one can define formatting macros as shorthand for oft-used formatting sequences. To do so, one uses the define-macro tag, which has an attribute name that holds the name of the macro. To invoke the macro within a template, one uses the invoke-macro tag, whose name attribute holds the name of the macro one wishes to invoke. Since the invoke-macro tag can enclose more XSL and formatting objects, a define-macro tag can enclose a contents tag, which indicates where the invoke-macro tag contents should go. For example, the following macro adds an EM tag and the text Note: (followed by a space), after which the XSL processor places whatever markup the invoke-macro tag encloses:

<xsl:define-macro name="note">
	<html:EM>
		<xsl:text>Note: </xsl:text>
		<xsl:contents/>
	</html:EM>
</xsl:define-macro>

The following invokes the above macro:

<xsl:template match="myNoteworthyElement">
	<html:P>
		<xsl:invoke-macro name="note">
			<xsl:apply-templates/>
			<xsl:text>!!!</xsl:text>
		</xsl:invoke-macro>
	</html:P>
</xsl:template>

An XSL processor will replace the invoke-macro tag with whatever the define-macro tag encloses:

<xsl:template match="myNoteworthyElement">
	<html:P>
		<html:EM>
		<xsl:text>Note: </xsl:text>
		<xsl:apply-templates/>
		<xsl:text>!!!</xsl:text>
		</html:EM>
	</html:P>
</xsl:template>

A macro can also take arguments. To declare an argument within a define-macro tag, one uses the macro-arg tag. The macro-arg tag has an attribute name that names the argument, and an attribute default that holds the default value for the argument. One can then use the value of the argument within the macro body by using the tag value-of, and setting the select attribute to arg(myArgumentName), where myArgumentName represents the name of the argument. In the following example, the note macro takes a kind argument, which defines most of the text encapsulated by the text tag:

<xsl:define-macro name="note">
	<xsl:macro-arg name="kind" default="Note"/>
	<html:EM>
		<!-- kind arg value replaces value-of tag -->
		<xsl:text><xsl:value-of select="arg(kind)"/>: </xsl:text>
		<xsl:contents/>
	</html:EM>
</xsl:define-macro>

One can invoke the above macro with the following example template. The macro-arg tag passes the value of the argument with the value attribute.

<xsl:template match="myNoteworthyElement">
	<html:P>
		<xsl:invoke-macro name="note">
			<xsl:macro-arg name="kind" value="Warning"/>
			<xsl:apply-templates/>
			<xsl:text>!!!</xsl:text>
		</xsl:invoke-macro>
	</html:P>
</xsl:template>

The text Warning will replace the value-of tag in the define-macro element above. When an XSL processor substitutes the macro, the above template rule will look like this:

<xsl:template match="myNoteworthyElement">
	<html:P>
		<html:EM>
			<xsl:text>Warning: </xsl:text>
			<xsl:apply-templates/>
			<xsl:text>!!!</xsl:text>
		</html:EM>
	</html:P>
</xsl:template>
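A macro with a defaulted argument behaves much like a function with a default parameter. The following minimal Python sketch offers that analogy only; it is not how any XSL processor implements macros, and the function names are hypothetical. The contents parameter plays the part of the contents tag, and kind plays the part of the macro-arg.

```python
def note(contents, kind="Note"):
    """Analogue of the note macro: wrap contents in EM after a label."""
    return "<html:EM>" + kind + ": " + contents + "</html:EM>"


def my_noteworthy_element(children_markup):
    """Analogue of the template that invokes note with kind="Warning"."""
    return "<html:P>" + note(children_markup + "!!!", kind="Warning") + "</html:P>"
```

Calling note("Read this") falls back to the default label, just as the macro does when an invocation supplies no macro-arg.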

One can also define a macro composed of a set of formatting attributes. One specifies the formatting attributes within an attribute-set tag, which resides within a define-attribute-set element:

<xsl:define-attribute-set name="title-table-cell">
	<xsl:attribute-set ALIGN="CENTER" VALIGN="TOP" WIDTH="100"/>
</xsl:define-attribute-set>

The above example defines the ALIGN, VALIGN, and WIDTH attributes as part of the title-table-cell attribute set. One can use these attributes in a formatting element by adding an xsl:use attribute with the name of the attribute set:

<html:TD xsl:use="title-table-cell" HEIGHT="50"/>

An XSL processor will replace the xsl:use attribute in the above example with the title-table-cell attribute set:

<html:TD ALIGN="CENTER" VALIGN="TOP" WIDTH="100" HEIGHT="50"/>

An XSL processor would have simply removed the xsl:use attribute if the tag had already contained one of the attributes that the attribute set specified, like ALIGN.

3.3 Case Study: Docproc

Docproc, a Java software package, processes XML documents according to XSL rules, formatting them in regular HTML. Sean Russell, author of Docproc, created the Docproc documentation HTML file from a Docproc-processed XML file [10]. This file and its stylesheet provide the source for this study. Note that the stylesheet uses some XSL syntax that more recent drafts of the W3C's XSL Specification have rendered obsolete. This study presents the Docproc stylesheet selections in updated XSL form, and with HTML formatting replacing the obsolete formatting objects of the original XSL proposal.

3.3.1 body, section, and topic tags

The body tag contains the bulk of the XML file. Russell has further subdivided the body into sections (tagged section) and subsections (tagged topic). An example body:

<!-- docproc_sample.xml -->
<body title="My Document Title">

	<section title="Section One Title">
		<paragraph>This is some text that introduces 
		section one.</paragraph>

		<topic title="Topic One Title">
			<paragraph>This is some text that discusses 
			topic one.</paragraph>
		</topic>
	</section>

	<section title="Section Two Title">
		<paragraph>This is some text that discusses section 
		two.</paragraph>
	</section>

</body>

The XSL file specifies that a browser should render the body as a table without visible borders, and with two columns, the larger on the right:

<xsl:template match="body">
	<html:TABLE BORDER="0" >
		<html:TR> <!-- table row -->
			<html:TD WIDTH="15%"></html:TD> <!-- cell -->
			<html:TD WIDTH="85%"></html:TD> <!-- cell -->
		</html:TR>
		<xsl:apply-templates/>
	</html:TABLE>
</xsl:template>

The two empty table cells establish the widths of the two columns that they define. An XSL processor will place the children of the body tag into the two columns according to the section rules.

The XSL file directs a processor to create a new row in the table specified by the body tag for each section element. In the first cell, the one on the left, a processor should insert the section title, which the tag's title attribute specifies. A processor should also add a named anchor to that cell with the section title as its name. In the right cell, the processor should insert the processed content of the section tag.

<xsl:template match="section">
	<html:TR> <!-- new table row -->

		<html:TD ALIGN="RIGHT"> <!-- table cell (left) -->
			<!-- named anchor which uses the title -->
			<!-- attribute value of this section -->
			<html:A NAME="{@title}"></html:A>

			<html:H3> <!-- heading sequence -->
				<xsl:value-of select="@title"/>
			</html:H3>
		</html:TD>

		<html:TD> <!-- table cell (right) -->
			<xsl:apply-templates/>
		</html:TD>

	</html:TR>
</xsl:template>

Since section tags contain all topic tags, an XSL processor will lay out a topic element within the right cell of the body table. The template element for the topic tag indicates that the browser should place the topic's heading (again found in the title attribute of the topic tag) in a paragraph above the rest of the topic content. Whatever text the topic holds falls below. Like a section rule, the topic rule instructs a processor to create a named anchor, named with the topic's title.

<xsl:template match="topic">
	<!-- named anchor that uses the title -->
	<!-- attribute value of this topic -->
	<html:A NAME="{@title}"></html:A>

	<!-- heading sequence w/italics -->
	<html:H3><html:I>
		<xsl:value-of select="@title"/>
	</html:I></html:H3>

	<xsl:apply-templates/>
</xsl:template>

Assume that the Docproc stylesheet also has this trivial rule:

<xsl:template match="paragraph">
	<html:P><xsl:apply-templates/></html:P>
</xsl:template>

By processing the docproc_sample.xml file with the above template rules, an XSL processor would produce the docproc_sample.html document below:

<!-- docproc_sample.xml after processing -->
<html:TABLE BORDER="0">
	<html:TR>
		<html:TD WIDTH="15%"></html:TD>
		<html:TD WIDTH="85%"></html:TD>
	</html:TR>

	<html:TR>
		<html:TD ALIGN="RIGHT">
			<html:A NAME="Section One Title"></html:A>

			<html:H3>
				Section One Title
			</html:H3>
		</html:TD>

		<html:TD>
			<html:P>This is some text that introduces 
			section one.</html:P>

			<html:A NAME="Topic One Title"></html:A>

			<html:H3><html:I>
				Topic One Title
			</html:I></html:H3>

			<html:P>This is some text that discusses topic 
			one.</html:P>
		</html:TD>
	</html:TR>

	<html:TR>
		<html:TD ALIGN="RIGHT">
			<html:A NAME="Section Two Title"></html:A>

			<html:H3>
				Section Two Title
			</html:H3>
		</html:TD>

		<html:TD>
			<html:P>This is some text that discusses 
			section two.</html:P>
		</html:TD>
	</html:TR>
</html:TABLE>

This simple complete body would appear like the table below when rendered by an HTML browser (except without visible cell borders):

Section One Title	This is some text that introduces section one.
	Topic One Title
	This is some text that discusses topic one.
Section Two Title	This is some text that discusses section two.
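The way the body, section, topic, and paragraph rules cooperate can be imitated with a small recursive function. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module, not Docproc itself; the names render and apply_templates are hypothetical, and the sketch assumes that paragraphs hold plain text only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

DOCPROC_SAMPLE = """<body title="My Document Title">
  <section title="Section One Title">
    <paragraph>This is some text that introduces section one.</paragraph>
    <topic title="Topic One Title">
      <paragraph>This is some text that discusses topic one.</paragraph>
    </topic>
  </section>
  <section title="Section Two Title">
    <paragraph>This is some text that discusses section two.</paragraph>
  </section>
</body>"""


def apply_templates(element):
    """Render each child element in order, like xsl:apply-templates."""
    return "".join(render(child) for child in element)


def render(element):
    """Pick the rule for an element by its tag name and build its output."""
    title = element.get("title", "")
    if element.tag == "body":
        return ('<html:TABLE BORDER="0"><html:TR>'
                '<html:TD WIDTH="15%"></html:TD><html:TD WIDTH="85%"></html:TD>'
                '</html:TR>' + apply_templates(element) + '</html:TABLE>')
    if element.tag == "section":
        return ('<html:TR><html:TD ALIGN="RIGHT"><html:A NAME={name}></html:A>'
                '<html:H3>{title}</html:H3></html:TD>'
                '<html:TD>{content}</html:TD></html:TR>').format(
                    name=quoteattr(title), title=escape(title),
                    content=apply_templates(element))
    if element.tag == "topic":
        return ('<html:A NAME={name}></html:A>'
                '<html:H3><html:I>{title}</html:I></html:H3>{content}').format(
                    name=quoteattr(title), title=escape(title),
                    content=apply_templates(element))
    if element.tag == "paragraph":
        # Collapse line breaks and runs of spaces inside the paragraph text.
        text = " ".join((element.text or "").split())
        return "<html:P>" + escape(text) + "</html:P>"
    return apply_templates(element)


html = render(ET.fromstring(DOCPROC_SAMPLE))
```

Each section contributes one table row, so the sample yields three rows in all, counting the empty row that fixes the column widths.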

3.3.2 Lists

With his list rule, Russell shows how one can easily change the way a browser displays markup. For rendering his list tag to HTML, Russell could have chosen from the HTML list types a numbered list, a list with bullets, or a list without any prefixes (a DL definition list). Russell chose a bulleted list, denoted by the UL tag (unordered list).

<xsl:template match="list">
	<html:UL> <!-- unordered list -->
		<xsl:apply-templates/>
	</html:UL>
</xsl:template>

A sample unordered list when rendered by a browser follows:

• Item One
• Item Two

He can redefine the above template with an alternate HTML tag to change all instances of the list element in one simple step. For example, he can change the UL tag to an OL tag to make all lists numbered:

1. Item One
2. Item Two

3.3.3 Table of contents

The Table of Contents XSL rules, which govern the toc tag, may seem complicated, but upon closer inspection, one can easily discern how they work.

Russell defined the toc element in the XML document as an empty element. It actually acts as a place holder for the XSL processor, marking where the Table of Contents should go. The content of the Table of Contents comes from the section and topic titles. The toc template element specifies that an XSL processor should create an ordered list (OL). To generate the content of the ordered list, it should follow the section rule in the toc mode:

<xsl:template match="toc">
	<html:OL>
		<xsl:apply-templates select="//section" mode="toc"/>
	</html:OL>

</xsl:template>

Recall that, in the above select pattern, the beginning double slash (//) denotes that an XSL processor can select section elements from anywhere in the XML document. Upon matching a section element, the processor uses the following toc mode section rule. It specifies that a processor should create a list item (LI), and the title attribute of the current section should constitute the content of that list item. The item also should link to the anchor named after that title (set up by the regular section rule, discussed above).

<xsl:template match="section" mode="toc">
	<html:LI>
		<html:A HREF="#{@title}">
			<xsl:value-of select="@title"/>
		</html:A>

		<html:OL>
			<xsl:apply-templates select="topic" mode="toc"/>
		</html:OL>
	</html:LI>
</xsl:template>

Following each section title, the processor should create a sublist using child topic elements of the section, with similar content and links:

<xsl:template match="topic" mode="toc">
	<html:LI>
		<html:A HREF="#{@title}">
			<xsl:value-of select="@title"/>
		</html:A>
	</html:LI>
</xsl:template>
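The toc-mode rules amount to two nested loops: one over every section in the document, and one over the topic children of each section. The following is a minimal sketch in Python with xml.etree.ElementTree, assuming titles serve directly as anchor names as in the rules above; the function names and the abbreviated sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

TOC_SAMPLE = """<body>
  <toc/>
  <section title="Section One Title">
    <topic title="Topic One Title"/>
  </section>
  <section title="Section Two Title"/>
</body>"""


def toc_item(element, nested=""):
    """One list item linking to the anchor named after the element's title."""
    title = element.get("title", "")
    return "<html:LI><html:A HREF={href}>{text}</html:A>{nested}</html:LI>".format(
        href=quoteattr("#" + title), text=escape(title), nested=nested)


def table_of_contents(root):
    """Build the nested ordered list that replaces the toc element."""
    items = []
    for section in root.iter("section"):          # like select="//section"
        topics = "".join(toc_item(topic)
                         for topic in section.findall("topic"))  # select="topic"
        items.append(toc_item(section, "<html:OL>" + topics + "</html:OL>"))
    return "<html:OL>" + "".join(items) + "</html:OL>"


toc = table_of_contents(ET.fromstring(TOC_SAMPLE))
```

A section without topics still receives an empty inner list, which mirrors what the toc-mode section rule specifies.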

After processing, the Table of Contents part of the finished HTML file looks like this:

<html:OL>
	<html:LI>
		<html:A HREF="#Section One Title">
			Section One Title
		</html:A>

		<html:OL>
			<html:LI>
				<html:A HREF="#Topic One Title">
					Topic One Title
				</html:A>
			</html:LI>
		</html:OL>
	</html:LI>

	<html:LI>
		<html:A HREF="#Section Two Title">
			Section Two Title
		</html:A>

		<html:OL>
		</html:OL>

	</html:LI>
</html:OL>

When rendered by a browser, it looks like this:

1. Section One Title
	1. Topic One Title
2. Section Two Title

3.3.4 File dates and sizes

Russell also devised a creative XSL solution to the need to update the documentation XML file with the software's modification date and size for every new version. He created a file tag whose attribute attribute takes either the value modified or the value size:

<paragraph>Most recent version modified <file attribute="modified"/>: 
Docproc v1.21a (<file attribute="size"/>K).</paragraph>

Then he created an XSL rule for the file tag for each attribute type. Within both templates resides a processing instruction, intended for an ECMAScript module, that contains the ECMAScript code which produces the date modified and the size, respectively. Recall that ?> composes the only illegal character sequence within a processing instruction, so an XML processor will not trip over the ECMAScript code in these two template rules:

<xsl:template match="file[@attribute='modified']">
	<?ECMAScript // ECMAScript code fits here?>
</xsl:template>

<xsl:template match="file[@attribute='size']">
	<?ECMAScript // ECMAScript code fits here?>
</xsl:template>

An XSL processor will select which rule to use based on the value of the attribute attribute. Thus, the date modified and size replace the file tags when displayed by a browser:

Most recent version modified 2/14/98: Docproc v1.21a (782K).

4 Extensible Linking Language

XLL allows XML tags to refer to other XML documents, as well as to specific elements or data within those documents. Two complementary languages comprise XLL: XPointer and XLink. XPointer specifies a syntax for selecting specific elements within a document, and XLink specifies a set of attributes in the XLink namespace that one can add to element tags to make them refer to other documents. Often, tags that use XLink notation will also add XPointer notation to the XLink notation, creating a combined reference to an element in another document.

4.1 XLink

4.1.1 Simple links

The XLink specification [11] establishes three different, but related, linking structures that one can add to conventional XML documents. "Simple" links comprise the first type. Simple links resemble HTML links, which use the A element tag. A simple link usually resides within one of the resources it joins, just as an HTML A tag resides within the document from which it refers to a remote document. One calls such links "inline," because the linking tag itself resides within part of the content it joins. A common term for this content is the "local resource." A simple link differs from other kinds of XLinks in that it refers only to one remote resource. Therefore, it contains all linking information in one tag.

All XLinks establish their linking capability by adding XLink namespace attributes to otherwise standard XML tags. This presentation will use the namespace prefix xlink to refer to XLink attributes (although differing from the current XLink specification, a more recent note by an XLink working group editor [12] suggests this usage), with the namespace defined as follows:

xmlns:xlink="http://www.w3.org/TR/WD-xlink"

All XLinks contain a form attribute. Simple links take the form attribute value simple. They also must have an href attribute, which takes a URI as its value. In XLink, one can extend a URI to include an XPointer by adding either a pound sign (#) or a bar (|) to the end of the URI, and then placing the XPointer on the end of that. A pound sign indicates that an XLL processor should download the entire document pointed to by the URI before extracting the subresource to which the XPointer refers. A bar indicates that the XLL processor may use its discretion in accessing the resource.
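Separating the URI from the XPointer is a matter of finding the first pound sign or bar. The following is a minimal sketch in Python; the function name split_locator is hypothetical, and the sketch makes no attempt to interpret the XPointer itself.

```python
def split_locator(href):
    """Split an href value into (uri, connector, xpointer).

    The connector is "#" or "|", whichever appears first; both the
    connector and the xpointer are None when the href has neither.
    """
    positions = [index for index in (href.find("#"), href.find("|"))
                 if index != -1]
    if not positions:
        return href, None, None
    cut = min(positions)
    return href[:cut], href[cut], href[cut + 1:]
```

A processor could then branch on the connector: fetch the whole document for a pound sign, or choose its own retrieval strategy for a bar.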

An element with a simple XLink requires only the form and href attributes. One can produce the functionality of an HTML A tag with just these two attributes:

<myAnchor xlink:form="simple" xlink:href="http://www.wooster.edu/">
	College of Wooster Home Page
</myAnchor>

However, even a simple link has many more attributes. It has an inline attribute, which takes either the value true or the value false. If the inline attribute has the value false, an XLL processor will not use the link's content (the character data and markup contained within the link's start and end tags) as part of the link. The myAnchor example above shows the opposite, a link with inline content.

Simple links also have attributes that suggest how a browser or processor should display and traverse the link. The show attribute describes how a browser might display a remote link resource once it traverses to it. A value of embed indicates the browser should place the remote resource within content already present, replace indicates it should replace the current content with the remote resource, and new means that it should display the remote resource within a new venue. The actuate attribute describes how a browser might retrieve a remote resource. A value of auto indicates that it should retrieve the resource immediately once it begins processing the link; a value of user indicates it should wait until a user requests the resource.

The behavior attribute takes an XLL processor-specific value that may give the processor more detailed information regarding resource retrieval and display. It has no suggested values. The W3C has left the show, actuate, and behavior attributes all open to variations in implementation. An XLL application may use or display linked resources in a variety of ways unique to that application.

Simple links also have attributes that reveal more information about the resources to which they refer. A content-title attribute value designates the title of inline content; a content-role attribute value suggests the role the inline content may fulfill in an application. Both attributes take unconstrained text values to which specific XLL processors may or may not respond. A title attribute value designates the title of a remote resource; and a role attribute value suggests the role a remote resource may fulfill. Again, both of the attributes can take unconstrained values to which specific XLL processors may or may not respond. A browser may also use the role attribute to decide how to traverse a link.

This example simple link uses all possible attributes:

<myLink		xlink:form="simple"
		xlink:href="http://www.wooster.edu/"
		xlink:role="home page"
		xlink:title="College of Wooster"
		xlink:show="new"
		xlink:actuate="user"
		xlink:behavior="new window 300px 500px"
		xlink:content-role="my link"
		xlink:content-title="Link to the Wooster Home Page" 
		xlink:inline="true">
	College of Wooster Home Page
</myLink>

The above example describes a link to the College of Wooster home page, and suggests that a browser show the page in a new, 300 x 500 pixel window. It also requests that the browser wait for a user to actuate the link by clicking on its inline content, the text College of Wooster Home Page. Note that the role, behavior, and content-role attribute values have all been made up for the purposes of this example only.
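A processor recognizes a link by looking for attributes in the XLink namespace rather than for particular element names. The following is a minimal sketch in Python using the namespace-aware parsing of xml.dom.minidom; the function xlink_attributes and the shortened sample link are hypothetical, and the namespace URI is the one this presentation adopts.

```python
from xml.dom import minidom

XLINK_NS = "http://www.w3.org/TR/WD-xlink"

SIMPLE_LINK = """<myLink xmlns:xlink="http://www.w3.org/TR/WD-xlink"
        xlink:form="simple"
        xlink:href="http://www.wooster.edu/"
        xlink:show="new"
        xlink:actuate="user">College of Wooster Home Page</myLink>"""


def xlink_attributes(element):
    """Collect the element's XLink-namespace attributes by local name."""
    found = {}
    attributes = element.attributes
    for index in range(attributes.length):
        attribute = attributes.item(index)
        if attribute.namespaceURI == XLINK_NS:
            found[attribute.localName] = attribute.value
    return found


link = xlink_attributes(minidom.parseString(SIMPLE_LINK).documentElement)
```

Because the lookup keys on the namespace URI, the sketch works no matter which prefix a document binds to that URI.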

4.1.2 Extended links

"Extended" links join more than one remote resource. Therefore, such links consist of two different kinds of elements working in concert. An element with the form attribute value extended encloses one or more elements with the form attribute value locator:

<myExtendedLink xlink:form="extended">
	<myLocatorKind1 xlink:form="locator"/>
	<myLocatorKind1 xlink:form="locator"/>
	<myLocatorKind2 xlink:form="locator"/>
</myExtendedLink>

An extended element can also have two other XLink attributes: inline, which, just as with simple links, takes a value of either true or false; and role, which takes an unconstrained character data value that suggests the overall role of the link to an XLL processor. If an extended element has an inline attribute value true, then all of the extended element content, including all character data and subelements (except for locator elements), comprises the local resource of the link. In this case, the extended element may also take the content-role and content-title attributes, explained above with simple links.

Often, extended links take the inline attribute value false, indicating that the content of the extended element does not constitute part of the link. Note that this contrasts with a conventional HTML simple link, where a browser typically displays the character data content of the A (link) tag as text on which one can click to retrieve the remote content of the link. Extended links have much more power than these conventional simple links. Even lacking a local resource, extended links can still join two or more resources in other separate external documents as if the external documents themselves actually contained some of the extended link information. The ClothingLink example will illustrate such a possibility later.

An extended link must contain at least one locator element, but it may have many more. The locator elements refer to the resources joined together by the extended link. Each locator element refers to a single remote resource. Each locator element must have an href attribute, but it may also contain role, title, show, actuate, and behavior attributes. These attributes all have the same meaning in a locator element as they do in a simple link element. An extended element can also contain these optional attributes, which an XLL processor will fill in as the defaults for each locator included within the extended element.

In the example below, the HotdogLink, LocalHotdogContent, and HotdogContent elements embody an extended link. The HotdogLink element has an extended form attribute, so it encloses the HotdogContent locator portions of the link. The LocalHotdogContent element, a descendant of HotdogLink, comprises the local resource of the link. A browser might display Bun-size beef frank as text which, when clicked by a user, the browser replaces with the bsbeef.txt document. It might show the bsbeef.gif embedded within the original document, next to the Bun-size beef frank text. Because of its status as an extended link, a HotdogLink can refer to both the bsbeef.txt and bsbeef.gif resources simultaneously.

<HotdogLink	xlink:form="extended"
		xlink:role="hotdog display"
		xlink:content-role="link text"
		xlink:inline="true">

	<LocalHotdogContent>Bun-size beef frank</LocalHotdogContent>

	<HotdogContent	xlink:form="locator"
			xlink:href="gifs/bsbeef.gif"
			xlink:role="gif"
			xlink:show="embed"
			xlink:actuate="auto"/>

	<HotdogContent	xlink:form="locator"
			xlink:href="descriptions/bsbeef.txt"
			xlink:role="description text"
			xlink:title="Bun-size Beef Frank"
			xlink:show="replace"
			xlink:actuate="user"/>
</HotdogLink>
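Gathering the locators of an extended link, and filling in defaults from the extended element, might proceed as in the following minimal Python sketch. It uses xml.dom.minidom; the function resolve_locators is hypothetical. The sample is a variation on the HotdogLink example in which an actuate default has been placed on the extended element purely for illustration.

```python
from xml.dom import minidom

XLINK_NS = "http://www.w3.org/TR/WD-xlink"

# Attributes a locator may take from its extended parent as defaults.
# role is left out here because, on the extended element, it describes
# the link as a whole; that choice is an assumption of this sketch.
DEFAULTED = ("title", "show", "actuate", "behavior")

HOTDOG_LINK = """<HotdogLink xmlns:xlink="http://www.w3.org/TR/WD-xlink"
        xlink:form="extended" xlink:inline="true" xlink:actuate="user">
    <LocalHotdogContent>Bun-size beef frank</LocalHotdogContent>
    <HotdogContent xlink:form="locator" xlink:href="gifs/bsbeef.gif"
            xlink:show="embed" xlink:actuate="auto"/>
    <HotdogContent xlink:form="locator" xlink:href="descriptions/bsbeef.txt"
            xlink:show="replace"/>
</HotdogLink>"""


def resolve_locators(extended):
    """Return one dict per locator child, with parent defaults applied."""
    resolved = []
    for child in extended.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue
        if child.getAttributeNS(XLINK_NS, "form") != "locator":
            continue
        locator = {"href": child.getAttributeNS(XLINK_NS, "href")}
        for name in DEFAULTED:
            if child.hasAttributeNS(XLINK_NS, name):
                locator[name] = child.getAttributeNS(XLINK_NS, name)
            elif extended.hasAttributeNS(XLINK_NS, name):
                locator[name] = extended.getAttributeNS(XLINK_NS, name)
        resolved.append(locator)
    return resolved


locators = resolve_locators(minidom.parseString(HOTDOG_LINK).documentElement)
```

The LocalHotdogContent element is skipped because it carries no form attribute; it belongs to the local resource rather than to the set of locators.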

The next example shows an "out-of-line" extended link, which has no local resource (the opposite of inline). The first two locators use XPointer syntax to address two specific elements in two specific documents from which a user may link to a description and photograph of the product. In this case, the extended linking element does not reside in any of the documents that it joins together. The first two locators instruct an XLL processor to produce the same result if a user links from the catalogue.xml document or if he or she links from the specials.xml document. Even though all the locators have no show attributes, and the first two have no actuate attributes, the role and behavior attribute values instruct a browser how these links should work.

For this example, note that the part of the href attribute value which follows the pound sign (#) consists of XPointer syntax, which this paper will explain later. For now, assume that each of the first three href attributes identifies a specific element within the XML file specified before the pound sign.

<ClothingLink	xlink:form="extended"
		xlink:inline="false">

	<ClothingMention	xlink:form="locator"
				xlink:href="catalogue.xml#id(shirt007).child(1,Name)"
				xlink:role="initial"
				xlink:behavior="start"/>

	<ClothingMention	xlink:form="locator"
				xlink:href="specials.xml#id(shirt007).child(1,Name)"
				xlink:role="initial"
				xlink:behavior="start"/>

	<ClothingDescription	xlink:form="locator"
				xlink:href="shirts.xml#root().child(7,Clothing)"
				xlink:role="target"
				xlink:title="Description of Shirt"
				xlink:actuate="user"
				xlink:behavior="replace description-field"/>

	<ClothingPicture	xlink:form="locator"
				xlink:href="pics/shirt007.gif"
				xlink:role="target"
				xlink:title="Picture of Shirt"
				xlink:actuate="user"
				xlink:behavior="replace picture-field"/>
</ClothingLink>

To visualize how a user might view this extended link with a browser, assume he or she has already loaded the catalogue.xml file with a catalogue.xsl stylesheet, as well as the file that contains the above XLL example. Assume that the stylesheet specifies the division of the browser window into a frame for the catalogue text, which would contain mainly a list of all clothing items; an initially unfilled frame for a more specific description of an individual item; and an initially unfilled frame for a picture of an individual item:

picture field	description field
catalogue field

Within the catalogue field, the browser would highlight the character data content of every element referred to by a ClothingMention tag, indicating that a click produces more information. The browser locates that information through the specific ClothingDescription and ClothingPicture tags paired with each ClothingMention tag, which both request that the browser display their resources in specific fields when actuated.

The role and behavior attribute values in the above example were invented for this example only. Here they notify a browser that the two ClothingMention locators refer to elements already displayed by the browser before the user actuates the link; they also notify the browser that it should display the resources located by the ClothingDescription and ClothingPicture elements only after a user has actuated the link, and then within special fields. The above extended link might fall apart if processed by a browser or processor that did not come pre-programmed to deal with role values of initial and target, or behavior values of start and replace.

When link elements take the same attribute values over and over again, as in the above example, one can use DTDs to encode defaults for these element attributes. A DTD for the ClothingLink, ClothingMention, ClothingDescription, and ClothingPicture elements follows:

<!ELEMENT ClothingLink (ClothingMention+, ClothingDescription, ClothingPicture)>
<!ATTLIST ClothingLink
	xlink:form	CDATA		#FIXED	"extended"
	xlink:inline	(true|false)	#FIXED	"false">

<!ELEMENT ClothingMention EMPTY>
<!ATTLIST ClothingMention
	xlink:form	CDATA	#FIXED	"locator"
	xlink:href	CDATA	#REQUIRED
	xlink:role	CDATA	#FIXED	"initial"
	xlink:behavior	CDATA	#FIXED	"start">

<!ELEMENT ClothingDescription EMPTY>
<!ATTLIST ClothingDescription
	xlink:form	CDATA	#FIXED	"locator"
	xlink:href	CDATA	#REQUIRED
	xlink:role	CDATA	#FIXED	"target"
	xlink:title	CDATA		"Description"
	xlink:actuate	CDATA	#FIXED	"user"
	xlink:behavior	CDATA		"replace description-field">

<!ELEMENT ClothingPicture EMPTY>
<!ATTLIST ClothingPicture
	xlink:form	CDATA	#FIXED	"locator"
	xlink:href	CDATA	#REQUIRED
	xlink:role	CDATA	#FIXED	"target"
	xlink:title	CDATA		"Description"
	xlink:actuate	CDATA	#FIXED	"user"
	xlink:behavior	CDATA		"replace picture-field">

Using a DTD with #FIXED and default values allows one to describe elements in an XML document much more readably; also, one can input the elements in the XML document faster and more conveniently. The following shows the minimum necessary attributes for the above ClothingLink example when using the above DTD:

<ClothingLink>
	<ClothingMention
		xlink:href="catalogue.xml#id(shirt007).child(1,Name)"/>
	
	<ClothingMention
		xlink:href="specials.xml#id(shirt007).child(1,Name)"/>
	
	<ClothingDescription
		xlink:href="shirts.xml#root().child(7,Clothing)"
		xlink:title="Description of Shirt"/>
	
	<ClothingPicture
		xlink:href="pics/shirt007.gif"
		xlink:title="Picture of Shirt"/>
</ClothingLink>
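One can check that the shortened markup carries the same information by parsing it and reading back the attributes that the DTD supplies. The following Java sketch does not come from the application described in this paper. It assumes a JAXP parser (the javax.xml.parsers API bundled with current Java releases), places an abbreviated copy of the DTD in an internal subset, and uses class and method names invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class DtdDefaultsDemo {
    // Abbreviated copy of the DTD above (assumption: internal subset),
    // followed by a link that spells out only the href attribute.
    static final String XML =
          "<!DOCTYPE ClothingLink [\n"
        + "<!ELEMENT ClothingLink (ClothingMention+)>\n"
        + "<!ATTLIST ClothingLink xlink:form CDATA #FIXED \"extended\"\n"
        + "    xlink:inline (true|false) #FIXED \"false\">\n"
        + "<!ELEMENT ClothingMention EMPTY>\n"
        + "<!ATTLIST ClothingMention xlink:form CDATA #FIXED \"locator\"\n"
        + "    xlink:href CDATA #REQUIRED\n"
        + "    xlink:role CDATA #FIXED \"initial\"\n"
        + "    xlink:behavior CDATA #FIXED \"start\">\n"
        + "]>\n"
        + "<ClothingLink>\n"
        + "  <ClothingMention xlink:href=\"catalogue.xml#id(shirt007).child(1,Name)\"/>\n"
        + "</ClothingLink>";

    /** Returns the value of attrName on the first element called elementName. */
    static String attribute(String elementName, String attrName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(false); // treat "xlink:form" as an ordinary name
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Element e = (Element) doc.getElementsByTagName(elementName).item(0);
            return e.getAttribute(attrName);
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse sample", ex);
        }
    }

    public static void main(String[] args) {
        // Attributes never written in the instance are filled in from the DTD.
        System.out.println(attribute("ClothingLink", "xlink:form"));
        System.out.println(attribute("ClothingMention", "xlink:role"));
        System.out.println(attribute("ClothingMention", "xlink:behavior"));
    }
}
```

An application that reads the link through the DOM therefore sees the same attributes whether the author typed them or the DTD supplied them.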

4.1.3 Group links

One may place "group links" in a document to declare to an XLL processor that a certain set of XML documents contains extended links which reference one another. For example, a user may load fileA.xml when interested in the compound document group "Z". Although fileA.xml may reference only fileB.xml, and fileB.xml refers to no other documents, the Z document group may still rely on references in fileC.xml that modify fileB.xml. One may then place a group link in fileA.xml to notify an XLL processor that it must also resolve the links in fileC.xml. Resolving the links in the Z document group thus takes two steps.

A group link consists of two kinds of elements. An element with the form attribute value group encloses elements with the form attribute value document, paralleling the relationship between extended elements and locator elements. The group element also may take an attribute steps, which takes a numeric value that suggests to an XLL processor how many levels of extended links it must resolve to complete all the links in the group. The document elements all take a href attribute with a URI value, which designates a document as part of the group. Even if one does not provide a steps value in the group element, the document elements indicate to an XLL processor which documents it must process to resolve all the links in the group.

The following example illustrates the simple group XLink solution to the problem posed above, where, before resolving document group Z, an XLL processor must process fileC.xml's links into fileB.xml:

<myGroupLink	myName="Z"
		xlink:form="group"
		xlink:steps="2">
	<myDocumentLink	xlink:form="document"
			xlink:href="fileA.xml"/>
	<myDocumentLink	xlink:form="document"
			xlink:href="fileB.xml"/>
	<myDocumentLink	xlink:form="document"
			xlink:href="fileC.xml"/>
</myGroupLink>
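One way to picture the steps attribute is as a depth limit on following group references from document to document. The Java sketch below works under that reading only; it is an interpretation, not behavior mandated by the XLink draft or taken from this paper's application. The map of references, the class name, and the file name fileD.xml are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GroupResolver {
    /**
     * Collects the documents reachable from start by following group
     * references at most `steps` levels deep (assumed meaning of steps).
     */
    static Set<String> documentsToProcess(String start,
            Map<String, List<String>> groupRefs, int steps) {
        Set<String> seen = new LinkedHashSet<>();
        seen.add(start);
        List<String> frontier = Collections.singletonList(start);
        for (int level = 0; level < steps && !frontier.isEmpty(); level++) {
            List<String> next = new ArrayList<>();
            for (String doc : frontier) {
                List<String> refs = groupRefs.get(doc);
                if (refs == null) continue;       // document names no group
                for (String ref : refs) {
                    if (seen.add(ref)) next.add(ref); // visit each document once
                }
            }
            frontier = next;
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("fileA.xml", Arrays.asList("fileB.xml"));
        refs.put("fileB.xml", Arrays.asList("fileC.xml"));
        refs.put("fileC.xml", Arrays.asList("fileD.xml")); // hypothetical extra level
        System.out.println(documentsToProcess("fileA.xml", refs, 2));
    }
}
```

With a limit of two, the sketch stops before fileD.xml, which matches the idea that steps is a hint about how far a processor needs to go rather than a list of documents.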

4.2 XPointer

An XPointer expression, as designated by the XPointer specification [13], consists of a series of terms, concatenated by periods. Each term identifies an element, a sequence of elements, an element attribute, or the text data of an element. However, only one term in an XPointer expression can identify an attribute or text data block, and it must come at the end. An expression with multiple terms reflects a traversal through the element tree structure of an XML document. Note that relative terms (defined later) may also legally match more than one element in an XML document.
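Because periods can also occur inside a term's arguments (within a nested XPointer or a quoted string), a processor cannot split an expression on every period. The following Java sketch shows one possible way to separate the terms; the class and method names are invented here, and the sketch assumes that parentheses balance and that quotation marks pair up.

```java
import java.util.ArrayList;
import java.util.List;

final class XPointerTerms {
    /** Splits an expression at periods that lie outside parentheses and quotes. */
    static List<String> split(String expression) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;          // parenthesis nesting level
        boolean quoted = false; // inside a "..." argument?
        for (char c : expression.toCharArray()) {
            if (c == '"') {
                quoted = !quoted;
            } else if (!quoted && c == '(') {
                depth++;
            } else if (!quoted && c == ')') {
                depth--;
            }
            if (c == '.' && depth == 0 && !quoted) {
                terms.add(current.toString()); // a period between two terms
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        terms.add(current.toString());
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(split("id(shirt007).child(1,Name)"));
        System.out.println(split("root().span(child(1,#element),child(-1,#element).child(all))"));
    }
}
```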

4.2.1 Absolute terms

An XPointer expression contains at most one "absolute term," which must come first. This term identifies a specific and unique element. If an expression does not begin with an absolute term, an XLL processor will treat it as if it began with the absolute term root(). The root() term selects the document element of a document. Another absolute term, origin(), selects the element in which the expression resides. The following example shows a simple link element which uses XPointer syntax to refer to itself within the href attribute:

<myNamespace:myLink xlink:href="#origin()"/>

Other absolute terms include id() and html(). An id() term selects the element whose ID attribute value it encloses within its parentheses. For example, id(chapter-three) selects this element:

<chapter id="chapter-three">...</chapter>

Recall that an element defined with an attribute which uses the ID keyword must own a unique value across the XML document for that attribute. An html(myName) term, which encloses an HTML NAME attribute value, provides the same functionality as

<html:A HREF="#myName">...</html:A>

when one has defined an HTML anchor (A) element with the NAME attribute of myName:

<html:A NAME="myName">...</html:A>

4.2.2 Relative terms

A "relative term" of an XPointer expression identifies an element based on its relationship to the previous term in the expression. For example, the relative term child() in the expression root().child(1,myElement) identifies the first myElement child of the previous term, root(). These terms rely on both a term function name, which indicates the direction to search, and an argument section, which establishes specific characteristics for which to search.

The child() term selects only children of the previous term; the descendant() term selects only descendants of the previous term; the ancestor() term selects only ancestors of the previous term; the preceding() term selects only nodes that come before the previous term in the XML document; the following() term selects only nodes that come after the previous term in the XML document; the psibling() term selects only siblings of the previous term that come before it in the XML document; the fsibling() term selects only siblings of the previous term that come after it in the XML document. The following XML excerpt shows these relationships, assuming the previous term has selected the myPreviousTarget element (e.g., via an id(prevTarg) term):

<a><!-- a is an ancestor -->
	<b/><!-- b is preceding (NOT a psibling) -->
	<c><!-- c is an ancestor -->
		<d><!-- d is preceding AND a psibling -->
			<e/><!-- e is preceding (NOT a psibling) -->
		</d>
		<myPreviousTarget id="prevTarg">
			<f><!-- f is a descendant and a child -->
				<g/><!-- g descendant (NOT a child) --> 
			</f>
		</myPreviousTarget>
		<d><!-- d is following AND an fsibling -->
			<e/><!-- e is following (NOT an fsibling) -->
		</d>
	</c>
	<b/><!-- b is following (NOT an fsibling) -->
</a>
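The same relationships can be computed with the DOM. The following Java sketch is an illustration only: it relies on Node.compareDocumentPosition() from DOM Level 3 (later than the DOM Level 1 used elsewhere in this paper), renames the elements of the excerpt so that each has a unique name, and uses invented class and method names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class RelativeTermDemo {
    // Same shape as the excerpt above; names made unique (b1, d1, ...).
    static final String XML =
          "<a><b1/><c><d1><e1/></d1><target><f><g/></f></target>"
        + "<d2><e2/></d2></c><b2/></a>";

    /** Describes how `other` relates to `target`, in XPointer vocabulary. */
    static String relation(Node target, Node other) {
        int pos = target.compareDocumentPosition(other);
        boolean sameParent = other.getParentNode().isSameNode(target.getParentNode());
        if ((pos & Node.DOCUMENT_POSITION_CONTAINS) != 0) return "ancestor";
        if ((pos & Node.DOCUMENT_POSITION_CONTAINED_BY) != 0) {
            return other.getParentNode().isSameNode(target)
                    ? "descendant, child" : "descendant";
        }
        if ((pos & Node.DOCUMENT_POSITION_PRECEDING) != 0) {
            return sameParent ? "preceding, psibling" : "preceding";
        }
        if ((pos & Node.DOCUMENT_POSITION_FOLLOWING) != 0) {
            return sameParent ? "following, fsibling" : "following";
        }
        return "self";
    }

    /** Relation of the named element to the <target> element of the sample. */
    static String relationTo(String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Node target = doc.getElementsByTagName("target").item(0);
            return relation(target, doc.getElementsByTagName(name).item(0));
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse sample", ex);
        }
    }

    public static void main(String[] args) {
        for (String n : new String[] {"a", "b1", "c", "d1", "e1", "f", "g", "d2", "e2", "b2"}) {
            System.out.println(n + ": " + relationTo(n));
        }
    }
}
```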

One can use the argument section of a relative term to indicate more specifically which element to select, or even what kind of element content to select. The first argument, the only required one, denotes which element to select if the term applies to more than one element. A positive number n indicates that a processor should select the nth element from the top of the list of potential matches; a negative number -m indicates a processor should select the mth element from the bottom of the list of potential matches. One can use the keyword all in place of a number to select all potential matches.

With the following example, an XPointer expression id(start).preceding(2) would select the e element with an id of third, since it is the second element above the element with an id of start. An expression id(start).preceding(-2) would select the e element with an id of second, because it lies two elements from the beginning of the document. Note that an ancestor (in this case, myParent) counts neither as preceding nor following.

<myParent>
	<e id="first"/> 
	<e id="second"/> 
	<e id="third"/> 
	<e id="fourth"/> 
	<myPreviousTarget id="start"/>
</myParent>
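The instance argument amounts to indexing into the list of potential matches from either end. A minimal Java sketch follows; the class and method names are invented, and the sketch assumes that for preceding() the list runs from the nearest node back toward the start of the document, as the example above implies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class InstancePicker {
    /**
     * Picks the nth candidate from the top of the list when n is positive,
     * or the |n|th from the bottom when n is negative.
     */
    static <T> Optional<T> pick(List<T> candidates, int n) {
        int index = n > 0 ? n - 1 : candidates.size() + n;
        if (n == 0 || index < 0 || index >= candidates.size()) {
            return Optional.empty(); // no such instance
        }
        return Optional.of(candidates.get(index));
    }

    public static void main(String[] args) {
        // Candidates for id(start).preceding(...), nearest first (assumption).
        List<String> preceding = Arrays.asList("fourth", "third", "second", "first");
        System.out.println(pick(preceding, 2));
        System.out.println(pick(preceding, -2));
    }
}
```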

The second argument specifies what kind of element or element content an XLL processor should select. One can use an element name for this argument, indicating that an XLL processor can select elements of only that kind. One can also use the #element keyword for this argument to indicate that a processor can select any kind of element, but no element content. This argument defaults to #element. Other possible values include the #pi keyword, which indicates that a processor can select the content of only processing instructions; the #comment keyword, indicating it can select the content of only comments; the #cdata keyword, indicating it can select only character data contained within CDATA blocks; and the #text keyword, indicating it can select only character data contained within CDATA blocks or enclosed with element start and end tags. The #all keyword indicates that an XLL processor can select all of the above kinds.

The next examples all use the following XML document, where p represents the root element. The CDATA block constitutes the only substantive child text block of p:

<!-- xpointer_args_example.xml -->
<p>
	<a/>
	<b>some char data contained within an element</b>
	<!-- comment -->
	<?xml processing instruction?>
	<![CDATA[some char data contained with a CDATA block]]> 
	<c/>
</p>

An XPointer expression root().child(all,a) will select only the a element. An expression root().child(-1,#element) will select only the c element, because it embodies the first element from the end of the list of the root element's child elements. An expression root().child(3,#all) will select the comment, because it embodies the third node from the beginning of the list of the root element's child nodes. An expression root().child(1,#pi) will select the only processing instruction. An expression root().child(all,#text) will select the only CDATA block. An expression root().child(2,#cdata) will not select anything, because only one CDATA block exists as a child of p.
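The node-type argument maps naturally onto DOM node types. The following Java sketch applies a child() term to a copy of the example document; it is a hedged illustration with invented names. Two details are assumptions made so that a JAXP parser accepts the document and the counts match the prose: the processing instruction target is changed from xml to mypi (parsers reserve the target xml), and the whitespace between the children is removed so that no whitespace-only text nodes appear in the child list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ChildTermDemo {
    static final String XML =
          "<p><a/><b>some char data contained within an element</b>"
        + "<!-- comment --><?mypi processing instruction?>"
        + "<![CDATA[some char data contained within a CDATA block]]><c/></p>";

    /** True if node n satisfies the second argument of a relative term. */
    static boolean matches(Node n, String type) {
        short t = n.getNodeType();
        switch (type) {
            case "#element": return t == Node.ELEMENT_NODE;
            case "#pi":      return t == Node.PROCESSING_INSTRUCTION_NODE;
            case "#comment": return t == Node.COMMENT_NODE;
            case "#cdata":   return t == Node.CDATA_SECTION_NODE;
            case "#text":    return t == Node.TEXT_NODE || t == Node.CDATA_SECTION_NODE;
            case "#all":     return true;
            default:         return t == Node.ELEMENT_NODE && n.getNodeName().equals(type);
        }
    }

    /** DOM node name of the nth matching child of the root, or null if none. */
    static String child(int n, String type) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML))).getDocumentElement();
            List<Node> hits = new ArrayList<>();
            NodeList kids = root.getChildNodes();
            for (int i = 0; i < kids.getLength(); i++) {
                if (matches(kids.item(i), type)) hits.add(kids.item(i));
            }
            int index = n > 0 ? n - 1 : hits.size() + n;
            boolean found = n != 0 && index >= 0 && index < hits.size();
            return found ? hits.get(index).getNodeName() : null;
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse sample", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(child(-1, "#element"));
        System.out.println(child(3, "#all"));
        System.out.println(child(2, "#cdata"));
    }
}
```

The DOM reports a comment under the name #comment and a CDATA block under the name #cdata-section, so those are the strings this sketch returns for non-element nodes.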

The remaining arguments of a relative term consist of attribute name and value pairs. One can use them only with terms that select elements. One can use the symbol * in either a name or value argument to indicate that for an XLL processor to select it, an element must have an attribute with any name but a specific value, or a specific attribute with any value, respectively. One can also use the #IMPLIED keyword for an attribute value argument to indicate that for a processor to select it, an element must have an attribute with the specified name, but with no specified attribute value, and no default attribute value. In addition, an XLL processor treats argument attribute values enclosed in quotation marks as case sensitive, and those not in quotation marks as case insensitive.

An XPointer expression root().descendant(1,a,myAttrName, myAttrValue) selects the first a descendant of the root element that has a myAttrName attribute with the value myAttrValue. An expression origin().ancestor(-1,#element,myAttr,*) selects the closest ancestor of the element in which the expression resides that also possesses a myAttr attribute with any value specified, or with a default value. An expression origin().child(all,#element,*,"Hello") selects all elements parented by the element in which the expression resides that also have any attribute value of Hello (case sensitive). An expression child(all,shirt,color,blue, size,12) selects all shirt children of root that have a color attribute value of blue and a size attribute value of 12.
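The attribute arguments can be tested against a DOM element one name and value pair at a time. The Java sketch below covers the * wildcard and the quoted versus unquoted comparison described above; it omits the #IMPLIED keyword, and its class and method names, as well as the sample shirt element, are invented for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

final class AttributeArgs {
    /** Compares one attribute value against a value argument. */
    static boolean valueMatches(String actual, String wanted) {
        if (wanted.equals("*")) return true; // any value
        if (wanted.length() >= 2 && wanted.startsWith("\"") && wanted.endsWith("\"")) {
            // quoted argument: case sensitive
            return actual.equals(wanted.substring(1, wanted.length() - 1));
        }
        return actual.equalsIgnoreCase(wanted); // unquoted: case insensitive
    }

    /** True if element e satisfies one name and value argument pair. */
    static boolean matches(Element e, String name, String wanted) {
        if (!name.equals("*")) {
            return e.hasAttribute(name) && valueMatches(e.getAttribute(name), wanted);
        }
        NamedNodeMap attrs = e.getAttributes(); // "*": try every attribute
        for (int i = 0; i < attrs.getLength(); i++) {
            if (valueMatches(attrs.item(i).getNodeValue(), wanted)) return true;
        }
        return false;
    }

    /** Hypothetical element: <shirt color="Blue" size="12"/>. */
    static Element sampleShirt() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element shirt = doc.createElement("shirt");
            shirt.setAttribute("color", "Blue");
            shirt.setAttribute("size", "12");
            return shirt;
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Element shirt = sampleShirt();
        System.out.println(matches(shirt, "color", "blue"));
        System.out.println(matches(shirt, "color", "\"blue\""));
        System.out.println(matches(shirt, "*", "\"12\""));
    }
}
```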

4.2.3 String terms

One can use the string() term to select specific sections of character data in element content, CDATA block, comment, and processing instruction nodes. An XLL processor ignores markup characters (those within element tags, and those that comprise the <, >, <![CDATA[, ]]>, <!--, -->, <?, and ?> symbols). Previous terms in an XPointer expression should have already selected the specific nodes to use. The second argument of a string() term, which one must enclose with quotation marks, specifies a string within the previously selected node for which the XLL processor should search. The first argument specifies which match the processor should select: a positive number n selects the nth match counting from the beginning of the section, and a negative number -n selects the nth match counting back from the end of the section. If one passes all as the first argument, the XLL processor will select all instances of the specified string. If one passes a blank string ("") as the second argument, the first argument counts characters instead of matches: a positive number n begins the selection with the nth character from the start, and a negative number -m begins it with the mth character from the end.

The following examples use this XML excerpt:

<a id="begin-here">This is element content.</a>
<!-- my own comment -->
<?xml processing instruction?>
<![CDATA[This is CDATA content.]]>

An expression id(begin-here).string(2,"i") selects the second i (the i in the word is) from the a element content. An expression id(begin-here).following(1,#comment).string(3,"") selects the third character of the comment, the character y (note that the space at the beginning of the comment counts as the first character). An expression id(begin-here).string(-1," ") selects the space between element and content in the a element content. An expression id(begin-here).string(1,"a") would not select any string.

One can specify how many characters the XLL processor should select, and at what offset, with the last two arguments of a string() term. The third argument specifies the offset: a positive number n indicates that the processor should start n-1 characters to the right of the first character of the string selected with the first two arguments; a negative number -m indicates that the processor should start m characters to the left of the last character of that string. An offset value of end indicates that the processor should start immediately after the last character of the string selected by the first two arguments. The fourth argument denotes how many characters the XLL processor should select, counting rightward from the start position designated by the first three arguments. The offset argument defaults to +1, and the length argument defaults to 0.

With the above example XML excerpt, an expression id(begin-here).string(2,"i",4,7) selects the string element from the element content of a. An expression id(begin-here).fsibling(1,#pi).string(3,"",-2,3) selects the string xml from the processing instruction. An expression id(begin-here).following(2,#text).string(-1," ",-8,8) selects the string is CDATA from the CDATA content. An expression id(begin-here).fsibling(all,#cdata).string(all,"CDATA",7,7) will select the string content from the CDATA content.
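The offset arithmetic is easy to get wrong by one, so it may help to see it executed. The following Java sketch applies a string() term with numeric arguments to a plain Java String, following the rules as this section states them (a positive offset n starts n-1 characters right of the first matched character; a negative offset -m starts m characters left of the last matched character). The class and method names are invented, and the all and end keywords are left out.

```java
import java.util.Optional;

final class StringTerm {
    /** Evaluates string(instance, match, offset, length) against text. */
    static Optional<String> select(String text, int instance, String match,
                                   int offset, int length) {
        int first = -1;     // index of the first matched character
        int matchLen;
        if (instance == 0 || offset == 0) return Optional.empty();
        if (match.isEmpty()) {
            // Blank string: the instance counts characters, not matches.
            first = instance > 0 ? instance - 1 : text.length() + instance;
            matchLen = 1;
            if (first < 0 || first >= text.length()) return Optional.empty();
        } else {
            matchLen = match.length();
            int from = instance > 0 ? 0 : text.length();
            for (int i = 0; i < Math.abs(instance); i++) {
                first = instance > 0 ? text.indexOf(match, from)
                                     : text.lastIndexOf(match, from);
                if (first < 0) return Optional.empty(); // not enough matches
                from = instance > 0 ? first + 1 : first - 1;
            }
        }
        int last = first + matchLen - 1;
        int start = offset > 0 ? first + offset - 1 : last + offset;
        if (start < 0 || length < 0 || start + length > text.length()) {
            return Optional.empty();
        }
        return Optional.of(text.substring(start, start + length));
    }

    public static void main(String[] args) {
        System.out.println(select("This is CDATA content.", -1, " ", -8, 8));
        System.out.println(select(" my own comment ", 3, "", 1, 1));
        System.out.println(select("This is element content.", 1, "a", 1, 1));
    }
}
```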

4.2.4 Spanning terms

One can use a "spanning" term to select a sequence of elements or other nodes. The span() term takes two arguments, both XPointers themselves, which delimit the sequence. An XLL processor concatenates the terms preceding the span() term onto both argument XPointers to produce a node with which the span begins and a node with which the span ends. When one applies the XPointer expression root().span(child(1,#element), child(-1,#element).child(all)) to the following XML document

<!-- spanning_example.xml -->
<p>
	<a>
		<b/>
	</a>
	<?mypi my processing instruction?>
	<b/>
	<!-- my comment -->
	<c>
		<d/>
	</c>
</p>

an XLL processor will select

	<a>
		<b/>
	</a>
	<?mypi my processing instruction?>
	<b/>
	<!-- my comment -->
	<c>
		<d/>

even though, missing a c element end tag, the selection does not constitute a well-formed XML subtree. The selection begins with a, produced by root().child(1,#element), and ends with d, produced with root().child(-1,#element).child(all). Attribute terms represent the only kind of term that one cannot use as an argument for the span term.

4.2.5 Attribute terms

An "attribute" term selects an attribute value. A term that selects an element must directly precede an attribute term. The attr() term takes the name of an attribute as its argument. For example, root().descendant(1,shirt,color,*).attr(color) will return the color attribute value of the first shirt descendant of the root node that has a color attribute. The expression descendant(1,shirt).attr(color) will return the color attribute value of the first shirt descendant of the root node, even if the shirt has no color attribute specified. If it has no color attribute, the expression will return a blank string.
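The DOM behaves the same way: Element.getAttribute() returns an empty string when the element carries no such attribute. The Java sketch below mirrors the two expressions above on a small made-up catalogue; the document, class, and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AttrTermDemo {
    // Hypothetical catalogue: the first shirt has no color attribute.
    static final String XML =
        "<catalogue><shirt/><shirt color=\"blue\"/></catalogue>";

    private static NodeList shirts() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getElementsByTagName("shirt"); // descendants, document order
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse sample", ex);
        }
    }

    /** Mirrors descendant(1,shirt).attr(color). */
    static String firstShirtColor() {
        return ((Element) shirts().item(0)).getAttribute("color");
    }

    /** Mirrors root().descendant(1,shirt,color,*).attr(color). */
    static String firstColoredShirtColor() {
        NodeList list = shirts();
        for (int i = 0; i < list.getLength(); i++) {
            Element shirt = (Element) list.item(i);
            if (shirt.hasAttribute("color")) return shirt.getAttribute("color");
        }
        return null; // no shirt has a color attribute
    }

    public static void main(String[] args) {
        System.out.println("[" + firstShirtColor() + "]");
        System.out.println("[" + firstColoredShirtColor() + "]");
    }
}
```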

4.3 Case Study: The Annotated XML Specification

Tim Bray, one of the XML specification's editors, came up with an innovative use for XML: He annotated the XML specification using a few simple XML elements and a little XLL. Bray also wrote an interesting piece about how he did it, which provides the source for this study [14]. Some of the syntax that Bray used has become obsolete; this study replaces it with current usage.

The plan Bray devised involved creating a large XML file that would contain all the annotations and would reference the specification XML file via XLink and XPointer tags. The editors of the XML version of the XML specification file copiously marked up its contents, providing plenty of unique elements Bray could select with XPointer terms. A brief, unbalanced selection follows:

<body> 
<div1 id='sec-intro'>
<head>Introduction</head>
<p>Extensible Markup Language, abbreviated XML, describes a class of data 
objects called <termref def="dt-xml-doc">XML documents</termref> and 
partially describes the behavior of 
computer programs which process them. XML is an application profile or restricted 
form of SGML, the Standard Generalized Markup Language <bibref 
ref='ISO8879'/>. By construction, XML documents are conforming SGML 
documents.</p>
<p><termdef id="dt-xml-proc" term="XML Processor">A software module 
called an <term>XML processor</term> is used to read XML documents and 
provide access to their content and structure.</termdef>

When a browser loads the XLL file containing the annotations, the XML specification file, and an XSL stylesheet for the spec, Bray intended the browser to lay out the files like so

xml specification field

annotation field

where a user would click on highlighted elements in the XML specification frame to read the various annotation materials in the annotation frame. Had any XLL-enabled XML/XSL viewers existed, he would have needed to complete only this first step. Because none did, Bray wrote a Java program that split up the annotations into individual files and altered a copy of the XML specification to incorporate links to those annotation files with conventional HTML.

Before Bray wrote any annotation content, he first created three unique XML elements to contain the XLink and XPointer information, as well as to hold the annotations themselves. These elements effectively reverse a conventional HTML link. One element, named here, holds individual annotation text, and serves as the target end of a link. A spec element indicates to what part of the XML specification an annotation refers. A user initiates a traversal from the XML specification to an annotation section by selecting an element in the XML specification to which a spec element refers. The x element contains one here element and one or more spec elements, joining the link initiation locators with a target locator. Thus, even though an XML browser user would appear to traverse a link from the XML specification to the annotation text directly, the link information actually exists in the annotation file, rather than in the XML specification file.

4.3.1 `x` tag

In the annotation DTD, Bray indicates that the x tag takes the form of an extended link by fixing its form attribute as extended. The x tag also adopts XLL inline, content-role, and content-title attributes. Bray fixed the inline attribute value to indicate that the extended link does not possess a local resource; locators exclusively compose all of the link's content. The content-role and content-title values note the kind of content the entire link possesses. Bray created the custom id attribute to mark each x instance with a unique identifier.

<!-- an x consists of one here element followed by at least one spec element -->
<!ELEMENT x (here, spec+)>
<!ATTLIST x
	xlink:form		CDATA	#FIXED	"extended"
	xlink:inline		CDATA	#FIXED	"false"
	xlink:content-role	CDATA	#FIXED	"commentary"
	xlink:content-title	CDATA	#FIXED	"Annotation" 
	id			ID	#REQUIRED>

Because of all the #FIXED attributes, to create an x element one only needs to enter the element's id:

<x id="myLinkID"><!-- link content (here & spec tags) --></x>

4.3.2 `spec` tag

The spec tag uses the xlink:form attribute value locator, establishing its part in an extended link. The href attribute value for a given spec tag refers into the XML specification document using XPointer syntax. Because Bray intends for a viewer to jump from the XML specification to the annotation material, he sets the actuate attribute to user and the show attribute to replace. Thus a browser should wait for a user to click on the end of the link in the XML specification before the browser displays the annotation end of the link by replacing the previously selected annotation. He uses the role attribute to indicate the kind of annotation a specific locator represents.

<!ELEMENT spec EMPTY>
<!ATTLIST spec
	xlink:form	CDATA		#FIXED			"locator"
	xlink:actuate	CDATA		#FIXED			"user"
	xlink:show	CDATA		#FIXED			"replace"
	xlink:role	(Using|History|Tech|Misc|Example)	"Misc"
	xlink:title	CDATA					"Into XML Specification" 
	xlink:href	CDATA		#REQUIRED>

Bray typically uses this tag with an entity reference in place of the URI and connector for the XML specification. During processing of the following line, an XML processor would replace the entity reference &s; with the full URI and pound sign XPointer connector (e.g., http://www.w3.org/TR/1998/REC-xml-19980210.xml#).

<spec xlink:href="&s;id(dt-xml-proc)" xlink:role="Misc"/>

Recall that the XPointer expression id(dt-xml-proc) selects the element with a unique ID attribute of dt-xml-proc.
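An application never sees the entity reference itself, because the parser expands it before handing over the attribute value. The following Java sketch illustrates this with a JAXP parser; the declaration of the s entity in an internal subset is an assumption about how the annotation file might define it, and the class and method names are invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class EntityHrefDemo {
    // Assumed declaration of the s entity, plus one spec locator.
    static final String XML =
          "<!DOCTYPE x [\n"
        + "<!ENTITY s \"http://www.w3.org/TR/1998/REC-xml-19980210.xml#\">\n"
        + "]>\n"
        + "<x id=\"myLinkID\">"
        + "<spec xlink:href=\"&s;id(dt-xml-proc)\" xlink:role=\"Misc\"/>"
        + "</x>";

    /** The href of the spec element, as delivered by the parser. */
    static String expandedHref() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Element spec = (Element) doc.getElementsByTagName("spec").item(0);
            return spec.getAttribute("xlink:href");
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse sample", ex);
        }
    }

    /** Splits an href into { document URI, XPointer expression }. */
    static String[] splitHref(String href) {
        int hash = href.indexOf('#');
        if (hash < 0) return new String[] { href, "" };
        return new String[] { href.substring(0, hash), href.substring(hash + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitHref(expandedHref());
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```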

4.3.3 `here` tag

Again, in specifying the here tag, Bray uses the form attribute value locator. The role attribute value indicates that the link contains annotation data. The href attribute value, an XPointer absolute term, indicates that the here tag points to itself. Thus the here element actually contains the remote resource to which it points. The title attribute value designates a title for each individual here tag. Only the index attribute does not come from the XLink specification; its value provides an index title for individual here tags, which Bray can use later to build an index of all the annotations.

<!ELEMENT here ANY>
<!ATTLIST here
	xlink:form	CDATA	#FIXED	"locator"
	xlink:role	CDATA	#FIXED	"annotation"
	xlink:title	CDATA	#REQUIRED
	xlink:href	CDATA	#FIXED	"#origin()"
	index		CDATA	#IMPLIED>

To use a here tag, Bray only has to designate its title and, optionally, its index. The actual annotation text (in HTML) goes between the start and end tags:

<here xlink:title="XML Processor"><html:P>An XML processor usually 
is a code module completely distinct from the application which uses it.</html:P></here>

4.3.4 Link example

The example below illustrates what a complete extended annotation link looks like. The x tag groups together the annotation text and the annotation's locations in the XML specification. The first extended link shows two references from the XML specification to the XML Processor annotation. When a user selects either of the two elements with ID-typed attributes of dt-xml-proc and dt-xml-appl, a browser will display the formatted text <html:P>An XML processor usually is a code module completely distinct from the application that uses it.</html:P> in the annotation field.

<x id="xml-processor-link">
	<here xlink:title="XML Processor">
		<html:P>An XML processor usually is a code module 
		completely distinct from the application that uses it.</html:P>
	</here>

	<spec xlink:href="&s;id(dt-xml-proc)" xlink:role="Misc"/>

	<spec xlink:href="&s;id(dt-xml-appl)" xlink:role="Misc"/>
</x>

The next example shows an annotation link that references the annotation from only one location in the XML specification. The index entry for this element will take the label Processing, DOM. Notice that this link has the same specification element as the above example. When a user clicks on the element with the ID of dt-xml-proc in the XML specification, a browser compatible with the XLink format of the annotations might display both annotations, combined, in the same annotation frame. On the other hand, a browser might instead use some interface method that distinguished between the selection of different links stemming from the same element.

<x id="dom-processing-link">
	<here xlink:title="DOM Processing" index="Processing, DOM">
		<html:P>See the <html:A
		HREF="http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001">DOM
		specification</html:A> for more 
		information regarding how an XML processor may 
		interact with an XML application.</html:P>
	</here>

	<spec xlink:href="&s;id(dt-xml-proc)" xlink:role="Tech"/>
</x>
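A browser that must decide which annotations to show for a selected element needs an index from each spec target to the here elements linked with it. The Java sketch below builds such an index from a condensed copy of the two links above. It is an illustration with invented names, not Bray's program: the enclosing annotations element and the file name spec.xml (standing in for the &s; entity) are assumptions, and the annotation text is shortened.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AnnotationIndex {
    static final String XML =
          "<annotations>"
        + "<x id=\"xml-processor-link\">"
        + "<here xlink:title=\"XML Processor\"><html:P>...</html:P></here>"
        + "<spec xlink:href=\"spec.xml#id(dt-xml-proc)\" xlink:role=\"Misc\"/>"
        + "<spec xlink:href=\"spec.xml#id(dt-xml-appl)\" xlink:role=\"Misc\"/>"
        + "</x>"
        + "<x id=\"dom-processing-link\">"
        + "<here xlink:title=\"DOM Processing\" index=\"Processing, DOM\">"
        + "<html:P>...</html:P></here>"
        + "<spec xlink:href=\"spec.xml#id(dt-xml-proc)\" xlink:role=\"Tech\"/>"
        + "</x>"
        + "</annotations>";

    /** Maps each XPointer in a spec href to the titles of its annotations. */
    static Map<String, List<String>> byTarget() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Map<String, List<String>> index = new LinkedHashMap<>();
            NodeList links = doc.getElementsByTagName("x");
            for (int i = 0; i < links.getLength(); i++) {
                Element x = (Element) links.item(i);
                Element here = (Element) x.getElementsByTagName("here").item(0);
                String title = here.getAttribute("xlink:title");
                NodeList specs = x.getElementsByTagName("spec");
                for (int j = 0; j < specs.getLength(); j++) {
                    String href = ((Element) specs.item(j)).getAttribute("xlink:href");
                    String pointer = href.substring(href.indexOf('#') + 1);
                    index.computeIfAbsent(pointer, k -> new ArrayList<>()).add(title);
                }
            }
            return index;
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse sample", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(byTarget());
    }
}
```

The entry for id(dt-xml-proc) holds two titles, which is the situation in which a browser must either combine the annotations or let the user choose between them.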

5 Application Description

The application included with this paper implements an XML browser using several existing Java XML processing modules. This application, Link, renders XML documents with HTML formatting using XSL stylesheets, and simulates both simple and extended XLinks. Although XLink tags can describe a variety of undefined application-specific functionality, Link attempts to serve as a general XML/XLL browser. When a user loads a document, either from an "open file" dialog or by a link traversal, Link opens that document into a new window. It displays a link icon in front of the character data content of every element to which an XLink locator points, and in front of the character data of every XLink element that has inline content.

When a user selects a link icon (by double-click-and-holding the mouse on the icon), a popup menu appears. This menu contains an item for every locator in the XLink, excluding the locator to which the icon belongs. The menu items consist of the title attributes of the locators. If the XLink element also has inline content, this menu includes an item with the element's content-title attribute. If a user selects one of the items from the popup menu, Link opens the document to which the corresponding locator refers, if unopened, or highlights the document's window, if open. If the locator has an XPointer expression connected to it, Link resolves the expression and scrolls to the content of the element that the XPointer selects.

When a user closes a document that has XLink elements, Link retains the XLinks in memory, and continues to apply them to open documents, as well as to documents that it opens during the same session. When a user opens a document that has XLink elements, Link tries to apply the new XLinks to every open document, and re-renders those documents to which the new XLinks apply. Link also opens automatically any document referred to from a group XLink element.

5.1 Implementation

5.1.1 Overview

Behind the scenes, Link transforms an XML document into HTML using an XSL processor. It then adds HTML anchor tags to the resulting HTML file based on all the XLL tags in all the open documents that refer to the original XML file. Although Link mainly uses the Docuverse DOM implementation [15] to manipulate XML documents, it also relies on James Clark's XP XML parser [16] and the SAX parser API [17] to parse XML files for the DOM. It employs James Clark's XT XSL processor [18] to convert XML files to HTML. Once Link converts an XML file to HTML, it displays and renders the HTML file using Sun's Swing Java interface components [19].

When loaded, Link, written entirely in Java, starts with the static main() method of its App class. This method creates an App instance, which extends a Java window component, Frame. The App object sets up the application menubar, and also serves as a progress bar. From this menu bar, a user can select a local or remote XML file to view. Once he or she selects a file, the App class creates new Viewer and ViewerWindow objects using a static ViewerFactory method. A ViewerWindow holds a Viewer, and a Viewer component displays an HTML file. It extends the Swing JEditorPane class, which renders an HTML file when passed a String URI or Java URL object via the setPage() method.

The Link application overrides the setPage() method to ensure that it displays every document a user selects to view in a unique window, and that no document has more than one window. If the document contains XML, Link begins processing its XLink elements. When processing a URI, a JEditorPane object calls its own getStream() method, which returns the Java InputStream object that refers to the file which the JEditorPane will process. Link also overrides this method, and if the file contains XML, it finishes processing the XLinks. After processing the XLinks, Link copies the processed XML file to a local temp folder, processes the XSL stylesheet associated with the XML file and copies it to the local temp folder, and then passes both files to the XT XSL processor. Finally, the Viewer object's getStream() method returns an InputStream object referring to the HTML file that resulted from XT's processing.
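The transformation step at the end of this pipeline can be sketched with the JAXP transformation API that ships with current Java releases. This is a stand-in, not Link's code: Link calls the XT processor and the 1998 XSL draft syntax, whereas the sketch below uses javax.xml.transform and an XSLT 1.0 stylesheet, and the catalogue document, stylesheet, and class name are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StylesheetStep {
    // Hypothetical XSLT 1.0 stylesheet: one heading per shirt.
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"/catalogue\">"
        + "<html><body><xsl:apply-templates/></body></html>"
        + "</xsl:template>"
        + "<xsl:template match=\"shirt\">"
        + "<h1><xsl:value-of select=\"@name\"/></h1>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet text to the XML text and returns the HTML. */
    static String toHtml(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("transformation failed", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml("<catalogue><shirt name=\"Oxford\"/></catalogue>", XSL));
    }
}
```

A getStream() implementation in this style would wrap the resulting HTML in an InputStream instead of returning a String.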

5.1.2 Link processing

When processing XLinks, a Viewer calls its own addLinks() method. This method opens the source XML file into memory using the DOM. Link steps through the source file recursively using the addLinks(Element,Vector) method. For every simple or extended XLink element it finds, it adds that DOM Element object to a static Java Vector called xlinks. This Vector, as a static object, remains available to all Viewer objects. The xlinks Vector allows Link to find the elements joined by an XLink when a user tries to traverse the link. If an XLink element has inline content, the addLinks(Element, Vector) method will call addAnchor(), which inserts an html:a anchor tag into the XLink element as its first child. The XSL processor will later expand this tag to produce the desired HTML functionality.

For every simple XLink, addLinks(Element,Vector) calls addSimpleLink(). This method adds a Locator object to the static xpointers Vector that encodes all the information Link will need to resolve an XPointer in the href attribute of the XLink element. For every extended XLink, addLinks(Element,Vector) calls addExtendedLink(). This method also adds a Locator object to the xpointers Vector for each locator child of the extended element. For every group XLink, addLinks(Element,Vector) calls addGroupLink(), opening all the documents to which the group refers. All three methods will add the file name in every href attribute they encounter to the viewers Vector. This allows the original caller, addLinks(), to update with the new XLink information each document, if open, via displayPage().

When addGroupLink() opens a new document, it uses a ViewerFactory method that directs the Viewer setPage() method not to display the new document automatically. That way, the addLinks() method of each new Viewer can add its document's links to the xlinks and xpointers Vectors before Link renders or re-renders any new or already open documents. In all other cases, the Viewer setPage() method calls the JEditorPane setPage() method. This happens, for example, when addLinks() calls displayPage() for all new and updated Viewers, which in turn calls the setPage() method of each Viewer. The JEditorPane setPage() method calls the overridden Viewer getStream() method, which calls two Viewer methods, processLinks() and processStylesheet().

The processLinks() method steps through each Locator object in the xpointers Vector. If the URI of the Viewer's document matches that of the Locator's href String, it removes the Locator from the Vector, and resolves the XPointer expression, if present, using an XPointer object's getSelection() method. The processLinks() method then, with the addAnchor() method, inserts an html:a tag as the first child of the elements selected by the XPointer. In this way, Link processes each XLink locator only once, while any unprocessed XLink locators remain for documents opened in the future.
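The "process each locator only once" behavior can be sketched as a single pass that removes matching entries from the shared Vector while leaving the rest in place. The class LocatorQueue, its nested Locator type, and the method name takeLocatorsFor() are hypothetical; resolving the XPointer and inserting the anchor are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Vector;

final class LocatorQueue {
    /** Hypothetical stand-in for Link's Locator: a document URI plus an optional XPointer. */
    static final class Locator {
        final String href;
        final String xpointer; // null when the locator has no XPointer

        Locator(String href, String xpointer) {
            this.href = href;
            this.xpointer = xpointer;
        }
    }

    /**
     * Removes and returns every Locator whose href matches documentUri.
     * Locators for other documents stay in the Vector for later use.
     */
    static List<Locator> takeLocatorsFor(String documentUri, Vector<Locator> xpointers) {
        List<Locator> taken = new ArrayList<>();
        // Removing through the Iterator avoids skipping entries while the Vector shrinks.
        for (Iterator<Locator> it = xpointers.iterator(); it.hasNext();) {
            Locator locator = it.next();
            if (locator.href.equals(documentUri)) {
                it.remove();
                taken.add(locator);
            }
        }
        return taken;
    }

    public static void main(String[] args) {
        Vector<Locator> xpointers = new Vector<>();
        xpointers.add(new Locator("a.xml", "id(x)"));
        xpointers.add(new Locator("b.xml", null));
        List<Locator> mine = takeLocatorsFor("a.xml", xpointers);
        System.out.println(mine.size() + " taken, " + xpointers.size() + " left");
    }
}
```

A caller would then resolve each returned Locator's XPointer against the open document and insert an anchor at the selected elements.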

The processStylesheet() method first attempts to locate a local default XSL stylesheet with the same name as the document element of the Viewer's XML document. Then it attempts to locate the stylesheet indicated by an xml-stylesheet processing instruction in the XML file, which it prefers over its default stylesheet. Once it finds a stylesheet, it copies the stylesheet with the Viewer object's createStylesheet() method, which modifies the copy. The createStylesheet() method adds an xsl:template rule to the stylesheet which specifies that, for every html:a tag it processes in the source XML file, an XSL processor should insert an html:a anchor tag around a small link icon in the resulting HTML file. It also adds both href and name attributes to the html:a tag, which refer back to the XLink Element that defined the link.

In the XSL file, Link also modifies every template element to ensure that the XSL processor will have a chance to process each html:a tag. Recall that an XSL processor will process the templates of child elements only if the templates of their parents have either an xsl:apply-templates tag with no select attribute (a "general" apply-templates tag), or an xsl:apply-templates tag with a select attribute that specifically designates the child's name. The createStylesheet() method, with the addApplyTemplates() method, inserts an <xsl:apply-templates select="html:a"/> tag into every template of an XLink element that does not have a general apply-templates tag.
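A simplified sketch of this stylesheet rewrite follows. It is not Link's actual addApplyTemplates() method: it patches every xsl:template rather than only the templates of XLink elements, it assumes the new tag belongs at the start of the template, and the class name ApplyTemplatesPatcher and the parse() helper are hypothetical. It uses a DOM parser without namespace processing, so prefixed names are matched literally.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ApplyTemplatesPatcher {
    /** True if the template has a general apply-templates tag or already selects html:a. */
    static boolean reachesAnchors(Element template) {
        NodeList applies = template.getElementsByTagName("xsl:apply-templates");
        for (int i = 0; i < applies.getLength(); i++) {
            Element apply = (Element) applies.item(i);
            if (!apply.hasAttribute("select") || apply.getAttribute("select").equals("html:a")) {
                return true;
            }
        }
        return false;
    }

    /** Adds an apply-templates tag for html:a where needed; returns the number of templates changed. */
    static int addApplyTemplates(Document stylesheet) {
        NodeList templates = stylesheet.getElementsByTagName("xsl:template");
        int changed = 0;
        for (int i = 0; i < templates.getLength(); i++) {
            Element template = (Element) templates.item(i);
            if (reachesAnchors(template)) {
                continue;
            }
            Element apply = stylesheet.createElement("xsl:apply-templates");
            apply.setAttribute("select", "html:a");
            // Assumption: the tag goes first, mirroring the html:a inserted as a first child.
            template.insertBefore(apply, template.getFirstChild());
            changed++;
        }
        return changed;
    }

    /** Parses an XML string into a DOM tree (helper for this example only). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Checking for an existing html:a selection keeps the rewrite from adding a second tag if the same stylesheet is processed again.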

Once Link finishes modifying the stylesheet, it invokes XT to convert the source XML file into a result HTML file, and then allows the JEditorPane routines to render the result file in the Viewer component. When a user clicks on the rendered result of an html:a tag, the JEditorPane generates a HyperlinkEvent. The Viewer in which this event occurred (recall that a Viewer object extends the JEditorPane class) traps the event with the hyperlinkUpdate() method. This method receives the HyperlinkEvent object as a parameter, which it uses to determine the value of the href attribute of the html:a tag that generated the event. Instead of encoding a real URI in the href attribute of html:a tags, Link writes the xlinks index of the XLink element that the html:a tag represents, as well as another number indicating the particular locator it embodies.
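The idea of packing two numbers into an href value can be sketched as below. The exact format Link writes is not reproduced here; the "linkIndex.locatorIndex" layout and the class name AnchorRef are assumptions for illustration. In a Swing listener, the raw href string of the activated anchor would typically come from the HyperlinkEvent getDescription() method.

```java
final class AnchorRef {
    /** Packs an xlinks index and a locator number into one href value (hypothetical format). */
    static String encode(int linkIndex, int locatorIndex) {
        return linkIndex + "." + locatorIndex;
    }

    /** Unpacks an href produced by encode(); returns {linkIndex, locatorIndex}. */
    static int[] decode(String href) {
        int dot = href.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("not an encoded link reference: " + href);
        }
        return new int[] {
            Integer.parseInt(href.substring(0, dot)),
            Integer.parseInt(href.substring(dot + 1))
        };
    }

    public static void main(String[] args) {
        String href = encode(3, 1);
        int[] parts = decode(href);
        System.out.println(href + " refers to link " + parts[0] + ", locator " + parts[1]);
    }
}
```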

When a user selects a link icon, Link uses this information to build a popup menu containing the title attribute (or, in the case of an inline XLink element, the content-title attribute) of every locator except the one that the icon embodies. Because the JEditorPane implementation that Link uses only passes HyperlinkEvent objects after a user has released the mouse button, Link pops up the menu on the next MOUSE_PRESSED MouseEvent, which it also traps. The Viewer actionPerformed() method traps the user's selection in the popup menu. The getActionCommand() method of the menu-generated ActionEvent returns the xlinks item and locator information that actionPerformed() uses to scroll to the selected locator. To do this, it uses the familiar setPage() method, which opens the selected document, if not already open, and scrolls to the correct element location.

The "named anchor" mechanism of HTML allows the JEditorPane to scroll correctly. Recall that for each html:a tag, Link created both an href attribute and a name attribute. When the value of a name attribute appears at the end of a selected anchor's href URI, like myName in the relative URI myFile.html#myName, an HTML browser automatically scrolls to the location of the html:a tag with that name attribute value. In this way, each html:a tag Link adds to a result HTML file can serve as both a starting anchor and an ending anchor, depending on whether a user clicked on it or on one of its complements (for an extended link, the other locators; for a simple link, either the XLink element itself or the element to which it points).

5.2 Discussion

While Link demonstrates one way to implement XLL in an XML browser, one could take a number of different approaches. One could write a custom browser, perhaps rendering a different set of formatting tags than HTML, such as the formatting objects described in the XSL specification. Such a browser might have the ability to keep track of the element content referenced by XLinks directly, without playing games with HTML anchor tags, by noting the location of the content during both the XSL processing and the formatting tag rendering stages. An application like this could probably also store linking information more efficiently than the xlinks and xpointers vectors of Link. In fact, contrary to what Link does, an application need not continue to hold the XML structure of processed documents in memory -- although the application may have to reload already-processed documents when it processes new XLink documents.

If an application uses XSL stylesheets, it will have to go through a stage where it transforms the source XML file into a formatted file with a potentially different structure. Using a CSS stylesheet simplifies this process, as it allows a browser to render XML elements directly using the structure of the XML document itself. (Recall that CSS style rules directly impose formatting properties on specified XML elements.) This avoids creating a separate result file, and enables a browser to apply XPointer expressions to an XML document directly, instead of trying to locate element content within the XSL-transformed and formatted file. Unfortunately, CSS falls short when a user wants to view only certain parts of an XML document, or to view it in a format at odds with the XML document's structure.

With an application designed to work with only certain specific kinds of XML documents, one could actually take advantage of XSL's ability to transform documents by using stylesheets specific to the application. In some cases, especially with simple links, rules in such stylesheets could automatically add anchor tag formatting around XLink elements and the elements to which XLinks refer. Such applications might also display a group of XLinked documents within a single window. (The HotdogLink and ClothingLink examples in the XLink section of the XLL chapter above describe applications like this.) They probably would take advantage of the xlink:role and xlink:behavior attributes to show certain documents in certain frames in response to a user's XLink selection. They might also choose not to display at all those documents that contain mainly XLink information.

While the potential exists to build XLL-enabled browsers with a wide variety of features and functionality, they must share some things in common. They should all have the ability to process newly-opened files with the XLink information from already-opened files, and the ability to process already-opened files with the XLink information from newly-opened files. This necessitates an XLink processing algorithm similar to the one Link demonstrates. Although specific applications may require extensive modifications to both its methods and the data members it stores, Link, primarily through its Viewer class, can serve as a template for future XLL-processing browsers.

A References

[1] The World Wide Web Consortium, W3C, <http://www.w3.org/>, 1 Mar. 1999.

[2] R. Cover, The SGML/XML Web Page, <http://www.oasis-open.org/cover/>, 1 Mar. 1999.

[3] T. Bray, J. Paoli, C. M. Sperberg-McQueen, eds., "Extensible Markup Language 1.0," <http://www.w3.org/TR/1998/REC-xml-19980210>, 1 Mar. 1999.

[4] T. Bray, D. Hollander, A. Layman, eds., "Namespaces in XML," <http://www.w3.org/TR/1999/REC-xml-names-19990114>, 1 Mar. 1999.

[5] M. Champion, et al., eds, "Document Object Model (Core) Level 1," Document Object Model Level 1 Specification, eds. L. Wood, et al., <http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/level-one-core.html>, 1 Mar. 1999.

[6] Sun Microsystems, Inc., "Java Speech Markup Language Specification," draft, <http://java.sun.com/products/java-media/speech/forDevelopers/JSML/index.html>, 1 Mar. 1999.

[7] J. Clark, S. Deach, eds., "Extensible Stylesheet Language 1.0," draft, <http://www.w3.org/TR/1998/WD-xsl-19981216>, 1 Mar. 1999.

[8] J. Clark, ed., "Associating Stylesheets with XML Documents," draft, <http://www.w3.org/TR/1999/PR-xml-stylesheet-19990114>, 1 Mar. 1999.

[9] H. W. Lie, B. Bos, eds., "Cascading Style Sheets, Level 1," <http://www.w3.org/TR/1999/REC-CSS1-19990111>, 1 Mar. 1999.

[10] S. Russell, "Docproc," <http://javalab.uoregon.edu/ser/software/docproc/index.xml>, 14 Nov. 1998.

[11] E. Maler, S. DeRose, eds., "XML Linking Language," draft, <http://www.w3.org/TR/1998/WD-xlink-19980303>, 1 Mar. 1999.

[12] E. Maler, "Xlink's 'xml:link' to Be 'xlink:form,'" archived email, <http://www.oasis-open.org/cover/xmlColonLinkChanged.html>, 1 Mar. 1999.

[13] E. Maler, S. DeRose, eds., "XML Pointer Language," draft, <http://www.w3.org/TR/1998/WD-xptr-19980303>, 1 Mar. 1999.

[14] T. Bray, "Building the Annotated XML Specification," <http://www.xml.com/xml/pub/98/09/exexegesis-0.html>, 1 Mar. 1999.

[15] Docuverse DOM SDK, Java software library, Docuverse, vers. 1.0PRB3, <http://www.docuverse.com/domsdk/index.html>, 1 Mar. 1999.

[16] J. Clark, XP, Java software library, vers. 0.5, <http://www.jclark.com/xml/xp/index.html>, 1 Mar. 1999.

[17] D. Megginson, et al., The Simple API for XML (SAX), Java software library, XML-DEV mailing list, vers. 1.0, <http://www.megginson.com/SAX/index.html>, 1 Mar. 1999.

[18] J. Clark, XT, Java software library, vers. 19990115, <http://www.jclark.com/xml/xt.html>, 1 Mar. 1999.

[19] Swing, Java software library, Sun Microsystems, Inc., vers. 1.1, <http://www.javasoft.com/products/jfc/download.html>, 1 Mar. 1999.
