[This local archive copy is mirrored from the canonical site: http://sunsite.unc.edu/xml/books/xml/; links may not have complete integrity, so use the canonical document at this URL if possible.]

XML: Extensible Markup Language

Welcome to XML. After reading this book I hope you'll agree with me that XML is the most exciting development on the Internet since Java, and that it makes web site development easier, more productive, and more fun.

This book is your introduction to the exciting and fast-growing world of XML. In this book you'll learn how to write documents in XML and how to use XSL style sheets to convert those documents into HTML so legacy browsers can read them. You'll also learn how to use DTDs to describe and validate documents. This will become increasingly important as more and more browsers, such as Netscape and Internet Explorer 5.0, provide native support for XML.

This book is the first to look at XML not from the perspective of a software developer but rather that of a web page author. It doesn't spend a lot of pages talking about BNF grammars or parsing element trees. Instead it shows you how you can use XML and existing tools today to more efficiently and productively produce powerful web sites.

Who You Are

This book is aimed squarely at web site developers. I assume that you want to use XML to produce web sites that are difficult to impossible to create with raw HTML. You'll be amazed to discover that in conjunction with XSL style sheets and a few free tools, XML lets you do things that previously required either custom software costing hundreds to thousands of dollars per developer or extensive knowledge of programming languages like Perl. None of the software in this book will cost you more than a few minutes of download time. None of the tricks require any programming beyond the most basic cut-and-paste JavaScript.

However, XML does build on HTML and the underlying infrastructure of the Internet. To that end, I will assume you know how to FTP files, send email, and load URLs in your web browser of choice. I will also assume you have a reasonable knowledge of HTML at about the level of Netscape 1.0. On the other hand, when this book discusses newer aspects of HTML that are not yet in widespread use, like cascading style sheets or the <SPAN> and <DIV> tags, I will cover them in depth.

To be more specific:

On the other hand, there are a number of things I do not assume you know. In particular:

There are also a number of things that are not prerequisites for this book but that would nonetheless be helpful to know once you begin writing XML files. Among others, these include:

You don't need to know any of these things to learn to write XML files, any more than you need to know them to write HTML. (You absolutely do not have to know how to read Chinese to read this book. It is written in English, after all.) Nonetheless readers who do understand these topics will find a few sections of this book more compelling than readers who do not. If you are familiar with one or more of these topics, then certain practices can be more easily motivated and explained. If you don't know about these things, you can still write XML; you'll just be asked to accept that there is indeed a method to the madness behind certain rules of XML that appear arbitrary on the surface.

One final note: this book assumes you're using Windows 95 or NT 4.0 or later. As a longtime Mac and Unix user, I do regret this. Like Java, XML is supposed to be platform independent. Also like Java, the reality is somewhat short of the hype. Although XML code is pure text that can be written with any editor, there are some crucial tools that are currently available only on Windows. I very much hope that in the not too distant future these tools will be made available on the Macintosh and Unix. But until that becomes true, XML development will remain primarily a PC-based activity.

What You'll Learn

This book has one primary goal: to teach you to write XML documents for the web. The next three hundred pages or so are going to show you how to do exactly that. Fortunately XML has a decidedly unsteep learning curve, much like HTML (and unlike SGML). As you learn a little you can do a little. As you learn a little more, you can do a little more. Thus the chapters in this book build steadily on each other. They are meant to be read in sequence. Along the way you'll learn:

In the final section of this book, you'll see several practical examples of XML being used for real-world applications including:

This book is divided into four parts:

  1. XML Basics
  2. DTDs
  3. The Bleeding Edge
  4. Real World Applications

By the time you're finished, you'll be ready to use XML to create compelling web pages.

What's In The Book

This book is divided into four parts of two to four chapters each. These parts cover the following material.

Part I: XML Basics

Part I, XML Basics, introduces the purpose, structure, and syntax of XML and its associated style sheet language, XSL. In Part I you'll learn how to create basic XML pages and publish them on the Web.

Chapter 1: Introducing XML

The first chapter introduces you to the history and theory behind XML, the goals XML is trying to achieve, and some of XML's more intriguing uses and applications. You'll learn what XML is good for, and what it isn't. You'll learn how XML can be used for such diverse areas as chemistry, mathematics, push, multimedia arrangements, and more. Finally, some intriguing current uses of XML are briefly explored, including the Chemical Markup Language, Microsoft's Channel Definition Format, and the complete works of William Shakespeare.

Chapter 2: Beginning XML

The second chapter shows you some very simple XML documents. You'll learn how to write them with a text editor, how to render them into HTML, and how to serve them from a web server. This chapter endeavors to teach you by example, not from first principles. It does not cross all the t's and dot all the i's. There are exceptions and special cases that aren't discussed here. Those will be addressed in the next several chapters. For the most part, you don't need to worry about the technical rules right up front. As with HTML you can learn and do a lot by copying simple examples that others have prepared and modifying them to fit your needs.

Chapter 3: Formalizing XML

HTML 4.0 has about three hundred different tags. Most of these tags have half a dozen possible attributes for several thousand possible variations. Because XML is more powerful than HTML, you may think XML would have even more tags, but you'd be wrong. XML gets its power through simplicity and extensibility, not a plethora of tags.

Actually, XML predefines almost no tags at all. Instead, XML enables you to define your own tags as needed. These tags and the documents built from them are not completely arbitrary, however. They have to follow a specific set of rules elaborated in this chapter. A document that follows these rules is well-formed. Well-formedness is the minimum criterion necessary for XML parsers, processors, and browsers to read your files. In this chapter, you'll examine the rules for well-formed documents and focus on how XML differs from HTML.
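To see these rules at work, you can hand a few snippets to any XML parser and watch which ones it rejects. The sketch below assumes Python 3 and its standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper written for this example, not part of any library. It checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every start tag is closed and elements nest properly.
print(is_well_formed("<p>Hello <b>world</b></p>"))     # True
# Empty elements must use the XML empty-element form...
print(is_well_formed("<p>Line one<br/>Line two</p>"))  # True
# ...because the HTML habit of an unclosed <br> is an error.
print(is_well_formed("<p>Line one<br>Line two</p>"))   # False
# Overlapping tags are rejected.
print(is_well_formed("<p><b><i>overlap</b></i></p>"))  # False
# A document must have exactly one root element.
print(is_well_formed("<p>one</p><p>two</p>"))          # False
```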

Chapter 4: XSL

One of the fundamental principles of XML is the separation of data from the presentation of the data. The first three chapters focused on how you describe data. Chapter 4 focuses on how you present data.

Each XML document can be associated with an XSL style sheet that describes how individual elements should be formatted. XSL style sheets provide far more detailed control over appearance than is possible with standard or non-standard HTML. This chapter shows how to use style sheets to provide custom appearances that provide a web site with a unified look and feel.
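Python's standard library has no XSL processor, so the following sketch is only an analogy for the separation of data from presentation. The XML holds the data, and a separate hand-written function (to_html, hypothetical) plays the role an XSL style sheet would, turning each book element into an HTML table row. The element names are made up for the example; on a real site the same mapping would be written as XSL template rules.

```python
import xml.etree.ElementTree as ET
from html import escape

# The data says nothing about fonts, colors, or layout.
data = ET.fromstring(
    "<catalog>"
    "<book><title>XML: Extensible Markup Language</title><price>39.99</price></book>"
    "</catalog>"
)


def to_html(catalog: ET.Element) -> str:
    """Stand-in for a style sheet: render each <book> as an HTML table row."""
    rows = "".join(
        f"<tr><td><i>{escape(b.findtext('title'))}</i></td>"
        f"<td>${escape(b.findtext('price'))}</td></tr>"
        for b in catalog.iter("book")
    )
    return f"<table>{rows}</table>"


print(to_html(data))
```

Swapping in a different function changes the look of the page without touching the data, which is the same division of labor XSL gives you.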

Part II: DTDs

Part II introduces the crucial concept of document type definitions, DTDs for short. In Part II you'll learn how to prepare DTDs and how to validate documents against those DTDs.

Chapter 5: Using DTDs in XML Documents

XML has been described as a meta markup language, that is, a language for describing markup languages. This chapter explores its use as such a meta language. You'll begin learning how to design and create new markup languages for use in specialized domains such as music, mathematics, astronomy, electronics, genealogy, and any other field you can imagine. Such markup languages are defined via a document type definition, commonly abbreviated DTD, which is what Chapter 5 is all about.

Chapter 6: Assembling Documents from Multiple Data Sources

Chapter 6 shows you that a single XML document may draw both data and declarations from many different sources, which may be in different files. In fact, some of the data may be drawn directly from databases, CGI scripts, or other things that aren't files at all. Through entities and entity references, this gives XML a powerful client-side include mechanism.
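Here is a minimal sketch of the simplest case: an internal entity declared in the document's own DTD and referenced twice. It assumes Python 3's xml.etree.ElementTree, and the entity name sitename is made up for the example. It does not cover external files, databases, or CGI output; by default ElementTree does not fetch external entities at all, so that case needs other tools.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares one general entity (hypothetical name).
doc = """<?xml version="1.0"?>
<!DOCTYPE page [
  <!ENTITY sitename "Cafe con Leche">
]>
<page>
  <title>Welcome to &sitename;</title>
  <footer>&sitename; is updated daily.</footer>
</page>"""

root = ET.fromstring(doc)

# By the time we see the tree, each &sitename; has been replaced by its text.
for child in root:
    print(child.tag, "->", child.text)
# title -> Welcome to Cafe con Leche
# footer -> Cafe con Leche is updated daily.
```

Changing the one declaration changes every place the entity is used, which is what makes entities useful as an include mechanism.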

Chapter 7: Describing Elements with Attributes

Chapter 7 shows you how to use and declare XML attributes inside XML tags. Attributes contain information intended for the application that's reading the XML data, not for the human who's reading the document. All the basic information on the page should be available as plain text even when all the tags are completely stripped out of the page. Attributes are intended for extra information associated with an element, like an ID number used only by programs that read and write the file, not for the content of the element that's read and written by humans.
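The guideline above can be checked mechanically: strip out the tags, and the human-readable content should still be there, while attributes remain available to software. The sketch assumes Python 3's xml.etree.ElementTree; the person, given, and surname elements and the id value are hypothetical.

```python
import xml.etree.ElementTree as ET

# The id attribute is meant for programs; the names are content for humans.
doc = '<person id="p1042"><given>Elliotte</given> <surname>Harold</surname></person>'

person = ET.fromstring(doc)

# Stripping every tag leaves only the human-readable text...
print("".join(person.itertext()))  # Elliotte Harold
# ...while the attribute is still there for software that asks for it.
print(person.get("id"))            # p1042
```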

Part III: The Bleeding Edge

This title's a bit misleading. Right now there's very little about XML that isn't on the bleeding edge. Nonetheless, in Part III you'll encounter topics that are far more compelling for what they promise than for what they deliver today.

Chapter 8: International Character Sets

The web is international, yet most of the text you'll find on it is written in English. XML is starting to change this. Unicode is XML's native character set. This is good news for Web authors because Unicode supports almost every character commonly used in every modern script on Earth, including Cyrillic, Roman, Arabic, Han, and more.

In this chapter you'll explore how to write XML documents in languages other than English, how international text is represented in computer applications, how XML understands text, and how you can take advantage of the software you have to read and write in languages other than English.
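As a small illustration of Unicode as XML's native character set, the sketch below parses a UTF-8 document that mixes Cyrillic, Greek, and Han text, then shows that a numeric character reference can carry the same Han characters in a file that is pure ASCII. It assumes Python 3's xml.etree.ElementTree; the sample words are arbitrary.

```python
import xml.etree.ElementTree as ET

# UTF-8 bytes with an encoding declaration, mixing three scripts.
source = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<words><w>Привет</w><w>Ελλάδα</w><w>中文</w></words>"
)
root = ET.fromstring(source.encode("utf-8"))

# After parsing, every word is ordinary Unicode text, whatever its script.
for w in root:
    print(w.text, "starts with U+%04X" % ord(w.text[0]))

# A character reference names a code point directly, so even a pure ASCII
# file can carry Han characters.
ascii_only = ET.fromstring("<w>&#x4E2D;&#x6587;</w>")
print(ascii_only.text)  # 中文
```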

Chapter 9: XLinks and XPointers

XML provides all the hypertext power of HTML while adding the ability to link to essentially arbitrary locations in a remote file without using named anchors. Thus it's possible to link to a particular paragraph of text on a foreign web server whether or not the author of that remote page made special provision for that paragraph to be linked to. This chapter shows how linking works in XML, and discusses when and when not to use it.

Chapter 9 introduces XLL, the eXtensible Linking Language, a new means of linking between documents. XLinks and XPointers can do everything HTML's URL-based hyperlinks and anchors can do and a lot more. XLinks enable multidirectional links. XPointers allow links to arbitrary locations in a document. These features make XLL suitable not only for new uses but also for things that can be done in HTML only with considerable effort, such as cross-references, footnotes, end notes, interlinked data, and more.
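The key idea behind XPointers is addressing a spot in a document by its structure rather than by a named anchor. XPointer syntax is not shown here; instead, this sketch uses the small XPath subset built into Python 3's xml.etree.ElementTree as an analogy, picking out "the third paragraph of the second section" of a page whose author added no anchors at all. The article, section, and para element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A page with no named anchors anywhere.
page = ET.fromstring("""<article>
  <section><para>Intro one.</para><para>Intro two.</para></section>
  <section>
    <para>First point.</para>
    <para>Second point.</para>
    <para>Third point.</para>
  </section>
</article>""")

# Address a location purely by structure: 2nd section, 3rd para (1-based).
target = page.find("./section[2]/para[3]")
print(target.text)  # Third point.
```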

Part IV: XML Applications

Part IV introduces you to two practical uses of XML in different domains, webcasting (a.k.a. push) and genealogy.

Chapter 10: Pushing Web Sites with CDF

Microsoft's Channel Definition Format (CDF) is an XML-based markup language for defining channels. Channels allow web sites to automatically notify readers of changes to critical information. Because it works much like a subscription service, this method is called either webcasting or push. Chapter 10 explores that format and shows you how to convert your web sites to CDF channels.

A CDF file is an XML document, separate from, but linked to, the HTML documents in a site. The channel defined in the CDF document establishes the parameters for a connection between the readers and the content on the site. The data can be transferred through push, in which the site sends notifications (or even whole web sites) to registered readers, or through pull, in which readers choose to load the page in their web browser and get the updated information.

There is no need to rewrite your site to take advantage of CDF. The CDF file is simply an addition to the site. A link to a CDF file, generally found on a site's home page, downloads a copy of the channel index to the reader's machine. This allows the reader to access the current data, as defined in the channel, with a click on an icon.
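Because a CDF file is just another XML document, it can be generated from a list of pages the site already has, leaving the pages themselves untouched. The sketch below assumes Python 3.9 or later; build_channel is a hypothetical helper and the URLs are placeholders. The CHANNEL, ITEM, and TITLE element names with HREF attributes are an assumption based on Microsoft's CDF submission, not a checked schema, and scheduling and logo elements are omitted, so consult the CDF specification before publishing a real channel.

```python
import xml.etree.ElementTree as ET


def build_channel(home: str, title: str, pages: list[str]) -> str:
    """Hypothetical helper: a minimal CDF-style index of existing HTML pages."""
    channel = ET.Element("CHANNEL", HREF=home)
    ET.SubElement(channel, "TITLE").text = title
    for url in pages:
        # One ITEM per existing page; the page itself is not modified.
        item = ET.SubElement(channel, "ITEM", HREF=url)
        ET.SubElement(item, "TITLE").text = url.rsplit("/", 1)[-1]
    return ET.tostring(channel, encoding="unicode")


cdf = build_channel(
    "http://www.example.com/index.html",
    "Example News",
    ["http://www.example.com/today.html", "http://www.example.com/archive.html"],
)
print(cdf)
```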

Chapter 11: Genealogy

Chapter 10 introduced you to a markup language that was already written by other people and showed you how to use it. Chapter 11 shows you how you might go about developing such a DTD from scratch. The example used here is genealogy. In this chapter you'll see the gradual development of several DTDs that can be used for genealogical data. You'll be privy to the thought processes of a data designer preparing a DTD for tracking family members and relationships. Along the way you'll learn not just how to use XML tags, but why and when to choose them.

How to Use This Book

This book is designed to be read more or less cover-to-cover. Each chapter builds on the material in the previous chapters in a fairly predictable fashion. Of course, you're always welcome to skim over material that's already familiar to you.

I also hope you'll stop along the way to try out some of the examples and to write some XML documents of your own. It's important to learn not just by reading, but also by doing.

Summing Up

XML is the wave of the future. Writing XML: Extensible Markup Language didn't just teach me about XML itself; it changed the way I look at the Web. XML is a sea change in the way you look at web sites and web site development. And I can think of no better way to learn about it than reading XML: Extensible Markup Language. Why don't you check it out, and let me know what you think?

If I've succeeded in piquing your interest, you'll be able to find XML: Extensible Markup Language in late June, 1998, at almost any bookstore that carries computer books, including amazon.com. It's $39.99, published by IDG Books, and written by me, Elliotte Rusty Harold.


Table of Contents

Preface
Part I: XML Basics
Chapter 1 Introducing XML
Chapter 2 Beginning XML
Chapter 3 Formalizing XML
Chapter 4 XSL
Part II: DTDs
Chapter 5 Using DTDs in XML Documents
Chapter 6 Assembling Documents from Multiple Data Sources
Chapter 7 Describing Elements with Attributes
Part III: The Bleeding Edge
Chapter 8 International Text
Chapter 9 XLL
Part IV: XML Applications
Chapter 10 CDF
Chapter 11 Genealogy
Appendixes
XML QuickRef
Appendix A International Character Sets
Appendix B XML Glossary
Appendix C About the CD
Appendix D Additional Resources
Index


Copyright 1998 Elliotte Rusty Harold
elharo@sunsite.unc.edu
Last Modified July 28, 1998