[This local archive copy mirrored from the canonical site: http://www.vsi.com/Xml-f/extensions/what_is_xml.html; links may not have complete integrity, so use the canonical document at this URL if possible.]

What is XML?

XML is remarkably hot today. Articles in technology and business periodicals are talking about this "giant leap beyond HTML" and its ability to make documents multidimensional. Analysts tell us that XML "opens the door to new object-oriented documents" and that it makes data "richer and smarter." Products and technologies that use XML are being announced by key industry players such as Informix, Novell, Microsoft, Samsung, Platinum Technology, Netscape, Sun, CSC, and Sybase.

But what is it?

XML is a markup language, like HTML, with a vocabulary that consists primarily of "tags." Tags are words or characters enclosed in angle brackets like "<tag-name>". When you use tags to mark up a document, you simply add these tags to the document for the purpose of communicating whatever the markup language is capable of communicating.

Today, HTML is the best-known markup language. It was designed for a single purpose: marking up documents to communicate how they should be formatted for display on the World Wide Web. For example, you can use the HTML tag "<P>" to indicate the beginning of a new paragraph. Other HTML tags indicate bolding, italicization, etc. Upon this simple foundation rests what we typically think of as "the web" - richly formatted and widely available documents.

The purpose of XML is far more open-ended, but in general it is a "meta-language" - a description of a language - that is designed to be used to define other markup languages, each with a specific purpose.

While HTML can only be used to describe how data is to be formatted, XML can do much more. Specifically, XML allows and encourages marking up for the communication of information content rather than mere formatting. For example, in an XML language, a tag such as "<vendor-name>" might be used to communicate that a particular piece of data in a document is the name of a vendor. Other tags could indicate address information, terms, and the like. Content markup is valuable because if the information content of a document can be identified, something can be done with the document. Specifically, computer programs can automatically search for, index, store, and make decisions based on the content in the document.
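As a rough illustration of content markup, the sketch below marks up a vendor record and has a small Python program pull out two fields by name. The tag names other than "vendor-name", the sample values, and the function vendor_summary are all assumptions made up for this example; they do not come from any real Accounts Payable vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical vendor record. The tags describe what each value means,
# not how it should be displayed.
VENDOR_XML = """
<vendor>
  <vendor-name>Acme Supply Co.</vendor-name>
  <address>12 Main Street, Springfield</address>
  <terms>Net 30</terms>
</vendor>
""".strip()


def vendor_summary(xml_text):
    """Return the vendor's name and payment terms from a marked-up record."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("vendor-name"),
        "terms": root.findtext("terms"),
    }


summary = vendor_summary(VENDOR_XML)
print(summary["name"], "-", summary["terms"])
```

Because the program asks for "vendor-name" rather than "the third bold phrase on the page," it keeps working if the document is reordered or gains new fields.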

An XML language is made up of one or more XML definitions, each of which is a set of tags for describing a particular collection of data. These XML definitions are referred to as Document Type Definitions, or DTDs. In an XML language, one DTD is created for each collection of data that requires description. For example, to create an XML language for Accounts Payable, one DTD would be required to describe the vendors, one to describe the purchase orders, another to describe invoices, and so on.

A collection of data is generally accompanied by its DTD whenever it is stored or transferred from one place to another, creating "documents that know themselves." The persistent inclusion of the DTD means that the content or meaning of the data is embedded in the document itself in a way that is human-legible and, more important, easily interpreted by any XML-aware process such as a computer program.
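To make "documents that know themselves" a little more concrete, here is a minimal sketch of an invoice that carries its own DTD in the document's DOCTYPE declaration. The element names and values are invented for illustration. Note one assumption about the tooling: the parser in Python's standard library reads past the DTD but does not validate against it, so this sketch only shows the DTD traveling with the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice. The DOCTYPE section is the DTD: it declares which
# elements an invoice contains, and it is stored in the same text as the data.
INVOICE_DOC = """<?xml version="1.0"?>
<!DOCTYPE invoice [
  <!ELEMENT invoice (vendor-name, invoice-number, amount-due)>
  <!ELEMENT vendor-name (#PCDATA)>
  <!ELEMENT invoice-number (#PCDATA)>
  <!ELEMENT amount-due (#PCDATA)>
]>
<invoice>
  <vendor-name>Acme Supply Co.</vendor-name>
  <invoice-number>1001</invoice-number>
  <amount-due>250.00</amount-due>
</invoice>
"""

# The standard-library parser accepts the embedded DTD without validating.
root = ET.fromstring(INVOICE_DOC)
fields = {child.tag: child.text for child in root}

for name, value in fields.items():
    print(f"{name}: {value}")
```

A validating parser could use the same embedded declarations to reject an invoice that is missing, say, its amount-due element.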

This may not seem important at first glance, since computers have been successfully interpreting data for a long time, but XML is unique because it is simple and universal.

XML is simple because, although it is a powerful conveyor of information, the angle-bracket tag format is extremely simple to understand and use. The creators of XML recommend that users "be verbose" in defining tag-names so that documents are understandable. Furthermore, XML is always readable - it's plain text. This also makes XML documents easy to create.

It is also relatively easy to write computer programs that process XML documents. That's because the hard parts like "parsing" (finding all the elements in the document) and "validation" (ensuring you received all the elements that you need) have already been done for us in freely available computer programs called "parsers". Finally, programming languages like Perl and Java are being enhanced to work readily with XML documents.
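The two steps can be sketched in a few lines of Python. Parsing is delegated to the parser in the standard library; the "validation" step here is a deliberately simplified stand-in that only checks for required child elements, which is an assumption of this example and not how a DTD-validating parser works internally. The REQUIRED list and the function parse_and_check are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of elements an invoice must contain.
REQUIRED = ("vendor-name", "invoice-number", "amount-due")


def parse_and_check(xml_text):
    """Parse a document, then list any required elements that are missing.

    Returns (root, problems). root is None if the text is not well-formed.
    """
    try:
        root = ET.fromstring(xml_text)  # parsing: find all the elements
    except ET.ParseError as err:
        return None, [f"not well-formed: {err}"]
    # Simplified validation: did we receive all the elements we need?
    missing = [tag for tag in REQUIRED if root.find(tag) is None]
    return root, missing


complete = (
    "<invoice><vendor-name>Acme</vendor-name>"
    "<invoice-number>1001</invoice-number>"
    "<amount-due>250.00</amount-due></invoice>"
)
incomplete = (
    "<invoice><vendor-name>Acme</vendor-name>"
    "<invoice-number>1001</invoice-number></invoice>"
)

print(parse_and_check(complete)[1])
print(parse_and_check(incomplete)[1])
```

The application code never scans for angle brackets itself; it only asks the parser's result questions about elements.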

Data marked up with XML is universal because its data structure is not fixed. This means that programs that make use of XML documents need not contain hard-coded data handling routines and need not have prior knowledge of the format of the data they will receive. Rather, with the aid of the DTD, a program can interpret an XML document however it is delivered. This feature makes it easier for programmers to write programs to handle data marked up with XML. Most important, it makes data easily accessible by a variety of XML-aware processes running on a variety of hardware/software platforms. Because an XML-aware process can receive and process XML documents without having prior knowledge of the sender application, differences in computing standards and protocols become a potential non-issue.
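One way to picture a program with no hard-coded knowledge of the data format is a routine that walks whatever elements it is handed. The sketch below is only an illustration under that assumption: it relies on the tag names found in the document rather than on a DTD, and the function describe and both sample documents are invented for the example.

```python
import xml.etree.ElementTree as ET


def describe(element, path=""):
    """Yield (path, text) pairs for every leaf element, whatever the tags are."""
    here = f"{path}/{element.tag}"
    children = list(element)
    if not children:
        yield here, (element.text or "").strip()
    for child in children:
        yield from describe(child, here)


# Two unrelated, hypothetical documents handled by the same routine.
LISTING = (
    "<listing><address><city>Austin</city></address>"
    "<price>185000</price></listing>"
)
CONTACT = "<person><name>Pat Lee</name><phone>555-0100</phone></person>"

for document in (LISTING, CONTACT):
    for path, text in describe(ET.fromstring(document)):
        print(path, "=", text)
```

Nothing in describe mentions listings or people, yet it produces a labeled view of both documents because the labels travel with the data.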

XML is finding its way into many computer systems where one piece of software has to talk to another - and that's a lot of places. XML documents will describe people for universal exchange of phone numbers with PIMs, contact managers and email systems. XML will describe invoices, sales orders, and shipping confirmations for instant-EDI applications. XML is being used to describe real-estate listings, multi-media data streams, and payment transactions. And XML will be used as a core component in universal messaging systems.

The power, simplicity, and universality of HTML created a world-wide-web of readable, interesting, formatted information available to all. Likewise, XML promises to deliver a new web of inter-connectable systems that will move business processes and network services to an entirely new level. We have really only begun to see what XML will deliver.