Many companies have jumped wholeheartedly into the Web, only to find that deploying a large Web site is as complex as developing a large application--and that HTML is not up to the task. It's akin to trying to develop an operating system in BASIC.
The Extensible Markup Language, or XML, is the World Wide Web Consortium's answer to the limitations of HTML. It is an extremely flexible language that will enable organizations to deploy more sophisticated documents and exchange complex data via the Web.
The XML specification was released at the Sixth International World Wide Web Conference in Santa Clara, Calif., earlier this month. Several software vendors, including Microsoft Corp. and Netscape Communications Corp., have already endorsed it.
What is XML?
XML is a simplified version of SGML (Standard Generalized Markup Language). To understand what XML is, and what it's good for, it's necessary to understand SGML.
SGML, an international standard that predates the Web, is actually a "metalanguage," a language for describing document markup languages.
For example, HTML is a markup language that can be described by SGML. To use an SGML editor to create Web pages, an author would first have to supply the editor with a description of HTML. That description (written in SGML) is called a DTD (Document Type Definition).
SGML also enables organizations to exchange data. For example, an auto parts manufacturer could use SGML to create a markup language for its parts documentation. The language might include tags such as <make>, <model> and <year>.
The manufacturer could then distribute the DTD to its distributors, who would use it to create applications that search for the custom tags and extract the information automatically.
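To make the idea concrete, here is a minimal sketch of what such a distributor application might do. It is written in Python against the XML form of the markup, because that is what the rest of this article is about. The <make>, <model> and <year> tags come from the example above. The <catalog> and <part> wrapper elements, the number attribute and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts documentation. Only <make>, <model> and <year> come
# from the article; <catalog>, <part> and the number attribute are assumed.
CATALOG = """<catalog>
  <part number="A-100">
    <make>Ford</make>
    <model>Taurus</model>
    <year>1996</year>
  </part>
  <part number="B-200">
    <make>Honda</make>
    <model>Accord</model>
    <year>1995</year>
  </part>
</catalog>"""

def extract_parts(xml_text):
    """Find every <part> and pull out the custom tags as a dict."""
    root = ET.fromstring(xml_text)
    parts = []
    for part in root.iter("part"):
        parts.append({
            "number": part.get("number"),
            "make": part.findtext("make"),
            "model": part.findtext("model"),
            "year": int(part.findtext("year")),
        })
    return parts

if __name__ == "__main__":
    for p in extract_parts(CATALOG):
        print(p["number"], p["make"], p["model"], p["year"])
```

The program never has to guess which string on a page is the model year. The tag name tells it.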
Theoretically, we could be using SGML browsers to surf the Web. This would give Web authors tremendous flexibility: If HTML didn't have the features needed for a given set of documents, authors could create extensions to HTML and attach a DTD to their documents.
But Web browsers are not designed that way, because SGML is simply too complex. Writing a DTD is difficult, and so is writing applications that can interpret DTDs accurately. In other words, the complexity of fully implementing SGML outweighs its benefits.
XML to the rescue
XML was designed as a compromise between the simplicity of HTML and the flexibility of SGML. Like SGML, XML is a metalanguage, but it is easier to use, and its DTDs are simpler to write.
Using XML, authors can create new tags at will, even very complex ones. They also can use XML DTDs to validate the structure of large numbers of documents, which is important when importing the data from those documents into other applications.
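A DTD might state the structure of the auto parts example as `<!ELEMENT part (make, model, year)>`. That line means that every part holds a make, a model and a year, in that order. Python's standard library parses XML but does not validate it against a DTD. The sketch below therefore checks that single rule by hand, as a stand-in for real DTD validation. The element names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-coded equivalent of the (hypothetical) DTD rule
#   <!ELEMENT part (make, model, year)>
# The standard library cannot validate against a DTD, so this stands in for it.
REQUIRED_CHILDREN = ["make", "model", "year"]

def invalid_parts(xml_text):
    """Return the positions of <part> elements whose children are not
    exactly <make>, <model>, <year>, in that order."""
    root = ET.fromstring(xml_text)
    bad = []
    for index, part in enumerate(root.iter("part")):
        child_tags = [child.tag for child in part]
        if child_tags != REQUIRED_CHILDREN:
            bad.append(index)
    return bad

# The second part is missing its <year>, so it fails the check.
SAMPLE = """<catalog>
  <part><make>Ford</make><model>Taurus</model><year>1996</year></part>
  <part><make>Honda</make><model>Accord</model></part>
</catalog>"""

print(invalid_parts(SAMPLE))  # [1]
```

A validating parser driven by the actual DTD would apply every declared rule in one pass. The hand-written check covers only the one rule shown.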
XML is also fully SGML-compatible. Because XML documents are readable by SGML software, organizations with an investment in SGML can use XML right away.
However, since XML is a subset of SGML, XML software can't read all SGML documents. Ironically, one important SGML language that is not XML-compatible is HTML. Fortunately, only minor changes are needed to make an HTML document compatible with XML.
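The most common change is closing every element. HTML lets authors leave tags such as <p> and <br> open, but XML does not. The following Python sketch uses the standard library's XML parser to show the difference on two made-up fragments.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# Typical HTML: <p> and <br> are never closed.
html_style = "<body><p>First line<br>Second line</body>"
# The same content with <p> closed and <br> written as an empty-element tag.
xml_style = "<body><p>First line<br/>Second line</p></body>"

print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```

Other typical fixes are quoting attribute values and spelling each closing tag with the same case as its opening tag.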
Organizations can use XML to ease the exchange of information between disparate applications. For example, the Chemical Markup Language is an XML-compatible markup language with specific extensions for describing molecules and compounds. Using the DTD for that language, a developer could create a filter to import data points from a Web page into a proprietary chemical modeling application.
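The sketch below shows the general shape of such an import filter in Python. The element names (<compound>, <name>, <formula>, <molecularWeight>) are placeholders chosen for illustration. They are not taken from the actual Chemical Markup Language DTD, and the proprietary application's input format is unknown, so the sketch stops at plain tuples.

```python
import xml.etree.ElementTree as ET

# Hypothetical chemistry markup. These tag names are placeholders,
# not the real Chemical Markup Language vocabulary.
PAGE = """<compounds>
  <compound>
    <name>water</name>
    <formula>H2O</formula>
    <molecularWeight units="g/mol">18.015</molecularWeight>
  </compound>
  <compound>
    <name>ethanol</name>
    <formula>C2H6O</formula>
    <molecularWeight units="g/mol">46.07</molecularWeight>
  </compound>
</compounds>"""

def to_records(xml_text):
    """Turn each <compound> into a (name, formula, weight) tuple that a
    modeling application could import."""
    root = ET.fromstring(xml_text)
    return [
        (c.findtext("name"),
         c.findtext("formula"),
         float(c.findtext("molecularWeight")))
        for c in root.iter("compound")
    ]

if __name__ == "__main__":
    for record in to_records(PAGE):
        print(record)
```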
Developers also will be able to create more intelligent clients. An XML client, for example, could sort the parts manufacturer's data by make, model or year, or show users only the portion of the data that applies to their model of car.
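Here is a minimal Python sketch of such a client. It assumes a hypothetical catalog in which each <part> element holds <make>, <model> and <year>. The client sorts and filters on the tag values directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: each <part> carries <make>, <model> and <year>.
CATALOG = """<catalog>
  <part><make>Honda</make><model>Civic</model><year>1994</year></part>
  <part><make>Ford</make><model>Taurus</model><year>1996</year></part>
  <part><make>Honda</make><model>Accord</model><year>1995</year></part>
</catalog>"""

def load(xml_text):
    """Read every <part> into a dict keyed by its tag names."""
    root = ET.fromstring(xml_text)
    return [
        {"make": p.findtext("make"),
         "model": p.findtext("model"),
         "year": int(p.findtext("year"))}
        for p in root.iter("part")
    ]

def sort_by(parts, field):
    """Sort parts by one tag: 'make', 'model' or 'year'."""
    return sorted(parts, key=lambda p: p[field])

def for_model(parts, model):
    """Keep only the parts for the reader's model of car."""
    return [p for p in parts if p["model"] == model]

parts = load(CATALOG)
print([p["year"] for p in sort_by(parts, "year")])  # [1994, 1995, 1996]
print(for_model(parts, "Accord"))
```

With plain HTML, the client would first have to work out which text on the page is the make or the year. In XML the tags say so explicitly.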
In addition, XML will make intelligent agents easier to design and deploy. Today, agent software has to jump through hoops to recognize the right data points on constantly changing Web pages. With XML, relevant data points can be marked with their own tags (such as <price>), so they're easy to find.
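In the Python sketch below, an agent searches for the tag rather than relying on a fixed position in the page layout. The <products>, <item> and <name> elements, the currency attribute and the prices are hypothetical. Only the <price> tag comes from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical product page. The agent relies only on the <price> tag.
PAGE = """<products>
  <item><name>Brake pads</name><price currency="USD">34.95</price></item>
  <item><name>Oil filter</name><price currency="USD">6.49</price></item>
</products>"""

def find_prices(xml_text):
    """Collect every <price> value, at whatever depth it appears."""
    root = ET.fromstring(xml_text)
    return [float(p.text) for p in root.iter("price")]

print(find_prices(PAGE))  # [34.95, 6.49]
```

Because `iter` searches the whole document tree, the agent keeps working if the site moves the prices into a different part of the page, as long as they stay inside <price> tags.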
Finally, XML includes hypertext features that are currently missing from the Web. Such features as bidirectional and location-independent links and "transclusion" (where a linked document appears as part of the current page) will be possible using XML.
For more information on XML, visit the W3C's Web site at www.w3.org/pub/WWW/MarkUp/SGML/Activity.