[Archive copy mirrored from the URL: http://www.nwfusion.com/; see this canonical version of the document.]
By Tim Bray
People even remotely involved with intranet development know the acronym HTML. The same can't be said for XML - at least not yet.
For those who don't know, XML stands for Extensible Markup Language. It dates back slightly more than a year, to when the World Wide Web Consortium (W3C) authorized the creation of something called the SGML Activity. Nobody noticed.
XML broke the surface last November with the release of the first language specification. One or two people noticed.
In February, Microsoft Corp. and Netscape Communications Corp. became publicly interested in XML. Now a lot of people are noticing.
Here's an explanation of why XML was built, what it looks like and how it can be put to practical use on your intranet.
XML is a drastically simplified subset of Standard Generalized Markup Language (SGML), an ISO standard. It provides what's needed to make intranet applications run smarter and faster. On an intranet, XML will enable two capabilities notably absent from today's Web technology.
XML will make intranet data much easier to manage, because it will be much richer and more self-describing. And it will enable the development of smarter, faster applications by transferring processing from the generally overloaded server to the often underworked desktop PC.
Why XML?
Web technology is great because it's easy. Users don't need any training in how to make it go, and information providers can get Web-based applications running on intranets in no time.
But Web technology also is a pain, because it's stupid and slow. It's stupid because it turns a powerful desktop PC into a dumb batch terminal. It's slow because to get any work done you have to send a batch of information across the network to the server and then back to your desktop.
These problems can be solved by downloading some of the processing work to the fast PC sitting in front of the user. That way, the application code on the desktop can respond instantly to each user keystroke and can do at least part of its work without network round trips to the server.
One of the key strategies for solving this problem is the use of smarter data.
Intranets mostly run on HTML, which does a great job of displaying documents and graphics. It comes with a fixed set of tags and structures optimized for this purpose. This is fine, but what if you want indexing information, or want to identify part numbers, inventories or dates of employment in your intranet data? HTML just doesn't do this, and won't.
This is what XML is all about. Rather than providing a set of predefined tags, XML lets you define your own. This means the documents on your intranet become richer, smarter, customized to your needs and ready for use by your applications.
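As a minimal sketch of what self-defined tags make possible, the Python snippet below parses a small inventory document and picks out the parts that are out of stock. The tag names (`inventory`, `part`, `part-number`, `description`, `on-hand`) and the data are invented for illustration; they do not come from any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory document; every tag name here is made up for this example.
INVENTORY_XML = """<inventory>
  <part>
    <part-number>A-1001</part-number>
    <description>Hinge, brass</description>
    <on-hand>250</on-hand>
  </part>
  <part>
    <part-number>B-2040</part-number>
    <description>Drawer slide</description>
    <on-hand>0</on-hand>
  </part>
</inventory>"""


def out_of_stock(xml_text):
    """Return the part numbers whose <on-hand> count is zero."""
    root = ET.fromstring(xml_text)
    return [
        part.findtext("part-number")
        for part in root.iter("part")
        if int(part.findtext("on-hand")) == 0
    ]


print(out_of_stock(INVENTORY_XML))
```

Because the markup says which value is a part number and which is a stock count, the program needs no knowledge of page layout to find them.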
Technical highlights
While XML is actually a simplified version of SGML, you don't need to understand SGML to understand XML. This is good, because SGML is incredibly general, flexible and hard to understand. XML takes the most commonly used parts of SGML and packages them in an easy-to-understand way.
With XML, document components are marked with tags and attributes, just as in HTML. You can invent your own tags and attributes and, if you want, share them and control their use with a formal declaration called a Document Type Definition (DTD). What's more, XML is designed so it's incredibly easy to write programs to read and extract information from XML files. A good programmer can whip up such a program in a day.
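To give a rough sense of how little code such a reader needs today, here is a sketch that walks a document and reports every element with its attributes and text. It uses the parser in Python's standard library rather than a hand-written one, and the memo vocabulary (`memo`, `to`, `subject`, `priority`, `dept`, `lang`) is hypothetical. No DTD is involved; the parser only checks that the tags nest properly.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo; tag and attribute names are invented for illustration.
MEMO_XML = """<memo priority="high" dept="engineering">
  <to>All staff</to>
  <subject lang="en">Server maintenance</subject>
</memo>"""


def describe(xml_text):
    """Return (tag, attributes, text) for every element, in document order."""
    rows = []
    for elem in ET.fromstring(xml_text).iter():
        text = (elem.text or "").strip()  # ignore whitespace-only text
        rows.append((elem.tag, dict(elem.attrib), text))
    return rows


for tag, attrs, text in describe(MEMO_XML):
    print(tag, attrs, text)
```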
In addition, XML has built-in facilities for re-using shared information in multiple documents or multiple times in one document. It also handles all the world's alphabets, based on the Unicode standard - you can mix Arabic, English, Japanese and Russian with no problems.
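Both facilities can be seen in one small document. The sketch below declares an entity once and uses it twice, and mixes four scripts in the same file. The entity name `company`, the tag names and the text are all invented for this example, and the behavior shown relies on the standard-library parser expanding internal entities declared in the document itself.

```python
import xml.etree.ElementTree as ET

# The entity name "company" and all content are invented for illustration.
DOC_XML = """<!DOCTYPE greetings [
  <!ENTITY company "Example Widgets Ltd.">
]>
<greetings>
  <greeting lang="en">Hello from &company;</greeting>
  <greeting lang="ru">Привет от &company;</greeting>
  <greeting lang="ja">こんにちは</greeting>
  <greeting lang="ar">مرحبا</greeting>
</greetings>"""

root = ET.fromstring(DOC_XML)

# Map each language code to its greeting; &company; is expanded by the parser.
texts = {g.get("lang"): g.text for g in root.iter("greeting")}

for lang, text in texts.items():
    print(lang, text)
```

Changing the company name in the one entity declaration changes it everywhere the entity is referenced.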
Unlike HTML, XML doesn't have built-in tag formatting, since there are no built-in tags. So to present XML on screen or paper, you have to write a stylesheet. In fact, XML wouldn't be useful if not for the arrival of Cascading Style Sheets (CSS) on the Web technology scene. (For a Handbook on CSS, see IntraNet, August 1996, page 10.)
CSS controls the display of Web pages by using an external stylesheet rather than built-in tag behaviors. Since XML has no built-in tag behaviors, you have to have stylesheets to display it at all. In return, page creation is much more flexible and sophisticated.
XML also requires another new Web technology - Dynamic HTML. One of XML's main purposes is to allow programs running in the browser (Java or ActiveX) to access and manipulate the data you send over your intranet. Until the recent announcement of Dynamic HTML from Microsoft and Netscape, browsers didn't make this possible.
While Dynamic HTML addresses the problem, the two companies' versions are incompatible. The good news is that they are now working together in the W3C on something called the Document Object Model (DOM). The goal is to provide a single version of Dynamic HTML that will work in browsers from either vendor.
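The kind of access a common object model is meant to give can be sketched with the lightweight DOM implementation in Python's standard library, used here as a stand-in for a browser script. This is an illustration under that assumption, not either vendor's Dynamic HTML, and the price-list vocabulary (`prices`, `item`, `sku`, `note`) is hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical price list; names and values are invented for illustration.
doc = parseString('<prices><item sku="A-1001">12.50</item></prices>')

# Locate an element, then change it in place.
item = doc.getElementsByTagName("item")[0]
item.setAttribute("currency", "USD")   # add an attribute
item.firstChild.data = "13.00"         # replace the text content

# Build a new element and attach it to the document.
note = doc.createElement("note")
note.appendChild(doc.createTextNode("price updated"))
doc.documentElement.appendChild(note)

result = doc.documentElement.toxml()
print(result)
```

The same few operations (find an element, change it, add a new one) are what a program in the browser would perform on data it has already received.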
Intranet metadata with XML
One of the most popular intranet applications is document search, retrieval and management. A host of vendors, old and new, offer excellent products in this area. However, these wares are held back by the fact that HTML, the language of most intranet documents, is so simple and hard coded.
If you are going to search, retrieve and manage documents, you need metadata - in other words, data about data. Examples of metadata are document authorship, ownership, date, subject, security rating, pricing information, copyright coverage - anything that is about the document but not part of it.
HTML doesn't offer a good place to store metadata. But since XML is extensible, metadata is easy - you just make up whatever data you need to manage your intranet properly. Thus, professional document management and retrieval is immensely easier for XML documents.
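A small sketch shows how made-up metadata tags turn into a usable search filter. The catalog layout and the field names (`author`, `date`, `subject`, `security`) are assumptions chosen for this example, and `find_documents` is a hypothetical helper, not part of any product.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the metadata fields are invented for illustration.
CATALOG_XML = """<catalog>
  <document>
    <meta>
      <author>J. Smith</author>
      <date>1997-03-14</date>
      <subject>benefits</subject>
      <security>internal</security>
    </meta>
    <body>Open enrollment begins in May.</body>
  </document>
  <document>
    <meta>
      <author>R. Jones</author>
      <date>1997-04-02</date>
      <subject>benefits</subject>
      <security>confidential</security>
    </meta>
    <body>Draft plan costs for next year.</body>
  </document>
</catalog>"""


def find_documents(xml_text, **criteria):
    """Return the authors of documents whose metadata matches every criterion."""
    root = ET.fromstring(xml_text)
    hits = []
    for doc in root.iter("document"):
        meta = doc.find("meta")
        if all(meta.findtext(field) == value for field, value in criteria.items()):
            hits.append(meta.findtext("author"))
    return hits


print(find_documents(CATALOG_XML, subject="benefits", security="internal"))
```

The search never looks at the document body; it works entirely from the data about the data.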
The big win with XML, though, is that you can write smart applications that make the best use of the browser and the desktop PC, without requiring server intervention for most tasks.
Imagine you are building a human resources application that allows retrieval and update of records based on data such as name, age, date of employment, salary, location and department. You can build an application like this right now on the Web, but since HTML doesn't have tags for any of these fields, you have to go back to the server every time you want to do any real work, even if all you want to do is to re-sort the records on your screen by age or department or date of hire.
With XML, all these items of information can be marked up as what they are, and a Java program running in the browser can do all sorts of useful analysis, sorting and selection without involving the server. The result is a much smarter and faster intranet application.
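The idea can be sketched in a few lines. The scenario above calls for Java in the browser; Python is used here only to keep the example short, and the record layout (`employee`, `name`, `age`, `department`, `hired`) and the data are invented. The point is that the records are parsed once and every later sort or selection happens locally.

```python
import xml.etree.ElementTree as ET

# Hypothetical HR records; tag names and data are invented for illustration.
EMPLOYEES_XML = """<employees>
  <employee><name>Lee</name><age>41</age><department>Sales</department><hired>1990-06-01</hired></employee>
  <employee><name>Ortiz</name><age>29</age><department>Engineering</department><hired>1995-01-15</hired></employee>
  <employee><name>Chen</name><age>35</age><department>Engineering</department><hired>1992-09-30</hired></employee>
</employees>"""


def load(xml_text):
    """Parse the records once, returning one plain dict per employee."""
    records = []
    for emp in ET.fromstring(xml_text).iter("employee"):
        record = {child.tag: child.text for child in emp}
        record["age"] = int(record["age"])  # ages sort as numbers, not strings
        records.append(record)
    return records


records = load(EMPLOYEES_XML)

# Each of these views is computed locally, with no further request to a server.
by_age = [r["name"] for r in sorted(records, key=lambda r: r["age"])]
by_hire = [r["name"] for r in sorted(records, key=lambda r: r["hired"])]
engineering = [r["name"] for r in records if r["department"] == "Engineering"]

print(by_age, by_hire, engineering)
```

Dates written as year-month-day sort correctly as plain text, which is why the hire dates need no conversion here.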
Furthermore, since Java code is portable and downloaded on demand, you haven't lost the benefits of the uniform low-overhead browser delivery environment.
Tool and resource issues
For XML to become more than just an idea, tools are needed for authoring, generating, managing and analyzing XML data.
The good news is that since XML is a form of SGML, a lot of tools already are on the market that can handle it. Granted, there is work to be done: Since SGML has been used mostly by large-scale professional publishers, the tools tend to be expensive and require considerable training. The producers of these tools are going to have to move quickly to bring the price points down and achieve the ease-of-use levels typical of a modern word processor.
Looking ahead in the Internet and intranet worlds is perilous. XML seems to solve a lot of problems at once, but nobody can tell where we'll be standing a year from today. One thing is sure: When every desktop is equipped with a browser that is capable of processing XML, CSS and the DOM, we'll have a richer and cheaper application delivery environment than we do today.
Network World, Inc.