[This local archive copy mirrored from the canonical site: http://www.sun.com/980310/xml/, 980316; links may not have complete integrity, so use the canonical document at this URL if possible.]
XML: Mastering Information on the Web
by Todd Freter
The attention paid to XML (Extensible Markup Language), whose 1.0 standard was published February 10, 1998, is impressive. XML has been heralded as the next important internet technology, the next step following HTML, and the natural and worthy companion to the Java™ programming language itself. Enterprises of all stripes have rapturously embraced XML.
The origins of XML technology reveal much about its intent and its promise. And XML's promise, no small one, also implies some significant challenges for the people and organizations who want to take advantage of the XML phenomenon.
This article is the first in a series about XML, its promises, and its challenges. Four are planned, but more may appear, and their order here may not be the order in which we publish them. Even so, here is a preview:
Instead, these articles are about what XML means for people, for enterprises, and perhaps for the future of information itself. As a major developer of technology products that have enabled the internet's explosive growth, Sun Microsystems believes it is important to propagate open perspectives on open standards that make information more available and useful.
Today's article, "The XML Idea," addresses these issues and subjects:
"HTML is our data type," Microsoft's Bill Gates said in a February 1996 interview.
That pronouncement was an emblem for the impact that the burgeoning internet and its friendly interface, the World Wide Web, had exerted on corporations, governments, and people. With everyone from billion-dollar corporations and governments to elementary school classes and private individuals publishing websites and web pages, the success of the web and its original means for presenting information, HTML, had been amply demonstrated.
However, some people who had been looking at the internet from a different perspective had concluded otherwise. For those observers, who would start developing XML, HTML had problems:
This is not to denigrate HTML, but merely to establish the perspective that XML's developers held. From other valid perspectives, these problematic characterizations of HTML represent uncontestable virtues. But that wasn't the point.
While the world was flocking to the internet and HTML, a group of men and women watched with bemused concern. These were the developers, implementors, and users of HTML's parent technology, SGML (Standard Generalized Markup Language, ISO 8879:1986). These individuals and their companies had already invested heavily in SGML, which governed the semantics of their documents and of the information of which those documents were composed.
SGML, unlike HTML, assures its users an extensible tag set, and it establishes the rules by which documents (or "information products," as one expert persists in calling them) are produced. SGML yields sets of tags, as HTML is a set of tags, for characterizing what pieces of information mean. The people who used SGML and structured information systems were to become XML's developers, and they believed that SGML technology could enrich and revolutionize the web in some key ways:
In August 1996, these concerned SGML experts gathered in Seattle under the auspices of the GCA (Graphic Communications Association) to investigate how SGML could emerge on the web scene and command the interest of the web community. Led by Jon Bosak of Sun Microsystems, their discussions focused on two general areas:
The first discussion established the need for SGML on the web. By articulating worthwhile, even mission-critical work that could be done on the web if there were a suitable information format, the SGML experts hoped to justify SGML on the web with some compelling business cases.
The second discussion raised the thornier issue of how to "fix" SGML so that it was suitable for the web. After all, if SGML on the web were such an intuitively brilliant idea, it ought to have happened already. But HTML and its specific presentational tag set, not SGML and its multiple semantic tag sets, were on the web in August 1996.
The experts laid out a plan of radical surgery for the SGML standard itself. In order to make SGML palatable to a wider audience, aspects of the standard that made logical sense but were difficult and costly to program had to be modified or even excised. SGML was designed as a rigorous, complete system; ease of implementation in software applications was not its ruling priority. The experts quickly established a rough laundry list of "SGML inessentials" for moving structured information onto the web.
Even before this Seattle conference, Bosak and a small, carefully chosen group of SGML and structured-information experts approached the W3C to propose adding an "SGML on the web" activity to its efforts. The W3C agreed that this was worthwhile and sponsored the effort within its architecture domain. By July 1996, the effort to fit SGML on the web had begun.
Early in the activity, the W3C representatives who were to develop the XML standard determined that "SGML on the web" would not fly. SGML has its passionate devotees, but it also has its equally passionate detractors. The working group (originally called the "SGML Editorial Review Board") decided to refashion SGML on the web into something new, unburdened with SGML's history. To emphasize its difference from HTML, the working group named it Extensible Markup Language.
The working group members quickly set themselves an aggressive schedule in which to specify the features of XML. They planned the work in three phases:
The particulars of these efforts are available at the XML resources listed above. As mentioned previously, the XML 1.0 standard was approved and published by the W3C on February 10, 1998. Work on XLL and XSL is proceeding.
The world has responded enthusiastically to XML. Go to any trade show or conference associated with publishing, documents, or the internet (and intranets or extranets), and you will see vendor upon vendor pledging or even demonstrating support for XML. The tools with which to create internet content are promising XML. The programs that deliver internet content are embracing XML. The systems that manage internet content are committed to XML.
What does that mean for you, the individual or enterprise who wants to take strategic advantage of XML and all that it promises?
You have to clear some hurdles. These include both familiar and new challenges:
If you have ever migrated from one application to another for developing information, say in a word processor or a spreadsheet, you know about changing your information to fit the new tool's data format. Moving your legacy data into XML is a data conversion task, but it is more. It is a strategic operation to add new business value to your information. This requires work.
Converting any information from a display format such as HTML, RTF, MIF, or PostScript to a structured format like XML will require that you understand what your information really contains. This requires a document analysis and the determination of information semantics on which different parts of your enterprise rely. If it sounds daunting, it is. But there is also good news. Many an enterprise like your own has done this already, and many enterprises in different business sectors have established industry standard information models that can be expressed in XML and, more importantly, can be shared.
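As a minimal sketch of what such a conversion involves, the Python fragment below maps display-oriented markup onto semantic XML elements. Everything in it is a hypothetical illustration rather than part of any published information model: the sample markup, the element names (product, product-name, part-number, price), and the mapping table, which stands in for the outcome of a document analysis. It uses only the standard library's xml.etree.ElementTree module and assumes the display markup is already well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical display-oriented input: the tags say how the text should
# look (bold, italic, red), not what the text means.
DISPLAY_MARKUP = """
<div>
  <b>WidgetPro 2000</b>
  <i>SKU-1138</i>
  <font color="red">49.95</font>
</div>
"""

# Hypothetical result of a document analysis: which display convention
# carries which meaning in this (assumed) family of documents.
DISPLAY_TO_SEMANTIC = {
    "b": "product-name",
    "i": "part-number",
    "font": "price",
}


def to_semantic(display_xml, mapping, root_tag="product"):
    """Rebuild display markup as semantic XML using the given mapping."""
    source = ET.fromstring(display_xml)
    target = ET.Element(root_tag)
    for child in source:
        semantic_tag = mapping.get(child.tag)
        if semantic_tag is None:
            # No agreed meaning for this element; a real conversion
            # would flag it for human review instead of skipping it.
            continue
        ET.SubElement(target, semantic_tag).text = (child.text or "").strip()
    return target


if __name__ == "__main__":
    product = to_semantic(DISPLAY_MARKUP, DISPLAY_TO_SEMANTIC)
    print(ET.tostring(product, encoding="unicode"))
```

The mapping table is the interesting part: the code is trivial once someone has decided that bold text in these documents always means a product name. Making and agreeing on those decisions is the document analysis described above, and it is where most of the effort goes.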
Once the relevant information models and their expressions in XML are constructed, the effort to convert existing information into the XML format can proceed. It may or may not be painful, depending on the condition of your existing documents. These efforts can be done in house, or they can be completed with the help of qualified consultants.
Future articles will discuss possible transitions from today's information into XML in greater detail.
Once you have created a body of XML information, you will learn to treat it differently from the information you had before. The applications, file systems, and other software you relied on to elaborate information may not work so well with XML. Those traditional tools may not effectively expose the new value in your XML information. But again there is good news. It is clear that the marketplace is well prepared to deliver XML support in all phases of an enterprise's transition to XML. Already many software vendors are announcing, testing, and even delivering tools to aid in these critical phases of your transition:
Again, future articles in this series will discuss the issues and strategies involved in moving into XML and taking full advantage of structured information.
There are as many ways to make the XML transition as the Cartesian product of organizations, legacy data formats, and human moods. The next article in this series will look at some XML transition plans currently underway at Sun.
Todd Freter is a programs manager at Sun and is responsible for developing tools and technology strategies for producing and delivering technical information.