[Archive copy mirrored from the URL: http://www.mcs.net/~dken/xmlcase.htm; see this canonical version of the document.]

The Case for XML
by Dianne Kennedy


Are People Really Happy with HTML?

According to Netscape (Seybold Report on Internet Publishing, January 1997), HTML is and will remain the overwhelming choice for Web-based publishing. "The customer has spoken clearly preferring HTML to SGML, and Netscape feels that customer needs are better met by deploying resources to further the functionality and ease of use of HTML." Given my first-hand experience as an industry leader in several major publishing sectors, I was surprised to see the lack of understanding of the publishing market displayed by Netscape. I am the first to admit that SGML is certainly overkill, even for many serious publishers. But I cannot buy Netscape's position that Internet publishers are satisfied with HTML as the predominant Web publishing language. Yes, my teenage daughter and her friends are happy with HTML. It's simple and fun to use. But it is my contention that serious publishers find HTML woefully inadequate.

Publishers are clearly intrigued by the promise of publishing directly on the Web. Yet today the majority of electronic publishers remain steadfastly committed to CD-ROM as their preferred commercial delivery mechanism. According to Dale Waldt, VP of Product Development for Thomson's Research Institute of America Tax Publications, HTML is just too "brain-dead" to provide useful functionality. Datamation, in its February 1997 issue, points out that companies with established intranets are also becoming disenchanted with HTML. "Companies are yearning to push their intranets past the limits of read-only, in-building HTML."

HTML, the underlying language of most Web documents today, is a tag set that has been specifically designed to support display and hypertext linking. The use of HTML has grown exponentially because it is so easy to learn and to use. Tools to author in HTML are now both commonplace and affordable. But more and more, HTML is coming under fire for several reasons.

HTML for Automotive / Heavy Truck Information Delivery

On November 15, 1990, the Clean Air Act mandated the automotive industry to provide emissions-related service information in an electronic format which would facilitate the diagnosis and repair of vehicle emission control systems. To satisfy this regulatory requirement, the Society of Automotive Engineers responded with the development of a family of standards known collectively as SAE J2008. At the heart of these standards is a relational data model for the systems, configurations, and components of a vehicle and an SGML tagging scheme which specifies the encoding of all service information. In 1995, the heavy trucking industry developed a companion standard known as T2008 which is currently being consolidated with the automotive standard. By the year 2002, vehicle service information in this SGML format will be a legal requirement for sales in the state of California. It is expected other states will quickly adopt this requirement as well.

The outstanding question in the automotive and trucking industry remains one of delivery. Clearly the Web should be the delivery medium of choice. But this is simply unworkable if HTML remains the sole language of the Web. To understand why this is so, one must go back to the legislative requirement, that is, the requirement for electronic access to diagnostic and repair information. J2008 is a specification which partitions and categorizes vehicle diagnostic and service information. It was designed so that a technician can quickly navigate the information base and find all information necessary to perform effective diagnosis and repair. The goal is to reduce search time by facilitating data retrieval. Because HTML allows neither hierarchy nor the addition of data-specific tags to facilitate data navigation and retrieval, the Web is simply not a delivery option for this major business sector. If HTML could be extended to include some critical data-specific tags such as make, model, year, configuration, and service category, the Web would become a center for automotive and heavy truck publishing commerce. People could download service information and, because of the richness of the encoding, continue with client-side processing. Without extensibility, these large corporations must turn to other electronic delivery channels or rely upon server-side processing.
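To make the idea of data-specific tags concrete, here is a minimal sketch of the kind of client-side processing described above. The element names simply mirror the fields listed in the paragraph (make, model, year, configuration, service category); they are assumptions for illustration and are not the actual SAE J2008 tag set. The vehicle data and the `find_procedures` helper are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element names follow the fields named in the text,
# not the real SAE J2008 tags. The vehicle data is invented for illustration.
SERVICE_DOC = """
<service-info>
  <procedure>
    <make>Acme</make>
    <model>Roadster</model>
    <year>1996</year>
    <configuration>2.0L manual</configuration>
    <service-category>emissions</service-category>
    <title>Replace oxygen sensor</title>
  </procedure>
  <procedure>
    <make>Acme</make>
    <model>Roadster</model>
    <year>1996</year>
    <configuration>2.0L manual</configuration>
    <service-category>brakes</service-category>
    <title>Bleed brake lines</title>
  </procedure>
  <procedure>
    <make>Acme</make>
    <model>Hauler</model>
    <year>1997</year>
    <configuration>6.5L diesel</configuration>
    <service-category>emissions</service-category>
    <title>Inspect EGR valve</title>
  </procedure>
</service-info>
"""


def find_procedures(xml_text, **criteria):
    """Return titles of procedures whose fields match every criterion.

    Keyword names use underscores (service_category) and are mapped to
    hyphenated element names (service-category).
    """
    root = ET.fromstring(xml_text)
    titles = []
    for proc in root.iter("procedure"):
        if all(
            proc.findtext(name.replace("_", "-")) == value
            for name, value in criteria.items()
        ):
            titles.append(proc.findtext("title"))
    return titles


if __name__ == "__main__":
    # Narrow the downloaded document by structure rather than by word search.
    print(find_procedures(SERVICE_DOC, model="Roadster", service_category="emissions"))
```

Once the document has been downloaded, the filtering runs entirely on the client, which is the point the paragraph makes about avoiding server-side processing.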

HTML for the Airline Industry

In 1990, the ATA (Air Transport Association) / AIA (Aerospace Industries Association) began work to update their publication specification, then known as ATA 100. The goal of the updated standard was for the manufacturers to provide digital information which was directly usable by the airlines. SGML was chosen as the basis for the encoding scheme, and over the next few years document tag sets were designed to facilitate the interchange and use of data by the airlines. The same tag sets were also meant to help airline mechanics by providing a mechanism for easy location and retrieval of the information required to complete a specific maintenance task on an aircraft uniquely identified by tail number. Today most aircraft which are in production are supported by SGML-encoded maintenance and flight operations data.

Again, the question comes down to one of delivery. Providing highly structured data, designed for faster information access, was the goal of ATA Specification 2100 and is clearly the goal of aircraft manufacturers. But without economical and reliable information delivery mechanisms, no benefit can be realized. According to a statement by the ATA TICC Committee in 1996, it is estimated that structure-based information access can result in savings of 2.1 to 3.6 mechanic-hours per day, or $500,000 or more per year, to the average airline. Can the airline community rely upon the Web to serve as the channel for information delivery? With the current limitations of HTML, the answer is "NO". What users need is an extensible HTML that supports critical search paths, so that the Web can become a viable information delivery alternative in this business sector.

The List Goes On . . .

Another publishing sector where I am highly involved is the arena of journal publishing. Journal publishers have attempted for years to standardize and use SGML as a basis for their publishing efforts. First the industry developed the Association of American Publishers Electronic Manuscript Standard. This was an early SGML definition for the coding and publication of journals and manuscripts. In the early 1990s this work was updated under the auspices of ISO to become ISO 12083, "Electronic Manuscript Preparation and Markup". Despite the clear need for structured data to facilitate ongoing academic research, the cost of implementing SGML and the roadblocks caused by the disparate authoring community have led to a painfully slow adoption of SGML within this publishing market. Electronic delivery in this market seems split between HTML on the Web and PDF with Acrobat readers. Economics leave most journal publishers with few if any options. However, according to Trish Redmond, publishing specialist with the Coris group of R.R. Donnelley in New York, neither HTML nor PDF meets the real need. "With either medium the academic community must rely upon archaic word search to find what they need. With just a few critical data field tags in addition to the HTML tag set, search time could be cut in half. New discoveries could be shared and utilized instantaneously across the Web. It boggles the mind to imagine the advances such a simple improvement in the Web language would allow."

In her position at Donnelley, Ms. Redmond works in many other publishing marketplaces. Another area where an extensible HTML would improve the ability of companies to do business on the Web is on-line catalogs. "With HTML, limiting customers to word search of on-line catalogs makes use of such catalogs quite cumbersome and time-consuming. A frustrated customer is not a happy customer. The only option today is to put catalog databases up on the Web. This is costly and also not ideal from a consumer point of view. If we could extend HTML, however, to add our own tags for key fields such as , , or searches would become much easier for customers trying to shop on the Web. This is just what we need for our catalog customers."
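The difference between word search and field search in a catalog can be sketched as follows. The tag names (`item`, `name`, `color`, `price`, `description`) and the catalog entries are hypothetical, chosen only for illustration; they are not the fields named in the quotation.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog markup and data, invented for this example.
CATALOG = """
<catalog>
  <item>
    <name>Garden chair</name><color>red</color><price>45.00</price>
    <description>Sturdy chair for patio use.</description>
  </item>
  <item>
    <name>Garden table</name><color>green</color><price>120.00</price>
    <description>Matches the red garden chair.</description>
  </item>
  <item>
    <name>Umbrella</name><color>red</color><price>150.00</price>
    <description>Wide canopy with tilt.</description>
  </item>
</catalog>
"""


def word_search(xml_text, word):
    """Plain text search: match the word anywhere in an item's text."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("name")
        for item in root.iter("item")
        if word.lower() in " ".join(item.itertext()).lower()
    ]


def field_search(xml_text, color, max_price):
    """Field search: match on the color and price elements only."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("name")
        for item in root.iter("item")
        if item.findtext("color") == color
        and float(item.findtext("price")) <= max_price
    ]


if __name__ == "__main__":
    print(word_search(CATALOG, "red"))          # every item mentions "red" somewhere
    print(field_search(CATALOG, "red", 100.0))  # only red items at or under 100.00
```

In this small sample the word search returns all three items, including the green table whose description merely mentions a red chair, while the field search returns only the item the shopper actually asked for.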

XML: The Time Has Come

According to Norman Scharpf, president of GCA, "As the volume of on-line documents continues to grow, being limited to text search will become increasingly frustrating and unacceptable. XML is designed to facilitate the development of next-generation Web publications where searches can be based upon document structure or specific content fields as well as text strings. XML will provide solutions for Web publishers with none of the shortcomings inherent in today's HTML."

XML, like HTML, is an extremely simple dialect of SGML. But XML is different in that it is extensible. This means that Web publishers can finally either extend HTML by adding their own tags or create their own descriptive tag sets. Because the extension of HTML tags or the creation of new tag sets follows a simple methodology prescribed by the XML standard, XML provides Web publishers with the best of all worlds without the rigors of document design, coding, and validation imposed by SGML.
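As a rough illustration of extensibility, the sketch below mixes familiar HTML-style tags with two publisher-defined tags and hands the result to a generic XML parser. The `torque` and `part-number` elements, the page content, and the helper functions are all hypothetical. The example uses the parser from today's Python standard library and assumes only that a document must be well formed (every tag properly closed and nested) to be accepted; no tag set is declared in advance.

```python
import xml.etree.ElementTree as ET

# HTML-style tags (html, body, h1, p) alongside two invented tags
# (torque, part-number). All content here is made up for illustration.
PAGE = """
<html>
  <body>
    <h1>Brake Service</h1>
    <p>Torque the caliper bolts to <torque unit="Nm">35</torque>.</p>
    <p>Order part <part-number>BR-1042</part-number>.</p>
  </body>
</html>
"""


def tag_names(xml_text):
    """Return the set of element names used in a well-formed document."""
    return {element.tag for element in ET.fromstring(xml_text).iter()}


def is_well_formed(xml_text):
    """Report whether a generic XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(sorted(tag_names(PAGE)))
    print(is_well_formed(PAGE))
    # A tag left unclosed is rejected, whatever its name.
    print(is_well_formed("<p>Order part <part-number>BR-1042</p>"))
```

The parser treats the invented tags exactly like the HTML-style ones, which is the sense in which a publisher can add tags without first writing a full document type definition.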

Netscape dismisses the potential of XML by simply viewing it as "essentially a new version of SGML" and therefore uninteresting to the publishing community at large. I agree with the Seybold Report when it replies, "That product managers at Netscape think HTML somehow will work as a way to define all Web documents that we need ... illustrates just how far behind Netscape will be should Microsoft decide to leverage its expertise in SGML and XML... By ignoring the W3C-blessed vendor-neutral approach, Netscape is leaving itself wide open to be pounded by Microsoft on the one point Netscape once hounded its rival: the oh-so-bad label of 'proprietary.'"

Dianne Kennedy serves as chairperson for the SGML-based publishing efforts of the automotive and heavy trucking industries' SAE J2008 standard. She is also deputy convener for ISO 12083, the ISO standard for journal publishing. Over the past seven years, she has also served as a key industry consultant to the airline industry in the development of its ATA Specification 2100, which guides the authoring of all aircraft maintenance and flight operations documentation.