SGML: Seybold on XML

From [W3C mailing] Thu Nov 21 18:45:30 1996
Date: Thu, 21 Nov 1996 14:37:47 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Seybold on XML
[Courtesy of a correspondent.]


Volume 2, No. 7
November 20, 1996


On the tenth anniversary of the adoption of SGML as an ISO standard, a band of SGML experts announced that they have drafted a simplified subset of the language, which they hope will spur the use of SGML on the Internet. The new language, Extensible Markup Language, or XML, was prepared by a World Wide Web Consortium working group of about 80 members, primarily representing vendors. The announcement was made at SGML '96, being held in Boston this week. The first published draft is available on the Web at

XML, like SGML, is a meta-language for describing the markup of different types of documents. It is far simpler than SGML, reducing a 500-page reference to a 26-page specification.

Unlike HTML, which has a fixed (albeit changing) set of tags, XML lets you define your own tags and attributes. Support for XML by the Internet community would open up vast new possibilities for Internet publishing. Instead of shoehorning all documents into HTML, or having to invent a browser to handle non-HTML documents, XML would enable a wide array of user-defined documents to be handled by generic Web application software.

Users of SGML can easily make use of XML. XML is a subset of SGML, meaning every XML document is also a conforming SGML document, so translation from SGML to XML is straightforward.

To simplify SGML, the W3C working group dropped support for features that required heavy processing by SGML client software. For example, a well-formed XML document is unambiguous, so a browser or editor can read the tags and build a tree of the document's hierarchical structure without having to read a document type definition. XML also disallows markup minimization, requires that empty elements be self-identifying, and omits several of SGML's complex optional features.
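To make the well-formedness point concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. Note the assumption: that parser implements today's XML 1.0 Recommendation, not the 1996 draft described in this article. The tag names (`memo`, `to`, `body`, `br`) and the `priority` attribute are invented for illustration. No DTD is supplied, yet the parser recovers the full element tree from the tags alone, and the empty element `<br/>` identifies itself through its trailing slash.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with user-defined tags; note there is no DOCTYPE or DTD.
DOC = """<memo priority="high">
  <to>Editorial Review Board</to>
  <body>Draft ready.<br/>Comments welcome.</body>
</memo>"""


def outline(elem, depth=0):
    """Return the element hierarchy as a list of indented tag names."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


# Parsing succeeds because every element is explicitly closed.
root = ET.fromstring(DOC)
print("\n".join(outline(root)))
print("priority:", root.get("priority"))

# SGML-style minimization (here, an omitted </to> end tag) is not well-formed XML,
# so the parser rejects it instead of consulting a DTD to infer the missing tag.
try:
    ET.fromstring("<memo><to>ERB</memo>")
except ET.ParseError as err:
    print("not well-formed:", err)
```

Because end tags are explicit and empty elements carry the `/>` marker, a generic parser needs no per-document grammar to recover the hierarchy; checking a document against a DTD remains a separate, optional step.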

At least two vendors were demonstrating XML support at SGML '96. Neither Microsoft nor Netscape has disclosed whether it will support the standard.

XML uses ASCII and Unicode as its primary character sets.

Having rocked the SGML community with the most radical SGML development in a decade, the W3C working group plans to continue with two more phases of XML. According to Jon Bosak, chair of the W3C SGML Editorial Review Board, the next phase will add more complex hyperlinking, and the third phase will address style sheets, using either an improved version of CSS or an online version of DSSSL.