[Archive copy mirrored from the URL: http://www.zdnet.com/macweek/seybold97/1001/standards.html; see this canonical version of the document.]
SGML gains recognition, popularity in XML subset
Standards focus of session
By Erik Sherman
SGML, or Standard Generalized Markup Language, has been an esoteric realm since its introduction in 1986. SGML certainly has been put to good use by corporations in industries such as aerospace, automotive, pharmaceuticals and commercial publishing to create and maintain long technical documents that are impossible to manage in more typical desktop publishing programs. Yet for all the utility in larger businesses, the standard has received relatively little attention, especially compared with its Web-weaving cousin, HTML.
How things have changed. Today, for the first time in the history of its San Francisco show, Seybold is featuring a full-day seminar on the topic -- "SGML/XML Knowledge Day." Specialized vendors are exhibiting in the SGML Open Consortium Pavilion (Booth #3109 N) with a number planning to make product announcements, and Microsoft Corp. (Booth #1233 S) and Netscape Communications Corp. are at the show and are expected to demonstrate significant support for XML (eXtensible Markup Language) -- a new standard that is a subset of SGML.
Smaller is bigger
"The biggest thing that's happening is XML," said Frank Gilbane, a director at CAP Ventures Inc., a Norwell, Mass.-based research firm. "It's designed to solve the main limitations of HTML without the complexities that a full SGML would carry with it."
HTML describes how text will be presented on a Web page. By its nature, HTML acts like an abstracted typographic language -- it indicates whether text should be displayed as a headline, article text and so on -- and makes no distinction about the nature of a document's content. HTML cannot distinguish between, say, a chapter and a section that belongs within that chapter. This has little impact on simple Web sites. Complex sites, however, become much more difficult to manage because a company has no automated way to examine what kind of content each page holds.
"Right now it's hard to do any document management [in HTML] because the links are hard coded," said Robin Tomlin, executive director of the SGML Open Consortium, an SGML vendor association. "When you have [support for document structure], you can have some validations of the information, so you know that all the information you should have there exists."
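To make the chapter-versus-section distinction and Tomlin's point about validation concrete, here is a minimal sketch in Python. The element names (`book`, `chapter`, `section`, `title`, `para`) and the rule that every chapter and section must carry a title are invented for illustration; they do not come from any vendor's product or DTD, and a production system would more likely validate against a DTD than against hand-written checks like these.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured document: the tags say what the content IS,
# not how it should look on screen.
DOC = """
<book>
  <chapter>
    <title>Installation</title>
    <section><title>Requirements</title><para>A Mac with 16MB of RAM.</para></section>
    <section><para>This section has no title.</para></section>
  </chapter>
  <chapter>
    <section><title>Orphan</title><para>Its chapter has no title.</para></section>
  </chapter>
</book>
"""

def missing_titles(xml_text):
    """Return a list of chapters and sections that lack a <title> child."""
    root = ET.fromstring(xml_text)
    problems = []
    for c_idx, chapter in enumerate(root.findall("chapter"), start=1):
        if chapter.find("title") is None:
            problems.append(f"chapter {c_idx}")
        for s_idx, section in enumerate(chapter.findall("section"), start=1):
            if section.find("title") is None:
                problems.append(f"chapter {c_idx}, section {s_idx}")
    return problems

if __name__ == "__main__":
    for problem in missing_titles(DOC):
        print("missing title:", problem)
```

Because the markup records which text is a chapter and which is a section, the check can report exactly where required information is absent. The same question cannot be asked of a page marked up only with display-oriented tags.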
XML is like a cross between SGML and HTML. It has the flexibility developers need to define types of content and how they are related while being less complex than full-blown SGML. Since for all practical purposes it encompasses HTML, existing Web content can remain as it currently stands. Because it has most of the power of SGML, a single tagged text document could conceivably drive not only display of Web sites, but production of paper documents.
"XML is not so format-driven, but it looks that way. Once you have information identified like that, then you can do more manipulation. It's not just dumb data," Tomlin said.
"In very few cases would you see anyone creating HTML documents, then using that source data to produce paper or some other output form of the document," said Mike Maziarka, director of Parlance product management for XyVision Inc. (Booth #3314 N). Such cross production should be easily achieved with XML.
"The impact of XML on companies creating content is that the process of collecting and/or converting information into XML is much quicker and less expensive than for SGML. The fact that XML is not predefined allows the information to be more easily repurposed," said Gary Palmer, director of R&D for ActiveSystems Inc. (Booth #3202 N).
XML and HTML are different, but only to the creator of a document. Users retrieving a page would not notice any difference, because an organization that uses XML, or even SGML, to publish information on the Web could translate the document into HTML, which the browser would then display.
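The translation step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any exhibitor's tool: the source tags (`chapter`, `section`, `title`, `para`) are hypothetical, and the mapping of nesting depth to HTML heading levels is one arbitrary choice among many.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical structured source; the tag names are assumptions.
SOURCE = (
    "<chapter><title>Setup</title><para>Unpack the box.</para>"
    "<section><title>Power &amp; cables</title><para>Plug it in.</para></section>"
    "</chapter>"
)

def to_html(element, depth=1):
    """Walk the structure and emit display-oriented HTML.

    Titles become headings whose level follows the nesting depth,
    paragraphs become <p>, and sections recurse one level deeper.
    Unknown tags are skipped.
    """
    parts = []
    for child in element:
        if child.tag == "title":
            parts.append(f"<h{depth}>{escape(child.text or '')}</h{depth}>")
        elif child.tag == "para":
            parts.append(f"<p>{escape(child.text or '')}</p>")
        elif child.tag == "section":
            parts.append(to_html(child, depth + 1))
    return "\n".join(parts)

def render(xml_text):
    """Parse the structured source and return an HTML fragment."""
    return to_html(ET.fromstring(xml_text))

if __name__ == "__main__":
    print(render(SOURCE))
```

The browser sees only the generated headings and paragraphs. The structured source stays with the publisher, where a different function could walk the same tree to produce print output.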
The multiple standards tracks have split the market. The authoring market has shaken out, and what was once an entire category of high-end authoring tools has shrunk to a handful of offerings.
For XML to catch on, it needs wide vendor acceptance, especially if browsers are to support the standard. And that seems to be on the horizon, as both Netscape and Microsoft have signed on to back the SGML extravaganza at Seybold San Francisco.
"For them to be talking on SGML Knowledge Day about XML is a real leap," Tomlin said. "Typically, SGML has been a real niche standard." According to sources, the major vendor support could be impressive, with Microsoft conceivably creating an entire XML marketing group for its activities.
Gilbane said he also expects major support from the two browser companies to drive use of the standard. He noted that Microsoft is backing the use of XML to describe such Web-related activities as channels for push publishing.
Support for XML is coming from the niche vendors as well as the Internet giants. "A lot of the vendors are coming out with new XML browsers and viewers," Tomlin said. "And because XML allows you to manipulate data on the Web, I think you're also going to see data-management tools that support XML on the Web."
As an example, ArborText Inc. (Booth #3203 N) is announcing a major upgrade to its authoring system. XyVision (Booth #3314 N), while not planning any product releases with XML support for the show, is still formulating its plans and expects to make an announcement in the future. ActiveSystems (Booth #3202 N) will demonstrate how to use its existing products to integrate SGML and XML. A number of vendors in the SGML Open Consortium are exhibiting as part of an SGML pavilion at the show (Booth #3109 N). Several companies are also offering presentations on SGML in a theater area near the pavilion.
"They don't just want to publish stuff on the Web," Gilbane said. "They want to build a repository of this electronic information." By taking on the additional work of creating an SGML repository and building tools to extract HTML content, companies open up more options for publishing.
One problem with user implementation has been unnecessary complication. "If you design your application well enough, you don't really need a complex tool," Gilbane said. "Sometimes you can't make it that simple, but a lot more people could than do."
"SGML/XML Knowledge Day," Wednesday, 10:30 a.m.-5:30 p.m., Center for the Arts: Forum.
Copyright © 1997 Mac Publishing LLC. All rights reserved.