



IEEE Expert Intelligent Systems & Their Applications (formerly IEEE Expert) 1997



Internet Services

IEEE EXPERT
Vol. 12, No. 3: MAY/JUNE 1997, pp. 98-99

XML and style sheets promise to make the Web more accessible


Giovanni Flammia, MIT Laboratory for Computer Science, flammia@sls.lcs.mit.edu

The theme of the Sixth World Wide Web Conference, held in Santa Clara, April 6-11, was accessibility. Today's Hypertext Markup Language documents, with all the graphical bells and whistles added by Netscape and Microsoft, are biased toward a rich graphical user interface provided by the two most popular Web browsers running on desktop machines.


This poses a serious accessibility problem when we try to present the information encoded in an HTML document through a different kind of user interface. For example, we might want to present the information by speaking it aloud via a telephone handset or using a braille device, via a text-only small display such as the US Robotics Palm Pilot, or on a low-resolution display such as WebTV. We might even want to present the content in a different language or summarize it in a few sentences. The accessibility problem is an issue not only for disabled people but also for the elderly, and for rendering Web documents on net-enabled computing devices, each one with its own special user interface modality and limitations.


The current situation

Currently, rendering HTML in a modality or at a display resolution other than the one for which it was originally designed is a very tedious process that usually produces medium- to low-quality output. This is because it is difficult to extract the relevant content from HTML documents and separate it from display directives such as columns, tables, frames, and JavaScript widgets.

In addition, metacontent about the document is usually missing. Metacontent is some additional information that specifies in a structured fashion what a document is all about so that information can be easily extracted and indexed. One example of metacontent would be, "This document is a directory with names, e-mail addresses, and phone numbers, and the format follows such and such a syntax." This type of deep information is different from an HTML surface specification, which instead states, "This document is a table with three columns and 20 lines, there is some text in each cell, the second column cells all have the @ sign, and the third column cells all have seven digits."
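The gap between a surface specification and metacontent can be illustrated with a minimal Python sketch. It is not part of the original column: the sample table, the class name TableRows, and the function guess_column_kinds are hypothetical, and the heuristics simply mirror the "@ sign" and "seven digits" observations above.

```python
from html.parser import HTMLParser

# Hypothetical directory page: names, addresses, and numbers are made up.
DIRECTORY_HTML = """
<table>
 <tr><td>Ada Example</td><td>ada@example.org</td><td>555-0101</td></tr>
 <tr><td>Bob Sample</td><td>bob@example.org</td><td>555-0102</td></tr>
</table>
"""

class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._in_cell = False
            self.rows[-1][-1] = self.rows[-1][-1].strip()

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data

def guess_column_kinds(rows):
    """Guess what each column means from surface clues alone."""
    kinds = []
    for column in zip(*rows):
        digits = [cell.replace("-", "") for cell in column]
        if all("@" in cell for cell in column):
            kinds.append("email")
        elif all(d.isdigit() and len(d) == 7 for d in digits):
            kinds.append("phone")
        else:
            kinds.append("unknown")  # nothing in the markup says "name"
    return kinds

parser = TableRows()
parser.feed(DIRECTORY_HTML)
print(guess_column_kinds(parser.rows))  # ['unknown', 'email', 'phone']
```

The first column stays "unknown" because nothing in the table markup says that it holds names. That is exactly the information metacontent would supply.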

HTML is an application of the Standard Generalized Markup Language (SGML), a metalanguage that governs how to specify the structure of text and hypertext documents, using delimited tags that wrap around plain text. HTML was invented by Tim Berners-Lee so that anybody could write simple hypertext documents by hand without knowing the details of SGML. HTML was so easy to learn that it contributed to the exponential growth of the number of documents on the Internet. In contrast, the more generic SGML is harder to learn, and SGML documents require special grammar rules for parsing their contents. These rules are specified by document-type definitions (DTDs), encoded in separate files. SGML's generality makes it quite complex. HTML started out as a simple application of SGML, with its own DTD. Unfortunately, even HTML is becoming quite complex, and it is heavily biased toward the visual display and typographical layout of information.


The World Wide Web Consortium introduces two new features

The W3C has been trying to standardize the proprietary features that Netscape and Microsoft introduce with each new version of their browsers. To allow more flexible display of documents, the W3C introduced cascading style sheets (CSS), external files that specify typographical renderings of documents in the same spirit as TeX document-style specifications.

At this conference, the W3C introduced another language derived from SGML: the Extensible Markup Language (XML). XML is a limited subset of SGML with a simplified syntax. It does not require DTD grammar rules, yet it allows any new tagging scheme to be specified, as long as the tags obey some simple rules of well-formed syntax. XML makes encoding the metacontent about each document as easy as writing HTML documents. For example, an address list can be encoded in an XML document as follows, wrapping each element in balanced tags:

<?xml version="1.0" standalone="yes"?>
<addressBook>
 <entry>
  <firstName>Bill</firstName>
  <lastName>Clinton</lastName>
  <mailto>president@whitehouse.gov</mailto>
 </entry>
</addressBook>
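As a rough illustration of how a generic tool could consume such a document, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. It is not from the original column: the function name read_entries is hypothetical, and the XML declaration line is left out of the embedded text so that the sketch does not depend on any particular declaration syntax.

```python
import xml.etree.ElementTree as ET

# The address book from the article, without the declaration line.
ADDRESS_BOOK = """
<addressBook>
 <entry>
  <firstName>Bill    </firstName>
  <lastName> Clinton </lastName>
  <mailto>   president@whitehouse.gov</mailto>
 </entry>
</addressBook>
"""

def read_entries(xml_text):
    """Return one dict per <entry>, keyed by the child tag names."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("entry"):
        # Strip the padding that the hand-written example uses for alignment.
        fields = {child.tag: (child.text or "").strip() for child in entry}
        entries.append(fields)
    return entries

for fields in read_entries(ADDRESS_BOOK):
    print(fields["firstName"], fields["lastName"], fields["mailto"])
# prints "Bill Clinton president@whitehouse.gov"
```

The tag names themselves say what each field means, so the code needs no guessing about columns or cell contents.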

XML is not limited to encoding text data, but can be used to represent and communicate in plain text many types of structured data. For example, one of the earliest applications of XML has been the specification of a standard for encoding chemical structures called the Chemical Markup Language. Following the example of the chemical domain, each different application can define new tagging schemes, and generic tools can be used to parse, render, index, and search XML documents. In principle, it is not necessary to establish standards for determining the syntax of new tags, as long as new and specialized tagging schemes obey some simple rules of well-formedness.
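The idea that any tagging scheme is acceptable as long as it is well formed can be sketched in a few lines of Python. This is an illustration only: the function name is_well_formed is hypothetical, and the molecule and atom tags are invented for the example and are not taken from the Chemical Markup Language.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if a generic XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Invented tags: the parser has never heard of them, and does not need to.
balanced = "<molecule><atom>C</atom><atom>O</atom></molecule>"
unbalanced = "<molecule><atom>C</molecule>"  # <atom> is never closed

print(is_well_formed(balanced))    # True
print(is_well_formed(unbalanced))  # False
```

The parser knows nothing about the vocabulary. It only checks that the tags nest and balance, which is why the same tool can serve every application domain.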

XML can be coupled with cascading style sheets that specify how to render each field of an XML document on various types of net-enabled devices. Each device will have its own specialized style sheet that takes advantage of the modalities and limitations of the device it is designed to operate on. In particular, T.V. Raman from Adobe introduced another new proposed standard from the W3C, cascaded speech style sheets. These are specialized style sheets tailored toward the audio rendering of HTML documents. Just as cascading style sheets specify the typographical font styles for displaying the various sections and tagged text, cascaded speech style sheets specify voice color, timbre, pitch, and quality for the text-to-speech systems that speak the documents aloud. For example, different types of voices, pauses, and sounds can be used to render hyperlinks and list items. The specification includes directions for placing voices and sounds in a 3D space, using stereo audio output.
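The principle of one document with one style sheet per device can be mimicked with a toy Python sketch. Everything here is an assumption made for illustration: the STYLE_SHEETS table, the device names, and the render function are hypothetical, and the rules are plain format strings, not real style sheet syntax.

```python
import xml.etree.ElementTree as ET

ENTRY = (
    "<addressBook><entry>"
    "<firstName>Bill</firstName>"
    "<lastName>Clinton</lastName>"
    "<mailto>president@whitehouse.gov</mailto>"
    "</entry></addressBook>"
)

# Hypothetical per-device rules: one format string per tag name.
STYLE_SHEETS = {
    "small_text_display": {
        "firstName": "{} ",
        "lastName": "{}\n",
        "mailto": "<{}>\n",
    },
    "speech": {
        "firstName": "{} ",
        "lastName": "{}. ",
        "mailto": "[pause] e-mail: {}. ",
    },
}

def render(xml_text, device):
    """Render the same XML content with the rules of one device."""
    rules = STYLE_SHEETS[device]
    parts = []
    for element in ET.fromstring(xml_text).iter():
        rule = rules.get(element.tag)
        if rule is not None:  # container tags have no rule of their own
            parts.append(rule.format((element.text or "").strip()))
    return "".join(parts)

print(render(ENTRY, "small_text_display"))
print(render(ENTRY, "speech"))
```

The content is written once. Only the rule table changes from device to device, which is the separation of content and presentation that the style sheet proposals aim for.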


Future prospects

I left Santa Clara wondering about the future. While XML and cascaded speech style sheets promise to increase the accessibility and indexing of Web documents, it is unclear whether they will be adopted. Will future versions of the two popular browsers include XML and cascaded speech style sheets? Will authoring tools such as Microsoft Home Page let us edit XML documents? The next six months will tell us whether these are just really nice ideas or new standards that millions of Web-application developers will use every day. At the moment, SGML and XML are being used in academic and research projects and in text-processing applications that handle large numbers of structured documents, such as public filings at the US Securities and Exchange Commission.

Archrivals Microsoft and Sun are both part of the XML consortium. Microsoft has specified its new "push" technology, the Channel Definition Format, as an application of XML. Sun envisions a synergy between data structured with XML and the Java servers, clients, applets, and applications that process XML files. On the other hand, structured data can also be transmitted between Java clients and servers in the form of binary objects and classes, using component architectures such as JavaBeans and application programming interfaces such as Remote Method Invocation. In principle, XML should be easier to use than these programming alternatives, because it is plain text that is easy for nonprogrammers to read and write. Netscape reluctantly joined the XML consortium soon after the Web conference, after having dismissed the technology as unnecessary.


Further reading

Accessibility guidelines for the Web: A position paper by Gregg Vanderheiden from Trace Research and Development

http://www6.nttlabs.com/HyperNews/get/PAPER253.html

A simple tutorial explaining the basic elements of SGML

http://www.softquad.com/sgmlinfo/primintr.html

The XML list of frequently asked questions: a tutorial providing the latest developments

http://www.ucc.ie/xml/

CML, an introduction to structured documents

http://argon.ch.ic.ac.uk/ectoc/echet96/+CML/epub/xml.html

XML, Java, and the future of the Web: A position paper by John Bosak from Sun

http://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.htm

The Cascaded Speech Style Sheets specification, by T.V. Raman

http://www6.nttlabs.com/HyperNews/get/PAPER14.html

The complete proceedings of the Sixth World Wide Web Conference

http://www6.nttlabs.com/ToC.html