SGML to Braille, Large Print, and Audio

Yuri Rubinsky

In the last two decades of the 20th century, the term access has begun to have a specialized and somewhat technically oriented meaning in the library community. Specifically, access is used as an alternative to acquisition of library materials. In other words, the library has fulfilled its mission if it gets the requested information for a patron even though a particular library might not have it. Access also refers to the ability of a computer to find and retrieve a citation or document in an electronic database. In both cases, access is enhanced by the associated technology. In the first case, interlibrary networking facilitates finding remote sources of information. In the second, a data collection is organized through software and made available through a combination of computer hardware and software.

But what of the human side of access? Is there yet another possible meaning or use of the term as it applies to making buildings and materials accessible to patrons with special needs? Certainly many libraries have led the way in making their facilities accessible to the handicapped. What if the disability is visual impairment? Once patrons are inside the building, what do libraries offer if their needs require alternative forms of typical library materials? Unless the library is specifically oriented toward visually impaired patrons, chances are good that access to the material is virtually impossible.

This problem provides an opportunity to apply technology to facilitate access for the print-disabled. Yuri Rubinsky and his colleagues on the International Committee for Accessible Document Design (ICADD) have devised a method of electronic document markup and transformation based on the Standard Generalized Markup Language (SGML). The method leverages a document's existing structure and enables rapid production of alternative forms of a text for the visually impaired. This article reports on Rubinsky's explanation of the process, delivered as part of the Distinguished Seminar Series at OCLC on October 11, 1994. We briefly cover the background of Rubinsky's work, discuss SGML and the automated transformation process, and suggest what this process might imply for patrons, libraries, and librarians.

Background

In 1989, Rubinsky received a letter from Jesse Kaysen, formerly of Raised Dot Computing, outlining how she had used SGML as a "consistent, easy way" to pour documents into software that could produce Braille output. This achievement was notable because it highlighted the importance of an output-independent language for converting electronic documents into Braille. Specifically, it showed three important points:
  1. There was a good mapping from the nonprint aspects of SGML to Braille.
  2. The potential also existed to transform the markup to other output manifestations for the visually impaired (e.g., large print or voice).
  3. Automated transformation of SGML-tagged texts could open up a world of documents to the visually impaired.
Given that a standard mapping protocol could be developed, an automated transformation could greatly reduce the cost ($1,600 per book produced by manual conversion) and production time (typically eight months for a manual conversion of a print book to Braille). Moreover, the creation of these SGML elements could facilitate the production of large print and voice-synthesized versions of the text as well.

SGML

Rubinsky explains that "all you need to know about SGML" is that it is a language which allows the author to define a markup environment in the flow of a document.

A basic SGML text consists of a Document Type Definition (DTD) and the document content; content is composed of markup and data. The DTD defines the valid markup elements for a document type and what they mean. In other words, a DTD is a descriptive envelope for the content of the document. For example:

<!DOCTYPE ARTICLE [
<!ELEMENT ARTICLE - - (AUTHOR, TEXT)>
<!ELEMENT AUTHOR  - - (#PCDATA)>
<!ELEMENT TEXT    - - (#PCDATA)>
<!ATTLIST TEXT style CDATA #IMPLIED>
]>

The preceding DTD defines a document type, article, as consisting of two structural parts, author and text. The elements author and text are further defined by the terminal symbol PCDATA, which means that the content of each element is parsed character data, i.e., data characters in the text that are not markup. Finally, the text element has an optional attribute, style.

In the "document instance," markup tags surround pieces of document content and describe the function of the pieces in the document's overall structure. Attributes modify the elements by carrying additional information about the document section. In the following example, the tags appear in angle brackets and the attribute, style, appears inside the text start-tag:

<article>
<author>Yuri Rubinsky</author>
<text style=emphasized>This is a SHORT article</text>
</article>

The resultant printed output given the above DTD and document instance could be:

This is a SHORT article

Note that we said that the resultant output could appear like the string above. This is because an (output) application could use the style attribute value "emphasized" to italicize the text. No physical manifestation is presumed by the style attribute value.
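
To make this point concrete, here is a minimal Python sketch in which three output applications each interpret the same style value in their own way. The function names and the rendering conventions (underscores, upper case, bracketed stress cues) are assumptions made for illustration; none of them comes from SGML or from Rubinsky's talk.

```python
# Hypothetical output applications: each decides for itself what the
# style value "emphasized" means. None of these conventions come from SGML.

def render_print(text, style=None):
    """Print-like output: underscores stand in for italics."""
    return f"_{text}_" if style == "emphasized" else text


def render_large_print(text, style=None):
    """Large-print-like output: upper-case the emphasized text."""
    return text.upper() if style == "emphasized" else text


def render_voice(text, style=None):
    """Voice-like output: wrap emphasized text in made-up stress cues."""
    return f"[stress] {text} [end stress]" if style == "emphasized" else text


content = "This is a SHORT article"
for render in (render_print, render_large_print, render_voice):
    print(render(content, style="emphasized"))
```

The markup itself never changes; only the application reading it decides what "emphasized" looks or sounds like.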

A key aspect of a DTD is that it allows you to validate the attributes and attribute values of each element without marking up the document again for a special output application (like Braille). The SGML parser determines whether the document structure is valid. The output application (print, Braille, Mosaic, etc.) determines what the special markup attributes mean.
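
The parser's half of this division of labor can be sketched roughly as follows. This Python example encodes the article content model as plain data and checks a parsed document against it. The names CONTENT_MODEL, ALLOWED_ATTRS, and is_valid are hypothetical, and the sample is parsed as XML (with the attribute value quoted) only because Python's standard library has no SGML parser. A real SGML parser does considerably more than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of the ARTICLE DTD: each element maps to the ordered
# list of child elements it must contain; None means character data only.
CONTENT_MODEL = {
    "article": ["author", "text"],
    "author": None,
    "text": None,
}

# Attributes each element may carry (only TEXT declares one: style).
ALLOWED_ATTRS = {"article": set(), "author": set(), "text": {"style"}}


def is_valid(element):
    """Return True if element and its descendants match CONTENT_MODEL."""
    if element.tag not in CONTENT_MODEL:
        return False
    if not set(element.attrib) <= ALLOWED_ATTRS[element.tag]:
        return False
    children = list(element)
    expected = CONTENT_MODEL[element.tag]
    if expected is None:
        # Character-data elements must not contain child elements.
        return not children
    if [child.tag for child in children] != expected:
        return False
    return all(is_valid(child) for child in children)


good = ET.fromstring(
    '<article><author>Yuri Rubinsky</author>'
    '<text style="emphasized">This is a SHORT article</text></article>'
)
bad = ET.fromstring('<article><text>No author element</text></article>')

print(is_valid(good))
print(is_valid(bad))
```

Note that the check says nothing about what the style value means; it only confirms that the attribute is permitted where it appears.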

SGML and the ICADD Methodology

Since December 1991, Rubinsky has served as a member of the International Committee for Accessible Document Design (ICADD), which is developing strategies and techniques for the use of SGML to generate Braille, large print, and voice-synthesized texts. ICADD's work began with three assumptions:
  1. The markup technique must be straightforward and simple.
  2. Only one set of markup should be used. If a second markup is required for nonvisual encoding, it will likely not happen.
  3. Archival documents must always contain the richest possible markup, thereby further facilitating access to the document.
Given those assumptions, the ICADD technical subcommittee's goals were:
  1. To make the transform process as automatic as possible.
  2. To keep the technique simple.
  3. To reduce the costs involved in making texts available for the print-disabled community.
The committee believes that it is possible to have creators of DTDs "build in the relevant attributes to allow for Braille, large print, and voice-synthesis from the files encoded for other purposes, as a by-product."

ICADD 22

The result of ICADD's work is a set of 22 attributes to support basic output formats available in Braille, large print, and computer voice. These attributes describe how elements should be transformed by an application attempting to create mapped output. The ICADD 22 are part of the ISO 12083 (SGML) standard (Appendix A.8: Facilities for Braille, large print, and computer voice) and as a suite they are known as SDA or SGML Disabled Access transforms. In SGML, attributes can appear in element tags like "style=emphasized" in the "text" element tag in the previous example. However, it is also possible to "fix" the value of an attribute in the DTD so that element tags do not need to contain the attribute explicitly. The ICADD work uses the "fixed" approach. That is, the DTD designer associates fixed SDA attributes and values with each element to be transformed. As a result, each element is uniformly mapped to the ICADD 22 equivalent.
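
The "fixed" approach can be pictured with a short Python sketch. A table declared once, alongside the document type, maps each source element to a target element, so the document instance never has to carry the mapping itself. The table name FIXED_SDA, the function to_icadd_like, and the target names are illustrative placeholders, not the actual ICADD element set or the SDA attribute syntax defined in ISO 12083.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for fixed attribute declarations in a DTD: the
# designer states once, per element type, which target element it maps to.
# The target names are illustrative placeholders, not the actual ICADD set.
FIXED_SDA = {
    "article": "book",
    "author": "au",
    "text": "para",
}


def to_icadd_like(element):
    """Return a copy of element renamed according to FIXED_SDA."""
    mapped = ET.Element(FIXED_SDA[element.tag])
    mapped.text = element.text
    mapped.tail = element.tail
    for child in element:
        mapped.append(to_icadd_like(child))
    return mapped


# The instance carries no mapping attributes; the table supplies them.
source = ET.fromstring(
    "<article><author>Yuri Rubinsky</author>"
    "<text>This is a SHORT article</text></article>"
)
result = to_icadd_like(source)
print(ET.tostring(result, encoding="unicode"))
```

Because the mapping lives with the document type rather than in each document, every instance of that type is transformed the same way with no extra markup effort from the author.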

Production Process and a Success Story

After the markup and validation process is complete, the SGML document is then handed over to the document output system. Such a system takes the form of computer software which creates publisher-determined output from the SGML document. Two companies whose software performs the Braille transformation are Exoterica (OmniMark) and AIS.

Rubinsky cites the production of his novel, Christopher Columbus Answers All Charges, as an example of the successful use of the ICADD Architecture and the SDA transforms (Rubinsky and Giacomelli, 1993). The SDA transforms were used throughout the creation of the work. As soon as the authors completed their intellectual activity, the work could be submitted to publishers to create versions for various visually impaired readers. Once the electronic version of the SGML text had been received, it took two hours to create the Braille version, 90 minutes to make a voice-synthesized version, and two days to create the large print version. Rubinsky is quick to point out that the fact that the work was a novel expedited the transformation process, but he believes that similar gains can be achieved for texts in all subject areas. Clearly, complex texts will take longer to transform, but they will require significantly less time than the eight months a manual conversion now takes. As a footnote to this publication process, the trade paperback version of Christopher Columbus . . . appeared three months after the print-disabled versions, thus making it the first book to achieve this distinction.

Benefits

Why is this process helpful? Rubinsky highlights several benefits for the print-disabled community:
  1. The process is hardware and software independent.
  2. Markup creates a highly structured document which facilitates document access and navigation.
  3. The process is generalized so that Braille, large-print, and voice-synthesized versions can be simultaneously created.
  4. Production costs decrease for publishers, and costs fall for libraries and visually impaired readers.
Quite simply, widespread application of this transformation process will break down the cost and time barriers imposed by present conversion technologies. It could indeed be a practical step toward providing universal access to all library patrons regardless of ability or disability.

Notes

Yuri Rubinsky is president and co-founder of SoftQuad, a leader in the creation of software for SGML; SoftQuad's products also include HoTMetaL, an HTML editor. Rubinsky is the editor of Charles Goldfarb's acclaimed The SGML Handbook and the co-author of the historical novel, Christopher Columbus Answers All Charges. His present areas of interest include improving access to electronic documents and the promotion of standards in the World Wide Web environment. To further that end, he is a member of the Internet Engineering Task Force working group which is developing and maintaining HTML.

I am grateful to my colleague, Dr. Keith Shafer, for the supplemental material and examples of SGML.

Dr. Keith Shafer offers the following definition: "SGML is a meta-language for writing DTDs. A DTD describes how a document conforming to it should be marked up. For instance, a DTD will describe which structural tags (elements) may occur in the document and in what order . . . Simply put, a DTD describes a class of documents. Documents can be validated against a given DTD to see if they conform to the desired format."

References

Rubinsky, Yuri, and Marc Giacomelli. 1993. Christopher Columbus Answers All Charges. Erin, Ontario: Porcupine's Quill.


Presented October 11, 1994. Yuri Rubinsky is president and co-founder of SoftQuad, Inc.