Description of the ICADD Mechanism

Yuri Rubinsky

Foreword

Large repositories of SGML-encoded material are beginning to be built up:

Some 50 publishers are using (for one or more projects) the ANSI standard markup scheme for books and journals commonly called the AAP DTD. An updated version of this markup, augmented with the Accessible Document techniques described in this booklet, has been put forward as ISO 12083.

Some 40 projects around the world are marking up millions of pages of literary materials using SGML Document Type Definitions created in accordance with the guidelines of the Text Encoding Initiative (TEI).

The aerospace, defense, automotive, workstation computing, semiconductor and telecommunications industries all have DTDs supporting their requirements, and new industries are exploring the role which SGML will play in their documentation.

When encoded using the SGML Document Access (or SDA) techniques described in this booklet, all of the textual materials created by these associations, industries, corporations, institutions and individuals will be readily available to visually impaired readers, for Braille, for the generation of navigable voice-synthesized texts and for the publication of large print books and journals.

1.0 A Very Brief Introduction to Braille

Note: It is intended that later versions of this booklet will introduce the concepts necessary for the creation of computer voice and large print editions. This version, however, concentrates primarily on the creation of Braille and relies on the happy coincidence that a majority of the requirements of markup for large print and computer voice will be readily met by the Braille techniques described here. Certainly the more one is able to familiarize oneself with the goals and techniques of Braille, the more effectively one can plan for the eventual re-use of text marked up for other purposes.
At the same time, the committee recognizes that it is not the intent of all authors, editors and document analysts to become Braille-fluent. Accordingly, this section hopes to present just what is needed for the purposes of creating Braille-ready text as a by-product of other markup activities.

Braille is both a short-hand language and a set of rules for the display of text (letters and numbers, including complex mathematics and chemistry, for example) in a manner that is sensible to touch. A Braille character consists of six dots placed in a "cell," three rows down by two across. A dot may be either raised or not, and combinations of dots represent alphabetic and numeric characters. Because there aren't enough combinations of six dots to display the full range of letters, numbers, and special characters, Braille cells must be read in sequence: one Braille symbol will indicate that the following symbol (or symbols) must be read as a number instead of a letter, for example.

A Braille page generally consists of 24 rows of Braille cells, with 40 cells in each row. If each Braille cell stood for just one letter or number, each page would hold roughly 200 words, and Braille books would be too thick to carry. Instead, Braille has been built up as a very dense set of abbreviations, and a large part of the skill of an experienced Braillist is knowing when and how certain abbreviations are to be used. In recent years, software has been able to duplicate the "context-sensitive" nature of Braille translation and now produces satisfactory to very good Braille.

The techniques for making text Braille-ready described in this document are geared for automated software translation. Software will transform a "traditionally marked up" SGML file into an output file which is enabled for Braille. That output is a second SGML file which serves as input for Braille translation software.
During this process, an experienced Braille transcriber will be required to handle the most complex transformations.

For people who are print-disabled, Braille remains the only display method that gives them the advantages of books, that is, the ability to mark and review passages of interest, to skim (to some extent) based on the available formatting, and to study complex passages. For this reason, Braille is still the first choice for production of textbooks.

Braille is a formatting language, a fairly sophisticated one, built up out of a very minimal set of format capabilities. On a traditional print page one has at one's disposal an extraordinary variety of techniques with which one can express the rich layers of meaning conveyed by a document's logical structures: point size, leading, choice of typeface to express heading levels, for example; tabular alignment to express matrix relationships; emboldening and italicization to display emphasis.

When designing a page for the Braille reader, one is limited to a mere handful of format techniques: levels of standard indentation (and, inversely, outdents), centering, blank lines, right alignment and spaces. And one must use these techniques sparingly, knowing that there can be at most 960 cells on any page (24 rows of 40 cells), and knowing also that, much more than in a traditional print publication, each additional page bears a hefty price tag.

Despite the limitations of available formatting, note too that there is a degree of flexibility in the creation of Braille pages, and Braillists comfortable with the techniques will bend rules slightly in order to optimize the amount of data that can be placed on a page. This means that the process is always partially manual. Nonetheless, the techniques outlined in this booklet are intended to automate whatever can be automated of the process.
2.0 The Goals of the ICADD Architecture

This sub-committee's work began with the following assumptions:

1. For a markup technique oriented to the visually-impaired to succeed, it must require absolutely minimal overhead (if any) on the part of the people actually marking up the text.

2. If people have to mark up text twice -- once for their own purposes and once separately for the purposes of non-visual encoding -- there is a great possibility that the second stage will not happen.

3. There is value in having one piece of text marked up in such a way that both codings are effectively present. Most significantly, as new technology becomes available to make possible browsing access to any markup, all repositories will still contain the full richness of their information.

In practice this means that the "real" or "archival" document will always be the version with the source markup. The Braille-enabled output file is intended to be only a temporary part of a translation process, not to be preserved for later use.

The specific goals of the technical work arose from those assumptions and were as follows:

1. to enable, as much as possible, the automated conversion of marked-up files to Braille, large print and voice editions.

2. to minimize the burden on writers and editors of understanding the requirements of markup for Braille, large print and voice synthesized delivery. (That is: this booklet should be all the background one needs to prepare Braille-ready texts.)

3. to minimize or eliminate publishers' costs in making texts available for the print-disabled community.

4. to keep the technique simple.

It is our belief, after discussion with colleagues in the SGML community, that it is possible to have creators of DTDs -- all DTDs -- build into them the relevant attributes to allow for Braille, large print and voice-synthesis from the files encoded for other purposes, as a by-product.
In this fashion, everyone creating markup schemes shares the burden, a burden which is absolutely minimal on a site-by-site basis -- perhaps as little as one or two days for a reasonably complex DTD if the technique is well understood, no more than half a day for a simple one.

2.1 What Is Not Covered

This document does not deal with the most complicated requirements of Braille encoding. That is, in order to devise a manageable first step, the committee has chosen to describe a base architecture plus a set of methods which are associated with SGML fragments or modules. One such fragment is included in this booklet -- for the handling of tables.

Future enhancements will include support for mathematics, poetry, drama, and chemistry. Separate documentation is forthcoming which will establish both markup and procedures for these areas. Their exclusion from the current work should not be interpreted as indicative of a lack of recognition within the committee of their importance.

2.2 For Constructs Not Supported

Although the techniques described in this booklet will enable the preparation for multiple ICADD uses of nearly all SGML documents, certain complex constructs may not be adequately supported. However, note that one need not use the ICADD techniques to create input to Braille, large print or voice delivery software. Any of a number of widely-available sophisticated SGML transformation tools can be used to generate the markup required by the ICADD22.DTD in Appendix A.

3.0 How to Begin

The person creating an SGML-accessible application will most likely begin from one of two points:

1. a DTD already exists for the content to be converted, or

2. a DTD is to be created.
In the second case, one should proceed to design an SGML application optimized to the requirements of the content almost as if there were no Braille component to the work, following a normal course for the creation of a DTD, including analyzing the requirements, proposing a set of tags to cover the requirements, and determining clear instructions for their correct use.

However, during that process, we recommend that one go beyond the basic requirements of that application to add special value for the print-disabled. Where this is possible, please see the section at the end of this booklet entitled "Using Braille-Specific Elements in Any DTD". In the unusual situation where the DTD will exist only to prepare a file for Braille encoding, see both special sections at the end of this document: "A DTD Just for Braille" and "Using Braille-Specific Elements in Any DTD."

Normally, however, one begins with an existing SGML application. Rather than build a separate transformation file of some sort, the ICADD architecture uses four specialized attributes in the DTD to build in enough information for a processor to extract a file encoded for use with Braille software. One ends up with two SGML files: one encoded in accordance with the original DTD and a temporary version whose markup conforms to the tagset described in this booklet.

4.0 Overview of the ICADD Methodology

4.1 Using Architectural Forms

The idea underlying the ICADD SGML work comes from ISO 10744 -- the International Standard for Hypertext and Time-Based Encoding ("HyTime"). The HyTime standard articulates the notion that a set of semantic constructs may be associated with SGML elements as attribute values and therefore carry with them meaning or intentions beyond what is normally part of an SGML application. This technique, invented by Dr Charles Goldfarb of IBM, defines these constructs as "architectural forms".
HyTime standardizes one set of forms which is useful for its purposes. The ICADD technical sub-committee has developed a very simple implementation of architectural forms to represent the formatting capabilities of Braille with a small set of "Braille elements". To a great extent, the formatting capabilities also represent the simple typesetting requirements of large print books, and the implied structural hierarchies of the architectural forms turn out to be appropriate for navigation through computer voice delivery of texts.

By using architectural forms, the ICADD technique lets SGML attributes in a source DTD carry with them the mapping for their associated Braille elements. This means that no separate file needs to be associated with either the source file or the DTD; all the necessary information is immediately available in the DTD.

This also means, interestingly, that authors can create documents using a DTD without the ICADD attributes, and pass the resultant file to a system which has access to the same DTD but in an ICADD-enabled version. Indeed, if appropriate, only Braille creation facilities need use the Braille-enabled version of a DTD, although we feel it important that the creators of the DTD be the ones to indicate their preferred mappings.

This approach has the additional benefit of spreading the cost of building ICADD-enabled DTDs amongst all those who create the DTDs, and of asking those most familiar with a DTD (that is, its creators) to give guidance, through the "SGML Document Access" attributes, to later users of the marked-up content.

4.2 The Building Blocks

There are five major components of the ICADD technique for SGML Document Access. Together they form a simple architecture for informing an SDA transformation process as to the intended output of such a process -- that is, they allow DTD designers to plan for the creation of an input stream to production facilities for Braille, computer voice and large type editions.
Note that very useful mappings can be constructed using only the SDAFORM and the simple form of the SDARULE attributes. There is sufficient capability in these techniques to accomplish a great deal more, but for many documents, the simplest transformations will go a long way towards making more electronic information readily accessible.

Each of these components is discussed in further detail in the following pages.

1. SDAFORM

An attribute declared on a source element whose value is the intended result of an SDA transformation; that is, the value of such an attribute must be an element name in the canonical SDA tagset. sdaform may also specify attributes of the original element to be carried forward into the output or transformed file. With the keywords #ATTLIST or #ATTRIB, sdaform carries the names and values of source attributes through the SDA transformation.

2. SDARULE

An attribute declared on a source element whose value indicates logical contexts in which the values specified override the default element name as specified in an sdaform attribute. With the keyword #USE, sdarule attributes may carry pointers to sets of rules activated by their place in the parentage of the current element -- that is, they may be used to establish stacking rules of precedence for transformations.

3. SDAPREF

An attribute declared on a source element whose value is to be reproduced as content in the output of the transformation; that is, the transformation process imitates procedures which might take place in a typesetting process to generate fixed text or, with the keyword #COUNT, automatic numbering in place of the source start-tag. The keyword #SET establishes generated text or a counter with the name of an element, and the keyword #USE indicates the attribute for which that counter is to be used. With the keywords #ATTLIST, #ATTRIB or #ATTVAL, sdapref carries the names and/or values of source attributes through the SDA transformation.
An SGML processing instruction declared as part of an sdapref value and containing the word sdatrans indicates that a Braille transcriber or other expert should examine and, if necessary, modify the current output element structure by hand.

4. SDASUFF

An attribute declared on a source element whose value is to be reproduced as content in the output of the transformation; that is, the transformation process imitates procedures which might take place in a typesetting process to generate fixed text or, with the keyword #COUNT, automatic numbering in place of the source end-tag. The keyword #SET establishes generated text or a counter with the name of an element, and the keyword #USE indicates the attribute for which that counter is to be used. With the keywords #ATTLIST, #ATTRIB or #ATTVAL, sdasuff carries the names and/or values of source attributes through the SDA transformation. sdasuff is the same as sdapref except that the generated text replaces the source element end-tag instead of the start-tag, and sdatrans instructions are not allowed.

5. Special Character Handling

Use of standardized SGML entity references to ensure that characters and symbols not in the base ASCII character set are successfully passed to an accessible document input stream.

5.0 Using the ICADD Architecture

5.1 Base Tag Set

A small set of "canonical" elements (the "SDA tagset" henceforward in this document) has been created to support the basic output formats available in Braille. Because the first target audience for the Braille-ready techniques comprises publishers who produce textbooks with desktop publishing software and who have indicated their enthusiasm for a minimal tagset, the name given to the formal version of this tagset is "ICADD22.dtd". Note that the actual DTD included in this document as an appendix is still in draft form pending additional experimentation and testing.
When using the mapping techniques described on the following pages, one must use the canonical element names listed below as the FIXED values of the appropriate attributes, that is, as the target elements of any transformation. This is the heart of the ICADD standardization effort.

The elements defined in this base DTD are as follows:

ANCHOR  Mark Spot on a Page
AU      Author(s)
B       Bold Emphasized Text
BOOK    Highest Level Element for Document
BOX     Boxed or Sidebar Information
BQ      Block Quotation
FIG     Figure Title and Description
FN      Footnote
H1      Major Level Heading within Book
H2      Second Level Heading
H3      Third Level Heading or BOX Heading
H4      Fourth Level Heading
H5      Fifth Level Heading
H6      Sixth Level Heading
IPP     Page Number of Ink Print Page
IT      Italic Emphasized Text
LANG    Language Indicator
LHEAD   List Heading
LIST    List of Items
LIT     Literal or Computer Text
LITEM   List Item
NOTE    Note in Text
OTHER   Other Emphasized Text
PARA    Paragraph
PP      Print Page Reference
TERM    Term or Keyword
TI      Title of the Book
XREF    Cross Reference

5.2 Table Module

An optional set of canonical elements has been created to support the encoding of tables. Tables marked up with this set may be used for Braille, large type and computer voice.
The set consists of:

TABLE     The highest level element, which will include at least one TGROUP
TGROUP    Grouping element allows repeated combinations of the next three elements to appear in one table:
            THEAD   Table Header (optional)
            TBODY   Table Body (required)
            TFOOT   Table Footer (optional)
COLDEF    Column Definition (which carries necessary attributes for the column information)
HDROW     Row in a Header
HDCELL    Cell in a Header
ROW       Row in the Table Body
STUBCELL  The Stub Cell (or Row Heading) of a Row
SSTCELL   A Sub-Stub Cell in a Row (usually with a different indent)
CELL      Table Cell
SHORTXT   Short Text Element provides alternative text for a stub cell or head cell for voice representation or for a cell reference to longer text carried in the NOTE in a Braille table
NOTE      Text extracted from Braille table cells in order to allow the narrowest possible column widths in the table body

5.3 The Basic Transformation Requirement

Imagine a Braille, large print and voice synthesis tagset which lists only the following elements:

H1     Major Level Heading
H2     Second Level Heading
H3     Third Level Heading
LIST   List of Items
LITEM  List Item
PARA   Paragraph

Here's a sample piece of an SGML document type definition, using other elements:

These declarations state that a sec (section) element must begin with a section title (sectitle) which must be followed by one or more paragraphs (p) or sequential lists (seqlist). A seqlist consists of one or more list items (li), which in turn are made up of any number of paragraphs (p). Finally, the last declaration says that sectitle and p are made up of parsable character data, that is, letters and numbers.

Now we need to map the elements in the source DTD to the available SDA elements. That is, we need to state how to represent sec, sectitle, seqlist, p and li using only the tag names h1, h2, h3, para, list and litem.
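The source declarations described in Section 5.3 can be sketched as follows. This is a minimal reconstruction from the prose rather than the booklet's own listing: the element names (sec, sectitle, p, seqlist, li) come from the text, while the tag minimization indicators and the exact grouping are assumptions.

```sgml
<!-- Sketch only: reconstructed from the description in Section 5.3. -->
<!-- A section is a title followed by one or more paragraphs or lists. -->
<!ELEMENT sec             - -  (sectitle, (p | seqlist)+) >
<!-- A sequential list holds one or more list items. -->
<!ELEMENT seqlist         - -  (li)+ >
<!-- A list item holds any number of paragraphs. -->
<!ELEMENT li              - -  (p)* >
<!-- Section titles and paragraphs hold parsable character data. -->
<!ELEMENT (sectitle | p)  - -  (#PCDATA) >
```

None of these declarations mentions Braille; the mapping to the SDA tagset is layered on afterwards through attributes.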
5.4 Establishing One-to-One Mapping: SDAFORM

We put the two sides together by using attributes to add extra information to all instances of an element. In its simplest form, the attribute list for an element names the SDA element to which the element corresponds.

This technique uses the SGML keyword FIXED to force a specific attribute value to be associated with every appearance of the related element. Its value cannot be changed within the document. For example, to associate the sample element sectitle with the SDA element h1, we create an attribute named sdaform that "fixes" that correspondence:

This indicates that whenever sectitle is used, it stands for an h1 in the SDA tagset.

The example uses only a few of the available SDA "elements" but the technique is the same for any DTD: one uses FIXED sdaform attributes to map any source DTD's elements to those of the canonical set in ICADD22.dtd (listed in Section 5.1 of this booklet and shown in the DTD in Appendix A). The fixed value must be one from the canonical set.

If an element has no previously declared attributes, you will need to create an ATTLIST declaration. If the element already has attributes, simply add the FIXED attributes after those already declared as part of the same declaration.

An element may have no fixed sdaform or sdarule attributes at all (that is, no declared mapping). For each such element, the transformation process must discard both its start- and end-tags. A typical case where this occurs is with "containing elements", those which do not contain character content of their own, but which mark structural boundaries and contain only other elements.

Note that software which transforms an SGML file encoded according to a source DTD is likely to be SGML software, able to read the DTD and perform actions based on the attribute values established in the DTD.
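The one-to-one mapping of Section 5.4 can be sketched as follows. Only the sectitle-to-h1 correspondence is named in the text; the other three mappings are illustrative assumptions drawn from the small tagset of Section 5.3, and the declared value CDATA is likewise an assumption.

```sgml
<!-- Sketch only: FIXED sdaform attributes map source elements to SDA elements. -->
<!ATTLIST sectitle  sdaform  CDATA  #FIXED  "h1" >
<!-- Assumed mappings for the rest of the sample DTD. -->
<!ATTLIST p         sdaform  CDATA  #FIXED  "para" >
<!ATTLIST seqlist   sdaform  CDATA  #FIXED  "list" >
<!ATTLIST li        sdaform  CDATA  #FIXED  "litem" >
<!-- sec is given no sdaform or sdarule: as a containing element, -->
<!-- its start- and end-tags would be discarded by the transformation. -->
```

Because the values are FIXED, a document instance never needs to mention sdaform; a processor reading the DTD sees the mapping on every occurrence of the element.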
The output of such a transformation will be text tagged according to the canonical ICADD tagset, but, under some circumstances, it may not parse against the ICADD22.dtd. Nor does it need to. Braille translation software can withstand a great deal of flexibility in its handling of an input stream.

5.5 Context-Dependent Mapping: SDARULE

Clearly, however, there are occasions when a source element is mapped to a different SDA form depending on the context in which it appears. The ICADD technique includes both a simple mechanism for simple contextual mappings, and more complex ones for those situations in which the mapping may be dependent on attributes in the source document instance or on the fulfillment of one or more conditions in the element's ancestry.

As just described, the attribute sdaform defines a mapping in the attribute list declared for that element. The simple form of the sdarule attribute allows mappings for an element to be defined in the attribute list of a different element, one which provides context for the mapping. An example makes this clear.

Both of the elements concerned -- sectitle and title -- may appear in other elements, with other mappings, but the declaration says that within a sec, a sectitle maps to h1 and a title maps to h3. (The attribute declaration in this and other examples is spread across two lines only for legibility. Spaces, carriage returns and tabs are not meaningful in the declarations, with the exception of generated text -- see Section 5.7 below.)

In a simple contextual mapping, the attribute sdarule takes an even number of arguments; within each ATTLIST, you can declare any number of pairs of arguments. This mechanism can readily provide different mappings for the same elements in different contexts. For example:

The interpretation of these rules is simple: In a chp, title maps to an h2 element. In a sec, title maps to an h3 element.
When it appears in a fig, it is transformed into an it element (italic in the SDA tagset).

Remember that title could also have an sdaform attribute of its own:

Unless specified otherwise by an sdarule, therefore, title has a default mapping of ti (title in the SDA tagset). Any active sdarule overrides the default, and the attribute closest to the element to be transformed takes precedence: if a title appears in a figure within a chapter, in this example, it is mapped to it.

5.6 More Complex Context: Stacking Rules with #USE

The ICADD architecture specifies a more complex mechanism for creating a contextual mapping rule as an attribute on one element, and pointing to it from another element. Only when both elements appear as part of the full path -- that is, when both elements' start-tags have appeared but not their end-tags -- is the rule established and activated, and only then does that specific mapping occur. This is a very powerful technique that can be exploited to great advantage where necessary.

Note that in the great majority of cases, the simple sdarule attributes described above will suffice to provide adequate context for changing mappings within document structures. The procedure described in this section will be needed only rarely.

This mechanism allows one to set a stack of open rules, where the rule in a parent element closest to the current context overrides rules higher in the element ancestry (or previous in the stack). For example, in a content model where chp could occur at multiple levels in the document, we need to be able to specify different mappings for title depending on whether chp is in a part or not:

The highest level title will always be an h1, whether it's in a part or a chp. The second level element will always be an h2.
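Before turning to stacked rules, the simple contextual mappings of Section 5.5 can be sketched as follows. This is a reconstruction from the prose, assuming each sdarule value is a flat list of source-element and SDA-element pairs and that the declared value is CDATA; it is not the booklet's own listing.

```sgml
<!-- Sketch only: simple sdarule values are pairs of (source element, SDA element). -->
<!-- Within a sec, sectitle maps to h1 and title maps to h3. -->
<!ATTLIST sec    sdarule  CDATA  #FIXED  "sectitle h1 title h3" >
<!-- The same source element receives a different mapping in other contexts. -->
<!ATTLIST chp    sdarule  CDATA  #FIXED  "title h2" >
<!ATTLIST fig    sdarule  CDATA  #FIXED  "title it" >
<!-- Default mapping for title, used when no sdarule is active. -->
<!ATTLIST title  sdaform  CDATA  #FIXED  "ti" >
```

With these declarations, a title inside a fig inside a chp would map to it, since the rule on the nearest enclosing element takes precedence.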
Accordingly, the Braille architectural forms need to carry hierarchical level information with them, that is, they must distinguish between an h1 and an h2, recognizing that if title appears in a part it maps to h1, if it appears in a chp within a part, it maps to an h2, but if the chp is not in a part, the title maps to an h1. This is done by establishing rules within chp that set the two mapping conditions:

Notice that arbitrary application-specific names are given to two new attributes (sdabdy and sdapart). Later, other attributes will point to them. They can be given any names that have meaning to the DTD designer; they are not specified by the ICADD techniques. (The example used "SDA" as their prefix only to be consistent with the accessible document attributes. It need not have done so.)

From the example in Section 5.5, title has a default mapping set to the ICADD element ti, but this has no bearing on the present situation. It is over-ruled by the sdarule attributes:

(In some cases, the source element -- ti, for example, in the Association of American Publishers' BOOK DTD -- and the ICADD tag TI have the same name. The sdaform attribute must nonetheless be established.)

Remember that these rules stack. That is: if a user has entered a bdy start-tag, he or she has activated the rule that says within a chp element, use the sdabdy rule. A processor now comes upon the chp start-tag and knows to choose the sdabdy rule, which decrees that in this context, the title element is to become an h1.

Imagine, instead, that the processor, after seeing the bdy start-tag, then came upon a part start-tag. To the stack, it would add the two rules that say: transform a title element into an h1 element; and, upon encountering a chp element, use the sdapart rule. Accordingly, when it comes upon the first title element it hasn't seen the chp start-tag yet, so it is still using the part's rule and transforms the title into an h1.
Then, when the processor comes upon the chp start-tag, it knows to act on the sdapart rule, and it transforms the next title into an h2.

5.7 Generated Text: SDAPREF and SDASUFF

The third type of ICADD attribute covers situations in which the name of an SGML element (its generic identifier) carries useful information which would be lost if the original element were mapped through an sdaform attribute, which carries only the information needed for presentation.

Often formatting software packages will generate fixed text strings or counters as they produce a book. An ICADD input file must have this information in the stream of content. (A Braille book must match the standard print book even if some of the print book's content was generated by the typesetting system and does not exist in the raw electronic manuscript.)

The attributes sdapref and sdasuff duplicate this part of the typesetting process, and also have a rudimentary counting mechanism to generate alphabetic or numeric content. Several examples follow:

Here the element apptitle, the title of an appendix, is mapped to a standard h1 element in the SDA set. The sdapref attribute carries "generated text," words to be produced by the translator software as a string appearing within the h1 element immediately after the h1 start-tag. sdasuff is a similar attribute for generated text to be inserted where the end-tag appears.

When sdapref or sdasuff attributes are used without an sdaform attribute, the result is effectively the simple replacement of either or both of the source start- and end-tags by generated text. A basic example:

In the example immediately above, the start- and end-tags for a quote element are replaced with typewriter quote marks.

In this example, both the figure description and the simplified figure description (see Section 6.3 of this booklet) are transformed into paragraphs whose opening text is "Figure Description: ".
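The three cases just described can be sketched as follows. This is a reconstruction under stated assumptions rather than the booklet's own listing: the element names apptitle and quote come from the text, figdesc is a hypothetical name for the figure description element, the string "Appendix: " is an invented example of generated text, and the entity reference &quot; assumes that an entity set such as ISOnum has been declared.

```sgml
<!-- Sketch only: generated text via sdapref and sdasuff. -->
<!-- apptitle becomes an h1 that opens with fixed text (assumed wording). -->
<!ATTLIST apptitle  sdaform  CDATA  #FIXED  "h1"
                    sdapref  CDATA  #FIXED  "Appendix: " >
<!-- No sdaform here: the quote tags are simply replaced by quote marks. -->
<!ATTLIST quote     sdapref  CDATA  #FIXED  "&quot;"
                    sdasuff  CDATA  #FIXED  "&quot;" >
<!-- A figure description becomes a paragraph with a fixed opening string. -->
<!ATTLIST figdesc   sdaform  CDATA  #FIXED  "para"
                    sdapref  CDATA  #FIXED  "Figure Description: " >
```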
Notice that the sdapref attribute value includes a space after the colon since the contents of the paragraph would normally begin immediately after the start-tag.

5.8 Preserving Text: #CONTENT

Generated text is not always formatted according to the style of the element which generates it. To accommodate this, the prefix and suffix strings may contain markup. The use of the keyword #content allows the replacement of one start-tag with a small element structure, as in the following example:

Abstract#CONTENT" >

In the example, the source element abs would normally have the title "Abstract" generated by the formatter, but the remainder of the element would be formatted as a paragraph. The effect of the example would be to transform:

This document describes a portion of the work of the Technical Subcommittee of the International Committee for Accessible Document Design.

into the following:

Abstract

This document describes a portion of the work of the Technical Subcommittee of the International Committee for Accessible Document Design.

A more complex use of the technique would be, for example, the preservation of the content of an element as the attribute value of the new ICADD element.

5.9 Generating Counters: #COUNT

Automatic numbering may be specified in sdapref, sdasuff and sdaform attributes (although the capability is usually associated with sdapref). One may associate an automatically incremented value with an element and may also access that value with an attribute for any of that element's subelements.

The keyword #count causes the expression following to be interpreted. The expression itself appears within parentheses and may be mixed in with fixed text to be generated. The expression follows the form:

#COUNT(element, format)

and specifies both the element which is being established as a counter and the format of the counter. One needs to specify the affected element since counters are sometimes associated with a specific element, and sometimes with its parent or child elements.

There are five supported formats indicated by the second argument:

I  specifies uppercase Roman numerals: I, II, III, IV, ...
i  specifies lowercase Roman numerals: i, ii, iii, iv, ...
1  specifies Arabic numbers: 1, 2, 3, 4, ...
A  specifies uppercase alphabetic: A, B, C, D, ...
a  specifies lowercase alphabetic: a, b, c, d, ...

By default, the numbering starts at 1 (or the equivalent in the other formats). A counter can be initialized with an expression of the form:

1=3

as in

which indicates that the appendix numbering (where app is the element name for appendix) is upper case alphabetic and starts with the letter C. Note that this example includes no other generated text and would simply print the letter C in place of the app start-tag.

Two types of numbering are needed in a typical document.
In the first type, elements are numbered consecutively throughout. This is supported by the basic technique described above. The second type is one whose counter needs to be reset under various conditions, particularly when a higher-level element changes.

Often the value of a counter may be used in the sdapref attribute of any of an element's subelements. For example:

When the counter for the parent element changes (in this case sec), the counter for the subelement is automatically reset to 1. The example above would generate Section A when a sec element first appears, and Subsection A.a when the first subsec element appears.

The following variation allows the format of the counter to be different from that of a parent element:

which will generate:

Figure 1.1, 1.2, 1.3 ...

and so on, until the next section, even if the parent section elements are numbered A, B, C.

An exclamation mark in the counter format supports the case in which the counter should not be reset when the parent changes. A typical example might be figure numbers which are consecutive throughout a book but which incorporate the chapter or section number as well:

which will generate:

Figure 1.1, 1.2, 1.3 in section A, and
Figure 2.4, 2.5, 2.6 in section B.

Under certain conditions, a counter needs to be reset even though the parent has no counter or its counter doesn't appear in the current element's generated text. The tilde character is the non-printing indicator. A typical example:

which will generate:

1.
2.

and so on, and ensures that the listitem counter will be reset each time a new list starts. Notice that the list itself doesn't need to have a blank counter established in its attribute list declaration; to enable the reset mechanism, no value is needed beyond the non-printing counter in the listitem.
Notice also that spaces are not meaningful within the #count expression (that is, between the # and the close parenthesis), but are meaningful everywhere else within the attribute value.

The final situation covered allows one to make numbering decisions based on a parent element even when an element may have a variety of parents:

#set always takes two arguments. The first is the name of the element in the source DTD which governs the counter. The second argument is either the format of the counter or the content of the prefix or suffix that will be referenced and picked up by the sub-element that needs it. Content other than counter formats must be set off using single quotes.

In the example, the orderedlist element establishes that a listitem is prefaced with a numeric counter followed by a period and a space. When the same listitem element appears within a bulletlist, however, its prefix is a bullet followed by a space.

The #use function may take only one argument, which is the name of the counter it should use. If there is a second argument, it is a format for a counter which overrides that which may have been set in the parent's attribute.

The #set function never appears in text generated by the element in which it is declared. It may, therefore, appear with other content which is to be generated, as in the example above where "Other generated text." appears in place of the orderedlist start-tag.

The #set function is implied for any #count which is not explicitly set. That is, #set is used only for complex situations in which you wish to establish multiple possible prefix or suffix strings.

Note: To use actual angle brackets, hash marks, tildes, exclamation marks and quotation marks -- both single and double -- in all the sdapref and sdasuff values, one should use SGML entity references, even when the special characters are used in a place where the context might inform their correct usage.
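The five counter formats of Section 5.9 can be illustrated with a short Python sketch. This is not part of the ICADD techniques and is not taken from any SDA software; the function names (format_counter, to_roman, to_alpha) are hypothetical, and the behaviour of the alphabetic formats beyond the 26th value (continuing with AA, AB, ...) is an assumption, since the booklet does not say what happens there.

```python
def to_roman(n: int) -> str:
    """Render a positive integer as an uppercase Roman numeral."""
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_alpha(n: int) -> str:
    """Render 1 -> A, 2 -> B, ... 26 -> Z.

    Continuing with AA, AB, ... after Z is an assumption of this sketch.
    """
    out = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out.append(chr(ord("A") + rem))
    return "".join(reversed(out))


def format_counter(value: int, fmt: str) -> str:
    """Format a counter value using one of the five format codes: I, i, 1, A, a."""
    if value < 1:
        raise ValueError("counter values start at 1")
    if fmt == "1":
        return str(value)
    if fmt == "I":
        return to_roman(value)
    if fmt == "i":
        return to_roman(value).lower()
    if fmt == "A":
        return to_alpha(value)
    if fmt == "a":
        return to_alpha(value).lower()
    raise ValueError(f"unsupported counter format: {fmt!r}")
```

Under these assumptions, a counter in format A initialized to 3 yields the letter C, which agrees with the appendix example in the text.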
Note that all the capabilities available in sdapref are also available in sdasuff attributes, although they are not normally used there.

5.10 Attribute Handling

On occasion it will be necessary to carry the names and/or values of attributes through the SDA transformation process. This is accomplished with the use of three keywords which may be employed in association with any of the other SDA attributes.

#attlist brings forward the entire attribute list of the base element, excluding any attribute whose name begins with SDA (or its replacement as established by APPINFO; see Section 7.0 of this booklet). This keyword may be used with sdaform or sdapref.

#attrib (xxxxx) brings forward the attribute xxxxx and its value (complete with the equals sign and the quotation marks). This is used to isolate one or more specific attributes from a longer list. That is: #attrib (xxxxx yyyyy) picks up both. Clustering the arguments with parentheses means that one can isolate attribute names from other generated text. The space between the #attrib and the (xxxxx) is optional. This keyword may be used with sdaform or sdapref.

#attval (xxxxx) brings forward the value only of the attribute xxxxx. This may be used with generated text in an sdapref attribute to rename an attribute. This keyword may also be used with more than one argument.

Two examples:

Abstract</h1>">

5.11 A Basic Location Model

There are several classes of source DTD hierarchical structures which are not well served by the techniques described earlier in this booklet. Most important of those, by virtue of its use in a variety of existing DTDs, is the requirement to allow for the mapping of elements within a recursively nesting element. For example, the following case

Level One Title
Level Two Title
Level Three Title

can easily create a structure in which the first title element must be mapped to the SDA h1, the second title must be mapped to an h2, and the third title element must be mapped to the SDA h3.
The #use construct described in Section 5.6 deals with a variety of structures, but not with placement of an element within its tree or with respect to its subelements. For that reason, the committee has developed a small "location model" language to describe a set of standard conditions.

The syntax for these conditions involves use of ">", square brackets and parentheses. This was adopted because ">" is very unlikely to be a character allowed in an element name. Except for the use of the rare SGML feature CONCUR, the same is true of "(" and ")". The square brackets group together the location model in order to allow non-significant white space to occur.

The location model works exactly the same way as sdarule except that the first argument, which occurs within square brackets, may represent a complex set of conditions which must be fulfilled for the mapping to occur.

[chap>>p>>emph]

means "the current element and its ancestry matches the pattern chap containing a p containing an emph". It is not necessary to put the current element into this pattern if emph is to contain it, but not necessarily immediately. You can put in the current element either by name or, sometimes more usefully, by the special symbol #CE.

[chap>p>emph>#CE]

means "the current element and its ancestry matches the pattern chap immediately containing p immediately containing emph immediately containing the current element". ">>" and ">" can be mixed as needed.

[(chap|sec)>>p]

means "either a chap or a sec containing a p".

[chap >> p ID=AC555 >> emph]

indicates that the transformation is to take place only if the specified attribute value matches. Alternative values are allowed for attributes in location models. Thus:

[chap>> p ID=(A|B|C) >> emph]

means match a chap containing a p with ID attribute equal to A or B or C, containing an emph.
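The matching behaviour described above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not the committee's algorithm or any existing SDA tool: the names parse_location_model and matches are hypothetical, the ancestry is passed in as a plain list of element names from the root down to the current element, names are compared case-sensitively for simplicity, and attribute conditions such as ID=AC555 are deliberately left out.

```python
import re


def parse_location_model(pattern: str):
    """Split '[chap>>(p|note)>emph]' into (connector, names) steps.

    The connector is None for the first step, '>' for immediate
    containment, and '>>' for containment at any depth.
    """
    body = pattern.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    steps = []
    connector = None
    # '>>' is listed first so it is not read as two single '>' tokens.
    for token in re.split(r"(>>|>)", body):
        token = token.strip()
        if token in (">", ">>"):
            connector = token
        elif token:
            names = frozenset(n.strip() for n in token.strip("()").split("|"))
            steps.append((connector, names))
        else:
            raise ValueError(f"malformed location model: {pattern!r}")
    return steps


def matches(pattern: str, path: list) -> bool:
    """Return True if the ancestry `path` (root first, current element last)
    satisfies the location model. Attribute tests are not supported here."""
    steps = parse_location_model(pattern)
    last = len(path) - 1

    def step_ok(names, idx):
        if "#CE" in names:            # #CE stands for the current element only
            return idx == last
        return path[idx] in names

    def place_rest(step_i, idx):
        # Step step_i sits at path[idx]; try to place the remaining steps.
        if step_i == len(steps) - 1:
            return True
        connector, names = steps[step_i + 1]
        if connector == ">":
            candidates = [idx + 1]
        else:
            candidates = range(idx + 1, len(path))
        return any(
            c < len(path) and step_ok(names, c) and place_rest(step_i + 1, c)
            for c in candidates
        )

    return any(
        step_ok(steps[0][1], i) and place_rest(0, i) for i in range(len(path))
    )
```

For instance, with these assumptions matches("[chap>p>emph>#CE]", ["chap", "p", "emph", "b"]) holds, while the same pattern fails once a sec element intervenes between chap and p, because ">" requires immediate containment.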
Accordingly, for the nested sec example described above, the following attribute declarations would handle the mappings:

>title] h2 [sec>>sec>>title] h3 [sec>>sec>>sec>>title] h4"

5.12 Braille Transcriber's Notes

In the case of creation of Braille editions, certain transformations will always require the intervention of an experienced Braille transcriber. Often these can be predicted: one knows that in a DTD with the potential for complex tables, or one which supports the inclusion of graphics, the Braillist should be alerted to either proofread or create the required content and markup. In the case of graphics, for example, a sighted person will have to describe the image.

It would be useful to have the transformation process place a marker in the text at each point where one knows in advance that such work will be necessary. The ICADD technique recommends the consistent use of a processing instruction as just such a marker.

The marker is placed by declaring an sdapref attribute at the highest level of the relevant element/sub-element group. For example, a marker should be put on a table element rather than on a row or cell:

" >

A transcriber message can be used jointly with any other generated text, but it should be placed first in the prefix string:

Warning!" >

Until more common practice in the creation of accessible DTDs develops, one should not be afraid to use the transcriber marker liberally, thereby consistently marking any element structure (or DTD fragment) whose transformation may not be fully automatable.

5.12 Tables

Tables, unfortunately, can be very complicated. The simplest way to support the generation of multiple ICADD output formats for tables is, wherever possible, to adopt in its entirety the table DTD fragment in Appendix B of this booklet. This set of declarations has been put into the public domain and may be used as it is for the creation of both accessible and traditional print tables.
Under circumstances where you have no compelling reason to use another specific style of table markup, the committee recommends this one.

The second approach is to add the SDA attributes to your existing declared table elements to transform those elements into the canonical ICADD table elements. Note that the most important contribution that any markup can offer for the ICADD translation process is, first, to distinguish the contents of cells within rows, and then to distinguish header information (both for column heads and row heads -- usually called "stubs"). If your translation is able only to accomplish this much, it still provides much-needed automation of a large part of the process.

Each table element, at its highest level, should include the sdapref attribute whose content consists of the request for a Braille transcriber to check the table:

tbl SDAFORM CDATA #FIXED "table" " >

This example assumes that the source DTD includes a table element with a generic identifier of tbl.

5.13 Sample SDA Parameter Entities

Naturally, with so many repetitive constructs to be placed in a DTD, one may find it useful to copy the following declarations into one's DTD and use parameter entities throughout the attribute declarations:

These would be used as in the following examples:

Use of these parameter entities is encouraged by the committee so that a consistent pool of SDA-enabled DTDs may be developed.

5.14 Special Characters

In the midst of otherwise clear-cut ways of performing the translations required for Braille, large print and computer voice, we are still confronted with special or "funny" characters. These are the thousands of symbols and signs that appear in print but that are not part of a base character set common to any wide set of typewriters or computers. (A basic French keyboard, naturally, is different from an English one, or a Polish one.)
There is a base character set called ASCII, which supports a broad selection of American, English Canadian and English characters, and does so in a manner that is consistent amongst a wide set of keyboards. At this time, ASCII is the base character set used for all content and markup in the ICADD techniques.

This is fine for base text, but the questions remain: What should one do to represent an accented character (a Spanish tilde, for example, or a French cedilla)? How about special symbols (sometimes called dingbats) such as boxes or bullets? How about mathematical characters?

The real difficulty here is that while it may be possible to achieve such characters on a keyboard, it is unlikely that the computer's internal coding of that character will have any meaning when the file is moved to another computer. For a file to be SDA-enabled, it is imperative that all such characters be marked completely unambiguously for later depiction in Braille and the other formats.

Luckily, SGML provides a construct to meet this requirement: the entity reference. SGML declarations allow one to establish an unambiguous ASCII representation of any special or funny character by naming it using an ASCII string of characters delimited with an ampersand at the start and a semicolon at the end. (For example, &bull; could be made to stand for a bullet, and &eacute; for an accented "e". It is up to the SGML system to interpret these entity references to the user, either by substituting the special character if available, or otherwise indicating the existence of the reference.)

A very rich set of useful entity references is available as part of the current SGML standard, and a fuller set is available within an ongoing ISO technical report, TR-9573 -- Techniques for Using SGML. (This report is available from ISO, most national standards bodies, the International SGML Users' Group and the Graphic Communications Association.)
Note that many people will be using an SGML parser to transform a source SGML file into one marked up for Braille, large print or computer voice. That parser will normally transform all entity references into the content that has been defined for them. At that point, their value to the ongoing process vanishes; they will have been converted to machine-specific or software-specific codes. For ICADD purposes, it is critical that they remain "unexpanded" so they are still computer and software independent when they reach Braille or other ICADD software.

Accordingly, all entity references used with the ICADD-enabling techniques must be declared as being of the type CDATA or SDATA. This will ensure they pass through the SGML parser unscathed.

A typical SGML entity declaration (specific to one computer):

This declares that &ntilde; is equivalent to an internal code #165.

The ICADD-enabling version of the same SGML entity declaration:

5.15 Support for Multiple Languages

The techniques described above for sdapref and sdasuff are based on the premise that it makes sense to incorporate text directly into the DTD that will become part of the input stream to a Braille translation process. This, in turn, assumes that the language of the marked-up file and the language of the generated text in the DTD will be the same -- and more importantly, will remain the same. In fact, this cannot be safely assumed. SGML is part of an active, international community in the forefront of reusing information across many borders and boundaries.

An additional technique is needed to ensure the separability of the generated text from the remaining work that goes into making a DTD ICADD-enabled. The committee suggests removing the specific contents of all generated text from the attribute declarations and defining them indirectly as SGML entities which are gathered in a set of declarations, which may exist either in an external file or within the DTD.
Both mechanisms allow users to switch easily between different languages. The following example illustrates the use of the external file. In the DTD is a reference to a local (system) or public entity set:

%SDAGEN;

Wherever sdapref and sdasuff attributes were used, instead of the form described elsewhere in this document, one would include:

Notice that the sdapref value is defined as having FIXED ENTITY attribute values instead of the CDATA attribute values as in Sections 5.7, 5.8 and 5.9.

In the SDAGEN entities file, we might find for instance:

When it becomes appropriate to re-run the transformation process for a second language, the entity reference should be re-declared to refer to a second SDAGEN file, in the desired second language, edited locally by a translator (and not necessarily the DTD creator). This process is repeated for as many languages as are needed. (Note that one could do some fancy renaming of the external entity files so that multiple files exist but, as needed, a copy is made which is temporarily called by the name embedded within the DTD. That way the DTD doesn't have to have the file name re-declared each time it is used to establish the mapping transformations for the new language.)

For any DTD which would be used in a variety of countries, this approach means that one defines one common entity file (presumably in the base language of the DTD) for the generated text which appears in all the sdapref and sdasuff attributes declared throughout the DTD.

There is a disadvantage to this approach in that the person creating the ICADD-enabled DTD always needs to include (at least) one separate entity file. Accordingly, there is a slight risk of the two files becoming separated or out of synchronization.
However, in the committee's opinion, this solution is better than the most obvious alternative: having to deal with multiple versions of the same DTD whose only differences are that they contain generated text attribute values in different languages.

There is a second approach which has the advantage of maintaining all the content in one file and the disadvantage of creating a slightly more cluttered DTD. One would decide which approach to use primarily based on whether one wants to have the DTD seem to be unchanging -- only the external file changes -- or whether one is more concerned about keeping everything needed for the transformation in one file.

The second technique involves the use, within the DTD, of marked section parameter entities for each language. The example shows the principle:

Adres voor elektronische post: "> ]]>
Electronic Mail Address: "> ]]>
Adresse electronique: "> ]]>

Here one declares a marked section parameter entity in the DTD for each relevant language and sets all languages to IGNORE except for the current one. The text -- as well as anything else which may appear in the generated text attribute values, including context and markup -- appears within the appropriately marked up marked section once for each language. Notice that non-ICADD declarations appear intermingled with the SDA attributes but that only the SDA declarations must use the declared entities.

From a practical point of view, the technique is quite easy to administer. All sdapref and sdasuff values are declared as entities and given a unique name. Each is declared within a set of entity declarations gathered together for convenience, perhaps at the end of the DTD, within the appropriate marked section. That entire list is copied over and over, once for each language, and only the language-sensitive words are translated. The additional characters are left precisely as they are to ensure identical handling by the transformation process.
Note that either of these mechanisms results in a set of declared entities that will also now be valid elsewhere in documents conforming to the DTD. This means that if users attempt to declare entities which, by coincidence, have the same name, they will override the declarations in the DTD's entity set. The committee recommends naming all such entities so that they begin with the letters "sda".

These techniques also mean that authors working with any of a number of common SGML editing tools will likely be offered dialog box pick lists of entity references -- and these lists will include the entity references that are intended only for internal use within the DTD. In theory, an author could insert them anywhere in the document.

6.0 Creating Braille-Ready Files

6.1 Braille-Ready Files from within Wordprocessing or Desktop Publishing Software

The SDA canonical tagset has been chosen to represent a set of standard document structures that could be used to compose a book or article. In order to facilitate its implementation, the committee chose to keep the list short rather than duplicate the work of industry associations which have created rich DTDs geared to capturing specific details of document components.

Most wordprocessing and desktop publishing software supports an ability for users to define and exploit "style sheets". If you are using such software to create electronic files which you hope to make available in Braille, large type and computer voice, wherever possible, use style sheets to format the document. Give the style sheets the names of the canonical tagset, and choose the appropriate set of export options which will store the style names in the output text file.

There are limitations built into this approach. The formatting of phrases or words within paragraphs, for example, is generally lost in the export from wordprocessing or desktop publishing software. Tables will not usefully be captured this way.
Nonetheless, the real goal of the ICADD techniques is to make any file more useful to members of the print-disabled community. While exported files marked up with style names cannot have the overall usefulness of an SGML file, they are considerably better than plain text files.

6.2 A DTD Just for Braille

The dtp.dtd [See Appendix A of this document], which is the formal rendering in SGML of the canonical tagset referred to throughout this document, may be used by itself to mark up text if the sole point of that markup is to generate Braille and/or large print editions. This use is discouraged by the committee, however, since such action would result in a file not necessarily useful for the generation of navigable voice-synthesized text, or indeed for any worthwhile searching and retrieval.

Accordingly, if the target document is a traditional trade book, journal or article, the sub-committee recommends the use of the DTDs created by the Association of American Publishers and updated as ISO 12083. Note, however, that the ICADD committee has extended these DTDs with several constructs optimized for use by the print disabled. Contact the authors of this report for additional information.

6.3 Using Braille-Specific Elements in Any DTD

The techniques described elsewhere in this booklet are geared to those publications where creating the Braille version is only a by-product of the creation of the source input file for traditional typesetting. That is, nothing appears in the source file specifically to support the Braille version. All interaction is between the DTD and the SGML parser/transformer.

At the same time, it must be recognized that authors, editors and production people may wish to augment the input file's contents to better support the print-disabled community.
The elements used for this are the following:

Ink Print Page

Braille editions of books generally include the page numbers of the print version so that Braille-reading and sighted colleagues and classmates may refer to a common page number. However, the typesetting input file will not include those page numbers -- they appear only in the output stream. An element has been created for insertion after the formatting process to mark each pagebreak. It may appear anywhere in the file and its contents should be the number of the page just beginning.

Usage: The element should be added as an inclusion at the top level of any document.

Figures

Many books contain illustrations. The SGML input file will normally contain a reference to a graphic and, perhaps, a title. The reference will disappear from the Braille version of the SGML file, and the figure title, if there is one, will generally not contain enough information for the visually impaired reader to imagine the graphic. Here a description element should be used for a detailed description of the contents of the image. When this description can be composed by the author, it can highlight the graphic contents to which the author attaches importance. Otherwise this description will originate with Braille transcription experts. Note that if an author or editor creates the description, a more experienced transcription expert may need to enhance it.

Usage: The description element is a sub-element of any figure and may contain PCDATA, standard paragraph sub-elements (such as emphasis), and the special notation elements described below. A typical use might be:

Simplified Figures

In print editions for people with limited sight, original illustrations may be simplified -- a detailed road map, for example, may be reproduced highlighting only major highways -- and these graphics may have their own, simplified descriptions. Two further elements support the simplified figure and its simplified description.
Usage: When you know that a figure will be created in simplified form, it is appropriate to point to the file containing it in exactly the same way as the source graphic file. The simplified description has the same content model as the full description, but it is likely to be shorter. A typical use:

Similarly, for example, in the newspaper headline "EUROPE JOINS US IN TRADE PACT," Braille rendering requires that one distinguish between US being a contraction or a pronoun. Accordingly, authors should make this distinction explicit by marking up the ambiguous word as an abbreviation or acronym.

A related situation exists for software which reads text aloud, pronouncing each set of continuous letters as a word. Where these should be read letter by letter (as in US, CE, UK, or TV), they should also be surrounded by tags. All elements are automatically treated as uncontractable, so, if a file is intended both for voice synthesis and Braille delivery, only words which should be pronounced as whole units (such as NATO) should be marked as elements.

Usage: Wherever PCDATA may normally appear in a document, it should be joined by these "special needs" elements. A typical parameter entity to accomplish this:

(This example is quoted from the CAPSNEWS.DTD created for the CEC Technology Initiative for Disabled and Elderly people (TIDE) Pilot Action.)

7.0 Indicating ICADD Usage in the SGML Declaration

A document indicates conformance to the ICADD SGML Document Accessibility techniques in the APPINFO parameter of the SGML declaration, which specifies the characters in the "SDA prefix" that identifies attributes that represent "SDA declarations." SDA declaration facilities are provided by attributes described in this document.

Conformance of a document to SDA is indicated by a parameter of the APPINFO parameter of the SGML declaration. Its format is:

APPINFO SDA

The parameter can also specify the name of the "SDA prefix" if it is other than "SDA".
The format is:

APPINFO SDA=NEW

where "NEW" is replaced by the new prefix name. The new name must be a valid name in the concrete syntax of the SGML declaration.

As an example, if one wished to use "BRL" as the standard prefix, one would use the following string in the APPINFO declaration:

APPINFO SDA=BRL

This might be appropriate, for example, in situations where one enables a DTD specifically for Braille with no consideration for large print or voice synthesis requirements. This means that the attribute names used throughout the DTD would be:

BRLFORM
BRLRULE
BRLPREF
BRLSUFF

and the transcriber note would be renamed to match.

SGML Document Access software should recognize the APPINFO clause and effect the substitution of BRL for SDA, performing all requested transformations in this variant syntax.

Appendix A

This appendix contains the draft version (pending additional usage and updates) of the ICADD dtp.dtd, which is the formal SGML declaration of the canonical tagset used throughout the ICADD architecture.

Appendix B

This appendix contains the draft version of the tables module of the ICADD dtp.dtd. This set of declarations may be integrated directly into any DTD simply by declaring content for the parameter entities which control the contents of cells (including hdcell, stubcell, sstcell and cell). Note that note has been declared within the table DTD fragment; if these declarations are used with the ICADD dtp.dtd, the second element declaration for note should be removed.