Digital Talking Book Standards Committee
Document Navigation Features List
Status of this Document: This document is in draft status. Please send any comments to Michael Moodie at mmoo@loc.gov.
Draft 4 -- December 29, 1999
Modifications made in 12/29/99 draft in section 6, eliminating requirement that notes be marked as explanatory or source and requirement that note references be included in the NCC.
Modifications made in 7/4/99 draft in sections: 1.1, 2, 3, 6, 9, 10, 12, 14, 16, and 18 (now 19). A new section 18 was added and sections 18-21 renumbered 19-22.
Changes made in 12/10/98 draft:
"Navigation Table" changed to "Navigation Control Center"
Modifications made to sections: Background, 2, 3, 5, 6, 7, 11, 15, 16, 18.
1.1 Basic Movement Through Text
1.2 More Sophisticated Movement
2. Fast Forward and Fast Reverse
4. Treatment of the Table of Contents
5.1 Moving Between the Navigation Control Center and the Actual Book
14. Text Attributes and Punctuation
18. Skipping User-Selected Text Elements
20. Summary and Reporting Information
22. Other Kinds of Visual Representations
The National Information Standards Organization (NISO) is a nonprofit association accredited as a standards developer by the American National Standards Institute. NISO Standards Committee "AQ" was formed in March, 1997, to develop a national standard for a digital talking book (DTB) for blind and physically-handicapped readers. A DTB is envisioned to be, in its fullest implementation, a group of digitally-encoded files containing an audio portion recorded in human speech; the full text of the work in electronic form, marked with the tags of a descriptive markup language; and a linking file that synchronizes the text and audio portions. As this document illustrates, such a structure will allow the DTB user a broad range of capabilities not possible with current talking books.
The Digital Talking Book Standards Committee formed a number of working groups to carry out specific tasks related to development of the standard. Working Group 1 was charged with creating a comprehensive list of the features/functions envisioned for the most complex book accessed through the most sophisticated playback device conceived for digital talking books. This document presents the recommendations of that working group. It is deliberately comprehensive to ensure that the underlying file structure being developed by other NISO working groups will be able to support every conceivable feature.
In all likelihood, the most sophisticated playback device for a DTB will be a personal computer running special software. Such a device will not only be able to play the audio portion of a DTB, but will also have the capability of displaying the text file in appropriate font sizes for users with visual impairment or reading disability. However, it is recognized that most DTB users will, in fact, want a far simpler method of reading talking books. At least three levels of device are envisioned: a very simple unit suitable for users who primarily listen to books or magazines straight through; a more complex, but still portable, device with a user interface that allows sophisticated navigation through a document; and software running on a PC.
It is envisioned that most digital talking books will contain an audio file of recorded human speech. Some will also include the full text of the book in electronic form; the proportion that does will depend mostly on the costs associated with acquiring and marking up text files. Some number will consist only of the marked-up text. Users will be able to access such text files through synthetic speech, screen magnification systems, or braille displays. However, in this document, unless otherwise specified, when mention is made of "reading" or "hearing," the reference is to listening to the sound track of recorded human speech.
In the following discussion the term "book" includes a wide variety of documents besides books themselves, e.g., magazines, journals, reference works, etc.
Many of the navigation features which should be available in a digital talking book of the advanced variety will of necessity correspond to the navigation features available in today's personal computers. Blind people who are sophisticated users of screen access technology, word processors, or book reading software have already been exposed to many of the navigation features discussed here. Moreover, for purposes of discussion, it is assumed that users of the advanced digital talking book text navigation features possess a high degree of technological sophistication.
1.1 Basic Movement Through Text
The advanced digital talking book should provide the ability for the user to move through text one character, word, line, sentence, paragraph, or page (corresponding to the printed page, if present) at a time. In addition, the user should be able to jump to a specific page in the book (e.g., go to print page 55) and any specific line or paragraph on that page.
The user should be able to read the entire publication--from beginning to end--without having to jump up and down a hierarchical tree structure (e.g., moving in and out of the Table of Contents to go to the next chapter).
Another basic movement function that needs to be provided is movement by time. The user should be able to move back and forth through the book using either a small (e.g., ten seconds) or a large (e.g., ten minutes) time slice specified by the user.
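Time-based movement can be illustrated with a minimal sketch. The function name, the step constants, and the clamping behavior at the ends of the recording are assumptions made for illustration; this document does not specify them.

```python
SMALL_STEP_S = 10.0        # hypothetical small time slice: ten seconds
LARGE_STEP_S = 10 * 60.0   # hypothetical large time slice: ten minutes


def seek_by_time(position_s: float, step_s: float, duration_s: float) -> float:
    """Return the playback position after moving by step_s seconds.

    A negative step moves backward. The result is clamped to the bounds
    of the recording (an assumption; wrapping or stopping are alternatives).
    """
    return max(0.0, min(duration_s, position_s + step_s))


# Moving back ten minutes from 95 seconds in lands at the start of the book.
start = seek_by_time(95.0, -LARGE_STEP_S, duration_s=3600.0)
```

A player would call the same function for both step sizes; only the step value chosen by the user differs.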
1.2 More Sophisticated Movement
The user needs to have the ability to "jump" to specific chapters, sections, headings, and other segments of the digital talking book. For example, there should be functions such as "Go to next chapter," "Go to next subheading," "Go to next section," "Go to Chapter 5, Section 1," etc. This feature may be linked to a hierarchical, collapsible "Navigation Control Center" (discussed later), but the user should also be able to jump directly to a specific part of the book if its number or title is already known.
2. Fast Forward and Fast Reverse
It would be useful to have a simple tape-recorder-type navigation feature (cue and review function). For example, there could be a slider-like control or push buttons that would allow the user to fast-forward or fast-reverse through the book at high speed. As the text is traversed, speech could be generated at high speed using some form of time-scale modification. Even at such speeds, readers can learn much about the structure of the text being passed. For example, lists can be detected as a series of short, staccato bursts. Paragraphs, chapter headings, etc. could be indicated by strategically-generated tones. Thus, an individual could just zip forward or backward through the book rather than typing commands to accomplish the same tasks. For some individuals, this interface would be much simpler and easier to use. It might also be much more useful in a document that is long and does not have particularly good titling or sectioning.
An alternative method of allowing the user to skim a document would be to have the playback device read the types of text elements that are passed. For example, the user might hear, "part, chapter, section, paragraph, paragraph,..., section, paragraph, paragraph,..., table, paragraph, paragraph,..., sidebar, etc."
It is recommended that the fast forward and reverse feature allow the book to be traversed at anywhere from 10 to 25 times the normal or real-time reading speed.
It should be possible to read the digital talking book at speeds that are faster than or slower than the normal listening rate. This variable speed feature is necessary to enable playback at a speed that is comfortable and efficient for a wide range of readers. Three times the normal "real-time" rate should be possible, and the slowest speed should be around 1/3 the real-time reading rate.
The device should offer the user the option of "Time-Scale Modification" (TSM), that is, the capability to maintain constant pitch while the playback speed is varied. This feature should be optional, however, so that the user can choose to have the pitch change as the playback speed changes. The TSM system should not produce audible chopping, burble, or reverberation and should not skip over significant units of sound at high playback speeds.
4. Treatment of the Table of Contents
Most, if not all, books are supplied with what we traditionally call a Table of Contents. In printed works, the Table of Contents represents an important way for the reader to locate specific parts of the book. In a digital talking book, this function can be performed by both the Table of Contents and another tool called the Navigation Control Center (NCC; see the following section), with the NCC offering significantly more capabilities. If the original book does contain a Table of Contents per se, it should be represented in the digital talking book exactly as it is in the printed work. Viewed from the highest level of the book, the Table of Contents is merely another element of the book--similar to a chapter, Introduction, Foreword, or title page. When focusing in on the Table of Contents, the reader would hear a list of headings with associated page numbers (if present in the original work). From each heading, representing a specific segment of the book, it should be possible to jump directly to the segment and from the segment back to the Table of Contents. In addition, the user should be able to use the function described in Section 16, "Nested Lists," to determine at which level within the Table of Contents a given heading falls.
The digital talking book should have incorporated into its design a "Navigation Control Center" (NCC), which allows the user to easily obtain an overview of the material in the book while, at the same time, providing a convenient means for navigating through the book. This NCC should appear to the user to be a dynamic outline that can be collapsed or expanded easily.
The structure represented in the NCC should be the structure of the book as defined by the author. The Table of Contents prepared by the author can serve as a means of determining the basic structure. Additional levels of information can then be added based upon the headings and hierarchy provided by the author in the book itself (which usually goes beyond that reflected in the author-supplied Table of Contents). Talking Book producers should not reorganize or restructure the book but instead use the NCC as a means of enhancing the structure already defined by the author.
The most detailed level of the NCC should incorporate all of the components of the book including:
The NCC is also an ideal place to list footnote references. The user should be able to read the actual footnote when, while moving through the NCC, its reference is spoken.
The user should have the choice of whether to read the NCC in a circular (top-to-bottom-to-top) fashion or in a single-pass (top-to-bottom only) fashion.
In addition to any labeling that may have been placed in the book by the original author, there should be additional standardized labeling available in the NCC to aid in the determination of the heading level being examined. See Section 16, "Nested Lists."
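The idea of an outline that can be collapsed or expanded, with each entry carrying its heading level, can be sketched as follows. The NccNode class, the visible_entries function, and the sample titles are hypothetical; the actual NCC file structure is left to the other working groups.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NccNode:
    """One entry in a hypothetical NCC outline."""
    title: str
    children: list[NccNode] = field(default_factory=list)


def visible_entries(nodes, max_depth, depth=1):
    """Yield (level, title) for every entry no deeper than max_depth.

    Collapsing the outline corresponds to lowering max_depth;
    expanding it corresponds to raising max_depth.
    """
    for node in nodes:
        if depth <= max_depth:
            yield depth, node.title
            yield from visible_entries(node.children, max_depth, depth + 1)


outline = [
    NccNode("Chapter 1", [NccNode("Section 1.1", [NccNode("Heading 1.1.1")])]),
    NccNode("Chapter 2"),
]

collapsed = list(visible_entries(outline, max_depth=1))
expanded = list(visible_entries(outline, max_depth=2))
```

The level number returned with each title is one possible form of the standardized labeling mentioned above; a player could speak it before the title.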
5.1 Moving Between the Navigation Control Center and the Actual Book
While the user is examining the Navigation Control Center, it should be possible to jump immediately to the beginning of the section, chapter, or heading whose title is being spoken. Once the actual text of the desired item has been read, the user should be able to either continue on to the next item or return to the NCC. In so doing, the user should have some options as to where to end up within the NCC. One option should be to return to the place in the NCC that was last read--in other words, the jumping off point. Another option should allow the user to return to the NCC at a spot which corresponds logically to the location of the text that the user was just reading.
If the digital talking book contains notes, e.g., footnotes or endnotes, the user should have a good deal of flexibility in how they are presented as the book is read. The user should be able to choose to hear the note references and the notes, the note references only, or neither. If the user has chosen to hear only the note references, he or she should be able to override the current setting and hear a given note.
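The three presentation choices and the one-time override can be sketched as below. The enum names and the announce_note function are illustrative assumptions, not part of any specified interface.

```python
from enum import Enum


class NoteMode(Enum):
    REFERENCES_AND_NOTES = 1
    REFERENCES_ONLY = 2
    NEITHER = 3


def announce_note(mode, ref, note_text, override=False):
    """Return the items spoken when a note reference is reached.

    override models the user asking to hear a given note while the
    current setting is REFERENCES_ONLY.
    """
    if mode is NoteMode.NEITHER:
        return []
    spoken = [f"note {ref}"]
    if mode is NoteMode.REFERENCES_AND_NOTES or override:
        spoken.append(note_text)
    return spoken
```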
At any time during the reading of a note, the user should be able to return to the point in the text immediately following the point of departure. For example, if the note was read at the end of the sentence, the user interrupting the reading of the note should be returned to the beginning of the next sentence.
If after reading a passage without listening to the notes contained within it, the user wished to hear the notes and their context, he or she should be able to go to each note reference, back up a short distance, and listen to that portion of the text and to the note.
All cross references in the digital talking book should be set up as hypertext links--that is, links that, when triggered by the user, move the user immediately to the target location. An example of a cross reference might be "For additional information, see Chapter 5."
The user should have the option of being notified via an audible signal when a cross reference is encountered while reading a book. User options should include enable (default), disable, and a choice among several audible indicators. The user should also be able to query the player as to whether a given link is to an internal or external target, since the decision to follow a link may depend on the target's location. If the playback device is connected to the Internet or other network, the user should be able to follow external links.
When the user prompts the playback device to follow a link, the device should follow the link most recently encountered before the current reading position.
Having followed a cross reference to a target item, the user should be able to then return from that item to the cross reference source. The target item itself may contain cross-referenced text which points to another chapter, topic, appendix, or table; and the user should be able to follow this cross reference as well. It should also be possible for the user to retrace the cross reference path back to the original cross-referenced text.
Consider the following example. We have a paragraph describing the steps necessary to install a word processing software package. The paragraph talks about creating a DOS batch file. The text "create a batch file" is highlighted and linked to an appendix at the end of the book called DOS Batch File Creation and Syntax. In the appendix, a reference is made to loading a program into high memory. The phrase "high memory" links to another chapter in the book called "Use of High Memory." From there, the user should be able to trace back to the appendix and then back to the installation instructions for the word processing software.
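The retracing behavior in this example amounts to keeping a stack of visited locations. The sketch below assumes a hypothetical LinkTrail class and uses plain strings as locations; a real player would store positions within the book.

```python
class LinkTrail:
    """Remember the path of followed cross references so it can be retraced."""

    def __init__(self, start):
        self.current = start
        self._history = []  # locations left behind, oldest first

    def follow(self, target):
        """Follow a cross reference from the current location to target."""
        self._history.append(self.current)
        self.current = target

    def back(self):
        """Return to the most recent source; stay put if the trail is empty."""
        if self._history:
            self.current = self._history.pop()
        return self.current


trail = LinkTrail("Installation instructions")
trail.follow("Appendix: DOS Batch File Creation and Syntax")
trail.follow("Chapter: Use of High Memory")
first_return = trail.back()   # back to the appendix
second_return = trail.back()  # back to the installation instructions
```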
If the book has an index, it should be possible to jump directly from an index entry to the top of the page which it references and then back to the index. This differs from a cross reference or hypertext link in that the link from the index is to the top of a page and not to any specific text.
The index of the digital talking book can be conceived of as a simple text page with each page reference in the index acting as a hypertext link to the top of a print page in the book. Where multiple pages are given for a word or phrase, each of the individual page references would be a hypertext link pointing to the top of the referenced page. Where a page range is given for a word or phrase, the hypertext link would point to the top of the first page in the range.
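The rule for page references can be sketched as a small parser. The function name and the assumed input format (comma-separated page numbers, with ranges written using a hyphen) are illustrative only.

```python
import re


def index_link_targets(page_refs: str) -> list[int]:
    """Return the print page each reference in an index entry links to.

    A single page links to itself; a range such as "40-45" links to
    the first page of the range.
    """
    targets = []
    for ref in page_refs.split(","):
        first_page = re.match(r"\s*(\d+)", ref)
        if first_page:
            targets.append(int(first_page.group(1)))
    return targets


targets = index_link_targets("12, 40-45, 88")  # [12, 40, 88]
```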
The user should be able to set a large number of bookmarks within the book. These should not actually be inserted into the book itself but saved in a separate file that would be synchronized with the book. This file should be capable of being exported and used on other compatible devices.
Each bookmark should be capable of being tagged by a text or voice label for which the user can search. For example, the user might want to locate a bookmark containing the label "Scientific Discoveries," which is either a text label or one recorded with the user's voice. There should be enough storage available for labeling so that the bookmark can be used for annotation purposes. The user should be able to browse through all bookmarks that may have been set for the book, regardless of any label that may be associated with a particular bookmark, and jump to any of them. The user should be able to assign the same label to multiple bookmarks to create a set of related bookmarks. A separate set of bookmarks is maintained for each book being read.
Any time a user stops playback, an unlabeled bookmark should be automatically placed at that point. The user can choose to disable this feature.
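A separate, exportable bookmark file could take many forms. The sketch below assumes JSON as the file format, a position measured in seconds, and text labels only; all of these, along with the names used, are assumptions rather than requirements of this document.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Bookmark:
    position_s: float   # assumed: position as seconds into the audio
    label: str = ""     # empty for unlabeled (e.g., automatic) bookmarks


def export_bookmarks(book_id: str, bookmarks: list) -> str:
    """Serialize one book's bookmarks to a string kept outside the book."""
    return json.dumps(
        {"book": book_id, "bookmarks": [asdict(b) for b in bookmarks]}
    )


def find_by_label(bookmarks: list, label: str) -> list:
    """Return every bookmark carrying the given label (labels may repeat)."""
    return [b for b in bookmarks if b.label == label]


marks = [Bookmark(12.5, "Scientific Discoveries"), Bookmark(300.0)]
exported = export_bookmarks("book-1", marks)
```

Because the label is stored with each bookmark rather than used as a key, several bookmarks can share one label and form a related set.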
It should be possible to highlight portions of the digital talking book, assigning similar or different characteristics (labels) to each section highlighted. The user should be able to create text- or voice-input labels. For example, the designator "Professor Smith's ideas" could be used to designate highlighted passages of text which a student thinks are important for an exam to be given by Professor Smith. Another category, "Final Exam Notes" could be used to designate highlighted passages of text which a student might want to review in preparation for the final exam. As with bookmarks, it should be possible for the user to browse through all highlighted text, regardless of the designator used to identify the highlighting; and like bookmarks, there should be a sufficient amount of storage for identifiers to enable the user to include notes to be associated with the highlighted text.
With this feature, the user should be able to jump to a specific highlight, identified with a text or voice label, or to review a list of all highlighted text items.
As the user is reading along in the book, some indication should be provided as to when highlighted text (provided by the author or inserted by the reader) has been encountered. Also, the user should be able to learn what label (if any) has been attached to the highlighted text.
Like bookmarks, information about text highlighted by the user should be stored in a file that is separate from the actual book itself. The difference between highlighting and bookmarking is that the former lays down two markers, one each at the beginning and end of a segment, while the latter lays down only a single marker.
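The two-marker structure of a highlight, and the check a player might make to signal that the reader has entered highlighted text, can be sketched as follows. The Highlight class, its fields, and the use of seconds as positions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Highlight:
    start_s: float   # marker at the beginning of the segment
    end_s: float     # marker at the end of the segment
    label: str = ""


def highlights_at(highlights, position_s):
    """Return the highlights that cover the given playback position."""
    return [h for h in highlights if h.start_s <= position_s < h.end_s]


notes = [
    Highlight(100.0, 130.0, "Professor Smith's ideas"),
    Highlight(120.0, 150.0, "Final Exam Notes"),
]
labels_here = [h.label for h in highlights_at(notes, 125.0)]
```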
If an individual is doing any type of research or just making notes, he or she will find it useful to be able to copy a portion of text from a digital talking book and paste it into another location. Some type of mechanism that would allow this capability, within the bounds of copyright law, should be provided.
It should be possible to search the book for specific text strings. It should also be possible to search the book and move to specific structural and other tags. For example, the user should be able to search for and jump to all of the pictures in the book (assuming that they are described), all of the sidebars, and all of the footnotes, or any other text or structural element in the book (e.g., search for headings of level 1 type, search for ordered lists, search for unordered lists, jump to next unordered list, jump to next ordered list).
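A "jump to next element of a given type" function can be sketched over a flat list of tagged elements. The (kind, text) tuples and the kind names are placeholders, not tags from any particular markup language.

```python
def next_element(elements, kind, after):
    """Return the index of the next element of the given kind, or None.

    elements is a list of (kind, text) pairs in reading order;
    after is the index of the current element.
    """
    for i in range(after + 1, len(elements)):
        if elements[i][0] == kind:
            return i
    return None


book = [
    ("heading1", "Chapter 1"),
    ("paragraph", "Opening text."),
    ("ordered_list", "Steps one to three."),
    ("sidebar", "A related aside."),
    ("ordered_list", "Steps four to six."),
]
found = next_element(book, "ordered_list", after=2)
```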
It should be possible to have individual words spelled. If the user is listening to a digital recording of human speech as opposed to the text file rendered in synthesized speech, a mechanism must be in place to synchronize the speech with the text file so that the user can ask for a word to be spelled as soon as it is spoken.
14. Text Attributes and Punctuation
The user needs to be able to know when and where bolded, italicized, or otherwise emphasized text and such elements as subscripts and superscripts occur within the digital talking book. Any feature which provides this information should be one that the user can easily turn on and off. Character sets supporting international characters (e.g., double-byte codes used in Asian languages or extended ASCII characters) should be accommodated. The user should also have the ability to identify text attributes such as background and foreground color, font type and size.
When listening to a synthesized speech rendering of the text file, the user should also be able to control how much punctuation is spoken. For some publications, it is important to hear every single comma, period, space, or exclamation point. For others, the user may want to hear only full words spoken.
It should be possible for the user to choose a variety of ways to read a table whether reading the text file of a digital talking book via synthetic speech, large print, or braille, or listening to the human speech recording. Possible ways to read a table include:
(1) Reading the table one row at a time. The user would hear the row heading and the contents of the row. Optionally, column headings could be spoken before reading any item in the row or spoken only when reading items in the first row.
(2) Reading the table one column at a time. The user would hear the column heading and the contents of the column. Optionally, row headings could be spoken before reading any item in the column or spoken only when reading items in the first column.
(3) Locating a given cell in a table and hearing the value of the item in the cell. This might be accomplished by allowing the user to traverse the list of row or column headings until the desired row or column is found, then to traverse the table in the other direction, hearing headings until the appropriate cell is found.
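The three reading methods above can be sketched over a table held as a list of rows with separate heading lists. The function names, the flags, and the sample data are illustrative assumptions.

```python
def read_row(cells, row_heads, col_heads, r, with_col_heads=False):
    """Row heading, then each item, optionally preceded by its column heading."""
    spoken = [row_heads[r]]
    for c, value in enumerate(cells[r]):
        if with_col_heads:
            spoken.append(col_heads[c])
        spoken.append(value)
    return spoken


def read_column(cells, row_heads, col_heads, c, with_row_heads=False):
    """Column heading, then each item, optionally preceded by its row heading."""
    spoken = [col_heads[c]]
    for r, row in enumerate(cells):
        if with_row_heads:
            spoken.append(row_heads[r])
        spoken.append(row[c])
    return spoken


def read_cell(cells, row_heads, col_heads, row_name, col_name):
    """Locate a cell by its row and column headings."""
    return cells[row_heads.index(row_name)][col_heads.index(col_name)]


rows = ["1998", "1999"]
cols = ["Titles", "Copies"]
data = [["120", "3400"], ["150", "4100"]]
spoken_row = read_row(data, rows, cols, 1, with_col_heads=True)
```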
When an item in a list contains its own secondary list of items, that second list is said to be "nested" inside the main list. Users should be able to invoke a function that assists with the comprehension of the layout of nested lists. The user needs to be able to determine at what level within a nested list a given item falls. One approach would be to apply a numbering scheme such as that used for legal documents (e.g., 4.2.3.6) that will tell the user precisely where in a list the current item falls.
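The legal-style numbering approach can be sketched with a short recursive function. The representation of a list as (text, children) pairs and the function name are assumptions for illustration.

```python
def number_items(items, prefix=()):
    """Yield (number, text) pairs such as ("2.1.1", "Oak") for a nested list.

    items is a list of (text, children) pairs, where children is a
    list of the same shape.
    """
    for i, (text, children) in enumerate(items, start=1):
        path = prefix + (i,)
        yield ".".join(str(n) for n in path), text
        yield from number_items(children, path)


nested = [
    ("Animals", []),
    ("Plants", [("Trees", [("Oak", [])]), ("Grasses", [])]),
]
numbered = list(number_items(nested))
```

The number of components in each label tells the user the nesting level, and the components themselves give the position at each level.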
Any book, whether electronic or on paper, contains a variety of elements--some physical and some logical. For example, the page is an example of a physical element in a book written on paper whereas a chapter heading is an example of a logical element which is similar in character to any other piece of text in the book but which has been given the logical designator "heading." In many cases, it is not as important to understand the physical format of a given piece of text as it is to understand the logical elements of which it is composed. In other situations, it is important to keep text in its original format. These principles have been recognized in the use of descriptive markup languages and other tagging schemes used in conjunction with electronic text.
Given that an electronic version of a book will likely contain markup tags (e.g., to designate headings, paragraphs, and other elements), it should be possible to search the book for specific tags. Where tags are not present--as in the case of pre-formatted text--there should be provision in the navigation system to give the reader information about layout and structure of text that may not be provided by tags.
18. Skipping User-Selected Text Elements
The user should be able to instruct the playback device to skip over specified elements of the document such as picture captions, optional producer's notes, tables, sidebars, etc.
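Skipping can be sketched as a filter applied to the sequence of elements before playback. The element kinds shown and the function name are hypothetical.

```python
def playable(elements, skip_kinds):
    """Yield the elements to be played, omitting kinds the user chose to skip."""
    for kind, text in elements:
        if kind not in skip_kinds:
            yield kind, text


page = [
    ("paragraph", "Main text."),
    ("caption", "Figure 1: A map."),
    ("sidebar", "Historical aside."),
    ("paragraph", "More main text."),
]
heard = list(playable(page, skip_kinds={"caption", "sidebar"}))
```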
If a sighted person can obtain information on his or her current location within a printed document by an examination of the document, then that information should be available to the digital talking book user. Location information should be relative to the particular publication. Three types of location information suggested are:
(1) Logical -- where are you with respect to the chapter--what chapter are you in, what section, subsection, etc.?
(2) Physical -- what page of the book are you on and on what line (where applicable)?
(3) Temporal -- how much time remains in the chapter and the book?
The user interface could be set up to provide more detailed information in response to more presses of a button. That is, the first press might elicit the title, the second the chapter, the third the page number, etc.
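The repeated-press behavior can be sketched as a lookup into an ordered list of details. The order shown follows the example in the text; the names, the dictionary layout, and the choice to repeat the last detail on further presses are assumptions.

```python
DETAIL_ORDER = ["title", "chapter", "page"]  # order taken from the example above


def location_announcement(location: dict, presses: int) -> str:
    """Return the detail spoken on the given press (counting from 1).

    Presses beyond the last detail repeat the last one (an assumption).
    """
    if presses < 1:
        raise ValueError("presses must be at least 1")
    key = DETAIL_ORDER[min(presses, len(DETAIL_ORDER)) - 1]
    return f"{key}: {location[key]}"


here = {"title": "A Sample Book", "chapter": "Chapter 5", "page": 55}
third_press = location_announcement(here, 3)
```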
The user should have access to enough information to create meaningful footnote references when extracting information from the book.
20. Summary and Reporting Information
A feature should be available that permits the user to obtain a quick overview of the book (e.g., 563 pages, 9 hours of play, 4 levels of headings, 2 parts, 12 chapters, 5 tables). Suggested information includes: title, author, playing time, number of major logical elements (parts, chapters), and, if applicable, number of printed pages.
Reporting should include a dynamic summary--that is, if the user is in a specific Part of the book, information should be provided as to the number of chapters contained within that Part. If the user is in a chapter, he or she might be able to learn the number of sections and perhaps the number of pages.
While it has traditionally been difficult if not impossible to render audibly scientific and mathematical information in a meaningful way, and while the traditional ASCII character set as we know it creates difficulties for the encoding of such information in a digital form, it is nevertheless essential to consider the issue in terms of the digital talking book and text navigation. It is possible that in time, an approach to rendering math and science information audibly in a meaningful way will be developed. In designing the navigation software, therefore, provisions should be made to accommodate any such approach, should it become available.
22. Other Kinds of Visual Representations
Other kinds of visual representations, such as organizational charts, flow charts, family trees, etc., should be rendered using specialized presentation techniques, depending on the information available in the original document.