National Information Standards Organization
File Specifications for the Digital Talking Book



Draft -- Version 3.8

February 1, 2001

Copyright © 2001 by National Information Standards Organization




Table of Contents

  1. General Information
  2. Overview
  3. Package File
  4. Content Format for Text
  5. Audio File Formats
  6. Other Media File Formats
  7. Synchronization of Media Files
  8. Navigation Control
  9. Portable Bookmarks/Highlights
  10. Resource File
  11. Packaging Files for Distribution
  12. Presentation Styles
  13. Types of DTB
  14. Digital Rights Management
  15. Time-Scale Modification
  16. Conformance
  17. References to Other Specifications/Documents

Foreword

(This foreword is not a part of the American National Standard for Digital Talking Books... . It is included for information only.)

This standard presents the file specifications for digital talking books (DTBs) for blind, visually impaired, physically handicapped, or otherwise print-disabled readers. For many years, "talking books" have been made available to print-disabled readers on analog media such as phonograph records and audio cassettes. Those media served their users well in providing human-speech recordings of a wide array of print material in increasingly robust and cost-effective formats. However, analog media are limited in several respects when compared to a print book. First, they are by their nature linear presentations, which, while suitable for novels, leave much to be desired when reading reference works, textbooks, magazines, and other materials that are often accessed randomly. Digital media offer readers the ability to move around a book or magazine as freely as (and more efficiently than) a sighted reader flips through a print book. Second, analog recordings do not allow users to interact with the book by placing bookmarks, highlighting material, and so forth. A DTB offers this capability, storing the bookmarks and highlights separate from, but associated with, the DTB itself. Third, talking book users have long complained that they did not have access to the spelling of the words they heard. As will be explained below, some DTBs will include a file containing the full text of the work, synchronized with the audio presentation, thereby allowing readers to locate specific words and hear them spelled. Finally, analog audio offers readers only one version of the document. If, for example, a book contains footnotes, they are either read where referenced, which burdens the casual reader with unwanted interruptions, or grouped at a location out of the flow of the text, making it difficult for interested readers to access them. A DTB allows the user to easily skip over or read footnotes.
So the Digital Talking Book offers the print-disabled user a significantly enhanced reading experience -- one that is much closer to that of the sighted reader using a print book. This standard describes the various files that make up a DTB and specifies how each must be formatted.

DTBs go far beyond the limits imposed on analog audio books because they can include not just the audio rendition of the work, but the full text file and images as well. Because the text file is synchronized with the audio file, a DTB offers multiple sensory inputs to readers, a great benefit to learning-disabled readers, for example. Some visually impaired readers may choose to listen to most of the book, but find that inspecting the images provides information not available in the narrative flow. Others may opt to skip the audio presentation altogether and instead view the text file via screen-enlarging software. Braille readers may prefer to read some or all of the document via a refreshable braille display connected to their DTB player and accessing the text file.

Digital Talking Books are not tied to a single distribution medium. CD-ROMs will be used first but DTBs will be portable to any digital distribution medium capable of handling the large files associated with digital audio recordings. Regardless of how a DTB is distributed, however, it must be in the context of a digital rights management system whose functional requirements this standard describes.

The initiative behind this document grew from a desire to standardize DTB file structures, in the hope that it might prevent a recurrence of the multiple formats currently used for talking books throughout the world. This document benefitted greatly from the work of the DAISY Consortium, whose members had broken much of the ground covered in this standard and who contributed enormously to the solution of the many problems encountered.

NISO Voting Members

NISO Board of Directors

Standards Committee AQ

Standards Committee AQ on Digital Talking Books had the following members at the time this standard was approved:


Acknowledgements

Standards Committee AQ gratefully acknowledges the assistance of the following individuals: Robert Berkovitz, Sensimetrics Corporation; Harvey Bingham; Mike Brown; John Churchill, Recording for the Blind and Dyslexic; Hiromitsu Fujimori, Plextor Corporation; Manon Gaudet, VisuAide, Inc.; Al Gilman; Steve Jacobs, NCR Corporation; Lynn Leith, Canadian National Institute for the Blind; Rob Meredith, American Printing House for the Blind; Tatsu Nishizawa, Plextor Corporation; James Pritchett, Recording for the Blind and Dyslexic; Dr. Gregg Vanderheiden, TRACE Research and Development Center, University of Wisconsin; Mr. Paul Vassallo, National Institute of Standards & Technology; Norm Welch, EvaTone, Inc.; with special thanks to members of the DAISY Consortium's Specifications and Guidelines Work Team. Thanks also to these members of the W3C Synchronized Multimedia (SYMM) Working Group: Dick Bulterman, Oratrix; Wo Chang, NIST; Lloyd Rutledge, CWI; Patrick Schmitz, Microsoft.


1. General Information

(This section is informative)

1.1 Purpose and Scope of Standard

This standard establishes the file specifications for digital talking books (DTBs) for blind, visually impaired, physically handicapped, or otherwise print-disabled readers. Its purpose is to ensure interoperability across service organizations and vendors providing content and playback systems to the target population.

This standard provides specifications applicable to all aspects of digital talking book production and rendering, including authoring tools for DTBs, hardware- or software-based playback devices, and compliance-testing software.


1.2 Definitions

The following acronyms and terms are used in this standard as defined below. In the following definitions and throughout the standard, bracketed items correspond to entries in section 17, "References to Other Specifications/Documents," where the full URL is provided for each reference.

Accessible
With respect to implementations, accessible refers to the design and functionality of the playback system where all features are usable by the target population.
CSS
Cascading Style Sheets [CSS] is a mechanism for adding style (e.g. fonts, colors, spacing, formatting) to HTML or XML documents.
DRM
Digital Rights Management is a system of tools and processes that protect intellectual property when it is encoded and distributed in digital form.
DTB
The Digital Talking Book content data set that complies with the specifications in this standard.
DTD
The Document Type Definition file contains machine-readable rules that define allowable XML markup for a particular application.
DTBook3
DTBook3 is a unique DTD file (dtbook3.dtd) that defines the XML markup for the text content of a DTB.
Fragment Identifier
A means to address a named place in a document. For reference within the current document, the reference part is to a named target, and begins with "#". See URI for addressing into another document.
Global navigation
Efficient movement to user-selected portions of a document, with that movement enabled by the NCX. Navigation targets may be headings representing the hierarchical structure of the document or specific points such as pages, notes, sidebars, etc.
Guide
A component of the Package File, the Guide lists the key structural features of a DTB, such as the table of contents, introduction, bibliography, etc. to enable playback devices to provide convenient access to them.
IMPLIED
When used in definitions of attributes, means the attribute is optional, as opposed to REQUIRED.
Informative
An explanatory part of this standard. Contrast with normative.
Local navigation
Movement within a document at a granularity finer than that provided by the NCX. For example, navigation by paragraph or sentence, or within a table or nested list. Precise local navigation is controlled by the text file; the granularity is limited by the degree to which the text file has been marked up. Time-based movement through a document (similar to fast-forward and rewind on an analog cassette) may also be implemented.
Manifest
A component of the Package File, the Manifest lists all files included in the DTB.
May
In this standard, the word "may" is to be interpreted as an optional feature that is not required but can be provided.
Must
In this standard, the word "must" is to be interpreted as a mandatory requirement on the content or implementation. The term "shall" has the same definition as "must".
Normative
A portion of the standard that supplies precise specifications rather than background or explanation. Contrast with informative. Notes within a normative section may be informative.
NCX
The Navigation Control file for XML applications (NCX) provides the reader efficient and flexible access to the hierarchical structure of a DTB as well as direct access to selected elements such as page numbers, notes, figures, etc.
OEBF (Open eBook Forum)
An organization formed to create and maintain standards and promote the successful adoption of electronic books. The Open eBook Publication Structure Version 1.0.1 [OEBPS] provides a specification for representing the content of a book when it is converted from print to electronic form. This DTB standard utilizes a subset (the Package File) of that specification.
OPF
See "Package File."
Package File
The Open eBook Forum Package File (OPF) is an XML file conforming to the oebpkg1.dtd that contains administrative information about the DTB, the files that comprise it, and how these files interrelate.
Playback
With regard to implementations, playback refers to the methods used to render the DTB content. Playback may include audio, braille, large print, and synthetic speech as appropriate for the content and as supported by the playback system.
Player
The hardware/software platform which renders the contents of a DTB to a user. Synonymous with "Playback System."
Reader
The person reading the digital talking book. Synonymous with "user."
REQUIRED
When used in definitions of attributes, means the attribute is required, as opposed to IMPLIED.
SMIL
The Synchronized Multimedia Integration Language [SMIL] is a draft W3C specification (SMIL 2.0) utilized in this standard to control the synchronized presentation of content in multiple media.
Shall
See "Must."
Should
With respect to implementation, the word "should" is to be interpreted as an implementation recommendation, but not a requirement. With respect to content, the word "should" is to be interpreted as recommended programming practice for content.
Spine
A component of the Package File, the Spine lists the SMIL files included in the DTB in default reading order.
Target population
The target population consists of blind, visually impaired, physically handicapped and otherwise print-disabled readers.
TSM
Time-scale modification (TSM) is variable playback rate (both slower and faster than real time) while maintaining constant pitch.
User
See "Reader."
Text File
The content of the subject document in a character set specified by ISO/IEC 10646 to which XML markup has been applied.
URI
Uniform Resource Identifier, the means to uniquely identify a document and reference it. A URI may include a fragment identifier suffix beginning with "#" that matches some named anchor in the target document.
XML
A file conforming to the Extensible Markup Language 1.0 [XML] specification.
XSL
A file conforming to the Extensible Style Language [XSL] specification.
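
As a minimal illustration of the "URI" and "Fragment Identifier" definitions above, the following Python sketch splits a reference into its document part and its named target. The file names and ids are hypothetical and are not drawn from this standard; the helper name split_reference is likewise only illustrative.

```python
from urllib.parse import urldefrag


def split_reference(reference):
    """Split a URI reference into (document, named target).

    Either part is None when absent. A reference that begins with "#"
    has no document part, i.e. it points into the current document.
    """
    document, fragment = urldefrag(reference)
    return (document or None, fragment or None)


# Hypothetical references, for illustration only.
for ref in ("chapter1.xml#p0042", "#note3", "book.xml"):
    print(ref, "->", split_reference(ref))
```

A player could use the document part to locate the file and the named target to find the element carrying the matching id.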

1.3 Strategy

This standard is based primarily on a variety of widely used standards and specifications, including several from the World Wide Web Consortium and the Open eBook Forum. Wherever applicable and appropriate standards or specifications existed they were used. The use of these specifications and technologies is intended to promote a fast and consistent adoption of this standard for the target population, while encouraging its extension into mainstream use.


1.4 Relationship to Other Specifications

This standard is based on the specific versions of the standards and specifications referenced herein, which are used as defined, except as noted by this document. Any refinement or replacement of a referenced specification by a newer or different version is not directly applicable to this standard. Conformance to this standard is based on the versions of the standards and specifications in effect at the time of this writing.


1.5 Patent Rights

It is possible that compliance with this standard may require the use of one or more inventions covered by patent rights. It is believed that all companies claiming such rights have agreed to grant a license under such rights that they hold on reasonable and non-discriminatory terms and conditions to any applicant.

Producers of DTB systems or any component thereof are responsible for obtaining the appropriate licenses for any and all technology defined by the relevant standards and specifications referenced by this standard.

Issues surrounding the protection of intellectual property embodied in the works distributed as digital talking books are discussed in section 14, Digital Rights Management.


2. Overview

(This section is informative)

A digital talking book (DTB) is a collection of electronic files arranged to present information to the target population via alternative media, namely, human or synthetic speech, refreshable braille, or visual display, e.g., large print. When these files are created and assembled into a DTB in accordance with this standard, they make possible a wide range of features such as rapid, flexible navigation; bookmarking and highlighting; keyword searching; spelling of words on demand; and user control over the presentation of selected items (e.g., footnotes, page numbers, etc.). For a full discussion of these capabilities, see the Document Navigation Features List [Navigation Features], developed as the user requirements document on which this standard was based. Appendix 7, "Theory Behind the DTBook3 DTD," also describes the navigational capabilities of a DTB in some detail. The content of DTBs will range from audio alone, through a combination of audio, text, and images, to text alone.

DTB players will also take a variety of shapes. The simplest might be portable devices with audio-only capabilities. More complex portable players could include text-to-speech capabilities as well as audio output for recorded human speech. The most comprehensive playback systems are expected to be PC-based, supporting visual and audio output, text-to-speech capability, and output to a braille display.

The files comprising a DTB fall into ten categories, as described below:

Package File
The Package File, drawn from the Open eBook Publication Structure 1.0.1, contains administrative information about the DTB and the files that comprise it. An XML version 1.0 file, it contains a complete set of metadata describing the DTB, a list (the manifest) of all files that make up the DTB, a spine that defines the default reading order of the document, and an optional guide to enable easy access to key points in the DTB. See section 3, "Package File."
Document Text File
A DTB may contain part or all of the text of the document, as an XML 1.0 file tagged in accordance with the document type definition (DTD) defined for this standard, DTBook3.dtd. (See Appendix 5, "DTBook3 DTD".) The document text file enables a playback device to spell words on demand, carry out keyword searches, and permit finely-grained navigation. It may also be accessed directly via refreshable braille display, synthetic speech, or screen-enlarging software. See section 4, "Content Format for Text."
Audio Files
A DTB may include human or synthetic speech recordings of the document, embodied in audio files encoded in one of a specified group of audio formats. Section 5, "Audio File Formats," presents the set of formats specified by this standard.
Other Media Files
In addition to text and audio, DTBs may include images which can be presented on PC-based players. Section 6, "Other Media File Formats," lists the formats specified by this standard.
Synchronization Files
To synchronize the different media files of a DTB during playback, this standard specifies the World Wide Web Consortium's (W3C) Synchronized Multimedia Integration Language (SMIL), SMIL 2.0 version, an XML 1.0 application. The DTB SMIL files define a sequence of media events. During each event, text elements and corresponding audio clips as well as any additional visual elements are presented simultaneously. DTB players utilize the synchronization information to both index into the audio presentation and to track, during audio playback, the corresponding position in the document text file. IDs on the SMIL elements match those on the corresponding elements in the text file. This standard utilizes a subset of the full SMIL 2.0 specification. See section 7, "Synchronization of Media Files," for discussion of these issues and Appendix 4, "DTB SMIL Profile and DTB-Specific DTD," for the DTDs and Modules that define the DTB SMIL application.
Navigation Control File
The DTB system supports two modes of navigation, global and local. Global navigation -- movement by structure (chapter, section, subsection) and by other selected points such as pages -- is effected through the Navigation Control file for XML applications (NCX) and the Lightweight Navigation file (LWN). The NCX and LWN present a dynamic view of the document's hierarchical structure, allowing the user to move through the document in large steps corresponding to its major divisions, or in progressively smaller steps down to a limit set by the document's detail. Text, audio, and image elements present to the user the document's headings, and id-based links point to the SMIL presentation at the corresponding locations. Appendix 1 contains the XML 1.0 DTDs for the NCX and LWN. Local (more finely-grained) navigation is not handled by the NCX but is enabled through the document text file or through time-based movement through the audio presentation, depending on the document and on the player. See section 8, "Navigation Control," and Appendix 1, "DTD for NCX," for specifications related to the NCX and LWN.
Bookmark/Highlight File
This standard supports user-set, exportable bookmarks and highlights to which text and audio notes may be applied. Specifications for the XML 1.0 file for portable bookmarks and highlights are presented in section 9, "Portable Bookmarks/Highlights" and Appendix 2, "DTD for Portable Bookmarks/Highlights."
Resource File
The resource file contains various text segments, audio clips, and/or images that provide alternative representations of navigational information -- for example, feedback on the user's current location in the document. It supplies information normally presented in a print book via typographical clues. It is an XML version 1.0 file based on the DTD "resource.dtd". See section 10, "Resource File," and Appendix 3, "DTD for Resource File" for file specifications.
Distribution
Given the great size of audio files, even when heavily compressed, it will be common for large books to span several distribution media. Section 11, "Packaging Files for Distribution," describes how DTB producers will map the location of each file to a specific media unit, e.g., disk 1 of 3. Alternatively, several small books may be distributed on the same media unit. Appendix 8, "Distribution Information DTD," presents the document type definition for "Distinfo" files.
Presentation Styles
Section 12, "Presentation Styles," discusses how presentation styles may be controlled for a variety of output modes through the use of optional style sheets.

3. The DTB Package File

(This section is normative)

The Package File, drawn from the Open eBook Forum™ (OEBF) Publication Structure 1.0.1, is an XML file that contains administrative information about the DTB, the files that comprise it, and how these files interrelate. The Package File as utilized in this standard is identical in most respects to the Package File specified in the OEBF Publication Structure 1.0.1, and despite the two differences described here, the document type definition (DTD) that describes the OEBF package file also describes the DTB Package File. The first difference between the OEBF package file and the DTB application is simply an extension of section 2.4 of the Publication Structure to allow the spine element to refer to media types other than text/x-oeb-document. Specifically, DTBs require the spine to also allow SMIL files. The second difference is that items in a DTB manifest need not be restricted to OEBF core MIME types, or provide fallbacks that resolve to such MIME types.

A DTB conforming to this standard must include exactly one Package File which must be a valid XML 1.0 document that conforms to the OEBF package DTD (oebpkg1.dtd). Where multiple DTBs are distributed on a single piece of distribution media (e.g., CD-ROM) each DTB must include its own Package File. See section 11 for more information about multiple DTBs on one piece of media.

(This section is informative)

The full specification and DTD for the OEBF package file (section 2 of the OEBF Publication Structure 1.0.1 [OEBPS]) are available on the OEBF site. This section, drawn largely from the Publication Structure, provides only a brief summary of the function of each section with an example illustrating how it is applied to the DTB. Please see section 2 of the full OEBF Publication Structure 1.0.1 for complete details on the Package File.

The Publication Structure describes the major parts of the Package File as follows:

Here is an informal outline of the package file:

<?xml version="1.0"?>
<!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0 Package//EN" "http://openebook.org/dtds/oeb-1.0/oebpkg1.dtd">
<package>
</package>

3.1 Package Identity

(This section is normative)

The root (the outermost) element in a package file is package. All other elements are nested inside it. The package must include a value for its unique-identifier attribute. This is required because more than one dc:Identifier may be present in a DTB's Package File metadata and the unique-identifier specifies which dc:Identifier element provides the package's preferred, or primary, identifier. The value of unique-identifier must match the id of the primary dc:Identifier.

The package file's author must ensure that the primary identifier is globally unique to the DTB, i.e., to the set of files listed in the package file's manifest. The dc:Identifier itself has an optional scheme attribute which names the system or authority that generated or assigned the text contained within the dc:Identifier element, for example "ISBN," or "DOI."

(This section is informative)

Example 3.1:

... <package unique-identifier="uid">
</package>
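
The rule that unique-identifier must match the id of the primary dc:Identifier can be illustrated with a minimal Python sketch. The sample package below is hypothetical: the identifier values, the scheme name "DTB", and the placement of the xmlns:dc declaration on dc-metadata are assumptions made only so that the fragment parses, and the function name primary_identifier is illustrative.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.0/"

# Hypothetical package fragment; all values are illustrative assumptions.
SAMPLE_PACKAGE = """
<package unique-identifier="uid">
  <metadata>
    <dc-metadata xmlns:dc="http://purl.org/dc/elements/1.0/">
      <dc:Title>Sample Title</dc:Title>
      <dc:Identifier id="uid" scheme="DTB">example-0001</dc:Identifier>
      <dc:Identifier id="other" scheme="ISBN">0-000-00000-0</dc:Identifier>
    </dc-metadata>
  </metadata>
</package>
"""


def primary_identifier(package_xml):
    """Return the text of the dc:Identifier named by unique-identifier."""
    root = ET.fromstring(package_xml)
    uid = root.get("unique-identifier")
    if not uid:
        raise ValueError("package has no unique-identifier attribute")
    for identifier in root.iter("{%s}Identifier" % DC_NS):
        if identifier.get("id") == uid:
            return (identifier.text or "").strip()
    raise ValueError("no dc:Identifier has id=%r" % uid)


print(primary_identifier(SAMPLE_PACKAGE))
```

The sketch raises an error rather than falling back to the first dc:Identifier, since a package whose unique-identifier does not resolve would not satisfy the requirement above.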

3.2 Publication Metadata

(This section is normative)

This portion of the Package File contains the information about a DTB that would normally be found in a library catalog record. It includes data about the DTB itself (e.g., title, author, producer, format, and narrator) as well as information about the source publication (usually a print book) such as publisher, edition, copyright statement, etc.

The Package File must contain exactly one metadata element which must contain one and only one dc-metadata element holding Dublin Core [DC] metadata and may contain supplemental metadata in an x-metadata element. If used, the x-metadata element must contain one or more instances of the meta element, which uses "name" and "content" attributes to define its value.

3.2.1 Dublin Core Metadata

(This section is normative)

The Publication Structure describes the use of Dublin Core metadata in the following three paragraphs.

"The dc-metadata element can contain any number of instances of any Dublin Core elements. Dublin Core element names begin with the "dc:" prefix followed by a leading uppercase letter. Dublin Core metadata elements may occur in any order; in fact, multiple instances of the same element type (multiple dc:Creator elements, for example) can be interspersed with other metadata elements without change of meaning.

"For upwards-compatibility, the element metadata in an OEB package is required to have an attribute of xmlns:dc="http://purl.org/dc/elements/1.0/" and xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/".

"Each Dublin Core field is represented by an element whose content is the field's value. The dc:Title and at least one dc:Identifier must be included in the dc-metadata element. Dublin Core elements, like any other elements in the OEB package file, may have an id attribute specified. At least one dc:Identifier, that which is referenced from the package unique-identifier attribute, must have an id specified."

Following are brief definitions of the Dublin Core elements. See the Publication Structure and the Dublin Core itself for more complete descriptions. The attributes "xml:lang" and "id" can be applied to all "dc:..." elements. Additional attributes can be used with several elements as detailed below.

3.2.2 X-Metadata

(This section is normative)

The following elements were developed for the DTB application to supply information that the Dublin Core element set does not cover. The following elements would appear within the x-metadata containing element.

Authoring tools must enforce the use of required elements.

(This section is informative)

Example 3.2:

...
<metadata>
</metadata>
...
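
The structural requirements stated in sections 3.2 and 3.2.1 (exactly one metadata element, exactly one dc-metadata element containing dc:Title and at least one dc:Identifier, and an optional x-metadata element holding one or more meta elements with "name" and "content" attributes) could be checked along the following lines. This is a sketch under stated assumptions, not a conformance test: the sample documents, the meta name "multimediaType", and the function name check_metadata are illustrative, and validation against oebpkg1.dtd is not attempted.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.0/"

# Hypothetical samples; element content is illustrative only.
GOOD = """
<package unique-identifier="uid">
  <metadata>
    <dc-metadata xmlns:dc="http://purl.org/dc/elements/1.0/">
      <dc:Title>Sample Title</dc:Title>
      <dc:Identifier id="uid">example-0001</dc:Identifier>
    </dc-metadata>
    <x-metadata>
      <meta name="multimediaType" content="audioFullText"/>
    </x-metadata>
  </metadata>
</package>
"""

BAD = """
<package unique-identifier="uid">
  <metadata>
    <dc-metadata xmlns:dc="http://purl.org/dc/elements/1.0/">
      <dc:Identifier id="uid">example-0001</dc:Identifier>
    </dc-metadata>
    <x-metadata/>
  </metadata>
</package>
"""


def check_metadata(package_xml):
    """Return a list of problems found in the package metadata."""
    root = ET.fromstring(package_xml)
    blocks = root.findall("metadata")
    if len(blocks) != 1:
        return ["expected exactly one metadata element, found %d" % len(blocks)]
    metadata = blocks[0]
    problems = []

    dc_blocks = metadata.findall("dc-metadata")
    if len(dc_blocks) != 1:
        problems.append("expected exactly one dc-metadata element")
    else:
        if dc_blocks[0].find("{%s}Title" % DC_NS) is None:
            problems.append("missing dc:Title")
        if dc_blocks[0].find("{%s}Identifier" % DC_NS) is None:
            problems.append("missing dc:Identifier")

    # x-metadata is optional, but if present it must hold at least one meta.
    for extra in metadata.findall("x-metadata"):
        metas = extra.findall("meta")
        if not metas:
            problems.append("x-metadata contains no meta element")
        for meta in metas:
            if meta.get("name") is None or meta.get("content") is None:
                problems.append("meta lacks a name or content attribute")
    return problems


print(check_metadata(GOOD))
print(check_metadata(BAD))
```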

3.3 Manifest

(This section is normative)

The manifest contains a list of all of the files (documents, images, style sheets, etc.) that make up the DTB, including the package file itself. Each file is referenced by an item element. Each item must specify values for an href attribute (the URI of the referenced file; it must not include fragment identifiers), a media-type attribute containing the MIME media type of the file, and an id attribute. The id is specifically utilized when a manifest item is referenced by the spine. The manifest also includes fallback declarations for files of types not supported by this standard (see OEBF Publication Structure for details). A sample manifest for a DTB with audio, structure, and text follows (multimediaType=audioFullText):

(This section is informative)

Example 3.3:

...
<manifest>
</manifest>
...

Here is a manifest for an audio-only version of the above DTB (multimediaType=audioNcx), where separate SMIL files were created for each segment of the book.

Example 3.4:

...
<manifest>
</manifest>
...
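
The per-item requirements in section 3.3 (a valued href without a fragment identifier, a media-type, and an id) can be sketched as a simple check. The manifest below is hypothetical: the file names and media type strings are illustrative assumptions rather than values taken from this standard, and check_manifest is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest; file names and media types are illustrative.
SAMPLE = """
<package unique-identifier="uid">
  <manifest>
    <item id="text" href="book.xml" media-type="text/xml"/>
    <item id="smil1" href="chapter1.smil#start" media-type="application/smil"/>
    <item href="chapter1.mp3" media-type="audio/mpeg"/>
  </manifest>
</package>
"""


def check_manifest(package_xml):
    """Return a list of problems found in the manifest's item elements."""
    manifest = ET.fromstring(package_xml).find("manifest")
    if manifest is None:
        return ["no manifest element"]
    problems = []
    for number, item in enumerate(manifest.findall("item"), start=1):
        for attribute in ("href", "media-type", "id"):
            if not item.get(attribute):
                problems.append("item %d: missing %s" % (number, attribute))
        if "#" in (item.get("href") or ""):
            problems.append("item %d: href includes a fragment identifier" % number)
    return problems


for problem in check_manifest(SAMPLE):
    print(problem)
```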

3.4 Spine

(This section is normative)

The spine consists of a list of one or more itemref elements whose order defines the default linear reading order for the DTB. Each itemref contains an idref which points to the id of a file listed in the manifest. The SMIL files for the DTB must be listed in the spine in order of presentation. The player must consult the spine when it reaches the end of a SMIL file to determine which file to render next. The first of the following examples shows the spine that corresponds to the first of the two manifest examples above:

(This section is informative)

Example 3.5:

<spine>
</spine>

The following spine matches the second manifest example above. The correct reading order is presented here. Note that it does not match the order of files in the manifest, where order is not significant.

Example 3.6:

<spine>
</spine>
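
The way a player might consult the spine on reaching the end of a SMIL file can be sketched as follows. This is a minimal illustration under stated assumptions: the package fragment, file names, and ids are hypothetical, and the function names reading_order and next_file are not defined by this standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical package: the manifest order differs from the reading order
# on purpose, since only the spine order is significant.
SAMPLE = """
<package unique-identifier="uid">
  <manifest>
    <item id="ch2" href="chapter2.smil" media-type="application/smil"/>
    <item id="ch1" href="chapter1.smil" media-type="application/smil"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
"""


def reading_order(package_xml):
    """Resolve each spine itemref to the href of its manifest item."""
    root = ET.fromstring(package_xml)
    href_by_id = {
        item.get("id"): item.get("href")
        for item in root.find("manifest").findall("item")
    }
    return [
        href_by_id[itemref.get("idref")]
        for itemref in root.find("spine").findall("itemref")
    ]


def next_file(package_xml, current_href):
    """Return the file that follows current_href, or None at the end."""
    order = reading_order(package_xml)
    position = order.index(current_href)
    return order[position + 1] if position + 1 < len(order) else None


print(reading_order(SAMPLE))
print(next_file(SAMPLE, "chapter1.smil"))
```

Resolving each idref through the manifest keeps the reading order independent of the order in which files happen to be listed there.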

3.5 Tours

(This section is informative)

The tours section of the Package File is described in the OEBF Publication Structure as follows: "Much as a tour guide might assemble points of interest into a set of sightseers' tours, a content provider may assemble selected parts of a publication into a set of tours to enable convenient navigation. ... Reading systems may use tours to provide various access sequences to parts of the publication, such as selective views for various reading purposes, reader expertise levels, etc." Because of inherent differences between the structure of a DTB and the OEBF tours, it is not feasible to implement tours in a DTB prepared in accordance with this standard. If a producer wishes to provide alternate versions of a DTB that supply the functionality described above, they may do so by producing versions with different NCXs.

3.6 Guide

(This section is informative)

As specified in the OEBF Publication Structure, the guide lists the key structural features of the DTB, such as the table of contents, introduction, bibliography, etc. to enable playback devices to provide convenient access to them. Because DTBs include a mandatory NCX that satisfies a more rigorous and detailed access requirement, the guide may not be used frequently in DTBs. The Publication Structure defines a limited set of recognized "types."

Example 3.7:

<guide>
</guide>

4. Content Format for Text

4.1 Introduction

(This section is normative)

This standard defines an XML 1.0 element set (DTBook3) for markup of the text files of books and other publications presented in digital talking book (DTB) format. A Document Type Definition (DTD) -- dtbook3.dtd -- specifies the markup elements ("tags") and attributes needed to define a document's structure and the semantics of the tagged items. This DTD can be found in Appendix 5, "DTBook3 DTD." To be compliant with this standard, a text file of a DTB must be valid according to dtbook3.dtd.

Any element that is to be referenced from the navigation file or synchronization file must contain a unique id.

DTB content producers may extend the base DTD by including one or more new elements or full modules for special situations. To remain conformant with this standard, such extensions must employ the mechanisms specified by XML 1.0.
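
The requirement that referenced elements carry a unique id can be checked independently of the element set in use. The Python sketch below is informative only; the sample markup is hypothetical, its element names are placeholders chosen for illustration rather than taken from dtbook3.dtd, and duplicate_ids is an illustrative name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical text file fragment; element names are placeholders.
SAMPLE = """
<dtbook3>
  <book>
    <level1 id="ch1">
      <p id="p1">First paragraph.</p>
      <p id="p1">Second paragraph, reusing an id by mistake.</p>
      <p>Paragraph without an id; it cannot be a reference target.</p>
    </level1>
  </book>
</dtbook3>
"""


def duplicate_ids(text_xml):
    """Return the sorted id values that occur on more than one element."""
    root = ET.fromstring(text_xml)
    counts = Counter(
        element.get("id")
        for element in root.iter()
        if element.get("id") is not None
    )
    return sorted(value for value, count in counts.items() if count > 1)


print(duplicate_ids(SAMPLE))
```

A validating XML parser would report the same condition where the DTD declares the id attribute with type ID; the sketch only shows the check in isolation.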

4.2 Using the DTBook3 Element Set

(This section is informative)

A discussion of the rationale underlying the DTBook3 element set and the benefits it provides to digital talking book applications is located in Appendix 7, "Theory Behind the DTBook3 DTD."

An alphabetical listing of the DTBook3 tags, with definitions, is included in section 4.4. Two documents external to this standard provide detailed information on the use of the element set. First, an expanded version of the DTD in HTML format (see [DTBook3 HTML]) provides full detail on each element, describing where it can be used and which elements can be used within it, along with an expanded list of attributes.

Second, a comprehensive set of guidelines [StructGuide] for applying the DTBook3 markup is available from the DAISY Consortium. These Structure Guidelines describe the correct application of the DTBook3 element set, emphasizing the importance of capturing the structure of the text content and providing detailed examples of the use of nearly all elements. The expanded DTD and guidelines are not normative.

For more information on XML 1.0 markup and DTD usage, see the W3C XML site [XML].

4.2.1 Modular Extension of the DTD

(This section is informative)

The DTBook3 DTD includes a base set of 83 elements for use in tagging a broad range of material. Additional modules containing tags for specialized applications such as poetry, plays, dictionaries, mathematics, etc. can be "invoked" from within a DTBook3 document when needed, as described below.

A DTBook3 document is an XML application. Therefore it should begin with the XML declaration, which identifies the version of XML and the character encoding (see Appendix 5 - DTBook3 DTD for more information):
<?xml version="1.0" encoding="ISO-8859-1" ?>

This is followed by the document type declaration, the DOCTYPE:

<!DOCTYPE dtbook3 PUBLIC
             "-//NISO//DTD dtbook3.dtd Version 3-07 2001-01-29//EN"
             "http://www.loc.gov/nls/niso/dtbook3.dtd">

A book can invoke other DTDs or modules to augment the DTBook3 DTD by adding instructions in square brackets before the concluding ">" of the document type declaration. Such instructions in square brackets are called the "internal subset of declarations." For example:

<!DOCTYPE dtbook3 PUBLIC
         "-//NISO//DTD dtbook3.dtd//EN"
         "http://www.loc.gov/nls/niso/dtbook3.dtd"
         [
             <!ENTITY % dramaModule SYSTEM "http://www.loc.gov/nls/niso/drama.dtd">
             %dramaModule;
             <!ENTITY % externalblock "| drama">
             <!ENTITY % externalinline "| stagedir">
         ]>

The first line of the internal subset declares a parameter entity named %dramaModule and provides the URI where that module can be found. The second line invokes the entity, that is, "brings it into" the current document, just as the DOCTYPE declaration invoked the base DTD (DTBook3). The third line declares the entity %externalblock and gives it the value "| drama". Since DTBook3 contains an entity of the same name, and the internal subset overrules the base DTD in areas of conflict, everywhere in DTBook3 where %externalblock appears (wherever block elements are allowed), the value "| drama" is substituted. Since "drama" is the root element in the drama module, the full drama module can be used there. Similarly, the last line effectively allows the element "stagedir" to be used anywhere %externalinline is allowed in DTBook3 (wherever inline elements can be used).
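The override described above follows from two XML 1.0 rules: the internal subset is read before the external DTD, and the first declaration of an entity is the one that binds. The Python sketch below simulates only that behavior. The default entity values and the content model shown for the base DTD are assumptions made for illustration and are not taken from dtbook3.dtd.

```python
import re

def bind_entities(*declaration_lists):
    """Bind parameter entities; the first declaration seen wins."""
    table = {}
    for declarations in declaration_lists:
        for name, value in declarations:
            table.setdefault(name, value)  # later duplicates are ignored
    return table

def expand(content_model, table):
    """Replace each %name; reference with its bound value."""
    return re.sub(
        r"%([A-Za-z_][\w.]*);",
        lambda m: table[m.group(1)],
        content_model,
    )

# Declarations from the internal subset, which is processed first.
internal_subset = [("externalblock", "| drama"),
                   ("externalinline", "| stagedir")]
# Hypothetical defaults and content model standing in for the base DTD.
base_dtd = [("externalblock", ""), ("externalinline", "")]
block_model = "p | list | table %externalblock;"

table = bind_entities(internal_subset, base_dtd)
print(expand(block_model, table))  # p | list | table | drama
```

With an empty internal subset the base DTD's own (here empty) value would bind instead, and the block content model would be unchanged.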

More than one module may be needed and included in a book. In the following example, both a poetry and drama module are invoked, as well as one inline element from the drama module.

         [
             <!ENTITY % poemModule SYSTEM "http://www.loc.gov/nls/niso/poem.dtd">
             %poemModule;
             <!ENTITY % dramaModule SYSTEM "http://www.loc.gov/nls/niso/drama.dtd">
             %dramaModule;
             <!ENTITY % externalblock "| poem | drama">
             <!ENTITY % externalinline "| stagedir">
         ]>
See Appendix 5 - DTBook3 DTD for further details.

This standard does not mandate the level of markup to be applied to a text file. However, the richer the tagging, the greater the functionality available to the reader.

4.3 Playback Systems and DTBook3

(This section is informative)

The DTBook3 DTD is an XML application defining the structure and content allowed in the textual portion of a digital talking book. The content is provided within semantically rich elements. The functions as outlined in the DTBook3 DTD should be implemented in playback systems to provide the efficient document navigation required by the target population.

Global navigation is controlled by the NCX and provides direct access to the hierarchical structure of the book as defined by the author and publisher -- parts, chapters, sections, subsections, etc., as well as to pages, notes, sidebars, etc. Section 8 describes how this is implemented. Local navigation based on the marked-up text file, when present, should also be supported, enabling movement through the document at a finer granularity than allowed by the NCX. Examples are navigating through tables and nested lists, movement by paragraph, sentence, or word (depending on the degree of markup), and automatically skipping predefined classes of objects such as notes or sidebars. For more information see the Document Navigation Features List [Navigation Features].

Players should have an accessible interface to provide efficient navigational capabilities to the target population. The implementation of navigation between NCX references (global) and inside markup content (local) should be transparent to the target population.

4.4 DTBook3 Tags

(This section is informative)

The element names from DTBook3 are listed below in alphabetical order. The description provided for each element is taken directly from the DTBook3 DTD. In the following list, elements labeled "HTML 4.0" were drawn from the HTML 4.0 specification when creating DTBook3.

Element  Description
a contains an anchor, which is used in two ways: A name anchor identifies an exact position in a given document. A link anchor, when activated, "links to" (repositions the focus to) another location within that document or another document. [HTML 4.0]
abbr designates an abbreviation, a shortened form of a word. For example: Mr., approx., lbs., rec'd.
acronym marks a word formed from key letters (usually initials) of a group of words. For example: UNESCO, NATO, XML.
address contains a location at which a person or agency may be contacted. [HTML 4.0]
author identifies the writer of a given work.
base contains the base URI from which local references start. It acts as an absolute URI that serves as the base URI for resolving relative URIs found within the document. It is an empty element that may appear only in <head>. [HTML 4.0]
bdo is used in special cases where the automatic actions of the bidirectional algorithm would result in incorrect display. [HTML 4.0]
blockquote indicates a block of quoted content that is set off from the surrounding text by paragraph breaks. Compare with <q> which marks short, inline quotations. [HTML 4.0]
bodymatter consists of the text proper of a book, as opposed to preliminary material <frontmatter> or supplementary information <rearmatter>.
book surrounds the actual content of the document, which is divided into <frontmatter>, <bodymatter>, and <rearmatter>. <head>, which contains metadata, precedes <book>.
br marks a forced line break. [HTML 4.0]
caption describes a table. If used, it must follow immediately after the table start tag. [HTML 4.0]
citation marks a reference to another document.
code designates a fragment of computer code. [HTML 4.0]
col is a means to apply attribute values to table columns. [HTML 4.0]
colgroup is a group of columns that may share attribute values within a table. [HTML 4.0]
dd marks a definition of a term within a definition list. [HTML 4.0]
dfn marks the first occurrence of a word or term that is defined or explained elsewhere in a book. [HTML 4.0]
div is a generic container for subdivisions of a book. The <level1> ... <level6> hierarchy, or the <level> tag used recursively, should mark the major hierarchical structures of a book, while <div> is used in less formal circumstances or when for production purposes it is desired that a structure should be treated differently. The class attribute identifies the actual name (e.g., part, chapter, letter) of the structure it marks. Compare with <span> which is used in inline settings. [HTML 4.0]
dl contains a definition list, usually consisting of pairs of terms <dt> and definitions <dd>. Any definition can contain another definition list. [HTML 4.0]
doctitle marks the title of the book, as the first tag within <frontmatter>. It is used to quickly identify the book.
dt marks a term in a definition list. [HTML 4.0]
dtbook3 is the root element in the Digital Talking Book 3.0 DTD. Contains metadata in <head> and the document itself in <book>.
em indicates emphasis. Compare with <strong>. [HTML 4.0]
frontmatter contains preliminary material such as the copyright notice, foreword, acknowledgments, table of contents, etc. which serves as a guide to the contents and nature of a book.
h1 contains the text of the heading for a <level1> structure. [HTML 4.0 but nested]
h2 contains the text of the heading for a <level2> structure. [HTML 4.0 but nested]
h3 contains the text of the heading for a <level3> structure. [HTML 4.0 but nested]
h4 contains the text of the heading for a <level4> structure. [HTML 4.0 but nested]
h5 contains the text of the heading for a <level5> structure. [HTML 4.0 but nested]
h6 contains the text of the heading for a <level6> structure. [HTML 4.0 but nested]
hd marks the text of a heading in a <list> or <sidebar>.
head contains metainformation about the book but no actual content of the book itself, which is placed in <book>. This information is consonant with the head information in xhtml. See: http://www.w3.org/ [HTML 4.0]
hr is an empty element indicating a horizontal rule. May be used to indicate a break in the text where only blank lines, a row of asterisks, a horizontal line, etc. are used in the print book. [HTML 4.0]
img marks a visual image. The "src" attribute specifies the location of the image file. The "alt" and "longdesc" attributes may be used to supply short and long descriptions, respectively, although prodnote will generally contain the latter. Longdesc may contain a pointer to the prodnote. The referencing is typically of the form <imgcaption imgref="#yyy">The Caption</imgcaption> containing the printed caption for the <img id="yyy">. [HTML 4.0]
imgcaption contains the caption for one or more <img>. If the caption applies to more than one <img>, each idref in the list of IDs in the imgref is separated by whitespace.
imggroup provides a container for <img> and its associated <imgcaption> and <prodnote>. <prodnote> will contain a description of the image. Content model allows multiple <img> if they share a caption, multiple <imgcaption> if several captions refer to a single <img>, and multiple <prodnote> if different versions are needed for different media (e.g., large print, braille, etc.)
kbd designates information that the reader is to input directly into a computer using the keyboard. [HTML 4.0]
level is an alternative tag for marking the major structures in a book. It may be used recursively, i.e., repeated indefinitely with each successive occurrence nesting within the previous. It may also be included in a subsequent higher level. The class attribute identifies the actual name (e.g., part, chapter, section, subsection) of the structure it marks. The depth attribute indicates the nesting depth, starting at 1. Subordinate levels have greater depth.
level1 is the highest level container of major divisions of a book. Used in <frontmatter>, <bodymatter>, and <rearmatter> to mark the largest divisions of the book (usually parts or chapters), inside which level2 subdivisions (often sections) may nest. The class attribute identifies the actual name (e.g., part, chapter) of the structure it marks.
level2 contains subdivisions that nest within <level1> divisions. The class attribute identifies the actual name (e.g., subpart, chapter, subsection) of the structure it marks.
level3 contains subdivisions that nest within <level2> subdivisions (e.g., subsections within subsections). The class attribute identifies the actual name (e.g., section, subpart, subsubsection) of the subordinate structure it marks.
level4 contains subdivisions that nest within <level3> subdivisions. The class attribute identifies the actual name of the subordinate structure it marks.
level5 contains subdivisions that nest within <level4> subdivisions. The class attribute identifies the actual name of the subordinate structure it marks.
level6 contains subdivisions that nest within <level5> subdivisions. The class attribute identifies the actual name of the subordinate structure it marks.
levelhd contains the text of a heading within <level>. Corresponds to <h1> through <h6> used in <level1> through <level6>.
li marks each list item in a <list>. <li> content may be either inline or block and may include other nested lists. Alternatively it may contain a sequence of list item components, <lic>, that identify regularly occurring content, such as the heading and page number of each entry in a table of contents. [HTML 4.0]
lic ("list item component") allows ordered substructure within a list item <li>. Used when a list item is made up of two or more components, as in a table of contents entry.
line marks a single logical line of text. Often used in conjunction with <linenum> in documents with numbered lines.
linenum contains a line number in, for example, legal text.
link is an empty element appearing in the <head> section of a document that establishes a connection between the current document and another document(s). The <link> element conveys relationship information (for example, "next" and "previous") that may be rendered by user agents in a variety of ways. [HTML 4.0]
list contains a list. The "type" attribute can indicate whether a list is ordered or unordered.
meta indicates metadata about the book. It is an empty element that may appear only in <head>. [HTML 4.0]
noscript identifies an alternate method for carrying out a function when a playback device cannot execute a <script>. See <script>. [HTML 4.0]
note marks a footnote, endnote, annotation, etc. The reference to the note within the text is marked with a <noteref>.
noteref marks one or more characters that reference a footnote, endnote, or annotation <note>.
notice contains a warning, caution, or other type of admonition normally found in the margin of a book. Differs from a sidebar in that a notice must be presented at a specific location within the text and its presentation is not optional.
object marks an embedded object, which may consist of scripts, applets, images, etc. [HTML 4.0]
p contains a paragraph. [HTML 4.0]
pagenum contains a page number from the print document, recorded as the first text object on a page. The "page" attribute allows three types of page numbering schemes to be identified: "normal" arabic numbering in the body of the book, "front" pages (from the frontmatter), and "special" pagination schemes such as hyphenated numbers in appendices.
param provides a named property for <object>. [HTML 4.0]
prodnote contains language added to the alternative-format version by the producers; commonly used to provide verbal descriptions of visual elements such as charts, graphs, etc.; to supply operating instructions; or to describe differences between the print book and the audio version.
q contains a short, inline quotation. Compare with <blockquote> which marks a longer quotation set off from the surrounding text. [HTML 4.0]
rearmatter contains supplementary material such as appendices, glossaries, bibliographies, and indices following the <bodymatter> of the book.
samp contains a sample of work created by the author for use as an example or template. For example, a sample business letter, resume, computer program output, or form. [HTML 4.0]
script contains a script, a program that may accompany a document or be embedded directly in it. The program executes on the client's machine when the document loads, or at some other time such as when a link is activated. See <noscript> for an alternative in case the <script> cannot be executed. [HTML 4.0]
sent marks a sentence.
sidebar contains information supplementary to the main text and/or narrative flow and is often boxed and printed apart from the main text block on a page.
span is a generic container for use in inline settings when no specific tag exists for a given situation. The class attribute may describe the nature of the text it marks (e.g., a typographical error). May be used to mark a class of items to which styles are to be applied. Compare with <div> which is used in block settings. [HTML 4.0]
strong marks stronger emphasis than <em>. [HTML 4.0]
style is a means to include styling information that applies to the book. It may appear only in <head>. [HTML 4.0]
sub indicates a subscript character (printed below a character's normal baseline). Can be used recursively and/or intermixed with <sup>. [HTML 4.0]
sup marks a superscript character (printed above a character's normal baseline). Can be used recursively and/or intermixed with <sub>. [HTML 4.0]
table contains tabular data arranged in rows and columns. [HTML 4.0]
tbody marks a group of rows in the main body of a table. If the table is divided into several sections, each consisting of a number of rows, each section would be separately tagged with tbody. [HTML 4.0]
td indicates an individual data cell in the body of a table. [HTML 4.0]
tfoot marks table footer information, consisting of one or more rows (each marked with the tr tag). [HTML 4.0]
th indicates a table cell containing header information. [HTML 4.0]
thead marks header information in a table, consisting of one or more rows (each marked with the tr tag) of <th> cells. [HTML 4.0]
title contains the title of the book but is used only as metainformation in <head>. Use <doctitle> within <book> for the actual book title, which will usually be the same. [HTML 4.0]
tr marks one row of a table containing <th> or <td> cells. [HTML 4.0]
var indicates an instance of a variable or program argument. Commonly used as a placeholder for text to be entered by the user. [HTML 4.0]
w marks a word.

5. Audio File Formats

(This section is normative)

5.1 Introduction

(This section is normative)

The limitations placed upon currently feasible DTB distribution systems by the size of audio files necessitate some form of data compression. A set of audio file formats is listed below; audio players should be capable of decoding all of the applicable formats listed, while content must be delivered in one of these formats, or any mixture of them.

5.2 Distribution File Formats

(This section is normative)

Digital Talking Book producers must use only a selected set of standard audio encoding formats for distribution of content to the user. These formats are based on stable and widely-used standards.

It is permissible for parts of a single document to be encoded in different audio formats. For example, a producer may choose to encode a lengthy bibliography at a lower bitrate or with a different codec than the main body of the book. Players must handle transitions between differently encoded sections smoothly.

Support for the decoding of stereo or multi-channel signals is not required.

While the ISO standards for MP3 and AAC require support for variable bitrate playback, DTB players will only be required to support constant bitrate playback.

A compliant DTB player that provides audio output should be capable of decoding the following audio formats:

Audio players capable of recording and exporting audio notes for bookmarks and highlights must support encoding in the following format. Similarly, audio players capable of importing bookmarks and highlights must support decoding of the following format.

  • ADPCM - ITU-T G.726
    Communication quality at 40, 32, 24, and 16 kbps, usually encoded at 32 kbps from 16-bit, 8 kHz data. Encoder and decoder are simple to implement; primarily intended for recording and playback of user notes on bookmarks and highlights.

    6. Other Media File Formats

    (This section is normative)

    Playback devices that support image display must be capable of displaying the following image formats: JPEG (RFC 2046) and PNG (RFC 2083).


    7. Synchronization of Media Files

    7.1 Introduction

    (This section is informative)

    The Synchronized Multimedia Integration Language (SMIL) was developed by the World Wide Web Consortium as a standard for definition and playback of multimedia presentations over the Internet. SMIL defines the sequence of playback for one or more media objects. In the case of DTBs, the primary media objects are audio and text files; SMIL provides for their parallel and synchronized presentation. Any DTB constructed using SMIL, and utilizing content encoded in standard text and audio media types, is playable on any device or platform which has implemented a SMIL-conformant player of the same or later SMIL version, so long as the necessary audio, image, and textual rendering decoders are present.

    What distinguishes a DTB playback system from a basic SMIL player is the inclusion of specific navigation and presentational capabilities set out in the user requirements for DTBs ([Navigation Features]). These capabilities can utilize information from an NCX file, from the textual content, and/or from the SMIL file itself. The key to this information is the inclusion of unique identifiers within the textual content (when present) and SMIL files. Audio files are indexed by time-based positions and in themselves contain no embedded semantic structure. To provide semantic structure to audio content, it is necessary to associate time-points in the audio file with the corresponding position within the textual content. This is achieved using SMIL through the pairing of a pointer to a specific position within a text file (referenced by a Uniform Resource Identifier (URI)) with its corresponding time position in the audio content. In the case of the DTB SMIL application, each synchronization point within the SMIL file is assigned a unique identifier. The presence of these identifiers within both the textual content and the SMIL allows navigation to occur by several different methods, as determined by the playback system.

    SMIL incorporates a control structure called customTests, which allows SMIL authors to define optional content selection during playback. This capability permits multiple auditory "views" of a DTB to be selected by a reader. Because a SMIL-based DTB is composed of a sequence of structured audio elements, it is possible to tag individual elements with a structure classification, such as notes or page numbers. Playback systems should expose to the user the presence of these customTests and allow the user to select whether a given structural element is to be read during sequential playback or not.

    The DTB producer determines the granularity of the synchronization events. Synchronization events may be limited to the primary structural elements (those indicated in the NCX) or may be augmented in books with full textual content to include synchronization down to paragraph, sentence, or even word level. The requirement for this level of synchronization is that the textual content include mark-up tags for the desired elements, and that those elements include unique identifiers that can be referenced in the SMIL.

    The SMIL file for a DTB consists of a sequence of typically parallel events, e.g., text and audio (and possibly image) events occurring simultaneously. SMIL represents this structure through the use of the "time containers" seq (sequence of events) and par (parallel time grouping in which multiple elements play back at the same time). A simple form of DTB SMIL file would be as follows, where the three pars shown are played one after the other:

    <smil>
    ...
    <seq>
    <par>...</par>
    <par>...</par>
    <par>...</par>
    </seq>
    ...
    </smil>
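The seq and par structure above can be read programmatically in document order, which for a seq is also playback order. The following Python sketch is illustrative only: the file names, fragment identifiers, and the clipBegin and clipEnd attribute spellings are assumptions and are not drawn from the DTB SMIL profile in this standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical SMIL body; file names and fragment ids are invented.
SMIL = """<smil>
  <body>
    <seq>
      <par id="par1">
        <text src="book.xml#h1_1"/>
        <audio src="chapter1.mp3" clipBegin="0s" clipEnd="4.2s"/>
      </par>
      <par id="par2">
        <text src="book.xml#p_1"/>
        <audio src="chapter1.mp3" clipBegin="4.2s" clipEnd="19.8s"/>
      </par>
    </seq>
  </body>
</smil>"""

def playback_order(smil_text):
    """List each par, in playback order, with its text target and audio clip."""
    root = ET.fromstring(smil_text)
    events = []
    for par in root.iter("par"):  # document order equals seq order
        text = par.find("text")
        audio = par.find("audio")
        events.append((
            par.get("id"),
            text.get("src"),
            audio.get("src"),
            audio.get("clipBegin"),
            audio.get("clipEnd"),
        ))
    return events

for event in playback_order(SMIL):
    print(event)
```

Each tuple pairs a position in the text file with a time range in the audio file, which is the association that gives the audio its semantic structure.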

    7.2 SMIL Modules

    (This section is informative)

    This standard is based on the SMIL 2.0 Specification. [Note: At the time of this writing, SMIL 2.0 is in last call status (21 September 2000 draft) so parts of this standard may change if the applicable sections of SMIL 2.0 are modified before the draft achieves recommendation status.] Developers are requested to refer to the SMIL 2.0 specification for complete background and details. Only a small subset of the SMIL specification is utilized in this implementation, drawing from the following modules, which are grouped by functional area. Modules marked with asterisks are used in whole or in part in this application; the others are included because they are part of a core set of modules required for host language conformance under W3C modularization guidelines.

    (This section is normative)

    The above modules have been chosen to provide the functionality required for the DTB application. Together they form a profile defined in Appendices 4.1 and 4.2, which consist of a DTD and an associated module, as follows:

    Authoring tools using the above files to validate DTB SMIL files must also reference the module files listed in Appendix 4.3, "SMIL 2.0 Modules Included in DTB SMIL Profile."

    To simplify validation using commonly available parsers and to lessen the complexity of determining content models and attribute lists, a DTB-specific SMIL DTD is included in Appendix 4.4.

    A compliant DTB must contain at least one SMIL file. All SMIL files that comprise a DTB must be valid XML and must validate to either the DTB-Specific SMIL DTD or to the DTB SMIL Profile. Any player that complies with the DTB SMIL Profile will be able to play a compliant DTB SMIL presentation.

    7.3 SMIL Elements

    (This section is informative)

    As mentioned above, the DTB application utilizes only a portion of the elements and attributes that make up the modules in the DTB SMIL Profile. Playback devices compliant with this standard need support only the following SMIL elements and attributes.

    7.3.1 Common Attributes

    (This section is informative)

    The following attributes are allowed when the entity %Core.attrib; is listed above:

    7.4 SMIL Production Issues

    7.4.1 "Escapable" Structures

    (This section is normative)

    DTB players should provide the functionality to allow readers to escape from specific structures (at a minimum tables, lists, and notes) with a single action. To support this functionality, producers must ensure that the beginning and end of each such structure are indicated in SMIL by wrapping in a <seq> any such structure consisting of multiple time containers (i.e., <seq>s and <par>s). In addition, producers must include a class attribute on the <seq> or <par> containing a table, list, or note, using element names drawn from the DTBook3 DTD (i.e., "table," "list," and "note").
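One way a player might use the wrapping <seq> and its class attribute is sketched below in Python: from the current position, climb to the innermost enclosing container whose class names an escapable structure, then resume at the time container that follows it. The ids and the function name escape_target are hypothetical, and the case where the structure is the last child of its parent is deliberately left unhandled.

```python
import xml.etree.ElementTree as ET

ESCAPABLE = {"table", "list", "note"}

# Hypothetical SMIL body fragment with a two-cell table.
BODY = """<seq id="main">
  <par id="p1"/>
  <seq id="tbl1" class="table">
    <par id="cell1"/>
    <par id="cell2"/>
  </seq>
  <par id="p2"/>
</seq>"""

def escape_target(root, current_id):
    """Return the id of the time container following the innermost
    escapable structure that encloses current_id, or None."""
    parent = {child: p for p in root.iter() for child in p}
    node = next(el for el in root.iter() if el.get("id") == current_id)
    # Climb until a container marked as table, list, or note is found.
    while node is not None and node.get("class") not in ESCAPABLE:
        node = parent.get(node)
    if node is None or node not in parent:
        return None  # not inside an escapable structure
    siblings = list(parent[node])
    index = siblings.index(node)
    if index + 1 < len(siblings):
        return siblings[index + 1].get("id")
    return None  # simplification: no search beyond the parent container

root = ET.fromstring(BODY)
print(escape_target(root, "cell1"))  # p2
```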

    7.4.2 Automatic Invocation of Special Navigation Modes

    (This section is normative)

    DTB players should automatically invoke special navigation modes when the reader enters a table or list. To support this functionality, producers must include a class attribute on the <seq> or <par> containing a table or list, using element names drawn from the DTBook3 DTD (i.e., "table" and "list"). Producers and players may also support this functionality for other structures using the same mechanism.

    7.4.3 "Skippable" Structures

    (This section is normative)

    Players should offer the user the option to "turn off" selected structures in a DTB, that is, identify certain structures such as notes or line numbers that the player will automatically skip over during sequential playback. To support this capability, producers must include customTest attributes on <seq>s and <par>s containing those structures. In addition, <customAttributes>/<customTest> elements must be valued in the <head> of each SMIL file. At a minimum, producers must offer readers the option to skip over <linenum>, <note>, <noteref>, <pagenum>, optional <prodnote>, and <sidebar>.
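The skipping behavior can be pictured as pruning the SMIL tree during sequential playback. The Python sketch below is a simplified illustration: the customTest names "pagenum" and "note" and all ids are hypothetical, and the user's choices are modeled as a plain set of test names rather than as the customAttributes declared in the <head>.

```python
import xml.etree.ElementTree as ET

# Hypothetical SMIL body fragment with a page number and a two-part note.
BODY = """<seq id="main">
  <par id="p1"/>
  <par id="pg5" customTest="pagenum"/>
  <seq id="fn1" customTest="note">
    <par id="fn1a"/>
    <par id="fn1b"/>
  </seq>
  <par id="p2"/>
</seq>"""

def pars_to_play(node, skipped):
    """Yield the ids of pars in playback order, omitting any container
    whose customTest names a structure the user has turned off."""
    if node.get("customTest") in skipped:
        return  # prune this par or the whole wrapped seq
    if node.tag == "par":
        yield node.get("id")
        return
    for child in node:
        yield from pars_to_play(child, skipped)

root = ET.fromstring(BODY)
print(list(pars_to_play(root, {"note"})))  # ['p1', 'pg5', 'p2']
```

Because the customTest sits on the wrapping <seq>, turning off "note" removes both pars of the footnote in a single step.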

    7.4.4 Packaging Files across Several Distribution Media

    (This section is normative)

    When a DTB spans several distribution media, producers must package SMIL and other files correctly to ensure a complete DTB presentation. See section 11.2, "Distribution Requirements" for details.

    7.4.5 Use of CustomTest Element and Attribute

    (This section is normative)

    Players should expose to the user the presence of customTests and allow the user to select whether a given class of structure (e.g., page number, sidebar, optional producer's note, etc.) will be read during sequential playback.

    Authoring tools should be able to create different customTests for a single element, depending on the element's attributes. For example, <prodnote render="optional"> might be assigned the customTest "prodnote_opt", while <prodnote render="required"> would not need to be assigned a customTest as the user should not have the option of turning them off.

    If customAttributes are to be included in a SMIL presentation, they must be present in the head of each SMIL file in the DTB.

    7.4.6 Links

    (This section is normative)

    If links are present in the document text file of a DTB, producers must also include them in SMIL. The default behavior of a link is to be active for the duration of the media object it contains. If producers specify the precise active duration of a link they must use either dur or end, but not both.

    7.5 SMIL Metadata

    (This section is normative)

    Metadata is included in the <head> element using the <meta> tag. Content producers may introduce other metadata if needed.



    7.6 SMIL Examples

    (This section is informative)

    The following example illustrates the use of head and its contents. The meta element contains the unique id of the DTB as well as the title. The root-layout element defines the size of the rendering window. The visual display location of any text elements with region="text" or region="notes" is specified by the region elements within layout. The text region occupies most of the screen (the bottom edge of the "text" region is 15% from the bottom of the overall rendering window), while the notes region occupies only the bottom 15%. The customAttributes indicate that any pars with customTest="pagenum" will not be rendered by default, while pars with customTest="notes" will automatically be played. If the user interface of the playback device supports it, the user may change these settings.

    Example 7.1:
    <smil>

    </smil>

    Example 7.2 shows the use of SMIL elements within body. The initial seq includes the attribute "dur" which specifies that the entire SMIL file is one hour, three minutes, 24.9 seconds long. Each par (a page number, a heading, and two paragraphs are shown) includes the segment of text and corresponding audio clip that are to be rendered simultaneously. The last par includes a link that "wraps" the audio element. The link becomes active at 2 minutes 12.6 seconds (relative to the beginning of the audio file referenced by the audio element) and becomes inactive 15 seconds later. Alternatively, the producer could have chosen to specify when the link would become inactive with the end attribute, perhaps in a table of contents where each entry is a link and the producer wishes to make each link active only until the next begins. However, as mentioned above, the default behavior of a link is to be active for the duration of the media object it contains.

    Example 7.2:
    <smil>

    </smil>

    As mentioned earlier, seqs may be nested in a DTB SMIL file. Notes or sidebars containing multiple paragraphs will need to be represented as a series of pars wrapped in a seq, so that a customTest can be applied to the seq, permitting the user to skip the entire sequence. In addition, note references occurring in the middle of a paragraph will require this special syntax so that the playback device can properly render the text with or without either the note reference or the note. In Example 7.3, the first par contains the portion of paragraph 12 preceding the note reference. The second par holds the note reference itself (e.g., "footnote 1"). The third par contains the contents of footnote 1 and the last holds the remainder of paragraph 12. Note that the seq and each par contain a unique id. The region attribute on text will control whether each segment is displayed in the text or notes region.

    Example 7.3:
    ...
    <body>

    </seq>
    </body>
    ...

    7.7 Clock Values

    (This section is normative)

    The SMIL 2.0 Timing and Synchronization Module describes several different formats in which "clock values" (timing) may be represented. See Clock Values [SMILclock] in that module. Playback devices must support all of these formats. Examples of the three different formats follow:

    Full-clock-val (hours, minutes, seconds, and fractions of seconds): 3:22:55.9

    Partial-clock-val (minutes, seconds, and fractions of seconds): 43:15.0

    Timecount-val (one or more digits, plus an optional fraction and unit of measurement -- h=hours, min=minutes, s=seconds, ms=milliseconds): 34.6s or 356ms or 58.2. (For Timecount values, if no unit is shown, the default is "s" (for seconds).)

    If either of the first two formats is used, authoring tools must add leading zeroes to single-digit values for minutes and seconds.
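
    The three formats above can be reduced to a single number of seconds. The following is a minimal sketch in Python, not part of this standard; the function names clock_to_seconds and format_full_clock are hypothetical, and the parser assumes two-digit minutes and seconds in the first two formats, as the leading-zero rule above requires of authoring tools.

```python
import re

# Regular expressions for the three clock-value forms described above.
FULL_CLOCK = re.compile(r"(\d+):([0-5]\d):([0-5]\d)(\.\d+)?")
PARTIAL_CLOCK = re.compile(r"([0-5]\d):([0-5]\d)(\.\d+)?")
TIMECOUNT = re.compile(r"(\d+(?:\.\d+)?)(h|min|s|ms)?")

SECONDS_PER_UNIT = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}


def clock_to_seconds(value):
    """Convert a clock value string to seconds (hypothetical helper)."""
    text = value.strip()
    match = FULL_CLOCK.fullmatch(text)
    if match:
        hours, minutes, seconds, fraction = match.groups()
        return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + float(fraction or 0)
    match = PARTIAL_CLOCK.fullmatch(text)
    if match:
        minutes, seconds, fraction = match.groups()
        return int(minutes) * 60 + int(seconds) + float(fraction or 0)
    match = TIMECOUNT.fullmatch(text)
    if match:
        number, unit = match.groups()
        # A Timecount value without a unit defaults to seconds.
        return float(number) * SECONDS_PER_UNIT[unit or "s"]
    raise ValueError(f"not a recognized clock value: {value!r}")


def format_full_clock(total_ms):
    """Format integer milliseconds as a Full-clock-val with leading zeroes."""
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{ms:03d}"
```

    The formatter takes integer milliseconds rather than a floating-point number so that the fractional part is not affected by rounding.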

    8. Navigation Control (NCX and LWN)

    8.1 Introduction

    (This section is informative)

    Navigation is controlled by one of two files. The NCX file is an XML file that complies with the DTD for NCX found in Appendix 1.1. The LWN file is a transformation of the NCX file that simplifies navigation for lightweight players; it complies with the DTD for Navigation with Lightweight Players found in Appendix 1.2. Both files are required, and a player may choose to navigate using either.

    The NCX file exposes the hierarchical structure of a document to allow the user to navigate through it. The NCX is similar to a table of contents in that it enables the reader to jump directly to any of the major structural elements of the document, i.e. part, chapter, or section, but it will often contain more elements of the document than the publisher chooses to include in the original print table of contents. It can be visualized as a collapsible tree familiar to users of Windows. Its development was motivated by the need to provide quick access to the main structural elements of the document without the need to parse the entire marked-up text file, which in many cases may not be present at all. Other elements such as pages, footnotes, figures, tables, etc. can be included in separate, non-hierarchical lists and may be accessed by the user as well.

    The LWN file contains the same information as the NCX file, transformed to a non-hierarchical, sequential list. Hierarchical nesting information is preserved by the inclusion of levelNumber attributes, and all elements that will be available for navigation are merged in the proper sequence into a single list. This file will allow a player with limited resources to jump to any arbitrary location in the book and begin playing without needing to parse the entire NCX or synchronize multiple lists. It is planned to define a compiled format for the LWN in a future version of this standard.

    It is important to emphasize that these navigation features are intended as a convenience for users who want them, and not a burden to those who do not. Players should be able to present the document in the default play sequence defined by the package file's spine without requiring user input beyond play and stop controls.

    8.2 Navigation Control Files

    8.2.1 NCX

    (This section is normative)

    Every DTB must contain exactly one NCX file. The NCX file must comply with the NCX DTD (see Appendix 1.1, "DTD for NCX"). The NCX entry in the Package File manifest must have an id value equal to "ncx". The NCX file itself must be named with the extension ".ncx".

    (This section is informative)

    Brief descriptions of the NCX elements follow. Each includes the element declaration extracted from the NCX DTD, along with descriptions of any applicable attributes.

    8.2.2 LWN Elements

    (This section is normative)

    Every DTB must contain exactly one LWN file. The LWN file must comply with the LWN DTD (see Appendix 1.2 "DTD for Navigation with Lightweight Players"). The LWN file must contain the identical information contained in the NCX file, transformed into a non-hierarchical list with all elements in the proper sequence. The LWN entry in the Package File manifest must have an id value equal to "lwn". The LWN file itself must be named with the extension ".lwn".

    (This section is informative)

    LWN elements are identical to the corresponding NCX elements with the following exceptions:

    1. The root element is <lwn>.
    2. <navObject>s may not nest.
    3. <navStruct> may contain a mixture of <navObject>s and <navTarget>s.
    4. <navList> is not allowed.
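
    The four exceptions listed above lend themselves to a simple structural check. The sketch below is illustrative only and is no substitute for validation against the LWN DTD; the function name lwn_structure_problems is hypothetical, and the check uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def lwn_structure_problems(lwn_xml):
    """Return a list of messages for violations of the LWN exceptions."""
    root = ET.fromstring(lwn_xml)
    problems = []
    # Exception 1: the root element is <lwn>.
    if root.tag != "lwn":
        problems.append(f"root element is <{root.tag}>, expected <lwn>")
    # Exception 4: <navList> is not allowed anywhere in the file.
    if root.find(".//navList") is not None:
        problems.append("<navList> is not allowed in an LWN file")
    # Exception 2: a <navObject> may not contain another <navObject>.
    for nav_object in root.iter("navObject"):
        if nav_object.find(".//navObject") is not None:
            problems.append(
                f"navObject {nav_object.get('id')!r} contains a nested navObject"
            )
    return problems
```

    Exception 3 permits rather than forbids something, so it needs no check here.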

    Brief descriptions of LWN elements whose content models differ from the corresponding NCX elements are included below. See section 8.2.1 for descriptions of all other elements.

    8.3 Navigation Modes

    (This section is informative)

    The digital talking book system supports two modes of navigation:

    Both modes can use the same set of controls, and the user should be able to use the same methods to move through the material in either mode.

    8.4 How NCX Works

    8.4.1 NCX (Hierarchical) Version

    (This section is informative)

    Upon opening a document, a player will by default use the NCX navStruct to define the user's choices for navigation. The navStruct contains nested navObjects that represent the major divisions of the document. For example, the structure of the book whose NCX is shown in Section 8.6, Example 8.1 would look like this:

    Foreword and Standards are at the same level, in this case the highest level, level 1. The nesting of navObjects allows the user to move directly between these objects without passing through the lower-level divisions in between. From Foreword, the user can move to level 2 and step to any of the sections of Foreword. Since there is no level 3 under Foreword, no smaller divisions can be accessed from the NCX. Such smaller divisions may be present, but they can only be reached through local navigation. The a. division of Standards is at level 4, and can be reached by stepping through 1 Core Services and 1.1.
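
    The levels described above follow directly from the nesting of navObjects. As an illustrative sketch (the function name outline and the trimmed sample are hypothetical; only ids and heading text from the NCX example in this section are kept), a player could derive each heading's level by walking the navStruct recursively:

```python
import xml.etree.ElementTree as ET

# Trimmed sample: audio and content elements are omitted for brevity.
SAMPLE_NCX = """
<ncx>
  <navStruct>
    <navObject id="lvl1_3"><descr><text>Foreword</text></descr>
      <navObject id="lvl2_1"><descr><text>History</text></descr></navObject>
      <navObject id="lvl2_2"><descr><text>Development of Standards</text></descr></navObject>
    </navObject>
    <navObject id="lvl1_7"><descr><text>Standards</text></descr>
      <navObject id="lvl2_11"><descr><text>1 Core Services</text></descr>
        <navObject id="lvl3_1"><descr><text>1.1</text></descr>
          <navObject id="lvl4_1"><descr><text>a.</text></descr></navObject>
        </navObject>
        <navObject id="lvl3_2"><descr><text>1.2</text></descr></navObject>
      </navObject>
    </navObject>
  </navStruct>
</ncx>
"""


def outline(parent, depth=1):
    """Return (level, heading) pairs for the navObjects nested under parent."""
    entries = []
    for nav_object in parent.findall("navObject"):
        entries.append((depth, nav_object.findtext("descr/text", default="")))
        # Children of this navObject sit one level deeper.
        entries.extend(outline(nav_object, depth + 1))
    return entries


if __name__ == "__main__":
    nav_struct = ET.fromstring(SAMPLE_NCX).find("navStruct")
    for level, heading in outline(nav_struct):
        print("  " * (level - 1) + heading)
```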

    The user will also have the option of navigating to items that do not fit easily into the hierarchical structure of a document, e.g. pages, footnotes or sidebars. This function is provided by navLists. There is no nesting in navLists; all navTargets are at the same level. In Example 8.1, there are two navLists: the first contains three navTargets representing page numbers, and the second contains three navTargets representing notes.

    Each navObject or navTarget provides navigation information about one piece of the document, e.g. a chapter heading, section number, page number, figure, etc. The text element contains the actual heading, page number, etc. for visual or text-to-speech presentation; the audio element uses SMIL 2.0 syntax to point to a clip containing the audio presentation of the same information. One or both should be used to give location feedback to the user. The content element provides a pointer to an ID within a SMIL file.

    The required structRef attribute of navTarget allows synchronization of navLists with the navStruct. structRef points to the navObject that contains the page number, note, or other element referenced by the navTarget. Similarly, the pageRef attribute of navObject points to the navTarget representing the page on which the navObject begins.
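
    The structRef relationship can be illustrated with a short sketch. It is not normative; the function name containing_sections and the trimmed sample are hypothetical. Each navTarget's structRef is looked up among the navObject ids to find the division that contains the page or note. The sketch makes no assumption about how pageRef values are matched to navTargets.

```python
import xml.etree.ElementTree as ET

# Trimmed sample: audio and content elements are omitted for brevity.
SAMPLE = """
<ncx>
  <navStruct>
    <navObject id="lvl1_3" pageRef="1"><descr><text>Foreword</text></descr>
      <navObject id="lvl2_2" pageRef="2"><descr><text>Development of Standards</text></descr></navObject>
    </navObject>
  </navStruct>
  <navList id="pages" class="pages">
    <navTarget class="page" id="p1" value="1" structRef="lvl1_3"><descr><text>1</text></descr></navTarget>
    <navTarget class="page" id="p2" value="2" structRef="lvl2_2"><descr><text>2</text></descr></navTarget>
  </navList>
  <navList id="notes" class="notes">
    <navTarget class="note" id="nref_1" structRef="lvl2_2"><descr><text>1</text></descr></navTarget>
  </navList>
</ncx>
"""


def containing_sections(ncx_root):
    """Map each navTarget id to the heading of the navObject its structRef names.

    A value of None marks a structRef that matches no navObject id.
    """
    objects_by_id = {obj.get("id"): obj for obj in ncx_root.iter("navObject")}
    result = {}
    for target in ncx_root.iter("navTarget"):
        nav_object = objects_by_id.get(target.get("structRef"))
        if nav_object is None:
            result[target.get("id")] = None
        else:
            result[target.get("id")] = nav_object.findtext("descr/text")
    return result
```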

    8.4.2 LWN (Non-hierarchical) Version

    (This section is normative)

    The LWN file consists of a single list with references to every navigable element of the book. That is, the navObjects and navTargets are intermingled, with each element in its proper order. Since players may choose to navigate with either NCX or LWN in different situations, the LWN file is required to contain the same information as the NCX file and it should be derived from a valid NCX file; the marked up text file or SMIL file(s) will often also be needed to achieve proper sequencing of non-hierarchical elements.
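
    Part of the derivation of an LWN from an NCX can be sketched as follows. This is a partial illustration under stated assumptions, not a complete transformation: the function name flatten_nav_objects is hypothetical, and only the un-nesting of navObjects is shown. Merging navTargets into the correct positions is omitted because, as noted above, it often requires sequence information from the marked up text file or SMIL file(s).

```python
import copy
import xml.etree.ElementTree as ET

# Trimmed sample: audio and content elements are omitted for brevity.
SAMPLE_NAVSTRUCT = """
<navStruct>
  <navObject class="chapter" id="lvl1_7" levelNumber="1"><descr><text>Standards</text></descr>
    <navObject class="section" id="lvl2_11" levelNumber="2"><descr><text>1 Core Services</text></descr>
      <navObject class="subsection" id="lvl3_1" levelNumber="3"><descr><text>1.1</text></descr></navObject>
      <navObject class="subsection" id="lvl3_2" levelNumber="3"><descr><text>1.2</text></descr></navObject>
    </navObject>
  </navObject>
</navStruct>
"""


def flatten_nav_objects(nav_struct):
    """Return a new navStruct whose navObjects are siblings in document order."""
    flat = ET.Element("navStruct")
    # Element.iter() visits elements in document order, parents before children.
    for nav_object in nav_struct.iter("navObject"):
        # Attributes, including levelNumber, are carried over unchanged.
        clone = ET.SubElement(flat, "navObject", dict(nav_object.attrib))
        for child in nav_object:
            if child.tag != "navObject":
                clone.append(copy.deepcopy(child))
    return flat


if __name__ == "__main__":
    flat = flatten_nav_objects(ET.fromstring(SAMPLE_NAVSTRUCT))
    print(ET.tostring(flat, encoding="unicode"))
```

    Because the levelNumber attributes are copied as they stand, the hierarchical nesting information survives the un-nesting.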

    8.5 Navigation Metadata

    (This section is normative)

    Metadata is included in the <head> element using the <meta> tag. Content producers may introduce other metadata if needed.
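
    As an illustration of how a player might read this metadata, the following sketch collects the name and content attributes of each <meta> tag in <head> into a dictionary. The function name read_nav_metadata is hypothetical; the sample values are those used in the examples in this section.

```python
import xml.etree.ElementTree as ET

SAMPLE_HEAD = """
<ncx>
  <head>
    <UID>us-nls-00001</UID>
    <charset>ISO-8859-1</charset>
    <meta name="dtb:pageNormal" content="47"/>
    <meta name="dtb:pageSpecial" content="0"/>
    <meta name="dtb:pageFront" content="5"/>
  </head>
</ncx>
"""


def read_nav_metadata(root):
    """Return a dict of meta name -> content from the <head> element."""
    head = root.find("head")
    if head is None:
        return {}
    # Producer-defined metadata is picked up the same way as the dtb: items.
    return {meta.get("name"): meta.get("content") for meta in head.findall("meta")}


if __name__ == "__main__":
    print(read_nav_metadata(ET.fromstring(SAMPLE_HEAD)))
```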

    8.6 Examples

    (This section is informative)

    Example 8.1: NCX

    <ncx>
      <head>
        <UID>us-nls-00001</UID>
        <charset>ISO-8859-1</charset>
        <meta name="dtb:pageNormal" content="47"/>
        <meta name="dtb:pageSpecial" content="0"/>
        <meta name="dtb:pageFront" content="5"/>
      </head>
      <docTitle>
        <text>Revised Standards and Guidelines of Service for the Library of Congress Network of Libraries for the Blind and Physically Handicapped 1995</text>
        <audio src="rs_title.mp3" />
      </docTitle>
      <navStruct>
        <navObject class="chapter" id="lvl1_3" pageRef="1" levelNumber="1">
          <descr> <text>Foreword</text> <audio src="rs_fwdx.mp3" clipBegin="00:01.5" clipEnd="00:02.0" /> </descr>
          <content src="sample.smil#h1_3" />
          <navObject class="section" id="lvl2_1" pageRef="1" levelNumber="2">
            <descr> <text>History</text> <audio src="rs_fwdx.mp3" clipBegin="00:03.4" clipEnd="00:03.9" /> </descr>
            <content src="sample.smil#h2_1" />
          </navObject>
          <navObject class="section" id="lvl2_2" pageRef="2" levelNumber="2">
            <descr> <text>Development of Standards</text> <audio src="rs_fwdx.mp3" clipBegin="00:56.3" clipEnd="00:57.7" /> </descr>
            <content src="sample.smil#h2_2" />
          </navObject>
        </navObject>
        ...
        <navObject class="chapter" id="lvl1_7" pageRef="16" levelNumber="1">
          <descr> <text>Standards</text> <audio src="rs_stdx.mp3" clipBegin="00:01.3" clipEnd="00:02.1" /> </descr>
          <content src="sample.smil#h1_7" />
          <navObject class="section" id="lvl2_11" pageRef="16" levelNumber="2">
            <descr> <text>1 Core Services</text> <audio src="rs_stdx.mp3" clipBegin="00:02.9" clipEnd="00:04.9" /> </descr>
            <content src="sample.smil#h2_10" />
            <navObject class="subsection" id="lvl3_1" pageRef="16" levelNumber="3">
              <descr> <text>1.1</text> <audio src="rs_stdx.mp3" clipBegin="00:05.7" clipEnd="00:06.7" /> </descr>
              <content src="sample.smil#h3_1" />
              <navObject class="sub-subsection" id="lvl4_1" pageRef="16" levelNumber="4">
                <descr> <text>a.</text> <audio src="rs_stdx.mp3" clipBegin="00:18.7" clipEnd="00:19.1" /> </descr>
                <content src="sample.smil#h4_1" />
              </navObject>
            </navObject>
            <navObject class="subsection" id="lvl3_2" pageRef="16" levelNumber="3">
              <descr> <text>1.2</text> <audio src="rs_stdx.mp3" clipBegin="00:50.5" clipEnd="00:51.4" /> </descr>
              <content src="sample.smil#h3_2" />
            </navObject>
          </navObject>
        </navObject>
      </navStruct>
      <navList id="pages" class="pages">
        <navTarget class="page" id="p1" value="1" structRef="lvl1_3">
          <descr> <text>1</text> <audio src="rs_fwdx.mp3" clipBegin="00:00" clipEnd="00:00.9" /> </descr>
          <content src="sample.smil#p1" />
        </navTarget>
        <navTarget class="page" id="p2" value="2" structRef="lvl2_2">
          <descr> <text>2</text> <audio src="rs_fwdx.mp3" clipBegin="00:53.9" clipEnd="00:54.6" /> </descr>
          <content src="sample.smil#p2" />
        </navTarget>
        ...
        <navTarget class="page" id="p16" value="16" structRef="lvl1_7">
          <descr> <text>16</text> <audio src="rs_stdx.mp3" clipBegin="00:00.0" clipEnd="00:00.7" /> </descr>
          <content src="sample.smil#p3" />
        </navTarget>
        ...
      </navList>
      <navList id="notes" class="notes">
        <navTarget class="note" id="nref_1" structRef="lvl2_2">
          <descr> <text>1</text> <audio src="rs_fwdx.mp3" clipBegin="01:22.6" clipEnd="01:23.5" /> </descr>
          <content src="sample.smil#nref_1" />
        </navTarget>
        <navTarget class="note" id="nref_2" structRef="lvl2_2">
          <descr> <text>2</text> <audio src="rs_fwdx.mp3" clipBegin="02:00.6" clipEnd="02:01.4" /> </descr>
          <content src="sample.smil#nref_2" />
        </navTarget>
        <navTarget class="note" id="nref_3" structRef="lvl2_2">
          <descr> <text>3</text> <audio src="rs_fwdx.mp3" clipBegin="03:13.3" clipEnd="03:14.1" /> </descr>
          <content src="sample.smil#nref_3" />
        </navTarget>
      </navList>
    </ncx>

    Example 8.2: LWN

    <lwn>
      <head>
        <UID>us-nls-00001</UID>
        <charset>ISO-8859-1</charset>
        <meta name="dtb:pageNormal" content="47"/>
        <meta name="dtb:pageSpecial" content="0"/>
        <meta name="dtb:pageFront" content="5"/>
      </head>
      <docTitle>
        <text>Revised Standards and Guidelines of Service for the Library of Congress Network of Libraries for the Blind and Physically Handicapped 1995</text>
        <audio src="rs_title.mp3" />
      </docTitle>
      <navStruct>
        <navTarget class="page" id="p1" value="1" structRef="lvl1_3">
          <descr> <text>1</text> <audio src="rs_fwdx.mp3" clipBegin="00:00" clipEnd="00:00.9" /> </descr>
          <content src="sample.smil#p1" />
        </navTarget>
        <navObject class="chapter" id="lvl1_3" pageRef="1" levelNumber="1">
          <descr> <text>Foreword</text> <audio src="rs_fwdx.mp3" clipBegin="00:01.5" clipEnd="00:02.0" /> </descr>
          <content src="sample.smil#h1_3" />
        </navObject>
        <navObject class="section" id="lvl2_1" pageRef="1" levelNumber="2">
          <descr> <text>History</text> <audio src="rs_fwdx.mp3" clipBegin="00:03.4" clipEnd="00:03.9" /> </descr>
          <content src="sample.smil#h2_1" />
        </navObject>
        <navTarget class="page" id="p2" value="2" structRef="lvl2_2">
          <descr> <text>2</text> <audio src="rs_fwdx.mp3" clipBegin="00:53.9" clipEnd="00:54.6" /> </descr>
          <content src="sample.smil#p2" />
        </navTarget>
        <navObject class="section" id="lvl2_2" pageRef="2" levelNumber="2">
          <descr> <text>Development of Standards</text> <audio src="rs_fwdx.mp3" clipBegin="00:56.3" clipEnd="00:57.7" /> </descr>
          <content src="sample.smil#h2_2" />
        </navObject>
        <navTarget class="note" id="nref_1" structRef="lvl2_2">
          <descr> <text>1</text> <audio src="rs_fwdx.mp3" clipBegin="01:22.6" clipEnd="01:23.5" /> </descr>
          <content src="sample.smil#nref_1" />
        </navTarget>
        <navTarget class="note" id="nref_2" structRef="lvl2_2">
          <descr> <text>2</text> <audio src="rs_fwdx.mp3" clipBegin="02:00.6" clipEnd="02:01.4" /> </descr>
          <content src="sample.smil#nref_2" />
        </navTarget>
        <navTarget class="note" id="nref_3" structRef="lvl2_2">
          <descr> <text>3</text> <audio src="rs_fwdx.mp3" clipBegin="03:13.3" clipEnd="03:14.1" /> </descr>
          <content src="sample.smil#nref_3" />
        </navTarget>
        ...
        <navTarget class="page" id="p16" value="16" structRef="lvl1_7">
          <descr> <text>16</text> <audio src="rs_stdx.mp3" clipBegin="00:00.0" clipEnd="00:00.7" /> </descr>
          <content src="sample.smil#p3" />
        </navTarget>
        <navObject class="chapter" id="lvl1_7" pageRef="16" levelNumber="1">
          <descr> <text>Standards</text> <audio src="rs_stdx.mp3" clipBegin="00:01.3" clipEnd="00:02.1" /> </descr>
          <content src="sample.smil#h1_7" />
        </navObject>
        <navObject class="section" id="lvl2_11" pageRef="16" levelNumber="2">
          <descr> <text>1 Core Services</text> <audio src="rs_stdx.mp3" clipBegin="00:02.9" clipEnd="00:04.9" /> </descr>
          <content src="sample.smil#h2_10" />
        </navObject>
        <navObject class="subsection" id="lvl3_1" pageRef="16" levelNumber="3">
          <descr> <text>1.1</text> <audio src="rs_stdx.mp3" clipBegin="00:05.7" clipEnd="00:06.7" /> </descr>
          <content src="sample.smil#h3_1" />
        </navObject>
        <navObject class="sub-subsection" id="lvl4_1" pageRef="16" levelNumber="4">
          <descr> <text>a.</text> <audio src="rs_stdx.mp3" clipBegin="00:18.7" clipEnd="00:19.1" /> </descr>
          <content src="sample.smil#h4_1" />
        </navObject>
        <navObject class="subsection" id="lvl3_2" pageRef="16" levelNumber="3">
          <descr> <text>1.2</text> <audio src="rs_stdx.mp3" clipBegin="00:50.5" clipEnd="00:51.4" /> </descr>
          <content src="sample.smil#h3_2" />
        </navObject>
      </navStruct>
    </lwn>

    8.7 NCX Production Implications

    8.7.1 IDs

    (This section is normative)

    The id attribute on navObject and navTarget must be equal to the id attribute of the corresponding element in the XML and/or SMIL files.

    8.7.2 DTBs Spanning Multiple Media Units

    (This section is normative)

    When a DTB spans several distribution media (e.g., multiple CD-ROMs), the audio clips referenced in the NCX must be copied from the audio content files into a single file that is replicated on each piece of distribution media. Creating such a file ensures that the full NCX is functional on each piece of media. In this case the NCX must also be constructed slightly differently than usual: its audio elements must point to clips in this special file instead of to clips in the audio content file(s). See Section 11, Packaging Files for Distribution, for further discussion of this issue.

    9. Portable Bookmarks and Highlights

    9.1 Introduction

    (This section is normative)

    This standard establishes a specific XML file format to support bookmark and highlight export and import. A playback system may allow readers to set bookmarks and to highlight passages in a document, label the marked sections with text or audio notes, and export the resulting collection of marks and notes to other compliant playback devices. Bookmarks and highlights, and their associated notes, if any, are stored within the player, separate from the DTB itself.

    This standard does not require that all compliant players support all of the functionality described above. In addition, this standard places no constraints on a playback system's internal system for storing or manipulating the information in the bookmark file. However, if a player supports the export of bookmarks and highlights and their associated audio notes, the player must format the information as described below when it is exported. See Appendix 2 -- DTD for Portable Bookmarks/Highlights.

    If a playback device supports user-recording of audio notes on bookmarks or highlights that may be exported, the recording may be in any format supported by the standard. The format is implicit in the suffix of the filename. The playback device must generate this suffix when generating the filename.

    Bookmark files shall be named, by default, with the value from uid and the extension ".bmk". For example: "us-nls-14339.bmk". Players may allow users to apply their own filenames to accommodate character limitations in other filesystems and to avoid filename collisions. To accommodate user-supplied names, players with bookmark import capabilities must be able to open bookmark files and read the uid value to match the correct bookmark file with the current DTB.
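
    The naming and matching rules above can be sketched as follows. This is illustrative only; both function names are hypothetical, and the sketch assumes the uid of each candidate bookmark file has already been read from the file, since the element structure of the bookmark file is defined by the DTD in Appendix 2 and is not reproduced here.

```python
def default_bookmark_filename(uid):
    """Return the default bookmark filename for a DTB with the given uid."""
    return f"{uid}.bmk"


def bookmark_files_for(dtb_uid, uid_by_filename):
    """Return the filenames whose stored uid matches the current DTB.

    uid_by_filename maps each candidate filename to the uid value read from
    inside that file, so user-renamed files are matched by content, not name.
    """
    return sorted(name for name, uid in uid_by_filename.items() if uid == dtb_uid)


if __name__ == "__main__":
    print(default_bookmark_filename("us-nls-14339"))
    candidates = {
        "us-nls-14339.bmk": "us-nls-14339",
        "mynotes.bmk": "us-nls-14339",   # renamed by the user
        "us-nls-00001.bmk": "us-nls-00001",
    }
    print(bookmark_files_for("us-nls-14339", candidates))
```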

    Players may implement a variety of systems for numbering or otherwise identifying bookmarks or highlighted sections so the user can step through and choose from a group of them, but the default order in which they are numbered must be the order in which they fall in the document. When exported, bookmarks and highlights shall be in the default order.

    This standard prescribes no methods for setting or accessing bookmarks or highlights, for notifying the user of their presence, or for controlling the export or import process. Player manufacturers are free to design the user interface for bookmarks/highlights in whatever manner integrates most effectively into the overall user interface of the playback device.

    9.2 Bookmark/Highlight Elements

    (This section is informative)

    Brief descriptions of the Bookmark/Highlight elements follow. Each includes the element declaration extracted from the Bookmark DTD found in Appendix 2, along with descriptions of any applicable attributes.