National Information Standards Organization
File Specifications for the Digital Talking Book

Draft -- Version 3.8

February 1, 2001

Foreword

NISO Voting Members
NISO Board of Directors
Standards Committee AQ
Acknowledgements

General Information
Overview
Package File
Content Format for Text
Audio File Formats
Other Media File Formats
Synchronization of Media Files
Navigation Control
Portable Bookmarks/Highlights
Resource File
Packaging Files for Distribution
Presentation Styles
Types of DTB
Digital Rights Management
Time-Scale Modification
Conformance
References to Other Specifications/Documents

Appendix 1 - DTDs for Navigation

Appendix 1.1 - DTD for NCX
Appendix 1.2 - DTD for Navigation with Lightweight Players

Appendix 2 - DTD for Portable Bookmarks/Highlights
Appendix 3 - DTD for Resource File
Appendix 4 - DTB SMIL Profile and DTB-Specific DTD
Appendix 5 - DTBook3 DTD
Appendix 6 - Accessibility Issues
Appendix 7 - Theory Behind the DTBook3 DTD
Appendix 8 - Distribution Information DTD

Foreword

(This foreword is not a part of the American National Standard for Digital Talking Books... . It is included for information only.)

This standard presents the file specifications for digital talking books (DTBs) for blind, visually impaired, physically handicapped, or otherwise print-disabled readers. For many years, "talking books" have been made available to print-disabled readers on analog media: phonograph records and audio cassettes. Those media served their users well in providing human-speech recordings of a wide array of print material in increasingly robust and cost-effective formats. However, analog media are limited in several respects when compared to a print book. First, they are by their nature linear presentations, which, while suitable for novels, leave much to be desired when reading reference works, textbooks, magazines, and other materials that are often accessed randomly. Digital media offer readers the ability to move around a book or magazine as freely as (and more efficiently than) a sighted reader flips through a print book. Second, analog recordings do not allow users to interact with the book by placing bookmarks, highlighting material, and so forth. A DTB offers this capability, storing the bookmarks and highlights separate from, but associated with, the DTB itself. Third, talking book users have long complained that they did not have access to the spelling of the words they heard. As will be explained below, some DTBs will include a file containing the full text of the work, synchronized with the audio presentation, thereby allowing readers to locate specific words and hear them spelled. Finally, analog audio offers readers only one version of the document. If, for example, a book contains footnotes, they are either read where referenced, which burdens the casual reader with unwanted interruptions, or grouped at a location out of the flow of the text, making it difficult for interested readers to access them. A DTB allows the user to easily skip over or read footnotes.
So the Digital Talking Book offers the print-disabled user a significantly enhanced reading experience -- one that is much closer to that of the sighted reader using a print book. This standard describes the various files that make up a DTB and specifies how each must be formatted.

DTBs go far beyond the limits imposed on analog audio books because they can include not just the audio rendition of the work, but the full text file and images as well. Because the text file is synchronized with the audio file, a DTB offers multiple sensory inputs to readers, a great benefit to learning-disabled readers, for example. Some visually impaired readers may choose to listen to most of the book, but find that inspecting the images provides information not available in the narrative flow. Others may opt to skip the audio presentation altogether and instead view the text file via screen-enlarging software. Braille readers may prefer to read some or all of the document via a refreshable braille display connected to their DTB player and accessing the text file.

Digital Talking Books are not tied to a single distribution medium. CD-ROMs will be used first but DTBs will be portable to any digital distribution medium capable of handling the large files associated with digital audio recordings. Regardless of how a DTB is distributed, however, it must be in the context of a digital rights management system whose functional requirements this standard describes.

The initiative behind this document grew from a desire to standardize DTB file structures, in the hope that it might prevent a recurrence of the multiple formats currently used for talking books throughout the world. This document benefitted greatly from the work of the DAISY Consortium, whose members had broken much of the ground covered in this standard and who contributed enormously to the solution of the many problems encountered.

NISO Voting Members

NISO Board of Directors

Standards Committee AQ

Standards Committee AQ on Digital Talking Books had the following members at the time this standard was approved:

Mr. Donald J. Breda
American Council of the Blind
Mr. George Brummell
Blinded Veterans Association
Mr. John Bryant
National Library Service for the Blind and Physically Handicapped
Library of Congress
Mr. Glen Cavanaugh
Telex Communications, Inc.
Mr. Curtis Chong
World Blind Union
Mr. Thomas Kjellberg Christensen
DAISY Consortium
The Danish National Library for the Blind
Mr. John Cookson
National Library Service for the Blind and Physically Handicapped
Library of Congress
Mr. Frank Kurt Cylke
National Library Service for the Blind and Physically Handicapped
Library of Congress
Mr. Jack Decker
American Printing House for the Blind
Dr. Judith Dixon
National Library Service for the Blind and Physically Handicapped
Library of Congress
Mr. Jim Dust
Telex Communications, Inc.
Dr. Michael Gosse
National Federation of the Blind
Mr. Luis Gutierrez
American Foundation for the Blind
Mr. John Hedges
American Printing House for the Blind
Mr. Mark Hakkinen
isSound, Inc.
Ms. Vivian Juig
The Hadley School for the Blind
Ms. Rosemary Kavanagh
Canadian National Institute for the Blind
Mr. George Kerscher
Recording for the Blind and Dyslexic and DAISY Consortium
Mr. Wells "Brad" Kormann
National Library Service for the Blind and Physically Handicapped
Library of Congress
Ms. Kathie Korpolinski
Recording for the Blind and Dyslexic
Dominic Labbé
VisuAide, Inc.
Ms. Mary-Frances Laughton
Assistive Devices Industry Office
Industry Canada
Mr. Thomas McLaughlin
National Library Service for the Blind and Physically Handicapped
Library of Congress
Mr. Michael Moodie, Chair
National Library Service for the Blind and Physically Handicapped
Library of Congress
Ms. Freddie Peaco
National Library Service for the Blind and Physically Handicapped
Library of Congress
Mr. Gilles Pepin
VisuAide, Inc.
Mr. Lloyd Rasmussen
National Library Service for the Blind and Physically Handicapped
Library of Congress
Ms. Janina Sajka
American Foundation for the Blind
Mr. Rudy Savage
Talking Book Publishers, Inc.
Mr. Larry Skutchan
American Printing House for the Blind
Ms. Linda Stetson
Association of Specialized and Cooperative Library Agencies
American Library Association
Mr. George Stockton
National Library Service for the Blind and Physically Handicapped
Library of Congress
Ms. Karen Taylor
Canadian National Institute for the Blind

Contents

Acknowledgements

Standards Committee AQ gratefully acknowledges the assistance of the following individuals: Robert Berkovitz, Sensimetrics Corporation; Harvey Bingham; Mike Brown; John Churchill, Recording for the Blind and Dyslexic; Hiromitsu Fujimori, Plextor Corporation; Manon Gaudet, VisuAide, Inc.; Al Gilman; Steve Jacobs, NCR Corporation; Lynn Leith, Canadian National Institute for the Blind; Rob Meredith, American Printing House for the Blind; Tatsu Nishizawa, Plextor Corporation; James Pritchett, Recording for the Blind and Dyslexic; Dr. Gregg Vanderheiden, TRACE Research and Development Center, University of Wisconsin; Mr. Paul Vassallo, National Institute of Standards & Technology; Norm Welch, EvaTone, Inc.; with special thanks to members of the DAISY Consortium's Specifications and Guidelines Work Team. Thanks also to these members of the W3C Synchronized Multimedia (SYMM) Working Group: Dick Bulterman, Oratrix; Wo Chang, NIST; Lloyd Rutledge, CWI; Patrick Schmitz, Microsoft.

1. General Information

(This section is informative)

1.1 Purpose and Scope of Standard

This standard establishes the file specifications for digital talking books (DTBs) for blind, visually impaired, physically handicapped, or otherwise print-disabled readers. Its purpose is to ensure interoperability across service organizations and vendors providing content and playback systems to the target population.

This standard provides specifications applicable to all aspects of digital talking book production and rendering, including authoring tools for DTBs, hardware- or software-based playback devices, and compliance-testing software.

1.2 Definitions

The following acronyms and terms are used in this standard as defined below. In the following definitions and throughout the standard, bracketed items correspond to entries in section 17, "References to Other Specifications/Documents," where the full URL is provided for each reference.

Accessible: With respect to implementations, accessible refers to the design and functionality of the playback system where all features are usable by the target population.
CSS: Cascading Style Sheets [CSS] is a mechanism for adding style (e.g. fonts, colors, spacing, formatting) to HTML or XML documents.
DRM: Digital Rights Management is a system of tools and processes that protect intellectual property when it is encoded and distributed in digital form.
DTB: The Digital Talking Book content data set that complies with the specifications in this standard.
DTD: The Document Type Definition file contains machine-readable rules that define allowable XML markup for a particular application.
DTBook3: DTBook3 is a unique DTD file (dtbook3.dtd) that defines the XML markup for the text content of a DTB.
Fragment Identifier: A means to address a named place in a document. For reference within the current document, the reference part is to a named target, and begins with "#". See URI for addressing into another document.
Global navigation: Efficient movement to user-selected portions of a document, with that movement enabled by the NCX. Navigation targets may be headings representing the hierarchical structure of the document or specific points such as pages, notes, sidebars, etc.
Guide: A component of the Package File, the Guide lists the key structural features of a DTB, such as the table of contents, introduction, bibliography, etc. to enable playback devices to provide convenient access to them.
IMPLIED: When used in definitions of attributes, means the attribute is optional, as opposed to REQUIRED.
Informative: An explanatory part of this standard. Contrast with normative.
Local navigation: Movement within a document at a granularity finer than that provided by the NCX. For example, navigation by paragraph or sentence, or within a table or nested list. Precise local navigation is controlled by the text file; the granularity is limited by the degree to which the text file has been marked up. Time-based movement through a document (similar to fast-forward and rewind on an analog cassette) may also be implemented.
Manifest: A component of the Package File, the Manifest lists all files included in the DTB.
May: In this standard, the word "may" is to be interpreted as an optional feature that is not required but can be provided.
Must: In this standard, the word "must" is to be interpreted as a mandatory requirement on the content or implementation. The term "shall" has the same definition as "must".
Normative: A portion of the standard that supplies precise specifications rather than background or explanation. Contrast with informative. Notes within a normative section may be informative.
NCX: The Navigation Control file for XML applications (NCX) provides the reader efficient and flexible access to the hierarchical structure of a DTB as well as direct access to selected elements such as page numbers, notes, figures, etc.
OEBF (Open eBook Forum): An organization formed to create and maintain standards and promote the successful adoption of electronic books. The Open eBook Publication Structure Version 1.0.1 (OEBPS) provides a specification for representing the content of a book when it is converted from print to electronic form. This DTB standard utilizes a subset (the Package File) of that specification.
OPF: See "Package File."
Package File: The Open eBook Forum Package File (OPF) is an XML file conforming to the oebpkg1.dtd that contains administrative information about the DTB, the files that comprise it, and how these files interrelate.
Playback: With regard to implementations, playback refers to the methods used to render the DTB content. Playback may include audio, braille, large print, and synthetic speech as appropriate for the content and as supported by the playback system.
Player: The hardware/software platform which renders the contents of a DTB to a user. Synonymous with "Playback System."
Reader: The person reading the digital talking book. Synonymous with "user."
REQUIRED: When used in definitions of attributes, means the attribute is required, as opposed to IMPLIED.
SMIL: The Synchronized Multimedia Integration Language [SMIL] is a draft W3C specification (SMIL 2.0) utilized in this standard to control the synchronized presentation of content in multiple media.
Shall: See "Must."
Should: With respect to implementation, the word "should" is to be interpreted as an implementation recommendation, but not a requirement. With respect to content, the word "should" is to be interpreted as recommended programming practice for content.
Spine: A component of the Package File, the Spine lists the SMIL files included in the DTB in default reading order.
Target population: The target population consists of blind, visually impaired, physically handicapped and otherwise print-disabled readers.
TSM: Time-scale modification (TSM) is variable playback rate (both slower and faster than real time) while maintaining constant pitch.
User: See "Reader."
Text File: The content of the subject document in a character set specified by ISO/IEC 10646 to which XML markup has been applied.
URI: Uniform Resource Identifier, the means to uniquely identify a document and reference it. A URI may include a fragment identifier suffix beginning with "#" that matches some named anchor in the target document.
XML: A file conforming to the Extensible Markup Language 1.0 [XML] specification.
XSL: A file conforming to the Extensible Style Language [XSL] specification.
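
The relationship between a URI and its fragment identifier, as defined above, can be illustrated with a short sketch. It is illustrative only and not part of this standard: the function name `split_reference`, the base address, and the file and target names are all hypothetical, and the sketch relies only on Python's standard `urllib.parse` module.

```python
from urllib.parse import urldefrag, urljoin


def split_reference(reference, base_uri):
    """Resolve a reference against base_uri and separate the document
    address from the fragment identifier (the part after "#")."""
    absolute = urljoin(base_uri, reference)
    document_uri, fragment = urldefrag(absolute)
    return document_uri, fragment


# Hypothetical base document; not a name defined by this standard.
BASE = "http://example.org/dtb/chapter1.smil"

# A reference beginning with "#" addresses a named target in the current document.
print(split_reference("#para_12", BASE))

# A reference with a file name addresses a named target in another document.
print(split_reference("text.xml#para_12", BASE))
```

In both cases the fragment is the same named target; only the document it is looked up in differs.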

1.3 Strategy

This standard is based primarily on a variety of widely used standards and specifications, including several from the World Wide Web Consortium and the Open eBook Forum. Wherever applicable and appropriate standards or specifications existed, they were used. The use of these specifications and technologies is intended to promote a fast and consistent adoption of this standard for the target population, while encouraging its extension into mainstream use.

1.4 Relationship to Other Specifications

This standard is based on the specific versions of the standards and specifications referenced herein, which are used as defined, except as noted by this document. Any refinement or replacement of a referenced specification by a newer or different version is not directly applicable to this standard. Conformance to this standard is based on the versions of the standards and specifications in effect at the time of this writing.

1.5 Patent Rights

It is possible that compliance with this standard may require the use of one or more inventions covered by patent rights. It is believed that all companies claiming such rights have agreed to grant a license under such rights that they hold on reasonable and non-discriminatory terms and conditions to any applicant.

Producers of DTB systems or any component thereof are responsible for obtaining the appropriate licenses for any and all technology defined by the relevant standards and specifications referenced by this standard.

Issues surrounding the protection of intellectual property embodied in the works distributed as digital talking books are discussed in section 14, Digital Rights Management.

2. Overview

(This section is informative)

A digital talking book (DTB) is a collection of electronic files arranged to present information to the target population via alternative media, namely, human or synthetic speech, refreshable braille, or visual display, e.g., large print. When these files are created and assembled into a DTB in accordance with this standard, they make possible a wide range of features such as rapid, flexible navigation; bookmarking and highlighting; keyword searching; spelling of words on demand; and user control over the presentation of selected items (e.g., footnotes, page numbers, etc.). For a full discussion of these capabilities, see the Document Navigation Features List [Navigation Features], developed as the user requirements document on which this standard was based. Appendix 7, "Theory Behind the DTBook3 DTD," also describes the navigational capabilities of a DTB in some detail. The content of DTBs will range from audio alone, through a combination of audio, text, and images, to text alone.

DTB players will also take a variety of shapes. The simplest might be portable devices with audio-only capabilities. More complex portable players could include text-to-speech capabilities as well as audio output for recorded human speech. The most comprehensive playback systems are expected to be PC-based, supporting visual and audio output, text-to-speech capability, and output to a braille display.

The files comprising a DTB fall into ten categories, as described below:

Package File: The Package File, drawn from the Open eBook Publication Structure 1.0.1, contains administrative information about the DTB and the files that comprise it. An XML version 1.0 file, it contains a complete set of metadata describing the DTB, a list (the manifest) of all files that make up the DTB, a spine that defines the default reading order of the document, and an optional guide to enable easy access to key points in the DTB. See section 3, "Package File."
Document Text File: A DTB may contain part or all of the text of the document, as an XML 1.0 file tagged in accordance with the document type definition (DTD) defined for this standard, DTBook3.dtd. (See Appendix 5, "DTBook3 DTD".) The document text file enables a playback device to spell words on demand, carry out keyword searches, and permit finely-grained navigation. It may also be accessed directly via refreshable braille display, synthetic speech, or screen-enlarging software. See section 4, "Content Format for Text."
Audio Files: A DTB may include human or synthetic speech recordings of the document, embodied in audio files encoded in one of a specified group of audio formats. Section 5, "Audio File Formats," presents the set of formats specified by this standard.
Other Media Files: In addition to text and audio, DTBs may include images which can be presented on PC-based players. Section 6, "Other Media File Formats," lists the formats specified by this standard.
Synchronization Files: To synchronize the different media files of a DTB during playback, this standard specifies the World Wide Web Consortium's (W3C) Synchronized Multimedia Integration Language (SMIL), SMIL 2.0 version, an XML 1.0 application. The DTB SMIL files define a sequence of media events. During each event, text elements and corresponding audio clips as well as any additional visual elements are presented simultaneously. DTB players utilize the synchronization information to both index into the audio presentation and to track, during audio playback, the corresponding position in the document text file. IDs on the SMIL elements match those on the corresponding elements in the text file. This standard utilizes a subset of the full SMIL 2.0 specification. See section 7, "Synchronization of Media Files," for discussion of these issues and Appendix 4, "DTB SMIL Profile and DTB-Specific DTD," for the DTDs and Modules that define the DTB SMIL application.
Navigation Control File: The DTB system supports two modes of navigation, global and local. Global navigation -- movement by structure (chapter, section, subsection) and by other selected points such as pages -- is effected through the Navigation Control file for XML applications (NCX) and the Lightweight Navigation file (LWN). The NCX and LWN present a dynamic view of the document's hierarchical structure, allowing the user to move through the document in large steps corresponding to its major divisions, or in progressively smaller steps down to a limit set by the document's detail. Text, audio, and image elements present to the user the document's headings, and id-based links point to the SMIL presentation at the corresponding locations. Appendix 1 contains the XML 1.0 DTDs for the NCX and LWN. Local (more finely-grained) navigation is not handled by the NCX but is enabled through the document text file or through time-based movement through the audio presentation, depending on the document and on the player. See section 8, "Navigation Control," and Appendix 1, "DTDs for Navigation," for specifications related to the NCX and LWN.
Bookmark/Highlight File: This standard supports user-set, exportable bookmarks and highlights to which text and audio notes may be applied. Specifications for the XML 1.0 file for portable bookmarks and highlights are presented in section 9, "Portable Bookmarks/Highlights" and Appendix 2, "DTD for Portable Bookmarks/Highlights."
Resource File: The resource file contains various text segments, audio clips, and/or images that provide alternative representations of navigational information -- for example, feedback on the user's current location in the document. It supplies information normally presented in a print book via typographical clues. It is an XML version 1.0 file based on the DTD "resource.dtd". See section 10, "Resource File," and Appendix 3, "DTD for Resource File" for file specifications.
Distribution: Given the great size of audio files, even when heavily compressed, it will be common for large books to span several distribution media. Section 11, "Packaging Files for Distribution," describes how DTB producers will map the location of each file to a specific media unit, e.g., disk 1 of 3. Alternatively, several small books may be distributed on the same media unit. Appendix 8, "Distribution Information DTD," presents the document type definition for "Distinfo" files.
Presentation Styles: Section 12, "Presentation Styles," discusses how presentation styles may be controlled for a variety of output modes through the use of optional style sheets.
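
As an informal illustration of how the manifest and spine of the Package File work together, the following sketch derives the default reading order from a small sample package. It is not part of this standard. The element and attribute names (`package`, `manifest`, `item`, `spine`, `itemref`, `id`, `href`, `idref`, `media-type`) are assumed from the Open eBook Publication Structure; the file names, media-type values, and the function `reading_order` are hypothetical; and the metadata is left empty for brevity. Section 3 remains the authority on the actual file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified package file. The manifest lists every
# file in the DTB; the spine lists SMIL files in default reading order.
SAMPLE_PACKAGE = """\
<package unique-identifier="uid">
  <metadata/>
  <manifest>
    <item id="smil_2" href="chapter2.smil" media-type="application/smil"/>
    <item id="smil_1" href="chapter1.smil" media-type="application/smil"/>
    <item id="text" href="book.xml" media-type="text/xml"/>
  </manifest>
  <spine>
    <itemref idref="smil_1"/>
    <itemref idref="smil_2"/>
  </spine>
</package>
"""


def reading_order(package_xml):
    """Return the file names of the spine entries in default reading order."""
    root = ET.fromstring(package_xml)
    # Map each manifest id to the file it names.
    hrefs = {item.get("id"): item.get("href") for item in root.iter("item")}
    # Follow the spine, resolving each idref through the manifest.
    return [hrefs[ref.get("idref")] for ref in root.iter("itemref")]


print(reading_order(SAMPLE_PACKAGE))  # ['chapter1.smil', 'chapter2.smil']
```

The order of entries in the manifest carries no meaning in this sketch; only the spine determines the sequence in which the SMIL files are presented.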

Element	Description
`a`	contains an anchor, which is used in two ways: A name anchor identifies an exact position in a given document. A link anchor, when activated, "links to" (repositions the focus to) another location within that document or another document. [HTML 4.0]
`abbr`	designates an abbreviation, a shortened form of a word. For example: Mr., approx., lbs., rec'd.
`acronym`	marks a word formed from key letters (usually initials) of a group of words. For example: UNESCO, NATO, XML.
`address`	contains a location at which a person or agency may be contacted. [HTML 4.0]
`author`	identifies the writer of a given work.
`base`	contains the base URI from which local references start. It acts as an absolute URI that serves as the base URI for resolving relative URIs found within the document. It is an empty element that may appear only in <head>. [HTML 4.0]
`bdo`	is used in special cases where the automatic actions of the bidirectional algorithm would result in incorrect display. [HTML 4.0]
`blockquote`	indicates a block of quoted content that is set off from the surrounding text by paragraph breaks. Compare with <q> which marks short, inline quotations. [HTML 4.0]
`bodymatter`	consists of the text proper of a book, as opposed to preliminary material <frontmatter> or supplementary information <rearmatter>.
`book`	surrounds the actual content of the document, which is divided into <frontmatter>, <bodymatter>, and <rearmatter>. <head>, which contains metadata, precedes <book>.
`br`	marks a forced line break. [HTML 4.0]
`caption`	describes a table. If used, it must follow immediately after the table start tag. [HTML 4.0]
`citation`	marks a reference to another document.
`code`	designates a fragment of computer code. [HTML 4.0]
`col`	is a means to apply attribute values to table columns. [HTML 4.0]
`colgroup`	is a group of columns that may share attribute values within a table. [HTML 4.0]
`dd`	marks a definition of a term within a definition list. [HTML 4.0]
`dfn`	marks the first occurrence of a word or term that is defined or explained elsewhere in a book. [HTML 4.0]
`div`	is a generic container for subdivisions of a book. The <level1> ... <level6> hierarchy, or the <level> tag used recursively, should mark the major hierarchical structures of a book, while <div> is used in less formal circumstances or when for production purposes it is desired that a structure should be treated differently. The class attribute identifies the actual name (e.g., part, chapter, letter) of the structure it marks. Compare with <span> which is used in inline settings. [HTML 4.0]
`dl`	contains a definition list, usually consisting of pairs of terms <dt> and definitions <dd>. Any definition can contain another definition list. [HTML 4.0]
`doctitle`	marks the title of the book, as the first tag within <frontmatter>. It is used to quickly identify the book.
`dt`	marks a term in a definition list. [HTML 4.0]
`dtbook3`	is the root element in the Digital Talking Book 3.0 DTD. Contains metadata in <head> and the document itself in <book>.
`em`	indicates emphasis. Compare with <strong>. [HTML 4.0]
`frontmatter`	contains preliminary material such as the copyright notice, foreword, acknowledgments, table of contents, etc. which serves as a guide to the contents and nature of a book.
`h1`	contains the text of the heading for a <level1> structure. [HTML 4.0 but nested]
`h2`	contains the text of the heading for a <level2> structure. [HTML 4.0 but nested]
`h3`	contains the text of the heading for a <level3> structure. [HTML 4.0 but nested]
`h4`	contains the text of the heading for a <level4> structure. [HTML 4.0 but nested]
`h5`	contains the text of the heading for a <level5> structure. [HTML 4.0 but nested]
`h6`	contains the text of the heading for a <level6> structure. [HTML 4.0 but nested]
`hd`	marks the text of a heading in a <list> or <sidebar>.
`head`	contains metainformation about the book but no actual content of the book itself, which is placed in <book>. This information is consonant with the head information in XHTML. See: http://www.w3.org/ [HTML 4.0]
`hr`	is an empty element indicating a horizontal rule. May be used to indicate a break in the text where only blank lines, a row of asterisks, a horizontal line, etc. are used in the print book. [HTML 4.0]
`img`	marks a visual image. The "src" attribute specifies the location of the image file. The "alt" and "longdesc" attributes may be used to supply short and long descriptions, respectively, although <prodnote> will generally contain the latter. The "longdesc" attribute may contain a pointer to the <prodnote>. The referencing is typically of the form <imgcaption imgref="#yyy">The Caption</imgcaption> containing the printed caption for the <img id="yyy">. [HTML 4.0]
`imgcaption`	contains the caption for one or more <img>. If the caption applies to more than one <img>, each idref in the list of IDs in the imgref is separated by whitespace.
`imggroup`	provides a container for <img> and its associated <imgcaption> and <prodnote>. <prodnote> will contain a description of the image. Content model allows multiple <img> if they share a caption, multiple <imgcaption> if several captions refer to a single <img>, and multiple <prodnote> if different versions are needed for different media (e.g., large print, braille, etc.)
`kbd`	designates information that the reader is to input directly into a computer using the keyboard. [HTML 4.0]
`level`	is an alternative tag for marking the major structures in a book. It may be used recursively, i.e., repeated indefinitely with each successive occurrence nesting within the previous. It may also be included in a subsequent higher level. The class attribute identifies the actual name (e.g., part, chapter, section, subsection) of the structure it marks. The depth attribute indicates the nesting depth, starting at 1. Subordinate levels have greater depth.
`level1`	is the highest level container of major divisions of a book. Used in <frontmatter>, <bodymatter>, and <rearmatter> to mark the largest divisions of the book (usually parts or chapters), inside which level2 subdivisions (often sections) may nest. The class attribute identifies the actual name (e.g., part, chapter) of the structure it marks.
`level2`	contains subdivisions that nest within <level1> divisions. The class attribute identifies the actual name (e.g., subpart, chapter, subsection) of the structure it marks.
`level3`	contains subdivisions that nest within <level2> subdivisions (e.g., subsections within subsections). The class attribute identifies the actual name (e.g., section, subpart, subsubsection) of the subordinate structure it marks.
`level4`	contains subdivisions that nest within <level3> subdivisions. The class attribute identifies the actual name of the subordinate structure it marks.
`level5`	contains subdivisions that nest within <level4> subdivisions. The class attribute identifies the actual name of the subordinate structure it marks.
`level6`	contains subdivisions that nest within <level5> subdivisions. The class attribute identifies the actual name of the subordinate structure it marks.
`levelhd`	contains the text of a heading within <level>. Corresponds to <h1> through <h6> used in <level1> through <level6>.
`li`	marks each list item in a <list>. <li> content may be either inline or block and may include other nested lists. Alternatively it may contain a sequence of list item components, <lic>, that identify regularly occurring content, such as the heading and page number of each entry in a table of contents. [HTML 4.0]
`lic`	("list item component") allows ordered substructure within a list item <li>. Used when a list item is made up of two or more components, as in a table of contents entry.
`line`	marks a single logical line of text. Often used in conjunction with <linenum> in documents with numbered lines.
`linenum`	contains a line number in, for example, legal text.
`link`	is an empty element appearing in the <head> section of a document that establishes a connection between the current document and another document(s). The <link> element conveys relationship information (for example, "next" and "previous") that may be rendered by user agents in a variety of ways. [HTML 4.0]
`list`	contains a list. The "type" attribute can indicate whether a list is ordered or unordered.
`meta`	indicates metadata about the book. It is an empty element that may appear only in <head>. [HTML 4.0]
`noscript`	identifies an alternate method for carrying out a function when a playback device cannot execute a <script>. See <script>. [HTML 4.0]
`note`	marks a footnote, endnote, annotation, etc. The reference to the note within the text is marked with a <noteref>.
`noteref`	marks one or more characters that reference a footnote, endnote, or annotation <note>.
`notice`	contains a warning, caution, or other type of admonition normally found in the margin of a book. Differs from a sidebar in that a notice must be presented at a specific location within the text and its presentation is not optional.
`object`	marks an embedded object, which may consist of scripts, applets, images, etc. [HTML 4.0]
`p`	contains a paragraph. [HTML 4.0]
`pagenum`	contains a page number from the print document, recorded as the first text object on a page. The "page" attribute allows three types of page numbering schemes to be identified: "normal" arabic numbering in the body of the book, "front" pages (from the frontmatter), and "special" pagination schemes such as hyphenated numbers in appendices.
`param`	provides a named property for <object>. [HTML 4.0]
`prodnote`	contains language added to the alternative-format version by the producers; commonly used to provide verbal descriptions of visual elements such as charts, graphs, etc.; to supply operating instructions; or to describe differences between the print book and the audio version.
`q`	contains a short, inline quotation. Compare with <blockquote> which marks a longer quotation set off from the surrounding text. [HTML 4.0]
`rearmatter`	contains supplementary material such as appendices, glossaries, bibliographies, and indices following the <bodymatter> of the book.
`samp`	contains a sample of work created by the author for use as an example or template. For example, a sample business letter, resume, computer program output, or form. [HTML 4.0]
`script`	contains a script, a program that may accompany a document or be embedded directly in it. The program executes on the client's machine when the document loads, or at some other time such as when a link is activated. See <noscript> for an alternative in case the <script> cannot be executed. [HTML 4.0]
`sent`	marks a sentence.
`sidebar`	contains information supplementary to the main text and/or narrative flow and is often boxed and printed apart from the main text block on a page.
`span`	is a generic container for use in inline settings when no specific tag exists for a given situation. The class attribute may describe the nature of the text it marks (e.g., a typographical error). May be used to mark a class of items to which styles are to be applied. Compare with <div> which is used in block settings. [HTML 4.0]
`strong`	marks stronger emphasis than <em>. [HTML 4.0]
`style`	is a means of including styling information that applies to the book. It may appear only in <head>. [HTML 4.0]
`sub`	indicates a subscript character (printed below a character's normal baseline). Can be used recursively and/or intermixed with <sup>. [HTML 4.0]
`sup`	marks a superscript character (printed above a character's normal baseline). Can be used recursively and/or intermixed with <sub>. [HTML 4.0]
`table`	contains a table: data arranged in rows and columns. [HTML 4.0]
`tbody`	marks a group of rows in the main body of a table. If the table is divided into several sections, each consisting of a number of rows, each section would be separately tagged with <tbody>. [HTML 4.0]
`td`	indicates an individual data cell in the body of a table. [HTML 4.0]
`tfoot`	marks table footer information, consisting of one or more rows (each marked with <tr>). [HTML 4.0]
`th`	indicates a table cell containing header information. [HTML 4.0]
`thead`	marks header information in a table, consisting of one or more rows (each marked with <tr>) of <th> cells. [HTML 4.0]
`title`	contains the title of the book but is used only as metainformation in <head>. Use <doctitle> within <book> for the actual book title, which will usually be the same. [HTML 4.0]
`tr`	marks one row of a table containing <th> or <td> cells. [HTML 4.0]
`var`	indicates an instance of a variable or program argument. Commonly used as a placeholder for text to be entered by the user. [HTML 4.0]
`w`	marks a word.
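
(This example is informative)

The fragment below is a minimal sketch of how several of the tags described above might be combined. It has not been validated against the DTBook3 DTD in Appendix 5, so the nesting shown is an assumption. The element names, the class and "type" attributes, and the "page" attribute with its "normal" value are taken from the descriptions above. The id and idref attribute names, the class value "chapter", the "type" value "ol", and all text content are hypothetical and are used only for illustration.

```xml
<!-- Illustrative sketch only; not validated against the DTBook3 DTD. -->
<level1 class="chapter">
  <!-- page="normal": a page number from the body of the book -->
  <pagenum page="normal">17</pagenum>
  <h1>Sample Chapter Heading</h1>

  <p>
    <sent>A first sample sentence with <strong>strong</strong> emphasis.</sent>
    <!-- idref is an assumed attribute name pointing at the note below -->
    <sent>A second sample sentence.<noteref idref="fn1">1</noteref></sent>
  </p>
  <note id="fn1">
    <p>Sample footnote text.</p>
  </note>

  <!-- type="ol" is an assumed value meaning "ordered" -->
  <list type="ol">
    <!-- Each entry has two components: a heading and a page number -->
    <li><lic>Sample Entry One</lic><lic>1</lic></li>
    <li><lic>Sample Entry Two</lic><lic>17</lic></li>
  </list>

  <!-- A numbered line, as in legal text -->
  <line><linenum>1</linenum> Sample line of text.</line>

  <table>
    <thead>
      <tr><th>Sample Header A</th><th>Sample Header B</th></tr>
    </thead>
    <tbody>
      <tr><td>Sample cell 1</td><td>Sample cell 2</td></tr>
    </tbody>
  </table>
</level1>
```

In a list such as this one, the two <lic> components in each <li> keep the heading and the page number of an entry distinct, as in a table of contents entry.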

DTB Type	OPF	NCX	Audio	Text	SMIL	Image
Full audio only	R	R	R	N/A	R	N/A
Full audio+structure	R	R	R	N/A	R	N/A
Audio+structure+partial text	R	R	R	R	R	O
Audio+structure+full text	R	R	R	R	R	O
Full text+structure+partial audio	R	R	R	R	R	O
Full text+structure, no audio	R	R	N/A	R	R	O
Full text only	R	R	N/A	R	R	O

(This section is informative)

1.5 Patent Rights

(This section is informative)

3. The DTB Package File

(This section is normative)

(This section is informative)

3.1 Package Identity

(This section is normative)

(This section is informative)

3.2 Publication Metadata

(This section is normative)

3.2.1 Dublin Core Metadata

(This section is normative)

3.2.2 X-Metadata

(This section is normative)

(This section is informative)

3.3 Manifest

(This section is normative)

(This section is informative)

3.4 Spine

(This section is normative)

(This section is informative)

3.5 Tours

(This section is informative)

3.6 Guide

(This section is informative)

4. Content Format for Text

4.1 Introduction

(This section is normative)

4.2 Using the DTBook3 Element Set

(This section is informative)

4.2.1 Modular Extension of the DTD

(This section is informative)

4.3 Playback Systems and DTBook3

(This section is informative)

4.4 DTBook3 Tags

(This section is informative)

5. Audio File Formats

(This section is normative)

5.1 Introduction

(This section is normative)

5.2 Distribution File Formats

(This section is normative)

6. Other Media File Formats

(This section is normative)

7. Synchronization of Media Files

7.1 Introduction

(This section is informative)

7.2 SMIL Modules

(This section is informative)

(This section is normative)

7.3 SMIL Elements

(This section is informative)

7.3.1 Common Attributes

(This section is informative)

7.4 SMIL Production Issues

7.4.1 "Escapable" Structures

(This section is normative)

7.4.2 Automatic Invocation of Special Navigation Modes

(This section is normative)

7.4.3 "Skippable" Structures

(This section is normative)

7.4.4 Packaging Files across Several Distribution Media

(This section is normative)

7.4.5 Use of CustomTest Element and Attribute

(This section is normative)

7.4.6 Links

(This section is normative)

7.5 SMIL Metadata

(This section is normative)

7.6 SMIL Examples

(This section is informative)

7.7 Clock Values

(This section is normative)

8. Navigation Control (NCX and LWN)

8.1 Introduction

(This section is informative)
