[This local archive copy is from the official and canonical URL, http://www.openebook.org/OEB1.html; please refer to the canonical source document if possible.]
Open eBook™
Publication Structure 1.0
September 16, 1999
Please be advised that this work is protected under Title 17 of the United States Code. Reproduction or dissemination of this work with changes is prohibited except with the express permission of the authors.
1.4 Relationship to Other Specifications
1.4.2 Relationship to XML Namespaces
1.4.5 Relationship to Dublin Core
1.5.1 Document and Publication Conformance
1.5.2 Reading System Conformance
1.5.3 Compatibility with Future Versions
2.2.2 <dc:Creator> </dc:Creator>
2.2.3 <dc:Subject> </dc:Subject>
2.2.4 <dc:Description> </dc:Description>
2.2.5 <dc:Publisher> </dc:Publisher>
2.2.6 <dc:Contributor> </dc:Contributor>
2.2.9 <dc:Format> </dc:Format>
2.2.10 <dc:Identifier> </dc:Identifier>
2.2.11 <dc:Source> </dc:Source>
2.2.12 <dc:Language> </dc:Language>
2.2.13 <dc:Relation> </dc:Relation>
2.2.14 <dc:Coverage> </dc:Coverage>
2.2.15 <dc:Rights> </dc:Rights>
3.2 Generally Applicable Attributes
3.3 Rendering on Limited Reading Systems
3.6.4 charset, type, hreflang, accesskey, shape, coords, tabindex, onfocus, onblur
3.7.6 tabindex, accesskey, onfocus, onblur
3.11 <blockquote> </blockquote>
3.12.3 link, vlink, alink, background
3.15 <center> </center> (deprecated)
3.24 <font> </font> (deprecated)
3.25 <h1> </h1> through <h6> </h6>
3.26.2 id, style, class, title
3.28.1 id, style, class, title
3.35.5 id, style, class, title
3.36.7 align, height, width, border, hspace, vspace (deprecated)
3.36.9 declare, standby, tabindex, name
3.44.1 charset, type, src, defer, event, for
3.44.2 id, style, class, title
3.47 <strike> </strike> (deprecated)
3.52.4 cellspacing, cellpadding
3.53.5 width (deprecated), height (deprecated)
3.54.1 id, style, class, title
4.1 background-color: and color:
4.13 margin-left: and margin-right:
4.14 margin-top: and margin-bottom:
4.17 vertical-align: (meaning and applicability changed)
APPENDIX A: ELEMENT TYPE TABLE
APPENDIX B: THE OEB PACKAGE DTD
The purpose of the Open eBook Publication Structure is to provide a specification for representing the content of electronic books. Specifically:
• The specification is intended to give content providers (e.g., publishers, and others who have content to be displayed) and tool providers minimal and common guidelines which ensure fidelity, accuracy, accessibility, and presentation of electronic content over various electronic book platforms.
• The specification seeks to reflect established content format standards.
• The goal of this specification is to provide the purveyors of electronic-book content (publishers, agents, authors et al.) a format for use in providing content to multiple reading systems.
This specification is based on the premise that in order for electronic-book technology to achieve widespread success in the marketplace, reading systems must have convenient access to a large number and variety of titles.
This specification has been developed through a cooperative effort, bringing together publishers, reading system vendors, software developers, and experts in the relevant standards. This "Open eBook Authoring Group" is composed of:
Facilitator:
Victor McCrary, National Institute of Standards and Technology, U.S. Department of Commerce
Authors:
Jeff Alger, Microsoft Corporation
Garth Conboy, SoftBook Press
Horace Dediu, BCL Computers
Steve DeRose, Brown University Scholarly Technology Group
Kimmo Djupsjöbacka, Nokia
Brady Duga, SoftBook Press
Jerry Dunietz, Microsoft Corporation
David Goldstein, Versaware Inc.
Gene Golovchinsky, FX Palo Alto Laboratory, Inc.
Markku Hakkinen, The Productivity Works
Gunter Hille, Project Gutenberg-DE
Kate Hughes, Microsoft Corporation
George Kerscher, DAISY Consortium
Steve Kotrch, Simon & Schuster
Brian Lambert, Glassbook Inc.
Jon Noring, Exemplary Technologies
Aleksey Novicov, SoftBook Press
David Ornstein, NuvoMedia, Inc.
Dhiren Patel, NuvoMedia, Inc.
Steve Potash, OverDrive Systems
Allen Renear, Brown University Scholarly Technology Group
Mike Riley, R.R. Donnelley & Sons Company
Ben Trafford, Exemplary Technologies
John Voiklis, Red Figure, Inc.
Garret Wilson, GlobalMentor, Inc.
Other Contributing Organizations:
Adobe
EAST Co., Ltd
IBM
Librius
Vadem
Basic OEB Document
An OEB document which restricts itself to the constructs defined in the specification.
Content Provider
A publisher, author, or other information provider, who provides a publication to one or more reading systems in the form described in this specification.
Deprecated
A feature that is permitted, but not recommended, by this specification. Such features may become obsolete in future revisions.
Extended OEB Document
An OEB document which uses constructs beyond those in this specification, but uses the extension mechanism defined herein.
OEB Core Media Type
A MIME media type that all reading systems must support.
OEB Document
An XML document which conforms to this specification.
OEB Package
A file that describes an OEB publication. It identifies all other files in the publication and provides descriptive and access information about them.
OEB Publication
A collection of OEB documents and other files, typically in a variety of media types, including structured text and graphics, that constitutes a cohesive unit for publication.
Reader
A person who reads a publication.
Reading Device
The physical platform (hardware and software) on which publications are rendered.
Reading System
A combination of hardware and/or software that accepts OEB publications, and directly or indirectly makes them available to readers. Great variety is possible in the architecture of reading systems. A reading system may be implemented entirely on one device, or it may be split among several computers. In particular, a reading device that is part of a larger reading system need not directly accept OEB publications, but all reading systems must do so. Reading systems may include additional processing functions beyond the scope of this specification, such as compression, indexing, encryption, rights management, and distribution.
1.4 Relationship to Other Specifications
This specification combines subsets and applications of other specifications. Together, these facilitate the construction, organization, presentation, and unambiguous interchange of electronic documents:
1. the XML 1.0 markup meta-language (http://www.w3.org/TR/REC-xml);
2. the XML namespace specification (http://www.w3.org/TR/REC-xml-names);
3. the HTML 4.0 document content markup language (http://www.w3.org/TR/REC-html40), with consideration of the pending XHTML 1.0 specification (http://www.w3.org/TR/xhtml1/);
4. the CSS 1 stylesheet language (http://www.w3.org/TR/REC-CSS1), with a very few properties also from CSS 2 (http://www.w3.org/TR/REC-CSS2);
5. the Dublin Core metadata language (http://purl.org/dc/) and the USMARC relator code list (http://www.loc.gov/marc/relators/re9802r1.html);
6. the Unicode character set (http://www.unicode.org);
7. particular MIME media types (http://www.ietf.org/rfc/rfc1738.txt).
OEB is based on XML because of its generality and simplicity, and because this increases the likelihood that documents will survive longer. XML also provides well-defined rules for the syntax of documents, which decreases the cost to implementers and reduces incompatibility across systems. Further, XML enables extensibility because it is not tied to any particular set of element types, it supports internationalization, and it encourages document markup that can represent a document’s internal parts more directly, making them amenable to formatting and other types of computer processing.
OEB reading systems must be XML processors as defined in XML 1.0. All OEB documents must be well-formed XML documents, although they need not be valid XML documents. However, this specification ensures that for any basic OEB document, there is a syntax form that:
• is a valid XML document,
• conforms fully to the OEB document DTD,
• is expected to conform to XHTML 1.0 when that specification is issued, and
• is effectively previewable in typical version 4 HTML browsers.
The last point above does not claim full HTML 4.0 conformance for one primary reason: the XML empty element syntax works in HTML 4.0 browsers in practice, but it is not part of the formal HTML specifications prior to XHTML 1.0.
XML well-formedness requires characteristics beyond what HTML browsers typically require, such as:
• Elements must be bounded by both start- and end-tags;
• Elements must nest properly, with no overlaps;
• Attribute values must be quoted;
• Attribute assignments must use the non-minimized form (unlike some "border" usages);
• All "<" and "&" characters intended as content must be escaped as "&lt;" and "&amp;";
• All element names and attribute names must be consistent in case (all OEB 1.0 names are, as expected for XHTML 1.0, lower-case);
• All empty elements must use the XML empty element syntax (this specification also requires whitespace before the trailing slash, although such space is optional in XML; for example, "<br />").
Empty elements are those (such as the HTML br and hr elements) that permit no content. The XML and formal HTML syntaxes for these are incompatible, though the XML form with whitespace before the trailing slash is accepted by most HTML browsers (and is strictly conformant XML, as XML ignores whitespace within tags). For this reason, this specification requires this (conforming) variation of the XML form (for example, "<br />"); this is the most portable syntax and it contributes the most to document longevity, even though, strictly speaking, it is not valid in HTML.
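The well-formedness rules above can be checked mechanically with any XML parser. The following is a minimal sketch in Python using the standard library's ElementTree module; the helper name is_well_formed and the sample fragments are illustrative assumptions, not part of this specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical HTML habits that XML rejects: a bare "&", an unclosed <br>,
# and an unquoted attribute value.
html_style = '<p>Fish & chips<br><img src=cover.png></p>'

# The same fragment rewritten to follow the rules listed above.
oeb_style = '<p>Fish &amp; chips<br /><img src="cover.png" /></p>'

print(is_well_formed(html_style))  # False
print(is_well_formed(oeb_style))   # True
```

Note that a generic XML parser also accepts "<br/>" without the space; the whitespace before the trailing slash is an additional OEB requirement that a parser alone will not enforce.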
Syntactic transformation from valid HTML to well-formed XML is trivial (though semantic transformations that also add brand-new structure and information value may not be). Transformation from invalid but moderately clean HTML is also usually an easy process and easily automated: several free tools already exist for this, such as "Tidy" (see http://www.w3.org/People/Raggett/tidy/). Transformation from extremely dirty HTML to XML, however, is of unpredictable complexity.
Not all well-formed XML 1.0 documents are conformant OEB documents. This is because this specification imposes further constraints in order to improve interoperability. These constraints are the "OEB Common Requirements," defined below.
This specification defines two XML DTDs – the package DTD and the basic OEB document DTD. The package forms the "root" of a complete publication, and reading systems should use it to find and organize publication components. The basic OEB document DTD formally defines the HTML subset described in this specification.
1.4.2 Relationship to XML Namespaces
This version of the specification does not demand that reading systems process XML namespace prefixes according to the XML Namespaces Recommendation at http://www.w3.org/TR/REC-xml-names.
Namespace prefixes are a method for prefixing element and attribute names to distinguish identical names that are drawn from different definition sets. A prefix is associated with a unique URI by an XML namespace declaration. Alternatively, a namespace declaration may identify a URI as the default namespace, applicable to elements lacking a namespace prefix. The XML namespace prefix is separated from the suffix element by a colon.
This specification forbids the use of an oeb: namespace prefix within OEB documents. The use of the dc: prefix, however, is required for Dublin Core metadata element attributes in the OEB package. An element within the OEB document that contains a namespace prefix is treated as an extended element, with the colon acting as a normal XML name character. Reading systems must recognize the colon as a valid name character within an OEB document.
For upwards compatibility, the metadata element in an OEB package is required to have the attributes xmlns:dc="http://purl.org/dc/elements/1.0/" and xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/". In addition, the Dublin Core elements will be declared in the OEB package DTD with an explicit prefix of dc:.
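As an illustration of the two views of a prefixed name, the sketch below parses a small package fragment twice with the Python standard library: once without namespace processing, where dc:Creator is simply a name containing a colon, and once with a namespace-aware parser that resolves the prefix to its URI. The fragment itself (the element name metadata, the creator value) is a hypothetical example; only the two xmlns attribute values are taken from the text above.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

DC_NS = "http://purl.org/dc/elements/1.0/"

# Hypothetical package fragment, for illustration only.
package_fragment = """<metadata
    xmlns:dc="http://purl.org/dc/elements/1.0/"
    xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/">
  <dc:Creator>Jane Example</dc:Creator>
</metadata>"""

# View 1: no namespace processing. The colon is an ordinary name character,
# so the parser reports the literal name "dc:Creator".
raw_names = []
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: raw_names.append(name)
parser.Parse(package_fragment, True)
print(raw_names)  # ['metadata', 'dc:Creator']

# View 2: namespace-aware parsing. The dc: prefix is resolved to its URI.
root = ET.fromstring(package_fragment)
creator = root.find(f"{{{DC_NS}}}Creator")
print(creator.text)  # Jane Example
```

Either view is enough to locate the Dublin Core elements, which is consistent with the statement that namespace processing is not demanded of reading systems.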
This specification recognizes the importance of current software tools, legacy data, publication practices, and market conditions, and so is based on HTML. This approach allows content providers to exploit current HTML content, tools, and expertise.
To minimize the implementation burden on reading system implementers (who may be working with devices that have power and display constraints), the publication structure does not include all HTML elements and attributes. The elements and attributes were selected from the HTML 4.0 specification and were chosen to be consistent with current directions in HTML and XHTML development and the emergence of XML. See Appendix A for a complete table of the element types in HTML 3.2, HTML 4.0, and OEB documents.
Any HTML construct deprecated in HTML 4.0 is either omitted from this specification or is deprecated; CSS-based equivalents are provided in such cases. Stylesheet constructs are also used for new functionality beyond that provided in HTML 4.0.
To achieve predictable results and to support upwards compatibility with future versions of this specification, it is strongly recommended that basic OEB documents be valid XML documents with respect to the OEB document DTD.
This specification defines a style language based on CSS 1 and CSS 2, with a media type of "text/x-oeb1-css". The OEB Authoring Group is aware that this definition of a media type goes against the recommendation of the CSS Working Group (see http://www.w3.org/TR/REC-CSS1), but has chosen to do so due to practical considerations.
The CSS-based stylesheet constructs in this specification have been included to define baseline rendering functionality. To minimize the burden on reading system developers and device manufacturers, not all CSS 1 or CSS 2 properties are included. A few additional properties and values have been added for supporting page layout, headers, and footers.
In a number of cases, this specification does not require reading systems to provide the full range of rendering that a standard CSS stylesheet might request. For example, some reading systems will use monochrome displays. It would neither be acceptable to limit all reading systems to monochrome, nor to declare color use a non-standardized extension beyond OEB. In such cases, the CSS settings are allowed, and keep their meanings; but a conforming reading system may gracefully degrade to a simpler rendering.
This specification supports the inline style attribute, the style element, and externally linked stylesheets. This specification does not require that any handling of XML namespaces be performed by the reading system in the processing of stylesheets.
Stylesheets can be associated with an OEB document in several ways:
1. by style attributes on specific HTML elements;
2. by style elements within the HTML header;
3. by an external stylesheet identified on a link element in the HTML head; and/or
4. by an external stylesheet identified via a processing instruction as defined in the W3C proposed recommendation "Associating stylesheets with XML documents" (http://www.w3.org/TR/xml-stylesheet), or its final form if and when it is adopted.
The relative priority of the first three cases is as defined for HTML 4.0 and CSS 2. Stylesheets linked via a processing instruction are treated as if they had been linked via HTML link elements preceding any actual HTML link elements. As defined in the Conformance section, if no stylesheet is defined or no applicable style is found for a given element, HTML rendering is the default as defined elsewhere in this specification and the HTML 4.0 specification.
Styles attached via the first two methods listed above may use only those CSS constructs defined as supported in Section 4 of this specification. External stylesheets linked via the HTML link element or by processing instructions, however, may use this or any other style language, such as XSL (see http://www.w3.org/TR/WD-xsl).
Only those CSS constructs defined as supported in Section 4 of this specification may be included in stylesheets of type "text/x-oeb1-css". Stylesheets of other MIME media types may be substituted for the text/x-oeb1-css stylesheets at the discretion of the reading system.
The HTML 4.0 specification groups externally linked stylesheets into sets by their titles (including a "persistent" set for which the title is the null string). This specification requires that at least one stylesheet in each such set must be of MIME media type "text/x-oeb1-css".
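The grouping rule above lends itself to a simple check. The sketch below, in Python, groups link elements by title and reports any set lacking a "text/x-oeb1-css" stylesheet. It is an illustration under stated assumptions: the function name, the sample document, and the media type "text/x-example-style" are hypothetical, and stylesheets attached by processing instruction are not considered.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OEB_CSS = "text/x-oeb1-css"


def stylesheet_sets_missing_oeb_css(document: str) -> list:
    """Return the titles of stylesheet sets that contain no OEB CSS stylesheet."""
    root = ET.fromstring(document)
    sets = defaultdict(set)
    for link in root.iter("link"):
        if "stylesheet" in link.get("rel", "").lower().split():
            # Untitled stylesheets form the "persistent" set (null title).
            sets[link.get("title", "")].add(link.get("type", ""))
    return sorted(title for title, types in sets.items() if OEB_CSS not in types)


sample = """<html><head>
  <link rel="stylesheet" type="text/x-oeb1-css" href="base.css" />
  <link rel="stylesheet" title="Large" type="text/x-oeb1-css" href="large.css" />
  <link rel="stylesheet" title="Large" type="text/x-example-style" href="large.alt" />
  <link rel="stylesheet" title="Fancy" type="text/x-example-style" href="fancy.alt" />
</head><body><p>Text</p></body></html>"""

print(stylesheet_sets_missing_oeb_css(sample))  # ['Fancy']
```

Here the persistent set and the "Large" set each include an OEB CSS stylesheet, while the "Fancy" set offers only the hypothetical alternative language and would therefore not satisfy the requirement.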
Reading systems that implement only the OEB CSS subset may ignore any stylesheets using other style languages. Reading systems that support extended stylesheet functionality may choose among any of the other external stylesheets. It is strongly recommended that unique MIME media types be defined for any novel stylesheet languages supported, and that stylesheets in those languages be detected by examining the MIME media type.
1.4.5 Relationship to Dublin Core
The Dublin Core is designed to minimize the burden of cataloging on authors and publishers, while providing enough data to be useful. This specification supports the entire current set of Dublin Core metadata elements, supplemented with a small set of additional attributes addressing areas where more specific information may be useful. For example, the role attribute added to the dc:Contributor element allows for much more detailed specification of contributors to a publication, including their roles expressed via relator codes.
Content providers must include a minimum set of metadata elements, defined in section 2.2, and should incorporate additional metadata to enable readers to discover publications of interest.
Publications may use the entire Unicode character set, in UTF-8 or UTF-16 encodings, as defined by Internet RFC 2279 (see http://info.internet.isi.edu/in-notes/rfc/files/rfc2279.txt). The use of Unicode facilitates internationalization and multilingual documents. However, reading systems are not required to provide glyphs for all Unicode characters.
Reading systems are required to parse all UTF-8 and UTF-16 characters properly (as required by XML). Reading systems may decline to display some characters, but must be capable of signaling in some fashion that undisplayable characters are present. They must not display Unicode characters merely as if they were 8-bit characters. For example, the biohazard symbol (0x2623) need not be supported by including the correct glyph, but must not be parsed or displayed as if its component bytes were the two characters "&#" (0x0026 0x0023).
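The biohazard example can be reproduced directly. The following Python sketch shows that the UTF-16 (big-endian) encoding of U+2623 consists of the bytes 0x26 and 0x23, which read as "&#" if each byte is wrongly treated as an 8-bit character. The render function and its "?" placeholder are a hypothetical signaling policy, shown only as one way a reading system might flag an undisplayable character.

```python
BIOHAZARD = "\u2623"

# The UTF-16 (big-endian) encoding of U+2623 is the byte pair 0x26 0x23.
utf16_bytes = BIOHAZARD.encode("utf-16-be")

# Wrong: treating each byte as a separate 8-bit character.
misread = utf16_bytes.decode("latin-1")
print(misread)  # &#

# Right: decoding the byte pair as a single UTF-16 code unit.
decoded = utf16_bytes.decode("utf-16-be")
print(decoded == BIOHAZARD)  # True


def render(text: str, available_glyphs: set, placeholder: str = "?") -> str:
    """Substitute a visible placeholder for characters that have no glyph."""
    return "".join(ch if ch in available_glyphs else placeholder for ch in text)


# A device whose font covers only a few Latin letters and the space.
print(render("Danger " + BIOHAZARD, set("Danger ")))  # Danger ?
```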
This specification defines a list of "core" MIME media types that all reading systems must support and publications may include. Publications may include resources of other media types, but for each such resource must include an alternative resource of a core MIME media type (using methods defined in this specification).
The core OEB MIME media types are:
| Media Type | Reference | Description |
| image/jpeg | RFC 2046 | Used for raster graphics |
| image/png | RFC 2083 | Used for raster graphics |
| text/x-oeb1-document | this specification | Used for basic or extended OEB documents |
| text/x-oeb1-css | this specification | Used for OEB CSS-subset stylesheets |
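The fallback rule for non-core media types can be sketched as a check over a publication's resources. The Python example below is an assumption-laden illustration: the resource model (a dictionary mapping an identifier to a media type and an optional fallback identifier) and the function name are hypothetical, and the actual mechanism for declaring alternatives is defined elsewhere in this specification. The sketch also assumes that a chain of fallbacks is acceptable as long as it ends at a core media type.

```python
CORE_MEDIA_TYPES = {
    "image/jpeg",
    "image/png",
    "text/x-oeb1-document",
    "text/x-oeb1-css",
}


def resources_without_core_fallback(resources: dict) -> list:
    """Return ids of resources that never reach a core media type.

    resources maps id -> (media_type, fallback_id or None); hypothetical model.
    """
    missing = []
    for rid, (media_type, fallback) in resources.items():
        seen = {rid}
        while media_type not in CORE_MEDIA_TYPES:
            # Stop on a missing, unknown, or circular fallback.
            if fallback is None or fallback in seen or fallback not in resources:
                missing.append(rid)
                break
            seen.add(fallback)
            media_type, fallback = resources[fallback]
    return missing


publication = {
    "cover": ("image/jpeg", None),
    "chart": ("image/gif", "chart-png"),
    "chart-png": ("image/png", None),
    "clip": ("video/mpeg", None),
}

print(resources_without_core_fallback(publication))  # ['clip']
```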
This section defines conformance for OEB documents, publications, and reading systems.
1.5.1 Document and Publication Conformance
This specification defines two named levels of conformance for OEB documents–basic and extended, and one conformance level for OEB publications.
A conformant OEB document (both basic and extended), or conformant OEB package file, must meet these necessary conditions, referred to in this specification as the "common requirements":
(i) it is a well-formed XML document (as defined in XML 1.0);
(ii) it begins with a correct XML declaration (e.g. <?xml version='1.0'?>);
(iii) it is encoded in UTF-8 or UTF-16;
(iv) for empty elements it uses only the XML empty element syntax with whitespace before the trailing slash;
(v) it does not include an internal declaration subset; and
(vi) any attribute with a value of NMTOKEN, ID, or IDREF must be an XML name.
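Several of the common requirements can be approximated by an automated check. The Python sketch below covers conditions (i) through (v) only; the function name and message strings are hypothetical. Conditions (iv) and (v) are tested with regular expressions, a heuristic that can misfire on comments or CDATA sections, and UTF-16 input is recognized only by its byte order mark. Condition (vi) requires DTD knowledge and is not attempted.

```python
import re
import xml.etree.ElementTree as ET


def common_requirement_problems(data: bytes) -> list:
    """Return messages for common requirements (i)-(v) that appear violated."""
    problems = []

    # (iii) A byte order mark signals UTF-16; otherwise assume UTF-8.
    has_utf16_bom = data[:2] in (b"\xff\xfe", b"\xfe\xff")
    try:
        text = data.decode("utf-16" if has_utf16_bom else "utf-8-sig")
    except UnicodeDecodeError:
        return ["(iii) not decodable as UTF-8 or UTF-16"]
    declared = re.match(r"<\?xml[^>]*\bencoding=[\"']([^\"']+)[\"']", text)
    if declared and declared.group(1).lower() not in ("utf-8", "utf-16"):
        # Later checks assume one of the two permitted encodings.
        return ["(iii) declared encoding is not UTF-8 or UTF-16"]

    # (ii) The document must begin with an XML declaration.
    if not re.match(r"<\?xml\s", text):
        problems.append("(ii) missing XML declaration")

    # (i) Well-formedness, as judged by the standard library parser.
    try:
        ET.fromstring(data)
    except ET.ParseError:
        problems.append("(i) not well-formed XML")

    # (iv) Empty-element tags need whitespace before the trailing slash.
    if re.search(r"<[A-Za-z_][^<>]*(?<!\s)/>", text):
        problems.append("(iv) empty element without whitespace before the slash")

    # (v) No internal declaration subset: no "[" inside the DOCTYPE.
    if re.search(r"<!DOCTYPE[^>\[]*\[", text):
        problems.append("(v) internal declaration subset present")

    return problems


good = b"<?xml version='1.0'?>\n<html><body><p>Line<br />break</p></body></html>"
bad = b"<html><body><p>Line<br/>break</p></body></html>"

print(common_requirement_problems(good))  # []
print(len(common_requirement_problems(bad)))  # 2
```

The second sample is well-formed XML, yet it fails requirements (ii) and (iv), which illustrates that the common requirements are stricter than XML well-formedness alone.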