The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: July 15, 1998
Megginson, Structuring XML Documents. Contents.

Structuring XML Documents
by David Megginson

[Volume Description and Table of Contents]

Megginson, David. Structuring XML Documents. Charles F. Goldfarb Series on Open Information Management. [Subseries:] The Definitive XML Series from Charles F. Goldfarb. Upper Saddle River, NJ: Prentice Hall PTR, [March] 1998. Extent: xxxviii + 425 pages, CDROM. ISBN: 0-13-642299-3. Price: US $39.95.

A volume description and provisional Table of Contents for David Megginson's book Structuring XML Documents are provided below. See the full bibliography entry for a publisher's description of the work and other details; see also the "Prentice-Hall SGML Series" web page. David Megginson is the senior architect with Microstar Software Ltd. (and principal in Megginson Technologies Ltd.), and is the design lead for SAX, the Simple API for XML, a common event-based XML API now in use by many parsers and applications. Other published works by Megginson are listed on the author's Home Page. He may be contacted by email at


Structuring XML Documents is not a beginner's tutorial on XML, but a book written at an intermediate to advanced level, designed to help application designers build XML/SGML DTDs that work in real-world document systems. The author engages rigorously with five major industry-standard DTDs -- ISO 12083, DocBook, Text-Encoding Initiative (TEI), MIL-STD-38784 (CALS), Hypertext Markup Language (HTML 4.0) -- to illustrate how the necessary customizations and extensions can be implemented to support enterprise document processing objectives.

Structuring XML Documents is designed to help users apply XML and SGML to solve their document structuring problems. Specifically, readers will learn to: "1) analyze DTDs and adapt them for their specific processing needs; 2) build DTDs that are easier for others to learn, use, and process; 3) ensure structural compatibility throughout their collection of enterprise DTDs; 4) use the new Architectural Forms standard to simplify complex DTD problems." [adapted from the back cover] The book's primary features, according to the front cover description: "1) Covers XML and Full SGML; 2) [Provides] the Expert's Guide to DTD Development; 3) [Helps] Leverage the Power of Architectural Forms; 4) Up to Date: Based on XML 1.0; 5) Companion CD-ROM Includes State-of-the-Art DTDs Plus XML Parsing Tools."

Structuring XML Documents is organized in four major parts: Part 1: Background, Part 2: Principles of DTD Analysis, Part 3: Advanced Issues in DTD Maintenance and Design, Part 4: DTD Design with Architectural Forms. Part 1 provides the reader with a review of XML/SGML DTD syntax sufficient to support an understanding of advanced topics treated in the remainder of the book; it also introduces the five industry DTDs that are to be used as models elsewhere in the book. Part 2 develops general principles for design and analysis using XML/SGML DTDs, as applicable to the collaborative work of writers, editors, and engineers. Part 3 of the book examines advanced topics in DTD design and maintenance, including: building compatibility between various versions of DTDs, document disassembly and reassembly, and DTD customization. Part 4, "DTD Design with Architectural Forms," illustrates the use of Architectural Forms and architecture processing relevant to SGML/XML documents, as recently standardized in the SGML Extended Facilities. The three chapters of Part 4 introduce Architectural Form processing, explain the most important features of the syntax, and address some advanced architectural-form constructs for difficult situations. In addition to a General Index (Appendix B), Appendix A of the book provides a detailed method for accessing the elements and attributes discussed in the industry DTDs: "Model DTDs: Index of Element Types and Attributes."

The companion CDROM for Structuring XML Documents provides several resources which enhance the value and usefulness of the book: 1) Free XML/SGML software and a live parsing demo; 2) HTML (live) links for the latest information on XML and SGML at the time of release; 3) Index of URLs mentioned in the book, organized by chapter; 4) Information on the five model DTDs used in the book, with links to local copies of four of them; 5) The standard ISO character entity sets for SGML.

The sub-series title of Structuring XML Documents -- "The Definitive XML Series from Charles F. Goldfarb" -- reflects the recent bifurcation of the primary Goldfarb series ("The Charles F. Goldfarb Series on Open Information Management"), at least for categorization, into "XML Titles" and "SGML Titles." In this scheme, Megginson's book and The SGML Buyer's Guide are members of the former set, along with other XML titles "coming soon" from well-recognized authors: XML by Example, by Sean McGrath; The XML Handbook, by Charles Goldfarb and Paul Prescod; Designing XML Internet Applications, by Michael Leventhal, David Lewis and Matthew Fuchs; The XML and SGML Cookbook: Recipes for Structured Information, by Rick Jelliffe. The subseries description, as printed, reads: "As XML is a subset of SGML, the Series List is categorized to show the degree to which a title applies to XML. 'XML Titles' are those that discuss XML explicitly and may also cover full SGML. 'SGML Titles' do not mention XML per se, but the principles covered may apply to XML."

Table of Contents

Foreword, by Charles F. Goldfarb

0. Introduction
  0.1. XML and SGML
  0.2. The Book's Structure
  0.3. Notations and Conventions
    0.3.1. Presentation of Examples
    0.3.2. Typographical Conventions

Part 1: Background

Chapter 1. Review of DTD Syntax
  1.1. Document type declaration
  1.2. Elements
    1.2.1. Element Type
    1.2.2. Content Specification
      Content Model
      Mixed Content
      Element Content
      Content Particles
      The ANY Keyword
      The EMPTY Keyword
    1.2.3. SGML: Elements
      Multiple Element Types
      Omitted Tag Minimization
      Exceptions
      Declared Content
      Mixed Content
      Unordered Content
  1.3. Attributes
    1.3.1. Attribute Type
      String Type
      Tokenized Types
      Enumerated Types
      NOTATION Attributes
    1.3.2. Default Value
      Literal Values
      Keywords
    1.3.3. Multiple Declarations
    1.3.4. SGML: Attributes
      Attribute Types
      Attribute Default Values
      Multiple Attribute Definition Lists
      Global Attributes
  1.4. Entities
    1.4.1. Entity Location
    1.4.2. Entity Definitions
    1.4.3. Entity Boundaries
    1.4.4. SGML: Entities
      Default Entity
      External Identifiers
      Data Text
      External Entity Types
  1.5. Notations
    1.5.1. Notation Declarations
    1.5.2. SGML: Notations
      Data Attributes
  1.6. Conditional Sections
  1.7. Processing Instructions
    1.7.1. Why bother with Processing Instructions?
    1.7.2. SGML: Processing Instructions
      PI Entities

Chapter 2. Model DTDs
  2.1. Reading about the Model DTDs
    2.1.1. Sample Documents
  2.2. A Note on Using Industry-Standard DTDs
  2.3. The Five Model DTDs
    2.3.1. ISO 12083
      Background
      Quick Tour
      What's on Top?
      What's in the Middle?
      What's on the Bottom?
      Sample Document
      Availability
    2.3.2. DocBook
      Background
      Quick Tour
      What's on Top?
      What's in the Middle?
      What's on the Bottom?
      Sample Document
      Availability
    2.3.3. Text-Encoding Initiative (TEI)
      Background
      Full TEI
      Quick Tour
      What's on Top?
      What's in the Middle?
      What's on the Bottom?
      Sample Document
      Availability
    2.3.4. MIL-STD-38784 (CALS)
      Background
      Quick Tour
      What's on Top?
      What's in the Middle?
      What's on the Bottom?
      Sample Document
      Availability
    2.3.5. Hypertext Markup Language (HTML 4.0)
      Background
      Quick Tour
      What's on Top?
      What's in the Middle?
      What's on the Bottom?
      Sample Document
      Availability

Part 2: Principles of DTD Analysis

Chapter 3. Ease of Learning
  3.1. DTD Size
    3.1.1. Logical Units
      Examples from the Model DTDs
    3.1.2. Learning Requirements
      Examples from the Model DTDs
  3.2. DTD Consistency
    3.2.1. Naming
      Examples from the Model DTDs
    3.2.2. Parallel Design
      Examples from the Model DTDs
    3.2.3. Element-Type Classes
      Examples from the Model DTDs
    3.2.4. Global Attributes
      Examples from the Model DTDs
  3.3. DTD Intuitiveness
    3.3.1. Naming
      Examples from the Model DTDs
    3.3.2. Structure
      Examples from the Model DTDs

Chapter 4. Ease of Use
  4.1. Physical Effort
    4.1.1. Content Models
      Examples from the Model DTDs
    4.1.2. Attribute Definitions
      Examples from the Model DTDs
  4.2. Choice
    4.2.1. Limiting Choices
      Examples from the Model DTDs
  4.3. Flexibility
    4.3.1. Descriptive and Prescriptive DTDs
      Examples from the Model DTDs
    4.3.2. Inline Element Types
      Examples from the Model DTDs
    4.3.3. Role Attributes
      Examples from the Model DTDs
    4.3.4. Generic Element Types
      Examples from the Model DTDs

Chapter 5. Ease of Processing
  5.1. Predictability
    5.1.1. Constraint
      Examples from the Model DTDs
    5.1.2. Recursion
      Examples from the Model DTDs
    5.1.3. Generic Element Types and Role Attributes
      Examples from the Model DTDs
    5.1.4. Authors' Modifications
      Examples from the Model DTDs
    5.1.5. SGML: Placement of Data and Subdocument Entities
      Examples from the Model DTDs
  5.2. Context
    5.2.1. Containers
      Examples from the Model DTDs
    5.2.2. Implied Attribute Values
      Examples from the Model DTDs
  5.3. DTD Analysis: Final Considerations

Part 3: Advanced Issues in DTD Maintenance and Design

Chapter 6. DTD Compatibility
  6.1. Structural Compatibility
    6.1.1. Repetition
    6.1.2. Omissibility
    6.1.3. Alternation
    6.1.4. Changes in Combination
      Changes to the Same Content Token
      New Element Types
    6.1.5. ANY and EMPTY
    6.1.6. Attribute Compatibility
      Repetition
      Omissibility
      Changes to Default Value
      Alternation
      Typing
    6.1.7. SGML: Structural Compatibility
      Ordering
      Ordering of Data
      Repetition of Data
      CDATA and RCDATA declared content
      Inclusion and Exclusion Exceptions
      Additional SGML Attribute Types
  6.2. Lexical Compatibility
    6.2.1. Entities
    6.2.2. Whitespace
    6.2.3. SGML: Lexical Compatibility
      Markup Minimisation
      Start-Tag Omission
      End-Tag Omission
      Record Ends

Chapter 7. Exchanging Document Fragments
  7.1. Editing Fragments as Stand-Alone Documents
    7.1.1. Ancestors and Siblings
    7.1.2. Cross-References
      Changing IDREFs
      Creating Placeholders
    7.1.3. Entities
    7.1.4. Summary
    7.1.5. SGML: Stand-Alone Fragments
      #CURRENT Attributes
      Inclusion and Exclusion Exceptions
      Inclusion Exceptions
      Exclusion Exceptions
  7.2. Reparenting in a Dummy Document
    7.2.1. Ancestors and Siblings
    7.2.2. Cross-References
    7.2.3. Entities
    7.2.4. Summary
    7.2.5. SGML: Reparenting
      Inclusion and Exclusion Exceptions
  7.3. Using Subdocuments
    7.3.1. Ancestors and Siblings
    7.3.2. Cross-References
      Simple External Reference: HyTime Scheme
      HyTime Value Reference
      Simple External Reference: XLL Scheme
    7.3.3. Entities
    7.3.4. Summary
    7.3.5. SGML: Subdocuments
      SUBDOC Entities
      Inclusion and Exclusion Exceptions

Chapter 8. DTD Customisation
  8.1. Types of Customisation
    8.1.1. Simplifying a DTD for Authoring
      Eliminating Unnecessary Choice
      Avoiding Markup Errors
    8.1.2. Adding Element Types to a DTD
    8.1.3. Restructuring a DTD's Components
  8.2. Extension Mechanisms in the Model DTDs
    8.2.1. Customising the DocBook DTD
    8.2.2. Customising the TEI DTDs
      Base and Auxiliary Tagsets
    8.2.3. Customising the HTML DTD
    8.2.4. Customising the MIL-STD-38784 DTD
    8.2.5. Customising the ISO 12083 DTDs

Part 4: DTD Design with Architectural Forms

Chapter 9. Architectural-Forms Concepts
  9.1. Meta-DTDs
  9.2. Documents
    9.2.1. Types of Architectural Forms
    9.2.2. The Architectural Document
  9.3. Practical Uses of Architectural Forms
    9.3.1. DTD Extension
    9.3.2. Software Reusability
      A Common Book Architecture?
    9.3.3. Multi-Use Documents
    9.3.4. Extended Validation
  9.4. Summary of Terminology

Chapter 10. Basic Architectural-Forms Syntax
  10.1. Setup and Configuration
    10.1.1. Architecture Use Declaration Attributes
    10.1.2. SGML: Original Syntax
      Architecture Base Declaration
      Architecture Notation Declaration
      Architecture Entity Declaration
      Architecture Support Attributes
  10.2. Basic Forms
    10.2.1. Deriving Elements Element Form Strategies
    10.2.2. Deriving Attributes
    10.2.3. Deriving Notations
    10.2.4. SGML: Basic Forms
      Notation Forms

Chapter 11. Advanced Architectural-Forms Syntax
  11.1. Automatic Derivation
    11.1.1. SGML: Automatic Derivation
  11.2. Suppressing Architectural Processing
    11.2.1. Suppressing Elements
    11.2.2. Suppressing Data
    11.2.3. SGML: Suppressing Architectural Processing
  11.3. Architectural Attribute Values
    11.3.1. Attribute Defaulting
    11.3.2. Tokens
    11.3.3. Deriving Content from Attribute Values
    11.3.4. Deriving Attribute Values from Content
    11.3.5. SGML: Architectural Attributes
  11.4. Default Architectural Information
    11.4.1. Creating a Default Notation
    11.4.2. Resolving IDREFs
    11.4.3. SGML: Default Architectural Information
  11.5. Meta-DTDs
    11.5.1. Meta-DTD Configuration
      SGML: Meta-DTD Configuration
    11.5.2. SGML: Meta-DTDs
      Meta-DTD Quantities
      General NAMECASE Substitution

Back Matter

Appendix A. Model DTDs: Index of Element Types and Attributes

Appendix B. General Index

[Prepared by Robin Cover as part of the SGML/XML Web Page.]
