[This local archive copy is from the official and canonical URL, http://info.admin.kth.se/SGML/Konferenser/xml98sve/seminar.html; please refer to the canonical source document if possible.]


Why Do We Need XML?

Author: W. Eliot Kimber
ISOGEN International Corp.
eliot@isogen.com
www.isogen.com

Presented at: Swedish SGML User's Group XML Seminar, 8 Sept. 1998

Discusses the XML family of standards.


Copyright (c) 1998 W. Eliot Kimber and ISOGEN International Corp.


Table of Contents


1. Why is XML Needed?

1.2. What Is XML?

  • W3C Recommendation REC-xml-19980210
  • Proper subset of SGML designed for Web use
  • Avoids features that make parsing difficult or costly
  • Requires support for Unicode/ISO 10646
  • Introduces concept of "DTD-less" documents
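
As a rough illustration of the "DTD-less" idea, the sketch below parses a document that has no DOCTYPE declaration at all. It uses Python's standard xml.etree.ElementTree module, which is a choice made for this example and not something the slides prescribe; the memo, to, and body element names are invented.

```python
import xml.etree.ElementTree as ET

# A well-formed document with no DOCTYPE declaration.
# The element names are invented for this example.
doc = "<memo><to>Readers</to><body>No DTD is declared here.</body></memo>"

root = ET.fromstring(doc)  # parses without consulting any DTD
names = [child.tag for child in root]
print(root.tag, names)

# A document that is not well-formed is rejected by the parser.
try:
    ET.fromstring("<memo><to>Readers</memo>")
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)
```

The parser checks only well-formedness here; nothing constrains which element types may appear or where.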

1.3. Why Isn't SGML Enough?

  • XML doesn't do anything SGML couldn't always do
  • But...XML avoids several of SGML's barriers to acceptance and use
    • Specification is small (about 30 pages)
    • Absence of optional features makes implementation cost easier to judge
    • Makes it clear how light-weight clients can be built
    • Avoids the stigma of the "SGML" name
  • "SGML is 'big iron', tired, outdated"
  • "XML is hip, happening, now"

1.4. Why Isn't HTML Enough?

  • HTML fine for presenting text or structuring Web pages
  • HTML has no well-defined or definable structural rules
  • HTML cannot convey rich structures and semantics
  • HTML not acceptable as an authoring or archival form for many kinds of data
  • HTML is too tightly bound to browser implementations

1.5. Conclusion: We Need XML

  • Need a light-weight, user-friendly form of SGML
  • Need everything we like SGML for:
    • Arbitrary document types
    • Rich structuring and semantics
    • Well-defined and controlled syntax
    • Clearer separation of content from style and behavior
    • Richer data model to enable richer addressing
  • XML provides a powerful marketing vehicle for selling the benefits of SGML to a new audience
  • Gets a whole new group of people thinking about how to take advantage of structured markup

2. The XML Family of Recommendations

2.2. XML Language Recommendation

  • Defines the base syntax for XML documents
  • Analogous to SGML standard (ISO 8879)
  • Defined as a proper subset of SGML
  • All XML documents are, by definition, SGML documents
  • TCs 2 and 3 to ISO 8879 align SGML with XML

2.3. XML Style Language (XSL) Working Draft

  • Defines a standard for print and online style sheets
  • Functionally similar to DSSSL (ISO/IEC 10179:1996)
  • Uses XML syntax for style specification
  • Unlike DSSSL, does not include a complete programming language
  • Strikes what appears to be an appropriate balance between simplicity and completeness

2.4. Cascading Style Sheets, level 2 CSS2 Specification Recommendation

  • Simple style mechanism designed for use with HTML documents
  • Can be used with XML documents (assuming browser supports XML display)
  • Most browsers support CSS
  • Not as powerful as XSL, but adequate for many situations

2.5. XML Document Object Model (DOM) Proposed Recommendation

  • Defines "in-memory" model for representing parsed XML documents
  • Designed to provide common structures in XML browsers
  • Is an implementation-specific design, not an abstract data model
  • Implemented by Internet Explorer and Netscape
  • Intended to enable interoperable XML processing across browsers
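
To give a feel for what a DOM-style "in-memory" model looks like in practice, here is a minimal sketch using Python's xml.dom.minidom. That module is a later, non-browser implementation chosen only for illustration; the doc and para element names are invented.

```python
from xml.dom.minidom import parseString

# Parse a small document into an in-memory tree of nodes.
dom = parseString("<doc><para>First</para><para>Second</para></doc>")

# Navigate the tree through the DOM interfaces.
paras = dom.getElementsByTagName("para")
texts = [p.firstChild.data for p in paras]

# Modify the tree in place: append a third paragraph to the root element.
new_para = dom.createElement("para")
new_para.appendChild(dom.createTextNode("Third"))
dom.documentElement.appendChild(new_para)

count = len(dom.getElementsByTagName("para"))
print(texts, count)
```

The same method names (getElementsByTagName, createElement, appendChild) are what a script would call in any conforming implementation, which is the interoperability the bullets above refer to.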

2.6. XML Name Spaces Working Draft

  • Attempts to enable use of names defined by different sources
  • Allows element type and attribute names to be qualified with a prefix:
    <isogen:para>This is an ISOGEN-defined
    paragraph</isogen:para>
  • A name space is nothing more than a named vocabulary of names
  • Name space is named by a URI (URL or URN)
  • Elements declare the use of name spaces and establish name space scope
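
The sketch below shows how a prefix-qualified name might be resolved by a namespace-aware parser. It assumes the xmlns:prefix attribute form of declaration, which may differ in detail from the Working Draft described here, and the URN naming the vocabulary is hypothetical.

```python
import xml.etree.ElementTree as ET

# The URN below is a made-up name for an ISOGEN vocabulary.
doc = """<doc xmlns:isogen="urn:example:isogen-vocabulary">
  <isogen:para>This is an ISOGEN-defined paragraph</isogen:para>
  <para>This is an unqualified paragraph</para>
</doc>"""

root = ET.fromstring(doc)

# The parser replaces each prefix with the URI it is bound to.
tags = [child.tag for child in root]
print(tags)

# Searching by qualified name: the prefix used in the query is local to
# the query and need not match the prefix used in the document.
ns = {"i": "urn:example:isogen-vocabulary"}
qualified = root.findall("i:para", ns)
print(qualified[0].text)
```

The two para elements are distinguishable only because one is qualified by the name space; the prefix itself carries no meaning beyond the URI it stands for.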

2.7. XML Linking Language Working Draft

  • Provides hyperlinking elements that go beyond HTML A element
  • Defines two types of links: simple and extended
  • Simple links are like HTML A element
  • Extended links can be separate from (independent of) the things they link
  • Consistent with HyTime architecture (ISO/IEC 10744:1997)
  • Provides robust linking structures without undue complication
  • Currently requires use of XML Pointers for addressing

2.8. XML Pointer Language Working Draft

  • Provides robust addressing of XML documents
  • Lets you address by ID, element type, structure, etc.
  • Used with normal URLs:
    href="somedoc.xml#id(foo).child(2)"
  • Does not provide any form of indirection
  • Not defined for addressing anything that is not XML
  • XPointers can be used outside of XML context (e.g., with HyTime)
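
The following toy resolver shows how the pointer in the example above could be interpreted: find the element whose ID is foo, then take its second child element. It is only a sketch. It handles just the id() and child() steps, it assumes the ID attribute is literally named "id", and the resolve function is invented for this example; the draft language defines more than this.

```python
import re
import xml.etree.ElementTree as ET

# Matches one step of the form keyword(argument), for the two keywords
# handled by this sketch.
STEP = re.compile(r"(id|child)\(([^)]*)\)")

def resolve(root, pointer):
    """Resolve a pointer made only of id(NAME) and child(N) steps."""
    node = root
    for keyword, arg in STEP.findall(pointer):
        if keyword == "id":
            # Assumption: the ID attribute is named "id".
            matches = [e for e in root.iter() if e.get("id") == arg]
            node = matches[0] if matches else None
        else:
            # child(N): the Nth child element, counting from 1.
            children = list(node)
            index = int(arg) - 1
            node = children[index] if 0 <= index < len(children) else None
        if node is None:
            return None
    return node

doc = ET.fromstring('<doc><sec id="foo"><p>one</p><p>two</p></sec></doc>')
url = "somedoc.xml#id(foo).child(2)"
target = resolve(doc, url.split("#", 1)[1])
print(target.text)
```

The part of the URL before "#" identifies the document in the usual way; only the fragment after it is interpreted as a pointer.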

2.9. Synchronized Multimedia Integration Language (SMIL) Recommendation

  • Provides XML syntax for defining multimedia presentations
  • Similar to HyTime's event schedules
  • Defines spatial and temporal relationships among multimedia components

2.10. Resource Description Framework (RDF) Schema Specification Working Draft

  • Abstract mechanism for defining simple relationships among Web resources
  • Lets you define "graphs" of relationships
  • Designed primarily to enable the association of metadata with Web resources
  • Provides an XML syntax that can be used with XML documents

2.11. Mathematical Markup Language (MathML) 1.0 Specification Recommendation

  • XML markup language for describing mathematical expressions
  • Can describe expressions presentationally or semantically

3. The Interaction Between XML and HTML

4. XML For Storage, Searching, and Display

4.2. XML for Storage

  • XML, like SGML, well suited as source storage format
  • Relatively compact syntax
  • Generalized and standardized
  • Product independent
  • Issues of storage organization
    • Size of documents
    • Storage as text strings or in parsed form

4.3. XML for Searching

  • Use of content-specific markup enables robust searching
  • Search engines need to be XML aware
  • Existing full-text engines not very XML aware
  • Existing SGML-aware search engines should work for XML

4.4. XML for Presentation

  • Several presentation options:
    • Convert to HTML at server
    • Use specialized Java applications to render in browser
    • Use XSL or CSS to render in browser (requires browser support for XML)
  • First option easy to do using a variety of tools
  • Second option appropriate for specialized data
  • Third option depends on browser support, which is not currently universal
  • Issues:
    • Do users need the XML at the client?
    • Is presentation of information dependent on interaction with XML structure?
    • Does information require better rendering than HTML provides?
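
The first option, converting to HTML at the server, can be sketched in a few lines. The mapping table, the to_html function, and the doc, title, and para element names are all invented for this example; a production system would more likely use a style sheet processor or another dedicated conversion tool.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from XML element types to HTML element types.
TAG_MAP = {"title": "h1", "para": "p"}

def to_html(xml_text):
    """Convert the children of the root element to a flat HTML page."""
    root = ET.fromstring(xml_text)
    parts = []
    for child in root:
        html_tag = TAG_MAP.get(child.tag, "div")  # unmapped types become div
        parts.append(f"<{html_tag}>{escape(child.text or '')}</{html_tag}>")
    return "<html><body>" + "".join(parts) + "</body></html>"

page = to_html(
    "<doc><title>XML &amp; HTML</title>"
    "<para>Converted at the server.</para></doc>"
)
print(page)
```

With this approach the client never sees the XML, which is why the first of the issues listed above matters when choosing among the options.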

5. Wrap Up