Minimal XML 1.0

Version: 2000-04-11

Editors:

Don Park (Docuverse) – donpark@docuverse.com

[Ed: Add names of top N contributors to SML-DEV archive]

1. Introduction

Minimal XML is a subset of XML 1.0. It includes the features essential for data interchange applications and excludes non-essential features that are arcane, legacy-related, redundant, or problematic for data interchange applications.

1.1 Goals

· A subset that allows easily implemented parsers that are much faster and smaller than full XML parsers.

· A subset with a simpler information model that can easily be mapped to other information models.

· A subset that is much easier to learn, teach, and use.

1.2 Features Not Supported in Minimal XML

· Attributes [ ? ]

· CDATA Sections [ ? ]

· Comments [ ? ]

· Document Type Declarations [ ? ]

· Empty-Element Tags [ ? ]

· Entity References [ ? ]

· Mixed Contents [ ? ]

· Predefined Entities [ ? ]

· Processing Instructions [ ? ]

· Prolog [ ? ]

· XML Declaration [ ? ]

[Ed: [?] is link to an entry in the Minimal XML FAQ which explains the rationale for removing the feature. We need to start collecting rationales scattered in the message archive. Volunteers?]

2. Character Encoding

Minimal XML documents must be encoded in either UTF-8 or UTF-16. Minimal XML parsers must support both UTF-8 and UTF-16 character encoding formats. [ ? ]
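The requirement above does not say how a parser should tell the two encodings apart. The following is a minimal sketch, assuming the byte order mark (BOM) is used as the signal: input that starts with a UTF-16 BOM is decoded as UTF-16, and everything else is treated as UTF-8. The function name decode_minimal_xml is hypothetical and not part of this draft.

```python
import codecs


def decode_minimal_xml(data: bytes) -> str:
    """Decode raw bytes as UTF-16 if a UTF-16 BOM is present, else as UTF-8.

    Assumption: BOM sniffing is the detection strategy; the draft does not
    specify one.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec reads the BOM to pick the byte order
        # and removes it from the decoded text.
        return data.decode("utf-16")
    if data.startswith(codecs.BOM_UTF8):
        # Drop an optional UTF-8 BOM before decoding.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode("utf-8")
```

UTF-16 input without a BOM is not handled by this sketch and would need a different heuristic or out-of-band information.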

3. Syntax

3.1 Documents

[1] document ::= WS* element WS*

A Minimal XML document contains one or more elements. There is exactly one element, called the root or document element, which is not contained within another element. White space surrounding the root is not reported. Minimal XML parsers must be able to parse multiple documents in a single stream or file.

The following example shows a file containing two documents whose document elements are <logentry>:

<logentry>
<who>syslogd</who><what>startup</what>
</logentry>
<logentry>
<who>syslogd</who><what>shutdown</what>
</logentry>
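One way to meet the multiple-documents requirement is to track element depth and cut the stream each time the depth returns to zero. The sketch below assumes the whole stream is already available as a string. The name split_documents is hypothetical. The sketch does not check that end tag names match their start tags, or that the text between documents is white space only.

```python
import re

# Start or end tag whose name follows production [6].
TAG = re.compile(r"<(/?)([^<>&/]+)>")


def split_documents(text):
    """Yield the source text of each top-level document found in text."""
    depth = 0
    start = None
    for m in TAG.finditer(text):
        if m.group(1) == "":          # start tag
            if depth == 0:
                start = m.start()     # a new document element begins here
            depth += 1
        else:                         # end tag
            depth -= 1
            if depth < 0:
                raise ValueError("end tag without a matching start tag")
            if depth == 0:
                yield text[start:m.end()]
    if depth != 0:
        raise ValueError("stream ended inside an unclosed element")
```

Each yielded string can then be handed to an ordinary single-document parser.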

3.2 Elements

[2] element ::= STag content ETag

[3] STag ::= '<' Name '>'

[4] ETag ::= '</' Name '>'

[TBD]

The following example shows an element with its start tag (<who>), end tag (</who>), and its content (“syslogd”):

<who>syslogd</who>

3.3 Element Contents

[5] content ::= (element | WS)* | (CharData | CharRef)*

Element content must be either:

· a sequence of elements with optional white space between elements

· a sequence of character data and character references

Mixed content is not supported. White space surrounding elements is not reported.
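The content rule can be illustrated with a small recursive-descent parser. This is a sketch under stated assumptions, not a reference implementation: the function name parse_element and the (name, content) tuple representation are hypothetical, and the input is assumed to be an already decoded string. Content is returned either as a string (character data) or as a list of child elements, and a mixture of the two is rejected.

```python
import re

WS = " \t\r\n"                         # production [9]
NAME = re.compile(r"[^<>&/]+")         # production [6]
CHARREF = re.compile(r"&#([0-9]+);")   # production [8]


def parse_element(text, pos=0):
    """Parse one element at text[pos]; return ((name, content), next_pos).

    content is a str for character data or a list of (name, content) children.
    """
    m = NAME.match(text, pos + 1) if text.startswith("<", pos) else None
    if m is None or not text.startswith(">", m.end()):
        raise ValueError(f"expected start tag at offset {pos}")
    name, pos = m.group(), m.end() + 1

    children, chars, saw_ref = [], [], False
    while not text.startswith("</", pos):
        if pos >= len(text):
            raise ValueError(f"missing end tag for <{name}>")
        if text[pos] == "<":
            child, pos = parse_element(text, pos)
            children.append(child)
        elif text[pos] == "&":
            m = CHARREF.match(text, pos)
            if m is None:
                raise ValueError(f"bad character reference at offset {pos}")
            chars.append(chr(int(m.group(1))))
            saw_ref, pos = True, m.end()
        elif text[pos] == ">":
            raise ValueError(f"literal '>' at offset {pos}")
        else:
            chars.append(text[pos])
            pos += 1

    end_tag = f"</{name}>"
    if not text.startswith(end_tag, pos):
        raise ValueError(f"mismatched end tag for <{name}>")
    pos += len(end_tag)

    data = "".join(chars)
    if children:
        # Only white space may appear between child elements.
        if saw_ref or data.strip(WS):
            raise ValueError(f"mixed content in <{name}>")
        return (name, children), pos
    return (name, data), pos
```

White space between child elements is dropped rather than reported. An element whose content is only white space and has no children is returned as character data, which is one possible reading of production [5].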

The following example shows an element (<timestamp>) with two child elements (<date> and <time>), each of which contains character data (“2000/03/26” and “10:20”):

<timestamp><date>2000/03/26</date><time>10:20</time></timestamp>

3.4 Element Names

[6] Name ::= [^<>&/]+

In Minimal XML, element names that do not satisfy the XML 1.0 Name production are reserved. Element names starting with the underscore ('_') character are also reserved. The ‘:’ character is reserved for namespace mechanisms.
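The naming rules can be sketched as a three-way classification: names that fail production [6] are invalid, names that pass it but fall under one of the reservations above are reserved, and the rest are usable. The function name classify_name is hypothetical. The pattern ASCII_XML_NAME is an ASCII-only approximation of the XML 1.0 Name production and is an assumption of this sketch, so non-ASCII names that are valid in XML 1.0 would be wrongly reported as reserved.

```python
import re

MINIMAL_NAME = re.compile(r"[^<>&/]+")                    # production [6]
# Assumption: ASCII-only stand-in for the XML 1.0 Name production.
ASCII_XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def classify_name(name):
    """Return 'invalid', 'reserved', or 'ok' for a candidate element name."""
    if not MINIMAL_NAME.fullmatch(name):
        return "invalid"       # not matched by production [6]
    if not ASCII_XML_NAME.fullmatch(name):
        return "reserved"      # outside the (approximated) XML 1.0 Name
    if name.startswith("_") or ":" in name:
        return "reserved"      # underscore prefix or namespace separator
    return "ok"
```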

[Add Example]

3.5 Character Data

[7] CharData ::= [^<>&]

Character data may not contain ‘<’, ‘>’, or ‘&’ in literal form.
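A writer can satisfy this rule by replacing the three characters with decimal character references before emitting text. A minimal sketch, with the hypothetical function name escape_char_data:

```python
def escape_char_data(text):
    """Replace '<', '>' and '&' with decimal character references."""
    refs = {"<": "&#60;", ">": "&#62;", "&": "&#38;"}
    return "".join(refs.get(ch, ch) for ch in text)
```

For instance, escape_char_data("a < b") returns "a &#60; b".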

[Add Example]

3.6 Character References

[8] CharRef ::= '&#' [0-9]+ ';'

[TBD]

The following example shows three character references representing the three reserved characters (‘<’, ‘>’, ‘&’):

&#60;&#62;&#38;
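On the reading side, references matching production [8] can be expanded with a single regular-expression substitution. The sketch below uses the hypothetical function name decode_char_refs. It handles decimal references only, as in production [8], and leaves anything else untouched. A number outside the Unicode range makes chr raise ValueError.

```python
import re

CHARREF = re.compile(r"&#([0-9]+);")   # production [8]


def decode_char_refs(text):
    """Replace each decimal character reference with the character it names."""
    return CHARREF.sub(lambda m: chr(int(m.group(1))), text)
```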

3.7 White Spaces

[9] WS ::= (#32 | #9 | #13 | #10)

Space, tab, carriage return, and newline characters are considered to be white space in Minimal XML. White space surrounding elements is not reported.

[Add Example]

4. Information Model

[Ed: simple paragraph explaining why Minimal XML’s simpler information model is important for data interchange applications.]

[TBD]

In the Minimal XML information model, everything is a node.

[Ed: below is a Grove-like version of our information model where name, value, and colors are unified into properties. Just experimenting to see if it comes out clear. Unfortunately, I think it reads more like Zen mumbo-jumbo.]

4.1 Nodes

A node has one or more properties.

4.2 Properties

A property is a node whose nodeName is its property name.

4.3 nodeName Property

The nodeName property is a node whose nodeName is ‘nodeName’ and whose nodeValue is the element name.

4.4 nodeValue Property

The nodeValue property is a node whose nodeName is ‘nodeValue’ and whose nodeValue is either a string or a list of nodes.
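One possible reading of sections 4.1 to 4.4 is sketched below. The class name Node and the method properties are hypothetical, and this draft does not prescribe any in-memory representation. Each node carries a nodeName and a nodeValue, and each of those two properties can itself be viewed as a node.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    nodeName: str
    nodeValue: Union[str, List["Node"]]   # a string or a list of nodes

    def properties(self):
        """Return the nodeName and nodeValue properties as nodes."""
        return [
            Node("nodeName", self.nodeName),
            Node("nodeValue", self.nodeValue),
        ]
```

Under this reading, an element such as <timestamp> with <date> and <time> children becomes Node("timestamp", [Node("date", "2000/03/26"), Node("time", "10:20")]).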

A. Appendix

A.1 Contributors

[TBD: Collect names of frequent posters from the SML-DEV statistics page]