[From: http://www.docuverse.com/smldev/minxmlspec.html; use this canonical/up-to-date document if possible.]
Don Park (Docuverse) donpark@docuverse.com
[Ed:Add names of top N contributors to SML-DEV archive]
Minimal XML is a subset of XML 1.0. It includes the features essential for data interchange applications and excludes non-essential features that are arcane, legacy-related, problematic for data interchange applications, or redundant.
· A subset that allows easily implemented parsers that are much faster and smaller than full XML parsers.
· A subset with a simpler information model that can easily be mapped to other information models.
· A subset that is much easier to learn, teach, and use.
· Attributes [ ? ]
· CDATA Sections [ ? ]
· Comments [ ? ]
· Document Type Declarations [ ? ]
· Empty-Element Tags [ ? ]
· Entity References [ ? ]
· Mixed Content [ ? ]
· Predefined Entities [ ? ]
· Processing Instructions [ ? ]
· Prolog [ ? ]
· XML Declaration [ ? ]
[Ed: [?] is link to an entry in the Minimal XML FAQ which explains the rationale for removing the feature. We need to start collecting rationales scattered in the message archive. Volunteers?]
Minimal XML documents must be encoded in either UTF-8 or UTF-16. Minimal XML parsers must support both UTF-8 and UTF-16 character encoding formats. [ ? ]
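A minimal sketch of how a parser might choose between the two required encodings. Since the XML Declaration is among the excluded features, a document cannot name its own encoding, so this sketch assumes that UTF-16 input starts with a byte order mark and that anything else is UTF-8. That detection rule and the function name `decode_minimal_xml` are assumptions made for illustration, not part of this draft.

```python
import codecs

def decode_minimal_xml(data: bytes) -> str:
    """Decode raw bytes as UTF-16 (when a byte order mark is present) or UTF-8."""
    # Assumption: UTF-16 documents always begin with a byte order mark.
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the mark, picks the byte order, and strips it.
        return data.decode("utf-16")
    # "utf-8-sig" strips an optional UTF-8 byte order mark, else acts like UTF-8.
    return data.decode("utf-8-sig")
```

Input that is neither valid UTF-8 nor marked UTF-16 raises `UnicodeDecodeError`, which a caller could report as an encoding error.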
[1] document ::= WS* element WS*
A Minimal XML document contains one or more elements. Exactly one element, called the root or document element, is not contained within another element. White space surrounding the root element is not reported. Minimal XML parsers must be able to parse multiple documents in a single stream or file.
The following example shows a file containing two documents whose document elements are <logentry>:
<logentry>
<timestamp><date>2000/03/26</date><time>10:10</time></timestamp>
<who>syslogd</who><what>startup</what>
</logentry>
<logentry>
<timestamp><date>2000/03/26</date><time>10:20</time></timestamp>
<who>syslogd</who><what>shutdown</what>
</logentry>
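One way to handle several documents in a single stream is to track element depth and cut the stream each time the depth returns to zero. The sketch below does only that: it does not check that end tag names match start tags, nor that only white space appears between documents. The name `split_documents` is hypothetical.

```python
import re

# A start or end tag whose name follows production [6]: '<' or '</', Name, '>'.
TAG = re.compile(r"<(/?)([^<>&/]+)>")

def split_documents(text):
    """Return the source text of each top-level (document) element in a stream."""
    documents = []
    depth = 0
    start = 0
    for match in TAG.finditer(text):
        if match.group(1) == "":        # start tag
            if depth == 0:
                start = match.start()   # a new document begins here
            depth += 1
        else:                           # end tag
            depth -= 1
            if depth < 0:
                raise ValueError("end tag without a matching start tag")
            if depth == 0:
                documents.append(text[start:match.end()])
    if depth != 0:
        raise ValueError("unclosed element at end of stream")
    return documents
```

Applied to the two-entry log file above, this yields two strings, each holding one complete <logentry> document.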
[2] element ::= STag content ETag
[3] STag ::= '<' Name '>'
[4] ETag ::= '</' Name '>'
[TBD]
The following example shows an element with its start tag (<who>), end tag (</who>), and content (syslogd):
<who>syslogd</who>
[5] content ::= (element | WS)* | (CharData | CharRef)*
Element content must be either:
· a sequence of elements with optional white spaces between elements
· a sequence of character data and character references
Mixed content is not supported. White space surrounding elements is not reported.
The following example shows an element (<timestamp>) with two child elements (<date> and <time>), each of which contains character data (2000/03/26 and 10:20):
<timestamp><date>2000/03/26</date><time>10:20</time></timestamp>
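The element and content productions lend themselves to a small recursive-descent parser. The sketch below is one possible reading, not a reference implementation: the function name `parse_element` and the `(name, value)` tuple representation are hypothetical, character references are left undecoded, literal '>' and '&' in character data are not checked, and an element containing only white space is treated as character data.

```python
WS = " \t\r\n"

def parse_element(text, pos=0):
    """Parse one element at text[pos]; return ((name, value), next_pos).

    value is a str for character content or a list of (name, value) children.
    """
    if text[pos:pos + 1] != "<" or text[pos:pos + 2] == "</":
        raise ValueError(f"expected start tag at offset {pos}")
    close = text.find(">", pos)
    if close == -1:
        raise ValueError("unterminated start tag")
    name = text[pos + 1:close]
    if name == "" or any(c in "<&/" for c in name):
        raise ValueError(f"invalid element name {name!r}")
    pos = close + 1
    end_tag = "</" + name + ">"
    children, chars = [], []
    while True:
        if text.startswith(end_tag, pos):
            pos += len(end_tag)
            break
        if pos >= len(text):
            raise ValueError(f"missing end tag for <{name}>")
        if text.startswith("</", pos):
            raise ValueError(f"mismatched end tag at offset {pos}")
        if text[pos] == "<":
            child, pos = parse_element(text, pos)   # nested element
            children.append(child)
        else:
            nxt = text.find("<", pos)               # run of character data
            if nxt == -1:
                nxt = len(text)
            chars.append(text[pos:nxt])
            pos = nxt
    data = "".join(chars)
    if children:
        # Element content: only white space may appear between child elements.
        if data.strip(WS):
            raise ValueError(f"mixed content in <{name}>")
        return (name, children), pos
    return (name, data), pos
```

White space between child elements is discarded rather than reported, and text next to a child element is rejected as mixed content.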
[6] Name ::= [^<>&/]+
In Minimal XML, element names that do not satisfy the XML 1.0 Name production are reserved. Element names starting with the underscore ('_') character are also reserved. Use of the colon (':') character is reserved for namespace mechanisms.
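The naming rules can be sketched as a small classifier. Note the loud assumption: the XML 1.0 Name production is approximated here with ASCII characters only, so names using non-ASCII letters that XML 1.0 allows would be misreported as reserved. The function name `classify_name` and its return labels are hypothetical.

```python
import re

# Production [6]: one or more characters other than '<', '>', '&', '/'.
MINIMAL_NAME = re.compile(r"[^<>&/]+")

# ASSUMPTION: ASCII-only approximation of the XML 1.0 Name production.
# The real production also admits many non-ASCII letters and digits.
XML_NAME_ASCII = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")

def classify_name(name):
    """Return 'invalid', 'reserved', 'namespace', or 'ok' for an element name."""
    if not MINIMAL_NAME.fullmatch(name):
        return "invalid"        # not a Minimal XML Name at all
    if not XML_NAME_ASCII.fullmatch(name):
        return "reserved"       # fails the (approximated) XML 1.0 Name rule
    if name.startswith("_"):
        return "reserved"       # leading underscore is reserved
    if ":" in name:
        return "namespace"      # colon is reserved for namespace mechanisms
    return "ok"
```

For instance, `logentry` is an ordinary name, `_internal` and `2000` are reserved, and `a/b` is not a Name at all.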
[Add Example]
[7] CharData ::= [^<>&]
Character data may not contain <, >, or & in literal form.
[Add Example]
[8] CharRef ::= '&#' [0-9]+ ';'
[TBD]
The following example shows three character references representing the three reserved characters (<, >, and &):
Character data may not contain &#60;, &#62;, or &#38; in literal form.
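Because production [8] allows only decimal references, encoding and decoding character data takes a few lines. This is a sketch; the function names `decode_char_refs` and `encode_char_data` are hypothetical.

```python
import re

# Production [8]: '&#', one or more decimal digits, ';'.
CHAR_REF = re.compile(r"&#([0-9]+);")

def decode_char_refs(text):
    """Replace each decimal character reference with the character it names."""
    # chr() raises ValueError for numbers beyond the Unicode range.
    return CHAR_REF.sub(lambda m: chr(int(m.group(1))), text)

def encode_char_data(text):
    """Escape the three reserved characters as decimal character references."""
    # '&' must be replaced first so the '&' in new references is not re-escaped.
    return text.replace("&", "&#38;").replace("<", "&#60;").replace(">", "&#62;")
```

The two functions are inverses for text that contains no character references to begin with.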
[9] WS ::= (#32 | #9 | #13 | #10)
Space, tab, carriage return, and newline characters are considered white space in Minimal XML. White space surrounding elements is not reported.
[Add Example]
[Ed: simple paragraph explaining why Minimal XML's simpler information model is important for data interchange applications.]
[TBD]
In the Minimal XML information model, everything is a node.
[Ed: below is a Grove-like version of our information model where name, value, and colors are unified into properties. Just experimenting to see if it comes out clear. Unfortunately, I think it reads more like Zen mumbo-jumbo.]
A node has one or more properties.
A property is a node whose nodeName is its property name.
The nodeName property is a node whose nodeName is nodeName and whose nodeValue is the element name.
The nodeValue property is a node whose nodeName is nodeValue and whose nodeValue is either a string or a list of nodes.
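To make the recursive definition more concrete, here is one possible rendering of the model as a data structure. It is a sketch of a single reading of the text above: the class name `Node` and the method `properties` are hypothetical, and properties are built on demand so that the "a property is a node" recursion does not expand forever.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Node:
    nodeName: str
    # Either character data or a list of child nodes, never a mixture.
    nodeValue: Union[str, List["Node"]]

    def properties(self):
        """Return this node's properties, each of which is itself a Node."""
        return [
            Node("nodeName", self.nodeName),
            Node("nodeValue", self.nodeValue),
        ]

# The <timestamp> example expressed in this model.
timestamp = Node("timestamp", [Node("date", "2000/03/26"), Node("time", "10:20")])
```

Calling `timestamp.properties()` yields two nodes named nodeName and nodeValue, and each of those in turn has the same two properties.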
[TBD: Collect names from SML-DEV statistic page for frequent posters]