SGML: Data Types in XML

Data Types in XML

From w3c-sgml-wg-request@w3.org Thu May 15 23:18:18 1997
From: Jean Paoli <jeanpa@microsoft.com>
To: "'w3c-sgml-wg@w3.org'" <w3c-sgml-wg@w3.org>
Subject: SD3 - Data Types
Date: Thu, 15 May 1997 21:14:17 -0700


SD3 - Data Types
[See also provisionally:
XML for Structured Data - http://207.201.154.232/murray/specs/xml-sd.html,
provided in HTML by Murray Altheim (altheim@eng.sun.com)]
-------------------------------

For many data processing applications, knowing the semantics of an
element is not enough.  Certain typical types of data, such as numbers
and dates, form the basic measurements of whatever it is their
elements represent. These types imply a typical sort order, an
opportunity for efficient storage, and a grammar beyond XML.

Proposal: (With special thanks to Tim Bray's paper "http://www.textuality.com/xml/typing.html"). Define
a standard attribute, XML-TYPE, based on SQL data types. Valid values
are CHAR, VARCHAR, INTEGER, FLOAT, DECIMAL, DATE, TIME and
TIMESTAMP. Numbers and dates, times, etc.  will have implied text
representations given by some t.b.d. ISO standards (e.g. ISO
8601::1988). At a minimum, float formats must allow both scientific
and non-scientific formats (e.g. "12.34" and "0.1234E+2".) If no
XML-TYPE attribute is given, VARCHAR is default.

These data types have the effect of subclassing the contents of any
element whose content type is ANY or MIXED. (More generally, I think
of xml-type as inline subclassing of the contents of any element; the
subclassing must be compatible with the element's content spec.)

Many of these types can be qualified by further attributes. Here is a
first-cut:

<!ELEMENT VARCHAR (#PCDATA)>
	<!ATTLIST CHAR XML-MAXSIZE CDATA> 

<!ELEMENT CHAR (#PCDATA)>
	<!ATTLIST CHAR 
				XML-MAXSIZE CDATA 
				XML-SIZE CDATA #IMPLIED >

<!ELEMENT DECIMAL (#PCDATA)>
	<!ATTLIST DECIMAL 
				XML-DIGITS-R CDATA 4 
				XML-DIGITS-TOTAL CDATA 18 
				XML-SCALE CDATA 0 >

<!ELEMENT INTEGER (#PCDATA)>
	<!ATTLIST INTEGER
				XML-DIGITS-R CDATA #FIXED 0 
				XML-DIGITS-TOTAL CDATA 9 >

<!ELEMENT FLOAT (#PCDATA)>
	<!ATTLIST FLOAT XML-BITS CDATA 64 >

<!ELEMENT TIME (#PCDATA)>

<!ELEMENT DATETIME (#PCDATA)>
	<!ATTLIST DATETIME XML-ZONE ( false | true ) true >

<!ELEMENT DATE (#PCDATA)>
	<!ATTLIST DATE XML-ZONE ( false | true ) true >

Here is an example of data types in action:

<LINEITEM XML-ID="L1">
	<NAME>I, the Jury</NAME>
	<AUTHOR>Spillane, Mickey</AUTHOR>
	<PRICE xml-type="DECIMAL">2.95</PRICE>
	<PUBLICATION-DATE xml-type="DATE">19470101</PUBLICATION-DATE>
	<MANF-CODE><*xml-type xml-size=12>CHAR</>451-AE1396 </MANF-CODE>
	</LINEITEM >


-----------------------------