[This local archive copy is from the official and canonical URL, http://www.lbl.gov/~olken/mendel/w3c/type.extensibility.html; please refer to the canonical source document if possible.]
Frank Olken
Lawrence Berkeley National Laboratory
September 11, 1998
DRAFT 1.0
This document addresses issues of extensible type specifications for use in XML and RDF schemas, i.e., schemas which describe information encoded as either XML or RDF documents. XML documents are described by XML schemas such as XML-Data and DCD (Document Content Description). RDF documents are described by W3C RDF schemas.
We are particularly concerned with two issues:
The immediate origin of this work was a perceived need by the RDF Schema Working Group for an explicit type system for use by RDF schemas of RDF documents. Meanwhile, Microsoft et al. had proposed XML-Data as an attempt to recast DTDs into XML and to provide a type system for XML documents. Subsequently, Microsoft et al. proposed DCD (Document Content Description), which is similar in intent to XML-Data. However, DCD encodes the schema definition as an RDF document.
In various discussions, it has become clear that many W3C participants believe that both XML and RDF need extensible type systems, and that it would be preferable if these two type systems were the same, or at least shared the same basic types. We are proceeding under this assumption and are therefore attempting to address both sets of requirements.
In particular, we view RDF as the basis for our proposed type specifications, as DCD does.
A basic type specification is a tuple of type attributes, which specify the characteristics of basic data types:
Subtyping constructs a new type which is a further restriction of an existing data type, e.g., positive integers are a subtype of integers. Subtyping is one of the most common type extensions. It is supported in most programming languages and object oriented modeling methodologies, usually via "isa" relationships and some inheritance mechanism. Note that range constraints (especially on basic numeric types) are a very common form of subtyping.
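A minimal sketch of a basic type specification as a tuple of attributes, with subtyping expressed as range restriction, may help here. It is written in Python; the attribute names (`semantic_type`, `min_inclusive`, `max_inclusive`) and the helper `restrict` are hypothetical and only illustrate the idea that a subtype narrows, and never widens, the value space of its base type.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class BasicType:
    """A basic type specification: a tuple of (hypothetical) type attributes."""
    name: str
    semantic_type: str                   # e.g. "integer"
    min_inclusive: Optional[int] = None  # None means "no lower bound"
    max_inclusive: Optional[int] = None  # None means "no upper bound"

    def accepts(self, value: int) -> bool:
        # Only the range attributes are checked here; semantic_type is not enforced.
        if self.min_inclusive is not None and value < self.min_inclusive:
            return False
        if self.max_inclusive is not None and value > self.max_inclusive:
            return False
        return True


def restrict(base: BasicType, name: str,
             min_inclusive: Optional[int] = None,
             max_inclusive: Optional[int] = None) -> BasicType:
    """Derive a subtype by tightening the base type's range; bounds can only narrow."""
    lo, hi = base.min_inclusive, base.max_inclusive
    if min_inclusive is not None:
        lo = min_inclusive if lo is None else max(lo, min_inclusive)
    if max_inclusive is not None:
        hi = max_inclusive if hi is None else min(hi, max_inclusive)
    return replace(base, name=name, min_inclusive=lo, max_inclusive=hi)


integer = BasicType("integer", "integer")
positive_integer = restrict(integer, "positiveInteger", min_inclusive=1)
```

Because `restrict` keeps the tighter of the old and new bounds, a derived type can never accept a value its base type rejects.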
Inheritance of type specifications (and methods) is a major area of dispute, especially as regards single vs. multiple inheritance. We believe that multiple inheritance is quite important for concise specification of data type constraints.
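To make the multiple-inheritance point concrete, the following Python sketch (class and type names are hypothetical) treats a type's constraints as the conjunction of its own constraints and those inherited from all of its supertypes. With two supertypes, "positive even integers" can be declared without restating either constraint.

```python
from typing import Callable, List, Sequence

Constraint = Callable[[int], bool]


class IntType:
    """Hypothetical integer type that may inherit constraints from several supertypes."""

    def __init__(self, name: str,
                 supertypes: Sequence["IntType"] = (),
                 own: Sequence[Constraint] = ()):
        self.name = name
        self.supertypes = list(supertypes)
        self.own = list(own)

    def all_constraints(self) -> List[Constraint]:
        # Gather this type's constraints plus those of every ancestor. A shared
        # ancestor (a "diamond") is visited once, so its constraints are not duplicated.
        seen, result, stack = set(), [], [self]
        while stack:
            t = stack.pop()
            if id(t) in seen:
                continue
            seen.add(id(t))
            result.extend(t.own)
            stack.extend(t.supertypes)
        return result

    def accepts(self, value: int) -> bool:
        # Inherited constraints combine by conjunction: all of them must hold.
        return all(check(value) for check in self.all_constraints())


integer = IntType("integer")
positive = IntType("positiveInteger", [integer], [lambda v: v > 0])
even = IntType("evenInteger", [integer], [lambda v: v % 2 == 0])
positive_even = IntType("positiveEvenInteger", [positive, even])
```

Under single inheritance, `positiveEvenInteger` would have to extend one of the two supertypes and restate the other's constraint by hand.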
Composite types are constructed via some sort of aggregation of more elementary types. Relational databases have sets or bags of tuples (aggregation) of basic types. Object oriented programming languages and DBMSs have additional composite type constructors:
There is also the question of whether such composite types can be constructed recursively.
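One way to see what recursive construction requires is a small sketch of composite type constructors in Python. The constructor names (`TupleOf`, `ListOf`, `SetOf`, `Ref`) and the `conforms` checker are hypothetical, and Python builtin types stand in for basic types. Recursion is obtained only through a reference to a named type (`Ref`), which ties recursive construction to the question of whether types must be named.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Basic:
    py_type: type                               # stand-in for a basic type


@dataclass(frozen=True)
class TupleOf:
    fields: Tuple[Tuple[str, "TypeExpr"], ...]  # named components (aggregation)


@dataclass(frozen=True)
class ListOf:
    element: "TypeExpr"                         # ordered, duplicates allowed


@dataclass(frozen=True)
class SetOf:
    element: "TypeExpr"                         # unordered, no duplicates


@dataclass(frozen=True)
class Ref:
    name: str                                   # reference to a named type


TypeExpr = Union[Basic, TupleOf, ListOf, SetOf, Ref]


def conforms(value, t: TypeExpr, env: Dict[str, TypeExpr]) -> bool:
    """Check a Python value against a type expression; Ref looks names up in env."""
    if isinstance(t, Ref):
        return conforms(value, env[t.name], env)
    if isinstance(t, Basic):
        return isinstance(value, t.py_type)
    if isinstance(t, TupleOf):
        names = {n for n, _ in t.fields}
        return (isinstance(value, dict) and set(value) == names
                and all(conforms(value[n], ft, env) for n, ft in t.fields))
    if isinstance(t, ListOf):
        return isinstance(value, list) and all(conforms(v, t.element, env) for v in value)
    if isinstance(t, SetOf):
        return (isinstance(value, (set, frozenset))
                and all(conforms(v, t.element, env) for v in value))
    raise TypeError(f"unknown type expression: {t!r}")


# A recursive composite type: a Tree is a tuple of a label and a list of Trees.
env = {"Tree": TupleOf((("label", Basic(str)), ("children", ListOf(Ref("Tree")))))}
```

Here the `Tree` definition refers to itself by name through `env`; without some naming mechanism, an anonymous type expression could not mention itself.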
Note that XML does not directly specify the type of composite structure. RDF currently supports some (bag, ...) of the type constructors, but not others.
Two types of extensible type specifications are of interest:
Note that the second sort of type specification extensibility is not provided by most type systems. These are (effectively) new type attributes which are orthogonal to existing type attributes.
There are also funny sorts of composite types (aggregates) used to denote various sorts of measurements such as length (feet and inches), weight (pounds and ounces), or latitude (degrees, minutes, seconds, north/south). Note that these composite constructions are equivalent to simple scalars (plus units). Arguably, they should be treated as alternative representations of the scalar forms. In any case, they are ubiquitous in electronic commerce, GIS, etc. and need to be addressed at some point.
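Treating such composites as alternative representations of a scalar amounts to a pair of conversion functions. The Python sketch below assumes inches as the scalar unit for length and signed degrees (south negative) for latitude; these choices, and the function names, are illustrative assumptions rather than part of any proposal. `Fraction` keeps the conversions exact.

```python
from fractions import Fraction
from typing import Tuple


def feet_inches_to_inches(feet: int, inches: int) -> int:
    # Composite length (feet, inches) -> one scalar in a single unit (inches).
    return feet * 12 + inches


def inches_to_feet_inches(total_inches: int) -> Tuple[int, int]:
    # The inverse (assumes a non-negative length): the composite form is just
    # another rendering of the same scalar.
    return divmod(total_inches, 12)


def dms_to_degrees(degrees: int, minutes: int, seconds: int, hemisphere: str) -> Fraction:
    """Latitude as (degrees, minutes, seconds, N/S) -> signed degrees, south negative."""
    if hemisphere not in ("N", "S"):
        raise ValueError("hemisphere must be 'N' or 'S'")
    magnitude = degrees + Fraction(minutes, 60) + Fraction(seconds, 3600)
    return magnitude if hemisphere == "N" else -magnitude
```

For example, 37 degrees, 52 minutes, 30 seconds north converts to exactly 303/8 (37.875) degrees, and 5 feet 11 inches to 71 inches.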
In most languages types are not first-class objects, and no computations or queries on them are possible. We believe that treating types as first-class objects will enhance the extensibility of the type system and facilitate implementation of extensible type systems. Treating types as objects (resources in RDF jargon) should fit nicely into RDF. The potential ramifications of such a decision should be discussed carefully, especially with members of the programming language communities (e.g., ML) that have experience with or knowledge of such practices.
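As a rough illustration of what treating types as data permits, the Python sketch below stores a hypothetical type catalogue as ordinary records and runs queries over it, such as "which declared types are subtypes of integer?". It assumes single inheritance via a `base` field for brevity; the catalogue entries and function names are illustrative only.

```python
from typing import Dict, List, Optional

# Hypothetical type catalogue: each type is an ordinary data record (one base type),
# so types can be stored, exchanged, and queried like any other data.
TYPES: Dict[str, Dict[str, Optional[object]]] = {
    "integer":            {"base": None,                 "min": None},
    "nonNegativeInteger": {"base": "integer",            "min": 0},
    "positiveInteger":    {"base": "nonNegativeInteger", "min": 1},
    "byte":               {"base": "integer",            "min": -128},
    "string":             {"base": None,                 "min": None},
}


def ancestors(name: str, types) -> List[str]:
    # Follow "base" links up to a root type.
    chain = []
    base = types[name]["base"]
    while base is not None:
        chain.append(base)
        base = types[base]["base"]
    return chain


def is_subtype(sub: str, sup: str, types) -> bool:
    return sub == sup or sup in ancestors(sub, types)


def subtypes_of(sup: str, types) -> List[str]:
    # A query over types: every declared type that is (reflexively) a subtype of sup.
    return sorted(n for n in types if is_subtype(n, sup, types))


def lower_bounded(types) -> List[str]:
    # Another query: which types carry a lower-bound range constraint?
    return sorted(n for n, t in types.items() if t["min"] is not None)
```

Neither query is expressible in a language where types exist only at compile time; here they are ordinary list comprehensions over data.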
Do we allow unnamed types, e.g., composite types or subtypes? Or do we require all types to be named explicitly? This has implications for the specification of recursively constructed composite types.
What is the scope of named types: global, an XML namespace, or the enclosing type declaration?
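One of these options, lexical scoping from the enclosing type declaration outward through the namespace to the global scope, can be sketched in Python as follows. The `Scope` class and the example names are hypothetical, and this is only one possible resolution rule, not a recommendation.

```python
from typing import Dict, Optional


class Scope:
    """One level of type-name scope (hypothetical): global, a namespace, or an enclosing type."""

    def __init__(self, label: str, parent: Optional["Scope"] = None):
        self.label = label
        self.parent = parent
        self.names: Dict[str, str] = {}

    def declare(self, name: str, definition: str) -> None:
        # Redeclaring a name within the same scope is rejected.
        if name in self.names:
            raise ValueError(f"{name!r} already declared in scope {self.label!r}")
        self.names[name] = definition

    def resolve(self, name: str) -> str:
        # Innermost scope wins; fall back outward to the namespace, then the global scope.
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise KeyError(f"type {name!r} is not declared in any enclosing scope")


global_scope = Scope("global")
global_scope.declare("Date", "global Date")
orders_ns = Scope("orders namespace", global_scope)
orders_ns.declare("Address", "namespace-level Address")
order_type = Scope("Order", orders_ns)
order_type.declare("Address", "Order-local Address")
```

Under this rule, a local `Address` declared inside `Order` shadows the namespace-level one, while `Date` is still found in the global scope.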
Exclusive use of XML elements seems to be a cleaner, more systematic syntax, and more readily extensible than the use of XML attributes. However, it is typically more verbose. Note that XMI (the OMG XML Metadata Interchange format) uses data elements exclusively (?). By contrast, DCD (Document Content Description) allows either encoding.
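The trade-off can be seen by encoding the same (hypothetical) type specification both ways with Python's standard `xml.etree.ElementTree` module. The vocabulary (`Type`, `name`, `base`, `minInclusive`) is illustrative and is not taken from XML-Data, DCD, or XMI.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical type specification; the names are illustrative only.
spec: Dict[str, str] = {"name": "positiveInteger", "base": "integer", "minInclusive": "1"}


def encode_as_attributes(spec: Dict[str, str]) -> str:
    # <Type name="..." base="..." minInclusive="..." />
    return ET.tostring(ET.Element("Type", spec), encoding="unicode")


def encode_as_elements(spec: Dict[str, str]) -> str:
    # <Type><name>...</name><base>...</base><minInclusive>...</minInclusive></Type>
    root = ET.Element("Type")
    for key, value in spec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def decode_attributes(text: str) -> Dict[str, str]:
    return dict(ET.fromstring(text).attrib)


def decode_elements(text: str) -> Dict[str, str]:
    return {child.tag: child.text for child in ET.fromstring(text)}
```

The element form is longer, but an attribute name may occur at most once per element, whereas child elements may repeat and nest (for example, a list of enumerated values), which is one reason the element form extends more readily.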
Nested specification of types follows conventions widely employed in programming languages; often these nested types are not named (but they could be). The flat grammatical approach seems less natural to programmers accustomed to nested hierarchical specifications of data types.
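The relationship between the two styles can be shown by mechanically flattening a nested declaration: every anonymous nested type receives a generated name and is replaced by a reference to that name. The Python sketch below assumes a toy representation (a tuple type is a dict with a `"tuple"` list of field/type pairs) and a hypothetical naming scheme `<outer>.<field>`.

```python
from typing import Dict, Union

# A type expression is either the name of a type (str) or an anonymous tuple type
# written as {"tuple": [(field_name, type_expression), ...]}.
TypeExpr = Union[str, dict]


def flatten(name: str, expr: dict) -> Dict[str, dict]:
    """Turn one nested declaration into a flat set of named declarations."""
    flat: Dict[str, dict] = {}

    def visit(type_name: str, e: dict) -> None:
        fields = []
        for field_name, field_type in e["tuple"]:
            if isinstance(field_type, dict):
                # Anonymous nested type: give it a generated name, flatten it,
                # and refer to it by that name.
                generated = f"{type_name}.{field_name}"
                visit(generated, field_type)
                fields.append((field_name, generated))
            else:
                # Already a reference to a named type.
                fields.append((field_name, field_type))
        flat[type_name] = {"tuple": fields}

    visit(name, expr)
    return flat


person = {"tuple": [("name", "string"),
                    ("address", {"tuple": [("street", "string"), ("city", "string")]})]}
```

`flatten("Person", person)` yields two named declarations, `Person` and `Person.address`, with `Person` referring to the address type by name; the reverse direction (inlining named types) is only safe for non-recursive types.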
We believe that the factorization of basic type specifications into multiple attributes can easily be encoded in RDF by treating a "type" as a resource described by multiple "properties". As noted above, we view RDF (rather than raw XML) as the basis for our proposed syntax. This is also the position of the DCD Note.
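A minimal Python sketch of this encoding represents each attribute of a type as one (subject, property, value) triple. The namespace `http://example.org/types#` and the class name `BasicType` are hypothetical; only the `rdf:type` property URI is the standard RDF one. Triples are plain tuples rather than objects from any RDF library.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NS = "http://example.org/types#"   # hypothetical namespace for illustration only


def type_to_triples(type_name: str, attributes: Dict[str, str]) -> List[Triple]:
    """Describe a type as a resource: one (subject, property, value) triple per attribute."""
    subject = NS + type_name
    triples = [(subject, RDF_TYPE, NS + "BasicType")]
    for prop, value in attributes.items():
        triples.append((subject, NS + prop, value))
    return triples


def properties_of(subject: str, triples: List[Triple]) -> Dict[str, str]:
    # Read the attributes back (assumes each property has a single value).
    return {p: o for s, p, o in triples if s == subject}


triples = type_to_triples("positiveInteger", {"base": "integer", "minInclusive": "1"})
```

Adding an orthogonal attribute (say, a hypothetical `units` property) adds one triple and requires no change to the encoding itself.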
ISO Standard 11404 on Language Independent Data Types addresses many of the issues raised in our paper. We believe that it should be examined as a possible starting point for the development of a type system for XML/RDF. In particular, ISO 11404 makes some attempt to decompose the specification of basic types into a set of attributes (semantic type, range constraints, storage representation, etc.).
ISO 11404 was originally developed for use by various programming language, database, and data exchange standards efforts in an attempt to bring about some convergence on data type standardization. It includes a large number of base types, and a number of type constructors.
The ISO 11404 specification can be found at ..... at least for W3C members (and other standards developers). Because the material is copyrighted, we are as yet unable to provide legal public access to the document online. Hardcopies are available from ANSI in the United States. We have begun discussions with ISO concerning public (web) release of this standard.
We believe it is important to be (largely) interoperable with the most popular existing type systems used to encode data. Examples include SQL datatypes, ASN.1 datatypes, C, C++, Java, ODMG data types, ... At present, we are primarily concerned with congruence of basic types: integer, string, etc.
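Congruence of basic types can be checked mechanically once the target system's ranges are known. As one example, the Python sketch below maps an integer range constraint onto the narrowest signed Java integral type whose range (fixed by the Java Language Specification) contains it. The function name is hypothetical and the other systems listed are not covered; SQL, C, and C++ integer ranges are partly implementation-defined, so a similar table for them would need per-platform assumptions.

```python
from typing import Optional

# Value ranges of Java's signed integral primitive types (fixed by the JLS).
JAVA_INTEGRAL = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("int", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def narrowest_java_type(lo: int, hi: int) -> Optional[str]:
    """Return the narrowest Java type holding every value in [lo, hi], or None if none does."""
    if lo > hi:
        raise ValueError("empty range")
    for name, type_min, type_max in JAVA_INTEGRAL:
        if type_min <= lo and hi <= type_max:
            return name
    return None  # no primitive fits; an arbitrary-precision type would be needed
```

For instance, the range 0 to 255 does not fit Java's signed `byte` and maps to `short`, a reminder that "congruent" basic types can still differ in their exact value spaces.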
Below we give brief descriptions of each type system.