[This local archive copy is from the official and canonical URL, http://www.lbl.gov/~olken/mendel/w3c/type.extensibility.html; please refer to the canonical source document if possible.]

Extensible Type Specifications for RDF and XML Schemas

Frank Olken
Lawrence Berkeley National Laboratory
September 11, 1998

DRAFT 1.0

Introduction

This document addresses issues of extensible type specifications for use in XML and RDF schemas, i.e., schemas which describe information encoded as either XML or RDF documents. XML documents are described by XML schemas such as XML-Data and DCD (Document Content Description). RDF documents are described by W3C RDF schemas.

We are particularly concerned with two issues:

decomposing the description of basic types
extensibility of type specifications

A Common Type System for XML and RDF?

The immediate origin of this work was a perceived need by the RDF Schema Working Group for an explicit type system for use by RDF Schemas of RDF documents. Meanwhile Microsoft, et al. had proposed XML-Data as an attempt to recast DTDs into XML and to provide a type system for XML documents. Subsequently, Microsoft, et al. have proposed DCD (Document Content Description), which is similar in intent to XML-Data. However, DCD encodes the schema definition as an RDF document.

In various discussions, it has become clear that many W3C participants believe that both XML and RDF need extensible type systems, and that it would be preferable if these two type systems were the same, or at least had the same basic types. We are proceeding under this assumption and are therefore attempting to meet both sets of requirements.

In particular, we view RDF as the basis for our proposed type specifications, as does DCD.

Basic Type Specifications

A basic type specification is a tuple of type attributes, which specify the characteristics of basic data types:

Semantic Class: integer, real number representation, string, date, date-time (?)
Storage length: "n" octets
Transfer representation: e.g., floating point formats, character set encodings ....
Order relations: partial, total, none ....
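
As an illustration only, the tuple of type attributes listed above could be sketched as a small record. The class name, field names, and the two sample types below are assumptions made for this sketch; they are not part of any proposal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Order(Enum):
    """Kinds of order relation a basic type may carry."""
    NONE = "none"
    PARTIAL = "partial"
    TOTAL = "total"


@dataclass(frozen=True)
class BasicType:
    """One basic type described as a tuple of independent attributes."""
    semantic_class: str            # e.g. "integer", "string", "date"
    storage_octets: Optional[int]  # storage length in octets; None if variable
    transfer_repr: str             # e.g. a number format or character encoding
    order: Order                   # which order relation applies


# Hypothetical sample values, chosen only to show the shape of the tuple.
INT32 = BasicType("integer", 4, "twos-complement-big-endian", Order.TOTAL)
UTF8_STRING = BasicType("string", None, "UTF-8", Order.TOTAL)
```

Because each attribute is a separate field, two types that differ only in, say, transfer representation remain distinct values that share the same semantic class.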

A Taxonomy of Type Extensibility

Subtyping - e.g., more restrictive types, inheritance
Subtyping constructs a new type which is a further restriction of a data type, e.g., positive integers are a subtype of integers. Subtyping is one of the most common type extensions. It is supported in most programming languages and object oriented modeling methodologies, usually via "isa" relationships, and some inheritance mechanism. Note that range constraints (esp. of basic numeric types) are very common.

Inheritance of type specifications (and methods) is a major area of dispute, especially as regards single vs. multiple inheritance. We believe that multiple inheritance is quite important for concise specification of data type constraints.
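
A minimal sketch of subtyping by restriction, with multiple inheritance read as the conjunction of the parents' constraints. The class ConstrainedType and the sample types are hypothetical names introduced for this example, and other readings of multiple inheritance are possible.

```python
class ConstrainedType:
    """A type defined by its parent types plus one extra constraint."""

    def __init__(self, name, parents=(), constraint=lambda value: True):
        self.name = name
        self.parents = tuple(parents)
        self.constraint = constraint

    def accepts(self, value):
        # Parents are checked first, so a constraint such as "v > 0"
        # only ever sees values that already passed the parent checks.
        return (all(p.accepts(value) for p in self.parents)
                and self.constraint(value))

    def is_subtype_of(self, other):
        # Reflexive and transitive "isa" relationship.
        return self is other or any(p.is_subtype_of(other)
                                    for p in self.parents)


INTEGER = ConstrainedType(
    "integer",
    constraint=lambda v: isinstance(v, int) and not isinstance(v, bool))
POSITIVE = ConstrainedType("positiveInteger", [INTEGER], lambda v: v > 0)
EVEN = ConstrainedType("evenInteger", [INTEGER], lambda v: v % 2 == 0)

# Multiple inheritance: no new constraint is stated; both parents apply.
POSITIVE_EVEN = ConstrainedType("positiveEvenInteger", [POSITIVE, EVEN])
```

Here POSITIVE_EVEN is specified in one line by naming its two parents, which is the kind of concise constraint specification that multiple inheritance makes possible.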
Composite Types - a.k.a. collection types
Composite types are constructed via some sort of aggregation of more elementary types. Relational databases have sets or bags of tuples (aggregation) of basic types. Object oriented programming languages and DBMSs have additional composite type constructors:
- aggregation (a.k.a. tuple, struct or record)
- set of (no duplicates)
- bag of (duplicates allowed)
- sequence of (a.k.a. list)
- vector of (single dimension array)
- array of (possibly multidimensional)
There is also the question of whether such composite types can be constructed recursively.

Note that XML does not directly specify the type of composite structure. RDF currently supports some (bag, ...) of the type constructors, but not others.
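
The constructors listed above, and the question of recursive construction, can be sketched as follows. All class and function names are assumptions made for illustration; recursion is modeled here through references to named types, which is only one possible design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Basic:
    name: str            # a basic type such as "string" or "real"


@dataclass(frozen=True)
class Ref:
    name: str            # reference to a named type in a registry


@dataclass(frozen=True)
class Collection:
    kind: str            # "set", "bag", "sequence", "vector" or "array"
    element: object      # the element type expression


@dataclass(frozen=True)
class Aggregate:
    fields: tuple        # pairs of (field name, type expression)


def refs(t):
    """Names of all named types mentioned inside a type expression."""
    if isinstance(t, Ref):
        return {t.name}
    if isinstance(t, Collection):
        return refs(t.element)
    if isinstance(t, Aggregate):
        found = set()
        for _, field_type in t.fields:
            found |= refs(field_type)
        return found
    return set()


def is_recursive(name, registry):
    """True if the named type can reach itself through references."""
    seen, stack = set(), list(refs(registry[name]))
    while stack:
        current = stack.pop()
        if current == name:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(refs(registry[current]))
    return False


REGISTRY = {
    "Point": Aggregate((("x", Basic("real")), ("y", Basic("real")))),
    "Tree": Aggregate((("label", Basic("string")),
                       ("children", Collection("sequence", Ref("Tree"))))),
}
```

In this sketch "Tree" is recursive because its "children" field is a sequence of "Tree", while "Point" is not.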
Extensible type specifications
Two types of extensible type specifications are of interest:
- additional attribute values, e.g., new character set encodings
- additional type attributes, e.g., measurement units or dimensionality, or coordinate system
Note that the second sort of type specification extensibility is typically not provided by most type systems. These are (effectively) new type attributes which are orthogonal to existing type attributes.
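
The two kinds of extensibility can be contrasted in a short sketch. The registry class, its method names, and the attribute names and values below are all hypothetical and serve only to make the distinction concrete.

```python
class TypeAttributeRegistry:
    """Tracks which type attributes exist and which values each allows."""

    def __init__(self):
        self._allowed = {}  # attribute name -> set of allowed values

    def declare_attribute(self, name, values=()):
        # Second kind of extension: a whole new type attribute.
        self._allowed.setdefault(name, set()).update(values)

    def add_value(self, name, value):
        # First kind of extension: a new value for an existing attribute.
        if name not in self._allowed:
            raise KeyError("unknown type attribute: " + name)
        self._allowed[name].add(value)

    def validate(self, spec):
        # A spec is valid if every attribute and value it uses is known.
        return all(attr in self._allowed and value in self._allowed[attr]
                   for attr, value in spec.items())


registry = TypeAttributeRegistry()
registry.declare_attribute("semantic_class", {"string", "real"})
registry.declare_attribute("encoding", {"US-ASCII", "UTF-8"})

latin1_string = {"semantic_class": "string", "encoding": "ISO-8859-1"}
before_value = registry.validate(latin1_string)   # encoding not yet known
registry.add_value("encoding", "ISO-8859-1")
after_value = registry.validate(latin1_string)

length_in_metres = {"semantic_class": "real", "unit": "metre"}
before_attr = registry.validate(length_in_metres)  # "unit" not yet declared
registry.declare_attribute("unit", {"metre", "foot"})
after_attr = registry.validate(length_in_metres)
```

Declaring "unit" leaves existing specifications untouched, which reflects the point that such new attributes are orthogonal to the existing ones.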

There are also funny sorts of composite types (aggregates) used to denote various sorts of measurements such as length (feet and inches), weight (pounds and ounces), or latitude (degrees, minutes, seconds, north/south). Note that these composite constructions are equivalent to simple scalars (plus units). Arguably, they should be treated as alternative representations of the scalar forms. In any case, they are ubiquitous in electronic commerce, GIS, etc. and need to be addressed at some point.
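
The equivalence between such compound measurements and scalars with units can be shown with a few conversion functions. The function names are assumptions made for this sketch, and only two of the measurement kinds are covered.

```python
def feet_inches_to_inches(feet, inches):
    """Compound length (feet, inches) as a scalar number of inches."""
    return feet * 12 + inches


def inches_to_feet_inches(total_inches):
    """Scalar inches back to the compound (feet, inches) form."""
    return divmod(total_inches, 12)


def dms_to_degrees(degrees, minutes, seconds, hemisphere):
    """Latitude or longitude in degrees, minutes, seconds plus a
    hemisphere letter, as signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    sign = -1 if hemisphere in ("S", "W") else 1
    return sign * (degrees + minutes / 60 + seconds / 3600)
```

For example, 5 feet 11 inches corresponds to the scalar 71 inches, and 37 degrees 30 minutes south corresponds to -37.5 degrees. The conversions lose no information, which supports treating the compound form as an alternative representation rather than a distinct type.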

Types as Objects

In most languages types are not full rank objects, and no computations or queries on them are possible. We believe that treating types as full rank objects will enhance extensibility of the type system and facilitate implementation of extensible type systems. Treating types as objects (resources in RDF jargon) should fit nicely into RDF. Potential ramifications of such a decision should be carefully discussed, especially with members of the programming language communities, e.g., ML, which have experience/knowledge of such practices.
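
To make the idea of computing with and querying types concrete, the sketch below stores type descriptions as ordinary data. The sample types, attribute names, and the helper functions select and derive are hypothetical.

```python
# Each type is an ordinary object (here a dict) that programs can inspect.
TYPES = [
    {"name": "int32", "semantic_class": "integer",
     "storage_octets": 4, "order": "total"},
    {"name": "complex128", "semantic_class": "complex",
     "storage_octets": 16, "order": "none"},
    {"name": "date", "semantic_class": "date",
     "storage_octets": 4, "order": "total"},
]


def select(types, **criteria):
    """Query: names of all types whose attributes match the criteria."""
    return [t["name"] for t in types
            if all(t.get(key) == value for key, value in criteria.items())]


def derive(base, name, **overrides):
    """Computation: build a new type object from an existing one."""
    new_type = dict(base)
    new_type.update(overrides)
    new_type["name"] = name
    new_type["derived_from"] = base["name"]
    return new_type


totally_ordered = select(TYPES, order="total")   # ["int32", "date"]
int64 = derive(TYPES[0], "int64", storage_octets=8)
```

Because types are plain values in this sketch, adding a new type or a new attribute requires no change to the query or derivation code.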

Type Naming

Do we allow unnamed types, e.g., unnamed composite types or subtypes? Or do we require all types to be named explicitly? This has implications for specification of recursively constructed composite types.

Scope of Type Names

What is the scope of named types: global, an XML namespace, or the enclosing type declaration?

Syntax issues

Use of XML attributes vs. XML elements
Use of flat DTD-like grammar specifications vs. nested hierarchical type structure specifications.
Relation to RDF

Exclusive use of XML elements seems to be a cleaner, more systematic syntax and more readily extensible than the use of XML attributes. However, it is typically more verbose. Note that XMI (the OMG XML metadata interchange format) uses data elements solely (?). However, DCD (Document Content Description) allows either encoding.

Nested specification of types follows conventions widely employed in programming languages; often these nested types are not named (but they could be). The flat grammatical approach seems less natural to programmers accustomed to nested hierarchical specifications of data types.

We believe that the factorization of basic type specifications into multiple attributes can be easily encoded in RDF, by treating a "type" as a resource, described by multiple "properties". As noted above, we view RDF (rather than raw XML) as the basis for our proposed syntax. This is the position of the DCD Note.
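
The idea of a type as a resource described by multiple properties can be sketched with plain subject, property, value triples. The namespace URI and all property names below are invented for this example and are not taken from any RDF vocabulary.

```python
EX = "http://example.org/types#"  # hypothetical namespace for the sketch

# Each statement is (resource, property, value), as in the RDF data model.
TRIPLES = [
    (EX + "int32", EX + "semanticClass", "integer"),
    (EX + "int32", EX + "storageOctets", 4),
    (EX + "int32", EX + "order", "total"),
    (EX + "positiveInt32", EX + "subTypeOf", EX + "int32"),
    (EX + "positiveInt32", EX + "minInclusive", 1),
]


def describe(resource, triples):
    """Properties stated directly about one type resource."""
    return {prop: value for subj, prop, value in triples if subj == resource}


def effective(resource, triples):
    """Own properties merged over those inherited via subTypeOf."""
    own = describe(resource, triples)
    parent = own.get(EX + "subTypeOf")
    if parent is None:
        return own
    merged = effective(parent, triples)
    merged.update(own)
    return merged


positive = effective(EX + "positiveInt32", TRIPLES)
```

In this sketch the subtype states only its additional range constraint and picks up the semantic class, storage length, and order relation from the type it restricts.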

ISO 11404 - Language Independent Data Types

ISO Standard 11404 on Language Independent Data Types addresses many of the issues raised in our paper. We believe that it should be examined as a possible starting point for development of a type system for XML/RDF. In particular, ISO 11404 makes some attempt to decompose the specification of basic types into a set of attributes (semantic type, range constraints, storage representation, etc.).

ISO 11404 was originally developed for use by various programming language, database, and data exchange standards efforts in an attempt to bring about some convergence on data type standardization. It includes a large number of base types, and a number of type constructors.

The ISO 11404 specification can be found at ..... at least for W3C members (and other standards developers). Because the material is copyrighted, we are as yet unable to provide legal public access to the document online. Hardcopies are available from ANSI in the United States. We have begun discussions with ISO concerning public (web) release of this standard.

External Type System Compatibility

We believe it important to be (largely) interoperable with the most popular existing type systems used to encode data. Examples include SQL datatypes, ASN.1 datatypes, C, C++, Java, ODMG data types, ... We are primarily concerned (at present) with congruence of basic types: integer, string, etc.

Below we give brief descriptions of each type system.

ISO 11404
ASN.1 - an ISO data interchange standard
SQL 2
SQL 3?
Java

Bibliographic Resources on Type Systems

OO Type Theory by Laurent Dami
Work related to "A Theory of Objects" by Cardelli, et al.
A Theory of Objects by Luca Cardelli, et al.
The Type Forum, a mailing list on type theory

Maintained by Frank Olken at Lawrence Berkeley National Laboratory. olken@lbl.gov Last updated: September 11, 1998