[This local archive copy is from the official and canonical URL, http://www.lbl.gov/~olken/mendel/w3c/type.extensibility.html; please refer to the canonical source document if possible.]


Extensible Type Specifications for RDF and XML Schemas

Frank Olken
Lawrence Berkeley National Laboratory
September 11, 1998

DRAFT 1.0

Introduction

This document addresses issues of extensible type specifications for use in XML and RDF schemas, i.e., schemas that describe information encoded as either XML or RDF documents. XML documents are described by XML schemas such as XML-Data and DCD (Document Content Description). RDF documents are described by W3C RDF schemas.

We are particularly concerned with two issues:

A Common Type System for XML and RDF?

The immediate origin of this work was a perceived need by the RDF Schema Working Group for an explicit type system for use by RDF Schemas of RDF documents. Meanwhile, Microsoft, et al. had proposed XML-Data as an attempt to recast DTDs into XML and to provide a type system for XML documents. Subsequently, Microsoft, et al. proposed DCD (Document Content Description), which is similar in intent to XML-Data. However, DCD encodes the schema definition as an RDF document.

In various discussions, it has become clear that many W3C participants believe that both XML and RDF need extensible type systems, and that it would be preferable if these two type systems were the same, or at least had the same basic types. We are proceeding under this assumption and are therefore attempting to address both sets of requirements.

In particular, we view RDF as the basis for our proposed type specifications, as does DCD.

Basic Type Specifications

A basic type specification is a tuple of type attributes, which specify the characteristics of basic data types:

A Taxonomy of Type Extensibility

Types as Objects

In most languages, types are not full rank objects, and no computations or queries on them are possible. We believe that treating types as full rank objects will enhance the extensibility of the type system and facilitate the implementation of extensible type systems. Treating types as objects (resources in RDF jargon) should fit nicely into RDF. Potential ramifications of such a decision should be carefully discussed, especially with members of the programming language communities, e.g., ML, which have experience with and knowledge of such practices.
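
To make the idea concrete, here is a minimal sketch in Python (not part of any proposal) of types represented as ordinary objects that can be stored and queried like other data. The class name TypeSpec, its fields, and the sample types are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical illustration: a type is an ordinary object with attributes,
# so it can be stored, compared, and queried like any other data.
@dataclass(frozen=True)
class TypeSpec:
    name: str
    semantic_type: str              # e.g. "integer" or "string"
    minimum: Optional[int] = None   # optional range constraint
    maximum: Optional[int] = None

# Sample type objects (assumed for this example only).
TYPES = [
    TypeSpec("int16", "integer", -32768, 32767),
    TypeSpec("uint8", "integer", 0, 255),
    TypeSpec("text", "string"),
]

def types_admitting(value: int) -> List[str]:
    """Query the type objects: which bounded integer types can hold value?"""
    return [
        t.name
        for t in TYPES
        if t.semantic_type == "integer"
        and t.minimum is not None
        and t.maximum is not None
        and t.minimum <= value <= t.maximum
    ]

print(types_admitting(300))  # ['int16']
```

The query in types_admitting is the kind of computation over types that is unavailable when types exist only as declarations.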

Type Naming

Do we allow unnamed types, e.g., composite types or subtypes? Or do we require all types to be named explicitly? This has implications for the specification of recursively constructed composite types.
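
The link between naming and recursion can be illustrated with a small Python sketch. The names Basic, ListOf, Either, Ref, and REGISTRY are hypothetical. An anonymous composite type can be written inline, but a type that contains itself needs some name through which its body can refer back to it.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass(frozen=True)
class Basic:
    kind: str                  # "integer"; any other kind is treated as string

@dataclass(frozen=True)
class ListOf:
    element: "TypeExpr"        # anonymous composite: element type given inline

@dataclass(frozen=True)
class Either:
    left: "TypeExpr"
    right: "TypeExpr"

@dataclass(frozen=True)
class Ref:
    name: str                  # reference to a named type

TypeExpr = Union[Basic, ListOf, Either, Ref]

# A recursive type must be named so that its body can refer back to it:
#   Tree = integer | list of Tree
REGISTRY: Dict[str, TypeExpr] = {
    "Tree": Either(Basic("integer"), ListOf(Ref("Tree"))),
}

def conforms(value, t: TypeExpr) -> bool:
    """Check whether a Python value conforms to a type expression."""
    if isinstance(t, Basic):
        if t.kind == "integer":
            return isinstance(value, int) and not isinstance(value, bool)
        return isinstance(value, str)
    if isinstance(t, ListOf):
        return isinstance(value, list) and all(conforms(v, t.element) for v in value)
    if isinstance(t, Either):
        return conforms(value, t.left) or conforms(value, t.right)
    return conforms(value, REGISTRY[t.name])  # Ref: resolve the name

print(conforms([1, [2, [3]]], Ref("Tree")))         # True
print(conforms([1, "x"], Ref("Tree")))              # False
print(conforms([1, 2], ListOf(Basic("integer"))))   # True, no name needed
```

The last call uses an unnamed composite type; only the recursive Tree type had to be registered under a name.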

Scope of Type Names

What is the scope of named types: global, an XML namespace, or the enclosing type declaration?

Syntax issues

Exclusive use of XML elements seems to be a cleaner, more systematic syntax, and more readily extensible, than the use of XML attributes. However, it is typically more verbose. Note that XMI (the OMG XML metadata interchange format) uses data elements solely (?). However, DCD (Document Content Description) allows either encoding.

Nested specification of types follows conventions widely employed in programming languages; often these nested types are not named (but they could be). The flat grammatical approach seems less natural to programmers accustomed to nested hierarchical specifications of data types.

We believe that the factorization of basic type specifications into multiple attributes can be easily encoded in RDF, by treating a "type" as a resource, described by multiple "properties". As noted above, we view RDF (rather than raw XML) as the basis for our proposed syntax. This is the position of the DCD Note.
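
As a rough illustration of this factorization, the Python sketch below models an RDF description as a set of (resource, property, value) triples. The resource identifiers (such as types:int16) and the property names (semanticType, minInclusive, maxInclusive, storageBits) are invented for this example and are not drawn from RDF, DCD, or ISO 11404.

```python
# Each statement is a (resource, property, value) triple, as in the RDF model.
# All identifiers and property names below are hypothetical.
TRIPLES = {
    ("types:int16", "semanticType", "integer"),
    ("types:int16", "minInclusive", -32768),
    ("types:int16", "maxInclusive", 32767),
    ("types:int16", "storageBits", 16),
    ("types:name", "semanticType", "string"),
}

def describe(resource):
    """Collect all properties of one type resource into a dictionary."""
    return {p: v for (r, p, v) in TRIPLES if r == resource}

def resources_with(prop, value):
    """Find every resource that has the given property value."""
    return sorted(r for (r, p, v) in TRIPLES if p == prop and v == value)

print(describe("types:int16")["maxInclusive"])     # 32767
print(resources_with("semanticType", "integer"))   # ['types:int16']
```

Because each type attribute is a separate property, new attributes can be attached to a type resource without changing the statements that already describe it.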

ISO 11404 - Language Independent Data Types

ISO Standard 11404 on Language Independent Data Types addresses many of the issues raised in our paper. We believe that it should be examined as a possible starting point for development of a type system for XML/RDF. In particular, ISO 11404 makes some attempt to decompose the specification of basic types into a set of attributes (semantic type, range constraints, storage representation, etc.).

ISO 11404 was originally developed for use by various programming language, database, and data exchange standards efforts in an attempt to bring about some convergence on data type standardization. It includes a large number of base types, and a number of type constructors.

The ISO 11404 specification can be found at ..... at least for W3C members (and other standards developers). Because the material is copyrighted, we are as yet unable to provide legal public access to the document online. Hardcopies are available from ANSI in the United States. We have begun discussions with ISO concerning public (web) release of this standard.

External Type System Compatibility

We believe it important to be (largely) interoperable with the most popular existing type systems used to encode data. Examples include SQL datatypes, ASN.1 datatypes, C, C++, Java, ODMG data types, ... We are primarily concerned (at present) with congruence of basic types: integer, string, etc.

Below we give brief descriptions of each type system.

Bibliographic Resources on Type Systems

Maintained by Frank Olken at Lawrence Berkeley National Laboratory. olken@lbl.gov Last updated: September 11, 1998