Measurement Units in XML Datatypes

Frank Olken and John McCarthy

olken@lbl.gov, jlmccarthy@lbl.gov

Lawrence Berkeley National Laboratory

 

1999 June 7 (Version 0.5c)

 

URL for latest version: http://www.lbl.gov/~olken/mendel/w3c/xml.schema.wg/units/syntax.htm

1.0 EXECUTIVE SUMMARY

This paper discusses how the XML Schema Working Group's current Datatypes proposal can be extended to support structured representation of measurement units and related information. We outline an incremental approach to incorporating measurement units into XML and XML schemas. The approach can integrate a variety of measurement units representations, from instance-level units specifications to detailed formal type specifications of the underlying measurement properties, dimensions, and coordinate systems.

Time may not permit incorporation of the full proposal outlined here into the initial Proposed Recommendation for XML Datatypes. But our XML Schema Working Group can and should go beyond simple lexical representation of dates, times, and other common measurement units, as proposed in our current drafts. This paper shows how we can begin to do so by simply adding measurement units, etc. as optional facet(s) to the current XML Datatypes draft. We also provide a roadmap for more complete treatment of measurement units in Version 2.0 of the XML Datatypes Recommendation.

We have tried to structure this paper so that it can serve both datatype specialists and those whose interest is more general. Readers with limited time may simply skim section 7 "Detailed Design" but should be sure to read sections 8 "Incremental Implementation" and 9 "Conclusions".

2.0 TABLE OF CONTENTS

3.0 WHO NEEDS MEASUREMENT UNITS?

Nearly all applications which involve descriptions of things in the real world require some sort of measurement units and/or coordinates. Examples range from simple notions of length (for page layout) to mass or volume for bulk commodity transactions, to various types of dateTime specifications (temporal coordinates) to measurement of various concentrations for medical or environmental monitoring. Measurement units are perhaps the oldest standards in human history. Detailed study of measurement semantics (i.e., dimensional analysis) dates back more than a century. Measurement units are a key aspect of data representation and exchange for many important communities, including electronic commerce (of bulk commodities), medicine, engineering (CAD), environmental monitoring, and other scientific applications. Inadequately documented units specifications render the corresponding measurements meaningless and are a chronic source of errors in data exchange.

If our XML Schema Working Group does not begin to address this need in a systematic way, other standards groups (e.g., health care informatics standards organizations such as HL7, CEN TC 251 or ASTM) will begin to proliferate sector-specific idiosyncratic XML solutions that will hinder inter-sector interoperability.


The XML Schema WG's initial datatype proposal already contains one of the more complicated examples of measurement units, namely dateTime. There has also been some discussion about lexical representation of other common measurement units such as currency and length. The question is not whether the XML-Schema Working Group will include measurement units as part of its recommendation, but whether we do so in a systematic, extensible way or in a more limited, piecemeal fashion. We hope that considering these questions in a broader analytic framework of measurement properties and coordinate systems will help facilitate our immediate discussions as well as help us develop more flexible and extensible solutions.

4.0 WHERE ARE MEASUREMENT UNITS NEEDED IN XML?

Potential users of XML measurement units specifications seek several different sorts of capabilities. Many of these have been noted in early proposals to the WG (e.g., XML-Data), in requests from the XSL WG, and in email discussions concerning implicitly structured strings (e.g., "3 inches"). Various users have sought to:

4.1 Binding times

Plausible binding times for various aspects (facets) of measures/coordinates include:

Each of these specification modes has its uses. Complete flexibility (instance-level specification of everything) might be appropriate for general-purpose applications such as catalogs, where the attributes of catalog elements vary widely in the sorts of measurements they specify. Complete units specification in the XML Schema would be appropriate for routine data exchange between two applications. For large-scale data exchanges, schema-level specification of measurement units offers the advantage of reducing the amount of data transferred. Finally, partial specification of measured data types is appropriate for many applications that know what type of data to expect (e.g., length). Such applications can let users specify their preferred units (meters or feet) while rejecting units (kilograms) that are incompatible with the type of input desired (length).

We would hope that a flexible type system would permit all of the various modes of units specification. Furthermore, one might envision that a schema could specify a default units specification, which could be replaced by explicit element instance units specification, e.g., in the opening element tag. This would be similar to the XML-Data proposal for element instance measurement unit specification. However, we envision that this units attribute would refer to a qualified name of a units declaration, rather than simply be a funny sort of string.
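The following minimal sketch combines a schema-level default, an instance-level override, and a partial (dimension-only) constraint. The `SCHEMA` and `UNIT_DIMENSION` tables, the element name `width`, and the function `resolve_units` are hypothetical illustrations. They are not part of any XML Schema draft.

```python
# Hypothetical registry fragment: the dimension of each known unit.
UNIT_DIMENSION = {
    "meter": "length",
    "foot": "length",
    "kilogram": "mass",
}

# Hypothetical schema declarations: a required dimension (partial
# specification) plus an optional default unit that instances may override.
SCHEMA = {
    "width": {"dimension": "length", "default_units": "meter"},
}


def resolve_units(element_name, instance_units=None):
    """Return the effective unit for an element, enforcing its dimension."""
    decl = SCHEMA[element_name]
    # An explicit instance-level unit (e.g. from the opening tag) wins.
    units = instance_units if instance_units is not None else decl["default_units"]
    if units is None:
        raise ValueError(f"no units given for {element_name!r}")
    # Partial specification: any unit is allowed if its dimension matches.
    if UNIT_DIMENSION.get(units) != decl["dimension"]:
        raise ValueError(
            f"{units!r} is not a unit of {decl['dimension']} for {element_name!r}"
        )
    return units
```

With this sketch, `resolve_units("width")` falls back to `"meter"`, and `resolve_units("width", "foot")` accepts the override. `resolve_units("width", "kilogram")` raises `ValueError` because kilograms do not measure length.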

4.2 Shared Registries for Systems of Units

Ordinary users would rarely, if ever, need to construct or even see the complete specification for particular units and related properties. We envision that a few standards organizations would package complete sets of units declarations (basis units, dimensions/properties, and coordinate definitions) into web-accessible XML documents, which individual schemas could reference. Thus one might have SI, US English, and (British) Imperial units specification reference documents. More specialized communities might package collections of units for electrical, chemical, or nuclear measurements. Reference to such units definitions would be easier if namespace prefixes (abbreviations) could be used reliably in attribute and element content. Commonly used units would thus resemble the standard macro packages of typesetting systems such as LaTeX or troff, or the standard libraries of many programming languages.
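One way to picture shared registries is a qualified unit name such as `us:inch`, resolved through prefix bindings to a registry document. This is a minimal sketch under that assumption. The `urn:example:` namespace URIs, the registry contents, and `lookup_unit` are hypothetical placeholders, not real published units documents.

```python
# Hypothetical registries keyed by namespace URI (placeholders only).
REGISTRIES = {
    "urn:example:units:si": {
        "meter": {"dimension": "length", "factor_to_si": 1.0},
    },
    "urn:example:units:us": {
        "inch": {"dimension": "length", "factor_to_si": 0.0254},
    },
}

# Prefix bindings, as a schema or instance document might declare them.
PREFIXES = {"si": "urn:example:units:si", "us": "urn:example:units:us"}


def lookup_unit(qname, prefixes=PREFIXES, registries=REGISTRIES):
    """Resolve a 'prefix:local' unit name to its registry entry."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        raise ValueError(f"expected prefix:name, got {qname!r}")
    namespace = prefixes[prefix]          # prefix -> namespace URI
    return registries[namespace][local]  # namespace -> unit declaration
```

The design point is that only the prefix bindings change between documents. The registry entries themselves are shared.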

5.0 CONCEPTUAL OVERVIEW

In order to fully specify the various aspects of measurements we consider the following concepts.

We also distinguish between:

Note that measured property and measurement dimensionality are semantic (conceptual) properties of the data, while measurement units are representational properties. Observe that dateTimeStamps (July 3, 1999 3:00 PM PDT) are temporal coordinates, whereas durations of temporal intervals (e.g., 3 hours 20 minutes 5 seconds) are temporal measures. The distinction between measures and coordinates is semantic, not simply representational. Note also that anchored intervals (e.g., the month of May 1999) are usually specified as a pair of temporal coordinates, but anchored intervals are semantically distinct from temporal locations.

6.0 DESIGN ISSUES

6.1 Intensional vs. Extensional Units Specification

Measurement units can be incorporated as a natural extension of our current faceted datatype specification [5]. Measurement units can also be systematically specified and described by XML representations of dimensionality, measures, and coordinates. These ideas are extensions of ideas originally presented in our earlier paper on "Multi-faceted Datatype Specification for XML Schemas" [6].

There are at least two ways to incorporate measurement units into XML schemas and document instances:

  1. Intensional specification: Add new facet(s) to the scalar type system.
  2. Extensional specification: Construct one generic measurement type (struct) composed of (numeric value, units), thereby specifying units at the element instance level.

The advantage of the first (faceted) approach is that units issues are treated orthogonally to all other type issues. Hence issues of numeric precision, scaling, etc. can be addressed independently of units. Furthermore, this approach makes it possible to specify the units/dimensionality of elements/attributes at XML schema definition time. This in turn facilitates automatic generation of validation code. Schema specification of measurement units also permits more concise data exchange, because units designators have been factored out of the data exchange (into the schema).

The second approach would be to construct a generic measurement type as a composite type (i.e., a struct) consisting of two components, (numeric value, units). This effectively buries all of the units conversion and measurement consistency issues in the methods supported on this generic measurement type. The approach is "extensional" because the units specification is carried along with each instance of the measurement. It has often been used to add measurement units to object-oriented languages such as C++, since it is simpler to implement one extra measurement class than to change the native type system extensively. Note that the units component could be a string, an IDREF, or a struct. This approach imposes less burden on XML processors and the XML type system. However, it has several deficiencies:

Various XML applications may find either approach advantageous. We therefore envision XML Datatypes supporting both approaches.
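The contrast between the two approaches can be sketched with two small types. `Measurement` follows the extensional (value, units) struct, and `MeasuredType` carries units as a facet of the type. Both class names and the `si:meterPerSecond` designator are hypothetical illustrations, not a proposed API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Extensional approach: each value carries its own units designator."""
    value: float
    units: str  # hypothetical designator, e.g. a qualified name


@dataclass(frozen=True)
class MeasuredType:
    """Intensional approach: units are a facet of the type, not of each value."""
    base_type: type
    units: str

    def parse(self, lexical):
        # Instances are bare lexical values; the units come from the type.
        return self.base_type(lexical)


# Extensional data repeats the units designator in every instance...
readings = [
    Measurement(12.5, "si:meterPerSecond"),
    Measurement(13.0, "si:meterPerSecond"),
]

# ...whereas the intensional form states it once, in the type.
speed_mps = MeasuredType(float, "si:meterPerSecond")
values = [speed_mps.parse(text) for text in ["12.5", "13.0"]]
```

Both forms recover the same numbers. Only the extensional form pays for the units designator on every instance.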

In general, we favor the faceted approach on the grounds that it appears to offer the best prospects for specifying and verifying constraints on units. However, we recognize that some implementors may object to the additional complexity introduced into the type system. Note that while details of our proposal depend on how one introduces units/dimensionality into XML schemas, the fundamental semantic specifications of measurement units/dimensionality and coordinates (including conversions) are independent of this choice.

6.2 XML Syntax Issues

We propose a strawman syntax for specifying various aspects of measurement data (e.g., units, properties, dimensions, and coordinates). While we have some syntactic preferences (outlined below), the principal purpose of this note is to outline the basic concepts needed to describe measurement data in a systematic fashion, by extending our basic mechanism for specifying datatypes via facets. We anticipate that some adjustment of the syntax will occur in the course of harmonizing it with the rest of the XML Datatype and Schema syntax.

There are a number of generic syntax issues which must be considered in proposing XML Schema Syntax:

We will treat each of these matters in turn.

Explicit XML tagging forces all of the syntactic issues into pure XML, leaving only atomic information (e.g., unit names) as attribute or element content. Implicitly structured strings allow a simpler tag system. However, attribute/element content then becomes composite strings with their own syntax, e.g., regular expressions, unit expressions, etc. The attraction of explicit XML tagging is that only one parser, for XML, is required. Implicitly structured strings require both an XML parser and a parser for the content strings. Explicit XML tagging is more readily interoperable, but tends to be verbose and sometimes awkward to read. Implicitly structured strings may be more concise and readable. We favor explicit XML tagging, because we anticipate that the detailed XML specifications for measurement units, dimensionality, etc. will be tucked away in standard namespace schema fragments and not often read by most users. Explicit tagging is also easier to implement and extend.
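To make the one-parser versus two-parser point concrete, the sketch below reads the same quantity from an explicitly tagged element and from an implicitly structured string. The element names (`length`, `value`, `units`) and the regular expression for the string form are hypothetical. Note that the string form also yields the plural designator "inches", which a processor would still have to normalize.

```python
import re
import xml.etree.ElementTree as ET

# Explicitly tagged: the XML parser alone recovers value and units.
explicit = "<length><value>3</value><units>inch</units></length>"
root = ET.fromstring(explicit)
explicit_value = float(root.findtext("value"))
explicit_units = root.findtext("units")

# Implicitly structured: XML yields only a string, so a second parser
# (here a hypothetical regular expression) is needed to split it.
implicit = "<length>3 inches</length>"
text = ET.fromstring(implicit).text
match = re.fullmatch(r"\s*([-+]?\d+(?:\.\d+)?)\s+(\S+)\s*", text)
if match is None:
    raise ValueError(f"unparseable measurement string {text!r}")
implicit_value = float(match.group(1))
implicit_units = match.group(2)
```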

As an example consider the following:

Note, however, that the example above is the simplest possible case. In general, units designations may include products/ratios of integer powers of base units, e.g., joules per gram per kelvin (joule/gram/kelvin) for specific heat capacity.

If the XML Schema WG decides that implicitly structured strings are necessary, we believe that processors should map them into composite XML structs according to a specified grammar. This approach would ensure that the information is delivered to applications in a uniform manner (i.e., as XML Info Set components accessed via the DOM).
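The mapping from implicitly structured strings to composite XML can be sketched with a deliberately tiny grammar. Factors are separated by ".", each factor is a unit symbol with an optional integer exponent, and a single "/" separates numerator from denominator (e.g., `kg.m2/s2`). The output mirrors the `numerator`/`denominator`/`unit`/`radix`/`exponent` structure used for composite units. This grammar and the function names are hypothetical, and radices are emitted as plain symbols rather than XLinks.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical factor syntax: letters, then an optional integer exponent.
FACTOR = re.compile(r"([A-Za-z]+)(\d*)")


def _append_factors(parent, part):
    """Append one <unit> element per '.'-separated factor in `part`."""
    for factor in part.split("."):
        m = FACTOR.fullmatch(factor)
        if m is None:
            raise ValueError(f"bad unit factor {factor!r}")
        unit = ET.SubElement(parent, "unit")
        ET.SubElement(unit, "radix").text = m.group(1)
        ET.SubElement(unit, "exponent").text = m.group(2) or "1"


def units_string_to_xml(expr):
    """Map e.g. 'kg.m2/s2' to a <units> numerator/denominator struct."""
    numerator, sep, denominator = expr.partition("/")
    units = ET.Element("units")
    _append_factors(ET.SubElement(units, "numerator"), numerator)
    if sep:
        _append_factors(ET.SubElement(units, "denominator"), denominator)
    return units
```

For example, `ET.tostring(units_string_to_xml("m/s"), encoding="unicode")` produces the same nesting as the `meterspersecond` declaration in section 7.2.3, minus the XLink wrappers.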

On the subject of attribute specifications vs. element specifications, we favor the (nearly) universal use of elements. This provides a measure of uniformity in the syntax, and elements are a more extensible mechanism than attributes. We are mindful here of Microsoft's criticism of the syntactic diversity of the RDF proposals.

The subject of flat vs. nested specifications of units, etc. is similar to that of flat vs. nested specification of datatypes. Programming languages and DDLs tend to favor the more concise nested representation, while grammar-driven parsers (e.g., SGML) tend to use the simpler flat specifications. In this paper we primarily use flat specifications, because the resulting grammar is simpler and the scoping issues are easier to understand.

The issue of referencing predefined units, dimensions, etc., is similar to the issue of how to reference predefined datatypes. One can construct an entirely separate symbol table mechanism and scoping rules for units, etc., a cumbersome and expensive notion. Alternatively, one can use the mechanisms already provided in XML, e.g., IDREF, XLINKS, Namespaces, .... We favor the latter course, as we expect the Schema WG will for general type extensibility. We presently favor XLINKS as the most general mechanism, and we envision the use of namespace designators in the URL specification. An argument could be made that XPTRS (which can specify the span of an XML subtree) would be better than XLINKS, which formally specify only a single location in an XML document. However, we assume that an XLINK will be used to designate the opening tag of an element and thus implicitly refer to the entire element content.

Note that the manner of referring to units specifications may interact with binding time and syntax considerations. Thus XML-Data [2] proposed to permit instance-level specification of measurement units of elements via a units attribute in the opening element tag. This method is not readily applicable to specifying measurement units for attributes, since it would require a second attribute for each measured attribute. In XML-Data, units were designated by means of a string. As noted above, such units designators are sometimes ambiguous, and they require a second (non-XML) parser to parse the units strings. Given our proposed composite specification of units/dimensions, etc., there appear to be two practical ways to provide instance-level specifications of units, which may be used separately or together. The first is a measurement datatype (value, units). The second is HREF-type attributes (preferably qualified names) that reference the units type declaration. As noted above, namespaces would be used to package consistent (and perhaps topical) collections of properties, dimensionalities, units, and coordinate specifications.

7.0 DETAILED DESIGN

This section outlines what a full-blown measurement units specification component for XML would include. As noted above, we do not anticipate that most users would write units specifications at this level of detail. However, we believe all of these components are vital. Furthermore, we believe the distinctions we make below could help inform the current debates in the XML Schema WG on related issues. This material constitutes the "backend" specification which would undergird measurement unit and property names.

7.1 MEASURED PROPERTIES

Measured properties are physical properties which can be observed in a quantifiable manner. Examples include mass, length, time, current, luminous intensity, amount, area, speed, energy, power, etc. Measured properties have dimensionality specified with respect to a particular set of basis dimensions. Measured properties are both a necessary and a convenient part of the datatype specification, because dimensionality specifications may not uniquely specify the measured property. For example, mass length**2 / time**2 may be either torque or energy (work).

An example syntax:

	<measuredProperty name='volumetric_density' />
	<measuredProperty name='volume' />
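Torque and energy share the same dimensionality, so a dimensionality check alone cannot keep them apart, which is why measured properties are named explicitly. The minimal sketch below represents a property as a name plus exponents over SI basis dimensions (M, L, T). The class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasuredProperty:
    name: str
    # Sorted (basis dimension, exponent) pairs, e.g. (("L", 2), ("M", 1), ("T", -2)).
    dimensions: tuple


def measured_property(name, **exponents):
    """Build a property from exponents over SI basis dimensions (M, L, T, ...)."""
    return MeasuredProperty(name, tuple(sorted(exponents.items())))


TORQUE = measured_property("torque", M=1, L=2, T=-2)
ENERGY = measured_property("energy", M=1, L=2, T=-2)


def same_dimensionality(a, b):
    return a.dimensions == b.dimensions


def comparable(a, b):
    """Values are comparable only when property and dimensionality both agree."""
    return a.name == b.name and same_dimensionality(a, b)
```

Here `same_dimensionality(TORQUE, ENERGY)` is true, but `comparable(TORQUE, ENERGY)` is false.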

7.2 MEASUREMENT UNITS

Measurement units are a representational property which designate the units (meters, seconds, kilograms) in which a measured property is expressed. Conversion among measurement units is usually multiplicative. Units are, in effect, scale factors for numeric values of measurements.

It is thus natural and straightforward to specify measurement units simply as a further kind of derived datatype with the addition of a units facet. For example, consider the following datatype declaration:

	<typedcl ID="speedmps">
		<name> speed in meters/second </name>
		<typefacets>
			<basetype> float32 </basetype>
			<units> <A xml:link="simple"  HREF="#meterspersecond" /> </units>
		</typefacets>
	</typedcl>

Here we have simply added a facet called "units." Now let us see how more complex kinds of units can be defined in a systematic way. Units can be basis units, derived units, composite units, or multi-radix units.
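Because units act as scale factors, converting between two units of the same dimension reduces to multiplying by a ratio of factors. In this sketch each unit carries its dimension and the factor that converts it to the SI unit of that dimension. The table and `convert` are hypothetical; the inch and foot factors are the exact legal definitions.

```python
# Hypothetical table: unit -> (dimension, factor converting one unit to SI).
TO_SI = {
    "meter": ("length", 1.0),
    "centimeter": ("length", 0.01),
    "inch": ("length", 0.0254),
    "foot": ("length", 0.3048),
    "kilogram": ("mass", 1.0),
}


def convert(value, from_unit, to_unit):
    """Multiplicative unit conversion, refusing mismatched dimensions."""
    dim_from, factor_from = TO_SI[from_unit]
    dim_to, factor_to = TO_SI[to_unit]
    if dim_from != dim_to:
        raise ValueError(f"cannot convert {dim_from} to {dim_to}")
    # Scale to SI, then from SI to the target unit.
    return value * factor_from / factor_to
```

Results are floating point, so `convert(1, "foot", "inch")` is 12 up to rounding error.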

7.2.1 Basis Units

We first need to specify a set of basis units. For example,

	<unitdecl ID="meter">
		<name> Meter </name>
		<UnitType> Base </UnitType>
		<symbol> m </symbol>
		<dimensionality> <A xml:link="simple"  HREF="#length" /> </dimensionality> 
		<definition>  
			Defined as some number times the
			wavelength of Cesium-xxx transition 
			between two specified electronic states.
		</definition>  
		<cite> <A xml:link="simple"  HREF="...." /> </cite>
	</unitdecl>

7.2.2 Derived Units

We can then specify various derived units (i.e., units that are used like basis units, but are defined in terms of standard SI units).

	<unitdecl ID="inch">
		<name> Inch </name>
		<UnitType> Derived </UnitType>
		<dimensionality> 
			<A xml:link="simple"  HREF="#length" />
		</dimensionality> 
		<conversionfactor> 0.0254 </conversionfactor>
		<units> 
			<A xml:link="simple"  HREF="#meter" />
		</units>

	</unitdecl>
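A processor could read a derived-unit declaration like the one above and apply its conversion factor. This sketch parses a well-formed version of the `inch` declaration with Python's standard `xml.etree.ElementTree`. The helper names and the returned dictionary layout are hypothetical; XLink resolution is reduced to stripping the leading "#" from each HREF.

```python
import xml.etree.ElementTree as ET

INCH_DECL = """
<unitdecl ID="inch">
  <name> Inch </name>
  <UnitType> Derived </UnitType>
  <dimensionality> <A xml:link="simple" HREF="#length" /> </dimensionality>
  <conversionfactor> 0.0254 </conversionfactor>
  <units> <A xml:link="simple" HREF="#meter" /> </units>
</unitdecl>
"""


def read_derived_unit(xml_text):
    """Extract the ID, factor, target unit, and dimension of a derived unit."""
    decl = ET.fromstring(xml_text.strip())
    return {
        "id": decl.get("ID"),
        # float() tolerates the surrounding whitespace in element content.
        "factor": float(decl.findtext("conversionfactor")),
        "target": decl.find("units/A").get("HREF").lstrip("#"),
        "dimension": decl.find("dimensionality/A").get("HREF").lstrip("#"),
    }


def to_target_units(value, unit):
    """Convert a value in the derived unit to its target (here SI) unit."""
    return value * unit["factor"]
```

Calling `read_derived_unit(INCH_DECL)` returns a factor of 0.0254 toward `meter`, so 12 inches converts to about 0.3048 m.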

7.2.3 Composite Units

Composite units are formed as the quotient of the products of powers of the base or derived units. Exponents are always integers, usually from 0 to 3. Many proposals, e.g., Schadow [7], allow only a single vector of integer exponents (either positive or negative). We instead allow a quotient of two sets of exponentiated basis units, mimicking Euclides and LOINC in the medical laboratory results reporting standards. This permits us to denote concentration mass ratios as distinct from concentration volume ratios or concentration molar ratios. (See further discussion below under dimensions.) For example,

		<unitdecl ID="meterspersecond" >
		<name> MetersPerSecond </name>
		<UnitType> Composite </UnitType>
		<dimensionality>
			<A xml:link="simple"  HREF="#DimSpeed" />
		</dimensionality>
		<units>
			<numerator>
			<unit>
				<radix> <A xml:link="simple"  HREF="#meter" />  </radix>
				<exponent> 1   </exponent>
			</unit>
			</numerator>
			<denominator>
			<unit>
				<radix> <A xml:link="simple"  HREF="#second" />  </radix>
				<exponent> 1   </exponent>
			</unit>
			</denominator>
		</units>
		</unitdecl>

Readers familiar with other implementations of units designators may ask why we do not simply encode the product of powers (positive or negative) of basis elements. The reason for allowing "mass/mass" or "amount (moles)/amount (moles)" is that such "units" are commonly used in recording concentrations of substances. The SI standard is to measure concentration in moles/cubic meter. However, nominally dimensionless ratios are quite commonly used in medicine and in air and water pollution measurements. Our quotient approach allows one to differentiate between mass ratios and molar ratios. See references cited above. Note that ISO, ANSI, et alia, disapprove of the use of mass/volume/molar ratios for measuring concentration.
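The difference between a quotient representation and a single exponent vector can be sketched directly. Both a mass ratio and a molar ratio have net exponents that cancel to nothing, yet the quotient form keeps them distinct. The representation (sorted pairs for numerator and denominator) and the function names are hypothetical.

```python
from collections import Counter


def composite(numerator, denominator):
    """A composite unit as a quotient of two products of powered units."""
    return (tuple(sorted(numerator.items())), tuple(sorted(denominator.items())))


def net_exponents(unit):
    """Collapse a quotient into a single exponent vector (what it loses)."""
    num, den = unit
    net = Counter(dict(num))
    net.subtract(dict(den))
    return {name: exp for name, exp in net.items() if exp != 0}


MASS_RATIO = composite({"kilogram": 1}, {"kilogram": 1})
MOLAR_RATIO = composite({"mole": 1}, {"mole": 1})
METERS_PER_SECOND = composite({"meter": 1}, {"second": 1})
```

Both ratios collapse to the same empty vector under `net_exponents`, while `MASS_RATIO != MOLAR_RATIO` still holds.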

7.2.4 Multi-radix units

Multi-radix units include such constructions as [feet, inches] or [hours, minutes, secs] or [degrees, minutes, seconds] (for latitude).

Note that we do not explicitly indicate the radix, since this can be determined from the canonical representation of the component units. Also note that multi-radix units are purely a representational issue.

Multi-radix measurement units have been dealt with similarly in a paper by Lorentzos [8]. Lorentzos called multi-radix measurements "no-metric" measurements, by which he meant that they were not specified as a single number.


	<unitdecl ID="feetinches">
	<name> feetinches </name>
	<UnitType> MultiRadix </UnitType>
	<multiradix>
		<radixcomponent> <A xml:link="simple"  HREF="#feet" /> </radixcomponent>
		<radixcomponent> <A xml:link="simple"  HREF="#inches" /> </radixcomponent>
	</multiradix>
	</unitdecl>
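Converting between a multi-radix representation and a single number is mixed-radix arithmetic. For simplicity this sketch assumes the ratios between adjacent components are passed in explicitly (12 inches per foot; 60 minutes per hour), rather than derived from the component unit declarations. It handles only regular radices; irregular ones such as calendar months are not covered.

```python
def from_multiradix(components, ratios):
    """Collapse e.g. [feet, inches] with ratios [12] into the smallest unit."""
    total = components[0]
    for value, ratio in zip(components[1:], ratios):
        total = total * ratio + value
    return total


def to_multiradix(total, ratios):
    """Split a count of the smallest unit back into its components."""
    parts = []
    for ratio in reversed(ratios):
        total, remainder = divmod(total, ratio)
        parts.append(remainder)
    parts.append(total)
    return list(reversed(parts))
```

For example, `from_multiradix([5, 11], [12])` gives 71 inches, and `to_multiradix(3723, [60, 60])` recovers `[1, 2, 3]` as hours, minutes, and seconds.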

7.2.5 Logarithmic Units

Some units, such as decibels (dB) for noise or dBm for power, are actually logarithms of the ratio of the measured noise/power to a standard reference amount of noise/power. For example, dBm is ten times the base-10 logarithm of the ratio of the measured power to one milliwatt. We suggest that logarithmic units be specified as a kind of derived unit, with both a reference level and a log transform specification.
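A logarithmic unit described by a reference level and a log transform might look like the following. The dictionary layout for `DBM` is a hypothetical representation. The dBm definition itself (10 log10 of power over 1 mW) is standard.

```python
import math

# Hypothetical representation: reference level (in watts) and the
# multiplier applied to the base-10 logarithm of the ratio.
DBM = {"reference": 1e-3, "multiplier": 10.0}


def to_log_units(value, unit):
    """Linear quantity (watts) -> logarithmic level (e.g. dBm)."""
    return unit["multiplier"] * math.log10(value / unit["reference"])


def from_log_units(level, unit):
    """Logarithmic level (e.g. dBm) -> linear quantity (watts)."""
    return unit["reference"] * 10 ** (level / unit["multiplier"])
```

Under this sketch one watt is 30 dBm and one milliwatt is 0 dBm.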

7.3 DIMENSIONS

Just providing a mechanism to specify standard and derived measurement units would be quite helpful, but we can go much further. In order to support validity checking and more complex transformations, we can specify measurement properties and units in terms of the fundamental measurement dimensions on which they are based. Note that each of the above examples includes specification of one or more dimensions, which themselves can be specified in terms of XML structures.

Dimensionality is a semantic property which specifies the fundamental nature of the property being measured (e.g., mass, length, time). Dimensionality is thus a semantic concept (e.g., length), whereas units are a representational issue (e.g., meters). Furthermore, units (such as ton) may be ambiguous (e.g., units of mass, refrigeration, or explosive energy) unless the dimensionality or property is also specified.

Note that in terms of ISO 11179, dimensionality would be a property of a data element concept, whereas units are a property of a data element (e.g., database column). It is conventional to express compound dimensions/units in terms of quotients of products of basis dimensions/units. Note that in the syntax below we allow both numerators and denominators for the dimensions/units. This is to permit the explicit specification of mass and molar ratios, which would otherwise be represented as dimensionless quantities.

[NOTE: Barry Taylor (NIST) has pointed out that dimensionality may depend on the system of units. Thus in SI units, current is a basis dimension and charge is defined in terms of current (as current*time). In the CGS-ESU system of units, charge is a basis dimension, and current is defined as charge/time. At some point in the future, we could extend the mechanisms proposed here to include different measurement systems (e.g., SI vs. CGS-ESU).]

The SI system of units is used nearly throughout the world; the United States is the most prominent exception. Even in the U.S., conventional units (inches, pounds, quarts) are now all defined exactly in terms of SI units, i.e., as derived units, and metric (SI) usage is mandated for government agencies. In the SI (standard metric) system of units, the basis dimensions and units are:

Dimension                    Unit
Mass                         kilogram
Time                         second
Length                       meter
Amount of Substance          mole
Current                      ampere
Luminous Intensity           candela
Thermodynamic Temperature    kelvin

Below is a suggested syntax which characterizes dimensions in terms of simple XML constructs.

7.3.1 Basis Dimension Syntax

As with units, we first need to specify the basis dimensions.

	<dimensiondecl ID="length">
		<name> length </name>
		<DimensionType> Base </DimensionType>
		<definition>  
			A measurement of distance.  
		</definition>  
		<cite> <A xml:link="simple"  HREF="...." /> </cite>
		<exampleunits>
			Meters, ....
		</exampleunits>
	</dimensiondecl>

7.3.2 Composite Dimension Syntax

Composite dimensions can then be built on top of our basis dimensions. For example,

		<dimensiondecl ID="DimSpeed" >
		<name> Speed  </name>
		<DimensionType> Composite </DimensionType>
		<dimensionality>
			<numerator>
			<dimension>
				<radix> <A xml:link="simple"  HREF="#length" />  </radix>
				<exponent> 1   </exponent>
			</dimension>
			</numerator>
			<denominator>
			<dimension>
				<radix> <A xml:link="simple"  HREF="#time" />  </radix>
				<exponent> 1   </exponent>
			</dimension>
			</denominator>
		</dimensionality>
		</dimensiondecl>
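With dimensions declared this way, a validator can check that a composite unit conforms to a declared dimension. It sums each basis unit's dimension exponents, numerator minus denominator, and compares the result. The lookup table, `DIM_SPEED` literal, and function names below are hypothetical illustrations of that check.

```python
from collections import Counter

# Hypothetical registry fragment: the basis dimension of each basis unit.
BASIS_UNIT_DIMENSION = {"meter": "length", "second": "time", "kilogram": "mass"}

# Net exponents of the DimSpeed declaration: length**1 / time**1.
DIM_SPEED = {"length": 1, "time": -1}


def dimensionality(numerator, denominator):
    """Net basis-dimension exponents of a quotient of powered basis units."""
    net = Counter()
    for unit, exponent in numerator.items():
        net[BASIS_UNIT_DIMENSION[unit]] += exponent
    for unit, exponent in denominator.items():
        net[BASIS_UNIT_DIMENSION[unit]] -= exponent
    return {dim: exp for dim, exp in net.items() if exp != 0}


def conforms(numerator, denominator, declared):
    """True if a composite unit has the declared dimensionality."""
    return dimensionality(numerator, denominator) == declared
```

Under these assumptions meters/second conforms to `DIM_SPEED`, while bare meters or kilograms/second do not.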

7.3.3 Dimensionality Binding

For each measured property we need to specify its dimensionality with respect to some set of basis dimensions. Hence, we need to specify (and name) the basis set. The most commonly used basis dimension set is the SI set (standard metric dimensions) given above. Thus measured properties will have attributes/subelements which reference the basis dimension set and a composite dimensionality specification. While some systems of units employ nonstandard basis dimensions (charge instead of current), we expect that most applications will use the SI dimensions.

7.4 COORDINATES

We also need to differentiate between measures (scalar quantities such as the length of a spatial or temporal interval) and COORDINATES, a position in a spatial, temporal, or temperature coordinate system.

Coordinates refer to a point in a temporal, spatial, or temperature coordinate system. They are a location rather than a magnitude. Conversion among coordinates is often an affine transformation (change of both offset and scale). Coordinates are specified relative to a frame of reference, which specifies an origin and (for vectors) a set of basis vectors.

The significance of this distinction is twofold. Measures and coordinates are not directly comparable, i.e., it should be a violation of the type system to ask if a temporal duration is less than a dateTimestamp. Secondly, conversion of coordinates usually involves at least affine transformations (i.e., y=a+bx), whereas unit conversions are usually merely multiplication by conversion factors. Coordinates always require some specification of the origin (and sometimes direction) of the coordinate system with respect to some enclosing coordinate system. We have not yet formalized that aspect.

Note that we can add or subtract two measures to get a measure. We can add a measure to a coordinate to get a coordinate. We can subtract two coordinates to get a measure.

Examples of some common measures and their corresponding coordinates are shown in the table below:

Measure                   Coordinate
duration                  dateTimestamp
length                    position
temperature difference    absolute temperature

Thus a temporal duration (the length of a temporal interval) might be specified as 20 seconds, whereas a dateTimestamp would be given as "1999-04-14 16:00:00 PDT".
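The arithmetic rules for measures and coordinates can be enforced by a type system. This minimal sketch uses two hypothetical classes: `Measure` (e.g., a duration in seconds) and `Coordinate` (e.g., seconds from some origin). Disallowed operations such as adding two coordinates, or asking whether a duration is less than a timestamp, raise `TypeError`.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Measure:
    value: float  # a magnitude, e.g. a duration in seconds

    def __add__(self, other):
        if isinstance(other, Measure):
            return Measure(self.value + other.value)  # measure + measure
        return NotImplemented

    def __sub__(self, other):
        if isinstance(other, Measure):
            return Measure(self.value - other.value)  # measure - measure
        return NotImplemented


@dataclass(frozen=True, order=True)
class Coordinate:
    value: float  # a location, e.g. seconds from some origin

    def __add__(self, other):
        if isinstance(other, Measure):
            return Coordinate(self.value + other.value)  # coordinate + measure
        return NotImplemented

    __radd__ = __add__  # measure + coordinate is also a coordinate

    def __sub__(self, other):
        if isinstance(other, Coordinate):
            return Measure(self.value - other.value)  # coordinate - coordinate
        if isinstance(other, Measure):
            return Coordinate(self.value - other.value)
        return NotImplemented
```

With `order=True`, the generated comparisons apply only within one class. A comparison such as `Measure(1) < Coordinate(2)` therefore fails rather than silently comparing raw numbers.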

Coordinates are often specified as vectors, e.g., Cartesian coordinates (x,y,z) or spherical coordinates (latitude, longitude, radius). Such coordinate systems require that we specify the origin of the coordinate system (translation) and the directions (rotation) of the basis elements with respect to some parent coordinate system. Because XML Schema does not yet support vectors or matrices, we omit further discussion of this topic.

7.4.1 Coordinate Syntax

Coordinates require that we specify (at least) the origin of the coordinate system. Although "coordinateness" is actually a semantic property, we follow conventional practice and include a units designation (which we need for the origin offset). We assume the coordinate units are the same as the offset units. [Editorial note: This still needs some thought.]

	<unitdecl ID="KelvinTemp">
		<name> Kelvins </name>
		<UnitType> Coordinate </UnitType>
		<dimensionality> 
			<A xml:link="simple"  HREF="#temperature" /> 
		</dimensionality>
		<units>
			<A xml:link="simple"  HREF="#Kelvins" />
		</units>
		<coordinates>
			<origindescription>
			Absolute zero temperature - no molecular motion.
			</origindescription>
			<originoffset> 0.0 </originoffset>
		</coordinates>
	</unitdecl>
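The affine character of coordinate conversion shows up clearly with temperature. In this sketch each temperature coordinate system is described by a scale (kelvins per degree) and the position of its zero point in kelvins. Coordinates convert as y = a + bx, while temperature differences (measures) convert by the scale factor alone. The `SYSTEMS` table and function names are hypothetical; the offsets (273.15 K, and 459.67 degrees Rankine at 5/9 K per degree) are the standard definitions.

```python
# Hypothetical table: kelvins per degree, and the zero point in kelvins.
SYSTEMS = {
    "kelvin": {"scale": 1.0, "origin_k": 0.0},
    "celsius": {"scale": 1.0, "origin_k": 273.15},
    "fahrenheit": {"scale": 5.0 / 9.0, "origin_k": 459.67 * 5.0 / 9.0},
}


def convert_coordinate(value, src, dst):
    """Affine conversion of an absolute temperature (a coordinate)."""
    s, d = SYSTEMS[src], SYSTEMS[dst]
    kelvins = s["origin_k"] + s["scale"] * value
    return (kelvins - d["origin_k"]) / d["scale"]


def convert_difference(delta, src, dst):
    """Multiplicative conversion of a temperature difference (a measure)."""
    return delta * SYSTEMS[src]["scale"] / SYSTEMS[dst]["scale"]
```

Here 100 degrees Celsius converts to 212 degrees Fahrenheit as a coordinate. A 10-degree Celsius difference converts to an 18-degree Fahrenheit difference (not 50), because the origin offset does not apply to measures.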

7.4.2 Calendrical Coordinates

Calendrical coordinates (dates) are temporal coordinates (dateTimestamps) specified with respect to some calendar system. A calendar typically takes the form of a multi-radix specification (year, month, day, hour, minute, second) in which some of the radix components are irregular (not always the same length, e.g., months). More subtle issues arise for timestamps given in Coordinated Universal Time (UTC). An occasional leap second is added to certain minutes of UTC, relative to International Atomic Time (TAI), to maintain correspondence with solar time. Some applications need to support atomic time (i.e., seconds since some fixed time, without leap seconds) for astronomical, satellite tracking/imagery, and GPS applications.

Note that we do not include an exact specification of date/time conversion to/from calendrical systems. Note also that even datestamps (without timestamps) need a time zone designation to indicate exactly which 24 hour period is referred to. Commonly, dates are given with only implicit time zone specification.
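The need for a time zone even on a bare datestamp can be shown with Python's standard `datetime` module. The helper `day_interval_utc` is a hypothetical illustration. It assumes a fixed UTC offset and ignores daylight-saving transitions.

```python
from datetime import datetime, timedelta, timezone


def day_interval_utc(year, month, day, utc_offset_hours):
    """UTC start and end of a calendar day observed at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = datetime(year, month, day, tzinfo=tz)  # local midnight
    end = start + timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)
```

The same date, 1999-06-07, begins at 07:00 UTC when read at UTC-7 (PDT), but at 15:00 UTC on the previous day when read at UTC+9.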

In the example below we (partially) specify a Gregorian Date/timestamp, with the implicit specification that it is in the Common Era, a.k.a. Anno Domini (A.D. in Christian countries). The <calendrical> tag is used to denote a calendar coordinate system, vs. a spatial coordinate system.
	<coordinatedecl ID="GregorianDateTimeInCE" >
		<name> GregorianDateTimeInCE </name>
		<calendrical>
		<sequence>
			<radixcomponent> <A xml:link="simple"  HREF="#GregorianYear" /> </radixcomponent>
			<radixcomponent> <A xml:link="simple"  HREF="#GregorianMonth" /> </radixcomponent>
			<radixcomponent> <A xml:link="simple"  HREF="#GregorianDay" /> </radixcomponent>
			<radixcomponent> <A xml:link="simple"  HREF="#GregorianHour" /> </radixcomponent>
			<radixcomponent> <A xml:link="simple"  HREF="#GregorianMinute" /> </radixcomponent>
			<radixcomponent> <A xml:link="simple"  HREF="#GregorianSecond" /> </radixcomponent>
		</sequence>
		</calendrical>
	</coordinatedecl>

8.0 INCREMENTAL IMPLEMENTATION

Some members of the XML Schema WG have suggested that we defer the proposed specification of measurement units. This is partly due to several factors:

- the press of other matters before the XML Schema WG;
- the unfamiliarity of some WG members with the detailed specification of measurement units;
- the complexity of the resulting type system;
- the absence of a completely detailed specification;
- the absence of many major commercial implementations.

Note, however, that there have been experimental implementations of many aspects of the proposed design. In particular, Funston, et al., have built composite types which include units specifications.

In an effort to allay some of these concerns, we propose an incremental approach to incorporating measurement units into the XML Datatypes proposal. We envision the following phases:

9.0 CONCLUSIONS

Measurement data arises in many XML applications, including eCommerce, medicine, engineering and construction, page layout, etc. Measurement data requires some sort of units designation to be meaningful. Such units designations can either be provided with each instance (e.g., via a composite element structure with numerical value and units subelements), or specified as part of the type specification for an element or attribute.

The XML Schema WG's initial datatype proposal already contains one of the more complicated examples of measurement units, namely dateTime. There has also been some discussion about lexical representation of other common measurement units such as currency and length. The question is not whether the XML-Schema Working Group will include measurement units as part of its recommendation, but whether we do so in a systematic, extensible way or in a less consistent, ad-hoc way.

In this paper we have discussed how to extend XML base datatype definitions to include measurement units. Measurement units are not simply a representational matter. A unit sets the scale on which a particular property is measured, and that property can be characterized by its dimensionality with respect to a standard set of base dimensions, e.g., mass, length, time, and so on. Dimensionality reflects the semantic nature of a property and partitions units into equivalence classes of inter-convertible units.
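One way to sketch dimensional analysis of this kind is to give each unit two things: a scale factor relative to SI base units, and a vector of exponents over the base dimensions. Two units then belong to the same equivalence class exactly when their vectors match. The Python sketch below is an illustration with several assumptions: it uses exact `Fraction` arithmetic, keeps only three base dimensions rather than SI's seven, and uses hypothetical names. It is not the syntax proposed in this paper.

```python
from fractions import Fraction
from typing import NamedTuple, Tuple

# Exponents over a reduced basis (mass, length, time); SI itself has seven
# base dimensions, trimmed here for brevity.
Dimension = Tuple[int, int, int]


class Unit(NamedTuple):
    symbol: str
    factor: Fraction      # size of one unit expressed in SI base units
    dimension: Dimension  # exponents of mass, length, time


def multiply(a: Unit, b: Unit, symbol: str) -> Unit:
    """Product of two units: factors multiply, dimension exponents add."""
    return Unit(symbol, a.factor * b.factor,
                tuple(x + y for x, y in zip(a.dimension, b.dimension)))


def convert(value: Fraction, src: Unit, dst: Unit) -> Fraction:
    """Rescale a value; only units in the same dimension class are inter-convertible."""
    if src.dimension != dst.dimension:
        raise ValueError(f"{src.symbol} and {dst.symbol} differ in dimension")
    return value * src.factor / dst.factor


KILOGRAM = Unit("kg", Fraction(1), (1, 0, 0))
METRE = Unit("m", Fraction(1), (0, 1, 0))
NEWTON = Unit("N", Fraction(1), (1, 1, -2))           # kg·m/s²
DYNE = Unit("dyn", Fraction(1, 100_000), (1, 1, -2))  # g·cm/s² = 1e-5 N
JOULE = Unit("J", Fraction(1), (1, 2, -2))            # N·m
```

For example, `convert(Fraction(3), NEWTON, DYNE)` returns `Fraction(300000, 1)`, while `convert(Fraction(1), NEWTON, JOULE)` raises `ValueError`. Dimension vectors alone cannot tell torque (N·m) from energy (J), because both have exponents (1, 2, -2).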

We have tried to show that incorporating measurement units, etc. into the XML type system is advantageous because it yields clearer semantic specifications and better data validation. We have suggested that this could be done by adding facets to the basic type specification mechanism proposed for the XML Schema Language.
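As a rough illustration of facet-based validation, the following Python sketch models a decimal datatype refined by a `units` facet and range bounds stated in those units. An instance that arrives in other units is converted before the range check. An instance in a unit of the wrong dimension is rejected. The `MeasureType` class, its field names, and the `UNITS` registry are hypothetical and do not reflect XML Schema facet syntax.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical units registry: symbol -> (dimension, size in SI base units).
UNITS = {
    "m": ("length", Decimal("1")),
    "cm": ("length", Decimal("0.01")),
    "in": ("length", Decimal("0.0254")),
    "kg": ("mass", Decimal("1")),
}


@dataclass(frozen=True)
class MeasureType:
    """A decimal datatype narrowed by hypothetical 'units' and range facets."""
    units: str              # units facet: the units the type is declared in
    min_inclusive: Decimal  # bounds are stated in the declared units
    max_inclusive: Decimal

    def validate(self, lexical: str, instance_units: str) -> Decimal:
        """Parse an instance, convert it to the declared units, and range-check it."""
        value = Decimal(lexical)
        declared_dim, declared_size = UNITS[self.units]
        instance_dim, instance_size = UNITS[instance_units]
        if instance_dim != declared_dim:
            raise ValueError(f"{instance_units} is not a unit of {declared_dim}")
        converted = value * instance_size / declared_size
        if not self.min_inclusive <= converted <= self.max_inclusive:
            raise ValueError(f"{converted} {self.units} is outside the allowed range")
        return converted
```

With `MeasureType("cm", Decimal("0"), Decimal("300"))`, an instance of `70` in inches validates as 177.8 cm. An instance of `4` m is rejected as out of range, and any mass value is rejected outright.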

We also distinguished between measures (magnitudes of properties) and coordinates (positions in time, in space, or on temperature scales). Date/time stamps can be seen as an unusual kind of temporal coordinate system.
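Temperature shows why the distinction matters. Converting a coordinate (a reading on a scale) needs both a scale factor and an offset. Converting a measure (a difference between two readings) needs only the scale factor. The Python sketch below assumes the exact definitions 0 °C = 273.15 K and °F = °C × 9/5 + 32. The class and function names are illustrative.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class TemperatureScale:
    symbol: str
    kelvin_per_degree: Fraction  # size of one degree, in kelvins
    origin_in_kelvin: Fraction   # where this scale's zero lies, in kelvins


KELVIN = TemperatureScale("K", Fraction(1), Fraction(0))
CELSIUS = TemperatureScale("degC", Fraction(1), Fraction(27315, 100))
FAHRENHEIT = TemperatureScale("degF", Fraction(5, 9), Fraction(45967, 180))


def convert_coordinate(value, src: TemperatureScale, dst: TemperatureScale) -> Fraction:
    """Convert a position on one scale to another (affine: scale and offset)."""
    kelvin = src.origin_in_kelvin + value * src.kelvin_per_degree
    return (kelvin - dst.origin_in_kelvin) / dst.kelvin_per_degree


def convert_measure(delta, src: TemperatureScale, dst: TemperatureScale) -> Fraction:
    """Convert a temperature difference (linear: the origins cancel)."""
    return delta * src.kelvin_per_degree / dst.kelvin_per_degree
```

Here `convert_coordinate(100, CELSIUS, FAHRENHEIT)` gives 212, whereas `convert_measure(100, CELSIUS, FAHRENHEIT)` gives 180: a rise of 100 Celsius degrees is a rise of 180 Fahrenheit degrees. Subtracting two coordinates on one scale yields a measure. Adding two coordinates has no physical meaning, so the sketch offers no such operation.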

We have suggested a possible XML syntax for these specifications. These constructs are sufficient to deal with both SI (metric) units and U.S. customary units (a.k.a. English units), which are in fact legally defined in terms of SI units.
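The point that U.S. customary units are defined from SI units can be expressed as a chain of exact definitions that always ends in an SI base unit. This Python sketch uses the 1959 international definitions (1 in = 0.0254 m and 1 lb = 0.45359237 kg, both exact). The table layout and the `to_si` helper are hypothetical. Exact `Fraction` arithmetic keeps conversions free of rounding.

```python
from fractions import Fraction
from typing import Dict, Optional, Tuple

# unit -> (multiple, unit it is defined in terms of); None marks an SI base unit.
DEFINITIONS: Dict[str, Tuple[Fraction, Optional[str]]] = {
    "m": (Fraction(1), None),
    "kg": (Fraction(1), None),
    "in": (Fraction(254, 10_000), "m"),               # exactly 0.0254 m
    "ft": (Fraction(12), "in"),
    "yd": (Fraction(3), "ft"),
    "mi": (Fraction(5280), "ft"),
    "lb": (Fraction(45_359_237, 100_000_000), "kg"),  # exactly 0.45359237 kg
    "oz": (Fraction(1, 16), "lb"),
}


def to_si(unit: str) -> Tuple[Fraction, str]:
    """Follow the definition chain down to an SI base unit, multiplying exactly."""
    factor = Fraction(1)
    while True:
        multiple, defined_in = DEFINITIONS[unit]
        factor *= multiple
        if defined_in is None:
            return factor, unit
        unit = defined_in
```

For instance, `to_si("yd")` returns `(Fraction(1143, 1250), 'm')`, which is exactly 0.9144 m, and `to_si("mi")` yields exactly 1609.344 m.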

We have also suggested an incremental approach to the specification and validation of measurement units, etc. which we believe will facilitate their early adoption.

We believe that units specifications are an essential part of the specification of data elements: essential for understanding and processing the data, and essential for reliable data exchange. Failure to address the problem will merely shift it to other standards venues. Leaving the matter to individual users will simply perpetuate the current unreliable, chaotic state of affairs. We believe that addressing this problem in the XML Schema Language will permit better solutions and facilitate interoperable XML applications.

10.0 ACKNOWLEDGEMENTS and DISCLAIMERS

This work is supported by the U.S. Environmental Protection Agency, Office of Information Resource Management and Superfund Office, as part of EPA efforts to improve the documentation of data and to facilitate finding and integrating data from multiple sources. The EPA has recognized that detailed documentation of the measurement units of data elements is essential.

Opinions expressed in this paper are solely those of the authors, and do not necessarily reflect policy positions of Lawrence Berkeley National Laboratory, U.S. Environmental Protection Agency, or the U.S. Department of Energy.

11.0 DOCUMENT STATUS

This document is preliminary; we anticipate revisions. Comments on the semantics, syntax, and presentation are sought, as are additional references to other standards for units designation. Please send comments to the authors via email.

Known problems

12.0 BIBLIOGRAPHY

  1. ISO/IEC 11404. Language Independent Datatypes.
    URL: http://www.w3.org/XML/Group/1998/09/iso11404/iso11404.19980924.html
  2. Taylor, Barry. Guide for the Use of the International System of Units (SI). NIST Special Publication 811, 1995 Edition. U.S. National Institute of Standards and Technology.
  3. ISO TC-12. Quantities and Units. ISO Standards Handbook, 3rd edition. International Organization for Standardization, Geneva, 1993. 345 pages. ISBN 92-67-10185-4. (Contains multiple ISO standards; available in the United States from ANSI.)
  4. ISO TC-12. ISO 31:1992, Parts 0-13: Quantities and Units. International Organization for Standardization, Geneva, Switzerland, 1992.
  5. Biron, Paul; Malhotra, Ashok. XML Datatype Language. 16 April 1999.
  6. Olken, Frank; McCarthy, John. Multi-faceted Datatype Specification for XML Schemas. 5 March 1999.
  7. Funston, Monica Gayle; Gerstle, Walter; Panthaki, Malcolm. "Quantity, Revisited: An Object-Oriented Reusable Class." 1998(?).
    URL: http://www.arc.unm.edu/CoMeT/publication/quantity.html
  8. Schadow, G.; McDonald, C.J.; Suico, J.G.; Föhring, U.; Tolxdorff, T. "Units of measure in clinical information systems." Journal of the American Medical Informatics Association, vol. 6, no. 2, Mar-Apr 1999, pp. 151-162.
  9. Lorentzos, N.A. "DBMS support for nonmetric measurement systems." IEEE Transactions on Knowledge and Data Engineering, vol. 6, no. 6, Dec. 1994, pp. 945-953.
  10. Gruber, T.R.; Olsen, G.R. "An ontology for engineering mathematics." In Doyle, J.; Sandewall, E.; Torasso, P. (eds.), Proceedings of the 4th International Conference on Principles of Knowledge Representation and Reasoning (KR'94), Bonn, Germany, 24-27 May 1994. San Francisco, CA, USA: Morgan Kaufmann Publishers, 1994, pp. 258-269.
    URL: http://www-ksl.stanford.edu/knowledge-sharing/papers/engmath.html
Maintained by Frank Olken at Lawrence Berkeley National Laboratory. olken@lbl.gov Last updated: June 7, 1999 7:45 PM PDT