XTech 2000 Presentation
What XML Schema Designers Need to Know About Measurement Units
Frank Olken, Computer Scientist, Lawrence Berkeley National Laboratory olken@lbl.gov
John McCarthy, Computer Scientist, Lawrence Berkeley National Laboratory jlmccarthy@lbl.gov
Biography: Frank Olken is a database researcher at Lawrence Berkeley National Laboratory. He currently works on metadata and related standards (ISO 11179, XML Schema, XML Query Language) and various informatics issues (data exchange, distributed systems, etc.) related to electric power systems. His other interests include statistical data management, sampling from databases, OLAP, genomic databases, workflow management, ....
John McCarthy is a database administrator and researcher at Lawrence Berkeley National Laboratory. He currently also works on metadata standards (ISO 11179, XML Schema Language).
Abstract: On September 23, 1999, NASA lost the Mars Climate Orbiter spacecraft (at a cost of $125 million). "The 'root cause' of the loss of the spacecraft was the failed translation of English units into metric units in a segment of ground-based, navigation-related mission software, as NASA has previously announced," said Arthur Stephenson, Chairman of the Mars Climate Orbiter Mission Failure Investigation Board.
This loss reminds us of the importance of measurement units for data exchange in various fields (navigation, engineering, architecture, medicine, science, and commerce). Misunderstandings of measurement units are not simply expensive - in medical settings and aircraft navigation they can be fatal. As the Failure Investigation Board noted, NASA contractors failed to fully specify the interface (i.e., measurement units) between the two software packages. While NASA contemplates various management review mechanisms to prevent recurrence of such problems, we believe the appropriate answer is to effectively extend the type systems used to specify data interchanges (e.g., XML Schemas for XML documents) to include measurement unit information. Hence, it would be possible to automatically detect (and often to automatically resolve via conversion) inconsistent measurement units, much as we now routinely detect the need to convert among integer, single precision floating point and double precision floating point numbers in computer programs.
We review the classical semantic theory of physical measurement units known as dimensional analysis. This theory was developed over the past 100 years by physicists, chemists, and mathematicians. Each kind of measurement has a "dimensionality", e.g., mass, length, time, length/time (i.e., speed), etc. The dimensionality may be specified as the ratio of two monomials in which the variables (indeterminates) represent the basis dimensions, e.g., in the SI system of units, mass, length, time, current, .... Thus the dimensionality for speed would be (Length**1 / Time**1). The dimensionality for energy is (Mass**1 * Length**2 / Time**2). Note that dimensionality is a partial specification of the semantics of a measurement. Also note that, given a specified list of basis dimensions, we can specify the dimensionality of a measurement as two exponent vectors (one for the numerator monomial and one for the denominator), where we have restricted the exponents to be non-negative.
In contrast, measurement units are a representational aspect of a measurement. Typically they are specified as a scale factor with respect to a set of basis units (one basis unit for each basis dimension). Thus 1 inch is defined as 2.54 * 10**-2 meters. The reason for using a ratio of two monomials for dimensionality, rather than a single monomial with signed exponents, is that this permits us to distinguish among "dimensionless" quantities, such as mass ratios (mass/mass), molar ratios (moles/moles), or volume ratios (length**3/length**3), which would otherwise all cancel to the same empty monomial. Such a requirement arises with respect to dimensionless (ratiometric) specification of concentrations.
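The two-exponent-vector representation of dimensionality described above can be sketched as follows. This is only an illustration: the names `BASIS` and `Dimensionality`, and the choice of four basis dimensions in this order, are assumptions made for the example and are not taken from the presentation.

```python
from dataclasses import dataclass

# Assumed basis order for this sketch (a subset of the SI base dimensions).
BASIS = ("mass", "length", "time", "current")


@dataclass(frozen=True)
class Dimensionality:
    """Ratio of two monomials, stored as two non-negative exponent vectors."""
    numerator: tuple
    denominator: tuple

    def __post_init__(self):
        for vec in (self.numerator, self.denominator):
            if len(vec) != len(BASIS) or any(e < 0 for e in vec):
                raise ValueError("need one non-negative exponent per basis dimension")

    def net_exponents(self):
        """Signed exponents after cancelling the numerator against the denominator."""
        return tuple(n - d for n, d in zip(self.numerator, self.denominator))

    def is_dimensionless(self):
        return all(e == 0 for e in self.net_exponents())


SPEED = Dimensionality((0, 1, 0, 0), (0, 0, 1, 0))         # Length**1 / Time**1
ENERGY = Dimensionality((1, 2, 0, 0), (0, 0, 2, 0))        # Mass**1 * Length**2 / Time**2
MASS_RATIO = Dimensionality((1, 0, 0, 0), (1, 0, 0, 0))    # mass / mass
VOLUME_RATIO = Dimensionality((0, 3, 0, 0), (0, 3, 0, 0))  # length**3 / length**3
```

Both `MASS_RATIO` and `VOLUME_RATIO` report `is_dimensionless()` as true, yet they compare as unequal because the uncancelled vectors are retained. A single signed exponent vector would lose that distinction.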
Most common dimensionally consistent unit conversions can be accomplished by dividing by the ratio of the unit scale factors with respect to a common set of basis units. However, dimensionally inconsistent conversions, e.g., mass to volume, are quite common in commercial applications. Such conversions are more complex, relying on knowledge of material properties (such as density) that vary with the condition (e.g., temperature) of the material.
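A minimal sketch of the two kinds of conversion, under stated assumptions: the table `LENGTH_TO_METERS` and both function names are hypothetical. The inch factor is the one given in the text, and the foot factor is the standard exact definition.

```python
# Scale factors relative to the basis unit of length (the meter).
LENGTH_TO_METERS = {"meter": 1.0, "inch": 2.54e-2, "foot": 0.3048}


def convert_length(value, from_unit, to_unit):
    """Dimensionally consistent conversion via the ratio of scale factors.

    Multiplying by from/to is the same as dividing by the ratio to/from.
    """
    return value * LENGTH_TO_METERS[from_unit] / LENGTH_TO_METERS[to_unit]


def mass_to_volume(mass_kg, density_kg_per_m3):
    """Dimensionally inconsistent conversion: requires a material property.

    The caller must supply the density of the material at the relevant
    condition (e.g., temperature); deliberately, there is no default.
    """
    if density_kg_per_m3 <= 0:
        raise ValueError("density must be positive")
    return mass_kg / density_kg_per_m3
```

The first function needs only the unit table. The second cannot be written without an extra argument, which is the practical difference between the two cases.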
Not all unit conversions are simple multiplications. Temperature conversions are affine transformations, accounting for differing origins of the temperature scales. Also note that "temperature coordinates" and "temperature intervals" are different sorts of measurements and have differing unit conversions. Lamentably, this distinction is not always made explicitly. Similar distinctions must be made between spatial interval lengths and spatial coordinates as well as temporal intervals and coordinates. Temporal coordinates are commonly referred to as dates or datetimestamps.
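The coordinate/interval distinction can be illustrated with Celsius and Fahrenheit. The function names below are hypothetical; the constants are the standard definitions of the two scales.

```python
def celsius_to_fahrenheit_coordinate(temp_c):
    """Convert a temperature reading (a point on the scale): affine map."""
    return temp_c * 9.0 / 5.0 + 32.0


def celsius_to_fahrenheit_interval(delta_c):
    """Convert a temperature difference: scale factor only, no offset."""
    return delta_c * 9.0 / 5.0
```

Applying the coordinate conversion to an interval adds a spurious 32 degrees, which is the kind of error that arises when the distinction is left implicit.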
Having reviewed dimensionality and measurement unit conversion in general terms, we proceed to discuss how measurement units (and dimensionality) could be systematically encoded in XML (either in schemas or documents). We observe that usage of conventional abbreviations (miles/hour) would require the development of an application-specific parser. Instead we suggest a systematic mapping of conventional unit designations into legal XML names (miles_Per_hour). Such names could be used to reference definitions (dimensionality monomials, scale factors, ...) via designated namespaces (e.g., ISOMeasurementUnits). Such an approach, while more verbose than conventional abbreviations, would be simpler to implement and maintain. We provide detailed examples of how this could be done.
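One possible form of such a mapping is sketched below. Only the `miles_Per_hour` example comes from the text. The `_Times_` word, the function name `unit_to_xml_name`, and the ASCII-only name check (a conservative subset of what XML permits in names) are assumptions made for illustration.

```python
import re

# Hypothetical operator words; "_Per_" follows the miles_Per_hour example.
_OPERATORS = {"/": "_Per_", "*": "_Times_"}

# Conservative ASCII-only subset of legal XML names (no colon).
_XML_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")


def unit_to_xml_name(designation):
    """Map a designation such as 'miles/hour' to a name such as 'miles_Per_hour'."""
    name = designation.strip()
    for symbol, word in _OPERATORS.items():
        # Absorb any whitespace around the operator symbol.
        name = re.sub(r"\s*" + re.escape(symbol) + r"\s*", word, name)
    name = re.sub(r"\s+", "_", name)  # remaining spaces become underscores
    if not _XML_NAME.match(name):
        raise ValueError(f"cannot map {designation!r} to an XML name")
    return name
```

For example, `unit_to_xml_name("miles/hour")` yields `miles_Per_hour`. Exponents and prefixes are not handled by this sketch and would need rules of their own.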
We suggest that such measurement unit encodings should be standardized for use in conjunction with the W3C XML Schema Language specification. We envision that such measurement unit specifications would typically appear in the XML schema for a data exchange document as an extension of the datatype specification. We consider the implications of this with respect to the extensibility and annotation mechanisms proposed for the XML Schema Language.
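As a rough illustration of attaching a unit to a schema declaration, the sketch below uses the `xs:annotation`/`xs:appinfo` mechanism as it appears in the later XML Schema 1.0 Recommendation, which may differ from the drafts this abstract refers to. The `mu` namespace URI, the `mu:unit` element, its `ref` attribute, and the `cruiseSpeed` element are all hypothetical and do not belong to any standard.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
MU = "urn:example:measurement-units"  # hypothetical namespace, not a real standard

SCHEMA = f"""
<xs:schema xmlns:xs="{XS}" xmlns:mu="{MU}">
  <xs:element name="cruiseSpeed" type="xs:double">
    <xs:annotation>
      <xs:appinfo>
        <mu:unit ref="miles_Per_hour"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
"""


def declared_units(schema_text):
    """Return {element name: unit name} for elements carrying a mu:unit annotation."""
    root = ET.fromstring(schema_text.strip())
    units = {}
    for element in root.iter(f"{{{XS}}}element"):
        unit = element.find(f"{{{XS}}}annotation/{{{XS}}}appinfo/{{{MU}}}unit")
        if unit is not None:
            units[element.get("name")] = unit.get("ref")
    return units
```

A consumer of the schema could call `declared_units(SCHEMA)` to learn which unit each numeric element is declared in, and then compare or convert before exchanging data.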
Finally, we discuss the relation of this work to previous efforts concerning measurement units and dimensionality specification in the standards, database, and programming language communities.
See http://www.gca.org/attend/2000_conferences/xtech_2000/proceedings/technicaltrack_tuesday.htm
By
Frank Olken, Lawrence Berkeley National Laboratory
John L. McCarthy, Lawrence Berkeley National Laboratory.
Presented at Conference on Scientific and Technical Data Exchange and Integration, December 15-17, 1997, in Bethesda, MD.
"In this paper we are concerned with the reconciliation of conceptual differences in scientific and technical data interchange and database integration. Such problems arise from attempts to aggregate or compare data that measure similar concepts. We discuss several examples taken from biological sciences, materials sciences, social sciences, medicine, accounting, etc. It is our view that the sources (scientific, epistemological, institutional, legal, linguistic, ideological, accidental) of conceptual differences have important implications for the feasibility and methodology of reconciling these differences. We discuss several attempts at building shared data models. We discuss both bottom-up consensus-based approaches and top-down administrative approaches, the need for precise definitions, the role of common discourse, the role of data registries, taxonomies, ontologies, and generalization/specialization. We also consider some issues of both invariant and material-dependent measurement unit conversions that arise in resolving conceptual differences."