- Initiatives and Projects
- Chemical Markup Language (CML)
- Extensible Data Format (XDF)
- Extensible Scientific Interchange Language (XSIL)
- Finite Element Modeling Markup Language (femML)
- Materials Markup Language (matML)
- Mathematics Markup Language (MathML)
- Numerical Data Markup Language (NDML)
- OASIS Materials Markup Language TC
- Standards Organizations, Professional Societies, and Trade Associations
- Articles, Papers, News
Numerous XML markup languages have been designed for use in the representation and interchange of scientific data relating to chemical and (thermo-)physical properties of industrial materials. Such languages encompass mathematical notation, measurement units, process engineering, visualization, materials trade, and related features. Several representative examples are referenced below.
Peter Murray-Rust and Henry Rzepa have developed CML and related languages (STMML, CMLCore) over a number of years. "CML (Chemical Markup Language) is a new approach to managing molecular information using recently developed Internet tools such as XML and Java. It is based strictly on SGML, the most robust and widely used system for precise information management in many areas. It has been developed over several years and has been tested in many areas and on a variety of machines. CML is not 'just another file format'; it is capable of holding extremely complex information structures and so acting as an interchange mechanism or for archival. It interfaces easily with modern database architectures such as relational databases or object-oriented databases. Most importantly, a large amount of generic XML software to process and transform it is already available from the community."
"CML has already been used to manage documents and information in: Macromolecular Sequence, Macromolecular Structure, Spectra, Organic Molecules, Publishing, Quantum Chemistry, Inorganic Crystallography, Hypertext (HTML), Databases, Terminology, Regulatory processes, and Molecular databases..." [adapted from the FAQ]
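The point about generic XML software can be illustrated with a minimal Python sketch that reads a small CML-style fragment using only the standard-library ElementTree parser. The element and attribute names (`molecule`, `atomArray`, `atom`, `elementType`, `bondArray`, `bond`, `atomRefs2`, `order`) follow common CML usage, but the fragment is simplified (no namespace declaration) and should be checked against the current CML schema. The helper functions `molecular_formula` and `bonds` are hypothetical and are not part of any CML toolkit.

```python
import xml.etree.ElementTree as ET

# Simplified CML-style description of water (H2O). The CML namespace
# declaration is omitted to keep the example short.
CML_WATER = """<molecule id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if present, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def molecular_formula(cml_text: str) -> str:
    """Count atoms by element and return the formula in Hill order."""
    root = ET.fromstring(cml_text)
    counts: dict[str, int] = {}
    for el in root.iter():
        if local_name(el.tag) == "atom":
            element = el.get("elementType")
            counts[element] = counts.get(element, 0) + 1
    if "C" in counts:
        # Hill order: carbon first, then hydrogen, then the rest alphabetically.
        order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    else:
        # Without carbon, every element (including H) is listed alphabetically.
        order = sorted(counts)
    return "".join(
        e + (str(counts[e]) if counts[e] > 1 else "") for e in order if e in counts
    )


def bonds(cml_text: str) -> list[tuple[str, str]]:
    """Return the pair of atom ids joined by each bond."""
    root = ET.fromstring(cml_text)
    return [
        tuple(el.get("atomRefs2").split())
        for el in root.iter()
        if local_name(el.tag) == "bond"
    ]


print(molecular_formula(CML_WATER))  # → H2O
print(bonds(CML_WATER))              # → [('a1', 'a2'), ('a1', 'a3')]
```

Because the markup is ordinary XML, no chemistry-specific parser is needed to recover either the composition or the connectivity.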
"XDF is a common scientific data format based on XML and general mathematical principles that can be used throughout the scientific disciplines. It includes these key features: hierarchical data structures, any dimensional arrays merged with coordinate information, high dimensional tables merged with field information, variable resolution, easy wrapping of existing data, user specified coordinate systems, searchable ASCII meta-data, and extensibility to new features/data formats. [As of 2003-06] the XDF project supported two versions of XDF: a 'stable' and 'development' version..."
From the version 0.18 DTD: "An XDF document contains arrays and data structures. It is designed to be both an interchange format for scientific data and to be of archival quality. Multidimensional tables and scalar or vector fields are represented in a consistent way and become thoroughly self describing. Axial information provides a full description of the space in which each datum resides. This means that XDF provides a consistent way to hold spectra with their wavelength scales, images with coordinate axes, vector fields with unitDirection, data cubes in complicated spaces, tables with column headers, and series of tables with cut-in-heads..."
- XDF Home Page
- eXtensible Data Format XML DTD. Experimental/development DTD. Version 0.18. June 28, 2002. [source]
- eXtensible Data Format XML DTD. Stable DTD. Version 0.17. July 2, 2001. [source]
- Schematic representation of the XDF data model. "Structures hold structures and arrays. Arrays hold data of any dimension (tabular or image) and axis or field information is always present..."
- "Extensible Data Format (XDF)."
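The XDF data model described above (structures hold structures and arrays; arrays always carry axis information alongside the data) can be sketched as plain Python dataclasses. This is a hypothetical in-memory model for illustration only: the class names `Axis`, `Array`, and `Structure` and the method `value_at` are not taken from the XDF DTD or from any XDF software, and the numeric values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Axis:
    """One coordinate axis: a name, its units, and the coordinate values."""
    name: str
    units: str
    values: list[float]


@dataclass
class Array:
    """An n-dimensional array whose data are always accompanied by axes."""
    name: str
    axes: list[Axis]
    data: list  # nested lists, one nesting level per axis

    def __post_init__(self):
        if not self.axes:
            raise ValueError("an array must carry axis information")
        self._check_shape(self.data, 0)

    def _check_shape(self, block, depth):
        # Each nesting level must have as many entries as its axis has values.
        expected = len(self.axes[depth].values)
        if len(block) != expected:
            raise ValueError(
                f"axis {self.axes[depth].name!r} has {expected} values "
                f"but the data has {len(block)} entries at that level"
            )
        if depth + 1 < len(self.axes):
            for sub in block:
                self._check_shape(sub, depth + 1)

    def value_at(self, *coords):
        """Look up a datum by its axis coordinates rather than by raw index."""
        if len(coords) != len(self.axes):
            raise ValueError("one coordinate per axis is required")
        block = self.data
        for axis, coord in zip(self.axes, coords):
            block = block[axis.values.index(coord)]
        return block


@dataclass
class Structure:
    """A container that can hold arrays and further nested structures."""
    name: str
    arrays: list[Array] = field(default_factory=list)
    structures: list["Structure"] = field(default_factory=list)


# A 1-D spectrum with its wavelength scale and a 2-D image with pixel axes.
spectrum = Array("flux", [Axis("wavelength", "nm", [500.0, 510.0, 520.0])], [1.2, 1.5, 1.1])
image = Array(
    "image",
    [Axis("y", "pixel", [0.0, 1.0]), Axis("x", "pixel", [0.0, 1.0, 2.0])],
    [[1, 2, 3], [4, 5, 6]],
)
observation = Structure("observation", arrays=[spectrum, image])

print(spectrum.value_at(510.0))  # → 1.5
print(image.value_at(1.0, 2.0))  # → 6
```

Rejecting any array whose data shape disagrees with its axes is one simple way to keep every datum self-describing, in the spirit of "axis or field information is always present."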
"The Extensible Scientific Interchange Language (XSIL) is a flexible, hierarchical, extensible, transport language for scientific data objects. The entire object may be represented in the file, or there may be metadata in the XSIL file, with a powerful, fault-tolerant linking mechanism to external data. The language is based on XML, and is designed not only for parsing and processing by machines, but also for presentation to humans through web browsers and web-database technology. It comes with a Java object model that is designed to be extensible, so that scientific data and metadata represented in XML are available to Java code. There is also a powerful Swing-based object browser called Xlook, that is also designed to be extensible. XSIL is directed toward a number of projects, including LIGO (Laser Interferometer Gravitational Wave Observatory), the NPACI Storage Resource Broker, and the Digital Puglia project."
"The femML effort was initiated in the summer of 2000 by members of the Composite Materials and Structures group at the Naval Research Laboratory and the International Science and Technology Outreach Society... The Finite Element Modeling Markup Language (femML) is an effort addressing the problems of data interpretation and application interoperability in the Finite Element Modeling domain of activities for model and/or product specification portability... It is an XML dialect defined in XML like any other XML variant, applied to describing the data structure of FEM data exchange and integration for ubiquitous product/model intra- and inter-application portability. The specification is given in terms of: (1) UML abstract descriptions of a DTD; (2) Document Type Definition (DTD); and (3) W3C and Microsoft Schemas..." Members of the femML Working Group include John Michopoulos, Hugh Alan Bruck, Ed Begley, and Gil Kaufman. [adapted from the ISTOS.ORG website]
"MatML is an XML language developed especially for the interchange of materials information. It addresses the problems of interpretation and interoperability for materials property data exchanged via the World Wide Web... The descriptive nature of the MatML tags, such as <Name>, <Class>, and <Subclass> is plainly evident, permitting the language to be far more intelligible than non-descriptive fixed tagsets such as HTML. At the same time, MatML defines a coherent and consistent document structure for its tags, which ensures that any programming language can be used to parse and process the data in whatever manner required... The MatML Version 3.0 Schema contains the formal specification for the materials markup language and represents the efforts to date of a cross section of the international materials community with contributions from private industry, government laboratories, universities, standards organizations, and professional societies..." [adapted from the MatML website Overview]
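To show how descriptive tags such as `<Name>`, `<Class>`, and `<Subclass>` can be processed with general-purpose tools, here is a minimal Python sketch using the standard-library ElementTree parser. The fragment is hypothetical and deliberately flattened: only the three tag names come from the description above, while the `<Materials>` and `<Material>` wrappers and the nesting are assumptions and do not follow the MatML 3.0 schema. The functions `names_in_class` and `group_by_subclass` are likewise invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, flattened MatML-like fragment (not schema-conformant MatML).
DOC = """<Materials>
  <Material>
    <Name>Aluminum 6061</Name>
    <Class>Metal</Class>
    <Subclass>Aluminum alloy</Subclass>
  </Material>
  <Material>
    <Name>Ti-6Al-4V</Name>
    <Class>Metal</Class>
    <Subclass>Titanium alloy</Subclass>
  </Material>
  <Material>
    <Name>Polycarbonate</Name>
    <Class>Polymer</Class>
    <Subclass>Thermoplastic</Subclass>
  </Material>
</Materials>"""


def names_in_class(xml_text: str, material_class: str) -> list[str]:
    """Return the names of all materials whose <Class> text matches."""
    root = ET.fromstring(xml_text)
    return [
        m.findtext("Name")
        for m in root.findall("Material")
        if m.findtext("Class") == material_class
    ]


def group_by_subclass(xml_text: str) -> dict[str, list[str]]:
    """Map each <Subclass> value to the names of the materials in it."""
    groups = defaultdict(list)
    for m in ET.fromstring(xml_text).findall("Material"):
        groups[m.findtext("Subclass")].append(m.findtext("Name"))
    return dict(groups)


print(names_in_class(DOC, "Metal"))  # → ['Aluminum 6061', 'Ti-6Al-4V']
```

The same queries could be written in any language with an XML parser, which is the interoperability argument the MatML overview makes.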
"MathML is an XML application for describing mathematical notation and capturing both its structure and content. MathML 2.0 can be used to encode both mathematical notation and mathematical content. About thirty of the MathML tags describe abstract notational structures, while about another one hundred and fifty provide a way of unambiguously specifying the intended meaning of an expression... MathML is designed to provide the encoding of mathematical information for the bottom, more general layer in a two-layer architecture. It is intended to encode complex notational and semantic structure in an explicit, regular, and easy-to-process way for renderers, searching and indexing software, and other mathematical applications..." [MathML 2.0]
"The principal goal of MathML is to enable mathematics to be served, received, and processed on the Web, just as HTML has enabled this functionality for text. In more detail, MathML is intended to: encode mathematical material suitable for teaching and scientific communication at all levels; encode both mathematical notation and mathematical meaning; facilitate conversion to and from other math formats, both presentational and semantic; allow the passing of information intended for specific renderers and applications; support efficient browsing for lengthy expressions; provide for extensibility; be well suited to template and other math editing techniques; and be human legible (though it is very verbose), and simple for software to generate and process." [W3C FAQ]
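The distinction between notation and meaning can be made concrete. Presentation MathML would encode x² + 2x + 1 with layout elements such as `<msup>`, `<mi>`, `<mn>`, and `<mo>`, whereas Content MathML encodes the operations themselves (`<apply>`, `<plus/>`, `<times/>`, `<power/>`, `<ci>`, `<cn>`), so software can act on the meaning directly. The following minimal Python sketch evaluates a small subset of Content MathML; the function `evaluate` is hypothetical and supports only the few operators shown, not the full MathML 2.0 content vocabulary.

```python
import math
import xml.etree.ElementTree as ET

# Content MathML for x^2 + 2x + 1, in the MathML namespace.
CONTENT = """<apply xmlns="http://www.w3.org/1998/Math/MathML">
  <plus/>
  <apply><power/><ci>x</ci><cn>2</cn></apply>
  <apply><times/><cn>2</cn><ci>x</ci></apply>
  <cn>1</cn>
</apply>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def evaluate(node: ET.Element, env: dict[str, float]) -> float:
    """Recursively evaluate a small subset of Content MathML."""
    tag = local_name(node.tag)
    if tag == "cn":  # numeric constant
        return float(node.text)
    if tag == "ci":  # identifier, looked up in the environment
        return env[node.text.strip()]
    if tag == "apply":  # first child is the operator, the rest are operands
        op, *operands = list(node)
        values = [evaluate(child, env) for child in operands]
        name = local_name(op.tag)
        if name == "plus":
            return sum(values)
        if name == "times":
            return math.prod(values)
        if name == "minus":
            return -values[0] if len(values) == 1 else values[0] - values[1]
        if name == "power":
            base, exponent = values
            return base ** exponent
        raise ValueError(f"unsupported operator: {name}")
    raise ValueError(f"unsupported element: {tag}")


expr = ET.fromstring(CONTENT)
print(evaluate(expr, {"x": 3.0}))  # → 16.0
```

A presentation-only encoding of the same expression would render identically but could not be evaluated this way without guessing at its meaning.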
A NIST project titled 'Numerical Data Markup Language (NDML)' proposes to "develop an XML Schema called UnitsML (Units Markup Language) for encoding measurement units in XML. Adoption of this schema will allow for the unambiguous exchange of numerical data over the Internet. [The project proposes] to develop registries/repositories containing UnitsML schemas, UnitsML instance documents and a database of scientific units... XML is a set of rules, guidelines, conventions, for designing text formats for structured data (e.g., spreadsheets, configuration parameters, financial transactions, technical drawings, scientific data, etc.), in a way that produces files that are easy to generate and read (by a computer), that are unambiguous, and that avoid common pitfalls, such as lack of extensibility, lack of support for internationalization/localization, and platform-dependency... To date, the development of markup languages to address the needs of specific communities (e.g. mathematics, chemistry, materials science, etc.) has either not addressed the issue of encoding unit information into numeric data or has addressed this issue independently for each markup language. However, developers have requested a single description of the encoding of these properties in XML. Staff at Lawrence Berkeley National Laboratory began preliminary work in the treatment of measurement units in XML, but have specifically requested that NIST work with them to provide expertise in the area of measurement units consistent with the SI (International System of Units) and to develop and make available a repository of detailed units and dimensionality information at NIST... SIMA is the appropriate sponsor for this work because this project will provide a solution for encoding units information into numeric data in a format that will allow for unambiguous storage, exchange, and processing of numeric data..." NIST Principal Investigator is Bob Dragoset [tel. +1 (301) 975-3718]
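Why a shared description of units matters for unambiguous exchange can be illustrated with a small Python sketch: each unit records a scale factor to the coherent SI unit and the exponents of the seven SI base dimensions, so a conversion either succeeds unambiguously or is rejected as dimensionally inconsistent. This is a hypothetical model for illustration; it is not the UnitsML schema, and the names `Unit`, `convert`, and the unit constants are invented here.

```python
from dataclasses import dataclass

# Exponents of the seven SI base dimensions, in this order:
# length, mass, time, electric current, thermodynamic temperature,
# amount of substance, luminous intensity.
Dimension = tuple[int, int, int, int, int, int, int]


@dataclass(frozen=True)
class Unit:
    symbol: str
    to_si: float  # multiply a value in this unit by to_si to get the SI value
    dimension: Dimension


LENGTH = (1, 0, 0, 0, 0, 0, 0)
PRESSURE = (-1, 1, -2, 0, 0, 0, 0)  # Pa = kg m^-1 s^-2

METRE = Unit("m", 1.0, LENGTH)
INCH = Unit("in", 0.0254, LENGTH)  # exact by definition
PASCAL = Unit("Pa", 1.0, PRESSURE)
KILOPASCAL = Unit("kPa", 1.0e3, PRESSURE)
BAR = Unit("bar", 1.0e5, PRESSURE)  # exact by definition


def convert(value: float, source: Unit, target: Unit) -> float:
    """Convert between multiplicative units, refusing dimension mismatches."""
    if source.dimension != target.dimension:
        raise ValueError(
            f"cannot convert {source.symbol} to {target.symbol}: dimensions differ"
        )
    return value * source.to_si / target.to_si


print(convert(1.0, BAR, KILOPASCAL))  # → 100.0
```

The sketch handles only purely multiplicative units; scales with an offset, such as degrees Celsius, would need an additional term.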
- "OASIS Members Form Materials Markup Language Technical Committee." News story 2003-06-25.
- Announcement/CFP 2003-06-25: "OASIS Materials Markup Language TC"
- OASIS Materials Markup Language TC website
SpectroML is an XML-based markup language for spectroscopic data, proposed as part of an endeavor to create NIST Standards for Exchange of Instrument Data and NIST Chemical Reference Data. "The project goal is to create an ASTM standard practice for instrument-to-instrument, instrument-to-application, and application-to-application interchanges of analytical chemistry spectroscopy data using our SpectroML XML markup language as a basis and develop SpectroML for use in NIST SRM and NTRM Programs." Problem domain: "Many critical decisions in manufacturing and engineering depend on reliable chemical data about materials and chemical reactions. Often this information comes directly from instruments in chemical analysis laboratories. The thrust of this program component is to make standard reference data in analytical chemistry data, especially data coming directly from instruments, more readily available to engineers and scientists in the U.S. manufacturing and industrial complex. [But] the interchange and storage of analytical chemistry data has long been hampered by multiple, incompatible data formats..."
"ThermoML is an XML-based approach for storage and exchange of experimental and critically evaluated thermophysical and thermochemical property data. The basic principles, scope, and description of all structural elements of ThermoML are discussed. ThermoML covers essentially all experimentally determined thermodynamic and transport property data (more than 120 properties) for pure compounds, multicomponent mixtures, and chemical reactions (including change-of-state and equilibrium). The primary focus at present is molecular compounds. Although the focus of ThermoML is properties determined by direct experimental measurement, ThermoML does cover key derived property data such as azeotropic properties, Henry's Law constants, virial coefficients (for pure compounds and mixtures), activities and activity coefficients, fugacities and fugacity coefficients, and standard properties derived from high-precision adiabatic heat-capacity calorimetry. The role of ThermoML in global data submission and dissemination is discussed with particular emphasis on the new cooperation in data processing between the Journal of Chemical and Engineering Data and the Thermodynamics Research Center (TRC) at the National Institute of Standards and Technology. The text of several data files illustrating the ThermoML format for pure compounds, mixtures, and chemical reactions, as well as the complete ThermoML schema text, is provided as Supporting Information..."
"ThermoML consists of four major blocks: (1) Citation: description of the source of the data; (2) Compound: characterization of the chemical system. The description for every compound is linked to a description of the sample used in the measurements with indication of its initial purity, purification method used, final purity, and the method used to determine it. (3) PureOrMixtureData: metadata and numerical data for a pure compound or multicomponent mixture; (4) ReactionData: metadata and numerical data for a chemical reaction with a thermodynamic state change or in a state of chemical equilibrium..." [excerpted from the paper]
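The four-block layout can be checked mechanically. The following minimal Python sketch counts the top-level children of a ThermoML-like report by block name using ElementTree. Only the four block names come from the excerpt above; the root element name `DataReport` and the empty placeholder blocks are assumptions, so `SKELETON` is not a valid ThermoML instance, and the function `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The four major blocks named in the ThermoML description above.
BLOCKS = ("Citation", "Compound", "PureOrMixtureData", "ReactionData")

# Hypothetical skeleton: root name and empty blocks are placeholders.
SKELETON = """<DataReport>
  <Citation/>
  <Compound/>
  <Compound/>
  <PureOrMixtureData/>
</DataReport>"""


def summarize(xml_text: str) -> dict[str, int]:
    """Count top-level children of a ThermoML-like report by block name.

    Namespace prefixes are ignored; children that are not one of the four
    blocks are tallied under "other" rather than rejected.
    """
    counts = dict.fromkeys(BLOCKS, 0)
    counts["other"] = 0
    for child in ET.fromstring(xml_text):
        name = child.tag.rsplit("}", 1)[-1]
        counts[name if name in BLOCKS else "other"] += 1
    return counts


print(summarize(SKELETON))
# → {'Citation': 1, 'Compound': 2, 'PureOrMixtureData': 1, 'ReactionData': 0, 'other': 0}
```

Tallying unfamiliar elements instead of rejecting them keeps the sketch tolerant of parts of the real schema that it does not model.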
- ThermoML website
- ThermoML XML Schema [source]
- "ThermoML: An XML-based Approach for Storage and Exchange of Experimental and Critically Evaluated Thermophysical and Thermochemical Property Data. Experimental Data, Supporting Information." By Michael Frenkel and others. National Institute of Standards and Technology. 112 pages. [cache]
- "ThermoML: An XML-Based Approach for Storage and Exchange of Experimental and Critically Evaluated Thermophysical and Thermochemical Property Data." By Michael Frenkel, Robert D. Chirico, Vladimir V. Diky, Qian Dong (Thermodynamics Research Center, Physical and Chemical Properties Division, National Institute of Standards and Technology); Svetlana Frenkel and Paul R. Franchois (Information Technology Laboratory, National Institute of Standards and Technology); Dale L. Embry (ConocoPhillips); Thomas L. Teague (ePlantData, Inc.); Kenneth N. Marsh (Department of Chemical and Process Engineering, University of Canterbury, New Zealand); Randolph C. Wilhoit (Texas Experimental Engineering Station, Texas A&M University System). In Journal of Chemical and Engineering Data (JCED) Volume 48, Number 1 (2003). [cache]
- ASM (American Society for Metals). ASM (now ASM International) "serves the technical interests of metals and materials professionals."
- ASTM (American Society for Testing and Materials). ASTM (now ASTM International) was founded in 1898. It is a scientific and technical organization formed for the development of standards on characteristics and performance of materials, products, systems, and services; and the promotion of related knowledge.
- IAI (International Alliance For Interoperability). "IAI is a global standards-setting organization representing widely diverse constituencies - from architects and engineers, to research scientists, to commercial building owners and contractors, to government officials and academia, to facility managers and building product manufacturers."
- NIST (National Institute of Standards and Technology). "Founded in 1901, NIST is a non-regulatory federal agency within the U.S. Commerce Department's Technology Administration. NIST's mission is to develop and promote measurement, standards, and technology to enhance productivity, facilitate trade, and improve the quality of life."