Cover Pages: MicroArray and Gene Expression Markup Language (MAGE-ML)

[February 08, 2002] Microarray Gene Expression Markup Language (MAGE-ML) "is a language designed to describe and communicate information about microarray-based experiments. MAGE-ML is based on XML and can describe microarray designs, microarray manufacturing information, microarray experiment setup and execution information, gene expression data and data analysis results. MAGE-ML has been automatically derived from the Microarray Gene Expression Object Model (MAGE-OM), which is developed and described using the Unified Modelling Language (UML), a standard language for describing object models. Descriptions using UML have an advantage over direct XML document type definitions (DTDs) in many respects. First, they use a graphical representation that depicts the relationships between different entities in a way that is much easier to follow than DTDs. Second, UML diagrams are primarily meant for humans, while DTDs are meant for computers. Therefore MAGE-OM should be considered the primary model, and we will explain MAGE-ML by providing simplified fragments of MAGE-OM rather than the XML DTD or XML Schema." [from the description by Ugis Sarkans]

MicroArray and Gene Expression Markup Language (MAGE-ML). Description from the Revised Gene Expression RFP:

"The MAGE-ML model defines the elements for supporting gene expression data. Because the exchange of gene expression data can be abstracted from the source from which it was obtained, the data can be represented as XML files, which are both human readable and machine readable. This makes the export and the import of gene expression data independent of each other. When the XML files are directly accessible, ad hoc queries can take advantage of the suite of W3C recommendations, including XSLT and XML Query. Queries against repositories could be specified in a number of ways, including through an IDL interface whose query language is either of the above or OQL based on MAGE-OM. The DTD file, MAGE-ML.dtd, is generated from MAGE-OM by a fixed set of rules. In one area, BioAssayData, further modifications were made to offer alternative representations and more efficient parsing...

The vocabulary of MAGE-ML is organized into sub-vocabularies that are independent of each other. These sub-vocabularies are driven by the packages and Identifiable classes of MAGE-OM, which correspond to discrete groupings of the events and results of gene expression experiments. This allows a valid XML document to contain the data from an individual sub-vocabulary, such as BioMaterial or ArrayDesign, or any combination of these sub-vocabularies, such as all the BioAssay and BioAssayData for an experiment. Implementations may impose additional ordering, such as ArrayDesigns before their Arrays, or may require that they be exported to separate files.

The independence of sub-vocabularies is achieved through reference elements, which link data from two sub-vocabularies within a document: a reference element's identifier attribute is matched to the object that has the same identifier..."
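To make the reference mechanism concrete, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree`. The document fragment is hypothetical and heavily simplified: it assumes that reference elements have tags ending in `_ref` and that both a reference and its target carry an `identifier` attribute. The helper names `index_identifiables` and `resolve_refs` are illustrative only, not part of any MAGE tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified fragment: an element in the BioAssay
# sub-vocabulary points at one in the BioMaterial sub-vocabulary through a
# "<Class>_ref" element. Real MAGE-ML nests such references more deeply.
DOC = """<MAGE-ML identifier="doc:1">
  <BioMaterial_package>
    <LabeledExtract identifier="LabeledExtract:le1" name="Cy5 extract"/>
  </BioMaterial_package>
  <BioAssay_package>
    <PhysicalBioAssay identifier="BioAssay:pba1" name="hybridization 1">
      <LabeledExtract_ref identifier="LabeledExtract:le1"/>
      <LabeledExtract_ref identifier="LabeledExtract:le2"/>
    </PhysicalBioAssay>
  </BioAssay_package>
</MAGE-ML>"""


def index_identifiables(root: ET.Element) -> dict[str, ET.Element]:
    """Map identifier -> element for every non-reference element that has one."""
    return {
        el.get("identifier"): el
        for el in root.iter()
        if "identifier" in el.attrib and not el.tag.endswith("_ref")
    }


def resolve_refs(root: ET.Element) -> tuple[dict[str, ET.Element], list[str]]:
    """Separate *_ref identifiers found in this document from those that are not."""
    index = index_identifiables(root)
    resolved, unresolved = {}, []
    for ref in root.iter():
        if not ref.tag.endswith("_ref"):
            continue
        target = index.get(ref.get("identifier"))
        if target is None:
            # The target may be defined in another file of the same export.
            unresolved.append(ref.get("identifier"))
        else:
            resolved[ref.get("identifier")] = target
    return resolved, unresolved


root = ET.fromstring(DOC)
resolved, unresolved = resolve_refs(root)
print({k: v.get("name") for k, v in resolved.items()})  # {'LabeledExtract:le1': 'Cy5 extract'}
print(unresolved)  # ['LabeledExtract:le2']
```

A reference that does not resolve within one file is not necessarily an error: because sub-vocabularies may be exported to separate files, its target may live in another document of the same export.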

Mapping from MAGE-OM to MAGE-ML: "One of the design principles of the model was to keep it straightforward so that the translation to XML would be as simple as possible. These mapping rules are not meant to be extendable to arbitrary models, nor were they designed to do more than map MAGE-OM to this DTD. Because gene expression data is data-centric (as opposed to process-centric), the UML model did not need advanced modeling constructs. As a result, parsing the XMI for this model could focus on the narrow range of elements used, primarily those representing the Model, the Packages, the Classes, the Associations, the DataTypes, and the ExtensionMechanism (for the documentation and the constraints)."
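The normative mapping is whatever the generating code implements; the following Python sketch only illustrates the general idea of deriving DTD declarations mechanically from a class description. `UmlClass`, `to_dtd`, and both mapping rules in it are assumptions made for this example, not the actual MAGE rules.

```python
from dataclasses import dataclass, field


@dataclass
class UmlClass:
    """Toy stand-in for one class read from the model's XMI (hypothetical)."""
    name: str
    attributes: list[str] = field(default_factory=list)
    associations: list[str] = field(default_factory=list)  # navigable role names
    identifiable: bool = False


def to_dtd(cls: UmlClass) -> str:
    """Map one class to DTD declarations using simplified, assumed rules."""
    # Assumed rule 1: each association role becomes an optional child element
    # named "<Role>_assn" (a real mapping would also distinguish lists and refs).
    children = [f"{role}_assn?" for role in cls.associations]
    content = f"({', '.join(children)})" if children else "EMPTY"
    lines = [f"<!ELEMENT {cls.name} {content}>"]
    # Assumed rule 2: Identifiable classes get a required identifier attribute;
    # every other class attribute becomes an optional CDATA attribute.
    attrs = ["identifier CDATA #REQUIRED"] if cls.identifiable else []
    attrs += [f"{a} CDATA #IMPLIED" for a in cls.attributes]
    if attrs:
        lines.append(f"<!ATTLIST {cls.name}\n  " + "\n  ".join(attrs) + ">")
    return "\n".join(lines)


bio_source = UmlClass("BioSource", attributes=["name"],
                      associations=["Species"], identifiable=True)
print(to_dtd(bio_source))
```

In this sketch each model construct has exactly one DTD counterpart, which is the kind of simplicity a narrow, data-centric model makes possible.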

From the MAGE specification (OMG document lifesci/2001-10-01):

This document contains a proposal for a standard that addresses the representation of gene expression data and relevant annotations, as well as mechanisms for exchanging these data.

Gene expression experiments use several distinct technologies that a standard must accommodate, including single- vs. dual-channel experiments and cDNA vs. oligonucleotide arrays. Because of this variety of technologies and experiment types, not every organization is expected to use every aspect of the standard.

Given the massive amount of data associated with a single set of experiments, we feel that Extensible Markup Language (XML) is the best way to describe the data. A Document Type Definition (DTD) provides a well-defined tag set, or vocabulary, for describing the domain of gene expression experiments. XML also compresses very well: files in XML format compress to roughly ten percent of their original size. XML is now widely accepted as a data exchange format across multiple platforms.
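The compression figure depends heavily on the content. A quick way to see why tag-heavy XML compresses well is to gzip a payload in which the same markup repeats for every value. The sketch below uses Python's standard `gzip` module on a synthetic document; the element and attribute names are hypothetical, and such regular synthetic data will usually compress better than real measurements.

```python
import gzip

# Synthetic, repetitive XML payload (hypothetical names): the markup around
# each value is identical, which is what a compressor exploits.
rows = "".join(
    f'<Measurement feature="Feature:{i}" value="{i * 0.37:.2f}"/>\n' for i in range(10_000)
)
xml_bytes = f"<BioAssayData>\n{rows}</BioAssayData>\n".encode("utf-8")

compressed = gzip.compress(xml_bytes)
ratio = len(compressed) / len(xml_bytes)
print(f"{len(xml_bytes)} bytes -> {len(compressed)} bytes ({ratio:.1%})")
```

Running the same two lines of `gzip` code on one's own exported files gives a more realistic ratio than any synthetic test.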

Organizations that request these XML streams can use freely available implementations of either the W3C-recommended DOM or the XML-DEV SAX parsing interface to create import and export applications. These applications can be tailored to the specific needs of each organization without burdening the XML vocabulary with the specifics of any organization's schema requirements.
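As one example of a freely available SAX implementation, Python ships `xml.sax` in its standard library. The sketch below streams a small, hypothetical MAGE-ML-like document and tallies elements and identifiers without building an in-memory tree, which is what makes SAX attractive for large data files. `ElementCounter` is an illustrative name, and the `_ref` convention is assumed for the example.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Stream a document and count elements per tag without building a tree."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: dict[str, int] = {}
        self.identifiers: list[str] = []

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1
        # Record identifiers of non-reference elements as they stream past.
        identifier = attrs.get("identifier")
        if identifier is not None and not name.endswith("_ref"):
            self.identifiers.append(identifier)


SAMPLE = b"""<MAGE-ML identifier="doc:1">
  <BioMaterial_package>
    <BioSource identifier="BioSource:fly1" name="Drosophila adult"/>
    <BioSource identifier="BioSource:fly2" name="Drosophila larva"/>
  </BioMaterial_package>
</MAGE-ML>"""

handler = ElementCounter()
xml.sax.parseString(SAMPLE, handler)
print(handler.counts)  # {'MAGE-ML': 1, 'BioMaterial_package': 1, 'BioSource': 2}
```

A DOM-style import (for example with `xml.dom.minidom`) would load the whole tree instead, which is simpler to navigate but costs memory proportional to the file size.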

With the acceptance of XML Metadata Interchange (XMI) as an OMG standard and the recent emphasis on Model Driven Architecture (MDA), it is possible to specify a normative Platform Independent Model (PIM).

In this document we describe this normative PIM, the Microarray Gene Expression Object Model (MAGE-OM). The document also describes the general mapping rules to a DTD that best captures the syntax and semantics of the PIM. One area of the DTD, the representation of the actual gene expression data, is further transformed to provide efficient and flexible formats. The DTD is thus generated from the model, with this transformed representation of the gene expression data added. The algorithm as implemented by the generating code is normative.

This submission includes MAGE.xmi (lifesci/2001-10-02), exported from Rational Rose Enterprise Edition v2000.02.10 using the Unisys Rose UML tool, version 1.3.2, with the XMI 1.0 option. The automatically generated DTD, MAGE-ML.dtd, is included as lifesci/2001-10-03. The Java generating code and support files, including examples, are included as lifesci/2001-10-04. Although there are good, standard ways to specify queries both in terms of the object model (OQL) and the XML (XQuery, XPath), the submitters have decided to limit the scope of this proposal to the underlying format of the data for interchange between organizations. The design did, however, emphasize modular XML, so that the results of a single experiment can be split into separate, easily managed files, and so that appropriate attributes and associations are available to facilitate queries.
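Although query languages are outside the scope of the submission, the modular layout it aims for can be sketched. The example below assumes, for illustration only, that each top-level element named `<Name>_package` holds one sub-vocabulary. It splits a combined export into one standalone document per package, then runs the same query over all of them using the limited XPath subset in Python's `xml.etree.ElementTree`; a standards-based setup would use XPath or XQuery proper. File names and the helpers `split_by_package` and `find_across` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical combined export of one experiment; names are illustrative only.
COMBINED = """<MAGE-ML identifier="doc:experiment">
  <BioMaterial_package>
    <BioSource identifier="BioSource:fly1" name="Drosophila adult"/>
    <BioSource identifier="BioSource:fly2" name="Drosophila larva"/>
  </BioMaterial_package>
  <BioAssay_package>
    <PhysicalBioAssay identifier="BioAssay:pba1" name="hybridization 1"/>
  </BioAssay_package>
</MAGE-ML>"""


def split_by_package(xml_text: str) -> dict[str, str]:
    """Put each top-level *_package into its own standalone document."""
    root = ET.fromstring(xml_text)
    files = {}
    for package in root:
        if package.tag.endswith("_package"):
            wrapper = ET.Element(root.tag, {"identifier": f"doc:{package.tag}"})
            wrapper.append(copy.deepcopy(package))
            files[f"{package.tag}.xml"] = ET.tostring(wrapper, encoding="unicode")
    return files


def find_across(files: dict[str, str], path: str) -> list[tuple[str, str]]:
    """Run one ElementTree path query over every file; return (file, identifier)."""
    hits = []
    for filename, text in files.items():
        for el in ET.fromstring(text).iterfind(path):
            hits.append((filename, el.get("identifier")))
    return hits


files = split_by_package(COMBINED)
print(sorted(files))  # ['BioAssay_package.xml', 'BioMaterial_package.xml']
print(find_across(files, ".//BioSource[@name='Drosophila larva']"))
# [('BioMaterial_package.xml', 'BioSource:fly2')]
```

Because each split file is a complete document, it can be validated, archived, or queried on its own, which is the practical payoff of keeping the sub-vocabularies independent.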

References:

MAGE working group web site
MAGE-ML file list
MAGE-OM: Microarray Gene Expression Object Model
MAGE-ML DTD. 21-Jan-2002. From SourceForge.
MAGE-ML Document Type Definition (DTD). From the OMG web site.
Example MAGE-ML files. Text example: 'BioMaterial, Septic injury of drosophila'.
"MAGE-ML: MicroArray Gene Expression Markup Language." Description by Ugis Sarkans (European Bioinformatics Institute). 11 pages.
Gene Expression RFP Response Joint Revised Submission. MAGE specification. By EMBL-EBI (European Bioinformatics Institute) and Rosetta Inpharmatics. Edited by Michael Miller. Reference: OMG Document lifesci/2001-10-01. 123 pages.
Supersedes: "Microarray Markup Language (MAML)."