[November 29, 2000] Several submissions have now been published in response to the Object Management Group's Gene Expression RFP, originally issued in March 2000 (LSR RFP-7/lifesci/00-03-09). The RFP overview: "Life sciences research has experienced rapid growth in the number of gene expression analysis techniques and is faced with explosive growth in the amount of data produced by these experiments. The creation and adoption of standardized programmatic interfaces is a crucial step in support of automated data exchange and interoperability among different gene expression data systems. This RFP solicits proposals which define interfaces and services in support of array based gene expression data collection, management, retrieval, and analysis." The RFP also requests definition of one or more XMI compliant Document Type Definitions (DTDs) "intended for use as self-describing data structures for encapsulation of hybridization, expression, and cluster data." In response to this RFP, relevant documents have been submitted by the European Bioinformatics Institute, Rosetta Inpharmatics, and NetGenics [...]
The EBI Initial Submission regarding the Gene Expression RFP proposes "a framework for describing information about a DNA-array experiment and a data format -- Microarray Markup Language (MAML) -- for communicating this information... MAML is based on the Extensible Markup Language XML. MAML is independent of the particular experimental platform and provides a framework for describing experiments done on all types of DNA-arrays, including spotted and synthesized arrays, as well as oligo-nucleotide and cDNA arrays, and is independent of the particular image analysis and data normalization methods. MAML does not impose any particular image analysis or data normalization method, but instead provides format to represent microarray data in a flexible way, which allows to represent data obtained from not only any existing microarray platforms, but also many of the possible future variants, including protein arrays. The format allows representation of raw and processed microarray data. The format is compatible with the definition of the 'minimum information about a microarray experiment' (MIAME) proposed by the MGED group."
On behalf of the GEML Community, Rosetta Inpharmatics has submitted to the Object Management Group (OMG) a proposed DTD based on the new version of Gene Expression Markup Language - GEML 2.0. The Rosetta Inpharmatics Initial Submission regarding the Gene Expression RFP describes work in connection with the GEML DTD: "Rosetta Inpharmatics and Agilent Technologies have been using the GEML 1.0 format as part of internal pipelines for the past year. Rosetta has been continuously loading XML files on the order of thirteen megabytes into the Rosetta Resolver system, an enterprise expression data analysis product. We recently used internal tools to export the more than one thousand profiles, assigned annotations, and supporting patterns that constituted the data for the article, Functional Discovery via a Compendium of Expression Profiles, that appeared in the July 7, 2000 issue of Cell. The total size of the export, when compressed, was a little over a half of gigabyte of data. That data was then imported by Harvard into their Rosetta Resolver system. We have not, as of yet, implemented the interfaces contained in this proposal but given that the size of the compressed XML files has proven no technical obstacle, we see no technical problems in implementing the interfaces. Rosetta has developed the freeware GEML Conductor tools for visualization of GEML formatted data and for conversion of gene expression data in other formats into GEML." See the XML DTD and IDL file.
In the NetGenics Submission, the UML model is normative. "The UML, which follows the recently adopted UML Profile for CORBA, permits semantic specifications that go beyond what is expressible in IDL. Given the size of typical data sets, a stream-based externalization approach makes sense. The stream would likely contain XML (e.g., Rosetta Inpharmatics' GEML), a popular means of representing gene expression data..." See the associated XMI file for details.
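The stream-based approach mentioned in the NetGenics submission can be illustrated with a minimal sketch, assuming a consumer that reads the XML incrementally instead of loading the whole document. The element and attribute names below (`expression_data`, `profile`, `measurement`, `ratio`) are hypothetical placeholders, not taken from GEML, MAML, or any of the submissions; the sketch uses only Python's standard `xml.etree.ElementTree.iterparse`.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document: the element and attribute names are
# illustrative only and do not come from GEML, MAML, or any OMG submission.
SAMPLE = b"""<expression_data>
  <profile id="p1">
    <measurement gene="g1" ratio="1.5"/>
    <measurement gene="g2" ratio="0.5"/>
  </profile>
  <profile id="p2">
    <measurement gene="g1" ratio="2.0"/>
  </profile>
</expression_data>"""


def stream_profiles(source):
    """Yield (profile id, mean ratio) pairs, one profile at a time."""
    # iterparse reports each element as soon as its closing tag is read,
    # so a profile can be processed before the rest of the file arrives.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "profile":
            ratios = [float(m.get("ratio")) for m in elem.iter("measurement")]
            yield elem.get("id"), sum(ratios) / len(ratios)
            # Discard the finished profile's children and attributes
            # so they do not accumulate in memory.
            elem.clear()


results = list(stream_profiles(io.BytesIO(SAMPLE)))
for profile_id, mean_ratio in results:
    print(profile_id, mean_ratio)
```

With files of the size reported by Rosetta (individual XML files on the order of thirteen megabytes), processing one profile at a time in this way keeps memory use tied to the largest single profile and not to the size of the whole export.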
Life Sciences Research / Gene Expression OMG Request for Information. LSR RFI 3. lifesci/98-11-09. 11 November 1998. "... This Request for Information (RFI) solicits information about requirements, projects, and products that will provide guidance for gene expression related object system interoperability. The Object Management Group (OMG) and, specifically, the Life Sciences Research Domain Task Force (LSR-DTF), will use this information to begin the technology adoption process for OMG-compliant interfaces for systems used in microarray gene expression research. The goal of the Life Sciences Research Domain Task Force is as follows: (1) To improve the quality and utility of software and information systems used in Life Sciences Research through use of the Common Object Request Broker Architecture (CORBA) and the Object Management Architecture (OMA). (2) To encourage the development of interoperable software tools and services in Life Sciences Research. (3) To prepare to use the Object Management Group (OMG) technology adoption process to standardize interfaces for software tools, services, frameworks, and components in Life Sciences Research. (4) To communicate the requirements of the Life Sciences Research domain to the Platform Technical Committee. (5) To coordinate with OMG Task Forces and Special Interest Groups, as well as other standards organizations and information providers, to ensure common standards..."