[February 01, 2001] GeneXML, as one resource in the GeneX Project, is a "specification that supports the logical representation of the data so that partial or complete datasets from different Gene Expression Databases can be exchanged without loss of information... we have broken the XML Document Type Definition (DTD) into logical components that can be downloaded separately. These include: (1) the GeneXML DTD, which is the main GeneXML data structure that contains the experimental meta-data; (2) the Sequence Feature DTD, which contains information on the immobilized sequence; (3) the Array Layout DTD, which handles the array layout information as it relates to the spots; (4) the Array Measurement DTD, where the actual measurement data is stored..."
"GeneX - a Collaborative Internet Database and Toolset for Gene Expression Data. The National Center for Genome Resources and the Computational Genomics Group at the University of California, Irvine are participating in the GeneX project to provide an Internet-available repository of gene expression data with an integrated toolset that will enable researchers to analyze their data and compare their results with other such data. This body of data will allow more confidence to be placed on the conclusions reached through analysis, as well as sharing the considerable cost of generating these datasets. Introduction: With large-scale sequencing comes the ability to query organisms for their partial or complete transcriptomes, the transcriptional response to a challenge. The myriad technologies for this are quite expensive but can provide huge amounts of data, only a small amount of which is generally of interest to the investigator. The GeneX project plans to make the greatest use of gene expression data by creating an Internet-available relational database of public data derived from these multiple technologies, as well as making the same database technology available for local installation. The database is being designed with an integrated toolset that will enable researchers to analyze their data with reference to the whole database..."
"NCGR is a nonprofit research organization in Santa Fe, New Mexico, USA. Through our research programs, NCGR develops coordinated methods, such as software tools, to help scientists comprehend biological data. People around the world can benefit from our programs through improved food crops, a cleaner environment, and better medical treatments. Bioinformatics will facilitate biological research that can enable farmers, for example, to choose genetic methods instead of pesticides to provide insect and disease resistance in crops -- lessening pollution and enhancing productivity, helping feed a hungry world and reducing pressure to convert natural habitat to farmland. Likewise, genetic engineering can augment the positive qualities of the foods we eat, like increasing dietary fiber in grains, which when consumed can reduce harmful cholesterol..."
GeneXML and 'GEML' [former name]: "Because of Rosetta's trademarking of the name 'GEML', NCGR now refers to its Markup Language as GeneXML, and is concentrating on increasing the utility and scalability of the underlying Data Model and moving towards full support of MAML. The only downside that we can find with the MAML spec is that by definition, it has support only for arrays of various kinds and makes no effort to support alternative expression technologies such as SAGE, MAGE, AFLP, etc., although it does allow support of non-expression uses of arrays. Since we agree that arrays will provide the largest gene expression data flow in the foreseeable future, we will fully support MAML as both input and output formats. However, we are considering projects that involve non-array expression data and to provide support for them, we intend to maintain and extend GeneXML's capabilities in supporting alternative technologies, which, while they do not currently have as many users, are still quite useful to a number of researchers. We are open to suggestions and criticism and are hoping that user feedback will drive GeneXML and the schema in useful directions, while still maintaining compatibility with MAML and associated expression analysis tools..."
References:
[Earlier description:] The Gene Expression Markup Language. - "NCGR, together with a consortium of gene expression database creators, is developing a common data interchange format entitled the Gene Expression Markup Language (GEML). GEML is based on the eXtensible Markup Language (aka XML - here's a very basic intro), and is being designed to provide data exchange compatibility between the diverse data models being implemented for the various expression database projects. An example set of the Saccharomyces cerevisiae diauxic shift data, courtesy of the Pat Brown Lab at Stanford, can be obtained in GEML format here. Note that the gzipped file is approximately 1MB, but uncompressed, it is approximately 17MB, demonstrating both the compressibility of XML and the pressure to encode the data external to the XML. There are good reasons to support both internal and external encodings of the data and NCGR will support whatever standard is decided upon, but in this representation, the data is encoded internal to the file (and the tags for each data point are a major contributor to its size). NCGR will also attempt to provide more compact representations of the data which more accurately represent the hierarchy in the data that might be returned from a complex query. We are currently examining the Hierarchical Data Format (HDF), especially HDF5 as well as others." - From the GeneX web site; see below.
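[Illustration:] The size trade-off described in the quotation above (per-data-point tags inflating the file, yet compressing well) can be sketched in a few lines of Python. This is only an illustrative sketch: the element names (dataset, measurement, spot_id, ratio), the tab-separated "external" form, and the random values are all hypothetical and are not taken from the GEML or GeneXML DTDs or from the diauxic shift dataset.

```python
import gzip
import random


def build_internal_xml(n_points, seed=0):
    """Encode every data point inside its own XML tags (hypothetical element names)."""
    rng = random.Random(seed)
    rows = []
    for spot_id in range(n_points):
        ratio = rng.uniform(0.1, 10.0)  # made-up expression ratio
        rows.append(
            f"<measurement><spot_id>{spot_id}</spot_id>"
            f"<ratio>{ratio:.3f}</ratio></measurement>"
        )
    return ("<dataset>" + "".join(rows) + "</dataset>").encode("utf-8")


def build_external_table(n_points, seed=0):
    """Same values as a plain tab-separated table, as an 'external' encoding might hold them."""
    rng = random.Random(seed)
    lines = [f"{spot_id}\t{rng.uniform(0.1, 10.0):.3f}" for spot_id in range(n_points)]
    return "\n".join(lines).encode("utf-8")


xml_bytes = build_internal_xml(10_000)
tsv_bytes = build_external_table(10_000)
xml_gz = gzip.compress(xml_bytes)

print("internal XML, raw bytes:    ", len(xml_bytes))
print("internal XML, gzipped bytes:", len(xml_gz))
print("external table, raw bytes:  ", len(tsv_bytes))
```

The repeated tags dominate the raw XML size, which is why the raw XML is much larger than the bare table and also why it compresses so strongly; the exact numbers depend on the invented tag names and values and should not be read as a measurement of GEML itself.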
Note on GeneX: a Collaborative Internet Database and Toolset for Gene Expression Data. "The National Center for Genome Resources and the Computational Genomics Group at the University of California, Irvine are participating in the GeneX project to provide an Internet-available repository of gene expression data with an integrated toolset that will enable researchers to analyze their data and compare them with other such data. The corpus of such data will allow more confidence to be placed on the conclusions reached in this analysis, as well as sharing the considerable cost of generating these datasets. [The project contributes to 'A Gene Expression Markup Language (GEML)', an eXtensible Markup Language (XML) specification that supports the logical representation of the data so that partial or complete datasets from different Gene Expression Databases can be exchanged without loss of information.] . . ."