[October 26, 2001] PROXIML is a research 'Bioinformatics Project' [CMPS243] hosted at the University of California, Santa Cruz. The principal investigator is Douglas C. McArthur. "Problems associated with existing protein data formats, such as PDB and mmCIF, indicate a need for a more self-describing and machine-readable approach to exchanging protein-related data. XML (eXtensible Markup Language) is an ideal solution for this particular problem. However, most existing XML-based efforts (such as BIOML and ProML) rely on the W3C XML DTD (document type definition) to describe and validate the structure of their contents. The XML DTD approach imposes severe limitations upon both the structure and ability to validate an XML document. An alternative utilizing the W3C XML Schema approach to document definition overcomes many of these limitations. This approach has recently been adopted for CML, a general purpose chemical markup language. As an extension of the CML schema, PROXIML can encode the relevant details of protein structure in a more robust and well-structured fashion than other currently available data formats..." [Status: 2001-03-16.]
"The popularity of XML in the area of bioinformatics has clearly grown in the past few years. XML provides the capability of representing protein data in a single, standardized data structure. However, the structure of XML documents defined using a DTD is limited to representing data in a hierarchical tree fashion. While some portion of protein-related data can be effectively stored in this way, a significant amount of protein-related data is better represented as an arbitrary graph rather than a hierarchical tree. The XML Schema approach, coupled with XML Linking Language (XLink) allows representation of non-hierarchical data within an XML document in a self-describing fashion. Additionally, validation of both the structure and the content (with regard to specific datatypes) is greatly facilitated by an XML Schema vs. the XML DTD. By combining elements of three separate XML-based languages using an XML Schema approach, PROXIML is able to encode the relevant details of protein structure in a more robust and well-structured fashion than the current PDB and mmCIF data formats. Adoption however will ultimately depend heavily on the availability of tools (such as viewers and converters) that support the new format..."
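The tree-versus-graph point can be illustrated with a minimal Python sketch. It assumes a made-up document in which residues sit in the usual element hierarchy while a disulfide bond between two of them is recorded separately through XLink-style locator attributes. The element names (`protein`, `chain`, `residue`, `bond`, `end`), the `id` values, and the `cross_links` helper are hypothetical and are not taken from the PROXIML schema; only the XLink namespace URI is standard.

```python
import xml.etree.ElementTree as ET

# Standard XLink namespace URI.
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical document: element names and ids are illustrative only,
# NOT taken from the PROXIML schema.
DOC = f"""
<protein xmlns:xlink="{XLINK}">
  <chain id="A">
    <residue id="A22" name="CYS"/>
    <residue id="A35" name="GLY"/>
    <residue id="A87" name="CYS"/>
  </chain>
  <bond type="disulfide" xlink:type="extended">
    <end xlink:type="locator" xlink:href="#A22"/>
    <end xlink:type="locator" xlink:href="#A87"/>
  </bond>
</protein>
"""


def cross_links(xml_text):
    """Return (bond type, from id, to id) edges that cut across the tree."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an id so links can be resolved.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    href = f"{{{XLINK}}}href"  # namespace-qualified attribute name
    edges = []
    for bond in root.iter("bond"):
        # Each <end> points at a residue through a fragment reference "#id".
        ends = [by_id[e.get(href).lstrip("#")] for e in bond.findall("end")]
        edges.append((bond.get("type"), ends[0].get("id"), ends[1].get("id")))
    return edges


print(cross_links(DOC))  # [('disulfide', 'A22', 'A87')]
```

The containment hierarchy (protein, chain, residue) remains a tree, while the bond is an extra edge between two leaves that are not parent and child. That is the kind of non-hierarchical relationship the passage describes. The sketch only resolves links and performs no schema or datatype validation.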
References:
- PROXIML web site
- Existing XML Formats for gene expression: Biopolymer Markup Language (BIOML), Protein Markup Language (ProML), Chemical Markup Language (CML).
- Comparison of Existing XML Formats
- PROXIML XML schema
- Related topics:
  - Chemical Markup Language
  - Molecular Dynamics [Markup] Language (MoDL)
  - StarDOM - Transforming Scientific Data into XML
  - Bioinformatic Sequence Markup Language (BSML)
  - BIOpolymer Markup Language (BIOML)
  - CellML
  - Gene Expression Markup Language (GEML)
  - GeneX Gene Expression Markup Language (GeneXML)
  - Genome Annotation Markup Elements (GAME)
  - MicroArray and Gene Expression Markup Language (MAGE-ML)
  - Microarray Markup Language (MAML)
  - XML for Multiple Sequence Alignments (MSAML)
  - Systems Biology Markup Language (SBML)
  - OMG Gene Expression RFP