[May 24, 2002] From the March 16, 2001 document "NCBI Data in XML" (not yet updated to reflect use of XML Schemas):
Roughly ten years ago, NCBI chose a language called Abstract Syntax Notation 1 (ASN.1) for describing and exchanging information in a manner similar to the way XML is now used. ASN.1 came out of the telecommunications industry and provides a compact binary encoding for human-readable text as well as integers, floating-point numbers, and so on. While this is "software friendly," it is less accessible to users familiar with HTML and other text-based languages. Tools for ASN.1 have largely stayed within the commercial telecommunications industry, while a host of public-domain tools of varying character have arisen for XML and HTML.
NCBI has recently added support for XML output to its ASN.1 toolkit. An ASN.1 specification can be automatically rendered into an XML DTD. Data encoded in ASN.1 can automatically be output as XML that validates against the DTD using standard XML tools. We hope this will make the structured sequence, map, and structure data, as well as the output of tools like BLAST, more accessible to those who wish to work in XML. We are providing XML in two basic modes. Full Data Conversion is the direct mapping of every data field used within NCBI to XML...
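To make the idea of rendering a type specification into a DTD concrete, here is a toy sketch in Python. It is not the NCBI toolkit's actual mapping: the function `spec_to_dtd`, the flat `(field_name, optional)` input format, the `Type_field` naming scheme, and the `Author` example type are all assumptions made for illustration.

```python
# Toy illustration only: the input format and the Type_field naming scheme
# are assumptions, not the NCBI toolkit's real ASN.1-to-DTD rules.

def spec_to_dtd(type_name, fields):
    """Render a flat record type as DTD element declarations.

    fields is a list of (field_name, optional) tuples; every field is
    treated as plain text (#PCDATA) to keep the sketch small.
    """
    if not fields:
        # A record with no fields becomes an empty element.
        return f"<!ELEMENT {type_name} EMPTY>"

    children = []
    declarations = []
    for field_name, optional in fields:
        # Prefix each field with its parent type so the tag shows its scope.
        element = f"{type_name}_{field_name}"
        children.append(element + ("?" if optional else ""))
        declarations.append(f"<!ELEMENT {element} (#PCDATA)>")

    header = f"<!ELEMENT {type_name} ({', '.join(children)})>"
    return "\n".join([header] + declarations)


# Hypothetical record type with one required and one optional field.
dtd = spec_to_dtd("Author", [("name", False), ("affil", True)])
print(dtd)
```

Because each child element carries its parent type's name, the same field name can appear in different record types without colliding, at the cost of longer tags.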
While Roles, Scope, and Alternate Forms result in extensive tags in the XML, the tagging does accurately reflect the structure and use of the data. It allows XML programs to capture as little or as much of the full data structure as they wish. And once the data is converted back from XML to structures or classes in a variety of programming languages, there is minimal overhead once again. The full NCBI DTD reflects this structure. What is called the NCBI DTD actually specifies only the basic data structures for publications, sequences, maps, alignments, and structures. These same elements are reused in different roles in many services as well, such as BLAST, which produces alignments (defined in the NCBI DTD) as well as other elements specific to BLAST. As a practical matter, we have not copied all the referenced modules into a DTD for every service, although we can produce XML output from any ASN.1 interface.
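As a rough illustration of capturing "as little or as much" of a heavily tagged document, the Python sketch below reads a small XML fragment with the standard library. The fragment and its element names (`Record`, `Record_title`, `Alignment_score`, and so on) are hypothetical and are not taken from the NCBI DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; element names are invented for this sketch and
# do not come from the NCBI DTD.
SAMPLE = """<Record>
  <Record_id>42</Record_id>
  <Record_title>example entry</Record_title>
  <Record_alignments>
    <Alignment><Alignment_score>17</Alignment_score></Alignment>
    <Alignment><Alignment_score>9</Alignment_score></Alignment>
  </Record_alignments>
</Record>"""

root = ET.fromstring(SAMPLE)

# Capture very little: a single field, ignoring everything else.
title = root.findtext("Record_title")

# Capture more: every alignment score, wherever it sits in the tree.
scores = [int(element.text) for element in root.iter("Alignment_score")]

print(title, scores)
```

A program interested only in titles never needs to know how alignments are nested, which is the kind of selective access that verbose, scope-qualified tags make possible.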
References:
- [US] National Center for Biotechnology Information
- Announcement 2002-05-21: "XML Schema for NCBI Data Model."
- NCBI ASN.1 Summary
- NCBI Data in XML
- NCBI XML DTDs. See the ZIP archive or tarball.
- NCBI ASN.1 - XML resources
- Genomics and proteomics databases (H. Kaiser Yang)
- XML Schemas
- Contact: H. Kaiser Yang
- XML for Molecular Biology. Compiled by Paul Gordon.