The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: May 24, 2002.
XML Schemas for the NCBI Molecular Biology Data Model.

A posting from H. Kaiser Yang reports on the release of thirty-one (31) draft XML Schema files and six corresponding sample XML bio-sequence files from the NCBI's data modeling research. The US National Center for Biotechnology Information supports a "multi-disciplinary research group comprised of computer scientists, molecular biologists, mathematicians, biochemists, research physicians, and structural biologists concentrating on basic and applied research in computational molecular biology." NCBI has used ASN.1 [Abstract Syntax Notation One] "for the storage and retrieval of data such as nucleotide and protein sequences, structures, genomes, and MEDLINE records; it permits computers and software systems of all types to reliably exchange both the data structure and content." The draft XML schemas are equivalent to the DTDs in current use, and will replace the DTDs in the next version of the database toolkit. NCBI earlier "added support for XML output to its ASN.1 toolkit such that an ASN.1 specification could be automatically rendered into an XML DTD; data encoded in ASN.1 can then be output automatically in XML which will validate against the DTD using standard XML tools."

From the March 16, 2001 document "NCBI Data in XML" (not yet updated to reflect use of XML Schemas):

Roughly ten years ago, NCBI chose a language called Abstract Syntax Notation 1 (ASN.1) for describing and exchanging information in a manner similar to the ways XML is now used. ASN.1 came out of the telecommunications industry and is a compact binary encoding intended for both human readable text as well as integers, floating point numbers, and so on. While this is "software friendly" it is less accessible to users familiar with HTML and other text based languages. Tools for ASN.1 have largely stayed within the commercial telecommunications industry while a host of public domain tools of varying character have arisen for XML and HTML.

NCBI has recently added support for XML output to its ASN.1 toolkit. An ASN.1 specification can be automatically rendered into an XML DTD. Data encoded in ASN.1 can automatically be output in XML which will validate against the DTD using standard XML tools. We hope this will make the structured sequence, map, and structure data, as well as the output of tools like BLAST, more accessible to those who wish to work in XML. We are providing XML in two basic modes. Full Data Conversion is the direct mapping of every data field used within NCBI to XML...

While the effect of Roles, Scope, and Alternate Forms results in extensive tags in the XML, it does accurately reflect the structure and use of the data. It allows XML programs to capture as little or as much of the full data structure as they wish. And once converted back from XML to structures or classes in a variety of programming languages there is minimal overhead once again. The full NCBI DTD reflects this structure. What is called the NCBI DTD actually only specifies the basic data structures for publications, sequences, maps, alignments, and structures. These same elements are reused in different roles in many services as well, such as BLAST which produces alignments (defined in the NCBI DTD) as well as other elements specific to BLAST. We have not copied all the referenced modules into a DTD for every service as a practical matter, although we can produce XML output from any ASN.1 interface.
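The remark that XML programs can "capture as little or as much of the full data structure as they wish" can be illustrated with a minimal sketch using Python's standard xml.etree.ElementTree module. The sample record and every element name in it (Record, Record_id, Seq_data, and so on) are hypothetical and chosen only for illustration; they are not taken from the NCBI DTD or the draft schemas, and the helper extract_fields is likewise an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record: element names are illustrative only
# and are NOT taken from the NCBI DTD or the draft XML Schemas.
SAMPLE = """<Record>
  <Record_id>demo-1</Record_id>
  <Record_descr>
    <Descr_title>Example sequence</Descr_title>
    <Descr_source>synthetic</Descr_source>
  </Record_descr>
  <Record_seq>
    <Seq_length>8</Seq_length>
    <Seq_data>ACGTACGT</Seq_data>
  </Record_seq>
</Record>"""


def extract_fields(xml_text, wanted):
    """Return {tag: text} for the first occurrence of each tag in `wanted`.

    Elements whose names are not in `wanted` are simply skipped, so a
    consumer can ignore most of a deeply nested document.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():  # walks the whole tree in document order
        if elem.tag in wanted and elem.tag not in found:
            found[elem.tag] = (elem.text or "").strip()
    return found


# A consumer interested only in the raw sequence data.
minimal = extract_fields(SAMPLE, {"Seq_data"})

# A consumer that wants identifiers and descriptive fields as well.
fuller = extract_fields(
    SAMPLE, {"Record_id", "Descr_title", "Seq_length", "Seq_data"}
)

print(minimal)
print(fuller)
```

Both calls read the same document; only the set of requested element names changes, which is the sense in which extensive tagging need not burden a program that wants a small part of the record.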


Document URI: http://xml.coverpages.org/ni2002-05-24-b.html
Robin Cover, Editor: robin@oasis-open.org