The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: June 05, 2002.

Library of Congress Publishes MARC 21 XML Schema and Transformation Tools.

A posting from Corey Keith of the US Library of Congress announces the publication of an XML Schema for use in communicating MARC 21 records. Prepared by the Library of Congress Network Development and MARC Standards Office, the XML Schema "was developed in collaboration with OCLC and RLG and reviewed by the National Library of Canada and the National Library of Medicine (NLM), after a survey of schemas in use in various projects. The schema will be maintained by the Library of Congress, along with software that enables lossless conversion to and from MARC 21 records in the ISO 2709 structure. The schema supports tags with alphabetics and subfield codes that are symbols, neither of which are as yet used in the MARC 21 communications formats, but are allowed by MARC 21 for local data; it accommodates all types of MARC 21 records: bibliographic, holdings, bibliographic with embedded holdings, authority, classification, and community information." The software tools maintained by LOC will support transformations to and from other metadata approaches, including Dublin Core and MODS. The Metadata Object Description Schema (MODS) "is a new schema for a bibliographic element set that is a subset of MARC expressed in XML with language-based rather than numeric tags."

From the LOC website 'MARC XML Design Considerations':

The core of the MARC XML framework is a simple XML schema which contains MARC data. This base schema output can be used where full MARC records are needed or act as a "bus" to enable MARC data records to go through further transformations such as to Dublin Core and/or processes such as validation. The MARC XML schema will not need to be edited to reflect minor changes to MARC21. The schema retains the semantics of MARC.
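
The "bus" idea can be sketched in a few lines of Python. This is only an illustration, not the Library of Congress transformation: the element names (record, datafield, subfield) and the namespace are taken from the published slim schema rather than from this article, the sample record is invented, and the two-entry CROSSWALK table is a hypothetical stand-in for the maintained MARC to Dublin Core mapping.

```python
import xml.etree.ElementTree as ET

# Namespace of the slim schema (an assumption; the article does not quote it).
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record, used only for illustration.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">example-0001</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
</record>"""

# Hypothetical, deliberately tiny crosswalk:
# (MARC tag, subfield code) -> Dublin Core element name.
CROSSWALK = {("245", "a"): "title", ("100", "a"): "creator"}


def to_dublin_core(record_xml):
    """Collect Dublin Core values from one MARC XML record string."""
    record = ET.fromstring(record_xml)
    result = {}
    for field in record.findall("marc:datafield", NS):
        for sub in field.findall("marc:subfield", NS):
            dc_name = CROSSWALK.get((field.get("tag"), sub.get("code")))
            if dc_name:
                result.setdefault(dc_name, []).append(sub.text)
    return result


dc = to_dublin_core(SAMPLE)
print(dc)
```

Because every field sits in the same generic datafield/subfield structure, adding another target format means adding another mapping table, not changing how the record is read.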

All of the essential data in a MARC record is converted and expressed in XML. MARC structural elements, such as the length of field and starting position of field data in directory entries, are not needed in the XML record. Leader data positions not needed in the XML environment are retained as place holders or carried as blanks.

As a consequence of the lossless conversion from MARC (2709), information in a MARC XML record enables recreation of a MARC (2709) record without loss of data. A MARC (2709) record can also be created without data loss from MARC XML records.
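
Why the directory can be dropped from the XML record without losing anything is easiest to see in code: field lengths and starting positions can be recomputed from the field data. The Python sketch below assumes the usual ISO 2709 layout (a 24-character leader, 12-character directory entries, and the 0x1E, 0x1D and 0x1F control characters). The function names and the sample data are hypothetical, and character-set handling and size limits are ignored.

```python
FT, RT, SF = b"\x1e", b"\x1d", b"\x1f"  # field end, record end, subfield delimiter


def to_iso2709(leader, fields):
    """Serialize (tag, content-bytes) pairs, rebuilding the directory."""
    directory, body, offset = b"", b"", 0
    for tag, content in fields:
        data = content + FT
        # Directory entry: 3-char tag, 4-digit length, 5-digit start position.
        directory += f"{tag}{len(data):04d}{offset:05d}".encode("ascii")
        body += data
        offset += len(data)
    base = 24 + len(directory) + 1   # leader + directory + its terminator
    total = base + len(body) + 1     # + record terminator
    # Fill the place-holder positions: 0-4 (record length), 12-16 (base address).
    full_leader = f"{total:05d}{leader[5:12]}{base:05d}{leader[17:]}"
    return full_leader.encode("ascii") + directory + FT + body + RT


def from_iso2709(raw):
    """Read the leader and (tag, content-bytes) pairs back out."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # Slice the field out of the data area, dropping its terminator.
        fields.append((tag, raw[base + start:base + start + length - 1]))
    return leader, fields


# Invented data: the leader as it might be carried in XML, with zero place holders.
xml_leader = "00000nam a2200000 a 4500"
xml_fields = [
    ("001", b"example-0001"),
    ("245", b"10" + SF + b"aAn example title /" + SF + b"cJane Doe."),
]

raw = to_iso2709(xml_leader, xml_fields)
leader_back, fields_back = from_iso2709(raw)
print(leader_back, fields_back == xml_fields)
```

Only the computed leader positions differ after the round trip; the field content comes back byte for byte.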

Once MARC data has been converted to XML, data presentation is possible by writing an XML stylesheet to select the MARC elements to be displayed and to apply the appropriate markup. Some single or batch updates such as adding, updating, or deleting a field in a MARC record can be accomplished with simple XML transformations. Most data conversions can be written as XML transformations. For more complex transformations of the data, software tools which read MARC XML can be written.
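
The passage above has XML stylesheets in mind. As a rough equivalent, the following Python sketch uses the standard library's ElementTree in place of a stylesheet to select one field for display and to delete another. The helper names and the sample record are hypothetical, and the element names and namespace are assumed from the published slim schema.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}  # assumed slim namespace

# Invented sample record, used only for illustration.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">Local note to be removed.</subfield>
  </datafield>
</record>"""


def display_line(record, tag):
    """Join the subfields of the first datafield with this tag for display."""
    field = record.find(f"marc:datafield[@tag='{tag}']", NS)
    if field is None:
        return ""
    return " ".join(sub.text or "" for sub in field.findall("marc:subfield", NS))


def delete_fields(record, tag):
    """Remove every datafield with this tag; return how many were removed."""
    doomed = [f for f in record.findall("marc:datafield", NS) if f.get("tag") == tag]
    for field in doomed:
        record.remove(field)
    return len(doomed)


record = ET.fromstring(SAMPLE)
title = display_line(record, "245")
removed = delete_fields(record, "500")
print(title, removed)
```

The same two helpers could be run over every record in a collection for a batch update.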

Validation with this schema is accomplished via a software tool. This software, external to the schema, will provide three possible levels of validation: (1) Basic XML validation according to the MARC XML Schema; (2) Validation of MARC21 tagging (field and subfield); (3) Validation of MARC record content, e.g., coded values, dates, and times.
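
To make levels (2) and (3) concrete, here is a small Python sketch. It is not the Library of Congress validation software, whose interface the article does not describe. ALLOWED_SUBFIELDS is a hypothetical excerpt of a tagging table, the sample record is invented, and the content check assumes that control field 005 carries a date and time beginning with fourteen digits (yyyymmddhhmmss). Level (1) needs a schema-aware validator and is not shown.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}  # assumed slim namespace

# Hypothetical excerpt of a tag/subfield table; a real one would be far larger.
ALLOWED_SUBFIELDS = {"100": {"a", "d"}, "245": {"a", "b", "c"}}

# Invented record with two deliberate errors: month 13 in 005, $z in 245.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="005">20021305120000.0</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="z">Not a defined subfield here.</subfield>
  </datafield>
</record>"""


def check_tagging(record):
    """Level 2: are the field tags and subfield codes known?"""
    errors = []
    for field in record.findall("marc:datafield", NS):
        tag = field.get("tag")
        allowed = ALLOWED_SUBFIELDS.get(tag)
        if allowed is None:
            errors.append(f"unknown tag {tag}")
            continue
        for sub in field.findall("marc:subfield", NS):
            code = sub.get("code")
            if code not in allowed:
                errors.append(f"{tag}: unexpected subfield ${code}")
    return errors


def check_content(record):
    """Level 3: does the 005 value start with a valid date and time?"""
    errors = []
    for cf in record.findall("marc:controlfield[@tag='005']", NS):
        value = cf.text or ""
        try:
            datetime.strptime(value[:14], "%Y%m%d%H%M%S")
        except ValueError:
            errors.append(f"005: invalid date/time {value!r}")
    return errors


record = ET.fromstring(SAMPLE)
tagging_errors = check_tagging(record)
content_errors = check_content(record)
print(tagging_errors, content_errors)
```

Keeping these checks outside the schema is what lets the schema itself stay small and stable while the rule tables change.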

From the announcement:

By collaboratively developing a communications schema, the Library of Congress encourages the standardization of MARC 21 exchange records in the XML environment, recognizing that MARC 21 records inside systems will continue to use different record configurations, tailored to the characteristics of the system. Provision of the tools for transformations to and from other metadata approaches, such as Dublin Core and the Metadata Object Description Schema (MODS), will help to standardize derivative metadata records also. MODS is a new schema for a bibliographic element set that is a subset of MARC expressed in XML with language-based rather than numeric tags. The tools take the mappings between MARC and other metadata sets, that have been maintained on the MARC web site, to an operational level.

One project interested in a standard, lossless MARCXML schema is the Open Archives Initiative (OAI), which found it necessary to draft a schema in the absence of an official one. The Library of Congress worked with the OAI to provide a transformation from the original oai_marc schema to this one so the Initiative can take advantage of a schema that is maintained by the MARC 21 maintenance agency and in broad use. The transformation is available from the MARCXML web site.

With the slim approach, schema-driven validation is only possible at the highest structural level. The Network Development and MARC Standards Office will therefore maintain downloadable tag, subfield, and value validation software on the web site that will enable users to build validation programs for their needs. Use of these standard validations represents another attempt to assure standardization of records to support effective record interchange.

The Library has maintained two SGML DTDs (for Bibliographic-type and Authority-type records) since 1996, which take a different approach to the data elements in MARC: an approach that enables validation of data through the DTD itself but requires a very large DTD and DTD maintenance. The Bibliographic-type DTD was converted to an XML DTD in 2000. These DTDs have been effectively used by some agencies (including the Library of Congress), primarily for internal processes; therefore, transformations between them and the new slim MARCXML schema are being provided. Maintenance techniques and/or possible revision of the XML DTDs are under consideration.

Robin Cover, Editor: