The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: July 23, 2002
MARC (MAchine Readable Cataloging) and SGML/XML

This document provides references to the use of SGML/XML in bibliographic data management based upon MARC formats. It also references some generalized approaches to markup-based bibliographic database management and citation generation.

MARC (MAchine-Readable Cataloging) refers to a suite of related standards (USMARC, Can/MARC, InterMARC, UKMARC, CCF, etc.) used for bibliographic control within the library science and 'digital libraries' communities. 'MARC' is based upon ISO 2709:1996, Format for Information Exchange (INEX). 'USMARC' is based on ANSI Z39.2, American National Standard for Bibliographic Information Interchange. Conversion from MARC to SGML/XML (and the reverse) has been addressed in several different efforts. We expect to see increased interest in the development of interchange DTDs and software supporting such conversions, together with work on facilitating interoperability with the TEI (header), Dublin Core, RDF, and other metadata formats.

"The MARC formats are standards for the representation and communication of bibliographic and related information in machine-readable form. A USMARC record involves three elements: the record structure, the content designation, and the data content of the record. A USMARC format is a set of codes and content designators defined for encoding machine-readable records. Formats are defined for five types of data: bibliographic, holdings, authority, classification, and community information." [From: "The USMARC Formats: Background and Principles"]
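
The three elements named above (record structure, content designation, data content) can be made concrete with a short sketch. The Python below is an illustration only, not code from any tool listed on this page. It assumes the ISO 2709 layout as commonly documented for MARC 21: a 24-character leader, 12-character directory entries, and the 0x1E, 0x1D, and 0x1F delimiters. The helper names `build_record` and `parse_record` are hypothetical, and field data is assumed to be UTF-8 for simplicity, whereas real records are often encoded in MARC-8/ANSEL.

```python
# Structural delimiters used by ISO 2709 / ANSI Z39.2 records.
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter


def build_record(fields):
    """Assemble a minimal record from (tag, body_bytes) pairs (hypothetical helper)."""
    directory, data = b"", b""
    for tag, body in fields:
        body += FT
        # Directory entry: 3-char tag, 4-digit field length, 5-digit start offset.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(body), len(data))
        data += body
    base = 24 + len(directory) + 1   # leader + directory + directory terminator
    total = base + len(data) + 1     # plus the record terminator
    # Leader: record length (00-04), base address of data (12-16), fixed "4500" (20-23).
    leader = b"%05dnam a22%05d   4500" % (total, base)
    return leader + directory + FT + data + RT


def parse_record(raw):
    """Split a record into leader, control fields, and data fields."""
    leader = raw[:24].decode("ascii")            # record structure
    base = int(leader[12:17])
    directory = raw[24:base - 1]
    control, data = {}, []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # drop field terminator
        if tag.startswith("00"):
            # Control fields (001-009) carry no indicators or subfields.
            control[tag] = body.decode("utf-8")
        else:
            # Content designation: two indicators, then coded subfields.
            indicators = body[:2].decode("ascii")
            subfields = [(chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
                         for chunk in body[2:].split(SF)[1:]]
            data.append((tag, indicators, subfields))
    return leader, control, data


# Made-up sample data, for illustration only.
sample = build_record([
    ("001", b"12345"),
    ("245", b"10" + SF + b"aSample title" + SF + b"cA. Author"),
])
leader, control, data = parse_record(sample)
print(leader)
print(control)
print(data)
```

The tag, indicators, and subfield codes are the content designation; the strings attached to them are the data content; the leader and directory supply the record structure.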


  • MARC SGML and XML from US Library of Congress

  • MARC 21 XML Schema. [cache]

  • MARC 21 XML Schema, graphical view

  • Metadata Object Description Schema (MODS)

  • MODS XML Schema: text version (.xsd) and graphical view. [cache, 2002-06-05]

  • [July 23, 2002] "XML and Bibliographic Data: the TVS (Transport, Validation and Services) Model." By Joaquim Ramos de Carvalho (IHTI Faculdade de Letras, Universidade de Coimbra, Portugal) and Maria Inês Cordeiro (Art Library, Calouste Gulbenkian Foundation, Portugal). Paper prepared for the 68th IFLA General Conference and Council 'Libraries for Life: Democracy, Diversity, Delivery', August 18-24, 2002, Glasgow, Scotland. 13 pages, with 44 references. "This paper discusses the role of XML in library information systems at three major levels: as a representation language that enables the transport of bibliographic data in a way that is technologically independent and universally understood across systems and domains; as a language that enables the specification of complex validation rules according to a particular data format such as MARC; and, finally, as a language that enables the description of services through which such data can be exploited in alternative modes that overcome the limitations of the classical client-server database services. The key point of this paper is that by specifying requirements for XML usage at these three levels, in an articulated but distinct way, a much needed clarification of this area can be achieved. The authors conclude by stressing the importance of advancing the use of XML in the real practice of bibliographic services, in order to improve the interoperable capabilities of existing bibliographic data assets and to advance the WWW integration of bibliographic systems on a sound basis... By 'transport format' we mean an XML format designed to take a role similar to that of ISO 2709. Its purpose is to allow the efficient transport of bibliographic data. 
Like ISO 2709 such a format contains the necessary information for representing the morphological structure of the MARC record, i.e., without aiming at validation of the complete syntax/semantics of the MARC format, but rather mapping directly to the MARC record main structural levels. ISO 2709 was modelled to the needs of the technological environment of its time; an XML equivalent for the current technological context must target the interoperability paradigms of today: Web services. So far, most of the approaches to encoding bibliographic records in XML have been based on the assumption that the use of XML would imply the expression of the whole syntax/semantics of MARC, enforcing validation. Not only this approach has proved difficult (very complex and long DTDs or schema) but also it does not facilitate the transport and reuse of data in practical applications. This is because in such a way only valid records can be represented, and also because it is difficult to rebuild MARC data from the very complex MARC XML records generated in such a model..." From a posting: "A web page with various examples and sample code is available; the page proposes a mapping that is targeted to Web Services development. Sample web services are available at the National Library of Portugal, allowing experimental access to over one million records..." [cache]

  • [June 14, 2002] Metadata Object Description Schema (MODS). "The Library of Congress' Network Development and MARC Standards Office, with interested experts, has developed the Metadata Object Description Schema (MODS), which is a bibliographic element set that may be used for a variety of purposes, particularly for library applications. As an XML schema it is intended to be able to carry selected data from existing MARC 21 records as well as to enable the creation of original resource description records. It includes a subset of MARC fields and uses language-based tags rather than numeric ones, in some cases regrouping elements from the MARC 21 bibliographic format. The elements inherit MARC semantics, so are more compatible with existing library data than other metadata schemes. MODS could potentially be used as follows: (1) as a Z39.50 Next Generation specified format; (2) as an extension schema to METS (Metadata Encoding and Transmission Standard); (3) to represent metadata for harvesting; (4) for original resource description in XML syntax (using MARC semantics); (5) for representing a simplified MARC record in XML; (6) for metadata in XML that may be packaged with an electronic resource..."

  • [June 05, 2002] Library of Congress Publishes MARC 21 XML Schema and Transformation Tools. A posting from Corey Keith of the US Library of Congress announces the publication of an XML Schema for use in communicating MARC 21 records. Prepared by the Library of Congress Network Development and MARC Standards Office, the XML Schema "was developed in collaboration with OCLC and RLG and reviewed by the National Library of Canada and the National Library of Medicine (NLM), after a survey of schemas in use in various projects. The schema will be maintained by the Library of Congress, along with software that enables lossless conversion to and from MARC 21 records in the ISO 2709 structure. The schema supports tags with alphabetics and subfield codes that are symbols, neither of which are as yet used in the MARC 21 communications formats, but are allowed by MARC 21 for local data; it accommodates all types of MARC 21 records: bibliographic, holdings, bibliographic with embedded holdings, authority, classification, and community information." The software tools maintained by LOC will support transformations to and from other metadata approaches, including Dublin Core and MODS. The Metadata Object Description Schema (MODS) "is a new schema for a bibliographic element set that is a subset of MARC expressed in XML with language-based rather than numeric tags." [Full context]

  • MARC XML DTD for Bibliographic/Holdings/Community Information Record [cache]

  • [June 22, 2001] Announcement: Special issue of OCLC Systems & Services on XML and libraries. "The OCLC Systems & Services journal plans to devote a special issue to cover XML applications for libraries. If you have implemented any XML applications for your work, please share them with my readers. The deadline will be January 31, 2002. OCLC Systems & Services is a refereed and quarterly publication with an international readership. Its publisher is MCB University Press. If you are interested in contributing an article on this topic, please do not hesitate to contact me..." Post 21-Jun-2001 from Sheau-Hwang Chang (Editor of OCLC S&S, Bridgewater State College, Bridgewater, MA 02325).

  • [November 01, 2001] "The NISO Circulation Interchange Protocol (NCIP). An XML Based Standard." By Mark Needleman, John Bodfish, Tony O'Brien, James E. Rush, and Patricia Stevens. In Library Hi Tech Volume 19, Number 3 (2001), pages 223-230. ISSN: 0737-8831. "The article describes the NISO (National Information Standards Organization) Circulation Interchange Protocol (NCIP) and some of the design decisions that were made in developing it. When designing a protocol of the scale and scope of NCIP, certain decisions about what technologies to employ need to be made. Often, there are multiple competing technologies that can be employed to accomplish the same functionality, and there are both positive and negative reasons for the choice of any particular one. The article focuses specifically on the areas on which the protocol would be supported. The authors give particular emphasis to the decision to choose XML as the encoding technology for the protocol messages. One of the main design goals for NCIP was to try to strike the appropriate balance between ease of implementation and providing appropriate functionality. This functionality includes that needed to support both those application areas that the NISO committee anticipate will use the protocol in the short term and new applications that might be developed in the future." See: "NISO Circulation Interchange Protocol (NCIP)."

  • MARC Standards - Information from the Network Development and MARC Standards Office

  • MARC Documentation

  • MARC DTDs (Document Type Definitions) - From the Library of Congress

  • Background information on the MARC DTD Development project; [local archive copy]

  • Announcement for a Beta Test Version of the MARC DTDs and Conversion Utilities

  • MARC-SGML Conversion Program User's Manual

  • MARC-SGML Conversion Program Maintenance Manual

  • MARC Bibliographic/Holdings/Community Information Record DTD; [local archive copy]

  • MARC Authority/Classification Record DTD; [local archive copy]

  • MARC-SGML Conversion Utilities; [local archive copy]

  • "MARC Data in an SGML Structure." By Sally H. McCallum (Chief, Network Development and MARC Standards Office, Library of Congress, Washington DC, 20540 USA). October 9, 1996. [local archive copy]

  • "From MARC to Markup: SGML and Online Library Systems." By Edward Gaynor. Associate Director of Special Collections, and Coordinator of the Special Collections Digital Center, University of Virginia Library. [local archive copy]

  • "Moving From MARC to XML." Three parts (as of 2001-06): (1) Part One - Introduction; (2) Part Two - Handling of Multi-Scripts Metadata; (3) Part Three - Handling of Authority Metadata.

  • SGML-MARC: Incorporating Library Cataloging into the TEI Environment. By Stephen Paul Davis, Columbia University.

  • TEI/MARC "Best Practices"

  • SGML and MARC - Florida State University

  • MARCDTD FTP Directory - UC Berkeley

  • USMARC SGML DTD [local archive copy]

  • UC Berkeley marc2sgml conversion utilities; [local archive copy]

  • MARC XML - Conversion of MARC records to XML and back. From Logos. See also below. Description: "MARC XML. This document describes Logos' application for conversion of MARC records to XML and back. MARC XML generates a simple, well-formed XML document representing any MARC record and can convert XML in that format into a valid MARC record. There is no DTD and no character set conversion is performed other than to move 8-bit ANSEL characters to UTF-8 safe character entities and back. (If the MARC record is in ANSEL, the data will 'round-trip' in the ANSEL character format.) It's important to note that the approach taken by the Logos MARCXML program is very simplistic. What's encoded in the XML file is simply the raw structure of the file. There is no built-in understanding of USMARC or any other MARC format and there is no detection of improper use of tags during an XML to MARC conversion. The Library of Congress has information on the MARC Homepage about a DTD-based set of conversion tools for taking MARC records in and out of SGML. The LC DTD's actually incorporate a very significant amount of information about the meanings and uses of the MARC field and subfield tags. If you'd like more information about the MARCXML application please email..."

  • [May 05, 2000] "... Over time we've dealt with a few things viz. MARC and XML, and I was not sure-- in the XML MARC Software thread I don't know if a truly sublime and see-spot-run easy tool which round trips was mentioned (I'm using it and XSLT to populate an 856 field via automated lookups our Oracle DBA is doing in PL/SQL). It's called marcxml.exe, and is Windows only, but free and will recalculate MARC from records modified in XML, thus allowing an XSLT transformation of MARC records. It's [a] great tool; you can find it at: ... Documents on related procedures at ..." [From John Robert Gardner, Ph.D. (XML Engineer, Emory University), to the USMARC List, 2000-05-05.]

  • The Network Development and MARC Standards Office

  • TEI and XML in Digital Libraries - Meeting Schedule and Final Reports, with WG Recommendations

  • [October 23, 2001] "Java MARC Events [James]." From a posting of Bas Peters (2001-10-23). "James (Java MARC Events) is a free Java package that provides an event model for MARC records through Java callbacks. James is inspired by the Simple API for XML (SAX). Using James you can write programs that involve MARC records without knowing the details of the MARC record structure. James provides a sequential model to access a collection of MARC records in tape format. The goal of James is to provide a generic application interface to records that conform to the ISO-2709 exchange format. The MARCHandler interface provides methods to get information about the record label (record position 00-23), control fields (001-009) and data fields (010-999), including indicator values, tag names, subfield codes and data. The character encoding of the original records is preserved. Field data is returned in character arrays. The optional ErrorHandler interface provides methods to handle error messages. The current release of James does not support character conversions. Included with James are two sample programs. The com.bpeters.samples.TaggedPrinter program converts MARC records to a tagged display format and the com.bpeters.samples.XMLPrinter program builds a JDOM document out of MARC records and writes the JDOM document as XML to a file. The current version of James is beta 1. The release contains binary and source distributions. James is published under the terms of the GNU General Public License as published by the Free Software Foundation. Downloads and additional information can be found at ... James beta 2 provides a round trip from MARC to non-MARC and back again using a MARC object model and includes Javadoc documentation... Besides the event model James also includes a MARC object model to provide a round trip from MARC to a different format using the event model (for example to convert MARC records to XML) and from a non-MARC format to MARC using Record objects.
Using the MARC object model it is for example possible to use the XML SAX API to build MARC Record objects from XML and to serialize them to tape format." Related: "MARC (MAchine Readable Cataloging) and SGML/XML"; "BiblioML - XML for UNIMARC Bibliographic Records"; "bibteXML: XML for BibTeX"; "Medlane XMLMARC Experiment - MARC to XML." See the .ZIP distribution [cache]

  • See also:

    • "Nomen Project for Enhanced MARC 21 Name Authority."
    • XML4Lib Electronic Discussion Forum on the Use of XML in Libraries
    • BiblioML - XML for UNIMARC Bibliographic Records
    • Medlane XMLMARC Experiment - MARC to XML
    • NISO Circulation Interchange Protocol (NCIP)
    • [January 10, 2002] RefDB Bibliographic Database Management Tool Supports DocBook and TEI. A posting from Markus Hoenicka announces a new release of RefDB for bibliographic database management. RefDB is a "reference database and bibliography tool for markup languages; it helps you to keep track of the publications you read and allows you to automatically create bibliographies in your SGML, XML, or LaTeX documents. The citations and bibliographies can be formatted according to the specifications of a particular journal or publisher. RefDB currently supports document types and stylesheets based upon DocBook SGML (DSSSL), DocBook XML (DSSSL or XSL), and TEI XML (XSL). Further document types can be added without modifying RefDB itself. Using RefDB one may create HTML, PostScript, PDF, DVI, MIF, or RTF output from DocBook or TEI sources with fully formatted citations and bibliographies according to a publisher's specifications. RefDB is a client/server system which was specifically designed to allow sharing of databases in workgroups or departments, although it runs just as well on a standalone workstation. RefDB currently runs on Linux, FreeBSD, and Windows/Cygwin, but other Unices most likely work as well. RefDB is released under the GPL and available for free." [Full context]
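
Several entries above describe XML representations that map directly onto the main structural levels of a MARC record (leader, control fields, data fields, subfields) and that support a lossless round trip. The following Python is a minimal sketch of that idea, not the Library of Congress software or any other tool listed here. The element and attribute names (`record`, `leader`, `controlfield`, `datafield`, `subfield`, `tag`, `ind1`, `ind2`, `code`) and the namespace URI follow the MARC 21 XML "slim" schema as generally published, but they are not quoted on this page and should be treated as assumptions. The function names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the MARC 21 XML schema; not reproduced on this page.
NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", NS)


def q(name):
    """Return the namespace-qualified element name."""
    return "{%s}%s" % (NS, name)


def record_to_xml(leader, control, data):
    """Serialize a structural record: no validation of tags or semantics."""
    rec = ET.Element(q("record"))
    ET.SubElement(rec, q("leader")).text = leader
    for tag, value in control:
        ET.SubElement(rec, q("controlfield"), tag=tag).text = value
    for tag, indicators, subfields in data:
        field = ET.SubElement(rec, q("datafield"), tag=tag,
                              ind1=indicators[0], ind2=indicators[1])
        for code, value in subfields:
            ET.SubElement(field, q("subfield"), code=code).text = value
    return ET.tostring(rec, encoding="unicode")


def xml_to_record(xml_text):
    """Rebuild the same structure from the XML produced above."""
    rec = ET.fromstring(xml_text)
    leader = rec.find(q("leader")).text
    control = [(e.get("tag"), e.text) for e in rec.findall(q("controlfield"))]
    data = [
        (d.get("tag"),
         d.get("ind1") + d.get("ind2"),
         [(s.get("code"), s.text) for s in d.findall(q("subfield"))])
        for d in rec.findall(q("datafield"))
    ]
    return leader, control, data


# Made-up sample record, for illustration only.
leader = "00000nam a2200000   4500"
control = [("001", "12345")]
data = [
    ("100", "1 ", [("a", "Author, A.")]),
    ("245", "10", [("a", "Sample title"), ("c", "A. Author")]),
]

xml_text = record_to_xml(leader, control, data)
print(xml_text)
print(xml_to_record(xml_text) == (leader, control, data))
```

Because only structure is encoded, a record with unusual or locally defined tags still serializes and converts back; checking tag usage against a particular MARC format would be a separate validation step.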

Robin Cover, Editor