The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: April 04, 2005.

OASIS Entity Resolution TC Releases XML Catalogs V1.1 for Public Review.


A public review draft of the XML Catalogs Version 1.1 specification has been released by the OASIS Entity Resolution Technical Committee. The review period extends from 1-April-2005 through 30-April-2005. Public review and feedback from potential users, developers and stakeholders are an important part of the OASIS process to assure interoperability and quality.

The XML Catalogs specification defines mechanisms to facilitate machine processing of XML entities associated with external identifiers, as defined in production rule 75 of the W3C Recommendation Extensible Markup Language (XML) 1.0 Second Edition. External identifiers, which include system identifiers, are used in the XML Document Type Definition, Entity Declarations (general entities, parameter entities), and in Notation Declarations.

Because the XML Recommendation itself does not specify entity management in a detailed way, "the interoperability of XML documents has been impeded in several ways, identified in the XML Catalogs document: (1) External identifiers may require resources that are not always available; for example, a system identifier that points to a resource on another machine may be inaccessible if a network connection is not available; (2) External identifiers may require protocols that are not accessible to all of the vendors' tools on a single computer system; an external identifier that is addressed with the ftp: protocol, for example, is not accessible to a tool that does not support that protocol; (3) It is often convenient to access resources using system identifiers that point to local resources; exchanging documents that refer to local resources with other systems is problematic at best and impossible at worst."

The XML Catalogs specification therefore defines an entity catalog "that maps both external identifiers and arbitrary URI references to URI references. Conceptually, a catalog is a logical structure that contains mapping information. A catalog may be physically contained in one or more catalog entry files, and a catalog entry file is a document that contains a set of catalog entries. The logical input to a catalog processor is an external identifier (some combination of public and system identifiers) or a URI reference; the logical output of the catalog processor is a URI reference."
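The mapping described above can be illustrated with a minimal sketch in Python. It assumes a small, hypothetical catalog entry file using the `public`, `system`, and `uri` entry elements in the OASIS catalog namespace; the identifiers, file paths, and function names are invented for illustration. It is not a conforming catalog processor: features such as `prefer`, rewriting, delegation, and identifier normalization are ignored.

```python
import xml.etree.ElementTree as ET

# Namespace name used by OASIS XML Catalogs entry files.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog entry file; all identifiers and paths are illustrative.
CATALOG_XML = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//Example//DTD Sample V1.0//EN" uri="dtds/sample.dtd"/>
  <system systemId="http://example.org/dtds/sample.dtd" uri="dtds/sample.dtd"/>
  <uri name="http://example.org/styles/base.xsl" uri="styles/base.xsl"/>
</catalog>
"""


def load_entries(catalog_text):
    """Parse one catalog entry file into three lookup tables."""
    root = ET.fromstring(catalog_text)
    key_attr = {"public": "publicId", "system": "systemId", "uri": "name"}
    entries = {kind: {} for kind in key_attr}
    for kind, attr in key_attr.items():
        for elem in root.iter("{%s}%s" % (CATALOG_NS, kind)):
            entries[kind][elem.get(attr)] = elem.get("uri")
    return entries


def resolve_external_identifier(entries, public_id=None, system_id=None):
    """Map an external identifier to a URI reference, or None if unmatched.

    Simplifying assumption: system entries are tried before public entries.
    """
    if system_id is not None and system_id in entries["system"]:
        return entries["system"][system_id]
    if public_id is not None and public_id in entries["public"]:
        return entries["public"][public_id]
    return None


def resolve_uri(entries, uri_reference):
    """Map an arbitrary URI reference to a URI reference, or None."""
    return entries["uri"].get(uri_reference)


entries = load_entries(CATALOG_XML)
print(resolve_external_identifier(
    entries, public_id="-//Example//DTD Sample V1.0//EN"))
print(resolve_uri(entries, "http://example.org/styles/base.xsl"))
```

In both cases the input is an identifier found in a document and the output is a URI reference, which matches the logical input and output of a catalog processor as quoted above.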

The catalog is effectively "an ordered list of (one or more) catalog entry files. It is up to the application to determine the ordered list of catalog entry files to be used as the logical catalog. Each entry in the catalog associates a URI reference with information about an external reference that appears in an XML document. A catalog can be used in two different, independent ways: (1) it can be used to locate the replacement text for an external entity, or (2) it can be used to locate an alternate URI reference for a resource. Although these functions are similar in nature, they are distinct and exercise two different sets of entries in the catalog."
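A rough sketch of the "ordered list" idea follows, assuming each catalog entry file has already been parsed into a plain Python dictionary. The data, the `lookup` helper, and the first-match-wins rule are illustrative assumptions, not the normative algorithm of the specification. The sketch also keeps the entries for external identifiers (`system`) separate from those for URI references (`uri`), reflecting the two independent uses described above.

```python
# Each dictionary stands in for one parsed catalog entry file.
# The application decides the order; all data here is hypothetical.
catalog_entry_files = [
    {
        "system": {"http://example.org/a.dtd": "local/a.dtd"},
        "uri": {},
    },
    {
        "system": {
            "http://example.org/a.dtd": "fallback/a.dtd",
            "http://example.org/b.dtd": "fallback/b.dtd",
        },
        "uri": {"http://example.org/a.dtd": "uris/a.dtd"},
    },
]


def lookup(entry_files, kind, key):
    """Return the first match for `key` among entries of type `kind`.

    Entry files are consulted in order; later files are reached only
    when earlier ones contain no matching entry.
    """
    for entry_file in entry_files:
        target = entry_file.get(kind, {}).get(key)
        if target is not None:
            return target
    return None


# External identifier resolution: the first entry file wins for a.dtd.
print(lookup(catalog_entry_files, "system", "http://example.org/a.dtd"))

# URI resolution uses a different set of entries, even for the same string.
print(lookup(catalog_entry_files, "uri", "http://example.org/a.dtd"))
```

The same string yields different results depending on whether it is resolved as a system identifier or as a URI reference, because the two lookups never consult each other's entries.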

The application-independent entity catalog specified in the draft XML Catalogs document also defines a format for catalog entry files using XML instances, with support for XML Namespaces. "By design, XML Catalogs defined by the draft Committee Specification use the same namespace name as XML Catalogs Committee Specification 1.0. Although additional elements have been defined, the semantics of all existing elements remain unchanged."

Section 7 of the XML Catalogs specification ("Catalog Resolution Semantics") describes how catalog resolution is performed, including resolution of external identifiers and URI references.

The draft XML Catalogs document contains four non-normative appendices: Appendix A provides a W3C XML Schema for the XML Catalog; Appendix B presents a RELAX NG Grammar for the XML Catalog; Appendix C supplies an XML DTD for the XML Catalog; Appendix D discusses support for SGML Open [OASIS] TR 9401 Catalog Semantics.

Bibliographic Information

XML Catalogs. Edited by Norman Walsh (Sun Microsystems, Inc). OASIS Committee Specification [editorial draft] Version 1.1. 16-March-2005. 41 pages (PDF). Document identifier: 'cs-entity-xml-catalogs-1.1'. Produced by members of the OASIS Entity Resolution Technical Committee. Available in XML, HTML, and PDF formats.

Specification contributors. Members of the OASIS Entity Resolution TC include Adam Di Carlo (Debian), Anthony Coates (Individual Member), Paul Grosso (Arbortext), Mark Johnson (Debian), Jirka Kosek (Individual Member), Craig Salter (IBM), Norman Walsh (Sun Microsystems, Editor), and Lauren Wood (Individual Member, Chair). These individuals were also members of the committee during the development of previous versions: John Cowan (Reuters Health), David Leland (Elsevier Science London), and Normand Montour (IBM).

What is Entity Resolution?

The OASIS TC website offers the following brief description of entity resolution:

"Entity resolution is the process that an XML processor goes through when it has been requested to find another file* in the course of processing the file it's working on. The XML processor knows labelling information about the file such as its system identifier and possibly a name, public identifier, and so forth. These identifiers can be used to determine the actual location of the desired external file. This determination process (which 'maps' the known labelling information into an actual location) is called an entity resolution, and the file that contains the specific mapping information is called the entity resolution catalog.

* Note: a file is used here for simplicity, but it could be any other resource consisting of such things as declarations, a parsed entity, an unparsed entity, etc."
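To make the description concrete, here is a minimal sketch of where such a mapping could plug into an XML processor, using the `EntityResolver` interface from Python's standard `xml.sax` module. The class name `MappingResolver`, the identifiers, and the local paths are hypothetical, and the lookup order (system identifier first, then public identifier) is a simplifying assumption rather than the resolution rules of any catalog specification.

```python
from xml.sax.handler import EntityResolver


class MappingResolver(EntityResolver):
    """Map known labelling information to an actual location."""

    def __init__(self, system_map, public_map):
        self.system_map = system_map
        self.public_map = public_map

    def resolveEntity(self, publicId, systemId):
        # Called by a SAX parser when it needs an external entity.
        if systemId in self.system_map:
            return self.system_map[systemId]
        if publicId in self.public_map:
            return self.public_map[publicId]
        # No mapping known: fall back to the identifier as given.
        return systemId


# Hypothetical mapping data standing in for an entity resolution catalog.
resolver = MappingResolver(
    system_map={
        "http://example.org/dtds/sample.dtd": "/usr/share/xml/sample.dtd",
    },
    public_map={
        "-//Example//DTD Sample V1.0//EN": "/usr/share/xml/sample.dtd",
    },
)

print(resolver.resolveEntity(None, "http://example.org/dtds/sample.dtd"))
```

A resolver like this would typically be attached to a parser created by `xml.sax.make_parser()` through its `setEntityResolver` method. Whether the parser actually fetches external entities depends on its feature settings, so this sketch only demonstrates the mapping step.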

Robin Cover, Editor