Norman Walsh (Technology Development Group, Sun Microsystems, Inc.) announced the publication of a new version of the XML Catalogs specification from the OASIS Entity Resolution Technical Committee. The document addresses problems stemming from the lack of entity management facilities in XML, which has "impeded" interoperability of XML documents. The specification "defines an entity catalog that handles the simple cases of mapping an external entity's public identifier and/or system identifier to an alternate URI. Though it does not handle all issues that a combination of a complete entity manager and storage manager addresses, it simplifies both the use of multiple products in a great majority of cases and the task of processing documents on different systems." Formal notations for the XML Catalog are given in the four appendices: XML Schema for the XML Catalog, TREX Grammar for the XML Catalog, RELAX Grammar for the XML Catalog, XML DTD for the XML Catalog.
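Below is a rough illustration of the mapping described above. It parses a small catalog file into two lookup tables, one keyed by public identifier and one keyed by system identifier. The element names (`catalog`, `public`, `system`), their attributes (`publicId`, `systemId`, `uri`) and the namespace URI come from the later OASIS XML Catalogs 1.0 standard. Using them here is an assumption, because the 16-February-2001 draft may spell them differently. The helper `load_catalog` and every file path in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the later OASIS XML Catalogs 1.0 standard;
# the 16-February-2001 draft may use a different one.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog: the local file paths are illustrative only.
SAMPLE_CATALOG = f"""<?xml version="1.0"?>
<catalog xmlns="{CATALOG_NS}">
  <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
          uri="file:///usr/share/xml/docbook/docbookx.dtd"/>
  <system systemId="http://www.example.com/dtds/report.dtd"
          uri="file:///opt/dtds/report.dtd"/>
</catalog>
"""


def load_catalog(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (public_map, system_map) built from <public> and <system> entries."""
    root = ET.fromstring(text)
    ns = {"c": CATALOG_NS}
    # Each <public> entry maps a public identifier to an alternate URI.
    public_map = {e.get("publicId"): e.get("uri") for e in root.findall("c:public", ns)}
    # Each <system> entry maps a system identifier to an alternate URI.
    system_map = {e.get("systemId"): e.get("uri") for e in root.findall("c:system", ns)}
    return public_map, system_map


if __name__ == "__main__":
    public_map, system_map = load_catalog(SAMPLE_CATALOG)
    print(system_map["http://www.example.com/dtds/report.dtd"])
```

Keeping public and system entries in separate tables mirrors the "public identifier and/or system identifier" wording above. A processor can consult either table depending on which identifiers the document supplies.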
From the 16-February-2001 version of 'XML Catalogs': "The requirement that all external identifiers in XML documents must provide a system identifier has unquestionably been of tremendous short-term benefit to the XML community. It has allowed a whole generation of tools to be developed without the added complexity of explicit entity management. However, the interoperability of XML documents has been impeded in several ways by the lack of entity management facilities: (1) External identifiers may require resources that are not always available. For example, a system identifier that points to a resource on another machine may be inaccessible if a network connection is not available. (2) External identifiers may require protocols that are not accessible to all of the vendors' tools on a single computer system. An external identifier that is addressed with the ftp: protocol, for example, is not accessible to a tool that does not support that protocol. (3) It is often convenient to access resources using system identifiers that point to local resources. Exchanging these documents with other systems is problematic at best and impossible at worst. While there are many important issues involved and a complete solution is beyond the current scope, the OASIS membership agrees upon the enclosed set of conventions to address a useful subset of the complete problem. To address these issues, this specification defines an entity catalog that maps an entity's external identifier to a URI..."
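Problems (1) and (2) in the passage above are commonly handled by redirecting a remote system identifier to a local copy. The sketch below does this with a prefix table and applies the longest matching prefix. It is modelled loosely on the `rewriteSystem` entry of the later XML Catalogs 1.0 standard, and that modelling is an assumption, since the entry may not appear in the 16-February-2001 draft. The function name `rewrite_system_id`, the table `REWRITE_RULES` and all URIs are hypothetical.

```python
# Hypothetical prefix table: remote (http:, ftp:) prefixes mapped to local mirrors.
REWRITE_RULES = {
    "http://www.example.com/dtds/": "file:///opt/mirror/example-dtds/",
    "ftp://ftp.example.org/pub/sgml/": "file:///opt/mirror/example-sgml/",
}


def rewrite_system_id(system_id: str, rules: dict[str, str]) -> str:
    """Replace the longest matching prefix of system_id.

    If no prefix matches, the identifier is returned unchanged so the
    caller can fall back to fetching it directly.
    """
    best = ""
    for prefix in rules:
        if system_id.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    if not best:
        return system_id
    # Keep the remainder of the path after the matched prefix.
    return rules[best] + system_id[len(best):]


if __name__ == "__main__":
    # An ftp: identifier becomes a local file: URI that any tool can read.
    print(rewrite_system_id("ftp://ftp.example.org/pub/sgml/iso-lat1.ent", REWRITE_RULES))
```

Preferring the longest prefix lets a specific mirror, such as one subdirectory, override a broader rule for the whole host.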
"The objective of the Entity Resolution Technical Committee is to provide facilities to address issue A of the OASIS Catalog Specification (TR 9401:1997). These facilities will take into account new XML features and omit those features of TR 9401 that are only applicable to SGML, as well as those features applicable only to issue B in TR 9401."
"Entity resolution is the process that an XML processor goes through when it has been requested to find another file in the course of processing the file it's working on. The XML processor knows labelling information about the file such as its system identifier and possibly a name, public identifier, and so forth. These identifiers can be used to determine the actual location of the desired external file. This determination process (which 'maps' the known labelling information into an actual location) is called an entity resolution, and the file that contains the specific mapping information is called the entity resolution catalog."
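The resolution step described above maps the identifiers that the processor already knows to an actual location. It can be sketched with Python's standard SAX `EntityResolver` hook, whose `resolveEntity(publicId, systemId)` method returns the system identifier that the parser should read from. The class name `CatalogEntityResolver`, the in-memory tables and all URIs are hypothetical. The lookup order is also an assumption, based on the later XML Catalogs 1.0 standard: an exact system-identifier match is tried first, then a public-identifier match, and otherwise the original system identifier is returned unchanged.

```python
from typing import Optional
from xml.sax.handler import EntityResolver


class CatalogEntityResolver(EntityResolver):
    """Resolve (publicId, systemId) pairs against in-memory catalog tables."""

    def __init__(self, public_map: dict[str, str], system_map: dict[str, str]):
        self.public_map = public_map
        self.system_map = system_map

    def resolveEntity(self, publicId: Optional[str], systemId: str) -> str:
        # 1. An exact system-identifier entry wins.
        if systemId in self.system_map:
            return self.system_map[systemId]
        # 2. Otherwise try the public identifier, if the document supplied one.
        if publicId is not None and publicId in self.public_map:
            return self.public_map[publicId]
        # 3. No catalog entry: let the parser use the original identifier.
        return systemId


if __name__ == "__main__":
    import xml.sax

    resolver = CatalogEntityResolver(
        {"-//EXAMPLE//DTD Report//EN": "file:///opt/dtds/report.dtd"},
        {"http://www.example.com/dtds/report.dtd": "file:///opt/dtds/report.dtd"},
    )
    parser = xml.sax.make_parser()
    parser.setEntityResolver(resolver)  # the parser now consults the catalog
    print(resolver.resolveEntity(None, "http://www.example.com/dtds/report.dtd"))
```

The expat-based parser only fetches external general entities when `xml.sax.handler.feature_external_ges` is enabled, and that feature is off by default in recent Python versions. As a result, the resolver may not be called for every reference unless the feature is turned on.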
Principal references:
- XML Catalogs. Revision date: 16-February-2001.
- OASIS Entity Resolution Technical Committee
- TC Mailing List archive
- SGML/XML Entity Sets and Entity Management (local reference)
- SGML Entity Types, and Entity Management (local reference)
- Catalogs, Formal Public Identifiers, Formal System Identifiers (local reference)