This document contains the most frequently asked questions about XML Catalogs and the work of the Entity Resolution TC. It is not intended to replace any specifications or other documents. More information about the work of the ER TC, and ways to send comments, are available at the address above.
An entity is what the web world calls a resource, e.g., a file, an image, a stylesheet, or just something at the end of a URI.
A more precise way to word this is "what does it mean to resolve an entity reference?" You have a reference to some resource (a file, a stylesheet, etc.), and the entity resolution process resolves that reference into something the application can access, so that the application can use the resource (apply the stylesheet, include the file, look up the WSDL).
It's the listing of mappings from the references to entities (resources) to the actual resources that can be accessed or that are trusted.
Here's an example catalog, taken from the DocBook distribution.
<?xml version='1.0'?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog" prefer="public">

  <!-- ...................................................................... -->
  <!-- DocBook driver file .................................................. -->

  <public publicId="-//OASIS//DTD DocBook XML V4.5CR1//EN" uri="docbookx.dtd"/>
  <system systemId="http://www.oasis-open.org/docbook/xml/4.5CR1/docbookx.dtd" uri="docbookx.dtd"/>
  <system systemId="http://docbook.org/xml/4.5CR1/docbookx.dtd" uri="docbookx.dtd"/>

  <!-- ...................................................................... -->
  <!-- DocBook modules ...................................................... -->

  <public publicId="-//OASIS//DTD DocBook CALS Table Model V4.5CR1//EN" uri="calstblx.dtd"/>
  <public publicId="-//OASIS//ELEMENTS DocBook XML HTML Tables V4.5CR1//EN" uri="htmltblx.mod"/>
  <public publicId="-//OASIS//DTD XML Exchange Table Model 19990315//EN" uri="soextblx.dtd"/>
  <public publicId="-//OASIS//ELEMENTS DocBook Information Pool V4.5CR1//EN" uri="dbpoolx.mod"/>
  <public publicId="-//OASIS//ELEMENTS DocBook Document Hierarchy V4.5CR1//EN" uri="dbhierx.mod"/>
  <public publicId="-//OASIS//ENTITIES DocBook Additional General Entities V4.5CR1//EN" uri="dbgenent.mod"/>
  <public publicId="-//OASIS//ENTITIES DocBook Notations V4.5CR1//EN" uri="dbnotnx.mod"/>
  <public publicId="-//OASIS//ENTITIES DocBook Character Entities V4.5CR1//EN" uri="dbcentx.mod"/>

  <!-- ...................................................................... -->
  <!-- ISO entity sets ...................................................... -->

  <public publicId="ISO 8879:1986//ENTITIES Diacritical Marks//EN//XML" uri="ent/isodia.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN//XML" uri="ent/isonum.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Publishing//EN//XML" uri="ent/isopub.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES General Technical//EN//XML" uri="ent/isotech.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Added Latin 1//EN//XML" uri="ent/isolat1.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Added Latin 2//EN//XML" uri="ent/isolat2.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Greek Letters//EN//XML" uri="ent/isogrk1.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Monotoniko Greek//EN//XML" uri="ent/isogrk2.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Greek Symbols//EN//XML" uri="ent/isogrk3.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Alternative Greek Symbols//EN//XML" uri="ent/isogrk4.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Added Math Symbols: Arrow Relations//EN//XML" uri="ent/isoamsa.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Added Math Symbols: Binary Operators//EN//XML" uri="ent/isoamsb.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Added Math Symbols: Delimiters//EN//XML" uri="ent/isoamsc.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Added Math Symbols: Negated Relations//EN//XML" uri="ent/isoamsn.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Added Math Symbols: Ordinary//EN//XML" uri="ent/isoamso.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Added Math Symbols: Relations//EN//XML" uri="ent/isoamsr.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Box and Line Drawing//EN//XML" uri="ent/isobox.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Russian Cyrillic//EN//XML" uri="ent/isocyr1.ent"/>
  <public publicId="ISO 8879:1986//ENTITIES Non-Russian Cyrillic//EN//XML" uri="ent/isocyr2.ent"/>

  <!-- End of catalog data for DocBook XML V4.5CR1 ............................. -->
  <!-- ...................................................................... -->

</catalog>
You can use an entity resolver for DTDs, schemas, stylesheets, other docs you're including with XInclude, and lots more. Any time you have a URI for a resource and you want to map that to a different copy or version of that resource (perhaps local, perhaps trusted), you can use an entity resolver to handle the mapping for you.
The entity manager is the part of the application that uses the catalog to actually get the requested resource.
The Entity Resolution TC has a (probably partial) list at List of implementations.
You can use a catalog to map from URIs in documents to copies from a trusted or known source. Norm Walsh wrote a paper on this that was delivered at the XML 2003 Conference. The proceedings paper is available at Caching in with Resolvers.
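One way to set up such a mapping, sketched here under assumptions, is to rewrite a whole family of remote identifiers to a local mirror with rewriteSystem and rewriteURI entries. The local directory in rewritePrefix is a hypothetical path chosen for illustration; nothing in the DocBook distribution prescribes it.

```xml
<?xml version='1.0'?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Assumption: a trusted local copy of the DocBook files lives under
       this hypothetical directory. -->
  <rewriteSystem
      systemIdStartString="http://www.oasis-open.org/docbook/xml/4.5CR1/"
      rewritePrefix="file:///usr/share/xml/docbook/4.5CR1/"/>
  <!-- The same redirection for references that are resolved as plain URIs
       rather than as system identifiers. -->
  <rewriteURI
      uriStartString="http://www.oasis-open.org/docbook/xml/4.5CR1/"
      rewritePrefix="file:///usr/share/xml/docbook/4.5CR1/"/>
</catalog>
```

With entries like these, a reference to http://www.oasis-open.org/docbook/xml/4.5CR1/docbookx.dtd would be looked up as docbookx.dtd under the local directory instead of being fetched over the network.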
If you're using a SAX processor, putting relative URIs in the catalog mapping will not work, because SAX resolves relative URIs to absolute URIs before the entity manager sees them. To overcome this problem, either use systemSuffix entries or put the absolute URIs in your catalog.
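As an illustration of the two workarounds, the fragment below shows a systemSuffix entry next to a system entry with an absolute identifier. It assumes a resolver that supports systemSuffix entries, and the file:///home/user/project/ location is a hypothetical example of what the parser might produce after making a relative reference absolute.

```xml
<?xml version='1.0'?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Workaround 1: match on the end of the system identifier, so the
       entry still matches after the parser has made the reference absolute. -->
  <systemSuffix systemIdSuffix="/docbookx.dtd" uri="docbookx.dtd"/>

  <!-- Workaround 2: list the absolute form explicitly.
       The directory shown here is hypothetical. -->
  <system systemId="file:///home/user/project/dtd/docbookx.dtd"
          uri="docbookx.dtd"/>
</catalog>
```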
XInclude refers to the documents it is going to include through a URI, and you can map those URIs in the catalog to any resource that you want.
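A minimal sketch of such a mapping, assuming an XInclude processor that consults the catalog: the included document's URI is matched by a uri entry. Both the example.com address and the local file name are hypothetical.

```xml
<?xml version='1.0'?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Hypothetical case: a document contains
       <xi:include href="http://example.com/shared/legal-notice.xml"/>
       and the local copy should be used instead. -->
  <uri name="http://example.com/shared/legal-notice.xml"
       uri="includes/legal-notice.xml"/>
</catalog>
```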
Yes, this is one use case for the delegate entries.
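For illustration, the delegate entries hand off every identifier that starts with a given string to another catalog. The catalog file names and the example.com prefix below are hypothetical.

```xml
<?xml version='1.0'?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Public identifiers beginning with -//OASIS// are looked up
       in a separate (hypothetical) catalog file. -->
  <delegatePublic publicIdStartString="-//OASIS//"
                  catalog="oasis/catalog.xml"/>

  <!-- The same idea for system identifiers and for other URIs. -->
  <delegateSystem systemIdStartString="http://www.oasis-open.org/"
                  catalog="oasis/catalog.xml"/>
  <delegateURI uriStartString="http://example.com/schemas/"
               catalog="schemas/catalog.xml"/>
</catalog>
```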
You use system for system identifiers and uri for everything else. System identifiers are carefully defined in XML, so the system entry matches that definition and is used for DTDs, XML entities, and notations.
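The difference can be sketched with one entry of each kind. The system entry is taken from the DocBook example catalog; the stylesheet address and local path in the uri entry are hypothetical.

```xml
<?xml version='1.0'?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- A system identifier, as it appears in a DOCTYPE declaration. -->
  <system systemId="http://docbook.org/xml/4.5CR1/docbookx.dtd"
          uri="docbookx.dtd"/>

  <!-- Not a system identifier: a stylesheet referenced by URI
       (hypothetical address), so a uri entry is used. -->
  <uri name="http://example.com/styles/report.xsl"
       uri="styles/report.xsl"/>
</catalog>
```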
Many applications dereference a schema namespace to try to get information about the schema; there may or may not be anything useful at the end of that reference. This was the use case for RDDL. The uri entry could point to a schema, or to a RDDL file if the processor knows what to do with a RDDL file.
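As a sketch, a namespace name can be mapped with a uri entry like the one below. The namespace and file names are hypothetical, and whether a given processor actually consults the catalog for namespace names depends on the implementation.

```xml
<?xml version='1.0'?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Hypothetical namespace name mapped to a local schema file.
       For a processor that understands RDDL, the uri attribute could
       point to a RDDL document instead. -->
  <uri name="http://example.com/ns/invoice"
       uri="schemas/invoice.xsd"/>
</catalog>
```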
Yes, catalogs and entity resolution can be used for anything that can be dereferenced from a URI.
Yes, the file format is extensible; you just have to ensure that your extensions use a different namespace.
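A minimal sketch of an extended catalog follows. The ext namespace, the maintainer element, and its attribute are all invented for illustration; a processor that does not know the extension namespace is expected to ignore such elements.

```xml
<?xml version='1.0'?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
         xmlns:ext="http://example.com/ns/catalog-ext">
  <!-- Hypothetical extension element in its own namespace. -->
  <ext:maintainer email="catalog-admin@example.com"/>

  <!-- Standard entries continue to work as usual. -->
  <system systemId="http://docbook.org/xml/4.5CR1/docbookx.dtd"
          uri="docbookx.dtd"/>
</catalog>
```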
There is a mailing list at <docbook-apps@lists.oasis-open.org> where many people have experience with using catalogs for DocBook and may be able to help. More information on the list and how to subscribe is at DocBook Mailing Lists.
There is information about getting help and support for using libXML at Reporting bugs and getting help. Note that libXML is not a commercial product, so help and support may be patchy.
There is a project mailing list for people using xml-commons, listed at How can I find out more?, where people may be able to help (depending on the issue).
Please contact the vendor support desk of your implementation for vendor product support.