[This local archive copy mirrored from the canonical site: http://www.stud.ifi.uio.no/~larsga/download/python/xml/xmlproc-catalog-doco.html, 980724; links may not have complete integrity, so use the canonical document at this URL if possible.]
Catalog files are a means of telling a parser how to map public identifiers to system identifiers. One simple example of this would be to use a catalog file to tell an SGML parser that the DTD with the public identifier "-//W3C//DTD HTML 4.0 Transitional//EN" can be found at the location "file:///usr/pub/sgml/dtds/html40.dtd".
In other words: a public identifier is a well-known name for something that is not site-dependent, while a system identifier tells applications how to find this thing on the local system. A catalog file can be used to find out where to find something at a particular site given its public identifier.
Catalog files can also affect the parsing of documents in other ways.
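The public-to-system mapping described above can be pictured as a plain lookup table. The following minimal sketch is not xmlproc code: the `CATALOG` dictionary and the `find_system_id` function are invented for illustration, and the single entry is the HTML 4.0 example from the text.

```python
# Illustrative only: a catalog reduced to a public-id -> system-id table.
CATALOG = {
    "-//W3C//DTD HTML 4.0 Transitional//EN":
        "file:///usr/pub/sgml/dtds/html40.dtd",
}


def find_system_id(pubid, default=None):
    """Return the local location for a public identifier, or `default`."""
    return CATALOG.get(pubid, default)


print(find_system_id("-//W3C//DTD HTML 4.0 Transitional//EN"))
```

A real catalog does more than this (see the keywords listed further down), but the basic query has this shape: a site-independent name goes in, a site-specific location comes out.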
Catalog files come from the SGML community, but are not part of the SGML standard itself. The catalog file format and semantics are defined in SGML Open Technical Resolution TR9401:1997, and have since been implemented in the SP SGML parser, the DXP XML parser and xmlproc.
The format used by SP (which extends the original format somewhat) has become the de facto standard for catalog files. xmlproc supports a subset of this format.
Catalog files consist of entries, each of which starts with a keyword followed by arguments separated by whitespace. Arguments that contain spaces must be quoted. Entries are separated by whitespace, and comments (which start with "--" and end with "--") can appear anywhere whitespace can appear.
An example catalog file:
-- DSSSL --
PUBLIC "-//James Clark//DTD DSSSL Flow Object Tree//EN" "c:\programfiler\apps\jade\fot.dtd"
PUBLIC "ISO/IEC 10179:1996//DTD DSSSL Architecture//EN" "c:\programfiler\apps\jade\dsssl.dtd"
PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" "c:\programfiler\apps\jade\style-sheet.dtd"

-- HTML 2 --
PUBLIC "-//IETF//DTD HTML//EN" html2.dtd
PUBLIC "-//IETF//DTD HTML 2.0//EN" html2.dtd
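The syntax rules above are simple enough to tokenize with a single regular expression. The sketch below is not xmlproc's own parser: the names `TOKEN` and `tokenize` are invented here, and accepting single-quoted arguments is an assumption that the text above does not confirm.

```python
import re

# One alternative per kind of token. Only the last three capture anything,
# so a match with no captured group is a comment.
TOKEN = re.compile(r"""
    --.*?--          # comment: skipped
  | "([^"]*)"        # double-quoted argument
  | '([^']*)'        # single-quoted argument (assumed to be allowed)
  | (\S+)            # bare keyword or argument
""", re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Yield keywords and arguments in order, skipping comments."""
    for match in TOKEN.finditer(text):
        double_quoted, single_quoted, bare = match.groups()
        if double_quoted is not None:
            yield double_quoted
        elif single_quoted is not None:
            yield single_quoted
        elif bare is not None:
            yield bare
        # If no group matched, the token was a comment and is dropped.


sample = r'''
-- DSSSL --
PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" "c:\programfiler\apps\jade\style-sheet.dtd"
-- HTML 2 --
PUBLIC "-//IETF//DTD HTML 2.0//EN" html2.dtd
'''

for token in tokenize(sample):
    print(token)
```

Quotes are stripped from the yielded arguments, so a caller sees the identifier itself whether or not it was quoted in the file.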
The support for catalog files has not been thoroughly tested, and xmlproc will probably not correctly handle cases where entries conflict with each other. This part of xmlproc should be considered to be of demonstration quality.
xmlproc supports the following keywords:
PUBLIC pubid sysid
SYSTEM sysid1 sysid2
DOCUMENT sysid
CATALOG sysid
BASE sysid
DELEGATE pubid-prefix sysid
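Each keyword above takes a fixed number of arguments, so a flat list of tokens can be grouped into entries with a small table. This is a sketch under assumptions, not xmlproc's implementation: `ARITY` and `group_entries` are hypothetical names, keywords are matched exactly as written, and raising an error on an unsupported keyword is a choice made here for simplicity.

```python
# Number of arguments each supported keyword takes, as listed above.
ARITY = {
    "PUBLIC": 2,
    "SYSTEM": 2,
    "DOCUMENT": 1,
    "CATALOG": 1,
    "BASE": 1,
    "DELEGATE": 2,
}


def group_entries(tokens):
    """Group a flat token list into (keyword, args) tuples."""
    entries = []
    i = 0
    while i < len(tokens):
        keyword = tokens[i]
        if keyword not in ARITY:
            raise ValueError("unsupported keyword: %r" % keyword)
        count = ARITY[keyword]
        args = tokens[i + 1:i + 1 + count]
        if len(args) != count:
            raise ValueError("%s expects %d argument(s)" % (keyword, count))
        entries.append((keyword, tuple(args)))
        i += 1 + count
    return entries


tokens = ["PUBLIC", "-//IETF//DTD HTML 2.0//EN", "html2.dtd",
          "BASE", "/usr/pub/sgml/"]
for keyword, args in group_entries(tokens):
    print(keyword, args)
```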
Making xmlproc use a catalog file is easily done. Here is some code that parses the catalog file referred to by the XMLSOCATALOG environment variable and uses it when parsing a document:
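import os
from xml.parsers.xmlproc import xmlval, catalog

# sysid must hold the system identifier (file name or URL) of the document to parse
p = xmlval.XMLValidator()
# Load the catalog file named by the XMLSOCATALOG environment variable
cat = catalog.xmlproc_catalog(os.environ["XMLSOCATALOG"],
                              catalog.CatParserFactory())
# Let the parser resolve public identifiers through the catalog
p.set_pubid_resolver(cat)
p.parse_resource(sysid)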
The xmlproc implementation contains both a general catalog file parser and a general catalog file implementation, to which the xmlproc PubIdResolver is just one of many possible clients. This means that you can use this catalog file parser in your own applications.
If you just want to make xmlproc use a catalog file you should look at the xmlproc_catalog class.
The catalog module has the following classes and interfaces:
The CatalogParser class is mainly useful if you want to develop your own catalog file support completely from scratch. It only parses the file and passes information to you, without doing anything with it. If you just want to query the parsed information you should probably look at the catalog manager below.
The CatalogParser class has these methods:
def __init__(self):
def set_application(self,app):
def set_error_handler(self,err):
def parse_resource(self,sysid):
This is the definition of the interface used by applications that wish to receive catalog file parsing events. No attempt is made to interpret the entries or their parameters in any way. These methods are required:
def handle_public(self,pubid,sysid):
def handle_delegate(self,prefix,sysid):
def handle_document(self,sysid):
def handle_system(self,sysid1,sysid2):
def handle_base(self,sysid):
def handle_catalog(self,sysid):
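A class implementing these six methods might look like the sketch below, which simply records every event it receives. The class name `CollectingApp` and its attributes are invented for illustration. The events are fired by hand here so that the example runs without xmlproc; with xmlproc installed, the object would presumably be passed to the parser's `set_application` method before calling `parse_resource`.

```python
class CollectingApp:
    """Records catalog events; method names follow the interface above."""

    def __init__(self):
        self.public = {}      # pubid -> sysid, from PUBLIC entries
        self.system = {}      # sysid1 -> sysid2, from SYSTEM entries
        self.delegates = []   # (prefix, sysid) pairs, in file order
        self.catalogs = []    # sysids of referenced catalog files
        self.document = None  # sysid from a DOCUMENT entry, if any
        self.base = None      # sysid from the most recent BASE entry

    def handle_public(self, pubid, sysid):
        self.public[pubid] = sysid

    def handle_delegate(self, prefix, sysid):
        self.delegates.append((prefix, sysid))

    def handle_document(self, sysid):
        self.document = sysid

    def handle_system(self, sysid1, sysid2):
        self.system[sysid1] = sysid2

    def handle_base(self, sysid):
        self.base = sysid

    def handle_catalog(self, sysid):
        self.catalogs.append(sysid)


# Simulate the events a parser would send for a two-entry catalog.
app = CollectingApp()
app.handle_public("-//IETF//DTD HTML 2.0//EN", "html2.dtd")
app.handle_catalog("more-entries.soc")
print(app.public)
print(app.catalogs)
```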
The CatalogManager is a central class in the catalog implementation. Users that want to work with catalog files should instantiate a CatalogManager and let it parse and keep track of the catalog information for them, and only query it when information is needed.
The CatalogManager class has these methods:
def __init__(self):
def set_error_handler(self,err):
def set_parser_factory(self,parser_fact):
def parse_catalog(self,sysid):
def report(self,out=sys.stdout):
def get_document_sysid(self):
def remap_sysid(self,sysid):
def resolve_sysid(self,pubid,sysid):
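The query methods can be illustrated with a stand-in object that has the same method names. This is emphatically not xmlproc's CatalogManager: `StandInManager` is a toy written for this example, and the behaviour shown (prefer a PUBLIC mapping, otherwise remap the system identifier, otherwise pass it through unchanged) is an assumption about what such queries would reasonably return.

```python
class StandInManager:
    """Toy object with the same query method names as CatalogManager."""

    def __init__(self, public, system, document=None):
        self._public = public      # pubid -> sysid (PUBLIC entries)
        self._system = system      # sysid -> sysid (SYSTEM entries)
        self._document = document  # sysid from a DOCUMENT entry, if any

    def get_document_sysid(self):
        return self._document

    def remap_sysid(self, sysid):
        # Assumed behaviour: unmapped system identifiers pass through.
        return self._system.get(sysid, sysid)

    def resolve_sysid(self, pubid, sysid):
        # Assumed behaviour: prefer the PUBLIC mapping, else remap sysid.
        if pubid is not None and pubid in self._public:
            return self._public[pubid]
        return self.remap_sysid(sysid)


manager = StandInManager(
    public={"-//IETF//DTD HTML 2.0//EN": "html2.dtd"},
    system={"http://example.org/old.dtd": "new.dtd"},
)
print(manager.resolve_sysid("-//IETF//DTD HTML 2.0//EN", "fallback.dtd"))
print(manager.resolve_sysid(None, "http://example.org/old.dtd"))
```

With the real class, the catalog would first be loaded with `parse_catalog` rather than passed in as dictionaries.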
The CatParserFactory class is used by the CatalogManager to create catalog parsers for parsing catalog files. It is mainly interesting if you want to control which parser the CatalogManager uses for parsing its catalog files, for example if you want to use your own subclass of CatalogParser instead of the usual class.
The CatParserFactory has these methods:
def make_parser(self,sysid):
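A custom factory only needs a `make_parser` method that returns a parser object. The sketch below shows the pattern with stand-in classes so that it runs without xmlproc: the `CatalogParser` defined here is an empty placeholder for the real class, and `TracingCatalogParser` and `TracingParserFactory` are hypothetical names.

```python
class CatalogParser:
    """Stand-in for xmlproc's CatalogParser so the sketch runs on its own."""

    def parse_resource(self, sysid):
        pass  # the real class would read and parse the catalog file here


class TracingCatalogParser(CatalogParser):
    """Hypothetical subclass that records which catalogs it was asked to parse."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def parse_resource(self, sysid):
        self.seen.append(sysid)
        super().parse_resource(sysid)


class TracingParserFactory:
    """Same method name as CatParserFactory, but returns the subclass."""

    def make_parser(self, sysid):
        # sysid is presumably the catalog file about to be parsed; unused here.
        return TracingCatalogParser()


factory = TracingParserFactory()
parser = factory.make_parser("catalog.soc")
parser.parse_resource("catalog.soc")
print(parser.seen)
```

With xmlproc itself, a factory like this would be handed to the CatalogManager through `set_parser_factory`.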
The xmlproc_catalog class is a client to the CatalogManager that conforms to the PubIdResolver interface, and so can be used to make xmlproc use a catalog file. It has these methods:
def __init__(self,sysid,pf):
The SAX_catalog class is a client to the CatalogManager that conforms to the SAX EntityResolver interface, and so can be used to make a SAX parser use a catalog file for resolving entity public identifiers. It has these methods:
def __init__(self,sysid,pf):
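What conforming to the SAX EntityResolver interface involves can be sketched with the xml.sax package from Python's standard library. This is not the SAX_catalog class and does not use xmlproc: `DictEntityResolver` and its dictionary are invented for illustration, and the standard-library SAX API is assumed to be close enough to the SAX package xmlproc was written against to show the idea.

```python
from xml.sax.handler import EntityResolver


class DictEntityResolver(EntityResolver):
    """Resolve public identifiers through a plain dictionary."""

    def __init__(self, public_map):
        self._public_map = public_map  # pubid -> sysid

    def resolveEntity(self, publicId, systemId):
        # Return the mapped system identifier, or the original one unchanged.
        return self._public_map.get(publicId, systemId)


resolver = DictEntityResolver({"-//IETF//DTD HTML 2.0//EN": "html2.dtd"})
print(resolver.resolveEntity("-//IETF//DTD HTML 2.0//EN",
                             "http://example.org/html.dtd"))
```

A resolver like this is attached to a standard-library SAX parser with its `setEntityResolver` method, after which the parser consults it whenever it needs to locate an external entity.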
Just before xmlproc 0.50 was released John Cowan proposed the XCatalog 0.1 standard for catalog files in XML format. This proposal has an XML DTD which can be used to mark up catalog files instead of the special syntax used by SGML Open Catalogs. The XCatalog DTD only has a subset of the catalog file functionality implemented by xmlproc for SGML Open Catalogs.
The xmlproc XCatalog implementation is found in the xcatalog module and consists of three classes.
The support for XCatalog should be considered an experimental feature.