[SGML tool] Carthage README file [Mirrored (August 02, 1996) from TEI server: ftp://ftp-tei.uic.edu/pub/tei/grammar/carthage/README] What is it? Carthage is a yacc/lex-based parser for SGML DTDs which can delete references to undeclared elements. It can also do a few other things, depending on the run-time flags you give it. Carthage is unsupported software; it may be used freely without further permission or royalty, but users who improve it or fix errors are requested to notify the author so he can also fix them. How do I use it? To clean a DTD in the current directory, just invoke the command 'carthage' with two arguments: the filename of the input DTD and the file name for the output DTD. For example: carthage dirty.dtd clean.dtd The DTD parser itself is called 'carthago' (that's Latin for 'Carthage') and can be invoked on its own if you like. To get a list of the options it understands, try invoking it with the option '-?'. When I did this just now, this was the result: $ carthago -? carthago: expected usage is carthago [options] < dtd-file > output-file where options are these (* means default): --msglevel 5*|n (or --trace --debug --verbose) --msdrop* --mskeep (drop/keep marked sections, -s0 -s1) --dupentquiet* --dupentwarn (warn if entity declared twice? -e0 -e1) --pedrop* --pekeep (drop/keep parameter entity declarations -p0 -p1) --commdrop* --commkeep (drop/keep parameter entity declarations -c0 -c1) --undeclared* --unused --declared --used (include in report file, any or all) --delete ... (list of GIs to delete from content models) --delenda (name of delete-list file) --output (name of report file, may reuse as delenda file) What this means is that carthago reads from stdin and writes to stdout, and takes the following options: --msglevel --trace (same as --msglevel 0) --debug (same as --msglevel 1) --verbose (same as --msglevel 3) These control the emission of error and warning messages. Trace, debugging, and 'verbose' messages are enabled when the message level is set to 0, 1, or 3 respectively. The default (--msglevel 5) enables informative messages; to allow warnings only, set --msglevel 7. For error messages only, use --msglevel 10. Values higher than 10 have no effect. The default is --msglevel 5, and message levels above 10 have no effect. --msdrop* --mskeep Control whether marked sections marked IGNORE are suppressed silently in the output or kept. By default they are suppressed. --dupentquiet* --dupentwarn Control whether a warning is issued if an entity is declared twice. The default is to say nothing (because the TEI uses multiple declarations frequently, so double declarations are seldom errors in my own work.) --pedrop* --pekeep Control whether to drop or keep parameter entity declarations in the output. By default they're dropped, because all references to such entities are expanded and there seems to reason to keep the declarations around if the references are expanded. --commdrop --commkeep* Control whether to drop or keep comments in the output. The message says that by default they are dropped but I've just checked the code and by default they are kept. Sorry about that. --undeclared* --unused --declared --used Control what lists of elements should be included in the report file, if any. By default undeclared elements (i.e. element which are used but not declared) are included in the list. Unused elements (elements declared but not appearing in content models), declared elements, and elements referred to in content models (--used) may also be listed. Comments in the output file should make clear which is which. --delete ... --delenda Indicate a list of generic identifiers which should be deleted from content models before element declarations are written out. The --delete option allows you to specify generic identifiers on the command line; I use this for testing, but it might also be handy for real work in some cases. The --delenda option takes the name of a file containing the generic identifiers to be deleted. ('Delenda' is the Latin word meaning 'things to be deleted'. In a famous speech Cicero used the word to describe Rome's enemy Carthage -- Carthago delenda est -- which gives this program its name.) If the program encounters a reference to a generic identifier that cannot safely be deleted (e.g. because it's a required subelement), an error message is issued and the reference is left undeleted. If --output Specify the name of the report file to be generated; the file thus created may be reused as the delenda file. How do I fetch and compile it? 1 ftp to ftp-tei.uic.edu and fetch everything from the directories /pub/tei/grammar/carthage /pub/tei/grammar/carthage/lib The makefiles assume that these will go into ~/carthage and ~/lib, respectively, but if you are willing to change the makefiles, you can put them where you like. 2 go to the library directory and type make N.B. on my AIX system this works fine; on the Sun I have access to, gcc complains bitterly about line 43 of file msg.c. The quick hack is to delete the asterisk from line 43 if you get such a complaint. The next release will replace this whole routine with a more rational message function; in the meantime, please let me know if changing it, or not changing it, leads to crashes. 3 go to the carthage directory and type make carthago 4 (optional) add ~/carthage to your path, or move carthage and carthago to your ~/bin directory. 5 If you want a simple check that carthago compiled correctly, just invoke it and type some declarations at it. Here's a transcript of such an interactive session; I typed the lines labeled u, carthago typed those labeled c: u $ carthago --delete a e i o u --verbose u u c c c c ! ! ! :2 Element a is marked for deletion, but is declared.! ! ! u c ! ! ! :3 Model requires hand work: (c, d, e)! ! ! c c c u c c c u c c c $ To test that carthage (the shell script) also works, invoke it on some existing DTD, e.g. the dirty.dtd included in the carthage directory. Check that your output looks like the clean.dtd also in the directory: carthage dirty.dtd new.dtd diff clean.dtd new.dtd N.B. if ftp didn't do it, you will need to set the execute permission on the carthage script. If it still doesn't work, make sure it's pointing to the actual location of the Korn shell on your system. Feel free to modify the SGML_PATH and DPP_PATH values in the shell script; the ones in there now are just those I habitually use; they should be replaced with the ones you habitually use. -C. M. Sperberg-McQueen University of Illinois at Chicago 17 June 1996