Processing Odd files at UICVM
[Cache from http://www.tei-c.org/Vault/ED/eda16.gml; please use this canonical URL/source if possible.]
.sr docfile = &sysfnam. ;.sr docversion = 'Draft';.im teigmlp1
<!-- TEI Doc. No: ED A16 -->
<!-- Title: Processing Odd files at UICVM -->
<!-- Drafted: 25 Mar 92 CMSMcQ -->
<!-- ********************************************************** -->
<!-- Revision History (add lines at top) -->
<!-- Date Who What -->
<!-- 23 Jul 92 WP More final changes -->
<!-- 21 Jul 92 WP Final Changes -->
<!-- 15 Jul 92 WP Corrected Script; updated per trial -->
<!-- session with MSM -->
<!-- 6 May 92 WP Continued revisions in Manual Fixes -->
<!-- 27 Apr 92 WP Continued revisions thru Manual Fixes -->
<!-- 23 Apr 92 WP Continued revisions -->
<!-- 19 Apr 92 WP Began revision (new section 2) -->
<!-- 25 Mar 92 CMSMcQ made file -->
<!-- ********************************************************** -->
<!-- -->
.* Document proper begins.
<gdoc sec=&security.>
<include file=teiblank>
<frontm>
<titlep>
<title stitle='TEI &docfile. Processing Odd Files'>Processing Odd Files at UICVM
<author>C. M. Sperberg-McQueen
<docnum>TEI &docfile.
<date>&docdate.
</titlep>
<!>
<toc>
</frontm>
<!>
<body>
<h1>Introduction and Overview
<p>ODD (hereafter <q>Odd</q>) files are those prepared by the work groups and editors of the TEI for TEI P2. Before they can be printed, Odd files must be translated into other forms, and they may need some manual editing along the way. This document provides the bare minimum of information on how we do this at UICVM. For further information on the Odd tag set or the other tag sets mentioned here, see document ED W29, the Odd Manual. The overall processing of P2, from the Odd files to publication, is shown schematically below:
<include file=nodashes>
<xmp>
                     ODD
                      |
               Spitbol Programs
              /       |        \
           P2X    Ref (opt.)    DTD
          /   \       |
  "Norman"    Spitbol |
  Spitbol        |    |
     |         Script |     Bib (opt.)
     |           |    |         |
   LaTeX         +----+---------+
                 |
           P2Driv Script
                 |
            "Make Doc"
            /    |    \
        Memo Script97  Postscript

P2 Processing Scheme from ODD to Publication
</xmp>
<include file=dashdefs>
<h1>File Types to Remember
<p>When working with Odd files, you may see the following file types:
<gl>
<gt>Odd
<gd>the main input file type for TEI P2; all editorial corrections should be made here. A complete Odd file for a single chapter may in fact consist of several files: a driver file and a number of individual component files.
<gl>
<gt>Odd driver files (e.g. P234Driv Odd)
<gd>Many Odd files will be driver files that include the various components of the chapter. A driver file contains the commands and headings needed to process the chapter, and imbeds the individual chapter components (e.g. P234a Odd or P2361 Odd).
<gt>Odd component files (e.g. P234a Odd, P2361 Odd)
<gd>These are the prose files that make up the chapter's Odd file; they are imbedded in the chapter's Odd driver file.
<gt>Odd tags file (e.g. P234Tags Odd, P236Tags Odd)
<gd>This is the Odd file that defines the tags and attributes associated with the chapter's subject matter. Like the component files, it is imbedded in the Odd driver file.
</gl>
<gt>p2x
<gd>a modified version of an Odd file, in a tag set that is easier to process directly in Waterloo Script. A P2X file rearranges some text in the Odd, deletes some text, and uses different tags.
<gt>ref
<gd>text for the reference section of TEI P2. A Ref file adds some new information to the corresponding parts of the Odd it came from, and uses very slightly different tags. If a chapter does not add any new tags, it will not have a Ref file.
<gt>dtd
<gd>normal SGML document type declaration files; these too are produced from the Odd files.
<gt>script
<gd>a file in a tag set we can process directly using Waterloo Script/GML; in this case, it is almost identical to the P2X file, but differs in a couple of areas where it is easier to change the tagging than to write Waterloo Script macros to handle the P2X tags directly.
(In the future, we will write Script macros for all P2X tags, and we won't use SCRIPT as a file type anymore.)
<gt>log
<gd>some of the programs in the Odd system produce a log file containing important messages, for study if things go wrong; if nothing goes wrong, the log can be discarded.
<gt>out
<gd>some programs may still produce files with a file type of OUT instead of one of the other file types in this list; over time, these programs should all be modified to use more informative file types, so report any Out files you see.
</gl>
<h1>P2Driv Script File
<p>This is the file that will ultimately be used to print all of P2, by imbedding each of the separate chapters, reference sections, bibliographies, etc. In the meantime, it is used to print individual chapters with their corresponding reference sections and (in some cases) bibliographies. As a driver file, it supplies the adjunct material needed for each fascicle -- an overview note on P2, the complete table of contents for P2, the cover page for the part of the chapter being released, and the User Response and Comment form -- along with the chapter material itself. See "Step-by-Step Processing" below for a more detailed description of this file.
<h1>Programs to Remember
<p>These are the major programs and commands used to work with Odd files. The list does not include programs like Xedit and ftp, which are used on Odd files just as on any other files. First, there are programs to validate SGML files:
<gl>
<gt>vm2
<gd>fast public-domain validator: it reads an SGML file and tells you whether it is valid and, if not, why not. Its messages are not always very helpful, though. Our copy runs under DOS, though we can also get a Mac version. (Progress on a VM/CMS version has come to a standstill.)
To run VM2 on a file called <term>myfile.odd</term>, at the DOS prompt type
<xmp>
vm2 myfile.odd
</xmp>
or, if the output is voluminous and you want to see just a screen at a time:
<xmp>
vm2 myfile.odd | more
</xmp>
or, to place the output in a file (e.g. <term>myfile.log</term>) for study:
<xmp>
vm2 myfile.odd > myfile.log
</xmp>
<gt>parse
<gd>batch command for DOS machines; it invokes Mark-It, a commercial SGML parser with better diagnostic messages than VM2 (but slower). To run Parse on the file <term>myfile.odd</term>, at the DOS prompt type:
<xmp>
parse myfile.odd
</xmp>
<gt>xtran
<gd>DOS-based SGML processor which we can also use to validate documents; it runs only on 386 machines, so it can't be run from the Zeniths.
<gt>checkmark
<gd>Mac-based validator for SGML documents
<gt>Author/Editor
<gd>Mac-based editor for SGML documents; it can validate Odds, but not TEI documents. A Windows-based version doesn't work quite right yet and isn't useful right now.
<gt>mkted
<gd>DOS-based commercial SGML-aware editor with a parser that can be called from within the editor; you are unlikely to use it very often. To use MktEd to edit the file <term>myfile.odd</term>, at the DOS prompt type:
<xmp>
mkted myfile.odd
</xmp>
</gl>
<p>In addition to the general SGML software, we have other programs written specifically to process Odd and related files:
<gl>
<gt>sp
<gd>batch program / exec to run the Spitbol interpreter (we have it both under DOS and under VM/CMS, but the DOS version requires a 386 machine, so under DOS, Spitbol programs can run only on the PS2 or on the CompuAdd portable machine)
<gt>oddp2x
<gd>Spitbol program to produce a P2X from an Odd
<gt>odddtd
<gd>Spitbol program to produce DTD files from an Odd
<gt>oddref
<gd>Spitbol program to produce REF files from an Odd
<gt>norman
<gd>Spitbol program to produce LaTeX files from SGML documents, to be used in combination with Latex.Dic (i.e.
dictionary for use with Norman)
<gt>p2xgml
<gd>Spitbol program to read a P2X file and translate it into a Script file which uses the P2X SCRIPT macros on the TEIDOC disk
<gt>dtdodd
<gd>Spitbol program to read a DTD and produce draft ODD material
<gt>makedoc
<gd>CMS exec to run Script with the proper options to produce an output file of a specified kind
</gl>
<h1>Step-by-Step Processing
<ol compact=0>
<li>Checking and Validating the Odd File
<p>In processing an Odd file, the following steps may be needed. They need not be done in exactly this order, but they should all be done unless described as optional. The checklist assumes we are working with chapter 34 of P2; for information on the filenames used in the draft of P2, see document ED W23.
<ol compact=0>
<li>optionally, check the Odd file for things that create problems in Script; this may also be done later. For a list of such problems, see <hdref refid=manfix> below.
<li>move all of the current Odd and Odd driver files of the chapter (e.g. P234driv odd, P234a odd, P234b odd, p234tags odd) to a DOS machine; this is usually most easily done using tftp from VM/CMS:
<xmp>
getdisk telnet
tftp wendyp.cc.uic.edu
put p234driv.odd.c p234driv.odd
put p234a.odd.c p234a.odd
put p234b.odd.c p234b.odd
put p234tags.odd.c p234tags.odd
</xmp>
<p>This example assumes you are using Wendy's terminal; if you are using one of the other terminals, the appropriate tftp addresses are
<xmp>
michaels.cc.uic.edu
davids.cc.uic.edu
tei.cc.uic.edu [n.b. this is the Macintosh in the corner]
</xmp>
Also, the Odd file may be on TEISCOM or TEIWCOM, and the filemode may vary (usually it will be "c" or "d"); adjust the filemode in the put commands accordingly. Check the directory before you begin the tftp procedure.
<li>Leave the tftp process, log off, and check the DOS directory for the files that have been transferred.
<xmp>
quit
logoff
dir /p
</xmp>
<li>validate the Odd driver file using VM2, Parse, or MktEd:
<xmp>
vm2 P234driv.odd | more
</xmp>
or
<xmp>
parse P234driv.odd
</xmp>
or
<xmp>
mkted P234driv.odd
</xmp>
Alternatively, move the file to the Mac and validate it using Checkmark.
</ol>
<li>Creating and Validating the P2X File
<ol compact=0>
<li>make a P2X file. Under VM or DOS, type:
<xmp>
sp oddp2x
</xmp>
The program will ask you for the name of the file to be run, to which you should answer:
<xmp>
P234driv.odd.d [or whatever filemode you are working from]
</xmp>
N.B. Use the dot between file name and file type even under VM; at the moment, at least, OddP2X and the other programs may not handle the file name correctly if you specify it as <q>P234driv odd</q> or <q>P234driv odd c</q>.
<p>If you run out of space, you may need to erase one or more of the following files from the TEISCOM disk:
<ul>
<li>ODDP2X LISTING
<li>ODDREF LISTING
<li>ODDDTD LISTING
<li>P2XGML LISTING
<li>TRADUCE LISTING
</ul>
<p>No matter how many files are included in the Odd file you process, OddP2X will produce just one file: <term>P234driv.p2x</term>.
<li>validate the P2X file. Under DOS, just use VM2 or Parse; under VM, you must first move the file to DOS and check it there.
</ol>
<li>Creating and Validating the Ref File
<ol compact=0>
<li>make the Ref file. Proceed as for making P2X files, but use OddRef, not OddP2X. Under DOS or VM, type:
<xmp>
sp oddref
</xmp>
The program will ask you for the name of the file to be run, to which you should answer:
<xmp>
P234driv.odd.d [or whatever filemode you are working from]
</xmp>
<li>rename the Ref file. No matter how many files are included in the Odd file you process, OddRef will produce just one file: <term>P234driv.ref</term>. For later processing with Script, this file needs to be renamed: if the filename has a "driv" suffix, replace it with the suffix "refs"; otherwise, merely add the suffix "refs" to the filename, which is most often a chapter number (e.g.
P234). Also change the file type to <q>script</q>. Under VM:
<xmp>
rename P234driv ref d P234refs script =
</xmp>
<li>delete the comments from the file by giving the following command in Xedit:
<xmp>
Stripcom
</xmp>
Check the revised file for unnecessary blank lines; the Stripcom Xedit macro does not remove all blank lines when it removes the comments. Remove the unnecessary ones.
<li>validate the Ref file.
</ol>
<li>Creating and Checking the DTD Files
<ol compact=0>
<li>make the DTD files. Proceed as for P2X and Ref files, but use the program OddDtd, not OddP2X or OddRef. Under DOS or VM, type:
<xmp>
sp odddtd
</xmp>
The program will ask you for the name of the file to be run, to which you should answer:
<xmp>
P234Driv.odd
</xmp>
OddDtd will produce one or more DTD files. (If the Odd file you are processing contains no DTD fragments, OddDtd will produce no output, so there is no need to run it in the first place.) The DTD files should be checked for proper formatting; we will also periodically use them to test examples of TEI P2 tagging, but that is not part of processing the Odd file.
</ol>
<li>Creating and Revising the Script File
<ol compact=0>
<li>make the Script file(s). Under DOS or VM, type:
<xmp>
sp p2xgml
</xmp>
The program will ask you for the name of the file to be run, to which you should answer:
<xmp>
P234driv.p2x.d [or whatever filemode you are working from]
</xmp>
P2xGml should produce a file called <term>P234driv.scr</term> under DOS or <term>P234driv script</term> under VM. (It may instead produce <term>P234driv.out</term>, in which case the file should be renamed <term>P234driv script</term> before you go on to the next step.)
<li>Prepare the Script file for processing by these steps:
<ol>
<li>strip out the document type declaration at the beginning, if there is one (from the first line through the concluding <q>]></q>)
<li>remove the <tag>tei.1</tag> and <tag>/tei.1</tag> tags.
<li>for individual chapters of P2, the file should contain just one <gi>h1</gi> (or <gi>div1</gi>) element (in rare cases, just one <gi>h2</gi> or <gi>div2</gi>). This means you should:
<ol>
<li>remove any front matter (from <tag>front</tag> to <tag>/front</tag>).
<li>remove the opening <tag>body</tag> tag.
<li>remove the closing <tag>/body</tag> tag.
<li>if there is any back matter, put it in a separate file with an appropriate name.
</ol>
</ol>
<li>check the Script file(s) for problems requiring manual fixes (see <hdref refid=Manfix> below).
</ol>
<li>Add the Script file of the new chapter to the overall P2 driver file, "P2Driv Script". Usually, all that is necessary is to update the existing "P2Driv Script" file and comment out all references to other chapters.
<ol>
<li>within the <tag>body</tag> element of P2Driv Script, ensure that the <tag>h0 n=xxx</tag> tag (or <tag>div0 n=xxx</tag> tag) has the appropriate value for <att>n</att>: 2 for part 2, 3 for part 3, etc.
<li>make the <tag>include file=xxx</tag> tag point at the appropriate chapter file (e.g. P221Driv, P234Driv).
<li>within the chapter file itself, ensure that the <tag>h1 n=xxx</tag> tag (or <tag>div1 n=xxx</tag> tag) has an appropriate value for <att>n</att>: the chapter number, which begins with the part number (chapter 34, in part 3, should have <tag>h1 n=34 id=P234</tag>).
<li>if a file of reference material has been prepared (i.e. if you are printing output from ODDREF), make sure that <tag>h0 n=7</tag> and <tag>include file=P234refs</tag> appear and are not commented out.
</ol>
<p>The relevant portion of the P2Driv Script file should look something like this:
<xmp>
<![ CDATA [
<h0 n=3>Base Tag Sets
<!>
<include file=p234driv>
<h0 n=7>Alphabetic Reference List of Tags
<include file=p234refs>
<!>
</body>
<back>
<include file=p234back>
</back>
]]>
</xmp>
<li>Create the Final Versions of the Fascicle to be Published
<ol>
<li>Create the Memo, Script97, and PostScript files from the Script file; their file types are:
<ul>
<li>MEMO
<li>SCRIPT97
<li>LISTPS
</ul>
<p>To do this, issue the following command:
<xmp>
makedoc P2Driv all (twopass execute
</xmp>
<li>Rename and save the Script97 file on TEIWCOM for printing hard copies:
<xmp>
Rename P2Driv Script97 D P234 = =
</xmp>
<li>Create and print the LaTeX file.
<ol compact=0>
<li>make the LaTeX file(s) using the program Norman.prt and the file LateX.Dictionary (see Lou Burnard's memo about LaTeX processing).
<!-- I am not sure whether the material below still applies-wp -->
<!-- < xmp> -->
<!-- sp traduce P234.p2x -->
<!-- < /xmp> -->
Some manual post-editing will probably be necessary.
<li>process and print the LaTeX file(s):
<xmp>
getdisk tex
latex P234
printtex P234
</xmp>
</ol>
</ol>
<li>Add the files to the TEI-L Filelist.
<ul compact=0>
<li>Obtain the TEI-L Filelist from Listserv by giving the command:
<xmp>
tell listserv get tei-l filelist (ctl
</xmp>
<p>Until you return the Filelist to Listserv, no one else will be able to edit it, and possibly not even to read it.
<li>Go into your reader list and receive the file Listserv has sent you (TEI-L FILELIST). If you already have a previous copy on your disk, then instead of just pressing F9, type on the command line
<xmp>
Receive (replace
</xmp>
<li>Add the new P2 files <emph>under new names</emph> to the Filelist using Xedit:
<ul>
<li>"P2Driv Memo" should be added as "P234 Doc".
<li>"P2Driv Listps" should be added as "P234 PS".
<li>"P234Driv P2X" should be added as "P234 P2X".
<li>"P234Ref" should be added as "P234 Refs".
</ul>
<li>Return the TEI-L Filelist to Listserv by giving the command
<xmp>
lsvput tei-l filelist
</xmp>
</ul>
<li>Put the files on the Listserv file server. [N.B. You can use the "lsvput" command only if the VM name of the file is the same as the name of the file on the TEI-L Filelist (otherwise it will be rejected). As described immediately above, in most cases the names will differ. The easiest way to deal with this is to copy the file from TEISCOM to your A disk under the newly assigned TEI-L Filelist name (see above), and then use the "lsvput" command as shown below. You can erase the copy from your A disk as soon as you receive notification that the file has been successfully stored.]
<xmp>
copy P234Ref script c P234 Ref a
copy P2Driv Memo c P234 Doc a
copy P2Driv Listps c P234 PS a
copy P234Driv P2x c P234 P2X a
lsvput P234 p2x
lsvput P234 ref
lsvput P234 doc
lsvput P234 PS
lsvput P234 tex
</xmp>
</ol>
<h1 id=ManFix>Manual Fixes (Xedit macros etc.)
<h2>Miscellaneous Checklist for Script Problems
<p>Script has a number of requirements that go beyond simple SGML conformance. Failure to meet them may result in less attractive output, or in none at all. The problems known at present are:
<ul compact=0>
<li>Comment Lines
<ul compact=0>
<li>Waterloo Script cannot handle multi-line SGML comments (i.e. comments whose delimiters appear only at the beginning and end of a comment spanning several lines). Therefore <emph>every</emph> line of a comment must begin with its own opening comment delimiter, even if the comment continues a single sentence or idea.
<xmp>
<![ CDATA [
<!-- This comment is acceptable to Waterloo Script. Note that
<!-- closing delimiters are not necessary.
<!-- This comment is not acceptable to Waterloo Script, as there
is no opening delimiter on this line. -->
]]>
</xmp>
<li>nothing may follow a comment on the same line, although a comment may be preceded by non-comment material; the comment must be the last non-blank data on the line.
<xmp>
<![ CDATA [
<!-- This line is acceptable to Waterloo Script. -->
<p>An example: <!-- This line is ok, too. -->
<!-- This line is not acceptable to Waterloo Script --> <p>
]]>
</xmp>
<li>comments <emph>must</emph> begin with the string <q>angle-bracket, exclamation point, hyphen, hyphen, <emph>space</emph></q>: if the space is not present, the comment is not recognized.
</ul>
<li>Blank Lines
<ul compact=0>
<li>blank lines cause extraneous white space in the output, although they are useful in the original Odd file for improving readability. To suppress them in the Script file, use the following two Xedit macros (located on TEISCOM), typing them on the Xedit command line while editing the file:
<gl compact=0>
<gt>cdata
<gd>hides all examples (actually, all CDATA marked sections, which is usually the same thing) so that they are not affected by global changes
<gt>allblank
<gd>hides all non-blank lines, leaving only blank lines visible; the command
<xmp>
delete *
</xmp>
will then delete all the visible blank lines.
</gl>
</ul>
<li>Marked sections (sections in which SGML tagging is to be displayed literally or semi-literally, as in examples)
<ul compact=0>
The only marked sections currently used in Odd files are examples.
<p>Examples of any type are tagged with <gi>eg</gi> or <gi>xmp</gi> tags; examples of SGML usage must also be enclosed in a <term>CDATA marked section</term>, which suppresses recognition of SGML markup within the section. The TEI macros in Waterloo Script can recognize and handle CDATA marked sections, but the following restrictions apply:
<ul compact=0>
<li>lines should never be more than 65 characters long, and if they can be kept to 60, so much the better
<li>the example <emph>must</emph> be formatted as follows:
<ol>
<li>The string <q>< ! [ CDATA [</q> with which the marked section begins must be on a single line, and should be the last thing on the line.
(There should also be no spaces on either side of the exclamation point, though there can be spaces on either side of the <q>CDATA</q> keyword.)
<li>The string <q>] ] ></q> which ends the marked section must begin in column 1 and should be the only thing on the line.
</ol>
Since these restrictions result from shortcomings in our Waterloo Script macros and do not exist in SGML, SGML parsers will not report violations of them. Thus, even validated files have to be checked manually for correct example formatting. Use the command
<xmp>
<![ CDATA [
all /<![/ | /]>/
]]>
</xmp>
to find all the relevant lines, and then use PF11 to split them as needed to ensure that the beginnings and endings of marked sections are properly laid out for Script. It is probably better to make these changes in the Odd file rather than the Script file, so that they need not be made again; the drawback is that the Odd file then has to be re-processed.
</ul>
</ul>
<li>Valid Script Attribute Values
<ul compact=0>
<li>P2 Odd files use periods to make ID and IDREF attribute values more readable (e.g. "P221.1", "s2.s3.s4"). However, Script requires ID and IDREF values to consist only of letters and digits, with no periods. The following are acceptable to Script:
<xmp>
<![ CDATA [
<p id=P221>
<align targets=s1s2s3>
]]>
</xmp>
<!>
The following are <emph>not</emph> acceptable to Script:
<xmp>
<![ CDATA [
<p id=P22.1>
<align targets=s1.s2.s3>
]]>
</xmp>
<!>
Because SGML allows periods in ID and IDREF attribute values, validation will not report them as errors. Thus, the periods must be found and stripped out of the Script file before Script processing. (P2xGml tries to do this but is not completely successful.)
<li>Script occasionally objects to spaces after the equal sign in a tag's attribute specification: of the two examples below, the first will produce an error message and the second will be processed correctly.
I don't know exactly what Script's requirements are, so when this error occurs I make a practice of simply removing any spaces on either side of equal signs in attribute specifications.
<xmp>
<![ CDATA [
<vallist type= closed>
<vallist type=closed>
]]>
</xmp>
</ul>
</ul>
</body>
<!>
</gdoc>
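The Step-by-Step Processing checklist gives a renaming rule for the Ref file: replace a "driv" suffix with "refs", or otherwise just append "refs". That rule fits in a few lines of code. The Python sketch below is only an illustration and is not one of the UICVM Odd programs. Both function names are hypothetical, and the default filemode "d" is an assumption taken from the example RENAME command.

```python
# Hypothetical helpers illustrating the Ref-file renaming rule;
# not part of the UICVM Odd tool set.

def ref_script_name(name: str) -> str:
    """Return the new file name for a Ref file.

    If the name ends in "driv" (in any case), that suffix is replaced by
    "refs"; otherwise "refs" is simply appended.
    """
    suffix = "driv"
    if name.lower().endswith(suffix):
        return name[: -len(suffix)] + "refs"
    return name + "refs"


def cms_rename_command(name: str, filemode: str = "d") -> str:
    """Build a CMS RENAME command in the form shown in the checklist.

    The new file type is SCRIPT and "=" keeps the original filemode.
    The default filemode "d" is an assumption taken from the example.
    """
    return f"rename {name} ref {filemode} {ref_script_name(name)} script ="


if __name__ == "__main__":
    print(cms_rename_command("P234driv"))
    print(ref_script_name("P234"))
```

Running it for the chapter used throughout this document produces the same RENAME command as the checklist.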
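The Waterloo Script comment rules in the Manual Fixes checklist can be checked mechanically before a file is run through Script. What follows is a minimal Python sketch under simplifying assumptions, not part of the Odd tool set:

- It treats every "<!--" on a line as the start of a comment. It therefore does not skip CDATA examples, and it will also flag deliberately bad examples such as the ones in this document.
- It tracks SGML comments left open across lines only so that it can find continuation lines without their own opening delimiter.

The function name check_comment_lines is hypothetical.

```python
from typing import List, Tuple

# Hypothetical checker for the Waterloo Script comment rules;
# not part of the UICVM Odd tool set.
OPEN = "<!--"
CLOSE = "-->"


def check_comment_lines(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for comment-rule violations.

    Rules checked:
      * every comment line needs its own "<!--" opening delimiter;
      * "<!--" must be followed by a space;
      * nothing may follow a comment that is closed on the same line.
    Line numbers start at 1.
    """
    problems: List[Tuple[int, str]] = []
    # True while an SGML comment opened on an earlier line is still unclosed.
    in_comment = False
    for lineno, line in enumerate(lines, start=1):
        start = line.find(OPEN)
        if in_comment and (start == -1 or line[:start].strip()):
            problems.append((lineno, "comment line lacks its own opening delimiter"))
        if start == -1:
            if in_comment and CLOSE in line:
                in_comment = False
            continue
        body = line[start + len(OPEN):]
        if body and not body.startswith(" "):
            problems.append((lineno, "no space after the opening delimiter"))
        end = body.find(CLOSE)
        if end == -1:
            # Script accepts this line; SGML sees the comment continue.
            in_comment = True
            continue
        in_comment = False
        if body[end + len(CLOSE):].strip():
            problems.append((lineno, "text follows a comment on the same line"))
    return problems
```

Applied to the first comment example in the checklist, it reports only the last line, the one with no opening delimiter.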
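The Marked sections part of the checklist sets three layout restrictions on CDATA examples:

- the opener must be the last thing on its line, with no spaces around the exclamation point;
- "]]>" must stand alone starting in column 1;
- example lines must be at most 65 characters long.

These can be checked line by line, much as the Xedit ALL command in the checklist is used to find the lines in question. The Python sketch below only reports problems and does not split lines. Its names are hypothetical, and it is not part of the Odd programs.

```python
import re
from typing import List, Tuple

# Hypothetical checker for the CDATA layout rules; not a UICVM program.
# Opener as Script accepts it: no spaces around "!", optional around CDATA.
STRICT_OPEN = re.compile(r"<!\[\s*CDATA\s*\[")
# Looser pattern that also catches "< ! [ CDATA [" with stray spaces.
LOOSE_OPEN = re.compile(r"<\s*!\s*\[\s*CDATA\s*\[")
MAX_LEN = 65


def check_cdata_layout(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for badly laid-out examples."""
    problems: List[Tuple[int, str]] = []
    inside = False
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        match = LOOSE_OPEN.search(line)
        if match:
            if not STRICT_OPEN.search(line):
                problems.append((lineno, "space next to '!' in the CDATA opener"))
            if line[match.end():].strip():
                problems.append((lineno, "text follows the CDATA opener"))
            inside = True
            continue
        if "]]>" in line:
            if line.rstrip() != "]]>":
                problems.append((lineno, "']]>' must start in column 1, alone"))
            inside = False
            continue
        if inside and len(line) > MAX_LEN:
            problems.append((lineno, f"example line over {MAX_LEN} characters"))
    return problems
```

The 60-character preference in the checklist is not enforced here. Changing MAX_LEN to 60 would make the check stricter.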
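The two fixes described under Valid Script Attribute Values can be approximated with regular expressions, as in the Python sketch below: dropping periods from ID and IDREF values, and removing spaces around equal signs. This is not how P2xGml does it, and it is not a UICVM program.

Which attributes count as ID or IDREF really depends on the DTD, so the ID_ATTRS set here is an assumption. It includes id and targets from the examples, refid as used by hdref in this document, and target. The sketch rewrites every tag outside comments, including tags inside CDATA examples, so its output should be reviewed before use.

```python
import re

# Hypothetical fixer; not part of P2xGml or the UICVM Odd programs.
# Assumption: which attributes are ID/IDREF(S) depends on the DTD.
ID_ATTRS = {"id", "refid", "target", "targets"}

# Start-tags and end-tags only; "<!" (comments, marked sections, <!>) is skipped.
TAG_RE = re.compile(r"<(?!!)[^<>]*>")
EQ_SPACES_RE = re.compile(r"\s*=\s*")
ATTR_RE = re.compile(r"([A-Za-z][\w.-]*)=(\"[^\"]*\"|'[^']*'|[^\s>]+)")


def _fix_attr(m) -> str:
    name, value = m.group(1), m.group(2)
    if name.lower() in ID_ATTRS:
        value = value.replace(".", "")  # "P22.1" -> "P221"
    return f"{name}={value}"


def _fix_tag(m) -> str:
    tag = EQ_SPACES_RE.sub("=", m.group(0))  # "type= closed" -> "type=closed"
    return ATTR_RE.sub(_fix_attr, tag)


def fix_script_attributes(line: str) -> str:
    """Remove spaces around "=" in tags and periods from ID/IDREF values."""
    return TAG_RE.sub(_fix_tag, line)
```

Values of other attributes, such as file names in include tags, keep their periods.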
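Under Blank Lines, the checklist describes the cdata and allblank Xedit macros followed by delete *. As described there, their combined effect is to remove every blank line outside CDATA marked sections. The Python sketch below approximates that effect for a file read as a list of lines. It is not the TEISCOM macros themselves, whose code is not given here. It relies on the marked-section layout rules from the checklist (closing "]]>" in column 1) to find where each example ends. The function name is hypothetical.

```python
import re
from typing import List

# Hypothetical approximation of "cdata" + "allblank" + "delete *";
# not the actual TEISCOM Xedit macros.
CDATA_OPEN_RE = re.compile(r"<!\[\s*CDATA\s*\[")


def drop_blank_lines(lines: List[str]) -> List[str]:
    """Remove blank lines, except inside CDATA marked sections.

    Assumes the layout the checklist requires: the opener ends its line
    and the closing "]]>" starts in column 1.
    """
    kept: List[str] = []
    in_cdata = False
    for line in lines:
        if in_cdata:
            kept.append(line)  # examples are left exactly as they are
            if line.startswith("]]>"):
                in_cdata = False
        elif CDATA_OPEN_RE.search(line):
            kept.append(line)
            in_cdata = True
        elif line.strip():
            kept.append(line)
    return kept
```

A blank line inside an example survives, as it would when the cdata macro has hidden the example from the delete command.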
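Several of the steps for preparing the Script file are mechanical: removing the document type declaration, the tei.1 tags, the front matter, and the body tags. Below is a minimal Python sketch of those steps. It assumes the declaration has an internal subset ending in "]>", as the checklist describes. The function name is hypothetical, back matter is left for manual handling, and the result should still be checked by hand for a single h1 or div1 element.

```python
import re

# Hypothetical clean-up helper; not one of the UICVM Odd programs.
FLAGS = re.IGNORECASE


def prepare_script_text(text: str) -> str:
    """Apply the mechanical clean-up steps listed for a new Script file.

    Covers the doctype declaration, tei.1 tags, front matter and body tags.
    Back matter is not handled; it must be moved to its own file by hand.
    """
    # Document type declaration: from the start through the closing "]>".
    if text.lstrip().upper().startswith("<!DOCTYPE"):
        end = text.find("]>")
        if end != -1:
            text = text[end + len("]>"):]
    # <tei.1> and </tei.1>, with or without attributes on the start-tag.
    text = re.sub(r"</?tei\.1\b[^>]*>", "", text, flags=FLAGS)
    # Front matter, from <front> through </front>, possibly over many lines.
    text = re.sub(r"<front\b[^>]*>.*?</front>", "", text, flags=FLAGS | re.DOTALL)
    # Opening <body> and closing </body> tags.
    text = re.sub(r"</?body\b[^>]*>", "", text, flags=FLAGS)
    return text
```

The removed tags leave empty lines behind. Those can be cleaned up in the same way as other unwanted blank lines.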
Prepared by Robin Cover for The XML Cover Pages archive. See "Text Encoding Initiative (TEI) - XML for TEI Lite."