Processing Odd files at UICVM

[Cache from http://www.tei-c.org/Vault/ED/eda16.gml; please use this canonical URL/source if possible.]
.sr docfile = &sysfnam. ;.sr docversion = 'Draft';.im teigmlp1
<!-- TEI Doc. No:  ED A16                                       -->
<!-- Title:  Processing Odd files at UICVM                      -->
<!-- Drafted:  25 Mar 92 CMSMcQ                                 -->
<!-- ********************************************************** -->
<!-- Revision History (add lines at top)                        -->
<!-- Date      Who    What                                      -->
<!-- 23 Jul 92 WP     More final changes                        -->
<!-- 21 Jul 92 WP     Final Changes                             -->
<!-- 15 Jul 92 WP     Corrected Script; updated per trial       -->
<!--                   session with MSM                         -->
<!-- 6  May 92 WP     Continued revisions in Manual Fixes       -->
<!-- 27 Apr 92 WP     Continued revisions thru Manual Fixes     -->
<!-- 23 Apr 92 WP     Continued revisions                       -->
<!-- 19 Apr 92 WP     Began revision (new section 2)            -->
<!-- 25 Mar 92 CMSMcQ made file                                 -->
<!-- ********************************************************** -->
<!--                                                            -->
.* Document proper begins.
<gdoc sec=&security.>
<include file=teiblank>
<frontm>
<titlep>
<title stitle='TEI &docfile. Processing Odd Files'>Processing Odd Files at UICVM
<author>C. M. Sperberg-McQueen
<docnum>TEI &docfile.
<date>&docdate.
</titlep>
<!>
<toc>
</frontm>
<!>
<body>
<h1>Introduction and Overview
<p>ODD (hereafter <q>Odd</q>) files are those prepared by the work
groups and editors of the TEI for TEI P2.  Before they can be printed,
Odd files must be translated into other forms, and may need some manual
editing along the way.  This document provides the bare minimum of
information on how we do this at UICVM.  For further information on the
Odd tag set or other tag sets mentioned here, see document ED W29, the
Odd Manual.
 
A conceptual scheme of the entire processing from ODD to publication
of P2 is shown below:
<include file=nodashes>
<xmp>
                                                   "Make  Postscript
               ODD                   P2Driv Script------>  Memo
         Spitbol Programs           /  \ / \  / \ \ Doc"  Script97
           /    |       \             |   |    |   \
       P2X     Ref(opt.) DTD          |   |    | "Norman"
     Spitbol    |----------------------   |    |     \--->LaTeX
        |                                 |    |
      Script -----------------------------|   Bib (opt.)
 
       P2 Processing Scheme from ODD to Publication
</xmp>
<include file=dashdefs>
<h1>File Types to Remember
<p>When working with Odd files, you may see the following file types:
<gl>
<gt>Odd
<gd>the main input file type for TEI P2.  All editorial corrections
should be made here.  A complete Odd file for a single chapter may
in fact consist of more than one file, including the driver file and a
number of individual files.
<gl>
<gt>Odd driver files (e.g. P234Driv Odd)
<gd>Many Odd files will be driver files that include various components
of the chapter. These driver files will include the necessary commands
and headings to process the chapter, and will imbed the various chapter
components (e.g. P234a Odd or P2361 Odd).
<gt>Odd component files (e.g. P234a Odd, P2361 Odd)
<gd>These are the prose files that comprise the chapter ODD file, and
are imbedded into the chapter's Odd driver file.
<gt>Odd tags file (e.g. P234Tags Odd; P236Tags Odd)
<gd>This is the Odd file that defines the tags and attributes
associated with the chapter's subject matter.  As with the Odd
component files, this will also be imbedded in the Odd Driver file.
</gl>
<gt>p2x
<gd>a modified version of an Odd file, in a tag set easier to process
directly in Waterloo Script.  A P2X file rearranges some text in the
Odd, deletes some text, and uses different tags.
<gt>ref
<gd>text for the reference section of TEI P2.  A Ref file adds some new
information to the corresponding parts of the Odd it came from, and
uses very slightly different tags.  If a chapter does not add any
new tags, it will not have a Ref file.
<gt>dtd
<gd>normal SGML document type declaration files; these too are produced
from the Odd files.
<gt>script
<gd>a file in a tag set we can process directly using Waterloo
Script/GML; in this case, it is almost identical to the P2X file, but
differs in a couple of areas where it's easier to change the tagging
than to write Waterloo Script macros to handle the P2X tags directly.
(In the future, we will write Script macros for all P2X tags, and we
won't use SCRIPT as a file type anymore.)
<gt>log
<gd>some of the programs in the Odd system produce a log file
containing important messages, for study if things go wrong; if nothing
goes wrong, the log can be discarded.
<gt>out
<gd>some programs may still produce files with a file type of OUT
instead of one of the other file types in this list; over time, these
programs should all be modified to use more informative file types, so
report any Out files you see.
</gl>
<h1>P2Driv Script File
<p>This is the file that ultimately will be used to print out all of
P2, by imbedding each of the separate chapters, reference
sections, bibliographies, etc.  In the meantime, it is used to print out
individual chapters and their corresponding reference sections and (in
some cases) bibliographies.  As a driver file, it includes the
necessary adjunct material for each fascicle -- an
overview note on P2, the complete table of contents for P2, the cover
page for the part of the chapter being released, and the User Response
and Comment form -- along with the individual chapter material.
See below under "Step-By-Step Processing" for
a more detailed description of this file.
<h1>Programs to Remember
<p>These are the major programs and commands to use to work with Odd
files; this list does not include things like Xedit and ftp, which can
be used on Odd files as on others.  First, there are programs to
validate SGML files:
<gl>
<gt>vm2
<gd>fast public-domain validator:  it reads an SGML file and tells you
whether the file is valid and, if not, why not.  Its messages are
not always very helpful, though.  Our copy runs under DOS, though we
can also get a Mac version.  (Progress on a VM/CMS version has come to
a standstill.)  To run VM2 on a file called <term>myfile.odd</term>, at
the DOS prompt type
<xmp>
vm2 myfile.odd
</xmp>
or if the output is voluminous and you want to see just a screen at a
time:
<xmp>
vm2 myfile.odd | more
</xmp>
or to place the output in a file (e.g. <term>myfile.log</term>) to study:
<xmp>
vm2 myfile.odd > myfile.log
</xmp>
<gt>parse
<gd>batch command for DOS machines; invokes Mark-It, a commercial SGML
parser with better diagnostic messages than VM2 (but slower).  To run
Parse on the file <term>myfile.odd</term>, at the DOS prompt type:
<xmp>
parse myfile.odd
</xmp>
<gt>xtran
<gd>DOS-based SGML processor which we can also use to validate
documents; this runs only on 386 machines, so it can't be run from the
Zeniths.
<gt>checkmark
<gd>Mac-based validator for SGML documents
<gt>Author/Editor
<gd>Mac-based editor for SGML documents; it can validate Odds, but not
TEI documents.  A Windows-based version doesn't work quite right yet
and isn't useful right now.
<gt>mkted
<gd>DOS-based commercial SGML-aware editor with parser available from
inside; you are unlikely to use this very often.  To use MktEd to edit
the file <term>myfile.odd</term>, at the DOS prompt type:
<xmp>
mkted myfile.odd
</xmp>
</gl>
<p>In addition to the general SGML software, we have
other programs specifically written to process Odd and related files:
<gl>
<gt>sp
<gd>batch program / exec to run the Spitbol interpreter (we have this
both under DOS and under VM/CMS, but the DOS version requires a 386
machine, so Spitbol programs can only run on the PS2 or on the CompuAdd
portable machine)
<gt>oddp2x
<gd>Spitbol program to produce a P2X from an Odd
<gt>odddtd
<gd>Spitbol program to produce DTD files from an Odd
<gt>oddref
<gd>Spitbol program to produce REF files from an Odd
<gt>norman
<gd>Spitbol program to produce LaTeX files from SGML documents, to
be used in combination with Latex.Dic (i.e. dictionary for use with
Norman)
<gt>p2xgml
<gd>Spitbol program to read a P2X file and translate it into a Script
file which uses the P2X SCRIPT macros on the TEIDOC disk
<gt>dtdodd
<gd>Spitbol program to read a DTD and produce draft ODD material
<gt>makedoc
<gd>CMS exec to run Script with the proper options to produce an output
file of a specified kind
</gl>
<h1>Step-by-Step Processing
<ol compact=0>
<li>Checking and Validating the Odd file
<p>In processing an Odd file, the following steps may be needed; they
need not be done in exactly this order but they should all be done
unless described as optional.  The checklist assumes we are working
with chapter 34 of P2 --
for information on the
filenames used in the draft of P2, see document ED W23.
<ol compact=0>
<li>optionally check the ODD file for things that create problems in
Script; this may also be done later.  For a list of such problems, see
<hdref refid=manfix> below.
<li>move all of the current Odd and Odd Driver files of the chapter
(e.g. P234driv odd; P234a odd; P234b odd; p234tags odd)
to a DOS machine; typically this is most easily done
using tftp, from VM/CMS:
<xmp>
getdisk telnet
tftp wendyp.cc.uic.edu
put p234driv.odd.c p234driv.odd
put p234a.odd.c p234a.odd
put p234b.odd.c p234b.odd
put p234tags.odd.c p234tags.odd
</xmp>
<p>
This example assumes you are using Wendy's terminal; if you are using
one of the other terminals, the appropriate tftp addresses are
<xmp>
michaels.cc.uic.edu
davids.cc.uic.edu
tei.cc.uic.edu [n.b. this is the Macintosh in the corner]
</xmp>
Also, the Odd file may be on TEISCOM or TEIWCOM, and the filemode
may vary (usually, it will be "c" or "d").  Check the directory
before you begin the tftp procedure.
<p>
<li>Leave the tftp process, log off,
and check the directory for the files that have been transferred.
<xmp>
quit
logoff
dir /p
</xmp>
<li>validate the Odd driver file using vm2
or Parse or MktEd
<xmp>
vm2 P234driv.odd | more
</xmp>
or
<xmp>
parse P234driv.odd
</xmp>
or
<xmp>
mkted P234driv.odd
</xmp>
Alternatively, move the file to the Mac and validate it using Checkmark.
</ol>
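<p>The <q>put</q> lines used in the transfer step follow a regular
pattern, so they can be generated rather than typed.  The following is
a minimal Python sketch and is not part of the Odd system; the function
name <term>tftp_put_commands</term> and the default filemode are
assumptions made for illustration.

```python
def tftp_put_commands(cms_names, filemode="c"):
    """Build tftp 'put' lines for CMS files named like 'P234driv odd'.

    The filemode default ("c") is an assumption; check the directory
    for the real filemode before using the output.
    """
    commands = []
    for name in cms_names:
        fname, ftype = name.lower().split()
        # CMS side is name.type.mode; DOS side is name.type
        commands.append(f"put {fname}.{ftype}.{filemode} {fname}.{ftype}")
    return commands


chapter_files = ["P234driv odd", "P234a odd", "P234b odd", "p234tags odd"]
for line in tftp_put_commands(chapter_files):
    print(line)
```

The printed lines can be typed at the tftp prompt as in the example
above; pass a different <term>filemode</term> if the files are not on
the C disk.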
<li>Creating and Validating the P2X File
<ol compact=0>
<li>make a P2X file.
Under VM or DOS, type:
<xmp>
sp oddp2x
</xmp>
The program will ask you for the name of the file to be run, to which
you should answer:
<xmp>
P234driv.odd.d [or whatever filemode you are working from]
</xmp>
N.B. use the dot between file name and file type even under VM; at the
moment, at least, oddp2x and the other programs may not handle the
file name correctly if you specify it as <q>P234driv odd</q> or
<q>P234driv odd c</q>.
<p>
If you run out of space, you may need to erase one or more
of the following files from the TEISCOM disk:
<ul>
<li>ODDP2X LISTING
<li>ODDREF LISTING
<li>ODDDTD LISTING
<li>P2XGML LISTING
<li>TRADUCE LISTING
</ul>
<p>
No matter how many files are included in the Odd file you process,
OddP2X will produce just one file:  <term>P234driv.p2x</term>.
<li>validate the P2X file.  If under DOS, just use VM2 or Parse; if
under VM, you have to move the file to DOS and check it.
</ol>
<li>Creating and Validating the Ref File
<ol compact=0>
<li>make the Ref file.  Proceed as for making P2X files, but use
OddRef, not OddP2X.
Under DOS or VM, type:
<xmp>
sp oddref
</xmp>
The program will ask you for the name of the file to be run, to which
you should answer:
<xmp>
P234driv.odd.d [or whatever filemode you are working from]
</xmp>
<li>rename the Ref files.
No matter how many files are included in the Odd file you process,
OddRef will produce just one file:  <term>P234driv.ref</term>.  For later
processing with Script, this needs to be renamed: if there is a "driv"
suffix, remove it
and replace it with the suffix "refs"; otherwise, merely add the
suffix "refs" to the filename, most often a chapter number (e.g. P234).
Also change the file type to <q>script</q>.  Under VM:
<xmp>
rename P234driv ref d P234refs script =
</xmp>
<li>delete the comments from the file by giving the
command in Xedit
<xmp>
Stripcom
</xmp>
Check the revised file for unnecessary blank lines -- the
Stripcom Xedit macro does not remove all blank lines when it removes
the comments.  Remove the unnecessary ones.
<li>validate the Ref files.
</ol>
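<p>The renaming rule for Ref files can be stated compactly.  The
following Python sketch is only an illustration of the rule as
described above; the function name <term>refs_script_name</term> is
hypothetical.

```python
def refs_script_name(filename):
    """Return (new file name, new file type) for a Ref file.

    A trailing "driv" is replaced by "refs"; otherwise "refs" is
    simply appended.  The file type always becomes "script".
    """
    base = filename
    if base.lower().endswith("driv"):
        base = base[:-4]  # drop the "driv" suffix
    return base + "refs", "script"


print(refs_script_name("P234driv"))
print(refs_script_name("P234"))
```

Both calls give the name P234refs with the file type script, which
matches the <q>rename</q> example above.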
<li>Creating and checking the DTD Files.
<ol compact=0>
<li>make the DTD files.  Proceed as for P2X and Ref files, but use the
program OddDtd, not OddP2X or OddRef.  Under DOS or VM, type:
<xmp>
sp odddtd
</xmp>
The program will ask you for the name of the file to be run, to which
you should answer:
<xmp>
P234Driv.odd
</xmp>
OddDtd will produce one or more DTD files.  (If the Odd file you are
processing has no DTD fragments in it, OddDtd will produce no output,
and in that case you really don't need to run OddDtd in the first
place.)  The DTD files should be checked for proper formatting; we will
periodically also use them to test examples of TEI P2 tagging, but that
is not part of processing the Odd file.
</ol>
<li>Creating and Revising the Script File
<ol compact=0>
<li>make the Script file(s).  Under DOS or VM, type:
<xmp>
sp p2xgml
</xmp>
The program will ask you for the name of the file to be run, to which
you should answer:
<xmp>
P234driv.p2x.d [or whatever filemode you are working from]
</xmp>
P2xGml should produce a file called <term>P234driv.scr</term> under DOS
or <term>P234driv script</term> under VM.  (It may instead produce
<term>P234driv.out</term>, in which case the file should be renamed
<term>P234driv script</term> before going on to the next step.)
<li>Prepare the Script file for processing by these steps:
<ol>
<li>strip out the document type declaration at the beginning, if there
is one (from the first line through the concluding <q>]></q>)
<li>remove the <tag>tei.1</tag> and <tag>/tei.1</tag> tags.
<li>for individual chapters of P2, the file should contain just one
<gi>h1</gi> (or
<gi>div1</gi>) element (in rare cases, just one <gi>h2</gi> or
<gi>div2</gi>).  This
means:
<ol>
<li>remove any front matter (from <tag>front</tag> to <tag>/front</tag>).
<li>remove the opening <tag>body</tag> tag.
<li>remove the closing <tag>/body</tag> tag.
<li>if there is any back matter, put it in a separate file with an
appropriate name.
</ol>
</ol>
<li>check the Script file(s) for problems requiring manual fixes
(see <hdref refid=Manfix> below).
</ol>
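<p>The preparation steps above are mechanical enough to sketch in
code.  The Python below is a rough illustration, not a replacement for
checking the file by hand: it assumes the tags appear without
attributes, exactly as <q>tei.1</q>, <q>front</q> and <q>body</q>, and
that the first <q>]></q> in the file ends the document type
declaration.  It does not move back matter to a separate file, and it
would also alter any such tags quoted inside examples.  The function
name is hypothetical.

```python
import re


def prepare_script_text(text):
    """Apply the preparation steps to the text of a Script file."""
    # Strip a leading document type declaration, through "]>".
    if text.lstrip().lower().startswith("<!doctype"):
        end = text.find("]>")
        if end != -1:
            text = text[end + 2:]
    # Remove any front matter, from <front> to </front>.
    text = re.sub(r"<front>.*?</front>", "", text, flags=re.S | re.I)
    # Remove the tei.1 and body start and end tags.
    text = re.sub(r"</?(tei\.1|body)>", "", text, flags=re.I)
    return text.lstrip("\n")


sample = (
    "<!DOCTYPE tei.1 [\n<!ENTITY x 'y'>\n]>\n"
    "<tei.1>\n<front>\n<p>title\n</front>\n"
    "<body>\n<div1 n=34 id=P234>Text\n</body>\n</tei.1>\n"
)
print(prepare_script_text(sample))
```

The result still contains the blank lines left behind by the removed
tags; these are dealt with in the same way as other blank lines (see
<hdref refid=Manfix> below).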
<li>Add the Script file of the new chapter
to the overall P2 driver file,
"P2Driv Script".  Usually, all that is necessary is
that the existing "P2Driv Script" file be updated and all references to
other chapters be commented out.
<ol>
<li>within the <tag>body</tag> element of P2Driv Script, ensure that the
<tag>h0 n=xxx</tag> tag (or <tag>div0 n=xxx</tag> tag) has the
appropriate value for <att>n</att>: 2 for part 2, 3 for part 3, etc.
<li>make the <tag>include file=xxx</tag> tag point at the appropriate
chapter file (e.g. P221Driv, P234Driv).
<li>within the chapter file itself, ensure that the <tag>h1 n=xxx</tag>
tag (or
<tag>div1 n=xxx</tag> tag) has
an appropriate value for <att>n</att>:  part number plus chapter number
(part 3 chapter 34 should have <tag>h1 n=34 id=P234</tag>).
<li>if a file of reference material has been prepared (if you are
printing output from ODDREF), then make sure that <tag>h0 n=7</tag>
and <tag>include file=P234refs</tag> appear and are not commented out.
</ol>
<p>
The relevant portion of the P2Driv Script file
should look something like this:
<xmp>
<![ CDATA [
<h0 n=3>Base Tag Sets
<!>
<include file=p234driv>
<h0 n=7>Alphabetic Reference List of Tags
<include file=p234refs>
<!>
</body>
<back>
<include file=p234back>
</back>
]]>
</xmp>
<li>Create the Final Versions of the Fascicle to be Published
<ol>
<li>Create the Memo, Script97 and PostScript files from the Script file
<ul>
<li>MEMO
<li>SCRIPT97
<li>LISTPS
</ul>
<p>
To do this, issue the following command:
<xmp>
makedoc P2Driv all (twopass execute
</xmp>
<li>Rename and save the Script97 file on TEIWCOM for printing out
hard copies.
<xmp>
Rename P2Driv Script97 D P234 = =
</xmp>
<li>Create and print the LaTeX file.
<ol compact=0>
<li>make the LaTeX file(s) using the program Norman.prt and file
LateX.Dictionary (see Lou Burnard's memo about LaTeX processing).
<!-- I am not sure whether the material below still applies-wp -->
<!-- < xmp>                                                     -->
<!-- sp traduce P234.p2x                                       -->
<!-- < /xmp>                                                    -->
Some manual post-editing will probably be necessary.
<li>process and print the LaTeX file(s):
<xmp>
getdisk tex
latex P234
printtex P234
</xmp>
</ol>
</ol>
<li>Add the files to the TEI-L Filelist.
<ul compact=0>
<li>Obtain the TEI-L Filelist from Listserv by giving the command:
<xmp>
tell listserv get tei-l filelist (ctl
</xmp>
<p>
Until you return the Filelist to Listserv, it will be unavailable to
anyone else to edit, and, possibly, to read.
<li>Go into your reader list and receive the file that will have been
sent to you (TEI-L FILELIST).  If you have a previous copy on your
filelist, instead of just pressing F9, type on the
command line
<xmp>
Receive (replace
</xmp>
<li>Add the new P2 files <emph>under new names</emph> to the Filelist
by use of Xedit.
<ul>
<li>"P2Driv Memo" should be added as "P234 Doc".
<li>"P2Driv Listps" should be added as "P234 PS".
<li>"P234Driv P2X" should be added as "P234 P2X".
<li>"P234Ref" should be added as "P234 Refs".
</ul>
<li>Return the TEI-L Filelist to listserv by giving the command
<xmp>
lsvput tei-l filelist
</xmp>
</ul>
<li>Put the files on the Listserv file server. [N.B. You can only use
the "lsvput" command if the VM name of the file is the same as the name
of the file on the TEI-L filelist; otherwise, it will be rejected.
As described immediately above, in most cases the names will differ.
The easiest way of dealing with this is to copy the file from TEISCOM
to your A disk using the newly-assigned TEI-L Filelist name (see
above), and then use the "lsvput" command as shown below.  You can
erase the file from your A disk as soon as you receive notification
that the file has been successfully stored.]
<xmp>
copy P234Ref script c P234 Ref a
copy P2Driv Memo c P234 Doc a
copy P2Driv Listps c P234 PS a
copy P234Driv P2x c P234 P2X a
lsvput P234 p2x
lsvput P234 ref
lsvput P234 doc
lsvput P234 PS
lsvput P234 tex
</xmp>
</ol>
<h1 id=ManFix>Manual Fixes (Xedit macros etc.)
<h2>Miscellaneous Checklist for Script Problems
<p>Script has a number of requirements beyond simple SGML conformance
in a file; failure to follow them may result in less attractive output,
or none.  The possible problems known at present are:
<ul compact=0>
<li>Comment Lines
<ul compact=0>
<li>Waterloo Script cannot handle multi-line SGML
comments (i.e.
comments in which the comment delimiters appear only at
the beginning and end of a multi-line SGML comment);
therefore,
<emph>every</emph> comment line must begin with its own comment delimiter,
even if the comment is a continuation of a single sentence or idea.
<xmp>
<![ CDATA [
<!-- This comment is acceptable to Waterloo Script. Note that
<!-- closing delimiters are not necessary.
 
<!-- This comment is not acceptable to Waterloo Script, as there
     is no opening delimiter on this line.                      -->
]]>
</xmp>
<li>nothing may follow a comment on the same line, although it
may be preceded by non-comment material;
it must be the
last non-blank data on the line.
<xmp>
<![ CDATA [
<!-- This line is acceptable to Waterloo Script.              -->
<p>An example: <!-- This line is ok, too.                     -->
<!-- This line is not acceptable to Waterloo Script --> <p>
]]>
</xmp>
<li>comments <emph>must</emph> begin with the string <q>angle-bracket,
exclamation point, hyphen, hyphen, <emph>space</emph></q>:  if the space is not present,
the comment is not recognized.
</ul>
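<p>Most of the comment rules above can be checked mechanically.  The
following Python sketch is one possible checker, written for
illustration only; the function name and the wording of the messages
are assumptions.  It cannot detect a continuation line that carries no
delimiter at all, since such a line looks like ordinary text.

```python
def check_comment_lines(lines):
    """Return (line number, message) pairs for comment problems."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip()
        start = text.find("<!--")
        end = text.find("-->")
        # The opening delimiter must be followed by a space.
        if start != -1 and not text.startswith("<!-- ", start):
            problems.append((number, "no space after <!--"))
        # A closing delimiter without an opening one on the same
        # line means the comment was continued from a line above.
        if end != -1 and start == -1:
            problems.append((number, "comment continues without <!--"))
        # The comment must be the last non-blank data on the line.
        if end != -1 and text[end + 3:].strip():
            problems.append((number, "text follows the comment"))
    return problems


sample = [
    "<!-- fine -->",
    "<p>An example: <!-- ok -->",
    "<!-- bad --> <p>",
    "<!--bad -->",
    "     tail -->",
]
for number, message in check_comment_lines(sample):
    print(number, message)
```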
<li>Blank Lines
<ul compact=0>
<li>blank lines cause extraneous white space in the output, although
they are useful in the original Odd file for improving readability.
To suppress them in the Script file, use the following
two Xedit macros (located on TEISCOM) by typing them on the command line
while editing the file:
<gl compact=0>
<gt>cdata
<gd>hides all examples (actually, just all CDATA marked sections, which
is usually the same thing) so they are not affected by global changes
<gt>allblank
<gd>hides all non-blank lines, leaving only blank lines visible; the
command
<xmp>
 
delete *
 
</xmp>
will then kill all the blank lines.
</gl>
</ul>
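<p>The effect of the two macros followed by <q>delete *</q> can be
imitated outside Xedit.  The Python sketch below is an approximation
under stated assumptions, not a description of how the macros work:
it assumes that each CDATA marked section opens on a line containing
both <q>&lt;![</q> and <q>CDATA</q> and closes on a line beginning
with the closing delimiter, and the function name is hypothetical.

```python
def drop_blank_lines(lines):
    """Remove blank lines, except inside CDATA marked sections."""
    kept = []
    in_cdata = False
    for line in lines:
        if in_cdata:
            # Keep everything inside an example untouched.
            kept.append(line)
            if line.startswith("]]>"):
                in_cdata = False
        elif "<![" in line and "CDATA" in line:
            in_cdata = True
            kept.append(line)
        elif line.strip():
            kept.append(line)
    return kept


sample = [
    "<p>Text", "", "<xmp>", "<![ CDATA [", "",
    "<p id=P221>", "]]>", "</xmp>", "   ", "<p>More",
]
print(drop_blank_lines(sample))
```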
<li>Marked sections (sections in which SGML tagging
is to be literally or semi-literally displayed, as in examples)
<ul compact=0>
The only types of marked sections currently being used in Odd files
are examples.
<p>
Examples of any type are tagged with <gi>eg</gi> or <gi>xmp</gi>
tags; examples of SGML usage also have to be enclosed in a <term>CDATA
marked section</term>, which suppresses recognition of SGML markup
within the section.  The TEI macros in Waterloo Script can recognize
and handle CDATA marked sections, but the following restrictions apply:
<ul compact=0>
<li>lines should never be longer than 65 characters,
and if they can be kept to 60, so much the better
<li>the example <emph>must</emph> be formatted as follows:
<ol>
<li>The string <q>< ! [ CDATA [</q> with which the marked section
begins must
be on a single line, and should be the last thing on the line.
(There should also be no spaces on either side of the exclamation
point, though there can be spaces on either side of the <q>CDATA</q>
keyword.)
<li>The string <q>] ] ></q> which ends the marked section must begin in
column 1 and should be the only thing on the line.
</ol>
Since these restrictions result from shortcomings in our
Waterloo Script macros and do not exist in SGML, the SGML parsers
may ignore violations of them.  Thus, even validated files have to
be checked manually for correct example formatting.  Use the command
<xmp>
<![ CDATA [
all /<![/ | /]>/
]]>
</xmp>
to find all the relevant lines, and then use PF11 to split them as
needed to ensure that the beginnings and endings of marked sections are
properly laid out for Script.  It is probably a good idea to do this in
the Odd file rather than the Script file, so it doesn't have to be done
again, but this involves re-processing the Odd file.
</ul>
</ul>
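<p>Because the SGML parsers do not report these layout problems, a
small checking program may save some manual searching.  The Python
below is a sketch of such a check, written for illustration; the
function name, the message texts and the decision to measure line
length only inside marked sections are assumptions.

```python
import re

# Opening delimiter: no spaces around "!", optional spaces around CDATA.
OPEN = re.compile(r"<!\[\s*CDATA\s*\[")
MAX_WIDTH = 65


def check_marked_sections(lines):
    """Return (line number, message) pairs for layout problems."""
    problems = []
    inside = False
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        opener = OPEN.search(text)
        if not inside and opener:
            rest = text[opener.end():]
            if rest.strip():
                problems.append((number, "text follows the opening string"))
            # Stay inside unless the section also closes on this line.
            inside = "]]>" not in rest
            continue
        if inside:
            if "]]>" in text:
                inside = False
                if text.rstrip() != "]]>":
                    problems.append((number, "closing string not alone in column 1"))
            elif len(text) > MAX_WIDTH:
                problems.append((number, "line longer than 65 characters"))
    return problems


sample = [
    "<xmp>", "<![ CDATA [", "<p id=P221>", "]]>", "</xmp>",
    "<eg><![ CDATA [ <p>", "x" * 70, " ]]>",
]
for number, message in check_marked_sections(sample):
    print(number, message)
```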
<li>Valid Script Attribute Values
<ul compact=0>
<li>P2 Odd files will use periods to make ID and IDREF
attribute values more readable (e.g. "P221.1"; "s2.s3.s4").
However,
Script requires ID and IDREF attributes to have values comprising
letters and numbers, but no periods.  The following are acceptable
using Script:
<xmp>
<![ CDATA [
      <p id=P221>
      <align targets=s1s2s3>
]]>
</xmp>
<!>
The following are <emph>not</emph> acceptable using Script:
<xmp>
<![ CDATA [
      <p id=P22.1>
      <align targets=s1.s2.s3>
]]>
</xmp>
<!>
As SGML allows periods in ID and IDREF attribute values,
the validation process will not treat them as errors.
Thus, the periods must be located and
stripped out of the Script file
for Script processing.  (P2xGml tries to do this
but is not completely successful.)
<li>Script occasionally objects to spaces after the equal sign in a
tag's attribute specification:  of the two examples below, the first
will produce an error message and the second will be processed correctly.
I don't now know exactly what Script's requirements are, so I make
a practice, when this error occurs, of just eliminating spaces on either
side of equal signs in attribute values.
<xmp>
 <vallist type= closed>
 <vallist type=closed>
</xmp>
</ul>
</ul>
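<p>The two attribute fixes described above can also be sketched in
code.  The Python below is an illustration only and makes several
assumptions: the attribute names <q>id</q> and <q>targets</q> are
taken from the examples in this section and are not a complete list of
ID and IDREF attributes, the function names are hypothetical, and the
sketch does not skip CDATA marked sections, so examples would be
changed as well.

```python
import re

# Attribute names taken from the examples above; the full list of
# ID and IDREF attributes is an assumption to be adjusted.
ID_ATTRIBUTES = ("id", "targets")


def close_up_equal_signs(line):
    """Remove spaces around '=' inside anything that looks like a tag."""
    return re.sub(
        r"<[^<>]+>",
        lambda m: re.sub(r"\s*=\s*", "=", m.group(0)),
        line,
    )


def strip_id_periods(line, names=ID_ATTRIBUTES):
    """Remove periods from the values of the named attributes."""
    pattern = re.compile(
        r"\b(" + "|".join(names) + r")=(\"[^\"]*\"|[^\s>]+)",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: m.group(1) + "=" + m.group(2).replace(".", ""), line
    )


def fix_attributes(line):
    """Apply both fixes to one line of a Script file."""
    return strip_id_periods(close_up_equal_signs(line))


for tag in ("<p id=P22.1>", "<align targets=s1.s2.s3>", "<vallist type= closed>"):
    print(fix_attributes(tag))
```

Text outside tags is left alone, so a period in running prose is not
affected.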
</body>
<!>
</gdoc>