This information is derived from a post to TEI-L about the problems involved in getting some parts of the TEI DTD to compile with RulesBuilder, the DTD parser/compiler companion to SoftQuad's Author/Editor product.
The message does not pretend to be anything other than an informative summary of events; it is not intended as any kind of formal statement of how a solution ought or ought not to be derived.
The message header was:
From pflynn Fri Jul 14 10:57:34 1995
Date: Fri, 14 Jul 1995 10:57:34 +0100
From: Peter Flynn <pflynn>
To: TEI-L@UICVM.CC.UIC.EDU
Subject: Re: RulesBuilder a nightmare?
Cc: trujillo@mail.utexas.edu

and the original text is located in the TEI-L archive at listserv@uicvm.cc.uic.edu as

Item #   Date      Time   Recs   Subject
------   ----      ----   ----   -------
001254   95/07/14  09:19   219   Re: RulesBuilder a nightmare?
A previous correspondent had mentioned:
> I accidentally erased the message, but in the last couple of
> days someone wrote an aside about the horrors of using Rules Builder
> with the TEI DTD(s).
That was probably me, but there weren't any horrors, just a few snags getting it to swallow stuff the right way round. Now that I have found out how to do it (thanks to some swift help from SQ), it's working fine. I was critical of RB doing things this way, but SQ's support has been excellent and has fixed the problem.
> Please tell me more. I am about to embark on a medium-sized tagging
> project (a 45,000-word language corpus), and after getting lots of
> really slick adverts from the big names in the business (ArborText,
> Interleaf, etc) with Big Business-sized price tags, I had pretty
> much decided on getting an Author/Editor-Rules Builder bundle
> which carries a *substantial* academic discount. But now I'm not
> so certain. . .
No, as far as I can see A/E is quite capable of handling this. I don't know how it performs on a file that large (if you really do have it all as a single corpus file), as my experience of editing files >1Mb has been limited to A/E on a very small old PC, which is no basis for comparison with modern machines. (In fact, A/E worked fine; it was the slowness of the PC and its lack of memory and disk space that I had a problem with :-)
I got the DTD finished yesterday, and I promised to tell you all what I had to do, so here goes:
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [

<!-- Standard tagsets needed for TLH work -- >
<!ENTITY % TEI.corpus.dtd  'INCLUDE'>
<!ENTITY % TEI.prose       'INCLUDE'>
<!--ENTITY % TEI.verse     'INCLUDE' can't be used with prose-->
<!ENTITY % TEI.transcr     'INCLUDE'>
<!ENTITY % TEI.textcrit    'INCLUDE'>
<!ENTITY % TEI.names.dates 'INCLUDE'>

<!-- Extra tagset needed to allow documentation of tags in header -->
<!--ENTITY % TEI.tagsets.dtd system 'teitsd2.dtd'>
%TEI.tagsets.dtd;-->

<!-- Standard character entities -->
<!ENTITY % ISOlat1 system "ISOLat1"
  --"ISO 8879:1986//ENTITIES Added Latin 1//EN"-->
%ISOlat1;
<!ENTITY % ISOlat2 system "ISOLat2"
  --"ISO 8879:1986//ENTITIES Added Latin 2//EN"-->
%ISOlat2;

<!-- Local mods to supplement ISO char ents and rename tags -->
<!ENTITY % CURIA.entities system 'curia.ent'>
%CURIA.entities;

]>
<!ENTITY fdot     SDATA "f" -- lenited f (dot-over)-->
<!ENTITY Fdot     SDATA "F" -- lenited F (dot-over)-->
<!ENTITY ndot     SDATA "n" -- nasalised n (dot-over)-->
<!ENTITY mdot     SDATA "m" -- nasalised m (dot-over)-->
<!ENTITY mmacr    SDATA "m" -- m with macron -->
<!ENTITY Sdot     SDATA "S" -- lenited S (dot-over)-->
<!ENTITY sdot     SDATA "s" -- lenited s (dot-over)-->
<!ENTITY ampersir SDATA "&" -- insular ampersand-->
<!ENTITY turnsemi SDATA ";" -- inverted semi-colon-->
<!-- we need to add more of these, eg &longe; -->

<!ENTITY % n.persName  'ps'>
<!ENTITY % n.addName   'an'>
<!ENTITY % n.genName   'gn'>
<!ENTITY % n.forename  'fn'>
<!ENTITY % n.surname   'sn'>
<!ENTITY % n.nameLink  'nk'>
<!ENTITY % n.placeName 'pn'>
<!ENTITY % n.orgName   'on'>
<!ENTITY % n.roleName  'rn'>
<!ENTITY % n.expan     'ex'>
<!ENTITY % n.foreign   'frn'>
<!ENTITY % n.milestone 'mls'>
<!ENTITY % n.supplied  'sup'>
<!ENTITY % n.quote     'qt'>
<!ENTITY % n.unclear   'uncl'>

<!-- Bodge to get round bug in TEI DTD, courtesy of LB -->
<!ENTITY % x.data '%n.persName | %n.placeName | %n.orgName |'>
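To make the effect of the n.* declarations concrete, here is a minimal sketch of a document fragment as it might be typed once those renamings are in effect. It assumes the declarations above are what curia.ent supplies and that the TEI.names.dates tagset is enabled; the sentence and the names in it are invented purely for illustration and do not come from the original message.

```sgml
<!-- Hypothetical fragment: persName is typed as ps, forename as fn,
     surname as sn and placeName as pn under the renamings above -->
<p><ps><fn>Brian</fn> <sn>Boru</sn></ps> was killed at <pn>Clontarf</pn>.</p>
```

The short names only change what is typed in the file; the elements keep the meaning the TEI gives to persName, forename, surname and placeName.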
<!-- Standard tagsets needed for TLH work -- >
<!ENTITY % TEI.corpus.dtd  'INCLUDE'>
<!ENTITY % TEI.prose       'INCLUDE'>
<!--ENTITY % TEI.verse     'INCLUDE' can't be used with prose-->
<!ENTITY % TEI.transcr     'INCLUDE'>
<!ENTITY % TEI.textcrit    'INCLUDE'>
<!ENTITY % TEI.names.dates 'INCLUDE'>

<!-- Extra tagset needed to allow documentation of tags in header -->
<!--ENTITY % TEI.tagsets.dtd system 'teitsd2.dtd'>
%TEI.tagsets.dtd;-->

<!-- Standard character entities -->
<!ENTITY % ISOlat1 system "ISOLat1"
  --"ISO 8879:1986//ENTITIES Added Latin 1//EN"-->
%ISOlat1;
<!ENTITY % ISOlat2 system "ISOLat2"
  --"ISO 8879:1986//ENTITIES Added Latin 2//EN"-->
%ISOlat2;

<!-- Local mods to supplement ISO char ents and rename tags -->
<!ENTITY % CURIA.entities system 'curia.ent'>
%CURIA.entities;

<!ENTITY % TEI2.full.dtd "tei2.dtd">
%TEI2.full.dtd;
<!--ENTITY % TEI2.full.dtd "tei2.dtd">
%TEI2.full.dtd;-->

<!ENTITY % missing.bits "lang|oref|ovar|pref|pvar|link|xptr|xref|caesura|
  anchor|c|cl|m|phr|s|seg|w|att|gi|tag|val|formula|
  camera|caption|move|sound|tech|view|castlist|
  figure|table|textdesc|etree|graph|tree|
  particdesc|settingdesc|alt|altgrp|certainty|
  flib|fs|fslib|fvlib|interp|interpgrp|join|
  joingrp|linkgrp|respons|span|spangrp|timeline|
  epilogue|performance|prologue|set">
<!ELEMENT (%missing.bits;) - o EMPTY>

<!-- tei2.dtd: written by OddDTD 1994-09-09 -->
<!-- 3.6.1: File tei2.dtd: Main document type declaration -->
<!-- file -->

and then the rest of the DTD. Note the sequence is important:
> (BTW, I am primarily an OS/2 user, and although I have several
> Windows-based products, MS-published software is generally not
> welcome here. So the inexpensive MS Word SGML editor add-ons are
> not an option.)
I don't know if SQ do an OS/2 version of A/E. I haven't had a chance to test SGML Author for Word or WordPerfect SGML Edition with the TEI. Anyone done this yet?
///Peter