[This local archive copy is from the official and canonical URL, http://www.geocities.com/ResearchTriangle/Lab/6259/XTech99/xtech99.zip; please refer to the canonical source document if possible.]
This DTD allows <Fnote> in <Preface>, but does not allow <Fnote> in <Body>.
C:\regularity\cross>type a1.dtd
<!ELEMENT DOC (Preface?, Body)>
<!ELEMENT Preface ((Para|Fnote)*)>
<!ELEMENT Body (Para*)>
<!ELEMENT Para (#PCDATA)>
<!ELEMENT Fnote (#PCDATA)>
This DTD allows <Fnote> in <Body>, but does not allow <Fnote> in <Preface>.
C:\regularity\cross>type a2.dtd
<!ELEMENT DOC (Preface, Body)>
<!ELEMENT Preface (Para*)>
<!ELEMENT Body ((Para|Fnote)*)>
<!ELEMENT Para (#PCDATA)>
<!ELEMENT Fnote (#PCDATA)>
Here is an intersection DTD, which I constructed by hand. We want to automatically construct such a schema.
C:\regularity\cross>type a3.dtd
<!ELEMENT DOC (Preface,Body)>
<!ELEMENT Preface (Para*)>
<!ELEMENT Body (Para*)>
<!ELEMENT Para (#PCDATA)>
Before we show the construction, we have to introduce another syntax for expressing DTDs. In this syntax, the first DTD is represented as below:
C:\regularity\cross>type a1.rl
<!SCHEMA (NT-4)>
<!ELEMENT Fnote>
<!ELEMENT Para>
<!ELEMENT Body>
<!ELEMENT Preface>
<!ELEMENT DOC>
<!VARIABLE #PCDATA>
<!RULE [NT-0] Fnote (NT-5*)>
<!RULE [NT-1] Para (NT-5*)>
<!RULE [NT-2] Body (NT-1*)>
<!RULE [NT-3] Preface ((NT-1|NT-0)*)>
<!RULE [NT-4] DOC ((NT-3|""),NT-2)>
<!RULE [NT-5] #PCDATA>
Content models reference non-terminals rather than generic identifiers. For example, NT-4 is the start non-terminal. There is one rule for it, <!RULE [NT-4] DOC ((NT-3|""),NT-2)>, which has "DOC" as the generic identifier and ((NT-3|""),NT-2) as the content model. The non-terminal NT-3 likewise has only one rule, with "Preface" as the generic identifier and ((NT-1|NT-0)*) as the content model.
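The correspondence between the rule syntax and the DTD syntax can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the helper name to_dtd_model are my own assumptions, not part of the tools shown here. It substitutes the generic identifier for every non-terminal reference in a content model.

```python
import re

# Hypothetical in-memory form of a1.rl:
# non-terminal -> (generic identifier, content model as a string).
RULES = {
    "NT-0": ("Fnote", "(NT-5*)"),
    "NT-1": ("Para", "(NT-5*)"),
    "NT-2": ("Body", "(NT-1*)"),
    "NT-3": ("Preface", "((NT-1|NT-0)*)"),
    "NT-4": ("DOC", '((NT-3|""),NT-2)'),
    "NT-5": ("#PCDATA", None),  # the variable, which has no content model
}
START = "NT-4"


def to_dtd_model(model):
    """Replace each non-terminal reference by its generic identifier."""
    return re.sub(r"NT-\d+", lambda ref: RULES[ref.group(0)][0], model)


for nt, (name, model) in RULES.items():
    if model is not None:
        print(nt, name, to_dtd_model(model))
```

For the start non-terminal this yields ((Preface|""),Body), which is another way of writing the (Preface?, Body) of a1.dtd.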
C:\regularity\cross>type a2.rl
<!SCHEMA (NT-4)>
<!ELEMENT Fnote>
<!ELEMENT Para>
<!ELEMENT Body>
<!ELEMENT Preface>
<!ELEMENT DOC>
<!VARIABLE #PCDATA>
<!RULE [NT-0] Fnote (NT-5*)>
<!RULE [NT-1] Para (NT-5*)>
<!RULE [NT-2] Body ((NT-1|NT-0)*)>
<!RULE [NT-3] Preface (NT-1*)>
<!RULE [NT-4] DOC (NT-3,NT-2)>
<!RULE [NT-5] #PCDATA>
Let us automatically construct the intersection schema.
C:\regularity\cross>rcross a1.ha a2.ha | useful | renum | ha2sch | sch2rl
<!SCHEMA (NT-4)>
<!ELEMENT DOC>
<!ELEMENT Body>
<!ELEMENT Para>
<!ELEMENT Preface>
<!VARIABLE #PCDATA>
<!RULE [NT-4] DOC (NT-3,NT-2)>
<!RULE [NT-2] Body (NT-1*|"")>
<!RULE [NT-1] Para (NT-0*|"")>
<!RULE [NT-3] Preface (NT-1*|"")>
<!RULE [NT-0] #PCDATA>
Observe that this is equivalent to a3.dtd. Thus, we have successfully created the intersection schema.
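The equivalence can be spot-checked on a few sample documents. The sketch below is not the rcross construction. It is a small stand-in validator of my own, assuming documents are nested (name, children) tuples and each DTD is a dictionary from element name to a regular expression over space-terminated child names. It checks that a document is valid against a3.dtd exactly when it is valid against both a1.dtd and a2.dtd.

```python
import re


def valid(dtd, doc, root="DOC"):
    """Validate a (name, children) tree; text nodes are plain strings."""
    def check(node):
        if isinstance(node, str):
            return True
        name, children = node
        if name not in dtd:
            return False
        tokens = "".join(
            ("#PCDATA" if isinstance(c, str) else c[0]) + " " for c in children
        )
        return bool(re.fullmatch(dtd[name], tokens)) and all(map(check, children))
    return doc[0] == root and check(doc)


TEXT = r"(#PCDATA )*"
A1 = {"DOC": r"(Preface )?Body ", "Preface": r"((Para|Fnote) )*",
      "Body": r"(Para )*", "Para": TEXT, "Fnote": TEXT}
A2 = {"DOC": r"Preface Body ", "Preface": r"(Para )*",
      "Body": r"((Para|Fnote) )*", "Para": TEXT, "Fnote": TEXT}
A3 = {"DOC": r"Preface Body ", "Preface": r"(Para )*",
      "Body": r"(Para )*", "Para": TEXT}

para, fnote = ("Para", ["text"]), ("Fnote", ["note"])
DOCS = [
    ("DOC", [("Preface", [para]), ("Body", [para])]),   # valid against both
    ("DOC", [("Body", [para])]),                        # a1.dtd only
    ("DOC", [("Preface", [fnote]), ("Body", [])]),      # a1.dtd only
    ("DOC", [("Preface", []), ("Body", [fnote])]),      # a2.dtd only
]
for doc in DOCS:
    print(valid(A1, doc) and valid(A2, doc), valid(A3, doc))
```

A check on samples like this can expose a wrong hand-made intersection, but it cannot prove equivalence.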
This DTD does not allow <Fnote>.
C:\regularity\union>type a1.dtd
<!ELEMENT DOC (Preface?, Body)>
<!ELEMENT Preface (Para*)>
<!ELEMENT Body (Para*)>
<!ELEMENT Para (#PCDATA)>
This DTD has no <Preface>, and allows <Fnote> in <Para> (and hence within <Body>).
C:\regularity\union>type a2.dtd
<!ELEMENT DOC (Body)>
<!ELEMENT Body (Para*)>
<!ELEMENT Para (#PCDATA|Fnote)*>
<!ELEMENT Fnote (#PCDATA)>
We want to construct the union schema. I manually constructed a DTD (shown below), but it fails to capture the union schema.
C:\regularity\union>type a3.dtd
<!ELEMENT DOC (Preface?, Body)>
<!ELEMENT Preface (Para*)>
<!ELEMENT Body (Para*)>
<!ELEMENT Para (#PCDATA|Fnote)*>
<!ELEMENT Fnote (#PCDATA)>
This DTD allows <Preface> containing <Fnote>. However, a1.dtd does not allow <Fnote> and a2.dtd does not allow <Preface>. Thus, documents matching the union schema never have <Preface> containing <Fnote>; more generally, they never have both a <Preface> and an <Fnote>. Therefore, this DTD does not capture the union. The union schema should allow <Para> to have <Fnote> only when <Para> is subordinate to <Body> and the document has no <Preface>.
C:\regularity\union>type a1.rl
<!SCHEMA (NT-3)>
<!ELEMENT Para>
<!ELEMENT Body>
<!ELEMENT Preface>
<!ELEMENT DOC>
<!VARIABLE #PCDATA>
<!RULE [NT-0] Para (NT-4*)>
<!RULE [NT-1] Body (NT-0*)>
<!RULE [NT-2] Preface (NT-0*)>
<!RULE [NT-3] DOC ((NT-2|""),NT-1)>
<!RULE [NT-4] #PCDATA>
C:\regularity\union>type a2.rl
<!SCHEMA (NT-3)>
<!ELEMENT Para>
<!ELEMENT Body>
<!ELEMENT Preface>
<!ELEMENT DOC>
<!VARIABLE #PCDATA>
<!RULE [NT-0] Para (NT-4*)>
<!RULE [NT-1] Body (NT-0*)>
<!RULE [NT-2] Preface (NT-0*)>
<!RULE [NT-3] DOC ((NT-2|""),NT-1)>
<!RULE [NT-4] #PCDATA>
We can automatically construct the union schema.
C:\regularity\union>runion a1.ha a2.ha | useful | renum | ha2sch | sch2rl
<!SCHEMA (NT-5|NT-7|NT-9)>
<!ELEMENT DOC>
<!ELEMENT Body>
<!ELEMENT Para>
<!ELEMENT Fnote>
<!ELEMENT Preface>
<!VARIABLE #PCDATA>
<!RULE [NT-5] DOC (NT-1)>
<!RULE [NT-7] DOC (NT-4,NT-1)>
<!RULE [NT-9] DOC (NT-8)>
<!RULE [NT-1] Body (NT-2*|"")>
<!RULE [NT-8] Body (NT-2*,NT-6,(NT-2|NT-6)*)>
<!RULE [NT-2] Para (NT-0*|"")>
<!RULE [NT-6] Para (NT-0*,NT-3,(NT-0|NT-3)*)>
<!RULE [NT-3] Fnote (NT-0*|"")>
<!RULE [NT-4] Preface (NT-2*|"")>
<!RULE [NT-0] #PCDATA>
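Observe that <Para> for NT-6 requires at least one NT-3, which corresponds to <Fnote>, but <Para> for NT-2 allows #PCDATA only. NT-6 is referenced only from NT-8, a rule for <Body>, and NT-8 is in turn referenced only from NT-9, the rule for a <DOC> without <Preface>.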
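To see how such a schema decides validity when one generic identifier has several rules, here is a minimal bottom-up sketch in Python. It is my own illustration and not how the tools above work: the rules are transcribed by hand from the runion output, content models are written as regular expressions over space-terminated non-terminal names, and the function computes, for every node, the set of non-terminals that node can be assigned. It tries every combination of child assignments, so it is only suitable for tiny examples.

```python
import re
from itertools import product

# non-terminal -> (generic identifier, content model); NT-0 is #PCDATA.
RULES = {
    "NT-5": ("DOC", r"NT-1 "),
    "NT-7": ("DOC", r"NT-4 NT-1 "),
    "NT-9": ("DOC", r"NT-8 "),
    "NT-1": ("Body", r"(NT-2 )*"),
    "NT-8": ("Body", r"(NT-2 )*NT-6 ((NT-2|NT-6) )*"),
    "NT-2": ("Para", r"(NT-0 )*"),
    "NT-6": ("Para", r"(NT-0 )*NT-3 ((NT-0|NT-3) )*"),
    "NT-3": ("Fnote", r"(NT-0 )*"),
    "NT-4": ("Preface", r"(NT-2 )*"),
}
START = {"NT-5", "NT-7", "NT-9"}


def types(node):
    """Return the set of non-terminals that can derive this node."""
    if isinstance(node, str):          # text node
        return {"NT-0"}
    name, children = node
    child_types = [types(c) for c in children]
    found = set()
    for nt, (gi, model) in RULES.items():
        if gi != name:
            continue
        # Try every way of picking one non-terminal per child.
        for choice in product(*child_types):
            if re.fullmatch(model, "".join(t + " " for t in choice)):
                found.add(nt)
                break
    return found


def valid(doc):
    return bool(types(doc) & START)


plain = ("Para", ["text"])
noted = ("Para", ["text", ("Fnote", ["note"])])
print(valid(("DOC", [("Preface", [plain]), ("Body", [plain])])))  # a1.dtd case
print(valid(("DOC", [("Body", [plain, noted])])))                 # a2.dtd case
print(valid(("DOC", [("Preface", [plain]), ("Body", [noted])])))  # neither
```

The third document has both a <Preface> and an <Fnote>, so no start non-terminal applies to it, although the hand-made a3.dtd would accept it.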
The above union schema cannot be captured by the DTD syntax. However, we can automatically loosen the schema so that the result can be expressed in the DTD syntax.
C:\regularity\union>runion a1.ha a2.ha | useful | localize | ha2sch | sch2rl
<!SCHEMA (NT-0)>
<!ELEMENT DOC>
<!ELEMENT Body>
<!ELEMENT Para>
<!ELEMENT Fnote>
<!ELEMENT Preface>
<!VARIABLE #PCDATA>
<!RULE [NT-0] DOC (NT-1|NT-4,NT-1)>
<!RULE [NT-1] Body (NT-2*|"")>
<!RULE [NT-2] Para ((NT-3|NT-5)*|"")>
<!RULE [NT-3] Fnote (NT-5*|"")>
<!RULE [NT-4] Preface (NT-2*|"")>
<!RULE [NT-5] #PCDATA>
This schema is equivalent to a3.dtd, the DTD we constructed by hand.
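One plausible reading of this loosening step, sketched in Python under my own assumptions (I have not confirmed that the localize command works this way), is to merge all rules that share a generic identifier: replace every non-terminal reference by its generic identifier and take the alternation of the resulting content models. The sketch applies this to the exact union schema, with content models written as regular expressions over space-terminated names, and compares the merged models with those of a3.dtd on all short child sequences.

```python
import re
from collections import defaultdict
from itertools import product

# Hand transcription of the exact union schema (the renum output).
NAME = {"NT-0": "#PCDATA", "NT-1": "Body", "NT-2": "Para", "NT-3": "Fnote",
        "NT-4": "Preface", "NT-5": "DOC", "NT-6": "Para", "NT-7": "DOC",
        "NT-8": "Body", "NT-9": "DOC"}
MODEL = {
    "NT-5": "NT-1 ", "NT-7": "NT-4 NT-1 ", "NT-9": "NT-8 ",
    "NT-1": "(NT-2 )*", "NT-8": "(NT-2 )*NT-6 ((NT-2|NT-6) )*",
    "NT-2": "(NT-0 )*", "NT-6": "(NT-0 )*NT-3 ((NT-0|NT-3) )*",
    "NT-3": "(NT-0 )*", "NT-4": "(NT-2 )*",
}


def localize(name, model):
    """Merge rules per generic identifier (an assumed reading of the step)."""
    merged = defaultdict(list)
    for nt, m in model.items():
        local = re.sub(r"NT-\d+", lambda ref: name[ref.group(0)], m)
        if local not in merged[name[nt]]:
            merged[name[nt]].append(local)
    return {gi: "|".join(f"(?:{m})" for m in ms) for gi, ms in merged.items()}


def same_language(r1, r2, alphabet, max_len=4):
    """Bounded check: both models accept the same short child sequences."""
    for n in range(max_len + 1):
        for seq in product(alphabet, repeat=n):
            s = "".join(t + " " for t in seq)
            if bool(re.fullmatch(r1, s)) != bool(re.fullmatch(r2, s)):
                return False
    return True


LOCAL = localize(NAME, MODEL)
print(same_language(LOCAL["Para"], r"((#PCDATA|Fnote) )*", ["#PCDATA", "Fnote"]))
print(same_language(LOCAL["DOC"], r"(Preface )?Body ", ["Preface", "Body"]))
```

After merging, <Para> no longer remembers whether it sits under <Preface> or <Body>, which is exactly the information a DTD cannot express.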
This DTD allows <Fnote>.
C:\regularity\diff>type a1.dtd
<!ELEMENT DOC (Body)>
<!ELEMENT Body ((Para|Fnote)*)>
<!ELEMENT Para (#PCDATA)>
<!ELEMENT Fnote (#PCDATA)>
This DTD does not allow <Fnote>.
C:\regularity\diff>type a2.dtd
<!ELEMENT DOC (Body)>
<!ELEMENT Body (Para*)>
<!ELEMENT Para (#PCDATA)>
This DTD is constructed by hand. A document is valid against the difference schema (a1.dtd minus a2.dtd) exactly when it is valid against a1.dtd and has at least one <Fnote>.
C:\regularity\diff>type a3.dtd
<!ELEMENT DOC (Body)>
<!ELEMENT Body (Para*,Fnote,(Fnote|Para)*)>
<!ELEMENT Para (#PCDATA)>
<!ELEMENT Fnote (#PCDATA)>
We can automatically construct the difference schema.
C:\regularity\diff>hadiff a1.ha a2.ha | useful | renum | ha2sch | sch2rl
<!SCHEMA (NT-4)>
<!ELEMENT DOC>
<!ELEMENT Body>
<!ELEMENT Para>
<!ELEMENT Fnote>
<!VARIABLE #PCDATA>
<!RULE [NT-4] DOC (NT-3)>
<!RULE [NT-3] Body (NT-2*,NT-1,(NT-1|NT-2)*)>
<!RULE [NT-2] Para (NT-0*|"")>
<!RULE [NT-1] Fnote (NT-0*|"")>
<!RULE [NT-0] #PCDATA>
This DTD allows <Fnote> in both <Preface> and <Body>.
C:\regularity\diff>type b1.dtd
<!ELEMENT DOC (Preface, Body)>
<!ELEMENT Preface ((Para|Fnote)*)>
<!ELEMENT Body ((Para|Fnote)*)>
<!ELEMENT Para (#PCDATA)>
<!ELEMENT Fnote (#PCDATA)>
This DTD has <Preface> and <Body>, but does not allow <Fnote>.
C:\regularity\diff>type b2.dtd
<!ELEMENT DOC (Preface, Body)>
<!ELEMENT Preface (Para*)>
<!ELEMENT Body (Para*)>
<!ELEMENT Para (#PCDATA)>
I constructed this DTD by hand, but it fails to capture the difference schema. It requires both <Body> and <Preface> to have at least one <Fnote>. However, if either <Body> or <Preface> has <Fnote>, the document should be valid against the difference schema.
C:\regularity\diff>type b3.dtd
<!ELEMENT DOC (Preface, Body)>
<!ELEMENT Body (Para*,Fnote,(Fnote|Para)*)>
<!ELEMENT Preface (Para*,Fnote,(Fnote|Para)*)>
<!ELEMENT Para (#PCDATA)>
<!ELEMENT Fnote (#PCDATA)>
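The gap between b3.dtd and the intended difference can be shown with a concrete document. The following Python sketch uses a small stand-in validator of my own (documents as nested (name, children) tuples, DTDs as dictionaries of regular expressions over space-terminated child names, all of which are assumptions for illustration). It defines the difference directly as "valid against b1.dtd and not valid against b2.dtd" and compares that with b3.dtd.

```python
import re


def valid(dtd, doc, root="DOC"):
    """Validate a (name, children) tree; text nodes are plain strings."""
    def check(node):
        if isinstance(node, str):
            return True
        name, children = node
        if name not in dtd:
            return False
        tokens = "".join(
            ("#PCDATA" if isinstance(c, str) else c[0]) + " " for c in children
        )
        return bool(re.fullmatch(dtd[name], tokens)) and all(map(check, children))
    return doc[0] == root and check(doc)


TEXT = r"(#PCDATA )*"
ANY = r"((Para|Fnote) )*"                      # Fnote optional
SOME = r"(Para )*Fnote ((Fnote|Para) )*"       # at least one Fnote
B1 = {"DOC": r"Preface Body ", "Preface": ANY, "Body": ANY,
      "Para": TEXT, "Fnote": TEXT}
B2 = {"DOC": r"Preface Body ", "Preface": r"(Para )*", "Body": r"(Para )*",
      "Para": TEXT}
B3 = {"DOC": r"Preface Body ", "Preface": SOME, "Body": SOME,
      "Para": TEXT, "Fnote": TEXT}


def in_difference(doc):
    return valid(B1, doc) and not valid(B2, doc)


para, fnote = ("Para", ["text"]), ("Fnote", ["note"])
both = ("DOC", [("Preface", [fnote]), ("Body", [fnote])])
body_only = ("DOC", [("Preface", [para]), ("Body", [para, fnote])])
neither = ("DOC", [("Preface", [para]), ("Body", [para])])

for doc in (both, body_only, neither):
    print(in_difference(doc), valid(B3, doc))
```

The document with <Fnote> only in <Body> belongs to the difference, yet b3.dtd rejects it.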
We can automatically construct the difference schema.
C:\regularity\diff>hadiff b1.ha b2.ha | useful | renum | ha2sch | sch2rl
<!SCHEMA (NT-7)>
<!ELEMENT DOC>
<!ELEMENT Body>
<!ELEMENT Para>
<!ELEMENT Fnote>
<!ELEMENT Preface>
<!VARIABLE #PCDATA>
<!RULE [NT-7] DOC (NT-6,NT-3|NT-6,NT-5|NT-4,NT-5)>
<!RULE [NT-3] Body (NT-2*|"")>
<!RULE [NT-5] Body (NT-2*,NT-1,(NT-1|NT-2)*)>
<!RULE [NT-2] Para (NT-0*|"")>
<!RULE [NT-1] Fnote (NT-0*|"")>
<!RULE [NT-4] Preface (NT-2*|"")>
<!RULE [NT-6] Preface (NT-2*,NT-1,(NT-1|NT-2)*)>
<!RULE [NT-0] #PCDATA>
Note that the rule for NT-5 mandates <Fnote> in <Body> and the rule for NT-6 mandates <Fnote> in <Preface>. Every alternative in the rule for NT-7 (which is the start non-terminal) references at least one of NT-5 and NT-6.
The above difference schema cannot be captured by the DTD syntax. However, we can automatically loosen the schema so that the result can be expressed in the DTD syntax.
C:\regularity\diff>hadiff b1.ha b2.ha | useful | renum | localize | ha2sch | sch2rl
<!SCHEMA (NT-4)>
<!ELEMENT DOC>
<!ELEMENT Body>
<!ELEMENT Para>
<!ELEMENT Fnote>
<!ELEMENT Preface>
<!VARIABLE #PCDATA>
<!RULE [NT-4] DOC (NT-3,NT-2)>
<!RULE [NT-2] Body ((NT-0|NT-1)*|"")>
<!RULE [NT-1] Para (NT-5*|"")>
<!RULE [NT-0] Fnote (NT-5*|"")>
<!RULE [NT-3] Preface ((NT-0|NT-1)*|"")>
<!RULE [NT-5] #PCDATA>
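This schema is equivalent to b1.dtd rather than to the DTD we constructed by hand (b3.dtd): after loosening, <Fnote> is allowed in both <Preface> and <Body> but is no longer required in either.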