This document describes an element set consistency language used by the consistency check engine developed for consistencycheck.com. The following sections will explain the basic ideas behind the language and the approach taken to the expression of consistency relationships, present the new language and a few introductory examples and finally give the DTD of the new language.
Note: This document is a preliminary release draft Before reading the overview you should already be familiar with the broad concept of checking consistency between XML documents. You will need to understand XML, DTDs, XPath and a bit of set theory (negligible). Please contact us if you want more information.
This rule is to ensure that students and staff can get access to information, e.g. prerequisites, reading lists, about each possible course. We will leave aside the DTD for the syllabus information, let's say it is very simple: The courses are kept in one XML file and there is one XML file for each course syllabus. A connection between the two can be made via the course code.
Here is an example syllabus for a course (with a lot of information removed):
complexity.xml
<syllabus> <identity> <title>Computational Complexity</title> <code>3C04</code> <summary>Theory of computation and complexity.</summary> </identity> <workload> <workload_element workload_hours="30" workload_mode="Lectures"/> <workload_element workload_hours="65" workload_mode="Private_Reading"/> <workload_element workload_hours="12" workload_mode="Required_Written_Work"/> <workload_element workload_hours="33" workload_mode="Revision"/> <workload_total>140</workload_total> </workload> </syllabus>And here is a sample degree course (again, with a lot of information removed).
samplecourse.xml
<StudyPlan> <StudentIdentity> <FirstName>Homer</FirstName> <FamilyName>Finkelstein</FamilyName> <Reg_Number>3645</Reg_Number> </StudentIdentity> <Programme> <Title>CS</Title> <Award>BSc</Award> </Programme> <Year number="1"> <Course> <Name>Computer Architecture and Networks</Name> <Code>1B10</Code> </Course> <Course> <Name>Consistency Checking of Distributed Documents</Name> <Code>0xffff</Code> </Course> </Year> </StudyPlan>Let us formulate the example consistency rule in a more precise way, but still in English: For every course module in the curriculum, a syllabus entry exists with the same code.
To illustrate the thinking behind the consistency rule language, we will now parse the english text of the sentence to point out the important bits:
Set 1 | Set 2 |
---|---|
Course (1B10) | code (1B10) |
Course (1B12) | code (1B12) |
Course (1BGOOD) |
To summarize, we will generate sets of elements (e.g. Course and code elements) and for each element, we will generate a relative set (e.g. the code attribute) for comparison with the relative set of another element.
Our approach is to execute a filter query (more or less equivalent to the idea of "restriction" in the relational algebra, but filter sounds more user-friendly) on the destination set, predicated by the current source node's relative set. For example, suppose we have just started our traversal of the source set and are at entry 1, "Course (1B10)". We now execute a filter roughly of the sort "Code = text()" (the text() function is xpath notation and refers to the text contained in the code element of the syllabus identity), which keeps all nodes in the destination set which have the same code as the current source node. The resulting table is:
Set 1 | Filtered Set 2 (*Code = text()) |
---|---|
* Course (1B10) | code (1B10) |
Course (1B12) | |
Course (1BGOOD) |
The observation that the filtered set is not empty tells us that our existential quantifier ("there exists a syllabus entry..") is satisfied. The current source element is thus consistent with our rule. Now, the processing will continue for each node in the source set. Suppose we arrive at the last entry, "Course (1BGOOD)". The destination set will be filtered using "Code = text()", giving
Set 1 | Filtered Set 2 (*Code = text()) |
---|---|
Course (1B10) | |
Course (1B12) | |
* Course (1BGOOD) |
Unfortunately, the filtered destination set is empty. We conclude that there is no syllabus information for the course with code "1BGOOD". The current source element is inconsistent with our constraint.
The user could also specify "generate inconsistent links". In this case, every element in the source set will get an incosistent link to the elements in the filtered destination set. Imagine a rule of the form For every course in the degree, there must not be a syllabus with the same name, however silly this example may sound.
Finally, the user could specify "generate inconsistent links to nowhere". This would apply in our example. In Table 3, the destination set is empty. The current source element is therefore inconsistent by the definition of our rule. However, where do we link if there is nothing to link to? The source element is not inconsistent with anything else, it is inconsistent with itself, or the system it resides in. We produce an inconsistent link that doesn't point anywhere (or to the element itself).
At the end of this example, we should have the following links:
<ConsistencyRule id="r1"> <Description>Each course in the studyplan has a syllabus </Description> <SetDefinition id="s1"> /StudyPlan/Year/Course </SetDefinition> <SetDefinition id="d1"> /syllabus/identity </SetDefinition> <Forall setid="s1"> <SizeNotEqual cmode="C" imode="IX"> <Filter setid="d1"> <Equal> <XPathSource value="Code"/> <XPathFilter value="code"/> </Equal> </Filter> <Integer value="0"/> </SizeNotEqual> </Forall> </ConsistencyRule>Now, looking at the rule, a destination element goes into the filtered set if the equal expression, which works on the relative xpath expressions inside, returns true. The XPathSource element will generate an attribute set relative to the nearest enclosing forall expression (there can only be one forall anyway, but these are the semantics). The XPathFilter expression works on the set in the nearest enclosing Filter element. Again, there can only be one filter element containing an XPathFilter, but this is the way this constraint is enforced in our DTD.
The Filter element will thus filter those nodes out of the destination set for which the filter expression returns false. The second parameter to the SizeNotEqual element is an integer with the value zero. The size of the filtered set will be compared to the integer. If they are not equal, consistent links are generated into the filtered set from the current source node, otherwise an inconsistent link to "missing" is generated from the current source node.
<ConsistencyRule> <SetDefinition id="source"> /Year[@number="1"]/Course </SetDefinition> <SetDefinition id="destination"> /Curricula/Curriculum/Year/Course/Type[@requirement="C"]/.. </SetDefinition> <Forall setid="source"> <SizeEqual cmode="CX" imode="IX"> <Filter setid="destination"> <And> <Equal> <XPathSource value="Code"/> <XPathFilter value="Code"/> </Equal> <Equal> <XPathFilter value="Type[@requirement]"/> <Constant value="C"/> </Equal> </And> </Filter> <Integer value="8"/> </SizeEqual> </Forall> </ConsistencyRule>First of all, observe that using the XPath expression /Year[@number="1"]/Course we can already pre-filter the source set to only include first year courses. In addition, we only want the Course elements which contain a Type subelement with the requirements attribute set to "C", again this is done by XPath.
Now we filter the set of course modules
to include only the courses with the same code as the degree programme.
If the final, filtered set is of size 8, we generate a consistent link
from the current source node to the rule (specified by "cmode=CX" in the
SizeEqual element). This way of handling the consistency links seems to
make more sense than linking to all destination courses (the source node
is consistent towards the rule, but it cannot be said that it is "consistent
with its modules"). Similarly, if the size of the destination set is above
or below eight, we generate an inconsistent link to the rule ("imode=IX").
<ConsistencyRule id="r6">
<SetDefinition id="s1">
/syllabus/identity
</SetDefinition>
<SetDefinition id="d1">
/syllabus/identity
</SetDefinition>
<Forall setid="s1" mode="oneSet">
<SizeNotEqual cmode="CN"
imode="IX">
<Filter setid="d1" mode="oneSet">
<And>
<Equal>
<XPathSource value="code"/>
<XPathFilter value="code"/>
</Equal>
<IsIntersect>
<XPathSource value="code"/>
<XPathFilter value="closure(code,../../subjet/prerequisites/pre_code)"/>
</IsIntersect>
</And>
</Filter>
<Integer value="0"/>
</SizeNotEqual>
</Forall>
</ConsistencyRule>
The way the transitive closure operator
works is as follows: given a course element, it first retrieves its direct
prerequisites and then it repeats this process recursively, until all the
prerequisites have been found. We then only have to check whether or not
the course is listed (IsInteresect operator)
among them.
<ConsistencyRule id="r6">
<SetDefinition id="s1">
/StudyPlan/Year/Course
</SetDefinition>
<SetDefinition id="d1">
/Curricula/Curriculum/Year/Course
</SetDefinition>
<Forall setid="s1" mode="multipleSet">
<SizeNotEqual cmode="C"
imode="IX">
<Filter setid="d1" mode="oneSet"/>
<Equal>
<XPathSource value"Code"/>
<XPathFilter value="Code">
</Equal>
<Integer value="0"/>
</SizeNotEqual>
</Forall>
</ConsistencyRule>
The number of courses retrieved in the destination set "d1" is quite high (about 500). During the filtering operation we have to evaluate the xpath expression "Code" on each of these courses, and this operation is time-consuming. What we would like to do instead is to work on a destination set containing only those courses relevant to this studyplan, that is only the courses that belong to the same programme undertaken by the student. Using variables we are able to cut down the number the courses that belong to the destination set, as shown below:
<SetDefinition id="d1">
/Curricula/Curriculum/Programme/Title[text()=$s1/../../Programme/Title]/../Award[text()=$s1/../../Programme/Award]/../../Year/Course
</SetDefinition>
The variable "$s1"
refers to the definition set having the same id. In principle this variable
could be bound to any set defined inside the rule; at the moment we are
using a simplified version of this approach so that this variable is always
bound to the set that is referred to by the "Forall"
statement.
This pre-filtering operation drastically
reduces the cardinality of the destination set so that the filtering operation
can be performed much quicker. In some circumstances, the pre-filtering
operation actually does all the filtering necessary, and all we have to
do is to check the cardinality of the destination set. This is the case
with the rule above that can be rewritten in the following way.
<ConsistencyRule>
<SetDefinition id="s1">
/StudyPlan/Year/Course
</SetDefinition>
<SetDefinition id="d1">
/Curricula/Curriculum/Programme/Title[text()=$s1/../../Programme/Title]/../Award[text()=$s1/../../Programme/Award]/../../
Year[@number=$s1/../@number]/Course/Code[text()=$s1/Code]/..
</SetDefinition>
<Forall setid="s1" mode="multipleSet">
<SizeNotEqual cmode="C"
imode="IX">
<Filtered setid="d1" mode="oneSet"/>
<Integer value="0"/>
</SizeNotEqual>
</Forall>
</ConsistencyRule>
<?xml version="1.0" encoding="ISO-8859-1"?>
<!ENTITY % Boolean
"And,Or">
<!ENTITY % MultiOperator
"AndOperator,OrOperator">
<!ELEMENT XPathSource
EMPTY>
<!ATTLIST XPathSource
value
CDATA #REQUIRED>
<!ELEMENT XPathFilter
EMPTY>
<!ATTLIST XPathFilter
value
CDATA #REQUIRED>
<!ELEMENT Constant
EMPTY>
<!ATTLIST Constant
value
CDATA #REQUIRED>
<!ELEMENT Integer
EMPTY>
<!ATTLIST Integer
value
CDATA "0">
<!-- ConsistencyRule -->
<!ELEMENT ConsistencyRuleSet
(ConsistencyRule)*>
<!ELEMENT ConsistencyRule
(Description,SetDefinition*,Forall)>
<!ATTLIST ConsistencyRule
id ID #REQUIRED>
<!ELEMENT Description (#PCDATA)>
<!ELEMENT SetDefinition
(#PCDATA)>
<!ATTLIST SetDefinition
id
ID #REQUIRED>
<!-- Forall -->
<!ELEMENT Forall (%MultiOperator;,SizeEqual,SizeNotEqual)>
<!ATTLIST Forall
setid
IDREF #REQUIRED
mode
(oneSet|multipleSet) "multipleSet">
<!ELEMENT AndOperator
(SizeEqual*,SizeNotEqual*,AndOperator*,OrOperator*)>
<!ATTLIST AndOperator
cmode
(C|CX) "C"
imode
(I|IX) "">
<!ELEMENT OrOperator
(SizeEqual*,SizeNotEqual*,AndOperator*,OrOperator*)>
<!ATTLIST OrOperator
cmode
(C|CX) "C"
imode
(I|IX) "">
<!ELEMENT SizeEqual
(Filter*,Filtered*,Integer*)>
<!ATTLIST SizeEqual
cmode
(C|CX) "C"
imode
(I|IX) "">
<!ELEMENT SizeNotEqual
(Filter*,Filtered*,Integer*)>
<!ATTLIST SizeNotEqual
cmode
(C|CX) "C"
imode
(I|IX) "">
<!-- Filters -->
<!ELEMENT Filter (%Boolean;,Equal,NotEqual,IsIntersect,Subset)>
<!ATTLIST Filter
setid
IDREF #REQUIRED
mode
(oneSet|multipleSet) "multipleSet">
<!ELEMENT Filtered
(%Boolean;,Equal,NotEqual,IsIntersect,Subset)?>
<!ATTLIST Filtered
setid
IDREF #REQUIRED
mode
(oneSet|multipleSet) "multipleSet">
<!ELEMENT Equal (XPathSource,XPathFilter,Constant*)>
<!ELEMENT NotEqual
(XPathSource,XPathFilter,Constant*)>
<!ELEMENT IsIntersect
(XPathSource,XPathFilter,Constant*)>
<!ATTLIST IsIntersect
size CDATA #REQUIRED>
<!ELEMENT Subset
(XPathSource,XPathFilter,Constant*)>
<!ATTLIST Subset
size CDATA #REQUIRED>
<!ELEMENT And (Equal*,NotEqual*,IsIntersect*,Subset*,And*,Or*)>
<!ELEMENT Or (Equal*,NotEqual*,IsIntersect*,Subset*,And*,Or*)>