This paradigm will be more or less satisfactory in expressing the constraints depending on the particular case; indeed, when the schema was written before the document, the schematic paradigm will influence the structures allowed and used to express interlocking constraints. Thus the schema paradigm acts to extend the effective vocabulary of markup types beyond those provided by the particular markup language. [SGML] DTDs allow 'global inclusion exceptions'; [XML] DTDs do not allow them, so XML DTDs cannot express the particular constraint.
In recognition that any schematic paradigm imposes limitations on the constraints expressible by a schema, SGML refers to its grammar declarations as 'content models'. Thus one can draw a terminological distinction between what is expected of a schema and of a model: the former defines canonically or exhaustively, whilst the latter describes as best it can according to its schematic paradigm.
Conventionally, starting with SGML DTDs, schemas for markup languages are defined in terms of grammars to regulate element containment, lists to regulate attribute containment, augmented by datatype constraints on various information elements. In XML, these additional constraints are concerned with enabling graph structures to be represented, rather than describing the semantics or types of information elements. In the late 1990s, many schema languages were developed for XML in anticipation of the development of the World Wide Web Consortium's [XML Schema] schema language. All these schema languages used the grammar-founded approach mentioned above, elaborating on them using objects [SOX], modules [RELAX], production selectors [Assertion Grammars], etc.
But these approaches have certain deficiencies. For a start, SGML used grammars as its schematic paradigm because one does indeed define grammars with it, down to the lexical level. The grammar paradigm is not necessary for schemas for XML. Secondly, the grammar approach cannot express constraints between information items in different branches of the attribute-value tree that forms the primary view of an XML document. The mechanisms for declaring unique identifiers and references do not alter this. Mechanisms such as the one proposed for [XML Schema] to introduce grammar non-terminals (termed the tag/type distinction) allow an element name to have a different datatype or content model depending on its parent; however, this merely allows the type of an element to be constrained by its parent's type as well as by its generic identifier (i.e., by its tag).
These deficiencies are nothing more than schema paradigm limitations. However there are other pragmatic and policy considerations that may make a grammar-based schema paradigm unattractive.
Secondly, and perhaps related to the previous point, if a schematic paradigm has linguistic or cultural affinities, it may be even more likely that different schema languages will be more or less useful to people with different cognitive impairments. I need only take this point as far as the obvious observation that a complex schema paradigm may be more difficult than a simple one. And the difficulty of a schematic paradigm may lie not just in its complexity to explain but also in its complexity to use and implement.
Thirdly, when considering the constraints on documents that are needed to support accessibility by disabled people, we come up against what I regard as a fundamental shortfall in existing schema languages: they are designed to support definitional schemas, which are intended to specify the required constraints on a document exhaustively or canonically. However, accessibility constraints are policy constraints imposed on a document in addition to the constraints required to define that document.
Fourthly, once we admit that there can be important non-definitional constraints on a document, the question arises of what other non-definitional constraints there can be. The main one I identify is the requirements of workflow: some constraints may only come into existence during some phase of a document's life cycle. Without some notion of constraints that come into play during a phase, one must either weaken the constraints in a schema or arbitrarily switch schemas during the document's life cycle.
Fifthly, building on the notion that it is useful to be able to switch constraints in and out during formally defined phases of a document's life cycle, we can see that the ability to group constraints and switch them in and out on an ad hoc basis while editing a document would also be useful. It is a common difficulty with validating editors for structured documents that otiose errors are reported for documents that are incomplete and still under construction.
Thus one consideration leads to the next, and the result can be considerable doubt that grammars form an adequate schematic paradigm for documents for the World Wide Web. These are some of the needs and considerations underpinning development of the Schematron assertion language.
The <assert> element is used to tag positive assertions about a document. For example,
<assert>A 'dog' element should contain two 'ear' elements.</assert>
This assertion is something that is expected to be true of the document. If a document is validated against the schema and the test for this assertion fails, an application can take some action. Schematron does not specify any actions: it only allows assertions to be tested, parts of assertions to be given roles, assertions to be grouped into rules, rules to be grouped into patterns, and patterns to be activated in various phases.
The <report> element is used to tag negative assertions about a document. For example,
<report>This dog has a bone.</report>
Within these two elements, it is possible to use a <name> element, which gives the specific name of the context element for which the assert statement failed or the report statement succeeded. The <name> element can also have a path attribute, whose value is an [XPath] expression; this allows the name of an element or attribute other than the context element to be given. Because some implementations of Schematron may format these names differently, an element <emph> is also allowed; its only use is to let names of elements or attributes mentioned in assertions have the same format as the names produced by evaluating the <name> element.
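To make the behaviour of <name> and <emph> concrete, here is a minimal Python sketch using the standard library's ElementTree. It rests on several assumptions. It uses the path attribute declared for <name> in the DTD at the end of this section, with '.' as the default. It handles only the simple child paths that ElementTree can evaluate, not full [XPath]. Its convention of quoting names is purely hypothetical. The function name render_message is illustrative and not part of Schematron.

```python
import xml.etree.ElementTree as ET

def render_message(message_el, context_el):
    """Render an <assert>/<report> message, expanding <name> and <emph>."""
    parts = [message_el.text or ""]
    for child in message_el:
        if child.tag == "name":
            path = child.get("path", ".")  # '.' is the DTD's suggested default
            # Only simple ElementTree child paths are supported (assumption).
            target = context_el if path == "." else context_el.find(path)
            # Names are quoted; this formatting choice is hypothetical.
            parts.append("'%s'" % (target.tag if target is not None else "?"))
        elif child.tag == "emph":
            # Format <emph> text exactly like a name produced by <name>.
            parts.append("'%s'" % (child.text or ""))
        else:
            parts.append(child.text or "")
        parts.append(child.tail or "")
    return "".join(parts)

assertion = ET.fromstring(
    '<assert test="count(ear) = 2">A <name/> element should contain '
    'two <emph>ear</emph> elements.</assert>')
print(render_message(assertion, ET.fromstring("<dog><ear/></dog>")))
# A 'dog' element should contain two 'ear' elements.
```

Because <emph> is rendered with the same quoting as <name>, a message reads consistently whether a name was computed or written by hand. That consistency is the purpose the text gives for <emph>.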
For internationalization, the element <dir> can be used inside these two elements to support bidirectional written languages; the semantics are those of the dir attribute of [HTML]. The elements may also have an xml:lang attribute for tagging the written language of the contents of the element; note that the xml:lang attribute does not express the language of the target document.
For better formatting of assertion reports, these two elements may also have an icon attribute, which is the [URL] of a small image that may provide visual cues to a user.
These two elements can also have a subject attribute. This is an [XPath] path which allows very direct specification of the subject of the assertion: this may be useful information for automatically generating [RDF] documents.
The full declarations for the assertions above are:
<rule context="dog">
  <assert test="count(ear) = 2"
    >A 'dog' element should contain two 'ear' elements.</assert>
  <report test="bone"
    >This dog has a bone.</report>
</rule>
These three elements are the operational core of Schematron. [XPath] expressions allow a very wide range of constraints to be expressed: based on element and attribute names, on their position and occurrence, on text values, and on counts. In the example, the context is every element with the generic identifier 'dog'; the test in the <assert> element counts the number of child elements with the generic identifier 'ear'. Neither assertion in this rule is triggered by the following XML document:
<dog><ear/><ear/></dog>
The context attribute is an [XPath] path in the form allowed for [XSLT] patterns, so that alternatives can be combined with '|', for example. The test attributes are general [XPath] expressions, which allow logical operators such as 'and' and 'or'.
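The assert/report semantics of the rule above can be sketched in Python. This is not a Schematron implementation. To stay within the standard library, the [XPath] tests count(ear) = 2 and bone are replaced by Python functions. RULE and validate are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Stand-ins for the XPath tests "count(ear) = 2" and "bone" (an assumption).
RULE = {
    "context": "dog",
    "asserts": [(lambda n: len(n.findall("ear")) == 2,
                 "A 'dog' element should contain two 'ear' elements.")],
    "reports": [(lambda n: n.find("bone") is not None,
                 "This dog has a bone.")],
}

def validate(doc, rule):
    """Return messages for failed asserts and successful reports."""
    messages = []
    # iter() also visits the root, so a root <dog> is a context node too.
    for node in doc.iter(rule["context"]):
        for test, msg in rule["asserts"]:
            if not test(node):  # an assert speaks when its test is false
                messages.append(msg)
        for test, msg in rule["reports"]:
            if test(node):      # a report speaks when its test is true
                messages.append(msg)
    return messages

print(validate(ET.fromstring("<dog><ear/><ear/></dog>"), RULE))  # []
print(validate(ET.fromstring("<dog><ear/><bone/></dog>"), RULE))
```

The second call produces both messages. The dog has only one ear, so the assertion fails, and it has a bone, so the report succeeds.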
The <rule>, <assert> and <report> elements can each have a role attribute. This is a name, within the schema, that identifies the role being played. These elements can also have id attributes.
This double path system is reminiscent of SQL queries: one could consider a query SELECT x FROM y WHERE z IS a as a context statement (i.e., 'WHERE z IS a') and a test (i.e., 'SELECT x FROM y').
A <rule> element can also contain <key> elements, which allow [XSLT]'s key mechanism to be used. This allows reference constraints to be tested in various ways, and is more powerful than the [XML] ID/IDREF mechanism. The path attribute is an [XPath] path; the name attribute is a token naming the key. The icon attribute allows an icon to be specified.
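The following Python sketch shows what such a key buys. It assumes that a declaration like <key name="dogs" path="@name"/>, inside a rule whose context is "dog", behaves like the [XSLT] declaration <xsl:key name="dogs" match="dog" use="@name"/>. Under that assumption, a rule on walk elements could test key('dogs', @dog). The element names kennel and walk, and the functions build_key and check_walks, are hypothetical. Unlike ID/IDREF, the indexed values need not be declared as IDs and need not be unique.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_key(doc, context_tag, attr):
    """Index nodes matching context_tag by the value of attribute attr."""
    index = defaultdict(list)
    for node in doc.iter(context_tag):
        value = node.get(attr)
        if value is not None:
            index[value].append(node)
    return index

def check_walks(doc):
    """Report every walk whose dog attribute matches no indexed dog."""
    dogs = build_key(doc, "dog", "name")
    # Roughly equivalent to <assert test="key('dogs', @dog)"> on each walk.
    return ["walk refers to unknown dog %r" % walk.get("dog")
            for walk in doc.iter("walk")
            if not dogs.get(walk.get("dog"))]

doc = ET.fromstring(
    "<kennel><dog name='rex'/><dog name='fido'/>"
    "<walk dog='rex'/><walk dog='spot'/></kennel>")
print(check_walks(doc))  # ["walk refers to unknown dog 'spot'"]
```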
An important feature to note is that, because of [XSLT]'s document() function, a Schematron assertion test can refer to data in a document other than the context document. This allows Schematron schemas to be used for two important purposes: to validate against a controlled vocabulary located externally to the schema (indeed, the vocabulary can be in any XML document type, not just a Schematron schema), and to validate the output of a program against data found in its input (or vice versa) as a form of black-box testing.
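The following Python sketch illustrates the controlled-vocabulary use. In Schematron, a test such as document('breeds.xml')//breed/@code = @breed, evaluated on each dog, would read the vocabulary from a second file. Here the file name, the breed and code names, and the functions are all hypothetical, and the vocabulary is inlined as a string so the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical external vocabulary, inlined instead of read from breeds.xml.
BREEDS = "<breeds><breed code='beagle'/><breed code='collie'/></breeds>"

def load_vocabulary(xml_text):
    """Collect the allowed codes from the external vocabulary document."""
    return {b.get("code") for b in ET.fromstring(xml_text).iter("breed")}

def unknown_breeds(doc, vocabulary):
    """Return the breed values of dogs that are not in the vocabulary."""
    return [d.get("breed") for d in doc.iter("dog")
            if d.get("breed") not in vocabulary]

kennel = ET.fromstring(
    "<kennel><dog breed='beagle'/><dog breed='poodle'/></kennel>")
print(unknown_breeds(kennel, load_vocabulary(BREEDS)))  # ['poodle']
```

Keeping the vocabulary in its own document means it can be maintained separately from the schema, which is the advantage the text points out.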
A simple macro mechanism is allowed on rules. A <rule> element can have one or more <extends> elements. These have a rule attribute, which is the identifier of another rule. This allows you to bring in the assertions of an abstract rule, which is a rule specified with an abstract attribute with the value "yes". An abstract rule element cannot have a context attribute. As an example, this constraint can itself be specified in Schematron as follows (using unabbreviated [XPath] paths):
<rule context="sch:rule">
  <assert test="not(attribute::abstract='yes') or not(attribute::context)"
    >An abstract rule cannot have a context attribute.</assert>
  <assert test="not(attribute::abstract='no') or attribute::context"
    >A rule should have a context attribute (except for abstract rules).</assert>
  <assert test="attribute::abstract or attribute::context"
    >A rule should have a context attribute (except for abstract rules).</assert>
  <report test="attribute::abstract and not(attribute::abstract='yes') and not(attribute::abstract='no')"
    >In a rule, the abstract attribute is optional, and can have the values 'yes' or 'no'.</report>
</rule>
Note from this example that Schematron assertions can be very specific. A simpler schema might well be just as effective, or the various assertions could be combined into a single larger test with a more general message.
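The <extends> mechanism can be pictured as textual expansion. The Python sketch below replaces each <extends rule="..."/> with copies of the assertions of the named abstract rule, and then drops the abstract rules, since they have no context of their own. It is an assumption about one way an implementation might proceed, not a description of any particular one. It uses unprefixed element names for brevity and does not handle abstract rules that themselves extend other rules.

```python
import copy
import xml.etree.ElementTree as ET

def expand_extends(pattern):
    """Inline abstract rules into the concrete rules that extend them."""
    abstract = {r.get("id"): r for r in pattern.findall("rule")
                if r.get("abstract") == "yes"}
    for rule in pattern.findall("rule"):
        if rule.get("abstract") == "yes":
            continue
        expanded = []
        for child in rule:
            if child.tag == "extends":
                source = abstract[child.get("rule")]
                expanded.extend(copy.deepcopy(c) for c in source)
            else:
                expanded.append(child)
        rule[:] = expanded  # replace the children in place
    # Abstract rules have no context, so remove them after expansion.
    for rule in abstract.values():
        pattern.remove(rule)
    return pattern

pattern = ET.fromstring("""<pattern name="Animals">
  <rule id="has-ears" abstract="yes">
    <assert test="count(ear) = 2">Two ears expected.</assert>
  </rule>
  <rule context="dog">
    <extends rule="has-ears"/>
    <report test="bone">This dog has a bone.</report>
  </rule>
</pattern>""")
expand_extends(pattern)
print([c.tag for c in pattern.find("rule")])  # ['assert', 'report']
```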
Pattern elements have various attributes. The name attribute allows a simple human-readable string to be given to identify the pattern. The id attribute allows a unique identifier to be assigned for reference purposes. The fpi attribute allows an [SGML] Formal Public Identifier to be attached. The see attribute allows a [URL] to be specified which gives some human-readable documentation for the pattern; a hypertext presentation of the validation results can link to that resource.
A <pattern> element can have an icon attribute.
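As an illustration of the hypertext presentation mentioned for the see attribute, the Python sketch below turns validation results, grouped by pattern, into a small HTML fragment. The pattern name links to its see URL when one is given. The input shape, the function name html_report and the example URL are hypothetical.

```python
from html import escape

def html_report(results):
    """Render (pattern name, see URL or None, messages) triples as HTML."""
    out = []
    for name, see, messages in results:
        heading = escape(name)
        if see:  # link the pattern name to its human-readable documentation
            heading = '<a href="%s">%s</a>' % (escape(see), heading)
        out.append("<h2>%s</h2>" % heading)
        out.append("<ul>")
        out.extend("<li>%s</li>" % escape(m) for m in messages)
        out.append("</ul>")
    return "\n".join(out)

print(html_report([("Dogs", "http://example.org/dogs.html",
                    ["This dog has a bone."])]))
```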
Typically the schema will be declared using XML [Namespaces] conventions. The preferred prefix is sch and the appropriate namespace URI is
http://www.ascc.net/xml/schematron
A complete Schematron schema document is thus as follows:
<?xml version="1.0" encoding="US-ASCII"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:title>Example Schematron Schema</sch:title>
  <sch:pattern>
    <sch:rule context="dog">
      <sch:assert test="count(ear) = 2"
        >A 'dog' element should contain two 'ear' elements.</sch:assert>
      <sch:report test="bone"
        >This dog has a bone.</sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
The <schema> element can have a ns attribute which gives the namespace URI that role attributes will have, if the role is used to externally mark up the target document.
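Because the schema is an ordinary namespaced XML document, a tool can read it with any namespace-aware parser. The Python sketch below lists each assertion of the example schema as a (context, kind, test, message) tuple, using ElementTree's prefix map for path queries. The XML declaration is left out of the string because Python re-encodes string input as UTF-8. The names SCH, NS, SCHEMA and list_assertions are illustrative.

```python
import xml.etree.ElementTree as ET

SCH = "http://www.ascc.net/xml/schematron"
NS = {"sch": SCH}  # prefix map for ElementTree's path queries

SCHEMA = """<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:title>Example Schematron Schema</sch:title>
  <sch:pattern>
    <sch:rule context="dog">
      <sch:assert test="count(ear) = 2"
        >A 'dog' element should contain two 'ear' elements.</sch:assert>
      <sch:report test="bone">This dog has a bone.</sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>"""

def list_assertions(schema_text):
    """Return (context, kind, test, message) for every assert and report."""
    root = ET.fromstring(schema_text)
    found = []
    for rule in root.iterfind("sch:pattern/sch:rule", NS):
        for child in rule:
            # ElementTree stores names as "{namespace-uri}local-name".
            kind = child.tag.replace("{%s}" % SCH, "")
            if kind in ("assert", "report"):
                message = "".join(child.itertext()).strip()
                found.append((rule.get("context"), kind,
                              child.get("test"), message))
    return found

for entry in list_assertions(SCHEMA):
    print(entry)
```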
The <schema> element also allows explicit declaration of the namespace prefixes and URIs used in the schema, using <ns> subelements. The usual XML [Namespaces] mechanism can be used instead; however, the prefix and URI data is then not available for diagnostic reporting or application processing, and some implementations may require that the information be made available in that form.
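The following Python sketch shows why explicit <ns> declarations are convenient for applications. It collects them into a prefix map, uses that map to query the target document, and maps expanded names back to prefixed names for diagnostics. The kennel namespace URI and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SCH = "http://www.ascc.net/xml/schematron"

def declared_namespaces(schema_root):
    """Collect <sch:ns prefix="..." uri="..."/> declarations into a dict."""
    return {ns.get("prefix"): ns.get("uri")
            for ns in schema_root.iterfind("{%s}ns" % SCH)}

def qualified_name(tag, prefixes):
    """Turn ElementTree's '{uri}local' form back into 'prefix:local'."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        for prefix, candidate in prefixes.items():
            if candidate == uri:
                return prefix + ":" + local
    return tag

schema = ET.fromstring(
    '<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">'
    '<sch:ns prefix="k" uri="http://example.org/kennel"/>'
    '</sch:schema>')
prefixes = declared_namespaces(schema)
doc = ET.fromstring(
    '<k:kennel xmlns:k="http://example.org/kennel"><k:dog/></k:kennel>')
for dog in doc.findall("k:dog", prefixes):
    print(qualified_name(dog.tag, prefixes))  # k:dog
```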
A <schema> can have an icon attribute. It can also contain <p> elements, allowing some modest end-user-oriented documentation to be given: this allows the user to know what kind of validation or constraints the schema specifies, to aid them in interpreting any results usefully. The <p> element can have an icon attribute.
The <phase> element has an attribute activePatterns, which lists the identifiers of the patterns that are active in that phase.
By default, all patterns in a schema are active. However, an application may provide a way for the user to select the phase to be used: for example, a command line option when invoked from the command line, a preferences dialog box in a GUI, or a parameter on the function invocation when called as a precondition checker in a programming language.
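Phase selection reduces to filtering patterns by identifier, as the Python sketch below shows. It assumes that the schema's defaultPhase attribute (from the DTD) is used when the caller names no phase, and that every pattern is active when neither is given. It also assumes unprefixed element names, and its example patterns omit their rules for brevity. active_patterns is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def active_patterns(schema, phase=None):
    """Return the ids of the patterns active in `phase`, in document order."""
    all_ids = [p.get("id") for p in schema.iter("pattern")]
    phase = phase or schema.get("defaultPhase")
    if phase is None:
        return all_ids  # no phase chosen or declared: every pattern is active
    for candidate in schema.iter("phase"):
        if candidate.get("id") == phase:
            wanted = set(candidate.get("activePatterns").split())
            return [i for i in all_ids if i in wanted]
    raise ValueError("unknown phase: " + phase)

SCHEMA = ET.fromstring("""<schema defaultPhase="draft">
  <phase id="draft" activePatterns="structure"/>
  <phase id="final" activePatterns="structure links"/>
  <pattern name="Structure" id="structure"/>
  <pattern name="Links" id="links"/>
</schema>""")

print(active_patterns(SCHEMA))           # ['structure']
print(active_patterns(SCHEMA, "final"))  # ['structure', 'links']
```

An application could feed the phase argument from a command line option or a preferences dialog, as the text suggests.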
A <hint> element is general text. It can contain <dir> and <emph> subelements. It must have an id attribute to allow references to it. The <hint> element can have <value-of> sub-elements, which have the same semantics as in [XSLT]. These allow insertion of value information as well as name details. A <hint> element can have an icon attribute.
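The following Python sketch resolves an assertion's hints IDREFS to the referenced <hint> elements and fills in <value-of> from the context node. It supports only select values of the form "@attribute", which is an assumption made to avoid a full [XPath] evaluator. The hint text and the functions render_hint and hints_for are hypothetical.

```python
import xml.etree.ElementTree as ET

def render_hint(hint, context):
    """Render a <hint>, evaluating <value-of select="@attr"/> on context."""
    parts = [hint.text or ""]
    for child in hint:
        if child.tag == "value-of":
            select = child.get("select", "")
            if not select.startswith("@"):
                raise ValueError("only @attribute selects are supported")
            parts.append(context.get(select[1:], ""))
        else:  # <emph> or <dir>: keep their text as-is
            parts.append(child.text or "")
        parts.append(child.tail or "")
    return "".join(parts)

def hints_for(assertion, hints_root, context):
    """Look up each id in the assertion's hints attribute and render it."""
    by_id = {h.get("id"): h for h in hints_root.iter("hint")}
    return [render_hint(by_id[ref], context)
            for ref in assertion.get("hints", "").split()]

hints = ET.fromstring('<hints><hint id="ears">Check the dog named '
                      '<value-of select="@name"/> for missing ears.</hint></hints>')
assertion = ET.fromstring('<assert test="count(ear) = 2" hints="ears">'
                          'Two ears expected.</assert>')
dog = ET.fromstring('<dog name="rex"><ear/></dog>')
print(hints_for(assertion, hints, dog))
```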
<!-- +//IDN sinica.edu.tw//DTD Schematron 1.4//EN -->

<!-- Data types -->
<!ENTITY % URI "CDATA" >
<!ENTITY % PATH "CDATA" >
<!ENTITY % EXPR "CDATA" >
<!ENTITY % FPI "CDATA" >

<!-- Element declarations -->
<!ELEMENT schema ( title?, ns*, phase*, p*, pattern+, p*, hints? ) >
<!ELEMENT assert ( #PCDATA | name | emph | dir )* >
<!ELEMENT dir ( #PCDATA ) >
<!ELEMENT emph ( #PCDATA ) >
<!ELEMENT extends EMPTY >
<!ELEMENT hint ( #PCDATA | value-of | emph | dir )* >
<!ELEMENT hints ( hint+ ) >
<!ELEMENT key EMPTY >
<!ELEMENT name EMPTY >
<!ELEMENT ns EMPTY >
<!ELEMENT p ( #PCDATA | dir | emph )* >
<!ELEMENT pattern ( p*, rule+ ) >
<!ELEMENT phase ( #PCDATA | dir | emph )* >
<!ELEMENT report ( #PCDATA | name | emph | dir )* >
<!ELEMENT rule ( assert | report | key | extends )+ >
<!ELEMENT title ( #PCDATA | dir )* >
<!ELEMENT value-of EMPTY >
<!-- Attribute declarations -->
<!ATTLIST schema
  xmlns         %URI;    #FIXED "http://www.ascc.net/xml/schematron"
  fpi           %FPI;    #IMPLIED
  defaultPhase  IDREF    #IMPLIED
  icon          %URI;    #IMPLIED
  xml:lang      NMTOKEN  #IMPLIED >
<!ATTLIST assert
  test      %EXPR;   #REQUIRED
  role      NMTOKEN  #IMPLIED
  id        ID       #IMPLIED
  hints     IDREFS   #IMPLIED
  icon      %URI;    #IMPLIED
  subject   %PATH;   #IMPLIED
  xml:lang  NMTOKEN  #IMPLIED >
<!ATTLIST dir
  value  ( ltr | rtl )  #IMPLIED >
<!ATTLIST extends
  rule  IDREF  #REQUIRED >
<!ATTLIST hint
  id        ID       #REQUIRED
  icon      %URI;    #IMPLIED
  xml:lang  NMTOKEN  #IMPLIED >
<!ATTLIST key
  name  NMTOKEN  #REQUIRED
  path  %PATH;   #REQUIRED
  icon  %URI;    #IMPLIED >
<!ATTLIST name
  path  %PATH;  #IMPLIED >
<!-- Schematrons should implement '.' as the default value for path -->
<!ATTLIST p
  xml:ns  NMTOKEN  #IMPLIED >
<!ATTLIST pattern
  name  CDATA  #REQUIRED
  see   %URI;  #IMPLIED
  id    ID     #IMPLIED
  icon  %URI;  #IMPLIED >
<!ATTLIST ns
  uri     %URI;    #REQUIRED
  prefix  NMTOKEN  #IMPLIED >
<!ATTLIST phase
  id              ID      #REQUIRED
  fpi             %FPI;   #IMPLIED
  activePatterns  IDREFS  #REQUIRED
  icon            %URI;   #IMPLIED >
<!ATTLIST report
  test     %EXPR;   #REQUIRED
  role     NMTOKEN  #IMPLIED
  id       ID       #IMPLIED
  hints    IDREFS   #IMPLIED
  icon     %URI;    #IMPLIED
  subject  %PATH;   #IMPLIED >
<!ATTLIST rule
  context   %PATH;        #IMPLIED
  abstract  ( yes | no )  "no"
  role      NMTOKEN       #IMPLIED
  id        ID            #IMPLIED >
<!-- Schematrons should implement 'no' as the default value of abstract -->
<!ATTLIST value-of
  select  %PATH;  #REQUIRED >
The Schematron project website is at
http://www.ascc.net/xml/resource/schematron/schematron.html

[deFrancis] The Chinese Language
[RELAX] Murata M.