Schematron: XML Structure Validation Language Using Patterns in Trees
Schematron, a rules-based schema language for XML, was published as an ISO/IEC standard in 2006 (first edition dated 2006-06-01) under the title Information technology — Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation — Schematron. ISO/IEC 19757-3:2006 establishes requirements for Schematron schemas and specifies when an XML document matches the patterns specified by a Schematron schema. ISO/IEC 19757 defines a set of Document Schema Definition Languages (DSDL) that can be used to specify one or more validation processes performed against Extensible Markup Language (XML) or Standard Generalized Markup Language (SGML) documents, where XML is an application profile of SGML, ISO 8879:1986.
The Schematron is "a simple and powerful Structural Schema Language. It differs in basic concept from other schema languages in that it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented which are inconvenient and difficult in grammar-based schema languages. If you know XPath or the XSLT expression language, you can start to use The Schematron immediately. The Schematron allows you to develop and mix two kinds of schemas: (1) Report elements allow you to diagnose which variant of a language you are dealing with. (2) Assert elements allow you to confirm that the document conforms to a particular schema. The Schematron is based on a simple action: First, find context nodes in the document (typically elements) based on XPath path criteria; then, for each of those nodes, check whether some other XPath expressions are true. The Schematron can be useful in conjunction with many grammar-based structure-validation languages: DTDs, XML Schemas, RELAX, TREX, etc. You can even embed a Schematron schema inside an XML Schema <appinfo> element. Free and open source implementations of Schematron are available. The Schematron is trivially simple to implement on top of XSLT and to customize. There are also implementations in Python and Perl."
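The "simple action" described above can be sketched in Python under stated assumptions. Context nodes are found with ElementTree's limited XPath subset rather than full XPath. Each test is a Python predicate standing in for an XPath expression. The `library`/`book` document and the messages are hypothetical. An assert yields its message when its test is false; a report yields its message when its test is true.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    kind: str  # "assert": message when test is false; "report": message when test is true
    test: Callable[[ET.Element], bool]  # stands in for an XPath test expression
    message: str


@dataclass
class Rule:
    context: str  # ElementTree path (a small XPath subset) selecting context nodes
    checks: List[Check]


def validate(root: ET.Element, rules: List[Rule]) -> List[str]:
    messages = []
    for rule in rules:
        # First, find the context nodes...
        for node in root.iterfind(rule.context):
            # ...then evaluate each test against every one of those nodes.
            for check in rule.checks:
                outcome = check.test(node)
                fired = (not outcome) if check.kind == "assert" else outcome
                if fired:
                    messages.append(f"{node.tag}: {check.message}")
    return messages


# Hypothetical instance document and rules, for illustration only.
doc = ET.fromstring(
    "<library>"
    "<book id='b1'><title>XML</title></book>"
    "<book><title/></book>"
    "</library>"
)
rules = [
    Rule(".//book", [
        Check("assert", lambda b: b.get("id") is not None,
              "a book should have an id attribute"),
        Check("report", lambda b: not (b.findtext("title") or "").strip(),
              "this book has an empty title"),
    ]),
]
messages = validate(doc, rules)
print("\n".join(messages))
```

Real Schematron also groups rules into patterns. Within a pattern, each node is checked only against the first rule whose context matches. This sketch omits that refinement.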
Excerpted from the Introduction to the standard:
ISO/IEC 19757 defines a set of Document Schema Definition Languages (DSDL) that can be used to specify one or more validation processes performed against Extensible Markup Language (XML) or Standard Generalized Markup Language (SGML) documents. (XML is an application profile of SGML, ISO 8879:1986.)
A document model is an expression of the constraints to be placed on the structure and content of documents to be validated with the model. A number of technologies have been developed through various formal and informal consortia since the development of Document Type Definitions (DTD) as part of ISO 8879, notably by the World Wide Web Consortium (W3C) and the Organization for the Advancement of Structured Information Standards (OASIS). A number of validation technologies are standardized in DSDL to complement those already available as standards or from industry.
To validate that a structured document conforms to specified constraints in structure and content relieves the potentially many applications acting on the document from having to duplicate the task of confirming that such requirements have been met. Historically, such tasks and expressions have been developed and utilized in isolation, without consideration of how the features and functionality available in other technologies might enhance validation objectives.
The main objective of ISO/IEC 19757 is to bring together different validation-related tasks and expressions to form a single extensible framework that allows technologies to work in series or in parallel to produce a single or a set of validation results. The extensibility of DSDL accommodates validation technologies not yet designed or specified.
In the past, different design and use criteria have led users to choose different validation technologies for different portions of their information. Bringing together information within a single XML document sometimes prevents existing document models from being used to validate sections of data. By providing an integrated suite of constraint description languages that can be applied to different subsets of a single XML document, ISO/IEC 19757 allows different validation technologies to be integrated under a well-defined validation policy.
This part of ISO/IEC 19757 is based on the Schematron assertion language [Resource Description for Schematron, Rick Jelliffe]. The let element is based on XCSL [XML Constraint Specification Language, José Carlos Leite Ramalho]. Other features arise from the half-dozen early Open Source implementations of Schematron in diverse programming languages and from discussions in electronic forums by Schematron users and implementers.
The structure of this part of ISO/IEC 19757 is as follows. Clause 5 describes the syntax of an ISO Schematron schema. Clause 6 describes the semantics of a correct ISO Schematron schema; the semantics specify when a document is valid with respect to an ISO Schematron schema. Clause 7 describes conformance requirements for implementations of ISO Schematron validators. Annex A is a normative annex providing the ISO/IEC 19757-2 (RELAX NG) schema for ISO Schematron. Annex B is a normative annex providing the ISO Schematron schema for constraints in ISO Schematron that cannot be expressed by the schema of Annex A. Annex C is a normative annex providing the default query language binding to XSLT. Annex D is an informative annex providing an ISO/IEC 19757-2 (RELAX NG compact syntax) schema and corresponding ISO Schematron schema for a simple XML language, the Schematron Validation Report Language (SVRL). Annex E is an informative annex providing motivating design requirements for ISO Schematron. Annex F is a normative annex allowing certain Schematron elements to be used in external vocabularies. Annex G is an informative annex with a simple example of a multi-lingual schema.
Considered as a document type, a Schematron schema contains natural-language assertions concerning a set of documents, marked up with various elements and attributes for testing these natural-language assertions, and for simplifying and grouping assertions.
Considered theoretically, a Schematron schema reduces to a non-chaining rule system whose terms are Boolean functions invoking an external query language on the instance and other visible XML documents, with syntactic features to reduce specification size and to allow efficient implementation.
Considered analytically, Schematron has two characteristic high-level abstractions: the pattern and the phase. These allow the representation of non-regular, non-sequential constraints that ISO/IEC 19757-2 cannot specify, and various dynamic or contingent constraints.
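A phase names a subset of patterns to activate. This lets one schema apply looser checks to, say, a work-in-progress document. As we understand ISO Schematron, the reserved phase name `#ALL` activates every pattern. A schema may also declare a default phase, which is used when the caller names none. The Python sketch below models only that selection step. The pattern and phase names are hypothetical, and each "pattern" is a plain function rather than a set of XPath rules.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional

# Each hypothetical "pattern" is reduced to a function returning failure messages.
Pattern = Callable[[ET.Element], List[str]]


def structure(root: ET.Element) -> List[str]:
    return [] if root.find("title") is not None else ["missing title"]


def links(root: ET.Element) -> List[str]:
    ids = {e.get("id") for e in root.iter() if e.get("id")}
    return [f"dangling ref {r.get('to')}" for r in root.iter("ref")
            if r.get("to") not in ids]


PATTERNS: Dict[str, Pattern] = {"structure": structure, "links": links}
# Hypothetical phases: drafts get structural checks only; final documents get everything.
PHASES: Dict[str, List[str]] = {"draft": ["structure"], "final": ["structure", "links"]}


def validate(root: ET.Element, phase: Optional[str] = None,
             default_phase: Optional[str] = None) -> List[str]:
    # An explicit phase wins, then the schema's default phase, then "#ALL".
    chosen = phase or default_phase or "#ALL"
    active = list(PATTERNS) if chosen == "#ALL" else PHASES[chosen]
    failures: List[str] = []
    for pattern_id in active:
        failures.extend(PATTERNS[pattern_id](root))
    return failures


doc = ET.fromstring("<doc><title>Draft</title><ref to='sec9'/></doc>")
print(validate(doc, "draft"))
print(validate(doc, "final"))
```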
The Schematron was designed by Rick Jelliffe (Academia Sinica Computing Centre, Taibei). Software development takes place on SourceForge.
[September 11, 2002] Topologi Collaborative Markup Editor Supports XML and SGML. A posting from Rick Jelliffe announces the release of the Topologi Collaborative Markup Editor version 1.0.1. The Collaborative Markup Editor is a delimiter-aware text editor with markup-aware cut-and-paste operations, undo/redo, rectangular selection, clear diagnostics, and over a dozen innovative tools to handle common markup tasks. The editing environment is described as "a new tool for professional publishing teams and individuals which supports the whole lifecycle for large and complex XML and SGML documents." The program design reflects an observation that "standard text editors don't provide enough validation but XML editors lack the flexibility that publishers require." The Topologi Collaborative Markup Editor is especially suitable in contexts where the editing tasks involve markup, text conversion, efficient teamwork, multiple DTDs, larger files, and multiple languages or platforms. The tool supports Unicode and offers validation for both SGML and XML. It "gracefully handles long files and bad markup, offers fast text input and scrolling, and incorporates innovative tools in a familiar interface framework. It provides flexibility by working with different document types, and adjusts readily to new document types, with special validation modes for incomplete documents." Topologi is based upon standards from ISO, IETF and W3C; it supports ISO RELAX NG, Schematron 1.5, W3C XML, Namespaces, and W3C XML Schema. The Community Edition for Wintel may be downloaded for evaluation from Topologi's website; Linux and Mac OS X versions are now in beta testing.
- Other references:
[February 21, 2007] "Schematron News." By Rick Jelliffe. From O'Reilly News (February 19, 2007). Some of the recent news on ISO Schematron: (1) My XSLT 'skeleton' implementation (the latest version of the most commonly used version of Schematron) is available in beta from Schematron.com, as open source, non-viral. This version fully supports ISO Schematron (except for abstract patterns, for which a preprocessor has been contributed) and has a lot of input from members of the schematron-love-in maillist. Notable contributions are from Ken Holman, Dave Pawson and Florent Georges. A variety of different output formats are available as backends, including an ISO SVRL (Schematron Validation Report Language) XML format and a terminate-on-first-error backend. (2) Topologi's Ant Task for Schematron is available now in beta from Schematron.com. The code will be available as open source, non-viral. Thanks to Allette Systems' Christophe Lauret and Willi Ekasalim for doing the programming on this. It can output text to standard error or collate all the SVRLs into a single XML file. (3) Dave Pawson is writing a little online book, an ISO Schematron tutorial concentrating on using Schematron with XSLT2. I haven't reviewed it thoroughly yet, but Dave has a good track record. (4) Mitre's Roger Costello has written up two pages, 'Usage and Features of Schematron' and 'Best way to phrase the Schematron assertion text', that seem pretty sensible to me. Roger followed his usual method of asking people on the XML-DEV maillist and compiling the results. (5) Murata Makoto has been preparing the Japanese translation of ISO Schematron, to be used as the text for the Japanese Industrial Standard. He has also been translating other parts of ISO DSDL.
The great thing about diligent translators such as Dr Murata and Dr Komachi is that they uncover many practical issues; in Schematron's case there are a couple of paragraphs in the ISO standard that seem completely reasonable when you know what they are supposed to mean, but actually are pretty cryptic. Murata-san also has pointed out an improvement to the formal specification of Schematron in predicate logic.
[February 18, 2007] "Schematron Usage and Features." By Roger L. Costello. "Categories of Schematron Usage: (1) Co-constraint checking. In the example there is a co-constraint between the two Classification values; namely, the two values must be identical. In general, co-constraints are constraints that exist between data (element-to-element co-constraints, element-to-attribute, attribute-attribute). Co-constraints may be 'within' an XML document, or 'across' XML documents (intra- and inter-document co-constraints). Schematron is very well-suited to expressing co-constraints. The term 'co-constraint' is a misnomer, as it suggests a constraint only between two items. There may in fact be a constraint over multiple items, not just two items. For example, if there were many Classification elements then we need to check that ALL values are identical. Co-constraints may exist between XML structure components (elements, attributes) as well as between data values. For example, if Classification has the value 'unclassified' then Document must only contain the elements shown above; if Classification has the value 'secret' then Document must only contain other elements... (2) Cardinality checking. In the example the cardinality constraint is: the text in the Para element must not contain any restricted keywords; the keywords may be obtained dynamically from another file. In general, cardinality constraints are constraints on the occurrence of data. The cardinality constraints may apply over the entire document, or to just portions of the document. Schematron is very well-suited to expressing cardinality checks... (3) Algorithmic checking. In general, validity of data in an XML instance document is determined not by mere examination or comparison of the data, but requires performing an algorithm on the data. Schematron is very well-suited to expressing algorithmic checks... (4) Author specified error messages. 
Schematron allows the schema author to write the error messages, so the errors can be reported at a higher (operational/user) level. The schema author can thus communicate with the user and explain the error in an understandable way and direct the user on how to correct the problem... (5) External Data Mashups. Data used in Schematron assertions may be dynamically obtained from external files..." See also "Schematron Assertion Text": "Schematron is an assertion-based schema language. Assertions are expressed using XPath and natural language (i.e., the 'Assertion Text'). How the Assertion Text is phrased is an important consideration. In general, it should be phrased using terminology appropriate to the domain. Beyond that, how it is phrased depends on whether the assertion is intended to be used as a statement of a contractual obligation, or the assertion is intended to be used as a friendly message to guide users, or the assertion is intended to be used as information for developers." Source: the posting.
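The first two usage categories can be illustrated with a small Python sketch of what the corresponding Schematron assertions would check.

- **Co-constraint:** every Classification value must be identical, however many Classification elements there are.
- **Keyword check:** no Para may contain a restricted keyword.

The document and keyword list are hypothetical. In Costello's example the keywords may come from another file, but this sketch hard-codes them. The messages are author-written, in the spirit of category (4).

```python
import xml.etree.ElementTree as ET

# Hypothetical instance in the spirit of Costello's Classification/Para example.
DOC = """<Document>
  <Classification>secret</Classification>
  <Para>Meeting notes.</Para>
  <Para>Launch codes are attached.</Para>
  <Classification>secret</Classification>
</Document>"""

# Hypothetical keyword list; Costello's example obtains keywords dynamically from a file.
RESTRICTED = ["launch codes", "password"]


def check(document_xml, restricted):
    root = ET.fromstring(document_xml)
    errors = []
    # Co-constraint over ALL Classification elements (not just two): values must agree.
    values = {(c.text or "").strip() for c in root.iter("Classification")}
    if len(values) > 1:
        errors.append(f"Classification values differ: {sorted(values)}")
    # Keyword check: no Para may mention a restricted keyword (case-insensitive).
    for n, para in enumerate(root.iter("Para"), start=1):
        text = "".join(para.itertext()).lower()
        for word in restricted:
            if word in text:
                errors.append(f"Para {n} contains restricted keyword '{word}'")
    return errors


print(check(DOC, RESTRICTED))
```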
[February 16, 2007] Open Source Beta Ant Task for Schematron Now Available. Rick Jelliffe, Software Announcement. See the documentation PDF. Developers have announced the availability of a Java Ant task for ISO Schematron validation. It can output error messages to the console, or aggregate SVRL messages for each file into a single file. It supports phases and other Schematron features, using the recent ISO Schematron skeleton. The software was written by Christophe Lauret and Willi Ekasalim at Allette Systems for Topologi Pty. Ltd. and will be released as open source (probably as part of the standard tasks for Ant). The current version is beta quality; test reports and comments are very welcome. Ant is a 'make' system for Java; there is also a .NET version. It is useful for batch processing files, especially for running the same processes on multiple files that are part of larger document sets. Note: the ISO SVRL (Schematron Validation Report Language) implementation is also now available. SVRL is an XML language to present the results of validating with a Schematron schema. It can be used for testing implementations, benchmarking, and for collecting validation data to be sent to other formatting or reporting stages. SVRL is Annex D of ISO Schematron...
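To make the SVRL idea concrete, here is a hedged Python sketch that writes failed assertions as `svrl:failed-assert` elements in the SVRL namespace. Each element carries `test` and `location` attributes and an `svrl:text` child. The sketch then collects one report per file. The `reports`/`report` wrapper and its `file` attribute are hypothetical and are not the Ant task's actual aggregate format. A full SVRL report also records active patterns and fired rules, which this sketch omits.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"
ET.register_namespace("svrl", SVRL_NS)

Failure = Tuple[str, str, str]  # (XPath test, XPath location, message)


def svrl(local_name: str) -> str:
    """Qualified name in the SVRL namespace, in ElementTree's {uri}name form."""
    return f"{{{SVRL_NS}}}{local_name}"


def svrl_report(failures: List[Failure]) -> ET.Element:
    """One svrl:schematron-output containing an svrl:failed-assert per failure."""
    output = ET.Element(svrl("schematron-output"))
    for test, location, message in failures:
        failed = ET.SubElement(output, svrl("failed-assert"),
                               {"test": test, "location": location})
        ET.SubElement(failed, svrl("text")).text = message
    return output


def aggregate(per_file: Dict[str, List[Failure]]) -> str:
    """Collect per-file SVRL reports into one document (hypothetical wrapper format)."""
    collection = ET.Element("reports")  # hypothetical element, not part of SVRL
    for filename, failures in per_file.items():
        entry = ET.SubElement(collection, "report", {"file": filename})
        entry.append(svrl_report(failures))
    return ET.tostring(collection, encoding="unicode")


combined = aggregate({
    "a.xml": [],
    "b.xml": [("@id", "/library/book[2]", "a book should have an id attribute")],
})
print(combined)
```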
[January 25, 2007] "'oXygen' NVDL (Namespace-based Validation Dispatching Language) Editor." Staff, Syncro Soft Ltd Announcement. Syncro Soft Ltd, the producer of "oXygen" XML Editor, has announced the immediate availability of version 8.1 of its XML Editor, Schema Editor and XSLT/XQuery Debugger. Version 8.1 of the "oXygen" XML Editor improves the support for NVDL scripts and adds a series of enhancements, fixes and component updates. The new "oXygen" NVDL (Namespace-based Validation Dispatching Language) editor allows you to visually edit NVDL scripts. A diagram showing the script structure and allowing navigation from a mode reference to its definition is available. When editing an NVDL script, content completion offers assistance for entering a mode reference by presenting the defined modes, and for entering a new mode by presenting the modes used but not defined. The NVDL schema that drives content completion has also been annotated, so you get documentation for the proposals offered during editing. The tool offers editing and validation support for XML Schema (visual diagram), RELAX NG (visual diagram), NVDL scripts, DTD, and Schematron. It is available as a standalone desktop or Java Web Start application, or as an Eclipse plugin. It supports multiple validation engines, including: Xerces, XSV, LIBXML, MSXML 4.0, MSXML.NET, Saxon SA, SQC...
[January 15, 2007] "Update on the Service Modeling Language (SML)." By Sam Ramji (Microsoft Open Source Labs). The author reports on an update to the Service Modeling Language draft specification created by Microsoft and a number of other leading technology companies. SML is designed to model complex IT services and systems, including their structure, constraints, policies, and best practices. SML is based on a profile of XML Schema and Schematron. SML was created by the SML working group whose members are BEA, BMC, Cisco, Dell, EMC, HP, IBM, Intel, Microsoft and Sun. SML will allow for the creation of best practices and policies that automate the services' validation, development, operations, updates and end-of-life: the full lifecycle. SML does not prescribe a specific IT model or set of models; instead, it defines the syntax and semantics that all SML models must follow: their base vocabulary, the rules of composition, the grammar and the syntax. SML specifies: (1) Profiles for the use of XML Schema 1.0 and Schematron to define service models; (2) Extensions to support and constrain inter-document references in those models; (3) Inter-document uniqueness and key definitions plus the ability to use them across documents; (4) Rules to capture best practices and policies... From the SML specification [Service Modeling Language, Draft Specification, Version 0.65, 7 November 2006] 4. Rules: "XML Schema supports a number of built-in grammar-based constraints but it does not support a language for defining arbitrary rules for constraining the structure and content of documents. Schematron is an ISO standard for defining assertions concerning a set of XML documents. SML uses a profile of the Schematron schema to add support for user-defined constraints. SML uses XPath 1.0, augmented with SML-specific XPath extension functions, as its constraint language.
This section assumes that the reader is familiar with Schematron concepts; the Schematron standard is documented in [biblio refs], and [...] are good tutorials on an older version of Schematron..."
[January 3, 2007] "UBL Methodology for Code-list and Value Validation." By Rick Jelliffe. From O'Reilly Reviews. Ken Holman sent me a copy of the latest draft of the OASIS/UBL Methodology for Code-list and Value Validation, which is a pretty good use of Schematron. It looks like a neat and workable solution to a problem that is somewhere between baroque and a hard place using XSD. Imagine you are a trading company: you have documents with various fields for countries: countries you can send from, countries you can send to, countries the US won't allow you to export to, countries you can use as hubs, countries with regional offices, etc. And you also have lots of other documents with similar or different sets of countries. And countries are only the start: you also have product codes where different fields can have different sets of codes, and so on. And this may vary according to where the document came from (the Libyan branch office may have different rules from the Alaskan branch office). And, of course, the values of codes may have interdependencies, such as "the source must be different from the destination." So lots of uses of a standard vocabulary, but lots of local and changing subsets that are much closer to "business rules" than "datatypes". If you used XML Schemas, you could theoretically derive by restriction all the different subset codes, then use "redefine" on every top-level element that used the subsets. You'd have to do this redefine on base types where possible, so that subsequent derived types would inherit the restriction, perhaps, except then you'd have to check that any subsequent derived types that themselves define restrictions are indeed subsets. Have a breakdown and a good cup of tea. With the Schematron approach, you select the items from the code list you want, and some magic tool provided by the methodology generates the Schematron code, which just uses simple XPaths..." See also the earlier announcement.
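The source does not show the methodology's generator. As a hedged illustration of the idea (pick the codes, then let a tool emit Schematron that uses simple XPaths), the Python sketch below generates an ISO Schematron pattern. Its assertion uses a common XPath 1.0 list-membership idiom: pad the allowed codes and the normalized value with spaces, then test with `contains`. The element name `DestinationCountry` and the code subset are hypothetical. This is not the OASIS/UBL tool's actual output. A small Python mirror of the XPath test is included so the idiom can be checked.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SCH_NS = "http://purl.oclc.org/dsdl/schematron"
SCH = {"sch": SCH_NS}
ET.register_namespace("sch", SCH_NS)


def code_list_pattern(context: str, codes: Iterable[str], message: str) -> ET.Element:
    """Generate an ISO Schematron pattern restricting `context` values to `codes`.

    Assumes codes contain no spaces or apostrophes, since they are embedded in an
    XPath string literal.
    """
    allowed = " " + " ".join(sorted(codes)) + " "
    # XPath 1.0 idiom for list membership: pad the value with spaces and search.
    test = f"contains('{allowed}', concat(' ', normalize-space(.), ' '))"
    pattern = ET.Element(f"{{{SCH_NS}}}pattern")
    rule = ET.SubElement(pattern, f"{{{SCH_NS}}}rule", {"context": context})
    ET.SubElement(rule, f"{{{SCH_NS}}}assert", {"test": test}).text = message
    return pattern


def allowed_by(codes: Iterable[str], value: str) -> bool:
    """Python mirror of the generated XPath test, for checking the idiom."""
    allowed = " " + " ".join(sorted(codes)) + " "
    return " " + " ".join(value.split()) + " " in allowed


# Hypothetical element name and code subset chosen by a trading partner.
destinations = ["US", "JP", "AU"]
pattern = code_list_pattern("DestinationCountry", destinations,
                            "Destination must be one of the agreed country codes.")
print(ET.tostring(pattern, encoding="unicode"))
```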
[December 19, 2006] "Universal Business Language (UBL) V2.0 Approved as OASIS Standard." OASIS Staff. [...] In addition to greatly expanding the range of business processes supported by UBL, version 2.0 also taps the power of W3C XSLT, W3C XPath, and ISO Schematron to provide a breakthrough in code list management. Tim McGrath, vice chair of the OASIS UBL Technical Committee: "Employing the new 'genericode' XML specification for code list publication currently under development in OASIS, our approach allows trading partners to easily and precisely specify code list subsets and extensions and even to apply them to particular elements and subtrees within UBL instances -- all without changing the standard UBL schemas. Once in place, this standards-based process enables the implementation of business rule checking as part of instance validation. Open source software included in the UBL 2.0 release provides this new functionality 'out of the box.'" Update: see "UBL Methodology for Code-list and Value Validation."
[October 18, 2006] DocBook Version 4.5 Approved as an OASIS Standard. Staff, OASIS Announcement. "... The DocBook TC has also produced several versions of Version 5.0, rewritten as a native RELAX NG grammar. The goals of this redesign were to produce a schema that 'feels like' DocBook, so that most existing documents should still be valid or it should be possible to transform them in simple, mechanical ways into valid documents. It also enforces as many constraints as possible in the schema. Some additional constraints are expressed with Schematron rules."
[August 2006] "Implementing XML Schema Naming and Design Rules: Perils and Pitfalls." Joshua Lubell, et al. Presentation at Extreme Markup 2006. [NIST QoD (Quality of Design) software tool kit encodes rules in Schematron assertion language]. Organizations developing XML schemas often establish NDRs (Naming and Design Rules) in order to maximize interoperability and quality. NDRs are a good way to help enforce best practices, a particular modeling methodology (such as CCTS, the Core Components Technical Specification), or conformance to standards such as ISO 11179 (Naming and Design Principles for Data Elements). But no single set of Naming and Design Rules can satisfy everyone's requirements. As a result, NDRs are proliferating. And new groups embarking on XML schema development are asking, 'Should we create our own NDR, or should we use a pre-existing one?' To help address proliferation and reuse, the National Institute of Standards and Technology (NIST) is building a QoD (Quality of Design) software tool kit to make it easier for schema developers to choose and apply an appropriate NDR set. A Web-based prototype allows users to upload a schema and select rules from a cross-section of NDRs to check the schema against. The prototype's purpose is to provide a user-friendly environment for checking XML schema design quality in a collaborative environment. The rules are encoded either in the Schematron assertion language or in the Jess (Java Expert System Shell) rule language. [...] "This example shows that a seemingly simple rule can become more complex upon closer examination. The complexity becomes even more pronounced when we attempt to implement the rule. As our implementation method, we choose Schematron, a schema language for XML. Schematron can be used to validate any XML document, including an XML schema itself. Schematron differs from most other schema languages in that it is rule-based and uses XPath expressions instead of grammars.
Instead of enforcing a grammar on an XML document, a Schematron processor applies assertions to specific context paths within the document. If an XML document fails to meet an assertion, a diagnostic message supplied by the author of the Schematron schema can be displayed. Because Schematron supports assertions about arbitrary patterns in XML documents, and because diagnostic messages are author-supplied, Schematron can enforce constraints that would be problematic to enforce using grammar-based schema languages. The following Schematron schema, which shows Schematron's expressive power, implements rule ELD1..."
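To illustrate applying assertions to context paths inside an XML Schema document, the Python sketch below checks a hypothetical naming rule: named element declarations must be UpperCamelCase. This is not the paper's rule ELD1, whose text is not reproduced here. The ElementTree path stands in for a Schematron rule context. The regular expression stands in for its XPath test, and element references via `@ref` are skipped.

```python
import re
import xml.etree.ElementTree as ET

XS = {"xs": "http://www.w3.org/2001/XMLSchema"}
UPPER_CAMEL = re.compile(r"[A-Z][A-Za-z0-9]*")


def naming_violations(xsd_text: str) -> list:
    """Hypothetical NDR: every named element declaration uses UpperCamelCase."""
    schema = ET.fromstring(xsd_text)
    violations = []
    # Context: element declarations carrying @name (references via @ref are skipped).
    for decl in schema.iterfind(".//xs:element[@name]", XS):
        name = decl.get("name")
        if not UPPER_CAMEL.fullmatch(name):
            violations.append(name)
    return violations


XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="PurchaseOrder"/>
  <xs:element name="shipDate" type="xs:date"/>
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="Street_Name" type="xs:string"/>
      <xs:element ref="PurchaseOrder"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

print(naming_violations(XSD))
```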
[June 27, 2006] "Apply Schematron Constraints to XForms Documents Automatically." By J.J. Kratky, K.E. Kelly, S. Speicher, and K. Wells. From IBM developerWorks. IBM alphaWorks has released a new round of free tools, including the XML Forms Generator, to accelerate the development of forms that comply with the W3C XForms standard. The recent update lets you apply constraints defined in a Schematron 1.5 document to the generated form. Itself an XML markup, Schematron provides for the specification of business rules and data relationships that XML Schema cannot. While XForms natively provides for validation against XML Schema, any use of Schematron constraints must be built into the form itself. With development efforts already under way to integrate Schematron constraints with XForms, automation of the application of these constraints is a natural next step. W3C XML Schema is widely used and well-suited for statically describing the structure and content of XML. It is, however, limited in terms of more-dynamic analysis of instances. For example, in XML Schema, you cannot constrain an XML document in this way: "The sum of the values of elements A and B must be equal to 100." In Schematron, you can specify a constraint such as that easily. Like XML Schema, Schematron is itself XML, and is therefore a natural fit for XForms, which is itself an XML markup for manipulating XML data. With its small tag set and use of familiar syntax such as XPath, Schematron is easy to learn and write, yet powerful. The International Organization for Standardization (ISO) is working toward the standardization of Schematron; a draft specification is available.
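In Schematron, the "sum of A and B" constraint fits in one assertion, for example a rule on a hypothetical parent element `Split` with `test="A + B = 100"`. The Python function below roughly mirrors that XPath test so the behaviour can be tried. It follows XPath's handling of a missing or non-numeric operand: XPath yields NaN, so the comparison is false. Numeric parsing details differ slightly between Python's `float` and XPath 1.0's `number()`.

```python
import xml.etree.ElementTree as ET


def sum_is_100(elem: ET.Element) -> bool:
    """Roughly mirrors a Schematron test like test="A + B = 100".

    As in XPath, a missing or non-numeric operand makes the comparison false
    (XPath would yield NaN, and NaN = 100 is false).
    """
    try:
        return float(elem.findtext("A")) + float(elem.findtext("B")) == 100
    except (TypeError, ValueError):  # findtext returned None, or text is not a number
        return False


# Hypothetical parent element holding the two values.
for sample in ["<Split><A>40</A><B>60</B></Split>",
               "<Split><A>40</A><B>50</B></Split>",
               "<Split><A>40</A></Split>"]:
    print(sample, sum_is_100(ET.fromstring(sample)))
```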
[May 2006] Information technology — Document Schema Definition Languages (DSDL). Part 3: Rule-based validation — Schematron. [== Technologies de l'information — Langages de définition de schéma de documents (DSDL), Partie 3: Validation de règles orientées — Schematron]. International Standard: ISO/IEC 19757-3, First edition. 2006-06-01. Reference number: ISO/IEC 19757-3:2006(E). Copyright (c) ISO/IEC 2006. 38 pages. ISO/IEC 19757-3 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 34, Document description and processing languages. Available online from the ISO/IEC Freely Available Standards collection.
[November 09, 2004] Final Committee Draft of ISO Schematron Released for Public Review. A communiqué from Rick Jelliffe describes the availability of an ISO FCD (Final Committee Draft) for ISO/IEC 19757-3 Document Schema Definition Languages (DSDL) — Part 3: Rule-Based Validation — Schematron. Schematron is a language for making assertions about patterns found in XML documents, and serves as a schema language for XML. As Part 3 of the multi-part ISO/IEC 19757 (DSDL) standard, it defines "requirements for Schematron schemas and specifies when an XML document matches the patterns specified by a Schematron schema." This Final Committee [Review] Draft of ISO Schematron incorporates feedback from national standards bodies and from implementers. It is available online in PDF, HTML, and RTF formats. Improvements "include an annex on multilingual schemas, further treatment of abstract patterns, and validated schemas. The predicate logic used to specify Schematron formally has also been reworked. The specification remains very small, at about 35 pages including front matter, schemas and non-normative annexes." This FCD draft has been made publicly available for comment, for identification of spelling errors, and as an aid to implementers and users until the final International Standard is published on paper by ISO and by other nations that adopt Schematron as a national standard, expected in 2005. This text is suitable as the interim reference for organizations adopting Schematron. The editors encourage all Schematron implementers to check the draft standard and to add support for it for 2005. According to the Schematron.com Overview, "the Schematron differs in basic concept from other schema languages in that it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented which are inconvenient and difficult in grammar-based schema languages.
If you know XPath or the XSLT expression language, you can start to use The Schematron immediately. It allows you to develop and mix two kinds of schemas: (1) Report elements allow you to diagnose which variant of a language you are dealing with; (2) Assert elements allow you to confirm that the document conforms to a particular schema."
[October 08, 2004] "Yet More Schematron Activity." By Uche Ogbuji. From O'Reilly Developer Weblogs. "Earlier on Schematron inventor Rick Jelliffe blogged some exciting developments in the area. Heavy activity continues in the world of Schematron, the validation language++ for XML. ISO standardization is impending, implementations advance and tutorials/articles proliferate. Schematron is just all over .NET. Besides the various implementations, Dare Obasanjo offers his own advocacy in the MSDN article 'Improving XML Document Validation with Schematron', and my own implementation, Scimitar, is up to version 0.9.0, implementing all that I could decipher from the ISO draft. I've already started on the 1.1 generation of Scimitar, which includes some support for Jeni Tennison's Datatype Library Language (DTLL), so if no major bugs or omissions turn up soon, 0.9.0 will pretty much become 1.0. Whether standalone or embedded in RELAX NG or WXS, if you're using XML, you should at least be considering Schematron..."
[October 08, 2004] "Discover the flexibility of Schematron abstract patterns. Advanced Schematron features open up extraordinary possibilities for XML schemata." By Uche Ogbuji (Principal Consultant, Fourthought, Inc). From IBM developerWorks. October 08, 2004. "If you have the basics of an XML format in mind, but know that you will not be able to get everyone at the table to agree to every detail of the schema, consider Schematron abstract patterns. Schematron is probably the most powerful XML schema language available (and it can be much more than just a schema language). Its advanced features, especially abstract patterns, allow for schemata that you can quickly adapt to multiple variants of XML formats. This opens up extraordinary possibilities for XML schema, including the abilities to restrict XML formats and to make them generic and adaptable as well... ISO Schematron is a very unique XML schema language, offering extraordinary power in conjunction with other schema languages or on its own. As I pointed out in my tutorial 'A hands-on introduction to Schematron', Schematron is not just a schema language but a full-blown reporting facility for XML. I shall focus on variable assignment and abstract patterns, which open up some impressive possibilities for XML schema design. In the tutorial and in this article, I use the term candidate XML to describe the XML file against which a Schematron schema is invoked. Schematron is a host language for many potential means of accessing data (which could include XML or something else, such as flat text or database formats). But almost every Schematron implementation that I know of uses XPath and XSLT as query languages and is used for processing XML..."
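In ISO Schematron, an abstract pattern's rules refer to parameters written as `$name`. A concrete pattern declared with `is-a` supplies values for them through `param` elements, and the values are substituted into the rules. The Python sketch below models that substitution on (context, test, message) tuples. The table/row/entry roles and the second vocabulary (`grid`/`line`/`cell`) are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Template = Tuple[str, str, str]  # (context, test, message)

# An abstract pattern: rule templates that refer to parameters as $name.
TABLE_PATTERN: List[Template] = [
    ("$table", "$row", "A table must contain at least one row."),
    ("$row", "$entry", "A row must contain at least one entry."),
]


def instantiate(templates: List[Template], params: Dict[str, str]) -> List[Template]:
    """Substitute $parameters textually, as an is-a pattern does for an abstract one."""
    def subst(text: str) -> str:
        # Unknown parameter names raise KeyError instead of passing through silently.
        return re.sub(r"\$(\w+)", lambda m: params[m.group(1)], text)
    return [(subst(ctx), subst(test), msg) for ctx, test, msg in templates]


html = instantiate(TABLE_PATTERN, {"table": "table", "row": "tr", "entry": "td|th"})
# A hypothetical second vocabulary reusing the same abstract pattern.
grid = instantiate(TABLE_PATTERN, {"table": "grid", "row": "line", "entry": "cell"})
print(html)
print(grid)
```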
[October 06, 2004] "Schematron 1.5: Looking Under the Hood." By Bob DuCharme. From XML.com (October 06, 2004). "Schematron's reference implementation is written using XSLT stylesheets, and while no knowledge of XSLT is necessary to use Schematron, XSLT developers can learn a lot by studying these stylesheets. I like to think of a Schematron schema as a collection of rules about conditions that must be true in the data being checked (stored in assert statements) and conditions that, if true, generate an error message (report elements). From an XSLT point of view, the basic workflow of Schematron usage works like this: An XSLT stylesheet available on the Schematron web site reads your rules file (a 'schema' expressed in Schematron's own simple, straightforward syntax) and uses it to generate another stylesheet. Whenever you want to check that your data conforms to the rules in your rules file, you run the generated stylesheet against that data. If you update your rules file, you need to rerun the generating stylesheet against it to keep your rules-checking stylesheet up-to-date. The separation of the rules-checking logic from the interface and the resulting ease with which we can develop new interfaces makes it easy to integrate Schematron with a variety of applications. Schematron's home page mentions API implementations in Java, Perl, Python, and .NET C#. The combination of Schematron's power with the simplicity of its API makes it worth serious consideration for integration into any ambitious XML application, especially one using XSLT..."
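DuCharme's two-step workflow (rules file in, generated checker out, checker run against the data) can be imitated in a few lines of Python. This is a sketch under strong assumptions, not the reference XSLT implementation: it reads `rule`, `assert` and `report` elements in the Schematron 1.5 namespace, but treats `context` as a bare element name and `test` as an ElementTree path that counts as true when it selects at least one node. Real Schematron evaluates full XPath expressions. The `order`/`item`/`price`/`discount` vocabulary is made up.

```python
import xml.etree.ElementTree as ET

SCH = "{http://www.ascc.net/xml/schematron}"  # Schematron 1.5 namespace

SCHEMA = """
<schema xmlns="http://www.ascc.net/xml/schematron">
  <pattern name="items">
    <rule context="item">
      <assert test="price">An item must have a price.</assert>
      <report test="discount">This item is discounted.</report>
    </rule>
  </pattern>
</schema>
"""


def compile_schema(schema_text):
    """Step 1: turn the rules file into a reusable checking function."""
    checks = []
    for rule in ET.fromstring(schema_text).iter(SCH + "rule"):
        for child in rule:
            if child.tag in (SCH + "assert", SCH + "report"):
                kind = child.tag[len(SCH):]
                checks.append((rule.get("context"), kind, child.get("test"),
                               (child.text or "").strip()))

    def validate(document_text):
        """Step 2: run the compiled checks against a candidate document."""
        root = ET.fromstring(document_text)
        messages = []
        for context, kind, test, text in checks:
            for node in root.iter(context):
                matched = node.find(test) is not None
                # An assert speaks up when its test is false, a report when it is true.
                fires = (not matched) if kind == "assert" else matched
                if fires:
                    messages.append(f"{kind}: {text}")
        return messages

    return validate


validate = compile_schema(SCHEMA)
print(validate("<order><item><price>5</price></item><item><discount/></item></order>"))
# ['assert: An item must have a price.', 'report: This item is discounted.']
```

As with the XSLT version, editing the schema means re-running `compile_schema`; the returned `validate` function plays the role of the generated stylesheet.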
[September 2004] "Improving XML Document Validation with Schematron." By Dare Obasanjo. From Microsoft MSDN Library (September 2004). "Currently the most popular XML schema language is the W3C XML Schema Definition language (XSD). Although XSD is capable of satisfying scenarios involving type-annotated infosets, it is fairly limited when it comes to describing constraints on the structure of an XML document. There are many examples of situations where common idioms in XML vocabulary design are impossible to express using the constraints available in W3C XML Schema. The three most commonly requested constraints that are incapable of being described by W3C XML Schema are: (1) The ability to specify a choice of attributes; (2) The ability to group elements and attributes into model groups; (3) The ability to vary the content model based on the value of an element or attribute. For example, if the value of the status attribute is 'available', then the element should have an uptime child element; otherwise it should have a downtime child element. The technical name for such constraints is co-occurrence constraints. Although these idioms are widely used in XML vocabularies, it isn't possible to describe them using W3C XML Schema, which makes it difficult to rely on schema validation for enforcing the message contract. This article describes how to layer such functionality on top of the W3C XML Schema language using Schematron. It shows that it is possible to use Schematron and the W3C XML Schema to have one's cake and eat it, too..."
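The status/uptime/downtime example is exactly the kind of rule Schematron states directly: roughly, one rule with context `server[@status='available']` asserting `uptime`, and a second rule for every other case asserting `downtime`. A plain Python sketch of the same co-occurrence check may help make the logic concrete; the `server` element and `name` attribute are hypothetical, since the article does not name the element.

```python
import xml.etree.ElementTree as ET


def check_status(document_text):
    """Co-occurrence constraint: the required child depends on an attribute value."""
    errors = []
    for server in ET.fromstring(document_text).iter("server"):
        name = server.get("name", "?")
        if server.get("status") == "available":
            if server.find("uptime") is None:
                errors.append(f"server '{name}': status is 'available' but no uptime child")
        elif server.find("downtime") is None:
            errors.append(f"server '{name}': status is not 'available' but no downtime child")
    return errors


doc = """<servers>
  <server name="a" status="available"><uptime/></server>
  <server name="b" status="available"/>
  <server name="c" status="down"><downtime/></server>
  <server name="d"/>
</servers>"""
for error in check_status(doc):
    print(error)
```

Servers `b` and `d` are reported; `a` and `c` satisfy whichever branch their `status` value selects.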
[February 02, 2004] IBM Proposes BI-ICS Specification for Declaring Business Information Conformance. IBM researchers have published a BI-ICS specification which supports a formal declaration of business information conformance through an extensible XML vocabulary. The developers believe that standardization of the BI-ICS specification would help increase business interoperability and define a foundation for future related specifications. BI-ICS "provides the ability to declare that business information is conformant with type systems and processes. It intrinsically supports the declaration of a sequential conformance model with specific W3C XML Schema instances, MIME types, and XSLT transforms based on Schematron assertions. BI-ICS is extensible for extended support of alternate type systems, process mechanisms, and conformance models." The BI-ICS document does not specify actual usage contexts, but "many uses are possible for many types of information. Formally stating information conformance using ICS also allows for trading partners to advertise the information constraint requirements needed for conducting electronic business, by embedding a BI-ICS into partner agreements, registries, web site publishing, etc. A BI-ICS formalism may also be utilized within interaction middleware runtimes to check information conformance prior to application dispatch, or exchanged at runtime to express the information constraints for conducting electronic business with a specific partner." A corresponding tool "Business Integration - Information Conformance Statements for Java (BI-ICS4J)" is available from IBM alphaWorks. BI-ICS4J contains a conformance engine supporting W3C XML Schema and Schematron assertions. A conformance statement may be created by hand or by a tool such as the visual ICSManipulator in BI-ICS4J; the statement specifies a set of 'conformances' and a location URL of each conformance resource mechanism.
"Typically, a conformance statement may declare a model of increased constraints, starting with industry-level schema such as a purchase order, and specify additional constraints required by a specific business. The conformance statement is processed by BI-ICS4J against an instance of business information in order to check conformance, and a yes/no result is yielded."
[November 24, 2003] "An Introduction to Schematron." By Eddie Robertsson. From XML.com (November 12, 2003). "The Schematron schema language differs from most other XML schema languages in that it is a rule-based language that uses path expressions instead of grammars. This means that instead of creating a grammar for an XML document, a Schematron schema makes assertions applied to a specific context within the document. If the assertion fails, a diagnostic message that is supplied by the author of the schema can be displayed. One advantage of a rule-based approach is that in many cases modifying the wanted constraint written in plain English can easily create the Schematron rules. In order to implement the path expressions used in the rules in Schematron, XPath is used with various extensions provided by XSLT. Since the path expressions are built on top of XPath and XSLT, it is also trivial to implement Schematron using XSLT, which is shown later in the section Schematron processing. Schematron makes various assertions based on a specific context in a document. Both the assertions and the context make up two of the four layers in Schematron's fixed four-layer hierarchy: phases (top-level), patterns, rules (defines the context), and assertions... This introduction covers only three of these layers (patterns, rules and assertions); these are most important for using embedded Schematron rules in RELAX NG... Version 1.5 of Schematron was released in early 2001 and the next version is currently being developed as an ISO standard. The new version, ISO Schematron, will also be used as one of the validation engines in the DSDL (Document Schema Definition Languages) initiative..."
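One consequence of the pattern/rule/assertion layering is worth seeing in code. As we understand the Schematron 1.5 and ISO semantics, within a single pattern each node is handled by at most the first rule whose context matches it, so a specific rule placed before a general one overrides it for the nodes it matches. The Python sketch below models that; contexts are written as predicate functions and tests as child lookups (a simplification of XPath), and the `section`/`title`/`label` vocabulary is hypothetical.

```python
import xml.etree.ElementTree as ET

# One pattern = an ordered list of (context predicate, [(child path, message), ...]).
PATTERN = [
    # Specific rule first: appendix sections need a label.
    (lambda n: n.tag == "section" and n.get("type") == "appendix",
     [("label", "an appendix needs a label")]),
    # General rule: every other section needs a title.
    (lambda n: n.tag == "section",
     [("title", "a section needs a title")]),
]


def run_pattern(pattern, root):
    failures = []
    for node in root.iter():                     # document order
        for context, assertions in pattern:
            if context(node):
                for test, message in assertions:
                    if node.find(test) is None:  # the assertion fails when nothing is found
                        failures.append(message)
                break  # only the first matching rule in a pattern fires for this node
    return failures


doc = ET.fromstring("<doc><section><title/></section><section/>"
                    "<section type='appendix'/></doc>")
print(run_pattern(PATTERN, doc))
# ['a section needs a title', 'an appendix needs a label']
```

Because of the `break`, the untitled appendix does not also trigger "a section needs a title"; to apply both rules to the same node, they would go in separate patterns.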
[June 19, 2003] Namespace Routing Language (NRL) Supports Multiple Independent Namespaces. James Clark has announced the publication of a Namespace Routing Language (NRL) specification. NRL is "an XML language for combining schemas for multiple namespaces; it allows the schemas that it combines to use arbitrary schema languages." The release includes a tutorial and specification document and a sample implementation in the Jing (RELAX NG Validator in Java) distribution. NRL "is the successor to Clark's Modular Namespaces (MNS) language and is intended to be another step on the path towards Document Schema Definition Languages (DSDL) Part 4." The W3C XML Namespaces Recommendation itself "allows an XML document to be composed of elements and attributes from multiple independent namespaces: each of these namespaces may have its own schema and the schemas for different namespaces may be in different schema languages. The problem then arises of how the schemas can be composed in order to allow validation of the complete document." The Namespace Routing Language attempts to solve this problem. Among the features and benefits of NRL: it supports schema language coexistence, allows extension of schemas not designed to be extended, makes authoring of extensible schemas easier, supports 'transparent' namespaces, allows contextual control of extension, and allows concurrent validation. "For RELAX NG, it can be used to provide some of the namespace-based modularity features that are built into XSD. NRL is designed to allow an implementation to stream, and the sample implementation does so. The sample implementation has a SAX-based plug-in architecture that allows new schema languages to be added dynamically. It comes with support for RELAX NG (both XML and compact syntax), W3C XML Schema (via a wrapper around Xerces-J), Schematron, and (recursively) NRL; it can also use any schema language with an implementation that supports the JARV interface."
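The core idea of namespace routing (split a document into sections by namespace, then hand each section to the validator registered for that namespace) can be sketched in Python. This is not NRL syntax or Jing's implementation; it only models "a new section starts wherever an element's namespace differs from its parent's", and the namespace URIs and validator functions are hypothetical.

```python
import xml.etree.ElementTree as ET


def namespace_of(tag):
    """ElementTree writes namespaced tags as '{uri}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def sections(root):
    """Yield (namespace, element) for each element that starts a section."""
    yield namespace_of(root.tag), root
    for parent in root.iter():
        for child in parent:
            if namespace_of(child.tag) != namespace_of(parent.tag):
                yield namespace_of(child.tag), child


def route(root, validators):
    """Send each section to the validator registered for its namespace."""
    report = []
    for uri, start in sections(root):
        validate = validators.get(uri)
        report.append((uri, validate(start) if validate else "no schema registered"))
    return report


doc = ET.fromstring('<a:doc xmlns:a="urn:example:a" xmlns:b="urn:example:b">'
                    '<a:p/><b:note><b:x/></b:note></a:doc>')
print(route(doc, {"urn:example:a": lambda section: "checked with schema A"}))
# [('urn:example:a', 'checked with schema A'), ('urn:example:b', 'no schema registered')]
```

A real implementation also has to decide how each validator sees foreign sections nested inside its own; this sketch simply leaves them in place.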
[November 25, 2002] "The Undecidability of the Schematron Conformance Problem." By Robert C. Lyons (Unidex, Inc). November 2002. "... The Post Correspondence Problem (PCP) is an undecidable problem... The proof that the PCP is undecidable can be built upon the undecidability of the Halting Problem. The undecidability of the PCP has often been used to prove the undecidability of other problems. It occurred to me that the undecidability of the PCP could be used to prove the undecidability of the Schematron Conformance Problem. In this paper, we'll describe the Schematron Conformance Problem (SCP) and Post Correspondence Problem (PCP). We'll then prove that the Schematron Conformance Problem is undecidable (even if we restrict ourselves to Schematron schemas that don't use the document and key functions). We assume that you are familiar with the basics of Schematron, which is a powerful XML schema language in which validity constraints are defined using XPath expressions... To prove that the SCP is undecidable, we show, by way of contradiction, that if the SCP is decidable, then the PCP is decidable..." Summary: "we showed that the Schematron Conformance Problem (SCP) is undecidable. This is true even if we restrict ourselves to Schematron schemas that do not use the document and key functions of XPath. To prove the undecidability of the SCP, we showed that if the SCP is decidable, then the Post Correspondence Problem is decidable; however, the Post Correspondence Problem is undecidable. Therefore, the SCP must be undecidable. The fact that the SCP is undecidable means that it's impossible to build a Schematron schema editor that evaluates the user's schema and always lets him/her know whether or not there are any XML documents that conform to the schema..." Note from Lyons in posting to XML-DEV: "Recently, I was reading about the Post Correspondence Problem (PCP), which is undecidable (i.e., unsolvable). 
I realized that the undecidability of the PCP could be used to prove that the following problem is undecidable: Given a Schematron schema, is there an XML document that conforms to this schema? If you're interested in reading the proof and a description of the PCP [see the paper]."
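To see why a general "does any document conform?" checker cannot exist, it helps to look at the problem Lyons reduces from. The sketch below is a bounded breadth-first search for Post Correspondence Problem solutions in Python; it is our illustration, not part of Lyons's paper. When a solution exists, a large enough bound finds it, but when none exists, raising the bound never lets the search conclude "no", which is the gap the undecidability proof formalizes. The domino set is a commonly used textbook instance.

```python
from collections import deque


def pcp_search(dominos, max_len):
    """Return the index sequence of a PCP solution using at most max_len
    dominos, or None if there is none within that bound."""
    queue = deque([((), "", "")])  # (indices so far, top string, bottom string)
    while queue:
        seq, top, bottom = queue.popleft()
        if len(seq) == max_len:
            continue
        for i, (t, b) in enumerate(dominos):
            new_top, new_bottom = top + t, bottom + b
            # Prune: a partial match is only useful if one side is a prefix of the other.
            if not (new_top.startswith(new_bottom) or new_bottom.startswith(new_top)):
                continue
            new_seq = seq + (i,)
            if new_top == new_bottom:
                return list(new_seq)
            queue.append((new_seq, new_top, new_bottom))
    return None


# Each domino is (top, bottom).
DOMINOS = [("b", "ca"), ("a", "ab"), ("ca", "a"), ("abc", "c")]
print(pcp_search(DOMINOS, 10))  # [1, 0, 2, 1, 3]: both rows spell "abcaaabc"
print(pcp_search(DOMINOS, 4))   # None: no solution of length 4 or less
```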
[November 22, 2002] A posting from Rick Jelliffe reports on a new Open Source Java implementation of Schematron 1.5. "This implementation will appeal to Java developers who want an uncomplicated implementation of Schematron, to be used with a JAXP-based XSLT engine, such as SAXON or XALAN. Phases are supported. Prepared by Eddie Robertsson, the code is now available at http://www.topologi.com/public/index.html. The distribution also includes support for embedded Schematron (in RELAX NG schemas, using James Clark's Jing package). This is Eddie's third implementation of a Schematron framework: he also wrote the VB implementation used by Topologi's Schematron Validator and the Java implementation used by Topologi's Collaborative Markup Editor." Note: "The Java API for XML Processing (JAXP) supports processing of XML documents using DOM, SAX, and XSLT. JAXP enables applications to parse and transform XML documents independent of a particular XML processing implementation..."
[June 21, 2002] Schematron (class_schematron.php). "... a class to process Schematron validations from PHP. XML documents can be validated from files, URLs or PHP strings and the Schematron script can be processed from a file, script or string too. Schematron scripts can be 'compiled' into XSLT stylesheets. Methods to validate files, documents or URLs using the compiled scripts are provided as well..." See PHP XML Classes: A collection of classes and resources to process XML using PHP.
[May 17, 2002] "Filling in the DTD Gaps with Schematron." By Bob DuCharme. From XML.com. May 15, 2002. ['Using Schematron to add greater capabilities to applications using DTDs.'] "Many XML developers, just when they've gotten used to DTDs, are hearing about alternatives and wondering what to do with them. W3C schemas, RELAX NG, Schematron -- which should they go with? What will each buy them? What software support does each have? How much of their current systems will they still be able to use? The feeling of unease behind these questions can be summed up with one question: if I leave DTDs behind to use one of the others, will I regret it? One nice thing about Schematron is its ability to work as an adjunct to the others, including DTDs, so you don't have to leave DTDs behind to take advantage of Schematron. To use Schematron in combination with RELAX NG, Sun's msv validator has an add-on library that lets you check a document against a RELAX NG schema with embedded Schematron rules, but you don't need a combination validator like msv to take advantage of Schematron. There's nothing wrong with checking a document against one type of schema and then checking it against a set of Schematron rules as well. In fact, more and more XML developers are realizing that a pipeline of specialized processes that each check for one class of problems can serve their needs better than a monolithic processor that does most of what they need and several more things that they don't need... This turns out to be the answer to the prayers of many developers wondering about the best way to move on from DTDs. If you have a working system built around DTDs and want to take advantage of the Schematron features that are unavailable in DTDs, you can go ahead and write the Schematron rules that fill in those gaps and continue using your DTD-based system... 
Once XPath 2.0 is implemented in some XSLT processors, I should be able to do type checking properly without moving beyond my Schematron+DTD combination. And I do have a lot right now: all the new possibilities of Schematron for specifying data constraints without giving up any of the features of DTDs and the extensive support available for them. Schematron support only requires an XSLT processor, and there are plenty of those around. The fact that [almost] all of the examples in this article were real problems that I had to solve for a project unrelated to this article made it clear to me: Schematron can add a lot to XML-based systems currently in production or in development without forcing us to leave DTDs behind until we're good and ready to..."
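DuCharme's point about a pipeline of specialized checkers, rather than one monolithic validator, can be sketched as a list of stages run in order. Python's standard library cannot validate against a DTD, so the first stage here is only a hypothetical stand-in for a DTD or grammar check; the second plays the role of Schematron-style rules. Stage names and the `memo`/`to` vocabulary are made up.

```python
import xml.etree.ElementTree as ET


def structure_stage(root):
    """Stand-in for a DTD/grammar check: only verifies the document element."""
    return [] if root.tag == "memo" else [f"expected <memo>, found <{root.tag}>"]


def rules_stage(root):
    """Schematron-style assertion layered on top of the grammar check."""
    return [] if root.find("to") is not None else ["a memo needs a to element"]


STAGES = [("structure", structure_stage), ("rules", rules_stage)]


def run_pipeline(document_text, stages):
    """Run each stage in order; stop at the first stage that reports problems,
    since later stages may assume what earlier ones checked."""
    root = ET.fromstring(document_text)  # raises ParseError if not well-formed
    for name, stage in stages:
        problems = stage(root)
        if problems:
            return name, problems
    return None, []


print(run_pipeline("<memo><to/></memo>", STAGES))  # (None, [])
print(run_pipeline("<memo/>", STAGES))             # ('rules', ['a memo needs a to element'])
```

Stopping at the first failing stage is a design choice; a pipeline that collects every stage's findings would suit a reporting tool better.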
[October 27, 2001] Updated Topologi Schematron Validator Supports Validation for Multiple Schema Languages. A posting from Rick Jelliffe announces an updated release of the Topologi Schematron Validator. The Topologi Schematron Validator is a "free Windows-based tool for checking XML documents against the assertions in a Schematron schema. Using the tool, you can (1) validate one or several XML documents using DTDs, W3C XML Schemas, Schematron schemas embedded in W3C XML Schemas, and Schematron schemas; (2) view the results in a convenient linked-view browser; (3) automatically generate RDF statements or Topic Maps; (4) edit the results; (5) print the results; and (6) save the results. A variety of Schematron schemas have been included in this distribution, including RSS, RDF, SOAP, SMIL, WSDL, QAML, XTM, XLink, WAI XHTML, RDDL XHTML, CALS tables and, of course, the Schematron 1.5 schema itself." Enhancements include use of the Microsoft MSXML 4 RTM library, support for the schemaLocation attribute, and improved schemas. Schematron is "an assertion language for XML based on matching combinations of XPath expressions. It can be used both as a schema language and for automatically generating external markup (such as RDF, XLinks and Topic Maps) to annotate XML documents."
[June 28, 2001] Topologi Announces Schematron Validator. A communiqué from Rick Jelliffe announces the availability of the Topologi Schematron Validator tool. The Schematron Validator user interface "supports validation of multiple files, and store/recall of locations to allow validation of chains of transformations. As well as Schematron validation, the tool also supports DTDs and W3C XML Schemas validation. Schematron schemas can be embedded in W3C XML Schemas schemas to augment them. The tool is highly reconfigurable. It also provides XSLT transformations, and the automatic generation of Topic Maps and RDF (beta). It allows simple editing and a variety of different viewers, including text, single-pane web-browser and double-pane web-browser. The Topologi Schematron Validator comes with schemas for CALS tables, NITF, QAML, RDDL, RDF, RSS, Schematron, SOAP, SMIL, XHTML WAI, WSDL, XLink, and XTM. These are all open source and readily accessible. The tool will be useful for anyone with documents which have constraints that cannot be expressed in schema languages such as DTDs, XML Schemas, RELAX, etc., and for creating friendly validators for files. Educators may find it convenient for teaching XML classes. Experimenters will appreciate the tool's configurability." The tool is available for no cost from the Topologi web site. The developers have provided an online version of the Schematron Validator manual, together with screen shots.
[February 12, 2001] Schematron Version 1.5 Assertion Language Supports 'phases' for Dynamic Validation. A communiqué from Rick Jelliffe (Academia Sinica Computing Centre) reports on the version 1.5 release of Schematron, including references for the updated specification, implementation, conformance tests, mailing list, and schemas web site. Schematron is a "simple XML-based assertion language using patterns in trees. Its uses include validation, automated link generation, and triggering actions based on complex criteria. Version 1.5 adds support for 'phases', a way of grouping patterns together to allow dynamic validation where different rules and assertions will be tested according to the phase; 'diagnostics', for generating very specific diagnostic hints; 'abstract rules', which allow more convenient declarations and type extension; and various smaller changes to allow elements to be decorated with more kinds of information that metastylesheets or user interfaces need."
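Phases are easy to picture as named subsets of patterns. The Python sketch below is an illustration, not Schematron syntax: patterns are plain functions, a phase is a list of pattern ids, and the key `#ALL` (borrowing the reserved phase name which, as far as we know, Schematron uses for "every pattern") selects all of them. The pattern ids, phase names and document vocabulary are hypothetical.

```python
import xml.etree.ElementTree as ET


def titles(root):
    return ["a section needs a title"
            for s in root.iter("section") if s.find("title") is None]


def links(root):
    return ["a link needs an href attribute"
            for a in root.iter("link") if a.get("href") is None]


PATTERNS = {"titles": titles, "links": links}
PHASES = {
    "draft": ["titles"],           # early on, only structural checks
    "final": ["titles", "links"],  # before publication, everything
    "#ALL": list(PATTERNS),
}


def validate(root, phase="#ALL"):
    """Run only the patterns that are active in the chosen phase."""
    messages = []
    for pattern_id in PHASES[phase]:
        messages.extend(PATTERNS[pattern_id](root))
    return messages


doc = ET.fromstring("<doc><section/><link/></doc>")
print(validate(doc, "draft"))  # ['a section needs a title']
print(validate(doc, "final"))  # ['a section needs a title', 'a link needs an href attribute']
```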
[April 2001] "Schematron: Validating XML Using XSLT." By Leigh Dodds. Paper presented at the XSLT UK Conference (2001) (Keble College, Oxford, England). "Schematron [Schematron] is a structure-based validation language, defined by Rick Jelliffe, as an alternative to existing grammar-based approaches. Tree patterns, defined as XPath expressions, are used to make assertions, and provide user-centred reports about XML documents. Expressing validation rules using patterns is often easier than defining the same rule using a content model. Tree patterns are collected together to form a Schematron schema. Schematron is a useful and accessible supplement to other schema languages. The open-source XSLT implementation is based around a core framework which is open for extension and customisation. This paper introduces the Schematron language and the available implementations. An overview of the architecture, with a view to producing customised versions, is also provided... Schematron is unique amongst current schema languages in its divergence from the regular grammar paradigm, and its user-centric approach. Schematron is not meant as a replacement for other schema languages; it is not expected to be easily mappable onto database schemas or programming language constructs. It is a simple, easy to learn language that can perform useful functions in addition to other tools in the XML developer's toolkit. It is also a tool with little overhead, both in terms of its learning curve and its requirements. XSLT engines are regular components in any XML application framework. Schematron's use of XPath and XSLT makes it instantly familiar to XML developers. A significant advantage of Schematron is the ability to quickly produce schemas that can be used to enforce house style rules and, more importantly, accessibility guidelines without alteration to the schema to which a document conforms.
An XHTML document is still an XHTML document even if it does not meet the Web Accessibility Initiative Guidelines [WAI]. These kinds of constraints describe a policy which is to be enforced on a document, and can thus be layered above other schema languages. Indeed in many cases it may be impossible for other languages to test these kinds of constraints."
[September 16, 2000] Rick Jelliffe (Academia Sinica Computing Center, Taipei) recently announced the creation of an open source Schematron project on the SourceForge website. SourceForge is 'a free service to Open Source developers offering easy access to the best in CVS, mailing lists, bug tracking, message boards/forums, task management, site hosting, permanent file archival, full backups, and total web-based administration.' Rick writes: "I am happy to announce the start of a project on the Source Forge website for Schematron. The two main facilities we are making use of are: (1) a mail list for anyone interested in Schematron, alternative schema languages and automated, external inference of assertions about structured data; I don't think it will be a high volume site, but I hope it will be interesting; (2) a central public site for adding implementations and schemas. To register for the mail-list, go to webpage http://lists.sourceforge.com/mailman/listinfo/schematron-love-in; to see the project (which will be loaded over the next few weeks), go to the project page." The Schematron is 'An XML Structure Validation Language using Patterns in Trees'. It is an assertion language for XML based on matching combinations of XPath expressions. It can be used both as a schema language and for automatically generating external markup (such as RDF, XLinks and Topic Maps) to annotate XML documents. "The Schematron differs in basic concept from other schema languages in that it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented which are inconvenient and difficult in grammar-based schema languages. If you know XPath or the XSLT expression language, you can start to use The Schematron immediately. The Schematron can be useful in conjunction with many grammar-based structure-validation languages: DTDs, XML Schemas, DCD, SOX, XDR.
The Schematron allows you to develop and mix two kinds of schemas: (1) Report elements allow you to diagnose which variant of a language you are dealing with. Many languages have these kinds of variants: HTML 2, HTML 3.2, Strict HTML 4, Transitional HTML 4, Frameset HTML 4, ISO HTML, etc. (2) Assert elements allow you to confirm that the document conforms to a particular schema. The Schematron is based on a simple action: First, find context nodes in the document (typically elements) based on XPath path criteria; then, check to see if some other XPath expressions are true, for each of those nodes." For an overview of the Schematron, see (1) "Introducing the Schematron. A fresh approach to XML validation and reporting," by Uche Ogbuji and (2) the Zvon Schematron tutorial by Miloslav Nic.
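Using report elements to diagnose which variant of a language a document belongs to can be sketched as report-style rules that fire when their test matches something. The Python below is a rough illustration that assumes XHTML-like, well-formed input; the element-to-variant heuristics are ours and far from a complete HTML classifier (they rely only on the fact that `font` and `center` are not part of Strict HTML 4).

```python
import xml.etree.ElementTree as ET

# (ElementTree path, message): a report fires when its path finds something.
VARIANT_REPORTS = [
    (".//frameset", "uses frameset: looks like Frameset HTML 4"),
    (".//font", "uses font: not Strict HTML 4, probably Transitional"),
    (".//center", "uses center: not Strict HTML 4, probably Transitional"),
]


def diagnose(document_text):
    root = ET.fromstring(document_text)
    return [message for path, message in VARIANT_REPORTS
            if root.find(path) is not None]


print(diagnose("<html><body><center><font>hi</font></center></body></html>"))
```

Unlike an assert, a report here is not a failure: the list of fired reports is the diagnosis.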
[October 19, 1999] Schematron - XML Structure Validation Language using Patterns in Trees. A communiqué from Rick Jelliffe of the Academia Sinica Computing Centre reports on the initial release of "The Schematron" application, an XML Structure Validation Language using Patterns in Trees. "The Schematron differs in basic concept from other schema languages in that it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented which are inconvenient and difficult in grammar-based schema languages. If you know XPath or the XSLT expression language, you can start to use 'The Schematron' immediately! The Schematron allows you to develop and mix two kinds of schemas: report schemas and assert schemas. It is implemented as a simple XSL script and works with the latest version of XT, and possibly other versions of XSL too. It complements content-model-based structural schema languages such as DTDs, DCD, XDR, SOX, and XML Schemas. The Schematron home page can be found at http://www.ascc.net/xml/resource/schematron/schematron.html; the web site includes some of the rationale behind the approach." See also the Schematron tutorial from Miloslav Nic [1999-10-27].
[September 16, 2000] "Introducing the Schematron. A fresh approach to XML validation and reporting." By Uche Ogbuji. In SunWorld Magazine (September 16, 2000). ['Judging from the ongoing developments and debates about XML document validation, it's evident the language is in flux. In this article, writer and consultant Uche Ogbuji gets a handle on some of these changes and introduces the Schematron, a new validation and reporting methodology and toolkit.'] "The Schematron is a validation and reporting methodology and toolkit developed by Rick Jelliffe, a member of the W3C Schema working group. Without denigrating the efforts of his group, Mr. Jelliffe has pointed out that XML Schemas may be too complex for many users, and so he approaches validation from the same approach as the DTD. Jelliffe developed the Schematron as a simple tool to harness the power of XPath, attacking the schema problem from a new angle. As he writes on his Website: 'The Schematron differs in basic concept from other schema languages in that it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented which are inconvenient and difficult in grammar-based schema languages.' The Schematron is no more than an XML vocabulary that can be used as an instruction set for generating stylesheets. . . There are several things that DTDs provide that Schematron cannot, such as entity and notation definitions, and fixed or default attribute values. RELAX does not provide any of these facilities either, but XML Schemas provide them all -- as they must, because they are positioned as a DTD replacement. RELAX makes no such claim, and indeed the RELAX documentation has a section on using RELAX in concert with DTDs. We have already mentioned that Schematron, far from claiming to be a DTD replacement, is positioned as an entirely fresh approach to validation.
Nevertheless, attribute-value defaulting can be a useful way to reduce the clutter of XML documents for human readability, so we'll examine one way to provide default attributes in association with Schematron. Remember that you're always free to combine DTDs with Schematron to get the best of both worlds, but if you want to leave DTDs behind, you can still get attribute-defaulting at the cost of one more pass through the document when the values are to be substituted. This can be done by a stylesheet that transforms a source document into a result that is identical except that all default attribute values are given. . . At Fourthought, we've used Schematron in deployed work products both for our clients and for ourselves. Because we already do a lot of work with XSLT, it's a very comfortable system and there's not much training required for XPath. To match the basic features of DTD, not a lot more knowledge is needed than path expressions, predicates, unions, the sibling and attribute axes, and a handful of functions. Performance has not been an issue because we typically have strong control over XML data in our systems and rarely use defaulted attributes. This allows us to validate only when a new XML datum is input, or an existing datum in our systems is modified, reducing performance concerns. Schematron is a clean, well-considered approach to validation and simple reporting. XML Schemas are significant, but it is debatable whether such a new and complex system is required for validation. RELAX and the Schematron both present simpler approaches coming from different angles, and might be a better fit for quick integration into XML systems. In any case, Schematron once again demonstrates the extraordinary reach of XSLT and the flexibility of XML as a data-management technology."
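The extra pass Ogbuji suggests, producing a copy of the document that is identical except that default attribute values are filled in, can also be sketched outside XSLT. The Python below uses a hypothetical defaults table and element names; a real setup would more likely derive the defaults from a schema or, as the article describes, write the pass as a stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults: element name -> {attribute name: default value}
DEFAULTS = {"price": {"currency": "USD"}, "item": {"qty": "1"}}


def apply_defaults(document_text, defaults):
    """Return the document unchanged except that missing defaulted attributes are filled in."""
    root = ET.fromstring(document_text)
    for elem in root.iter():
        for name, value in defaults.get(elem.tag, {}).items():
            if name not in elem.attrib:  # never overwrite an explicit value
                elem.set(name, value)
    return ET.tostring(root, encoding="unicode")


print(apply_defaults('<order><item><price>5</price></item><item qty="3"/></order>', DEFAULTS))
```

Running this before the Schematron pass lets the rules rely on the attributes being present, at the cost of the one extra pass through the document that the article mentions.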
[November 16, 1999] "Schematron: An Interview with Rick Jelliffe." By Simon St.Laurent. From XMLHack.com (November 15, 1999). ['Rick Jelliffe is the developer of the Schematron, a schema language that takes a very different approach from every other XML schema language proposed so far.'] "What inspired such a different approach? [A] It became clear when writing my book The XML & SGML Cookbook: Recipes for Structured Documents, especially the central pages on patterns (which are pretty novel), that DTDs merely provided an 'assembler language' to represent them. Even if you make parameter entities into first class objects and call them archetypes, you still are stuck with regular grammars at heart. When I started my book I wanted to produce something much more like what Liam Quin has independently and subsequently done, but I found that many interesting patterns are not clear to express using parameter entities... Anyway, I tried lots of different approaches. The 'path model' and the 'axis model' were two which basically act to allow more powerful right-hand-sides of the BNF production, as it were. They are comparable to Dave Raggett's 'assertion grammars' which work by allowing patterns on the left-hand-side of a production. I wrote a little note about using XSL as an implementation for validation that was well-received. So I guess that Schematron combines path models and assertion grammars, specified using XPaths, implemented through XSL... Schematron rejects the idea that the result of validation is a binary valid/invalid. The purpose of a schema is to make various assertions that should constrain a document; to report on the presence or absence of patterns. So the result of validation may be a complex set of values. Various backends should make use of that set of information, each in their way..."
"The Schematron Assertion Language." [Draft version only 2000-05-05; see 'The Schematron' web site for a (possibly) canonical URL. This draft points toward the new architecture for Schematron version 1.4.]