Examplotron

Table of contents

  1. Purposes
  2. Limitations
  3. Tutorial
    1. Getting started
    2. What about the attributes
    3. Occurrences
    4. Namespaces
    5. Assertions
    6. Imports
    7. Place holders
  4. Resources
  5. To do
  6. Acknowledgements
  7. History
  8. Legal statement

1. Purposes

The purpose of examplotron is to use instance documents as a lightweight schema language, possibly adding to the sample documents the information needed to guide a validator.

"Classical" XML validation languages such as DTDs, W3C XML Schema, Relax, Trex or Schematron rely on modeling either the structure (and possibly the datatypes) that a document must follow to be considered valid, or the rules that need to be checked.

This modeling relies on specific XML serialization syntaxes that must be understood before one can validate a document, and that are very different from the instance documents. Creating a new XML vocabulary therefore involves both creating a new syntax and mastering a syntax for the schema.

Many tools (including popular XML editors) are able to generate various flavors of XML schemas from instance documents, but they do not find enough information in the documents for these schemas to be directly usable, leaving the need for human tweaking and for a full understanding of the schema language.

Examplotron may then be used either as a validation language by itself, or to improve the generation of schemas expressed using other XML schema languages by providing more information to the schema translators.

2. Limitations

The obvious limitation of working with sample documents is that, while they are very effective at describing patterns that can be "shown" in a document, they cannot by themselves describe abstract "constructed" patterns.

To work around this limitation, one needs to introduce modeling elements or attributes, moving to a hybrid schema language involving both pure "schema by example" and modeling or rules construction.

The current release includes such an attribute (eg:occurs) to control the number of occurrences of an element (see section "Occurrences" for a detailed description of this attribute).

I plan to consider adding other similar elements or attributes to work around other similar restrictions such as:

3. Tutorial

3.1. Getting started

This first instance document (examplotron1.xml) is also an examplotron schema:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
	<bar>My first examplotron.</bar>
	<bar>Hello world</bar>
</foo>

This schema will validate all documents that use no namespace and have the "same" structure, i.e. three element nodes (a document element of type "foo" with two child elements of type "bar") and no attributes. Text nodes are not handled in this release and may appear anywhere.
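
To make the notion of "same" structure concrete, here is a hypothetical instance document (made up for illustration, not part of the examplotron distribution) that this schema should accept: the element names and counts match, while the text content differs freely. Adding a third bar, an attribute, or an element with another name would be expected to make it invalid.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document, made up for illustration. -->
<foo>
	<!-- Text nodes are not checked, so any content is accepted here. -->
	<bar>Some other text.</bar>
	<!-- An empty bar is still a bar element without attributes. -->
	<bar/>
</foo>
```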

The examplotron compiler (i.e. the compile.xsl XSLT sheet) transforms the examplotron schema into a stylesheet (examplotron1.xsl) that can be applied to any document to check if it has the same structure.

The structure of examplotron1.xsl (and therefore the transformation defined in compile.xsl) is very straightforward.

Default templates are generated that will raise errors when meeting unexpected elements or attributes:

   <x:template match="*">
      <error type="Unexpected element">
         <x:attribute name="path">
            <x:call-template name="getPath"/>
         </x:attribute>
      </error>
   </x:template>
   <x:template match="@*">
      <error type="Unexpected attribute">
         <x:attribute name="path">
            <x:call-template name="getAttPath"/>
         </x:attribute>
      </error>
   </x:template>

And a template is generated for each valid path:

   <x:template priority="1" match="/foo[count(bar)=2 and  1=1]">
      <x:apply-templates select="*|@*"/>
   </x:template>

Additional templates are also generated that are not absolutely necessary for the validation but do provide a more accurate diagnosis:

   <x:template match="/foo">
      <error type="Element content mismatch" name="/foo" expected="[count(bar)=2 and  1=1]"/>
   </x:template>
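
Put together, the fragments above amount to a stylesheet along the following lines. This is a hand-assembled sketch, not the actual examplotron1.xsl: the report wrapper element and the template for /foo/bar are assumptions, and the getPath and getAttPath named templates are replaced by a plain name() call to keep the example self-contained.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hand-assembled sketch, NOT the real output of compile.xsl. -->
<x:stylesheet xmlns:x="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <x:output method="xml" indent="yes"/>

   <!-- Assumption: collect all errors under a single "report" element. -->
   <x:template match="/">
      <report>
         <x:apply-templates select="*"/>
      </report>
   </x:template>

   <!-- Fallbacks: anything not explicitly allowed is reported.
        name() stands in for the getPath/getAttPath named templates. -->
   <x:template match="*">
      <error type="Unexpected element" path="{name()}"/>
   </x:template>
   <x:template match="@*">
      <error type="Unexpected attribute" path="@{name()}"/>
   </x:template>

   <!-- Diagnosis when foo is present but its content does not match. -->
   <x:template match="/foo">
      <error type="Element content mismatch" name="/foo" expected="[count(bar)=2 and  1=1]"/>
   </x:template>

   <!-- Valid paths: priority 1 beats every template above. -->
   <x:template priority="1" match="/foo[count(bar)=2 and  1=1]">
      <x:apply-templates select="*|@*"/>
   </x:template>
   <!-- Assumption: a similar template is generated for the bar children. -->
   <x:template priority="1" match="/foo/bar[ 1=1]">
      <x:apply-templates select="*|@*"/>
   </x:template>
</x:stylesheet>
```

Applied to a document that matches the schema, this sketch should produce an empty report element. A foo with three bar children only matches the lower-priority /foo template and should therefore yield a single "Element content mismatch" error.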

3.2. What about the attributes?

Attributes are also supported as shown by examplotron2.xml:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
	<bar true="no longer">My first examplotron.</bar>
	<bar>Hello world</bar>
</foo>

That will generate additional templates (examplotron2.xsl):

   <x:template match="/foo/bar">
      <error type="Element content mismatch" name="/foo/bar" expected="[@true and  1=1]"/>
   </x:template>
   <x:template priority="1" match="/foo/bar[@true and  1=1]">
      <x:apply-templates select="*|@*"/>
   </x:template>
   <x:template priority="1" match="/foo/bar/@true"/>

Note that the first of these three templates will be overridden by the higher-priority template generated from the second occurrence of the bar element:

   <x:template priority="1" match="/foo/bar[ 1=1]">
      <x:apply-templates select="*|@*"/>
   </x:template>

This example shows that examplotron applies an "or" between the structures defined by multiple occurrences of an element (i.e. elements with the same name and the same ancestors' names) and accepts all the variations that it finds.
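
As an illustration of this "or", the following hypothetical instance document (made up for illustration) should be accepted by the schema above, assuming the generated templates are the ones shown: the true attribute is allowed on any bar element, required on none, and only its presence is tested, not its value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document, made up for illustration. -->
<foo>
	<!-- Matches the variation without any attribute. -->
	<bar>No attribute here.</bar>
	<!-- Matches the variation with a "true" attribute; its value is free. -->
	<bar true="any value">Attribute on the second bar this time.</bar>
</foo>
```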

3.3. Occurrences

There are two main cases where controlling the occurrences through the samples is inconvenient. The first is when there is no fixed number of occurrences. The second is when we would like to use repeated occurrences in the examplotron to define alternatives rather than to define the number of occurrences in the target documents.

For these two cases, examplotron defines a simple mechanism, inspired by DTDs, that overrides the definition of the occurrences (examplotron3.xml):

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns:eg="http://examplotron.org/0/">
	<bar true="no longer">My first examplotron.</bar>
	<bar eg:occurs="+">Hello world</bar>
	<!-- eg:occurs could also have been set to "*", "." or "?" -->
</foo>

The value of the eg:occurs attribute can be "*" (0 or more), "+" (1 or more), "." (exactly one) or "?" (0 or 1), and it overrides the number of occurrences found in the examplotron (examplotron3.xsl):

   <x:template priority="1" match="/foo[count(bar)>0 and 1=1 and  1=1]">
      <x:apply-templates select="*|@*"/>
   </x:template>

The condition is generated by looking at the number of siblings with the same qualified name found in the instance document and at the eg:occurs attributes found on these elements. An eg:occurs attribute overrides the number of occurrences found in the instance document, and if several eg:occurs attributes are defined, a logical "or" of the values they allow is applied, as follows:

Number of siblings   eg:occurs="*"   eg:occurs="?"   eg:occurs="+"   eg:occurs="."   Condition
N                    no              no              no              no              = N
N                    no              no              no              yes             = 1
N                    yes             no              no              no              always true
N                    yes             no              no              yes             always true
N                    yes             yes             no              no              always true
N                    yes             yes             no              yes             always true
N                    yes             yes             yes             no              always true
N                    yes             yes             yes             yes             always true
N                    no              yes             no              no              <= 1
N                    no              yes             no              yes             <= 1
N                    no              yes             yes             no              always true
N                    no              yes             yes             yes             always true
N                    yes             no              yes             no              always true
N                    yes             no              yes             yes             always true
N                    no              no              yes             no              >= 1
N                    no              no              yes             yes             >= 1
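
As a worked example of this table, consider a hypothetical examplotron (not taken from the distribution) in which one bar carries eg:occurs="?" and another carries eg:occurs="+". The row where "?" and "+" are both present gives "always true", so any number of bar children under foo, including none, would be accepted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical examplotron combining two eg:occurs values. -->
<foo xmlns:eg="http://examplotron.org/0/">
	<!-- "?" alone would allow 0 or 1 bar. -->
	<bar eg:occurs="?">optional variation</bar>
	<!-- "+" alone would allow 1 or more; the "or" of both allows any count. -->
	<bar eg:occurs="+">repeatable variation</bar>
</foo>
```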

3.4. Namespaces

Examplotron supports namespaces without any known restriction, other than requiring an XSLT processor that fully supports namespace nodes as defined by the XSLT 1.0 recommendation.

To achieve this level of support, examplotron needs to rewrite the namespace prefixes.

The reason for this rewriting is that examplotron uses absolute XPath expressions that are constructed from the element names.

Keeping the prefixes (or lack of prefix) would be a problem when a default namespace has been defined in the instance document or the same prefixes are being reused.
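
A hypothetical examplotron (both namespace URIs below are invented for the example) shows the problem. The two elements are unprefixed, so a path built from their names alone would be /foo/bar, which in XPath 1.0 only matches elements that are in no namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical examplotron; both URIs are invented for the example. -->
<foo xmlns="http://example.org/ns/a">
	<!-- The default namespace is redefined here: same (empty) prefix, different URI. -->
	<bar xmlns="http://example.org/ns/b">Hello world</bar>
	<!-- A usable path needs one distinct prefix per URI, for instance
	     /p1:foo/p2:bar (the prefixes p1 and p2 are placeholders). -->
</foo>
```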

To cope with this rewriting, examplotron starts by storing all the namespace URIs found in the examplotron document in a global variable.

To be able to take advantage of this variable, examplotron relies on a function converting a result tree fragment into a node-set, which is available in most XSLT processors. The current version is tied to the SAXON implementation by this function (saxon:node-set()) and by the saxon:distinct() function:

  <!-- Get the list of namespaces in the instance document -->
  <xsl:variable name="ns-rtf">
    <namespaces>
      <xsl:for-each select="saxon:distinct(//namespace::*)">
        <ns orig-prefix="{name()}" uri="{.}" pos="{position()}"/>
      </xsl:for-each>
    </namespaces>
  </xsl:variable>
  <xsl:variable name="ns" select="saxon:node-set($ns-rtf)"/>

The dependency on SAXON is still weak, and examplotron could easily be ported to other XSLT processors, assuming they fully support namespace nodes. Please send me an email if you need such a port.
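
As a rough idea of what such a port might look like, the sketch below replaces the two Saxon extensions with their EXSLT counterparts, exsl:node-set() and set:distinct(), which several XSLT 1.0 processors implement. This is an assumption about one possible port, not something examplotron ships, and the root template is only there to display the collected list.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a possible port to EXSLT; not part of examplotron. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                xmlns:set="http://exslt.org/sets"
                exclude-result-prefixes="exsl set"
                version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Same list as above, with set:distinct() in place of saxon:distinct(). -->
  <xsl:variable name="ns-rtf">
    <namespaces>
      <xsl:for-each select="set:distinct(//namespace::*)">
        <ns orig-prefix="{name()}" uri="{.}" pos="{position()}"/>
      </xsl:for-each>
    </namespaces>
  </xsl:variable>
  <!-- exsl:node-set() plays the role of saxon:node-set(). -->
  <xsl:variable name="ns" select="exsl:node-set($ns-rtf)"/>

  <!-- Demonstration only: dump the collected namespace list. -->
  <xsl:template match="/">
    <xsl:copy-of select="$ns/namespaces"/>
  </xsl:template>
</xsl:stylesheet>
```

Whether a given processor exposes all the namespace nodes selected by //namespace::* would still need to be checked, which is why full namespace node support matters for any port.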

This prefix rewriting allows namespaces to be used in examplotron documents in a very straightforward and natural manner (examplotron4.xml):

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns:eg="http://examplotron.org/0/" xmlns:bar="http://http://examplotron.org/otherns/">
	<bar:bar true="no longer">My first examplotron.</bar:bar>
	<bar:bar eg:occurs="+">Hello world</bar:bar>
	<!-- eg:occurs could also have been set to "*", "." or "?" -->
</foo>

XSLT 1.0 does not allow namespace nodes to be created ex nihilo, so dummy attributes need to be created to generate the namespace declarations in the validation stylesheet (examplotron4.xsl):

<?xml version="1.0" encoding="UTF-8"?>
<x:stylesheet xmlns:x="http://www.w3.org/1999/XSL/Transform" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
 xmlns:eg="http://examplotron.org/0/" 
 xmlns:saxon="http://icl.com/saxon" 
 xmlns:ns1="http://www.w3.org/XML/1998/namespace" 
 xmlns:ns2="http://examplotron.org/0/" 
 xmlns:ns3="http://http://examplotron.org/otherns/" 
 version="1.0" ns1:dummy="" ns2:dummy="" ns3:dummy="">

The XPath expressions can then use the rewritten prefixes in a consistent way, such as in:

   <x:template match="/foo">
      <error type="Element content mismatch" name="/foo" expected="[count(ns3:bar)>0 and 1=1 and  1=1]"/>
   </x:template>
   <x:template priority="1" match="/foo[count(ns3:bar)>0 and 1=1 and  1=1]">
      <x:apply-templates select="*|@*"/>
   </x:template>

3.5. Assertions

In order to describe more complex rules, it is possible to define assertions (i.e. statements that need to be met) as XPath expressions using "eg:assert" attributes (examplotron5.xml):

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns:eg="http://examplotron.org/0/"  eg:assert="sum(percent)=100">
<!-- The sum of the values of the "percent" element needs to be equal to 100 -->
	<percent eg:occurs="+">100</percent>
</foo>

The expression found in an "eg:assert" attribute is copied in the matching templates by the compiler (examplotron5.xsl):

   <x:template priority="1" 
    match="/foo[count(percent)&gt;0 and 1=1 and (sum(percent)=100) and  1=1]">
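
For instance, the following hypothetical instance document (made up for illustration) should satisfy the assertion, since 60 + 40 = 100. Changing either value alone would be expected to make the foo element fail to match the template above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document, made up for illustration. -->
<foo>
	<!-- count(percent) > 0 holds and sum(percent) = 60 + 40 = 100. -->
	<percent>60</percent>
	<percent>40</percent>
</foo>
```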

Assertions can be used without restriction in documents using namespaces, but please remember that XPath expressions do not support default namespaces (examplotron6.xml):

<?xml version="1.0" encoding="UTF-8"?>
<foo:foo xmlns:eg="http://examplotron.org/0/"  eg:assert="sum(bar:percent)=100" 
    xmlns:foo="http://examplotron/otherns/foo" xmlns:bar="http://examplotron/otherns/bar">
	<bar:percent eg:occurs="+">100</bar:percent>
</foo:foo>

To deal with possible redefinitions of namespace prefixes, the compiler copies into the template all the namespace nodes found on the element that carries the assertion (examplotron6.xsl):

   <x:template xmlns:bar="http://examplotron/otherns/bar" 
    xmlns:foo="http://examplotron/otherns/foo" priority="1"
    match="/ns3:foo[count(ns4:percent)>0 and 1=1 and (sum(bar:percent)=100) and  1=1]">
      <x:apply-templates select="*|@*"/>
   </x:template>

Many thanks to David Carlisle for this tip.

Warning: when an element is repeated, the "eg:assert" attribute needs to be included on the first occurrence of this element.

3.6. Imports

There are a couple of reasons why one might want to import examplotron schemas.

The first one, common to any schema language, is to write modular vocabularies with as few interdependencies as possible.

The second one, more specific to examplotron, is to consolidate the patterns found in different samples, for instance when the document element might be different.

The import operation is performed using the eg:import element (yes, it's our first element!) and its meaning is to "import" the definitions from another examplotron by merging the possibilities described in the imported document with those from the current document.

The eg:import element is defined as a simple XLink and its location within the importing document is not significant.

If we want to extend our first example to accept "foo" elements with three instances of "bar", we can do it by importing it into a new examplotron (examplotron7.xml):

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns:eg="http://examplotron.org/0/" xmlns:xlink="http://www.w3.org/1999/xlink">
	<eg:import xlink:href="examplotron1.xml"/>
	<bar>one</bar>
	<bar>two</bar>
	<bar>three</bar>
</foo>

One should note that the import is a merge between the patterns found in several examplotrons, and that neither the order in which the imports are done nor the location of the import statement is significant.
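
The second motivation, a different document element, could look like the following hypothetical examplotron (the baz and qux names are invented for the example). Assuming the merge works as described, the resulting schema should accept both documents rooted at baz with a single qux child and the foo documents described by examplotron1.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical examplotron; baz and qux are made-up names. -->
<baz xmlns:eg="http://examplotron.org/0/" xmlns:xlink="http://www.w3.org/1999/xlink">
	<!-- Merges in the patterns of the first example (foo with two bar children). -->
	<eg:import xlink:href="examplotron1.xml"/>
	<qux>A second kind of document.</qux>
</baz>
```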

3.7. Place holders

There are cases where it may be necessary to add patterns on top of what has already been described, especially when designing modules to be added on top of core vocabularies.

The examplotron defining a module needs to define where the additions will sit without redefining the core vocabulary.

Using an element to locate its children in the structure without redefining it is done using eg:placeHolder attributes.

If we want to add optional "sub-bar" elements under the "bar" elements of our first example, we will write (examplotron8.xml):

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns:eg="http://examplotron.org/0/" xmlns:xlink="http://www.w3.org/1999/xlink" 
     eg:placeHolder="true">
	<eg:import xlink:href="examplotron1.xml"/>
	<bar eg:placeHolder="true">
		<sub-bar eg:occurs="*">here we are.</sub-bar>
	</bar>
</foo>

Note: The full implementation of this feature requires a modification in the architecture of the compiler, and the current implementation does not properly support the eg:occurs attributes under placeholder elements (these elements are always considered as having eg:occurs="*" attributes). This will be fixed in a future release if the feedback from the community leads to keeping this feature as defined here.
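
With this limitation in mind, a hypothetical instance document such as the following (made up for illustration) would be expected to validate against examplotron8.xml: the two bar elements follow the first example, and each may hold any number of sub-bar children.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document, made up for illustration. -->
<foo>
	<bar>
		<!-- sub-bar is declared with eg:occurs="*": zero or more. -->
		<sub-bar>one</sub-bar>
		<sub-bar>two</sub-bar>
	</bar>
	<!-- A bar without any sub-bar, as in the first example. -->
	<bar>Hello world</bar>
</foo>
```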

4. Resources

This documentation has been written as an RDDL document, and this section will be developed to include more resources related to examplotron.

XSLT Compiler

This compiler is an XSLT transformation that compiles an examplotron schema into an XSLT transformation that can be used to check that documents conform to this schema.

This release must be run using Saxon. The transformation generated by the compiler can be run by any XSLT 1.0 processor.

W3C XML Schema for examplotron

This W3C XML Schema (Proposed Recommendation, 16 March 2001) schema describes the examplotron vocabulary and can be imported in W3C XML Schema to validate examplotron schemas.

CSS Stylesheet

A CSS stylesheet borrowed from RDDL, used to provide the "look-and-feel" of this document and suitable in general for RDDL documents.

CSS Stylesheet (original).

Original version of the previous CSS stylesheet on rddl.org.

XYZFind Server User's Guide

Chapter 7 of the XYZFind Server User's Guide describes a schema language used by XYZFind Server that is very similar to examplotron.

Proposal for XSL

This early proposal for XSL proposed a syntax similar to the one used by examplotron for expressing patterns.

5. To do

After a couple of hours working on examplotron, I was pleasantly surprised to find that this simple tool was beginning to be useful while still being very simple (or simplistic).

The current version is already a powerful tool that can be used to validate documents.

It can be used as a main validation tool, or as a complement of a more classical validation tool, for instance, to add additional requirements and constraints to existing vocabularies when an application is using a subset of a vocabulary.

This being said, the simplicity of the tool leaves room for many applications and extensions, on which your feedback is welcome:

  1. Documentation: develop the resources section.
  2. Optimization: filtering of duplicate templates.
  3. Optimization: improvement of the named templates that generate the XPath paths.
  4. Interface: generation of a Schematron schema from an examplotron.
  5. Interface: generation of structure-based schemas (W3C XML Schema, Relax, TREX, ...) from an examplotron.
  6. Extension: ability to accurately control the number of occurrences (may already be done using eg:assert).
  7. Extension: ability to control text nodes.
  8. Extension: ability to control the order of the elements.
  9. Extension: ability to define recursive models.
  10. Extension: ability to add type information.
  11. Extension: ability to generate a post validation document.
  12. Extension: ability to explicitly control the occurrences of attributes.
  13. Readability: ability to add documentation.
  14. Proof of concept: write an examplotron schema for XHTML.
  15. Anything else?

6. Acknowledgements

Many thanks to the many people who have given me hints, ideas or encouragement, or even let me think that examplotron could be the best invention since the French baguette.

Note: the French baguette is another very simple invention made only of flour, salt, yeast and water (exactly like examplotron, which is made out of XML 1.0, Namespaces in XML 1.0, XPath 1.0 and XSLT 1.0). The Anglo-American sliced bread (often used in this context), involving more ingredients and postprocessing, is far more complex and obviously does not belong to the same category as examplotron.

Non normative list (by chronological order): Simon St.Laurent, Edd Dumbill, John Cowan, Len Bullard, Rick Jelliffe, Evan Lenz, Dan Brickley, Jonathan Borden, David Mundie, David Carlisle, Murata Makoto, Cyril Jandia, Amelia A. Lewis, Gavin Thomas Nicol, Tim Mueller-Seydlitz, Michael Champion, ...

7. History

V0.1

  • Creation

V0.2

  • Addition of several sections (limitations, acknowledgements, history and legal)
  • Clarifications after comments through xml-dev and private mails.
  • Addition of a history section in compile.xsl
  • Creation of a W3C XML Schema for examplotron (examplotron.xsd).
  • Start to feed the resources section.

V0.3

  • Addition of eg:assert.
  • Rewrite of the history section as RDDL resources.
  • Addition of new resources.
  • Expansion of the list of acknowledgements.

V0.4

  • Addition of eg:import.
  • Addition of eg:placeHolder.
  • Restructured the document.
  • Addition of new resources.
  • Expansion of the list of acknowledgements.