[Cache from http://www.jenitennison.com/datatypes/DTLL.html, 2005-08-05; please see the canonical source.]
This document is a basic specification of the Datatype Library Language (DTLL). It includes, embedded within it, the RELAX NG Compact Syntax schema for DTLL. There are still many areas that require greater detail.
This version is a simplification of the previous version of DTLL which attempts to find the minimum required to support the definition of datatypes for the purposes of validation. In particular, the changes are:
Hierarchies of datatypes have been removed: datatypes no longer have supertypes or subtypes, and consequently do not have parameters or constraints. The concept of abstract datatypes is also no longer needed. It is still possible to create datatypes that are based on other datatypes, however; for example, to create an integer between 1 and 10, you could do:
<datatype name="integer-from-1-to-10">
  <variable name="integer" select="." type="integer" />
  <condition test="$integer >= 1" />
  <condition test="$integer &lt;= 10" />
</datatype>
There's no longer a specialised parsing method for enumerated values: these can be parsed using regular expressions and tested against external code lists using normal constraints by accessing the code lists using the doc() function.
The method for parsing lists of values has been simplified: the DTLL processor only has to break up the list into separate values; testing that these values are of particular types can be done using constraints.
There's no method for specifying the collation used to compare values of a particular datatype. The main purpose of supplying a collation is to facilitate XPath datatyping rather than validation. Although the lack of collations makes writing conditions harder, it's still generally possible to do so without them.
A couple of extra extension functions to XPath 1.0 have been added, though others have been removed.
Mapping has been modified to support the kinds of things you'd otherwise do with hierarchies, including strong and weak typing. The nooks and crannies of mappings and type conversions haven't been properly explored yet, but I thought it was better to "release early" than wait 'til I had time to do so.
Unlike XML Schema, RELAX NG doesn't provide a mechanism for users to define their own types. If they're not satisfied with the two built-in types of string and token, RELAX NG users have to create a datatype library, which they then refer to from the schema.
Most RELAX NG validators provide built-in support for the XML Schema datatype library. Many also support an interface that allows you to plug in datatype modules, written in the programming language of your choice, to define extra datatypes. But the fact that these datatype libraries have to be programmed means that ordinary users find them hard to construct.
One option would be for RELAX NG validators to support datatype definition via XML Schema - using <xs:simpleType> elements to create new atomic types. However, there are several problems with this:
It wouldn't be particularly easy for implementations to support the <xs:simpleType> elements in isolation, but RELAX NG validators don't want to have to be able to understand XML Schema schemas.
It wouldn't be particularly easy for RELAX NG users to switch to using the very different style employed by XML Schema, and again RELAX NG users don't want to have to be able to write XML Schema schemas.
Creating user-defined datatypes based on the XML Schema datatypes means incorporating all the built-in types, including types that are unlikely to be required for a particular schema.
In general, the XML Schema type system goes against RELAX NG's open philosophy, for example by dictating the required format for numbers and dates when different markup languages might reasonably use different formats (for internationalisation reasons, for example).
So the primary motivation for putting together a language for datatype libraries is to enable RELAX NG users to construct their own datatypes without having to resort to a procedural programming language or having to learn how to use XML Schema, which might not be suited for their needs.
datatypes xs = "http://www.w3.org/2001/XMLSchema-datatypes"
default namespace dt = "http://www.jenitennison.com/datatypes"
namespace local = ""
start = \datatypes
<datatypes> is the document element.
The version attribute holds the version of the datatype library language. The current version is 0.4.
If a DTLL version 0.4 processor encounters a datatype library with a version higher than 0.4, it must treat any attributes or elements that it doesn't understand (that are not part of DTLL 0.4) in the same way as it would treat extension attributes or elements found in the same location.
\datatypes = element datatypes { attribute version { "0.4" }, ns?, extension-attribute*, top-level-element* }
top-level-element |= named-datatype
top-level-element |= top-level-map
top-level-element |= \include
top-level-element |= \div
top-level-element |= extension-top-level-element
<include> elements include datatype libraries from elsewhere. It is as if the content of the included document (the children of the <datatypes> element) is inserted into the datatype library in place of the <include> element.
\include = element include { attribute href { xs:anyURI }, extension-attribute* }
It is an error for a datatype library to contain circular includes. If the datatype library A includes the datatype library B, then B must not include A or include any datatype library that (at any remove) includes A.
<div> elements are simply used to partition a datatype library and to provide a scope for ns attributes.
\div = element div { ns?, extension-attribute*, top-level-element* }
Extension top-level elements can be used to hold data that is used within the datatype library (such as code lists used to test enumerated values), documentation, or information that is used by implementations. For example, an extension top-level element can be used by an implementation to define extension functions (using XSLT, for example) that can be used in the XPath expressions used within the datatype library.
extension-top-level-element = extension-element
Named datatypes are given at the top level of the datatype library using <datatype> elements. Each named datatype has a qualified name that can be used to refer to it.
The name of the datatype is given in the name attribute. If this is unprefixed, the nearest ancestor ns attribute (including one on the <datatype> element itself) is used to provide the namespace for the datatype.
named-datatype = element datatype { attribute name { xs:QName }, ns?, extension-attribute*, datatype-definition-element* }
Anonymous datatypes are used to provide the datatype for a property or variable if that property or variable's type can't be referred to by name.
anonymous-datatype = element datatype { extension-attribute*, datatype-definition-element* }
Datatypes are referenced using qualified names. If the qualified name hasn't got a prefix, the nearest ancestor ns attribute (including one on the element that's referring to the datatype) is used to resolve the name.
datatype-reference = xs:QName
A datatype definition consists of a number of elements that test values and define variables. If a value passes the tests specified by these elements, then it's a valid value for the datatype.
datatype-definition-element |= property
datatype-definition-element |= parse
datatype-definition-element |= condition
datatype-definition-element |= except
datatype-definition-element |= variable
datatype-definition-element |= local-map
datatype-definition-element |= extension-definition-element
Extension definition elements can be used at any point within a datatype definition. If a processor doesn't recognise an extension definition element, it must ignore it and behave as if the value passed whatever test the extension definition element represented.
Extension definition elements can be used to hold documentation about the datatype. For example, an <eg:example> element might be used to provide example legal values of the datatype:
<datatype name="RRGGBBColour">
  <eg:example>#FFFFFF</eg:example>
  <eg:example>#123456</eg:example>
  <parse name="RRGGBB">
    <regex>#(?[RR][0-9A-F]{2})(?[GG][0-9A-F]{2})(?[BB][0-9A-F]{2})</regex>
  </parse>
  ...
</datatype>
extension-definition-element = extension-element
Certain aspects of a datatype definition can be negated by being placed in an <except> element. A value is only valid if it isn't valid according to any of the datatype definition elements held within an <except> element.
except = element except { extension-attribute*, negative-test+ }
negative-test |= condition
negative-test |= variable
negative-test |= parse
Parsing can perform two functions: it tests whether a value adheres to a particular format, and can assign a tree value to a variable to enable pieces of the string value to be extracted, tested, assigned to properties and so on.
The <parse> element holds one or more parsing methods, at least one of which must be satisfied in order for the value to be considered valid. The name attribute, if present, specifies the name of the variable to which the tree resulting from the parse is assigned. The first successful parse of those specified within the <parse> element is used to give the value of this variable (thus the processor does not have to attempt any further parses once one has been successful).
A datatype can specify as many <parse> elements as it wishes. All must be satisfied by a value for that value to be a legal value of the datatype.
parse = element parse { name?, preprocess*, extension-attribute*, parsing-method+ }
Before a value is parsed by a <parse> element, it can be preprocessed. This does not change the string value, but it may simplify the specification of the parsing method that's used.
The only built-in form of preprocessing is whitespace processing. The whitespace can be preserved ('preserve'), whitespace characters replaced by space characters ('replace'), or leading and trailing whitespace stripped and sequences of whitespace characters replaced by spaces ('collapse', the default).
preprocess |= attribute whitespace { "preserve" | "replace" | "collapse" }
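The three whitespace modes can be sketched in a few lines of Python. This is only an illustration, not part of DTLL: the function name `preprocess` is hypothetical, and it assumes that "whitespace characters" means the four XML whitespace characters (tab, newline, carriage return and space).

```python
import re

# Assumption: whitespace means the XML whitespace characters.
_WS = "\t\n\r "

def preprocess(value: str, whitespace: str = "collapse") -> str:
    """Return the form of `value` that is handed to the parsing method."""
    if whitespace == "preserve":
        return value
    if whitespace == "replace":
        # Every whitespace character becomes one space; the length is unchanged.
        return re.sub(f"[{_WS}]", " ", value)
    if whitespace == "collapse":
        # Strip both ends, then squeeze each internal run to a single space.
        return re.sub(f"[{_WS}]+", " ", value.strip(_WS))
    raise ValueError(f"unknown whitespace mode: {whitespace!r}")
```

Note that the function returns a new string: the original string value is left untouched, in line with the statement that preprocessing does not change the string value.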
There are two core methods of parsing: via a regular expression, and by specifying a list. This set of methods can be supplemented by extension parsing elements.
parsing-method |= regex
parsing-method |= \list
parsing-method |= extension-parsing-element
The <regex> element specifies parsing via an extended regular expression. To be a legal value, the entire string value must be matched by the regular expression. (Although it's legal to use ^ and $ to mark the beginning and end of the matched string, it's not necessary.)
The tree value generated by parsing consists of a root (document) node with text node and element children. The string value of the root (document) node is the parsed string value itself. There is one element for each named subexpression. The element's local name is the name of the subexpression, and its namespace is the one bound to the prefix used in that name; if no prefix is used, the element is in no namespace. The string value of each of these elements is the part of the string value matched by the corresponding subexpression.
For example, the regex:
(?[year]-?[0-9]{4})-(?[month][0-9]{2})-(?[day][0-9]{2})
parsing the value:
2003-12-19
generates the tree:
(root)
 +- year
 |   +- "2003"
 +- "-"
 +- month
 |   +- "12"
 +- "-"
 +- day
     +- "19"
regex = element regex { regex-flags*, extension-attribute*, extended-regular-expression }
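As a rough illustration of how named subexpressions behave, the sketch below rewrites DTLL's (?[name]...) syntax into Python's (?P<name>...) groups and requires the whole value to match. The helper names are hypothetical, and the sketch makes simplifying assumptions: subexpression names are plain unprefixed identifiers, the rest of the pattern is also valid Python regex syntax (XPath 2.0 regular expressions differ in places), and a flat dictionary stands in for the tree of element nodes.

```python
import re

def to_python_regex(dtll_regex: str) -> str:
    """Rewrite DTLL (?[name]...) groups as Python (?P<name>...) groups."""
    return re.sub(r"\(\?\[([A-Za-z_][A-Za-z0-9_]*)\]", r"(?P<\1>", dtll_regex)

def parse_regex(dtll_regex: str, value: str):
    """Return {name: matched text} if the entire value matches, else None."""
    match = re.fullmatch(to_python_regex(dtll_regex), value)
    return None if match is None else match.groupdict()

DATE = r"(?[year]-?[0-9]{4})-(?[month][0-9]{2})-(?[day][0-9]{2})"
print(parse_regex(DATE, "2003-12-19"))
```

Using `re.fullmatch` mirrors the rule that the entire string value must be matched, so a value with trailing characters is rejected even though the pattern has no explicit ^ or $.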
Four attributes modify the way in which regular expressions are applied. These are equivalent to the flags available within XPath 2.0.
By default, the "." meta-character matches all characters except the newline (#xA) character. If dot-all="true" then "." matches all characters, including the newline character.
regex-flags |= attribute dot-all { boolean }
By default, ^ matches the beginning of the entire string and $ the end of the entire string. If multi-line="true" then ^ matches the beginning of each line as well as the beginning of the string, and $ matches the end of each line as well as the end of the string. Lines are delimited by newline (#xA) characters.
regex-flags |= attribute multi-line { boolean }
By default, the regular expression is case sensitive. If case-insensitive="true" then the matching is case-insensitive, which means that the regular expression "a" will match the string "A".
regex-flags |= attribute case-insensitive { boolean }
By default, whitespace within the regular expression matches whitespace in the string. If ignore-whitespace="true", whitespace in the regular expression is removed prior to matching, and you need to use "\s" to match whitespace. This can be used to create more readable regular expressions.
<regex ignore-whitespace="true">
  (?[year][0-9]{4})-
  (?[month][0-9]{2})-
  (?[day][0-9]{2})
</regex>
This is not the same as <parse whitespace="collapse">...</parse>, which preprocesses the string value itself.
regex-flags |= attribute ignore-whitespace { boolean }
Boolean values are 'true' or 'false', with optional leading and trailing whitespace.
boolean = xs:boolean { pattern = "true|false" }
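For readers more familiar with another regular expression engine, the four flags can be approximated in Python as sketched below. The function `compile_regex` and its parameter names are hypothetical. One deliberate choice: ignore-whitespace is implemented by deleting whitespace from the pattern rather than by using `re.VERBOSE`, because `re.VERBOSE` also treats "#" as the start of a comment, which would break patterns such as the colour examples in this document.

```python
import re

def compile_regex(pattern: str, dot_all: bool = False, multi_line: bool = False,
                  case_insensitive: bool = False, ignore_whitespace: bool = False):
    """Compile `pattern` with rough Python equivalents of the DTLL flags."""
    flags = 0
    if dot_all:
        flags |= re.DOTALL        # "." also matches newline
    if multi_line:
        flags |= re.MULTILINE     # ^ and $ also match at line boundaries
    if case_insensitive:
        flags |= re.IGNORECASE
    if ignore_whitespace:
        # Remove whitespace from the pattern itself before matching.
        pattern = re.sub(r"[\t\n\r ]+", "", pattern)
    return re.compile(pattern, flags)
```

For example, `compile_regex(" # [0-9A-F]{2} ", ignore_whitespace=True)` matches "#1F", because the spaces in the pattern are dropped and the "#" is kept as a literal character.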
The <list> element specifies parsing of the string value into a list of values, simply using a separator attribute to provide a regular expression to break up the list into items.
The result of parsing the string value based on the <list> element is a node-set of sibling elements. The names of the item elements are implementation-defined.
For example, if you have:
<list separator="\s*,\s*" />
and the string value:
1, 2, 3, 45
then the variable is set to the elements in the tree:
(root)
 +- item
 |   +- "1"
 +- item
 |   +- "2"
 +- item
 |   +- "3"
 +- item
     +- "45"
These elements need not be named 'item'.
The separator attribute specifies a regular expression that matches the separators in the list. The default is "\s+" (one or more whitespace characters). It is an error if the regular expression matches an empty string (i.e. if it matches "").
\list = element list { attribute separator { regular-expression }?, extension-attribute* }
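List parsing amounts to splitting on the separator, as this Python sketch shows. The function name `parse_list` is hypothetical, a plain list of strings stands in for the node-set of item elements, and the value is assumed to have been whitespace-preprocessed already. Returning an empty list for an empty value is an assumption; the text above does not say what should happen in that case.

```python
import re

def parse_list(value: str, separator: str = r"\s+"):
    """Split `value` into items using the separator regular expression."""
    sep = re.compile(separator)
    # A separator that can match "" is an error according to the text above.
    if sep.match("") is not None:
        raise ValueError("separator must not match the empty string")
    # Assumption: an empty value yields an empty list.
    return sep.split(value) if value else []

print(parse_list("1, 2, 3, 45", r"\s*,\s*"))
```

Checking that the items are of a particular type is not part of this step; as described earlier, that is left to conditions.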
Extension parsing elements can be used to parse values using methods other than the core methods explained above. Extension parsing elements can be used, for example, to parse a value using EBNF or PEGs.
If the extension parsing element isn't recognised, the value is considered to fail the parse. If the extension parsing element occurs in a <parse> element without any alternative parsing methods, this means no value can match the datatype, and the implementation must issue a warning. Usually, an extension parsing element will be used alongside a built-in parsing method.
<parse name="path">
  <ext:ebnf ref="http://www.w3.org/1999/xpath" />
  <regex dot-all="true">.*</regex>
</parse>
extension-parsing-element = extension-element
Conditions define run-time tests that check values.
The <condition> element tests whether a particular condition is satisfied by a value. The value is not valid if the test evaluates to false.
condition = element condition { extension-attribute*, test }
Tests are done through a test attribute which holds an XPath expression. If the effective boolean value of the result of evaluating the XPath expression is true then the test succeeds and the condition is satisfied.
test = attribute test { XPath }
Properties and variables declare variables for use in binding expressions (i.e. XPath expressions). Property variables are of the form $this.name where name is the name of the property; ordinary variables just use the name of the variable. The variable $this refers to the value itself (as does the XPath expression .).
Variable binding is carried out in the order the variables are declared. It is an error if a variable is referenced without being declared. The scope of a variable binding is limited to the following siblings of the variable declaration and their descendants.
The <property> element specifies a property of the datatype. The values of properties are available via the dt:property() extension function within XPath expressions in DTLL (or via other implementation-defined APIs).
The value of a property for a value can be referenced using $this.name where name is the value of the name attribute on the <property> element.
For example, consider:
<datatype name="RRGGBB">
  <parse name="colour">
    <regex ignore-whitespace="true">
      #(?[red][0-9A-F]{2})
       (?[green][0-9A-F]{2})
       (?[blue][0-9A-F]{2})
    </regex>
  </parse>
  <property name="red" select="$colour/red" />
  <property name="green" select="$colour/green" />
  <property name="blue" select="$colour/blue" />
</datatype>
property = element property { name, type?, binding, extension-attribute* }
The <variable> element binds a value to a variable. Variables are similar to properties except that their values aren't accessible via APIs. The value of a variable is accessed through $name, where name is the name of the variable. It is an error if the name of a variable starts with (or is) 'this'. For future use, it is also an error if the name of a variable starts with (or is) 'type'.
Variables are used for intermediate calculations.
variable = element variable { name, type?, binding, extension-attribute* }
There are two ways to specify a type: via a type attribute or via an anonymous <datatype> element.
type |= attribute type { datatype-reference }
type |= anonymous-datatype
If there is a mapping specified from the type of the provided value to the required type, then that mapping is used to convert the value to the required type. Otherwise, if the value is a standard XPath 1.0 type (string, number, boolean or node-set), then that value is converted to a string using the string() function and interpreted as the string value of the required type. Otherwise (there's no mapping and the value is not a standard XPath type), it's an error.
If no type is specified for a variable or property, then the supplied value is used directly. Note that this value may be a standard XPath type (string, number, boolean or node-set).
There are two built-in ways to bind a value to a property or variable: through the value attribute, which holds a literal value, or through a select attribute, which holds an XPath expression. Implementations can also define their own extension binding elements.
binding = (literal-value | select), extension-binding-element*
If a value attribute is specified, its value is the string value of the value of the variable or property; the type of the variable or property is used to interpret that value.
literal-value = attribute value { text }
If a select attribute is specified, the XPath expression it contains is evaluated to give the value of the property or variable.
select = attribute select { XPath }
Extension binding elements are used where more power is needed to specify the value of a property or variable. This can be used to provide values using methods such as XSLT or MathML. If an implementation does not support any of the extension binding elements specified, then it must assign to the variable the value specified by the value or select attribute instead. If an implementation supports one or more of the extension binding elements, then it must use the first extension binding element it understands to calculate the value of the variable.
extension-binding-element = extension-element
Maps provide a way of converting a value of one datatype to another datatype. Maps are either strong or weak. If there's a strong map from datatype A to datatype B then every legal value of datatype A must map onto a legal value of datatype B. A weak map means that some of the values of datatype A can be mapped on to legal values of datatype B. In both cases, the mapping is uni-directional: often a strong map from A to B is coupled with a weak map from B to A.
The <map> element defines a map from one datatype to another. The attributes of the <map> element define how the mapping is done.
Note that it is possible for there to be maps to and from two datatypes, but it is not necessarily the case that a round-trip will result in the same string value.
For example, with the datatype definitions:
<datatype name="UKDate">
  <parse name="date">
    <regex ignore-whitespace="true">
      (?[day][0-9]{1,2})/(?[month][0-9]{1,2})/(?[year][0-9]{4})
    </regex>
  </parse>
  <property name="year" select="$date/year" />
  <property name="month" select="$date/month" />
  <property name="day" select="$date/day" />
</datatype>

<datatype name="ISODate">
  <parse name="date">
    <regex ignore-whitespace="true">
      (?[year][0-9]{4})-(?[month][0-9]{2})-(?[day][0-9]{2})
    </regex>
  </parse>
  <property name="year" select="$date/year" />
  <property name="month" select="$date/month" />
  <property name="day" select="$date/day" />
</datatype>

<map from="UKDate" to="ISODate"
     select="concat(format-number($this.year, '0000'), '-',
                    format-number($this.month, '00'), '-',
                    format-number($this.day, '00'))" />
<map from="ISODate" to="UKDate"
     select="concat($this.day, '/', $this.month, '/', $this.year)" />
the UKDate "5/1/1947" maps to the ISODate "1947-01-05", which maps back to the UKDate "05/01/1947".
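The round trip can be reproduced outside DTLL with a small Python sketch. The function names are hypothetical, and the sketch assumes that ISODate values use hyphens as separators, as the value "1947-01-05" indicates. Zero-padded integer formatting stands in for format-number().

```python
import re

def uk_to_iso(uk: str) -> str:
    """Mimic the UKDate-to-ISODate map: zero-pad each component."""
    m = re.fullmatch(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})", uk)
    if m is None:
        raise ValueError(f"not a UKDate: {uk!r}")
    day, month, year = (int(part) for part in m.groups())
    return f"{year:04d}-{month:02d}-{day:02d}"

def iso_to_uk(iso: str) -> str:
    """Mimic the ISODate-to-UKDate map: plain concatenation, no reformatting."""
    m = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", iso)
    if m is None:
        raise ValueError(f"not an ISODate: {iso!r}")
    year, month, day = m.groups()
    return f"{day}/{month}/{year}"

print(iso_to_uk(uk_to_iso("5/1/1947")))
```

The leading zeros introduced on the way to ISODate are never removed on the way back, which is why the round trip yields a different (though equally valid) UKDate string.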
Local maps appear within a <datatype> element. A local map with a to attribute defines a map from the datatype in which it appears to the datatype referenced in the to attribute; a local map with a from attribute defines a map from the datatype referenced in the from attribute to the datatype in which it appears. Top-level maps appear within the <datatypes> element and define maps from the datatype referenced in the from attribute to the datatype referenced in the to attribute.
local-map = element map { (from | to), kind?, mapping, extension-attribute* }
top-level-map = element map { from, to, kind?, mapping, extension-attribute* }
The to attribute holds a reference to a datatype, which is the datatype to which a value can be mapped, or a *. The value * indicates that the map describes how to map from the datatype specified by the from attribute to any other datatype.
to = attribute to { datatype-reference | "*" }
The from attribute holds a reference to a datatype, which is the datatype from which a value can be mapped, or a *. The value * indicates that the map describes how to map to the datatype specified by the to attribute from any other datatype.
from = attribute from { datatype-reference | "*" }
The kind attribute indicates whether the map is a strong or weak map. Strong maps are guaranteed to succeed; weak maps may fail, depending on the value. If the kind attribute is missing, the <map> element defines a strong map if both datatypes are specified and a weak map if the map is to/from any type.
kind = attribute kind { "strong" | "weak" }
A <map> element that specifies a map from datatype A to datatype B also implicitly defines weak maps from A to any type via B and from any type to B via A.
It is an error if there are two or more explicit maps defined between the same two datatypes, or from/to a datatype and any type. It is an error if there are two or more implicit maps from/to a datatype and any type unless there is an explicit map from/to that datatype and any type. There can only be one map defined to be from any type to any type.
The following is an error:
1 | <map from="A" to="B" select="..." />
2 | <map from="A" to="C" select="..." />
1 sets up an explicit map from A to B. This sets up an implicit map from A to any type via B and from any type to B via A. 2 sets up an explicit map from A to C. This sets up an implicit map from A to any type via C and from any type to C via A. This is an error because there are two implicit maps from A to any type and no explicit map from A to any type. To fix the error, an explicit map from A to any type needs to be created:
<map from="A" to="*" as="B" />
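The check for conflicting implicit maps can be sketched as follows. This is an illustration under stated assumptions rather than a normative algorithm: maps are reduced to hypothetical (from, to) pairs, "*" stands for any type, and maps that already involve "*" are assumed not to generate implicit maps of their own.

```python
from collections import Counter

ANY = "*"

def implicit_map_conflicts(explicit_maps):
    """Return the (from, to) keys that have two or more implicit maps
    and no explicit map to resolve the ambiguity."""
    explicit_maps = list(explicit_maps)
    explicit = set(explicit_maps)
    implicit = Counter()
    for src, dst in explicit_maps:
        if ANY in (src, dst):
            continue  # assumption: maps to/from any type imply nothing further
        implicit[(src, ANY)] += 1   # from src to any type, via dst
        implicit[(ANY, dst)] += 1   # from any type to dst, via src
    return sorted(key for key, count in implicit.items()
                  if count >= 2 and key not in explicit)

print(implicit_map_conflicts([("A", "B"), ("A", "C")]))
print(implicit_map_conflicts([("A", "B"), ("A", "C"), ("A", "*")]))
```

The first call reports the conflict on maps from A to any type; the second call, which adds the explicit map from A to any type, reports none.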
The map itself is defined through a binding which creates a string which is a valid string value for the target datatype or through an as attribute. If an as attribute is provided, the mapping should be carried out via the intermediate datatype specified by the as attribute.
mapping |= binding
mapping |= attribute as { datatype-reference }
To work out how to convert from a source value of type S to a target value of a required type R, an application has to locate an appropriate mapping pathway to use. A mapping pathway can consist of several steps via intermediate types. To convert from S to R, the value is converted from S to an intermediate type I and then from I to R.
There is a mapping pathway from S to R if a mapping binding is specified for converting directly from S to R, or if there is a mapping pathway from S to I and a mapping pathway from I to R.
There may be multiple mappings specified from S to R. The first of the following list of available mappings that provides a mapping pathway from S to R is used.
A (strong or weak) mapping defined to be from S to R.
An explicit strong mapping defined to be from S to any type.
An explicit strong mapping defined to be from any type to R.
An explicit weak mapping defined to be from S to any type.
An explicit weak mapping defined to be from any type to R.
An implicit mapping defined to be from S to any type.
An implicit mapping defined to be from any type to R.
A mapping defined to be from any type to any type.
Consider the following mapping definitions and the mapping from A to B:
1 | <map from="A" to="*" as="C" />
2 | <map from="A" to="C" select="..." />
3 | <map from="A" to="D" select="..." />
4 | <map from="D" to="B" select="..." />
These explicit mappings generate the following implicit mappings:
5 | <map from="*" to="C" as="A" />
6 | <map from="*" to="D" as="A" />
7 | <map from="D" to="*" as="B" />
8 | <map from="*" to="B" as="D" />
There are two possible mappings from A to B: 1 (explicitly from A to any type, via C) and 8 (implicitly from any type to B via D). Since 1 is preferred over 8, we first try to find a mapping pathway via C. There's a mapping binding from A to C (2), and an implicit mapping (8) from any type to B via D, so we try to find a mapping from C to D and from D to B. There's an implicit mapping from any type to D (6) via A, so we need mappings from C to A and from A to D. There's no mapping from C to A, so there's no mapping pathway based on 1.
Since the mapping defined by 1 does not lead to a mapping pathway, we try to find a mapping pathway using the mapping defined by 8, via D. There's a mapping binding defined from A to D (3) and a mapping binding defined from D to B (4).
The final conversion used is from A to D to B, using the mapping bindings defined in 3 and 4.
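The search described above can be sketched in Python. Everything here is an illustrative reading of the prose, not a normative algorithm: the `Map` class and the function names are hypothetical, an implicit map is assumed to be suppressed when an explicit map exists for the same from/to pair (which is what the list of implicit mappings 5 to 8 suggests), and a set of pairs currently being resolved is used to stop circular searches.

```python
from dataclasses import dataclass
from typing import Optional

ANY = "*"

@dataclass(frozen=True)
class Map:
    src: str
    dst: str
    via: Optional[str] = None    # intermediate type from `as`; None means a binding
    kind: Optional[str] = None   # "strong", "weak", or None for the default
    explicit: bool = True

    @property
    def strong(self) -> bool:
        if self.kind is not None:
            return self.kind == "strong"
        return ANY not in (self.src, self.dst)  # default kind

def with_implicit(maps):
    """Add the implicit maps generated by each explicit type-to-type map."""
    explicit_keys = {(m.src, m.dst) for m in maps}
    result = list(maps)
    for m in maps:
        if ANY in (m.src, m.dst):
            continue
        if (m.src, ANY) not in explicit_keys:
            result.append(Map(m.src, ANY, via=m.dst, kind="weak", explicit=False))
        if (ANY, m.dst) not in explicit_keys:
            result.append(Map(ANY, m.dst, via=m.src, kind="weak", explicit=False))
    return result

def candidates(maps, s, r):
    """Maps applicable to converting s to r, in the priority order listed above."""
    def pick(src, dst, explicit, strong=None):
        return [m for m in maps
                if (m.src, m.dst) == (src, dst) and m.explicit == explicit
                and (strong is None or m.strong == strong)]
    return (pick(s, r, True)
            + pick(s, ANY, True, True) + pick(ANY, r, True, True)
            + pick(s, ANY, True, False) + pick(ANY, r, True, False)
            + pick(s, ANY, False) + pick(ANY, r, False)
            + pick(ANY, ANY, True))

def find_path(maps, s, r, active=frozenset()):
    """Return the list of types visited when converting s to r, or None."""
    if s == r:
        return [s]
    if (s, r) in active:          # already being resolved: avoid looping
        return None
    active = active | {(s, r)}
    for m in candidates(maps, s, r):
        if m.via is None:         # a binding converts in a single step
            return [s, r]
        first = find_path(maps, s, m.via, active)
        second = find_path(maps, m.via, r, active)
        if first and second:
            return first + second[1:]
    return None

maps = with_implicit([Map("A", ANY, via="C"), Map("A", "C"),
                      Map("A", "D"), Map("D", "B")])
print(find_path(maps, "A", "B"))
```

Run on the four explicit mappings of the example, the search first tries mapping 1 via C, fails to get from C to B, and then falls back to the implicit mapping via D, giving the pathway A, D, B.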
XPath 1.0 expressions are used to bind values to variables or properties and to express tests in conditions.
XPath = text
Variable and property values are available within an XPath expression if the variable or property is declared prior to the XPath expression.
Within a datatype library, each datatype has a corresponding extension function named after the name of the datatype. This function takes a single argument, which can be of any type, and returns a typed value of the type specified by the name of the function. The supplied value is converted to the required type using the same rules as for type conversions for variables. Note that this works for all datatypes, including lists.
Other extension functions are:
dt:item(list-value, number)
returns the item in the list-value at the index given by the number (counting starts from 1); returns an empty string if the number is greater than the number of items in the list-value. Values that aren't of a list type are treated like list-type values with a single item.
dt:property(value, prop-name)
returns the value of the named property for the value
dt:if(test, true, false)
returns the true value if the test is true and the false value
if the test is false. Note that both the true and false arguments
are evaluated (unlike the if
expression in
XPath 2.0.
dt:default(value, default)
returns the first argument if the effective boolean value of the first argument is true, and the second argument otherwise
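The behaviour of dt:item(), dt:if() and dt:default() can be modelled in Python as below. The Python names are hypothetical stand-ins, lists model list values and node-sets, and `ebv` approximates the XPath 1.0 effective boolean value. Returning an empty string for an index below 1 is an assumption; the description only covers indexes that are too large.

```python
import math

def ebv(value) -> bool:
    """Approximate XPath 1.0 effective boolean value."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0 and not math.isnan(value)
    return len(value) > 0          # strings and node-sets (modelled as lists)

def dt_item(list_value, number):
    """1-based item access; a non-list value acts as a single-item list."""
    items = list_value if isinstance(list_value, list) else [list_value]
    index = int(number)
    # Assumption: indexes below 1 also give the empty string.
    return items[index - 1] if 1 <= index <= len(items) else ""

def dt_if(test, true_value, false_value):
    """Both branches are already evaluated by the time this is called."""
    return true_value if ebv(test) else false_value

def dt_default(value, default):
    return value if ebv(value) else default
```

Python evaluates function arguments eagerly, which matches the note that both the true and false arguments of dt:if() are evaluated.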
A regular expression as defined in XPath 2.0
regular-expression = text
Extended regular expressions can have named subexpressions. Named subexpressions are specified with the syntax (?[name]regex) where name is the name of the subexpression and regex is the subexpression itself.
(?[year]-?[0-9]{4})-(?[month][0-9]{2})-(?[day][0-9]{2})
extended-regular-expression = text
name = attribute name { xs:NCName }
dt-name = attribute dt:name { xs:NCName }
ns = attribute ns { xs:anyURI }
Extension elements are any elements that aren't in the DTLL namespace. They can contain anything (including DTLL elements). Extension attributes are any attributes that are in neither the DTLL namespace nor no namespace (unprefixed). They can have any kind of value.
extension-element = element * - dt:* { anything }
extension-attribute = attribute * - (local:* | dt:*) { text }
anything = attribute * { text }*, mixed { element * { anything }* }
This example shows a way of defining the numeric datatypes from XML Schema, plus a hexadecimal byte datatype. It includes an extension function defined within the datatype library using XSLT 2.0.
<datatypes version="0.4" xmlns="http://www.jenitennison.com/datatypes" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xsl:version="2.0" xmlns:eg="http://www.jenitennison.com/datatypes/examples" ns="http://www.jenitennison.com/datatypes/examples"> <datatype name="double"> <parse name="double"> <regex>(?[mantissa](\+|-)?[0-9]+(\.[0-9]+)?)([eE](?[exponent][0-9]+))?</regex> <regex>(?[inf]\+?INF)</regex> <regex>(?[neginf]-INF</regex> <regex>(?[nan]NaN)</regex> </parse> <variable name="mantissa" type="decimal" select="$double/mantissa" /> <variable name="exponent" type="integer" select="dt:default($double/exponent, 0)" /> <property name="xpath-value" select="dt:if($double/inf, 1 div 0, dt:if($double/neginf, -1 div 0, dt:if($double/nan, number('NaN'), eg:power($mantissa, $exponent))))" /> </datatype> <datatype name="decimal"> <parse> <regex>(\+|-)?[0-9]+(\.[0-9]+)?</regex> </parse> <map to="double" select="." /> <map from="double" kind="weak" select="." /> </datatype> <datatype name="integer"> <parse> <regex>(\+|-)?[0-9]+</regex> </parse> <map from="decimal" select="round(.)" /> <map to="decimal" select="." /> </datatype> <datatype name="nonNegativeInteger"> <parse> <regex>\+?[0-9]+</regex> </parse> <map from="integer" select="dt:if(. >= 0, ., -.)" /> <map to="integer" select="." /> </datatype> <datatype name="positiveInteger"> <condition test=". != 0" /> <map to="nonNegativeInteger" select="." /> <map from="nonNegativeInteger" kind="weak" select="." /> </datatype> <datatype name="nonPositiveInteger"> <parse> <regex>-[0-9]+</regex> </parse> <map from="integer" select="dt:if(. > 0, -., .)" /> <map to="integer" select="." /> </datatype> <datatype name="negativeInteger"> <variable name="value" type="nonPositiveInteger" select="." /> <condition test=". != 0" /> <map to="nonPositiveInteger" select="." /> <map from="nonPositiveInteger" kind="weak" select="." /> </datatype> <datatype name="long"> <variable name="value" type="integer" select="." /> <condition test=". 
>= -9223372036854775808" /> <condition test=". <= 9223372036854775807" /> <map to="integer" select="." /> <map from="integer" kind="weak" select="." /> </datatype> <datatype name="int"> <variable name="value" type="long" select="." /> <condition test=". >= -2147483648" /> <condition test=". <= 2147483647" /> <map to="long" select="." /> <map from="long" kind="weak" select="." /> </datatype> <datatype name="short"> <variable name="value" type="int" select="." /> <condition test=". >= -32768" /> <condition test=". <= 32767" /> <map to="int" select="." /> <map from="int" kind="weak" select="." /> </datatype> <datatype name="byte"> <variable name="value" type="short" select="." /> <condition test=". >= -128" /> <condition test=". <= 127" /> <map to="short" select="." /> <map from="short" kind="weak" select="." /> </datatype> <datatype name="unsignedLong"> <variable name="value" type="nonNegativeInteger" select="." /> <condition test=". <= 18446744073709551615" /> <map to="nonNegativeInteger" select="." /> <map from="nonNegativeInteger" kind="weak" select="." /> </datatype> <datatype name="unsignedInt"> <variable name="value" type="unsignedLong" select="." /> <condition test=". <= 4294967295" /> <map to="unsignedLong" select="." /> <map from="unsignedLong" kind="weak" select="." /> </datatype> <datatype name="unsignedShort"> <variable name="value" type="unsignedInt" select="." /> <condition test=". <= 65535" /> <map to="unsignedInt" select="." /> <map from="unsignedInt" kind="weak" select="." /> </datatype> <datatype name="unsignedByte"> <variable name="value" type="unsignedShort" select="." /> <condition test=". <= 255" /> <map to="unsignedShort" select="." /> <map from="unsignedShort" kind="weak" select="." 
/> </datatype> <datatype name="hexByte"> <parse> <regex>[0-9A-F]{2}</regex> </parse> <variable name="hexDigits" select="'0123456789ABCDEF'" /> <map to="unsignedByte" select="string-length(substring-before($hexDigits, substring(., 1, 1))) * 16 + string-length(substring-before($hexDigits, substring(., 2, 1)))" /> <map from="unsignedByte" select="concat(substring($hexDigits, floor(. div 16) + 1, 1), substring($hexDigits, . mod 16 + 1, 1))" /> </datatype> <xsl:function name="eg:power"> <xsl:param name="number" /> <xsl:param name="power" /> <xsl:sequence select="eg:_power($number, $power, 1)" /> </xsl:function> <xsl:function name="eg:_power"> <xsl:param name="number" /> <xsl:param name="power" /> <xsl:param name="result" /> <xsl:choose> <xsl:when test="$power = 0"> <xsl:sequence select="$result" /> </xsl:when> <xsl:otherwise> <xsl:sequence select="eg:_power($number, $power - 1, $result * $number)" /> </xsl:otherwise> </xsl:choose> </xsl:function> </datatypes>
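As a cross-check on the arithmetic in the numeric example, the following Python sketch mirrors two of its mappings: the value of a double and the conversion between hexByte and unsignedByte. It is only an illustration, not part of DTLL. The function names are hypothetical, the regular expression is a Python translation of the mantissa/exponent pattern (including its unsigned exponent), and the double value is assumed to be the mantissa multiplied by ten to the power of the exponent.

```python
import re

# Digits in the same order as the $hexDigits variable in the example.
HEX_DIGITS = "0123456789ABCDEF"

# Python translation of the mantissa/exponent regex; like the example,
# it does not accept a signed exponent.
DOUBLE_RE = re.compile(
    r"(?P<mantissa>[+-]?[0-9]+(\.[0-9]+)?)([eE](?P<exponent>[0-9]+))?"
)


def double_value(lexical: str) -> float:
    """Return the numeric value of a double written in the example's syntax."""
    if lexical in ("INF", "+INF"):
        return float("inf")
    if lexical == "-INF":
        return float("-inf")
    if lexical == "NaN":
        return float("nan")
    match = DOUBLE_RE.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a double: {lexical!r}")
    # A missing exponent defaults to 0, as dt:default() does in the example.
    exponent = int(match.group("exponent") or 0)
    # Assumption: value = mantissa * 10 ** exponent.
    return float(match.group("mantissa")) * 10 ** exponent


def hex_byte_to_unsigned_byte(lexical: str) -> int:
    """Map a two-digit upper-case hex string to an integer in 0..255."""
    if re.fullmatch(r"[0-9A-F]{2}", lexical) is None:
        raise ValueError(f"not a hexByte: {lexical!r}")
    # The position of each digit in HEX_DIGITS is its value (0-based here,
    # which is what string-length(substring-before(...)) yields in XPath).
    return HEX_DIGITS.index(lexical[0]) * 16 + HEX_DIGITS.index(lexical[1])


def unsigned_byte_to_hex_byte(value: int) -> str:
    """Map an integer in 0..255 to a two-digit upper-case hex string."""
    if not 0 <= value <= 255:
        raise ValueError(f"not an unsignedByte: {value!r}")
    # Python indexing is 0-based; XPath substring() is 1-based, so the
    # XPath form of this lookup needs an offset of 1.
    return HEX_DIGITS[value // 16] + HEX_DIGITS[value % 16]
```

Converting every value from 0 to 255 to hexByte and back should return the original value, which is a quick way to catch off-by-one errors in the digit lookup.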
This example illustrates the date, time and duration types from XML Schema and XPath 2.0.
<datatypes version="0.4" xmlns="http://www.jenitennison.com/datatypes" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xsl:version="2.0" xmlns:eg="http://www.jenitennison.com/datatypes/examples" ns="http://www.jenitennison.com/datatypes/examples"> <datatype name="dateTime"> <parse name="dateTime"> <regex ignore-whitespace="true"> (?[date]-?[0-9]{4,}-[0-9]{2}-[0-9]{2}) T(?[time][0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?) (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))? </regex> </parse> <property name="timezone" type="timezone" select="$dateTime/timezone" /> <property name="date" type="date" select="concat($dateTime/date, $this.timezone)" /> <property name="time" type="time" select="concat($dateTime/time, $this.timezone)" /> <property name="year" type="year" select="dt:property($this.date, 'year')" /> <property name="month" type="month" select="dt:property($this.date, 'month')" /> <property name="day" type="day" select="dt:property($this.date, 'day')" /> <property name="hour" type="hour" select="dt:property($this.time, 'hour')" /> <property name="minute" type="minute" select="dt:property($this.time, 'minute')" /> <property name="second" type="second" select="dt:property($this.time, 'second')" /> </datatype> <datatype name="date"> <parse name="date"> <regex ignore-whitespace="true"> (?[year]-?[0-9]{4,})- (?[month][0-9]{2})- (?[day][0-9]{2}) (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))? 
</regex> </parse> <property name="year" type="year" select="$date/year" /> <property name="month" type="month" select="$date/month" /> <property name="day" type="day" select="$date/day" /> <property name="timezone" type="timezone" select="$date/timezone" /> <condition test="($this.month = 1 or $this.month = 3 or $this.month = 5 or $this.month = 7 or $this.month = 8 or $this.month = 10 or $this.month = 12) or $this.day <= 30" /> <condition test="$this.month != 2 or $this.day <= 28 or ($this.day = 29 and ($this.year mod 400 = 0 or ($this.year mod 4 = 0 and not($this.year mod 100 = 0))))" /> <map from="dateTime" select="dt:property(., 'date')" /> <map to="dateTime" select="concat(dt:property(., 'year'), '-', dt:property(., 'month'), '-', dt:property(., 'day'), 'T00:00:00', dt:property(., 'timezone'))" /> </datatype> <datatype name="time"> <parse name="time"> <regex ignore-whitespace="true"> (?[hour][0-9]{2}): (?[minute][0-9]{2}): (?[second][0-9]{2}(\.[0-9]+)?) (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))? </regex> </parse> <property name="hour" type="hour" select="$time/hour" /> <property name="minute" type="minute" select="$time/minute" /> <property name="second" type="second" select="$time/second" /> <property name="timezone" type="timezone" select="$time/timezone" /> <map from="dateTime" select="dt:property(., 'time')" /> </datatype> <datatype name="gYearMonth"> <parse name="gYearMonth"> <regex ignore-whitespace="true"> (?[year]-?[0-9]{4,})- (?[month][0-9]{2}) (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))? 
</regex> </parse> <property name="year" type="year" select="$gYearMonth/year" /> <property name="month" type="month" select="$gYearMonth/month" /> <property name="timezone" type="timezone" select="$gYearMonth/timezone" /> <map from="date" select="concat(dt:property(., 'year'), '-', dt:property(., 'month'), dt:property(., 'timezone'))" /> <map to="date" select="concat(dt:property(., 'year'), '-', dt:property(., 'month'), '-01', dt:property(., 'timezone'))" /> </datatype> <datatype name="gYear"> <parse name="gYear"> <regex ignore-whitespace="true"> (?[year]-?[0-9]{4,}) (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))? </regex> </parse> <property name="year" type="year" select="$gYear/year" /> <property name="timezone" type="timezone" select="$gYear/timezone" /> <map from="gYearMonth" select="concat(dt:property(., 'year'), dt:property(., 'timezone'))" /> <map to="gYearMonth" select="concat(dt:property(., 'year'), '-01', dt:property(., 'timezone'))" /> </datatype> <datatype name="gMonthDay"> <parse name="date"> <regex ignore-whitespace="true"> --(?[month][0-9]{2})- (?[day][0-9]{2}) (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))? </regex> </parse> <property name="month" type="month" select="$date/month" /> <property name="day" type="day" select="$date/day" /> <property name="timezone" type="timezone" select="$date/timezone" /> <condition test="($this.month = 1 or $this.month = 3 or $this.month = 5 or $this.month = 7 or $this.month = 8 or $this.month = 10 or $this.month = 12) or $this.day <= 30" /> <condition test="$this.month != 2 or $this.day <= 29" /> <map from="date" select="concat('--', dt:property(., 'month'), '-', dt:property(., 'day'), dt:property(., 'timezone'))" /> </datatype> <datatype name="gMonth"> <parse name="date"> <regex ignore-whitespace="true"> --(?[month][0-9]{2}) (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))? 
</regex> </parse> <property name="month" type="month" select="$date/month" /> <property name="timezone" type="timezone" select="$date/timezone" /> <map from="gMonthDay" select="concat('--', dt:property(., 'month'), dt:property(., 'timezone'))" /> <map to="gMonthDay" select="concat('--', dt:property(., 'month'), '-01', dt:property(., 'timezone'))" /> </datatype> <datatype name="gDay"> <parse name="date"> <regex ignore-whitespace="true"> ---(?[day][0-9]{2}) (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))? </regex> </parse> <property name="day" type="day" select="$date/day" /> <property name="timezone" type="timezone" select="$date/timezone" /> <map from="gMonthDay" select="concat('---', dt:property(., 'day'), dt:property(., 'timezone'))" /> </datatype> <datatype name="year"> <parse> <regex>-?[0-9]{4,}</regex> </parse> <condition test=". != 0" /> </datatype> <datatype name="month"> <parse> <regex>[0-9]{2}</regex> </parse> <condition test=". >= 1" /> <condition test=". <= 12" /> <variable name="month-element" select="document('')/*/eg:months/eg:month[position() = $this]" /> <property name="abbreviation" select="string($month-element/@abbr)" /> <property name="name" select="string($month-element)" /> </datatype> <eg:months> <eg:month abbr="Jan">January</eg:month> <eg:month abbr="Feb">February</eg:month> <eg:month abbr="Mar">March</eg:month> <eg:month abbr="Apr">April</eg:month> <eg:month abbr="May">May</eg:month> <eg:month abbr="Jun">June</eg:month> <eg:month abbr="Jul">July</eg:month> <eg:month abbr="Aug">August</eg:month> <eg:month abbr="Sep">September</eg:month> <eg:month abbr="Oct">October</eg:month> <eg:month abbr="Nov">November</eg:month> <eg:month abbr="Dec">December</eg:month> </eg:months> <datatype name="day"> <parse> <regex>[0-9]{2}</regex> </parse> <condition test=". >= 1" /> <condition test=". <= 31" /> </datatype> <datatype name="hour"> <parse> <regex>[0-9]{2}</regex> </parse> <condition test=". >= 0" /> <condition test=". 
< 24" /> </datatype> <datatype name="minute"> <parse> <regex>[0-9]{2}</regex> </parse> <condition test=". >= 0" /> <condition test=". < 60" /> </datatype> <datatype name="second"> <parse> <regex>[0-9]{2}(\.[0-9]+)?</regex> </parse> <condition test=". >= 0" /> <condition test=". <= 60" /> </datatype> <datatype name="timezone"> <parse name="timezone"> <regex>Z</regex> <regex>(?[hour](\+|-)[0-9]{2}):(?[minute][0-9]{2})</regex> </parse> <condition test="$timezone = 'Z' or ($timezone/hour >= -14 and $timezone/hour <= 14)" /> <condition test="$timezone = 'Z' or ($timezone/minute >= 0 and $timezone/minute < 60)" /> <map to="dayTimeDuration" select="dt:if($timezone = 'Z', 'PT0H0S', dt:if($timezone/hour >= 0, concat('PT', $timezone/hour, 'H', $timezone/minute, 'M'), concat('-PT', -$timezone/hour, 'H', $timezone/minute, 'M')))" /> <map from="dayTimeDuration" kind="weak" select="dt:if(dt:property(., 'hours') = 0 and dt:property(., 'minutes') = 0, 'Z', dt:if(dt:property(., 'hours') >= 0, concat('+', format-number(dt:property(., 'hours'), '00'), ':', format-number(dt:property(., 'minutes'), '00')), concat('-', format-number(-dt:property(., 'hours'), '00'), ':', format-number(-dt:property(., 'minutes'), '00'))))" /> </datatype> <include href="numbers.dtl" /> <datatype name="duration"> <parse name="duration"> <regex ignore-whitespace="true"> (?[neg]-)? P(?[years][0-9]+Y)? (?[months][0-9]+M)? (?[days][0-9]+D)? (T(?[hours][0-9]+H)? (?[minutes][0-9]+M)? (?[seconds][0-9]+(\.[0-9]+)?S)?)? 
</regex> </parse> <condition test="$duration/years or $duration/months or $duration/days or $duration/hours or $duration/minutes or $duration/seconds" /> <variable name="neg" type="integer" select="dt:if($duration/neg, -1, 1)" /> <property name="years" type="nonNegativeInteger" select="$neg * dt:default($duration/years, 0)" /> <property name="months" type="nonNegativeInteger" select="$neg * dt:default($duration/months, 0)" /> <property name="days" type="nonNegativeInteger" select="$neg * dt:default($duration/days, 0)" /> <property name="hours" type="nonNegativeInteger" select="$neg * dt:default($duration/hours, 0)" /> <property name="minutes" type="nonNegativeInteger" select="$neg * dt:default($duration/minutes, 0)" /> <property name="seconds" type="decimal" select="$neg * dt:default($duration/seconds, 0)" /> </datatype> <datatype name="canonical-duration"> <parse> <regex ignore-whitespace="true"> (?[neg]-)? P(?[months][0-9]+M)? (T(?[seconds][0-9]+(\.[0-9]+)?S)?)? </regex> </parse> <variable name="duration" type="duration" select="." /> <property name="months" type="nonNegativeInteger" select="dt:property($duration, 'months')" /> <property name="seconds" type="nonNegativeInteger" select="dt:property($duration, 'seconds')" /> <map from="duration" select="dt:if(dt:property(., 'years') >= 0, concat('P', dt:property(., 'years') * 12 + dt:property(., 'months'), 'MT', dt:property(., 'days') * 24 * 60 * 60 + dt:property(., 'hours') * 60 * 60 + dt:property(., 'minutes') * 60 + dt:property(., 'seconds'), 'S'), concat('-P', -dt:property(., 'years') * 12 + -dt:property(., 'months'), 'MT', -dt:property(., 'days') * 24 * 60 * 60 + -dt:property(., 'hours') * 60 * 60 + -dt:property(., 'minutes') * 60 + -dt:property(., 'seconds'), 'S'))" /> <map to="duration" select="." /> </datatype> <datatype name="yearMonthDuration"> <parse> <regex ignore-whitespace="true"> (?[neg]-)? P(?[years][0-9]+Y)? (?[months][0-9]+M)? </regex> </parse> <variable name="duration" type="duration" select="." 
/> <property name="years" type="nonNegativeInteger" select="dt:property($duration, 'years')" /> <property name="months" type="nonNegativeInteger" select="dt:property($duration, 'months')" /> <map from="duration" select="dt:if(dt:property(., 'years') >= 0, concat('P', dt:property(., 'years'), 'Y', dt:property(., 'months'), 'M'), concat('-P', -dt:property(., 'years'), 'Y', -dt:property(., 'months'), 'M'))" /> <map to="duration" select="." /> </datatype> <datatype name="canonical-yearMonthDuration"> <parse> <regex ignore-whitespace="true"> (?[neg]-)?P(?[months][0-9]+M) </regex> </parse> <variable name="duration" type="canonical-duration" select="." /> <property name="months" type="nonNegativeInteger" select="dt:property($duration, 'months')" /> <map from="canonical-duration" select="dt:if(dt:property(., 'months') >= 0, concat('P', dt:property(., 'months'), 'M'), concat('-P', -dt:property(., 'months'), 'M'))" /> <map to="canonical-duration" select="." /> </datatype> <datatype name="dayTimeDuration"> <parse> <regex ignore-whitespace="true"> (?[neg]-)? P(?[days][0-9]+D)? (T(?[hours][0-9]+H)? (?[minutes][0-9]+M)? (?[seconds][0-9]+(\.[0-9]+)?S)?)? </regex> </parse> <variable name="duration" type="duration" select="." /> <property name="days" type="nonNegativeInteger" select="dt:property($duration, 'days')" /> <property name="hours" type="nonNegativeInteger" select="dt:property($duration, 'hours')" /> <property name="minutes" type="nonNegativeInteger" select="dt:property($duration, 'minutes')" /> <property name="seconds" type="nonNegativeInteger" select="dt:property($duration, 'seconds')" /> <map from="duration" select="dt:if(dt:property(., 'days') >= 0, concat('P', dt:property(., 'days'), 'DT', dt:property(., 'hours'), 'H', dt:property(., 'minutes'), 'M', dt:property(., 'seconds'), 'S'), concat('-P', -dt:property(., 'days'), 'DT', -dt:property(., 'hours'), 'H', -dt:property(., 'minutes'), 'M', -dt:property(., 'seconds'), 'S'))" /> <map to="duration" select="." 
/> </datatype> <datatype name="canonical-dayTimeDuration"> <parse> <regex ignore-whitespace="true"> (?[neg]-)?PT(?[seconds][0-9]+S) </regex> </parse> <variable name="duration" type="canonical-duration" select="." /> <property name="seconds" type="decimal" select="dt:property($duration, 'seconds')" /> <map from="canonical-duration" select="dt:if(dt:property(., 'seconds') >= 0, concat('PT', dt:property(., 'seconds'), 'S'), concat('-PT', -dt:property(., 'seconds'), 'S'))" /> <map to="canonical-duration" select="." /> </datatype> </datatypes>
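The two conditions on the date type and the mapping from duration to canonical-duration are easier to check when written out procedurally. The Python sketch below is an illustration only and is not part of DTLL. The function names are hypothetical, the inputs are assumed to be already-parsed integers, and the canonical form is assumed to be a month count plus a second count, as in the canonical-duration example.

```python
def is_leap_year(year: int) -> bool:
    """Leap-year rule used by the second condition on the date type."""
    return year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)


def is_valid_date(year: int, month: int, day: int) -> bool:
    """Apply the range checks of year, month and day plus the date conditions."""
    # Range checks from the year, month and day datatypes.
    if year == 0 or not 1 <= month <= 12 or not 1 <= day <= 31:
        return False
    # First condition: either a 31-day month, or the day is at most 30.
    if month not in (1, 3, 5, 7, 8, 10, 12) and day > 30:
        return False
    # Second condition: February has 28 days, or 29 in a leap year.
    if month == 2 and day > 28 and not (day == 29 and is_leap_year(year)):
        return False
    return True


def canonical_duration(years: int, months: int, days: int, hours: int,
                       minutes: int, seconds: int,
                       negative: bool = False) -> str:
    """Collapse duration components into a months-and-seconds lexical form.

    Components are given as non-negative magnitudes; the sign is carried
    separately, as the optional leading '-' is in the duration regex.
    """
    total_months = years * 12 + months
    total_seconds = days * 24 * 60 * 60 + hours * 60 * 60 + minutes * 60 + seconds
    sign = "-" if negative else ""
    return f"{sign}P{total_months}MT{total_seconds}S"
```

For instance, a duration of 1 year, 2 months, 3 days, 4 hours, 5 minutes and 6 seconds collapses to 14 months and 273906 seconds. Keeping months and seconds separate reflects the fact that a month has no fixed length in seconds.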