RELAX NG Validator on
Mobile Phones
MURATA Makoto (FAMILY Given)
International University of Japan
Demonstration
Schema and XML document
·
A RELAX NG
schema
·
created from
a 32-element-type DTD with dtdInst
·
Note: RELAX NG is a schema language for XML and will soon become an
ISO/IEC standard.
· XML
document
·
2KB
Overview
Two-phase
validation
· 1st phase: Generate a table from a schema on PCs
·
The size of the table for the demo schema is 7KB
· 2nd
Phase: Validate an XML document against the table on mobile phones
·
Note: Kohsuke Kawagauchi has used similar techniques for improving
portability
Programs
· Table
generator (Desktop PC)
·
JAR file 2MBinot
including librariesj
·
libraries: Xerces, Jing, rng2srng
· Table-driven
validator (Mobile phone)
·
JAR file 27KB
· Including
kXML2, XML doc, and table
·
CLDC
·
libraries: kXML2(12KB)
Limitations
of the present implementation
· <data> and <value> are not implemented.
·
Which datatype on mobile phones?
· <interleave>
is not implemented .
·
Will do soon.
· <list>
is not implemented.
·
Easy.
Why
is this possible?
· Theory
behind RELAX NG!
· Tree
automata
·
Conversion of Hedge Automata to Binary Tree Automata
· Handling
of attribute constraints of RELAX NG by on-the-fly rewriting of binary tree automata (Hosoya and
Murata, PLAN-X)
Implementation
Implementation
Outline
· Interpreting
XML documents as binary
trees
· Table
format
· Table-driven
validation
· Generating
tables from RELAX NG schemas
Interpreting
XML documents as binary
trees
What
is a binary tree?
· Each
node has EXACTLY TWO children.
· A node
may be NULL.
XML
documents as binary trees
preOrder,
inOrder, and postOrder actions for binary trees
Invoking
actions during parsing
·
preOrder
· when
an start tag is encountered.
·
inOrder
· when
an end tag is encountered.
·
postOrder
· when
an end tag is encountered, post-order-action is performed to all child
elements.
·
Handling of text is similar to that of a
start-end tag pair
Table Format
Overview
·
A list of (number, namespaceURI)
·
A list of (number,
datatypeLibraryURI)
·
A list of (number, nameClass)
· ...
·
A list of element transitions
·
A list of attribute transitions
· ....
Element
Transitions
· et
left:8 right:0 target:1 nc:0
· et
left:22 right:2 target:2 nc:3
· et
left:22 right:0 target:2 nc:3
· et
left:42 right:0 target:9 nc:14
· et
left:40 right:0 target:10 nc:13
Attribute
transitions
· oneOrMoreAtt
left:4 right:2 target:5 nc:2
· oneOrMoreAtt
left:0 right:2 target:5 nc:2
· oneOrMoreAtt
left:6 right:3 target:7 nc:1
· ...
· nonExisAtt
right:2 target:3 nc:2 - (0)
· nonExisAtt
right:7 target:8 nc:50 - (51)
· nonExisAtt
right:18 target:19 nc:4 - (0)
· ...
Table-driven
validation
Interpreting
Element Transitions
·
preOrder
· Receive
candidate states for this element
· Compute
candidate states for the left child
·
inOrder
· Receive
elected candidates for the left child.
· Compute
candidate states for the right child
·
postOrder
· Receive
elected candidates for the right child
· Determine
which candidates are elected
Example
·
et left:8 right:0 target:1 nc:0
·
preOrderF
· If
state 1 is a candidate state and the current tag name matches name class 0,
then state 8 is a candidate state for the left child.
·
inOrder:
· If
state 8 is elected for the left child, then state 0 is a candidate state for
the right child.
·
postOrder:
· Is
state 0 is elected for the right child, then state 1 is elected for the current
node.
Interpreting
attribute transitions
· Introduce
more candidate states by examining elements.
· Transitions
handling more than one attribute (oneOrMoreAtt )
· Transitions
ensuring absence of attributes (nonExisAtt )
oneOrMoreAtt
example
· oneOrMoreAtt
left:4 right:2 target:5 nc:2
· If
·
One or more
attributes have attribute names matching name class 0
·
State 4 is elected for any of their attribute values
·
State 5 is a candidate state
· Then
·
Add state 2 as a candidate state
nonExisAtt
example
· nonExisAtt
right:2 target:3 nc:2 - (0)
· If
·
no attribute names match name class 0 without matching name class 0
·
State 3 is a candidate state
· then
·
Add state 2 as a candidate state
Generating Tables from
RELAX NG Schemas
Outline
· Simplification
of RELAX NG schemas (as specified
in the RELAX NG spec)
· Conversion
to binary tree grammars
· Conversion
to binary tree automata
·
Available from Source Forge for RELAX NG (miaou)
Internal
Data structures
· DOM
objects represting the simple syntax
· Java
objects representing simplified syntax
· Java
objects representing binary tree grammars
· Java
objects representing binary tree automata
Conclusion
· Underlying
theory of RELAX NG helps implementation.
· Table-driven
validation provides small validators applicable to mobile phones.
Future
work
· Remaining
features of RELAX NG
· Performance
improvements
· Data
binding