[Snapshot from: http://www.asahi-net.or.jp/~eb2m-mrt/hidden/xml2002Murata.htm, a provisional reference only. See the related reference.]

RELAX NG Validator on
Mobile Phones

MURATA Makoto (FAMILY Given)
International University of Japan

Demonstration

Schema and XML document

·    A RELAX NG schema

·   created from a 32-element-type DTD with dtdInst

 

·   Note: RELAX NG is a schema language for XML and will soon become an ISO/IEC standard.

·    XML document

·   2KB

Overview

Two-phase validation

·    1st phase: Generate a table from a schema on PCs

·   The size of the table for the demo schema is 7KB

·    2nd Phase: Validate an XML document against the table on mobile phones

 

·   Note: Kohsuke Kawagauchi has used similar techniques for improving portability

Programs

·    Table generator (Desktop PC)

·   JAR file                 2MBinot including librariesj

·   libraries: Xerces, Jing, rng2srng

·    Table-driven validator (Mobile phone)

·   JAR file               27KB

·  Including kXML2, XML doc, and table

·   CLDC

·   libraries:  kXML2(12KB)

Limitations of the present implementation

·    <data> and <value> are not implemented.

·   Which datatype on mobile phones?

·    <interleave> is not implemented .

·   Will do soon.

·    <list> is not implemented.

·   Easy.

Why is this possible?

·    Theory behind RELAX NG!

·    Tree automata

·    Conversion of Hedge Automata  to Binary Tree Automata

·    Handling of attribute constraints of RELAX NG by on-the-fly rewriting  of binary tree automata (Hosoya and Murata, PLAN-X)

Implementation

Implementation Outline

·    Interpreting XML documents as binary  trees

·    Table format

·    Table-driven validation

·    Generating tables from RELAX NG schemas

Interpreting XML documents as binary  trees

What is a binary  tree?

·    Each node has EXACTLY TWO children.

·    A node may be NULL.

XML documents as binary   trees

preOrder, inOrder, and postOrder actions for binary trees

Invoking actions during parsing

·     preOrder

·    when an start tag is encountered.

·     inOrder

·    when an end tag is encountered.

·     postOrder

·    when an end tag is encountered, post-order-action is performed to all child elements.

·     Handling of text is similar to that of a
start-end tag pair

Table Format

Overview

·    A list of (number, namespaceURI)

·    A list of (number, datatypeLibraryURI)

·    A list of (number, nameClass)

·    ...

·    A list of element transitions

·    A list of attribute transitions

·    ....

Element Transitions

·    et left:8 right:0 target:1 nc:0

·    et left:22 right:2 target:2 nc:3

·    et left:22 right:0 target:2 nc:3

·    et left:42 right:0 target:9 nc:14

·    et left:40 right:0 target:10 nc:13

 

Attribute transitions

·    oneOrMoreAtt left:4 right:2 target:5 nc:2

·    oneOrMoreAtt left:0 right:2 target:5 nc:2

·    oneOrMoreAtt left:6 right:3 target:7 nc:1

·    ...

·    nonExisAtt right:2 target:3 nc:2 - (0)

·    nonExisAtt right:7 target:8 nc:50 - (51)

·    nonExisAtt right:18 target:19 nc:4 - (0)

·    ...

Table-driven validation

Interpreting Element Transitions

·     preOrder

·    Receive candidate states for this element

·    Compute candidate states for the left child

·     inOrder

·    Receive elected candidates for the left child.

·    Compute candidate states for the right child

·     postOrder

·    Receive elected candidates for the right child

·    Determine which candidates are elected

Example

·     et left:8 right:0 target:1 nc:0

·     preOrderF

·    If state 1 is a candidate state and the current tag name matches name class 0, then state 8 is a candidate state for the left child.

·     inOrder:

·    If state 8 is elected for the left child, then state 0 is a candidate state for the right  child.

·     postOrder:

·    Is state 0 is elected for the right child, then state 1 is elected for the current node.

Interpreting attribute transitions

·    Introduce more candidate states by examining elements.

·    Transitions handling more than one attribute (oneOrMoreAtt )

·    Transitions ensuring absence of attributes (nonExisAtt )

oneOrMoreAtt example

·    oneOrMoreAtt left:4 right:2 target:5 nc:2

·    If

·   One or more  attributes have attribute names matching name class 0

·   State 4 is elected for any of their attribute values

·   State 5 is a candidate state

·    Then

·   Add state 2 as a candidate state

nonExisAtt example

·    nonExisAtt right:2 target:3 nc:2 - (0)

·    If

·   no attribute names match name class 0 without matching name class 0

·   State 3 is a candidate state

·    then

·   Add state 2 as a candidate state

Generating Tables from
RELAX NG Schemas

Outline

·    Simplification of  RELAX NG schemas (as specified in the RELAX NG spec)

·    Conversion to binary tree grammars

·    Conversion to binary tree automata

 

 

·   Available from Source Forge for RELAX NG (miaou)

Internal Data structures

·    DOM objects represting the simple syntax

·    Java objects representing simplified syntax

·    Java objects representing binary tree grammars

·    Java objects representing binary tree automata

Conclusion

·    Underlying theory of RELAX NG helps implementation.

·    Table-driven validation provides small validators applicable to mobile phones.

Future work

·    Remaining features of RELAX NG

·    Performance improvements

·    Data binding