XML Validation

Date:          25 Jun 1999 16:19:44 -0700
Subject:       XML Validation
From:          jenglish@flightlab.com (Joe English)           
Organization:  Advanced Rotorcraft Technology, Inc.        
Newsgroups:    comp.lang.functional, comp.text.sgml, comp.text.xml

On comp.lang.functional, Ketil Z Malde <ketil@ii.uib.no> wrote:

>
> I think a validating [XML parser] shouldn't be too
> hard either [in Haskell]

To which I replied:
>
>This is true. I'd say that a validator is
>far easier to write in a functional language
>than in C++ or Java.

And just to put my money where my mouth is, I went and wrote one :-)

Well, not exactly. Just the part that matches an element's children against a content model (regular expression). Parsing the DTD, the instance, entity resulution, et cetera, et cetera, is left as an excercise to the reader (but see again HaXml [1], which does all of this and more). Still, not bad for half an hour's work and 80 lines of Haskell. More details at:

URL: http://www.flightlab.com/~joe/sgml/validate.html

This is mostly of interest because it demonstrates an algorithm that doesn't seem to be as well-known as it should be. Every XML validator and RE-matcher I've seen uses finite automata, but there is another (IMO much simpler) technique based on derivatives of regular expressions; the web page above contains a (very brief) description plus a Haskell program fragment demonstrating the algorithm.

Followups to comp.text.sgml.

[1] HaXml: URL: http://www.cs.york.ac.uk/fp/HaXml/.

[. . . I've written a (*very* brief) note describing an regular expression matching algorithm which may be of interest to XML parser writers. . .

Also: Mark Hopkins' regex package at the comp.compilers archive includes a much more complete description of a souped-up version of this technique: ftp://iecc.com/pub/file/regex.tar.gz]

--Joe English

jenglish@flightlab.com

Prepared by Robin Cover for the The SGML/XML Web Page archive.

SEARCH Advanced Search ABOUT Site Map CP RSS Channel Contact Us Sponsoring CP About Our Sponsors NEWS Cover Stories Articles & Papers Press Releases CORE STANDARDS XML SGML Schemas XSL/XSLT/XPath XLink XML Query CSS SVG TECHNOLOGY REPORTS XML Applications General Apps Government Apps Academic Apps EVENTS LIBRARY Introductions FAQs Bibliography Technology and Society Semantics Tech Topics Software Related Standards Historic	XML Validation Date: 25 Jun 1999 16:19:44 -0700 Subject: XML Validation From: jenglish@flightlab.com (Joe English) Organization: Advanced Rotorcraft Technology, Inc. Newsgroups: comp.lang.functional, comp.text.sgml, comp.text.xml On comp.lang.functional, Ketil Z Malde <ketil@ii.uib.no> wrote: > > I think a validating [XML parser] shouldn't be too > hard either [in Haskell] To which I replied: > >This is true. I'd say that a validator is >far easier to write in a functional language >than in C++ or Java. And just to put my money where my mouth is, I went and wrote one :-) Well, not exactly. Just the part that matches an element's children against a content model (regular expression). Parsing the DTD, the instance, entity resulution, et cetera, et cetera, is left as an excercise to the reader (but see again HaXml [1], which does all of this and more). Still, not bad for half an hour's work and 80 lines of Haskell. More details at: URL: http://www.flightlab.com/~joe/sgml/validate.html This is mostly of interest because it demonstrates an algorithm that doesn't seem to be as well-known as it should be. Every XML validator and RE-matcher I've seen uses finite automata, but there is another (IMO much simpler) technique based on derivatives of regular expressions; the web page above contains a (very brief) description plus a Haskell program fragment demonstrating the algorithm. Followups to comp.text.sgml. [1] HaXml: URL: http://www.cs.york.ac.uk/fp/HaXml/. [. . . I've written a (very brief) note describing an regular expression matching algorithm which may be of interest to XML parser writers. . . Also: Mark Hopkins' regex package at the comp.compilers archive includes a much more complete description of a souped-up version of this technique: ftp://iecc.com/pub/file/regex.tar.gz] --Joe English jenglish@flightlab.com Prepared by Robin Cover for the The SGML/XML Web Page archive.