Date: 25 Jun 1999 16:19:44 -0700 Subject: XML Validation From: email@example.com (Joe English) Organization: Advanced Rotorcraft Technology, Inc. Newsgroups: comp.lang.functional, comp.text.sgml, comp.text.xml
Ketil Z Malde <firstname.lastname@example.org> wrote:
> I think a validating [XML parser] shouldn't be too
> hard either [in Haskell]
To which I replied:
>This is true. I'd say that a validator is
>far easier to write in a functional language
>than in C++ or Java.
And just to put my money where my mouth is, I went and wrote one :-)
Well, not exactly. Just the part that matches an element's children against a content model (regular expression). Parsing the DTD, the instance, entity resulution, et cetera, et cetera, is left as an excercise to the reader (but see again HaXml , which does all of this and more). Still, not bad for half an hour's work and 80 lines of Haskell. More details at:
This is mostly of interest because it demonstrates an algorithm that doesn't seem to be as well-known as it should be. Every XML validator and RE-matcher I've seen uses finite automata, but there is another (IMO much simpler) technique based on derivatives of regular expressions; the web page above contains a (very brief) description plus a Haskell program fragment demonstrating the algorithm.
Followups to comp.text.sgml.
 HaXml: URL: http://www.cs.york.ac.uk/fp/HaXml/.
[. . . I've written a (*very* brief) note describing an
regular expression matching algorithm which may
be of interest to XML parser writers. . .
Also: Mark Hopkins' regex package at the comp.compilers archive includes a much more complete description of a souped-up version of this technique: ftp://iecc.com/pub/file/regex.tar.gz]
Prepared by Robin Cover for the The SGML/XML Web Page archive.