XML parser in Java

Subject: CTS [ANN] XML parser in Java

Date: 24 Mar 1997 19:42:59 -0800

From: Sean Russell <ser@javalab.uoregon.edu>

Newsgroup: comp.text.sgml


    --------------------------------------------------------------

I'm almost embarrassed to announce this here, because I am so very much
an amateur at SGML/XML and because this toolset is so very primitive.
However, I've authored the kernel of an XML parser in Java, and it is
available for download at:

	http://jersey.uoregon.edu/ser/software/XML.tar.gz

Some caveats:

1) If you try to break this, you will.  If you don't try to break
this, you probably still will.
2) This is a series of Java classes; if you don't program in Java, you 
probably won't get much use out of these.
3) The documentation is VERY lacking.  Almost nonexistent.  In fact,
the only informative document is a Copyright notice which I
cannibalized from another of my projects and only cursorily modified
to fit this project.

Negativity aside, what good is this, and why did I bother?
1) Well, it is the nature of things to improve, at least things that I 
work on.  I'm expecting this to grow into something more generally
useful.
2) HotJava supports external content handlers.  This will, eventually, 
become a content handler, at which point HotJava will become an XML
browser.  I hope.  I'm not very knowledgeable about layout, so this
may take a while.
3) I'm a proud user of BeOS.  There has been some discussion about
getting a standard installer for the BeOS, and I've been arguing that
installer directive scripts should be written in XML, rather than the
alternatives of Lisp or Tcl.  These classes will become part of that
project.
4) I end up using an SGML-based markup language for passing
information to Java applets so much that I wrote a general parser for
the job.  It wasn't very robust (it was, to be honest, a hack), so I
sat down last Friday with Bison and tried to work out a /very/ simple
XML parser.  I don't think I built in too many critical hindrances to
extending the lexical description.

What does Ri.XML do?  Given a stream, it parses XML from the stream
and returns a tree of nodes rooted at the document.  The current
version parses comments, handles attributes, and does /very/
rudimentary docdef parsing.  I need to read more about XML before I go 
very much further with the DTD parsing, but at this point entities
which are defined to have a definite string value are replaced by
their values, whereas other entities are replaced by a rather
sparse Entity Object.
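
To make the description above concrete, here is a minimal sketch of
the same idea: read a stream, skip comments, collect attributes,
substitute entities that have a string value, and return a tree of
nodes rooted at the document.  This is NOT the Ri.XML source or its
API; the names TinyXml, Node, parse and expand are invented for the
example.  It also simplifies two things: entity values are passed in
as a map instead of being read from a DTD, and an undefined entity
reference is left in the text as-is rather than turned into an Entity
object.  It does not handle XML declarations, DOCTYPE, CDATA, or a
'>' inside an attribute value.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not the Ri.XML API): parse a tiny XML subset into a tree.
class TinyXml {
    // A tree node: an element has a name; a text run has text and no name.
    static class Node {
        String name;
        String text;
        final Map<String, String> attrs = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();
    }

    private static final Pattern ATTR =
        Pattern.compile("([\\w:.-]+)\\s*=\\s*\"([^\"]*)\"");

    private final String src;
    private final Map<String, String> entities;
    private int pos = 0;

    private TinyXml(String src, Map<String, String> entities) {
        this.src = src;
        this.entities = entities;
    }

    // Reads the whole stream, then returns a "#document" node holding the tree.
    static Node parse(Reader in, Map<String, String> entities) {
        StringBuilder sb = new StringBuilder();
        try {
            for (int c; (c = in.read()) != -1; ) sb.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Node doc = new Node();
        doc.name = "#document";
        new TinyXml(sb.toString(), entities).parseContent(doc, null);
        return doc;
    }

    // Adds children to parent until the end tag for endName (null = top level).
    private void parseContent(Node parent, String endName) {
        while (pos < src.length()) {
            if (src.startsWith("<!--", pos)) {            // comment: skip it
                int end = src.indexOf("-->", pos + 4);
                if (end < 0) throw new IllegalStateException("unterminated comment");
                pos = end + 3;
            } else if (src.startsWith("</", pos)) {       // end tag
                int end = src.indexOf('>', pos);
                if (end < 0) throw new IllegalStateException("unterminated end tag");
                String name = src.substring(pos + 2, end).trim();
                if (!name.equals(endName))
                    throw new IllegalStateException("unexpected </" + name + ">");
                pos = end + 1;
                return;
            } else if (src.charAt(pos) == '<') {          // start or empty tag
                parent.children.add(parseElement());
            } else {                                      // character data
                int end = src.indexOf('<', pos);
                if (end < 0) end = src.length();
                String raw = src.substring(pos, end);
                pos = end;
                if (!raw.trim().isEmpty()) {              // drop whitespace-only runs
                    Node t = new Node();
                    t.text = expand(raw);
                    parent.children.add(t);
                }
            }
        }
        if (endName != null) throw new IllegalStateException("missing </" + endName + ">");
    }

    // Parses one element starting at '<', including its attributes and children.
    private Node parseElement() {
        int close = src.indexOf('>', pos);
        if (close < 0) throw new IllegalStateException("unterminated tag");
        boolean empty = src.charAt(close - 1) == '/';     // <name ... />
        String tag = src.substring(pos + 1, empty ? close - 1 : close).trim();
        pos = close + 1;
        Node n = new Node();
        n.name = tag.split("\\s+", 2)[0];
        Matcher m = ATTR.matcher(tag);
        while (m.find()) n.attrs.put(m.group(1), expand(m.group(2)));
        if (!empty) parseContent(n, n.name);
        return n;
    }

    // Replaces &name; with its string value when one is defined in the map.
    private String expand(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int amp = s.indexOf('&', i);
            int semi = amp < 0 ? -1 : s.indexOf(';', amp);
            if (semi < 0) { out.append(s, i, s.length()); break; }
            String value = entities.get(s.substring(amp + 1, semi));
            out.append(s, i, amp)
               .append(value != null ? value : s.substring(amp, semi + 1));
            i = semi + 1;
        }
        return out.toString();
    }
}
```

Calling TinyXml.parse with a java.io.StringReader over a short
document and a map such as {who=world} gives back a document node
whose children can be walked recursively.  Holding the whole input in
a string keeps the sketch short; a real stream parser would read
incrementally.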

I would appreciate input.  If anyone would like to see the .y file I
fed to Bison (I used Bison to test the lexical structure of my
construct, not for the parsing code it generates), let me know; I
would be interested in your comments.  Actually, I'd be interested in
/anybody's/ comments on this project.

My goal is to promote XML.

-- 
 |..      --------------------- Sean Russell ----------------------
<|>       ser@javalab.uoregon.edu <-> http://jersey.uoregon.edu/ser
/|\       ----- [             Software Engineer            ] ------
/|              [ University of Oregon, Physics Department ]