SGML: XML parser in Java

Subject: CTS [ANN] XML parser in Java
Date: 24 Mar 1997 19:42:59 -0800
From: Sean Russell <>
Newsgroup: comp.text.sgml
--------------------------------------------------------------

I'm almost embarrassed to announce this here, because I am so very much an amateur at SGML/XML and because this toolset is so very primitive. However, I've authored the kernel of an XML parser in Java, and it is available for download at:

Some caveats:

1) If you try to break this, you will. If you don't try to break this, you probably still will.
2) This is a series of Java classes; if you don't program in Java, you probably won't get much use out of these.
3) The documentation is VERY lacking. Almost nonexistent. In fact, the only informative document is a copyright notice which I cannibalized from another of my projects and only cursorily modified to fit this project.

Negativity aside, what good is this, and why did I bother?

1) Well, it is the nature of things to improve, at least things that I work on. I'm expecting this to grow into something more generally useful.
2) HotJava supports external content handlers. This will, eventually, become a content handler, at which point HotJava will become an XML browser. I hope. I'm not very knowledgeable about layout, so this may take a while.
3) I'm a proud user of BeOS. There has been some discussion about getting a standard installer for the BeOS, and I've been arguing that installer directive scripts should be written in XML, rather than the alternatives of Lisp or Tcl. These classes will become part of that project.
4) I end up using an SGML-based markup language for passing information to Java applets so often that I wrote a general parser for the job. It wasn't very robust (it was, to be honest, a hack), so I sat down last Friday with Bison and tried to work out a /very/ simple XML parser. I don't think I built in too many critical hindrances to extending the lexical description.

What does Ri.XML do? Given a stream, it parses XML from the stream and returns a tree of nodes rooted at the document.
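To make the "stream in, tree of nodes out" shape concrete, here is a rough sketch. It is not Ri.XML's code: the post does not show Ri.XML's class names, so this uses the JDK's standard DOM API (javax.xml.parsers and org.w3c.dom, which postdate this post) as a stand-in. The names TreeSketch, parse, and dump are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical illustration class; not part of Ri.XML.
class TreeSketch {

    /** Parses XML from a stream and returns the document node at the root of the tree. */
    static Document parse(InputStream in) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Renders the tree as indented lines, one node per line, skipping blank text. */
    static String dump(Node node, int depth) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                out.append("#document\n");
                break;
            case Node.ELEMENT_NODE:
                out.append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    out.append(' ').append(a.getNodeName())
                       .append("=\"").append(a.getNodeValue()).append('"');
                }
                out.append(">\n");
                break;
            case Node.COMMENT_NODE:
                out.append("<!--").append(node.getNodeValue()).append("-->\n");
                break;
            case Node.TEXT_NODE:
                String text = node.getNodeValue().trim();
                if (text.isEmpty()) {
                    return ""; // whitespace-only text adds nothing to the dump
                }
                out.append(text).append('\n');
                break;
            default:
                out.append(node.getNodeName()).append('\n');
        }
        // Recurse into children in document order.
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            out.append(dump(c, depth + 1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<!-- note --><doc id=\"1\"><item>hello</item></doc>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.print(dump(parse(in), 0));
    }
}
```

The caller hands over any InputStream and gets back a single root node; everything else (elements, attributes, comments, text) is reached by walking children from that root.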
The current version parses comments, handles attributes, and does /very/ rudimentary docdef parsing. I need to read more about XML before I go very much further with the DTD parsing, but at this point entities that are defined to have a definite string value are replaced by their values, whereas other entities are replaced by a rather sparse Entity object.

I would appreciate input; if anyone would like to see the .y file I fed to Bison (I used Bison to test the lexical structure of my construct, not for the parsing code it generates), I would be interested in your comments. Actually, I'd be interested in /anybody's/ comments on this project. My goal is to promote XML.

--
Sean Russell
Software Engineer
University of Oregon, Physics Department