Expat (XML Parser Toolkit) Module for Ruby
From: http://www.bekkoame.ne.jp/~yoshidam/xmlparser_en.txt
Date: 980817
---------------------------------------------------------------

            Expat (XML Parser Toolkit) Module for Ruby
                          version 0.3.3

                          Yoshida Masato

- Introduction

  This module provides access from Ruby to James Clark's XML Parser
  Toolkit "expat" (http://www.jclark.com/xml/expat.html).
  The supported version of expat is 1.0. TAKAHASHI Masayoshi's
  Japanese-enhanced version of expat
  (http://www.inac.co.jp/~maki/xml/expat.html) is supported too.

- Installation

  This module works with ruby-1.1. You also need the source code of
  expat.

  Extract the expat archive under this package's directory, then
  apply the patch 'expat.diff' (which modifies the Makefile) to it.
  If you use the Japanese-enhanced version of expat (expat_ja),
  apply 'expat_ja.diff'.

    cd ext
    gzip -dc < xmlparser-0.1.tar.gz | tar xvf -
    cd xmlparser
    unzip expat.zip
    cd expat
    patch -p1 < ../expat.diff
    make
    cd ..

  After building expat, install this module in the usual way. For
  example, when Ruby supports dynamic linking on your OS:

    ruby extconf.rb
    make
    make install

- Usage

  Unless this module is statically linked into Ruby, require
  "xmlparser" before using it.

  There are two styles of obtaining parse results: defining instance
  methods as event handlers, or using an iterator.

  Defining event handlers resembles SAX (Simple API for XML).
  However, the XMLParser class and classes derived from it cannot
  have instance variables, so this style is limited in what it can
  do.

  To use event handlers, inherit from the XMLParser class and define
  the handlers as instance methods. Alternatively, define them as
  singleton methods on an instance of XMLParser (or of a derived
  class). When no event handlers are defined, the parser only checks
  syntax.
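  The handler style described above can be sketched as follows. This
  is only an illustration under stated assumptions: StandInParser,
  its fixed EVENTS list, the Recorder class and the LOG array are
  hypothetical names written for this sketch so that it runs without
  the extension installed. Only the handler names and their arguments
  come from this document. With the real module you would inherit
  from XMLParser, and the events would come from the parsed string.

```ruby
# Handlers record here; a constant is used because, per this document,
# XMLParser objects cannot hold instance variables.
LOG = []

# Hypothetical stand-in, NOT the real XMLParser class. It replays a
# fixed, hand-written event list to whichever handler methods exist.
class StandInParser
  EVENTS = [
    [:startElement, "doc", { "lang" => "en" }],
    [:character, "hello"],
    [:endElement, "doc"]
  ].freeze

  # The argument is ignored; a real parser would read events from it.
  def parse(_str)
    EVENTS.each do |handler, *args|
      # Handlers that are not defined are skipped silently.
      send(handler, *args) if respond_to?(handler)
    end
    nil
  end
end

# Handler style: inherit and define the documented method names.
class Recorder < StandInParser
  def startElement(name, attrs)
    LOG << [:start, name, attrs]
  end

  def endElement(name)
    LOG << [:end, name]
  end

  def character(data)
    LOG << [:text, data]
  end
end

Recorder.new.parse("<doc lang='en'>hello</doc>")
```

  Because the stand-in skips handlers that are not defined, a plain
  StandInParser reports nothing, which loosely mirrors the
  syntax-check-only behaviour mentioned above.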
    method name           | event
    ----------------------+-----------------------
    startElement          | element start tag
    endElement            | element end tag
    character             | character data
    processingInstruction | processing instruction
    default               | other data

  Using an iterator is probably the more Ruby-like style, and I
  recommend it in most cases. When you use an iterator, the parser
  ignores event handlers even if they are defined.

  The iterator block is evaluated with three variables: event type,
  name, and data.

    event type            | name          | data
    ----------------------+---------------+-------------------
    XMLParser::START_ELEM | element name  | hash of attributes
    XMLParser::END_ELEM   | element name  | nil
    XMLParser::CDATA      | nil           | string
    XMLParser::PI         | PI name       | string
    XMLParser::DEFAULT    | nil           | string

  XMLParser::DEFAULT events are generated only if a dummy "default"
  method is defined.

  Supported input character encodings are UTF-8 and UTF-16. The
  output character encoding is UTF-8.

  XMLParser class:

    Class method

      new
           Create an XML parser object.
           If creation fails, an XMLParserError exception is raised.
           An object that has finished parsing cannot be reused, so
           you must create a new one for each parse.

    Method

      parse(str)
           Parse a string. This method can be used as an iterator.
           Parsing results can be processed either by event handlers
           or by an iterator block.
           A parse failure raises an XMLParserError exception.
           The character encoding of the string must be UTF-8 or
           UTF-16. If you are using TAKAHASHI's expat_ja, it may
           also be EUC-JP or Shift_JIS.
           If the encoding is not UTF-8, specify it with the
           "encoding" attribute of the XML declaration.

      defaultCurrent
           Raise a "default" event from within an event handler or
           an iterator block. This gives you the corresponding
           markup.
           Within an event handler, the default event is raised
           immediately. Within an iterator block, the next value
           yielded will be an XMLParser::DEFAULT event.

      line
      column
      byteIndex
           Get the current parsing location.
           When a "parse" method raises XMLParserError, these methods
           return the position at which the error was detected.

    Method (event handler)

      startElement(name, attrs)
           Called when an element start tag is detected.
           "name" is the element name; "attrs" is a hash of
           attributes whose keys are the attribute names and whose
           values are the attribute values.

      endElement(name)
           Called when an element end tag is detected.
           "name" is the element name.

      character(data)
           Called when text or a CDATA section is detected.
           Internal entities are expanded unless a "default" handler
           is defined.

      processingInstruction(target, data)
           Called when a processing instruction is detected.

      default(data)
           Called when there is no applicable event handler.
           If this method is defined, expansion of internal entities
           is inhibited.
           When you use an iterator, this method is not called, but
           defining it still causes XMLParser::DEFAULT events to be
           generated and inhibits the expansion of internal entities.

- Additional Library

  The XML::SimpleTree and XML::SimpleTreeBuilder modules were added
  in version 0.3.1. They are not well documented and their API is
  not yet fixed, so they are for experts only.

  XML::SimpleTree module

    This module is a library for building and manipulating XML
    trees. Its API resembles the W3C Document Object Model (DOM).

    Classes
      NameNodeMap
      NodeList
      Node
      Document,

- History

  Aug 14, 1998 version 0.3.3  support expat 1.0
  Aug 12, 1998 version 0.3.2
  Aug  4, 1998 version 0.3.1
  Jul 17, 1998 version 0.3
  Jul  3, 1998 version 0.2
  Jul  1, 1998 version 0.1