Lark 0.97 Available

Date: Wed, 29 Oct 1997 13:37:24 -0800
From: Tim Bray
Subject: Lark 0.97 Available


Lark 0.97 is now available at

Lark now
+ is smaller! More code, but the class files are back down to 45k.
+ is faster! About 200K/second on my mouldy old P100, i.e. Lark parses
  Jon's Old Testament file (3.88M) in under 20 seconds - this is just
  the event-stream & syntax check.  If you want to build complete trees 
  in memory, parsing for any document slows down a lot, obviously.
+ is free of case-folding.
+ checks for duplicate attribute names attached to one element.
+ reads multiple attlist declarations; when they collide, the first one wins.
+ won't let you &refer-to; an external text entity in an attribute value -
  what the spec says, and James says this is a good idea and he's 
  usually right.
+ reads the external DTD subset if the toggle 
  lark.processExternalEntities(true) has been set (and, of course, if
  a usable SYSTEM ID has been provided).
+ has a new version of the central readXML method, that allows you to 
  specify a base URL for the document entity; necessary for relative-URL
  constructions such as <!DOCTYPE foo SYSTEM "foo.dtd" > to work.
+ has another Entity member, mBaseURL, with a matching constructor
  argument and a set/get function pair, to record and retrieve the URL 
  associated with an external entity.
+ does full PE processing, including external PE's.
+ as a result, class Entity has a new member 
  boolean mPE;
  with a new argument on its constructor and a new method 
  public boolean isPE().
- doesn't do conditional sections, still.
+ upon encountering a reference to an undeclared entity, checks to see if the
  declaration might have been external and bypassed; this can happen when 
  (a) you have turned off mProcessExternalEntities, and either
  (b) there is an external DTD subset, or
  (c) there is a reference to an external PE in the internal subset at
      a point where a whole markup declaration might be recognized.
  If so, Lark turns off draconian error handling and allows processing
  to continue; however, Handler has a new method, doWarning(), 
  that gets called in this situation.
+ processes entity/char references correctly in <!ATTLIST default values.
+ has had the Handler.doAttlist() method changed - now takes 
  an Object[] instead of String[] argument, since the default value is 
  now a Text as opposed to a String, because of entities in defaults.
+ does entity declaration processing properly, doing Henry Thompson's
  hideous example from the spec Appendix C, and another, just as nasty,
  that I have cooked up for the next release of the spec.  Blecch.
+ has a big bug-fix: it turns out pre-0.97 Lark almost never parsed 
  <!doctype declarations properly, botching SYSTEM & PUBLIC identifiers; 
  so the Handler.doDoctype() method has been rebuilt, since I can't 
  imagine anybody ever actually did anything useful with it.
+ has a change to Handler.doSyntaxError() (sorry), which now has a third 
  arg, char c, that gives the character that caused Lark to decide
  the doc wasn't well-formed... in lots of cases, this turns out to
  be real useful.  Others not.
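
For illustration only, here is a minimal Java sketch of why a base URL is
needed for relative constructions such as <!DOCTYPE foo SYSTEM "foo.dtd" >.
It is not Lark's code: the class BaseUrlSketch, the method resolveSystemId,
and the example.com URLs are all hypothetical, and java.net.URI merely stands
in for whatever resolution Lark performs internally.

```java
import java.net.URI;

public class BaseUrlSketch {
    // Hypothetical helper (not part of Lark): resolve a SYSTEM identifier
    // against the base URL of the entity in which it appears.
    static String resolveSystemId(String baseUrl, String systemId) {
        // URI.resolve leaves absolute identifiers unchanged and interprets
        // relative ones against the base, replacing its last path segment.
        return URI.create(baseUrl).resolve(systemId).toString();
    }

    public static void main(String[] args) {
        // Placeholder URLs, chosen only for the example.
        String base = "http://example.com/docs/foo.xml";
        System.out.println(resolveSystemId(base, "foo.dtd"));
        System.out.println(resolveSystemId(base, "http://other.example/x.dtd"));
    }
}
```

Without a base URL for the document entity, the relative identifier
"foo.dtd" has nothing to be resolved against.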

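Two of the attribute checks in the list above can also be sketched in Java.
Again, this is an assumption-laden illustration rather than Lark's
implementation: the class AttributeChecksSketch and both method names are
hypothetical. It shows a duplicate-name check with no case-folding, and a
merge of attlist declarations in which the first declaration wins.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class AttributeChecksSketch {
    // Returns the first attribute name seen twice on one element, or null
    // if no name repeats. Comparison is exact (no case-folding), so "ID"
    // and "id" count as different names.
    static String firstDuplicate(String[] names) {
        Set<String> seen = new HashSet<>();
        for (String name : names) {
            if (!seen.add(name)) {
                return name;
            }
        }
        return null;
    }

    // Merges attribute defaults from several attlist declarations, taken in
    // document order. Each entry is {attributeName, defaultValue}; on a
    // collision the earlier declaration is kept.
    static Map<String, String> mergeFirstWins(String[][] declarations) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (String[] declaration : declarations) {
            merged.putIfAbsent(declaration[0], declaration[1]);
        }
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(firstDuplicate(new String[] {"id", "ID", "id"}));
        System.out.println(mergeFirstWins(new String[][] {
            {"align", "left"}, {"align", "right"}
        }));
    }
}
```
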
Cheers, Tim Bray

xml-dev: A list for W3C XML Developers.