New expat Release

From Sun Jun 21 05:22:50 1998
Date:     Sun, 21 Jun 1998 17:03:06 +0700
From:     James Clark <>
Subject:  New expat release

A new release of expat is now available for ftp from:

The zip file includes both sources and Win32 binaries.

Expat is a high-performance, fully conforming, non-validating XML 1.0 parser toolkit written in C.

A significant amount of new functionality has been added since the previous release, which means this is still a beta release. I don't plan to add new features before the production release; any future beta releases will only fix bugs.

There is a callback that allows an application to add to the set of encodings that expat supports. There's an example of using this to hook into the Windows code page support.

Expat can be compiled to pass characters to the application in Unicode (i.e., as a sequence of 16-bit codes) rather than in UTF-8.

There are new callbacks to provide information about unparsed entities and notations.

There are new functions that allow an application to determine the location (line number, column number, byte index) of all events.

There are some hooks to allow applications to have access to the raw markup of the document along with the parsed result; the goal is to allow the writing of XML-to-XML filters that don't normalize the document markup.

Since xmlwf is getting a little complicated, I've added a simple sample program in the sample directory.


AND . . .

From Sun Jun 21 05:39:47 1998
Date:    Sun, 21 Jun 1998 17:21:32 +0700
From:    James Clark <>
Subject: Re: Perl expat tweaks

Larry Wall wrote:

> I can't get this idea out of my head
> of not losing any information, no matter how trivial.

I tend to agree.

> So I guess we need to view the
> textual data as not redundant with the structural data, but rather
> complementary.  So it's really okay to return the same data twice. :-)

I've put a new version of expat on

In addition to the default handler feature I mentioned in my previous
message, I added a new function:

/* This can be called within a handler for a start element, end element,
processing instruction or character data.  It causes the corresponding
markup to be passed to the default handler. */

void XML_DefaultCurrent(XML_Parser parser);

If you set up a default handler and all your other handlers call
XML_DefaultCurrent, every character is preserved.

The idea is that Perl can either expose this directly, or the glue code
can always call it and make the markup available along with the
structural information.