AElfred XML Parser

From david@freenet.carleton.ca  Tue Dec  9 18:19:34 1997
Date: Tue, 9 Dec 1997 19:19:18 -0500
Message-Id: <199712100019.TAA00266@unready.microstar.com>
From: David Megginson <ak117@freenet.carleton.ca>
Subject: AElfred XML Parser

            ------------------------------------------------

Microstar Software Ltd. is happy to announce Ælfred (AElfred), a
small, fast, DTD-aware Java-based XML parser, especially suitable for
use in Java applets.

We've designed Ælfred for Java programmers who want to add XML support
to their applets and applications without doubling their size: Ælfred
consists of only two class files, with a total size of approximately
24K, and requires very little memory to run.  Ælfred also implements
Java's java.lang.Runnable interface and a zero-argument constructor,
so it's easy to start Ælfred as a separate thread or to adapt it for
use as a JavaBean.
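
The threading pattern described above can be sketched as follows.  This
is only an illustration: SketchParser, setInput, getElementCount and
ThreadDemo are hypothetical names invented for this example, not part
of Ælfred's actual API, and the "parsing" is a placeholder that merely
counts start tags.  The sketch also uses modern Java syntax rather than
JDK 1.0.2.

```java
// Hypothetical stand-in for a parser class: NOT the real Ælfred API.
// It only demonstrates the Runnable + zero-argument-constructor pattern.
class SketchParser implements Runnable {
    private String input = "";
    private int elementCount = 0;

    // Zero-argument constructor: all configuration goes through setters,
    // which is also what a JavaBean container expects.
    public SketchParser() {}

    public void setInput(String input) { this.input = input; }

    public int getElementCount() { return elementCount; }

    // Runs on the worker thread.  Counts '<' characters that open a start
    // tag (not '</', '<!' or '<?') as a placeholder for real parsing.
    @Override
    public void run() {
        int count = 0;
        for (int i = 0; i + 1 < input.length(); i++) {
            char next = input.charAt(i + 1);
            if (input.charAt(i) == '<' && next != '/' && next != '!' && next != '?') {
                count++;
            }
        }
        elementCount = count;
    }
}

class ThreadDemo {
    // Starts the parser on its own thread and waits for it to finish.
    static int parseInBackground(String xml) {
        SketchParser parser = new SketchParser();
        parser.setInput(xml);
        Thread worker = new Thread(parser);
        worker.start();
        try {
            worker.join();  // join() makes the worker's result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while parsing", e);
        }
        return parser.getElementCount();
    }

    public static void main(String[] args) {
        System.out.println(parseInBackground("<doc><p>Hi</p><p>there</p></doc>"));
    }
}
```

Because the class has a zero-argument constructor and is configured
through setters, the same object can be handed straight to
java.lang.Thread, as here, or instantiated by a bean container.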

Ælfred is free for both commercial and non-commercial use, and COMES
WITH NO WARRANTY.  You can download a copy of version 1.0 (with
source code) from the following URL:

  http://www.microstar.com/XML/index.htm

[December 11, 1997 Update:

Date: Thu, 11 Dec 1997 17:22:43 -0500
From: David Megginson <ak117@freenet.carleton.ca>
To: xml-dev Mailing List <xml-dev@ic.ac.uk>
Subject: AElfred 1.0beta3 release

There is a new release of Microstar's Ælfred XML parser at

  http://www.microstar.com/XML/

The new version is still interface-compatible with the first two
public betas, but it adds the ability to query for content models and
enumerated attribute types (both returned as normalised strings, with
whitespace removed and parameter entities resolved).

With the new query routines, Ælfred is now capable of producing a
normalised version of an XML document's DTD; in fact, the distribution
now includes a new demonstration class, DtdDemo.java, that does
exactly that. ]
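
To make "normalised" concrete, here is a minimal sketch of what
resolving parameter entities and removing whitespace could look like
for a content model string.  It is not taken from Ælfred or from
DtdDemo.java; the class ContentModelSketch, the method normalise and
the sample entities are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not Ælfred code: expands %name; references from a
// supplied table and then strips all whitespace.
class ContentModelSketch {
    static String normalise(String model, Map<String, String> parameterEntities) {
        String text = model;
        // Re-scan a bounded number of times so that entities whose
        // replacement text contains further references are expanded too.
        for (int pass = 0; pass < 10 && text.indexOf('%') >= 0; pass++) {
            StringBuilder out = new StringBuilder();
            int i = 0;
            while (i < text.length()) {
                char c = text.charAt(i);
                int end = (c == '%') ? text.indexOf(';', i) : -1;
                if (end > i + 1
                        && parameterEntities.containsKey(text.substring(i + 1, end))) {
                    // Known reference: substitute its replacement text.
                    out.append(parameterEntities.get(text.substring(i + 1, end)));
                    i = end + 1;
                } else {
                    // Ordinary character (or unknown reference): copy as-is.
                    out.append(c);
                    i++;
                }
            }
            text = out.toString();
        }
        // Remove every run of whitespace.
        return text.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        Map<String, String> entities = new HashMap<>();
        entities.put("inline", "em | strong");
        System.out.println(normalise("( #PCDATA | %inline; )*", entities));
    }
}
```

With the sample table above, the declared model
"( #PCDATA | %inline; )*" comes out as "(#PCDATA|em|strong)*".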

*****************
DESIGN PRINCIPLES
*****************

1. Ælfred must be as small as possible, so that it doesn't add too
   much to your applet's download time.

   STATUS: Ælfred is currently about 24K in total, and we're still
    looking for ways to shrink it further.

2. Ælfred must use as few class files as possible, to minimize the number
   of HTTP connections necessary for applets.

   STATUS: Ælfred consists of only two class files, the main parser
    class (XmlParser.class) and a small interface for your own program
    to implement (XmlProcessor.class).  All other classes in the
    distribution are just demonstrations.

3. Ælfred must be compatible with most or all Java implementations
   and platforms.

   STATUS: Ælfred uses only JDK 1.0.2 features, and we have tested it
    successfully with the following Java implementations: JDK 1.1.1
    (Linux), jview (Windows NT), Netscape 4 (Linux and Windows NT),
    Internet Explorer 3 (Windows NT), and Internet Explorer 4 (Windows
    NT).

4. Ælfred must use as little memory as possible, so that it does not take
   away resources from the rest of your program.

   STATUS: On a P75 Linux system, using JDK 1.1.1, running Ælfred
    (with a 4MB XML document) takes only 2MB more memory than running
    a simple "Hello world" Java application.  Because Ælfred does not
    build an in-memory parse tree, you can run it on very large input
    files using little or no extra memory.

5. Ælfred must run as fast as possible, so that it does not slow down
   the rest of your program.

   STATUS: On a P75 Linux system, using JDK 1.1.1 (without a JIT
    compiler), Ælfred parses XML test files at about 50K/second.  On a
    P166 NT workstation, using jview, Ælfred parses XML test files at
    about 1MB/second.

6. Ælfred must produce correct output for well-formed and valid
   documents, but need not reject every document that is not valid or
   not well-formed.

   STATUS: Ælfred is DTD-aware, and handles all current XML features,
    including CDATA and INCLUDE/IGNORE marked sections, internal and
    external entities, proper whitespace treatment in element content,
    and default attribute values.  It will sometimes accept input that
    is technically incorrect, however, without reporting an error (see
    README), since full error reporting would make the parser much
    larger.

7. Ælfred must provide full internationalisation from the first release.

   STATUS: Ælfred supports Unicode to the fullest extent possible in
    Java.  It correctly handles XML documents encoded using UTF-8,
    UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4 (as far as surrogates
    allow), and ISO-8859-1 (ISO Latin 1/Windows).  With these
    character sets, Ælfred can handle all of the world's major (and
    most of its minor) languages.
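
As background to point 7, a parser that accepts several encodings has
to decide which one it is reading before it can decode any characters.
The sketch below shows one common approach, in the spirit of the
autodetection appendix of the XML specification: look at the first few
bytes for a byte order mark or for the byte pattern of "<?".  It is an
assumption about how this could be done, not a description of Ælfred's
own code; EncodingSniffer, guess and the returned labels are
hypothetical.

```java
// Hypothetical sketch, not Ælfred code: guess an encoding family from
// the first (up to four) bytes of an XML document.
class EncodingSniffer {
    static String guess(byte[] b) {
        int b0 = at(b, 0), b1 = at(b, 1), b2 = at(b, 2), b3 = at(b, 3);

        // Four-byte patterns first, so they are not mistaken for UTF-16.
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) return "UCS-4BE";
        if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) return "UCS-4LE";
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0x00 && b3 == 0x3C) return "UCS-4BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x00 && b3 == 0x00) return "UCS-4LE";

        // UTF-16 with a byte order mark.
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";

        // UTF-16 without a byte order mark: "<?" as two 16-bit units.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";

        // UTF-8 byte order mark, or anything else: assume UTF-8 for now.
        // An 8-bit encoding such as ISO-8859-1 cannot be told apart here;
        // it has to be named in the document's encoding declaration.
        return "UTF-8";
    }

    // Returns the unsigned byte at index i, or -1 past the end of input.
    private static int at(byte[] b, int i) {
        return i < b.length ? (b[i] & 0xFF) : -1;
    }

    public static void main(String[] args) {
        byte[] sample = { (byte) 0xFE, (byte) 0xFF, 0x00, 0x3C };
        System.out.println(guess(sample));
    }
}
```

The result is only a first guess: once the parser can read the XML
declaration in the guessed family, the encoding named there (if any)
decides the final choice.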


***********************
ABOUT THE NAME "Ælfred"
***********************

Ælfred the Great (AElfred in ASCII) was king of Wessex, and at least
nominally of all England, at the time of his death in AD 899.  Ælfred
introduced a widespread literacy program in the hope that his people
would learn to read English, at least, if Latin was too difficult for
them.  This Ælfred hopes to bring another sort of literacy to Java,
using XML, at least, if full SGML is too difficult.

The initial "Æ" (AE ligature) is also a reminder that XML is not
limited to ASCII.


Enjoy!


David

---
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/