SGML: Programming SP
Subject: Need: SGML parser with easy API
Date: Sun, 10 Mar 1996 11:45:03 +0000
From: James Clark <jjc@jclark.com>
Newsgroups: comp.text.sgml
References: <4hi802$ec6@news.jhu.edu>
_________________________________________________________________
> From: kolda@brutus.mts.jhu.edu (Kenneth D. Kolda)
> Newsgroups: comp.text.sgml
> Date: 5 Mar 1996 20:25:38 GMT
>
> I'm looking for an SGML parser with an easy-to-learn and use
> API. I've looked around at sgmls and sp 1.0.1 but they weren't
> documented well in terms of using them as an API.
You can get a first draft of documentation on programming SP from
<URL:http://www.jclark.com/sp-prog.html>. This documents only SP's
"generic" API (the one in the API directory). The generic API is built
on top of SP's native API and is designed to be much simpler and easier
to use than the native API, whilst still being efficient and providing
enough information for many applications.
> What I'd like
> ideally is to be able to feed a DTD and a Document into the API and
> then be able to sequentially retrieve chunks of text along with
> a "tag stack" and a byte offset to the beginning of the text string.
SP doesn't maintain a tag stack but it does tell you about the start
and end of every element so that your application can maintain one.
I'm not exactly sure what you want in terms of byte offsets, but the
generic API will tell you for every chunk of data the byte offset from
the beginning of the storage object (e.g. a file) that contains the data.
There are a couple of provisos: if you are using a variable-width
multibyte encoding (such as UTF-8), you can get the offset in
characters but not in bytes; if your data is in an internal entity,
you will get the offset from the start of the reference to the entity.
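
As a rough illustration of how an application might maintain the tag
stack itself, here is a minimal sketch in Python. It does not use SP's
actual API: the event tuples ("start", "end", "data") and the function
name chunks_with_tag_stack are hypothetical stand-ins for whatever
start-element, end-element and data notifications a parser delivers,
and the offsets are simply carried through as reported.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shapes (assumptions, not SP's real types):
#   ("start", element_name)
#   ("end", element_name)
#   ("data", text, offset)
Chunk = Tuple[str, Tuple[str, ...], int]


def chunks_with_tag_stack(events: Iterable[tuple]) -> Iterator[Chunk]:
    """Yield (text, open_elements, offset) for every data event.

    The stack is maintained here, in the application, from the
    start-element and end-element events alone.
    """
    stack: List[str] = []
    for event in events:
        kind = event[0]
        if kind == "start":
            stack.append(event[1])
        elif kind == "end":
            # A well-behaved event stream closes the innermost element.
            if not stack or stack[-1] != event[1]:
                raise ValueError("unexpected end of element %r" % (event[1],))
            stack.pop()
        elif kind == "data":
            _, text, offset = event
            # Snapshot the stack so later pushes and pops do not alter it.
            yield text, tuple(stack), offset
        else:
            raise ValueError("unknown event kind %r" % (kind,))


# Events one might see for: <doc><p>Hello <em>world</em></p></doc>
SAMPLE_EVENTS = [
    ("start", "doc"),
    ("start", "p"),
    ("data", "Hello ", 8),
    ("start", "em"),
    ("data", "world", 18),
    ("end", "em"),
    ("end", "p"),
    ("end", "doc"),
]

if __name__ == "__main__":
    for text, open_elements, offset in chunks_with_tag_stack(SAMPLE_EVENTS):
        print(offset, "/".join(open_elements), repr(text))
```

Each chunk carries a snapshot of the stack rather than the live list,
so the caller can keep chunks around after the parse has moved on.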
James Clark
jjc@jclark.com