SGML: LT NSL(1.4.6) released, Normalised SGML Library/API
From dmck@cogsci.ed.ac.uk Tue Jan 28 08:39:36 1997
Date: Tue, 28 Jan 1997 14:32:45 +0000 (GMT)
From: David McKelvie <dmck@cogsci.ed.ac.uk>
Subject: LT NSL(1.4.6) released, Normalised SGML Library/API
----------------------------------------------------
The Language Technology Group are pleased to announce the availability
of the public release of version 1.4.6 of
LT NSL --- Normalised SGML Library
DESCRIPTION
LT NSL is an integrated set of SGML querying/manipulation tools and a
C-language application program interface (API) designed to ease the
writing of C programs which manipulate SGML documents.
Its API is based on the idea of using 'normalised' SGML (i.e. an expanded,
easily parsable subset of SGML) as a data format for inter-program
communication of structured textual information.
The API defines a powerful query language which makes it easy to
access (either from the shell or in a program) those parts of an SGML
document which you are interested in. Both event based and (sub-)tree based
views of SGML documents are supported.
LT NSL contains everything required to efficiently process a very wide
range of conformant SGML documents. Its initial parsing module
incorporates v1.1.1 of James Clark's SP software, arguably the
broadest-coverage SGML parser available anywhere, commercial or not.
This is a UNIX-only release, tested so far under SunOS 4/5 and FreeBSD 2.1.
A Windows NT release is planned by mid-1997, as is support for XML.
YOU WILL FIND LT NSL OF INTEREST
If a pipeline of selection and transformation filters will improve the
utility of SGML data for you;
If you want to write programs to process/query SGML documents, using C
(rather than C++) and without the overhead of parsing the SGML each time.
CHANGES TO PREVIOUS RELEASE (1.4.4)
*) The installation procedure for LT NSL has been changed to use
'configure', hopefully making installation more platform independent
and robust.
*) LT NSL now uses the newest version of SP (1.1.1).
*) New SGML processing tools have been added:
o) 'mkindex' and 'getindex' to support efficient random
accessing into large SGML files.
o) 'sgmltrans' and 'sgrpg' to support more complex
manipulation of/selection from normalised SGML files.
o) 'nslmkddb' and 'nslshowddb' to create and display
'compiled' DTD files.
*) All known bugs have been fixed.
*) Extension of functionality to API:
access to structure of DTD;
support for CONREF and CURRENT attributes;
nSGML files can now contain a <!DOCTYPE> statement;
more efficient handling of attributes.
AVAILABILITY
LT NSL is available from http://www.ltg.hcrc.ed.ac.uk/software.html,
where the complete user manual can be viewed.
People who have already signed an LT NSL licence agreement do not need
to sign again; we will contact them and tell them how to access the
new release.
LT NSL is available to other researchers and development teams for
research purposes as follows:
To academic researchers:
academic researchers can have a free copy of LT NSL
for personal research purposes. Electronic signature
of an academic licence agreement will release details
of how to obtain a copy of LT NSL.
To industrial research groups:
industrial research groups can have a copy of LT NSL
for research purposes at a small cost. Electronic
signature of the industrial licence agreement will
release details of how to obtain a copy of LT NSL.
EXAMPLE
For example, the following C code, using the LT NSL API, processes
all <W> elements whose TAG attribute is "NOUN"; the rest of the
document is passed through unchanged.
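/* Select <W> elements inside P/S whose TAG attribute is NOUN. */
query = ParseQuery(Doctype, ".*/P/S/W[TAG=NOUN]");
/* Each call returns the next matching item; the rest of the
   document is passed through to outfile unchanged. */
while ((item = GetNextQueryItem(infile, query, outfile))) {
    newItem = process_item(item); /* your code */
    PrintItem(outfile, newItem);
    FreeItem(item);
    FreeItem(newItem);
}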
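
To make the selection in the query above more concrete, here is a
minimal, self-contained sketch of the test it implies. It does not use
or reproduce the LT NSL API: the struct elem type and the matches_tail
function are hypothetical names invented for this illustration, and the
sketch assumes that the leading ".*" means "any number of enclosing
elements". The real query language is richer than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an element: a name plus one attribute. */
struct elem {
    const char *name;       /* element name, e.g. "W" */
    const char *attr_name;  /* attribute name, or NULL if none */
    const char *attr_value; /* attribute value, or NULL if none */
};

/*
 * Return 1 if the element path (root first, innermost element last)
 * ends with the element names in tail[] and the innermost element
 * carries attr=value; return 0 otherwise.
 * Assumption: a leading ".*" in the query allows any ancestors, so
 * only the end of the path needs to be compared.
 */
int matches_tail(const struct elem *path, size_t depth,
                 const char *const *tail, size_t ntail,
                 const char *attr, const char *value)
{
    size_t i;
    const struct elem *start;
    const struct elem *last;

    assert(path != NULL && tail != NULL && attr != NULL && value != NULL);
    if (ntail == 0 || depth < ntail)
        return 0;

    /* Compare the last ntail element names against the tail. */
    start = path + (depth - ntail);
    for (i = 0; i < ntail; i++) {
        if (strcmp(start[i].name, tail[i]) != 0)
            return 0;
    }

    /* Check the attribute condition on the innermost element. */
    last = &path[depth - 1];
    if (last->attr_name == NULL || last->attr_value == NULL)
        return 0;
    return strcmp(last->attr_name, attr) == 0 &&
           strcmp(last->attr_value, value) == 0;
}
```

With the tail {"P", "S", "W"} and the condition TAG=NOUN, a path such
as TEXT/P/S/W whose <W> has TAG="NOUN" is selected, while the same path
with TAG="VERB", or a <W> that is not inside P/S, is not.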
SUMMARY OF LT NSL PROGRAMS
mknsg
The basic tool for converting SGML to nSGML (normalised SGML)
and caching the DOCTYPE. It converts arbitrary valid SGML
documents into nSGML. All programs written using the LT NSL
API assume that their SGML inputs are in nSGML format, so
mknsg should be called first in any chain of LT NSL
applications.
unknit
A program which creates hyperlinked SGML files from
nSGML files. The present version is still somewhat
experimental.
sggrep
This program works like the grep program in searching a
file for regular expressions. However, unlike grep,
it is aware of the tree structure of SGML files, and
searches can be restricted to particular elements.
sgmltrans
sgmltrans is a program for translating nSGML files into
some other format (which could be HTML or LaTeX or ...).
It is loosely based on COST and other SGML programs, in
that one specifies actions to do at SGML start tags,
end tags and text content. In sgmltrans, these actions
are restricted to printing some text to the output stream.
sgrpg
An SGML selection and transformation tool which defines a
fairly powerful transformation language. Still experimental.
sgcount
A program for counting the number of SGML
elements in an nSGML file.
sgmltoken
Text tokenization. All text inside <TEXT> elements is
tokenized, i.e. split into tokens and marked up with
<C> elements.
sgmlseg
This is a perl program which segments an nSGML file
which has already been tokenised into words using <w> markup.
sgmlsb
A quick and dirty sentence boundary marking
application, i.e. it adds <S> elements to an nSGML file
which has already been tokenised and segmented.
pesis
A trivial version of the nsgmls program, i.e.
it takes nSGML input and produces output in the form
that nsgmls does (ESIS format).
textonly
Expects nSGML as input, and outputs text only,
i.e. removes all SGML element markup.
simple, simpleq
Two demonstrations of how to write C programs using the
LT NSL API. They are annotated in detail in the manual.
nslmkddb
Reads a file which contains an SGML <!DOCTYPE> statement
and creates a DDB file for that document type.
nslshowddb
Writes a human readable format (pseudo-SGML) of a
DDB file to standard output.
mkindex
Produces an index from an SGML input document, i.e.
it produces an index file which maps (a selected
subset of) the SGML elements in the input document
to file names and character positions in those files.
getindex
Returns the content of SGML elements from their index
numbers.
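
The idea behind the mkindex/getindex pair, recording where elements
start so that they can later be fetched by index number without
rescanning the document, can be illustrated with a small sketch. This
is not the LT NSL implementation or its index file format: the names
struct index_entry, build_index and get_indexed are hypothetical, the
sketch works on an in-memory string rather than on files, and it
assumes start tags without attributes and no nesting of the indexed
element inside itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One index entry: where an element's content starts and its length. */
struct index_entry {
    size_t start;  /* offset of the first content character */
    size_t length; /* number of content characters */
};

/*
 * Scan doc for <name>...</name> pairs and record the content position
 * of each one. Returns the number of entries written (at most
 * max_entries).
 */
size_t build_index(const char *doc, const char *name,
                   struct index_entry *entries, size_t max_entries)
{
    char open_tag[64], close_tag[64];
    size_t count = 0;
    const char *cursor = doc;

    assert(doc != NULL && name != NULL && entries != NULL);
    assert(strlen(name) < sizeof open_tag - 4);
    snprintf(open_tag, sizeof open_tag, "<%s>", name);
    snprintf(close_tag, sizeof close_tag, "</%s>", name);

    while (count < max_entries) {
        const char *open, *content, *close;

        open = strstr(cursor, open_tag);
        if (open == NULL)
            break;
        content = open + strlen(open_tag);
        close = strstr(content, close_tag);
        if (close == NULL)
            break;
        entries[count].start = (size_t)(content - doc);
        entries[count].length = (size_t)(close - content);
        count++;
        cursor = close + strlen(close_tag);
    }
    return count;
}

/*
 * Copy the content of entry n into out as a NUL-terminated string.
 * Returns 0 on success, -1 if n is out of range or out is too small.
 */
int get_indexed(const char *doc, const struct index_entry *entries,
                size_t count, size_t n, char *out, size_t out_size)
{
    assert(doc != NULL && entries != NULL && out != NULL);
    if (n >= count || entries[n].length >= out_size)
        return -1;
    memcpy(out, doc + entries[n].start, entries[n].length);
    out[entries[n].length] = '\0';
    return 0;
}
```

For a document held in a file rather than in memory, the same recorded
offsets could be used to seek directly to an element's content, which
is what makes random access into large files cheap.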
INSTALLATION
The distribution is designed to work whether or not you have already
installed James Clark's SGML parser package SP-1.1.1.