SGML: LT NSL(1.4.6) released, Normalised SGML Library/API

From Tue Jan 28 08:39:36 1997
Date: Tue, 28 Jan 1997 14:32:45 +0000 (GMT)
From: David McKelvie <>
Subject: LT NSL(1.4.6) released, Normalised SGML Library/API


The Language Technology Group are pleased to announce the availability
of the public release of version 1.4.6 of

                LT NSL --- Normalised SGML Library


LT NSL is an integrated set of SGML querying/manipulation tools and a
C-language application program interface (API) designed to ease the
writing of C programs which manipulate SGML documents. 
Its API is based on the idea of using 'normalised' SGML (i.e. an expanded,
easily parsable subset of SGML) as a data format for inter-program
communication of structured textual information.

The API defines a powerful query language which makes it easy to
access (either from the shell or in a program) those parts of an SGML
document which you are interested in. Both event based and (sub-)tree based
views of SGML documents are supported.

LT NSL contains everything required to efficiently process a very wide
range of conformant SGML documents.  Its initial parsing module
incorporates v1.1.1 of James Clark's SP software, arguably the
broadest coverage SGML parser available anywhere, commercial or not.

This is a UNIX-only release, tested so far under SunOS 4/5 and FreeBSD 2.1.
A Windows NT release is planned by mid-1997, as is support for XML.


LT NSL is for you if a pipeline of selection and transformation filters
would improve the utility of your SGML data, or if you want to write
programs to process/query SGML documents in C (rather than C++) without
the overhead of parsing the SGML each time.


Changes in this release:

*) The installation procedure for LT NSL has been changed to use
'configure', hopefully making installation more platform-independent
and robust.

*) LT NSL now uses the newest version of SP (1.1.1).

*) New SGML processing tools have been added:

	o) 'mkindex' and 'getindex' to support efficient random
	    access into large SGML files.

	o) 'sgmltrans' and 'sgrpg' to support more complex
	   manipulation of/selection from normalised SGML files.

	o) 'nslmkddb' and 'nslshowddb' to create and display 
	   'compiled' DTD files. 

*) All known bugs have been fixed.

*) Extensions to the API functionality:
	access to structure of DTD;
	support for CONREF and CURRENT attributes; 
	nSGML files can now contain a <!DOCTYPE> statement; 
	more efficient handling of attributes.
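The random access provided by 'mkindex' and 'getindex' rests on a simple idea: record the byte offset of each element once, then fseek straight to it later instead of re-reading the whole file. The following is a minimal, self-contained C sketch of that idea only, not the LT NSL implementation. The names `ElementIndex`, `build_index` and `get_element` are hypothetical; the index is kept in memory rather than written to an index file, and the sketch assumes attribute values contain no '>' and that elements of the indexed type are not nested.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 1024

/* Hypothetical in-memory index: byte offset of each "<NAME" start tag. */
typedef struct {
    long offset[MAX_ENTRIES];
    int count;
} ElementIndex;

/* Scan fp once and record where each <name> or <name ...> start tag begins. */
void build_index(FILE *fp, const char *name, ElementIndex *idx) {
    size_t n = strlen(name);
    int c;
    assert(fp != NULL && idx != NULL);
    idx->count = 0;
    rewind(fp);
    while ((c = getc(fp)) != EOF) {
        if (c != '<')
            continue;
        long start = ftell(fp) - 1;          /* byte offset of the '<' */
        size_t i = 0;
        while (i < n && (c = getc(fp)) == (unsigned char)name[i])
            i++;
        if (i == n)
            c = getc(fp);                    /* character after the name */
        if (i == n && (c == '>' || c == ' ') && idx->count < MAX_ENTRIES)
            idx->offset[idx->count++] = start;
        if (c != EOF)
            ungetc(c, fp);                   /* re-examine it: it may be '<' */
    }
}

/* Seek directly to the k-th indexed element and copy its content (the text
   between the start tag and the next "</name>") into buf. Returns 1 on success. */
int get_element(FILE *fp, const ElementIndex *idx, int k, const char *name,
                char *buf, size_t size) {
    char endtag[64];
    size_t len = 0, elen;
    int c;
    if (k < 0 || k >= idx->count || size == 0)
        return 0;
    snprintf(endtag, sizeof endtag, "</%s>", name);
    elen = strlen(endtag);
    if (fseek(fp, idx->offset[k], SEEK_SET) != 0)
        return 0;
    while ((c = getc(fp)) != EOF && c != '>')   /* skip the start tag */
        ;
    while ((c = getc(fp)) != EOF && len + 1 < size) {
        buf[len++] = (char)c;
        if (len >= elen && memcmp(buf + len - elen, endtag, elen) == 0) {
            buf[len - elen] = '\0';             /* drop the end tag */
            return 1;
        }
    }
    buf[len] = '\0';
    return 0;
}
```

Building the index costs one pass over the file; each later lookup costs one fseek plus a read of just that element, which is what makes random access into very large files cheap.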


LT NSL is available from,
where the complete user manual can be viewed.

People who have already signed an LT NSL licence agreement do not need
to sign again; we will contact them and inform them how to access the
new release.

LT NSL is available to other researchers and development teams for
research purposes as follows:

       To academic researchers: 

              Academic researchers can have a free copy of LT NSL 
	      for personal research purposes. Electronic signature 
	      of an academic licence agreement will release details 
	      of how to obtain a copy of LT NSL. 

       To industrial research groups: 

              Industrial research groups can have a copy of LT NSL 
	      for research purposes at a small cost. Electronic 
	      signature of the industrial licence agreement will 
	      release details of how to obtain a copy of LT NSL. 


For example, the following C code, using the LT NSL API, will process
all <W> elements whose TAG attribute is "NOUN"; the rest of the
document is passed through unchanged.

	query = ParseQuery(Doctype, ".*/P/S/W[TAG=NOUN]");
	/* each matching item is returned in turn; everything else
	   is passed through unchanged to outfile */
	while ((item = GetNextQueryItem(infile, query, outfile))) {
	    newItem = process_item(item); /* your code */
	    PrintItem(outfile, newItem);
	}
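The query string in this example asks for W elements with TAG=NOUN that occur inside an S inside a P, anywhere in the document. To illustrate the idea only, here is a minimal, self-contained C sketch of how such a path query might be matched against the list of currently open elements; it is not LT NSL's query engine. The types `QueryStep` and `OpenElement` and the function `path_matches` are hypothetical, and the sketch assumes each step is a literal element name optionally carrying a single `[ATTR=VALUE]` test, with `.*` taken to match any run of zero or more ancestors.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch, not the LT NSL API. */

/* One step of a simplified path query, e.g. "W[TAG=NOUN]" or ".*". */
typedef struct {
    const char *name;   /* element name, or ".*" for any run of elements */
    const char *attr;   /* attribute to test, or NULL for no test */
    const char *value;  /* required value when attr != NULL */
} QueryStep;

/* One open element; for simplicity it carries at most one attribute. */
typedef struct {
    const char *name;
    const char *attr;   /* NULL if the element has no attribute */
    const char *value;
} OpenElement;

/* Does a single (non-".*") step match a single element? */
static int step_matches(const QueryStep *q, const OpenElement *e) {
    if (strcmp(q->name, e->name) != 0)
        return 0;
    if (q->attr == NULL)
        return 1;
    return e->attr != NULL && strcmp(q->attr, e->attr) == 0
        && strcmp(q->value, e->value) == 0;
}

/* Returns 1 if steps q[0..nq) match the whole path p[0..np), listed from
   the root down to the current element. */
int path_matches(const QueryStep *q, int nq, const OpenElement *p, int np) {
    assert(nq >= 0 && np >= 0);
    if (nq == 0)
        return np == 0;
    if (strcmp(q[0].name, ".*") == 0) {
        /* Backtrack: let ".*" absorb 0, 1, ..., np elements in turn. */
        for (int skip = 0; skip <= np; skip++)
            if (path_matches(q + 1, nq - 1, p + skip, np - skip))
                return 1;
        return 0;
    }
    return np > 0 && step_matches(&q[0], &p[0])
        && path_matches(q + 1, nq - 1, p + 1, np - 1);
}
```

Matching against the whole ancestor list, rather than the element name alone, is what lets a query restrict W to a particular context; the backtracking over `.*` is cheap for the shallow nesting typical of text documents.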


The following programs are included with LT NSL:

	mknsg
		The basic tool for converting SGML to nSGML
		(normalised SGML) and caching the DOCTYPE is called
		mknsg. This program converts arbitrary valid SGML
		documents into nSGML. All programs written using the
		LT NSL API assume that their SGML inputs are in nSGML
		format, so mknsg should be called first in any chain
		of LT NSL programs.

		A program which creates hyperlinked SGML files from
		nSGML files. The present version is still somewhat

		This program works like the grep program in searching
		a file for regular expressions. However, unlike grep,
		it is aware of the tree structure of SGML files, and
		searches can be restricted to particular elements.

	sgmltrans
		sgmltrans is a program for translating nSGML files into
		some other format (which could be HTML or LaTeX or ...).
		It is loosely based on COST and other SGML programs, in
		that one specifies actions to perform at SGML start tags,
		end tags and text content. In sgmltrans, these actions
		are restricted to printing some text to the output stream.

	sgrpg
		An SGML selection and transformation tool which defines
		a fairly powerful transformation language. Still
		experimental.

		A program for counting the number of SGML elements
		in an nSGML file.

		Text tokenization. All text inside <TEXT> elements is
		tokenized, i.e. split into tokens and marked up with
		<C> elements.

		This is a perl program which segments an nSGML file
		which has already been tokenised into words using <w>
		markup.

		A quick and dirty sentence boundary marking application,
		i.e. it adds <S> elements to an nSGML file which has
		already been tokenised and segmented.

		A trivial version of the nsgmls program, i.e. it takes
		nSGML input and produces output in the same form as
		nsgmls does (ESIS format).

		Expects nSGML as input, and outputs text only,
		i.e. removes all SGML element markup.

	simple, simpleq
		Two demonstrations of how to write C programs using the
		LT NSL API. They are annotated in detail in the manual.

	nslmkddb
		Reads a file which contains an SGML <!DOCTYPE> statement
		and creates a DDB file for that document type.

	nslshowddb
		Writes a human-readable (pseudo-SGML) version of a
		DDB file to standard output.

	mkindex
		Produces an index from an SGML input document, i.e.
		it produces an index file which maps (a selected
		subset of) the SGML elements in the input document
		to file names and character positions in those files.

	getindex
		Returns the content of SGML elements from their index.
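Because nSGML is an expanded, easily parsable form of SGML, simple filters such as the programs above that count elements or output text only need little more than a character scanner. Here is a minimal, self-contained C sketch that does both jobs in one pass; it is not the LT NSL code, the name `strip_and_count` is hypothetical, and it assumes well-formed tags, no comments in the input, and leaves entity references undecoded.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch: copy only the character data of an nSGML string
   into out (at most size-1 bytes, always NUL-terminated) and return the
   number of start tags seen. */
int strip_and_count(const char *in, char *out, size_t size) {
    size_t len = 0;
    int starts = 0;
    assert(in != NULL && out != NULL && size > 0);
    while (*in) {
        if (*in == '<') {
            /* A start tag begins with a letter; end tags begin with '/',
               and declarations such as <!DOCTYPE ...> with '!'. */
            if (isalpha((unsigned char)in[1]))
                starts++;
            /* Skip to the closing '>', ignoring any '>' inside quotes. */
            char quote = 0;
            for (in++; *in && (quote || *in != '>'); in++) {
                if (quote && *in == quote)
                    quote = 0;
                else if (!quote && (*in == '"' || *in == '\''))
                    quote = *in;
            }
            if (*in)
                in++;                    /* step past the '>' */
        } else {
            if (len + 1 < size)
                out[len++] = *in;        /* character data is kept */
            in++;
        }
    }
    out[len] = '\0';
    return starts;
}
```

On unnormalised SGML the same scanner would go wrong wherever tags are omitted or short references are used, which is why mknsg is run first in a chain of LT NSL programs.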


The distribution is designed to work whether or not you have already
installed James Clark's SGML parser package SP-1.1.1.