SGML: SGREP 0.99
Subject: ANNOUNCE: sgrep v0.99 : structured grep
Date: 30 Apr 1996 10:16:22 GMT
From: jjaakkol@cs.helsinki.fi (Jani Jaakkola)
-----------------------------------------------------------------------
The Document Management Group at the Department of Computer Science
of the University of Helsinki, Finland, proudly presents
-----------------------------------------------------------------------
SGREP v0.99 - A tool for searching files for structured patterns
-----------------------------------------------------------------------
INTRODUCTION
------------
If you have ever wondered how to
o Locate only TITLE and H1 .. H9 elements from HTML documents
o Remove all <FONT> tags from an HTML document
o Rename all B elements to STRONG elements
o Find out how many FIG elements there are under SUBPARA
elements but not under PARA elements in your SGML file
o Print out the TITLE elements from a set of HTML documents
in which the word 'SGML' is mentioned more than 12 times, or
which contain the word 'SGML' inside H1 or H2 elements
o Find the senders of the messages in a set of mail files
that contain the word 'SGML' in the subject line, do not
contain 'HTML' in the body of the mail, were sent in the year
1996 and were not sent from the address flame@hot.com
then sgrep is the tool for you.
Sgrep (structured grep) is a tool for searching text files and
filtering text streams using structural criteria. Sgrep implements
a query language based on so-called region expressions.
Like grep, sgrep can be used on any kind of text file. However, it
is most useful for text files containing some kind of structured text.
A structured text file can be defined as a file that obeys some
syntax. Examples of structured text files are SGML, HTML,
C, TeX and mail files.
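To make the idea of a region expression more concrete, here is a minimal
Python sketch. It is not sgrep's query syntax and not its implementation;
the helper names (find_all, between, inside) are made up for this
illustration. It only assumes that a region is a (start, end) pair of
character offsets, and that regions can be built from phrases and then
combined or filtered by containment.

```python
def find_all(text, phrase):
    """Return every occurrence of phrase as a (start, end) region."""
    regions = []
    pos = text.find(phrase)
    while pos != -1:
        regions.append((pos, pos + len(phrase)))
        pos = text.find(phrase, pos + 1)
    return regions


def between(starts, ends):
    """Pair each start region with the first end region that follows it.

    This is a simple, non-nested pairing chosen for the illustration.
    """
    result = []
    for s_start, s_end in starts:
        for e_start, e_end in ends:
            if e_start >= s_end:
                result.append((s_start, e_end))
                break
    return result


def inside(inner, outer):
    """Keep the inner regions that lie completely within some outer region."""
    return [
        (i_start, i_end)
        for i_start, i_end in inner
        if any(o_start <= i_start and i_end <= o_end for o_start, o_end in outer)
    ]


text = "<TITLE>SGML tools</TITLE><P>SGML and HTML</P>"

# Regions spanning each TITLE element, tags included.
titles = between(find_all(text, "<TITLE>"), find_all(text, "</TITLE>"))

# Occurrences of the word SGML that fall inside a TITLE element.
hits = inside(find_all(text, "SGML"), titles)

print([text[a:b] for a, b in titles])
print(hits)
```

In this sketch the word SGML occurs twice in the sample text, but only
the occurrence inside the TITLE element survives the containment filter.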
ENVIRONMENT
-----------
Sgrep needs a Unix-like system to run. It has been tested on the following
platforms:
SunOS 5.4 sparc
Linux 1.3.85 alpha
Linux 1.2.13 intel, a.out binaries
Linux 1.2.13 intel, elf binaries
HP-UX 9000/735
OSF1 alpha
It has also been reported to run on
SGI/Irix 5.2
A macro preprocessor is most useful as a front-end to sgrep.
The authors use m4, and the delivery package contains example macro files
written for m4. However, a C preprocessor or some other program could also
be used instead of m4.
COPYRIGHT
---------
Sgrep is distributed under the GNU General Public License.
WHERE CAN I FIND IT?
--------------------
We have put up some WWW pages about sgrep at
http://www.cs.helsinki.fi/~jjaakkol/sgrep.html
On these pages you will also find the queries that solve the
problems listed above.
Source for sgrep can be downloaded from
ftp://ftp.cs.helsinki.fi/pub/Software/Local
Sorry, there are no binary distributions (yet).
Send mail to jjaakkol@cs.helsinki.fi if you have a problem that you
cannot solve yourself.
CREDITS
-------
Sgrep was created by Jani Jaakkola (jjaakkol@cs.helsinki.fi) and Pekka
Kilpeläinen (kilpelai@cs.helsinki.fi).
We wish to thank Professor Heikki Mannila for suggesting that we design
and implement sgrep.
Sgrep is based upon the paper "An algebra for structured text search and
a framework for its implementation" by C. L. A. Clarke, G. V. Cormack and
F. J. Burkowski, The Computer Journal, 38(1):43-56, 1995.
A preliminary version of their paper is available from
ftp://cs-archive.uwaterloo.ca/cs-archive/CS-94-30
However, sgrep is not a strict implementation of the language of Clarke,
Cormack and Burkowski. Unlike their language, sgrep is able to deal
with nested regions, e.g., lists within lists (within lists ...).
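As an illustration of what nested regions look like, the following
Python sketch matches opening and closing delimiters with a stack, so
that an inner list and the outer list enclosing it are both reported.
This is an assumption-laden toy, not the algorithm used by sgrep or
described in the paper; the function name nested_regions and the sample
text are made up for this example.

```python
def nested_regions(text, open_tag, close_tag):
    """Return (start, end) regions for properly nested delimiter pairs."""
    # Collect the positions of all delimiters in document order.
    events = []
    pos = text.find(open_tag)
    while pos != -1:
        events.append((pos, "open"))
        pos = text.find(open_tag, pos + len(open_tag))
    pos = text.find(close_tag)
    while pos != -1:
        events.append((pos, "close"))
        pos = text.find(close_tag, pos + len(close_tag))
    events.sort()

    # Each closing delimiter is paired with the most recent unmatched
    # opening delimiter; unmatched closing delimiters are ignored.
    stack, regions = [], []
    for pos, kind in events:
        if kind == "open":
            stack.append(pos)
        elif stack:
            regions.append((stack.pop(), pos + len(close_tag)))
    return sorted(regions)


text = "<UL><LI>a<UL><LI>b</UL></UL>"
lists = nested_regions(text, "<UL>", "</UL>")
print(lists)
print([text[a:b] for a, b in lists])
```

Here the outer list spans the whole sample text and the inner list is a
region contained within it; both are returned.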
Enjoy!