Text Analysis Tools for XML documents: a Web application
Date: Wed, 29 Mar 2000 19:10:45 -0600
From: Alexander Nakhimovsky <sasha@CS.COLGATE.EDU>
To: Humanist Discussion Group <humanist@lists.Princeton.EDU>
Subject: Text Analysis Tools for XML documents: a Web application
An announcement: Text Analysis Tools for XML documents: a Web application
The release last November of the XSLT and XPath Recommendations opened up new possibilities for text-analysis tools. Since January, a project at Colgate University in the US has been developing a set of tools with the following design goals:
the tools are available over the network as a Web application;
the tools are DTD independent: the user interface is constructed automatically on the basis of the document's DTD;
the queries that the tools can process use XPath to express structural conditions and regular expressions to describe the text patterns being searched for;
the tools are extensible: if a query cannot be expressed in XSLT, it can be delegated to an extension function written in a general-purpose programming language (most easily Java);
secondary documents, such as concordances, frequency counts, inverted indices and so on, are kept as XML documents, optimized for query processing but also available for printing and display.
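The division of labour in the design goals above (XPath for the structural condition, regular expressions for the text pattern, XML for the resulting secondary document) can be sketched in Java using only the JDK's standard APIs (`javax.xml.xpath`, `java.util.regex`, JAXP). This is a minimal illustration, not the Colgate tools' implementation: the class name `TextQuerySketch`, the methods `countMatches` and `toXml`, the sample document, and the `<frequencies>`/`<form>` output format are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: structural query via XPath, text query via regex.
public class TextQuerySketch {

    // Select nodes with an XPath expression (the structural condition), then
    // count every regex match (the text pattern) in each node's text content.
    public static Map<String, Integer> countMatches(String xml, String xpath, String regex)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
        Pattern pattern = Pattern.compile(regex);
        Map<String, Integer> counts = new TreeMap<>(); // sorted by matched form
        for (int i = 0; i < nodes.getLength(); i++) {
            Matcher m = pattern.matcher(nodes.item(i).getTextContent());
            while (m.find()) {
                counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Serialize the counts as a small XML "secondary document".
    // The element and attribute names here are hypothetical.
    public static String toXml(Map<String, Integer> counts) throws Exception {
        Document out = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = out.createElement("frequencies");
        out.appendChild(root);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            Element form = out.createElement("form");
            form.setAttribute("count", e.getValue().toString());
            form.setTextContent(e.getKey());
            root.appendChild(form);
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(out), new StreamResult(w));
        return w.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document: only <p> inside <body> should be searched.
        String xml = "<doc><body><p>The cat chased the cats. A cat slept.</p></body>"
                + "<note>A cat in a note.</note></doc>";
        Map<String, Integer> counts = countMatches(xml, "//body/p", "\\bcat\\w*");
        System.out.println(toXml(counts));
    }
}
```

In the sample, the `<note>` element falls outside the XPath condition `//body/p`, so its occurrence of "cat" is not counted; the regex step runs in Java because XPath 1.0 has no regular-expression matching of its own. A real tool would likely precompute and store such counts rather than recompute them per query.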
We now have an early version of the tools and a tutorial on how to use them, both available at:
http://csproj.colgate.edu/TextTools.htm
Our main purpose in posting this announcement is to get feedback: What other functionality is needed? How can the user interface be improved? We are interested in collaborating with an ongoing project to try out ideas. Email addresses are given at the end of this message. Eventually, we would like to make this an open source project.
The tutorial uses a very simple DTD (Jon
Both the program and the tutorial have been prepared by Karthik Jayaraman, following initial suggestions by Alexander Nakhimovsky. Karthik (kjayaraman@mail.colgate.edu) is a senior undergraduate student, and Nakhimovsky (sasha@cs.colgate.edu) is a faculty member in the computer science department at Colgate. We will be giving a paper on our work at XML-Europe in Paris in June. A poster and a software demo will be presented at the ALLC/ACH meeting in Glasgow.
Alexander Nakhimovsky
Computer Science Dpt
Colgate University
Hamilton NY 13346
tel 315-228-7586
fax 315-228-7004
sasha@cs.colgate.edu or sasha@mail.colgate.edu
Prepared by Robin Cover for The XML Cover Pages archive.