Text Analysis Tools for XML documents: a Web application


Date:      Wed, 29 Mar 2000 19:10:45 -0600
From:      Alexander Nakhimovsky <sasha@CS.COLGATE.EDU>
To:        Humanist Discussion Group <humanist@lists.Princeton.EDU>
Subject:   Text Analysis Tools for XML documents: a Web application

An announcement: Text Analysis Tools for XML documents: a Web application

The release last November of the XSLT and XPath Recommendations opened up a new range of possibilities for text-analysis tools. Since January, a project at Colgate University in the US has been developing a set of tools with the following design goals:

  • the tools are available over the network as a Web application;

  • the tools are DTD-independent: the user interface is constructed automatically on the basis of the document's DTD;

  • queries use XPath to express structural conditions and regular expressions to describe the text patterns to be matched;

  • the tools are extensible: if a query cannot be expressed in XSLT, it can be delegated to an extension function written in a general-purpose programming language (most easily Java);

  • secondary documents, such as concordances, frequency counts, inverted indices and so on, are kept as XML documents, optimized for query processing but also available for printing and display.
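To make the query model in the list above concrete, here is a minimal sketch in Python. It is not the project's code (the tools themselves are built on XSLT, with Java for extensions); it only illustrates the idea of selecting elements by a structural path, filtering them by a regular expression, and keeping a frequency count as an XML document. The sample markup, the function names query and frequency_document, and the <frequencies>/<word> output format are all assumptions made for illustration. Python's ElementTree supports only a subset of XPath.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: a few lines marked up with PLAY/ACT/SCENE/SPEECH tags.
SAMPLE = """<PLAY>
  <ACT><TITLE>ACT I</TITLE>
    <SCENE><TITLE>SCENE I</TITLE>
      <SPEECH><SPEAKER>ANTONIO</SPEAKER>
        <LINE>In sooth, I know not why I am so sad:</LINE>
        <LINE>It wearies me; you say it wearies you;</LINE>
      </SPEECH>
      <SPEECH><SPEAKER>SALARINO</SPEAKER>
        <LINE>Your mind is tossing on the ocean;</LINE>
      </SPEECH>
    </SCENE>
  </ACT>
</PLAY>"""


def query(root, structural_path, text_pattern):
    """Return the text of elements selected by the path whose text matches the regex."""
    regex = re.compile(text_pattern)
    return [
        el.text
        for el in root.findall(structural_path)
        if el.text and regex.search(el.text)
    ]


def frequency_document(root, structural_path):
    """Build a secondary XML document holding word counts for the selected elements."""
    counts = Counter()
    for el in root.findall(structural_path):
        words = re.findall(r"[A-Za-z']+", el.text or "")
        counts.update(w.lower() for w in words)
    freq = ET.Element("frequencies")  # assumed output format, not the project's
    for word, n in sorted(counts.items()):
        ET.SubElement(freq, "word", count=str(n)).text = word
    return freq


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    # Structural condition: lines spoken by ANTONIO; text condition: the word "wearies".
    for line in query(root, ".//SPEECH[SPEAKER='ANTONIO']/LINE", r"\bwearies\b"):
        print(line)
    # The frequency list is itself XML, so it can be queried or displayed in turn.
    print(ET.tostring(frequency_document(root, ".//LINE"), encoding="unicode"))
```

Because the frequency list is ordinary XML, the same path-plus-pattern query can be run against it as against the primary text.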

We now have an early version of the tools and a tutorial on how to use them, both to be found at:

     http://csproj.colgate.edu/TextTools.htm

Our main purpose in posting this announcement is to get feedback: What other functionality is needed? How can the user interface be improved? We are interested in collaborating with an ongoing project to try out ideas. There are email addresses at the end of this message. Eventually, we would like to make this an open source project.

The tutorial uses a very simple DTD (Jon Bosak's play.dtd) and a single text, The Merchant of Venice. However, the program is DTD-independent. The next version of the tutorial will use TEI Lite and provide instructions on how to use the program with a DTD of your own.
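As a rough illustration of what DTD independence can mean in practice, the Python sketch below reads element declarations from a DTD and lists the element names a query form could offer. It is an assumption-laden sketch, not the project's method: the DTD fragment is a simplified, hypothetical one written in the style of a play DTD (it is not the text of play.dtd), the function names are invented, and a regular expression stands in for a real DTD parser, so parameter entities, comments and conditional sections are ignored.

```python
import re

# Hypothetical, simplified DTD fragment for illustration only.
DTD_FRAGMENT = """
<!ELEMENT PLAY    (TITLE, ACT+)>
<!ELEMENT ACT     (TITLE, SCENE+)>
<!ELEMENT SCENE   (TITLE, SPEECH+)>
<!ELEMENT SPEECH  (SPEAKER+, LINE+)>
<!ELEMENT TITLE   (#PCDATA)>
<!ELEMENT SPEAKER (#PCDATA)>
<!ELEMENT LINE    (#PCDATA)>
"""

# Matches "<!ELEMENT name content-model>"; content models never contain ">".
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+(.*?)>", re.DOTALL)


def element_models(dtd_text):
    """Map each declared element name to its content model, whitespace-normalized."""
    return {
        name: " ".join(model.split())
        for name, model in ELEMENT_DECL.findall(dtd_text)
    }


def text_bearing_elements(models):
    """Names of elements that may contain character data, i.e. regex targets."""
    return sorted(name for name, model in models.items() if "#PCDATA" in model)


if __name__ == "__main__":
    models = element_models(DTD_FRAGMENT)
    # Every declared element could be offered as a structural condition in a form.
    for name, model in models.items():
        print(name, model)
    # Only text-bearing elements make sense as targets for a text pattern.
    print(text_bearing_elements(models))
```

Swapping in a different DTD changes the list of names without any change to the code, which is the property the announcement describes.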

Both the program and the tutorial have been prepared by Karthik Jayaraman, following initial suggestions by Alexander Nakhimovsky. Karthik (kjayaraman@mail.colgate.edu) is a senior undergraduate student, and Nakhimovsky (sasha@cs.colgate.edu) is a faculty member in the computer science department at Colgate. We will be giving a paper on our work at XML-Europe in Paris in June. A poster and a software demo will be presented at the ALLC/ACH meeting in Glasgow.


Alexander Nakhimovsky    tel 315-228-7586
Computer Science Dpt     fax 315-228-7004
Colgate University       sasha@cs.colgate.edu or
Hamilton NY 13346        sasha@mail.colgate.edu

Prepared by Robin Cover for The XML Cover Pages archive.



Document URL: http://xml.coverpages.org/colgateXML20000329.html