The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: February 01, 2002.
News: Cover Stories

LuceneXML Package Supports Structure-Aware Searching of XML Documents.

A communiqué from Eliot Kimber announces the availability of a LuceneXML package and a companion LuceneClient package, which support indexing of XML documents in a way that enables structure-aware search and retrieval. LuceneXML represents "the initial result of an experiment in using the Apache Lucene package; the implementation is incomplete but sufficient to demonstrate the approach and to enable testing." Jakarta Lucene is a Java-based, high-performance, full-featured text search engine. The LuceneXML package "provides a manager class (XMLSandRManager) that exposes factory methods for creating XML indexers and searchers. Using the XML indexer, you can add XML documents to a Lucene index. The XML searcher provides convenience methods for submitting XML queries to Lucene... The LuceneClient application lets you index XML documents and submit queries against Lucene indexes."

From the lucene_xml package description: "The indexing approach used is to index each element as a separate Lucene document. For [each] element, its directly-contained PCDATA content, its tagname, DOM tree location, ancestor list, and attributes are indexed. Each of these 'element docs' is related to the original XML document by the XML document's 'docid' (e.g., its fully-qualified filename, URL, or repository ID). The XMLIndexer class exposes one method: indexNewDocument(). This method takes the path of an XML document and attempts to index it. The document must be a valid XML document (e.g., you can open the document with IE5). In this implementation, the file path is used as the docid stored in the index. The method returns an IndexingMetrics object, which contains timing and data size information about the document indexed, including the Lucene-specific time, DOM-specific time, number of elements, total nodes processed, and total text content indexed..."
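The element-per-document idea described above can be illustrated with a minimal sketch. This is not the LuceneXML code and does not use Lucene. It uses only Python's standard library, and the field names (`docid`, `tagname`, `location`, `ancestors`, `content`, `attr_*`), the `element_docs` and `search` helpers, and the sample document are all assumptions made for illustration. Each element becomes one flat record that holds only its directly contained text, and a structure-aware query is then a filter over those records.

```python
import xml.dom.minidom as minidom


def element_docs(xml_text, docid):
    """Return one flat 'element doc' (a dict of fields) per element.

    Field names are hypothetical; they mirror the properties the package
    description lists, not the actual LuceneXML index schema.
    """
    dom = minidom.parseString(xml_text)
    docs = []

    def visit(elem, ancestors, location):
        # Directly contained character data only: text inside child
        # elements belongs to those children's own element docs.
        text = "".join(
            n.data for n in elem.childNodes
            if n.nodeType in (n.TEXT_NODE, n.CDATA_SECTION_NODE)
        )
        doc = {
            "docid": docid,                                   # ties the record to its source document
            "tagname": elem.tagName,
            "location": "/".join(str(i) for i in location),  # position in the element tree
            "ancestors": " ".join(ancestors),                 # root-first ancestor list
            "content": " ".join(text.split()),                # whitespace-normalized text
        }
        for name, value in elem.attributes.items():
            doc["attr_" + name] = value
        docs.append(doc)

        children = [n for n in elem.childNodes if n.nodeType == n.ELEMENT_NODE]
        for i, child in enumerate(children, start=1):
            visit(child, ancestors + [elem.tagName], location + [i])

    visit(dom.documentElement, [], [1])
    return docs


def search(docs, tagname, ancestor, term):
    """Find elements named `tagname`, under `ancestor`, whose own text contains `term`."""
    return [
        d for d in docs
        if d["tagname"] == tagname
        and ancestor in d["ancestors"].split()
        and term.lower() in d["content"].lower().split()
    ]


SAMPLE = (
    '<book id="b1"><title>Lucene Notes</title>'
    "<chapter><title>Indexing</title>"
    "<para>Each element is a <emph>separate</emph> document.</para>"
    "</chapter></book>"
)

docs = element_docs(SAMPLE, "notes.xml")
for hit in search(docs, "title", "chapter", "indexing"):
    print(hit["docid"], hit["location"], hit["content"])
```

Because every record carries its tag name and ancestor list, the query for a `title` inside a `chapter` matches the chapter title but not the book title, which is the kind of distinction a plain full-text index cannot make. In a real Lucene index these records would presumably be stored as documents with one field per property.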




Document URI: http://xml.coverpages.org/ni2002-02-01-a.html
Robin Cover, Editor: robin@oasis-open.org