Cover Pages: W3C Publishes Working Draft Specifications for Full-Text Search.

Members of the W3C XML Query Working Group and XSL Working Group have released two initial public working drafts for Full-Text Search: XQuery and XPath Full-Text Requirements and XQuery and XPath Full-Text Use Cases, both produced as part of the W3C XML Activity. "Full-Text Search" in this context involves "an extension to the XQuery/XPath language. It provides a way to query text which has been tokenized, i.e., broken into a sequence of words, units of punctuation, and spaces. Tokenization enables functions and operators which work with the relative positioning of words (e.g., proximity operators). Tokenization also enables functions and operators which operate on a part or the root of the word (e.g., wildcards, stemming)."

The Requirements document specifies (initially) that: XQuery/XPath Full-Text functions must operate on instances of the XQuery/XPath Data Model; XQuery/XPath Full-Text need not be designed as an end-user UI language; while XQuery/XPath Full-Text may have more than one syntax binding, one query language syntax must be convenient for humans to read and write, and one must be expressed in XML in a way that reflects the underlying structure of the query; and if XQuery/XPath Full-Text supports search within names of elements and attributes, then in any search it must distinguish element content and attribute values from the names of elements and attributes.

The Use Cases document "illustrates important applications of full-text querying within an XML query language. Each use case exercises a specific functionality relevant to full-text querying; a Schema and sample input data are provided. The full-text queries in these use cases are performed on text which has been tokenized." The W3C working groups welcome public comments on the draft documents and open issues.
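The tokenization model quoted above can be illustrated with a short sketch. The drafts do not prescribe a tokenizer or a query syntax, so everything here is an assumption made for illustration: the Python names `Token`, `tokenize`, `word_positions` and `within_distance` are hypothetical, a "word" is taken to be a run of Unicode word characters, and distance is measured as the difference between word positions. The point is only that once words carry positions, proximity and ordered proximity reduce to simple arithmetic.

```python
import re
from dataclasses import dataclass


# Hypothetical token model: text, kind, and position among words only.
@dataclass(frozen=True)
class Token:
    text: str
    kind: str       # "word", "punct", or "space"
    word_pos: int   # index among words; -1 for punctuation and spaces


# Every character is either a word character, whitespace, or neither.
_TOKEN_RE = re.compile(r"(\w+)|(\s+)|([^\w\s])")


def tokenize(text):
    """Break text into words, units of punctuation, and spaces, numbering the words."""
    tokens, word_pos = [], 0
    for m in _TOKEN_RE.finditer(text):
        word, space, punct = m.groups()
        if word is not None:
            tokens.append(Token(word, "word", word_pos))
            word_pos += 1
        elif space is not None:
            tokens.append(Token(space, "space", -1))
        else:
            tokens.append(Token(punct, "punct", -1))
    return tokens


def word_positions(tokens, term):
    """Word positions at which term occurs, compared case-insensitively."""
    t = term.lower()
    return [tok.word_pos for tok in tokens
            if tok.kind == "word" and tok.text.lower() == t]


def within_distance(tokens, a, b, max_words, ordered=False):
    """True if some occurrence of a and some occurrence of b have word
    positions differing by 1..max_words; with ordered=True, a must precede b."""
    for pa in word_positions(tokens, a):
        for pb in word_positions(tokens, b):
            gap = pb - pa
            if ordered and gap <= 0:
                continue
            if 0 < abs(gap) <= max_words:
                return True
    return False


if __name__ == "__main__":
    toks = tokenize("Usability testing, as described here, improves software.")
    print([t.text for t in toks if t.kind == "word"])
    print(within_distance(toks, "usability", "software", 7))                # True (positions 0 and 6)
    print(within_distance(toks, "usability", "software", 5))                # False
    print(within_distance(toks, "software", "usability", 7, ordered=True))  # False
```

Keeping punctuation and space tokens instead of discarding them means the original text can be reassembled exactly, while `word_pos` counts only words, matching the "unit: words" the Requirements draft names for proximity searching.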

Bibliographic information

XQuery and XPath Full-Text Requirements. W3C Working Draft 14-February-2003. Edited by Stephen Buxton (Oracle Corp) and Michael Rys (Microsoft). Version URL: http://www.w3.org/TR/2003/WD-xmlquery-full-text-requirements-20030214/. Latest version URL: http://www.w3.org/TR/xmlquery-full-text-requirements/.

XQuery and XPath Full-Text Use Cases. W3C Working Draft 14-February-2003. Edited by Sihem Amer-Yahia (AT&T Labs) and Pat Case (US Library of Congress). Version URL: http://www.w3.org/TR/2003/WD-xmlquery-full-text-use-cases-20030214/. Latest version URL: http://www.w3.org/TR/xmlquery-full-text-use-cases. Also available in XML format.

Full-Text Search Functionality

According to the initial Requirements working draft, XQuery/XPath Full-Text must provide, in the first release, the minimum set of Full-Text functionality that is useful:

single-word search

phrase search

support for stopwords

single-character suffix wildcard

zero-or-more-character suffix wildcard

zero-or-more-character prefix wildcard

zero-or-more-character infix wildcard

proximity searching (unit: words)

specification of order in proximity searching

combination using AND

combination using OR

combination using NOT

word normalization, diacritics

ranking, relevance
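Several items in the list above can be sketched over a plain list of words. This is a toy illustration in Python under stated assumptions, not XQuery/XPath Full-Text syntax (which these drafts do not yet define): the function names, the small `STOPWORDS` set, the `?`/`*` wildcard notation, and the `score` formula are all hypothetical. Diacritics are removed via Unicode NFD decomposition, and AND/OR/NOT combinations are ordinary Python boolean operators applied to the individual tests.

```python
import re
import unicodedata

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "of", "the", "to"}


def normalize(word):
    """Lower-case a word and strip diacritics, e.g. 'Résumé' -> 'resume'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def words(text):
    """Tokenize into normalized words, discarding punctuation and spaces."""
    return [normalize(w) for w in re.findall(r"\w+", text)]


def has_word(text, word):
    """Single-word search."""
    return normalize(word) in words(text)


def has_phrase(text, phrase, ignore_stopwords=False):
    """Phrase search: the phrase's words must appear contiguously.
    With ignore_stopwords=True, stopwords are dropped from both sides first."""
    doc, pat = words(text), words(phrase)
    if ignore_stopwords:
        doc = [w for w in doc if w not in STOPWORDS]
        pat = [w for w in pat if w not in STOPWORDS]
    n = len(pat)
    return n > 0 and any(doc[i:i + n] == pat for i in range(len(doc) - n + 1))


def has_wildcard(text, pattern):
    """Hypothetical notation: '?' matches exactly one character, '*' zero or
    more; placement gives suffix ('softwar?', 'soft*'), prefix ('*ability')
    or infix ('us*ty') wildcards. Each pattern must match a whole word."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in normalize(pattern)
    )
    rx = re.compile(regex)
    return any(rx.fullmatch(w) for w in words(text))


def score(text, terms):
    """Toy relevance score: share of document words matching a query term."""
    doc = words(text)
    wanted = {normalize(t) for t in terms}
    return sum(w in wanted for w in doc) / len(doc) if doc else 0.0


if __name__ == "__main__":
    doc = "The Usability of Software R\u00e9sum\u00e9s"
    print(has_word(doc, "RESUMES"))                                       # True
    print(has_phrase(doc, "usability software"))                          # False: "of" intervenes
    print(has_phrase(doc, "usability software", ignore_stopwords=True))   # True
    print(has_wildcard(doc, "us*ty"), has_wildcard(doc, "soft?"))         # True False
    print(has_word(doc, "usability") and not has_word(doc, "hardware"))   # True
    print(score(doc, ["usability", "software"]))                          # 0.4
```

Real full-text engines typically precompute normalized words into an index rather than re-tokenizing each document per query; the per-call tokenization here only keeps the sketch short.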

Full-Text Use Cases: Overview

Use cases developed in the working draft include:

Use Case "WORD": Word and Phrase Queries

Use Case "ELEMENT": Queries on XML Elements and Attributes

Use Case "STOP-WORD": Queries Ignoring and Overriding Stop Words

Use Case "CHARACTER-MANIPULATION": Queries Manipulating Normalized Characters and Tokenized Words, Spaces, and Punctuation

Use Case "WILDCARD": Character Wildcard (Prefix, Infix, Suffix) and Word Wildcard Queries

Use Case "STEMMING": Word Stemming Queries

Use Case "THESAURUS": Queries Which Use Thesauri, Dictionaries, and Taxonomies

Use Case "BOOLEAN": Or, And, and Not Queries

Use Case "DISTANCE": Queries on Distance Relationships Including Proximity, Window, Sentence, and Paragraph Queries

Use Case "ADVANCED-WORD": Advanced Word and Phrase Queries

Use Case "SCORE": Queries Unique to Score

Use Case "STRUCTURE": Queries using XPath Axes

Use Case "IGNORE": Queries Ignoring Tags and Content

Use Case "COMPOSABILITY": Queries Illustrating Composability of Full-Text with Other XQuery Functionality

Use Case "COMPLEX": Complex Queries

Status

Both WDs are draft/provisional. With respect to the Requirements: "This document is a work in progress. It contains many open issues, and should not be considered to be fully stable... At this stage in the life of the document, these requirements should be read as suggestions only: the issues associated with the requirements are to be discussed and resolved by the relevant Working Groups. This format provides a firm basis for the Working Groups to set the direction of the work on XQuery/XPath Full-Text, and to compare existing proposals. Once the issues are resolved and this Requirements document is finalized, it will be easier to define the functionality of XQuery/XPath Full-Text and its integration with XQuery and/or XPath."

With respect to the Use Cases document: "The document specifies usage scenarios for full-text queries as part of XML Query and XPath. It contains many open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change... The document supplements the XML Query Use Cases which can be found in the W3C XML Query Use Cases."

Principal references:

XQuery and XPath Full-Text Requirements. W3C Working Draft 14-February-2003.
XQuery and XPath Full-Text Use Cases. W3C Working Draft 14-February-2003.
Comments: send email to the W3C XPath/XQuery [Query and Transform] mailing list; see the archives.
W3C XML Query Working Group
W3C XSL Working Group
"XML and Query Languages" - Main reference page.
