Members of the W3C XML Query Working Group and XSL Working Group have released two initial public working drafts for Full-Text Search. XQuery and XPath Full-Text Requirements and XQuery and XPath Full-Text Use Cases have been produced as part of the W3C XML Activity.
"Full-Text Search" in this context involves "an extension to the XQuery/XPath language. It provides a way to query text which has been tokenized, i.e., broken into a sequence of words, units of punctuation, and spaces. Tokenization enables functions and operators which work with the relative positioning of words (e.g., proximity operators). Tokenization also enables functions and operators which operate on a part or the root of the word (e.g., wildcards, stemming)."
The Requirements document initially specifies that:
- XQuery/XPath Full-Text functions must operate on instances of the XQuery/XPath Data Model
- XQuery/XPath Full-Text need not be designed as an end-user UI language
- while XQuery/XPath Full-Text may have more than one syntax binding, one query language syntax must be convenient for humans to read and write
- while XQuery/XPath Full-Text may have more than one syntax binding, one query language syntax must be expressed in XML in a way that reflects the underlying structure of the query
- if XQuery/XPath Full-Text supports search within names of elements and attributes, then any search must distinguish element content and attribute values, on the one hand, from names of elements and attributes, on the other
The Use Cases document "illustrates important applications of full-text querying within an XML query language. Each use case exercises a specific functionality relevant to full-text querying; a Schema and sample input data are provided. The full-text queries in these use cases are performed on text which has been tokenized." The W3C working groups welcome public comments on the draft documents and open issues.
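The tokenization step in the quoted definition above can be illustrated with a minimal Python sketch. It is not taken from either working draft: the token classes, the regular expression, and the function names `tokenize` and `word_positions` are assumptions made for illustration only, and real tokenizers are language- and implementation-dependent.

```python
import re

# Assumed token classes: runs of word characters, runs of whitespace,
# and single punctuation characters. The drafts do not prescribe these rules.
TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<space>\s+)|(?P<punct>[^\w\s])")


def tokenize(text):
    """Split text into (kind, token) pairs; kind is 'word', 'space' or 'punct'."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def word_positions(tokens):
    """Return the lowercased words in order; the list index is the word position."""
    return [tok.lower() for kind, tok in tokens if kind == "word"]


sample = "Full-text search, in XML."
tokens = tokenize(sample)
words = word_positions(tokens)
print(tokens)
print(words)
```

Keeping spaces and punctuation as tokens preserves the original text exactly, while the separate word list gives each word a position that proximity operators can compare.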
Bibliographic information
XQuery and XPath Full-Text Requirements. W3C Working Draft 14-February-2003. Edited by Stephen Buxton (Oracle Corp) and Michael Rys (Microsoft). Version URL: http://www.w3.org/TR/2003/WD-xmlquery-full-text-requirements-20030214/. Latest version URL: http://www.w3.org/TR/xmlquery-full-text-requirements/.
XQuery and XPath Full-Text Use Cases. W3C Working Draft 14-February-2003. Edited by Sihem Amer-Yahia (AT&T Labs) and Pat Case (US Library of Congress). Version URL: http://www.w3.org/TR/2003/WD-xmlquery-full-text-use-cases-20030214/. Latest version URL: http://www.w3.org/TR/xmlquery-full-text-use-cases. Also available in XML format.
Full-Text Search Functionality
According to the initial Requirements working draft, XQuery/XPath Full-Text must provide, in the first release, the minimum set of Full-Text functionality that is useful:
- single-word search
- phrase search
- support for stopwords
- single character suffix
- 0 or more character suffix
- 0 or more character prefix
- 0 or more character infix
- proximity searching (unit: words)
- specification of order in proximity searching
- combination using AND
- combination using OR
- combination using NOT
- word normalization, diacritics
- ranking, relevance
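Several of the listed features can be sketched over a tokenized word list. The Python below is a hedged illustration, not the syntax or semantics of XQuery/XPath Full-Text: every function name is hypothetical, the wildcard characters `?` and `*` are borrowed from shell-style matching, and ranking and relevance are left out.

```python
import fnmatch
import re
import unicodedata


def strip_diacritics(text):
    """Remove combining marks, so 'café' compares equal to 'cafe'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def words_of(text):
    """Lowercased, diacritic-free words in document order."""
    return re.findall(r"\w+", strip_diacritics(text).lower())


def drop_stopwords(words, stopwords):
    """Remove stopwords; note that this renumbers word positions."""
    return [w for w in words if w not in stopwords]


def contains_word(words, term):
    """Single-word search."""
    return term in words


def contains_phrase(words, phrase):
    """Phrase search: the words of phrase must appear consecutively."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def matches_wildcard(words, pattern):
    """Wildcard search: '?' is one character, '*' is zero or more."""
    return any(fnmatch.fnmatchcase(w, pattern) for w in words)


def within_distance(words, first, second, max_words, ordered=False):
    """Proximity search: at most max_words words lie between the two terms."""
    pos_a = [i for i, w in enumerate(words) if w == first]
    pos_b = [i for i, w in enumerate(words) if w == second]
    for a in pos_a:
        for b in pos_b:
            if a == b:
                continue
            if abs(b - a) - 1 <= max_words and (not ordered or a < b):
                return True
    return False


words = words_of("The quick brown fox jumps over the lazy dog at the café")
print(contains_phrase(words, ["brown", "fox"]))
print(within_distance(words, "quick", "fox", 1, ordered=True))
# AND / OR / NOT combine the boolean results directly.
print(contains_word(words, "fox") and not contains_word(words, "cat"))
```

Each function returns a plain boolean, so the AND, OR, and NOT combinations in the list reduce to ordinary boolean operators in this sketch.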
Full-Text Use Cases: Overview
Use cases developed in the working draft include:
- Use Case "WORD": Word and Phrase Queries
- Use Case "ELEMENT": Queries on XML Elements and Attributes
- Use Case "STOP-WORD": Queries Ignoring and Overriding Stop Words
- Use Case "CHARACTER-MANIPULATION": Queries Manipulating Normalized Characters and Tokenized Words, Spaces, and Punctuation
- Use Case "WILDCARD": Character Wildcard (Prefix, Infix, Suffix) and Word Wildcard Queries
- Use Case "STEMMING": Word Stemming Queries
- Use Case "THESAURUS": Queries Which Use Thesauri, Dictionaries, and Taxonomies
- Use Case "BOOLEAN": Or, And, and Not Queries
- Use Case "DISTANCE": Queries on Distance Relationships Including Proximity, Window, Sentence, and Paragraph Queries
- Use Case "ADVANCED-WORD": Advanced Word and Phrase Queries
- Use Case "SCORE": Queries Unique to Score
- Use Case "STRUCTURE": Queries using XPath Axes
- Use Case "IGNORE": Queries Ignoring Tags and Content
- Use Case "COMPOSABILITY": Queries Illustrating Composability of Full-Text with Other XQuery Functionality
- Use Case "COMPLEX": Complex Queries
Status
Both working drafts are provisional. With respect to the Requirements: "This document is a work in progress. It contains many open issues, and should not be considered to be fully stable... At this stage in the life of the document, these requirements should be read as suggestions only: the issues associated with the requirements are to be discussed and resolved by the relevant Working Groups. This format provides a firm basis for the Working Groups to set the direction of the work on XQuery/XPath Full-Text, and to compare existing proposals. Once the issues are resolved and this Requirements document is finalized, it will be easier to define the functionality of XQuery/XPath Full-Text and its integration with XQuery and/or XPath."
With respect to the Use Cases document: "The document specifies usage scenarios for full-text queries as part of XML Query and XPath. It contains many open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change... The document supplements the XML Query Use Cases which can be found in the W3C XML Query Use Cases."
Principal references:
- XQuery and XPath Full-Text Requirements. W3C Working Draft 14-February-2003.
- XQuery and XPath Full-Text Use Cases. W3C Working Draft 14-February-2003.
- Comments: send email to the W3C XPath/XQuery [Query and Transform] mailing list; see the archives.
- W3C XML Query Working Group
- W3C XSL Working Group
- "XML and Query Languages" - Main reference page.