The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: February 17, 2003.

W3C Publishes Working Draft Specifications for Full-Text Search.

Members of the W3C XML Query Working Group and XSL Working Group have released two initial public working drafts for Full-Text Search. XQuery and XPath Full-Text Requirements and XQuery and XPath Full-Text Use Cases have been produced as part of the W3C XML Activity. "Full-Text Search" in this context involves "an extension to the XQuery/XPath language. It provides a way to query text which has been tokenized, i.e., broken into a sequence of words, units of punctuation, and spaces. Tokenization enables functions and operators which work with the relative positioning of words (e.g., proximity operators). Tokenization also enables functions and operators which operate on a part or the root of the word (e.g., wildcards, stemming)." The Requirements document specifies (initially) that: XQuery/XPath Full-Text functions must operate on instances of the XQuery/XPath Data Model; Full-Text need not be designed as an end-user UI language; while XQuery/XPath Full-Text may have more than one syntax binding, one query language syntax must be convenient for humans to read and write; while XQuery/XPath Full-Text may have more than one syntax binding, one query language syntax must be expressed in XML in a way that reflects the underlying structure of the query; and if XQuery/XPath Full-Text supports search within names of elements and attributes, then it must distinguish between element content, attribute values, and names of elements and attributes in any search. The Use Cases document "illustrates important applications of full-text querying within an XML query language. Each use case exercises a specific functionality relevant to full-text querying; a Schema and sample input data are provided. The full-text queries in these use cases are performed on text which has been tokenized." The W3C working groups welcome public comments on the draft documents and open issues.
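
The tokenization model quoted above can be illustrated with a short sketch. This is Python, not XQuery/XPath Full-Text syntax (the working drafts define requirements and use cases, not a language), and the names tokenize, words, and within are hypothetical. It assumes a simple regular-expression tokenizer that splits text into words, units of punctuation, and runs of spaces, then uses word positions for a proximity test measured in words.

```python
import re

# Assumed tokenizer: a token is a word, a run of whitespace,
# or a single unit of punctuation. Not taken from the W3C drafts.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text):
    """Break text into words, punctuation units, and spaces."""
    return TOKEN_RE.findall(text)


def words(text):
    """Return only the word tokens, lowercased, in document order."""
    return [t.lower() for t in tokenize(text) if re.fullmatch(r"\w+", t)]


def within(text, first, second, max_distance, ordered=False):
    """True if `first` and `second` occur at most `max_distance` words apart.

    With ordered=True, `first` must come before `second`.
    """
    ws = words(text)
    first_pos = [i for i, w in enumerate(ws) if w == first.lower()]
    second_pos = [i for i, w in enumerate(ws) if w == second.lower()]
    for i in first_pos:
        for j in second_pos:
            gap = j - i
            if gap == 0 or (ordered and gap < 0):
                continue
            if abs(gap) <= max_distance:
                return True
    return False


sentence = "XML query languages support full-text search"
print(tokenize("Full-text search."))  # ['Full', '-', 'text', ' ', 'search', '.']
print(within(sentence, "query", "search", 5))
print(within(sentence, "search", "query", 5, ordered=True))
```

Because every character falls into exactly one token, joining the tokens reproduces the original text, which keeps positional operators well defined.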

Bibliographic information

XQuery and XPath Full-Text Requirements. W3C Working Draft 14-February-2003. Edited by Stephen Buxton (Oracle Corp) and Michael Rys (Microsoft). Version URL: http://www.w3.org/TR/2003/WD-xmlquery-full-text-requirements-20030214/. Latest version URL: http://www.w3.org/TR/xmlquery-full-text-requirements/.

XQuery and XPath Full-Text Use Cases. W3C Working Draft 14-February-2003. Edited by Sihem Amer-Yahia (AT&T Labs) and Pat Case (US Library of Congress). Version URL: http://www.w3.org/TR/2003/WD-xmlquery-full-text-use-cases-20030214/. Latest version URL: http://www.w3.org/TR/xmlquery-full-text-use-cases. Also available in XML format.

Full-Text Search Functionality

According to the initial Requirements working draft, XQuery/XPath Full-Text must provide, in the first release, the minimum set of Full-Text functionality that is useful:

  • single-word search
  • phrase search
  • support for stopwords
  • single character suffix
  • 0 or more character suffix
  • 0 or more character prefix
  • 0 or more character infix
  • proximity searching (unit: words)
  • specification of order in proximity searching
  • combination using AND
  • combination using OR
  • combination using NOT
  • word normalization, diacritics
  • ranking, relevance
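
Several items in this list (stopwords, character wildcards, Boolean combination, and diacritics normalization) can be sketched in a few lines of Python. The stopword list, the wildcard characters ("?" for exactly one character, "*" for zero or more), and all function names below are assumptions made for illustration; the Requirements draft does not prescribe any syntax. Ranking and relevance are left out.

```python
import re
import unicodedata

# Hypothetical stopword list; a real system would make this configurable.
STOPWORDS = {"a", "an", "the", "of", "and"}


def normalize(text):
    """Lowercase and strip diacritics (e.g. 'Résumé' becomes 'resume')."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_words(text, stopwords=STOPWORDS):
    """Return the set of normalized words in text, minus stopwords."""
    return set(re.findall(r"\w+", normalize(text))) - stopwords


def wildcard_to_regex(pattern):
    """Translate the assumed wildcard syntax into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")   # zero or more characters
        elif ch == "?":
            parts.append(r"\w")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern, word_set):
    """Return the words that match a (possibly wildcarded) pattern."""
    regex = wildcard_to_regex(normalize(pattern))
    return {w for w in word_set if regex.fullmatch(w)}


def contains(word_set, pattern):
    """Single-word search; combine results with and / or / not."""
    return bool(matches(pattern, word_set))


doc = index_words("The résumé of an XQuery implementer")
print(matches("implement*", doc))   # suffix wildcard
print(matches("*query", doc))       # prefix wildcard
print(matches("x*y", doc))          # infix wildcard
print(contains(doc, "resume") and not contains(doc, "sgml"))
```

Normalizing both the indexed text and the query pattern in the same way is what lets "resume" match "résumé"; whether such matching should be on by default is a design choice the sketch does not settle.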

Full-Text Use Cases: Overview

Use cases developed in the working draft include:

  1. Use Case "WORD": Word and Phrase Queries
  2. Use Case "ELEMENT": Queries on XML Elements and Attributes
  3. Use Case "STOP-WORD": Queries Ignoring and Overriding Stop Words
  4. Use Case "CHARACTER-MANIPULATION": Queries Manipulating Normalized Characters and Tokenized Words, Spaces, and Punctuation
  5. Use Case "WILDCARD": Character Wildcard (Prefix, Infix, Suffix) and Word Wildcard Queries
  6. Use Case "STEMMING": Word Stemming Queries
  7. Use Case "THESAURUS": Queries Which Use Thesauri, Dictionaries, and Taxonomies
  8. Use Case "BOOLEAN": Or, And, and Not Queries
  9. Use Case "DISTANCE": Queries on Distance Relationships Including Proximity, Window, Sentence, and Paragraph Queries
  10. Use Case "ADVANCED-WORD": Advanced Word and Phrase Queries
  11. Use Case "SCORE": Queries Unique to Score
  12. Use Case "STRUCTURE": Queries using XPath Axes
  13. Use Case "IGNORE": Queries Ignoring Tags and Content
  14. Use Case "COMPOSABILITY": Queries Illustrating Composability of Full-Text with Other XQuery Functionality
  15. Use Case "COMPLEX": Complex Queries
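
The "ELEMENT" use case, together with the requirement that searches distinguish element content, attribute values, and names of elements and attributes, can be illustrated roughly as follows. This is a Python sketch using the standard library's xml.etree.ElementTree; the sample document and the search function are invented for illustration and do not come from the Use Cases draft.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample data, not taken from the W3C Use Cases document.
SAMPLE = (
    '<book title="Usability testing">'
    '<chapter>Testing with users</chapter>'
    '<usability level="high">Expert review</usability>'
    '</book>'
)


def _words(text):
    """Lowercased word tokens of a string (empty list for None)."""
    return [w.lower() for w in re.findall(r"\w+", text or "")]


def search(xml_text, word):
    """Report where `word` occurs, keeping the four kinds of hit apart."""
    target = word.lower()
    hits = {
        "element_name": [],
        "attribute_name": [],
        "attribute_value": [],
        "element_content": [],
    }
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.lower() == target:
            hits["element_name"].append(elem.tag)
        for name, value in elem.attrib.items():
            if name.lower() == target:
                hits["attribute_name"].append(f"{elem.tag}/@{name}")
            if target in _words(value):
                hits["attribute_value"].append(f"{elem.tag}/@{name}")
        # Only the text directly inside the element is checked here;
        # text following child elements (the "tail") is ignored.
        if target in _words(elem.text):
            hits["element_content"].append(elem.tag)
    return hits


print(search(SAMPLE, "usability"))
print(search(SAMPLE, "testing"))
```

In this sample, "usability" is both an element name and a word in an attribute value, so a search that merged the categories could not tell the two kinds of match apart.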

Status

Both working drafts are provisional. With respect to the Requirements: "This document is a work in progress. It contains many open issues, and should not be considered to be fully stable... At this stage in the life of the document, these requirements should be read as suggestions only: the issues associated with the requirements are to be discussed and resolved by the relevant Working Groups. This format provides a firm basis for the Working Groups to set the direction of the work on XQuery/XPath Full-Text, and to compare existing proposals. Once the issues are resolved and this Requirements document is finalized, it will be easier to define the functionality of XQuery/XPath Full-Text and its integration with XQuery and/or XPath."

With respect to the Use Cases document: "The document specifies usage scenarios for full-text queries as part of XML Query and XPath. It contains many open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change... The document supplements the XML Query Use Cases which can be found in the W3C XML Query Use Cases."

Document URI: http://xml.coverpages.org/ni2003-02-17-b.html
Robin Cover, Editor: robin@oasis-open.org