Cover Pages: XML and Query Languages

Last modified: May 17, 2005

XML and Query Languages

Recent XML Query Specifications
XML/XSL/TEI Framework
W3C XML Query Working Group
QL '98 - W3C Query Languages Workshop
Query Software
Articles, News, Papers, and Other Resources

Recent XML Query Specifications

[July 12, 2004] W3C Releases Public Working Draft for Full-Text Searching of XML Text and Documents. W3C has published an initial Public Working Draft for XQuery 1.0 and XPath 2.0 Full-Text. Created as a joint specification by the W3C XML Query Working Group and the XSL Working Group as part of the XML Activity, this new draft specification defines a language that extends XQuery 1.0 and XPath 2.0 with full-text search capabilities. As defined by the draft, "full-text queries are performed on text which has been tokenized, i.e., broken into a sequence of words, units of punctuation, and spaces." The new full-text search facility is implemented by extending the XQuery and XPath languages to support a new "FTContainsExpr" expression and a new "ft:score" function. Expressions of the type FTSelection are composed of: (1) words or combinations of words that are the search strings to be found as matches; (2) match options such as case sensitivity or an indication to use stop words; (3) Boolean operators that allow composition of an FTSelection from simpler FTSelections; (4) positional constraints such as indication of match distance or window. The new Full-Text Working Draft endeavors to meet search requirements specified in an updated companion draft XQuery 1.0 and XPath 2.0 Full-Text Use Cases. This document provides use cases designed to "illustrate important applications of full-text querying within an XML query language. Each use case exercises a specific functionality relevant to full-text querying. An XML Schema and sample input data are provided; each use case specifies a query applied to the input data, a solution in XQuery, a solution in XPath (when possible), and the expected results." Full-text query designed as an extension of XQuery and XPath will support several kinds of searches not possible using simple substring matching. It allows precision querying of XML documents containing "highly-structured data (numbers, dates), unstructured data (untagged free-flowing text), and semi-structured data (text with embedded tags)." Language-based query and token-based searches are also supported; for example, find all the news items that contain a word with the same linguistic stem as the English word "mouse", which finds occurrences of both "mouse" and "mice" together with possessive forms.
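
The tokenization and stem matching described in this entry can be illustrated with a minimal Python sketch. It is not the Full-Text draft's algorithm: the function names and the one-entry stem table are hypothetical, and a real engine would use a proper stemmer or lexicon.

```python
import re

# Hypothetical, hand-written stem table for illustration only.
IRREGULAR_STEMS = {"mice": "mouse"}


def tokenize(text):
    """Split text into words, units of punctuation, and runs of whitespace."""
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def stem(word):
    """Return a crude stem: lowercase the word and map known irregular plurals."""
    lowered = word.lower()
    return IRREGULAR_STEMS.get(lowered, lowered)


def contains_stem(text, target):
    """True if any word token in text shares a stem with target."""
    wanted = stem(target)
    return any(stem(tok) == wanted for tok in tokenize(text) if tok[0].isalnum())
```

Because the apostrophe is split off as its own punctuation token, a possessive such as "mouse's" yields the word token "mouse" and matches without any special handling.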

[May 06, 2003] W3C Releases Ten Working Drafts for XQuery, XSLT, and XPath. Through collaborative and coordinated effort between W3C's XML Query Working Group and XSL Working Group, a collection of ten updated working draft specifications has been issued for public review and comment. XQuery 1.0 and XPath 2.0 Data Model and XQuery 1.0 and XPath 2.0 Functions and Operators are in Last Call WD status through June 30, 2003. XPath 2.0, XSLT 2.0, XQuery 1.0, and other specifications are dependent upon the data model, functions, and operators defined in these two WDs. Other working drafts include XQuery 1.0 and XPath 2.0 Formal Semantics, XML Path Language (XPath) 2.0, XSL Transformations (XSLT) Version 2.0, XQuery 1.0: An XML Query Language, XML Query Use Cases, XML Query (XQuery) Requirements, XSLT 2.0 and XQuery 1.0 Serialization, and XQuery and XPath Full-Text Requirements. The W3C XSL Working Group "develops and maintains three main specifications: XSL Transformations (XSLT) for transforming XML documents, XSL Formatting Objects (XSL/FO) for formatting XML documents, and, jointly with the XML Query Working Group, XPath 2.0 and associated documents. XPath is used to address, point into, or match portions of XML documents. Since November 2002, the XSL Working Group has been working closely with the XML Query Working Group on XPath 2, the corresponding Data Model, and on XSLT 2.0." The W3C XML Query working group was chartered "to provide flexible query facilities to extract data from real and virtual documents on the Web; the XML Query language should encompass selecting whole documents or components of documents based on specified selection criteria as well as constructing XML documents from selected components.
The WG's goal is to produce a formal data model for XML documents with Namespaces (based on the XML Infoset), a set of query operators on that data model (a so-called algebra), and a query language with a concrete canonical syntax based on the proposed operators." Both working groups are part of the W3C XML Activity.

[February 17, 2003] W3C Publishes Working Draft Specifications for Full-Text Search. Members of the W3C XML Query Working Group and XSL Working Group have released two initial public working drafts for Full-Text Search. XQuery and XPath Full-Text Requirements and XQuery and XPath Full-Text Use Cases have been produced as part of the W3C XML Activity. "Full-Text Search" in this context involves "an extension to the XQuery/XPath language. It provides a way to query text which has been tokenized, i.e., broken into a sequence of words, units of punctuation, and spaces. Tokenization enables functions and operators which work with the relative positioning of words (e.g., proximity operators). Tokenization also enables functions and operators which operate on a part or the root of the word (e.g., wildcards, stemming)." The Requirements document specifies (initially) that: XQuery/XPath Full-Text functions must operate on instances of the XQuery/XPath Data Model; Full Text need not be designed as an end-user UI language; while XQuery/XPath Full-Text may have more than one syntax binding, one query language syntax must be convenient for humans to read and write; while XQuery/XPath Full-Text may have more than one syntax binding, one query language syntax must be expressed in XML in a way that reflects the underlying structure of the query; if XQuery/XPath Full-Text supports search within names of elements and attributes, then it must distinguish between element content and attribute values and names of elements and attributes in any search. The Use Cases document "illustrates important applications of full-text querying within an XML query language. Each use case exercises a specific functionality relevant to full-text querying; a Schema and sample input data are provided. The full-text queries in these use cases are performed on text which has been tokenized." The W3C working groups welcome public comments on the draft documents and open issues.
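
As a rough illustration of why tokenization enables proximity operators, the Python sketch below records word positions and tests whether two words occur within a given distance. The function names and the distance convention (number of intervening words) are assumptions made for this example; they are not taken from the Requirements or Use Cases drafts.

```python
import re


def word_positions(text):
    """Map each lowercased word to the list of its word positions in text."""
    positions = {}
    for index, word in enumerate(re.findall(r"\w+", text.lower())):
        positions.setdefault(word, []).append(index)
    return positions


def within_distance(text, first, second, max_words_between):
    """True if first and second occur with at most max_words_between words between them."""
    positions = word_positions(text)
    return any(
        abs(i - j) - 1 <= max_words_between
        for i in positions.get(first.lower(), [])
        for j in positions.get(second.lower(), [])
        if i != j
    )
```

A plain substring search cannot express this kind of condition, since it has no notion of word boundaries or of how many words separate two matches.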

[August 20, 2002] W3C Working Groups Update Specifications for XSLT, XML Query, and XPath. Seven revised working draft specifications have been published by the W3C Working Groups for XML Query, XSL, and XML Schema. Several working drafts represent collaborative work by the XSL and XML Query Working Groups, which are jointly responsible for XPath 2.0, a language derived from both XPath 1.0 and XQuery; the XPath 2.0 and XQuery 1.0 Working Drafts are generated from a common source. The updated working drafts include: XSL Transformations (XSLT) Version 2.0; XML Path Language (XPath) 2.0; XML Query Use Cases; XQuery 1.0: An XML Query Language; XQuery 1.0 and XPath 2.0 Formal Semantics; XQuery 1.0 and XPath 2.0 Data Model; XQuery 1.0 and XPath 2.0 Functions and Operators. Comments on these drafts may be sent to the W3C Query and Transform mailing list ('public-qt-comments') set up for public feedback on W3C specifications published by the XML Query and XSL Working Groups. [Full context]

[March 26, 2002] W3C Working Draft for XQuery 1.0 Formal Semantics. A public review W3C Working Draft has been published for XQuery 1.0 Formal Semantics, providing the formal semantics of W3C XQuery as specified in XQuery 1.0: A Query Language for XML. "XQuery is a computer language designed to return information to users or their agents. It is applicable to XML data sources from documents to databases, search engines, and object repositories. Much of the WD document is the result of joint work by the W3C XML Query and XSL Working Groups, which are jointly responsible for XPath 2.0, a language derived from both XPath 1.0 and XQuery. The new document defines the formal semantics for XQuery 1.0, and a future version of the document will also define the formal semantics for XPath 2.0. XQuery is a powerful language, capable of selecting and extracting complex patterns from XML documents and of reformulating them into results in arbitrary ways. This document defines the semantics of XQuery by giving a precise formal meaning to each of the constructions of the XQuery specification in terms of the XQuery data model. The document assumes that the reader is already familiar with the XQuery language. Two important design aspects of XQuery are that it is functional (built from expressions, called queries, rather than statements) and typed. 'Types' can be imported from one or several XML Schemas (typically describing the documents that will be processed), and the XQuery language can then perform operations based on these types. In addition, XQuery also supports a level of static type analysis. This means that the system can perform some inference on the type of a query, based on the type of its inputs... These two aspects play an important role in the XQuery Formal Semantics." [Full context]

[August 27, 2001] New W3C Working Draft: XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0. W3C has published a new working draft document which describes constructors, operators, and functions that are used in XQuery 1.0 and XPath 2.0. The draft "was produced through the efforts of a joint task force of the W3C XML Query Working Group and the W3C XML Schema Working Group and a second joint task force of the W3C XML Query Working Group and the W3C XSL Working Group. The specification defines basic operators and functions on the datatypes defined in XML Schema Part 2: Datatypes for use in XQuery, XPath, and other related XML standards. It also discusses operators and functions on nodes and node sequences as defined in the XQuery 1.0 and XPath 2.0 Data Model for use in XQuery, XPath, and other related XML standards. Where XML Schema Part 2 defines a number of primitive and derived datatypes, collectively known as built-in datatypes, the new working draft defines operations on those datatypes. The document defines a number of constructors and other functions that apply to one or more data types; each constructor and function is defined by specifying its signature, a description of each of its arguments, and its semantics. In addition, examples are given of many constructors and functions to illustrate their use. The WD is generally unconcerned with the specific syntax with which the constructors, operators, and functions will be used, and focuses instead on defining the semantics of them as precisely as feasible." [Full context]

[June 12, 2001] XML Syntax for XQuery 1.0 (XQueryX) Published as W3C Working Draft. The W3C XML Query Working Group has released a first public working draft specifying an XML syntax for the W3C XML Query language (XQuery). The draft supplies a W3C XML Schema for the XQuery XML Syntax as well as an XML DTD. The working group intends that the XQueryX DTD and XML Schema "will track the XQuery 1.0 syntax and will be changed as often as the XQuery 1.0 syntax is changed in future Working Drafts." The syntax specification in 'XQueryX' "is a close representation of the abstract syntax found in Appendix B of the XQuery Working Draft; for each production in the abstract syntax, the authors created an equivalent XML representation. XQueryX is thus an XML representation of an XQuery. [Because] it was created by mapping the productions of the XQuery abstract syntax directly into XML productions, the result is not particularly convenient for humans to read and write; however, it is easy for programs to parse, and because XQueryX is represented in XML, standard XML tools can be used to create, interpret, or modify queries." Concurrent with the release of the new XQueryX draft, the XML Query Working Group has published four updated related working drafts: XQuery 1.0, the XML Query Use Cases, XQuery 1.0 and XPath 2.0 Data Model, and XQuery 1.0 Formal Semantics. W3C XQuery "is designed to be a small, easily implementable language in which queries are concise and easily understood. It is also flexible enough to query a broad spectrum of XML information sources, including both databases and documents." [Full context]
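
The idea of mapping each production of an abstract syntax to an equivalent XML element can be sketched in Python as follows. The three-production mini-grammar and the element names (step, path, filter, equals) are invented for this illustration and are not the XQueryX vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-grammar, NOT the XQueryX vocabulary:
#   ("step", name)                 a child step such as `book`
#   ("path", left, right)          left/right
#   ("filter", expr, name, value)  expr[name = "value"]


def to_xml(node):
    """Map one production of the mini-grammar to one XML element."""
    kind = node[0]
    if kind == "step":
        elem = ET.Element("step")
        elem.set("name", node[1])
    elif kind == "path":
        elem = ET.Element("path")
        elem.append(to_xml(node[1]))
        elem.append(to_xml(node[2]))
    elif kind == "filter":
        elem = ET.Element("filter")
        elem.append(to_xml(node[1]))
        test = ET.SubElement(elem, "equals")
        test.set("child", node[2])
        test.set("value", node[3])
    else:
        raise ValueError(f"unknown production: {kind}")
    return elem


# Abstract form of the path expression bib/book[year = "2000"]
query = ("filter", ("path", ("step", "bib"), ("step", "book")), "year", "2000")
xml_text = ET.tostring(to_xml(query), encoding="unicode")
```

The serialized result is far more verbose than the one-line path expression, which matches the draft's remark that such a syntax is inconvenient for humans but easy for standard XML tools to parse and modify.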

[February 15, 2001] W3C XML Query Working Group Releases XQuery Working Draft and Related Documents. Provisional update. The W3C XML Query Working Group has published an initial public working draft specification for XQuery: A Query Language for XML. XQuery "is designed to be broadly applicable across all types of XML data sources. XQuery is a functional language in which a query is represented as an expression; it is derived from an XML query language called Quilt, which in turn borrowed features from several other languages." Supporting specifications also released include: (1) XML Query Data Model "which is the foundation of XML Query Algebra; together, these two documents will provide a precise semantics for the XML Query Language." (2) The XML Query Algebra defines a formal algebraic model for an XML query language. (3) XML Query Use Cases "specifies usage scenarios for the W3C XML Query data model, algebra, and query language." XML Query Requirements articulates "goals, requirements, and usage scenarios for the W3C XML Query data model, algebra, and query language." See further details.

[February 04, 2000] The W3C XML Query Working Group published a working draft document XML Query Requirements on January 31, 2000.

[September 14, 1999] Preliminary information on the W3C XML Query Working Group is available from the W3C XML Activity Page.

The W3C QL'98 - Query Languages Workshop represents one of several efforts to coordinate design activities that focus upon "querying" XML documents, document collections, and document webs. Features already designed within XSL, XPointer, XLink, DOM, DSSSL, and related specifications provide mechanisms for specifying locations/addresses, for tree traversal, and so forth. A current challenge, arguably, is to unify some of these expression/querying sub-languages as a basis for building generalized query facilities that are applicable to a broad range of requirements within different user communities. No formal XML 'query language activity' was established by the W3C when the W3C XML Working Groups were re-chartered in late 1998, but several have predicted that the W3C will eventually charter a new (XML) Query Language activity and/or working group.

[May 06, 1999] In connection with the publication of a working draft version of the XSL Transformations (XSLT) Specification (W3C Working Draft 21-April-1999) it was formally announced that the W3C XSL WG and the XML Linking WG "have agreed to unify XSLT expressions and XPointers. A common core semantic model for querying has been agreed upon, and this draft follows this model (see 6.1 Location Paths). However, further changes particularly in the syntax will probably be necessary. . ."

[April 14, 1999] Sharon Adler, as XSL Co-chair, posted a message to the xsl-list indicating that W3C working groups are indeed moving toward 'XPointer and XSL pattern unification'. "To all of you who wish to see this unification work go forward, I am writing this to let you all know that the work is underway. Last month both working groups voted to pursue this effort even if there was an impact to the schedule. Therefore, we will take a delay to our PR schedule. . ."

XML/XSL/TEI Framework

Extensible Style Language (XSL) - XSL select patterns are used to test for matching elements/nodes
XML Linking Language - Both the XML Linking Language (XLink) and (more particularly) the XML Pointer Language (XPointer) have relevance to 'query/retrieval'.
TEI Extended Pointers

W3C XML Query Working Group

XML Query Project (XQuery)
Mail Archives for the W3C public list 'www-ql'. A public mailing list on query languages, including (but not limited to) discussion on the XML-Query project. Subscribed users may post to this list.
Mail Archives for the W3C public list 'public-qt-comments'. This QT (Query and Transform) list is for public feedback on the following W3C specifications published by the XML Query and XSL Working Groups: XQuery 1.0, XSLT 2.0, XPath 2.0, XQuery 1.0 and XPath 2.0 Data Model, XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0.
Contact: Massimo Marchiori (W3C Contact for XML Query)

[December 04, 2000] W3C XML Query Working Group Publishes XML Query Algebra Working Draft. The first W3C public working draft for The XML Query Algebra has been released for review. Reference: W3C Working Draft 04-December-2000, latest draft http://www.w3.org/TR/query-algebra/, edited by Peter Fankhauser (GMD-IPSI), Mary Fernández (AT&T Labs - Research), Ashok Malhotra (IBM), Michael Rys (Microsoft), Jérôme Siméon (Bell Labs, Lucent Technologies), and Philip Wadler (Avaya Communication). The document "introduces the XML Query Algebra as a formal basis for an XML query language." The development work "builds on long standing traditions in the database community. In particular, we have been inspired by systems such as SQL, OQL, and nested relational algebra (NRA). We have also been inspired by systems such as Quilt, UnQL, XDuce, XML-QL, XPath, XQL, and YaTL. We give citations for all these systems below. In the database world, it is common to translate a query language into an algebra; this happens in SQL, OQL, and NRA, among others. The purpose of the algebra is twofold. First, the algebra is used to give a semantics for the query language, so the operations of the algebra should be well-defined. Second, the algebra is used to support query optimization, so the algebra should possess a rich set of laws. Our algebra is powerful enough to capture the semantics of many XML query languages, and the laws we give include analogues of most of the laws of relational algebra. It is also common for a query language to exploit schemas or types; this happens in SQL, OQL, and NRA, among others. The purpose of types is twofold. Types can be used to detect certain kinds of errors at compile time and to support query optimization. DTDs and XML Schema can be thought of as providing something like types for XML. The XML Query algebra uses a simple type system that captures the essence of XML Schema Structures. The type system is close to that used in XDuce. 
On this basis, the XML Query algebra is statically typed. This allows to determine and check the output type of a query on documents conforming to an input type at compile time rather than at runtime. Compare this to the situation with an untyped or dynamically typed query language, where each individual output has to be validated against a schema at runtime, and there is no guarantee that this check will always succeed..." A tutorial introduction in the WD 'The Algebra by Example' introduces the main features of the algebra, using familiar examples based on accessing a database of books. In Appendix A 'The XML Query Data Model', the authors present a formal mapping relating the algebra to the XML Query Data Model. [cache]

[August 15, 2000] The W3C XML Query Working Group has published a revised working draft specification for XML Query Requirements. Reference: W3C Working Draft 15-August-2000, edited by Don Chamberlin (IBM Almaden Research Center), Peter Fankhauser (GMD-IPSI), Massimo Marchiori (W3C/MIT/UNIVE), and Jonathan Robie (Software AG). The document "specifies goals, requirements, and usage scenarios for the W3C XML Query data model, algebra, and query language." The goal of the XML Query Working Group is "to produce a data model for XML documents, a set of query operators on that data model, and a query language based on these query operators. The data model will be based on the W3C XML Infoset, and will include support for Namespaces. Queries operate on single documents or fixed collections of documents. They can select whole documents or subtrees of documents that match conditions defined on document content and structure, and can construct new documents based on what is selected." The working draft outlines several usage scenarios which are "intended to be used as design cases during the development of XML Query, and should be reviewed when critical decisions are made. These usage scenarios should also prove useful in helping non-members of the XML Query Working Group understand the intent and goals of the project: (1) Human-readable documents: Perform queries on structured documents and collections of documents, such as technical manuals, to retrieve individual documents, to generate tables of contents, to search for information in structures found within a document, or to generate new documents as the result of a query. (2) Data-oriented documents: Perform queries on the XML representation of database data, object data, or other traditional data sources to extract data from these sources, to transform data into new XML representations, or to integrate data from multiple heterogeneous data sources. 
The XML representation of data sources may be either physical or virtual; that is, data may be physically encoded in XML, or an XML representation of the data may be produced. (3) Mixed-model documents: Perform both document-oriented and data-oriented queries on documents with embedded data, such as catalogs, patient health records, employment records, or business analysis documents. (4) Administrative data: Perform queries on configuration files, user profiles, or administrative logs represented in XML. (5) Filtering streams: Perform queries on streams of XML data to process the data in a manner analogous to UNIX filters. This might be used to process logs of email messages, network packets, stock market data, newswire feeds, EDI, or weather data to filter and route messages represented in XML, to extract data from XML streams, or to transform data in XML streams. (6) Document Object Model (DOM): Perform queries on DOM structures to return sets of nodes that meet the specified criteria. (7) Native XML repositories and web servers: Perform queries on collections of documents managed by native XML repositories or web servers. (8) Catalog search: Perform queries to search catalogs that describe document servers, document types, XML schemas, or documents. Such catalogs may be combined to support search among multiple servers. A document-retrieval system could use queries to allow the user to select server catalogs, represented in XML, by the information provided by the servers, by access cost, or by authorization. (9) Multiple syntactic environments: Queries may be used in many environments. For example, a query might be embedded in a URL, an XML page, or a JSP or ASP page; represented by a string in a program written in a general-purpose programming language; provided as an argument on the command-line or standard input; or supported by a protocol, such as DASL or Z39.50."
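
Usage scenario (1), generating a table of contents from a structured document, combines the two capabilities the requirements describe: selecting subtrees that match a condition and constructing a new document from what is selected. The Python sketch below shows that pattern with the standard library's ElementTree; the sample manual and all element names are invented for the example, and this is not how an XML Query processor would be implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names are invented for this sketch.
SOURCE = """
<manual>
  <section id="s1"><title>Install</title><para>Unpack the archive.</para></section>
  <section id="s2"><title>Configure</title><para>Edit the profile.</para></section>
</manual>
"""


def table_of_contents(xml_text):
    """Select every section title and construct a new <toc> document from them."""
    root = ET.fromstring(xml_text)
    toc = ET.Element("toc")
    for section in root.findall("section"):
        entry = ET.SubElement(toc, "entry")
        entry.set("ref", section.get("id"))
        entry.text = section.findtext("title")
    return toc


toc = table_of_contents(SOURCE)
entries = [(entry.get("ref"), entry.text) for entry in toc]
```

Here the selection and the construction are written by hand in a general-purpose language; a query language aims to express the same result declaratively in a single expression.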

[May 11, 2000] The W3C XML Query Working Group has published a first public working draft of the XML Query Data Model. Reference: W3C Working Draft 11-May-2000; edited by Mary Fernandez (AT&T Labs) and Jonathan Robie (Software AG). Document abstract: "This document defines the W3C XML Query Data Model, which is the foundation of the W3C XML Query Algebra; the XML Query Algebra will be specified in a future document. Together, these two documents [will] provide a precise semantics of the XML Query Language." [cache]

[February 04, 2000] A public working draft for XML Query Requirements has been published by the W3C XML Query Working Group as part of the W3C XML Activity. References: W3C Working Draft 31-January-2000, edited by Peter Fankhauser (GMD-IPSI), Massimo Marchiori (W3C and MIT), and Jonathan Robie (Software AG). The draft document "specifies goals, usage scenarios, and requirements for the W3C XML Query data model, algebra, and query language." The goal of the XML Query Working Group "is to produce a data model for XML documents, a set of query operators on that data model, and a query language based on these query operators. The data model will be based on the W3C XML Information Set, and will include support for Namespaces. Queries operate on single documents or fixed collections of documents. They can select whole documents or subtrees of documents that match conditions defined on document content and structure, and can construct new documents based on what is selected." Comments on the working draft are invited, and may be sent to the W3C mailing list, where they will be publicly archived.

[September 1999] An XML Query Working Group was announced in the Fall of 1999 in connection with W3C's publication of the XML Phase III activity. "Following the W3C Query Languages Workshop (QL'98), the mission of the XML Query working group is to provide flexible query facilities to extract data from real and virtual documents on the Web. The XML Query Working Group plans to develop requirements in 1999 and continue with design work in 2000. Paul Cotton of IBM is the chair of the XML Query Working Group. W3C contact: Massimo Marchiori."

[September 20, 1999] Public W3C mailing list 'www-ql' announced by Massimo Marchiori: http://lists.w3.org/Archives/Public/www-ql/.

[March 05, 1999] "The Quest for an XML Query Standard." By Lisa Rein. From XML.com (March 02, 1999). "Is there a need for a query language that speaks XML? Learn what went on when XML experts got together to talk about the need for an XML query standard at QL'98. [This] W3C workshop on query languages for XML produced a number of interesting proposals for extracting information more efficiently from XML documents. In terms of next steps, the W3C must decide whether they will simply incorporate feedback into the existing XML working group activity? Or will a separate Query Language Working Group or XML Query Language Working Group be formed? One of the difficulties is the somewhat delicate placement of the work within the W3C's architecture, since so many different groups will depend heavily on its deliverables. At this point, the W3C is gathering input from its members as to how the query effort should proceed."

QL'98 - W3C Query Languages Workshop

QL'98 - The Query Languages Workshop - W3C Workshop. December 3-4, 1998. DoubleTree Guest Suites Hotel, Boston, Massachusetts. The scope of the W3C Workshop on Query Languages is "to begin the discussion of query languages for the Web (with particular emphasis on querying XML and RDF), of the needed requirements for such query language(s), and of proposing solutions." Sample position papers are referenced below. See also: the local Conference Page.

"Element Sets: A Minimal Basis for an XML Query Engine." - QL '98 Position Paper. Tim Bray. Proposes "a minimum set of functionality required for an XML query facility to be useful. The argument is that a facility which offers set arithmetic plus inclusion and containment is sufficient to express a high proportion of queries, and lends itself to an efficient implementation."
"Experiences with Information Locator Services." - QL '98 Position Paper. Eliot Christian. The paper "relates experiences in developing and promoting services interoperable with the Global Information Locator Service (GILS) standard that has now been adopted and promoted in many fora worldwide. The author describes example implementations and touches on the strategic choices made in public policy, standards, and technology."
"XQuery: A unified syntax for linking and querying general XML documents." - QL '98 Position Paper. Steven J. DeRose. ". . .proposes a query language syntax for XML documents, called XQuery. Such a query language has quite different requirements than traditional languages; much more different than is commonly appreciated. Many past proposals have taken a basically relational query language (typically SQL), and modified it by the addition of a few constructs: typically a 'contains' operator and some features for matching strings within text chunks against regexes, or against word-roots. Such features, while needed, are not enough. The hard problem arises because the most basic design principles of relational databases do not hold for XML documents."
"Queries on Links and Hierarchies." - QL '98 Position Paper. Steven J. DeRose, C. Michael Sperberg-McQueen, and Bill Smith. 'November 18, 1998.' "This document briefly sets out functional requirements for languages to characterize and retrieve structure and content in linked hierarchies (such as XML documents). This kind of data poses more complex requirements than some others, because not only content but also structure and linkage must be available to queries." [local archive copy]
"XML-QL: A Query Language for XML." - NOTE-xml-ql-19980819, Submission to the World Wide Web Consortium 19-August-1998. Authors of the NOTE are: Alin Deutsch (University of Pennsylvania), Mary Fernandez (AT&T Labs), Daniela Florescu (INRIA), Alon Levy (University of Washington), and Dan Suciu (AT&T Labs). See also the submission request and W3C Staff Comment.
"Enabling Inferencing." [Query languages for RDF] - QL '98 Position Paper. R.V. Guha, et al. The paper "presents an overview of the query services that might be built on top of XML/RDF data. It does not present a specific proposal for an RDF query language; instead, it argues for a query language that is expressed in terms of the RDF logical data model rather than one particular concrete syntax."
"XML Query Language (XQL)." The paper is authored principally by Jonathan Robie (Texcel, Inc.), Joe Lapp (webMethods, Inc.), and David Schach (Microsoft Corporation), with contributions by Michael Hyman and Jonathan Marsh (both of Microsoft Corporation). The XQL paper proposes to use the XSL pattern language as the basis for a general query mechanism.
"Querying and Transforming XML." David Schach, Joe Lapp, Jonathan Robie. "The XQL Proposal, a superset of the XSL pattern syntax, addresses the information retrieval aspects of queries. This paper describes the benefits of using the XSL transformation language together with the XQL Proposal, to provide an integrated environment for queries and transformations."
"Providing flexible access in a query language for XML." - QL '98 Position Paper. Frank Tompa. "A query language for XML imposes semantics on the encoded data. As such, it forms an integral part of the resulting data model. The properties of the language will therefore promote or inhibit future applications of XML. Among other features, such a language must: 1) support access to data stored in any conforming XML document, respecting the encoding choices made by each application's data designers, and 2) allow for data views to be defined over collections of XML documents. In this paper, we do not advocate specific syntactic constructs. . . we propose that an XML document be modelled as a rooted, directed, ordered, labelled tree."
"Querying XML with Lore." - QL '98 Position Paper. Jennifer Widom.

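The "Element Sets" position paper listed above argues that set arithmetic plus inclusion and containment covers a high proportion of queries. A minimal Python sketch of that idea follows, under the assumption (made here, not taken from the paper) that each element is represented by its (start, end) character offsets in the document; the function names and sample offsets are hypothetical.

```python
def encloses(outer, inner):
    """True if the outer span strictly encloses the inner span."""
    return outer[0] <= inner[0] and inner[1] <= outer[1] and outer != inner


def within(inner_set, outer_set):
    """Elements of inner_set that lie inside some element of outer_set."""
    return {i for i in inner_set if any(encloses(o, i) for o in outer_set)}


def containing(outer_set, inner_set):
    """Elements of outer_set that enclose at least one element of inner_set."""
    return {o for o in outer_set if any(encloses(o, i) for i in inner_set)}


# Hypothetical element sets, as (start, end) offsets.
chapters = {(0, 100), (100, 200)}
titles = {(0, 10), (100, 112)}
notes = {(50, 60)}

# "Chapters that contain no note": containment plus ordinary set difference.
chapters_without_notes = chapters - containing(chapters, notes)
# "Titles of chapters that contain a note": containment composed with inclusion.
titles_of_noted_chapters = within(titles, containing(chapters, notes))
```

Because every operation takes sets and returns a set, queries compose freely, which is one way to read the paper's claim that such a facility lends itself to efficient implementation.
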
Software

Some sample QL software, much of it experimental. W3C maintains a list of XQuery implementations.

[June 2004] BumbleBee is "an automated test harness for evaluating XQuery engines and validating queries expressed in the XQuery language. BumbleBee takes the pain and uncertainty out of learning and using XQuery. It starts by letting you immediately put several XQuery engines to the test so you know how they stack up against the XQuery specification. Then it lets you easily write your own tests to continually make sure your XQuery expressions produce reliable results when you upgrade your XQuery engine, try different engines, or otherwise make changes to your queries. BumbleBee is all about push-button regression testing. Indeed, BumbleBee tests act as automated change detectors to eliminate costly XQuery debugging cycles..." See Jason Hunter's article.
[June 03, 2002] "Enosys Software Introduces the First XML-Based Real-Time Enterprise Information Integration Platform. Early Adopters Of Enosys' XML Query (XQuery) Solution Attest to Reduced Cost of Custom Data Integration, Rapid Deployment of New Web Applications, and Improved Customer Service." - "Enosys Software, an XML platform software provider, today introduced the Enosys XQuery-based suite of products, the industry's first XML-based platform to enable real-time information integration inside and outside company firewalls... Enosys Software addresses the need within large enterprises to provide an integrated, real-time view of the vast amounts of data that are distributed across multiple information sources, in the form of relational databases, mainframe applications, legacy systems, documents or spreadsheets. These information sources are frequently owned by multiple departments or even trading partners of the enterprise, making it extremely difficult and expensive to create a single view of relevant data in a time-sensitive manner... The Enosys platform is an advanced Enterprise Information Integration (EII) solution that is based on open Java standards and patent-pending XQuery technology. Unlike other proprietary EII-based solutions, Enosys provides a universal layer for accessing enterprise data and eliminates the need to develop specific stovepipe solutions for each backend enterprise data source and application. Developers simply build queries that retrieve the information end users and applications need--independent of the source, location, and access method of the data-- and deliver the information as reusable XML results or web services components... Enosys Software was founded in 2000 by two experts in the area of XML processing, professors Yannis Papakonstantinou (University of California, San Diego) and Vasilis Vassalos (New York University). 
Their work on data integration and semi-structured data (XML) started at Stanford, where they were members of the same Computer Science program as founding members of Junglee and Google, and worked alongside Computer Science and database research luminaries Hector Garcia-Molina and Jeff Ullman..."
[May 29, 2002] Oracle XQuery Prototype and Oracle9i Database Release 2 with SQLX and XMLType Support. A communiqué from Steve Muench reports on two XML-related announcements from Oracle. (1) In March 2002, Oracle released a Java XQuery prototype which includes a Java API to XQuery (JXQI) and a command-line interface. This technical preview implementation of the W3C XQuery language with Oracle specific extensions features support "focusing on the 'R' (Relational Data) and the 'XMP' (Experiences and Exemplars) XQuery use cases; it also features an experimental JDBC-style Java API for XQuery as well as a sql() function for using XQuery over SQL query results." Oracle's goal ultimately is to "provide both a SQL-flavored and an XQuery-based query syntax for XML content in Oracle leveraging the same underlying database engine via appropriate query rewriting." (2) Oracle has also announced Oracle9i Release 2, offering significant new "native database support for XML. The new Oracle9i Database Release 2 provides a high-performance, native XML storage and retrieval technology available within Oracle9i Release 2; it fully absorbs the W3C XML data model into the Oracle9i Database, and provides new standard access methods for navigating and querying XML." Enhanced support includes XMLType and related native XML data-management features as well as XML Repository and XML-based content-management features. [Full context]
[May 20, 2002] New XML-Based Inktomi Search Toolkit Combines Keyword and Parametric Search. The Inktomi Search Toolkit has been announced as an innovative OEM solution that "delivers the advanced XML-based retrieval capabilities for finding structured, unstructured, and semi-structured content within enterprise applications to improve application usability and increase end-user productivity. By indexing documents in native XML format and preserving the hierarchy of the data, the Search Toolkit allows you to return the reference to the documents, the actual XML documents or any fragments of the documents." The toolkit "has been built from the ground up to utilize XML as the content mark up language to provide a standards-based query language (W3C XQuery) for retrieval of structured information. In addition, it provides a comprehensive suite of keyword search capabilities. It is available as a multi-threaded server product. For easy integration with the parent application, a Java API is provided for the product, as well as an open, socket-based interface using an XML-based and HTTP-based protocol. The internals of the Search Toolkit were designed to support retrieval across both unstructured content, as well as structured content marked up with XML." [Full context]
[February 01, 2002] LuceneXML Package Supports Structure-Aware Searching of XML Documents. A communiqué from Eliot Kimber announces the availability of a LuceneXML package and companion LuceneClient package which support indexing of XML documents in a way that enables structure-aware search and retrieval. LuceneXML represents "the initial result of an experiment in using the Apache Lucene package; the implementation is incomplete but sufficient to demonstrate the approach and to enable testing." Jakarta Lucene is a Java-based, high-performance, full-featured text search engine suitable for full-text search. The LuceneXML package "provides a manager class (XMLSandRManager) that exposes factory methods for creating XML indexers and searchers. Using the XML indexer, you can add XML documents to a Lucene index. The XML searcher provides convenience methods for submitting XML queries to Lucene... The LuceneClient application lets you index XML documents and submit queries against Lucene indexes." [Full context]
[October 30, 2001] TransQuery: XSL Transformations as Query Language. A posting from Evan Lenz announces the availability of TransQuery, a "flexible set of XSLT conventions and processing model constraints that enable the use of XSLT as a query language over multiple XML documents. TransQuery is an interoperability specification for XML databases, allowing them to use a standard XML query language today -- the XSLT Recommendation from W3C. The purpose of TransQuery is to promote interoperability between XML document management systems and XML databases that use XSLT as their primary data access language. Traditional XSLT processors are designed to process individual XML documents on the fly. They generally require the entire source tree to be loaded into memory. This obviously has some negative performance and scalability implications for large documents. TransQuery addresses interoperability between implementations of a new kind of XSLT processor -- one that instead functions as a query engine over an XML database, thus reversing the common paradigm of known stylesheet and unknown input. The TransQuery Demo is a browser-based demo illustrating use cases for TransQuery; it is hosted by XYZFind Corp. and utilizes the open-source software provided by the SourceForge TransQuery Project. The TransQuery SourceForge Project is home to an open-source implementation of the TransQuery processing model and experimental platform for the TransQuery interface." [Full context]
[August 24, 2001] Toronto XML Server (ToX) Provides Repository for Real and Virtual XML Documents. ToX (The Toronto XML Engine) is a research project of the Database Group in the Department of Computer Science at the University of Toronto. The Toronto XML Server is "a repository for XML data and metadata, which supports real and virtual XML documents. Real documents are stored as files or mapped into relational or object databases, depending on their structuredness; indices are defined according to the storage method used. Virtual documents can be remote documents, defined as arbitrary WebOQL queries, or views, defined as queries over documents registered in the system. The system catalog contains metadata for the documents, especially their schemata, used for query processing and optimization. Queries can range over both the catalog and the documents, and multiple query languages are supported." [Full context]
[August 15, 2001] Updated XML Query Language Demo from Microsoft Supports Latest XQuery Specification. The XML Query Language tool announced by Microsoft in May 2001 has been updated to be conformant to the June 07, 2001 W3C Working Draft specification for XQuery 1.0: An XML Query Language. The development team has also provided a new managed class library containing XQuery classes "that can be programmed against using the beta 2 release of the .Net Frameworks SDK. These classes allow one to run XQuery queries over arbitrary XML documents." Description: "The purpose of the XQuery demo is to enable you to experience the XQuery language and provide feedback on the implementation. Microsoft is committed to supporting the XQuery working group's progress; we will continue to revise this page and the downloadable class library as the XQuery specification develops... Since the demo page is a website, we provide a set of predefined XML documents and disallow the use of user-specified documents for security reasons. In order to execute queries over an arbitrary collection of XML documents you can download the XQuery Demo class library... The demo is implemented in C# and is currently only available via the website. The demo is not meant to give you any indication on how and where XQuery will be implemented in Microsoft products. Its main use is to familiarize the public with XQuery and to gather feedback and requirements for both the W3C working group and our own implementation effort." [Full context]
[July 06, 2001] Software AG Releases XQuery Prototype 'QuiP'. A posting from Jonathan Robie announces the availability of 'QuiP, a W3C XQuery Prototype'. QuiP is Software AG's prototype implementation of XQuery, the W3C XML query language. "QuiP can be used either with text-based XML files or for queries against a Tamino database. QuiP is designed to make it easy to learn and use the XQuery language." QuiP is available on Windows 32 bit platforms, and requires a Java virtual machine version 1.3; it may be downloaded for free. "The QuiP distribution is a good way to get a hands-on grasp of the XQuery language: it conforms to the 7-June-2001 draft of XQuery, and it includes a large number of sample queries and data files, syntax diagrams in the online help, and a GUI. There is also a developer forum that you can use to post comments on the prototype or on the XQuery language; follow the link from the downloads page. In addition to the GUI tool, there is also a command-line version of QuiP. The script file RunQuip.cmd is an example that shows how the command-line interface can be used." [Full context]
sggrep ". . . works like the grep program in searching a file for regular string expressions. However, unlike grep, it is aware of the tree structure of XML files. . ." [The XML Library LT XML]
sgrep-2 - Has an SGML/XML/HTML scanner; from Jani Jaakkola, University of Helsinki
[June 22, 2001] XML Query Engine 1.0. "The 1.0 release of XML Query Engine is now shipping. XML Query Engine is a Java-based search engine component that allows you to search small to medium-size collections of XML documents for full-text content using XQL, a de facto XML query language standard that's nearly identical to the abbreviated form of XPath. The engine is small (roughly 160 kb) and has a straightforward API that lets you wire it into your own Java applications using a SAX1 or SAX2 parser of your choice. The engine also has early experimental support for XQuery. There's a free eval version downloadable at the website. The eval version is unsupported but is otherwise identical to the production version, minus the availability of a persistent index feature."
[June 09, 2001] CL-XML Provides Common Lisp Support for XML, XPath, and XQuery. A communiqué from James Anderson reports on a "preliminary re-release of CL-XML which (1) includes not only a validating XML parser/processor, but also XPath and XQuery compilers, (2) supports namespace-aware DTD-based validation, and (3) can claim conformance. CL-XML is a collection of Common LISP modules for data stream parsing and serialization according to the Extensible Markup Language and ancillary standards." The associated Web site provides extensive documentation for CL-XML, including separate BNF descriptions of the XML, XPath, and XQuery syntax used to generate the parsers. According to the site description: "The processor is intended for use both as a stand-alone XML interface and as an extension to the CL-HTTP server. The XML module implements a conformant, namespace-aware, validating XML processor which instantiates an Info-Model compatible document model. The processor always incorporates external references. A referenced document definition is instantiated and incorporated in the document instance as an internal document type definition model. The definition is used to effect instance defaulting and typing and to perform in-line document validation. The parser can be invoked with validation enabled or disabled. It can be invoked so as to produce a data instance, a parse tree, or to parse without generating a result. The XMLPath module implements access to document models based on XML Path expressions. It includes an implementation for the XML Path library, an interpreter for paths formulated as S-expressions and, a parser to translate string-encoded expressions into the equivalent S-expression form. The XMLQuery module implements access to document models based on XML Query expressions. These incorporate XML Path expressions to address document elements and extend them with construction operations. 
The module includes an implementation for the XML Query library, an interpreter for queries formulated as S-expressions and, a parser to translate string-encoded expressions into the equivalent S-expression form. The base CLOS model comprises a class library which implements the XML Query Data Model and presents an Infoset compatible programming interface." [Full context]
[May 14, 2001] Microsoft Hosts Online XQuery Prototype Application. A posting from Michael Rys (Microsoft Program Manager, SQL Server XML Technologies) announces that the XQuery Prototype demonstrated at XML DevCon in New York has been placed online for public use. "The goal of the prototype implementation is to follow the public working drafts of the W3C XML Query working group while trying to avoid 'inside' knowledge about how something is supposed to work. The prototype currently [2001-05-14] follows the February 15, 2001 W3C XQuery working draft and will be updated to the next working draft within weeks after the next working draft's publication. The web site allows you to formulate XQueries and a subset of a proposed XQuery-compatible data manipulation language and parse and execute the former, but currently only allows one to parse the latter. Since the demo is provided via a website, we provide a set of predefined XML documents and disallow the use of user-specified documents for security reasons. In addition, the site offers a set of compliance tests that can be used to check the syntax for the XQuery parser. Since the tests are automatically generated based on the syntax, some of the statements may not have meaningful semantics... The prototype is implemented in C# and is currently only available via the website; future downloadable implementations of XQuery are planned for later technology previews of some of the XML technologies." [Full context]
[May 09, 2001] XCache: XML-Query Caching System (MQP) and XCache Demo. "Xcache is a web-based XML query engine. XML is a new language for the web that allows data to be stored in a specified format in a regular text file. This allows it to be queried much more efficiently than a regular HTML document. The Xcache system not only allows users to query XML documents, but it stores the results from these queries in its cache and uses them to answer future queries. This query caching process allows many queries to be answered much faster than they would be normally since the query engine can retrieve many of the results from its cache rather than having to access the data across the network as in a typical search engine." See demo. See details from the Worcester Polytechnic Institute (WPI) Database System Research Group (DSRG).
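The query-caching idea described in the XCache entry can be illustrated with a minimal Python sketch. This is not XCache code: the class name, the `fetch` callable standing in for network access, and the sample catalog are all assumptions made for illustration. The sketch only reuses results for exact repeats of a query, which is presumably much simpler than what the WPI system does.

```python
import xml.etree.ElementTree as ET


class CachingQueryEngine:
    """Hypothetical toy engine: caches results per (url, path) pair."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable url -> XML text (stands in for the network)
        self._cache = {}         # (url, path) -> list of result strings
        self.fetch_count = 0     # how many times the "network" was actually used

    def query(self, url, path):
        key = (url, path)
        if key not in self._cache:
            # Cache miss: fetch and parse the document, then remember the answer.
            self.fetch_count += 1
            root = ET.fromstring(self._fetch(url))
            self._cache[key] = [elem.text for elem in root.findall(path)]
        return self._cache[key]


def fake_fetch(url):
    # Invented sample document; a real engine would retrieve `url` remotely.
    return ("<catalog><item><title>First</title></item>"
            "<item><title>Second</title></item></catalog>")
```

Calling `query` twice with the same URL and path touches `fake_fetch` only once; the second answer comes from the cache.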
[April 27, 2001] XML Query Engine Provides Initial XQuery Support. A posting from Howard Katz (Fatdog Software) announces the Version 0.99 release of 'XML Query Engine' with early (0.25) W3C XQuery support. "If you want some introductory hands-on exploration of basic XQuery syntax, a free evaluation version of the engine is available. You can now select either XQL or XQuery for your query language front end. This release provides a first cut at a very limited implementation of the full XQuery grammar. This implementation supports FLWR expressions, element constructors, a limited range of XPath expressions on elements only, simple predicates testing element existence and text equality, and that's about it. The good news is that with the exception of expression lists, FLWRs can be explored in almost full recursive generality, and the features that are in place can be employed against actual data. Here's a sample query: <results> FOR $book IN //book FOR $author IN $book/author WHERE $author/first = 'Dan' RETURN <DanTheMan> $author </DanTheMan> </results> XML Query Engine (XQEngine for short) is a full-text search engine component for XML. It lets you search small to medium-size collections of XML documents for boolean combinations of keywords, much as web-based search engines let you do for HTML. Queries are specified using XQL, a de facto standard for querying XML documents that is nearly identical to the simplified form of XPath. Queries expressed in XQL are much more expressive and powerful than the standard search interfaces available through web-based search engines." Note also the online document by Katz "Introduction to XQuery." [Full context]
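The sample FLWR query quoted in the entry above can be read as two nested loops with a filter and an element constructor. As a rough illustration only, the following Python sketch performs the same steps with the standard library's `xml.etree.ElementTree`; it does not use XQEngine or any XQuery processor, and the sample document and the function name `dan_the_man` are invented for the sketch.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this sketch.
SAMPLE = """
<bib>
  <book><title>A</title>
    <author><first>Dan</first><last>Smith</last></author>
    <author><first>Ann</first><last>Lee</last></author>
  </book>
  <book><title>B</title>
    <author><first>Dan</first><last>Jones</last></author>
  </book>
</bib>
"""


def dan_the_man(xml_text):
    root = ET.fromstring(xml_text)
    results = ET.Element("results")            # <results> ... </results>
    for book in root.iter("book"):              # FOR $book IN //book
        for author in book.findall("author"):   # FOR $author IN $book/author
            if author.findtext("first") == "Dan":   # WHERE $author/first = 'Dan'
                wrapper = ET.SubElement(results, "DanTheMan")  # RETURN <DanTheMan>
                wrapper.append(copy.deepcopy(author))          #   $author
    return results
```

Serializing the return value with `ET.tostring(dan_the_man(SAMPLE))` gives a `results` element containing one `DanTheMan` wrapper per matching author.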
[February 15, 2001] From Charles Daringer. "XML parser that supports DOM and Saxon for SQL like searches of XML including arrays." Labat Anderson has developed some code that reads XML forms, Pureedge, XFDL, XML, DOM, SAX, into an array where an itemization of the array is notated from a left to right fashion at the depth of the node count from a left to right fashion. This enables a recursive search of these nodes down to the array level based on the content of a preexisting value of an array. The process reads the XML from stream, file, or table and then forms the array to only the required size of the XML present. Once 'persisted' to an array multiple 'conditional' passes or searches are facilitated in a SQL fashion where a result set can be utilized to formulate the next search. The code is written in Visual Basic 6 and can be called as a DLL which returns the array, or the code can run inside a SERVICE that runs on NT. If there is a requirement to parse xml on a conditional basis down to an array level... where the number of items in an array, or the number of arrays are a runtime uncertainty, then this is the 'fastest' approach. You can contact AL BUONI at Labat Anderson, Inc. for details."
[November 04, 2000] "SODA2 - An XML Semistructured Database System." - "The novel SODA2 architecture facilitates several crucial features which are seldom available in other database systems. The SODA2 query processor is mainly located at the client side. Each query processor contains an internal query translator that maps a query from one language into a SODA2 internal micro-query language. Therefore, SODA2 supports multiple query languages which include XPath expressions, XQL and XML-QL to date... The query interface provides an embedded query interface to allow user developed applications to query SODA2 in the easiest way. As SODA2 query processor contains a query translator and operates internally on a micro-query language and SOM, multiple query languages are supported. Currently SODA2 supports XPath expressions, XQL, and XML-QL. It follows the W3C's DOM recommendations so that non-database applications can be easily built and ported to SODA2 with DOM interface and ignore without its rich database functionalities. Alternatively, advanced users can call more sophisticated functions defined in the SODA2 Object Model (SOM) interface from the lower layer to develop time-critical applications or create customized plug-ins with their own indexing and optimization schemes."
[July 2000] "XMLGet." By Neil Ferguson. Notice posted November 27, 2001 to '[email protected]'. "XMLGet is an efficient, open source query tool for XML documents based on the XQL query language. You can download the Java source and binaries for the latest version (1.0) of XMLGet. You can also download the user documentation. A document entitled 'XMLGet -- Testing, Profiling, Benchmarking, and Proposed Future Work' can be downloaded; it describes some of the caveats found in XMLGet during testing, profiles various sections of the XMLGet parser, compares performance against other XQL engines, and makes suggestions for future enhancements to XMLGet." [cache]
NIAGARA. "Niagara can be used for retrieving XML data, querying and monitoring them for some interesting changes. These three functionalities are implemented by its three main components: Search Engine, Query Engine and Trigger Manager. XML is gaining popularity for representing semi-structured data on the net. Additionally it is being used as a medium for data exchange. Self-describing nature of XML data help in doing a better job of retrieving relevant sources, when compared to a plain text file. For example if we are looking for a ship named 'Montreal', then all the current search engine will give us data sources that will most likely contain information about the city named 'Montreal'. With our XML search engine we can return data sources that will contain the name 'Montreal' in the ship element..." See the slide presentation.
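The 'Montreal' example in the NIAGARA entry contrasts plain keyword search with search that is aware of element structure. The difference can be shown with a small Python sketch; it is an illustration under assumed data, not Niagara code, and the two sample documents and both function names are invented here.

```python
import xml.etree.ElementTree as ET

# Invented sample collection: one document about a ship, one about a city.
DOCS = {
    "fleet.xml": "<fleet><ship><name>Montreal</name></ship></fleet>",
    "cities.xml": "<cities><city><name>Montreal</name></city></cities>",
}


def text_search(docs, word):
    """Plain keyword search: matches the word anywhere in the raw text."""
    return sorted(name for name, text in docs.items() if word in text)


def element_search(docs, tag, word):
    """Structure-aware search: the word must occur inside a <tag> element."""
    hits = []
    for name, text in docs.items():
        root = ET.fromstring(text)
        for elem in root.iter(tag):
            if word in "".join(elem.itertext()):
                hits.append(name)
                break  # one hit per document is enough
    return sorted(hits)
```

Here `text_search(DOCS, "Montreal")` returns both documents, while `element_search(DOCS, "ship", "Montreal")` returns only the fleet document.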
XPERT - 'Developing Best XML Content Management Tools in Java'. "XPERT provides an efficient indexing and allows powerful retrieval with XPath based query language. We have been developing XML content management tools in Java. One of Java software applications is XPERT (XPath based query language Evaluation and Retrieval Tool). It allows different types of XML documents to be indexed together and makes it possible to be retrieved by an XPath based query language. We compress the index and save it to the disk in a very efficient way, keeping the overhead as low as possible. Furthermore, we optimize the query evaluation and allow the fast query processing. We believe that it scales well in large heterogeneous XML collections. You can download XPERT and manage the XML content in an easy and efficient manner..."
[February 15, 2001] XPath Based Query Language Evaluation and Retrieval Tool (XPERT) Supports SAX. Dongwook Shin announced that XPERT (XPath based query language Evaluation and Retrieval Tool) Version 0.5 now supports SAX. "XPERT is now able to index with SAX parsers as well as DOM parsers. With this, you can virtually index any size of XML file no matter how large it is. And you can hook up any Java SAX parser to XPERT as long as it supports SAX version 2. One more advantage is that SAX version takes less memory and index more quickly than DOM version. This Java tool allows different types of XML documents to be indexed together and to be retrieved in an efficient way using an XPath based query language." [Full context]
[November 03, 2000] XML Query Engine Update. Howard Katz posted an announcement for the availability of XML Query Engine v0.89: "This is a major update that fixes a number of outstanding bugs and adds several new features and optimizations. This version is getting very close to beta territory, since all major features, with the exception of a persisted store, are now in place. Updates include: (1) logical subquery operators 'and' and 'or'; (2) set operators 'union' and 'intersect'; (3) namespace support; (4) setDoFullText(boolean) api for turning off element text indexing (speeds up indexing, query retrieval, and drastically reduces index size in appropriate cases); (5) showDocTree(docID) api for quick visualization of element hierarchy; (6) simple compound-word matching in element content; (7) attribute content moved into index for improved speed and precision; (8) a number of other optimizations, primarily to improve performance and reduce memory footprint during full-text queries." The XML Query Engine is "a JavaBean component that lets you search your XML documents for element, attribute, and full-text content. It can index multiple documents using a SAX parser of your choice. The index, once built, can be queried using XQL, a de facto standard for searching XML that is, very nearly, a proper subset of XPath. XML Query Engine extends XQL's syntax to provide a full-text capability, something lacking in standard XQL. This lets you say such things as Find me the first paragraph within either a division or a chapter that contains both the words 'xml' and 'xsl' or Give me a list of all elements containing an href attribute which points to a '.com' organization. XML Query Engine is an embeddable component that's callable from your application. It requires some straightforward Java programming to wire the query engine to your front-end code. 
The engine uses a result-listener architecture to deliver its results: You register an XQL result listener with the engine before calling your first query. Once your query's been resolved, the result-set document is delivered to your listener's results() method. Query results can be delivered in one of three formats. Two of these are XML, one of which is a standard result format, similar in structure to that returned by other XQL vendors, while the other is specialized to return 'navigational metadata' describing the nodes it contains in terms of their location within their originating documents. You can use this metadata to easily re-navigate, via either SAX or DOM, back into the original documents for further post-processing if desired. The third result-set format is CSV, Comma-Separated-Values, for particularly fast and compact result delivery of navigational metadata. XML Query Engine is a work in progress. The current version is fast approaching beta status. I've implemented all the core XQL features necessary to support full-text capability on top of the standard language..."
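The result-listener arrangement described above (register a listener, run a query, receive the result set in the listener's results() method) is a callback pattern. The following Python sketch shows the shape of that pattern only. The real engine is a Java component whose API is not reproduced here; `ToyEngine`, `CollectingListener`, `set_result_listener`, and the word-matching rule are all hypothetical names and choices made for this illustration.

```python
import re
import xml.etree.ElementTree as ET


class ToyEngine:
    """Hypothetical stand-in for a query engine; not the XQEngine API."""

    def __init__(self):
        self._docs = []
        self._listener = None

    def add_document(self, xml_text):
        self._docs.append(ET.fromstring(xml_text))

    def set_result_listener(self, listener):
        # The listener must be registered before the first query is issued.
        self._listener = listener

    def query(self, tag, word):
        if self._listener is None:
            raise RuntimeError("register a result listener before querying")
        hits = []
        for doc_id, root in enumerate(self._docs):
            for elem in root.iter(tag):
                text = "".join(elem.itertext())
                # Whole-word, case-insensitive match on the element's text.
                if word.lower() in re.findall(r"\w+", text.lower()):
                    hits.append((doc_id, text))
        # Results are pushed to the listener rather than returned to the caller.
        self._listener.results(hits)


class CollectingListener:
    """Hypothetical listener that simply stores each delivered result set."""

    def __init__(self):
        self.received = []

    def results(self, hits):
        self.received.append(hits)
```

A caller creates a `CollectingListener`, passes it to `set_result_listener`, and then inspects `received` after each `query` call. Pushing results to a listener keeps the engine independent of how the application chooses to consume them.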
[October 04, 2000] Fujitsu XLink Processor. Developed by Fujitsu Laboratories Ltd., as "an implementation of XLink and XPointer. This processor supports XML Linking Language (XLink) Version 1.0 Candidate Recommendation. You may use this processor and other included programs without charge for 60 days after the installation. You must read 'LICENSE', before you begin your installation..." Multi-Platform: Developed with Java, this processor can be used on many platforms which support Java Runtime Environment. Support for XLink Ver.1.0CR: This processor supports XLink Ver.1.0CR, which is now being discussed in W3C. XLink/XPointer processing with DOM: This processor works with DOM. This processor can work with any XML processor or parser which can create DOM, on condition that an appropriate interface between this processor and it is implemented. Supported Features: XLink Features: simple-type element and its related attributes; extended-type element and its related attributes; locator-type element and its related attributes; resource-type element and its related attributes; arc-type element and its related attributes; title-type element; Linkbases. XPointer Features: Bare Names; Child Sequences. Links: demo application; download; license. Contact: [email protected] or Masatomo Goto.
XML-QL: A Query Language for XML ". . . techniques and tools should exist for extracting data from large XML documents, for translating XML data between different schemas (DTD's), for integrating XML data from multiple XML sources, and for transporting large volumes of XML data to clients or for sending queries to XML sources. We have implemented a query language for XML, called XML-QL, to investigate solutions to the problems above. The language has a WHERE-CONSTRUCT clause, similar to SQL's SELECT-WHERE construct, and borrows features of query languages recently developed by the database research community for semistructured data. The implementation is publicly available." By Mary Fernandez. See also the XML-QL Users' Guide.
[March 28, 2000] XMLQUERY: An extension of the LTNSL query language to allow more complex queries including operators for finding sequences of XML elements. Now including a prototype version of XMLPERL2 (a transformation system similar to XMLPERL but using the extended XMLQUERY language).
[May 24, 2000] Jonathan Robie (Software AG) recently announced design and development work on 'Quilt: An XML Query Language'. "Quilt is an XML query language designed for queries on heterogeneous data sources, and drawing from the design of XQL, XML-QL, SQL, and OQL. The authors included me, Don Chamberlin (one of the two authors of SQL), and Dana Florescu (well known in the object database community, and one of the authors of XML-QL). The two papers to look at are found here: http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html. Viz., (1) "Quilt: An XML Query Language," by Jonathan Robie, Don Chamberlin, and Daniela Florescu [to be presented at XML Europe, Paris, June 2000], and (2) "Quilt: An XML Query Language for Heterogeneous Data Sources," by Chamberlin/Robie/Florescu [to be presented at WebDB 2000, Dallas, May 2000]. Quilt has no official status in the XML Query WG. The authors are all members of that WG (and one was an editor of both the Requirements WD and the Data Model WD), and we designed the language to meet the requirements of the Query WG. Since the Query WG is not yet working on concrete query languages, it is too early to say how it will respond to this proposal..." From one of the design papers: "As increasing amounts of information are stored in XML, exchanged in XML, and presented as XML through various interfaces, the ability to intelligently query XML data sources becomes increasingly important. The data model of XML is quite different from the data models of traditional databases, and traditional database languages are not well suited to querying XML sources. In addition, XML blurs the distinction between data and documents, allowing documents to be treated as data sources and traditional data sources to be treated as documents. Query languages, including XML query languages, still tend to be designed either for documents or for data. 
Since XML may represent a rich variety of information coming from many sources and structured in many ways, an XML query language must try to provide the same rich variety in the queries that may be formulated. Quilt is a query language for XML. It originated when the authors attempted to apply XML query languages such as XML-QL, XPath, XQL, YATL, and XSQL to a variety of use cases. We found that each of these languages had strong advantages for some of the queries we examined, but was unable to express other queries we found equally important. Therefore, we chose to take some of the best ideas from these languages, plus some ideas from SQL and OQL, integrating them with a fresh syntactic approach. Our goal was to design a small, implementable language that met the requirements specified in the W3C XML Query Working Group's XML Query Requirements. During our design work, we have adapted features from various languages, carefully assembling them to form a new design -- hence the name 'Quilt'. The resulting language supports queries that draw information from various sources and patch them together in a new form. This is another reason for the name 'Quilt'." Note also in this connection "XML Query Languages: Experiences and Exemplars." The paper "identifies essential features of an XML query language by examining four existing query languages: XML-QL, YATL, Lorel, and XQL. The first three languages come from the database community and possess striking similarities. The fourth comes from the document community and lacks some key functionality of the other three..." Local: cache, XMLEuro, cache, WebDB.
METU-SRDC. "METU-SRDC presents an implementation of XML-QL. XML-QL is originally implemented by AT&T, by mapping the queries into STRU-QL. METU-SRDC's implementation works directly on XML files and contains some advanced features like "Recursive Functions" and "Aggregate Functions" (Sum, Count ...). The extensions to the original specification are explained through examples. Implementation of the package is done in pure Java. JavaCC and IBM's xml parser for java, xml4j were used as auxiliary applications."
"Quilt; An XML Query Language for Heterogeneous Data Source." By Daniela Florescu, Jonathan Robbie, and Don Chamberlin. Proceedings of the workshop on Web and databases (WebDb) in conjunction with SIGMOD'00, Dallas, Texas. [cache]
[August 18, 2000] QuiltParser: JavaCC Grammar for Quilt XML Query Language. "Quilt Parser in java is available for download." From Dongwon Lee. JavaCC parser source code and test queries are available at http://www.cobase.cs.ucla.edu/projects/xpress/quilt/. Description: "QuiltParser is a parser for the Quilt XML query language written with JavaCC as a part of XPRESS project at UCLA / CSD. This small, implementable language has been recently proposed by Robie, Chamberlin, and Florescu; it integrates the advantages of various languages while meeting the W3C's XML Query Requirements. More information on Quilt may be found at http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html. The current implementation is based on the BNF and test examples from this paper (presented at XML Europe, Paris, June 2000)."
"Querying XML Data." By Daniela Florescu, Alin Deutsch, Mary Fernandez, Alon Levy, David Maier, and Dan Suciu. In Data Engineering Bulletin Volume 22, Number 3 (1999), pages 27-34. [cache]
[March 28, 2000] 'New Version of Andrei's generic SGML tools' - including XML query tool.
[March 29, 1999] On behalf of GMD-IPSI (German National Research Center for Information Technology) and the XML Competence Center at GMD, Ingo Macherius has announced Java based XQL and W3C-DOM implementations. "The engine consists of two main parts: (1) A persistent implementation of the W3C-DOM, and (2) A full implementation of the XQL language. The XQL engine implements the W3C-QL '98 workshop paper syntax of XQL. It uses a novel indexing algorithm for XML (publication pending), which indexes the document while processing the first query. Subsequent queries to the same document are considerably accelerated."
[Matthew Sergeant:] "there's an implementation of XML-QL in my directory on CPAN for perl users, which needs fixing up a little bit, but it's quite usable (if a little slow). It facilitates the use of perl's regexp syntax for queries as well as the system used by XML-QL, which makes it nice and powerful..."
XRS. Dongwook Shin (Lister Hill Center, National Library of Medicine) recently announced the public availability of XRS: An XML Retrieval Engine. "XRS is an XML search engine that is able to retrieve any elements a user wants very effectively. Unlike other XML search engines that get back whole XML documents to you, you can impose conditions on any elements with weights and get back relevant elements efficiently. XRS uses a couple of new techniques that have been recently developed. One of those is the BUS (Bottom Up Scheme) technique developed for indexing and retrieving structured documents efficiently..."
SIM - The Structured Information Manager. "The Structured Information Manager (SIM) provides a complete environment with which to develop end-to-end document management solutions. SIM gives superior performance in managing very large, complex stores of data. It stores and indexes SGML, XML, MARC and RTF natively, and also accommodates multimedia formats. Because of the native support for SGML/XML, SIM document management solutions are very efficient and scale up as the document repository grows from tens of gigabytes to hundreds of gigabytes in size. SIM is completely Web enabled. Particular strengths of SIM include sophisticated SGML/XML support, high performance text indexing, powerful Web based application support and native support for international standards such as SGML, XML, RTF, MARC, ODBC and Z39.50. . . . By directly supporting the import and export of documents marked up in SGML/XML, SIM automatically has access to a large range of text processing tools such as text editors, and translators that can convert text between a variety of word processing standards. Storing text in SGML/XML format also means that the SIM retrieval engine has access to the structure of the documents as well as their contents. This structural information can be used to improve the processing of user queries and to determine the best ways to display a document."
Xdex. "XML data and documents incorporate rich layers of metadata that can be leveraged for extremely accurate information categorization and context-based searching. Sequoia's Xdex is a uniquely powerful XML indexing engine that allows you to easily take advantage of this functionality. . ."
TEXTMLServer. "The LEXml Library is a web-ready Intranet library solution for legal documents where you can store and research the full text of your documents. Take the Guided Tour and see why our library is a must for your corporate portal. The LEXml Library is available in a free LITE edition that is time bomb free and fully functional up to 500 documents... IRT (Interactive Repository Technology) is the fruit of over ten years of experience in document indexing and publishing technology. IRT offers a robust COM API designed with XML in mind and therefore offers all the performance and flexibility of XML for indexing and publishing purposes."
[November 24, 1999] From Duane Nickull, XSL List, Wed, 24 Nov 1999 11:36:41 -0800: "The goXML XML Context based Search Engine has a private API for conducting queries and submitting content via XML. The Xml Query Interface (XQI) is available at http://www.goxml.com/xqi.htm. The documentation and source code are yours to examine in hopes of shedding some light on this subject. Also, you may want to use the XML search engine to find content on Unicode characters."
Lorel Query Language - Software is available for download. "The Lore project focuses on defining a declarative query language for XML, developing new technology for interactive searches over XML data, and building an efficient XML query processor. Lore is a database management system (DBMS) for XML, a simple and increasingly popular data model. Nested, tagged data is the essence of XML. . . We have developed the Lorel language specifically for querying XML or other semistructured data. Based on OQL, Lorel provides powerful path traversal operators and makes extensive use of type coercion to help yield "intuitive" results for all queries over XML data. The language is described in this paper."
[March 08, 2000] "Lore: A Database Management System for XML. [INTERNET PROGRAMMING.]" By Roy Goldman, Jason McHugh, and Jennifer Widom. In Dr. Dobb's Journal Volume 25, Issue 4 [#311] (April 2000), pages 76-80. ['Lore is a DBMS designed specifically for XML. In the same way that SQL queries relational DBMSs, Lore provides the query language Lorel for issuing expressive queries over XML data. Additional resources include 'lore.txt' listings.'] For publications and software, see the Lore Web site.
BUS - Bottom Up Scheme of indexing and retrieval for SGML/XML documents. See: "BUS: An Effective Indexing and Retrieval Scheme in Structured Documents." By Dongwook Shin, Hyuncheol Jang, and Honglan Jin [Department of Computer Science, Chungnam National University, Taejon, South Korea]. Pages 235-243 (with 16 references) in Digital Libraries '98. Proceedings of the Third ACM Conference on Digital Libraries (Held June 23-26, 1998). New York, N.Y.: Association for Computing Machinery, 1998. Abstract: "In recent digital library systems or the World Wide Web environment, many documents are beginning to be provided in the structured format, tagged in mark up languages like SGML or XML. Hence, indexing and query evaluation of structured documents have been drawing attention since they enable to access and retrieve a certain part of documents easily. However, conventional information retrieval techniques do not scale up well in structured documents. This paper suggests an efficient indexing and query evaluation scheme for structured documents (named BUS) that minimizes the indexing overhead and guarantees fast query processing at any level in the document structure. The basic idea is that indexing is performed at the lowest level of the given structure and query evaluation computes the similarity at a higher level by accumulating the term frequencies at the lowest level in the bottom up way. The accumulators summing up the similarity play the role of accumulating all the term frequencies of the related part at a certain level. This paper also addresses the implementation of BUS and proves that BUS works correctly. . ."
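The bottom-up idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a raw term-frequency sum as a stand-in for the paper's similarity measure, ignores text that sits directly inside non-leaf elements, and uses a hypothetical sample document and hypothetical function names (build_leaf_index, score).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document; element names are illustrative only.
DOC = """
<book>
  <chapter>
    <para>xml query languages</para>
    <para>query evaluation for xml</para>
  </chapter>
  <chapter>
    <para>relational databases</para>
  </chapter>
</book>
"""

def build_leaf_index(root):
    """Index term frequencies only at the lowest level (leaf elements)."""
    index = {}
    for elem in root.iter():
        if len(elem) == 0:  # no child elements, so this is a leaf
            index[elem] = Counter((elem.text or "").split())
    return index

def score(elem, term, index):
    """Accumulate leaf-level term frequencies bottom-up for any element."""
    if elem in index:
        return index[elem][term]
    return sum(score(child, term, index) for child in elem)

root = ET.fromstring(DOC)
leaf_index = build_leaf_index(root)
chapter_scores = [score(ch, "query", leaf_index) for ch in root.findall("chapter")]
book_score = score(root, "query", leaf_index)
print(chapter_scores, book_score)
```

Only the leaves are indexed, yet a query can be scored at the chapter or book level by summing upward, which is the trade-off the abstract describes.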
Xtract - a command-line `grep'-like tool for searching XML documents. "Just as `grep' returns lines which match your regular expression, so Xtract returns all those sub-trees from XML documents which match a query pattern."
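To make the grep analogy concrete, here is a minimal Python sketch that returns matching sub-trees rather than matching lines. It does not use Xtract or its pattern language; it assumes the limited XPath subset supported by Python's standard xml.etree.ElementTree, and the function name xml_grep and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = """
<library>
  <book year="1999"><title>Data on the Web</title></book>
  <journal><title>SIGMOD Record</title></journal>
  <book year="2000"><title>XML Queries</title></book>
</library>
"""

def xml_grep(xml_text, pattern):
    """Return every sub-tree matching the path pattern, serialized as text."""
    root = ET.fromstring(xml_text)
    return [ET.tostring(match, encoding="unicode").strip()
            for match in root.findall(pattern)]

matches = xml_grep(DOC, "book/title")
for m in matches:
    print(m)
```

The pattern "book/title" selects only titles whose parent is a book, so the journal title is skipped.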
TEXTML Server. "The free TEXTML SERVER Lite is available in our download section. This evaluation edition of our Interactive Repository Technology has all the functionality of the licensed version and is 'Time Bomb Free'. The LEXml Library is a web-ready Intranet library solution for legal documents where you can store and research the full text of your documents. The LEXml Library is available in a free LITE edition that is time bomb free and fully functional up to 500 documents. What is Interactive Repository Technology? IRT is the fruit of over ten years of experience in document indexing and publishing technology. IRT offers a robust COM API designed with XML in mind and therefore offers all the performance and flexibility of XML for indexing and publishing purposes. Read about the benefits and features of IRT in our product section, and see how the TEXTML Server is your ideal tool for creating XML document management solutions..."

Articles and Other Resources

DAV Searching and Locating (DASL)
See references also in "XML and Attribute Grammars."
[May 17, 2005] "DataDirect XQuery Enables XQuery Everywhere. Easy, Powerful Java XQuery Component Simplifies XML and Relational Data Integration — Now Available in Beta." - "DataDirect Technologies, the industry leader in standards-based components for connecting applications to data, and an operating unit of Progress Software Corporation, today announced the availability of DataDirect XQuery, an easily embeddable XQuery implementation for XML applications that need to process both XML and relational data sources. Developers can get started with XQuery and the XQuery API for Java (XQJ) today using the beta release of DataDirect XQuery, the first XQuery component to implement XQJ... Middleware applications such as report generation, Web services, and other Web applications require accessing different XML and relational data sources and then integrating, manipulating, and transforming the data from these sources into a coherent output format. In the past, solutions for working with XML and relational data have been difficult to build, typically requiring applications to spend significant effort navigating and casting data from XML structures, and often requiring too many different languages and software systems which were proprietary or not scalable. DataDirect XQuery simplifies working with XML and relational data together, allowing Java developers to programmatically invoke and process XQuery expressions against any major relational database including Oracle, Microsoft SQL Server and IBM DB2, directly from within their Java applications. Not only does this powerful new approach to writing database-independent XML data integration code require less code than yesterday's solutions — because it is based on XQuery and XQJ standards, it is easier and more intuitive to learn and use, saving developers precious time and money... DataDirect XQuery is the first embeddable component for XQuery that implements the XQuery for Java API (XQJ). 
The XQuery API for Java (XQJ) is an API designed to support the XQuery language (in the same way that the JDBC API supports the SQL query language). XQJ, which is based on the XQuery Data Model rather than the relational model, allows a Java application to submit XQuery queries to any XML or relational data sources and to process the results. The XQJ standard (JSR 225) is being developed under the Java Community Process and is in the early stages of being defined..."
[May 16, 2005] "Ipedo 'Dual-Core' EII Platform Processes Both SQL and XQuery." By Tom Sullivan. From InfoWorld (May 16, 2005). "Ipedo on Monday launched XIP 4.0 (Extensible Information Platform) and described it as a 'dual-core' EII offering that works with both SQL and the emerging XQuery querying languages. New to Version 4.0 are support for Business Objects XI and Crystal Reports XI, a visual rules GUI, Web services publishing, and the dual querying engines. Integration with Business Objects and Crystal Reports BI tools enables data federated by Ipedo to be available in reports. 'Ipedo looks like one big database, so Business Objects users can point to a number of data sources,' said Tim Matthews, Ipedo co-founder and vice president of marketing. The visual GUI helps users create rules based on data values, such as the ability to send alerts or invoke more detailed analysis when appropriate. XIP 4.0 enables developers to publish views of data as Web services, which the company claims allows organizations to leverage EII within a SOA..." See details in the announcement: "Ipedo Releases New Version of Enterprise Information Integration (EII) Platform. Ipedo XIP 4.0 Provides Integration across Broadest Range of Data and Content with New Dual-Core Query Architecture."
[May 06, 2005] "Debunking XQuery Myths and Misunderstandings." By Frank Cohen. From IBM developerWorks (May 06, 2005). "If you work with XML, Web services, or Service Oriented Architecture (SOA), you will likely benefit from the emerging XML Query (XQuery) standard. XQuery is not even a formally accepted standard, yet dozens of implementations help software architects and developers every day. What began as a standard for querying XML documents now includes the next-generation standards for XML selection (XPath 2), XML serialization, full-text search, and functional XML data modeling. A project of this size is bound to have much myth and misunderstanding that needs to be debunked. Here are some of the more common myths and misunderstandings surrounding XQuery. Frank Cohen details and clarifies many of the myths and misunderstandings that surround XQuery. XQuery shows great promise because it reduces the amount of code you need to write to build services that work with XML. The greater XQuery ecosystem provides a unified way to query XML documents, including XML selection, serialization, full-text search, and functional data modeling. Work continues at the XQuery specification Working Group, and this will lead to even more benefits for software developers who work with XML..."
[May 03, 2005] "Microsoft MVPs Petition for XQuery Support in Whidbey." By John K. Waters. From Application Development Trends (May 03, 2005). "When Microsoft announced in January that it was dropping XQuery support from the next release of the .NET Framework, the company's reasoning seemed sound enough: XQuery will not receive final approval from the World Wide Web Consortium (W3C), the standards body shepherding its development, until early 2006. But some XQuery users are hoping to get Microsoft to reconsider its decision. DataDirect Technologies, a provider of data connectivity components and XML development tools, issued an online petition this week, which it hopes will 'convince Microsoft of the overall importance of supporting XQuery in the .NET framework.' As of Monday, the 'XQuery for All' campaign had garnered a reported 140 petition signatures from members of Microsoft's Most Valuable Professionals (MVPS) program. DataDirect acknowledges that Microsoft hasn't abandoned XQuery altogether. The Redmond software giant is a longtime supporter of the language, currently serves as a member of the W3C XQuery standards committee and has publicly committed to helping to complete the standards work. Other large software vendors, including Oracle and IBM, are supporting XQuery 1.0 in upcoming versions of their enterprise-class products..."
[March 21, 2005] "SQL vs. XML in a Database World." By Joab Jackson and Jonathan Robie. From Government Computer News (March 21, 2005). ['Jonathan Robie is one of the chief authors of XQuery, an SQL-like query language that can be used for searching both XML documents and relational databases. W3C, the international body that creates and maintains Web standards, is now reviewing XQuery as a formal draft.'] Robie on XQuery and Structured Query Language: "SQL is a relational query language, XQuery is an Extensible Markup Language query language. If all you are doing is querying relational databases, then SQL is the language you want. XQuery works best if you're querying XML or a combination of XML and relational sources. XML's logical structure is based on hierarchy and sequence. Two things that relational databases don't do is hierarchy and sequence. Yet Web sites typically get a lot of information from databases, but they don't put two-dimensional tables up on the screen. You create hierarchies for that information. In XML you structure everything with hierarchy and sequence. Let me give you a scenario. Your Web site has a message coming in: Someone wants to ask for the price of some stocks. Now we can join XML documents and various relational tables. The XML documents will identify the person who wants the information. It will give the data range that person is interested in. We join that against a relational database to figure out how the stocks have performed, and we build an XML structure that contains the result... I think that since the standards boards of the International Standards Organization and the American National Standards Institute are in the process of adding all of XQuery into SQL, it is just hard to claim that the SQL community is not accepting XQuery. If you go to the major database conferences, you will see a lot on XML. Every relational database vendor has added XML support. People are very conservative about their database choices. 
But the flow of information is just as important as the control of information..."
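Robie's stock-quote scenario (an incoming XML message joined against relational data, with the answer built as XML) can be sketched in Python. This is an illustration under stated assumptions, not XQuery: the request format, the prices table, and the function name answer are all hypothetical, and the join is done with the standard sqlite3 and xml.etree.ElementTree modules rather than a query engine.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical incoming request: who is asking, and for which stocks.
REQUEST = """
<request>
  <customer>alice</customer>
  <symbol>IBM</symbol>
  <symbol>ORCL</symbol>
</request>
"""

# Hypothetical relational side: a flat table of closing prices.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (symbol TEXT, day TEXT, close REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [("IBM", "2005-03-18", 90.5), ("ORCL", "2005-03-18", 12.75),
     ("MSFT", "2005-03-18", 24.25)],
)

def answer(request_xml, db):
    """Join the symbols named in the XML request against the price table."""
    req = ET.fromstring(request_xml)
    result = ET.Element("quotes", customer=req.findtext("customer"))
    for sym in req.findall("symbol"):  # document order is preserved
        rows = db.execute(
            "SELECT day, close FROM prices WHERE symbol = ? ORDER BY day",
            (sym.text,),
        )
        for day, close in rows:
            quote = ET.SubElement(result, "quote", symbol=sym.text, day=day)
            quote.text = str(close)
    return result

reply = answer(REQUEST, conn)
print(ET.tostring(reply, encoding="unicode"))
```

The result is hierarchical and ordered by the request document, while the price data itself stays in a flat table, which is the division of labor the interview describes.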
[March 02, 2005] "Getting Started with XQuery." By Bob DuCharme. From XML.com (March 02, 2005). "Although the W3C's XQuery language for querying XML data sources is still in Working Draft status, the recent XML 2004 conference showed that there's already plenty of interest and many implementations. While the Saxon implementation may not scale up as much as the disk-based versions that use persistent indexes and other traditional database features, you can download the free version of Saxon, install it, and use XQuery so quickly that it's a great way to start playing with the language in order to learn about what this new standard can offer you. To illustrate running a query, DuCharme starts with a toy example that demonstrates how to tell Saxon which query to run against which XML, then moves on to examples that show useful queries run against real XML data. Part of the appeal of XQuery to people with more of a traditional database background and less of an XML geek background is that XQuery also offers a more SQL-like syntax. Issuing a query against multiple documents at once is an example of a task that, while not impossible in XSLT, is much easier in XQuery when we use the collection function. You can use 'collection' in XSLT 2.0 as well as in XQuery, because it's one of the 'XQuery 1.0 and XPath 2.0 Functions and Operators'; its use with XQuery generally allows more concise requests than it does with XSLT. In part two of the article, the author will demonstrate XQuery's ability to sort and aggregate data; it will also show how user-defined functions in queries can expand the possibilities for how you select and use the data in your XML documents with XQuery..."
[February 2005] "XQuery: A New Way to Search." By Caroline Kvitka. From Oracle Magazine (February 2005). "The challenge for XQuery is to search XML documents and pull out the document that you want or the parts of the document that you want efficiently, easily, reliably, and predictably. 'The situation is actually complicated a little bit more by the fact that XML has come to serve different purposes, sometimes at the same time,' says Jim Melton, consulting member of the technical staff at Oracle, co-chair of the W3C's XML Query Working Group, and editor of all parts of the SQL standard. 'XML is used to mark up documents, but it's also used to mark up data to say, 'Here is the employee ID,' 'Here is the employee salary,' and so on. Because XML serves a mixture of purposes, the query language has to be able to deal with the different uses of XML.' From the looks of it, XQuery is standing up to the challenge. Currently a working draft specification of the World Wide Web Consortium (W3C), XQuery was first proposed in 1998. Oracle, IBM, Microsoft, DataDirect Technologies, Bell Labs, and BEA are all active in the development of the specification. In addition, several universities have been involved in solving some of the theoretical data model problems..."
[December 29, 2004] "XQuery's Niche." By Edd Dumbill. From XML.com (December 29, 2004). "At the beginning of December, Roger Costello posed a question that has given rise to a thread that has lasted all month, mentioning (1) that the capabilities provided by XQuery represent a subset of the capabilities provided by XSLT/XPath 2.0, (2) The XQuery syntax is a hybrid -- it has some XML characteristics but is not XML. Norm Walsh notes that 'It's a non-queryable, non-transformable subset of XSLT 2.0'. Given this, I am wondering: 'What niche XQuery is expecting to fill?' It was a timely moment to ask such a thing, as over the course of this year lines of division have been forming over the question of XSLT 2 vs. XQuery. As key implementations emerge, such as Microsoft's, XQuery has been welcomed by those who are mentally incompatible with XSLT's declarative nature. At the same time, XSLT devotees just can't see what the fuss is about, perceiving XQuery as a custom syntax for what XSLT can already do. Jonathan Robie, who given his long experience with XQuery should be well-placed to answer, gave a couple of answers..."
[December 16, 2004] "Priscilla Walmsley on XQuery and XML Schema Technologies." By Ivan Pedruzzi and Priscilla Walmsley [email]. In The Stylus Scoop Newsletter (December 16, 2004), Stylus Studio Developer Network. ['Priscilla Walmsley has been working closely with XML Schema and XQuery for years. She was a member of the W3C XML Schema Working Group from 1999 to 2004, where she served as editor of the second edition of XML Schema Part 0 (Primer). As a result of her work with XML Schema, Ms. Walmsley wrote the respected book Definitive XML Schema for Prentice Hall. She has also been an Observer of the XML Query Working Group for two years. During that time she has written another book, Definitive XQuery, which will be published in 2005. Currently, Ms. Walmsley serves as Managing Director of Datypic, where she specializes in XML- and SOA-related consulting and training. Ivan Pedruzzi, Stylus Studio's Senior Product Architect, and editor of The Stylus Scoop newsletter, caught up with Ms. Walmsley at the XML Conference & Exhibition 2004 (XML 2004) last month, where Ms. Walmsley gave a presentation entitled 'Introduction to XQuery'... The two chatted about the XQuery buzz, XML Schema, XQJ technologies, and other hot topics in the XQuery development arena.'] Walmsley: "I was immediately attracted to XQuery because it has an intuitive syntax that I enjoy using and stretching to its limits. Having spent many years using SQL, XQuery feels familiar, yet much more powerful. I've enjoyed working with XSLT and XPath 1.0 over the years, but for some of the work I've done they felt like an awkward fit. For a transformation scenario where I'm saying 'every time you get an x element, do this' it works great. But for applications that involve selecting a subset of an XML document, joining it with other data, and performing calculations or manipulating it in some way, I've sometimes felt like XSLT was making me force a square peg into a round hole. 
XQuery embedded in program code is a great way to reduce (and transform) the set of data you're working with rather than tediously traversing the DOM model of an entire document. In the past I've done this with XPath, but XQuery lets me join multiple data sources easily and sort my results, actions that are not part of XPath. Being a true data-head, I also really like the typing capabilities of XQuery. Some of the advanced functionality of XQuery is more data-oriented, and there are some compelling benefits for using XQuery with XML Schemas..."
[December 16, 2004] "Introduction to XQuery." By Priscilla Walmsley (Datypic). Tutorial given Monday, November 15, 2004 at the XML 2004 Conference & Exposition. 86 slides. "XQuery is a language for querying XML data and documents. This tutorial covers the basics of XQuery from a technical perspective. It will provide attendees with a solid understanding of the syntax and structure of XQuery expressions..." Note also the author's book announcement: Definitive XQuery. (Prentice Hall PTR, Forthcoming 2005, ISBN: 0131013750). Book overview: "Definitive XQuery provides complete coverage of the W3C XQuery 1.0 standard. In addition, it provides the background knowledge in namespaces, schemas, built-in types and regular expressions that is relevant to writing XML queries. The book is designed for query writers who have some knowledge of XML basics, but not necessarily advanced knowledge of XML-related technologies. It can be used both as a tutorial, by reading cover to cover, and as a reference, by using the appendices to locate specific functions and types... [from the interview, cited above:] it is designed to be used as both a tutorial and a reference. It covers the entire XQuery language, including all the overlap with XPath 2.0. The reference part of the book has detailed descriptions and examples of all the built-in functions and types; something that will be useful to both XQuery and XPath 2.0 users. It will probably be out in the third quarter of 2005, depending on when XQuery becomes a Candidate Recommendation...as I was writing the book, I came up with a series of illustrative examples which I call 'useful functions'. What sets them apart from regular examples is that they are likely to be used by readers in their own queries. They range from string functions like substring-after-last and last-index-of, to functions that modify element and attribute nodes, such as add-attribute, change-element-namespace, and so on. 
These functions are not built into XQuery because clearly there is a benefit to keeping the recommendation smaller. But they would be useful to a lot of query authors. I eventually realized that I had hundreds of ideas for these functions — far too many to put in the book. So, I'm working on putting these functions together into a library that will be available through my company..."
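As a rough idea of what two of the string helpers named in the interview might do, here is a minimal Python sketch. The semantics are assumptions, since the interview gives only the function names: last_index_of is taken to return a 1-based position (None when the substring is absent), and substring_after_last is taken to return the whole string when the delimiter is absent. Neither is Walmsley's actual definition.

```python
def last_index_of(text, sub):
    """1-based position of the last occurrence of sub, or None if absent."""
    pos = text.rfind(sub)
    return None if pos < 0 else pos + 1

def substring_after_last(text, delim):
    """Text following the last occurrence of delim; whole text if absent."""
    return text.rpartition(delim)[2]

path = "docs/xml/query/intro.xml"
print(last_index_of(path, "/"))
print(substring_after_last(path, "/"))
```

Small helpers of this kind are easy to write in user code, which fits the remark above about keeping them out of the recommendation itself.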
[March 10, 2004] "BumbleBee, the XQuery Test Harness." By Jason Hunter. In XML.com (March 10, 2004). "Will XQuery be the key that unlocks a new generation of data and content? Nearly every vendor, from the well-known old guard (IBM, Oracle, and Microsoft) to the plucky upstarts (Cerisent, X-Hive, and Qizx) has expressed their support for XQuery and are actively collaborating in its standardization. Under development by the W3C and in Last Call, XQuery looks poised to become the standard query language by which companies access and manipulate semi-structured data and merge together disparate data and content repositories. Using XQuery can be frustrating, as you're faced with choosing from a variety of XQuery vendors that support different versions and interpretations of the XQuery specification; once you've selected the XQuery engine that's best for you, it can be hard to know if the queries you write today will produce reliable results tomorrow after you upgrade your engine or make changes to your queries. The BumbleBee XQuery test harness addresses these frustrations and takes the pain and uncertainty out of learning and using XQuery. Named because it buzzes around FLWORs, BumbleBee provides a cross-platform, vendor-neutral automated testing environment for XQuery development. In other words, BumbleBee is to XQuery what JUnit is to Java. Write your query, define the expected result, and let the tool do the rest... BumbleBee provides a powerful, portable, vendor-neutral automated test environment for XQuery. With BumbleBee you can automate your regression testing, compare multiple XQuery engines, and learn the language through structured challenges. The latest BumbleBee release, version 1.2, includes support for seven vendors and numerous specification draft releases. The easy-to-write .bee file format allows for quick development of tests, including negative tests and compound tests. 
Future BumbleBee versions may include a graphical query execution environment and test authoring tool."
[March 19, 2004] "XQuery Normalizer and Static Analyzer." By Deepak M. Srinivasa and Rajeshwari Rajendra (Technology Incubation Center, IBM Software Labs, India). 10 pages (with 9 references). "XQuery Normalizer and Static Analyzer is a tool for processing XQuery expressions. Specifically, the two components of the tool — the Normalizer and the Static Analyzer perform two kinds of processing..." See the references and description also on the IBM alphaWorks web site: "XQuery Normalizer and Static Analyzer can be used to parse, normalize, and find the output type of an XQuery Expression. The XQuery grammar supported in this version is the specification in XQuery 1.0: An XML Query Language, W3C Working Draft 15 November 2002. The parser is used to check whether a given XQuery expression conforms to the specified grammar and is devoid of syntactic errors. As the first step, any application that makes use of XQuery language will need a parser, which is readily available in this package. Further, the result of parsing, that is, the parse tree, will be the input for further processing phases. This parse tree for XQuery expression is given as the XML representation of the XQuery expression by the parser. The normalizer is used to transform an XQuery expression to a normalized form conforming to the XQuery Core Grammar specified in the XQuery 1.0 and XPath 2.0 Formal Semantics, W3C Working Draft 15 November 2002. See the below questions for more details on normalization. The static type analyzer is used to determine the output type for a given XQuery Expression without actually executing it. Static type analysis checks whether each expression is type-safe, and if so, determines its static type. If the expression is not type-safe, static type analysis yields a type error. For instance, a comparison between an integer value and a string value might be detected as a type error during the static type analysis. In other words, static type analyzer is a semantic checker. 
Any application required to find the type of an XQuery expression without its execution can make use of this tool..." (FAQ document)
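The notion of static type analysis described above (finding an expression's type, or a type error, without executing it) can be shown with a toy Python sketch. It bears no relation to the IBM tool or to the XQuery Formal Semantics: the three-node expression tree, the type names, and the rules are all assumptions chosen to reproduce the integer-versus-string comparison example.

```python
from dataclasses import dataclass

@dataclass
class Literal:
    value: object  # an int or a str in this sketch

@dataclass
class Compare:
    left: object
    right: object

@dataclass
class Add:
    left: object
    right: object

class StaticTypeError(Exception):
    """Raised when an expression is not type-safe."""

def static_type(expr):
    """Infer the type name of an expression tree without evaluating it."""
    if isinstance(expr, Literal):
        return "integer" if isinstance(expr.value, int) else "string"
    if not isinstance(expr, (Compare, Add)):
        raise TypeError(f"unknown expression: {expr!r}")
    left, right = static_type(expr.left), static_type(expr.right)
    if isinstance(expr, Compare):
        if left != right:
            raise StaticTypeError(f"cannot compare {left} with {right}")
        return "boolean"
    # Remaining case: Add, defined here for integers only.
    if left != "integer" or right != "integer":
        raise StaticTypeError(f"cannot add {left} and {right}")
    return "integer"

ok = Compare(Add(Literal(1), Literal(2)), Literal(3))
bad = Compare(Literal(1), Literal("one"))
print(static_type(ok))
try:
    static_type(bad)
except StaticTypeError as err:
    print("type error:", err)
```

Nothing is ever added or compared at run time here; the checker only walks the tree and combines type names, which is why such analysis can reject a query before any data is read.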
[September 05, 2003] "XQuery from the Experts: Influences on the Design of XQuery. Book Excerpt Explores the Origins of the XML Query Language." By Don Chamberlin (IBM Fellow, Almaden Research Lab). From IBM developerWorks, XML zone. September 03, 2003. Excerpted from Chapter 2, "Influences on the Design of XQuery," in the book XQuery from the Experts: A Guide to the W3C XML Query Language (Addison-Wesley). "Early in its history, the XML Query Working Group confronted the question of whether XML is sufficiently different from other data formats to require a query language of its own. The SQL language is a very well established standard for retrieving information from relational databases and has recently been enhanced with new facilities called 'structured types' that support nested structures similar to the nesting of elements in XML. If SQL could be further extended to meet XML query requirements, developers could leverage their considerable investment in SQL implementations, and users could apply the features of these robust and mature systems to their XML databases without learning a completely new language. Given these incentives, the working group conducted a study of the differences between XML data and relational data from the point of view of a query language: (1) Relational data is 'flat,' organized in the form of a two-dimensional array of rows and columns. In contrast, XML data is 'nested', and its depth of nesting can be irregular and unpredictable... (2) Relational data is regular and homogeneous. Every row of a table has the same columns, with the same names and types. This allows metadata -- information that describes the structure of the data -- to be removed from the data itself and stored in a separate catalog. XML data, on the other hand, is irregular and heterogeneous... (3) Like a stored table, the result of a relational query is flat, regular, and homogeneous. The result of an XML query, on the other hand, has none of these properties. 
For example, the result of the query Find all the red things may contain a cherry, a flag, and a stop sign, each with a different internal structure... (4) Because of its regular structure, relational data is 'dense' -- that is, every row has a value in every column. This gave rise to the need for a 'null value' to represent unknown or inapplicable values in relational databases. XML data, on the other hand, may be 'sparse'...; (5) In a relational database, the rows of a table are not considered to have an ordering other than the orderings that can be derived from their values. XML documents, on the other hand, have an intrinsic order that can be important to their meaning and cannot be derived from data values. This has several implications for the design of a query language... The significant data model differences summarized above led the working group to decide that the objectives of XML queries could best be served by designing a new query language rather than by extending a relational language. Designing a query language for XML, however, is not a small task, precisely because of the complexity of XML data. An XML 'value,' computed by a query expression, may consist of zero, one, or many items, each of which may be an element, an attribute, or a primitive value. Therefore, each operator in an XML query language must be well defined for all these possible inputs. The result is likely to be a language with a more complex semantic definition than that of a relational language such as SQL..."
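The "Find all the red things" point can be made concrete with a small sketch. It is written in Python with the standard-library ElementTree module rather than in XQuery, and the sample document, element names, and helper functions are hypothetical, invented only for illustration. A single query returns items whose internal structures have nothing in common, which a flat relational result could not express directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: three red things with unrelated internal structure.
SAMPLE = """
<things>
  <fruit color="red"><name>cherry</name><pit/></fruit>
  <flag color="red"><stripes>13</stripes></flag>
  <sign color="red">STOP</sign>
  <fruit color="yellow"><name>banana</name></fruit>
</things>
"""

def red_things(xml_text):
    """Return every element, at any depth, whose color attribute is 'red'."""
    root = ET.fromstring(xml_text)
    return root.findall(".//*[@color='red']")

def shape(element):
    """Summarize an element as (tag, names of its child elements)."""
    return (element.tag, [child.tag for child in element])

if __name__ == "__main__":
    for thing in red_things(SAMPLE):
        print(shape(thing))
```

Each result has a different tag and a different set of children, so there is no single row layout that fits all three.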
[June 12, 2003] IBM and Oracle Submit XQuery API for Java (XQJ) Java Specification Request. A Java Specification Request XQuery API for Java (XQJ) submitted by IBM and Oracle Corporation has been published through the Java Community Process (JCP). The specification design goal is to "develop a common API that allows an application to submit queries conforming to the W3C XQuery 1.0 specification to an XML data source and to process the results of such queries. The design of the API will also take into account precedents established by other JSRs, notably JDBC and JAXP. SQL, developed by INCITS H2 and ISO/IEC JTC 1/SC32/WG3, is the query language supported by many relational DBMSs. JDBC is the Java API that allows an application to submit SQL requests to an RDBMS and process the results of the query. The XQuery API for Java (XQJ) specification relates to XQuery in the same way that JDBC relates to SQL. The XQJ specification may provide the ability to submit XPath 2.0 expressions to an XML data source. It may also allow an application to specify queries using XQueryX, the XML representation of XQuery queries. The final XQJ Specification, Reference Implementation, and Technology Compatibility Kit will be made available on a Royalty-Free basis, with commonly-used disclaimers on warranties on the technologies; a reciprocal license will be required as per Section 5C of the Java Specification Participation Agreement (JSPA)." This JSR has received support from BEA, DataDirect Technologies, IBM, Oracle Corporation, Sun Microsystems, Sybase, and X-Hive Corporation.
[May 16, 2003] "Interactive Web Applications with XQuery." By Ivelin Ivanov. From XML.com (May 14, 2003). ['Ivelin Ivanov on using XQuery to front-end Amazon web services with HTML.'] "In this installment of Practical XQuery I continue to discuss practical uses of XQuery, focusing this time on interactive web page generation. The sample application will display a list of books; based on user input, it will also display detailed information about the selected title. The example exercises an interesting capability of the QEXO XQuery implementation, which is open source software. It allows front-end web applications with custom look and feel to be written very easily, using business logic from remote web services. Amazon.com allows anyone to register and build a collection of favorite titles. The collection can be seen either directly via HTML pages on the web site or accessed via a REST-style web service. In the latter case the XML response contains information about book titles, book image icons, and unique book identifiers. The identifier can be used to access another web service offered by Amazon, which supplies book details, including price, user rating, and so on. To create the example application, I will write two XQuery programs. The first, favbooks.xql, will provide a custom view of my favorite books. The second, bookdetails.xql, will show the details of a chosen book. To get a taste for what is coming, you can play with this example application..." See the open source Qexo: The GNU Kawa implementation of XQuery.
[May 06, 2003] "XQuery Marks the Spot." By Jack Vaughan. In Application Development Trends (May 05, 2003). "XML has emerged in only five years as a startlingly powerful means of handling data. It has also been accompanied by a slew of 'X-centric' helper tools, APIs and standards such as XSLT, XPath and, of late, XQuery. Generally, XML must exist within the context of other software system languages, and so Java developers, .NET developers, Cobol developers, SQL developers and others have had to learn something about this markup language that grew out of SGML, a relatively obscure document-related language. With more developers encountering XML all the time, some will soon look at XQuery as an option for system building. According to the W3C group that is working to specify XQuery, the markup language will build on the idea that a query language using the structure of XML intelligently can express queries across all kinds of data. To cast some light on this technology, we spoke recently with Jonathan Robie, XML program manager at DataDirect Technologies as well as a member of the W3C's XML Query Working Group that is at work on XQuery. Robie is an editor of the XQuery specification and several related specifications. 'In many development environments, people have to work with relational data, XML and data found in objects. These have three very different data models -- three very different ways of representing and manipulating data,' said Robie. 'For XML and relational data, XQuery allows you to work in one common model, an XML-based model that can also be used with XML views of relational data.' Thus, the data architect on a team may soon begin to look at some type of XML model to handle diverse needs. XML is, among other things, hierarchical in structure, and some modelers may seek to exploit this and other attributes. Of course, some critics suggest one model may not turn out to be the answer. 
But some XML-centric apps may do well with an XML-centric model, Robie suggests. 'Most Web sites have some connection to a database. Many are using XML to transfer data from the database to the Web site. When you want to present hierarchical information to site users, you don't give them a series of tables and ask them to do joins in their mind,' Robie jokes. 'You create a hierarchy on screen as an outline or graphical representation that shows the hierarchy. All the relational databases can give you is a table,' he said..." See references and summaries in the news story "W3C Releases Ten Working Drafts for XQuery, XSLT, and XPath."
[May 01, 2003] "Is XQuery an Omni-Tool?" By Uche Ogbuji. In Application Development Trends (May 01, 2003). "Most builders would scoff at the idea of replacing all of their specialized implements with an omni-tool, but is XML a different story? After all, with XQuery approaching recommendation status, one could argue that XML is about to get its very own omni-tool. XQuery can be a simple XML node access language, largely XPath 1.0 with tweaks and more core functions. It offers SQL-like primitives to process large XML data sets, and is a strongly typed system that offers static and dynamic typing (controversial features I've noted in the past). It also has primitives for generating output XML -- and the lack of separation of input and output facilities is questionable. XQuery does not yet allow XML database updating, but update language proposals are under consideration, and XQuery should soon add them. XQuery is a very important development for several reasons. For one, it comes with a very sophisticated formal model and semantics developed by some of the finest minds in the business. This may seem to be the least-functional attachment to the tool, but it is actually like the oil that keeps the motor from seizing. Given the formal model, a lot of questions about the effectiveness and operation of queries can be answered deterministically. This is important for consistent implementations and advanced techniques. The data model of XQuery is designed to be friendly to optimizers, which should allow for fast implementations. For users of W3C XML Schema Language, XQuery provides a very rich interface to the Post Schema Validation Infoset. XQuery syntax builds on the XPath expression language and adds a few SQL-like primitives. There is also an experimental XML-based syntax for the language, but it has been put aside for the moment. But in covering so many bases, XQuery attempts to set an overall standard for an XML processing model and mechanism prematurely. 
With its sprawling and ambitious requirements/use cases, XQuery claims territory in almost every task typical for developers using XML... Despite this, XQuery has a formalism of thinking about XML processing that can be selectively adopted by specialized tools. The danger is that too many will associate XQuery too closely as the heart of XML processing, obscuring superior solutions for each case..."
[April 15, 2003] "Processing RSS." By Ivelin Ivanov. From XML.com (April 09, 2003). ['Ivelin Ivanov makes light work of processing RSS files with XQuery. This is the first installment of a new regular column on XML.com, Practical XQuery. Ivelin Ivanov and Per Bothner will be publishing tips on the use of the XQuery language, as well as self-contained example applications.'] "The goal of this article is to demonstrate the use of XQuery to accomplish a routine, yet interesting task; in particular, to render an HTML page that merges RSS news feeds from two different weblogs. RSS has earned its popularity by allowing people to easily share news among and between web sites. And for almost any programming language used on the Web, there is a good selection of libraries for consuming RSS... Readers will benefit from a basic knowledge of the XQuery language; Per Bothner has written an informal introduction to XQuery. Even though XQuery started as an XML-based version of SQL, the language has a very broad application on the Web. In what follows, I will show that XQuery allows RSS feeds to be consumed and processed easily. In fact, we will see that it isn't necessary to use a specialized library. We will utilize only functions of the core language... The fact that XQuery recognizes XML nodes as first-class language constructs, combined with the familiar C-like language syntax, makes it an attractive tool for the problems it was built to solve. It must be noted that although it has a for loop structure, XQuery is a purely functional language. In short, this means that XQuery functions always return the same values given the same arguments. This is an important property of the language, which allows advanced compilation optimizations not possible for C or Java. In the past decade, functional language compilers have shown significant advantages over imperative language compilers. 
Their unconventional syntax and the inertia of imperative languages keep them under the radar of mainstream development. However, the XQuery team seems to recognize these weaknesses and is making an attempt to overcome them..."
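For readers who want a feel for the task the article describes, here is a rough analogue in Python, not the article's XQuery code. It assumes un-namespaced RSS 0.9x/2.0 feeds supplied as strings; the function names and the sample URLs are hypothetical. Like the XQuery functions discussed above, both helpers are pure: the same arguments always produce the same result.

```python
import xml.etree.ElementTree as ET
from html import escape

def items(rss_text):
    """Return (title, link) pairs for each <item> in an un-namespaced RSS feed."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title", ""), item.findtext("link", ""))
            for item in root.iter("item")]

def merged_html(*feeds):
    """Merge the items of several feeds into one HTML list, in the order given."""
    rows = [
        '<li><a href="{}">{}</a></li>'.format(escape(link), escape(title))
        for feed in feeds
        for title, link in items(feed)
    ]
    return "<ul>" + "".join(rows) + "</ul>"

if __name__ == "__main__":
    # Hypothetical feeds, for illustration only.
    feed_a = ("<rss><channel><item><title>One</title>"
              "<link>http://example.org/1</link></item></channel></rss>")
    feed_b = ("<rss><channel><item><title>Two &amp; more</title>"
              "<link>http://example.org/2</link></item></channel></rss>")
    print(merged_html(feed_a, feed_b))
```

Only the XML parser and string formatting are used here, which mirrors the article's claim that no specialized RSS library is needed.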
[April 04, 2003] "IBM Fortifying XML Query Language. Big Blue Partnering with Microsoft, Oracle to Boost Spec." By Paul Krill. In InfoWorld (April 04, 2003). "IBM is preparing to advance the XQuery XML query language on two fronts: by submitting with Microsoft a test suite for industry consideration and by working with Oracle on a Java API for the language. IBM and Microsoft on Friday [2003-04-04] plan to submit to the World Wide Web Consortium (W3C) a test suite for the as-yet-unfinished XQuery language, said Nelson Mattos, Distinguished Engineer for IBM in charge of information integration efforts in data management, in San Jose, Calif. The suite is simply called the XQuery Test Suite. 'One of the major milestones in establishing [XQuery as a] standard is a test suite that validates if an implementation conforms with the specification,' Mattos said. The test suite consists of a series of programs that illustrate the different features in XQuery and checks if a given implementation supports the features in a way defined by the standard, Mattos said. He stressed the significance of XQuery as a mechanism for searching and retrieving XML data. Support for XQuery will occur in products such as databases, including IBM's DB2 database, as well as in applications such as content management, document management, and information integration systems... Final W3C approval of XQuery as a formal recommendation, which is tantamount to being an industry standard, is expected later this year, according to a W3C spokesperson. W3C will be looking for other vendors to submit test suites as well, according to W3C. The test suite from IBM and Microsoft is to be submitted to the W3C XML Query Working Group. The test suite provides a framework for comparing specific implementations of XQuery to the W3C specification, according to Microsoft's Michael Rys, program manager for the SQL Server database, in Redmond, Wash. 
Microsoft plans to support XQuery in the Yukon release of SQL Server, due in a beta version by June [2003]... Also planned by IBM within a few weeks is formation, along with Oracle, of an expert group within the Java Community Process that would develop a Java API for XQuery to establish a standard way for a Java program to search for documents written in the XML language. IBM and Oracle would deliver the specification. The Java Community Process is an industry mechanism for adding standard technologies to the Java platform. The proposed API, which would be the subject of a Java Specification Request within JCP, would relate to XQuery in the same manner that JDBC relates to SQL, IBM said..." See the W3C XQuery website. General references in "XML and Query Languages."
[April 04, 2003] "Database Heavyweights Weigh In On XML Standard." By Lisa Vaas. In eWEEK (April 04, 2003). "Relational database heavyweights are pushing the XQuery standard for querying XML documents, with IBM and Microsoft Corp. expected to present a test suite for the standard to the W3C on Friday, and Oracle Corp. recently having posted a prototype of the standard on its site. The test suite IBM and Microsoft will present is considered an important milestone in finalizing a standard for querying XML data. If adopted by the W3C, the test suite will be used to check whether an XQuery implementation performs as standards dictate, thus ensuring that a given technology is portable across multiple applications that conform to the standard... IBM has pledged that when the XQuery standard is finalized, the company will plug the search technology into its DB2 database product family. The products that would adopt XQuery include DB2 Information Integrator, a product that grew out of IBM's Xperanto initiative that's designed to unify, integrate and search scattered repositories and formats of historical and real-time information as if they were one database; DB2 Universal Database; and DB2 Content Manager... According to Mattos, WebSphere Business Integrator, the Informix database, and Business Intelligence products such as Intelligent Miner or the Red Brick Warehouse will also support XQuery when it becomes an official W3C recommendation. XML aficionados differ on how much the XQuery standard matters. Timothy Chester, senior IT manager and adjunct faculty member at Texas A&M University, in College Station, Texas, called it an 'important step in a very small sandbox.' Chester uses XML for systems integration but doesn't query XML documents and thinks it's unlikely that many will give up the tried-and-true structured language of relational databases for XML..."
[March 31, 2003] "Case Study: Enabling Low-Cost XML-Aware Searching Capable of Complex Querying." By Brandon Jockman, W. Eliot Kimber, and Joshua Reynolds (ISOGEN International). Paper presented at XML Europe 2002. "There is a common need among XML projects to have fast, reliable, full-text searching of XML documents that can be applied to entire content repositories. For all but the most trivial cases, the solution must allow for complex querying of element content and attributes along with the ability to search for structural relationships among elements. For many use cases the ideal solution would minimize cost by using existing open source components as opposed to costly commercial alternatives. This paper describes the integration of XML-aware indexing and searching components with a full-text search engine. It details the approach used, the results, some alternatives, and important lessons learned along the way..."
[December 26, 2002] "Generating XML and HTML using XQuery." By Per Bothner. From XML.com. December 18, 2002. ['Often perceived mainly as a query language, XQuery can actually be used to generate XML and HTML. Per Bothner provides a worked example, and compares XQuery with XSLT.'] "XQuery is often described as a query language for processing XML databases. But it's also a very nice language for generating XML and HTML, including web pages. In this article we will look at XQuery from this angle. There are many tools for generating web pages. Many of these are based on templates. You write an HTML page, but you can embed within it expressions that get calculated by the web server... In order to demonstrate web page generation with XQuery, I will describe a photo album application. There are lots of such applications around, and while they differ in features, they all have the same basic idea. You throw a bunch of digital images (JPEG files) at the application, and it generates a bunch of web pages. The overview page shows many smaller thumbnail images; if you click on one, you get a bigger version of that image... if you have a task that matches XSLT's strength, by all means use XSLT. However, if you have a task that is a mix of XSLT-style transformation combined with some control logic, consider using XQuery, even for the part of the task that is XSLT-like. The advantage of XSLT over XQuery in applications best suited for XSLT is relatively minor, while the pain of writing more complex logic in XSLT instead of XQuery is considerable. The photo album is an application I first wrote in XSLT, but I was able to easily make significant improvements when I rewrote it in XQuery..."
[December 03, 2002] "XML Takes a Step Forward, Hits Snag on Another Front." By Lisa Vaas. In eWEEK (November 25, 2002). "While support for XML grew last week as IBM released DB2 Universal Database 8 with support for the language, support of a limited query language from a standards group could limit the broad use of XML. With the addition of Extensible Stylesheet Language Transformations, a SQL function for automatic style transformation, DB2 now has about 100 extensions to SQL that are built to support XML data. DB2 has caught up to Microsoft Corp.'s SQL Server 2000 and Oracle Corp.'s Oracle9i in its ability to handle Web services -- and it includes support for a Universal Description, Discovery and Integration registry. More XML support is what DB2 users such as Suppleyes.com Inc. are looking for as they anticipate business partners' and customers' use of XML. Suppleyes.com runs a business-to-business e-commerce system that automates purchasing and inventory management for large ambulatory surgery centers, many of which run DB2... Janet Perna, general manager of IBM Data Management Solutions, in Armonk, N.Y., told eWeek that future XML support includes XQuery -- the XML query language -- and native database support for XML in DB2... But at the World Wide Web Consortium Advisory Committee meeting last week, members confirmed that Version 1.0 of the working draft of XQuery will not include support for full-text search operations. As a result, most vendors of document-oriented XML databases will be forced to maintain their existing approaches to queries, which will limit the short-term usefulness of the proposed specification. Nelson Mattos, an IBM distinguished engineer, said a full-text version, which the W3C has developed in parallel, is still on track. 'One goal of developing it in parallel is that they could publish the XQuery portion without it if there was any delay with the full-text version,' said Mattos, in San Jose, Calif. 
Analysts say that, as the standard becomes more widely used, it has become imperative for relational DBMS vendors to support XQuery..."
[November 18, 2002] "Enosys Powers New Class Of Enterprise Information Integration Applications. Selection of Enosys as BEA Systems Partner Endorses XQuery. New XQuery Products from Microsoft, IBM and Oracle Mark Growing Demand for XQuery-based EII." - "Enosys Software, an XML-based information integration software provider, today announced a technology partnership with BEA Systems, the latest evidence of the technology industry's adoption of XQuery as the new standard for enterprise information integration (EII). The announcement follows recent news of XQuery-based products from Microsoft, Oracle, and IBM, as the increasing need for information integration catalyzes XQuery adoption. XQuery is a W3C standard that provides a vendor-neutral method for accessing, transforming, and integrating data from disparate sources. According to IDC, the number of discrete data sources storing mission-critical information has increased exponentially, with an average of four dozen applications and 14 databases deployed throughout the typical Fortune 1000 company. With the introduction of the XML standard, data from these systems and applications can now be made available for easy access and integration, and XQuery-based EII is the only solution that leverages the XML standard. Enhanced information access can enable more accurate, timely decision-making, shorten sales cycles, and improve customer service and supply chain effectiveness, among other benefits... Enosys technology has deep roots in research from Stanford and University of California, where query-able real-time XML views of multiple data sources were first proposed and researched by the founders and key members of the Enosys team. Enosys was the first to harness the power of XQuery for enterprise information integration. With an XQuery-based EII offering, users spend significantly less money and less time to achieve real-time integration of data from disparate sources. 
Enosys also served on the W3C committee that designed and reviewed the XQuery language specifications and was the first to market with an XQuery-compliant enterprise integration server and a hands-on XQuery training course..."
[November 05, 2002] "XQuery: An XML Query Language." By Donald Chamberlin. In IBM Systems Journal Volume 41, Number 4 (2002), pages 597-615 (with 20 references). "The World Wide Web Consortium has convened a working group to design a query language for Extensible Markup Language (XML) data sources. This new query language, called XQuery, is still evolving and has been described in a series of drafts published by the working group. XQuery is a functional language comprised of several kinds of expressions that can be nested and composed with full generality. It is based on the type system of XML Schema and is designed to be compatible with other XML-related standards. This paper explains the need for an XML query language, provides a tutorial overview of XQuery, and includes several examples of its use... XQuery expression-types include path expressions, element constructors, function calls, arithmetic and logical expressions, conditional expressions, quantified expressions, expressions on sequences, and expressions on types. XQuery is defined in terms of a data model based on heterogeneous sequences of nodes and atomic values. An instance of this data model may contain one or more XML documents or fragments of documents. A query provides a mapping from one instance of the data model to another instance of the data model. A query consists of a prolog that establishes the processing environment, and an expression that generates the result of the query. Currently, XQuery is defined only by a series of working drafts, and design of the language is an ongoing activity of the W3C XML Query Working Group. The working group is actively discussing the XQuery type system and how it is mapped to and from the type system of XML Schema. It is also discussing full-text search functions, serialization of query results, error handling, and a number of other issues. 
It is likely that the final XQuery specification will include multiple conformance levels; for example, it may define how static type-checking is done but not require that it be done by every conforming implementation. It is also expected that a subset of XQuery will be designated as XPath Version 2.0 and will be made available for embedding in other languages such as XSLT... Just as XML is emerging as an application-independent format for exchange of information on the Internet, XQuery is designed to serve as an application-independent format for exchange of queries. If XQuery is successful in providing a standard way to retrieve information from XML data sources, it will help XML to realize its potential as a universal information representation..."
[November 05, 2002] "XTABLES: Bridging Relational Technology and XML." By John E. Funderburk, Gerald Kiernan, Jayavel Shanmugasundaram, Eugene Shekita, and Catalina Wei. In IBM Systems Journal Volume 41, Number 4 (2002), pages 616-641 (with 35 references). "XML (Extensible Markup Language) has emerged as the standard data-exchange format for Internet-based business applications. These applications introduce a new set of data management requirements involving XML. However, for the foreseeable future, a significant amount of business data will continue to be stored in relational database systems. Thus, a bridge is needed to satisfy the requirements of these new XML-based applications while still using relational database technology. This paper describes the design and implementation of the XTABLES middleware system, which we believe achieves this goal. In particular, XTABLES provides a general framework to create XML views of relational data, query XML views, and store and query XML documents using a relational database system. Some of the novel features of the XTABLES architecture are that it (1) provides users with a single XML query language for creating and querying XML views of relational data, (2) executes queries efficiently by pushing most computation down to the relational database engine, (3) allows users to query seamlessly over relational data and meta-data, and (4) allows users to write queries that span XML documents and XML views of relational data... XTABLES exposes relational data as an XML view and also allows users to view native XML documents as XML views. Users can then query over these XML views using a general-purpose, declarative XML query language (XQuery), and they can use the same query language to create other XML views. Thus, users of the system always work with a single query language and can query seamlessly across XML views of relational data and XML documents. They can also query relational data and meta-data interchangeably. 
In addition to providing users with a powerful system that is simple to use, the declarative nature of user queries allows XTABLES to perform optimizations, such as view composition and pushing computation down to the underlying relational database system... We believe that the XTABLES system architecture can serve as the foundation for pursuing various avenues of future research. One such area is providing support for emerging XML query language features, such as updates. We have made some initial progress toward this goal by providing the ability to store or 'insert' XML documents into a certain class of XML views. However, the development of a general theory of 'updatable XML views' is an open research problem. Another interesting problem that arises in the context of XML query languages is providing support for information retrieval style queries. These are especially important for querying native XML documents." See also "From Data Management to Information Integration: A Natural Evolution," by Mary Roth and Dan Wolfson (DBTI for e-Business, IBM Silicon Valley Lab, June 2002).
[April 2002] "Ontology Storage and Querying." By Aimilia Magkanaraki, Grigoris Karvounarakis, Ta Tuan Anh, Vassilis Christophides, and Dimitris Plexousakis. Foundation for Research and Technology Hellas, Institute of Computer Science Information Systems Laboratory. Technical Report No 308. April 2002. 21 pages. Review of RDF tools and APIs. [*NB: "One of the conclusions drawn from this survey is that the majority of the query languages do not yet support the complete set of modeling constructs offered by the above standards. Furthermore, the frontiers between querying and inferring capabilities offered by these languages are not clear. In most cases, inference is limited to a recursive traversal of ontology class/property hierarchies as well as of data paths involving transitive properties. We believe that the target functionality of the proposed query languages still has to be justified with respect to real large-scale Semantic Web applications."] "The necessity for ontology building, annotating, integrating and learning tools is uncontested. However, the sole representation of knowledge and information is not enough. Human information consumers and web agents have to use and query ontologies and the resources committed to them, thus the need for ontology storage and querying tools arises. However, the context of storing and querying knowledge has changed due to the wide acceptance and use of the Web as a platform for communicating knowledge. New languages for querying (meta)data based on web standards (e.g., XML, RDF, Topic Maps) have emerged to enable the acquisition of knowledge from dispersed information sources, while the traditional database storage techniques have been adapted to deal with the peculiarities of the (semi)structured data on the web. 
The purpose of this chapter is to briefly present and evaluate a set of query languages and associated tools for ontology/resource storage and querying aiming to support large-scale Semantic Web applications, the next evolution step of the Web. This list of languages and tools is by no means complete and the tools presented are indicative of the tendency to provide full storage and query support to web-based ontology/metadata standards, such as RDF, RDFS, Topic Maps, DAML+OIL or the forthcoming Web Ontology Language. Our work in this chapter focuses on the evaluation of querying languages designed for Semantic Web related knowledge representation formalisms rather than general-purpose querying languages (e.g., SQL, Datalog, F-logic). Although it has to be proven in practice, RDF-enabled search technologies have the potential to provide a significant improvement over the current keyword-based engines or theme navigation search, especially when it comes to conceptual browsing and querying. Furthermore, this orientation facilitates the comparison of querying languages and tools, since it provides a common reference base. It should be stressed that our comparison of ontology query languages and tools does not rely on performance figures, since these would require extensive comparative experiments, which go beyond the scope of this work. On the contrary, we present an overview of general system features and query language expressiveness, while providing the interested reader with useful references to additional informative material..."
[October 18, 2002] "What is XQuery?" By Per Bothner. From XML.com (October 16, 2002). ['Article in XML.com's series of primers on core XML technologies. Per Bothner introduces the W3C's XML query language, XQuery 1.0. XQuery is designed to enable the query and formatting of XML data. Per provides an overview of XQuery's features and resources for learning more.'] "The W3C is finalizing the XQuery specification, aiming for a final release in late 2002. XQuery is a powerful and convenient language designed for processing XML data. That means not only files in XML format, but also other data including databases whose structure -- nested, named trees with attributes -- is similar to XML. XQuery is an interesting language with some unusual ideas. This article provides a high level view of XQuery, introducing the main ideas you should understand before you go deeper or actually try to use it... The first thing to note is that in XQuery everything is an expression which evaluates to a value. An XQuery program or script is just an expression, together with some optional function and other definitions. So 3+4 is a complete, valid XQuery program which evaluates to the integer 7. There are no side-effects or updates in the XQuery standard, though they will probably be added at a future date. The standard specifies the result value of an expression or program, but it does not specify how it is to be evaluated. An implementation has considerable freedom in how it evaluates an XQuery program, and what optimizations it does... XQuery borrows path expressions from XPath. XQuery can be viewed as a generalization of XPath. Except for some obscure forms (mostly unusual 'axis specifiers'), all XPath expressions are also XQuery expressions. For this reason the XPath specification is also being revised by the XQuery committee, with the plan that XQuery 1.0 and XPath 2.0 will be released about the same time...
One difference to note between XPath and XQuery is that XPath expressions may return a node set, whereas the same XQuery expression will return a node sequence. For compatibility these sequences will be in document order and with duplicates removed, which makes them equivalent to sets. XSLT is very useful for expressing very simple transformations, but more complicated stylesheets (especially anything with non-trivial logic or programming) can often be written more concisely using XQuery..."
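The node-set versus node-sequence point in the entry above can be illustrated outside XQuery. The following Python sketch is not from the article. It uses the standard library's xml.etree.ElementTree, and the sample document and the helper name in_document_order are assumptions made only for illustration. It takes a list of element nodes that contains duplicates and is out of order, and normalizes it to document order with duplicates removed, which is the property that lets such a sequence stand in for a set.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
DOC = (
    "<book><title>T</title>"
    "<chapter><title>C1</title></chapter>"
    "<chapter><title>C2</title></chapter></book>"
)

def in_document_order(root, nodes):
    """Return nodes sorted in document order with duplicates removed."""
    # A preorder walk of the tree visits elements in document order.
    position = {id(elem): i for i, elem in enumerate(root.iter())}
    seen = set()
    unique = []
    for node in nodes:
        if id(node) not in seen:  # node identity, not value equality
            seen.add(id(node))
            unique.append(node)
    return sorted(unique, key=lambda n: position[id(n)])

root = ET.fromstring(DOC)
titles = root.findall(".//title")
# A sequence with one duplicate and a scrambled order.
shuffled = [titles[2], titles[0], titles[2], titles[1]]
result = in_document_order(root, shuffled)
print([t.text for t in result])  # prints ['T', 'C1', 'C2']
```

Duplicates are detected by node identity rather than by text content, so two distinct elements with the same text would both be kept.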
[October 16, 2002] "Combining the Power of XQuery and XSLT. Toward Fulfilling the Promise of XML." By Jim Gan (Ipedo). In XML Journal Volume 03, Issue 10 (October 2002). With 5 figures. "Although XSLT and XQuery are designed to meet different requirements, there are similarities between them... I'll briefly compare XQuery and XSLT in terms of data model, content construct, modularization, and expression power... The XML access data model in the match attribute and select attribute of XSLT instructions is based on the XPath 1.0 data model, which is basically a tree structure with nodes. There are four data types in the XPath 1.0 data model: node-set, string, number, and boolean. There are seven XPath nodes: root node, element, text node, attribute node, processing instruction node, namespace node, and comment node. Note that XPath nodes are slightly different from the familiar W3C DOM nodes. There's no namespace node in DOM, nor is there a CdataSection or EntityReference in XPath nodes. The root node in the XPath node model corresponds to a document in DOM. Element, attribute, processing instruction, comment, and text in the XPath 1.0 model correspond one to one to their counterparts in DOM. XQuery is a strongly typed query language, with a data model based on XPath 2.0. It has document node, element node, attribute node, namespace node, processing instruction, comment, and text node. Document node is essentially the same as root node in XPath 1.0. Each node has a node identity concept and a type, which is derived from the corresponding XML Schema definition of the node. Instead of node set, sequence is introduced in the XQuery data model. A sequence is an ordered collection of zero or more items. An item may be an atomic value or a node. Both data models have the same document order concept. The document order of nodes is defined in order of node appearances in the physical XML document... XSLT and XQuery are two key technologies for XML processing. 
XSLT is good for transforming a whole XML document into another XML document with different schema or into a formatted output for presentation. XQuery can be used for XML-to-XML transformation too, but not for formatting XML documents. Like SQL to relational data, XQuery itself is a powerful language to intelligently search the XML data stored in XML. Another key feature of XQuery is that it can query various data sources as long as they can be viewed as XML. More and more existing non-XML information is being XMLized into XML, and all XML technologies can be used together for various applications. To fulfill the promise of XML, a complete XML platform is needed to support these technologies so as to gain real business benefits out of them. This includes XML parsing, XML Schema validation, XSLT, XQuery, XML data storage, and indexing. Supporting some XML technologies at the toolkit level, or one XML technology at a time, just isn't good enough..." [alt URL]
[August 26, 2002] "The Query Language TQL." By Giovanni Conforti, Giorgio Ghelli, Antonio Albano, Dario Colazzo, Paolo Manghi, and Carlo Sartiani (Dipartimento di Informatica, Università di Pisa, Pisa, Italy). From among the papers accepted for the Fifth International Workshop on the Web and Databases (WebDB 2002), Madison, Wisconsin - June 6-7, 2002. "This work presents the query language TQL, a query language for semistructured data, that can be used to query XML files. TQL substitutes the standard path-based pattern-matching mechanism with a logic-based mechanism, where the programmer specifies the properties of the pieces of data she is trying to extract. As a result, TQL queries are more 'declarative', or less 'operational', than queries in comparable languages. This feature makes some queries easier to express, and should allow the adoption of better optimization techniques. Through a set of examples, we show that the range of queries that can be declaratively expressed in TQL is quite wide. The implementation of TQL binding mechanism requires the adoption of non-standard techniques, and some of its aspects are still open. In this paper we implicitly report about the current status of the implementation by writing all queries using the version of TQL that has been implemented... Although the language TQL originates from the study of a logic for mobile ambients, for the simplest queries it turns out to be quite similar, in practice, to other XML query languages. However, the expression of queries which involve recursion, negation, or universal quantification, has in TQL a clear declarative nature, while other languages are forced to adopt a more operational approach. All queries presented in this paper are executable in the prototype version of the TQL evaluator, and can be found in the file demo.tql in the standard distribution. 
The current version of the prototype works by loading all data in main memory, but is already based on a translation into an intermediate TQL Algebra, with logical optimizations carried on both at the source and at the algebraic level. The intermediate algebra works on infinite tables of forests, represented in a finite way, and supports such operations as complement, to deal with negation, coprojection, to deal with universal quantification, several kinds of iterators, to implement the | operator, and a recursion operator. TQL is currently based on a unordered nested multisets data model. The extension of TQL's data model with ordering is an important open issue." TQL can be freely downloaded. [cache]
[March 29, 2002] "IBM Xperanto Demo." March 2002. ['Get a sneak preview of IBM's exciting new standards-based information integration technologies! Xperanto represents IBM's work combining emerging XML and XQuery standards with the power of data integration. This interactive demo shows how a newly-merged bank and financial services company uses XQuery as a single interface to deliver a single view of data to a customer and to a sales representative.'] "The IBM Xperanto demo is a technology preview that illustrates how IBM is advancing the state of integration technology with Xperanto, combining XML and the emerging standard, XQuery, with the power of data integration across relational databases, XML documents, flat files, spreadsheets, Web services, and more. The demo financial scenario page of the demo describes common situations for which this technology is a solution. The technology details pages display the queries and demonstrate how IBM integrates query, federation, Web services, and text search technologies using XQuery, the common query language for accessing XML. Using IBM's Xperanto, you can simplify data integration tasks for the new breed of Web and XML applications that require delivering a complete enterprise view of customers, partners, and services to improve customer service, supply chain management, and enterprise decision-making..." See also (1) "Meet the Experts: Jim Kleewein talks about the Xperanto Technology Demo"; (2) "Xperanto, Bridging Relational Technology and XML"; this second article describes the design and implementation of an XML middleware system to create XML views of relational data, query XML views and store/query XML documents using a relational database system.
[March 15, 2002] "XPERANTO: Bridging Relational Technology and XML." From International Business Machines Corporation, DB2 Developer Domain. By Catalina Fan, John Funderburk, Hou-in Lam, Jerry Kiernan, and Eugene Shekita (IBM Almaden Research Center, San Jose, CA 95120) and Jayvel Shanmugasundaram (Cornell University). [March 2002.] 9 pages. ['The cutting edge of data management research! The XPERANTO research project enables XML-based applications to leverage relational database technology by using XML views of existing relational data.'] "XML has emerged as the standard data-exchange format for Internet-based business applications. These applications introduce a new set of data management requirements involving XML. However, for the foreseeable future, a significant amount of business data will continue to be stored in relational database systems. Thus, a bridge is needed to satisfy the requirements of these new XML-based applications while still leveraging relational database technology. This paper describes the design and implementation of the XPERANTO middleware system, which we believe achieves this goal. In particular, XPERANTO provides a general framework to create and query XML views of existing relational data. One of the features provided by XPERANTO is the ability to create XML views of existing relational data. XPERANTO does this by automatically mapping the data of the underlying relational database system to a low-level default XML view. Users can then create application-specific XML views on top of the default XML view. These application-specific views are created using XQuery, a general-purpose, declarative XML query language currently being standardized by W3C. XPERANTO materializes XML views on demand, and does so efficiently by pushing down most computation to the underlying relational database engine. Another feature provided by XPERANTO is the ability to query XML views of relational data. 
This is important because users often desire only a subset of a view's data. Moreover, users often need to synthesize and extract data from multiple views. In XPERANTO, queries are specified using the same language used to specify XML views, namely XQuery. XPERANTO executes queries efficiently by performing XML view composition so that only the desired relational data items are materialized. In summary, XPERANTO provides a general means to publish and query XML views of existing relational data. Users always use the same declarative XML query language (XQuery) regardless of whether they are creating XML views of relational data or querying those views. ... XPERANTO exposes relational data as an XML view. Users can then query these XML views using a general-purpose, declarative XML query language (XQuery), and they can use the same query language to create other XML views. Thus, users of the system always work with a single query language. In addition to providing users with a powerful system that is simple to use, the declarative nature of user queries allows XPERANTO to perform optimizations such as view composition and pushing computation down to the underlying relational database system." See also "IBM Federated Database Technology," by Laura Haas and Eileen Lin.
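The idea of a low-level default XML view over relational data, as described in the entry above, can be sketched roughly as follows. This is not XPERANTO's actual mapping, which the abstract does not spell out. The element names db and row, the sample customer table, and the function name default_xml_view are all assumptions made for illustration, and the sketch uses only Python's standard sqlite3 and xml.etree.ElementTree modules.

```python
import sqlite3
import xml.etree.ElementTree as ET

def default_xml_view(conn):
    """Map each table to <table><row><column>value</column>...</row></table>.

    The layout is an assumption of this sketch, not XPERANTO's format.
    """
    root = ET.Element("db")
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        table_el = ET.SubElement(root, table)
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            row_el = ET.SubElement(table_el, "row")
            for col, value in zip(columns, row):
                # NULL becomes an empty element in this sketch.
                ET.SubElement(row_el, col).text = "" if value is None else str(value)
    return root

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

view = default_xml_view(conn)
print(ET.tostring(view, encoding="unicode"))
```

Unlike the system described in the paper, this sketch materializes the whole view eagerly. The paper's point is that views are materialized on demand and that most computation is pushed down to the relational engine.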
[March 09, 2002] "Path Predicate Calculus: Towards a Logic Formalism for Multimedia XML Query Languages." By Peiya Liu, Amit Chakraborty, and Liang H. Hsu (Siemens Corporate Research, Inc.) In Markup Languages: Theory & Practice 3/1 (Winter 2001), pages 93-106 (with 22 references). "Many document query languages are currently proposed for specifying document retrieval. But the formalisms for document query languages are still underdeveloped. An adequate formalism is critical for query language development and standardization. Classical formalisms, relational algebra and relational calculus, are used to evaluate the expressive power and completeness of relational query languages. Most relational query languages embed within them either one or a combination of these classical formalisms. However, these formalisms cannot be directly used for tree document query languages due to different underlying data models. In this paper, we propose a logic formalism, called path predicate calculus, based on a tree document model and paths for querying XML. In the path predicate calculus, the atomic logic formulas are element predicates rather than relation predicates as in relational calculus. In this path predicate calculus, queries are equivalent to finding all proofs of the existential closure of logical assertions in the form of path predicates that document elements must satisfy."
[February 12, 2002] "TREX-Q: A query language based on XML Schema." By Brad Penoff (Sun Microsystems, Ireland) and Chris Brew (Department of Linguistics, The Ohio State University). Pages 200-209 in Proceedings of the IRCS Workshop on Linguistic Databases (11-13 December 2001, University of Pennsylvania, Philadelphia, USA; Organized by Steven Bird, Peter Buneman and Mark Liberman; Funded by the National Science Foundation). "James Clark's TREX is a clean, simple and powerful schema language for XML. On this foundation we have implemented a query language (called TREXQ). The purpose of the present paper is to report our experiences in doing this, and to contribute to understanding of the issues that arise when a validator is extended into a query language. Aside from TREX, the strongest influences on our work are McKelvie's XMLQUERY and a similar algorithm described by Mark Hopkins in UseNet postings... We certainly need to search large corpora and examine the results. It is unlikely that the format in which the data is delivered to us will be convenient or appropriate for every search that we will need to carry out. We may therefore need to conduct a systematic transformation of a large corpus, with or without substantial human intervention. We should expect that data will be delivered to us with irregular, inconsistent, poorly documented or peculiar structures. A powerful query language will make all these processes easier. But we cannot ignore efficiency, because our corpora are very likely to be big. In particular, we cannot assume that we will be able to afford to read our corpus into main memory. We therefore need a query language implementation that handles memory efficiently, at least for the common case where the corpus is large but the search result is of manageable size.
Given these considerations, the starting point of this paper is the observation that validation technology is currently better understood than query language technology (at least for the kind of messy and irregular documents that arise in language technology). The leading idea of this paper is to present a methodology for turning an XML validator into a query engine..." [cache]
[January 10, 2002] "XML with Data Values: Typechecking Revisited." By Noga Alon (Tel Aviv University), Tova Milo (Tel Aviv University), Frank Neven (Limburgs Universitair Centrum), Dan Suciu (University of Washington), and Victor Vianu (University of California, San Diego). Presented at PODS [Principles of Database Systems] 2001, Santa Barbara, California, USA. 12 pages (with 26 references). "We investigate the typechecking problem for XML queries: statically verifying that every answer to a query conforms to a given output DTD, for inputs satisfying a given input DTD. This problem had been studied by a subset of the authors in a simplified framework that captured the structure of XML documents but ignored data values. We revisit here the typechecking problem in the more realistic case when data values are present in documents and tested by queries. In this extended framework, typechecking quickly becomes undecidable. However, it remains decidable for large classes of queries and DTDs of practical interest. The main contribution of the present paper is to trace a fairly tight boundary of decidability for typechecking with data values. The complexity of typechecking in the decidable cases is also considered... The decidability results highlight subtle trade-offs between the query language and the output DTDs: decidability is shown for increasingly powerful output DTDs ranging from unordered and star-free to regular, coupled with increasingly restricted versions of the query language. Showing decidability is done in all cases by proving a bound on the size of counterexamples that need to be checked. The technical machinery required becomes quite intricate in the case of regular output DTDs and involves a combinatorial argument based on Ramsey's Theorem. For the decidable cases we also consider the complexity of typechecking and show several lower and upper bounds. 
The undecidability results show that specialization in output DTDs or recursion in queries render typechecking unfeasible. If output DTDs use specialization, typechecking becomes undecidable even under very stringent assumptions on the queries and DTDs. Similarly, if queries can use recursive path expressions, typechecking becomes undecidable even for very simple output DTDs without specialization. Several questions are left for future work. We showed decidability of typechecking for regular output DTDs and queries restricted to be projection free. It is open whether the latter restriction can be removed. With regard to complexity, closing the remaining gaps between lower and upper bounds remains open. Beyond the immediate focus on typechecking, we believe that the results of the paper provide considerable insight into XML query languages, DTD-like typing mechanisms for XML, and the subtle interplay between them." [source]
[November 28, 2001] OASIS Technical Committee Proposed for TransQuery. A proposal for the creation of a 'TransQuery' OASIS Technical Committee has been made by Evan Lenz, Eric van der Vlist, Francis Norton, and Leigh Dodds. A SourceForge TransQuery project currently hosts a sample implementation of the TransQuery processing model. An OASIS mailing list has been formed for the discussion of TransQuery, which is "a small, flexible set of XSLT conventions and processing model constraints that enable the use of XSLT as a query language over multiple XML documents. TransQuery addresses interoperability between XML databases and document management systems that use XSLT as their primary data access language." [Full context]
[October 29, 2001] "A Decentralized XML Database Approach to Electronic Commerce." By Hiroshi ISHIKAWA and Manabu OHTA. In IEICE Transactions on Information and Systems (October 2001), pages 1302-1312 (with 19 references). The Institute of Electronics, Information and Communication Engineers (IEICE) / IEEE Joint Special Issue on Autonomous Decentralized Systems and Systems' Assurance. "Decentralized XML databases are often used in Electronic Commerce (EC) business models such as e-brokers on the Web. To flexibly model such applications, we need a modeling language for EC business processes. To this end, we have adopted a query language approach and have designed a query language, called XBML, for decentralized XML databases used in EC businesses. In this paper, we explain and validate the functionality of XBML by specifying e-broker business models and describe the implementation of the XBML server, focusing on the distributed query processing." See also the following bibliographic item. [cache]
[October 29, 2001] "An Active Web-based Distributed Database System for E-Commerce." By Hiroshi Ishikawa and Manabu Ohta (Tokyo Metropolitan University, Department of Electronics and Information Engineering). Paper presented at the International Workshop on Web Dynamics held in conjunction with the 8th International Conference on Database Theory, London, UK, 3-January-2001. 10 pages. See previous bibliographic entry.
[September 19, 2001] "Indexing and Querying XML Data for Regular Path Expressions." By Quanzhong Li and Bongki Moon (Department of Computer Science, University of Arizona, Tucson, AZ 85721, USA). Paper presented at the 2001 International Conference on Very Large Databases (VLDB 2001), Rome, Italy, September, 2001. 10 pages, with 25 references. "With the advent of XML as a standard for data representation and exchange on the Internet, storing and querying XML data becomes more and more important. Several XML query languages have been proposed, and the common feature of the languages is the use of regular path expressions to query XML data. This poses a new challenge concerning indexing and searching XML data, because conventional approaches based on tree traversals may not meet the processing requirements under heavy access requests. In this paper, we propose a new system for indexing and storing XML data based on a numbering scheme for elements. This numbering scheme quickly determines the ancestor-descendant relationship between elements in the hierarchy of XML data. We also propose several algorithms for processing regular path expressions, namely, (1) EE-Join for searching paths from an element to another, (2) EA-Join for scanning sorted elements and attributes to find element-attribute pairs, and (3) KC-Join for finding Kleene-Closure on repeated paths or elements. The EE-Join algorithm is highly effective particularly for searching paths that are very long or whose lengths are unknown. Experimental results from our prototype system implementation show that the proposed algorithms can process XML queries with regular path expressions by up to an order of magnitude faster than conventional approaches... The XQuery language is designed to be broadly applicable across all types of XML data sources from documents to databases and object repositories. 
The common features of these languages are the use of regular path expressions and the ability to extract information about the schema from the data. Users are allowed to navigate through arbitrary long paths in the data by regular path expressions. For example, XPath uses path notations as in URLs for navigating through the hierarchical structure of an XML document. Despite the past research efforts, it is widely believed that the current state of the art of the relational database technology fails to deliver all necessary functionalities to efficiently store XML and semi-structured data. Furthermore, when it comes to processing regular path expression queries, only a few straightforward approaches based on conventional tree traversals have been reported in the literature. Such approaches can be fairly inefficient for processing regular path expression queries, because the overhead of traversing the hierarchy of XML data can be substantial if the path lengths are very long or unknown. In this paper, we propose a new system called XISS for indexing and storing XML data based on a new numbering scheme for elements and attributes. The index structures of XISS allow us to efficiently find all elements or attributes with the same name string, which is one of the most common operations to process regular path expression queries. The proposed numbering scheme quickly determines the ancestor-descendant relationship between elements and/or attributes in the hierarchy of XML data. We also propose several algorithms for processing regular path expression queries... The new query processing paradigm proposed in this paper poses an interesting issue concerning XML query optimization. A given regular path expression can be decomposed in many different ways. Since each decomposition leads to a different query processing plan, the overall performance may be affected substantially by the way a regular path expression is decomposed. 
Therefore, it will be an important optimization task to find the best way to decompose an expression. We conjecture that document type definitions and statistics on XML data may be used to estimate the costs and sizes of intermediate results. In the current prototype implementation of XISS, all the index structures are organized as paged files for efficient disk IO. We have observed that trade-off between disk access efficiency and storage utilization. It is worth investigating the way to find the optimal page size or the break-even point between the two criteria." [cache]
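To make the numbering idea in the entry above concrete, here is a minimal Python sketch of how an element numbering can answer ancestor-descendant questions without walking the tree. It is not the numbering scheme from the paper, whose details are not reproduced in the abstract. It assumes a plain preorder number plus an exact descendant count per element, and the names number_elements and is_ancestor are hypothetical.

```python
import xml.etree.ElementTree as ET

def number_elements(root):
    """Label each element with (order, size).

    order: position in a preorder walk; size: number of descendants.
    This labelling is an assumption of the sketch, not the paper's scheme.
    """
    labels = {}
    counter = 0

    def visit(elem):
        nonlocal counter
        order = counter
        counter += 1
        for child in elem:
            visit(child)
        # Everything numbered since `order` is a descendant of elem.
        labels[elem] = (order, counter - order - 1)

    visit(root)
    return labels

def is_ancestor(labels, a, d):
    """True if a is a proper ancestor of d, using only the two labels."""
    order_a, size_a = labels[a]
    order_d, _ = labels[d]
    return order_a < order_d <= order_a + size_a

# Hypothetical sample document.
root = ET.fromstring("<a><b><c/></b><d/></a>")
labels = number_elements(root)
b, c, d = root.find("b"), root.find("b/c"), root.find("d")
print(is_ancestor(labels, root, c), is_ancestor(labels, b, d))
```

The check compares two pairs of integers, so its cost does not depend on how far apart the two elements are in the hierarchy. Exact descendant counts would need renumbering after insertions, which this sketch does not address.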
[September 18, 2001] "A Fast Index for Semistructured Data." By Brian F. Cooper, Neal Sample, Michael J. Franklin, Gísli R. Hjaltason, and Moshe Shadmon. Paper presented at the 27th VLDB Conference, Roma, Italy, September 13, 2001. 19 pages, with 32 references. Abstract: "Queries navigate semistructured data via path expressions, and can be accelerated using an index. Our solution encodes paths as strings, and inserts those strings into a special index that is highly optimized for long and complex keys. We describe the Index Fabric, an indexing structure that provides the efficiency and flexibility we need. We discuss how 'raw paths' are used to optimize ad hoc queries over semistructured data, and how 'refined paths' optimize specific access paths. Although we can use knowledge about the queries and structure of the data to create refined paths, no such knowledge is needed for raw paths. A performance study shows that our techniques, when implemented on top of a commercial relational database system, outperform the more traditional approach of using the commercial system's indexing mechanisms to query the XML." Detail: "... Typically, indexes are constructed for efficient access. One option for managing semistructured data is to store and query it with a relational database... An alternative option is to build a specialized data manager that contains a semistructured data repository at its core. Projects such as Lore and industrial products such as Tamino and XYZFind take this approach. It is difficult to achieve high query performance using semistructured data repositories, since queries are again answered by traversing many individual element-to-element links, requiring multiple index lookups. Moreover, semistructured data management systems do not have the benefit of the extensive experience gained with relational systems over the past few decades.
To solve this problem, we have developed a different approach that leverages existing relational database technology but provides much better performance than previous approaches. Our method encodes paths in the data as strings, and inserts these strings into an index that is highly optimized for string searching. The index blocks and semistructured data are both stored in a conventional relational database system. Evaluating queries involves encoding the desired path traversal as a search key string, and performing a lookup in our index to find the path. There are several advantages to this approach. First, there is no need for a priori knowledge of the schema of the data, since the paths we encode are extracted from the data itself. Second, our approach has high performance even when the structure of the data is changing, variable or irregular. Third, the same index can accelerate queries along many different, complex access paths. This is because our indexing mechanism scales gracefully with the number of keys inserted, and is not affected by long or complex keys (representing long or complex paths). Our indexing mechanism, called the Index Fabric, utilizes the aggressive key compression inherent in a Patricia trie to index a large number of strings in a compact and efficient structure. Moreover, the Index Fabric is inherently balanced, so that all accesses to the index require the same small number of I/Os. As a result, we can index a large, complex, irregularly-structured, disk-resident semistructured data set while providing efficient navigation over paths in the data. Indexing XML with the Index Fabric: Because the Index Fabric can efficiently manage large numbers of complex keys, we can use it to search many complex paths through the XML. In this section, we discuss encoding XML paths as keys for insertion into the fabric, and how to use path lookups to evaluate queries... We encode data paths using designators: special characters or character strings.
A unique designator is assigned to each tag that appears in the XML. The designator-encoded XML string is inserted into the layered Patricia trie of the Index Fabric, which treats designators the same way as normal characters, though conceptually they are from different alphabets. In order to interpret these designators (and consequently to form and interpret queries) we maintain a mapping between designators and element tags called the designator dictionary. When an XML document is parsed for indexing, each tag is matched to a designator using the dictionary. New designators are generated automatically for new tags. The tag names from queries are also translated into designators using the dictionary, to form a search key over the Index Fabric. ... Raw paths index the hierarchical structure of the XML by encoding root-to-leaf paths as strings. Simple path expressions that start at the root require a single index lookup. Other path expressions may require several lookups, or post-processing the result set. [Here] we focus on the encoding of raw paths. Raw paths build on previous work in path indexing. Tagged data elements are represented as designator-encoded strings. We can regard all data elements as leaves in the XML tree..." [cache 2001-09-18]
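The designator encoding of raw paths described in the entry above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Index Fabric itself. A sorted list searched with bisect stands in for the layered Patricia trie, designators are drawn from Unicode private-use code points, and the class and function names (DesignatorDictionary, raw_path_keys, lookup) and the sample invoice document are hypothetical.

```python
import bisect
import xml.etree.ElementTree as ET

class DesignatorDictionary:
    """Maps element tags to one-character designators, creating them on demand."""

    def __init__(self):
        self.by_tag = {}

    def designator(self, tag, create=True):
        if tag not in self.by_tag:
            if not create:
                return None
            # Private-use code points keep designators apart from data
            # characters (an assumption of this sketch).
            self.by_tag[tag] = chr(0xE000 + len(self.by_tag))
        return self.by_tag[tag]

def raw_path_keys(root, dictionary):
    """Encode each root-to-leaf path as designators followed by the leaf text."""
    keys = []

    def walk(elem, prefix):
        prefix += dictionary.designator(elem.tag)
        children = list(elem)
        if not children:
            keys.append(prefix + (elem.text or ""))
        for child in children:
            walk(child, prefix)

    walk(root, "")
    return sorted(keys)  # sorted list stands in for the trie

def lookup(keys, dictionary, tags, value=""):
    """Return keys matching a root-anchored tag path plus a value prefix."""
    parts = [dictionary.designator(t, create=False) for t in tags]
    if None in parts:  # a tag never seen while indexing cannot match
        return []
    prefix = "".join(parts) + value
    start = bisect.bisect_left(keys, prefix)
    matches = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches

# Hypothetical sample document.
doc = ET.fromstring(
    "<invoice><buyer><name>ABC Corp</name></buyer>"
    "<seller><name>Acme Inc</name></seller></invoice>"
)
dictionary = DesignatorDictionary()
keys = raw_path_keys(doc, dictionary)
hits = lookup(keys, dictionary, ["invoice", "buyer", "name"], "ABC")
print(len(hits))
```

As in the description above, a simple path expression that starts at the root becomes a single key lookup. Text that appears alongside child elements (mixed content) is ignored in this sketch.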
[August 15, 2001] "Microsoft Debuts Demo 2 of XML Query Tool." By Jeffrey Burt. In eWEEK August 14, 2001. "Microsoft Corporation today released the second demo of an XML query tool. XQuery is the Redmond, Wash., company's implementation of the latest version of the World Wide Web Consortium's XML Query Working Draft, which was released June 7. 'This is emerging technology,' said Philip DesAutels, product manager for XML Web services at Microsoft. 'We're rolling this out to our developer community to go out and get their input into it.' A committee of the W3C is working on an XML (Extensible Markup Language) query specification designed to enable users to extract data from documents on the Web and manipulate the data that is found. The committee released the first public draft of the technology in February, and Microsoft released its first demo version of XQuery in April. Eventually, XQuery will be incorporated throughout Microsoft's products, from its SQL Server database tools to the company's .Net frameworks, said Mark Fussell, lead program manager for XML technologies at Microsoft. More than 30,000 individual developers visited the Web site of the first demo, and Microsoft expects many more to do so with the second demo. Available on the Web or by download, the second demo enables developers to use it with .Net today and includes an interface to allow query results on SQL Server to be tagged as XML data..." See the recent news item, and the announcement from May 14, 2001: "Microsoft Hosts Online XQuery Prototype Application."
[July 05, 2001] "Extending XML Query Language. XQuery for Querying Relational Databases." By John Gao (Tracker Business System, Richland, WA) and Devin Smith (Pacific Northwestern National Lab). Masters Thesis (Washington State University). May, 2001. Abstract: "The objective of this research is to extend an XML query language for querying XML documents stored in relational databases. XML has become the dominant language for computerized data representations and exchanges. XML documents are mainly stored in file systems, but database systems can provide better data security and concurrency management. For practical reasons, relational databases will be the main storage system of XML documents if information in the XML documents is to be retrieved efficiently. SQL is primarily used for querying relational data. XML query languages are developed mainly for XML documents in file systems or non-relational databases. No query language is available for querying XML documents in relational databases efficiently. SQL extended with XML query power or an XML query language extended with relational query functionality may be used to query XML documents in relational databases. XQuery (working draft), the XML query language proposed by the World Wide Web Consortium, combines superior features from other XML query languages, SQL, and object-oriented query languages. XQuery can query relational databases only after data are transferred to XML documents, costing time and memory. The query structure of the FOR/LET-WHERE-RETURN (FLWR) in XQuery is similar to the SELECT-FROM-WHERE structure in SQL. Thus, XQuery was extended for querying relational databases directly by adding a Mode expression in FLWR to control data input and output types. Extended-XQuery can query XML documents in relational databases whether they are stored as blob fields or mapped into relational tables. Like SQL, Extended-XQuery can query relational data directly without any data transformation. 
A relational query engine from Interbase is used by Extended-XQuery for querying relational data in this research. Extended-XQuery queries are translated to SQL queries and then executed with the SQL engine. A user interface for translating and executing Extended-XQuery queries was implemented with Borland Delphi 5. To be a fully functional database language, XQuery also needs to be extended with the functionality for update, insert and deletion in future research..."
[June 26, 2001] "An Introduction to XQuery. A look at the W3C's proposed standard for an XML query language." By Howard Katz (Fatdog Software). From IBM developerWorks. June 2001. ['Howard Katz introduces the W3C's XQuery specification, currently winding its way toward Recommendation status after emerging from a long incubation period behind closed doors. The complex specification consists of six separate working drafts, with more to come. This article provides some background history, a road map into the documentation, and an overview of some of the technical issues involved in the specification. A sidebar takes a quick look at some key features of XQuery's surface syntax. Code samples demonstrate the difference between XQuery and XQueryX and show examples of the surface syntax.'] "The W3C's XQuery specification has been in the works for a long time. The initial query language workshop that kicked things off was hosted by the W3C in Boston in December 1998. Invited representatives from industry, academia, and the research community at the workshop had an opportunity to present their views on the features and requirements they considered important in a query language for XML. The 66 presentations, which are all available online, came mainly from members of two very distinct constituencies: those working primarily in the domain of XML as-document (largely reflecting XML's original roots in SGML), and those working with XML as-data -- the latter largely reflecting XML's ever-increasing presence in the middleware realm, front-ending traditional relational databases. The working group is large by W3C standards (I'm told that only the Protocol Working Group has a larger membership). Its composition of some 30-odd member companies reflects the views of both constituencies. What's now starting to coalesce into final form is an XML query language standard that very ably manages to represent the needs and perspectives of both communities. 
The key component of XQuery that will be most familiar to XML users is XPath, itself a W3C specification. A solitary XPath location path standing on its own (//book/editor meaning 'find all book editors in the current collection') is perfectly valid XQuery. On the data side, XQuery's SQL-like appearance and capabilities will be both welcome and familiar to those coming in from the relational side of the world..."
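[Illustration] The location path quoted above can be tried out with a small sketch. It assumes Python's standard `xml.etree.ElementTree` module, which supports only a limited XPath subset and is not an XQuery processor; the sample `collection` document is invented for the example, and the path is written `.//book/editor` because ElementTree evaluates paths relative to an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample collection; not taken from the article.
collection = ET.fromstring(
    "<collection>"
    "<book><title>A</title><editor>Smith</editor></book>"
    "<book><title>B</title><editor>Jones</editor><editor>Lee</editor></book>"
    "<journal><editor>Other</editor></journal>"
    "</collection>"
)

# 'find all book editors in the current collection'
editors = [e.text for e in collection.findall(".//book/editor")]
print(editors)  # ['Smith', 'Jones', 'Lee']
```

The journal editor is not selected, since the path only matches `editor` children of `book` elements.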
[May 11, 2001] "Efficient Evaluation of XML Middle-ware Queries." By Mary Fernández (AT&T Labs), Atsuyuki Morishima (University of Tsukuba), and Dan Suciu (University of Washington). Paper presented at ACM SIGMOD/PODS 2001, Santa Barbara, California, May 21-24, 2001. 12 pages. "We address the problem of efficiently constructing materialized XML views of relational databases. In our setting, the XML view is specified by a query in the declarative query language of a middle-ware system, called SilkRoute. The middle-ware system evaluates a query by sending one or more SQL queries to the target relational database, integrating the resulting tuple streams, and adding the XML tags. We focus on how to best choose the SQL queries, without having control over the target RDBMS... XML is the universal data-exchange format between applications on the Web. Most existing data, however, is stored in non-XML database systems, so applications typically convert data into XML for exchange purposes. When received by a target application, XML data can be re-mapped into the application's data structures or target database system. Thus, XML often serves as a language for defining a view of non-XML data. We are interested in the case when the source data is relational, and the exchange of XML data is between separate organizations or businesses on the Web. This scenario is common, because an important use of XML is in business-to-business (B2B) applications, and most business-critical data is stored in relational database systems (RDBMS). This scenario is also challenging, because the mapping from the relational model to XML is inherently complex and may be difficult to compute efficiently. Relational data is flat, normalized (3NF), and its schema is often proprietary. For example, relation and attribute names may refer to a company's internal organization, and this information should not be exposed in the exported XML data. 
In contrast, XML data is nested, unnormalized, and its schema (e.g., a DTD or XML Schema) is public. The mapping from the relational data to XML, therefore, usually requires nested queries, joins of multiple relations, and possibly integration of disparate databases. In this work, we address the problem of evaluating efficiently an XML view in the context of SilkRoute, a relational to XML middle-ware system. In SilkRoute, a relational to XML view is specified in the declarative query language RXL. An RXL query has constructs for data extraction and for XML construction. We are interested in the special case of materializing large RXL views. In practice, large, materialized views may be atypical: often the XML view is kept virtual, and users' queries extract small fragments of the entire XML view. For example, SilkRoute supports composition of user-defined queries in XML-QL and virtual RXL views and translates the composed queries into SQL. SilkRoute's query composition algorithm is described elsewhere. Our goal is to support data-export or warehousing applications, which require a large XML view of the entire database. In this case, computing the XML view may be costly, and query optimization can yield dramatic improvements...In our scenario, the XML document defined by an RXL view typically exceeds the size of main memory, therefore, the sorted, outer-union approach best suits our needs. This approach constructs one large, SQL query from the view query; reads the SQL query's resulting tuple stream; and then adds XML tags. The SQL query consists of several left-outer joins, which are combined in outer unions. The resulting tuples are sorted by the XML element in which they occur, so that the XML tagging algorithm can execute in constant space. SilkRoute initially used a more naive approach, in which the view query was decomposed into multiple SQL queries that do not contain outer joins or outer unions. 
Each result is sorted to permit merging and tagging of the tuples in constant space. We call this the fully partitioned strategy. This work makes two contributions. First, we show experimentally that neither of the above approaches is optimal. This is surprising for the sorted outer-union strategy, because only one SQL query is generated, and therefore has the greatest potential for optimization by the RDBMS. In experiments on a 100MB database, we found that the outer-union query was slower than the queries produced by the fully-partitioned strategy. We found that the optimal strategy generates multiple SQL queries, but fewer than the fully partitioned strategy, therefore the optimal SQL queries may contain outer joins and outer unions. XML tagging still uses constant space, because it merges sorted tuple streams. The optimal strategy executes 2.5 to 5 times faster than the sorted outer-union and fully-partitioned strategies... Generating SQL queries from an XML view definition is a tedious task, and as we have shown, different SQL-generation strategies dramatically affect query-evaluation time. These observations indicate that the user of a relational-to-XML publishing system should not be responsible for choosing SQL queries. To better support large XML views, we presented a method that decomposes the XML view definition into several, smaller SQL queries and submits the decomposed SQL queries to the target database. Our greedy algorithm for decomposing an XML view definition relies on query-cost estimates from the target query optimizer. This method works well in practice and generates execution plans that are near optimal. Although particularly effective in an XML middle-ware system, our view-tree representation can encompass the view-definition languages of commercial relational-to-XML systems. Commercial systems typically generate XML in-engine, because the cost of binding application variables to the tuples dominates execution time. 
Our decomposition method could be applied within a relational query optimizer as a preprocessing step to XML publishing of relational data in-engine. This work is focussed on publishing large XML documents in an environment in which the middle-ware system has no control over the physical environment or query optimizer of the target database. Given these constraints, our greedy algorithm for searching for optimal query plans is necessary and effective. The simpler outer-union strategy, however, might be adequate when the middle-ware system has more control over the target database. SilkRoute's generated optimal plans do better than the unified outer-union plan, because each individual query is smaller than the outer-union plan. Small queries are less likely to stress the query optimizer; they sort smaller result relations and therefore are less likely to spill tuples to disk; and they typically have many fewer null values than a unified query. An outer-union plan can be reduced by hand, which would provide the same benefits as automatic view-tree reduction. Assuming that the target database has plentiful memory and/or multiple disks, and efficiently supports null values, the resulting outer-union plan is likely to be comparable to SilkRoute's generated optimal plans. Finally, the outer-union plan may also be appropriate when a user query requests only a subset of the XML view, and the result document is small. In this scenario, the outer-union strategy should work well, because the resulting SQL query is usually simple. This scenario is considered in [SilkRoute: Trading], where the XML view of the database is virtual, and users query it using XML-QL." See also: Mary Fernández, Dan Suciu, and Wang-Chiew Tan: "SilkRoute: Trading Between Relations and XML," in Proceedings of WWW9 (2000).
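[Illustration] The claim that a sorted tuple stream can be tagged in constant space can be sketched as follows. This is a simplified illustration, not SilkRoute's algorithm: the row layout `(supplier_id, supplier_name, part_name)`, the element names, and the function `tag_sorted_rows` are assumptions made for the example, and the rows are assumed to arrive already sorted by `supplier_id`, as a left-outer-join query with an ORDER BY clause would deliver them.

```python
from typing import Iterable, Iterator, Optional, Tuple
from xml.sax.saxutils import escape, quoteattr

# Hypothetical row shape: part_name is None when a supplier has no parts
# (the null produced by a left outer join).
Row = Tuple[int, str, Optional[str]]


def tag_sorted_rows(rows: Iterable[Row]) -> Iterator[str]:
    """Emit nested XML from rows sorted by supplier_id.

    Only the id of the currently open supplier is remembered, so memory use
    does not grow with the size of the result.
    """
    current: Optional[int] = None
    yield "<suppliers>"
    for supplier_id, name, part in rows:
        if supplier_id != current:
            if current is not None:
                yield "</supplier>"  # close the previous group
            yield f"<supplier id={quoteattr(str(supplier_id))}><name>{escape(name)}</name>"
            current = supplier_id
        if part is not None:
            yield f"<part>{escape(part)}</part>"
    if current is not None:
        yield "</supplier>"
    yield "</suppliers>"


rows = [(1, "Acme", "bolt"), (1, "Acme", "nut"), (2, "Zed & Co", None)]
document = "".join(tag_sorted_rows(rows))
print(document)
```

Because the tagger is a generator, the output can be written to a file or socket piece by piece; the `"".join` above is only for display.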
[April 13, 2001] "XML Query Engine. The XQEngine utility lets you perform full-text-searches across multiple files using the XML Query Language (XQL) and Java." By Piroz Mohseni. From DevX XML-Zone (April 2001). "A while ago, I was looking for an XML search utility. My application had to search a relatively large number of XML files (they were small files) on a periodic basis. The primary goal was to find out if there was a match or not, but sometimes we needed to extract the 'found' data as well. I was first driven towards XSLT and its sister XPath thinking the search problem could be mapped into a transformation and solved that way. After some experimenting, I decided I really had a search problem at hand. The comma-separated values (CSV) output I needed was not appropriate for XSLT and full-text searching was not available. I decided the XML Query Language (XQL) seemed to better address my problem. As I was looking for implementations of XQL, I came across a small utility program called XQEngine which seemed like a good fit. In this article, I'll show you how you can use XQEngine for your search needs. XQEngine [available at Fatdog.com] is a JavaBean which uses a SAX parser to index one or more XML documents and then allows you to perform multiple searches on them. The search language is a superset of XQL which uses a syntax similar to XPath. Recently the XML Query working group at W3C released several new working documents so the language most definitely will change in the future... Searching XML documents continues to be an evolving challenge. Database vendors are now offering XML support and there are a number of new XML store and search solutions. XQEngine offers an effective search solution when simplicity is of more concern than scalability. It has several useful configurations and can return the search results in various formats." [From fatdog.com: 'XML Query Engine (XQEngine for short) is a full-text search engine component for XML. 
It lets you search small to medium-size collections of XML documents for boolean combinations of keywords, much as web-based search engines let you do for HTML. Queries are specified using XQL, a de facto standard for querying XML documents that is nearly identical to the simplified form of XPath. Queries expressed in XQL are much more expressive and powerful than the standard search interfaces available through web-based search engines. XML Query Engine is a compact (roughly 160K), embeddable component written in Java. It has a straightforward programming interface that lets you easily call it from your own Java application. The engine should work well as a personal productivity tool on an individual desktop, as part of a CD-based application, or on a server with low to medium-volume traffic.']
[March 24, 2001] "A Web Odyssey: From Codd to XML. [Invited Presentation.]" By Victor Vianu (UC San Diego). With 100 references. (so!) Paper presented at PODS 2001. Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). May 21 - 24, 2001. Santa Barbara, California, USA. "What does the age of the Web mean for database theory? It is a challenge and an opportunity, an exciting journey of rediscovery. These are some notes from the road... What makes the Web scenario different from classical databases? In short, everything. A classical database is a coherently designed system. The system imposes rigid structure, and provides queries, updates, as well as transactions, concurrency, integrity, and recovery, in a controlled environment. The Web escapes any such control. It is a free-evolving, ever-changing collection of data sources of various shapes and forms, interacting according to a flexible protocol. A database is a polished artifact. The Web is closer to a natural ecosystem. Why bother then? Because there is tremendous need for database-like functionality to efficiently provide and access data on the Web and for a wide range of applications. And, despite the differences, it turns out that database knowhow remains extremely valuable and effective. The design of XML query and schema languages has been heavily influenced by the database community. XML query processing techniques are based on underlying algebras, and use rewrite rules and execution plans much like their relational counterparts. The use of the database paradigm on the Web is a success story, a testament to the robustness of databases as a field. Much of the traditional framework of database theory needs to be reinvented in the Web scenario. Data no longer fits nicely into tables. Instead, it is self-describing and irregular, with little distinction between schema and data. This has been formalized by semi-structured data. 
Schemas, when available, are a far cry from tables, or even from more complex object-oriented schemas. They provide much richer mechanisms for specifying flexible, recursively nested structures, possibly ordered. A related problem is that of constraints, generalizing to the semi-structured and XML frameworks classical dependencies like functional and inclusion dependencies. Specifying them often requires recursive navigation through the nested data, using path expressions. Query languages also differ significantly from their relational brethren. The lack of schema leads to a more navigational approach, where data is explored from specific entry points. The nested structure of data leads to recursion in queries, in the form of path expressions. Other paradigms have also proven useful, such as structural recursion... One of the most elegant theoretical developments is the connection of XML schemas and queries to tree automata. Indeed, while the classical theory of query languages is intimately related to finite-model theory, automata theory has instead emerged as the natural formal companion to XML. Interestingly, research on XML is feeding back into tree automata theory and is re-energizing this somewhat arcane area of language theory. This connection is a recurring theme throughout the paper... In order to meaningfully contribute to the formal foundations of the Web, database theory has embarked upon a fascinating journey of rediscovery. In the process, some of the basic assumptions of the classical theory had to be revisited, while others were convincingly reaffirmed. There are several recurring technical themes. They include extended conjunctive queries, limited recursion in the form of path expressions, ordered data, views, incomplete information, active features. Automata theory has emerged as a powerful tool for understanding XML schema and query languages. 
The specific needs of the XML scenario have in turn provided feedback into automata theory, generating new lines of research. The Web scenario is raising an unprecedented wealth of challenging problems for database theory -- a new frontier to be explored." Note: the PODS 2001 conference has other papers on XML (query).
[March 19, 2001] "IBM Experiments With XML." By Charles Babcock. In Interactive Week (March 19, 2001). "IBM is experimenting with eXtensible Markup Language as a query language to get information from a much broader set of resources than rows and tables of data in relational databases. It has also built a working model of a 'dataless' database that assembles needed information from a variety of sources, after breaking down a user's query into parts that can be answered separately. The response sent back to the user offers the information as a unified, single presentation. The disclosures came as IBM pulled back the curtain on its database research at its Almaden Research Lab in San Jose where Project R was first fledged 25 years ago, leading to the DB2 database management system in the mid-1980s. At the briefing, it also disclosed that Don Chamberlin, IBM's primary author of the Structured Query Language (SQL), which became instrumental to the success of relational databases, was also behind XQuery, IBM's proposed XML query language before the World Wide Web Consortium. The W3C's XML Query Working Group released its first working draft of an XML query language on Feb. 15. IBM Fellow Hamid Pirahesh said 'XQuery has been taken as a base' by the W3C working group and would lead to a language that could be used more broadly than SQL. An XML-based query language could query repositories of documents, both structured and unstructured, such as e-mail, to find needed information... IBM, Microsoft and Software AG are all committed to bring out products based on an XML query language. Software AG, through its former American subsidiary, established Tamino as an XML-based database system over the last year. An IBM product will be launched before the end of June, Pirahesh said. 
Such future products may make it possible for sites rich in many forms of content, such as CNN, National Geographic or the New York Times, to find many additional ways to allow visitors to seek what they want or ask questions and obtain answers, said Jim Reimer, distinguished engineer at IBM.... Besides the proposed query language, IBM has built an experimental 'dataless' database system that gets the user the information needed from a variety of sources by breaking down a query into its parts. Each part is addressed to the database system or repository that can supply an answer, even though the data may reside in radically different systems and formats. When the results come back, they are assembled as one report or assembled view to the user. IBM plans to launch a product, Discovery Link, as an add-on to its DB2 Universal Server system in the second quarter. Discovery Link itself will contain no data but will have a database engine capable of parsing complex queries into simpler ones and mapping their route to the systems that can respond with results. The user will not need to know the name of the target database or repository or how to access it. Discovery Link will resolve those issues behind the scenes, said IBM Fellow Bruce Lindsay. The system will be a 'virtual database' or a federation of heterogeneous databases, and a pilot Discovery Link system has been in use for several months by pharmaceutical companies trying to research and manufacture new drugs..."
[February 22, 2001] "XQuery: Reinventing the Wheel?" By Evan Lenz (XYZFind Corp.). February 2001. "There is a tremendous amount of overlap in the functionality provided by XQuery, the newly drafted XML query language published by the W3C, and that provided by XSLT. The recommendation of two separate languages, one for XML query and one for XML transformations, if they don't have some sort of common base, may cause confusion as to which language should be used for various applications. Despite certain limitations, XSLT as it currently stands may function well as an XML query language. In any case, the development of an XML query language should be informed by XSLT... The proliferation of XML as a data interchange format and document format is creating new problems and opportunities in the field of information retrieval. While much of the world's information is housed in relational database management systems, not all information is able to fit within the confines of the relational data model. XML's hierarchical structure provides a unified format for data-centric information, document-centric information, and information that blurs the distinction between data and documents. Accordingly, a data model for XML could provide a unified way of viewing information, whether that information is actually stored as XML or not. Access to, extraction from, and manipulation of this information together comprise the problem of an XML query language. This paper explores some issues, advantages, and disadvantages of using XSLT as a query language for XML. It attempts to show that the basic building blocks of an XML query language can be found in XSLT, by way of an introduction to and comparison with XQuery, the newly drafted XML query language published by the W3C XML Query Working Group. This paper is not a proposal for a specific implementation. 
[Conclusion:] In the long run, the XML Query Working Group is probably doing the right thing in first formally defining the semantics of the query language. To attain the sophistication of query optimization that we currently have with SQL, an XML query language's underlying mathematics must be well understood. But these semantics should not be developed in a vacuum. However well understood a particular set of semantics is, we will not truly understand which set of semantics is useful in an XML query language until people have built real applications involving XML query. This is the reason why XSLT should be seriously addressed: it is the most widely used and implemented XML query language yet." Note 'This paper is adapted from what I'll be presenting on 'XSLT as a query language' at XSLT-UK.' See also the related posting on XQuery. On XSLT-UK: see the events listing. [cache]
[November 27, 2000] "Relationships Between Logic Programming and XML." By Harold Boley (Deutsches Forschungszentrum für Künstliche Intelligenz GmbH). Revised document, from Proceedings of the Fourteenth Workshop 'Logische Programmierung', Würzburg, January 2000. 17 pages, with 13 references. "Mutual relationships between logic programming and XML are examined. XML documents are introduced as linearized derivation trees and mapped to Prolog structures. Conversely, a representation of Herbrand terms and Horn clauses in XML leads to a pure XML-based Prolog (XmlLog). The XML elements employed for this are complemented by uses of XML attributes like id/idref for extended logics. XML's document type definitions are introduced on the basis of ELEMENT declarations for XmlLog and ATTLIST declarations for id/idref. Finally, queries in languages like XQL are treated functional-logically and inferences on the basis of XmlLog are presented. All concepts are explained via knowledge-representation examples... The simplicity of Web-based data exchange is beneficial for nonformal, semiformal and formal documents. For formal specifications and programs the Web permits a distributed development, usage and maintenance. Logic programming (LP) has the potential to serve as a standard language for this. However, the World Wide Web Consortium (W3C) has enhanced HTML (for nonformal and semiformal documents) into the Extensible Markup Language (XML) for semiformal and formal documents. Thus the issue of the relationships between XML and LP arises. Will logic programming have the chance, despite, or perhaps, precisely because of XML, to become a 'Web technology' for formal documents? Could the HTML-like syntax of XML for this be replaced by a Prolog-like syntax, or could it be edited or presented - over a standardized stylesheet - in such a Prolog syntax? Is SLD resolution a suitable starting point for the interpreter semantics of an XML query language like XQL? 
Or should an LP-oriented, inferential, query language be developed in the form of an XML-based Prolog? In the following text, such questions will be dealt with, and possible interplays between XML and LP, in both directions, will be discussed. The already foreseeable success of XML as a 'universal' interchange format for commercial practice (including E-commerce) can also be viewed as a success of the declarative representation technique; as it was proposed, in a somewhat different form by logic programming. Similarities and differences between these declarative paradigms will be later elaborated upon..." [cache]
Design and Analysis of Query Languages for Structured Documents -- A Formal and Logical Approach. By Frank Neven. PhD Dissertation. Limburgs Universitair Centrum (LUC), 1999. [cache]
[November 16, 2000] "Kweelt: the Making-of Mistakes Made and Lessons Learned." By Arnaud Sahuguet (University of Pennsylvania). November, 2000. 26 pages (with 32 references). ['We have just released a technical report that describes our experiment in implementing the Quilt XML query language.'] "In this paper we report our experience in building Kweelt, an open source Java framework for querying XML based on the recent Quilt proposal. Kweelt is intended to provide a reference implementation for the Quilt language but also to offer a framework for all kinds of experiments related to XML including storage, optimization, query language features, etc. And we report in this paper on the differences entailed by the use of two different storage managers, based respectively on character files and relational databases. An important design decision was to do a 'direct' implementation of Quilt. Instead of relying on preconceptions (and misconceptions!) inherited from our database query processing background, we wanted this reference implementation to expose exactly what is easy and what is hard both in terms of expressiveness and of efficiency. The process has led naturally to what may in hindsight be called mistakes, and to formulate lessons that will hopefully be used in future implementations to mix-and-match pieces of existing technology in databases and programming languages for optimal results." See further the PENN Database Research Group publications.
[November 15, 2000] "Object-Oriented Mediator Queries to XML Data." By Hui Lin, Tore Risch, and Timour Katchaounov (Uppsala DataBase Laboratory [UDBL], Department of Information Science, Uppsala University). Pages 38-45 (with 20 references) in Proceedings of the First International Conference on Web Information Systems Engineering [Proceedings of WISE 2000, Hong Kong, China, 19-21 June 2000.] Abstract: "The mediator/wrapper approach is used to integrate data from different databases and other data sources by introducing a middleware virtual database that provides high level abstractions of the integrated data. A framework is presented for querying XML data through such an Object-Oriented (OO) mediator system using an OO query language. The mediator architecture provides the possibility to specify OO queries and views over combinations of data from XML documents, relational databases, and other data sources. In this way interoperability of XML documents and other data sources is provided. The mediator provides OO views of the XML data by inferring the schema of imported XML data from the DTD of the XML documents, if available, using a set of translation rules. A strategy is used for minimizing the number of types (classes) generated in order to simplify the querying. If XML documents not having DTDs are read, or if the DTD is incomplete, the system incrementally infers the OO schema from the XML structure while reading XML data. This requires that the mediator database is capable of dynamically extending and modifying the OO schema. The paper overviews the architecture of the system and describes incremental rules for translating XML documents to OO database structures. [Conclusion:] We described the architecture of a wrapper called AmosXML that allows parsing and querying XML documents from an object-oriented mediator system. Furthermore, incremental translation rules were described that infer OO schema elements while reading DTD definitions or XML documents. 
Some rules infer the OO schema from the DTD, when available. For XML documents without DTDs, or when the DTD is incomplete, other rules incrementally infer the OO schema from the contents of the accessed XML documents. The discovery of OO schema structures combined with other OO mediation facilities in AMOS II allows the specification of OO queries and views over data from XML documents combined with data from other data sources. The incremental nature of the translation rules allows them to be applied in a streamed fashion, which is important for large data files and when the network communication is slow or bursty. There are several possible directions for our future work: (1) The current rules do not infer any inheritance, but a flat type hierarchy is generated. Rules should be added to infer inheritance hierarchies, e.g., by using behavioral definitions of types where a type is defined by its behavior (i.e., its attributes and methods). In our case this means that a type T is defined as a subtype of another type U if the set of functions on U is a subset of the set of functions on T. (2) Integrating XML data involves efficient processing of queries over many relatively small XML data files described by several layers of metadata descriptions and links. For example, there can be 'master' XML documents having links to other XML documents and DTDs. Therefore the query language needs to be able to transparently express queries referencing both XML data and the metadata in the master documents. New techniques are needed to be able to specify and efficiently process queries over such multi-layered metadata. (3) The conventional exhaustive cost-based query processing techniques do not scale over large numbers of distributed XML documents. New distributed heuristic techniques need to be developed for this." See also: (1) AMOS II - Active Mediators for Information Integration; (2) the UDBL publications page.
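[Illustration] The behavioral subtyping rule proposed under future work (1) can be sketched as a set-inclusion test. This is not code from AmosXML or AMOS II: the function `infer_subtypes`, the sample type names, and the choice of a proper-subset test (so that two types with identical function sets are not made subtypes of each other) are assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Tuple


def infer_subtypes(functions: Dict[str, FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Return (T, U) pairs where T is inferred to be a subtype of U.

    T is a subtype of U when the set of functions on U is contained in the
    set of functions on T. Assumption: the containment must be proper.
    """
    pairs: List[Tuple[str, str]] = []
    for t_name, t_funcs in functions.items():
        for u_name, u_funcs in functions.items():
            if t_name != u_name and u_funcs < t_funcs:  # proper subset
                pairs.append((t_name, u_name))
    return sorted(pairs)


# Hypothetical types inferred from XML elements, with their functions.
types = {
    "Person": frozenset({"name"}),
    "Employee": frozenset({"name", "salary"}),
    "Manager": frozenset({"name", "salary", "reports"}),
}
subtypes = infer_subtypes(types)
print(subtypes)
# [('Employee', 'Person'), ('Manager', 'Employee'), ('Manager', 'Person')]
```

The result includes transitive pairs such as `('Manager', 'Person')`; building an actual inheritance hierarchy would require reducing these to direct supertypes, which the sketch leaves out.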
[November 22, 2000] "Extensions of Attribute Grammars for Structured Document Queries." By Frank Neven (Limburgs Universitair Centrum, Universitaire Campus, Dept. WNI, Infolab, B-3590 Diepenbeek, Belgium). Presented at DBPL99 - The Eighth International Workshop on Database Programming Languages (Kinloch Rannoch, Scotland, September 1st - 3rd, 1999). 18 pages (with 46 references). "Widely-used document specification languages like, e.g., SGML and XML, model documents using extended context-free grammars. These differ from standard context-free grammars in that they allow arbitrary regular expressions on the right-hand side of productions. To query such documents, we introduce a new form of attribute grammars (extended AGs) that work directly over extended context-free grammars rather than over standard context-free grammars. Viewed as a query language, extended AGs are particularly relevant as they can take into account the inherent order of the children of a node in a document. We show that two key properties of standard attribute grammars carry over to extended AGs: efficiency of evaluation and decidability of well-definedness. We further characterize the expressiveness of extended AGs in terms of monadic second-order logic, establish the complexity of their non-emptiness and equivalence problem to be complete for EXPTIME, and consider several extensions of extended AGs. As an application we show that the Region Algebra expressions introduced by Consens and Milo can be efficiently translated into extended AGs. This translation drastically improves the known upper bound on the complexity of the emptiness and equivalence test for Region Algebra expressions... Structured document databases can be seen as derivation trees of some grammar which functions as the 'schema' of the database'. Document specification languages like, e.g., SGML and XML, model documents using extended context-free grammars. 
Extended context-free grammars (ECFG) are context-free grammars (CFG) having regular expressions over grammar symbols on the right-hand side of productions. It is known that ECFGs generate the same class of string languages as CFGs. Hence, from a formal language point of view, ECFGs are nothing but shorthands for CFGs. However, when grammars are used to model documents, i.e., when also the derivation trees are taken into consideration, the difference between CFGs and ECFGs becomes apparent... The classical formalism of attribute grammars, introduced by Knuth, has always been a prominent framework for expressing computations on derivation trees. Therefore, in previous work, we investigated attribute grammars as a query language for derivation trees of CFGs. Attribute grammars provide a mechanism for annotating the nodes of a tree with so-called 'attributes', by means of so-called 'semantic rules' which can work either bottom-up (for so-called 'synthesized' attribute values) or top-down (for so-called 'inherited' attribute values). Attribute grammars are applied in such diverse fields of computer science as compiler construction and software engineering. Inspired by the idea of representing transition functions for automata on unranked trees as regular string languages, we introduce extended attribute grammars (extended AGs) that work directly over ECFGs rather than over standard CFGs... By carefully tailoring the semantics of inherited attributes, extended AGs can take into account the inherent order of the children of a node in a document. [Related work:] Schwentick and the present author defined query automata to query structured documents. Query automata are two-way automata over (un)ranked trees that can select nodes depending on the current state and on the label at these nodes. Query automata can express precisely the unary MSO definable queries and have an EXPTIME-complete equivalence problem. This makes them look rather similar to extended AGs. 
The two formalisms are, however, very different in nature. Indeed, query automata constitute a procedural formalism that has only local memory (in the state of the automaton), but which can visit each node more than a constant number of times. Attribute grammars, on the other hand, are a declarative formalism, whose evaluation visits each node of the input tree only a constant number of times (once for each attribute). In addition, they have a distributed memory (in the attributes at each node). It is precisely this distributed memory which makes extended AGs particularly well-suited for an efficient simulation of Region Algebra expressions. It is, hence, not clear whether there exists an efficient translation from Region Algebra expressions into query automata. Extended AGs can only express queries that retrieve subtrees from a document. It would be interesting to see whether the present formalism can be extended to also take restructuring of documents into account. A related paper in this respect is that of Crescenzi and Mecca. They define an interesting formalism for the definition of wrappers that map derivation trees of regular grammars to relational databases. Their formalism, however, is only defined for regular grammars and the correspondence between actions (i.e., semantic rules) and grammar symbols occurring in regular expressions is not so flexible as for extended AGs. Other work that uses attribute grammars in the context of databases includes work of Abiteboul, Cluet, and Milo and Kilpeläinen et al. Finally, we compare extended AGs with the selective power of two query languages for XML. XML-QL is an SQL like query language for XML defined by Deutsch et al. that allows to define general tree to tree transformations. The selective power of XML-QL, however, is restricted to regular path expressions. Consequently, it can only take the structure of paths into account, not of whole subtrees. 
E.g., queries like the one in 'Example 13: retrieve every poem that has the words king or lord in every other verse' cannot be expressed. As another example we mention that XML-QL cannot select nodes whose children match a certain regular expression. The latter functionality is obtained by Papakonstantinou and Vianu by introducing a form of horizontal navigation into their selection patterns. These patterns can easily be simulated by extended AGs, but still cannot express the query of 'example 13'." See (1) "XML and Query Languages" and (2) "XML and Attribute Grammars." [cache]
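To make the notion of "selecting nodes whose children match a certain regular expression" concrete, here is a small Python sketch. It is not an attribute grammar evaluator and does not reproduce the paper's formalism; it only illustrates this kind of horizontal query using the standard library. The function name `select_by_child_pattern`, the `<tag>` encoding of child labels, and the sample document are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET


def select_by_child_pattern(root, pattern):
    """Return the elements whose ordered child labels match `pattern`.

    Each child is encoded as the string "<tag>", and the concatenation
    of these strings must match the regular expression in full.
    """
    regex = re.compile(pattern)
    selected = []
    for node in root.iter():
        child_labels = "".join(f"<{child.tag}>" for child in node)
        if regex.fullmatch(child_labels):
            selected.append(node)
    return selected


# Hypothetical document: only the first poem has a title followed by verses.
sample = ET.fromstring(
    "<doc><poem><title/><verse/><verse/></poem><poem><verse/></poem></doc>"
)
titled_poems = select_by_child_pattern(sample, r"<title>(<verse>)+")
```

The pattern here looks at the ordered sequence of sibling labels, which is the information a plain regular path expression over root-to-node paths does not see.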
[May 10, 2000] Aya Soffer published a call for papers in connection with an ACM SIGIR 2000 'Workshop On XML and Information Retrieval', to be held on July 28, 2000 in Athens, Greece. "XML - the eXtensible Markup Language has recently emerged as a new standard for data representation and exchange on the Internet. It has thus become crucial to address the question of how can we efficiently query and search large corpora of XML documents. To date, most work on storing, indexing, querying, and searching documents in XML has stemmed from the database community's work on semi-structured data. An alternative approach, which has received less attention to date, is to view XML documents as a collection of text documents with additional tags and relations between these tags. In this workshop, we will explore both approaches and investigate the relationship between IR and XML. Topics may include: (1) Extending IR technologies to search XML documents and integrating XML structure in IR indexing structures; (2) Querying XML documents both on content and structure; (3) Leveraging the semantics inherent to XML for the search process; (4) Relationships between XML and other text encoding and metadata standards; (5) Definition of standard DTDs/Schemas for IR tools such as search results and clustering outputs." Submissions are due on June 05, 2000. Further information is available on the workshop web site.
[May 19, 1999] "Query Languages [and the Web]." By Massimo Marchiori. Presentation Slides. From the Eighth International World Wide Web Conference (WWW8), Toronto, Canada. May 1999.
Jonathan Robie (Software AG) announced on XML-DEV that he has "set up a mailing list for XQL (XML Query Language)." This new list "is intended to answer questions about the definition of the language, how to implement it, who has implemented it in what products, and whatever else seems to be of interest. [Also, Robie] will use this list to try to reach consensus in the XQL community if decisions need to be made, e.g., to add new extensions."
XQL FAQ - By Jonathan Robie. Updated 26 Mar 1999 or later. "This FAQ and the associated XQL Mailing List are intended to point you to resources that will help you learn, use, or implement XQL."
[August 16, 2000] "Comparative Analysis of Five XML Query Languages." By Angela Bonifati and Stefano Ceri (Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci 32, I-20133 Milano, Italy). [Email: bonifati/[email protected]]. September 15, 1999, submitted for publication. Published: SIGMOD Record Volume 29, Number 1 (March 2000), pages 68-79. "XML is becoming the most relevant new standard for data representation and exchange on the WWW. Novel languages for extracting and restructuring the XML content have been proposed, some in the tradition of database query languages (i.e. SQL, OQL), others more closely inspired by XML. No standard for XML query language has yet been decided, but the discussion is ongoing within the World Wide Web Consortium and within many academic institutions and Internet-related major companies. We present a comparison of five, representative query languages for XML, highlighting their common features and differences. [...] Conclusions: A Unified View. The five reviewed languages can be organized in a taxonomy, where: (1) LOREL and XML-QL are the OQL-like and XML-like representatives of Class 2 of expressive query languages for XML, playing the same role as high-level SQL standards and languages (e.g., SQL2) in the relational world. Our study indicates that they need certain additions in order to become equivalent in power, in which case it would be possible to translate between them. Currently, a major portion of the queries that they accept can be translated from any one language to another. (2) XSL and XQL are representative of Class 1 of single-document query languages, playing the same role as core SQL standards and languages (e.g. the SQL supported by ODBC) in the relational world; they do not have joins. Their expressive power is included within the expressive power of Class 2 languages. 
Their rationale is to extract information from a single document, to be expressed as a single string and passed as one of the URL parameters. (3) XML-GL can be considered as a graphical query interface to XML, playing the same role as graphical query interfaces (e.g., QBE) in the relational world. The queries being supported by XML-GL are the most relevant queries supported by Class 2 languages. If the common features (as initially identified in this paper) will become fully understood, it is possible to envision a collection of translators between languages of the same class, and/or between languages of different classes, and/or from the graphic language XML-GL to the programmative languages of Classes 1 and 2. In this way, query languages for XML will constitute a language hierarchy similar to the one existing for relational and object-relational databases." [cache]
[August 30, 2000] "Efficient Evaluation of Regular Path Expressions on Streaming XML Data." By Zachary G. Ives, Alon Y. Levy, and Daniel S. Weld (University of Washington Database Group). Technical Report UW-CSE-2000-05-02, University of Washington, 2000. Submitted for publication. 22 pages (with 18 references). "The adoption of XML promises to accelerate construction of systems that integrate distributed, heterogeneous data. Query languages for XML are typically based on regular path expressions that traverse the logical XML graph structure; the efficient evaluation of such path expressions is central to good query processing performance. Most existing XML query processing systems convert XML documents to an internal representation, generally a set of tables or objects; path expressions are evaluated using either index structures or join operations across the tables or objects. Unfortunately, the required index creation or join operations are often costly even with locally stored data, and they are especially expensive in the data integration domain, where the system reads data streamed from remote sources across a network, and seldom reuses results for subsequent queries. This paper presents the x-scan operator which efficiently processes non-materialized XML data as it is being received by the data integration system. X-scan matches regular path expression patterns from the query, returning results in pipelined fashion as the data streams across the network. We experimentally demonstrate the benefits of the x-scan operator versus the approaches used in current systems, and we analyze the algorithm's performance and scalability across a range of XML document types and queries. [Conclusions:] In this paper we have presented the x-scan algorithm, a new primitive for XML query processing, that evaluates regular path expressions to produce bindings. 
X-scan is scalable to larger XML documents than previous approaches and provides important advantages for data integration, with the following contributions: (1) X-scan is pipelined and produces bindings as data is being streamed into the system, rather than requiring an initial stage to store and index the data. (2) X-scan handles graph-structured data, including cyclical data, by resolving and traversing IDREF edges, and it does this following document order and eliminating duplicate bindings. (3) X-scan generates an index of the structure of the XML document, while preserving the original XML structure. (4) X-scan uses a set of dependent finite state machines to efficiently compute variable bindings as edges are traversed. In contrast to semi-structured indexing techniques, x-scan constructs finite automata for the paths in the query, rather than for the paths in the data. (5) X-scan is very efficient, typically imposing only an 8% overhead on top of the time required to parse the XML document. X-scan scales to handle large XML sources and compares favorably to Lore and a commercial XML repository, sometimes even when the cost of loading data into those systems is ignored." Note from ZI home page: "Zack works with Professors Alon Levy and Dan Weld on the Tukwila data integration system. Tukwila uses adaptive query processing techniques to efficiently deal with processing heterogeneous, XML-based data from across the Internet. Current research in Tukwila is in new adaptive query processing techniques for streaming XML data, as well as policies for governing the use of adaptive techniques." [cache]
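The general idea of building an automaton from the query path (rather than from the data) and advancing it as elements stream in can be sketched in Python. This is a simplified illustration, not the x-scan algorithm itself: it ignores IDREF edges, variable bindings, and the structural index. The step syntax is an assumption of this sketch: a tag name matches one element, `*` matches any single element, and `**` matches zero or more nested elements. The names `closure`, `stream_matches`, and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def closure(states, steps):
    """Add states reachable without consuming an element ('**' may match nothing)."""
    result = set(states)
    frontier = list(states)
    while frontier:
        i = frontier.pop()
        if i < len(steps) and steps[i] == "**" and i + 1 not in result:
            result.add(i + 1)
            frontier.append(i + 1)
    return result


def stream_matches(source, steps):
    """Yield (tag, attributes) for each element matching the path, as it is parsed."""
    stack = [closure({0}, steps)]  # one state set per open element
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            next_states = set()
            for i in stack[-1]:
                if i == len(steps):
                    continue  # path already fully matched above this element
                if steps[i] == "**":
                    next_states.add(i)  # stay in '**' while descending
                elif steps[i] in ("*", elem.tag):
                    next_states.add(i + 1)
            next_states = closure(next_states, steps)
            stack.append(next_states)
            if len(steps) in next_states:
                yield elem.tag, dict(elem.attrib)
        else:
            stack.pop()
            elem.clear()  # release content of finished elements


# Hypothetical input: papers nested at different depths under a lab.
data = (b'<db><lab id="l1"><paper id="p1"/><group><paper id="p2"/></group></lab>'
        b'<paper id="p3"/></db>')
matches = list(stream_matches(io.BytesIO(data), ["db", "lab", "**", "paper"]))
```

Because `stream_matches` is a generator driven by `iterparse`, each match is produced when its start tag is seen, so a consumer can process results before the rest of the document has arrived.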
[October 03, 2000] "How to Structure and Access XML Documents With Ontologies." By Michael Erdmann and Rudi Studer (Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB) University of Karlsruhe, D-76128 Karlsruhe, Germany; Email: {erdmann|studer}@aifb.uni-karlsruhe.de). Date: 11-April-2000. 21 pages, 24 references. To appear in: Data and Knowledge Engineering, Special Issue on Intelligent Information Integration. "Currently dozens of XML-based applications exist or are under development. Many of them offer DTDs that define the structure of actual XML documents. Access to these documents relies on special purpose applications or on query languages that are closely tied to the document structures. Our approach uses ontologies to derive a canonical structure, i.e., a DTD, to access sets of distributed XML documents on a conceptual level. We will show how the combination of conceptual modeling, inheritance, and inference mechanisms on the one hand with the popularity, simplicity, and flexibility of XML on the other hand leads to applications providing a broad range of high quality information. [...] In this paper we showed that ontologies provide a compact, formal, and conceptually adequate way of describing the semantics of XML documents. By deriving DTDs from an ontology the document structure is grounded on a true semantic basis and thus, XML documents become adequate input for semantics-based processing. By providing a conceptual foundation for XML we achieve at the same time a way to access sets of differently structured XML documents rather independently of their actual linear representation. The ontology provides a shared vocabulary that integrates the different XML sources, that makes the information uniformly accessible, and thus mediates between the conceptual terms used by an information seeker and the actual markup used in XML documents. 
Our approach relates to work from the areas of semi-structured data models, query languages, and metadata. We do not claim that semi-structured data models or query languages are not relevant for XML, instead we claim that they need to be complemented by ontology-based approaches like ours (or, under certain circumstances, that pursued by ["Ontology-aware XML queries"]). They are powerful tools to retrieve the contents of documents based on the document structure. The data models of all these approaches (among others XML-QL, Lorel for XML, XSL, and XQL) directly reflect the document structure, i.e., its syntax. ONTOBROKER+XML abstracts from this structure and refers to the contents as concepts and relationships, instead. The relationship between our approach and RDF/RDFS is manifold. Both define an ontology (or schema) that is used to structure the contents of XML documents. We use Frame Logic and automatically derive a DTD that constrains the possible document structures. In the RDF/RDFS context the schema is encoded in XML itself using the primitives provided by RDFS. In both cases the ontology is seen as a semantic complement to the DTD describing syntactic properties, only. Both approaches encode the factual knowledge in XML. Differences lie in the expressible ontological primitives. Frame Logic comes from an object oriented and logic-based tradition where each class has its own local set of attributes, whereas in RDF attributes are global and not uniquely associated with a class. The expressibility of Frame Logic is richer than of RDF/RDFS, since in Frame Logic arbitrary axioms can be formulated to derive new information. This is currently not possible in RDF/RDFS. Since, it cannot be expected that application development always starts with modeling an ontology we must take care of existing XML document structures or XML Schemas, how they can be related to an ontology, or how they can be used to derive an ontology. 
This reverse direction allows us (i) to keep and use the existing XML documents and structures, (ii) to use all existing applications that create, access, manipulate, filter, render, and query these documents, and (iii) at the same time to benefit from the domain knowledge modeled in the ontology by utilizing smarter applications that can complement (or even replace) the existing applications in some areas, especially query answering." [cache]
[October 03, 2000] "Ontology-aware XML-Queries." By Michael Erdmann (Institut für Angewandte Informatik und Formale Beschreibungsverfahren [AIFB] University of Karlsruhe, D-76128 Karlsruhe, Germany) and Stefan Decker (Department of Computer Science, Stanford University, Stanford, CA 94305). Paper presented at WebDB 2000, Dallas, Texas, May 2000. "The Extensible Markup Language is accepted as the emerging standard for data interchange on the Web. XML allows authors to create their own markup (e.g. <Student>), which seems to carry some semantics. However, from a computational perspective tags like <Student> carry as much semantics as a tag like <H1>. A query answering facility simply does not know, what an author is and how the concept author is related to e.g., a concept person. We investigate technologies to enrich query-answering with XML documents using background knowledge about concept-structures. ... Several query-approaches for XML and XML-based languages are reported in the literature. We have already discussed XQL and XML-QL, and will now focus on other approaches. LORE (Lightweight Object Repository) is a DBMS designed specifically for querying data expressed in OEM (Object Exchange Model) and XML. It does not support ontology-aware queries, but is extensible using the sketched framework in our paper. RQL (RDF Query Language) does not support general XML, but RDF. It does support retrieval based on the subclass structure, but the subclass structure is given through an RDF schema, which defines subclassOf facts between terms. Our approach enables to use reasoning instead of a fix encoded subclass structure. Bar-Yossef queries documents based on the semantic tagging, but does not consider a specific ontology, and especially not a subclass structure and does not support XML. Because the relationship between XML structure and ontologies may be manifold, it is necessary to define several rewriting algorithms, and thus extend the applicability of our approach. 
As a next step in our work we plan to parameterize the XML/ontology mapping in a way that the rewriting algorithm can be easily adopted in different application scenarios." See also "XML and 'The Semantic Web'." [cache]
[October 04, 2000] "XML and Scheme." By Oleg Kiselyov. A micro-talk presentation at a Workshop on Scheme and Functional Programming 2000, Montréal, 17 September 2000. "This talk will propose consistent or conformant Scheme implementations of W3C Recommendations: XML Infoset, XPath query language and a small subset of XSL Transformations. At the top of the [W3C XML semantics] hierarchy is an XML information set, Infoset: an abstract data set that describes information available in a well-formed XML document. Infoset's goal is to present in some form all relevant pieces of data and their abstract, container-slot relationships to each other. The implementation of this nest of containers as well as means of accessing items and properties are beyond the scope of the Infoset specification. XML document, with tags in familiar angular brackets is one of the concrete instances of an XML Infoset. Conversion is through parsing/unparsing. The XML Path Language, XPath, makes the tree structure of XML Infoset explicit, and more practical. For example, XPath groups character information items into strings. Still the tree model of XPath is conceptual only, and does not mandate any particular implementation. XPath is a query language over an XPath tree. A Document Object Model (DOM) is another specialization of Infoset. It not only describes a tree view of a document but also makes the tree real to an application, via a set of interfaces to navigate the tree and query or update its nodes. XML Stylesheet Language Transformations, XSLT, build upon the XPath tree to describe transformations from branches and leaves of one tree onto another. This talk will show a conformant instance of this mostly abstract hierarchy with Scheme as an implementation language. Scheme represents the XML Infoset and the XPath tree -- which are data structures. Scheme is also used to express XPath queries and tree transformations... We have shown an s-expression-style for XML, XPath and XSLT. 
SXML, SXPath and SXSLT however are not merely s-expressions: they can denote executable Scheme code and can be evaluated by an eval function as they are. Thus an XML document and operations on it can be expressed in Scheme -- and regarded either as data structures or as code." Note in this connection also (1) 'An XML parsing/lexing framework' - "This framework is a collection of low- to high-level parsers for various productions defined in the XML Recommendation. The package is intended to be a set of 'Lego blocks' you can use to build a SAX or a DOM parser -- or a specialized lightweight parser for a particular document type. The framework also contains its own high-level XML parser: a procedure that turns an XML document or a well-formed part of it into the corresponding SXML form. The latter is an S-expression-based model of an XML Information Set. SXML is a 'relative' of DOM, whose data model is another instance of the XML Infoset. SXML is particularly suitable for Scheme-based XML/HTML authoring, SXPath queries, and tree transformations. The comments to a function SSAX:element -> SXML formally define SXML and give more details. The present XML parsing framework has a 'sequential' feel of SAX yet a 'functional style' of DOM. It parses XML in a pure functional style, with the input port being a monad (a linear, read-once parameter)." and (2) 'Evaluating SXML: XSL Transformations in Scheme'. [cache]
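The idea of representing an XML document as a nested S-expression can be approximated in Python with nested lists. The sketch below is only an analogue of the SXML shape described above, not Kiselyov's Scheme implementation: the function name `to_sxml` is hypothetical, attributes are placed in a sublist headed by `"@"`, and whitespace-only text is dropped as a simplifying assumption.

```python
import xml.etree.ElementTree as ET


def to_sxml(elem):
    """Convert an ElementTree element into a nested-list, SXML-like form.

    Shape: [tag, ["@", [name, value], ...], child-or-text, ...]
    The "@" sublist is present only when the element has attributes.
    """
    node = [elem.tag]
    if elem.attrib:
        node.append(["@"] + [[name, value] for name, value in elem.attrib.items()])
    if elem.text and elem.text.strip():
        node.append(elem.text)
    for child in elem:
        node.append(to_sxml(child))
        # Text following a child belongs to the parent's content.
        if child.tail and child.tail.strip():
            node.append(child.tail)
    return node


tree = to_sxml(ET.fromstring('<a href="x">hi<b/> there</a>'))
```

Once the document is an ordinary nested list, queries and transformations can be written as plain recursive functions over that list, which is the point the talk makes about treating documents as data structures.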
[August 30, 2000] "X-Scan: a Foundation for XML Data Integration." Project overview. From the University of Washington Database Group. 'The x-scan algorithm is a new operator designed to facilitate integration of XML data sources in the context of the Tukwila data integration system.' "The adoption of XML promises to accelerate construction of systems that integrate distributed, heterogeneous data. Query languages for XML are typically based on regular path expressions that traverse the logical XML graph structure; the efficient evaluation of such path expressions is central to good query processing performance. Most existing XML query processing systems convert XML documents to an internal representation, generally a set of tables or objects; path expressions are evaluated using either index structures or join operations across the tables or objects. Unfortunately, the required index creation or join operations are often costly even with locally stored data, and they are especially expensive in the data integration domain, where the system reads data streamed from remote sources across a network, and seldom reuses results for subsequent queries. We propose the x-scan operator, which efficiently processes non-materialized XML data as it is being received by the data integration system. X-scan matches regular path expression patterns from the query, returning results in pipelined fashion as the data streams across the network. We have experimentally demonstrated the benefits of the x-scan operator versus the approaches used in current systems and analyzed the algorithm's performance and scalability across a range of XML document types and queries. [...] X-scan is a new method for evaluating path expressions as data is streaming into the system. The input to x-scan is an XML data stream and a set of regular path expressions occurring in a query; x-scan's output is a stream of bindings for the variables occurring in the expressions. 
A key feature of x-scan is that it produces these bindings incrementally, as the XML data is streaming in; hence, x-scan fits naturally as the source operator to a complex pipeline, and it is highly suited for data integration applications. X-scan is motivated by the observation that IDREF links are limited to the scope of the current document, so in principle, the entire XML query graph for a document could be constructed in a single pass. X-scan achieves this by simultaneously parsing the XML data, indexing nodes by their IDs, resolving IDREFs, and returning the nodes that match the path expressions of the query. In addition to the path expression evaluation routines, x-scan includes the following functionality: (1) Parsing the XML document; (2) Node ID recording and reference resolving; (3) Creating a graph-structured index of the file; (4) Returning tuples of node locations..."
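The "node ID recording and reference resolving" step can be pictured with a short single-pass Python sketch. It is an illustration under stated assumptions, not the Tukwila implementation: the attribute names `id` and `ref` are hypothetical stand-ins (in real documents the DTD declares which attributes have type ID or IDREF), and elements are identified only by their tag names to keep the example small.

```python
import io
import xml.etree.ElementTree as ET


def resolve_idrefs(source, id_attr="id", ref_attr="ref"):
    """Resolve references in one pass over a stream of start-tag events.

    Returns (edges, pending): `edges` lists (referencing tag, ID, target tag)
    triples; `pending` maps IDs never defined to the tags that referenced them.
    """
    index = {}    # ID value -> tag of the element that carries it
    pending = {}  # ID value not yet seen -> tags of elements waiting for it
    edges = []
    for _, elem in ET.iterparse(source, events=("start",)):
        node_id = elem.get(id_attr)
        if node_id is not None:
            index[node_id] = elem.tag
            # Forward references to this ID can now be resolved.
            for waiting_tag in pending.pop(node_id, []):
                edges.append((waiting_tag, node_id, elem.tag))
        target = elem.get(ref_attr)
        if target is not None:
            if target in index:
                edges.append((elem.tag, target, index[target]))
            else:
                pending.setdefault(target, []).append(elem.tag)
    return edges, pending


# Hypothetical input with a forward reference, a backward reference,
# and one reference to an ID that never appears.
doc = b'<db><cite ref="p1"/><paper id="p1"/><note ref="p1"/><cite ref="zz"/></db>'
edges, unresolved = resolve_idrefs(io.BytesIO(doc))
```

Keeping unresolved references in a side table is what allows forward references to be handled without a second pass, since IDREF targets are confined to the current document.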
[August 30, 2000] "XML Query Languages in Practice: An Evaluation." By Zachary G. Ives and Ying Lu. Paper presented at Web Age Information Management 2000, Shanghai, China. Abstract: "The popularity of XML as a data representation format has led to significant interest in querying XML documents. Although a 'universal' query language is still being designed, two language proposals, XQL and XML-QL, are being implemented and applied. Experience with these early implementations and applications has been instructive in determining the requirements of an XML query language. In this paper, we discuss issues in attempting to query XML, analyze the strengths and weaknesses of current approaches, and propose a number of extensions. We hope that this will be helpful both in forming the upcoming XML Query language standard and in supplementing existing languages. [Conclusion:] In this paper, we have described the two most widely accepted XML query languages, XQL and XML-QL, and examined how they can be applied to three different domains: relational queries, queries over arbitrary XML data, and graph-structured scientific applications. While we believe this to be the first analysis of XML query languages' applicability, issues in designing an XML query language have been frequently discussed in the literature. Recently, Bonifati and Ceri presented a survey of five major XML query languages that compared the features present in each. The goal of this paper is more than to provide a feature comparison: we hope to promote a greater understanding of XML query semantics, and to detail some of the problems encountered in trying to apply these languages. 
While a query language containing the 'union' of the features present in XQL and XML-QL will go a large way towards solving the needs of querying XML, we also propose a number of extensions that we feel are necessary: (1) An XML graph model with defined order between IDREFs and subelements; (2) Regular path expression extensions for subelement, IDREF, or arbitrary edges; (3) Support for 'optional' path expression components and null values; (4) Support for following XPointers; (5) Pruning of query output; (6) Clearer semantics for copying subgraphs to query output." [cache]
[September 15, 2000] "XDuce: An XML Processing Language." Preliminary Report. By Haruo Hosoya and Benjamin C. Pierce (Department of CIS, University of Pennsylvania). In Proceedings of Third International Workshop on the Web and Databases (WebDB2000). May 18-19, 2000, Adam's Mark Hotel, Dallas, TX. 6 pages, 15 references. "Among the reasons for the popularity of XML is the hope that the static typing provided by DTDs (or more sophisticated mechanisms such as XML-Schema) will improve the safety of data exchange and processing. But, in order to make the best use of such typing mechanisms, we need to go beyond types for documents and exploit type information in static checking of programs for XML processing. In this paper, we present a preliminary design for a statically typed programming language, XDuce (pronounced 'transduce'). XDuce is a tree transformation language, similar in spirit to mainstream functional languages but specialized to the domain of XML processing. Its novel features are regular expression types and a corresponding mechanism for regular expression pattern matching. Regular expression types are a natural generalization of DTDs, describing, as DTDs do, structures in XML documents using regular expression operators (i.e., *, ?, |, etc.). Moreover, regular expression types support a simple but powerful notion of subtyping, yielding a substantial degree of flexibility in programming. Regular expression pattern matching is similar to ML pattern matching except that regular expression types can be embedded in patterns, which allows even more flexible matching. In this preliminary report, we show by example the role of these features in writing robust and flexible programs for XML processing. After discussing the relationship of our work to other work, we briefly sketch some larger applications that we have written in XDuce, and close with remarks on future work. A formal definition of the core language can be found in the full version of this paper... 
XDuce's values are XML documents. A XDuce program may read in an XML document as a value and write out a value as an XML document. Even values for intermediate results during the execution of the program have a one-to-one correspondence to XML documents (besides some trivial differences). As concrete syntax, the user has two choices: XML syntax or XDuce's native syntax. [In the paper] we have presented several examples of XDuce programming and shown how we can write flexible and robust programs for processing XML by combining regular expression types and regular expression pattern matching. We consider XDuce suitable for applications involving rather complicated tree transformation. Moreover, for such applications, our static typing mechanism would help in reducing development periods. In this view, we have built a prototype implementation of XDuce and used it to develop some small but non-trivial applications... [Related work:] Mainstream XML-specific languages can be divided into query languages such as XML-QL and Lorel and programming languages such as XSLT. In general, when one is interested in rather simple information extraction from XML databases, programs in programming languages are less succinct than the same programs in a suitable query language. On the other hand, programming languages tend to be more suitable for writing complicated transformations like conversion to a display format. XDuce is categorized as a programming language... (a) A recent example of the embedding approach is Wallace and Runciman's proposal to use Haskell as a host language for XML processing. The only thing they add to Haskell is a mapping from DTDs into Haskell datatypes. (b) Another piece of work along similar lines is the functional language XMlambda for XML processing, proposed by Meijer and Shields. 
Their type system is not described in detail in this paper, but seems to be close to Haskell's, except that they incorporate Glushkov automata in type checking, resulting in a more flexible type system. (c) A closer relative to XDuce is the query language YAT, which allows optional use of types similar to DTDs. The notion of subtyping between these types is somewhat weaker than ours (lacking, in particular, the distributivity laws used in the 'database evolution' example in Section 2.1). Types based on tree automata have also been proposed in a more abstract study of typechecking for a general form of 'tree transformers' for XML by Milo, Suciu, and Vianu. The types there are conceptually identical to those of XDuce. The type system of XDuce was originally motivated by the observation by Buneman and Pierce that untagged union types corresponded naturally to forms of variation found in semistructured databases." See the XDuce web site; XDuce examples; download XDuce system 0.1.10. [cache]
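The idea that a type describes the children of an element with regular expression operators can be illustrated with a small sketch. This is not XDuce and does no static checking; it only tests, at run time, whether one document conforms to a content model. The address-book model, the `CONTENT_MODELS` table, and the `conforms` function are hypothetical names chosen for illustration, and each model is written as an ordinary Python regular expression over space-terminated child tag names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, written as regexes over child tag names.
# Each child tag is followed by one space, so "name email " is a sequence.
CONTENT_MODELS = {
    "addrbook": r"(person )*",             # person*
    "person": r"name (email )*(tel )?",    # name, email*, tel?
    "name": r"",                           # no element children
    "email": r"",
    "tel": r"",
}


def conforms(elem):
    """Return True if elem and all of its descendants match their models."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return False  # unknown element name
    children = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in elem)
```

A call such as `conforms(ET.fromstring("<addrbook><person><name>A</name></person></addrbook>"))` checks one document against the model. XDuce, by contrast, checks programs against such types before they run.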
"Adding Relevance to XML." By Anja Theobald and Gerhard Weikum (Department of Computer Science, University of the Saarland, Germany; WWW: http://www-dbs.cs.uni-sb.de; Email: {theobald, weikum} @cs.uni-sb.de). Presented at [Session 3: Querying XML] WebDB 2000, May 18-19, 2000, Adam's Mark Hotel, Dallas, TX. 13 references. "XML query languages proposed so far are limited to Boolean retrieval in the sense that query results are sets of qualifying XML elements or subgraphs. This search paradigm is intriguing for 'closed' collections of XML documents such as e-commerce catalogs, but we argue that it is inadequate for searching the Web where we would prefer ranked lists of results based on relevance estimation. IR-style Web search engines, on the other hand, are incapable of exploiting the additional information made explicit in the structure, element names, and attributes of XML documents. In this paper we present a compact query language that reconciles both search paradigms by combining XML graph pattern matching with relevance estimations and producing ranked lists of XML subgraphs as search results. The paper describes the language design and sketches implementation issues. . . XML is the main driving force in the ongoing endeavour for data integration across the entire spectrum from largely unstructured to highly schematic data. In an abstract sense, all data is uniformly captured by a graph with nodes representing XML elements along with their attributes. A variety of query languages have been proposed for searching XML data [...] These languages essentially combine SQL-style logical conditions over element names, contents, and attributes with regular-expression pattern matching along entire paths of elements. The result of a query is a set of paths or subgraphs from a given graph that represents an XML document or document collection. 
Although originally motivated by Web searching, the key target of XML query languages has shifted to searching over a single or a small number of federated XML repositories such as electronic shopping catalogs. In this setting, the focus on Boolean retrieval, where an XML path is either a query match or does not qualify at all, is adequate. In the Web, however, where the graph for an XML document is conceptually extended by following outgoing links to other sites, ranked retrieval remains the preferred search paradigm as it is practically impossible to compute exhaustive answers for Boolean-search queries. Thus, traditional Web search engines, typically based on variations of the vector space model, remain the only viable choice for large-scale information retrieval on the Web. This well established technology, on the other hand, disregards the opportunities for more effective retrieval that arise from the fact that XML-based data makes more structure and semantic annotations (i.e., element and attribute names) explicit. In this paper, we argue that XML query languages should be extended by information-retrieval-style similarity conditions, so that a query returns a ranked list sorted by descending relevance. We propose a concrete, simple language along these lines, which we have coined XXL for "flexible XML search language". To this end, we have adopted core concepts of XML-QL, and have extended them with similarity conditions on elements and their attributes. The relevance assessments, also known as 'scores', for all elementary conditions in a query, are combined into an overall relevance of an XML path, and the result ranking is based on these overall relevance measures. . . The biggest challenge in our future work lies in making the approach scalable and perform well on distributed Web data. 
In this broader context, Oracle8i interMedia as a thesaurus-backed text search engine would have to be replaced by a Web search engine, which is relatively easy with our modular architecture. In addition and more importantly, the presented traversal procedure has to be extended for this purpose. For more efficient traversal we consider prefetching techniques, and we are looking into approximative index structures as a search accelerator. Both techniques may exploit the fact that we are heuristically computing a relevance ranking, so that a faster but less complete result is usually tolerable." [cache, alt URL]
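The ranking step described in the abstract above (elementary scores combined into an overall relevance per path, results sorted by descending relevance) can be sketched in a few lines. The abstract does not say how the scores are combined; multiplying them is an assumption made here for illustration, and `rank_paths` and its input format are hypothetical.

```python
from math import prod


def rank_paths(candidates):
    """Rank candidate paths by combined relevance, highest first.

    candidates: list of (path_label, scores), where scores holds the
    relevance of each elementary condition, each in the range [0, 1].
    The product is an assumed combination rule, not taken from the paper.
    """
    scored = [(label, prod(scores)) for label, scores in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With a product, one poorly matching condition lowers the whole path's rank, which approximates a conjunctive query. Other combination rules (minimum, weighted sum) would fit the same interface.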
"Argos: Efficient Refresh in an XQL-Based Web Caching System." By Luping Quan, Li Chen, Elke A. Rundensteiner (Worcester Polytech). Presented at WebDB 2000, May 18-19, 2000, Adam's Mark Hotel, Dallas, TX. [cache] [alt URL]
[August 30, 2000] "XPERANTO: Publishing Object-Relational Data as XML." By Michael Carey, Daniela Florescu, Zachary Ives, Ying Lu, Jayavel Shanmugasundaram, Eugene Shekita, and Subbu Subramanian. Third International Workshop on the Web and Databases (WebDB), May 2000, Dallas, TX. "Since its introduction, XML, the Extensible Markup Language, has quickly emerged as the universal format for publishing and exchanging data in the World Wide Web. As a result, data sources, including object-relational databases, are now faced with a new class of users: clients and customers who would like to deal directly with XML data rather than being forced to deal with the data source's particular (e.g., object-relational) schema and query language. The goal of the XPERANTO project at the IBM Almaden Research Center is to serve as a middleware layer that supports the publishing of XML data to this class of users. XPERANTO provides a uniform, XML-based query interface over an object-relational database that allows users to query and (re)structure the contents of the database as XML data, ignoring the underlying SQL tables and query language. In this paper, we give an overview of the XPERANTO system prototype, explaining how it translates XML-based queries into SQL requests, receives and then structures the tabular query results, and finally returns XML documents to the system's users and applications. [Conclusions:] In this paper, we have described a systematic approach to publishing XML data from existing object-relational databases. As we have explained, our work on XPERANTO is based on a 'pure XML' philosophy -- we are building the system as a middleware layer that makes it possible for XML experts to define XML views of existing databases in XML terms. As a result, XPERANTO makes it possible for its users to create XML documents from object-relational databases without having to deal with their native schemas or SQL query interfaces. 
XPERANTO also provides a means to seamlessly query over object-relational data and meta-data. Our plans for future work include providing support for insertable and updateable XML views. We are also exploring the construction and querying of XML documents having a recursive structure, such as part hierarchies and bill of material documents." See also: "XPERANTO: A Middleware for Publishing Object-Relational Data as XML Documents", by M. Carey, J. Kiernan, J. Shanmugasundaram, E. Shekita, and S. Subramanian. VLDB Conference, September 2000. [cache]
[August 30, 2000] "Efficiently Publishing Relational Data as XML Documents." By J. Shanmugasundaram, E. Shekita, R. Barr, M. Carey, B. Lindsay, H. Pirahesh, and B. Reinwald. VLDB Conference, September 2000. "XML is rapidly emerging as a standard for exchanging business data on the World Wide Web. For the foreseeable future, however, most business data will continue to be stored in relational database systems. Consequently, if XML is to fulfill its potential, some mechanism is needed to publish relational data as XML documents. Towards that goal, one of the major challenges is finding a way to efficiently structure and tag data from one or more tables as a hierarchical XML document. Different alternatives are possible depending on when this processing takes place and how much of it is done inside the relational engine. In this paper, we characterize and study the performance of these alternatives. Among other things, we explore the use of new scalar and aggregate functions in SQL for constructing complex XML documents directly in the relational engine. We also explore different execution plans for generating the content of an XML document. The results of an experimental study show that constructing XML documents inside the relational engine can have a significant performance benefit. Our results also show the superiority of having the relational engine use what we call an 'outer union plan' to generate the content of an XML document. [...] To summarize, our performance comparison of the alternatives for publishing XML documents points to the following conclusions: (1) Constructing an XML document inside the relational engine is far more efficient than doing so outside the engine, mainly because of the high cost of binding out tuples to host variables. (2) When processing can be done in main memory, a stable approach that is always among the very best (both inside and outside the engine), is the Unsorted Outer Union approach. 
(3) When processing cannot be done in main memory, the Sorted Outer Union approach is the approach of choice (both inside and outside the engine). This is because the relational sort operator scales well." [cache]
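A rough sketch in the spirit of the sorted outer union idea: parent and child rows are brought into one common column layout with a row-type tag, padded with NULLs, sorted so that each parent row is followed by its children, and then tagged as XML in a single pass. The `customer`/`orders` schema, the `build_demo_db` and `publish` functions, and the choice of SQLite with tagging done outside the engine are all assumptions for illustration, not the paper's implementation. The sketch also assumes every order refers to an existing customer.

```python
import sqlite3
from xml.sax.saxutils import escape

# One result shape for both tables: (cust_id, kind, name, item).
# kind 0 marks a customer row, kind 1 marks an order row.
OUTER_UNION = """
SELECT c.id AS cust_id, 0 AS kind, c.name AS name, NULL AS item
FROM customer c
UNION ALL
SELECT o.cust_id, 1, NULL, o.item
FROM orders o
ORDER BY cust_id, kind, item
"""


def build_demo_db():
    """Create a small in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, item TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    conn.executemany(
        "INSERT INTO orders (cust_id, item) VALUES (?, ?)",
        [(1, "pen"), (1, "ink"), (2, "cup")],
    )
    return conn


def publish(conn):
    """Tag the sorted outer-union rows as nested XML in one pass."""
    parts = ["<customers>"]
    open_customer = False
    for cust_id, kind, name, item in conn.execute(OUTER_UNION):
        if kind == 0:
            if open_customer:
                parts.append("</customer>")
            parts.append(f'<customer id="{cust_id}"><name>{escape(name)}</name>')
            open_customer = True
        else:
            parts.append(f"<order>{escape(item)}</order>")
    if open_customer:
        parts.append("</customer>")
    parts.append("</customers>")
    return "".join(parts)
```

Because the rows arrive sorted by parent key and row type, the tagger needs only constant state (whether a `customer` element is open), which is why the sort does the structural work.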
[August 30, 2000] "Relational Databases for Querying XML Documents: Limitations and Opportunities." By J. Shanmugasundaram, K. Tufte, G. He, C. Zhang, D. DeWitt, and J. Naughton. VLDB Conference, September 1999. See also the slides. "With the growing importance of XML documents as a means to represent data in the World Wide Web, there has been a lot of effort on devising new technologies to process queries over XML documents. Our focus in this paper, however, has been to study the virtues and limitations of the traditional relational model for processing queries over XML documents conforming to a schema. The potential advantages of this approach are many -- reusing a mature technology, using an existing high performance system, and seamlessly querying over data represented as XML documents or relations. We have shown that it is possible to handle most queries on XML documents using a relational database, barring certain types of complex recursion. Our experience has shown that relational systems could more effectively handle XML query workloads with the following extensions: (1) Support for Sets: Set-valued attributes would be useful in two important ways. First, storing set sub-elements as set-valued attributes would reduce fragmentation. This is likely to be a big win because most of the fragmentation we observed in real DTDs was due to sets. Second, set-valued attributes, along with support for nesting [13], would allow a relational system to perform more of the processing required for generating complex XML results. (2) Untyped/Variable-Typed References: IDREFs are not typed in XML. Therefore, queries that navigate through IDREFs cannot be handled in current relational systems without a proliferation of joins -- one for each possible reference type. 
(3) Information Retrieval Style Indices: More powerful indices, such as Oracle8i's ConText search engine for XML, that can index over the structure of string attributes would be useful in querying over ANY fields in a DTD. Further, under restricted query requirements, whole fragments of a document can be stored as an indexed text field, thus reducing fragmentation. (4) Flexible Comparisons Operators: A DTD schema treats every value as a string. This often creates the need to compare a string attribute with, say, an integer value, after typecasting the string to an integer. The traditional relational model cannot support such comparisons. The problem persists even in the presence of DCDs or XML Schemas because different DTDs may represent 'comparable' values as different types. A related issue is that of flexible indices. Techniques for building such indices have been proposed in the context of semi-structured databases. (5) Multiple-Query Optimization/Execution: As outlined in Section 4, complex path expressions are handled in a relational database by converting them into many simple path expressions, each corresponding to a separate SQL query. Since these SQL queries are derived from a single regular path expression, they are likely to share many relational scans, selections and joins. Rather than treating them all as separate queries, it may be more efficient to optimize and execute them as a group. (6) More Powerful Recursion: As mentioned in Section 4, in order to fully support all recursive path expressions, support for fixed point expressions defined in terms of other fixed point expressions (i.e., nested fixed point expressions) is required. These extensions are not by themselves new and have been proposed in other contexts. However, they gain new importance in light of our evaluation of the requirements for processing XML documents. 
Another important issue to be considered in the context of the World Wide Web is distributed query processing -- taking advantage of queryable XML sources. Further research on these techniques in the context of processing XML documents will, we believe, facilitate the use of sophisticated relational data management techniques in handling the novel requirements of emerging XML-based applications." [cache]
[April 18, 2000] "XSLT as a Query Language for XML." By Arnaud Sahuguet (University of Pennsylvania Database Research Group). "There is a lot of discussion about which query language should be used for XML. An excellent survey is XML Query Languages: Experiences and Exemplars, by Mary Fernandez, Jerome Simeon and Phil Wadler. Unfortunately, their paper does not examine XSL-T as a candidate. This Web page ["XSLT as a Query Language for XML"] revisits the 10 queries described in the paper and offers one way to answer them using XSL-T. We do not claim that this is the unique or best way to do it. Comments and suggestions are welcome. All queries have been tested using either XT or Xalan. Query 6 requires Xalan, because XT does not support the document() construct..."
[June 23, 2000] "XYZFind: Searching in Context with XML." By Daniel Egnor and Robert Lord (XYZFind Corporation, Seattle, Washington, USA). Paper submitted to the SIGIR 2000 XML Workshop. "In an attempt to improve search precision, information retrieval systems struggle to understand the context of text in unstructured documents. While not all documents have explicit structure, we believe the emergence of XML as a standard format for representing structure offers an opportunity for greatly improved information retrieval based on the logical context that is made explicit in XML documents. At XYZFind Corp., we are building an information retrieval system which indexes diverse corpora of XML documents to offer a context-based search experience that outperforms unstructured information retrieval systems. XYZFind does not merely return a better list of documents matching a user's query; instead, we engage the user in a dialogue. Using information known about the domains of knowledge (XML schemas) in the corpus, XYZFind helps the user construct a highly precise query. In some cases, XYZFind actually constructs a precise query automatically from the user's keyword query. Finally, rather than simply listing document locations, the software is able to extract and format results in a way that is highly relevant to the user's query. We are implementing this technology by extending well-known information retrieval techniques as well as drawing on database research. We have not sacrificed the traditional strengths of information retrieval systems (ease of use, flexibility, scalability and low deployment cost)..." See also the software description.
[April 18, 2000] "A Data Model and Algebra for XML Query." By Mary Fernández, Jérôme Siméon, Dan Suciu, and Philip Wadler. "This note presents a possible data model and algebra for an XML query language. It should be compared with the alternative proposal. The algebra is derived from the nested relational algebra, which is a widely-used algebra for semi-structured and object-oriented databases. For instance, similar techniques are used in the implementation of OQL. We differ from other presentations of nested relational algebra in that we make heavy use of list comprehensions, a standard notation in the functional programming community. We find list comprehensions slightly easier to manipulate than the more traditional algebraic operators, but it is not hard to translate comprehensions into these operators (or vice versa). One important aspect of XML is not covered by traditional nested relational algebras, namely, the structure imposed by a DTD or Schema. (So far as we can see, the proposal also does not cover this aspect.) We extend the nested relational algebra with operators on regular expressions to capture this additional structure. The operators for regular expressions are also expressed with a comprehension notation, similar to that used for lists. Again, a similar technique is common in the functional programming community. (Chapter 11 of [ref 1] uses a do notation that is quite similar to our comprehensions.) We use the functional programming language Haskell as a notation for presenting the algebra. This allows us to use a notation that is formal and concrete, without having to invent one from scratch. It also means you can download and play with the algebra, for instance, to try implementing your favorite query. We use a slightly modified version of Haskell that supports regular expression comprehensions. Code that implements the algebra and a modified Hugs interpreter for Haskell can be downloaded from the URL at the head of this document..." 
Also in PDF and Postscript format. [cache]
[February 09, 2000] From Jonathan Robie: "There is a paper 'Projection and Transformation in XML Queries' at the following URL: http://metalab.unc.edu/xql/projection-transformation.html. This paper follows some decisions that were made on this list late last year [1999]: (1) XQL now uses XPath's revisions to the original XQL syntax. (2) Node Generators have been added to allow transformations of the kind supported by XML-QL. (3) XPath Axes are not part of XQL, but it is not an error to support them. (4) The "before" and "after" operators are part of XQL. This is a paper rather than a spec. I need to produce an up-to-date spec, based on the XPath spec. I will put one together over the weekend and post it. It will incorporate XPath as a normative reference, and discuss the additions." [local archive copy]
[February 09, 2000] Test suite for xql. Q: "Is there a test suite for validating xql implementations?" A: There is a small suite of tests dating from May 1999. Not official, not complete, but a start :) [From: Ingo Macherius]
Open Source XQL. "This forum is for the development of an Open Source version of XQL. XQL (XML Query Language) is a query language for XML documents that provides a simple and embeddable syntax. This forum provides a place for developers to share in the creation of a query engine for XML."
"Querying XML Data." - Tutorial at XML Europe '99 Granada, Spain. By Pascale Leblanc (Senior Software Engineer) and Jacques Deseyne (Senior XML/SGML Consultant), Sema Group, Belgium. "From simple locators over pattern matching to the new proposals for specific query languages, the standardization activity presents many mechanisms to specify a request for finding specific elements and getting information out of XML documents. This tutorial gives an overview of these mechanisms, their purpose, origin and background and their current status. Rather than listing a complete detailed inventory, this tutorial tries to provide an orientation aid for those who want to understand the requirements and motivations behind the different proposals submitted to the World-Wide-Web Consortium."
"Infoseek Goes Bilingual." By Chris Oakes. In Wired News (November 12, 1998). "[Infoseek] will make new searching software available next week - software that can interpret documents encoded with the Extensible Markup Language..."
"XML and Search." - By Avi Rappoport. See also "XML Searching Resources."

