June 30, 2000
Dongwook Shin, the developer of XPERT
dwshin@futurexpert.com
You may first wonder why XPERT supports yet another query language besides XPath, XQL, and Quilt. Why should I create another XML query language when about a dozen already exist? In fact, I don't want to make things complicated. One of the lessons I learned from previous experience is: "the simpler, the better!"
In my opinion, the current version of XPath is more complicated than it needs to be. XQL and Quilt are also complicated, and both are still evolving. The only things I like in XPath and XQL are the "simple path notation" and the "predicate" (called a filter in XQL). So I want to rebuild the language around these two notions, keeping it as simple as possible.
Then the question is why a language should be as simple as possible. There are two reasons. First, a simpler language is easier to use. Second, a simpler language is more amenable to optimization.
For instance, the XPath specification does not say which symbol is the start symbol of its grammar: Expr or PathExpr. Moreover, the grammar is mutually recursive (Expr leads to PathExpr and PathExpr leads back to Expr), which makes it hard to picture how expressive the language is.
I think an XML query language should have three functionalities:
These three are quite different from one another, so they should not be lumped together. One good example is XSLT, which provides both node selection and transformation. I think it is a good language in that it lets you first select nodes with its selection component (XPath) and then transform them into something else. XQL, on the other hand, does not support transformation.
So I want to keep these three apart and build a language that provides all of them, each in a distinct way. Unfortunately, I don't yet know how to do that. The only approach I can take is to build the language constructively, adding one feature at a time without violating the basic assumption (the simpler, the better). For now, I want to build the node selection component in this way. One of the main reasons I chose XPath as the starting point is that it is a W3C Recommendation and has many good features.
Another rule I follow in building the language is not to provide many ways to express the same query. You may think that the more ways you have to express a query, the more flexibility you enjoy. In reality, you pay far more than you gain. One important price is the performance of query evaluation: in general, the more complex a query language is, the harder it is to optimize.
So the question is: "Do you really want extra ways to express a query you can already express, at the expense of query performance?" I think the answer should be no. So I impose reasonable constraints on how queries are composed. These constraints rule out many ways of expressing the same question and help achieve better optimization.
One such constraint is:
The test node (the node test, in XPath terminology) must be an ancestor of, or the same as, the condition nodes in its predicates.
For instance, take the XPath query "//SECTION/TITLE[contains(..//PARA, "XPath")]", which says "retrieve the TITLEs of SECTIONs that have a descendant PARA containing the literal 'XPath'". It is a legal XPath expression, but you can also express the same query as
"//SECTION[contains(.//PARA, "XPath")]/TITLE"
Which of the two is faster depends entirely on the implementation. Right now, I only allow the second expression, since it is easier to implement and makes query evaluation simpler. Later, when I develop a way to transform the first form into the second, or find a simpler and faster way to evaluate the first, I will drop the constraint.
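The following Python sketch illustrates why the second form is simpler to evaluate. It uses only the standard xml.etree.ElementTree module on a made-up document; the functions form1, form2, and has_para_containing are hypothetical illustrations, not XPERT code. The first form starts at each TITLE and has to climb to its parent SECTION, and since ElementTree elements keep no parent links, a parent map must be built first. The second form filters SECTIONs top-down and then collects their TITLE children. Here contains() is read existentially (some descendant PARA contains the text), matching the prose above; XPath 1.0 itself would convert the node-set argument to the string value of its first node only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = """<BOOK>
  <SECTION>
    <TITLE>Intro</TITLE>
    <PARA>About XPath and XSLT</PARA>
  </SECTION>
  <SECTION>
    <TITLE>Other</TITLE>
    <PARA>Nothing relevant here</PARA>
  </SECTION>
</BOOK>"""


def has_para_containing(elem, text):
    """Existential reading of contains(.//PARA, text): some descendant PARA matches."""
    return any(text in "".join(p.itertext()) for p in elem.iterfind(".//PARA"))


def form1(root, text):
    """//SECTION/TITLE[contains(..//PARA, text)]: start at TITLE, climb to its parent."""
    parent = {child: p for p in root.iter() for child in p}  # ElementTree has no parent links
    result = []
    for title in root.iter("TITLE"):
        sec = parent.get(title)
        if sec is not None and sec.tag == "SECTION" and has_para_containing(sec, text):
            result.append(title)
    return result


def form2(root, text):
    """//SECTION[contains(.//PARA, text)]/TITLE: filter SECTIONs top-down."""
    result = []
    for sec in root.iter("SECTION"):
        if has_para_containing(sec, text):
            result.extend(sec.findall("TITLE"))
    return result


root = ET.fromstring(DOC)
print([t.text for t in form1(root, "XPath")])  # ['Intro']
print([t.text for t in form2(root, "XPath")])  # ['Intro']
```

Both forms select the same TITLE; only the first needs the extra parent map, which is one concrete reason the second form is easier to evaluate.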
A similar rationale applies to how XPERT evaluates XPQL queries. When several matching elements are nested inside one another, XPERT retrieves only the outermost one. For instance, suppose a <PARA> element matches the query "//PARA[contains(., "XPERT")]" and contains another matching <PARA> element:
<PARA> XPERT <PARA> XPERT </PARA> </PARA>
Then XPERT returns only the outer <PARA> instead of both. XPERT considers efficiency more important than completeness.
If you want to find the nested element, you can search the retrieved result again, which lets you achieve completeness.
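A minimal Python sketch of the outermost-only rule follows, assuming a simple "tag name plus substring of the string value" test in place of a real XPQL predicate. The names matches and outermost_matches are hypothetical, and this is not XPERT's implementation. It only shows how stopping the walk at the first match saves work, and how the nested element can still be recovered by searching inside the retrieved result.

```python
import xml.etree.ElementTree as ET


def matches(elem, tag, text):
    """Hypothetical stand-in for a predicate such as //PARA[contains(., "XPERT")]."""
    return elem.tag == tag and text in "".join(elem.itertext())


def outermost_matches(elem, tag, text):
    """Collect matching elements without descending into a match,
    so matches nested inside a returned element are skipped."""
    if matches(elem, tag, text):
        return [elem]
    found = []
    for child in elem:
        found.extend(outermost_matches(child, tag, text))
    return found


doc = ET.fromstring("<DOC><PARA> XPERT <PARA> XPERT </PARA> </PARA></DOC>")
outer = outermost_matches(doc, "PARA", "XPERT")
print(len(outer))  # 1: only the outer PARA is returned

# Completeness on demand: search again inside each retrieved element.
nested = [m for o in outer for child in o for m in outermost_matches(child, "PARA", "XPERT")]
print(len(nested))  # 1: the inner PARA

# A complete (non-pruning) search finds both.
print(sum(matches(e, "PARA", "XPERT") for e in doc.iter()))  # 2
```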
Below are the BNF form of XPQL and some example legal queries. I have tried to use the same symbols as the XPath specification. This is just a beginning, so don't be disappointed if XPQL omits features you consider important. As the language evolves, we will cover them in the next release, unless they violate our mission. Don't hesitate to mail us at dwshin@futurexpert.com and join us in making a better language.
- BNF form of XPQL
XPQLquery ::= Expr
Expr ::= PathExpr | PathExpr '|' Expr    (* '|' is the union operator *)
PathExpr ::= AbsoluteLocationPath
AbsoluteLocationPath ::= '/' RelativeLocationPath? | '//' RelativeLocationPath?
RelativeLocationPath ::= Step | RelativeLocationPath '/' Step | RelativeLocationPath '//' Step
Step ::= ElementName | ElementName Predicate
Predicate ::= '[' PredicateExpr ']'
ElementName ::= NodeName | '.' | '..'
    (* NodeName means the legal names for XML nodes and thus is not defined in more detail *)
PredicateExpr ::= OrExpr | OrExpr 'or' PredicateExpr
OrExpr ::= AndExpr | AndExpr 'and' OrExpr
AndExpr ::= UnionExpr | UnionExpr '|' AndExpr
UnionExpr ::= Number
            | last()
            | contains(PathwithoutPredicate, literal)
            | in(IRExpr, PathwithoutPredicate)
            | PathwithoutPredicate operator literal
            | literal operator PathwithoutPredicate
    (* The argument positions of in() are the opposite of those in contains() *)
PathwithoutPredicate ::= LocationPathwithoutPredicate
LocationPathwithoutPredicate ::= RelativeLocPathwithoutPredicate | AbsoluteLocPathwithoutPredicate
AbsoluteLocPathwithoutPredicate ::= '/' RelativeLocPathwithoutPredicate? | '//' RelativeLocPathwithoutPredicate?
RelativeLocPathwithoutPredicate ::= StepwithoutPredicate
            | RelativeLocPathwithoutPredicate '/' StepwithoutPredicate
            | RelativeLocPathwithoutPredicate '//' StepwithoutPredicate
StepwithoutPredicate ::= NodeName | NodeName '@' AttributeName | '.' | '..'
    (* AttributeName means the legal names for XML attributes and thus is not defined in more detail *)
operator ::= '=' | '<=' | '<' | '>=' | '>'
IRExpr ::= OrIRExpr
OrIRExpr ::= AndIRExpr | AndIRExpr 'or' OrIRExpr
AndIRExpr ::= literal | literal 'and' AndIRExpr
literal ::= '"' [^"]* '"' | "'" [^']* "'"
    (* Inside the in() function, '*' is a wildcard character *)
Number ::= [1-9][0-9]*
    (* Number is the position of a child, so it must be a positive integer *)

Here are some legal XPQL queries.
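To make the grammar concrete, here is a minimal recursive-descent recognizer for it in Python. It is a sketch, not XPERT's parser. Several details are my assumptions: the tokenizer, a simplified NodeName pattern, reading '|' as alternation in Step and RelativeLocationPath but as the union operator in Expr and AndExpr, and treating or, and, last, contains, and in as keywords only where the grammar expects an operator or a function. Right-recursive rules such as PredicateExpr ::= OrExpr | OrExpr 'or' PredicateExpr are implemented as loops, which accept the same strings.

```python
import re

# Token kinds: SYM (punctuation/operators), LIT (quoted literal), NUM (positive
# integer), NAME (simplified XML name, not the full XML Name production).
_TOKEN = re.compile(r"""\s*(?:
    (?P<SYM>//|<=|>=|\.\.|[/\[\]()|,=<>@.]) | (?P<LIT>"[^"]*"|'[^']*')
  | (?P<NUM>[1-9][0-9]*) | (?P<NAME>[A-Za-z_][\w\-]*) )""", re.VERBOSE)


class Parser:  # recursive-descent recognizer; raises ValueError on illegal input
    def __init__(self, text):
        self.toks, self.i, pos, text = [], 0, 0, text.strip()
        while pos < len(text):
            m = _TOKEN.match(text, pos)
            if m is None:
                raise ValueError(f"unexpected character at offset {pos}")
            self.toks.append((m.lastgroup, m.group(m.lastgroup)))
            pos = m.end()

    def peek(self, k=0):
        j = self.i + k
        return self.toks[j] if j < len(self.toks) else (None, None)

    def accept(self, value):  # consume the next token only if it equals value
        hit = self.peek()[1] == value
        self.i += hit
        return hit

    def need(self, kind=None, value=None):  # like accept, but the token is mandatory
        k, v = self.peek()
        if (kind and k != kind) or (value and v != value):
            raise ValueError(f"expected {value or kind}, got {v!r}")
        self.i += 1

    def chain(self, operand, op):  # operand (op operand)*: a right-recursive rule as a loop
        operand()
        while self.accept(op):
            operand()

    def path(self, top_level):
        # top_level: AbsoluteLocationPath with Predicates; else PathwithoutPredicate.
        if self.accept("/") or self.accept("//"):
            kind, value = self.peek()
            if kind != "NAME" and value not in (".", ".."):
                return  # bare '/' or '//'
        elif top_level:
            raise ValueError("a top-level path must start with '/' or '//'")
        self.step(top_level)
        while self.accept("/") or self.accept("//"):
            self.step(top_level)

    def step(self, top_level):
        if not (self.accept(".") or self.accept("..")):
            self.need(kind="NAME")
            if not top_level and self.accept("@"):
                self.need(kind="NAME")  # NodeName '@' AttributeName
        if top_level and self.accept("["):
            # PredicateExpr: 'or' binds loosest, then 'and', then '|'.
            self.chain(lambda: self.chain(
                lambda: self.chain(self.union_expr, "|"), "and"), "or")
            self.need(value="]")

    def union_expr(self):
        kind, value = self.peek()
        if kind == "NUM":  # child position, e.g. [2]
            self.i += 1
        elif value in ("last", "contains", "in") and self.peek(1)[1] == "(":
            self.i += 2  # skip the function name and '('
            if value == "contains":  # contains(PathwithoutPredicate, literal)
                self.path(top_level=False)
                self.need(value=",")
                self.need(kind="LIT")
            elif value == "in":  # in(IRExpr, PathwithoutPredicate)
                self.chain(lambda: self.chain(lambda: self.need(kind="LIT"), "and"), "or")
                self.need(value=",")
                self.path(top_level=False)
            self.need(value=")")
        elif kind == "LIT":  # literal operator PathwithoutPredicate
            self.i += 1
            self.operator()
            self.path(top_level=False)
        else:  # PathwithoutPredicate operator literal
            self.path(top_level=False)
            self.operator()
            self.need(kind="LIT")

    def operator(self):  # '=' | '<=' | '<' | '>=' | '>'
        if not any(self.accept(op) for op in ("=", "<=", "<", ">=", ">")):
            raise ValueError(f"expected an operator, got {self.peek()[1]!r}")


def is_valid_xpql(text):
    try:
        p = Parser(text)
        p.chain(lambda: p.path(top_level=True), "|")  # Expr ::= PathExpr ('|' PathExpr)*
        return p.peek()[0] is None  # every token must be consumed
    except ValueError:
        return False
```

For example, is_valid_xpql('//SECTION[contains(.//PARA, "XPath")]/TITLE') returns True, while is_valid_xpql('//SECTION[TITLE]') returns False because the grammar has no bare existence test in a predicate. Note that the ancestor constraint discussed earlier is not part of the BNF itself (StepwithoutPredicate allows '..'), so this recognizer also accepts the first SECTION/TITLE query; enforcing the constraint would need a separate check.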