Cover Pages: XML and Attribute Grammars

[November 22, 2000] [Provisional reference collection.]

References:

[March 26, 2002] "Using Object-Oriented Attribute Grammars as ODB System Generator." By Takashi Imaizumi, Takeshi Hagiwara, Katsuhiko Gondow, and Takuya Katayama. Department of Information Engineering, Faculty of Engineering, Niigata University, Niigata 950-2181, Japan. Presented at the Third Workshop on Attribute Grammars and their Applications (WAGA'00), Ponte de Lima, Portugal. July 7, 2000. 20 pages. "This paper presents MAGE2 system. It implements a computational model OOAG (Object-Oriented Attribute Grammars) and creates its attributed object trees in object-oriented database (ODB) using persistent object allocation mechanism of object-oriented database management systems (ODBMS). The MAGE2 is a programming support and execution environment for OOAG. The focus of this paper is on an execution system. We indicate core techniques to implement MAGE2, that is, how to execute specifications of OOAG and how to generate an ODB system. We are planning to use MAGE2 to design databases for storing data that have logical structures such as program source files, XML documents and so on... OOAG has been derived from attribute grammars. Declarative structures, separation of semantics and syntax definition, and local description resulting in high readability and high maintainability, and clear description due to functional computation of attributes are all desirable characteristics of AGs. We summarize the OOAG features as a generator of database systems as follows: (1) OOAG has been derived from attribute grammars; (2) We can program how to manipulate software objects by message passing; (3) We can describe data structure and manipulation method of software objects at the same place; (4) We can generate software repository system automatically from formal repository specification written in OSL language. An OOAG description is separated into two parts: one is a static specification and the other is a dynamic specification. They are described in a specification language OSL 'Object Specification Language'. We describe briefly each part and then give the correspondence of OSL language constructs to conventional attribute grammars constructs... A tool constructed by the MAGE system creates attributed object trees in an ODB persistently. Operations to persistent object trees can be described in OSL specifications by the message passing mechanism of OOAG model. Programmers who use MAGE will only write interface codes between generated tool. Created ODBs will be maintained by the OOAG evaluator. This includes updating, adding, or deleting objects. They will be described in dynamic subtrees operation by message passing. If the state of object trees in the database will be inconsistent, OOAG evaluation loop will keep them consistent. From above features of the MAGE system, we can develop complicated object-oriented database systems which manage structured data effciently..." [source, Postscript]
[November 22, 2000] "Adding Semantics to XML." By G. Psaila and S. Crespi-Reghizzi. Pages 113-132 in D. Parigot and M. Mernik [editors], Proceedings of the Second Workshop on Attribute Grammars and their Applications (WAGA '99), Amsterdam, The Netherlands, March 1999. INRIA Rocquencourt. "Starting form the analogy between a document tagged by a mark-up language (XML, SGML) and a source string generated by a BNF grammar, we argue that XML parsers should benefit from the addition of semantic attributes and functions. Currently XML only includes initialized lexical attributes. By our approach a XML parser would be extended into a syntax-directed translator. Deep transformations of a document could be specified, sent over the network, and executed within the XML system. For the specification of the semantic attributes and functions we propose a XML Document Type Definition, that is conceptually similar to the metalanguage of a compiler-compiler. By this approach the additions to the XML standard are kept to a minimum.The differences between attribute grammars and attributed XML specifications are discussed, and the system architecture of a semantic evaluator generator is presented." Also in Postscript format. [cache]
[November 22, 2000] "Extensions of Attribute Grammars for Structured Document Queries." By Frank Neven (Limburgs Universitair Centrum, Universitaire Campus, Dept. WNI, Infolab, B-3590 Diepenbeek, Belgium). Presented at DBPL99 - The Eighth International Workshop on Database Programming Languages (Kinloch Rannoch, Scotland, September 1st - 3rd, 1999). 18 pages (with 46 references). "Widely-used document specification languages like, e.g., SGML and XML, model documents using extended context-free grammars. These differ from standard context-free grammars in that they allow arbitrary regular expressions on the right-hand side of productions. To query such documents, we introduce a new form of attribute grammars (extended AGs) that work directly over extended context-free grammars rather than over standard context-free grammars. Viewed as a query language, extended AGs are particularly relevant as they can take into account the inherent order of the children of a node in a document. We show that two key properties of standard attribute grammars carry over to extended AGs: efficiency of evaluation and decidability of well-definedness. We further characterize the expressiveness of extended AGs in terms of monadic second-order logic, establish the complexity of their non-emptiness and equivalence problem to be complete for EXPTIME, and consider several extensions of extended AGs. As an application we show that the Region Algebra expressions introduced by Consens and Milo can be efficiently translated into extended AGs. This translation drastically improves the known upper bound on the complexity of the emptiness and equivalence test for Region Algebra expressions... Structured document databases can be seen as derivation trees of some grammar which functions as the 'schema' of the database'. Document specification languages like, e.g., SGML and XML, model documents using extended context-free grammars. Extended context-free grammars (ECFG) are context-free gram- mars (CFG) having regular expressions over grammar symbols on the right-hand side of productions. It is known that ECFGs generate the same class of string languages as CFGs. Hence, from a formal language point of view, ECFGs are nothing but shorthands for CFGs. However, when grammars are used to model documents, i.e., when also the derivation trees are taken into consideration, the difference between CFGs and ECFGs becomes apparent... The classical formalism of attribute grammars, introduced by Knuth, has always been a prominent framework for expressing computations on derivation trees. Therefore, in previous work, we investigated attribute grammars as a query language for derivation trees of CFGs. Attribute grammars provide a mechanism for annotating the nodes of a tree with so-called 'attributes', by means of so-called 'semantic rules' which can work either bottom-up (for so-called 'synthesized' attribute values) or top-down (for so-called 'inherited' attribute values). Attribute grammars are applied in such diverse fields of computer science as compiler construction and software engineering. Inspired by the idea of representing transition functions for automata on unranked trees as regular string languages, we introduce extended attribute grammars (extended AGs) that work directly over ECFGs rather than over standard CFGs... By carefully tailoring the semantics of inherited attributes, extended AGs can take into account the inherent order of the children of a node in a document. [Related work:] Schwentick and the present author defined query automata to query structured documents. Query automata are two-way automata over (un)ranked trees that can select nodes depending on the current state and on the label at these nodes. Query automata can express precisely the unary MSO definable queries and have an EXPTIME- complete equivalence problem. This makes them look rather similar to extended AGs. The two formalisms are, however, very different in nature. Indeed, query automata constitute a procedural formalism that has only local memory (in the state of the automaton), but which can visit each node more than a constant number of times. Attribute grammars, on the other hand, are a declarative formalism, whose evaluation visits each node of the input tree only a constant number of times (once for each attribute). In addition, they have a distributed memory (in the attributes at each node). It is precisely this distributed memory which makes extended AGs particularly well-suited for an efficient simulation of Region Algebra expressions. It is, hence, not clear whether there exists an efficient translation from Region Algebra expressions into query automata. Extended AGs can only express queries that retrieve subtrees from a document. It would be interesting to see whether the present formalism can be extended to also take re-structuring of documents into account. A related paper in this respect is that of Crescenzi and Mecca. They define an interesting formalism for the definition of wrappers that map derivation trees of regular grammars to relational databases. Their formalism, however, is only defined for regular grammars and the correspondence between actions (i.e., semantic rules) and grammar symbols occurring in regular expressions is not so flexible as for extended AGs. Other work that uses attribute grammars in the context of databases includes work of Abiteboul, Cluet, and Milo and Kilpeläinen et al. Finally, we compare extended AGs with the selective power of two query languages for XML. XML-QL is an SQL like query language for XML defined by Deutsch et al. that allows to define general tree to tree transformations. The selective power of XML- QL, however, is restricted to regular path expressions. Consequently, it can only take the structure of paths into account, not of whole subtrees. E.g., queries like the one in 'Example 13: retrieve every poem that has the words king or lord in every other verse' cannot be expressed. As another example we mention that XML-QL cannot select nodes whose children match a certain regular expressions. The latter functionality is obtained by Papakonstantinou and Vianu by introducing a form of horizontal navigation into their selection patterns. These patterns can easily be simulated by extended AGs, but still cannot express the query of 'example 13'." [cache]
[November 22, 2000] "Structured document query languages based on attribute grammars: locality and non-determinism." By Frank Neven. Pages 1290142 in Fundamentals of Information Systems, eds. T. Polle, T. Ripke and K. Schewe. The Kluwer International Series in Engineering and Computer Science #496, 1999.
"Expressiveness of structured document query languages based on attribute grammars." By Frank Neven and Jan Van den Bussche. Pages 11-17 in Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, 1998. [Seventeenth Association for Computing Machinery SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. June 1 - 4, 1998, Seattle, WA USA.] "Structured document databases can be naturally viewed as derivation trees of a context-free grammar. Under this view, the classical formalism of attribute grammars becomes a formalism for structured document query languages. From this perspective, we study the expressive power of BAGS: Boolean-valued attribute grammars with propositional logic formulas as semantic rules, and RAGS: relation-valued attribute grammars with first-order logic formulas as semantic rules, BAGs can express only unary queries; RAGS can express queries of any arity. We first show that the (unary) queries expressible by BAGS are precisely those definable in monadic second-order logic. We then show that the queries expressible by RAGS are precisely those definable by first-order inductions of linear depth, or, equivalently, those computable in linear time on a parallel machine with polynomially many processors. We finally show that RAGS are more powerful than monadic second-order logic for queries of any arity... As originally proposed by Gonnet and Tompa, a structured document database can be naturally viewed as a derivation tree over a context-free grammar. For instance, this is essentially the view taken by SGML. The classical formalism of attribute grammars, introduced by Knuth, has always been a prominent framework for expressing computations on derivation trees. Attribute grammars provide a mechanism for annotating the nodes of a tree with so-called 'attributes', by means of so-called 'semantic rules' which can work either bottom-up (for so- called 'synthesized' attribute values) or top-down (for so- called 'inherited' attribute values). Attribute grammars nre applied in such diverse fields of computer science as compiler construction and software engineering. Hence, it is natural to consider attribute grammars as a basis for structured document database languages. For instance, this approach was chosen by Abiteboul, Cluet and Milo. Our goal in this paper is to understand the expressive power of attribute grammars as a structured document query language. In the simple query facility provided by most information retrieval systems, a query amounts to the selection of certain nodes in the tree, corresponding to positions in the document or structural elements of the document, that are to be retrieved. In a previous paper, we proposed to use Boolean-valued attribute grammars (BAGS) to express such unary queries.' BAGS are attribute grammars with Boolean attribute values, and with propositional logic formulas as semantic rules. A BAG indeed expresses a query in a natural way: the result of the query expressed by a BAG consists of those nodes in the tree for which some designated attribute is true. Information retrieval systems usually query a set of structured documents instead of only one document. However, as far as query language design is concerned, a set of documents can be considered as one long structured document. We will show that a unary query is expressible by a BAG if and only if it is definable in monadic second-order logic (MS0). We found this pleasantly surprising, since at first it was not even clear to us that all first-order queries are BAG-expressible. The only-if direction is easy to prove. For the if direction, we make use of a classical theorem from the field of automata and logic (Doner-Thatcher-Wright) stating that a tree language is recognizable by a finite tree automaton if and only if it is definable by an MSO-sentence. Using this theorem, we can prove our result by an intricate construction of a BAG which simulates, in parallel, the runs of a tree automaton on all possible Boolean labelings of a tree. An interesting corollary of our proof is that each BAG is equivalent to a BAG that can be evaluated in only two passes, the first one bottom-up and the second one top-down. Having understood the expression of unary queries, we then turn to queries that result in relations among the nodes of the tree of arbitrary, fixed, arity. Thereto, we introduce relation-valued attribute grammars (RAGS), which use first-order logic formulas as semantic rules. The query expressed by a RAG is naturally defined as the value (a relation) of some designated attribute of the root. We show that the queries expressible by RAGS are precisely those definable by first-order inductions of linear depth..." [cache]
[December 10, 1998] "Attribute Grammars Over Extended Grammars for Structured Document Queries." By Frank Neven (Limburgs Universitair Centrum, Diepenbeek, Belgium). Unpublished paper, 1998. 14 pages. Abstract: "Widely-used document specification languages like, e.g. SGML and XML, model documents using extended context-free grammars. These differ from context-free grammars in that they allow arbitrary regular expressions on the right-hand side of productions. To query such documents, we introduce a new form of attribute grammars (extended AGs) that work directly over extended context-free grammars rather than over context-free grammars. Viewed as a query language, extended AGs are particularly relevant as they can take into account the inherent order of the children of a node in a document. We show that two key properties of standard attribute grammars carry over to extended AGs: efficiency of evaluation and decidability of well-definedness. We further characterize the expressiveness of extended AGs and consider several extensions." Also on attribute grammars: see the following entry.
[December 10, 1998] "Generating SGML-specific Editors: From DTDs to Attribute Grammars." By José Carlos Ramalho, Alda Reis Lopes, and Pedro Henriques. Presentation at the Markup Technologies '98 Conference, and printed in the GCA's published proceedings, Markup Technologies '98 Conference Proceedings, pages 61-72. [See also the preceding entry on attribute grammars.] "Many SGML parsers are implemented using traditional syntax-directed translation; this provides good performance for structural validation and batch processing. Problems emerge when we change the goal or the processing context - for example, to build an extension for semantic validation, or to validate online instead of batch. In the early 1970's, a newer paradigm of semantics-directed translation, based on the formalism of `attribute grammars', caught the attention of compiler developers. We have developed a DTD editor that generates attribute grammars which correspond to the DTD being edited - from which, in turn, it is possible to generate a specialized editor for the specific document type. We conclude with a glimpse of the intended environment, which will include a style editor and a semantic editor as well as the DTD editor described. . . An attribute grammar (AG) is a well accepted formalism used by the compiler community to specify the syntax and semantics of languages. Introduced by Knuth, the AG appeared as an extension to the classic CFG (context-free grammar) to allow the local definition (without the use of global variables) of the meaning of each symbol in a declarative style. Terminal symbols have intrinsic attributes (that describe their lexical information) and Nonterminal symbols are associated with generic attributes; semantic information can be synthesized up the tree (from the bottom to the root), but can also be inherited down the tree (from the top to the leaves), enabling explicit references to contextual dependencies. . ." [online version, and partial cache]
[September 16, 2000] "XMILE: An Incremental Code Mobility System based on XML Technologies." By Cecilia Mascolo, Wolfgang Emmerich, and Anthony Finkelstein (Dept. of Computer Science, University College London). 2nd International Symposium on Agent Systems and Applications and Mobile Agents (ASA/MA2000), September 2000. "Logical mobility ranges from simple data mobility, where information is transferred, through code mobility that allows the migration of executable code, to mobile agents, in which code and data move together. Several application domains need a more exible approach to code mobility than the one that can be achieved with Java and with mobile agents in general. This exibility can either be required as a result of low network bandwidth, scarce resources, and slow or expensive connectivity, like in mobile computing settings, or scalability requirements like in applications on several thousand clients that haveto be kept in sync and be updated with new code fragments. We show how to achieve more fine-grained mobility than in the approaches using mobile agents and Java class loading. We demonstrate that the unit of mobility can be decomposed from an agent or class level, if necessary, down to the level of individual statements. We can then support incremental insertion or substitution of, possibly small, code fragments and open new application areas for code mobility such as management of applications on mobile thin clients, for example wireless connected PDAs or mobile phones, or more in general distributed code update and management. This work builds on the formal foundation for fine-grained code mobility that was established in [an earlier paper]. That paper develops a theoretical model for fine-grained mobility at the level of single statements or variables and argues that the potential of code mobility is submerged by the capability of the most commonly used language for code mobility, i.e., Java. We focus on an implementation of fine-grained mobility using standardized and widely available technology. It has been identified that mobile code is a design concept, independent of technology and can be embodied in various ways in different technologies. The eXtensible Markup Language (XML) can be exploited to achieve code mobility at a very fine-grained level. XML has not been designed for code mobility, however it happens to have some interesting characteristics, mainly related to exibility, that allow its use for code migration. In particular, we will exploit the tree structure of XML documents and then use XML related technologies, such as XPath and the Document Object Model (DOM) to modify programs dynamically. The availability of this technology considerably simplifies the construction of application-specific languages and their interpreters. XML provides a exible approach to describe data structures. We now show that XML can also be used to describe code. XML DTDs (i.e., Data Type Definition) are, in fact, very similar to attribute grammars. Each element of an XML DTD corresponds to a production of a grammar. The contents of the element define the right-hand side of the production. Contents can be declared as enumerations of further elements, element sequences or element alternatives. These give the same expressive power to DTDs as BNFs have for context free grammars. The markup tags, as well as the PCDATA that is included in unrefined DTD elements, define terminal symbols. Elements of XML DTDs can be attributed. These attributes can be used to store the value of identifiers, constants or static semantic information, such as symbol tables and static types. Thus, XML DTDs can be used to define the abstract syntax of programming languages. We refer to documents that are instances of such DTDs as XML programs. XML programs can be interpreted and in interpreters can be constructed using XML technologies. When XML programs are sent from one host to another we effectively achieve code mobility." [cache]
Chenevoy, Y.; Belaid, A. "[A Structural Approach for Library References Recognition] (article in French)." Traitement du Signal 12/6 (1995) 663-671 (with 10 references). Author's affiliation: CRID, Bourgogne University, Dijon, France.
"Abstract: This paper presents a library references recognition system for retrospective conversion of catalogues. The system is guided by a structure model of a reference class, described by an attribute grammar. The analysis method is based on prediction and verification of segmentation hypotheses proposed by the model. The result, given in UNIMARC format, contains the different sub-fields of the reference with their confidence score. This method is general enough to be adopted for any document having a micro-structure. The method has also been used on other kinds of documents such as author index and subjects."
Feng, An; Wakayama, Toshiro. "SIMON: A Grammar-based Transformation System for Structured Documents." Electronic Publishing: Origination, Dissemination, and Design (EPODD) EP '94. Fifth International Conference on Electronic Publishing, Document Manipulation, and Typography, Darmstadt, Germany, 13-15 April 1994. 6/4 (December 1993) 361-372. 13 references. Authors' affiliation: Xerox Corporation, Xerox Webster Research Center, Webster, NY, USA 14580.
Abstract: SIMON is a grammar-based transformation system for restructuring documents. Its target applications include meta-level specification of document assembly, view definition and retrieval for multiview documents, and document type evolution. The internal document model is based on attribute grammars, and it interfaces with external document models such as SGML through input and output conversion. The transformation engine of SIMON is an amalgamation of syntax-directed computation and content-oriented computation: the former is through higher-order (and related) extensions of attribute grammars, whereas the latter is done by externally defined programs and it is for computation not naturally amenable to the syntax-directed paradigm. The current implementation of SIMON employs the higher-order extension proposed by H. H. Vogt, S. D. Swierstra, and M. F. Kuiper ("Higher-Order Attribute Grammars," Proceedings of the ACM SIGPLAN '89 Conference on Programming Language Design and Implementation: 131-145) for the syntax-directed computation, and C++ for the content-oriented computation.
King, P. R. "Modelling Multimedia Documents." Pages 95-110 (with 15 references) in EP '96. Proceedings of the Sixth International Conference on Electronic Publishing, Document Manipulation and Typography. [ = Journal Special Issue: Electronic Publishing - Origination, Dissemination and Design (EPODD), June & September 1995, Volume 8, Issues 2-3. Sixth International Conference on Electronic Publishing, Document Manipulation and Typography, Palo Alto, California. September 24-26, 1996. Sponsored by Adobe Systems Incorporated; School of Information Management and Systems, University of California at Berkeley; Xerox Corporation. [Proceedings Volume] Edited by Allen Brown, Anne Brüggemann-Klein, and An Feng; [Journal] Editors David F. Brailsford and Richard K. Furuta. Chichester/ New York: John Wiley & Sons, 1996. ISSN: 0894-3982. Author's affiliation: Department of Computer Science, The Univesity of Manitoba, Winnipeg, Manitoba, Canada R3T 2N2. Email: prking@cs.umanitoba.ca.
Abstract: "This paper discusses the need for models for multimedia documents and describes a particular formal model. The model makes use of an executable Interval Temporal Logic as its basis. The paper describes how temporal constraints among media items may be specified for subsequent manipulation and for use in prototyping. In particular, it uses the powerful notion of interval projection, both as a device for specifying variable display rates for media items and also for providing a scripting mechanism. The paper also outlines how this model may be used as the basis of an authoring tool for such documents." The article focuses upon document modelling and authoring systems designed to take advantage of formal models. The author's interest extends beyond the notion of attribute grammar, which serves as the core formalism in the authoring of structured documents in SGML and HyTime; he seeks to develop a model which formally addresses manipulation, including temporal aspects of authoring. For other conference information, see the main conference entry for EP '96, or the brief history of the conference as sixth in a series since 1986. See the volume main bibliographic entry for a linked list of other EP '96 titles relevant to SGML and structured documents.
Murata Makoto; Nakatsuyama, H. "A Theoretical Foundation of the DSSSL Location Model." Mathematical and Computer Modelling 25/4 (February 1997) 95-107 (with 12 references). Author's affiliation: Fuji Xerox Information Systems Co. Ltd., KSP 9A7, 2-1 Sakado 3-Chome, Takatsu-ku, Kawasaki-shi, Kanagawa-ken 213, Japan.
"Abstract: In the location model of the Document Style Semantics and Specification Language (DSSSL), one can use tree patterns to locate nodes in logical structures of documents. A tree pattern consists of conditions on nodes and those on their hierarchical relationships. As a first step towards efficient implementations, the paper shows a theoretical foundation of the location model. Tree patterns are first expressed by sentences of branching time temporal logic. These sentences are then converted to well formed attribute grammars. Thus, the library of attribute grammar evaluation techniques can be used to implement the location model. It is our belief that this observation is significant for future implementers of DSSSL. Furthermore, the converted attribute grammars can be evaluated by traversing logical structures several times. The number of required traversals can be found by examining the original sentences."
See also: Third International Workshop on Attribute Grammars and their Applications. WAGA 2000. With proceedings
See also: Second International Workshop on Attribute Grammars and their Applications - WAGA'99. With proceedings. "The aim of this one day workshop is to bring together researchers from academia and industry interested in the field of attribute grammars. The workshop wellcomes contributions on all aspects of attribute grammars, with special emphasis on new applications of attribute grammars (outside compilation) and comparisons to other formalisms (action semantics, evolving algebra, denotational semantics, natural semantics, abstract interpretation), to technical standards (mark-up languages SGML and XML), and to programming languages (object-oriented programming, constraint logic programming, functional programming, adaptive programming, generic programming)."
See also: "XML and Query Languages."
See also: "SGML/XML and Forest/Hedge Automata Theory."


SEARCH \| ABOUT \| INDEX \| NEWS \| CORE STANDARDS \| TECHNOLOGY REPORTS \| EVENTS \| LIBRARY