RIO - Publications

[This local archive copy mirrored from the canonical site: http://star.vic.cmis.csiro.au:8042/RIO/RIOpubs.rhtml, 1998-08-30; links may not have complete integrity, so use the updated/canonical document at this URL if possible.

The RIO (Reuse of Information Objects) project aims to develop techniques to support information reuse in various contexts; the current focus is on virtual documents as a tool for information reuse.

Note: TIM (Text-based Information Management Group) at CSIRO Mathematical and Information Science (CMIS) has the following areas of interest and expertise: "Electronic documents: structured documents (XML/SGML/HTML), fragmentation, structure, metadata, models, hypertexts, formats and transformation; Production and publishing: WWW and other forms of electronic dissemination; tools integration and interoperability; multi-user documents; document workflow management; Document delivery: active and virtual documents, information organization and presentation (CSS,XSL,DSSSL); Document analysis, text parsing, structural and semantic recognition, textual data representation, abstraction."]


Publications related to RIO - Reuse of Information Objects through virtual documents

Anne-Marie Vercoustre and François Paradis, Reuse of Linked Documents through Virtual Document Prescriptions, in Lecture Notes in Computer Science 1375, Proceedings of Electronic Publishing '98, Saint-Malo, France, pp499-512, 1-3 April, 1998.
As the WWW becomes a major source of information, a lot of interest has arisen, not only in searching for information, but in reusing this information in new pages or directly from applications. Unfortunately HTML tags do not provide a significant level of structure for identifying and extracting information, since they are mostly used for presentation issues. Moreover the simple link mechanism of the Web does not support the controlled traversal of links to related pages. Particularly promising is the proposal for a new standard, XML, which could bring the power of SGML to the Web while keeping the simplicity of HTML. In this paper we present a system and a language that allow reusing information from various sources, including databases and SGML-like documents, by combining it dynamically to produce a virtual document. The language uses a tree-like structure for the representation of information objects as well as links between objects. The paper focuses on the selection and the traversal of XML links to extract information from linked pages. The strength of our approach is that it is an SGML-compliant solution, which makes it ready to take full advantage of XML for reusing information from the Web as soon as XML is widely used.
François Paradis and Anne-Marie Vercoustre and Brendan Hills, A Virtual Document Interpreter for Reuse of Information, in Lecture Notes in Computer Science 1375, Proceedings of Electronic Publishing '98, Saint-Malo, France, pp487-498, 1-3 April, 1998.
The importance of reuse of information is well recognised for electronic publishing. However, it is rarely achieved satisfactorily because of the complexity of the task: integrating different formats, handling updates of information, addressing the document author's need for intuitiveness and simplicity, etc. An approach which addresses these problems is to dynamically generate and update documents through a descriptive definition of virtual documents. In this paper we present a document interpreter that allows gathering information from multiple sources, and combining it dynamically to produce a virtual document. Two strengths of our approach are: the generic information objects that we use, which enable access to distributed, heterogeneous data sources; and the interpreter's evaluation strategy, which permits a minimum of re-evaluation of the information objects from the data sources.
Craig A. Lindley and Anne-Marie Vercoustre, Virtual Document Models for Intelligent Video Synthesis, in International Conference on Computational Intelligence and Multimedia Applications (ICCIMA'98), Melbourne, Australia, pp661-666, February, 1998.
To facilitate maximum reuse of video data, well-specified and systematic methods are required for the incorporation of digital video components into multiple delivery products. This paper presents an approach to the intelligent synthesis of digital video products by the representation of virtual videos using SGML. Virtual videos are defined using a Virtual Video Prescription to specify a generic model of a video production that includes embedded queries for the dynamic retrieval of video content from underlying databases. An architecture for virtual video synthesis is described, and a detailed example is presented to demonstrate the prescription language syntax and its underlying concepts. Virtual Video Prescriptions for various video types and genres can include increasingly complex video models and composition rules, leading to a synthesis of SGML-based document technology with knowledge base systems technology.
François Paradis and Anne-Marie Vercoustre, A Language for Publishing Virtual Documents on the Web, International Workshop on the Web and Databases (WebDB 98), Valencia, Spain, 1-3 April, 1998.
The Web is creating exciting new possibilities for direct and instantaneous publishing of information. However, the apparent ease with which one can publish documents on the Web hides more complex issues such as updating and maintaining Web pages. We believe one of the crucial requirements for document delivery is the ability to extract and reuse information from other documents or sources. In this paper we present a descriptive language that allows users to write virtual documents, where dynamic information can be retrieved from various sources, transformed, and included along with static information in HTML documents. The language uses a tree-like structure for the representation of information, and defines a database-like query language for extracting and combining information without a complete knowledge of the structure or the types of information. The data structures and the syntax of the language are presented along with examples.
François Paradis, The Prescription Language of the RIO Project, technical report, CMIS, October, 1997.
An important component of the RIO (Reuse of Information Objects) project is the investigation of virtual document techniques to support the access and editing of electronic documents. In this report, we present a language for information reuse that allows users to write virtual documents, where dynamic information objects can be retrieved from various sources, transformed, and included along with static information in SGML documents. The language uses a tree-like structure for the representation of information objects, and allows querying without a complete knowledge of the structure or the types of information.
Anne-Marie Vercoustre and François Paradis, A Descriptive Language for Information Object Reuse through Virtual Documents, in 4th International Conference on Object-Oriented Information Systems (OOIS'97), Brisbane, Australia, pp299-311, 10-12 November, 1997.
The importance of reuse is well recognised for electronic document writing. However, it is rarely achieved satisfactorily because of the complexity of the task: integrating different formats, handling updates of information, addressing the document author's need for intuitiveness and simplicity, etc. In this paper, we present a language for information reuse that allows users to write virtual documents, where dynamic information objects can be retrieved from various sources, transformed, and included along with static information in SGML documents. The language uses a tree-like structure for the representation of information objects, and allows querying without a complete knowledge of the structure or the types of information. The data structures and the syntax of the language are presented through an example application. A major strength of our approach is to treat the document as a non-monolithic set of reusable information objects.
François Paradis and Anne-Marie Vercoustre and Brendan Hills, A Virtual Document Approach for Reusing SGML/XML Information Objects, presented at the SGML/XML Asia Pacific, Sydney, Australia, 22-24 September, 1997.
The importance of reusing information is well understood in electronic publishing, and is one of the motivations for the development and use of SGML. Reuse is actually quite hard to achieve with SGML, as the elements are strongly typed and there can be incompatibilities between the DTDs. HTML, an SGML derivative, relaxes those constraints, but unfortunately it does not provide a significant level of structure for identifying and extracting information, since the tags are mostly used for presentation. XML, another SGML derivative, is a promising alternative which could bring the power of SGML to the Web while keeping the simplicity of HTML. Those standards all have particularities which must be addressed in a global solution to the reuse of information. We present our solution to the reuse of SGML information objects: a system that can dynamically combine information from various sources, including databases and SGML-like documents, to produce a virtual document, which allows an author to reuse information in a document-centric, descriptive way. We maintain support for the particularities of the data sources, by having them stored in different formats and accessed in their own native query language, but also support the integration of these information objects by converting them into a common, tree-like data structure, and by providing a language to extract and transform information in those trees. In this approach, a collection of SGML documents can be stored in an object-oriented database as a tree-like hierarchy of information objects, thus taking advantage of the strict typing of SGML to provide efficient storage and retrieval. By extending the standard query language of the object-oriented database, we can query on an incomplete or partial knowledge of the document structure whilst retaining the search efficiency that the database engine provides us.
Combination of the results with other databases or data sources, and inclusion into the SGML virtual document, is handled by our tree language. HTML and XML documents do not always conform to a DTD, and, if they come from the Web, are volatile and fast-changing in nature. We propose in this case to access those documents through the standard file system or the HTTP protocol, to convert them to our tree-like data structures on-line, and to use our tree language for both extraction and transformation, with possibly some specific instructions to handle links. The system is currently being implemented. Our prototype application, a document to generate activity reports, reuses both an SGML database and a collection of HTML pages (as well as an SQL database), and shows how flexible and powerful our tool for information reuse is.
Anne-Marie Vercoustre and Jon Dell'Oro and Brendan Hills, Reuse of Information through Virtual Documents, in Second Australian Document Computing Symposium, Melbourne Australia, pp55-64, April 5, 1997.
This paper explores the issue of representing textual information in the form of virtual documents that include data and fragments of documents from remote sources - especially from databases and SGML document databases. Virtual documents are dynamically generated, and therefore always present up-to-date information when they are instantiated. The benefit of this paradigm is that it allows information to be shared, reused, and adapted for various contexts. A virtual document specification defines how to find and retrieve information objects from databases or from existing documents, and how to assemble them into another document. Virtual documents can be used to create HTML pages that contain information from one or several remote or local databases, to assemble parts of existing documents into a new one, or to define various views of the same information according to various needs. This paper focuses on the prototype implementation of virtual documents from the perspectives of document authoring and architecture. We propose an SGML syntax for Information Objects that includes OQL queries for retrieving fragments of existing documents, transformations on an intermediate tree representation, and output mapping to the virtual document structures.
Anne-Marie Vercoustre and Jon Dell'Oro, Building Virtual Documents upon Distributed Databases, CSIRO-Report, CMIS 98/66, 1996.
This paper explores the issue of publishing information in the form of Virtual Documents that include data and fragments of documents from remote sources, especially from databases and SGML document databases. Because they are dynamically generated, Virtual Documents are automatically updated when the source information changes, and they allow for sharing, reusing and adapting information for various contexts. The paper focuses on the implementation of virtual documents from the authoring and architecture perspective.
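
The idea common to the abstracts above (a virtual document that mixes static text with queries over tree-like information objects, and is re-evaluated each time it is instantiated) can be illustrated with a minimal sketch. This is not the RIO prescription language or its interpreter: the names Node, Query, instantiate and staff_source are hypothetical, and an in-memory Python list stands in for a real database or SGML source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Node:
    """A tree-like information object: a label, optional text, and children."""
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

    def find(self, label: str) -> List["Node"]:
        """Return this node and all descendants carrying `label`, in document order."""
        hits = [self] if self.label == label else []
        for child in self.children:
            hits.extend(child.find(label))
        return hits


# Hypothetical stand-in for a data source: a callable that returns a fresh tree.
Source = Callable[[], Node]


@dataclass
class Query:
    """Dynamic part of a prescription: select nodes labelled `select` from `source`."""
    source: str
    select: str


# A prescription is a sequence of static strings and queries (an assumption
# made for this sketch, not the syntax used in the papers).
Prescription = List[Union[str, Query]]


def instantiate(prescription: Prescription, sources: Dict[str, Source]) -> str:
    """Evaluate every query against the current source data and assemble the text."""
    parts: List[str] = []
    for item in prescription:
        if isinstance(item, str):
            parts.append(item)
        else:
            tree = sources[item.source]()
            parts.extend(node.text for node in tree.find(item.select))
    return "\n".join(parts)


# Illustrative data: a mutable list plays the role of a remote database.
staff = [("Ann", "editor")]


def staff_source() -> Node:
    """Build a tree view of the current contents of `staff`."""
    return Node("staff", children=[
        Node("person", children=[Node("name", name), Node("role", role)])
        for name, role in staff
    ])


prescription: Prescription = ["Team members:", Query(source="staff", select="name")]

first = instantiate(prescription, {"staff": staff_source})
staff.append(("Bob", "author"))  # the source changes after the first instantiation
second = instantiate(prescription, {"staff": staff_source})
```

Because the query is evaluated at instantiation time, the second result reflects the record added to the source, while the prescription itself is unchanged. The query selects by label only, so it needs no knowledge of how deeply the name nodes are nested.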

