Members of IBM alphaWorks and the ToX Project ('Toronto XML Engine', under development at the Database Group, Department of Computer Science, University of Toronto) have released a public version of ToXgene. ToXgene is a template-based XML content generator for complex, semantically-correlated collections of XML documents. "The data generation process in ToXgene is based on a conceptual description of the data to be generated, i.e., the templates. The tool is intended for cases in which the structure of the data to be generated is known, the data is required to conform to that structure, and multiple collections of documents, with varying structures, sizes and complexities, can easily be generated. ToXgene has four main features: (1) it generates complex XML content, including elements with mixed content, attributes, non-gibberish text, and different numerical and date values; (2) it supports random number generators using different probability distributions; (3) it allows element sharing among different XML documents; (4) it supports the specification of integrity constraints over the data it produces, thus allowing the generation of consistent ID, IDREF, and IDREFS attributes."
From the online tool description:
ToXgene allows the reuse of previously generated content. Thus, an existing collection of documents can be expanded while its consistency is maintained, rather than being regenerated from scratch. Moreover, ToXgene can mix real and synthetic data during the generation process, which is often required in practical situations (for example, using real names of countries or provinces).
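To make the ideas above concrete, here is a minimal sketch in Python of what "consistent" generated content means: synthetic values drawn from a probability distribution, real values taken from a fixed list, and reference attributes that always point at an ID that was actually generated. This is not ToXgene's template language or API. The element names, the `generate_collection` function, the province list, and the choice of an exponential distribution are all assumptions made for illustration, and the `id` and `customer` attributes only play the role that DTD-declared ID and IDREF attributes would play.

```python
import random
import xml.etree.ElementTree as ET

# Real data mixed into the synthetic output (hypothetical sample list).
REAL_PROVINCES = ["Ontario", "Quebec", "Alberta"]


def generate_collection(num_customers, num_orders, seed=0):
    """Build two XML documents whose references all resolve to generated IDs.

    Illustrative only: names and structure are assumptions, not ToXgene's.
    Requires num_customers >= 1 so every order has something to refer to.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same data

    customers = ET.Element("customers")
    customer_ids = []
    for i in range(num_customers):
        cid = f"c{i}"
        customer_ids.append(cid)
        cust = ET.SubElement(customers, "customer", {"id": cid})
        # Real value picked from the fixed list above.
        ET.SubElement(cust, "province").text = rng.choice(REAL_PROVINCES)

    orders = ET.Element("orders")
    for j in range(num_orders):
        # The reference is chosen only from IDs that already exist,
        # which is what keeps the two documents consistent.
        order = ET.SubElement(
            orders, "order",
            {"id": f"o{j}", "customer": rng.choice(customer_ids)},
        )
        # Synthetic value drawn from an exponential distribution (mean 50).
        amount = rng.expovariate(1 / 50.0)
        ET.SubElement(order, "amount").text = f"{amount:.2f}"

    return customers, orders


customers_doc, orders_doc = generate_collection(num_customers=3, num_orders=5)
print(ET.tostring(orders_doc, encoding="unicode"))
```

Expanding an existing collection in the same spirit would mean loading the previously generated customer IDs first and drawing new references from that set, rather than starting over.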
ToXgene is written in Java and should run on all Java platforms. It requires a JRE compatible with JDK 1.3 and the Xerces 1.4.1 library.
Authors of ToXgene include Denilson Barbosa, Alberto Mendelzon, and John Keenleyside.
From the ToX Project web site: "The Toronto XML Server is a repository for XML data and metadata, which supports real and virtual XML documents. Real documents are stored as files or mapped into relational or object databases, depending on their structuredness; indices are defined according to the storage method used. Virtual documents can be remote documents, defined as arbitrary WebOQL queries, or views, defined as queries over documents registered in the system. The system catalog contains metadata for the documents, especially their schemata, used for query processing and optimization. Queries can range over both the catalog and the documents, and multiple query languages are supported."
Principal references:
- ToXgene web site
- ToXgene FAQ document
- ToX Project web site (UToronto)
- "Toronto XML Server (ToX) Provides Repository for Real and Virtual XML Documents."
- "ToX - The Toronto XML Engine." By Denilson Barbosa, Attila Barta, Alberto Mendelzon, George Mihaila, Flavio Rizzolo, and Patricia Rodriguez-Gianolli. Paper presented at the International Workshop on Information Integration on the Web, Rio de Janeiro, 2001.
- "Indexing XML Data with ToXin." By F. Rizzolo and A. Mendelzon. Paper presented at the Fourth International Workshop on the Web and Databases, Santa Barbara, CA, 2001.