An experimental programming language developed at the Xerox Research Centre of Europe in Grenoble, France, has been released for testing. Circus-DTE is a "new, innovative programming language designed for document transformation processing. Circus-DTE natively supports XML (DTDs) and is particularly suited to data processing or the transformation of structured documents: it automatically validates the results produced so that input into another application is sure to function properly. XRCE scientists believe that Circus-DTE could be especially useful when there are multiple document transformations, such as document content processing, Internet publishing, publishing on handheld devices and database-to-XML conversions. For example, processing a customer order requires a series of transformations -- data must be input into applications that check inventory and availability, that prepare shipping documentation, that generate an invoice, that process payments and perhaps even publish to the Web so a customer can track progress online. Circus-DTE is a mixture of functional, imperative and declarative programming styles. It is a type-safe compiled language with an embedded interpreter for compile-time evaluation of testing clauses. Circus-DTE incorporates structural matching operators that operate on all types of data. Matching operations involve typechecked filters which are combined up to arbitrary complex levels. Circus-DTE also offers a Linda-like Coordination Memory. Such a model relies on a few basic synchronization primitives and an associative memory that together simplify complex synchronization schemes." The software is downloadable; it runs on Win32, Linux, or Solaris/SunOS platforms.
Circus-DTE is a programming language specialized for data structure transformation. Thanks to its original type system, Circus-DTE can verify transformation programs before they run. The much-discussed problem of designing XML transformations that produce valid (DTD-compliant) output is addressed through this type system and a convenient data model, together with an automatic tool that converts DTDs into Circus-DTE types.
Circus-DTE usage scenarios
- Document re-engineering: Reorganization and update of document bases to follow changes in DTD/Schema structures
- Document reorganization and aggregation: Collect, filter and reorganize various contents in order to create new documents which conform to a given output DTD/Schema
- Publishing: Online or offline conversion of structured content to presentation format -- e.g., FO, PDF, HTML, Text
- Web service exploitation and integration: Design and development of components that transform and deliver data in conformance with published Schemas
- Document adaptation/customization: Conversion of existing documents to fit particular needs or usage profiles -- presentation tuning, content filtering or enrichment
Development: "Jean-Yves Vion-Dury developed the language at XRCE, with help from Veronika Lux (XML Data Model plus many other contributions) and Emmanuel Pietriga. After a first (medium scale: around 10 external developers and 10 thousand lines) evaluation of the 1.0 alpha version, Circus-DTE evolved to v2.0 alpha integrating many significant improvements and innovative features such as read-only/rewritable references, polymorphic records and 'regular' sequences. Polymorphic composition of transformers, another advanced feature currently under patenting process, is not made available in the alpha version. The technology is still evolving in order to address the many challenges faced by the document transformation community, among which are on-the-fly adaptation of documents and Web Services integration..." [main web page]
Circus-DTE Language Features
Document transformation technology is not new, and a number of solutions already exist. Until now, however, software engineers have had to choose between general-purpose low-level languages and more specific, abstract high-level languages; each group fulfills a particular need, but each has its drawbacks. Xerox's Circus-DTE technology bridges the gap between the two approaches, and in a sense offers the best of both worlds. Circus-DTE is specific enough to deal with today's complex transformation problems but also general enough to adapt to new challenges tomorrow. As such, it represents a breakthrough in language technology and an opportunity for software companies to enhance their competitiveness and increase their market share.
Circus-DTE is designed around an original programming abstraction called the PAM (Polymorphic Abstract Machine), a procedure that has exactly one input parameter and one output parameter. PAMs are well suited to programming small-grain transformation algorithms, which can then be combined through a rich set of connectors. Thanks to a patented composition calculus, Circus-DTE brings component reuse to a new level.
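The idea of small single-input, single-output transformers combined through connectors can be illustrated with a minimal Python sketch. This is an analogy only, not Circus-DTE syntax: the connector names `seq` and `first_of` and the order-processing functions are hypothetical, and the sketch has none of the static type checking that Circus-DTE applies to compositions.

```python
from typing import Any, Callable

# A "transformer" here is any function with exactly one input and one output,
# mirroring the single-input/single-output shape described for PAMs.
Transformer = Callable[[Any], Any]


def seq(*steps: Transformer) -> Transformer:
    """Sequential connector: feed each step's output into the next step."""
    def run(value: Any) -> Any:
        for step in steps:
            value = step(value)
        return value
    return run


def first_of(*alternatives: Transformer) -> Transformer:
    """Choice connector: use the first alternative that does not raise ValueError."""
    def run(value: Any) -> Any:
        for alternative in alternatives:
            try:
                return alternative(value)
            except ValueError:
                continue
        raise ValueError(f"no alternative accepted {value!r}")
    return run


# Hypothetical small-grain transformers for one step of order processing.
def parse_quantity(text: str) -> int:
    return int(text.strip())  # raises ValueError on non-numeric input


def default_quantity(_text: str) -> int:
    return 1


def to_order_line(quantity: int) -> dict:
    return {"item": "widget", "quantity": quantity}


# Compose: try to parse the quantity, fall back to a default, then build a record.
make_line = seq(first_of(parse_quantity, default_quantity), to_order_line)
```

Because every piece has the same one-in, one-out shape, the composed `make_line` is itself a transformer and can be reused inside a larger composition.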
Circus-DTE is a mixture of functional, imperative and declarative programming styles. It is a type-safe compiled language with an embedded interpreter for compile-time evaluation of testing clauses. Circus-DTE incorporates structural matching operators that operate on all types of data. Matching operations involve typechecked filters, which can be combined to arbitrarily complex levels.
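A rough Python sketch can show how structural filters nest and combine. It is an illustration under stated assumptions, not Circus-DTE code: the combinator names (`is_type`, `seq_of`, `record`, `either`) are hypothetical, and the checks run at run time, whereas Circus-DTE typechecks its filters at compile time.

```python
from typing import Any, Callable

# A filter is a predicate over an arbitrary value.
Filter = Callable[[Any], bool]


def is_type(expected: type) -> Filter:
    """Match any value that is an instance of the expected type."""
    return lambda value: isinstance(value, expected)


def seq_of(item: Filter) -> Filter:
    """Match a list in which every element matches the item filter."""
    return lambda value: isinstance(value, list) and all(item(x) for x in value)


def record(**fields: Filter) -> Filter:
    """Match a dict that has every named field with a matching value."""
    return lambda value: isinstance(value, dict) and all(
        name in value and matches(value[name]) for name, matches in fields.items()
    )


def either(*filters: Filter) -> Filter:
    """Match when at least one of the given filters matches."""
    return lambda value: any(f(value) for f in filters)


# Filters nest freely: an order is a record holding a sequence of line records.
order_line = record(sku=is_type(str), qty=either(is_type(int), is_type(float)))
order = record(id=is_type(int), lines=seq_of(order_line))
```

A transformation could apply `order` to its input before processing it, and to its output before handing the result to the next application.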
Circus-DTE also offers a Linda-like Coordination Memory. Such a model relies on a few basic synchronization primitives and an associative memory that together simplify complex synchronization schemes. The Linda experiment demonstrated that concurrent transformation processes remain highly reusable thanks to the indirection imposed by the model: processes exchange information without explicitly naming a recipient.
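The Linda model itself is easy to sketch in Python. The class below is a minimal tuple space with the classic Linda primitives `out` (write), `rd` (read) and `in` (read and remove, spelled `in_` here because `in` is a Python keyword). It illustrates the general model only; the names and the use of `None` as a wildcard are assumptions of this sketch, not the Circus-DTE Coordination Memory API.

```python
import threading
from typing import Any, Optional, Tuple


class TupleSpace:
    """A small associative memory shared by concurrent processes."""

    def __init__(self) -> None:
        self._tuples: list = []
        self._cond = threading.Condition()

    def out(self, *fields: Any) -> None:
        """Write a tuple into the space and wake up any waiting readers."""
        with self._cond:
            self._tuples.append(fields)
            self._cond.notify_all()

    def _find(self, pattern: Tuple[Any, ...]) -> Optional[Tuple[Any, ...]]:
        # None in the pattern acts as a wildcard; other fields must be equal.
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == f for p, f in zip(pattern, tup)
            ):
                return tup
        return None

    def _wait(self, pattern: Tuple[Any, ...], timeout: Optional[float]) -> Tuple[Any, ...]:
        # Must be called with the condition held; blocks until a match exists.
        if not self._cond.wait_for(lambda: self._find(pattern) is not None, timeout):
            raise TimeoutError(f"no tuple matching {pattern!r}")
        return self._find(pattern)

    def rd(self, *pattern: Any, timeout: Optional[float] = None) -> Tuple[Any, ...]:
        """Block until a matching tuple exists, then return it without removing it."""
        with self._cond:
            return self._wait(pattern, timeout)

    def in_(self, *pattern: Any, timeout: Optional[float] = None) -> Tuple[Any, ...]:
        """Block until a matching tuple exists, then remove and return it."""
        with self._cond:
            tup = self._wait(pattern, timeout)
            self._tuples.remove(tup)
            return tup


# The producer never names its consumer; the two meet only through the space.
space = TupleSpace()
producer = threading.Thread(target=space.out, args=("invoice", 42, 99.5))
producer.start()
invoice = space.in_("invoice", 42, None, timeout=2.0)
producer.join()
```

The consumer asks for "an invoice for order 42" by content, which is the indirection that keeps the two processes independent of each other.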
Semantics are formally defined through Structured Operational Semantics, including the concurrent operators. A formal semantics brings clarity and confidence to the language. It also makes it easier to establish important run-time properties, which are crucial in certain sensitive areas. Finally, it lets engineers and researchers work on the technology from a rigorous basis and with common notations.
Back-end and code generator: The Circus-DTE compiler generates Python bytecode and an intermediate format. A reflective API makes Circus-DTE particularly suitable for programming various code generators. A JVM code generator is included in the distribution package; it runs independently of the main compiler.
Portability: Circus-DTE runs natively on a Python 2.2 platform and offers cross-compilation to Java Virtual Machine bytecode. Circus-DTE therefore runs on the major architectures and operating systems that support either of these platforms.
XML data models: Circus-DTE proposes four different data models for handling XML and HTML documents. These are inclusion trees (compact, very well adapted to top-down recursive descent), reference trees (null pointer abstraction, flexibility for navigating in nodes), reference trees with namespaces, and DOM.
Networking primitives: Access is unified through Uniform Resource Identifiers (file, http, ftp, mail).
Ready-to-use, stand-alone components: Analysis and generation of document types (DTD, schemas). These components are used to translate known constraints on a document (e.g., a DTD) into equivalent Circus-DTE types by generating Circus-DTE source code. [adapted from the Research Factsheet]
- Announcement 2003-01-08: "Xerox Makes New Programming Language for Document Transformations Available for 'Test Drive'. Early Adopters Can Try Out Promising Software, Provide Feedback On Emerging Technologies at Xerox-RIT Site."
- alphaAve.com website
- Circus-DTE website
- Circus-DTE (Document Transformation Environment) Research Factsheet. 4 pages.
- The Circus-DTE tutorial. By David Ramsey (Xerox Research Centre Europe). August 2, 2002. Version: 1.1. 77 pages.
- Circus-DTE development history
- Contact: Christer Fernstrom
- See also: Xerox Multilingual Document Authoring Project
- Xerox Research Centre Europe, Grenoble, France