Cover Pages: XML Daily Newslink: Wednesday, 06 August 2008

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Primeton http://www.primeton.com

Headlines

Developers Release Apache Tuscany SCA 1.3 for SOA Application Development
Topic Maps Constraint Language (TMCL) Final Committee Draft
W3C Last Call Review for CSS Marquee Module Level 3
The OAI2LOD Server: Exposing OAI-PMH Metadata as Linked Data
OASIS BPEL4People TC Releases Committee Draft Candidate Specifications
The Apache Qpid XML Exchange
A Hybrid Parallel Processing for XML Parsing and Schema Validation
Using Atom Categorization to Build Dynamic Applications
Using the Ontology Editing Tool SWOOP to Edit Taxonomies and Thesaurii

Developers Release Apache Tuscany SCA 1.3 for SOA Application Development
Apache Tuscany Team, Software Announcement

The Apache Tuscany Project team has announced the version 1.3 release of the Java SCA project. Apache Tuscany provides a runtime environment based on the Service Component Architecture (SCA). SCA is a set of specifications aimed at simplifying SOA application development. These specifications are being standardized by OASIS as part of the Open Composite Services Architecture (Open CSA). This 1.3 release is the first release of Tuscany as an Apache top level project, and includes various improvements to the SCA runtime and extensions, along with performance improvements, and numerous bug fixes... An essential characteristic of SOA is the ability to assemble new and existing services to create brand new applications that may consist of different technologies. Service Component Architecture defines a simple, service-based model for construction, assembly and deployment of a network of services (existing and new ones) that are defined in a language-neutral way. Tuscany implements the SCA Version 1.0 specifications. The Apache Tuscany SCA 1.3 release includes implementations of the main SCA specifications including: (1) SCA Assembly Model V1.0; (2) SCA Policy Framework V1.0; (3) SCA Java Common Annotations and APIs V1.0; (4) SCA Java Component Implementation V1.0; (5) SCA Spring Component Implementation V1.0; (6) SCA BPEL Client and Implementation V1.0; (7) SCA Web Services Binding V1.0; (8) SCA EJB Session Bean Binding V1.0. It also includes implementations of many features not yet defined by SCA specifications, including: [i] SCA bindings for Direct Web Remoting, RSS and ATOM Feeds, HTTP resources, JSON-RPC, PUB/SUB Notifications, and RMI. [ii] SCA implementation types for OSGI, XQuery, BPEL, Widget and various dynamic languages including Groovy, Javascript, Python and Ruby. [iii] Databindings for Service Data Objects (SDO), JAXB, XmlBeans, Axis2's AXIOM, JSON, SAXON, DOM, SAX and StAX. The Tuscany SCA Runtime can be configured as a single node SCA domain or as an SCA domain distributed across multiple nodes.

See also: SCA and Apache Tuscany

Topic Maps Constraint Language (TMCL) Final Committee Draft
IPSJ/ITSCJ Secretariat, Ballot Text Announcement

A communication from Toshiko KIMURA announces that in accordance with Resolution 6 adopted at the SC 34 Plenary meeting held in Kyoto, Japan, 2007-12-08/11 (SC 34 N 968), "Information technology — Topic Maps — Constraint Language (TMCL)" is being circulated to the SC 34 members for a four-monthn FCD ballot. SC 34 members are requested to vote/comment via the CIB as soon as possible but not later than 2008-12-08. The TMCL specification defines a means to express and evaluate constraints on topic maps conforming to the Topic Map Data Model (TMDM). It defines a data model for representing constraints on instances of the topic map data model and the formal semantics for the interpretation of different constraint types. This International Standard expresses constraints using topic map constructs and the interpretation of these constraints as TMQL. In addition, This International Standard defines a number of CTM templates to faciliate the construction of TMCL constraints. TMCL constraints are represented as topic map structures using the TMDM. Any syntax that can be used to create TMDM structures is a valid authoring syntax for TMCL constraints and therefor no special syntax is defined. The formal semantics of TMCL constraints are defined using TMQL... TMCL defines constraint types and an interpretation for instances of those types. The interpretation indicates in an unambiguous fashion what it means for an instance of a given constraint type to be evaluated as true or false in the context of a TMDM instance. The TMCL constraint types are defined in terms of the topic map data model. The formal interpretation of each constraint type is defined using TMQL. All constraint types defined follow a common pattern. They are all defined as subtypes of the topic type called 'Constraint'. They all have an occurrence of type 'validation expression'. It is possible to define new constraint types that address specific domain requirements while still fitting into the overall TMCL validation framework. The constraint types defined in TMCL are intended for use in a entity constraint language fashion, such as ERM, UML etc. They are intended to be used to define the set of identities, occurrences, names and played association roles that a topic of a given type must have in order to been deemed valid..." ISO/OEC JTC 1/SC 34 is the international standardization subcommittee for Document Description and Processing Languages standards and technical reports related to structured markup languages (specifically the Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML)) in the areas of information description, processing and association.

W3C Last Call Review for CSS Marquee Module Level 3
Bert Bos (ed), W3C Technical Report

Members of the W3C Cascading Style Sheets (CSS) Working Group have published a Last Call Working Draft for "CSS Marquee Module Level 3." CSS describes the rendering of documents on various media. When documents (e.g., HTML) are laid out on visual media (e.g., screen or print) and the contents of some element are too large for a given area, CSS allows the designer to specify whether and how the overflow is displayed. One way, available on certain devices, is the "marquee" effect: the content is animated and moves automatically back and forth. This module defines the properties to control that effect. The marquee effect consists of the UA slowly moving the content of a box so that, over time, all parts of the box are visible at least once. The speed of the movement, whether the movement is in one direction only or back and forth, how far the content moves and how often may vary. But, unlike for most other scrolling mechanisms, the scrolling does not depend on user events. Typically, marquee is used in contexts where there is no room for a scrollbar or other visible mechanism or when user interaction is limited: instead of actively moving hidden content into view, the user waits while the content moves by. The specification only defines the marquee effect for level 2 of the CSS box model, i.e., for horizontal text only, as defined by CSS level 2. It is expected that this specification will be updated and generalized to include vertical text, once the CSS Text Layout module is stable. Features in this module were previously described in a draft of the CSS Box module.

The OAI2LOD Server: Exposing OAI-PMH Metadata as Linked Data
Bernhard Haslhofer and Bernhard Schandl, CEUR-WS Presentation

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is utilised for the exchange and sharing of metadata for digital and non-digital items and enjoys growing popularity in the domain of digital libraries and archives. Client applications can use the OAI-PMH protocol to harvest metadata from Data Providers using open standards such as URI, HTTP, and XML. Institutions taking the role of data providers can easily expose their metadata via OAI-PMH by implementing light-weight wrapper components on top of their existing metadata repositories... There exist a number of OAI Data Provider Registries, from which we know that currently 1765 institutions worldwide maintain OAI-PMH repositories. Regarding their application domain, we can observe that the protocol has been implemented in a variety of institutions, ranging from small research facilities to national libraries that have integrated this protocol with their catalogue systems. Examples are the Institute of Biology of the Southern Seas, exposing 403 records, and the U.S. National Library of Medicine's digital archive, exposing 1,272,585 records. The design of OAI-PMH is based on the Web Architecture, but it does not treat its conceptual entities as dereferencable resources. Also selective access to metadata is still out of its scope. One can, for instance, retrieve metadata for a certain digital item, but cannot retrieve all digital items that have been created by a certain author. With the OAI2LOD Server we provide a possible solution for these shortcomings by following the Linked Data design principles and by providing SPARQL access to metadata. The ongoing Object Reuse and Exchange (OAI-ORE) standardisation indicates that the idea of Linked Data will play a substantial role in the context of digital libraries and archives... OAI-ORE is a set of standards for the description and exchange of aggregations of Web resources. A resource can be anything that is identified with a URI such as Web sites, online multimedia content, or items stored in institutional digital library systems. In the ORE data model an aggregation is an instance of the conceptual entity Resource Map and is identified by a URI. A resource map describes the encapsulated resources as a set of machine readable RDF statements, which makes them readable for a variety of Web agents. Clients can retrieve aggregations by executing an HTTP GET request on a resource map's URI. The ATOM Syndication Format is specified as the primary serialisation format for delivering resource maps to clients. However, since the ORE data model is defined in RDF, resources can not only be mapped to the ATOM format but also serialised in other RDF exchange formats such as RDF/XML or N3. OAI-PMH and OAI-ORE overlap in the fact that Resource Maps can be included as metadata records in OAI-PMH responses, which allows batch retrieval and harvesting of aggregation information. We believe that there lies a great potential in a tighter integration of these two standards: if OAI-PMH metadata repositories expose their items as Web resources by assigning them HTTP-dereferencable URIs, these items could take part in OAI-ORE aggregations. One possible strategy could be to define a common core data model that links these two standards so that the ORE specification builds on top of the OAI-PMH protocol. Meanwhile, the OAI2LOD Server can serve as a bridge between these two standards.

See also: the LDOW 2008 Proceedings

OASIS BPEL4People TC Releases Committee Draft Candidate Specifications
Charlton Barreto, Luc Clement, Dieter Koenig (et al, eds), OASIS Working Drafts

Members of the OASIS WS-BPEL Extension for People (BPEL4People) Technical Committee have published milestone Committee Draft Candidate [*] deliverables for "WS-BPEL Extension for People (BPEL4People) Specification Version 1.1" and "Web Services - Human Task (WS-HumanTask) Specification Version 1.1." This OASIS TC was chartered in January 2008 to define: (1) extensions to the OASIS WS-BPEL 2.0 Standard to enable human interactions and (2) a model of human interactions that are service-enabled. Several working drafts have been released, including the current 'Rev 42', designated as 'Committee Draft Candidate' and queued for TC review and CD vote in the August 2008 timeframe. (1) BPEL4People Abstract: Web Services Business Process Execution Language, version 2.0 (WS-BPEL 2.0 or BPEL for brevity) introduces a model for business processes based on Web services. A BPEL process orchestrates interactions among different Web services. The language encompasses features needed to describe complex control flows, including error handling and compensation behavior. In practice, however many business process scenarios require human interactions. A process definition should incorporate people as another type of participants, because humans may also take part in business processes and can influence the process execution. This specification introduces a BPEL extension to address human interactions in BPEL as a first-class citizen. It defines a new type of basic activity which uses human tasks as an implementation, and allows specifying tasks local to a process or use tasks defined outside of the process definition. This extension is based on the WS-HumanTask specification. (2) WS-HumanTask Abstract: The concept of human tasks is used to specify work which has to be accomplished by people. Typically, human tasks are considered to be part of business processes. However, they can also be used to design human interactions which are invoked as services, whether as part of a process or otherwise. This specification introduces the definition of human tasks, including their properties, behavior and a set of operations used to manipulate human tasks. A coordination protocol is introduced in order to control autonomy and life cycle of service-enabled human tasks in an interoperable manner. [*NB: Candidate in this context means that the Working Drafts are nominated for TC Review and ballot; if approved, the Working Drafts ('Committee Draft Candidates') will be issued as Committee Draft specifications.]

See also: the OASIS BPEL4People TC

The Apache Qpid XML Exchange
Jonathan Robie, Paper Prepared for Presentation at Balisage 2008

XML is widely used for messaging applications. Message-oriented Middleware (MOM) is a natural fit for XML messaging, but it has been plagued by a lack of standards. Each vendor's system uses its own proprietary protocols, so clients from one system generally can not communicate with servers from another system. Developers who are drawn to XML because it is simple, open, interoperable, language independent, and platform independent often use REST for messaging because it shares the same virtues. When XML developers need high-performance, guaranteed delivery, transactions, security, management, asynchronous notification, or direct support for common messaging paradigms like point-to-point, broadcast, request/response, and publish/subscribe, they have been forced to sacrifice some of the virtues that drew them to XML in the first place. Java JMS is an API, defined only for Java, and it does not define a wire protocol that would allow applications running on different platforms or written in different languages to interoperate. SOAP and Web Services offer interoperability if the same underlying protocols are used and if the same WSI-protocol is used by all parties, but at the cost of more complexity than a MOM system. And as the basic components of enterprise messaging have been added piece by piece to the original specifications, Web Services have become complex, defined in a large number of overlapping specifications, without a coherent and simple architecture. The new Advanced Message Queueing Protocol (AMQP) is an open, language independent, platform independent standard for enterprise messaging. It provides precisely the coherent and simple architecture that has been missing for sophisticated messaging applications. Red Hat Enterprise MRG includes a multi-language, multi-platform, open source implementation of AMQP. We develop the messaging component as part of the upstream Apache Qpid project. In order to meet the needs of XML messaging systems, we contributed the Apache Qpid XML Exchange, which provides XQuery-based routing for XML content and message properties. Together, AMQP, Apache Qpid, and the Qpid XML Exchange provide a solid foundation for mission critical XML messaging applications.

See also: AMQP references

A Hybrid Parallel Processing for XML Parsing and Schema Validation
Yu Wu, Qi Zhang, Zhiqiang Yu, Jianhui Li; Balisage 2008 Conference Paper

XML is playing crucial roles in web services, databases, and document representing and processing. However, the processing of XML document has been regarded as the main performance bottleneck especially for the processing of very large XML data. On the other hand, multi-core processing gains increasingly popularity both on the desktop computers and server computing machines. To take full advantage of multi-cores, we present a novel hybrid parallel XML processing model, which combines data-parallel and pipeline processing. It first partitions the XML by chunks to perform data parallel processing for both XML parsing and schema validation, then organize and execute them as a two stage pipeline to exploit more parallelism. The hybrid parallel XML processing model has shown great overall performance advantage on multi-core platform as indicated by the experiment performance results... Several efforts have been made in this field to parallelize XML parsing. Wei Lu first presented a pre-scanning based parallel parsing model, which consisted of an initial pre-scanning phase to determine the structure of the XML document, followed by a full, parallel parser. The results of pre-scanning phase are used to help partition the XML document for data parallel processing. The research continued with an attempt to parallelize the pre-scanning stage to exploit more parallelism. Michael R.Head also explored new techniques for parallelizing parsers for very large XML documents. They did not focus on developing parallel XML parsing algorithm, but exposing the parallelism by dividing XML parsing process into several phases, such as XML data loading, XML parsing and result generation, and then scheduling working threads to execute each parsing phase in a pipeline model. The paper discussed a bunch of performance issues such as load imbalance and communication and synchronization overhead. Parabix uses parallel bitstream technology, in its XML parsing by exploiting the SSE vector instructions in the Intel architecture. Compared to other approaches, our approach tries to avoid the pre-scanning overhead, as we discovered the pre-scanning overhead is considerable especially after we improved the parsing performance. Our the algorithm is chunk-based and each parallel sub-task is to process a chunk, it helps processing a large document without loading it into the memory. The vectorization approachcan be used in the each parallel sub-task and therefore complementary to our approach. Moreover, our paper is the first one describing the parallel schema validation and the hybrid parallel model. The performance evaluation results show the performance benefits by this model and parallel XML parsing and schema validation.

See also: XML Schema Languages

Using Atom Categorization to Build Dynamic Applications
Alexander Milowski, Paper Prepared for Presentation at Balisage 2008

Atom feeds provide the ability to categorize both the feed and its entries. This categorization provides a simple and easy way for feed authors to associated terms and semantics to their feed contents. By using this categorization, authors can keep their information organized while re-purposing them to build dynamic web applications... One of the interesting parts of the Atom vocabulary is the category element associated with both feeds and entries. This element has two important attributes called 'scheme' and 'term'. The scheme attribute is an URI value that 'qualifies' or 'scopes' the term attribute's value. The element itself can contain any content—text or elements—but none is defined by the Atom Syndication Format. If you concatenate the scheme and term attribute values and assume a default for when the scheme attribute is omitted, the result is a URI. This value can be interpreted as a leaf term in some unnamed ontology that labels the entry or feed with that term. As the category element may contain content, a value can be associated with the term. This interpretation means that for each category element you get a RDF triple. This triple is constructed such that the subject is the entry or feed, the predicate is the term URI, and the object is the value of the element.. Plenty of software system exists that allow authored keywords to produce index information and then allow people to browse that information. What is interesting here is that we're using categorization and terms. Any categorization—both formal and informal—can now be used to annotate information stored in the feeds. The annotations are not limited to keywords. Also, the combination of different terms and values can be used to create a very specific set of informaiton. Similarly, the queries are not limited to simple retrieval exercises. The SPARQL queries can perform complex union and intersection operations as well as filtering on term values. As such, very specific data sets can be retrieved from the atomojo server. As time goes by, queries can be developed to use whatever categorization evolves from the authors. These queries can be used to re-purpose that original content without much, if any, change to the feed metadata. The resulting feeds can then be associated with a web resource independent of how the author chose to organize the original entries and feeds. That is, I can create a disorganized pile of information and keep my website organized.

Using the Ontology Editing Tool SWOOP to Edit Taxonomies and Thesaurii
Bob DuCharme, Blog

In the online course in taxonomy development that I took recently we reviewed several popular taxonomy development tools. I found them to be expensive or to have clunky, dated interfaces, and was disappointed that the formats most of these programs supported for storing saved work was either a binary proprietary format or what they just called "XML"... There is a standard format that they can share: SKOS, which provides an ontology (available as an OWL file) that defines the kinds of relationships that taxonomists want to see in taxonomy or thesaurus development. This includes basic ones such as "narrower" and "broader" and more sophisticated variations on these such as "broaderPartitive" and "narrowerInstantive". A little background: we can qualify the relationship between a term in a tree and its parent by saying that the child node is narrowerInstantive, as the Louvre is an instance of a museum, or narrowerPartitive, as a brain stem is a part of a brain, or narrowerGeneric, as the class of parrots is a subclass of the class of birds. In addition to defining the taxonomy term relationship properties "broader" and "narrower", SKOS defines instantive, partitive, and generic subproperties of "broader" and "narrower"... If the SKOS standard lays out the potential relationships and provides a definition of these relationships in a standard syntax (OWL), and an open source GUI tool like SWOOP can read that and let you define the terms and relationships in a new thesaurus by pointing and clicking, then the most difficult part of providing a new alternative to the well-known taxonomy tools is already done, right? Well, not quite. There are two key things missing... A SPARQL query delivered via Pellet can pull out explicit and implicit triples, but not in a syntax that can be used for an RDF/OWL file. I saw on the Pellet mailing list that the next version would support SPARQL CONSTRUCT queries that let you create a new set of RDF around the returned triples, so that will help. Describing all this here, I can casually refer to the use of SWOOP to read an ontology file and then define individuals and their taxonomic relationships, but I'd like to spell out in more detail how I used SWOOP to do this...


SEARCH \| ABOUT \| INDEX \| NEWS \| CORE STANDARDS \| TECHNOLOGY REPORTS \| EVENTS \| LIBRARY

Headlines

Sponsors