Cover Pages: XML Daily Newslink: Wednesday, 17 March 2010

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Oracle Corporation http://www.oracle.com

Headlines

IETF Update: Specification for a URI Template
Integrating Composite Applications on the Cloud Using SCA
New Models of Human Language to Support Mobile Conversational Systems
Definitions for Expressing Standards Requirements in IANA Registries
Now IBM's Getting Serious About Public IaaS
There is REST for the Weary Developer
Aggregative Digital Libraries: D-NET Software Toolkit and OAIster System
Public Data: Translating Existing Models to RDF

IETF Update: Specification for a URI Template
Joe Gregorio, Roy Fielding, Marc Hadley, Mark Nottingham, David Orchard; IETF Internet Draft

A revised version of the IETF Standards Track Internet Draft "URI Template" has been published. From the abstract: "A URI Template is a compact sequence of characters for describing a range of Uniform Resource Identifiers through variable expansion. This specification defines the URI Template syntax and the process for expanding a URI Template into a URI, along with guidelines for the use of URI Templates on the Internet."

Overview: "A Uniform Resource Identifier (URI) is often used to identify a specific resource within a common space of similar resources... URI Templates provide a mechanism for abstracting a space of resource identifiers such that the variable parts can be easily identified and described. URI templates can have many uses, including discovery of available services, configuring resource mappings, defining computed links, specifying interfaces, and other forms of programmatic interaction with resources.

A URI Template provides both a structural description of a URI space and, when variable values are provided, a simple instruction on how to construct a URI corresponding to those values. A URI Template is transformed into a URI-reference by replacing each delimited expression with its value as defined by the expression type and the values of variables named within the expression. The expression types range from simple value expansion to multiple key=value lists. The expansions are based on the URI generic syntax, allowing an implementation to process any URI Template without knowing the scheme-specific requirements of every possible resulting URI.

A URI Template may be provided in absolute form, as in the examples above, or in relative form if a suitable base URI is defined... A URI Template is also an IRI template, and the result of template processing can be rendered as an IRI by transforming the pct-encoded sequences to their corresponding Unicode character if the character is not in the reserved set... Parsing a valid URI Template expression does not require building a parser from the given ABNF. Instead, the set of allowed characters in each part of URI Template expression has been chosen to avoid complex parsing, and breaking an expression into its component parts can be achieved by a series of splits of the character string. Example Python code [is planned] that parses a URI Template expression and returns the operator, argument, and variables as a tuple..."
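The split-based processing described above can be illustrated with a minimal Python sketch. It handles only simple value expansion of `{name}` and `{a,b}` expressions, not the draft's other expression types, and the function name `expand_simple`, the choice to percent-encode everything outside the unreserved set, and the example.org URI are assumptions made for illustration rather than anything taken from the draft.

```python
from urllib.parse import quote


def expand_simple(template, values):
    """Expand {name} expressions by simple value substitution (sketch only)."""
    # Split on "{": every piece after the first begins with an expression.
    head, *rest = template.split("{")
    out = [head]
    for piece in rest:
        # Split once on "}" to separate the expression from trailing literal text.
        expr, closing, literal = piece.partition("}")
        if not closing:
            raise ValueError("unterminated expression: {" + piece)
        names = expr.split(",")
        # Assumption: undefined variables expand to nothing, and every
        # character outside the unreserved set is percent-encoded.
        parts = [quote(str(values[n]), safe="") for n in names if n in values]
        out.append(",".join(parts))
        out.append(literal)
    return "".join(out)


print(expand_simple("http://example.org/{user}/notes/{id}",
                    {"user": "j doe", "id": 7}))
```

No ABNF-driven parser is needed here; two string splits per expression are enough to recover the variable names.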

Integrating Composite Applications on the Cloud Using SCA
Rajarshi Bhose and Kiran Nair, DDJ

"Elastic computing has made it possible for organizations to use cloud computing and a minimum of computing resources to build and deploy a new generation of applications. Using the capabilities provided by the cloud, enterprises can quickly create hybrid composite applications on the cloud using the best practices of service-component architectures (SCA).

Since SCA promotes all the best practices used in service-oriented architectures (SOA), building composite applications using SCA is one of the best guidelines for creating cloud-based composite applications. Applications created using several different runtimes running on the cloud can be leveraged to create a new component, and hybrid composite applications that scale on demand with private/public cloud models can also be built using secure transport data channels.

In this article, we show how to build and integrate composite applications using Apache Tuscany, the Eucalyptus open source cloud framework, and OpenVPN to create a hybrid composite application. To show that distributed applications composed of composite modules (distributed across the cloud and enterprise infrastructure) can be integrated and function as a single unit using SCA without compromising on security, we create a composite application with components spread over different domains distributed across the cloud and the enterprise infrastructure. We then use SCA to host and integrate this composite application so that it fulfills the necessary functional requirements. To ensure information and data security, we set up a virtual private network (VPN) between the different domains (cloud and enterprise), creating a point-to-point encrypted network which provides secure information exchange between the two environments...

This project illustrates that distributed applications composed of composite modules (distributed across the cloud and Enterprise Infrastructure) can be integrated and made to function as a single unit using Service Component Architecture (SCA) without compromising on security..."

New Models of Human Language to Support Mobile Conversational Systems
Staff, W3C Announcement

W3C has announced a 'Workshop on Conversational Applications: Use Cases and Requirements for New Models of Human Language to Support Mobile Conversational Systems'. The workshop will be held June 18-19, 2010 in New Jersey, US, hosted by Openstream. The main outcome of the workshop will be the publication of a document that will serve as a guide for improving the W3C language model. W3C membership is not required to participate in this workshop. The current program committee consists of: Paolo Baggia (Loquendo), Daniel C. Burnett (Voxeo), Deborah Dahl (W3C Invited Expert), Kurt Fuqua (Cambridge Mobile), Richard Ishida (W3C), Michael Johnston (AT&T), James A. Larson (W3C Invited Expert), Sol Lerner (Nuance), David Nahamoo (IBM), Dave Raggett (W3C), Henry Thompson (W3C/University of Edinburgh), and Raj Tumuluri (Openstream).

"A number of developers of conversational voice applications feel that the model of human language currently supported by W3C standards such as SRGS, SISR and PLS is not adequate and that developers need new capabilities in order to support more sophisticated conversational applications. The goal of the workshop therefore is to understand the limitations of the current W3C language model in order to develop a more comprehensive model. We plan to collect and analyze use cases and prioritize requirements that ultimately will be used to identify improvements to the W3C language model. Just as W3C developed SSML 1.1 to broaden the languages for which SSML is useful, this effort will result in improved support for language capabilities that are unsupported today.

Suggested Workshop topics for position papers include: (1) Use cases and requirements for grammar formalisms more powerful than SRGS's context free grammars that are needed to implement tomorrow's applications; (2) What are the common aspects of human language models for different languages that can be factored into reusable modules? (3) Use cases and requirements for realigning/extending SRGS, PLS and SISR to support more powerful human language models; (4) Use cases and requirements for sharing grammars among concurrent applications; (5) Use cases that illustrate requirements for natural language capabilities for conversational dialog systems that cannot easily be implemented using the current W3C conversational language model; (6) Use cases and requirements for speech-enabled applications that can be used across multiple languages (English, German, Spanish, ...) with only minor modifications; (7) Use cases and requirements for composing the behaviors of multiple speech-enabled applications that were developed independently without requiring changes to the applications; (8) Use cases and requirements motivating the need to resolve ellipses and anaphoric references to previous utterances.

Position papers, due April 2, 2010, must describe requirements and use cases for improving W3C standards for conversational interaction and how the use cases justify one or more of these topics: Formal notations for representing grammar in: Syntax, Morphology, Phonology, Prosodics; Engine standards for improvement in processing: Syntax, Morphology, Phonology, Lexicography; Lexicography standards for: parts-of-speech, grammatical features and polysemy; Formal semantic representation of human language including: verbal tense, aspect, valency, plurality, pronouns, adverbs; Efficient data structures for binary representation and passing of: parse trees, alternate lexical/morphologic analysis, alternate phonologic analysis; Other suggested areas or improvements for standards based conversational systems development..."

See also: W3C Workshops

Definitions for Expressing Standards Requirements in IANA Registries
Olafur Gudmundsson and Scott Rose (eds), IETF Internet Draft

The Internet Engineering Steering Group (IESG) has received a request to consider the specification Definitions for Expressing Standards Requirements in IANA Registries as a Best Current Practice RFC (BCP). The IESG plans to make a decision in the next few weeks, and solicits final comments on this action; please send substantive comments to the IETF mailing lists by 2010-04-14.

Abstract: "RFC 2119 defines words that are used in IETF standards documents to indicate standards compliance. These words are fine for defining new protocols, but there are certain deficiencies in using them when it comes to protocol maintainability. Protocols are maintained by either updating the core specifications or via changes in protocol registries. For example, security functionality in protocols often relies upon cryptographic algorithms that are defined in external documents. Cryptographic algorithms have a limited life span, and new algorithms are regularly phased in to replace older algorithms. This document proposes standard terms to use in protocol registries and possibly in standards track and informational documents to indicate the life cycle support of protocol features and operations.

The proposed requirement words for IANA protocol registries include the following. (1) MANDATORY: This is the strongest requirement, and for an implementation to ignore it there MUST be a valid and serious reason. (2) DISCRETIONARY, for Implementations: Any implementation MAY or MAY NOT support this entry in the protocol registry. The presence or omission of this MUST NOT be used to judge implementations on standards compliance (and for) Operations: Any use of this registry entry in operation is supported; ignoring or rejecting requests using this protocol component MUST NOT be used as a basis for asserting lack of compliance. (3) OBSOLETE for Implementations means new implementations SHOULD NOT support this functionality, and for Operations, means any use of this functionality in operation MUST be phased out. (4) ENCOURAGED: This word is added to the registry entry when new functionality is added and before it is safe to rely solely on it. Protocols that have the ability to negotiate capabilities MAY NOT need this state. (5) DISCOURAGED means this requirement is placed on an existing function that is being phased out. This is similar in spirit to both MUST- and SHOULD- as defined and used in certain RFCs such as RFC 4835. (6) RESERVED: Sometimes there is a need to reserve certain values to avoid problems such as values that have been used in implementations but were never formally registered. In other cases reserved values are magic numbers that may be used in the future as escape valves if the number space becomes too small. (7) AVAILABLE is a value that can be allocated by IANA at any time..."

This document is motivated by the experiences of the editors in trying to maintain registries for DNS and DNSSEC. For example, DNS defines a registry for hash algorithms used for a message authentication scheme called TSIG; the first entry in that registry was for HMAC-MD5. The DNSEXT working group decided to try to decrease the number of algorithms listed in the registry and add a column to the registry listing the requirements level for each one. Upon reading that HMAC-MD5 was tagged as 'OBSOLETE', a firestorm started. It was interpreted as the DNS community making a statement on the status of HMAC-MD5 for all uses.

Now IBM's Getting Serious About Public IaaS
James Staten, Forrester Blog

"IBM has been talking a good cloud game for the last year or so. They have clearly demonstrated that they understand what cloud computing is, what customers want from it and have put forth a variety of offerings and engagements to help customers head down this path—mostly through internal cloud and strategic rightsourcing options.

But its public cloud efforts, outside of application hosting, have been a bit of wait and see. Well, the company is clearly getting its act together in the public cloud space with today's announcement of the Smart Business Development and Test Cloud, a credible public Infrastructure as a Service (IaaS) offering. This new service is an extension of its developerWorks platform and gives its users a virtual environment through which they can assemble, integrate and validate new applications. Pricing on the service is as you would expect from an IaaS offering, and free for a limited time...

Certainly any IaaS can be used for test and development purposes so IBM isn't breaking new ground here. But it's off to a solid start with stated support from test and dev specialist partners SOASTA, VMLogix, AppFirst and Trinity Software bringing their tools to the IBM test cloud..."

See also: Jeffrey Schwartz in GCN

There is REST for the Weary Developer
Sivadasan Plakote, DevX.com

This brief article provides an example of working with Representational State Transfer (REST), a style of software architecture for accessing information on the Web. The term RESTful service refers to web services exposed as resources that use XML over the HTTP protocol. The term REST dates back to 2000, when Roy Fielding used it in his doctoral dissertation. The article notes that the W3C recommends using WSDL 2.0 as the language for defining REST web services. To explain REST, the article takes the example of purchasing items from a catalog application...

First we will define CRUD operations for this service as follows. The term CRUD stands for the basic database operations Create, Read, Update, and Delete. In the example, you can see that creating a new item with an Id is not supported. When a request for a new item is received, an Id is created and assigned to the new item. Also, we are not supporting the update and delete operations for the collection of items. Update and delete are supported for the individual items...
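The CRUD rules described above can be sketched as a small in-memory dispatcher in Python. This is an illustration under stated assumptions, not the article's code: the class name `ItemService`, the `/items` and `/items/<id>` paths, the mapping of CRUD onto POST/GET/PUT/DELETE, and the status codes are all hypothetical choices.

```python
import itertools


class ItemService:
    """In-memory sketch of the collection/item CRUD rules (hypothetical API)."""

    def __init__(self):
        self._items = {}
        self._next_id = itertools.count(1)

    def handle(self, method, path, body=None):
        parts = [p for p in path.split("/") if p]
        if parts == ["items"]:
            return self._collection(method, body)
        if len(parts) == 2 and parts[0] == "items" and parts[1].isdigit():
            return self._item(method, int(parts[1]), body)
        return 404, None

    def _collection(self, method, body):
        if method == "GET":                      # Read the whole collection
            return 200, dict(self._items)
        if method == "POST":                     # Create: the service assigns the Id
            item_id = next(self._next_id)
            self._items[item_id] = body
            return 201, item_id
        return 405, None                         # no update/delete on the collection

    def _item(self, method, item_id, body):
        if method not in ("GET", "PUT", "DELETE"):
            return 405, None                     # no create with a client-chosen Id
        if item_id not in self._items:
            return 404, None                     # PUT never creates a missing item
        if method == "GET":
            return 200, self._items[item_id]
        if method == "PUT":
            self._items[item_id] = body
            return 200, body
        del self._items[item_id]                 # DELETE
        return 204, None


service = ItemService()
print(service.handle("POST", "/items", {"name": "pen"}))
print(service.handle("PUT", "/items", {"name": "ignored"}))
```

Keeping the collection and item handlers separate makes the unsupported combinations explicit: each one falls through to a single "method not allowed" result.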

Interface documents: How does the client know what to expect in return when it makes a call for CRUD operations? The answer is the interface document. In this document you can define the CRUD operation mapping, Item.xsd file, and request and response XML. You can have separate XSD for request and response, or response can have text such as 'success' in return for the methods other than GET...

There are other frameworks available for RESTful Services. Some of them are listed here: Sun's reference implementation for JAX-RS, code-named Jersey, which uses an HTTP web server called Grizzly and the Grizzly Servlet; Ruby on Rails; Restlet; Django; Axis2.

See also: Wikipedia on REST

Aggregative Digital Libraries: D-NET Software Toolkit and OAIster System
Paolo Manghi, Marko Mikulicic, Leonardo Candela, (et al), D-Lib Magazine

"Aggregative Digital Library Systems (ADLSs) provide end users with web portals to operate over an information space of descriptive metadata records, collected and aggregated from a pool of possibly heterogeneous repositories. Due to the costs of software realization and system maintenance, existing 'traditional' ADLS solutions are not easily sustainable over time for the supporting organizations. Recently, the DRIVER EC project proposed a new approach to ADLS construction, based on Service-Oriented Infrastructures. The resulting D-NET software toolkit enables a running, distributed system in which one or multiple organizations can collaboratively build and maintain their service-oriented ADLSs in a sustainable way. Aggregative Digital Library Systems (ADLSs) typically address two main challenges: (1) populating an information space of metadata records by harvesting and normalizing records from several OAI-PMH compatible repositories; and (2) providing portals to deliver the functionalities required by the user community to operate over the aggregated information space, for example, search, annotations, recommendations, collections, user profiling, etc.

Repositories are defined here as software systems that typically offer functionalities for storing and accessing research publications and relative metadata information. Access usually has the twofold form of search through a web portal and bulk metadata retrieval through OAI-PMH interfaces. In recent years, research institutions, university libraries, and other organizations have been increasingly setting up repository installations (based on technologies such as Fedora, ePrints, DSpace, Greenstone, OpenDlib, etc) to improve the impact and visibility of their user communities' research outcomes.

In this paper, we advocate that D-NET's 'infrastructural' approach to ADLS realization and maintenance proves to be generally more sustainable than 'traditional' ones. To demonstrate our thesis, we report on the sustainability of the 'traditional' OAIster System ADLS, based on DLXS software (University of Michigan), and that of the 'infrastructural' DRIVER ADLS, based on D-NET.

As an exemplar of traditional solutions we rely on the well-known OAIster System, whose technology was realized at the University of Michigan. The analysis will show that constructing static or evolving ADLSs using D-NET can notably reduce software realization costs and that, for evolving requirements, refinement costs for maintenance can be made more sustainable over time..."

Public Data: Translating Existing Models to RDF
Jeni Tennison, Blog

"As we encourage linked data adoption within the UK public sector, something we run into again and again is that (unsurprisingly) particular domain areas have pre-existing standard ways of thinking about the data that they care about. There are existing models, often with multiple serialisations, such as in XML and a text-based form, that are supported by existing tool chains. In contrast, if there is existing RDF in that domain area, it's usually been designed by people who are more interested in the RDF than in the domain area, and is thus generally more focused on the goals of the typical casual data re-user rather than the professionals in the area...

To give an example, the international statistics community uses SDMX for representing and exchanging statistics... SDMX includes a well-thought-through model for statistical datasets and the observations within them, as well as standard concepts for things like gender, age, unit multipliers and so on. By comparison, SCOVO, the main RDF model for representing statistics, barely scratches the surface. This isn't the only example: the INSPIRE Directive defines how geographic information must be made available. GEMINI defines the kind of geospatial metadata that that community cares about. The Open Provenance Model is the result of many contributors from multiple fields, and again has a number of serialisations.

You could view this as a challenge: experts in their domains already have models and serialisations for the data that they care about; how can we persuade them to adopt an RDF model and serialisations instead? But that's totally the wrong question. Linked data doesn't, can't and won't replace existing ways of handling data. The question is really about how to enable people to reap these benefits; the answer, because HTTP-based addressing and typed linkage is usually hard to introduce into existing formats, is usually to publish data using an RDF-based model alongside existing formats. This might be done by generating an RDF-based format (such as RDF/XML or Turtle) as an alternative to the standard XML or HTML, accessible via content negotiation, or by providing a GRDDL transformation that maps an XML format into RDF/XML...

Modelling is a complex design activity, and you're best off avoiding doing it if you can. That means reusing conceptual models that have been built up for a domain as much as possible and reusing existing vocabularies wherever you can. But you can't and shouldn't try to avoid doing design when mapping from a conceptual model to a particular modelling paradigm such as a relational, object-oriented, XML or RDF model. If you're mapping to RDF, remember to take advantage of what it's good at such as web-scale addressing and extensibility, and always bear in mind how easy or difficult your data will be to query. There is no point publishing linked data if it is unusable..."
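As a rough illustration of publishing an RDF rendering alongside an existing format, the Python sketch below turns a flat record into N-Triples (a line-based RDF syntax that is also valid Turtle). The function names, the example.org URIs, and the one-property-per-field mapping are assumptions for illustration only; a real mapping would reuse an existing vocabulary and involve the design work the post describes.

```python
def literal(value):
    """Render a value as a plain N-Triples string literal with basic escaping."""
    escaped = (str(value)
               .replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n")
               .replace("\r", "\\r"))
    return '"' + escaped + '"'


def to_ntriples(subject_uri, record, vocab):
    """Emit one triple per field; assumes field names are safe inside a URI."""
    lines = []
    for key, value in record.items():
        lines.append("<%s> <%s%s> %s ." % (subject_uri, vocab, key, literal(value)))
    return "\n".join(lines)


# Hypothetical statistical observation, already available in some existing format.
observation = {"area": "Leeds", "value": 42}
print(to_ntriples("http://example.org/obs/1", observation, "http://example.org/def/"))
```

Because every subject and property is an HTTP URI, output of this kind can be linked to and extended without touching the original XML or text serialisation.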

See also: Linked Data

