Cover Pages: XML Daily Newslink: Monday, 18 October 2010

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
ISIS Papyrus http://www.isis-papyrus.com

Headlines

Five W3C Working Draft Updates for SPARQL Version 1.1
First Public Draft: Secure Token Transfer Protocol (STTP)
Membase and Cloudera Announce Integration
Syslog Extension for Cloud Using Syslog Structured Data
XML-Based Web Forms as a Platform for Clinical Data Entry into RDF
Understanding C#: Simple LINQ to XML Examples

Five W3C Working Draft Updates for SPARQL Version 1.1
Steve Harris, Andy Seaborne, Simon Schenk, Paul Gearon (et al); W3C TRs

The W3C SPARQL Working Group has published updates for five SPARQL v1.1 Working Draft specifications. SPARQL (SPARQL Protocol and RDF Query Language) is a query language for RDF data on the Semantic Web with formally defined meaning. RDF is a directed, labeled graph data format for representing information in the Web. RDF is often used to represent, among other things, personal information, social networks, metadata about digital artifacts, as well as to provide a means of integration over disparate sources of information. This specification defines the syntax and semantics of the SPARQL query language for RDF. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports aggregation, subqueries, creating values by complex expressions, extensible value testing, and constraining queries by source RDF graph.

The revised SPARQL 1.1 Query Language specification now incorporates the new features of SPARQL 1.1 into the main SPARQL Query specification. The structure of this document will change to fully integrate the new features. In this publication, new content is gathered together for ease of review of these new features. The new features are: (1) Aggregates (apply expressions over groups of solutions; by default a solution set consists of a single group, containing all solutions); (2) Subqueries; (3) Negation (incorporates two styles of negation, one based on filtering results depending on whether a graph pattern does or does not match in the context of the query solution being filtered, and one based on removing solutions related to another pattern); (4) Expressions in the SELECT clause (the SELECT clause can also introduce new variables, together with an expression that gives the value of the binding for that variable)...
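The grouping behavior described for aggregates can be illustrated with a small sketch. This is not SPARQL and not taken from the specification: it is plain Python over a hypothetical list of solutions (the variable names `org` and `price` and the helper `aggregate` are invented here), showing only that an aggregate is applied per group and that, with no grouping variables, the whole solution set forms one group.

```python
from collections import defaultdict

def aggregate(solutions, group_vars, value_var, func):
    """Apply func to the values of value_var within each group of solutions."""
    groups = defaultdict(list)
    for solution in solutions:
        # With no grouping variables every solution gets the same key (),
        # so the whole solution set forms a single group.
        key = tuple(solution[var] for var in group_vars)
        groups[key].append(solution[value_var])
    return {key: func(values) for key, values in groups.items()}

# Hypothetical solutions, as if a graph pattern had bound ?org and ?price.
solutions = [
    {"org": "org1", "price": 9},
    {"org": "org1", "price": 5},
    {"org": "org2", "price": 7},
]

# Roughly: SELECT ?org (SUM(?price) AS ?total) ... GROUP BY ?org
per_org = aggregate(solutions, ["org"], "price", sum)

# Roughly: SELECT (SUM(?price) AS ?total) ... with no GROUP BY
overall = aggregate(solutions, [], "price", sum)

print(per_org)   # {('org1',): 14, ('org2',): 7}
print(overall)   # {(): 21}
```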

SPARQL 1.1 Update defines "an update language for RDF graphs. It uses a syntax derived from the SPARQL Query Language for RDF. Update operations are performed on a collection of graphs in a Graph Store. Operations are provided to change existing RDF graphs as well as create and remove graphs in the Graph Store"... SPARQL 1.1 Service Description describes both "a method for discovering a service description from a specific SPARQL service, and an RDF schema for encoding such descriptions in RDF. SPARQL Service Description is a way of describing the features of a SPARQL service made available via the SPARQL Protocol..."

The draft SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs is related to SPARQL Update, which allows a user to update RDF graphs in a Graph Store at various levels of granularity, including individual RDF statements. The protocol defined in this 'Uniform HTTP Protocol' document is meant to provide a minimal set of uniform, colloquial HTTP operations for managing a semantic web of mutable, named RDF graphs at a strictly large level of granularity. Finally, as to the SPARQL 1.1 Entailment Regimes draft: "Various W3C standards, including RDF and OWL, provide semantic interpretations for RDF graphs that allow additional RDF statements to be inferred from explicitly given assertions. It is desirable to utilize SPARQL as a query language in these cases as well, but this requires basic graph pattern matching to be defined using semantic entailment relations instead of explicitly given graph structures. Such extensions of the SPARQL semantics are called entailment regimes within this document and entailment regimes for standard entailment relations in the semantic web such as RDF entailment, RDFS entailment, etc. are defined in this document..."

First Public Draft: Secure Token Transfer Protocol (STTP)
Niklas Neumann, Florian Tegeler, Xiaoming Fu (eds), IETF Internet Draft

A first public IETF Internet Draft has been released for the specification Secure Token Transfer Protocol (STTP). Abstract: "The Secure Token Transfer Protocol (STTP) provides the means for two entities to exchange a set of tokens that is needed to perform a certain task such as authentication. The exact context of the tokens and their further usage is outside the scope of the protocol. STTP is intended to be employed in case a mechanism to securely transfer tokens is missing for a particular scenario or context within other protocols."

"The OAuth 2.0 Protocol specifies a workflow that requires end-user authorization. To acquire end-user authorization an application needs to either access a HTTP resource (the end-user authorization endpoint) with an appropriate user-agent or have access to the user's credentials and perform HTTP Basic authentication. In cases where an application does not have access to such an appropriate user-agent or where HTTP Basic authentication is not supported, STTP can be used to request the corresponding OAuth tokens. OAuth also requires the transfer of an Access Grant and an Access Token between the entity accessing the resource server and the application accessing the authorization server. In scenarios where these entities are not part of the same application, STTP can be used to transfer the tokens securely between applications even if they are running on different hosts (e.g. the authorizing application is running on a mobile device).

Passwords as shared secrets are very common for authentication and authorization. While in many cases they are pre-established, STTP can be used to dynamically provision passwords for specific scenarios. An example scenario is the generation of one-time-passwords that can be used to access services over an untrusted link or using an untrusted device. For example, a user can use a trusted mobile device that connects over a secure low-bandwidth link to the authentication and authorization server to retrieve a (one time) token that he can use to access a service on a public Internet terminal that uses a non-secured WiFi connection. In such a scenario the application on the public terminal could use STTP to request a set of tokens from the user's device which in turn uses STTP to retrieve the tokens (after a successful authentication) from the authentication and authorization server.

Protocol details: To transfer a set of tokens, an STTP client contacts an STTP server over an appropriate protocol and they exchange a set of requests and responses. A request is a command string followed by a number of arguments, and a response is a numeric code together with a textual representation followed by a number of arguments. Within one connection multiple requests and responses can be exchanged until either the client or the server terminates the connection...."
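The request and response shapes described above can be sketched as follows. The excerpt does not give the wire syntax, so everything concrete here is an assumption made for illustration: one message per line, whitespace-separated fields, a single-word textual representation, and the invented command `REQUEST`, code `200`, and token argument. It is not the format defined by the draft.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Request:
    command: str          # the command string
    arguments: List[str]  # zero or more arguments

@dataclass
class Response:
    code: int             # the numeric code
    text: str             # its textual representation
    arguments: List[str]  # zero or more arguments

def parse_request(line: str) -> Request:
    """Split a request line into a command string and its arguments."""
    parts = line.split()
    if not parts:
        raise ValueError("empty request")
    return Request(command=parts[0], arguments=parts[1:])

def parse_response(line: str) -> Response:
    """Split a response line into numeric code, textual form, and arguments."""
    parts = line.split()
    if len(parts) < 2 or not parts[0].isdecimal():
        raise ValueError(f"malformed response: {line!r}")
    return Response(code=int(parts[0]), text=parts[1], arguments=parts[2:])

# Hypothetical exchange; the command, code, and token are invented.
req = parse_request("REQUEST access_token")
resp = parse_response("200 OK access_token=abc123")
print(req)
print(resp)
```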

Membase and Cloudera Announce Integration
Ron Bodkin, InfoQueue

At Hadoop World, Membase and Cloudera announced the integration of the Membase Server with Cloudera's Distribution for Hadoop. Membase is a NoSQL database which was released the day before. Hadoop is an open source project which includes a distributed storage system and a map-reduce processing framework. At the conference, AOL Advertising and ShareThis presented how they used this integration for their ad targeting and serving platforms.

James Phillips, co-founder and SVP of Products at Membase, wrote that 'On the technology integration front, we have built and are making available to customers two mechanisms for integrating Membase and Cloudera Distribution for Hadoop (CDH)...

The first is a Membase NodeCode module that can stream data from Membase to CDH in real-time. As new operational data enters Membase, it can be massaged in real time and pumped into a CDH cluster for processing. The second is a Sqoop-derived batch loader utility that enables loading of data from Membase to CDH, and vice versa.'

Mike Olson, Cloudera CEO, noted that 'Integrating Membase Server with Cloudera's Distribution for Hadoop adds complementary functionality that customers are interested in. The result is a highly optimized data delivery system with virtually no lag time. This real-time processing capability is essential for any solution on which split-second decisions must be made, including ad targeting and social gaming'...

See also: the Membase web site

Syslog Extension for Cloud Using Syslog Structured Data
Gene Golovinsky, Sam Johnston, Zachary Fox (eds.), IETF Internet Draft

IETF has published an initial level -00 Internet Draft for Syslog Extension for Cloud Using Syslog Structured Data. From the Abstract: "This document provides an open and extensible log format to be used by any cloud entity or cloud application to log and trace activities that occur in the cloud. It is equally applicable for cloud infrastructure (IaaS), platform (PaaS), and application (SaaS) services. CloudLog is different in content, but not in nature, from traditional logging, as it takes into account the transient nature of identities and resources in the cloud."

From the problem statement: "Practically all hardware and software entities deployed on the network log their activities. Network elements such as routers, servers, firewalls and switches log information about their activities using mostly Syslog (except for Windows). Applications running on the network also log activities, but often using proprietary mechanisms. While logging mechanisms are inconsistent between different entities (Syslog, Windows events, proprietary files), they generally carry enough information to identify the type of activity, the time of occurrence, the physical entity involved in the event, and often the user(s) that participated in the event.

Availability of this information is crucial for accomplishing multiple business objectives ranging from assuring security and performing forensics to adhering to compliance regulations (SOX, PCI, etc.). The existence of logs and information in them is necessary, but not sufficient for achieving security, compliance and other business objectives. The process of collecting, processing, searching and even simply interpreting information in logs is an exceptionally labor- and time-consuming process and often cannot even be done on any meaningful scale without appropriate tools in place. Log Management tools used to solve the problem of scale and interpretation heavily depend on the fact that the format of logs is largely well defined and understood.

In cloud deployments the situation with the availability of logs and the reliability of information in them is drastically different. By definition, cloud resources are shared. A piece of hardware is now running multiple Virtual Instances of "it". They can be brought up and down within a very short period of time, and at any given moment the hardware can be shared not just by different users but by different users from different companies. Even if Linux or Windows VMs continue to log their activity, the information in these logs is very likely to be irrelevant since you cannot really tie logs to the physical entity. Moreover, even if one managed to map logs to a physical entity, there is absolutely no guarantee that the same VM image will be running on the same hardware in its next reincarnation. And there is really no clear way to determine how many users share the hardware and what their identities and roles are..."
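As background on the mechanism the draft builds on, the sketch below formats a syslog message carrying one structured-data element in the RFC 5424 layout. The SD-ID `cloud@32473` and the parameter names `tenant`, `vm`, and `user` are hypothetical and are not taken from the CloudLog draft; 32473 is the enterprise number reserved for documentation examples. The helper names are likewise invented here.

```python
def escape_param(value: str) -> str:
    """Escape the characters RFC 5424 requires escaping in a PARAM-VALUE."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")

def sd_element(sd_id: str, params: dict) -> str:
    """Render one structured-data element: [SD-ID name="value" ...]."""
    body = "".join(
        f' {name}="{escape_param(str(val))}"' for name, val in params.items()
    )
    return f"[{sd_id}{body}]"

def syslog_message(facility, severity, timestamp, host, app, sd, msg):
    """Assemble an RFC 5424 line; PROCID and MSGID are left as the nil value."""
    pri = facility * 8 + severity  # PRI encodes facility and severity together
    return f"<{pri}>1 {timestamp} {host} {app} - - {sd} {msg}"

# Hypothetical cloud context attached to an ordinary log event.
sd = sd_element("cloud@32473", {"tenant": "acme", "vm": "vm-0042", "user": "jdoe"})
line = syslog_message(16, 6, "2010-10-18T12:00:00Z", "host1", "cloudapp", sd,
                      "instance started")
print(line)
```

The point of carrying identity and resource context as structured data, rather than in the free-text message, is that a log management tool can read it without knowing the application's message format.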

XML-Based Web Forms as a Platform for Clinical Data Entry into RDF
Chimezie Ogbuji, Blog

"I've recently been motivated to better articulate why I think the use of XForms, Plain Old XML (POX), and GRDDL (or faithful renditions of RDF graphs if you will) is a more robust web architecture for managing mutable RDF content for the purpose of research data management than other thin-client approaches, for instance...

Background question: 'Are there examples of tools or architectures that demonstrate support for the Model View Controller (MVC) paradigm for data entry directly against RDF content? It seems to me that there is an inherent impedance mismatch between what is needed for an efficient, document-hosted, binding-oriented architecture for data entry and the amorphous nature of RDF, as well as the cost of using RDF querying as a mechanism for binding data to UI elements.'

In my experience since 2006 as a software architect of web applications that use XForms to manage patient record documents as RDF graphs, I've come to appreciate that the 'CRUD problem' of RDF might have good protocol solutions being developed right now, but the question of whether there is anything more robust for forms-based data collection than declarative, auto-generated XForms that manage RDF datasets is a more difficult one, I think. My personal opinion is that the nature of the abstract syntax of an RDF graph (as opposed to the tree underlying the XML infoset), its impact on binding RDF resources to widgets, and the ubiquitous use of warehouse relational schemas as infrastructure for RDF datasets in databases will always be an insurmountable performance impediment for alternative solutions at larger volumes that are more robust than using XForms to manage an XML collection on a filesystem as a faithful rendition of an RDF dataset.

RDF/SQL databases are normalized and optimized more for read than for write: with asymptotic consequences to write operations. An architecture that directly manages very large numbers (millions) of RDF triples will be faced with this challenge. The OLTP / OLAP divide in legacy relational architecture is analogous to the use of XML and RDF in those respective roles and is an intuitive architectural style for using knowledge representation in content management systems. GRDDL and its notion of faithful renditions can be used to manage this divide as infrastructure for contemporary content management systems..."

See also: GRDDL references

Understanding C#: Simple LINQ to XML Examples
Andrew Stellman, O'Reilly Technical

"XML is one of the most popular formats for files and data streams that need to represent complex data. The .NET Framework offers powerful tools for creating, loading, and saving XML files. And once you've got your hands on XML data, you can use LINQ to query anything from data that you created to an RSS feed...

This article uses two simple LINQ to XML tutorial style examples that highlight basic patterns that you can use to create or query XML data using LINQ to XML. In the first example we create XML data, write it to disk, read it back, and then query it using LINQ; in the second example we use a LINQ query to read data from an RSS feed...

The LINQ to XML classes live in the System.Xml.Linq namespace, which has a very handy object for processing XML: the XDocument class, which represents an XML document. There's a lot of depth to an XDocument, but the easiest way to get a handle on it is to see a simple XDocument example. The example uses the XDocument and XElement classes to create XML data and save it to a file (or print it to the console). Then we use LINQ to query the XML data. Finally, we'll read RSS data from a blog using an XDocument object, and use LINQ to XML to turn it into a sequence of our own objects that we can code against...

You can do some pretty powerful things with LINQ to XML, because so much data is stored and transmitted as XML. Like RSS feeds, for example: open up any RSS feed and view its source, and you'll see XML data. And that means you can read it into an XDocument and query it with LINQ. One nice thing about the 'XDocument.Load()' method is that when you pass it a string, you're giving it a URI. A lot of the time, you'll just pass it a simple filename. But a URL will work equally well..."
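The create, save, reload, and query pattern described in the excerpt can be sketched in a few lines. Note that this is a rough Python analogue using the standard library's `xml.etree.ElementTree`, not LINQ to XML and not the article's C# code; the element names and data are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import io
import xml.etree.ElementTree as ET

# Build a small document in memory (hypothetical data).
root = ET.Element("customers")
for name, drink in [("Ann", "latte"), ("Bob", "tea"), ("Cy", "latte")]:
    customer = ET.SubElement(root, "customer", name=name)
    ET.SubElement(customer, "favoriteDrink").text = drink

# Serialize, then parse back, as one would with a file on disk.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
buffer.seek(0)
loaded = ET.parse(buffer).getroot()

# Query: names of customers whose favorite drink is a latte.
latte_fans = [
    c.get("name")
    for c in loaded.iter("customer")
    if c.findtext("favoriteDrink") == "latte"
]
print(latte_fans)  # ['Ann', 'Cy']
```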

See also: Wikipedia on LINQ

