Cover Pages: XML Daily Newslink: Wednesday, 14 April 2010

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Microsoft Corporation http://www.microsoft.com

Headlines

OASIS Public Review: SAML V2.0 Identity Assurance Profiles Version 1.0
IETF Updates Calendar Specification xCal: The XML format for iCalendar
OGF Data Format Description Language (DFDL) v1.0 Core Specification
Movement on the Big Data Front
Big Data Open-Source Duo United Under Apache: Hadoop and Cassandra
Full Disk Encryption Isn't Quite Dead
VMware's SpringSource Adds Lightweight Messaging for the Cloud
Storage? Boring? Not Anymore.

OASIS Public Review: SAML V2.0 Identity Assurance Profiles Version 1.0
Bob Morgan, Paul Madsen, Scott Cantor (eds), OASIS Public Review Draft

Members of the OASIS Security Services (SAML) Technical Committee have published an approved 'Committee Draft 01' version of SAML V2.0 Identity Assurance Profiles Version 1.0 for public review through June 13, 2010. The document "specifies methods of representing assurance information as used in two aspects of SAML. It profiles the use of SAML's Authentication Context mechanisms to express per-authentication assurance information via authentication requests and assertions. Level-of-Assurance (LOA) definitions in Identity Assurance Frameworks are expressed as a set of authentication context classes. The document also specifies a means for representing assurance certification status of entities in SAML metadata."

The document Expressing Identity Assurance in SAML 2.0 provides standard means for parties using SAML to exchange information regarding identity assurance. It defines, as a profile of the SAML Authentication Context specification, a restricted version of the AuthnContext schema for representing assurance indicators (sometimes called levels of assurance) defined by external documentation of any given assurance framework. In addition, it defines a SAML attribute profile that may be used to represent the certification status of an issuer of authentication statements (i.e., an Identity Provider) regarding its conformance with the requirements of an identity assurance framework.

Background: "Many organizations using federated service access have found it useful to define or adopt identity assurance frameworks, such as 'Liberty Identity Assurance Framework 1.0'. Such frameworks offer a model for categorizing the large number of possible combinations of registration processes, security mechanisms, and authentication methods that underlie authentication processes into a smaller, more manageable set. The term 'levels of assurance' (LOA) is often used to refer to this concept, or a particular such set ('assurance profiles' is also used). Different combinations of processes and technology are rated according to the quality of assurance they can provide. Typically, a framework defines 3-5 levels or profiles, ranging from low to high assurance. Relying parties then decide which LOA is required to access specific protected resources, based on an assessment of the risk associated with those resources — high risk requires high assurance, for example — and work with identity providers to ensure that the requirements of that level are met.

Given this interest, it is useful for parties using SAML for federation to express in SAML authentication messages the LOA requested by a relying party, and the LOA that is applicable to an authentication response. The specification 'Authentication Context for the OASIS Security Assertion Markup Language (SAML) V2.0' defines a variety of options for representing the details of identity management processes and mechanisms. The LOA profile in this document is motivated by two related considerations: (1) The SAML authentication context scheme is comprehensive, but quite complex. Deployers find that this complexity is a barrier to designing authentication contexts that match their LOA requirements. (2) Representing the details of a LOA definition using the full expressiveness of the authentication context schema results in XML documents that must be passed in-band with authentication events and parsed by SAML implementations. In most cases, the processing requirements are not sustainable and interoperability issues have not been explored. The approach taken here simply represents each LOA in an assurance framework as a separate authentication context class. Each LOA class is characterized by a URI, and the body of the schema simply contains a reference to the external documentation that defines the LOA..."
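As an illustration of the "one class URI per LOA" approach described above, the following minimal sketch shows how a relying party might check the class asserted in a response. The `example.org` URIs, the `LOA_LEVELS` table, and the assumption that levels are ordered so that a higher level satisfies a lower requirement are all hypothetical; only the SAML assertion namespace and the `AuthnContextClassRef` element come from SAML 2.0 itself.

```python
import xml.etree.ElementTree as ET

SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

# Hypothetical LOA class URIs for an imaginary assurance framework.
LOA_LEVELS = {
    "http://example.org/assurance/loa1": 1,
    "http://example.org/assurance/loa2": 2,
    "http://example.org/assurance/loa3": 3,
}

def asserted_loa(assertion_xml):
    """Return the numeric level for the asserted class URI, or None if unknown."""
    root = ET.fromstring(assertion_xml)
    ref = root.find(".//saml:AuthnContext/saml:AuthnContextClassRef", SAML_NS)
    if ref is None or ref.text is None:
        return None
    return LOA_LEVELS.get(ref.text.strip())

def meets_requirement(assertion_xml, required_level):
    """Assumption: a higher level satisfies any lower requirement."""
    level = asserted_loa(assertion_xml)
    return level is not None and level >= required_level

SAMPLE = """<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:AuthnStatement>
    <saml:AuthnContext>
      <saml:AuthnContextClassRef>http://example.org/assurance/loa2</saml:AuthnContextClassRef>
    </saml:AuthnContext>
  </saml:AuthnStatement>
</saml:Assertion>"""

print(meets_requirement(SAMPLE, 2))
```

The point of the sketch is that the relying party only compares URIs; it never has to parse a detailed authentication context declaration in-band.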

IETF Updates Calendar Specification xCal: The XML format for iCalendar
Cyrus Daboo, Mike Douglass, Steven Lees (eds), IETF Internet Draft

Steven Lees, co-editor of the new IETF xCal specification, reports: "I have updated the xCal draft. The main change was removing the LINK related sections; the LINK extension concept will be addressed in a separate specification..." Version -03 of the document xCal: The XML format for iCalendar defines 'xCal' as an XML format for iCalendar data. Document Appendix A (Relax NG Schema) supplies a Relax NG schema for iCalendar in XML using the compact notation of Relax NG.

Discussion of the xCal specification also takes place in CalConnect's TC-XML group. The CalConnect XML Technical Committee (TC-XML) was established in February 2008 to "develop the following specifications: (1) A specification for a two-way reference mapping of iCalendar to XML; (2) A core abstract calendaring API, and a concrete web services binding for that API. The initial design goal for this set of specifications is based on the NIST Smart Grid requirements for a web services calendaring and scheduling API. The group's goals for WS-Calendar work include: [i] delivering a WS-Calendar specification that meets the needs of the NIST Smart Grid effort, and [ii] delivering a core abstract calendaring API that will serve as a base specification for future work, and support other protocol bindings in the future..."

From the xCal -03 draft: "The iCalendar data format defined in IETF RFC 5545 is a widely deployed interchange format for calendaring and scheduling data. While many applications and services consume and generate calendar data, iCalendar is a specialized format that requires its own parser/generator. In contrast, XML-based formats are widely used for interoperability between applications, and the many tools that generate, parse, and manipulate XML make it easier to work with than iCalendar.

The purpose of this specification is to define 'xCal', an XML format for iCalendar data. xCal is defined so that iCalendar data can be converted to XML, and then back to iCalendar, without losing any semantic meaning in the data. Anyone creating XML calendar data according to this specification will know that their data can be converted to a valid iCalendar representation as well. Two key design considerations are: (1) Round-tripping (converting an iCalendar instance to XML and back) will give the same result as the starting point. (2) Preservation of the semantics of the iCalendar data. While a simple consumer can easily browse the calendar data in XML, a full understanding of iCalendar is still required in order to modify and/or fully comprehend the calendar data..."
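The round-tripping goal can be illustrated with a deliberately simplified sketch. The mapping below is not the xCal schema: the element names, the `kind` attribute, and the `ical_to_xml` / `xml_to_ical` helpers are hypothetical, and property parameters, value types, and line folding are ignored. It shows only the idea that components nest as elements and that converting to XML and back should reproduce the input.

```python
import xml.etree.ElementTree as ET

def ical_to_xml(text):
    """Map BEGIN/END blocks to nested elements and NAME:VALUE lines to children."""
    root = ET.Element("icalendar")
    stack = [root]
    for line in text.strip().splitlines():
        name, _, value = line.partition(":")
        if name == "BEGIN":
            stack.append(ET.SubElement(stack[-1], value.lower(), kind="component"))
        elif name == "END":
            stack.pop()
        else:
            prop = ET.SubElement(stack[-1], name.lower(), kind="property")
            prop.text = value
    return root

def xml_to_ical(root):
    """Walk the element tree and emit iCalendar content lines again."""
    lines = []

    def walk(elem):
        for child in elem:
            if child.get("kind") == "component":
                lines.append("BEGIN:" + child.tag.upper())
                walk(child)
                lines.append("END:" + child.tag.upper())
            else:
                lines.append(child.tag.upper() + ":" + (child.text or ""))

    walk(root)
    return "\n".join(lines)

SAMPLE = """BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
UID:example-1@example.org
DTSTART:20100414T090000Z
SUMMARY:Review xCal draft
END:VEVENT
END:VCALENDAR"""

# Serialize to XML bytes, parse them again, and convert back to iCalendar text.
xml_bytes = ET.tostring(ical_to_xml(SAMPLE))
round_tripped = xml_to_ical(ET.fromstring(xml_bytes))
print(round_tripped == SAMPLE)
```

A real converter would follow the element structure and Relax NG schema given in the draft rather than this toy mapping.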

See also: the OASIS WS-Calendar TC

OGF Data Format Description Language (DFDL) v1.0 Core Specification
M. Beckerle, M. Westhead, J. Myers (et al), OGF Public Comment Document

The Open Grid Forum has announced publication of "Data Format Description Language (DFDL) v1.0 Core Specification" for public comment. The document was produced by members of the OGF Data Format Description Language WG (DFDL-WG) which was chartered to "define an XML-based language for describing the structure of binary and textual files and data streams so that their format, structure, and metadata can be exposed." The document provides a definition of a standard Data Format Description Language (DFDL). This language allows description of dense binary and legacy data formats in a vendor-neutral declarative manner. DFDL is an extension to the XML Schema Description Language (XSDL).

From the Introduction: "Data interchange is critically important for most computing. Grid computing and all forms of distributed computing require distributed software and hardware resources to work together. Inevitably, these resources read and write data in a variety of formats. General tools for data interchange are essential to solving such problems. Scalable and High Performance Computing (HPC) applications require high-performance data handling, so data interchange standards must enable efficient representation of data. Data Format Description Language (DFDL) enables powerful data interchange and very high-performance data handling...

Textual XML data is the most successful data interchange standard to date. All such data are by definition new, by which we mean created in the XML era. Because of the large overhead that XML tagging imposes, there is often a need to compress and decompress XML data. However, there is a high cost for compression and decompression that is unacceptable to some applications. Standardized binary data are also relatively new, and are suitable for larger data because of the reduced costs of encoding and more compact size. Examples of standard binary formats are data described by modern versions of ASN.1, or the use of XDR. These techniques lack the self-describing nature of XML data. Scientific formats, such as NetCDF and HDF, are used by some communities to provide self-describing binary data. In the future, there may be standardized binary-encoded XML data as there is a W3C working group that has been formed on this subject...

It is an important observation that both XML format and standardized binary formats are prescriptive in that they specify or prescribe a representation of the data. To use them your applications must be written to conform to their encodings and mechanisms of expression. DFDL suggests an entirely different scheme. The approach is descriptive in that one chooses an appropriate data representation for an application based on its needs and one then describes the format using DFDL so that multiple programs can directly interchange the described data. DFDL descriptions can be provided by the creator of the format, or developed as needed by third parties intending to use the format. That is, DFDL is not a format for data; it is a way of describing any data format. DFDL is intended for data commonly found in scientific and numeric computations, as well as record-oriented representations found in commercial data processing. DFDL can be used to describe legacy data files, to simplify transfer of data across domains without requiring global standard formats, or to allow third-party tools to easily access multiple formats. DFDL can also be a powerful tool for supporting backward compatibility as formats evolve..."
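The descriptive approach can be sketched in a few lines. This is not DFDL syntax: the `RECORD_FORMAT` dictionary, its field names, and the `parse_record` helper are hypothetical stand-ins for a format description. What it shows is that the layout lives in data rather than in the parsing code, so one generic reader can handle any format that has been described.

```python
import struct

# Hypothetical description of a legacy binary record: each field has a name
# and a format code from Python's struct module.
RECORD_FORMAT = {
    "byte_order": ">",  # big-endian, no padding
    "fields": [
        ("station_id", "H"),   # unsigned 16-bit integer
        ("temperature", "f"),  # 32-bit float
        ("label", "4s"),       # 4-byte fixed-width text
    ],
}

def parse_record(description, data):
    """Decode one record using only the declarative description."""
    fmt = description["byte_order"] + "".join(code for _, code in description["fields"])
    values = struct.unpack(fmt, data)
    return {name: value for (name, _), value in zip(description["fields"], values)}

raw = struct.pack(">Hf4s", 7, 21.5, b"OSLO")
print(parse_record(RECORD_FORMAT, raw))
```

Supporting a second format here means writing a second description, not a second parser, which is the property the excerpt attributes to DFDL.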

See also: the OGF Document Series

Movement on the Big Data Front
Ken North, DDJ

"The enthusiasm for Big Data applications has us putting persistent data solutions under a microscope these days. It must be noted that although Big Data applications involve operations with large data sets, their function can vary from online transaction processing to analytics to semantics-driven information retrieval. And an application might be using a distributed key-value store, a row- or column-order store, a set store, a triples store or some other technology...

The BI community represents only one slice of the Big Data user pie. The piece that represents the Linked Data / Web 3.0 / Semantic Web community isn't as large, but that community is growing. In March 2010, Oxford University and the University of Southampton announced that a new Institute for Web Science will lead the way in Web 3.0 development with £30 million in funding from the UK government... Cassandra, Hadoop Map/Reduce, Greenplum and other engines come up frequently in discussions about Big Data. But if Sir Tim Berners-Lee has his way, we'll be having more discussions about solutions for Really Big Data.

The W3C Resource Description Framework (RDF) defines a triples data model that's gained acceptance for Semantic Web applications, Linked Data and building out Web 3.0. There are a variety of data stores capable of handling billions of RDF triples, including OpenLink Virtuoso, Ontotext BigOWLIM, AllegroGraph, YARS2, and Garlik 4store. Raytheon BBN Technologies has approached the triples store problem from the perspective of using a cloud-based technology known as SHARD (Scalable, High-Performance, Robust and Distributed). SHARD is a distributed data store for RDF triples that supports SPARQL queries. It's based on Cloudera Hadoop Map/Reduce and it's been deployed in the cloud on Amazon EC2. SHARD uses an iterative process with a MapReduce operation executed for each single clause of a SPARQL query. According to Kurt Rohloff, a researcher at Raytheon BBN, SHARD performs better than current industry-standard triple-stores for datasets on the order of a billion triples..."
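The clause-at-a-time idea can be illustrated with a single-process sketch. This is not SHARD's code and uses no Hadoop APIs: the sample triples and the `match_clause`, `compatible`, and `run_query` helpers are hypothetical. Each loop iteration stands in for one MapReduce pass, with a map step that matches triples against one clause and a reduce-style step that joins the matches with the bindings accumulated so far.

```python
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "livesIn", "Boston"),
    ("carol", "livesIn", "Austin"),
]

def match_clause(clause, triple):
    """Map step: return variable bindings if the triple matches, else None."""
    bindings = {}
    for term, value in zip(clause, triple):
        if term.startswith("?"):
            # A variable repeated within one clause must bind to the same value.
            if bindings.get(term, value) != value:
                return None
            bindings[term] = value
        elif term != value:
            return None
    return bindings

def compatible(a, b):
    """Two binding sets can be joined if their shared variables agree."""
    return all(b[k] == v for k, v in a.items() if k in b)

def run_query(clauses, triples):
    solutions = [{}]
    for clause in clauses:  # one simulated MapReduce pass per clause
        matches = [m for m in (match_clause(clause, t) for t in triples) if m is not None]
        # Reduce-style step: join new matches with the bindings found so far.
        solutions = [{**s, **m} for s in solutions for m in matches if compatible(s, m)]
    return solutions

QUERY = [("alice", "knows", "?x"), ("?x", "livesIn", "?city")]
print(run_query(QUERY, TRIPLES))
```

In a distributed setting the intermediate `solutions` would be written out between passes, so the number of clauses drives the number of jobs.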

Big Data Open-Source Duo United Under Apache: Hadoop and Cassandra
Gavin Clarke, The Register

"As big-data hook ups go, they don't get much bigger: NoSQL and distributed computing pin ups Cassandra and Hadoop have been united by the Apache Software Foundation. ASF has released Apache Cassandra 0.6, adding support for its Hadoop project. Both Cassandra and Hadoop are ASF projects, with Cassandra only graduating from Apache's early phase incubator phase in February 2010. The union will allow users to run analytics queries using the Hadoop map reduce framework against data held inside Cassandra.

Hadoop is an open-source project partly based on Google's MapReduce technology that found large-scale use inside Yahoo!. Cassandra is one of a family of NoSQL systems that started life as a way to store and serve frequently accessed data in massive systems spanning tens of thousands of servers and millions of users. The idea is that NoSQL is faster and architecturally easier to construct than a traditional relational database system in these environments.

The Cassandra NoSQL technology started at Facebook and became an ASF incubator project in 2009. Users include Digg, Cisco WebEx, Rackspace, Reddit, and Twitter. As more data has been put into NoSQL systems, it has inevitably followed that those running them want to query it rather than simply use NoSQL as a holding pen for things like Facebook status updates, Tweets, or Digg posts..."

See also: Apache Cassandra

Full Disk Encryption Isn't Quite Dead
Roger A. Grimes, InfoWorld

"At least once a month, it seems some vendor or techie claims to have broken a version of a hard drive full-disk encryption (FDE) scheme, whether it's BitLocker from Microsoft (my full-time employer), open source favorite TrueCrypt, or some other variant. All the stories and the hype are enough to make one wonder if FDE is dead. The brief -- and slightly qualified -- answer is no. There are a handful of clever attacks, as well as software to make them easier to pull off. Luckily there are easy ways to prevent most of them. We will start, however, with an attack that doesn't have an easy defense...

Prolific crypto- and password-cracking vendor Passware recently announced that it could crack both BitLocker- and TrueCrypt-protected disk volumes using the FireWire method. Theoretically, one can carry off similar attacks via a DMA-enabled port, such as PCI. These attacks can ultimately be successful against any software crypto product that does not use specialized hardware.

A researcher recently accomplished an attack against the TPM chip using an electron microscope to find a BitLocker key. Needless to say, this type of attack requires not only an expensive microscope, but a highly skilled individual or team of individuals. Microsoft pointed out that the TPM attack could be prevented by using any two-factor BitLocker mode that requires TPM, plus an external PIN or smart card. The electron microscope may get the key stored in the TPM chip, but it can't find the PIN in the human mind -- yet... Some other FDE attacks involve intercepting the boot-up cycle in such a way that malware is able to bypass the crypto or eavesdrop on the decryption key; a whitepaper from iViZ Security details such an attack... Most important, all of the attacks that I'm aware of require prior successful admin or root access to the victim's computer. If the attacker has that sort of access, why not just steal the data?

So, yes, FDE can be compromised. At least one assault, the cold boot memory exploit, is difficult to defend against if the attacker has the access, tools, and techniques. To counter that offensive, use good physical security or a crypto method that doesn't rely on normal computer memory chips. The rest of the attacks can be prevented by using a strong FDE solution, two-factor crypto authentication, and hibernation instead of standby or suspend mode if you're not powering down between active sessions. Disabling unneeded interface ports (and DMA if possible) works as well... Try to remember that risk is relative. Today's low-cost FDE programs prevent most of the attacks that any common computer would likely face. In most cases, FDE solutions are trying to prevent common thieves or unauthorized employees from easily accessing protected data. The sophisticated, persistent attacker with enough time and motivation will probably get to your data with or without the use of encryption... In reality, there are much easier ways to steal data than to attack the crypto..."

VMware's SpringSource Adds Lightweight Messaging for the Cloud
John K. Waters, Application Development Trends

"VMware's SpringSource division announced today that it will be adding a newly acquired lightweight messaging system to its implementation of the Java-based open source Spring Framework. RabbitMQ is an open messaging system owned by U.K.-based Rabbit Technologies, which VMware acquired last week. SpringSource has two missions these days, said Rod Johnson, the SpringSource division's general manager (and founder): to continue to grow its middleware business and to contribute to VMware's cloud strategy. The Rabbit acquisition fulfills both...

RabbitMQ is an open-source, multi-platform messaging service used by cloud computing providers to create a messaging server. It's based on the RabbitMQ Advanced Message Queuing Protocol (AMQP) message queue, a multi-protocol messaging system. Johnson said that AMQP will be the focus of the integration...

AMQP is an open standard for messaging middleware. The project Website declares its aim to make AMQP the de facto open standard for messaging middleware: By complying with the AMQP standard, middleware products written for different platforms and in different languages can send messages to one another. AMQP addresses the problem of transporting value-bearing messages across and between organizations in a timely manner...

The AMQP spec defines a set of messaging capabilities called the 'AMQP Model,' which consists of components for routing and storing messages within a broker service, and a simple set of rules for wiring these components together. The specification includes a network wire-level format designed to allow client applications to talk to a broker and interact with the AMQP model it implements. Both a networking protocol and the semantics of broker services are also specified in AMQP. RabbitMQ has been implemented on a wide range of platforms other than Java, including .NET, PHP, Ruby and others. SpringSource will be making the Java Spring integration particularly easy as a way of interacting with RabbitMQ..."
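The "routing and storing" components and their wiring can be pictured with a toy in-memory model. This is neither RabbitMQ nor an AMQP client library: the `Broker` class and its methods are hypothetical, and the terms exchange, queue, and binding are borrowed from the AMQP 0-9-1 model rather than from the article. It models only direct-style routing, where a message reaches every queue whose binding key equals the routing key.

```python
from collections import deque

class Broker:
    """Toy broker: exchanges route messages, queues store them, bindings wire them."""

    def __init__(self):
        self.queues = {}    # queue name -> deque of message bodies
        self.bindings = {}  # exchange name -> list of (binding key, queue name)

    def declare_queue(self, name):
        self.queues.setdefault(name, deque())

    def bind(self, exchange, binding_key, queue):
        self.declare_queue(queue)
        self.bindings.setdefault(exchange, []).append((binding_key, queue))

    def publish(self, exchange, routing_key, body):
        """Deliver to each queue bound with a matching key; return the delivery count."""
        delivered = 0
        for key, queue in self.bindings.get(exchange, []):
            if key == routing_key:
                self.queues[queue].append(body)
                delivered += 1
        return delivered

    def get(self, queue):
        """Pop the oldest message from a queue, or return None if it is empty."""
        messages = self.queues[queue]
        return messages.popleft() if messages else None

broker = Broker()
broker.bind("orders", "created", "billing")
broker.bind("orders", "created", "audit")
broker.bind("orders", "cancelled", "audit")
print(broker.publish("orders", "created", "order-1"))
```

Publishers in this model name only an exchange and a routing key; which queues receive the message is decided entirely by the bindings.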

See also: the AMQP Specification

Storage? Boring? Not Anymore.
Paul Venezia, InfoWorld

"I spent most of this week at Storage Network World in Orlando, Florida, and I came away with the overwhelming impression that storage is moving at a frantic pace in half a dozen different directions. There was some real excitement, even adjusting for the usual shrill marketing messages.

I sat in on several talks discussing the realities of FCoE adoption and integration over the next few years, and it seems clear that traditional Fibre Channel has a long row to hoe if it's going to survive the lower cost and higher performance of FCoE. Granted, there are generally higher latencies with FCoE, but the benefits of lower cost and less complexity can push those concerns aside for many infrastructures. For the moment, Fibre Channel has an 8Gbps limit, whereas FCoE can run up to 10Gbps...

How beneficial is a 100Gbps pipe if you can't get anywhere near that throughput from the disk itself? That's where SSDs come in. Don't make the mistake of viewing SSDs as simply 2.5-inch disks in hot-swap sleds. They're showing up in all kinds of form factors, from PCI Express to custom arrays from vendors like Texas Memory Systems. TMS was showing off its RamSan 630 that can handle up to 10TB of SLC SSD storage in a single 3U server and claim 60GBps throughput...

Data deduplication is all the rage. It hasn't been available in any reasonable performance and reliability form for very long -- in fact, spot polls show that most companies haven't adopted deduplication yet -- but the reality of ever expanding storage requirements makes it an extremely attractive proposition for most large infrastructures. Naturally, deduplication is only as effective as the source material; if you don't have much data overlap, you're not going to get much out of running deduplicators. But if you do, it can make a massive difference in the size of your active storage and backups. At one lab session, I ran through several deduplication scenarios and generally saw somewhere around a 10:1 ratio on the fly. Backups showed about the same..."
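The dependence of the ratio on data overlap can be seen in a small sketch. It assumes fixed-size chunking with SHA-256 fingerprints, which is one common technique and not necessarily what the products at the show use; the `dedup_ratio` helper and the 4096-byte chunk size are hypothetical choices.

```python
import hashlib

def dedup_ratio(data, chunk_size=4096):
    """Ratio of total chunks to unique chunks under fixed-size chunking."""
    if not data:
        return 1.0
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Identical chunks share a fingerprint, so the set keeps one entry per unique chunk.
    unique = {hashlib.sha256(chunk).digest() for chunk in chunks}
    return len(chunks) / len(unique)

# Ten copies of the same 4 KiB block: only one chunk needs to be stored.
repetitive = b"A" * 4096 * 10
# Ten different 4 KiB blocks: nothing can be shared.
distinct = b"".join(bytes([i]) * 4096 for i in range(10))

print(dedup_ratio(repetitive), dedup_ratio(distinct))
```

Real deduplicators often use variable-size chunking so that an insertion early in a file does not shift every later chunk boundary, but the ratio still comes down to how many chunks repeat.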

See also: the SNW 2010 agenda

