Cover Pages: XML Daily Newslink: Tuesday, 14 September 2010

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Microsoft Corporation http://www.microsoft.com

Headlines

Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP Headers
W3C Timed Text Markup Language (TTML) 1.0 as Proposed Recommendation
Schema Scope: Primer and Best Practices for W3C XML Schema Design
Suggested Values for SMTP Enhanced Status Codes for Anti-Spam Policy
Disappearing Identity Providers Pose Problem
Up in the Air: Moving Your Applications to the Cloud
Cloud Security and Compliance: Clear The Ambiguity
The Simple Publishing Interface (SPI) for Federated Repositories

Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP Headers
Anthony Bryan, Henrik Nordstrom, Tatsuhiro Tsujikawa (et al), IETF Internet Draft

IETF has published an updated version of the Standards Track specification Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP Headers. Now at level -17, this document has been released under several different titles: 'Metalink in HTTP Headers', 'MetaLinkHeaders: Mirrors and Checksums in HTTP Headers', 'Metalink/HTTP: Mirrors and Checksums in HTTP Headers' (etc).

This document specifies Metalink/HTTP Mirrors and Cryptographic Hashes as a different way to get information that is usually contained in the Metalink XML-based download description format. 'Metalink is an Internet standard that harnesses the speed and power of peer to peer networking, FTP, and HTTP with a single click.' Metalink/HTTP describes multiple download locations (mirrors), Peer-to-Peer, cryptographic hashes, digital signatures, and other information using existing standards for HTTP headers. Clients can transparently use this information to make file transfers more robust and reliable...

From the document Introduction: "Identical copies of a file are frequently accessible in multiple locations on the Internet over a variety of protocols (such as FTP, HTTP, and Peer-to-Peer). In some cases, users are shown a list of these multiple download locations (mirrors) and must manually select a single one on the basis of geographical location, priority, or bandwidth. This distributes the load across multiple servers, and should also increase throughput and resilience. At times, however, individual servers can be slow, outdated, or unreachable, but this can not be determined until the download has been initiated. Users will rarely have sufficient information to choose the most appropriate server, and will often choose the first in a list which may not be optimal for their needs, and will lead to a particular server getting a disproportionate share of load...

This document describes a mechanism by which the benefit of mirrors can be automatically and more effectively realized. All the information about a download, including mirrors, cryptographic hashes, digital signatures, and more can be transferred in coordinated HTTP Headers. This Metalink transfers the knowledge of the download server (and mirror database) to the client. Clients can fallback to other mirrors if the current one has an issue. With this knowledge, the client is enabled to work its way to a successful download even under adverse circumstances. All this is done transparently to the user and the download is much more reliable and efficient..."
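
As a rough illustration of the mechanism, the sketch below reads mirror locations and a whole-file hash from response header values, then works through the mirrors in priority order until one returns content that matches the hash. The header shapes are assumptions: 'Link' values with rel=duplicate and a 'pri' parameter (lower value tried first) and a 'Digest' value carrying a base64-encoded SHA-256 hash follow the form later published as RFC 6249 and may not match draft -17 exactly. The example URLs and the fake_fetch function are hypothetical stand-ins for real network access.

```python
import base64
import hashlib
import re


def parse_mirrors(link_values):
    """Return rel=duplicate URLs, lowest 'pri' value first.

    Assumes one link per header value; comma-joined values are not handled.
    """
    mirrors = []
    for value in link_values:
        match = re.match(r'\s*<([^>]+)>(.*)', value)
        if not match:
            continue
        url, rest = match.groups()
        params = {}
        for part in rest.split(';'):
            if '=' in part:
                key, val = part.split('=', 1)
                params[key.strip().lower()] = val.strip().strip('"')
        if params.get('rel') == 'duplicate':
            mirrors.append((int(params.get('pri', 999999)), url))
    return [url for _, url in sorted(mirrors)]


def parse_sha256_digest(digest_value):
    """Extract the raw SHA-256 bytes from a Digest header value, or None."""
    for item in digest_value.split(','):
        algo, _, encoded = item.strip().partition('=')
        if algo.lower() == 'sha-256':
            return base64.b64decode(encoded)
    return None


def download(link_values, digest_value, fetch):
    """Try each mirror in turn; return (url, body) for the first good copy."""
    expected = parse_sha256_digest(digest_value)
    for url in parse_mirrors(link_values):
        try:
            body = fetch(url)
        except OSError:
            continue  # mirror unreachable: fall back to the next one
        if expected is None or hashlib.sha256(body).digest() == expected:
            return url, body
    raise RuntimeError("no mirror returned content matching the hash")


# Hypothetical header values and a fake fetcher, for demonstration only.
payload = b"example file contents"
links = [
    '<http://mirror-b.example.com/file.bin>; rel=duplicate; pri=2',
    '<http://mirror-a.example.com/file.bin>; rel=duplicate; pri=1',
    '<http://example.com/file.bin.asc>; rel=describedby',
]
digest = "SHA-256=" + base64.b64encode(hashlib.sha256(payload).digest()).decode()


def fake_fetch(url):
    if "mirror-a" in url:
        raise OSError("unreachable")
    return payload
```

Calling download(links, digest, fake_fetch) skips the unreachable first mirror and returns the copy from the second, which is the kind of transparent fallback the draft describes.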

W3C Timed Text Markup Language (TTML) 1.0 as Proposed Recommendation
Glenn Adams, Mike Dolan, Geoff Freed, Sean Hayes (et al, eds), W3C Technical Report

Members of the W3C Timed Text Working Group have published a Proposed Recommendation for the Timed Text Markup Language (TTML) 1.0 specification. W3C publishes a technical report as a Proposed Recommendation to indicate that the document is believed to be stable, and to encourage implementation by the developer community. At this time the dynamicFlow feature and the property value reverse oblique text have been removed due to lack of implementations.

W3C invites public review of the TTML 1.0 PR through October 12, 2010. A list of issues is available. Also for review purposes, a test suite is available, along with its coverage report and an implementation report. In the DFXP 1.0 Test Results document, test results are tabulated for Adobe/Flash, WGBH/NCAM 3.0.1, W3C/HTML5/JS, Longtail/JW FLV 4.6, MS/Silverlight TimedTextPad, and XFSI Viewer.

The Timed Text Markup Language (TTML) "is a content type that represents timed text media for the purpose of interchange among authoring systems. Timed text is textual information that is intrinsically or extrinsically associated with timing information. It is intended to be used for the purpose of transcoding or exchanging timed text information among legacy distribution content formats presently in use for subtitling and captioning functions.

In addition to being used for interchange among legacy distribution content formats, TTML content may be used directly as a distribution format, for example, providing a standard content format to reference from a 'text' or 'textstream' media object element in a SMIL 2.1 document... Use of TTML is intended to function in a wider context of Timed Text Authoring and Distribution mechanisms that are based upon a system model wherein the Timed Text Markup Language serves as a bidirectional interchange format among a heterogeneous collection of authoring systems, and as a unidirectional interchange format to a heterogeneous collection of distribution formats after undergoing transcoding or compilation to the target distribution formats as required, and where one particular distribution format is TTML..."
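
To make "text associated with timing information" concrete, here is a minimal sketch that reads caption cues from a small TTML document. The sample document is made up for illustration, and the code assumes the TTML 1.0 namespace http://www.w3.org/ns/ttml with 'begin' and 'end' attributes on 'p' elements; it ignores styling, layout, and nested timing.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"  # assumed TTML 1.0 namespace

# Hypothetical sample document, not taken from the specification.
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p begin="0.5s" end="2.0s">Hello.</p>
      <p begin="2.5s" end="4.0s">This is a caption.</p>
    </div>
  </body>
</tt>"""


def read_cues(ttml_text):
    """Return (begin, end, text) tuples for every p element, in document order."""
    root = ET.fromstring(ttml_text)
    cues = []
    for p in root.iter("{%s}p" % TT_NS):
        text = "".join(p.itertext()).strip()
        cues.append((p.get("begin"), p.get("end"), text))
    return cues
```

A transcoder to another caption format would start from a list like the one read_cues(SAMPLE) returns and then map each time expression to the target format's notation.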

Schema Scope: Primer and Best Practices for W3C XML Schema Design
Casey D. Jordan and Dale Waldt, IBM developerWorks

"A schema is a well-formed XML document that uses the powerful XML Schema Definition Language (XSD, also sometimes called W3C Schema) to model and validate other XML data. Depending on how you define schema particles (elements, types, attributes, and other constructs), they have an associated scope that is either global/exposed or local/hidden. The scope design of your schema significantly affects how the schema can evolve, be reused, and interoperate with other technologies.

Before you begin any schema project, it's imperative to align your design choices with your goals. By understanding the uses of schema scope, you can streamline the process of managing schemas and content. Ultimately, this will increase your ability to manage schema life cycle and allow your schemas to interact efficiently with other systems.

In this article, we first show how global or local scope is defined for various schema particles and explain how scope affects their behavior. Then we describe basic schema design patterns and explore considerations and best practices for creating scope designs that fit the needs of your projects.

Defining elements locally prevents them from being exposed to other parts of the schema. A local element's context is limited to its current location, so it cannot be referenced from other parts of the schema... It's not always easy to determine whether you should define schema particles with local or global scope. Depending on the use case, namespacing requirements, and schema evolution, the best choices can vary. Generally, a schema design falls into four basic patterns: (1) Russian doll; (2) Salami slice; (3) Venetian blinds; (4) Garden of Eden. It is important to understand these patterns to determine the best solution for your project..."
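
The four patterns differ only in which declarations sit at the top level of the schema. The sketch below classifies a schema with a simplified heuristic based on the commonly cited descriptions of the patterns, not on the article itself: it checks whether any types are declared globally and whether any named elements are declared locally. The sample schemas and the helper names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
TYPE_TAGS = {XS + "complexType", XS + "simpleType"}


def classify(xsd_text):
    """Guess the scope pattern from global types and local element declarations."""
    root = ET.fromstring(xsd_text)
    global_types = 0
    local_elements = 0
    for top in root:  # direct children of xs:schema have global scope
        if top.tag in TYPE_TAGS:
            global_types += 1
        for node in top.iter(XS + "element"):
            # Nested xs:element with a name is a local declaration;
            # nested xs:element with ref= only points at a global one.
            if node is not top and node.get("name"):
                local_elements += 1
    if global_types == 0:
        return "Russian doll" if local_elements else "Salami slice"
    return "Venetian blinds" if local_elements else "Garden of Eden"


def schema(body):
    """Wrap declarations in an xs:schema element (hypothetical helper)."""
    return '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">%s</xs:schema>' % body


RUSSIAN_DOLL = schema(
    '<xs:element name="book"><xs:complexType><xs:sequence>'
    '<xs:element name="title" type="xs:string"/>'
    '</xs:sequence></xs:complexType></xs:element>'
)
SALAMI_SLICE = schema(
    '<xs:element name="title" type="xs:string"/>'
    '<xs:element name="book"><xs:complexType><xs:sequence>'
    '<xs:element ref="title"/>'
    '</xs:sequence></xs:complexType></xs:element>'
)
VENETIAN_BLINDS = schema(
    '<xs:complexType name="BookType"><xs:sequence>'
    '<xs:element name="title" type="xs:string"/>'
    '</xs:sequence></xs:complexType>'
    '<xs:element name="book" type="BookType"/>'
)
GARDEN_OF_EDEN = schema(
    '<xs:complexType name="BookType"><xs:sequence>'
    '<xs:element ref="title"/>'
    '</xs:sequence></xs:complexType>'
    '<xs:element name="title" type="xs:string"/>'
    '<xs:element name="book" type="BookType"/>'
)
```

Real schemas often mix scopes, so a production check would need to report counts rather than force a single label.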

See also: W3C XML Schema

Suggested Values for SMTP Enhanced Status Codes for Anti-Spam Policy
Jeff Macdonald (ed), IETF Internet Draft

IETF has published a specification for Suggested Values for SMTP Enhanced Status Codes for Anti-Spam Policy. The document "establishes a set of extended SMTP policy codes for anti-spam. It seeks to provide additional codes for error texts that currently use the extended SMTP error code 5.7.1. The anti-spam codes were determined by looking at error texts produced by major ISPs and finding commonalities. The result is a new set of error texts with associated extended SMTP error codes."

"IETF RFC 3463 defines a set of Enhanced Status Codes for SMTP related to anti-spam policy. These codes are to be registered with the IANA Mail Enhanced Status Codes registry as defined in RFC 5248. While Anti-Spam policy is inherently a local decision, assigning these codes helps troubleshoot problems and lower support costs by allowing sending administrators to resolve many problems themselves.

The most common extended SMTP code assigned to anti-spam policy is '5.7.1'; this is because the subject code of 7 is meant for security or policy. Using 5.7.1 for many different anti-spam policies weakens the usefulness of extended SMTP error codes. One of the motivations behind RFC 3463 was to re-distribute the classifications of SMTP error codes in order to provide a richer set of errors, and provide a means for machine-readable, human language independent status codes. Thus a new subject code of 8 is introduced for anti-spam policy.

All of the new detail text was gathered by surveying several existing large ISPs to see what messages were produced when presented with messages that violate their policies. An attempt was then made to coalesce the messages together into common themes. These themes were then simply assigned a detail number. While this document provides suggested text for each detail code, alternate text can be provided if the text is in the spirit of the suggested text. This will allow sites to simply prepend the proper extended SMTP code to their existing text..."
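
An enhanced status code has the form class.subject.detail, so '5.7.1' is a permanent failure (class 5) with subject 7 and detail 1. The sketch below splits a reply line into those parts and shows the "prepend the code to existing text" step. The function names are illustrative, and the 5.8.1 value in the usage note is a hypothetical detail number, since the draft's actual assignments are not listed here.

```python
import re

# Basic reply code, then class.subject.detail as described in RFC 3463.
REPLY_RE = re.compile(r'^(\d{3})[ -]([245])\.(\d{1,3})\.(\d{1,3})\s+(.*)$')


def parse_reply(line):
    """Split an SMTP reply line into its basic and enhanced code parts."""
    match = REPLY_RE.match(line)
    if match is None:
        return None  # no enhanced status code present
    basic, cls, subject, detail, text = match.groups()
    return {
        "basic": int(basic),
        "class": int(cls),
        "subject": int(subject),
        "detail": int(detail),
        "text": text,
    }


def prepend_code(enhanced_code, existing_text, basic="550"):
    """Build a reply line by prepending codes to a site's existing text."""
    return "%s %s %s" % (basic, enhanced_code, existing_text)
```

For example, prepend_code("5.8.1", "Message rejected") yields "550 5.8.1 Message rejected", and parsing that line back reports subject 8, which a sending administrator's tooling could treat as an anti-spam policy rejection without reading the free text.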

Disappearing Identity Providers Pose Problem
Dave Kearns, Network World

"John Fontana is now writing about IdM issues for Ping Identity. He recently commented upon an issue that may arise more and more in the future as identity providers (specifically OpenID Identity Providers) disappear... Later this month, Six Apart will officially shut down VOX, a blogging site and an OpenID provider.

How does this affect people using VOX as their OpenID Identity Provider (IDP)? Fontana explains: 'If you have associated your VOX OpenID with services that you regularly use and where you store data, then there will be no one to validate that VOX OpenID... in effect, you don't exist, and worse yet, you have no access rights to your stuff'..."

Pam Dingle: "While this event isn't likely to crush a huge number of users under its wheel, it does start to expose some of the issues around OpenID. Can they be solved? Perhaps. The simple solution may be that OpenID IDPs consolidate into a handful of providers such as Google, which doesn't appear to be going anywhere soon...."

"In fact, this scenario has been scaring off customers of OpenID since the service was first started. The generally accepted workaround is to (where possible) have two separate logins to a service, both accessing all of your data. If one OpenID IDP fails you still have the second to fall back on. Of course that assumes that the service provider allows two separate logins to have administrative access to the single set of data, which is not always true..."

Up in the Air: Moving Your Applications to the Cloud
Panos Louridas, IEEE Software

"Cloud computing is currently among IT's most-hyped topics. Consultants, industry reports, business magazines, blogs, and books are trying to tell developers what cloud computing is. At the same time, cloud infrastructure providers are eager to extol the virtues of a new computing and programming model and the benefits you could accrue by getting on board. The cloud-computing paradigm is characterized by: (1) transactional resource acquisition, (2) nonfederated resource provisioning, and (3) metered resources, where the provider meters resource usage... But there are also risks, such as the dependency on high-availability, high-performance network connections and, not least, security and privacy.

In general, cloud infrastructure providers should show that their offerings meet the obligations set down by law. In standards such as PCI DSS, the onus for compliance is on the cloud user. In privacy matters like the EU Data Protection Directive, the cloud user must investigate exactly what kinds of data are moved where and whether the movement falls foul of the directive.

Many cloud providers achieve SAS 70 (Statement on Auditing Standards No. 70: Service Organizations) Type II certification. This assures service users that an independent third party (an auditor) has examined the provider's organizational controls over the processing of sensitive information.

Finally, in a global economy, developers must also consider how data protection and privacy laws differ among nations and cultures. For example, Europe and the US have different legal and social histories that are reflected in different perspectives on the relative merits of policy and regulation. The US generally prefers a self-regulatory approach, where industries decide on the norms they will adopt; Europe opts for a policy-driven approach, where laws regulate data handling and privacy. EU member states pass laws on the national level to enforce the Data Protection Directive..."

Cloud Security and Compliance: Clear The Ambiguity
George Hulme, InformationWeek

"The fact that business consumers of public cloud computing services don't get much in the way of transparency into the governance and security efforts of their cloud providers has been an obvious hindrance to cloud adoption. Here's an example of how a nascent, but encouraging, standard (CloudAudit) aims to change that...

Essentially, CloudAudit provides a common interface and the context around IT controls necessary to automate the auditing of cloud infrastructures to a number of compliance frameworks and regulations. To date, CloudAudit defines compliance 'namespaces' for ISO 27002, PCI DSS, COBIT, HIPAA, and NIST 800-53. CloudAudit has also built upon work completed by the Cloud Security Alliance in cross-mapping between compliance framework controls.

George Reese, co-founder and CTO at cloud infrastructure management provider enStratus, also worked closely on the development of CloudAudit. Initially, Reese explains, enStratus became involved with CloudAudit because of how it could help further better governance in the cloud. In addition, should CloudAudit take root and grow in the IT industry, it could have profound benefits for both the cloud providers and the enterprise consumers of cloud services. 'As a cloud provider, enStratus has to constantly undergo different kinds of audits from our customers. Each audit is different, and even if they're auditing for the same thing, each customer asks the same questions in different ways: it's just not economical'...

What CloudAudit does, should it be successful, is make it easier at a much lower cost to make IT controls transparent to whoever asks. Providers can publish their answers to the CloudAudit questions in a standard format that is readable by auditors and, one day, automated programs... Eventually, Reese explains, cloud providers and management companies could be able to use CloudAudit to help better manage their customers' data: 'Consider a company that has data that is subject to European privacy laws. enStratus would be able to query the different cloud providers a customer has accounts with and make certain that data never leaves Europe'..."

The Simple Publishing Interface (SPI) for Federated Repositories
Stefaan Ternier, David Massart, Michael Totschnig (et al), D-Lib Magazine

"The Simple Publishing Interface (SPI) is a new publishing protocol, developed under the auspices of the European Committee for Standardization (CEN) workshop on learning technologies. This protocol aims to facilitate the communication between content producing tools and repositories that persistently manage learning resources and metadata. The SPI work focuses on two problems: (1) facilitating the metadata and resource publication process (publication in this context refers to the ability to ingest metadata and resources); and (2) enabling interoperability between various components in a federation of repositories.

This article discusses the different contexts where a protocol for publishing resources is relevant. SPI contains an abstract domain model and presents several methods that a repository can support. An Atom Publishing Protocol binding is proposed that allows for implementing SPI with a concrete technology and enables interoperability between applications.

The Simple Publishing Interface (SPI) is used to push digital resources and/or their metadata into a repository. SPI makes relatively few assumptions about the resources and metadata that can be published. SPI supports four operations: submitting a resource to a repository, deleting a resource from a repository, publishing a metadata record to a repository, and deleting a metadata record from a repository... Depending on the binding that is used, this byte-stream can be encoded in various ways. SPI proposes two ways of submitting a resource to a repository: "by-value" and "by-reference". In by-value publishing, the resource is directly embedded (after encoding) in a message sent to a repository. In by-reference publishing, the message sent to the repository only contains a reference (e.g., a URL) to the submitted resource. It is then the responsibility of the repository to use this reference to retrieve the resource and store it...

The SPI is an abstract model that seeks consensus on which high level methods to offer for publishing metadata and resources. A binding to a technology makes these methods more concrete and defines how applications can interoperate practically. SPI has been bound to the Atom Publishing Protocol (APP or AtomPub) for several reasons. AtomPub is a simple HTTP-based protocol for publishing Atom entries to the web. Although Atom as a format for web feeds is well known, AtomPub can do much more than serve as a protocol for blog clients. In the realm of repositories, SWORD (Simple Web-service Offering Repository Deposit) is a profile of AtomPub that defines a number of Atom elements and HTTP extensions to support the publication of resources or content packages. Both SWORD and SPI initiatives started around the same time. In a later stage of the SPI project, it was decided to offer support for AtomPub in order to enable interoperability with the SWORD specification. SPI is in essence an abstract protocol that can be bound to existing technologies (APP, SOAP, XML-RPC), while SWORD is an extension of existing technology (APP)..."
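
The four SPI operations and the two submission modes described above can be sketched as a small in-memory repository. This is an illustration of the abstract model only, not the SPI specification or its AtomPub binding: the class, method, and parameter names are assumptions, and the 'retrieve' callable stands in for whatever the repository would use to dereference a URL.

```python
class InMemoryRepository:
    """Toy repository offering the four SPI operations (names are hypothetical)."""

    def __init__(self, retrieve):
        # Callable used to fetch a resource for by-reference submissions.
        self._retrieve = retrieve
        self.resources = {}
        self.metadata = {}

    def submit_resource(self, identifier, content=None, reference=None):
        """Store a resource given by value (content) or by reference (URL)."""
        if (content is None) == (reference is None):
            raise ValueError("provide exactly one of content or reference")
        if content is None:
            # By-reference: the repository is responsible for retrieval.
            content = self._retrieve(reference)
        self.resources[identifier] = bytes(content)

    def delete_resource(self, identifier):
        self.resources.pop(identifier)

    def submit_metadata(self, identifier, record):
        """Publish a metadata record describing a resource."""
        self.metadata[identifier] = dict(record)

    def delete_metadata(self, identifier):
        self.metadata.pop(identifier)


# Hypothetical usage with a fake retriever instead of real HTTP access.
repo = InMemoryRepository(retrieve=lambda url: b"fetched:" + url.encode())
repo.submit_resource("r1", content=b"lesson text")
repo.submit_resource("r2", reference="http://example.org/lesson2")
repo.submit_metadata("m1", {"title": "Lesson 1", "resource": "r1"})
```

Keeping the operations this abstract is what lets the same model be bound to AtomPub, SOAP, or XML-RPC, with each binding deciding how the byte-stream and the reference are encoded on the wire.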

