Cover Pages: XML Daily Newslink: Monday, 01 June 2009

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Primeton http://www.primeton.com

Headlines

OASIS Public Review Draft: Common Alerting Protocol Version 1.2
Revised W3C SPARQL Working Group Charter Anticipates SPARQL Phase II
IETF PKIX Working Group: Other Certificates Extension
The DMTF Standards Incubation Process: Promoting Open Collaboration Throughout the Standard Lifecycle
Open Data: Ready for Reuse?
Conformance in the Floating World: Duty Now for the Future?
Key Exchange: The Impact of the Key Management Interoperability Protocol
Google Wave's Architecture
Spinning Up: OASIS DITA Pharmaceutical Content Subcommittee

OASIS Public Review Draft: Common Alerting Protocol Version 1.2
Jacob Westfall (ed), OASIS Emergency Management TC Draft

Members of the OASIS Emergency Management Technical Committee have approved a Committee Draft of the "Common Alerting Protocol Version 1.2" specification and released it for public review through July 27, 2009. This Version 1.2 represents a minor release to resolve issues identified by the EM-TC CAP Call for Comments initiated in April 2008, and to incorporate feedback from CAP profile development efforts (on which, see below).

The Common Alerting Protocol (CAP) provides an open, non-proprietary digital message format for all types of alerts and notifications. It does not address any particular application or telecommunications method. The CAP format is compatible with emerging techniques, such as Web services, as well as existing formats including the Specific Area Message Encoding (SAME) used for the United States' National Oceanic and Atmospheric Administration (NOAA) Weather Radio and the Emergency Alert System (EAS), while offering enhanced capabilities that include: (1) Flexible geographic targeting using latitude/longitude shapes and other geospatial representations in three dimensions; (2) Multilingual and multi-audience messaging; (3) Phased and delayed effective times and expirations; (4) Enhanced message update and cancellation features; (5) Template support for framing complete and effective warning messages; (6) Compatibility with digital encryption and signature capability; (7) Facility for digital images and audio. Key benefits of CAP will include reduction of costs and operational complexity by eliminating the need for multiple custom software interfaces to the many warning sources and dissemination systems involved in all-hazard warning. The CAP message format can be converted to and from the native formats of all kinds of sensor and alerting technologies, forming a basis for a technology-independent national and international 'warning internet'.
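As a rough illustration of the capabilities listed above, the following sketch builds a single alert that carries the same warning in two languages, with effective and expiry times and a latitude/longitude polygon. The element names and the namespace URI follow the CAP 1.2 draft; the identifier, sender, times, event text and the helper names `_add`, `build_alert` and `to_xml` are hypothetical values chosen for this example, and the result has not been validated against the CAP schema.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by CAP 1.2 documents.
CAP_NS = "urn:oasis:names:tc:emergency:cap:1.2"


def _add(parent, name, text=None):
    """Append a CAP-namespaced child element and return it."""
    child = ET.SubElement(parent, "{%s}%s" % (CAP_NS, name))
    if text is not None:
        child.text = text
    return child


def build_alert():
    """Build one alert carrying the same warning in two languages."""
    alert = ET.Element("{%s}alert" % CAP_NS)
    _add(alert, "identifier", "EXAMPLE-2009-0001")   # hypothetical ID
    _add(alert, "sender", "alerts@example.org")      # hypothetical sender
    _add(alert, "sent", "2009-06-01T09:00:00-07:00")
    _add(alert, "status", "Actual")
    _add(alert, "msgType", "Alert")
    _add(alert, "scope", "Public")

    # One <info> block per language (multilingual messaging).
    texts = [
        ("en-US", "Severe Thunderstorm", "Severe thunderstorm warning"),
        ("es-US", "Tormenta severa", "Aviso de tormenta severa"),
    ]
    for language, event, headline in texts:
        info = _add(alert, "info")
        _add(info, "language", language)
        _add(info, "category", "Met")
        _add(info, "event", event)
        _add(info, "urgency", "Expected")
        _add(info, "severity", "Severe")
        _add(info, "certainty", "Likely")
        # Delayed effective time and explicit expiration.
        _add(info, "effective", "2009-06-01T10:00:00-07:00")
        _add(info, "expires", "2009-06-01T13:00:00-07:00")
        _add(info, "headline", headline)
        # Geographic targeting: a closed ring of "lat,lon" pairs.
        area = _add(info, "area")
        _add(area, "areaDesc", "Example county (illustrative)")
        _add(area, "polygon",
             "38.47,-120.14 38.34,-119.95 38.52,-119.74 "
             "38.62,-119.89 38.47,-120.14")
    return alert


def to_xml(alert):
    """Serialize with CAP as the default namespace."""
    ET.register_namespace("", CAP_NS)
    return ET.tostring(alert, encoding="unicode")
```

Calling `to_xml(build_alert())` yields one `alert` document with two `info` blocks; a receiver can pick the block whose `language` matches its audience.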

In connection with the CAP Version 1.2 work, members of the Emergency Management CAP Profiles Subcommittee have been working to review the DHS FEMA/IPAWS "Common Alerting Protocol (CAP) v1.1 Profile Requirements" document, to refine and formalize technical specifications and create an "IPAWS Common Alerting Protocol (CAP) Version 1.1 Profile", in order to support CAP within the U.S. Emergency Alert System (EAS). A public review draft of this CAP Profile ("Common Alerting Protocol Version 1.1: USA Integrated Public Alert and Warning System Profile Version 1.0") has been published.

See also: the OASIS CAP Profile

Revised W3C SPARQL Working Group Charter Anticipates SPARQL Phase II
Staff, W3C Announcement

A W3C Working Group has been re-chartered to enumerate a set of extensions that are expected to be widely used and can be shown to exist in multiple, interoperable implementations. The mission of the new SPARQL Working Group, chartered through July 31, 2010, is to produce a W3C Recommendation that extends SPARQL. The extension is a small set of additional features that have been identified by users as badly needed for applications, and have been identified by SPARQL implementers as reasonable and feasible extensions to current implementations. Although the community has expressed opinions on various types of extensions to SPARQL, this Working Group is only chartered to make additions that are expected to be widely used and can be shown to exist in multiple, interoperable implementations. It is not the goal of this working group to make a significant upgrade of the SPARQL language.

The W3C RDF Data Access Working Group published three SPARQL Recommendations (Query Language, Protocol, and Results Format) in January 2008. SPARQL has become very widely implemented and used since then, and in fact was so even before the specification achieved W3C Recommendation status. Usage and implementation of SPARQL have revealed requirements for extensions to the query language that are needed by applications. Most of these were already known and recorded when developing the current Recommendation, but there was not enough implementation and usage experience at the time for standardization. Current implementation experience and feedback from the user community now make it feasible to handle some of those issues in a satisfactory manner.

Examples of the extensions to SPARQL to be considered by this Working Group include: (1) Insert, update, and delete: building on SPARQL to modify RDF data stores, including using HTTP POST; (2) Query constructs to access container and collection members; (3) Aggregate functions (e.g., COUNT, SUM, MIN, MAX, ...) and supporting query language constructs (e.g., GROUP BY clause, HAVING clause, and the ability to project named expressions); (4) Additional filters and operators (e.g., full-text search, string functions); (5) Interoperability mechanisms (e.g., via service descriptions) for endpoints that extend SPARQL basic graph matching to other entailment regimes such as RDF Schema or OWL; (6) Examine options for a more predictable DESCRIBE functionality; (7) Reconsider the issue of XML serialization of the SPARQL query language; this issue was explicitly postponed by the previous DAWG group.
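To make item (3) concrete, here is a minimal sketch of what an aggregate query with GROUP BY and HAVING computes, evaluated in plain Python over a handful of triples. The query text in `QUERY` is hypothetical syntax for illustration only (the charter itself fixes no syntax), and the `ex:` data and the function name `count_by_object` are assumptions of this example.

```python
from collections import Counter

# Hypothetical query text showing COUNT, a projected named
# expression (?n), GROUP BY and HAVING. It is not executed here.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?author (COUNT(?doc) AS ?n)
WHERE { ?doc ex:author ?author }
GROUP BY ?author
HAVING (COUNT(?doc) > 1)
"""

# A tiny RDF-like graph as (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:doc1", "ex:author", "ex:alice"),
    ("ex:doc2", "ex:author", "ex:alice"),
    ("ex:doc3", "ex:author", "ex:bob"),
    ("ex:doc1", "ex:title", "CAP notes"),
]


def count_by_object(triples, predicate, having=lambda n: True):
    """Group matching triples by object and count them.

    Mirrors: match the pattern, GROUP BY the object,
    COUNT the solutions, then keep groups passing HAVING.
    """
    counts = Counter(o for (_s, p, o) in triples if p == predicate)
    return {obj: n for obj, n in counts.items() if having(n)}


if __name__ == "__main__":
    print(count_by_object(TRIPLES, "ex:author", having=lambda n: n > 1))
```

Without aggregates in the query language, a client has to fetch every matching solution and do this grouping itself, which is the kind of application need the charter points to.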

IETF PKIX Working Group: Other Certificates Extension
Stephen Farrell (ed), IETF Internet Draft

Members of the IETF Public-Key Infrastructure (X.509) Working Group (PKIX) have released an updated Internet Draft for Other Certificates Extension. Abstract: "Some applications that associate state information with public key certificates can benefit from a way to link together a set of certificates belonging to the same end entity that can safely be considered to be equivalent for the purposes of referencing that application state information. This memo defines a certificate extension that allows applications to establish the required linkage without introducing a new application protocol data unit."

From the 'Introduction': "RFC 5280 ('Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile') defines a profile for the use of public key certificates for Internet applications. If an application associates application state information with a public key certificate, then that association may be disrupted if the end entity changes its public key certificate. Such disruption can occur due to renewals or if the end entity changes its certificate issuer. Similarly, if the end entity is actually a distributed system, where each instance has a different private key, then the relying party (RP) has no way to associate the different public key certificates with the relevant application state information. For example, assume a web browser retains state information (perhaps passwords) about a web site, indexed (possibly indirectly) via values contained in the web server's public key certificate (perhaps a DNS name). When the web server certificate expires, and a new certificate is acquired (perhaps with a different DNS name), then the browser cannot safely map the new certificate to the relevant state information. This memo defines a new public key certificate extension that supports such linkage. Other than the issuer asserting that the set of certificates belong to the same end entity for use with the same application, the fine-detail of the semantics of the linkage of certificates is not defined here, since that is a matter for application developers and the operators of certification authorities (CAs). In particular we do not define how a CA can validate that the same end entity is the holder of the various private keys, nor how the application should make use of this information. Nor do we define what kinds of state information may be shared."

The IETF PKIX Working Group was chartered to track the evolution of ITU-T X.509 documents, and maintain compatibility between these documents and IETF PKI standards, since the profiling of X.509 standards for use in the Internet remains an important topic for the working group. PKIX does not endorse the use of specific cryptographic algorithms with its protocols. However, PKIX does publish standards track RFCs that describe how to identify algorithms and represent associated parameters in these protocols, and how to use these algorithms with these protocols. PKIX pursues new work items in the PKI arena if working group members express sufficient interest, and if approved by the cognizant Security Area director(s).

The DMTF Standards Incubation Process: Promoting Open Collaboration Throughout the Standard Lifecycle
Josh Cohen, Distributed Management Task Force (DMTF) Newsletter

In this article, DMTF Vice Chairman of the Board and Process Committee Chair talks about the Open Cloud Standards Incubator and the DMTF Incubator Process. "DMTF incubators are designed to enable members to work together to produce informational specifications that can later be submitted for further standards development. In the past, the development cycle for a future standard often began with a group of vendors working privately to create a specification. Once finished, this specification would be submitted to a standards body, such as the DMTF, for ratification as a standard. While these private efforts were very successful in certain situations, they also had various drawbacks. Even when operated fairly, they were still governed by agreement of the vendors involved rather than an open standards body with a well-defined process. The incubator process allows vendors to form a group within DMTF to develop such a specification. The DMTF process ensures oversight by a standards body rather than private vendor agreements. The DMTF processes themselves are defined, in a democratic process, by the members of the organization. In cases where there is a disagreement, members can use a well-defined escalation process which may proceed up to the DMTF Board of Directors for resolution. This provides a mechanism to enable transparency and resolve issues fairly and by consensus of the organization.

New incubators can be proposed by any two Board or Leadership member companies within DMTF. Incubator proposals are submitted to the DMTF Process Committee Chair, who notifies the general DMTF membership of the intent to form the incubator. Interested members then work to develop a charter, which must be approved by the Board of Directors before the incubator can begin working on its stated deliverables. Once an incubator is approved, any DMTF member can participate to provide input. An Incubator will produce deliverables, such as specifications or other materials, that are considered informational specifications. Informational specifications produced by DMTF incubators are reviewed by the Board, which determines the next step for the work. Specifications developed through the incubator process are expected to evolve into permanent DMTF standards; however, there is no guarantee that DMTF will accept the specifications..."

Open Data: Ready for Reuse?
Joab Jackson, Government Computer News

Making raw data ready for third-party use is a big challenge for U.S. government agencies. Last month, the Obama administration unveiled a Data.gov Web site that offers the public access to data feeds from various agencies. Although the number of feeds is modest, the site's debut could signal a radical new way in which government agencies must handle and release their data. The site sets a new standard for the presentation of government data, but prepping such data will be no small task.

The initial Data.gov offering has links to forty-seven (47) datasets from a variety of agencies available in a variety of formats. For example, the U.S. Geological Survey has submitted its National Geochemical Survey database, which offers a nationwide analysis of soil and stream sediments, and the Fish and Wildlife Service has presented data about waterfowl migration flyways... Data.gov offers data in several formats, including Extensible Markup Language (XML), comma-separated values (CSV), plain text, Keyhole Markup Language and ESRI's shapefile format for geographical data. The site also offers links to agency Web pages that allow people to query specific databases. For example, it has a link to the General Services Administration's USAspending.gov site, which lets users search information on federal contracts.

Ray Bjorklund, senior vice president and chief knowledge officer at FedSources, said the government still has a long way to go before it can offer a wide array of useful data sources. Getting data into a coherent state will require data governance, which takes a considerable amount of work. Multiple agencies will have to agree on how data should be structured, such as through a common taxonomy or ontology, and on a precise dictionary of terms. "There are lots of islands of data within the government that are well-structured," Bjorklund said. "Will you try to tie all those definitions together and reconcile them? Will you have agencies redesign their data environments to meet some new taxonomy or infrastructure?" The first step any agency should take is to encode its data in XML... It is important to put the data in a format where there is metadata or tags around the data, so when one is looking at the data, they know what they are looking for. If you just publish via CSV, there is no easy way to consume the data. But if you have XML, you have tags that describe the data.
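The point about tags can be illustrated with a small sketch that reads the same record from a bare CSV line and from an XML fragment. The field names, values and function names here are invented for the example and do not come from any Data.gov dataset; a CSV file with a header row would recover the column names, though still not attributes such as units.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The same hypothetical record, once as a bare CSV line...
CSV_TEXT = "SITE-001,2009-05-21,12.5\n"

# ...and once as XML, where each value carries its own label.
XML_TEXT = """<sample>
  <siteId>SITE-001</siteId>
  <collected>2009-05-21</collected>
  <arsenic unit="ppm">12.5</arsenic>
</sample>"""


def read_csv_row(text):
    """Return the first row as a list of unlabeled strings."""
    return next(csv.reader(io.StringIO(text)))


def read_xml_record(text):
    """Return {element name: (text, attributes)} for each child."""
    root = ET.fromstring(text)
    return {child.tag: (child.text, dict(child.attrib)) for child in root}


if __name__ == "__main__":
    print(read_csv_row(CSV_TEXT))      # values only, meaning is implicit
    print(read_xml_record(XML_TEXT))   # names and units travel with the data
```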

See also: the Data.gov Web site

Conformance in the Floating World: Duty Now for the Future?
Rick Jelliffe, O'Reilly Technical

This article looks at some trends and challenges for document validation. The challenges come in two classes: first, raw capabilities for lifecycle support for standards; second, coping with transitions from technologies defined by implementation to technologies defined by standards with the necessary agility. There are many other minor challenges: validating sliced and diced documents distributed as parts in a ZIP archive, for example. Standards-making stakeholders working on the XML-in-ZIP specifications will be increasingly grappling with this kind of issue, just as a function of the maturity of XML.

The Rise and Fall of Mandatory Validation: SGML was based on a strict regime: the document had to have a grammar (DTD), and it had to conform to the grammar (DOCTYPE). This idea (the jargon is rigorous markup) springs from a workflow that was likely to be distributed in time (archiving), task (retargeting), and responsibility (different organizations). You don't want a bad document to propagate down the line (in terms of tasks, organizations or years) before discovering it is flawed. Mandatory validation had other benefits too; notably, it enforced design by contract. However, mandatory validation requires schemas, which add to the expertise investment. And it had become clear that high-volume, low-stakes data delivered from a debugged system did not benefit from being validated before being sent: validation was appropriate for debugging and verification. And at the receiver end, there is the increasing use of schemas to generate code (data binding), which builds in some kinds of validation (generated code rarely checks the full constraints, but necessarily barfs when there is an error in the document against the parts of the schema that it has used). If there has been a trend away from mandatory validation, there has also been a trend away from voluntary validation...

The Rise and Fall of Extensibility: With XML, well-formedness checking is Draconian, but validation with a schema is not mandatory. So a different approach was taken to allowing successful large systems: modularity. The XML Namespaces spec allows your document's names to be partitioned off into different namespaces, with each namespace being in effect a vocabulary or sublanguage. Generic software or libraries that only understood a single vocabulary became possible, such as for SVG. In this view of modularity, documents would increasingly be composed from items in different standard namespaces: the most extreme implication of this was teased out by XSD 1.1, which does not provide any facility to specify the top-level element. The SGML notion of a document type was replaced entirely. Again, those people who still needed document types switched to RELAX NG or used a simple Schematron assertion...
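A small sketch of both ideas in that paragraph, using only Python's standard library: a parser rejects a document that is not well-formed without consulting any schema, and a namespace-aware consumer can pick out just the vocabulary it understands from a mixed document. The sample document and the function names are assumptions of this example, not taken from the article.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# A document composed from two vocabularies.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="40" height="40">
      <circle cx="20" cy="20" r="15"/>
    </svg>
  </body>
</html>"""


def is_well_formed(text):
    """Draconian check: any well-formedness error is fatal."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def local_names_in(root, namespace):
    """Local names of all elements belonging to one namespace."""
    prefix = "{%s}" % namespace
    return [el.tag[len(prefix):] for el in root.iter()
            if el.tag.startswith(prefix)]


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(local_names_in(root, SVG_NS))      # what an SVG-only tool sees
    print(local_names_in(root, XHTML_NS))    # what an XHTML-only tool sees
    print(is_well_formed("<p><b>bad</p></b>"))
```

Note that nothing here says which element may appear at the top level; that is exactly the document-type information the article says was dropped.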

Minor and Major Versions, and Subsets: But extensibility has its problems. The flexibility of not having mandatory validation creates a vacuum for unmanaged change. Extensible software is often written (as distinct from code generated from schemas) with policies to cope with unexpected elements, with HTML being the poster boy. However, what if the change relates to something regarded as intrinsic to the information? It is popular to say that the application entirely determines how the information is processed (if you have pure data). However, some information items have strong cohesion, so you cannot have one without the other. For example, if you changed the HTML 'TD' element to be called BIGBOYBYGUM, in order to placate those voices in your head I suppose, then you would not expect HTML software to work correctly merely by stripping the element out and continuing with its contents...

Key Exchange: The Impact of the Key Management Interoperability Protocol
Michael Vizard and Kevin Bocek, eWEEK Interview

"In this eWEEK podcast hosted by Mike Vizard, the Director of product marketing for Thales, Kevin Bocek, talks about the impact that a new KMIP (Key Management Interoperability Protocol) standard will have on spurring widespread adoption of encryption. Storage managers, data center operators, and IT security teams are demanding new support for full encryption as the next step in data security. With encryption comes the need for integrated key management within databases and other applications to cover business continuity and SLA requirements. Encryption is now being deployed as a critical part of enterprise infrastructure. The KMIP interoperability specification is a big step for the industry in terms of supporting development of interoperable security products. The standard will be applicable for key management systems and servers, for both software-based applications (databases, web servers) and hardware (tape drives, disk arrays, network switches)..."

Encryption is now being deployed to a much larger extent than ever before, so key management needs to be commensurate... We've seen an end to crypto wars (e.g., certificate formats) so the focus is on reliability and recoverability of data: we need key management that's stable and tested. Key management is moving away from the point applications (from concern about individual keys) to policy and compliance concerns. The question isn't so much "where are my keys?" but "what are my key management policies? how can I pass an audit?"... Data center operators need to be able to know and prove that their encryption supports recoverable data. Where to encrypt: software, hardware, both? It depends: what's the risk, what's the threat to the data, and what are the consequences of compromise (data breach)? If you have high-value transactions, the data may need to be protected (encrypted) early and throughout its life cycle; but some data becomes sensitive only at certain points. For storage systems, encryption is now moving from being an option to being an expectation/requirement. Many devices are shipped with hardware that supports encryption -- even if it's not activated initially. Similarly, with databases: expectation is growing that information is encrypted in the database. Cost factors are coming down as encryption gets embedded in applications and hardware devices. A challenge however is management of keys in terms of automation and auditing for operations. KMIP interoperability will help a lot...

Google Wave's Architecture
Abel Avram, InfoQ

Google Wave is three things: a tool, a platform and a protocol. The architecture has at its heart the Operational Transformation (OT), a theoretical framework meant to support concurrency control. According to Google's definition, Google Wave is "a new communication and collaboration platform based on hosted XML documents (called waves) supporting concurrent modifications and low-latency updates."

The Tool: Google Wave is an email program + instant messenger + collaborative document sharing & editing tool. It uses JavaScript and HTML5 on the client side, running in browsers like Chrome, Firefox, and Safari, including mobile platforms (iPhone, Android), and Java + Python on the server side, but the server side can be implemented with anything a customer wants. The tool was built with GWT and uses Google Gears to handle drag and drop, which is not yet included in HTML5. The tool needs a dedicated server to handle concurrent communications, which is needed especially for large teams. The server can be outside in the cloud or inside in a private enterprise or, simply, in someone's home. The Platform: Google Wave comes with a public API and the company promises to open source the entire platform before the product goes live. As a platform, Wave allows developers to modify the base code and extend it with gadgets and robots. Gadgets are small programs running inside of a wave, while robots are 'automated wave participants'. Wave can also be embedded in other mediums like blogs.

As for the Protocol: the main elements of Google Wave's data model are: (1) Wave: Each wave has a globally unique wave ID and consists of a set of wavelets. (2) Wavelet: A wavelet has an ID that is unique within its containing wave and is composed of a participant list and a set of documents. The wavelet is the entity to which Concurrency Control / Operational Transformations apply. (3) Participant: A participant is identified by a wave address, which is a text string in the same format as an email address (local-part@domain). A participant may be a user, a group or a robot. Each participant may occur at most once in the participant list. (4) Document: A document has an ID that is unique within its containing wavelet and is composed of an XML document and a set of "stand-off" annotations. Stand-off annotations are pointers into the XML document and are independent of the XML document structure. They are used to represent text formatting, spelling suggestions and hyper-links. Documents form a tree within the wavelet. (5) Wave View: A wave view is the subset of wavelets in a wave that a particular user has access to. A user gains access to a wavelet either by being a participant on the wavelet or by being a member of a group that is a participant (groups may be nested).
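The data model described above can be sketched as a few Python dataclasses. This is only an illustrative reading of the five definitions, not Google's implementation: the class and function names, the tuple layout of annotations, and the `groups` mapping (group address to member addresses) are all assumptions of this example, and the tree structure among documents is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Document:
    doc_id: str                      # unique within its wavelet
    xml: str                         # the XML content
    # Stand-off annotations as (start, end, key, value) offsets.
    annotations: List[Tuple[int, int, str, str]] = field(default_factory=list)


@dataclass
class Wavelet:
    wavelet_id: str                  # unique within its wave
    participants: List[str] = field(default_factory=list)
    documents: Dict[str, Document] = field(default_factory=dict)

    def add_participant(self, address: str) -> None:
        """Each participant may occur at most once in the list."""
        if address not in self.participants:
            self.participants.append(address)


@dataclass
class Wave:
    wave_id: str                     # globally unique
    wavelets: Dict[str, Wavelet] = field(default_factory=dict)


def expand_groups(user: str, groups: Dict[str, Set[str]]) -> Set[str]:
    """The user's address plus every group containing them, nested or not."""
    found = {user}
    changed = True
    while changed:
        changed = False
        for group, members in groups.items():
            if group not in found and found & members:
                found.add(group)
                changed = True
    return found


def wave_view(wave: Wave, user: str, groups: Dict[str, Set[str]]) -> List[str]:
    """IDs of the wavelets this user can access."""
    identities = expand_groups(user, groups)
    return [w.wavelet_id for w in wave.wavelets.values()
            if identities & set(w.participants)]
```

For instance, if `alice@example.com` belongs to a team group that is itself a member of an organization group, and a wavelet lists only the organization group as a participant, `wave_view` still includes that wavelet for her.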

Operational Transformation is the crucial part of Wave's technology. Google Wave makes extensive use of Operational Transformations (OT), which are executed on the server. When a user edits a collaborative document opened by several users, the client program provides an Optimistic UI by immediately displaying what he/she types, but it also sends the editing operation to the server to be ratified, hoping that it will be accepted by the server. The client waits for the server to evaluate the operation and will cache any other operations until the server replies... The server is the keeper of the document and its version is considered the 'correct' version. In the end, each client will be updated with the final version received from the server, which is the result of possibly many operational transformations. There are recovery means provided for communication failure or server/client crash. All XML documents exchanged between the client and the server carry a checksum for rapid identification of miscommunications.
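The core idea of operational transformation can be shown with a deliberately tiny sketch: two clients insert text concurrently into the same string, and each operation is shifted so that it still lands in the intended place after the other has been applied. This is a textbook-style illustration on plain strings with insert-only operations, not Wave's actual algorithm (which works on XML documents and richer operations); the names `Insert`, `apply_op`, `transform` and `converge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int     # character offset in the document
    text: str    # text to insert at that offset


def apply_op(doc, op):
    """Apply an insert to a plain string."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, against, op_wins_ties):
    """Rewrite `op` so it can be applied after `against` already was.

    If `against` inserted before `op` (or at the same offset and
    `op` loses the tie), shift `op` right by the inserted length.
    """
    if against.pos < op.pos or (against.pos == op.pos and not op_wins_ties):
        return Insert(op.pos + len(against.text), op.text)
    return op


def converge(doc, first, second):
    """Simulate a server that accepts `first` before `second`.

    Returns (server_doc, second_client_doc); the two must match.
    """
    # Server: apply `first`, then `second` transformed against it.
    server = apply_op(doc, first)
    server = apply_op(server, transform(second, first, op_wins_ties=False))
    # Second client: already showed its own edit optimistically,
    # then receives `first` transformed against that local edit.
    client = apply_op(doc, second)
    client = apply_op(client, transform(first, second, op_wins_ties=True))
    return server, client


if __name__ == "__main__":
    print(converge("Hello", Insert(5, "!"), Insert(0, ">> ")))
```

The tie-break flag plays the role of the server's ordering: whichever operation the server accepted first keeps its position, so both sides end up with the same text.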

[Note: Ben Parr's article "The Top Six Game-Changing Features of Google Wave" identifies: Live Collaborative Editing with Wiki-Style Functionality, Wave Extensions, Drag-And-Drop File Uploads, Wave Embeds, Playback, and Open Source.]

See also: Ben Parr's article

Spinning Up: OASIS DITA Pharmaceutical Content Subcommittee
OASIS DITA TC Members, Draft Proposal for New Work

In May 2009, members of the OASIS Darwin Information Typing Architecture (DITA) Technical Committee voted to form a new OASIS DITA Pharmaceutical Content Subcommittee to support creation, maintenance, and publishing of pharmaceutical documentation using DITA constructs. The goal is to "optimize the value of DITA it is an objective to support these specializations with additional topics and maps for facilitating the business processes of content design, authoring, document review and submission assembly."

As initially proposed, the Subcommittee (DITA-PC-SC) will "define DITA topics, maps, associated metadata properties and terminology to streamline design and creation of the complete body of pharmaceutical documentation required to present a product for scientific and regulatory purposes throughout its lifecycle. These constructs will include: (1) a pharmaceutical content taxonomy of DITA topics, (2) the metadata and terminology to be associated with each topic instance, and (3) a taxonomy of DITA maps all of which are defined to optimize reuse and re-purposed content. The initial objectives of the DITA Pharmaceutical Content Subcommittee are to define topics and maps as required to implement: [A] ICH CTD (Common Technical Document) content specification; [B] US IND (Investigational New Drug) content specification; [C] EU CTA (Clinical Trial Authorization) content specification; [D] FDA Structured Product Labeling content specification; [E] EU Product Information Management content specifications."

From the minutes of the May 26, 2009 DITA TC meeting: "Steffen Fredericksen provided background about the rationale for starting the new Subcommittee. Pharmaceutical companies are struggling with multiple government regulatory standards being imposed. Information has to be delivered in various formats, including SPL structured product labeling format, Product Information Management (PIM) for the European Union, ECDD for FDA approval, and more. All of these standards are rendition standards, but none is suited to be a common, backbone standard for storing, editing, and working with the data. Quality of documentation is absolutely essential for the industry..." [Contact Carol Geyer, OASIS Director of Communications, for details.]

