Cover Pages: XML Daily Newslink: Friday, 26 February 2010

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Sun Microsystems, Inc. http://sun.com

Headlines

IESG Announces Metalink Download Description Format as Proposed Standard
Open Document Format for Office Applications Version 1.0 Errata
W3C Releases Second Web Compatibility Test for Mobile Browsers
New Properties for iCalendar
Electronic Medical Record (EMR) HIT Standards Committee Struggles
New IETF Internet Draft: CardDAV Directory Gateway Extension
CTO Roundtable: Malware Defense
End-to-end Data Integrity for File Systems: A ZFS Case Study

IESG Announces Metalink Download Description Format as Proposed Standard
Anthony Bryan, Tatsuhiro Tsujikawa, Neil McNab, Peter Poeml (eds), IETF Internet Draft

The Internet Engineering Steering Group (IESG) announced approval of The Metalink Download Description Format (I-D version 28) as an IETF Proposed Standard. The specification defines Metalink, an XML-based download description format. Metalink describes download locations (mirrors), checksums, and other information. Clients can transparently use this information to reliably transfer files.

IESG reports that there are now interoperable implementations. The specification is not a WG product; it has been developed independently, and achieved interoperability already. It was reviewed to a reasonable extent within the IETF, e.g., within the Apps Discuss mailing list. The XML Namespace registration process depends on an expert who has not been assigned yet, due to a change in processing of IANA registrations.

Details: "Metalink is a document format based on XML that describes a file or list of files to be downloaded from a server. Metalinks can list a number of files, each with an extensible set of attached metadata. Each listed file can have a description, multiple cryptographic hashes, and a list of Uniform Resource Identifiers (URIs) that it is available from. Often, identical copies of a file are accessible in multiple locations on the Internet over a variety of protocols, such as File Transfer Protocol (FTP), Hypertext Transfer Protocol (HTTP), and Peer-to-Peer (P2P). In some cases, users are shown a list of these multiple download locations (mirror servers) and must manually select one based on geographical location, priority, or bandwidth. This is done to distribute the load across multiple servers, and to give human users the opportunity to choose a download location that they expect to work best for them... Knowledge about availability of a download on mirror servers can be acquired and maintained by the operators of the origin server, or by a third party. This knowledge, together with cryptographic hashes, digital signatures, and more, can be stored in a machine-readable Metalink file. The Metalink file can transfer this knowledge to the user agent, which can peruse it in automatic ways or present the information to a human user. User agents can fall back to alternate mirrors if the current one has an issue. Thereby, clients are enabled to work their way to a successful download even under adverse circumstances. All this can be done transparently to the human user and the download is much more reliable and efficient. In contrast, a traditional HTTP redirect to one mirror conveys only comparatively minimal information—a referral to a single server, and there is no provision in the HTTP protocol to handle failures.

Other features that some clients provide include multi-source downloads, where chunks of a file are downloaded from multiple mirrors (and optionally, Peer-to-Peer) simultaneously, which frequently results in a faster download. Metalinks can leverage HTTP, FTP and Peer-to-Peer protocols together, because regardless over which protocol the Metalink was obtained, it can make a resource accessible through other protocols. If the Metalink was obtained from a trusted source, included verification metadata can solve trust issues when downloading files from replica servers operated by third parties. Metalinks also provide structured information about downloads that can be indexed by search engines..."
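The mirror fallback and hash verification described above can be sketched in a few lines of Python. This is an illustration only, not code from the specification: the namespace URI and the element and attribute names (file, hash, url, priority) reflect one reading of the format and should be checked against the approved text, the mirror hosts are made up, and fetch is a hypothetical callable standing in for a real HTTP or FTP client.

```python
import hashlib
import xml.etree.ElementTree as ET

# Assumed namespace and element names; verify against the published format.
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}

PAYLOAD = b"hello metalink\n"  # stand-in for the file being distributed

TEMPLATE = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="example.txt">
    <hash type="sha-256">{digest}</hash>
    <url priority="2">http://mirror-b.example/example.txt</url>
    <url priority="1">ftp://mirror-a.example/example.txt</url>
  </file>
</metalink>
"""
SAMPLE = TEMPLATE.format(digest=hashlib.sha256(PAYLOAD).hexdigest())


def parse_metalink(xml_text):
    """Return one dict per listed file: name, hashes, and URLs by priority."""
    root = ET.fromstring(xml_text)
    files = []
    for f in root.findall("ml:file", NS):
        hashes = {h.get("type"): h.text.strip() for h in f.findall("ml:hash", NS)}
        # Assumption: a lower priority number means "try this mirror first".
        urls = sorted(f.findall("ml:url", NS),
                      key=lambda u: int(u.get("priority", "999999")))
        files.append({"name": f.get("name"),
                      "hashes": hashes,
                      "urls": [u.text.strip() for u in urls]})
    return files


def download(file_entry, fetch):
    """Try each mirror in order; accept the first copy whose SHA-256 matches."""
    expected = file_entry["hashes"]["sha-256"]
    for url in file_entry["urls"]:
        try:
            data = fetch(url)
        except OSError:
            continue  # mirror unreachable: fall back to the next one
        if hashlib.sha256(data).hexdigest() == expected:
            return url, data
        # hash mismatch: treat the copy as corrupt and keep going
    raise RuntimeError("no mirror delivered a verified copy")


def fake_fetch(url):
    """Hypothetical transport: the first-choice mirror is down."""
    if "mirror-a" in url:
        raise OSError("mirror unreachable")
    return PAYLOAD


entry = parse_metalink(SAMPLE)[0]
used_url, data = download(entry, fake_fetch)
```

In this sketch the client silently skips the unreachable first-choice mirror and accepts the copy from the second, which is the transparent behavior the excerpt describes. A real client would also handle the other metadata the format carries.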

Open Document Format for Office Applications Version 1.0 Errata
Patrick Durusau and Svante Schubert (eds), OASIS

Members of the OASIS Open Document Format for Office Applications (OpenDocument) Technical Committee have published an Errata document for public review through March 14, 2010. The specification, Open Document Format for Office Applications (OpenDocument) Version 1.0 Errata, and its associated schema files are made available for feedback from potential users, developers and others for the sake of improving the interoperability and quality of this OASIS technical work.

The Open Document Format for Office Applications (OpenDocument) format is an open, XML-based file format for office applications, based on OpenOffice.org XML. This document is the second errata document for OpenDocument v1.0. Its first part contains the OpenDocument Format 1.0 Approved Errata 01, covering the first set of comments from the Japanese National Body (Document #N0942). Its new second part covers the second set of comments from the Japanese National Body (Document #N1078) and the first from the British National Body (N1309). The errata apply to OpenDocument v1.0, which is identical in the technical content corrected to ISO/IEC 26300.

See also: the OASIS OpenDocument TC

W3C Releases Second Web Compatibility Test for Mobile Browsers
Staff, W3C Announcement

W3C's Mobile Web Test Suites Working Group has just released a brand new Web Compatibility Test for Mobile Browsers. Based on the same idea of evaluating support of a number of Web technologies at a glance as in the first Web Compatibility Test published in July 2008, this second version features a number of more recent technologies that promise to make Web browsers more powerful, in particular on mobile devices.

A blog article from Kai Hendry notes: "Go and test your mobile with the new test, and if your browser scores a 110% you are cheating... In this fresh forward looking 2.0 test we hope to encourage key technologies that will make the mobile platform simply rock. Of course we have the usual suspects like AJAX support and canvas which were tested in the WCTMB v1 test too. However we gear up by checking for Geolocation support which is very relevant to mobile users and for various helpful offline technologies like application cache and Web storage. These offline technologies help the Web in areas where Internet may be unreliable, which is a lot of places on most mobile devices!

We also make a daring leap into the fray to ask for support of video and audio, which is quite demanding on a mobile device. We allow for all sorts of codecs, though midi files and animated gifs won't pass. :) We also test for new input types, rich text editing and font face support which could be a workaround where phones have a poor font, for instance for a particular locale. No matter where you are from or what language you speak, we hope to entangle you in the Web with any device to hand..."

"The W3C Mobile Web Initiative is focusing on developing best practices for mobileOK Web sites and Web applications, device information needed for content adaptation, test suites for mobile browsers, and marketing and outreach activities... While becoming increasingly popular, mobile Web access today still suffers from interoperability and usability problems. W3C's Mobile Web Initiative addresses these issues through a concerted effort of key players in the mobile production chain, including authoring tool vendors, content providers, handset manufacturers, browser vendors and mobile operators. With mobile devices, the Web can reach a much wider audience, and at all times in all situations. It has the opportunity to reach into places where wires cannot go, to places previously unthinkable (e.g., providing medical information to mountain rescue scenes) and to accompany everyone as easily as they carry the time in their wristwatches. Moreover, today, many more people have access to mobile devices than access to a desktop computer. This is likely to be very significant in developing countries, where Web-capable mobile devices may play a similar role for deploying widespread Web access as the mobile phone has played for providing POTS..."

See also: the blog article

New Properties for iCalendar
Cyrus Daboo (ed), IETF Internet Draft

An initial -00 public Internet Draft New Properties for iCalendar has been published by IETF. This document defines a set of new properties for iCalendar data (Calendar Properties, Component Properties, Property Parameters).

The Internet Calendaring and Scheduling Core Object Specification (iCalendar) RFC 5545 data format is used to represent calendar data and is used with RFC 5546 (iCalendar Transport-Independent Interoperability Protocol - iTIP) to handle scheduling operations between calendar users. iCalendar is in widespread use, and in accordance with provisions in that specification, extension elements have been added by various vendors to the data format in order to support and enhance capabilities. This specification collates a number of these ad-hoc extensions and uses the new IANA registry capability defined in RFC 5545 to register standard variants with clearly defined definitions and semantics. In addition, some new elements are introduced for features that vendors have been requesting recently.

Calendar Properties: (1) CALENDAR-NAME Property is used to specify a name (a short, one-line description) of the iCalendar object that can be used by calendar user agents when presenting the calendar data to a user. (2) CALENDAR-DESCRIPTION Property is used to specify a lengthy textual description of the iCalendar object that can be used by calendar user agents when describing the nature of the calendar data to a user. (3) CALENDAR-UID Property specifies the persistent, globally unique identifier for the calendar, where the value of this property MUST be a globally unique identifier and the generator of the property MUST guarantee that the value is unique. (4) CALENDAR-URL Property specifies a URL from where the calendar data was retrieved or where it can be refreshed. (5) CALENDAR-TZID Property specifies a time zone identifier that represents the default time zone to which floating time or all-day events in the iCalendar object can be assumed to be relative; it can also be used to choose an initial time zone for use when creating new components in the iCalendar object. (6) CALENDAR-REFRESH-INTERVAL Property specifies a positive duration that gives a suggested polling interval for checking for updates to the calendar data; the value of this property SHOULD be used by calendar user agents as the polling interval for calendar data updates. (7) CALENDAR-COLOR Property specifies a color used for displaying the calendar data. (8) CALENDAR-IMAGE Property specifies an image for an iCalendar object via a URI or directly with inline data that can be used by calendar user agents when presenting the calendar data to a user; multiple properties MAY be used to specify alternative sets of images with, for example, varying media subtypes, resolutions or sizes...
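As a rough illustration of how a few of these calendar-level properties might appear in an iCalendar object, the Python sketch below serializes them as content lines. The property names come from the draft as summarized above. Everything else is an assumption made for the example: the sample values, the PRODID, the lack of property parameters, and the choice to treat only the name and description as escaped TEXT values. Folding is done by character count, which matches the RFC 5545 octet limit only for ASCII input.

```python
# Properties assumed to carry TEXT values that need RFC 5545 escaping.
TEXT_PROPS = {"CALENDAR-NAME", "CALENDAR-DESCRIPTION"}


def escape_text(value):
    """Escape backslash, semicolon, comma and newline as RFC 5545 TEXT requires."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def fold(line, limit=75):
    """Split a long content line; continuation lines begin with one space."""
    parts = [line[:limit]]
    rest = line[limit:]
    while rest:
        parts.append(" " + rest[:limit - 1])
        rest = rest[limit - 1:]
    return "\r\n".join(parts)


def build_calendar(props):
    """Serialize (name, value) pairs as calendar-level properties of a VCALENDAR."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//Example//Newslink sketch//EN"]  # placeholder PRODID
    for name, value in props:
        if name in TEXT_PROPS:
            value = escape_text(value)
        lines.append(f"{name}:{value}")
    lines.append("END:VCALENDAR")
    return "".join(fold(line) + "\r\n" for line in lines)


# Illustrative values only; the draft defines the exact value syntax.
ICS = build_calendar([
    ("CALENDAR-NAME", "Team holidays, 2010"),
    ("CALENDAR-DESCRIPTION",
     "Public holidays and office closures for the whole team; "
     "refreshed from the shared server."),
    ("CALENDAR-UID", "5f0c2b1e-2f6a-4c1d-9a55-000000000001"),
    ("CALENDAR-URL", "http://calendar.example/holidays.ics"),
    ("CALENDAR-TZID", "America/New_York"),
    ("CALENDAR-REFRESH-INTERVAL", "P1D"),
])
```

A calendar user agent reading such an object could show the name and description to the user and treat the refresh interval as the suggested polling period.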

Electronic Medical Record (EMR) HIT Standards Committee Struggles
Anthony Guerra, InformationWeek

Instead of recommending small adjustments to the Standards and Certification Interim Final Rule (IFR), the 10th meeting of the U.S. federal HIT Standards Committee was consumed by a fundamental, philosophical debate of just how specific the group should be in its regulations, and the nature of the regulatory process itself.

Harvard Medical School CIO John Halamka, Vice Chair of the committee and co-chair of the Clinical Operations Workgroup: "We have to balance when to be specific and when not... On the one hand, we could provide little specificity and let the industry settle on its own standards, but if you do that and let 1,000 wild flowers bloom, you get many train tracks with different gauges... We have been told that regulations are very hard to change," he added, "and run the risk that if we are too specific, the industry will be stuck in concrete and not able to move. We don't want to ossify technology."

After some initial discussion, Halamka proposed specifying "families" of standards—rather than exact standards to be used within those families—while requiring a "floor," or oldest version, that would be considered acceptable...

New IETF Internet Draft: CardDAV Directory Gateway Extension
Cyrus Daboo (ed), IETF Internet Draft

Members of the IETF vCard and CardDAV (VCARDDAV) Working Group have published an initial -00 version of the CardDAV Directory Gateway Extension specification. The document defines an extension to the "vCard Extensions to WebDAV (CardDAV)" protocol that allows a server to expose a directory as a read-only address book collection.

Overview: "The CardDAV protocol defines a standard way of accessing, managing, and sharing contact information based on the vCard (RFC 2426) format. Often, in an enterprise or service provider environment, a directory of all users hosted on the server (or elsewhere) is available. It would be convenient for CardDAV clients if this directory were exposed as a "global" address book on the CardDAV server so it could be searched just as personal address books are. This specification defines a "directory gateway" feature extension to CardDAV to enable this.

This specification adds one new WebDAV property to principal resources that contains the URI to the directory gateway address book collection resource. Note that this feature is in no way intended to replace full directory access—it is meant to simply provide a convenient way for CardDAV clients to query contact-related attributes in directory records... The 'CARDDAV:directory-gateway' Property thus identifies an address book collection resource that is the directory gateway address book for the server... Clients wishing to make use of the directory gateway address book can request the CARDDAV:directory-gateway property when examining other properties on the principal resource for the user. If the property is not present, then the directory gateway feature is not supported by the server at that time. Servers wishing to expose a directory gateway as an address book collection MUST include the 'CARDDAV:directory-gateway' property on all principal resources of users expected to use the feature. Since the directory being exposed via the directory gateway address book collection could be large, servers SHOULD use the feature to truncate the number of results returned in an 'CARDDAV:addressbook-query' REPORT as defined in Section 8.6.2 of the CardDAV specification... Servers need to expose the directory information as a set of address book object resources in the directory gateway address book collection resource. To do that, a mapping between the directory record format and the vCard data has to be applied..."
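The client-side discovery step described above might look roughly like the following Python sketch. It builds a PROPFIND request body asking for the directory-gateway property and reads the answer from a multistatus response. The sample response, the principal and collection paths, and the assumption that the property value wraps a single DAV:href element are illustrative and not taken from the draft. No network request is made here.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"
CARDDAV = "urn:ietf:params:xml:ns:carddav"

# Use the conventional D: and C: prefixes when serializing.
ET.register_namespace("D", DAV)
ET.register_namespace("C", CARDDAV)


def propfind_body():
    """Build a PROPFIND body that requests only the directory-gateway property."""
    root = ET.Element(f"{{{DAV}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV}}}prop")
    ET.SubElement(prop, f"{{{CARDDAV}}}directory-gateway")
    return ET.tostring(root, encoding="unicode")


def directory_gateway_href(multistatus_xml):
    """Return the gateway collection URI, or None if the server did not report one."""
    root = ET.fromstring(multistatus_xml)
    # Assumption: the property value is a single DAV:href child element.
    node = root.find(f".//{{{CARDDAV}}}directory-gateway/{{{DAV}}}href")
    if node is None or not node.text:
        return None
    return node.text.strip()


# Hypothetical server response for a principal that has a gateway.
SUPPORTED = """<D:multistatus xmlns:D="DAV:" xmlns:C="urn:ietf:params:xml:ns:carddav">
  <D:response>
    <D:href>/principals/users/alice/</D:href>
    <D:propstat>
      <D:prop>
        <C:directory-gateway>
          <D:href>/directory-gateway/</D:href>
        </C:directory-gateway>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""

# Hypothetical response from a server without the feature.
UNSUPPORTED = """<D:multistatus xmlns:D="DAV:" xmlns:C="urn:ietf:params:xml:ns:carddav">
  <D:response>
    <D:href>/principals/users/alice/</D:href>
    <D:propstat>
      <D:prop><C:directory-gateway/></D:prop>
      <D:status>HTTP/1.1 404 Not Found</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""

body = propfind_body()
gateway = directory_gateway_href(SUPPORTED)
```

A client that gets None back would conclude that the server does not currently offer the directory gateway and search only the user's personal address books.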

The author notes: "Several vendors have expressed interest in this, and indeed the calendarserver.org server does have a directory gateway, though it did not advertise it in the way being proposed in the new draft." The IETF mailing list 'vcarddav.ietf.org' has comment on the draft from Nick Zitzmann, Simon Perreault, Filip Navara, Cyrus Daboo, and others.

CTO Roundtable: Malware Defense
Mache Creeger, ACM Queue

"As all manner of information assets migrate online, malware has kept on track to become a huge source of individual threats. In a continuously evolving game of cat and mouse, as security professionals close off points of access, attackers develop more sophisticated attacks. Today profit models from malware are comparable to any seen in the legitimate world. But there's hope. Some studies have shown that while 25 percent of consumer-facing PCs are infected by some sort of malware, the infection rate of the commercial PC sector is around half that rate...

This ACM Queue CTO Roundtable panel is split between users and vendors; the intent is to educate readers about the scope of the malware threat today, the types of frameworks needed to address it, and how to minimize the overall risk of breach. Participants include: Michael Barrett (CISO of PayPal), Jeff Green (Head of the Threat Research Unit at McAfee Lab), Vlad Gorelik (Vice President of Engineering at AVG Technologies), Vincent Weafer (Vice President for Security Response at Symantec), Opinder Bawa (CIO for the UCSF School of Medicine), and Steve Bourne (CTO at El Dorado Ventures).

Excerpt: (Barrett) "Cloud computing is very promising for cost-effective and burst capacity. There is a very large user base of organizations that are attracted to cloud computing where they deal with nonconfidential information. There are also people who represent more regulated industries, such as financial, that cannot just dump the data in an outsourced cloud and not know its physical location. I have to know where my data resides because there are safe-harbor considerations I must maintain. So the data-location requirement is one issue with clouds. A second issue is the ability to define an application's security requirements. If I have particular security requirements around my application, I don't want it to co-reside with someone else's application that has a different requirement set. We don't have the policy language yet to adequately describe everyone's security requirements. For cloud computing to work, that type of definitional information needs to be in place. It is not there today, but we will undoubtedly get to the point where we will have the proper risk vocabulary to address this issue..."

Weafer: "The past 12 months in the malware-threat landscape have been a natural evolution of the past couple of years. We have seen a huge explosion in the volume of new malware. We've also seen evolution in terms of the sophistication of malware, new data-mining techniques, and new methods of self protection that have really changed the threat landscape. The attacker's ability to get smarter tools easily and use them faster with less technical skill has changed a lot of what we're seeing. We are not looking at a single pandemic threat but a huge explosion of individual threats. What you get on one machine is completely different from what you get on another machine. Each infection has a unique signature. You're served up a unique piece of malware that may not be seen by anyone else in the world. Threats have gone from global to local to personalized..."

End-to-end Data Integrity for File Systems: A ZFS Case Study
Yupu Zhang, Abhishek Rajimwale (et al.), FAST 2010 Conference paper

"We present a study of the effects of disk and memory corruption on file system data integrity. Our analysis focuses on Sun's ZFS, a modern commercial offering with numerous reliability mechanisms. Through careful and thorough fault injection, we show that ZFS is robust to a wide range of disk faults. We further demonstrate that ZFS is less resilient to memory corruption, which can lead to corrupt data being returned to applications or system crashes. Our analysis reveals the importance of considering both memory and disk in the construction of truly robust file and storage systems.

File and storage systems have evolved various techniques to handle corruption. Different types of checksums can be used to detect when corruption occurs, and redundancy, likely in mirrored or parity-based form, can be applied to recover from it. While such techniques are not foolproof, they clearly have made file systems more robust to disk corruptions. Unfortunately, the effects of memory corruption on data integrity have been largely ignored in file system design. Hardware-based memory corruption occurs as both transient soft errors and repeatable hard errors due to a variety of radiation mechanisms, and recent studies have confirmed their presence in modern systems. Software can also cause memory corruption; bugs can lead to 'wild writes' into random memory contents, thus polluting memory; studies confirm the presence of software-induced memory corruptions in operating systems.

The problem of memory corruption is critical for file systems that cache a great deal of data in memory for performance. Almost all modern file systems use a page cache or buffer cache to store copies of on-disk data and metadata in memory. Moreover, frequently-accessed data and important metadata may be cached in memory for long periods of time, making them more susceptible to memory corruptions...

Our results for memory corruptions indicate cases where bad data is returned to the user, operations silently fail, and the whole system crashes. Our probability analysis shows that one single bit flip has small but non-negligible chances to cause failures such as reading/writing corrupt data and system crashing. We argue that file systems should be designed with end-to-end data integrity as a goal. File systems should not only provide protection against disk corruptions, but also aim to protect data from memory corruptions. Although dealing with memory corruptions is hard, we conclude by discussing some techniques that file systems can use to increase protection against memory corruptions..."
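The contrast the authors draw between disk and memory corruption can be shown with a toy model in Python. It is not how ZFS is implemented. The MirroredBlock class, the two-copy mirror, and the use of SHA-256 are all assumptions chosen for the sketch. A checksum kept alongside mirrored copies lets a read detect and repair a corrupted on-disk copy, while a bit flipped in a cached copy after verification goes unnoticed.

```python
import hashlib


def flip_bit(buf, bit_index):
    """Simulate corruption by flipping one bit of a bytearray in place."""
    buf[bit_index // 8] ^= 1 << (bit_index % 8)


class MirroredBlock:
    """Toy block store: two on-disk copies plus a checksum taken at write time."""

    def __init__(self, data):
        self.checksum = hashlib.sha256(data).digest()
        self.copies = [bytearray(data), bytearray(data)]

    def read(self):
        """Return the first copy that matches the checksum, repairing the others."""
        for copy in self.copies:
            if hashlib.sha256(copy).digest() == self.checksum:
                good = bytes(copy)
                self.copies = [bytearray(good) for _ in self.copies]
                return good
        raise IOError("all copies failed checksum verification")


ORIGINAL = b"important file contents"
block = MirroredBlock(ORIGINAL)

# Disk corruption: one mirror is damaged, and the read detects and repairs it.
flip_bit(block.copies[0], 3)
recovered = block.read()

# Memory corruption: the verified data now sits in a cache, and nothing
# re-checks it before it is handed to the application.
page_cache = bytearray(recovered)
flip_bit(page_cache, 3)
returned_to_app = bytes(page_cache)
```

Here recovered equals the original data, but returned_to_app does not, and no error is raised. That undetected window between verification and use is the gap the paper argues end-to-end integrity should close.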

