Cover Pages: XML Daily Newslink: Monday, 07 July 2008

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Primeton http://www.primeton.com

Headlines

Google Releases 'Protocol Buffers' Data Language
Convenience Over Correctness
An Extension to the Presence Information Data Format: Location Object (PIDF-LO) for the Timezone of a Presentity
Web-Scale Workflow Track Semantic Provenance for eScience: Managing the Deluge of Scientific Data
Software-as-a-Service: The Spark That Will Change Software Engineering?
OASIS Web Services Discovery and Web Services Devices Profile (WS-DD) TC
Configuration Data Model for IPFIX and PSAMP
Collective Intelligence: What Good Is It?

Google Releases 'Protocol Buffers' Data Language
Thomas Claburn, InformationWeek

Google has released "Protocol Buffers" as an open source data description language that the company developed for internal use. Think of it as XML's cousin, but simpler, smaller, and faster. Google software engineer Kenton Varda, in a post on the Google open source blog, said that Google uses literally thousands of different data formats, most of which are structured. Encoding these data formats on a massive scale is too much for XML, so Google developed Protocol Buffers. Reference documentation is provided for working with protocol buffer classes in C++, Java, and Python. Varda compares Protocol Buffers to an Interface Description Language (IDL), without the complexity... XML remains a better choice for files such as text documents, because XML is intended to be human-readable and human-editable; a protocol buffer, by contrast, cannot be understood without its .proto message definition. The free download that Google is offering includes the complete source code for the Java, Python, and C++ protocol buffer compilers. In the online FAQs for Protocol Buffers, Google says that it has many other software projects that it intends to release as open source. According to the documentation's answer to "Why not just use XML?", protocol buffers have many advantages over XML for serializing structured data. Protocol buffers are simpler, are 3 to 10 times smaller, are 20 to 100 times faster, are less ambiguous, and generate data access classes that are easier to use programmatically... Protocol buffers are now Google's lingua franca for data: at time of writing, there are 48,162 different message types defined in the Google code tree across 12,183 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.
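Much of the size advantage quoted above comes from the compact binary wire format. As a hedged illustration (hand-written Python, not Google's generated code or library API), the sketch below implements the base-128 "varint" integer encoding and the field key `(field_number << 3) | wire_type` described in the Protocol Buffers encoding documentation, for a message assumed to be defined as `message Test1 { optional int32 a = 1; }`. The function names are hypothetical.

```python
WIRE_TYPE_VARINT = 0  # wire type used for int32/int64/uint/bool/enum fields


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low 7-bit group first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)         # final byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple:
    """Decode a varint starting at data[pos]; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint-typed field: a key varint followed by the value varint."""
    key = (field_number << 3) | WIRE_TYPE_VARINT
    return encode_varint(key) + encode_varint(value)


print(encode_varint_field(1, 150).hex())  # 089601
```

Setting field 1 to 150 yields the three bytes `08 96 01`, whereas the XML fragment `<a>150</a>` takes ten bytes. Real applications would use classes generated by the `protoc` compiler rather than a hand-rolled encoder like this one.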

Convenience Over Correctness
Steve Vinoski, IEEE Internet Computing

"Several of my columns over the years have discussed the remote procedure call (RPC) abstraction. First described in RFC 707, with implementation approaches and details later provided by Andrew Birrell and Bruce Nelson, RPC has influenced distributed systems research and development since the early 1980s. One interesting aspect about the introduction and existence of newer RPC systems is that we've already known for many years that RPC is fundamentally flawed. Why do we continue to use RPC-oriented systems when they're fraught with well-known and well-understood problems? It's easy: RPC-oriented systems aim to let developers use familiar programming language constructs to invoke remote services, passing requests and data to them and expecting more data in response... The illusion of RPC—the idea that a distributed call can be treated the same as a local call—ignores not only latency and partial failure but also the concerns that spell the difference between a scalable networked system with good performance capabilities and a nonscalable one whose performance characteristics are dictated entirely by the RPC infrastructure... Representational state transfer (REST), on the other hand, addresses all these concerns and more. It offers clear layering and separation of concerns, and it meets network effects head-on. For example, caching is relatively straightforward with RESTful HTTP because clients can make conditional GET requests and servers can specify cache-control headers. HTTP also specifies which of its verbs are idempotent, which helps address partial failure and its resulting indeterminacy issues. RESTful applications are well-equipped to deal with intermediation and loose coupling. Many developers are thus attracted to REST, but unsurprisingly, some try to build programming language frameworks to make it convenient. 
These frameworks invariably come up short and ignore important REST elements, such as its hypermedia constraint, because those elements don't fit well with typical general-purpose programming language abstractions... Many still using RPC in the enterprise are starting to realize they'd be better off with either message queuing or RESTful HTTP, depending on the nature of their applications. The developers of Facebook Thrift and Cisco Etch, as convenient as those systems might be, would have been better off providing an XMPP- or AMQP-based message-queuing system or relying on RESTful HTTP; perhaps both cases are instances of those not knowing history being doomed to repeat it. It's time for RPC to retire. I won't miss it."
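The two concrete HTTP mechanisms Vinoski cites, conditional GET and idempotent verbs, can be made tangible with a small in-memory sketch. It is an illustration under stated assumptions, not code from the column: `ResourceServer`, `CachingClient`, and the SHA-256-based ETag scheme are hypothetical, and no real network I/O takes place. The idempotent method set follows the HTTP/1.1 semantics specification (RFC 7231, section 4.2.2).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict
    body: bytes = b""


# Methods HTTP defines as idempotent: sending one twice has the same intended
# effect as sending it once, so a client facing an unknown outcome may resend it.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}


def make_etag(body: bytes) -> str:
    # Hypothetical ETag scheme: a quoted, truncated hash of the representation.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


class ResourceServer:
    """Toy server holding one representation per path (assumption for the sketch)."""

    def __init__(self):
        self.resources = {}

    def handle_get(self, path: str, headers: dict) -> Response:
        body = self.resources.get(path)
        if body is None:
            return Response(404, {})
        etag = make_etag(body)
        common = {"ETag": etag, "Cache-Control": "max-age=60"}
        if headers.get("If-None-Match") == etag:
            return Response(304, common)  # Not Modified: the body is not resent
        return Response(200, common, body)


@dataclass
class CachingClient:
    server: ResourceServer
    cache: dict = field(default_factory=dict)  # path -> (etag, body)

    def get(self, path: str) -> tuple:
        # This toy client ignores max-age and always revalidates its copy.
        headers = {}
        if path in self.cache:
            headers["If-None-Match"] = self.cache[path][0]  # conditional GET
        resp = self.server.handle_get(path, headers)
        if resp.status == 304:
            return 304, self.cache[path][1]  # reuse the cached representation
        if resp.status == 200:
            self.cache[path] = (resp.headers["ETag"], resp.body)
        return resp.status, resp.body


def safe_to_retry(method: str) -> bool:
    """After a partial failure, only idempotent requests can be blindly resent."""
    return method.upper() in IDEMPOTENT_METHODS
```

Requesting the same unchanged path twice yields 200 and then 304, and the second exchange carries no body. `safe_to_retry` captures why idempotency helps with partial failure: a timed-out PUT can simply be resent, while a timed-out POST cannot without risking a duplicate effect.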

An Extension to the Presence Information Data Format: Location Object (PIDF-LO) for the Timezone of a Presentity
Sharon Chisholm (et al., eds), IETF Internet Draft

The NETCONF Event Notifications standard specifies the mechanism by which NETCONF clients can subscribe to and receive event notifications. However, apart from a timestamp, no standard Notification content was defined. This memo defines a set of information that should be included in all NETCONF notifications and information that should be included based on the class of notification, and it also defines a set of specific notifications to support specific management functions, such as configuration. Section 4 presents the XML Schema for Notification Content; it defines both the complex types to be used to derive implementation-specific Notification definitions and specific standard Notification definitions... For management to be effective and scalable, it cannot rely solely on request-response management patterns; it is crucial that event-driven management also be supported. In general, event-driven management obviates the need for polling cycles, which are wasteful in terms of the management bandwidth and CPU load they consume, bearing in mind that many polling cycles do not result in any new information. This makes management inherently more scalable (through more efficient use of resources) and also improves response time, because a noteworthy event is picked up immediately rather than after the next polling cycle. Event notifications are the enabler for event-driven management. Different classes of events can be distinguished. Each event Class serves a different management purpose, so applications should be able to distinguish the classes of events they are interested in from those they are not. In addition, each event Class contains a set of unique event information that is common to that Class. The draft then defines the Event Notification content specific to each event category.
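To make the split between common and class-specific content concrete, here is a hedged Python sketch. The outer `notification` element, its namespace, and the `eventTime` child follow the NETCONF Event Notifications standard (RFC 5277); everything in the `urn:example:event-content` namespace, including the `eventClass` element, is hypothetical and does not reproduce the draft's schema. The `wanted` function shows a client discarding event classes it is not interested in.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NOTIF_NS = "urn:ietf:params:xml:ns:netconf:notification:1.0"  # RFC 5277
EX_NS = "urn:example:event-content"  # hypothetical namespace for this sketch


def build_notification(event_class: str, details: dict, when: datetime) -> bytes:
    """Standard eventTime plus hypothetical class-tagged event content."""
    root = ET.Element(f"{{{NOTIF_NS}}}notification")
    ET.SubElement(root, f"{{{NOTIF_NS}}}eventTime").text = when.isoformat()
    event = ET.SubElement(root, f"{{{EX_NS}}}event")
    ET.SubElement(event, f"{{{EX_NS}}}eventClass").text = event_class
    for name, value in details.items():
        ET.SubElement(event, f"{{{EX_NS}}}{name}").text = str(value)
    # Namespace prefixes (ns0, ns1) are generated automatically by ElementTree.
    return ET.tostring(root, encoding="utf-8")


def wanted(message: bytes, interesting_classes: set) -> bool:
    """Keep only notifications whose event class the client asked for."""
    root = ET.fromstring(message)
    event_class = root.findtext(f"{{{EX_NS}}}event/{{{EX_NS}}}eventClass")
    return event_class in interesting_classes
```

A management application subscribed to configuration events would pass `{"configuration"}` as `interesting_classes` and silently drop, say, fault events, instead of polling the device for changes.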

Web-Scale Workflow Track Semantic Provenance for eScience: Managing the Deluge of Scientific Data
Satya S. Sahoo, Amit Sheth, and Cory Henson; IEEE Internet Computing

Provenance information in eScience is metadata that's critical to effectively manage the exponentially increasing volumes of scientific data from industrial-scale experiment protocols. Semantic provenance, based on domain-specific provenance ontologies, lets software applications unambiguously interpret data in the correct context. The semantic provenance framework for eScience data comprises expressive provenance information and domain-specific provenance ontologies and applies this information to data management. The authors' "two degrees of separation" approach advocates the creation of high-quality provenance information using specialized services. In contrast to workflow engines generating provenance information as a core functionality, the specialized provenance services are integrated into a scientific workflow on demand. This article describes an implementation of the semantic provenance framework for glycoproteomics... The two-degrees-of-separation approach is also founded on the Component-Based Software Engineering (CBSE) principle and on recent developments in service-oriented computing (SOC). The CBSE approach is based on reusable, loosely coupled, independent components for software system development. The Web-services-based SOA approach realizes the CBSE approach's objectives. Provenance-generation tools implemented as specialized Web services take advantage of the extensive and comprehensive Web services ecology already in place, featuring representation schema, communication standards, and a registry standard. Some workflow engines' use of WSDL-based descriptions to create provenance is already constrained by the ambiguous data-typing of parameters (often as a string data type). Furthermore, the SOC community is rapidly adopting 'lightweight' representational state transfer (REST) services as an alternative to a 'heavyweight' WSDL-based architecture...
The [chosen] semantic-provenance framework achieves the important requirements identified by the proposed Open Provenance Model (OPM), part of the international provenance challenge; it also addresses many nonfunctional requirements using the rich set of publicly available resources that the Semantic Web research community has created... The well-defined semantics of the Resource Description Framework (RDF) model, and the expressive formal-logic-based OWL language, address the requirement for 'precise description of provenance information'... We're implementing our framework to model provenance information of sensor data related to weather forecasting to demonstrate the use of semantic provenance information for data integration. We're also extending the ProPreO ontology to incorporate a Nuclear Magnetic Resonance (NMR)-based data-analysis protocol. This will let software applications use semantic provenance information to create an unambiguous context for comparing experimental data for toxicology metabolomics using mass-spectrometry-based and NMR-based data-analysis approaches...
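A rough sketch of provenance as RDF-style triples, written in plain Python so that it runs without an RDF store. The predicate names are modeled on the Open Provenance Model's `used`, `wasGeneratedBy`, and `wasControlledBy` relations; every `ex:` identifier is hypothetical, and the sketch ignores the OWL semantics (for example ProPreO classes) on which the authors' framework depends. `lineage` walks backward from a data product to every process and input that contributed to it.

```python
from collections import deque

# Provenance as (subject, predicate, object) triples, in the spirit of RDF.
# All ex: identifiers are hypothetical examples.
TRIPLES = {
    ("ex:peakList42", "wasGeneratedBy", "ex:peakPicking"),
    ("ex:peakPicking", "used", "ex:rawSpectrum7"),
    ("ex:peakPicking", "used", "ex:instrumentConfig"),
    ("ex:rawSpectrum7", "wasGeneratedBy", "ex:msRun3"),
    ("ex:msRun3", "used", "ex:serumSample12"),
    ("ex:peakPicking", "wasControlledBy", "ex:analystA"),
}


def objects(subject, predicate, triples=TRIPLES):
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def lineage(artifact, triples=TRIPLES):
    """Every process and input upstream of `artifact`, found breadth-first.

    Follows artifact --wasGeneratedBy--> process --used--> artifact edges;
    agents (wasControlledBy) are deliberately not part of the data lineage.
    """
    seen = set()
    queue = deque([artifact])
    while queue:
        node = queue.popleft()
        upstream = objects(node, "wasGeneratedBy", triples) | objects(node, "used", triples)
        for nxt in upstream:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

For `ex:peakList42`, the walk reaches the peak-picking process, the raw spectrum and instrument configuration it used, the MS run that produced the spectrum, and the original serum sample. A production system would express the same query in SPARQL over an RDF store.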

Software-as-a-Service: The Spark That Will Change Software Engineering?
Greg Goth, IEEE Distributed Systems Online

Software-as-a-Service (SaaS) is receiving a lot of attention in analysts' briefings and technology trade press articles. In the past year, SaaS has emerged from its pioneering group of start-ups and medium-sized vendors to be embraced, albeit awkwardly, by software giants including Oracle and SAP. Much of the attention SaaS has garnered in recent months has focused on the new business model that on-demand software enables. However, some veteran technologists who've adopted SaaS for their own livelihood, and analysts as well, say that the phenomenon might well be the catalyst for a far wider-ranging discussion on software development for the next generation... One major technological factor in advancing the new development models might be the rise of service-oriented architecture (SOA) and Web services standards. The application service provider (ASP) model, championed in the late 1990s and early years of this decade, never took off because its one-to-one architecture was inherently difficult to scale. SaaS technology, however, takes advantage of a one-to-many SOA-enabled architecture that can offer customized services to different customers, and even to different branches of the same enterprise. One example is a credit-risk scoring application offered on a SaaS basis by Fidelity National Information Services. The platform can combine, in near-real time, up to 600 customer attributes from 20 disparate data sources, including credit bureau attributes, US Postal Service address histories, motor vehicle license data, and more. It can also customize those attributes on the fly by bank branch or product line. For instance, a bank in San Francisco might have different red flags for account fraud than a bank in New York City. Yet the SaaS-based application can customize those attributes while working off the same code base...
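The "one code base, many customizations" idea can be sketched as a single scoring routine parameterized by per-tenant configuration. This is not Fidelity National Information Services' implementation; the tenants, attribute names, weights, and thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TenantConfig:
    weights: dict           # attribute name -> weight in the risk score
    fraud_threshold: float  # score at or above which an application is flagged


# Per-tenant configuration: the only thing that differs between customers.
TENANTS = {
    "bank-sf": TenantConfig({"address_changes_90d": 2.0, "new_vehicle_license": 0.5}, 4.0),
    "bank-nyc": TenantConfig({"address_changes_90d": 1.0, "new_vehicle_license": 1.5}, 3.0),
}


def risk_score(attributes: dict, config: TenantConfig) -> float:
    """Shared code path for every tenant; attributes a tenant doesn't weight count as 0."""
    return sum(config.weights.get(name, 0.0) * value for name, value in attributes.items())


def is_red_flag(tenant: str, attributes: dict) -> bool:
    config = TENANTS[tenant]
    return risk_score(attributes, config) >= config.fraud_threshold
```

For example, an applicant with `address_changes_90d` of 1 and `new_vehicle_license` of 2 scores 3.0 under `bank-sf` (below its threshold of 4.0) but 4.0 under `bank-nyc` (above its threshold of 3.0), so only the New York tenant raises a red flag even though both run the same code.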

OASIS Web Services Discovery and Web Services Devices Profile (WS-DD) TC
Staff, OASIS Announcement

OASIS announced the publication of a draft charter for a proposed "OASIS Web Services Discovery and Web Services Devices Profile (WS-DD) Technical Committee." A corresponding comment period is open through July 21, 2008. Proposers include representatives from CA Inc., Canon Inc., Lexmark International Inc., Microsoft Corporation, Nortel Networks Limited, Novell Inc., Progress Software Corporation, Red Hat Inc., Ricoh Company Limited, Schneider Electric SA, Software AG, and WSO2 Inc. The purpose of the WS-DD Technical Committee is to define: (1) a lightweight dynamic discovery protocol, composable with other Web service specifications, for locating Web services; (2) a binding of SOAP to UDP (User Datagram Protocol), including message patterns, addressing requirements, and security considerations; and (3) a profile of Web services protocols consisting of a minimal set of implementation constraints to enable secure Web service messaging, discovery, description, and eventing on resource-constrained endpoints. The TC will accomplish this purpose through continued refinement of the Web Services Discovery (WS-Discovery) specification, the SOAP-over-UDP specification, and the Devices Profile for Web Services (DPWS) specification, which would be submitted to the TC. Ratification of the DPWS, WS-Discovery, and SOAP-over-UDP specifications as OASIS standards, followed by a brief maintenance period to address any errata, will mark the end of the TC's lifecycle.
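For readers unfamiliar with the specifications the TC would take over, the sketch below shows the core of WS-Discovery's ad hoc mode: a client multicasts a SOAP `Probe` over UDP and collects the `ProbeMatches` replies that services send back to it by unicast. The multicast group 239.255.255.250, port 3702, and the namespace URIs are assumed to be those of the pre-OASIS 2004/2005 drafts and should be checked against the specifications. The sketch sends the Probe only once (SOAP-over-UDP specifies retransmission) and does not parse the replies.

```python
import socket
import uuid
from typing import Optional

# Assumed constants from the pre-OASIS WS-Discovery (April 2005) draft.
SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"
WSA = "http://schemas.xmlsoap.org/ws/2004/08/addressing"
WSD = "http://schemas.xmlsoap.org/ws/2005/04/discovery"
DISCOVERY_TO = "urn:schemas-xmlsoap-org:ws:2005:04:discovery"
MULTICAST_GROUP = "239.255.255.250"
DISCOVERY_PORT = 3702


def build_probe(message_id: Optional[str] = None) -> bytes:
    """SOAP 1.2 envelope with an empty Probe (no Types/Scopes: match any service)."""
    message_id = message_id or f"urn:uuid:{uuid.uuid4()}"
    envelope = (
        f'<s:Envelope xmlns:s="{SOAP_ENV}" xmlns:a="{WSA}" xmlns:d="{WSD}">'
        "<s:Header>"
        f"<a:Action>{WSD}/Probe</a:Action>"
        f"<a:MessageID>{message_id}</a:MessageID>"
        f"<a:To>{DISCOVERY_TO}</a:To>"
        "</s:Header>"
        "<s:Body><d:Probe/></s:Body>"
        "</s:Envelope>"
    )
    return envelope.encode("utf-8")


def send_probe(timeout: float = 2.0) -> list:
    """Multicast one Probe and gather raw (address, datagram) replies until timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the local link
    sock.settimeout(timeout)
    replies = []
    try:
        sock.sendto(build_probe(), (MULTICAST_GROUP, DISCOVERY_PORT))
        while True:
            data, addr = sock.recvfrom(65535)  # ProbeMatches arrive by unicast
            replies.append((addr, data))
    except socket.timeout:
        pass
    finally:
        sock.close()
    return replies
```

On a resource-constrained device, DPWS would layer a profile on top of exactly this exchange, which is why the charter groups the three specifications together.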

Configuration Data Model for IPFIX and PSAMP
Gerhard Muenz and Benoit Claise (eds), IETF Internet Draft

Members of the IETF IP Flow Information Export (IPFIX) Working Group have published an initial -00 Internet Draft for "Configuration Data Model for IPFIX and PSAMP." IPFIX- and PSAMP-compliant monitoring devices (routers, switches, monitoring probes, collectors, etc.) offer various configuration possibilities that allow network monitoring to be adapted to the goals and purposes of the application, e.g., accounting and charging, traffic analysis, performance monitoring, and security monitoring. The use of a common device-independent configuration data model for IPFIX- and PSAMP-compliant monitoring devices facilitates network management and configuration, especially if monitoring devices of different implementers and/or manufacturers are deployed simultaneously. On the one hand, a device-independent configuration data model helps in storing and managing the configuration data of monitoring devices in a consistent format. On the other hand, it can be used for local and remote configuration of monitoring devices... The purpose of this document is to specify a device-independent configuration data model that covers the commonly available configuration parameters of Caches and Selection Processes, Exporting Processes, and Collecting Processes. The data model is encoded in Extensible Markup Language (XML). An XML document conforming to the configuration data model contains the configuration data of one monitoring device. To ensure compatibility with the NETCONF protocol (RFC 4741), YANG is used as the modeling language. If required, the YANG specification of the configuration data model can be converted into XML Schema language using the pyang tool. YANG provides mechanisms to augment the configuration data model with additional device-specific or vendor-specific parameters.
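The statement that one XML document holds the configuration of one monitoring device can be illustrated with a small ElementTree sketch. The namespace URIs and element names are hypothetical and do not reproduce the draft's YANG model; only the port number 4739 is the IANA-registered IPFIX value. The vendor-specific element in its own namespace loosely mimics how a YANG augmentation shows up in XML instance data.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces and element names, used only to show the shape of
# a device-independent configuration document.
CFG_NS = "urn:example:ipfix-config"
VENDOR_NS = "urn:example:vendor-ext"
IPFIX_PORT = 4739  # IANA-registered IPFIX port


def q(ns: str, tag: str) -> str:
    """Qualified ElementTree tag name: {namespace}tag."""
    return f"{{{ns}}}{tag}"


def device_config(device: str, collector_ip: str, sampling_interval: int) -> ET.Element:
    """Build the configuration document for exactly one monitoring device."""
    root = ET.Element(q(CFG_NS, "ipfixConfig"), {"device": device})
    sp = ET.SubElement(root, q(CFG_NS, "selectionProcess"), {"name": "sampler1"})
    # Hypothetical parameter: select one of every N packets.
    ET.SubElement(sp, q(CFG_NS, "samplingInterval")).text = str(sampling_interval)
    ep = ET.SubElement(root, q(CFG_NS, "exportingProcess"), {"name": "export1"})
    dest = ET.SubElement(ep, q(CFG_NS, "destination"))
    ET.SubElement(dest, q(CFG_NS, "ipAddress")).text = collector_ip
    ET.SubElement(dest, q(CFG_NS, "port")).text = str(IPFIX_PORT)
    # Vendor-specific extension in its own namespace; tools that know only
    # the common model can ignore it.
    ET.SubElement(ep, q(VENDOR_NS, "exportQueueDepth")).text = "512"
    return root


def collector_of(config: ET.Element) -> tuple:
    """Read back the collector address and port from the common model only."""
    dest = config.find(f"{q(CFG_NS, 'exportingProcess')}/{q(CFG_NS, 'destination')}")
    return dest.findtext(q(CFG_NS, "ipAddress")), int(dest.findtext(q(CFG_NS, "port")))
```

Because `collector_of` looks only at elements in the common namespace, it works unchanged on configurations from any vendor, which is the interoperability benefit the draft describes.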

Collective Intelligence: What Good Is It?
Tim O'Reilly and John Battelle, Radar Blog

For Launch Pad 2008, the focus will be on startups in the fields of alternative energies, social entrepreneurialism, microfinance, developing economies, political action, renewable technologies, and the like. In an era of looming scarcities, economic disruption, and the possibility of catastrophic ecological change, it's time for us all to wake up, to take our new "superpowers" seriously, and to use them to solve problems that really matter. The potential is huge. In recent months, I've seen fascinating startups for earth monitoring, carbon markets, energy efficiency of electronic devices, and home energy management... As we pondered the theme for this year, one clear signal has emerged: our conversation is no longer just about the Web. Now is the time to ask how the Web, with its technologies, its values, and its culture, might be tapped to address the world's most pressing limits. Or, put another way (and in the true spirit of the Internet entrepreneur), its most pressing opportunities. As we convene the fifth annual Web 2.0 Summit, our world is fraught with problems that engineers might charitably classify as NP-hard, from roiling financial markets to global warming, failing healthcare systems to intractable religious wars. In short, it seems as if many of our most complex systems are reaching their limits. It strikes us that the Web might teach us new ways to address these limits. From harnessing collective intelligence to a bias toward open systems, the Web's greatest inventions are, at their core, social movements. To that end, we're expanding our program this year to include leaders in the fields of healthcare, genetics, finance, global business, and yes, even politics.

