Cover Pages: XML Daily Newslink: Wednesday, 17 December 2008

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Sun Microsystems, Inc. http://sun.com

Headlines

Overlap of Identity Technologies
Contract Versioning, Compatibility and Composability
CMIS: Scope of Hierarchical Properties
Managing a Flood of New Data
Essential Hierarchy
Expressing Confidence in a Location Object

Overlap of Identity Technologies
Eric Sachs, Ashish Jain, Paul Madsen; Community Google Document

OAuth, OpenID, SAML, SaaS, 2ndFactorAuth, InfoCards, OpenSocial, Portable Contacts, WS-*, Geneva, ... Have you ever wondered how all these technologies fit together? Because the technologies are evolving, and do overlap in some cases, they can fit together in a few different ways. The goal of this document is to give some examples of how they fit together. There will be some debate over the pros and cons of overlapping technologies such as OpenID and SAML, as well as different forms of strong/second-factor authentication, and the full scope of InfoCards could certainly overlap with almost all of these. But as adoption of these technologies increases, we will identify the use cases where one technology fits better than another, and that should help companies decide which combination of technologies best meets their needs... There are roughly two "camps" in the identity space: "Enterprise" and "OpenSource." The more experienced camp is the Enterprise-focused one, which has been developing technologies such as SOAP, SAML, Liberty ID-WSF, and WS-* for many years, and many enterprise vendors in this space provide software that supports them. The "OpenSource" camp is newer and has primarily focused on building lighter-weight protocols (such as OAuth and OpenID) that are available with open source implementations on many platforms. That focus on "lighter weight" has led them to favor REST-based APIs over SOAP-based APIs. However, as these technologies have evolved and added functionality, they have come to look more like the heavier WS-Trust and SAML protocols. Most of the major SaaS vendors (Amazon/Google/Salesforce/Yahoo/etc.) have taken this lighter-weight approach because it provides additional features that better support consumers, small businesses, and "mashup/web2.0" developers.
Those SaaS vendors are now heavily involved in designing and evolving these technologies to ensure that they can scale reliably on their infrastructure...

Contract Versioning, Compatibility and Composability
Kjell-Sverre Jerijaervi and Jean-Jacques Dubray, InfoQueue

In recent weeks, many industry analysts have been quick to point out fears, uncertainties and doubts about SOA. Gartner, for instance, claims that the number of companies planning to start SOA initiatives is falling, while the proportion of companies planning no SOA initiatives has risen from 6% to 16% in the last 12 months. These notes and articles often sound as if companies no longer believe in building reusable IT assets that can be composed into different solutions. We believe that the explanation for this lower level of interest in SOA is quite different: one of the key failures of SOA initiatives has been precisely the inability to produce reusable and composable assets. It seems as if every new consumer brings enough fresh requirements to mandate a service different from the existing ones... If you ever hope to reuse a service, it is imperative to have clear design guidelines for contracts that express what the service provides and how it can be consumed. You may be able to start your SOA initiative without much SOA Governance, but it would be a mistake not to have contract design guidelines. Over time, these design guidelines will of course become a central part of your SOA Governance design compliance policies... In this article, we provide a series of recommended practices for establishing a Service Contract Versioning strategy geared towards service reuse, composability and compatibility with prior consumers (or providers). We claim that such a versioning strategy is essential to achieve satisfactory levels of service reuse and, in turn, to generate the higher (and expected) ROI from SOA initiatives... We define a versioning strategy that focuses on reuse by enabling services to evolve to meet new consumer requirements without breaking existing consumers of the service.
This approach adds a new dimension to service reuse: in a way it introduces a 'forward' reuse strategy, in which the 'new version' of a service is reused by older consumers, whereas reuse is traditionally understood as a new consumer reusing a service designed for an existing consumer. In our experience, compatible, versioned data models, messages and services have not been a primary concern of SOA initiatives. Moreover, of the initiatives that did define a versioning strategy, very few have used XML and XML Schema extensibility. It is our strong belief that a compatibility-based versioning strategy can increase service discoverability, composability and true reuse. It can also reduce, albeit not eliminate, the need for service governance. Overall, we expect such a versioning strategy to greatly reduce the cost of constructing, operating and maintaining a service. It is time to move beyond primitive reuse to reap the benefits of your service inventory.
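As a minimal illustration of the compatibility idea, and not the authors' own guidelines, the sketch below assumes a "must-ignore" rule: a consumer built against version 1 of a message reads only the elements it knows and ignores any others, so a provider can add optional elements in version 2 without breaking it. In XML Schema, such extension points are commonly declared with an `xs:any` wildcard. The message layout, element names and the `read_customer_v1` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical version-1 consumer of a "customer" message.
# It only knows these child elements; any others are ignored (must-ignore rule).
KNOWN_FIELDS = ("id", "name")


def read_customer_v1(xml_text: str) -> dict:
    """Extract the version-1 fields, ignoring elements added by later versions."""
    root = ET.fromstring(xml_text)
    result = {}
    for field in KNOWN_FIELDS:
        elem = root.find(field)  # direct child with this tag, or None
        if elem is None:
            # Removing a required element is an incompatible change.
            raise ValueError(f"required element <{field}> missing")
        result[field] = elem.text
    # Unknown children such as <loyaltyTier> are never inspected.
    return result


V1_MESSAGE = "<customer><id>42</id><name>Acme</name></customer>"
# A hypothetical version-2 message that adds one optional element.
V2_MESSAGE = ("<customer><id>42</id><name>Acme</name>"
              "<loyaltyTier>gold</loyaltyTier></customer>")

print(read_customer_v1(V1_MESSAGE))
print(read_customer_v1(V2_MESSAGE))
```

Both calls return the same dictionary, so the older consumer keeps working against the newer provider. Removing or renaming `<id>` or `<name>` would still break it (it raises `ValueError`), so under this sketch's assumptions later versions would be limited to additive, optional changes.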

CMIS: Scope of Hierarchical Properties
Jens Hübel, CMIS TC Wiki

One of the early CMIS TC discussion topics was the "Scope of Hierarchical Properties." Jens Hübel wrote: "To start this discussion I will come up with a list of use cases. (A) Dependent Pick Lists. Sometimes properties have a hierarchy between each other. You might have a list of car manufacturers on the first level and a list of car models on the second level, where the allowed elements depend on the value of the first level. (B) Structured documents. Many repositories allow documents to be modeled not as something flat but as having structure within the document itself. CMIS does support versions as a substructure; other repositories might have additional structures like sub-components, languages, renditions, attachments, whatever. Often repositories allow properties on the document level as well as on the component level. Some properties might have a global scope (like name, status), whereas other properties might depend on the component level (version-specific, language-specific, attachment-specific), e.g., author, modificationDate. The structure is not necessarily restricted to two levels, but can be a tree (e.g., Document-Version-Language-Rendition). (C) Relationship within properties. Some repositories allow properties to be grouped and to have a structure in themselves. An invoice might have an ordered list of items, each item having an amount. This is a kind of relationship on the property level. The CMIS relationship service might be a way to deal with those situations. (D) Multilingual properties. Perhaps a special form of a hierarchy, but could have some common pattern. Allow the value of a property to be available in more than one language, for example a comment field in English and French... [David Choy] Regarding 'structured document', it may be a major topic by itself. Some people think of them as content, such as large technical manuals that contain subcomponents. Others think of them as containing metadata, such as forms.
And still others argue that content and metadata should be treated the same, with no semantic distinction. This discussion may touch upon XML and even XQuery. The TC should decide whether we want to address this for v1.0. In other words, we need to define the scope of 'hierarchical/complex property' if we want to include it in v1.0..." [Note: CMIS (Content Management Interoperability Services) is being developed in a new OASIS Technical Committee.]
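Use case (A) is the easiest of the four to make concrete. The following is not taken from any CMIS draft; it is a minimal Python sketch, with hypothetical manufacturer and model names, of a two-level dependent pick list in which the values allowed for `model` depend on the value chosen for `manufacturer`.

```python
# Hypothetical two-level dependent pick list: the allowed values of the
# second-level property ("model") depend on the first-level value ("manufacturer").
PICK_LIST: dict[str, list[str]] = {
    "Alpha Motors": ["A-100", "A-200"],
    "Beta Cars": ["B-Sport"],
}


def validate(properties: dict[str, str]) -> list[str]:
    """Return validation errors for a property set (empty list means valid)."""
    errors = []
    maker = properties.get("manufacturer")
    model = properties.get("model")
    if maker not in PICK_LIST:
        errors.append(f"unknown manufacturer: {maker!r}")
    elif model not in PICK_LIST[maker]:
        # The model list consulted is chosen by the first-level value.
        errors.append(f"model {model!r} is not allowed for {maker!r}")
    return errors


print(validate({"manufacturer": "Alpha Motors", "model": "A-200"}))
print(validate({"manufacturer": "Beta Cars", "model": "A-200"}))
```

The same model name is valid under one manufacturer and rejected under another, which is the dependency a flat, independent list of allowed values per property cannot express.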

Managing a Flood of New Data
Shawn P. McCarthy, Government Computer News

Government information technology managers are constantly inundated with new types of data arriving from an ever-increasing number of sources. It's their job to figure out what's worth keeping from each data stream, how to store it, how to access it and how to make the data available to a wide variety of applications. One type of information already having an impact is data generated by various civil engineering projects. That includes information from road and bridge sensors, water level sensors, smart lighting controls for buildings or public spaces, and even citywide networks of traffic controls, highway signs and monitors along fences and borders. To understand that growing data flow and associated issues, let's start with the sensors themselves. They are often transducers. A transducer typically measures energy produced by pressure, movement or heat, then converts that energy into something else, such as an electrical impulse that can be recorded as data. New types of sensors implanted in bridges can measure the movement of girders or plates, metal corrosion, and other types of wear. A local system usually collects sensor impulses and converts that information into a specific type of data that can then be sent to a computer. Those systems produce a variety of data types, many of which are proprietary. But a standard called Transducer Markup Language is becoming increasingly common. It can be used to create a type of XML document that describes data produced by a transducer in a standardized way and includes metadata that describes the system producing the data, the date and time it was collected, and how multiple devices relate to one another within a network or via Global Positioning System coordinates... Of the nation's nearly 500,000 bridges, the Federal Highway Administration cataloged 25.8 percent as structurally deficient or functionally obsolete as of 2006. That doesn't mean they are heading for collapse, but it does mean they need monitoring.
Traditionally, that has meant periodic visual inspections. But as the 2007 collapse of the I-35W Mississippi River bridge in Minneapolis showed, visual inspections might not be enough. The replacement bridge built in Minneapolis contained hundreds of special sensors, many cast right into the concrete. The University of Minnesota and the Minnesota Department of Transportation monitor the data those sensors collect. Realizing that there will be a growing demand for such systems, researchers at Clarkson University have developed a prototype bridge sensor that doesn't need a battery. It powers itself via the vibrations of a typical bridge, similar to those flashlights that you charge by cranking or shaking... Right now, transportation-related sensors are leading the way. But other agencies will soon notice their own flood of sensor data. The Agriculture Department will see more data from crop and livestock sensors. The Energy Department will see more information on energy consumption and how weather and cost affect it. Meanwhile, the Homeland Security Department is already dealing with data from border sensors and video surveillance systems.

Essential Hierarchy
Jeni Tennison, Blog

"In [a recent] post I discussed the kinds of situations where overlapping markup can appear in documents, and the distinction between containment, when one element happens to contain another, and dominance, where the relationship between the two elements is more meaningful. Here I'll expand a bit more on the issue of whether dominance relationships are or should be part of the essential information in the document... Overlap is arguably the main remaining problem area for markup technologists. Capturing and analysing the overlap between poetic and syntactic structures in poems and plays helps academics gain a deeper understanding of the ways poetic technique has changed over time. And the complexities of structures in documents such as the Bible simply cannot be represented without allowing overlap to happen. But academic study aside, overlap is a really important problem because whenever we collaborate on documents and whenever we change documents, we create overlapping structures. One of the major projects that I've worked on at TSO deals with publishing consolidated legislation, showing the places where 'current' legislation was amended over time from its original, enacted state. The authors of legislation care little for document structures, and amendments often overlap document structures such as paragraphs and list items, and each other... When you're talking about overlapping structures, it's useful to make the distinction between structures that contain each other and structures that dominate each other. Containment is a happenstance relationship between ranges while dominance is one that has a meaningful semantic. A page may happen to contain a stanza, but a poem dominates the stanzas that it contains. In LMNL, we view a document as consisting of a sequence of atoms, usually characters, and ranges over those characters. But the model makes no assertions about dominance relationships between the ranges.
This document model is easy to construct from a serialised document like the one above. Conversely, GODDAG document models are directed acyclic graphs (DAGs): the nodes within those graphs have children and parents, with leaf nodes containing characters, and the parent-child relationship implies dominance. This is a useful model for processing, and particularly querying. Navigating a DAG is a lot like navigating a tree, just one that represents multiple hierarchies. But it isn't possible to construct a DAG from a serialised document like the one above without extra information about which containment relationships are actually dominance relationships, and which mere happenstance. [...] James Clark commented: ... 'I would be inclined to start by designing the information model first and then figure out a syntax to represent that information model. Maybe I'm just brainwashed by too much XML/SGML, but the hierarchical relationships seem like a fundamental aspect of the information about the document which the markup should be capturing explicitly.' [...] As well as overlap, LMNL has weird things like structured and ordered annotations, atoms, and anonymous ranges. In that spirit, I want to see if we can get away with not having hierarchy as a fundamental part of the information model. Does this allow us to do things that we couldn't otherwise do, or is it a burden? I don't know yet... If the syntax for expressing hierarchies is that verbose and difficult to use, people won't use it, and we'll have to find a way to add dominance relationships programmatically. We might as well start from that point. But perhaps someone out there can come up with a clean, elegant syntax for expressing dominance within overlapping markup?"
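The containment/dominance distinction can be sketched in a few lines of Python. This is not LMNL or GODDAG tooling; it is a hypothetical model in which ranges are plain start/end offsets over a sequence of atoms. Containment falls out of the offsets alone, while dominance needs the extra, hand-supplied `DOMINANCE_RULES` table (an assumption of this sketch), mirroring the point that a serialized range document by itself cannot tell the two apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A named range over a sequence of atoms (here, character offsets)."""
    name: str
    start: int  # index of the first atom (inclusive)
    end: int    # index after the last atom (exclusive)


def contains(outer: Range, inner: Range) -> bool:
    """Containment: a purely positional, possibly happenstance relation."""
    return outer.start <= inner.start and inner.end <= outer.end


def overlaps(a: Range, b: Range) -> bool:
    """True if a and b share atoms but neither contains the other."""
    shares_atoms = a.start < b.end and b.start < a.end
    return shares_atoms and not contains(a, b) and not contains(b, a)


# Hypothetical extra information the flat range model does not carry:
# which (outer, inner) range names form meaningful dominance relations.
DOMINANCE_RULES = {("poem", "stanza")}


def dominates(outer: Range, inner: Range) -> bool:
    """Dominance = containment plus an explicitly declared semantic link."""
    return contains(outer, inner) and (outer.name, inner.name) in DOMINANCE_RULES


# Ten atoms; page breaks fall independently of stanza breaks.
poem = Range("poem", 0, 10)
stanza1, stanza2 = Range("stanza", 0, 6), Range("stanza", 6, 10)
page1, page2 = Range("page", 0, 8), Range("page", 8, 10)

print(contains(page1, stanza1), dominates(page1, stanza1))
print(contains(poem, stanza1), dominates(poem, stanza1))
print(overlaps(page1, stanza2))
```

Here the first page happens to contain the first stanza without dominating it, while the poem both contains and dominates it; the first page and the second stanza partially overlap, which a single XML tree could not represent without splitting one of them or falling back on milestone markers.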

Expressing Confidence in a Location Object
Martin Thomson (ed), IETF Internet Draft

Members of the IETF Geographic Location/Privacy (GEOPRIV) Working Group have published an initial -00 Internet Draft for "Expressing Confidence in a Location Object." The document defines a confidence element that expresses the estimated probability that associated location information is correct. This element conveys information that might otherwise be lost about the probability distribution represented by a region of uncertainty. Section 4 provides the Confidence Schema (XML) with the proposed XML namespace. Location information is often less than perfect. Two measures are used to quantify how imperfect the location information is: uncertainty and confidence. These terms, and their relationship with location information, are explored in detail in the I-D "Representation of Uncertainty and Confidence in PIDF-LO". Standard forms for the expression of uncertainty are included in "GEOPRIV PIDF-LO Usage Clarification, Considerations and Recommendations", but confidence is fixed to a value of 95%. On the whole, a fixed definition for confidence ensures consistency between implementations. Location generators that are aware of this constraint can generate location information at the required confidence. Location recipients are able to make sensible assumptions about the quality of the information that they receive. In some circumstances, particularly with pre-existing systems, location generators might provide location information with some other confidence. Common values include 38%, 67% and 90%, all of which are prevalent in current systems. Existing forms of expressing location information, such as that defined in "3rd Generation Partnership Project; Technical Specification Group Core Network; Universal Geographic Area Description (GAD)", contain elements that express the confidence in the result. This element adds information that was previously unavailable to recipients of location information.
Without this information, a location server or generator that has access to location information with a confidence lower than 95% has two options, both of which degrade the quality of the information provided: (1) The location server can attempt to scale regions of uncertainty to achieve 95% confidence. This scaling process significantly degrades the quality of the information, because the location server might not have the necessary information; the assumptions that have to be made result in poor-quality results. (2) The location server can ignore the confidence entirely, which gives the recipient of that information a false impression of its quality.
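Option (1) can be made concrete with one common simplifying assumption, which is exactly the kind of assumption the draft warns a location server may not be in a position to make: that position error follows a circular bivariate normal distribution. Under that assumption the probability of lying within radius r is 1 - exp(-r^2 / (2 sigma^2)), so a circle can be rescaled between confidence levels without knowing sigma. The function name `scale_radius` is hypothetical, and nothing here comes from the draft's schema.

```python
import math


def scale_radius(radius: float, from_conf: float, to_conf: float) -> float:
    """Rescale a circular uncertainty radius from one confidence level to another.

    ASSUMPTION: position error is circular bivariate normal, so
    P(error <= r) = 1 - exp(-r**2 / (2 * sigma**2)). Real location data
    may not follow this distribution, in which case the result is misleading.
    """
    if not (0.0 < from_conf < 1.0 and 0.0 < to_conf < 1.0):
        raise ValueError("confidence values must lie strictly between 0 and 1")
    # Inverting the CDF gives r = sigma * sqrt(-2 * ln(1 - p)); sigma cancels in the ratio.
    return radius * math.sqrt(math.log(1.0 - to_conf) / math.log(1.0 - from_conf))


r95 = scale_radius(100.0, 0.67, 0.95)
print(f"{r95:.1f} m")
```

With these assumptions a 100 m circle reported at 67% confidence becomes roughly 164 m at 95%. If the real error is not Gaussian, the enlarged circle can be badly wrong, which is the quality degradation the draft describes.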

