This issue of XML Daily Newslink is sponsored by:
IBM Corporation http://www.ibm.com
- Initial IETF Internet Draft Specification for the vCard XML Schema
- Repository to Repository Transfer of Enriched Archival Information Packages
- InfoWorld Test Center Review: Silverlight, For Real This Time
- Data Points: Cloud Gazing From Silverlight 2
- Create a Framework To Support XSLT Transformation Pipelines
- Data Stream Management: Aggregation, Classification, Modeling, and Operator Placement
- Standards-Based Computing Capabilities for Distributed Geospatial Applications
- Updated OGC Reference Model from the Open Geospatial Consortium
- Unstructured Information Management Architecture (UIMA) Version 1.0
- Last Call Review for W3C 'Selectors API'
- Open Source SOA Requires Expertise
- SOA Could Rebound as Recession-Busting Strategy
- X3D Schematron Validation and Quality Assurance
- Who Said WS-Transfer is for REST?
Initial IETF Internet Draft Specification for the vCard XML Schema
Simon Perreault (ed), IETF Internet Draft
Members of the IETF vCard and CardDAV (VCARDDAV) Working Group have published an initial -00 level Internet Draft for the "vCard XML Schema" specification. The document defines the XML schema of the vCard data format as defined in the "vCard Format Specification." These Standards Track specifications, if approved, will update IETF RFC 2739 ("Calendar Attributes for vCard and LDAP") and obsolete RFCs 2425, 2426, 4770. The "vCard Format Specification" I-D defines the vCard data format for representing and exchanging a variety of information about an individual (e.g., formatted and structured name and delivery addresses, email address, multiple telephone numbers, photograph, logo, audio clips, etc.). It defines a text-based format as opposed to a binary format. Electronic address books have become ubiquitous. Their increased presence on portable, connected devices as well as the diversity of platforms exchanging contact data call for a standard. The new "vCard XML Schema" specification provides a formal definition for the same underlying data as defined in the "vCard Format Specification." The XML formatting may be preferred in some contexts where an XML engine is readily available and may be reused instead of writing a stand-alone vCard parser. The schema is expressed in the Relax NG language. The general idea is to map vCard properties to XML elements and vCard parameters to XML attributes. For example, the "FN" property is mapped to the "fn" element. That element's value (a text node) corresponds to the vCard property's value. Properties having structured values (e.g. the N property) are expressed by XML element trees. Element names in that tree (e.g., "surname", "given", etc.) do not have a vCard equivalent since they are identified by position in plain vCard. Line folding is a non-issue in XML. Therefore, the mapping from vCard to XML is done after the unfolding procedure is carried out.
Conversely, the mapping from XML to vCard is done before the folding procedure is carried out. Since the original vCard format is extensible, it is expected that these vCard extensions will also specify extensions to the XML format described in the schema document. New properties, parameters, data types and values (collectively known as vCard objects) can be registered with IANA.
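The mapping described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the draft's schema: it handles only simple content lines of the form NAME;PARAM=value:VALUE, it ignores quoted parameter values and XML namespaces, the root element name "vcard" is an assumption, and the child element names for N other than "surname" and "given" are hypothetical.

```python
import xml.etree.ElementTree as ET

# Child element names for the structured N property. "surname" and "given"
# appear in the text above; the remaining three names are assumptions.
N_COMPONENTS = ["surname", "given", "additional", "prefix", "suffix"]


def unfold(text):
    """Undo line folding: a line starting with a space or tab continues the previous one."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def content_line_to_element(line):
    """Map one unfolded content line to an XML element."""
    head, _, value = line.partition(":")
    name, *params = head.split(";")
    element = ET.Element(name.lower())          # property -> element
    for param in params:
        key, _, param_value = param.partition("=")
        element.set(key.lower(), param_value)   # parameter -> attribute
    if name.upper() == "N":
        # Structured value: positional components become named child elements.
        for child_name, part in zip(N_COMPONENTS, value.split(";")):
            ET.SubElement(element, child_name).text = part
    else:
        element.text = value
    return element


def vcard_to_xml(text):
    """Unfold first, then map each content line (the order the draft describes)."""
    root = ET.Element("vcard")  # hypothetical root element name
    for line in unfold(text):
        if not line or line.upper() in ("BEGIN:VCARD", "END:VCARD"):
            continue
        root.append(content_line_to_element(line))
    return root
```

Calling ET.tostring(vcard_to_xml(card_text), encoding="unicode") on a small card shows the resulting element tree; the reverse direction would serialize the elements back to content lines and only then apply folding.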
See also: the vCard Format Specification
Repository to Repository Transfer of Enriched Archival Information Packages
Priscilla Caplan, D-Lib Magazine
Many within the preservation community believe that there is no single "true" preservation solution, that many approaches must be tried and tested, and that redundancy reduces risk. It must be possible for materials archived in one repository to be exported to and ingested by a second repository without loss of authenticity, digital provenance, or other vital preservation information. This article addresses this latter requirement, reviewing the justification for such transfers, transfer standards, and research to date... A number of standards exist that can and have been used as building blocks of common transfer formats. For descriptive metadata, libraries appear to have settled mostly on MODS or Dublin Core, although other schema are in use in some libraries and in other domains. The LMER (Long-term Preservation Metadata for Electronic Resources) schema used in Germany and the PREMIS data dictionary used elsewhere define standard elements of general preservation metadata according to consistent (although different) data models. The JHOVE tool for file identification and characterization also defines some of the same data in its representation information. MIX (the XML representation of NISO/AIIM standard Z39.87 Technical Metadata for Digital Still Images), TextMD, the draft AES metadata standard for digital audio, and several other de facto standards have been used for format-specific technical metadata. METS is widely used as a container format and to express structural metadata. As the container, METS is the primary schema, and the other schemas are used within METS to extend it... In Australia, the PRESTA (PREMIS Requirements Statement) project undertaken by the Australian Partnership for Sustainable Repositories in 2006 looked at how PREMIS could be used in transferring content from one repository to another. 
The highest, generic profile requires use of MODS, PREMIS and METS; allows MIX, TextMD and various other schema for format-specific technical metadata; and allows either PREMIS or the XACML schema for rights metadata... The optimal documentation of relationships is perhaps the most difficult issue: for example, whole/part relationships can be recorded in both PREMIS and METS, as well as in descriptive metadata schemes such as MODS, MARC and Dublin Core, but not all of these schemes are equally expressive. To complicate matters further, PREMIS allows two methods of linking, by PREMIS identifier and by ID / IDREF-type attributes in the PREMIS XML schema... The TIPR (Towards Interoperable Preservation Repositories) project will test the use of a SKOS-based registry prototype under development by the Library of Congress, the Standards & Research Data Values Registry. The prototype has a RESTful Web-services interface that allows both reading and writing values. Local values from the partners' systems will be registered in the prototype, and relationship information returned by the registry will be used in mapping values between systems.
See also: The Australian METS Profile
InfoWorld Test Center Review: Silverlight, For Real This Time
Martin Heller, InfoWorld
Data Points: Cloud Gazing From Silverlight 2
John Papa, MSDN Magazine
Create a Framework To Support XSLT Transformation Pipelines
Jake Miles, IBM developerWorks
This article explains the creation of a framework, called Butterfly, that runs in PHP 5 and facilitates the applications of chains of XSLT stylesheets to XML source documents. It also looks at the nature of framework design in general as it sketches out the Butterfly framework in particular. Butterfly provides transparent caching of the transformed results. Inspired by the Java-based Apache Cocoon project, so named because it houses and manages the transformation of data from one form to another (turning caterpillars into butterflies), this much lighter-weight framework is [accordingly] named Butterfly. With the Butterfly framework, you can set up an XML configuration file to define chains of stylesheet transformations, and then instantiate Butterfly objects that can each produce the result of an XSLT transformation chain... With the XSL module in PHP 5, you can apply XSLT stylesheets to XML documents to transform the XML data into some other type of text document. This document can be another XML structure, HTML, or any other structure, including plain text or even Java and other programming languages... The chain starts with a source XML document (though not necessarily a file), and applies a series of XSLT stylesheets to it until it produces the final document. This functionality is a small subset of the functionality provided by the Apache Cocoon project, a Java-based framework that processes pipelines of XSLT stylesheets to produce a final document. When the final document is a Web page, one concern when processing XSLT stylesheets is that of performance. For small data documents and simple stylesheets this might not prove to be an issue. For large data sets with thousands of elements, however, applying a series of stylesheets upon each page load can not only slow down the page, but it consumes a lot of memory and processing power on the server. 
The solution to the performance problem is simple—cache the result of the XSLT transformation as a static HTML page that the Web server can serve instantly, and only perform the full chain of XSLT transformations when the source document or one of the stylesheets has changed. This caching mechanism is not unique to the specific XML or XSLT involved, and therefore, the framework can handle it generically.
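The caching rule described above (rebuild only when the source document or one of the stylesheets has changed) can be sketched independently of any XSLT engine. The sketch below is in Python rather than the article's PHP 5 and is not Butterfly's code; the function names are hypothetical, and apply_stylesheet is a caller-supplied stand-in for a real XSLT processor.

```python
import os


def cache_is_fresh(cache_path, dependency_paths):
    """Return True if the cached result exists and is at least as new as every dependency."""
    if not os.path.exists(cache_path):
        return False
    cache_mtime = os.path.getmtime(cache_path)
    return all(os.path.getmtime(path) <= cache_mtime for path in dependency_paths)


def run_pipeline(source_path, stylesheet_paths, cache_path, apply_stylesheet):
    """Apply each stylesheet in turn to the source, reusing the cached result when possible.

    apply_stylesheet(document_text, stylesheet_path) -> str is supplied by the
    caller; it stands in for a real XSLT processor, which this sketch omits.
    """
    if cache_is_fresh(cache_path, [source_path, *stylesheet_paths]):
        with open(cache_path, encoding="utf-8") as cached:
            return cached.read()
    with open(source_path, encoding="utf-8") as source:
        document = source.read()
    # The output of each stage is the input of the next one.
    for stylesheet_path in stylesheet_paths:
        document = apply_stylesheet(document, stylesheet_path)
    with open(cache_path, "w", encoding="utf-8") as cached:
        cached.write(document)
    return document
```

Comparing modification times keeps the check cheap, which matters because it runs on every page load; a content hash would be a stricter alternative at a higher cost.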
See also: Michael Kay on the XSLT language
Data Stream Management: Aggregation, Classification, Modeling, and Operator Placement
Frank Olken and Le Gruenwald, IEEE Internet Computing
This article is the Guest Editors' Introduction to the November/December 2008 issue of IEEE Internet Computing. Data stream management is concerned with managing large streams of data arriving from data communications or sensor networks. The past 40 years have seen the proliferation of microprocessors in everything from watches, PDAs, cell phones, and appliances to automobiles and copy machines. In the coming decade, most (if not all) of these items will be incorporated into computer networks onto which they'll stream sensor data, such as temperature, blood pressure, and room occupancy. RFID chips and readers will provide increasingly fine-grained data on the movement of people and goods, which will be encoded as data streams. Such data will be huge and largely ephemeral, so much of it will exist only as streams rather than be permanently recorded in raw form. Interest in this area is increasing, and we anticipate that it will grow rapidly throughout the coming decade, driven by rapid growth in the pervasiveness and bandwidth of digital communications networks and an impending explosion in sensor networks. Sensor networks involve large distributed networks of intelligent sensors that typically generate streams of timestamped measurements. Sensor network research overlaps with data stream management research, but also encompasses topics in networking protocols, geospatial data processing, and micro-operating systems. Click streams (the record of user activities on the Web) and other network logs contribute to the increasing demand for data stream management... Typically, data streams exhibit the following characteristics: (1) infinite length, (2) continuous data arrival, (3) high data rates, (4) requirements for low-latency, real-time query processing, and (5) data that are usually time-stamped and generally arrive in either temporal order or close to it...
In wireless sensor networks, data transmitted from sensors to servers are susceptible to losses, delays, or corruption for many reasons, such as power outages at the sensor's node or a higher bit-error rate with wireless radio transmissions compared to the wired communication alternative... Recently, we've seen strong research interests in privacy-preserving data management -- how to manage data without violating privacy policies that deal with data disclosure. For data streams, however, privacy guarantees based on data that use data processing models such as a sliding window might not hold for the overall data, so new solutions are necessary. Finally, in mobile environments, clients, servers, or both might move over time. We must consider problems that arise due to mobility, frequent disconnections, and nodes' energy limitations when managing streams in a mobile setting... We anticipate that the growth in the use of smart sensors, microprocessors, networks, and the World Wide Web will fuel an explosion of demand for data stream management systems in the coming decade. More research is needed to support the design of such systems.
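Because a stream is unbounded and time-stamped, queries are typically evaluated over a window of recent data rather than over the whole stream. As a minimal sketch of the sliding-window model mentioned above (the class name is hypothetical, and readings are assumed to arrive in non-decreasing timestamp order, which real streams only approximate), a running average might look like this:

```python
from collections import deque


class SlidingWindowAverage:
    """Running average over readings whose timestamps lie in the last `width` time units."""

    def __init__(self, width):
        self.width = width      # window length; assumed to be greater than zero
        self.window = deque()   # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Add one reading and return the average over the current window.

        Assumes readings arrive in non-decreasing timestamp order.
        """
        self.window.append((timestamp, value))
        self.total += value
        # Evict readings at or before the trailing edge of the window.
        while self.window[0][0] <= timestamp - self.width:
            _, old_value = self.window.popleft()
            self.total -= old_value
        return self.total / len(self.window)
```

Memory use is bounded by the number of readings inside one window, not by the length of the stream, which is the property that makes windowed operators practical at high data rates.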
Standards-Based Computing Capabilities for Distributed Geospatial Applications
Craig Lee and George Percivall, IEEE Computer
Some 80 to 90 percent of all information is geospatially related. Examples include oil and gas exploration, weather forecasting and tracking, aviation, satellite ground systems, environmental planning, disaster management, public administration (e-government), civic planning and engineering, and all manner of e-sciences. All such activities entail gathering significant amounts of data and other critical information that must be stored, accessed, and managed... The ability to access, integrate, analyze, and present geospatial data across a distributed computing environment using common tools has tremendous value. Indeed, with the growing connectedness of our world—through data-collecting instruments, data centers, supercomputers, departmental machines, and personal devices such as cell phones, PDAs, and smart phones—as a society we expect a wide range of information to be instantly accessible from anywhere... The Open Geospatial Consortium and the Open Grid Forum are collaborating to develop open standards that address the distributed computing needs of geospatial applications while accommodating the inevitability of diverse formats, schemas, and processing algorithms. The notional geocomputing architecture is largely supported by the OpenGIS standards that OGC already has in place. OGC Web Service standards are layered on top of open Internet standards—in particular, the HTTP, uniform resource locators (URLs), multipurpose Internet mail extensions (MIME), and XML-based World Wide Web standards. 
The main OWS standards include the following: (1) Web Map Service (WMS), which standardizes the display of registered and superimposed maplike views of information that come simultaneously from multiple remote and heterogeneous sources; (2) Web Feature Service (WFS), which standardizes the retrieval and update of digital representations of real-world entities referenced to the Earth's surface; (3) Web Coverage Service (WCS), which standardizes access to spatially extended coverages, usually encoded in a binary format and offered by a server; and (4) Catalogue Service for the Web (CSW), which standardizes interfaces to publish, discover, browse, and query metadata about data, services, and other resources. OGC Sensor Web Enablement (SWE) standards make all types of sensors, instruments, and imaging devices accessible and, where applicable, controllable via the Web... Once such information models and architectures are in place, the grid community can use concrete instances to populate catalogs of information for discovery by users and systems. Event notification is a key part of any distributed system and can be accomplished using OGF standards such as Information Dissemination (INFOD) that support the publish-subscribe paradigm. OGF has ongoing work in the context of the OGC Web Processing Service (WPS) standard, which is a key point of collaboration between the organizations... A key goal is implementing WPS with standard grid tools that enable access to a range of back-end processing environments. Beyond WPS, there is a need to develop joint interoperability specifications for workflow management, event notification, and security. Because WPS employs a relatively simple model whereby a user can send data for processing on a (possibly remote) server and retrieve the results, implementations using HPC Basic Profile, SAGA, or GridRPC should all be possible... [IEEE Computer 41/11 (November 2008), 50-57]
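Since the OGC Web Service standards are layered on HTTP and URLs, a map request is simply a URL with a query string. As a hedged illustration, the sketch below composes a WMS GetMap request using the key-value parameters of WMS version 1.1.1; the helper name, the server address, and the layer names are hypothetical, and a real client would first read the server's capabilities document to learn which layers, formats, and reference systems it offers.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width, height,
                     srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap request URL using key-value-pair encoding."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),               # layers are drawn in the order listed
        "STYLES": "",                             # empty value asks for default styles
        "SRS": srs,                               # spatial reference system of BBOX
        "BBOX": ",".join(str(v) for v in bbox),   # minx,miny,maxx,maxy
        "WIDTH": str(width),                      # output image size in pixels
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer names, for illustration only.
url = build_getmap_url("http://example.org/wms", ["roads", "rivers"],
                       (-180, -90, 180, 90), 800, 400)
```

Because every server answers the same request vocabulary, a client can overlay images from several independent servers by sending each one the same BBOX, SRS, WIDTH, and HEIGHT.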
See also: the OGF standards
Updated OGC Reference Model from the Open Geospatial Consortium
Staff, OGC Announcement
The Open Geospatial Consortium (OGC) has announced the completion and availability of Version 2.0 of the "OGC Reference Model (ORM)." Contributors include George Percivall (Editor, Open Geospatial Consortium), Carl Reed (Open Geospatial Consortium), Lew Leinenweber (BAE Systems), Chris Tucker (ERDAS), and Tina Cary (OGC/Cary and Associates). The OGC Reference Model provides "a framework for the ongoing work of the OGC and a guide for those who seek to implement interoperable solutions and applications for geospatial services and data." The ORM focuses on relationships between the documents in the OGC Standards Baseline (SB), which consists of the approved OpenGIS Abstract and Implementation Standards (Interface, Encoding, Profile, Application Schema), and OGC Best Practice documents. The ORM provides insight into the current state of the work of the OGC and thus serves as a useful resource for defining architectures for specific applications. It helps prospective members see how they might serve their stakeholders by making a contribution to the OGC process, and it provides overall guidance to developers who are implementing one or more of the OpenGIS Standards. The ORM contains numerous links to OGC resources that provide more detailed information. It is the result of extensive development by hundreds of OGC Member Organizations and tens of thousands of individuals who have contributed to the development of OGC standards since 1994. The Open Geospatial Consortium (OGC) is an international consortium of more than 365 companies, government agencies, research organizations, and universities participating in a consensus process to develop publicly available geospatial standards. OpenGIS Standards support interoperable solutions that "geo-enable" the Web, wireless and location-based services, and mainstream IT.
OGC Standards empower technology developers to make geospatial information and services accessible and useful with any application that needs to be geospatially enabled.
See also: the Geography Markup Language (GML)
Unstructured Information Management Architecture (UIMA) Version 1.0
Adam Lally, Karin Verspoor, Eric Nyberg (eds), OASIS Committee Specification
Members of the OASIS Unstructured Information Management Architecture (UIMA) Technical Committee have published an approved Committee Specification 01 for "Unstructured Information Management Architecture (UIMA) Version 1.0." Unstructured information represents the largest, most current and fastest growing source of knowledge available to businesses and governments worldwide. The web is just the tip of the iceberg. Consider, for example, the droves of corporate, scientific, social and technical documentation including best practices, research reports, medical abstracts, problem reports, customer communications, contracts, emails and voice mails. Beyond these, consider the growing number of broadcasts containing audio, video and speech. These mounds of natural language, speech and video artifacts often contain nuggets of knowledge critical for analyzing and solving problems, detecting threats, realizing important trends and relationships, creating new opportunities or preventing disasters. For unstructured information to be processed by applications that rely on specific semantics, it must be first analyzed to assign application-specific semantics to the unstructured content. Another way to say this is that the unstructured information must become 'structured' where the added structure explicitly provides the semantics required by target applications to interpret the data correctly... The UIMA specification defines platform-independent data representations and interfaces for text and multi-modal analytics. The principal objective of the UIMA specification is to support interoperability among analytics. This objective is subdivided into the following four design goals: (1) Data Representation. Support the common representation of artifacts and artifact metadata independently of artifact modality and domain model and in a way that is independent of the original representation of the artifact. (2) Data Modeling and Interchange.
Support the platform-independent interchange of analysis data (artifact and its metadata) in a form that facilitates a formal modeling approach and alignment with existing programming systems and standards. (3) Discovery, Reuse and Composition. Support the discovery, reuse and composition of independently-developed analytics. (4) Service-Level Interoperability. Support concrete interoperability of independently developed analytics based on a common service description and associated SOAP bindings...
Last Call Review for W3C 'Selectors API'
Anne van Kesteren and Lachlan Hunt (eds), W3C Technical Report
W3C announced that the Web Applications Working Group has published the Last Call Working Draft for the "Selectors API" specification. Comments are welcome through December 12, 2008. Selectors, which are widely used in CSS, are patterns that match against elements in a tree structure. The Selectors API specification defines methods for retrieving Element nodes from the DOM by matching against a group of selectors. It is often desirable to perform DOM operations on a specific set of elements in a document. These methods simplify the process of acquiring specific elements, especially compared with the more verbose techniques defined and used in the past. The W3C Web Applications (WebApps) Working Group, a merger of the WebAPI and WAF Working Groups, is chartered to develop standard APIs for client-side Web Application development. This work will include both documenting existing APIs such as XMLHttpRequest and developing new APIs in order to enable richer web applications.
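In browsers the specification's methods are exposed as querySelector and querySelectorAll. To show the matching idea outside a browser, here is a toy Python sketch over an ElementTree document; it is not the W3C API. The function names are hypothetical, the root argument is treated as the whole document, and only type, "#id", and ".class" selectors (optionally compounded, and grouped with commas) are handled, with no combinators, attribute selectors, or pseudo-classes.

```python
import re
import xml.etree.ElementTree as ET

# One simple selector: an optional "." or "#" prefix followed by a name, or "*".
TOKEN = re.compile(r"([.#]?)([\w-]+|\*)")


def matches(element, compound):
    """Test a compound selector such as 'p', '.note', '#main' or 'p.note'."""
    element_classes = element.get("class", "").split()
    for prefix, name in TOKEN.findall(compound):
        if prefix == "#":
            if element.get("id") != name:
                return False
        elif prefix == ".":
            if name not in element_classes:
                return False
        elif name != "*" and element.tag != name:
            return False
    return True


def query_selector_all(root, selector_group):
    """Return, in document order, every element matching any selector in the group."""
    selectors = [s.strip() for s in selector_group.split(",") if s.strip()]
    return [el for el in root.iter() if any(matches(el, s) for s in selectors)]


def query_selector(root, selector_group):
    """Return the first matching element in document order, or None."""
    found = query_selector_all(root, selector_group)
    return found[0] if found else None


doc = ET.fromstring(
    '<div id="main"><p class="note big">a</p><p>b</p><span class="note">c</span></div>'
)
notes = query_selector_all(doc, "p.note, span")
```

The point of the comparison is the one the draft makes: a single selector string replaces a hand-written tree walk with explicit tests on tag names, ids, and classes.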
Open Source SOA Requires Expertise
Cristian Sturek, InformationWeek
Price is one of the most appealing selling points for open source SOA products. However, the price for these low- or no-cost, feature-rich platforms is often an enterprise service bus that's not designed for business users, requiring technical expertise that can drag the SOA platform down. There are some notable exceptions, and open source leaders are increasingly aware of the problem, prompting a rush to develop "orchestration" elements in open source ESBs. Still, most aren't there yet. Most notable among the open source ESBs are Apache ServiceMix, Iona's Fuse ESB, JBoss ESB, Mule ESB, and WSO2 ESB... The ESB is a standards-based SOA backbone, capable of connecting applications through service interfaces. By combining messaging, Web services, XML, and data transformation/management, an ESB can reliably connect, mediate, and control communications and interactions among services. When it comes to technological integration, open source ESBs deliver results that are similar to those of their commercial counterparts. Even more impressive, they always adhere to standards such as JBI, which isn't something all commercial vendors can claim. The open source ESBs are not only thoroughly tested and decently documented online, but they come with a number of adapters for JDBC, SOAP, FTP, HTTP, POP3, TCP, UDP, and even legacy systems such as the AS/400. However, aligning business needs and IT infrastructure is critical during the implementation phase of SOA, and this is where open source ESBs fail to deliver, at least for now. IT professionals will be fine working with XML/Java, but business pros may have a difficult time working in the open source environment. Business analysts generally need to visually orchestrate their process flows, make real-time changes to running processes, adjust service-level agreements, and replace underperforming services. 
Commercial products have the edge in offering increased business flexibility through plug-and-play architecture and reuse of existing services... Open source ESBs offer straightforward results, but only if users have enough expertise in XML and Java to use them. For instance, consider a company that needs a system to accept, validate, and place Web orders. In the Mule ESB, endpoints, connectors, and routers are defined in XML. In Spring and Mule, a JavaBean and two configuration files are created to accept the order information, but the conversion of the file contents to XML and the validation that corrects quantities require additional Java coding. A Web service adapter is then configured in XML to place orders... To effectively implement an open source ESB, IT teams must be ready to learn the framework, component model, and XML scripting model, as well as have a good working knowledge of Spring and Java.
SOA Could Rebound as Recession-Busting Strategy
Pedro Pereira, eWEEK
Adoption of the hard-to-explain technology is down, but it's still possible to make a convincing business strategy case. Service Oriented Architecture. The term, typically abbreviated to SOA, is a mouthful. It turns out it's hard to explain as well, and as a result, making a case to customers for return on investment isn't exactly easy. Determining ROI, in fact, is the greatest challenge developers working on SOA implementations say they face, according to a recent survey by Evans Data, which polled 368 developers working with SOA and Web services in September and October. So great is the challenge, according to participants, that it tops identifying available Web services, testing and validation, and paying for the technology. So selling an SOA project to the customer takes some work. "It's a long-term initiative, it's not a short-term quick hit," says Evans Data CEO John Andrews. "It's hard to understand, and it's hard to describe." With that in mind, it doesn't take much to understand why the adoption of SOA is in decline. A survey by Gartner found that the number of enterprises planning SOA adoption this year dropped by more than half, to 25 percent from 53 percent in 2007. The number of companies with no plans to adopt SOA jumped to 16 percent from 7 percent...
X3D Schematron Validation and Quality Assurance
Don Brutzman, X3D Documentation
X3D Schematron is an additional form of XML validation used to detect problems and help assure the quality and correctness of X3D scenes. Schematron is a language for making assertions about patterns found in XML documents. Assertions for correct behavior must succeed to avoid error and warning messages. Detected inconsistencies and error conditions can be similarly reported. Allowing authors to find errors when creating Extensible 3D (X3D) Graphics scenes can greatly improve end-user experiences. Quality assurance to improve correctness can work on many levels, both syntactic and semantic... The Extensible Markup Language (XML) encoding for '.x3d' scenes has offered three primary ways for checking the syntactic correctness of scenes: (a) XML well-formed checks; (b) X3D document type definition (DTD or DOCTYPE) validation grammar ensures proper parent-child element relations and attribute naming; (c) X3D Schema validation grammar also ensures proper type-checking of most attribute values. An X3D Schematron ruleset is now available to allow checking a wide variety of requirements that are specific to X3D. These quality-assurance checks go beyond the capabilities of DTD or Schema grammar-based checking. Rule examples include: (1) DEF must precede USE; (2) Interpolator key and keyValue arrays must have the same length; (3) Numeric values for index arrays are within bounds; (4) Plus many others, since X3D has many requirements for internal consistency... [Citation via Len Bullard]
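To make the first two rule examples concrete, the sketch below performs the same kind of check in plain Python. It is not Schematron and not the X3D ruleset itself: the function name and messages are hypothetical, and rule (2) is applied only to ScalarInterpolator, where one keyValue entry per key is expected (interpolators with multi-component values would need a different count).

```python
import xml.etree.ElementTree as ET


def check_scene(x3d_text):
    """Return a list of messages for two of the consistency rules named above."""
    problems = []
    defined = set()
    # iter() visits elements in document order, which is what rule (1) needs.
    for node in ET.fromstring(x3d_text).iter():
        # Rule (1): a USE must refer to a DEF that appeared earlier in the scene.
        use_name = node.get("USE")
        if use_name is not None and use_name not in defined:
            problems.append(f"{node.tag} USE='{use_name}' has no preceding DEF")
        def_name = node.get("DEF")
        if def_name is not None:
            defined.add(def_name)
        # Rule (2), restricted here to ScalarInterpolator.
        if node.tag == "ScalarInterpolator":
            keys = node.get("key", "").replace(",", " ").split()
            values = node.get("keyValue", "").replace(",", " ").split()
            if len(keys) != len(values):
                problems.append(
                    f"ScalarInterpolator has {len(keys)} key entries "
                    f"but {len(values)} keyValue entries"
                )
    return problems
```

Neither rule can be stated in a DTD or schema grammar, because each relates values in different attributes or different elements; that cross-reference is what assertion-based validation adds.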
See also: the Schematron Wiki
Who Said WS-Transfer is for REST?
William Vambenepe, IT Management in a Changing IT World (Blog)
One more post on the 'REST over SOAP' topic, recently revived by the birth of the W3C WS Resource Access working group. People seem to assume that WS-Transfer was created as a way to support the creation of RESTful systems that communicate over SOAP. As far as I can tell, this is simply not true. I never worked for Microsoft and I was not in the room when WS-Transfer was created. But I know what WS-Transfer was created to support: chiefly, it was WS-Management and the Devices Profile for Web Services, neither of which claims to have anything to do with REST. It's just that they both happen to deal with resources (that word again!) that have properties and they want to access (mostly retrieve, really) the values of these properties. But in both cases, these resources have a lot more than just state. You can call all sorts of type-specific operations on them. No uniform interface. It's not REST and it's not trying to be REST. The Devices Profile also happens to make heavy use of WS-Discovery and I am pretty sure that UDP broadcasts aren't a recommended Web-scale design pattern. And no 'hypermedia' in sight in either spec either. A specification is not RESTful. An application system is. And most application systems that use WS-Transfer don't even try to be RESTful... Can you point me to any real-life implementation of WS-Transfer that exhibits (at the level of the entire application) any convincing alignment with REST? The systems I know that use it (e.g. managing a bunch of WS-Management-enabled resources from a WS-Management-enabled IT management console) have no REST aspiration if you look at the entire system (and not just the WS-Transfer operations). And it's the entire system that matters from an architectural perspective. If REST alignment was the real goal of WS-Transfer, you'd think some would use it that way...
See also: the WS-RA news story
XML Daily Newslink and Cover Pages sponsored by:
Sun Microsystems, Inc. http://sun.com
XML Daily Newslink: http://xml.coverpages.org/newsletter.html
Newsletter Archive: http://xml.coverpages.org/newsletterArchive.html
Newsletter subscribe: firstname.lastname@example.org
Newsletter unsubscribe: email@example.com
Newsletter help: firstname.lastname@example.org
Cover Pages: http://xml.coverpages.org/