This issue of XML Daily Newslink is sponsored by Sun Microsystems, Inc.

Headlines:
- W3C Invites Implementations of Speech Synthesis Markup Language (SSML)
- Legacy Extended IRIs for XML Resource Identification
- Using Device-provided Location-Related Measurements in Location Configuration Protocols
- XML Heaven or XML Hell? Why Anti-XML Sentiment Is Misguided.
- UBL 2.0 International Data Dictionary
- 'Geneva' SAML Interop—With a Lot of Help from Our Friends
- DCMI Drafts for Application Profiles and Core Metadata Interoperability
- Sense/Net Adopts the CMIS Standard—the First in .NET World
- IPTC Publishes Update to the EventsML-G2 Standard
- How to Detect XML Document Encodings with SAX and XNI
W3C Invites Implementations of Speech Synthesis Markup Language (SSML)
Daniel C. Burnett, Paolo Baggia, Paul Bagshaw (et al.), W3C Candidate Recommendation
W3C's Voice Browser Working Group has issued an invitation for implementations of the Candidate Recommendation for the "Speech Synthesis Markup Language (SSML) Version 1.1" specification. The Speech Synthesis Markup Language Specification is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications. SSML is based upon the JSGF and/or JSML specifications. SSML is part of a larger set of markup specifications for voice browsers developed through the open processes of the W3C. The essential role of the markup language is to give authors of synthesizable content a standard way to control aspects of speech output such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable platforms. A related initiative to establish a standard system for marking up text input is SABLE, which tried to integrate many different XML-based markups for speech synthesis into a new one... The intended use of SSML is to improve the quality of synthesized content. Different markup elements impact different stages of the synthesis process. The markup may be produced either automatically, for instance via XSLT or CSS3 from an XHTML document, or by human authoring. Markup may be present within a complete SSML document or as part of a fragment embedded in another language, although no interactions with other languages are specified as part of SSML itself. Most of the markup included in SSML is suitable for use by the majority of content developers; however, some advanced features like phoneme and prosody (e.g. for speech contour design) may require specialized knowledge... The entrance criteria to the Proposed Recommendation phase require at least two independently developed interoperable implementations of each required feature, and at least one or two implementations of each optional feature depending on whether the feature's conformance requirements have an impact on interoperability.
Detailed implementation requirements and the invitation for participation in the Implementation Report are provided in the Implementation Report Plan. We expect to meet all requirements of that report within the Candidate Recommendation period closing 5 January 2009. The Voice Browser Working Group will advance SSML 1.1 to Proposed Recommendation no sooner than 5 January 2009.
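The author controls described above can be illustrated with a short SSML 1.1 fragment. The document content below is invented for illustration; only the root element, the namespace, and the prosody, phoneme, and say-as elements come from the specification. A quick well-formedness check with Python's standard library:

```python
# Illustrative SSML 1.1 fragment (hypothetical content) exercising the kinds
# of controls the article mentions: rate/pitch, pronunciation, and spell-out.
import xml.etree.ElementTree as ET

ssml = """<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <p>
    <prosody rate="slow" pitch="+10%">Welcome to the demo.</prosody>
    The word <phoneme alphabet="ipa"
        ph="t&#x259;&#x2C8;m&#x251;&#x2D0;to&#x28A;">tomato</phoneme>
    is spelled <say-as interpret-as="characters">tomato</say-as>.
  </p>
</speak>"""

root = ET.fromstring(ssml)   # raises if the fragment is not well-formed XML
NS = "{http://www.w3.org/2001/10/synthesis}"
print(root.tag == NS + "speak")   # True
```

Parsing with ElementTree only checks well-formedness, not SSML validity; a conforming processor would additionally validate against the SSML 1.1 schema.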
See also: the SSML 1.1 Implementation Report Plan
Legacy Extended IRIs for XML Resource Identification
Henry S. Thompson, Richard Tobin, Norman Walsh (eds), W3C Note
W3C announced that the XML Core Working Group has published a Group Note of "Legacy Extended IRIs for XML Resource Identification." Abstract: "For historic reasons, some formats have allowed variants of IRIs (RFC 3987) that are somewhat less restricted in syntax, for example XML system identifiers and W3C XML Schema anyURIs. This document provides a definition and a name (Legacy Extended IRI or LEIRI) for these variants for easy reference. These variants have to be used with care; they require further processing before being fully interchangeable as IRIs. New protocols and formats should not use Legacy Extended IRIs. This document is very closely based on material from "ABNF for IRI References and IRIs" and section 7, "Legacy Extended IRIs", included by permission of its authors. It is intended to provide a basis for a single normative reference from many XML- and/or HTML-related standards in advance of the final publication of 'IRI-bis' as an RFC. When that publication occurs, this specification will be re-issued to reference it in place of the extracts given below."
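The "further processing" the Note refers to is essentially percent-encoding: the characters that LEIRIs additionally permit (space, angle brackets, double quote, curly braces, pipe, backslash, caret, backtick, and control characters) must be escaped before the string can be used as an IRI. A minimal sketch of that conversion in Python; the character list is our reading of the Note, and the function name is ours:

```python
# Characters a LEIRI may contain that a plain IRI may not (our reading of the
# LEIRI Note); control characters are handled separately below.
LEIRI_ONLY = ' <>"{}|\\^`'

def leiri_to_iri(leiri: str) -> str:
    """Percent-encode the LEIRI-only characters so the result is usable as an IRI."""
    out = []
    for ch in leiri:
        if ch in LEIRI_ONLY or ord(ch) < 0x20 or ord(ch) == 0x7F:
            # Encode as UTF-8, then percent-escape each byte.
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)

print(leiri_to_iri("http://example.org/my doc.xml"))
# http://example.org/my%20doc.xml
```

This is the direction the Note allows (LEIRI to IRI); the reverse mapping is not generally unambiguous, which is one reason new formats are told not to use LEIRIs.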
See also: XML Core Working Group Public Page
Using Device-provided Location-Related Measurements in Location Configuration Protocols
Martin Thomson and James Winterbottom (eds), IETF Internet Draft
This document describes a method by which a Device is able to provide location-related measurement data to a location information server (LIS) within a request for location information. Document Sections 7.10 - 7.17 provide the XML Schema Registration definitions for Measurement Container, Base Types, LLDP, DHCP, WiFi, Cellular, GNSS, and DSL. Location-related measurements are observations concerning properties related to the position of a Device, which could be data about network attachment or about the physical environment. Location-related measurement data does not necessarily contain location information directly, but it can be used in combination with contextual knowledge of the network, or algorithms, to derive location information. Examples of location-related measurement data are radio signal strength or timing measurements, and Ethernet switch and port identifiers. When a LIS generates location information for a Device, information from the Device can improve the accuracy of the location estimate. A basic set of location-related measurements is defined, including common modes of network attachment as well as assisted Global Navigation Satellite System (GNSS) parameters... GNSS is any satellite-based system that provides positioning and time information, for example the US Global Positioning System (GPS) or the European Galileo system. Location-Related Measurements in LCPs: the document defines a standard container for the conveyance of location-related measurement parameters in LCPs. This is an XML container that identifies parameters by type and allows the Device to provide the results of any measurement it is able to perform. A set of measurement schemas that can be carried in the generic container is also defined.
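To make the container idea concrete, here is a hypothetical measurement document in the spirit of the draft: an XML wrapper identifying a measurement by type (a WiFi signal-strength observation). The element names and the namespace below are illustrative placeholders, not copied from the draft's schemas:

```python
# Hypothetical location-related measurement container (names and namespace
# are invented for illustration; the draft defines its own schemas).
import xml.etree.ElementTree as ET

NS = "{urn:example:measurements}"
doc = """<measurements xmlns="urn:example:measurements"
                       time="2008-11-25T10:00:00Z">
  <wifi>
    <ap mac="00:24:17:aa:bb:cc">
      <rssi>-61</rssi>   <!-- received signal strength, dBm -->
    </ap>
  </wifi>
</measurements>"""

root = ET.fromstring(doc)
rssi = root.find(".//" + NS + "rssi")
print(rssi.text.strip())   # -61
```

A LIS receiving such a container would match the measurement type against its knowledge of the network (here, known access-point positions) to derive a location estimate.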
XML Heaven or XML Hell? Why Anti-XML Sentiment Is Misguided.
Duncan Mills, SYS-CON Enterprise Open Source Magazine
The introduction to one of the current crop of open source Web application frameworks says: "With proper markup/logic separation, a POJO data model, and a refreshing lack of XML..." Where does this distaste for XML in the Java framework development community come from? Several factors immediately spring to mind. First of all, XML's lack of tooling support is a significant issue. Let's use the XML-driven framework Apache Struts as an example. Though popular, Struts is often held up as an illustration of how XML-configured frameworks are bad (although many making this claim are competitors of Struts). In reality, I think the problem has nothing to do with XML; rather it's the fact that the Struts page-flow metadata represents a series of complex relationships. Of course the raw XML that describes these relationships is hard to understand, but that's not really a problem with the XML itself. Any textual description of a page flow, in code or metadata, is hard to understand... Overuse also adds to the negative perception of XML. While great for describing complex relationships, XML can be viewed as overkill for simple framework configuration. Enterprise JavaBeans (EJB) are often held up as an example of how badly XML can be overused. Interestingly, one of the key features of the latest version of the EJB specification (3.0) addresses the issue head-on: although XML metadata is still a valid option for describing the entities, code annotations and "configuration by exception" are the norm for describing simpler relationships. XML is only used for the corner cases... The vision of XML metadata as the primary code artifact is an exciting and compelling one, but there is a slight hitch. One of the biggest weaknesses of XML-driven frameworks today is debugging. Developers are insulated from the gory details of the framework implementation by the metadata until something goes wrong.
Invariably, as soon as you try to debug such a framework you're thrown headlong into a huge stack of mystifying data structures that bear little or no resemblance to that nice clean representation you saw in the design time XML. This is the next great problem for framework developers to solve.
UBL 2.0 International Data Dictionary
Roberto Cisternino, Yukinori Saito, Oriol Bausa Peris (eds), OASIS PRD
OASIS announced that members of the Universal Business Language (UBL) Technical Committee have released a Committee Draft of the "UBL 2.0 International Data Dictionary" for Public Review. The review period ends January 06, 2009. Comment from potential users, developers, and all others is invited for the sake of improving the interoperability and quality of OASIS work. UBL defines standard XML representations of common business documents such as purchase orders, invoices, and shipping notices. UBL 1.0, released as an OASIS Standard in November 2004, normatively defines over 600 standard business terms (represented as XML element names) that serve as the basis for eight basic standard XML business document types. These English-language names and their corresponding definitions constitute the UBL 1.0 data dictionary—not a separate publication, but simply a label for the collection of all the element names and definitions contained in the UBL 1.0 data model spreadsheets and in the XML schemas generated from these data models. As an informational aid for UBL users, UBL localization subcommittees subsequently translated all of the UBL 1.0 definitions into Chinese (traditional and simplified), Japanese, Korean, Spanish, and Italian. These translations were published in a single merged spreadsheet called the UBL 1.0 International Data Dictionary (IDD). With input from a number of government agencies in Europe and Asia, UBL 2.0, released as an OASIS Standard in December 2006, greatly expanded both the UBL document set and the library of UBL information items upon which they are based, adding terms and document types to support a broad range of additional functions needed for government procurement. The resulting UBL 2.0 data dictionary contains more than 1900 standard element names and their English-language definitions. The UBL localization subcommittees are now in the process of translating this much larger collection. 
As the translation effort is expected to take some time, the UBL Technical Committee is releasing the UBL 2.0 IDD in stages as the localization subcommittees complete their work in order to begin the public review that is an integral part of the OASIS specification process. The English-language data definitions contained in UBL 2.0 (as updated by Errata 01) are normative. The translated definitions, on the other hand, are informative; they exist as aids in understanding the English-language definitions, not as replacements for those definitions.
See also: the announcement
'Geneva' SAML Interop—With a Lot of Help from Our Friends
Don Schmidt, Blog
Response to the unveiling of 'Geneva' at the PDC last week has been outstanding... 'Geneva' comprises a rich claims-based access feature set that will deliver on much of the promise of the Identity Metasystem vision. I just want to focus on the SAML 2.0 protocol support in this article. In the hope of being able to submit 'Geneva' Server to the Liberty Alliance interoperability testing program, Liberty Interoperable, we have targeted the IdP Lite and SP Lite Operational Modes from the SAML 2.0 Conformance Requirements specification, plus the GSA Profile which is referenced by many governments around the world. That is still a lot of functionality and we had to determine what customers really needed so we could prioritize our development process accordingly. We have been working with customers and other vendors for over a year to determine what features of the SAML 2.0 protocol are most commonly deployed. They unanimously agreed that the Web SSO Profile is what matters most. Based on actual customer deployments—augmented by extensive consultation with experts from the Shibboleth community, and precious insights from other vendors, including IBM, Ping Identity, SAP and Sun Microsystems—the SAML 2.0 feature prioritization for 'Geneva' Server looks like this (in descending order). (1) Web SSO AuthnRequest: HTTP redirect; (2) Web SSO Response: HTTP POST; (3) Identity Provider Discovery: Cookie; (4) Web SSO Response: HTTP Artifact; (5) Artifact Resolution: SOAP; (6) Single Logout (IdP-initiated): HTTP redirect; (7) Single Logout (SP-initiated): HTTP redirect; (8) Enhanced Client/Proxy SSO: PAOS... We got a Lot of Help from Our Friends. We owe a huge debt of gratitude to the Shibboleth community (Scott Cantor from The Ohio State University, and Jim Fox from the University of Washington, in particular), IBM (Tony Nadalin, Shane Weeden, Neil Readshaw) and Ping Identity (Patrick Harding, Tom Doyle, Pasha Beneson). 
We would not have finished the 'Geneva' beta in time for the PDC without this incredible outpouring of help from the community. On behalf of Microsoft I extend our heartfelt gratitude. I guess this shows that the Identity Metasystem is bringing people together, as well as technologies.
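Item (1) in the priority list above, the Web SSO AuthnRequest over HTTP redirect, is defined by the SAML 2.0 HTTP-Redirect binding: the request XML is DEFLATE-compressed, base64-encoded, and URL-encoded into a SAMLRequest query parameter. A sketch of that encoding (the AuthnRequest fragment and the IdP URL are minimal invented examples, not 'Geneva' output):

```python
# SAML 2.0 HTTP-Redirect binding encoding sketch: DEFLATE + base64 + URL-encode.
import base64, urllib.parse, zlib

# Minimal illustrative AuthnRequest (real requests carry Issuer, Destination, etc.).
authn_request = ('<samlp:AuthnRequest '
                 'xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
                 'ID="_abc123" Version="2.0" '
                 'IssueInstant="2008-11-25T10:00:00Z"/>')

# The binding requires *raw* DEFLATE: strip zlib's 2-byte header and 4-byte checksum.
raw_deflate = zlib.compress(authn_request.encode("utf-8"))[2:-4]
saml_request = urllib.parse.quote(base64.b64encode(raw_deflate))
redirect_url = "https://idp.example.org/sso?SAMLRequest=" + saml_request

# Round-trip as the IdP would: URL-decode, base64-decode, raw-inflate (wbits=-15).
decoded = zlib.decompress(
    base64.b64decode(urllib.parse.unquote(saml_request)), -15)
print(decoded.decode("utf-8") == authn_request)   # True
```

Item (2), the Response over HTTP POST, uses plain base64 in a form field instead of DEFLATE, which is why the two bindings are counted separately in conformance testing.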
See also: the 'Geneva' news story
DCMI Drafts for Application Profiles and Core Metadata Interoperability
Karen Coyle, Thomas Baker, Mikael Nilsson (et al., eds) DCMI Working Drafts
The Dublin Core Metadata Initiative (DCMI) has announced the publication of two new Working Drafts, with an invitation for public comment through December 01, 2008. DCMI is an organization dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for describing resources that enable more intelligent information discovery systems. The specification of a core set of fifteen metadata terms (the DCMI Metadata Terms) continues to be one of the main activities of DCMI: it has been published as a DCMI Recommendation, as IETF RFC 5013, as ANSI/NISO Standard Z39.85-2007, and as ISO Standard 15836:2003. (1) "Guidelines for Dublin Core Application Profiles" explains the key components of a Dublin Core Application Profile and walks through the process of developing a profile. A Dublin Core Application Profile (DCAP) is a document (or set of documents) that specifies and describes the metadata used in a particular application. To accomplish this, a profile: describes what a community wants to accomplish with its application (Functional Requirements); characterizes the types of things described by the metadata and their relationships (Domain Model); enumerates the metadata terms to be used and the rules for their use (Description Set Profile and Usage Guidelines); and defines the machine syntax that will be used to encode the data (Syntax Guidelines and Data Formats). The new Working Draft is aimed at designers of application profiles—people who will bring together metadata terms for use in a specific context. It does not address the creation of machine-readable implementations of an application profile, nor the design of metadata applications in a broader sense. For additional technical detail the reader is pointed to further sources. (2) "Interoperability Levels for Dublin Core Metadata" discusses the design choices involved in designing applications for different types of interoperability.
At Level 1, applications use data components with shared natural-language definitions. At Level 2, data is based on the formal-semantic model of the W3C Resource Description Framework. At Level 3, data is structured as Description Sets (records). At Level 4, data content is subject to a shared set of constraints (described in a Description Set Profile). Conformance tests and examples are provided for each level... The evolving assumptions which over the past decade have led from fifteen elements to the Singapore Framework for Dublin Core Application Profiles can be captured in a layered model of interoperability. The model of levels presented here addresses the need, felt in many communities, for an appropriate terminology with which to position projects having various degrees of interoperability with Dublin Core. The intention is to provide a "ladder of interoperability", specifying the choices, costs, and benefits involved in designing applications for increased levels of interoperability. This document describes the possible levels of interoperability that a specification (or application) can have with Dublin Core metadata. These levels are helpful for determining the scope of a project that wants to be "Dublin Core-compatible" and for setting expectations for users of "Dublin Core-compatible" specifications. The levels come with simple litmus tests that serve as guidelines for determining the level of interoperability.
Sense/Net Adopts the CMIS Standard—the First in .NET World
Tamás Bíró, Sense/Net Blog
"We have developed a CMIS draft implementation in Sense/Net 6.0 Beta 2, soon to be released. It is possibly the first .NET implementation, as all supporting companies except for Microsoft are Java-based. It is surely the first open source implementation on the .NET platform... Since Sense/Net 6.0 is both an Enterprise Portal and an Enterprise Content Management System, with its own Content Repository, we wanted to showcase how easy it is to use the .NET platform, WCF, and Sense/Net 6.0 to implement the standard [see screenshot]. Our demo is a two-way implementation, because our content repository has a CMIS service interface and our portal has a CMIS client webpart (portlet). So other CMIS clients can access our contents, and our portal can aggregate content from other CMIS-compliant systems, such as next-generation SharePoint, Alfresco, and others. We are also building an online CMIS demo [see: Simple CMIS Client, Simple CMIS Aggregate Client]. The demo features two CMIS webparts. One is able to navigate the content repository; the other is able to aggregate content from two sources that you can enter. The screenshot above is the CMIS test webpart, showing the PFS root contents. The services can also be accessed; just copy the URIs from the input boxes. It even works with a simple browser, showing XML. There is no authentication, so no login is required. The source code will be available within a few days..." Related information is available from the Sense/Net Wiki: "How Does the Implementation of CMIS Work with Sense/Net 6.0?"
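The "works with a simple browser, showing XML" behavior reflects the CMIS draft's REST binding, which is built on the Atom Publishing Protocol: a client starts from the repository's AtomPub service document and discovers collection URLs from it. A sketch of that first client step, parsing a minimal invented service document (the repository URL and titles are hypothetical, not Sense/Net's actual endpoints):

```python
# Parse a minimal AtomPub service document, as a CMIS REST client would,
# to discover the repository's collection URLs. Document content is invented.
import xml.etree.ElementTree as ET

APP = "{http://www.w3.org/2007/app}"   # Atom Publishing Protocol namespace
service_doc = """<service xmlns="http://www.w3.org/2007/app">
  <workspace>
    <collection href="http://cms.example.org/cmis/root">
      <title xmlns="http://www.w3.org/2005/Atom">root collection</title>
    </collection>
  </workspace>
</service>"""

root = ET.fromstring(service_doc)
hrefs = [c.get("href") for c in root.iter(APP + "collection")]
print(hrefs)   # ['http://cms.example.org/cmis/root']
```

In a live client the service document would be fetched over HTTP from the repository, and each collection URL would then be retrieved as an Atom feed of entries (folders and documents).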
IPTC Publishes Update to the EventsML-G2 Standard
Michael Steidl, IPTC Announcement
Michael Steidl, Managing Director of the International Press Telecommunications Council, announced updates to the version 1.1 EventsML-G2 standard and supporting Catalog. The IPTC, based in London, UK, is a consortium of the world's major news agencies, news publishers and news industry vendors. It develops and maintains technical standards for improved news exchange that are used by virtually every major news organization in the world. IPTC G2-Standards are built from a set of specifications and XML components that can be shared in a modular way for maximum effectiveness. IPTC's G2-Standards, including NewsML-G2 Version 2.2 and EventsML-G2 Version 1.1, "fit into the Semantic Web initiatives of the World Wide Web Consortium, enriching content so that computers can more easily search the huge universe of news. The goal is to better help news agencies manage and distribute their massive libraries of current and archived news content, and to help customer search engines find content quickly and accurately. G2-Standards can be easily combined with IPTC's groundbreaking NewsCodes, which provide a rich suite of standard terms for describing news, to give news agencies amazing flexibility in how news can be bundled for downstream users. With widely available digital news archives now dating back to 1850 or earlier, news agencies, librarians and archivists have a special interest in the rapid searching and retrieval of news, which NewsCodes can accelerate to help drive revenue growth." The common features of the G2-Standards are: (1) A data structure to convey all kinds of news content - the News Item; (2) A data structure for packaging News Items in a structured way - the Package Item; (3) A data structure to convey persistent information worth remembering and referring to - the Concept Item; (4) A data structure to collect many concepts in a single wrapper, e.g.
to convey it as a controlled vocabulary - the Knowledge Item; (5) A wrapper to transmit one to many of the items above - the News Message. G2-Standards and the Controlled Vocabulary Catalog: The specifications of all G2-Standards provide a Catalog of controlled vocabularies used with this item. A Catalog provides a mapping of the Scheme-URI—the globally unique and unambiguous identifier for a controlled vocabulary, taking the format of a URL—to an alias-string to be used with the QCodes in G2 items.
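The Catalog mechanism just described can be sketched in a few lines: a QCode of the form alias:code is resolved by looking up the alias in the catalog and concatenating the scheme URI with the code part. The catalog entry and QCode below are illustrative examples, not copied from the IPTC Catalog:

```python
# Resolve a G2 QCode (alias:code) to a full concept URI via a catalog that
# maps scheme aliases to Scheme-URIs. Entries here are illustrative.
catalog = {
    "medtop": "http://cv.example.org/newscodes/mediatopic/",
}

def resolve_qcode(qcode: str) -> str:
    """Split the QCode at the first colon and expand the alias via the catalog."""
    alias, code = qcode.split(":", 1)
    return catalog[alias] + code

print(resolve_qcode("medtop:04000000"))
# http://cv.example.org/newscodes/mediatopic/04000000
```

Because the alias is only meaningful relative to the Catalog declared in the item, a G2 processor must load the referenced Catalog before interpreting any QCodes.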
See also: the Catalog
How to Detect XML Document Encodings with SAX and XNI
Elliotte Rusty Harold, IBM developerWorks
XML is defined in terms of Unicode characters. For transmission and storage in modern computers, those Unicode characters must be stored as bytes and decoded by the parser. A number of different encoding schemes are used for this purpose: UTF-8, UTF-16, ISO-8859-1, Cp1252, SJIS, and many others. Usually, but not quite always, you really don't care about the underlying encoding. The XML parser converts whatever the document is written in to Unicode strings and char arrays. Your program operates on those decoded strings. Most commonly, this issue comes up when you want to preserve the input encoding for output. Another case is when you want to store a document in a database as a string or a Character Large Object (CLOB) without parsing it. Similarly, some systems transmit XML documents over HTTP without fully reading them but need to set the HTTP Content-type header to indicate the proper encoding. In these cases, you need to know how the document is encoded. Much of the time, you know what the encoding is because you wrote the document. But if you didn't—for instance, if you just received the document from somewhere else, such as in an Atom feed—then the best approach is to use a streaming API such as the Simple API for XML (SAX), the Streaming API for XML (StAX), System.Xml.XmlReader, or the Xerces Native Interface (XNI). You can also use tree-based APIs such as the Document Object Model (DOM), but they read the entire document, even though the first 100 bytes or fewer are usually all you need to read to determine the encoding. A streaming API can read just as much as it needs and then abandon parsing once the answer is known. This is much more efficient... This article considers the cases when you really do care about the underlying encoding. In these rare cases where you do need to know the input encoding, SAX and XNI offer fast and efficient means of figuring it out.
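The article's implementations use Java's SAX and XNI, but the underlying detection logic (from Appendix F of the XML 1.0 specification) is language-neutral: inspect the first bytes for a byte-order mark, then for the encoding pseudo-attribute of the XML declaration, and fall back to UTF-8. A simplified sketch in Python (it skips the BOM-less UTF-16/UCS-4/EBCDIC heuristics that a full Appendix F detector also applies):

```python
# Simplified XML encoding detection: BOM first, then the XML declaration's
# encoding pseudo-attribute, else the UTF-8 default. Only the first bytes
# of the document are ever examined.
import re

BOMS = [
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

def sniff_encoding(head: bytes) -> str:
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    m = re.match(rb"<\?xml[^>]*encoding=['\"]([A-Za-z0-9._-]+)['\"]", head)
    if m:
        return m.group(1).decode("ascii")
    return "UTF-8"   # the default when there is no BOM and no declaration

print(sniff_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'))
# ISO-8859-1
```

As the article notes, a streaming parser gets the same answer without a hand-rolled sniffer, since it must perform this detection itself before it can tokenize anything.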
XML Daily Newslink and Cover Pages sponsored by:
Sun Microsystems, Inc.
XML Daily Newslink: http://xml.coverpages.org/newsletter.html
Newsletter Archive: http://xml.coverpages.org/newsletterArchive.html
Newsletter subscribe: firstname.lastname@example.org
Newsletter unsubscribe: email@example.com
Newsletter help: firstname.lastname@example.org
Cover Pages: http://xml.coverpages.org/