Cover Pages: XML Daily Newslink: Friday, 02 March 2007

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
BEA Systems, Inc. http://www.bea.com

Headlines

W3C Last Call Review: GRDDL Links Microformats and Semantic Web
Web Services Transaction Version 1.1 Proposed as an OASIS Standard
OASIS Symposium 2007 Explores Advances for eBusiness and eGovernment
What Does XML Smell Like?
W3C Workshop on Declarative Models of Distributed Web Applications
Presence Authorization Rules
Secure Browsing: Web Security Experience, Indicators and Trust
Open Document Standards and Language Identification

W3C Last Call Review: GRDDL Links Microformats and Semantic Web
Dan Connolly (ed), W3C Technical Report

Members of the W3C GRDDL Working Group have released a Last Call Working Draft for the "Gleaning Resource Descriptions from Dialects of Languages (GRDDL)" specification. With important applications such as connecting microformats to the Semantic Web, GRDDL is a mechanism for extracting RDF statements from suitable XHTML and XML content using programs such as XSLT transformations. GRDDL allows powerful mash-ups at very low cost. Specifically, GRDDL defines a technique for obtaining RDF data from XML documents, and from XHTML pages in particular. Authors may explicitly associate documents with transformation algorithms, typically represented in XSLT, using a link element in the head of the document. Alternatively, the information needed to obtain the transformation may be held in an associated metadata profile document or namespace document. Clients reading the document can follow links across the Web, using techniques described in the GRDDL specification, to discover the appropriate transformations. The specification uses a number of examples from the GRDDL Use Cases document to illustrate in detail the techniques GRDDL provides for associating documents with appropriate instructions for extracting any embedded data.
There are many domain-specific languages ("dialects") in use among the many XML documents on the Web. There are dialects of XHTML, XML and RDF that are used to represent everything from poetry to prose, purchase orders to invoices, spreadsheets to databases, schemas to scripts, and linked lists to ontologies. While this breadth of expression is quite liberating, inspiring new dialects to represent information, it can be a barrier to understanding across different domains or fields. How, for example, does software discover the author of a poem, a spreadsheet and an ontology? And how can software determine whether the authors of each are in fact the same person?
By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document. Likewise, by specifying a GRDDL namespace transformation or profile transformation, the creator of that namespace or profile states that the transformation will provide a faithful RDF rendition of a class of source documents that relate to that namespace or profile. A namespace document or a profile document also provides a means for its author to explain in prose the purpose of the transformation or any policy statements.
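To make the link-element mechanism concrete, here is a minimal Python sketch (standard library only) of the discovery step a GRDDL-aware client might perform on an XHTML page: check that the head declares the GRDDL profile URI (http://www.w3.org/2003/g/data-view) and collect the href of every link whose rel includes "transformation". It covers only this one case, not namespace or profile transformations, and it does not run the XSLT. The sample page and the extract-rdf.xsl stylesheet name are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def find_transformations(xhtml_text, base_uri=""):
    """Return absolute URIs of GRDDL transformations linked from an XHTML head."""
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        return []
    # rel="transformation" only carries GRDDL meaning when the head declares the profile.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    found = []
    for link in head.findall(f"{{{XHTML_NS}}}link"):
        href = link.get("href")
        # rel is a space-separated list, so test membership rather than equality.
        if href and "transformation" in link.get("rel", "").split():
            found.append(urljoin(base_uri, href))  # resolve relative hrefs
    return found


# Hypothetical sample page; extract-rdf.xsl is a made-up stylesheet name.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Sample</title>
    <link rel="transformation" href="extract-rdf.xsl"/>
  </head>
  <body><p>Hello</p></body>
</html>"""

print(find_transformations(SAMPLE_PAGE, base_uri="http://example.org/page.html"))
```

Actually applying a discovered stylesheet needs an XSLT processor; in Python, the lxml library's etree.XSLT class is one option.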

See also: the GRDDL use cases

Web Services Transaction Version 1.1 Proposed as an OASIS Standard
Staff, OASIS Announcement

Members of the OASIS Web Services Transaction (WS-TX) Technical Committee have approved committee draft specifications for Web Services Transaction Version 1.1 and submitted the document collection for consideration as an OASIS Standard. The TC was chartered in October 2005 to define a set of protocols that coordinate the outcomes of distributed application actions, specifying an extensible framework for developing coordination protocols through continued refinement of the Web Services Coordination (WS-Coordination) Version 1.0 specification. Web Services Transaction v1.1 is a set of three specifications: WS-Coordination, WS-AtomicTransaction, and WS-BusinessActivity. The protocols in the WS-Coordination specification support a number of applications, including those that need to reach consistent agreement on the outcome of distributed activities. The specification defines a coordination context XML type that identifies a specific activity and the "coordination type" of the agreement protocol supported by the coordination context. It also defines protocols that enable an application service to create a coordination context and to register for coordination protocols. The framework enables existing transaction processing, workflow, and other systems for coordination to hide their proprietary protocols and to operate in a heterogeneous environment. The WS-AtomicTransaction specification provides the definition of the Atomic Transaction coordination type that is to be used with the extensible coordination framework described in WS-Coordination. This specification defines three specific agreement coordination protocols for the Atomic Transaction coordination type: completion, volatile two-phase commit, and durable two-phase commit. Developers can use any or all of these protocols when building applications that require consistent agreement on the outcome of short-lived distributed activities that have the all-or-nothing property.
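The all-or-nothing outcome that the two-phase commit protocols coordinate can be illustrated with a generic, in-memory Python sketch. It is a toy under stated assumptions: it does not model WS-AtomicTransaction's SOAP messages, the volatile/durable distinction, or failure recovery, and the Participant class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical participant in a two-phase commit."""
    name: str
    can_commit: bool = True
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1: vote. A "yes" vote is a promise to commit if asked to.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit only if every participant votes yes; otherwise roll all of them back."""
    # Phase 1: collect every vote (a real coordinator could stop at the first "no").
    votes = [p.prepare() for p in participants]
    # Phase 2: broadcast one uniform outcome, which gives the all-or-nothing property.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"


happy = [Participant("inventory"), Participant("billing")]
unhappy = [Participant("inventory"), Participant("billing", can_commit=False)]
print(two_phase_commit(happy), [p.state for p in happy])
print(two_phase_commit(unhappy), [p.state for p in unhappy])
```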
The WS-BusinessActivity specification provides the definition of two Business Activity coordination types, AtomicOutcome and MixedOutcome, that are to be used with the extensible coordination framework described in the WS-Coordination specification. This specification also defines two specific Business Activity agreement coordination protocols for the Business Activity coordination types: BusinessAgreementWithParticipantCompletion and BusinessAgreementWithCoordinatorCompletion. Developers can use these protocols when building applications that require a compensation-based, consistent agreement on the outcome of long-running distributed activities.
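Compensation-based agreement differs from the atomic case: completed work is not held locked and rolled back but is undone afterwards by a compensating action. The following hypothetical Python sketch shows that general idea (often called a saga); it does not implement the WS-BusinessActivity protocols, and the step names and helper functions are illustrative only.

```python
from typing import Callable, List, Tuple

# (name, action, compensation): a hypothetical step in a long-running activity.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo already-completed work, most recent first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return log
        completed.append(step)
        log.append(f"completed:{name}")
    return log


def no_op() -> None:
    pass


def arrange_shipping() -> None:
    raise RuntimeError("no carrier available")  # simulated failure


example_log = run_with_compensation([
    ("reserve-stock", no_op, no_op),
    ("charge-card", no_op, no_op),
    ("arrange-shipping", arrange_shipping, no_op),
])
print(example_log)
```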

See also: the OASIS WS-TX TC

OASIS Symposium 2007 Explores Advances for eBusiness and eGovernment
Staff, OASIS Announcement

Open standards supporters from around the world are expected to gather in San Diego, California, 15-17 April 2007, for the fourth annual OASIS Symposium. Centered on the theme "eBusiness and Open Standards: Understanding the Facts, Fiction, and Future," sessions will examine SOA, identity management, Web services, business process, enterprise content, and information management. Presentations on OpenDocument, WS-BPEL, SAML, DITA, ebXML and other specifications will be featured. "We're not living in the standards world of the 70s, 80s, or 90s, and customers know it. They're demanding real open standards and not those where 'open' was inserted by the marketing team," said Robert Sutor, Ph.D., vice president of standards and open source at IBM. In the Symposium's keynote address, Dr. Sutor will explore the current climate for standards, how we got here, and where current actions are leading us. The OASIS Symposium will feature a Management Track of sessions on the latest technologies, applications, and services from a business perspective. A Technical Track, geared toward providing IT professionals with the most up-to-date processes, tools and techniques for practical applications and implementations, will also be offered. The event is open to the public, and members as well as non-members of OASIS are invited to participate. Anne Thomas Manes, vice president and research director at Burton Group, will lead the closing panel, "Five Years of Web Services & SOA: You Are Here." Executives from BEA Systems, EDS, IBM, SAP, and Sun Microsystems will share and debate their perspectives on the successes and failures experienced in SOA as well as the challenges and promises that remain. Hosted by the OASIS ODF Adoption Committee, a special OpenDocument Workshop will focus on the 'implementability' of using applications that comply with the OpenDocument format OASIS Standard (ISO/IEC 26300).
Leaders from Europe and the US will discuss the latest advances in OpenDocument adoption, accessibility, and programmability. A WS-BPEL Workshop will clarify the business value that WS-BPEL offers, examine common scenarios in which the specification should be applied, and explore usage of advanced constructs.

See also: the Symposium web site

What Does XML Smell Like?
Michael Day, XML.com

This article introduces a set of heuristic rules for sniffing the content of a file in order to determine whether it is an XML document or an HTML document. An implementation is provided using the xmlReader interface of libxml2. This implementation is used in Prince, a formatter for creating PDF files from web documents. Say a user agent wants to load a web document and display it, format it, process it, or whatever. It might be an XML document, containing XHTML, SVG, MathML, or a nutritious mix of these vocabularies. Or it might be an HTML document, ideally valid HTML4, but more likely an unappetizing bowl of tag soup. The problem is, how does the user agent know whether to parse the document as XML or HTML? If the document is being retrieved over the Web, then there is no problem, as the HTTP response will come with a Content-Type header that gives the MIME type of the document. This may be text/html for HTML, application/xml for XML or application/xhtml+xml for XHTML. The user agent can check the MIME type before trying to parse the document, and all is well. However, if the document is being loaded from a local file, there is no obvious way to determine if it is XML or HTML. The user agent might try checking the file extension, but what if it is .html? It is common for XHTML files to be given an extension of .html or .htm, as .xhtml is rather long and .xht is rather obscure. This means that a file with an extension of .html may actually be an XML document and require an XML parser. If such a file is parsed as HTML, the document will probably still load, but the user may not get what he expects: style sheets and scripts may behave differently, embedded SVG or MathML content will be garbled, and external entities and inclusions will not be resolved. Web user agents like Prince need a way to determine whether a .html file should be parsed as XML or HTML.
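The Content-Type check described above can be sketched in a few lines of Python. The function name is hypothetical, and treating text/xml and any +xml suffix type (such as image/svg+xml) as XML is an assumption that goes slightly beyond the three MIME types the article lists.

```python
from typing import Optional


def classify_by_mime(content_type: str) -> Optional[str]:
    """Return "html" or "xml" for a Content-Type header value, or None if it says neither."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"
    # Assumption: text/xml and any "+xml" suffix type are also parsed as XML.
    if mime in ("application/xml", "text/xml") or mime.endswith("+xml"):
        return "xml"
    return None  # unknown: fall back to sniffing the content


for header in ["text/html; charset=UTF-8", "application/xhtml+xml", "image/svg+xml", "text/plain"]:
    print(header, "->", classify_by_mime(header))
```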
In the absence of telepathy, there is no perfect algorithm to determine the intent of the author, so we will need to formulate some heuristics that can sniff the content of the document and see if it smells like XML or HTML. In Prince, the document-sniffing heuristics are implemented as a C function that uses the xmlReader interface from libxml2 to parse the document until it reaches the first start tag or one of the heuristics matches. A copiously commented version of the code, along with some sample documents to test it on, is available for download in the article's "Code" section; it compiles to a small program that sniffs files and classifies them as XML or HTML.
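For readers who want a feel for content sniffing without libxml2, here is a rough Python stand-in that applies regular expressions to the start of the text. The three rules (XML declaration, DOCTYPE public identifier, xmlns attribute on the first start tag) are assumptions chosen for illustration and are not the rule set Prince uses; the article's own implementation is a C function built on libxml2's xmlReader.

```python
import re

# Hypothetical heuristics for illustration; these are not Prince's actual rules.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
DOCTYPE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)
START_TAG = re.compile(r"<([A-Za-z][^\s/>]*)([^>]*)>")
XMLNS_ATTR = re.compile(r"\bxmlns(?::[\w.-]+)?\s*=")


def sniff(text: str) -> str:
    """Classify a document as "xml" or "html" from its first few constructs."""
    text = text.lstrip("\ufeff \t\r\n")  # skip a byte-order mark and leading whitespace
    # Rule 1: an XML declaration is a strong signal.
    if text.startswith("<?xml"):
        return "xml"
    text = COMMENT.sub("", text)  # ignore markup hidden inside comments
    # Rule 2: the DOCTYPE public identifier names XHTML or HTML.
    doctype = DOCTYPE.search(text)
    if doctype:
        decl = doctype.group(0).upper()
        if "//DTD XHTML" in decl:
            return "xml"
        if "//DTD HTML" in decl:
            return "html"
    # Rule 3: a namespace declaration on the first start tag suggests XML.
    tag = START_TAG.search(text)
    if tag and XMLNS_ATTR.search(tag.group(2)):
        return "xml"
    # Default: treat anything else as (possibly messy) HTML.
    return "html"


print(sniff('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "x"><html>'))
print(sniff("<html><body><p>tag soup"))
```

Regular expressions keep the sketch self-contained, but they miss edge cases (CDATA sections, attribute values containing ">") that a parser-based approach like the article's handles.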

W3C Workshop on Declarative Models of Distributed Web Applications
Staff, W3C Announcement

A Call for Participation has been issued in connection with a W3C Workshop on Declarative Models of Distributed Web Applications: "Describing User Interaction in Multi-Device Applications from an End-To-End Perspective." The Workshop will be held 5-6 June 2007 in Dublin, Ireland, hosted by MobileAware with the support of the Irish State Development Agency, Enterprise Ireland. W3C membership is not required in order to participate in the Workshop; position papers are due 17 April 2007. This Workshop will help the W3C community determine what steps it can take in this area, including the possible scope of W3C Recommendations. The main aim of this workshop is to look at the potential for applying declarative techniques to describing Web applications as a whole, rather than just the markup downloaded to each device. Today, server-side scripts are used extensively to generate client-side markup on the fly, and the cost of developing and maintaining these scripts represents an opportunity for declarative approaches. The emergence of XML databases and XQuery looks promising. Likewise, the Semantic Web can be applied to descriptions (e.g., of device capabilities and access control) and to reasoning over them. Security and usability are key themes for realizing the potential for new kinds of Web applications, particularly those involving richer access to device capabilities and to personal or confidential information. Another angle is the emergence of distributed applications and the potential for remotely controlling devices and user interfaces through the remote exchange of events. Participants will have the opportunity to discuss application modeling, security and usability for distributed applications running on network devices. More and more devices have some kind of networking capability. W3C has hitherto focused on model-based approaches with user interface languages such as DIAL.
In principle, model-based approaches can be combined with other techniques such as dialog- and goal-based formalisms to describe Ubiquitous Web applications as a whole, rather than just the portions that run on particular devices. In a world of distributed applications, can we work toward common declarative languages for both user interaction and application logic? State transitions, for example, can be written in SCXML to describe application flow, as well as the behavior of individual devices in applications where multiple devices are loosely coupled via events.
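To show what describing application flow declaratively can look like, here is a small Python sketch that loads a flat SCXML state chart with ElementTree and drives it with events. It handles only top-level state elements and exact event names; real SCXML adds hierarchical and parallel states, data models, and event-name prefix matching, none of which is modeled here. The media-player chart is a hypothetical example.

```python
import xml.etree.ElementTree as ET

SCXML = "{http://www.w3.org/2005/07/scxml}"

# Hypothetical chart for a media player whose controls may live on another device.
PLAYER_CHART = """<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="idle">
  <state id="idle">
    <transition event="play" target="playing"/>
  </state>
  <state id="playing">
    <transition event="pause" target="paused"/>
    <transition event="stop" target="idle"/>
  </state>
  <state id="paused">
    <transition event="play" target="playing"/>
    <transition event="stop" target="idle"/>
  </state>
</scxml>"""


def load_chart(text):
    """Return (initial state, {state id: {event: target}}) for a flat SCXML chart."""
    root = ET.fromstring(text)
    table = {}
    for state in root.findall(f"{SCXML}state"):
        table[state.get("id")] = {
            t.get("event"): t.get("target") for t in state.findall(f"{SCXML}transition")
        }
    return root.get("initial"), table


def run(text, events):
    """Apply events in order; an event with no matching transition leaves the state unchanged."""
    current, table = load_chart(text)
    for event in events:
        current = table.get(current, {}).get(event, current)
    return current


print(run(PLAYER_CHART, ["play", "pause"]))
print(run(PLAYER_CHART, ["play", "pause", "play", "stop"]))
```

Because the flow lives in the XML rather than in code, a remote device could send the same "play" and "pause" events and the chart alone decides what happens.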

See also: the Ubiquitous Web

Presence Authorization Rules
Jonathan Rosenberg (ed), IETF Internet Draft

The IESG announced that it has received a request to consider the "Presence Authorization Rules" specification as a Proposed Standard. The IESG plans to make a decision in the next few weeks and solicits final comments, to be received by 2007-03-16. The document was produced by members of the IETF SIP for Instant Messaging and Presence Leveraging Extensions (SIMPLE) Working Group. Authorization is a key function in presence systems. Authorization policies, also known as authorization rules, specify what presence information can be given to which watchers, and when. This specification defines an Extensible Markup Language (XML) document format for expressing presence authorization rules. Such a document can be manipulated by clients using the XML Configuration Access Protocol (XCAP), although other techniques are permitted. The Session Initiation Protocol (SIP) for Instant Messaging and Presence (SIMPLE) specifications allow a user, called a watcher, to subscribe to another user, called a presentity, in order to learn their presence information. This subscription is handled by a presence agent. "Common Policy: A Document Format for Expressing Privacy Preferences" specifies a framework for representing authorization policies and is applicable to systems such as geo-location and presence. This framework is used as the basis for presence authorization documents. In the framework, an authorization policy is a set of rules. Each rule contains conditions, actions, and transformations. The conditions specify when the rule applies during presence server processing. The actions element tells the server what actions to take. The transformations element indicates how the presence data is to be manipulated before being presented to the watcher and, as such, defines a privacy filtering operation. A presence authorization document can be manipulated by clients using several means.
One such mechanism is the XML Configuration Access Protocol (XCAP); this specification defines the details necessary for using XCAP to manage presence authorization documents. A presence authorization document is an XML document, formatted according to the schema defined in the specification; presence authorization documents inherit the MIME type of common policy documents, 'application/auth-policy+xml'.
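The rule structure described above (conditions, actions, transformations) can be illustrated with a simplified Python evaluator. It is a sketch under several assumptions: the namespace URIs and the sub-handling values, ordered from least to most permissive as block, confirm, polite-block, allow, are taken from the published common-policy and presence-rules RFCs (RFC 4745 and RFC 5025) and may differ in this draft; only identity/one conditions are checked; transformations are ignored; and "block" is assumed when no rule matches. The rule set itself is hypothetical.

```python
import xml.etree.ElementTree as ET

CP = "{urn:ietf:params:xml:ns:common-policy}"
PR = "{urn:ietf:params:xml:ns:pres-rules}"

# Assumed order of <sub-handling> values, least to most permissive.
SUB_HANDLING_ORDER = ["block", "confirm", "polite-block", "allow"]

# Hypothetical rule set: Bob may subscribe; everyone else needs confirmation.
RULESET = """<cp:ruleset xmlns:cp="urn:ietf:params:xml:ns:common-policy"
            xmlns:pr="urn:ietf:params:xml:ns:pres-rules">
  <cp:rule id="friends">
    <cp:conditions>
      <cp:identity><cp:one id="sip:bob@example.com"/></cp:identity>
    </cp:conditions>
    <cp:actions><pr:sub-handling>allow</pr:sub-handling></cp:actions>
    <cp:transformations/>
  </cp:rule>
  <cp:rule id="everyone-else">
    <cp:conditions/>
    <cp:actions><pr:sub-handling>confirm</pr:sub-handling></cp:actions>
    <cp:transformations/>
  </cp:rule>
</cp:ruleset>"""


def rule_matches(rule: ET.Element, watcher: str) -> bool:
    """True if the rule has no identity condition or lists the watcher in <one id=...>."""
    identity = rule.find(f"{CP}conditions/{CP}identity")
    if identity is None:
        return True
    return any(one.get("id") == watcher for one in identity.findall(f"{CP}one"))


def sub_handling(ruleset_xml: str, watcher: str) -> str:
    """Combine <sub-handling> across all matching rules, keeping the most permissive value."""
    root = ET.fromstring(ruleset_xml)
    result = "block"  # assumption: deny when no matching rule grants anything
    for rule in root.findall(f"{CP}rule"):
        if not rule_matches(rule, watcher):
            continue
        element = rule.find(f"{CP}actions/{PR}sub-handling")
        value = element.text.strip() if element is not None and element.text else None
        if value in SUB_HANDLING_ORDER and (
            SUB_HANDLING_ORDER.index(value) > SUB_HANDLING_ORDER.index(result)
        ):
            result = value
    return result


print(sub_handling(RULESET, "sip:bob@example.com"))
print(sub_handling(RULESET, "sip:carol@example.com"))
```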

Secure Browsing: Web Security Experience, Indicators and Trust
Tyler Close (ed), W3C Note

W3C's Web Security Context Working Group has released the First Public Working Draft for "Web Security Experience, Indicators and Trust: Scope and Use Cases." The WD elaborates upon the Working Group's charter to explain what the group aims to achieve, what technologies may be used, and how proposals will be evaluated, in support of the group's goal: a secure and usable interface that enables Web users to make safe trust decisions on the Web. Web user agents are now used to engage in a great variety and number of commercial and personal activities. Though the medium for these activities has changed, the potential for fraud has not. The Working Group will catalog existing presentations of security information and the corresponding user interpretations reported in user studies. It will analyze common use cases to determine what security information a user requires to proceed safely and recommend security information that should, or should not, be presented in each case. The Working Group will also recommend a set of terms, indicators and metaphors for consistent presentation of security information to users, across all web user agents. For each of these items, the Working Group will describe the intended user interpretation, as well as safe actions the user may respond with in common use cases. The WG will recommend presentation techniques that integrate the consumption of security information by the user into the normal browsing workflow. Presenting security information in a way that is typically ignored by the user is of little value. User interactions on the Web, using the HTTP and HTTPS protocols, are at the core of the Working Group's scope.
Where Web interactions involve other application-level protocols (including, e.g., SOAP or FTP), the Working Group considers these in its scope and will aim to make its recommendations applicable to them; however, the Working Group does not treat recommendations that are specific to such protocols as a goal. Use cases considered by this Working Group must involve a web user agent, operated by a human user. Any user agent that is used in a Web interaction is in scope; the range of such agents includes widely deployed web browsers, rich clients, and the web browsers found on mobile phones and other constrained devices. In all instances, the use case is only relevant to this Working Group if the presentation of security information should affect the user's interaction with the web resource.

See also: W3C Security Activity

Open Document Standards and Language Identification
Linguistic Society of America, TAC Report

Members of the LSA Technology Advisory Committee have published a letter sent to an ANSI Program Manager regarding the OOXML standard's use of language identifiers. Excerpt from letter and background: "I am writing you as President of the Linguistic Society of America (LSA), on behalf of the Executive Committee of the Society and its members. The LSA understands that the ECMA 376 Office Open XML (OOXML) standard is being proposed for adoption as an ISO/IEC standard by JTC1/SC34. The LSA has reviewed the OOXML standard in relation to use of language identifiers and requests that any ISO/IEC standard for OOXML incorporate revisions to consistently specify the use of the recommendations in IETF BCP 47 for language tags in OOXML documents. A detailed explanation follows. The LSA has reviewed the ECMA 376 Office Open XML standard in relation to internationalization and, specifically, metadata elements for language identification. As observed in Section 4.2 of SC34/N0809, WordprocessingML and DrawingML use language identifiers for each paragraph and run. The specifications for these in Section 4:2.18.51 and Section 4:5.1.12.72, however, are vague, unnecessarily inconsistent, and underrepresent the world's languages. To be specific: (1) WordprocessingML uses the simple type, ST_Lang, defined in Section 4:2.18.51, while DrawingML uses a different simple type, ST_TextLanguageID, defined in Section 4:5.1.12.72. There is no reason why these should be different... (2) The specification of ST_Lang in Section 4:2.18.51 requires values to consist of an "ISO 639-1 letter code plus a dash plus an ISO 3166-1 alpha-2 letter code". This roughly corresponds to IETF specification RFC 1766 (superseded by RFC 4646), though it is more restrictive. 
This specification has undesirable qualities: it requires an ISO 3166-1 country identifier even if one is unnecessary or even inappropriate; it does not permit important distinctions related to written form that are essential for linguistic processing, such as script or orthography conventions...; it allows reference to only the small portion of the world's languages that are supported in ISO 639-1 rather than the more comprehensive set supported in ISO 639-3. Use is, therefore, limited to roughly 200 out of some 7000 known languages. In summary, then, if ECMA 376 is considered as a proposed ISO/IEC standard, then the LSA requests that it be revised to unambiguously specify the use of the recommendations in IETF BCP 47 for language tags per the specific changes described..."
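The difference between the two approaches can be illustrated with two Python regular expressions: one for the ST_Lang pattern as the letter paraphrases it, and one for a simplified subset of the BCP 47 tag syntax (language, optional script, optional region, variants). Both check syntax only, not whether a subtag is registered, and the BCP 47 pattern deliberately omits extended language subtags, extensions, private-use and grandfathered tags.

```python
import re

# ST_Lang as paraphrased in the letter: ISO 639-1 code, dash, ISO 3166-1 alpha-2 code.
ST_LANG = re.compile(r"^[A-Za-z]{2}-[A-Za-z]{2}$")

# Simplified subset of the BCP 47 (RFC 4646) langtag syntax:
# language[-script][-region][-variant...]; extlang, extensions,
# private-use and grandfathered tags are left out.
BCP47_SUBSET = re.compile(
    r"^(?:[A-Za-z]{2,3}|[A-Za-z]{5,8})"                 # primary language
    r"(?:-[A-Za-z]{4})?"                                # script, e.g. Latn
    r"(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"                   # region, e.g. RS or 419
    r"(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*$"  # variants, e.g. 1996
)

EXAMPLES = ["en-US", "de", "sr-Latn-RS", "yue-HK", "es-419"]
for tag in EXAMPLES:
    print(f"{tag:12} ST_Lang={bool(ST_LANG.match(tag))} BCP47={bool(BCP47_SUBSET.match(tag))}")
```

Only "en-US" passes the ST_Lang pattern, while the BCP 47 subset also accepts a bare language ("de"), a script subtag ("Latn"), a three-letter language code ("yue") and a numeric region ("419"), which are the distinctions the letter asks for.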

