Cover Pages: XML Daily Newslink: Wednesday, 21 May 2008

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Sun Microsystems, Inc. http://sun.com

Headlines

Microsoft Office 2007 SP2 to Support XPS, PDF v1.5, PDF/A, and ODF v1.1
Processing Linked Web Data with XSLT
State of the Semantic Web
DITA, DocBook, and the Art of the Document
W3C Call for Implementations: XQuery and XPath Full Text 1.0
Web-based Spreadsheets with OpenOffice.org and Dojo
OASIS Open Standards Forum 2008
A Uniform Resource Identifier for Geographic Locations ('geo' URI)

Microsoft Office 2007 SP2 to Support XPS, PDF v1.5, PDF/A, and ODF v1.1
Staff, Microsoft Announcement

Microsoft announced that with the release of Microsoft Office 2007 Service Pack 2 (SP2) scheduled for the first half of 2009, the list of supported document formats will grow to include support for XML Paper Specification (XPS), Portable Document Format (PDF) 1.5, PDF/A, and Open Document Format (ODF) v1.1. "When using SP2, customers will be able to open, edit and save documents using ODF and save documents into the XPS and PDF fixed formats from directly within the application without having to install any other code. It will also allow customers to set ODF as the default file format for Office 2007. To also provide ODF support for users of earlier versions of Microsoft Office (Office XP and Office 2003), Microsoft will continue to collaborate with the open source community in the ongoing development of the Open XML-ODF translator project on SourceForge.net. In addition, Microsoft has defined a road map for its implementation of the newly ratified International Standard ISO/IEC 29500 (Office Open XML). IS29500, which was approved by the International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC) in March, is already substantially supported in Office 2007, and the company plans to update that support in the next major version release of the Microsoft Office system, code-named 'Office 14'. Consistent with its interoperability principles, in which the company committed to work with others toward robust, consistent and interoperable implementations across a broad range of widely deployed products, the company has also announced it will be an active participant in the future evolution of ODF, Open XML, XPS, and PDF standards. Microsoft will join the OASIS technical committee working on the next version of ODF and will take part in the ISO/IEC working group being formed to work on ODF maintenance. 
Microsoft employees will also take part in the ISO/IEC working group that is being formed to maintain Open XML and the ISO/IEC working group that is being formed to improve interoperability between these and other ISO/IEC-recognized document formats. The company will also be an active participant in the ongoing standardization and maintenance activities for XPS and PDF. It will also continue to work with the IT community to promote interoperability between document file formats, including Open XML and ODF, as well as Digital Accessible Information System (DAISY XML), the foundation of the globally accepted DAISY standard for reading and publishing navigable multimedia content. Microsoft is also committed to providing Office customers with the ability to open, edit and save documents in the Chinese national document file format standard, Uniform Office Format (UOF)."

Processing Linked Web Data with XSLT
Uche Ogbuji, DevX.com

The Semantic Web is a grand vision for increasing the power of the web through better expression and management of context. Semantic Web developers are building a framework to open up and connect organized information, which takes advantage of many popular developments on the web, such as the success of Wikipedia, Creative Commons-licensed publishing on sites like Flickr, and various blogs. A portion of this framework is the Linking Open Data (LOD) community initiative (seeded by the W3C Semantic Web Education and Outreach group). A goal of LOD is to weave together separate collections of open data using deep linking and RDF (Resource Description Framework) representations. The hallmark of LOD is to make it easy for web developers to create and process compatible data. Utilizing LOD calls for a broad war chest of tools and techniques that cover the diverse expertise of Web developers. One popular tool for processing data on the web is XSLT (Extensible Stylesheet Language Transformations), building on the growth of XML as a data format on the web. XSLT is not a general-purpose programming language—so it is limited in its uses—including LOD processing. However, XSLT is very useful to handle auxiliary roles in such processing that involves transforming XML. This article explores specialized areas for the use of XSLT 1.0 in LOD processing. The focus is on XSLT 1.0 (XSLT 2.0 does offer more for LOD processing, but it is far more complex and much less used by the community). XSLT 1.0 has more processors than 2.0 and the EXSLT set of community extensions, which has strong support in Firefox 3.0, provides facilities that bring it close to the power of XSLT 2.0... As for any web development, use whatever tools you prefer for Linking Open Data (LOD), but there are a few things that make XSLT attractive. For one, XSLT processing is much faster than Javascript/DOM in almost all browsers. 
Also, some web developers prefer to learn XSLT rather than other more general programming languages. By using Semantic Web technologies now, you strengthen your position as a web developer for the future. Ideally, you should feel empowered to use a combination of languages for processing, and to target each language to its greatest strength.

State of the Semantic Web
Ivan Herman, Conference Presentation

This presentation was delivered by Ivan Herman, W3C Semantic Web Activity Lead, at the 2008 Semantic Technology Conference held in San Jose, California, on May 18, 2008. The history of the Semantic Web goes back several years now. The 55-slide presentation summarizes what has been achieved, where we are, and where we are going. Ivan Herman joined the Centre for Mathematics and Computer Sciences (CWI) in Amsterdam in 1988 where he holds a tenured position. He joined the W3C Team as Head of W3C Offices in 2001 while maintaining his position at CWI. Ivan served as Head of Offices until 2006, when he was asked to take the Semantic Web Activity Lead position, which is now his principal work at W3C. As summarized in Bruno Pinheiro's blog: "[Herman] made a broad presentation of what they're focusing at the W3C, which are the discussions that are burning at the community and talked about some technologies that they are putting their bets on. As far as I saw, Dublin Core and FOAF are a common sense at the vocabulary level, as they appeared as good examples in both presentations and in every book about semantic. SPARQL is the Query Language that with RDF and OWL seems to be under the spotlight now. Ivan talked a little about an interesting project called the 'Linking Open Data Project', whose goal is to 'expose open databases in RDF', setting RDF links among data items from different databases and setting up SPARQL endpoints to query the data. One of the projects of this initiative is DBpedia: by extracting data from the 'infobox' on Wikipedia pages (right column) for a city, for example, and integrating it with the city information in the US Census database they can build a stronger and richer knowledge of that city. At this elaboration stage there are still lots of issues, but these were the ones Ivan talked about: security, trust, provenance; ontology merging, alignment, term equivalences; uncertainty.
The most important for me were the ontology merging and uncertainty. The web as we know it was built on sharing and linking documents. Now, on the Semantic wave the same concept must be applied. There's no need to build a complete new ontology on geonames, for example. Just link to an existing one and build one just for your own knowledge domain..."

See also: Bruno Pinheiro's blog

DITA, DocBook, and the Art of the Document
Kurt Cagle, O'Reilly Reviews

Structured documentation provides a level of uniformity that can then serve for reusing content from a single document source. Today that is important because such structured source documents can in turn be transformed into HTML, into PDFs, PostScript files, RTF and Microsoft Word. Such source documents can also serve to power binary help files, to provide first-level semantics for text-to-speech and VoiceXML applications and so forth - all at the same time. A consistent document language makes it possible to build transformations to import partial content into output for labels on cans or boxes, and provides a single point of authority for translation into foreign languages... DocBook and DITA both provide XML markup for describing different facets of technical documentation. DocBook actually has its origins, ironically enough, with O'Reilly & Associates as a language used to lay out narrative technical books, based primarily upon the works of Norman Walsh and Robert Stayton. DocBook was originally an SGML specification, and was one of the first non-W3C specifications to be converted to XML, with the formal specification for DocBook being then assigned to OASIS-Open as part of their documentation activity. It is used primarily for describing books, articles, research papers and (with some additions) slides, but its structured layout also makes it attractive for storing technical articles with small to moderate sized organizations. Indeed, even today, many of the books that O'Reilly produces are laid out first in DocBook... DITA, on the other hand, evolved from the Darwin Information Typing Architecture developed by IBM in order to create individual 'topics' of content -- such as those that might be used for an online documentation system. The topics in turn are organized by topic maps that establish a hierarchical structure for the topics. 
Topics in turn use a basic layout language which borrows somewhat from HTML, but extends it to include figures, examples, notes, screen displays and so forth. DITA works especially in those cases where narrative content is limited to the domain of a single topic (such as the individual entry within a help application), although efforts are underway to try to extend this to formal business documents, with mixed success. As a technology, DITA seems to work best in those situations where you're dealing with content that can be parsed into distinct chunks that have to be updated by a wide number of authors.

W3C Call for Implementations: XQuery and XPath Full Text 1.0
S. Amer-Yahia, C. Botev, S. Buxton (et al., eds), W3C Technical Report

W3C has issued a call for implementations in connection with the publication of "XQuery and XPath Full Text 1.0" as a Candidate Recommendation. This document has been jointly developed by the W3C XML Query Working Group and the W3C XSL Working Group, each of which is part of the XML Activity. It will remain a Candidate Recommendation until at least 15-September-2008, and will not be submitted for consideration as a W3C Proposed Recommendation until its four key exit criteria are met. A Test Suite for this document is under development, and implementors are encouraged to run this test suite and report their results. The editorial teams have also released Working Drafts for "XQuery and XPath Full Text 1.0 Requirements" and "XQuery and XPath Full Text 1.0 Use Cases." The CR document defines the language and the formal semantics of XQuery and XPath Full Text 1.0. Additionally, the document defines an XML syntax for XQuery and XPath Full Text 1.0. XQuery and XPath Full Text 1.0 extends the syntax and semantics of XQuery 1.0 and XPath 2.0... As XML becomes mainstream, users expect to be able to search their XML documents. This requires a standard way to do full-text search, as well as structured searches, against XML documents. A similar requirement for full-text search led ISO to define the SQL/MM-FT standard. SQL/MM-FT defines extensions to SQL to express full-text searches providing functionality similar to that defined in this full-text language extension to XQuery 1.0 and XPath 2.0. XML documents may contain highly structured data (fixed schemas, known types such as numbers, dates), semi-structured data (flexible schemas and types), markup data (text with embedded tags), and unstructured data (untagged free-flowing text). Where a document contains unstructured or semi-structured data, it is important to be able to search using Information Retrieval techniques such as scoring and weighting... 
As XQuery and XPath evolve, they may apply the notion of score to querying structured data. For example, when making travel plans or shopping for cameras, it is sometimes useful to get an ordered list of near matches in addition to exact matches. If XQuery and XPath define a generalized inexact match, we expect XQuery and XPath to utilize the scoring framework provided by XQuery and XPath Full Text.
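
The idea of returning score-ordered matches can be illustrated outside XQuery. The following is a minimal Python sketch, not XQuery and XPath Full Text syntax: it assumes a naive score (query-term occurrences divided by element length in tokens), and the function name `score_elements` and the sample catalog are hypothetical, introduced only for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def score_elements(xml_text, tag, query):
    """Return (score, text) pairs for <tag> elements, best match first.

    The score is an assumed toy measure: occurrences of the query terms
    divided by the number of tokens in the element's text.
    """
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    root = ET.fromstring(xml_text)
    results = []
    for elem in root.iter(tag):
        # Collect text from the element and any embedded markup, then
        # normalize whitespace.
        text = " ".join("".join(elem.itertext()).split())
        tokens = re.findall(r"\w+", text.lower())
        if not tokens:
            continue
        counts = Counter(tokens)
        hits = sum(counts[t] for t in terms)
        if hits:
            results.append((hits / len(tokens), text))
    # Highest score first, so near matches follow the best matches.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


# Hypothetical sample data.
CATALOG = """<catalog>
  <item>Compact digital camera with zoom lens</item>
  <item>Camera bag</item>
  <item>Tripod for <b>video</b> use</item>
</catalog>"""

ranked = score_elements(CATALOG, "item", "camera")
for score, text in ranked:
    print(f"{score:.2f}  {text}")
```

A real full-text engine would add weighting, stemming, and stop words; the sketch only shows how a score turns a set of matches into an ordered list.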

See also: the Requirements document

Web-based Spreadsheets with OpenOffice.org and Dojo
Oleg Mikheev and Doan Nguyen Van, Java World Magazine

As functionality traditionally associated with desktop applications moves to the Web, developers are looking for new ways to handle that computational heavy lifting on the server side. Many Web applications these days aim to replace a corresponding desktop application in one way or another. For instance, most Web grids and tables, such as those in Google Spreadsheets, essentially mimic desktop office spreadsheets. But if you need to create a Web-based application that behaves like an office suite, there's no need to reinvent the wheel: the open source OpenOffice.org suite can actually serve as the powerhouse behind a Web application. In this article, you'll learn how to combine OpenOffice.org and Dojo to create a simple Ajax-based spreadsheet application much like Google Spreadsheets. OpenOffice.org is a cross-platform office suite. It is based on a component model called Universal Network Objects, or UNO, which allows components to communicate across the network oblivious to the platform they run on and the language they were written in. Though it's usually thought of as a desktop application, OpenOffice.org can be also run in server mode. In this mode, OpenOffice.org listens to a network port for connections. You can connect to an OpenOffice.org server running either on a local or remote computer and use the UNO environment to work with documents. UNO libraries for both client and server modes are part of the standard OpenOffice.org distribution... The example application in the article used Dojo and its grid component as a front end. Dojo is a powerful JavaScript framework with lots of components ready to be used with almost no development effort. It also provides an AOP-like mechanism to add custom behavior to its components. The combination of OpenOffice.org and Dojo resulted in a working application resembling Google Spreadsheets and capable of displaying and editing cell values -- and all that with minimal development effort spent.

OASIS Open Standards Forum 2008
Staff, OASIS Announcement

OASIS announced that the annual OASIS European Forum will be held October 1-3, 2008 near London. The theme will focus on "Security Challenges for the Information Society." OASIS invites proposals for presentations, panel sessions, and interoperability demonstrations related to this theme. Funding for the Forum is provided by OASIS Foundational Sponsor members, BEA, IBM, Primeton, and Sun Microsystems, and by IDtrust. "Open exchange of information and access to online services also pose challenges and threats. Service providers want to authenticate the identity of individuals requesting access, and determine the resources and services they are entitled to access. Users want their identity and personal data and privacy to be protected adequately, and the confidentiality of sensitive data they are submitting to be respected. In today's Internet and in many large private network infrastructures, heterogeneity and diversity are the rule rather than the exception. Security infrastructures need open standards and interoperability to scale to the huge deployments that are being rolled out today. Some of these security standards from OASIS and other organizations support a model where identity authentication, access control, digital signature processing, encryption and key management are provided as services that can be distributed and shared. The Open Standards Forum 2008 will provide users who are evaluating or looking to deploy such security infrastructures with an opportunity to explore the state of the art in security services, standards and products. It will also provide users with an opportunity to present and share their use cases, requirements and (initial) experience with other users and with some of the leading experts in this field."

A Uniform Resource Identifier for Geographic Locations ('geo' URI)
Alexander Mayrhofer and Christian Spanring (eds), IETF Internet Draft

Members of the IETF Geographic Location/Privacy (GEOPRIV) Working Group have published an initial -00 version of the draft "Uniform Resource Identifier for Geographic Locations ('geo' URI)." The document specifies a Uniform Resource Identifier (URI) for geographic locations using the 'geo' scheme name. A 'geo' URI provides latitude, longitude and optionally altitude of a physical location in a compact, simple, human-readable, and protocol independent way... An increasing number of Internet protocols and data formats are being enriched by specifications on how to add information about geographic location to them. In most cases, latitude as well as longitude are added as attributes to existing data structures. However, all those methods are specific to a certain data format or protocol, and don't provide a generic way to protocol independent location identification. The 'geo' URI scheme is another step into that direction and aims to facilitate, support and standardize the problem of location identification in geospatial services and applications. 'Geo' URIs identify a geographic location using a textual representation of the location's spatial coordinates in either two or three dimensions (latitude, longitude, and optionally altitude). Such URIs are independent from a specific protocol, application, or data format in which they might be contained... Because the 'geo' URI is not tied to any specific protocol, and identifies a physical location rather than a network resource, most of the general security considerations on URIs do not apply. The URI syntax does make it possible to construct valid 'geo' URIs which don't identify a valid location on earth. Applications must not use URIs with such invalid values, and should warn the user when such URIs are encountered... 
The IETF Geographic Location/Privacy (GEOPRIV) Working Group, part of the Real-time Applications and Infrastructure Area activity, was chartered to assess the authorization, integrity and privacy requirements that must be met in order to transfer location information, or authorize the release or representation of such information through an agent. A goal of this working group is to deliver a specification that has broad applicability and will become mandatory to implement for IETF protocols that are location-aware. The group has produced several final RFCs.
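
The draft's point that a syntactically valid 'geo' URI may still name no valid location on earth can be made concrete with a small parser. The following Python sketch assumes the plain form `geo:<latitude>,<longitude>[,<altitude>]` with decimal-degree values and no additional parameters; the names `GeoLocation` and `parse_geo_uri` are hypothetical, and the exact grammar should be taken from the draft itself.

```python
import math
from typing import NamedTuple, Optional


class GeoLocation(NamedTuple):
    latitude: float
    longitude: float
    altitude: Optional[float] = None


def parse_geo_uri(uri: str) -> GeoLocation:
    """Parse 'geo:<lat>,<lon>[,<alt>]' and reject out-of-range coordinates.

    Assumed form only: no URI parameters are handled here.
    """
    scheme, sep, rest = uri.partition(":")
    if sep != ":" or scheme.lower() != "geo":
        raise ValueError(f"not a 'geo' URI: {uri!r}")
    parts = rest.split(",")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 2 or 3 coordinates: {uri!r}")
    try:
        values = [float(p) for p in parts]
    except ValueError:
        raise ValueError(f"non-numeric coordinate in {uri!r}") from None
    if not all(math.isfinite(v) for v in values):
        raise ValueError(f"non-finite coordinate in {uri!r}")
    lat, lon = values[0], values[1]
    # Well-formed syntax can still describe a place that does not exist.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    alt = values[2] if len(values) == 3 else None
    return GeoLocation(lat, lon, alt)


print(parse_geo_uri("geo:48.2010,16.3695,183"))
```

Raising `ValueError` for out-of-range values is one way for an application to refuse such URIs and surface a warning to the user.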

