Cover Pages: XML Daily Newslink: Wednesday, 16 January 2008

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Sun Microsystems, Inc. http://sun.com

Headlines

Helping Dolphins Fly: Sun Acquires MySQL
Interoperability for Searching Learning Object Repositories: The ProLearn Query Language
Creating Preservation-Ready Web Resources
Thoughts on Firefox 3.0
W3C Team Submission on N3 and Turtle
MuleSource Readies Open Source SOA Governance
Dairy Company Lends Insight Into Wal-Mart's RFID Mandate

Cover Pages

W3C Publishes SPARQL Protocol and RDF Query Language Semantic Web Standard

Helping Dolphins Fly: Sun Acquires MySQL
Jonathan Schwartz, Blog

We [Sun Microsystems] announced big news today: our preliminary results for our fiscal second quarter, and as importantly, that we're putting a billion dollars behind the "M" in LAMP. If you're an industry insider, you'll know what that means: we're acquiring MySQL AB, the company behind MySQL, the world's most popular open source database. MySQL is by far the most popular platform on which modern developers are creating network services. From Facebook, Google and Sina.com to banks and telecommunications companies, architects looking for performance, productivity and innovation have turned to MySQL. In high schools and college campuses, at startups, at high performance computing labs and in the Global 2000. The adoption of MySQL across the globe is nothing short of breathtaking. Over the past few years, we've distributed hundreds of millions of licenses and invested to build some of the free software world's largest communities. From Java to ZFS, Lustre to Glassfish, NetBeans to OpenOffice.org and OpenSolaris, we've been patient investors and contributors, both. Free and open software has become a way of life at Sun. MySQL has similarly driven extraordinary adoption of their community platform, with more than 100 million downloads over the past 10 years... Just as we did for Oracle in their early days, our performance engineering teams will sit (virtually) with their counterparts in MySQL and in the community, leveraging technologies such as ZFS and DTrace (which we didn't even have in the Oracle era) to ensure Sakila flies, along with the rest of the LAMP stack, from memcached and PHP, to the broader ISV community around MySQL. MySQL is already the performance leader on a variety of benchmarks. We'll make performance leadership the default for every application we can find, and on every vendor's hardware platforms, not just Sun's: on Linux, Solaris, Windows, all. For the technically oriented, Falcon will absolutely sing on Niagara...

See also: Tim O'Reilly's Blog

Interoperability for Searching Learning Object Repositories: The ProLearn Query Language
Stefaan Ternier, David Massart, Alessandro Campi, Sam Guinea, Stefano Ceri, and Erik Duval; D-Lib Magazine

The "ProLearn Query Language" is a query language developed for repositories of learning objects. The most relevant standards for describing learning objects are LOM, Dublin Core, and MPEG-7. The IEEE Learning Object Metadata (LOM) is a hierarchical metadata standard usually encoded in XML, published by the IEEE in 2002. Its purpose is to enable the description of learning objects through attributes that include the type of object, author, owner, terms of distribution, and format, as well as pedagogical attributes, such as typical learning time or interaction style. LOM is based on early work in ARIADNE and IMS. Dublin Core (DC) is a standard for generic resource descriptions. The simple DC metadata element set consists of 15 elements: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, and rights. MPEG-7 is an ISO/IEC standard for describing multimedia content. MPEG-7 Multimedia Description Schemes (DSs) are metadata structures in XML that facilitate searching, indexing, filtering, and access. PLQL is primarily a query interchange format, used by source applications (or PLQL clients) for querying repositories (or PLQL servers). In this article, we give a precise description of the semantics of PLQL, concerning both kinds of clauses (exact and approximate) and their mutual relationship, and describe two experimentation efforts around PLQL: one involving the ARIADNE repository and the other the EUN Learning Resource Exchange initiative. We present PLQL as an emerging standard for querying worldwide, heterogeneous learning object repositories. This work has followed on the heels of previous achievements in defining SQI (the Simple Query Interface), a unified access point to distributed and heterogeneous repositories. PLQL is a "query interchange format" accommodating the great diversity of the different repositories and their capabilities.
PLQL combines exact and approximate search, and supports queries on XML-based hierarchical metadata. PLQL was designed in a way that allows for easy mapping to various paradigms for metadata management. So far, we have mapped PLQL to Lucene, XML Query (XQuery), and SQL. Apache Lucene is an important deployment context for PLQL as this open source toolkit for text indexing and searching is widespread and easy to use. XQuery is the W3C recommendation for querying XML data; with this mapping, all PLQL levels can be mapped and executed on XML database systems.

See also: the Wiki

Creating Preservation-Ready Web Resources
Joan A. Smith and Michael L. Nelson, D-Lib Magazine

Preservation is an ongoing challenge for digital libraries, but even more so for the World Wide Web. While archivists may understand web sites, webmasters typically know little about preservation models, metadata, and methods. From the webmaster's point of view, the ideal solution would be a tool installed on the web server which manages itself, which automatically provides the "extra information" (i.e., metadata) that the archiving site needs to prepare the website for preservation, and which does not impact the normal operation of the web server. We propose a simple model for such everyday web sites which takes advantage of the web server itself to help prepare the site's resources for preservation. This is accomplished by having metadata utilities analyze the resource at the time of dissemination. The web server responds to the archiving repository crawler by sending both the resource and the just-in-time generated metadata as a straightforward XML-formatted response. We call this complex object (resource + metadata) a CRATE. In this paper we discuss mod_oai, the web server module we developed to support this approach, and we describe the process of harvesting preservation-ready resources using this technique... How can metadata be derived for web resources? Several tools have been developed in recent years that can be used to analyze a web resource. The limitations of MIME typing as currently implemented by web servers have led to projects like the Global Digital Format Registry (GDFR) and Pronom's DROID tool, which provide a deeper introspection of the resource's format. Once the format type is known and described, additional utilities can extract information like keywords and subject matter, or derive an abstract from text content. JHOVE, a joint project of JSTOR and the Harvard University Library, can identify, validate and characterize a number of file types including images (JPEG, GIF, PNG, etc.), text (HTML, XML), and PDF documents...
A CRATE consists entirely of XML-formatted, plain ASCII (human-readable) content. The concept calls for the disseminating web server to preprocess the resources it serves up by using metadata-generation utilities and to serialize this information together with the Base64-encoded resource in a simple XML-formatted complex object response, using the Apache mod_oai web server module. [As to the] content-length of the full response, XML files can grow very large, particularly where images are concerned; there are several mechanisms for dealing with this issue.
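
The packaging idea above can be sketched in a few lines of Python. This is only an illustration of the resource-plus-metadata pattern, not the mod_oai response format: the element names (crate, metadata, mimeType, byteLength, resource) and both function names are hypothetical, and a MIME-type guess stands in for the real metadata utilities.

```python
import base64
import mimetypes
import xml.etree.ElementTree as ET


def build_crate(name: str, content: bytes) -> str:
    """Wrap a resource and some generated metadata in one XML document."""
    crate = ET.Element("crate")  # hypothetical element names throughout

    # Stand-in for the metadata utilities: guess the MIME type, record the size.
    mime_type, _ = mimetypes.guess_type(name)
    metadata = ET.SubElement(crate, "metadata")
    ET.SubElement(metadata, "mimeType").text = mime_type or "application/octet-stream"
    ET.SubElement(metadata, "byteLength").text = str(len(content))

    # Base64 keeps binary content safe inside plain-ASCII XML.
    resource = ET.SubElement(crate, "resource", {"name": name, "encoding": "base64"})
    resource.text = base64.b64encode(content).decode("ascii")

    return ET.tostring(crate, encoding="unicode")


def extract_resource(crate_xml: str) -> bytes:
    """Recover the original bytes from a CRATE-like document."""
    root = ET.fromstring(crate_xml)
    return base64.b64decode(root.find("resource").text or "")


crate_xml = build_crate("logo.png", b"\x89PNG\r\n")
original = extract_resource(crate_xml)
```

Base64 encoding inflates the payload by roughly one third, which is one reason responses carrying images can grow large.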

Thoughts on Firefox 3.0
Kurt Cagle, O'Reilly Reviews

The second beta version of Firefox Version 3.0 is now out, and I have to say that overall I'm feeling quite pleased with what I'm seeing, with a few caveats. As to DOM: the changes coming to this version with JavaScript are likely to engender some headaches in the short-term, especially for big AJAX libraries, but longer term will prove fairly beneficial... Certain parts of the HTML 5/WHATWG efforts are also making their way into Firefox. Among them is the introduction of activeElement and hasFocus. The activeElement property (on the document) gives a pointer to the element that currently has the focus, or the body element if nothing else has it within the document. Similarly, hasFocus (a method on the document) reports whether the document currently has the focus... Detecting and Working with Offline Applications: As web applications have become more complex and robust, they have also become far more sensitive to needing to know when they are in fact online or offline. This is especially true of AJAX applications that often 'die' at the worst possible time because you've lost a wireless signal to the Internet. Firefox 3 implements a new read-only property (navigator.onLine) and two new events ('online' and 'offline') which make it possible to detect when such changes occur... One of the central changes for CSS in Firefox is the adoption of the W3C Cascading StyleSheet Object Model (CSSOM) working draft, which provides a comprehensive mechanism for enabling or disabling individual stylesheets, and for manipulating stylesheets from DOM. CSS changes at the individual rule and property level are pretty impressive as well: (1) Support for inline-block and inline-table. This is an extraordinarily welcome addition (especially for XForms developers), because it makes it possible to take a span within a div element and set the height, width, text-alignment and so forth.
(2) rgba() and hsla(): these CSS functions let you specify either red/green/blue/alpha values or hue/saturation/lightness/alpha values for the CSS color property. (3) text-rendering: this property determines the quality of the text output compared to the speed of rendering. (4) Soft hyphens and tabs: soft hyphens (indicated by the &shy; entity) will force a word to break at the hyphen if it's at the edge of a block of text, while hard (default) hyphens generally force the word onto the next line; tabs in monospaced fonts are also now rendered more accurately, rather than just being treated as generic white space.
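
To make the relationship between the two color notations concrete, here is a small Python sketch that converts hsla()-style arguments to rgba()-style values using the standard colorsys module. The function name hsla_to_rgba is made up for illustration, and the rounding to 0-255 channels is an assumption of this sketch rather than a description of how Firefox computes colors.

```python
import colorsys


def hsla_to_rgba(hue_deg, saturation_pct, lightness_pct, alpha):
    """Convert CSS-style hsla() arguments to (r, g, b, a) with 0-255 color channels."""
    # colorsys expects hue, lightness, saturation (in that order), each in 0..1.
    r, g, b = colorsys.hls_to_rgb(
        (hue_deg % 360) / 360.0,
        lightness_pct / 100.0,
        saturation_pct / 100.0,
    )
    # Alpha is passed through unchanged; only the color channels are scaled.
    return (round(r * 255), round(g * 255), round(b * 255), alpha)


half_transparent_red = hsla_to_rgba(0, 100, 50, 0.5)   # same color as rgba(255, 0, 0, 0.5)
opaque_green = hsla_to_rgba(120, 100, 50, 1.0)         # same color as rgba(0, 255, 0, 1.0)
```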

W3C Team Submission on N3 and Turtle
Ivan Herman, Blog

As provided in the W3C Process Document, the W3C Team may request that the Director publish information at the W3C Web site. At the Director's discretion, these documents are published as "Team Submissions". These documents are analogous to Member Submissions (e.g., in expected scope). However, Team Submissions are not part of the Recommendation Track process, and there is no additional Team comment. W3C recently announced the release of new versions of the RDF N3 and the RDF Turtle serialization formats, co-authored by Tim Berners-Lee and Dan Connolly for the former, and David Beckett and Tim Berners-Lee for the latter. The new versions also eliminate some minor incompatibilities between these languages and the SPARQL pattern language. The authors have also submitted the 'text/n3' and 'text/turtle' media types to IETF. "Turtle: Terse RDF Triple Language" defines a textual syntax for RDF called 'Turtle' that allows RDF graphs to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes. Turtle provides levels of compatibility with the existing N-Triples and Notation 3 formats as well as the triple pattern syntax of the SPARQL Recommendation. The "Notation3 (N3): A Readable RDF Syntax" specification defines Notation 3 (also known as N3), an assertion and logic language which is a superset of RDF. N3 extends the RDF datamodel by adding formulae (literals which are graphs themselves), variables, logical implication, and functional predicates, as well as providing a textual syntax alternative to RDF/XML.

See also: W3C Team Submissions

MuleSource Readies Open Source SOA Governance
Paul Krill, InfoWorld

Branching out in the SOA space, MuleSource has introduced its Mule Galaxy software, an open source SOA governance platform with an integrated registry and repository. The company also refreshed its Mule open source ESB (enterprise service bus) and is offering Mule Saturn, a lightweight BAM (business activity monitoring) tool that works with the ESB. Working with Mule software or as a stand-alone component in an SOA infrastructure, Galaxy features a RESTful Atom Pub interface to simplify integration with frameworks such as Apache CXF and Windows Communication Foundation, MuleSource said. Support for artifact types is provided for Mule configuration, WSDL (Web Services Description Language), policies, and custom artifacts. Enterprises can set their own policies. According to the announcement: "MuleSource, the leading provider of open source service oriented architecture (SOA) infrastructure software, today announced the Beta release of Mule Saturn 1.0, a lightweight business activity monitoring tool for business processes and workflow. Saturn is designed to complement an SOA infrastructure by providing detailed logging and reporting on every transaction that flows through the Mule Enterprise Service Bus (ESB). It is one of the first new tools to be featured in today's release of Mule 1.5 Enterprise Edition. The Mule ESB is the core component of SOA, and enables a broad range of integration scenarios, moving and managing data between systems inside and outside of the firewall. Saturn is designed to complement the SOA infrastructure by providing detailed logging and reporting on every transaction that flows through the Mule ESB. By aggregating real-time transactional data, Saturn analyzes the complete flow from one system to another and captures the relevant information to determine the state."
The blog from Anne Thomas Manes ("MuleSource Releases Another RESTful Open Source Registry Repository") notes that "at this point, three vendors provide fully RESTful repositories: MuleSource, WSO2, and HP Systinet..."

See also: the announcement

Dairy Company Lends Insight Into Wal-Mart's RFID Mandate
Mary Hayes Weier, InformationWeek

After its initial partnership back in 2004, Daisy Brands decided to tag all of its pallets no matter where they're heading. Daisy Brands, which sells its sour cream and cottage cheese through retail stores worldwide, joined Wal-Mart's RFID mandate early on to avoid the rush of companies clamoring for help with RFID products, certification and services. While others have hesitated, Daisy says its investment in RFID has been a boon, helping Daisy better manage the flow of its perishable products through Wal-Mart stores and ensure marketing promotions proceed as planned, according to Kevin Brown, Daisy's information systems manager. It also lets Daisy's other customers -- including those who don't use RFID -- better track their orders. In 2003, Wal-Mart announced 100 top suppliers would launch its initial RFID effort. Daisy was among another 30-some companies that also volunteered... Using Wal-Mart's Retail Link Web site for suppliers, Brown can track, by lot number, how quickly pallets of product make it to stores and when they're unpacked (Wal-Mart has readers at its dock entrances and on its cardboard case compactors), and when products pass through a store's point-of-sale system based on their bar codes. Daisy's own ERP systems contain production and expiration information on all cases and pallets shipped. If product is moving too slowly, indicating a potential issue with freshness, Daisy can dispatch someone to a store to investigate. The information also provides Daisy with insight about trends and behaviors among different types of stores. RFID is far superior to bar codes, Brown said, because it doesn't require a line of sight from a reader. Brown is also using the information to track promotion success.

Selected from the Cover Pages, by Robin Cover

W3C Publishes SPARQL Protocol and RDF Query Language Semantic Web Standard

SPARQL (a recursive acronym for "SPARQL Protocol and RDF Query Language," pronounced "sparkle") has been released as a standard by W3C. The three-part specification was produced by members of the RDF Data Access Working Group, which is part of the W3C Semantic Web Activity. It defines a standardized query language for RDF enabling the 'joining' of decentralized collections of RDF data. RDF (Resource Description Framework) is a directed, labeled graph data format for representing information in the Web. RDF "integrates a variety of applications from library catalogs and world-wide directories to syndication and aggregation of news, software, and content to personal collections of music, photos, and events using XML as an interchange syntax. The RDF specifications provide a lightweight ontology system to support the exchange of knowledge on the Web." As explained in the W3C announcement, SPARQL allows people to "focus on what they want to know rather than on the database technology or data format used behind the scenes to store the data. Because SPARQL queries express high-level goals, it is easier to extend them to unanticipated data sources, or even to port them to new applications." Tim Berners-Lee, W3C Director: "Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL. SPARQL makes it possible to query information from databases and other diverse sources in the wild, across the Web." Three SPARQL implementation reports accompany the prose specifications. W3C reports that fourteen implementations of SPARQL are already documented, many of which are available as open source software.
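
The 'joining' of decentralized RDF collections can be illustrated with a toy triple-pattern matcher in Python. This is not SPARQL and not any real SPARQL engine: the function names, the sample data, and the prefixed identifiers (book:1, person:ada) are all invented for the sketch, and every term is treated as a plain string. It only shows how patterns with shared variables pull together facts published by different sources.

```python
def match_pattern(pattern, triple, bindings):
    """Try to extend bindings so that pattern matches triple; return None on failure."""
    new_bindings = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable, written like SPARQL's ?name
            if new_bindings.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:  # a constant must match exactly
            return None
    return new_bindings


def query(triples, patterns):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Two hypothetical, independently published collections of triples.
library_catalog = [
    ("book:1", "dc:creator", "person:ada"),
    ("book:1", "dc:title", "Notes on the Engine"),
]
staff_directory = [("person:ada", "foaf:name", "Ada Lovelace")]

# The shared identifier person:ada is what lets the two collections be joined.
merged = library_catalog + staff_directory
results = query(merged, [
    ("?book", "dc:creator", "?who"),
    ("?book", "dc:title", "?title"),
    ("?who", "foaf:name", "?name"),
])
```

Here the single solution binds ?title from the catalog and ?name from the directory, which neither collection could answer on its own.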

