This issue of XML Daily Newslink is sponsored by:
SAP AG http://www.sap.com
- World Wide Web Consortium Lists: 400,000 Emails
- The Australian METS Profile: A Journey about Metadata
- Yahoo Search Takes Aim at Semantic Web
- Speech Synthesis Markup Language (SSML) Version 1.1
- Eclipse to Stress Component, Runtime Efforts
- Eclipse at eBay, Part 1: Tailoring Eclipse to the eBay Architecture
- Sun Unveils NetBeans 6.1 Beta
- OpenAjax Adds Security, Mobility To Web 2.0 Apps
- HTTP Header Linking
- Web Creator Rejects Net Tracking
World Wide Web Consortium Lists: 400,000 Emails
Jason Hunter, Blog
HTML 4.0, XML, PNG, CSS, DOM, and XQuery: these are but a few of the technologies to come out of the World Wide Web Consortium, commonly referred to as the W3C. Mark Logic Corporation is proud to announce that MarkMail has loaded the full W3C public mailing lists. MarkMail in fact uses all of those W3C technologies. The W3C mailing list archives start in 1994 and cover 400,000 emails across 200 mailing lists. MarkMail is a free service for searching mailing list archives, with significant advantages over traditional search engines. It is powered by MarkLogic Server, a commercial enterprise-class XML Content Server built to load, query, manipulate, and render large amounts of XML using the W3C's XQuery language: each email is stored internally as an XML document and accessed using XQuery. All searches, faceted navigation, analytic calculations, and HTML page renderings are performed on a single MarkLogic Server machine running against millions of messages. MarkMail lets you search millions of emails across thousands of mailing lists. You can search using keywords as well as "from:", "subject:", "extension:", and "list:" constraints. The GUI doesn't yet expose it, but you can negate any search term, as in "-subject:jira". The subdomains are list-constraining, so "tomcat.markmail.org" searches the Tomcat lists. The "n" and "p" keyboard shortcuts navigate the search results.
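The semantics of those query constraints can be illustrated with a toy filter. This is a hedged sketch of how "from:"/"subject:" prefixes and "-" negation behave, not MarkMail's actual implementation, and the sample emails are invented:

```python
# Toy illustration of MarkMail-style query constraints:
# field prefixes ("from:", "subject:", "list:") and "-" negation.

def matches(email, query):
    """Return True if the email dict satisfies every term in the query."""
    for term in query.split():
        negated = term.startswith("-")
        term = term.lstrip("-")
        if ":" in term:
            field, _, value = term.partition(":")
            hit = value.lower() in str(email.get(field, "")).lower()
        else:  # a bare keyword searches the body
            hit = term.lower() in email.get("body", "").lower()
        if hit == negated:  # plain terms must hit, negated terms must miss
            return False
    return True

emails = [
    {"from": "alice@w3.org", "subject": "jira update", "body": "see ticket"},
    {"from": "bob@w3.org", "subject": "XQuery draft", "body": "new draft posted"},
]
results = [e for e in emails if matches(e, "from:w3.org -subject:jira")]
```

Here the query keeps only the second email: both match "from:w3.org", but the first is excluded by "-subject:jira".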
See also: the MarkMail W3C public mailing lists
The Australian METS Profile: A Journey about Metadata
Judith Pearce (et al.), D-Lib Magazine
In December 2007 the National Library of Australia registered an Australian METS Profile with the Library of Congress. This profile describes the rules and requirements for using the Metadata Encoding and Transmission Standard (METS) to support the collection of and access to content in Australian digital repositories. METS is a framework standard that enables metadata describing an object and its structure to be recorded in a document that can be used as a Submission Information Package (SIP) or Dissemination Information Package (DIP) in digital object management and delivery scenarios. It is extensible by plugging in various other extension schemas such as MODS (Metadata Object Description Schema) for resource description, MIX (Metadata for Images in XML) for still image technical metadata and PREMIS (PREservation Metadata Implementation Strategies) for provenance and fixity. The aim of this article is to describe our journey towards a generic Australian METS profile that can be used across multiple domains and usage scenarios. It also describes how the main profile and the sub-profile work together and what additional profiling work is planned by the National Library of Australia and its partners to address the needs of the Australian repository community and (hopefully) of the international community as well. The Journal Workflow project focussed on the use case of preserving access to an on-line journal created via the Public Knowledge Project (PKP) Open Journal System (OJS) application. The Submission Service takes packaged content (OJS Native XML), performs pre-ingestion processing over it (transform OJS Native XML into a METS package) and submits it to a repository. This workflow is customisable via the ability to develop and configure localised workflow steps within the service. The METS package is unpacked by the receiving repository and stored in whatever form the repository requires.
In future, the OJS application itself is likely to support the export of content as a METS package. The Dissemination Service is available to repositories and makes use of the Digital Repository Interface (DRI) XML as the standard for representing the repository objects. In this way any repository able to generate DRI-compliant markup can store its objects natively but through the Dissemination Service have them rendered in a common way. Under the Journal Workflow, a journal stored in DSpace and Fedora native formats (vastly different) can be given the same look and feel. The Simple Web-service Offering Repository Deposit (SWORD) project led by UKOLN published its Deposit API not long after the Submission Service project had concluded. SWORD has been developed as a profile of the Atom Publishing Protocol and is agnostic to workflow or content packaging format. Combining METS with SWORD in submission workflows is a direction we are currently exploring. METS is a good fit for reaching our destination. It is, however, one of a long line of standards developed to meet emerging needs. Standards will continue to be developed to meet changes in technology and the dynamic nature of the digital universe.
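The packaging described above can be sketched in miniature. The following builds a minimal METS document with a MODS descriptive-metadata section and the required structMap, using only the standard library; the namespaces and element names come from the published METS/MODS schemas, but the identifiers and content are purely illustrative and do not reproduce the Australian profile itself:

```python
# Minimal sketch of a METS Submission Information Package (SIP).
# Element and attribute names follow the METS/MODS schemas; the OBJID,
# IDs, and title are invented for illustration.
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS)
ET.register_namespace("mods", MODS)

mets = ET.Element(f"{{{METS}}}mets", OBJID="example-sip-001")

# Descriptive metadata: a MODS record wrapped inside a dmdSec
dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", ID="dmd1")
wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="MODS")
xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
mods = ET.SubElement(xml_data, f"{{{MODS}}}mods")
title_info = ET.SubElement(mods, f"{{{MODS}}}titleInfo")
title = ET.SubElement(title_info, f"{{{MODS}}}title")
title.text = "An Example Journal Article"

# Every METS document requires at least one structMap
smap = ET.SubElement(mets, f"{{{METS}}}structMap")
ET.SubElement(smap, f"{{{METS}}}div", TYPE="article", DMDID="dmd1")

package = ET.tostring(mets, encoding="unicode")
```

A receiving repository would unpack such a document and map the dmdSec and structMap into whatever internal form it uses.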
See also: the Australian METS Profile web site
Yahoo Search Takes Aim at Semantic Web
Heather Havenstein, Computerworld
Yahoo Inc. announced that it will support various Semantic Web standards in its new Search Open Platform, the latest move by the company to embrace the emerging Web framework. The company also disclosed more details about its plan to open its search engine to third party developers. Yahoo said that its support of standards like microformats and RDF (Resource Description Framework) is aimed at providing users with better search results by improving the understanding of content and the relationships between content. For example, the new Web standards would ensure the inclusion of pertinent data, such as a person's name, location, current job specialties, number of contacts and a link to get introduced to that person, in a LinkedIn profile found via Yahoo Search, the company noted. "With a richer understanding of LinkedIn's structured data included in our index, we will be able to present users with more compelling and useful search results for their site," noted Amit Kumar, director of product management for Yahoo Search, in a blog post. Kumar: "While there has been remarkable progress made toward understanding the semantics of web content, the benefits of a data Web have not reached the mainstream consumer. Without a killer semantic Web app for consumers, site owners have been reluctant to support standards like RDF, or even microformats. We believe that app can be Web search." Yahoo also announced that it will launch a beta tool to let third parties add data to Yahoo Search results within several weeks. Using this tool a restaurant, for example, could add reviews or other data to Yahoo Search results for queries about the eatery. Developers can build enhanced results applications by accessing structured data that Yahoo will make available through public APIs and in its index. The structured data is available to Web site owners through feeds or the supported semantic Web standards.
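To make the microformats side of this concrete, here is a hedged sketch of the kind of structured markup a crawler can extract from a page: an hCard fragment for a hypothetical profile. The class names (vcard, fn, title, org) come from the hCard microformat; the profile data is invented and this is not Yahoo's or LinkedIn's actual markup:

```python
# Build an hCard microformat fragment for a hypothetical person profile.
# A semantic-aware crawler can read the class attributes to recover the
# person's name, job title, and organisation as structured fields.
profile = {"name": "Jane Doe", "org": "Example Corp", "title": "Engineer"}

hcard = (
    '<div class="vcard">'
    f'<span class="fn">{profile["name"]}</span>, '
    f'<span class="title">{profile["title"]}</span> at '
    f'<span class="org">{profile["org"]}</span>'
    '</div>'
)
```

The same information could equally be published as RDF triples; the point is that the fields are machine-readable rather than buried in free text.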
See also: Amit Kumar's blog
Speech Synthesis Markup Language (SSML) Version 1.1
Daniel C. Burnett and Zhi Wei Shuang (eds), W3C Technical Report
W3C announced that the Voice Browser Working Group has published an updated Working Draft for the "Speech Synthesis Markup Language (SSML) Version 1.1" specification, part of the W3C framework for enabling access to the Web using spoken interaction. Appendix G documents the specification changes since SSML Version 1.0; a colored diff-marked version is also available for comparison purposes. Please send comments by 17-April-2008. This document enhances SSML 1.0 to provide better support for a broader set of natural (human) languages. To determine in what ways, if any, SSML is limited by its design with respect to supporting languages that are in large commercial or emerging markets for speech synthesis technologies but for which there was limited or no participation by either native speakers or experts during the development of SSML 1.0, the W3C held three workshops on the Internationalization of SSML. The first workshop, in Beijing, PRC, in October 2005, focused primarily on Chinese, Korean, and Japanese languages, and the second workshop, in Crete, Greece, in May 2006, focused primarily on Arabic, Indian, and Eastern European languages. The third workshop, in Hyderabad, India, in January 2007, focused heavily on Indian and Middle Eastern languages. Information collected during these workshops was used to develop a requirements document. Changes from SSML 1.0 are motivated by these requirements. SSML provides a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications. It provides a standard way to control aspects of speech such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable platforms. SSML is part of a larger set of markup specifications for voice browsers developed through the open processes of the W3C. 
A related initiative to establish a standard system for marking up text input is SABLE, which tried to integrate many different XML-based markups for speech synthesis into a new one. The intended use of SSML is to improve the quality of synthesized content. Different markup elements impact different stages of the synthesis process. The markup may be produced either automatically, for instance via XSLT or CSS3 from an XHTML document, or by human authoring. Markup may be present within a complete SSML document or as part of a fragment embedded in another language, although no interactions with other languages are specified as part of SSML itself. Most of the markup included in SSML is suitable for use by the majority of content developers.
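As a small illustration of the markup the specification describes, the following builds a short SSML document using elements defined in the spec (speak, prosody, break, emphasis); the attribute values and spoken text here are illustrative choices, not examples taken from the Working Draft:

```python
# Sketch of a minimal SSML document. The namespace and element names are
# those of the SSML specification; rate/volume/time values are illustrative.
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2001/10/synthesis"
ET.register_namespace("", NS)

speak = ET.Element(f"{{{NS}}}speak", {
    "version": "1.1",
    "{http://www.w3.org/XML/1998/namespace}lang": "en-US",
})
p = ET.SubElement(speak, f"{{{NS}}}prosody", rate="slow", volume="loud")
p.text = "Welcome to the demo."
ET.SubElement(speak, f"{{{NS}}}break", time="500ms")
e = ET.SubElement(speak, f"{{{NS}}}emphasis")
e.text = "Goodbye."

doc = ET.tostring(speak, encoding="unicode")
```

A synthesis-capable platform would render the first sentence slowly and loudly, pause for half a second, then stress the closing word.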
See also: the W3C Voice Browser Activity
Eclipse to Stress Component, Runtime Efforts
Paul Krill, InfoWorld
The Eclipse Foundation announced that it will branch out into the realm of component-oriented software development, unveiling an umbrella project unifying several runtime initiatives. Eclipse's component development plan, called CODA (Component Oriented Development and Assembly), hinges on Eclipse's Equinox, which is the foundation's OSGi-based runtime and a part of the new Eclipse Runtime (RT) project. CODA is a methodology on how to build and deploy applications. Equinox is runtime platform software focused on Java and supporting the concepts of CODA. Eclipse RT serves as a top-level project to house runtime efforts in the Eclipse community. Featured will be six sub-projects, including: Equinox; Eclipse Communication Framework for development of distributed tools and applications; EclipseLink, providing object-relational persistence services; and Rich AJAX Platform for building AJAX applications. The other two subprojects are Swordfish, offering an SOA framework, and Riena, for building an enterprise desktop with capabilities like the ability to access transaction and database systems. Also on tap at the conference is the introduction of an Equinox community portal to educate developers on Equinox, OSGi, and Eclipse runtime projects. OSGi has served as the basis for the Eclipse plug-in model, in which the Eclipse IDE is extended via plug-ins offering different capabilities. Equinox and CODA provide advantages in component-oriented development because Equinox is based on OSGi, a component model spanning platforms and architectural tiers. OSGi also can be used in mobile and embedded devices and desktop and server applications; other component models tend to be operating system-specific or tied to a specific deployment tier. Developers using Equinox can assemble and customize the application and runtime platform; also, a standard integration mechanism is provided to link to partner and customer solutions.
See also: the Eclipse announcement
Eclipse at eBay, Part 1: Tailoring Eclipse to the eBay Architecture
Michael Galpin, IBM developerWorks
In this article the author explains how eBay uses Eclipse and custom plug-ins to build the next generation of the giant auction Web site. Eclipse's first claim to fame was as an integrated development environment (IDE) for Java technology. Eclipse's plug-in architecture is a big reason for its success: there are many popular plug-ins available, and it is very easy to create your own. These two traits make Eclipse a perfect fit for systems with specialized architectures, such as eBay's. The Eclipse V3.3 (Europa) release brought with it several specialized distributions of Eclipse, including Eclipse for Java developers and Eclipse for Java EE developers. In addition, you can use the Eclipse C/C++ Development Toolkit (CDT) and the Eclipse PHP Development Toolkit (PDT). eBay was originally launched as AuctionWeb in 1995. The original site was written in Perl. As the site grew, it was rewritten with a C++ back end and a front end that made use of XSL. Using XSL to generate HTML was very cutting-edge back in the late 1990s. eBay went public in 1998 and continues to see exponential growth. Constantly mounting pressure from traffic forced a massive rewrite of the back end of eBay in the Java programming language starting in 2001. The front-end architecture was not changed. The Java+XSL architecture is internally referred to as the V3 architecture, with Perl being V1 and C++/XSL as V2. The V3 architecture proved to be massively scalable, allowing eBay to grow to its current size as one of the world's most visited sites.
Sun Unveils NetBeans 6.1 Beta
Darryl K. Taft, eWEEK
See also: the NetBeans IDE 6.1 release notes
OpenAjax Adds Security, Mobility To Web 2.0 Apps
Terry Sweeney, InformationWeek
See also: the announcement
HTTP Header Linking
Mark Nottingham (ed), IETF Internet Draft
This memo clarifies the status of the Link HTTP header and attempts to consolidate link relations in a single registry. A means of indicating the relationships between documents on the Web has been available for some time in HTML, and was considered as an HTTP header in RFC 2068, but removed from RFC 2616 due to a lack of implementation experience. Many cases have since surfaced in which a means of including this information in HTTP headers has proved useful. However, because it was removed, the status of the Link header is unclear, leading some to consider minting new application-specific HTTP headers instead of reusing it. This document seeks to address these shortcomings. Additionally, formats other than HTML (namely, Atom, RFC 4287) have also defined generic linking mechanisms that are similar to those in HTML, but not identical. This document aims to reconcile these differences when such links are expressed as headers. This straw-man draft is intended to give a rough idea of what it would take to align and consolidate the HTML and Atom link relations into a single registry with reasonable extensibility rules. In particular: (a) it changes the registry for Atom link relations, and the process for registration; (b) it assigns more generic semantics to several existing link relations, both Atom and HTML; (c) it changes the syntax of the Link header in the case where extensions are present. The Link entity-header field provides a means for describing a relationship between two resources, generally between that of the entity associated with the header and some other resource. An entity may include multiple Link values. The Link header field is semantically equivalent to the 'link' element in HTML, as well as the 'atom:link' element in Atom. The title parameter may be used to label the destination of a link such that it can be used as identification within a human-readable menu...
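The header form the draft describes can be parsed with a few lines of code. This is an illustrative sketch of the angle-bracketed target plus ;-separated parameters shape (as in `Link: </TheBook/chapter2>; rel="previous"`), not a complete implementation of the draft's grammar; quoted values containing commas, for instance, are not handled:

```python
# Small parser for Link header values of the form
#   <URI-Reference>; param="value"; param="value", <URI-Reference>; ...
import re

def parse_link_header(value):
    """Return a list of (target, params) tuples from a Link header value."""
    links = []
    for match in re.finditer(r'<([^>]*)>([^,]*)', value):
        target, raw_params = match.group(1), match.group(2)
        # collect ;-separated quoted parameters such as rel="next"
        params = dict(re.findall(r';\s*(\w+)="([^"]*)"', raw_params))
        links.append((target, params))
    return links

links = parse_link_header(
    '</TheBook/chapter2>; rel="previous"; title="previous chapter", '
    '</TheBook/chapter4>; rel="next"'
)
```

Each tuple pairs a target URI-Reference with its relation and any other parameters, mirroring the semantics of HTML's 'link' and Atom's 'atom:link' elements.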
Link Relation Registry: This specification is intended to update Atom. A link relation is a way of indicating the semantics of a link. Link relations are not format-specific, and must not specify a particular format or media type that they are to be used with. The security considerations of following a particular link are not determined by the link's relation type; they are determined by the specific context of the use and the media type of the response. Likewise, a link relation should not specify what the context of its use is, although the media type of the dereferenced link may constrain how it is applied. New relations may be registered, subject to IESG Approval, as outlined in RFC 2434.
See also: Atom references
Web Creator Rejects Net Tracking
Rory Cellan-Jones, BBC News
The creator of the Web has said consumers need to be protected against systems which can track their activity on the internet. Sir Tim Berners-Lee told BBC News he would change his internet provider if it introduced such a system. Plans by leading internet providers to use Phorm, a company which tracks web activity to create personalised adverts, have sparked controversy. Sir Tim said he did not want his ISP to track which websites he visited. "I want to know if I look up a whole lot of books about some form of cancer that that's not going to get to my insurance company and I'm going to find my insurance premium is going to go up by 5% because they've figured I'm looking at those books," he said. Sir Tim said his data and web history belonged to him... Phorm has said its system offers security benefits which will warn users about potential phishing sites, websites which attempt to con users into handing over personal data. The advertising system created by Phorm highlights a growing trend for online advertising tools: using personal data and web habits to target advertising. Social network Facebook was widely criticised when it attempted to introduce an ad system, called Beacon, which leveraged people's habits on and off the site in order to provide personal ads... According to "The Register" ("Gov advisors: Phorm is illegal"), "The Foundation for Information Policy Research (FIPR), a leading government advisory group on internet issues, has written to the Information Commissioner arguing that Phorm's ad targeting system is illegal. In an open letter posted to the think tank's website today, the group echoes concerns voiced by London School of Economics professor Peter Sommer that Phorm's planned partnerships with BT, Virgin Media and Carphone Warehouse are illegal under the Regulation of Investigatory Powers Act 2000 (RIPA).
The letter, signed by FIPR's top lawyer Nicholas Bohm, states: 'The explicit consent of a properly-informed user is necessary but not sufficient to make interception lawful'... Bohm uses the letter to urge the Information Commissioner, Richard Thomas, to ignore the conclusions of the Home Office, which advised BT and the other ISPs that Phorm's technology is legal."
See also: the Register article
XML Daily Newslink and Cover Pages are sponsored by:
BEA Systems, Inc. http://www.bea.com
Sun Microsystems, Inc. http://sun.com
XML Daily Newslink: http://xml.coverpages.org/newsletter.html
Newsletter Archive: http://xml.coverpages.org/newsletterArchive.html
Newsletter subscribe: firstname.lastname@example.org
Newsletter unsubscribe: email@example.com
Newsletter help: firstname.lastname@example.org
Cover Pages: http://xml.coverpages.org/