Cover Pages: XML Daily Newslink: Thursday, 10 January 2008

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
SAP AG http://www.sap.com

Headlines

Markmail for Email Archiving and Search Exceeds 5.5 Million Messages
MarkMail Provides Amazing Search Capabilities
W3C Working Draft: SMIL Timesheets 1.0
Component Composition Strategies and Tactics
TopQuadrant Releases Semantic Web Development Tools
Microsoft to Provide Virtual Access to Library of Congress
Unicode Common Locale Data Repository (CLDR) Version 1.5.1
OLPC: A Conversation with Mary Lou Jepsen

Markmail for Email Archiving and Search Exceeds 5.5 Million Messages
Staff, Mark Logic Corporation Announcement

Developers at Mark Logic Corporation have announced support for a number of new mailing lists and significant growth of existing lists, increasing the base of content being archived and searched with MarkMail. Launched in November 2007, MarkMail is a community-focused searchable message archive service. MarkMail allows people to leverage the immense amount of collective knowledge accumulated over time through email discussions. Users can find technical information, research historical decision making, analyze and understand trends, and locate experts for a wide range of technical topics. Each email is stored as an XML document, and accessed using XQuery. All searches, faceted navigation, analytic calculations, and HTML page renderings are performed on a single MarkLogic Server machine running against millions of messages. With MarkMail, users can seamlessly query across the structured and unstructured parts of email, including attachments, unlocking the value trapped inside millions of emails. By observing structure in the seemingly free-form content of the message body and automatically weighting query terms appropriately, MarkMail delivers results far superior to simple full-text search. In addition, MarkMail presents analytic information based on header information and other message metadata, enabling users to drill down and refine their queries. MarkMail now contains more than 700 mailing lists and nearly six million messages, and in the last two months has added: (1) 550,000 messages about PHP - a web scripting language; (2) 360,000 messages about MySQL - a relational database; (3) 275,000 messages about Mozilla - home to Firefox, Thunderbird, etc; (4) 200,000 messages about XML - XSLT, SAX, Mark Logic, etc; (5) 85,000 messages about Cascading Style Sheets - a key component of Web development; (6) Many more messages from smaller development communities - including JDOM, WSO2, AppFuse, XWiki, and Mule.
Built on MarkLogic Server, an industry-leading XML content server, MarkMail combines sophisticated search functionality with a powerful faceted navigation interface and real-time analytics to deliver a new, state-of-the-art experience for interacting with large-scale message archives.
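
To make the "each email is stored as an XML document" idea concrete, here is a minimal Python sketch that turns one plain-text message into a small XML tree and then queries it by path. The element names (message, headers, body) and the sample message are assumptions made for illustration; the announcement does not describe MarkMail's actual schema, and MarkMail itself queries with XQuery rather than Python.

```python
from email import message_from_string
import xml.etree.ElementTree as ET

# Sample message, invented for this sketch.
RAW = """\
From: alice@example.org
To: dev@example.org
Subject: XSLT question
Date: Thu, 10 Jan 2008 09:00:00 +0000

How do I group nodes in XSLT 1.0?
"""


def email_to_xml(raw: str) -> ET.Element:
    """Convert a single-part, plain-text message into a small XML tree."""
    msg = message_from_string(raw)
    root = ET.Element("message")  # hypothetical element names throughout
    headers = ET.SubElement(root, "headers")
    for name in ("From", "To", "Subject", "Date"):
        value = msg.get(name)
        if value is not None:
            ET.SubElement(headers, name.lower()).text = value
    # For a non-multipart message, get_payload() returns the body as a string.
    ET.SubElement(root, "body").text = msg.get_payload().strip()
    return root


doc = email_to_xml(RAW)
subject = doc.findtext("headers/subject")   # structured part
body = doc.findtext("body")                 # unstructured part
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Once headers and body live in one tree, a single query can mix both, for example "messages from a given sender whose body mentions XSLT", which is the kind of combined structured and unstructured search the announcement describes.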

See also: the MarkMail overview

MarkMail Provides Amazing Search Capabilities
Tim O'Reilly, Blog

I've been meaning to write for a while about MarkLogic's awesome new search tool for trolling through open source mailing lists, MarkMail. Let's face it. While there may be a new generation that thinks that email is for old fogies, for many of us, email is a primary online tool, at least as important to us as the web. Many of us no longer file documents or attachments—we just search for them again in our email. Perhaps most importantly, email is a primary collaboration tool, and as many of us have figured out, collaboration is one of the internet's killer apps. Searching our shared memory in a collaborative space is REALLY useful—with open source mailing lists being a great example. Despite its importance, very little has been done to improve on email. The clients we use today are not radically different from what we used ten years ago (except perhaps in being web-based). This is why there was so much excitement when xobni showed how useful it is to expose the social network hidden in email. MarkMail does something equally powerful. Imagine a tool that lets you see trends across thousands of email messages, saved over years. Imagine being able to find who is the most prolific poster on a given topic, and explore the histogram of their entire message history. Imagine being able to do instantaneous data mining against millions of stored messages, with a response time better than you get looking at your local mailbox... Where MarkMail really shines is in managing large mail archives. And that's why, of course, MarkLogic has put up MarkMail for free.

W3C Working Draft: SMIL Timesheets 1.0
Petri Vuorimaa, Dick Bulterman, Pablo Cesar (eds), W3C Technical Report

W3C has announced "SMIL Timesheets 1.0" as both a First Public Working Draft and a Last Call Working Draft of a possible future W3C Recommendation. The document has been produced by the SYMM Working Group as part of the W3C Synchronized Multimedia Activity. It defines an XML timing language that makes SMIL 3.0 element and attribute timing control available to a wide range of other XML languages. This language allows SMIL timing to be integrated into a wide variety of a-temporal languages, even when several such languages are combined in a compound document. SMIL Timesheets can be seen as a temporal counterpart of CSS. Whereas CSS defines the spatial layout of the document and formatting of the elements, SMIL Timesheets specify which elements are active at a certain moment and what their temporal scope is within a document. And as with CSS, SMIL Timesheets can be reused in multiple documents, which can provide a common temporal framework for multimedia presentations with different contents but identical storylines. SMIL Timesheets allows the definition of out-of-line timing in conjunction with non-SMIL languages, including compound XML documents. To make authoring easier, it contains only a limited subset of SMIL functionality. This document was originally part of the SMIL 3.0 specification as the "SMIL 3.0 External timing" module, extending SMIL timing. It was removed from the SMIL 3.0 specification at the Candidate Recommendation phase in order to give it more visibility, as Timesheets allows integration of timing into a wide range of other XML languages.
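
The CSS analogy can be illustrated with a toy model in Python. This is not the Timesheets syntax, which is XML and is defined in the Working Draft; the TimingRule class, the id-style selectors, and the active_at function are hypothetical names used only to show what "which elements are active at a certain moment" means when timing is kept outside the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingRule:
    """One out-of-line timing rule (toy model, not the W3C syntax)."""
    selector: str  # which element the rule targets, here just an id
    begin: float   # start time in seconds
    dur: float     # duration in seconds


def active_at(rules, t):
    """Return the selectors whose interval [begin, begin + dur) contains t."""
    return [r.selector for r in rules if r.begin <= t < r.begin + r.dur]


# A reusable "storyline": two slides in sequence, one caption throughout.
SLIDESHOW = [
    TimingRule("#slide1", begin=0, dur=5),
    TimingRule("#slide2", begin=5, dur=5),
    TimingRule("#caption", begin=0, dur=10),
]

print(active_at(SLIDESHOW, 2))
print(active_at(SLIDESHOW, 7))
```

Because the rules refer to elements only by selector, the same SLIDESHOW list could be applied to any document that has elements with those ids, which mirrors the reuse across documents described above.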

Component Composition Strategies and Tactics
Jean-Jacques Dubray, InfoQ

Component technologies have continuously evolved since the early 90s. With the advent of Spring and the development of the Dependency Injection pattern, they have taken a new turn and started providing advanced composition mechanisms. Last week, Sanjay Patil, Standards Architect at SAP Labs, published an article on "IT Scenarios for Service Component Architecture" that described some of the composition strategies enabled by the Service Component Architecture. For Sanjay, the two most important factors are transport independence, which translates into deployment flexibility, and dynamicity, which is achieved in SCA with the use of policies. Sanjay sees: (1) a bottom-up composition scenario where an application [is composed] by assembling different existing implementation artifacts; (2) a heterogeneous composition scenario ... where SCA allows structuring the integration logic, and related functions (such as mapping, etc.) as first class components, whose relationship with other components is then captured as part of a well defined composite. Last month, a team from IBM compared the different component technologies and the degree to which they support composition. The article ("Software Components: Coarse-Grained Versus Fine-Grained") first defined some of the properties that promote composition mechanisms: interface coupling, data (type systems and message formats), version resiliency, transport independence, expected interaction patterns, conversations, ability to mediate, and dynamicity. The IBM article concludes: 'One of the strengths of SCA is its ability to combine with a wide variety of fine-grained component models that are used to implement coarse-grained service components. SCA brings value to each of them in terms of modeling the structure of solutions in the large, providing agility and flexibility, and removing the need to define complex configuration details within implementation code. It also has the virtue of being able to connect different component models used for different parts of an overall solution.'

TopQuadrant Releases Semantic Web Development Tools
Kurt Mackie, Application Development Trends Articles

TopQuadrant has added to its open Eclipse-based suite of solutions that enable semantic Web application development for the enterprise. The latest addition, announced in November, is called TopBraid Composer Maestro Edition. This collection of tools is part of the company's TopBraid Suite of solutions that support ontology modeling, Web application deployment and collaboration. TopBraid Composer Maestro Edition has enhanced features for testing and rapid deployment of Web applications, according to the company. It includes an integrated Web server and Business Intelligence and Reporting Tools (BIRT), plus ease-of-use improvements. Maestro Edition users can execute JavaServer Pages (JSP) in the Eclipse development environment to generate HTML and XML documents. The toolset has a mapping feature that lets users query and transform XML in Resource Description Framework (RDF), and then save back to XML without losing information (which is known as "round-tripping"), according to company officials. Users can also scan e-mails and load metadata into Web Ontology Language (OWL) ontologies. The key idea behind the Maestro Edition is its ability to enable agile development, according to Holger Knublauch, vice president of product development... TopQuadrant's products use a proprietary querying language called Sparql/Motion, which is a graphical language built around the W3C's SPARQL, a developing standard. SPARQL is comparable with SQL, but it operates on RDF data sources.
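
For readers who know SQL but not RDF, the comparison can be sketched in a few lines of Python. RDF data is a set of (subject, predicate, object) triples, and the core of a SPARQL query is a triple pattern in which some positions are variables. The match function and the ex: names below are hypothetical and written for this illustration; this is neither a SPARQL engine nor TopQuadrant's tooling.

```python
def match(triples, pattern):
    """Yield variable bindings for one (s, p, o) pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# Invented sample data.
TRIPLES = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:name", "Alice"),
]

# Roughly: SELECT ?who ?whom WHERE { ?who ex:knows ?whom }
results = list(match(TRIPLES, ("?who", "ex:knows", "?whom")))
print(results)
```

The SQL analogue would be a SELECT over a three-column table filtered on the predicate column; the difference is that RDF has no fixed table schema, so every query is expressed as patterns over triples.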

See also: the TopQuadrant web site

Microsoft to Provide Virtual Access to Library of Congress
Grant Gross, InfoWorld

Microsoft announced that it will provide the technology to allow visitors to the U.S. Library of Congress (LOC) to first take a virtual tour of historic documents and map out what exhibits they want to see. The project will include the Myloc.gov Web site, to be launched in April 2008, linked to information kiosks at the LOC's Thomas Jefferson Building in Washington, D.C. Interactive galleries will allow visitors to the Myloc.gov site to view and sometimes interact with items such as a rough draft of the U.S. Declaration of Independence, the Gutenberg Bible, and a 1507 map that first used the word "America." The new technology is designed to assist people who want to visit the library in person, according to John Sampson, director of federal government affairs at Microsoft. Visitors to the Web site will be able to bookmark areas of interest, then use a bar code at the LOC's information kiosks that will point them to more information in person, he said. Visitors both online and on-site can also engage in a game called Knowledge Quest that sends them searching for clues in the LOC's art and artifacts. Interactive presentation software for kiosks will run on Windows Vista and its Web equivalent, built using Microsoft Silverlight. The project will also use Microsoft Office SharePoint Server 2007 Web content management software. The library's "Exploring the Early Americas" exhibition, which opened December 13, 2007, offers a sampling of the new experience.

See also: the Microsoft announcement

Unicode Common Locale Data Repository (CLDR) Version 1.5.1
Rick McGowan, Unicode Announcement

The Unicode Consortium has announced the release of the new version of the "Unicode Common Locale Data Repository" (Unicode CLDR 1.5.1), providing key building blocks for software to support the world's languages. Unicode CLDR is by far the largest and most extensive standard repository of locale data. This data is used by a wide spectrum of companies for their software internationalization and localization: adapting software to the conventions of different languages for such common software tasks as formatting of dates, times, time zones, numbers, and currency values; sorting text; choosing languages or countries by name; and many others. CLDR uses the XML format provided by UTS #35: "Locale Data Markup Language (LDML)." LDML is a format used not only for CLDR, but also for general interchange of locale data, such as in Microsoft's .NET. The main changes in CLDR Version 1.5.1 are a significant revision to the data and process for computing timezone names, and additional data for finding default script or country given a language, or the converse. The structure has also been updated for the latest version of IETF BCP 47 ("Tags for Identifying Languages"), and new currency codes.
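
As a rough illustration of how software consumes this data, the Python sketch below reads localized display names out of a small LDML-style fragment using only the standard library. The fragment is hand-written for this example in the general shape of LDML (a localeDisplayNames section holding languages and territories); it is not copied from CLDR 1.5.1, and the display_names helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of a French locale file (illustrative only).
LDML_FRAGMENT = """
<ldml>
  <identity>
    <language type="fr"/>
  </identity>
  <localeDisplayNames>
    <languages>
      <language type="de">allemand</language>
      <language type="en">anglais</language>
    </languages>
    <territories>
      <territory type="DE">Allemagne</territory>
    </territories>
  </localeDisplayNames>
</ldml>
"""


def display_names(ldml_text, group, item):
    """Map each code (the 'type' attribute) to its localized display name."""
    root = ET.fromstring(ldml_text.strip())
    path = f"localeDisplayNames/{group}/{item}"
    return {el.get("type"): el.text for el in root.findall(path)}


languages = display_names(LDML_FRAGMENT, "languages", "language")
territories = display_names(LDML_FRAGMENT, "territories", "territory")
print(languages, territories)
```

A language or country picker in a French user interface would use such a mapping to show "allemand" and "Allemagne" instead of the raw codes, which is the "choosing languages or countries by name" task mentioned above.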

See also: XML and Unicode

OLPC: A Conversation with Mary Lou Jepsen
John Ryan, ACM Queue

OLPC: What's behind that funky green machine? Mary Lou Jepsen and her team had to reinvent what a laptop could be. From Tunisia to Taiwan, Mary Lou Jepsen has circled the globe in her role as CTO of the OLPC (One Laptop Per Child) project. Founded by MIT Media Lab co-founder Nicholas Negroponte in 2005, OLPC builds inexpensive laptops designed for educating children in developing nations. Marvels of engineering, the machines have been designed to withstand some of the harshest climates and most power-starved regions on the planet. To accomplish this, Jepsen and her team had to reinvent what a laptop could be. As Jepsen says, "You ask different questions and you get different answers." The resulting machine, named the XO, is uniquely adapted to its purpose, combining super-low-power electronics, mesh networking, and a sunlight-readable screen, which Jepsen designed herself. Although still shy of the "$100 laptop" goal envisioned in the beginning, the XO is still the most inexpensive laptop ever built. Jepsen: "The architecture we've created is very powerful, not just for low-cost laptops, but for high-end laptops as well... If you look at what's been happening in computers for the past 40 years, it's been about more power, more megahertz, more MIPS. As a result, we've had huge applications and operating systems. Instead, at OLPC we focused on an entirely different kind of solution space. We focused on low power consumption, no hard drive, no moving parts, built-in networking, and sunlight-readable screens..." [Ed note: Jepsen recently founded Pixel Qi, endeavoring to create the world's first $75 laptop.]

See also: the OLPC Wiki

