Cover Pages: XML Daily Newslink: Wednesday, 08 October 2008

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Sun Microsystems, Inc. http://sun.com

Headlines

Schema Support Strengthens Toolkit in XHTML Modularization Standard
Open XML Format SDK 2.0
Indexing and Searching Image Files Using Lucene.NET
Time for a Calendar Revival
Red Hat Boosts Open Source SOA
Service Component Architecture in Real Life
The DITA Standard Puts Its Money Where Its Mouth Is

Schema Support Strengthens Toolkit in XHTML Modularization Standard
Daniel Austin, Subramanian Peruvemba (et al., eds), W3C Technical Report

W3C announced the update of its XHTML Modularization standard today with support for designing modular languages using XML Schema. The specification is now approved as a W3C Recommendation. The addition of schemas to XHTML Modularization 1.1 is an important step towards the XHTML2 Working Group's goal that XHTML support rich Web content and be extensible, while remaining interoperable. A modularization standard allows language designers to reuse elements defined by multiple parties (including other W3C standards such as SVG and MathML) and combine them into new formats to meet specific application needs. The standard allows people to use schema-enabled, off-the-shelf tools to immediately begin authoring and validating documents written in those new languages. The XHTML2 Working Group, which gained experience using Modularization 1.1 to build some modules and languages, now plans to add schema support to other XHTML standards... Over the last couple of years, many specialized markets have begun looking to HTML as a content language. There is a great movement toward using HTML across increasingly diverse computing platforms. Currently there is activity to move HTML onto mobile devices (hand held computers, portable phones, etc.), television devices (digital televisions, TV-based Web browsers, etc.), and appliances (fixed function devices). Each of these devices has different requirements and constraints. Modularizing XHTML provides a means for product designers to specify which elements are supported by a device using standard building blocks and standard methods for specifying which building blocks are used. These modules serve as "points of conformance" for the content community. The content community can now target the installed base that supports a certain collection of modules, rather than worry about the installed base that supports this or that permutation of XHTML elements.
The use of standards is critical for modularized XHTML to be successful on a large scale. It is not economically feasible for content developers to tailor content to each and every permutation of XHTML elements. By specifying a standard, either software processes can autonomously tailor content to a device, or the device can automatically load the software required to process a module. Modularization also allows for the extension of XHTML's layout and presentation capabilities, using the extensibility of XML, without breaking the XHTML standard. This development path provides a stable, useful, and implementable framework for content developers and publishers to manage the rapid pace of technological change on the Web.
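
The "points of conformance" idea can be sketched in a few lines of Python. This is only an illustration: the module contents below are abbreviated subsets rather than the normative definitions in the Recommendation, and `MODULES`, `DEVICE_PROFILE`, and `unsupported_elements` are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET

# Abbreviated, illustrative module contents; the normative module
# definitions in the Recommendation are longer than these subsets.
MODULES = {
    "structure": {"html", "head", "title", "body"},
    "text": {"p", "h1", "h2", "em", "strong", "span", "div"},
    "hypertext": {"a"},
    "list": {"ul", "ol", "li", "dl", "dt", "dd"},
}


def supported_elements(module_names):
    """Union of the elements contributed by the named modules."""
    elements = set()
    for name in module_names:
        elements |= MODULES[name]
    return elements


def unsupported_elements(document, module_names):
    """Sorted element names used in `document` that the profile lacks."""
    allowed = supported_elements(module_names)
    root = ET.fromstring(document)
    used = {el.tag.split("}")[-1] for el in root.iter()}  # drop any namespace
    return sorted(used - allowed)


# Hypothetical device profile that omits the list module.
DEVICE_PROFILE = ["structure", "text", "hypertext"]

DOC = """<html><head><title>t</title></head>
<body><h1>Hi</h1><ul><li><a href="#x">x</a></li></ul></body></html>"""

print(unsupported_elements(DOC, DEVICE_PROFILE))
```

A content developer would target a named set of modules in this way instead of checking each device's individual element list; real validation would use the schema or DTD driver for the language, not a set comparison.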

Open XML Format SDK 2.0
Zeyad Rajabi, Blog

This article shows how to use the Microsoft Open XML SDK to accomplish real-world scenarios such as document assembly and document manipulation. It first addresses the overall design of the Open XML SDK with respect to goals and scenarios. Subsequent posts will dive more deeply into the architecture of the SDK as well as show lots of sample code... The Open XML SDK provides a set of .NET APIs that allows developers to create and manipulate documents in the Open XML Formats in both client and server environments without the need for the Office clients. The SDK should make it easier for you to build solutions on top of the Open XML Format by allowing you to perform complex operations, such as creating Open XML packages or adding/deleting tables, with just a few lines of code. The SDK takes care of both the structure of the Open XML Format as well as the XML contained in each of the parts of the package. In other words, with this SDK, you will be able to add or remove parts within a package as well as manipulate XML constructs, such as paragraphs and tables. The SDK also supports programming in the style of LINQ to XML, which makes coding against XML content much easier than the traditional W3C XML DOM programming model. Using the Open XML SDK to create solutions that manipulate documents directly has many advantages as compared with automating Microsoft Office applications using macros or VBA. The major advantage is that the Open XML SDK is fully supported on the server, unlike automating Office applications. That means you can create managed code solutions that are scalable and stable on the server. Imagine being able to write multi-threaded solutions that build on top of the SDK. In addition, there is a huge performance advantage when developing solutions with the Open XML SDK, which is very evident when dealing with large numbers of documents.
You will be able to programmatically generate 1000s of documents based on data from a database within a matter of seconds rather than hours. Lastly, the Open XML SDK is a dedicated file format API that specializes in the manipulation and creation of Open XML packages. The SDK is fully aware of the structure and schema of Open XML Formats. The Open XML SDK has been released in two versions: (1) Version 1.0 allows for direct manipulation of the Open XML Package at the part level; (2) Version 2.0 provides strongly typed class support for the underlying XML content contained in each part. In other words, version 1.0 of the SDK deals with the structure or skeleton of Open XML Formats, while version 2.0 of the SDK deals with the XML contained within each of the XML parts. Version 1.0 of the SDK was fully released with a "go-live" license in June 2008. With this go-live license you will be able to build and deploy solutions confidently. A couple of weeks ago we released the first Community Technology Preview (CTP) of version 2.0 of the Open XML SDK. Keep in mind this version of the SDK is still a CTP, so we are expecting to get a lot of customer feedback to polish this API.
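
The split between package-level work (parts) and content-level work (the XML inside a part) can be illustrated without the SDK. The Python sketch below relies on the fact that an Open XML package is a ZIP archive of parts. It does not use or imitate the .NET API; the package it builds is a stripped-down stand-in (the content-types part is a stub, so Word would not open it), and the helper names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>First paragraph</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def build_package():
    """Build a skeletal in-memory package (not a complete, valid document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")  # stub part
        package.writestr("word/document.xml", DOCUMENT_XML)
    return buffer.getvalue()


def list_parts(data):
    """Package level: enumerate the parts, ignoring what is inside them."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return package.namelist()


def paragraph_texts(data):
    """Content level: read the XML inside the main document part."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    ns = {"w": W_NS}
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", ns))
        for p in root.iterfind(".//w:p", ns)
    ]


data = build_package()
print(list_parts(data))
print(paragraph_texts(data))
```

`list_parts` corresponds loosely to the part-level view the article attributes to version 1.0, and `paragraph_texts` to the content-level view attributed to version 2.0, with untyped XML here in place of the strongly typed classes.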

Indexing and Searching Image Files Using Lucene.NET
Adelene Ng, DDJ

Apache Lucene is a high-performance, full-featured text retrieval library. Originally written in Java, it has since been ported to C++, C#, Perl, and Python. In this article, I show how Lucene.NET can be used to index and search image files captured by digital cameras. What makes this possible is that digital photos embed the camera settings and scene information as metadata. The specification for this metadata is the Exchangeable Image File Format (www.exif.org). Examples of stored information include shutter speed, exposure settings, date and time, focal length, metering mode, and whether the flash was fired. Here I show how the EXIF information can be extracted from the images through some user-specified criteria. These user-specified search criteria are then used as an index to search your image library. To keep the example simple, I limited the EXIF search criteria to date range and user comments fields. All images that satisfy the search criteria are displayed as thumbnails. The ImageSearcher utility I present here was developed in C# using Visual Studio 2008. It also makes use of a number of open-source libraries and components such as Lucene.NET 2.0, NLog 1.0, and ExifExtractor. In addition to Lucene, I use the NLog logging library, which has a programming interface similar to Apache log4net. I use NLog to write diagnostic messages to a file for logging and tracing purposes. To extract EXIF information, I use the ExifExtractor library. Although .NET already has utilities to extract EXIF information, it returns raw data and tags. More processing would be required for this to be used in this application. For example, if I wanted to extract shutter speed information, I would need to know the tag number, extract the tag, and then convert the data from ASCII to a number... 
As you can see, the Lucene.NET and ExifExtractor libraries can quickly be used to build an application that searches for images according to user-specified criteria, then displays those images that match the search conditions on the results panel. Moreover, the application can be easily extended to include other search criteria such as shutter speed and camera make and model.
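
The search described above (match on user comments, restrict by date range) can be sketched with a small inverted index. This Python illustration does not use Lucene.NET or ExifExtractor, and it skips EXIF extraction entirely: the file names, dates, and comments are made-up sample records, and `ImageIndex` is a hypothetical class.

```python
from collections import defaultdict
from datetime import date


class ImageIndex:
    """Toy inverted index over comment terms, with a date-range filter."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of file names
        self._taken = {}                   # file name -> capture date

    def add(self, filename, taken, comment):
        """Index one image by its capture date and user comment."""
        self._taken[filename] = taken
        for term in comment.lower().split():
            self._postings[term].add(filename)

    def search(self, term, start, end):
        """Files whose comment contains `term`, taken between start and end."""
        hits = self._postings.get(term.lower(), set())
        return sorted(f for f in hits if start <= self._taken[f] <= end)


# Made-up sample metadata standing in for extracted EXIF fields.
index = ImageIndex()
index.add("IMG_0001.jpg", date(2008, 7, 4), "Fireworks at the beach")
index.add("IMG_0002.jpg", date(2008, 8, 15), "Beach sunset")
index.add("IMG_0003.jpg", date(2007, 12, 25), "Family at the beach house")

print(index.search("beach", date(2008, 1, 1), date(2008, 12, 31)))
```

A full-text library adds analysis, scoring, and persistent index storage on top of this basic term-to-document mapping.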

See also: Apache Lucene.Net

Time for a Calendar Revival
John Kremer, Blog

Why is Yahoo! launching a new Yahoo! Calendar after ten years and why will it be better than your paper calendar or (gasp!) your desktop calendar? Because we think the time is right for these Web-based scheduling applications to finally take off. Here's why: (1) Open standards like iCalendar and CalDAV make all online calendars work together so people can share their schedules without the hiccups of the past. (2) Broadband and mobile device ubiquity means you're always connected, even if you don't like to admit it. You need your busy life to be in order and for your calendar to be accessible wherever you go. (3) Web 2.0 technologies have made it possible to incorporate very cool visual effects and practical functions like event discovery. Thanks to the powerful technology that our Zimbra team built, and our involvement with the online calendaring community, we've been able to add some much-improved technical functionality to the new Yahoo! Calendar. Now you can better connect with your friends and family—even those who aren't using Yahoo! Calendar. Our new calendar is interoperable with the other popular services, including those from Apple, Microsoft, AOL, Mozilla, and Google, so you can share your upcoming plans and important dates with friends. With the new Yahoo! Calendar you can subscribe to any iCalendar-based public calendar and add upcoming events and show times to your Yahoo! Calendar. This means you'll be able to plan for a local concert when your favorite band comes to town and you'll know when the next new episode of 'Lost' will air. You can now easily drag and drop events to reschedule appointments without having to refresh your Web page. You can set email, IM or SMS reminders for important activities and never miss a birthday or anniversary again. You can personalize your Yahoo! Calendar with interesting photos from Flickr to make your online calendar as visually appealing as it is productive. 
And this beta is just the first in a series of updates you'll see coming from us. Imagine being able to download your favorite sports team's schedule, your class schedule, or your child's t-ball schedule and being constantly on top of everything. That's coming soon. [Note: John Kremer is Vice President, Yahoo! Mail.]
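
Subscribing to an iCalendar-based public calendar comes down to reading a text feed of content lines. The Python sketch below parses a tiny feed; the feed itself (product identifier, UID, event) is invented for illustration, and the parser handles only line unfolding and simple `NAME:VALUE` properties, not the full iCalendar grammar.

```python
from datetime import datetime

# Invented sample feed; a real feed would be fetched from a calendar URL.
SAMPLE_FEED = "\r\n".join([
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//Example//Hypothetical Feed//EN",
    "BEGIN:VEVENT",
    "UID:game-1@example.com",
    "DTSTART:20081011T170000Z",
    "SUMMARY:T-ball: Tigers vs.",
    "  Bears",  # folded continuation line: the first space is removed
    "END:VEVENT",
    "END:VCALENDAR",
])


def unfold(text):
    """Join folded lines (those starting with a space or tab) to the previous line."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def parse_events(text):
    """Return one dict of property name -> value per VEVENT."""
    events, current = [], None
    for line in unfold(text):
        name, _, value = line.partition(":")
        name = name.split(";")[0].upper()  # ignore property parameters
        if name == "BEGIN" and value == "VEVENT":
            current = {}
        elif name == "END" and value == "VEVENT":
            events.append(current)
            current = None
        elif current is not None:
            current[name] = value
    return events


events = parse_events(SAMPLE_FEED)
start = datetime.strptime(events[0]["DTSTART"], "%Y%m%dT%H%M%SZ")
print(events[0]["SUMMARY"], start.isoformat())
```

Because every conforming service reads and writes the same content-line format, a feed published by one calendar can be consumed by another.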

Red Hat Boosts Open Source SOA
Paul Krill, InfoWorld

Red Hat is expanding its open-source JBoss SOA platform with the unveiling Wednesday of JBoss Enterprise SOA Platform 4.3 and JBoss Operations Network 2.1. Enterprise SOA Platform supports projects ranging from small integrations to enterprise-wide SOA integration. It features open-source projects like JBoss ESB, JBoss jBPM, and JBoss Rules. SOA Platform 4.3 offers ESB features including gateway listeners, a declarative security model and improved Web services integration. Additional scripting languages are supported, enabling development of services in Jython, JRuby, and BeanShell. These languages enable non-Java programmers to build services. Version 4.3 can be administered by JBoss Operations Network 2.1, which also is being introduced Wednesday and supports patch management, start-stop monitoring, and other capabilities. With version 4.3, stateful rules services, decision tables, and rule agent support further enable business event processing and an event-driven architecture, Red Hat said. Also, non-developers can construct business rules. Among the other capabilities of JBoss Operations Network 2.1 is centralized management including inventory, administration, deployment and updating of JBoss Enterprise Middleware products and subsystems. Remote platform configuration and deployment and automatic ESB service inventory discovery are offered as well, along with JBoss ESB service monitoring. Pierre Fricke, director, SOA product line management at Red Hat: "Our SOA strategy aims to drive down the cost and complexity of implementing service oriented architectures with a comprehensive portfolio of enterprise-class, open source middleware. JBoss Enterprise SOA Platform 4.3 and JBoss Operations Network 2.1 together are another example of our commitment to accelerating the adoption of hardened, secure and stable open source middleware solutions that deliver on this strategy.
Our customers helped drive the direction of the enhancements delivered today in the newest versions of the SOA Platform and JBoss Operations Network, and with these solutions, we're excited to offer enterprises tools to effectively plan, deploy, manage and monitor open source SOA."

See also: the Red Hat announcement

Service Component Architecture in Real Life
Miko Matsumura, DevX.com

This article will walk you through two samples of how to get started with SCA. First we'll write a small SCA application that uses the services provided by an SCA composite to do something useful. Having covered that, we'll show you how to take some code and package it as a composite so it can be used in SCA applications. In Service Component Architecture, we write service-oriented applications that use services provided by SCA composites. In the SCA component model, the smallest piece of accessible code is a component; a composite is composed of components. To quote analyst David Chappell, if components are the atoms of SCA, then composites are molecules. The composite includes the details of the code, the services provided by that code, and references to any other pieces of code the composite uses. One of the great benefits of SCA is that the skills used to invoke, assemble and create composites will be useable across many vendor environments because of the broad industry support for SCA. Using the same skill set, we can access services built on different architectures like POJOs, EJB session beans, and RMI. This example relies on a set of Java objects and methods that implement a calculator. We know it's much more compact than the kinds of components you will work with in the future—but we are going to use this because having very compact code helps to focus on what's new in SCA and the key pieces of code that will help use and create composites... The SCA application and the SCA composite we've built here are simple. If we're writing an SCA application, as we did in the first part of this article, things don't get any more complicated than this. Create the POJO and call its methods. If we're creating an SCA composite, however, we can do far more sophisticated things. We could add other implementation types, other protocols, and policies such as encryption and authentication to the definition of our composite. 
Despite that, the original application would still work. If you're an application developer, writing an application with SCA means the code is simpler, there's less of it to write, and there's less of it to maintain. If you're an administrator, packaging your components with SCA gives you tremendous power and flexibility, allowing you to change how your components work without worrying how the applications that use them will be affected.
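
The component/composite relationship can be pictured with a language-neutral sketch. The Python below is an analogy only: it is not SCA's Java API or its composite XML, and every class and service name (`Composite`, `CalculatorComponent`, "CalculatorService") is hypothetical. It shows components whose references are wired by the composite, and an application that sees only the promoted service.

```python
class AddComponent:
    """A component providing an addition service."""

    def add(self, a, b):
        return a + b


class SubtractComponent:
    """A component providing a subtraction service."""

    def subtract(self, a, b):
        return a - b


class CalculatorComponent:
    """Provides the calculator service; its references are injected, not built here."""

    def __init__(self, add_service, subtract_service):
        self._add = add_service
        self._subtract = subtract_service

    def add(self, a, b):
        return self._add.add(a, b)

    def subtract(self, a, b):
        return self._subtract.subtract(a, b)


class Composite:
    """Holds wired components and exposes selected services by name."""

    def __init__(self):
        self._services = {}

    def promote(self, name, component):
        self._services[name] = component

    def get_service(self, name):
        return self._services[name]


def build_calculator_composite():
    """Wiring lives here, so it can change without touching application code."""
    composite = Composite()
    calculator = CalculatorComponent(AddComponent(), SubtractComponent())
    composite.promote("CalculatorService", calculator)
    return composite


# Application code: look up the service and call its methods.
calculator = build_calculator_composite().get_service("CalculatorService")
print(calculator.add(3, 2), calculator.subtract(3, 2))
```

Swapping `AddComponent` for a different implementation changes only `build_calculator_composite`, which mirrors the article's point that the original application keeps working when the composite's definition changes.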

The DITA Standard Puts Its Money Where Its Mouth Is
Peter Hagopian, InformationWeek

The Darwin Information Typing Architecture (DITA) was initially created by IBM, and has since been accepted as an OASIS standard. Getting started using DITA is actually fairly straightforward: a number of content management systems and XML editors integrate tightly with the DITA Open Toolkit, which is a free download. Why does this matter to content creators? For a number of reasons, not the least of which is that implementing DITA can save you time and a lot of money. There was a terrific DITA case study posted on the Data Conversion Laboratory site recently. [From the article, by Jennifer Linton of CaridianBCT, "How our DITA Conversion Saved us 100 Grand, for Starters: A Case Study in DITA for Globalization and Localization," where Part One narrates how a multi-national, regulated medical device company planned its migration to a DITA CMS by identifying stakeholders and defining personas, establishing a high-level process and system requirements, developing a content model, and figuring out what to do with legacy documents... CaridianBCT is a highly regulated medical device company, and our Technical Communications Department is implementing a content management system, translation management system, and DITA authoring environment. The Technical Communications Department is responsible for authoring and translating all labels for CaridianBCT equipment, Operator's Manuals, Instructions for Use (IFUs); kit, case, and bag labels; service manuals, preventive maintenance procedures, schematics, installation procedures, spare parts instructions, training materials, protocols, Standard Operating Procedures (SOPs), and more. CaridianBCT is also heavily regulated globally, requiring an intensive document control and release process.
Some of the regulatory bodies whose rules the company abides by include the USA FDA, the EMEA Medical Device Directive (MDD), the Canadian Health Canada and Canadian Standards Association (CSA), the APAC International Electrotechnical Commission (IEC) and International Organization for Standardization (ISO), as well as our internal SOPs and MDD definitions. Because each region and country has specific standards, parts of each deliverable may need to be specific to that country while the majority of the content is the same. About three years ago, CaridianBCT asked the question, "How are we going to manage the increasing number of deliverables we have to maintain because of all the different customer needs in each country?" [...] Our information model defines the CaridianBCT-specific guidelines for using DITA. In this document we identify specific DITA elements to use while authoring such as the information types, content units, map elements, body elements, and inline elements we use. It also identifies the metadata, CMS and TMS folder structures, and naming conventions. The CMS/TMS user guides document the specific tasks we perform when using the CMS or TMS. Each system provides general help documentation, but in some instances, we had to identify workarounds or more detailed steps for how to accomplish a task. Also, because the CMS and TMS are separate systems, the user guides identify how to use the integration between the two systems. Some of the tasks we identified include Producing a Deliverable, Adding a Content Reference, Conditionally Filtering a Deliverable, Sending Topics to Translation, and Sending Updates to a Translation Project Already in Progress. These user guides provide ongoing learning tools for people to reference, and help new people coming into the department learn the environment...]
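
Conditional filtering, where most content is shared and a few elements are specific to a region, can be sketched as follows. This Python illustration is not the DITA Open Toolkit and says nothing about how CaridianBCT's CMS works. The topic text and the `region-a`/`region-b` values are invented, and the rule applied is the simplified one that an element is dropped when every value of its filtering attribute is excluded.

```python
import xml.etree.ElementTree as ET

# Invented topic; `otherprops` is one of DITA's conditional attributes,
# but the values used here are hypothetical.
TOPIC = """<topic id="storage">
<title>Storing the device</title>
<body>
<p>Store the device in a dry place.</p>
<p otherprops="region-a">Text required only in region A.</p>
<p otherprops="region-a region-b">Text shared by regions A and B.</p>
</body>
</topic>"""


def filter_topic(xml_text, attribute, excluded):
    """Drop elements whose `attribute` values are all in `excluded`."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):      # snapshot before mutating the tree
        for child in list(parent):
            values = child.get(attribute, "").split()
            if values and all(v in excluded for v in values):
                parent.remove(child)
    return root


def paragraphs(root):
    """Text of each remaining paragraph, in document order."""
    return [p.text for p in root.iter("p")]


# Build the region B deliverable by excluding region A content.
region_b = filter_topic(TOPIC, "otherprops", {"region-a"})
print(paragraphs(region_b))
```

The unmarked paragraph and the paragraph tagged for both regions survive, so one source topic yields several region-specific deliverables.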

See also: the DITA case study

