Cover Pages: XML Daily Newslink: Tuesday, 19 August 2008

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
IBM Corporation http://www.ibm.com

Headlines

Data Technology Keeps In Step with NIMS STEP
Cooperative Alert Information and Resource Notification System (CAIRNS)
XSLT Function Library to Process Instance Data Using Schema Information
XForms and RESTful Web Services
SEC Introduces IDEA Database to Collate XBRL Reporting Tags
A Standards-Based Expert System for Annotating XML-in-ZIP documents
Sharpening Your Axis with Visual Basic 9
Quick Start for Persisting XML Standards-Compliant Data
How to Shut Down All Your Machines Without Anyone Noticing

Data Technology Keeps In Step with NIMS STEP
Staff, Interoperability Technology Today

Emergency managers are steps closer to ensuring that technologies supporting response operations adhere to the Common Alerting Protocol (CAP) and the Emergency Data Exchange Language (EDXL) suite of standards. These data messaging standards enable emergency responders to share critical data, such as a map, a situational report, or an alert, seamlessly across disparate software applications, devices, and systems. The effective exchange of this type of data is essential during emergency response operations. To evaluate product adherence to data messaging EDXL standards, the U.S. Department of Homeland Security (DHS) Command, Control and Interoperability Division (CID) is partnering with the DHS Federal Emergency Management Agency's (FEMA) Incident Management Systems Integration (IMSI) program. The initiative, known as the National Incident Management System Supporting Technology Evaluation Program (NIMS STEP), provides an independent, objective evaluation of commercial and government hardware and software products related to incident management. Participation in the voluntary program does not constitute certification of NIMS compliance or an official DHS endorsement of the product... Evaluation activities are also designed to help create a uniform level of compliance and expand technology solutions. NIMS STEP supports NIMS, which identifies the requirements for ensuring interoperability and compatibility among multiple response agencies. NIMS efforts provide a consistent, nationwide approach for agencies at all levels of government to effectively and efficiently manage response operations... In support of these NIMS criteria, NIMS technical standards CAP and EDXL are linked to IMSI testing and evaluation activities. The CAP standard enables practitioners to exchange all-hazard emergency alerts, notifications, and public warnings. Such data can be disseminated simultaneously over many different warning systems, e.g., computers, wireless, alarms, television, and radio.
The EDXL suite of standards includes the Distribution Element (DE) standard, which enables responders to distribute data messages by recipient, geographic area, or other specifications such as discipline type. The EDXL suite also will include the Resource Messaging (RM) and Hospital AVailability Exchange (HAVE) standards...

Cooperative Alert Information and Resource Notification System (CAIRNS)
Renato Iannella, Technology Demonstrator Announcement

Renato Iannella of National ICT Australia (NICTA) has announced the release of demonstrator technology software that supports specifications produced by the OASIS Emergency Management Technical Committee. CAIRNS (Cooperative Alert Information and Resource Notification System), one of the outcomes of work done within the SAFE project at NICTA, has been released as Open Source under the BSD licence. CAIRNS is a demonstrator of technologies that can be used to construct an interoperable CIMS (Crisis Information Management System) architecture. Specifically, CAIRNS implements the following XML-based emergency messaging standards: EDXL - Distribution Element; EDXL - Resource Messaging; Common Alerting Protocol (CAP). On the most basic level, CAIRNS is a collection of independent nodes that can join and drop out of the network at will. Messages between nodes are passed using peer-to-peer (P2P) technologies similar to those used in file sharing networks. There is no central node, which means there is no single point of failure that would bring the whole system down. Each node caches the messages it receives and is able to forward them even if the original sender can no longer be reached. A message is purged from the cache when an update arrives or its expiration time is reached. Each node acts both as a SOAP server and a SOAP client. Routing information is attached to the message using EDXL-DE (Emergency Data Exchange Language - Distribution Element). The payload of the message can be in any of the emerging XML incident report formats, such as TWML (Tsunami Warning Markup Language), CAP (Common Alerting Protocol) or EDXL-RM (EDXL - Resource Messaging). Existing systems can be connected to CAIRNS by using gateways that translate the data from existing systems into structured CAIRNS messages and vice versa.
The (sample) images show the Cyclone Warning Markup Language (CWML) bulletins mapped on a Google Maps interface showing the path of a cyclone, its destructive winds, and other relevant information based on the transmitted CWML messages. The CAIRNS software is freely available from SourceForge.
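
The caching rule described above (a node keeps each message until an update arrives or its expiration time is reached) might be sketched roughly as follows. This is an illustration only and is not taken from the CAIRNS source: the `Message` and `MessageCache` names, the idea that an update shares its predecessor's identifier, and the plain numeric expiry time are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Message:
    message_id: str    # hypothetical identifier shared by a message and its updates
    payload: str       # e.g. a CAP or EDXL-RM document, carried here as plain text
    expires_at: float  # expiry time as a number on any consistent clock


class MessageCache:
    """Hypothetical per-node cache; not the actual CAIRNS implementation."""

    def __init__(self):
        self._messages = {}

    def receive(self, message):
        # Assumption: an update reuses the identifier, so it replaces the older copy.
        self._messages[message.message_id] = message

    def purge_expired(self, now):
        # Drop every message whose expiration time has been reached.
        self._messages = {
            mid: m for mid, m in self._messages.items() if m.expires_at > now
        }

    def forwardable(self, now):
        # Messages the node can still pass on, even if the original sender is gone.
        self.purge_expired(now)
        return sorted(self._messages.values(), key=lambda m: m.message_id)
```

Because each node holds its own cache, a node that calls `forwardable` can keep relaying a warning after the originating node has dropped out of the network.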

See also: the documentation

XSLT Function Library to Process Instance Data Using Schema Information
Xia Li, IBM developerWorks

W3C Recommendation XML Path Language (XPath) 2.0 introduces a few new sequence type constructs that allow applications to use the type definitions and element declarations in an XML schema to match elements in an XML instance document. The use of these constructs in an application can reduce the complexity of the code and increase the generality of the application. Unfortunately, these constructs are currently only available in schema-aware XPath host language processors such as schema-aware XSLT or XQuery processors, and these schema-aware processors often couple the validation process closely with the instance data processing. That means an application must perform a validation process to use schema information regardless of whether there is a need to do so. In fact, in many cases the instance data has already been validated against a specific schema before it arrives at an instance data processor; this makes validation a redundant and unnecessary process. In such scenarios, it is the schema information that you need to process instance data rather than the validation. In this article, the author introduces a function library implemented in pure XSLT that can enable applications to benefit from these new XPath 2.0 constructs; at the same time, it will decouple the validation process from the instance data processing. The function library essentially supplies the functions that perform the node tests with given instance elements, schema type definitions, and schema element declarations. It contains the following three categories of functions: Load schema documents; Perform node testing; Locate type definitions or element declarations in schema documents. You can use the function library with any basic XSLT processor to employ the schema information to process instance data without going through the unnecessary validation process. The fact that the function library is implemented in pure XSLT will provide great portability of your code as well.
And the approach is also applicable to XQuery since nothing in the library inherently prevents you from implementing it in XQuery.
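
The idea of consulting schema information without running a validator can be illustrated outside XSLT as well. The sketch below uses Python's standard `xml.etree.ElementTree` and is not the article's library: the function names, the sample schema, and the restriction to global element declarations with no namespaces and no type derivation are all simplifying assumptions. It mirrors the three categories above (load a schema, locate a declaration, perform a node test).

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Made-up schema used only for this illustration.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="price" type="MoneyType"/>
  <xs:element name="note" type="xs:string"/>
  <xs:complexType name="MoneyType">
    <xs:simpleContent>
      <xs:extension base="xs:decimal">
        <xs:attribute name="currency" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>
"""


def load_schema(text):
    # Load: parse the schema as an ordinary XML document; nothing is validated.
    return ET.fromstring(text)


def find_element_declaration(schema, name):
    # Locate: search the global (top-level) element declarations by name.
    for decl in schema.findall(XS + "element"):
        if decl.get("name") == name:
            return decl
    return None


def has_declared_type(schema, instance_element, type_name):
    # Node test: is this instance element declared with exactly the given type?
    # Simplification: derived types and local declarations are ignored.
    decl = find_element_declaration(schema, instance_element.tag)
    return decl is not None and decl.get("type") == type_name
```

For example, with `schema = load_schema(SCHEMA)`, iterating over an instance document and keeping the elements for which `has_declared_type(schema, element, "MoneyType")` holds selects the `price` elements by their declared type rather than by a hard-coded element name.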

XForms and RESTful Web Services
Philip Fennell, O'Reilly Blog

There was one thing missing from XForms 1.0 that would have made all the difference when trying to access RESTful Web Services: the ability to control HTTP headers when making instance data requests and submissions. What compounded the problem was that many of the implementations either inappropriately (in my opinion) set the HTTP Accept header to */* or just adopted the string used by the host browser. This made it nigh-on impossible to request, in a RESTful fashion, an XML representation of the resource you wish to edit, i.e. opaque URLs and server-side content negotiation. In the XForms 1.1 Candidate Recommendation, however, there is a section that doesn't exactly give away its importance. That section is 11.8, "The header Element", and the significance of the 'xf:header' element, along with its attributes and children, is that it gives you the power to control the HTTP headers of your submission requests. Now if you are at all familiar with the work of the W3C Forms Working Group you won't be surprised to discover that it doesn't stop with just setting the HTTP headers. They have thought about how you set them and had the foresight to provide optional attributes that take an XPath expression, thereby allowing you to set headers and their values based upon instance data. Is that cool or what? I have not checked out any of the server-side XForms implementations, but as for browser plug-ins: FormsPlayer already supports this, the Mozilla XForms plug-in will support it in the next release... [With respect to the example] In the case of FormsPlayer, this results in Internet Explorer's default Accept header being overwritten. On the other hand the Mozilla plug-in takes the stance that the Accept header allows multiple values and therefore appends the new value to the end of the header. For me, I'd have expected the former but can also understand the logic of the latter. 
However, you could look at it this way: an XForms binding instruction will, likely as not, be defined for one particular representation; therefore, saying you'll accept more than one representation doesn't seem very practical in this context. This might lead you to conclude that the overwrite behavior is appropriate...
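
The two behaviours contrasted above (overwriting the browser's Accept header versus appending to it) can be made concrete with a small sketch. It models only the header arithmetic and is not code from either plug-in: the function names and the sample browser default are invented for illustration.

```python
def overwrite_accept(headers, value):
    """Replace any existing Accept header (the behaviour reported for FormsPlayer)."""
    result = dict(headers)
    result["Accept"] = value
    return result


def append_accept(headers, value):
    """Append to an existing Accept header (the behaviour reported for the Mozilla plug-in)."""
    result = dict(headers)
    existing = result.get("Accept")
    result["Accept"] = f"{existing}, {value}" if existing else value
    return result


# Made-up browser default, for illustration only.
browser_defaults = {"Accept": "text/html, */*"}
```

With these definitions, `overwrite_accept(browser_defaults, "application/xml")` asks the server for XML only, whereas `append_accept` leaves `*/*` in the header, so a server doing content negotiation is still free to return a different representation.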

See also: XML and Forms

SEC Introduces IDEA Database to Collate XBRL Reporting Tags
Roy Mark, eWEEK

The U.S. Securities and Exchange Commission (SEC) has debuted its successor to the agency's EDGAR database with Chairman Christopher Cox proclaiming the new interactive database to be faster, more accurate and more meaningful than the 1980s-era EDGAR. Called IDEA (Interactive Data Electronic Applications), the database is based on a completely new architecture being built from the ground up. The system is built on the XBRL (Extensible Business Reporting Language) tagging format that allows public companies and mutual funds to submit information in a standardized, tagged format for analysis and comparison. The SEC has already formally proposed requiring U.S. companies to provide financial information using interactive XBRL data beginning as early as 2009. The SEC has separately proposed requiring mutual funds to submit their public filings using interactive data. The SEC currently has a voluntary program for tagging financial documents with XBRL. Microsoft, General Electric and United Technologies are already participating in the program. The decision to eventually replace EDGAR marks the SEC's transition from collecting paper forms and documents to making the information itself freely available to the public. "IDEA will ensure that the SEC continues to stay ahead of the needs of investors," Cox said. "IDEA's launch represents a fundamental change in the way the SEC collects and publishes company and fund information -- and in the way that investors will be able to use it." EDGAR will continue to be available for the indefinite future. During the transition to IDEA, investors will be able to take advantage of new interactive, IDEA-like features that will be grafted onto EDGAR in the short run. The transition will make it possible for investors to use IDEA's advanced search capabilities and to use the information from EDGAR within spreadsheets and analytical software, which is not possible with EDGAR. 
The EDGAR database also will continue to be available as an archive of company filings for past years.

See also: XBRL references

A Standards-Based Expert System for Annotating XML-in-ZIP documents
Rick Jelliffe, O'Reilly Technical

One of the projects I have been working on recently has been a proof-of-concept system to allow a rules-based approach to automatically classifying and annotating XML-in-ZIP documents. A few of my recent posts have been in this area: navigating around ZIP, adding foreign elements, and so on. The brief was for an organization with a large number of documents from multiple sources, but with each source supposed to use stylesheets. The idea was to make a rules base that would distinguish all the different ways that a few structures (titles, table of contents, potentially citations, etc.) were represented. This would allow classification of documents according to the structures found, the discovery of outliers and exceptions (e.g. incorrectly marked up documents, or where additional rules were needed), and automated annotation back to the original documents. The approach we have taken is to use Schematron, using the report elements rather than the assert elements. These have opposite logic to the assertions: a report is made whenever you find part of a pattern rather than when it is missing. The rules base is a Schematron schema. It can wander around the ZIP archive if it has to, in order to get information to test the main XML file. This generates a report in ISO SVRL, the standard Schematron Validation Report Language, which in particular includes an XPath locator to the matched element. The report also includes dynamically generated text from the document in question, and this gives enough information for the various other later stages. It works! The only real trick is that writing back multiple annotations to the original file (in the form of customXml elements for OOXML files) will have to be done in reverse order, so as not to disrupt the XPaths which use positionals... The reason for using Schematron for the rules file is that it eliminates programming aspects from creating the rules file. 
The person maintaining the rules file only needs to understand XPaths, not full XSLT or Java etc. And the hope is that many of these rules will be quite similar, so making new rules will often be just hacking rather than creating entirely new XPaths. Schematron allows the simplest kind of expert system to be expressed: basically the equivalent of if-thens and case statements but hidden as patterns and rules. This means that the maintainer does not need to have any awareness of higher logic, and so on.
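
The remark about writing annotations back in reverse order can be illustrated with a small sketch. It is not the author's system: the `annotation` wrapper element stands in for customXml, the `(position, label)` pairs stand in for SVRL locators such as `p[3]`, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def wrap_child(parent, tag, position, label):
    """Wrap the position-th (1-based) direct child named `tag` in an annotation element."""
    target = parent.findall(tag)[position - 1]
    index = list(parent).index(target)
    wrapper = ET.Element("annotation", {"label": label})  # stand-in for customXml
    parent.remove(target)
    wrapper.append(target)
    parent.insert(index, wrapper)


def annotate(parent, tag, annotations):
    """Apply (position, label) annotations from the last position to the first."""
    # Working backwards leaves the positions of earlier siblings untouched,
    # so locators computed before any change remain valid.
    for position, label in sorted(annotations, reverse=True):
        wrap_child(parent, tag, position, label)
```

Given a `body` element with three `p` children, `annotate(body, "p", [(1, "title"), (3, "toc")])` wraps the third paragraph first and the first paragraph second. Applied in forward order, wrapping `p[1]` would leave only two direct `p` children, and the locator `p[3]` would no longer resolve to the intended element.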

See also: Schematron references

Sharpening Your Axis with Visual Basic 9
Beth Massi and Avner Aharoni, DevX.com

Visual Basic 9 completely eliminates the barrier between the code you write and the XML you're trying to express. Creating, querying, and transforming XML is much more intuitive and productive than ever before. Visual Basic 9 has a new set of language features that allows developers to work with XML in a much more productive way using a new API called "LINQ to XML." LINQ stands for "Language Integrated Query," and it lets you write queries for objects, databases, and XML in a standard way. Visual Basic provides deep support for LINQ to XML through XML literals and XML axis properties. These features let you use a familiar, convenient syntax for working with XML in your Visual Basic code. LINQ to XML is a new, in-memory XML programming API specifically designed to leverage the LINQ framework. Even though you can call the LINQ APIs directly, only Visual Basic allows you to declare XML literals and directly access XML axis properties. This article will help you master these new Visual Basic XML features... Traditionally, when working with XML in code you would use the XML Document Object Model (XML DOM), and call its API to manipulate objects representing the structure of your XML. This API is document-centric, which does not lend itself well to creating XML with LINQ. In addition it uses the XPath language, which is unnecessary when using LINQ. Another way to work with XML would be to drop out of Visual Basic code altogether and manipulate XML using XSLT, XQuery, or XPath directly. With the release of the .NET Framework 3.5, the LINQ to XML API allows developers to work with XML in a much more intuitive way, taking a top-down, element-centric approach. Additionally, in Visual Basic 9 (VB9) you can use XML literals to embed XML directly into your code and bring an even more WYSIWYG approach to XML programming... 
Working with XML in VB9 via the LINQ to XML API using XML literals and XML axis properties completely eliminates the barriers between the code you write and the XML you're trying to express. The new LINQ to XML API provides a much more intuitive approach to querying, transforming, and creating XML, and in many cases has better performance than its predecessor, the XML DOM.

Quick Start for Persisting XML Standards-Compliant Data
Conor O'Mahony, Native XML Databases Blog

XML-based standards have emerged in many industries. For instance, there is ACORD in insurance, FIXML in financial services, NIEM in government, and so on. Are you evaluating options for persisting standards-compliant XML data? If so, you should know about a great resource. As you know, you can freely download IBM DB2, which is a data server for both relational and XML data. Well, IBM has also made available working demos for a number of XML standards, including ACORD, FIXML, FpML, MISMO, NIEM, OTA, TAX1120, TWIST, UNIFI, and more. The demos show end-to-end XML data exchange, together with data retrieval via RESTful Web services, Atom feeds, and XForms. Note: In separate blog articles, O'Mahony discusses "When to Use a Native XML Database" and differences between 'Native XML' database technologies. "The basic unit of storage in a native XML database is the XML data that is being stored. In other words, XML data is stored 'as is' in a native XML database. What a native XML database does is straightforward. How it does this is not so straightforward. Each vendor implements its native XML database in a different way, with very different performance characteristics and capabilities. When choosing a native XML database, you should carefully consider the implications of choosing one vendor's database over another. The performance, scalability, and capabilities of the applications you are building will vary depending on your choice of database. Depending on which forums you read, there are differing opinions regarding which vendor's implementation offers the best technology."

How to Shut Down All Your Machines Without Anyone Noticing
Jason Hunter, MarkMail Blog

Last week we discovered we had to replace some bad memory chips in two of the three machines we use to run the MarkMail service. This blog post tells the story of how we managed to replace these memory chips without (almost) any of our visitors noticing... The MarkLogic machines have specialized roles. One machine (I like to picture it up front) listens for client connections. It's responsible for running XQuery script code, gathering answers from the other two machines, and formatting responses. The other two machines manage the stored content, about half on each. They support the front machine by actually executing queries and returning content... All email messages go into MarkLogic data structures called "forests". Get it? Forests are collections of trees, each document being an XML tree. Our D1 server manages forests MarkMail 1 and MarkMail 2, the oldest two. They're now effectively read-only because we're loading into higher numbered forests now on D2. Turns out that's a highly convenient fact. It means we could back up the content from D1 and put it on our spare box, now acting like a D3. Then with a single transactional call to MarkLogic we could enable the two backup forests on D3 and disable the two original forests on D1. No one on the outside would see a difference. Zero downtime... Looking back, we're happy that we could cycle through disabling every machine in our MarkLogic cluster yet not have any substantial downtime. Looking forward, we expect operations like this will get easier. If and when we add a permanent E2 machine to the cluster it means we won't have to do anything special to take one of them out of commission. Our load balancer will just automatically route around any unresponsive front-end servers. We were also happy to see that our configuration for SAN-based manual failover works. We proved that as long as another machine can access the SAN, we'll be able to bring the content back online should a back-end machine fail... 
A non-techie friend once asked why managing a high-uptime web site was hard. I said, "It's like we're driving from California to New York and we're not allowed to stop the car. We have to fill the gas tank, change the tires, wash the windows, and tune the engine but never reduce our speed. And really, because we're trying to add new features and load new content as we go, we need to leave California driving a Mini Cooper S and arrive in New York with a Mercedes ML320." Note: MarkMail is a free service for searching mailing list archives, with huge advantages over traditional search engines. It is powered by MarkLogic Server: Each email is stored internally as an XML document, and accessed using XQuery. All searches, faceted navigation, analytic calculations, and HTML page renderings are performed by a small MarkLogic Server cluster running against millions of messages.

See also: the MarkMail overview

