Cover Pages: XML Daily Newslink: Tuesday, 22 July 2008

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
IBM Corporation http://www.ibm.com

Headlines

MarkMail Announces Search Support for java.net Mailing List Archives
Society of Broadcast Engineers Promotes the Common Alerting Protocol (CAP)
RDF, Topic Maps, Predicate Calculus, and the Queen of Romania
UncertML: An XML Schema for Exchanging Uncertainty
Don't Forget UOF: Here Comes EIOffice 2009
The Stateless State: What Were You Doing Five Minutes Ago on the Web?
Handle Errors in XML Parsing with SAX

MarkMail Announces Search Support for java.net Mailing List Archives
Jason Hunter, Mark Logic Announcement

In a July 22, 2008 posting to "The Making of MarkMail" (MarkMail Announce Mailing List), Jason Hunter reported on a cooperative effort with Sun Microsystems and CollabNet to index the mail archives for java.net, which generates some "15,000 human-to-human emails every month..." MarkMail is a free service for searching mailing list archives, with huge advantages over traditional search engines. It is powered by MarkLogic Server: each email is stored internally as an XML document and accessed using XQuery. All searches, faceted navigation, analytic calculations, and HTML page renderings are performed by a small MarkLogic Server cluster running against millions of messages. The Project List identifies mailing lists integrated into the MarkMail index/search operations. As of 2008-07-22, the list included 916 projects: "each project has its own subdomain, which implicitly limits all searches to the emails of that specific project. That can be useful for bookmarking or linking..." These include the XSL list, World Wide Web Consortium (W3C), WSO2, Apache, Google Groups, etc. Jason says: "Last week, in collaboration with Sun and CollabNet, we loaded the mail archive histories for java.net, Sun's open source developer playground for Java projects and home to projects like GlassFish, jMaki, AppFuse, Grizzly, Hudson, and WebWork. The load includes more than 1,000 mailing lists and roughly 1,000,000 messages... Founded in 2003, java.net is the realization of a vision of a diverse group of engineers, researchers, technologists and evangelists at Sun Microsystems, Inc. to provide a common area for interesting conversations and innovative development projects related to Java technology. With nearly 450,000 total members, java.net encompasses a wealth of community-based knowledge in its ongoing email discussions, which most recently totaled 40,000 emails per month... With such a large community, it's fun to look at community-wide analytics.
It's a little-known feature that you can go to our browse page and add an arbitrary query to the URL and it'll show you list-by-list numbers for all messages matching that query. For example, you can view the total number of messages per list throughout time, or the counts for just last week. You can browse the lists where people from "sun.com" have written the most. If you want to see the top lists, do it as a regular search."
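The per-list analytics described above amount to faceted counting over messages stored as XML. The Java sketch below is only an illustration under stated assumptions: MarkMail runs XQuery inside MarkLogic Server, whereas this uses the JDK's DOM and XPath APIs, and the `<message>` element shape, the class name `ListFacets`, and the list names are all hypothetical.

```java
import java.io.StringReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: each message is a small XML document, and a "facet" is a
// per-list count of the messages whose body matches a search term.
class ListFacets {

    // Parse one message. The <message><list/><body/></message> shape is invented.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed message: " + xml, e);
        }
    }

    // Count matching messages per list; TreeMap keeps the list names sorted.
    static Map<String, Integer> countByList(List<String> messages, String term) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, Integer> counts = new TreeMap<>();
        for (String xml : messages) {
            Document doc = parse(xml);
            try {
                String body = xpath.evaluate("/message/body", doc);
                if (body.contains(term)) {  // plain, case-sensitive substring match
                    counts.merge(xpath.evaluate("/message/list", doc), 1, Integer::sum);
                }
            } catch (XPathExpressionException e) {
                throw new IllegalStateException(e);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> messages = List.of(
            "<message><list>glassfish-users</list><body>GlassFish deploy error</body></message>",
            "<message><list>hudson-dev</list><body>Hudson build fails</body></message>",
            "<message><list>glassfish-users</list><body>GlassFish cluster setup</body></message>");
        System.out.println(countByList(messages, "GlassFish"));  // prints {glassfish-users=2}
    }
}
```

A real search engine would use an index rather than scanning every document, but the output shape (list name mapped to matching-message count) is the same idea as the browse-page numbers.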

See also: the press release

Society of Broadcast Engineers Promotes the Common Alerting Protocol (CAP)
Barry Thomas and Christopher D. Imlay, Comment to the FCC

Many expect FEMA (U.S. Federal Emergency Management Agency) officials to announce a position on adopting the Common Alerting Protocol (CAP) by the end of July 2008, consistent with the hope of broadcasters and emergency management specialists that CAP will be formally designated "as the official data protocol for creating and sending emergency messages." The Society of Broadcast Engineers (SBE) is one of several groups advocating specifically for adoption of CAP. In May 2008, SBE issued a document outlining a "Strategy for Implementing CAP Emergency Alert System (EAS)," proposing an EAS CAP Profile Working Group "to develop the EAS CAP Profile to be mandated by FEMA for use in every state, regardless of distribution means, to ensure interoperability among states and devices." This WG would seek to align with the HazCollect CAP Profile and coordinate with NWS "to ensure that any new EAS CAP Profile can be accepted at NWS for use in generating ongoing NWR SAME alerts to the current generation of NOAA Weather Radios now in use by the public..." Also, on June 18, 2008, 'Comments of the Society of Broadcast Engineers (SBE) Before the Federal Communications Commission' were submitted for "Licensing of the 700 MHz Band D Block Spectrum and Creation of a Nationwide, Broadband, Interoperable Public Safety Network." Excerpt from the submission: "... FEMA was tasked to define and adapt CAP 1.1 and CAP Protocols and how they will be used to improve the President's ability to reach all American citizens. The goal as SBE perceives it is to enhance EAS, so that it can be an effective public warning system to complement and integrate with a growing number of other warning systems... In the original EAS system, short data codes for events, locations, and times of emergency, etc. had to be compressed in the 512-baud FSK protocol and transmitted in seconds as a relay from station to station.
While that relay system has performed well, it will not support the data throughput required for CAP-Enhanced EAS. For example, an EAS message for an AMBER alert (event code CAE) indicates that an AMBER alert has been issued in a particular area for a particular time. The current EAS data burst does not contain any information about the description (or photo) of the abducted child, possible routes of travel, vehicles of interest, etc... The Society of Broadcast Engineers respectfully requests that the Commission set aside a total of just 100 kHz in the D block spectrum; i.e., 50 kHz from the D block spectrum in the lower band, and 50 kHz from the D block spectrum in the upper band, exclusively for the Emergency Alert System nationwide... Further, SBE suggests that setting aside this limited amount of EAS support spectrum is quite obviously in the public interest, as it will enable the rapid and proficient deployment of CAP on a nationwide basis and improve the efficiency and performance of the EAS..."
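To make the contrast with the short EAS data burst concrete, here is a minimal Java sketch that builds a CAP 1.1 alert carrying a free-text description alongside the EAS event code. The namespace and the core `alert`/`info` elements follow CAP 1.1; the identifier, sender address, description text, the class name `CapAlert`, and the use of `valueName` "SAME" for the event code are assumptions for illustration, not the EAS CAP Profile that SBE proposes.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: a minimal CAP 1.1 message for an AMBER alert.
class CapAlert {
    static final String NS = "urn:oasis:names:tc:emergency:cap:1.1";

    // Append <name>text</name> in the CAP namespace; text is null for container elements.
    static Element add(Document doc, Element parent, String name, String text) {
        Element e = doc.createElementNS(NS, name);
        if (text != null) e.setTextContent(text);
        parent.appendChild(e);
        return e;
    }

    static String amberAlert(String identifier, String sent, String description) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element alert = doc.createElementNS(NS, "alert");
            doc.appendChild(alert);
            // The required <alert> elements, in CAP 1.1 order.
            add(doc, alert, "identifier", identifier);
            add(doc, alert, "sender", "alerts@example.org");  // hypothetical sender
            add(doc, alert, "sent", sent);
            add(doc, alert, "status", "Actual");
            add(doc, alert, "msgType", "Alert");
            add(doc, alert, "scope", "Public");
            Element info = add(doc, alert, "info", null);
            add(doc, info, "category", "Rescue");
            add(doc, info, "event", "Child Abduction Emergency");
            add(doc, info, "urgency", "Immediate");
            add(doc, info, "severity", "Severe");
            add(doc, info, "certainty", "Likely");
            // Carry the legacy EAS event code too (valueName "SAME" is an assumption).
            Element code = add(doc, info, "eventCode", null);
            add(doc, code, "valueName", "SAME");
            add(doc, code, "value", "CAE");
            // Free text that the EAS data burst has no room for.
            add(doc, info, "description", description);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(amberAlert("EXAMPLE-2008-0001", "2008-07-22T14:00:00-05:00",
                "Hypothetical description of the child, the suspect vehicle, and direction of travel."));
    }
}
```

CAP also defines `resource` elements for attachments such as photos, which is the kind of payload the SBE comment says the current burst cannot carry; they are omitted here to keep the sketch short.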

RDF, Topic Maps, Predicate Calculus, and the Queen of Romania
Michael Sperberg-McQueen, 'Messages in a Bottle' Blog

Some colleagues and I spent time not long ago discussing the proposition that RDF has intrinsic semantics in a way that XML does not. My view, influenced by some long-ago thoughts about RDF, was that there is no serious difference between RDF and XML here: from interesting semantics we learn things about the real world, and neither the RDF spec nor the XML spec provides any particular set of semantic primitives for talking about the world. The maker of the vocabulary can (I oversimplify slightly, complexification below) make terms mean pretty much anything they want: this is critical both to XML and to RDF. The only way, looking at an RDF graph or the markup in an XML document, to know whether it is talking about the gross national product or the correct way to make adobe, is to look at the documentation. This analysis, of course, is based on interpreting the proposition we were discussing in a particular way, as claiming that in some way you know more about what an RDF graph is saying than you know about what an SGML or XML document is saying, without the need for human intervention. Such a claim seems patently false, but as far as I can tell it is what some of my colleagues have been trying to persuade me of for years... Thomas Roessler has recently posted a concise but still rather complex statement of the contract that producers of RDF enter into with the consumers of RDF, and the way in which it can be said to justify the proposition that RDF has more semantics built-in than XML. My bumper-sticker summary, though, is simpler. When looking at an XML document, you know that the meaning of the document is given by an interaction of (1) the rules for interpreting the document shaped by the designer of the vocabulary and by the usage of the document creator with (2) the actual content of the document. The rules given by the vocabulary designer and document author, in turn, are limited only by human ingenuity.
If someone wants to specify a vocabulary in which the correct interpretation of an element requires that you perform gematriya on the element's generic identifier (element type name, as the XML spec calls it) and then feed the resulting number into a specific random number generator as a seed, then we can say that that's probably not good design, but we can't stop them... [Note: the companion blog article on 'Descriptive markup and data integration' muses upon Thomas Roessler's assertion about specific ways in which "RDF's provision of a strictly monotonic semantics makes some things possible for applications of RDF, and makes other things impossible..."]

UncertML: An XML Schema for Exchanging Uncertainty
Matthew Williams, D.Cornford, L. Bastin, B. Ingram; Aston University TR

In this paper the authors present UncertML, an XML schema which provides a framework for describing uncertainty as it propagates through many applications, including online risk management chains. This uncertainty description ranges from simple summary statistics (e.g., mean and variance) to complex representations such as parametric, multivariate distributions at each point of a regular grid. The philosophy adopted in UncertML is that all data values are inherently uncertain (i.e., they are random variables, rather than values with defined quality metadata)... UncertML version 1.1 (2007-12-17) has flexible support for continuous random variables. This representation is suitable for many environmental variables; however, future releases will see added support for discrete and categorical random variables and possibly fuzzy representations. We note that UncertML is designed to encode any type of uncertainty. Whilst the above use cases highlight application to attribute uncertainty, location uncertainty is plainly an issue for geospatial data. We envisage that future iterations of GML might replace coordinates in the geometry types with Uncertainty types. Most data contains uncertainty, arising from sources including measurement error, observation operator error, processing/modelling errors, or corruption. Processing this uncertain data, typically through models (which typically also have errors), propagates the uncertainty. The ability to optimally utilise data relies on a complete description of any uncertainty... Other researchers have highlighted the importance of GIS frameworks which can handle incomplete knowledge in data inputs, in decision rules and in the geometries and attributes modelled. It is particularly important for this uncertainty to be characterised and quantified when GI data is used for spatial decision making.
Despite a substantial and valuable literature on means of representing and encoding uncertainty and its propagation in GI, no framework yet exists to describe and communicate uncertainty in an interoperable way. This limits the usability of the ever-increasing Internet resources of geospatial data that are based on specifications providing frameworks for the 'GeoWeb'. [Note: This work was funded by the European Commission, under the Sixth Framework Programme, by Contract 033811 with DG INFSO, Action Line IST-2005-2.5.12 ICT for Environmental Risk Management. Related work: INTAMAP (INTeroperability and Automated MAPping), a fully automated interpolation service employing Web Services.]
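The simplest case UncertML covers, summary statistics such as mean and variance, can be illustrated with a short propagation example. This Java sketch is not part of UncertML and rests on stated assumptions: it pushes two independent random variables through a hypothetical linear model, once analytically and once by Monte Carlo sampling (which additionally assumes Gaussian inputs). The `MeanVariance` record and `Propagate` class are invented names.

```java
import java.util.Random;

// Hypothetical sketch: a random variable summarised by its mean and variance.
// This is not the UncertML encoding itself.
record MeanVariance(double mean, double variance) {}

class Propagate {

    // Exact for y = a*x1 + b*x2 + c when x1 and x2 are independent:
    // E[y] = a*E[x1] + b*E[x2] + c,  Var[y] = a^2*Var[x1] + b^2*Var[x2].
    static MeanVariance linear(double a, MeanVariance x1, double b, MeanVariance x2, double c) {
        return new MeanVariance(
                a * x1.mean() + b * x2.mean() + c,
                a * a * x1.variance() + b * b * x2.variance());
    }

    // Monte Carlo estimate of the same moments, assuming x1 and x2 are Gaussian.
    // The same loop works for nonlinear models, where no closed form may exist.
    static MeanVariance monteCarlo(double a, MeanVariance x1, double b, MeanVariance x2,
                                   double c, int samples, long seed) {
        Random rng = new Random(seed);
        double sum = 0, sumSq = 0;
        for (int i = 0; i < samples; i++) {
            double s1 = x1.mean() + Math.sqrt(x1.variance()) * rng.nextGaussian();
            double s2 = x2.mean() + Math.sqrt(x2.variance()) * rng.nextGaussian();
            double y = a * s1 + b * s2 + c;
            sum += y;
            sumSq += y * y;
        }
        double mean = sum / samples;
        return new MeanVariance(mean, sumSq / samples - mean * mean);  // population variance
    }

    public static void main(String[] args) {
        MeanVariance x1 = new MeanVariance(10, 4), x2 = new MeanVariance(5, 1);
        System.out.println(linear(2, x1, -3, x2, 1));  // prints MeanVariance[mean=6.0, variance=25.0]
        System.out.println(monteCarlo(2, x1, -3, x2, 1, 200_000, 42L));  // close to the line above
    }
}
```

For the linear case the two answers should agree closely; for nonlinear models, sampling is often the only practical route, which is one reason an exchange format needs room for richer descriptions than a single mean and variance.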

See also: the research web site

Don't Forget UOF: Here Comes EIOffice 2009
Andy Updegrove, Standards Blog

Long-time followers of the ODF-OOXML story will recall that there is a third editable, XML-based document format in the race to create the documentary record of history. That contender is called UOF (for Uniform Office Format), and it has been under development in China since 2002, although I first heard and wrote about it back in November of 2006. Last summer, UOF was adopted as a Chinese National Standard, and last Friday the first complete office suite based upon UOF was released. It's called Evermore Integrated Office 2009 (EIOffice 2009 for short)... While Evermore may not cost Microsoft many sales in the West, it could prove to be a formidable opponent in what I expect will eventually be the largest market for desktop software in the world: China, with its 1.3 billion citizens. China is determined to promote its own software industry, and Evermore will also have a distinct price advantage, at least relative to Microsoft's standard list prices. The top edition of EIOffice 2009 will sell for RMB 1,198 ($174.92), as compared to the RMB 4,902 ($717.83) price in China for Microsoft's professional Office 2007 edition... There could be other factors to take into account as well. The Chinese government has been playing a skillful game of cat and mouse with Microsoft since last year. And it's clearly no coincidence that on July 11, Evermore Vice President Cao Shen called for Microsoft to be the first target for China's new anti-monopoly law, which will take effect in ten days' time. Whether Shen is speaking to, or for, the government, of course, remains to be seen. This is not an isolated expression of displeasure with a foreign vendor over a standards-related commercial battle in China.
I have noted several recent standards-related articles at Xinhua, the official Chinese news service, that have been unusually hostile not only towards Microsoft but towards other western vendors as well, such as Nokia (this time with respect to 3G wireless standards, and perhaps good news for Google and Apple). And then there is the Wi-Fi versus WAPI face-off, which refuses to die. WAPI installations and China's home-grown TD-SCDMA 3G wireless standard will both be given high-profile exposure during the Olympic games. All in all, it appears that athletics will not provide the only contest in Beijing in August. Another struggle of Olympian proportions (with far more gold to be won) is about to begin there as well. And while the government may not be able to influence what happens on the playing fields, it is determined not to come home empty-handed when it comes to standards...

The Stateless State: What Were You Doing Five Minutes Ago on the Web?
Peter Seebach, IBM developerWorks

Clear thinking about how data persists across retrievals, sessions, processes, and other boundaries can help you improve your Web applications, both present and future. "State" and "persistence" are crucial "terms of art" for computing. They are concepts that arise throughout computing, but have meanings hard to understand from outside the domain. Clarity about these is essential for a developer of modern distributed applications. In general, "state" refers to information about the current conditions of program execution—runtime data stored in memory. "Persistence," by contrast, refers to keeping data between one program execution and another. In a program which iterates through a table in a database, the database itself is "persistent," but the information about which row is being displayed is "state." When applied to a protocol, "state" treats each series of interactions as having continuity, much like a single program's state. A "stateless" protocol is one in which there is no such continuity; each request must be processed entirely on its own merits. HTTP is conventionally considered a stateless protocol... HTTP is fundamentally a request-response couple: A browser requests a particular URL, perhaps with supplementary data, and the server answers with a response page. While the end-user might experience his or her surfing as a trip made up of steps in a particular sequence, at the protocol level each delivered page is independent of the others; any display is simply the output corresponding to the latest URL-plus-data input. How is it, then, that Web developers talk about "sessions," "logins" and "logouts," "personalization," "hijacking," and other such inherently stateful ideas? HTTP is supplemented by several devices which give it state functionality; that is, standards subordinate to the HTTP definition give mechanisms that, among other functions, can be interpreted as state interfaces. 
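A minimal sketch of the most common such device, the cookie, shows how continuity is rebuilt on top of independent requests: the server keeps the state, and each request carries only a key to it. It is written in Java using only the JDK; the cookie name `SID`, the class `CookieSessions`, and the in-memory store are hypothetical choices, and real frameworks wrap this in their own Session objects.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: HTTP itself carries no memory between requests, so the server
// keeps state in a table and the browser returns a key to it in the Cookie header.
class CookieSessions {
    private final Map<String, Map<String, String>> store = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Split "theme=dark; SID=abc" into name/value pairs (simplified: no quoting rules).
    static Map<String, String> parseCookieHeader(String header) {
        Map<String, String> cookies = new HashMap<>();
        if (header == null) return cookies;
        for (String part : header.split(";")) {
            int eq = part.indexOf('=');
            if (eq > 0) cookies.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
        }
        return cookies;
    }

    // Create empty per-user state and return the unguessable ID the server would
    // send back as "Set-Cookie: SID=<id>".
    String newSession() {
        byte[] bytes = new byte[16];
        random.nextBytes(bytes);
        String id = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        store.put(id, new ConcurrentHashMap<>());
        return id;
    }

    // Recover the state for one request; empty if there is no SID or it is unknown.
    Optional<Map<String, String>> lookup(String cookieHeader) {
        String id = parseCookieHeader(cookieHeader).get("SID");
        return id == null ? Optional.empty() : Optional.ofNullable(store.get(id));
    }

    public static void main(String[] args) {
        CookieSessions server = new CookieSessions();
        String sid = server.newSession();                             // response 1 sets the cookie
        server.lookup("SID=" + sid).get().put("cart", "1 book");      // request 2 updates state
        System.out.println(server.lookup("theme=dark; SID=" + sid));  // request 3 sees it
        System.out.println(server.lookup("theme=dark"));              // prints Optional.empty
    }
}
```

Each call to `lookup` still treats its input on its own merits, exactly as a stateless protocol requires; the apparent "session" exists only because the same key keeps coming back.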
Most Web frameworks and browsers layer higher-level interfaces (a programmable Session object, for example) that simplify development. At a general level, though, they encapsulate one or more of the following basic session-maintenance mechanisms in an abstract interface... As mentioned throughout this article, state mechanisms expose a risk. What if someone other than the intended user gets access to the user's state information? There are a number of other features which may help reduce the risks of trying to keep state for HTTP...
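The article's own list of additional protections is elided here, so the following Java sketch shows only a few widely used mitigations, as an illustration: long random session IDs, issuing a new ID at login, server-side expiry, and the `Secure` and `HttpOnly` cookie attributes. The class `HardenedSessions` and its methods are hypothetical.

```java
import java.security.SecureRandom;
import java.time.Duration;
import java.time.Instant;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of common mitigations for session hijacking; these are
// illustrative choices, not the specific features the article goes on to list.
class HardenedSessions {
    record Session(String user, Instant expires) {}

    private final Map<String, Session> store = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();
    private final Duration ttl;

    HardenedSessions(Duration ttl) { this.ttl = ttl; }

    private String newId() {
        byte[] bytes = new byte[32];
        random.nextBytes(bytes);  // unpredictable IDs resist guessing
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // On login, discard any pre-login ID (it may have been planted by an attacker)
    // and issue a fresh one bound to the user, with a fixed lifetime.
    String login(String oldId, String user, Instant now) {
        if (oldId != null) store.remove(oldId);
        String id = newId();
        store.put(id, new Session(user, now.plus(ttl)));
        return id;
    }

    // Return the user for a live session, or null; expired sessions are removed.
    String userFor(String id, Instant now) {
        Session s = id == null ? null : store.get(id);
        if (s == null) return null;
        if (!now.isBefore(s.expires())) {
            store.remove(id);
            return null;
        }
        return s.user();
    }

    // Secure: send only over HTTPS. HttpOnly: hide the cookie from page scripts.
    static String setCookieHeader(String id, Duration ttl) {
        return "SID=" + id + "; Max-Age=" + ttl.getSeconds() + "; Path=/; Secure; HttpOnly";
    }
}
```

Rotating the ID at login limits session fixation, and server-side expiry bounds how long a stolen ID stays useful even if the browser ignores `Max-Age`.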

Handle Errors in XML Parsing with SAX
Brett D. McLaughlin, IBM developerWorks

With the ease of XML parsing in the newer Java language APIs, from JAXP to JAXB to JAX-WS, XML parsing has become foundational to Java programming. But with the abstractions and higher-level APIs comes an apparent loss of control over the fine-grained interactions between a parser and your XML data. This typically leads to more errors or, worse, a complete halt of parsing when even the smallest problem arises. Fortunately, the Simple API for XML (SAX) still provides an easy-to-use means of dealing with errors, and you can access that mechanism even when you don't use SAX directly... Error handling must be considered a frontline part of your application development. In fact, most users report a stronger memory of a program's errors (and how those errors were handled) than of any other component of an application or site. Great features are great; horrendous error handling is horrendous. That sounds overly simple, but taking even 10 percent more time to handle errors gracefully, or to prevent them from occurring at all, will dramatically improve user experience. When it comes to handling XML parsing and processing errors, the key is not really a particular SAX interface as much as it is understanding what drives XML processing. Once you realize that SAX underpins most XML processing, you know that SAX is then the key to good error handling. If five years from now another XML parsing API has supplanted SAX, you should learn that API to the extent that you can work with it. Getting access to a SAX XMLReader is trivial, but knowing what to do with that interface is not. In fact, that's really the key to error handling: Understand the system that you work with and the lower levels of that system. You don't need to start pushing and popping in assembly language, but you do need to know that SAX is the key XML parsing API in use today. Error handling then becomes an issue of implementation and execution...
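As a minimal sketch of the mechanism the article refers to, the Java code below implements the SAX `ErrorHandler` interface (`warning`, `error`, `fatalError`) to collect problems with their line and column numbers. It attaches the same handler both to a SAX `XMLReader` and to a JAXP `DocumentBuilder`, which illustrates using SAX error handling without writing a SAX application. The class names `CollectingErrorHandler` and `SaxErrors` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Records each problem with its position instead of letting the parser print it.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> problems = new ArrayList<>();

    private void record(String level, SAXParseException e) {
        problems.add(level + " at line " + e.getLineNumber() + ", column "
                + e.getColumnNumber() + ": " + e.getMessage());
    }

    @Override public void warning(SAXParseException e) { record("warning", e); }

    // Recoverable problems such as validity errors (reported only when validating):
    // log them and keep parsing.
    @Override public void error(SAXParseException e) { record("error", e); }

    // Well-formedness errors: record, then rethrow, since parsing cannot continue.
    @Override public void fatalError(SAXParseException e) throws SAXException {
        record("fatal", e);
        throw e;
    }
}

class SaxErrors {
    // Plain SAX: the handler is attached to an XMLReader.
    static List<String> parseWithSax(String xml) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler());  // content ignored in this sketch
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            if (handler.problems.isEmpty()) throw new IllegalStateException(e);  // not a parse error
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.problems;
    }

    // DOM via JAXP: the same SAX ErrorHandler plugs into DocumentBuilder.
    static List<String> parseWithDom(String xml) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(handler);
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            if (handler.problems.isEmpty()) throw new IllegalStateException(e);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.problems;
    }
}
```

Calling `SaxErrors.parseWithSax("<a><b></a>")` should return a list whose first entry begins with `fatal` and describes the mismatched end tag. Rethrowing from `fatalError` follows the SAX contract that a document is unusable after a fatal error, while `error` deliberately lets validation problems accumulate so they can all be reported at once.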

