Cover Pages: XML Daily Newslink: Wednesday, 20 May 2009

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Microsoft Corporation http://www.microsoft.com

Headlines

Transcending SOA: OMG Announces Business Ecology Initiative (BEI)
Office XML Document Format Tools Released
IETF Internet Drafts: Atom Hierarchy Extensions and Collection Discovery
Reaping Deep Web Rewards Is a Matter of Semantics
Open Database Alliance: New Direction for mySQL
Guidance on Interoperation and Implementation Reports
Yahoo Vows Death to the '10 Blue Links'

Transcending SOA: OMG Announces Business Ecology Initiative (BEI)
Dave West, InfoQueue

Object Management Group (OMG) has announced the "Business Ecology Initiative" (BEI) with IBM as Founding Sponsor. BEI "is focused on erasing the artificial lines between business and Information Technology (IT) so that IT becomes a ubiquitous, integral and vital asset to the company and leads decision-making, structural change and enterprise-wide quality initiatives, drives efficiency and revenue, and provides measurable, clear return on investment." OMG intends to ground the BEI in the Actionable Architecture, which provides details on how to create sustainable business processes. "An Actionable Architecture", according to the OMG's FAQ document, is "one that supports Business Ecology through the output of business process analysis and supports decisions that maximize value and minimize waste. It supports the creation, automation, termination and optimization of business processes, applies to governance of these processes, identification of opportunities to promote sustainability (minimize waste in the process) and provides for the reuse of existing processes to support new business needs." The OMG sees BEI as an amalgamation of earlier technologies, like SOA, and plans to "use its proven ability to bring communities together and motivate new initiatives through offerings beyond standards work and into industry collaboration forums such as: (1) SOA Consortium: A community of business analysts and IT end-users dedicated to sharing experiences to maximize the effective transition to a Service Oriented Enterprise. (2) BPM Consortium: A community of business analysts and IT end-users dedicated to sharing experiences to maximize the performance of business processes. (3) GCIO: A community of business analysts and IT end-users dedicated to promoting and implementing sustainable business practices."

In addition, OMG intends to work on supporting standards and maturity models, including: [A] BPMM: A standardized methodology for measuring an organization's success at adopting BPM for process efficiency across the enterprise. [B] Green Computing Maturity Model (GCMM): A set of standard business practices to measure against organizational business practices to maximize efficiency and minimize environmental footprint for sustainability. [C] Architecture Driven Modernization: A set of standards for modeling legacy systems so that they may be readily connected to and transitioned to more modern systems methodologies and technologies, and related standards for optimizing software assurance and software quality. [D] Software Defined Radio: A set of standards to allow a wide array of different communication technologies to interoperate in the field on an as-needed basis without predefined interoperability. [E] SysML Modeling Language: A shared standard language for defining large, complex systems so that they may be effectively designed, optimized and combined. Each of the above items A-E could, in itself, be an ambitious undertaking. Combining them into a single initiative promises to generate numerous challenges and opportunities. In its entirety, BEI is less a combination of existing technologies like SOA and BPM. BEI should be seen as a means of transcending those technological and methodological solutions to achieve something that is greater than the whole. BEI should transcend SOA and similar efforts and initiatives...

See also: the OMG announcement

Office XML Document Format Tools Released
Kurt Mackie, Application Development Trends

Microsoft announced that new tools have been released to help further extend the compatibility and interoperability of Office Open XML (OOXML) document formats used in Microsoft Office 2007. The new tools are being developed by various open source projects. In addition, the Fraunhofer Fokus research group is working on a future "test library and validation tool" that will check document formats to see how well they comply with ISO/IEC 29500 and ECMA-376, which are OOXML-based international standards. Microsoft is a partner in the validation tool effort, which was announced in late February, 2009. One of the open source projects releasing a new tool is Apache POI, which works to make OOXML files readable in Java-based applications. On Monday, Apache POI 3.5 beta 5 was released at the Apache POI Web site, along with a software development kit. This latest release adds "improved support" for .DOCX (Word) and .PPTX (PowerPoint) file formats, as well as "extended support" for the .XLSX (Excel) file format, according to a Microsoft announcement. Microsoft first began collaborating with the Apache POI project back in March of last year. On Friday, MindTree and Microsoft released the Open XML Document Viewer v1.0 application. This browser plug-in, available at the CodePlex open source project site, allows Microsoft Office 2007 documents to be read in a Web browser. The Open XML Document Viewer, which translates OOXML-based files to HTML, now supports the Opera browser on both Windows and Linux. Other supported browsers include Firefox and Internet Explorer versions 7 and 8. Microsoft and Dialogika have enhanced an Office Binary to Open XML Translator application by adding support for .XLS and .PPT files. This application lets the user translate Office binary files into OOXML and OpenDocument Format (ODF) files. Finally, the Open XML-ODF Translator add-in for Microsoft Office got some improvements with version 3.0, which was released in late March on SourceForge. 
Microsoft supported ODF 1.1 with this translator release. Native support for ODF 1.1 is now part of Microsoft Office 2007 Service Pack 2, which was released in late April...

See also: on Fraunhofer FOKUS support

IETF Internet Drafts: Atom Hierarchy Extensions and Collection Discovery
Colm Divilly and Nikunj Mehta (eds), IETF Internet Drafts

Two related specifications have been released as initial version -00 Internet Draft documents. The "Hierarchy Extensions for Atom" specification defines mechanisms for hierarchical navigation among Atom feeds and entries. Editor's publication comment: "Based on feedback received on this and the atom-protocol list as well as others interested in hierarchical relations in Atom, we have split out the hierarchical navigation and representation portions from the initial document 'draft-divilly-atompub-hierarchy-00'. This was done with the intention of achieving consensus on the Atom syntax to be used for parent/child like navigation separately from how such resources are manipulated." Details: "Many applications, besides blogs, provide their data in the form of syndicated Web feeds using formats such as Atom (RFC 4287). Some such applications organize Atom Entries in a hierarchical fashion similar to a file system. This specification describes a means of communicating about Atom Entries that are hierarchically related to each other since resource identifiers are opaque to clients and cannot be directly manipulated for the purposes of representation exchange, i.e., navigation. This specification proposes new XML markup to extend the Atom Syndication Format and new link relations to obtain representations of hierarchically related Atom resources... Hierarchy Model: A hierarchy exists when a resource indicates the likelihood of a parent and/or a child resource. The terms parent and child are indicative of the need for the former to exist before the latter can be created... The Atom Syndication Format defines the Atom Entry construct. The extensions in this specification define two specialized kinds of Entry construct: parent Entry and child Entry. A parent Entry is a container for child Entries. A parent Entry could itself be a child of another parent Entry. 
Every Entry construct is represented as an Atom Entry Document referred to in this specification as an "entry" and its plural. A logical Feed comprising entirely of child entries of a given Entry is called its child feed and one comprising entirely of its parent entries is called its parent feed. Both parent feed and child feed are seen from the perspective of a given Entry resource. The entries in the parent feed and child feed of an Entry SHOULD be disjoint, i.e., not share any entries. A parent entry contains a "down" atom:link for its child feed. A parent entry may also contain a "down-tree" atom:link for a child feed of a subset of the descendants of that parent Entry. A child entry contains an "up" atom:link for its parent feed or entry if the child only allows a single parent. A child entry may also contain an "up-tree" atom:link for a parent feed of a subset of the ascendants of that child Entry..."
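
The link relations summarized above can be illustrated with a minimal Python sketch that reads an Atom entry and collects its hierarchy links. This is an assumption-laden illustration, not the draft's normative syntax: the sample entry, its URIs, and the helper name `hierarchy_links` are hypothetical, and the sketch assumes the relations appear as plain `rel` values ("up", "up-tree", "down", "down-tree") on ordinary atom:link elements.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical entry: the id, title, and URIs are invented for illustration.
SAMPLE_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example:folder-1</id>
  <title>Reports</title>
  <updated>2009-05-20T00:00:00Z</updated>
  <link rel="up" href="http://example.org/root"/>
  <link rel="down" href="http://example.org/folder-1/children"/>
</entry>
"""

# Relation names as quoted in the summary; assumed to be used verbatim.
HIERARCHY_RELS = ("up", "up-tree", "down", "down-tree")


def hierarchy_links(entry_xml):
    """Return {rel: [href, ...]} for hierarchy link relations in an entry."""
    entry = ET.fromstring(entry_xml)
    found = {}
    for link in entry.findall(ATOM_NS + "link"):
        rel = link.get("rel")
        if rel in HIERARCHY_RELS:
            found.setdefault(rel, []).append(link.get("href"))
    return found


links = hierarchy_links(SAMPLE_ENTRY)
print(links)
```

In this sample the entry is both a child (it has an "up" link) and a parent (it has a "down" link), which matches the statement that a parent Entry could itself be a child of another parent Entry.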

The "AtomPub Guidelines for Collection Discovery" specification recommends best practices for discovering AtomPub Collection resources as applicable to various content representation formats. Editor's publication comment: "Given the discussion on this list about discovery of AtomPub collections, it is useful to have a clear path to take as well as guidance about which mechanisms to use under certain conditions. Therefore, we feel that a new RFC is needed to clarify the way in which collections ought to be identified. Some of the material for this RFC is taken from the existing hierarchy I-D. The idea is that the best practices for collection discovery are best kept separate from new stuff..." Details: "Atom Publishing Protocol (RFC 5023) is used for several applications that consume and produce an unbounded number of Collection resources. This document introduces guidelines over existing syntactic techniques for identifying Collection resources in use from various Internet content representation formats, including feeds and HTML type resources. Previously, AtomPub has introduced two mechanisms for collection discovery. However, in the absence of suitable guidelines, there is no clarity about which mechanism is suited for a particular purpose. In general, where other Atom or AtomPub syntax is in use, this document recommends the use of the app:collection element. Where only a link element may be used, this document recommends the use of the service link relation. When an Atom Processor encounters an Atom Feed Document, and the processor is capable of performing AtomPub operations, then it is valuable to determine whether that feed is modifiable. Some processors incorrectly assume that a feed may be modified at the exact same URI from which it is obtained. AtomPub Processors need a way to determine whether and where new entries can be added to a Feed. If a syndicated feed document is modifiable using AtomPub, then the processor can indeed manipulate the feed. 
For this purpose, processors can benefit from Collection metadata that is present in the feed document. When an Atom processor encounters an HTML or XHTML document, and the processor is capable of performing AtomPub operations, then it is useful to determine the available AtomPub resources with which it can interact. The host format, though, limits the ability to specify metadata embedded in the content being processed. Therefore, the best approach for a server is to specify a link with relation "service" to provide a URI to the AtomPub Service Document that includes all the metadata corresponding to one or more Collections."
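
The two discovery paths described above can be sketched in Python: look for an app:collection element when the host document is an Atom feed, and look for a link with relation "service" when the host document is HTML. The sample documents, URIs, and the helper names `collection_uri_from_feed` and `service_uris_from_html` are hypothetical; the namespaces are those of RFC 4287 and RFC 5023. Treat this as a rough sketch of the recommendation rather than the draft's exact processing rules.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

APP_NS = "{http://www.w3.org/2007/app}"

# Hypothetical feed carrying Collection metadata; URIs are invented.
SAMPLE_FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:app="http://www.w3.org/2007/app">
  <id>urn:example:feed</id>
  <title>Example feed</title>
  <updated>2009-05-20T00:00:00Z</updated>
  <app:collection href="http://example.org/collection">
    <title>Example collection</title>
  </app:collection>
</feed>
"""

# Hypothetical HTML page pointing at an AtomPub Service Document.
SAMPLE_HTML = """\
<html><head>
  <title>Example page</title>
  <link rel="service" type="application/atomsvc+xml"
        href="http://example.org/service">
</head><body></body></html>
"""


def collection_uri_from_feed(feed_xml):
    """Return the href of the feed's app:collection element, or None."""
    feed = ET.fromstring(feed_xml)
    collection = feed.find(APP_NS + "collection")
    return None if collection is None else collection.get("href")


class ServiceLinkFinder(HTMLParser):
    """Collects href values of <link> elements whose rel includes 'service'."""

    def __init__(self):
        super().__init__()
        self.service_uris = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "service" in rels and attributes.get("href"):
            self.service_uris.append(attributes["href"])


def service_uris_from_html(html_text):
    """Return the Service Document URIs advertised by an HTML page."""
    finder = ServiceLinkFinder()
    finder.feed(html_text)
    return finder.service_uris


collection_uri = collection_uri_from_feed(SAMPLE_FEED)
service_uris = service_uris_from_html(SAMPLE_HTML)
print(collection_uri)
print(service_uris)
```

A client that finds a collection URI this way knows where new entries may be posted, instead of assuming the feed is modifiable at the URI from which it was fetched.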

Reaping Deep Web Rewards Is a Matter of Semantics
Greg Goth, IEEE Internet Computing

Case-sensitive passwords are common on the Internet. And now, perhaps, the global community of Internet developers might have to prepare for a case-sensitive semantic Web. The lowercase 's' semantic Web that might offer users a far richer Internet experience differs dramatically from the Semantic Web — a specific framework defined by the W3C — that has hovered tantalizingly just out of mass reach for almost a decade. Researchers are exploring several different approaches to providing an alternate semantic experience, from domain-specific academic research engines to commercial offerings from established companies such as Google and startups such as Kosmix... Numerous academic and commercial researchers are exploring ways to access and index the contents in the deep Web — mainly content hidden behind HTML forms in databases — in order to offer users more information to answer their queries. Currently, much of the academic research centers on domain-specific aspects of the deep Web because academic funding is inadequate to tackle form discovery and indexing over the entire Web. The communities might approach the issue from slightly different vectors, but there is near consensus that the crux of delivering richer material to the Internet user is developing a way to access deep Web and surface Web pages and provide some sort of semantically aware architecture to constrain query results...

[Researchers] Geller and Chun wish to go beyond indexing-form labels and form-field values of the deep Web: "The DeepPeep search engine looks for domain-specific forms that may lead users to desired deep Web contents. Our initial approach to extracting Web form labels, to use them as index terms, is similar to their approach reported in VLDB (Very Large Data Base) 2008. However, what we advocate is to annotate the forms in a way such that even the generic search engines such as Google and Yahoo can locate the deep Web forms. This requires not only the labels used in the forms to be indexed, which seems to be the predominant method used in DeepPeep, but that the semantic contents of the deep Web also be available for search..." Significant advances in semantic enrichment—in both the deep and surface Web—will face different hurdles in different settings. Although academic researchers might be hampered by the inability to create an infrastructure scalable enough to attract large numbers of users, commercial entities such as the large search engines might find it difficult to alter their revenue-producing platform architectures to accommodate nascent semantic technologies, even technologies that don't rely on orthodox Semantic Web elements such as the Resource Description Framework and Web Ontology Language...

See also: W3C Semantic Web

Open Database Alliance: New Direction for mySQL
Dave West, InfoQueue

Establishment of "The Open Database Alliance, a vendor-neutral consortium designed to become the industry hub for the MySQL open source database," was announced on May 13th. Monty Program Ab, a MySQL database engineering company, and Percona, a MySQL services and support firm, are the initial members of this alliance. The stated intent of the Alliance is: 'to unify all MySQL-related development and services, providing a solution to the fragmentation and uncertainty facing the communities, businesses and technical experts involved with MySQL.' The alliance will fork MySQL development using MariaDB, "a branch of the MySQL database that includes all major open source storage engines, including the Maria transactional storage engine." The announcement contains no mention of the purchase of Sun Microsystems by Oracle, but it would be difficult not to see the Alliance as, in part, a reaction to fears about the future of the open source MySQL under Oracle's stewardship. Similar concerns have been expressed about Java... The announcement lists three founding members of the Alliance: Monty Program Ab (Widenius' company and primary developer of MariaDB), Percona (support and consulting for MySQL and LAMP), and Open Query (training). The addition of other members is pending until membership and participation rules are clearly defined, but the Alliance is expected to be "open to all businesses, organizations and individuals interested in helping create a new, centralized resource for MySQL and to ensure that it remains a top quality, high performance open source database."

Guidance on Interoperation and Implementation Reports
Lisa Dusseault and Robert Sparks (eds), IETF Internet Draft

Lisa Dusseault (Area Director for IETF Applications Area) and Robert Sparks (Area Director for IETF Real-time Applications and Infrastructure Area) have published an updated draft for the document Guidance on Interoperation and Implementation Reports, intended as an IETF "Best Current Practice" RFC that will update RFC 2026, "The Internet Standards Process—Revision 3." Document abstract: "Advancing a protocol to Draft Standard requires documentation of the interoperation and implementation of the protocol. Historic reports have varied widely in form and level of content and there is little guidance available to new report preparers. This document updates the existing processes and provides more detail on what is appropriate in an interoperability and implementation report."

Details: "The IETF Draft Standard level, and requirements for standards to meet it, are described in RFC 2026. For Draft Standard, not only must two implementations interoperate, but also documentation (the report) must be provided to the IETF... Moving standards along the standards track can be an important signal to the user and implementor communities, and the process of submitting a standard for advancement can help improve the standard or the quality of implementations that participate. However, the barriers seem to be high for advancement to Draft Standard, or at the very least confusing. This memo may help in guiding people through one part of advancing specifications to Draft Standard... Having and demonstrating sufficient interoperability is a gating requirement for advancing a protocol to Draft Standard. Thus, the primary goal of an implementation report is to convince the IETF and the IESG that the protocol is ready for Draft Standard. This goal can be met by summarizing the interoperability characteristics and by providing just enough detail to support that conclusion. Side benefits may accrue to the community creating the report in the form of bugs found or fixed in tested implementations, documentation that can help future implementors, or ideas for other documents or future revisions of the protocol being tested." Special Cases include: Deployed Protocols; Undeployed Protocols; Schemas, languages and formats; Multiple Contributors, Multiple Implementation Reports; Test Suites; Optional Features, Extensibility Features.

Yahoo Vows Death to the '10 Blue Links'
James Niccolai, PC World Magazine

Yahoo has offered a peek at how its search results are likely to be displayed a few months from now, as it tries to find a better alternative to the traditional "10 blue links." "People don't really want to search," said Prabhakar Raghavan, head of Yahoo Labs and Yahoo's search strategy, in a [recent] meeting with reporters in San Francisco. Their objective is to quickly uncover the information they are looking for, not to scroll through a list of links to Web pages. Yahoo's answer is to try to figure out the "intent" of the person conducting the search, and then present various types of information within the results that relate to what they are looking for, such as restaurant reviews, movie times, flight schedules and so on. Yahoo showed a slightly different page layout for displaying search results that it's currently testing with users. Search results for the name of a restaurant lead off with a map showing its location, followed by links to an aggregated selection of reviews, photos and directions. Yahoo is revamping its image search in a similar way.

Moving away from the "blue links" is something all the main search companies have been exploring. Even Google, which dominates Web search and has the least to gain from disrupting the status quo, has been blending news, video and other content with its results. Microsoft CEO Steve Ballmer has admitted how tough it is to beat Google at its own game, and suggested that the only way to win market share in search is to change the playing field and do things differently... Part of the challenge is figuring out the user's intent. "You cater to the user's intent as best you can define it," Raghavan said. For example, there are many towns in the world called Syracuse, but if a person is searching for "Syracuse restaurant" and it's 6 p.m. Eastern Time, there's a good chance they are in New York because that's where it's time for dinner. The other challenge is creating the web of objects. Yahoo plans to do it with software algorithms but also using "the wisdom of crowds." Specifically, it will use data provided through its SearchMonkey project, which encourages site owners to provide structured data about the content on their Web sites. Still, building the web of objects is a long-term effort and will apply to only a fraction of search queries to begin with. "This is going to take years" to complete, Raghavan said.

