Cover Pages: XML Daily Newslink: Thursday, 14 May 2009

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
IBM Corporation http://www.ibm.com

Headlines

Guidelines for Writing Device Independent Tests
Link Relations Proposal: How Link Relations Should Be Named for CMIS
Darwin Information Typing Architecture: DITA for the Masses?
NeoNote: Suggestions for a Global Shared Scholarly Annotation System
Features: EMC Documentum xDB vs. Oracle XML DB & IBM DB2 pureXML
Scalable Vector Graphics: Participate in SVG Open 2009
National Library of Medicine: Evaluation of Digital Repository Software
OGC Geospatial Rights Management Summit

Guidelines for Writing Device Independent Tests
Dominique Hazaël-Massieux and Carmelo Montanez (eds), W3C Group Note

As part of the W3C Mobile Web Initiative, members of the Test Suites Working Group have published a Group Note of Guidelines for Writing Device Independent Tests. This is the First Public Working Group Note of the Device Independent Testing Guidelines, and represents a consensus of the participants. This W3C Working Group was chartered to help create a strong foundation for the mobile Web through the development of a set of test suites for browsers on mobile devices. The main objective of the MWI Test Suites Working Group is to enable the development of a rich mobile Web experience by providing tests suited to user agents on mobile devices. These tests can then be used by developers of user agents to increase the quality of their software, and by other actors in the market to advocate for better standards support, and to use the knowledge gathered on user agents to develop content that will work well across a large set of devices...

As support for Web technologies grows, it is important that test writers develop test suites that will work as well as possible across devices. Consider recording the browser identifier with the widely implemented window.navigator.userAgent, as there might be several browsers on one device. When designing device-independent test cases, it is important to acknowledge the limitations of most devices: screen; memory available; network bandwidth, latency and cost; CPU power; extension capabilities. For tests that require interaction (either for running the test or for submitting results), consider: keyboard or pointing device access and ease of use; human cost of correctly submitting the results; automatic start of the test (e.g. through an onload event)... The first step towards writing device-independent test cases is thus to determine the range of devices on which the test cases would need or are likely to be run. To make that assessment, if the technology is already deployed, consider on what devices it is running. If the technology can only be deployed on devices with a certain level of hardware characteristics, adapt the constraints in the guidelines below to that level. If it is not possible to create a single test that will work well across devices, consider creating several versions of the test, or using server-based content adaptation to adapt it based on the requesting devices...
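The advice above on offering several versions of a test, and on recording the browser identifier with each result, can be pictured with a minimal Python sketch. It is an illustration only, not code from the W3C Note: the DeviceProfile fields, the 240-pixel threshold, and the variant names "static", "compact" and "full" are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical summary of a requesting device (not defined by the Note)."""
    screen_width_px: int
    has_pointer: bool
    supports_script: bool


def choose_test_variant(profile: DeviceProfile) -> str:
    """Pick one of several versions of the same test for a given device.

    The thresholds and variant names are assumptions for illustration.
    """
    if not profile.supports_script:
        # No scripting: the test cannot start itself or submit results.
        return "static"
    if profile.screen_width_px < 240 or not profile.has_pointer:
        # Small screen or keypad-only device: keep interaction minimal.
        return "compact"
    return "full"


def result_record(user_agent: str, test_id: str, passed: bool) -> dict:
    """Keep the browser identifier with the result, since one device
    may carry several browsers."""
    return {"test": test_id, "user_agent": user_agent, "passed": passed}


keypad_phone = DeviceProfile(screen_width_px=176, has_pointer=False, supports_script=True)
print(choose_test_variant(keypad_phone))
```

In a server-based adaptation setup, a function like choose_test_variant would run on the server after the device profile has been derived from the request; how that profile is obtained is outside the scope of this sketch.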

Link Relations Proposal: How Link Relations Should Be Named for CMIS
Cornelia Davis and Al Brown, OASIS CMIS TC Working Draft

Members of the OASIS Content Management Interoperability Services (CMIS) TC have published a revised working draft for a Link Relations Proposal. This issue is summarized in the CMIS TC Jira project: "The current draft of CMIS defines new link relations. Some of these link relations should result in new values registered in the Atom Link Relations IANA registry, others should be defined as CMIS specific URIs and still others should leverage link relations that are already registered." Cornelia Davis: "I have completed the inventory of current link relations and have made an initial proposal on how each of the link relations should be named. This document inventories all of the link relations currently included in version 0.61 of the RESTful AtomPub binding and suggests that either: (1) an already registered name be used, (2) that a new name be registered with the IANA, or (3) that a CMIS specific URI be used for the name. This document is the starting point for discussions in the TC."

Per the document 'Introduction': "The following document lays out a proposal for naming of the currently proposed link relations. These suggestions are motivated by several things: (A) First, to use existing, registered names wherever possible. This will aid in interoperability, simplifies client development and offers the potential that existing clients do something meaningful with CMIS generated Atom feeds. (B) Identify those concepts that are not specific to CMIS for registration in the IANA. Again, the goal is interoperability. Just as CMIS is stronger through the leverage of existing link relations, future work can leverage the relations that CMIS brought if they are expressed with sufficient generality. (C) Link relations are about defining semantics for a relationship, not about dictating how a client behaves with respect to it, nor do link relations prescribe a media type for the resource that is the target of the link..." Examples: CMIS link relation: 'parent', Naming Suggestion: 'Up'; CMIS link relation: 'repository', Naming Suggestion: 'Service'; CMIS link relation: 'children', Naming Suggestion: 'Down'; CMIS link relation: 'descendants', Naming Suggestion: 'downall'; CMIS link relation: 'allversions', Naming Suggestion: 'Allversions'; CMIS link relation: 'latestversion', Naming Suggestion: 'Latestversion'; CMIS link relation: 'type', Naming Suggestion: 'Describedby'; CMIS link relation: 'Source', Naming Suggestion: 'Via'...
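The example pairs listed above amount to a simple lookup table. The following Python sketch records only the pairs quoted in this summary, with the spelling and capitalization as given here; the function name proposed_name and the case-insensitive lookup are assumptions added for illustration, and the table is not the full inventory from the working draft.

```python
from typing import Optional

# CMIS link relation -> naming suggestion, exactly as quoted in the summary above.
PROPOSED_NAMES = {
    "parent": "Up",
    "repository": "Service",
    "children": "Down",
    "descendants": "downall",
    "allversions": "Allversions",
    "latestversion": "Latestversion",
    "type": "Describedby",
    "Source": "Via",
}

# Case-insensitive index (an assumption of this sketch, not of the proposal).
_BY_LOWERCASE = {name.lower(): suggestion for name, suggestion in PROPOSED_NAMES.items()}


def proposed_name(cmis_relation: str) -> Optional[str]:
    """Return the suggested name for a CMIS link relation, or None if the
    relation is not among the examples quoted here."""
    return _BY_LOWERCASE.get(cmis_relation.lower())


print(proposed_name("parent"))
```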

Darwin Information Typing Architecture: DITA for the Masses?
Ann Rockley, CMS Watch

DITA (Darwin Information Typing Architecture) was originated within IBM, and later adopted by OASIS, yet until now "the shoemaker's children had no shoes" -- in other words, IBM products did not themselves easily support DITA. However, a few weeks back, IBM (via FileNet) announced a partnership with Quark to 'Bring DITA to the Masses.' A hyperbolic statement for sure, but beneath the bluster there is some substance. In simple terms, the Quark XML Author 3.0 has been integrated with IBM FileNet Content Manager P8 4.5 to provide DITA functionality in an enterprise environment. And though most industry partnerships are barely worth the paper they are written on, this one is at least interesting -- if not quite the revolution it proposes. This is not the first time that FileNet has supported XML; they had an integration with Arbortext about 10 years back that enabled companies to author in XML and store in FileNet. However, at that time it was Arbortext that supplied the smarts for managing the XML; FileNet was essentially a "dumb repository." This time is a bit different. IBM has modified FileNet to support DITA and XML, with Quark providing a Word-based XML authoring experience. To be clear, though we welcome this announcement, as always we have to treat it with a good dose of skepticism. Such skepticism is justified as this is not the first time an XML implementation in FileNet has been pushed to the market. Additionally, this announcement is not an acquisition of Quark by IBM for the purposes of driving DITA -- simply a partnership, and both have many partnerships.

Nevertheless, there are three industry trends that are currently driving the adoption of XML, Component Content Management (CCM) and DITA in the enterprise. (1) It is becoming possible to author in a familiar, non-threatening editor: For years, XML editors have presented a technical interface to users, largely because these editors have been around for more than a decade and used to be used by technical authors. (2) DITA for narrative documents: The OASIS DITA for Enterprise Business Documents subcommittee has been working on solutions to present an aggregated view (e.g., document view) of DITA rather than the traditional topic-oriented view. DITA content can now be presented as a document that looks like a traditional Word document, but with DITA topic structure under the covers. (3) Growing awareness of the cost of unstructured content: A growing number of organizations now see the unstructured content that exists in narrative business documents as standing in the way of processes that could potentially be automated end-to-end. This lack of structure leads to inconsistency, poor readability, and the inability to reuse content. In many cases, because of the inherent lack of structure, content remains hidden. Moving to highly structured content can be another step towards optimization and automation of content processes...

See also: DITA references

NeoNote: Suggestions for a Global Shared Scholarly Annotation System
Bradley Hemminger, D-Lib Magazine

There is a need for integrated support for annotation and sharing within the primary tool used for interacting with the World Wide Web, which today is a web browser. Based on prior work and user studies in our research lab, we propose design recommendations for a global shared annotation system, for the domain of scholarly research. We describe a system built using these design recommendations (NeoNote), and provide an example video demonstrating the suggested features. Finally, we discuss the major challenges that remain for implementing a global annotation system for sharing scholarly knowledge... Overall, we believe the most effective way such an interface could be delivered would be as part of a next generation interface to the web, closer to the original hypertext systems first proposed like Xanadu... Searching and selection of content items should be done from within the single interface to the web that users utilize. In the past, many literature database searches were performed on specific interfaces from providers like BRS, Data-Star, Dialog, and Orbit. Nowadays, most searching is done either via web-based interfaces to these same services, or via freely available web-based search engines like Google. As a result, researchers need to be able to add content items identified on a web page to their "personal digital library" with a single click. Examples include: Zotero, Connotea, RefWorks. These services scrape the marked webpage to automatically capture metadata into their applications. Because users make frequent use of these features only when information saving can be done during their web browsing experience, these search features should be an integrated part of the web browser...

There are several competing options for organizing materials: (1) Hierarchical - organizes materials in the typical hierarchical folders used by most current computer file systems; (2) Content-based - index the content items and access mechanisms that are built around searching the content directly (words in documents) instead of navigating file hierarchies; (3) Tags - tagging associates keywords or descriptive words that are used as surrogates to identify and later search for the content items. Many of these are user-generated today and are referred to as folksonomies... In order to properly index the full content, and to refer to portions of the content, the content must be fully and openly represented. This can and is being done using the XML format. For instance, PUBMED is now utilizing XML to represent their articles. Most of the scientific literature in electronic format is, however, currently stored in PDF format. This was an opportune standard at the time because it allowed for an electronic format that could both be read on the screen and printed at high quality. It is, however, not an open standard (although PDF/A attempts to remedy this) and, more importantly, it does not properly separate content from presentation to facilitate markup at the subdocument level conveniently, or for reuse across multiple display devices...

For text documents we now have the technical capabilities to store and represent the content information separately from the presentation information in XML, and to mark up specific pieces of documents, such as sentences, words, or diagrams. This is critical to supporting more accurate statements of citation and reference. We do not, however, have similar notational referencing mechanisms defined for many other data types that are commonly available on the web. Thus, it is important that standard representations for audio, video, graphics, images, statistical analyses, genetic sequences, etc., all similarly evolve to support referencing within their content items, and not just to the entire content item itself. The goal of this article is to prompt others to think more generally and more globally about issues surrounding access, representation, searching and sharing of content items and annotations in digital repositories.
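To make the idea of referencing a piece of a document (rather than the whole item) concrete, here is a small Python sketch using the standard library's ElementTree. It is not taken from NeoNote: the article markup, the id values, and the shape of the annotation record are all hypothetical and serve only to show an annotation pointing at a single sentence.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup in which each sentence carries its own id.
ARTICLE = """<article>
  <para id="p1"><s id="p1s1">First sentence.</s> <s id="p1s2">Second sentence.</s></para>
</article>"""


def resolve_target(root: ET.Element, target_id: str):
    """Find the element an annotation refers to, at any depth, by its id attribute."""
    return root.find(f".//*[@id='{target_id}']")


root = ET.fromstring(ARTICLE)

# A hypothetical annotation that cites one sentence, not the entire article.
annotation = {"target": "p1s2", "note": "Cited as supporting evidence."}
target = resolve_target(root, annotation["target"])
print(target.tag, "|", target.text)
```

The same pattern is unavailable for content that exposes no internal addressable structure, which is the gap the authors point to for audio, video, images and other data types.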

Features: EMC Documentum xDB vs. Oracle XML DB & IBM DB2 pureXML
Carla Spruit, EMC Blog

The growth and maturation of XML technologies are such that they have become an essential feature of any new database rollout. This is true not just for XML and object database vendors, but increasingly for "traditional" relational database vendors as well. XML databases grew out of work in the 1990s on Object Relational Databases, in which XML documents are shredded into tables based upon specific schemas, placing a high reliance upon having a stable schema for data content. Many contemporary XML databases, on the other hand, use a much more generalized mechanism for element, attribute and text content that provides far more optimized access to information, and works far better at storing XML documents. Relational database vendors, who have had to support both SQL relational and XML data models, have in turn developed hybrid XML/SQL databases that are frequently touted as offering "native" XML database support. These systems are still object-relational in nature, though they usually incorporate a secondary indexing mechanism that provides at least some support that is comparable to XML-only models, if not equivalent. This paper examines two such projects: Oracle XML DB and IBM DB2 pureXML, and compares them with the storage strategies and XML functionality of EMC Documentum xDB...
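The "shredding" approach mentioned above can be illustrated with a short Python sketch using sqlite3 and ElementTree from the standard library. It does not reflect how Oracle XML DB, DB2 pureXML or xDB are implemented; the order document, the two tables and the shred function are hypothetical and exist only to show why shredding depends on a stable schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; the tables below are designed around exactly this shape.
DOC = ("<order id='42'><customer>Acme</customer>"
       "<item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>")


def shred(conn: sqlite3.Connection, xml_text: str) -> None:
    """Split one XML document into rows of two fixed relational tables."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute("INSERT INTO orders (id, customer) VALUES (?, ?)",
                 (order_id, root.findtext("customer")))
    for item in root.findall("item"):
        conn.execute("INSERT INTO items (order_id, sku, qty) VALUES (?, ?, ?)",
                     (order_id, item.get("sku"), int(item.get("qty"))))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE items (order_id INTEGER, sku TEXT, qty INTEGER)")
shred(conn, DOC)

rows = conn.execute(
    "SELECT sku, qty FROM items WHERE order_id = 42 ORDER BY sku").fetchall()
print(rows)
```

Any element or attribute that the tables do not anticipate is silently lost in this sketch, which is one way to picture the reliance on a stable schema that the paper describes.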

XML standards focus mainly on the XML document as the top level node. Some standards, like XLink, XInclude, XML Schema, XSLT and XQuery, use URIs to point to an(other) XML document. Many standards (XLink, XInclude) and schemas (DocBook, DITA, S1000D) define links from one XML document to another. You can even define your own links in a schema. In practice, data sets that use these standards and schemas have one thing in common: links between XML documents have URIs based on the file system. An XML database must be able to store multiple XML documents, and it is important that the user can store and access the XML in the database in a way that resembles a file system. For instance, if an XML document contains links to other XML documents, and the links are represented by URIs containing relative paths, these links should still be valid when the documents are stored in an XML database. This is especially true for document-centric content, where the documents will often be exported and processed outside the database. These documents may also be official records (for example XBRL financial statements and pharmaceutical product labels) and need to preserve their integrity wherever they may be stored. For adequate support of XML standards, therefore, it is important that an XML database support nested collections of XML documents... The product versions described in this paper are: Oracle XML DB 11g Release 1 (11.1), IBM DB2 pureXML 9.5, and EMC Documentum xDB 9.0 [= X-Hive/DB before its acquisition by EMC in July 2007].
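The requirement that relative links stay valid inside the database can be sketched in a few lines of Python. This is not the API of any of the products compared: the in-memory STORE dictionary stands in for nested collections, and the paths, document contents and function names are hypothetical.

```python
import posixpath

# Hypothetical store: keys are file-system-like paths within nested collections.
STORE = {
    "/manuals/engine/topic.xml": "<topic><link href='../shared/warning.xml'/></topic>",
    "/manuals/shared/warning.xml": "<note>Hot surface.</note>",
}


def resolve_link(source_path: str, href: str) -> str:
    """Resolve a relative href against the collection that holds the source document."""
    collection = posixpath.dirname(source_path)
    return posixpath.normpath(posixpath.join(collection, href))


def link_is_valid(source_path: str, href: str) -> bool:
    """A link survives storage if its resolved path names a stored document."""
    return resolve_link(source_path, href) in STORE


print(resolve_link("/manuals/engine/topic.xml", "../shared/warning.xml"))
```

If the store flattened every document into a single collection, the "../shared/" step would have nothing to resolve against, which is the situation nested collections are meant to avoid.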

The comparison starts by looking at how well these products fit the definition of a "native XML database". We then look at the implementation of various XML features that are important to developers, and to managing the type of content that is increasingly common in business applications... One of the key conclusions to be drawn from this comparison is that relational databases are not XML databases. The latter require different storage methods, access methodologies and validation support than relational databases. This is evident when looking at the IBM and Oracle products -- in the attempt to provide the best of both worlds, IBM and Oracle may have inadvertently achieved the opposite, with databases that add a great deal of complexity for SQL developers while never quite achieving the level of support that a full XML database needs to offer. The evolution of XML towards richer structures that combine data- and document-oriented content and more sophisticated linking means that maximum flexibility is required in an XML database. Although the hybrid approach may be sufficient for simpler data structures, sophisticated XML applications need a true native XML database with deep standards support and no tradeoffs in terms of performance and functionality. The mixing of relational and XML processing may at one time have seemed like an advantage, when XML was still relatively new, most data was in a relational format, and XML databases were relatively immature. More and more data is now finding its natural representation in XML, and native XML databases like EMC Documentum xDB have improved their reliability, performance, and ease of use. The hybrid approach now pales in comparison to xDB's robust combination of features for XML storage, metadata, XML-aware indexes, and XML specific content management features like libraries and versioning.

Scalable Vector Graphics: Participate in SVG Open 2009
Staff, W3C Announcement

The SVG Open Conference provides an opportunity for designers, developers and implementers to share ideas, experiences, products and strategies. Scalable Vector Graphics (SVG) is an open standard of the World Wide Web Consortium (W3C) enabling high-quality, interactive, animated and stylable graphics to be delivered over the web using accessible, human-readable XML. At the SVG Open 2009 Conference you will have the opportunity to learn to use it to create effective and compelling web content, learn techniques for developing SVG software solutions, and see the latest developments from the W3C. You will meet the creators of SVG applications in person, the authors of the SVG specifications, and you will have the opportunity to provide your own input for future development. The Seventh International Conference on Scalable Vector Graphics will take place from October 2-4, 2009, in Mountain View, California, hosted by Google at the Crittenden Campus. At this conference you can learn about subjects varying from specialized technical visualizations to interactive multimedia art. On the program there are presentations, beginner and advanced level workshops, and the opportunity to meet people from the SVG community, industry and the W3C SVG Working Group...

W3C will again this year sponsor SVG Open 2009. Members of the W3C SVG Working Group will be attending and presenting at the conference, which will include a Working Group panel session on future SVG developments. A day of workshops will also be scheduled adjacent to the main conference. The conference organizers have indicated that proposals for presentation abstracts and course outlines are welcome through 31-May-2009.

See also: the W3C news item

National Library of Medicine: Evaluation of Digital Repository Software
Jennifer L. Marill and Edward C. Luczak, D-Lib Magazine

The U.S. National Institutes of Health (NIH) National Library of Medicine (NLM) undertook an 18-month project to evaluate, test and recommend digital repository software and systems to support NLM's collection and preservation of a wide variety of digital objects. This article outlines the methodology NLM used to analyze the landscape of repository software and select three systems for in-depth testing. Finally, the article discusses the evaluation results and next steps for NLM. This project followed an earlier NLM working group, which created functional requirements and identified key policy issues for an NLM digital repository to aid in building NLM's collection in the digital environment. The scope of the Digital Repository Evaluation and Selection project was to perform an extensive evaluation of existing commercial and open source digital repository systems and software. The evaluation included those systems and software already identified by the Digital Repository Working Group, as well as any new or previously overlooked software. The evaluation was to include hands-on testing against a set of functional requirements based on the Open Archival Information System (OAIS) model -- ingest, archival storage, data management, administration, preservation planning, and access -- as specified in the NLM Digital Repository Policies and Functional Requirements Specification. Based on the work of the previous NLM Digital Repository Working Group, the WG scanned the literature and conducted investigations to construct a list of ten systems and software for initial evaluation. The ten systems included: Open Source: DAITSS, DSpace, EPrints, Fedora, Greenstone, Keystone DLS; Commercial: ArchivalWare, CONTENTdm, DigiTool, VITAL.

After completion of all testing, the WG recommended that NLM select Fedora as the core system for the NLM digital repository. The WG was highly impressed with a number of Fedora capabilities, including the strong technology roadmap, the excellent underlying data model that can handle NLM's diverse materials, the active development community, Fedora's adherence to standards, and Fedora's use by leading institutions and libraries with similar digital project goals. Fedora is also seen as a low risk choice for now, as it is open source and no license fees are involved. The WG also recommended that work should begin immediately on a Fedora pilot project using four identified collections of materials from NLM and the NIH Library. Most of these collections already have content files and metadata for loading into a repository. After an initial pilot phase of approximately six to eight months, the effort will be evaluated. NLM senior staff concurred with this recommendation and work has already begun on the pilot implementation.

See also: the Fedora Commons Project

OGC Geospatial Rights Management Summit
Staff, The Open Geospatial Consortium Announcement

OGC has announced a "Geospatial Rights Management Summit" to be held June 22, 2009 at the Massachusetts Institute of Technology, MA, USA. Summary: Geospatial data and services (Earth images, GIS, map browsers, location services, navigation, etc.) have become an integral part of our information environment. But this progress raises issues of security, public access, intellectual property, and emergency use of geospatial information. The issues are complex because geospatial data products are often composed of data from multiple sources which may have different rights and restrictions associated with them. Thus, business and policy issues, not technical issues, are now the industry bottleneck. This summit provides an opportunity to learn about and discuss the Geospatial Digital Rights Management Reference Model (GeoDRM RM) that has been developed by the GeoRM Working Group of the OGC Technical Committee. The GeoDRM RM is an abstract specification for the management of digital rights in the area of geospatial data and services. The OGC membership will use the GeoDRM RM in developing OpenGIS Implementation Specifications for open interfaces and encodings that will enable diverse systems to participate in transactions involving data, services and intellectual property protection...

In e-commerce models for dissemination and use of Intellectual Property (IP) assets, geodata are treated as commodities to be priced, ordered, traded and licensed. Direct monetary reward, however, is often not the motivation or is only secondary behind the desire for more rigorous control of IP assets. Harlan Onsrud of the GeoData Alliance argues that the incentive structures implicit in "library systems" are an appropriate model for motivating data producers, collectors and traders to document, share and otherwise disseminate their geodata. Onsrud observes that the library system is a "chaordic" framework of seemingly ad hoc agreements among stakeholders that strikes a balance supporting "...strong public goods, access and equity principles while fully protecting the intellectual property rights of authors and publishers." Rapid technological advances have tipped the balance of laws that establish incentives for producers to make their content available while maintaining the access, use and equity rights of users. Onsrud envisions the establishment of a framework of operating agreements, similar to that in which libraries develop and share resources, as one way to establish a way for geodata to be more accessible and useful to a larger number of users. The specific requirements for managing IP rights by controlling geodata distribution and use, however, are extremely complex and vary widely depending heavily on factors...

