Cover Pages: XML Daily Newslink: Wednesday, 15 October 2008

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
IBM Corporation http://www.ibm.com

Headlines

XQuery Update: DB2 pureXML in the Health Care, Business, Finance, and IT
W3C TAG: Passwords in the Clear
Yahoo! Releases OpenID Research
IETF BOF Request Text for Authority-to-Citizen Alert Activity
Regulatory Transparency and XBRL
Is Opera's MAMA the Best Search for Developers?
First Legal Shot Across the Semantic Web's Bow: Thomson Suing Zotero

XQuery Update: DB2 pureXML in the Health Care, Business, Finance, and IT
Susan Malaika, Jan-Eike Michels, Christian Pichler; IBM developerWorks

Increasingly, XML is being used as a message exchange format in a variety of industries. Often, industry consortia or governments define the structure of these exchange messages. The growing popularity of different initiatives is encouraging the use of these XML exchange messages. These initiatives include, for example, Information as a Service and Software as a Service (SaaS), along with the dominance of technologies such as Web Services, File Transfer Protocol (FTP), messaging, e-mail, and Web-based feed information. As organizations consume and produce these exchange messages, they are also beginning to store the messages directly, for example for audit purposes. In some systems, these stored messages are the primary source of up-to-date information for supporting the business of an institution or firm. There are cases where it is desirable to modify a stored XML message or derive a new message from an existing stored message. Examples include producing an updated XML message that: (1) Relates to ("links" to) existing information. For example, in health care, additional test results may be produced, so a new patient record is created, based on the existing record, which is then augmented with recent medical results. (2) Incorporates additional information. For example, in business, additional items may be added subsequently to an order. Often, the procedure is to cancel the original order and produce a new order with both the content of the original order and the additional order items. (3) Incorporates modified information. For example, in financial derivatives processing, a new party may replace an existing party at a particular point in time, through a process called novation. XQuery, the language that can be used to query XML documents, has added extensions to perform sub-document updates on XML documents. These extensions make it possible to add new nodes, delete or rename existing nodes, and replace existing nodes and their values...
This article explains how to apply the XQuery Update Facility, a W3C standard, to XML stored in DB2. Furthermore, the XQuery Update Facility is then illustrated in the context of four industries: health care, business, financial derivatives, and information technology.
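
The four kinds of sub-document update named above (insert, delete, rename, replace value) can be sketched in a few lines. The example below is only an illustration in Python using the standard library's ElementTree; it is not XQuery Update syntax and does not touch DB2, and the order message and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored order message; element names are illustrative only.
ORDER_XML = """
<order id="A-100">
  <customer>Example Corp</customer>
  <items>
    <item sku="1001"><qty>2</qty></item>
    <item sku="1002"><qty>1</qty></item>
  </items>
  <note>gift wrap</note>
</order>
"""


def update_order(xml_text: str) -> ET.Element:
    """Parse a stored message and apply four kinds of sub-document update."""
    order = ET.fromstring(xml_text.strip())
    items = order.find("items")

    # Insert: add a new item node to the existing order.
    new_item = ET.SubElement(items, "item", {"sku": "1003"})
    ET.SubElement(new_item, "qty").text = "5"

    # Delete: remove a node that is no longer wanted.
    order.remove(order.find("note"))

    # Rename: change an element's name while keeping its content.
    order.find("customer").tag = "buyer"

    # Replace value: overwrite the text of an existing node.
    items.find("item[@sku='1002']/qty").text = "4"

    return order


updated = update_order(ORDER_XML)
print(ET.tostring(updated, encoding="unicode"))
```

In a database setting the same changes would be expressed declaratively against the stored document rather than by parsing and rewriting it in application code, which is the point of the update extensions the article describes.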

See also: XML and Query Languages

W3C TAG: Passwords in the Clear
David Orchard (ed), TAG Finding

Members of the W3C Technical Architecture Group (TAG) have released a TAG Finding on "Passwords in the Clear," approved at its October 16, 2008 weekly telephone conference. The purpose of this finding is to provide guidance for securely transmitting passwords on the World Wide Web. Clear text passwords are a serious security risk. Digest authentication has significant advantages over clear text passwords, though other security issues arise. The use of an encrypted channel or key exchange is always more secure. When a password is transmitted in clear text, it is vulnerable in many ways: (1) The password is available on the wire. As the password is transmitted over the wire, tools such as packet sniffers or network analyzers can easily monitor the traffic and intercept passwords as they're sent between computers. (2) The password is available in browsing history. Most web browsers provide 'back' navigation to previous pages, with content locally cached for performance as well as ease of use for the user. These pages are stored in memory and are relatively easy to examine. (3) The password is readable on web proxies. Many larger corporations, as well as internet service providers, offer web proxies to allow faster downloads as well as some level of anonymity for web users. It is estimated that between 1 and 2 percent of e-commerce transactions are related to fraud. As customers are becoming more 'net savvy', they are starting to examine web page types and are attempting to only use secure systems. Therefore, any organization that wishes to safeguard its customers' data should start with secure transfers of user login and password information... SOAP messages are often sent using HTTP and any SOAP message is subject to similar password security concerns. While SSL/TLS can be used to secure SOAP-based messages point to point, the issue can be more complex if SOAP intermediaries are used. 
If confidential information is to be sent as part of the SOAP package, publishers can use SSL/TLS, XML Encryption, and WS-Security for sensitive data elements... HTML allows authors to create input forms. If a form field is a password, password masking should take place to protect the user from onlookers seeing what is being entered and stop anyone from later using the 'back' button to discover passwords... Password Alternatives: A solution to sending username/password combinations to many different web sites is to use single sign-on or delegated authorization technology, such as SAML, OpenID and OAuth. Another solution is to use client certificate-based authentication. Finally, two-factor security is growing in popularity. An example is security password token generators where a number or PIN that is known to the user is combined with a regularly generated random number from a hardware token to form the password.
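
To make the digest authentication point concrete, here is a minimal Python sketch of how an HTTP Digest response value is computed in the style of RFC 2617 with qop=auth. It is not taken from the TAG Finding; the function names are hypothetical, and the input values are the sample values from the RFC's own example. The client sends only the resulting hash, never the password itself.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the lowercase hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute an RFC 2617 style Digest 'response' value (qop=auth)."""
    # HA1 binds the user's secret to the realm.
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    # HA2 binds the response to this particular request.
    ha2 = md5_hex(f"{method}:{uri}")
    # The server nonce and client nonce make each response single-use.
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Sample values from the example in RFC 2617.
response = digest_response(
    username="Mufasa",
    realm="testrealm@host.com",
    password="Circle Of Life",
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
)
print(response)
```

This only keeps the password off the wire; as the finding notes, other security issues remain, and an encrypted channel such as SSL/TLS is still the stronger choice.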

See also: additional TAG Findings

Yahoo! Releases OpenID Research
Allen Tom, Yahoo Communication

I'm happy to announce that Yahoo! is releasing the results of a usability study that we did for OpenID. Our test subjects were several experienced Yahoo! users (representative of our mainstream audience) who were observed as they tried to sign into a product review site using the Yahoo OpenID service. First, the good news. After the users completed their tests, we explained OpenID to them, and they all recognized the value of being able to easily sign into a new site without having to create a new ID and password. They also appreciated the potential of using their Yahoo OpenIDs to automatically verify their Yahoo email address without having to do manual email verification. Now the bad news. None of the users had heard of OpenID before, and none of them even noticed the OpenID sign-in box displayed below the traditional email/password login form on the site. In many cases, the test subjects entered their Yahoo email address and Yahoo password to try to log in. We had told the test subjects that they could sign into the site using their Yahoo! account without having to register... Observing these tests was more than a bit frustrating for the Yahoo! OpenID team, and the test subjects may have been distracted by the sounds of the groans and head-pounding coming from the other side of the one-way mirror. Certainly there is a lot of work to be done on the OpenID UX (user experience) front. On the Yahoo! side of things, we streamlined our OP last week, and removed as much as we could. We removed the CAPTCHA and slimmed down the OP to just a single screen, and focused the UI to get the user back to the RP. We expect that RPs will enjoy a much higher success rate for users signing in with their Yahoo OpenID. On the RP side of things, our recommendation is that they emphasize to users that they can sign in with an existing account, specifically their YahooID. We believe that the YahooID, as well as IDs from other providers, have a higher brand awareness than OpenID.
We also believe that first time users signing in with an OpenID should be able to go directly to their intended destination after signing in, instead of having to complete additional registration. Hopefully, as SimpleReg/AttributeExchange are more widely supported (Yahoo does not currently support them), relying parties will no longer feel the need to force the user through an additional registration form after signing in with an OpenID.

See also: the Yahoo! OpenID report

IETF BOF Request Text for Authority-to-Citizen Alert Activity
Brian Rosen and Steve Norreys, IETF ECRIT Posting

IETF Birds of a Feather (BOF) Request Text was published for possible technical work relating to "Authority-to-Citizen Alert: Notifications of Emergencies from Authorities to Citizens". The BOF Chairs initially identified the IETF Real-time Applications and Infrastructure Area (RAI) as a locus for this work, possibly as a new IETF Working Group or as part of a revised charter for the Emergency Context Resolution with Internet Technologies (ECRIT) Working Group. The BOF will be held in conjunction with the 73rd IETF Meeting, November 16-21, 2008. Summary: ECRIT is working on citizen-to-authority calls. Alerts that are sent from "authorities" (which we define broadly to any level of authority, including, for example, a school administrator) to "citizens" (which we also define broadly to include visitors and other individuals and groups) are of great interest. Many efforts are underway in other fora, but all of them are stovepipes: limited by which authorities can invoke alerts as well as which networks they can be distributed on. What is needed is a more flexible way to deliver alerts from any notifier to any interested or affected individual, via any device connected to the Internet. This is a large problem: some alerts must be delivered to an enormous number of endpoints; a tsunami warning may involve millions of devices eventually. A system which can deliver messages to billions of endpoints with one notification is obviously an attractive attack target. This is further complicated by the desire to accept notifications from a wide variety of notifiers, complicating authentication mechanisms that might be employed. The security implications of any solutions must be addressed as an integral part of the solution from the very start of the work... The content of an alert is also a significant challenge to Internet scale alerts. The alert must be rendered on a wide variety of devices with different user interface capabilities. Multiple languages have to be accommodated.
Networks with limited bandwidth must be considered. Alert work is being done in many fora, and some is directly applicable to this work. For example, OASIS work on Common Alerting Protocol (CAP) is likely part of the solution. It should also be recognised that this is an area of work that will be highly regulated and as such close work with the fora where regulators operate (e.g. ATIS, ETSI, and ITU-T) may help with the deployment of any solutions. The purpose of the BOF is to determine the need and scope for authority to citizen alerts, what existing protocols and mechanisms need to be adapted to meet those needs, and what the appropriate representation for alerts should be. This work could be accomplished within an expanded charter of ECRIT, or a new work group could be formed. This work will leverage GEOPRIV work on location.
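
As an illustration of what an alert representation might look like, the sketch below uses Python's standard library to build a small message using OASIS CAP 1.1 element names, with one info block per language to show how multiple languages can be carried in a single alert. It is not part of the BOF text; the identifier, sender, timestamps, and event details are invented for the example, and no claim is made that the output is a complete or validated CAP document.

```python
import xml.etree.ElementTree as ET

# CAP 1.1 namespace; everything else in this example is hypothetical.
CAP_NS = "urn:oasis:names:tc:emergency:cap:1.1"
ET.register_namespace("", CAP_NS)


def q(name: str) -> str:
    """Qualify an element name with the CAP namespace."""
    return f"{{{CAP_NS}}}{name}"


def build_alert(identifier, sender, sent, infos):
    """Build an alert element with one <info> block per language."""
    alert = ET.Element(q("alert"))
    header = (
        ("identifier", identifier),
        ("sender", sender),
        ("sent", sent),
        ("status", "Exercise"),   # not a real emergency
        ("msgType", "Alert"),
        ("scope", "Public"),
    )
    for name, value in header:
        ET.SubElement(alert, q(name)).text = value

    for info in infos:
        node = ET.SubElement(alert, q("info"))
        for name in ("language", "category", "event", "urgency",
                     "severity", "certainty", "headline"):
            ET.SubElement(node, q(name)).text = info[name]
        area = ET.SubElement(node, q("area"))
        ET.SubElement(area, q("areaDesc")).text = info["areaDesc"]
    return alert


common = {"category": "Geo", "urgency": "Immediate",
          "severity": "Extreme", "certainty": "Observed"}
alert = build_alert(
    identifier="example-2008-0001",
    sender="alerts@authority.example",
    sent="2008-11-16T09:00:00-06:00",
    infos=[
        dict(common, language="en-US", event="Tsunami Warning",
             headline="Tsunami warning: move to higher ground",
             areaDesc="Example coastal district"),
        dict(common, language="es-US", event="Aviso de tsunami",
             headline="Aviso de tsunami: suba a terreno elevado",
             areaDesc="Distrito costero de ejemplo"),
    ],
)
print(ET.tostring(alert, encoding="unicode"))
```

A receiving device could pick the info block whose language matches the user's preference and render only the headline on a constrained display, which touches on the device-capability and bandwidth concerns raised above.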

See also: BOFs as IETF Pre-WG Efforts

Regulatory Transparency and XBRL
Kurt Cagle, O'Reilly Technical

In an economy where billions of transactions and hundreds of trillions of dollars move around the globe every day, there is no way that humans by themselves can regulate such systems. Instead, one of the most significant reforms that could take place would be the widespread adoption of the Extensible Business Reporting Language (XBRL) at all levels. The purpose of XBRL is to define and establish a set of standards that can be used to describe, at a programmatic modeling level, common factors of business accounting, as used by the SEC and other regulatory agencies. Last year, Chris Cox, current head of the SEC, endorsed XBRL as an acceptable format for submitting annual reports -- and other countries are moving to the point of not just accepting, but requiring, that an XBRL document be submitted as part of quarterly or annual filings. It may seem hyperbole to expect that a single XML document will radically transform the finance industry, but it has the potential to have a significant impact. One of the central problems that regulators face currently is that most reports and filings are hard copy text rather than electronic form, and even the form of such filings varies dramatically from one company to the next. While it's possible to scan such documents and apply OCR or similar mechanisms, the amount of work necessary to get even one such company's filings into a searchable form is not trivial, and the sheer variety of such form data means that printed documents are essentially unsearchable as so much of the relevant content is not data but metadata (context). The SEC has accepted the use of electronic filings using both text and html through its EDGAR (Electronic Data Gathering, Analysis, and Retrieval) system, and does in fact have an older XML format as part of the EDGAR specification -- XFDL (Extensible Forms Description Language). However, XFDL is, as the name implies, a forms presentation language. 
It makes it possible to precisely lay out form content for later print production, but it does not in fact contain any formal business semantics about the information contained within this form. XBRL, on the other hand, is a standard intended to model business entities rather than providing presentation. With XBRL you could lay out the assets that you have and their associated asset classes, you could give information about profits and losses achieved, you could provide taxation information... A tectonic shift is taking place in the economy right now, one that is punishing those that have been most abusive of the trust of customers, investors, governments and the taxpayers in those governments. XBRL has the potential to help renew that trust.

Is Opera's MAMA the Best Search for Developers?
Sean Michael Kerner, InternetNews.com

Most search engines search for content. Opera's new MAMA search ("Metadata Analysis and Mining Application") is searching for what's behind the content. It's all about figuring out what websites are made of in terms of markup and technologies. Sure, you can easily find that stuff out today without MAMA on a site by site basis (view source/page info etc) but looking at all that info in the aggregate as a search is something that I personally have not seen in the way that MAMA provides. According to the Opera announcement: "... Opera Software has led a first-of-its-kind project to create a search engine that tracks how Web pages are structured on the World Wide Web. When released publicly in the coming months, this engine will help browser makers and standards bodies work towards a more standards-driven and compatible Web. MAMA will help Web developers find examples of usage of features and functions, look at trends and gather data to justify technology to their clients or managers. This will also encourage standards bodies to take into account developers' suggestions about what is happening on the Web in reality and will eventually raise the quality and interoperability of specifications, the Web and browsers. MAMA can also respond to queries as general as 'how many sites use CSS (Cascading Style Sheets)?' (80.4 percent of MAMA's URLs), or 'how many markup errors does the average Web page have?' (47), or 'how many characters does an average Web page have?' (16,400), to more specific queries such as 'what country is using XMLHttpRequest, a critical component of AJAX, the most?' (Norway, with 10.2 percent, within MAMA's URL set). MAMA is up to the task of tackling vague questions that don't have easy answers, like 'how many sites are mobile-ready?' or 'how prevalent is Web 2.0?' Defining a page as being 'Web 2.0' can cover a variety of sub-topics, including the use of microformats, RSS, JSON (JavaScript Object Notation) and AJAX among numerous other criteria.
MAMA is ready to provide the complex answers to indistinct questions where simple answers do not exist..."
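
The kind of aggregate question MAMA answers ("how many sites use CSS?") can be sketched on a toy scale. The Python example below is only an illustration and says nothing about how Opera's engine actually works: the class and function names, the detection rules, and the four sample pages are all hypothetical.

```python
from html.parser import HTMLParser

FEATURES = ("css", "xmlhttprequest")


class FeatureScanner(HTMLParser):
    """Record which features of interest appear in one page's markup."""

    def __init__(self):
        super().__init__()
        self.features = set()
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # CSS counts if there is a <style> block, a style attribute,
        # or a linked stylesheet.
        if tag == "style" or "style" in attrs:
            self.features.add("css")
        if tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            self.features.add("css")
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Naive check: the identifier appears inside inline script text.
        if self._in_script and "XMLHttpRequest" in data:
            self.features.add("xmlhttprequest")


def scan(html: str) -> set:
    scanner = FeatureScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.features


def usage_percentages(pages: dict) -> dict:
    """Percentage of pages in the set that use each feature."""
    found = [scan(html) for html in pages.values()]
    return {f: 100 * sum(f in s for s in found) / len(pages) for f in FEATURES}


# Hypothetical four-page URL set.
PAGES = {
    "a.example": '<html><head><link rel="stylesheet" href="s.css"></head><body>hi</body></html>',
    "b.example": '<html><body style="color:red"><script>var r = new XMLHttpRequest();</script></body></html>',
    "c.example": "<html><body><p>plain</p></body></html>",
    "d.example": "<html><head><style>p{}</style></head><body></body></html>",
}

stats = usage_percentages(PAGES)
print(stats)
```

Any such percentage is relative to the URL set that was crawled, which is why the announcement qualifies its figures with "of MAMA's URLs".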

See also: the announcement

First Legal Shot Across the Semantic Web's Bow: Thomson Suing Zotero
Danny Weitzner, Open Internet Policy Blog

Last week Thomson Reuters (the owner of EndNote Software, a widely used proprietary tool for collecting and managing scholarly bibliographic information) filed a lawsuit against Zotero, the most popular open source, Semantic Web-enabled bibliographic tool. Zotero, packaged as a Firefox extension, is a handy tool for collecting bibliographic metadata to assist scholars in managing information necessary for their research (news story, complaint). Zotero can import and export a variety of different bibliographic formats and does so in a web-friendly, RDF-enabled way. Exchanging and linking bibliographic information (i.e., the title, author, publication venue) of scholarly communication is an important means to discover new links amongst individual pieces of research that are published around the world. This has been a high priority, for example, in the life sciences where new knowledge can be uncovered by linking individual pieces of research together. The latest beta release of Zotero will read and write EndNote's proprietary metadata format and import and export the citation formats that EndNote provides for a wide variety of academic journals. In response to this, Thomson sued the Zotero developers (an open source community hosted at George Mason University), charging that Zotero (and GMU) reverse engineered the EndNote citation file format in violation of EndNote's end user license agreement (EULA). The key effect of Thomson's suit, if it succeeds, would be to create a legal doctrine that enables software developers to restrict the Semantic Web's potential to promote data interoperability and data integration. The legal issue at bar has to do with reverse engineering and the enforceability of EULAs, both of which are important questions. And, there's a lot to say about whether or not the complaint will stand up to legal scrutiny...
the Web community, as well as the scholarly community, ought to pay careful attention to this case because its outcome could have real bearing on how free we will all be in the future to exchange information and realize the knowledge-enhancing benefits of the Web through collaborative research.

