Cover Pages: XML Daily Newslink: Friday, 18 July 2008

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Sun Microsystems, Inc. http://sun.com

Headlines

Experimental RFCs from Email Address Internationalization Working Group
Information from Google: Hitting 40 Languages
An ESB for the Web?
NETCONF Event Notifications
What Makes for a Successful Protocol?
BizTalk Services Have Been Updated
Interview with Kenton Varda: Google Open Sources Protocol Buffers
Linus Torvalds, Geek of the Week
Tech Giants Tackle Information Overload

Experimental RFCs from Email Address Internationalization Working Group
Jiankang YAO, Wei MAO, and Abel Yang (eds), IETF RFCs

The IESG has approved two specifications from the IETF Email Address Internationalization Working Group as Experimental RFCs. An IETF "Experimental" designation typically denotes a specification that is part of some research or development effort, subject only to editorial considerations and to verification that there has been adequate coordination with the standards process. A specification is designated Experimental when the IETF may later publish a standards-track document based on it, once it is known how well the experimental version works. This IETF WG was chartered to study email address internationalization problems and create proposed solutions.
Background and History: Mailbox names often represent the names of human users. Many of these users throughout the world have names that are not normally expressed with just the ASCII repertoire of characters, and would like to use more or less their real names in their mailbox names. These users are also likely to use non-ASCII text in their common names and subjects of email messages, both in what they send and what they receive. This protocol specifies UTF-8 as the encoding to represent email header field bodies. The traditional format of email messages (RFC 2822) allows only ASCII characters in the header fields of messages. This prevents users from having email addresses that contain non-ASCII characters. It further forces non-ASCII text in common names, comments, and in free text (such as in the Subject: field) to be encoded (as required by MIME format RFC 2047). The two experimental documents form the core specification for an extension to SMTP and RFC 2822 that allows the use of UTF-8 in message headers without encoding. This includes the use of non-ASCII characters in email addresses, both on the left and right hand sides of the '@' character. The documents have been extensively reviewed by people with mail expertise. There have been reports of implementations, but no interoperability tests have been reported to date.
(RFC #1) The "Internationalized Email Headers" RFC specifies an experimental variant of Internet mail that permits the use of Unicode encoded in UTF-8, rather than ASCII, as the base form for Internet email header field bodies. It removes the blanket ban on applying a content-transfer-encoding to all subtypes of message/, and instead specifies that a composite subtype may specify whether or not a content-transfer-encoding can be used for that subtype, with "cannot be used" as the default. This form is permitted in transmission only if authorized by an SMTP extension, as specified in an associated specification. And this specification updates section 6.4 of IETF RFC 2045 to conform with the requirements. (RFC #2) "SMTP Extension for Internationalized Email Address" specifies an SMTP extension for transport and delivery of email messages with internationalized email addresses or header information. The extension is identified with the token "UTF8SMTP". In order to provide information that may be needed in downgrading, an optional alternate ASCII address may be needed if an SMTP client attempts to transfer an internationalized message and encounters a server that does not support this extension.
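
To make the contrast between the two header forms concrete, here is a minimal sketch using Python's standard `email.header` module. It shows the RFC 2047 "encoded-word" form that ASCII-only headers require, next to the raw UTF-8 octets that the experimental extension would permit. The sample subject text and the helper names `to_encoded_word` and `from_encoded_word` are illustrative assumptions, not part of either RFC.

```python
from email.header import Header, decode_header

def to_encoded_word(text):
    """Return the ASCII-only RFC 2047 form of a header value."""
    return Header(text, charset="utf-8").encode()

def from_encoded_word(value):
    """Decode an RFC 2047 header value back to a Python string."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)

subject = "Grüße aus Köln"  # hypothetical sample subject

# Traditional form: non-ASCII text wrapped in an ASCII encoded-word.
encoded = to_encoded_word(subject)

# Form permitted by the experimental extension: the UTF-8 octets themselves.
raw_utf8 = subject.encode("utf-8")

print(encoded)
print(raw_utf8)
```

The encoded-word form survives any ASCII-only relay but is unreadable without decoding, which is the burden the UTF-8 header proposal aims to remove.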

Information from Google: Hitting 40 Languages
Mario Queiroz, Google Blog

"One of our goals is to give everyone using Google the information they want, wherever they are, in whatever language they speak, and through whatever device they're using. A huge part of that goal is making our services available in as many languages as possible. And as I'm sure you can imagine, that isn't as easy as simply translating a few lines of text. Take Hebrew or Arabic, which are written from right to left. An Arabic speaker may search for [example 'world cup football 2008'] where part of the query will be written from right to left in Arabic, while the numbers will be written left to right. Sometimes the right-to-left difference can mean having to change the entire layout of a page, as with Gmail. Or take Russian, where words change depending on their placement and role in a sentence. In Russian, for example, 'pizza in Moscow' is encoded [see example] but 'pizza near Moscow' [differs markedly]. Then there's the whole challenge of ensuring that queries are locally relevant. While many Australians searching for 'freedom' are looking for the Australian furniture chain, UK and US users are often looking for the definition of the word itself. Our search results, then, have to take into account these local differences... In 2007, we undertook a company-wide initiative to increase the availability of our products in multiple languages. We picked the 40 languages read by over 98% of Internet users and got going, relying heavily on open source libraries such as International Components for Unicode (ICU) and other internationalization technologies to design products... Today we have more than 30 products in more than 30 languages, up from 5 products in 30 languages just a year ago. In 2004, we had 150 local-language versions of various products (e.g., a product local to the UK, not just the English-speaking world), while today we're at more than 1500. 
From January to March of 2008, we launched 256 local-language versions of various products, compared to 55 in the same period of 2007. And we have upgraded to Unicode 5.1 to make sure that we can handle any characters people read or write in..."
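
The mixed-direction query described above can be illustrated with the Unicode bidirectional classes exposed by Python's standard `unicodedata` module. This is only a rough sketch: the Arabic sample query is an assumed rendering of "world cup 2008", the function name `word_directions` is hypothetical, and real layout engines apply the full Unicode Bidirectional Algorithm (for example via ICU) rather than this per-word shortcut.

```python
import unicodedata

RTL_LETTERS = {"R", "AL"}      # right-to-left letters (Hebrew, Arabic)
NUMBER_CLASSES = {"EN", "AN"}  # European and Arabic-Indic digits

def word_directions(query):
    """Label each whitespace-separated word as 'rtl', 'number', or 'ltr'."""
    labelled = []
    for word in query.split():
        classes = {unicodedata.bidirectional(ch) for ch in word}
        if classes & RTL_LETTERS:
            labelled.append((word, "rtl"))
        elif classes <= NUMBER_CLASSES:
            labelled.append((word, "number"))
        else:
            labelled.append((word, "ltr"))
    return labelled

# Assumed sample query: Arabic words followed by a year in digits.
QUERY = "كأس العالم 2008"

for word, direction in word_directions(QUERY):
    print(direction, word)
```

The digits carry a different bidirectional class from the surrounding letters, which is why a single query can contain runs displayed in both directions.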

An ESB for the Web?
Jim Stogdill, O'Reilly Radar

Imagine my surprise when I saw the web acting a bit like the enterprise with the launch of Gnip. [The Gnip API provides notification of activities (events) occurring in a variety of services and, whenever possible, a GUID that identifies the activity itself vis a vis the service it was created on; API users have two primary roles: Publishers and Subscribers.] As the web moves toward a network of widespread transactional APIs, each with its own vocabulary, it is starting to look a lot like a legacy enterprise writ large or maybe like an industry eco-system... In the enterprise space, faced with the N-squared problem, you probably define an enterprise vocabulary, build a bunch of services that conform to it (or buy your applications from vendors that provide them), and then hook them all together through your Enterprise Service Bus (ESB). ESBs by definition support web services interfaces, and provide translation services and process orchestration on top of a message routing backbone. They usually come from vendors that probably used to sell Message Oriented Middleware (MOM) (of both the store and forward and pub/sub variety), Application Servers, Enterprise Application Integration (EAI), and even Extract, Transform, and Load (ETL) software. There is also a growing stable of open source versions built on standards like Java Business Integration (JBI)... [Getting back to Gnip as the ESB for the web]: When I first saw their drawing on the web site (RSS, REST, Comet, XMPP, Atom, all handled via the Gnip protocol bridge, which enables you to act as though the entire Web uses your preferred data exchange protocol) I immediately thought "Cool, I bet they are using a JBI backbone with a service engine for translation and a bunch of binding components to deal with XMPP, HTTP, SIP, RSS, etc." Because I come from the enterprise space this seems like a natural use case for an ESB, and for JBI in particular... 
It seems that there is a growing trend towards the use of XMPP as a generic XML routing bus, a role that makes it look suspiciously like an ESB... You may be thinking "If message oriented middleware is the backbone of many ESBs, why isn't Gnip using Amazon's SQS as the foundation for the Web's ESB?" After all, SQS is essentially a simple web-friendly message bus. The simple answer is latency. SQS has performance characteristics more like store-and-forward-style MOM than like pub/sub MOM. Because of that it is more suitable for use cases that need guaranteed delivery but that can support average latencies on the order of one second (and may be as high as ten seconds)... From the Blog: "We built a system that connects Data Consumers to Data Publishers in a low-latency, highly-scalable standards-based way. Data can be pushed or pulled into Gnip (via XMPP, Atom, RSS, REST) and it can be pushed or pulled out of Gnip (currently only via REST, but the rest to follow). This release of Gnip is focused on propagating user generated activity events from point A to point B. Activity XML provides a terse format for Data Publishers to distribute their users' activities. Collections XML provides a simple way for Data Consumers to only receive information about the users they care about... As a Consumer, whether your application model is event- or polling-based Gnip can get you near-realtime activity information about the users you care about. Our goal is a maximum 60 second latency for any activity that occurs on the network. While the time our service implementation takes to drive activities from end to end is measured in milliseconds, we need some room to breathe. Data can come in to Gnip via many formats, but it is XSLT'd into a normalized Activity XML format which makes consuming activity events (e.g. 'Joe dugg a news story at 10am') from a wide array of Publishers a breeze..."
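
The publisher/subscriber model described above, in which consumers receive only the activities of users they care about, can be sketched in a few lines. This is a toy in-memory illustration and not Gnip's API: the `Activity` fields, the `ActivityBus` class, and its method names are all hypothetical, and the real service normalizes incoming data to its own Activity XML format rather than to Python objects.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Activity:
    """A normalized activity event (hypothetical fields)."""
    actor: str   # who did it, e.g. "joe"
    action: str  # what they did, e.g. "dugg"
    source: str  # which publisher reported it, e.g. "digg"

class ActivityBus:
    """Routes published activities to subscribers by actor."""

    def __init__(self):
        self._collections = defaultdict(set)  # subscriber -> actors of interest
        self.inboxes = defaultdict(list)      # subscriber -> delivered activities

    def subscribe(self, subscriber, actors):
        """Register the set of actors a subscriber wants to hear about."""
        self._collections[subscriber].update(actors)

    def publish(self, activity):
        """Deliver an activity to every subscriber following its actor."""
        for subscriber, actors in self._collections.items():
            if activity.actor in actors:
                self.inboxes[subscriber].append(activity)

bus = ActivityBus()
bus.subscribe("news-app", {"joe"})
bus.publish(Activity("joe", "dugg", "digg"))
bus.publish(Activity("ann", "posted", "twitter"))
print(bus.inboxes["news-app"])
```

Because every publisher's events are reduced to one shape before routing, a consumer needs a single handler regardless of how many publishers exist, which is the point of the normalization step in the quoted description.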

See also: the Gnip web site

NETCONF Event Notifications
S. Chisholm and H. Trevino (eds), IETF Proposed Standard Protocol

The IETF RFC Editor Team announced the availability of a new Standards Track Request for Comments in online RFC libraries: "NETCONF Event Notifications." The document is now an IETF Proposed Standard Protocol, and is a work product of the Network Configuration Working Group. The document specifies an Internet standards track protocol for the Internet community, and IETF requests discussion and suggestions for improvements. The IETF NETCONF Working Group was chartered to produce a protocol suitable for network configuration, because "configuration of networks of devices has become a critical requirement for operators in today's highly interoperable networks. Operators from large to small have developed their own mechanisms or used vendor specific mechanisms to transfer configuration data to and from a device, and for examining device state information which may impact the configuration. Each of these mechanisms may be different in various aspects, such as session establishment, user authentication, configuration data exchange, and error responses." The NETCONF protocol uses XML for data encoding purposes, because XML is a widely deployed standard which is supported by a large number of applications. The NETCONF Event Notifications document defines mechanisms that provide an asynchronous message notification delivery service for the NETCONF protocol. This is an optional capability built on top of the base NETCONF definition. It defines the capabilities and operations necessary to support the service. Document section 4 specifies the XML Schema for Event Notifications.
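
As a rough illustration of what an XML-encoded notification request can look like, the sketch below builds a subscription RPC with Python's standard `xml.etree.ElementTree`. The element names (`rpc`, `create-subscription`, `stream`), the two namespace URIs, and the default stream name `NETCONF` are not given in the summary above; they are recalled from the published RFC and should be checked against its XML Schema before use. The function name `build_create_subscription` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed from the NETCONF base and notification specifications.
BASE_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
NOTIF_NS = "urn:ietf:params:xml:ns:netconf:notification:1.0"

def build_create_subscription(message_id, stream="NETCONF"):
    """Return an <rpc> document asking the server to start sending events."""
    rpc = ET.Element(f"{{{BASE_NS}}}rpc", {"message-id": str(message_id)})
    subscription = ET.SubElement(rpc, f"{{{NOTIF_NS}}}create-subscription")
    ET.SubElement(subscription, f"{{{NOTIF_NS}}}stream").text = stream
    return ET.tostring(rpc, encoding="unicode")

request = build_create_subscription(101)
print(request)
```

After such a request is accepted, the server delivers notifications asynchronously on the same session, which is the delivery service the document defines.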

What Makes for a Successful Protocol?
Dave Thaler and Bernard Aboba (eds), IETF RFC

IETF announced the availability of a new Informational Request for Comments in the online RFC libraries. An IETF "Informational" specification is published for the general information of the Internet community, and does not represent an Internet community consensus or recommendation. RFC 5218, "What Makes for a Successful Protocol?", discusses "success" from several points of view and makes recommendations about questions that should be asked when evaluating protocol designs. The Internet community has specified a large number of protocols to date, and these protocols have achieved varying degrees of success. Based on case studies, this Informational RFC document attempts to ascertain factors that contribute to or hinder a protocol's success. It is hoped that these observations can serve as guidance for future protocol work... Two major dimensions on which a protocol can be evaluated are scale and purpose. A protocol is designed for some range of purposes and for use on a particular scale. According to these metrics, a "successful" protocol is one that is used for its original purpose and at the originally intended scale. A "wildly successful" protocol far exceeds its original goals, in terms of purpose (being used in scenarios far beyond the initial design), in terms of scale (being deployed on a scale much greater than originally envisaged), or both. That is, it has overgrown its bounds and has ventured out "into the wild"... The case studies described in Appendix A of the document indicate that the most important initial success factors are filling a real need and being incrementally deployable. When there are competing proposals of comparable benefit and deployability, open specifications and code become significant success factors. Open source availability is initially more important than open specification maintenance. In most cases, technical quality was not a primary factor in initial success. 
Indeed, many successful protocols would not pass IESG review today. Technically inferior proposals can win if they are openly available. Factors that do not seem to be significant in determining initial success (but may affect wild success) include good design, security, and having an open specification maintenance process. Many of the case studies concern protocols originally developed outside the IETF, which the IETF played a role in improving only after initial success was certain. While the IETF focuses on design quality, which is not a factor in determining initial protocol success, once a protocol succeeds, a good technical design may be key to it staying successful, or in dealing with wild success. Allowing extensibility in an initial design enables initial shortcomings to be addressed...

BizTalk Services Have Been Updated
Abel Avram, InfoQ

BizTalk Labs "is where Microsoft shares early access to experimental connectivity and business process technologies in order to get feedback from customers... whereas an Enterprise Service Bus (ESB) is a commonly deployed set of technologies that most large organizations use to make it easier to build and maintain complex Enterprise applications, an Internet Service Bus (ISB) is the evolution of this approach that leverages advances on the Internet to make it easier to connect applications between organizations and to integrate with browsers, RSS, and other Web technologies... An Internet Service Bus consists of a set of integrated hosted services that includes: naming; application messaging (including routing and publish and subscribe); identity and access control; and workflow and business process management." Abel Avram reports that "BizTalk Labs has updated its range of connectivity and business process services through the BizTalk Services SDK which offers access to the following services: Workflow, Identity, Windows Live ID Credentials, Unauthenticated Access, TransportClientCredentials, HTTP Connectivity Mode. The BizTalk Labs SDK works on Windows Vista, XP, or Server 2003. Internet Explorer 7 and the .NET Framework v3.0 Runtime and SDK are necessary to use the SDK. BizTalk service summary: (1) Workflow: BizTalk Services has added a new service for running Workflows for service orchestration in the BizTalk Services cloud. (2) Identity Service Scopes: The Identity Service now allows for creating per-service access control management scopes with delegation of management authority between users. (3) Windows Live ID Credentials: You can now use Windows Live ID as a credential for obtaining tokens. (4) Unauthenticated Access: For all connection modes, services can opt out of the client authorization facility provided by the Relay and allow unauthenticated client access. 
(5) TransportClientCredentials: Refactored, WCF-aligned API for configuring/setting credentials for accessing the Relay, replacing the 'raw' TokenProviders. (6) HTTP Connectivity Mode: New connectivity mode allowing RelayedOneway, RelayedMulticast, and RelayedDuplex services to listen on the Relay using HTTP (port 80)." From the web site: "Keep in mind that the technologies available at BizTalk Labs are experimental. In many cases we have not decided on what they will be named, whether they will become fully released products, or how we will charge for them."

See also: InfoWorld

Interview with Kenton Varda: Google Open Sources Protocol Buffers
Kurt Cagle, XML.com

Data messaging formats represent the life-blood of any distributed application. The ability to pass information back and forth between disparate systems becomes crucial for any organization, but for companies such as Google, the challenge of setting up communications between the thousands of different servers that host the various Google services forced the need for a specialized format that met their needs in particular. Recently, Kenton Varda, an engineer working on search engine infrastructure at Google, became the point man for releasing Google's internal messaging format (called Protocol Buffers) as an open source project using the Apache license. Kurt Cagle spoke with Kenton about Protocol Buffers, why they are important to Google, why the decision was made to open source them, and why Google uses an internal format rather than a format such as XML, JSON or related technology. Excerpts from Varda in the interview: "Practically all our internal data formats, for both RPC and storage, are based on Protocol Buffers. Many apps need them for performance reasons, but they are also often used just because it's the path of least resistance. Protocol Buffers are good when you have structured data which you need to encode in a way that is both efficient and extensible. The second point is important: a lot of people ask why we didn't just use various existing binary formats, and the answer is usually that those formats do not provide easy extensibility... Of course, XML and JSON provide extensibility as well, but Protocol Buffers have an advantage over them in efficiency: Protocol Buffers are both smaller and faster to parse. Furthermore, the data access classes generated by the Protocol Buffer compiler are often more convenient to use than typical SAX or DOM parsers. Of course, lack of human-readability can be a serious disadvantage depending on the use case. That said, XML is a much better solution when you need to encode documents composed primarily of text with markup. 
Protocol Buffers provide no obvious way to interleave text with structured elements. XML and JSON are also better if you need a human-readable format -- although there is a standard way to encode Protocol Buffers in text, it provides no real advantages over JSON... Contrary to what many people are saying, our intent with this release is not to 'kill XML'. We simply believe that while XML works very well in the situations for which it was designed, it is not the ideal solution for every problem. XML is inherently inefficient both in terms of size and parsing speed since it is a text-based format. In many applications, these inefficiencies don't matter, but for us they make a big difference. Furthermore, XML, despite being a simplification of SGML, is still a very complicated standard, and many of its features just get in the way in a lot of cases. Protocol Buffers are designed to be very simple conceptually."
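
The size advantage mentioned in the interview can be illustrated with the base-128 "varint" integer encoding described in the public Protocol Buffers wire-format documentation. The interview itself does not discuss varints, so treat this as a sketch of one ingredient of the format, not of the whole encoder; the function names and the `<id>` element used for comparison are hypothetical.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles only non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)         # final byte has the high bit clear
            return bytes(out)

def decode_varint(data):
    """Decode a single varint from the start of a byte string."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")

def encode_int_field(field_number, value):
    """Encode a varint field: key byte(s) followed by the value."""
    key = (field_number << 3) | 0  # wire type 0 means varint
    return encode_varint(key) + encode_varint(value)

binary = encode_int_field(1, 300)
text = b"<id>300</id>"
print(len(binary), "bytes as a varint field;", len(text), "bytes as XML text")
```

Small integers fit in one or two bytes plus a one-byte key, whereas a text format spends bytes on the tag names and on each decimal digit.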

Linus Torvalds, Geek of the Week
Richard Morris, simple-talk

Linus Torvalds is remarkable, not only for being the technical genius who wrote Linux, but for then being able to inspire and lead an enormous team of people to devote their free time to work on the operating system and bring it to maturity. An acknowledged godfather of the open-source movement, Linus Torvalds was just 21 when he changed the world by writing Linux. Today, 17 years later, Linux powers everything from supercomputers to mobile phones. In fact ask yourself this: if Linux didn't exist, would Google, Facebook, PHP, Apache, or MySQL? Excerpt on one topic (patents): Richard Morris: 'Do you think software patents are a good idea?' Linus Torvalds: "Heh—definitely not. They're a disaster. The whole point (and the original idea) behind patents in the US legal sense was to encourage innovation. If you actually look at the state of patents in the US today, they do no such thing. Certainly not in software, and very arguably not in many other areas either. Quite the reverse: patents are very much used to stop competition, which is undeniably the most powerful way to encourage innovation. Anybody who argues for patents is basically arguing against open markets and competition, but they never put it in those terms. So the very original basis for the patents is certainly not being fulfilled today, which should already tell you something. And that's probably true in pretty much any area. But the reason patents are especially bad for software is that software isn't some single invention where you can point to a single new idea. Not at all. All relevant software is a hugely complex set of very detailed rules, and there are millions of small and mostly trivial ideas rather than some single clever idea that can be patented. The worth of the software is not in any of those single small decisions, but in the whole. It's also distressing to see that people patent 'ideas'. 
It's not even a working 'thing'; it's just a small way of doing things that you try to patent, just to have a weapon in an economic fight. Sad. Patents have lost all redeeming value, if they ever had any." Note: other 'Geeks of the Week' in the series include: Tim Berners-Lee, CmdrTaco (Slashdot founder) and Richard Hipp (SQLite creator).

See also: 'Geeks of the Week' series

Tech Giants Tackle Information Overload
Holly Jackson, CNET NEWS.com

Your BlackBerry buzzes with a text from your boss, snapping you out of your Twitter-surfing trance. Your friend calls you and tells you to check out his Facebook profile, as you respond to your spouse's instant message about dinner plans. All the while, your in-box is overflowing with new e-mail messages. If humans were like computers, our screens would be frozen—overloaded by information and too much multitasking. The term "information overload" has floated around for years and been the topic of much analysis, but the situation remains. According to recent research by enterprise research firm Basex, these distractions are now costing the American economy more than $650 billion in lost productivity, and taking up 28 percent of workers' time. Such numbers led Intel engineer Nathan Zeldes and other tech industry insiders to form the new Information Overload Research Group. The nonprofit consortium, whose members include Microsoft Research, IBM, and Google employees, recently held its first conference in New York, with members meeting at sessions with titles like "No Time to Think" and "Visionary Vendors." Now that the group has had its inaugural gathering, Zeldes, its president, said IORG will continue to recruit members and financial sponsors from a scope of business sectors. With more minds applied to finding a solution to what IORG calls "the world's greatest challenge to productivity," Zeldes hopes to generate innovative ideas that can benefit both businesses and individuals. With a reported 281,000 terabytes of information created worldwide in 2007, streamlining and compiling data with software is one way technology can wrangle the information influx, Vanderbroek says. Of course, given that most of the parties involved in the IORG have created hardware and software that contributes to information overload, one might question why those same people would want to hinder it... Companies also have to think about balancing their employees' lives. 
Information overload outside of work, like using a BlackBerry on weekends or vacations, could hinder the work-life balance, leading to decreased worker satisfaction. Zeldes also points out the problem is not just affecting technology companies or large corporations.

