Cover Pages: XML Daily Newslink: Wednesday, 02 July 2008

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Oracle Corporation http://www.oracle.com

Headlines

AtomServer: The Power of Publishing for Data Distribution
Graphs Versus Objects
ISO Approves PDF as an International Standard
Event-Based Architectures
XMPP Plus RabbitMQ Gateway = High Performance Twitter?
Searching Mailing Lists with MarkMail
Beyond XML and JSON: YAML for Java Developers

Cover Pages

Balisage 2008 Conference in Montreal Continues Extreme Markup Tradition

AtomServer: The Power of Publishing for Data Distribution
Bryon Jacob and Chris Berry, InfoQ

"Consider this: you work for a company with a federation of fully independent web sites, implemented in half a dozen different programming languages, on several different platforms. Each independent website has its own database system and schema, and is managed by teams with varying skill sets, located in eight sites throughout the United States and Europe. And the company is growing. Your job? Enable these disparate systems to share crucial data conveniently and rapidly amongst themselves. Your design criteria are: (1) High Traffic Capability: the service would need to move approximately 1M pieces of data a day at launch; (2) Transactional Correctness: the service must be accurate as the authoritative source of data for all clients; (3) Resiliency: the service must be easy to upgrade with seamless data republishing when formats change; (4) Loose Coupling: with so many systems, each must be able to manage themselves independently; (5) Adoption: the system must have a low barrier to entry for clients implemented in a variety of languages -- Java, C#, PHP, Ruby, and ColdFusion; (6) Adaptability: the system must support many different types of data and be extensible to add new types of data on demand... We were faced with exactly this problem about a year ago at Homeaway.com... it didn't take long to recognize two design tenets. First, that a distributed, publish-subscribe service is a great way to address resiliency and loose coupling of subsystems, and second, that building RESTful services (as opposed to heavyweight protocols like SOAP) is a natural solution for systems that need high scalability, extensibility, and ease of adoption. These two principles led us directly to Atom (a RESTful publishing protocol) and to a new breed of data service called an Atom Store. We've spent the last year implementing an Atom Store for Homeaway.
And from that real-world implementation we have extracted the open source Atom Store framework, named AtomServer, described in this article... AtomServer is in live production use at our company, handling more than a million requests a day, with several million entries in our store. Building on a RESTful specification such as Atom while leveraging the design of existing services like GData has ensured a solid foundation on which to build. We hope that you will pick up a copy and tell us what you think..."
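
[Note: The publish-subscribe idea described above can be sketched, very roughly, as a client that reads an Atom feed and keeps only the entries updated since its last visit. This is a minimal illustration and not AtomServer's actual API: the feed content, the entry ids, and the function name entries_since are all invented here, and the feed is abbreviated (a complete Atom feed carries more required elements).]

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace

# Hypothetical, abbreviated feed; ids and timestamps are invented for illustration.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>listings</title>
  <entry><id>urn:example:listing:1</id><updated>2008-07-01T10:00:00Z</updated></entry>
  <entry><id>urn:example:listing:2</id><updated>2008-07-02T09:30:00Z</updated></entry>
</feed>"""


def entries_since(feed_xml, last_seen):
    """Return (id, updated) pairs for entries updated after last_seen.

    Assumes every timestamp uses the same UTC 'Z' format, so plain
    string comparison orders them correctly.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall(ATOM + "entry"):
        updated = entry.findtext(ATOM + "updated")
        if updated > last_seen:
            fresh.append((entry.findtext(ATOM + "id"), updated))
    return fresh


# A subscriber that last synchronized at noon on 1 July picks up only the newer entry.
new_entries = entries_since(FEED, "2008-07-01T12:00:00Z")
```

Because the subscriber needs nothing beyond an HTTP client and an XML parser, the same loop is easy to reproduce in each of the client languages the authors list.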

See also: Atom references

Graphs Versus Objects
John Hebeler and Matt Fisher, Dr. Dobb's Journal

Software objects are the de facto programming paradigm for engineering intelligence into modern computer systems. Objects' labyrinth of inheritance, polymorphism, and encapsulated data, intermeshed with ifs, whiles, and for loops, are the basis for flying airplanes, producing health diagnoses, and surfing the Web. Sometimes we escape this rigid paradigm and place the program intelligence elsewhere, such as databases and files. In most cases, knowledge solutions are a hybrid of approaches. Each method has its advantages and disadvantages. An alternative approach (graphs) offers a contrast to these traditional holders of programming intelligence. Graphs have improved significantly with the coming of the Semantic Web, where graphs are a key tenet. In this article, we introduce graphs through a comparison with objects. This approach illustrates some key advantages while stirring up a little controversy. Some would say we should start with comparing graphs to databases and other similar approaches, but this would constrain graphs to a more traditional role. Graphs, as you will see, can help in all areas of knowledge management, including Web 2.0 and beyond. As the developer, you must constantly choose between the trade-offs of the various methods, such as programming steps themselves, databases, and files. Here are some key concepts to consider: Expressiveness, Integration, Resource Use, Scalability, Interrogation, Flexibility, Integrity (consistency and correctness)... How does Web 2.0 impact these attributes? Web 2.0 represents three significant trends: scale, change, and integration. Web 2.0 has evolved the emphasis on systems—they must scale rapidly, quickly adapt to new possibilities, and easily integrate with others. Thus to be Web 2.0 enabled, you must carefully consider how your development choices incorporate these Web 2.0 trends. 
Additionally, the intelligence of your program becomes even more of a key asset -- you no longer must do everything from moving data bits to a fancy GUI. If you can incorporate the trends into your solution, many Web 2.0 possibilities are already there for your integration. [The authors provide some code examples that highlight the differences between objects and graphs; from the worked example, the authors conclude:] Refactoring object code on a frequent basis to support such dynamic activity becomes burdensome. Realistically, we can never plan for all the different items our clients will buy and sell; graphs let us better deal with such uncertainty. Scalability, flexibility, and ease of integration are easily met using the graph paradigm, for the intelligence is in the data and not the code.
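
[Note: The authors' own code examples are not reproduced here. As a rough stand-in, the sketch below stores facts as subject-predicate-object triples and queries them by pattern, which is one common way to represent a graph. The names (alice, sells, match, and so on) are invented for illustration and do not come from the article.]

```python
# Facts as (subject, predicate, object) triples; all names are invented.
triples = {
    ("alice", "sells", "bicycle"),
    ("bob", "buys", "bicycle"),
    ("bicycle", "is_a", "vehicle"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# A new kind of relationship needs no new class or method, only new data.
triples.add(("carol", "rents", "bicycle"))

about_bicycles = match(triples, o="bicycle")
```

An object model would typically need a new method or subclass to handle "rents"; here the query function is unchanged, which is the sense in which the intelligence sits in the data.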

ISO Approves PDF as an International Standard
Elizabeth Montalbano, ComputerWorld

The ISO has approved Adobe Systems Inc.'s widely used Portable Document Format as an international standard, and the organization is now in charge of any changes made to the PDF specification. The format is open and accessible to anyone as ISO 32000-1, the standards body said Wednesday. The standard is based on Adobe's Version 1.7 of PDF. PDF, the file format for Adobe's Acrobat software, has long been used as a standard way for people to exchange and view business documents. However, Adobe kept a proprietary hold on the format until it finally succumbed to industry pressure and submitted it for standardization in February 2007. Adobe's move reflected an industrywide trend to standardize broadly adopted file formats to increase interoperability among different applications that people use to create business documents. According to ISO's announcement: "The new standard, ISO 32000-1, Document management -- Portable document format -- Part 1: PDF 1.7, is based on the PDF version 1.7 developed by Adobe. This International Standard supplies the essential information needed by developers of software that create PDF files (conforming writers), software that reads existing PDF files and interprets their contents for display and interaction (conforming readers), and PDF products that read and/or write PDF files for a variety of other purposes (conforming products). Future versions of the format will be published as subsequent parts of the standard by the ISO subcommittee in charge of its maintenance and development (SC 2, Application issues, of ISO technical committee ISO/TC 171, Document management applications). [The standard] costs 370 Swiss francs and is available from ISO national member institutes."

See also: the announcement

Event-Based Architectures
Ted Faison, DDJ

Event-Based Architectures (EBAs) simplify system design, development, and testing because they minimize the relationships between the parts of a system... What Is an EBA? An EBA is an architecture based on parts that interact solely or predominantly using event notifications, instead of direct method calls. An "event notification" is a signal that carries information about an event that was detected by the sender. While you're probably familiar with events like button clicks, events can be defined to include almost any conditions or occurrences you can think of. Notifications can be used to carry any domain-specific information you want, and in any type of system—embedded, GUI-based, distributed, or other. There are different ways to deliver notifications, but the most common technique uses an indirect method call that is made through a pointer initialized at runtime. At design time, the compiler is unaware of what object (and perhaps even what type of object) the pointer will be referencing when the call is made. In an EBA, each part emits signals related to its internal state and reacts to signals received from other parts. This simple input/output model has important consequences: Each part can be developed and tested in isolation from the rest of the system, because it knows nothing about the other parts. In a well-designed EBA, the relationship of complexity versus size tends to be more linear than exponential, so the larger the system is, the better off you are with an EBA, compared to other conventional approaches. The larger a system gets, the more you can benefit from Event-Based Architectures. The individual parts, be they classes or components, have little or no type coupling to the rest of the system. This is especially important for testability. EBAs are eminently testable. They can be tested incrementally and can be developed using a test-driven approach. You can develop and test every major part of a system in isolation. Very cool. 
Over the years, I have developed many different types of software systems using EBAs. People sometimes find it perplexing that the salient classes in an EBA have no associations between them, but this is often a good thing because EBAs reduce coupling in order to reduce complexity. I have found that signal wiring diagrams are a good way to document and model EBAs. Although they are different from most of the diagrams you've probably seen before, they are easy to understand and easy to create.
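
[Note: A minimal sketch of the idea, assuming a simple callback-list mechanism for delivering notifications. The Signal, Thermostat, and Fan classes are invented for illustration and are not taken from the article. The two parts never reference each other's types; they are wired together at runtime.]

```python
class Signal:
    """Holds handlers bound at runtime and calls each one on emit."""

    def __init__(self):
        self._handlers = []

    def connect(self, handler):
        self._handlers.append(handler)

    def emit(self, *args):
        # Iterate over a copy so a handler may connect others safely.
        for handler in list(self._handlers):
            handler(*args)


class Thermostat:
    """Emits a notification about its own state; knows nothing about fans."""

    def __init__(self):
        self.overheated = Signal()

    def read(self, celsius):
        if celsius > 30:
            self.overheated.emit(celsius)


class Fan:
    """Reacts to a notification; knows nothing about thermostats."""

    def __init__(self):
        self.running = False

    def on_overheated(self, celsius):
        self.running = True


# Wiring happens outside both parts, at runtime.
thermostat, fan = Thermostat(), Fan()
thermostat.overheated.connect(fan.on_overheated)
thermostat.read(35)
```

Testing a part in isolation then amounts to connecting a list's append method (or any other stub) to its signal and checking what arrives.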

XMPP Plus RabbitMQ Gateway = High Performance Twitter?
Dave Rosenberg

"The LShift team just released a RabbitMQ to XMPP gateway proof-of-concept. RabbitMQ [RabbitMQ Open Source Enterprise Messaging] is an implementation of AMQP, the emerging open source standard for high performance enterprise messaging. Think of AMQP as the open source version of something like MQ Series or other high-volume JMS servers. XMPP is open XML technology for presence and real-time communication. Message volume shouldn't be a problem for services like Twitter. RabbitMQ with an XMPP gateway might be part of the solution. The combination means you can inject very high volumes of messages into IM or other XMPP enabled applications with far fewer scale issues..." From the RabbitMQ FAQ: RabbitMQ implements AMQP's "TX" message class, which provides atomicity and durability properties to those clients that request them... RabbitMQ enables developers of messaging solutions to take advantage of not just AMQP but also one of the most proven systems on the planet. The Open Telecom Platform (OTP) is used by multiple telecommunications companies to manage switching exchanges for voice calls, VoIP and now video. These systems are designed to never go down and to handle truly vast user loads. And because the systems cannot be taken offline, they have to be very flexible; for instance, it must be possible to 'hot deploy' features and fixes on the fly whilst managing a consistent user SLA... Messages that are published in persistent mode are logged to disk for durability. If the server is restarted, the system ensures that received persistent messages are not lost. The transactional part of the protocol provides the final piece of the puzzle, by allowing the server to communicate its definite receipt of a set of published messages...
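
[Note: The toy below illustrates the two properties the FAQ describes: messages logged to disk survive a restart, and a commit tells the publisher that a whole set of messages was received. It is a self-contained sketch under simplifying assumptions, not RabbitMQ's implementation and not an AMQP client API; the ToyBroker class, its methods, and the log file name are invented for illustration.]

```python
import json
import os
import tempfile


class ToyBroker:
    """Buffers published messages and makes them durable on commit."""

    def __init__(self, log_path):
        self.log_path = log_path
        self._pending = []

    def publish(self, body):
        # Nothing is durable yet; the message only joins the open transaction.
        self._pending.append(body)

    def commit(self):
        """Append pending messages to the log, force them to disk, and
        return how many were written (the acknowledgement)."""
        with open(self.log_path, "a", encoding="utf-8") as log:
            for body in self._pending:
                log.write(json.dumps(body) + "\n")
            log.flush()
            os.fsync(log.fileno())
        count = len(self._pending)
        self._pending.clear()
        return count

    def recover(self):
        """Read back every committed message, as a restarted server would."""
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path, encoding="utf-8") as log:
            return [json.loads(line) for line in log]


path = os.path.join(tempfile.mkdtemp(), "messages.log")
broker = ToyBroker(path)
broker.publish("hello")
broker.publish("world")
acked = broker.commit()

# A fresh instance stands in for the server after a restart.
recovered = ToyBroker(path).recover()
```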

See also: the RabbitMQ FAQ document

Searching Mailing Lists with MarkMail
Brian d foy, use Perl; List

"A couple of months ago, MarkLogic imported a bunch of Perl mailing lists into their MarkMail service. After only a couple of minutes playing with the service, I really liked it. Instead of making a query, getting results, then trying again, MarkMail lets me start with a broad topic and refine the search by looking at intermediate results. I wanted to find out more. For 'The Perl Review', I interviewed Jason Hunter, the Principal Technologist at Mark Logic, to find out more. I also made a screencast of me using the service, as well as pounding my laptop with a mallet. You'll see that Jason listed some corpus sizes for different subjects. Although Perl has half a million messages, other subjects are huge too. We can boost Perl's numbers by getting more lists into MarkMail. Although MarkMail initially imported 75 of the Perl lists, they want to import more. In the interview, Jason has instructions on what to do. Basically, you write to them and tell them to import your list. If you have dead, historical lists, they can import those archives too. Easy peasy." As to the (short) video: "Brian d foy shows how easy it is to search the Perl mailing lists using MarkMail. Unlike other searches you may have done, MarkMail lets you start with a broad topic and incrementally refine the search based on intermediate results. The video is an independent production and is not associated with MarkLogic or MarkMail. We just really like the service." [Note: 'MarkLogic Server provides role- and task-aware delivery of content thanks to its full support of XML and XQuery. Unlike traditional relational database and search solutions, it leverages both the structure and text of XML content. This lets your products and applications deliver content at the granularity your customers require, via a range of web services and interfaces including RSS, REST, and SOAP.']

See also: the video

Beyond XML and JSON: YAML for Java Developers
Jacek Furmankiewicz, DevX.com

Despite all the buzz generated by dynamic languages (Ruby, Groovy, Python, etc.) and their related frameworks (such as Ruby on Rails), the vast majority of Java developers reading this article deal mostly with pure Java at their day jobs and will continue to do so for many years to come. However, that doesn't mean that they can't learn something from the new kids on the block and add a new tool to their arsenals. This article introduces the YAML (short for YAML Ain't Markup Language) file format (popularized by the Ruby on Rails framework, which uses YAML for all of its configuration files) and shows how it differs from XML and JSON. It goes on to examine YAML's advantages and drawbacks... As you can plainly see from the examples [presented], YAML is noticeably less verbose than XML. Most of a YAML file's content is the actual data, not endless lists of opening and closing tags, which themselves are often larger than the data they describe. As such, YAML is much better suited for any sort of data file that you may need to maintain by hand. On the downside, YAML does not provide the concept of a schema or DTD, so there is no way to verify whether the format of the file is what you expected. XML's verbosity has its costs, but the overall maturity of that format provides a lot of extra tools for validation that YAML does not have (yet). JSON is perfect for any data that is geared towards efficiency and reducing file size, because it wastes almost no space on whitespace or closing tags. However, as the content of a JSON file increases in complexity, it descends into closing-bracket hell. This is most painfully visible in JavaFX code (which is based around JSON). A UI structure that is any more complex results in a data file that becomes nearly incomprehensibly complex towards the end.
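
[Note: The article's own examples are not reproduced here. The sketch below renders one small record in all three formats so the difference in verbosity can be seen side by side. The record and the helper names to_xml and to_yaml are invented for illustration, and to_yaml is a toy emitter that handles only a flat mapping whose values are scalars or lists of scalars; real YAML work would normally use a YAML library.]

```python
import json
import xml.etree.ElementTree as ET

# Invented sample record.
record = {"name": "Ada", "role": "developer", "languages": ["Java", "Ruby"]}


def to_xml(data):
    """Serialize the record as XML under a <person> root element."""
    root = ET.Element("person")
    for key, value in data.items():
        if isinstance(value, list):
            parent = ET.SubElement(root, key)
            for item in value:
                ET.SubElement(parent, "item").text = str(item)
        else:
            ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_yaml(data):
    """Toy emitter: flat mapping with scalar or list-of-scalar values only."""
    lines = []
    for key, value in data.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)


xml_text = to_xml(record)
json_text = json.dumps(record)
yaml_text = to_yaml(record)

for label, text in (("XML", xml_text), ("JSON", json_text), ("YAML", yaml_text)):
    print(f"--- {label} ({len(text)} characters)")
    print(text)
```

For this record the YAML rendering is the shortest and the XML the longest, in line with the article's observation, though a single small record says nothing about validation support or behavior on deeply nested data.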

Selected from the Cover Pages, by Robin Cover

Balisage 2008 Conference in Montreal Continues Extreme Markup Tradition

"Balisage 2008: The Markup Conference" continues the popular Montreal conference series (formerly "Extreme Markup Languages") under a new title "Balisage." The word Balisage in Montreal French means "markup", or "electronic text encoding, especially in the form of markup such as XML or HTML..." While the conference name has changed, the conference is still "Extreme" in its historic sense, and is predicated on a community conviction that "There is nothing so practical as a good theory". The Balisage 2008 Conference organizers have published the complete program listings for the main conference (August 12 - 15, 2008) and for the "International Symposium on Versioning XML Vocabularies and Systems" on August 11, 2008. The people involved with Balisage are a different breed, which accounts for the program's attractiveness to regular attendees—some seeking refuge from commercialism felt in the typical industry-driven conferences. Listings for the Balisage Conference Committee, Advisory Board, and speakers reveal what the web site declares: "The people making Balisage include markup theoreticians, practitioners, data modelers, developers, and aficionados. We work as software developers, academics, librarians, system architects, lexicographers, integrators, archivists, document managers, standards developers, and programmers." Balisage 2008 Conference topics include languages and processes for manipulating XML, the Semantic Web and semantically-based document markup, resource-oriented architectures, ontology design, schema mashups, constraint management, real-time generation of topic maps, secure publishing for social networks, managing overlapping annotations over the same primary data, the social limitation of interoperability in digital libraries, implementation of XSD 1.1 conditional-type assignment, and a host of others.

