Cover Pages: XML Daily Newslink: Tuesday, 21 August 2007

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
EDS http://www.eds.com

Headlines

Internet Security Glossary, Version 2
Enhancing Search and Browse Using Automated Clustering of Subject Metadata
Instant Message Cancellation Suggestion
Companies Still Seeking ROI from SOA
XForms, XML Schema, and ROX/x2o
Service Platforms Emerge as the Foundation for SOA
SOA for the Business Developer: Concepts, BPEL, and SCA
Dereferencing HTTP URIs
E-Voting Predicament: Not-so-secret Ballots

Internet Security Glossary, Version 2
Robert W. Shirey (ed), IETF RFC

The Internet Engineering Task Force (IETF) has announced the publication of a new RFC (Request for Comments) document in the online RFC libraries. "Internet Security Glossary, Version 2" (FYI #0036, RFC #4949) provides definitions, abbreviations, and explanations of terminology for information system security. Its 334 pages of entries offer recommendations to improve the comprehensibility of written material generated in the IETF Internet Standards Process documented in RFC 2026. The recommendations follow the principles that such writing should (a) use the same term or definition whenever the same concept is mentioned; (b) use terms in their plainest, dictionary sense; (c) use terms that are already well established in open publications; and (d) avoid terms that favor either a particular vendor or a particular technology or mechanism over other competing techniques that already exist or could be developed. Each entry is preceded by a character (I, N, O, or D) in parentheses that indicates the type of definition: "I" for a RECOMMENDED term or definition of Internet origin; "N" for one that is RECOMMENDED but not of Internet origin; "O" for a term or definition that is NOT recommended for use in Internet documents (IDOCs) but that authors of such documents should know about; and "D" for a term or definition that is deprecated and SHOULD NOT be used in Internet documents. This revised Glossary is an extensive reference that should help the Internet community improve the clarity of documentation and discussion in an important area of Internet technology. The IETF has not taken a formal position either for or against the recommendations made by this Glossary, and the use of RFC 2119 language (e.g., SHOULD NOT) in the Glossary must be understood as unofficial. In other words, the usage rules, wording interpretations, and other recommendations that the Glossary offers are the personal opinions of the Glossary's author. Readers must judge for themselves whether to follow his recommendations, based on their own knowledge combined with the reasoning presented in the Glossary.

See also: XML and Security Standards
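The four definition-type markers lend themselves to simple tooling, for example a checker that flags deprecated terms in a draft. The Python sketch below maps each marker to the meaning given in the summary. It assumes, hypothetically, that a definition line begins (after optional indentation) with the marker in parentheses; the function names `classify` and `recommended` are illustrative and not part of RFC 4949.

```python
import re

# Meanings of the four definition-type markers, as summarized above.
MARKERS = {
    "I": "RECOMMENDED, of Internet origin",
    "N": "RECOMMENDED, not of Internet origin",
    "O": "NOT recommended for IDOCs, but worth knowing about",
    "D": "deprecated; SHOULD NOT be used",
}

# Assumed (hypothetical) layout: optional indentation, then one marker
# letter in parentheses, then the definition text.
MARKER_RE = re.compile(r"^\s*\(([INOD])\)\s+(.*)$")


def classify(line):
    """Return (marker, meaning, definition_text), or None if no marker."""
    match = MARKER_RE.match(line)
    if match is None:
        return None
    marker, text = match.group(1), match.group(2)
    return marker, MARKERS[marker], text


def recommended(line):
    """True only for "I" and "N" definitions, the RECOMMENDED ones."""
    result = classify(line)
    return result is not None and result[0] in ("I", "N")


print(classify("   (N) Placeholder definition text."))
# ('N', 'RECOMMENDED, not of Internet origin', 'Placeholder definition text.')
```

Lines without a marker (such as a term heading) return `None`, so a caller can treat them as context rather than as definitions.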

Enhancing Search and Browse Using Automated Clustering of Subject Metadata
Kat Hagedorn, Suzanne Chapman, David Newman; D-Lib Magazine

The puzzle of navigating the Web's many online information resources often keeps end users from accessing those resources effectively and efficiently. Clustering resources into appropriate subject-based groupings may help alleviate these difficulties, but will it work with heterogeneous material? The University of Michigan and the University of California, Irvine joined forces to test automatically enhancing metadata records using the Topic Modeling algorithm on the varied OAIster corpus. We created labels for the resulting clusters of metadata records, matched the clusters to an in-house classification system, and developed a prototype that would showcase methods for search and retrieval using the enhanced records. Clustering means taking the words and phrases that make up metadata records and using them to gather the records into semantically meaningful groupings. For instance, a record about the feeding and care of cats can be grouped with a record about the feeding and care of hamsters. To achieve this, we used an automated clustering technique called Topic Modeling, developed at UCI. The power of this technique lies in its ability to cluster effectively using far less text than a full-text indexing algorithm requires. Additionally, while others have tested similar statistical clustering techniques on a repository-by-repository basis, we chose to test this algorithm on a large, heterogeneous corpus of metadata. The Topic Model is an implementation of Latent Dirichlet Allocation, which is a probabilistic version of Latent Semantic Indexing. To create semantically meaningful clusters, the algorithm automatically "learns" a set of topics that describe a collection of records; that is, without direct human supervision it can discover meaningful clusters of records. The input to the Topic Modeling algorithm consisted of the records from 668 OAIster repositories. The contents of the Dublin Core title, subject, and description fields were tokenized (made lowercase, punctuation removed, simple stemming applied). Additionally, we augmented a standard English stopword list (the, and, that, with, etc.) with words that had little topic value but occur frequently in metadata records, such as volume, keyword, library, and copyright. The final representation of our processed OAIster collection included 7.5 million records, a 94,000-word vocabulary, and a total of 290 million word occurrences. Results indicated that while the algorithm was somewhat time-intensive to run and using a local classification scheme had its drawbacks, precise clustering of records was achieved and the prototype interface proved that faceted classification could be powerful in helping end users find resources.
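The preprocessing step described above (lowercasing, punctuation removal, simple stemming, and an augmented stopword list applied to the Dublin Core title, subject, and description fields) can be sketched in Python as follows. This is not the UM/UCI code and it does not implement Latent Dirichlet Allocation itself; it only shows how records become the token streams, vocabulary, and word-occurrence totals that a topic model consumes. The stemmer and the short stopword list are assumptions, since the article does not specify them, and the sample records are hypothetical.

```python
import re
from collections import Counter

# Sample stopword list: a few standard English stopwords plus the
# metadata-specific ones the article mentions. The real list was larger.
STOPWORDS = {
    "the", "and", "that", "with", "of", "a", "an", "in", "for", "on",
    "volume", "keyword", "library", "copyright",
}

DC_FIELDS = ("title", "subject", "description")


def simple_stem(word):
    """Deliberately crude suffix stripping (an assumption: the article
    does not say which stemmer was used)."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def tokenize(record):
    """Lowercase, strip punctuation, drop stopwords (checked before
    stemming), and stem the title, subject, and description fields."""
    text = " ".join(record.get(f, "") for f in DC_FIELDS).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return [simple_stem(t) for t in text.split() if t not in STOPWORDS]


def collection_stats(records):
    """Return (record count, vocabulary size, total word occurrences)."""
    counts = Counter()
    for record in records:
        counts.update(tokenize(record))
    return len(records), len(counts), sum(counts.values())


records = [  # hypothetical sample records
    {"title": "The feeding and care of cats", "subject": "Cats; Pets"},
    {"title": "Feeding and care of hamsters",
     "description": "Copyright, Library edition."},
]
print(tokenize(records[0]))       # ['feed', 'care', 'cat', 'cat', 'pet']
print(collection_stats(records))  # (2, 6, 9)
```

The two sample records now share the tokens "feed" and "care", which is the kind of overlap a topic model can exploit to place the cat and hamster records in the same cluster.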

Instant Message Cancellation Suggestion
Eric W. Burger (ed), IETF Internet Draft

IETF announced the availability of a new "Instant Message Cancellation Suggestion" I-D from the online Internet-Drafts directories. The document describes a mechanism for a user agent client to request that a previously sent Instant Message be marked as cancelled. The user agent server, or other proxies or relay points in the network, can then effectively remove the first message, giving the user the impression that it was cancelled. This document describes the utility of such a function, as well as the futility of attempting to create a true message cancellation feature. Some proprietary electronic mail systems provide mechanisms for canceling the delivery of a message after the user has sent it. Some people call this "recalling a message." Users have demanded this feature, most often to cancel messages sent in error, to cancel a message they wish to supersede with another message, or to cancel the future delivery of a message. Although the feature is useful, there are a number of practical problems in implementing it. The challenge is this: how to provide the functionality the market demands without over-promising to users, while maintaining the security, privacy, and non-repudiation of sent messages in a reliable manner. This document describes such a mechanism for Instant Message (IM) systems. The client creates an IM that includes the Message-ID header to uniquely identify the message. At some later point, the client decides he does not want the system to deliver the message, so he sends a message cancellation request that references the Message-ID of the first message. The SIP Extension for Instant Messaging document describes the use of SIP for carrying instant messages; likewise, the XMPP IM and Presence document describes the use of XMPP for carrying instant messages. Since this mechanism provides a cancellation request at the application level, it is transport-independent. Section 4 of the memo, "Formal Syntax," supplies the RELAX NG schema for the imCancel XML type. Section 6.1, "Schema and Namespace," defines the namespace "urn:ietf:params:xml:ns:imCancel". Section 6.2, "Registration of MIME Media Type," registers the MIME media type 'message/im-cancel+xml', whose charset parameter is the same as that of 'application/xml', with encoding considerations as specified in RFC 3023.
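The flow described above (the client references the first message's Message-ID; a relay removes the message only if it has not yet been delivered) can be sketched in Python. The namespace comes from Section 6.1, but the element names `imCancel` and `message-id`, the `Relay` class, and its status strings are hypothetical placeholders: the draft's RELAX NG schema defines the real vocabulary. The "too-late" branch illustrates why the draft calls true cancellation futile: once the recipient has the message, a request can only be a suggestion.

```python
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:imCancel"  # namespace from Section 6.1

# Hypothetical element names; the draft's RELAX NG schema defines the real ones.
ROOT_TAG = f"{{{NS}}}imCancel"
REF_TAG = f"{{{NS}}}message-id"


def build_cancel_request(message_id):
    """Client side: a request that references the first message's Message-ID."""
    root = ET.Element(ROOT_TAG)
    ET.SubElement(root, REF_TAG).text = message_id
    return ET.tostring(root, encoding="unicode")


class Relay:
    """Hypothetical store-and-forward relay holding messages until delivery."""

    def __init__(self):
        self.pending = {}       # Message-ID -> message body
        self.delivered = set()  # Message-IDs already handed to the recipient

    def accept(self, message_id, body):
        self.pending[message_id] = body

    def deliver(self, message_id):
        self.delivered.add(message_id)
        return self.pending.pop(message_id)

    def handle_cancel(self, request_xml):
        """Honor the request only if the message is still undelivered."""
        message_id = ET.fromstring(request_xml).findtext(REF_TAG)
        if message_id in self.pending:
            del self.pending[message_id]
            return "cancelled"
        if message_id in self.delivered:
            return "too-late"  # the recipient already has it
        return "unknown"


relay = Relay()
relay.accept("<m1@example.com>", "Lunch at noon?")
relay.accept("<m2@example.com>", "Never mind")
relay.deliver("<m2@example.com>")
print(relay.handle_cancel(build_cancel_request("<m1@example.com>")))  # cancelled
print(relay.handle_cancel(build_cancel_request("<m2@example.com>")))  # too-late
```

Because the request is just an XML document keyed on the Message-ID, the same sketch applies whether it travels over SIP or XMPP, matching the draft's point about transport independence.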

Companies Still Seeking ROI from SOA
Heather Havenstein, Computerworld

Only 37% of companies using service-oriented architecture (SOA) technology have seen it result in a positive return on investment, according to a report released Monday by Nucleus Research Inc. David O'Connell, an analyst at Wellesley, Mass.-based Nucleus, said that while corporate SOA projects could provide strong ROI for companies, most efforts today "seem to get stranded in little local pockets" of organizations. "SOA is rather narrowly adopted, and it tends to be adopted on [an] as-needed basis," O'Connell said. "It is easy to get ROI on SOA when you have a small handful of services for a narrowly defined set of projects. When you want to go enterprisewide, that is a large jump up. People are not getting over that hump." The report noted that fewer than four in 10 corporate developers use SOA tools and technologies despite their promise of cutting application development costs. In fact, the report found that SOA affects only 27% of an average company's IT projects. The report also found that developers who did use SOA tools increased their productivity by 28%. Nucleus suggested that companies train developers and acquire repository and registry technology so that reusable services can be easily found. O'Connell said that because most SOA projects to date have been limited to corporate silos, investments in repository and registry technology have yet to be made. In fact, the report found that fewer than one in three companies now use repositories, registries, SOA competency centers, or other tools and technologies that can help broaden SOA adoption. On average, only 32% of published services are reused.

XForms, XML Schema, and ROX/x2o
Kurt Cagle, XML.com

This article begins with a fundamental question: "If I have an XML schema, is there any way that I can work with that schema to build forms for populating instances of that schema?" Over the years, I've seen a number of variations on this same question, and generally for a pretty good reason. It takes a lot of work to create a schema in the first place, but when you're done, what you end up with is something that seems as if it should be good for generating something: you have data type information, constraint information, enumerations, and enough other pieces that making forms from them would seem to be a cakewalk. However, the process is generally fraught with more land mines than you might expect... I announced the creation of a new Google Code project called ROX Server (now the x2o Object Publishing System); if you are interested in XForms generation and building support for robust XML forms solutions, then I'd ask that you check out the site and become a contributing member... Once the core system is completed, there are three additional areas of code development. The first is a system for building a formal workflow engine into [x2o]. This project is ongoing now and is built partially upon the Schematron core language. When done, this should make it possible to create, comment on, and approve or reject appropriate forms as part of an orchestrated workflow. The second area is the creation of libraries of standard packs that enable functionality for commonly used schemas. The intent here is to make ROX Server usable right out of the box for common tasks... My goal with x2o [ROX Server] is simple: I believe that as we move to an increasingly interactive Web 2.0, the effective processing of XML-based objects not only will happen, but must happen. Currently an incredible amount of time and money is spent trying to build and manage form-based systems; by providing a set of tools for building rich XForms, I see ROX Server and technologies like it freeing up budgets for other, more important (and more interesting) development projects. Update: x2o is an open source project established with the following goals: (1) create a system that uses XML schemas to generate XForms and related XML-based GUIs; (2) build a customization layer into this process so that the generated forms can be tailored to specific needs and intents; (3) establish a data publishing layer to simplify the creation, modification, submission, and processing of XML designed to work with the GUIs; and (4) establish a workflow management system for viewing and approving or rejecting records built via x2o. To do this, x2o works on top of two open source technologies, the eXist XML Database and the Mozilla Firefox XForms add-on, though implementations for other projects and vendors will be included in the near future.

See also: the Google Code web site
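To make the schema-to-forms question concrete, here is a deliberately naive Python sketch that walks a small XML Schema and emits XForms controls: an enumeration becomes an `xf:select1` with one `xf:item` per value, and anything else becomes an `xf:input`. This is not how ROX Server/x2o works; the sample schema and the function `schema_to_controls` are hypothetical. It also shows one of the land mines Cagle mentions: the `xs:date` type is silently ignored here, because data types belong in XForms model bindings that a fuller generator would have to build.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
XF = "http://www.w3.org/2002/xforms"
ET.register_namespace("xf", XF)

# Hypothetical sample schema: a string, a date, and an enumeration.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="customer" type="xs:string"/>
        <xs:element name="shipDate" type="xs:date"/>
        <xs:element name="priority">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="low"/>
              <xs:enumeration value="high"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def schema_to_controls(schema_text):
    """Naively map every element nested under the first global element to an
    XForms control. Types such as xs:date are ignored (no model binds)."""
    top = ET.fromstring(schema_text).find(f"{{{XS}}}element")
    group = ET.Element(f"{{{XF}}}group", ref=top.get("name"))
    for el in top.iter(f"{{{XS}}}element"):
        if el is top:
            continue
        name = el.get("name")
        choices = [e.get("value") for e in el.iter(f"{{{XS}}}enumeration")]
        tag = "select1" if choices else "input"
        control = ET.SubElement(group, f"{{{XF}}}{tag}", ref=name)
        ET.SubElement(control, f"{{{XF}}}label").text = name
        for value in choices:  # one selectable item per enumeration value
            item = ET.SubElement(control, f"{{{XF}}}item")
            ET.SubElement(item, f"{{{XF}}}label").text = value
            ET.SubElement(item, f"{{{XF}}}value").text = value
    return group


print(ET.tostring(schema_to_controls(SCHEMA), encoding="unicode"))
```

Nested complex types, repeating elements, and constraints such as patterns would all need extra handling, which is where a real generator's customization layer comes in.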

Service Platforms Emerge as the Foundation for SOA
Greg Pavlik and Demed L'Her, SOA World Magazine

Enterprise software architectures are shifting from collections of applications designed around user interfaces to assemblies of reusable services. The first step in the evolution toward service-based applications was the definition and publication of services encapsulating discrete business functions. The second wave combined services in point-to-point fashion, communicating over protocols designed for system interoperability. The next wave of SOA adoption will focus on enabling composite service definitions that combine domain-specific languages for process orchestration, XML transformations, message routing, and business rules. In this article, we'll look at how SOA platforms are evolving to meet these requirements. Specifically, we'll examine three related themes: (1) the nature and role of service platforms designed to host composite services and complex business processes; (2) the changes in how applications are described and designed in the new SOA platforms; and (3) the importance of key standards in simplifying and commoditizing the integration of services and applications. In particular, we'll look at a series of emerging industry standards that describe how to design composite services built with many different implementation languages and protocols. These standards are defined in the Service Component Architecture (SCA) framework.
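As a rough illustration of a composite service that combines an XML transformation, a business rule, message routing, and orchestration, the Python sketch below wires four plain functions together. It does not use SCA or any SOA platform API; the component names, the order format, and the 1000 approval threshold are all hypothetical. In an SCA-style platform each step would be a separately configured component, and the wiring in `order_composite` would live in a declarative composite definition rather than in code.

```python
import xml.etree.ElementTree as ET

# Hypothetical components of one composite service. In an SCA-style platform
# each would be a separately configured component; here they are functions.


def transform(order_xml):
    """XML transformation: map the incoming order document to a dict."""
    root = ET.fromstring(order_xml)
    return {"id": root.get("id"), "amount": float(root.findtext("amount"))}


def apply_rules(order):
    """Business rule (hypothetical threshold): large orders need approval."""
    return {**order, "needs_approval": order["amount"] > 1000}


def route(order):
    """Message routing: pick the downstream service from message content."""
    return "approval-service" if order["needs_approval"] else "fulfillment-service"


def order_composite(order_xml):
    """Orchestration: run transform, rules, and routing in sequence."""
    order = apply_rules(transform(order_xml))
    return route(order), order


print(order_composite('<order id="42"><amount>1500</amount></order>')[0])
# approval-service
```

Keeping each concern in its own component is what lets a platform swap, say, the rule engine without touching the transformation or routing steps.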

SOA for the Business Developer: Concepts, BPEL, and SCA
Ben Margolis and Joseph Sharpe, MC Press Book Description

In June 2007, MC Press published "SOA for the Business Developer" in its series 'Master SOA Best Practices'. "Service-Oriented Architecture (SOA) is a way of organizing software. If your company's development projects adhere to the principles of SOA, the outcome will be an inventory of modular units called 'services', which allow for a quick response to change. This book tells the SOA story in a simple, straightforward manner that will help you understand not only the buzzwords and benefits, but also the technologies that underlie SOA: XML, WSDL, SOAP, XPath, BPEL, SCA, and SDO. And through it all, the authors provide business examples and illustrations, giving a practical meaning to abstract ideas. A sample Chapter 2 'Services' is available online. The book gives a detailed overview of Extensible Markup Language (XML), including namespaces and XML schema. It describes Web Services Description Language (WSDL) and SOAP, and gives a tutorial on XML Path Language (XPath). The book provides comprehensive details on BPEL 2.0, a language that coordinates services and whose preceding version is already in numerous products. It introduces Service Component Architecture (SCA), a proposed standard for composing and deploying applications. You're sure to hear more of SCA, which is sponsored by 18 companies, including IBM, Oracle, and Sun Microsystems. It also introduces Service Data Objects (SDO), a proposed standard for representing data in a single way, even if the data comes from different types of data sources."

See also: the Chapter 2 excerpt

Dereferencing HTTP URIs
Rhys Lewis (ed), Draft TAG Finding

"The World Wide Web (Web) is an information space in which the items of interest are known as resources. Resources are identified by Uniform Resource Identifiers (URI). The URIs used in the Web are based on the HTTP scheme. The general syntax of URIs is described in detail in "RFC 3986: Uniform Resource Identifier (URI): Generic Syntax", which also gives many examples. URIs are globally unique within the Web. The way in which URIs are structured provides an important contribution to the ability of the Web to scale to support very large numbers of uniquely identified resources. In particular, the structure of URIs supports delegation of the authority for their allocation. Many resources that have a Web presence are actually documents. Documents provide physical representations of bodies of information. A document might, for example, provide a description of the planet Mars. URIs allow documents to be uniquely identified as resources in the Web. In addition, their Web presence allows the information they embody to be accessed and retrieved. Normally, such retrieval is direct. The resource itself consists of a body of information that is amenable to transmission across the Web within suitable messages. In general, we call such resources information resources because their essence is information... Documents are not the only kind of information resource. The information associated with some resources is provided by computing systems. These perform work when the Web presence of the resource is accessed. Some systems might be able to retrieve data from sources that do not themselves have a Web presence. They may also perform computations in order to assimilate the information that will ultimately be returned to the requester in a suitable representation."
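The finding's point that URI structure delegates authority can be seen by splitting an http URI into its RFC 3986 components: the authority names the server responsible for answering, and only the path and query travel in the request. The Python sketch below builds (without sending) the HTTP/1.1 GET a client would issue to dereference such a URI. The function name and the example URI are hypothetical, and a real client would also handle redirects, content negotiation, and IPv6 literals more carefully.

```python
from urllib.parse import urlsplit


def dereference_request(uri):
    """Build, without sending, the HTTP/1.1 GET that dereferences an http URI."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch handles only the http scheme")
    # The authority (host[:port]) names the server delegated to answer;
    # any userinfo before "@" is not part of the Host header.
    authority = parts.netloc.rpartition("@")[2]
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The fragment (after "#") is never sent; it is resolved by the client.
    return f"GET {target} HTTP/1.1\r\nHost: {authority}\r\n\r\n"


print(dereference_request("http://example.org/planets/mars?lang=en#moons"))
```

Whoever controls `example.org` decides what the path `/planets/mars` identifies, which is the delegation that lets the Web scale without a central naming registry.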

E-Voting Predicament: Not-so-secret Ballots
Declan McCullagh, CNET News.com

Ohio's method of conducting elections with electronic voting machines appears to have created a true privacy nightmare for state residents: revealing who voted for which candidates. Two Ohio activists have discovered that e-voting machines made by Election Systems and Software (ES&S) and used across the country produce time-stamped paper trails that permit the reconstruction of an election's results, including matching voter names to their actual votes. Making a secret ballot less secret, of course, could permit vote selling and allow interest groups or family members to exert undue pressure on Ohio residents to vote a certain way. It's an especially pointed concern in Ohio, a traditional swing state in presidential elections that awarded George W. Bush a narrow victory over John Kerry three years ago. Ohio law permits anyone to walk into a county election office and obtain two crucial documents: a list of voters in the order they voted, and a time-stamped list of the actual votes. "We simply take the two pieces of paper together, merge them, and then we have which voter voted and in which way," said James Moyer, a longtime privacy activist and poll worker who lives in Columbus, Ohio. Once the two documents are merged, it's easy enough to infer that the first voter who signed in very likely cast the first vote, and so on. ES&S machines are used in about 38 states, according to the Election Reform Information Project, created by the Pew Center on the States. Of those states, Arkansas, Iowa, North Carolina, Ohio, and West Virginia are among those using ES&S iVotronic machines with paper audit trails.
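The merge Moyer describes needs nothing more than sorting and pairing. The Python sketch below orders the time-stamped votes chronologically and zips them with the sign-in list, so the n-th voter is matched with the n-th vote. The voter names, times, and choices are hypothetical, and timestamps are assumed to be zero-padded "HH:MM:SS" strings so that string order equals chronological order.

```python
def link_votes(sign_in_order, timestamped_votes):
    """Pair the n-th voter to sign in with the n-th vote by timestamp.

    sign_in_order: voter names in poll-book order.
    timestamped_votes: (timestamp, choice) pairs from the paper trail;
    timestamps are zero-padded "HH:MM:SS" strings, so they sort correctly.
    """
    by_time = sorted(timestamped_votes, key=lambda vote: vote[0])
    return list(zip(sign_in_order, (choice for _, choice in by_time)))


voters = ["Voter A", "Voter B", "Voter C"]  # hypothetical poll book
votes = [("09:02:10", "Candidate Y"),       # hypothetical paper trail
         ("08:58:41", "Candidate X"),
         ("09:10:05", "Candidate X")]
print(link_votes(voters, votes))
# [('Voter A', 'Candidate X'), ('Voter B', 'Candidate Y'), ('Voter C', 'Candidate X')]
```

The match is only "very likely" correct: a voter who signs in but never casts a ballot shifts every later pairing, which is why the article hedges. Even so, the two public documents together leak far more than either does alone.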

