The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: May 28, 2002
XML Articles and Papers. July - September 2001.

XML General Articles and Papers: Surveys, Overviews, Presentations, Introductions, Announcements

References to general and technical publications on XML/XSL/XLink are also available in several other collections:

The following list of articles and papers on XML represents a mixed collection of references: articles in professional journals, slide sets from presentations, press releases, articles in trade magazines, Usenet News postings, etc. Some are from experts and some are not; some are refereed and others are not; some are semi-technical and others are popular; some contain errors and others don't. Discretion is strongly advised. The articles are listed approximately in the reverse chronological order of their appearance. Publications covering specific XML applications may be referenced in the dedicated sections rather than in the following listing.

September 2001

  • [September 28, 2001] "RDF/Topic Maps: late/lazy reification vs. early/preemptive reification." By Steven R. Newcomb. Posting 2001-09-27. "For me, at least, the shortest, most compelling and cogent demonstration of a certain critical difference between Topic Maps and RDF was Michael Sperberg-McQueen's wrap-up keynote at the Extreme Markup Languages Conference last August. Michael brought colored ribbons and other paraphernalia to the podium, in order to illustrate his words... In the past, I myself have considered RDF as the competitor of Topic Maps. Happily, I was wrong -- at least in fundamental technical terms. Indeed, I now believe that if there were no RDF, the Topic Maps camp would have to invent something like it in order to make the Maps paradigm predictably comprehensible by the programmers who are pioneering the development of the Internet. There are other interesting comparisons to be made between RDF and Topic Maps, but ever since Michael's demonstration of the difference between early vs. late (preemptive vs. lazy) reification, I have been meaning to document both the difference and the demonstration..." See: (1) "Resource Description Framework (RDF)" and (2) "(XML) Topic Maps."

  • [September 24, 2001] "XML Schema Quick Reference Cards." Prepared by Danny Vint. See: (1) XML Schema - Structures Quick Reference Card, and (2) XML Schema - Data Types Quick Reference Card. XML-DEV posting: "I've just uploaded 2 quick reference cards that I built for the XML Schema Data types and Structures specifications. These cards are available in PDF format. If you download and print them, realize that they are set up for 8.5 x 14 paper. When you print these files, just set 'Fit to page' and Landscape mode to get a properly scaled copy of these documents. I'm also in the process of moving my 'XML Family EBNF Productions Help' to this new site as well as updating the content. This isn't completed; I'm currently showing the older version that I have previously published..." For schema description and references, see "XML Schemas."

  • [September 21, 2001] "Modeling XML Vocabularies with UML: Part II." By Dave Carlson. From September 19, 2001. "Mapping UML Models to XML Schema: This is where the rubber meets the road when using UML in the development of XML schemas. A primary goal guiding the specification of this mapping is to allow sufficient flexibility to encompass most schema design requirements, while retaining a smooth transition from the conceptual vocabulary model to its detailed design and generation. A related goal is to allow a valid XML schema to be automatically generated from any UML class diagram, even if the modeller has no familiarity with the XML schema syntax. Having this ability enables a rapid development process and supports reuse of the model vocabularies in several different deployment languages or environments because the core model is not overly specialized to XML... The default mapping rules described in this article can be used to generate a complete XML schema from any UML class diagram. This might be a pre-existing application model that now must be deployed within an XML web services architecture, or it might be a new XML vocabulary model intended as a B2B data interchange standard. In either case, the default schema provides a usable first iteration that can be immediately used in an initial application deployment, although it may require refinement to meet other architectural and design requirements. The first article in this series presented a process flow for schema design that emphasized the distinction between designing for data-oriented applications versus text-oriented applications. The default mapping rules are often sufficient for data-oriented applications. In fact, these defaults are aligned with the OMG's XML Metadata Interchange (XMI) version 2.0 specification for using XML as a model interchange format. This approach is also well aligned with the OMG's new initiative for Model Driven Architecture (MDA). 
Text-oriented schemas, and any other schema that might be authored by humans and used as content for HTML portals, often must be refined to simplify the XML document structure. For example, many schema designers eliminate the wrapper elements corresponding to an association role name (but this also prevents use of the XSD <all> model group). This refinement and many others can be specified in a vocabulary model by setting a new default parameter for one UML package, which then applies to all of its contained classes..." See: (1) Part I of Carlson's article; (2) "Conceptual Modeling and Markup Languages"; (3) "XML Schemas."
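
The default class-to-schema mapping described in the entry above can be illustrated with a small Python sketch. This is not Carlson's mapping and not XMI 2.0; it only assumes that each UML attribute becomes a child element and that each association role name becomes a wrapper element, which is the wrapper the abstract says many designers remove. The function name `class_to_complex_type`, the `TYPE_MAP` table, and the `wrap_roles` switch are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize schema elements with the usual xs: prefix

# Hypothetical mapping from UML primitive types to XSD built-in types.
TYPE_MAP = {"String": "xs:string", "Integer": "xs:int", "Boolean": "xs:boolean"}


def class_to_complex_type(name, attributes, associations=(), wrap_roles=True):
    """Build an xs:complexType element for one UML class.

    attributes:   iterable of (attribute_name, uml_type)
    associations: iterable of (role_name, target_class, many)
    wrap_roles:   keep (True) or drop (False) the role-name wrapper element
    """
    ctype = ET.Element(f"{{{XS}}}complexType", name=name)
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
    for attr_name, uml_type in attributes:
        ET.SubElement(seq, f"{{{XS}}}element", name=attr_name, type=TYPE_MAP[uml_type])
    for role, target, many in associations:
        parent = seq
        if wrap_roles:
            # The wrapper element is named after the association role.
            wrapper = ET.SubElement(seq, f"{{{XS}}}element", name=role)
            inner_type = ET.SubElement(wrapper, f"{{{XS}}}complexType")
            parent = ET.SubElement(inner_type, f"{{{XS}}}sequence")
        ET.SubElement(parent, f"{{{XS}}}element", name=target, type=target,
                      maxOccurs="unbounded" if many else "1")
    return ctype


if __name__ == "__main__":
    order = class_to_complex_type(
        "PurchaseOrder",
        [("orderDate", "String"), ("total", "Integer")],
        [("items", "Item", True)],
    )
    print(ET.tostring(order, encoding="unicode"))
```

Calling the function with `wrap_roles=False` places the `Item` elements directly in the class's sequence, which corresponds to the refinement the abstract mentions for text-oriented schemas.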

  • [September 21, 2001] "Being Too Generous." By Leigh Dodds. From September 19, 2001. ['Microsoft's recent release of Internet Explorer 6 has already attracted criticism for its deprecation of the Netscape plugin API. This week in his XML-Deviant column Leigh Dodds takes a look at IE6's XML support, and relates how community criticism has been met with a positive response from Microsoft.'] "This week the XML-Deviant looks at some recent community criticism over the XML support in Internet Explorer, which has been resolved with some promising feedback from Microsoft. Despite its many and varied successes XML has still not achieved its aim of being 'SGML on the Web'. At least not within the most popular viewport of the Web, the browser. HTML is still the Web's lingua franca despite the desire of many in the XML community to see it be deprecated in favor of XHTML or CSS-styled XML documents. In other environments XML has been a runaway success, yet it is still having trouble gaining a foothold in user agents. Arguably RSS is the most successful XML format being displayed to users and processing even popular formats like SVG is handed off by browsers to optional plug-ins rather than being natively supported. There are a few reasons for this. The XML community has not made the effort it could to convince the web development community of the advantages of XML, leading to an image problem. Strong disagreements over the relative merits of XSLT and CSS have also displayed a lack of common vision for the role of XML in client-side document styling. There can be little doubt that the lack of good XML/XSLT/CSS support in recent browsers is the root cause of the problem. Which is ironic since the browser was instrumental in getting pointy-bracket parsers on millions of desktops around the world. Of course the situation is not completely bleak. XML processing capabilities are appearing in both major browsers. 
The added irony is that Internet Explorer appears to be leading the way, despite the fact that the most widely regarded tools in the XML toolkit are open source, and despite MS XML parser's baroque installation modes..."

  • [September 21, 2001] "Writing SAX Drivers for Non-XML Data." By Kip Hampton. From September 19, 2001. "In a previous column, we covered the basics of the Simple API for XML (SAX) and the modules that implement that interface in Perl. Over the course of the next two months we will move beyond these basic topics to look at two slightly more advanced ones: creating drivers that generate SAX events from non-XML sources and writing custom SAX filters. If you are not familiar with the way SAX works, please read High-Performance XML Parsing With SAX before proceeding. SAX is an event-driven API in which the contents of an XML document are accessed through callback subroutines that fire based on various XML parsing events (the beginning of an element, the end of an element, character data, etc.). For the purpose of this article, a SAX driver (sometimes called a SAX generator) can be understood to mean any Perl class that can generate these SAX events. In the most common case, a SAX driver acts as a proxy between an XML parser and the one or more handler classes written by the developer. The handler methods detailed in the SAX API are called as the parser makes its way through the document, thereby providing access to the contents of that XML document. In fact, this is precisely what SAX was designed for: to provide a simple means to access information stored in XML. As we will see, however, it is often handy to be able to generate these events from data sources other than XML documents... I'm certain that there are XML purists out there for whom this technique -- using a non-XML class to produce SAX event streams -- will seem like heresy. Indeed you do need to be a bit more careful when letting your own custom module stand in for an XML parser (for the reasons stated above), but, in my opinion, the benefits far outweigh the costs. 
Writing custom SAX drivers provides a predictable, memory-efficient, easy way to take advantage of Perl's advanced built-in data handling capabilities and vast collection of non-XML parsers and other data interfaces to create XML document streams..."
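
Hampton's article works in Perl; the same idea can be sketched with Python's standard `xml.sax` tools. The sketch below is not taken from the article. It assumes a CSV source whose column names are valid XML element names, and the function names `csv_to_sax` and `csv_to_xml` and the `rows`/`row` element names are hypothetical. The driver reads non-XML data and calls the handler methods a SAX parser would call, so any ordinary SAX handler can consume it.

```python
import csv
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def csv_to_sax(csv_text, handler, root="rows", row="row"):
    """Act as a SAX driver: read CSV text and fire SAX events on `handler`.

    Assumes the CSV header names are usable as XML element names.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    no_attrs = AttributesImpl({})  # none of the generated elements carry attributes
    handler.startDocument()
    handler.startElement(root, no_attrs)
    for record in reader:
        handler.startElement(row, no_attrs)
        for field, value in record.items():
            handler.startElement(field, no_attrs)
            handler.characters(value or "")  # short rows yield None values
            handler.endElement(field)
        handler.endElement(row)
    handler.endElement(root)
    handler.endDocument()


def csv_to_xml(csv_text):
    """Feed the driver's events to the stock XMLGenerator handler, which writes XML text."""
    out = io.StringIO()
    csv_to_sax(csv_text, XMLGenerator(out, encoding="utf-8"))
    return out.getvalue()


if __name__ == "__main__":
    print(csv_to_xml("name,qty\nwidget,3\nnuts & bolts,10\n"))
```

Because the driver only talks to the handler interface, `XMLGenerator` could be replaced by any other `ContentHandler`, for example a filter that counts rows, without changing the CSV-reading code.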

  • [September 21, 2001] "Tools for Dynamic Web Sites: ASP vs. PHP vs. ASP.NET." By Hans Hartman. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 12 (September 17, 2001). ['Are there practical reasons for choosing a scripting language? Or is it just a matter of taste? A developer who's built commercial sites using PHP and ASP describes the pros and cons of each. He also looks ahead to next-generation tools, ASP.NET and PHP version 5.'] "Creating database-driven Web sites used to be complex and time-consuming. Fortunately, several server-scripting tools were invented to make it easier for publishers to generate Web content automatically from their databases instead of manually coding it in HTML. Today, the most popular of these tools are PHP and ASP. Soon, the new ASP.NET, which Microsoft is developing as the centerpiece of its dot-NET initiative, may also be important. In this article, we'll compare all three technologies... With their first iterations launched in the mid-nineties, PHP and ASP are both mature technologies for creating database-driven Web sites. Their feature sets are comparable, but differ in two areas. First, ASP is a commercial technology, supported by Microsoft and commercial third parties, whereas PHP is open-source technology, supported by the open-source community and Zend. ASP is somewhat easier to learn, whereas PHP enables developers to create object-oriented code and, by modifying the source files that other programmers have already written, to create highly tailored modules without undue work. Second, PHP runs on a multitude of servers and platforms. ASP is limited to the IIS server and Microsoft operating systems. ASP.NET, whose arrival in the market is imminent, promises to be a faster and more efficient environment than ASP, and possibly, PHP. In addition, ASP.NET makes it easy to create SOAP- and XML-based Web services. But it, too, is limited to Microsoft platforms. 
Will the advantages of ASP.NET be enough to convert PHP developers? We doubt it. There is strong loyalty in the open-source community to PHP and the Apache server platform, and there are equivalent -- albeit not as easy -- tools for creating reusable object code, XML and SOAP protocols. A more interesting question is whether ASP.NET will attract new users who have no previous commitments to a Web server and OS. We think it might; it should be especially attractive where connecting to partner sites through XML and SOAP is high on the wish list."

  • [September 21, 2001] "Corel heads down the cross-media path. But will Micrografx and SoftQuad acquisitions be enough of a suite? [Cross-Media Publishing.]" By [Seybold Staff - TSR]. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 11 (September 03, 2001), pages 3, 29. ['Corel, perhaps best known as a vendor of shrink-wrapped software, has purchased XML pioneer SoftQuad. Now Corel plans to use its acquired XML technology as part of a cross-media publishing system. Inside, we tell you its chances for success.'] "Reduced to a second-tier player in the desktop application markets, Corel is making a bid to rebound as a leader in the nascent market for cross-media creation tools. In the past two months, Corel has announced its intention to buy Micrografx and SoftQuad and to piece together a new suite of cross-media tools for manipulating text and graphics... Corel announced plans to acquire SoftQuad in a stock deal worth about $34 million. One of the few suppliers of a complete XML-based authoring tool, SoftQuad gives Corel a next-generation word processing tool, as well as expertise that could be helpful to the WordPerfect team. Corel said it expects to continue WordPerfect development for the legal and government markets... SoftQuad's customer base continues to expand, with 167 new XMetaL customers reported in the last quarter, bringing the total number of XMetaL customers up to about 2,000. However, overall sales have been heavily dependent on key customers, including Cisco, which signed a license agreement valued at about $1 million. Unfortunately, SoftQuad's brand recognition has always been ahead of its sales. Its finances have rarely been healthy, and the company has struggled to produce a software hit from its core structured-authoring technology... 
As far as Corel, seeing it hop from bandwagon to bandwagon does not instill confidence that its new-found affinity for cross-media publishing will last any longer than its short-lived love affair with Linux. The post-Cowpland management team at Corel has its work cut out for it proving that its spotty track record of the past decade is no indication of its future potential. We do believe server-based graphics engines and XML-based authoring are corporate applications poised for growth. But they are components, not a 'cross-media solution.' A solution encompasses not only authoring but also workflow, content management, production and delivery systems for multiple media -- typically print and Web. It's a stretch to believe that Ventura will be successfully resurrected for page composition, and Corel has no content-management system that would serve as the heart of a cross-media solution. Its plan to partner such vendors, as Arbortext found out, will leave it at the mercy of their painfully long sales cycles. Corel may have bought new products and established brand names, but it still faces a formidable challenge in turning those into profitable businesses..." [See the press release, a letter to SoftQuad shareholders from Roberto Drassinower, and the relevant FAQ document, PR alt URL]

  • [September 21, 2001] "PDF Collaboration In Action. [Acrobat-Based Collaboration: How Well Does It Work? Workflow.]" By Bernd Zipper and John Parsons. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 11 (September 03, 2001), pages 7-12. ['Paper-based approval processes are generally at odds with shrinking deadlines, multi-departmental reviews and the needs of cross-media production. While numerous vendors have developed network-savvy methods for viewing, annotating and approving electronic files, many of those systems are proprietary in nature and few fully support PDF. One of Acrobat 5's significant new features is the ability to add comments and digital signatures online, forming the basis of a PDF-based collaborative workflow. We tested the new features and examined their strengths and weaknesses.'] "One of the significant new features of Acrobat 5, released in April, was the ability to add comments and digital signatures online. Although Adobe's tools are rudimentary, they form the basis of a collaborative workflow that is based, not on a proprietary raster file, but on PDF, which is widely recognized as an open, flexible and data-intensive format. The basic workflow. Acrobat 5's online workflow means that a PDF can be uploaded to a Web server, viewed in a browser by any prospective collaborator, and annotated online. Multiple users may view online PDFs, but must upload and download comments as a separate step... A separate server is required to hold the FDF files created by this commenting process. Adobe provides two methods for doing this: designating a shared folder on a network server, or specifying the URL of a WebDAV server... Importing and exporting annotations (FDF files) is handled via the Comments pane, or from the File menu... Adobe developed FDF as a 'transport format' for flexible transfer of information. For example, it is used for transferring the contents of tables or fields. 
The FDF format is based on the syntax of PDF. Its descriptions of objects and data are similar to those used by PDF itself, and they offer many options for display. Using this transport format, it is possible to forward and exchange data collected from forms, notes and annotations, and even optical markings... With Acrobat, there are two possible solutions for uploading PDF files to a Web server. One is WebDAV, which is an extension of the HTTP 1.1 protocol... WebDAV (Web Distributed Authoring and Versioning) extends HTTP to add the capability of securing data on the server; members of a team can use WebDAV to work on the same document at the same time, without being in the same place. The shared access is implemented by functions such as file locking and version control. The locking feature allows a user to temporarily block access to a file while he or she is working with it. Once the changes are completed, it is unlocked again. Locking and unlocking happen automatically, controlled by WebDAV, to avoid a 'collision.' It is not necessary to maintain a network connection during the time the lock (called a 'persistent lock') is applied to a file. Thus, a file can be opened online and edited offline. Subsequently, the changes are 'written' to the server. WebDAV also provides for the association of properties with documents. These properties are metadata encoded as XML. WebDAV distinguishes between 'dead' and 'live' properties. Live properties are generated by the server itself, including such things as creation date and date of modification. Dead properties are name-value combinations that incorporate a URL and XML coding. In the case of Acrobat, these are online annotations... The full $249 version of Acrobat 5 is required for users to view online comments, even if the user does not need to make comments. Neither the free Reader nor the recently introduced Approval product ($39) can view WebDAV-hosted online comments... 
Acrobat's online capabilities have increased the potential for collaborative workflow, using a common file format and (at least for comments) the obvious strengths and popularity of WebDAV. As tantalizing as this potential is, however, we feel the process is unfinished, and that practical solutions are still to come..." See "WEBDAV (Extensions for Distributed Authoring and Versioning on the World Wide Web)."
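
The WebDAV locking and property mechanisms described in the entry above travel as small XML request bodies. The Python sketch below builds two of them: a LOCK body asking for an exclusive write lock, and a PROPPATCH body setting one 'dead' property. Element names follow the `DAV:` vocabulary of the WebDAV specification (RFC 2518) as best recalled here; the property namespace `urn:example:review`, the property name, and the helper function names are hypothetical. Nothing is sent over the network, and this is not a description of what Acrobat itself transmits.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"  # the WebDAV XML namespace


def _dav(tag):
    """Return an ElementTree-qualified name in the DAV: namespace."""
    return f"{{{DAV}}}{tag}"


def lock_body(owner_href):
    """Body of a LOCK request for an exclusive write lock held by `owner_href`."""
    root = ET.Element(_dav("lockinfo"))
    ET.SubElement(ET.SubElement(root, _dav("lockscope")), _dav("exclusive"))
    ET.SubElement(ET.SubElement(root, _dav("locktype")), _dav("write"))
    owner = ET.SubElement(root, _dav("owner"))
    ET.SubElement(owner, _dav("href")).text = owner_href
    return ET.tostring(root, encoding="unicode")


def proppatch_body(namespace, name, value):
    """Body of a PROPPATCH request that sets one dead (client-defined) property."""
    root = ET.Element(_dav("propertyupdate"))
    prop = ET.SubElement(ET.SubElement(root, _dav("set")), _dav("prop"))
    ET.SubElement(prop, f"{{{namespace}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(lock_body("mailto:reviewer@example.org"))
    print(proppatch_body("urn:example:review", "status", "approved"))
```

A client would send the first body with the LOCK method before editing and the second with PROPPATCH to attach metadata; live properties such as the modification date are maintained by the server and are not set this way.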

  • [September 21, 2001] "ContentGuard Scales Back Operations. Will Discontinue Services and Concentrate on XrML." By Mike Letts. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 11 (September 3, 2001), page 31. "Hard times have fallen on high-profile digital rights management (DRM) provider ContentGuard, which has announced that it is in the process of discontinuing all of its service offerings, as well as several product lines. As a result, the company is directing customers to other service providers or vendors. In conjunction with the scaling back of operations, ContentGuard cut its workforce from about 90 employees to 30 and is slimming down or closing several of its smaller offices. For the foreseeable future, said Michael Miron, co-chairman of the board of directors and CEO of ContentGuard, the focus will fall almost solely on promoting the company's XrML rights language as a standard for the digital content industry... In addition to pushing forward with XrML, Miron noted that ContentGuard also plans on releasing a series of new development tools for customers that will allow them to integrate XrML with their own systems. Some will be free, and some will be for-purchase, said Miron. The first of these toolkits will be available 'soon,' he said, perhaps as early as this fall. In addition, some of the toolkits will allow industry participants to create extensions to the language, although Miron said the company has no plans to openly publish XrML or relinquish control of its licensing... [Editorial comment:] ContentGuard's inability to sell its services is indicative of the growing pains of the DRM market. Huge legislative and technical issues need to be ironed out before concrete revenues will be seen, so the attrition is sure to continue. Perhaps the question that should be asked is: If a company with the backing of Microsoft and Xerox can't make it, who can?" See: "Extensible Rights Markup Language (XrML)."

  • [September 21, 2001] "Talk SOAP. Building Devices that Communicate." By Amit Asaravala. In WebTechniques Volume 6, Issue 10 (October 2001), pages 35-37. "SOAP, an industry-backed device communications protocol, may make the 'semantic Web' a closer reality than you thought possible... Like most protocols, SOAP isn't a tangible product, but rather a set of rules that anyone can implement in a software client or server. In essence, SOAP lets your applications invoke methods on servers, services, components, and objects that lie at remote locations on the Internet. While other protocols like DCOM and IIOP/CORBA let you do similar things, they're limited in that they weren't designed specifically for the Internet and for communication between diverse companies and devices. Using DCOM to communicate between applications in two separate companies is a difficult task that first requires agreeing on ports, transfer protocols, and so on. SOAP, on the other hand, sits on top of existing HTTP connections. As most companies have Web servers configured for HTTP connections on standard port 80, most of the initial coordination is complete. Of course, companies will still need to share APIs for available objects and methods; but SOAP lets people focus on these APIs and the data that needs to be transferred, rather than on the trouble of getting two disparate systems to communicate. All you need is a SOAP-compliant client application on the one side and a SOAP-compliant server on the other. The server could be as simple as a Web server that checks the headers of incoming HTTP requests. If it finds a POST statement with a text/xml-SOAP content-type or SOAPAction header, it sends the statement to a SOAP engine that parses the command found within. There are numerous SOAP implementations available, including SOAP::Lite for Perl... Because it's infrastructure agnostic, SOAP is positioned to become the de facto standard in communications. 
With its reliance on common protocols and languages such as HTTP and XML, SOAP promises to reduce the amount of coordination and development traditionally necessary to facilitate communication between two or more devices. In addition to its uses for network appliances, SOAP is being touted as the enabling component for Web services. This is one of the major reasons behind Microsoft's involvement with the SOAP specification. The company's .Net frameworks will use SOAP messages to send information between companies that have agreed to share data. eBay has already agreed to use the .Net framework to open up its auction databases. When the technology is in place, developers from other sites will be able to write auction applications that rely on live data from eBay's central database. In essence, SOAP is enabling the Web that we don't see. It's the technology that will help us realize a semantic, invisible Web that runs in the background, doing our bidding without our constant attention. So long, Web browsers." See "Simple Object Access Protocol (SOAP)."
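
The request flow Asaravala describes (an HTTP POST carrying an XML envelope, a header check on the server, then a hand-off to a SOAP engine) can be sketched in Python without any network code. The sketch assumes the SOAP 1.1 envelope namespace and treats a `text/xml` content type or a `SOAPAction` header as the marker for a SOAP request. The service namespace `urn:example:auction`, the `GetBid` method, and all function names are hypothetical, and real toolkits such as SOAP::Lite do considerably more (typing, faults, headers).

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_request(method, params, service_ns="urn:example:auction"):
    """Client side: wrap a method call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def looks_like_soap(http_method, headers):
    """Server side: decide from the HTTP method and headers whether to invoke the SOAP engine."""
    lowered = {key.lower(): value for key, value in headers.items()}
    is_xml = lowered.get("content-type", "").startswith("text/xml")
    return http_method.upper() == "POST" and ("soapaction" in lowered or is_xml)


def dispatch(xml_text, handlers):
    """A toy 'SOAP engine': find the call inside Body and run the matching handler."""
    envelope = ET.fromstring(xml_text)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]  # first child of Body is the method call
    method = call.tag.rsplit("}", 1)[-1]            # strip the namespace part of the tag
    args = {child.tag: child.text for child in call}
    return handlers[method](**args)


if __name__ == "__main__":
    request = build_request("GetBid", {"itemId": 42})
    headers = {"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'}
    if looks_like_soap("POST", headers):
        print(dispatch(request, {"GetBid": lambda itemId: f"highest bid for item {itemId}"}))
```

All parameter values arrive as strings in this sketch; a fuller implementation would use the type information SOAP encoding can carry.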

  • [September 21, 2001] "SVG Gets an Editor. [Product Review.]" By Lynne Cooney. In WebTechniques Volume 6, Issue 10 (October 2001), page 28. "Scalable Vector Graphics (SVG) is a graphics format based on XML, and is currently nearing completion at W3C. Support for SVG isn't as broad as for Flash and Shockwave yet, but you can expect that to change with broader industry and browser support. Preparing to grab hold of the emerging market, Jasc has released a beta version of WebDraw... I like the preset objects tool, which lets you place simple arrows, hearts, and other graphics into your document from a library. To edit the points or nodes of these objects you must select the object and choose Convert to Paths. There are four primary drawing tools. You can use the Line tool to draw straight lines. With the Polyline tool you can draw irregular polylines or polygons. The FreeHand tool lets you draw a path freely, without clicking for each point. And finally, the Path tool is similar to the default pen found in vector illustration tools such as Illustrator and FreeHand... Overall, I think Jasc WebDraw is an excellent low-cost tool for creating simple SVG code. It would be a great asset to anyone creating scripts to output SVG on the fly, or to build consistent, styled headlines on a Web site. It lacks Illustrator's higher end features, but you can't beat the price -- free, while in beta!" See: "W3C Scalable Vector Graphics (SVG)."

  • [September 21, 2001] "Microsoft's Golden Road to the Internet. Visual Studio .Net Enterprise Edition. [Product Review.]" By John Pearson. In WebTechniques Volume 6, Issue 10 (October 2001), pages 48-49. "Microsoft's future is bound to Visual Studio .Net, which is probably why it has made so many changes. If you've used previous releases of Visual Studio, this will be a whole new world. If you've used enterprise suites from other companies, you have to get this one and try it out... To understand what happened to VB -- indeed, what happened to Visual Studio -- we need to look under the covers and discuss Microsoft's .Net initiative. This initiative begins with the .Net Framework -- essentially a set of libraries, classes, and interpreters that constitute the foundation upon which all .Net services and languages are built. The libraries are collectively known as the Common Language Runtime (CLR) and include fundamental programming services such as memory management, process management, and security enforcement. Part of this library is also a compiler to process language instructions. This library has a set of classes, known as .Net Framework Unified Classes, that perform systems and programming tasks such as file management, system input and output, and operating system functionality. On top of the systems classes, other classes are built that do the things that we really want to get to: data classes (ADO.Net), Windows forms, and XML and Web classes... ADO.Net is the data access component of the .Net framework. It provides access to SQL Server and OLE DB data sources via an XML interface. For Web applications, first it provides the schema in XSD format, then transmits data in XML datasets. For ASP.Net programming, this makes database and server-side programming much easier. You can easily incorporate XML data from other sources into your applications, and you can use XML to transmit data from your application to others. 
ADO.Net is fully integrated into Visual Studio .Net and is the primary means of data access for the applications developed with it."

  • [September 21, 2001] "The Semantic Web: An Introduction." By Sean B. Palmer. ['This document is designed as being a simple but comprehensive introductory publication for anybody trying to get into the Semantic Web: from beginners through to long time hackers. The document discusses many principles and technologies of the Semantic Web, including RDF, RDF Schema, DAML, ontologies, inferences, logic, SEM, queries, trust, proof, and so on. Because it touches a lot of subjects, it may cover some well-known material, but it should also have something that will be of interest to everyone.'] "... So the Semantic Web can be seen as a huge engineering solution... but it is more than that. We will find that as it becomes easier to publish data in a repurposable form, so more people will want to publish data, and there will be a knock-on or domino effect. We may find that a large number of Semantic Web applications can be used for a variety of different tasks, increasing the modularity of applications on the Web. But enough subjective reasoning... onto how this will be accomplished. The Semantic Web is generally built on syntaxes which use URIs to represent data, usually in triples based structures: i.e., many triples of URI data that can be held in databases, or interchanged on the World Wide Web using a set of particular syntaxes developed especially for the task. These syntaxes are called 'Resource Description Framework' syntaxes... Table Of Contents: 1. What Is The Semantic Web?; 2. Simple Data Modelling: Schemata; 3. Ontologies, Inferences, and DAML; 4. The Power Of Semantic Web Languages; 5. Trust and Proof; 6. Ambient Information and SEM; 7. Evolution; 8. Does It Work? What Semantic Web Applications Are There?; 9. What Now? Further Reading." See: "XML and 'The Semantic Web'."
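
The 'triples of URI data' that Palmer describes can be pictured as plain (subject, predicate, object) tuples. The Python sketch below is a minimal in-memory illustration, not an RDF syntax or an RDF library. The `example.org` URIs are hypothetical, the two predicate URIs are the Dublin Core `creator` and `title` terms, and the function names `match` and `subjects_with` are invented for illustration.

```python
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"  # Dublin Core 'creator'
DC_TITLE = "http://purl.org/dc/elements/1.1/title"      # Dublin Core 'title'

# Each statement is one (subject, predicate, object) triple.
TRIPLES = [
    ("http://example.org/doc/intro", DC_TITLE, "The Semantic Web: An Introduction"),
    ("http://example.org/doc/intro", DC_CREATOR, "http://example.org/people/sbp"),
    ("http://example.org/doc/notes", DC_CREATOR, "http://example.org/people/sbp"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every position that is not None."""
    pattern = (subject, predicate, obj)
    return [
        triple for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


def subjects_with(triples, predicate, obj):
    """Which resources have `obj` as the value of `predicate`?"""
    return sorted({s for s, _, _ in match(triples, predicate=predicate, obj=obj)})


if __name__ == "__main__":
    for doc in subjects_with(TRIPLES, DC_CREATOR, "http://example.org/people/sbp"):
        print(doc)
```

Because subjects, predicates, and many objects are URIs, triples from different sources can be merged into one list and queried together, which is the repurposing effect the abstract refers to.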

  • [September 21, 2001] "C/C++ developers: Fill your XML toolbox. Tools advice for C and C++ programmers ramping up on XML." By Rick Parrish. From IBM developerWorks. September 2001. "Designed for C and C++ programmers who are new to XML development, this article gives an overview of tools to assemble in preparation for XML development. Tool tables outline generic XML tools like IDEs and schema designers, parsers, XSLT tools, SOAP and XML-RPC libraries, and other libraries either usable from or actually written in C and/or C++. The article includes advice for installing open-source libraries on Windows, Unix, and Linux, plus a brief glossary of key XML terms. It seems as if everywhere you look there is some new XML-related tool being released in source code form written in Java. Despite Java's apparent dominance in the XML arena, many C/C++ programmers do XML development, and there are a large assortment of XML tools for the C and C++ programmer. We'll confront XML library issues like validation, schemas, and API models. Next, we'll look at a collection of generic XML tools like IDEs and schema designers. Finally, we'll conclude with a list and discussion of libraries either usable from or actually written in C and/or C++. This isn't a comparative review that rates tools. My goal is to explain the types of tools you'll probably need and to point you to likely candidates. You'll still need to research, test, and compare tool features against your project needs to assemble your ultimate toolbox. To incorporate XML in your own software projects, you're going to want to have two sets of tools in your bag of tricks. The first set is a dialect designer (or more properly 'schema designer'). The second set of tools includes software libraries that will add parsing and XML-generation features to your application... These tools ought to give you a good start on your XML toolbox. 
If you want to suggest other C/C++ tools for XML that you have tried or to make any other comment, join the discussion referenced in this article." Also in PDF format.

  • [September 21, 2001] "Enabling XML security. An introduction to XML encryption and XML signature." By Murdoch Mactaggart. From IBM developerWorks. September 2001. "XML is a major enabler of what the Internet, and latterly Web services, require in order to continue growing and developing. Yet a lot of work remains to be done on security-related issues before the full capabilities of XML languages can be realised. At present, encrypting a complete XML document, testing its integrity, and confirming the authenticity of its sender is a straightforward process. But it is increasingly necessary to use these functions on parts of documents, to encrypt and authenticate in arbitrary sequences, and to involve different users or originators. At present, the most important sets of developing specifications in the area of XML-related security are XML encryption, XML signature, XACL, SAML, and XKMS. This article introduces the first two. XML has become a valuable mechanism for data exchange across the Internet. SOAP, a means of sending XML messages, facilitates process intercommunication in ways not possible before, while UDDI seems to be fast becoming the standard for bringing together providers and users of Web services; the services themselves are described by XML in the form of WSDL, the Web Services Description Language. Without XML, this flexibility and power would not be possible and, as various people have remarked, it would be necessary to invent the metalanguage. The other area of rapid growth is that of security. Traditional methods of establishing trust between parties aren't appropriate on the public Internet or, indeed, on large LANs or WANs. Trust mechanisms based on asymmetric cryptography can be very useful in such situations, but the ease of deployment and key management, the extent of interoperability, and the security offered are, in reality, far less than the enthusiastic vendors of different Public Key Infrastructures (PKI) would have us believe.
There are particular difficulties in dealing with hierarchical data structures and with subsets of data with varying requirements as to confidentiality, access authority, or integrity. In addition, the application of now standard security controls differentially to XML documents is not at all straightforward. Several bodies are actively involved in examining the issues and in developing standards. The main relevant developments here are XML encryption and the related XML signature; eXtensible Access Control Language (XACL) and the related Security Assertion Markup Language (SAML -- a blending of the formerly competing AuthML and S2ML), both driven by OASIS; and XML Key Management Specification (XKMS). This article introduces XML encryption and XML signature... SAML is an initiative driven by OASIS that attempts to blend the competing specifications AuthML and S2ML, and to facilitate the exchange of authentication and authorisation information. Closely related to SAML, but focusing more on a subject-privilege-object orientated security model in the context of a particular XML document, is the eXtensible Access Control Markup Language, also directed by OASIS and variously known (even within the same documents) as XACML or XACL. By writing rules in XACL, a policy author can define who can exercise what access privileges for a particular XML document, something relevant in the situations cited earlier. XKMS, now being considered by a W3C committee, is intended to establish a protocol for key management on top of the XML signature standard. With SAML, XACL, and other initiatives, XKMS is an important element in the large jigsaw that makes up security as applied to XML documents. 
Its immediate effect is to simplify greatly the management of authentication and signature keys; it does this by separating the function of digital certificate processing, revocation status checking, and certification path location and validation from the application involved -- for example, by delegating key management to an Internet Web service." See "XML Digital Signature (Signed XML - IETF/W3C)."
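
The excerpt's point about applying integrity checks to parts of a document, rather than the whole, can be sketched as follows. This is not the XML Signature syntax, which defines its own reference and digest elements; it only shows the underlying idea of digesting a canonical form of one element. The function name `element_digest`, the sample `ORDER` document, and the use of SHA-256 are assumptions for illustration. `ET.canonicalize` requires Python 3.8 or later.

```python
import hashlib
import xml.etree.ElementTree as ET


def element_digest(xml_text, tag):
    """Return a hex digest over the canonical form of the first <tag> element."""
    root = ET.fromstring(xml_text)
    elem = root if root.tag == tag else root.find(f".//{tag}")
    if elem is None:
        raise ValueError(f"no <{tag}> element found")
    # Drop text that follows the element so only the element itself is covered.
    elem.tail = None
    serialized = ET.tostring(elem, encoding="unicode")
    # Canonicalization removes incidental differences such as attribute order.
    canonical = ET.canonicalize(serialized)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical document: only the <payment> part is integrity-protected.
ORDER = """<order id="42">
  <item sku="A1" qty="2"/>
  <payment><card>4111-XXXX</card></payment>
</order>"""

payment_digest = element_digest(ORDER, "payment")
changed_elsewhere = ORDER.replace('qty="2"', 'qty="3"')
# The digest of <payment> is unaffected by an edit to <item>.
print(payment_digest == element_digest(changed_elsewhere, "payment"))
```

A real signature would additionally sign the digest with a private key and record which element it covers, so that different originators can each protect their own portion of the same document.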

  • [September 20, 2001] "Directory Services Markup Language Version 2.0." From OASIS TC for Directory Services Markup Language (DSML). Draft. September 19, 2001. 38 pages. "The Directory Services Markup Language v1.0 (DSMLv1) provides a means for representing directory structural information as an XML document. DSMLv2 goes further, providing a method for expressing directory queries and updates (and the results of these operations) as XML documents. DSMLv2 documents can be used in a variety of ways. For instance, they can be written to files in order to be consumed and produced by programs, or they can be transported over HTTP to and from a server that interprets and generates them. DSMLv2 functionality is motivated by scenarios including: (1) A smart cell phone or PDA needs to access directory information but does not contain an LDAP client. (2) A program needs to access a directory through a firewall, but the firewall is not allowed to pass LDAP protocol traffic because it isn't capable of auditing such traffic. (3) A programmer is writing an application using XML programming tools and techniques, and the application needs to access a directory. In short, DSMLv2 is needed to extend the reach of directories. DSMLv2 is not required to be a strict superset of DSMLv1, which was not designed for upward-compatible extension to meet new requirements. However it is desirable for DSMLv2 to follow the design of DSMLv1 where possible. ... DSMLv2 focuses on extending the reach of LDAP directories. Therefore, as in DSMLv1, the design approach is not to abstract the capabilities of LDAP directories as they exist today, but instead to faithfully represent LDAP directories in XML. The difference is that DSMLv1 represented the state of a directory while DSMLv2 represents the operations that an LDAP directory can perform and the results of such operations...." With the draft XML schemas: Batch Envelope, [imported]. See references in "Directory Services Markup Language (DSML)."

  • [September 20, 2001] "Understanding WSDL in a UDDI Registry. How to Publish and Find WSDL Service Descriptions." By Peter Brittenham, Francisco Curbera, Dave Ehnebuske, and Steve Graham. From IBM developerWorks [Web services articles]. September 2001. ['The Web Services Description Language has a lot of versatility in its methods of use. In particular, WSDL can work with UDDI registries in several different ways depending upon the application needs. In this first of a three-part series, we will look at these different methods of using WSDL with UDDI registries.'] "The Web Services Description Language (WSDL) is an XML language for describing Web services as a set of network endpoints that operate on messages. A WSDL service description contains an abstract definition for a set of operations and messages, a concrete protocol binding for these operations and messages, and a network endpoint specification for the binding. Universal Description Discovery and Integration (UDDI) provides a method for publishing and finding service descriptions. The UDDI data entities provide support for defining both business and service information. The service description information defined in WSDL is complementary to the information found in a UDDI registry. UDDI provides support for many different types of service descriptions. As a result, UDDI has no direct support for WSDL or any other service description mechanism. The UDDI organization has published a best practices document titled Using WSDL in a UDDI Registry 1.05. This best practices document describes some of the elements on how to publish WSDL service descriptions in a UDDI registry. The purpose of this article is to augment that information. The primary focus is on how to map a complete WSDL service description into a UDDI registry, which is required by existing WSDL tools and runtime environments. 
The information in this article adheres to the procedures outlined in that best practices document and is consistent with the specifications for WSDL 1.1, UDDI 1.0 and UDDI 2.0..." For related articles, see the IBM developerWorks Web Services Zone. References: (1) "Web Services Description Language (WSDL)", and (2) "Universal Description, Discovery, and Integration (UDDI)."

  • [September 20, 2001] "Sun, IBM Update Web Services Tools." By Tom Sullivan. In InfoWorld (September 19, 2001). "Both Sun Microsystems and IBM on Tuesday announced upgraded tools for building applications and Web services. Sun, in Palo Alto, California, said that Forte for Java 3.0, Enterprise Edition, is now generally available. The company said that the primary focus of the new version is enhanced support for EJBs (Enterprise JavaBeans) and on creating XML-based services. In addition to the Enterprise Edition, Sun also maintains the free Community Edition, which now includes seven modules previously in the Internet version. These modules include an external editor, XML support, database explorer, CORBA support, terminal emulation, file copy, and support for C, C++, and Fortran. IBM, also on Tuesday, placed the latest version of its WSTK (Web services Toolkit) on the company's alphaWorks Web site for developers. WSTK v2.4 offers developers a runtime environment as well as introductory material and examples of Web services that developers can use. New to this version are support for IBM's WebSphere application server, HTTPR (reliable HTTP), WSDL (Web Services Description Language), and WSIF (Web Services Invocation Framework), which enables developers to describe non-SOAP based services in WSDL..."

  • [September 20, 2001] "Extend the Power of Java Technology with the Modular, Extensible Forte for Java IDE." By [Staff]. Sun Forte Tools Feature Story. September 01, 2001. "Whether you're a beginning programmer or a professional Java technology developer, Sun's Forte for Java, release 3.0 integrated development environment (IDE) provides an outstanding platform in which to create and deploy Enterprise JavaBeans (EJB). The Forte for Java IDE supports the editions of the Java 2 Platform: the Micro Edition (J2ME), the Standard Edition (J2SE), and the Enterprise Edition (J2EE). Moreover, the Forte for Java IDE is modular and extensible -- allowing you to quickly incorporate new technologies, such as wireless, smart Web services, and robust application-specific user interfaces from Sun, Sun's 75+ partners, and the open source community... As a key component of the Sun Open Net Environment (Sun ONE), you can count on the Forte for Java IDE to be an outstanding product that integrates with the Sun ONE architecture. Written in the Java programming language, it generates J2EE code. Because the Forte for Java IDE also includes many wizards and productivity features, and integrates with the iPlanet Application Server and the iPlanet Web Server, developers are enabled to create and deploy EJBs in a highly productive manner. You can choose either of the following Forte for Java software editions: (1) The Community Edition is offered at no charge and includes a complete and highly integrated set of tools -- including a Web browser, Web server, a relational database and support for CORBA, RMI, XML, and source code management. This edition includes all the functionality needed for teams of developers building database-aware Web applications, including integration with Tomcat... (2) The Enterprise Edition is ideally suited for developing scalable, robust applications and services based on the J2EE architecture specification. 
This edition includes all the functionality of the Community Edition plus support for building and assembling EJBs into applications. It also supports deploying applications to an integrated application server, such as the iPlanet Application Server. The Enterprise Edition also enables you to develop and publish Web services with the Web Services module or add a partner plug-in module that extends your development environment to support standards, such as ebXML, WSDL, UDDI, and SOAP..." See also the announcement from Sun.

  • [September 20, 2001] "RosettaNet Sets Compliance Program." By Chuck Moozakis. In InternetWeek (September 18, 2001). "RosettaNet today took the wraps off RosettaNet Ready, a package of developer tools and source code aimed at accelerating the adoption of the product definition standard. Ready has two components. The first, a developer tools library, lets companies test software to ensure it complies with RosettaNet standards. The second, a set of software compliance badges, verifies that applications written by members and other software developers conform to RosettaNet... Fourteen companies have already signed on as Ready backers, including application integration companies webMethods and SeeBeyond. Electronics industry exchange E2open is another backer. The exchange earlier this month kicked off a RosettaNet Onboarding service that incorporates RosettaNet's XML product descriptions into i2's supply chain apps. The service is geared to electronics and semiconductor manufacturers that want to use the Web to collaborate with their trading partners. RosettaNet hopes to have another 90 or so companies signed up to support the Ready initiative. Currently, RosettaNet has about 400 member companies that have pledged to adopt the standard..." See "RosettaNet."

  • [September 20, 2001] "Device Independence Principles." W3C Working Draft 18-September-2001. Edited by Roger Gimson (HP); Co-edited by Shlomit Ritz Finkelstein (Nexgenix), Stéphane Maes (IBM), and Lalitha Suryanarayana (SBC Technology Resources). Latest version URL: Produced as part of the W3C Device Independence Activity. ['The Device Independence Working Group has released its first publication, a Working Draft of Device Independence Principles. The document describes the principles necessary to make the Web accessible by "anyone, anywhere, anytime, anyhow".'] Abstract: "This document celebrates the vision of a device independent Web. It describes device independence principles that can lead towards the achievement of greater device independence for Web content and applications." Goal: "The aim of this document is to set out some principles that can be used when evaluating current solutions or proposing new solutions, and can lead to more detailed requirements and recommendations in the future. The principles are independent of any specific markup language, authoring style or adaptation process. They do not propose specific requirements, guidelines or technologies. It is intended, however, that these principles be used as a foundation when proposing greater device independence through, for example: (1) guidelines for authoring of content and applications that use existing markup languages, (2) modifications and extensions to existing markup languages, (3) designs of adaptation tools and processes, (4) evolution of new markup languages..." See also the mailing list archives.

  • [September 19, 2001] "Indexing and Querying XML Data for Regular Path Expressions." By Quanzhong Li and Bongki Moon (Department of Computer Science, University of Arizona, Tucson, AZ 85721, USA). Paper presented at the 2001 International Conference on Very Large Databases (VLDB 2001), Rome, Italy, September, 2001. 10 pages, with 25 references. "With the advent of XML as a standard for data representation and exchange on the Internet, storing and querying XML data becomes more and more important. Several XML query languages have been proposed, and the common feature of the languages is the use of regular path expressions to query XML data. This poses a new challenge concerning indexing and searching XML data, because conventional approaches based on tree traversals may not meet the processing requirements under heavy access requests. In this paper, we propose a new system for indexing and storing XML data based on a numbering scheme for elements. This numbering scheme quickly determines the ancestor-descendant relationship between elements in the hierarchy of XML data. We also propose several algorithms for processing regular path expressions, namely, (1) EE-Join for searching paths from an element to another, (2) EA-Join for scanning sorted elements and attributes to find element-attribute pairs, and (3) KC-Join for finding Kleene-Closure on repeated paths or elements. The EE-Join algorithm is highly effective particularly for searching paths that are very long or whose lengths are unknown. Experimental results from our prototype system implementation show that the proposed algorithms can process XML queries with regular path expressions by up to an order of magnitude faster than conventional approaches... The XQuery language is designed to be broadly applicable across all types of XML data sources from documents to databases and object repositories. 
The common features of these languages are the use of regular path expressions and the ability to extract information about the schema from the data. Users are allowed to navigate through arbitrary long paths in the data by regular path expressions. For example, XPath uses path notations as in URLs for navigating through the hierarchical structure of an XML document. Despite the past research efforts, it is widely believed that the current state of the art of the relational database technology fails to deliver all necessary functionalities to efficiently store XML and semi-structured data. Furthermore, when it comes to processing regular path expression queries, only a few straightforward approaches based on conventional tree traversals have been reported in the literature. Such approaches can be fairly inefficient for processing regular path expression queries, because the overhead of traversing the hierarchy of XML data can be substantial if the path lengths are very long or unknown. In this paper, we propose a new system called XISS for indexing and storing XML data based on a new numbering scheme for elements and attributes. The index structures of XISS allow us to efficiently find all elements or attributes with the same name string, which is one of the most common operations to process regular path expression queries. The proposed numbering scheme quickly determines the ancestor-descendant relationship between elements and/or attributes in the hierarchy of XML data. We also propose several algorithms for processing regular path expression queries... The new query processing paradigm proposed in this paper poses an interesting issue concerning XML query optimization. A given regular path expression can be decomposed in many different ways. Since each decomposition leads to a different query processing plan, the overall performance may be affected substantially by the way a regular path expression is decomposed. 
Therefore, it will be an important optimization task to find the best way to decompose an expression. We conjecture that document type definitions and statistics on XML data may be used to estimate the costs and sizes of intermediate results. In the current prototype implementation of XISS, all the index structures are organized as paged files for efficient disk IO. We have observed a trade-off between disk access efficiency and storage utilization. It is worth investigating the way to find the optimal page size or the break-even point between the two criteria." See "XML and Query Languages." [cache]
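
The excerpt says that a numbering scheme lets the ancestor-descendant relationship be decided quickly, without traversing the tree, but it does not spell the scheme out. The sketch below assumes a simple preorder (start, end) interval labeling as a stand-in; it is not claimed to be the XISS scheme. The nested-loop `path_join` is likewise a plain illustration and not the paper's EE-Join algorithm. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def label_elements(root):
    """Give every element a (start, end) pair from one depth-first pass.

    start is the element's preorder number; end is the largest preorder
    number found in its subtree.
    """
    labels = {}
    counter = 0

    def visit(elem):
        nonlocal counter
        counter += 1
        start = counter
        for child in elem:
            visit(child)
        labels[elem] = (start, counter)

    visit(root)
    return labels


def is_ancestor(labels, anc, desc):
    """Decide ancestry by comparing numbers only, with no tree traversal."""
    anc_start, anc_end = labels[anc]
    desc_start, _ = labels[desc]
    return anc_start < desc_start <= anc_end


def path_join(root, labels, anc_tag, desc_tag):
    """Evaluate a path like anc_tag//desc_tag using a name index plus labels."""
    by_name = {}
    for elem in root.iter():
        by_name.setdefault(elem.tag, []).append(elem)
    return [
        (a, d)
        for a in by_name.get(anc_tag, [])
        for d in by_name.get(desc_tag, [])
        if is_ancestor(labels, a, d)
    ]


# Hypothetical sample document.
DOC = (
    "<book><chapter><section><title>A</title></section></chapter>"
    "<chapter><title>B</title></chapter>"
    "<appendix><title>C</title></appendix></book>"
)
root = ET.fromstring(DOC)
labels = label_elements(root)
for chapter, title in path_join(root, labels, "chapter", "title"):
    print(labels[chapter], labels[title], title.text)
```

The join finds titles at any depth below a chapter with the same comparison, which is why this style of labeling suits paths whose length is long or unknown.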

  • [September 18, 2001] "A Fast Index for Semistructured Data." By Brian F. Cooper, Neal Sample, Michael J. Franklin, Gísli R. Hjaltason, and Moshe Shadmon. Paper presented at the 27th VLDB Conference, Roma, Italy, September 13, 2001. 19 pages, with 32 references. Abstract: "Queries navigate semistructured data via path expressions, and can be accelerated using an index. Our solution encodes paths as strings, and inserts those strings into a special index that is highly optimized for long and complex keys. We describe the Index Fabric, an indexing structure that provides the efficiency and flexibility we need. We discuss how 'raw paths' are used to optimize ad hoc queries over semistructured data, and how 'refined paths' optimize specific access paths. Although we can use knowledge about the queries and structure of the data to create refined paths, no such knowledge is needed for raw paths. A performance study shows that our techniques, when implemented on top of a commercial relational database system, outperform the more traditional approach of using the commercial system's indexing mechanisms to query the XML." Detail: "... Typically, indexes are constructed for efficient access. One option for managing semistructured data is to store and query it with a relational database... An alternative option is to build a specialized data manager that contains a semistructured data repository at its core. Projects such as Lore and industrial products such as Tamino and XYZFind take this approach. It is difficult to achieve high query performance using semistructured data repositories, since queries are again answered by traversing many individual element-to-element links, requiring multiple index lookups. Moreover, semistructured data management systems do not have the benefit of the extensive experience gained with relational systems over the past few decades. 
To solve this problem, we have developed a different approach that leverages existing relational database technology but provides much better performance than previous approaches. Our method encodes paths in the data as strings, and inserts these strings into an index that is highly optimized for string searching. The index blocks and semistructured data are both stored in a conventional relational database system. Evaluating queries involves encoding the desired path traversal as a search key string, and performing a lookup in our index to find the path. There are several advantages to this approach. First, there is no need for a priori knowledge of the schema of the data, since the paths we encode are extracted from the data itself. Second, our approach has high performance even when the structure of the data is changing, variable or irregular. Third, the same index can accelerate queries along many different, complex access paths. This is because our indexing mechanism scales gracefully with the number of keys inserted, and is not affected by long or complex keys (representing long or complex paths). Our indexing mechanism, called the Index Fabric, utilizes the aggressive key compression inherent in a Patricia trie to index a large number of strings in a compact and efficient structure. Moreover, the Index Fabric is inherently balanced, so that all accesses to the index require the same small number of I/Os. As a result, we can index a large, complex, irregularly-structured, disk-resident semistructured data set while providing efficient navigation over paths in the data. Indexing XML with the Index Fabric: Because the Index Fabric can efficiently manage large numbers of complex keys, we can use it to search many complex paths through the XML. In this section, we discuss encoding XML paths as keys for insertion into the fabric, and how to use path lookups to evaluate queries... We encode data paths using designators: special characters or character strings. 
A unique designator is assigned to each tag that appears in the XML. The designator-encoded XML string is inserted into the layered Patricia trie of the Index Fabric, which treats designators the same way as normal characters, though conceptually they are from different alphabets. In order to interpret these designators (and consequently to form and interpret queries) we maintain a mapping between designators and element tags called the designator dictionary. When an XML document is parsed for indexing, each tag is matched to a designator using the dictionary. New designators are generated automatically for new tags. The tag names from queries are also translated into designators using the dictionary, to form a search key over the Index Fabric. ... Raw paths index the hierarchical structure of the XML by encoding root-to-leaf paths as strings. Simple path expressions that start at the root require a single index lookup. Other path expressions may require several lookups, or post-processing the result set. [Here] we focus on the encoding of raw paths. Raw paths build on previous work in path indexing. Tagged data elements are represented as designator-encoded strings. We can regard all data elements as leaves in the XML tree..." See "XML and Query Languages." [cache 2001-09-18]
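
The raw-path encoding described above (a designator per tag, a designator dictionary, root-to-leaf paths stored as string keys, and queries translated into search keys) can be sketched as follows. The layered Patricia trie of the Index Fabric is not reproduced here; a sorted Python list searched with `bisect` is swapped in as a plain substitute. The `PathIndex` class, its method names, and the use of Unicode private-use characters as designators are assumptions for illustration.

```python
import bisect
import xml.etree.ElementTree as ET


class PathIndex:
    """Toy raw-path index: designator-encoded keys kept in a sorted list."""

    def __init__(self):
        self.designators = {}  # designator dictionary: tag name -> character
        self.keys = []         # sorted encoded root-to-leaf keys

    def _designator(self, tag):
        # New designators are generated automatically for new tags.
        # Private-use code points are an assumed choice to avoid clashing
        # with ordinary text characters.
        if tag not in self.designators:
            self.designators[tag] = chr(0xE000 + len(self.designators))
        return self.designators[tag]

    def add_document(self, xml_text):
        self._walk(ET.fromstring(xml_text), "")

    def _walk(self, elem, prefix):
        prefix += self._designator(elem.tag)
        children = list(elem)
        if children:
            for child in children:
                self._walk(child, prefix)
        else:
            # Leaf: key = designators along the path + the data value.
            bisect.insort(self.keys, prefix + (elem.text or "").strip())

    def lookup(self, tags, value=""):
        """Translate a root-anchored path (and optional value) into a search
        key and return every stored key that starts with it."""
        try:
            key = "".join(self.designators[t] for t in tags) + value
        except KeyError:
            return []  # a tag never seen during indexing cannot match
        i = bisect.bisect_left(self.keys, key)
        matches = []
        while i < len(self.keys) and self.keys[i].startswith(key):
            matches.append(self.keys[i])
            i += 1
        return matches


# Hypothetical sample data.
index = PathIndex()
index.add_document(
    "<invoice><buyer><name>ABC Corp</name>"
    "<address>1 Industrial Way</address></buyer>"
    "<seller><name>Acme Inc</name></seller></invoice>"
)
print(len(index.lookup(["invoice", "buyer", "name"], "ABC Corp")))
```

A path that starts at the root is answered by one prefix search over the keys, which matches the excerpt's remark that such expressions need a single index lookup; no schema is consulted, since the designator dictionary is built from the data itself.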

  • [September 14, 2001] "Microsoft Integration Software Targets Chemical Industry." By Renee Boucher Ferguson. In eWEEK (September 12, 2001). "Microsoft Corp. this week introduced a BizTalk business-to-business integration software development kit for the chemical industry. The Microsoft BizTalk Server 2000 CIDX Software Development Kit, which was rolled out at the Instrumentation, Systems and Automation Society conference in Houston, is designed to help chemical companies rapidly integrate applications, platforms and business processes inside and outside their firewalls. The SDK uses the core XML (Extensible Markup Language) protocols developed by the Chemical Industry Data Exchange, a consortium of chemical industry leaders. The software provides XSLT (Extensible Stylesheet Language Transformation) mapping documents that allow customers to map data from CIDX transactions to SAP AG intermediate documents, which are used in application linking and embedding. To help users along in the process, the CIDX kit also includes a sample utility that demonstrates an approach for automating the configuration of BizTalk, as well as a tutorial explaining how to implement support for a CIDX OrderCreate transaction. BizTalk is part of Microsoft's .Net platform, which supports creation of services that run on Web sites. While the chemical industry has been slow to adopt Microsoft technology as an e-business software provider and CIDX as a standard, Christopher McCormick believes it is only a matter of time before CIDX becomes the starting point for all e-business transactions in the industry... McCormick [CEO of Inc.] estimated that about 20 percent of chemicals industry businesses use CIDX... The CIDX Chem eStandards grew out of some broad standards developed for high-tech manufacturing by RosettaNet, a multi-industry consortium of which Microsoft is a founding member..." See "XML-Based 'Chem eStandard' for the Chemical Industry."

  • [September 13, 2001] "The Race To Make Numbers Useful. XML-like Standards Aim to Enable Analysis of Data Posted Online." By L. Scott Tillett. In InternetWeek #877 (September 10, 2001), page 15. "The problem with numbers on the Web these days is that they're buried in an environment designed for text. Pulling numerical data off of a Web site and running it in an analytical application requires cutting and pasting or retyping. And good luck if you need to convert euros to dollars before plugging the numbers into your app... Efforts are multipronged, with vendors working on proprietary standards for defining, sharing and translating numerical data via the Web. Others such as e-Numerate of McLean, Va., are developing standards that they say will be open. And then there are efforts that pull in multiple industry players, such as, a consortium seeking to develop Extensible Business Reporting Language. Putting numbers into a language modeled on XML, for example, could let a Web site visitor view a company's financial statement and instantly merge those numbers into an app to compare that company's performance with that of the visitor's own firm. The same approach could work for a multinational company that wants to use applications to analyze numerical data flowing in from lots of countries... The two open standards being developed by e-Numerate are intended for sharing numerical data in an XML framework via the Web. One standard, RDL, addresses the meaning of numbers, including source information, descriptors and magnitude -- whether the numbers represent inches, dollars, euros, millions, thousands or whatnot. Scott Santucci, vice president of sales and marketing for e-Numerate, compares RDL to HTML descriptors that tell a browser whether to present information in bold, in italics or in a certain location on the page. The other standard, RXL, functions essentially as math equations that are applied to RDL. 
Santucci described them as 'macros' that process numbers--to adjust or "normalize" them for the rate of inflation, for example, or to convert them to another numerical standard... E-Numerate, which is backed by Carlyle Venture Partners and led by William M. Diefenderfer III, former President George Bush's budget director, is building a gateway to RDL/RXL-enabled numerical data that will be released sometime next year. Meanwhile, the company expects to release a Web development kit this month to let companies develop their Web sites using RDL/RXL-enabled numbers... Mike Willis, a partner at PricewaterhouseCoopers and chair of the steering committee, said that the concept of putting numbers within an XML framework would take off as the use of XML in business continues to gain momentum. Meanwhile, vendors such as Hyperion, CaseWare and Innovision continue to work in parallel to e-Numerate to create applications for the new numbers-sharing approach." See: "Re-Useable Data Language (RDL)."

  • [September 13, 2001] "Requirements for XML Document Database Systems." By Airi Salminen (Dept. of Computer Science and Information Systems, University of Jyväskylä, Jyväskylä, Finland) and Frank Wm. Tompa (Department of Computer Science, University of Waterloo, Waterloo, ON, Canada). Paper to be presented at ACM Symposium on Document Engineering, November 2001. 10 pages, with 52 references. "The shift from SGML to XML has created new demands for managing structured documents. Many XML documents will be transient representations for the purpose of data exchange between different types of applications, but there will also be a need for effective means to manage persistent XML data as a database. In this paper we explore requirements for an XML database management system. The purpose of the paper is not to suggest a single type of system covering all necessary features. Instead the purpose is to initiate discussion of the requirements arising from document collections, to offer a context in which to evaluate current and future solutions, and to encourage the development of proper models and systems for XML database management. Our discussion addresses issues arising from data modelling, data definition, and data manipulation... Effective means for the management of persistent XML data as a database are needed. We define an XML document database (or more generally an XML database, since every XML database must manage documents) to be a collection of XML documents and their parts, maintained by a system having capabilities to manage and control the collection itself and the information represented by that collection. It is more than merely a repository of structured documents or of semistructured data. As is true for managing other forms of data, management of persistent XML data requires capabilities to deal with data independence, integration, access rights, versions, views, integrity, redundancy, consistency, recovery, and enforcement of standards. 
A problem in applying traditional database technologies to the management of persistent XML documents lies in the special characteristics of the data, not typically found in traditional databases. Structured documents are often complex units of information, consisting of formal and natural languages, and possibly including multimedia entities. The units as a whole may be important legal or historical records. The production and processing of structured documents in an organization may create a complicated set of documents and their components, versions and variants, covering both basic data and metadata... Data model, DDL, and DML design must be coordinated if the resulting system is to be consistent. Much effort has been devoted to data definition for the purpose of validation and to query language features. We believe that now the highest priority is to define a complete data model that covers enterprise and document data, serves as a means to define conceptual schemas, and defines the mechanism to answer whether any two items of data are equivalent. We are encouraged by the move towards convergence of the XPath and XQuery data models; if convergence with the DOM and Infoset models were undertaken, a complete and stable database model might evolve. DDLs and DMLs can then be defined to include all components of the model. We believe that priority should also be given to developing mechanisms to manage collections of DTDs and other document definitions along with managing the documents themselves. This is especially important in the context of managing diverse collections of documents, each of which encompasses many versions and variants and subject to various levels of validity. The purpose of the paper is to initiate discussion of the requirements for XML databases, to offer a context in which to evaluate current and future solutions, and to encourage the development of proper models and systems for XML database management. 
A well-defined, general-purpose XML database system cannot be implemented before database researchers and developers understand the needs of document management in addition to the needs of more traditional database applications..." See: "XML and Databases." [cache]

  • [September 13, 2001] "Pork Barrel Protocols." By Martin Gudgin and Timothy Ewald. From September 12, 2001. "XML Endpoints is a new column about web services, one of the most controversial and confusing topics in distributed systems development today. Our goal for this column is to examine web services as they exist today and as they will be evolving in the future. Along the way, we'll talk about protocols, programming models, toolkits, interoperability and more. We'll also try to sift through all of the proposals for competing web service related specifications -- e.g., WSDL, WSFL, XLANG, HTTPR, SOAPRP, UDDI, and so on -- in order to explain which ones are likely to be useful and why. Before we get to all that, however, we need to define the term 'web service'. . . First, web services rely on standard Internet protocols like HTTP and SMTP because nearly every platform supports them and because the entire Internet infrastructure -- the proxy servers, routers, gateways and firewalls that make up the physical network -- is designed and configured to transport and accept these protocols. Second, web services use XML-based messages because XML has industry-wide support and processing tools are inexpensive and ubiquitous. The SOAP specification defines a very widely endorsed format for these messages, and at least one alternative exists, XML-RPC. Third, web services describe their message formats in terms of a language and platform neutral type system. This helps facilitate the definition of precise wire-level contracts between web services and their clients, which makes building robust interoperable distributed systems easier. XML schema (XSD) is an obvious choice for a type system, but there are some conflicts between XSD and portions of the SOAP specification that need to be resolved. Fourth, web services provide some way to access metadata describing the messages they accept in terms of the type system mandated by the previous requirement..."

  • [September 13, 2001] "Picture Perfect." By Edd Dumbill. From September 12, 2001. "Last week, the World Wide Web Consortium issued Scalable Vector Graphics (SVG) 1.0 as a Recommendation. SVG, as its name implies, is an XML application for describing two-dimensional graphics in the form of vector-based objects. As well as allowing normal 2D drawings, SVG is scriptable, enabling user interaction, and it incorporates animation capabilities from SMIL. Along with W3C XML Schema, SVG is one of the most important technologies to emerge from the W3C this year. It's certainly been long in the making -- SVG's first public Working Draft was published over two and a half years ago. Although many have been impatient for the final recommendation, this lengthy period of maturation has produced benefits in terms of the quality of SVG's specification and the number of supporting implementations. The text of the SVG Recommendation makes for an impressive read. It starts with a useful Concepts section that explains the key points and motivations behind SVG. The specification itself is beautifully formatted, comprehensively hyperlinked, and filled with examples. In addition, it is also very well indexed and useful as a reference, both for SVG processor implementers and those wishing to create SVG diagrams in XML. Accompanying the recommendation is a test suite, allowing developers of SVG implementations to verify their code against the expected renderings of SVG documents. Although W3C XML Schema has recently gained a test suite after going to recommendation status, to have one available through the development of a specification and at publication of the recommendation is an excellent move. It has also enabled the W3C to publish implementation conformance information for the various available SVG renderers... Setting aside the excellence of the specification itself, we must ask where SVG will succeed. 
After all, the best of technologies have been known to fail due to poor adoption. On the Web, SVG's most immediate competitor is Flash, the only real established technology for vector-based illustration and animation. Microsoft's Internet Explorer has had support for its own predecessor to SVG, VML, for a while now, but this hasn't really achieved widespread deployment on web sites. It is clearly the W3C's hope that SVG will supplant Macromedia's Flash to a certain extent, bringing as it does the benefits of integration with the emerging XML infrastructure both in browsers and on the server side, and of course the open process of a W3C-fostered specification..." See (1) the news entry for the Scalable Vector Graphics (SVG) 1.0 specification as a W3C Recommendation, and (2) "W3C Scalable Vector Graphics (SVG)."

  • [September 13, 2001] "What Are XForms?" By Micah Dubinko. From September 12, 2001. "XForms are the new XML-based replacement for web forms. Think about how many times a day you use forms, electronic or otherwise. On the Web, forms have truly become commonplace for search engines, polls, surveys, electronic commerce, and even on-line applications. Nearly all user interaction on the Web is through forms of some sort. This ubiquitous technology, however, is showing its age. It predates XML by several years, a contributing factor to some of its limitations: poor integration with XML, device dependent, running well only on desktop browsers, blending of purpose and presentation, [and] limited accessibility features. A new technology, XForms, is under development within the W3C and aims to meld XML and forms. The design goals of XForms meet the shortcomings of HTML forms point-for-point: (1) Excellent XML and Schema integration; (2) Device independent, yet still useful on desktop browsers; (3) Strong separation of purpose from presentation; (4) Universal accessibility. This document gives an introduction to XForms, based on the 28 August 2001 Working Draft. The most important concept in XForms is 'instance data', an internal representation of the data mapped to the more visible 'form controls'. Instance data is based on XML and defined in terms of XPath's internal representation and processing of XML. It might seem strange at first to associate XPath and XForms. XPath is perhaps best known as the common layer between XSLT and XPointer, not as a foundation for web forms. As XForms evolved, however, it became apparent that forms needed greater structure than was possible with simple name-value pairs, as well as syntax to reach into the instance data to connect or "bind" form controls to specific parts of the data structure. 
XForms processing combines input and output into the same tree: (1) From an input source, either inline or an XML document on a server, "instance data" is parsed into memory. (2) Processing of the instance data involves interacting with the user and recording any changes in the data. (3) Upon submit, the instance data is serialized, typically as XML, and sent to a server... The XForms specification fully adopts the XML Schema data-types mechanism (including a narrower subset for small devices such as mobile phones) to provide additional data collection parameters such as maximum length or a regular expression pattern like an email address. This, combined with form-specific properties, is called the 'XForms Model' and is the basis for creating powerful forms that aren't dependent on scripts..." See the XForms 1.0 Working Draft published 28-August-2001 and the main reference page "XML and Forms."
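
The three processing steps quoted above (parse instance data, record user changes through bound controls, serialize on submit) can be sketched roughly as follows. This is only an illustration in Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset; it is not an XForms processor, and the element names, the `BINDINGS` table, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance data; the element names are illustrative only.
INSTANCE_XML = "<order><customer><email/></customer><quantity>1</quantity></order>"

# Hypothetical bindings: form control name -> path into the instance tree.
BINDINGS = {
    "email_field": "customer/email",
    "quantity_field": "quantity",
}


def load_instance(xml_text):
    """Step 1: parse the instance data into an in-memory tree."""
    return ET.fromstring(xml_text)


def set_control(instance, control, value):
    """Step 2: record a user's change in the node bound to a control."""
    node = instance.find(BINDINGS[control])
    if node is None:
        raise KeyError(f"no instance node bound to {control!r}")
    node.text = value


def submit(instance):
    """Step 3: serialize the (possibly modified) tree back to XML."""
    return ET.tostring(instance, encoding="unicode")


instance = load_instance(INSTANCE_XML)
set_control(instance, "email_field", "reader@example.org")
set_control(instance, "quantity_field", "3")
print(submit(instance))
```

The point of the sketch is that input and output share one tree: the controls never hold the data themselves, they only point into the instance.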

  • [September 12, 2001] "Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8)." Proposed Draft Unicode Technical Report #26. By Toby Phipps. Version URL: "This document specifies an 8-bit Compatibility Encoding Scheme for UTF-16 (CESU) that is intended as an alternate encoding to UTF-8 for internal use within systems processing Unicode in order to provide an ASCII-compatible 8-bit encoding that preserves UTF-16 binary collation. It is not intended nor recommended as an encoding used for open information exchange. The Unicode Consortium, does not encourage the use of CESU-8, but does recognize the existence of data in this encoding and supplies this technical report to clearly define the format and to distinguish it from UTF-8. This encoding does not replace or amend the definition of UTF-8."
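
As a rough illustration of what the report describes, the sketch below encodes a string the CESU-8 way: characters in the Basic Multilingual Plane are encoded as in UTF-8, while each supplementary character is first split into its UTF-16 surrogate pair and each surrogate is then encoded as a three-byte sequence. The function names are hypothetical and this is a minimal sketch, not a reference implementation.

```python
def cesu8_encode(text):
    """Encode text as CESU-8 (sketch): surrogate pairs become 3 + 3 bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP characters are encoded exactly as in UTF-8.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Split into a UTF-16 surrogate pair, then encode each half.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp = "\uFFFD"       # a BMP character near the top of the plane
supp = "\U00010400"  # a supplementary character

print(cesu8_encode(supp).hex())                 # six bytes in CESU-8
print(supp.encode("utf-8").hex())               # four bytes in UTF-8
print(cesu8_encode(supp) < cesu8_encode(bmp))   # same order as UTF-16
print(supp.encode("utf-8") < bmp.encode("utf-8"))
```

The last two lines show the collation property the abstract mentions: byte-wise comparison of CESU-8 orders the two characters the same way as comparing UTF-16 code units, whereas UTF-8 orders them the other way round.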

  • [September 12, 2001] "In the Financial Flow. Financial Information Gets an XML Wrap Thanks to Users' Push for Open Standards [XML Enlarges the Funnel. XML is Opening Up the Financial Biz.]" By Mark Leon. In InfoWorld Volume 23, Issue 37 (September 10, 2001), page 37. ['Mark Hunt (Reuters, Director of E-Business) no longer sees a competitive edge in owning information standards.'] "Money may not grow on trees, but XML seems to be sprouting up everywhere -- and now the financial information industry has some fresh new leaves of its own. After years of protecting proprietary data systems, financial organizations are now working together on open XML standards that, according to analysts, will ease consumers' burden of juggling multiple financial information data formats. But they aren't doing it just to be nice. 'Companies like Reuters realize that they can no longer shove their proprietary standards down customers' throats,' says Dana Stiffler, an analyst at AMR Research in Boston. She explains that in the past these firms were able to dictate a proprietary messaging format, forcing their customers to buy specialized hardware and software to access financial information. Most people have seen images of stock brokers and traders surrounded by several terminals. The need for all the screens arose because none of the companies providing the vital information services were willing to use open messaging standards -- traders needed a different system for each financial data source. But the rise of the Internet, says Stiffler, has now made these same customers less tolerant of closed systems. Mark Hunt, director of e-business capability at Reuters in London, is well-acquainted with this trend toward openness. 'NewsML is a good example [of an open standard],' says Hunt. 'We took the lead on creating this as a standard for combining text, images, and video in an XML-based news feed. 
Now, we could have tried to own NewsML and make it a Reuters standard, but that is not the way the Web works.' NewsML binds text in multiple languages, images, and video together in a Web-based format accessible to Internet search engines... For Hunt, the funnel represents a host of open messaging standards -- and a wider funnel can hold more data. And one company can't possibly own that funnel, says Hunt, or others would have to license the product or software to access the funnel's data, so fewer sources will feed it: hence the move to create industry organizations and hammer out XML standards for just about any financial information topic. The new standards include FPML (Financial Products XML) and XBRL (Extensible Business Reporting Language). FPML is an XML format designed to handle complex financial instruments. 'It took 18 months just to define the DTD (Document Type Definition) for this,' says Hunt, noting that 'J. P. Morgan had a particular interest in FPML to ease the processing of interest rate derivatives.' XBRL is intended to introduce some consistency to the way information appearing in financial reports is formatted. 'Profits after tax may have a different meaning, depending on the country you are in,' says Hunt. 'XBRL tries to make this transparent to the consumers of this information.' Bridge's Hartley agrees that XML standards are changing the nature of his business. He adds MDDL (Market Data Definition Language) to the financial industry's alphabet soup..." See references in (1) Extensible Business Reporting Language (XBRL); (2) Financial Products Markup Language (FpML); (3) Market Data Definition Language (MDDL).

  • [September 12, 2001] "Digital Evolution draws up internal UDDI registries." By Charles Babcock [Interactive Week]. From ZDNet TechInfo. September 10, 2001. "Universal Description, Discovery and Integration registries are expected to provide a path to services over the Web. But Eric Pulier, president of Digital Evolution, believes UDDI is also good for providing services within the enterprise. The UDDI Community, an industry consortium of 280 technology vendors and businesses, will eventually submit a mature specification to a standards body, such as the World Wide Web Consortium or the Internet Engineering Task Force... Digital Evolution is a company that provides a UDDI registry inside a company's firewall for employees and anointed business partners to access and use. The UDDI registry is something like the pages of the phone book. The XML-based specification provides support for contact names and Web addresses (white pages), an industry classification (yellow pages) and types of services offered (green pages). At some point, a series of UDDI servers are expected to exist around the Web like the current Domain Name System, which translates typed Web site names into TCP/IP addresses. By querying those servers, a Web application or other software could discover what services are available to it, what transactions they offer and what level of encryption is used... Digital Evolution is one of the first companies to seize the emerging UDDI standard and build a product line around it, though it is aimed inside the corporation at the IT manager rather than outside at software to software Web operations. 'We have a private UDDI registry. We seek to sell a suite of products that facilitate the use of Web services in an enterprise,' Pulier said. 
The products include: Data Consumer, a browser-based data sorter that allows narrowing a data set to what the user is interested in; Margin Call, which allows a server to be set up to store frequently requested data in main memory, leading to speedier responses; Code Mason, which automatically creates copies of stored procedures and data access classes of a database system, reducing the need for programmers to recreate them manually; and Java Trap, which creates a repository of XML files containing information about the environment in which a Java application will run..." See: "Universal Description, Discovery, and Integration (UDDI)."
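
The phone-book analogy in the article (white, yellow, and green pages) can be pictured with a toy in-memory registry. The sketch below is only an illustration of that analogy: it does not use the UDDI data model or API, and every class, field, and service name in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    contact_name: str   # white pages: who to contact
    web_address: str    # white pages: where to reach them
    industry: str       # yellow pages: industry classification
    services: list = field(default_factory=list)  # green pages: services offered


class PrivateRegistry:
    """Toy in-memory registry; not the UDDI API or data model."""

    def __init__(self):
        self._entries = []

    def publish(self, entry):
        self._entries.append(entry)

    def find_by_service(self, service):
        return [e for e in self._entries if service in e.services]

    def find_by_industry(self, industry):
        return [e for e in self._entries if e.industry == industry]


registry = PrivateRegistry()
registry.publish(RegistryEntry(
    "Payroll team", "http://payroll.example.internal", "finance", ["get_payslip"]))
registry.publish(RegistryEntry(
    "Logistics team", "http://logistics.example.internal", "shipping", ["track_parcel"]))

matches = registry.find_by_service("get_payslip")
print([e.web_address for e in matches])
```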

  • [September 10, 2001] "High Performance Web Sites: ADO versus MSXML." By Timothy M. Chester, Ph.D. (Senior Systems Analyst, Computing & Information Services, Texas A&M University). TAMU CIS white paper. 15 pages. A related version has been published in Dr Dobb's Journal [DDJ] (October 2001) #329, pages 81-86 [Internet Programming. 'ADO and MSXML are tools that can be used to create high-performance web sites. MSXML provides flexibility, but ADO offers performance.' With listings.] "This article is about comparing the ASP/ADO and XML/XSL programming models. The emphasis is not on technique (although sample code is provided) but on performance. This article asks the question, 'MSXML is really cool, but how does it perform when compared to the ASP/ADO model I am already familiar with?' Like most web related issues, both methods have tradeoffs. I'll build two versions of a simple website, one using ASP and ADO to generate the user interface (UI), the other using MSXML. Then I will conduct benchmarks and compare the performance of both models. In some scenarios, ASP/ADO was found to perform better than MSXML. However, in other situations MSXML provided a ten-fold increase in performance... The Internet has evolved from simple static websites to web-based computing systems that support thousands of users. This evolutionary process experienced tremendous growth with the introduction of Microsoft's Active Server Pages (ASP), an easy-to-use and robust scripting platform. ASP makes it easy to produce dynamic, data driven webpages. The next big step came with ActiveX Data Objects (ADO) and the Component Object Model (COM). These tools allow developers to access multiple datasources easily and efficiently and best of all, in a way that is easy to maintain. Together, ASP and ADO provide the basic infrastructure for separating data from business and presentation logic, following the now infamous 'n-Tier architecture'. 
With the introduction of XML and XSL, websites are now taking another gigantic leap forward. In this article, I will compare the latest evolutionary leaps with an eye toward website performance by building two versions of a simple website - one using ASP and ADO to generate the user interface (UI) and the other using Microsoft's MSXML parser to transform XML/XSL documents. I will then conduct benchmarks and compare the throughput (transactions per second) of both models. Like most web related issues, both methods have tradeoffs. In some scenarios ASP/ADO performs better than MSXML. In other situations, however, MSXML provides an incredible performance advantage... [Summary:] Website performance is not a black and white subject, but is actually very, very, gray. One basic premise is often overlooked: the ways in which a website is coded has as much (or more) to do with performance than the power of the underlying web server. ADO and MSXML are tools that can be used to create high performance websites. MSXML provides increased flexibility to developers, but at a cost. When drawing data directly from a database, MSXML performs slower than ADO. However, MSXML provides an easy way to cache the presentation of data, thereby providing up to a ten fold increase in website performance. This is a viable solution for websites that need to support thousands of concurrent users..."
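
The summary attributes the large speed-up to caching the presentation of data rather than rebuilding it on every request. The paper's own listings use ASP and MSXML; the Python sketch below is not the author's code, only a minimal illustration of that caching idea under stated assumptions. `PresentationCache`, `render_report`, and the 60-second lifetime are all hypothetical.

```python
import time


class PresentationCache:
    """Cache rendered pages so repeated requests skip query and transform."""

    def __init__(self, render, max_age_seconds, clock=time.monotonic):
        self._render = render
        self._max_age = max_age_seconds
        self._clock = clock
        self._store = {}  # key -> (timestamp, rendered text)

    def get(self, key):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._max_age:
            return hit[1]  # fresh enough: serve the cached presentation
        rendered = self._render(key)  # slow path: rebuild the page
        self._store[key] = (now, rendered)
        return rendered


calls = []


def render_report(key):
    # Stands in for a database query followed by an XML/XSL transformation.
    calls.append(key)
    return f"<html><body>report {key}</body></html>"


cache = PresentationCache(render_report, max_age_seconds=60)
first = cache.get("sales")
second = cache.get("sales")
print(first == second, len(calls))
```

The trade-off is the usual one for caches: readers may see data that is up to `max_age_seconds` old, in exchange for doing the expensive work once per interval instead of once per request.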

  • [September 10, 2001] "VoiceXML and the Voice/Web Environment. Visual Programming Tools for Telephone Application Development." By Lee Anne Phillips. In Dr Dobb's Journal [DDJ] (October 2001) #329, pages 91-96. Programmer's Toolchest. "While the Internet is making inroads into the public switched-telephone network, XML protocols such as VoiceXML are providing access to a set of tools that address the entire range of web applications..." The article provides an overview of GUI tools for creating VoiceXML applications, and reviews two: Visual Designer 2.0 from Voxeo, and Covigo Studio. [Covigo Studio "provides a visual programming environment that helps you to rapidly develop integrated mobile data and voice applications. Based on a user-centric process modeling approach, Studio separates user-interaction workflow from presentation design and data source integration. It allows you to build mobile applications from the ground-up or as extensions to existing applications, and to constantly optimize their applications to meet changing user, industry and business needs. The visual modeling approach provides multiple ways to integrate with existing enterprise applications at the presentation layer, business logic layer, or data layer levels. The product integrates with existing IT systems - including complex enterprise business processes encapsulated in systems used for customer relationship management (CRM), enterprise resource planning (ERP), and supply chain automation (SCM). This includes integrating with such technologies as HTML, JSPs, EJBs, JDBC, XML, and packaged application APIs..." The Visual Designer 2.0 from Voxeo is available at no cost. One can use the designer "to visually design phone applications and it will automatically generate the VoiceXML or CallXML markup for you. This allows a voice application developer to focus on important issues like usability and functionality, without having to worry about syntax. 
Voxeo Designer 2.0 is the first visual phone markup design tool to fully support round-trip development -- any CallXML or VoiceXML application may be opened in the Designer tool, updated graphically (or by editing the XML directly) and re-deployed for use. Features include: Visual application design using flowcharts; Full round-trip, bi-directional development; Element/Attribute syntax validation; FTP and HTTP support for file read and write; Full CallXML Tag Support; Full VoiceXML 1.0 Tag support; 100% Pure-Java IDE, runs on any Java Virtual Machine ..."] Additional resources with Lee Anne's article include listings and source code. See "VoiceXML Forum."

  • [September 10, 2001] "Regular Expressions in C++. Text processing for C/C++ programmers." By John Maddock. In Dr Dobb's Journal [DDJ] (October 2001) #329, pages 21-26. "Regular expressions form a central role in many programming languages, including Perl and Awk, as well as many familiar UNIX utilities such as grep and sed. The intrinsic nature of pattern matching in these languages has made them ideally suited to text processing applications, particularly for those web applications that have to process HTML. Traditionally, C/C++ users have had a hard time of it, usually being forced to use the POSIX C API functions regcomp, regexec, and the like. These primitives lack support for search and replace operations and are tied to searching narrow character C-strings. Some time ago, I began work on a modern regular expression engine that would support both narrow- and wide-character strings, as well as standard library-style iterator-based searches. I call this library 'regex++', available at; it was accepted as part of the peer-reviewed boost library. In this article, I'll show how regex++ can be used to make C++ as versatile for text processing as script-based languages such as Awk and Perl... I do not intend to discuss the regular expression syntax in this article, but the syntax variations supported by regex++ are described online. The documentation for Perl, Awk, sed, and grep are other useful sources of information, as is the Open UNIX Standard... This article shows some of the power that regular expressions in C++ can give you. Regex++ does not seek to replace traditional regex tools such as lex. Rather, it provides a more convenient interface for rapid access to all kinds of pattern matching and text processing -- something that has traditionally been limited to scripting languages. 
In addition, it provides a modern iterator-based implementation that allows it to work seamlessly with the C++ Standard Library, providing the versatility that C++ users have come to expect from modern libraries."

  • [September 10, 2001] "Rampant Confusion." By Chad Dickerson [InfoWorld CTO]. In InfoWorld Volume 23, Issue 37 (September 7, 2001), page 12. "I'm going to start this week's column by making a couple of hype-challenged statements: XML is inherently useless; and Web services, although it's the next big thing to nontechnical folks, has been chugging along quietly for a few years without much fanfare. In my role as CTO, I sit in a lot of meetings where I act as translator between the business folks and the engineers. XML, which works quite well in helping machines talk to each other, creates quite a lot of confusion when people talk about it. Many of the discussions go as follows: Business person: 'We need to integrate data from Company X into our Web site.' Me: 'What format will the data be in?' Business person (smiling broadly): 'XML; it's all XML.' Me: 'OK, I'll need to have an engineer look at how they structure their data so we can process it properly and integrate it into the site.' Business person (smile weakening): 'But it's in XML. ... ' Me: 'Great, I'm glad it's in XML format. We need some time to port the data into our database, do QA, and make sure we process the data feed properly as it comes in.' Business person (frown developing): 'But it's in XML. ... ' At this point I start explaining that receiving an XML feed is the beginning of an integration process, not the end. To paraphrase from the XML FAQ: XML is a markup specification language and XML files are data: They just sit there until you run a program which displays them (like a browser), or does some work with them (like a converter which writes the data in another format, or a database which reads the data), or modifies them (like an editor). In other words, as much as we all love it, XML alone is more or less useless. Although XML can be wonderful for trading data among applications, applications do not magically appear around XML documents. 
XML does, however, function as a great point of leverage for applications, which leads us to Web services... The term Web services confuses many people, and what was supposed to make things easier is making things more difficult. But this is mainly due to lack of clarity in marketing, not shortcomings in what is essentially an extraordinarily simple and powerful concept... the XML-RPC specification provides an easily grasped window into the technical promise of Web services, while also serving as a spirited manifesto for the then-new Web services world order. When I grow confused about what Web services means, I read the XML-RPC spec and it makes sense again..." See: "XML-RPC."

  • [September 10, 2001] "XML-RPC for PHP, Version 1.0." By Edd Dumbill. Documentation. Version 1.0 is available for download. "The 1.0 release is the final release to be managed by Useful Information Company... We've developed classes which encapsulate XML-RPC values, clients, messages and responses. Using these classes it's possible to query XML-RPC servers. XML-RPC is a format devised by Userland Software for achieving remote procedure call via XML. XML-RPC has its own web site, The most common implementations of XML-RPC available at the moment use HTTP as the transport. A list of implementations for other languages such as Perl and Python can be found on the web site. This collection of PHP classes provides a framework for writing XML-RPC clients and servers in PHP..." [Edd's XML-DEV post: "So, it took me two years to get brave enough to call it '1.0', but here it is. I finally reckon my all-PHP classes for doing XML-RPC are 'stable.' Available under the BSD license. More detail at A good time to note too that I've moved the project to SourceForge as well (which turns out to surpass my expectations in niftiness), and have already gained two more developers on the project. It is my intent to step down as maintainer as soon as a suitable replacement emerges..." Note also the book Programming Web Services with XML-RPC, by Simon St.Laurent, Joe Johnston, and Edd Dumbill [foreword by Dave Winer]. O'Reilly, June 2001. "XML-RPC, a simple yet powerful system built on XML and HTTP, lets developers connect programs running on different computers with a minimum of fuss. Java programs can talk to Perl scripts, which can talk to ASP applications, and so on. With XML-RPC, developers can provide access to functionality without having to worry about the system on the other end, so it's easy to create web services... 
Programming Web Services with XML-RPC introduces the simple but powerful capabilities of XML-RPC, which lets you connect programs running on different computers with a minimum of fuss, by wrapping procedure calls in XML and establishing simple pathways for calling functions. With XML-RPC, Java programs can talk to Perl scripts, which can talk to Python programs, ASP applications, and so on..."] See: "XML-RPC."
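
To make "wrapping procedure calls in XML" concrete, the sketch below builds and parses XML-RPC messages without touching the network. It uses Python's standard `xmlrpc.client` module rather than the PHP classes described above, and the method name `examples.add` is hypothetical.

```python
import xmlrpc.client

# Marshal a call to a hypothetical remote method with two integer arguments.
request_xml = xmlrpc.client.dumps((2, 3), methodname="examples.add")
print(request_xml)

# A server would unmarshal the same text back into parameters and a method name.
params, method = xmlrpc.client.loads(request_xml)

# The server's answer travels back as a methodResponse document.
response_xml = xmlrpc.client.dumps((sum(params),), methodresponse=True)
(result,), _ = xmlrpc.client.loads(response_xml)
print(method, params, result)
```

In real use, `xmlrpc.client.ServerProxy` performs the same marshalling and sends the request over HTTP, which is the transport the documentation says most implementations use.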

  • [September 10, 2001] "Use XML as a Java Localization Solution. The reusability that XML affords TMX-formatted data benefits Java internationalization development." By Masaki Itagaki. From LISA web site. "Java has been one of the best programming languages for global market-oriented application development since JDK 1.1 covered basic components for internationalization. Java has many internationalization approaches supporting such aspects as Unicode 2.0, multilingual environment, and Locale objects, to name a few. However, you still have to consider the daunting, fundamental work that is required for a global market, which means translating all text items such as labels, messages, menu items, and so on. Even for these kinds of localization issues, Java offers a nice solution in the ResourceBundle class. You can extract all the text items from original source codes, isolating them into ResourceBundle components such as a ListResourceBundle class or a property file. Although such a scheme makes a developer's life much easier, it's rather clumsy from the translation point of view, especially in terms of reusability of translations. In the localization industry, Translation Memory eXchange (TMX) is a standardized data format that uses XML for software and document translation assets. Most of the commercial translation tools can use the TMX file to reuse translation data. Translators who want to use the TMX solution for Java must implement their own data conversion between TMX and ResourceBundle data... Since 1997 the localization industry has put a lot of effort into standardizing a translation data format. The Localization Industry Standards Association (LISA), a nonprofit internationalization and localization organization, formed a special interest group called Open Standards for Container/Content Allowing Reuse (OSCAR) to define a translation memory data format and publish the TMX standard. 
This is simply XML-formatted data defining elements and attributes that are necessary to organize translation data efficiently... Most benefits of the TMXResourceBundle class are on the development side. Since the number of words usually determines the cost of translation, requesting translation of the same items is not cost efficient. Using TMX's DTD, you can also embed such information as a package name, a class name, and a project name. This gives you an exact match in translation data, which enables you to extract only new items. Meanwhile, if you want to achieve consistency between software translation and document translation (such as guides, manuals, and even computer-based training programs), TMX proves to be a great solution. By importing your Java TMX file into any translation tool, you can reuse Java translations through a word book or glossary functions, which are included in most translation tools. Thus, TMX benefits not just the translation industry, but Java internationalization development, as well..." Article originally published in JavaPro Magazine. See: "Translation Memory Exchange (TMX)."
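
The conversion between TMX and resource-bundle data that the article describes can be pictured with a small sketch. The article's solution is a Java `TMXResourceBundle` class; the Python below is not that class, only an illustration of the idea. It assumes that each translation unit's `tuid` attribute serves as the resource key, and the sample document and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX fragment; tuid doubles as the resource key (an assumption).
TMX_SAMPLE = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"/>
  <body>
    <tu tuid="button.ok">
      <tuv xml:lang="en"><seg>OK</seg></tuv>
      <tuv xml:lang="fr"><seg>OK</seg></tuv>
    </tu>
    <tu tuid="button.cancel">
      <tuv xml:lang="en"><seg>Cancel</seg></tuv>
      <tuv xml:lang="fr"><seg>Annuler</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_bundles(tmx_text):
    """Return {language: {key: text}} built from the translation units."""
    root = ET.fromstring(tmx_text)
    bundles = {}
    for tu in root.iter("tu"):
        key = tu.get("tuid")
        for tuv in tu.findall("tuv"):
            # Accept either xml:lang or a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if key is None or lang is None or seg is None:
                continue
            bundles.setdefault(lang, {})[key] = "".join(seg.itertext())
    return bundles


def to_properties(bundle):
    """Render one language as key=value lines (no escaping is attempted)."""
    return "\n".join(f"{k}={v}" for k, v in sorted(bundle.items()))


bundles = tmx_to_bundles(TMX_SAMPLE)
print(to_properties(bundles["fr"]))
```

A production converter would also need to escape characters that a property file cannot hold directly; the sketch leaves that out.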

  • [September 10, 2001] "Quality of Service Extension to IRML." IETF INTERNET-DRAFT 'draft-ng-opes-irmlqos-00.txt.' July 2001. By Chan-Wah Ng, Pek Yew TAN, and Hong CHENG (Panasonic Singapore Laboratories Pte Ltd). "The Intermediary Rule Markup Language (IRML) is an XML-based language that can be used to describe service-specific execution rules for network edge intermediaries under the Open Pluggable Edge Services (OPES) framework, as described in "Extensible Proxy Services Framework" and "Example Services for Network Edge Proxies". This memo illustrates examples of employing the IRML for Quality of Service (QoS) policing and control, and suggests extensions to IRML for better QoS support. This memo begins in Section 2 by illustrating a few scenarios where QoS policing and control can be incorporated into the OPES intermediary. From there, a set of preliminary requirements for QoS extension to the IRML is drafted in Section 3. Section 4 proposed a set of QoS extension to the 'property' element defined in the IRML, and Section 5 presents some examples illustrating possible use of these extensions." [cache]

  • [September 10, 2001] "Sub-System Extension to IRML." IETF INTERNET-DRAFT 'draft-ng-opes-irmlsubsys-00.txt.' July 2001. By Chan-Wah Ng, Pek Yew TAN, and Hong CHENG (Panasonic Singapore Laboratories Pte Ltd). "The Intermediary Rule Markup Language (IRML) is an XML-based language that can be used to describe service-specific execution rules for network edge intermediaries under the Open Pluggable Edge Services (OPES) framework. This memo discusses the need for OPES framework to have different sub-systems in different deployment scenario, and proposes additions to IRML for a more flexible approach to supporting different sub-systems. Section 2 presents the motivation behind having sub-systems support in IRML. Section 3 proposes a set of QoS extension to the 'property' element defined in the IRML, and Section 4 presents some examples illustrating possible use of these extensions." See the revised proposed IRML DTD. [cache]

  • [September 10, 2001] "Web Services Spells Changing Tide for Systems Integration." By Mark Jones, Ed Scannell, Tom Sullivan, Brian Fonseca, and Eugene Grygo. In InfoWorld Volume 23, Issue 37 (September 7, 2001), pages 21, 24. "Emerging Web services pose a unique challenge to the likes of HP, Compaq, and IBM Global Services, companies keenly aware that sustainable revenue growth is tied to their IT services capabilities. Driving the revenue shift is an understanding that new methods of application integration dovetail with the generally understood definition of Web services: Loosely coupled software components are delivered over the Internet via standards-based technologies such as XML and SOAP (Simple Object Access Protocol). As a result, Web services represent a new component architecture for building and distributing applications and facilitating the integration process. The challenge, in the view of some observers, is that as systems integrators look at how they can deliver Web services, they must adapt to the revenue shift by offering higher value-added services such as business process management. A study released in late August by Jupiter Media Metrix reflects more than pure enthusiasm for Web services. Jupiter states that 60 percent of business executives interviewed plan to deploy Web services for integrating internal applications during the next year. Also, a recent Gartner report states that through the second half of 2002, 75 percent of enterprises with more than $100 million in revenue will interface periodically with Web services. But despite the lofty claims that Web services promotes value-added business, industry participants agree that converting the dream to reality will not be a walk in the park -- particularly given that the concept of prebundled software components is not a new idea... 
Based on findings from its Web services report, Jupiter argues Web services in reality will not enable companies to sell computational services to parties they might not have prior relationships with. Obstacles include inertia around existing, comfortable relationships and the need for proven security and trust payment models; it will take years to open up the promise of new Web services business channels... When will systems integrators, and enterprise customers in turn, really start to feel the changes brought about by Web services? Estimates vary widely. Some executives say within 12 months, others talk in terms of the next five years. Perhaps the next important signpost will come during the second half of 2002, when analysts say Web services technology will mature to the point that enterprise application vendors will be rearchitecting all of their software around common standards..."

  • [September 10, 2001] "Let Your DOM Do The Walking. A Look at the DOM Traversal Module." By Brett McLaughlin (Enhydra strategist, Lutris Technologies). From IBM developerWorks. August 2001. "The Document Object Model (DOM) offers useful modules to extend its core functionality in advanced ways. This article examines the DOM Traversal module in depth, showing how to find out if this module is supported in your parser and how to use it to walk either sets of selected nodes or the entire DOM tree. You'll come away from this article with a thorough understanding of DOM Traversal, and a powerful new tool in your Java and XML programming kit. Eight sample code listings demonstrate the techniques. If you have done much XML processing during the last three years, you've almost certainly come across the Document Object Model, or DOM for short. This object model represents an XML document in your application, and it provides a simple way to read XML data and to write and change data within an existing document (see Resources for more background if you're new to the DOM). If you're on your way to being an XML guru, you've probably learned the DOM backward and forward, and you know how to use almost every method that it offers. However, there is a lot more to the DOM than most developers realize. Most developers actually have experience with the core of the DOM. That means the specification that outlines what represents the DOM, how it should operate, what methods it makes available, and so forth. Even experienced developers do not have much knowledge or understanding of the variety of extra DOM modules that are available. These modules allow developers to work more efficiently with trees, deal with ranges of nodes at the same time, operate upon HTML or CSS pages, and more -- all with an ease not possible using just the core DOM specification. 
Over the next few months, I plan articles to detail several of the modules, including the HTML module -- the Range module -- and in this article, the Traversal module. Moving through DOM trees in a filtered way makes it easy to look for elements, attributes, text, and other DOM structures. You should also be able to write more efficient, better organized code using the DOM Traversal module. Learning to use DOM Traversal, you'll see how quickly it can move throughout a DOM tree, build custom object filters to easily find the data you want, and walk a DOM tree more easily than ever. I'll also introduce you to a utility that lets you check your parser of choice for specific DOM module support, and along the way I'll manage to throw a lot of other sample code in as well..." See: "W3C Document Object Model (DOM)."
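
The two ideas in this abstract (asking a parser whether it supports a module, then walking a tree through a filter) can be sketched roughly as follows. This is not the article's Java code: it is a minimal Python illustration using the standard-library `xml.dom.minidom`, and `supports` and `iter_nodes` are hypothetical helpers that only imitate the filtered, document-order walk that a DOM Traversal node iterator performs.

```python
from xml.dom import Node, minidom


def supports(feature, version="2.0"):
    """Ask the DOM implementation whether it claims support for a module."""
    return minidom.getDOMImplementation().hasFeature(feature, version)


def iter_nodes(root, accept=lambda node: True):
    """Yield root and its descendants in document order, keeping accepted nodes only."""
    if accept(root):
        yield root
    for child in root.childNodes:
        yield from iter_nodes(child, accept)


def is_title(node):
    """Filter: accept only <title> elements."""
    return node.nodeType == Node.ELEMENT_NODE and node.tagName == "title"


doc = minidom.parseString(
    "<book><title>DOM</title><chapter><title>Traversal</title></chapter></book>"
)
titles = [node.firstChild.data for node in iter_nodes(doc.documentElement, is_title)]
```

With minidom, `supports("Core")` is true while `supports("Traversal")` is false, which is why the walk is written by hand here; a parser that does implement the Traversal module would supply its own iterator and filter interfaces instead.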

  • [September 07, 2001] "Markup Languages: Comparison and Examples." By Yolanda Gil and Varun Ratnakar (USC/Information Sciences Institute, TRELLIS project). 2001-09-07 or later. ['We are making available a comparison table that we created to understand the tradeoffs and differences among markup languages along common knowledge representation requirements. It compares XML Schema, RDF Schema, and DAML+OIL. For each dimension of comparison, the table includes a description of how each language handles that issue and hyperlinks to examples.'] "Below is a comparison table that we created to understand the tradeoffs and differences among markup languages. It compares XML (Extensible Markup Language), RDF (Resource Description Framework), and DAML (DARPA Agent Markup Language) by showing a description and examples of how each language addresses common knowledge representation requirements. We are preparing an article describing this comparison in detail. If you have any comments or suggestions, please email them to us. Our interest in markup languages stems from: (1) Research on TRELLIS, a framework to help users create semantically annotated traces and rationale for their decisions. A prototype of TRELLIS has just been released and you can try it here. (2) Our research on PHOSPHORUS, an ontology-based agent matchmaker that the ISI Electric Elves framework uses to support human organizations..." See "DARPA Agent Mark Up Language (DAML)."

  • [September 04, 2001] "Software Component Certification." By John Morris, Gareth Lee, Kris Parker, and Gary A. Bundell (University of Western Australia); Chiou Peng Lam (Murdoch University). In IEEE Computer Volume 34, Number 9 (September 2001), pages 30-36. "Most current process-based methods for certifying software require software publishers to 'take oaths concerning which development standards and processes they will use.' Jeffrey Voas, among others, has suggested that independent agencies -- software certification laboratories (SCLs) -- should take on a product certification role. The authors accept that this approach may work well for certain software distribution models, but they also observe that it cannot be applied to all software development. Third-party SCLs would add unnecessarily to the costs that small developers incur by speculating on the success of a given component. However, supplying complete test sets with components incurs little additional cost because component authors must generate the tests in the first place. Any extra effort adds value to a component because a tested component certainly offers a more marketable commodity. The authors believe that while SCLs have a place in large or safety-critical software projects, there will always be small commercial-software developments for which failure represents a moderate cost. In such cases, the cost of generating and inspecting tests can be justified... If developers are to supply test sets to purchasers, they will need a standard, portable way of specifying tests so that a component user can assess how much testing the component has undergone. Potential customers can then make an informed judgment about the likely risk of the component failing in their application, keeping in mind the nature of the tests and the intended application. 
To fill this role, we designed a test specification that aims to be (1) standard and portable; (2) simple and easy to learn; (3) devoid of language-specific features; (4) equally able to work with object-oriented systems, simple functions, and complex components such as distributed objects or Enterprise JavaBeans; (5) efficient at handling the repetitive nature of many test sets; (6) capable of offering widely available and easily produced test-generation tools that do not require proprietary software; (7) free of proprietary-software requirements for interpreting and running the tests; and (8) able to support regression testing. We based our test pattern document format on the W3C's Extensible Markup Language, which satisfies most of our requirements. XML is a widely adopted general-purpose markup language for representing hierarchical data items. We have defined an XML grammar, specialized for representing test specifications, published in the form of a document type definition (DTD) that can be downloaded from our Web site. XML is well suited to representing test specifications because it adheres to a standard developed by an independent organization responsible for several other widely accepted standards. It has achieved broad acceptance across the industry, leading to the development of editors and parsers for a variety of platforms and operating systems. Further, XML's developers designed the language to provide structured documents, which support our test specifications well. XML documents -- laid out with some simple rules -- can be read and interpreted easily. Several readily available editors make understanding the language easier by highlighting its structure and providing various logical views. To keep the test specification simple and easy to use, we defined a minimal number of elements for it. Rather than adding elements to support high-level requirements, we allow testers to write helper classes in the language of the system they are testing. 
This approach gives testers all the power of a programming language they presumably already know and avoids forcing them to learn an additional language solely for testing...The specification uses the terminology of object-oriented designs and targets a class's individual methods. However, it can describe test sets for functions written using non-OO languages such as C or Ada equally well. As long as a well-defined interface exists, a tester can construct MethodCall elements..." See "SCL Component Test Bed Specification."
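
To make the idea of a portable, XML-based test specification concrete, here is a minimal Python sketch of a runner. Only the `MethodCall` element name comes from the abstract above; the `TestSet` and `Arg` elements, the `name` and `expected` attributes, and the `run_spec` helper are assumptions invented for illustration and do not reproduce the authors' DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification: everything except the MethodCall element name
# is an assumption made for this sketch.
SPEC = """
<TestSet component="math_utils">
  <MethodCall name="add" expected="5"><Arg>2</Arg><Arg>3</Arg></MethodCall>
  <MethodCall name="add" expected="0"><Arg>-1</Arg><Arg>1</Arg></MethodCall>
  <MethodCall name="negate" expected="4"><Arg>4</Arg></MethodCall>
</TestSet>
"""

# Component under test: plain functions stand in for a class's methods.
COMPONENT = {
    "add": lambda a, b: a + b,
    "negate": lambda a: -a,
}


def run_spec(spec_xml, component):
    """Run every MethodCall and return (method name, passed) pairs."""
    results = []
    for call in ET.fromstring(spec_xml).iter("MethodCall"):
        args = [int(arg.text) for arg in call.findall("Arg")]
        actual = component[call.get("name")](*args)
        results.append((call.get("name"), actual == int(call.get("expected"))))
    return results


results = run_spec(SPEC, COMPONENT)
```

The third call is deliberately wrong (negating 4 does not give 4), so a purchaser rerunning the supplied tests would see one failure; this is the kind of check that makes a shipped test set useful for regression testing.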

  • [September 04, 2001] "Crouching Error, Hidden Markup." By Neville Holmes. In IEEE Computer Volume 34, Number 9 (September 2001), pages 126-128. ['Holmes compares Script, Roff, and Tex to MS Word: its lack of a versatile and visible markup language can make using Microsoft Word a nightmare -- and reflects poorly on our profession.'] "... Word lets a user load and save documents with markup codes for formats such as Hypertext Markup Language (HTML) and Rich Text Format (RTF) -- but must hide some kind of markup language beneath its own fancy façade. What Word lacks, however, is an overt means for formally marking up plain text while developing the document. I get the impression that Word's developers add formatting features impulsively, without the unifying philosophy or moderating principles that an underlying plain-text markup scheme would foster. Markup conventions have a rich history. If you take a long-term view, markup conventions have been used in the data processing industry for thousands of years. Markup is conventional annotation designed to convey guidance to the user of plain text about the text's intended treatment: This guidance originally applied to how the text should be read aloud and is otherwise known as punctuation... We need a standard markup language with the depth and versatility of TeX that would let us use a single marked-up, plain-text source file to specify a printed document and screen layout, allowing user interaction with the content and layout. Adopting Knuth's approach, or even TeX itself, would allow (1) a standard for handling text in any language or language mixture that uses the Latin alphabet, (2) the creative symbolism within the plain 7-bit ASCII character set necessary for trademarks and currency symbols, and (3) a basis for similar standards for other writing systems. With such support, document formatting would provide a sound basis for content-oriented standards. 
By developing and adopting such a standard, the computing industry would move toward the maturity the printing industry attained during its movable type era, and its professionals might enjoy some of the respect accorded typographers and compositors in their time. Microsoft might even be able to give Word software the look and feel of new rope..."

  • [September 03, 2001] "Business Process Specification Schema." Version 1.01. By OASIS ebXML Business Process Team. Non-normative version formatted for printing, July 2001. [11 May 2001.] Latest version URL: Formal notations in appendices A-C: Appendix A: Sample XML Business Process Specification; Appendix B: Business Process Specification Schema DTD; Appendix C: Business Process Specification Schema XML Schema. "The ebXML Specification Schema provides a standard framework by which business systems may be configured to support execution of business collaborations consisting of business transactions. It is based upon prior UN/CEFACT work, specifically the metamodel behind the UN/CEFACT Modeling Methodology (UMM) defined in the N090R9.1 specification. The Specification Schema supports the specification of Business Transactions and the choreography of Business Transactions into Business Collaborations. Each Business Transaction can be implemented using one of many available standard patterns. These patterns determine the actual exchange of Business Documents and business signals between the partners to achieve the required electronic commerce transaction... This document describes the Specification Schema, both in its UML form and in its DTD form. The document first introduces general concepts and semantics, then applies these semantics in a detail discussion of each part of the model. The document then specifies all elements in the UML form, and then in the XML form... Business process models describe interoperable business processes that allow business partners to collaborate. Business process models for e-business must be turned into software components that collaborate on behalf of the business partners. The goal of the ebXML Specification Schema is to provide the bridge between e-business process modeling and specification of e-business software components. 
The ebXML Specification Schema provides for the nominal set of specification elements necessary to specify a collaboration between business partners, and to provide configuration parameters for the partners' runtime systems in order to execute that collaboration between a set of e-business software components. A specification created against the ebXML Business Process Specification Schema is referred to as an ebXML Business Process Specification. The ebXML Business Process Specification Schema is available in two stand-alone representations, a UML version, and an XML version. The UML version of the ebXML Business Process Specification Schema is merely a UML Class Diagram. It is not intended for the direct creation of ebXML Business Process Specifications. Rather, it is a self-contained statement of all the specification elements and relationships required to be able to create an ebXML compliant Business Process Specification. Any methodologies and/or metamodels used for the creation of ebXML compliant Business Process Specifications must at minimum support these elements and relationships. The XML version of the ebXML Business Process Specification Schema provides the specification for XML based instances of ebXML Business Process Specifications, and as a target for production rules from other representations. Both a DTD and a W3C Schema are provided. The UML and XML based versions of the ebXML Business Process Specification Schema are unambiguously mapped to each other..." See: "Electronic Business XML Initiative (ebXML)" [references] and the ebXML Web Site.

  • [September 03, 2001] "RELAX NG DTD Compatibility." OASIS [RELAX NG TC] Working Draft 3-September-2001. Edited by James Clark and MURATA Makoto. Abstract: "This specification defines datatypes and annotations for use in RELAX NG schemas. The purpose of these datatypes and annotations is to support some of the features of XML 1.0 DTDs that are not supported directly by RELAX NG." From the Introduction: "RELAX NG provides two mechanisms for extensibility: (1) RELAX NG schemas can reference external libraries of datatypes; (2) in a RELAX NG schema, RELAX NG-defined elements can be annotated with child elements and attributes from other namespaces. The goal of this specification is to facilitate transition from XML 1.0 DTDs to RELAX NG schemas by using these extensibility mechanisms to support some of the features of XML 1.0 DTDs that are not supported by RELAX NG. RELAX NG itself performs only validation: it does not change the [XML] Infoset of an XML document. Most of the features of XML 1.0 DTDs that are not supported by RELAX NG involve modification to the infoset. In XML 1.0, validation and infoset modification are combined in a monolithic XML processor. It is a goal of this specification to provide a clean separation between validation and infoset modification, so that a wide variety of implementation scenarios are possible. In particular, it should be possible to make the infoset modifications either before performing RELAX NG validation or after performing RELAX NG validation or without performing RELAX NG validation at all. It should also be possible for an implementation of this specification not to modify the infoset at all and instead provide the application with a description of the modifications implied by the annotations, independently of any particular instance. This specification does not provide any support for features of XML 1.0 DTDs, such as entity declarations, that cannot be cleanly separated from validation. 
In an XML 1.0 document that is valid with respect to a DTD, each element or attribute in the instance has a unique corresponding element or attribute declaration in the DTD. With RELAX NG this is not always the case: it may be ambiguous which element or attribute pattern any particular element or attribute in the instance matches. In addition, it is non-trivial to determine when a RELAX NG schema is ambiguous. A further complication is that even in cases where it is not ambiguous, it may require multiple passes or lookahead to determine which element or attribute pattern a particular element or attribute matches. Detecting this situation is also non-trivial. Some features of XML 1.0 DTDs, in particular default attribute values and ID/IDREF/IDREFS validation, depend crucially on this unambiguous correspondence between elements or attributes in the instance and their corresponding declarations. In order to support these features in RELAX NG schemas by means of datatypes and annotations, it is therefore necessary to impose restrictions on the use of these datatypes and annotations..." See: "RELAX NG."

  • [September 03, 2001] "Guidelines for using W3C XML Schema Datatypes with RELAX NG." OASIS [RELAX NG TC] Working Draft 3-September-2001. Edited by James Clark and MURATA Makoto. [James Clark: 'I have written a first draft of the guidelines for using XML Schema Datatypes with RELAX NG...'] Abstract: "This document specifies guidelines for using the datatypes defined by W3C XML Schema Datatypes with RELAX NG." [...] "the URI should be used to identify the datatype library. The library identified by this URI contains all the builtin datatypes of W3C XML Schema Datatypes, both primitive and derived. Parameters: Any facet can be specified as a parameter, with the following exceptions: whiteSpace (the builtin derived datatype that specifies the desired value for the whiteSpace facet should be used instead); enumeration (the value element should be used instead). If the pattern parameter is specified more than once for a single data element, then a string matches the data element only if it matches all of the patterns. It is an error to specify a parameter other than pattern more than once for a single data element..." See: "RELAX NG."
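
The rule for repeated pattern parameters (a string matches only if it matches all of the patterns) amounts to a logical AND over whole-string matches. A minimal Python sketch of that rule follows; `matches_data` is a hypothetical helper, and Python's `re` syntax is used as a stand-in, since it is not identical to the regular-expression dialect of W3C XML Schema Datatypes.

```python
import re


def matches_data(value, patterns):
    """Return True only if value matches every pattern in full."""
    return all(re.fullmatch(pattern, value) is not None for pattern in patterns)


# Two illustrative patterns applied to the same data element:
# two capital letters followed by digits, and a total length of 2 to 6.
patterns = [r"[A-Z]{2}\d+", r".{2,6}"]
```

Under these assumed patterns, `matches_data("AB123", patterns)` holds, while `"AB12345"` fails the length pattern and `"ab123"` fails the first one. `re.fullmatch` is used because a pattern facet constrains the entire value, not a substring.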

August 2001

  • [August 31, 2001] "Integration of XML Data in XPathLog." By Wolfgang May (Institut für Informatik, Universität Freiburg). Presented at the CAiSE Workshop Data Integration over the Web (DIWeb'01) June, 4/5, 2001, Interlaken, Switzerland. 15 pages. "XPathLog is a logic-based language for manipulating and integrating XML data. It extends the XPath query language with Prolog-style variables. Due to the close relationship with XPath, the semantics of rules is easy to grasp. XPathLog defines a semantics for XPath expressions in rule heads, declaratively specifying how to create and update XML trees and nodes. In this paper, we show how XPathLog can be used to manipulate and restructure a database containing several XML trees. By linking subtrees, fusing elements and defining synonyms, data can be restructured and integrated into result trees. We illustrate the practicability of the approach by excerpts of a case study done with the LoPiX system... We propose a declarative, Prolog-style language for manipulation and integration of XML documents. The syntax and querying semantics is based on XPath. Whereas XSLT, XML-QL, and Quilt/XQuery use XML patterns for generating output (with the consequence that their output can only generate XML, but it cannot be used for manipulating an existing XML instance), our language deviates from these approaches: XPath-based syntax is used for querying (rule bodies) and generating/manipulating the data (rule heads). Also in contrast to the XML mainstream (i.e., the DOM and XML Query Data Model), XPathLog works on an abstract, edge-labeled graph-based data model as an internal representation of the current XML database. This design decision has been motivated by the experiences with F-Logic/FLORID in information integration. 
The data model is especially tailored to data integration, allowing to re-link elements into multiple overlapping trees, fusing elements, and introducing synonyms for subelement and attribute names (note that XML-QL is also based on a graph data model, influenced by the STRUDEL project). In the present paper we keep the formal, theoretical part in the background... To our knowledge, XPathLog is the first implemented, declarative, native XML language which allows for view definition and updates. XPathLog is completely XPath-based, ensuring that its declarative semantics is well understood from the XML perspective. Especially the nature of rule based bottom-up programming is easily understandable for XSLT practitioners, providing even more functionality. We expect that XPathLog is especially well-suited for data integration where expressive languages are needed for declaratively specifying powerful strategies, ranging over data, metadata, and meta-metadata." See details in the 2001-08-31 news item.

  • [August 31, 2001] "A Logic-Based Approach to XML Data Integration." Technical Report. By Wolfgang May. 262 pages. "In this work, a logic-based framework for XML data integration is proposed. XPath-Logic extends the XPath language with variable bindings and embeds it into first-order logic, interpreted over an edge-labeled graph-based data model. XPathLog is then the Horn fragment of XPath-Logic, providing a Datalog-style, rule-based language for manipulating and integrating XML data. In contrast to other approaches, the XPath syntax and semantics is also used for a declarative specification [expressing] how the database should be updated: when used in rule heads, XPath filters are interpreted as specifications of elements and properties which should be added to the database. Due to the close relationship with XPath, the semantics of rules is easy to grasp. In addition to the logic-based semantics of XPath-Logic, we give an algebraic semantics for evaluating XPathLog queries based on answer-sets. The formal semantics is defined in terms of a graph-based model which covers the XML data model, tailored to the requirements of XML data integration. It is not based on the notion of XML trees, but represents an XML-style (i.e., based on elements and attributes) database which simultaneously represents individual, overlapping XML trees as views of the database. The 'pure' XPathLog data model is extended with expressive modeling concepts such as a class hierarchy, nonmonotonic inheritance, and a lightweight signature concept. Information integration in this approach is based on linking elements from the sources into one or more result trees, creating elements, fusing elements, and defining access paths by synonyms. By these operations, the separate source trees are developed into a multiply linked graph database in which one or more result tree views can be distinguished by projections. 
The combination of data and metadata reasoning is supported by seamlessly adding XML Schema trees and even ontology descriptions to the internal database. XPathLog has been implemented in LoPiX. The practicability of the approach is demonstrated by a case study which also serves as a running example. The first part of the essay is dedicated to an overview of the development of XML-related concepts which also motivates the design decisions of the XPathLog framework." See details in the 2001-08-31 news item. [cache, cache]

  • [August 31, 2001] "Working with Language Identifiers. Current and Developing Standards for Distinguishing Languages in a Multilingual Environment." By Peter Constable (Non-Roman Script Initiative, SIL International). In MultiLingual Computing and Technology Volume 12 Issue 6 [#42] (June 2001), pages 63-69. ISSN: 1523-0309. The author provides an overview of standards and systems for language identification, including the notion of locales. Language identification mechanisms are surveyed for Win32 platforms, Apple/Mac (Carbon, Cocoa), and for the Microsoft .NET framework. IETF (RFC 1766/3066) and ISO (639-1, 639-2) language tag inventories are described, together with reference to the SIL (Ethnologue) codes. The author believes that ISO committees (TC 37, TC 46), IETF, and UTC are ready to cooperate in the design of solutions which embrace additional language codes for a wider range of computing applications. On the Ethnologue and recent proposals for its further standardization, see "Language Identifiers in the Markup Context: Ethnologue."

  • [August 31, 2001] Draft Technical Report: Language Codes Part 3. By John Clews. Proposal. Document Reference: ISO/TC37/SC2/WG1 N74. Date: 2001-07. 12 pages. "This draft technical report has been prepared taking into account the aims and needs expressed in the document ISO/TC37/SC2/WG1 N69: 'Coding systems', prepared on 2001-01-31 by Håvard Hjulstad (convenor of ISO/TC37/SC2/WG1) in Norway... Language codes part 3 lists language codes used in ISO 639-1 and ISO 639-2, and also provides information on additional language codes used in other coding systems. This is provided in a detailed table. It plans to provide information on which language codes from other coding systems are safe to use in addition to codes from ISO 639-1 and ISO 639-2, and guidelines on avoiding problems. There is the potential to develop a further full standard (a notional ISO 639-3) which would provide a much-extended list of language codes, in comparison to that currently available, to meet user needs. However, the initial aim is to provide documentation, and that is the principal aim of this draft technical report... The table supplies for each entry a reference identifier, Users, Area, Associated Country, Language Name, and mapping to other language code lists (as applicable), including (1) I-2 [2-letter codes from ISO 639 and ISO 639-1, and new codes applied by the ISO 639 Maintenance Agency]; (2) I-3T [3-letter codes from ISO 639-2, and new codes applied by the ISO 639-2 Maintenance Agency]; (3) SIL [3-letter codes from the Ethnologue, published by the Summer Institute of Linguistics, SIL]; (4) OT [3-letter OpenType language tags, developed by Adobe and Microsoft, widely used in the IT industry]; (5) I-3B [3-letter bibliographic codes from ISO 639-2, and national variants of these codes used in libraries]; (6) Linguascale [a classification system providing a way of referring to related languages, documented in the Linguasphere Register]... This document is also available in HTML format. 
See related references in "Language Identifiers in the Markup Context." [source, .DOC]
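
The mapping table described above can be pictured as rows keyed by code list. The following Python sketch is only an illustration under stated assumptions: it covers three well-known languages, reuses the report's I-2, I-3T, and I-3B column labels, and omits the SIL, OT, and Linguascale columns rather than guess their values. `LANGUAGE_CODES` and `convert` are hypothetical names, not part of the draft report.

```python
# A few well-known rows; column names follow the report's I-2, I-3T and I-3B labels.
LANGUAGE_CODES = [
    {"name": "English", "I-2": "en", "I-3T": "eng", "I-3B": "eng"},
    {"name": "French", "I-2": "fr", "I-3T": "fra", "I-3B": "fre"},
    {"name": "German", "I-2": "de", "I-3T": "deu", "I-3B": "ger"},
]


def convert(code, source, target):
    """Map a code from one column to another; return None when there is no entry."""
    for row in LANGUAGE_CODES:
        if row[source] == code.lower():
            return row[target]
    return None
```

For example, `convert("fr", "I-2", "I-3B")` yields the bibliographic code `fre`, while the terminology column gives `fra`; returning `None` for unknown codes leaves it to the caller to decide whether a code from another list is safe to use.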

  • [August 31, 2001] "Sun Details StarOffice 6." By David Worthington. In eWEEK (August 30, 2001). "This week at LinuxWorld, Sun Microsystems Inc. is demonstrating the latest incarnation of its popular StarOffice Suite several weeks after placing a teaser on its Web site to gauge public interest. Overall, the focus of this release will be centered on ease of use rather than adding an overabundance of new features. Performance, compatibility and the introduction of XML (Extensible Markup Language) as the suite's default file format are among several areas that have received attention by developers. Set for a public beta in early October and release some time in the first quarter of 2002, StarOffice has undergone numerous changes since Version 5.2. Sun has made good on its promise to target key areas cited by user feedback and has opted to remove the much maligned integrated desktop. In addition, performance has been enhanced through componentization. The entire suite will not load when users simply wish to perform a routine task -- leading to quicker load times and a lower utilization of system resources... As always, popular formats such as binary and files saved with Microsoft Office are supported. However, Sun is banking on XML to provide universal compatibility to its product while side-stepping proprietary formats. With XML, the recipient of a file will not be required to have StarOffice installed to view it. Not only will XML provide for smaller file sizes, it also opens the door to interactivity. Once technology progresses, users will be able to edit files through a Web browser. Continuing its push toward the Web applications, Sun is banking on Sun One Webtop to bring its productivity suite to the masses. As far as Sun is concerned, the adoption of XML, combined with its open-source business model, is laying the groundwork for the future... To sign up for early notification of the beta release, visit the StarOffice 6 home page." 
See: "StarOffice XML File Format."

  • [August 31, 2001] "XML for Analysis Decoded." By Seth Grimes (Alta Plana Corp.). In Intelligent Enterprise Volume 4, Number 13 (August 31, 2001), pages 20-22, 50. ['The XML for Analysis API has snared widespread support. But will the big fish slip through the .Net in favor of Java and proprietary APIs?'] "Now computing technologies typically bring forth a torrent of buzzwords, acronyms, and jargon. Extensible markup language (XML) is now: It has been hyped as a data-interchange panacea and is a key component in industry battles for 'Web services' ascendancy. One of its dialects, XML for Analysis, is the subject of this column. Brace yourself... Web services platforms like Microsoft's .Net initiative are built on XML, the simple object access protocol (SOAP), Web services description language (WSDL), and universal description, discovery, and integration (UDDI). A UDDI directory catalogs services, most with WSDL descriptions of the services' locations and protocols. These services would, in turn, be accessed over SOAP, which is a masterpiece of simplicity and usefulness: a channel for supporting remote procedure calls (RPCs) over the applications protocol that binds the Web, HTTP. Some platforms, notably BEA System's WebLogic, iPlanet, and IBM's WebSphere, support both J2EE and SOAP... Object linking and embedding (OLE) is a mechanism for software-component interoperability within Microsoft's proprietary COM. OLE DB for OLAP is tied to the COM platform, which even Microsoft admits is ill-suited to distributed object computing over the Internet. Microsoft is replacing COM with the .Net Web-services strategy; XML for Analysis is .Net's interface for analytic services. Hyperion joined Microsoft's initiative soon after the release of the beta specification last fall and cosponsored the recently issued 1.0 release. Twenty-two other software vendors endorsed the specification. 
Notably absent were BI heavyweights IBM, Oracle, and SAS Institute and metadata master Informatica... XML for Analysis is more than a set of press releases, but it's also far from shipping in commercial client tools or analytic databases. Microsoft has released a software development kit (SDK) for the Visual Studio.Net development environment, which itself is slated to ship later this year, but it's likely that an XML for Analysis enabled database won't ship before next year and that few third-party client tools will ship before mid-2002... XML for Analysis will compete not only with mature, highly functional programmatic APIs that have been widely implemented by software vendors, it will also compete with the emerging Java OLAP (JOLAP) API. Hyperion is the JOLAP specification lead within the Java Community Process. Expert group members include IBM and Oracle - the leading data-management vendors, both heavily invested in Java and XML and promoting application-server-centric architectures - and SAS Institute. According to the Java Specification Request, JOLAP will 'provide a standard API for creating, storing, accessing, and managing all metadata and data related to OLAP systems ... independent of the underlying data resource.' At first blush, this sounds like the justification for XML for Analysis, but in fact JOLAP's data and metadata management capabilities aren't found in the XML API and, unlike XML for Analysis, JOLAP will provide a single data-manipulation and query language. In an AlphaBlox press release, company founder Michael Skok stresses that 'a company's analytical infrastructure must integrate with its J2EE-compliant application servers ... for optimal performance, scalability, reliability, and security.' Skok spoke at the same Hyperion conference that launched the XML for Analysis specification, which his company endorsed, yet while praising JOLAP's approach (although not naming it), he had nothing public to say about the XML API. 
I'd infer that vendors like AlphaBlox, Hyperion, and SAS Institute - and perhaps Oracle - are holding back on a more ringing JOLAP endorsement because enabled products will take even longer to reach market than products that implement XML for Analysis. OASIS, the XML coordinating body, allows standards developers more freedom than the Java Community Process, which regulates JOLAP development, and JOLAP is more comprehensive than XML for Analysis..." See: "XML for Analysis."

  • [August 31, 2001] "Top Ten Java and XSLT Tips." By Eric M. Burke. From O'Reilly Java News. 29 August 2001. "My new book, Java and XSLT, examines techniques for using XSLT with Java (of course!). This article highlights ten tips that I feel are important, although limiting the list to ten items only scratches the surface of what is possible. Most of these tips focus on the combination of Java and XSLT, rather than on specific XSLT (Extensible Stylesheet Transformations) techniques. For more detailed information, there are pointers to other valuable resources at the end of this article. The basics of XSL transformations are pretty simple: one or more XSLT stylesheets contain instructions that define how to transform XML data into some other format. XSLT processors do the actual transformations; Sun Microsystems' Java API for XML Processing (JAXP) provides a standard Java interface to various processors. Here is some sample code that performs an XSL transformation using the JAXP API... XSLT is not a difficult language, although it does work quite differently than Java. Diving in and writing stylesheets is probably the best way to get over the initial learning curve." "You can download some sample code that performs an XSL transformation using the JAXP API; the .ZIP file contains the example, along with an XSLT stylesheet and XML data file. The included README file explains how to compile and run the example. Although the example utilizes StreamSource to read data from files, JAXP can also read XML data from SAX parsers or DOM trees... Tip #1: Cache whenever possible: Performing transformations using XSLT is CPU- and memory-intensive, so it makes sense to optimize whenever possible. Various caching techniques are one of the best ways to improve runtime performance in XSLT-driven Web applications...." For related resources, see "Extensible Stylesheet Language (XSL/XSLT)."

  • [August 31, 2001] "Inside Visual Studio .Net." By Michael Floyd. In PC Magazine (August 30, 2001). "Visual Studio .NET -- a cornerstone of the .NET platform -- is more than just a development environment, though. VS.NET is built with the same technology that developers will use to create applications. The .NET framework introduces features like a common language runtime that unifies programming and scripting languages while managing underlying code. The .NET framework also adds a new programming model for Windows programmers, adds compiled Active Server Pages (ASP), and introduces Web Services. Thanks to the Common Language Runtime, VS.NET also provides C++, C#, and Visual Basic programmers with a common development environment. And JScript developers will find some limited support from VS.NET when creating ASP.NET and Web Services applications. XML developers will like the robust support for XML documents, XML schema, and XSL transformations. The following pages tour some of the features you're likely to find in Visual Studio .NET. I should point out that I'm looking at Visual Studio .NET Beta 2 Professional Edition. An Enterprise edition of the Beta should be available by the time you read this. Visual Studio now offers a single code editor for all languages supported by VS.NET while supporting specific features for each language. The editor introduces several editing enhancements, like word wrap, incremental search, code outlining, collapsing text, line numbering, color printing, and shortcuts. And the editor provides a variety of language-specific features, like the ability to complete prototypes and function calls as you type them in. In addition to the programming languages, the editor also supports development of HTML documents, Cascading Style Sheets, and even XML. In fact, I was pleased to see that I could load one of my XML documents and view my markup with keywords such as the XML declaration and attributes color-highlighted.
What's more, the editor provides both a source view and a data view. In the data view, the structure of the document is presented in a left-hand window. When you select one of the XML elements in this hierarchy, a table in the right-hand portion of the window displays sub-elements and lets you drill down to element data. Very cool! One anomaly I found, however, is that not all XML documents can be loaded into the data view. Documents with unpredictable structure seem to confuse the editor when attempting to go to data view. Another pleasant surprise is that Visual Studio .NET allows you to create an XML schema based on a document instance. You open the document instance, which by default brings up the document source view. You can either remain in source view or switch to the data view, then right-click on the view and select Create Schema from the pop-up menu. This brings up a dialog box that lets you name your schema document. Once the schema is created, a reference to it is inserted into the original document instance. For those who don't want to hassle with writing XML schema from scratch, Visual Studio .NET can give you a real jump start..."

  • [August 30, 2001] "Requirements for an XML Registry." By Joseph M. Chiusano, Terence J. Schmitt, and Mark Crawford. Logistics Management Institute (McLean, VA, USA). Report EP005T4. May 2001. 136 pages. Reference provided by Owen Ambur (Co-Chair, XML.GOV XML Working Group): "LMI's report for EPA entitled 'Requirements for an XML Registry' has been posted; it contains a helpful summary of the ebXML, OASIS, and ISO 11179 registries, along with recommendations on their use." From the introduction: "The boon of XML is its extensibility. The bane of XML is its extensibility. To use XML for information exchanges, standardization of the numerous components, such as those that determine data exchange formats and provide trading partner information, is required. Recording these components in an XML registry enables XML to be used consistently, both in projects and between organizations. The benefits of an XML registry are numerous: (1) Promotes reuse; (2) Enables efficient version control; (3) Promotes unified understanding of registered objects; (4) Ensures consistency across organizational areas; (5) Promotes selective access to registered objects; (6) Enables collaborative development. The U.S. Environmental Protection Agency (EPA) has tasked LMI with determining requirements for an XML registry. As EPA continues with its e-government initiatives, it needs to implement an XML registry to offer a central location for XML and e-business resources. The registry will enable XML components to be standardized and shared among EPA, states, and industry partners, thus making data more coherent throughout the agency.
We recommend that (1) EPA implement an XML registry based on the electronic business XML (ebXML) model for storing document type definitions, Schemas, XML documents, and business information content; (2) XML tags used in document type definitions and Schemas be stored in a second registry based on the International Standards Organization and International Electrotechnical Commission 11179 model; (3) EPA use its Environmental Data Registry (EDR) as the 11179 registry because the EDR is an implementation of an 11179 registry; and (4) the XML registry be accessible from the Environmental Information Exchange Network designed by the State/EPA Information Management Work Group... EPA's XML registry must: (1) store XML schemas, XML documents, and business information content; (2) accept submissions from states and industry partners, including modifying existing XML components; be accessible from the Environmental Information Exchange Network; (3) align with commercial efforts as stated in the Office of Management and Budget (OMB) Circular A-119 and the National Technology Transfer and Advancement Act of 1995 (NTTA); (4) contain appropriate security features; (5) adhere to widely adopted XML registry standards; (6) have commercial off-the-shelf (COTS) software available for implementation; (7) assure consistency of XML tags used in XML documents; (8) enable human and machine discovery of XML registry content; (9) have a centralized architecture, with support for distributed architecture in future; and (10) have one or more repositories... The 11179 standard is the basis for a document, 'Concept of Operations for a Data Registry,' which addresses registration of data elements as they are described in the 11179 standard. This document defines a data registry as 'an information resource kept by a Registration Authority that describes the meaning and representational form of data units including data element identifiers, definitions, units, allowed value domains, etc.' 
We refer to this as the '11179 registry model.' EPA's EDR is an implementation of an 11179 registry. ISO/IEC 11179-3 is being revised to incorporate the notions of the American National Standards Institute (ANSI) X3.285 standard, Metamodel for the Management of Shareable Data. This new version, known as the 'registry metamodel (MDR3),' extends the original version to include registering objects. We refer to this new version as the '11179 registry metamodel.' The X3.285 standard specifies the structure of a data registry as a conceptual data model and provides the attributes for identifying the characteristics of data that are necessary to clearly describe, inventory, analyze, and classify data. All other parts of the ISO/IEC 11179 standard also are being harmonized to use the same terminology as the 11179 registry metamodel..." Note: see also the "Federal Tag Standards for Extensible Markup Language" prepared by the Logistics Management Institute [== LMI Report. By Mark R. Crawford, Donald F. Egan, and Angela Jackson. Report GS018T1. June 2001. 76 pages.]. See "XML Registry and Repository" [cache]

  • [August 30, 2001] "OASIS/ebXML Registry Information Model (RIM). Version 1.1, Draft 6." By members of the OASIS/ebXML Registry Technical Committee. 30-August-2001. 50 pages. "This document specifies the information model for the ebXML Registry. A separate document, ebXML Registry Services Specification, describes how to build Registry Services that provide access to the information content in the ebXML Registry. The Registry provides a stable store where information submitted by a Submitting Organization is made persistent. Such information is used to facilitate ebXML-based Business to Business (B2B) partnerships and transactions. Submitted content may be XML schema and documents, process descriptions, Core Components, context descriptions, UML models, information about parties and even software components. A set of Registry Services that provide access to Registry content to clients of the Registry is defined in the ebXML Registry Services Specification. This document does not provide details on these services but may occasionally refer to them. The Registry Information Model provides a blueprint or high-level schema for the ebXML Registry. Its primary value is for implementers of ebXML Registries. It provides these implementers with information on the type of metadata that is stored in the Registry as well as the relationships among metadata Classes. The Registry information model: (1) Defines what types of objects are stored in the Registry; (2) Defines how stored objects are organized in the Registry; (3) Is based on ebXML metamodels from various working groups. How the Registry Information Model Works: Implementers of the ebXML Registry may use the information model to determine which Classes to include in their Registry Implementation and what attributes and methods these Classes may have. They may also use it to determine what sort of database schema their Registry Implementation may need.
The Registry Information Model may be implemented within an ebXML Registry in the form of a relational database schema, object database schema or some other physical schema. It may also be implemented as interfaces and Classes within a Registry Implementation. If an Implementation claims Conformance to this specification then it supports all required information model Classes and interfaces, their attributes and their semantic definitions that are visible through the ebXML Registry Services.... Standardized taxonomies also referred to as ontologies, classification schemes, or coding schemes exist in various industries to provide a structured coded vocabulary. The ebXML Registry does not define support for specific taxonomies. Instead it provides a general capability to link RegistryEntries to codes defined by various taxonomies. The information model provides two alternatives for using standardized taxonomies for Classification of RegistryEntries. The information model provides a full-featured taxonomy based Classification alternative based on Classification, ClassificationScheme and ClassificationNode instances. This alternative requires that a standard taxonomy be imported into the Registry as a Classification tree consisting of ClassificationNode instances rooted under a ClassificationScheme instance. This specification does not prescribe the transformation tools necessary to convert standard taxonomies into ebXML Registry Classification trees. However, the transformation must ensure that: (1) The name attribute of the ClassificationScheme instance is the name of the standard taxonomy (e.g., NAICS, ICD-9, SNOMED). (2) All codes in the standard taxonomy are preserved in the code attribute of a ClassificationNode. (3) The intended structure of the standard taxonomy is preserved in the ClassificationNode tree, thus allowing polymorphic browse and drill down discovery. 
This means that when searching for entries classified by Asia, a client will find entries classified by descendants of Asia (e.g., Japan and Korea)..." [source]
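The descendant-inclusive discovery described in this excerpt (a search for entries classified by Asia also finds entries classified by Japan and Korea) can be illustrated with a minimal sketch. It is not taken from the RIM specification: only the names ClassificationNode and its code attribute come from the excerpt, while the `children` list, the `descendants` and `find_entries` helpers, the in-memory `classified` mapping, and the entry ids are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Set


@dataclass
class ClassificationNode:
    """One code in an imported taxonomy (hypothetical, simplified shape)."""
    code: str
    children: List["ClassificationNode"] = field(default_factory=list)

    def descendants(self) -> Iterator["ClassificationNode"]:
        """Yield this node and every node below it, depth first."""
        yield self
        for child in self.children:
            yield from child.descendants()


def find_entries(node: ClassificationNode,
                 classified: Dict[str, Set[str]]) -> Set[str]:
    """Return entries classified by `node` or by any of its descendants.

    `classified` maps a taxonomy code to the ids of the entries classified
    under that code; it is an assumed stand-in for a real registry store.
    """
    found: Set[str] = set()
    for n in node.descendants():
        found |= classified.get(n.code, set())
    return found


# Illustrative geography taxonomy and made-up entry ids.
asia = ClassificationNode("Asia", [ClassificationNode("Japan"),
                                   ClassificationNode("Korea")])
classified = {"Japan": {"entry-1"}, "Korea": {"entry-2"}, "Europe": {"entry-3"}}

print(sorted(find_entries(asia, classified)))
```

Keeping the taxonomy structure as a tree, as requirement (3) in the excerpt asks, is what makes this kind of lookup a simple traversal.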

  • [August 30, 2001] "Document Object Model (DOM) Level 3 XPath Specification Version 1.0." W3C Working Draft 30-August-2001. Edited by Ray Whitmer (Netscape/AOL). Philippe Le Hégaret wrote: "Following feedbacks on the lists, we changed the design of the API and dropped the dependency on XPath 2.0. All issues should be addressed..." Abstract: "This specification defines the Document Object Model Level 3 XPath. It provides simple functionalities to access a DOM tree using XPath 1.0. This module builds on top of the Document Object Model Level 3 Core." From the Introduction: "XPath is becoming an important part of a variety of many specifications including XForms, XPointer, XSL, CSS, and so on. It is also a clear advantage for user applications which use DOM to be able to use XPath expressions to locate nodes automatically and declaratively. But liveness issues have plagued each attempt to get a list of DOM nodes matching specific criteria, as would be expected for an XPath API. There have also traditionally been model mismatches between DOM and XPath. This proposal specifies new interfaces and approaches to resolving these issues... The XPath model relies on the XML Information Set [XML Information set] and represents Character Information Items in a single logical text node where DOM may have multiple fragmented Text nodes due to cdata sections, entity references, etc. Instead of returning multiple nodes where XPath sees a single logical text node, only the first non-empty DOM Text node of any logical XPath text will be returned in the node set. Applications using XPath in an environment with fragmented text nodes must manually gather the text of a single logical text node from multiple nodes beginning with the first Text node identified by the implementation..." Also in PDF, Postscript, and single HTML file. General references in "W3C Document Object Model (DOM)."
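The manual gathering step mentioned at the end of this excerpt can be sketched as follows. This is only an illustration of the fragmentation problem, not the API defined in the Working Draft: it uses Python's standard `xml.dom.minidom` module, and the helper name `logical_text` and the sample markup are assumptions.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def logical_text(first):
    """Concatenate `first` and the adjacent Text/CDATA siblings that follow it.

    `first` is assumed to be the first DOM text node of one logical text run.
    """
    text_types = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    parts = []
    node = first
    while node is not None and node.nodeType in text_types:
        parts.append(node.data)
        node = node.nextSibling
    return "".join(parts)


# A CDATA section in the middle of the content fragments the DOM text.
doc = minidom.parseString("<p>alpha <![CDATA[<beta>]]> gamma</p>")
p = doc.documentElement

print(len(p.childNodes))           # number of DOM child nodes under <p>
print(logical_text(p.firstChild))  # text of the single logical text node
```

Starting from the first text node and walking `nextSibling` mirrors the excerpt's description: the implementation hands back one node, and the application collects the rest of the run itself.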

  • [August 25, 2001] "How and Where XML is Changing the Markets." By Anthony B. Coates (Reuters). Presented at XML Europe 2001. "XML has taken root in the financial world, but only the first pieces of the puzzle are in place. This paper presents an overview of the available financial XML specifications, what their scopes are, and how they relate to each other in practice. Reuters is the world's largest supplier of financial data, so this is a practical discussion of what is and is not possible at present, and what is in the pipeline going forward. XBRL, FpML, IRML, ISO15022, and others are covered... The financial area is a large and complicated one, and XML is still only a recent technology. The people who understand financial information and transactions the best are not the ones who understand XML the best, so it is only to be expected that much of the XML work in finance is still at the level of bringing the right people together. FpML and XBRL particularly have made good progress in creating credible and open financial XML specifications, but the architectural mismatch between these two may cause problems in the future, should they ever start to cover the same areas of the financial landscape. There is not yet an XML specification that covers the bulk of financial information requirements, nor an XML specification which covers the bulk of financial transaction requirements. However, the financial community is clearly driving towards these, while at the same time trying not to over-divide the available development resources by pursuing simultaneously too many holy grails. Vertical approaches to creating financial specifications are a good way to prove the viability of XML-based solutions to the business managers whose support is needed if XML is to cover financial needs comprehensively. 
It remains to be seen how a consortium with experience in a particular vertical area will be able to expand its membership and/or skills base in order to extend the scope of its specification(s). There remains a lot of committee work to be done between where we are now and where we want to be. An important point that now has been proven by experience is that separating the XML structural decisions from the product/vocabulary decisions is a very good way to allow the people who know XML to do XML, and the people who are domain experts to worry about domain issues rather than angle-brackets. It is against this background that Reuters is developing its MarketsML family of XML Schemas, with the aim of uniting the representation of financial information via a consistent architectural approach. MarketsML will interoperate with other XML specifications via transformation, and has the express aim of being able to fully represent the data models of major financial XML specifications. As the XML Schema Working Group (WG) has been trying to get across of late, it is the data model (Information Set) that is important, not the particular element names nor the particular attribute names." Also in HTML. For related references, see the news item "Market Data Definition Language (MDDL) Advances Toward Version 1.0 Release." [cache]

  • [August 24, 2001] "Managing Knowledge for Strategic Business Analyst: The Executive Information Portal." By John Mylopoulos, Attila Barta, Raoul Jarvis, Patricia Rodriguez-Gianolli, and Shun Zhou. University of Toronto. Draft version 03/03/01. 18 pages. "Strategic business analysts keep track of trends that are relevant to their organization and its strategic objectives. To accomplish their mission, they monitor news stories and other reports as they become available, looking for evidence that these objectives remain on track, or have encountered obstacles. The paper presents a prototype enterprise information portal intended to support this type of knowledge work. The system supports three key functions. Firstly, it offers tools for building and analyzing semantic models of strategic objectives, the relationships among them, as well as events and actors who can play a role, positive or negative, in their fulfillment. Secondly, the system uses a powerful query language and the semantic model to assist analysts as they search external sources for relevant material, also provides semi-automatic classification and clustering of documents. Thirdly, documents are placed in an XML format and stored in an XML server. In addition, analysts can annotate or summarize documents and relate them to nodes of the semantic model... Once a document is registered in DocMan [content management system with XML data server], it can be annotated or summarized. Annotations can be thought as 'knowledge records' for our system in the sense that they facilitate the exchange of knowledge between a group of collaborating business analysts. Moreover, annotations can be used as discussion threads. Furthermore, DocMan allows annotations to be broadcast, via e-mail, to the entire analyst group. Document summaries constitute another important form of knowledge sharing. 
A document might have many summaries attached which represent the perspectives of many analysts who have reviewed the document. In addition, DocMan offers support for attaching keywords to registered documents. This function can be used in order to update the list of keywords attached to any one document by the classification component of EXIP. Annotations, as well as summaries and keywords are all encoded in XML. Annotations and summaries are separate XML documents, linked to the document they refer. The attached keywords are incorporated in the original document, now in XML format. Parts of the annotation and summary documents are the username of the analyst that authored the annotation/summary. We need this information in order to search and mine the data with respect to the author. Another DocMan feature is its support of semantic relationships among documents. For example, a document may support or contradict or follow-up or simply relate to another document. Such relationships are inserted by users and are part of the EXIP semantic model. In the internal XML representation of documents, a relationship consists of a relationship tag with the corresponding attribute for the relationship type... Since annotations and summaries are represented in XML format, they are easy to search. Thus, we use the ToX search engine to search not only documents but also annotations and summaries. Moreover, using the XML structure we can also follow the relationships among documents... Documents are transformed from a domestic format used within each document source into an XML schema. The schema consists of three sections. The first is the TOPICS section, used for general topic classification of the document. The second is the MODEL section which is used for model classification. Here the GOALS, EVENTS, and LINKS subsections assign a document to nodes in the semantic model. 
The third is the DOCUMENT section, and it includes relevant document fields, such as TITLE, AUTHOR and the like. Once wrappers have selected documents from a particular document source, the documents are parsed and transformed into the EXIP XML format..." [Summary:] "We have presented a prototype EIP intended to support a group of knowledge workers in retrieving, classifying and using information downloaded from a variety of information sources. Our approach differs from state-of-practice EIPs in that it treats all information managed by the system as semi-structured data, thereby exploiting tools such as declarative query languages and XML data servers. Moreover, our approach adopts a Machine Learning framework for quick learning and continuous evolution of its classifiers and clustering algorithms. Finally, our framework supports lightweight semantic models of relevant domain knowledge which is used for classification, retrieval and analysis." [cache, PDF]
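The three-section document layout described in this excerpt might look roughly like the record built below. This is a sketch under stated assumptions, not the paper's actual schema: the section names TOPICS, MODEL, GOALS, EVENTS, LINKS, DOCUMENT, TITLE and AUTHOR come from the excerpt, whereas the root element `EXIPDOC`, the child elements `TOPIC` and `GOAL`, the function `build_record`, and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_record(title, author, topics, goals):
    """Assemble a record with TOPICS, MODEL and DOCUMENT sections."""
    root = ET.Element("EXIPDOC")                    # hypothetical root name

    topics_el = ET.SubElement(root, "TOPICS")       # general topic classification
    for topic in topics:
        ET.SubElement(topics_el, "TOPIC").text = topic   # hypothetical child

    model = ET.SubElement(root, "MODEL")            # semantic-model classification
    goals_el = ET.SubElement(model, "GOALS")
    for goal in goals:
        ET.SubElement(goals_el, "GOAL").text = goal      # hypothetical child
    ET.SubElement(model, "EVENTS")
    ET.SubElement(model, "LINKS")

    doc = ET.SubElement(root, "DOCUMENT")           # document fields
    ET.SubElement(doc, "TITLE").text = title
    ET.SubElement(doc, "AUTHOR").text = author
    return root


# Made-up sample values, for illustration only.
record = build_record("Sample headline", "A. Writer",
                      topics=["energy"], goals=["reduce costs"])
print(ET.tostring(record, encoding="unicode"))
```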

  • [August 24, 2001] "On the Integration of Topic Maps and RDF Data." By Martin Lacher and Stefan Decker. 14 pages. Paper presented "on the integration of Topic Maps and RDF" at the August 2001 'Semantic Web Workshop' at Stanford. ['We provide a way to make Topic Map sources RDF-queriable by exchanging one layer in a layered data model stack. The exchanged layer is the layer on which both are represented as a graph; we use TMPM4 as a Topic Map graph representation. Our approach complies with what Graham Moore has termed in his XML Europe paper "modeling the model".'] "Topic Maps and RDF are two independently developed paradigms and standards for the representation, interchange, and exploitation of model-based data on the web. Each paradigm has established its own user communities. Each of the standards allows data to be represented as a graph with nodes and labeled arcs which can be serialized in one or more XML- or SGML-based syntaxes. However, the two data models have significant conceptual differences. A central goal of both paradigms is to define an interchangeable format for the exchange of knowledge on the Web. In order to prevent a partition of the Web into collections of incompatible resources, it is reasonable to seek ways for integration of Topic Maps with RDF. A first step is made by representing Topic Map information as RDF information and thus allowing Topic Map information to be queried by an RDF-aware infrastructure. To achieve this goal, we map a Topic Map graph model to the RDF graph model. All information from the Topic Map is preserved, such that the mapping is reversible. The mapping is performed by modeling the graph features of a Topic Map graph model with an RDF graph. The result of the mapping is an RDF-based internal representation of Topic Maps data that can be queried as an RDF source by an RDF-aware query processor... Interoperability is of greatest importance for the future Semantic Web. 
We suggested a way to achieve interoperability between Topic Maps and RDF, which enables the joint querying of RDF and Topic Maps information sources. Our work builds on existing work on general approaches for the integration of model based information resources. In contrast to those general approaches we showed a detailed mapping specifically from XTM Topic Maps to RDF. We achieved this by adopting an internal graph representation for Topic Maps, which has been published as part of one of the processing models for Topic Maps. We perform a graph transformation to generate an RDF graph from the Topic Map graph representation. The Topic Map source can now be queried with an RDF query language together with RDF information sources. We see this as a first step towards the integration of the many heterogeneous information sources available on the Web today and in the future." See: (1) "(XML) Topic Maps", and (2) "Resource Description Framework (RDF)." [cache]

  • [August 24, 2001] "Speech Recognition Grammar Specification for the W3C Speech Interface Framework." W3C Working Draft 20-August-2001. Edited by Andrew Hunt (SpeechWorks International) and Scott McGlashan (PipeBeach). Previous version: 2001-01-03. The WD describes markup for grammars for use in speech recognition, and forms part of the proposals for the W3C Speech Interface Framework. Abstract: "This document defines syntax for representing grammars for use in speech recognition so that developers can specify the words and patterns of words to be listened for by a speech recognizer. The syntax of the grammar format is presented in two forms, an augmented BNF syntax and an XML syntax. The specification intends to make the two representations directly mappable and allow automatic transformations between the two forms." Detail: "The syntax of the grammar format is presented in two forms, an Augmented BNF (ABNF) syntax and an XML syntax. The specification ensures that the two representations are semantically mappable to allow automatic transformations between the two forms. (1) Augmented BNF syntax (ABNF): this is a plain-text (non-XML) representation which is similar to traditional BNF grammar and to many existing BNF-like representations commonly used in the field of speech recognition including the JSpeech Grammar Format from which this specification is derived. Augmented BNF should not be confused with Extended BNF which is used in DTDs for XML and SGML. (2) XML: This syntax uses XML elements to represent the grammar constructs and adapts designs from the PipeBeach grammar, TalkML, and a research XML variant of the JSpeech Grammar Format." Status: this is the 20th August 2001 last call Working Draft of the Speech Recognition Grammar Specification, and incorporates changes in response to feedback on the previous draft. This last call review period ends 28-September-2001.
The document has been produced as part of the W3C Voice Browser Activity. See also the public mailing list archives for 'www-voice'. See other documents in the W3C Voice Browser Activity: (1) Stochastic Language Models (N-Gram) Specification; (2) Natural Language Semantics Markup Language for the Speech Interface Framework; (3) Speech Synthesis Markup Language Specification for the Speech Interface Framework; (4) Call Control Requirements in a Voice Browser Framework.

  • [August 24, 2001] "ebXML and Interoperability. And other issues from the Terms of Reference." By Mike Rawlins. Part 3 (August 23, 2001) from the series "ebXML - A Critical Analysis" ['The 18-month ebXML joint initiative between UN/CEFACT and OASIS was declared officially completed this past May 11. Now that it is over, it is time to take a look at what it accomplished. This series presents an analysis of the products of ebXML, its success in achieving its stated objectives, and an assessment of the long-term impact of the initiative.'] "The Terms of Reference between UN/CEFACT and OASIS that laid the foundation for ebXML discuss several objectives and issues to be addressed by the initiative. In this article I'll examine how well those were achieved. The primary goal specified in the Terms of Reference for ebXML was to enable interoperability. To assess how well this goal was met I'll use the definition of interoperability from the ebXML Requirements Specification (section 2.5.1). I'll use a fairly simple approach of assigning a letter grade of A through F (with a 4-point scale) to each of the criteria. Then to keep things simple, I'll average the grades for an overall score, weighting each criterion equally. We'll start off with a perfect 'A' for achieving interoperability as the default, then lower by one or more letter grades for deficiencies such as the following: (1) Offering options, with a means to indicate the option chosen, instead of a single solution; (2) Offering several options without reasonable guidance on how to choose among them; (3) Specification not being complete. Where ebXML didn't address a criterion, I assign a failing 'F'. Here are the criteria and my grading. Where notes were included in the Requirements Specification, I have included them here. The basis of the grading is the ebXML work as of the end of the initiative in May, 2001. Further work may (I hope!) improve the grading, particularly in cases where the work is incomplete...
in regard to the deliverables called for in the Terms of Reference, ebXML did develop a set of technical specifications, but we didn't do a very good job at enabling interoperability. ebXML did not address its other two deliverables. In summary, I would have to say that we failed to meet the mandates of the Terms of Reference. This is a rather severe criticism, and I feel obliged to point out that this is by necessity a fairly subjective analysis; others may have different opinions. I again note that much of the work was incomplete or not addressed when the initiative ended, and the assessment may improve considerably as CEFACT continues the business process and core component work. I will, however, show in my next article that assessing ebXML's performance in regard to its other major goal is much more objective and clear cut. By nearly any way you look at it, ebXML was a dismal failure in bringing the benefits of e-commerce to Small to Medium Enterprises (SMEs) and developing countries..." See: "Electronic Business XML Initiative (ebXML)."
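
The grading arithmetic described in this article (letter grades on a 4-point scale, averaged with equal weights) can be sketched roughly as below. The criterion names and grades are made-up placeholders for illustration, not the scores assigned in the article.

```python
# Equal-weight averaging of letter grades on a 4-point scale.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def overall_score(grades):
    """Average the letter grades, weighting each criterion equally."""
    points = [GRADE_POINTS[g] for g in grades.values()]
    return sum(points) / len(points)

# Placeholder criteria and grades (assumptions, not taken from the article).
sample = {"criterion-1": "B", "criterion-2": "D", "criterion-3": "F"}
score = overall_score(sample)
```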

  • [August 24, 2001] "XSL Transformations (XSLT) Version 1.1." W3C Working Draft 24-August-2001. Edited by James Clark. NOTICE: "As of 24-August-2001 no further work on this draft is expected. The work on XSLT 2.0 identified a number of issues with the approaches being pursued in this document; solutions to the requirements of XSLT 1.1 will be considered in the development of XSLT 2.0 [XSLT20REQ -> XSLT Requirements Version 2.0. W3C Working Draft 14 February 2001]. Other than this paragraph, the document is unchanged from the previous version..." Abstract: "This specification defines the syntax and semantics of XSLT, which is a language for transforming XML documents into other XML documents. XSLT is designed for use as part of XSL, which is a stylesheet language for XML. In addition to XSLT, XSL includes an XML vocabulary for specifying formatting. XSL specifies the styling of an XML document by using XSLT to describe how the document is transformed into another XML document that uses the formatting vocabulary. XSLT is also designed to be used independently of XSL. However, XSLT is not intended as a completely general-purpose XML transformation language. Rather it is designed primarily for the kinds of transformations that are needed when XSLT is used as part of XSL." For related resources, see "Extensible Stylesheet Language (XSL/XSLT)."

  • [August 24, 2001] "Semantic Data Modeling Using XML Schemas." By Murali Mani, Dongwon Lee, and Richard R. Muntz (Department of Computer Science, University of California, Los Angeles, CA). [To be published] in Proceedings of the 20th International Conference on Conceptual Modeling (ER 2001), Yokohama, Japan, November, 2001. "Most research on XML has so far largely neglected the data modeling aspects of XML schemas. In this paper, we attempt to make a systematic approach to data modeling capabilities of XML schemas. We first formalize a core set of features among a dozen competing XML schema language proposals and introduce a new notion of XGrammar. The benefits of such formal description is that it is both concise and precise. We then compare the features of XGrammar with those of the Entity-Relationship (ER) model. We especially focus on three data modeling capabilities of XGrammar: (a) the ability to represent ordered binary relationships, (b) the ability to represent a set of semantically equivalent but structurally different types as 'one' type using the closure properties, and (c) the ability to represent recursive relationships... Ordered relationships exist commonly in practice such as the list of authors of a book. XML schemas, on the other hand, can specify such ordered relationships. Semantic data modeling using XML schemas has been studied in the recent past. ERX extends ER model so that one can represent a style sheet and a collection of documents conforming to one DTD in ERX model. But order is represented in ERX model by an additional order attribute. Other related work include a mapping from XML schema to an extended UML, and a mapping from Object-Role Modeling (ORM) to XML schema.
Our approach is different from these approaches: we focus on the new features provided by an XML Schema -- element-subelement relationships, new datatypes such as ID or IDREF(S), recursive type definitions, and the property that XGrammar is closed under union, and how they are useful to data modeling... The paper is organized as follows. In Section 2, we describe XGrammar that we propose as a formalization of XML schemas. In Section 3, we describe in detail the main features of XGrammar for data modeling. In Section 4, we show how to convert an XGrammar to EER model, and vice versa. In Section 5, an application scenario using the proposed XGrammar and EER model is given. Finally, some concluding remarks are followed in Section 6. ... [Conclusions:] In this paper, we examined several new features provided by XML schemas for data description. In particular, we examined how ordered binary relationships 1:n (through parent-child relationships and IDREFS attribute) as well as n:m (through IDREFS attribute) can be represented using an XML schema. We also examined the other features provided by XML grammars -- representing recursive relationships using recursive type definitions and union types. The EER model, conceptualized in the logical design phase, can be mapped on to XGrammar (or its equivalent) and, in turn, mapped into other final data models, such as relational data model, or in some cases, the XML data model itself (i.e., data might be stored as XML documents themselves). We believe that work presented in this paper forms a useful contribution to such scenarios." Also available in Postscript format. See related references in (1) "Conceptual Modeling and Markup Languages", and (2) "XML Schemas." [cache PDF, Postscript]
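
The idea of an ordered n:m relationship carried by an IDREFS attribute can be illustrated with a small sketch. This is not the paper's XGrammar formalism; the element names, attribute names, and data are hypothetical. Because no DTD is supplied, the parser sees `authors` as a plain string, so the code splits it on whitespace itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the book lists its authors in a significant order.
DOC = """
<library>
  <person id="p1"><name>Ada</name></person>
  <person id="p2"><name>Grace</name></person>
  <person id="p3"><name>Alan</name></person>
  <book title="Example" authors="p2 p1 p3"/>
</library>
"""

root = ET.fromstring(DOC)

# Map each ID to the person's name.
people = {p.get("id"): p.findtext("name") for p in root.iter("person")}

# Resolve the IDREFS-style attribute; token order gives the author order.
book = root.find("book")
ordered_authors = [people[ref] for ref in book.get("authors").split()]
```

The order comes from the position of each token in the attribute value, so no separate "order" attribute is needed.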

  • [August 24, 2001] "CPI: Constraints-Preserving Inlining Algorithm for Mapping XML DTD to Relational Schema." By Dongwon Lee and Wesley W. Chu (University of California at Los Angeles, Department of Computer Science, Los Angeles). 27 pages. To be published in Journal of Data and Knowledge Engineering. "As Extensible Markup Language (XML) is emerging as the data format of the Internet era, there are increasing needs to efficiently store and query XML data. One path to this goal is transforming XML data into relational format in order to use relational database technology. Although several transformation algorithms exist, they are incomplete in the sense that they focus only on structural aspects and ignore semantic aspects. In this paper, we present the semantic knowledge that needs to be captured during transformation to ensure a correct relational schema. Further, we show an algorithm that can: (1) derive such semantic knowledge from a given XML Document Type Definition (DTD), and (2) preserve the knowledge by representing it as semantic constraints in relational database terms. By combining existing transformation algorithms and our constraints-preserving algorithm, one can transform XML DTD to relational schema where correct semantics and behaviors are guaranteed by the preserved constraints. Experimental results are also presented... One way to query XML data is to reuse the established relational database techniques by converting and storing XML data in relational storage. Since the hierarchical XML and the flat relational data models are not fully compatible, the transformation is not a straightforward task. To this end, several XML-to-relational transformation algorithms have been proposed (Deutsch et al., 1998; Florescu and Kossmann, 1999; Shanmugasundaram et al., 1999). For instance, Shanmugasundaram et al. 
(1999) presents 3 algorithms that focus on the table level of the schema while Florescu and Kossmann (1999) studies different performance issues among 8 algorithms that focus on the attribute and value level of the schema. They all transform the given XML Document Type Definition (DTD) to relational schema. Similarly, Deutsch et al. (1998) presents a data mining-based algorithm that instead uses XML documents directly without a DTD. Although they work well for the given applications, they miss one important point. That is, the transformation algorithms only capture the structure of a DTD and ignore the hidden semantic constraints... Our experimental results reveal that constraints can be systematically preserved during the conversion from XML to relational schema. Such constraints can also be used for semantic query optimization or semantic caching... Despite the obstacles in converting from XML to relational models and vice versa, there are several practical benefits: (1) Considering the present market that is mostly dominated by RDB products, it is not easy nor practical to abandon RDB to support XML. It is very likely that industries would be reluctant to adopt the new technology if it does not support the existing RDB techniques as they were reluctant towards object-oriented database in the past. (2) By using RDB as an underlying storage system, the mature RDB techniques can be leveraged. That is, a vast number of sophisticated techniques (e.g., OLAP, Data Mining, Data Warehousing, etc.) developed for RDB can be applied to XML data with minimal changes. (3) The integration of a large amount of XML data on the Web with the legacy data in relational format is possible. We strongly believe that devising more accurate and efficient conversion methodologies between XML and relational models is very important and our CPI algorithm can serve as an enhancement for such conversion algorithms. 
The prototype of CPI algorithm is available online at The interested readers are welcome to experiment, improve and extend further." Also in Postscript format. See: "XML and Databases." [cache]
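
One kind of "hidden" semantic knowledge in a DTD is the cardinality implied by occurrence indicators in a content model. The sketch below is not the CPI algorithm; it is a minimal illustration, under the assumption of a flat comma-separated content model, of how such indicators could be read as nullability and multiplicity hints for a relational schema. The function name and the sample content model are hypothetical.

```python
import re

# Occurrence indicator -> (minimum, maximum); None means unbounded.
CARDINALITY = {"": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}

def child_constraints(content_model):
    """Read cardinality hints from a flat DTD sequence such as '(a, b+, c?)'."""
    result = {}
    for name, indicator in re.findall(r"([A-Za-z_][\w.-]*)\s*([?*+]?)", content_model):
        minimum, maximum = CARDINALITY[indicator]
        result[name] = {
            "not_null": minimum >= 1,      # child must be present
            "multi_valued": maximum is None,  # child may repeat
        }
    return result

# Hypothetical content model for <!ELEMENT paper (...)>.
constraints = child_constraints("(title, author+, abstract?, keyword*)")
```

Here `title` would map to a NOT NULL column, while `author` suggests a separate table with at least one row per paper.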

  • [August 24, 2001] "Document Object Model (DOM) Level 3 Events Specification Version 1.0." W3C Working Draft 23-August-2001. Edited by Tom Pixley (Netscape Communications Corporation). Part of the W3C DOM Activity. See also "Changes between DOM Level 2 Events and DOM Level 3 Events." "This specification defines the Document Object Model Events Level 3, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. The Document Object Model Events Level 3 builds on the Document Object Model Events Level 2..." Details: "The DOM Level 3 Event Model is designed with two main goals. The first goal is the design of a generic event system which allows registration of event handlers, describes event flow through a tree structure, and provides basic contextual information for each event. Additionally, the specification will provide standard modules of events for user interface control and document mutation notifications, including defined contextual information for each of these event modules. The second goal of the event model is to provide a common subset of the current event systems used in DOM Level 0 browsers. This is intended to foster interoperability of existing scripts and content. It is not expected that this goal will be met with full backwards compatibility. However, the specification attempts to achieve this when possible. The following sections of the Event Model specification define both the specification for the DOM Event Model and a number of conformant event modules designed for use within the model. The Event Model consists of the two sections on event propagation and event listener registration and the Event interface..." See DOM Level 3 Specifications.
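
The event flow through a tree that the draft describes (listener registration, a capture phase down from the root, delivery at the target, then bubbling back up) can be modeled in a few lines. This is a toy sketch, not the W3C interfaces: the `Node` class, `dispatch` function, and method names are hypothetical, and it follows the DOM Level 2 behavior in which capturing listeners do not fire on the target itself.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.listeners = []  # (event_type, callback, use_capture)

    def add_event_listener(self, event_type, callback, use_capture=False):
        self.listeners.append((event_type, callback, use_capture))

    def fire(self, event_type, capture):
        for registered_type, callback, use_capture in self.listeners:
            if registered_type == event_type and use_capture == capture:
                callback(self)

def dispatch(target, event_type):
    # Collect ancestors from the target's parent up to the root.
    ancestors = []
    node = target.parent
    while node is not None:
        ancestors.append(node)
        node = node.parent
    for node in reversed(ancestors):         # capture phase: root downward
        node.fire(event_type, capture=True)
    target.fire(event_type, capture=False)   # at the target
    for node in ancestors:                   # bubble phase: parent upward
        node.fire(event_type, capture=False)

log = []
document = Node("document")
body = Node("body", document)
button = Node("button", body)
document.add_event_listener("click", lambda n: log.append("capture:" + n.name), use_capture=True)
button.add_event_listener("click", lambda n: log.append("target:" + n.name))
body.add_event_listener("click", lambda n: log.append("bubble:" + n.name))
document.add_event_listener("click", lambda n: log.append("bubble:" + n.name))
dispatch(button, "click")
```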

  • [August 24, 2001] "Just Over the Horizon ... a new DOM. A preview of DOM Level 3." By Brett McLaughlin (Enhydra strategist, Lutris Technologies). From IBM developerWorks. August 2001. ['This article previews the W3C's XML Document Object Model Level 3, due to be released toward the end of 2001 or early in 2002. Java developer and author Brett McLaughlin gives an overview of key features in the forthcoming version of the DOM, which will offer better access to pieces of information in an XML document, better comparisons, and a much-needed bootstrapping process. Six short code samples demonstrate some new methods.'] "The Document Object Model (DOM) is arguably the most popular API for manipulating XML in use today. It presents an XML document in an object-based form, making it simple to manipulate for Java programmers and other developers who are already familiar with objects. Additionally, it works across languages, providing access to XML in JavaScript/ECMAScript, C, and other languages. While this article's code samples are all in Java, the changes detailed will be available in all language bindings (the mapping of the specification to a specific programming language) of the DOM Level 3 specification. The current version of the DOM specification, DOM Level 2, is in widespread use in production applications all over the world. However, there are some recognized problems with this version of the specification: most notably, the inability to bootstrap a DOM implementation. Bootstrapping provides a way to load a DOM implementation without vendor-specific code, which is critical in allowing an application to run with various parsers. In addition, node comparisons are fairly limited in the current DOM specification, and some items in an XML document are not available, most notably the XML declaration (<?xml version="1.0" standalone="yes"?>, for example). Happily, DOM Level 3, as currently described in draft form, rectifies all of these problems. 
In this article, I'll show you how, and I'll give you a sneak preview of what to look for in this new and improved version of the specification..." See (1) the W3C DOM web site, and (2) DOM Level 3 Specifications.
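
The article's samples are in Java; as a rough analog only, Python's standard library offers a comparable vendor-neutral bootstrap through `xml.dom.getDOMImplementation()`, and its `minidom` nodes provide the `isSameNode` comparison. The sketch below assumes the default implementation shipped with Python; the element names are hypothetical.

```python
from xml.dom import getDOMImplementation

# Bootstrap: ask for "a" DOM implementation without naming a vendor class.
impl = getDOMImplementation()

# Create a document with a <catalog> root and one child element.
doc = impl.createDocument(None, "catalog", None)
item = doc.createElement("item")
doc.documentElement.appendChild(item)

# Node comparison: is this the very same node object in the tree?
same = item.isSameNode(doc.documentElement.firstChild)
root_name = doc.documentElement.tagName
```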

  • [August 24, 2001] "Indexing XML Data with ToXin." By Flavio Rizzolo and Alberto Mendelzon. Paper presented at the Fourth International Workshop on the Web and Databases (in conjunction with ACM SIGMOD 2001). Santa Barbara, CA. May 2001. "Indexing schemes for semistructured data have been developed in recent years to optimize path query processing by summarizing path information. However, most of these schemes can only be applied to some query processing stages whereas others only support a limited class of queries. To overcome these limitations we developed ToXin, an indexing scheme for XML data that fully exploits the overall path structure of the database in all query processing stages. ToXin consists of two different types of structures: a path index that summarizes all paths in the database and can be used for both forward and backward navigation starting from any node, and a value index that supports predicates over values. ToXin synthesizes ideas from object-oriented path indexes and extends them to the semistructured realm of XML data. In this paper we present the ToXin architecture, describe its current implementation, and discuss comparative performance results... [Conclusions:] In this paper we presented ToXin, an indexing scheme that supports path queries over XML data. We discussed its architecture, related work and presented performance results. The motivation for this work was to overcome some limitations of current indexing proposals for semistructured data, such as the lack of support for all query processing stages (Dataguides, 1-indexes and 2-indexes), and the need for an explicit specification of the paths to index (T-indexes). To that end, we combined ideas from Dataguides and access support relations in a way that allows us to use the index in all the query evaluation stages for any general path query. The experimental results suggest that the query types that benefit the most by using ToXin are those with large query answers. 
In addition, the closer to the root the filter section starts, the wider the difference in performance between ToXin and other indexing schemes. We are currently extending the work presented here in several ways. The main directions are: adding order to the index structure; implementing the ToXin graph, by extending the ToXin tree with the semantics of the IDRefs; making the index persistent; and investigating ways to extend ToXin so it can be used as an alternative to DOM for storing, querying and updating XML documents." See details in the corresponding news item 2001-08-24.
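
The two structures named in the abstract (a path index supporting forward and backward navigation, and a value index for predicates) can be approximated with ordinary dictionaries. This is only a sketch of the general idea, not the ToXin implementation; the sample document, function name, and query are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = ("<bib><book><title>XML</title><year>2001</year></book>"
       "<book><title>DOM</title><year>2000</year></book></bib>")

def build_indexes(root):
    path_index = defaultdict(list)   # label path -> nodes reached by it
    parent_of = {}                   # node -> parent (backward navigation)
    value_index = defaultdict(list)  # text value -> nodes holding it

    def visit(node, path):
        path_index[path].append(node)
        text = (node.text or "").strip()
        if text:
            value_index[text].append(node)
        for child in node:
            parent_of[child] = node
            visit(child, path + "/" + child.tag)

    visit(root, "/" + root.tag)
    return path_index, parent_of, value_index

root = ET.fromstring(DOC)
path_index, parent_of, value_index = build_indexes(root)

# Query: titles of books whose year is 2001.
# Filter with the value index, then navigate backward to the book.
hits = [n for n in value_index["2001"] if n in path_index["/bib/book/year"]]
titles = [parent_of[n].findtext("title") for n in hits]
```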

  • [August 24, 2001] "ToX: The Toronto XML Engine." By Denilson Barbosa, Attila Barta, Alberto Mendelzon, George Mihaila, Flavio Rizzolo, and Patricia Rodriguez-Gianolli. Paper presented at the First International Workshop on Information Integration on the Web. Rio de Janeiro, Brazil. "We present ToX -- the Toronto XML Engine -- a repository for XML data and metadata, which supports real and virtual XML documents. Real documents are stored as files or mapped into relational or object databases, depending on their structuredness; indices are defined according to the storage method used. Virtual documents can be remote documents, defined as arbitrary WebOQL queries, or views, defined as queries over documents registered in the system. The system catalog contains metadata for the documents, especially their schemata, used for query processing and optimization. Queries can range over both the catalog and the documents, and multiple query languages are supported. In this paper we describe the architecture and main components of ToX; we present our indexing and storage strategies, including two novel techniques; and we discuss our query processing strategy. The project started recently and is under active development [2001]." See details in the corresponding news item 2001-08-24.

  • [August 24, 2001] "Speech Technology Grows Up. Speech applications can save money and the technology is moving into advanced applications." By Kathleen Ohlson. In Network World Fusion (August 20, 2001). "... In the coming months, voice technology will only get better, observers say. Industry experts and vendors expect support for VoiceXML, a specification that would enable speech-based applications and online information to become phone and voice accessible, and the infusion of speech recognition in wireless devices, such as cell phones and PDAs, to flourish. Thrifty has deployed SpeechWorks' interactive speech recognition software to handle customer requests for car rental quotes. Customers who call Thrifty's reservation number are prompted to give information regarding dates, times, car size, city and airport, and then receive reservation information. When a customer wants to book a reservation, he is transferred to a sales agent. The agent receives the calls and information containing the customer's requests on his computer screen. The car agency has handled more than 200,000 calls so far through the system, and it plans to push over more by summer's end. Thrifty receives 4 million calls per year with 30% to 40% coming from customers checking rates and availability, according to DuPont, staff vice president of reservations... In addition to Thrifty, United Airlines and T. Rowe Price are two companies that have recently implemented interactive speech systems. Speech technology is also expected to penetrate in areas such as inventory tracking and salesforce automation, according to industry experts. For example, salespeople could prompt for information regarding their contacts and calendars through a phone... One of the main drivers of speech technology in the coming months will be the adoption of VoiceXML, which basically outlines a common way for speech applications to be programmed. 
With the adoption of VoiceXML, businesses would only need to build an application once and then could run it on multiple vendor platforms. VoiceXML is the brainchild of IBM, AT&T, Lucent and Motorola, and is currently supported by more than 500 companies, including Nokia, Sprint PCS, Nuance and SpeechWorks. SpeechWorks recently rolled out its VoiceXML-based speech recognition engine OpenSpeech Recognizer 1.0; Nuance, Lucent, IBM and others have implemented VoiceXML into their products..." See "VoiceXML Forum."

  • [August 24, 2001] "Voice XML Version 2 Stalled Over IP Issue." By Ephraim Schwartz. In InfoWorld August 24, 2001. "Version 2 of the Voice XML markup language is all but signed and sealed, but not quite delivered due to a snag in nailing down IP (intellectual property) rights. According to an industry analyst familiar with the issues discussed at the Voice XML Forum, all the specifications have been agreed upon, but there is a concern still that a future developer using VXML could be sued by a member of the Forum for infringement of IP rights... One solution may be that companies [currently 55 Forum members] might choose to provide license-free use or forego patent rights, Meisel added. All sources in the speech technology industry see VXML as a boon to the industry because it uses a standard language already familiar to Web developers. Version 2, expected to ship by the end of the year, is in its final development stages, according to the Forum chairman Bill Dykas... Up until now, developers creating speech applications used proprietary formats for writing speech grammars. A speech grammar is needed to map a wide range of responses into a narrower range, explained Dykas. For example, in a 'yes/no grammar' there may be a dozen ways for a caller to respond in the affirmative to a question including yeah, yes, okay, please, and alright which all can be mapped to Yes. Version 2 of VXML will define a common format so the program has to deal with only a single response. The second major addition to the standard -- the Voice XML Forum is working with the W3C standards body -- is the clarification of the call transfer tags... technology components, as for example in telephony: how to manipulate telephone voice mail and load balancing between mechanisms if a large number of calls come in simultaneously... Other areas include natural language understanding and multimodal interfaces for handhelds and cellular handsets. 
For example, in using a multimodal interface, a mobile worker may make a voice request to a database for customers that match a certain set of parameters, but the results will be displayed rather than spoken." See "VoiceXML Forum."
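
The "yes/no grammar" described above, in which many spoken responses map to a narrower set of results, amounts to a many-to-one lookup. The sketch below is plain Python rather than any VoiceXML grammar format; the negative responses and the function name are assumptions added for illustration.

```python
# Many caller responses collapse to a small set of canonical answers.
YES_NO_GRAMMAR = {
    "yes": "Yes", "yeah": "Yes", "okay": "Yes", "please": "Yes", "alright": "Yes",
    "no": "No", "nope": "No",  # negative forms are assumed, not from the article
}

def interpret(utterance):
    """Return the canonical answer, or None if the response is not in the grammar."""
    return YES_NO_GRAMMAR.get(utterance.strip().lower())

answers = [interpret(u) for u in ["Yeah", " okay ", "nope", "maybe"]]
```

The application then handles only "Yes", "No", or an unrecognized response, regardless of how the caller phrased it.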

  • [August 23, 2001] "P3P: Protector Of Consumers' Online Privacy." By Jason Levitt. In InformationWeek (August 20, 2001), pages 44-46. "While various security standards and technologies have emerged in recent years, few technological innovations have evolved to help protect the privacy of personal information. P3P, the Platform for Privacy Preferences, is perhaps the first technology that consumers will encounter, because it will be part of Microsoft's forthcoming Internet Explorer 6.0 browser. P3P is a World Wide Web Consortium standard designed to help users gain control over the use of their personal data. The standard is starting to appear on Web sites and in software products. The primary purpose of P3P is to turn the fine print of a Web site's privacy policy into something that users can understand. P3P should help consumers make informed decisions about whether to share their personal information with a Web site. To accomplish this goal, P3P must be deployed on both clients and servers. On the server side, Web sites must encode their privacy policies in a machine-readable XML language. Users who access the Web site using a P3P-compliant client, such as Internet Explorer 6.0, can review the sites' privacy policies and decide whether they want to divulge any personal information. While many E-commerce sites have online privacy policies, these policies are often written in legalese that's hard for users to understand. P3P's XML language will encourage sites to express their privacy policies with precision and specify exactly what they'll do with users' private information. For sites that want to deploy P3P, translating their current privacy practices into P3P's XML language will be a primary challenge. This can be tedious, because P3P requires exact answers for many privacy questions... 
Consultants such as PricewaterhouseCoopers have helped companies deploy privacy policies, and P3P generator tools such as IBM's P3P Editor and Microsoft's Privacy Wizard help translate natural-language privacy policies into P3P's XML privacy language... AT&T, IBM's Tivoli subsidiary, and NEC are other vendors that are committed to supporting P3P in various products and services. But many other software makers aren't yet committed to P3P. 'At the moment, we aren't sure whether P3P is the best solution,' says Live Leer, a PR manager for Opera Software AS, creators of the Opera Web browser. Similarly, P3P isn't in Netscape's version 6.1 browser, released last week, or America Online's software, which is used by 30 million people. With the release of Internet Explorer 6.0, it's certain that P3P will be on some user desktops this fall, but will it make a difference in users' online privacy experience? Ultimately, P3P will have little effect unless sites deploy it and there are sufficient privacy laws to back it up..." See W3C Privacy and P3P and "P3P Specification: Platform for Privacy Preferences."
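
The client-side step described above, reading a site's machine-readable policy and checking it against what the user is willing to accept, might look roughly like this. The policy is a heavily simplified, namespace-free fragment loosely modeled on P3P element names; it is not a complete or valid P3P policy, and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified policy fragment (assumption: real policies carry a namespace
# and many more required elements).
POLICY = """
<POLICY name="sample">
  <STATEMENT>
    <PURPOSE><current/><telemarketing/></PURPOSE>
    <RECIPIENT><ours/></RECIPIENT>
  </STATEMENT>
</POLICY>
"""

def purposes(policy_xml):
    """Collect every purpose declared in any STATEMENT of the policy."""
    root = ET.fromstring(policy_xml)
    return {child.tag for purpose in root.iter("PURPOSE") for child in purpose}

def conflicts(policy_xml, refused):
    """Return the declared purposes that the user has refused."""
    return purposes(policy_xml) & set(refused)

declared = purposes(POLICY)
problem = conflicts(POLICY, {"telemarketing"})
```

A non-empty result is the point at which a P3P-aware client could warn the user before any personal data is sent.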

  • [August 23, 2001] "Microsoft Adopts P3P In Internet Explorer 6.0." By Jason Levitt. In InformationWeek (August 20, 2001), page 46. "Web-site operators and users may be in for a surprise when Microsoft's Internet Explorer 6.0 browser is released Oct. 25. That's because users may be confronted with privacy warnings displayed by the browser, and some Web sites may receive complaints from those users. The warnings are the result of Microsoft's implementation of the Platform for Privacy Preferences. P3P is a World Wide Web Consortium standard that lets Web sites display their privacy policies so that users can decide how much personal data they want to reveal to the site. For users, the default privacy settings in Internet Explorer 6.0 will block all third-party cookies. Cookies are bits of data that Web sites use to store information on a user's computer such as a logon name and password so users don't have to enter that data each time they visit the site. Cookies can also track users' behavior, such as the sites they visit and products they buy. Third-party cookies come from Web sites other than the site the user is browsing. Typically, ad banners are the most common source of third-party cookies. Internet Explorer 6.0 will display a warning dialog box and an icon in the browser status bar the first time a user encounters a non-P3P-compliant site that attempts to store third-party cookies on the user's machine... Some sites might use forms or other mechanisms to transmit personal data, says Lorrie Cranor, chair of the P3P specification working group at the World Wide Web Consortium and a researcher at AT&T Labs. That's why Cranor is working on an ActiveX Control for Internet Explorer that will offer more comprehensive P3P reporting than IE 6.0. The ActiveX Control will look at personal data sent via forms. 
Among other things, it will put an icon at the top of the browser window that will change to indicate whether a site is P3P-enabled and, if it is, whether or not it matches the user's preferences... For information about Microsoft's P3P initiatives, see the privacy wizard." See W3C Privacy and P3P and "P3P Specification: Platform for Privacy Preferences."

  • [August 23, 2001] "XML for Data: Four Tips for Smart Architecture." By Kevin Williams (Chief XML Architect, Equient). From IBM developerWorks. August 2001. ['This column tells how to avoid some common mistakes that even smart architects make when designing an XML solution. XML architect and author Kevin Williams offers four tips for designing flexible and high performance systems.'] "XML suffers from an all-too-common problem with new technologies: I call it 'buzzworditis.' Like the C++ language and client-server architecture before that, XML has visibility at the executive level -- the nontechnical executive level. This leads to corporate memos insisting that "entire systems" need to be somehow 'converted' to XML for the good of the company. However, like C++ and client-server architecture, XML isn't an answer in and of itself; it's simply a tool you can use to help build your technical solution. By understanding the strengths and weaknesses of XML compared to other possible architectural choices, you can minimize or prevent major headaches later in the development (or maintenance) cycle. This column recommends following four general design guidelines for the judicious use of XML in the data architecture of your systems. ... This column looks at some of the ways XML fits into an overall system architecture and where it does (and doesn't) make sense. You'll see that some sort of indexing mechanism -- ideally a relational database -- should be part of your overall architecture in most cases. In short, use XML to perform the tasks it excels at, such as driving a rendering system. As you're architecting (or rearchitecting) your systems, remember that XML is just another tool in your development toolbox. You wouldn't use a screwdriver to hammer in a nail. Don't try to make XML do things it isn't designed to do well... 
Tip 1: If you don't need it, throw it away; Tip 2: Don't use XML for searching; Tip 3: Don't use XML for summarization; Tip 4: Use XML to drive rendering..."
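
The advice to keep a relational index alongside XML documents, rather than searching the XML itself, can be sketched with an in-memory SQLite table. The document shape, table name, and column names are hypothetical; the column does not prescribe a particular schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML documents, keyed by document id.
DOCS = {
    1: "<invoice><customer>Acme</customer><total>120.50</total></invoice>",
    2: "<invoice><customer>Globex</customer><total>75.00</total></invoice>",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice_index (doc_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.execute("CREATE INDEX idx_customer ON invoice_index (customer)")

# Extract only the searchable fields into the relational index.
for doc_id, xml_text in DOCS.items():
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO invoice_index VALUES (?, ?, ?)",
        (doc_id, root.findtext("customer"), float(root.findtext("total"))),
    )

# Search the index, then fetch the full XML only for the matches.
matches = [row[0] for row in conn.execute(
    "SELECT doc_id FROM invoice_index WHERE customer = ?", ("Acme",))]
matched_xml = [DOCS[doc_id] for doc_id in matches]
```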

  • [August 23, 2001] "Working with XML: The Java API for XML Parsing (JAXP) Tutorial." By Eric Armstrong. Announced by Eric Armstrong 2001-08-22. Version 1.1. 494 pages. ['A PDF version of the JAXP 1.1 tutorial is now available. HTML versions of all XML, DTD, and XSL files were created to make those files viewable in PDF (as well as in a normal browser). The tutorial pages now contain additional links to those files. The planned work on the JAXP 1.1 tutorial is now complete.'] "This tutorial covers the following topics: Part I: Understanding XML and the Java XML APIs explains the basics of XML and gives you a guide to the acronyms associated with it. It also provides an overview of the Java(TM) XML APIs you can use to manipulate XML-based data, including the Java API for XML Parsing (JAXP). To focus on XML with a minimum of programming, follow The XML Thread, below. Part II: Serial Access with the Simple API for XML (SAX) tells you how to read an XML file sequentially, and walks you through the callbacks the parser makes to event-handling methods you supply. Part III: XML and the Document Object Model (DOM) explains the structure of DOM, shows how to use it in a JTree, and shows how to create a hierarchy of objects from an XML document so you can randomly access it and modify its contents. This is also the API you use to write an XML file after creating a tree of objects in memory. Part IV: Using XSLT shows how the XSL transformation package can be used to write out a DOM as XML, convert arbitrary data to XML by creating a SAX parser, and convert XML data into a different format. Additional Information contains a description of the character encoding schemes used in the Java platform and pointers to any other information that is relevant to, but outside the scope of, this tutorial... 
Scattered throughout the tutorial there are a number of sections devoted more to explaining the basics of XML than to programming exercises: A Quick Introduction to XML; Writing a Simple XML File; Substituting and Inserting Text; Defining a Document Type; Defining Attributes and Entities; Referencing Binary Entities; Defining Parameter Entities; Designing an XML Document..." See: "Java API for XML Parsing (JAXP)." [cache]
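
The SAX model the tutorial covers, in which the parser reads the document sequentially and calls event-handling methods you supply, also exists in Python's standard library. The sketch below uses `xml.sax` as an analog of the Java API rather than JAXP itself; the handler class name and the sample document are hypothetical.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Record the callbacks the parser makes while reading the document."""

    def __init__(self):
        super().__init__()
        self.elements = []  # ("start" | "end", element name)
        self.text = []      # character data chunks

    def startElement(self, name, attrs):
        self.elements.append(("start", name))

    def characters(self, content):
        # The parser may deliver text in several chunks.
        self.text.append(content)

    def endElement(self, name):
        self.elements.append(("end", name))

handler = EventLogger()
xml.sax.parseString(b"<slideshow><slide>Intro</slide></slideshow>", handler)
collected_text = "".join(handler.text)
```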

  • [August 23, 2001] "XML DTDs for the E-PCT Standard." From Annex F, Appendix I, in Part 7 of the WIPO PCT Administrative Instructions Under the PCT Proposed Modifications Relating to the Electronic Filing and Processing of International Applications. PCT/AI/1 Add.4 Prov. Rev.3 (July 17, 2001). 84 pages. "This document presents the XML DTDs used for the electronic exchange of international application documents as defined in Annex F. It also contains details of the methodology adopted in drafting these DTDs... This specification is expected to expand in scope over time to include subsequent formal exchanges between the parties involved. Included below are some of the DTDs required for the initial phase of electronic submission of an electronic application, and for some communications in the designated Office PCT communication sector. Other DTDs will be required for later phases of the processing of an electronic PCT application (E-PCT). While the immediate goal of this specification is to support E-PCT applications, the Trilateral Offices intended to use it as the basis for their own national electronic applications for a variety of industrial-property types and recommend that it would be the basis for an eventual WIPO standard for use by other Offices. With that in mind, the DTDs created for E-PCT will be constructed in components for element definitions and from which the Trilateral Offices and others can derive elements and DTDs for their needs in a consistent and compatible manner. The other DTDs that will eventually be required will also be based on the component DTD architecture. The XML name-space facility (XMLNS) will be used to support the distinction of sub-component names in this specification and those produced by other Offices. 
As an aid to understanding the DTDs and creating additional components, an HTML-based documentation has been constructed of the elements and structures in the sources of the proposed DTDs, that will illustrate their relationships and their definitions. This HTML documentation is available on the PCT Electronic Filing web-site. Given the large number of international application document exchange DTDs, they will be built using a number of reusable standard components combined with transaction specific components. The method for producing and organizing these components will initially be based on replication among DTDs and inclusion of DTD structures using the XML feature for "external parameter entities." The emerging XML standard, called XML Schema, contains more sophisticated support for component-level structure and re-use. At some point in the future, when XML Schema has reached maturity in the industry, the DTD components may be converted to XML Schema components..." See details in the news item and in "WIPO XML DTDs for the Electronic Patent Cooperation Treaty Application." Document also available in Word .DOC format. [cache]

  • [August 23, 2001] "A New Kind of Namespace. [XML-Deviant.]" By Edd Dumbill. From August 22, 2001. "...when a technology emerges from the W3C there are always teething pains. W3C Recommendations aren't quite the shrink-wrapped parcels of authority they're often taken to be. Perhaps the canonical example of this has been the XML and Namespaces Recommendation, the somewhat unpredictable repercussions of which are still being felt; all the while, typically, the specification is heralded as a fine piece of work by some of its major supporters. This summer's problem child has undoubtedly been W3C XML Schema. One Schema issue in particular has already figured strongly in these pages: the use [of] unqualified elements within namespace-qualified types. Last week's discussion on XML-DEV, while not resolving this issue, has produced a useful crystallization of the motivation behind this technique. This article describes the motivation, the import of which seems to be that XML Schema has added to XML another form of namespaces... The inclusion of locally-scoped types in XML Schema is not an anomaly. It does come as a surprise, however, that it's taken this long to find a reasonable justification of the functionality... [thanks to a post from Matthew Fuchs] at least we know why locally scoped names were included in W3C XML Schema, something that will help the expert community develop best practices..." On namespaces generally, see references in "Namespaces in XML."

  • [August 23, 2001] "Modeling XML Vocabularies with UML: Part I." By Dave Carlson. From August 22, 2001. "The arrival of the W3C's XML Schema specification has evoked a variety of responses from software developers, system integrators, XML document analysts, authors, and designers of B2B vocabularies. Some like the richer structure and semantics that can be expressed with these new schemas as compared to DTDs, while others complain about excessive complexity. Many find that the resulting schemas are difficult to share with wider audiences of users and business partners. I look past many of these differences of opinion to view XML Schema simply as implementation syntax for models of business vocabularies. Other forms of model representation and presentation are more effective than W3C XML Schema when specifying new vocabularies or sharing definitions with users. In particular, I favor the Unified Modeling Language (UML) as a widely adopted standard for system specification and design. My goal in this article and in this series is to share some thoughts about how these two standards are complementary and to work through a simple example that makes the ideas concrete. Although this discussion is focused on the W3C XML Schema specification, the same concepts are easily transferred to other XML schema languages. Indeed, I have already applied the same techniques to creating and reverse engineering DTDs and SOX schemas, as well as RELAX, TREX, and RELAX NG. In general, I use the term 'schema' when referring to the family of XML schema languages... The focus of the present article has been capturing the conceptual model of a vocabulary, which is the logical first step in the development process. The next article presents a list of design choices and alternative approaches for mapping UML to W3C XML Schema. 
The UML model presented in this first article will be refined to reflect the design choices made by the authors of the W3C's XML Schema Primer, where this example originated. For our purposes, these authors are the stakeholders of system requirements. The third article will introduce a UML profile for XML schemas that allows all detailed design choices to be added to the model definition and then used to generate a complete schema automatically. The result is a UML model that is used to generate a W3C XML Schema, which can successfully validate XML document instances copied from the Schema Primer specification. Along the way, I'll introduce a web tool used to generate schemas from UML and reverse engineer schemas into UML." See in this connection: (1) "Conceptual Modeling and Markup Languages"; (2) the web site; (3) Carlson's book, Modeling XML Applications with UML: Practical e-Business Applications; (4) Kimber and Heintz, "Using UML to Define XML Document Types" [MLTP 2/3, 'defines a convention for the use of UML to define XML documents'].

  • [August 23, 2001] "Understanding W3C Schema Complex Types." By Donald Smith. From August 22, 2001. "Are W3C XML Schema complex types so difficult to understand that you shouldn't even bother trying? Kohsuke Kawaguchi thinks so; or so he claimed in his recent article, in which he offered assurances that you can write complex types without understanding them. My response to that assertion is to ask why would you want to write complex types without understanding them, especially when they are easily understandable? There are four things you need to know in order to understand complex types in W3C Schemas... One of the most important, but least emphasized, aspects of W3C schemas is the type hierarchy. The importance of the type hierarchy can hardly be overstated. Why? Because the syntax for expressing types in schemas follows precisely from the type hierarchy. Schema types form a hierarchy because they all derive, directly or indirectly, from the root type. The root type is anyType. (You can actually use anyType in an element declaration; it allows any content whatsoever.) The type hierarchy first branches into two groups: simple types and complex types. Here we encounter the first two of the four things you need to know in order to understand complex types: first, derivation is the basis of connection between types in the type hierarchy; and, second, the initial branching of the hierarchy is into simple and complex types. It's no wonder that people get confused about complex types. They generally don't realize that all complex types are divisible into two kinds: those with simple content and those with complex content. The reason why people don't generally realize this is because they normally learn the abbreviated syntax first. But, as we've seen, if you learn the full syntax and the logic behind it first, then the abbreviated syntax, and complex types in general, cease to be a befuddling conundrum.
If all of this is now as clear to you as it is to me, you don't have to trust anyone's assurances that you should use complex types without understanding them. You can now use and understand them..." For schema description and references, see "XML Schemas."

  • [August 22, 2001] "SAX, the Power API." By Benoit Marchal (Consultant, Pineapplesoft). From IBM developerWorks. August 2001. ['In this preview from XML by Example, compare DOM and SAX and then put SAX to work. This preview of the second edition of XML by Example by Benoit Marchal gives a solid introduction to SAX, the event-based API for processing XML that has become a de facto standard. This preview tells when to use SAX instead of DOM, gives an overview of commonly used SAX interfaces, and provides detailed examples in a Java-based application with many code samples. Used with permission of Que Publishing, a division of Pearson Technology Group.'] "Adapted from a chapter in the forthcoming second edition of XML by Example, the article serves as an introduction to SAX, the event-based API for processing XML that complements the Document Object Model, or DOM, the object-based API for XML parsers published by the W3C. You will see that SAX: (1) Is an event-based API; (2) Operates at a lower level than DOM; (3) Gives you more control than DOM; (4) Is almost always more efficient than DOM; (5) But, unfortunately, requires more work than DOM... As an object-based interface, DOM communicates with the application by explicitly building a tree of objects in memory. The tree of objects is an exact map of the tree of elements in the XML file. DOM is simple to learn and use because it closely matches the underlying XML document. It is also ideal for what I call XML-centric applications, such as browsers and editors. XML-centric applications manipulate XML documents for the sake of manipulating XML documents. For most applications, however, processing XML documents is just one task among many others. For example, an accounting package might import XML invoices, but that is not its primary activity. Balancing accounts, tracking expenditures, and matching payments against invoices are. Chances are the accounting package already has a data structure, most likely a database. 
The DOM model is ill suited to the accounting application, in that case, as the application would have to maintain two copies of the data in memory (one in the DOM tree and one in the application's own structure). At the very least, maintaining the data in memory twice is inefficient. It might not be a major problem for desktop applications, but it can bring a server to its knees. SAX is the sensible choice for non-XML-centric applications. Indeed SAX does not explicitly build the document tree in memory. It enables the application to store the data in the most efficient way..." Article also available in PDF format.
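The contrast drawn in this abstract can be illustrated with a minimal sketch. It uses Python's standard xml.sax module rather than the Java API the article itself works with, and the invoice vocabulary (invoices, invoice, amount, the customer attribute) and the names InvoiceHandler and invoice_totals are invented here for illustration. The handler writes each amount straight into the application's own dictionary of totals, so no document tree is built in memory.

```python
import xml.sax

# Hypothetical invoice format, invented for this sketch.
INVOICES = b"""<invoices>
  <invoice customer="acme"><amount>120.50</amount></invoice>
  <invoice customer="zenith"><amount>80.00</amount></invoice>
  <invoice customer="acme"><amount>19.50</amount></invoice>
</invoices>"""


class InvoiceHandler(xml.sax.ContentHandler):
    """Collects per-customer totals directly from SAX events."""

    def __init__(self):
        super().__init__()
        self.totals = {}        # the application's own data structure
        self._customer = None   # customer of the invoice being read
        self._chunks = None     # text pieces of the current <amount>

    def startElement(self, name, attrs):
        if name == "invoice":
            self._customer = attrs.get("customer")
        elif name == "amount":
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "amount":
            amount = float("".join(self._chunks))
            previous = self.totals.get(self._customer, 0.0)
            self.totals[self._customer] = previous + amount
            self._chunks = None


def invoice_totals(xml_bytes):
    """Parse the invoices and return a dict of customer -> total amount."""
    handler = InvoiceHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.totals
```

Calling invoice_totals(INVOICES) returns the totals keyed by customer. The extra bookkeeping in the handler (tracking the current customer, buffering character data) is the "more work than DOM" that the abstract mentions.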

  • [August 22, 2001] "NewsML - User Guideline for News Agencies." [Compiled by] David Allen, Managing Director of International Press Telecommunications Council (IPTC). ['The NewsAgency Implementation Guideline 2001-08 covers various aspects of implementation that have been raised by users. The guideline is marked DRAFT as it has yet to be formally approved by IPTC members.'] "NewsML is an extensible management format for news information of all types. It consists of a single XML Document Type Definition (DTD) but allows a number of different types of document instances to be valid against the DTD. The document types are NewsML with contained NewsItem and NewsComponents, TopicSets and Catalogs. A Catalog is used by a provider to give information to users on how to locate the various parts of NewsML that are used in the provider's services. The TopicSets provide data in controlled vocabularies to populate those parts of NewsML where vocabulary information is required, in particular DescriptiveMetadata. This Guideline provides advice on good practices for News Agencies using NewsML in their services. NewsML is intended to be used widely for news and has to exist in a standard form to allow providers and recipients to process the information in a consistent manner. NewsML has been Trademarked by IPTC and the published DTD must be used to validate all instances claiming to be NewsML..." Document also available in HTML format. See "NewsML and IPTC2000", and note the recent announcement for the NewsML derivative "SportsML." [cache]

  • [August 17, 2001] "RELAX NG Non-XML Syntax." By James Clark. Reference posted to the OASIS RELAX NG Mailing List, '', 17-August-2001, with subject 'A non-XML syntax for RELAX NG'. "I've developed an experimental non-XML syntax for RELAX NG. There's an implementation in Java (using JavaCC) that translates into RELAX NG. The RELAX NG schema for RELAX NG in the non-XML syntax is 64 lines (2107 bytes), versus 342 lines (8187 bytes) for the XML syntax. It's quite similar in many ways to the type syntax of the current XQuery 1.0 Formal Semantics WD..." From the web site description: "[This document references] a description of the non-XML syntax for RELAX NG. There is a Java program that translates from this non-XML syntax to RELAX NG's XML syntax; this is available packaged as a ZIP file containing source, documentation and a jar file, and as a Win32 executable; there is also documentation on how to use the translator; also an example showing the schema for RELAX NG's XML syntax written in the non-XML syntax. This syntax is not a part of RELAX NG, and is not a product of the OASIS RELAX NG TC..." See: "RELAX NG."

  • [August 17, 2001] xml2rfc -- Mailing list for software packages implementing rfc2629. Send email to See the reference page: "A handy little tool, xml2rfc, will allow you to take your XML source and see how the results look like in the original ASCII look-and-feel or the new modern HTML rendition of that look-and-feel. You can download xml2rfc as a zip or tgz file, or try your results in this handy converter form [...] In theory, the nroff output is suitable for input to the RFC editor. In addition, there is a directory that contains bibliographic summaries of each RFC, suitable for including in your input..." IETF Network Working Group Request for Comments 2629 "Writing I-Ds and RFCs using XML" was written by Marshall Rose. The specification "presents a technique for using XML (Extensible Markup Language) as a source format for documents in the Internet-Drafts (I-Ds) and Request for Comments (RFC) series..." See the XML DTD and "Using XML for RFCs."

  • [August 16, 2001] "House to Sweep Floor with XML." By Susan M. Menke. In Government Computer News (August 16, 2001). "The House of Representatives this week released drafts of 110 Extensible Markup Language document type definitions for all its legislative activities. The DTDs are in the public domain and cover categories ranging from bills and resolutions to deletions and anomalous document structures. No date has been set, however, for starting to format congressional materials in XML format, which would make searching and reusing the volumes of legislative output much easier. Current searches at are limited to bill numbers and key words. The congressional DTDs define the elements and relationships in what a legislative body produces, as distinct from, say, DTDs that already exist for the automotive, banking, telecommunications and other industries..." See details in the news item of August 13, 2001. On XML in US Government agencies: "US Federal CIO Council XML Working Group."

  • [August 16, 2001] ".NET Framework Beta 2 Evaluation Guide." By [Lori Merrick]. Microsoft Corporation. 2001-08-02. 53 pages. 'A Guide to Reviewing the Microsoft .NET Framework: a platform for rapidly building and deploying XML Web services and applications to solve today's business challenges.' Abstract: "The Microsoft .NET Framework is a platform for building, deploying, and running XML Web services and applications. It provides a highly productive, standards-based, multi-language environment for integrating existing investments with next-generation applications and services as well as the agility to solve the challenges of deployment and operation of Internet-scale applications... The .NET Framework is the result of two projects. The goal of the first project was to improve development on Windows, looking specifically at improving COM, the Microsoft Component Object Model. The second project aimed at creating a platform for delivering software as a service. These two projects came together more than three years ago. The finished product dramatically improves programmer productivity, ease of deployment, and reliable application execution, and introduces a totally new concept to computing: that of Web Services - loosely coupled applications and components designed for today's heterogeneous computing landscape by communicating using standard Internet protocols such as XML and SOAP. To solve the challenges facing Internet development now and for the future, we need to be able to write applications in any programming language, access any platform, and scale over the Internet to global proportions. This application development strategy is very compelling, as it enables companies to make use of existing hardware, utilize current applications and use developers they have on staff, without having to retrain them on a new programming language. This style of computing is called XML Web services and represents the next evolution of application development. 
An XML Web service is an application that exposes its functionality programmatically over the Internet or intranet using standard Internet protocols and standards such as HTTP and XML. XML Web services solve the challenges facing Web developers by combining the tightly coupled, highly productive aspects of N-tier computing with the loosely coupled, message-oriented concepts of the Web. Think of XML Web services as component programming over the Web. Conceptually, developers integrate XML Web services into their applications by calling 'Web APIs' just as they would call local services. The difference is that these calls can be routed across the Internet to a service residing on a remote system. For example, a service such as Microsoft Passport could enable a developer to provide authentication for an application. By programming against the Passport service, the developer can take advantage of Passport's infrastructure and rely on Passport to maintain the database of users, make sure that it is up and running, backed up properly, and so on, thus offloading a whole set of development and operational chores. The .NET Framework is the culmination of the combined efforts of several teams at Microsoft, working together to create a platform for rapidly building and deploying XML Web services and applications. The vision for the .NET Framework platform is to combine a simple-to-use programming paradigm with the scalable, open protocols of the Internet. To achieve this vision several intermediate goals had to be delivered..." See .NET XML Web Service Specifications at the GotDotNet web site. [document source]
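As a rough illustration of what "calling a Web API" amounts to on the wire, the sketch below builds a SOAP 1.1 request envelope with Python's standard xml.etree.ElementTree module. It is not taken from the guide and does not use the .NET Framework. The service namespace urn:example:auth, the operation CheckUser, and the parameter userName are hypothetical and do not describe any real service, Passport included. Only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (standard).
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(service_ns, operation, params):
    """Return a SOAP 1.1 request envelope as UTF-8 bytes.

    service_ns, operation and the parameter names are supplied by the
    caller; the values used in the usage note are hypothetical.
    """
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element plays the role of the remote method call.
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")
```

For example, build_soap_request("urn:example:auth", "CheckUser", {"userName": "alice"}) yields an envelope whose Body holds a single CheckUser element. In a real deployment these bytes would be sent in an HTTP POST to the service endpoint; the sketch stops short of any network call.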

  • [August 16, 2001] "Overview of the ebXML Architectures." By Mike Rawlins. July 19, 2001. Part of ebXML - A Critical Analysis, "a series which presents an analysis of the products of ebXML, its success in achieving its stated objectives, and an assessment of the long-term impact of the initiative..." From the introduction: "ebXML started its work program with the overall solution and its high level architecture already decided. The task of the Architecture project team was therefore not to develop the architecture from a set of requirements, but to describe the chosen architecture (based on what the other project teams were doing) and flesh out its details. They had quite a difficult time doing this, evidenced by the fact that the Technical Architecture Specification was approved in February 2001, very near to the May 2001 completion of the ebXML work program. One of the main difficulties in describing the ebXML architecture is that, in conventional software architecture terms, there is not one ebXML architecture but instead there are two. One of these is the architecture for the software comprising the technical infrastructure, often referred to as a product architecture. The other is the architecture for performing systems analysis and development, often referred to as a process architecture. The Architecture Specification somewhat alludes to this in its "Recommended Modeling Methodology" section when discussing the Business Operational View and Functional Service View (two concepts taken from the ISO Open-EDI Reference Model). However, it does not explicitly identify two separate architectures. The latest academic thinking about software architectures holds that six attributes are required to fully describe an architecture. 
These are: (1) Elements (components/parts) from which systems are built; (2) Interactions (connections/connectors/glues/relationships) between the elements; (3) Patterns - The layout of the elements and their interactions, guiding their composition. For example, the number of elements, the number of connectors, order, topology, directionality; (4) Constraints - On the patterns. For example, temporal, cardinality, concurrency, etc.; (5) Styles - Abstraction of architectural components from various specific architectures (Sometimes used interchangeably with patterns). For example: Layered (as in Unix OS, OSI stack), pipe & filter, object oriented; (6) Rationale - Describe why the particular architecture was chosen... The ebXML Architecture Specification does a fairly good job in defining the elements and interactions of the ebXML Architecture..." See: "Electronic Business XML Initiative (ebXML)."

  • [August 16, 2001] "RELAX NG Shorthand Guide." By Kohsuke KAWAGUCHI (Sun Microsystems). Posted to the OASIS RELAX NG mailing list 2001-08-16 (''). ['I've written a simple XSLT stylesheet that allows you to write a RELAX NG schema in concise way, and produce a fully compliant RELAX NG schema automatically...'] "This document describes the functionality of RELAX NG short-hand processor. RELAX NG is a nice schema language, but sometimes it is painful to type all tags by hand. For example, if you want to write an optional attribute (which is IMO very common), you need to type in [...] it becomes especially hard if you are using normal text editor. The RELAX NG short-hand processor partially addresses this problem by providing several "short-hand" notations that makes schema authoring easier. I wrote a RELAX NG schema for VoiceXML by using this short-hand processor and it took 690 lines. After the processing, RELAX NG schema becomes 1036 lines. So in this case, it saves nearly 1/3 of the typing. Your experience will vary, but I hope you find this processor useful... As you see, it's almost like normal RELAX NG, but you'll notice that the namespace URI is different and there are unfamiliar attributes (@occurs and @type). The current processor is written in XSLT, so once you completed the schema, use XSLT processor to produce a normal RELAX NG schema. If you are using Windows, you can use msxsl tool as: c:\>msxsl myschema.srng shortRNG.xsl > myschema.rng And the produced myschema.rng file can be used with any RELAX NG compliant processor." See: "RELAX NG." [cache 2001-08-16]

  • [August 15, 2001] "OASIS ebXML Registry Proposal: ebXML Registry as a Web Service." Prepared by the OASIS ebXML Registry RAWS Sub-team. Posted 2001-08-15 by Farrukh Najmi to the OASIS Reg-Rep [RAWS] list. An initial RAWS draft proposal for consideration and review by the OASIS ebXML Registry TC. "This document proposes focused enhancements to the ebXML Registry Services specification that will allow the ebXML Registry services to be accessible as a set of abstract web services with concrete normative bindings specified for ebXML Messaging Service and SOAP. Currently the only normative access to the ebXML Registry is over the ebXML Messaging Service. What is lacking is a clean separation between an abstract service interface specification and multiple concrete technology specific bindings (e.g., ebXML Messaging Service). The proposal allows more flexibility and ease of access to clients by defining a second normative interface to the ebXML Registry that is based on the widely adopted SOAP protocol... The primary motivation behind this proposal is to further ebXML Registry adoption. It is our assertion that adoption is furthered by: (1) Building registry clients with limited infrastructure; (2) Enabling additional technology bindings for accessing the registry service; (3) Aligning with emerging and de facto standards. ebXML Registry adoption may be measured in the number of operational public ebXML Registries. Currently this number is one. We would like it to be higher... Making ebXML Registry available as an abstract web service with additional technology bindings (e.g., SOAP) gives clients more options to interact with an ebXML Registry. A normative SOAP binding (SOAP 1.1 and SOAP with Attachments with HTTP) is proposed since SOAP has considerable mind share and adoption and has in fact been adopted by the ebXML Messaging Service itself. Numerous tools exist that make it very simple for clients to access any SOAP based web service...
The following concrete deliverables are proposed: (1) XML Schema definition for [ebRIM] and [ebRS] with full support for XML namespaces, data types, constraints etc. This schema would replace Registry.dtd; (2) Abstract service definition of Registry Services; (3) WSDL description of the abstract Registry Services and related concrete SOAP binding..." See the associated files in the posting: Registry.xsd: The XML Schema for ebXML registry; Registry.wsdl: Abstract service definition for ebXML Registry service; RegistrySOAPBinding.wsdl: Concrete binding to SOAP/HTTP for the abstract ebXML Registry service. Context: OASIS ebXML Registry Technical Committee. Draft schemas are available in the .ZIP file. See: (1) "Electronic Business XML Initiative (ebXML)" and (2) "XML Registry and Repository."

  • [August 15, 2001] "IBM Touts Web Services, Portal Plans." By Paul Krill. In InfoWorld August 14, 2001. "IBM at its Software Solutions Conference here hailed Web services as a way to deploy new XML-based applications on the Web and integrate legacy applications. The company also detailed a merged portal offering featuring Lotus K-Station and IBM's WebSphere Portal Server... To foster Web services, Armonk, N.Y.-based IBM on Tuesday unveiled WebSphere Studio Version 4, a set of Web services and JSP (JavaServer Pages) development tools for building and deploying Web-based applications. The package features support of XML, Java, and SOAP (Simple Object Access Protocol). Also announced was availability of Visual Age for Java Version 4 featuring a beta version of the new WebSphere Studio Application Developer, which is a development environment for Java and J2EE (Java 2 Enterprise Edition) application developers... IBM also detailed a beta Linux release of WebSphere Studio Workbench, a free toolkit available for download. [...] Also, IBM and Natick, Mass.-based edocs, which develops online account management and billing systems, announced plans to offer telecommunications service providers a suite of e-business services for online account management and e-billing. The alliance includes integration of edocs' eaSuite with IBM eServers and middleware. The companies will leverage eaSuite to deliver business-to-consumer and business-to-business solutions for telecommunications providers. Additionally, the companies are collaborating to deliver online account management systems for other vertical industries, including financial institutions, retailers, insurance companies, and utility services."

  • [August 15, 2001] "IBM Unveils Web Services, Portal Tools Strategy." By James Thompson. In infoconomy August 15, 2001. "Computer giant IBM has announced a series of initiatives intended to make it a web services giant, starting with a new toolset. IBM claims to be the first to market with a web services tool suite - WebSphere Studio Version 4 - which combines the concept with Java Server Pages (JSPs), which is available from this week. Web services are based on a set of open standards. The purpose is to enable organisations to link their business processes to partners and suppliers over the Internet, without the need for 'labour intensive' application coding. Based on the flagship WebSphere application server, Studio 4 will enable developers to deploy new XML-based applications over the Internet, as well as integrate legacy applications, claimed John Swainson, IBM's general manager of Application and Integration Middleware. Studio 4 will be licensed separately, in addition to WebSphere Application Server 4. The company also announced the availability of Visual Age for Java Version 4. Visual Age includes a beta version of the new WebSphere Studio Application Developer, a development environment for Java and Java 2 Enterprise Edition (J2EE) developers... Studio 4 will support the standards central to the adoption of web services: XML, the universal description, discovery, integration (UDDI) protocol, the simple object access protocol (SOAP) and the Web Services Description Language (WSDL). However, a lack of agreement about the specifications and parameters these standards will use is currently holding back web services. As a result, analysts predict that widespread adoption of web services will not start until 2003 at the earliest..."

  • [August 15, 2001] "IBM Unveils Slew of Developer Tools." By Peter Galli. In eWEEK. August 14, 2001. "IBM used its Solutions technical developers conference here to announce a range of new and updated developer tools. The list included WebSphere Studio Version 4, its first commercially available set of Web services and JavaServer Pages development tools that enable software developers to create Web-based applications and extend their existing applications to the Web with minimal knowledge of Java, XML or SOAP. IBM officials said WebSphere Studio 4 will be available Aug. 28 and will cost $599 per copy for the Professional Edition and $1,999 a copy for the Advanced Edition. Both editions will include the beta version of WebSphere Studio Site Developer, IBM's development environment for Web site developers. IBM's Senior Vice President of Software Steve Mills also announced the availability of VisualAge for Java Version 4, which includes a beta version of IBM's new WebSphere Studio Application Developer--its next-generation development environment for Java technology and J2EE application developers. The Enterprise Edition will cost $2,999 a copy. For Linux developers, IBM announced that its free tool development kit, WebSphere Studio Workbench, will be available for download on August 28, 2001. The Workbench allows Linux software vendors to integrate their own tools portfolio with IBM's. An additional offering for developers is IBM's WebSphere Private UDDI Registry, available as a free download that will enable companies to implement Web services technologies in the controlled environment of a private intranet or extranet..."

  • [August 15, 2001] "Integrating Digital Educational Content Created and Stored within Disparate Software Environments: An XML Solution in Real-World Use." By Mark S. Frank, Thomas Schultz, and Keith Dreyer (Department of Radiology, Massachusetts General Hospital, Boston, MA). Paper presented at the 18th Symposium for Computer Applications in Radiology Annual Meeting, SCAR 2001, Salt Lake City, UT, USA, 3-6 May 2001. Published in Journal of Digital Imaging Volume 14, Number 2 (June 2001), pages 92-97. "The solution provides a standardized and scalable mechanism for exchanging digital radiological educational content between software systems that use disparate authoring, storage and presentation technologies. We use two software systems for creating educational content for radiology: one is an authoring and viewing application; the other facilitates the input and storage of interactive knowledge and associated imagery, delivering this content over the Internet. A subset of knowledge entities common to both systems was derived. An additional subset of knowledge entities that could be bi-directionally mapped via algorithmic transforms was also derived. An XML object model and associated lexicon were created to represent these knowledge entities and their interactive behaviors. Attention was exercised in the creation of the object model in order to facilitate the integration of other sources of educational content. XML generators and interpreters were written for both systems. Deriving the XML object model and lexicon was the most critical and time-consuming aspect of the project. The transfer of hundreds of educational cases and thematic presentations between the systems can now be accomplished in a matter of minutes. The use of algorithmic transforms results in nearly 100% transfer of context as well as content, thus providing presentation-ready outcomes. 
The automation of knowledge exchange between dissimilar digital teaching environments magnifies the efforts of educators and enriches the learning experience for participants. XML is a powerful and useful mechanism for transferring educational content between systems..."

  • [August 15, 2001] "Why Content and XML Integration Technologies Are Fundamental." By Frank Gilbane. In The Gilbane Report Volume 9, Number 6 (July 2001), pages 1-7. "... We are convinced that what we would broadly call content technologies and XML-based integration technologies, are the two most important categories of software technology today. We think that, aside from any hype associated with individual products, buzzwords, or market research growth predictions associated with these, that they are fundamental software segments that need to be at the core of all enterprise IT strategies. This month we discuss why these are so fundamental and explore how they are connected, and will continue to be propelled by commerce... XML can, and is, being used for many applications in many different ways, but more than anything else, it is at the center of the catalog integration challenge. The more XML is used, the simpler cross application and cross platform communications (and computing) will be. There is, in fact, a network effect here. XML is of course not the solution to all catalog or application integration problems. Partly because of all the XML hype, there remain XML detractors who criticize XML for not solving all our information management problems. But, having a common syntax that developers all over the world understand provides immeasurable value. The lack of universal semantics is not a deficiency because it is an impossibility. Certainly the honeymoon with XML is largely over as people realize that XML is not magic or even easy. It is just an indispensable and inevitable enhancement to computing. Without XML (or something else that did the same thing) we wouldn't be looking forward to eventual B2B integration. XML simply is the way companies will communicate across firewalls for all types of B2B integration and web services. 
This does not replace the need to parse and process and feed the data to appropriate software applications - there is still a lot of development necessary, especially with legacy applications... Content management of the enterprise kind, and XML, both need to be core parts of IT infrastructures. This is not the same thing. XML is at least as important, and far more popular, for application integration than for content management. There is certainly a critical connection. But the reason XML is so important is that it greases many wheels and gears throughout infrastructures. Now is a good time to make your own judgments about strategic IT initiatives and which technologies are critical for investment and which would be 'nice to have' but are not critical. There are other important technologies besides those that we have focused on, but it is clear that whatever else you need to invest in, content and XML integration technologies must be included..."

  • [August 15, 2001] "Microsoft Debuts Demo 2 of XML Query Tool." By Jeffrey Burt. In eWEEK August 14, 2001. "Microsoft Corporation today released the second demo of an XML query tool. XQuery is the Redmond, Wash., company's implementation of the latest version of the World Wide Web Consortium's XML Query Working Draft, which was released June 7. 'This is emerging technology,' said Philip DesAutels, product manager for XML Web services at Microsoft. 'We're rolling this out to our developer community to go out and get their input into it.' A committee of the W3C is working on an XML (Extensible Markup Language) query specification designed to enable users to extract data from documents on the Web and manipulate the data that is found. The committee released the first public draft of the technology in February, and Microsoft released its first demo version of XQuery in April. Eventually, XQuery will be incorporated throughout Microsoft's products, from its SQL Server database tools to the company's .Net frameworks, said Mark Fussell, lead program manager for XML technologies at Microsoft. More than 30,000 individual developers visited the Web site of the first demo, and Microsoft expects many more to do so with the second demo. Available on the Web or by download, the second demo enables developers to use it with .Net today and includes an interface to allow query results on SQL Server to be tagged as XML data..." See the recent news item, and the announcement from May 14, 2001: "Microsoft Hosts Online XQuery Prototype Application." General references: "XML and Query Languages."

  • [August 15, 2001] "Microsoft publishes XML 'missing link'. XQuery to help developers build Web services that extract data from XML-based documents." From ZDNet News. Wednesday 15th August 2001. "Microsoft has published its first implementation of a specification for extraction of data from XML-based documents, which the company claims is the equivalent of SQL for relational data, and which has so far been missing from XML standards. Microsoft announced yesterday that it has published XQuery Demo, an implementation based on the World Wide Web Consortium's recently published XQuery draft. XQuery is a data model for XML documents, a set of query operators on that data model, and a query language based on the query operators. Microsoft's product manager for XML web services Philip DesAutels said that all developers using XML will come into contact with XQuery, which he said had been an important yet missing element from a growing body of XML standards. 'If you are using XML you will touch XQuery. It will reach across the developer community,' he said. DesAutels said he believes XQuery is fundamental to Web services, mirroring SQL's importance in traditional client/server-based computing. Microsoft hopes that developers will use XQuery Demo to build Web services that extract data from XML-based documents..." See the recent news item and the announcement from May 14, 2001: "Microsoft Hosts Online XQuery Prototype Application."

  • [August 14, 2001] "The House That CORBA Built." By Paul May. In Application Development Advisor Volume 5, Number 5 (June 2001), pages 30-32. ['The OMG's Corba hasn't been talked about much recently, outside of mission-critical developments in key vertical markets. But now, explains Paul May, the organisation is trying to reposition itself at the centre of interoperability with the Model Driven Architecture.'] "The OMG has introduced a newly rationalised context called Model Driven Architecture (MDA). This initiative is designed to provide a framework for integrating new applications with older, legacy software, using key modelling standards including the Unified Modelling Language (UML), the Meta Object Facility (MOF) and the Common Warehouse Metamodel (CWM) at its core. These modelling standards will be middleware-independent, meaning that they can hook into, say, Corba, XML or Java. With this 'one size fits all' approach, the OMG is promoting MDA as an all-embracing standards framework for application development, relying on UML's authority as much as its containment of Corba. The move is a shrewd one, opening up new vistas for the OMG; MDA provides an ongoing context for Corba, while designating it as just one of many potential infrastructure layers. In this way, it can keep itself involved in new developments, such as the burgeoning Web services market... While the OMG's original impact on the software industry derived from the Corba standard's role in the middle tier, with the MDA launch, the body is promoting itself as a broader standards-setting group for the systems development community. MDA is a forward and backward-looking systems development and integration strategy, designed to reduce developers' dependency on any one product type, vendor or platform which supports XML-based languages. One example is XMI (XML Metadata Interchange), an XML standard that enables analysts and designers to exchange modelling information. 
These domain-specific XML standards are appearing widely. Developers throughout the industry are embracing XML as an elegant, efficient base technology for expressing semantics of all kinds, from engineering specifications to online monetary transactions. XMI was inspired by practitioners and UML can be seen as the result of practitioner pressure for a standard modelling approach in the face of previous industry confusion. The confusion was all the more frustrating when the analysis and design gurus at the heart of the modelling movement tended to disagree with each other more than they agreed." See "OMG Model Driven Architecture (MDA)" and "XML Metadata Interchange (XMI)." [cache]

  • [August 13, 2001] "Above the Noise. Realizing Age-Old Visions of Software." By Michael Vizard [InfoWorld Editor in Chief]. In InfoWorld Issue 33 (August 13, 2001), page 8. "One of the persistent knocks against IT is that, depending on who you ask, as much as $7 out of every $10 dollars spent on software goes into installing and integrating the software once it's purchased. This always leaves the people who buy the software feeling a little burned, and frankly, gives the industry a bad reputation among businesspeople who typically fund these projects... The good news is that new technology is about to make a difference in this space in the form of Web services. As a concept, Web service is not all that new. Basically, it consists of network resident applications made up of components that can be called over the Net using a set of defined interfaces. And most of us have been talking about this type of software architecture for more than 20 years. What's new is the advent of XML, which in its purest form is a self-describing data-neutral format. The key phrase in that description is the term self-describing, which means that one piece of software can automatically discover and understand the function that another piece of software can perform. We're still in the early days of developing all the standards around Web services, but once they are in place, Web services will significantly alter how we deploy, manage, and acquire software -- and more importantly, reduce a lot of the soft hidden costs associated with enterprise software... there is a war on between Microsoft and its vision of XML as implemented in its .NET architecture and the backers of Java, who tend to view XML as a complementary set of technologies that will make Java applications more accessible. It's unlikely that either side will vanquish the other, so we can expect to see these two architectures co-exist in the market. 
Nor do we necessarily have to wait for the big guns in the industry to finish their architectures to take advantage of Web services today. A number of smaller companies, including NetObjects, Cape Clear, SilverStream, Avinon, and WebCollage, all have viable offerings available today..."

  • [August 13, 2001] "Plugging into the Global Grid." By Tom Sullivan and Ed Scannell. In InfoWorld Issue 32 (August 06, 2001), pages 15-16. "Just as enterprises are focusing on how to optimize their IT resources in a slowing economy, a new way to exploit the Internet to connect processors in potentially massive virtual systems is emerging. What some experts are calling the Great Global Grid is on its way to becoming a locus for tapping into a wealth of unused processing power, taking advantage of extensive XML processing and peer-to-peer networking, as well as strengthening Web services and other widely distributed software systems... IBM officials this week announced that the company will work with The National Grid, a far-flung network of systems spread across the United Kingdom. IBM is also expected to offer grid-computing services in a utilitylike model. Specifically, IBM will build at Oxford University a sophisticated data-storage facility that will be the primary source of high-energy physics data to be generated by the U.S. Particle Physics Lab in Chicago. Additionally, the U.K. grid will be used for experiments at the recently created Large Hadron Collider at CERN, the European particle physics lab in Geneva. Also this week, Almonte, Ontario-based Killdara announced Vitiris, a Web services platform that will connect grids by supporting de facto Web services standards XML, SOAP (Simple Object Access Protocol), WSDL (Web Services Description Language), and UDDI (Universal Description, Discovery and Integration). Standards will be key to the success of grids, and last month, members of the academic, scientific, and commercial arenas, as well as standards bodies such as Object Management Group (OMG), World Wide Web Consortium (W3C), and Grid Forum, came together at the Software Services Grid Workshop to accelerate the growth of standards for the Great Global Grid. 
Another Grid standards group, Globus, an open-source development organization, will convene in San Francisco later next week to focus on infrastructure issues for high-performance computing via wide-area links. Globus is focused on developing a standardized grid architecture and related technologies..." See also "Software Services Grid Workshop."

  • [August 13, 2001] "Integration is Power. ROI, XML, Web Services. GE GXS' Frank Campagnoni Discusses Integration Challenges." By Mark Jones. In InfoWorld Issue 33 (August 13, 2001), pages 36-39. "Understanding the benefits of a truly integrated IT system is no mean feat when a costcentric model can obscure the advantages of business-to-business relationships. Frank Campagnoni, CTO of GE Global eXchange Services (GXS) and a member of InfoWorld's CTO Advisory Council, talks with InfoWorld West Coast News Editor Mark Jones about the challenges of enterprise and business-to-business integration... [excerpts:] 'GE GXS is one of the largest providers of business-to-business e-commerce services focused on EDI [electronic data interchange]. Data interchange traditionally meant EDI, but now it means things like XML and user-defined formats for exchanging crucial business information over the Internet or a private network. We also provide supply-chain services capabilities because our service-based model includes services and applications that allow companies to interact with one another in private exchanges, public exchanges, or industry-focused exchanges. We glue all that together with a line of integration software.... One of the things I see is that a lot of customers get ahead of themselves, particularly in business-to-business e-commerce. One of the things that's really important to do is to get your internal health in order before you look at how to get a cohesive strategy in terms of how you're going to interact externally. Some folks get a little bit ahead of the game and try to put a strategy in place for how they're going to do b-to-b commerce before they've actually got any sort of integration in place with their own IT systems. And that causes them a great deal of pain and hardship. The other thing I think is important is that folks try to formulate an IT strategy. 
When you start to look externally at interacting with your supply chain, one of the things I found that a lot of folks tend to do, which I think is a mistake, is to somehow hitch their wagon to the leading initiatives. So should I do xCBL [XML Common Business Library] à la Commerce One, should I do cXML [Commerce XML] à la Ariba?... I think the core issue is the fragmentation of standards. The issue of XML nowadays is to pick a standard because there are plenty of them to choose from, and that can become counterproductive. One of the capabilities that we provide is removing the technology risk from businesses. Our view is we sit in the middle of businesses, we take whatever data format is used within the business, and we can translate that to whatever data format their trading partners want, thus relieving folks from having to make a choice, which may or may not be right. It's the benefit of going with a service provider..."

  • [August 13, 2001] "Web Services Competition Heats Up." By Tom Sullivan. In InfoWorld August 13, 2001. "Application server vendors are gearing up to do battle with IBM and Microsoft in the Web services game by strengthening their infrastructure toolkits and support for standards. Sybase will release next week EAServer 4.0, the latest edition of its application server, which includes increased J2EE (Java 2 Enterprise Edition) and XML functionality. This week, Billerica, Mass.-based SilverStream Software announced its first application server with core support for SOAP (Simple Object Access Protocol), XML, UDDI (Universal Description, Discovery, and Integration), and WSDL (Web Services Description Language)... Earlier this month, BEA Systems, based in San Jose, Calif., unveiled a WebLogic version with built-in support for Web services. Dublin, Ireland-based CapeClear has also expanded its reach with new support for iPlanet's application server. Oracle this week made an early version of JDeveloper for 9i available on its Web site for download, targeting developers who build and deliver Web services. Oracle is behind Microsoft in marketing Web services, but the vendor foresees Web services as the next step for doing business via the Internet, said John Magee, Oracle's senior director of 9i product marketing, based in Redwood Shores, Calif. Oracle plans to apply its strategy of integrating everything from its database and applications to development tools, and offer a tightly knit package for Web services, he said. Oracle is taking direct aim at Microsoft, IBM, BEA, Hewlett-Packard, and Sun Microsystems..."

  • [August 13, 2001] "An Introduction to XML Digital Signatures." By Ed Simon, Paul Madsen, Carlisle Adams. From August 08, 2001. ['How to understand and create XML-aware digital signatures.'] "The very features that make XML so powerful for business transactions (e.g., semantically rich and structured data, text-based, and Web-ready nature) provide both challenges and opportunities for the application of encryption and digital signature operations to XML-encoded data. For example, in many workflow scenarios where an XML document flows stepwise between participants, and where a digital signature implies some sort of commitment or assertion, each participant may wish to sign only that portion for which they are responsible and assume a concomitant level of liability. Older standards for digital signatures provide neither syntax for capturing this sort of high-granularity signature nor mechanisms for expressing which portion a principal wishes to sign. Two new security initiatives designed to both account for and take advantage of the special nature of XML data are XML Signature and XML Encryption. Both are currently progressing through the standardization process. XML Signature is a joint effort between the World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF), and XML Encryption is solely a W3C effort. This article presents a brief introduction to the XML Signature specification and the underlying cryptographic concepts... As XML becomes a vital component of the emerging electronic business infrastructure, we need trustable, secure XML messages to form the basis of business transactions. One key to enabling secure transactions is the concept of a digital signature, ensuring the integrity and authenticity of origin for business documents. 
XML Signature is an evolving standard for digital signatures that both addresses the special issues and requirements that XML presents for signing operations and uses XML syntax for capturing the result, simplifying its integration into XML applications." See "XML Digital Signature (Signed XML - IETF/W3C)."

  • [August 13, 2001] "Creating VoiceXML Applications With Perl." By Kip Hampton. From August 08, 2001. ['Kip Hampton shows how Perl and VoiceXML can work together.'] "VoiceXML is an XML-based language used to create Web content and services that can be accessed over the phone. Not just those nifty WAP-enabled 'Web phones', mind you, but the plain old clunky home models that you might use to order a pizza or talk to your Aunt Mable. While HTML presumes a graphical user interface to access information, VoiceXML presumes an audio interface where speech and keypad tones take the place of the screen, keyboard, and mouse. This month we will look at a few samples that demonstrate how to create dynamic voice applications using VoiceXML, Perl, and CGI. A rigorous introduction to VoiceXML and how it works is beyond the scope of this tutorial. For more complete introductions to VoiceXML's moving parts see Didier Martin's 'Hello, Voice World' or the VoiceXML Forum's FAQ... VoiceXML is much more than an alternative interface to the Web. It allows developers to extend their existing applications in new and useful ways, and it offers many unique opportunities for new development. As you may have guessed, though, that power and flexibility come with a hefty price tag: VoiceXML gateways (the hardware and software that connect the Web to the phone system, translate text to speech, interpret the VoiceXML markup, etc.) are not cheap. The good news is that many of the prominent VoiceXML gateway providers offer free test and deployment environments to curious developers, so you can check out VoiceXML for yourself without breaking the bank." References in: "VoiceXML Forum."
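The approach the abstract describes (a CGI-style script that emits VoiceXML on the fly) can be sketched roughly as follows. This is an illustrative sketch only: it is written in Python rather than the article's Perl, and the helper name `build_greeting` and the prompt wording are hypothetical. Only the `vxml`, `form`, `block`, and `prompt` element names come from VoiceXML itself.

```python
import xml.etree.ElementTree as ET


def build_greeting(caller_name: str) -> str:
    """Return a small VoiceXML document that speaks a greeting.

    Hypothetical helper for illustration; not taken from the article.
    """
    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    # ElementTree escapes markup characters in text, so caller-supplied
    # values cannot break the document structure.
    prompt.text = f"Hello, {caller_name}. Welcome to the voice portal."
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # A CGI script would print a Content-Type header before the body.
    print("Content-Type: text/xml\n")
    print(build_greeting("Aunt Mable"))
```

Building the document through an XML API rather than string concatenation keeps the output well-formed even when the spoken text contains characters such as `&` or `<`.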

  • [August 13, 2001] "Opening Old Wounds. [XML Deviant.]" By Leigh Dodds. From August 08, 2001. ['Dodds delves into namespaces again.'] "This week's XML-Deviant explores a namespace debate that has resurfaced on XML-DEV and wonders whether a few rays of sunshine could dry up this and other debates once and for all... The debate was sparked again following the publication of Simon St. Laurent's SAX Filters for Namespace Processing. These filters allow an explicit association of unprefixed child elements (e.g. familyName above) with a parent element's namespace. Some concern was voiced over the prospect of the side-effects of their unguarded use. But careful application of the filter could benefit those preferring to use explicit namespace associations..." See references in "Namespaces in XML."
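The idea of associating unprefixed child elements with a parent element's namespace can be sketched roughly as below. This is not St. Laurent's SAX filter (which is not reproduced here); it is a small Python tree walk that shows the same effect. The function name `inherit_namespace`, the sample document, and the namespace URI `http://example.org/person` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def inherit_namespace(elem: ET.Element, parent_ns: str = "") -> None:
    """Move unqualified elements into the nearest qualified ancestor's namespace."""
    if elem.tag.startswith("{"):
        # Already qualified: this namespace applies to unqualified descendants.
        parent_ns = elem.tag[1:].split("}", 1)[0]
    elif parent_ns:
        # Unqualified element under a qualified ancestor: adopt its namespace.
        elem.tag = f"{{{parent_ns}}}{elem.tag}"
    for child in elem:
        inherit_namespace(child, parent_ns)


# Hypothetical sample: the parent is prefixed, the child is not.
doc = ET.fromstring(
    '<p:person xmlns:p="http://example.org/person">'
    "<familyName>Smith</familyName>"
    "</p:person>"
)
inherit_namespace(doc)
print(doc[0].tag)  # {http://example.org/person}familyName
```

The side-effect concern noted in the abstract is visible here: after the walk, code that looked up the unqualified name `familyName` would no longer find it.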

  • [August 11, 2001] "Postcard From the Future: An Introduction to Intentional Schema-Based Programming." By Michael Corning (Microsoft Corporation). 11 pages. ['Intentional Programming is a programming system that brings the power of the computer to bear on the development of software. That is, Intentional Programming has the promise to bring Moore's Law to the software industry. This session illustrates Intentional Programming with an implementation of Schema-Based Programming, something called Intentional XML.'] "IP and XML have a great deal in common but this is not by design, for the two technologies have known nothing of each other save for the work done with Intentional XML over the last year or so. Especially from the perspective of schema-based programming, IP and XML are both technologies where: Programs are treated like data; Programming is a matter of data transformation; Applications are infinitely extensible. A key concept in IP is identity; in XML it's the notion of names. IP was designed to represent code; XML was designed to represent data. IP uses an extremely efficient binary format; XML uses an extremely flexible text format. As with all synergistic systems, neither can do what the other can, and both together do more than either separately. Remember that the key approach to schema-based programming is the rigorous use of abstractions. Models, and Views, and Controllers are abstractions, ways to think about programming independent of implementation issues. XML is a way to implement, but is itself based on abstractions such as nodes and trees. IP, too, is based on trees and extends the notion of names, making them markers for something deeper. Where namespace management is crucial in XML, it's trivial in Intentional XML. But in terms of how IP will impact schema-based programming, there is one thing that is most crucial. 
As you saw in session one, extending features in schema-based programming applications is demonstrably easy, but, in all cases we are simply extending the application's fixed collection of constructs and abstractions... The first principle was 'separate presentation from data', and the second was 'separate presentation from implementation'. SBPNews meets (and extends) the first principle by embracing the Model-View-Controller design framework and by incorporating XML into the Model. The result is a state-of-the-art fifth generation web site that is extremely fast, easy to code and maintain, and extremely easy to extend. The major limitation of SBPNews is that it requires a state-of-the-art client such as Internet Explorer 5.x. It's not so much that downlevel browsers are left behind, but that emerging tiny bandwidth form factors such as the PocketPC and web-enabled cell phones are left out. To address these issues, and to meet the demands of the second principle, I moved SBPNews to the server using XSL ISAPI 2. But to fully qualify for meeting the bar of the second principle, I had to imbue XSL ISAPI 2 with COM; I did this by adding arbitrary COM component support to XSLT transforms. Principle III: Separate process from implementation... XML represents the relationships and messages used by all components of any given process. When those components include web-services (as they will in early 2001), that 'process' can be an arbitrarily large part of the entire Internet. In a way, using BTO enables large-scale applications and processes to be built declaratively (another fundamental tenet of schema-based programming)... declarative languages have a great deal to offer the professional developer: faster, smaller, more extensible and flexible applications. Principle IV: Separate source code from implementation... You need to separate your source code from its implementation. 
You need something that will mediate between the two, keeping track of versions of that source code, references to objects used by the source code, and even the abstractions your mind uses to think about what the source code is doing and what you need your application to ultimately do for you. The IP system is a cross between operating system and compiler. The programs you write inside IP don't run from IP -- you build applications for a given implementation from IP..."

  • [August 11, 2001] "More XML Fundamentals. [Web Developer.]" By Michael S. Dougherty. In DB2 Magazine (Quarter 3, 2001). "In this article, I'll discuss some more advanced concepts and show how XML ties into DB2 Universal Database (UDB)... Like a database, XML involves storage, schema, query languages, and plenty of interfaces. However, XML lacks many core database tools and features, such as indexes, security, data integrity, large capacity, and triggers. When determining how to implement XML with a database, one of the first points to consider is whether the XML implementation will encompass data storage or the overall design of Web pages or will be used primarily for document management. This question is important because using XML with document management is very different than using XML with data storage retrieval. XML handles document management with the Document Object Model (DOM) using a native XML database (one designed specifically for XML storage) or a content management system (an application designed to manage documents that is built with native XML). DOM is an API for accessing content within a Web browser that is written to include information about document structure. DOM allows the developer to dynamically access and update the content, structure, and style of documents. Using the DOM is excellent for document management, but often is not necessary for data management. In spirit with the first installment of this article, we shall focus on data management... In the last article, the author described how to use XML to connect to DB2 UDB. Because DB2 is a relational database, the most common connection mechanisms include Microsoft's Open Database Connectivity (ODBC), Sun's Java Database Connectivity (JDBC), and newer hybrids such as Object Linking and Embedding (OLE). The main liability of using these class libraries, as well as those that access native database drivers, is that they are too complex for standard XML use. 
Currently, XML interfaces provide the best support for update, delete, insert, and query messages. The interface for handling multiple objects in the database will not be much more complex. Therefore, the XML classes in the development environment provide simple functionality, and may not be sufficient for the types of connectivity requirements of some applications... The primary use of relational databases like DB2 UDB 7.2 with regard to XML is to integrate XML styles, tables, and object mapping to the dynamic appearance of Web pages. When mapping data with XML and relational databases, you can choose from several options. Remember that XML is basically a hybrid similar to object databases in data modeling, so it can represent data from an RDMS adequately. There are plenty of software products that effectively and automatically map XML objects and classes to relational database tables and directly into XML databases. Database vendors such as IBM, Microsoft, Oracle, and Sybase have developed tools to assist in converting XML documents into relational tables..." See: "XML and Databases."

  • [August 11, 2001] "Information Supply Chain: XML Schemas Get the Nod." By Solomon H. Simon. In Intelligent Enterprise (July 23, 2001). ['Now that XML Schemas have reached final recommendation status, they are more attractive than DTDs.'] "In spite of the kick-start that B2B e-commerce provided for XML, many companies held back because of their perception that XML lacked standards. But now that XML Schemas have been given final recommendation status by the World Wide Web Consortium (W3C), that resistance can start to subside. This status is tantamount to making XML Schemas a metadata standard and a valid alternative to Document Type Definitions (DTDs). With the general acceptance and use of schemas, companies will be ready to kick their XML communications and data interchange efforts into high gear. XML Schemas greatly simplify the use of XML in business applications because they follow XML format, enable data reuse, are compatible with extensible stylesheet language transformations, and are simpler compared to DTDs... A schema is the XML construct used to represent the data elements, attributes, and their relationships as defined in the data model. By definition, a DTD and a schema are very similar. However, DTDs usually define simple, abstract text relationships, while schemas define more complex and concrete data and application relationships. A DTD doesn't use a hierarchical formation, while a schema uses a hierarchical structure to indicate relationships. The XML Schema standard uses the XML syntax exclusively, rather than borrowing from SGML, and it will augment, then later supplant, DTDs... Schemas give the developer richer control over the data type declarations than is possible in DTDs. Second, schemas allow greater reuse of metadata by permitting the developer to include more external schemas than allowable with DTDs. The main reason to use schemas is to improve compatibility and consistency within an XML document or application. 
In isolation, it doesn't matter significantly if an XML document uses a DTD or a schema. However, the moment that a developer or user wants to modify the document, share the document, or combine multiple documents, the differences become more apparent. Because schemas follow the XML format, it is easier to design tools, such as extensible stylesheet language transformation scripts, that will modify them. A real concern about XML documents is that developers will use different vocabularies, which will minimize interoperability. To leverage the capabilities of XML, developers must be able to bend the syntax rules of a specific document without breaking the vocabulary. Although there are still obstacles to overcome, such as vocabulary, the W3C's recommendation of XML Schemas is a major step toward better data interchange between companies and, eventually, more sophisticated, widely used B2B e-commerce." For schema description and references, see "XML Schemas."
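The abstract's point that schemas "follow the XML format" and are therefore easier to build tools around can be illustrated with a short sketch: an XML Schema document is read with an ordinary XML parser, something a DTD's separate syntax does not allow. The sample schema and the helper name `declared_elements` are hypothetical; the namespace URI is the one used by the W3C XML Schema Recommendation.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical sample schema, used only for illustration.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="sku" type="xs:string"/>
        <xs:element name="quantity" type="xs:positiveInteger"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def declared_elements(schema_text: str) -> List[Tuple[str, Optional[str]]]:
    """List (name, type) for every xs:element declaration, in document order."""
    root = ET.fromstring(schema_text)
    return [
        (decl.get("name"), decl.get("type"))
        for decl in root.iter(f"{{{XSD_NS}}}element")
    ]


if __name__ == "__main__":
    for name, type_name in declared_elements(SCHEMA):
        print(name, type_name)
```

This sketch only inspects the schema; it does not validate instance documents, which would need a schema-aware validator that the Python standard library does not provide.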

  • [August 11, 2001] "Building the XML Repository. Part 1: The Data Administrator's View of XML." By David Plotkin (Manager of Data Administration Longs Drug Stores, Inc.). In Intelligent Enterprise (August 2001). "Welcome to part 1 of the tutorial 'Building the XML Repository'. This two-part web tutorial will cover the following material: Part 1, The Data Administrator's View of XML: What XML is -- from the standpoint of a Data Administrator; What XML is NOT; The Relational View of XML (which is hierarchical); Why Use XML instead of a flat file?; Validating an XML document with a Document Type Definition (DTD); Using Elements and Attributes in a DTD; Using Entities in a DTD. Part 2, Implementing the DTD Repository: The New Frontier in Metadata Management; Reasons for Building a DTD Repository; The overall schema; Relating DTDs and XML documents; Relating DTDs, XML documents, and Elements; Relating Elements and Attributes; Physical implementation of Elements and Attributes; Repository functionality; Building the Repository application; Lessons Learned..." See other introductory articles and tutorials in "Introducing the Extensible Markup Language (XML)."

  • [August 11, 2001] "Changing Terrain: Open Middleware Standards Are Redefining EAI and B2B Integration." [IntelligentEAI Features] By Mark Hansen (MIT Sloan School of Management). In Intelligent Enterprise Volume 4, Number 12 (August 10, 2001), pages 60-63. "During the last 12 months, the market has responded to customer demands for standardization with a plethora of new EAI standards. EAI functionality that used to be offered only by a few vendors, bundled into proprietary products, has gradually been standardized by various industry bodies and is becoming part of the basic e-business infrastructure offered by application server, messaging, and database vendors. For example, connectors to enterprise information systems (EISs) such as SAP and Siebel Systems Inc. used to be available only when bundled with a proprietary integration broker. Now they are available from EIS vendors in compliance with the Java 2 Enterprise Edition (J2EE) Connector Architecture standard to work as plug-and-play adapters with J2EE application servers. In another example, B2B-oriented Internet messaging functionality that used to be available only when bundled with a proprietary B2B server is now being built into the Java programming language in the form of the Java API for XML Messaging (JAXM)... SOAP, WSDL, and UDDI for B2B integration: Originally, the online ordering system was developed so that [fictitious] ABC's retail customers could place orders over the Internet using a browser. However, the application has been so successful that ABC's wholesale customers have demanded that ABC provide these capabilities as Web services that can be accessed by the wholesale customers' B2B ordering applications. 
Fortunately, ABC's J2EE application server vendor recently announced "out of the box" support for Web services standards including Simple Object Access Protocol (SOAP), the Universal Description, Discovery, and Integration (UDDI) specification, and the Web services description language (WSDL). This means that ABC can take the Enterprise JavaBeans (EJBs) for various order management functions (for example, price quote, check availability, and place order) that were created for the online ordering system and automatically generate the Web services interface illustrated in Figure 1 using tools bundled with the J2EE application server. SOAP is a standard specifying how two applications can exchange XML documents over HTTP. Originally championed by Microsoft, SOAP now has broad industry support and is being considered by the World Wide Web Consortium (W3C) standards body in the context of the XML Protocol Activity... WSDL is a standard for describing the XML that goes in and out of a Web service. ABC's wholesale customers can read the WSDL for a purchase order from the Web services interface, and use this information to properly structure an XML document representing a purchase order that is sent to ABC in a SOAP message... UDDI is a standard for publishing information about Web services in a global registry. For example, wholesale customers who want to place purchase orders over the Internet with ABC could look up information about ABC's Web services in a UDDI Registry. There, they would find links to the URLs containing the WSDL information needed to format purchase orders correctly for posting, via SOAP, to ABC's Web services interface..."
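
As a rough sketch of the exchange described in the abstract, the following builds and reads a SOAP 1.1 envelope carrying an order. The namespace `urn:example:abc-orders` and the element names `PlaceOrder`, `sku`, and `quantity` are assumptions made up for illustration; in the scenario above the real message structure would come from ABC's WSDL, and the envelope would be posted over HTTP rather than passed as a string.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace; the order namespace is hypothetical.
NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "ord": "urn:example:abc-orders",
}


def build_order_envelope(sku, quantity):
    """Wrap a purchase order in a SOAP Envelope/Body and return it as text."""
    envelope = ET.Element("{%s}Envelope" % NS["soap"])
    body = ET.SubElement(envelope, "{%s}Body" % NS["soap"])
    order = ET.SubElement(body, "{%s}PlaceOrder" % NS["ord"])
    ET.SubElement(order, "{%s}sku" % NS["ord"]).text = sku
    ET.SubElement(order, "{%s}quantity" % NS["ord"]).text = str(quantity)
    return ET.tostring(envelope, encoding="unicode")


def read_order(envelope_text):
    """Extract the order fields on the receiving side."""
    root = ET.fromstring(envelope_text)
    order = root.find("soap:Body/ord:PlaceOrder", NS)
    return {
        "sku": order.findtext("ord:sku", namespaces=NS),
        "quantity": int(order.findtext("ord:quantity", namespaces=NS)),
    }


if __name__ == "__main__":
    message = build_order_envelope("A-100", 3)
    print(message)
    print(read_order(message))
```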

  • [August 11, 2001] "Instant Recall With XML Data Caching. XML Data Caching." By Tony Amos (New York Mercantile Exchange). In WebTechniques Volume 6, Issue 9 (September 2001), pages 39-41. ['Most applications use a single data store, leading to potential bottlenecks by slow database requests. Alleviate this malady with XML data caching on the Web server or application server.'] "Because the typical three-tier Web application design employs one or more Web servers and a single data store, one all-too-common malady is slow response to database requests. By caching database information on the Web server or application server, you can relieve the database server of some of its repetitive work. One way to do this is to create an in-memory database on the Web server that maintains a copy of static, read-only information drawn from the database. Not only does this enhance performance by reducing the database load, but you also gain greater flexibility in how your application can use the data. For example, the application could then perform its own sorting, key lookups, and operations on data subsets. This isn't as hard as it may sound. You can find a prebuilt solution in most developers' toolkits: A good XML parser coupled with an XSLT processor delivers everything you need, and more... In my example, I'll use the XML parser to cache the names and abbreviations of the 50 U.S. states. The number of states isn't likely to change often, so I could code them right into the forms. However, that approach is more cumbersome to maintain. If my application's geographic rules were to expand to include U.S. possessions, I would have to change every page by hand. By using code to populate the list of states, a simple change to the database updates all of my pages. ... This example uses ASPs to implement a server-side cache of states. 
Your application may have many other data elements to which you can apply this technique, including cross-referencing codes, zip codes, area codes, or any other static code that you use frequently. This solution works on any platform that supports dynamic Web pages and some form of global variable. As I've said before, the method names will vary between parsers and processors, but the concept remains consistent across platforms..."
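
The caching idea in the abstract can be sketched as follows. The article's own example uses ASP; this Python version is a stand-in, and the function names, the three-row sample data, and the module-level `_cache` dictionary (playing the role of the server's global variable) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def load_states_from_database():
    """Stand-in for the database query; only three rows to keep the sketch short."""
    return [("AL", "Alabama"), ("AK", "Alaska"), ("AZ", "Arizona")]


_cache = {}  # plays the role of the server's global (application-level) variable


def get_states_document():
    """Build the XML document on first use, then serve it from memory."""
    if "states" not in _cache:
        root = ET.Element("states")
        for abbreviation, name in load_states_from_database():
            ET.SubElement(root, "state", abbr=abbreviation).text = name
        _cache["states"] = root
    return _cache["states"]


def state_options():
    """Render <option> tags for a form's drop-down list, sorted by state name."""
    states = sorted(get_states_document(), key=lambda state: state.text)
    return ['<option value="%s">%s</option>' % (s.get("abbr"), s.text) for s in states]


def lookup_state(abbreviation):
    """Key lookup against the cached document, without touching the database."""
    match = get_states_document().find("state[@abbr='%s']" % abbreviation)
    return match.text if match is not None else None


if __name__ == "__main__":
    print("\n".join(state_options()))
    print(lookup_state("AK"))
```

Because every page reads from the cached document, a change to the underlying table reaches all forms once the cache is rebuilt, which is the maintenance benefit the author describes.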

  • [August 08, 2001] "Federal Tag Standards for Extensible Markup Language." Logistics Management Institute (LMI) Report. By Mark R. Crawford, Donald F. Egan, and Angela Jackson. Report GS018T1. June 2001. 76 pages. "The General Services Administration (GSA) tasked the Logistics Management Institute (LMI) to review different XML tag dictionary initiatives to determine if the federal government can adopt a single standard for establishing XML tags. To establish an initial focus to this effort, GSA nominated the XML tags of a set of widely used federal forms as a pool of data elements to study. These forms included several grant forms and a set of XML tags that had already been built for an electronic federal grant application. GSA also asked us to include tags developed by RosettaNet for ordering products. In addition to establishing an approach, we were tasked to develop preliminary strategies for standardizing XML and data elements beyond simply assigning tags. Issues that have arisen from this analysis are (1) the actual naming of data elements or fields, (2) general or specific names for data elements or fields, (3) rules for structuring the data elements or fields, and (4) the establishing of limits to the length of a data element or field name. The data set of elements that resulted from even this relatively small sample size was very large -- more than 8,000. To bring the number down to a more manageable limit, we focused on several widely used data elements, such as name, organization, and dates. This group of data elements was not only the most common but also often the most complex. In reviewing the data set, we found that, although all the industry groups were following similar paths, they diverged in detail. Most of these efforts also are in their initial stages. Currently, RosettaNet is the most fully developed. 
The ebXML endeavor, although important because it is the only initiative with the formal backing of a neutral, internationally recognized standards body, probably has made the least progress toward a completed dictionary. However, ebXML has established a coherent set of design rules derived from International Organization for Standardization (ISO) standards for data dictionaries. One other issue quickly became evident: the federal forms GSA provided indicate that the government has immediate areas of interests (e.g., personnel management and grants) currently ignored by industry efforts. For these reasons, we believe that the federal government should -- in conjunction with the soon-to-be-finalized ebXML core-component technical specifications -- establish a set of rules and specific tags that will meet its needs, and that these closely align to what is present in industry. We also believe that establishing a set of XML tags is only an initial step. Many new requirements will emerge and many of them will be unique to the government. An ongoing mechanism is necessary to meet these new requirements. Further, tags are only the tip of the component-standards iceberg. Document type definitions (DTDs) and Schemas are the means for defining an XML business transaction (roughly equivalent to an EDI transaction set); the government also will need to participate in creating them as well..." See also "US Federal CIO Council XML Working Group."

  • [August 08, 2001] "Corel to acquire SoftQuad." By Jeffrey Burt and Peter Galli. In eWEEK. August 07, 2001. "Corel Corp. said Tuesday it will acquire SoftQuad Software Ltd. in a stock-for-stock deal designed to give the Ottawa-based software company XML capabilities to help push its product line onto the Web. Terms of the transaction were not disclosed, although published reports value the deal at approximately $36 million. It still needs to win regulatory and SoftQuad shareholder approval. The transaction follows Corel's July 16 acquisition of Micrografx Inc., which was the first phase of a three-step corporate strategy the company outlined in January to expand its content publishing offerings. A key part of the initiative was enabling customers to publish through multiple delivery channels, including the Web. The deal for Toronto-based SoftQuad completes the second phase -- to leverage Web-based technology -- of the three-pronged strategy. SoftQuad's expertise is in XML (Extensible Markup Language) technologies, including the XML-based content creation tool, XMetal... Corel realized that once an HTML document was posted on the Web, it was fairly static... Corel's third step will include developing new technology targeting what it calls high-growth areas, including wireless and Web services. Corel is pushing aggressively ahead with its new corporate strategy after a tough winter that saw it reporting a fourth-quarter loss and a failed merger with what is now Borland Software Corp. more than a year ago." See the announcement.

  • [August 08, 2001] "Corel XML buyout 'is .NET move'." By Joris Evers and Sam Costello. In Macworld Daily News August 08, 2001. "Corel is to acquire Toronto-based XML (Extensible Markup Language) developer SoftQuad Software in an all-stock transaction valued at $37 million. The acquisition furnishes Corel with another building block for developing solutions to offer as part of Microsoft's .NET strategy - Microsoft's platform for XML Web services. These are platform- and language-independent applications that can share data online. Corel can now deliver a product that will allow customers to create, manage and simultaneously publish content across multiple delivery-channels. SoftQuad has partnerships with makers of content-management systems, Corel said. The company plans to offer tools for Corel's equivalent of network publishing. This strategy is being widely adopted by conventional print-based companies, as they move toward producing tools for multiple content-delivery services. Such companies include Adobe and Quark... Corel CEO Derek Burney denied Corel's ties with Microsoft had any bearing on the deal: 'Corel is exploring its own strategy,' he affirmed. The company plans to extend the XML capabilities in WordPerfect, and to integrate XML into all its product lines." See the announcement.

  • [August 08, 2001] "Group says ASN.1 can field XML, save bandwidth." By R. Colin Johnson. In EE Times August 07, 2001. "The ASN.1 Consortium will be launched later this summer to hawk Abstract Syntax Notation One as the preferred communications standard for achieving interoperability between computing platforms sharing the Extensible Markup Language (XML). As digital communications spreads from cell phones to wireless personal organizers and XML-powered information appliances, the consortium claims to have the interoperability 'Rosetta stone' in place. 'ASN.1 is already the enabling technology for cell phone messages, radio paging, streaming audio and video over the Internet, 800-number services, ISDN telephone services, digital certificates and secure e-mail,' said Bancroft Scott, founder and president of OSS Nokalva here. 'But with the ongoing work in the engineering community to deliver digital information with XML, we want to get the word out on how you go about getting these devices talking to one another.' ASN.1 was invented in 1984 and has become an international standard published jointly by the International Standards Organization, the International Electrotechnical Commission and the International Telecommunications Union. The telephone companies popularized it as a method of gluing together all the various computers that must route messages between handsets through a maze of diverse switches. Once a message is encoded in ASN.1's formal language, any other computing device along a telecommunications route can securely decode it -- whether it's a billion-dollar supercomputer or a $50 cell phone. In the '90s, ASN.1 tool kits were ported to nearly 100 computing platforms running C, C++, Java, Pascal and proprietary operating systems for embedded devices like cell phones. Despite its widespread use, ASN.1 is not widely recognized in the engineering community as a route to interoperability. 
In fact, the emergence of XML has begun to overshadow ASN.1 as the preferred universal data-formatting methodology. XML allows application developers to encode their data into HTML-like text files that encapsulate the information about the data -- its application-specific 'tags'-- along with its raw alphanumerical values. XML allows computers of all sizes -- from supercomputers to those embedded in cell phones -- to share the same data files, such as accessing an appointment calendar from a PC in the office or from a cell phone while on the road. The new consortium, however, argues that the venerable ASN.1 specification can be the preferred method of telecommunicating raw XML database information from device to device..." Compare: "ASN.1 Markup Language (AML)."

  • [August 08, 2001] "XML and Microsoft Office." By Paul Cornell (Microsoft Corporation). From MSDN Library. August 2, 2001. "The Extensible Markup Language (XML) is a set of technologies that provides you with a platform-neutral and application-neutral format for describing data. This allows you to import and work with data that originates from applications outside of Microsoft Office, as well as export data from Office to a myriad of other data formats that your business partners may require. XML will also comprise a large part of the Microsoft applications, operating systems, and technologies in the future, so learning about XML now will reap huge dividends for you as an Office developer... Using XML in Your Office Solutions: In Microsoft Office 2000, there is very limited XML support. In fact, the only place that XML is used in Office 2000 is through the use of XML data islands in Microsoft Excel 2000, Microsoft PowerPoint 2000, and Microsoft Word 2000 when these documents are saved as Web pages. This allows these documents to be viewed and edited in Microsoft Internet Explorer while maintaining the rich formatting that was used when these documents were created in their original applications. In Microsoft Office XP, XML is implemented in several additional ways: (1) You can save Microsoft Excel 2002 spreadsheets and Microsoft Access 2002 database tables, queries, and views as XML. (2) You can import XML into Excel 2002 spreadsheets and Access 2002 databases. (3) Microsoft Outlook 2002 views are defined in XML. You can modify these view formats by using Visual Basic for Applications (VBA) code. (4) Smart tags can be embedded as XML inside of Microsoft Word 2002 documents, Excel 2002 spreadsheets, Outlook 2002 e-mail (when Word 2002 is enabled as your e-mail editor), or Web pages (when Office XP or one of the individual applications just mentioned) is installed on your computer. 
Reusable smart tags can also be written in XML and distributed to multiple Office XP users. In the remainder of this column, I will demonstrate how to import and export XML to and from Access 2002 and Excel 2002. I will also show you how to modify an Outlook 2002 view using XML...Short of retyping the XML file, how do we transform the data from one XML structure to another? The answer is by using an XML technology called XSL Transformations (XSLT), which is itself based on an XML technology called the Extensible Stylesheet Language (XSL). The MSXML parser can apply the instructions in an XSLT file to display XML in a different format. The MSXML parser also exposes an object model called the XML Document Object Model (DOM) that allows you to access the various elements in an XML file (among other things)..."

  • [August 08, 2001] "Oracle talks Web services." By Tom Sullivan. In InfoWorld August 08, 2001. "After talking up its Dynamic Services Framework at the end of last year, Oracle has kept its Web services strategy quiet, making only a few announcements relating to standards support. But with the posting Wednesday of a pre-release version of its JDeveloper tool kit for 9i on the Web, the company has offered some insight into its plans for Web services... Within Oracle's integrated approach, which includes the database, application server and applications, it sells an infrastructure for Web services akin to that of Microsoft. On the Java side its approach is similar to that of IBM, BEA Systems, Hewlett-Packard, and Sun Microsystems. While the company has been flexing its marketing muscles toward Web services and being a general infrastructure player for years, Urban continued, IBM is taking a similar route by closely aligning its Tivoli software, MQSeries, DB2, and WebSphere..." See the announcement.

  • [August 07, 2001] "Fidelity Pours Resources Into XML." By Jeffrey Schwartz. In InternetWeek (August 6, 2001). ['Fidelity Investments in October will complete a sweeping program to make all its corporate data XML compatible. It's been called the most far-reaching XML deployment ever.'] "Fidelity Investments in October will complete a sweeping program to make all its corporate data XML-compatible. It's the most far-reaching XML deployment ever, experts say, promising Fidelity employees and customers rapid, standardized access to data--regardless of where that data resides. The effort, which experts estimate has cost tens of millions of dollars, should help the world's largest mutual fund company and online brokerage eliminate up to 75 percent of the hardware and software devoted to middle-tier processing and speed the delivery of new applications. Today, two-thirds of the hundreds of thousands of hourly online transactions at use XML to tie together Web and back-end systems. Before XML, comparable transactions took seconds longer because they had to go through a different proprietary data translation scheme for each back-end system they retrieved data from. Fidelity is years ahead of other companies in leveraging XML to standardize corporate data, analysts say. 'What they've done will take the mainstream five years to do and the conservative companies 10 years,' says Gartner Group analyst Roy Schulte... Most companies are approaching XML conversion application by application rather than converting all their data at once... Fidelity's taking a leadership position not only in deploying XML, but also in its overall approach to IT. As some rivals, including Charles Schwab, slash IT personnel and capital expenditures amid the economic downturn, Fidelity is increasing its Internet infrastructure and overall technology spending, typified by the massive XML conversion. 
Fidelity will spend about $2.7 billion -- more than 24 percent of gross revenue -- on technology in 2001. That figure includes $350 million for Internet development alone, which is a 35 percent increase over last year... Fidelity's XML strategy is most critical to bringing new applications -- and services -- to customers faster than rivals. By using XML as a common language to which all corporate data -- from Web, database, transactional and legacy systems -- is translated, Fidelity says it's already saving millions of dollars on infrastructure and development costs. That's because the single data format eliminates scores of proprietary translation methods that it previously had to develop for communications between the company's many systems. Standardizing on XML also lets Fidelity impose common ways of representing types of data, such as account balances, across systems. Before, each system or business unit might use different methods to convey an account balance, requiring further programming tweaks when it came to exchanging data between systems... Fidelity's XML effort involves writing Java server components that translate different attributes and data types into XML definitions. Those components are stored within IBM's WebSphere..." See related articles: "Fidelity architect details the challenges of implementing XML" and "Lab keeps Fidelity in technology's forefront."

  • [August 07, 2001] "Java's Sweet Spot." By Oliver Rist and David Aubrey. In InformationWeek (July 30, 2001). "Application servers are at the core of Web application development efforts. While many IT managers and CIOs must deal with this curious piece of middleware every day, others see it only as an icon in an application diagram. Finding out what application servers can and can't do lets senior IT managers help their developers make informed decisions that can benefit an entire company. Application servers evolved directly from the earliest HTML content servers. What makes an application server different from a Web server is its ability to turn data from a variety of sources into Web content, not just HTML. In this article, we'll look at the most popular app servers that conform to Sun Microsystems' Java 2 Enterprise Edition specification... Choosing the right application server platform for your design is a critical step in completing a successful development project. Most often, this revolves around how well the product handles Java, J2EE, EJB, and JSP code. Support for bundled as well as third-party ancillary tools is also important. But the last step in evaluating an application server has to be its future. Java is the darling today and will undoubtedly continue to be important for the foreseeable future. But anyone observing the Web application industry already can see future standards growing in importance. XML is an example, along with its associated standards, such as the Simple Object Access Protocol and the E-business XML (ebXML) effort. IT managers need to identify how important these technologies are to the future of their applications. How will the company that sells you a J2EE application server today support these standards tomorrow? Extending the Web into the wireless world is another great example. New standards like the Wireless Access Protocol are making mobile Web apps not only feasible, but even practical. 
Closely intertwined with XML, these standards will become critical as wireless development becomes more popular, and application server and other Web middleware will need to support it. What's important in an application server is changing just as quickly as the functionality of the Web is growing..."

  • [August 07, 2001] "The Delphi XML SAX2 Component and MSXML 3.0." By Danny Heijl. In Dr Dobb's Journal [DDJ] (September 2001), pages 42-54. "Danny shows how to use the C++ COM interfaces of Microsoft's MSXML 3.0 SAX2 parser with Borland Delphi. He then presents TSAXParser, a Delphi component that uses these interfaces, but shields you from their complexities. Additional resources include xmlsax2.txt (listings) and (source code)."

  • [August 07, 2001] "SOAP: Simplifying Distributed Development." By Neil Gunton. In Dr Dobb's Journal [DDJ] (September 2001), pages 89-95. "The Simple Object Access Protocol (SOAP) was developed as an open RPC protocol using XML, targeting much the same problem set as CORBA, DCOM, and Java RMI. Neil uses it to add new facilities to his web site. Additional resources include soap.txt (listings)." See "Simple Object Access Protocol (SOAP)."

  • [August 07, 2001] "Websigns: Hyperlinking Physical Locations to the Web." By Salil Pradhan, Cyril Brignone, Jun-Hong Cui, Alan McReynolds, and Mark T. Smith. In IEEE Computer Volume 34, Number 8 (August 2001), pages 42-46. [Special Issue on "Location Aware Computing."] "First-generation mobile computing technologies typically use protocols such as WAP and i-mode to let PDAs, smart phones, and other wireless devices with Web browsers access the Internet, thereby freeing users from the shackles of their desktops. We believe, in addition, users would benefit from having access to devices that combine the advantages of wireless technology and ubiquitous computing to provide a transparent linkage between the physical world around them and the resources available on the Web. Building on a decade of research in this area, we are developing devices that augment users' reality with Web services related to the physical objects they see. In the CoolTown research program at Hewlett-Packard Laboratories, we are building ubiquitous computing systems that sense physical entities in the environment and map them to a Web browser. To create a hyperlink between a physical entity and a Web resource, we attach infrared beacons, radio frequency ID tags, or bar codes to people, places, and things to associate them with an appropriate universal resource identifier, resolving the URI in the network if it is not already a URL. These hyperlinks rely on commonly available wireless mobile devices to help users automatically access services associated with physical objects... Websigns essentially bind location coordinates, control parameters such as access range, and a service represented by a URL. We use the Websign Markup Language (WsML), an XML application, to express the binding semantics. As the 'Websign Markup Language' sidebar indicates, usually Web servers host WsML for mobile devices to download over a cellular wireless connection. 
Mobile devices can also host WsML for other peer-to-peer devices. Typically, peer devices can communicate over short-range radio networks such as Bluetooth or send WsML embedded in text-message-over systems such as Short Message Service." See the discussion.

  • [August 07, 2001] "HR-XML 1.1 Adds Flexibility for International, Temporary and Contract Users." By Rich Seeley. In Application Development Trends (August 4, 2001). "Chuck Allen, HR-XML Consortium director, said he and his colleagues 'learned a lot' from HR-XML Staffing Exchange Protocol (SEP) Version 1.0, which was approved last October. They incorporated much of what they learned into the newly released 1.1 version of the standard, which was designed for internal corporate human resources, financial departments and Internet job boards... Among the initial assumptions was that the standard would be used primarily for internal corporate HR systems. But staffing companies, such as Kelly, that were early adopters were interested in requisitions coming out of purchasing departments. Allen said there was some internal debate among HR-XML members as to whether they wanted to get into the procurement side of the hiring business or remain purely an HR standard. But looking at how a new hire or temporary position is handled in the interconnected world of the Internet convinced members to expand the standard for use by purchasing departments. SEP 1.1 now accounts for what Allen calls the 'chaining of information' that occurs as a request to hire a full-time, contract or temporary worker travels about the Internet. "You start with a requisition in the purchasing office that goes to a staffing agency; if they don't have a candidate, it will go out to a job board," he explained. "We realized that even if we designed the protocol for separate transactions, the data in this chain would get muddled. These are not entirely different transactions." The new version of HR-XML SEP 1.1 supports chaining of information; this means that if a company so desired, the purchasing manager would be able to track what happened to that requisition back through the staffing agency and out to a job board such as" See: "HR-XML Consortium."

  • [August 07, 2001] "XML: The next Esperanto?" [Opinion.] By John D. Williams. In Application Development Trends Volume 8, Number 8 (August 2001), pages 63-64. "So far, with XML, we've moved from raw data to structure and beyond structure to grammar, to add value and information to our system communications. Is this enough for all our system communication needs? Think back once again to our dialogue. The dialogue obeys the rules of both structure and grammar, yet we are still unsure of its meaning. Does this kind of ambiguity happen in system-to-system communication? It certainly does. As we will see, this can be an issue in developing B2B exchanges. So where does that leave us with XML today? We have structure. We have grammar. Unfortunately, we have no context. Why is this a problem? I think the current state of XML is like the Esperanto language. Esperanto was introduced in 1887 by Dr. L.L. Zamenhof. He proposed Esperanto as a second language that would allow people who speak different native languages to communicate, yet at the same time retain their own languages and cultural identities. Sounds like a great idea doesn't it? It sounds much like the rationale behind XML. So how has it worked? While there is a claim that millions of people around the world speak Esperanto, evidence seems to suggest that the language is not widely used for communication. Why is this and how does it relate to intersystem communications? Like XML, Esperanto has structure and grammar, but that isn't sufficient for it to become the predominant language people use to communicate. There is a line of thought from anthropology that says language shapes the way we think and perceive. The Sapir-Whorf hypothesis states that the structure of a language constrains thought in that language, and constrains and influences the culture that uses it. 
In other words, if concepts or structural patterns are difficult to express in a language, the society and culture using the language will tend to avoid them. Individuals might overcome this barrier, but the society as a whole will not. Esperanto does not have this type of influence, because it is an artificial language. People do not think natively in Esperanto. It does not provide anyone the native context for understanding and interpreting their world. So what?... I propose that we create a standard, called XML Frames, to provide this context for systems. Systems would use these frames for interpreting the information structured by XML tags and constrained by XML Schema. Frames should not become buckets full of everything. It makes little sense to try to define the whole world in a single frame. Instead, frames should be focused and contain only the information needed to provide context for a given exchange (though that may be fairly complex). We could supplement frames with a methodology for understanding and resolving differences between frames..."

  • [August 03, 2001] "Architectures in an XML World." By Joshua Lubell (National Institute of Standards and Technology, Manufacturing Systems Integration Division). [To be] published in the proceedings of Extreme Markup Languages 2001, Montréal, Canada, August 14-17, 2001. "XML (Extensible Markup Language) developers have at their disposal a variety of tools for achieving schema reuse. An often-overlooked reuse method is the specification of architectures for creating and processing data. Experience with APEX, an architecture processing tool implemented using XSLT (Extensible Style Language Transformations), demonstrates that architectures can fulfill a role not well served by alternative approaches to reuse... Developers of markup languages have long recognized the importance of reuse. Since the early days of SGML (Standard Generalized Markup Language), authors of DTDs (Document Type Definitions) have used parameter entities to help make markup declarations more reusable. Newer approaches to reuse run the gamut from the relatively simple concept of namespaces to more sophisticated methods such as the facilities available in the W3C's (World Wide Web Consortium's) XML Schema specification. As a result, XML developers today have at their disposal a variety of tools for achieving reuse... Architectures, alternatively referred to as 'architectural forms' or 'inheritable information architectures,' have been around since the mid-1990s. Although the architecture mechanism's invention predates the standardization of XML, architectures are still being used today -- most notably in the ISO Topic Maps standard and in the W3C's XML Linking specification (XLink). In this paper, I briefly describe the architecture mechanism. Next, I discuss APEX (Architectural Processor Employing XSLT), a tool implemented using XSLT (Extensible Style Language Transformations) for processing architectures. 
I conclude by discussing how architectures compare with some alternative reuse techniques. Within the context of markup languages, an architecture is a collection of rules for creating and processing a class of documents. Architectures allow applications to: (1) Extend XML vocabularies without breaking existing applications. (2) Create architecture-specific document views, retaining only relevant markup and character data while hiding all other content. (3) Promote data sharing between user communities with inconsistent terminologies by enabling the substitution of identifier names and by allowing simple document transformations." Note, in connection with APEX and the Extreme 2001 paper, Lubell's recent post to the RELAX NG mailing list: "I added an example to my XSLToolbox package that uses RELAX NG patterns to represent the 'purchase order' and 'international purchase order' sample schemas from Part 0 of the W3C XML Schema recommendation. The international purchase order pattern uses the purchase order pattern as a (ISO/IEC 10744 Annex A.3) architecture. My example attempts to show how you can use RELAX NG and architectures together to mimic W3C XML Schema features such as type extension and substitution groups. The example also exploits RELAX NG's namespace-awareness in that the international purchase order pattern contains an embedded annotation specifying the default attribute values needed for architectural processing. I'm thinking of making this example into a poster presentation for Extreme 2001, to supplement a paper I'm presenting on architectures and XSLT... To run my example, download and unzip the XSLToolbox distribution, and look at README.TXT in the apex/examples/schema directory." References on AFs: "Architectural Forms and SGML/XML Architectures."

  • [August 03, 2001] "Status and Directions of XML in Technical Documentation in IBM: DITA." By David A. Schell, Michael Priestley, John P. Hunt, and Don R. Day. Paper presented at IBM's 'Make IT Easy 2001 Conference' [and voted one of the best papers at the year 2001 conference]. "For the past two years a workgroup inside IBM's User Technology community has been working on creating an XML architecture for the next generation information deliverables. In this paper we describe the current state of that work, the status of the Darwin Information Typing Architecture, and our directions for XML. We also discuss our guiding principles for our work on XML and our activities related to validation and proof-of-concept... In March of 2001, IBM released for public awareness and commentary a new architecture for authoring, producing, and delivering technical information. The Darwin Information Typing Architecture (DITA) deals with the complexity of information at two levels: it goes beyond book-oriented DTDs by addressing typed data at the topic level, and it features specialization, which allows derivation of new topic types (and their specialized vocabularies) from base types. Topics of any type can be assembled into books, webs, and helpsets without rewriting, owing again to the specialization methodology, which allows new vocabularies to be processed reliably by previous tools. DITA was developed by a team led by Don R. Day, Michael Priestley, and Dave A. Schell of IBM. The popular hype surrounding XML is that it promises greater reuse, semantic specificity, and interchangeability. Out of the hundreds of XML applications currently proposed or deployed, very few deliver on all three promises, because the promises are basically in conflict with each other. 
For example, in most XML applications the more specific the vocabulary, the less interchangeable the documents (a <var> programming variable element is not likely to be required in DTDs that support simple memos, therefore it would have to be transformed if the content were reused in a memo). DITA helps deliver on these promises of XML by having base semantics from which new vocabularies are progressively defined, and this same base semantic continues to support the processing for the new vocabularies. This architectural feature ensures that new element names can always be associated to existing processors to produce contextually correct results. Likewise, new elements can always be transformed back to their base types in situations where the specificity needs to be relaxed, or transformed to related vocabularies derived from the same base semantics in cases where documents are interchanged between companies that use different names for similar elements or attributes. Semantic specificity based on derivation has other advantages. Derivation avoids the sort of semantic overloading that may occur in other DTDs when an unrelated existing element is used as a base for a new semantic. With no derivation architecture, such newly defined elements may be used in contexts that have nothing to do with the semantics of that new element. Derivation ensures that an entire context for a new vocabulary is properly represented, and that the content models are therefore reasonable. Moreover, one can limit the number of elements in new content models so that fewer nonapplicable elements are visible. This is a tremendous authoring aid! In effect, the choices available for writers within a particular specialized vocabulary need only be those that are appropriate for that context." See also: IBM's Darwin Information Typing Architecture (DITA). [cache]

  • [August 03, 2001] "Picture This: Web Interfaces Go Interactive." By Kurt Cagle (WWW). In XML Magazine Volume 2, Number 4 (August/September 2001). ['Static bitmaps and primitive animation have limited the development of truly interactive Web graphics. Now there's SVG, a standalone language that lets you use XSLT to generate dynamic graphical user interfaces from XML data sources. Will combining graphical language capabilities with the flexibility of XML and XSLT make SVG mainstream?'] "The concept of Web interfaces until recently was limited by the toolsets that we had available: collections of widgets that make up most windowing systems, buttons, scrollbars, drop-down lists, check buttons, and so forth. Multimedia developers using products like Macromedia Director or Flash pushed the boundaries of what such interfaces could look like, but their large size -- and the notion that building client interfaces was the least compelling part of programming -- kept much of this interest from spreading beyond kiosks and children's games. Enter Scalable Vector Graphics (SVG), which offers a way to turn any set of XML-based information into an interface -- any interface. SVG marries the low bandwidth and discrete identities of graphic languages like PostScript with the flexibility and interaction of XML and XSLT. While it probably will take a few years before SVG becomes mainstream, the ability to turn any graphic image into an interface, dynamically, will make SVG a major contender in the Web world. Let's see how... Transforming to SVG: In many respects, XSLT is the perfect vehicle for creating SVG. While some SVG pieces will be created through design tools such as Adobe Illustrator, a significant amount of the information graphics to which SVG doubtless will be applied involve taking an XML data document of some sort and converting it into a visual image. 
While XSLT's functional abilities are generally sparse (although they can be augmented through nonstandard extensions, depending on the parser), most information graphs do not depend on graphing mathematical functions, but rather illustrating specific sets of data points... The SVG of the Future: Turning a static graphic into a user interface perhaps demonstrates the crucial strength of SVG going into the future. SVG offers a way to turn any set of XML-based information into an interface -- any interface. A map of a building could record electricity and temperatures in any part of the building, or be used as part of a SOAP-based system to let you turn thermostats up or down, send computers into hibernation mode, or tell when your four-year-old daughter has left the refrigerator door open and send a signal to the servo in the door to close it. For that matter, an SVG-based system could also keep track of where your daughter is (through a sensor in her clothes), so you don't close the door on her accidentally. You can also envision program design through the manipulation of SVG objects on a desktop in much the same way that UML is designed now using Rational Rose or Visio software. The advantage to SVG over other coding is that SVG objects can be generated through transformations from XML-based IDLs and can send events to inform the underlying objects when they need to change (or attempt to change) in the face of user stimuli. Clearly, the complex roles of dynamic graphical interfaces foster a great opportunity for SVG in the Web's evolution..." See: "W3C Scalable Vector Graphics (SVG)."
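
    The idea in the excerpt above, taking an XML data document and converting it into a visual image with XSLT, can be sketched roughly as follows. This is not code from Cagle's article: the `<data>`/`<point>` input format, the bar dimensions, and the class and method names are all assumptions made for illustration, and the transform runs through the JDK's built-in javax.xml.transform API.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

class SvgBarChartSketch {

    // Hypothetical stylesheet: one <rect> bar per <point value="..."/> element.
    // Bars are 20 units wide, spaced 30 units apart, on a 100-unit-high canvas.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns='http://www.w3.org/2000/svg'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/data'>"
      + "<svg width='200' height='100'>"
      + "<xsl:for-each select='point'>"
      + "<rect width='20' x='{(position() - 1) * 30}'"
      + " y='{100 - @value}' height='{@value}'/>"
      + "</xsl:for-each>"
      + "</svg>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the SVG markup.
    static String toSvg(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the article.
        String data = "<data><point value='40'/><point value='75'/></data>";
        System.out.println(toSvg(data));
    }
}
```

    The stylesheet only positions bars from data points, which fits the excerpt's remark that most information graphics illustrate specific sets of data points and do not need XSLT to graph mathematical functions.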

  • [August 03, 2001] "Object Serialization and OO Techniques." By Dan Wahlin. In XML Magazine Volume 2, Number 4 (August/September 2001). ['XML serialization provides a great mechanism for working with XML documents in an object-oriented manner. Learn how to use XML serialization techniques to employ object-oriented techniques and manipulate the web.config file. Use XML object serialization with configuration files to make applications easier to maintain'] "XML Serialization presents an excellent mechanism for working with XML documents in an object-oriented manner. In the June/July issue I discussed how using XML serialization in the IBuySpy Portal site simplified working with configuration files. Now I'll demonstrate how to leverage XML serialization techniques to manipulate the web.config file using object-oriented techniques rather than the Document Object Model (DOM). Let's take a detailed look at the serialization/deserialization process and how to leverage XML serialization to manipulate the web.config file used in ASP.Net applications... This process requires a little extra setup work, but this work pays off by allowing the file to be manipulated using object-oriented techniques. In cases where the resulting serialized XML document must follow a predefined schema it's important to have a good understanding of how to control the serialization process... Notice that all of the data is stored using attributes. This attribute-centric structure is important because properties and their related values are serialized as elements and text nodes by default. To serialize properties as attributes of an element, specific .Net attributes must be used. If you're not familiar with what .Net attributes are or how they can be used in the platform, the following definition pulled from the .Net SDK provides a good overview [...] 
The common language runtime allows you to add keyword-like descriptive declarations, called attributes, to annotate programming elements such as types, fields, methods, and properties. When you compile your code for the runtime, it is converted into Microsoft intermediate language (MSIL) and placed inside a portable executable (PE) file along with metadata generated by the compiler. Attributes allow you to place extra descriptive information into metadata that can be extracted using runtime reflection services. Although working with XML documents in an object-oriented manner through using XML serialization requires some initial setup work, the resulting application is much easier to maintain and use by programmers with basic object-oriented knowledge. For a live example of the web.config application detailed here, visit You'll find the application in the CodeBank section."

  • [August 03, 2001] "XML Special Report on Pervasive Computing." By Stuart J. Johnston. In XML Magazine Volume 2, Number 4 (August/September 2001). ['From HailStorm to JXTA, Sun ONE to .Net, the notion of pervasive computing is rapidly becoming a reality. With XML at the heart of these ubiquitous technologies, Microsoft and Sun Microsystems are leading the way to empower users'] "In the Web services horse race, to many observers Microsoft is ahead by a nose. The Redmond giant -- not about to be hobbled by the government's antitrust suit -- launched its .Net Web services initiative more than a year ago. In recent months, it has gradually raised the veil as well as the noise level on key early Web services it plans to provide -- some argue 'force on' -- users. Microsoft has collectively named those services 'HailStorm.' Company executives have said that there will be plenty of room for third parties to provide their own services on top of the .Net architecture, but that Microsoft will provide a core set of HailStorm services itself. The first of these will be an authentication platform, a universal messaging system, a notification platform, and storage of users' data out "in the cloud" -- basically on servers in data centers provided by Microsoft. By definition, HailStorm will support Web Services Description Language (WSDL), Universal Description, Discovery, and Integration (UDDI), and Simple Object Access Protocol (SOAP)... Meanwhile, Sun Microsystems has been just as vocal about serving up pervasive computing for users. But it's not as simple as comparing Sun's approach, which obviously centers on Java, to .Net and HailStorm. Still, despite suggestions to the contrary by senior Microsoft executives like Ballmer, Sun has not missed the point. XML is undoubtedly an important part of the Java company's plans. And it's getting to be more so. 
Java 2 Enterprise Edition (J2EE) version 1.4, which will likely ship within the next two years, will have built-in support for the Web services model, but Sun isn't waiting and has already begun shipping portions of its strategy. Sun's approach to XML falls mainly into two areas: the Sun Open Net Environment or Sun ONE, which includes adoption of a wide range of XML technologies as well as the incorporation of XML applications programming interfaces (APIs) into the Java language, and Sun's open source, peer-to-peer project called JXTA. Like all of the other pretenders to the Web services throne, Sun has embraced XML, even if in some areas at first a bit coldly. For instance, with its gigantic investment in Java and Java communications, the arrival of SOAP was more like the arrival of the poor country cousin coming to visit the city relations for the first time. Though the company initially endorsed SOAP for certain nonmission-critical communications, executives confidently put their money on the United Nations- and OASIS-supported electronic business XML (ebXML) project. That is, until last spring when the ebXML task force adopted SOAP for the transport, routing, and packaging portions of the proposed standard. At that point, seeing that SOAP had been confirmed as a member of the family, Sun finally acquiesced. Adopting SOAP did not loosen Sun's faith in Java as the first born, but it did mean that all of the Java children would at least be able to speak XML when necessary. This is primarily accomplished by adding a series of APIs to Java that retrofit the language to support key XML functions..."

  • [August 03, 2001] "Can XSLT Speed Up Table Displays?" By A. Russell Jones. In XML Magazine Volume 2, Number 4 (August/September 2001). ['Can XSLT really make displaying database data in tables more efficient? Compare and contrast four table-building methods that use ASP to display data in HTML-formatted tables. The fastest and most flexible way to create HTML-formatted tables just might surprise you.'] "Ever since Microsoft released Active Server Pages (ASP), people have been using it to display database data in HTML-formatted tables -- and arguing about the most efficient method. Now that XSLT has joined the race, I wanted to know whether using XSLT could have any effect on efficiency. The sample code contains four ASP pages to facilitate comparing four different methods for building a table: a standard loop through a recordset (TableStandard.asp), a server-side XSLT transform (tableXSLT.asp), a client-side XSLT transform (tableXSLTClientAuto.asp), and a cached server XSLT transform (tableXSLTCached.asp). Each method uses the SQL Server sample Northwind database and retrieves the entire Customers table (91 rows, 10 columns). The generated table requires approximately 92 KB, so it serves as a reasonable real-world example... When ADO was able to persist data in XML form, and Internet Explorer was able to bind HTML elements to data sources, a few people began binding their tables directly to XML data islands. I didn't make a performance comparison between the databinding version and other versions, but my subjective impression is that it compares favorably with other methods that perform display processing on the client rather than on the server. At about the same time, XSL and XSLT began to make waves in the industry (XSL was an early, nonstandard version of XSLT). Using an XSLT style sheet, you can process XML documents on the server or client and output HTML -- the equivalent of looping through the recordset, but written in optimized binary code. 
To do this though, you need to deal with two problems: the XML produced by persisting an ADO recordset is not exactly in the format you might expect, and it doesn't include any character-encoding specification, which makes the XML difficult to work with. However, both of these problems are easy to solve... The method that displays fastest in the browser is to build the table by either looping through a recordset or using a generic transform on the server. The method that uses the least Web server resources is to retrieve the data with the FOR XML clause, and send the resulting XML data directly to Internet Explorer, where a client transform creates and displays the HTML. However, the most flexible method is to use cached XSLT transforms on the server because it works with all browsers, displays faster than client-side transforms, and -- at least for documents the size of the Customers table -- is almost as fast as sending the XML data to the client and transforming it there. Providing actual times is useless, because your setup is likely to be different than mine, but using a cached XSLT transform on the server runs at essentially half the time required to loop through the rows of a recordset in server script code. If you're already using MSXML to perform transformations on your server, you can add just a few lines of code and double the speed by caching the transforms. If you're still looping through recordsets building the HTML manually, you'll be glad to know that the effort you put into changing to transformations will pay off in reduced server load and faster pages."
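
    The caching idea described above, compiling a stylesheet once and reusing it for every request, can be sketched as follows. The article itself uses MSXML from ASP; this sketch substitutes the JDK's javax.xml.transform.Templates object as a rough analogue, so it is not the author's code. The `<customers>` input format, the stylesheet, and the class and method names are assumptions made for illustration.

```java
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

class CachedTransformSketch {

    // Hypothetical stylesheet: renders <customer name=".." city=".."/> rows
    // as an HTML table.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/customers'>"
      + "<table><xsl:for-each select='customer'>"
      + "<tr><td><xsl:value-of select='@name'/></td>"
      + "<td><xsl:value-of select='@city'/></td></tr>"
      + "</xsl:for-each></table>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // The compiled stylesheet, built on first use and then shared.
    private static Templates cached;

    // Parses and compiles the stylesheet only once; later calls reuse it.
    static synchronized Templates templates() throws TransformerException {
        if (cached == null) {
            cached = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));
        }
        return cached;
    }

    // Each call gets a cheap, fresh Transformer from the cached Templates.
    static String toHtmlTable(String xml) {
        try {
            Transformer transformer = templates().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        // Two made-up sample rows, standing in for a real query result.
        String xml = "<customers>"
                   + "<customer name='Alfreds Futterkiste' city='Berlin'/>"
                   + "<customer name='Around the Horn' city='London'/>"
                   + "</customers>";
        System.out.println(toHtmlTable(xml));
    }
}
```

    A Templates instance may be shared across threads, while each Transformer is used by one request at a time. Whether caching gives a speedup like the one the article reports will depend on the setup, as the author notes for his own timings.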

  • [August 03, 2001] "WSDL for Defining Web Services." By Don Kiely. In XML Magazine Volume 2, Number 4 (August/September 2001). ['Achieving fully distributed Web services has yet to be realized, partly because a lack of standards has fragmented development efforts. The W3C's XML Protocol Working Group hopes to change that with initiatives that include WSDL as a complement to SOAP and UDDI. With the W3C's XML Protocol Working Group and industry heavyweights behind it, will WSDL reach Recommendation status?'] "The big news in Web development over the last year has been Web services. Microsoft made it mainstream with its early work on Simple Object Access Protocol (SOAP), using it as the basis for its vision of Web services in the .Net framework. During that time many groups submitted their own standards proposals to provide key pieces of the XML middleware story. Fundamentally, these networked service requests are a way to request XML-related functionality from a remote machine over a network such as the Internet. The more notable standards entries for Web services include Web Distributed Data eXchange (WDDX), XML Remote Procedure Call (XML-RPC), and SOAP. There also have been proposals for defining the descriptions and structure of such content, including Information Content Exchange (ICE) and the RDF (Resource Description Framework) Site Summary (RSS). Many developers have also done very well using the common Internet standard of Multipurpose Internet Mail Extensions (MIME). At the same time, Web developers have plugged away building applications with plain old HTTP. But many other XML protocol initiatives are floating around the W3C and elsewhere, so the W3C has a new XML Protocol Working Group for addressing these issues. The group has begun defining its charter and pulling together the proposed standards that ultimately will be the backbone of this new generation of Web features. 
Here's an excerpt from the group's charter: 'A broad range of applications will eventually be interconnected through the Web. The initial focus of this Working Group is to create simple protocols that can be ubiquitously deployed and easily programmed through scripting languages, XML tools, interactive Web development tools, etc. The goal is a layered system which will directly meet the needs of applications with simple interfaces (e.g. getStockQuote, validateCreditCard), and which can be incrementally extended to provide the security, scalability, and robustness required for more complex application interfaces. Experience with SOAP, XML-RPC, WebBroker, etc. suggests that simple XML-based messaging and remote procedure call (RPC) systems, layered on standard Web transports such as HTTP and SMTP, can effectively meet these requirements'..." See: "W3C XML Protocol."

  • [August 03, 2001] "An XML Framework for Coordinating Creative and Technical Design." By Gordon Benett. In IntranetJournal August 2001. "Legend has it that the Tower of Babel was the world's second great engineering project (after Noah's ark) and its first great engineering failure. Things went smoothly as long as the architect, construction workers and project managers all spoke the same language; until, that is, things got real. Real world projects are multi-cultural endeavors. Even when everyone speaks the same language, different factions bring different interests, priorities, strengths and weaknesses to the task at hand. Anticipating and accommodating this reality is one of the keys to good management. In this article, we'll look at the cultural clash between three groups often involved in Web projects: front-end developers, information architects and visual designers. Then I'll describe an XML-based framework I used successfully to expedite production of a 600-page commercial Web site... One of the value propositions most cited for XML is the way it enables designers to separate content and structure from presentation. In terms of our design constituencies, it seemed fair to identify the information architecture (IA) with site structure and the visual design with presentation. Front-end developers were responsible for mediating between the two, populating templates with DHTML in conformance with both the style guide (a creative design document) and the site architecture. It turned out to be straightforward, even 12 months ago, to mimic this organization in code using XML, DTDs, XSLT and the DOM. The first step was to recognize that the content, which in this case amounted to around 600 pages of Word documents, was well structured. This meant that it could be converted programmatically to XML and validated against a Document Type Definition (DTD). 
Every document had a title, multiple paragraphs of body content, legal disclaimers and numerous optional elements which, importantly, always stood in well-defined relation to the mandatory elements. We created a rough DTD by inspecting a small number of documents, then refined it by exception during the conversion process. The final DTD was a formal representation of all the structures in our content... A bigger benefit was the way the DTD enabled communication between project teams. For once, the art department received a formal and comprehensive listing of all the element types appearing in the site. Using this designers were able to create a mapping from each element type (for instance, the Title tag) to a presentation style (such as font-size: 18pt; color: navy). In so doing, the designers, while remaining true to their aesthetic proclivities, fleshed out a style guide formal enough for front-end developers to use as a CSS specification. In effect, the developers created a CSS 'binding' to the DTD based on the art department's design work... The real advantage to having an XML site blueprint is the ability, using Extensible Style Language Transforms (XSLT) and the CSS binding of the style guide, to programmatically generate site prototypes. Using XSLT, the site architecture, instead of being buried in a set of Visio diagrams or Excel spreadsheets, can fluently be rendered into Web pages and submitted for usability testing..."

  • [August 03, 2001] "Magic with Merlin: Long-term Persistence. Serialize JavaBean component state to XML." By John Zukowski (President, JZ Ventures, Inc.). From IBM developerWorks. July 2001. ['The ability to save the JavaBean component state for long-term persistence within an XML document has been a topic of much discussion with Java developers in the past few years. This feature has finally been adopted in the 1.4 version of J2SE. In this installment of Magic with Merlin, John Zukowski shows you how to use the new XMLEncoder and XMLDecoder classes, bypassing serialization and allowing you to generate fully initialized bean instances. Share your thoughts on this article with the author and other readers in the discussion forum by clicking Discuss at the top or bottom of the article.'] "One new feature of Merlin has been thrown around in various incarnations at Sun's Swing Connection for some time now; in fact, it was first discussed at the 1999 JavaOne show. That feature is the ability to save the JavaBean component state for long-term persistence within an XML document. Serialization works fine for short-term marshaling needs, with CORBA and RMI, or for saving state information within an executing servlet. However, serialization can run into problems across versions of class libraries or Java run-time environments, among many other issues. The new XMLEncoder / XMLDecoder classes permit the dumping of the JavaBean component state to a text file for easy modification outside of a Java program or more likely for the generation of such files. Let's take a look at how to use the classes and examine the file generated..." With source code; article also available in PDF format.
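
    A minimal sketch of the XMLEncoder / XMLDecoder round trip described above. It is not taken from Zukowski's article: the `Person` bean and the `encode`/`decode` helper names are hypothetical, and the sketch writes to an in-memory byte array where the article talks about text files. Only the java.beans classes themselves are part of the JDK.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class BeanPersistenceSketch {

    // Hypothetical bean for illustration. XMLEncoder works from the public
    // API of the bean: a public class, a public no-argument constructor,
    // and matching getter/setter pairs.
    public static class Person {
        private String name = "";
        private int age;

        public Person() {}

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Writes the bean's state as an XML document held in memory.
    static byte[] encode(Object bean) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(bean);
        }
        return out.toByteArray();
    }

    // Rebuilds a fully initialized bean instance from the XML document.
    static Object decode(byte[] xml) {
        try (XMLDecoder decoder =
                 new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Person original = new Person();
        original.setName("Ada");
        original.setAge(36);

        byte[] xml = encode(original);
        // The XML is plain text, so it can be inspected or edited by hand.
        System.out.println(new String(xml, StandardCharsets.UTF_8));

        Person restored = (Person) decode(xml);
        System.out.println(restored.getName() + " " + restored.getAge());
    }
}
```

    Because the encoder records property values through the bean's public API and not its private field layout, the resulting document is less tied to a particular class version than binary serialization, which is the long-term persistence argument made in the excerpt.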

  • [August 03, 2001] "Soapbox: Humans Should Not Have To Grok XML. Answers to the question 'When shouldn't you use XML?'" By Terence Parr (Chief scientist). From IBM developerWorks. August 2001. ['Today the computing world tends toward using XML for any and all formal specifications and data descriptions. The author, a big fan of XML, asks a blasphemous question: "Is XML totalitarianism a good idea?" In this opinion piece, Terence Parr, co-founder of jGuru, demonstrates that XML makes a lousy human interface. He also provides questions to ask yourself to determine if XML is appropriate even for your project's program-to-program interface needs.'] "Remember what life was like before cut-and-paste? In modern operating systems, the paste buffer holds data in a standard way and each program is free to interpret the buffer data as it sees fit. For example, you can cut from a database program and meaningfully paste into a graphing program. Similarly, we have a standard means of sharing data between programs and between machines on the Internet called XML. Without XML or a similar standard, no two programs could share information -- the fundamental syntax used to format data must be the same for data portability. Of course, you may not be able to interpret that data, but you can at least read it in. Take a look at such things as SOAP and XBeans to see how XML facilitates interoperability (see Resources). Now that we've had a group hug and agreed that XML is, or should be, the common language of program data interchange, I'd like to discuss the converse: When does using XML make no sense? First, I need to remind you what XML looks like and how it differs from other data formats. Given that background I can then ask a series of questions that may prove useful when determining a data format for your project. Finally, I'll demonstrate my main proposition: XML makes a lousy human interface... My argument boils down to one of human vs. computer hardware. 
Humans deal especially well with implied structure whereas computers, which were designed to be good at what we are not, prefer explicit structure. The closer your computer language is to natural language, the more natural it will be for a human, but the harder it will be to implement. A good compromise in this tug-o-war is to use a subset of natural language possibly with some hints in the form of punctuation, mathematics being the most obvious and useful example. To my amazement, this classical approach has lost dominance to XML-based explicit structure languages whose form is trivial to recognize (download a free standard XML parser), but that are extremely unnatural and laborious to type and read. Where you strike the balance in your interface language has a lot to do with your experience and available resources, but I hope you at least recognize that computer-friendly XML syntax is not human friendly. Let me leave you with some advice: learn about languages, their design and implementation. Consider that XML itself exists to "fix" SGML's linguistic complexity and implementation difficulties. Skill with computer languages is the single most useful weapon you can acquire because it covers just about every application of computing. As the primary developer of ANTLR, a popular parser/translator generator, I receive questions from an amazingly broad group of users: biologists doing DNA pattern recognition, NASA scientists automatically building communication libraries from deep space probe specification RTF documents, people building configuration files for every conceivable kind of program, and so on..."

  • [August 03, 2001] "RELAX NG Common Annotations." By [James Clark]. Working Draft 3-August-2001. For the OASIS RELAX NG Technical Committee (activity). Initial/first draft; not an official committee work product and may not reflect the consensus opinion of the committee. ['There are lots of details that we haven't yet decided as a group. Rather than leaving the details out, I have tried to specify something reasonable. I suggest people raise issues for anything for which they want something different from what I've specified.'] Abstract: "This specification defines elements and attributes that may be used as annotations in RELAX NG schemas." Excerpt: "RELAX NG provides an annotation capability. In a RELAX NG schema, RELAX NG-defined elements can be annotated with child elements and attributes from other namespaces. This specification defines some useful annotations, emphasizing compatibility with XML 1.0. The elements and attributes defined in this specification have the namespace URI Examples in this specification follow the convention of using the prefix a to refer to this namespace URI. Conformance: This specification defines three features, [viz.,] (1) attribute default value; (2) ID/IDREF/IDREFS; (3) documentation. [...]"

  • [August 03, 2001] Markup Languages: Theory and Practice [ISSN: 1099-6622] Volume 2, Number 3. "Summer 2000," published Summer 2001. Edited by C. Michael Sperberg-McQueen (W3C) and B. Tommie Usdin (Mulberry Technologies). I have prepared an annotated Table of Contents document with abstracts, excerpts, and additional references. Articles in MLTP Volume 2, Number 3 [pages 205-335] include: "Managing XML Documents in an Integrated Digital Library" [David A. Smith, Anne Mahoney, Jeffrey A. Rydberg-Cox]; "Meaning and Interpretation of Markup" [C. M. Sperberg-McQueen, Claus Huitfeldt, Allen Renear]; "Managing Web Relationships With Document Structures" [Michael Priestley]; "An XML Messaging Architecture for Border Management Systems" [Andy Adler, James MacLean, Alan Boate]; "Navigable Topic Maps for Overlaying Multiple Acquired Semantic Classifications" [Helka Folch, Benoît Habert, Saadi Lahlou]; "Beyond Schemas: Schema Adjuncts and the Outside World" [Scott Vorthmann, Jonathan Robie]; "Using UML to Define XML Document Types" [W. Eliot Kimber, John Heintz]; "Using Java for XML Processing: Review of Java and XML and Java and XML" [Keith W. Boone]; "Review of DocBook - The Definitive Guide" [Normand Montour]. See: "MIT Press Publishes Markup Languages: Theory and Practice Volume 2, Number 3."

  • [August 03, 2001] "Microsoft Hails XML Web Services." By Paul Krill. In InfoWorld August 02, 2001. "A Microsoft official on Thursday positioned the company's .NET vision as an XML-based strategy for integrating disparate systems and applications, preserving existing investments and linking to partners' networks. Microsoft products such as the VisualStudio.NET development tool represent the core of Microsoft's XML strategy, according to Barry Goffe, group manager for .NET enterprise solutions. Hailing the company's strategy during the company's Silicon Valley Speaker Series event here, Goffe touted .NET as the company's XML Web services plan. XML, Goffe said, is the key to solving age-old problems of unifying disparate applications, data, and systems and connecting to partners with dissimilar computers and software. 'Anyone that tells you that there's a natural affinity between, let's say, Java and XML, is missing the point,' Goffe said. XML services, Goffe said, require adherence to several standards: UDDI (Universal Discovery, Description, and Integration), for publishing and finding services; SOAP (Simple Object Access Protocol), for a universal data format; WSDL (Web Services Description Language), for service descriptions; and HTTP and TCP/IP for communications... Included in Microsoft's .NET strategy are offerings such as the HailStorm Web services, Passport identity management, Windows XP, and Windows CE. Two tools offerings shipping by the end of the year, VisualStudio.NET and the .NET Framework, will have intrinsic support of Web services, Goffe said..."

  • [August 03, 2001] "XML Editors: Fact or Fiction?" By Alan Houser (Principal, Group Wellesley). Presentation to the Pittsburgh Markup Language Users Group, July 18, 2001. Author's note: "This is a rather informal overview of currently-available XML editors. I make several observations: (1) XML editors fall into two general categories: document-centric and data-centric. (2) Data-centric editors (i.e. XML Spy) are woefully inadequate for human users who wish to create document-oriented content. (3) There is still a dearth of document-centric editors on the market. (4) Many editors in both categories are missing important XML-related functionality. For example, Softquad's XMetaL does not support namespaces. Tibco's XML Instance will not validate an XML document against a DTD. (5) At least for now, one should consider other options (besides native XML editors) for getting document-oriented XML content from human authors." ['I've posted the slides from a short talk I gave last month to the Pittsburgh Markup Language Users Group about the current state of XML editors, particularly those suitable for creating conventional documents as XML instances (i.e. 'document-centric' editors). I've found that the majority of XML editors are appropriate for XML documents that consist primarily of short string and numeric data. However, for creating XML content that consists primarily of text -- titles, headings, and paragraphs -- most fall far short in terms of features that human authors expect in content creation tools.'] See also the related papers.

  • [August 02, 2001] "XML for Data: XLink and Data. Using XLink to simplify the representation of data." By Kevin Williams (Chief XML architect, Equient, a division of Veridian). From IBM developerWorks. July 2001. ['This column takes a look at how to use XLink pointers when representing data to make XML documents more compact and flexible. Sample code shows examples of an invoice with and without the XLink pointers, plus an example of using XLinks with a URL-addressable database.'] "The W3C recently promoted a specification called XLink to Recommendation status. In this column, I take a look at XLink and how you can use it to simplify the representation and transmission of data. What is XLink, anyway? To quote the W3C XLink specification: '[T]he XML Linking Language (XLink) ... allows elements to be inserted into XML documents in order to create and describe links between resources.' The specification then goes on to assert that links defined using XLink are similar to HTML hyperlinks, leading many programmers to conclude that this is the only purpose for the specification. However, there is another way XLink can be used to great benefit: to show the relationships between data resources. Consider a typical order-tracking application, say for a large manufacturing company. An XML document describing an order would usually contain information about the customer who placed the order, the order status, and the individual line items on the order, with quantities and prices. Consumers of this document might want to use it in very different ways. In the accounting department, someone who requests the order data probably would be interested only in the total price that it needs to bill the customer -- details about the individual line items on the invoice (apart from the quantity and price) would be irrelevant. 
By contrast, when customers request their orders (for viewing online, perhaps), they might want to see more information, for example, the human-readable name for a part on a line item. Transmitting the entire document to each customer with full details doesn't necessarily make sense: It would be ideal to transmit just the bare bones of the order (for consumers who are interested only in the basics) with pointers to more detailed info. XLink provides a great way to accomplish that... This column demonstrates how you can use XLink's basic functionality to simplify your document structures and reduce your network transmission overhead. It looks only at the way to use simple links; XLink also provides extended link functionality, which you can use to relate many resources together (you might create an XLink linkbase that relates a customer to all of his or her orders, for example). As XML and the associated helper technologies continue to mature, programmers will have more flexibility when deciding how to implement information systems, allowing you to tune your solutions to best meet the needs of your clients..." Article also available in PDF format. See "XML Linking Language."

  • [August 02, 2001] "XMLegant Answers: 'T' is for 'Transformation'." By Bill Trippe. In Transform Magazine Volume 10. Number 8 (August 2001), pages 31-32. "People often ask me, 'What's the big deal about XML?' On its own, XML data just sits there - tidy, well organized and self-describing - but it still just sits there. Yet XML is an expanding technology not for what it does itself; it is important because of the many related technologies and standards now coming to light. At the very least, you need to be able to format data for presentation and shape it into any other necessary forms. This is where Extensible Stylesheet Language (XSL) comes in, and, in particular, its offshoot language, XSL Transformations (XSLT). This talk of transformation probably seems self-serving - this magazine is named Transform, after all. But, in fact, the word "transformation" has been used with XML since its inception and earlier than that with SGML, XML's predecessor. The question about XML data always has been, 'What can I do with it?' The first answer has always been, 'Transform it into something else.' In my June column, I talked about XML schemas and how they are central to data modeling in the new enterprise infrastructure. One of my key points in that column was that software development today often involves moving data among loosely coupled systems, and the schema becomes the lynchpin in this movement. If schemas are the lynchpin, XSLT and other transformation tools are both the intermodal transportation system - how the data gets from one unlike system to another - and the machinery for moving all the data at each point... The concepts are pretty straightforward. XML data can be traversed, among other ways, as a tree structure. For instance, you can think of an address database as one big tree, with each address being a branch, and elements of each address being smaller branches, and so on. 
In this example, you could use XSLT to traverse the addresses to select and manipulate each name and phone number, or each name and ZIP code, or each name, phone number and ZIP code. If you have multiple databases with such information, you could create XML versions of the data and use XSLT to map from one data source to another..." For related resources, see "Extensible Stylesheet Language (XSL/XSLT)."

  • [August 02, 2001] "XML: The E-Business Jack-of-All-Trades." [Content Management: Business Rules.] By Bruce Silver. In Transform Magazine Volume 10. Number 8 (August 2001), pages 27-28. "It seems that the answer to any information exchange problem today comes down to the same magic bullet: XML. Need to personalize Web content? XML. Need to build an online catalog? XML. Need to repurpose content for print, Web and wireless? XML again. How about integrating back-office systems across your supply chain? B2B commerce, including emerging Web services? OK, you get the picture. XML's power lies in its simplicity. . . Thus it should be no surprise that as companies roll out their e-business initiatives, they're starting to get large volumes of XML. Now they have to deal with the obvious problem of how to manage it all. Web content management technology is doing a good job with the 'unstructured' part - the content on the Web site, including online catalogs - but what about the transactional data itself. The order quantities, part numbers, and shipping and billing addresses flowing between buyers and sellers are all XML as well. Initially it was just assumed that in data-centric applications XML would be used simply as an Internet-friendly pipe between existing relational databases. But the 'information democracy' enabled by XML does not always fit easily with the traditional relational model...Accommodating unique data elements needed only by one trading partner leads to inefficient and slow databases, and the time and cost of responding to new and changing requirements can derail the whole project. Standardized, application-specific data structures called schemas are the industry's proposed solution to this impasse. But as in all such work, there are competing groups, standards progress is slow and continually changing and adoption is spotty. 
In B2B initiatives, designers often have to plan for XML based on no standard schema, multiple schemas or schemas that will change significantly within six months. The next obvious question is, why not use an XML database to manage the information in its native form? Native XML databases can manage XML data without knowing its schema, or where no schema exists - a key advantage over relational databases - and there are now several of them on the market..."

  • [August 02, 2001] "Semantic Web Content Accessibility Guidelines for Current Research Information Systems (CRIS) and Web content developers of research relevant information at the universities and research institutions." 'How to publish information about research relevant object into semantic web.' By Andrei S. Lopatenko (Vienna University of Technology). "I wrote a draft specification of RDF Schema to describe research information (projects, institutions, researchers, publications, results). The schema is based on Common European Research Information Format (CERIF-2000) recommendation..." See also the RDF/Metadata reference list.

  • [August 02, 2001] XML RBAC Policy for X.509: XML DTD. Document type: X.509_PMI_RBAC_Policy. Specification of the RBAC policy language, version 2. From David Chadwick (IS Institute, University of Salford). Posted to the OASIS XACML [Access Control] TC mailing list. "As part of an EC project that I am working on at the university, we have developed a DTD for an RBAC X.509 Policy. I attach the DTD and an example XML policy that shows some of the features of it. We are also developing a Java API for use by the AEF when calling the ADF, based roughly on the OpenGroups AZN API. Our API will use the above policy to direct it in its decision making..." [.ZIP]

  • [August 01, 2001] "Cisco unveils device for delivering wireless content." By George A. Chidi Jr. In InfoWorld August 01, 2001. "Cisco Systems unveiled a new appliance for converting HTML (hypertext markup language) and XML (extensible markup language) into other data formats suitable for use on wireless devices, cellular phones and PDAs (personal digital assistants), the company announced Wednesday. The Cisco CTE 1400 Series Content Transformation Engine is an appliance the size of a single rack unit server, sitting on the network between client devices like phones or PDAs, and the content switches and caching devices containing data. It supports up to 10,000 simultaneous users and 1,400 concurrent active sessions per unit, according to Cisco. The Cisco CTE passes on requests for content to back-end servers, functioning like a reverse-proxy. It passes on essential information formatted to fit the relatively small screens and memory requirements of the specific requesting devices like a PDA or WAP (Wireless Application Protocol) enabled phone, acting as a Web server to the client device and as a client device to the Web server. It supports multiple protocols, including HTML, XML, XSL (extensible stylesheet language), XSLT (extensible stylesheet language transformation), XHTML (extensible hypertext markup language) and WML (wireless markup language)..." See (1) the announcement "Cisco Systems Introduces Cisco CTE 1400 Series Content Transformation Engine for Mobilizing Content to Wireless Devices & IP Phones. Breakthrough Appliance Accelerates eBusiness Applications", and (2) "WAP Wireless Markup Language Specification (WML)."

  • [August 01, 2001] Wireless Markup Language (WML). Version 2.0. Proposed Version 26-June-2001. Wireless Application Protocol, WAP-238-WML-20010626-p. 72 pages. "Wireless Application Protocol (WAP) is a result of continuous work to define an industry-wide specification for developing applications that operate over wireless communication networks. The scope for the WAP Forum is to define a set of specifications to be used by service applications. The wireless market is growing very quickly and reaching new customers and services. To enable operators and manufacturers to meet the challenges in advanced services, differentiation, and fast/flexible service creation, WAP defines a set of protocols in transport, session, and application layers. This specification defines the Wireless Markup Language (WML) Version 2. This specification refers to version 2 of WML as WML2. WML2 is a language which extends the syntax and semantics of XHTML Basic and CSS Mobile Profile with the unique semantics of WML1, optimised for specifying presentation and user interaction on limited capability devices such as mobile phones and other wireless mobile terminals. XHTML is the reformulation of HTML 4.0 as an application of XML. XHTML Basic is a subset of XHTML 1.1 that includes the minimal set of modules required to be an XHTML Family document type, and in addition it includes images, forms, and basic tables. It is designed for Web clients that do not support the full set of XHTML features, for example, web clients such as mobile phones, PDAs, pagers, and set-top boxes. The document type definition is implemented using XHTML modules as defined in [W3C Spec XHTMLMod]. A pure XHTML Basic document is a valid WML2 document..." See the announcement for WAP Version 2.0. [cache]

  • [August 01, 2001] "A Framework for Multilingual, Device-Independent Web Sites. Using Sun XML Language and Device-Independent JSP Extensions to Implement Dynamic Web Content" By Marc Hadley (Staff Engineer, Sun Microsystems). April, 2001; posted July 2001. ['Get a thorough introduction to using Sun XML Language and Device-Independent JSP Extensions to implement dynamic, multi-lingual, platform-independent Web content. Includes sample code.'] "The main problem with HTML, WML and other presentation based markup languages is that the content is intermingled with the presentation in such a way that it is difficult to extricate one from the other. The eXtensible Markup Language (XML) solves this problem by providing a standard for the semantic markup of content along with a standard way, XSL Transformations (XSLT), to transform XML to presentation oriented delivery formats... Java Server Pages (JSP) provide an ideal platform for dynamic page generation. In particular JSP 1.1 Tag Extensions, also called custom tag libraries or taglibs for short, provide a simple way to encapsulate such functionality in an easy to use tag based form... The Sun XML Language and Device-Independent JSP Extensions (XML LDI Extensions) are a set of JSP tag extensions useful for creating multi-lingual and device-independent web pages. XML LDI Extensions are available in both binary and source code distributions and are released under the corresponding Sun License. Readers are assumed to have a passing familiarity with XML and to be comfortable programming JSP pages. Installation instructions for the Sun XML Language and Device-Independent JSP Extensions can be found online. ... JSP custom tag extensions provide a convenient and easy to use method of incorporating dynamic content in web pages. The combination of XML and context sensitive XSLT allows development of device independent web pages. 
Pages can be made multilingual by ensuring that XSLT stylesheets are language independent through the use of vocabulary repositories..." See other Sun XML Resources on the web site.

July 2001

  • [July 31, 2001] "Navigating XML." [Technology: Standards.] By Jacques Surveyer. In Internet World August 01, 2001. "Some decry the many dialects of XML, but extensibility is its raison d'etre, after all...Though there are many more XML tools targeted for client-side use, XML also is used much more extensively on the client, buried in programs that take advantage of XML's data interchange and distributed processing features. Part of the problem is that XML is like the many-headed Hydra, with new tools and functionality appearing almost biweekly. Indeed, XML and its ever-widening ensemble can be mapped into five major IT processing roles: a mechanism for standardized data interchange; a medium for temporary or persistent storage of objects; a method for more effectively exposing the relevance and meaning of information contained in XML data stores for external query and search engines; a means to separate the structure, formatting, and raw data for different browsing and display purposes; and a method to invoke local or distributed processes, including passing network messages in a standardized way. In just three or four years, XML has become the automated medium of choice for data interchange and short-term persistence. XML may also become a major player in Web services on account of its ability to invoke distributed processes. IBM, Microsoft, and others are working overtime to develop Web services based on Simple Object Access Protocol (SOAP) and Web Services Description Language (WSDL). Standard for Interchange Some observers are criticizing the proliferation of different dialects of XML such as ebXML, RosettaNet, XDDI, cXML, and others. But these are primarily vertical market dialects, or specialized XML document type definitions (DTDs). This diversification is to be expected, given the diverse arenas in which XML is being used. These dialects reflect the success XML has had as it emerges as the data interchange standard of choice. 
The appearance of the Document Definition Markup Language recommendation (formerly known as XSchema) from the W3C's XML working group allows organizations and cooperating groups to more precisely define their data resources with appropriate validation support. These XML DTDs and schemas, in turn, are supported by more powerful XSLT and language processor-based translation routines. The latter allow automatic conversion of XML data between two sources, with different underlying designs, as explicitly laid out in their DTDs and schema definitions. With XSLT mappings, transformations and filtering can be performed. The key is having precise and standardized definitions of structure, layout, and content ranges of permitted values. XML DTDs provide a good start; the new DDML recommendation brings it up to industrial strength. That is why XML is rapidly becoming the data interchange standard... It isn't hard to find a business Web site using XML processing for EDI-like transactions. Under the covers, many popular application servers rely on XML for temporary data storage or data exchange tasks. And XML is becoming popular for desktop storage. For example, Microsoft Excel and Visio programs offer XML as an alternate storage format. Visio already has third-party tools like Simul8 reading Visio drawings, adding simulation data to the drawing, saving it back out in Visio's .VXD XML-based file format. When Visio rereads the file, Simul8's XML elements are treated as comments by Visio but are retained when the file is saved again. In effect, XML files become a common shared file format. More important, Visio finds that the XML files are processed up to three times faster than their binary counterparts -- the larger the file, the greater the speed-up. This is an important feature, because XML-formatted files can increase in size by factors of 50 to 300 percent as tags are added. The trade-off appears to be speed and ready interoperability versus size and some security concerns."

  • [July 31, 2001] "Implementing Roles in WSFL. The Web services insider, Part 6: Assuming responsibility." By James Snell (Software engineer, IBM Emerging Technologies). From IBM developerWorks. July 2001. "Web services offer the potential for creating highly dynamic and versatile distributed applications that span technological and business boundaries, allowing service providers and service consumers to improve the way they do business. Web Service Flow Language (WSFL) extends this potential by building a framework in which service providers and consumers can come together to implement standard business processes; a place where the 'Who is doing what' is less important than the 'What is being done.' This framework allows anyone who properly implements the appropriate Web service interfaces to assume the various roles of business process. In this installment of the Web service insider, I continue my discussion of WSFL, focusing on how to become a service provider... The previous Web services insider installment introduced business process modeling and the concept of service provider types that fulfill the various responsibilities of implementing those business processes. In this installment, I'm going to take a much more in-depth look at becoming a WSFL service provider. The good news is that it takes nothing more than properly implementing either a single WSDL-defined Web service interface or several, using any Web services-enabled platform -- whether or not that platform is WSFL or even WSDL capable. In other words, you really don't have to know anything about WSFL to assume the role of a WSFL service provider. What you do have to know is how to take a Web services interface definition and turn it into an actual Web service implementation, and that's the process that I intend to explain here... 
Every activity within a WSFL flow model is implemented in the form of a Web service offered by a Web service provider and fulfilling one of the defined roles within the process. Each service provider is naturally expected to properly fulfill the requirements of implementing the Web service, or set of Web services that actually execute that activity. Each activity within a WSFL flow model is linked to an actual Web service implementation using information contained in the WSFL global model... WSFL's ability to allow any Web service provider to implement any of the activities defined in the business process is perhaps one of its most powerful and useful features. The ability to dynamically locate, and bind to providers based on a user-defined set of rules adds a new dimension to conducting business on the Web that did not exist prior -- dynamic federation and integration of loosely coupled application components. In the next installment of this column, I'll introduce you to another cool feature of WSFL: the ability to recursively compose new business processes from existing business processes..." Article also in PDF format. See: "Web Services Flow Language (WSFL)." [cache]
  • [July 31, 2001] "Case Study: Unisys Unlocks the Box. Executives like Lori Wizdo are helping Unisys meet their goals of moving full steam ahead on XML-driven knowledge management initiatives." By David F. Carr. In Internet World August 01, 2001. "'We have a head for e-business' is how Unisys Corp. has marketed itself to the world -- you know, those ads featuring folks with computer monitors grafted onto their shoulders. But now, this old-line computer company, historically known for its strength in back-end transactional systems for industries like banking, is making some changes because it needs to do a better job of helping its own people put their heads together... The software choice was the result of months of research, which came down to a bake-off between DataChannel and SageMaker Inc. Other products were ruled out because they couldn't support the multi-tier security architecture Unisys employs, or because they didn't have the right combination of front-end design tools and a solid infrastructure. Farbrother says DataChannel took the lead over SageMaker late in the game because it looked like it was going to be easier to administer. Dave Snyder, director of e-application services in Unisys' corporate IT department, says there really was no single deciding factor. But besides its focus on XML, DataChannel 'has a very solid vision of the future, and a very clear methodology for how to bring it to life,' he says. DataChannel essentially started out with XML-parsing software and migrated into the portal market. It has also just released extensions to its system for working with enterprise application integration products from partners such as SeeBeyond Technology Corp. However, Unisys has yet to tap the portal's high-end capabilities to any great extent. Snyder says most of the work so far has centered on using XSLT style sheet and transformations technology to alter document presentation. 
His group has also done some experiments with creating a presentation-independent XML interface to the corporate directory, which could be the first step to retargeting content to phones or personal digital assistants. But that was mostly a matter of demonstrating the potential, since mobile access to that data isn't a top priority, he adds. DataChannel consultants handled the XSLT customization for the initial corporate portal deployment, and DataChannel also provided training in XML technology and the implementation within its product. Now, Unisys IT staff has taken charge of creating and modifying XSLT templates for the community portals and updates to the corporate portal. Although others at Unisys have worked with XML technology, the internal IT staff is still learning, according to Kane. So far, the experience has been positive, and developers seem to be coming up to speed quickly, he says.... Trying to measure ROI on this project will take discipline, since this isn't a case where it can be easily quantified as a bunch of cost savings..."

  • [July 31, 2001] "Generalizing the OpenURL Framework beyond References to Scholarly Works: The Bison-Futé Model." By Herbert Van de Sompel (Cornell University) and Oren Beit-Arie (Ex Libris -USA, Inc.). In D-Lib Magazine [ISSN: 1082-9873] Volume 7 Number 7/8, July/August 2001. "This paper introduces the Bison-Futé model, a conceptual generalization of the OpenURL framework for open and context-sensitive reference linking in the web-based scholarly information environment. The Bison-Futé model is an abstract framework that identifies and defines components that are required to enable open and context-sensitive linking on the web in general. It is derived from experience gathered from the deployment of the OpenURL framework over the course of the past year. It is a generalization of the current OpenURL framework in several aspects. It aims to extend the scope of open and context-sensitive linking beyond web-based scholarly information. In addition, it offers a generalization of the manner in which referenced items -- as well as the context in which these items are referenced -- can be described for the specific purpose of open and context-sensitive linking. The Bison-Futé model is not suggested as a replacement of the OpenURL framework. On the contrary: it confirms the conceptual foundations of the OpenURL framework and, at the same time, it suggests directions and guidelines as to how the current OpenURL specifications could be extended to become applicable beyond the scholarly information environment..." See the XML Schema for a sample descriptor format in Appendix A, and an encoded sample 'ContextObject' format in Appendix B. Background: see "NISO Develops an OpenURL Standard for Bibliographic and Descriptive Metadata."

  • [July 31, 2001] "XML-Native Databases. [Internet Technology.]" By David F. Carr. In InternetWorld July 15, 2001. ['With new XML query tools, some say efficiency and preserving metadata make them the way to go. But are they the only way?'] "If XML is really, truly becoming all that it's been cracked up to be, then 'XML-native' databases should have a chance of loosening the stranglehold relational databases have long had on corporate data. When an XML database vendor talks, however, it's most often an object-oriented database vendor's lips that are moving. One of the bigger fish in this small but growing pond, eXcelon, was formerly known as Object Design, for example. Object Design, maker of the ObjectStore database, has been redefined as a subsidiary of eXcelon, reflecting the latter's emphasis on its XML-centric product line. The case for XML databases is simple. Vendors talk about how much more convenient it is for XML developers to work entirely within XML and not have to worry about how their data will be mapped onto relational database tables, while XML-native developers complain about the overhead required to disassemble an XML data structure for storage, then put it back together again. 'It bridges the gap between the database structure and the way the data is actually used,' says Tim Matthews, president of an XML database startup called Ipedo. 'When you can get XML content in XML by using XML, that bridges the gap between a developer and a database administrator.'... Then there were the object-relational databases that were supposed to give us the best of both worlds. Object-relational databases were supposed to revolutionize the management of Web content. These hybrid products would be better able to organize all the arbitrary relationships between HTML pages, embedded multimedia content, and so on. The object-relational model is still very much with us: Now it's being adapted to the management of XML data. 
In much the same fashion, object-relational mapping middleware is being recast as XML middleware. The reason for the parallel evolution is that XML is essentially a way of modeling object hierarchies as marked-up text. But there are key differences. For example, XML and the tools for creating and querying XML documents are accessible to a large pool of Web developers instead of being limited to a small community of object-oriented programmers... 'None of these people have defined technically what an XML database is -- it's entirely a marketing term,' says Ronald Bourret, a freelance programmer, researcher, and writer. Nevertheless, Bourret says, you might consider an XML-native product if your application truly revolves around XML and performance is crucial, as long as the vendor can substantiate claims of superior performance. Besides maintaining an open-source middleware solution called XML-DBMS, Bourret maintains a catalog of XML data management solutions on his personal Web site. Alternatives to 'going native' include middleware that translates between XML and relational databases and relational database vendors who claim that their products are now 'XML-enabled'." See: "XML and Databases."

  • [July 31, 2001] "Adobe Helps Tackle Transactions. XML lets trading partners extract data from Adobe Acrobat files and import it to back-end databases." By Ann Sullivan. In Network World [Fusion] Volume 18, Number 31 (July 30, 2001), pages 31-32. "Service provider CCEWeb moves millions of dollars between financial institutions and their importer/exporter customers, and an unlikely software program plays a key role in the process: Adobe Systems' Acrobat. Toronto's CCEWeb handles Internet-based transactions for parties involved in international trade, including banks, shipping companies, customs brokers and insurance providers. The company's @GlobalTrade service combines credit card and bank payment functions with a document management system that tracks the progress of complex trade transactions. Along the way, Acrobat-formatted documents, such as commercial invoices, letters of credit and insurance certificates, change hands. Adobe has made a name for itself in cross-platform document sharing with its PDF, which preserves graphics features so a document looks the same when it's received as it did when it was sent - layouts, fonts and images intact. But Acrobat traditionally has not been thought of as a corporate software application. These days, PDF is about much more than making forms look pretty. In Acrobat Version 5.0, released this spring, Adobe added a number of new features aimed at enterprise customers. One of those features is XML support for Adobe's electronic forms, called eForms, says product manager David Baskerville. With eForms, authors can specify fields such as account or invoice number, and data sources such as Open Database Connectivity (ODBC)-compliant databases, so that content can be pulled from back-end data sources and used to populate electronic forms. Information contained in fields can be saved in XML format and also exported to ODBC databases at the other end of a transaction. 
Previous versions of Acrobat supported data export only in a proprietary Adobe format. Adobe also enhanced Acrobat's collaborative features to enable parties to review and comment on PDF documents through Web browsers. Authorized viewers can add comments to a data repository such as a WebDAV server or an ODBC database. Previous versions allowed Web-browser access to PDF files, but annotation tools were not accessible in the browser window..."

  • [July 31, 2001] "Bowstreet Updates Web Services Platform." By Tom Sullivan. In InfoWorld July 30, 2001. "Bowstreet on Monday released an enhanced version of its Business Web Factory software for creating Web services. The Lynnfield, Mass.-based company claims that its Bowstreet Business Web Factory 4 enables enterprises to Web services-enable their e-business middleware, applications servers, enterprise platforms, and data. 'We are continuing to lower the barrier to Web services adoption,' said Chip Martin, a product manager at Bowstreet. To that end, new features include support for the latest Web services standards, including the core de facto Web services standards, XML, SOAP (Simple Object Access Protocol), and WSDL (Web Services Description Language). Furthermore, Business Web Factory brings a higher level of interoperability with Web services infrastructures from IBM, BEA Systems, Sun Microsystems, and Microsoft, as well as applications from PeopleSoft and SAP. Martin added that Bowstreet's design environment has been integrated with JBuilder from Borland..." See the announcement: "Bowstreet announces new version of product that enables mainstream enterprise customers to immediately begin transition to dynamic web services. Business Web Factory 4 integrates with e-business middleware and applications from BEA, IBM, Microsoft, PeopleSoft, SAP, Sun Microsystems and others."

  • [July 31, 2001] "Open Source, Open Data: What XML has to offer Open Source." By Simon St.Laurent (Associate Editor, O'Reilly & Associates). ['I gave a presentation at last week's O'Reilly Open Source Conference on 'Open Source, Open Data: What XML has to offer Open Source.' Free beer, free speech, and now free love. What's the computing world coming to?'] "Different kinds of freedom: 'Information wants to be free' is often applied to copyright issues - free speech - and pricing - free beer. There are additional ways information can be free. Free love: XML makes it possible for the same information to interact with multiple programs in multiple environments. Instead of the information being bound inseparably to one program, it can be read, processed, and stored by any number of programs. Lowering the bar, raising the quality: Highly optimized binary formats require immense effort to figure out - ask anyone who's had to filter the Microsoft formats. The alternative has been simple text formats like CSV. XML offers the data quality of binary formats with the accessibility of text formats. There is, of course, a cost in verbosity..."

  • [July 30, 2001] "XML Gives Voice to New Speech Applications." By Steve Chambers. In Network World [Fusion] Volume 18, Number 31 (July 30, 2001), page 37. "Speech technology is evolving to the point where an exchange of information between a person and a computer is becoming more like a real conversation. Many factors are responsible for this, ranging from an exponential increase in computing power to a general advancement of basic speech technology and user interface design. Speech-based applications deployed to date have been based on code created by a few speech software vendors. VoiceXML will likely change this landscape by virtue of its promised vendor independence in creating speech applications. VoiceXML is the emerging standard for speech-enabled applications. It defines how a dialog is constructed and executed between a caller and a computer running speech recognition and/or text-to-speech software. VoiceXML incorporates the flexibility to create speech-enabled Web-based content or to build telephony-based speech recognition call center applications. . . Vocabularies and grammars are the key components that define the input to a speech-enabled page. The vocabulary consists of the words to be recognized by the speech recognition engine. For example, a vocabulary for a flight information system might consist of city names and travel-related words such as 'leaving' and 'fly.' Grammars provide the structure to identify meaningful phrases. A vocabulary and grammar are combined within a speech-enabled application to define speech recognition within a reasonable range of efficiency for both the caller and the speech recognition processor. Designing a speech application includes presenting data for delivery over the phone, constructing a call flow and enabling prompts and grammars. VoiceXML provides a common set of rules as a flexible foundation, but it's up to the designer to create the appropriate flow and personality for a speech system..." 
See "VoiceXML Forum."
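
The split between vocabulary and grammar described in the abstract above can be illustrated with a toy sketch. This is plain Python, not VoiceXML and not the API of any speech engine; the word list, the city names, and the `interpret` function are all hypothetical, and a regular expression stands in for a real grammar format.

```python
import re

# Hypothetical vocabulary for a flight information system:
# city names plus a few travel-related words.
CITIES = {"boston", "denver", "chicago"}
VOCABULARY = CITIES | {"i", "want", "to", "fly", "from", "leaving"}

# Hypothetical grammar: the structure that turns known words into a
# meaningful phrase, e.g. "fly from <city> to <city>".
GRAMMAR = re.compile(
    r"^(?:i want to )?fly (?:from (?P<origin>\w+) )?to (?P<dest>\w+)$"
)


def interpret(utterance):
    """Return the slots found in the utterance, or None if it is rejected."""
    words = utterance.lower().split()
    # Step 1: every word must be in the vocabulary.
    if any(word not in VOCABULARY for word in words):
        return None
    # Step 2: the word sequence must fit the grammar.
    match = GRAMMAR.match(" ".join(words))
    if match is None:
        return None
    origin, dest = match.group("origin"), match.group("dest")
    # Step 3: the slots must hold city names, not other vocabulary words.
    if dest not in CITIES or (origin is not None and origin not in CITIES):
        return None
    return {"origin": origin, "destination": dest}
```

Under these assumptions, "leaving denver" is rejected even though both words are in the vocabulary, because the phrase does not fit the grammar.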

  • [July 24, 2001] "RosettaNet Launches Manufacturing Specification." By Mitch Wagner. In InternetWeek July 23, 2001. "Semiconductor companies in the RosettaNet consortium in June started the first use of standardized interfaces for manufacturing systems. The interfaces are designed to use XML technology to ease communications between semiconductor companies and the businesses to which they outsource the manufacturing of semiconductors. Nine semiconductor manufacturing companies, including Motorola and National Semiconductor, joined in seven pairs to communicate information to deliver snapshots of the status of individual manufacturing lots. The communications used the latest completed RosettaNet standard, known as Work-In-Process Partner Interface Process (WIP PIP) 3D8. The standard simplifies communication between semiconductor companies such as Motorola, which design and distribute semiconductors, and manufacturing companies such as Chartered Semiconductor Manufacturing, which handle the actual manufacture of the semiconductors on an outsourced basis. WIP PIP 3D8 standardizes formats for providing information on the status of individual lots of semiconductors, or 'boats.' Each boat has an associated lot number, and the manufacturer needs to know specific information, such as the lithography process used to etch the wafers... Previous PIPs from RosettaNet -- the consortium has standardized 47 -- governed functionality further down the supply chain. Manufacturing is traditionally less likely to incorporate standardized processes, said RosettaNet CEO Jennifer Hamilton. 'When you're dealing with finished goods, there are standards in place such as bar coding and pallet tracking, and you're building on existing automation standards and business practices,' Hamilton said. Moreover, semiconductor manufacturing companies weren't vertically integrated until five to 10 years ago. 
Only then did companies begin to outsource the manufacturing of semiconductors, making business processes even less standardized than in industries that grew up with outsourcing, Hamilton said." See "RosettaNet."

  • [July 24, 2001] "Sun Finishes JavaBean Spec, Teams With EDS." By Charles Babcock. In Interactive Week July 17, 2001. "Sun Microsystems announced Monday that the specification for Enterprise JavaBean 2.0 is finished and is available as part of Java 2 Enterprise Edition. The EJB 2.0 specification will make it easier for Java developers to use multiple tools, databases and application servers, mixing and matching different vendors, said Rich Green, general manager of Sun's Java Software Development division. Enterprise JavaBeans, or independent modules of Java code, may now incorporate Java Messaging Service. Through messaging, EJBs can link data in legacy systems to Web applications, he said. Java application servers, such as BEA Systems' WebLogic, IBM's WebSphere, Silverstream's Silverstream, Brokat Technologies' Enterprise Server and Borland's AppServer, and Oracle Application Server incorporate Java Messaging Service. Servers based on high-speed JMS messaging are available from Sonic Software and Talarian. Sun's own iPlanet unit also supports JMS. EJB 2.0 also includes a language for querying other Java systems and sharing information. In a related move, Sun will upgrade the Java 2 Enterprise Edition software development kit version 1.3, now in its second beta version, with support for Java's XML parsing interface, JAXP. The move gives Java application builders the opportunity to build in XML parsing for a wide variety of XML parsers. The expanded development kit also includes more support for asynchronous applications, which allows one business process to communicate with another, without the two needing to be in direct contact as the communication occurs. A message may be stored and forwarded at a moment when the target application isn't running at full capacity, Green said..." See the announcement: "Next Version of J2EE Is Now Available for Download, with Enhanced Features Developed through the Java Community Process. 
Enhanced EJB Specification 2.0 Brings Interoperability and Portability to a Wider Audience of Developers."

  • [July 24, 2001] "B2B Gateway simplifies integration." By Renee Boucher Ferguson. In eWEEK July 20, 2001. "Business-to-business sites struggling with integration among trading partners, across the firewall and over the Internet now have help in the form of a new integration tool. BroadVision Inc., of Redwood City, Calif., and WebMethods Inc., of Fairfax, Va., announced this week the availability of BroadVision B2B Gateway, built on WebMethods' integration platform. The tool is the first fruit of a deal inked earlier this year in which the pair agreed to bundle products. The jointly developed B2B Gateway integrates data from back-end systems such as enterprise resource planning, customer relationship management and supply chain management into dynamic Web sites. It also offers an information exchange across trading networks, with transformation of multiple data sources supported through interoperability standards such as XML (Extensible Markup Language), CXML (Commerce XML), EDI (electronic data interchange), OBI (Open Buying on the Internet), BizTalk and RosettaNet over the Net, said officials. Aimed at users looking to hook into e-commerce platforms and integrate with key suppliers and customers, Gateway's prebuilt on-ramps let suppliers quickly connect to e-marketplaces while allowing users to easily integrate with partners, officials said. ..Frey said, Xerox Corp.'s portal site, integrates with trading hubs, including Ariba Inc. and Commerce One Inc., and with customers' back-end legacy systems, but officials there were looking for a simpler method of integrating with key customers and partners. Rather than process batches of transactions, Gateway lets Xerox do transactions in almost real time, so data can be pulled and pushed almost instantaneously, said Frey, in Rochester, N.Y. B2B Gateway is available now from BroadVision. Pricing begins at $500,000."

  • [July 24, 2001] "Beware of jumping the gun on web services." By Eric Lundquist. In eWEEK July 20, 2001. "... this Web services business is in the pre-prebeta stage. Of course, being in the early stages hasn't stopped the vendors, the analysts and, yes, even the journalists from taking a murky subject and throwing in enough acronyms to make it incomprehensible. While there is Microsoft's .Net, which includes a framework, servers, orchestration, Passport and a C# programming language, Redmond has not been alone in piling on the white papers and acronyms way ahead of any actual product. You have the XML standard underlying everything and SOAP to provide the rules for tying stuff together and UDDI to find what services businesses are offering. And DISCO, WSDL and JUMP. No wonder thaters, making your company operate as seamlessly as possible and conducting business with your customers in a manner that makes you a valued partner. When it works well, it will be invisible in its operation. And your decision to design under the .Net architecture, IBM's WebSphere, Oracle's Dynamic Services, Hewlett-Packard's e-services or any of the other contenders will be a fundamental business and technological decision. Once you start affiliating with one company's service offering, it becomes very difficult to change directions. And that is why it is worth the time to investigate what Web services can mean for your company. Even if the budget has been cut, you should be able to draw a road map of how the services could change your computing infrastructure. Which brings us to the four words that started this column..."

  • [July 24, 2001] "Protection on the way for Web services." By Brian Fonseca. In InfoWorld July 20, 2001. "In response to users' security concerns about Web services, Netegrity, Securant Technologies, and Oblix are staking out territory as security providers for the fledgling market. Next week Netegrity will introduce TransactionMinder, a policy-based platform designed to centrally manage and provide access control to XML-based Web services and registries such as UDDI (Universal Description, Discovery, and Integration), said Jim Ducharme, director of development at the San Diego-based security vendor. At this point, TransactionMinder, due out next year, is limited to support of the Sun ONE architecture and webMethods' integration platform, although future plans call for support of Microsoft's .NET as well as BEA Systems' and Bowstreet's Web services systems. Securant Technologies announced this week it would offer preintegrated single sign-on, and Web-access management for BEA WebLogic Portal 4.0 through its ClearTrust SecureControl product. The move to integrate with BEA's platform, according to Securant co-founder and CTO Eric Olden, is crucial to the San Francisco-based company's Web services strategy. Securant helped pioneer the AuthXML standards proposal, which eventually became the Web security standard SAML (Security Assertion Markup Language), and has been incorporating SAML across its product line, Olden said. In fact, Securant plans to ship its first SAML-based product before the end of the year, even though SAML standards may not be ratified before that occurs, Olden said. Cupertino, Calif.-based Oblix got into the act this week with the release of its XML-enabled Oblix NetPoint 5.0. The Web access and identity management software product is built on a Web services infrastructure, according to Oblix officials..." See (1) the Netegrity announcement: "Netegrity Unveils Product Strategy for Web Services. 
New TransactionMinder Product First to Address Need for Securing and Managing Web Services", and (2) "Security Assertion Markup Language (SAML)."

  • [July 24, 2001] "WebDAV Access Control Protocol." IETF Internet Draft. Reference: 'draft-ietf-webdav-acl-06'. By Geoffrey Clemm (Rational Software), Anne Hopkins (Microsoft Corporation), Eric Sedlar (Oracle Corporation), and Jim Whitehead (U.C. Santa Cruz). June 21, 2001. Expires December 21, 2001. "This document specifies a set of methods, headers, and message bodies that define Access Control extensions to the WebDAV Distributed Authoring Protocol. This protocol permits a client to remotely read and modify access control lists that instruct a server whether to grant or deny operations upon a resource (such as HTTP method invocations) by a given principal." "This specification is organized as follows. Section 1.1 defines key concepts used throughout the specification, and is followed by more in-depth discussion of principals (Section 2), and privileges (Section 3). Properties defined on principals are specified in Section 4, and access control properties for content resources are specified in Section 5. The semantics of access control lists are described in Section 6, including sections on ACE combination (Section 6.1), ACE ordering (Section 6.2), and principals required to be present in an ACE (Section 6.4). Client discovery of access control capability using OPTIONS is described in Section 7.1, and the access control setting method, ACL, is specified in Section 8. Internationalization considerations (Section 11) and security considerations (Section 12) round out the specification. An appendix (Section 19.1) provides an XML Document Type Definition (DTD) for the XML elements defined in the specification..." See the XML DTD... "The goal of the WebDAV access control extensions is to provide an interoperable mechanism for handling discretionary access control for content in WebDAV servers. WebDAV access control can be implemented on content repositories with security as simple as that of a UNIX file system, as well as more sophisticated models. 
The underlying principle of access control is that who you are determines how you can access a resource. The 'who you are' is defined by a 'principal' identifier; users, client software, servers, and groups of the previous have principal identifiers. The 'how' is determined by a single 'access control list' (ACL) associated with a resource. An ACL contains a set of 'access control entries' (ACEs), where each ACE specifies a principal and a set of privileges that are either granted or denied to that principal. When a principal submits an operation (such as an HTTP or WebDAV method) to a resource for execution, the server evaluates the ACEs in the ACL to determine if the principal has permission for that operation." The WebDAV Access Control Protocol group is a sub-working group of the IETF WebDAV Working Group. See the news item.
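
The ACL model quoted above (an ACL holds ACEs; each ACE names a principal and a set of privileges that are granted or denied) can be sketched in a few lines of Python. This is an illustration only, not the draft's algorithm: the abstract does not spell out the ACE combination and ordering rules, so the sketch assumes that the first matching ACE decides and that the default is to deny. The `Ace` class, the `is_allowed` function, and the principal and privilege names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Ace:
    """One access control entry: a principal plus granted or denied privileges."""
    principal: str
    privileges: FrozenSet[str]
    grant: bool  # True grants the privileges, False denies them


def is_allowed(acl: Iterable[Ace], principals: Set[str], privilege: str) -> bool:
    """Decide whether any of the caller's principal identifiers holds the privilege.

    Assumed policy (not taken from the draft): ACEs are checked in order,
    the first ACE naming both a matching principal and the privilege wins,
    and access is denied when no ACE matches.
    """
    for ace in acl:
        if ace.principal in principals and privilege in ace.privileges:
            return ace.grant
    return False


# A hypothetical ACL attached to one resource.
acl = [
    Ace("group:editors", frozenset({"read", "write"}), grant=True),
    Ace("user:guest", frozenset({"write"}), grant=False),
    Ace("user:guest", frozenset({"read"}), grant=True),
]
```

A caller is represented here by the set of principal identifiers it holds (its own identifier plus any groups), which mirrors the statement that users and groups both have principal identifiers.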

  • [July 24, 2001] "Startup looks to provide security, reliability for Web services." By Tom Sullivan. In InfoWorld July 23, 2001. "Looking to add security and reliable messaging for Web services and existing infrastructures, Kenamea made its software commercially available on Monday. The San Francisco-based startup is pushing what it calls an application network, or a platform for security and reliable messaging between enterprise systems, according to CEO John Blair. 'It enables you to put Web services out in the cloud safely and reliably,' Blair said. With Kenamea Application Network software, the company is targeting customers that are preparing for Web services, particularly those building applications that will be accessible across the Internet. The software enables disparate systems, J2EE (Java 2 Enterprise Edition), and COM (Component Object Model)-based systems to securely and reliably communicate with each other which, in turn, prepares them for access via the Web services model of any device supporting SOAP (Simple Object Access Protocol) and XML. Kenamea Application Network is available as a service that runs on proprietary messaging servers, but does not require additional communications hardware inside the corporate firewall. The company joins several startups hopping into the nascent Web services gold rush, such as Grand Central, Avinon, Velocigen, LogicLibrary, Curl, and Epicentric -- each of which are looking to fill in different niches not yet addressed by the major Web services infrastructure providers - as well as IBM, Microsoft, Hewlett-Packard and Sun Microsystems... Kenamea was founded in August 9 by John Blair and Bob Pasker. Prior to founding the company, Blair was a partner at Regis McKenna, and Pasker co-founded WebLogic and later sold it to BEA Systems." See "Industry's First Application Network Software Now Commercially Available. 
Software Provides Foundation for Reliable and Secure Communication Between Back-End Enterprise Applications and End-User Devices."

  • [July 23, 2001] "Xerces, XML4J, and XML4C add XML Schema support. Summer 2001 updates to Apache and IBM parser." By Natalie Walker Whitlock (Writer/Owner, Casaflora Communications). From IBM developerWorks. July 2001. ['New versions of the Apache XML Project's Xerces parsers released in June support the W3C XML Schema Recommendation. The new Xerces for Java supports essentially all of the XML Schema spec; Xerces for C++ implements a more limited subset of XML Schema, an incremental step toward complete support of the newly anointed specification that will in many cases take the place of DTDs in XML development. IBM also released updates to the alphaWorks parsers -- XML4C and XML4J -- that correspond to the Xerces parsers. A table outlines the XML Schema features supported in this release of the parsers.'] "The two popular Xerces parsers from the Apache Software Foundation, Xerces Java (aka Xerces-J) and Xerces C++ (aka Xerces-C) made a great leap forward in June to support XML Schema. Xerces-J 1.4.1 boasts essentially complete support for the entire W3C XML Schema Recommendation. Xerces-C 1.5 supports a more limited subset of XML Schema. The alphaWorks parsers based on them, XML4J and XML4C, have also been updated with corresponding XML Schema support. The Xerces-C update is characterized as an important incremental step toward full W3C XML Schema support. In its announcement to its mailing list, the Apache XML Project promises to continue to update its open-source C++ parsers steadily, with the goal of implementing all of the features of the current XML Schema Recommendation before the end of the year... Other parsers, such as those from Oracle, XSV, XmlSpy, MSParser, and Extensibility, all claim some support for XML Schema. According to the companies' technical specifications, however, currently these parsers merely edit and validate XML schemas; they cannot read or interpret XML Schema instances. 
Additionally, at this writing the MSXML parser supports only Microsoft's version of the schema language, XML-data. According to my review of the online literature (Web sites, newsgroups, and mailing lists), the Xerces parsers (and their alphaWorks relatives) are the first to truly support advanced W3C XML Schema functionality..." For schema description and references, see "XML Schemas."

  • [July 23, 2001] "New technology gives Web a voice." By Wylie Wong. In CNET July 19, 2001. "A budding standard, the brainchild of tech giants AT&T, IBM, Lucent Technologies and Motorola, is fueling new software that allows people to use voice commands via their phones -- either cell or land-based -- to browse the Web. Users of the technology can check e-mail, make reservations and perform other tasks simply by speaking commands. The technology, called VoiceXML, is now winding its way through the World Wide Web Consortium Internet standards body, which is reviewing the specification and could make it a formal standard by year's end. Proponents of VoiceXML say standardization is crucial for the market for Web voice access software and services to take off. The standard gives software and hardware makers, as well as service providers and other companies using the technology, a common way to build software to offer Web information and services over the phone. . . Even though the VoiceXML specification hasn't been finalized, tech companies and telecommunications service providers alike have flocked to support the technology and are already offering new software and services that tie the telephone to the Internet. The technology has gained the support of nearly 500 companies, including IBM, networking giant Cisco Systems, database software maker Oracle and stock brokerage firm Charles Schwab... Another reason for the rush to develop voice-driven Web interfaces, particularly for cell phones is pending legislation in at least 35 states that will make it illegal to drive with a cell phone next to the driver's ear, except for during emergency calls. New York has already passed a law banning the practice. Voice command recognition using VoiceXML could help cell phone carriers, such as Verizon and Alcatel, to provide hands-free Web surfing. 
Both carriers, along with automaker DaimlerChrysler, are members of the VoiceXML Forum, an industry organization founded by AT&T, IBM, Lucent and Motorola, to promote VoiceXML specification... Speech recognition companies Nuance and SpeechWorks have updated their software to support VoiceXML. Their customers, such as American Airlines, United Parcel Service and E*Trade, use existing speech-recognition technology to offer voice-activated services to track flight and delivery times as well as stock prices. VoiceXML has spawned new companies, such as Tellme Networks, HeyAnita, BeVocal and VoiceGenie, which either offer voice portal services or sell the software that allows service providers and businesses to offer voice portal services. The voice-activated services provide basic information, such as stock quotes, traffic information and news headlines, as well as the ability to buy movie tickets online..." See "VoiceXML Forum."

  • [July 23, 2001] "Ixiasoft Speeds Access to XML Data. Vendor's TextML database is optimized for handling documents in XML format." By Amy Johnson. In ComputerWorld July 16, 2001. ['Niche: Native XML database that handles XML documents more efficiently than relational databases.'] "Bill Bean, vice president of business development at American LegalNet Inc., needed a high-performance database to serve up 170,000 files to more than 1 million users per month. The catch was that the Encino, Calif.-based online supplier of electronic legal forms kept its files in XML format. After some comparison trials that pitted relational databases against the TextML native XML database from Canadian start-up Ixia Inc. (known in the U.S. as Ixiasoft), the company went with the latter. Speed tests showed that the native product was at least 30% faster. 'It's fast, and it works,' says Bean, whose Web site went live in January. What makes TextML faster than a relational database, says Ixiasoft CEO Philippe Gelinas, is that it keeps information in original XML documents, rather than breaking it down into pieces and storing it in tables and cells as relational databases require. That conversion step is a significant performance drain, he says. In addition, the rigidity of relational database structures makes modifications to accommodate changes in the XML document structure a complex process. '[A native XML database] is a solid technology for managing XML,' says Nick Wilkoff, an analyst at Forrester Research Inc. in Cambridge, Mass. XML files are designed in a hierarchical fashion, which is difficult to map to a relational database's table structure, he explains. But according to Wilkoff, the challenge for Ixiasoft is making a native XML database the preferred choice over relational databases for managing XML data. This could be a difficult idea to sell to IT departments that have a large investment in relational database infrastructure and programming skills... 
TextML runs on a Windows NT 4.0 or Windows 2000 server. Since the product relies on some features of the Windows operating system that are hard to duplicate on Unix, support for Unix is still up in the air, Wilkoff says. TextML functions as a black box, so developers must build an application around it so end users can retrieve XML data, says Gelinas. Ixiasoft supplies an application programming interface for developers to build those applications, based on Microsoft Corp.'s COM+. The product will also support Microsoft's .Net Web-based services initiative." See: "XML and Databases."

  • [July 23, 2001] "ebXML Registry/Repository Implementation." Available July 2001. "This first implementation from Sun is based on the Java 2 Platform, Enterprise Edition (J2EE) technology. The Registry/Repository implementation can be used to submit, store, retrieve, and manage resources to facilitate ebXML-based business-to-business partnerships and transactions. Submitted information may be, for example, XML schema and documents, business process descriptions, business context descriptions, UML models, business collaboration information, Core Components, or even software components. The Registry/Repository implementation uses EJB technology, which reduces development complexity while providing automatic support for middleware services such as database connectivity, transaction management, scalability, and security. With this download you will receive the following main components: (1) Registry Information Model; (2) Registry Services; (3) Security Model; (4) Data Access API; (5) Java Objects Binding Classes; (6) JSP Tag Library... To install and deploy the Registry/Repository distribution you will require a J2EE environment. We suggest using the following software components: Netscape Communicator 4.72 or higher or Internet Explorer 5.0 or higher; JDK 1.2.2 or later; Application server environment; this implementation includes complete instructions for deploying the Registry/Repository using the following iPlanet Application Server environment components [...], Database server; Sybase or Oracle are recommended to be used for the Registry database..." See the earlier announcement of June 4, 2001: "Sun Releases The First Java Technology-Based Implementation of ebXML Registry and Repository for Web Services." Other references: "Electronic Business XML Initiative (ebXML)."

  • [July 23, 2001] "Version Forward: Version-proof Your Data." By Dan Kobelt. From July 2001. "If you are going to provide a web service, or distribute XML documents, you should version proof your data. Versioning XML data without changing the tools that process the documents is not difficult. All it takes is some extra thought in the design stage. This means that older software will even process XML structures that are invented after the associated software is shipped. A version-proofing design must include: (1) A way of knowing the data structure version of an XML document. (2) A DTD that validates that version of the document. (3) A transformation that puts documents into the current version, allowing your software to process it. Doing incremental transformations (from version 1.1 to 1.2 to 1.3 to 1.4) could lose data. Direct transformations (from version 1.1 to version 1.4) are the best. However, it is geometrically more work to craft all those transformations as the version numbers mount up. There are good articles that explain transformations further. I like DTDs because they are inexpensive, good quality error checkers. In the old days of data processing, a lot of extra code was needed in the application to validate data. Once a DTD is defined, a parser will do the work of seeing if the XML document is valid. DTD's can be as loose or as strict as you choose. Start using them as a good professional habit. As the application gets near completion, tighten the rules in the DTD. If the application is strictly controlling both ends of communication, a DTD is not needed. If these XML documents are widely distributed, or can be formed by anyone, a DTD is a great helper..."
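
The three ingredients listed in the excerpt above (a version marker, a validator per version, and a direct transformation to the current version) can be sketched in a few lines. This is only an illustration, not the article's own code: it uses Python functions where the article talks about transformations, it skips DTD validation because Python's standard library parser does not validate, and the `order` document layouts, the `version` attribute, and all function names are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

CURRENT = "1.3"  # hypothetical current data-structure version


def _build(name):
    # Hypothetical current layout: <order version="1.3"><customer name="..."/></order>
    order = ET.Element("order", {"version": CURRENT})
    ET.SubElement(order, "customer", {"name": name})
    return order


def from_1_1(root):
    # Hypothetical 1.1 layout: <order version="1.1"><cust>Ann</cust></order>
    return _build(root.findtext("cust"))


def from_1_2(root):
    # Hypothetical 1.2 layout: <order version="1.2"><customer>Ann</customer></order>
    return _build(root.findtext("customer"))


# One direct transformation per old version (1.1 -> 1.3, 1.2 -> 1.3),
# rather than a chain of incremental steps.
UPGRADES = {"1.1": from_1_1, "1.2": from_1_2}


def load_current(xml_text):
    """Parse a document and return it in the current version's layout."""
    root = ET.fromstring(xml_text)
    version = root.get("version")
    if version == CURRENT:
        return root
    upgrade = UPGRADES.get(version)
    if upgrade is None:
        raise ValueError(f"unsupported document version: {version!r}")
    return upgrade(root)
```

Calling `load_current('<order version="1.1"><cust>Ann</cust></order>')` returns an element in the 1.3 layout, so the rest of the application only ever sees one structure. The table of direct upgrades is what grows as versions accumulate, which is the maintenance cost the excerpt mentions.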

  • [July 21, 2001] "Take My Advice: Don't Learn XML." By Michael Smith (Moderator, xml-doc mailing list). In O'Reilly News July 18, 2001. "If document authoring is important to you (you're a technical writer, an HTML markup author, manager of a documentation group, an anonymous pamphleteer) and you're trying to decide whether it would be worthwhile for you to learn XML and use it for authoring documents, stick around. What you learn might save you a lot of time and spare you from some unnecessary frustration... Why not learn XML? The short answer is: because 'learning XML' is a complicated task -- a chore much more complicated than learning, say, HTML, and one that you don't need to put yourself through if you're just looking for a better system for document authoring. So instead of learning XML, I recommend that you complete the following alternate course of study, which I've organized into just three easy 'lessons'... I'm sure you've already heard plenty of hype about the value of learning XML, so I don't expect you to simply make a leap of faith and take my word for it when I tell you that learning DocBook or TEI or some other specific markup dialect (instead of learning about XML in general) may very well be the best route to the document-authoring solution you've been looking for. I reckon that to be convinced, you probably want to know some details. So that's what I'll try to provide in the remainder of this lesson, using DocBook as an example, because it's my preferred dialect... I wrote this article not only to try to sell other document authors on what I see as the value of learning DocBook, but also to elicit some further discussion." See: "DocBook XML DTD."

  • [July 21, 2001] "Business Transactions in Workflow and Business Process Management." By Mark Potts and Sazi Temel. For the OASIS Business Transactions Technical Committee Workflow sub-committee. BTP draft 0.3c. July 20, 2001. 14 pages. "Process Management Systems (Workflow) control and coordinate the execution of business processes consisting of heterogeneous and distributed activities and tasks. As advancements in distributed systems enable more pervasive computing models, Web services and business-to-business collaborative systems are becoming the more predominant methods of creating business applications. Process oriented workflow systems and business-to-business (B2B) applications, whether they are based on Web services or other distributed system technologies, require transactional support in order to guarantee consistent and reliable execution. However, classical (ACID) transactions and extended transaction models based on the ACID transactions are too constraining for the applications that include steps/tasks/activities/services that are disjoint in both time and location. It is paradoxical that while applications (and data) are becoming more and more loosely coupled, requirement for orchestrated transactions across these distributed services are increasing. The motivation is to create a business transaction protocol to be used in applications that require transactional support beyond classical ACID and extended transactions. The goal of Business Transaction Protocol (BTP) is to orchestrate loosely coupled software services into a single business transaction. There are several other protocols that are developed for various aspects of business process management and B2B collaborations, among them Business Process Modeling Language (BPML), Web Services Flow Language (WSFL), Electronic Business Extended Modeling Language (ebXML) and XLANG. BTP aims to be an underlying protocol that offers transactional support in terms of coordinating distributed autonomous business functionality, in the form of services, that can either be orchestrated by the application layer or a BPM system. This paper gives an overview of other related protocols and the opportunities that exist for BTP to be complementary for the specific protocols listed above... The Business Transaction Protocol, BTP, is a Committee Specification of the Organization for the Advancement of Structured Information Standards (OASIS)."

  • [July 21, 2001] "XML Object Serialization in .Net. Serialization techniques have been simplified in .Net to manipulate XML documents in a more object-oriented manner." By Dan Wahlin. In XML Magazine June/July 2001. "The .Net platform offers an incredible amount of XML-related features that provide programmers with more power and flexibility in building applications. The ability to persist object state using serialization techniques has long been a useful programming construct that has been greatly simplified in .Net through the use of attributes built into .Net languages. By using a specific set of attributes, the serialization process can be leveraged to manipulate XML documents in a more object-oriented manner, making them easier to program against. I'll demonstrate one of these XML-related features that focuses on using C# to serialize/deserialize objects to and from XML... XML serialization provides .Net programmers with unprecedented power and flexibility in working with objects and XML documents. In an upcoming issue I'll show how the techniques demonstrated in the IBuySpy portal site can be used to create an ASP.NET application that objectifies the config.web (web.config in Beta 2) file to make modifying it easier. If you'd like a preview of this application (based on Beta 1), take a look..." With program code.

  • [July 21, 2001] "SOAP InterOpera." By Steve Gillmor. In XML Magazine June/July 2001. "SOAP forms the core layer of Web services, and there is an active effort under way by leading developers to keep SOAP simple as a foundation for Web services that provide universal interoperability among implementations. In a roundtable discussion hosted by Editor in Chief Steve Gillmor and Editorial Director Sean Gallagher, principal developers -- SOAP coauthors Dave Winer and Noah Mendelsohn, Microsoft XML Chief Andrew Layman, and other luminaries -- discussed recent accomplishments and current challenges for interop and the standardization of Web services." See "Simple Object Access Protocol (SOAP)."

  • [July 21, 2001] "Programming Web Services with XML-RPC." By Simon St. Laurent, Joe Johnston, Edd Dumbill. From July 18, 2001. ['An excerpt from the recently published O'Reilly book Programming Web Services with XML-RPC, written by Simon St.Laurent, Joe Johnston and Edd Dumbill. The excerpt shows how XML-RPC, a simple XML web services technology, and PHP, the popular web page scripting language, can be put to use to integrate two web applications, neither of which need to be under the control of the programmer.'] "The following sections explore using PHP to integrate two web applications into one interface. The first section demonstrates how to create a complete PHP XML-RPC server application, in this case a discussion server. The web application to which this server will be connected is a database called Meerkat, the brainchild of Rael Dornfest and O'Reilly & Associates, Inc. (who also happen to be the publishers of this book). Meerkat is a storehouse of news about technological developments. After a subsequent section that gives an overview of Meerkat, the chapter demonstrates how to integrate the database with the custom XML-RPC discussion server..." See: "XML-RPC."
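
To give a feel for what an XML-RPC exchange carries on the wire, here is a minimal sketch. It uses Python's standard `xmlrpc.client` module rather than the PHP covered in the book excerpt, and the method name `discuss.addComment` and its arguments are hypothetical, chosen only to echo the discussion-server example; they are not taken from the book or from Meerkat.

```python
import xmlrpc.client

# Build the XML body of a request. The method name and parameters are
# hypothetical illustrations, not part of any real discussion server API.
request_xml = xmlrpc.client.dumps(
    ("xml-news", "Nice article"),
    methodname="discuss.addComment",
)

# A server would decode the same body back into parameters and a method name.
params, method = xmlrpc.client.loads(request_xml)

# The reply is also a small XML document; here it wraps a single boolean.
response_xml = xmlrpc.client.dumps((True,), methodresponse=True)
result, _ = xmlrpc.client.loads(response_xml)
```

Printing `request_xml` shows the `methodCall` envelope with one `param` element per argument. In a real deployment `xmlrpc.client.ServerProxy` would perform this encoding and the HTTP POST in one step.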

  • [July 21, 2001] "The Collected Works of SAX." By Leigh Dodds. From July 18, 2001. ['SAX is the topic for Leigh Dodds' XML-Deviant column this week. Leigh covers recent conversation on the XML developers' mailing list XML-DEV about collecting SAX-related utility code together, to include such things as support for XML Schemas and RDDL.'] "Now more than three years old, SAX (Simple API for XML) is the oldest and most stable XML API in widespread use today. Yet despite its obvious utility it can be quite daunting to programmers making their first foray into manipulating XML documents. It's no surprise, then, that many appear to prefer using the DOM API in their early coding efforts, despite its many quirks and additional overhead. A likely reason is that most tutorials introduce XML as a hierarchical data structure, which makes the DOM tree structure conceptually easier to understand initially. This is true even for Java programmers who might be expected to be more comfortable with event-oriented architectures, given their prevalence in Java APIs, with Swing being the obvious instance. Another factor that leads developers to DOM (or variants like JDOM and dom4j), despite SAX's efficiency is the additional programming effort required to develop a SAX application. Such effort includes writing appropriate callback handlers, employing a state machine, and so on. In contrast, building a DOM is simple, and manipulating it is relatively simple too. So any effort to reduce some of SAX's additional overhead should be well received..."
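
The "callback handlers" and "state machine" the column refers to look roughly like the following. This is a minimal sketch using Python's standard `xml.sax` binding rather than the Java API discussed in the column; the `title` element name and the `collect_titles` helper are assumptions for the example.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of every <title> element using a one-flag state machine."""

    def __init__(self):
        super().__init__()
        self.in_title = False   # state: are we currently inside a <title>?
        self.buffer = []        # text chunks seen since the <title> opened
        self.titles = []

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate them.
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self.buffer))
            self.in_title = False


def collect_titles(xml_text):
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles
```

The equivalent DOM code would be a single lookup on a parsed tree, which illustrates the extra effort the column describes. The payoff is that the handler never holds more than one title in memory at a time.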

  • [July 21, 2001] "A Primer for HTTPR. An overview of the reliable HTTP protocol." By Francis Parr (IBM Research, Hawthorne), Michael H. Conner (IBM Internet Software, Austin, TX) and Stephen Todd (IBM MQSeries, Hursley). From IBM developerWorks. July 2001. ['Reliable HTTP (HTTPR) is a new protocol that offers the reliable delivery of HTTP packets between the server and client. This solves a number of issues that are evident in current HTTP and opens the way to reliable messaging between Web services.'] "Reliable messaging refers to the ability of a sender to deliver a message once and only once to its intended receiver. It is a necessary building block for most non-query communication between programs. The basic method for achieving reliable delivery is for the sender to send the message repeatedly to the target until the target acknowledges receipt of the message. The message must contain an identifier of some kind so that the target will discard any duplicates it receives. While this should be a simple task to perform, it is surprisingly difficult to achieve in the full context of possible failures and acceptable efficiency... HTTPR is a protocol for the reliable transport of messages from one application program to another over the Internet, even in the presence of failures either of the network or the agents on either end. It is layered on top of HTTP. Specifically, HTTPR defines how metadata and application messages are encapsulated within the payload of HTTP requests and responses. HTTPR also provides protocol rules making it possible to ensure that each message is delivered to its destination application exactly once or is reliably reported as undeliverable. Messaging agents use the HTTPR protocol and some persistent storage capability to provide reliable messaging for application programs. This specification of HTTPR does not include the design of a messaging agent, nor does it say what storage mechanisms should be used by a messaging agent; it does specify information necessary for safe storage, when to store it, and for a messaging agent to provide reliable delivery using HTTPR. SOAP messages transported over HTTPR will have the same format as SOAP messages over HTTP. The additional information needed to correlate request and response in the HTTPR asynchronous (or pseudo-synchronous) environment is put into the HTTPR message context header. The SOAPAction parameter is carried in the HTTPR message context header as the type app-soap-action. When request-response style SOAP messages are used, the HTTPR rules for response matching to specific requests must be followed. In particular, the message-id of the request message must be copied into the correlation-id of the response. Extensions to SOAP such as in ebXML and SOAP-RP contain application-level correlation information that must also be carried in the HTTPR message context header for this protocol... HTTPR provides the features of reliable messaging that are lacking on the Web. With the advantages explained in this article, we have shown that this new protocol will not only fit into the current infrastructure of the Web without major redevelopment, but will also satisfy the needs of enterprise applications that require such features. We have also shown how SOAP can operate over HTTPR to allow Web services to make use of these reliability features." Also available in PDF format.
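
The "basic method" described in the primer (resend until acknowledged, and let the receiver discard duplicates by identifier) can be sketched as a small in-memory simulation. This is not the HTTPR wire format and it omits the persistent storage a real messaging agent needs; the class names, the scripted loss pattern, and the `max_attempts` limit are all assumptions for illustration.

```python
class Receiver:
    """Delivers each message id to the application at most once."""

    def __init__(self):
        self.seen = set()
        self.delivered = []

    def receive(self, message_id, payload):
        if message_id not in self.seen:   # discard duplicates by identifier
            self.seen.add(message_id)
            self.delivered.append(payload)
        return message_id                 # acknowledgement echoes the id


class LossyChannel:
    """Hypothetical channel that loses acknowledgements per a fixed script."""

    def __init__(self, receiver, drop_acks):
        self.receiver = receiver
        self.drop_acks = list(drop_acks)

    def send(self, message_id, payload):
        ack = self.receiver.receive(message_id, payload)
        dropped = self.drop_acks.pop(0) if self.drop_acks else False
        return None if dropped else ack


def send_reliably(channel, message_id, payload, max_attempts=5):
    """Resend until acknowledged; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if channel.send(message_id, payload) == message_id:
            return attempt
    raise RuntimeError("message reported as undeliverable")
```

With `LossyChannel(receiver, [True, True, False])`, the sender needs three attempts, yet `receiver.delivered` contains the payload only once. Retrying alone gives at-least-once delivery; the identifier check on the receiving side is what turns it into exactly-once.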

  • [July 18, 2001] "XML for Data: Styling With Schemas. Using XML Schema Archetypes and XSLT Style Sheets to Simplify Your Code." By Kevin Williams (Chief XML architect, Equient - a division of Veridian). From IBM developerWorks. July 2001. ['This column by developer and author Kevin Williams demonstrates how to use XML Schema archetyping (and style sheets) to control styling of data for various presentation modes. Ten code samples in XML, XML Schema, and XSLT show how the techniques work to reduce code bulk and simplify maintenance.'] "In my previous column, I described how simple and complex archetypes may be used to simplify and streamline your XML schema designs. This column takes a look at one practical application of XML Schema archetypes: using style sheets to provide a consistent rendering of archetypes in the presentation layer... What are 'archetypes'? Archetypes are common definitions that can be shared across different elements in your XML schemas. In early versions of the XML Schema specification, archetypes had their own declarations; in the released version, however, archetypes are implemented using the simpleType and complexType elements... This column outlines the way you can use archetypes to streamline your coding experience. This discussion really only scratches the surface. In a large system that must support many presentation targets (HTML, wireless, other machine consumers) and many different source-document types (for bandwidth reduction or security reasons), using archetypes properly makes it very easy to keep your style sheet output consistent and correct." See previously: "XML for Data: Using XML Schema Archetypes. Adding Archetypal Forms to Your XML Schemas." For schema description and references, see "XML Schemas."

  • [July 18, 2001] "XML Catalogs." For the OASIS Entity Resolution TC. Working Draft 16-July-2001. Edited by Norman Walsh (Sun Microsystems). "The requirement that all external identifiers in XML documents must provide a system identifier has unquestionably been of tremendous short-term benefit to the XML community. It has allowed a whole generation of tools to be developed without the added complexity of explicit entity management. However, the interoperability of XML documents has been impeded in several ways by the lack of entity management facilities: (1) External identifiers may require resources that are not always available. For example, a system identifier that points to a resource on another machine may be inaccessible if a network connection is not available. (2) External identifiers may require protocols that are not accessible to all of the vendors' tools on a single computer system. An external identifier that is addressed with the ftp: protocol, for example, is not accessible to a tool that does not support that protocol. (3) It is often convenient to access resources using system identifiers that point to local resources. Exchanging documents that refer to local resources with other systems is problematic at best and impossible at worst. The problems involved with sharing documents, or packages of documents, across multiple systems are large and complex. While there are many important issues involved and a complete solution is beyond the current scope, the OASIS membership agrees upon the enclosed set of conventions to address a useful subset of the complete problem. To address these issues, this Standard defines an entity catalog that maps both external identifiers and arbitrary URI references to URI references..." Background references: "Catalogs, Formal Public Identifiers, Formal System Identifiers" and "SGML/XML Entity Types, and Entity Management." [cache]
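
The core idea of an entity catalog, a lookup table that maps external identifiers to URIs that are actually reachable, can be sketched as follows. This is a deliberately simplified illustration and not the catalog file format or the resolution rules defined by the draft: the two mapping tables, the example identifiers, the local `file:` paths, and the choice to consult system identifiers before public identifiers are all assumptions.

```python
# Hypothetical catalog contents: identifiers mapped to local copies.
SYSTEM_MAP = {
    "http://example.org/dtd/memo.dtd": "file:///usr/share/dtd/memo.dtd",
}
PUBLIC_MAP = {
    "-//Example//DTD Memo V1.0//EN": "file:///usr/share/dtd/memo.dtd",
}


def resolve(public_id, system_id):
    """Return the URI a tool should fetch for an external identifier.

    Falls back to the original system identifier when the catalog has
    no matching entry, so uncatalogued documents behave as before.
    """
    if system_id in SYSTEM_MAP:
        return SYSTEM_MAP[system_id]
    if public_id in PUBLIC_MAP:
        return PUBLIC_MAP[public_id]
    return system_id
```

A parser that calls `resolve` before opening an external entity can work offline, or without support for the protocol in the original system identifier, as long as a local copy is catalogued.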

  • [July 18, 2001] "Exclusive XML Canonicalization Version 1.0." W3C Working Draft 05-July-2001. By Donald E. Eastlake 3rd (Motorola) and John Boyer (PureEdge Solutions Inc.). Latest version URL: Produced by the IETF/W3C XML Signature Working Group. Abstract: "Canonical XML [W3C Rec XML-C14N, 15 March 2001] recommends a standard means of serializing XML that, when applied to a subdocument, includes its namespace and some other XML context. However, for many applications, it is desirable to have a method which, to the extent practical, excludes such context. In particular, where a digital signature over an XML subdocument is needed which will not break when that subdocument is removed from its original document and/or inserted into a different document. The Exclusive XML Canonicalization method described herein provides such a method." Detail: "It is normal for XML documents and subdocuments which are equivalent for the purposes of many applications to differ in their physical representation. For example, they may differ in their entity structure, attribute ordering, and character encoding. The goal of this specification is to establish a method for serializing an XPath node set representing a subdocument such that this method has the following properties: (1) It is minimally affected by the XML context of the subdocument. (2) If the input represents a well-formed XML document, then the output will be a well-formed XML document whose exclusive canonicalization will be identical to that output. (3) So far as practical, it can be determined whether two subdocuments are identical, or whether an application has not changed a subdocument, except for transformations permitted by XML 1.0 and Namespaces in XML, by comparing their exclusive canonicalization. [Readership:] Complete familiarity with the Canonical XML Recommendation [XML-C14N] is assumed..."
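
The general point that equivalent documents "differ in their physical representation" and can be compared through a canonical form is easy to try out. Note the substitution: Python's standard library (3.8 and later) provides `xml.etree.ElementTree.canonicalize`, which implements the later C14N 2.0 method, not the Exclusive XML Canonicalization defined in this draft, so the namespace-context handling that motivates the draft is not demonstrated here. The helper name is an assumption.

```python
import xml.etree.ElementTree as ET


def same_canonical_form(xml_a, xml_b):
    """Compare two documents by their C14N 2.0 serialization.

    This uses the standard library's C14N 2.0 support as a stand-in;
    it is not the Exclusive Canonicalization method of the draft.
    """
    return ET.canonicalize(xml_a) == ET.canonicalize(xml_b)
```

For instance, `same_canonical_form('<doc b="2" a="1"/>', "<doc a='1'   b='2'></doc>")` holds even though the two strings differ in attribute order, quoting, spacing, and empty-element syntax.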

  • [July 18, 2001] "XML Transformation Demo. With Apache, mod_perl, and AxKit." By Chuck Bearden (Rice University). ['XML files are three finding aids for collections in Fondren Library's Woodson Research Center, marked up in EAD.'] "By way of follow-up to [discussion] about AxKit, I'm posting the link to a brief demo of XSLT transformation I created for an introductory XML course given by a colleague here at Rice... I use three finding aids from collections in our Woodson Research Collection, marked up in EAD as part of the TARO project. There are links permitting you to view them as plain XML (mime type 'text/plain'), and transformed according to each of five XSLT stylesheets. Let me emphasize that I don't know XSLT. Daniel Pitti of IATH at UVa created these stylesheets for use in the courses he teaches, and he kindly sent them to me for the purposes of creating this demo. These stylesheets were not designed for transforming EAD into HTML for public use on websites, but rather for illustrating XSLT transformation pedagogically. My demo is designed to show folks new to XML how one document source can be transformed into multiple outputs--something which readers of this list are well aware of. However, it also lets you see AxKit with libxml2 and libxslt at work..." See "Encoded Archival Description (EAD)" and "Extensible Stylesheet Language (XSL/XSLT)."

  • [July 18, 2001] "Wrapping up SVG." [Column on SVG.] By Mark Gibbs. In Network World Fusion July 09, 2001. "We promised to wrap up the topic of Scalable Vector Graphics and we'll do so discussing scripted animation. When we started with SVG a few weeks ago we discussed its use of declarative animation, where the animation is defined in the SVG file using Synchronized Multimedia Integration Language. While you can achieve a fantastic amount using declarative animation, you cannot interact with the user or anything outside the SVG graphic. This is where scripted animation comes into play - specifically, JavaScript-driven animation. To manipulate SVG using JavaScript, we can embed the SVG graphic in the document that contains the JavaScript or embed the JavaScript in the SVG code. Either way, the key to scripting is to name all the elements we want to manipulate... All we have provided is a glimpse of the mechanism used to animate SVG with JavaScript. The actual implementation is a little more complex, and we refer you to the Adobe tutorial for lots of code and examples... A promising SVG tool in the beta stage is Jasc WebDraw, which offers a built-in script editor and can import SVG files. We expect to see Illustrator, WebDraw and many other tools evolve rapidly to provide in-depth SVG support. Mark our words, SVG is a standard that will quickly become a key component of Web content..." See: "W3C Scalable Vector Graphics (SVG)."

  • [July 18, 2001] "XML Group Works On Election Markup Language." By Jim Carr. In MicroTimes Magazine Issue 223 (July 9, 2001). "Can technology save us from election nightmares such as last year's presidential voting debacle? An international group intent on setting a specification for automatically exchanging election information thinks so. OASIS, the Extensible Markup Language (XML) interoperability consortium, has formed a committee to standardize the exchange of election and voter services information using XML, which is an open standard used in the exchange of data on the Internet. The committee will develop what it calls the Election Markup Language (EML), an XML-based specification for the structured exchange of data among hardware, software and service vendors providing public and private ventures with election or voter services... Government elections will be just one of the ways in which EML can be applied, according to the consortium. The OASIS specification will also be applicable to private elections, such as those held by publicly traded corporations, credit and labor unions, pension plans, trade associations and not-for-profit organizations. The OASIS committee's work will cover a variety of election-related functions. These include voter registration, dues collection, change of address tracking, citizen/membership documentation, redistricting, requests for absentee ballots, election calendaring, polling place management, election notification, ballot delivery and tabulation, election results reporting and demographics. The organization's election and voter services technical committee, which OASIS said includes founding sponsors Accenture (formerly Andersen Consulting), Microsoft Corp. and Inc., will manage development of the EML specification..." See: "Election Markup Language (EML)."

  • [July 17, 2001] "RELAX NG: Unification of RELAX Core and TREX." By MURATA Makoto (International University of Japan, currently visiting IBM Tokyo Research Lab.) Paper [to be] presented at Extreme Markup Languages 2001, August 12-17, 2001, Montréal, Canada. "RELAX Core and TREX are schema languages for XML. RELAX Core was designed in Japan and has recently been approved as an ISO Technical Report (ISO TR 22250-1); TREX was designed by James Clark. RELAX Core and TREX are similar: they are based on tree automata and do not change information sets. On the other hand, there are some significant differences: attributes, unordered content models, namespaces, wild cards, the syntax, and the underlying implementation techniques. At OASIS, it was decided to unify these two languages and the new language is called RELAX NG. This talk shows how differences between RELAX Core and TREX are resolved in RELAX NG." See: "RELAX NG" and "XML Schemas."

  • [July 17, 2001] "Implementing Concurrent Markup in XML." By Patrick Durusau (Society of Biblical Literature) and Matthew Brook O'Donnell (University of Surrey). Paper [to be] presented at Extreme Markup Languages 2001, August 12-17, 2001, Montréal, Canada. "Texts in the humanities -- as well as other texts -- often exhibit or their users wish to encode multiple overlapping hierarchies using descriptive markup, e.g., by marking physical page features as well as textual and linguistic structures. The optional CONCUR feature of SGML has seldom been implemented and is not present in XML. Relying upon XPath expressions, the authors have implemented concurrent markup in standard XML. XSLT scripts are used to build and query across concurrent hierarchies. The authoring, validation and processing of the document instances required for this technique use standard XML software." See "SGML/XML and (Non-) Hierarchy."

  • [July 17, 2001] "Basic Semantic Web Language. A Simple RDF-In-XML Proposal." By Sean B. Palmer. 15-July-2001 (or later). Latest Version URL: "A proposal for a stripped down syntax based on RDF, as an application of XML..." See the announcement and discussion. [cache]

  • [July 16, 2001] "Do you XML? Then you need to XSL." By Bill Pitzer. In ZDNET Developer July 09, 2001. "XML might carry a lot of promise among today's Web developers, yet until very recently it offered little control over its mode of presentation: introducing XSL. XML has definitely been the buzz around the Web development world, especially following the release of Internet Explorer 5, with its updated XML parser. But while XML is great for describing structured data, it doesn't actually have a means of displaying the data that it describes. XML was designed this way so that data is kept separate from its presentation. Enter Internet Explorer 5 again, which now supports a subset of the Extensible Stylesheet Language, or XSL. XSL transformations turn XML into a grammar and structure suitable for display in a browser. Internet Explorer 5 supports many different types of XSL transformations. A complete listing can be found in the "XSL Developer's Guide" on Microsoft's site. In this article, we'll concentrate on applying XSL style sheets to XML documents. This allows direct browsing of the XML data contained in your page. Although Cascading Style Sheets (CSS) can be used for this purpose, they don't provide the flexibility allowed by XSL. ... This simple example barely scratches the surface of what can be done with XML. The language is a very powerful tool for manipulating data. Look more closely at the XSL reference and you'll see how easy it is to change the look of your output. Using this article as a starting point, take our structure and try to tweak it. Some things you could try: sort the records, filter them on certain field criteria, or actually change the data in a node before outputting it. XML's strength will become more apparent as you become more familiar with the language and its application. And, keep in mind that the transformation method we use in this article is a subset of the entire XSL language working draft. Microsoft intends to support XSL in its entirety, based on the final W3C recommendations..." For related resources, see "Extensible Stylesheet Language (XSL/XSLT)."

  • [July 16, 2001] XML Schema for ISBN. By Roger L. Costello and Roger Sperberg. Description: "Roger Sperberg and I have collaborated to create an ISBN simpleType definition. It defines the legal ISBN values for every country in the world... The ISBN schema was not able to check all the constraints on an ISBN number. One of the constraints on ISBNs is that the last digit must match a certain sum of the previous digits modulo 11. (This is all documented in the ISBN schema.) Clearly, this constraint is not expressible with XML Schemas. Consequently we needed to supplement the ISBN schema simpleType definition with something else. We chose to express the additional constraints using XSLT... this allows anyone in publishing who is working with XML Schema to incorporate ISBN validation in their applications without having to create it from scratch. The agencies for the 126 group codes (which mostly represent countries, but also geographical regions and language groupings) do not all follow the recommendations of the international ISBN agency, so pending further research, the validation for some groups is not full. (It is complete, however, for the English-speaking ISBNs)..." -- Postings from Roger L. Costello and Roger Sperberg. Schema references: see "XML Schemas." [cache, with examples, XSLT script]
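
The modulo 11 constraint mentioned above, which the authors check with XSLT, is the standard check for ten-digit ISBNs: weight the ten characters 10 down to 1, treat a final "X" as 10, and require the weighted sum to be divisible by 11. The sketch below expresses that rule in Python instead of XSLT; it checks only the check digit, not the per-group ranges the schema encodes, and the function name is an assumption.

```python
def isbn10_is_valid(isbn):
    """Return True if the ten-character ISBN has a correct check digit."""
    chars = [c for c in isbn if c not in "- "]   # ignore hyphens and spaces
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch in "0123456789":
            value = int(ch)
        elif ch in "Xx" and position == 9:
            value = 10                            # 'X' is allowed only as the check digit
        else:
            return False
        total += (10 - position) * value          # weights run 10, 9, ..., 1
    return total % 11 == 0
```

For example, for 0-306-40615-2 the weighted sum is 132, which is 12 × 11, so the ISBN passes; changing the last digit to 3 gives 133 and fails. A pattern-based schema type can constrain the shape of the string but cannot express this kind of arithmetic, which is why a second validation layer is needed.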

  • [July 16, 2001] "Taxonomy of XML Schema Languages Using Formal Language Theory." By MURATA Makoto (International University of Japan, currently visiting IBM Tokyo Research Lab.) Dongwon Lee, and Murali Mani (University of California at Los Angeles/Computer Science Department). Paper to be presented at Extreme Markup Languages 2001, August 12-17, 2001, Montréal, Canada. PDF (print) version: 25 pages. "Most people are familiar with regular expressions, which define sets of strings of characters (regular languages). An extension of the idea of regular languages (as sets of strings) yields the idea of regular tree languages, which are sets of trees. From the ideas of regular tree languages, a mathematical framework for the description and comparison of XML Schema languages can be constructed. In this framework, four subclasses of regular tree languages are distinguished: local tree languages, single-type tree languages, restrained-competition tree languages, and regular tree languages. With these subclasses one can classify a few XML schema proposals and type systems: DTDs, the W3C XML Schema language, DSD, XDuce, RELAX, and TREX. Different grammar subclasses have different properties under the operations of XML document validation and type assignment..." Also available in HTML format. See the XPress project publications listing for related papers. Schema references: see "XML Schemas." [cache]

  • [July 16, 2001] "A Standards-Based Framework for Comparing XML Schema Implementations." By Henry S. Thompson (HCRC Language Technology Group and World Wide Web Consortium) and Richard Tobin (HCRC Language Technology Group). Paper to be presented at Extreme Markup Languages 2001, August 12-17, 2001, Montréal, Canada. "XML Schema processing may be described as a mapping from an input information set to an output post-schema-validation information set (PSVI). Information sets are not defined as concrete data structures, APIs, or data streams; they are abstractions. But if they realize the PSVI in different ways, how can one compare two implementations of XML Schema to check their consistency with each other? One simple standards-based approach is to reflect the PSVI as an XML document. One can then use standard tools to compare the output of the two processors. XSLT stylesheets can be used to display the reflected PSVI and to highlight differences between results produced by different processors or by the same processor from different inputs..." Schema references: see "XML Schemas."

  • [July 16, 2001] "How to Use XLink with XML. XLink Works for Basic Links or For Embedding External Resources." By Brett McLaughlin (Enhydra strategist, Lutris Technologies). From IBM developerWorks. July 2001. ['XLink, an XML-related specification, lets you achieve dramatic linking effects in your XML documents. In this short tip learn how to include parts of other XML documents in your own XML through XLink. The code example demonstrates the technique.'] "Since the release of XML more than two years ago, an incredible amount of interest in all things 'X' has developed. As proof of this fact, you can check out a ton of XML-related specifications these days: XPointer, XLink, XSD (XML Schema), RDF, RSS, XHTML, to name a few. In this tip, I briefly explore XLink, a particularly useful specification which defines an XML linking mechanism for referring to other documents. For those of you who are HTML authors, at first XLink may sound a lot like the a element that you are so used to, as in <a href="">Check out Nickel Creek!</a>. But XLink offers much more than unidirectional linking. Using XLink, you can create bidirectional links. You can also define how links are processed, and, most important, you can allow linking from any XML element (instead of from the a element only). For all these reasons, XLink is worth knowing about..." See "XML Linking Language."
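
As a rough illustration of "linking from any XML element", the Python sketch below parses a small document whose elements carry XLink attributes and lists the simple links it finds. The XLink namespace URI and the type/href/show/actuate attributes come from the XLink 1.0 specification; the element names, file names, and the helper function are invented for this example and are not taken from the article.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # XLink 1.0 namespace

# Hypothetical document: any element may act as a link, not only <a>.
doc = """<album xmlns:xlink="http://www.w3.org/1999/xlink">
  <artist xlink:type="simple" xlink:href="artist.xml"
          xlink:show="embed" xlink:actuate="onLoad">Nickel Creek</artist>
  <review xlink:type="simple" xlink:href="review.xml"
          xlink:show="new">Read a review</review>
  <title>untitled</title>
</album>"""

def simple_links(xml_text):
    """Return (element name, href, show) for every XLink simple link."""
    root = ET.fromstring(xml_text)
    links = []
    for el in root.iter():
        if el.get(f"{{{XLINK}}}type") == "simple":
            links.append((el.tag,
                          el.get(f"{{{XLINK}}}href"),
                          el.get(f"{{{XLINK}}}show", "")))
    return links

links = simple_links(doc)
```

This only discovers the links. Acting on show="embed" (pulling the target into the presentation) is left to an XLink-aware application.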

  • [July 13, 2001] "Roadmap: A Guide to [A Guide to through December, 2001.]" By Sun Microsystems. July 2001. 41 pages. "XML File Format: XML has been adopted as the new native file format, replacing the old binary file format. The new XML based format provides an open standard for office documents and represents a complete specification encompassing all documents. The XML filters of the different modules store packages instead of plain XML files. A package is simply a compressed file (zip, tar). The plain XML file is contained within the 'Content' entry of the package. To reflect the new XML format, the file name extensions have been changed: word processing: sxw; word processing Master Document: sxg; spreadsheets: sxc; draw: sxd; presentations (packed): sxi (sxp); math: sxm (fully compliant with mml standards). Documents created in StarOffice 5.2 or earlier versions will continue to be supported. They will be able to be loaded and saved either in the new XML based format or in the old binary format... New WebDAV Support: With it will be possible to access resources using the WebDAV protocol, the IETF standard for collaborative authoring on the Web. WebDAV (Web-based Distributed Authoring and Versioning) is a set of extensions to the HTTP protocol which facilitate collaborative editing and file management between users located remotely on the Internet. This enables users to collaborate over the Web in the same way as they might over a corporate intranet... New Configuration Files: Configuration data, formerly mainly stored in *.ini-files, will be saved as XML data. The data has been restructured to store the configuration settings for different program packages or modules in appropriate XML files. The program will install the files under share\config\registry while special user data will be stored under user\config\registry... 
New MathML Filter: A new import and export filter will be available for MathML files based on the standardized W3 XML file format for equation exchange..." [Post from Michelle Milledge: "I'd like to announce that now has a Roadmap detailing the features expected to be implemented by the end of the year 2001. Please check this out [from among] the list of white papers. In addition, I'd like to announce that will be represented in the Sun booth at the O'Reilly Open Source Conference which is taking place in San Diego, California this month. The whole conference is July 23-27 but the exhibits will be open July 25-27. We are sponsoring the keynote on Thursday where our good friend Craig Mundie of Microsoft will be touting the virtues of Shared Source followed by Michael Tiemann of Red Hat who will speak on Open Source. We will also be holding a Birds of a Feather session on Thursday night at 7pm to talk about where we've been and how we can better meet the needs of our expanding community in the future..."] See: "StarOffice XML File Format." [cache]

  • [July 13, 2001] "The Visual Specification of Context." By Anne Brüggemann-Klein, Stefan Hermann, and Derick Wood. June, 2001. Theoretical Computer Science Center Research Reports, Report HKUST-TCSC-2001-06. Department of Computer Science, Hong Kong University of Science & Technology. "We introduce a new visual technique for the specification of context in hierarchically structured documents such as those defined by XHTML and XML. The technique is based on a formal graph model called T-graphs and on what we call T-configurations in (abstract) document trees. A T-configuration is a restricted substructure of a tree. The technique is implemented in the current version of DESIGNER. Although we apply this technique to the specification of context-dependent stylesheets for XHTML and XML documents, the technique has much wider applicability. For example, it can be applied to query specification for structured documents and to computer program transformations. Previously, we introduced a context-specification method that used regular expressions to specify sets of paths in a tree that we called caterpillar expressions. The work on T-graphs is an attempt to provide a 90 percent solution to context-specification problems that solves, in practice, almost all context-specification problems. T-graphs are much easier to visualize and to construct than are caterpillar expressions; they are both a restriction of and a generalization of caterpillar expressions. We compare T-graphs with the context specification techniques found in other stylesheet systems and we also provide examples of context that we can and cannot specify with T-graphs. Although T-graphs are restrictive, they lend themselves to visual construction and modification, our main requirement when we designed this context-specification method. We also investigate the time and space complexity of T-graph matching, a necessity for efficient implementation. [...] 
As always with implementations, the implemented version of T-graphs in Designer differs somewhat from the T-graphs we have defined. The major difference is that the full T-graph editor has not been implemented. Nevertheless, we have demonstrated the viability of the T-graph approach. More important, from the graphic designers' viewpoint, we have also implemented a more visual version of T-graph specification and editing. Although we have not yet carried out appropriate cognitive experiments to test the learnability and usability of the T-graph concept, we believe that such tests will not falsify the approach. We have proposed a different approach to context specification based on tree walking and paths that uses caterpillar expressions. Each stem in a T-graph can be described by a caterpillar expression; however, the T-graph as a whole cannot, in general, be described by a single caterpillar expression. The reason is simple: caterpillar expressions, as currently defined, cannot simultaneously distinguish between 'to the left of a node' and 'to the right of a node'. For example, if we want to capture contextual nodes that have a node labeled 'a' to their left and a node labeled 'b' to their right, we cannot specify this relationship with a caterpillar expression. (Note that it is easy to construct a T-graph that checks this relationship.) We have implicitly assumed that we never access more than one T-graph at the same time. But, we may wish to generalize a context specification by combining two or more T-graphs to give the union of the contexts they specify. Alternatively, we may wish to break a complex context specification into partial specifications for which T-graphs are easier to construct. We then need to combine the resulting T-graphs. One obvious question to ask in both these scenarios is whether two or more T-graphs can all match a given T-configuration; indeed, whether there is a T-configuration that has this property. 
This question can be reduced to asking whether the right stems of two T-graphs can match the right stem of some T-configuration. This problem has been studied in pattern matching when both the text and pattern have wild-card symbols; however, the difference here is that we are not given a text, we are given only two 'patterns'. It is an interesting open problem whether we can solve this T-graph-matching problem efficiently." [This is a full and extended version of our conference papers that were presented at DL '99 and at ICCC/EP '99.]

  • [July 13, 2001] "Building a Web-based federated simulation system with Jini and XML." By Xueqin (Lily) Huang and John A. Miller (Computer Science Department, University of Georgia). Pages 143-50 (with 33 references) in Proceedings of the 34th Annual Simulation Symposium. ANSS'01, Seattle, Washington, April 22-26, 2001. "In a Web-based federated simulation system, a group of simulation models residing on different machines attached to the Internet, called federates, collaborate with each other to accomplish a common task of simulating a complex real-world system. To reduce the cost of developing and maintaining simulation models and facilitate the process of building complex collaborative simulation systems, reuse of existing simulation models and interoperability between disparate simulation models are of paramount importance. Moreover to make such a system highly extensible, the individual federates, which could reside on the same host or physically distributed hosts, should be able to freely join and leave a federation without full knowledge of its peer federates. Simply put, an ideal simulation system should allow for quick and cheap assembly of a complex simulation out of independently developed simulations and at the same time allow the participating simulations to have maximum independence. Fortunately this is made possible by some emerging Jini technologies, notably Jini and the Extensible Markup Language (XML). We introduce Jini and XML and present the design and prototype implementation of a Web-based federated simulation system using Jini and XML." See JSIM: A Java-Based Simulation and Animation Environment. Paper also available in PDF format.

  • [July 13, 2001] "XML-Based Messaging in the JSIM Simulation Environment." By Xueqin (Lily) Huang. Masters Thesis. May 2001. 83 pages. With User's Guide. "Software reuse and interoperability have long been the Holy Grail of the simulation community. The current state-of-the-art interoperability standard in simulation is the High Level Architecture (HLA). HLA is a US DoD mandate and an IEEE standard. It has generated much interest among simulation researchers and practitioners, but whether it will gain widespread acceptance by the mainstream simulation community depends on its ability to support the mainstream approaches and techniques. Jini and XML are two emerging technologies that will potentially have tremendous impact on the solutions to the reuse and interoperability problems in the simulation domain. This paper proposes the use of Jini and XML in building an open and flexible framework for a federated simulation system. It also presents the design and development of a prototype XML-based messaging system that is oriented towards our proposed system. A finding of our research is that Jini and XML may well complement each other in achieving interoperability between distributed components. Together, they constitute a promising infrastructure technology for future distributed systems." [cache]

  • [July 13, 2001] "An Efficient Data Extraction and Storage Utility For XML Documents." By Ismailcem Budak Arpinar, John Miller, and Amit P. Sheth (LSDIS Lab, Computer Science Department, University of Georgia). Pages 293-295 in Proceedings of the 39th Annual ACM Southeast Conference (ACMSE'01), Athens, Georgia, March 2001. "A flexible filtering technique and data extraction mechanism for XML documents are presented. A relational database schema is created on the fly to store filtered and extracted XML elements and attributes. Building an XML based workflow process repository provides a motivation. Dynamic XML technology combined with Java reflection provides for an efficient traversal method for XML hierarchies to locate the elements/attributes to be filtered... The repository, involving the Data Extraction and Storage Utility (i.e., Extractor), has the following main capabilities: (1) Filtering of XML objects that need to be extracted, (2) Generating relational schemas for on-the-fly storage of XML documents, (3) Loading data from XML documents into relational tables, (4) Re-creating original XML documents as needed, (5) Querying, browsing, and versioning. Our scheme has superiority over other storage alternatives for XML documents in terms of practicality and flexibility. Practicality arises because of the obvious acceptance and wide use of Relational Database Management Systems (RDBMSs); flexibility is provided by selective extraction mechanism (i.e., filtering) employed by the Extractor, which is not available in similar approaches using a RDBMS. Other approaches, such as XML databases (e.g., Lore), might have superiority over our approach in terms of efficient storage and querying XML documents... Recently, XML gains a great acceptance as a data interchange format on the Web. Thus, providing storage and querying capabilities for XML attains interests of many researches. However, a broadly accepted solution is still missing. 
We believe that our approach provides for a flexible and practical solution until XML DBMSs are improved and standardized. Furthermore, the XML based workflow repository provides easy exchange of workflow process definitions between companies, and an integration tool to enable coordination of companies' business processes." Also available in PDF format. See: "XML and Databases."
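
The first three capabilities listed in the abstract (filter, generate a relational schema on the fly, load) can be sketched in a few lines of Python. This is not the authors' Extractor, which is described as using Java reflection; it is a minimal stand-in that uses the standard sqlite3 and ElementTree modules, and the function name, the sample workflow document, and the one-table-per-element mapping are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

def extract_to_table(xml_text, element_name, conn):
    """Filter elements by name, derive a table from them, and load the rows."""
    root = ET.fromstring(xml_text)
    selected = list(root.iter(element_name))        # (1) filtering

    # (2) schema on the fly: one TEXT column per attribute or child element,
    # in first-seen order.
    columns = []
    for el in selected:
        for name in list(el.attrib) + [child.tag for child in el]:
            if name not in columns:
                columns.append(name)
    if not columns:
        raise ValueError("nothing to store for element " + element_name)
    col_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{element_name}" ({col_sql})')

    # (3) loading: missing attributes or children become NULL.
    placeholders = ", ".join("?" for _ in columns)
    for el in selected:
        values = {child.tag: (child.text or "").strip() for child in el}
        values.update(el.attrib)
        conn.execute(f'INSERT INTO "{element_name}" VALUES ({placeholders})',
                     [values.get(c) for c in columns])
    return columns

# Hypothetical workflow definition; only <task> elements are extracted.
workflow = """<workflow name="claims">
  <task id="t1"><role>clerk</role><action>enter claim</action></task>
  <task id="t2"><role>manager</role></task>
  <note>not extracted</note>
</workflow>"""

conn = sqlite3.connect(":memory:")
columns = extract_to_table(workflow, "task", conn)
rows = conn.execute('SELECT * FROM "task" ORDER BY id').fetchall()
```

Re-creating the original document (capability 4) would need extra bookkeeping such as element order and parent links, which this flat mapping discards.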

  • [July 13, 2001] "Matching RDF Graphs." By Jeremy Carroll (HP Labs). July 13, 2001 (draft). "The Resource Description Framework (RDF) describes graphs of statements about resources. This paper explores the equality of two RDF graphs in light of graph isomorphism literature. We consider anonymous resources as unlabelled vertices in a graph, and show that the standard graph isomorphism algorithms, developed in the 1970s, can be used effectively for comparing RDF graphs..." [Post to '': "One of the improvements in Jena-1-1-0 [] is a matching algorithm that can tell if two models are the same. The algorithm aligns the anonymous resources; so that two files, identical except for the order of statements will compare equal. I've written up the algorithm used... It's based on a standard algorithm from graph theory. It could also be useful for deeper notions of equivalence (e.g. after we have decided that certain pairs of URI's actually refer to the same resource). Any feedback, including stuff like typos and spelling errors, as well as more profound comments, would be welcome. I plan to take the doc to a second final version in three weeks time, when I will post a technical report number and a non-transitory URL..." See "Resource Description Framework (RDF)." [cache]
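
To make the matching problem concrete, here is a deliberately naive Python sketch: two graphs are treated as equal if some one-to-one renaming of anonymous resources turns one set of statements into the other. It tries every renaming, so it is exponential in the number of anonymous resources and is not the algorithm described in the paper or used in Jena; the "_:" prefix for anonymous resources and the sample statements are assumptions for the example.

```python
from itertools import permutations

def is_blank(term):
    """Anonymous resources are written with a '_:' prefix in this sketch."""
    return term.startswith("_:")

def blank_nodes(graph):
    return sorted({t for triple in graph for t in triple if is_blank(t)})

def graphs_match(g1, g2):
    """True if g1 equals g2 up to a bijective renaming of anonymous resources."""
    g1, g2 = set(g1), set(g2)
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        mapping = dict(zip(b1, perm))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False

# Same statements, different order and different anonymous-resource labels.
a = [("_:x", "name", "Jeremy"), ("_:x", "knows", "_:y"), ("_:y", "name", "Pat")]
b = [("_:2", "name", "Pat"), ("_:1", "knows", "_:2"), ("_:1", "name", "Jeremy")]

# Same size, but the 'knows' statement points the other way.
c = [("_:1", "name", "Jeremy"), ("_:2", "knows", "_:1"), ("_:2", "name", "Pat")]

same = graphs_match(a, b)
different = graphs_match(a, c)
```

Because statements are held in sets, statement order never matters; only the alignment of anonymous resources has to be searched for.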

  • [July 13, 2001] "Building Secure Web Services with Microsoft SOAP Toolkit 2.0." By Kirill Gavrylyuk (Test Lead, Web Data SOAP Team, Microsoft Corporation). MSDN. July 2001. ['Microsoft SOAP Toolkit 2.0 provides a flexible framework to build scalable Web services for various intranet and Internet solutions. Security is an important aspect of building reliable services in both scenarios. SOAP Toolkit 2.0 provides support for Internet security based on the IIS security infrastructure. This article describes how to build secure solutions with the Microsoft SOAP Toolkit 2.0.'] "As with any distributed protocol, a critical part of any successful SOAP application is getting the security right. The SOAP standard doesn't specify any security mechanisms but delegates security handling to the transport layer. In the case of the SOAP Toolkit 2.0, that transport layer is HTTP. SOAP running on HTTP is basically just a Web application like any other ASP or ISAPI application you have running on IIS. Authentication, authorization, and encryption for SOAP use the same mechanisms as your favorite Web applications do. If you're familiar with Web security, you already know SOAP security. If you haven't worked much with Web applications, this paper will give you enough background to get started. Each topic is covered at a fairly high level... Based on the ideas laid out in Designing Secure Web-Based Applications for Microsoft Windows 2000, we will start by outlining the golden rules in building secure Web services. The following seven categories stand behind a secure Web service: Authentication, Authorization, Auditing, Privacy, Integrity, Availability, Nonrepudiation. Authentication is a process by which an entity, also called a principal, verifies that another entity is indeed who or what it claims to be. 
The SOAP Toolkit 2.0 provides support for the following Authentication methods: Basic, Digest, Kerberos, Windows NTLM, SSL Client certificates, SOAP headers based Authentication, Proxy Authentication. In this document we will cover how to configure both the server- and client-sides to use these methods..." See "Simple Object Access Protocol (SOAP)."

  • [July 13, 2001] "alphaWorks' New XML Registry Tool Tracks Apps Down." By Michael Singer. From News (July 06, 2001). "Developers working with extensible mark-up language (XML) may never have to worry about misplacing important technical information on their projects again. IBM's emerging technologies lab alphaWorks Friday released a new resource that allows developers to store, search and manage XML-based applications. The new resource called the XML Registry/Repository (XRR) data management system is available as a free download from the Cupertino-based alphaWorks site. With the new software, developers can manage their XML schemas (DTD, XSD), stylesheets (XSL) and documents (WSDL). The goal is to simplify the process and to speed up the adoption of XML standards. With the help of XRR, a corporate employee can search or browse for an XML document or schema, then insert the document into their application automatically. XRR allows IBM's DB2 Universal Database to store and search XML documents by content, including tasks such as registration, searches by metadata, classification, and association. Take for example a developer who has to write an application to process health insurance claims. The claim forms are XML documents exchanged between the company and the insurance provider. The developer uses the XRR Web UI to search the XRR for schemas in the health insurance category and locates the appropriate schema document. After downloading the documentation and sample programs associated with this schema from the XRR, the developer completes the application and codes the application to access the insurance claim schema via a URL obtained from the XRR. The software is an implementation of the OASIS XML Registry Working Draft Specification 1.1. OASIS, a non-profit, XML interoperability consortium, is developing this specification for interoperable registries and repositories for XML-related entities, including but not limited to DTDs and schemas. 
alphaWorks also says it is now offering developers direct access to alphaWorks Utilities, a new online 'computing utility' in its first stages of development..." [Re: the June 21, 2001 minor update of the alphaWorks tool released on 2001-06-01] References: (1) "IBM alphaWorks Releases XML Registry/Repository Data Management System"; (2) XML/SGML Name Registration.

  • [July 13, 2001] "Binary Data to Go: Using XML-RPC to Serve Up Charts on the Fly." By Joe Johnston [co-author of Programming Web Services with XML-RPC] From July 9, 2001. ['O'Reilly software engineer Joe Johnston demonstrates how to use XML-RPC to transport binary data as base64-encoded messages and how to set up a simple Web service that creates charts with this data.'] "Although less famous than its younger sibling SOAP, XML-RPC is a simple and easy tool that can help you integrate even the most uncommunicative of systems. Where SOAP is a generalized, object-oriented, messaging protocol that is designed to carry arbitrary XML payloads across any network protocol, XML-RPC is a simple procedural protocol designed only to make remote function calls. Lest you get the impression that XML-RPC is inferior to SOAP, there are a good number of everyday problems that XML-RPC can solve adroitly. XML-RPC can transport binary data as base64-encoded messages, much the way email clients send attachments. Because this feature is somewhat rarely used in XML-RPC applications, it's worth taking a look at a Web service that creates simple charts for client programs using this technique... As of this article's publication date, three Perl implementations of XML-RPC exist, all available at your local CPAN Web site. The oldest and best known of these is Frontier::RPC2, written by Ken MacLeod. Unfortunately, the current stable release, 0.06, has a bug that hampers transmission of base64 objects. The newest module, RPC::XML was written by Randy Ray, and it provides the really useful feature of introspection, which is a method that lets remote clients ask the server for the remote procedures it provides. It also provides type checking to ensure that clients are providing the kinds and numbers of arguments that the implementing Perl procedure expects. The module used in this article is SOAP::Lite. Recently, Paul Kulchenko added XML-RPC support to his existing SOAP package. 
The result is a very solid and surprisingly flexible XML-RPC library. One of the advantages of using the XMLRPC::Lite classes that come bundled with SOAP::Lite is that you can create XML-RPC servers that look like CGI scripts. This means that the XMLRPC::Transport::HTTP::CGI class lets your system's Web server worry about mundane HTTP issues (like authentication and logging) and lets you concentrate on implementing your Web service API..." References in "XML-RPC."

  • [July 13, 2001] "Practical Extraction of Meaning from Markup." By C. M. Sperberg-McQueen (W3C), Claus Huitfeldt (University of Bergen), and Allen Renear (University of Illinois at Urbana/Champaign). ACH/ALLC 2001. New York University, 15-June-2001. Slides from the presentation, given by C. M. Sperberg-McQueen at ACH/ALLC 2001, reporting on joint work with Claus Huitfeldt and Allen Renear.

  • [July 12, 2001] "Using WSDL in a UDDI Registry." From UDDI Working Draft Best Practices Document. Version 1.05. June 25, 2001. By Francisco Curbera (IBM), David Ehnebuske (IBM), and Dan Rogers (Microsoft). "The Universal Description Discovery and Integration (UDDI) specification provides a platform-independent way of describing services, discovering businesses, and integrating business services using the Internet. The UDDI data structures provide a framework for the description of basic business and service information, and architects an extensible mechanism to provide detailed service access information using any standard description language. Many such languages exist in specific industry domains and at different levels of the protocol stack. The Web Services Description Language (WSDL) is a general purpose XML language for describing the interface, protocol bindings and the deployment details of network services. WSDL complements the UDDI standard by providing a uniform way of describing the abstract interface and protocol bindings of arbitrary network services. The purpose of this document is to clarify the relationship between the two, describe how WSDL can be used to help create UDDI business service descriptions... As an aid to understanding the sections ahead, we provide here a brief overview of two UDDI data structures that are particularly relevant to the use of WSDL in the context of a UDDI registry: the tModel, also known as the service type definition, and the businessService. tModels provide the ability to describe compliance with a specification, a concept, or a shared design. tModels have various uses in the UDDI registry. We are interested here in the use of tModels to represent technical specifications like wire protocols, interchange formats and sequencing rules. 
When a particular specification is registered with the UDDI repository as a tModel, it is assigned a unique key, which is then used in the description of service instances to indicate compliance with the specification..." See: "Universal Description, Discovery, and Integration (UDDI)." [cache]

  • [July 12, 2001] "Defective Sign & Encrypt in S/MIME, PKCS#7, MOSS, PEM, PGP, and XML." By Don Davis (Shym Technology). Paper presented at USENIX 2001 - USENIX Annual Technical Conference. Abstract: "Simple Sign & Encrypt, by itself, is not very secure. Cryptographers know this well, but application programmers and standards authors still tend to put too much trust in simple Sign-and-Encrypt. In fact, every secure e-mail protocol, old and new, has codified naïve Sign & Encrypt as acceptable security practice. S/MIME, PKCS#7, PGP, OpenPGP, PEM, and MOSS all suffer from this flaw. Similarly, the secure document protocols PKCS#7, XML-Signature, and XML-Encryption suffer from the same flaw. Naïve Sign & Encrypt appears only in file-security and mail-security applications, but this narrow scope is becoming more important to the rapidly-growing class of commercial users. With file- and mail-encryption seeing widespread use, and with flawed encryption in play, we can expect widespread exposures. In this paper, we analyze the naïve Sign & Encrypt flaw, we review the defective sign/encrypt standards, and we describe a comprehensive set of simple repairs. The various repairs all have a common feature: when signing and encryption are combined, the inner crypto layer must somehow depend on the outer layer, so as to reveal any tampering with the outer layer." [Jeremy Epstein: "It's not that the crypto algorithms are broken, it's that they're being used in broken ways that allow surreptitious forwarding, among other things..."] Also in PostScript format; see the abstract; [cache].

  • [July 12, 2001] "Recurse, Not Divide, To Conquer. Why not to divide an HTML element between XSLT templates, and what to do instead." By Benoît Marchal (Consultant, Pineapplesoft). From IBM developerWorks. July 2001. ['Software consultant and author Benoît Marchal answers an XSLT student's frequently asked question: How do you divide an HTML element between two XSLT templates? The trick is to ask the right question. This article demonstrates how to shift your thinking into the XSLT recursive approach, which is especially helpful if you have a background in a procedural language (Java and the like). Sample code demonstrates the right way (and the wrong way) to work with a flat XML or XHTML file that you want to process hierarchically.'] "I like to think of XSLT (XSL Transformations) as a simple and effective scripting language to manipulate XML documents. I have used XSLT in a broad range of applications encompassing publishing and application integration. I have come to enjoy XSLT, but I have also learned that it can be disconcerting to experienced developers who are learning XSLT because it has a distinct functional/recursive flavour (as opposed to procedural programming languages, such as Java). As this article illustrates, understanding the XSLT working model makes it possible to develop algorithms that work well with the language. A common question: Students of XSLT regularly ask me how to split an HTML (or XML) tag across two XSLT templates. The question arises when a developer is trying to add a hierarchical level to an XML document. I think it's worth studying this problem in some detail for two reasons: (1) It's a frequently asked question, and many developers will benefit from an answer. (2) Even more importantly, it's the wrong question to ask. In this article I'll suggest which question makes more sense, and I'll tell you the answer to that question..." Also available in PDF format.

  • [July 12, 2001] "The XML Behind Groove's Custom Applications. [Using XML to Configure Groove.]" By Brian Buehling. From July 11, 2001. "This article examines the use of XML to configure custom applications for Groove, a peer-to-peer groupware platform... One of the most promising applications of peer-to-peer technology intended to enhance online collaboration is Groove, the P2P platform of Groove Networks. Groove announced the general availability of its flagship product earlier this year, and since that time there's been quite a bit of attention drawn to the Beverly, Massachusetts technology company. Led by Lotus Notes pioneer Ray Ozzie, the company has raised $60 million of funding to develop and market its collaboration platform. Unlike P2P applications that focus on file sharing or distributed computing, the Groove platform focuses on sharing integrated workspaces among corporate users. Groove users can set up secure workspaces to collaborate on projects by accessing corporate databases, sending instant messages, or browsing the Web together... Groove provides a set of code samples and API documentation to help developers who want to begin building Groove tools, which are custom applications designed to work within the Groove platform. Programmers can quickly develop collaborative P2P applications by creating these tools with the Groove Development Kit (GDK). Programmers with a solid understanding of XML will have less trouble than others making the transition to building Groove tools. XML plays a significant part in the Groove architecture. Though most of the work required to develop Groove tools involves writing code in JavaScript, VB, or C++ to control the tool application logic, there are four XML files that developers have to configure to allow users access to their custom tools. The rest of this article will examine these files and their roles..."

  • [July 12, 2001] "Generate Graphics Dynamically Using Perl, XML, and SVG." By Kip Hampton. From July 11, 2001. "Scalable Vector Graphics (SVG) is a compact XML language to describe two-dimensional images. With SVG you can create extremely sophisticated images complete with paths, layering, masks, opacity control, animation, scriptable interactivity, and a small host of other advanced features -- all using nothing more than your favorite text editor or XML tools. This month we talk about creating SVG documents quickly and simply using Perl and David Megginson's XML::Writer module... It's worth noting that SVG requires a special browser plug-in or standalone viewer to view the rendered markup as the intended image. Rather than requiring readers to download one of these tools in order to view the results of the code samples, I have rasterized and exported the generated SVG images into Portable Network Graphics (PNG) using a utility that ships with the Apache Software Foundation's Batik project... While I rarely offer subjective value judgments about the technologies I discuss in this column, I'm going to make an exception in this case. SVG is astonishingly cool. The examples above barely scratch the surface of SVG's flexibility and communicative power. To brush it aside as just another way to make 'pretty pictures' is to miss the point completely. The combination of Perl's XML tools and SVG provides a range of creative options that would be difficult to achieve by other means. I hope that you have been inspired to continue to investigate both SVG and Perl's ability to generate it." See references in (1) W3C Scalable Vector Graphics (SVG) and (2) "W3C Scalable Vector Graphics (SVG)."
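
The idea of producing SVG with ordinary XML tooling carries over to other languages. The article uses Perl and XML::Writer; the Python sketch below uses the standard ElementTree module instead, and the function name, bar sizes, fill colour, and sample values are arbitrary choices made for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def bar_chart_svg(values, bar_width=20, gap=10, height=100):
    """Return an SVG document (as a string) with one <rect> per value.

    Assumes a non-empty list of non-negative numbers.
    """
    ET.register_namespace("", SVG_NS)   # serialize SVG as the default namespace
    width = len(values) * (bar_width + gap) + gap
    svg = ET.Element(f"{{{SVG_NS}}}svg",
                     {"width": str(width), "height": str(height)})
    peak = max(values) or 1             # avoid dividing by zero for all-zero data
    for i, value in enumerate(values):
        bar_height = round(height * value / peak)
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": str(gap + i * (bar_width + gap)),
            "y": str(height - bar_height),   # SVG's y axis points downward
            "width": str(bar_width),
            "height": str(bar_height),
            "fill": "steelblue",
        })
    return ET.tostring(svg, encoding="unicode")

svg_text = bar_chart_svg([1, 2, 4])
```

Writing svg_text to a file with an .svg extension gives something an SVG viewer can render. Converting it to PNG, as the article does with a Batik utility, is a separate step.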

  • [July 12, 2001] "P2P and XML in Business." By Brian Buehling. From July 11, 2001. ['An overview of the application of peer-to-peer technology in the enterprise, and the role played by XML.'] "Following the growth of business-to-business exchanges and supply chain management systems, the emergence of peer-to-peer (P2P) computing is likely to become another deployment arena for XML technology. Whether exchanging user messages, application state, or processing instructions, relaying information effectively is a critical component of any P2P application. By using XML, system designers can establish rules for peer interaction that allow developers to build applications independently... XML offers an ideal mechanism to transfer short, structured messages between peer applications. XML can be easily customized for specific P2P systems and readily transmitted over today's Internet protocols. XML data can be encrypted using existing technologies, making it an ideal candidate for secure messages. There are already several implementations of XML-based messaging schemes, including SOAP and XML-RPC. Utilizing XML to cache application data locally in P2P systems offers several advantages. Caching data in XML allows for more flexibility and easier retrieval than custom or unstructured formats, and it has a much smaller overhead than installing a relational database on each peer. Developers can take advantage of XML handlers to search, validate, retrieve, and manipulate the data needed to support the peer application. This approach will reduce the overall complexity of the P2P system. In many cases XML stores are easier to implement than storing unstructured data directly in the file system and require fewer system resources to operate than relational databases. XML can also be used to help manage the deployment of the application components to peers in the network -- often one of the most difficult challenges of P2P systems.
With the potential of having millions of peers interacting, having an effective process to distribute software updates is essential to the long-term success of any P2P system. One XML-based solution to this problem is Open Software Description (OSD)..."

  • [July 12, 2001] "Issues with the W3C's TAG, and XML Blueberry update." By Leigh Dodds. From July 11, 2001. "This week the Deviant reports on the impending formation of a new Web Architecture group within the W3C and provides an update on the state of XML Blueberry." See: "XML Blueberry Requirements." W3C Working Draft 20-June-2001. Edited by John Cowan (Reuters).

  • [July 12, 2001] "Will Microsoft embrace or deface XML?" By Jim Rapoza. In eWEEK (July 09, 2001). "In my reviews of XML standards and Microsoft's .net products, I've consistently questioned whether Microsoft will support the standards and, if so, how fully. Well, just about two months after the release of the key XML Schema standard, it looks as if Microsoft is going for full support of the standards, rather than its more common embrace-and-extend strategy. I came to this conclusion while at the Microsoft TechEd conference last month in Atlanta. In sessions about deploying Web services and using XML (which were heavily attended), Microsoft developers and managers consistently showed examples based on thorough support of XML standards. I even attended a session on the XML Schema standard itself, where the presenter said straight out that developers should be using the XML Schema Definition instead of the XML Data Reduced schema on which many Microsoft products, including BizTalk Server 2000, are based... Let's give Microsoft credit for realizing that this strategy would be a disaster with XML, especially because XML is at the heart of the company's .Net strategy. If Microsoft wants .Net to become one of the big platforms for developing B2B and Web services, then its tools need to work well with everything. If .Net only works well with Microsoft systems, then businesses won't use it to build their e-business infrastructures..."

  • [July 12, 2001] "SOAP Specification for Web Services Progresses." By Tom Sullivan. In InfoWorld July 09, 2001. "The W3C (World Wide Web Consortium), in Cambridge, Mass., on Monday published a working draft for the SOAP (Simple Object Access Protocol) 1.2 standard. SOAP, a key standard embraced by vendors pushing Web services, is a data transfer protocol for sending and receiving XML information. The new working draft of SOAP 1.2 operates under a refined processing model, and major enhancements include compliance with the W3C Schema Recommendation and the use of XML Namespaces. Furthermore, the draft includes recommendations for error messages for mandatory extensions. The W3C said the recommendations provide developers with more pertinent information to help them build more interoperable and extensible applications... [MS] Desautels said that SOAP 1.2 will not break SOAP 1.1, and upgrading will be a nominal rewrite that only affects a small part of the system. So companies that already have a SOAP 1.1 implementation will not need to rewrite the back-end logic or the actual Web service. SOAP works in conjunction with the XML, WSDL (Web Services Description Language), and UDDI (Universal Description, Discovery, and Integration) standards to form the core de facto Web services protocols that enable the describing, registering, locating, and consumption of Web services."

  • [July 09, 2001] "From DTDs to XML Schemas. [EXPLORING XML.]" By Michael Classen. From July 2001. ['Describing XML documents using XML Schemas offers a number of advantages over DTDs. In today's Tools Treasure Hunt, XML explorer Michael Classen introduces you to a utility that will help you convert your existing DTDs to XML Schemas.'] "The XML Schema standard was conceived to improve on DTD limitations and create a method to specify XML documents in XML, including standard pre-defined and user-specific data types. Defining an element specifies its name and content model, meaning attributes and nested elements. In XML Schemas, the content model of elements is defined by their type. An XML document adhering to a schema can then only have elements that match the defined types. One distinguishes simple and complex types. A number of simple types are predefined in the specification, such as string, integer and decimal. A simple type cannot contain elements or attributes in its value, whereas complex types can specify nesting of elements and associations of attributes with an element. User-defined elements can be formed from the predefined ones using the object-oriented concepts of aggregation and inheritance. Aggregation groups a set of existing elements into a new one. Inheritance extends an already defined element so that it could stand in for the original. The DTD to XML Schema Conversion Tool takes a DTD and translates it into its equivalent XML schema definition..." For schema description and references, see "XML Schemas."

  • [July 09, 2001] "Zvon UDDI Reference." By Miloslav Nic. July 2001. "We have just published the Zvon UDDI Reference based on the final version of the XML Schema standard. The normative XML Schemas uddi_v2.xsd, uddi_v2replication.xsd, uddi_v2custody.xsd, and uddi_1.xsd are available at The reference contains a comparison of versions 1 and 2, and both versions contain both abbreviated and expanded schema for the selected element as well as the context it can appear in. See also the Zvon SOAP Reference and WSDL Reference." [Posting from Miloslav Nic] See: "Universal Description, Discovery, and Integration (UDDI)."

  • [July 05, 2001] "Extending XML Query Language. XQuery for Querying Relational Databases." By John Gao (Tracker Business System, Richland, WA) and Devin Smith (Pacific Northwestern National Lab). Masters Thesis (Washington State University). May, 2001. Abstract: "The objective of this research is to extend an XML query language for querying XML documents stored in relational databases. XML has become the dominant language for computerized data representations and exchanges. XML documents are mainly stored in file systems, but database systems can provide better data security and concurrency management. For practical reasons, relational databases will be the main storage system of XML documents if information in the XML documents is to be retrieved efficiently. SQL is primarily used for querying relational data. XML query languages are developed mainly for XML documents in file systems or non-relational databases. No query language is available for querying XML documents in relational databases efficiently. SQL extended with XML query power or an XML query language extended with relational query functionality may be used to query XML documents in relational databases. XQuery (working draft), the XML query language proposed by the World Wide Web Consortium, combines superior features from other XML query languages, SQL, and object-oriented query languages. XQuery can query relational databases only after data are transferred to XML documents, costing time and memory. The query structure of the FOR/LET-WHERE-RETURN (FLWR) in XQuery is similar to the SELECT-FROM-WHERE structure in SQL. Thus, XQuery was extended for querying relational databases directly by adding a Mode expression in FLWR to control data input and output types. Extended-XQuery can query XML documents in relational databases whether they are stored as blob fields or mapped into relational tables. Like SQL, Extended-XQuery can query relational data directly without any data transformation.
A relational query engine from Interbase is used by Extended-XQuery for querying relational data in this research. Extended-XQuery queries are translated to SQL queries and then executed with the SQL engine. A user interface for translating and executing Extended-XQuery queries was implemented with Borland Delphi 5. To be a fully functional database language, XQuery also needs to be extended with the functionality for update, insert and deletion in future research..." See "XML and Query Languages."
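
The parallel the abstract draws between FOR-WHERE-RETURN and SELECT-FROM-WHERE can be sketched in Python. This is an illustration under stated assumptions, not the thesis's Extended-XQuery: the book data and both function names are hypothetical, the standard library's sqlite3 stands in for the relational engine, and a generator expression over an ElementTree document stands in for an XQuery processor.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: (title, price)
BOOKS = [("XML Basics", 25.0), ("Query Languages", 40.0), ("Data on the Web", 55.0)]


def titles_via_sql(min_price):
    """SELECT-FROM-WHERE over a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (title TEXT, price REAL)")
    conn.executemany("INSERT INTO book VALUES (?, ?)", BOOKS)
    rows = conn.execute(
        "SELECT title FROM book WHERE price > ? ORDER BY title", (min_price,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def titles_via_flwr_style(min_price):
    """FOR-WHERE-RETURN style iteration over the same data held as XML."""
    root = ET.Element("books")
    for title, price in BOOKS:
        book = ET.SubElement(root, "book")
        ET.SubElement(book, "title").text = title
        ET.SubElement(book, "price").text = str(price)
    return sorted(
        b.findtext("title")                         # RETURN: project the title
        for b in root.iter("book")                  # FOR: bind each book element
        if float(b.findtext("price")) > min_price   # WHERE: filter on price
    )


if __name__ == "__main__":
    print(titles_via_sql(30))
    print(titles_via_flwr_style(30))
```

Both functions iterate over a collection, filter it, and project a result, which is the structural similarity the abstract relies on when it proposes translating extended FLWR queries into SQL.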

  • [July 05, 2001] "Basic XML and RDF Techniques for Knowledge Management. Part 1: Generate RDF using XSLT. [Thinking XML #4.]" By Uche Ogbuji (CEO and principal consultant, Fourthought, Inc.). From IBM developerWorks. July 2001. ['I've started a series in my column about working with RDF in an XML system. I use an issue tracker as my example: hopefully one that will soon be the live tracker of RIL, 4Suite, etc... Columnist Uche Ogbuji begins his practical exploration of knowledge management with XML by illustrating techniques for populating Resource Description Framework (RDF) models with data from existing XML formats. As shown in the three code listings, RDF can be used as a companion to customized XML, not just as a canonical representation for certain types of data. This column, with code samples included, demonstrates how easy it can be to jump-start knowledge management with RDF even relatively late in the development game.'] "Although Resource Description Framework (RDF) was designed by the W3C as a general metadata modeling facility, it offers many features that make it an ideal companion to XML data. In many emerging XML applications, the knowledge encapsulated in the application throughout its lifetime is stored in XML documents in a database or repository. The basis of RDF's strength as a knowledge-management tool is that it allows you to organize, interrelate, classify, and annotate this knowledge, thereby increasing the aggregate value of the stored data. RDF has a reputation for complexity that is belied by the simplicity of adding RDF support to XML-based applications. This article begins an exploration of the symbiosis between RDF and XML. I'll demonstrate how to use XSLT to generate RDF from XML... In this column I have presented a simple example of the use of XSLT to extract RDF from XML instances. As more and more XML-based applications come into use, such techniques are useful in expanding applications with knowledge-management features. 
The next installment will continue with the issue tracker example, demonstrating batch processing of the issue documents and some open-source tools useful for such processing." Article also available in PDF format. See "Resource Description Framework (RDF)."

  • [July 05, 2001] "Transforming XML: Math and XSLT." By Bob DuCharme. From July 05, 2001. ['XSLT is primarily for transforming text, but you can use it to do basic math too.'] "XSLT's full support of XPath's math capabilities lets you do all the basic kinds of arithmetic and a little more. Let's look at a stylesheet that demonstrates these capabilities by using the values from this document... XSLT is about manipulating text, not numbers, but you can build on the mathematical operations provided as part of XSLT to perform more complicated calculations. For example, the following stylesheet, which accepts any document as input, computes the value of pi. The precision of the result depends on the value of the iterations variable..." For related resources, see "Extensible Stylesheet Language (XSL/XSLT)."
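
The annotation above mentions a stylesheet that computes pi to a precision set by an iterations variable, but does not say which formula it uses. As a rough sketch of the same idea in Python, the example below assumes the Leibniz series (4 * (1 - 1/3 + 1/5 - ...)); that choice and the function name approximate_pi are assumptions for illustration, not the article's method. An XSLT 1.0 stylesheet cannot reassign variables, so it would typically express this loop as a recursive named template.

```python
import math


def approximate_pi(iterations):
    """Approximate pi with the first `iterations` terms of the Leibniz series."""
    total = 0.0
    for k in range(iterations):
        # Terms alternate in sign: +1/1, -1/3, +1/5, ...
        total += (-1) ** k / (2 * k + 1)
    return 4 * total


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        estimate = approximate_pi(n)
        print(n, estimate, abs(estimate - math.pi))
```

As in the stylesheet described, more iterations give a closer result; this particular series converges slowly, which makes the dependence on the iteration count easy to see.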

  • [July 05, 2001] "XML Q&A: Namespace Nuances." By John E. Simpson. From July 05, 2001. ['This month's Q&A column tackles the question of how to write DTDs for XML applications that use namespaces.'] "Namespaces enable you to mix, in one XML document, element (and sometimes attribute) names from more than one XML vocabulary... Once you start using namespaces in a particular document, you must commit to going the whole way. In theory, only the names of the two table element types needed to be disambiguated. In practice, though, you use namespaces to disambiguate entire vocabularies -- even the names of elements, like td and chair above, which are already unambiguous. Thus, if you decide to require that the furniture-type table have a furn: prefix, you're committed to using that prefix on the names of the furniture, chair, and lamp elements as well... Here's how the fragment of a DTD above, way back at the beginning, could be modified to accommodate both validation and namespaces. Now your application will find an element named f:deposit in the DTD, whereas before the DTD declared only an element named deposit (no prefix). And now the rest of the document can use any of the four explicit prefixes on any element name, as long as those names, including prefixes, are declared in the DTD. If an element named s:envelope appears in the document, an element named s:envelope must be declared in the DTD. A declaration for a simple envelope element won't suffice..." From Ron Bourret's FAQ: "DTDs can contain qualified names but XML namespace declarations do not apply to DTDs. This has a number of consequences. Because XML namespace declarations do not apply to DTDs: (1) There is no way to determine what XML namespace a prefix in a DTD points to. Which means... (2) Qualified names in a DTD cannot be mapped to universal names. Which means... (3) Element type and attribute declarations in a DTD are expressed in terms of qualified names, not universal names. Which means... 
(4) Validation cannot be redefined in terms of universal names as might be expected. This situation has caused numerous complaints but, as XML namespaces are already a recommendation, is unlikely to change. The long term solution to this problem is an XML schema language: all of the proposed XML schema languages provide a mechanism by which the local name in an element type or attribute declaration can be associated with an XML namespace. This makes it possible to redefine validity in terms of universal names..." On namespaces, see (1) XML Namespaces FAQ [Maintained by Ronald Bourret] and (2) "Namespaces in XML."
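
The distinction between qualified names and universal names in the entry above can be seen with a namespace-aware parser. In this minimal Python sketch, the namespace URI http://example.org/furniture, the two sample documents, and the function name universal_names are hypothetical; xml.etree.ElementTree reports each element as {namespace-URI}local-name, which corresponds to the universal name.

```python
import xml.etree.ElementTree as ET

# Two documents that differ only in the prefix bound to the same namespace.
DOC_A = '<f:table xmlns:f="http://example.org/furniture"><f:chair/></f:table>'
DOC_B = (
    '<furn:table xmlns:furn="http://example.org/furniture">'
    '<furn:chair/></furn:table>'
)


def universal_names(xml_text):
    """Return the {URI}local-name form of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


if __name__ == "__main__":
    print(universal_names(DOC_A))
    print(universal_names(DOC_B))
    # The prefixes differ, but the universal names are the same.
    print(universal_names(DOC_A) == universal_names(DOC_B))
```

A DTD, by contrast, matches on the qualified name as written: a declaration for f:table does not cover furn:table, which is why the column recommends declaring every prefixed name the document will use.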

  • [July 05, 2001] "XML-Deviant: Against the Grain." By Leigh Dodds. From July 05, 2001. ['XML developers are talking about a perennial question: how can XML and database technologies be integrated appropriately?'] "...summarizes some of the comments made by XML-DEV members in response to a recent critical article on the relationship between XML and databases... Extending the capabilities of database management systems to facilitate the move from art to science can only be a good thing. At least, it is difficult to see how it could be a bad thing. It also seems obvious that building this work using formal models is smart, which is exactly what the XML Query work is doing. Only this time the model and the query syntax are being developed hand in hand. Unlike relational theory and SQL we should hopefully have a standard XML query language very shortly. One might also hope that this would limit mismatches between the two, which seems to be the case for pure relational models and those expressed by SQL. Promisingly, just this week two early implementations have been announced which means developers can finally begin to come to grips with this new technology..."

Hosted By
OASIS - Organization for the Advancement of Structured Information Standards

Robin Cover, Editor: