The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: May 28, 2002
XML Articles and Papers. April - June 2001.

XML General Articles and Papers: Surveys, Overviews, Presentations, Introductions, Announcements

References to general and technical publications on XML/XSL/XLink are also available in several other collections.

The following list of articles and papers on XML represents a mixed collection of references: articles in professional journals, slide sets from presentations, press releases, articles in trade magazines, Usenet News postings, etc. Some are from experts and some are not; some are refereed and others are not; some are semi-technical and others are popular; some contain errors and others don't. Discretion is strongly advised. The articles are listed approximately in the reverse chronological order of their appearance. Publications covering specific XML applications may be referenced in the dedicated sections rather than in the following listing.

June 2001

  • [June 29, 2001] "XML for Data: Using XML Schema Archetypes. Adding Archetypal Forms to Your XML Schemas." By Kevin Williams (Chief XML architect, Equient - a division of Veridian). From IBM developerWorks. June 2001. ['In the first installment of his new column, Kevin Williams describes the benefits of using archetypes in XML Schema designs for data and provides some concrete examples. He discusses both simple and complex types, and some advantages of using each. Code samples in XML Schema are provided.'] "In my turn on the Soapbox, I mentioned in passing how archetypes can be used in XML Schema designs for data to significantly minimize the coding and maintenance effort required for a project, and to reduce the likelihood of cut-and-paste errors. In this column, I'm going to give you some examples of the use of archetypes in XML schemas for data, and show just where the benefits lie. What are archetypes? Archetypes are common definitions that can be shared across different elements in your XML schemas. In earlier versions of the XML Schema specification, archetypes had their own declarations; in the released version, however, 'archetypes' are implemented using the simpleType and complexType elements. Let's take a look at some examples of each. Simple archetypes are created by extending the built-in datatypes provided by XML Schema. The allowable values for the type may be constrained by so-called facets, a fancy term for the different parameters that may be set for each built-in datatype. It's also possible to create a simple type by defining a union of two other datatypes or by creating a list of values that correspond to some other datatype. For our purposes, however, the restrictive declaration of simple types is the most interesting. Let's take a look at some examples... This installment has taken a look at the use of archetypes in the design of XML schemas. 
You've seen that judicious use of archetypes, together with smart naming conventions, can make schemas shorter and easier to maintain. There's an additional benefit to using archetypes -- a little trick to ensure consistent styling of your information..." Note the reference to the author's book Professional XML Schemas [ISBN: 1861005474], from Wrox Press; released now/soon. For schema description and references, see "XML Schemas."
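The archetype pattern the column describes can be sketched in a few lines. In the hypothetical schema below (element and type names are invented for illustration, not taken from the article), one named simpleType restricts a built-in datatype with a pattern facet and is then shared by two element declarations; the standard-library parse simply confirms the reuse, since Python ships no XML Schema validator.

```python
# A sketch (not from the article) of the archetype idea: one named
# simpleType, constrained by facets, shared by several element declarations.
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

schema_doc = f"""<xs:schema xmlns:xs="{XSD}">
  <!-- The "archetype": a reusable restriction of a built-in datatype -->
  <xs:simpleType name="USPhoneType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{{3}}-[0-9]{{3}}-[0-9]{{4}}"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Two different elements share the same definition -->
  <xs:element name="HomePhone" type="USPhoneType"/>
  <xs:element name="WorkPhone" type="USPhoneType"/>
</xs:schema>"""

root = ET.fromstring(schema_doc)
shared = [e.get("name") for e in root.findall(f"{{{XSD}}}element")
          if e.get("type") == "USPhoneType"]
print(shared)  # both elements reuse the one archetype
```

Changing the pattern facet in one place now updates every element that references the type, which is exactly the maintenance benefit the column claims.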

  • [June 26, 2001] "DAML Processing in Jess (DAMLJessKB)." From Joe Kopena. 2001-06-26. "This software is intended to facilitate reading DAML files, interpreting the information as per the DAML language, and allowing the user to query on that information. In this software we leverage the existing RDF API (SiRPAC) to read in the DAML file as a collection of RDF triples. We use Jess (Java Expert System Shell) as a forward-chaining production system which carries out the rules of the DAML language. The core Jess language is compatible with CLIPS and this work might be portable to that system. A similar approach is taken by DAML API, which also hooks RDF API into Jess. However, the bridge they use between the two is a little different and at the moment less complete, at least in the publicly available version. The basic flow of this library is as follows: (1) Read in Jess rules and facts representing the DAML language; (2) Have RDF API read in the DAML file and create SVO triples; (3) Take the triples and assert them into Jess' rete network in VSO form, with some slight escaping of literals and translation; (4) Have Jess apply the rules of the language to the data; (5) Apply the agent's rules, queries, etc. The bridge between RDF API and Jess is very simple: each triple is inserted more or less as-is into the knowledge base. A not insignificant help in this is Jess' relatively loose syntax constraints; very few characters need to be escaped to be valid. In Jess these are referred to as ordered slots. An alternative would be to use Jess' unordered (named) slots. This would require more preprocessing of the triples to determine relations. It might be more efficient but also might break down due to the cumulative nature of DAML/RDF -- facts about an object can be asserted at any time and don't necessarily follow the template. In DAML/RDF it is OK to assert an arbitrary relation about an object at any time unless specifically stated otherwise. 
This might not mesh well with Jess' templating mechanism. We generally follow the methodology of the DAML/RDF/RDF-S KIF Axiomatization in building our rules. Each fact is asserted as the sentence (PropertyValue <predicate> <subject> <object>). This is sufficient to assert any RDF/DAML information, since all constructs boil down to an underlying set of triples..." Note from Joe Kopena on '', 2001-06-26: "At the moment I'm working on a project using DAML to exchange information between units (arguably agents). I'm using it to encode my data and for the ontologies which express what the data means. Recently I've been doing some work on taking in DAML through RDF API and feeding it into Jess (Java Expert System Shell) to be processed. The result is that the data gets treated as DAML as opposed to just RDF triples. . . I'm using DAML in very simple ways at the moment (not even comparable in a number of ways to RDF Schema), but the number of constructs processed is growing as I need them and the system seems fairly useful already. Comments, suggestions, questions, discussion are all welcome..." See "DARPA Agent Mark Up Language (DAML)."
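The five-step flow above can be imitated with a minimal forward chainer. This is a hedged sketch, not DAMLJessKB code: triples are asserted as (PropertyValue predicate subject object) facts, and a single illustrative subclass-propagation rule fires until no new facts appear, roughly as Jess' rete network would.

```python
# Minimal forward-chaining sketch (not the actual DAMLJessKB code) of the
# flow described above: triples become (PropertyValue predicate subject
# object) facts, and rules fire until no new facts can be derived.
facts = {
    ("PropertyValue", "rdf:type", "fido", "Dog"),
    ("PropertyValue", "rdfs:subClassOf", "Dog", "Mammal"),
    ("PropertyValue", "rdfs:subClassOf", "Mammal", "Animal"),
}

def apply_rules(kb):
    """One pass of a single illustrative rule:
    (type x C) + (subClassOf C D)  =>  (type x D)."""
    new = set()
    for (_, p1, x, c) in kb:
        if p1 != "rdf:type":
            continue
        for (_, p2, c2, d) in kb:
            if p2 == "rdfs:subClassOf" and c2 == c:
                new.add(("PropertyValue", "rdf:type", x, d))
    return new - kb

# Run to fixpoint, as a production system would.
while True:
    derived = apply_rules(facts)
    if not derived:
        break
    facts |= derived

print(("PropertyValue", "rdf:type", "fido", "Animal") in facts)  # True
```

Because facts about an object can arrive at any time, the ordered-slot representation shown here tolerates new assertions without any template preprocessing, which is the trade-off the note discusses.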

  • [June 26, 2001] "An Axiomatic Semantics for RDF, RDF-S, and DAML+OIL." By Richard Fikes and Deborah L. McGuinness. (Knowledge Systems Laboratory, Computer Science Department, Stanford University). March 1, 2001. "This document provides an axiomatization for the Resource Description Framework (RDF), RDF Schema (RDF-S), and DAML+OIL by specifying a mapping of a set of descriptions in any one of these languages into a logical theory expressed in first-order predicate calculus. The basic claim of this paper is that the logical theory produced by the mapping specified herein of a set of such descriptions is logically equivalent to the intended meaning of that set of descriptions. Providing a means of translating RDF, RDF-S, and DAML+OIL descriptions into a first-order predicate calculus logical theory not only specifies the intended meaning of the descriptions, but also produces a representation of the descriptions from which inferences can automatically be made using traditional automatic theorem provers and problem solvers. For example, the DAML+OIL axioms enable a reasoner to infer from the two statements 'Class Male and class Female are disjointWith.' and 'John is type Male.' that the statement 'John is type Female.' is false. The mapping into predicate calculus consists of a simple rule for translating RDF statements into first-order relational sentences and a set of first-order logic axioms that restrict the allowable interpretations of the non-logical symbols (i.e., relations, functions, and constants) in each language. Since RDF-S and DAML+OIL are both vocabularies of non-logical symbols added to RDF, the translation of RDF statements is sufficient for translating RDF-S and DAML+OIL as well. The axioms are written in ANSI Knowledge Interchange Format (KIF), which is a proposed ANSI standard. 
The axioms use standard first-order logic constructs plus KIF-specific relations and functions dealing with lists.[1] Lists as objects in the domain of discourse are needed in order to axiomatize RDF containers and the DAML+OIL properties dealing with cardinality..." See "DARPA Agent Mark Up Language (DAML)."
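The paper's disjointWith example can be replayed in miniature. The sketch below is a toy under stated assumptions, not the KIF axioms themselves: each RDF statement is mapped to a relational sentence, and a single axiom (disjoint classes admit no common instance) is checked directly.

```python
# A toy rendering (an assumption-laden sketch, not the paper's KIF axioms)
# of the disjointWith example: each RDF statement maps to a relational
# sentence, and one axiom is checked against the knowledge base.
statements = {
    ("daml:disjointWith", "Male", "Female"),
    ("rdf:type", "John", "Male"),
}

def entails_false(claim, kb):
    """Does asserting `claim` contradict the disjointWith axiom?
    Axiom: disjointWith(C, D) & type(x, C) -> not type(x, D)."""
    pred, subj, obj = claim
    if pred != "rdf:type":
        return False
    for (p, c, d) in kb:
        if p != "daml:disjointWith":
            continue
        for cls_a, cls_b in ((c, d), (d, c)):  # disjointness is symmetric
            if obj == cls_a and ("rdf:type", subj, cls_b) in kb:
                return True
    return False

print(entails_false(("rdf:type", "John", "Female"), statements))  # True
```

This is the same inference the paper attributes to a reasoner armed with the DAML+OIL axioms: from the two given statements, 'John is type Female.' is provably false.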

  • [June 26, 2001] "Department of Defense Adopts StarOffice." By Peter Galli. In eWEEK (June 25, 2001). "In a significant win for open source desktop productivity suites, Sun Microsystems Inc. today announced that the U.S. Defense Information Systems Agency (DISA) would implement up to 25,000 units of its StarOffice 5.2 software. StarOffice, Sun's open source productivity application suite that includes word processing, spreadsheets, presentations, and database applications for the Solaris, Windows and Linux platforms, would replace Applix on more than 10,000 of DISA's Unix workstations at 600 client organizations worldwide, said Susan Grabau, the product line manager for StarOffice. DISA has already begun implementing StarOffice as the automation Unix desktop solution for its Global Command and Control System, she said. The deal had not cost DISA anything as there was no license fee associated with StarOffice, and the federal government already had extensive support contracts with Sun which would cover this implementation, she said... Sun is also on track to release StarOffice 6 later this year. Iyer Venkatesan, the senior product manager for StarOffice, told eWeek in late April that StarOffice 6 would include the recently finalized XML file format specifications, which would make file sharing far easier. 'Files will now be able to be saved in either an XML format or in the current binary format. This lets users easily share information across applications, and will simplify the importing and exporting of files from different programs while greatly improving file sharing and readability,' he said." See references in "StarOffice XML File Format."

  • [June 26, 2001] "An Introduction to XQuery. A look at the W3C's proposed standard for an XML query language." By Howard Katz (Fatdog Software). From IBM developerWorks. June 2001. ['Howard Katz introduces the W3C's XQuery specification, currently winding its way toward Recommendation status after emerging from a long incubation period behind closed doors. The complex specification consists of six separate working drafts, with more to come. This article provides some background history, a road map into the documentation, and an overview of some of the technical issues involved in the specification. A sidebar takes a quick look at some key features of XQuery's surface syntax. Code samples demonstrate the difference between XQuery and XQueryX and show examples of the surface syntax.'] "The W3C's XQuery specification has been in the works for a long time. The initial query language workshop that kicked things off was hosted by the W3C in Boston in December 1998. Invited representatives from industry, academia, and the research community at the workshop had an opportunity to present their views on the features and requirements they considered important in a query language for XML. The 66 presentations, which are all available online, came mainly from members of two very distinct constituencies: those working primarily in the domain of XML as-document (largely reflecting XML's original roots in SGML), and those working with XML as-data -- the latter largely reflecting XML's ever-increasing presence in the middleware realm, front-ending traditional relational databases. The working group is large by W3C standards (I'm told that only the Protocol Working Group has a larger membership). Its composition of some 30-odd member companies reflects the views of both constituencies. What's now starting to coalesce into final form is an XML query language standard that very ably manages to represent the needs and perspectives of both communities. 
The key component of XQuery that will be most familiar to XML users is XPath, itself a W3C specification. A solitary XPath location path standing on its own (//book/editor meaning 'find all book editors in the current collection') is perfectly valid XQuery. On the data side, XQuery's SQL-like appearance and capabilities will be both welcome and familiar to those coming in from the relational side of the world..." See references in (1) "XML Syntax for XQuery 1.0 (XQueryX) Published as W3C Working Draft" and (2) "XML and Query Languages."
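The sample location path //book/editor can be tried with the XPath subset in Python's standard library, which requires the .// form and supports only part of XPath 1.0:

```python
# The article's sample location path //book/editor, approximated with the
# limited XPath subset in Python's standard library (.// rather than //).
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<bib>
  <book><title>Data on the Web</title><editor>Abiteboul</editor></book>
  <book><title>XML Handbook</title><editor>Goldfarb</editor></book>
  <article><editor>Ignored</editor></article>
</bib>""")

# Only editor elements reached through a book element match the path.
editors = [e.text for e in doc.findall(".//book/editor")]
print(editors)
```

As the article notes, that one location path standing on its own is already a complete, valid XQuery; the SQL-like FLWR constructs layer on top of the same path machinery.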

  • [June 26, 2001] "Users Seek Web Services Clarity." By Jack McCarthy, Tom Sullivan, Eugene Grygo, and Cathleen Moore. In InfoWorld (June 22, 2001). "While industry vendors climb over one another to get to the top of the Web services heap, users are opting for caution until critical technology and business issues are resolved. Concerns about hazy pricing and potential interoperability problems have surfaced as vendors dash to differentiate themselves in the standards race. But far and away the biggest question looms over security... Stumbling blocks or not, major vendors this month plugged Web services and plowed full-steam ahead with initiatives. This week, Microsoft heralded the second beta of Visual Studio.NET and the .NET Framework, its tools for building Web services. Microsoft Chairman Bill Gates described Visual Studio.NET as the centerpiece development product of the .NET strategy. Sun Microsystems recently unveiled Sun Open Net Environment and this week teamed with Oracle to offer a kit for moving Windows code, data, and applications to Java 2 Enterprise Edition (J2EE). Lotus Development embraced the model by unveiling Workflow 3.0 to offer a graphical system for managing business processes that integrate with standards-compliant Web-based applications. Debuting at this week's DevCon show, the Workflow upgrade includes support for Java APIs, XML, and other standards, allowing developers to easily build Internet-based workflow applications. Available this fall, Workflow will also offer Lotus Sametime instant messaging and support for Linux. Lotus parent company IBM and Hewlett-Packard are also on board with the WebSphere application server and Core Services Framework, respectively. Analysts say that momentum is building but that users have time to sort through the hype and discover how Web services can benefit them... 
Behind the growing interest in Web services are the promises of cost savings in application development as well as more powerful e-business interactions when business processes are exposed. The model has already attracted many enterprises to set up limited systems as they wait for Web services to evolve. Ahead of the curve, Dollar Rent A Car Systems, based in Tulsa, Okla., has been one of the early adopters of Web services. The company set up a link from Southwest Airlines' Web site to Dollar's reservation system using Microsoft's SOAP (Simple Object Access Protocol) Toolkit and a Windows 2000 Server. Visitors can now rent a car from Dollar without leaving Southwest's site... The standards debate remains another unresolved Web services issue. XML, UDDI (Universal Description, Discovery, and Integration), SOAP, and WSDL (Web Services Description Language) 'are the Four Horsemen of Web services; everybody loves them,' said Dana Gardner, an analyst at Aberdeen Group in Boston. But the evolution of the standards will parallel what has occurred in other technologies in that 'there will be less agreement as people look for differentiation,' Gardner added."

  • [June 25, 2001] "E-book Project Highlights Role of DOI in Selling Digital Content." By Mark Walter. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 6 (June 18, 2001), pages 8-12. ['As standards for numbering and metadata come into focus, the "bar code for digital content" will be grease for the e-commerce distribution chain driving sales of digital goods.'] "The work of proving the viability of the DOI in a commercial e-book setting is the task of the DOI-EB initiative, and the first fruits of this pioneering project were unveiled late last month at a panel at the Book Expo America show in Chicago... The importance of the DOI-EB project is that it creates a public demo that helps educate the community about how DOIs can operate in the context of e-books. At the same time, it fleshes out the improvements that must be made to its system for all parties -- publishers, retailers and distributors -- to adopt the DOI. The project also has implications for fledgling efforts to get cross-vendor compatibility in digital rights management (DRM). Of the three areas where DRM vendors agree standardization can occur in the near term -- numbering, metadata and rights language -- the DOI-EB project identifies answers to two of those. Bob Bolick, the director of new business development at McGraw-Hill Professional and a leading participant in several of the DRM working groups, pointed out the importance of the project in that context: 'Because an identifier standard and metadata standard are key to achieving some form of interoperability among digital rights management systems and e-book formats, this DOI-EB work strikes me as one of the most important standards efforts occurring in our industry this year.' The DOI Foundation has also announced its support for efforts to extend the original Indecs work to rights-related terms... 
it's important to look at how the DOI dovetails with other standards efforts and technology developments taking place on the Net. Here, too, we see encouraging signs. Norman Paskin, the IDF's director, has been an active ambassador for the DOI, acting as a liaison with rights-management committees in the Internet Engineering Task Force, World Wide Web Consortium and MPEG. DOIs already can be expressed as Uniform Resource Names (URNs), the IETF's syntax for generic resources, and the DOI is compatible with OpenURL, a proposed syntax for embedding parameters -- such as identifiers and metadata -- into hyperlinks. Paskin said he expects the DOI to shortly issue the DOI Namespace, its data dictionary for about 800 metadata elements spanning e-books, journals and audio and video material. The DOI has also started a working group to develop a services definition interface that would make DOI services available to a variety of Web-enabled systems. In short, although DOI was initiated by commercial publishers to help them sell intellectual property, its implementation is being carefully crafted to complement other standards and technology developments being developed for the Web and Internet at large." See also: "XML and Digital Rights Management (DRM)."
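The identifier plumbing described here is easy to illustrate. In the sketch below, the DOI handle, resolver host, and query-parameter names are all invented for demonstration; only the general shape (a DOI written as a URN and embedded, along with metadata, in an OpenURL-style hyperlink) reflects the text.

```python
# Illustrative only: how a DOI handle might be written as a URN and how an
# OpenURL-style link carries identifiers and metadata as query parameters.
# The resolver host and parameter names are assumptions, not the spec.
from urllib.parse import urlencode

doi = "10.1000/182"        # a sample DOI handle (prefix/suffix)
urn = f"urn:doi:{doi}"     # URN form, following the IETF urn: syntax

params = {"id": urn, "genre": "book", "title": "Example E-Book"}
openurl = "http://resolver.example.org/menu?" + urlencode(params)
print(urn)
print(openurl)
```

The point of the design is that the same identifier travels unchanged through URN syntax, hyperlinks, and metadata records, so any link in the distribution chain can resolve it.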

  • [June 25, 2001] "Government Data Standards Catalogue. Volume 1 - General Principles." By [UK] e-Government Interoperability Framework (e-GIF). Issue: 0.5 For Public Consultation, 15/05/01. "The [UK] e-Government Interoperability Framework (e-GIF) mandates the adoption of XML and the development of XML schemas as the cornerstone of the government interoperability and integration strategy. A key element in the development of XML schemas is an agreed set of data standards. The Government Data Standards Catalogue sets out the rationale, approach and rules for setting and agreeing the set of Government Data Standards (GDS) to be used in the schemas and other interchange processes. It also contains the standards agreed to date. These standards are also recommended for data storage at the business level. The Catalogue comprises 3 volumes: Volume 1 sets out the general principles, i.e., the rationale, approach and rules for setting standards; Volume 2 sets out the Data Types standards; Volume 3 sets out the Data Items standards." References: see "e-Government Interoperability Framework (e-GIF)." [source]

  • [June 25, 2001] "Government Data Standards Catalogue. Volume 2 - Data Types Standards." By [UK] e-Government Interoperability Framework (e-GIF). Issue: 0.5 For Public Consultation, 15/05/01. See previous entry for description. Data Types Examples: "Amount Sterling, BS7666 Address, Date, E-Mail Address, Forename, Individual Full Name, International Postal Address, Name Suffix, Postcode, Requested Name, Surname, Time, Title, UK Postal Address, UK Telephone Number." Volume 3 is not yet published (2001-06-25). References: see "e-Government Interoperability Framework (e-GIF)." [source]
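As a taste of how such data-type standards get enforced in interchange processing, the check below validates one of the listed types. The regular expression is a common approximation of UK postcode formats, not the normative pattern from the e-GIF catalogue:

```python
# An illustrative check for the "Postcode" data type listed above. The
# pattern is a rough approximation of UK postcode formats, NOT the
# normative rule from the e-GIF Government Data Standards Catalogue.
import re

POSTCODE = re.compile(r"^[A-Z]{1,2}[0-9][0-9A-Z]? [0-9][A-Z]{2}$")

def looks_like_uk_postcode(value: str) -> bool:
    """Normalise case/whitespace, then test against the pattern."""
    return POSTCODE.match(value.strip().upper()) is not None

print(looks_like_uk_postcode("SW1A 1AA"))  # True
print(looks_like_uk_postcode("12345"))     # False
```

In an XML Schema built on the catalogue, the same constraint would live in a pattern facet on the Postcode data type, so every schema reusing the type inherits the rule.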

  • [June 25, 2001] "OASIS Security Services Technical Committee Glossary." Reference: 'draft-sstc-ftf3-glossary-00' (incorporates draft-sstc-glossary-00). 20-June-2001. 23 pages. "This document comprises an overall glossary for the OASIS Security Services Technical Committee (SSTC) and its subgroups. Individual SSTC documents and/or subgroup documents may either reference this document and/or import select subsets of terms. The sources for the terms and definitions herein are referenced in Appendix A. Please refer to those sources for definitions of terms not explicitly defined here. Where possible and convenient, hypertext links directly to definitions within the aforementioned sources are included. Some definitions are quoted directly from the sources, some are modified to fit the context of the OASIS SSTC (aka SAML) effort..." See (1) "Draft Documents Available for the Security Assertion Markup Language (SAML)" and (2) "Security Assertion Markup Language (SAML)."

  • [June 23, 2001] "Requirements for a Rights Data Dictionary and Rights Expression Language." In response to ISO/IEC JTC1/SC29/WG11 N4044: 'Reissue of the Call for Requirements for a Rights Data Dictionary and a Rights Expression Language -- MPEG-21, March 2001.' By [David Parrott] Reuters. 1 June 2001. Version 1.0. 62 pages. "This document describes Reuters requirements for a Rights Expression Language and Rights Data Dictionary (RDD-REL) in response to the call for requirements made by the MPEG-21 Requirements Committee... Digital Rights Management has for some time been closely linked with the technique of encrypting data files and managing the distribution and application of cryptographic keys in order to limit who can access the content and the manner in which access can take place. That technique is more appropriately labelled 'Digital Rights Enforcement' since it is more about enforcing rights than specifying and managing them. Moreover, even when enforcement is the goal, one might consider a whole array of implementation techniques which may or may not rely on encryption technology. In truth, the management of rights in the digital domain is far wider than the rather restrictive case outlined above. Rights (and obligations) management touches on numerous areas close to the hearts of many companies dealing in intellectual property (IP). Laying enforcement issues to one side, the value cannot be overstated of simply being able to describe, in a machine-readable, standard format, the requirements of an IP owner on all other participants in the value chain. Those requirements can be described, broadly, as Rights and Obligations... A basic requirement for Rights and Obligations management systems to be successful is the ability to communicate Rights and Obligations in a standard form. Machine-readability is key to the dynamic specification of electronic contracts which is, in turn, critical to the dynamic construction of value-chains. 
A single Rights Expression Language should be common to all aspects of commercial activity. In that way alone, straight through rules processing is made possible. Rights and obligations can be created by different participants in the value-chain and layered upon each other. Data from different sources can be mixed freely without compromising the IP Rights of any of the rights holders. At the same time, the rights of individuals and downstream recipients of content must be protected..." Document source: see the posting from David Parrott (Reuters Limited) of 21-June-2001 to the XACML TC list, and the .ZIP file. Note relevant to the XACML TC discussion: "... I am forwarding FYI Reuters response to MPEG-21's call for requirements for their Rights Data Dictionary and Rights Expression Language. A key point to note is that the response describes a number of features to be included in MPEG's rights expression language that overlap with many of the "differences" I recently heard Simon list between XACML and DRM. These include: (1) fine granularity; (2) the use of rights expressions as policies to drive all manner of enforcement implementations (e.g., file system access, database access, services such as CORBA access, etc); (3) dynamically changing rights (not limited to static objects); (4) predicating rights of access on complex contextual information. There are many others. It would be useful to get people's thoughts on just how close the XACML and MPEG-21 activities are likely to become..." For background, see: (1) "MPEG Rights Expression Language (REL)"; (2) "Extensible Access Control Markup Language (XACML)"; and (3) "XML and Digital Rights Management (DRM)."

  • [June 23, 2001] "Digital Rights Management (DRM) Architectures." By Renato Iannella (Chief Scientist, IPR Systems). In D-Lib Magazine [ISSN: 1082-9873] Volume 7, Number 6 (June 2001). "Digital Rights Management poses one of the greatest challenges for content communities in this digital age. Traditional rights management of physical materials benefited from the materials' physicality as this provided some barrier to unauthorized exploitation of content. However, today we already see serious breaches of copyright law because of the ease with which digital files can be copied and transmitted. Previously, Digital Rights Management (DRM) focused on security and encryption as a means of solving the issue of unauthorized copying, that is, lock the content and limit its distribution to only those who pay. This was the first-generation of DRM, and it represented a substantial narrowing of the real and broader capabilities of DRM. The second-generation of DRM covers the description, identification, trading, protection, monitoring and tracking of all forms of rights usages over both tangible and intangible assets including management of rights holders relationships. Additionally, it is important to note that DRM is the 'digital management of rights' and not the 'management of digital rights'. That is, DRM manages all rights, not only the rights applicable to permissions over digital content. In designing and implementing DRM systems, there are two critical architectures to consider. The first is the Functional Architecture, which covers the high-level modules or components of the DRM system that together provide an end-to-end management of rights. The second critical architecture is the Information Architecture, which covers the modeling of the entities within a DRM system as well as their relationships. 
(There are many other architectural layers that also need to be considered, such as the Conceptual, Module, Execution, and Code layers, but these architectures will not be discussed in this article.) This article discusses the Functional and Information Architecture domains and provides a summary of the current state of DRM technologies and information architectures... For an example of a rights language, see the Open Digital Rights Language. ODRL lists the many potential terms for permissions, constraints, and obligations as well as the rights holder agreements. As such terms may vary across sectors, rights languages should be modeled to allow the terms to be managed via a Data Dictionary and expressed via the language... Second generation DRM software is now providing some of the Architectures described in this article in deployed solutions. A typical example from the E-book sector is the OzAuthors online ebook store. OzAuthors is a service provided by the Australian Society of Authors in a joint venture with IPR Systems. Their goal is to provide an easy way for Society members (including Authors and Publishers) to provide their content (ebooks) to the market place at low cost and with maximum royalties to content owners [example]... All of this information is encoded in XML using the ODRL rights language. This encoding will enable the exchange of information with other ebook vendors who support the same language semantics, and will set the stage for complete and automatic interoperability... DRM standardization is now occurring in a number of open organizations. The OpenEBook Forum and the MPEG group are leading the charge for the ebook and multimedia sectors. The Internet Engineering Task Force [IETF] has also commenced work on lower level DRM issues, and the World Wide Web Consortium held a DRM workshop recently. 
Their work will be important for the entire DRM sector, and it is also important that all communities be heard during these standardization processes in industry and sector-neutral standards organizations." See: "XML and Digital Rights Management (DRM)."
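The ODRL-style encoding mentioned above might look roughly like the fragment below. Element and attribute names here are illustrative stand-ins, not the ODRL vocabulary: a permission block grants display and print, with a count constraint limiting printing.

```python
# A sketch of the idea above: permissions and constraints expressed in an
# ODRL-flavoured XML structure. Element and attribute names here are
# illustrative, not taken from the ODRL specification itself.
import xml.etree.ElementTree as ET

agreement = ET.fromstring("""
<agreement>
  <asset id="ebook-42"/>
  <permission>
    <display/>
    <print>
      <constraint><count max="2"/></constraint>
    </print>
  </permission>
</agreement>""")

# Enumerate the granted permissions and read the print constraint.
permissions = [child.tag for child in agreement.find("permission")]
print_limit = agreement.find("permission/print/constraint/count").get("max")
print(permissions, print_limit)
```

Because the terms live in a data dictionary rather than in code, two vendors that share the language semantics can exchange such records and enforce the same constraints, which is the interoperability the article anticipates.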

  • [June 23, 2001] "A Digital Object Approach to Interoperable Rights Management: Finely-grained Policy Enforcement Enabled by a Digital Object Infrastructure." By John S. Erickson (Hewlett-Packard Laboratories). In D-Lib Magazine [ISSN: 1082-9873] Volume 7, Number 6 (June 2001). "This article builds upon previous work in the areas of access control for digital information objects; models for cross-organizational authentication and access control; DOI-based applications and services; and ongoing efforts to establish interoperability mechanisms for digital rights management (DRM) technologies (e.g., eBooks). It also serves as a follow-up to my April 2001 D-Lib Magazine article, where I argued that the introduction of additional levels of abstraction (or logical descriptions) above the current generation of DRM technologies could facilitate various levels of interoperability and new service capabilities. Here I advocate encapsulating data structures of heterogeneous information items as digital objects and providing them with a uniform service interface. I suggest adopting a generic information object services layer on top of existing, interoperable protocol stacks. I also argue that a uniform digital object services layer properly rests above existing layers for remote method invocation, including IIOP, XML-RPC or SOAP. Many of the components suggested within this article are not new. What I believe is new is the call for an identifiable information object services layer, the identification of an application layer above it, and the clear mapping of an acceptable cross-organizational authentication and access control model onto digital object services... One aspect of the previously missing infrastructure was an object serialization, or structured storage, model that could be readily adopted across applications and platforms. We now have that model with the emergence of XML. 
In general, an advantage that data models with explicit structure have is that they naturally accommodate mechanisms for binding policy expressions to structural sub-trees within the information object hierarchies they represent. My focus here is on fine-grained policy expression and enforcement. Or, perhaps more accurately, policy expression at an appropriate level of granularity, since it is clear that not all object behaviors may require uniquely expressed policies. Generally, policy expression concerns the creation of tuples relating subjects, objects and actions, where in this context a 'subject' can be (loosely) thought of as a requestor for a service, an 'object' as a specific service (or behavior) of an information object, and an 'action' as some permissible action..." See: "XML and Digital Rights Management (DRM)."
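The subject/object/action tuples Erickson describes reduce to a very small data structure. A minimal sketch, with invented names:

```python
# The tuple model described above, in miniature: a policy is a set of
# (subject, object, action) triples, and a request is permitted only if a
# matching tuple exists. Subjects and object names are illustrative.
policy = {
    ("alice", "chapter-1", "view"),
    ("alice", "chapter-1", "print"),
    ("bob",   "chapter-1", "view"),
}

def permitted(subject, obj, action):
    """A request is a candidate tuple; grant it only on an exact match."""
    return (subject, obj, action) in policy

print(permitted("alice", "chapter-1", "print"))  # True
print(permitted("bob",   "chapter-1", "print"))  # False
```

Binding such tuples to structural sub-trees of an XML-serialized digital object is what yields the fine-grained (or appropriately grained) policy enforcement the article argues for.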

  • [June 22, 2001] "CTO Forum: Ballmer Pushes .NET, XML for Web Services." By Matt Berger. In InfoWorld (June 22, 2001). "Speaking to a room full of chief technology officers and other industry executives about Microsoft's new vision for building and delivering its software, Chief Executive Officer Steve Ballmer attempted to explain why the company's .NET initiative and all the software products built around it will enable the next generation of business and the Internet. Software will no longer be packaged and sold to customers on a CD, and applications will no longer be static programs that sit on a desktop or run off of a server, Ballmer said during a speech Thursday at the InfoWorld CTO Forum here. Instead, he said, they will be delivered over the Internet as services that allow customers to interact with them dynamically... Using many of the same phrases from earlier presentations on the subject, Ballmer called XML the 'lingua franca of the Internet,' saying it will drive the evolution of the Internet and Web services. 'This is the XML Revolution,' he said. 'I think this will be as big or even bigger than any revolution that preceded it.' This is why XML lies at the heart of Microsoft's .NET initiative, Ballmer said, adding that Microsoft has begun to incorporate support for XML in every part of its product line, from servers to desktop software to development tools, and the company is trying to convince partners, customers, and developers to do the same. It signals a new strategy from Microsoft that it is betting all of its chips on XML as the standard for developing its software to deliver new applications and Web services, said Steve Jurvetson, managing director of Silicon Valley venture firm Draper Fisher Jurvetson, who attended the event. 
Microsoft's decision to embrace XML, as well as support from other parts of the software industry, will pay off in the long run, said Tim Bray, the co-inventor of XML, who attended the CTO Forum as a representative of his new company Systems... XML is built into Microsoft's forthcoming Windows XP operating system, for example. The latest release of its Office productivity suite, Office XP, also incorporates hints of how Microsoft plans to use XML, such as its Smart Tags function, which delivers information from the Web via hyperlinks within applications. The company has also made XML an integral part of its Visual Studio.NET developer products and the .NET Framework. Microsoft delivered beta 2 versions of both of those products to developers this week at its TechEd conference in Atlanta..."

  • [June 22, 2001] Security Assertions Markup Language. Core Assertion Architecture. Version 09. 20-June-2001. Edited by P. Hallam-Baker. Contributions by Phillip Hallam-Baker, Tim Moses, Bob Morgan, Carlisle Adams, Charles Knouse, David Orchard, Eve Maler, Irving Reid, Jeff Hodges, Marlena Erdos, Nigel Edwards, and Prateek Mishra. From the OASIS SSTC and SAML work. "This document contains two sections. Section 1 contains the text proposed by the Core Assertions and Protocol group for the Core Assertions section of the SAML. Section 2 contains references to the material cited in the text. SAML specifies several different types of assertion for different purposes: (1) Authentication Assertion: An authentication assertion asserts that the issuer has authenticated the specified subject. (2) Attribute Assertion: An attribute assertion asserts that the specified subject has the specified attribute(s). Attributes may be specified by means of a URI or through an extension schema that defines structured attributes. (3) Decision Assertion: A decision assertion reports the result of the specified authorization request. (4) Authorization Assertion: An authorization assertion asserts that a subject has been granted specific permissions to access one or more resources. The different types of SAML assertion are encoded in a common XML package, which at a minimum consists of: (1) Basic Information: Each assertion must specify a unique identifier that serves as a name for the assertion. In addition an assertion may specify the date and time of issue and the time interval for which the assertion is valid. (2) Claims: The claims made by the assertion. This document describes the use of assertions to make claims for Authorization and Key Delegation applications. In addition an assertion may contain the following additional elements. 
A SAML client is not required to support processing of any element contained in an additional element with the sole exception that a SAML client must reject any assertion containing a 'Conditions' element that is not supported. (3) Conditions: The assertion status may be subject to conditions. The status of the assertion might be dependent on additional information from a validation service. The assertion may be dependent on other assertions being valid. The assertion may only be valid if the relying party is a member of a particular audience. (4) Advice: Assertions may contain additional information as advice. The advice element may be used to specify the assertions that were used to make a policy decision. The SAML assertion package is designed to facilitate reuse in other specifications. For this reason XML elements specific to the management of authentication and authorization data are expressed as claims. Possible additional applications of the assertion package format include management of embedded trust roots [XTASS] and authorization policy information [XACML]..." See: "Security Assertion Markup Language (SAML)."
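The common XML package described above (basic information, claims, optional conditions and advice) might look roughly like the following sketch, built here with Python's standard library. The element and attribute names are assumptions loosely modeled on the prose, not the normative SAML schema.

```python
# Sketch of the common assertion package: basic information (unique
# identifier, issue time, validity interval) plus claims. The names
# here are illustrative assumptions, not the SAML specification.
import xml.etree.ElementTree as ET

assertion = ET.Element("Assertion", {
    "AssertionID": "a1b2c3",                 # required unique identifier
    "IssueInstant": "2001-06-20T12:00:00Z",  # optional date/time of issue
})
# Validity interval; per the draft, a client must reject an assertion
# whose Conditions element it does not support.
ET.SubElement(assertion, "Conditions", {
    "NotBefore": "2001-06-20T12:00:00Z",
    "NotOnOrAfter": "2001-06-20T12:05:00Z",
})
# The claims made by the assertion, here a single authentication claim.
claims = ET.SubElement(assertion, "Claims")
ET.SubElement(claims, "AuthenticationClaim", {"Subject": "alice"})

xml_text = ET.tostring(assertion, encoding="unicode")
print(xml_text)
```

The same envelope would carry attribute, decision, or authorization claims; only the content of the Claims element changes.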

  • [June 22, 2001] "Shibboleth Specification." DRAFT v1.0. Shibboleth Working Group Specification Document. 'draft-internet2-shibboleth-specification-00'. May 25, 2001. "This document provides the specifications for the Shibboleth system, including interfaces, message specifications, etc. This document should define Shibboleth in sufficient detail that (1) someone can implement the system without having to guess or interpret what was intended, and (2) separate but compliant implementations are guaranteed to interoperate.... The Shibboleth Model differs from the SAML model in several key ways. It can be described as: (1) The SHIRE uses the WAYF Service to locate the User Home Organization. The WAYF produces a BLAH. (2) The SHIRE will send an Attribute Query Handle Request to the Handle Service (HS) to obtain a reference to the user. The HS will use the local web authentication mechanism to authenticate the browser user. However, instead of generating a Name Assertion, the HS will generate an attribute query handle (AQH - an opaque user handle), and return it in an Attribute Query Handle Response. Only the Attribute Authority will be able to map the AQH to a specific user. (3) The SHAR will send an Attribute Query Message to the Attribute Authority. The SHAR cannot ask for specific attributes; rather, the query should be understood to mean "give me all the attributes you can for this user for this target". The Attribute Authority will return an Attribute Query Response, containing assertions for all of the attributes it is authorized to release for this target. The Attribute Authority will likely obtain the attributes from the origin site's pre-existing Attribute Repository (e.g., Directory). (4) The Resource Manager will make an access decision, based on the supplied attributes, the target resource, and the requested operation. It will then either grant or deny access. It will not produce an Authorization Decision Assertion..." 
See the "Definition and explanation of SHAR/AA attribute request and response messages" with W3C XML Schema: "This document describes possible XML message formats for Shibboleth attribute request and response messages passed directly or indirectly between the SHAR and AA components of the Shibboleth architecture. The formats are expressed in the XML Schema Definition language." [Shibboleth, a project of MACE (Middleware Architecture Committee for Education), is investigating technology to support inter-institutional authentication and authorization for access to Web pages. Our intent is to support, as much as possible, the heterogeneous security systems in use on campuses today, rather than mandating use of particular schemes like Kerberos or X.509-based PKI. The project will produce an architectural analysis of the issues involved in providing such inter-institutional services, given current campus realities; it will also produce a pilot implementation to demonstrate the concepts."]
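The opaque-handle exchange in steps (2) and (3) above can be sketched as follows. The function names, the shared in-memory tables, and the per-target release policy are assumptions for illustration; a real origin site would back these with its local authentication system and directory.

```python
# Sketch of the attribute query handle (AQH) exchange: the Handle
# Service issues an opaque token, and only the Attribute Authority
# can map it back to a user. All names are illustrative assumptions.
import secrets

handle_table = {}    # shared between HS and AA at the origin site
attribute_repo = {"alice": {"affiliation": "member", "department": "physics"}}
release_policy = {"target-1": {"affiliation"}}   # attrs releasable per target

def issue_handle(user):
    """Handle Service: return an AQH that is opaque to the SHIRE/SHAR."""
    aqh = secrets.token_hex(16)
    handle_table[aqh] = user
    return aqh

def attribute_query(aqh, target):
    """Attribute Authority: release every attribute it may for this target."""
    user = handle_table[aqh]
    allowed = release_policy.get(target, set())
    return {k: v for k, v in attribute_repo[user].items() if k in allowed}

h = issue_handle("alice")
print(attribute_query(h, "target-1"))   # {'affiliation': 'member'}
```

Note that the SHAR never learns the user's identity: the department attribute is withheld because the release policy for this target does not cover it, and the handle itself reveals nothing.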

  • [June 22, 2001] Network Data Management - Usage (NDM-U) For IP-Based Services. Version 2.5. April 12, 2001. 62 pages. Chief Editor: Steve Cotton (Cotton Management Consulting). "This document, in conjunction with the referenced Service Definition documents, is intended to specify technical information that is sufficient for practical implementations of interchange of usage data among service elements participating in the delivery of IP-based services, either within a single enterprise or across multiple enterprises. The IPDR organization intends to submit this specification to selected accredited organizations for consideration as an approved standard. This specification is divided into three major chapters: (1) IPDR Reference Model - a definition of the abstract and operational relationships between entities involved in the generation, recording, storage, transport, and processing of usage attributes. (2) Business Requirements - a definition of business requirements to be addressed by the protocol specification and specific scenarios for the major process flows anticipated in actual application. (3) Protocol - the notation, data unit syntax, and dynamic procedures involved in the operation of the interfaces specified in the reference model. IPDR stands for the Internet Protocol Detail Record; the name comes from the traditional telecom term CDR (Call Detail Record), used to record information about usage activity within the telecom infrastructure (such as a call completion). NDM-U stands for Network Data Management - Usage. It refers to a functional operation within the Telecom Management Forum's Telecom Operations Map. The NDM function collects data from devices and services in a service provider's network. Usage refers to the type of data which is the focus of this document. Introduced in NDM-U 2.0, Service Specifications define the fields that should be present in IPDRDocs for each class of service. 
For example, the usage data captured for a Voice over IP call is very different from a query made to a Content-hosting Application Service Provider, so each requires its own Service Specification. The formal definition language for Service Specifications is XML DTDs [and XML Schema]. Service Specifications are updated or inaugurated to reflect changes in industry practice and new-generation capabilities that can roll out every month in the Internet world. Version 2 summary: "This revision introduces a major upgrade of the syntax notation of the protocol, namely XML Schema versus XML 1.0. This upgrade has been introduced to allow the protocol to specify strong typing of the usage attributes, thus conforming to the business requirements for data integrity. In addition, the dynamic operation of IPDR document transport has been specified, using the consensus choice for best conforming to business requirements, Simple Object Access Protocol (SOAP). Finally, the usage attributes for each of the services defined in the Business Requirements chapter are now formally specified, using the XML Schema definition supplied in the Protocol chapter." See also the [extracted] XML schema, perhaps also online as a separate document. References: "Network Data Management Usage Specification."
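As a purely notional illustration of the kind of per-service usage record a Service Specification would define, a VoIP IPDRDoc might carry fields like these. The element names are assumptions for the sake of example, not the NDM-U 2.5 specification; the real field names and their strong types come from the applicable Service Specification and its XML Schema.

```xml
<!-- Notional sketch only; actual fields are defined by the VoIP
     Service Specification, with types enforced by its XML Schema. -->
<IPDRDoc>
  <IPDR service="VoIP">
    <startTime>2001-04-12T09:00:00Z</startTime>
    <callDuration unit="s">182</callDuration>
    <callingParty>+1-555-0100</callingParty>
    <calledParty>+1-555-0199</calledParty>
  </IPDR>
</IPDRDoc>
```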

  • [June 22, 2001] "Soapbox: Why XML Schema beats DTDs hands-down for data. A look at some data features of XML Schema." By Kevin Williams (Chief XML Architect, Equient - a division of Veridian). From IBM developerWorks. June 2001. ['In his turn on the Soapbox, info-management developer and author Kevin Williams tells why he's sold on XML Schema for the structural definition of XML documents for data. He looks at four features of XML Schema that are particularly suited to data representation, and he shows some examples of each. Code samples include XSD schemas and schema fragments.'] "As you're no doubt aware, the W3C recently promoted the XML Schema specification to Recommendation status, making that spec the XML structural definition language of choice. While most people find the specifications a little hard to read, the jargon conceals a very strong set of features, especially for those of us who are designing XML structures for data. I'd like to take a look at a few of those features. Strong typing is probably the biggest advantage XML Schema has over DTDs, and it is the aspect of XML Schema you've heard the most about. In a DTD, you don't have a whole lot of choices for constraining the allowable content of your elements and attributes... [Conclusion:] I've taken a brief look at some aspects of XML Schema that make schemas much better than DTDs for the definition of XML structures for data. While DTDs are likely to be around for a while yet (there are plenty of legacy documents that still rely on them for their structural definition), support for XML Schema is quickly being implemented for all the major XML software offerings. In the following months, I'll take a look at some of the ideas I've laid out here in greater depth in my forthcoming column." Article also in PDF format. For schema description and references, see "XML Schemas."
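The strong-typing contrast Williams describes can be made concrete with a small hypothetical fragment: a DTD can only say that an element holds character data, while XML Schema can constrain the value with facets. The type name here is an illustrative assumption, not an example from the article.

```xml
<!-- DTD: any character data at all is valid content for price -->
<!ELEMENT price (#PCDATA)>

<!-- XML Schema: the same element constrained to a non-negative
     amount with at most two decimal places, via facets -->
<xsd:simpleType name="PriceType">
  <xsd:restriction base="xsd:decimal">
    <xsd:minInclusive value="0"/>
    <xsd:fractionDigits value="2"/>
  </xsd:restriction>
</xsd:simpleType>
<xsd:element name="price" type="PriceType"/>
```

A validating parser would accept `<price>19.95</price>` against the schema but reject `<price>cheap</price>`, whereas the DTD accepts both.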

  • [June 21, 2001] "Progressing the UN/CEFACT e-Business Standards Development Strategy." From United Nations Centre for Trade Facilitation And Electronic Business (UN/CEFACT). UN/CEFACT Steering Group (CSG) E-Business Team. General CSG eBTeam/2001/EBT0001 16-June-2001. "The UN/CEFACT Plenary endorsed the proposed strategy for achieving its e-Business vision at its March 2001 meeting. Subsequently, the UN/CEFACT Steering Group (CSG) and OASIS announced the successful completion of the development stage of ebXML and reached an agreement for the allocation of responsibility for maintenance and further development of ebXML specifications. Under the agreement, UN/CEFACT will be responsible for Business Processes and Core Components. OASIS will be responsible for maintaining and advancing a series of technical specifications. Jointly, UN/CEFACT and OASIS will be responsible for marketing and developing the technical architecture specification. The CSG believe the most effective way forward is to bring together the expertise and resources of the UN/EDIFACT Working Group (EWG), the Business Process Analysis Working Group (BPAWG), the Codes Working Group (CDWG), and the Business Process and Core Component work from the ebXML initiative. The result is the consolidation of all these efforts into a new Working Group, the e-Business Working Group, that will be able to address the needs of all its users. This initiative will require considerable planning and consultation if it is to achieve its objectives within the projected time scale. To lead this process, the CSG has established a special e-Business Team to undertake the initial coordination and development work. This paper is the first deliverable of the e-Business Team. It provides the description and responsibilities of the new e-Business Working Group. ['This paper is intended to provide a notional description of the proposed e-Business Working Group and likely responsibilities. 
It is by no means fully inclusive of all requirements that will eventually be identified. It is intended to establish a baseline and context within which meaningful discussion and alternative proposals can be developed. All aspects of the organisation as well as the various duties will be confirmed through the approval of Mandates and Terms of Reference for the e-Business Working Group and each subgroup.'] See the communiqué from Ray Walker, "UN/CEFACT's Proposal for a New Electronic Business Working Group." References: "Electronic Business XML Initiative (ebXML)."

  • [June 21, 2001] "Augmented Metadata in XHTML." Sun Microsystems Working Draft 21-June-2001. Edited by Murray Altheim (Sun Microsystems) and Sean B. Palmer. Draft version for feedback ('work in progress'). Abstract: "This specification describes several minor syntax modifications to XHTML (the XML transformation of HTML) which provide much of the essential functionality required to augment Web pages with metadata as found in published descriptions of the Semantic Web. This augmentation allows Dublin Core metadata, a highly popular standard developed by the library community, to be incorporated in Web pages in a way that is compatible with today's Web browsers, and describes a generalized mechanism by which other popular schemas can be used in similar fashion. The metadata can be associated with any XHTML or XML document or document fragment (actually, any addressable resource), internal or external to the document." Detail: "This specification describes three minor modifications to XHTML 1.1 which provide much of the essential functionality required to augment Web pages with schema-characterized metadata, as according to the need expressed in published descriptions of the Semantic Web. Using the extensibility provided by the W3C Recommendation Modularization of XHTML, this specification includes an 'XHTML Augmented Metadata 1.0 DTD' that implements these features. The first two modifications are relatively trivial, in terms of implementation: (1) allow the <meta> element to appear within any block element as metadata about its parent (i.e., any major document component); (2) add an optional href attribute to the <meta> element to allow it to point to any addressable resource. The third modification is to: (3) add a Dublin Core module to XHTML, modifying the content model of the <meta> element to contain its content. 
[From the post to '': "I've been hesitant to announce this since it's not quite finished, but since you asked, here's a specification in the works that describes how to incorporate Dublin Core metadata within XHTML, so that Web pages can be harvested for their subject, author, etc. content. How this might occur is described in section 5.5.3. You'll note that this doesn't put RDF of any flavour into a Web page. That couldn't be validated, which is one of the requirements of the project, and in terms of being globally useful, allowing every author in the world to create their own flavour of metadata isn't a particularly compelling need; we all need to agree on using the same "carrier" with a small number of controlled vocabularies. Dublin Core fits this bill as a very popular way of capturing a subset of the kinds of metadata described in things I've read about the Semantic Web. There's also a section on how to work this with topic maps..."] Related references in (1) "Dublin Core Metadata Initiative (DCMI)" and in (2) "XHTML and 'XML-Based' HTML Modules."
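The three modifications might combine as in this notional fragment: a meta element nested inside a block element as metadata about its parent, with the optional href pointing at another addressable resource and Dublin Core elements as its content. The markup is a sketch of the idea only, not an example from the 'XHTML Augmented Metadata 1.0 DTD'.

```xml
<!-- Notional sketch only, not the Augmented Metadata 1.0 DTD -->
<div id="overview">
  <meta href="#overview">
    <dc:creator>J. Author</dc:creator>
    <dc:subject>Semantic Web metadata</dc:subject>
  </meta>
  <p>Section text ...</p>
</div>
```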

  • [June 21, 2001] "DAML-S: Semantic Markup For Web Services." By David Martin. 2001-05-23 or later. "The Semantic Web should enable greater access not only to content but also to services on the Web. Users and software agents should be able to discover, invoke, compose, and monitor Web resources offering particular services and having particular properties. As part of the DARPA Agent Markup Language program, we have begun to develop an ontology of services, called DAML-S, that will make these functionalities possible. This white paper describes the overall structure of the ontology, the service profile for advertising services, and the process model for the detailed description of the operation of services. We also compare DAML-S with several industry efforts to define standards for characterizing services on the Web... DAML-S is an attempt to provide an ontology, within the framework of the DARPA Agent Markup Language, for describing Web services. It will enable users and software agents to automatically discover, invoke, compose, and monitor Web resources offering services, under specified constraints. We have released an initial version of DAML-S. It can be found at the URL: We expect to enhance it in the future in ways that we have indicated in the paper, and in response to users' experience with it. We believe it will help make the Semantic Web a place where people can not only find out information but also get things done." See the document DAML-S 0.5 Draft Release (May 2001): "This directory contains a draft version of the DAML-S language under development by a group of DAML researchers. We encourage feedback from interested parties. DAML-S is a DAML-based Web service ontology, which supplies Web service providers with a core set of markup language constructs for describing the properties and capabilities of their Web services in unambiguous, computer-interpretable form. 
DAML-S markup of Web services will facilitate the automation of Web service tasks including automated Web service discovery, execution, composition and interoperation. Following the layered approach to markup language development, the current version of DAML-S builds on top of DAML+OIL (March 2001), and subsequent versions will likely build on top of DAML-L." See "DARPA Agent Mark Up Language (DAML)."

  • [June 20, 2001] "XML Blueberry Requirements." W3C Working Draft 20-June-2001. Edited by John Cowan (Reuters). Latest version URL: Abstract: "This document lists the design principles and requirements for the Blueberry revision of the XML Recommendation, a limited revision of XML 1.0 being developed by the World Wide Web Consortium's XML Core Working Group solely to address character set issues." Detail: "The W3C's XML 1.0 Recommendation was first issued in 1998, and despite the issuance of many errata culminating in a Second Edition of 2001, has remained (by intention) unchanged with respect to what is well-formed XML and what is not. This stability has been extremely useful for interoperability. However, the Unicode Standard on which XML 1.0 relies has not remained static, evolving from version 2.0 to version 3.1. Characters present in Unicode 3.1 but not in Unicode 2.0 may be used in XML character data, but are not allowed in XML names such as element type names, attribute names, processing instruction targets, and so on. In addition, some characters that should have been permitted in XML names were not, due to oversights and inconsistencies in Unicode 2.0. As a result, fully native-language XML markup is not possible in at least the following languages: Amharic, Burmese, Canadian aboriginal languages, Cantonese (Bopomofo script), Cherokee, Dhivehi, Khmer, Mongolian (traditional script), Oromo, Syriac, Tigre, Yi. In addition, Chinese, Japanese, Korean (Hangul script), and Vietnamese can make use of only a limited subset of their complete character repertoires. In addition, XML 1.0 attempts to adapt to the line-end conventions of various modern operating systems, but discriminates against the convention used on IBM and IBM-compatible mainframes. XML 1.0 documents generated on mainframes must either violate the local line-end conventions, or employ otherwise unnecessary translation phases before and after XML parsing and generation. 
A new XML version, rather than a set of errata to XML 1.0, is being created because the change affects the definition of well-formed documents: XML 1.0 processors must continue to reject documents that contain new characters in XML names or new line-end conventions. It is presumed that the distinction between XML 1.0 and XML Blueberry will be indicated by the XML declaration..." See the 'www-xml-blueberry-comments' mailing list archives and related references in "XML and Unicode."

  • [June 20, 2001] "Microsoft's Ballmer: .NET is About Integration." By Michael Vizard and Mark Jones. In InfoWorld Issue 25 (June 18, 2001), pages 20-22. "As part of an ambitious effort to create an architecture that fosters data and application integration, Microsoft has laid out a broad foundation based on XML technologies that will be marketed under the name of Microsoft.NET. In an interview with InfoWorld Editor in Chief Michael Vizard and West Coast News Editor Mark Jones, Microsoft CEO Steve Ballmer, who will be a keynote speaker at the InfoWorld CTO Forum this week, talks about how he sees this 'bet-the-company' strategy paying off for Microsoft customers and its industry allies. [Q: Why should corporations pay any attention to Microsoft.NET today?] Ballmer: There's a ton of information that is essentially locked in back-office systems today. We want to help [companies] bring that information together in new applications. We want to help them expose the information to the consumer. The way we would propose doing that is to essentially wrap it via XML and then build next-generation applications that pull things together using the XML infrastructure. This is about enterprise application integration. This is about business-to-business. This is about unlocking, getting knowledge of back-office systems to front office. [Q: What's the core business model behind Microsoft.NET?] Ballmer: We will build software, servers, and tools that have .NET-and XML-platform capability built-in, and we will sell those as we sell software today. We will also have a set of services that you should think of as sort of customer-facing as opposed to developer-facing. These will be advanced services for consumers and knowledge workers that use an XML data store that the user has running on the Internet. These additional services on top of that somebody might subscribe to as part of Windows or on top of Windows or on top of Office, etc. 
We will also charge developers some [sort of] fixed fee to use our services per year because there's real operational costs in serving a developer. But we don't have any model under consideration that calls for transaction fees and that sort of thing... [Q: What are the major points of difference between you and Sun about the role of XML?] Ballmer: XML is a message format, but it also implies a programming model. I don't send you a Java program that you run. I send you an XML message and you send me back an XML message. Yes, it's a data exchange format, but it is also the backbone for the way you write loosely coupled applications that extend one another and complement one another and work together. I don't think Sun gets that, frankly. Or maybe they do get it but strategically it is inopportune for them to get it..."

  • [June 20, 2001] "Microsoft Fires .NET Arrows at Java." By Tom Sullivan and Ed Scannell. In InfoWorld Issue 25 (June 18, 2001), pages 17-20. "Just two weeks after rival Sun Microsystems and its Java partners ballyhooed Web services at JavaOne, Microsoft will fire back this week with its own salvo. The company will use its TechEd developers conference in Atlanta to extol the advantages of integrating enterprise systems with Web services. Bill Gates, Microsoft chairman and chief software architect, will announce the availability of Visual Studio.NET beta 2. The final version, planned to ship later this year, allows developers to create components with native XML interfaces that can interoperate with other Web services. Microsoft will bolster its heavy bet on XML -- via its .NET initiative -- with a demonstration of Yukon, the forthcoming SQL Server version. Yukon's Web services-applicable features include XML processing and the ability to store XML natively in the database. When it ships, Yukon will support multiple languages within the database via the Common Language Runtime (CLRT) so any language that supports CLRT can be stored in the database... [MS'] Flessner will kick off the conference Monday, explaining how XML Web services solve the enterprise integration problem, whether that is system-to-system integration or feeding data from one application to a variety of Internet access devices. "There is important business value that can be derived from connecting directly to your partners and customers," said Barry Goffe, group manager for Microsoft .NET. The company is determined to outgun rivals IBM and Sun by providing a better implementation of XML-based Web services standards in its .NET tools and servers. With its CLRT woven tightly into Visual Studio.NET, Microsoft believes its technology will have broader appeal than Sun's because developers can write in any language they choose, not just Java. 
Microsoft CEO Steve Ballmer said Microsoft is betting on XML against just one language. 'Java is inadequate and the way that applications will be extended will be by responding to XML messages. It won't be by sending somebody a Java program,' he said. Microsoft's rivals expect it to distort the open standards, notably UDDI (Universal Description, Discovery, and Integration), SOAP (Simple Object Access Protocol), and WSDL (Web Services Description Language)... Other companies are not convinced that the XML-based Web services that any of the vendors are selling offer the best means to build Web services-like functionality. Ameritrade, an online brokerage in Omaha, Neb., is using BEA's Tuxedo at the middleware layer and Java to deliver its brokerage services. 'We are putting components in at the middleware level, as opposed to doing it at the XML level,' said CIO Jim Ditmore..."

  • [June 20, 2001] "RealNetworks Pushes Copyright Initiative." By Melanie Austria Farmer and Jim Hu. In CNET (June 20, 2001). "Streaming-media giant RealNetworks on Wednesday unveiled new technology intended to promote the legal use of copyrighted material over the Web. The company is aiming the software in its RealSystem Media Commerce Suite at media companies and retailers that want to deliver music, movies and other copyrighted material securely over the Web. The software can be tied into existing systems for delivery of digital content. RealNetworks also introduced an initiative to provide a common, open standard--called XMCL, for Extensible Media Commerce Language--that would enable the content to be played on systems from different providers of digital entertainment. Supporters include media and technology notables such as IBM, Napster, InterTrust, Metro-Goldwyn-Mayer, Sony Pictures Digital Entertainment and Sun Microsystems. The moves are likely to heighten the already intense competition between RealNetworks and Microsoft, both of which distribute technology that allows consumers to watch videos or listen to music over the Web... Microsoft countered Wednesday with its own set of announcements. The Redmond, Wash.-based software giant unveiled Microsoft Producer, a system that lets people incorporate Windows Media audio and video technology into their business presentations. In addition, the company said it will begin highlighting how media and entertainment companies such as EMI Recorded Music, Viacom's CBS NewsPath and Lions Gate Entertainment are using its Windows Media digital rights management system... The control of copyrighted materials online falls into the realm of digital rights management, which will play an increasingly important role as online music becomes more popular with consumers. Content producers such as record labels and movie studios have generally acknowledged the Internet as a new way to sell and distribute their works. 
But the lack of safeguards preventing the unwanted dissemination of their works has made content providers more conscious of copyright abuses on the Internet. Thus, many content companies have proceeded slowly, waiting for a sufficient way to secure their works... The XMCL proposal envisions a way for digital content to be played independently of rights management systems and codecs. Codecs are the mathematical codes that compress large audio files into smaller, more usable packages that can be streamed or downloaded over the Web." See: "Extensible Media Commerce Language (XMCL)."

  • [June 20, 2001] "RealNetworks Unveils Digital Rights Standard, Products." By 'Reuters'. In InternetWeek (June 18, 2001). "Media software maker RealNetworks Inc. on Wednesday launched a new product it says will help entertainment conglomerates manage and track the use of their copyrighted material on new online services they plan to soon roll out. RealNetworks also unveiled an initiative to standardize the delivery of content via the Web in a way that is secure and profitable, marking what analysts said was the Seattle-based company's boldest move yet to tackle a main strength of competing technology by cross-town rival Microsoft Corp... Although the more technical of the announcements, Real's standardization initiative could pose a bigger threat to Microsoft, analysts said. Real's proposed standard is called the eXtensible Media Commerce Language, or XMCL, a media-oriented version of the XML (eXtensible Markup Language) standard that companies like Microsoft are betting heavily on to enable a new generation of Web-based services. Just as XML describes different types of data so different computer systems can talk to each other, XMCL would be a common language for describing the rights and rules for a piece of media like a song or a film, Albertson said... The other pillar of Real's strategy is a product called the RealSystem Media Commerce Suite, which will let online music and video stores easily package, sell and deliver their wares to customers, Albertson said..." See: "Extensible Media Commerce Language (XMCL)."

  • [June 20, 2001] "Big Guns Take Aim at Digital Copyright Management." By Sumner Lemon and Stephen Lawson. In InfoWorld (June 20, 2001). "Backed by some of the biggest names in the online entertainment industry, RealNetworks on Wednesday announced the formation of the XMCL (Extensible Media Commerce Language) Initiative. The company said the initiative will define an open XML-based framework for managing rights to digital media, including applications such as purchase, rental, video-on-demand, and subscription services. The list of companies that are backing the XMCL Initiative includes media-industry heavyweights such as Bertelsmann, EMI Group, Metro-Goldwyn-Mayer Studios (MGM), and AOL Time Warner. But Microsoft, which has its own digital-rights management framework built around the Windows Media Format 7 file format, is conspicuously absent from the list. Digital rights management technologies allow copyright holders to control how movies and songs are used and distributed online. Also, the technologies can restrict the number of times a user can play a certain file, or prevent a file from being copied and passed on to other users. XMCL will simplify rights management by letting content providers define business rules in a standard way, RealNetworks said in a statement. Specific details of how XMCL would be implemented were not made available... RealNetworks announced the XMCL Initiative at the same time it launched its RealSystem Media Commerce Suite, a suite of multimedia content applications. The software will eventually support XMCL and give users the ability to choose from a variety of back-end platforms, the company said. RealSystem Media Commerce Suite can be integrated with third-party digital rights management applications, such as flexible rights management software from InterTrust Technologies, the statement said. 
InterTrust, which has filed a patent infringement suit against Microsoft over digital-rights management in Windows Media Player, is a member of the XMCL Initiative..." See the announcement, and XMCL main reference page.

  • [June 20, 2001] "Making RDF Syntax Clear. Proposal of a DTD and minor syntax enhancement to RDF, to overcome many of the current practical difficulties." By Rick Jelliffe (Topologi Pty. Ltd.). 2001-06-20. "The current RDF Recommendation is almost impossible to implement because the discipline of a DTD was not used. Consequently, RDF implementations lack exchangeability, and most people coming to the RDF Spec (from outside the 'RDF Community') expecting a clear description of syntax must go away disappointed. Furthermore, the advent of RDFS raises compatibility issues, in that certain elements are used in RDFS, but are only general names in RDF. This proposal suggests that the situation could be improved by: [1] creating a normative DTD for RDF; [2] stating clearly that this DTD (and DTDs that use it) embodies the current RDF exchange XML; [3] reconciling the use of namespaces in RDF with XML Schemas; [4] clarifying RDF's current syntax with standard concepts such as "architectures". I propose that this DTD should be included as a normative part of the RDF specification, and the BNF sections removed or reworded to fit in with it. From the 2001-06-20 posting to '': "I have posted to the RDF comments list a proposal for clarifying RDF syntax. This proposal features a new DTD, used to map between RDF documents and a notional XML Schemas schema using xsi:type. I have been working through RDF specifications and examples again recently, and I am even more convinced than ever that getting the basic discipline of the transfer syntax clear is a prerequisite for RDF becoming useful..." See "Resource Description Framework (RDF)."

  • [June 20, 2001] "Simplified XML Syntax for RDF." By Jonathan Borden (Tufts University School of Medicine, The Open Healthcare Group). June 17, 2001 or later. "A simplified XML syntax for RDF is proposed. Its major differences with RDF 1.0 are: (1) namespace =; (2) defined as tree regular expression; (3) attribute aboutQ="ex:name" accepts QName as value indicating subject; (4) attribute resourceQ="ex:value" accepts QName as value indicating object; (5) rdf:parseType="Resource" is default; (6) The subject or object of a statement may be either a URI reference, a qualified name, a quantified variable, another statement or a collection of statements; (7) ?x defines a quantified variable. XML Syntax: The XML syntax for RDF 1.0 can be described in terms of a tree regular expression. This form can be thought of as expressing constraints on the XML Infoset which arises when parsing an RDF document. The advantage of expressing the syntax in this form over EBNF is that a tree regular expression (e.g., a RELAX NG/TREX schema) already takes into account the rules of XML syntax + XML namespaces (e.g., it correctly handles namespace prefixes, empty elements, mixed content, whitespace, attribute ordering, etc.). Such schemata are also described as 'hedge regular expressions' or 'hedge automata' []. The tree regular expression schema for RDF 1.0 is available [online]. This schema handles several proposed updates such as the requirement that the "rdf:about" and "rdf:ID" attributes be prefixed/qualified. A tree regular expression for the proposed syntax is available [online]..." See: "Resource Description Framework (RDF)" and "RELAX NG."
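The proposal's QName-valued attributes put a burden on the implementer that plain XML parsing does not discharge: an aboutQ or resourceQ value must be expanded against the namespace declarations in scope at that point. A minimal Python sketch of that expansion step (the document shape, prefixes, and names below are illustrative, not taken from the proposal):

```python
# Hypothetical sketch: expanding QName-valued attributes (aboutQ/resourceQ)
# against in-scope namespace declarations, which XML parsers do not do for
# attribute *values*. Document content and names are invented for illustration.
import io
import xml.etree.ElementTree as ET

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:ex="http://example.org/terms#">
  <rdf:Description aboutQ="ex:TimBL">
    <ex:knows resourceQ="ex:DanC"/>
  </rdf:Description>
</rdf:RDF>"""

def expand_qname(qname, ns_map):
    """Expand 'prefix:local' to a full URI using the prefix bindings seen so far."""
    prefix, local = qname.split(":", 1)
    return ns_map[prefix] + local

# Track prefix declarations with iterparse's start-ns events.
ns_map, triples, subject = {}, [], None
for event, obj in ET.iterparse(io.StringIO(DOC), events=("start-ns", "start")):
    if event == "start-ns":
        prefix, uri = obj
        ns_map[prefix] = uri
    elif "aboutQ" in obj.attrib:
        subject = expand_qname(obj.attrib["aboutQ"], ns_map)
    elif "resourceQ" in obj.attrib:
        # The element name is the predicate; the resourceQ QName is the object.
        triples.append((subject, obj.tag, expand_qname(obj.attrib["resourceQ"], ns_map)))

print(triples)
```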

  • [June 20, 2001] "RELAX NG schema for W3C XML Schema." Prepared by Jeni Tennison. Posted to '' on 20-Jun-2001. Comments: "I think that the XML Schema vocabulary is quite a neat showcase for RELAX NG because there are so many co-dependencies between attributes and between attributes and elements. This RELAX NG schema follows the XML Schema for XML Schema to a certain extent (using the same kind of naming scheme) to facilitate comparison between the two. I have also added comments about the ease with which the two handle different aspects of the vocabulary. I've tested it with Jing against various XML Schemas, and it seems to be working, though obviously if anyone spots any bugs please get in touch..." See: "RELAX NG." [cache 2001-06-20]
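To see the sort of attribute co-dependency Tennison refers to, consider that a local xs:element may carry a 'ref' attribute or a 'name' attribute, but not both; RELAX NG can state such constraints declaratively in the grammar itself. A small procedural sketch of the same single check (the function name is mine, and the real co-dependencies in the XML Schema vocabulary are far more numerous):

```python
# Minimal sketch of one co-occurrence constraint from the XML Schema vocabulary:
# an xs:element declaration may not have both 'ref' and 'name'. RELAX NG can
# express this declaratively; here it is checked procedurally for illustration.
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

def check_element_decls(schema_doc):
    """Return violation messages for the xs:element name/ref co-occurrence rule."""
    problems = []
    for el in ET.fromstring(schema_doc).iter("{%s}element" % XS):
        if "ref" in el.attrib and "name" in el.attrib:
            problems.append("xs:element may not have both 'ref' and 'name'")
    return problems

good = '<xs:schema xmlns:xs="%s"><xs:element name="a"/></xs:schema>' % XS
bad = '<xs:schema xmlns:xs="%s"><xs:element name="a" ref="b"/></xs:schema>' % XS
print(check_element_decls(good), check_element_decls(bad))
```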

  • [June 19, 2001] "XML Training Wheels. An XSLT and Java-based tool for producing tutorials -- custom-built for developerWorks but ready to adapt to your own use." By Doug Tidwell (Cyber evangelist, developerWorks). From IBM developerWorks. June 2001. ['See how developerWorks produced a custom XSLT application with Java-based open-source tools that automates the tedious work of producing the developerWorks HTML-based tutorials. Known as the Toot-O-Matic, the tool now is available for any developer either to inspect as an XSLT exemplar or to tailor to your own training needs. Doug Tidwell explains the design goals and the XML document design. He also describes how the 13 code samples demonstrate the techniques used in generating a truckload of HTML panels full of custom graphics, a ZIP file, and two PDF files from a single XML source document.'] "Here at developerWorks, we're pleased to release the source of the Toot-O-Matic, the XML-based tool we use to create our tutorials. In this article, we'll discuss the design decisions we made when we built the tool, talk about how you can use it to write your very own tutorials, and talk a little bit about how the source code is structured. We hope you find the tool useful, and that it will give you some ideas about how to use XML and XSLT style sheets to manipulate structured data in a variety of useful ways... In achieving the final goal of seeing how much we could do with XSLT, the Toot-O-Matic exercises all of the advanced capabilities of XSLT, including multiple input files, multiple output files, and extension functions. 
Through the style sheets, it converts a single XML document into: (1) A web of interlinked HTML documents; (2) A menu for the entire tutorial; (3) A table of contents for each section of the tutorial; (4) JPEG graphics containing the title text of all sections and the tutorial itself; (5) A letter-sized PDF file; (6) An A4-sized PDF file; (7) A ZIP file containing everything a user needs to run the tutorial on their machine... This discussion of the Toot-O-Matic tool illustrates the full range of outputs that you can generate from a single XML file. The structure of our original XML documents enables us to convert flat textual information into a number of different formats, all of which work together to deliver a single piece of content in a variety of interesting and useful ways. Using this tool, we have shortened and streamlined our development process, making it easier, faster, and cheaper to produce our tutorials. Best of all, everything we've described here is based on open standards and works on any Java-enabled platform. The Toot-O-Matic tool shows how a simple, inexpensive development project can deliver significant results." Also available in PDF format. For related resources, see "Extensible Stylesheet Language (XSL/XSLT)." [cache]
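The Toot-O-Matic does its one-source, many-outputs fan-out in XSLT; the same idea can be sketched in a few lines of stdlib Python: one tutorial document in, a set of per-panel HTML pages plus a generated menu out (the element names and file names below are invented for illustration, not the Toot-O-Matic's own):

```python
# Sketch of the one-source, many-outputs idea: a single XML tutorial document
# yields one HTML page per panel plus a navigation menu. Names are illustrative.
import xml.etree.ElementTree as ET

TUTORIAL = """<tutorial title="Intro to XML">
  <panel title="What is XML?"><p>Markup basics.</p></panel>
  <panel title="Elements"><p>Start and end tags.</p></panel>
</tutorial>"""

root = ET.fromstring(TUTORIAL)
# One HTML page per panel -- several outputs from one source document.
pages = {
    "panel%d.html" % i: "<html><h1>%s</h1>%s</html>"
        % (p.get("title"), "".join(ET.tostring(c, encoding="unicode") for c in p))
    for i, p in enumerate(root.iter("panel"), 1)
}
# Plus a generated menu linking the pages together.
menu = "".join('<a href="panel%d.html">%s</a>' % (i, p.get("title"))
               for i, p in enumerate(root.iter("panel"), 1))
print(sorted(pages), menu)
```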

  • [June 19, 2001] "Ferrets and Topic Maps: Knowledge Engineering for an Analytical Engine." By James David Mason Ph.D. Reference: Y/WPP-011. Paper presented at XML Europe 2001 (Paris). "The 'Ferret' analytical engine, developed originally by the Y-12 National Security Complex of the U.S. Department of Energy to seek classified data and associations in documents and present its findings in the light of formal rules, requires a structured information base that represents not just individual facts but a set of implications and a collection of rules. The fundamental knowledge base is evolving towards forms that enhance flexibility and portability. The developers early realized that the knowledge base can be captured in XML by a series of trees that represent taxonomies, analytical structures, and specific indicative facts, but over this a topic map is needed to express links across the trees. Above this, the classification rules could form another topic map that points into the lower layers. In its latest form, however, the knowledge base has come to be entirely represented in a topic map. The 'Ferret' engine combines sophisticated searching with rule-driven analysis and reporting. In its original application, the Ferret engine performs the equivalent of 5,000 simultaneous searches while reading documents at several thousand words per second. The analysis traces implications of concepts discovered in searching and applies the rules for interpreting implications and the actions to be taken when a significant piece of information is found. Because the topic maps that represent this knowledge can be switched easily, Ferret can be reprogrammed to many tasks, including selection and categorization, scanning of e-mail and newsfeeds, diagnostics, and query expansion, in addition to the original classification application..." [From the Conclusion:] "When we began work on the Ferret system, our goal was simply to construct a tool to help the ADCs review documents. . . 
The first knowledge base was actually based on one derived from the slow prototype we had studied. We realized that design was not maintainable and moved from it to our earliest XML representation. We eventually realized we needed to divorce the knowledge base from any connection to legacy technologies and to concern ourselves only with capturing the intellectual relationships among its components. By treating the Ferret engine as a black box and building the knowledge base using the XTM model, we have achieved a form in which the base will be both portable and maintainable, as well as potentially usable for more than simply controlling the Ferret engine. Even as the knowledge base has evolved, we have been rethinking the uses of the Ferret technology. Besides using it for its original purpose as an ADC's assistant, we have already used it for categorization projects and for scanning e-mail. We believe that with appropriate knowledge bases, Ferret could serve as a diagnostic tool or a mechanism for expanding queries. We are considering extending the reporting mechanism to write out new topic maps as the engine analyzes documents. The new topic maps might assist us in representing analytical results in processes like classification, or they could serve as indexes for searching the documents that have been analyzed. If we are able to merge generated topic maps with those already in a knowledge base, we believe that we will have created an engine that is self-training within certain domains. As the topic-map technology gains acceptance and support, topic-map tools from other sources may appear that we can integrate with the Ferret engine, creating even more interesting tools. Conversion of the knowledge base structure from its original form to topic maps is, I believe, the key to future growth of uses for our analytical engine..." 
Noted in JMason's trip report (Report of Official Foreign Travel to Germany 17 May-1 June 2001): "I presented a paper on the use of topic maps for building the knowledge base for the Ferret classification engine developed by Y-12. I had previously presented a preliminary approach to an XML knowledge base at an August 2000 GCA conference in Montréal. The current approach represents the entire knowledge base in the XTM application; the paper was well received." See: "(XML) Topic Maps." [cache]

  • [June 19, 2001] "Document Object Model (DOM) Level 3 XPath Specification Version 1.0." W3C Working Draft 18-June-2001. Edited by Ray Whitmer (Netscape/AOL). Latest version URL: "The W3C DOM Working Group has published a first public Working Draft of the Document Object Model (DOM) Level 3 XPath Specification. This is the result of discussions from the 'www-dom-xpath' mailing list, feedback from the 'xml-dev' mailing list, and work within the W3C DOM Working Group." The draft specification "defines the Document Object Model Level 3 XPath; it provides simple functionalities to access a DOM tree using XPath 1.0. This module builds on top of the Document Object Model Level 3 Core." Background: "XPath is becoming an important part of a variety of specifications, including XForms, XPointer, XSL, CSS, and so on. It is also a clear advantage for user applications which use DOM to be able to use XPath expressions to locate nodes automatically and declaratively. But liveness issues have plagued each attempt to get a list of DOM nodes matching specific criteria, as would be expected for an XPath API. There have also traditionally been object model mismatches between DOM and XPath. This proposal specifies new interfaces and approaches to resolving these issues..." Available as a single HTML file; also in PDF and Postscript formats. See: "W3C Document Object Model (DOM)." [cache]
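The draft's goal, locating tree nodes declaratively with XPath expressions instead of hand-written tree walks, has a rough stdlib analogue in Python, whose ElementTree API evaluates a limited XPath subset directly against a parsed document (the sample document here is invented):

```python
# Illustration of declarative node selection: instead of walking the tree by
# hand, an XPath-style expression picks out the matching nodes. Python's
# ElementTree supports a limited XPath subset; the sample data is invented.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<library><book year='1999'><title>XML Bible</title></book>"
    "<book year='2001'><title>XSLT</title></book></library>")

# Declarative: select the titles of books published in 2001.
titles = [t.text for t in doc.findall(".//book[@year='2001']/title")]
print(titles)
```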

  • [June 19, 2001] "Style sheets can write style sheets too. Making XSLT style sheets from XSLT components." By Alan Knox (Software Engineer, IBM, Hursley Park, Hampshire, England). From IBM developerWorks. June 2001. ['XSLT style sheets can be used to dynamically transform XML to complex presentation markup for browsers -- but if the presentation is complex, the style sheet will be too. What's needed is some tool that can build complex style sheets from simple components. Since XSLT is itself an XML, XSLT can be manipulated with XSLT; style sheets can write style sheets. This article shows how an XSLT style sheet that performs some particular runtime transformation can be built from XSLT components.'] "Another developerWorks article, 'Spinning your XML for screens of all sizes,' discusses problems with writing and managing style sheets that present the same XML basketball statistics on many display devices. The solution involved writing a parameterized style sheet that produces HTML with varying degrees of data content, and then transcoding the output from that style sheet for a specific device using WebSphere Transcoding Publisher. This is an effective and easy solution for many scenarios, but you lose some control over what appears on a user's screen. If you want: (1) Absolute control over what users see, (2) To tune presentation of your application to give the best possible experience on each device, (3) To make use of particular features of a device... Then you have to solve the problems of generating numerous, complex style sheets. This article demonstrates a no-compromise solution that uses the same basketball XML data... XSLT is a declarative language, where the templates that make up a style sheet are independent of each other. XSLT style sheets can be composed from other style sheets using the import and include mechanisms. 
With suitable care, you can independently develop a number of separate component style sheets that can be put together to make the presentation style sheet that will be applied to the XML data at runtime. These components will be broadly of three types, that deal with: Presenting dynamic XML data [the basketball data in my example]; Reusable bits of presentation mark-up, such as button bars; The residue of the page..." Article also available in PDF format. For related resources, see "Extensible Stylesheet Language (XSL/XSLT)."
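Since XSLT style sheets are themselves XML, any XML tool can assemble one from parts. The article does this composition in XSLT itself; as a minimal sketch of the underlying idea, here is stdlib Python gluing two hypothetical template components into one runtime style sheet (the match patterns and bodies are invented):

```python
# Sketch: because XSLT is XML, a style sheet can be assembled programmatically
# from reusable template components. Template names and bodies are invented.
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL)

def template(match, body_html):
    """Build one xsl:template component with the given match pattern and body."""
    t = ET.Element("{%s}template" % XSL, match=match)
    t.append(ET.fromstring(body_html))
    return t

# Compose the final presentation style sheet from independent components.
sheet = ET.Element("{%s}stylesheet" % XSL, version="1.0")
sheet.append(template("score", "<b/>"))
sheet.append(template("player", "<i/>"))
print(ET.tostring(sheet, encoding="unicode"))
```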

  • [June 19, 2001] "Freddie Mac, Fannie Mae Agree To XML Standard." By Robert Bryce. In Interactive Week (June 18, 2001). "Freddie Mac and its larger cousin, Fannie Mae, are fierce competitors in the secondary mortgage market. But the two companies could make paperless real estate purchases a reality, thanks to their support for Web-based mortgage transactions. Both organizations have agreed to support eXtensible Markup Language (XML) standards established by the Mortgage Industry Standards Maintenance Organization. That support will likely mean dramatic changes for hundreds of companies, from lenders to vendors, that participate in the domestic housing market. That market is expected to generate roughly $1.52 trillion worth of new mortgages this year. But the mortgage process is slowed and made more expensive by the reams of paper needed for loan applications, credit reports, surveys and other documents... Freddie Mac estimates that moving to the MISMO standard could reduce origination costs for each loan by up to $700 - savings that would be passed on to consumers. With 2000 revenue of $30 billion and $44 billion, respectively, Freddie Mac and Fannie Mae are the monsters of the mortgage business. Together, they buy about two-thirds of all conventional single-family home mortgages. The two companies are government-sponsored enterprises created by Congress - and given access to federally backed lines of credit - to help promote home ownership. They buy mortgages from lenders, package them as securities and resell them on the open market to investors... Like many other companies, Freddie Mac and Fannie Mae want to minimize the amount of paper they handle. The MISMO was created in January 2000 to create an XML format that could be used from the initial loan application through the 'securitization' of that loan on Wall Street and the servicing of the loan by vendors. 
'The obstacle is getting consensus across the industry for what we call "data",' said Gabe Minton, senior director, industry technology, at the Mortgage Bankers Association of America, which oversees the MISMO. The current version of the MISMO standard enables the translation of 2,000 terms, including the borrower's name and address. But as the standard evolves and more paper-based elements of the mortgage process are converted into XML, the number of terms that the standard will define will likely exceed 4,000, Minton said..." See "Mortgage Bankers Association of America MISMO Standard."

  • [June 19, 2001] "The Microsoft Shared Development Process." Microsoft white paper. June, 2001. "The Microsoft Shared Development Process (SDP) provides a mechanism for fast, focused and profitable collaboration on key technology initiatives between Microsoft and industry partners... As we enter a new paradigm in computing, we find an increased call for integration and interoperability, which requires an even closer working relationship across companies and across industries. The emergence of XML-based web services as the new computing model signals a shift away from standalone applications and networks - disconnected islands of information - to one where constellations of applications, devices and services work together. This shift in the computing model requires a change in the way we design and build technology. It's no longer enough to build standalone functionality; we also have to focus on how a particular technology works with others. While XML Web services provides the new integration methodology, we also need new ways for the industry to come together to tackle new challenges. The SDP is designed to provide an easy, flexible and reusable process for Microsoft and industry partners to collaborate. The SDP is structured on the assumption participants are motivated by business success and any cooperation has the objective of growing the industry and expanding profitable opportunities. Unlike projects developed under some open source licenses, the SDP is respectful of intellectual property and will balance goals of protecting intellectual property rights with other goals encouraging widespread adoption of the new technology developed under the SDP... The SDP is designed to let Microsoft and third parties determine how best to address a common computing problem or challenge. Broadly speaking, the SDP scope has three categories which address different models of cooperation dynamics and intellectual property models... 
Type 3 projects involve cooperation across the industry to enable a better technology solution for many companies and their customers. Examples of this project type might include the development of industry-wide XML schema that describe a common set of data to be shared across applications in a given industry segment. In these cases, the resulting intellectual property will be licensed broadly to the industry, and in some cases may end up getting turned over to existing standards bodies and other cross-industry organizations. Many Type 3 projects will not involve Microsoft directly, but rather a set of interested companies who will take advantage of the process tools and collaboration resources that the SDP will offer to drive a broad industry solution to a particular issue..." See also (1) the announcement: "Microsoft Announces Shared Development Process for Cooperation On Key Technology Initiatives. Company Issues Call to Industry to Join in Definition and Development of 'HailStorm' Services."; (2) "Microsoft Hailstorm."

  • [June 19, 2001] "Microsoft makes a push for .Net gains." By Joe Wilcox. In Yahoo News [CNET] (June 19, 2001). "Microsoft on Tuesday solidified its software-as-a-service strategy, officially naming a forthcoming high-end version of Windows and releasing new tools for software developers. During a keynote speech at the TechEd 2001 conference in Atlanta, Microsoft Chairman Bill Gates also introduced the new Shared Development Process (SDP) program, supporting the company's .Net software services strategy. If successful, the program's working groups and other features could help Microsoft establish HailStorm, the first .Net offering, as the standard for delivering services over the Internet. . . In addition, Microsoft released the final beta, or test, version of software used by developers: Visual Studio.Net, which includes Visual Basic.Net and the .Net Framework. Visual Studio.Net is expected to provide important tools to support Microsoft's drive to move its Windows operating system and software to the Web. The development package includes updated versions of Visual Basic and C++ and adds the first version of C#, a software programming language designed to facilitate the building of Web-based software... In the longer term, SDP may be the more important announcement made by Microsoft on Tuesday, according to analysts. Through the program, Microsoft plans to establish working groups and industry dialog focused on .Net services, starting with HailStorm. Through HailStorm, which uses Microsoft's Passport authentication service, the company plans to provide secure, for-fee Internet services, such as e-mail, address lists and other personal data, to virtually any type of device. The technology uses XML (Extensible Markup Language), an HTML-like programming language for creating complex data delivered over the Web. But Microsoft has been advocating proprietary schemas, or XML vocabularies, that work better with its products. 
Microsoft's XML dialect would favor Windows and Office -- two products, according to Dataquest, that have a market share of better than 90 percent. Microsoft could use its dominance in the mature markets as a lever for entering other, emerging markets, Sutherland said." See: (1) "The Microsoft Shared Development Process.", and (2) the announcement: "Microsoft Announces Shared Development Process for Cooperation On Key Technology Initiatives. Company Issues Call to Industry to Join in Definition and Development of 'HailStorm' Services."

  • [June 19, 2001] "BizTalk Automates B-to-B. [Review.]" By P.J. Connolly. In IT World (June 18, 2001). "Today's conventional wisdom holds that XML is the key to helping businesses work together, at least from the standpoint of merging information from disparate systems. But by itself, XML can't do anything to help. Someone has to define the extensions to the XML schema, the structure that the two partners are going to use when exchanging data. . . BizTalk Server 2000 is in some ways Microsoft's most ambitious product yet in terms of its effect on back-end operations. Most businesses that have streamlined their processes over time have done so internally with great success, but things often break down at the front door. Even the most successful EAI (enterprise application integration) or EDI (electronic data interchange) projects will have some sort of disconnect. BizTalk Server 2000 is constructed to remedy that situation by using SOAP (Simple Object Access Protocol) and XML to glue systems together electronically. It is a unique product that any business using EAI/EDI should consider. BizTalk Server is aimed at processing business documents, such as bills of lading, invoices, and purchase orders, as secured e-mail-like messages. These functions require sophisticated features such as document tracking and once-only delivery to provide the reliability needed for business-to-business transactions... The BizTalk Framework, although agnostic regarding message transport protocols, allows BizTags to carry transport-specific information. After it receives application-generated business documents, the BizTalk Server creates BizTalk messages that contain one or more BizTalk documents, which are generated either by the BizTalk Server or by the application used to create the original business document. The BizTalk Message is then sent to the partner's BizTalk Framework-compliant server which unwraps the message and passes it on to the partner's application... 
Three client-side tools that analysts and developers use to configure the data flow are included: BizTalk Editor, BizTalk Mapper, and BizTalk Orchestration Manager. Editor is used to create and edit XML schemas, whereas Mapper handles the XSLT (Extensible Stylesheet Language Transformations) style sheets that convert data between XML schemas. Orchestration Manager, which uses Visio 2000, allows analysts to design a data flow and developers to translate that design into action. We found the BizTalk components easy to set up and use, and we were particularly impressed with BizTalk Orchestration Manager. We've used Visio before and have found it a great design tool, so we had little difficulty using it as the front end for Orchestration Manager. The GUI uses a Visio diagram split down the middle: Analysts create flowcharts on the left side, and developers, working on the right side, link the various functions from the flowchart to COM (Component Object Model) objects and message queues, also using the modified XML schemas as needed. BizTalk, with a little help from Visio's Visual Basic for Applications component, automatically applies the changes..." See "BizTalk Framework."
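A BizTalk map is, underneath, an XSLT style sheet that copies fields from one partner's vocabulary into another's. A toy sketch of such a field-to-field mapping in plain Python (all element names invented for illustration):

```python
# Toy sketch of the field-to-field document mapping a tool like BizTalk Mapper
# generates as XSLT: values move from one partner's purchase-order vocabulary
# into another's invoice vocabulary. All element names are invented.
import xml.etree.ElementTree as ET

INBOUND = "<PO><BuyerName>Acme</BuyerName><Total>42.00</Total></PO>"
FIELD_MAP = {"BuyerName": "CustomerName", "Total": "InvoiceAmount"}

src = ET.fromstring(INBOUND)
out = ET.Element("Invoice")
for src_name, dst_name in FIELD_MAP.items():
    # Copy each source field's text into the correspondingly named target field.
    ET.SubElement(out, dst_name).text = src.findtext(src_name)
print(ET.tostring(out, encoding="unicode"))
```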

  • [June 18, 2001] "TechEd: Microsoft touts Web services support in .NET." By Tom Sullivan. In InfoWorld (June 18, 2001). "Microsoft demonstrated Web services support across several of its enterprise servers in the opening keynote of the TechEd developer's conference here Monday. Paul Flessner, vice president of .NET servers, tried to prove that Microsoft's server software products can compete against the traditional players in the market. Dan Kusnetsky, an analyst with Framingham, Mass.-based IDC, said Microsoft's capacity to compete in the enterprise has increased, particularly where enterprises prefer to string smaller servers together than to use a single machine. . . Also in the keynote, Flessner announced the availability of Mobile Information Server 2001 and showed how it can be used to deliver Exchange information to wireless handsets. Flessner brought product managers Don Kagan and Chris Ramsey on stage to demonstrate Content Management Server, which Microsoft acquired from NCompass Labs. Flessner also showed off the next generation of SQL Server, code-named Yukon, and explained its support for XML and the Common Language Runtime..." See the announcement: "Microsoft Drives XML Web Services Integration Through .NET Enterprise Servers. Company Announces Availability of Mobile Information Server, Demonstrates Content Management Server and Takes SQL Server Past 1 Billion Dollars."

  • [June 18, 2001] "Tech Giants Update E-Commerce Standard." By Stephen Shankland. In CNET (June 18, 2001). "A gaggle of computing giants will release Monday a new version of a key Web standard that provides some common ground on how competitors such as Microsoft, IBM and Sun Microsystems view the future of the Internet. In September, Microsoft, IBM and Ariba proposed a standard called Universal Description, Discovery, and Integration (UDDI). The standard allows businesses to register with an Internet directory that will help companies advertise their services, so they can find one another and conduct transactions over the Web. The online yellow pages directory that UDDI provides is a key part of how 'Web services' plans such as Microsoft .Net and Sun One will work together despite corporate differences. Since last year, Sun, Hewlett-Packard, Oracle and others have joined the UDDI initiative, and the first working version of the UDDI directory was launched in May. But on Monday, the companies plan to announce the second version of the standard. The new version comes with several improvements. Among them is better support for different languages; more sophisticated searching features; the ability to describe company organizational structures such as divisions, groups and subsidiaries; and more specific business categories that companies can use to describe themselves... Registry services on the Internet are essential for Web services to succeed, and so far UDDI looks like the only option, said Gartner Group analyst Daryl Plummer. Plummer believes UDDI initially will be used in private arrangements among business partners -- for example, Home Depot could use a UDDI-based service that finds light-switch suppliers and ranks them according to pricing and availability of light switches. But UDDI faces a thorny issue: whether it will become an industry standard. 
Such a move would reduce the control the founding members have but could make UDDI more palatable to others by making it more neutral. UDDI organizers have said they plan to turn it over to a standards body, but that likely won't happen in the immediate future, Plummer said." See the 2001-06-18 announcement and references in "Universal Description, Discovery, and Integration (UDDI)"

  • [June 18, 2001] "A Topic Map Data Model. An infoset-based proposal." By Lars Marius Garshol (Ontopia A/S) and Hans Holger Rath. TMQL [Topic Maps Query Language] Project. Reference: ISO/IEC JTC 1/SC34 N0229. June 18, 2001. "This document defines an abstract model for topic maps which makes explicit the implicit data models of ISO 13250 and XTM 1.0. It also defines a processing model for XTM 1.0 based on the data model. The model is intended to present one possible approach to specifying a data and processing model for topic maps, believed by the author to be preferable to other proposed approaches. It is hoped that this model may represent a first step on the way to a complete model for topic maps. Such a model would serve many purposes: (1) Enable interoperability between topic map processors by defining precisely what topic map processors are required to do. (2) Enable ancillary standards to be built on the topic map standard in a precise and controlled manner. (3) Make it easier for newcomers to topic maps to understand what their abstract structure is and how they work... This document is not complete; it is an early draft intended to show a possible approach to defining the topic map model. In particular, this document has no official standing whatsoever. It is, as stated above, just a draft proposal... The abstract model for topic maps here presented is inspired by the XML Infoset, and uses a similar system of information items with named and typed properties..." See: "(XML) Topic Maps." [cache, and alternate source, from Ontopia]
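The infoset-inspired approach the proposal describes, information items carrying named and typed properties, can be pictured with a toy data structure. The property names below (base names, occurrences, types) loosely follow topic map vocabulary; this sketch is an illustration only, not the model actually defined in N0229.

```python
# Toy rendering of "information items with named and typed properties"
# in the style of the XML Infoset. The property names loosely follow
# topic map terminology; they are illustrative, not the N0229 model.
from dataclasses import dataclass, field

@dataclass
class TopicItem:
    base_names: set = field(default_factory=set)     # names for the topic
    occurrences: list = field(default_factory=list)  # resources relevant to it
    types: list = field(default_factory=list)        # topics it is an instance of

topic = TopicItem(base_names={"Python"},
                  occurrences=["https://www.python.org/"])
print(topic.base_names)
```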

  • [June 18, 2001] "The Agricultural Ontology Server: A Tool for Knowledge Organisation and Integration." Food and Agriculture Organization of the United Nations (GILW), Rome. June 2001. "At FAO, we are committed to helping combat and eradicate world hunger. Information dissemination is an important and necessary tool in furthering this cause -- we need to provide consistent, usable access to information for users in places doing this very work. And, the wide recognition of FAO as a neutral international centre of excellence for agriculture positions it perfectly to lead in the development of system specific agricultural ontologies. The Agricultural Ontology Server (AOS) will be instrumental in this effort by structuring agricultural terminology, thus making describing, defining and relating this information manageable for distributed facilities, and by standardising agricultural terminology, thus making resource access and discovery more efficient. The AOS will function as a central common reference tool for serving ontologies. Itself an ontology using the AGROVOC thesaurus as its core, it will contain and serve terms, definitions of those terms and the relationships among those terms. It is designed to serve as a focal point for the vocabulary of the agricultural domain, and to codify and standardise the knowledge within this domain. It will serve common core terms and relationships, as well as richer relationships that designate it as an ontology... The elements of the AOS will need to be encoded within the RDF framework. Common terms and definitions and their associated relationships from the core of the AOS will be identified by Universal Resource Identifiers (URIs) and stored in this common framework. (XTM is a parallel standard in development that may provide richer associations for better encoding.) To enable the second task, the AOS will use XML language to communicate among systems for the exchange of the URIs to build ontologies. 
The systems interested in utilising the AOS will need to use this language to be capable of interoperability. The conjunction of these standards will enable the communication of machine-readable commonly used URIs among a variety of different tools. In the case of the AOS, this type of communication will allow ontologies created by multiple tools -- their terms, definitions and relationships -- to be shared, evaluated and maintained using the central AOS.... The advent of XML (eXtensible Markup Language) provides the ability to share knowledge across different tools, using a standard schema. The RDF (Resource Description Framework) standard allows storage and sharing of metadata (data about resources) across systems. The topic mapping language, XTM (XML Topic Maps), currently in development, may provide even stronger functionality for the use of metadata. These new standards allow us to leverage controlled vocabularies in the development of common methods for describing, defining and relating resources. Briefly defined, the Agricultural Ontology Server (AOS) will function as a central common reference tool for serving ontologies. An ontology is a system that contains terms, the definitions of those terms, and the specification of relationships among those terms. It can be thought of as an enhanced thesaurus -- it provides all the basic relationships inherent in a thesaurus, plus it defines and enables the creation of more formal and more specific relationships. The AOS, using the AGROVOC thesaurus as its core, is designed to serve as a central focal point for the vocabulary of the agricultural domain, and to codify and standardise the knowledge within this domain. It enables better communication within and across systems, and structures the meaning contained within systems..." See also "Draft Specification for DC-based Application Profile for Agricultural Information."

  • [June 15, 2001] "Sun Fortifies Java Development. Forte for Java 3.0 lets developers create, publish, and subscribe to XML-based Web services." By Ron Copeland. In InformationWeek Issue 841 (June 11, 2001), page 85. "If your company is a Java shop looking for tools to more easily build, assemble, and deploy enterprise applications as Web services, the latest version of Sun Microsystems' Forte for Java could help. It's one of the first development environments to offer such capabilities. Forte for Java 3.0 lets developers use Enterprise JavaBeans components not only to build enterprise applications, but also to create, publish, and subscribe to XML-based Web services. The development toolkit is available on the Web as part of the Forte for Java Early Access Program. Forte for Java is a cross-platform integrated development environment for Linux, Solaris, and Windows platforms, and it's based on the NetBeans open-source development environment... One way in which the new release differs from Forte for Java 2.0 is that it's based on the latest version of the NetBeans open-source project, which added a dozen or so modules to its code base. These modules simplify Java development and address a broad range of issues, including integration with Apache's Ant XML script tool, improved application-server support, and, perhaps most significantly, Simple Object Access Protocol-based Web-services generation and deployment. Highlights of Forte for Java 3.0 include wizards and templates for creating and packaging Enterprise JavaBeans and associated Web components. Java developers will be able to build sophisticated Web services applications rapidly, without the need for significant coding. Using an XML services registry, Java components are packaged as Web services for run-time access and execution..." See also the Sun feature article: "Forte ESP Toolkit Integrates Web Design Tools and XML Technologies."

  • [June 15, 2001] "Topic Maps, NewsML and XML -- Possible Integration and Implementations." By Soelwin Oo (Software Developer, Research and Development, empolis UK). 2001. See the larger collection of technical papers. "This paper will discuss how the integration of different Topic Map based technologies can lead to the development of powerful knowledge based resource retrieval systems. It will discuss in detail the possible implementation for integrating a data resource that supports Topic structures with the knowledge embodied within a Topic Map. It will discuss this using examples of technology currently being developed by empolis illustrating the possible architecture of such a system and its potential real world use. Finally, the paper will investigate the potential for further integration and scalability of the system with other Topic Map resources. More specifically, it will elaborate on the possible hurdles and pitfalls that may arise from the integration of data from multiple resources and the possible need for managing ontologies originating from different sources... NewsML is a structured flexible framework based on XML developed by the IPTC (International Press Telecommunications Council) for electronic news-based publication. It supports the representation of news items and the relationships between these news items in an XML-based structure. Because NewsML possesses associated metadata concerning its news content, it provides the ability for having multiple representations of the same information along with provision for handling arbitrary mixtures of media types, languages and formats. The prime interest towards NewsML within the scope of Topic Maps is that NewsML possesses metadata concerning Topics that provide the ontology of its news content. This 'news item ontology' puts forward an appropriate example of an opportunity to 'capture' concepts presented by an XML-based format that supports Topic structures. 
Once the base ontologies used within NewsML are present within a Topic Map, an application can process NewsML documents and present to the user the instances of the base ontologies that are associated with a NewsML document. This will then present a content driven approach for navigation of a Topic Map because the user's starting point will be the base ontologies instantiated by the NewsML document..." See "NewsML and IPTC2000" and "(XML) Topic Maps."

  • [June 15, 2001] "XAS: A System for Accessing Componentized, Virtual XML Documents." By Ming-Ling Lo, Shyh-Kwei Chen, Sriram Padmanabhan, and Jen-Yao Chung (IBM T. J. Watson Research Center). Paper presented at the Twenty Third International Conference on Software Engineering (ICSE 2001). May 12-19, 2001. Published in the conference proceedings, pages 493-502 (with 26 references); available from the IEEE Computer Society. "XML is emerging as an important format for describing the schema of documents and data to facilitate integration of applications in a variety of industry domains. An important issue that naturally arises is the requirement to generate, store and access XML documents. It is important to reuse existing data management systems and repositories for this purpose. We describe the XML Access Server (XAS), a general purpose XML based storage and retrieval system which provides the appearance of a large set of XML documents while retaining the data in underlying federated data sources that could be relational, object-oriented, or semi-structured. XAS automatically maps the underlying data into virtual XML components when mappings between DTDs and underlying schemas are established. The components can be presented as XML documents or assembled into larger components. XAS manages the relationship between XML components and the mapping in the form of document composition logic. The versatility in its ways to generate XML documents enables XAS to serve a large number of XML components and documents efficiently and expediently."

  • [June 15, 2001] "Draft requirements, examples, and a 'low bar' proposal for Topic Map Constraint Language." By Steve Pepper (Project Editor). ISO/IEC JTC 1/SC34 N226. The User Requirements include: (1) TMCL shall permit the definition of classes of topic maps in order to: [a] enable the documentation of the structure and semantics of a class of topic maps; [b] provide a foundation for defining vertical or domain specific applications of topic maps; [c] provide means of validation to ensure consistency within a topic map or across a class of topic maps; [d] enable applications to provide easier and more intuitive user interfaces for creating and maintaining topic maps; [e] enable the separation of the tasks of modeling and populating topic maps. (2) TMCL shall be based on the Topic Map Data Model (and therefore support both XTM and ISO 13250 Topic Maps). (3) TMCL shall not attempt to cover every possible constraint. Instead it should provide a solution for the most commonly required kinds of constraints and, at the same time, an extension mechanism to allow the expression of less common constraints by other means. (4) TMCL shall provide for modularization, and the ability to extend individual sets of constraints through reference to others. (5) TMCL shall be expressible as XML, using the topic map interchange syntax where applicable. (6) TMCL shall build on pre-existing specifications and established best practice for knowledge representation and data modeling where possible. (Candidates for consideration include DAML/OIL, KIF, OKBC, OCL, PAL (Protégé Axiom Language), and XML Schema.) (7) TMCL shall be as concise and human-readable as possible within the terms of the preceding requirements." Cf. the NWI proposal cited below. From the Recommendations of May 2001 Meeting of ISO/IEC JTC1/SC34/WG3 in Berlin: "WG3 submits N221 as a New Project Proposal for a Topic Map Constraint Language to support ISO/IEC 13250. 
SC34 requests its secretariat to forward this document to JTC1 for ballot. WG3 submits N226 as draft requirements, proposes Steve Pepper (Norway) as acting editor and instructs the acting editor to prepare a final requirements document." See: "(XML) Topic Maps."

  • [June 15, 2001] Topic Map Constraint Language [TMCL]. Proposal For a New Work Item. ISO/IEC JTC 1/SC34 N221. 23 May 2001. Motivated because "a constraint language is needed to build templates for topic maps conforming to ISO/IEC 13250." The new work would address "mechanisms for expressing constraints on classes of topic maps conforming to ISO/IEC 13250:2000." Purpose and justification (1) To enable the documentation of the structure and semantics of a class of topic maps. (2) To provide a foundation for defining vertical or domain specific applications of topic maps. (3) To provide means of validation to ensure consistency within a topic map or across a class of topic maps. (4) To enable applications to provide easier and more intuitive user interfaces for creating and maintaining topic maps. (5) To enable the separation of the tasks of modeling and populating topic maps... This project will be part of a series of Standards and Technical Reports that contribute to the implementation and understanding of ISO/IEC 13250, Topic Maps." Compare the SC34 N226 Draft Requirements for TMCL, cited above. See: "(XML) Topic Maps."

  • [June 15, 2001] "XML-Lit." A communique from Rafael R. Sevilla: "I've started a new XML literate programming project I call XML-Lit.... 'XML-Lit: A simple XML-based literate programming system' This project is somewhat inspired by a very simple program by Jonathan Bartlett called xmltangle. XML-Lit is a simple literate programming system that you can use with any XML-based markup language to make your literate program..." From the introduction to the documentation: "I recently found a simple program called xmltangle by Jonathan Bartlett that provides a simple literate programming system based on DocBook. I have been somewhat frustrated by that program though; for one thing, it did not allow program code snippets to be enclosed within CDATA sections, which would make including a program inline a lot easier to do, and easier to read on screen while you're editing it, especially with programming languages that are chock full of <'s such as the typical C program, or worse yet, an XSL stylesheet, which I planned to use Jonathan's program for. So I set off to create a complete rewrite of the program, which uses James Clark's expat XML parser. So now, I have come up with my own simple literate programming system, xml-lit, which takes a similar approach, but instead of enclosing code snippets within DocBook <programlisting/> tags, I define a new namespace xml-lit which (for now) contains a single tag <xml-lit:code> which has a single attribute named file which gives the name of the file to which the code it encloses should be output. This eliminates the program's dependency on DocBook, so it can be used with any XML-based document markup language (such as XHTML). It's a very simplistic system, but it's able to do the task for which it was designed. The program is also backward-compatible with Jonathan's work given a command line switch..." See the source code download and online documentation. See "SGML/XML and Literate Programming."
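The tangle step described above, pulling the code out of <xml-lit:code file="..."> elements and writing it to the named files, can be sketched in a few lines. This is a Python approximation of the idea, not the actual tool (which is a C rewrite built on expat); the namespace URI below is a placeholder assumption.

```python
# Minimal sketch of the "tangle" step XML-Lit describes: collect the text
# of every <xml-lit:code file="..."> element, keyed by its output file.
# The namespace URI is hypothetical; the real tool defines its own.
import xml.etree.ElementTree as ET

XML_LIT_NS = "http://example.org/xml-lit"  # placeholder namespace URI

def tangle(document: str) -> dict:
    """Return a mapping of output filename -> accumulated code text."""
    root = ET.fromstring(document)
    outputs = {}
    for elem in root.iter(f"{{{XML_LIT_NS}}}code"):
        fname = elem.get("file")
        outputs[fname] = outputs.get(fname, "") + (elem.text or "")
    return outputs

doc = """<article xmlns:xml-lit="http://example.org/xml-lit">
  <p>A literate program:</p>
  <xml-lit:code file="hello.py">print("hello")
</xml-lit:code>
</article>"""

print(tangle(doc))
```

A real tangle pass would then write each accumulated string to its file; the dictionary keeps the sketch side-effect free.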

  • [June 15, 2001] "Three Myths of XML." By Kendall Grant Clark. From June 13, 2001. ['XML has it all, not only an interoperable syntax but a solution to bring world peace, end poverty and deter evil dictators. Kendall Clark debunks these and other popular myths of XML.'] "... The possibilities of social change brought about by technology are limited as much by the social and historical contexts within which technology comes into existence as they are by intrinsic features of the technology itself. This general point is perhaps never so true as when it's applied to two specific areas of computer technology, both of which concern readers directly: the Semantic Web and, of course, XML. In what follows I debunk three myths of XML, each of which in some way bears on the question of the role of technology in social change: (1) The first myth rests on a confusion about the meanings of words like 'free' and 'open' when they are applied to XML-encoded information. (2) The second myth is that XML is magical, that it has some unique properties that make impossible things possible. (3) The third is that technology, including XML, is more determinative of social relations and institutions than they are of it..."

  • [June 15, 2001] "Perl and XML: Perl XML Quickstart: Convenience Modules." By Kip Hampton. From June 13, 2001. ['The third and final part of our guide to Perl XML modules covers some handy modules geared to specific tasks.'] "This is the third and final part of a series of articles meant to give quick introductions to some of the more popular Perl XML modules. In the last two months we have looked at the modules that implement the standard XML APIs and those that provide more Perlish XML interfaces. This month we will be looking at some of the modules that seek to simplify a specific XML-related task. Unless XML is a significant part of your daily life, chances are good that the more generic XML API modules will seem like overkill. Perhaps they are. If your needs are modest, a module probably exists that will reduce your task to a few method calls. These single purpose, convenience modules are a key entry point to the Perl/XML world, and I have chosen a few of the more popular ones for this month's code samples. In the interest of clarity, we will limit the scope of the examples to the common tasks of creating XML documents from other data sources, converting HTML to XHTML, and comparing the contents of two XML documents. While many of the XML API modules provide a way to create XML documents programmatically based on data from any source, several modules exist that simplify the task of creating XML documents from data stored in other common formats. We'll illustrate how to create XML documents based on data extracted from CSV (Comma Separated Value) files, Excel spreadsheets, and relational databases..." See: "XML and Perl."
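As a language-neutral illustration of the first task the column covers, generating XML from CSV data, here is a minimal sketch in Python using only the standard library; the element names are invented for the example and do not mirror any particular Perl module's output format.

```python
# Sketch of the CSV-to-XML task the column describes, in Python rather
# than Perl, using only the standard library. The element and tag names
# here are illustrative assumptions, not any module's actual format.
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(csv_text: str, root_tag: str = "records", row_tag: str = "record") -> str:
    """Turn CSV with a header row into a simple element-per-field XML document."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        rec = ET.SubElement(root, row_tag)
        for field, value in row.items():
            ET.SubElement(rec, field).text = value
    return ET.tostring(root, encoding="unicode")

print(csv_to_xml("name,price\nwidget,1.50\ngadget,2.25\n"))
```

The same mapping (one element per row, one child per column) carries over directly to spreadsheet rows or database result sets.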

  • [June 15, 2001] "X Marks (up) the Language." By Eric Bohlman. From XMLPerl.com, June 2001. ['Second in a series of articles written by Eric Bohlman. This article gives a good overview of how to parse XML with Perl, and almost as important, how NOT to parse XML.'] "When we talk about parsing a language, we mean the process of taking a piece of code or data written in that language and breaking it up into its constituent parts as defined by the rules of that language. Parsing is an essential task for any program that wants to use language-based data or code as input... There are basically two ways a parser can make the components of an XML document known to an application: it can read through the document and signal the application every time a new component appears, or it can read the entire document and then present the application with a tree structure corresponding to the element structure of the document. A parser that works the first way is called a stream-based or event-driven parser; one that works the second way is called a tree-based parser. Two common terms that you'll hear are SAX and DOM; SAX (Simple API for XML) is a specification (developed informally by members of the xml-dev mailing list) for how a stream-based parser should "talk" to an application; DOM (Document Object Model) is a specification (a formal Recommendation of the W3C) for how an application can access and manipulate the tree structure of a document. Whether to use a stream-based or tree-based parser depends on the nature of the processing being done to the XML documents and the size of the documents. A tree-based parser usually has to load the entire document into memory, which may be impractical when processing documents like dictionaries or large database dumps. With a stream-based processor, you can skip over elements that you aren't interested in (for example, when looking up a particular word in a dictionary). 
But if your application needs to process certain elements in relation to other elements (for example, reading a bibliography and extracting a list of all authors who have published at least three articles on the same topic), a tree-based parser is much easier to work with. It's worth noting that a tree-based parser can be built on top of a stream-based parser, and that the output of a tree-based parser can be "walked" to provide a stream-based interface to an application. As of this writing, all the Perl tree-based parsers are of the former type. Perl Modules for Parsing XML: As of this writing, there are four "mainstream families" of XML parsers available as Perl modules, all of which are available from CPAN: [XML::Parser, XML::DOM, XML::Parser::PerlSAX, XML::Grove]. All of these modules provide object-oriented interfaces; if you're not comfortable with object-oriented programming in Perl, now is the time to review the perltoot and perltootc manpages that come with Perl. If you have ActiveState's ActivePerl for Win32, you already have XML::Parser installed, since ActivePerl's PPM utility uses XML documents to store the installation requirements for modules... Next month we'll continue talking about XML parsing and we'll look at tree-based parsing." See: "XML and Perl."
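The stream-versus-tree contrast the article draws is not Perl-specific; the same two interfaces exist in Python's standard library, which makes for a compact side-by-side sketch of the distinction.

```python
# The stream-vs-tree contrast the article draws, sketched with Python's
# standard library instead of the Perl modules it surveys.
import xml.sax
import xml.dom.minidom

DOC = "<bib><author>Smith</author><author>Jones</author></bib>"

# Stream-based (SAX): the parser calls us back per event; we keep only
# what we care about and never hold the whole tree in memory.
class AuthorHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_author = False
        self.authors = []
    def startElement(self, name, attrs):
        self.in_author = (name == "author")
        if self.in_author:
            self.authors.append("")
    def characters(self, content):
        if self.in_author:
            self.authors[-1] += content
    def endElement(self, name):
        self.in_author = False

handler = AuthorHandler()
xml.sax.parseString(DOC.encode(), handler)
print(handler.authors)

# Tree-based (DOM): the whole document is parsed into a tree first,
# then walked; easier when elements must be related to one another.
dom = xml.dom.minidom.parseString(DOC)
dom_authors = [n.firstChild.data for n in dom.getElementsByTagName("author")]
print(dom_authors)
```

Both paths extract the same author list; the difference is whether the document lives in memory as a tree or streams past as events.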

  • [June 15, 2001] "XML-Deviant: What You See Isn't What We Want." By Leigh Dodds. From June 13, 2001. ['Getting back to basics, we take a look at the best way of getting your documents marked up in XML.'] "'How do I convert my Word documents to XML?' This one has cropped up in several forums and appears regularly on XML-DEV. And, like many seemingly simple questions, there are a variety of answers. If the intention is to simply convert a small number of documents to XML or HTML suitable for publishing on the Web, then using the built-in Save As XML/HTML option is a good starting point. But the results of this are messy to say the least. A great deal of Word-specific cruft is left in the resulting document. This has led to the production of numerous tools capable of cleaning up the mess, as well as others, like Omnimark, that provide an alternative conversion facility. In some cases what is being asked is much more ambitious. Users may have a large number of documents that must be converted, and they may want to continue to use Word as an authoring tool for the generation of structured XML documents conforming to a particular schema, for use in a publishing system, or document repository. These are the users who are plainly keen to gain some of the widely advertised advantages of XML by moving their documentation out of a proprietary format. In these circumstances it seems that the received wisdom is to roll up your sleeves and begin coding. The key technique is to use Word Styles (user-defined formatting properties) as markers for particular document structures (paragraphs, lists, headings, etc.) and then use scripts or macros to generate markup based on this styling information. Further manipulation with XSLT, for example, can further refine the results to yield the desired format. Rather surprising for users who may be seeking an off-the-shelf solution..."
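The style-mapping technique described here reduces to a small transformation: treat each paragraph's Word style name as a marker and emit a corresponding semantic element. A minimal Python sketch, in which the style names and the mapping table are illustrative assumptions rather than any real converter's configuration:

```python
# The style-to-structure technique the article describes, reduced to a
# toy: paragraphs labelled with Word style names are mapped to semantic
# XML elements. The style names and mapping are illustrative assumptions.
import xml.etree.ElementTree as ET

STYLE_MAP = {"Heading 1": "title", "Body Text": "para", "List Bullet": "item"}

def styled_paragraphs_to_xml(paragraphs):
    """paragraphs: iterable of (style_name, text) pairs -> XML string."""
    root = ET.Element("doc")
    for style, text in paragraphs:
        tag = STYLE_MAP.get(style, "para")  # unknown styles fall back to <para>
        ET.SubElement(root, tag).text = text
    return ET.tostring(root, encoding="unicode")

print(styled_paragraphs_to_xml([("Heading 1", "Intro"), ("Body Text", "Some prose.")]))
```

A production pipeline would extract the (style, text) pairs from the Word file with macros or scripts and then, as the article notes, refine the result further with XSLT.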

  • [June 15, 2001] "Interwoven aims to rally XML migration. Repository allows incremental conversion from HTML." By Cathleen Moore. In ITWorld (June 14, 2001). "Interwoven Inc. rolled out two content management infrastructure products on Tuesday aiming to boost enterprise control of content reuse and distribution. The company's TeamXML is an XML repository designed to help enterprises adopt XML. By allowing users to convert individual Web assets and content components to XML on an as-needed basis, the repository allows enterprises to implement a phased XML migration strategy, according to Interwoven officials. According to Kevin Cochrane, vice president of product management at Sunnyvale, California-based Interwoven, allowing corporate users to control the migration process will help speed the adoption of XML. TeamXML also offers the ability to store XML objects in native form, which boosts the performance and scalability of content, Interwoven officials said. According to Rob Perry, senior analyst at The Yankee Group in Boston, native storage for XML may help drive adoption in companies that are trying to make the move to XML. Interwoven also released OpenSyndicate, a content distribution product aimed at giving enterprise users the ability to control the assembly of content packages..." See details in "Talking XML with Mark Hale, Standards Architect, Interwoven." See also the recent announcements from Interwoven: (1) "Interwoven Announces TeamXML. Next-Generation Object Store to Accelerate Adoption of XML Across the Enterprise. Interwoven Extends XML Leadership with Architecture based on Native XML-Object Model" and (2) "Interwoven Announces OpenSyndicate. Business Managers To Take Direct Control of Content Distribution. Interwoven Pioneers Intelligent Content Distribution."

  • [June 15, 2001] "Inside UDDI." By Richard Karpinski. In InternetWeek (June 07, 2001). "Later this month, the UDDI initiative will unveil version 2.0 of its specification for helping companies find each other via the Internet. With the backing of more than 280 companies, the Universal Description, Discovery, and Integration Registry looks to have staying power. Yet many enterprises haven't even started to tap its power yet. We spoke with Chris Kurt, the program manager for the UDDI project (and Microsoft's group program manager for Web services) to get a nuts-and-bolts look at how UDDI works -- and how IT can get started using it today... UDDI provides an XML-based method for businesses to describe themselves and the Web-based services they offer. The UDDI Business Registry is the public database where companies register themselves. Public UDDI registries are now fully operational. Beta testing wrapped up in early May. IBM and Microsoft are running the public databases. Ariba dropped out, but Hewlett-Packard will launch a third registry later this year. The power of UDDI is the power of ad-hoc discovery of new business partners and processes. If the emerging world of Web services is to flourish, companies need a seamless, automated way to find other businesses on the Internet and determine if their systems and applications are able to work together via the Web. In short, UDDI lets companies do three things: (1) Discover each other; (2) Define how they can interact via the Internet; and (3) Share all this information via an open, global registry... UDDI is a good example of what happens when developers begin thinking about delivering apps as services. The registry is lightweight (it doesn't hold information but links to it); message-based (connections are made by passing XML documents rather than hard-coded integration); and supports highly-distributed apps (even though the look-up database itself is centralized in several locations). Today, UDDI requires too much manual work. 
The true power of UDDI will come when development tools automatically create the WSDL files to describe newly-created apps and deliver them seamlessly to the public UDDI databases. Also important will be UDDI links within key enterprise apps, such as ERP, supply chain and procurement. Such apps should one day be able to expose the Web services they offer as part of their installation process. UDDI is all about ad-hoc business relationships -- 'discovery,' as its name implies. To that extent, long-time business partners may share their Web services more directly. But as e-business grows, says Kurt, companies will regularly be evaluating new suppliers, as well as seeking an automated way to learn about the new Web services and interfaces exposed by existing trading partners. Public UDDI registries augmented with private supply community UDDI databases should be able to take care of this gamut of e-business relationships. Meanwhile, version 2.0 of UDDI -- slated to be unveiled this month -- will among other improvements support richer taxonomies to better reflect the complexity of enterprises and the different types of Web services they aim to describe. There's no doubt that UDDI -- and the Web services model it aims to support -- is in its infancy. But so far the group has moved quickly toward public implementations and kept the politics at a minimum." See: "Universal Description, Discovery, and Integration (UDDI)."

  • [June 15, 2001] "Microsoft Brings Keyword Search to UDDI." By Ashlee Vance. In InfoWorld (June 15, 2001). "Microsoft and RealNames teamed on a keyword-based searching service Thursday for the UDDI registry, adding one of the first new features to a directory that has been billed as a 'Yellow Pages' for the Internet. The UDDI (Universal Description, Discovery, and Integration) registry aims to make it easier for businesses to provide information about their products and services on the Web as well as locate partners and customers. A number of registries that use differing protocols already exist on the Web, but Microsoft, IBM, and Hewlett-Packard have joined the UDDI effort as a way to make business-to-business commerce on the Web work more smoothly. The vendors claim that thousands of businesses have signed up to use UDDI. Microsoft maintains one of the registry sites where companies can enter information about their business. The software maker is teaming with RealNames to make UDDI-related keywords accessible through the address bar in the Internet Explorer browser, said Christopher Kurt, group program manager for UDDI and Web Services at Microsoft. RealNames removes the need to type in sometimes hard-to-remember Web addresses by allowing companies to register simple keywords -- such as the name of a company or a product. When a user types in one of those keywords, they are taken to the Web sites of the company that registered the word, Kurt said. The system competes with a similar keyword service operated by America Online. In the context of UDDI, users will be able to type UDDI followed by a company name or portion of a company name into the address bar of Internet Explorer -- for example, "UDDI flowers." The results would show a list of the businesses registered in UDDI that have flowers in their name. The service could be used by anyone, from a home user shopping for a cricket bat or a large manufacturer in need of raw materials, officials said. 
The keyword search will also take into account a user's location, returning searches based on the language spoken in the user's locale. When businesses sign up to use the RealNames service, they will be pointed to Microsoft's UDDI registry site in an attempt to encourage growth of the registry. Eventually, users will be able to submit their information to the UDDI registry directly from the RealNames site, said Nico Popp, chief technology officer of RealNames. The UDDI system, which was launched last month, contains three types of information, divided into what the vendors refer to as White, Yellow, and Green pages... Microsoft, IBM, and HP will maintain the servers that collect the registry information for about the next year, at which time the project will be turned over to an as-yet unnamed standards body. Updates to the registry are scheduled to appear throughout 2001, with more complex features being added for varying types of business-to-business transactions. Companies can register their information in the UDDI registry at no charge." See (1) the main reference page "Universal Description, Discovery, and Integration (UDDI)", and (2) the announcement: "Microsoft and RealNames Announce Registration And Navigation Services for UDDI Initiative. Businesses Publish UDDI Records and Receive Worldwide Exposure Through Internet Explorer Browser When Registering Keywords."
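To make the kind of lookup described above concrete, here is a sketch of what the underlying registry query might look like using the published UDDI 1.0 inquiry API, which accepts SOAP messages over HTTP. The `maxRows` value and the search term are illustrative assumptions, not taken from the article.

```xml
<!-- Hypothetical UDDI v1 inquiry: find businesses whose name contains "flowers" -->
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
  <Body>
    <find_business generic="1.0" xmlns="urn:uddi-org:api" maxRows="10">
      <name>flowers</name>
    </find_business>
  </Body>
</Envelope>
```

The registry would respond with a `businessList` of matching entries; the RealNames keyword service described in the article layers a friendlier front end over this sort of call.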

  • [June 13, 2001] "XML Takes Root in Catalog, Database Publishing. XML becoming ubiquitous as SGML never could. [The Latest Word.]" By George Alexander. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 5 (June 04, 2001), pages 27-28. "Tools for publishing catalogs and database-derived publications (directories, reference books, etc.) depend on a combination of database and layout tools. Vendors of such publishing tools have latched onto XML enthusiastically and are finding all kinds of uses for it. XML-based approaches vary a lot in their complexity, ranging from support for simple file exchange to the use of XML internally. Recent announcements from some catalog and database-publishing vendors give an idea of some of the possibilities... It's striking that so many diverse uses for XML have surfaced in this single group of related applications. Clearly, there was an existing need for a file format that provides the combination of structure and flexibility found in XML. Within the next year or so, we expect that virtually every catalog and database will offer some support for XML import and export. It's a logical thing to do, and customers will be asking for it. But note that 'XML support' will not generally mean 'no custom conversion needed' when transferring data to another system. With the exception of a few selected vertical markets, there are no standards for exactly how the XML file coming out of or going into a database should be tagged. This means custom conversion routines (albeit relatively simple ones) will still be needed in most cases. There will be room for additional development in this area in the years to come. Other less obvious uses for XML, such as the API approach that Boheads is using, or the internal use of XML by Datazone, will no doubt continue to surface over the next few years as well..." 
[The article summarizes a variety of ways XML is used by publishers in document- and database-oriented applications.]

  • [June 13, 2001] "CDC Solutions integrates, extends with Xtensia. Experienced U.K. vendor pitches automated personalized document production as an enterprise capability." By Mark Walter. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 5 (June 04, 2001), pages 25-26. "Earlier this month, database-publishing veteran CDC Solutions announced plans to offer its multichannel publishing products in a configuration consistent with the enterprise application integration platforms that are emerging in the industry. Called Xtensia, the new platform is built upon the company's existing product modules and expertise in on-demand, customized PDF and XML-based publishing. What's new is the way it expects to see customers integrate the technology with other business systems in enterprise-level applications as well as in departmental point solutions. With Xtensia, CDC is applying its skill in aggregation, personalization and production to material that is structured but unformatted (or at least not frozen as PDF). It offers prebuilt product modules for specific tasks and is in the process of creating standard interfaces for those products to facilitate their integration with other systems. The standard product modules are in the areas of aggregation, personalization and production... CDC has written a new multipage composition engine based on Adobe's PDF libraries. Called Xssembler, it takes well-formed or valid XML as its input and creates composed PDF files as output. Typically Xssembler takes advantage of aggregation and personalization features in specific applications. For example, in the legal department at one of CDC's clients, Xssembler creates personalized contracts based on boilerplate material and variable-data fields. The resulting documents may then be sent to the next step in the cycle -- back to a Web page, on to an e-mail server or directly to a printing device. 
Where authenticity is a concern, CDC has watermarking capability that it can apply as part of the process... One downside of the XML-based Xssembler is that it is not yet as designer-friendly as the PDF-based product. Where PDFfusion enabled firms to create their PDF overlays using any layout tool they wanted, Xtensia is a command-line program that currently lacks a graphical tool for creating its layouts -- designers end up writing XSL style sheets using ASCII editors. Jim Cook, CDC's chief technology officer, admitted that 'There is a gap in the market right now for a good, graphical style sheet editor,' and CDC has not yet developed its own, expecting that someone will fill that gap in the very near future... CDC's concept -- a network agent that compiles and composes custom documents -- sounds just like dynamically built Web sites, but with a key difference: Xtensia can make good-looking printed documents, which precious few Web production systems do."

  • [June 13, 2001] "New Contenders in Cross-Media Publishing Systems." By Stephen Edwards, with Luke Cavanagh, and Mark Walter. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 4 (May 21, 2001), pages 4-13. "Two significant new players in the cross-media system business, Seinet [Xtent] and EidosMedia [Méthode], made their debut at Seybold Seminars Boston last month. Their key innovations include the use of XML to support media-neutral structure with media-specific decisions. They are developing the capability to write a story to fit a space, in print or on a Web page, without sacrificing its reusability. We view them as the first of a new breed of editorial content-management systems that have cross-media awareness built in. Newspapers are the target customers, but catalog and journal publishers will also want to take note... A significant difference between these [two] new systems and the old ones is their use of XML to have both well-formed, normalized structure and media-specific decisions and intent. They give editors and designers interactive feedback regarding how the content will appear when delivered, regardless of delivery mechanism. They bring XML up front to the authoring process, rather than downstream as a post-conversion process, such as is offered with Avenue.Quark. This has been done many times before in content-driven publications. Unfortunately, because most previous SGML and XML-based systems were designed for content-driven publications, they negated the importance of media-specific and product-specific decisions, and so skimped on visual feedback and tools for media-specific decisions. Even though the Seinet and EidosMedia systems are aimed initially at newspapers, they parallel the future direction for cross-media systems for catalogs, journals and a variety of other publishing genres. . . 
The XML implementations of these two companies provide an important benefit we want to elaborate on here: the ability to create only one version of each article and publish it in multiple editions for multiple media. This feature -- attempted also by Atex with Omnex -- breaks from the traditional approach of newspaper systems, which has required the user to create a new version of a story every time it is edited for use in a different edition. In contrast, both Seinet and EidosMedia can use a single file for all editions, with XML tags defining which portions of a story will be published in each edition. For example, if different headlines are required for the Web and print editions, both headlines are contained in the same file, and on output the XML tags determine how the headlines are used. Similarly, a summary of a story to be used in a digest on the Web can be included right in the story. And, where hard-core newspaper editors like to edit a story for each use, a single story can encompass all such changes. This approach offers several important conveniences, besides simplifying the process of searching for specific versions. It reduces the amount of editing, cutting and pasting among files as multiple versions are created. And, when last-minute changes are required, such as in correcting errors, it speeds the process and promotes accuracy by limiting the changes to a single occurrence... The innovation of Seinet, EidosMedia and other newspaper vendors is in applying XML to layout-intensive applications, where editors and designers make editorial decisions, such as the wording of a headline -- in the context of specific layouts. To do this, they've written their own XML editing programs that allow media-specific markup to be inserted in a way that does not corrupt the cross-media structural markup of the article. The implementations represent a leap for XML into a whole new market for publishing systems."
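The single-file, multi-edition approach described above might look something like the following sketch. All element and attribute names here are invented for illustration; neither vendor's actual schema is given in the article.

```xml
<!-- Hypothetical sketch: one story file carrying edition-specific variants,
     selected by XML attributes at output time. -->
<story id="s1234">
  <headline edition="print">Council Approves New Budget</headline>
  <headline edition="web">City council passes budget after late-night session</headline>
  <summary edition="web-digest">The council approved the budget 7-2 on Tuesday.</summary>
  <body>
    <p>The city council voted Tuesday night...</p>
  </body>
</story>
```

On output, the publishing system would select the `headline` and `summary` elements whose `edition` attribute matches the target medium, so a last-minute correction to the body text propagates to every edition at once.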

  • [June 13, 2001] "W3C Blesses XML Schema. Milestone passed for building XML-enabled Web applications, services and technologies. [Standards.]" By Mark Walter. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 4 (May 21, 2001), page 3. "After two years of development, refinement and testing, the W3C has approved XML Schema 1.0 as a recommendation, marking a major milestone in the evolution of XML. The new standard creates an underlying common syntax for data interchange and technology integration on the Web, complementing the document-oriented features of XML itself... XML Schema makes two substantial improvements to XML: datatypes and integration with namespaces. Bringing datatypes to XML makes it more suitable for use with databases and the fielded information -- identifiers, numbers, dates, etc. -- often stored in databases. The integration with XML Namespaces makes it easier to validate and resolve documents that make use of multiple tag vocabularies. The W3C has issued a tool as well, an XML Schema Validator called XSV that it co-developed with the University of Edinburgh in Scotland. XSV has been revised at each stage of XML Schema development and now validates against the final spec. In addition, the W3C is inviting developers to send in sample schemas for a test-suite library, to be reviewed and managed by the W3C XML Schema Working Group. . . The passing of XML Schema by the W3C may have been a foregone conclusion, but it nevertheless represents an important milestone. Users can now rightfully demand that vendors comply, and developers have a basis for creating Web services and technologies that exchange data in an open fashion. As with most other HTML and XML standards, we like what's emerged from the W3C committee process, and we expect most vendors to nod their heads in agreement. True compliance, however, will be up to end users to demand and enforce."
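The two improvements the article highlights -- datatypes constrained by facets, and namespace integration -- can be sketched in a minimal schema. The `example.org` namespace and the element names are assumptions for illustration only.

```xml
<!-- A minimal XML Schema sketch: a string datatype constrained by a pattern
     facet, a numeric datatype constrained by a range facet, and a declared
     target namespace. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/orders"
           xmlns="http://example.org/orders"
           elementFormDefault="qualified">
  <xs:simpleType name="PartNumber">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{2}-\d{4}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="part" type="PartNumber"/>
  <xs:element name="quantity">
    <xs:simpleType>
      <xs:restriction base="xs:positiveInteger">
        <xs:maxInclusive value="100"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

A validator such as the W3C's XSV would reject an instance document whose `part` value fails the pattern or whose `quantity` exceeds 100 -- the kind of fielded-data checking DTDs could not express.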

  • [June 13, 2001] "NetLibrary Adopts OEB. Drops proprietary format and conversion services, cuts 90 jobs." By Mike Letts. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 4 (May 21, 2001), pages 42-43. "Last January, in a move designed to cut costs, online library service provider NetLibrary began charging publishers for the free conversion services it originally provided. With sales sputtering, the company found that its inhouse conversion services were a financial liability. Now, less than three months later, NetLibrary has scrapped all on-site conversion services for its NetLibrary service and cut nearly 90 jobs in the process. The conversion policy for MetaText, an interactive digital textbook developer acquired by NetLibrary last year, will remain unchanged. In addition, the company also announced that it has dropped its use of a proprietary format based on Folio Views in favor of the XML/HTML-based publication structure developed by the Open e-Book Forum (OeBF) consortium. Under the company's new service policy, publishers can submit any electronic file meeting OeBF standards, or NetLibrary will outsource any necessary conversion labor to another facility and act as middleman between the publisher and the conversion house..." See "Open Ebook Initiative."

  • [June 13, 2001] "Webjaz in at the Examiner. Two modules used in daily production." By Stephen Edwards. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 4 (May 21, 2001), page 36. "Harris' Jazbox publishing platform has been installed at the San Francisco Examiner where the paper is using two of the system's key modules: Newsjaz, for print publishing, and Webjaz, for Web publishing. The Examiner Web site provides evidence that Webjaz works. Harris reported that the site is being handled by one technical person, compared with 11 that it needed prior to being split off. Harris has continued to add to the Webjaz functionality, resulting in a solid product. It handles all file types, stores all elements in XML (the text-editing window doesn't show XML tags, but a separate preview window displays them), and supports XSL style sheets. It provides a scheduling facility for automatically publishing a story at a specified time and automatically archiving it later. It offers automatic e-mail messaging and supports version control, audit trails and database replication. If different versions of a story are required for different media, Harris creates multiple versions rather than using the same file... For print publishing, Harris has been focusing on the integration of InDesign with Newsjaz, which was started at the request of the Examiner..."

  • [June 12, 2001] "Archaeological Data Models and Web Publication Using XML." By J. David Schloen (The Oriental Institute of the University of Chicago). In Computers and the Humanities (CHUM) Volume 35, Number 2 (May, 2001), pages 123-152. "An appropriate standardized data model is necessary to facilitate electronic publication and analysis of archaeological data on the World Wide Web. A hierarchical 'item-based' model is proposed which can be readily implemented as an Extensible Markup Language (XML) tagging scheme that can represent any kind of archaeological data and deliver it in a cross-platform, standardized fashion to any Web browser. This tagging scheme and the data model it implements permit seamless integration and joint querying of archaeological datasets derived from many different sources... see for the latest version of the XML Document Type Definition in which ArchaeoML is defined..." See also from "Electronic Publication of Ancient Near Eastern Texts": "David Schloen, an archaeologist in the University of Chicago's Oriental Institute, gave the final formal presentation on Saturday afternoon, entitled 'Texts and Context: Using XML to Integrate and Retrieve Archaeological Data on the Web.' Schloen noted that XML is as suitable for representing archaeological databases as it is for representing ancient texts. But whether the information is expressed in XML or in some other data format (e.g., a relational database), archaeologists need an appropriate data model that captures in a rigorous and consistent fashion the idiosyncrasies of units of archaeological observation, as well as the spatial and temporal interrelationships among them. Schloen proposes a hierarchical, 'item-based' data model, rather than the 'class-based' (tabular) data model which currently prevails. The item-based data model has the advantage of being straightforwardly represented in XML as a nested hierarchy of tagged elements with their attributes. 
Moreover, texts can be treated like any other type of artifact, as items in a spatial hierarchy with their own properties. Schloen concluded by presenting an XML tagging scheme dubbed ArchaeoML ('Archaeological Markup Language') which can represent any kind of archaeological data on any spatial scale, including the vector map shapes and raster images which belong to individual archaeological items..."
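Schloen's item-based model, as described above, lends itself to a nested XML representation in which texts sit alongside other artifacts in a spatial hierarchy. The following sketch is purely illustrative; the element names are invented and are not the actual ArchaeoML vocabulary.

```xml
<!-- Illustrative item-based hierarchy in the spirit of the article's
     description: each observational unit nests inside its spatial context,
     and a text is just another item with properties. -->
<site name="Tell Example">
  <area id="A">
    <locus id="A-12" period="Iron II">
      <item type="artifact" id="A-12-003">
        <property name="material">ceramic</property>
        <property name="form">storage jar</property>
      </item>
      <item type="text" id="A-12-007">
        <property name="language">Aramaic</property>
        <transcription>...</transcription>
      </item>
    </locus>
  </area>
</site>
```

Because every item carries its spatial context as ancestry rather than as foreign keys in separate tables, datasets from different excavations can be merged and queried jointly, which is the integration benefit the article emphasizes.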

  • [June 12, 2001] "Forte ESP Toolkit Integrates Web Design Tools and XML Technologies." From Sun Microsystems. Feature story. June, 2001. "A new feature in the Forte for Java, release 3.0 Early Access software is the Forte for Java Enterprise Service Presentation Toolkit (ESP). This toolkit contains a set of enabling tools that simplify the development of Java 2 Platform, Enterprise Edition (J2EE) Web applications by integrating popular Web design tools and XML technologies. The toolkit is primarily directed at the application presentation layer, for example clients such as browsers, cell phones, and PDAs. Data sources can be any servlets, pages derived from JavaServer Pages (JSP) technology, JavaBeans, or Enterprise JavaBeans (EJB) components that deliver data as XML documents. The Forte ESP Toolkit offers the following benefits: (1) The roles of Web designer and programmer are clearly separated. (2) Web designers can author JSP pages that access dynamic XML data. (3) A single JSP page can access multiple data sources. (4) A single XML data source can be reused for multiple device types... The Forte ESP registry serves as an interface between the Web design and the programmer creating the back-end. The registry contains information about the location of XML data sources and the structure of the data, which can be entered in the registry by a programmer. The Forte ESP Toolkit also offers Web design tool extensions and a servlet that together read the registry, analyze the XML data, and give the Web designer a way to graphically map data to a page layout. The extensions insert custom tags in a JSP page to access the data dynamically at runtime. The design-time data analysis can use a sample, static XML document. This sample enables Web designers and back-end programmers to work in parallel based on their agreed data structure. Thus, the Web designer can do all the page layout, while the XML data source is still in development. 
This makes the whole team more productive, and the project is completed faster. The Forte ESP Toolkit also enables Web designers working with a single XML data source (created by a programmer) to generate JSP pages that incorporate XML data into HTML documents for browsers, cell phones, and PDAs. The Forte ESP Toolkit provides a JSP custom tag library that supports the embedding of Extensible Style Language (XSL) transformations (XSLT) in a JSP page. When one of these tags is called at runtime, the XSL processor transforms the XSL source document into the HTML needed by the presentation device. The programmers on a project can produce a single set of XSL sources, and the Web designers can use the Forte ESP Toolkit to automatically map those sources to the desired presentation device types and display formats..." See (1) the User's Guide, and (2) the overview in "Creating Web Services with Java Technology and XML."
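The kind of XSL transformation the toolkit embeds in a JSP page can be sketched as follows: one XML data source rendered as browser HTML. The source-document element names (`catalog`, `product`, `name`) are assumptions for illustration; a second stylesheet could map the same source to a cell-phone or PDA format.

```xml
<!-- Minimal XSLT 1.0 sketch: render an assumed product catalog as an HTML list. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/catalog">
    <html>
      <body>
        <ul>
          <xsl:for-each select="product">
            <li><xsl:value-of select="name"/></li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```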

  • [June 12, 2001] "TMQL Requirements (0.8.2)." Edited by Hans Holger Rath (empolis GmbH) and Lars Marius Garshol (Ontopia). "This document sets down the requirements that will guide the work with the Topic Map Query Language (TMQL), a query language for topic maps. The requirements herein presented document the intentions of the standards editors, as informed by the user community. Its purpose is to make it clear what can be expected to come out of the TMQL process, and to encourage the user community to make their needs known to the editors. This document has requirements for the TMQL standard as a whole, and for the query part of TMQL in particular. Additional requirements for the update part of TMQL will have to be defined at a later stage..." [Referenced in posting from Lars Marius Garshol. "WG3 proposes Hans Holger Rath (Germany) and Lars Marius Garshol (Norway) as editors of Topic Map Query Language (TMQL) and instructs them to produce a final requirements document for TMQL, and to prepare a response to comments from the National Bodies of the UK, US, and Japan." At the ISO SC34 meeting in Berlin in May I officially replaced Ann Wrightson as co-editor of the TMQL standard with Hans Holger Rath. Based on the previous requirements document put together by Ann and Holger, as well as the discussions in Berlin, Holger and I produced a new TMQL requirements document. This document presents the editors' views on what the TMQL standard should be like and how it should relate to other standards. The editors would very much like to see feedback from the topic map community on these requirements, in order to ensure that the editors and the community are in agreement on the requirements to be fulfilled by the standard before work begins in earnest..." See: "(XML) Topic Maps."

  • [June 12, 2001] "State Courts Look to Pass Judgment on XML. Document-encoding technology seen by some in legal community as key to electronic filing services." By Ellen Messmer. In Network World Volume 18, Number 23 (June 04, 2001), page 10. "Lawyers, courts and legal cases generate mountains of paperwork, but a few states have taken the ground-breaking step to allow electronic filing of documents directly to court Web sites for processing over their intranets. While e-filing is catching on in states such as Georgia, New Mexico, California and Washington, the process of managing legal documents online raises thorny questions about the need for signatures, common security practices and technical standards for interoperability in document exchange. Counties today take varying approaches to e-filing, but there is a growing consensus that the document-encoding technology called XML can be the basis for statewide - and perhaps even nationwide - electronic filing. Georgia has led the charge, as its judiciary and universities have devised an XML tagging specification for the courts dubbed Legal XML. The specification will go on trial next week as four Georgia courts and four e-filing services show how it can be used to transmit XML-based documents to court servers and to competing e-filing services. These courts and document clearinghouses today can't easily share electronic documents. But the use of format-neutral XML tags encoded around content is expected to make it easier to process information received over the Internet as long as the application server receiving it supports XML, too... Georgia hopes to complete the testing of Legal XML by August, and if it works out, it's likely to be required for use in courts statewide. In addition, backers of Legal XML formed a nonprofit organization last winter to promote it as a national standard... 
'The XML language is the most powerful I've seen to help us accelerate use of e-filing,' says Bob March, clerk of court at the U.S. District Court in New Mexico, which has used e-filing for about three years. The New Mexico court is redesigning its court management system to support XML. The court in Albuquerque has a T-1 line for receiving legal documents processed through the @court hosted service for receipt by 14 judges..." See "Legal XML Working Group."

  • [June 12, 2001] "Cool Graphics in XML." By Mark Gibbs. In Network World Volume 18, Number 23 (June 04, 2001), page 56. "The entire universe is going XML. Everywhere you turn, somebody is turning something into XML. This is, as Ms. Stewart is wont to say, 'A good thing.' The beauty of XML is it provides a whole new way of structuring 'stuff' that goes beyond just organizing data. With XML, data is imbued with meaning and purpose . . . wait a minute, that implicitly makes it information. Cool. Today, we'll look at one of the latest and most promising applications of XML - Scalable Vector Graphics (SVG)... SVG is terribly exciting if you're inclined toward trying whiz-bang graphics on the Web. If you're not, SVG is cool anyway. The reason for such coolness is what SVG can do. Quoting the W3C specification: 'SVG allows for three types of graphic objects: vector graphic shapes (e.g., paths consisting of straight lines and curves), images and text. Graphical objects can be grouped, styled, transformed and composited into previously rendered objects. The feature set includes nested transformations, clipping paths, alpha masks, filter effects and template objects. SVG drawings can be interactive and dynamic. Animations can be defined and triggered declaratively (that is, by embedding SVG animation elements in SVG content) or via scripting.' SVG has a document object model that provides access to all elements, attributes and properties in an SVG graphic document. There are also event handlers such as the ever-popular 'onmouseover' and 'onclick.' SVG's MIME type will be image/svg+xml when the W3C registers it as such -- apparently around the time when SVG is approved as a W3C recommendation (no date set). The specification also recommends that SVG files should have the extension .svg (all lower case) on all platforms..." See: "W3C Scalable Vector Graphics (SVG)."

  • [June 12, 2001] "XML worth a thousand pics." By Mark Gibbs. In Network World Volume 18, Number 24 (June 11, 2001), page 46. "Last week we were cruel and unusual - we gave you a chunk of Scalable Vector Graphics code but put off explaining it until this week. Now where were we. . . . The code [...] These are standard declarations that declare this is XML and that specify the Document Type Definition. DTD is a set of rules that define elements and attributes of an XML document and spell out how valid documents are structured. In effect, a DTD provides an integrity check on a specific type of XML content... This demo shows the basics of text and transformations under SVG. With SVG, there are three basic drawing elements: text, shapes and paths. Shapes include circles, squares and so on, while paths are chains of line segments that can optionally be specified as closed. You may have already surmised that SVG files, while relatively small, get complex very quickly. Hand coding SVG graphics is not for the faint of heart... To this end, a number of graphics tools have become available that support SVG images - for example, editors such as Adobe's Illustrator 9.0 and Jasc Software's WebDraw ... SVG is a standard to watch. Next week, we'll look at dynamic SVG. . ." See: "W3C Scalable Vector Graphics (SVG)."
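The three basic drawing elements the column describes -- text, shapes and paths -- plus a simple transformation can all be seen in a few lines of hand-coded SVG. This is a generic sketch, not the code fragment from Gibbs's columns.

```xml
<?xml version="1.0"?>
<!-- Minimal SVG sketch: a shape, a closed path, and transformed text. -->
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120">
  <!-- shape -->
  <circle cx="50" cy="50" r="30" fill="steelblue"/>
  <!-- path: three line segments closed with Z to form a triangle -->
  <path d="M 100 80 L 140 20 L 180 80 Z" fill="none" stroke="black"/>
  <!-- text with a rotation transformation about the point (20,110) -->
  <text x="20" y="110" transform="rotate(-5 20 110)">Hello, SVG</text>
</svg>
```

Saved with the recommended .svg extension, a file like this renders in any SVG-capable viewer, and each element remains individually addressable through the SVG document object model for event handlers such as 'onclick'.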

  • [June 12, 2001] "SVG Reference in SVG." [Notice posted by] Jiri Jirat. See the SVG. From the post: "Hello XML and SVG developers, we have tried to display our site navigation using SVG: click on any keyword in the keywords section. Notice also the difference in sizes -- SVG wins with a huge margin (and it is not gzipped!)..."

  • [June 12, 2001] DOM Level 3 Abstract Schemas and Load and Save Specification Version 1.0. W3C Working Draft 07-June-2001. Edited by Ben Chang, Oracle; Andy Heninger, IBM; Joe Kesselman, IBM; Rezaur Rahman, Intel Corporation. Formerly known as 'DOM Level 3 Content Model and Load and Save'. "This specification defines the Document Object Model Abstract Schemas and Load and Save Level 3, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. The Document Object Model Abstract Schemas and Load and Save Level 3 builds on the Document Object Model Core Level 3... This chapter describes the optional DOM Level 3 Abstract Schema (AS) feature. This module provides a representation for XML abstract schemas, e.g., DTDs and XML Schemas, together with operations on the abstract schemas, and how such information within the abstract schemas could be applied to XML documents used in both the document-editing and AS-editing worlds. It also provides additional tests for well-formedness of XML documents, including Namespace well-formedness. A DOM application can use the hasFeature method of the DOMImplementation interface to determine whether a given DOM supports these capabilities or not. One feature string for the AS-editing interfaces listed in this section is 'AS-EDIT' and another feature string for document-editing interfaces is 'AS-DOC'. This chapter interacts strongly with the Load and Save chapter, which is also under development in DOM Level 3. Not only will that code serialize/deserialize abstract schemas, but it may also wind up defining its well-formedness and validity checks in terms of what is defined in this chapter. In addition, the AS and Load/Save functional areas will share a common error-reporting mechanism allowing user-registered error callbacks..." See: "W3C Document Object Model (DOM)." [cache]
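The hasFeature capability check described above is already part of the DOM Core that Level 3 builds on. The following Python sketch shows the call pattern using the standard library's DOM implementation; since minidom implements earlier DOM levels, querying the draft 'AS-EDIT' feature string simply returns False, which is exactly how an application detects that the optional Abstract Schema module is unsupported.

```python
# Probe a DOM implementation for optional feature support via hasFeature(),
# as described for the DOM Level 3 Abstract Schemas module.
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
print(impl.hasFeature("Core", "2.0"))    # True: minidom supports DOM Level 2 Core
print(impl.hasFeature("AS-EDIT", None))  # False: AS-editing module not supported
```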

  • [June 12, 2001] Document Object Model (DOM) Level 3 Core Specification Version 1.0. W3C Working Draft 05-June-2001. Edited by Arnaud Le Hors, IBM; Gavin Nicol, Inso EPS (for DOM Level 1); Lauren Wood, SoftQuad, Inc. (for DOM Level 1); Mike Champion, ArborText (for DOM Level 1 from November 20, 1997); Steve Byrne, JavaSoft (for DOM Level 1 until November 19, 1997). "This specification defines the Document Object Model Core Level 3, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. The Document Object Model Core Level 3 builds on the Document Object Model Core Level 2." See: "W3C Document Object Model (DOM)." [cache]

  • [June 11, 2001] "Microsoft's Proposal for Directory Services Markup Language v2.0." Posted by Peter J. Houston (Microsoft Corporation, Group Program Manager, Active Directory). "The Directory Services Markup Language v1.0 (DSMLv1) provides a means for representing directory structural information as an XML document. DSMLv2 goes further, providing a method for expressing directory queries and updates (and the results of these operations) as XML documents. DSMLv2 documents can be used in a variety of ways. For instance, they can be written to files in order to be consumed and produced by programs, or they can be transported over HTTP to and from a server that interprets and generates them. DSMLv2 functionality is motivated by scenarios including: (1) A smart cell phone or PDA needs to access directory information but does not contain an LDAP client. (2) A program needs to access a directory through a firewall, but the firewall is not allowed to pass LDAP protocol traffic because it isn't capable of auditing such traffic. (3) A programmer is writing an application using XML programming tools and techniques, and the application needs to access a directory. In short, DSMLv2 is needed to extend the reach of directories. DSMLv2 is not required to be a strict superset of DSMLv1, which was not designed for upward-compatible extension to meet new requirements. However it is desirable for DSMLv2 to follow the design of DSMLv1 where possible... DSMLv2 focuses on extending the reach of LDAP directories. Therefore, as in DSMLv1, the design approach is not to abstract the capabilities of LDAP directories as they exist today, but instead to faithfully represent LDAP directories in XML. The difference is that DSMLv1 represented the state of a directory while DSMLv2 represents the operations that an LDAP directory can perform and the results of such operations. Therefore the design approach for DSMLv2 is to express LDAP requests and responses as XML documents. 
For the most part DSMLv2 is a systematic translation of LDAP's ASN.1 grammar (defined by RFC 2251) into XML-Schema. Thus, when a DSMLv2 element name matches an identifier in LDAP's ASN.1 grammar, the named element means the same thing in DSMLv2 and in LDAP..." See also comments in the posting. References: "Directory Services Markup Language (DSML)." Also the announcement: "Microsoft Furthers Adoption of Directory Standards."
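The translation of LDAP requests into XML can be sketched as follows. Element names mirror identifiers the proposal describes (searchRequest, filter, equalityMatch); the namespace URN, the example DN, and the filter values are assumptions for illustration, not quoted from the specification.

```python
# Hedged sketch: building a DSMLv2-style search request with stdlib
# ElementTree. The namespace URN and sample DN are assumptions.
import xml.etree.ElementTree as ET

DSML_NS = 'urn:oasis:names:tc:DSML:2:0:core'  # assumed namespace URN
ET.register_namespace('', DSML_NS)

batch = ET.Element(f'{{{DSML_NS}}}batchRequest')
search = ET.SubElement(batch, f'{{{DSML_NS}}}searchRequest',
                       dn='ou=People,dc=example,dc=com', scope='singleLevel')
flt = ET.SubElement(search, f'{{{DSML_NS}}}filter')
eq = ET.SubElement(flt, f'{{{DSML_NS}}}equalityMatch', name='sn')
ET.SubElement(eq, f'{{{DSML_NS}}}value').text = 'Smith'

request_doc = ET.tostring(batch, encoding='unicode')
```

A document like this could be written to a file or posted over HTTP to a DSMLv2 server, matching the transport scenarios listed above.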

  • [June 11, 2001] "Special Report: The Language Of XML Security." By Pete Lindstrom. In Network Magazine (June 2001), pages 56-60. ['While XML documents must be protected from prying eyes and the corrupting influences of the Internet, XML can also be a security tool for many applications.'] "XML provides a structured way to add context to data so that it can be shared among different applications. Where old systems used ASCII text in batch file transfers, XML systems support 'transitional datasets' with predefined data records that are processed in real time through message queues and application servers. But accompanying XML's many advantages are critical security issues. XML is primarily for Internet-based communications; thus, it provides the opportunity for others to sniff or spoof information. For example, while XML allows medical records to be more efficiently shared between multiple parties such as doctors, insurers, and pharmacists, security breaches with such data can have very adverse consequences... The primary security issues surrounding XML fall into two basic categories: the security of XML instances themselves, and the use of XML technology to enhance security for a wider range of applications. This article discusses the importance of this distinction, processes that make XML data more secure, and how to apply XML on a broader scale to fortify the security of data exchange... When ad hoc working groups began discussing XML security issues, they basically split into two different directions. One category of groups is focused on the security of XML instances and documents, regardless of their use. The primary topics in this area are the use of digital signatures applied to XML instances (XML Signature) and the encryption of all or partial XML instances (XML Encryption). Another category of groups seeks to leverage XML's capabilities to further broaden security functions. 
Their primary focus is on the activity of the Organization for the Advancement of Structured Information Standards' (OASIS) Security Services Technical Committee, currently at work on the Security Assertion Markup Language (SAML). Other initiatives include Verisign's XML work, in particular the XML Key Management Specification (XKMS) and XML Trust Assertion Services Specification (XTASS)..." See: (1) "XML Digital Signature (Signed XML - IETF/W3C)"; (2) "XML and Encryption"; (3) OASIS Technical Committee work on Security Assertion Markup Language (SAML) and Access Control.

  • [June 11, 2001] "Architectures in an XML World." By Joshua Lubell (National Institute of Standards and Technology). Paper prepared for Extreme Markup Languages 2001, Montréal, August 17, 2001. "An often overlooked method for schema and DTD reuse is the specification of architectures. An architecture is a collection of rules (for creating and processing a class of documents) that application designers can apply in defining their XML vocabularies. An XML document using an architecture contains special architecture-support attributes that describe how its elements, attributes, and data correspond to their architectural counterparts. Software tools for processing architectures are called architecture engines. APEX is a non-validating generic architecture engine written in XSLT. Input to APEX consists of an XML document plus stylesheet parameters identifying an architecture used by the document. APEX produces as output an architectural document conforming to the architecture specified by the stylesheet parameters and the input document's architecture support attributes. Experience with APEX demonstrates that architectures and XSLT are complementary and that architectures can fulfill a role not well served by alternative approaches to reuse." APEX description from the web site: "APEX (Architectural Processor Employing XSLT) implements a simple subset of the Architectural Form Definition Requirements (AFDR) specified in Annex A.3 of ISO/IEC 10744:1997. APEX behaves similarly to David Megginson's XAF package for Java and differs from the AFDR in the same ways as XAF. Input to APEX consists of an XML document plus stylesheet parameters identifying an architecture used by the document. APEX produces as output an architectural document, i.e., an XML document containing only the markup and data defined by the architecture specified..." For background, see the introduction to Architectures, Architectural Forms, Architecture Support Attributes, and Architectural Processing. 
Source: see the program listing. References: "Architectural Forms and SGML/XML Architectures." 2001-08-02 Note: see "Architectures in an XML World" online.
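The mapping an architecture engine performs can be sketched in a few lines. This is not APEX (which is written in XSLT); it is an illustrative Python sketch of the core idea: architecture-support attributes on client elements name their architectural counterparts, and the engine emits a document containing only the architectural markup. The attribute name 'biblio' and all element names are invented.

```python
# Illustrative sketch (not APEX itself): map client elements to their
# architectural forms via an architecture-support attribute, dropping
# elements that carry no mapping. All names here are invented.
import xml.etree.ElementTree as ET

src = ET.fromstring(
    '<catalog biblio="doclist">'
    '<item biblio="doc"><name biblio="title">Extreme Markup 2001</name>'
    '<internal-note>not architectural</internal-note></item></catalog>')

def architecturalize(elem, arch_attr='biblio'):
    """Rename elem to its architectural form; prune unmapped elements."""
    arch_name = elem.get(arch_attr)
    if arch_name is None:
        return None  # element is not part of the architecture
    out = ET.Element(arch_name)
    out.text = elem.text
    for child in elem:
        mapped = architecturalize(child, arch_attr)
        if mapped is not None:
            out.append(mapped)
    return out

arch_doc = ET.tostring(architecturalize(src), encoding='unicode')
```

The output is an architectural document (here rooted at doclist) containing only architecture-defined markup and data, which is the behavior the AFDR subset in APEX specifies.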

  • [June 11, 2001] "W3C XML Schema Made Simple." By Kohsuke Kawaguchi. From June 6, 2001. ['The W3C XML Schema Definition Language can be easy to learn and use, claims Kohsuke Kawaguchi -- you just need to know what to avoid.'] "It's easy to learn and use W3C XML Schema once you know how to avoid the pitfalls. You should at least learn the following things. (1) Do use element declarations, attribute groups, model groups, and simple types. (2) Do use XML namespaces as much as possible. Learn the correct way to use them. (3) Do not try to be a master of XML Schema. It would take months. (4) Do not use complex types (why?), attribute declarations (why?), or notations (why?). (5) Do not use local declarations (why?). (6) Do not use substitution groups (why?). (7) Do not use a schema without the targetNamespace attribute (aka chameleon schema) (why?). You won't lose anything by following these guidelines, as the rest of this article demonstrates. Too long to remember? Then try the one-line version: 'Consider W3C XML Schema as DTD + datatype + namespace'. The rest of this article justifies these recommendations... There are many pitfalls in XML Schema that should be avoided, which will make your life easier because you'll have less to learn. And you won't lose the expressiveness of W3C XML Schema..." Note: The "advice" in this article is obviously a collection of personal opinion; compare the work of Roger L. Costello in "XML Schemas: Best Practices Homepage."

  • [June 11, 2001] "XML Q&A: Big Documents, Little Attributes." By John E. Simpson. From June 6, 2001. ['This month our Q&A column tackles storing large numbers of records in XML ("Q: How do I process a big XML document?"), and explains the use of attribute definitions in DTDs ("Q: I'm confused about specifying attribute values in a DTD").']

  • [June 11, 2001] "Transforming XML: Using the W3C XSLT Specification." By Bob DuCharme. From June 6, 2001. ['For advanced XSLT use, the W3C's XSLT specification can be a handy tool. This guide helps you read the specification and clears up confusing terms.'] "The W3C's XSLT Recommendation is a specification describing the XSLT language and the responsibilities of XSLT processors. If you're new to XSLT, the Recommendation can be difficult to read, especially if you're not familiar with W3C specifications in general and the XML, XPath, and Namespaces specs in particular. This month, I'd like to summarize some of the concepts and terms that are most likely to confuse an XSLT novice reading the XSLT Recommendation..." For related resources, see "Extensible Stylesheet Language (XSL/XSLT)."

  • [June 11, 2001] "XML-Deviant: Time for Consolidation." By Leigh Dodds. From June 6, 2001. ['Is XML changing the way applications are being designed? If so, what tools should you use to model these applications?'] "As Edd Dumbill noted in last week's XML-Deviant, the XML Schema specification may have been delivered, but the discussions are far from over. The Schema Working Group have delivered a meaty specification, and it will take some time for developers to digest. Expectations have already been raised about the features that may be delivered in Schema 1.1, and the prospect of Schema 2.0 has already been considered. It's premature to begin thinking too much about what these specifications might encompass until there's sufficient Schemas experience to allow it to be assessed. We must not forget the continuing work on other schema languages, most notably RELAX NG, the unification of RELAX and TREX being carried out at OASIS. Michael Fitzgerald observed that it's 'some of the more important work happening in XML now.' Rick Jelliffe, a strong advocate of a plurality of schema languages, characterized the current situation as an interim period, and XML Schema 1.0 as a 'provisional' specification..." See "RELAX NG."

  • [June 09, 2001] "[W3C] XML Schema Tutorial." By Roger L. Costello (of XML Technologies). The main tutorial is a PPT slide set with some 276 slides. The slides reference 36 worked examples and 14 lab exercises. From the June 9, 2001 update note: "The tutorial is now updated to the Recommendation specification (i.e., the latest W3C specification). It includes a complete set of labs with answers. All examples and lab answers are complete and have been validated using Henry Thompson's schema validator, xsv [self-installing Win32 .exe], which is bundled in with the tutorial (thanks Henry!). It also includes a Javascript program, written by Martin Gudgin, that enables you to use MSXML4.0 (thanks Martin!)... I have provided a number of DOS batch files (i.e., validate.bat, run-examples.bat, run-lab-answers.bat) to make it easy for you to schema validate your XML files. I am continually adding new material to this tutorial. Please check back periodically for updates..." Related references in "XML Schemas."

  • [June 08, 2001] "Understanding ebXML. Untangling the business Web of the future." By David Mertz, Ph.D. (Phenomenological unifier, Gnosis Software, Inc.). From IBM developerWorks. June 2001. ['ebXML is a big project with a lot of pieces. In this article David Mertz outlines how the pieces all fit together. This overview provides an introduction to the ebXML concept and then looks a bit more specifically at the representation of business processes, an important starting point for ebXML implementations. Two short bits of sample code demonstrate the ProcessSpecification DTD and a package of collaborations.'] "When you read about ebXML, it's difficult to get a handle on exactly what it is -- and on what it isn't. The 'eb' in ebXML stands for 'electronic business,' and you can pronounce the phrase as 'electronic business XML,' 'e-biz XML,' 'e-business XML,' or simply 'ee-bee-ex-em-el.' On one hand, ebXML seems to promise a grand unification of everything businesses do to communicate with each other. On the other hand, one could be forgiven for thinking that ebXML amounts to little more than a pious, but vacuous, declaration that existing standards are worth following. As with every 'next big thing,' the truth lies somewhere in the middle... Sorting out ebXML involves a few steps. Perhaps the first thing necessary for understanding the details of ebXML is to digest an alphabet soup of new acronyms and other special terms. There are a number of these terms in the sidebar (ebXML terminology) to consider before looking at the whole 'vision' of ebXML interactions. Additional terms fit into the entire system, but these particular terms make a good starting point. With this new vocabulary in mind, and a bit of the following background on where ebXML comes from, you can begin to make sense of how all of the differing processes in ebXML hold together. 
After describing what ebXML does (at least in outline) at the beginning of this article, a final section looks in more detail at the Business Process Specification Schema, which makes up one of the most important elements of ebXML's underlying infrastructure... The UN/CEFACT Modeling Methodology (UMM), which utilizes UML, may be instrumental in modeling the ebXML Business Processes. However, such modeling is simply a recommendation, not a requirement. In any case, since this article targets XML developers and does not address OOD (object-oriented design), it is more interesting herein to look at the representation of the models in XML documents conformant to the Business Process Specification DTD and XML Schema. The DTD (named 'ebXMLProcessSpecification-v1.00.dtd') appears, at this time, to be the primary rule representation. Both this DTD and a W3C XML Schema, which is (presumably) semantically and syntactically compatible, may be found in the EbXML_BPschema_1.0 recommendation... ... The approval of ebXML specifications is moving along at a fairly rapid pace (certainly for a standards organization). My own estimation is that it will take another year or two to shake out all of the issues and details for such an ambitious vision. It appears, however, that ebXML is on the way to widespread use a few years down the road. Now is the time, therefore, for businesses to begin a serious consideration of their own ebXML implementation plans." Note especially the sidebar, "ebXML Terminology." See: "Electronic Business XML Initiative (ebXML)."
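A hedged sketch of what a small Business Process Specification instance might look like: a specification containing one binary collaboration between two roles. The element names follow the vocabulary the article mentions (ProcessSpecification, BinaryCollaboration); attribute names and values are illustrative, not copied from ebXMLProcessSpecification-v1.00.dtd.

```python
# Hedged sketch of a Business Process Specification instance. Element
# names follow the ebXML vocabulary named in the article; the attribute
# details are invented for illustration.
import xml.etree.ElementTree as ET

spec = ET.Element('ProcessSpecification', name='OrderManagement',
                  version='1.0')
collab = ET.SubElement(spec, 'BinaryCollaboration', name='FirmOrder')
ET.SubElement(collab, 'InitiatingRole', name='Buyer')
ET.SubElement(collab, 'RespondingRole', name='Seller')
ET.SubElement(collab, 'BusinessTransactionActivity', name='PlaceOrder')

process_xml = ET.tostring(spec, encoding='unicode')
```

An instance like this would be validated against the DTD or the (presumably equivalent) W3C XML Schema before being used in an ebXML collaboration agreement.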

  • [June 06, 2001] "W3C Works on Standards Development." By Stephen Lawson. In InfoWorld (June 1, 2001). "Conscious that the future success of e-commerce and Web services hinges on interoperability between different vendors' products, the World Wide Web Consortium (W3C) attempted to fill in some of the gaps in the current array of standards at its 10th annual World Wide Web Conference last month in Hong Kong. The W3C's painstaking standards work is critical for enabling companies to use the Web for commerce, according to Roger Cutler, a senior staff research scientist at the Chevron Information Technology division of Chevron U.S.A. and a member of the W3C for the past year. With this in mind, IBM revealed it is preparing to propose to the W3C a new standard dubbed the WSFL (Web Services Flow Language). WSFL is designed to describe how a series of functions would work in providing Web services, according to Robert Sutor, director of e-business standards strategy at IBM in Somers, N.Y. It would help developers and corporate users specify the many pieces they need to plug into workflow applications or business processes and the sequence in which they should operate, he says. For example, WSFL might include a way to describe how well a service, such as a transaction engine, should perform. That would enable a Web services provider to guarantee QoS (quality of service), according to Sutor. Getting the industry to agree on such workflow standards has been a common problem over the years, Sutor says, acknowledging that other initiatives compete with the WSFL, including ones from the Workflow Management Coalition and the Business Process Management Initiative. But Sutor senses a growing desire among many to consolidate the various proposals and technologies into one, and adds that he doesn't see why intelligent compromises can't be made. '[WSFL] is not something we are trying to force down people's throats as the de facto standard. 
We think it has a lot of good ideas in it that are very consistent with some of the other Web services standards people are working on,' Sutor says. Sutor feels confident that the W3C will get a Web Services workflow group chartered by year's end, which can bring a number of proposals together into a single, cohesive standard..." See: "Web Services Flow Language (WSFL)."

  • [June 06, 2001] "Microsoft Continues Web Service Leadership With New XML Specs." By David Smith [Gartner Internet Strategies]. 25-May-2001. ['Microsoft's posting of specifications for three XML technologies again shows its leadership in developing Web service standards and may herald another cooperative effort with IBM to get a new standard approved by the World Wide Web Consortium (W3C)'] "...The announcement of these new specifications indicates Microsoft's continued leadership in XML standards development. Microsoft has previously demonstrated with SOAP, WSDL and, to some extent, UDDI that its first step toward standardization of these technologies is to post the specifications publicly. Six to 12 months later Microsoft submits the specifications to a standards organization, typically the W3C. That body's working group for XML Protocols (XMLP), which focuses on standardizing specifications for Web service technologies, is scheduled to produce its final recommendations by August 2001 and to disband by April 2002. Gartner believes Microsoft's introduction of these three technologies will convince W3C to extend the XMLP Working Group to focus on additional technologies and will extend the group by at least one year (0.8 probability). The addition of SOAP-RP allows SOAP to be routed through intermediate transports. Although SOAP 1.1 was already independent of HTTP transport, a single transport was required for an entire SOAP interaction. DIME allows for richer binary content such as images and audio to be more efficiently handled in an infrastructure optimized for XML text-based payloads. XLANG, the language implemented in BizTalk, which allows orchestration of Web services into business processes and composite Web services, is perhaps the most important of the three new specifications. Microsoft previously achieved recognition for WSDL by working with IBM. 
History may repeat itself here since IBM now has a similar technology to XLANG: In April, IBM published WSFL (i.e., Web Services Flow Language). Gartner expects IBM and Microsoft to jointly agree to submit a proposal to W3C that combines XLANG and WSFL by year-end 2001 (0.7 probability)..." See: (1) "XLANG" and (2) "Microsoft Publishes XML Web Services Specifications."

  • [June 06, 2001] "Web Services and XML Technologies CD." From IBM developerWorks. Announced in a posting from Jeffrey I Condon. "The Web services CD contains a selection of tools, examples, and articles for designing and developing Web services applications. It contains all the Web services applications that IBM has released to the general public, in addition to other useful tools that you may need. In addition to the software, the CD contains a set of all recent articles that have been published on the developerWorks Web services zone. These articles provide background information, tutorials, and news covering the protocols, techniques, and code used for creating Web services... IBM developerWorks is offering a Web Services and XML technology CD containing the following resources: All IBM developerWorks Web services articles; IBM Web services whitepapers; IBM developerWorks Web services newsletter; Web Services ToolKit; WSDL ToolKit; Web services Development Environment; Web Services Process Management ToolKit; Gourmet2Go Web services application; AggregationDemo Web services application; The IBM MQSeries transport for SOAP; Web Services Browser plug-in; WebSphere Preview Technologies for Developers; Tivoli Management Extensions for Java (TMX4J)." Contact: Jeffrey I Condon.

  • [June 06, 2001] "XML for Visio Scenarios." From Microsoft Corporation. June 2001. ['This article illustrates how XML for Visio can be used to extract Visio data for use in solution development, data analysis, text localization, Web publication, and database interoperability.'] "This article describes a new file format, XML for Visio, for native data in Microsoft Visio 2002. Extensible Markup Language (XML) is a tagged data format that is platform independent, vendor neutral, standardized by the World Wide Web Consortium (W3C), and widely available. XML is actually a metalanguage that forms the basis for other languages or vocabularies. Combined with W3C open standards and the ability to provide its own data definitions, XML is an enabling technology that provides the syntax for the expression of rich open data formats. The standard provides language and character set neutrality, unambiguous rules for white space, escape characters, extensibility, mixing of data models, and other syntactic details. Visio 2002 has defined an XML vocabulary (schema) that expresses all the Visio drawing, template and stencil data in its internal model. The XML extension rules allow users to attach and maintain custom data to a drawing. The familiarity of XML and the availability of standard tools give applications access to the Visio model without requiring the full Visio application. Open standards such as XML expand the opportunities and means of sharing and exchanging data. The XML for Visio scenarios in this article highlight some of the potential uses for this new format. If you are familiar with Visio but new to XML, the following summary will help you understand how to use the new XML for Visio format to its best advantage... XML for Visio Format: All types of Visio documents (drawings, stencils, and templates) can be saved in the XML for Visio format. 
Visio provides tag definitions for its document data in the XML for Visio schema, a separate document that lists the tags and their containment relationships. The schema generally follows the Visio object model and has predefined places for customized tags, which solution providers can use for preserving custom data. Solution providers can extract any XML data from the Visio documents for external processing by using existing XML tools, and then modify that data or create new drawings to display the results. Solution providers can extract customized shape definitions from the Masters section of the XML for Visio tag hierarchy. These shapes can then be shared, modified, or included in their custom solutions. Solution providers may be able to convert drawings to and from other drawing file formats using the XML for Visio schema, and by using XML as an import/export file format." See also (1) "Visio 2002 Incorporates XML Support with XML for Visio Format", (2) the Visio Developer Center, and (3) Visio Schema XDR, available for download.
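The extraction scenario described above can be sketched with standard XML tools, which is exactly the article's point. The element names below (VisioDocument, Masters, Master) mirror the tag hierarchy the article names, but this sample document is invented, not real Visio 2002 output; the real format is namespaced and far richer.

```python
# Sketch: pulling customized shape definitions out of the Masters
# section of an XML for Visio document. The sample document is invented.
import xml.etree.ElementTree as ET

visio_xml = '''<VisioDocument>
  <Masters>
    <Master ID="1" Name="Server"/>
    <Master ID="2" Name="Router"/>
  </Masters>
  <Pages><Page Name="Network Diagram"/></Pages>
</VisioDocument>'''

doc = ET.fromstring(visio_xml)
master_names = [m.get('Name') for m in doc.findall('./Masters/Master')]
```

No Visio installation is needed for this kind of processing; any XML-aware tool can read, filter, or rewrite the document, which is what "access to the Visio model without requiring the full Visio application" means in practice.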

  • [June 06, 2001] "Securing XML Documents with Author-X." By Elisa Bertino, Silvana Castano, and Elena Ferrari. In IEEE Internet Computing Volume 5, Number 3 (May/June, 2001). ['This Java-based access-control system supports secure administration of XML documents at varying levels of granularity.'] "The widespread adoption of XML for Web-based information exchange is laying a foundation for flexible granularity in information retrieval. XML can 'tag' semantic elements, which can then be directly and independently retrieved through XML query languages. Further, XML can define application-specific document types through the use of document type definitions (DTDs). Such granularity requires mechanisms to control access at varying levels within documents. In some cases, a single-access control policy may apply to a set of documents; in other cases, different policies may apply to fine-grained portions of the same document. Many other intermediate situations also arise... The typical three-tier architecture for accessing an XML document set over the Web consists of a Web client, network servers, and the back-end information system with a suite of data sources. In this framework, public-key infrastructures (PKIs) represent an important development for addressing security concerns such as user authentication. But such facilities do not provide mechanisms for access control to document contents nor for their release and distribution. Author-X is a Java-based system, developed at the University of Milan's Department of Information Science, to address the security issues of access control and policy design for XML documents. Author-X supports the specification of policies at varying granularity levels and the specification of user credentials as a way to enforce access control. 
Access control is available according to both push and pull document distribution policies, and document updates are distributed through a combination of hash functions and digital signature techniques. The Author-X approach to distributed updates allows a user to verify a document's integrity without contacting the document server. In this article, we will first illustrate the distinguishing features of credential-based security policies in Author-X, then examine the system's architecture, and conclude with details about its access-control and administration engines... In general, security policies state who can access enterprise data and under which modalities. Once policies are stated, they are implemented by an access-control mechanism. In Author-X, security policies for XML documents have the following distinguishing features: They can be set-oriented or instance-oriented, reflecting support for both DTD- and document-level protection. They can be positive or negative at different granularity levels, enforcing differentiated protection of XML documents and DTDs. They include options for controlled propagation of access rights, whereby a policy defined for a document or DTD can be applied to other semantically related documents and DTDs (or portions of them). They reflect user profiles through credential-based qualifications. Author-X security policies are implemented through six basic components: User Credentials, Protection Objects, Access Modes, Signs, Propagation Options, and Policy Base... The Web community generally regards XML as the most important standardization tool for information exchange and interoperability, and we believe that XML access control will constitute the core security mechanism of Web-based enterprise architectures. The current Author-X prototype is built on top of the eXcelon XML server and supports browsing and updating of DTD-based XML sources. 
We plan to extend protection toward secure access to Web pages and compliance with XML schemas. Additionally, we will experiment with incorporating Author-X within Web-based enterprise information system architectures by focusing on performance issues. In particular, we will study XML-based solutions for certifying user credentials as well as access-control schemes and architectures for securely disseminating information. We have proposed a preliminary set of XML-based access-control schemes for distributed architectures, and we are working on a prototype extending the Author-X functionalities accordingly."
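The element-level granularity the authors describe can be illustrated with a toy sketch. This is not Author-X code: the policy model here, a set of allowed element names per role, is far simpler than Author-X's six-component credential-based policies, and the medical-record document is invented.

```python
# Illustrative sketch only (not Author-X): enforce element-level access
# control by pruning the parts of a document a role may not see.
import xml.etree.ElementTree as ET

POLICY = {  # role -> element names that role may read (assumed policy base)
    'pharmacist': {'record', 'patient', 'prescriptions'},
    'doctor': {'record', 'patient', 'prescriptions', 'history'},
}

def filter_view(elem, allowed):
    """Return a copy of elem containing only policy-permitted elements."""
    if elem.tag not in allowed:
        return None
    out = ET.Element(elem.tag, elem.attrib)
    out.text = elem.text
    for child in elem:
        kept = filter_view(child, allowed)
        if kept is not None:
            out.append(kept)
    return out

record = ET.fromstring(
    '<record><patient>J. Doe</patient>'
    '<history>confidential notes</history>'
    '<prescriptions>aspirin</prescriptions></record>')

pharmacist_view = ET.tostring(filter_view(record, POLICY['pharmacist']),
                              encoding='unicode')
```

In a pull architecture a server would compute such a view per request; Author-X's push distribution instead encrypts different portions under different keys so one document release serves many policies.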

  • [June 06, 2001] "JXTA: A Network Programming Environment. [Industry Report.]" By Li Gong (Sun Microsystems). In IEEE Internet Computing Volume 5, Number 3 (May/June, 2001). "JXTA technology is a network programming and computing platform that is designed to solve a number of problems in modern distributed computing, especially in the area broadly referred to as peer-to-peer computing, or peer-to-peer networking, or simply P2P... JXTA technology is designed to provide a layer on top of which services and applications are built. We designed this layer to be thin and small, while still offering powerful primitives for use by the services and applications. We envision this layer to stay thin and small as this is the best approach both to maintaining interoperability among competitive offerings from various P2P contributors... In theory, JXTA can be independent of any format used to encode advertisement documents and messages. In practice, it uses XML as the encoding format, mainly for its convenience in parsing and for its extensibility. Three points worth noting about the use of XML: If the world decides to abandon XML tomorrow and uses YML instead, JXTA can be simply redefined and recoded to use the YML format. The use of XML does not imply that all peer nodes must be able to parse and create XML documents. For example, a cell phone with limited resources can be programmed to recognize and create certain canned XML messages, and still participate in a network of peers. To keep version 1.0 small, we used a light-weight XML parser that supports a subset of XML. We are working toward normalizing this subset according to an existing effort called MicroXML. JXTA provides a network-programming platform specifically designed to be the foundation for peer-to-peer systems. As a set of protocols, the technology stays away from APIs and remains independent of programming languages. 
This means that heterogeneous devices with completely different software stacks can interoperate through JXTA protocols. JXTA technology is also independent of transport protocols. It can be implemented on top of TCP/IP, HTTP, Bluetooth, Home-PNA, and many other protocols. We have developed a JXTA Shell, similar to the Unix shell, for writing scripts. Like the Unix shell, the JXTA Shell helps users learn a lot about the inner workings of JXTA during the process of writing scripts..."
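The "canned XML" point above, that a resource-limited peer need not carry a full XML parser, can be sketched as fixed message templates plus string matching. The advertisement structure here is invented for illustration and is not the real JXTA message schema.

```python
# Sketch of the "canned XML" idea: a constrained peer emits and matches
# fixed templates instead of parsing XML. Message structure is invented.
CANNED_PING = ('<jxta:Message><Type>ping</Type>'
               '<PeerID>{peer_id}</PeerID></jxta:Message>')

def make_ping(peer_id):
    """Produce a canned ping message for this peer."""
    return CANNED_PING.format(peer_id=peer_id)

def is_ping(message):
    # String matching, not parsing: sufficient for a fixed template.
    return '<Type>ping</Type>' in message

msg = make_ping('urn:jxta:uuid-1234')
```

A full peer would parse such messages with a real (or MicroXML-subset) parser; the constrained peer interoperates anyway because both sides agree on the wire format, not on an API.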

  • [June 06, 2001] "Java Vendors Need to Broaden Standards Support." By Mitch Wagner. In InternetWeek (June 4, 2001). "While vendors have been doing a good job using standard interfaces to build new, Java-based e-business applications, they need to go further to be sure that the applications interact with legacy enterprise software, said analysts and users. Leading Java software vendors such as Sun Microsystems, IBM, Hewlett-Packard, Oracle and BEA Systems are using Java 2 Enterprise Edition (J2EE) and associated standards to build new applications with browser-based front ends, and connect those applications to back-end legacy systems, said Nick Gall, an analyst with Meta Group. But the result is often a 'stovepipe' application that can't interoperate with other applications, he said. While vendors are supporting standards in building application servers, they need to commit to taking the next step and supporting standards in their Enterprise Application Integration (EAI) platforms. Right now, companies like IBM with WebSphere and BEA with WebLogic support many EAI standards, but also compete with each other by offering incompatible, proprietary technologies in areas such as workflow and messaging, Gall said... BEA, Hewlett-Packard, Oracle and Sun responded to the call for standards by extending their Java middleware to incorporate new standardized APIs. The new versions of the applications were introduced at the Sun JavaOne conference, the annual gathering of Java developers, in San Francisco this week. [week of 2001-04-04] BEA introduced WebLogic 6.1, which automatically binds Java 2 Enterprise Edition (J2EE) applications to Web services standards. Developers write applications as Enterprise Java Beans (EJBs), and WebLogic Server 6.1 automatically adds the appropriate Simple Object Access Protocol (SOAP) and Web Services Description Language (WSDL) interfaces to allow those applications to be invoked as services over the Web. 
BEA is also introducing several components in the WebLogic Integration 2.0 server. The server supports Java Connector Architecture (JCA), a standard interface for invoking enterprise applications over the Internet. WebLogic Integration also includes translators for EDI to XML, to allow developers to use EDI software as a service. And the WebLogic Integration server includes an extensible framework for XML protocol support, to allow developers to plug in modules for various XML protocols, including BizTalk, RosettaNet, and ebXML. The server also includes a business-process integration module, to allow developers to create business processes using a Visio-style interface, and then expose those business processes to operate as services over the Internet. Hewlett-Packard shipped Total-e-Server Version 7.3, the HP application server. The new version adds JCA support to connect the Web server with enterprise software. The company also plans to introduce the HP Internet Server, which HP said it considers to be either a high-end Web server or a low-end application server, depending on how you look at it. The server runs HTTP, JSP and Java Servlets, which are server-side Java programs..."

  • [June 06, 2001] "Securing Web Services using the Java Platform and XML." By Andrew Brown, Loren Hart, and Monica Pawlan. From Java Developer Connection. June 15, 2001. "In today's fast-moving world of e-commerce and information technology, savvy companies realize that to stay competitive they have to make their products and services available over the Internet. Application-to-application cooperation and communication where one company needs the products or services of another to conduct business is at the core of Web-based business-to-business communications. To enable smooth, reliable, secure, and standardized cooperation and communication, more and more companies are taking advantage of Web services. Initiatives like the Universal Description, Discovery, and Integration (UDDI) specification define ways to discover and integrate Web-based services from all over the world. Sun Microsystems, with its new Web services strategy, is no exception, especially given that its platform-independent and versatile Java technology is ideal for developing Web services... a Web service might be made up of companies (providers) in the same business sector who create software standards for setting up services to buy and sell parts. A Web service architecture is made up of providers who publish the availability of their services; brokers who register and categorize provider services and make search engines available; and requesters who use brokers to find a provider service. Web service providers need a communication standard and a way to verify the identity of companies and individuals with whom they are doing business. Extensible Markup Language (XML) has become the communication standard and Public Key Infrastructure (PKI) the verification standard. This article describes an example scenario where companies cooperate and communicate over the Internet to buy and sell parts. 
It also presents an example program written in the Java programming language that uses VeriSign's Trust Web service, an implementation of the XML Key Management Specification, to do cryptographic key management over the Internet using XML messaging... The XKMS specification is open, which means any company can implement an XKMS service and count on full interoperability. To encourage developers to begin using these new Web services, VeriSign has sponsored a site devoted to XML Trust Services, called the XML Trust Center, where developers can find the Java implementation of the XKMS client API. The XKMS client API includes an implementation of the XML Digital Signature specification, which provides API packages for digitally signing XML documents. An application can use Java APIs to generate cryptographic key pairs and use XKMS APIs to register those keys with an XKMS service. Public-private key pairs are registered with an XKMS service by sending the proper information about the keys in an XKMS XML message. This combination of APIs and services lets applications offload all key management operations, including key revocation in the event a key is compromised, and key recovery in the event a key is lost..." See: "XML Key Management Specification (XKMS)."

  • [June 06, 2001] "Souping Up Wireless." By Anne Chen. In ZDNet Ecommerce (June 3, 2001). "Sometimes Cindy Groner must feel like she's swimming in circles in a sea of acronyms. As director of mobile traveler services at Sabre Inc., Groner oversees a team of wireless developers who, in the process of coming up with new wireless services for Sabre's customers, must create multiple versions of each page. One in the Wireless Markup Language format for Wireless Application Protocol devices. Another in Handheld Device Markup Language for devices using the Openwave Systems Inc. browser. And another in Compact HTML for Nippon Telegraph and Telephone Corp.'s i-mode devices. Every time content or wireless services change, developers must test the application on multiple devices to make sure the experience is the same on all platforms, a strenuous process... Groner said she believes help is on the way. And it doesn't even matter that it's coming in the form of yet another acronym: XHTML. It stands for Extensible HTML, and it's a rapidly emerging standard that could soon allow e-businesses such as Sabre to write online applications just once and deliver them across multiple platforms -- whether wireless or PC-based. Having spent millions of dollars developing and deploying wireless applications for multiple devices -- cell phones, PDAs (personal digital assistants) and televisions -- companies such as Sabre, IBM and Inc. are now planning to make XHTML a critical component of their wireless e-business strategies by pilot testing the new standard even before it's supported on mobile devices or embedded into networks... XHTML solves the problem by being more modular and structured than its predecessor. Essentially a marriage between HTML and XML (Extensible Markup Language), XHTML is able to ensure that only code suitable for smaller browsers is transmitted to wireless devices, something HTML is unable to do. 
Because its mobile subset, XHTML Basic, is essentially the same language designed to deliver Web content to devices ranging from mobile phones and PDAs to pagers and television-based browsers, it lets programmers write content for PCs and mobile devices at the same time without conversion. XHTML Basic, which was recommended as a standard by the W3C last December, includes features from many existing wireless protocols, enabling developers to take advantage of the larger color screens and greater graphics capabilities of new devices designed for networks that run at higher speeds. The wireless industry's big guns, including the WAP Forum and Nippon Telegraph and Telephone, of Tokyo, have already announced that they will support XHTML as the standard for next-generation browsers and other mobile devices (see story below). Handset manufacturers Nokia Corp., Motorola Inc. and Ericsson SpA, and mobile operators Vodafone Group plc., Orange SA and Telecom Italia SpA are also backing the standard and plan to develop products, content and services based on XHTML..." See: "XHTML and 'XML-Based' HTML Modules."
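A minimal XHTML Basic 1.0 page illustrates the "write once" idea described above: the markup is ordinary well-formed XHTML validated against the XHTML Basic DTD, so the same document can be served to phone browsers and PCs alike. The flight-status content below is invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
    "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Flight Status</title></head>
  <body>
    <h1>Flight Status</h1>
    <p>Flight 123: on time.</p>
  </body>
</html>
```

Because every element is properly closed and lowercase, a small-device browser can parse the page with a strict XML parser instead of the forgiving tag-soup parsers desktop HTML requires.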

  • [June 06, 2001] "Services Battle to Heat Up." By Roberta Holland. In eWEEK (June 4, 2001). "The legal wranglings between Microsoft Corp. and Sun Microsystems Inc. may be over, but the competitive ones are not. This week, as Sun executives preach the Java gospel at the JavaOne conference, they will wage a battle on a new front: Web services. While the Sun ONE (Open Net Environment) strategy is seen as lagging behind Microsoft's .Net, Sun will prop up Java 2 Enterprise Edition as the best platform for Web services, trying to leverage J2EE's success in the enterprise. Sun also will outline its road map for incorporating additional Web services standards into J2EE. 'Certainly the competition of the next year or two is going to be in Web services, and, right now, it seems Microsoft is out in front,' said Peter Horan, CEO of Inc., an online resource for developers. But Horan, in Palo Alto, Calif., said there is still tremendous momentum around Java the language, adding that better tools are necessary to keep Java adoption growing. "Corporate developers and IT managers still believe the tool sets for C++ and Visual Basic are more fully developed," he said. While Java tools have improved greatly since the language's early days, both Sun and Microsoft need to deliver tools to build Web services, developers said. Both companies have sought to enlist partners in the battle, including Bowstreet Inc., Genuity Inc. and i2 Technologies Inc. on the Sun side and eBay Inc., Fujitsu Software Corp. and ActiveState Tool Corp. for Microsoft. Chief among Sun's goals this week is to show how it is easier for developers to build Web services using Java. Among the announcements at the conference in San Francisco will be a bundle for developers of Java APIs for XML (Extensible Markup Language) parsing, packaging and routing. 
Sun will also unveil a new version of its Forte for Java tool, with native support for XML; SOAP (Simple Object Access Protocol); Universal Description, Discovery and Integration; and Web Services Description Language... Oracle Corp., of Redwood Shores, Calif., will unveil its Oracle9i application server, its first version to be certified J2EE- compliant, with performance upgrades, SOAP support and new caching technology. The middleware division of Hewlett-Packard Co., in Palo Alto, Calif., will release its implementation of Java Services Framework, a specification that describes how to assemble components into Java server applications, along with a new Internet server. WebGain Inc. plans to release upgrades to several of its Java tools, including the WebGain Studio suite..."

  • [June 05, 2001] "XSD for Visual Basic Developers." By Yasser Shohoud. From the DevXpert Web Services Depot [for VB Developers]. May 2001. "The W3C's XML Schema is sometimes referred to as XML Schema Definition language or XSD for short. XSD is an XML-based grammar for describing the structure of XML documents. A schema-aware validating parser, like MSXML 4.0, can validate an XML document against an XSD schema and report any discrepancies. To solve the [invalid invoice document] problem outlined above, you'd create an XSD schema that describes the invoice document. You'd then make this schema available to the UI tier developers. The schema is now part of the 'interface contract' between the middle tier and the UI. While the application is in development, the UI tier can validate the invoice documents that they send against that schema to ensure they are valid. Similarly, the SaveInvoice function can validate the input invoice document against the schema before attempting to process it. Now if you change the invoice document to support a new feature, you must change the schema accordingly. Now the UI team tries to validate the invoice documents they're sending and this validation fails, so they immediately realize that the schema has changed and that they must change the invoice documents they are sending. This can also help catch version mismatch problems where you have an older client trying to talk to a newer middle tier or vice versa... In this brief introduction to XSD, you've seen how you can map a Visual Basic class to an XSD schema and how to use that schema with MSXML 4.0 to validate documents. You also learned the relation between XSD and XML namespaces and how namespaces can be used to combine elements from different schemas in one XML document. This tutorial barely scratches the surface of what you can do with XSD schemas. There are many more features and details you might be interested in (or might not care about). 
Once you are comfortable with the concepts explained in this tutorial, check out the XML Schema Primer (part of the XSD specification), which goes into much more detail about XSD with many examples..." See "XML Schemas."
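To make the "interface contract" idea concrete, here is a small, hypothetical sketch of what such an invoice schema might look like; the element and type names are invented for illustration, not taken from the article. Note the named simple type, which can be reused by any element that carries a monetary amount:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- A reusable ("archetype"-style) simple type: a non-negative
       decimal with at most two fraction digits. -->
  <xs:simpleType name="moneyType">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- The invoice document element itself. -->
  <xs:element name="Invoice">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="InvoiceNumber" type="xs:string"/>
        <xs:element name="Amount" type="moneyType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A validating parser given this schema would reject an invoice whose Amount is negative or non-numeric before the middle tier ever sees it, which is exactly the early-failure behavior the article describes.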

  • [June 05, 2001] "Introduction to UDDI." From the DevXpert Web Services Depot [for VB Developers]. June 02, 2001. ['This article walks you through the basics of Universal Description, Discovery, and Integration (UDDI). It describes scenarios where UDDI would be useful and shows you how you can implement UDDI in those scenarios. Learn what UDDI is all about, how it works, and how you can program it.'] "One of the primary potential uses of Web services is for business-to-business integration. For example, company X might expose an invoicing Web service that the company's suppliers use to send electronic invoices. Similarly, a vendor V might expose a Web service for placing orders electronically. If company X wanted to purchase computer equipment electronically, it would need to search for all vendors who sell computer equipment electronically. To do this, company X needs a yellow pages-type directory of all businesses that expose Web services. This directory is called Universal Description, Discovery, and Integration or UDDI. UDDI is an industry effort started in September of 2000 by Ariba, IBM, Microsoft, and 33 other companies. Today, UDDI has over 200 community members. Like a typical yellow pages directory, UDDI provides a database of businesses searchable by the type of business. You typically search using a business taxonomy such as the North American Industry Classification System (NAICS) or the Standard Industrial Classification (SIC). You could also search by business name or geographical location... If you write commercial business software, you should start thinking about leveraging UDDI to make it easy for your software users to publish their Web services and to find other Web services that they need. If you work inside a large organization with several divisions each busy building Web services, you should consider using an internal, UDDI-like registry of Web services that are available within your organization. 
Whether for commercial or internal uses, you can program the UDDI APIs directly by sending and receiving SOAP messages. If you program in a COM-aware language you can use the Microsoft UDDI SDK, which handles all the SOAP and XML work and lets you program against a COM-based object model." See: "Universal Description, Discovery, and Integration (UDDI)."
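To give a feel for what "sending and receiving SOAP messages" means here: a UDDI version 1 name-based inquiry is a small XML document carried in a SOAP envelope. The sketch below follows the UDDI v1 inquiry API (find_business in the urn:uddi-org:api namespace); the business name is invented for the example:

```xml
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
  <Body>
    <!-- Ask the registry for businesses whose name starts
         with the given string. -->
    <find_business generic="1.0" xmlns="urn:uddi-org:api">
      <name>Acme Computer Supply</name>
    </find_business>
  </Body>
</Envelope>
```

The registry answers with a businessList document containing matching businessInfo entries, from which a client can drill down to the service bindings it needs. SDKs such as the Microsoft UDDI SDK mentioned above wrap this exchange in an object model so you never build the envelope by hand.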

  • [June 05, 2001] "Sun Redraws Java Blueprint Around Web Services." By Mark Leon, Ed Scannell, and Eugene Grygo. In InfoWorld (June 4, 2001). "In the latest move in its competition with IBM and Microsoft, Sun this week at its JavaOne developer conference will leave no room for doubt: The Web services race is on and Sun is in the running. As next-generation software development converges around XML-based standards, Sun this week will recast the Java 2 Enterprise Edition (J2EE) and its own products to coexist with Web services standards. Sun's competitors, notably BEA Systems and IBM, will also detail their plans to tie Java and Web services at the conference. Sun officials argue that the Web services concept -- Web-centric applications loosely coupled with XML -- breathes new life into Java development and its Sun ONE framework... In a bid to get more developers to build those applications with Java, Sun will make several announcements detailing new Web services support in its iPlanet Application Server products and Forte development tools. Developers will get access to a J2EE Service Pack designed to simplify the creation of XML-based services. 'You will be able to visually develop Enterprise JavaBean [EJB] components, assemble them into J2EE applications, and then automatically deploy them to the iPlanet Application Server,' said Sanjay Sarathy, director of product marketing for the Application Server Group at Sun. The Service Pack will be available soon for developers to download and will become part of J2EE with the Version 1.4 release sometime next year... Sun also will seek to make its Forte for Java development environment more attractive to less technically savvy developers. One feature, called Java Web Services Designer, will allow Web developers who work with Macromedia's Dreamweaver or Adobe's GoLive products to access an XML services-based registry... 
Also included in Forte for Java release 3.0, due this summer, is a set of wizards that will automatically bind Java and XML so that developers can more easily create Web services and publish them in a registry." See the announcements: (1) "Industry Effort to Define Native Web Services Support in J2EE. Industry Leaders Band Together Using Java Community Process"; (2) "Web Services Pack to Simplify Building Java-Based Web Services. Major Vendors to Integrate Open Technologies in Java Web Services Tools."

  • [June 05, 2001] "Gates launches Office XP." By Jennifer DiSabatino. In ComputerWorld (May 31, 2001). "With much fanfare, including rock music and flashing lights, Microsoft Corp. Chairman and Chief Software Architect Bill Gates today officially launched the latest version of his company's ubiquitous Office software known as Office XP. Gates was in full marketing mode as he led an hour and a half of Office XP feature demos and testimonials... Gates also touted XML as an integral part of Office XP. 'We're designing all our software products from the ground up around XML,' he said. 'Office XP is the first version of Office that supports XML.... It's our view that XML is going to unlock a lot of business processes that have been paperbound and bring them onto the network, and we need to use the standard Office interface as the way that people can navigate that information'... There's enough benefit to XP to skip Office 2000, Silver said, adding that XP is more like an upgrade of 2000 anyway. Users should still take precautions, he said, by testing the software to make sure it's stable in a given environment and then deploying from there. Currently, about 245 million people worldwide use Office products, according to David Bennie, Microsoft group manager for Office/Exchange and product marketing. In addition to the features outlined last week for the new version of Outlook, the e-mail software that comes bundled in the Office software, Microsoft has also added smart tags, which link content in Word documents to Web sites. There is also better version control on Word documents, with revisions color-coded and placed in the margins automatically when the author merges different versions in the main document. Office XP is also tightly integrated with the SharePoint Portal system, a knowledge management tool and collaboration application..." 
See the announcement: "Gates Demonstrates at Office XP Launch How Office XP Unlocks Hidden Knowledge And Unleashes Next Wave of Productivity Gains. Ford, UPS and LexisNexis Show How Office XP Dramatically Improves Personal and Business Productivity."

  • [June 05, 2001] "DIDL: Packaging Digital Content." By Vaughn Iverson, Todd Schwartz, and Mark Walker. From May 30, 2001. ['Internet applications generally fall short in their ability to transfer multimedia content. This article describes an XML vocabulary for packaging digital content, breaking the one-to-one mapping between the notion of a content item and an individual file.'] "In this article we detail the reasons for undertaking the development of a digital packaging standard and describe in depth a package manifest scheme that potentially addresses the enumerated needs. In doing so, we show how such a scheme effectively disassociates the notion of content item from individual files. We conclude by describing an XML vocabulary, the Digital Item Declaration Language (DIDL), a recently released first working draft from ISO/MPEG that will, when completed, provide standard means for packaging digital content... Today's popular Internet applications generally fall short in their ability to transfer raw resource content. The content of a web page for example may be defined as the collection of discrete resources -- bitmaps, JPEG images, text blocks, and so on -- that are aggregated within some predetermined format. The components of the web page may possess attributes and relationships that, while not explicitly part of the final, viewable form, may be critical in generating the displayed result. Information accompanying a JPEG image, for example, could be utilized in creating a photo caption. Information about the relationships among a group of images could be utilized in locating the images on the page. If the web page is generated from a script, information on the sizes of the various images could be utilized to decide which images to begin downloading first... Internet-transacted digital content is a reality, but the lack of standards makes it very difficult for non-technical users to obtain and transmit content. 
Content that is transacted generally is not interoperable across platforms and is still tightly bound to the directory/file paradigm which greatly limits its flexibility. The MPEG-21 Digital Item Declaration Language addresses these and related problems by providing a relatively simple, standard method for describing complex, multicomponent content source collections." [Note: see the MPEG-21 Overview and related XML design work in the MPEG-7 (Multimedia Content Description Interface) activity of the Moving Picture Experts Group under ISO/IEC JTC1/SC29/WG11. The MPEG-7 is a 'content representation standard for multimedia information search, filtering, management and processing'; the WG has produced the "Description Definition Language (DDL)" as an XML-based specification for multimedia metadata.] See (1) "Moving Picture Experts Group: MPEG-7 Standard," and (2) "MPEG-21 Part 2: Digital Item Declaration Language (DIDL)."
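To give a rough feel for the vocabulary, a digital item declaration nests descriptive metadata and resources inside an item. The fragment below is a non-normative sketch based on the working draft's element names as summarized above (Item, Descriptor, Statement, Component, Resource); the content and attribute values are invented for illustration:

```xml
<DIDL>
  <Item>
    <!-- Metadata about the item as a whole. -->
    <Descriptor>
      <Statement mimeType="text/plain">Vacation photo album</Statement>
    </Descriptor>
    <!-- A component binds one or more concrete resources. -->
    <Component>
      <Resource mimeType="image/jpeg" ref="http://example.com/photos/beach.jpg"/>
    </Component>
  </Item>
</DIDL>
```

The key point, per the article, is that the declaration, not the file system, defines what the content item is: the same Item can reference resources scattered across servers, breaking the one-to-one mapping between content item and file.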

  • [June 05, 2001] "The State of XML: Why Individuals Matter." By Edd Dumbill. From May 30, 2001. ['A survey of the progress of XML over the last year, emphasizing that in an industry increasingly dominated by large vendors, individual contributors are still key.'] "This article is adapted from the closing keynote speech I [Edd Dumbill] delivered at XML Europe 2001 in Berlin, May 2001. I describe the progress of XML over the last year, emphasizing that in an industry increasingly dominated by large vendors, individual contributors are still key. XML has a tendency to spark new beginnings. Many existing technologies are being re-engineered to take advantage of XML, gaining interoperability benefits previously too costly to realize; industries are finding that XML vocabularies can form a basis for collaboration and cost-cutting, where such cooperation was previously thought counterproductive. XML's influence is proving disruptive to the technological status quo. For better or for worse, many parts of today's computing infrastructure are being re-examined in the light of XML. For better, in that the benefit to be gained from interoperability at the syntax level is large. For worse, in that lessons from the past are being overlooked; however, not learning from history is too broad a charge to lay on the shoulders of overzealous XML developers alone... The progress of adoption and change wrought by XML has accelerated over the last year, but with it comes certain dangers. XML must not be allowed to become so complex that it defeats the point of its original creation and unacceptably raises the level of financial and technological resource needed to use it. A growing reliance on vendor products also runs the risk of creating an identifiable market growth area, which, when it inevitably hits a decline, could take a chunk of XML as a technology down with it. 
Because of these dangers, the role of individual contributors in the XML community (whether affiliated with a company or not) is more important than ever. They remain among the most creative and influential participants in the development of XML."

  • [June 05, 2001] "XML-Deviant: Schema Scuffles and Namespace Pains." By Edd Dumbill. From May 30, 2001. ['W3C XML Schema is complete. End of story? No way! Debates over Schema best practice have dominated XML-DEV over recent weeks.'] "...Kohsuke Kawaguchi posted a reference to an article, XML Schema Dos and Don'ts, which gives his best practice for keeping XML Schemas simple... In response to a message that implied "co-constraints" will be introduced as a feature in XML Schema 1.1 or 2.0, Rick Jelliffe seemed doubtful such functionality would be in XML Schema 1.1. Co-constraints are constraints on instances where the permissible values of one element depend on the value of a different element..."
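A co-constraint is easy to state in code even though W3C XML Schema 1.0 cannot express it declaratively (rule languages such as Schematron can). The sketch below, with element names invented for illustration, enforces a typical co-constraint: a cardNumber child is permitted, and required, exactly when the method element's value is 'card':

```python
import xml.etree.ElementTree as ET

def check_payment(doc: str) -> bool:
    """Co-constraint check: <cardNumber> must be present if and
    only if <method> is 'card'. This kind of cross-element rule
    is what XML Schema 1.0 cannot capture on its own."""
    root = ET.fromstring(doc)
    method = root.findtext("method")
    has_card_number = root.find("cardNumber") is not None
    return has_card_number == (method == "card")
```

In practice such checks run after ordinary schema validation, as a second, application-level validation pass.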

  • [June 05, 2001] "Xalan. Sun gives translets technology to Apache XML Project. Size and speed seen as major benefits." By Natalie Walker Whitlock (Casaflora Communications). From IBM developerWorks. May 31, 2001. "Sun Microsystems announced that it has donated its proprietary XSLT compiler technology to the open-source Apache XML Project. Part of the Sun XSLT Compiler -- commonly referred to as 'translets' -- will be made available to the nonprofit Apache organization to be incorporated into the Xalan XSLT engine. This technology attracted the interest of many Java/XML developers who learned about it through technical conference discussions and mailing-list exchanges... Typically, the XSLT process involves three parts: an XML file, an XSLT style sheet that describes and directs the transformation, and an XSLT engine that takes both files as inputs and produces the desired transformed output. These traditional XML transformation engines tend to be large and complex programs. The Sun XSLT Compiler takes a novel approach to XSLT processing. With the XSLT Compiler, the transformation is simplified into two pieces. According to David Hofert, Sun XML Technology Development Group leader, the primary step takes an XSLT style sheet as input and produces a Java class as the output. This compiling step takes the style sheet and creates a Java binary class file as an output -- known as a translet. The second step is to apply the translet to any XML files relevant to the style sheet. In addition to boasting small size, sample translets have performed three to ten times better than James Clark's XT transformation engine, according to Sun's testing. Sun attributes the compiler's increased speed to the unique internal representation of the Document Object Model (DOM) used by the translet, and to the fact that the translet is created by writing directly to Java assembler code, which is converted directly into Java byte code..."

  • [June 05, 2001] "What's the 'diff'? Some suggestions for comparing semantic equivalency of XML documents." By Brett McLaughlin (Enhydra Strategist, Lutris Technologies). From IBM developerWorks. May 2001. ['How can you tell whether two XML documents are equivalent? Brett McLaughlin explains why answering this common question is more than a trivial task. The explanation shows how to go about comparing XML documents, including how to deal with significant and ignorable whitespace and external entity references. Code samples include DTDs and SAX EntityResolver examples. This article assumes a basic knowledge of XML and a conceptual understanding of SAX.'] "Recently I went about trying to answer a simple question about how to compare XML documents to find out whether they're the same. The answer is not so simple, because it enters the shadowy realm of semantic equivalence... when comparing XML, you're going to want to formulate DTDs that constrain the documents you're comparing as closely as possible. In particular, if an element can contain only other elements, be sure to indicate that in the DTD. That precision will assure that any whitespace in your documents is ignored when working with APIs like SAX, DOM, and JDOM... [Summary:] Now you ought to have a solid understanding of what it means to say that two XML documents are 'the same.' You know why simple programs like diff simply are not enough for comparing XML documents. I hope that you can use some of the code shown here to begin to isolate comparison points in XML documents so that you can more easily perform XML comparisons..."
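As a rough illustration of the idea (a sketch, not the article's own code), the comparison below treats two documents as equivalent when they agree on element structure and attributes and on whitespace-stripped text, so attribute order and indentation differences, which defeat a plain textual diff, are ignored:

```python
import xml.etree.ElementTree as ET

def semantically_equal(a: str, b: str) -> bool:
    """Compare two XML documents, ignoring attribute order and
    surrounding whitespace in text, a rough notion of 'semantic'
    equality."""
    return _elements_equal(ET.fromstring(a), ET.fromstring(b))

def _elements_equal(e1, e2):
    if e1.tag != e2.tag:
        return False
    # Dict comparison makes attribute order irrelevant.
    if e1.attrib != e2.attrib:
        return False
    # Strip leading/trailing whitespace from text and tail.
    if (e1.text or "").strip() != (e2.text or "").strip():
        return False
    if (e1.tail or "").strip() != (e2.tail or "").strip():
        return False
    if len(e1) != len(e2):
        return False
    return all(_elements_equal(c1, c2) for c1, c2 in zip(e1, e2))
```

Note what this deliberately does not handle: entity resolution, namespace prefix differences, and DTD-declared defaults, exactly the subtleties the article discusses.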

  • [June 05, 2001] "Translating XML Schema." By Timothy Dyck. In eWEEK (May 28, 2001). "Earlier this month at the Tenth International World Wide Web Conference in Hong Kong, XML took its biggest step forward since the document format was first standardized in February 1998. At the conference, the World Wide Web Consortium released XML Schema as a W3C Recommendation, finalizing efforts that started in 1998 to define a standard way of describing Extensible Markup Language document structures and adding data types to XML data fields. Now that it is finally out, the long-delayed XML Schema standard will catalyze the next big step in XML -- allowing cross-organizational XML document exchange and verification. Just as discovery of the Rosetta stone in 1799 provided a way to fix the meaning of Egyptian hieroglyphs so they could be understood across the gulf of two millennia, XML Schema provides a way for organizations to fix the meaning of XML documents so they can be understood across the gulf of organizational boundaries and otherwise incompatible IT architectures. As a result, XML Schema will be a cornerstone in the new e-commerce architecture that we are collectively building and will be a vital component for making business exchanges and other loose associations of trading partners possible. The arrival of XML Schema, more than three years after XML itself, has left many chafing at the bit (and others, such as Microsoft Corp., running off in their own direction implementing and shipping products based on prestandard efforts), and the market is now more than ready for this standard to take hold. However, XML Schema's long development cycle gave vendors time to understand the specification and start writing compliant software, and we are now seeing the rapid release of XML Schema-compliant (or soon-to-be-compliant) authoring tools and servers... 
That long, committee-driven development cycle also resulted in a specification that has a bit of everything in it, and fully compliant XML Schema parsers will have to be complex pieces of software to support all the options the specification allows. Fortunately, XML Schema documents have to reference only the functionality they need, and the more complex options in XML Schema, such as null elements and explicit types, may just fade away through disuse. The W3C recently published a recommendation on how to group Extensible HTML, the consortium's replacement for HTML, into well-defined subgroups so XHTML browsers (such as those in cellular phones) can clearly define which parts of the language they support and which they don't. Something similar is a possibility for XML Schema if the full specification proves too difficult to implement for some vendors (although large players such as IBM, Microsoft and Oracle Corp. are moving ahead full speed with plans to support the full specification as published). Over the next few years, eWeek Labs predicts XML Schema will become integral to the way that many companies exchange information..." For schema description and references, see "XML Schemas."

  • [June 05, 2001] "[W3C XML Schema] Speedy Adoption Expected." By Jim Rapoza. In eWEEK (May 28, 2001). "When XML was introduced, although there were early adopters, it still took about a year before Extensible Markup Language began to be regularly used in enterprise-level applications and deployments. Now that XML Schema is a standard, the waiting period for its adoption should be much shorter. Part of this can be attributed to how long businesses have been waiting for this schema. Many have been working on tools and compatibility issues while the standard was under development. However, it is also due in part to the complexity of the schema. Whereas the initial XML standard could be easily built and managed by anyone with an editor, many vendors plan to provide new tools to help shield users from the size and complexity of XSD (XML Schema Definition). Given the importance of XML Schema for handling data-driven communications among businesses, eWeek Labs recommends that developers begin evaluating tools that will help them move to XSD. In addition, companies should find out what their enterprise software vendors' plans are for supporting and integrating with XML Schema. As is true of most standards, many of the initial sets of XML Schema tools are essentially validators that help developers stay within the standard. Several are from individual World Wide Web Consortium members and universities, but some are also available from vendors such as IBM, and Java-based validators are available from Sun Microsystems Inc... Another important set of tools for businesses moving to XML Schema are conversion tools, which will help developers convert content to the new standard. Probably the most important will be tools for converting standard XML DTDs (Document Type Definitions) to XSD, although some of those currently available have not been updated to the final standard. 
There are also tools for converting files from other schema languages, including a tool from Microsoft Corp. for converting files from XML Data Reduced to XSD...Microsoft recently released betas of MSXML and SQLXML that support the schema and has said that most of its products will support XSD in their next versions. Sun has released a new XML data types library that supports the final XML Schema standard, and Tibco Software Inc. includes tools for validating documents using XSD...
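As a point of reference for such conversions, a DTD can only declare an element's content as character data, while the final XSD syntax can constrain the value itself using facets. A minimal sketch (the element name and bounds are invented for illustration):

```xml
<!-- Hypothetical fragment: 'quantity' typed as an integer from 1 to 100.
     A DTD could only declare the same element as #PCDATA. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="quantity">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="100"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

This kind of value constraint is precisely what a DTD-to-XSD converter cannot produce automatically, since the DTD never contained the typing information to begin with.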

  • [June 05, 2001] "Other XML Standards Get Ready to Roll." By Jim Rapoza. In eWEEK (May 28, 2001). "As XML has progressed down the technological road since its introduction in 1996, it has steadily gained momentum, to the point where most other World Wide Web Consortium standards are now based on Extensible Markup Language. But for the last two years, the giant, wide-body truck that has slowed its progress has been the development of XML Schema as a standard. Now that the W3C has finally gotten XML Schema into gear, what's next for XML? eWeek Labs believes several core XML technologies will probably become standards (or Recommendations, as the W3C calls them) this year and, for the most part, all will help improve the interoperability of XML-based data and applications. Also, not surprisingly, most of these related technologies were initially proposed around the same time as XML Schema. The XML Information Set, which is expected to reach recommendation status next month, will provide a common reference set for defining abstract objects such as elements within a document. The main goal here isn't to provide a definitive set of definitions but to provide a base that will improve interoperability among XML tools and applications. Later this year, several technologies pertaining to XML linking -- XLink, XBase, and XPointer -- should become standards or reach candidate status. All these technologies deal with hyperlinking within XML documents, in a manner similar to the way Uniform Resource Identifiers work. All three will enable a much more complex and multilayered linking than what is currently possible in HTML and XML. Whereas the other technologies listed here have been around for almost two years, XML Query was introduced just this year and is probably at least a year away from becoming a standard...

  • [June 05, 2001] "Using Schema and Serialization to Leverage Business Logic." By Eric Schmidt. From Microsoft MSDN Online. 'Extreme XML' Column. May 17, 2001. ['New columnist Eric Schmidt addresses how you can use schemas and serialization technology to leverage XML in your applications and services.'] "In this issue of Extreme XML, we are going to examine the importance of schema usage and the use of serialization technology to leverage XML in your applications and services. The majority of development tasks today revolve around developers taking existing infrastructure (business components, databases, queues, and so on) and morphing them into the next version of their product... The surge of XML usage over the past several years has not led to a complementary increase in defined data models for XML documents. For this section, I am referring to a data model for XML to be the structure, content, and semantics for XML documents. The one main reason for this slow growth in XML data models is the lack of, until now, a robust XML schema standard. Document Type Definitions (DTDs) have outgrown their usefulness in the enterprise space because of their focus on XML from a document perspective and not viewing XML document instances from a data and type perspective. Typed data items like addresses, line items, employees, orders, and so on have complex models and are the basis for most applications. Applications look at data from a strongly typed perspective. For example, a Line Item is an inherited member of an order and contains typed information like product price, which is of type currency. The majority of this type of modeling cannot be accomplished with DTDs. Due to the simple structuring and typing mechanisms in DTDs, numerous XML validation, structuring, and typing systems have been created, including Document Content Description (DCD), SOX, Schematron, RELAX, and XML-Data Reduced (XDR). 
The latter, XDR, has gained much momentum in the Windows and B2B-based communities due to its usage in products like SQL Server, BizTalk Server, and MSXML. In addition, most independent software vendors (ISVs) and B2B integrators support XDR because of its data typing support, namespace support, and its XML-based language. However, XDR's usefulness still falls short of providing a truly extensible modeling and typing system for complex data structures. This was a known issue at the time of XDR's creation. Building on the lessons learned from previous schema implementations, the W3C XML Schema working group set out to create a specification (XML Schema) for defining the structure, content, and semantics of XML documents. Ultimately, this specification should provide an extensible environment so that it could be applied to any type of business or processing logic. During the development of this article, I was pleased to see that the W3C released XML Schema as a recommendation. This is a tremendous step in solidifying and stabilizing XML-based implementations that need to employ schema services. Next, we're going to look at the importance and power behind XML Schema... I have distilled five core items you need to know about XML Schema so you can get up and running: (1) XML Schema is represented in XML 1.0 syntax; this makes parsing XML Schema available to any XML 1.0-compliant parser, and thus it can be used within a higher-level API like the DOM. (2) Data typing of simple content: XML Schema provides a specification for primitive data types (string, float, double, and so on) found in most common programming languages. (3) Typing of complex content: XML Schema provides the ability to define content models as types. 
(4) Distinction between the type definition and instance of that type: unlike XDR, XML Schema type definitions are independent of instance declarations; this makes it possible to reuse type definitions in different contexts to describe distinct nodes within the instance document. (5) W3C support and industry implementation... creating specific and lucid schema should be your first task when creating XML- and Web Service-enabled applications. If your partners need other schema definitions than XML Schema, for example DTD, start with an XML Schema approach and then port the implementation. You'll come out ahead in the long run." See also the sample code for the article. On XML schemas: "XML Schemas."
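Point (4) in the list above, the separation of type definitions from instance declarations, can be sketched as follows (the type and element names are invented): one named complex type serves two distinct element declarations.

```xml
<!-- Hypothetical fragment: one reusable type, two declarations. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city"   type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="billTo" type="AddressType"/>
  <xs:element name="shipTo" type="AddressType"/>
</xs:schema>
```

Under XDR or a DTD, the content model would have to be repeated at each element; here it is defined once and referenced by name.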

  • [June 05, 2001] "Web Team Talking: Out of Cache but Still Stylin'." By Mark Davis, Heidi Housten, Dan Mohr and Kusuma Vellanki. From Microsoft MSDN Online. June 4, 2001. ['This month the team serves up a new twist on the ever popular question of how to avoid caching, as well as some advice on using XSL to display XML data with different fields every time.'] "... We have a new twist on the ever popular question about how to avoid caching and an answer on using XSL to display XML data with different fields every time..." Covers: (1) a way in which to prevent Internet Explorer from putting a dynamically changed XML document in the cache; (2) displaying unknown XML data in a table; (3) pop-up window notifications; (4) XML object model center spread. See also "The Revised XML Object Model for Internet Explorer 5.0" ('an updated version for MSXML 3.0 is on its way to MSDN as we speak...')

  • [June 04, 2001] "A Triumph of Simplicity: James Clark on Markup Languages and XML. Markup Languages, the Standardization Process, and the Importance of Simplicity. [DDJ Interviews James Clark. Feature.]" By Eugene Eric Kim and James Clark. In Dr. Dobb's Journal Issue 326 (July 2001), pages 56-60. ['Whether you know it or not, James Clark has made your life easier by creating a number of open-source tools such as expat (an XML parser), groff (a GNU version of troff), TREX (an XML schema language), and more. Eugene Eric Kim talks to James about these tools, plus the state of XML.'] "If you peek under the hood of high-profile open-source projects such as Mozilla, Apache, Perl, and Python, you'll find a little program called 'expat' handling the XML parsing. If you've ever used the man command on your GNU/Linux distribution, then you've also used groff, the GNU version of the UNIX text formatting application, troff. If you've ever done any work with SGML, from generating documentation from DocBook to building your own SGML applications, you've undoubtedly come across sgmls, SP, and Jade. Whether you've heard of him or not (and most likely, you haven't), James Clark has made your life easier. In addition to authoring these and other widely used open-source tools, Clark served as the technical lead of the original W3C XML Working Group and as the editor of the XSLT and XPath recommendations. He recently founded the Thai Open Source Software Center. His latest project is TREX, an XML schema language. Clark sat down with Eugene Eric Kim to discuss markup languages, the standardization process, and the importance of simplicity... [The next step for XML?] JC: I think XML has become so widespread, it's like asking me, 'What's the next application for ASCII text? What's the next application for line-delimited files?' XML is becoming so common, it's not interesting anymore. 
One of the things that I was very inspired by in working with TREX was a project from the University of Pennsylvania called XDuce, which is an XML processing language. One thing that is interesting about XDuce is that it uses the type information from DTDs to actually type-check your program. Statically typed languages, like Java and C++, help you catch a lot of errors. But with XML processing at the moment, you use the DTD just to validate the file. You don't really use the type information after that. The fact that a document conforms to a DTD is not used by the typing system of the programming languages. I think one interesting direction is to try doing the kind of things that XDuce is doing, which is to integrate the type system of your data, DTDs or schemas, into the type system of the programming language. You want them to all work together in a seamless way so that your compiler can catch a lot more errors when you write programs to process XML, so you can get more reliable programs..." Note: With the decision to merge RELAX Core and TREX under the name 'RELAX NG', we may assume that much of what Clark writes about TREX applies largely to RELAX NG as well. E.g., "...You can think of it as DTDs in XML syntax minus some things and plus some others. TREX just does validation. DTDs mush together both validation and interpretation of the documents, providing various things like entities and notations. Mushing them together is problematic because often you want one thing but not the other. My work with XML and SGML has convinced me that what you need is good separation between these different things. I wanted to remove from DTDs the things that augment the information in the XML document. And I wanted to add in some of the things that I think XML DTDs have always been missing. One of the things XML DTDs removed from SGML DTDs was AND groups, which allow you to have unordered content. The SGML AND groups had a bad reputation, and don't have quite the right semantics. 
TREX adds them back and tries to do them right. XML also radically simplified the kinds of mixed content that you're allowed because there's a problem with the way SGML does it. Instead of restricting it, TREX solves the problem..." See: "RELAX NG."
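The unordered-content feature Clark describes survives in RELAX NG as the interleave pattern. A minimal sketch (the element names are invented) of a content model that XML DTDs cannot express:

```xml
<!-- Hypothetical pattern: name, phone, and email may each appear once,
     in any order. -->
<element name="contact" xmlns="http://relaxng.org/ns/structure/1.0">
  <interleave>
    <element name="name"><text/></element>
    <element name="phone"><text/></element>
    <element name="email"><text/></element>
  </interleave>
</element>
```

An XML DTD would have to enumerate all six orderings explicitly, or give up and allow repetition.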

  • [June 02, 2001] "Bringing the Wireless Internet to Mobile Devices." By Subhasis Saha, Mark Jamtgaard, and John Villasenor. In IEEE Computer Volume 34, Number 6 (June 2001), pages 54-58. "Transcoding and Relational Markup Language are promising middleware solutions to the problem of bringing Internet content to the extremely diverse and dynamic universe of mobile wireless devices... Mapping Internet content to mobile wireless devices requires new technologies, standards, and innovative solutions that minimize cost and maximize efficiency. The wireless Internet must deliver information in a suitable format to handheld device users -- regardless of location and connectivity. Although the exact form in which high-speed wireless data services will develop is uncertain, the authors predict an improvement over today's data rates. Current mobile devices suffer from small displays, limited memory, limited processing power, low battery power, and vulnerability to inherent wireless network transmission problems. To address these issues, a group of leading wireless and mobile communications companies have developed the wireless application protocol for transmitting wireless information and telephony services on mobile handheld devices. Whereas HTTP sends its data in text format, WAP uses Wireless Markup Language to create and deliver content in a compressed binary format that provides efficiency and security. Middleware, an alternative to manually replicating content, seamlessly translates a Web site's existing content to mobile devices that support operating systems, markup languages, microbrowsers, and protocols. The authors predict that middleware such as Relational Markup Language will be critical to bringing Internet content to wireless devices, and they anticipate that open standards based on this or similar techniques will gain acceptance..." See also "Relational Markup Language (RML)."

  • [June 02, 2001] "XML's Impact on Databases and Data Sharing." By Len Seligman and Arnon Rosenthal (of MITRE Corporation). In IEEE Computer Volume 34, Number 6 (June 2001), pages 59-67. [Research Feature.] "The Extensible Markup Language, HTML's likely successor for capturing Web content, has generated a lot of interest. Created by the World Wide Web Consortium to address HTML's limitations, XML resembles HTML's format but offers users a more extensible language. It lets information publishers invent their own tags for applications. Alternatively, they can work with organizations to define shared tag sets that promote interoperability and help separate content from presentation. While XML addresses content, Cascading Style Sheets, the Extensible Stylesheet Language, and Extensible HTML handle presentation separately. XML also supports data validation. XML's advantages over HTML include support for multiple views of the same content for different user groups and media; selective, field-sensitive queries over the Internet and intranets; a visible semantic structure for Web information; and a standard data and document interchange infrastructure. Using XML and related tools often eliminates problems associated with heterogeneous data structures. Like any new technology, XML has generated exaggerated claims. It does not come close to eliminating the need for database management systems or solving large organizations' data-sharing problems. Although XML hype has raised unrealistic expectations, the language does reduce the data-sharing obstacles among diverse applications and databases by providing a common format for expressing data structure and content... Some industry observers have heralded XML as the solution to data-sharing problems -- for example, one observer asserted that XML together with XSL will bring 'complete interoperability of both content and style across applications and platforms.' 
In reality, XML technologies will contribute only indirectly to meeting many of the toughest data-sharing challenges. Architectures: Users want seamless access to all relevant information about their domain's real-world objects. Several general architectures and hybrids are available for this purpose... Regardless of the distributed architecture chosen, someone -- a standard setter, application programmer, or warehouse builder -- must reconcile the differences between data sources and the consumer's view of that data so users can share it. This reconciliation must insulate applications from several forms of diversity. The insulation mechanisms also provide an interface for programmers to look beneath and see the diversity. XML's contributions to data sharing [include]: (1) Level 1: Geographic distribution. Data can be widely distributed geographically. Off-the-shelf middleware products handle most of the challenges at this level, often supporting standard protocols such as HTTP, the simple object access protocol (SOAP), or the common object request broker architecture. XML assists with remote function invocation. (2) Level 2: Heterogeneous data structures and languages. Diversity here includes different data-structuring primitives -- such as tables versus objects -- and data manipulation languages -- such as SQL versus a proprietary language versus file systems with no query language. XML provides a neutral syntax for describing graph-structured data as nested, tagged elements with links. Because developers can transform diverse data structures into such graphs, XML -- along with DOM and XQuery -- provides the operations users need to access these heterogeneous data structures. (3) Level 3: Heterogeneous attribute representations and semantics. This level deals with atomic concepts. Transmitting a fact between systems requires relating each system's semantics as well as their representations. 
The computer does not need to 'understand' either the source or target concept; rather, it only needs to know whether they are identical or how to convert them. XML provides a convenient mechanism for attaching descriptive metadata to both source and target schemas' attributes. (4) Level 4: Heterogeneous schemas. Developers are increasingly aware that schema diversity will be a serious problem even if XML schemas achieve wide usage. To support interoperability at this level, a way to describe and share community schemas and to express mappings across schemas is necessary. Communities developing standard schemas include e-commerce, healthcare, and data-warehousing vendors. Such schemas will reduce diversity among interfaces and ease data sharing. OASIS and BizTalk are examples of XML repository environments that map among XML elements and models. XML does not provide intrinsically simpler model standardization than object systems, but its ubiquity and cheap tools have sparked enthusiasm, motivating some communities to agree on standards when previously they could not. (5) Level 5: Object identification. Improvements in describing attribute representation and semantics can remove one source of object misidentification -- for example, is the date in a payment in US or European format? Also, XML makes it easy to attach uncertainty estimates as subsidiary elements to any output -- although to be useful, the recipient must be prepared to interpret them. (6) Level 6: Data value reconciliation. Many strategies for data value reconciliation depend on having metadata such as time stamp and source quality attached to the data. In addition to attaching such annotations, XML makes it easy to return a set of alternative elements for an uncertain value if the recipient can use such output..." See the related paper from MITRE online. [cache]
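The kind of subsidiary metadata the authors describe at levels 5 and 6 might look like this in practice (a sketch; the element names, attributes, and values are invented for illustration, not taken from the paper):

```xml
<!-- Hypothetical fragment: a value annotated with source and time stamp,
     plus an alternative reading for an ambiguous date. -->
<paymentDate source="warehouse-db" timestamp="2001-05-30T14:00:00Z">
  <value format="US">06/12/2001</value>
  <alternative format="European" confidence="0.2">06/12/2001</alternative>
</paymentDate>
```

Because the annotations are ordinary attributes and child elements, a recipient that does not understand them can simply ignore them, while a reconciliation-aware consumer can use them to resolve the ambiguity.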

  • [June 02, 2001] "Middleware Challenges Ahead." By Kurt Geihs (Goethe University). In IEEE Computer Volume 34, Number 6 (June 2001), pages 24-31. "New application requirements -- including the need to support enterprise application integration, Internet applications, quality of service, nomadic mobility, and ubiquitous computing -- challenge established middleware design principles. Meeting these challenges will lead to a major middleware design and development phase that requires new insights into distributed system technology. A middleware layer seeks to hide the underlying networked environment's complexity by, for example, insulating applications from explicit protocol handling, disjoint memories, data replication, network faults, and parallelism. Middleware masks the heterogeneity of computer architectures, operating systems, programming languages, and networking technologies to facilitate application programming and management... Asynchronous interaction: Independent from any particular communication style, distributed programming models such as RPC and the later remote object invocation (ROI) are natural companions for client-server applications. These programming models introduce a synchronous, blocking interaction style in which a server object remains passive until it receives a request, and the system blocks the client's execution until the server response arrives. Distributed programming models hide distribution because the transaction looks like a local procedure call, and they elegantly handle the implicit synchronization. RPC and ROI remain middleware's most popular communication models. Obvious drawbacks occur if the client uses the network environment's inherent parallelism, for example, to send a search request in parallel to several directory services. RPC-style communications offer two choices: Either use multithreading and spawn a separate thread per request or use a modified non-blocking RPC facility. 
The RPC system's inherently sequential interaction style has received some criticism... For Internet applications, the simple object access protocol defines a mechanism for transporting invocations between peers using HTTP or other protocols and XML as the interface description and encoding language. SOAP does not prescribe any particular programming model. SOAP implements patterns such as request-response pairs as one-way transmissions from a sender to a receiver. Developers designed SOAP to correspond with the Internet's need for a lightweight, open, and flexible mechanism for linking arbitrary applications and services. Event-based middleware architectures address the requirement for decoupled, asynchronous interaction in large-scale, widely distributed systems. Using events as the primary means of interaction allows asynchronous, peer-to-peer notifications between objects and provides flexible pattern-based event filtering and forwarding options. Message passing accommodates peer-to-peer interaction because it has weaker coupling and better scalability. However, in terms of programming abstractions, this low-level paradigm makes programming potentially more error-prone and more difficult to test and debug for elaborate communication patterns. Thus, we can view message passing as a backward step in middleware evolution that illustrates the design trade-off between degree of abstraction and practical requirements."
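The invocations SOAP transports over HTTP are ordinary XML documents. A minimal sketch of a request (the method name and its namespace are invented; the envelope namespace is SOAP 1.1's):

```xml
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <m:GetQuote xmlns:m="urn:example:quotes">
      <symbol>IBM</symbol>
    </m:GetQuote>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
```

Since nothing in the envelope names a programming model, the same document can carry an RPC-style call, a one-way notification, or one leg of a longer asynchronous exchange.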

  • [June 01, 2001] "InfoWorld Readers' Choice Awards." By [Staff.] In InfoWorld (June 01, 2001). [Announcement: "InfoWorld Announces 2001 Readers' Choice Awards. XML Selected As Standard of The Year."] "...Finalists for the awards were nominated by InfoWorld editors, writers, and analysts, and then readers were asked to vote online for their favorites. To make sure that the results were unbiased and unsullied by vote tampering, we asked voters to use their subscription numbers to identify themselves, and each subscription number could vote only once. InfoWorld readers are known for their technological acumen, and subsequently the results of the voting are very revealing. Some choices you made were resounding and clear, but others in more detailed technical categories were close, with winners decided by only a fraction...XML won the standards battle with ease, gaining recognition for Most Important Standard of the Year with 59 percent of the vote, beating out Java 2 Enterprise Edition (J2EE) with 22 percent in second place, and Application Development Technology of the Year with 39 percent over J2EE again, which garnered a much closer 31 percent. Not surprisingly, J2EE won an award itself for Infrastructure Product of the Year, beating out Cisco's Long-Reach Ethernet technology. Other clear winners you voted for included Verio as ASP (application service provider) of the Year, which received 48 percent of the vote, 30 percent clear of the following pack. However, ISP of the Year was a closer call: AT&T WorldNet garnered 35 percent of your votes to 31 percent for UUNet and a surprising 23 percent for America Online. Your pick for Hosting Center of the Year was also a resounding choice: Qwest with 37 percent of the vote, trailed by Exodus at 21 percent..."

  • [June 01, 2001] "Web services unite tech giants ... somewhat." By Matt Berger. In InfoWorld (June 01, 2001). "Companies that for the most part have agreed to disagree appear to be making an exception when it comes to Web services, an emerging computing model that seems to be changing its definition as fast as it gathers new support. While they engaged in some of the usual corporate head-butting, representatives from Hewlett-Packard, Microsoft, Sun Microsystems and IBM found time for moments of accord during a panel discussion at Partech International's Web Services Conference here Thursday. At the heart of their agreement was a set of technology standards that the rivals agree will be central to the next stage of Internet computing. Still largely a concept, Web services describes a computing model in which information can be pulled together over the Internet from a variety of sources and assembled, on the fly, into services that are useful to businesses and consumers. In some cases the information being accessed is itself a kind of service, becoming a building-block component such as a shared online calendar that can be integrated into a larger service offering.... While each one pitched its platform as the best foundation for Internet-based applications and services, the four vendors made it clear that the Web services idea won't work without the broad adoption of technologies including XML (extensible markup language), UDDI (universal description, discovery and integration) and SOAP (simple object access protocol). So far, there has been little resistance. 'This is all just beginning to take shape,' said Ben Brauer, product marketing manager for the Web services division at Hewlett-Packard, who has worked on the development of UDDI. 'We all believe that standards are evolving more quickly than standards in the past because there is so much industry backing.' But while the vendors appear to be in agreement on basic standards, there's room for trouble yet. 
For example, XML comes in a variety of different formats, or 'schema,' depending on what it's being used for, and there's room for divergence from many of the agreed-upon standards at a deeper technical level, analysts said. 'There are standards, but they are the generic standards,' said Tim Clark, an analyst with Jupiter Media Metrix. The building-block standards used to create Web services can actually be very proprietary, he said, and it's also not clear yet how coding languages such as Microsoft's C# and Sun's Java will exist side by side..."

May 2001

  • [May 31, 2001] "An XML Encoding of Simple Dublin Core Metadata." Edited by Dave Beckett, Eric Miller, and Dan Brickley. Dublin Core Metadata Initiative Proposed Recommendation. 2001-04-11 or later. "The Dublin Core Metadata Element Set V1.1 (DCMES) can be represented in many syntax formats. This document explains how to encode the DCMES in XML, provides a DTD to validate the documents, and describes a method to link them from web pages... This document describes an encoding for the DCMES in XML subject to these restrictions: (1) The Dublin Core elements described in the DCMES V1.1 reference can be used; (2) No other elements can be used; (3) No element qualifiers can be used; (4) The resulting XML cannot be embedded in web pages. The primary goal for this document is to provide a simple encoding, where there are no extra elements, qualifiers, optional or varying parts allowed. This allows the resulting data to be validated against a DTD and guaranteed usable by XML parsers. A secondary goal was to make the encoding also be valid RDF, which allows the document to be manipulated using the RDF model. We have tried to limit the RDF constructs to the minimum, and the result is a mostly standard header and footer for every document. We acknowledge that there will be further documents describing other encodings for DC without these restrictions; however, this one is for the simplest possible form. One result of the restrictions is that the encoding does not create documents that can be embedded in HTML pages..." See: "Dublin Core Metadata Initiative (DCMI)."
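The 'mostly standard header and footer' the editors mention is an rdf:RDF root wrapping one rdf:Description per resource, with unqualified DC elements inside. The sketch below builds such a record with Python's standard library (the resource URL and field values are invented; the namespace URIs are the standard RDF and Dublin Core ones):

```python
# Build and re-parse a "simple Dublin Core" record using only the
# standard library. The resource URL and field values below are
# made-up placeholders for illustration.
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

def make_record(about, fields):
    """Wrap unqualified DC elements in the 'standard header and footer'
    (an rdf:RDF root holding one rdf:Description)."""
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": about})
    for name, value in fields:
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(rdf, encoding="unicode")

xml_text = make_record("http://example.org/report",
                       [("title", "Annual Report"),
                        ("creator", "Jane Doe"),
                        ("date", "2001-05-31")])

# Because the encoding is plain XML 1.0, it round-trips through any parser.
root = ET.fromstring(xml_text)
titles = root.findall(f".//{{{DC_NS}}}title")
print(titles[0].text)  # -> Annual Report
```

The point of the restricted encoding is exactly this: with no optional parts, a plain XML parser (or a DTD validator) can handle the record without any RDF machinery, while RDF tools can still consume it.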

  • [May 31, 2001] "BEA Next Up to Outline Web Services Strategy." By Kathleen Ohlson. In InfoWorld (May 29, 2001). "Joining the likes of IBM, Microsoft, and Sun, BEA Systems next week is expected to map out a Web services strategy that would provide enhanced access to and interaction with business functions over the Internet. According to sources, BEA is going to announce that portal functionality and support for Java Messaging Service (JMS), a specification that details how applications communicate in an asynchronous environment, will be added to BEA's flagship Java 2 Enterprise Edition (J2EE)-compliant application platform called WebLogic Server. JMS lets messages be sent between applications across a network, and this version of JMS would let messages pass from one ERP (enterprise resource planning) application to another. The Web services blueprint is expected to coincide with BEA's WebLogic Server upgrade announcement. WebLogic Server 6.1 will also include support for UDDI and WSDL (Web Services Description Language), according to BEA. UDDI is a universal registry of resources, and WSDL standardizes the way services and their providers are described. The application server would act as the backbone for BEA Web services... BEA previously has said its strategic products will consist of WebLogic Collaborate, its collaboration platform that integrates trading partners and e-business processes over the Web, and WebLogic Process Integrator, the workflow engine for Collaborate that controls the sequences of Web services. The products in BEA's Web services lineup will support UDDI, WSDL, SOAP (Simple Object Access Protocol), and ebXML (electronic business XML). SOAP exchanges XML-based messages from one business application to another over the Web, and ebXML creates a standard XML dialect for businesses to find each other on the Web, form trading partner deals, and exchange business documents electronically. 
The company also supports BTP (Business Transaction Protocol) in WebLogic Collaborate, which defines how to do transactions, security, and multiparty dialog in Web services. For example, Web transactions could be canceled without any changes to corporate systems if the receiving application did not get all the XML data..." See the announcement: "Market Leader BEA Systems To Showcase the BEA WebLogic E-Business Platform Advancements, New Partners and Customers at JavaOne. Developer Conference BEA CEO Bill Coleman to Deliver JavaOne Keynote Address on June 7, 2001."

  • [May 31, 2001] "IBM Retools For Web Services." By Wylie Wong. In CNET (May 28, 2001). "IBM is set to launch Tuesday its latest offensive in the market for e-business software with more versatile development tools. The company will announce further details of the next releases of its application-server software and new development tools for building Web-based software and services. IBM competes against BEA Systems, Microsoft, Oracle, Sun Microsystems and others in the market for e-business software that enables companies to share data and conduct trades online. Analysts say IBM's latest WebSphere application server, which will ship late next month, will help the company compete against market leader BEA, which holds the top spot in the market for application servers. In the $1.6 billion market in 2000, BEA captured 35 percent of the market share, followed by IBM with 30 percent, according to analyst firm Giga Information Group. The most important change, [Evan] Quinn [Hurwitz Group] said, is that every version of IBM's WebSphere application server is now built on the same software code. Previously, each version of WebSphere, from low end to high end, was built using slightly different code, making it harder for businesses to move to higher-end versions of the server as their needs grew... The new WebSphere application server version 4.0 will also support additional Web standards that allow people to build Web-based software and services. IBM, along with its rivals Microsoft and others, has been racing to build and sell software for building and delivering Web services by which people access software through the Web instead of on their local PCs. IBM had previously announced plans to support Web services throughout its e-business product family, including its DB2 database-management software. IBM's new database with support for Web services is expected to be released early next month... 
IBM on Tuesday also announced a new version of its Visual Age for Java and WebSphere Studio software development tools, which IBM executives said will offer better support for Web services and the Java 2 Enterprise Edition. The company will also release in July a free tool, called the WebSphere Studio WorkBench, which allows software developers to integrate IBM's development tools with other companies' development tools and have one user interface on their computers for writing applications..."

  • [May 25, 2001] "Indexing XML Documents. [XML Matters, Part #10.]" By David Mertz, Ph.D. (He-Of-Innumerable-Epithets e.g., 'Objectifier,' Gnosis Software, Inc.) From IBM developerWorks. May 2001. ['As XML document storage formats become popular, especially for prose-oriented documents, the task of locating contents within XML document collections becomes more difficult. This column extends the generic full text indexer presented in David's Charming Python #15 column to include XML-specific search and indexing features. This column discusses how the tool design addresses indexing to take advantage of the hierarchical node structure of XML.'] "Large multi-megabyte documents consisting of thousands of pages are not uncommon in corporate and government circles. Writers and technicians routinely produce voluminous product specifications, regulatory requirements, and computer system documentation in SGML (Standard Generalized Markup Language) format. In a technical sense, XML is a simplification and specialization of SGML. At a first approximation then, XML documents should also be valid SGML documents. Culturally, however, XML has evolved from a different direction. In one respect, XML is a successor for EDI. In another respect, it is a successor for HTML. Having a different cultural history from SGML, XML is undergoing its own process of tool development. It is becoming more popular, so expect to see more and more of both (usually) informal HTML documents and (usually) formal SGML documents migrating in the direction of XML formats -- particularly using XML dialects like DocBook. However, XML has not yet grown, within its own culture, a tool that effectively and efficiently locates content within large XML documents. 
General file-search tools like grep on Unix, and similar tools on other platforms, are perfectly able to read the plain text of XML documents (except for possible Unicode issues), but a simple grep search (or even a complicated one) misses the structure of an XML document. When searching for content in a file containing thousands of pages of documentation, you are likely to know much more than you can specify in just a word, phrase, or regular expression. Just which of those agricultural reports, for example, did Ms. June Apple write? A coarse tool like grep will generally find a lot of things that are not of interest. Moreover, ad hoc tools like grep, while very efficient at what they do, need to check the entire contents of large files each time a search is performed. For frequent searches, repeated full-file searching is inefficient... In response to the need outlined above, I have created the public-domain utility xml_indexer. This Python module can be used as a runtime utility and can also be easily extended by custom applications that use its services. The module xml_indexer, in turn, relies on the services of two public-domain utilities I have described in earlier IBM developerWorks articles: indexer and xml_objectify... It turned out that the design of xml_indexer was aided enormously by the object-oriented principles that went into designing indexer. Overriding just a few methods in the GenericIndexer class (actually, in its descendent SlicedZPickleIndexer -- but one could just as easily mix in any concrete Indexer class), made possible the use of an entirely new set of identifiers and data source. Readers who wish to use xml_indexer as part of their own larger Python projects should find its further specialization equally simple." Article also available in PDF format. See: "XML and Python."
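The difference between a grep-style text search and a structure-aware query can be illustrated with stdlib Python; the report markup below is invented for illustration and is not xml_indexer's format:

```python
# Sketch: structure-aware search over a small XML report collection.
# The <report author="..."> vocabulary here is hypothetical.
import xml.etree.ElementTree as ET

doc = """<reports>
  <report author="June Apple"><title>Soil Survey</title></report>
  <report author="R. Roe"><title>Apple Orchards</title></report>
</reports>"""

root = ET.fromstring(doc)

# A plain-text search for "Apple" (what grep effectively does) hits both
# reports, because the word appears in an attribute of one and the title
# text of the other.
plain_hits = [r for r in root.iter("report")
              if "Apple" in ET.tostring(r, encoding="unicode")]

# A structured query can restrict the match to the author attribute,
# answering "which reports did Ms. June Apple write?" precisely.
by_author = [r.findtext("title") for r in root.iter("report")
             if r.get("author") == "June Apple"]

print(len(plain_hits))   # 2
print(by_author)         # ['Soil Survey']
```

An index built over (node path, word) pairs rather than raw bytes also avoids re-reading the whole file on every search, which is the other grep shortcoming the column describes.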

  • [May 24, 2001] "RELAX NG Tutorial." Edited by James Clark [for the TREX TC]. Draft/Version: 2001-05-25. [Attached is a RELAX NG tutorial based on my TREX tutorial.] "RELAX NG is a simple schema language for XML, based on RELAX and TREX. A RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema thus identifies a class of XML documents consisting of those documents that match the pattern. A RELAX NG schema is itself an XML document... RELAX NG Non-features: The role of RELAX NG is simply to specify a class of documents, not to assist in interpretation of the documents belonging to the class. It does not change the infoset of the document. In particular, RELAX NG does not allow defaults for attributes to be specified, does not allow entities to be specified, does not allow notations to be specified, [and] does not specify whether white-space is significant. Also, RELAX NG does not define a way for an XML document to associate itself with a RELAX NG pattern." Note section 17, 'Differences from TREX': "(1) the concur pattern has been removed; (2) the string pattern has been replaced by the value pattern; (3) the anyString pattern has been renamed to text; (4) the namespace URI is different; (5) pattern elements must be namespace qualified; (6) anonymous datatypes have been removed; (7) the data pattern can have parameters specified by param child elements; (8) oneOrMoreTokens and zeroOrMoreTokens patterns have been added for matching whitespace-separated sequences of tokens; (9) the data pattern can have a key or keyRef attribute; (10) the replace and group values for the combine attribute have been removed; (11) an include element in a grammar may contain define elements that replace included definitions." Note: RELAX Core and TREX (Tree Regular Expressions for XML) are to be unified, since the two are very similar as structure-validation languages. 
The unified TREX/RELAX language will be called RELAX NG [for "Relax Next Generation," pronounced "relaxing"]. This design work is now being conducted within the OASIS TREX Technical Committee, where a (first) specification is expected by July 1, 2001. The OASIS TC has also been renamed 'RELAX NG' [mailing list: ''] to reflect the new name of the unified TREX/RELAX language. The RELAX NG development team plans to submit the OASIS specification to ISO, given the importance of ISO standards in Europe. See the RELAX NG Issues List of 2001-05-24 for updates on the design of RELAX NG. References: see (1) Tree Regular Expressions for XML (TREX), and (2) REgular LAnguage description for XML (RELAX).

  • [May 24, 2001] "Components in tag land. Where components fit into the picture at XML DevCon." By Uche Ogbuji (Fourthought, Inc). From IBM developerWorks. April 2001. ['XML DevCon is, of course, all about XML. But since it's geared towards developer education, component technologies from COM to CORBA and beyond are inevitable parts of the picture. At XML DevCon Spring 2001 it seemed everyone wanted a piece of the emerging field of Web services. Uche Ogbuji reports from the front lines, sorting out the fresh meat from the vapor.'] "One thing about the grand term component is that its most clear usage is as a generic qualifier for one vendor to use in proclaiming its product as superior to the competition. Beyond that, it has never been clear what the term means. XML DevCon saw an emergence of a successor term in this regard: Web services. At the conference, everything was a Web service, and everyone was a Web services specialist. A Web service is basically a component that is designed to be accessed using Web technology. In most cases this involves XML messaging to HTTP servers. SOAP is the usual transport protocol, somewhat equivalent to CORBA's IIOP, say, or EJB's RMI. Web services has sprouted a bunch of other analogs to traditional component tools. WSDL is similar to IDL, and UDDI is similar to the CORBA naming and trader service, EJB's JNDI, or COM interface registries. Although I make fun of the woolliness of the term Web services, there is no doubt that Web services are serious business. There was clearly a great deal of money, promotion and developer effort going into the many Web services systems on display. And everyone gave the impression that the stakes in the poker game are quite high. One central conflict was between ebXML, which approaches Web services as an enhancement to traditional EDI, and another camp -- headed by IBM and Microsoft -- that revolves around UDDI and other newly minted technologies. 
There was an ebXML day and a UDDI day, and the partisans of each faction could be heard disparaging the other based on its standards credentials, openness, or lack of practical implementation. This central debate is very likely to move to center stage in the world of business components because XML has pretty much been accepted as the central glue that will tie macro business components (such as EDI tools) to micro business components (such as your favorite online shopping cart widget). At their core, both UDDI and ebXML have the idea (from component technologies) of a repository of interfaces to components offered on a system. This is similar to a CORBA interface repository or a COM interface registry, except that Web services registries have the potential to store millions of entries representing the global facilities that will be offered as Web services. Clearly such a wide-ranging directory of Web services would require an accessible way to manage metadata -- such as the network location and cost of the service -- and the composition of requests to, and responses from, the service. UDDI and ebXML take advantage of XML's general usefulness in representing metadata (although they don't take advantage of RDF: XML's most powerful tool for this), and they use UML for formal expression of the metamodel. Strangely enough, if they insist on using UML, you'd think they'd also use XMI for the XML representation. But as we all know, component technology vendors like to tout the concept of reuse even though they themselves are guilty of forgetting to practice this gospel..." Article also in PDF format.

  • [May 24, 2001] "IBM Customers Generally Bullish About Web Services." By Kathleen Ohlson. In Network World Volume 18, Number 21 (May 21, 2001), page 12. "At a press conference in New York last week IBM outlined Dynamic E-Business, a grand plan to help companies build applications that pull together information from multiple sources either internally or externally over the Internet. For its Web services plan, IBM intends to make the most of its WebSphere, DB2, Tivoli and Lotus products, and improve them with support for emerging standards such as Simple Object Access Protocol (SOAP); Universal Description, Discovery and Integration (UDDI); and Web Services Description Language (WSDL). SOAP exchanges XML-based messages from one business application to another over the Internet. UDDI is a universal registry of resources, and WSDL standardizes how a service and its provider are described. IBM joins industry heavyweights such as Microsoft, Sun and Oracle in the Web services arena. Users have high hopes for the initiative... Dave Kulakowski says Honeywell would use IBM Web services within the company until the technology standards become stabilized and more companies implement some form of Web services. Besides Honeywell, companies including Galileo International, Hewitt Associates, Duck Head Apparel, CareTouch and Transacttools are looking into IBM's Web services and plan to implement them within a year. Sam Johnson, CEO of Transacttools, says trading partners trying to connect to each other's legacy systems could go through an XML interface, rather than connecting to individual systems such as equity, settlement and fixed income in the financial community. Transacttools, a financial services application service provider in New York, will roll out IBM Web services to customers including J.P. Morgan, Capital International and Instanet. 
Tim Hilgenberg, CTO at human resources consulting firm Hewitt Associates, says Web services, in general, have the potential to prevent customers from being locked into one vendor. 'Web services are like Switzerland,' because they're nonproprietary, so 'customers won't have these pocketed islands and have to create the connectivity to reach these islands, which costs a lot,' he says..." See the "IBM Global Services and IBM WebSphere Platform to Support IBM's Web Services Infrastructure."

  • [May 24, 2001] "Using the Jena API to Process RDF. [Tutorial.]" By Joe Verzulli. From May 23, 2001. ['Jena is a freely-available Java API for processing RDF. This article provides an introduction to the API and its implementation.'] "There has been growing interest in the Resource Description Framework (RDF) and a number of tools and libraries have been developed for processing it. This article describes one such library, Jena, a Java API for processing RDF. It is also the name of an open source implementation of the API. XML is very flexible and allows information to be encoded in many different ways. If meaningful tag names are used it is relatively easy for a person to determine the intended interpretation of an XML string. However, it is difficult for programs to determine the intended interpretation since programs don't understand English tag names. DTDs and XML Schemas don't really help in this regard. They just allow a program to verify that XML strings conform to some set of rules. RDF is a model and XML syntax for representing information in a way that allows programs to understand the intended meaning. It's built on the concept of a statement, a triple of the form {predicate, subject, object}. The interpretation of a triple is that <subject> has a property <predicate> whose value is <object>... RDF requires that different kinds of semantic information (e.g., subjects, properties, and values) be placed in prescribed locations in XML. Programs that read an XML encoding of RDF can then tell whether a particular element or attribute refers to a subject, a property, or the value of a property. Jena was developed by Brian McBride of Hewlett-Packard and is derived from earlier work on the SiRPAC API. Jena allows one to parse, create, and search RDF models. Jena defines a number of interfaces for accessing and manipulating RDF statements..." [Website description: "Jena is an experimental Java API for manipulating RDF models. 
Its features include: (1) statement centric methods for manipulating an RDF model as a set of RDF triples; (2) resource centric methods for manipulating an RDF model as a set of resources with properties; (3) cascading method calls for more convenient programming; (4) built-in support for RDF containers - bag, alt and seq; (5) enhanced resources - the application can extend the behaviour of resources; (6) multiple implementations; (7) integrated parser (David Megginson's RDFFilter). An alpha quality implementation is available for download."] See "Resource Description Framework (RDF)."
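The triple model the article describes can be sketched in a few lines. This is a conceptual illustration of statement-centric access, not Jena's actual Java API, and the graph data below is invented:

```python
# Minimal sketch of the RDF statement model: a graph is a set of
# (subject, predicate, object) triples, queried by pattern matching.
# This mirrors Jena's statement-centric view conceptually only.

def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [(s, p, o) for (s, p, o) in triples
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj is None or o == obj)]

graph = {
    ("urn:report/1", "dc:creator", "June Apple"),
    ("urn:report/1", "dc:title", "Soil Survey"),
    ("urn:report/2", "dc:creator", "R. Roe"),
}

# Which resources have the property dc:creator with value "June Apple"?
hits = match(graph, predicate="dc:creator", obj="June Apple")
print(hits)  # [('urn:report/1', 'dc:creator', 'June Apple')]
```

The point of the prescribed XML encoding is exactly this: a program can mechanically recover which part of the markup is the subject, the property, and the value, without understanding the tag names.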

  • [May 24, 2001] "A Web Less Boring. [Talks.]" By Edd Dumbill. From May 23, 2001. ['Tim Bray condemned the state of web browser technology, saying it was responsible for making the Web dull, in his opening keynote at XML Europe 2001 in Berlin.'] "In his opening keynote at XML Europe 2001 in Berlin, Tim Bray explained how XML could make the Web more interesting -- specifically, the Web's user interface. Bray recounted that many members of the original team that created XML envisaged its application in web-enabled client document rendering systems, providing flexible user interfaces for exploring content. Instead XML seems to have found its immediate application in the backroom, connecting databases and disparate server systems. One of the most well-known uses of XML in this scenario is the SOAP protocol, which allows message passing between applications using XML and HTTP. Bray extolled SOAP, explaining that its many implementations and widespread deployment were key to its importance. He emphasized the significant role SOAP will play in the future of web applications. Bray also questioned the value of the W3C's XML Protocol Activity, saying that they should have rubber-stamped SOAP and got on with things. To the amusement of the audience, Bray mused that with the enormous size of the XML Protocol working group, it might just take them 18 months to make that decision alone..."

  • [May 24, 2001] "Interviews: Not Stuck in the Woods. [CTO Insider.]" By Michael Vizard and Bob Renner. In InfoWorld Issue 21 (May 18, 2001), page 54. ['Forest Express has built a public exchange for the timber industry.'] "As the CTO of Forest Express, Bob Renner is charged with creating a public marketplace that is jointly funded by investors such as International Paper, Weyerhaeuser, Georgia-Pacific, Boise Cascade, Mead, Willamette Industries, and Morgan Stanley. Forest Express recently concluded its 100th transaction after launching last winter. In an interview with InfoWorld Editor in Chief Michael Vizard, Renner talks about what it takes to build a public exchange... Renner: Forest Express is a marketplace that is used to buy and sell products [related to] the forest products industry. Currently we have four vertical markets that we're focused on ... paper, building materials, timber, and recycling... So far we have all four of our verticals up and transacting business. The platform is comprised of a couple of base technology providers. For the middleware layer, webMethods is our partner. We also use Commerce One and SAP for the marketplace. The other products that we're using are best-of-breed type products, including Moai Technologies for auctions, and Corio is our ASP [application service provider] outsource provider. [InfoWorld: How important is XML to your efforts, especially as it relates to integration?] Renner: XML is a journey and not an event. The standards that are starting to evolve are clearly going to help us get to something that makes supporting cross-industry interoperability easier. But it's going to take a fair amount of work to consolidate around the standards to support this particular industry. And that'll take time. There's been some very good leadership played in Europe around XML standards for the paper industry. We've latched onto those and tried to add additional value as those standards move forward..."

  • [April 24, 2000] "Managing Documentation with XML." By Christopher R. Maden. 10-April-2000. Some slides for a brief introduction I did [on XML] for the "Documentation & Training Conference" [Tyngsboro, MA]... "XML is just syntax; the information analysis part of the problem is more complex. If you understand your information, you can make good use of it in Word, Frame, XML, SGML, or HTML; if you don't understand it, you're going to have problems in any syntax. There are good reasons for going with XML, but there are drawbacks in that using it (generally speaking) forces you to confront how you're thinking about your information..."

  • [May 23, 2001] "RDF and TopicMaps: An Exercise in Convergence." By Graham Moore (Vice President Research & Development, Empolis GmbH). Paper for XML Europe 2001 Berlin. 2001-05-24. ['This paper presents: (1) a way in which RDF can be used to model topicmaps and vice versa; (2) the issues that arise when performing a model to model mapping; (3) some proposals for changes to XTM to enable semantic interchange of the two standards. I am presenting this paper on Thursday at XML Europe if anyone is around and interested. I don't think this is the complete solution to the integration issue. However, I think that this paper could help focus some of the discussions.'] "There has long been a sense in the semantic web community that there is a synergy between the work of ISO on TopicMaps and that of the W3C on RDF. This paper looks at why and how we can bring these models together to provide a harmonised platform on which to build the semantic web. The reasoning behind bringing together these two standards is in the fact that both models are intent on describing relationships between entities with identity. The question we look to answer in this paper is 'Is the nature of the relationships and the identified entities the same'. If we can show this to be true then we will be able to have a common model that can be accessed as a TopicMap or as an RDF Model. To make this clearer, if we have a knowledge tool X we would expect to be able to import some XTM syntax, some RDF syntax and then run either an RDF or TMQL query in the space of tool X and expect sensible results back across the harmonised model. In order to achieve this aim we need to show a model to model mapping between the two standards. We present the TopicMap model, the RDF model, a discussion on modelling versus mappings and then a proposed mapping between the two. 
As part of the mapping we make suggestions as to the changes that could be made to better support interoperation and finally we conclude and provide an insight into future work... we define a clear goal that we should be able to run a TMQL query against an RDF model and get 'expected results', i.e., those that would be gained from running a query against the equivalent TopicMap. To make this possible we need to make the models map rather than using the models to describe each other. The key difference in these approaches is that one provides a mapping that is semantic, the other uses each standard as a tool for describing other models. It is interesting that both models are flexible enough and general enough to allow each to be modelled using the other... While we found there was a useful mapping that could be performed, it was felt that some additions to the TopicMap model -- Templates and Arcs -- would enable two-way transition from RDF to TopicMaps and vice versa. We conclude that making some non-regressive enhancements to TopicMaps would enable a useful degree of convergence between TopicMaps and RDF, creating a single common semantic space in which to define the semantic web." See (1) "Resource Description Framework (RDF)", and (2) "(XML) Topic Maps." [cache]

  • [May 23, 2001] "XML Catalogs." Edited by Norman Walsh for the OASIS Entity Resolution Committee. Working Draft 24-May-2001. "In order to make optimal use of the information about an XML external resource, there needs to be some interoperable way to map the information in an XML external identifier into a URI for the desired resource. This Standard defines an entity catalog that handles two simple cases: (1) Mapping an external entity's public identifier and/or system identifier to an alternate URI. (2) Mapping the URI of a resource (a namespace name, stylesheet, image, etc.) to an alternate URI. Though it does not handle all issues that a combination of a complete entity manager and storage manager addresses, it simplifies both the use of multiple products in a great majority of cases and the task of processing documents on different systems... This Standard defines a format for an application-independent entity catalog that maps external identifiers and URIs to (other) URIs. This catalog is expressed in terms of XML 1.0 (Second Edition) and XML Namespaces. This catalog is used by an application's entity manager. This Standard does not dictate when an entity manager should access this catalog; for example, an application may attempt other mapping algorithms before or (if the catalog fails to produce a successful mapping) after accessing this catalog. For the purposes of this Standard, the term catalog refers to the logical 'mapping' information that may be physically contained in one or more catalog entry files. The catalog, therefore, is effectively an ordered list of (one or more) catalog entry files. It is up to the application to determine the ordered list of catalog entry files to be used as the logical catalog. 
[This Standard uses the term 'catalog entry file' to refer to one component of a logical catalog even though a catalog entry file can be any kind of storage object or entity including -- but not limited to -- a table in a database, some object referenced by a URI, or some dynamically generated set of catalog entries.] Each entry in the catalog associates a URI with information about an external reference that appears in an XML document." Document appendices include: A. A W3C XML Schema for the XML Catalog (Non-Normative); B. A TREX Grammar for the XML Catalog (Non-Normative); C. A RELAX Grammar for the XML Catalog (Non-Normative); D. A DTD for the XML Catalog (Non-Normative); E. Support for TR9401 Catalog Semantics (Non-Normative). See also the new issues list, the diff from previous version, and the OASIS TC on Entity Resolution.
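The two lookups the draft describes (public/system identifier to URI, and URI to alternate URI) reduce to simple table lookups. The sketch below uses invented entry data and plain Python dictionaries rather than the draft's XML catalog vocabulary:

```python
# Sketch of the two catalog mappings described above. The entry data is
# illustrative; a real catalog entry file would express these mappings
# in the XML vocabulary defined by the draft.

public_map = {
    "-//OASIS//DTD DocBook XML V4.1.2//EN":
        "file:///usr/share/xml/docbook/4.1.2/docbookx.dtd",
}
uri_map = {
    "http://example.org/style/main.xsl":
        "file:///opt/styles/main.xsl",
}

def resolve_public(public_id, system_id):
    """Case (1): prefer a catalog mapping for the public identifier;
    fall back to the document's own system identifier."""
    return public_map.get(public_id, system_id)

def resolve_uri(uri):
    """Case (2): map a namespace/stylesheet/image URI to an alternate
    URI if the catalog has an entry for it, else leave it unchanged."""
    return uri_map.get(uri, uri)

print(resolve_public("-//OASIS//DTD DocBook XML V4.1.2//EN",
                     "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"))
print(resolve_uri("http://example.org/logo.png"))
```

The draft deliberately leaves the fallback behavior to the entity manager; the fallbacks shown here are one plausible policy, not a requirement of the Standard.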

  • [May 23, 2001] "The Power Of Voice." By Ana Orubeondo (Test Center Senior Analyst, Wireless and Mobile Technologies). In InfoWorld Issue 21 (May 18, 2001), pages 73-74. ['VoiceXML should connect your existing Web infrastructure, the Internet, and the standard telephone by providing a standard language for building voice applications. E-business managers who plan voice portal strategies will need to decide whether to build the portals themselves or turn to a growing number of voice ASPs. Be careful when selecting rapidly evolving voice portal technologies. Key improvements such as grammar authoring in Version 2.0 should iron out some of the shortcomings VoiceXML exhibits.'] "VoiceXML is a standard language for building interfaces between voice-recognition software and Web content. Just as HTML defines the display and delivery of text and images on the Internet, VoiceXML translates any XML-tagged Web content into a format that speech-recognition software can deliver by phone. VoiceXML 1.0 is a specification of the VoiceXML Forum, an industry organization founded by AT&T, IBM, Lucent Technologies, and Motorola and consisting of more than 300 companies. With the backing and technology contributions of its four world-class founders and the support of leading Internet industry players, the VoiceXML Forum has made speech-enabled applications on the Internet a reality through its mission to develop and promote VoiceXML. With VoiceXML, users can create a new class of Web sites using audio interfaces, which are not really Web sites in the normal sense because they provide Internet access with a standard telephone. These applications make online information available to users who do not have access to a computer but do have access to a telephone. Voice applications are useful for highly mobile users who need hands- and eyes-free interaction with Web applications, possibly while driving or carrying luggage through a busy airport... 
Voice portals such as BeVocal, TellMe, and Shoptalk are already providing voice access to stock quotes, movie and restaurant listings, and daily news. The best-suited applications for VoiceXML are information retrieval, electronic commerce, personal services, and unified messaging. Several companies have already employed VoiceXML in information retrieval applications to great success. Hotels, car rental agencies, and airlines have implemented continuous voice access to allow customers to make or confirm reservations, buy tickets, find rates, get store hours and driving directions, and access loyalty programs. Voice automated services help reduce call-center costs and increase customer satisfaction... As the volume of information published using HTML grows and the range of Web services broadens, VoiceXML will become an increasingly attractive technology. VoiceXML increases the leverage of a company's Web investment by offering voice interpretation of HTML content." See "VoiceXML Forum." [altURL]

  • [May 22, 2001] Web Services Flow Language (WSFL 1.0). By Prof. Dr. Frank Leymann (Distinguished Engineer; Member IBM Academy of Technology, IBM Software Group). May 2001. 108 pages. [Summary: 'The Web Services Flow Language (WSFL) guide describes how Web services may be composed into new Web services to support business processes. Composition comes in two types: the first type allows the logic of a business process to be specified; the second type allows the mutual exploitation of Web services by participants in a business process to be defined. A brief sketch of the concepts of composition is provided in an introductory chapter of the document. A detailed discussion of the metamodel behind composition follows. The language proper is described and illustrated by code snippets, followed by an XML schema of the language.'] "The Web Services Flow Language (WSFL) is an XML language for the description of Web Services compositions. Flow Models: In the first case, a composition is created by describing how to use the functionality provided by the collection of composed Web Services. This is also known as flow composition, orchestration, or choreography of Web Services. WSFL models these compositions as specifications of the execution sequence of the functionality provided by the composed Web Services. Execution orders are specified by defining the flow of control and data between Web Services. For this reason, in this document, we will also use the term flow model to refer to the first type of Web Services compositions. Flow models can especially be used to model business processes or workflows based on Web Services... Global Models: In the second case, no specification of an execution sequence is provided. Instead, the composition provides a description of how the composed Web Services interact with each other. 
The interactions are modeled as links between endpoints of the Web Services interfaces, each link corresponding to the interaction of one Web Service with an operation of another Web Service's interface. Because of the decentralized or distributed nature of these interactions, we will use the term global model in this document to refer to this type of Web Services composition. Recursive Composition: WSFL provides extensive support for the recursive composition of services. In WSFL, every Web Service composition (a flow model as well as a global model) can itself become a new Web Service, and can thus be used as a component of new compositions. The ability to do recursive composition of Web Services provides scalability to the language and support for top-down progressive refinement design as well as for bottom-up aggregation. For these reasons, recursive composition has been a central requirement in the design of the WSFL language. Hierarchical and Peer-to-Peer Interaction: WSFL compositions support a broad spectrum of interaction patterns between the partners participating in a business process. In particular, both hierarchical interactions and peer-to-peer interactions between partners are supported. Hierarchical interactions are often found in more stable, long-term relationships between partners, while peer-to-peer interactions reflect relationships that are often established dynamically on a per-instance basis... The guiding principle behind WSFL is to fit naturally into the Web Services computing stack. It is layered on top of the Web Services Description Language (WSDL). WSDL describes the service endpoints where individual business operations can be accessed. WSFL uses WSDL for the description of service interfaces and their protocol bindings. WSFL also relies on an envisioned 'endpoint description language' to describe non-operational characteristics of service endpoints, such as quality-of-service properties. 
Here, we will refer to this language as the 'Web Services Endpoint Language' (WSEL)..." Section 5 'Appendix A: WSFL Schema' features the W3C XML schema for WSFL [2000/10 schema version]. Note: "The Web Services Flow Language is the result of a team effort: Francisco Curbera, Frank Leymann, Dieter Roller, and Marc-Thomas Schmidt created the language and its underlying concepts. Matthias Kloppmann and Frank Skrzypczak focused on its lifecycle aspects. Francis Parr worked on details of the example in the appendix. Many others helped by reviewing and discussing earlier versions of the document, most notably Sanjiva Weerawarana and Claudia Zentner." See the recent announcement for IBM's Web Services infrastructure plans. [cache]

  • [May 22, 2001] "[Schema Algebra.] Defining Logical Relationships Between Documents, Schemata, URIs, Resources, and Entities." By Jonathan Borden (The Open Healthcare Group). May 4, 2001 or later. "This paper forms the foundation for a schema independent type framework. The relationships between URIs, Resources, and Entities are formally defined. XML Namespaces are defined using tuples. We define a schema generically through a validity predicate. This predicate tests an instance with respect to a schema. This predicate serves to define the set of Instances of a particular schema..." [Citation context: see reference in XML-DEV posting "Types and Context," 21-May-2001: 'In the Schema Algebra, statements [7-9] a "type" is the property of belonging to a class. The predicate "typeOf(x, c)" tests a node "x" for membership in the instance set of the class "c"...']
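The validity-predicate idea quoted above can be illustrated with a small sketch (the names and the sample schema here are illustrative, not Borden's code): a 'schema' is just a predicate over instances, and typeOf(x, c) is ordinary set membership in the instance set that the predicate carves out.

```python
# Hypothetical sketch of "schema as validity predicate" (illustrative only).

def make_schema(valid):
    """A 'schema' is just a predicate that tests an instance."""
    return valid

# A toy schema whose instances are non-empty strings of digits.
digits_schema = make_schema(lambda x: isinstance(x, str) and x.isdigit())

def type_of(x, instances):
    """typeOf(x, c): tests x for membership in the instance set of class c."""
    return x in instances

# The instance set of a schema is the set of values its predicate accepts.
sample = ["42", "abc", "", "007"]
instances = {x for x in sample if digits_schema(x)}

print(type_of("42", instances))   # a valid instance
print(type_of("abc", instances))  # not an instance of this schema
```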

  • [May 22, 2001] "A Generic Fragment Identifier Syntax." By [Jonathan Borden]. 2001-05-14 (or later). "Frequently URI references, which may contain a fragment identifier, are used independently of their resolution into a particular document, or document fragment, at a particular point in time. A notable example is use of a URI reference as an XML Namespace name. In the current situation, the syntax of the fragment identifier part of a URI reference is defined by the MIME media type of the referenced document, as in an HTTP transaction. This media type is not fixed, and may change from time to time and from reference to reference, or according to request headers such as with content negotiation. It turns out that the fragment identifier syntax is often constant from media type to media type. In order to enable robust use of fragment identifiers, particularly outside a particular HTTP transaction, we propose a generic, media type independent, fragment identifier syntax. This fragment identifier syntax is compatible with current usage of fragment identifiers, and is generally compatible with future proposed syntaxes such as XPointer. This specification does not itself specify how user agents are to process or interpret fragment identifiers, such as may be specified with individual MIME media type registrations, but rather provides a consistent syntax for fragment identifiers and a registration mechanism for schemes associated with fragment identifier syntaxes..."
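The syntactic split the proposal builds on, separating the fragment from the rest of a URI reference, can be seen with Python's standard library (a sketch of the syntax issue only, not of Borden's registration mechanism; the example URI is made up):

```python
from urllib.parse import urldefrag

# A URI reference used, e.g., as an XML Namespace name may carry a fragment.
ref = "http://example.org/schema#section2"

# urldefrag splits the reference into the URI proper and the fragment part.
uri, fragment = urldefrag(ref)
print(uri)        # the URI without its fragment
print(fragment)   # the fragment identifier

# The point of the proposal: the *interpretation* of "section2" classically
# depends on the MIME media type of whatever the URI resolves to, which is
# unavailable when no retrieval takes place (as with namespace names); a
# media-type-independent syntax keeps the fragment meaningful regardless.
```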

  • [May 22, 2001] "Augmenting UML with Fact-orientation." By Terry Halpin. Published in the Proceedings of the Hawai'i International Conference on System Sciences 2001, Section "Unified Modeling Language: A Critical Review and Suggested Future," [HICSS-34], January 3-6, 2001, Outrigger Wailea Resort, Island of Maui. "The Unified Modeling Language (UML) is more useful for object-oriented code design than conceptual information analysis. Its process-centric use-cases provide an inadequate basis for specifying class diagrams, and its graphical language is incomplete, inconsistent and unnecessarily complex. For example, multiplicity constraints on n-ary associations are problematic, the constraint primitives are weak and unorthogonal, and the graphical language impedes verbalization and multiple instantiation for model validation. This paper shows how to compensate for these defects by augmenting UML with concepts and techniques from the Object Role Modeling (ORM) approach. It exploits 'data use cases' to seed the data model, using verbalization of facts and rules with positive and negative examples to facilitate validation of business rules, and compares rule visualizations in UML and ORM. Three possible approaches are suggested: use ORM for conceptual analysis then map to UML; supplement UML with population diagrams and user-defined constraints; enhance the UML metamodel... The UML notation includes the following kinds of diagram for modeling different perspectives of an application: use case diagrams, class diagrams, object diagrams, statecharts, activity diagrams, sequence diagrams, collaboration diagrams, component diagrams and deployment diagrams. This paper focuses on conceptual data modeling, so considers only the static structure (class and object) diagrams. Class diagrams are used for the data model, and object diagrams for data populations. 
Although not yet widely used for designing database applications, UML class diagrams effectively provide an extended Entity-Relationship (ER) notation that can be annotated with database constructs (e.g., key declarations)... This paper identifies several weaknesses in the UML graphical language and discusses how fact-orientation can augment the object-oriented approach of UML. It shows how verbalization of facts and rules, with positive and negative examples, facilitates validation of business rules, and compares rule visualizations in UML and ORM on the basis of specified modeling language criteria... [Conclusion:] Fact-orientation, as exemplified by ORM, provides many advantages for conceptual data analysis, including expressibility, validation by verbalization and population at both fact and constraint levels, and semantic stability (e.g., avoiding changes caused by attributes evolving into associations). ORM also has a mature formal foundation that may be used to refine the semantics of UML. Object-orientation, as exemplified by UML, provides several advantages such as compactness, and the ability to drill down to detailed implementation levels for object-oriented code. If UML is to be used for conceptual analysis of data, some ORM features can be adapted for use in UML either as heuristic procedures or as reasonably straightforward extensions to the UML metamodel and syntax. These include mixfix verbalizations of associations and constraints for associations, and exploitation of data use cases by populating associations with tables of sample data using role names for the column headers. However there are some fundamental aspects that need drastic surgery to the semantics and syntax of UML if it is ever to cater adequately for non-binary associations and some commonly encountered business rules. This paper revealed some serious problems with multiplicity constraints on n-ary associations, especially concerning non-zero minimum multiplicities. 
For example, they cannot be used in general to capture mandatory and minimum occurrence frequency constraints on even single roles within n-aries, much less role combinations. Moreover, UML's treatment of set-comparison constraints is defective. Although it is possible to fix these problems by changing UML's metamodel to be closer to ORM's, such a drastic change to the metamodel may well be ruled out for pragmatic reasons (e.g., maintaining backward compatibility and getting the changes approved). In contrast to UML, ORM has only a small set of orthogonal concepts that are easily mastered. UML modelers willing to learn ORM can get the best of both approaches by using ORM as a front-end to their data analysis and then mapping the ORM models to UML, where the additional constraints can be captured in notes or textual constraints. Automatic transformation between ORM and UML is feasible, and is currently being researched." See: "Conceptual Modeling and Markup Languages." [alt URL, cache]

  • [May 22, 2001] "Chatting in Financial Messages." By Dmeetry Raizman. May 2001. "IT people are still striving to bring their organizations to the promised land of straight through processing. But, according to this article, real-time chatting takes this idea one step beyond. In the business world today, most electronic messaging is asynchronous - that is, it goes one direction at a time, rather like an old-time telegraph system. Thus, while the transfer of the message itself can be quick, one system cannot talk to another in real-time. It must send a message and then wait for a response before speaking again. Now XML and Java are changing all that. Particularly in the case of financial messages, it will be possible for systems to 'chat' in real time - that is, to speak to one another in much the way that people now converse on the phone or in a group. Presenting a debatable approach to Straight Through Processing (STP), this article exposes the convergence point where the combined strength of Java and XML turns toward the major trend of the financial industry - enabling the metamorphosis of conventional STP to RTC - Real Time Chatting between involved parties. Indeed, technology contributes to the compression of the securities trade settlement cycle from the existing T+3 to T+1 enabling the upcoming drift of the financial industry..." See also: "Straight Through Processing Markup Language (STPML)."

  • [May 21, 2001] "Leveraging the Business Analyst: Object Role Modeling with Visual Studio.NET." From Microsoft. 2001-05-21. "Object role modeling (ORM) provides a conceptual, easy-to-understand method of modeling data. The ORM methodology is based on three core principles: (1) Simplicity. Data is modeled in the most elementary form possible. (2) Communicability. Database structures are documented by using language that can be easily understood by everyone. (3) Accuracy. A correctly normalized schema is created based on the data model. Typically, a modeler develops an information model by gathering requirements from people who are familiar with the application but are not skilled data modelers. The modeler must be able to communicate data structures at a conceptual level in terms that the non-technical business expert can understand. The modeler must also analyze the information in simple units and work with sample populations. ORM is specifically designed to improve this kind of communication. Rules Expression: ORM represents the application world as a set of objects (entities or values) that play roles (parts in relationships). ORM is sometimes called fact-based modeling, because it verbalizes the relevant data as elementary facts. These facts can't be split into smaller facts without losing information... Where ORM describes business facts in terms of simple objects and predicates, entity relationship methodologies describe the world in terms of entities that have attributes and participate in relationships... ORM not only provides a simple, direct way of describing relationships between different objects. From the example, we see that ORM also provides flexibility. Models created using ORM have a greater capacity to adapt to changes in the system than those created using other methodologies. In addition, ORM allows non-technical business experts to talk about the model in terms of sample populations, so they can validate the model using real-world data. 
Because ORM allows reuse of objects, the data model automatically maps to a correctly normalized database schema. The simplicity of the ORM model eases the database querying process. Using an ORM query tool, the user can access the desired data without having to understand the underlying structure of the database... Like any good modeling method, ORM is more than just a notation. It includes various design procedures to help modelers map conceptual and logical models, and to use reverse engineering to switch between those models. ORM models can also be automatically mapped to database schemas for implementation on most popular relational databases..." See (1) the announcement, "Microsoft Unveils Visual Studio.NET Enterprise Tools: Visual Studio.NET Enterprise Architect and Enterprise Developer to Lead Corporations Into New Age of XML Web Services," and (2) "Visual Studio.NET Enterprise Features." Visual Studio.NET is described as supporting "Testing XML Web Services and Applications."

  • [May 21, 2001] "Decryption Transform for XML Signature." 10-May-2001. W3C draft document edited by Takeshi Imamura and Hiroshi Maruyama as part of the W3C XML Encryption Working Group activity. See the note from Joseph Reagle Jr. "This document specifies the 'decryption transform', which enables XML Signatures verification even if both signature and encryption operations are performed on an XML document." Status: "This is a proposal being staged for publication and has (as of yet [2001-05-21]) no W3C status or standing." Excerpt: "Since encryption operations applied to part of the signed content after a signature operation cause a signature not to be verifiable, it is necessary to decrypt the portions encrypted after signing before the signature is verified. The 'decryption transform' proposed in this document provides a mechanism: decrypting only signed-then-encrypted portions (and ignoring encrypted-then-signed ones). A signer can insert this transform in a transform sequence (e.g., before Canonical XML or XPath) if there is a possibility that someone will encrypt portions of the signature. The transform defined in this document is intended to propose a resolution to the decryption/verification ordering issue within signed resources. It is out of scope of this document to deal with the cases where the ordering can be derived from the context. For example, when a ds:DigestValue element or a (part of) ds:SignedInfo element is encrypted, the ordering is obvious (without decryption, signature verification is not possible) and there is no need to introduce a new transform..." See: (1) XML Encryption Working Group, and (2) "XML and Encryption."
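The ordering problem the transform addresses can be demonstrated in miniature (a toy sketch: the digest stands in for an XML Signature, and the XOR "cipher" is purely illustrative, nothing like real XML Encryption):

```python
import hashlib

def toy_encrypt(data: bytes, key: int) -> bytes:
    # Toy XOR "encryption" purely to illustrate ordering; real XML
    # Encryption uses proper ciphers and an EncryptedData element.
    return bytes(b ^ key for b in data)

doc = b"<order><item>book</item><card>1234</card></order>"

# 1. Sign (here: just a digest standing in for an XML Signature).
signature = hashlib.sha256(doc).hexdigest()

# 2. Someone later encrypts a portion of the already-signed content.
encrypted_doc = doc.replace(b"1234", toy_encrypt(b"1234", 0x2A))

# Verifying over the encrypted form fails ...
assert hashlib.sha256(encrypted_doc).hexdigest() != signature

# ... which is why the decryption transform says: first decrypt the
# signed-then-encrypted portions, then verify the signature.
restored = encrypted_doc.replace(toy_encrypt(b"1234", 0x2A), b"1234")
assert hashlib.sha256(restored).hexdigest() == signature
```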

  • [May 21, 2001] "XML in Enterprise Information Systems." By David Jackson (DSTC, Brisbane, Australia). Version: May 2001. ['Some of you might be interested in this document on the use of XML in enterprise information systems... There are two formats - (1) one big file and (2) a multi-file version. Links to both are provided. This is the second public draft of this document (first draft last November). It is a rather high level overview of a large topic and there is still a long way to go. Some sections of it need a lot of work. The section on exchange is especially undeveloped, I think in part due to the fact that this area of exchange and XML is still very much in flux. The document gives few if any answers but asks a lot of questions. There are many gaps in the content reflecting gaps in my understanding. No doubt some of what I've written also reveals gaps in my understanding. I would be glad to have any feedback, ideas or contributions. All contributions will of course be acknowledged. I would also be happy to include references to related work.'] From the initial section: "Building useful systems and knowledge structures in the enterprise is merely difficult. Still, in an enterprise, as opposed to either of the above alternatives, the knowledge domains are limited, and there is some chance of control, however small, over the data and document structures and work practices of the people who work there. Using XML in information systems is still quite new, which means that, except in the simplest cases, we are not even sure of all the problems to be solved in systems which use XML. Even less do we know how best to solve those problems. We may not even be sure what kinds of systems we want to build with XML, or of the kinds of systems that XML makes possible. In other words, new kinds of systems are possible, not just new kinds of technology to build the systems we already know about. 
To come to grips with the new possibilities will require a period of learning and gaining experience... Should we think of XML from a systems perspective, and focus on the benefits of improved IT architectures? Or is XML about information, requiring the development of improved business, information and data models before its benefits can be fully realised?... Enterprises are interested in internal sharing of their knowledge in whatever form, and this means developing the knowledge formats and information systems to do so. The real problem is not agreeing on what we want. After all, wishing is cheap, and who would wish for part when for the same low price they could wish for all? Much harder is the design and implementation of the systems, and designing and using the information resources. However, when it comes to fitting XML into this picture we often do not really know what we are doing, beyond some simple cases, generally involving data exchange between applications..."

  • [May 21, 2001] "XML: Deriving Applications from Information Web Publishing with XML Part II. [Tools of the Trade.]" By Wes Biggs. In Online Journalism Review (May 17, 2001). ['To fully capture the value of XML-based data architectures, we need software applications that act on XML data to do something useful, like publish a Web site.'] "If a given XML document in our system represents a single article or story, the most straightforward application for that document would be one that builds a Web page from it that can be displayed by a browser. While modern browsers like Internet Explorer 5.5 claim to be 'XML-enabled', an XML document does not detail how it should be displayed in the same way a Web-native HTML document does. It's possible to view a straight XML document in the browser, but it looks just about as awful as the text-only view. In order to build a useful Web site in real life, we need to prescribe a uniform method of translating our XML documents to HTML. Database-driven applications typically use a template file to describe the breakdown between static and dynamic content on a page. The most popular technique for translating XML to HTML takes the same basic approach, utilizing a "stylesheet" as the template. Stylesheets look a bit like HTML files, but include specialized tags that describe where to place data found in an XML document. An XSL (Extensible Stylesheet Language) processor is a generic software application that takes an XML document (data) and an XSL stylesheet (rules for transforming data), and generates another document. Many software programmers swear by an architecture called MVC, for Model/View/Controller. XSL's application to XML documents follows this design: an XML document models the data; the XSL processor and stylesheet serve as the controller; and the view is the generated document created by this process. By changing the controller, we can change the browser view without having to change the model XML document... 
To summarize this article, XML is an excellent concept for providing a layer of abstraction between the editorial version of your content and the form seen by the reader. Technologies like XSL can ease the pain of distributing to multiple platforms and varying devices, and can make site redesigns a less painful experience all around. And knowledgeable techies in almost any environment can find resources to put together and integrate an XML system or subsystem for a Web site." See also Part 1.
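The Model/View/Controller reading of XSL described in this article can be sketched with Python's standard library (which has no XSLT processor, so a plain function stands in for the stylesheet-plus-processor; the article markup below is made up for illustration, not NITF or any real vocabulary):

```python
import xml.etree.ElementTree as ET

# Model: the XML article (a made-up minimal document).
article_xml = """
<article>
  <headline>Local Team Wins</headline>
  <byline>Jane Doe</byline>
  <body>The home team won on Saturday.</body>
</article>
"""

def to_html(doc: ET.Element) -> str:
    """Controller: applies the 'stylesheet' rules to the model, producing
    the view. A real system would hand an XSL stylesheet to an XSLT
    processor instead of hard-coding the rules in a function."""
    return (
        "<html><body>"
        f"<h1>{doc.findtext('headline')}</h1>"
        f"<p><i>{doc.findtext('byline')}</i></p>"
        f"<p>{doc.findtext('body')}</p>"
        "</body></html>"
    )

model = ET.fromstring(article_xml)
view = to_html(model)   # swap in a different to_html to redesign the site
print(view)             # ... without ever touching the model document
```

Swapping the controller (a new stylesheet) changes the rendered view while the model document stays untouched, which is exactly the redesign benefit the article claims.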

  • [May 21, 2001] "Web Publishing with XML. Part I: Defining the definitions." By Wes Biggs. In Online Journalism Review (May 01, 2001). "Online media conferences are rife with talk of XML. Industry pundits proclaim how well it slices, dices and tenderizes your cherished Web sites. The term adorns headlines in all the weekly rags on the boss's desk, but no one can figure out how to translate the gloss into something of substance for your online presence. It seems to be everything to everyone -- but what is it? In this series, we'll take a look at some practical applications of XML-based technologies that have the potential, if executed correctly, to simplify the process of bringing content online... a number of groups have taken on the work of creating XML tag languages that describe news media documents. The frontrunners are two standards proposed by the International Press Telecommunications Council: NITF, the News Industry Text Format, and the medium-agnostic NewsML. In addition to the IPTC's own push, media organizations like the American Press Institute have officially recommended NITF in place of their own competing proposals, which included other acronym-heavy standards like NMF and XMLNews... While XML is not a replacement for the data integrity and searching capabilities of databases, it provides a means of conceptualizing data that is easily understood by software, that doesn't have to know the proprietary details about a company's mainframe or the way the database engine operates. Another benefit of the use of an XML standard like NITF is data interchange. If you have software tools that know how to process a NITF-formatted document, it doesn't matter whether the document itself came from your in-house reporting staff, your database, or straight off a wire service that happened to deliver its feed in NITF form. 
And even if NITF isn't the native language of a given service, news aggregator sites like ScreamingMedia can help bridge the gap by turning all kinds of source feeds into compliant NITF documents... With XML, you can automate the population of templates in the same way a mail merge program combines a form letter with a list of recipients. One of the most straightforward returns on an XML investment is often a reduction in the amount of time and manual labor it takes to ready an article for Web publication..."

  • [May 21, 2001] "The Model Of Object Primitives: Representation of Object Structures based on State Primitives and Behaviour Policies." By Nektarios Georgalas (BT Adastral Park, B54/Rm125, Martlesham Heath, IPSWICH, IP5 3RE). In Succeeding with Object Databases: A Practical Look at Today's Implementations with Java and XML, edited by Roberto Zicari and Akmal Chaudhri. John Wiley and Sons Publishers, 2000. ISBN 0471383848. "In contemporary business environments, different problems and a variety of diverse requirements compel designers to adopt numerous modelling methodologies that use semantics customised to suit ad-hoc needs. This fact hinders the unanimous acceptance of one modelling paradigm and lays the ground for the adoption of customised versions of some. Based on this principle, we devised and present the Model of Object Primitives which aims at providing a minimum as well as generic set of semantics without compromising expressive capability. It is a class-based model that accommodates the representation of static and dynamic characteristics, i.e., state and behaviour, of objects acting within the problem domain. It uses classes and policies to represent object state and behaviour and collections to collate them into complex structures. In this paper we introduce MOP and provide an insight into its semantics. We examine three case studies that use MOP to represent the XML, ODMG and Relational data-models and also schemata which are defined in these models. Subsequently, another two case studies illustrate practically how MOP can be used in distributed software environments to customise the behaviour or construct new components based on the powerful tool of behaviour policies... MOP, the Model of Object Primitives, is a class-based model that aims at analysing and modelling objects using their primitive constituents, i.e., state and behaviour. MOP contributes major advantages: (1) Minimal and rich. 
The semantics set includes only five basic representation mechanisms, namely, state, behaviour, collection, relationship and constraint. These suffice for the construction of highly complex schemata. MOP, additionally, incorporates policies through which it can express dynamic behaviour characteristics. (2) Cumulative and expandable concepts. The aggregation mechanism introduced by the Collection Class allows for the specification of classes that can incrementally expand. Since a class can participate in many collections, we can create Collections where one contains the other such that a complex structure is gradually produced including state as well as behaviour constituents. (3) Reusable concepts. A MOPClass can be included in more than one Collection. Therefore, concepts modelled in MOP are reusable. As such, any behaviour that is modelled as a combination of a Behaviour Class and a MOP Policy can be reusable. This provides for the usage of MOP in modelling software components. Reusability is a principal feature of components. (4) Extensible and customisable. MOP can be extended to support more semantics. Associating its main modelling constructs with constraints, more specialised representation mechanisms can be produced. (5) Use of graphs to represent MOP schemata and policies. MOP classes and relationships within a schema could potentially be visualised as nodes and edges of a graph. MOP policies are described to be graph-based as well. This provides for the development of CASE tools, similar to MOPper, which alleviate the design of MOP-based models. It is our belief that MOP can play a primary role in the integration of heterogeneous information systems both conceptually and practically. Conceptually, MOP performs as the connecting means for a variety of information models and can facilitate transformations among them. It was not within the paper's scope to study a formal framework for model transformations. 
However, the XML, ODMG and Relational data-model case studies give good evidence that MOP can be efficiently used to represent semantics of diverse modelling languages. This is a necessary condition before moving onto studying model transformations. Practically, MOP provides effective mechanisms to manage resources within a distributed software environment. Both practical case studies presented above show that MOP can assist in the construction of new components or in customising the behaviour of existing components. This is because MOP aids the representation and, therefore, the manipulation of context resources, state or behaviour, in a primitive form. Moreover, the adoption of policies as the means to describe the dynamic aspects of component behaviour, enhances MOP's role. Consequently, it is our overall conclusion that the Model of Object Primitives constitutes a useful paradigm capable of delivering efficient solutions in the worlds of data modelling and distributed information systems." See especially Section 4.1, "XML in MOP." Related references: "Conceptual Modeling and Markup Languages."

  • [May 19, 2001] "Standardizing XML Rules." By Benjamin N. Grosof (MIT Sloan School of Management, Cambridge, MA, USA). Invited paper for the IJCAI 2001 Workshop on E-Business and the Intelligent Web [August 5, 2001], part of the Seventeenth International Joint Conference on Artificial Intelligence. ['The author provides an overview of current efforts to standardize rules knowledge representation in XML, with special focus on the design approach and criteria of RuleML, an emerging standard. With Harold Boley of DFKI (Germany) and Said Tabet of Nisus Inc. (USA), Benjamin N. Grosof leads an early-phase standards effort on a markup language for exchange of rules in XML, called RuleML (Rule Markup Language); the goal of this effort is eventual adoption as a Web standard, e.g., via the World Wide Web Consortium'] "RuleML is, at its heart, an XML syntax for rule knowledge representation (KR), that is inter-operable among major commercial rule systems. It is especially oriented towards four commercially important families of rule systems: SQL (relational database), Prolog, production rules (cf. OPS5, CLIPS, Jess) and Event-Condition-Action rules (ECA). These kinds of rules today are especially found embedded in Object-Oriented (OO) systems, and are often used for business process connectors / workflow. These four families of rule systems all have a common core abstraction: declarative logic programs (LP). 'Declarative' here means in the sense of KR theory. Note that this supports both backward inferencing and forward inferencing. RuleML is actually a family (lattice) of rule KR expressive classes: each with a DTD (syntax) and an associated KR semantics (KRsem). These expressive classes form a generalization hierarchy (lattice). The KRsem specifies what set of conclusions are sanctioned for any given set of premises. Being able to define an XML syntax is relatively straightforward. Crucial is the semantics (KRsem) and the choice of expressive features. 
The motivation to have syntax for several different expressive classes, rather than for one most general expressive class, is that: precision facilitates and maximizes effective interoperability, given heterogeneity of the rule systems/applications that are exchanging rules. The kernel representation in RuleML is: Horn declarative logic programs. Extensions to this representation are defined for several additional expressive features: (1) negation: negation-as-failure and classical negation; (2) prioritized conflict handling: e.g., cf. courteous logic programs; (3) disciplined procedural attachments for queries and actions: e.g., cf. situated logic programs; (4) equivalences, equations, and rewriting; (5) and other features as well. In addition, RuleML defines some useful expressive restrictions (e.g., Datalog, facts-only, binary-relations-only), not only expressive generalizations... In January 2001, we released a first public version of a family of DTDs for several flavors of rules in RuleML. This was presented at the W3C's Technical Plenary Meeting held February 26 to March 2, 2001. Especially since then, RuleML has attracted a considerable degree of interest in the R&D community. Meanwhile, the design has been evolving to further versions." See: "Rule Markup Language (RuleML)." [cache]
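The 'Horn declarative logic programs' kernel that RuleML standardizes, and the forward inferencing the abstract mentions, can be illustrated with a naive sketch (illustrative code over ground atoms, not part of RuleML itself; the business rules are invented):

```python
# Naive forward chaining over ground Horn rules: each rule is a pair
# (body_atoms, head_atom); known facts are a set of atoms.

rules = [
    ({"boughtOver100(alice)"}, "discountCustomer(alice)"),
    ({"discountCustomer(alice)", "inStock(book)"}, "offerDiscount(alice, book)"),
]
facts = {"boughtOver100(alice)", "inStock(book)"}

changed = True
while changed:                      # iterate until a fixpoint is reached
    changed = False
    for body, head in rules:
        if body <= facts and head not in facts:
            facts.add(head)         # the rule fires; its head becomes a fact
            changed = True

# Forward inference has derived the discount offer from the base facts.
print("offerDiscount(alice, book)" in facts)
```

Backward inferencing, the other direction RuleML's semantics sanctions, would instead start from the goal atom and search for rules whose heads match it; the sanctioned conclusions are the same.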

  • [May 18, 2001] "ebXML: It Ain't Over 'til it's Over." By Alan Kotok. From May 16, 2001. ['The final meeting of the Electronic Business XML initiative in Vienna marked the 18-month deadline set for the project, yet there is still plenty left to do.'] "At the Electronic Business XML (ebXML) meeting in Vienna, Austria, 7-11 May 2001, the 150 participants approved the specifications and technical reports defining the ebXML technical architecture. The group also held its most complete proof-of-concept demonstrations at the midpoint of the meeting. But the session ended with ebXML's most promising features, interoperable business semantics, still incomplete. This meeting marked the end of ebXML's 18-month self-imposed deadline that began in November 1999, and the topic of ebXML's future direction took up much of the participants' time and energy. Until this meeting, the ebXML leadership put off any serious discussion of its post-Vienna future. In Vienna, participants got their first chance to see ebXML's new incarnation. EbXML is a joint initiative of Organization for the Advancement of Structured Information Standards (OASIS) and the UN's Centre for Trade Facilitation and Electronic Business (UN/CEFACT). Its goal is to develop a set of specifications to allow any business of any size in any industry to do business with any other entity in any other industry anywhere in the world. The group's work has focused particularly on making e-business possible for smaller companies, generally left out of electronic data interchange (EDI) in the past. In this partnership, OASIS brings XML knowledge and experience, while UN/CEFACT, the group that developed and manages the UN/EDIFACT EDI standard, offers the business expertise... 
At the opening general session, Ray Walker, of UN/CEFACT and one of the ebXML executives, said his organization and OASIS would divide up management of the technical teams, where UN/CEFACT would continue the work on business content and OASIS would handle further work on infrastructure. The groups would create a coordination committee to jointly publish the approved specifications, with further details released during the week...OASIS and UN/CEFACT, according to the new agreement, will jointly publish the ebXML documents, including specifications, technical reports, white papers, and reference documents. In response to an audience question, Walker said that the ebXML site would continue 'for the moment' to provide one source for all ebXML documentation. The Vienna meeting also provided a discussion of future implementation strategies for ebXML. At a briefing for visitors to the meeting, Jon Bosak of Sun Microsystems, and former co-chair of the W3C's XML working group, laid out a three-stage process for businesses to implement ebXML. (1) Standard infrastructure: a result of its current work, ebXML can immediately offer a package of standard message structures, registries and repositories, company e-business profiles, and trading partner agreements. It would allow even the smallest businesses to send ebXML-compliant messages by e-mail. Larger companies can also take part in ebXML when it suits their purposes, for example, as a complement to their EDI transactions. Registries can start providing the message specifications, industry vocabularies, and profiles of potential trading partners. (2) Standard electronic messages: provide standardized messages defined by individuals or organizations. The standard messages would encourage the development of off-the-shelf software solutions and begin the process of replacing paper documents with electronic counterparts. 
By 2003, Bosak expects repositories to store and registries to index business process models and standard messages, with the models using UML, DTDs, or prose representations. (3) Single standard semantic framework: a standard electronic semantic framework would automatically generate standard schemas and messages. Business models would represent complete top-down analysis and allow for dynamic modification as new business relationships emerged..." See the Vienna announcement and the main entry, "Electronic Business XML Initiative (ebXML)."

  • [May 18, 2001] "XML Technologies: A Success Story." By J. David Eisenberg. From May 16, 2001. ['XML's not just about big business. Read how XML technologies XSL-FO and SVG helped improve this year's California Central Coast Section High School wrestling tournament.'] "We've all heard stories of how new XML technologies have helped build immense corporate databases and complex, dynamic web sites. Well, this isn't one of those stories. This story is about how the Apache Software Foundation's XML tools helped improve this year's California Central Coast Section High School wrestling tournament... Since I'm using Linux and the CCS uses Windows, I needed a cross-platform solution. Adobe PDF format was the answer, and this is where Scalable Vector Graphics (SVG) and Formatting Objects to PDF (FOP) enter the story. Creating the bout sheet: The bout sheet is not a typical text document; it's mostly a set of lines, empty boxes and a circle with minimal text labeling. Thus, I decided to use Scalable Vector Graphics (SVG) to describe the form, and use FOP as a wrapper to produce the desired PDF output. I took a ruler and an old bout sheet, redrew the lines, measured the widths and locations of the boxes and text, and created the formatting objects XML file by hand... So after all that work and trouble, was it worth it? Yes. It took me less time to produce the bout sheet with SVG than it would have taken to find a Windows machine, learn to use a drawing program, and produce a file that would have been in a proprietary format. The bracket printout was also worthwhile, mostly as a learning exercise and also as a proof of concept. The PDF output also looks better than the RTF. Again, there was a time savings; it was easier for me to learn the syntax for formatting objects than it would have been for me to learn enough RTF to produce an equally good-looking result in that format. 
Finally, the fact that I was able to accomplish all of these tasks with open source software is the icing on the cake." See: "W3C Scalable Vector Graphics (SVG)."
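The approach Eisenberg describes, drawing a form as lines, boxes, and a circle with measured coordinates, is easy to sketch programmatically. The following Python fragment is illustrative only: the coordinates and labels are invented, not taken from the actual CCS bout sheet, and it builds a comparable SVG document with nothing but the standard library.

```python
import xml.etree.ElementTree as ET

def build_bout_sheet_svg(width=400, height=300):
    """Build a minimal SVG 'form' of the kind described in the article:
    a few labeled boxes drawn at measured coordinates. The layout values
    here are invented, not taken from the actual CCS bout sheet."""
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(width), "height": str(height),
    })
    # Empty boxes for wrestler names, drawn as unfilled rectangles.
    for i, label in enumerate(["Wrestler A", "Wrestler B"]):
        y = 20 + i * 60
        ET.SubElement(svg, "rect", {
            "x": "20", "y": str(y), "width": "200", "height": "40",
            "fill": "none", "stroke": "black",
        })
        text = ET.SubElement(svg, "text", {"x": "25", "y": str(y - 5)})
        text.text = label
    # A circle, as on the original form.
    ET.SubElement(svg, "circle", {
        "cx": "320", "cy": "50", "r": "25", "fill": "none", "stroke": "black",
    })
    return ET.tostring(svg, encoding="unicode")

if __name__ == "__main__":
    print(build_bout_sheet_svg())
```

An SVG document built this way can then be embedded in an XSL-FO file and rendered to PDF by FOP, as the article describes.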

  • [May 18, 2001] "Perl XML Quickstart: The Standard XML Interfaces. [Tutorial.]" By Kip Hampton. From May 16, 2001. "This is the second part in a series of articles meant to quickly introduce some of the more popular Perl XML modules. This month we look at the Perl implementations of the standard XML APIs: The Document Object Model, The XPath language, and the Simple API for XML. As stated in part one, this series is not concerned with comparing the relative merits of the various XML modules. My only goal is to provide enough sample code to help you decide for yourself which module or approach is most appropriate for your situation by showing you how to achieve the same result with each module given two simple tasks. Those tasks are 1) extracting data from an XML document and 2) producing an XML document from a Perl hash... Up to this point each module we've looked at shares the common goal of providing a generic interface to the contents of any well-formed XML document. Next month we will depart from this pattern a bit by exploring some of the modules that, while perhaps less generically useful, seek to simplify the execution of some specific XML-related task..." See: "XML and Perl."
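For comparison, the series' two benchmark tasks (extract data from an XML document; produce a document from a hash-like structure) can be sketched in Python with the standard library's ElementTree module. The sample data below is invented for the sketch, not taken from the tutorial.

```python
import xml.etree.ElementTree as ET

# Task 1: extract data from an XML document.
doc = """<camelids>
  <species name="Camelus dromedarius"><humps>1</humps></species>
  <species name="Camelus bactrianus"><humps>2</humps></species>
</camelids>"""
root = ET.fromstring(doc)
humps = {s.get("name"): int(s.findtext("humps")) for s in root.iter("species")}

# Task 2: produce an XML document from a dict (the Perl-hash analogue).
out = ET.Element("camelids")
for name, n in humps.items():
    sp = ET.SubElement(out, "species", name=name)
    ET.SubElement(sp, "humps").text = str(n)
xml_text = ET.tostring(out, encoding="unicode")
```

The Perl modules surveyed in the article (XML::DOM, XML::XPath, XML::Parser's SAX handlers) tackle the same two tasks through the standard DOM, XPath, and SAX interfaces.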

  • [May 17, 2001] "Working with XML: The Java API for XML Parsing (JAXP) Tutorial." By Eric Armstrong. Updates for May 16, 2001. "XML Tutorial Update: The XML overview section is now complete. In particular, the descriptions of the XML standards initiatives and the JAXP APIs have been rewritten, and are worth a cursory review. The rewritten pages include (1) 'XML and Related Specs: Digesting the Alphabet Soup' and (2) 'An Overview of the APIs'. See also the main web site for Java APIs for XML Processing (JAXP), and local references in "Java API for XML Parsing (JAXP)."

  • [May 16, 2001] "The Hype Stuff. [Web Technology: XML.]" By Scott Berinato. In CIO Magazine (May 15, 2001). ['Will XML be the ultimate platform? Or will it be the next EDI? Discover how companies are using XML to create business solutions today. Learn what CIOs must do to maintain the value and openness of XML.'] "'I hear it's going to cure cancer,' says Tim Bray, XML's cocreator. 'It's going to do my dishes, I hear,' says Anne-Marie Keane, Staples' vice president of B2B e-commerce. Behind the flip jokes lies XML -- a syntax that underpins a growing list of more than 300 nascent data standards. MathML, for instance, will make it possible to manipulate advanced mathematical equations on a webpage. Spacecraft Markup Language standardizes databases that operate telemetry and mission control. And then there's MeatXML, a comical name for a serious effort to create a universal meat and poultry supply chain standard. With XML going in so many directions at once, you can't blame CIOs for being confused. The hyperbole often makes XML sound like a salve for all pain. Even worse, the vendor hype is overwhelming. CommerceOne, for example, boasts that British Telecommunications will cut purchase order processing costs by 90 percent using XML-based procurement. Software and service provider JetForm claims developers can write programs in days that would have taken months without XML. Finding the truth behind the tales takes some digging. Technologically, XML is a giant leap for IT. It can drastically reduce development time while making data transfer over the Internet simple. If nurtured properly, it may even become the ASCII text of online business -- ubiquitous and assumed. Or it could become the next EDI, fractured under the pressure of vendor self-interest. 
One thing is certain: For XML to reach its full potential, CIOs will have to take an active role in forcing their partners, their vendors and even their competitors toward a radically more open computing model than what exists today... The article profiles three practitioners putting XML to work, each in one of the three areas most agree the technology will first permeate: (1) Business-to-business data sharing, where Alistair Duncan of Visa International has built his own XML vocabulary for sharing corporate credit card information. [Visa XML Invoice Specification] (2) Content management, where Gary Guilland of Safeco is using XML but is hardly ready to coronate it as the future of computing. [XFA - XML forms] (3) Application-to-application integration, where Steve Morin of TAC Worldwide sees XML revolutionizing the human resources industry -- if 90 competing vendors can agree to cooperate. [HR-XML]."

  • [May 16, 2001] "Export a Word Document to XML." By Kevin McDowell (Microsoft Corporation). From MSDN Online Library. May 2001. ['This solution allows you to export a Word document to an XML file. Requires Microsoft Word 2000.'] "Converting any data to XML requires parsing the data and tagging it with descriptors. Within a Word document, text and hyperlinks are already tagged by their formatting. Most documents contain multiple structural elements, such as headings, bylines, footnotes, and quotations. All types of formatting can be applied to indicate what the elements are. For example, most headings are not the same size, weight, or even font as paragraph level text. Within a Word document, you alter text by one of two methods: by applying a style or by applying formatting manually. A style in Word is nothing more than a named set of specific instructions describing the formatting to apply. When you apply a style, you are basically tagging that text as something: a heading, a subheading, a code block, a quotation, or some other document element. When you apply formatting manually, you are tagging that text as something special, but that something is not defined. If you were to attempt to parse the document by formatting, you would know how the text appears in the document, but you wouldn't know what the text is. However, if you only apply formatting using styles, when you parse the document, not only do you know how the text appears in the document, but also you have a style name to describe what the text is. Creating a document in this manner requires that you know what your formatting represents. Instead of making text bold for emphasis, you apply a style that not only bolds the text but is descriptive of why the text is bold to begin with... After you author a document by using styles and then convert it to XML, it becomes a queryable data source. If you have a folder of XML documents, it is essentially a database. 
Using the FileSystemObject object in the Microsoft Scripting Runtime object model to loop through all of the files in the folder, you could apply an Extensible Stylesheet Language (XSL) query to pull out all of the headings, author information, quotations, or whatever you want, from each of the XML articles. Conclusion: This solution provides a starting point to build an XML parser for Word documents. In addition to the XML functionality, it discusses how to build custom objects to handle sequential instances of all styles and graphics and how to loop through tables and lists. Remember, documents shouldn't be converted to XML merely for the sake of putting them in XML. The best document to convert to XML is one that makes use of styles and will be reused in other ways." Available online: sample download.
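The folder-as-database idea can be sketched outside the Microsoft Scripting Runtime as well. The Python fragment below is an analogous sketch, not the article's VBA/XSL solution; the Heading1 tag name (mirroring a Word style name) and the sample documents are assumptions. It loops through a folder of XML files and pulls out every heading.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET

def collect_headings(folder):
    """Treat a folder of XML files as a queryable 'database':
    pull every <Heading1> element out of every document."""
    headings = []
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        tree = ET.parse(path)
        headings.extend(h.text for h in tree.iter("Heading1"))
    return headings

# Demo: two toy documents whose tags mirror Word style names.
folder = tempfile.mkdtemp()
for i, title in enumerate(["Exporting to XML", "Styles as Tags"]):
    with open(os.path.join(folder, f"doc{i}.xml"), "w") as f:
        f.write(f"<Article><Heading1>{title}</Heading1><Para>...</Para></Article>")
print(collect_headings(folder))
```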

  • [May 16, 2001] XML for Analysis SDK. From MSDN. ['Download the SDK that provides for universal data access to analytical data sources residing over the Web, without the need to deploy a client component that exposes COM interfaces.'] "XML for Analysis SDK: msxainst.exe is a self-extracting download that contains the Microsoft XML for Analysis provider and sample client applications. The Microsoft XML for Analysis Provider supports data access to analytical data sources (OLAP and data mining) residing on the Web. This provider implements the XML for Analysis Specification, which provides for universal data access to analytical data sources residing over the Web, without the need to deploy a client component that exposes COM interfaces. The Microsoft Analysis Services server can be accessed with the provided download, from the web, without any COM components on the client..." See references in "XML for Analysis."

  • [May 16, 2001] "A simple SOAP client. A general-purpose Java SOAP client." By Bob DuCharme (VP of Corporate Documentation, UDICo). From IBM developerWorks. May 2001. ['Bob DuCharme introduces a simple, general purpose SOAP client (in Java) that uses no specialized SOAP libraries.'] "This article describes a simple, general purpose SOAP client in Java that uses no specialized SOAP libraries. Instead of creating the SOAP request XML document for you under the hood, this client lets you create your own request with any XML editor (or text editor). Instead of merely giving you the remote method's return values, the client shows you the actual SOAP response XML document. The short Java program shows exactly what SOAP is all about: opening up an HTTP connection, sending the appropriate XML to invoke a remote method, and then reading the XML response returned by the server. SOAP, the Simple Object Access Protocol, is an evolving W3C standard developed by representatives of IBM, Microsoft, DevelopMentor, and UserLand Software for the exchange of information over a network. As more SOAP servers become publicly available on the Web, SOAP is doing for programs written in nearly any language -- even short little programs written in popular, simple languages like Visual Basic, JavaScript, and perl -- what HTML does for Web browsers: It gives them a simple way to take advantage of an increasing number of information sources becoming available over the Web. Like HTML, SOAP provides a set of tags to indicate the roles of different pieces of information being sent over the Web using the HTTP transport protocol (and since SOAP 1.1, SMTP as well). SOAP, however, gives you much more power than HTML. With SOAP, your program sends a 'SOAP request' (a short XML document that describes a method to invoke on a remote machine and any parameters to pass to it) to a SOAP server. 
The SOAP server will try to execute that method with those parameters and send a SOAP response back to your program. The response is either the result of the execution or the appropriate error message. Public SOAP servers are available to provide stock prices, the latest currency conversion rates, FedEx package tracking information, solutions to algebraic expressions, and all kinds of information to any SOAP client that asks. Before SOAP existed, programs trying to use this kind of information had to pull down Web pages and 'scrape' the HTML to look for the appropriate text. A visual redesign of those Web pages (for example, putting the current stock price in a table's third column instead of its second column) was all it took to render these programs useless. The SOAP spec, along with its brief accompanying schemas for SOAP requests and responses, provides the framework for a contract between clients and servers that creates a foundation for much more robust information-gathering tools. There are plenty of SOAP clients available for most popular programming languages..." Article also available in PDF format. See "Simple Object Access Protocol (SOAP)."
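A comparable client can be written in a few lines of any language with an HTTP library. The Python sketch below mirrors the article's approach, hand-writing the SOAP 1.1 request envelope rather than relying on a SOAP toolkit; the method name, parameter, and endpoint are hypothetical, and no live server is contacted here.

```python
# Minimal SOAP 1.1 request builder in the spirit of the article: the
# envelope is assembled as plain text and would be POSTed over HTTP.

def soap_request(method, params, ns="urn:example"):
    """Hand-write a SOAP request envelope for a remote method call."""
    args = "".join(f"<{k}>{v}</{k}>" for k, v in params.items())
    return (
        '<?xml version="1.0"?>'
        '<SOAP-ENV:Envelope '
        'xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
        "<SOAP-ENV:Body>"
        f'<m:{method} xmlns:m="{ns}">{args}</m:{method}>'
        "</SOAP-ENV:Body></SOAP-ENV:Envelope>"
    )

def post_soap(host, path, body, soap_action=""):
    """POST the envelope with http.client; returns the raw XML response."""
    import http.client
    conn = http.client.HTTPConnection(host)
    conn.request("POST", path, body, {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{soap_action}"',
    })
    return conn.getresponse().read().decode()

envelope = soap_request("getQuote", {"symbol": "IBM"})
```

Showing the raw request and response documents, as DuCharme's client does, makes the wire format visible instead of hiding it behind a library.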

  • [May 16, 2001] "Groups get approval for ebXML specifications. More than 200 IT organizations, companies, and software vendors vote to approve standard." By Margret Johnston. In InfoWorld (May 15, 2001). "The Organization for the Advancement of Structured Information Standards (OASIS), a nonprofit, international consortium that creates interoperable industry specifications based on public standards, and the United Nations Center for Trade Facilitation and Electronic Business (UNCEFACT), announced the news Monday following last week's meeting, which took place in Vienna. EbXML is a modular suite of specifications designed to enable companies of any size and in any country to conduct business over the Internet through the exchange of XML-based messages. It is aimed at facilitating global trade by enabling XML to be used in a consistent manner to exchange business data electronically... OASIS and UNCEFACT joined forces in September 1999 and since then have been working to identify the technical basis on which the global implementation of XML could be standardized. The groups held proof-of-concept demonstrations in several cities around the world, and the Vienna meeting marked the culmination of that effort. The suite of specifications approved in Vienna are the ebXML Technical Architecture, Business Process Specification Schema, Registry Information Model, Registry Services, ebXML Requirements, Message Service, and Collaboration-Protocol Profile and Agreement. Implementations of ebXML already are being announced, and the rate of deployment is expected to accelerate, said Patrick Gannon, chairman of the OASIS board of directors. Gannon cited recent announcements of ebXML integration and support from industry groups, including RosettaNet, a consortium of more than 400 IT and electronics companies. 
RosettaNet plans to integrate support for the ebXML Message Service specification in future releases of RosettaNet's Implementation Framework, the consortium announced in April. The Global Commerce Initiative, which represents manufacturers and retailers of consumer goods, also has decided to base its new Internet protocol standard for trading exchanges and business-to-business communications on ebXML, Gannon said..." See the announcement.

  • [May 15, 2001] "Next-generation e-biz." By James R. Borck [Test Center Managing Analyst]. In InfoWorld (May 11, 2001). ['Web-services-oriented architectures are gearing up to inspire e-business efficiency and dynamic partner integration.'] "... By the second half of 2002, Web services will emerge as the definitive standard for the next phase of global e-business. In the interim, your company should plan a Web services strategy, and your developers should familiarize themselves with Web services frameworks and tools. What exactly are Web services? 'Web services' describes a service-oriented architecture in which self-contained, distributed applications, comprising very specific business functions, are enveloped in XML to facilitate integration within the enterprise and with business partners. Using a global publish and lookup mechanism, these task-specific software services can be described, published, located, and engaged, either directly or programmatically, over the Internet and private networks. With Web services, you, your suppliers, trading partners, and customers will be able to dynamically discover and call one another's published applications and chain them together to automate entire workflow processes. Private exchanges and marketplaces, procurement, billing and payment verification, and legacy application availability all will benefit from the streamlining mechanisms of Web services. But companies won't reap these benefits overnight. Many, still trying to catch up to the adoption of XML, are hard-pressed to devote the necessary resources to break existing applications into discrete Web services components. But what is lost in XML's transactional inefficiency is made up for in the reduction of the complexity inherent in today's e-business systems. 
Development times, and consequently time to market, can be reduced from months to just days, systems can be made easier to maintain, and new revenue streams can be tapped thanks to improved application accessibility. Better still, Web services deliver these benefits not by supplanting the distributed technologies in use but by extending their functionality. Web services suffer from several popular misconceptions. One is that Web services are simply hosted applications. Another is that they are merely a new way of interfacing with another company's software. But Web services do more than merely expose interfaces, for this discounts the impact of run-time discovery and binding (the process of determining how the applications will interface), the key capabilities that separate Web services from other application integration methods. In the evolution of enterprise computing, the generic, object-oriented software components of yesterday yielded to standards such as COM (Component Object Model), CORBA, Enterprise JavaBeans (EJB), and then distributed server-side computing. But each new integration scheme continued to demand tightly coupled, agreed-on preconfigurations for exchanging data, requirements that needed addressing at the point of design. If an interface was changed, the system was broken. The next generation of software components will be more loosely coupled, binding applications at execution instead of during development. The process will make interoperability problems a thing of the past. How do Web services work? Web services frameworks use XML to envelop applications and facilitate messaging. The original executable can be coded in any language or run on any platform because Web services don't rely on object-model-specific protocols such as DCOM (Distributed Component Object Model), as do other component-based technologies. Services can be developed to offer multiple options for communication, selectable at the point of engagement. 
SOAP (Simple Object Access Protocol) is used to define distributed object communication procedures. The XML-based protocol carries additional instructions on how data should be processed and, like XML, is platform- and transport-neutral. WSDL (Web Services Description Language) provides an abstract for exchange by describing service-specific data including details on the interface, available protocols, and other implementation-specific particulars. And finally, the UDDI (Universal Description, Discovery, and Integration) specification at the repository level provides the indexing and lookup capability for services through DNS data and SOAP-based APIs. UDDI data contains information about businesses and their location, services they offer, billing information, and allowable protocols.... Although developers can see the great potential of Web services, CTOs can't cash in on the promise yet. Definitive standards, security, and QoS (quality of service) issues remain unanswered. Before CTOs can consider adoption seriously, the problems of securing end-to-end data transport across multiple services, ensuring transactional committal, and guaranteeing nonrepudiation must be solved. The reality is that attempts at seamless interoperability will take some time..."
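The distinction the article draws, binding at execution rather than at design time, can be illustrated with a toy registry lookup. All service names and endpoints below are invented; a real client would query a UDDI registry over SOAP rather than a local dict.

```python
# Toy illustration of run-time discovery and binding: the client consults a
# registry (standing in for UDDI) at call time instead of hard-wiring an
# endpoint at design time.

registry = {
    "currency-conversion": {"endpoint": "http://rates.example/soap",
                            "protocol": "SOAP"},
    "package-tracking":    {"endpoint": "http://ship.example/soap",
                            "protocol": "SOAP"},
}

def discover(service_name):
    """Look a service up by name; in UDDI this would be a SOAP query."""
    entry = registry.get(service_name)
    if entry is None:
        raise LookupError(f"no provider registered for {service_name!r}")
    return entry

# The endpoint is resolved only at the moment of the call, so a provider
# can change its implementation or address without breaking the client.
binding = discover("currency-conversion")
```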

  • [May 14, 2001] "XML Schema becomes W3C Recommendation: What This Means. With the approval of the W3C and its 500+ members, XML is ready for the next big step to worldwide deployment." By Natalie Walker Whitlock (Casaflora Communications). From IBM developerWorks. May 2001. ['After more than two years of review and revision, the World Wide Web Consortium (W3C) announced on May 1 that it has embraced the XML Schema with a formal Recommendation. W3C Recommendation status is the final step in the consortium's standards approval process, indicating that the schema is a fully mature, stable standard backed by the 510 W3C member organizations.'] "Speaking at the 10th International World Wide Web Conference in Hong Kong, Web pioneer and W3C Director Tim Berners-Lee said that XML Schema (parts 0, 1 and 2) should now be considered as one of the foundations of XML, together with XML 1.0 and Namespaces in XML. He also stated that the specification provides 'an XML language for defining all XML languages.' The finalized Schema brings rich data descriptions to XML. Schema will solve the primary problem of B2B communication and interoperability that has held XML back from its full potential. The standardized Schema is expected to integrate data exchange across business, and ultimately realize the full promise of XML to facilitate and accelerate electronic business... Schema increases XML's power and utility to the developer by providing better integration with XML Namespaces. By introducing datatypes to XML, Schema makes it easier than ever to define the elements and attributes in a namespace, and to validate documents that use multiple namespaces defined by different schemas. XML Schema also introduces new levels of flexibility intended to speed its adoption for business use. According to [IBM's Noah] Mendelsohn, who also helped write the spec, XML Schema addresses a number of new issues and therefore has features for demanding apps. 
Yet, he says, developers can learn how to use XML Schema to do what they've been doing in XML with DTDs in 'about an hour or two.'... Berners-Lee added that XML Schema would need to be clarified and simplified after the many implementations and unexpected interpretations of the specification. Indeed, the cry of simplification has been one of the loudest heard from critics. The current complexity has been blamed for driving others to create alternative, lighter weight schemas, such as TREX and RELAX. Some have even said XML Schema is so complex that even some W3C insiders are calling for future versions to be incompatible with this first release so they do not repeat what critics say are the flaws of the first version... Despite the controversies, most groups have publicly stated that they will support and incorporate the W3C's XML Schema. These groups include IBM, Microsoft, Sun Microsystems, Commerce One, and Oracle. In a public statement, Oracle said its Oracle9i will be the first production database to implement the new Schema. In addition, both Microsoft's .Net initiative and Sun's SunOne Web services effort will take advantage of XML Schema..." For schema description and references, see "XML Schemas."

  • [May 12, 2001] "XRel: A Path-Based Approach to Storage and Retrieval of XML Documents Using Relational Databases." By Masatoshi Yoshikawa, Toshiyuki Amagasa, Takeyuki Shimura, and Shunsuke Uemura. In ACM Transactions on Internet Technology, Volume 1, Number 1 (June 2001). [Paper accepted for the inaugural issue. Edited by Won Kim.] "This paper describes XRel, a novel approach to storage and retrieval of XML documents using relational databases. In this approach, an XML document is decomposed into nodes based on its tree structure, and stored in relational tables according to the node type, with path information from the root to each node. XRel enables us to store XML documents using a fixed relational schema without any information about DTDs and element types, and also enables us to utilize indices such as the B+-tree and the R-tree supported by database management systems. For the processing of XML queries, we present an algorithm for translating a core subset of XPath expressions into SQL queries. Thus, XRel does not impose any extension of relational databases for storage of XML documents, and query retrieval based on XPath expressions can be realized in terms of a preprocessor for database query language. Finally, we demonstrate the effectiveness of this approach through several experiments using actual XML documents." See related slides in 'WebDB.html'.
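The core XRel idea, shredding a document into path-labeled rows and answering XPath queries with SQL over those paths, can be illustrated with a toy Python/SQLite sketch. This keeps only text nodes and exact root-to-node paths; the real XRel schema additionally stores node types and region coordinates, and its translation algorithm handles path prefixes and a larger XPath subset.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Fixed relational schema: one row per text node, keyed by its path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE text_node (path TEXT, value TEXT)")

def shred(elem, path=""):
    """Decompose an XML tree into (root-to-node path, text) rows."""
    path = f"{path}/{elem.tag}"
    if elem.text and elem.text.strip():
        conn.execute("INSERT INTO text_node VALUES (?, ?)",
                     (path, elem.text.strip()))
    for child in elem:
        shred(child, path)

def xpath_to_sql(simple_path):
    """Translate an absolute, element-only XPath like /a/b/c into SQL."""
    return ("SELECT value FROM text_node WHERE path = ?", (simple_path,))

doc = ET.fromstring("<bib><book><title>XRel</title></book></bib>")
shred(doc)
sql, args = xpath_to_sql("/bib/book/title")
rows = conn.execute(sql, args).fetchall()
```

Because the schema is fixed, no DTD is needed to load a document, which is the property the paper emphasizes.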

  • [May 11, 2001] "Identification of Syntactically Similar DTD Elements for Schema Matching." By Hong Su, Sriram Padmanabhan, and Ming-Ling Lo (Computer Science Department, Worcester Polytechnic Institute). Paper presented at the Second International Conference on Web-Age Information Management (WAIM'2001), Xi'an, China, July 2001. 13 pages. "XML Document Type Definition (DTD) enforces the structure of XML documents. XML applications such as data translation, schema integration, and wrapper generation require DTD schema matching as a core procedure. While schema matching usually relies on a human arbiter, we are aiming at an automated system that can give the arbiter a starting point for designing a matching that can best meet the requirements of the given application. We present an approach that identifies the syntactically similar DTD elements that can be potential matching components. We first describe DTD element graph, a data model for the DTD elements. We then define the distance between two DTD element graphs. We introduce the concept of syntactically equivalent and syntactically similar graphs. Then, we describe the algorithm to detect both schema equivalent and similar DTD elements. We have implemented the matching detection algorithm and several heuristics which improve performance. Our experimental results show reasonable precision of the algorithm in terms of recognition of correct matches... We need a mechanism to discover those semantically equivalent or similar components (we say these components match) to help generate an integrated conceptually clean schema. In this paper, we present a mechanism for detecting possible matches between components across DTDs. It consists of four phases: (1) DTDs are modeled as DTD graphs and a series of simplification transformations are performed to normalize the DTDs. (2) Initial matches are set up based on a series of matching criteria. 
(3) The initial matches are propagated by computing the matching likelihood of other component pairs based on their structural properties. (4) A 'best' matching plan is selected from multiple matching plans based on the overall matching likelihood of the component pairs... Conclusion and Future Work: XML based schema matching is likely to play an important role in e-commerce applications which rely on data integration and dynamic data sharing across distributed services (e.g., Emarkets). We have studied the problem of providing an automated tool for initial schema matching among DTDs. Our experimental evaluation using industry standard DTDs show that these algorithms are effective in identifying element and sub-tree matches. Renaming of elements reduces accuracy of the algorithms. However, this can be improved by using synonym dictionaries or domain-specific ontologies. We will be exploring these approaches as part of future work. Another issue that we would like to address is ambiguity. Ambiguity is likely to be very common when performing schema matching. For example, a one-to-one relationship can be modeled in several ways such as element and sub-element, or element and attribute, or element and an IDREF to another element. We would like to consider all alternatives as we perform the matching algorithm." Note description of Hong Su's research program: "A. Integration and management of semistructured web data. B. Complex model management, especially XML-based: (1) Underlying data publication: Export the data in non-XML data format to XML data format (2) Persist XML data in RDB, ODB databases (3) Evolution of XML. Many Applications of database systems come across the problems of manipulation of models. What we mean by model is a complex structure that represents a design artifact, such as relational schema, object model, UML model, XML DTD. 
The manipulation of models involves managing changes in models and transformations of data from one model to another which has to be addressed by the practitioners in schema integration and translation. We are studying how to represent model management in a data model that is able to capture the semantics of models and model mappings. C. Schema evolution in object databases: Schema evolution is one of the fundamental aspects of information and database systems. In the field of object database, we are studying the problems of how to specify complex schema changes, ensure structural consistency, support transparent schema changes that provide continued interoperability to active applications (behavioral consistency)." [cache]
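The first two phases of the paper's approach, modeling DTDs as element graphs and proposing initial matches, can be illustrated with a toy sketch. The Jaccard scoring below is a stand-in for the graph-distance measure defined in the paper, and the two mini-"DTDs" are invented.

```python
# Toy sketch of DTD schema matching: model each DTD as a graph of elements
# and their children, then propose initial matches between elements whose
# content models overlap. Far simpler than the paper's algorithm.

def child_sets(dtd):
    """dtd: dict of element name -> list of child element names."""
    return {elem: set(children) for elem, children in dtd.items()}

def similarity(a_children, b_children):
    """Jaccard similarity of two content models (1.0 for two leaves)."""
    if not a_children and not b_children:
        return 1.0
    return len(a_children & b_children) / len(a_children | b_children)

def initial_matches(dtd_a, dtd_b, threshold=0.5):
    a, b = child_sets(dtd_a), child_sets(dtd_b)
    return sorted(
        (ea, eb)
        for ea in a for eb in b
        if similarity(a[ea], b[eb]) >= threshold
    )

purchase = {"order": ["buyer", "item", "total"], "item": []}
invoice  = {"invoice": ["buyer", "item", "amount"], "item": []}
matches = initial_matches(purchase, invoice)
```

As in the paper, these candidate matches would then be propagated through the graphs and handed to a human arbiter, since renamed elements (total vs. amount) defeat purely syntactic scoring.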

  • [May 11, 2001] "XEM: Managing the Evolution of XML Documents." By Hong Su, Diane Kramer, Kajal T. Claypool, Li Chen, and Elke A. Rundensteiner. Paper presented at the Eleventh International Workshop on Research Issues in Data Engineering (RIDE 2001): Document Management for Data Intensive Business and Scientific Applications [Heidelberg, Germany, April 1-2, 2001, Sponsored by the IEEE Computer Society, Held in conjunction with the 17th International Conference on Data Engineering - ICDE 2001. [In Paper Session Four: XML Document Versioning and Change Management; Chair: Susanne Boll, University of Vienna, Austria.] In Proceedings Eleventh International Workshop, pages 103-110. "As information on the World Wide Web continues to proliferate at an astounding rate, the Extensible Markup Language (XML) has been emerging as a standard format for data representation on the Web. In many applications, specific document type definitions (DTDs) are designed to enforce a semantically agreed-upon structure of the XML documents for management. However, both the data and the structure of XML documents tend to change over time for a multitude of reasons, including to correct design errors in the DTD, to allow expansion of the application scope over time, or to account for the merging of several businesses into one. However most of the current software tools that enable the use of XML do not provide explicit support for such data or schema changes. In this vein, we put forth the first solution framework, called XML Evolution Manager (XEM) to manage the evolution of XML. XEM provides a minimal yet complete taxonomy of basic change primitives. 
These primitives, classified as either data changes or schema changes, are consistency-preserving, i.e., for a data change, they ensure that the modified XML document conforms to its DTD both in structure and constraints; and for a schema change, they ensure that the new DTD is a valid DTD and all existing XML documents are transformed also to conform to the modified DTD. We prove the completeness of the taxonomy in terms of DTD transformation. To verify the feasibility of our XEM approach we have implemented a working prototype system using PSE Pro as our backend storage system... XML has become increasingly popular as the data exchange format over the Web. Although XML data is self-describing, most application domains tend to use Document Type Definitions (DTDs) to specify and enforce the structure of XML documents within their systems. DTDs thus assume a similar role as types in programming languages and schemata in database systems. Many systems, such as Oracle 8i, IBM DB2, and Excelon, have recently started to enhance their existing database technologies to accommodate and manage XML data. Many of them assume that the DTD is provided in advance and will not change over the life of the XML documents. They hence utilize the given DTD to construct a fixed relational (or object-relational) schema which then can serve as structure based on which to populate the XML documents that conform to this DTD. However, change is a fundamental aspect of persistent information and data-centric systems. Information over a period of time often needs to be modified to reflect perhaps a change in the real world, a change in the user's requirements, mistakes in the initial design or to allow for incremental maintenance. While these changes are inevitable during the life of an XML repository, most of the current XML management systems unfortunately do not provide enough (if any) support for these changes. Motivating Example of XML Changes. . . 
XML Evolution Manager (XEM) Approach: in this work we propose an XML Evolution Manager (XEM) as a middleware solution that provides uniform XML-centric data and schema evolution facilities. To the best of our knowledge, XEM is the first effort to provide such uniform evolution management for XML documents. In brief, the contributions of our work are: (1) We identify the lack of generic support for change in current XML management systems; (2) We propose a taxonomy of XML evolution primitives that provide a system-independent way to specify changes at both the DTD and XML data level; (3) We analyze change semantics and introduce the notion of constraint checking to ensure structural consistency during the evolution; (4) We show that our proposed change taxonomy is complete; (5) We describe a working XML Evolution Manager prototype system that we have implemented to verify the feasibility of our approach... System Implementation: To verify the feasibility of our approach, we have implemented the ideas presented in this paper in a prototype system. We have implemented Marrow, a working framework for XML management. In Marrow we use eXcelon's PSE Pro, a lightweight object database repository, as the underlying persistent storage system for XML documents. We require that the DTDs with which the incoming XML documents will comply are entered into the system first. PSE Pro's schema repository has been enhanced to manage not only traditional OO schemas but also DTDs as metadata. The DTD-OO schema mapper generates an OO schema according to the DTD metadata. We then load the XML documents into the just-prepared schema. The mapping and loading details are given in [Kramer]. We implemented all the proposed change primitives. A comparison of the performance of using the primitives to achieve incremental change versus reloading from scratch can be found in [Kramer]... In this paper, we present the first of its kind: a taxonomy of XML evolution operations. 
These primitives preserve the consistency of XML documents in both directions: when the DTD is changed, existing XML documents are transformed to conform to the changes; and when individual XML documents are changed, the changed documents are checked to ensure that they still correspond to the specified DTD. We have implemented an XEM prototype system. The performance analysis can be found in [Kramer: D. Kramer, XML Evolution Management, Masters Thesis, Worcester Polytechnic Institute, 2001]." Also in PDF format. [cache PS, cache PDF]
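The flavor of a consistency-preserving schema-change primitive can be sketched in a few lines. Everything below (the dictionary-based document model, the `Schema` class, the `add_subelement` primitive and its default-filling rule) is an illustrative assumption, not the paper's actual design:

```python
# Illustrative sketch of a consistency-preserving schema-change primitive
# in the spirit of XEM. All names and the toy document model are
# hypothetical, not taken from the paper.

class Schema:
    """A toy stand-in for a DTD: element name -> list of (child, required)."""
    def __init__(self):
        self.elements = {}

    def add_element(self, parent, child, required=False):
        self.elements.setdefault(parent, []).append((child, required))


def add_subelement(schema, docs, parent, child, required=False, default=None):
    """Schema-change primitive: declare a new `child` under `parent`.

    To preserve consistency, a *required* child must be propagated to every
    existing document, here by inserting a default value, so that all
    documents still conform to the modified schema.
    """
    schema.add_element(parent, child, required)
    if required:
        for doc in docs:
            for node in doc.get(parent, []):
                node.setdefault(child, default)
    return docs


schema = Schema()
docs = [{"book": [{"title": "XML"}]}]
add_subelement(schema, docs, "book", "isbn", required=True, default="unknown")
print(docs[0]["book"][0]["isbn"])  # the inserted default keeps the doc valid
```

An optional child, by contrast, would require no document transformation at all, which is why such a taxonomy can distinguish cheap from propagating changes.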

  • [May 11, 2001] "Model Management: A Solution to Support Multiple Data Models, Their Mappings and Maintenance." By Kajal T. Claypool, Elke A. Rundensteiner, Xin Zhang, Hong Su, Harumi Kuno, Wang-chien Lee, and Gail Mitchell. In Proceedings of SIGMOD 2001 (Santa Barbara, CA, May 2001). 5 pages. "The growing accessibility of the Internet has brought about phenomenal growth in the publication of data. The data comes from heterogeneous distributed sources -- even within a company, similar data can be stored in a variety of formats. The growth of the Internet has also increased the need for applications that present a unified view of data from multiple sources, and thus the problem of how to integrate heterogeneous data is more important than ever. Although the problem of data exchange and integration has been well-studied for many years, in the era of electronic information exchange, data model impedance presents new critical challenges. Today, for example, we need to: 1) map an XML schema of one Web application to that of another to guide the exchange of XML instances between the applications; 2) map a web page wrapper to a database schema to guide the translation of queries on the schema to the underlying web sites; 3) map a web site's content to its page layout to drive the generation of web pages. Many projects in industry and academia have been and are continuing to struggle with this problem. However, they typically tackle an individual slice of the overall problem, and develop and reinvent many (often very similar) tools for their specific domains. In this project, we present a model management system that provides a complete and integrated solution to the larger problem, as well as an infrastructure for creating advanced tools and operators. 
Our system is capable of (1) describing different data models such as relational, OO, and XML; (2) describing cross-model mappings, to map for example an XML schema to a relational schema and to drive the transformation of XML elements into rows of relational tables; (3) describing intra-model mappings, i.e., restructurings within one model, to map data sources into data warehouse tables; and (4) discovering mappings between two given schemas with the aid of pre-defined maps and additional domain knowledge, to help e-businesses communicate using their own individual XML documents. With the aid of this middle-layer management tool, users can now describe the application schemas they are working with. Moreover, they can map from an application schema that may be in the relational model to an equivalent XML schema, can describe an XML, relational, or OO view over it, or can even discover the mapping between two pre-existing XML Schemas. Defined and discovered maps alike are represented in the model management system as first-class citizens, thereby allowing users to operate on and manipulate them. Generic tools and operators can now be built for the map models, with the promise of providing an environment for re-use of technology and effort for meta-mappings and the modeling of application schemas. As a proof of concept we provide a change management tool for our MM system. This allows modification of existing maps, i.e., maps in the MMS, to reflect schema changes in the source and/or the target, irrespective of the type of map (cross-model or intra-model) and the data model of the source and target. Thus, the tool maintains a map transforming between the source and target schema when either one of them undergoes a schema change... In Section 2, we give a brief overview of our MM system and walk through the steps involved in developing a mapping. Section 3 presents highlights of our system in terms of its features. Section 4 gives a brief look at our plans for the demo. ... 
Our MMS prototype has been written in Java (JDK 1.2) and uses Oracle8i as the MMS persistent store. We use Oracle8i triggers to develop the change management tool and use the IBM XML Parser for Java (XML4J) to extract XML data via the DOM API. Our examples for the demo focus on cross-model translations from XML to relational and vice versa, and on intra-model re-structuring of the relational model. The MMS incorporates a full range of tools that allow users to describe their application schemas, to re-structure their schemas, to map from a schema in one model to a schema in another model, and to maintain these maps once they are in place. We will demonstrate: (1) importing an application schema, e.g., a DTD, from the application layer and representing it as data in the meta layer; (2) translating an application schema (DTD) to an application schema in a different model (relational) using several pre-defined maps; (3) re-structuring an application schema (DTD or relational) using pre-defined complex SPJ maps; (4) discovering maps between two application schemas in the same data model (DTD) using pre-defined maps and domain knowledge; (5) editing generated maps using a map editor; (6) generating code for the map to drive the schema translation as well as the data transformation using the specifications given in the map; (7) propagating schema changes from source (relational) to target (relational) and vice versa by allowing in-place modification of the map." See also the Database Systems Research Group (DSRG) web site. [cache PS, cache PDF]
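The cross-model (XML-to-relational) direction the demo translates can be pictured as a simple "shredding" pass: each element of a given type becomes one relational row. The element names, column names, and document shape below are invented for illustration and are not the MMS's actual map language:

```python
# Minimal sketch of a cross-model (XML -> relational) transformation of the
# kind a model management map would drive. Element and column names are
# illustrative only.
import xml.etree.ElementTree as ET

def shred_to_rows(xml_text, element, columns):
    """Map each <element> in the document to one relational row (a tuple),
    taking each column value from the text of the matching child element."""
    root = ET.fromstring(xml_text)
    rows = []
    for el in root.iter(element):
        rows.append(tuple(el.findtext(col) for col in columns))
    return rows

doc = """
<orders>
  <order><id>1</id><customer>Acme</customer></order>
  <order><id>2</id><customer>Globex</customer></order>
</orders>
"""
rows = shred_to_rows(doc, "order", ["id", "customer"])
print(rows)  # [('1', 'Acme'), ('2', 'Globex')]
```

A declarative map, as described above, would record the (element, column) correspondences as data, so the same engine could also generate the inverse (relational-to-XML) transformation.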

  • [May 11, 2001] "Version Management of XML Documents: Copy-Based versus Edit-Based Schemes." By Shu-Yao Chien, Vassilis Tsotras (UCR), and Carlo Zaniolo (UCLA). Paper presented at the Eleventh International Workshop on Research Issues in Data Engineering (RIDE 2001): Document Management for Data Intensive Business and Scientific Applications [Heidelberg, Germany, April 1-2, 2001; sponsored by the IEEE Computer Society; held in conjunction with the 17th International Conference on Data Engineering - ICDE 2001]. [In Paper Session Four: XML Document Versioning and Change Management; Chair: Susanne Boll, University of Vienna, Austria.] "Managing multiple versions of XML documents and semistructured data represents a problem of growing interest. Traditional version control methods, such as RCS, use edit scripts representing changes in the document to support the incremental reconstruction of different versions. The edit-based approaches have been recently enhanced with a replication scheme called UBCC. UBCC is based on the notion of page usefulness and ensures effective management for multi-version documents in terms of both retrieval and storage cost. These improvements notwithstanding, the edit-based representation suffers from limited generality and flexibility -- e.g., it cannot represent changes such as rearranging the document or duplicating parts of its content. To solve these problems, the paper proposes a copy-based UBCC versioning scheme, which also provides a simpler format for the electronic exchange of multi-version documents. With the objective of matching the performance of the edit-based UBCC technique, we develop algorithms that enhance the copy-based UBCC scheme with page usefulness management. 
We also present results of various experiments that test the storage and retrieval performance of the new copy-based approach, and compare it with that of the edit-based UBCC approach... The problem of managing multiple versions for XML and semistructured documents is of significant interest for content providers and cooperative work. The XML standard is considering this problem at the transport level. The WebDAV working group is developing a standard extension to HTTP to support version control, metadata and namespace management, and overwrite protection. Traditional document version management schemes, such as RCS and SCCS, are line-oriented and suffer from various limitations and performance problems. For instance, RCS stores the most current version intact, while all other revisions are stored as reverse editing scripts. These scripts describe how to go backward in the document's development history. For any version except the current one, extra processing is needed to apply the reverse editing script to generate the old version. Instead of appending version differences at the end like RCS, SCCS interleaves editing operations among the original document source code and associates a pair of timestamps (version ids) with each document segment, specifying the lifespan of that segment. Versions are retrieved from an SCCS file by scanning through the file and retrieving valid segments based on their timestamps... However, in spite of these improvements, the edit-based representation of versions suffers from limited generality and flexibility. For example, it cannot efficiently represent changes such as rearranging or restructuring document content. To solve these problems, we propose a new copy-based versioning scheme, which gets rid of edit scripts and uses the concept of common segments to represent versions. 
This new scheme also provides a simpler, more flexible format which can be used for the electronic exchange of multi-version documents and for WWW-based cooperative authoring and versioning activities. In addition, with the objective of matching the UBCC's performance, we develop algorithms that enhance the copy-based scheme with the usefulness-based page management method used in UBCC. After formalizing the algorithms used in the two methods, we present the results of various experiments to test and compare the performance of these two strategies... Due to the growing importance of versioned XML documents, we have been seeking strategies for optimizing their storage and retrieval. In a previous paper, we concentrated on edit-based representations and proposed a usefulness-based management technique (UBCC) that provides better overall performance and flexibility than more traditional version control methods such as RCS. As discussed, the edit-based UBCC for multi-version documents achieves performance levels that are typically better than those obtainable using techniques developed for transaction-time databases and persistent object managers. In this paper we developed a copy-based representation technique that, in terms of generality and flexibility of representation, is superior to the edit-based representations favored by all previous authors. A main contribution of this paper has been to extend the usefulness-based management to our new copy-based scheme, so as to achieve the same level of performance on storage and retrieval as that obtained using the edit-based UBCC. Our copy-based scheme stores and retrieves each version as a list of sublists without using edit scripts. 
This new scheme minimizes the version-retrieval I/O overhead and offers the following advantages: (1) changes such as document reorganization and replication of selected document objects are supported, along with the traditional insertions, deletions, and updates supported by edit scripts; (2) list representations (unchanged-segment records) are stored with the actual objects, eliminating the need for a separate edit script; only the net effect of changes is stored, and intermediate changes are factored out; (3) multiple concurrent versions can be supported along with successive temporal versions." See the previous work by S-Y. Chien, V.J. Tsotras, and C. Zaniolo: "Version Management of XML Documents." WebDB 2000 Workshop, Dallas, TX., alt URL. [cache]
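The copy-based idea, a version as an ordered list of references to shared immutable segments, can be sketched in miniature. The storage layout and all names below are assumptions for illustration, not the paper's UBCC implementation (which additionally manages page usefulness on disk):

```python
# Sketch of a copy-based version representation: each version is a list of
# references to shared, immutable segments, so rearranging or duplicating
# content is as cheap as editing the reference list. Names are illustrative.

segments = {}   # segment id -> content, stored exactly once
versions = {}   # version id -> ordered list of segment ids

def store_segment(seg_id, content):
    segments[seg_id] = content

def save_version(ver_id, seg_ids):
    versions[ver_id] = list(seg_ids)

def retrieve(ver_id):
    """Reconstruct a version by concatenating its segments, no edit scripts."""
    return "".join(segments[s] for s in versions[ver_id])

store_segment("a", "<title>XEM</title>")
store_segment("b", "<body>v1 text</body>")
save_version("v1", ["a", "b"])
# v2 rearranges and duplicates content -- awkward for edit scripts,
# trivial for a reference list:
save_version("v2", ["b", "a", "a"])
print(retrieve("v2"))
```

Note how retrieval is a single sequential pass over segment references, which is what makes clustering those segments into "useful" pages the key to I/O performance.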

  • [May 11, 2001] "Chemical Markup Language. A Position Paper." By Peter Murray-Rust and Henry S. Rzepa. 2001-04-10. "This paper describes Chemical Markup Language and its relationship to IUPAC and other organisations... CML deliberately does not cover all chemistry but concentrates on 'molecules' (discrete entities representable by a formula and usually a connection table). It supports a hierarchy for compound molecules (clathrates, macromolecules, etc.). It also supports reactions, and macromolecular structures/sequences (though it can interoperate with other macromolecular XML languages as they are developed). It has no specific support for physicochemical concepts, but can support labelled numeric datatypes of several sorts which can cover a wide range of requirements. It allows quantities and properties to be specifically attached to molecules, atoms, or bonds. CML is designed to interoperate with several leading MLs and XML protocols, and we have demonstrated the following: (1) XHTML for text and images; (2) SVG for line diagrams, graphs, reaction schemes, phase diagrams, etc.; (3) PlotML for graphs and MathML for equations; (4) XLink for hypermedia (including atom-spectralPeak assignments and reaction mapping); (5) RDF and Dublin Core for metadata; (6) XML Schemas for numeric and other data types. There are other generic tools required in physical science, including units, multidimensional arrays with varied datatypes, terminology, and bibliography. There are no widely accepted MLs for these at present; we shall continue to develop our own to be used with CML but will use others if they become widespread. An example is physicochemical data held as SELF (Prof. Henry Kehiaian, IUPAC+CODATA) and now converted to SELFML (PMR+HK) as an IUPAC/CODATA project... Many different types of organisation have adopted, or are adopting, CML. 
We list a few examples: (1) Governmental and global agencies, e.g., drug regulatory agencies through the International Committee on Harmonisation (ICH/M2); we have had additional meetings or discussions with several other agencies. (2) Non-profit government research: at the National Cancer Institute's Developmental Therapeutics Program (NCI/DTP), ca. 500K compounds are being converted to CML. (3) Non-profit academic research: the University of California at San Diego (UCSD) has adopted CML as the chemical technology for its new terascale information and computing grid portals. This will also be used by the Protein Data Bank (PDB) at the same site..." For additional information, see (1) the Chemical Markup Language official web site, and (2) "Chemical Markup Language (CML)."
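The molecule-centric hierarchy CML supports (a molecule containing atoms and the bonds connecting them) might look roughly like the fragment this sketch builds. The element and attribute names here are schematic and are not guaranteed to match the CML 1.0 DTD exactly:

```python
# Build a schematic CML-like description of water with ElementTree. This
# illustrates the molecule/atomArray/bondArray hierarchy discussed above;
# the exact names are illustrative, not normative CML 1.0.
import xml.etree.ElementTree as ET

mol = ET.Element("molecule", id="water")
atoms = ET.SubElement(mol, "atomArray")
for aid, elem in [("a1", "O"), ("a2", "H"), ("a3", "H")]:
    ET.SubElement(atoms, "atom", id=aid, elementType=elem)
bonds = ET.SubElement(mol, "bondArray")
for pair in [("a1", "a2"), ("a1", "a3")]:
    # each bond refers back to the atoms it connects -- the connection table
    ET.SubElement(bonds, "bond", atomRefs=" ".join(pair), order="1")

xml = ET.tostring(mol, encoding="unicode")
print(xml)
```

The atom-reference pattern (`atomRefs`) is what makes a connection table expressible in markup, and what XLink-style pointers, e.g., the atom-spectralPeak assignments mentioned above, can hook into.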

  • [May 11, 2001] "Chemical Markup Language 1.0 reference with examples." A Zvon resource, written by Jiri Jirat. The indexes were extracted from the CML 1.0 specification. Main features: (1) clickable indexes; (2) graphical representation of examples: PNG and SVG created from CML using XSLT, for both molecules and spectra; (3) clicking on an atom in an example leads to the relevant part of the CML source.

  • [May 11, 2001] "The Role of Private UDDI Nodes in Web Services, Part 1. Six Species of UDDI." By Steve Graham (Web Services Architect, IBM Emerging Internet Technologies; previously a faculty member in the Department of Computer Science at the University of Waterloo). From IBM developerWorks. May 2001. ['Steve Graham introduces the concepts behind Web services discovery and gives a brief overview of UDDI (Universal Description, Discovery and Integration). He examines six variants of UDDI registries, highlighting the role each of these plays in a service-oriented architecture.'] "In service-oriented architectures, service descriptions and metadata play a central role in maintaining a loose coupling between service requestors and service providers. The service description, published by the service provider, allows service requestors to bind to the service provider. The service requestor obtains service descriptions through a variety of techniques, from the simple "e-mail me the service description" approach and the ever-popular sneaker-net approach, to techniques such as Microsoft's DISCO and sophisticated service registries like Universal Description, Discovery and Integration (UDDI), which is what I am going to examine here. UDDI defines four basic data elements within the data model in version 1.0: businessEntity (modeling business information), businessService (high-level service description), tModel (modeling a technology type or service type), and bindingTemplate (mapping between businessService and tModels). I won't discuss the intricacies behind these elements, so if you need more basic information on UDDI, please visit the UDDI web site before continuing (see Resources). The set of operator nodes known as the UDDI business registry, or UDDI operator cloud, implies a particular programming model characterized by design-time service discovery. 
We need design-time discovery since it is often not feasible to implement dynamic discovery at run-time due to overwhelming complexity. However, the just-in-time integration value proposition of the IBM Web Services Initiative allows organizations to provide dynamic discovery and binding of Web services at run time. To do this, API characteristics and other non-functional requirements are specified as business policies at design time. This flexibility has important characteristics for loosely-coupled enterprise application integration, both within and between organizations. The role of the UDDI cloud to support a dynamic style of Web services binding is currently limited. However, the UDDI API and data model standard can still play a role in a service-oriented architecture. The notion of a private or non-operator UDDI node is critical to the emergence of a dynamic style of a service-oriented architecture... We have briefly examined the discovery role played by UDDI within a service-oriented architecture and enumerated six species of UDDI, each supporting different uses of a service-oriented architecture. In the next installment of this article, I will contrast the programming models that use private and operator UDDI nodes, and review requirements for functionality to make private UDDI nodes easier to use." Article also in PDF format. See: "Universal Description, Discovery, and Integration (UDDI)." [cache]
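How the four UDDI data elements relate, with a bindingTemplate tying a businessService to the tModels it implements, can be sketched with toy structures. The field names below are simplified stand-ins, not the real UDDI schema:

```python
# A toy rendering of the four UDDI v1.0 data elements and the
# businessService -> bindingTemplate -> tModel linkage. Field names are
# simplified for illustration; the real UDDI data structures are richer.
from dataclasses import dataclass, field

@dataclass
class TModel:                 # fingerprint for a technology or service type
    key: str
    name: str

@dataclass
class BindingTemplate:        # where and how to invoke the service
    access_point: str
    tmodel_keys: list

@dataclass
class BusinessService:        # high-level service description
    name: str
    bindings: list = field(default_factory=list)

@dataclass
class BusinessEntity:         # the business itself
    name: str
    services: list = field(default_factory=list)

wsdl_tmodel = TModel("uuid:1", "ExampleOrder-Interface")
svc = BusinessService(
    "OrderService",
    [BindingTemplate("https://example.com/orders", ["uuid:1"])])
acme = BusinessEntity("Acme", [svc])

# A requestor can discover services by the tModel (interface) they implement:
matches = [s.name for s in acme.services
           if any("uuid:1" in b.tmodel_keys for b in s.bindings)]
print(matches)  # ['OrderService']
```

Discovery-by-tModel is what makes dynamic, run-time binding conceivable: a requestor that knows only the interface fingerprint can still locate a compatible access point.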

  • [May 11, 2001] "XML Meets Semantics. Meet the new kids on the block, and one more from the old neighborhood. [Thinking XML #2.]" By Uche Ogbuji (CEO and principal consultant, Fourthought, Inc.). From IBM developerWorks. May 2001. ['In this column, Uche Ogbuji completes his introduction to XML and semantics, setting the stage for the more practical columns that will follow. Thinking XML addresses knowledge management aspects of XML, including metadata, semantics, Resource Description Framework (RDF), Topic Maps, and autonomous agents. Approaching the topic from a practical perspective, the column aims to reach programmers rather than philosophers.'] "In my first Thinking XML column, I introduced the idea of semantic transparency and its importance to XML-related developments. Because semantic transparency is so important, there has been a flurry of activity in the area lately -- more than I could cover in one installment. In this installment, I introduce some of the emerging players in XML and semantics. But first, I'll cover an interesting play by the old guard which I omitted from the first installment... XML markup for EDI standards: The Implementation Guide Mark Up (IgML) working group is an effort by a group of electronic data interchange (EDI) vendors to represent EDI implementation guidelines and standards in XML format. They are developing a DTD (document type definition) for this representation, with the goal of providing a high degree of structure to the normative text and directing the implementation path to EDI for maximum interoperability. From the IgML Web site you can download the current draft of the DTD, as well as samples of various subsets of ANSI X12 and UN/EDIFACT (the two main 'dialects' of EDI). 
While IgML does not itself provide a framework for semantic transparency, it will provide a useful tool for those implementing XML business-to-business systems that either work with EDI or just take advantage of the semantic infrastructure provided by EDI standards... [also covered: ebXML, eCo registries, RosettaNet, RDF...] Now that I've outlined the importance of semantic frameworks as a layer above XML, future columns will move on to examining practical ways to manage the knowledge represented by these high-level frameworks. The next article will discuss the use of RDF to develop inexpensive search and reference systems for XML data repositories." Article also in PDF format. See: "XML and 'The Semantic Web'."

  • [May 11, 2001] "ebXML: E-Business Language of Choice? How an XML Specification Strives to Create One Global Market." By Don Kiely. In InformationWeek Issue 836 (May 07, 2001), pages 79-84. "The grassroots ebXML effort has created a set of standard business processes using XML to create a global online marketplace. The challenge is making a framework that's generic, yet sweeping enough to accommodate large and small firms around the world. ebXML, the United Nations-backed standard for E-business, aims to create a single online marketplace where companies of any size or nationality can collaborate and conduct business around the globe. By creating a standard way for companies to carry out common business practices, ebXML promoters hope to lower entry barriers and let small and midsize companies from the far corners of the globe join in the economic advances that their larger brethren already enjoy. It's ironic that this grassroots effort had its genesis in NATO, the North Atlantic Treaty Organization -- a pan-governmental bureaucracy. But the initiative seems to be resonating around the world, with supporters ranging from IBM and Sun Microsystems to government agencies such as the Saudi Export Development Corp. and small businesses like Martin's Famous Pastry Shoppe. The goal of ebXML -- being undertaken by about 1,000 participating organizations -- is to create a set of standards that will let companies use XML for E-business. The underlying tenet of ebXML is business workflow and common business processes that every business should be able to understand and use. You could think of ebXML as the successor to electronic data interchange. Where EDI delineated standard E-business documents such as purchase orders, ebXML specifies common business processes and an architecture for carrying out those processes over the Internet. 
EbXML is being spearheaded by the Organization for the Advancement of Structured Information Standards (OASIS), as well as the United Nations Economic Commission for Europe's Centre for Trade Facilitation and Electronic Business (UN/CEFACT). EbXML is nearing the end of its planned 18-month gestation period this month, with the publication of a complete set of specifications for using XML as the communication format for global business. The fate of ebXML after May is still undecided, but if the roster of computer vendors and consulting companies participating in the work is any indication, ebXML is likely to make its way into software and service offerings during the coming months. The draft ebXML architecture specifications describe a complex infrastructure for interactions between trading partners and a repository of XML documents from which business processes can be modeled. The architecture provides: (1) A way to define business processes and their associated messages and content; (2) A way to register and discover business process sequences with related message exchanges; (3) A way to define company profiles; (4) A way to define trading-partner agreements; (5) A uniform message transport mechanism... Several major security requirements must be addressed for ebXML to be accepted: (1) Confidentiality: Only the sender and receiver can interpret a document's contents; (2) Authentication of sender: Assurance of the sender's identity; (3) Authentication of receiver: Assurance of the receiver's identity; (4) Integrity: Assurance that the message contents haven't been altered while en route; (5) Nonrepudiation of origin: The sender cannot deny having sent the message; (6) Nonrepudiation of receipt: The receiver can't deny having received the message; and (7) Archiving: It must be possible to reconstruct the semantic intent of a document several years after the creation of the document. 
The biggest challenge of ebXML is to create a framework for automating trading-partner interactions that's both generic enough for implementation across the entire range of business processes and expressive enough to be more effective than ad hoc implementations between trading partners. The ebXML specification for the application of XML-based assembly and context rules describes how business rules are formed. Given that companies around the world operate in many different ways, it's unlikely that any single standard could possibly incorporate those many variations. No matter where ebXML heads after May, there will be some useful designs for global business interchange. Even if ebXML fizzles, it will have been a useful exercise that can make the world even smaller than it already is." In the same context: Differences between ebXML and UDDI. See references in "Electronic Business XML Initiative (ebXML)."

  • [May 11, 2001] "XML Databases Offer Greater Search Capabilities." By Charles Babcock. In Interactive Week Volume 8, Number 18 (May 01, 2001), pages 11-13. "The Extensible Markup Language is emerging not only as a Web page markup standard, but as a database technology with the potential to simplify and speed future Web operations. With databases that store whole documents in their native XML format, an archive becomes easier to search by title, author, keywords or other attributes. The development will broaden information that is available over the Web and make speedy content serving more practical, database experts said. The World Wide Web Consortium (W3C) last week released its XML Schema specification, which defines how to use XML -- a larger and more useful tagging language than its predecessor, HTML. At the same time, pioneering efforts to implement XML in database systems for managing XML documents are gaining steam. Software AG leads the field with its Tamino XML Database, and 9-month-old start-up Ipedo announced its own XML Database System last week. In the meantime, relational database vendors IBM, Oracle and Sybase continue to upgrade their products to give them more XML-handling capabilities... Both Ipedo and Software AG implement their own versions of the W3C's proposed specification for the XML Query language, now known as XQuery for short. The XQuery draft specification was released Feb. 16, 2001. Once it becomes a released specification, the use of XML documents and XML databases will proliferate, experts predicted. Ipedo is trying to capitalize on speed by urging its customers to equip their database servers with a gigabyte or more of memory. The Ipedo XML Database System dispenses with many of the time-consuming input/output operations of traditional databases by having the database engine and much of the data it works with reside in main memory. 
The move adds $1,500 or more to the cost of the server on which the database resides, but augments the speed already inherent in serving XML documents from an XML database, Matthews said. Software AG of Darmstadt, Germany, has sold 300 copies of its mainframe-style Tamino product since the system was launched in 1999. 'Content delivery is one of our greatest strengths,' said John Taylor, Software AG's director of product marketing for Tamino. He conceded that customers wouldn't buy an XML database primarily to manage large financial accounts. On the other hand, Taylor added, emerging query languages such as XQuery, which was co-authored by IBM and Software AG, will make it possible to query the XML database using 'keys' and retrieve related information from a variety of documents. Just as Structured Query Language queries the relational database, pulling out data related to a primary key or identifier, XQuery will be able to query a large set of documents based on the name of an author, date filed, subject or keywords in the document, Taylor said..."
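The attribute-based retrieval Taylor describes, querying a set of documents by author or keyword rather than by a relational key, can be approximated with simple path lookups over natively stored documents (real XQuery is, of course, far richer). The document shape here is invented for illustration:

```python
# Sketch of attribute-based retrieval over a collection of XML documents,
# in the spirit of the XQuery-style searches described above. The document
# structure is invented for illustration.
import xml.etree.ElementTree as ET

docs = [
    "<doc><author>Taylor</author><title>Tamino Notes</title></doc>",
    "<doc><author>Matthews</author><title>In-Memory XML</title></doc>",
]

def titles_by_author(documents, author):
    """Return the titles of all documents whose <author> matches."""
    out = []
    for text in documents:
        root = ET.fromstring(text)
        if root.findtext("author") == author:
            out.append(root.findtext("title"))
    return out

print(titles_by_author(docs, "Taylor"))  # ['Tamino Notes']
```

A native XML database performs essentially this kind of search against indexed element content, which is why no relational shredding or fixed schema is needed first.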

  • [May 11, 2001] "Efficient Evaluation of XML Middle-ware Queries." By Mary Fernández (AT&T Labs), Atsuyuki Morishima (University of Tsukuba), and Dan Suciu (University of Washington). Paper presented at ACM SIGMOD/PODS 2001, Santa Barbara, California, May 21-24, 2001. 12 pages. "We address the problem of efficiently constructing materialized XML views of relational databases. In our setting, the XML view is specified by a query in the declarative query language of a middle-ware system, called SilkRoute. The middle-ware system evaluates a query by sending one or more SQL queries to the target relational database, integrating the resulting tuple streams, and adding the XML tags. We focus on how to best choose the SQL queries, without having control over the target RDBMS... XML is the universal data-exchange format between applications on the Web. Most existing data, however, is stored in non-XML database systems, so applications typically convert data into XML for exchange purposes. When received by a target application, XML data can be re-mapped into the application's data structures or target database system. Thus, XML often serves as a language for defining a view of non-XML data. We are interested in the case when the source data is relational, and the exchange of XML data is between separate organizations or businesses on the Web. This scenario is common, because an important use of XML is in business-to-business (B2B) applications, and most business-critical data is stored in relational database systems (RDBMS). This scenario is also challenging, because the mapping from the relational model to XML is inherently complex and may be difficult to compute efficiently. Relational data is flat, normalized (3NF), and its schema is often proprietary. For example, relation and attribute names may refer to a company's internal organization, and this information should not be exposed in the exported XML data. 
In contrast, XML data is nested, unnormalized, and its schema (e.g., a DTD or XML Schema) is public. The mapping from the relational data to XML, therefore, usually requires nested queries, joins of multiple relations, and possibly integration of disparate databases. In this work, we address the problem of evaluating efficiently an XML view in the context of SilkRoute, a relational to XML middle-ware system. In SilkRoute, a relational to XML view is specified in the declarative query language RXL. An RXL query has constructs for data extraction and for XML construction. We are interested in the special case of materializing large RXL views. In practice, large, materialized views may be atypical: often the XML view is kept virtual, and users' queries extract small fragments of the entire XML view. For example, SilkRoute supports composition of user-defined queries in XML-QL and virtual RXL views and translates the composed queries into SQL. SilkRoute's query composition algorithm is described elsewhere. Our goal is to support data-export or warehousing applications, which require a large XML view of the entire database. In this case, computing the XML view may be costly, and query optimization can yield dramatic improvements...In our scenario, the XML document defined by an RXL view typically exceeds the size of main memory, therefore, the sorted, outer-union approach best suits our needs. This approach constructs one large, SQL query from the view query; reads the SQL query's resulting tuple stream; and then adds XML tags. The SQL query consists of several left-outer joins, which are combined in outer unions. The resulting tuples are sorted by the XML element in which they occur, so that the XML tagging algorithm can execute in constant space. SilkRoute initially used a more naive approach, in which the view query was decomposed into multiple SQL queries that do not contain outer joins or outer unions. 
Each result is sorted to permit merging and tagging of the tuples in constant space. We call this the fully partitioned strategy. This work makes two contributions. First, we show experimentally that neither of the above approaches is optimal. This is surprising for the sorted outer-union strategy, because only one SQL query is generated, which therefore has the greatest potential for optimization by the RDBMS. In experiments on a 100MB database, we found that the outer-union query was slower than the queries produced by the fully-partitioned strategy. We found that the optimal strategy generates multiple SQL queries, but fewer than the fully partitioned strategy; therefore, the optimal SQL queries may contain outer joins and outer unions. XML tagging still uses constant space, because it merges sorted tuple streams. The optimal strategy executes 2.5 to 5 times faster than the sorted outer-union and fully-partitioned strategies... Generating SQL queries from an XML view definition is a tedious task, and as we have shown, different SQL-generation strategies dramatically affect query-evaluation time. These observations indicate that the user of a relational-to-XML publishing system should not be responsible for choosing SQL queries. To better support large XML views, we presented a method that decomposes the XML view definition into several smaller SQL queries and submits the decomposed SQL queries to the target database. Our greedy algorithm for decomposing an XML view definition relies on query-cost estimates from the target query optimizer. This method works well in practice and generates execution plans that are near optimal. Although particularly effective in an XML middle-ware system, our view-tree representation can encompass the view-definition languages of commercial relational-to-XML systems. Commercial systems typically generate XML in-engine, because the cost of binding application variables to the tuples dominates execution time. 
Our decomposition method could be applied within a relational query optimizer as a preprocessing step to XML publishing of relational data in-engine. This work is focussed on publishing large XML documents in an environment in which the middle-ware system has no control over the physical environment or query optimizer of the target database. Given these constraints, our greedy algorithm for searching for optimal query plans is necessary and effective. The simpler outer-union strategy, however, might be adequate when the middle-ware system has more control over the target database. SilkRoute's generated optimal plans do better than the unified outer-union plan, because each individual query is smaller than the outer-union plan. Small queries are less likely to stress the query optimizer; they sort smaller result relations and therefore are less likely to spill tuples to disk; and they typically have many fewer null values than a unified query. An outer-union plan can be reduced by hand, which would provide the same benefits as automatic view-tree reduction. Assuming that the target database has plentiful memory and/or multiple disks, and efficiently supports null values, the resulting outer-union plan is likely to be comparable to SilkRoute's generated optimal plans. Finally, the outer-union plan may also be appropriate when a user query requests only a subset of the XML view, and the result document is small. In this scenario, the outer-union strategy should work well, because the resulting SQL query is usually simple. This scenario is considered in [SilkRoute: Trading], where the XML view of the database is virtual, and users query it using XML-QL." See also: Mary Fernández, Dan Suciu, and Wang-Chiew Tan: "SilkRoute: Trading Between Relations and XML," in Proceedings of WWW9 (2000). On XML query: "XML and Query Languages."
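The sorted outer-union idea the paper describes can be sketched end to end with a toy relational source. Everything here — the schema, the tag values, and the element names — is hypothetical and far simpler than SilkRoute's actual plans; the point is only to show why sorting by the enclosing XML element lets the tagger run in constant space.

```python
import sqlite3

# A toy database standing in for the relational source.
db = sqlite3.connect(':memory:')
db.executescript("""
CREATE TABLE customer (id INTEGER, name TEXT);
CREATE TABLE orders (id INTEGER, cust_id INTEGER, total REAL);
INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders VALUES (10, 1, 99.5), (11, 1, 12.0);
""")

# One sorted outer-union query: each branch carries a 'tag' naming the
# XML element it produces, and the ORDER BY groups tuples by their
# enclosing element so tagging never needs to buffer the whole result.
rows = db.execute("""
SELECT 1 AS tag, c.id AS c_id, c.name AS name, NULL AS o_id, NULL AS total
FROM customer c
UNION ALL
SELECT 2, c.id, NULL, o.id, o.total
FROM customer c LEFT OUTER JOIN orders o ON o.cust_id = c.id
ORDER BY c_id, tag, o_id
""").fetchall()

# Constant-space tagging: emit tags while scanning the sorted stream.
xml, open_cust = [], None
for tag, c_id, name, o_id, total in rows:
    if tag == 1:
        if open_cust is not None:
            xml.append('</customer>')
        xml.append(f'<customer name="{name}">')
        open_cust = c_id
    elif o_id is not None:
        xml.append(f'<order id="{o_id}" total="{total}"/>')
if open_cust is not None:
    xml.append('</customer>')
print('\n'.join(xml))
```

Customers with no orders (here, Globex) still appear, because the second branch is a left outer join; its all-NULL order row is simply skipped by the tagger.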

  • [May 11, 2001] "DTDs and XML Documents from SQL Queries. [XML Matters #9.]" By David Mertz, Ph.D. (Bricolateur, Gnosis Software, Inc.) From IBM developerWorks. May 2001. ['This column discusses the public-domain sql2dtd and sql2xml utilities that allow RDBMS-independent generation of portable XML result sets. SQL queries that extract data from relational databases can provide very practical ad hoc document-type information for the representation of query results in XML.'] "The previous "XML Matters" column discussed some of the theory and advantages underlying various data models. One conclusion of that column was that RDBMSs are here to stay (with good reasons), and that XML is best seen in this context as a means of transporting data between various DBMSs, rather than as something to replace them. XPath and XSLT are useful for certain "data querying" purposes, but their application is far less broad and general than that of RDBMSs, and SQL, in particular. However, for lack of space, I am deferring a discussion of the specific capabilities (and limits) of XPath and XSLT until a later column. A number of recent RDBMSs, including at least DB2, Oracle, and probably others, come with built-in (or at least optional) tools for exporting XML. However, the tools discussed in this column are intended to be generic; in particular, the DTDs generated by these tools will remain identical for the same query performed against different RDBMSs. I hope this will further the goals of data transparency. Simplifying too much: what you might imagine as the most obvious way to convert relational database data to XML is also generally a bad idea. That is, it would be simple enough -- conceptually and practically -- to do a table-by-table dump of all the contents of an RDBMS into corresponding XML documents... Suppose that A and B each has its own internal data storage strategy (for example, in different RDBMSs). 
Each maintains all sorts of related information that is not relevant to the interaction between A and B, but they also both have some information they would like to share. Suppose, along these lines, that A needs to communicate a particular kind of data set to B on a recurrent basis. One thing A and B can do is agree that A will periodically send B a set of XML documents, each of which will conform to a DTD agreed to in advance. The specific data in one transmission will vary with time, but the validity rules have been specified in advance. Both A and B can carry out their programming, knowing the protocol between them. One way to develop this communication between A and B is to develop DTDs (or schemas) that match the specific needs of A and B. Then A will need to develop custom code to export data into the agreed DTDs from A's current RDBMS; and B will need to develop custom code to import the same data (into a differently structured database). Then, finally, the communication channel can be opened. However, a quicker way -- a way that is likely to leverage existing export/import procedures -- usually exists. The Structured Query Language (SQL) is a wonderfully compact means of expressing exactly what data interests you within an RDBMS database. Trying to bolt XML native techniques like XPath or XSLT onto a relational model will probably feel unnatural, although they can certainly express querying functions within XML's basically hierarchical model. Many organizations have already developed well-tested sets of SQL statements for achieving known tasks. Often, in fact, RDBMSs provide means for optimizing stored queries. While there are certainly cases where designing rich DTDs for data exchanges makes sense, in many or most cases, using the structuring information implicit in SQL queries as an (automatic) basis for XML data transmissions can be a good solution. 
While SQL queries can combine table data in complex ways, the result from any SQL query is a rather simple row-and-column arrangement. Query output has a fixed number of columns, with each row filling in values for every fixed column. (That is, as well as not changing in number, neither the value type nor the names of columns change within a SQL result -- even though both these things could change in XML documents.) The potential of XML to represent complex nesting patterns of elements is simply not going to be deeply exercised in representing SQL results. Nonetheless, several important aspects of an SQL query can and should be represented in an XML DTD beyond simply row/column positions... In general, sql2dtd can generate the DTD from an SQL query but does not itself query any database. sql2xml performs queries via ODBC and optionally utilizes sql2dtd to get a DTD (or it can generate DTD-less XML documents). These tools help with only approximately half the process contemplated between A and B. A and B can quickly arrive at DTDs using these tools, and A can equally quickly generate the output XML documents conforming with these DTDs. But B, at its end, still needs to do all the work involved in parsing, storing and processing these received documents. Later columns will discuss B's job in some more detail." See references in "XML and Databases."
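The core observation — that a SQL result's fixed column list implies a flat row/column DTD automatically — is easy to demonstrate. The sketch below is not the sql2dtd tool itself; the `result` and `row` element names are assumptions made for the example, and the column metadata comes from a standard database cursor.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE book (title TEXT, author TEXT, price REAL)')

# The column names of the query result are all we need to derive a
# simple row/column DTD, independent of which RDBMS ran the query.
cur = db.execute('SELECT title, author, price FROM book')
cols = [d[0] for d in cur.description]

dtd = ['<!ELEMENT result (row*)>',
       f'<!ELEMENT row ({",".join(cols)})>']
dtd += [f'<!ELEMENT {c} (#PCDATA)>' for c in cols]
print('\n'.join(dtd))
```

Because the DTD is derived from the query rather than from any one engine's catalog, the same query yields the same DTD against different databases — the portability property the column emphasizes.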

  • [May 11, 2001] "Tutorial: Mapping DTDs to Databases." By Ronald Bourret. From May 09, 2001. ['XML and database expert Ron Bourret discusses mapping DTDs to database schemas, and vice versa. In his in-depth article, Bourret discusses both table-based and object-relational mappings. The article describes best practices.'] "A common question in the XML community is how to map XML to databases. This article discusses two mappings: a table-based mapping and an object-relational (object-based) mapping. Both mappings model the data in XML documents rather than the documents themselves. This makes the mappings a good choice for data-centric documents and a poor choice for document-centric documents. The table-based mapping can't handle mixed content at all, and the object-relational mapping of mixed content is extremely inefficient. Both mappings are commonly used as the basis for software that transfers data between XML documents and databases, especially relational databases. An important characteristic in this respect is that they are bidirectional. That is, they can be used to transfer data both from XML documents to the database and from the database to XML documents. One consequence is that they are likely to be used as canonical mappings on top of which XML query languages can be built over non-XML databases. The canonical mappings will define virtual XML documents that can be queried with something like XQuery. In addition to being used to transfer data between XML documents and databases, the first part of the object-relational mapping is used in "data binding", the marshalling and unmarshalling of data between XML documents and objects... Most XML schema languages can be mapped to databases with an object-relational mapping. The exact mappings depend on the language. DDML, DCD, and XML Data Reduced schemas can be mapped in a manner almost identical to DTDs. The mappings for W3C Schemas, Relax, TREX, and SOX appear to be somewhat more complex. 
It is not clear to me that Schematron can be mapped. In the case of W3C Schemas, a complete mapping to object schemas and then to database schemas is available. Briefly, this maps complex types to classes (with complex type extension mapped to inheritance) and maps simple types to scalar data types (although many facets are lost). "All" groups are treated like unordered sequences and substitution groups are treated like choices. Finally, most identity constraints are mapped to keys. For complete details, see" See also by Ronald Bourret: (1) "XML and Databases," and (2) "XML Database Products." Reference list: "XML and Databases."
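The object-relational mapping Bourret describes can be illustrated mechanically: an element type with element content becomes a table, its PCDATA-only children become columns, and a repeated child becomes its own table with a foreign key back to the parent. The simplified DTD model and all names below are invented for the sketch, not taken from the article.

```python
# A hand-simplified model of a DTD: each element type lists its
# single-valued PCDATA children and its repeated element children.
dtd = {
    'SalesOrder': {'single': ['Date', 'Customer'], 'repeated': ['Item']},
    'Item':       {'single': ['Sku', 'Qty'],       'repeated': []},
}

def to_ddl(model):
    """Map each element type to a table: PCDATA children become
    columns, and the child side of each one-to-many relationship
    gets a foreign key to its parent."""
    stmts = []
    for elem, kids in model.items():
        cols = [f'{elem}Key INTEGER PRIMARY KEY']
        cols += [f'{c} TEXT' for c in kids['single']]
        for parent, pk in model.items():
            if elem in pk['repeated']:
                cols.append(f'{parent}FK INTEGER REFERENCES {parent}')
        stmts.append(f'CREATE TABLE {elem} ({", ".join(cols)});')
    return stmts

ddl = to_ddl(dtd)
print('\n'.join(ddl))
```

A real mapper must also handle order, mixed content, and data types (everything here becomes TEXT), which is exactly where the article notes the object-relational approach becomes inefficient for document-centric XML.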

  • [May 11, 2001] "Reports from WWW10." By Edd Dumbill. From May 09, 2001. ['Highlights from the 10th International World Wide Web conference, which took place last week in Hong Kong. The reports feature Tim Berners-Lee's keynote, Web multimedia, the problems of deploying XHTML, and web annotations with Annotea.'] "Opening the conference on Wednesday, Tim Berners-Lee told the attendees they could congratulate themselves for the progress made so far on the Web, but that they weren't finished building yet. Announcing the release of a landmark XML specification, W3C XML Schema, Berners-Lee explained that the three specifications -- XML 1.0, XML Namespaces, and XML Schema -- formed the new foundation of XML. XML Schema allows the description, in XML, of XML languages, such as SVG or XHTML, and it's designed to replace DTDs, which served the same purpose in XML 1.0. The development of the XML Schema specification has been characterized by controversy and criticism, since the early concerns in late 1999 as to whether Microsoft would support it. Berners-Lee praised the Schema working group for their perseverance in difficult circumstances. Though many in the XML developer community still have reservations about the specification, most agree that XML Schema will, and indeed has to, succeed. So now, over three years since the XML 1.0 Recommendation was first published, the W3C has built a foundation for XML that its member companies think can be used in today's applications. However, there's more to the total XML architecture than the foundation. Berners-Lee noted that a key technology, the XML Query language, a kind of SQL for XML data, is still in development, as are XLink and XPointer, XML technologies for linking documents together..."

  • [May 11, 2001] "Can XML Help Write the Law?" By Alan Kotok. From May 09, 2001. ['A report from the Conference on Congressional Organizations' Application of XML, where both the mechanics and the public benefits of making legislation available in XML were discussed.'] "XML has spawned a number of new initiatives to improve the way enterprises, including government and not-for-profit organizations, do business. A meeting held on 24 April 2001 on Capitol Hill in Washington, D.C., focused on applying XML to the process of crafting legislation, with the potential at least of transforming the basic relationship between citizens and their elected representatives. The meeting, organized by LegalXML and the House Committee on Administration, had speakers on the current ways of generating legislative documents and turning them into full-fledged laws and regulations. However, the meeting also discussed ways that the public and political process could benefit from the wealth of data in government databases, when linked to legislation made available in XML documents. The few uses of XML in legislation so far have shown some impressive results. Brian Breneman of the Breneman Group talked about the State of Michigan's experiences applying XML to its legislative documents. Breneman served as the contractor that developed the Michigan system. In Michigan, the state legislature converts its compiled law to XML, which makes it easier to offer the documents online in HTML and PDF formats..." See the COAX invitation letter, and other references in the events page.

  • [May 11, 2001] "ICE Keeps Data Fresh. Protocol for content exchange catches on slowly." By Chuck Moozakis. In InternetWeek (May 07, 2001), pages 17-18. ['ICE addresses the thorny issue of how a content provider manages the flow of information sent to users -- ensuring that the freshest information is sent to the correct audience at the right time.'] "German software maker Intershop was looking to pump up its Enfinity catalog content e-commerce application back in 1999, and was seeking a technology that would ensure that the right data was being pushed to the right audience. [CTO] Bassiri could have assigned Intershop's programmers the arduous task of writing code to support the management of supplier data. Instead, he was able to avoid that task by building Enfinity around the Information and Content Exchange protocol, a standard developed to help content providers direct information to a wide variety of users. 'No other standard addresses this area directly,' Bassiri said about ICE, an XML-based protocol developed in late 1998 to help companies route their content to disparate audiences. 'Without it we would have to write our own code, and that code would only be specific to a certain type of content.' In a nutshell, ICE addresses the thorny issue of how a content provider manages the flow of information sent to users--ensuring that the freshest information is sent to the correct audience at the right time. The standard also lets companies code their content so that it's sent to user sites during times when bandwidth is most prevalent -- for example, in the middle of the night--to avoid backbone bottlenecks. Dianne Kennedy, founder of consultancy XMLXperts, describes ICE as the 'data pump used to make sure content is where it needs to be when it needs to be there.' The protocol's great value is based on three primary attributes, according to ICE proponents. The first is that it lets users tag ICE-encoded information with effective dates and expiration dates. 
This means that content can be sent to users early and marked with the date on which it can be redistributed to users' customers. Similarly, ICE lets information be marked as "valid" only up until a specified expiration date. A second ICE attribute is that it lets users integrate their own syndicated content with a customer's existing information. Tribune Media Services, for example, is evaluating ICE to permit the syndicator of newspaper content to mesh its cartoon and entertainment information with news packages created by its member newspapers. A third important attribute of ICE, supporters say, is that the protocol supports a wide variety of delivery guarantees, assuring syndicators that content was delivered as promised. In this case, a company could be notified if critical content it's providing hasn't been delivered. For less time-sensitive information, however, that same company might choose not to be notified if its content has been delayed... Despite all of ICE's potential benefits, backers conceded adoption has been slower than many would have liked..." In the same article: "ICE Explained". For references, see "Information and Content Exchange (ICE) Protocol."
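The first of ICE's three attributes — tagging content with effective and expiration dates so it can be pushed early but redistributed only inside its validity window — reduces to a simple date-window check. The field names and toy item below are illustrative; the real ICE protocol defines its own XML vocabulary for this metadata.

```python
from datetime import datetime

# A toy syndicated item with ICE-style freshness metadata
# (field names are invented for this sketch).
item = {
    'content': '<cartoon>...</cartoon>',
    'effective': datetime(2001, 5, 7),   # may be redistributed from here
    'expires':   datetime(2001, 5, 14),  # 'valid' only until here
}

def is_deliverable(entry, now):
    """Content counts as fresh only inside its validity window."""
    return entry['effective'] <= now < entry['expires']

print(is_deliverable(item, datetime(2001, 5, 10)))  # True
print(is_deliverable(item, datetime(2001, 5, 20)))  # False
```

In a real ICE deployment this check lets a syndicator transmit content overnight, when bandwidth is plentiful, while still controlling exactly when subscribers may publish it.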

  • [May 11, 2001] "Breaking New Ground In Metro Interconnection." By Rebecca Wetzel. In Interactive Week (May 01, 2001). "It's finally getting easier to interconnect carriers and feed content into local and backbone pipes within metropolitan areas. Last week, MediaCenters, a Chantilly, Va., start-up, announced a set of services designed to solve what is becoming a metropolitan service interconnection crisis. Jim Greenberg, the company's chief technology officer and co-founder, says that the fact that content and applications are moving away from the backbone is driving the need for better, faster, easier and cheaper 'meet me' options at the edge of the network. The types of companies that need to meet in such metropolitan exchanges include long-haul service providers, local access providers of all stripes, content and application hosters, and content accelerators - such as Akamai Technologies. Analyst Peter Sevcik says these 'four horsemen of the Internet' currently require about 1,500 interconnections per metropolitan center... MediaCenters' networks have two components, a physical network called eXpressNet, and an eXtensible Markup Language (XML)-based operations support system called eXchangeNet. The network and OSS provide a carrier-neutral, optical, any-to-any, metropolitan network, enabling terabit-per-second interconnectivity among partners. The resulting service allows companies to instantly interconnect once they are physically hooked into the same eXpressNet network. MediaCenters also touts what it calls its 'e-bonding' tool, which allows service providers to link back-office systems so they can jointly deliver and bill for services. In addition, the XML format allows service providers to submit and receive order requests and trouble-ticket information between each other's systems. Andy Baer, MediaCenters' chief information officer, masterminded the eXchangeNet OSS. As he explains it, eXchangeNet provides service creation, assurance and billing..."

  • [May 11, 2001] "How Web Services Mean Business." By Whit Andrews, Daryl Plummer, and David Smith. From Gartner. 9-May-2001. ['Whether as a tool or a goal, Web services are poised to have a dramatic effect on business -- even enterprises that think of themselves as independent of technology trends.'] "Business must be forgiven its profound skepticism when the boosters of IT trumpet the benefits of any given innovation, but avoid acknowledging the inevitable organizational and technical challenges that technology brings as baggage. 'The Next Big Thing' is now a term of derision as often as it is a promise of innovation. But this understandable attitude has also had a surprising side effect. This time, a next big thing -- the Web services revolution in the continuum of technology evolution -- has, at its heart, the realistic possibility that it will bring fewer challenges than any previous generation. Simplicity is both the Web services concept's promise and strength. Businesses that ignore its potential, or decide to sit out its early stages, will find themselves outpaced by rivals that take advantage of Web services to improve their agility and even to transform themselves into new kinds of enterprises. Because of their inherent ease of use, dynamism and flexibility, Web services will permeate business from the executive suite to the IS 'clean room.' Enterprises of all sizes will find that Web services offer a more cost-effective way to perform agilely on the Supranet and in other environments... Web services are software components that interact with one another dynamically and use standard Internet technologies, making it possible to build bridges between systems that otherwise would require extensive development efforts. One of the tenets of Web services is that systems can advertise the presence of business processes, information or tasks that can be consumed by other systems. 
Web services can be delivered to any customer device -- e.g., cell phone, personal digital assistant (PDA) and PC -- and can be created or transformed from existing applications. More important, Web services use repositories of services that can be searched to locate the desired function to create a dynamic value chain. New specifications -- such as the Universal Description, Discovery and Integration specification -- allow the extension of business interaction by locating new processes or information, examining the description of what those processes do and binding to the new processes while the system runs. Bottom line: Web services will serve as an attractive means through which enterprises can gain access to software and business services. Through 2H02, 75 percent of enterprises with greater than $100 million in revenue will interface periodically with Web services (0.8 probability). Through 1H03, 50 percent of enterprises with less than $100 million in revenue will interface periodically with Web services (0.8 probability). This 'next big thing' will fulfill many of the broken promises of the past and present a compelling opportunity for enterprises of all sizes."

  • [May 11, 2001] "IBM Set to Launch Major Web Services Initiative." By Jaikumar Vijayan. In ComputerWorld (May 03, 2001). "IBM on Monday [2001-05-14] plans to launch an e-business initiative aimed at helping users dynamically connect multiple enterprise applications and systems using a standards-based Web services architecture, according to sources familiar with the announcement. The effort is said to encompass all four of IBM's major software technologies below the operating system level -- its WebSphere application server and DB2 database, plus subsidiary Tivoli Systems Inc.'s management tools and the groupware and collaboration products made by the company's Lotus Development Corp. unit. As part of the initiative, the sources said, IBM will develop new tools and software components that are supposed to let the different technologies interact with one another more efficiently. IBM declined to comment on the announcement, which is due to take place at an event in New York. Among the products expected to be announced are WebSphere Studio tools for developing Web-based computing services and a WebSphere Business Integrator, which reportedly will provide integration, transaction and workflow services between different internal applications and between systems running at multiple companies. The new WebSphere products are scheduled to start shipping later this quarter and will incorporate support for standards such as the Simple Object Access Protocol [SOAP]; the Universal Description, Discovery and Integration [UDDI] directory; and the Web Services Description Language [WSDL], the sources said. Also in the works is a Lotus Web services enablement kit supporting that unit's software products, they added. Those tools are expected to become available in the second half of this year and will include a knowledge-discovery management module that can be used to capture information about various Web services. 
Meanwhile, the sources said a DB2 XML Extender is being added to bring Web services to IBM's relational database. The technology has already been integrated into IBM's recently announced DB2 Version 7.2 release and will enable applications built on top of that software to access information stored in databases made by other vendors..."

  • [May 10, 2001] "Unicode Character Database (UCD) in XML Format." Prepared by Mark Davis. From the posting to '' 2001-05-10, 'Subject: UCD in XML': "Several people asked me over the last month about the XML version of the Unicode character database that I presented at last November's UTC meeting. I posted it at, containing two files: UCD.xml and UCD-Notes.htm. Caveats: (1) I regenerated the data with Unicode 3.1 data. However, (a) I haven't done more than spot-check the results, and (b) the format differs somewhat from what is documented in the notes; (2) I still have to comment out characters FFF9..FFFD, and all surrogates, so that people can read the file with Internet Explorer (I do wish they would use a conformant XML parser). Also, note that IE takes quite a while to load the file... Format: The Unicode blocks are provided as a list of <block .../> elements, with attributes providing the start, end, and name. Each assigned code point is a <e .../> element, with attributes supplying specific properties. The meaning of the attributes is specified below. There is one exception: large ranges of code points  for characters such as Hangul Syllables are abbreviated by indicating the start and end of the range. Because of the volume of data, the attribute names are abbreviated. A key explains the abbreviations, and relates them to the fields and values of the original UCD semicolon-delimited files. With few exceptions, the values in the XML are directly copied from data in the original UCD semicolon-delimited files. Those exceptions are described below... Numeric character references (NCRs) are used to encode the Unicode code points. Some Unicode code points cannot be transmitted in XML, even as NCRs (see, or would not be visibly distinct (TAB, CR, LF) in the data. Such code points are represented by '#xX;', where X is a hex number. Attribute Abbreviations: To reduce the size of the document, the following attribute abbreviations are used. 
If an attribute is missing, that means it gets a default value. The defaults are listed in parentheses below. If there is no specific default, then a missing attribute should be read as N/A (not applicable). A default with '=' means the default is the value of another field (recursively!). Thus if the titlecase attribute is missing, then the value is the same as the uppercase. If that in turn is missing, then the value is the same as the code point itself. For a description of the source files, see UnicodeCharacterDatabase.html. That file also has links to the descriptions of the fields within the files. Since the PropList values are so long, they will probably also be abbreviated in the future." See "XML and Unicode." [cache]
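The recursive attribute defaulting Davis describes — a missing titlecase falls back to the uppercase, which in turn falls back to the code point itself — is just a chain of lookups. The abbreviated attribute names in this sketch are invented for illustration, not the actual abbreviations used in UCD.xml.

```python
# Each attribute maps to the attribute it defaults to; the chain is
# followed until a value is found (or the chain runs out).
DEFAULTS = {'tc': 'uc', 'uc': 'cp'}  # titlecase -> uppercase -> code point

def resolve(entry, attr):
    """Return an attribute's value, applying recursive defaults."""
    while attr not in entry and attr in DEFAULTS:
        attr = DEFAULTS[attr]
    return entry.get(attr)

e1 = {'cp': '0061', 'uc': '0041'}  # 'a': no titlecase attribute given
e2 = {'cp': '0041'}                # 'A': no case mappings given at all
print(resolve(e1, 'tc'))  # 0041 (falls back to the uppercase)
print(resolve(e2, 'tc'))  # 0041 (falls back to the code point itself)
```

Omitting defaultable attributes this way is what keeps the XML version of the database compact despite covering every assigned code point.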

  • [May 10, 2001] "Summary of the XML Family of W3C Languages." By Airi Salminen. 28-March-2001. "XML is a markup language for presenting information as structured documents. The language has been developed from SGML as an activity of the World Wide Web Consortium (W3C). Within W3C a number of other XML-related language development activities are under way, where the intent is to specify syntactic and semantic rules either for some specific kind of XML data or for data to be used together with XML data for a specific purpose. In this report the term XML family of W3C languages refers to XML and those XML-related languages. The purpose of the report is to give a concise overview of the current state of the development of the languages... In this summary the XML family of W3C languages has been divided into four groups: XML, XML Accessories, XML Transducers, and XML Applications. (1) XML Accessories are languages which are intended for wide use to extend the capabilities specified in XML. Examples of XML accessories are the XML Schema language extending the definition capability of XML DTDs and the XML Names extending the naming mechanism to allow, in a single XML document, element and attribute names that are defined for and used by multiple software modules. (2) XML Transducers are languages which are intended for transducing some input XML data into some output form. Examples of XML transducers are the style sheet languages CSS2 and XSL intended to produce an external presentation from some XML data and XSLT intended for transforming XML documents into other XML documents. A transducer language is associated with some kind of processing model which defines the way output is derived from input. (3) XML Applications are languages which define constraints for a class of XML data for some special application area, often by means of a DTD. 
Examples of XML applications are MathML, defined for mathematical data, and SMIL, intended for multimedia documents... This report has been created as part of the X Group activities at the University of Waterloo in Canada." [cache]
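Salminen's "transducer" category is defined by its processing model: output is derived from XML input by rules applied over the tree. A transducer language like XSLT declares such rules; the hand-coded sketch below (with invented element names) just illustrates the model itself.

```python
# Minimal illustration of the "transducer" processing model: walk an
# XML input tree and derive an output tree from it by simple rules.
import xml.etree.ElementTree as ET

src = ET.fromstring("<doc><title>Report</title><para>Hello</para></doc>")

out = ET.Element("html")
body = ET.SubElement(out, "body")
for child in src:
    # Rule table: map source element names to output element names.
    tag = {"title": "h1", "para": "p"}.get(child.tag, "div")
    ET.SubElement(body, tag).text = child.text

print(ET.tostring(out, encoding="unicode"))
# <html><body><h1>Report</h1><p>Hello</p></body></html>
```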

  • [May 10, 2001] "Updating XML." By Igor Tatarinov, Zachary G. Ives, Alon Y. Halevy, and Daniel S. Weld. Paper presented at ACM SIGMOD/PODS 2001, Santa Barbara, California, May 21-24, 2001. 12 pages. The authors propose a set of operations for both ordered and unordered XML data, and describe extensions to the proposed W3C XML Query language (XQuery) to incorporate the update operations. They conclude that updates to an XML document can be expressed in a concise and natural way, even with support for ordering. They show that the basic set of constructs can be efficiently implemented over a relational database. Note: Zach Ives and Igor Tatarinov work on the the Tukwila data integration system. "Zack Ives is responsible for the Tukwila execution engine and its adaptive operation, as well as the optimizer for previewing query results. His work largely relates to adaptive query processing, processing of XML data, XML and zero-knowledge query optimization, and XML query languages. Igor Tatarinov is developing the next-generation Tukwila query optimizer, focusing on high-level optimization for data integration." For related XML research, see (1) the publications listing of Zachary Ives and Daniel Weld (Professor of Computer Science and Engineering, University of Washington); (2) "Tukwila Data Integration System (University of Washington)." [cache]

  • [May 09, 2001] "Model-Driven Architecture: Vision, Standards And Emerging Technologies." By John Poole (Hyperion Solutions Corp). April 2001. ['A paper submitted to ECOOP 2001 Workshop on Metamodeling and Adaptive Object Models. It discusses the MDA standards (including CWM), current Java platform initiatives, and how they could ultimately be used to build totally dynamic systems.'] "Recently, the Object Management Group introduced the Model-Driven Architecture (MDA) initiative as an approach to system specification and interoperability based on the use of formal models. In MDA, platform-independent models (PIMs) are initially expressed in a platform-independent modeling language, such as UML. The platform-independent model is subsequently translated to a platform-specific model (PSM) by mapping the PIM to some implementation language or platform (e.g., Java) using formal rules. At the core of the MDA concept are a number of important OMG standards: The Unified Modeling Language (UML), Meta Object Facility (MOF), XML Metadata Interchange (XMI), and the Common Warehouse Metamodel (CWM). These standards define the core infrastructure of the MDA, and have greatly contributed to the current state-of-the-art of systems modeling. As an OMG process, the MDA represents a major evolutionary step in the way the OMG defines interoperability standards. For a very long time, interoperability had been based largely on CORBA standards and services. Heterogeneous software systems inter-operate at the level of standard component interfaces. The MDA process, on the other hand, places formal system models at the core of the interoperability problem. What is most significant about this approach is the independence of the system specification from the implementation technology or platform. The system definition exists independently of any implementation model and has formal mappings to many possible platform infrastructures (e.g., Java, XML, SOAP). 
The MDA has significant implications for the disciplines of Metamodeling and Adaptive Object Models (AOMs). Metamodeling is the primary activity in the specification, or modeling, of metadata. Interoperability in heterogeneous environments is ultimately achieved via shared metadata and the overall strategy for sharing and understanding metadata consists of the automated development, publishing, management, and interpretation of models. AOM technology provides dynamic system behavior based on run-time interpretation of such models. Architectures based on AOMs are highly interoperable, easily extended at run-time, and completely dynamic in terms of their overall behavioral specifications (i.e., their range of behavior is not bound by hard-coded logic). The core standards of the MDA (UML, MOF, XMI, CWM) form the basis for building coherent schemes for authoring, publishing, and managing models within a model-driven architecture. There is also a highly complementary trend currently building within the industry toward the realization of these MDA standards in the Java platform (i.e., standard mappings of platform-independent models to platform-dependent models, where the platform-dependent model is the Java platform). This is a sensible implementation strategy, since development and integration is greatly facilitated through common platform services and programming models (interfaces or APIs), provided as part of the Java platform. Java 2 Platform, Enterprise Edition (J2EE), has become a leading industry standard for implementing and deploying component-based, distributed applications in multi-tier, Web-centric environments. Current efforts within the Java Community Process to develop pure Java programming models realizing OMG standards in the form of J2EE standard APIs (i.e., JMI, JOLAP and JDMAPI) further enhance the metadata-based interoperability of distributed applications. 
This paper surveys the core OMG MDA standards (i.e., UML, MOF, XMI and CWM) and discusses the current attempts at mapping these standards to J2EE, as examples of PIM-to-PSM translations that are currently under development. These forthcoming APIs will provide the initial building blocks for a new generation of systems based on the model-driven architecture concept. The progression of these initial MDA realizations to AOMs is the next logical step in this evolution." See: "OMG Model Driven Architecture (MDA)." [cache]
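The essence of a PIM-to-PSM translation, as described above, is a formal rule that maps a platform-independent model onto a platform-specific artifact. The toy sketch below makes that concrete; the model (a plain dict rather than real UML/MOF) and all names in it are invented for the example.

```python
# Toy PIM-to-PSM mapping: a platform-independent model is translated
# by a fixed rule into a platform-specific artifact (here, the text of
# a Java class skeleton).
pim = {"name": "Customer",
       "attributes": [("id", "Integer"), ("name", "String")]}

def to_java(model):
    """Translation rule: each model attribute becomes a private field."""
    fields = "\n".join(f"    private {t} {n};"
                       for n, t in model["attributes"])
    return f"public class {model['name']} {{\n{fields}\n}}"

print(to_java(pim))
```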

  • [May 09, 2001] "Data Warehousing Industry Weaves a Meta Data Standard. [Business Intelligence.]" By David Marco. In Application Development Trends Volume 8, Number 5 (May 2001), page 17. "The issue of meta data integration is one of the chief obstacles that have prevented most organizations from achieving successful data warehouse, e-business, Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) implementations. This column focuses on the Object Management Group (OMG) meta model standard Common Warehouse Metamodel (CWM), the impact this standard will have on the industry and its promise to aid in this task of meta data integration... the OMG CWM is a standard that offers the promise of improving these meta data integration processes. But what is a meta model? It is a fancy phrase for a physical data model that stores meta data. The CWM has initially focused on the data warehousing arena and is broadly supported by the vast majority of data warehouse vendors, meaning that they have integrated CWM into their tools' meta model or they are looking to provide an interface that will transfer their meta data into CWM. This capability will allow data warehousing products from different vendors to share technical meta data. The CWM specification can be downloaded from the OMG web site. For many years all of us in the meta data arena have desired a global meta model standard. A year ago we had two competing standards, CWM and the Open Information Model (OIM), which was being moved forward by the Meta Data Coalition (MDC). Unfortunately for the industry, two standards were one too many. On September 25, 2000 the MDC merged with the OMG with the goal of consolidating the separate initiatives into one meta data standard under which all vendors can unify..." 
[From the OMG web site, 'Data Warehousing, CWM And MOF Resource Page': "The Common Warehouse Metamodel (CWM) is a specification that describes metadata interchange among data warehousing, business intelligence, knowledge management and portal technologies. The OMG Meta-Object Facility (MOF) bridges the gap between dissimilar meta-models by providing a common basis for meta-models. If two different meta-models are both MOF-conformant, then models based on them can reside in the same repository."] See: (1) "OMG Common Warehouse Metadata Interchange (CWMI) Specification", and (2) now absorbed/merged, "MDC Open Information Model (OIM)."

  • [May 09, 2001] "EJBs to the Rescue. [EJB Update.]" By Peter Fischer (Quantum Enterprise Solutions Inc.) and Stephen Reckford (Concept Five Technologies). In Application Development Trends Volume 8, Number 5 (May 2001), pages 29-37. ['As corporate IT's integration activities continue to accelerate and consume an increasingly large piece of the budgetary pie, EJBs can offer a more rapid component-based integration solution for the J2EE environment.'] "According to Forrester Research, 30 to 40% of corporate IT budgets are typically spent on integration activities. According to a GartnerGroup estimate, by 2005 e-business initiatives and infrastructure will consume 30 to 50% of enterprise IT spending. Based on these forecasts, there will continue to be strong budget and financial incentives to leverage successful integration strategies... The J2EE platform is a solid platform upon which component-based integration solutions targeted to e-business can be built. Java technologies fit into the J2EE platform and provide a platform for creating apps that combine elements in the client tier with applications in the EIS tier via a middle tier that is comprised of presentation and business logic entities... Combining point-to-point integration solutions with other middleware technologies, such as message or integration brokers, opens up new horizons in integration and provides a robust, scalable and extensible integration platform that provides message-based integration among multiple application systems. The power of this approach lies in replacing a potentially chaotic and disorganized set of point-to-point integrations with a coordinated set of interoperable connections which can be reused and serve multiple purposes. In this integration architecture, the integration broker provides a single interface for accessing legacy, CRM or ERP assets, replacing the point integration solutions such as JDBC, ECI, JMS and MQSeries with a single API set. 
Adapters provided by the integration broker vendor allow these systems to plug into the integration architecture and EJB components transparently... A significant advantage of this approach is the ability to leverage XML as a standard message format. A number of products available on the market today can be used to create Java classes from XML constructs, which eliminates the need to write Java code that utilizes XML parsers and the Document Object Model (DOM). One such tool is the Breeze XML Studio from The Breeze Factor LLC in Encinitas, Calif., which allows developers to create a JavaBeans class graph that encapsulates XML parsing and validation and has methods that map directly to the XML data elements and attributes. At runtime, these JavaBeans are populated with the appropriate data from the XML document. The beans can then be packaged as an EJB, called EJB A, enabling the XML information to become an integral component of an integration architecture. B2B integration brokers provide the ability to integrate processes and information external to an organization. They provide connectivity between supply chain partners, customers and exchanges with application components via data exchange using XML messaging (XML/HTTPS). Combining the capabilities of these brokers with a tool like Breeze allows the creation of integration components that can become an integral part of the B2B integration architecture... One particular pattern that we have implemented successfully in component integration frameworks is an extension of the classic Model-View-Controller (MVC) pattern. The MVC design pattern divides an interactive application into three discrete functional components. The model component contains the core functionality and data; the view component provides information to the user; and the controller component ties the model and view together to create the new transaction or process. 
In this approach the interactive application is the transaction or business process initiator, which can be a servlet, a Java Server Page or another presentation layer component. Multiple resource adapters (one resource adapter per type of EIS) can be plugged into an application server. This capability enables EJB and other J2EE components that are deployed in the application server to access the underlying EISs. The resource adapter is used by an application server or client to connect to an EIS. The resource adapter 'plugs into' the application server and collaborates with it via a set of standard interfaces to provide underlying security and transaction support in a manner similar to EJB containers. Several ERP vendors are getting a jump on the new architecture by releasing connectors that are compatible with popular application servers. Two examples are PeopleSoft and SAP. By using PeopleSoft's Component Interfaces, third-party systems can synchronously invoke PeopleSoft business logic via EJB. SAP validates third-party products for SAP's Business Technology that support development of business logic in Java and data transfer using XML..." Also in this issue of ADTMag: "Designing a scalable architecture using J2EE."
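The XML-to-object binding the authors describe (tools like Breeze generate JavaBeans whose methods map directly to XML elements and attributes) can be pictured with a generic stand-in. The sketch below uses invented element names and a dynamic Python class rather than generated Java code, but the idea is the same: application code reads object attributes and never touches the parser or DOM directly.

```python
# Sketch of XML-to-object binding: element content becomes object
# attributes, hiding the parsing layer from application code.
import xml.etree.ElementTree as ET

class Bound:
    def __init__(self, xml_text):
        for child in ET.fromstring(xml_text):
            setattr(self, child.tag, child.text)

order = Bound("<order><sku>A-100</sku><qty>3</qty></order>")
print(order.sku, order.qty)   # A-100 3
```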

  • [May 09, 2001] "Integration: This Decade's Theme." By John D. Williams. In Application Development Trends Volume 8, Number 5 (May 2001), pages 63-64. "... Now that we are at the beginning of a new decade, I believe that its theme will be integration. The integration issue is at the heart of the way companies choose to do business, and it is a defining characteristic of e-business. This is why the deployment of technologies such as Enterprise Application Integration (EAI) is really a business issue and not a technology issue. The forces driving business integration are rapidly changing markets, new business opportunities and customer expectations. These forces of change are working their way into the IT organization, driving budgets and projects. One aspect of these forces can be seen in the growth of EAI spending. In 1999, IT organizations spent $500 million on EAI tools. In 2000, they spent $900 million. Analysts predict that by 2005, companies will spend $7.3 billion on EAI... I think it is helpful to use a framework to understand different needs in EAI and how different tools meet those needs. Imagine a four-layer framework describing EAI capabilities. The lowest layer is the Transportation layer, which has five communication models describing how one system communicates with another. These models, as GartnerGroup defines them, are Conversational; Request/Reply; Message Passing; Message Queuing; and Publish and Subscribe. The next layer up is Data Transformation. This layer describes the mechanisms for taking information from one database and transforming it before putting it into another database. The third layer is Business Rules. The Business Rules layer describes the method of taking select information from one system and transforming it into use by others. This may be a one-to-one transformation or a one-to-many transformation. The top layer is the Business Process layer, which coordinates the flow of information throughout a complete business process. 
It describes the workflow and transformation of information across multiple systems. In our framework, we also see that meta data lies across all these layers. As we move from the lower levels of our framework to the top, we typically see the business value of integration increase. Most middleware tools, such as those for messaging and data warehousing, provide capabilities for the lowest two layers: Transportation and Data Transformation. Most EAI-specific tools are focused on the Business Rules layer, while some venture into the Business Process layer. EAI tools also often integrate with or provide support for tools working in the lower layers... XML has sparked much interest in this area of meta data exchange. It provides a mechanism for the dynamic interchange of meaningful information. In particular, there are components of XML that have tremendous value in the support of system integration. The most useful components are DTDs, XML Schema (and related variations), XSLT and XMI. DTDs and XML Schema define the structure of a document or information interchange. DTDs are the standard today. They do not use XML syntax and have some important limitations. For example, DTDs do not support the automatic validation of values. On the other hand, XML Schema does support the automatic validation of values. It also has the ability to define recurring blocks of elements or attributes once. Unfortunately, it is not yet a standard, though that should change soon. There are other non-standard alternatives available. XSLT allows you to transform one document type into another. XMI is the XML Metadata Interchange format from the Object Management Group (OMG). XMI is currently a proposal for the open interchange of application components and assets. But do we really have all the tools we need to develop robust e-business systems? I don't think so. I've mentioned that EAI breaks down the stovepipes when you integrate systems. This can have serious implications. 
Higher levels of integration can lead to higher levels of unintended interaction..."
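Williams's point that DTDs cannot validate element values while XML Schema can (via facets on simple types) is worth making concrete. The stdlib sketch below hand-codes two facet-style checks; the element names and constraints are invented for illustration, and real validation would of course use an XSD processor rather than a lambda table.

```python
# Facet-style value validation, which DTDs cannot express: a DTD can
# say an element exists, but not that qty is an integer from 1-999.
import xml.etree.ElementTree as ET

facets = {
    "qty": lambda v: v.isdigit() and 1 <= int(v) <= 999,
    "code": lambda v: len(v) == 3 and v.isalpha(),
}

def validate(xml_text):
    """Return the names of child elements whose values fail their facet."""
    errors = []
    for el in ET.fromstring(xml_text):
        check = facets.get(el.tag)
        if check and not check(el.text or ""):
            errors.append(el.tag)
    return errors

print(validate("<order><qty>12</qty><code>ABC</code></order>"))  # []
print(validate("<order><qty>-5</qty><code>AB</code></order>"))   # ['qty', 'code']
```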

  • [May 09, 2001] "Borland Enters Web Services Fray." By Tom Sullivan. In Infoworld (May 07, 2001). "When Borland announces the latest incarnation of its Delphi RAD (rapid application development) environment this week, the company will focus on the toolkit's tighter integration with its Kylix Linux tools and on the product's cross-platform interoperability. But Delphi's more important enhancements clearly are its support for Web services standards. Delphi 6.0 will feature compiler-level support for SOAP (Simple Object Access Protocol) and WSDL (Web Services Description Language). That means programmers will be able to Web-enable their applications without writing extra code, according to Michael Swindell, director of product management at Borland, in Scotts Valley, California. 'Delphi programmers don't have to do anything differently; they just select to expose the code from a menu,' Swindell said. The software itself adds the SOAP and WSDL functionality. Also new to Delphi 6.0 are BizSnap, a Web-services platform for building and integrating components; WebSnap, a Web application design tool; and DataSnap, a tool for creating Web-enabled database middleware. Borland is not the only tools vendor looking to help developers build Web services. WebGain, in Santa Clara, Calif., is also equipping its toolbox for Web services, and is componentizing the development process in preparation for Web services, according to CTO Ted Farrell. Analysts expect other tools vendors, such as Merant and Rational, to release products designed to help developers build Web services. All the major Web services vendors, including Microsoft, IBM, Sun Microsystems, and Oracle, have tools in various stages of development as well. 'We're at the point right now where people are starting to build Web services,' said Rikki Kirzner, an analyst at Framingham, Mass.-based IDC. 
'They're not really building mission-critical apps, but they are making things such as e-business applications that use the same functions over and over.' One such customer, Hewitt Associates, a Lincolnshire, Ill.-based management consulting firm specializing in human resource solutions, is using IBM's WebSphere application server and the WebSphere suite of tools to move toward Web services..." See also (1) the Borland announcement, and (2) "Simple Object Access Protocol (SOAP)."

  • [May 09, 2001] "Borland Aims to Make Web Services a 'Snap'." By Peter Coffee. In eWEEK (April 26, 2001). "Within the next few weeks, Borland hopes to dilute the dominance of Microsoft's mind share in setting the course for Web services. While Microsoft's .Net development tools slog through a surprisingly volatile beta program, Borland will unveil in May a services-oriented tool kit that could unleash a surge of standards-based Web application deployment. I found the foundation-level DataSnap framework a logical next step along the XML-based data-handling path blazed by Borland's Linux-hosted Kylix tool set, launched earlier this year. Developers using DataSnap APIs will be able to publish any broadly supported SQL relational database via XML syntax, manipulated by SOAP (Simple Object Access Protocol) messages. Developers will be able to offer database access to a wide range of clients, including thin client browsers and 'headless' Web services, without costly development and maintenance of parallel code bases. Most strategic for Borland is the top-level BizSnap framework that fully integrates Web services into an object-oriented development environment. Anticipating the rapid adoption of XML by many enterprise developers, outpacing the emergence of standard XML schema, BizSnap tools streamline the definition of modular XML transforms that let a single application interact with XML data streams of similar content but differing structure. I found the transform creation tools intuitive and powerful. Meanwhile, SOAP bindings to Borland's integrated development environment will help developers follow the learning curve by automating syntax checking and offering intelligent auto-completion of XML-manipulating expressions, just as for conventional application code..." [Website description: "Delphi 6 radically simplifies building next-generation eBusiness applications on the Internet with complete SOAP based Web Services and XML data exchange support. 
The seamless integration of XML and Web Services technologies with Delphi 6 delivers the only Rapid Application Development for industry standard Web Services and B2B, B2C, and P2P integration over the Internet... DataSnap delivers high-performance, Web Service-enabled database middleware that enables any client application or service to easily connect with any major database over the Internet DataSnap supports all major database servers such as Oracle, MS-SQL Server, Informix, IBM DB2, Sybase and InterBase. Client applications connect to high-performance DataSnap servers through industry standard SOAP/XML HTTP connections over the Internet without bulky database client drivers and complex configuration requirements. DCOM, CORBA, and TCP/IP connections are also supported... Connect any Delphi 6 application or Web Service with Borland AppServer/EJBs using new SIDL (Simple IDL). Easily build ultra high-performance rich GUI Windows clients for EJB based AppServer applications. Publish AppServer EJB functionality to the world over Internet as industry standard SOAP/XML Web Services."] See also (1) the Borland announcement, and (2) "Simple Object Access Protocol (SOAP)."
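What the "compiler-level SOAP support" described in these Borland pieces hides from the programmer is that a SOAP call is, on the wire, just an XML envelope. The sketch below builds such an envelope by hand; the operation name and its payload are invented for the example, though the SOAP envelope namespace is the standard SOAP 1.1 one.

```python
# Hand-built SOAP 1.1 envelope, the kind of XML that tooling with
# "compiler-level SOAP support" generates behind the scenes.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_NS)

env = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
call = ET.SubElement(body, "GetQuote")       # hypothetical operation
ET.SubElement(call, "symbol").text = "BORL"  # hypothetical parameter

print(ET.tostring(env, encoding="unicode"))
```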

  • [May 08, 2001] "ebXML and the Road to Universal Standards." By Dave Carr. In InternetWorld (May 08, 2001). "Another chapter in the quest for universal electronic business standards will end this week in Vienna, Austria, where the ebXML organization is meeting to wrap up its 18-month project. It's still probably an early chapter, with major plot twists yet to come, but it does move the story forward. One encouraging sign: the ripple effect of independently developed XML specifications' being reconciled. The ebXML group recently agreed to incorporate the SOAP protocol, which is popular with many XML Web services enthusiasts, into the ebXML messaging specification, and the version that's going up for a vote this week is based on an extended version of SOAP. Those extensions, in turn, could wind up being incorporated into the World Wide Web Consortium's XML Protocol (XP) effort, which is supposed to produce the successor to SOAP. Previously, this had been shaping up as a typical industry battle, with Sun Microsystems favoring ebXML and Microsoft talking up SOAP (the Simple Object Access Protocol), which is fundamental to its .Net initiative. Then RosettaNet, the group behind the electronics industry's highly advanced electronic commerce standards, said it would incorporate ebXML messaging into the next revision of its standards, rather than continuing to develop its own messaging specification. Carry this convergence forward another couple of steps, and we should see agreement on messaging standards among Microsoft's BizTalk, RosettaNet, ebXML, and other electronic-commerce frameworks. On the other hand, messaging is just one component of ebXML, and there are other areas in which it still needs to be reconciled with competing initiatives. For example, there's overlap between the ebXML registry and repository specifications and UDDI (Universal Description, Discovery, and Integration), another widely supported Web services technology. 
The reason the ebXML organization was formed in the first place was to bring together electronic-commerce initiatives from OASIS, an industry consortium, and UN/CEFACT, the United Nations organization that created the international standards for Electronic Data Interchange (EDI). CEFACT, the Centre for Trade Facilitation and Electronic Business, was looking at addressing the demand for a modernized version of EDI that would use XML and the Internet while OASIS was trying to solve essentially the same problems from an XML-centric worldview... What ebXML tries to do is establish a baseline framework that can be used to solve problems that may cross industry boundaries. It also aspires to promote a generalized model that those vertical industry groups can build on. To implement ebXML, you're supposed to model your business processes using the Unified Modeling Language, a standard supported by object modeling tools such as Rational Rose, and the UN/CEFACT Modeling Methodology. Business partners exchange Collaboration Protocol Profiles and use them to forge Collaborative Protocol Agreements (an extension of IBM's Trading Partner Agreement specifications). Further, it tries to specify some common business processes for interaction that can be used within this scheme..." See: (1) "Electronic Business XML Initiative (ebXML)", and (2) the announcement, "UN/CEFACT and OASIS Meeting Showcases ebXML for Healthcare and B2B."

  • [May 08, 2001] "XML Group to Create Specifications For Voting Systems." By Todd R. Weiss. In ComputerWorld (May 04, 2001). "Six months after the tumultuous presidential balloting in Florida, a nonprofit technical consortium yesterday announced that it has formed a committee to develop a specialized XML standard aimed at improving the accuracy and efficiency of elections. The Billerica, Mass.-based Organization for the Advancement of Structured Information Standards (OASIS) said the new technical committee will work to develop an Election Markup Language (EML) based on XML technology. The EML proposal would include specifications for exchanging data between election and voter registration systems developed by different hardware, software and IT services vendors. Karl Best, director of technical operations for OASIS, said last November's voting brouhaha in Florida graphically showed the need for more accurate elections using modern technology. The improvements envisioned by OASIS could impact public and even private elections around the world, including those held by private groups and companies, he said. The EML committee will look at a wide range of possible implementations for the new specifications, including voter registration, change of address tracking, redistricting, requests for absentee ballots, polling place management, election notification, ballot delivery and tabulation and reporting of election results. While OASIS will only create the specifications and leave it up to technology vendors to implement them, Best said he's confident that the international consortium's standing in the XML world would encourage the adoption of EML by a wide range of companies that offer voting systems and software. Gregg McGilvray, chairman of the new Election and Voter Services Technical Committee within OASIS, said the EML standard will be applicable to far more than just Web-based voting systems. 
He envisions the standard allowing different platforms, including touchscreen voting machines and even telephone-based systems, to share data regardless of how the information is collected or what operating system is being used. But Steve Weissman, legislative representative for the Washington-based watchdog group Public Citizen, said it's too early to support the EML effort or any other specific ideas for how to improve elections..." See (1) "Election Markup Language (EML)", and (2) "XML and Voting (Ballots, Elections, Polls)."

  • [May 08, 2001] "The Electrified Supply Chain." By Rajeev Kasturi. In Intelligent ERP (May 03, 2001). ['RosettaNet is delivering on the promise of extensible B2B integration.'] "RosettaNet, a self-funded, nonprofit consortium of over 250 IT, EC, and SM businesses, has been working since 1998 to establish and implement industrywide standards for e-business. Trading partners adopting RosettaNet standards will benefit from a common language and communication protocols based on Internet and XML technologies. Using the standards also will result in reduced transaction turnaround times, greater transparency in translation and integration with backend systems, reduced costs, and increased efficiency. RosettaNet wants to be the 'lingua franca' of e-business... RosettaNet standards address four aspects of transactions between trading partners: business processes, data elements, communication protocols, and product/partner codification. In a nutshell, these four components encapsulate the exchange of information among trading partners. RosettaNet's Partner Interface Processes (PIPs) are elements that define business processes among supply-chain partners, such as pricing and availability requests, purchase orders, and order acknowledgements. PIPs are system-to-system, XML-based dialogs carried out based on certain specifications and guidelines. PIPs lie at the bottom of a hierarchy headed by clusters and segments. Clusters represent fundamental business process groups. Clusters are further broken down into segments, which represent interenterprise processes involving different types of trading partners. Segments consist of PIPs that define specific processes. For example, Cluster 3 is for order management, and it includes a Segment A that pertains to quotes and order entry. This segment has seven published PIPs, including 3A1 (Request Quote), 3A2 (Request Price and Availability), and 3A3 (Transfer Shopping Cart). 
Each PIP comes with a message guideline and XML document type definition (DTD). Dictionaries, which define data elements, come in three flavors: Business Dictionary, IT Dictionary, and EC Dictionary. Business data entities and properties are defined in the Business Dictionary, the IT Dictionary defines IT products and properties, and the EC dictionary defines components and their properties. All these elements are mapped to codification standards such as UN/SPSC. One of the fundamental requirements for meaningful data exchange and efficient information processing for products and services is commonly accepted codification standards. Fortunately, RosettaNet supports three widely accepted codification standards. The Data Universal Numbering System (DUNS) is maintained by Dun & Bradstreet and identifies a business and its location. The Global Trade Item Number (GTIN) identifies products, and the United Nations/Standard Products and Services Code (UN/SPSC) robustly and comprehensively classifies products and services. Another fundamental requirement for meaningful data exchange and information processing is a communications protocol. The RosettaNet Implementation Framework (RNIF) adequately covers this need for communication standards. The framework defines open exchange protocols and guidelines for communications between applications on networks. These specifications encompass various requirements such as message packing and the transfer of PIP objects between Web or browser servers; they incorporate protocols such as Common Gateway Interface (CGI), HTTP, and Secure Sockets Layer (SSL). The RNIF also supports digital signatures, digital certificates, and SSL to ensure business transactions are secure..." See "RosettaNet."
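The cluster/segment/PIP naming scheme described above (Cluster 3 for order management, Segment A for quotes and order entry, PIPs 3A1-3A3 within it) encodes the hierarchy directly in the PIP code. A small lookup sketch, limited to the PIPs named in the article:

```python
# The RosettaNet cluster/segment/PIP hierarchy as a lookup table,
# covering only the order-management PIPs named in the article.
pips = {
    "3A1": "Request Quote",
    "3A2": "Request Price and Availability",
    "3A3": "Transfer Shopping Cart",
}

def describe(pip_code):
    """Decode a PIP code: first char is the cluster, second the segment."""
    cluster, segment = pip_code[0], pip_code[1]
    return (f"Cluster {cluster}, Segment {segment}: "
            f"{pips.get(pip_code, 'unknown PIP')}")

print(describe("3A1"))   # Cluster 3, Segment A: Request Quote
```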

  • [May 08, 2001] "Slicing the Enterprise Pie. Portal Developers Partnering With Integration Vendors to Make Software More Transaction-Oriented." By John S. McCright. In eWEEK (May 07, 2001). "Portal developers Octopus Software Inc. and DataChannel Inc. are making their software more transaction-oriented through upgrades and partnerships with EAI vendors. The goal of products coming from both companies is to provide users with a front-end presentation layer with which to view and manipulate a broader slice of corporate applications. Octopus last week introduced its namesake platform for building so-called Meta Applications, which enable nontechnical business users to create customized views of data from multiple sources. The Octopus Platform uses specialized adapters and a drag-and-drop user interface to view fine-grained data in Extensible Markup Language, messaging systems, enterprise resource planning applications, and other enterprise and legacy software. The platform also gives users the ability to write business logic and rules to create dynamic relationships between data coming from various systems, said Octopus CEO Stephen Douty, in Palo Alto, Calif. In this way, Meta Applications enable users to weave together data and processes from existing applications to form new applications. Separately, DataChannel, of Bellevue, Wash., this week will introduce its DCS (Data Channel Server) Extension Kit for EAI (Enterprise Application Integration). The SDK (software development kit) lets companies tap into a broader range of enterprise applications through its DCS portal. The SDK enables DCS to integrate with any platform that supports asynchronous messaging through adapters. New adapters from SeeBeyond Technology Corp. and Vitria Technology Inc. will extend DCS to 125 more applications and databases, DataChannel officials said. 
Although the extension kit will be used by IT managers, DCS 5.0, due late this summer, will feature a new user interface that will allow nontechnical people to do drag-and-drop editing of their portal Web pages. Version 5.0 will also add an application server, additional EAI adapters, stronger versioning and workflow for document management, and the ability for users to have multiple virtual workspaces for collaboration, officials said. A Shared Object Repository will enhance Version 5.0's process integration capabilities, officials said..."

  • [May 08, 2001] "Enabling Access to Online Digital Services: IMS Digital Repositories Technical Specifications Group." By Kevin Riley. In Syllabus Magazine Volume 14, Number 10 (May 2001), pages 16-18. ['A look at the process of setting standards and specifications to support interoperability of digital repositories.'] The author surveys the goals of the IMS Digital Repositories Group, discusses the IMS specification process, and summarizes the key IMS specifications. The IMS Digital Repositories Work Group was established in February 2001, and scheduled its first meeting for May 7-9 in Lund, Sweden. The article provides a table listing the seven (7) IMS specifications already published and three (3) specifications under development. "The group spans user communities, server-side technology providers, publishers, and middleware infrastructure vendors. Group members include EdNA (representing DETYA in Australia), Fretwell-Downing, GIUNTI (Italy), IOS Press (Netherlands), Oracle, Sun, TEMASEK (Singapore), UKOLN (participants in the UK Distributed Network of Electronic Resources Program), and the University of California at Berkeley and University of Wisconsin from the NSDL program. Others are coming on board as the group gets under way. The work of the group falls into two categories: (1) Integration of e-learning with existing online digital services; (2) Development of novel repository technology to support the configuration, presentation, and delivery of learning objects required for learner-centric learning to become a reality. The diversity of offerings under the umbrella of online digital services reflects a wide range of content formats, existing implemented systems, technologies, and established practice. However, given the investment made in their development, it is impractical even to consider a solution that requires their re-implementation on a short-to-medium-term timeframe. 
Rather, the group will focus on common functions, which can be used across services to enable them to present a common interface. These common functions encompass desirable and necessary features such as authentication, authorization, enrollment, search, location and retrieval, IPR management, user preferences and profiling, payment, and search gateways across services. Learning Object Repositories share all of the above (either directly or via the LMS they serve), but also have the added dimension of supporting contextualized sequencing and navigation -- and potentially, dynamic branding of objects to a service at runtime. The group intends to construct a generic functional architecture and then define specific application profiles through that architecture to meet the needs of each of the services identified above. The functions will then be prioritized to identify the order in which they will be put through the IMS specification process. In addition to the specification work, linked R&D projects are being set up across Australia, Europe, and the U.S. that will support pilot implementations of the technology adopted, both as initial proof of concept and testing of the robustness of the emerging specifications." ["IMS Global Learning Consortium, Inc. (IMS) is developing and promoting open specifications for facilitating online distributed learning activities such as locating and using educational content, tracking learner progress, reporting learner performance, and exchanging student records between administrative systems. IMS has two key goals: (1) Defining the technical specifications for interoperability of applications and services in distributed learning, and (2) supporting the incorporation of the IMS specifications into products and services worldwide. 
IMS endeavors to promote the widespread adoption of specifications that will allow distributed learning environments and content from multiple authors to work together (in technical parlance, 'interoperate'). IMS uses XML as its current binding, and XML-Schema as its primary XML control document language. The IMS XML Bindings and the list of IMS specifications are available for download. Specifications materials include: IMS Content Packaging Specification, IMS Learning Resource Meta-data Specification, IMS Question and Test Specification, IMS Enterprise Specification, IMS Meta-data Specification, IMS Reusable Competencies Definition Information Model Specification, IMS Learner Information Package Specification, etc."] See: (1) "IMS Metadata Specification", and (2) the recent IMS announcement.

  • [May 08, 2001] "Pushing the SCORM Envelope. The Role of XML, Content Management Systems, And Dynamic Delivery in ADL-SCORM." By Jeff Larsen, Jeff Katzman, and Jeff Caton. Peer3 company white paper. December 12, 2000. 12 pages. "The Advanced Distributed Learning initiative (ADL) emerged this year as a focal point for eLearning standards. Its Shareable Content Object Reference Model (SCORM) 1.0 technical specifications gained widespread acceptance and implementation among government, commercial, and academic circles. SCORM represents the integration of all leading eLearning standards (AICC, IMS, IEEE, and soon Microsoft's LRN) to create a unified standard. SCORM seeks to enable reuse of Web-based content across multiple environments and products, as well as provide a means for individualized eLearning. The goals of ADL are laudable. By promoting a digital knowledge network based on reusable objects and individualized learning, ADL believes it can help reduce the cost of instruction by 30-60%; reduce the time of instruction by 20-40%; increase the effectiveness of instruction by 30%; increase student knowledge and performance by 10-30%; and improve organization efficiency and productivity. Further, the vision of ADL is consistent with that of many thought leaders in the eLearning and Knowledge Management industries - mainly, that true interchange of learning objects across disparate Learning Management Systems (LMS) will require adherence to accepted standards for describing learning taxonomies, course information, and course packaging. However, we believe that SCORM must address three fundamental issues before the goals of ADL can be fully realized. These issues can be posed as the following three questions: (1) Will XML be prescribed as the data format for learning content itself? (2) Will a standard methodology be specified for integrating Content Management Systems with Learning Management Systems? (3) Will dynamic delivery of content objects be supported? 
True reusability of learning objects requires a data format that separates content from its presentation; this fundamental requirement is met by XML. Learning Management Systems (LMS) provide only part of the solution for eLearning; XML authoring, Content Management Systems (CMS), and dynamic delivery round out the technologies necessary to complete the ADL vision. As participants in the Technical Working Group for SCORM, Peer3 remains committed to supporting the ADL and the evolution of these important standards. Peer3 was the only vendor to present a commercially available eLearning solution for XML authoring, content management, and dynamic delivery at the first ADL PlugFest earlier this year. Now Peer3, in collaboration with other eLearning-oriented CMS vendors, is promoting the recognition of this distinct product category as well as changes to the SCORM that will result in open standards for XML-based eLearning content..." See: (1) "Shareable Courseware Object Reference Model Initiative (SCORM)", and (2) Advanced Distributed Learning Initiative. [cache]

  • [May 08, 2001] "Converting from XML Schema data types to SQL data types." By Jasmin Wason. 2001-05-08 or later. [XML-DEV post: 'Here is a link to a table of possible mappings between XML Schema and SQL data types. This is based on the idea that a relational database schema has been generated from an XML Schema. The appropriate SQL data types should be used in the database so that data from conforming instance documents can be stored. The table is very much under construction and any comments or criticisms would be most welcome.'] "The XML Schema definition language is the new W3C Recommendation for describing the structure of XML documents. The specification consists of two parts, XML Schema Part 1: Structures and XML Schema Part 2: Datatypes. The rich data type and structural support of XML Schema makes it a good candidate for automatic conversion to a database schema, and the original XML Schema Requirements document specifies a type system adequate for import/export from database systems. Other features such as the ability to define default values, scoped unique values, keys and relationships can also be employed for use with relational databases. The following table describes a possible mapping between XML Schema and SQL data types. The table is still under construction. Any comments concerning its content are most welcome and should be sent to Jasmin Wason..."
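As a sketch of what such a mapping table might contain, the following Python fragment renders SQL column definitions from a handful of XML Schema built-in types. The specific SQL type choices here are illustrative assumptions, not Wason's actual table; real mappings vary by database product.

```python
# One plausible XML Schema -> SQL data type mapping, in the spirit of
# the table the post describes. Illustrative choices only; actual SQL
# types differ across database products.

XSD_TO_SQL = {
    "xsd:string": "VARCHAR",
    "xsd:boolean": "BIT",
    "xsd:decimal": "DECIMAL",
    "xsd:float": "REAL",
    "xsd:double": "DOUBLE PRECISION",
    "xsd:int": "INTEGER",
    "xsd:long": "BIGINT",
    "xsd:short": "SMALLINT",
    "xsd:date": "DATE",
    "xsd:dateTime": "TIMESTAMP",
}

def column_def(name, xsd_type, max_len=0):
    """Render a SQL column definition for an element of a given XSD type."""
    sql = XSD_TO_SQL.get(xsd_type, "VARCHAR")
    if sql == "VARCHAR" and max_len:      # honor a maxLength facet, if any
        sql = "VARCHAR(%d)" % max_len
    return "%s %s" % (name, sql)

print(column_def("InvoiceDate", "xsd:date"))        # InvoiceDate DATE
print(column_def("CustomerName", "xsd:string", 80)) # CustomerName VARCHAR(80)
```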

  • [May 08, 2001] "XML Databases Gain Momentum." By L. Scott Tillett. In InternetWeek (May 07, 2001), pages 10-11. "As companies turn to XML as a common language for conducting intercompany business and as organizations publish more content using XML, IT shops are warming up to using specialized XML databases to manage content. When XML database developer Ipedo launches this week with a repository for XML content, it will join a host of such vendors that have emerged in recent months, including B-Bop, Ixia and X-Hive. Longtime vendor Software AG has offered a native XML database product since 1999. IT services firm ProLogic Inc. began testing Ipedo's XML Database to manage content for a Defense Department project. The project focuses on digitizing technical manuals, such as those used to repair helicopters. The manuals, called interactive electronic technical manuals (IETMs), enable repair technicians to take notebook computers instead of thick repair books with them to the hangars when they work on aircraft... Storing commonly used documents in an XML database saves having to translate documents from their native formats as they're needed. That usually requires custom JavaScript code. XML databases could also help users overcome a fundamental shortcoming of relational databases made by Microsoft, Oracle and others. Because relational databases structure data in rows and columns, it's difficult to express the relationship among different data records. XML databases let data be structured hierarchically, thereby grouping documents that relate to one another, said Glenn Copen, director of application development at ProLogic. For example, the process of repairing a fan assembly may call for replacement of the fan belt first. Advocates of XML databases say relational databases work well for handling transactional data, while XML databases are better for data about multilayered processes that require context..." See "XML and Databases."
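The fan-assembly example can be sketched with Python's standard xml.etree module: the nesting of the XML itself records that replacing the fan belt is a substep of repairing the fan assembly, with no join keys needed. The manual fragment below is invented for illustration.

```python
import xml.etree.ElementTree as ET

# A toy fragment of an interactive electronic technical manual (IETM).
# The nesting itself records that replacing the fan belt is a substep
# of repairing the fan assembly -- no foreign keys required.
MANUAL = """
<procedure name="repair-fan-assembly">
  <step order="1">
    <procedure name="replace-fan-belt">
      <step order="1">Release belt tensioner</step>
    </procedure>
  </step>
  <step order="2">Reinstall fan shroud</step>
</procedure>
"""

root = ET.fromstring(MANUAL)

# Every sub-procedure of the top-level repair, found by walking the tree
# (iter() yields the root first, so we drop it):
subprocedures = [p.get("name") for p in root.iter("procedure")][1:]
print(subprocedures)  # ['replace-fan-belt']
```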

  • [May 07, 2001] "DTDs and Namespaces." By C. Michael Sperberg-McQueen. XML-DEV posting 2001-05-07. "... It is certainly true that DTDs can contain sufficient information to support namespaces, in the sense that they can be used to define the names in a namespace, in a system which understands DTD notation and which can resolve qualified names correctly. But some outside system is required; no system which validates using a DTD and the validation rules of XML 1.0, without extension, can support all the syntactic variations allowed by the namespaces recommendation. When I say that DTDs cannot 'support' namespaces I mean simply that given some plausible account of the rules which govern elements in some set of namespaces, and the rules of the namespace recommendation (which include the ability to bind arbitrary prefixes to arbitrary namespaces), it is not possible to write a DTD which (using the normal rules of DTD-based validation) recognizes the set of documents which follow the rules, and distinguishes them from documents which don't. It is possible, using clever parameter entity tricks, to allow the user to associate namespaces with arbitrary prefixes. This is a partial victory. In their full generality, however, the rules of the namespace recommendation allow homography: elements with different universal names (and thus potentially different declarations) can appear with the same prefix + colon + localname as their generic identifier... I continue to believe that the DTD notation does not support namespaces 'in their full generality'." References: "Namespaces in XML."
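Sperberg-McQueen's point about homography can be demonstrated with any namespace-aware parser. In the Python sketch below, two documents that a DTD validator would see as having different generic identifiers (a:title versus b:title) resolve to the same universal name; the namespace URI is a placeholder.

```python
import xml.etree.ElementTree as ET

# Two documents that look different to DTD-based validation (generic
# identifiers "a:title" vs "b:title") but identical to a namespace-aware
# processor, which resolves each prefix to its namespace URI.
doc1 = '<a:title xmlns:a="http://example.org/ns">Hi</a:title>'
doc2 = '<b:title xmlns:b="http://example.org/ns">Hi</b:title>'

tag1 = ET.fromstring(doc1).tag
tag2 = ET.fromstring(doc2).tag

print(tag1)          # {http://example.org/ns}title
print(tag1 == tag2)  # True -- same universal name despite different prefixes
```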

  • [May 05, 2001] "Java Hits The New Sweet Spot. [Tech Analyzer.]" By Alan Radding. In InformationWeek (April 30, 2001), pages 63-68. ['Sun's Java 2 Enterprise Edition version 1.3, due later this year, promises enhancements for application and XML integration. Sun also is extending the Java environment to address messaging, wireless, networking, and storage issues.'] "The upcoming version of Sun Microsystems' Java 2 Enterprise Edition version 1.3, due in the third quarter, promises to be packed with enhancements designed to win the hearts of IT. But will IT departments and the rest of the industry be ready? J2EE 1.3 will include an enterprise integration framework, called the Java Connector Architecture, that's designed to ease the integration of enterprise applications such as enterprise resource planning, customer-relationship management systems, databases, and mainframe transaction applications. The J2EE upgrade also will have better integration with XML... Along with Java Message Service support, Java 2 Enterprise Edition 1.3 introduces a new Enterprise JavaBean. The message-driven bean lets the J2EE platform receive asynchronous Java Message Service messages. While the message-driven bean is intended to work with Java Message Service, it will also work with other asynchronous messaging protocols. Without the message-driven bean, Java developers have to manually code the connection between the incoming message and the business logic contained in an Enterprise JavaBean. 'The new bean defines how the messaging and logic tie together, which gives developers a much better starting point,' Kassabgi says. J2EE 1.3 will offer a host of other enhancements. For example, it will provide improved interaction between Java and XML through a new Java API for XML parsing. 
This will let developers create applications that receive XML documents as Java Message Service messages and parse and process the incoming data. They can also generate Java Server Pages using XML. Beyond Java 2 Enterprise Edition, Sun is also continuing to expand the boundaries of Java. J2ME promises to bring Java capabilities to mobile devices such as cell phones, PDAs, and even cars. The goal is to extend Java applications from the server all the way to the smallest pocket device. Jiro is a Java initiative that addresses storage. Sun says Jiro is the storage industry's first intelligent development and deployment environment. Jiro defines an architecture of components and management services specific to the needs of storage networks. Sun argues that Jiro will simplify storage network management and reduce costs through its inherent intelligent connectivity. Though several dozen vendors have indicated intentions to support Jiro, only Sun itself and Veritas Software Corp. have certified products ready to ship."

  • [May 05, 2001] "CSS Enhancements in Internet Explorer 6 Public Preview." By Lance Silver (Microsoft Corporation). From MSDN Online. March 2001. ['This document describes the enhanced support for the Cascading Style Sheets (CSS) specification provided by Microsoft Internet Explorer 6 Public Preview or later. This document assumes that you are familiar with HTML and CSS. To view the samples in this document, you must have Internet Explorer 6 Public Preview or later installed on your system.'] "Microsoft Internet Explorer 6 Public Preview and later supports CSS features that earlier versions of Internet Explorer do not support. Two additional CSS properties are supported -- min-height and word-spacing. Several additional possible values are supported, including the pre value of the white-space property and the list-item value of the display property. Other significant features include stricter parsing of style sheets and changing which HTML elements can represent the outermost surface onto which a document's content can be rendered. These enhancements were made to comply with the CSS specification. All of the properties, values, and features defined in the CSS, Level 1 (CSS1) specification are supported, including the box model that defines how to measure and format elements and their associated margin, border, and padding properties. But what's really cool is that even with all these enhancements, you're unlikely to experience any significant compatibility problems with applications you developed for earlier versions of Internet Explorer. The !DOCTYPE 'Switch': This section describes how to use the !DOCTYPE declaration in your document to switch on standards-compliant mode with Internet Explorer 6 Public Preview or later. The !DOCTYPE declaration is a Standard Generalized Markup Language (SGML) declaration that specifies the document type definition (DTD) a document (theoretically) conforms to. 
It looks like an HTML tag with no closing tag, but it starts with an exclamation point (!), and it contains single tokens instead of attribute name-value pairs. This declaration must occur at the beginning of the document, before the HTML tag. You switch on standards-compliant mode by including the !DOCTYPE declaration at the top of your document, specifying a valid Label in the declaration, and in some cases, specifying the Definition and/or URL. The Label specifies the unique name of the DTD, and can be appended with the version number of the DTD. The Definition specifies the definition of the DTD that is specified in the Label. The URL specifies the location of the DTD. There are three Definitions specified in the HTML 4.0 specification -- Frameset, Transitional, and Strict. Frameset is used for FRAMESET documents. Transitional contains everything except FRAMESET documents, and Strict, according to the HTML 4.0 specification, "...excludes the presentation attributes and elements the World Wide Web Consortium (W3C) expects to phase out as support for style sheets matures." The following table shows which values of the !DOCTYPE declaration switch on standards-compliant mode with Internet Explorer 6 Public Preview or later..." See "W3C Cascading Style Sheets."
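A rough sketch of the decision the article's table encodes might look like the following Python function: Strict DTDs switch on standards-compliant mode, while Transitional and Frameset DTDs do so only when a URL is also supplied. This is an approximation for illustration; the authoritative rules are in the table the article presents.

```python
# A simplified sketch of the !DOCTYPE "switch" logic the article
# describes. Approximation only: the full decision table for Internet
# Explorer 6 is given in the article itself.

def standards_mode(definition, has_url):
    """Return True if this DTD Definition would switch on
    standards-compliant mode (per the simplified rules above)."""
    definition = definition.lower()
    if definition == "strict":
        return True
    if definition in ("transitional", "frameset"):
        return has_url  # only when the DTD URL is also specified
    return False

print(standards_mode("Strict", False))        # True
print(standards_mode("Transitional", False))  # False -> compatibility mode
print(standards_mode("Transitional", True))   # True
```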

  • [May 05, 2001] "Mix and Match Markup: XHTML Modularization." By Rick Jelliffe. From May 02, 2001. ['The latest development from the W3C on HTML is the XHTML Modularization specification, allowing the tailoring of XHTML to suit different applications or devices. This article discusses the motivation and techniques behind modularization.'] "XHTML Modularization makes it convenient to create specialized versions of XHTML: subsets with tailored content models and extensions in other namespaces. XHTML Modularization may be one of the most important new technologies of 2001. This article introduces the basics of XHTML modularization. The same approach can be used with many XML languages... XHTML Modularization is essentially a set of conventions for splitting up a large DTD into modules. Following these conventions, XHTML has been split into modules, which have been made available to the public. Modularization works by providing a construct larger-grained than the element and more finely-grained than the entire HTML namespace. The purpose of modularization is to allow someone, perhaps not an expert in DTDs or Schemas, to restrict and extend their own version of HTML. Using modules means they won't leave something out by accident, and that there are placeholders for extensions and restrictions that are convenient and visible to others. So modularization does not actually alter the expressive power of DTDs or W3C XML Schema. Instead it provides an abstract model and practical conventions for how to organize a DTD or Schema... Evidence for the success of the XHTML Modularization concept may be found in the rapid development of RDDL, Jonathan Borden and Tim Bray's (with others) Resource Directory Description Language, a version of XHTML with a simple linking element added in another namespace to point to the various resources related to a namespace URI. 
It is exactly the kind of DTD that XHTML M12n is good at, though XHTML M12n was inspired by the needs of PDAs or small appliances. The next question that arises is whether the modularization system would be useful for other large languages that we might wish to subset (did anyone say XML Schema?) The question is whether that kind of modularization allows the cake to be sliced in the most appropriate way. Additive modules could clearly be used to handle, for example, selecting or not selecting a key/keyref module, but if facets were modularized, an additive driver might be quite large, and there is no subtractive M12n approach tabled for XML Schemas. But still, the document would be simple to read and straightforward to create. The final question is whether, if modularization in XML Schemas is useful, there should be first-class markup to support it? Presumably, the way to approach this would be to provide some first-class support so that there might be a <module> element which could be used by some schema-management tools directly, to provide markup rather than just conventions. A modularization system for XML Schemas should look at the modularization system in the RELAX language (which is based on Toru Takahashi's early designs for SGML)." See (1) Modularization of XHTML in XML Schema and (2) Modularization of XHTML [W3C Recommendation]. References in: "XHTML and 'XML-Based' HTML Modules."

  • [May 05, 2001] "Building a Semantic Web Site." By Eric van der Vlist. From May 02, 2001. ['By simple use of XML vocabularies like XMLNews and RSS, Eric van der Vlist shows how you can build dynamic indexes to web site content.'] "Even though the Semantic Web may yet seem a remote dream, there are already tools one can use to make a tiny step forward by building 'semantic web sites,' which can be much easier to navigate than ordinary sites. In this article, I will discuss how RSS 1.0 and its taxonomy module can be used as a central format to carry metadata collected in a classical news format, such as XMLNews-Story, to RDF or relational databases and XML Topic Maps. Readers should have basic familiarity with RSS and RDF, and a little topic maps knowledge would also help... I have built XMLfr, a French site dedicated to and powered by XML, as a showcase for XML technologies and will use it as a real life example throughout this article. XMLfr is a dynamic site, using XML and XSLT, which stores its pages in the XMLNews-Story format. The site structure is described by a set of RSS 1.0 channels, and the semantic information encoded in the rich XMLNews-Story inline markup is converted into RSS 1.0 taxonomy markup. These RSS channels may be consolidated in an RDF database allowing ad hoc semantic queries on the global set of articles. They feed RDBMS tables for online, real-time queries that build a dynamic site index and include navigational information in the XHTML pages sent to the site users. The RSS channels can be transformed into XTM Topic Maps, to be displayed by Topic Maps visualization systems, and be enriched by the statistics extracted from the database in order to propose topic associations..." References: (1) See "RDF Site Summary (RSS)", (2) "(XML) Topic Maps."
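Reading a site-structure channel of the kind the article describes takes only a namespace-aware parser. The Python sketch below extracts item titles from a minimal RSS 1.0 (RDF) channel; the channel content is invented for illustration.

```python
import xml.etree.ElementTree as ET

# A minimal RSS 1.0 (RDF) channel of the kind used to describe site
# structure. The item titles and URLs here are invented placeholders.
RSS = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/channel">
    <title>Articles</title>
  </channel>
  <item rdf:about="http://example.org/a1"><title>First article</title></item>
  <item rdf:about="http://example.org/a2"><title>Second article</title></item>
</rdf:RDF>
"""

NS = {"rss": "http://purl.org/rss/1.0/"}  # the RSS 1.0 namespace
root = ET.fromstring(RSS)
titles = [item.findtext("rss:title", namespaces=NS)
          for item in root.findall("rss:item", NS)]
print(titles)  # ['First article', 'Second article']
```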

  • [May 05, 2001] "Specification: Daring to Do Less with XML." By Michael Champion. From May 02, 2001. ['One person's tangled mess of XML is another's set of must-have features. This article offers advice for making your way through the jungle of XML and its associated specifications. Also, overview of the debate on XML simplification.'] "Many observers have noted that the basic simplicity of XML is a fundamental reason for its rapid acceptance in electronic business. The original XML specification can be printed on about 40 pages, compared to the more than 400-page SGML specification from which it is derived. This makes it relatively easy to implement an XML parser that processes input text and makes the XML structures encoded in that text available to ordinary software. On the other hand, XML has a number of quite notable limitations in its original form. For example, the DTD schema language in the XML Recommendation is too limited for many business purposes because it has little conception of datatypes that are used in almost all programming languages and e-business applications. Perhaps the most visible approach to this problem, and the focus of the World Wide Web Consortium that has defined XML and many related standards, is to build new XML-related specifications that address the limitations. These efforts include the recent or emerging specifications for XML Namespaces, RDF, XSL, and XML Schemas. These efforts have brought additional functionality to the basic XML toolset, but they've also brought widespread criticism because of the difficult prose in which they are described and the complexity of the underlying structures and operations that they define. There are public calls by popular writers on XML topics for a refactorization of the XML specifications ('acknowledging that it is hard to get things right the first time, and allowing changes in requirements'), as well as for a minimization of their growing interdependency. 
We've even seen a number of rudely named Web sites maintained by well-known XML developers, put up to provide public forums from which to discuss topics such as 'XML Namespaces: Godsend or Demon Seed.' Needless to say, these sites have not generated much public approval from the XML community, but it is important to emphasize that they do not oppose XML, but rather advocate 'tough love' for it. In short, the XML community faces something of a dilemma. It's the simplicity of the XML specification itself that has brought it such widespread acceptance in such a short time, but the accompanying lack of features leads to increased complexity of the XML family of specifications. This in turn leads to the backlash that we are now observing..."

  • [May 05, 2001] "Transforming XML: Namespaces and Stylesheet Logic." By Bob DuCharme. From May 02, 2001. ['In the second part of a two-part series on handling XML Namespaces, Bob explains how to process namespaces in a source document, and gives an example of processing XLink into HTML.'] "In last month's 'Transforming XML' column, we saw how to control the namespaces that are declared and referenced in your result document. If your stylesheet needs to know details about which namespaces are used in your source document, and to perform tasks based on which namespaces certain elements or attributes belong to, XSLT offers a variety of ways to find out. This month we look at them. To experiment, we'll use the following document. It has one title element and one verse element from the namespace, two verse elements from the namespace, and one verse element from the default namespace... XSLT's ability to base processing logic on namespace values makes it a great tool for developing XLink applications. As you use XSLT to process more and more XML documents with elements from specialized namespaces -- for example, SOAP envelopes, XHTML elements, or XSL formatting objects -- you'll find these techniques invaluable for keeping track of which elements come from where so that you can take the appropriate action with them." [Sample documents and stylesheets are available in the zip file.] Related references in (1) "Extensible Stylesheet Language (XSL/XSLT)"; (2) "Namespaces in XML."
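The kind of namespace test such a stylesheet performs can be mimicked outside XSLT as well. The Python sketch below tallies elements by namespace URI in a document loosely modeled on the column's example; the namespace URIs are placeholders, not those used in the article.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Grouping source-document elements by namespace -- the kind of test an
# XSLT stylesheet performs with namespace-qualified match patterns.
# The namespace URIs below are invented placeholders.
DOC = """
<poem xmlns:a="http://example.org/nsA" xmlns:b="http://example.org/nsB">
  <title>Untitled</title>
  <a:verse>one</a:verse>
  <b:verse>two</b:verse>
  <b:verse>three</b:verse>
  <verse>four</verse>
</poem>
"""

by_ns = defaultdict(int)
for el in ET.fromstring(DOC).iter():
    # ElementTree spells namespaced tags as "{uri}localname"
    uri = el.tag[1:].split("}")[0] if el.tag.startswith("{") else "(no namespace)"
    by_ns[uri] += 1

print(dict(by_ns))
```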

  • [May 05, 2001] "XML Web Services For The Rest Of Us. Report From XML DevCon 2001." By Jon Udell. In Byte Magazine (April 16, 2001). "I've just returned from the XML Developer's Conference in New York. The upshot, from my perspective, is both good news and bad news. Let's start with the good, of which there is plenty. There can be no doubt the fabric of the next-generation Internet is being woven of XML. As I'm sure is true for many of you, XML has already insinuated itself into many aspects of my daily work. I write this column in XHTML, because it's easy (for me) to do so, and because well-formedness helps ensure clean HTML rendering. I promote my own website and others that I work on, using RSS newsfeeds. I produce Linux Magazine's website by running Perl scripts over an XHTML repository. I've helped develop a service -- O'Reilly's Safari -- that stores content as XML, transforms it to HTML by way of XSLT, and performs its business logic using XML-RPC. There's nothing cutting-edge about any of this. It's just a matter of applying useful tools to basic problems... Will XML standards development someday fade into the woodwork? Will the 'XML community' dissolve back into the many constituencies from which it emerged -- publishing, software development, e-commerce? The answer to both questions is probably yes, but don't hold your breath. XML's charter puts it on a course to intersect with essentially all of the world's documents, data, and software. That's trivially true for ASCII, which can (and often does) encode all this stuff at a low level. For XML, which aims higher, it's not yet true, and far from trivial. The XML community is, in fact, wrestling with this whole question of levels of abstraction. During the panel discussion I sat in on, Tim Bray, co-editor of the original XML specification, noted that we already have the 'low-level' standards -- specifically SOAP -- that we need for the emerging web services architecture. 
In response, Dave Orchard, an IBM technical architect and XML standards maven, observed that SOAP was until recently seen as a 'high-level' standard. In the same way, he argued, what we now see as 'high-level' proposed standards for the orchestration of SOAP-based interactions -- such as UDDI (Universal Description, Discovery, and Integration) and ebXML -- will work their way down the protocol stack. Everyone agreed that while XML is stable at its core, there's tectonic movement in the standards accreting around it. According to Bray, that's inevitable. The XML core, he pointed out, borrows heavily from proven SGML technology. Extensions such as XML Schema and XQuery break new ground, synthesizing ideas from object programming, relational database management, and other realms to create what is really a new way of representing and working with data. For myself, I tend to stick with the stable core, and watch with interest as the tectonic plates slide around on the map. Sometimes I find myself using, in an unofficial way, things that later become official standards. That was true for XHTML, a technique I was emulating in my own work about a year before I ever heard the term. And it's still true for XML-RPC, which I use because it's easy, supported in the environments I need, and good enough for the tasks at hand... As I walked the exhibition floor and listened to presentations, I wondered what will be the next piece of the cake that I'll put onto my own plate. What attracts me the most are the XML databases..."

  • [May 05, 2001] "The 'application/xhtml+xml' Media Type." By Mark A. Baker. IETF I-D 'draft-baker-xhtml-media-reg-01.txt'. [W3C note:] "An updated Internet Draft of the 'application/xhtml+xml' media type registration has been published. This document defines the 'application/xhtml+xml' MIME media type for XHTML based markup languages; it is not intended to obsolete any previous IETF documents, in particular RFC 2854 which registers 'text/html'. Note that while the revision number still says -01, this is the third draft which should have been numbered as -02. Please send comments to (archive)." See references in "XML Media/MIME Types." [cache]

  • [May 04, 2001] "XML Enables Enterprise Web Site Development." Sun Microsystems. Written with the help of interviews with Anne Thomas Manes (Director of Market Innovation, Sun Software Group) and Mark Wallace (Manager of XML Tools Development, in the Forte Tools group). In Sun Journal Volume 5, Number 1 (May 2001). ['XML is a platform- and language-independent mechanism for describing data or, as it's better known on the Web, content. XML has rapidly gained acceptance as the lingua franca of the Internet. XML is described as a meta-language.'] "The term XML is no doubt familiar to enterprise IT managers by now. Magazines and books have been published and conferences have sprung up in recent months dedicated solely to XML. Many IT managers are also aware that this new abbreviation has found its way into their IT organizations, brought in by their programmers, in most cases. But just what is XML? One way to think of it is by analogy with the English language. English is the actual language, and English grammar is the meta-language, the set of procedures and rules that enables English speakers to communicate with and understand each other. Just as people are usually unconscious of the underlying grammatical rules when they are speaking or listening, the XML meta-language accomplishes tasks on the Web without needing to make Web users aware of its functionality. XML brings many key advantages to IT departments and their enterprises: It's a proven, robust technology based on prior standards and is easy to learn and use; It's a ubiquitous and open standard; It's integrated into Sun Microsystems' overall ONE (Open Net Environment) strategy; It works well with the Java programming language... XML is a way of representing data. There's no programming code involved with its use. It can be manipulated, though, with scripting languages such as Perl, Tcl, or JavaScript. It can also be used with more powerful languages such as C, C++, or Java. 
Because both Java and XML are completely platform-independent, applications written with them have data and behavior that can run on any platform in the world. Another advantage of XML is its ability to facilitate an application architecture in which nearly all of the heavy-duty processing is done on the server side. This makes it a natural fit with the Java programming language, which has demonstrated its value in server-side application logic. This is a crucial point for enterprise apps, because the servers must be maintained and evolved through multiple generations of client devices: old PCs, new PCs, old cell phones, new cell phones, and so on. The server-side logic must produce the right data or document for multiple platforms and generations of clients. In addition, both XML and Java are portable, enabling XML data and Java code to be sent to any type of device. Both XML and Java can run on modern clients, such as the recently released microJava, MIDP, Java TV, and Java Card... The Java/XML community has many strengths, too, including productive, open source and multivendor projects that are producing Java tools for working with XML. Besides the Sun-sponsored projects, Java/XML programmers have access to open source XML technologies from the Apache XML project and other open source efforts. Because XML is owned by the W3C, rather than by a single company or group of companies, it will remain open and independent. Some people may choose to build applications that work only on their platform, even though it's using XML. Although XML does not magically enable different applications to talk to each other, it does specify a set of rules whereby particular groups that need to exchange information can define particular markup languages for exchanging that information. The applications still have to be programmed to understand the common language, whatever it is." 
Note: "This issue of Sun Journal focuses on the architecture that needs to be built and put into place for smart, anytime, anywhere, always-on services, and some of the technologies, such as XML, underlying this architecture. A detailed analysis of Sun's ONE, including information on how XML fits into the picture, can be found at"

  • [May 04, 2001] "DRM: 'Down-Right Messy' And Getting Worse." By Mark Walter, Patricia Evans and Mike Letts. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 3 (May 07, 2001). ['Digital rights management (DRM) is fundamentally about controlling access to content. On the surface, that seems like a straightforward definition for what sounds like a simple idea. But delve below the surface, and you will find that DRM is about a lot more than controlling access to content, and it is far from simple. That message came through loud and clear at Seybold Seminars last month, where publishers from all disciplines gathered to try and make sense of the mess that has become the DRM market. Why is it so confusing? There are a number of reasons.'] "At Seybold Seminars Boston 1999, there were two DRM vendors on the floor; at the 2000 event, there were eight; this year, there were 16 -- and that's not counting those in the market that didn't exhibit at this show... Standards may counter the Tower of Babel One route to compatibility is establishing a monopoly; another is to set standards for open competition. Recognizing the value that standards would bring to the adoption of DRM, several organizations have formed committees to write working drafts. It is not an easy task. At this point, the committees have not yet been able to nail down the requirements, let alone build consensus on how different machines will exchange rights information. They have not even agreed on what language they'll use to express the rights, once they figure out what they want to say. Since last fall, some progress has been made. In November, the Association of American Publishers, representing all of the major commercial book publishers, issued a set of proposed requirements. It was an extensive list -- beyond what any vendor is currently offering -- and therefore likely to be a superset of the requirements that all vendors would be willing to endorse. 
By winnowing down the AAP list to its common factors, the vendor committee may get an initial set of requirements as early as this summer. Meanwhile, over the winter season, two e-book groups with overlapping task forces tackling standards -- Open E-Book Forum (OEBF) and Electronic Book Exchange -- merged under the OEBF umbrella. They have agreed to address numbering and metadata -- two areas in which vendors are not arguing and the AAP's own committees have already made progress. Numbering, at least in e-books, is following the AAP suggestion of adopting the digital object identifier (DOI). To help propel that effort forward, and possibly enlist DOI support from music and film industries as well, the DOI Foundation has sponsored an e-book project that's scheduled to be shown at the Frankfurt Book Fair this October. The situation is encouraging on the metadata front as well. The AAP report last fall recommended building on the recently developed ONIX standard for book bibliographic data. The OEBF has not officially said yet what it's decided, but we expect that it will endorse an alignment with ONIX and that a unified scheme for printed and electronic books will be ratified within the next 12 months. The OEBF and other groups, including MPEG, are also considering the adoption of a universal rights-specification language, though to date no industry group has accepted the leading vendor-submitted candidate, ContentGuard's XrML. How much will numbering, metadata and a rights language help, if encryption remains vendor-specific? At this point, it's hard to tell how far toward interoperability they will take us... In the short term, the confusion over DRM is likely to worsen. In the absence of solidified standards, vendors are teaming with content providers to launch services tied to new rights-enforcement tools." 
On DRM and XML, see: (1) Extensible Rights Markup Language (XrML); (2) Digital Property Rights Language (DPRL); (3) Open Digital Rights Language (ODRL); (4) Open Ebook Initiative; (5) MPEG Rights Expression Language.

  • [May 04, 2001] "NetLibrary Adopts OEB Standard." From The Bulletin: Seybold News & Views on Electronic Publishing (May 04, 2001). "Last January, in a move designed to cut costs, online library service provider NetLibrary began charging publishers for the conversion services it originally provided for free. With sales sputtering, the company found that its in-house conversion services were a financial liability. Now, less than three months later, NetLibrary has scrapped all on-site conversion services for its NetLibrary service and cut nearly 90 jobs in the process. The conversion policy for MetaText, an interactive digital textbook developer acquired by NetLibrary last year, will remain unchanged. In addition, the company also announced that it has dropped its use of a proprietary format based on Folio Views in favor of the XML/HTML-based publication structure developed by the Open e-Book Forum (OeBF) consortium. Under the company's new service policy, publishers can submit any electronic file meeting OeBF standards, or NetLibrary will outsource any necessary conversion labor to another facility and act as middleman between the publisher and the conversion house..." See details in the announcement: "netLibrary Adopts Open eBook Specification Standards. Strategic decision positions the company to deliver eBooks faster, more efficiently." For OeBF references and description, see also "Open Ebook Initiative."

  • [May 04, 2001] "Athena rising? Irish typesetter seeks market for Word XML add-on." By Liora Alschuler. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 3 (May 07, 2001). "Datapage is an Irish composition-services provider processing 300,000 pages a year, primarily in the scientific, medical and technical (STM) market. The XML-SGML editor the company developed over the past three years for its own use, which it calls Athena, may become available for general licensing if Datapage can find the right development and marketing partners. Athena has been tested in the field. For about two years several dozen people at Elsevier Science used it to mark up and edit journal manuscripts. However, it was recently dropped by the large journal publisher as part of its decision to outsource all manuscript editing to outside typesetting firms. While Datapage calls Athena an 'editor' and the software can generate new tagged text, its purpose in life is to take Word files and automate the insertion of markup as well as much of the routine editorial and composition processes... Athena works on a two-stage conversion basis, first generating a generic tagged file from the original binary, then using macros and XLISP to create SGML or XML that conforms to a specific DTD and house style. The company has developed conversions for two Elsevier DTDs and for the ISO 12083 article and book DTD. The editor incorporates MathType for equation editing and can accept TeX and LaTeX, plus any input that Word can accept. It tags the information, runs quality checks on cross-references and citations, and validates the tagged data according to the DTD..." 
[Athena features include: "New documents can be created or existing unstructured documents imported; Tex/Latex files can be imported into Athena; Athena can be used to generate SGML/XML for Books; No prior knowledge of SGML is required; All documents are tagged in virtually the same way, irrespective of target DTD; Once set up a high level of output can be achieved; Athena comes with a tutorial to get the beginner started; Athena includes a comprehensive help system; Athena has a built-in validator for pinpointing errors; Tables/lists/footnotes/endnotes are set up in the usual way in Word; Athena is a macro-based system; Complex sections are tagged automatically; Includes several useful bibliography-related macros; Several editorial aids; Athena can be customised to translate characters from any font; Equations are created using MathType; Equations can be translated to TeX or MathML; Custom tags can be created in Athena; SGML conversion is implemented with a programmable DTD engine; Athena provides several complete working DTD conversions; Athena can generate XML-compatible output."]

  • [May 04, 2001] "The Matrix of W3C Specifications." From the W3C "Conformance and Quality Assurance" resource section [Conformance and Quality Assurance Activity]. Maintained by Karl Dubost. For each named W3C specification, the table (matrix) includes information on the specification status (Rec, PR, CR, WD, etc) and URL for an [existing] QA Log, online Validator, Test Suites, W3C Notes / Tutorials, and Conformance section. See related resources in "XML Conformance."

  • [May 03, 2001] "WWW10: IBM Readies Web Services Workflow Proposal." By Stephen Lawson. In InfoWorld (May 03, 2001). "IBM in the next few weeks will unveil Web Services Flow Language (WSFL), a proposal for how to define workflows for Web services, the company's director of e-business standards strategy said on Thursday at the 10th International World Wide Web Conference (WWW10) here. Robert Sutor told hundreds of attendees here that now is a critical time in the development of Web services, which are functions that can be carried out over the Web through communication among both humans and machines. Examples of Web services may include matching vendors and customers or carrying out and recording transactions. . . WSFL will provide a way to describe how a series of functions will work in providing Web services, Sutor said in a keynote address on the second day of WWW10. For example, it will propose a way to define how a service will perform, which will let providers of any kind of Web service guarantee a level of performance in a standard way. Then companies looking for such a service will be able to tell how fast or reliable it will be. . . IBM's upcoming proposal for workflow is designed to follow on earlier specifications, such as the work of the Workflow Management Coalition, but take a 'clean slate' approach for the new age of services provided over the Web, Sutor said in an interview following the keynote. It is intended only as an initial proposal and IBM expects it to be refined by other companies and standards bodies."

  • [May 03, 2001] "XML Gets Nod From Net Standards Group." By Wylie Wong. From CNET (May 02, 2001). ['An Internet standards group has approved a new technology that it hopes will greatly improve the way businesses exchange information over the Web.'] "The World Wide Web Consortium (W3C) announced Tuesday that it has approved a new Web standard, called XML schemas, that makes it easier to develop common vocabularies, so companies can communicate and exchange data. XML (Extensible Markup Language) is a Web standard for information exchange that proponents say will reshape communications between businesses. It not only allows companies to easily and cheaply conduct online transactions with their customers and partners, but it also delivers sound, video and other data across the Web. XML schemas are considered a big step forward for software developers because they allow businesses to better describe or interpret the information they are sending and receiving as part of a Web transaction. Programmers previously were forced to use technology called document type definitions (DTDs) to interpret vocabularies, a technique that is relatively rigid in comparison. 'On the surface, this may seem like a small change, but the impact is substantial,' said David Turner, Microsoft's senior program manager for XML technologies. Turner said XML schemas are to XML what grammar rules are to English. XML schemas can allow businesses to interpret purchase orders from different customers, for example, much the way a listener takes meaning from a sentence. XML schemas allow software developers to better manage different types of data, such as dates, numbers and other special forms of information. The DTD technology couldn't easily interpret numbers on a document, for example, which created hurdles for e-commerce companies. Because XML schemas can better handle numbers, companies can better interpret business orders they receive..." See the discussion.
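
To make the datatype point concrete: a DTD can only declare that an element holds character data, while XML Schema can require those characters to be, say, an xs:date or an xs:decimal. The toy checkers below are plain Python, not a real schema validator, and the sample values are invented; they merely mimic what two of the built-in datatypes enforce.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

# A DTD can only declare that an element contains character data
# (#PCDATA); it cannot require those characters to form a date or a
# number.  These two toy checkers mimic the XML Schema built-in
# datatypes xs:date and xs:decimal to show what the typing layer adds.
def check_xsd_date(text):
    """Accept the xs:date lexical form YYYY-MM-DD (real dates only)."""
    try:
        y, m, d = text.split("-")
        date(int(y), int(m), int(d))
        return True
    except ValueError:
        return False

def check_xsd_decimal(text):
    """Accept any xs:decimal lexical form."""
    try:
        Decimal(text)
        return True
    except InvalidOperation:
        return False

print(check_xsd_decimal("19.99"))    # a well-typed price
print(check_xsd_decimal("19,99"))    # rejected; DTD validation could not catch this
print(check_xsd_date("2001-05-02"))
```

A schema-aware processor performs exactly this kind of lexical check, plus facet checks (ranges, patterns, lengths), before the data ever reaches application code.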

April 2001

  • [April 28, 2001] "Foundation to promote Jabber IM." By Peter Galli. In eWEEK (April 26, 2001). " and the Jabber open-source project have joined forces to establish the Jabber Foundation, a not-for-profit organization that will work toward developing a Jabber-based open-source Instant Messaging and Presence standard. Jabber is an instant messaging system focused on privacy, security, ease of use and access from anywhere using any device. It is based on XML and is backed by a community of open-source developers who are building a set of common technologies for further development, including an open-source server, clients, libraries and transports. Andre Durand, a founder of Denver-based, said the foundation was necessary to bridge the divide between the proprietary on the one hand and the open-source community on the other. It will be modeled on other open-source bodies such as the Apache Software Foundation and the Gnome Foundation... The foundation, which is in the process of being legally established, will also approach major corporations standing behind the Jabber technology to join the foundation as members, advisers and board members. A source working on establishing the foundation who requested anonymity said companies of the stature of France Telecom, Bell South, AT&T Corp., Cisco Systems Inc. and WorldCom, among others, are being invited to join... While the specifics of the foundation are still being finalized, it's expected that an initial or annual membership fee of about $5,000 will be required and that a five-member board of directors and an advisory board will be formed." Details in the announcement. See: "Jabber XML Protocol."

  • [April 27, 2001] "XML Catalogs." Edited by Norman Walsh. For the OASIS Entity Resolution Technical Committee. Revision date: 27-April-2001. "In order to make optimal use of the information about an XML external resource, there needs to be some interoperable way to map the information in an XML external identifier into a URI for the desired resource. This specification defines an entity catalog that handles two simple cases: (1) Mapping an external entity's public identifier and/or system identifier to an alternate URI. (2) Mapping the URI of a resource (a namespace name, stylesheet, image, etc.) to an alternate URI. Though it does not handle all issues that a combination of a complete entity manager and storage manager addresses, it simplifies both the use of multiple products in a great majority of cases and the task of processing documents on different systems..." Appendices include A: An XML Schema for the XML Catalog; B: A TREX Grammar for the XML Catalog; C: A RELAX Grammar for the XML Catalog; D: A DTD for the XML Catalog. See also the diff version from April 02, 2001.
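
The two mappings the draft defines can be sketched as a small resolver. This in-memory version is only an editorial illustration: a real catalog is itself an XML document, and the identifiers and file paths below are invented.

```python
# An in-memory sketch of the two mappings the OASIS draft defines:
# external identifier -> URI and resource URI -> URI.  A real catalog
# is itself an XML document; the identifiers and file paths below are
# invented for illustration.
class Catalog:
    def __init__(self):
        self.public = {}   # public identifier -> alternate URI
        self.system = {}   # system identifier -> alternate URI
        self.uris = {}     # resource URI      -> alternate URI

    def resolve_entity(self, public_id=None, system_id=None):
        """Case (1): map an external entity's identifiers to a URI."""
        if system_id in self.system:
            return self.system[system_id]
        if public_id in self.public:
            return self.public[public_id]
        return system_id          # no match: fall back to the system id

    def resolve_uri(self, uri):
        """Case (2): map a namespace/stylesheet/image URI."""
        return self.uris.get(uri, uri)

cat = Catalog()
cat.public["-//OASIS//DTD DocBook XML V4.1.2//EN"] = "file:///dtds/docbookx.dtd"
cat.uris["http://example.org/style.xsl"] = "file:///local/style.xsl"

print(cat.resolve_entity(public_id="-//OASIS//DTD DocBook XML V4.1.2//EN",
                         system_id="http://www.oasis-open.org/docbookx.dtd"))
print(cat.resolve_uri("http://example.org/style.xsl"))
```

A conforming resolver would also handle catalog features this sketch omits, such as delegation to subordinate catalogs and prefix rewriting.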

  • [April 27, 2001] "Soapbox: Magic bullet or dud? A closer look at SOAP." By Brett McLaughlin (Enhydra strategist, Lutris Technologies). From IBM developerWorks. April 2001. ['Brett McLaughlin casts a critical eye on the Simple Object Access Protocol, assessing the value this much-discussed new technology can provide developers and demonstrating its foundation in a mixture of the old RPC (remote procedure calls) technology and in XML. Brett examines RPC, XML-RPC, RMI, and SOAP in detail, comparing and contrasting the use of each, and discusses whether SOAP makes sense. This article also includes sample code for a SOAP envelope.'] "Like almost anything that is related to XML, the Simple Object Access Protocol (SOAP) has received plenty of press lately. It may come as a surprise to you that while SOAP's window dressing is new, what's present under the hood dates back years, even decades. In this article, I cut through the hype surrounding SOAP and look at what it's supposed to be, what it actually is, and how it stacks up to similar technologies. As always with my articles, the bottom line is to determine whether this technology works for you, and here I'll try to get beyond the buzzword-mania SOAP comes with and identify the value it can bring to your applications. I'll start with a quick look at the acronym soup that makes up SOAP, including its less-than-auspicious origins in RPC (remote procedure calls), and its use of XML to solve some of RPC's early problems. Next I'll address the features SOAP brings to the table that normal XML-RPC toolkits do not deliver, and why these additions are, or aren't, important. From there, I'll go on to compare SOAP, and RPC in general, with one of its biggest competitors, remote method invocation (RMI). I'll discuss the RPC model, the RMI information-flow model, and advantages of using XML in this context. I'll also take a look at how to make SOAP work for you. 
Finally, I'll cover the actual practicalities versus future promises of SOAP, and whether the underlying XML is the complete answer for your communication needs, or just part of a larger equation... I hope I've demonstrated that SOAP is not the magic bullet that some people believe it to be. Even more importantly, I hope you can see that many of SOAP's 'features,' rather than being unique to SOAP, actually are parts of RPC and XML-RPC. I did identify some specific features of SOAP, such as the SOAP envelope. Is it possible to make any conclusions at this point about the value and feasibility of using SOAP in your own work? Absolutely! First, and this is nothing new, you should always keep your eye firmly on business needs, not technology needs. While SOAP is lots of fun to play with and very chic among all your geek friends, the fact is, if it doesn't offer you a way to solve your problems, it's probably going to waste a lot of time. Also, and this is an important point, it's very possible that the task for which you've chosen to use SOAP could be accomplished more easily using XML-RPC. So don't be fooled by the hype. SOAP has arrived, but it's not a stranger in a strange land. Instead, SOAP is just the big, sometimes bloated, brother of technologies that have been around for quite a while and often are easier to use. I'll see you next time, when I'll carve even deeper into SOAP and talk more about what it can do for you." Article also available in PDF format. See "Simple Object Access Protocol (SOAP)."
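
For readers who have not seen one, the SOAP envelope McLaughlin singles out is just a thin XML wrapper around a method call. A minimal sketch with Python's standard xml.etree follows; the envelope structure and namespace URI come from the SOAP 1.1 specification, while the getPrice method and sku parameter are invented.

```python
import xml.etree.ElementTree as ET

# Skeleton of a SOAP 1.1 envelope built with ElementTree.  The
# Envelope/Body structure and namespace URI come from the SOAP 1.1
# specification; the getPrice method and sku parameter are invented.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_NS)

envelope = ET.Element("{%s}Envelope" % SOAP_NS)
body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
call = ET.SubElement(body, "getPrice")      # hypothetical RPC method
ET.SubElement(call, "sku").text = "1234"    # hypothetical parameter

xml_bytes = ET.tostring(envelope)
print(xml_bytes.decode())
```

On the wire this would be POSTed over HTTP with a SOAPAction header; an XML-RPC payload for the same call needs no envelope element at all, which is part of McLaughlin's point about SOAP's extra machinery.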

  • [April 27, 2001] "Web services architect, Part 2: Models for dynamic e-business." By Dan Gisolfi (Solutions Architect, IBM jStart Emerging Technologies). From IBM developerWorks. April 2001. ['Every emerging technology has to cross the chasm between innovation and acceptance. The technology adoption life cycle for Web services is no different. However, this technology does pertain to a different target audience of decision makers. Who are they? What will motivate them? Building on the vision of Dynamic e-business, this article explores the value proposition Web service technologies offer to business entities in a variety of market segments.'] "Any sound investment in technology requires a business justification. Whether that justification reflects a new revenue channel or an efficiency improvement to the daily operation of a business, the person or persons responsible for the eventual decision to adopt a technology must be convinced that there is a strong value proposition for their business. Unlike other emerging technologies of recent years (Java, XML, Pervasive Computing), the promotion of Web services will not be focused solely on IT decision makers. The adoption of this technology is highly dependent on the roles and revenue models a business entity may decide to deploy. For this reason, line-of-business (LOB) executives will highly influence the rate and manner of adoption. In order to implement Web services, software architects will have to justify the business rationale of the Web services model to their superiors. Thus, I begin this article by trying to explain why a business would need Web services and how it would impact their business goals... Revenue justifications allow a business to reach new customers, expand existing partnerships or build new ones, and expose existing offerings to new delivery channels. This category of adoption motivators is the sweet spot for potential service providers. 
Although the following may not be an exhaustive list, my team has been hard pressed to come up with additions. I welcome any ideas from readers, of course. There are five potential revenue models for Service Providers: transactional, membership/subscription, lease/license, business partnership, and registration... As businesses seek to embrace the technologies of dynamic e-business, they must be able to associate their business with a specific SOA role. In most cases they will also need to provide a justification for their adoption of the technology from a business perspective. In this article we declared five different SOA roles for a business and provided two categories of business reasons to adopt the technology. In the next issue, I will discuss the nature of a dynamic e-business." See Part 1 and the "Web services Architecture Overview."

  • [April 27, 2001] "A Unified Constraint Model for XML." By Wenfei Fan, Gabriel Kuper, and Jérôme Siméon. Presentation for the 10th International World Wide Web Conference (WWW'10). "Integrity constraints are an essential part of modern schema definition languages. They are useful for semantic specification, update consistency control, query optimization, information preservation, etc. In this paper, we propose UCM, a model of integrity constraints for XML that is both simple and expressive. Because it relies on a single notion of keys and foreign keys, the UCM model is easy to use and makes formal reasoning possible. Because it relies on a powerful type system, the UCM model is expressive, capturing in a single framework the constraints found in relational databases, object-oriented schemas and XML DTDs. We study the problem of consistency of UCM constraints, the interaction between constraints and subtyping, and algorithms for implementing these constraints... XML has become the universal format for the representation and exchange of information over the Internet. In many applications, XML data is generated from legacy repositories (relational or object databases, proprietary file formats, etc.), or exported to a target application (Java applets, document management systems, etc.). In this context, integrity constraints play an essential role in preserving the original information and semantics of data. The choice of a constraint language is a sensitive one, where the main challenge is to find an optimal trade-off between expressive power (How many different kinds of constraints can be expressed?) and simplicity (Can one reason about these constraints and their properties? Can they be implemented efficiently?). The ID/IDREF mechanism of XML DTDs (Document Type Definitions) is too weak in terms of expressive power. 
On the other hand, XML Schema features a very powerful mechanism with three different forms of constraints, using full XPath expressions, and therefore the reasoning and implementation of XML Schema constraints has a high complexity. In this paper, we introduce UCM, a model of integrity constraints for XML. UCM relies on a single notion of keys and foreign keys, using a limited form of XPath expressions. The main idea behind UCM is a tight coupling of the integrity constraints with the schema language. This results in a model which is both simple and expressive enough to support the classes of constraints that are most common in practice. UCM constraints are easy to manipulate in theory: we study the consistency of UCM schemas and how their constraints interact with subtyping. UCM constraints are easy to manipulate in practice: we illustrate their use with a number of examples and give simple algorithms for their implementation... In particular, we make the following technical contributions: (1) We extend the type system of [The XML query algebra], along with the sub-typing mechanism of [Subsumption for XML Types], with a notion of keys and foreign keys. This constitutes UCM, a schema language for XML with integrity constraints. (2) We show that UCM schemas can capture relational constraints, object-oriented models (with object identity and scoped references), and the ID/IDREF mechanism of DTDs. (3) We show that, as for XML Schema, deciding consistency over full UCM schemas is a hard problem. We then propose a practical restriction over UCM schemas that guarantees consistency. This restriction is general enough to cover both the relational and object-oriented cases. (4) We propose an algorithm for propagating constraints through subtyping. This mechanism is the basis for supporting the notion of object-identity of object models within UCM schemas. (5) We present algorithms for schema validation in the presence of UCM constraints... 
[Conclusion:] We have proposed UCM, a schema language that supports the specification of structures, subtyping and integrity constraints for XML. UCM is simple, relying on a single notion of keys and foreign keys. UCM allows one to capture, in a unified framework, constraints commonly found in different application domains, including XML DTDs, relational and object-oriented schemas. We have also described preliminary results for the analysis of specification consistency, constraint propagation through subtyping, and schema validation. This work is a step toward an expressive yet simple schema language for XML. We are currently working on a first implementation of the algorithms presented in Section 6, based on our XML Algebra prototype. One of the objectives is to obtain a more precise performance analysis. On the more theoretical side, we plan to work on the reasoning about UCM constraints, including, but not limited to, questions in connection with consistency and implication." Related publications: (1) "On XML Integrity Constraints in the Presence of DTDs." By Wenfei Fan (Bell Labs and Temple University), and Leonid Libkin (University of Toronto). Paper presented at PODS 2001. Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). May 21 - 24, 2001. Santa Barbara, California, USA. With 32 references. (2) "Integrity Constraints for XML." By Wenfei Fan (Temple University), and Jérôme Siméon (Bell Labs). Pages 23-34 (with 28 references) in Proceedings of the Nineteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. PODS 2000. Dallas, Texas. May 15 - 17, 2000. [cache]
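
UCM's single primitive, keys plus foreign keys, is easy to demonstrate outside the formal framework. The sketch below hand-checks a key and a foreign key over a toy document with Python's standard xml.etree; the document, element, and attribute names are invented, and the paper's actual algorithms operate on its type system rather than on raw trees like this.

```python
import xml.etree.ElementTree as ET

# Hand-check a key and a foreign key over a toy document: each book's
# isbn attribute must be unique (a key), and each order's ref must
# name an existing book (a foreign key).  Element and attribute names
# are invented for illustration.
doc = ET.fromstring("""
<store>
  <book isbn="0-596-00058-8"/>
  <book isbn="0-13-110362-8"/>
  <order ref="0-596-00058-8"/>
</store>
""")

def check_key(root, path, attr):
    """Every element on `path` carries `attr`, and all values differ."""
    values = [e.get(attr) for e in root.findall(path)]
    return None not in values and len(values) == len(set(values))

def check_foreign_key(root, path, attr, key_values):
    """Every `attr` value on `path` resolves to a known key value."""
    return all(e.get(attr) in key_values for e in root.findall(path))

isbns = {b.get("isbn") for b in doc.findall("book")}
print(check_key(doc, "book", "isbn"))
print(check_foreign_key(doc, "order", "ref", isbns))
```

This is essentially the relational key/foreign-key discipline the paper shows UCM can capture, restated over element trees.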

  • [April 27, 2001] "An Introduction to Prolog and RDF." By Bijan Parsia. From April 25, 2001. ['In the first of a series on creating Semantic Web applications with Prolog, Bijan Parsia introduces Prolog and its use in processing RDF.'] "Many Semantic Web advocates have gone out of their way to disassociate their visions and projects from the Artificial Intelligence moniker. No surprise, since the AI label has been the kiss of, if not death, at least scorn, since Lisp machines were frozen out of the marketplace during the great 'AI winter' of the mid-1980s. Lisp still suffers from its association with the AI label, though it does well by being connected with the actual technologies. However, it is a curious phenomenon that the AI label tends to get dropped once the problem AI researchers were studying becomes tractable to some degree and yields practical systems. Voice recognition and text-to-speech, expert systems, machine vision, text summarizers, and theorem provers are just a few examples of classic AI tech that has become part of the standard bag of tricks. The AI label tends to mark things which aren't yet implemented in a generally useful manner, often because hardware or general practices haven't yet caught up. That seems to describe the Semantic Web pretty well. I'm going to do a little down-to-earth exploration of RDF, a core Semantic Web technology, using a classic AI programming language, Prolog, plus some standard AI techniques and technologies... The root RDF data model is deliberately very minimal and, as with XML, that minimalism is intended to make things easier for programs. One consequence of that minimalism, when coupled with other machine-friendly design tropes, is that though 'human readable', RDF is not generally very human writable (although the Notation3 syntax tries to improve things.) 
Furthermore, while RDF's data model is specified, the processing model isn't (deliberately), so one should expect a wide variety of processors, each working in its own way, depending on a variety of constraints and desiderata. Standard Prolog provides a rich processing model which naturally subsumes RDF data. Deriving RDF triples from Prolog predicates, and then the reverse, can deepen our understanding of both. Furthermore, there is a lot of experience implementing a variety of alternative processing models (both forward and backward chaining systems, for example) in Prolog -- from the experimental toy, through the serious research project, to the industrially deployed, large-scale production system level. Furthermore, Prolog's roots in symbolic processing and language manipulation support a wide array of mechanisms for building expressive notations and languages for knowledge management, which serve well for hiding the less friendly aspects of RDF." See (1) "Resource Description Framework (RDF)" and (2) "XML and 'The Semantic Web'."
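The correspondence the article builds on is compact: RDF triples map naturally onto Prolog facts of the form rdf(Subject, Predicate, Object), and a Prolog-style query is just pattern matching with unbound variables. A minimal Python analogue (the triple data is invented for illustration):

```python
# RDF triples as (subject, predicate, object) tuples; in Prolog these would
# be facts of the form rdf(Subject, Predicate, Object).
TRIPLES = [
    ("_:doc1", "dc:creator", "Bijan Parsia"),
    ("_:doc1", "rdf:type", "Article"),
    ("_:doc2", "rdf:type", "Article"),
]

VAR = None  # stand-in for an unbound Prolog variable

def query(s, p, o):
    # Return every triple unifying with the pattern; VAR matches anything.
    return [t for t in TRIPLES
            if all(q is VAR or q == v for q, v in zip((s, p, o), t))]

# Prolog: ?- rdf(X, 'rdf:type', 'Article').
for subj, _, _ in query(VAR, "rdf:type", "Article"):
    print(subj)
```

Real Prolog adds what this sketch lacks: backtracking, rules over the facts, and the chaining strategies the article mentions.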

  • [April 27, 2001] "Talks: XSLT UK 2001 Report." By Jeni Tennison. From April 25, 2001. ['Earlier this month Keble College, Oxford, England was the setting for the first ever conference dedicated to XSLT. XSLT expert Jeni Tennison reports on the proceedings.'] "April 8th and 9th 2001 saw the first conference dedicated to XSLT take place at Keble College in Oxford. While the basis of the conference was XSLT, this didn't stop people talking about the XSL effort in general or about other vocabularies and technologies that work with or against XSLT. Opening Address The conference was opened by Norm Walsh from Sun Microsystems, member of the XSL Working Group and maintainer of one of the more complex XSL applications -- the DocBook XSL family, which he talked about later in the day. Norm set the scene for the conference, reminding us of the origins of XSLT and outlining four requirements that will make XSLT and XPath as ubiquitous as XML has become: interoperable tools; cooperative specs; optimizations or compilations of stylesheets; and information set pipelines... The XSLT UK '01 conference was a very enjoyable opportunity to get to know the people behind the names on XSL-List and to be brought up to date with some of the advances and developments in the fields of XSLT and XSL-FO. I'm sure all who attended are looking forward to the next XSLT UK conference, whether it's held in 6 months or in a year. Many thanks are due to Sebastian Rahtz and Dave Pawson for organizing it. The conference was sponsored by on-IDLE, who kept a low profile during the proceedings. 75% of the profits from the conference will be going to local charities in Oxford." Note the Pictures from XSLT-UK 2001 by Sebastian Rahtz, and the XSLT Conference: Oxford 2001 Slideshow from Mark Miller; conference references.

  • [April 27, 2001] "XML Q&A: XSLT Surgery." By John E. Simpson. From April 25, 2001. ['This month our question and answer column covers XSLT issues, from using multiple languages to styling third party content.'] "...I have a source XML file which I don't control. I can't edit it. But I want to display that XML file on my site using XSLT. I know I need to add an xml-stylesheet PI to a document to associate it with an XSLT stylesheet... How can I use two different XSLT stylesheets for the same XML document?... How do I easily define many values for variables in a multi-language lexicon?"
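For the first question, one common workaround when the source XML cannot be edited is to fetch it and splice the xml-stylesheet processing instruction in before serving it. A minimal sketch (file and stylesheet names are hypothetical):

```python
import xml.etree.ElementTree as ET

# Third-party XML we are not allowed to edit (contents invented).
source = '<?xml version="1.0"?>\n<catalog><item/></catalog>'

def with_stylesheet(xml_text, href):
    # Splice an xml-stylesheet PI in after the XML declaration, if any.
    pi = '<?xml-stylesheet type="text/xsl" href="%s"?>' % href
    if xml_text.startswith("<?xml"):
        end = xml_text.find("?>") + 2
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text

styled = with_stylesheet(source, "site.xsl")
ET.fromstring(styled)  # raises if the result is not well-formed
print(styled.splitlines()[1])  # the inserted PI
```

Choosing the href server-side per request also answers the two-stylesheet question: the same source document can be served with different PIs, without the source ever being touched.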

  • [April 27, 2001] "XML-Deviant: Parsing the Atom." By Leigh Dodds. From April 25, 2001. ['Not every piece of data the XML programmer has to deal with comes neatly packaged in angle brackets. XML developers have been examining how W3C XML Schema could help out.'] "This week XML-DEV has been considering some interesting twists on XML data processing, prompted by the use of regular expressions in the W3C XML Schema specification to define complex data types. While the world may be increasingly surrounded by pointy brackets, the majority of data exchanged isn't XML, which will be true for some time to come (if not always). Yet even in situations where data has been generated as XML, there is a near infinite variety of forms which that markup can take. Hence the numerous initiatives to define horizontal and vertical XML standards; these schemas limit the acceptable forms of XML documents to those deemed suitable for particular application or business uses. However, the core flexibility of XML -- the ability for anyone to quickly define and produce their own formats -- will mean that a variety of document types will evolve and coexist. There will never be a single blessed way to mark up a single piece of information, just acceptable forms for particular processing contexts... it seems there may be some mileage in defining some finer-grained utilities for manipulating data within XML markup. Although regular expressions have long been a part of SGML and XML markup, as they form the basis for XML schema languages in general, it seems that their formal addition to XML Schemas opens up some additional possibilities. Perl programmers may feel justifiably smug. C developers may wish to look at Hackerlab Rx-XML, which is a regular expression matcher that processes XML Schema regular expressions. Whether the individual design decisions that have led to complex data formats within elements and attributes are themselves questionable is a moot point. 
The very real situation is that there are many varieties of data and markup to deal with, and it's always handy to have a few extra tools in the toolbox to handle them..."
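The fine-grained matching the thread discusses is easy to emulate: an XML Schema pattern facet such as \d{4}-\d{2}-\d{2} (a date-like value) is close to a Python regular expression, with the caveat that the two dialects are not identical (schema patterns are implicitly anchored and have no ^/$ anchors, so re.fullmatch is the right analogue). A sketch:

```python
import re

# XML Schema pattern facets are implicitly anchored to the whole value,
# so re.fullmatch -- not re.search -- mirrors their semantics.
DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"  # a date-like pattern facet

def valid(value, pattern=DATE_PATTERN):
    return re.fullmatch(pattern, value) is not None

print(valid("2001-04-25"))  # True
print(valid("25/04/2001"))  # False
```

Dedicated matchers such as Hackerlab Rx-XML exist precisely because of the remaining dialect differences (for example, schema-specific character classes like \i and \c have no Python equivalent).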

  • [April 25, 2001] "Why UDDI Will Succeed, Quietly: Two Factors Push Web Services Forward." By Brent Sleeper. An Analysis Memo from The Stencil Group. April 2001. 7 pages. ['On UDDI's six-month birthday, we suggest that two aspects of the growing momentum for web services -- one economic, one technical -- will help make a success of the standard for registering and discovering web-based services. Bottom line, UDDI will succeed because its technical underpinnings work for the geeks.'] "There has not been a lot of vocal celebration, but the Universal Description, Discovery, and Integration (UDDI) standard recently turned six months old. Perhaps the Nasdaq's seeming freefall is causing commentators to finally turn a more skeptical eye towards the acronym-heavy technologies of the sector, but we argue that a softening economy, combined with growing support for one of UDDI's key technological foundations, actually bodes well for the initiative's success. The standard's big-name backers (chief among them, IBM, Microsoft, and Ariba) ensure that UDDI will receive a large share of attention, and we doubt any of our readers are completely unfamiliar with it. Still, web services, and UDDI in particular, are only recently generating substantive discussion among leaders in the e-business community. Whether this lack of vocal attention is owed to its seemingly simple objective or to a healthy dose of skepticism for vendor-sponsored standards, we are not sure, but significant momentum is growing, tortoise-like, behind the scenes. From an initial group of 36, more than 175 companies have now endorsed the initiative by contractually agreeing to 'support the future development of UDDI.' The group ranges from drivers of major industries like Boeing and Ford to a prominent cast of technology vendors. 
Notably, membership in the consortium is crossing political boundaries as competitors like Ariba and Commerce One, and Microsoft and Sun Microsystems all become involved with the UDDI project. None of UDDI's fundamental benefits has changed radically in the past six months, yet solutions for developing and supporting web services are being announced with increasing frequency. Cynics among us might wonder what -- beyond the incessant PR machines in Redmond and Silicon Valley -- is driving the recent flurry of activity. So, why are so many companies warming to such a seemingly ordinary protocol? We suggest that two major factors will drive UDDI's success: (1) Business and economic conditions are right for tackling the problems that UDDI solves; (2) Very smart decisions drove the standard's technology underpinnings... Bottom line, UDDI will succeed because its technical underpinnings work for the geeks, and the geeks will use SOAP, UDDI, and other layers of the emerging web services stack to bridge a wide range of heterogeneous collaboration, supply chain, and EAI solutions. These bridges will make good on B2B e-commerce's promise to help companies trade and make products more efficiently than ever before." See "Universal Description, Discovery, and Integration (UDDI)."

  • [April 25, 2001] "Standards Required to Support XML-Based B2B Integration. A conceptual model for understanding XML convergence." April 2001. From RosettaNet. ['White paper describing Rosettanet's conceptual model for understanding XML convergence'] "Companies across all industries are realizing the fundamental benefits of using the Internet to integrate their supply chains. The potential to reduce inventory, improve time-to-market, reduce transaction costs and conduct business with a broader network of supply chain partners has direct, measurable benefits to a company's bottom line. Because of the benefits that result from supply chain integration, companies are exploring open, XML-based standards that help remove the formidable barriers associated with developing a common business language and process methodology for Internet-based collaboration, communication and commerce. Many private companies and industry organizations today are proposing a wide array of standards for creating this common e-business language -- so many, in fact, that it is becoming increasingly difficult to differentiate among the multitude of vertical and horizontal industry standards. There is currently no way of easily identifying the e-business standards challenge each standards organization strives to resolve. Perhaps more important, there is no mechanism for standards bodies to identify where their efforts may be duplicative and where they may be complementary. RosettaNet, an industry consortium and standards organization representing the needs of the Information Technology (IT), Electronic Components (EC), and Semiconductor Manufacturing (SM) industries, has surveyed the XML-related standards space and, as a service to the industry, has developed a conceptual model that enables the comparison of horizontal (universal) and vertical (supply chain- or business model-specific) XML standards efforts. 
Using a model that identifies nine distinct components required to provide a total e-business process, RosettaNet's goal is to bring clarity to various industry efforts. It is possible to identify efforts that are complementary as well as areas where possible overlap -- and thus convergence opportunities -- exist. The conceptual model was developed with input from many industry and technology organizations and respected thought leaders. RosettaNet acknowledges that there may be differing views or alternative perspectives to the model... Supply chain integration requires both horizontal and vertical XML standards in order to support both business process complexity as well as interoperability goals between supply chains. RosettaNet has developed a conceptual model for identifying the components of business process that allows for the direct comparison of all XML-based standards. Although many XML standards initiatives are complementary, the sheer number of standards initiatives has created confusion among end users. RosettaNet will continue to play a role in several components of the e-business process, but expects to converge efforts with other horizontal standards organizations." See "RosettaNet." [source]

  • [April 24, 2001] "Fuzzy Data: XML May Handle It." By Ralf Schweiger, Simon Hölzer, and Joachim Dudeck (Institute of Medical Informatics, Justus-Liebig-University Giessen, Germany). July 2000. "Data modeling is one of the most difficult tasks in application engineering. The engineer must be aware of the use cases and the required application services and at a certain point of time he has to fix the data model which forms the base for the application services. However, once the data model has been fixed it is difficult to consider changing needs. This might be a problem in specific domains, which are as dynamic as the healthcare domain. With fuzzy data we address all those data that are difficult to organize in a single database. In this paper we discuss a gradual and pragmatic approach that uses the XML technology to conquer more model flexibility. XML may provide the clue between unstructured text data and structured database solutions and shift the paradigm from 'organizing the data along a given model' towards 'organizing the data along user requirements'... For the description of the document model we prefer an XML schema definition over a Document Type Definition (DTD). The XML schema approach supports more abstraction concepts, thus allowing a higher reuse of definitions. This is illustrated by the PersonType definition. The PersonType defines an element type, or element class, which is reused for the sender and the receiver of a report. The XML schema consequently supports the abstraction concept of 'classification', which can already be found in the DTD (Document Type Definition) approach. Furthermore, the XML schema supports a derivation concept that has no counterpart in the DTD. The PatientType, for example, is derived from the PersonType by extension, i.e., the PersonType is a generalization of the PatientType. Abstraction concepts such as classification and generalization allow a high reuse of definitions... 
Inflexible data models can lead to low acceptance of application systems in domains which are as dynamic as the healthcare domain. Our approach to this problem is to let the user requirements change the model description, i.e., to adjust the structure to the data instead of adjusting the data to the structure. The key concept of our solution is to make the structure, i.e., the XML schema definition, as modifiable as the data itself. Necessary adjustments of the user front end and storage back end are managed automatically. The XML technology provides a proper means in terms of XML Schema, XSLT, XPath and DOM to implement such an approach." See also: (1) Schweiger/Burkle/Holzer/Dudeck: "XML Structured Clinical Information: A Practical Example," Stud Health Technol Inform 77 (2000), pages 822-826. and (2) the HL7-XML Web Site at the Institute for Medical Informatics, University of Giessen; (3) "Health Level Seven XML Patient Record Architecture."
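In XML Schema syntax the derivation described above looks roughly like this (element names are invented; the authors' actual schema is not reproduced here): PatientType extends PersonType and inherits its content model.

```xml
<xs:complexType name="PersonType">
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="address" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="PatientType">
  <xs:complexContent>
    <xs:extension base="PersonType">
      <xs:sequence>
        <xs:element name="patientId" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```

A DTD can express the classification (PERSON used for sender and receiver) but has no analogue of the extension step, which is the authors' point.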

  • [April 24, 2001] Regular Expression Types for XML. By Haruo Hosoya. 2001. PhD Thesis presented to the Graduate School of the University of Tokyo, December 1, 2000. Thesis Supervisor: Akinori Yonezawa. 94 pages. ['This PhD thesis is an outcome of the XDuce project.'] Abstract: "XML is a simple, flexible, and generic format for tree-structured data. XML also allows us to write a schema to define a sublanguage of XML, which can be used as a data exchange format for each individual application. As XML has rapidly become adopted for various next-generation Web applications, a need is emerging for better programming language support to help with XML-based software development -- in particular, (1) static analyses capable of guaranteeing that generated trees conform to an appropriate schema; and (2) convenient programming constructs for tree manipulation. This thesis explores two new core features for XML processing languages: regular expression types and regular expression pattern matching. Regular expression types capture and generalize the regular expression notations such as repetition (*), alternation (|), etc., that are commonly found in schema languages for XML, and support a 'semantic' notion of subtyping. We argue that the flexibility provided by this form of subtyping is necessary to support smooth evolution of XML-based systems. Regular expression pattern matching is an extension of conventional pattern-matching facilities (as in functional programming languages) with repetition and alternation operators. This form of pattern can match arbitrarily long sequences of subtrees, allowing compact expressions to directly jump over to an arbitrary position in a sequence and extract data from there. We present an experimental programming language called XDuce ('transduce') as an example of languages using these two features, and we prove type soundness of the core part of XDuce language. 
The thesis also explores algorithmic questions arising from implementation of the above features. The subtyping problem can easily be reduced to the language inclusion problem between regular tree automata, which is known to be EXPTIME-complete. In order to deal with this high complexity, we develop an algorithm that runs efficiently in practice. We start with Aiken and Murphy's set-constraint solving algorithm. Our additions to their work are (1) correctness proofs, (2) implementation techniques that perform well on typical cases arising from XML processing, and (3) preliminary experiments. The results of the experiments indicate that our algorithm can compute the subtype relation with acceptable efficiency on a small suite of fairly realistic examples. When incorporating regular expression patterns in a typed programming language, it is important for the compiler to automatically compute types of bound variables in patterns in order to avoid requiring excessive type annotations by the user. For this purpose, we develop a type inference scheme that propagates type constraints to pattern variables from the surrounding context. The type inference algorithm translates types and patterns into regular tree automata and then works in terms of standard closure operations (union, intersection, and difference) on tree automata. The main technical challenge is dealing with the interaction of repetition and alternation patterns with the first-match policy (where patterns appearing earlier are given higher priority), which makes it difficult to guarantee both the termination and the precision of the analysis. We solve this problem by introducing a data structure representing closure operations lazily." [cache]
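The flavor of the two features can be imitated, very loosely, with ordinary string regexes by flattening a sequence of child labels into a string: a 'type' is then a regex the whole string must match, and a 'pattern' uses groups to bind arbitrarily long subsequences. This is only an analogy (the encoding below is invented here; XDuce works on trees and its subtyping is decided over tree automata, not strings):

```python
import re

# Children of a person element, flattened to space-terminated labels.
children = "name name email tel tel tel "

# "Type": name+, email?, tel* as a whole-string regex; conformance = fullmatch.
PERSON_TYPE = r"(name )+(email )?(tel )*"

print(re.fullmatch(PERSON_TYPE, children) is not None)  # True

# "Pattern": bind the leading run of names and the arbitrarily long rest.
m = re.fullmatch(r"(?P<names>(name )+)(?P<rest>.*)", children)
print(m.group("names").split())  # ['name', 'name']
print(m.group("rest").split())   # ['email', 'tel', 'tel', 'tel']
```

The greedy leftmost binding above is a rough stand-in for the first-match policy the thesis analyzes; the hard parts (semantic subtyping, inference of precise types for the bound variables) are exactly what the string analogy cannot capture.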

  • [April 24, 2001] "Sun IPR statement on XPointer." [Submitted to W3C by Eve L. Maler.] Posted to XML-DEV 2001-04-24. Excerpt: "This policy shall apply only to Sun's Essential Patent Claims (as defined below) that read on the XML Pointer Language specification ('XPointer'). This undertaking is valid only as long as XPointer either is on the Recommendation track or has been adopted as a Recommendation. Sun Microsystems, Inc. ('Sun') agrees it will grant royalty-free licenses, under reasonable terms and conditions, and on a non-discriminatory basis, under Sun's essential patent claims to make, use, sell, offer for sale, or import implementations of XPointer. One precondition of any license granted to a party shall be the party's agreement to grant royalty-free licenses to Sun and other companies to make, use, sell, offer for sale, or import implementations of XPointer under the party's essential patent claims. Sun expressly reserves all other rights it may have..." (see the remainder). The text of the earlier patent statement is available in the W3C mailing list archives for 'www-xml-linking-comments'. On XLink/XPointer, see "XML Linking Language." [source]

  • [April 24, 2001] "Media Type for Resource Description Framework (RDF)." By Aaron Swartz. [Reference: 'Not-Yet-Internet-Draft'. Post to RDF Interest list: "I've taken Dan Connolly's rough draft and tried to put it together into a media type proposal..."] Abstract: "This memorandum describes a media type (application/rdf+xml) for use with the XML serialization of the Resource Description Framework (RDF). RDF is currently used for semantically-meaningful data on the World Wide Web, and is meant to help bring about the creation of a 'Semantic Web' with semantically-meaningful information which machines are better able to process." Details: "The World Wide Web Consortium has issued Resource Description Framework (RDF) Model and Syntax Specification. To enable the exchange of RDF network entities, serialized using XML, this document registers the application/rdf+xml media type. Resource Description Framework (RDF) is a foundation for transmitting and processing semantically-meaningful data on the World Wide Web. It emphasizes facilities to enable automated processing of Web resources. While the RDF model can be serialized in a number of different ways, the media type registered by this document only deals with the XML serialization. Future registrations are expected to deal with alternate serializations. Because RDF is a format for semantically-meaningful information, it is important to note that transmission of RDF via HTTP, SMTP or some similar protocol, means that the sender asserts the content of the RDF document..." See also: "XML Media/MIME Types."
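In practice the registration just means that an HTTP response carrying RDF/XML would be labeled like this (a schematic exchange, not an example taken from the draft):

```xml
HTTP/1.1 200 OK
Content-Type: application/rdf+xml

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.example.org/">
    <dc:title>An example resource</dc:title>
  </rdf:Description>
</rdf:RDF>
```

The '+xml' suffix convention lets generic XML processors recognize the body as XML even if they know nothing about RDF.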

  • [April 24, 2001] "[IBM's] XML DOM & SAX poster. Beta of 2nd Edition." By Nancy Dunn (IBM developerWorks XML zone editor). From IBM developerWorks. April 2001. ['Preview this handy reference chart, find a bug first and win a book... Download the PDF of the beta version of developerWorks' forthcoming DOM and SAX reference poster. The poster illustrates classes and methods of DOM and SAX level 2. If you're one of the first 15 people to report a bug, you'll earn a book on XML development from Wrox.'] "More than a year ago, IBM developerWorks published a handy 22-by-38-inch poster that shows the classes and methods for the XML Document Object Model (DOM) specification of the W3C and the Simple API for XML (SAX) spec devised by Dave Megginson. Both the DOM and SAX have evolved since then, and we're about to publish a new edition of the poster. But first we're making available the beta of the poster (in PDF form) on the XML zone as a preview. Find a bug, win a book. The idea is to give the community a chance to participate in the review of the document: If you're among the first people to report a bug on the linked form, we'll reward you with a copy of one of three XML books from Wrox. The first 15 people to report a mistake in the poster will receive a copy of either Professional XML Databases by Kevin Williams and nine coauthors, XSLT Programmer's Reference by Michael Kay, or Professional Java XML Programming by Alexander Nakhimovsky and Tom Myers, all courtesy of the publisher, Wrox. Find out more about the Wrox books about XML on the Wrox site..."
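The two APIs the poster covers differ in shape more than in power: DOM parses the whole document into a navigable in-memory tree, while SAX streams parsing events to callbacks. A minimal side-by-side using Python's standard library (the document contents are invented):

```python
import xml.sax
from xml.dom.minidom import parseString

DOC = b"<library><book/><book/><book/></library>"

# DOM: build the whole tree, then navigate it at leisure.
dom = parseString(DOC)
dom_count = len(dom.getElementsByTagName("book"))

# SAX: no tree; the parser calls back as it streams through the input.
class BookCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0
    def startElement(self, name, attrs):
        if name == "book":
            self.count += 1

handler = BookCounter()
xml.sax.parseString(DOC, handler)
print(dom_count, handler.count)  # 3 3
```

The trade-off the poster's two halves embody: DOM is convenient for random access and modification, SAX for constant-memory processing of large documents.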

  • [April 24, 2001] "XML Standards Components and Convergence: A RosettaNet Perspective." From RosettaNet. April 24, 2001. PowerPoint Presentation (35 slides). ['A RosettaNet industry perspective on both horizontal and vertical XML initiatives, including standards components and convergence.'] "RosettaNet has provided an industry perspective on both horizontal and vertical XML initiatives, including standards components and convergence. In its continuing leadership role, RosettaNet has attempted to clearly define the current state of e-business standards development and provide a context for the various industry efforts underway. Although many of the XML initiatives today are complementary, the sheer number of XML standards efforts is leading to confusion among implementers and key decision makers alike. To provide a clearer understanding in the industry, RosettaNet has developed a conceptual model for defining the layers of XML standards required to support B2B integration between trading partners across supply chains. To strengthen RosettaNet's perspective, the conceptual model was developed using input from a diverse group of respected business and technology leaders, both inside and outside the RosettaNet community. In support of its ongoing convergence goals, RosettaNet has also highlighted a number of horizontal XML standards and identified the specific layers that the respective standards efforts are focused on. Using the conceptual model and snapshot of various XML initiatives, it becomes easier to identify efforts that are complementary as well as areas where possible overlap -- and thus convergence opportunities -- exist." See "RosettaNet." [source]

  • [April 24, 2001] "Extensible Markup Language. XML's Tower Of Babel. Industry-specific dialects may sound good, but they can create more confusion than communication." By Lenny Liebmann. In InternetWeek (April 24, 2001). "The original promise of Extensible Markup Language was to offer an easier way for companies to exchange data with customers and partners independent of their specific database platforms or architectures. But the proliferation of industry-specific XML dialects may create more confusion than communication as IT managers try to build B2B systems. The problem with so many schema popping up so quickly is that there's too much variation in how they define common data fields such as 'company name' or 'address.' Many have overlapping purposes, or are aimed at overly narrow functions. That makes it hard for any of them to gain critical mass. 'Most of these schema will be dead in two years,' says Gartner Group analyst Rita Knox, noting that the specificity of first-generation XML schema is exactly what will spell their demise. 'It's like giving a person 100 sentences to use for the day. That's not really how you communicate.' The best approach for companies is to stay pragmatic when evaluating the business benefits of industry-specific XML dialects. Sometimes, the flexibility of a company's XML middleware may be more important than any individual schema, because it's middleware that ultimately lets companies quickly translate and implement any given flavor of XML. Also, some companies, such as Airborne Logistics Services in Seattle, serve customers in different markets. It's unlikely that Airborne's diverse customer base will be willing to accommodate an XML dialect created by the transportation industry to make life easier for shipping and logistics providers. 
It's also improbable that the XML standards for the various markets Airborne serves will each include support for the specific types of inventory management and outsourced logistics services that Airborne offers. 'We're driven by what customers need today,' says Edward Pius, application architect at Airborne. 'Right now, industry-specific XML standards aren't really part of those requirements.' Instead, Pius and his team are crafting individual XML document-type definitions (DTDs) based on the specific requirements of each customer. Airborne customers that want to automate data exchange via XML usually have a clear idea of what they want and how they want it done, Pius says. Airborne's job is to accommodate those specifications as quickly and effectively as possible... Another XML dialect that shows some promise is the Research Information eXchange Markup Language (RIXML), a standard being developed by a consortium of top financial companies to bring some rationality to the huge volume of filings, research reports and other documents generated by the financial sector. 'Something on the order of 800,000 documents are being produced every year,' says Ellen Callahan, director of equity market data at Fidelity Investments and cochair of the standards committee. 'That's obviously an overwhelming volume of information to sort through for buyers.' By creating standardized document definitions, the RIXML group wants to help buyers pinpoint documents relating to a particular industry, company or topic. Producers of research, especially niche players, should find it easier to market their products, since they will be able to more clearly identify what exactly it is that they're selling. But questions remain. Will the 'infomediaries' that currently add value by helping buyers sort through the plethora of information sources now on the market lose their appeal once easier mechanisms are created? 
Will corporate IT departments that are already coping with a wide range of e-business implementation issues want to avoid the hassle of implementing RIXML apps and let infomediaries use RIXML to deliver even more sophisticated retrieval services? The emergence of a standard like RIXML raises other questions about the seemingly ceaseless growth of purpose-specific schema. Does the business world really need a specific standard for financial research documents, as opposed to medical or demographic research documents? Don't buyers of information want to pull from a variety of sources and not simply from those that are compliant with a very narrowly defined standard? The redundancies and overlaps among XML dialects will eventually lead to more rational solutions, says Gartner Group's Knox, possibly including the emergence of centralized XML 'dictionaries' that will help companies translate between multiple schema..."

  • [April 24, 2001] "XML Schema Catches Heat." By Roberta Holland. In eWEEK (April 23, 2001). "After more than two years of development, the World Wide Web Consortium could be only weeks away from releasing its long-awaited XML Schema specification. But despite its release, the specification, which is designed to automate data exchange between companies, is coming under fire. Now in the final review phase by W3C Director Tim Berners-Lee, the specification, according to critics, is far too complex -- so complex that it has driven several XML experts to create alternative and lighter-weight schema languages. Furthermore, some W3C insiders are even calling for future versions to be incompatible with this first release so as not to repeat what they say are the flaws of the first version. 'There has been controversy,' said Tim Bray, co-author of the W3C's Extensible Markup Language specification, in Vancouver, British Columbia. 'XML Schema is a very large project. The working group is a very large body that has been very visible. All the major vendors are on it. As a result, the [specification] tends to have a lot of compromises in it.' XML Schema has been one of the most watched standards efforts of late. The schema expresses shared vocabularies and defines the structure and content of XML documents. XML Schema is expected to make data exchange among businesses cheaper and easier than what is possible using Document Type Definitions. The comment period for XML Schema ended last week. The spec now lies with Berners-Lee, who will determine if any technical issues raised should prevent its release. W3C officials expect a decision within weeks.... Trex, which [James] Clark submitted to the Organization for the Advancement of Structured Information Standards, is simpler and more modular, as it focuses just on the validation of XML documents. A similar effort, Relax, was started late last year by schema working group member Makoto Murata. 
Murata, who works with the International University of Japan Research Institute and IBM in Tokyo and was also dissatisfied with the W3C's direction, said the schema group was focused more on benefits to vendors than on the technology, unlike the original XML working group. Clark and Murata recently merged their efforts under OASIS, in Billerica, Mass. Clark hopes they can produce a first draft in two to three months. Another alternative, called Schematron, was started in October 1999 by working group member Rick Jelliffe, who represented Academia Sinica Computing Centre, in Taipei, Taiwan, until this month. The most positive change 'is a widespread realization that XML Schema will not be the universal and terminal schema language,' said Jelliffe, now CTO of Topologi Pty. Ltd., in Sydney, Australia. 'I think if we can hose down people's expectations and mindshare-grabbing marketing, XML Schema will be successful.' [...] Despite the controversies, many are supporting XML Schema, including Microsoft, IBM and Oracle Corp. Microsoft last week announced a technical preview of its XML parser supporting schema. The Redmond, Wash., company also will include XML Schema in the second beta version of Visual Studio.Net, which will be given to attendees at Microsoft's TechEd conference in June." For schema description and references, see "XML Schemas."

  • [April 23, 2001] "Microsoft, Hyperion Preach XML-based OLAP Querying Specification." By Tom Sullivan. In InfoWorld (April 23, 2001). "Microsoft and Hyperion Solutions have teamed up on the Open XML for Analysis specification, officials from the two companies announced on Monday at Hyperion's Solutions 2001 conference in Orlando, Fla. The specification, according to the companies, will enable client-side, Web-based BI (business intelligence) applications to query OLAP (online analytical processing) servers from Microsoft, Hyperion, and any other vendors supporting the specification, without having to use several APIs. Right now, there is no standard language for accessing OLAP cubes, according to Mark Shainman, an analyst at Meta Group in Stamford, Conn. Vendors such as SAS Institute, Cognos, and Brio have proprietary methods for interacting with OLAP cubes... The Open XML for Analysis specification has the potential to reduce the number of languages that programmers typically have to write to for reaching OLAP cubes from five or six to just one, Shainman said... Currently, the specification has the support of several front-end BI vendors, including AlphaBlox Software, Brio Technology, Business Objects, Cognos, Crystal Decisions, Knosys, MicroStrategy, and SAP... In May, Microsoft will post an SDK (software development kit) on its Web site as an add-on to the SQL Server 2000 database, Eng said. The technology is likely to become incorporated into the vendors' offerings, rather than a standalone product, Hyperion's Gersten said. 'It will become a key part of our OLAP access technologies,' he continued. The companies said that they plan to submit the specification, which is currently available on their respective Web pages, to a standards body in approximately six months, although they have yet to decide which one." See the announcement and the main entry, "XML for Analysis."

  • [April 21, 2001] "Arc - An OAI Service Provider for Digital Library Federation." By Xiaoming Liu, Kurt Maly, and Mohammad Zubair (Old Dominion University, Norfolk, Virginia, USA) and Michael L. Nelson (NASA Langley Research Center, Hampton, Virginia, USA). In D-Lib Magazine [ISSN: 1082-9873] Volume 7, Number 4 (April, 2001). "The usefulness of the many on-line journals and scientific digital libraries that exist today is limited by the inability to federate these resources through a unified interface. The Open Archive Initiative (OAI) is one major effort to address technical interoperability among distributed archives. The objective of OAI is to develop a framework to facilitate the discovery of content in distributed archives. In this paper, we describe our experience and lessons learned in building Arc, the first federated searching service based on the OAI protocol. Arc harvests metadata from several OAI compliant archives, normalizes them, and stores them in a search service based on a relational database (MySQL or Oracle). At present we have over 320,000 metadata records from 18 data providers from various subject domains. We have also implemented an OAI layer over Arc, thus making hierarchical harvesting possible. The experiences described within should be applicable to others who seek to build an OAI service provider... Bulk harvesting is ideal because of its simplicity for both the service provider and data provider. It collects the entire data set through a single http connection, thus avoiding a great deal of network traffic. However, bulk harvesting has two problems. First, the data provider may not implement the resumptionToken flow control mechanism of the OAI metadata harvesting protocol, and thus may not be able to correctly process large (but partial) data requests. Secondly, XML syntax errors and character-encoding problems -- these were surprisingly common -- can invalidate entire large data sets... 
During the testing of data harvesting from OAI data providers, numerous problems were found. We discovered that not all archives strictly follow the OAI protocol; many have XML syntax and encoding problems; and some data providers are periodically unavailable. Many OAI responses were not well-formatted XML files. Sometimes foreign language and other special characters were not correctly encoded. XML syntax errors and character-encoding problems were surprisingly common and could invalidate entire large data sets. Incremental harvesting proved beneficial as a work-around. The OAI website validates registered data providers for protocol compliance. It uses XML schemas to verify the standard conformance. However, this verification is not complete; it does not cover the entire harvesting scenario and does not verify the entire data set. Additionally, such verification cannot detect semantic errors in the protocol implementation, such as misunderstanding of DC fields. For certain XML encoding errors, an XML parser can help avoid common syntax and encoding errors... The contribution of Arc is to prove not only that an OAI-compliant service provider can be built, but also that one can be built at a scale previously unrealized within the e-print community. The Open Archives Initiative has been successful in getting data providers to adopt the protocol and provide an OAI layer to their repositories..." See: "Open Archives Metadata Set (OAMS)."
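The resumptionToken flow control and the parse-error problems the Arc authors describe can be sketched in a few lines. This is a minimal illustration and not Arc's actual code: the `fetch` callable, the endpoint URL, and the chunk handling are all assumptions, and the namespace URI shown is the one used by OAI-PMH version 2.0 rather than the 2001-era protocol.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # namespace assumed here

def harvest(fetch, base_url, prefix="oai_dc"):
    """Incrementally harvest ListRecords responses, following the
    resumptionToken flow control.  `fetch` is any callable(url) -> bytes,
    in practice a wrapper around urllib.request.urlopen."""
    records, token = [], None
    while True:
        url = f"{base_url}?verb=ListRecords" + (
            f"&resumptionToken={token}" if token else f"&metadataPrefix={prefix}")
        try:
            root = ET.fromstring(fetch(url))
        except ET.ParseError:
            break  # malformed XML spoils only this chunk, not the whole run
        records.extend(root.iter(f"{OAI}record"))
        tok = root.find(f".//{OAI}resumptionToken")
        token = tok.text if tok is not None and tok.text else None
        if token is None:
            break
    return records
```

Harvesting in chunks this way is what makes the incremental work-around mentioned above possible: one provider's bad XML or encoding invalidates a single response rather than the entire data set.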

  • [April 21, 2001] "[DRAFT] Namespace Policy for the Dublin Core Metadata Initiative (DCMI)." By Stuart Weibel, Tom Baker, Tod Matola, and Eric Miller. 2001-03-09. Latest version: "The use of XML namespaces [XML-NAMES] for formal, machine-processable declarations of metadata entities is a convention intended to support web-addressable concepts that can be shared across applications, and hence promote the possibility of shared semantics. DCMI adopts this convention for the identification of all DCMI terms. This document identifies the policies and procedures associated with the naming of existing DCMI terms and those that will be defined in the future... The namespace of the Dublin Core metadata element set (version 1.1) is: Declarations for individual elements are constructed by adding entity_name, for example: is the web-addressable identifier for one of the 15 elements of the Dublin Core metadata element set. Each of the 15 elements can be so identified. Stability of namespace identifiers for metadata terms is critical to interoperability over time. Thus, the wide promulgation of this set of identifiers dictates that they be maintained to support legacy applications that have adopted them... Official declarations for all additional metadata packages approved by DCMI, including additional elements, qualifiers and domain specific metadata packages, are specified as[ term_name]." See: "Dublin Core Metadata Initiative (DCMI)."
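As a small illustration of what such namespace-qualified terms buy an application, the sketch below emits a record whose elements live in the Dublin Core version 1.1 element-set namespace. The record contents and the choice of elements are invented for the example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # the version 1.1 element-set namespace
ET.register_namespace("dc", DC)

record = ET.Element("record")
for term, value in [("title", "Namespace Policy"), ("creator", "DCMI")]:
    # Clark notation {namespace}localname qualifies each DCMI term
    ET.SubElement(record, f"{{{DC}}}{term}").text = value

xml_out = ET.tostring(record, encoding="unicode")
# Any consumer can now resolve dc:title to the same web-addressable
# identifier, which is the shared-semantics point the policy makes.
```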

  • [April 21, 2001] "Dublin Core Metadata Initiative." By Stuart Weibel (Executive Director of the Dublin Core Metadata Initiative). Presentation given at FAO [United Nation's Food and Agriculture Organization] in Rome on 5-6 April, 2001. Presentation covers: Introduction to Metadata; Dublin Core Metadata Initiative; Metadata Registries; Syntax Alternatives for Web Metadata; A Few Strategic Applications. Also available in .PPT format. See: "Dublin Core Metadata Initiative (DCMI)."

  • [April 20, 2001] "XML Topic Maps: Finding Aids for the Web." By Michel Biezunski (InfoLoom) and Steven R. Newcomb (Coolheads Consulting). In IEEE MultiMedia Volume 8, Number 2 (April-June 2001), pages 104-108. "Topic maps superimpose an external layer that describes the nature of the knowledge represented in the information resources. There are no limitations on the kinds of information that can be characterized by topic maps. The purpose of the Extensible Markup Language topic maps (XTM) initiative is to apply the topic maps paradigm in the context of the World Wide Web. Finding information: In a world of infoglut, it's becoming a real challenge to find desired information. Hiding irrelevant information is most effectively and accurately done on the basis of categories, but there's a number of ways to categorize the contents of any corpus, and each system of categorization represents only one particular worldview. Information users shouldn't be forced to use a single ontology, taxonomy, glossary, namespace, or other implicit worldview. On the Web, we should federate and exploit different worldviews simultaneously, even if those worldviews are cognitively incompatible with each other. Finding information -- metadata that helps information seekers to find other information -- is often too valuable to limit its exploitability to a single closed or proprietary environment. Finding information should be application- and vendor-neutral, so that users can freely exploit it in many ways and contexts. The topic map paradigm provides a solution for interchanging and federating finding information that diverse sources produce and maintain according to different worldviews. What's a topic map? A topic map is a representation of information used to describe and navigate information objects. 
The topic maps paradigm requires topic map authors to think in terms of topics (subjects, topics of conversation, specific notions, ideas, or concepts), and to associate various kinds of information with specific topics. A topic map is an unobtrusive superimposed layer, external to the information objects it makes findable. The findability of a given information object, (that is, the ease with which it can be found) has two aspects: (1) The ease with which a list of information objects that is guaranteed to include the information object can be created by means of some query, and (2) The brevity of that list. The shorter the list, the easier it is for a human being to find the desired information object within the list. A topic map can act as a kind of glue between disparate information objects, allowing all of the objects relevant to a specific concept to be associated with one another. Topic maps are metadata that need not be inside the information they describe. Interchangeable versus application-internal topic maps: Topic maps take two forms: interchangeable topic maps that are XML or SGML documents, and directly usable topic map graphs that are the application-internal result of processing interchangeable topic maps. Topic map graphs are abstractly described in terms of nodes and arcs. Topic maps can be formatted as specific kinds of finding aids: indexes, glossaries, thesauri, and so on. We sometimes regard formatted finding aids as a third form of a topic map, but this isn't strictly true, because such finding aids cannot necessarily contain or reflect all of the information present in the topic maps from which they were derived... There's some overlap between topic maps and the Resource Description Framework (RDF, specification. Both standards aim to represent connections between information objects and can encode metadata, among other things. 
At the Graphic Communications Association (GCA) XML 2000 Conference where the publication of the XTM 1.0 Core Deliverables was announced, Tim Berners-Lee, the Director of the World Wide Web Consortium (W3C), proposed in his presentation on the semantic Web that there should be a convergence between RDF and topic maps. The implications of such a convergence will include benefits for DARPA's Agent Markup Language (DAML) initiative, as well as with the ontology interface layer (OIL)... The central notions of topic maps will play increasingly significant roles in future generations of Web technology, because the severity of the infoglut problem is only going to increase. The Topic Maps paradigm is designed not only to accommodate diversity; it preserves, cherishes, and leverages diversity in the conquest of infoglut. Whenever a new vocabulary, ontology, and so on appears, it need not be regarded as evidence suggesting that the dream of global knowledge interchange can't be realized. On the contrary, it's cause for hope, because of the knowledge-federating, diversity-leveraging power of topic maps. No great difficulty is posed by the need to welcome yet another community of interest into the global community of communities of interest. Communities of interest are defined by their worldviews, and whenever a community of interest rigorously exposes its worldview in a fashion that permits its knowledge to be federated with the worldviews and knowledge of other communities, the whole human family is enriched." See: "(XML) Topic Maps." [copyright notice]
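The two aspects of findability named in the article, a candidate list that is guaranteed complete and is also kept short, can be sketched with a toy application-internal topic map. All topic names and object identifiers below are invented.

```python
from collections import defaultdict

# Toy topic map graph: topics point at the information objects
# (occurrences) they make findable, and associations glue related
# topics together, external to the objects themselves.
occurrences = defaultdict(set)
associations = defaultdict(set)

def add_occurrence(topic, obj):
    occurrences[topic].add(obj)

def associate(a, b):
    associations[a].add(b)
    associations[b].add(a)

def find(topic):
    """Return every object relevant to the topic or its direct
    associations: complete (aspect 1) yet short (aspect 2)."""
    hits = set(occurrences[topic])
    for related in associations[topic]:
        hits |= occurrences[related]
    return sorted(hits)
```

Federating a second worldview then amounts to merging another map's occurrences and associations into the same graph, which is the knowledge-federating property the authors emphasize.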

  • [April 20, 2001] "Taxonomy of XML Schema Languages Using Formal Language Theory." By Murata Makoto (IBM Tokyo Research Labs), Dongwon Lee (UCLA / CSD), and Murali Mani (UCLA / CSD). 24 pages. April, 2001. "On the basis of regular tree languages we present a mathematical framework for XML schema languages. This framework helps to formally describe, compare, and implement such XML schema languages. Our main results are as follows: (1) Four subclasses of regular tree languages: local tree languages, single-type tree languages, restrained competition tree languages, and regular tree languages. (2) A classification and comparison of a few XML schema proposals and type systems: DTD, XML-Schema, DSD, XDuce, RELAX, and TREX. (3) Properties of the grammar classes under two common operations: XML document validation and type assignment... As the popularity of XML increases substantially, the importance of XML schema language to describe the structure and semantics of XML documents also increases. Although there have been about a dozen XML schema language proposals made recently, no comprehensive mathematical analysis of such schema proposals has been available. We believe that providing a framework in abstract mathematical terms is important to understand various aspects of XML schema languages and to facilitate their efficient implementations. Towards this goal, in this paper, we propose to use formal language theory, especially tree grammar theory, as such a framework. Given an XML document and its schema, suppose one wants to check whether the document is valid against the schema and further find out the types (or non-terminals) associated with each element in the document. We are interested in algorithms for such document validation and type assignment operations, and the time complexity of such algorithms. Furthermore, we would like to know if such algorithms can be implemented on top of SAX or rather require DOM. 
Such issues are closely related with the efficient implementation of XML schema language proposals, and are directly addressed by our mathematical framework... [Conclusion:] A mathematical framework using formal language theory to compare various XML schema languages is presented. This framework enables us to define various subclasses of regular tree languages, and study the closure properties and expressive power of these languages. Also, algorithms for document validation and type assignment for the different grammar classes are described. Finally, various schema language proposals are compared using our framework, and the implementations available are discussed. Our framework brings forward a very important question: Do we need to migrate from deterministic content models of XML 1.0 in favor of schema languages such as RELAX and TREX that allow non-deterministic content models? Our work in this paper as well as other work, with regard to document processing and XML query, makes us believe that we should allow non-deterministic content models in XML schema languages. We have multiple directions for future research which we are pursuing presently. We are examining ambiguity in regular tree grammars and languages, and studying how to determine whether a given regular tree grammar is ambiguous or not. We are also examining integrity constraints necessary for a schema language as studied widely in the area of database systems, and examining efficient implementations of these constraints for XML applications." [Note the comment posted: "I hope that this paper (submitted to Extreme) helps to understand validation algorithms for schema languages such as RELAX and TREX. You might have seen earlier versions of this paper, but this is way more readable... In my understanding, Algorithm 4.1 shown in this paper is similar to the algorithms of PyTREX, VBRELAX, and XDuce. The algorithm of RELAX Verifier for Java is based on Algorithm 5, but is more advanced. 
The algorithm of JTREX is more advanced than Algorithm 5 in that it constructs tree automata lazily. In the final version, I will try to add more information about JTREX."] References for related papers are given on Murali Mani's web site. For schema description and references, see "XML Schemas." [cache]
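To make the taxonomy concrete, here is a sketch of document validation for the smallest class in the paper, local tree languages (the DTD-like case, where a content model depends only on the element's name). The grammar and the document are invented, and real validators for the larger classes need tree automata rather than a name-keyed table.

```python
import re
import xml.etree.ElementTree as ET

# A local tree grammar: each element name maps to exactly one content
# model, written here as a regular expression over child-element names.
grammar = {
    "book": r"title chapter( chapter)*",
    "chapter": r"title( para)*",
    "title": r"",
    "para": r"",
}

def validate(el):
    """Document validation for a local tree language.  Type assignment
    is trivial in this class: a node's type is just its element name,
    which is why DTD-style validation fits a SAX-level streaming pass."""
    children = " ".join(child.tag for child in el)
    model = grammar.get(el.tag)
    if model is None or not re.fullmatch(model, children):
        return False
    return all(validate(child) for child in el)

doc = ET.fromstring("<book><title/><chapter><title/><para/></chapter></book>")
```

In the larger classes (single-type, restrained competition, regular), several types can share one element name, so validation must track candidate types per node, which is the extra machinery the paper's algorithms supply.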

  • [April 20, 2001] "Ipedo XML Database." Company white paper. April, 2001. Excerpts: "The Ipedo XML Database stores XML data natively in its structured, hierarchical form. Queries can be resolved much faster because there is no need to map the XML data tree structure to tables. This preserves the hierarchy of the data and increases performance. Working alongside a relational database and file system, a native XML database adds flexibility and speed. Ipedo's sophisticated user-defined indexing allows the database to directly address any node or element of an XML document. This indexing method allows for much finer-grained access to XML document information. Integrated XML Query and Translation: The Ipedo XML Database can be queried directly using the W3C's XPath query language that was designed to retrieve information directly from XML documents. The result of an effort to provide a common syntax and semantics for functionality shared between Extensible Stylesheet Language Transformations (XSLT) and XPointer, XPath allows you to address parts of a document. It also provides basic facilities for manipulation of strings, numbers and booleans. By combining an XPath query with an XSL transformation, the Ipedo XML Database allows data access and XML transformation to be completed in one step. The result is a fast query that skips time-consuming steps and a result document already in the desired output format. Performance Caching and In-Memory Database Processing: Ipedo's XML Database is built using a unique in-memory database architecture. Running in the in-memory mode, the entire database is mapped directly into virtual address space in main memory, eliminating disk access to retrieve data. This model provides direct, high-speed access to data, significantly reducing disk I/O while increasing performance. 
Running Ipedo's XML Database in this mode provides data in real-time, and allows developers to focus their efforts on minimizing the number of program instructions required to perform database functions, instead of worrying about minimizing disk operations. Ipedo's XML Database includes Hot Indexing technology, which allows administrators to make memory versus performance trade-offs. Ipedo's Hot Indexing distinguishes heavily used data and loads those indexes into memory for faster access. Hot Indexing adds flexibility to in-memory data processing to yield an order of magnitude better performance than conventional systems... Ipedo, Inc. provides high-performance data delivery and management products optimized for Internet and wireless applications. Based on its Active Edge performance technology, Ipedo's products include directory, caching, and XML database servers. Ipedo's products provide rapid personalization, instant delivery, and scalable data access to very large user populations, ideal for ASPs, ISPs, Web portals, B2B exchanges, wireless services, and next-generation Internet telephony." Comments from the XML-DEV post of Samantha Cichon: "Ipedo, Inc. is looking for serious testers, who can offer constructive feedback, to participate in our beta program for our XML database. The Ipedo XML Database is a native XML database with fast XSLT processing, geared to optimize your system's performance. Featured in this release are: (1) Support for SOAP and HTTP Servlet; (2) Query through XPath's direct access to document; (3) Integrated XML Transformation through XSLT; (4) Persistent DOM allows access to the document object model after it has been loaded; (5) Data pre-indexed for faster queries; (6) Schema-based dynamic indexing." See: "XML and Databases."
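The XPath-based access pattern the white paper describes can be approximated with Python's standard library, which implements a limited XPath subset. The document and queries below are invented; a native XML database would add persistence, indexing, and full XPath behind the same idea of addressing nodes directly.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring(
    '<catalog>'
    '<product sku="a1"><name>Widget</name><price>9.50</price></product>'
    '<product sku="b2"><name>Gadget</name><price>24.00</price></product>'
    '</catalog>')

# Path selection: every <name> element that sits under a <product>.
names = [n.text for n in catalog.findall("./product/name")]

# Predicate selection: direct access to one node by attribute value,
# the fine-grained addressing a node-level index would accelerate.
gadget_price = catalog.find("./product[@sku='b2']/price").text
```

Feeding such a query result straight into an XSLT transformation (via an external XSLT processor) would give the one-step query-plus-formatting flow the white paper highlights.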

  • [April 20, 2001] "XML projects in Japan and Fujitsu's approach to XLink/XPointer." By Toshimitsu Suzuki and Masatomo Goto (Fujitsu Labs Ltd, Nakahara Ku, Kawasaki, Kanagawa 211, Japan). In Fujitsu Scientific and Technical Journal Volume 36, Number 2 (December 2000), pages 175-184 (with 46 references). [Special Issue on Information Technologies in the Internet Era.] "The Extensible Markup Language (XML) is a markup language developed in response to a recommendation by the World Wide Web Consortium (W3C). It is a meta language used to make an information structure. XML's original specification is the Standard Generalized Markup Language (SGML). Now, XML is used not only as a format language but also as a framework in various areas beyond the SGML field. This paper describes the XML technology trends, the current state of XML technology, and some case studies in Japan and at Fujitsu. The paper also describes HyBrick, which is an XML/SGML browser that was demonstrated at SGML'97, and the XLink/XPointer technology... XML has the dual role of being a document notation and a data exchange format. As described in this paper, XML is mainly used as a data exchange format to access the same data from different kinds of platforms and applications and to execute processing based on this data. New hyperlink applications will conform to this trend. Currently, XSL is popular as a data exchange format, but if it is standardized, its use for document notation or as a substitute for HTML will expand. However, in the current situation, basic HTML technology for hyperlinks remains essential, and XLink and XPointer will become more important in the foreseeable future. Lastly, a few words about Minimal XML. Minimal XML is especially aimed at data exchange and features simple and fast parse processing to eliminate attributes and entity references. Keep an eye on the development of this protocol because it is suitable for use in Electronic Data Interchange (EDI)..." 
See also Fujitsu XLink Processor. XLink references: "XML Linking Language." [cache PDF; see also an approximation of the PDF in a cached/rescued version.]

  • [April 20, 2001] "High-performance XML Storage/Retrieval System." By Yasuo Yamane, Nobuyuki Igata, and Isao Namba. In Fujitsu Scientific and Technical Journal Volume 36, Number 2 (December 2000), pages 185-192 (with 8 references). [Special Issue on Information Technologies in the Internet Era.] "This paper describes a system that integrates full-text searching and database technologies for storing XML (Extensible Markup Language) documents and retrieving information from them while providing a uniform interface. Our main goal with this system is to achieve high-performance, because there will be a large amount of XML documents in the near future if XML becomes a standard for structured documents and data exchange. We have therefore developed techniques for achieving high-performance storage and retrieval of XML documents. For full-text searches, we improved the Structure Index + Text Index model, which references both indexes alternately at retrieval. In our improved method, a hierarchical structure query is converted into a flat structure query by referencing just the structure index, then the optimized query can be quickly processed using only the text index. For storage, we developed an offset space, which is an address space in secondary memory that can compactly store any structure, for example, a tree. We use the offset space to solve the problem that occurs in other methods which store the analyzed result of XML documents as multiple relations in an RDB. In our method, the analyzed result can be stored in a single page in the best case. This makes it superior to other methods which store the analysis results in multiple relations so that storage of N relations needs at least N pages. As a result, generally, our method greatly reduces I/O costs."

  • [April 20, 2001] "Information Objects and Rights Management. A Mediation-based Approach to DRM Interoperability." By John S. Erickson (Hewlett-Packard Laboratories). In D-Lib Magazine [ISSN: 1082-9873] Volume 7, Number 4 (April, 2001). "This article identifies certain architectural principles for the deployment of networked information that, when applied, should contribute to an environment in which digital objects can be readily discovered, retrieved and consumed in ways that encourage the free flow of information while being consistent with individual and organizational intellectual property rights (IPR) policies and preferences. I attempt to reconcile the disparate forces driving the dissemination of information today, including open and free access to materials; interoperability of networked information systems; collection, deployment and maintenance of metadata services; naming infrastructures for information objects; relationship-based policy management; and practical aspects of today's rapidly evolving applications and services. Although the central focus of this article is to confront current information-opaque approaches to digital rights management (DRM), I hope the principles presented here are broader in scope and will suggest solutions elsewhere... In this article I explore the development of a conceptual "platform" for IPR policy expression, discovery and interpretation; in an earlier paper my co-authors and I referred to this as a Policy and Rights Expression Platform (PREP). PREP is a set of guiding principles; it should not be thought of as a digital rights management (DRM) system, but rather as a basis for interoperability for DRM systems and services. I believe these principles are complementary to advanced metadata expression and transport mechanisms currently in development (RDF), and indeed may suggest ways for DRM technologies to leverage those mechanisms. 
In the second article in this series, I will examine some of the architectural implications of these principles, and will provide an example of implementation..." On DRM and XML, see: (1) Extensible Rights Markup Language (XrML); (2) Digital Property Rights Language (DPRL); (3) Open Digital Rights Language (ODRL); (4) MPEG Rights Expression Language.

  • [April 20, 2001] "Automated Name Authority Control and Enhanced Searching in the Levy Collection." By Tim DiLauro, G. Sayeed Choudhury, Mark Patton, and James W. Warner (Digital Knowledge Center, Milton S. Eisenhower Library, Johns Hopkins University) and Elizabeth W. Brown (Cataloging Department, Milton S. Eisenhower Library, Johns Hopkins University). In D-Lib Magazine [ISSN: 1082-9873] Volume 7, Number 4 (April, 2001). "This paper is the second in a series in D-Lib Magazine and describes a workflow management system being developed by the Digital Knowledge Center (DKC) at the Milton S. Eisenhower Library (MSEL) of The Johns Hopkins University. Based on experience from digitizing the Lester S. Levy Collection of Sheet Music, it was apparent that large-scale digitization efforts require a significant amount of human labor that is both time-consuming and costly. Consequently, this workflow management system aims to reduce the amount of human labor and time for large-scale digitization projects... The cornerstones of the workflow management system include optical music recognition (OMR) software and an automated name authority control system (ANAC). The OMR software generates a logical representation of the score for sound generation, music searching, and musicological research. The ANAC disambiguates names, associating each name with an individual (e.g., the composer Septimus Winner also published under the pseudonyms Alice Hawthorne and Apsley Street, among others). Complementing the workflow tools, a suite of research tools focuses upon enhanced searching capabilities through the development and application of a fast, disk-based search engine for lyrics and music and the incorporation of an XML structure for metadata. 
One of the ultimate goals of the overall Levy Project is to utilize the bibliographic 'raw material' described above as the basis for powerful searching, retrieval, and navigation of the multimedia elements of the collection, including text, images and sound. The existing index records have been converted from text files to more structured metadata using XML tagging. Between now and the end of the project, name information from the index records will be extracted into specific indices such as composer, lyricist, arranger, performer, artist, engraver, lithographer, dedicatee and, possibly, publisher. At the end of Levy II, cross-references will direct users to index records that contain various forms of names. Consistent with the philosophy of Levy II and the workflow management system, we have developed automated tools that will reduce the amount of human labor (and therefore costs) necessary to accomplish the metadata and intellectual access goals described above. On the collection ingestion side is the automated name authority control system (ANAC) that is described below. On the intellectual access side, we have developed a search engine that augments metadata searching with unique search capabilities for lyrics and music. By combining metadata-based searching with full-text and fuzzy searching via the search engine, the full range of rich intellectual information will be made more easily accessible from the online Levy Collection..." Note: "The MARC 21 Format for Authority Data is designed to be a carrier for information concerning the authorized forms of names and subjects to be used as access points in MARC records, the forms of these names, subjects and subdivisions to be used as references to the authorized forms, and the interrelationships among these forms. A name may be used as a main, added, subject added, or series added access entry. 
The term name refers to: Personal names (X00); Corporate names(X10); Meeting names (X11); Names of jurisdictions (X51); Uniform titles (X30); Name/title combinations." See: "Resource Description and Classification."

  • [April 20, 2001] "ComicsML: A proposed simple markup language for online comics." By Jason McIntosh. "I like comics. I like the Web. I like comics on the Web! But I could like them a lot more, especially if more of them started realizing they're not on paper, and took advantage of some of the wonderful mechanisms their electronic format and globally linked position makes available to them. So, I propose a simple XML-based markup language which, I believe, could help digital comics assert their value as an online resource, as well as an art form... Like any other XML-based language, the kernel of ComicsML is its DTD, which I've tried to keep relatively simple. Experienced DTD pokers-at may note its similarity to RSS, and there's good reason for that, which I'll also explain. Much of the rest is inspired by John Bosak's play.dtd, which he wrote so he could mark up the complete works of Shakespeare... ComicsML has no concept of layout. It knows that each strip has a bunch of panels in it, but it makes no presumptions about how to present them, beyond the order in which the reader is to see them. If a comic that makes use of ComicsML chooses to add URLs for all its panel images, then an application will not know how to display them without further information from other sources, and will instead use its own methods for displaying the panels one after the other. The first draft of the ComicsML DTD actually did have some elements for performing rudimentary layout modeled after HTML's table elements, but I decided to ditch it, preferring to keep this markup language simple and all about content, not presentation. Which might be the wrong idea entirely, for comics. What do you think?... 
Given enough time and testing, I think ComicsML can eventually turn into something that lives up to its name, becoming a method to let digital comics, no matter who produces them, truly hook in to the vast information potential their presence on the Internet offers, while remaining simple enough for any aspiring comics creator to use. Please contact me if you have any thoughts regarding my scribbling here." [cache XML DTD]
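The comic/strip/panel nesting McIntosh describes can be illustrated with a toy instance; the child elements used here (title, desc) are invented stand-ins, not the actual ComicsML DTD:

```python
# A guessed ComicsML-style instance using the comic/strip/panel nesting the
# article describes; the leaf elements are illustrative, not the real DTD.
import xml.etree.ElementTree as ET

doc = """
<comic>
  <title>Example Strip</title>
  <strip>
    <panel><desc>Hero enters.</desc></panel>
    <panel><desc>Hero trips.</desc></panel>
  </strip>
</comic>
"""

root = ET.fromstring(doc)
# ComicsML carries no layout: panels are meaningful only in document order,
# so a renderer simply walks them in sequence.
panels = [p.find("desc").text for p in root.iter("panel")]
```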

  • [April 20, 2001] "ComicsML: A Simple Markup Language for Comics." By Jason McIntosh. From XML.com. April 18, 2001. See also the previous citation. ['ComicsML came to life as a result of a comics artist and fan starting to work with XML.'] "I like comics. I like the Web. I like comics on the Web. But I could like them a lot more, especially if more of them would realize they're not on paper, taking advantage of their electronic format and globally-linked positions. So I propose a simple XML-based markup language which, I believe, could help digital comics assert their value as online resources and as art forms... Like any other XML language, the kernel of ComicsML is its DTD, which I've tried to keep relatively simple. Experienced DTD users may note its similarity to RSS, and there's good reason for that, which I'll explain below. Much of the rest is inspired by John Bosak's play.dtd, which he wrote so he could mark up the complete works of Shakespeare. ComicsML's atomic element is the panel, which is inspired by the McCloud/Eisner reading of comics as being an art form based around the magic that occurs when images appear in a juxtaposed sequence. Comics artists most often use individual panels as their images, and so that's the word I picked for my core element. These panel elements hold all the information about a comic's words and pictures, and panels are bundled together into elements called strips, which in turn can live in the DTD's root element, comic. The use of multiple levels was also inspired by RSS. The comic element contains various bits of static metadata about the comic in question, including name, a list of its artists, or a clever tagline. Each strip represents a separate instance of that comic, the meat of which is held within panel. Pretty simple... 
In the short time since I first shared a draft of this document with friends and coworkers, I've seen glimmers of wonderful ideas for directions to extend ComicsML that I doubt I would have ever conceived myself, covering everything from cross-referencing to story line management, and lots of folks seem game to try and catch the elusive problem of sanely describing layout. What's more, each of these people seemed to name a different reason why they liked the idea of a comics markup language. Given enough time and testing, I think ComicsML can live up to its name, becoming a method to let digital comics, no matter who produces them, hook into the vast information potential their presence on the Internet offers, while remaining simple enough for any aspiring comics creator to use."

  • [April 20, 2001] "Perl and XML: Perl XML Quickstart: The Perl XML Interfaces." By Kip Hampton. From XML.com. April 18, 2001. ['This first installment of our guide to Perl and XML covers Perl-specific interfaces for reading and writing.'] "A recent flurry of questions to the Perl-XML mailing list points to the need for a document that gives new users a quick, how-to overview of the various Perl XML modules. For the next few months I will be devoting this column solely to that purpose. The XML modules available from CPAN can be divided into three main categories: modules that provide unique interfaces to XML data (usually concerned with translating data between an XML instance and Perl data structures), modules that implement one of the standard XML APIs, and special-purpose modules that seek to simplify the execution of some specific XML-related task. This month we will be looking at the first of these, the Perl-specific XML interfaces. This is not an exercise in comparative performance benchmarking, nor is it my intention to suggest that any one module is inherently more useful than another. Choosing the right XML module for your project depends largely upon the nature of the project and your past experience. Different interfaces lend themselves to different kinds of tasks and to different kinds of people. My only goal is to offer working examples of the various interfaces by defining two simple tasks, and then showing how to achieve the same net result using each of the selected modules..." Note: A search for XML modules on CPAN 2001-04-20 returned "214 modules found in 86 distributions matching 'XML'." See: "XML and Perl."
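Hampton's first category, modules that translate between an XML instance and native data structures (XML::Simple is the classic Perl example), has a rough analogue that fits in a few lines of Python; element_to_dict is my own sketch, not code from the article:

```python
# Rough analogue of the "XML instance <-> native data structure" mapping style
# the column demonstrates with Perl-specific modules. Attributes are ignored
# and repeated child tags would collide; enough for a demonstration.
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Fold an element into nested {tag: text-or-children} dictionaries."""
    children = list(elem)
    if not children:
        return elem.text
    return {child.tag: element_to_dict(child) for child in children}

camel = ET.fromstring(
    "<camelid><species>dromedary</species><humps>1</humps></camelid>"
)
data = element_to_dict(camel)
```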

  • [April 20, 2001] "Practical Internationalization." By Edd Dumbill. From XML.com. April 18, 2001. ['An interview with Tim Bray about the joys and pains of implementing a truly internationalized web application. Internationalized applications are something XML is ideally suited to, yet many shy away from internationalization as 'too difficult.' Tim Bray, however, met the challenge head-on for his site. I talked to Tim and asked him about the joys and pains of internationalizing a web application. Bottom line: he'd do it again, without hesitation.'] Excerpts: "Writing a fully internationalized app is more expensive than ignoring the issues, but not that much. What's really expensive is the task of going back to i18n-izing an existing i18n-oblivious app. It should also be said that using modern technologies like XML and Java makes it harder to ignore, and easier to implement, good internationalization practices... [Is browsing technology up to coping with multilingual web sites?] Better than I'd hoped. Both IE5 and Mozilla seem to have their acts pretty well together. IE will sometimes even realize when it doesn't have the right fonts installed, and it will ask if it can go off to Microsoft and get Japanese or Cyrillic or whatever; we haven't figured out exactly which combination of HTTP headers makes this work yet. Once the fonts are installed, it doesn't seem to care whether you send the stuff in Unicode or a native encoding like JIS or KOI8. If you're not using Unicode, and just reading ordinary HTML pages, the browsers still guess wrong quite a bit, and you have to tell them what to do. If you send UTF-8 along with an HTTP header saying so, I've never seen a modern browser get it wrong. In terms of the actual rendering and display, even of graphically-challenging or 'difficult' languages such as Arabic and Thai, the browsers produce results that look pretty good to my semi-educated eye... [What more could be done to improve awareness of internationalization?] 
It's not an XML problem, it's a problem of American psychology. Too many smart, good people who are Americans have trouble seeing past their borders and realizing that we Anglophones are a minority in the big picture. I'd say that the population of people using and deploying XML is probably way more i18n-savvy than the average in the computing professions." See "W3C Internationalization (I18N) / Localization (L10N): World-Wide Character Sets, Languages, and Writing Systems."
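Bray's point about sending UTF-8 "along with an HTTP header saying so" can be shown in miniature; a hedged sketch, where the header string is the conventional HTTP form rather than anything quoted from the interview:

```python
# Sketch: serve multilingual content as UTF-8 and declare the charset in the
# HTTP response header, which the interview reports modern browsers honor
# reliably (unlike undeclared native encodings, which force guessing).
text = "Grüße, こんにちは, Привет"
body = text.encode("utf-8")
headers = {"Content-Type": "text/html; charset=utf-8"}

# Round-trip check: the bytes decode losslessly under the declared charset.
assert body.decode("utf-8") == text
```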

  • [April 20, 2001] "XML-Deviant: Intuition and Binary XML." By Leigh Dodds. From XML.com. April 18, 2001. ['Binary encodings for XML are a well-worn topic on XML-DEV, yet last week's revisiting of the debate introduced some interesting new evidence.'] "In short, the consensus is that a binary XML will at best equal the advantages of XML as it is today. Greater rewards will be found from pursuing the application, and not the re-engineering, of XML."

  • [April 19, 2001] "Introduction to the W3C Grammar Format." By Andrew Hunt. In VoiceXML Review Volume 1, Issue 4 (April 2001). ['The W3C Voice Browser Working Group has released a draft specification for a standard grammar format that promises to enhance the interoperability of VoiceXML Browsers and drive portability of VoiceXML applications. This article summarizes the key features of the specification and its application to VoiceXML application development.'] "The W3C Speech Recognition Grammar Format specification embodies two equivalent languages. The XML Form of the W3C Speech Recognition Grammar Format represents a grammar as an XML document, with the logical structure of the grammar captured by XML elements. This format is ideal for computer-to-computer communication of grammars because widely available XML technology (parsers, XSLT, etc.) can be used to produce and accept the grammar format. In the Augmented BNF (ABNF) Form of the W3C Speech Recognition Grammar Format, the logical structure of the grammar is captured by a combination of traditional BNF (Backus-Naur Form) and a regular expression language. This format is familiar to many current speech application developers, is similar to the proprietary grammar formats of most current speech recognizers, and is a more compact representation than XML. However, a special parser is required to accept this format. Grammars written in either format can be converted to the other format without loss of information (except formatting). The two formats co-exist because the Working Group found it important to support both a computer-to-computer communication format and a more familiar human-readable format (but, as with all decisions reached by a committee, there is a spectrum of opinion on these matters)... The new W3C Speech Recognition Grammar Format is a powerful language for developing both simple grammars and natural language grammars for use in VoiceXML applications. 
The availability of a standard grammar format will increase the interoperability of VoiceXML applications by allowing each grammar to be authored once and reused across many VoiceXML browsers." See "VoiceXML Forum."
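The interoperability argument for the XML form, namely that off-the-shelf XML tooling can produce and consume a grammar, can be sketched as follows. The element vocabulary here (grammar, rule, one-of, item) follows my reading of the draft format and should be treated as illustrative rather than authoritative:

```python
# A toy XML-form speech grammar: one rule offering two spoken alternatives.
# Any standard XML parser reads it, which is the interoperability point the
# article makes; element names are an assumption based on the draft.
import xml.etree.ElementTree as ET

grammar = ET.fromstring("""
<grammar>
  <rule id="drink">
    <one-of>
      <item>coffee</item>
      <item>tea</item>
    </one-of>
  </rule>
</grammar>
""")

# Extract the alternatives a recognizer would accept for this rule.
alternatives = [item.text for item in grammar.iter("item")]
```

The equivalent ABNF form would be roughly `$drink = coffee | tea;`, which is more compact but needs a dedicated parser, exactly the trade-off the article describes.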

  • [April 19, 2001] "The Speech Synthesis Markup Language for the W3C VoiceXML Standard." By Mark R. Walker and Andrew Hunt. In VoiceXML Review Volume 1, Issue 4 (April 2001). ['Among the first in a series of the W3C's soon-to-be-released XML-based markup specifications is the speech synthesis text markup standard. This article summarizes the markup element design philosophy and includes descriptions of each of the speech synthesis markup elements.'] "A new set of XML-based markup standards developed for the purpose of enabling voice browsing of the Internet will begin emerging in 2001 from the Voice Browser Working Group, which was recently organized under the auspices of the W3C. Among the first in this series of soon-to-be-released specifications is the speech synthesis text markup standard. The Speech Synthesis Markup Language (SSML) Specification is largely based on the Java Speech Markup Language (JSML), but also incorporates elements and concepts from SABLE, a previously published text markup standard, and from VoiceXML, which itself is based on JSML and SABLE. SSML also includes new elements designed to optimize the capabilities of contemporary speech synthesis engines in the task of converting text into speech. This article summarizes the markup element design philosophy and includes descriptions of each of the speech synthesis markup elements. The Voice Browser Working Group has utilized the open processes of the W3C for the purpose of developing standards that enable access to the web using spoken interaction. The nearly completed SSML specification is part of a new set of markup specifications for voice browsers, and is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in web and other applications. 
The essential role of the markup language is to give authors of synthesizable content a standard way to control aspects of speech output such as pronunciation, volume, pitch and rate across different synthesis-capable platforms. It is anticipated that SSML will enable a large number of new applications simply because XML documents would be able to simultaneously support viewable and audio output forms. Email messages would potentially contain SSML elements automatically inserted by synthesis-enabled, mail editing tools that render the messages into speech when no text display was present. Web sites designed for sight-impaired users would likely acquire a standard form, and would be accessible with a potentially larger variety of Internet access devices. Finally, SSML has been designed to integrate with the Voice Dialogue markup standard in the creation of text-based dialogue prompts. The greatest impact of SSML may be the way it spurs the development of new generations of synthesis-knowledgeable tools for assisting synthesis text authors. It is anticipated that authors of synthesizable documents will initially possess differing amounts of expertise. The effect of such differences may diminish as high-level tools for generating SSML content eventually appear. Some authors with little expertise may rely on choices made by the SSML processor at render time. Authors possessing higher levels of expertise will make considerable effort to mark as many details of the document to ensure consistent speech quality across platforms and to more precisely specify output qualities. Other document authors, those who demand the highest possible control over the rendered speech, may utilize synthesis-knowledgeable tools to produce 'low-level' synthesis markup sequences composed of phoneme, pitch and timing information for segments of documents or for entire documents..." See "VoiceXML Forum."
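A toy SSML-style fragment exercising the control points the article names (pitch, rate, volume); note that the prosody element and its attribute values follow later published SSML drafts and may differ in detail from the 2001 working draft:

```python
# An SSML-style fragment: the prosody element carries the per-span speech
# controls (rate, pitch, volume) the article describes; surrounding text is
# rendered with the processor's defaults.
import xml.etree.ElementTree as ET

ssml = ET.fromstring("""
<speak>
  <prosody rate="slow" pitch="low" volume="soft">Good evening.</prosody>
  Regular text resumes here.
</speak>
""")

prosody = ssml.find("prosody")
settings = (prosody.get("rate"), prosody.get("pitch"), prosody.get("volume"))
```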

  • [April 19, 2001] "Modeling XHTML with UML." By Dave Carlson (Ontogenics Corporation). "This white paper describes the first complete XML Schema for XHTML Basic, which was adopted as a W3C Recommendation in December 2000. The W3C Recommendation specifies XHTML Basic with a DTD implementation, principally because DTDs were the only recommendation in force at that time. However, we will soon reach a point when the W3C has two schema recommendations, and there are several other XML schema/validation languages that are competing for our attention (RELAX, TREX, and Schematron). Thus, a new approach was taken to produce the XML Schema described here: the XHTML Basic specification was manually reverse-engineered into a Unified Modeling Language (UML) class diagram, then the Schema was automatically generated from that UML model. The Schema generation tool was developed by Ontogenics Corp. (the creator of this portal). Other schema languages can be produced in a similar manner; prototypes are under development for generation of DTD and RELAX." Includes XML Schemas generated from the UML model of XHTML Basic: a) without complexType inheritance: XHTML.xsd, and b) with complexType inheritance: XHTML-inheritance.xsd. References: (1) David Carlson's 2001 book Modeling XML Applications with UML. Practical e-Business Applications; (2) web site; (3) "Conceptual Modeling and Markup Languages." [cache]
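The UML-to-schema direction the paper describes can be caricatured in a few lines: emit a minimal complexType for one "class" with typed fields. This is a sketch of the idea only; a real generator such as the Ontogenics tool must also handle inheritance, model groups, attributes, and the rest of the mapping:

```python
# Toy UML-class -> XML Schema generator: one class becomes one complexType
# whose attributes become a sequence of typed elements. Names and types in
# the example call are invented for illustration.
def class_to_complex_type(name, fields):
    """fields: list of (element_name, xsd_type) pairs."""
    lines = [f'<xs:complexType name="{name}">', "  <xs:sequence>"]
    for fname, ftype in fields:
        lines.append(f'    <xs:element name="{fname}" type="{ftype}"/>')
    lines += ["  </xs:sequence>", "</xs:complexType>"]
    return "\n".join(lines)

xsd = class_to_complex_type("Paragraph", [("content", "xs:string")])
```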

  • [April 19, 2001] "Transitive Closure for XPath." By Christian Nentwich (Department of Computer Science, University College London). "During the last couple of months, while working on xlinkit and my thesis, I have had to deal with several cases where I had to compute the transitive closure of elements in XML files inside an XPath expression. Unfortunately, XPath has no proper transitive closure operator. Let me give you an example. The XMI standard specifies a DTD for encoding UML models. My thesis is to do with consistency checking, and one of the rules in the UML is that no class can be a superclass and a subclass of another at the same time. In order to specify this rule, I have to compute the set of parent classes and the set of descendant classes for each class - transitively. It helps to have a look at some XMI fragments to understand the complexity of the problem, I've only left in the relevant bits... I propose that a new function be added to XPath, closure(node-set,node-set). This function will first evaluate its first parameter to form a base set. It will then repeatedly evaluate the second parameter, which has to be a relative expression, over the base set, to form a new base set..." [XML-DEV note: "Something I really lose sleep over is the lack of a transitive closure function in XPath. I use XPath standalone, outside XSLT and I need such an operator (to compute the set of all parent classes of a class in XMI, for example). Would you please comment on [this] little proposal I have written? It includes an implementation of the operator for Xalan as a freebie. Just imagine what this operator could do for your family tree XSLT stylesheet. Am I the only one who finds this (extremely) useful? Please comment."]
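Until XPath grows a closure() operator, the computation Nentwich wants can be done on the host-language side. A sketch over an invented, much-simplified stand-in for the XMI encoding (real XMI superclass links are considerably more verbose):

```python
# Transitive closure of a direct-superclass relation pulled from an XML
# document, plus the UML consistency rule from the article: no class may
# appear among its own ancestors. Element/attribute names are invented.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<model>
  <class name="A"/>
  <class name="B" superclass="A"/>
  <class name="C" superclass="B"/>
</model>
""")

# Direct edges: class name -> its immediate superclass (or None).
parent = {c.get("name"): c.get("superclass") for c in doc.iter("class")}

def ancestors(name):
    """All superclasses of `name`, transitively; cycle-safe via `seen`."""
    seen = set()
    current = parent.get(name)
    while current is not None and current not in seen:
        seen.add(current)
        current = parent.get(current)
    return seen

def violates_cycle_rule(name):
    """True if the class is (transitively) its own superclass."""
    return name in ancestors(name)
```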

  • [April 19, 2001] "ContentGuard: An early leader in digital rights management. Company to Watch." By Elizabeth Gardner. In InternetWorld (April 15, 2001), page 16. ['With a portfolio of digital rights patents from Xerox PARC, ContentGuard may be the best-positioned player in the emerging DRM field.'] "... There may be no company better situated to take advantage of this [DRM - Digital Rights Management] market than ContentGuard, which spun off from Xerox a year ago and owns a portfolio of DRM patents developed at the renowned Palo Alto Research Center (PARC). The company unveiled its first suite of tools and services, called RightsEdge, earlier this year. A minority investment from Microsoft means ContentGuard products are likely to become front-runners as DRM gets built into various Microsoft products. There's nothing like being affiliated with the software industry's most powerful setter of de facto standards. But ContentGuard would prefer an open standard to a de facto one. To that end it has developed the Extensible Rights Markup Language (XrML), a subset of XML used to protect assets in combination with a compliant vendor's products. ContentGuard CEO Michael Miron says the company has issued 1,800 XrML licenses so far; the company doesn't charge for the license, but requires that licensees adhere to the terms of an agreement on how the specification may be implemented and modified. Backers include Adobe, Hewlett-Packard, AOL Time Warner, and Bertelsmann. The make-or-break year for DRM and XrML may well not come until 2003..." References: (1) "Extensible Rights Markup Language (XrML)"; (2) "Digital Property Rights Language (DPRL)"; (3) W3C Workshop on Digital Rights Management.

  • [April 18, 2001] "Microsoft SOAP Toolkit Version 1.0 to 2.0 Migration Tutorial." By Jay Zhang and Matt Powell. Microsoft Developer Network, April 2001. ['This article describes how to migrate a simple application from version 1.0 to version 2.0 of the Microsoft SOAP Toolkit using high-level interfaces.'] "This article is intended to help developers and software architects obtain a better understanding of the differences between the SOAP Toolkit 1.0 and 2.0 object models and infrastructure. We'll discuss how to migrate simple applications from SOAP Toolkit 1.0 to SOAP Toolkit 2.0 using high-level interfaces, and then go on to demonstrate a migration implementation using the ROPEDEMO sample application from version 1.0 of the SOAP Toolkit... Both versions 1.0 and 2.0 of the Microsoft SOAP Toolkit provide an infrastructure for performing remote procedure calls (RPCs) using SOAP over HTTP. As shown in Figure 1, the Proxy object in 1.0 and SoapClient and SoapServer objects in 2.0 are considered high-level interfaces; the SOAPPackager and WireTransfer objects in 1.0 and SoapSerializer, SOAPReader, and SoapConnector objects in 2.0 are considered low-level interfaces. As a superset of SOAP Toolkit 1.0, version 2.0 also provides SOAP Messaging Objects (SMO) for XML document messages (see Figure 1). The version 2.0 SOAP SMO framework is a major extension as compared to version 1.0 of the SOAP Toolkit, allowing business XML documents to be added to incoming and outgoing SOAP messages. The SMO framework works with the lower level 2.0 interfaces. For migration purposes, our discussion in this article will focus primarily on SOAP RPC and the version 1.0 functionality of passing parameters as XML, using the high-level interfaces... As a superset of SOAP Toolkit 1.0, SOAP Toolkit 2.0 definitely provides a much-improved infrastructure with better performance. 
Simple applications using the high-level SOAP Toolkit 1.0 interfaces can be easily migrated to the SOAP Toolkit 2.0 high-level interfaces provided by the MSSOAP.SoapClient and MSSOAP.SoapServer objects."
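For orientation, the wire format underneath both toolkit versions' high-level interfaces is an ordinary SOAP 1.1 envelope; the method name and namespace below are invented for illustration:

```python
# Hand-rolled SOAP 1.1 RPC request of the general shape both toolkit versions
# put on the wire; the method (AddNumbers) and its namespace are made up.
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
envelope = (
    f'<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">'
    "<SOAP-ENV:Body>"
    '<m:AddNumbers xmlns:m="urn:example-demo">'
    "<NumberOne>2</NumberOne><NumberTwo>3</NumberTwo>"
    "</m:AddNumbers>"
    "</SOAP-ENV:Body>"
    "</SOAP-ENV:Envelope>"
)

# A receiver locates the Body, then the method element inside it.
root = ET.fromstring(envelope)
body = root.find(f"{{{SOAP_ENV}}}Body")
```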

  • [April 17, 2001] "Electronic Business XML (ebXML) Requirements Specification Version 1.04." By ebXML Requirements Team. Project team lead: Mike Rawlins. "March 19, 2001." Draft Specification for Review. End of Review Period: 30 April 2001. "This ebXML Requirements Specification represents the work of the ebXML Requirements Project Team. It defines ebXML and the ebXML effort, articulates business requirements for ebXML, and defines specific requirements that shall be addressed by the various ebXML project teams in preparing their deliverables. The scope of the ebXML business requirements is to meet the needs for the business side of both business to business (B2B) and business to consumer (B2C) activities. Consumer requirements of the B2C model are beyond the scope of the ebXML technical specifications. Application-to-application (A2A) exchanges within an enterprise may also be able to use the ebXML technical specifications, however ebXML A2A solutions shall not be developed at the expense of simplified B2B and B2C solutions. The business requirements to be addressed by the ebXML initiative are divided into nine core areas - General Business, Electronic Business, Globalization, Openness, Usability/Interoperability, Security, Legal, Digital Signature, and Organizational..." See "Electronic Business XML Initiative (ebXML)." [cache]

  • [April 17, 2001] "Group Maps RosettaNet to Supply-Chain Process." By Marc L. Songini. In InfoWorld (April 17, 2001). "Intel and Siemens are spearheading a fledgling initiative to marry XML to a complex set of supply-chain business procedures to streamline e-commerce transactions. The companies are taking RosettaNet, the electronics industry's XML-based language, and aligning it to the Supply Chain Operations Reference (SCOR) model, aiming to create reusable, intricate procedures based on standard supply-chain practices. They are currently working on a pilot project that will serve as proof-of-concept for the initiative, though no specific time frame has been established. SCOR is the industry standard set of procedures defined by the Supply-Chain Council in Pittsburgh, which has 800 members, many of them large manufacturers, including Intel and Siemens. It offers best practices procedures for a wide variety of supply-chain activities, including the planning, sourcing and delivery of goods, spanning from the supplier to the manufacturer to the end customer. The council's board hopes the RosettaNet-to-SCOR initiative, if successful, will serve as a 'frame of reference' that other industry groups, such as those serving the chemical or automotive industries, can use in the future, said Scott Stephens, the Supply-Chain Council's chief technology officer. It's unclear how many council members have implemented SCOR procedures, but at least 100 have documented installations, said Stephens. The initiative relies on tying specific SCOR procedures to RosettaNet Partner Interface Processes, which handle multiple data transactions among partners. Advocates claim that this will let RosettaNet handle new, sophisticated supply-chain processes and will result in new collaboration capabilities. 
For instance, SCOR and RosettaNet could be aligned for things such as handling purchase orders or scheduling product deliveries, and they could take into account things as varied as different business methods and network protocol requirements. The initiative targets RosettaNet because of its importance to Intel and Siemens, but the companies intend to broaden its purview later, said George Brown, a council board member and senior staff architect for worldwide IT at Intel in Chandler, Ariz. 'The approach is independent. . . . We plan to map to [other XML standards] and are looking for a close alignment with ebXML,' a proposed specification for an electronic-business framework, he said..." References: (1) Supply Chain Council; (2) SCOR FAQ document; (3) "RosettaNet."

  • [April 17, 2001] "Software Tool Pulls Data into Corporate Portals." By Cathleen Moore. In InfoWorld (April 16, 2001). "Attempting to simplify the process of pumping content into corporate portals, software vendor OnePage will roll out a content aggregation tool this week, designed to convert Web-accessible data into components that can slide into portals. The software, dubbed Content Connect, extracts data from document-based systems and transforms it into an XML feed that can be reformatted for any portal interface. The tool uses the company's Content Collection Language and Feature Extraction technology to identify and automatically pull content from legacy systems, partner sites, subscription services, as well as CRM (customer relationship management) and ERP (enterprise resource planning) applications, according to OnePage officials. According to one analyst, because enterprises are struggling to tame an onslaught of information, finding an effective means of collecting content from internal and third-party sources is extremely useful. 'The content-extraction issue is a lot more important now as data proliferates both within the enterprise and outside. It becomes a lot more critical to isolate those elements that are germane to a particular user's needs,' said Ed Maguire, industry analyst for e-business analytics at Merrill Lynch Global Securities Research and Economics, in New York... OnePage is offering its portal-building software as a stand-alone offering for enterprises with large portal installations as well as a possible integrated option in enterprise portal products from vendors such as Plumtree, Sybase, BroadVision, IBM, and Oracle. The technology's Java and XML-based approach allows easy integration with other portal platforms, OnePage officials said. Tapping XML as the means of portalizing data will ensure a broad reach, Maguire said. 'XML is the lingua franca of content interchange. 
Any solution that doesn't support XML for content metatagging will become obsolete very quickly,' he said. In a similar vein, enterprise portal vendor DataChannel plans to unveil an extension kit next month for integrating enterprise application data into the company's DataChannel Server XML-based portal. The DataChannel Server extension kit is designed to access data locked in enterprise applications, such as ERP and CRM, and legacy data stores. DataChannel's portal will leverage the EAI (enterprise application integration) connector to deepen integration with applications and systems, using it as a backbone to access rules for data extraction and management, said Eric Varness, director of product marketing at DataChannel, in Bellevue, WA..."

  • [April 17, 2001] "XML Aids Content Publishing." By Roberta Holland. In eWEEK (April 15, 2001). "As the number of new computing devices and new data formats grows, so, too, does demand for ways to create content once and repurpose it for multiple platforms without duplicating the work for each medium... Adobe CEO Bruce Chizen said: 'A key technology to enable that goal will be XML (Extensible Markup Language). Adobe will tag all its products with XML metadata to help its partners more easily repurpose the content... Officials with Arbortext Inc., of Ann Arbor, Mich., said the company already is providing an XML solution like the one envisioned by Adobe... Efforts are also under way to help publishers search and find content from repositories and ensure that the rights to the content are protected. A group called Publishing Requirements for Industry Standard Metadata (PRISM) released the first version of a metadata specification last week. The specification provides a vocabulary for print and online publishing, including descriptions of the data and rights and permissions associated with the data. Also working in the digital rights management space is ContentGuard Inc., of Bethesda, Md. ContentGuard, a spinoff from Xerox Corp., created Extensible rights Markup Language to express rights, terms and conditions attached to content. The company soon will release the language to an open standards group it is forming."

  • [April 16, 2001] "Enabling Open, Interoperable, and Smart Web Services: The Need for Shared Context." By Anne Thomas Manes (Sun Microsystems). Paper presented at the W3C Workshop on Web Services (11-12 April 2001, San Jose, CA, USA); see the complete list of 64 position papers. "The Internet has had a profound impact on user expectations. Not too long ago, a computer user was a highly trained individual. Businesses trained their users to be experts in one or two specific applications. A Web user, though, is an entirely different breed. A business doesn't have the luxury of training its Web users. Web applications have to be intuitive. Or perhaps it's more appropriate to say that Web applications have to be invisible. A Web user simply uses the Web--to read email, find directions, pay bills, approve an expense request, and more. The user doesn't view these activities as executing applications. The user is simply using services that are available on the Web. The user wants to be able to access these services from a wide variety of client devices, such as desktop systems, PDAs, mobile phones, and in-car computers. Furthermore, the user wants these services to understand and act differently according to the context of the situation. Who am I? Where am I? What time is it? Am I acting as an employee or an individual? These new user expectations are causing businesses to change the way they build application systems. Rather than building large, monolithic application systems or desktop-oriented client/server applications, businesses are starting to build applications using a service-oriented application design. Application software is being broken down into its constituent parts--into smaller, more modular application components or services. These application services make use of infrastructure software that has also been decomposed into discrete system services. 
All of these discrete services can be deployed across any number of physical machines that are connected to the Internet. This modular service approach gives businesses great flexibility in system design. By reassembling a few services into a new configuration, a business can create a new business service."

  • [April 16, 2001] "Web Services Framework." By Andrew Layman (Microsoft). Paper presented at the W3C Workshop on Web Services (11-12 April 2001, San Jose, CA, USA); see the complete list of 64 position papers. "While most descriptions of Web based solutions emphasize their distributed characteristics, their decentralized nature -- they have distinct management and control environments and communicate across trust domains -- has much more impact on the architecture of this framework and the requirements of the underlying protocols. So, we focus our framework first on supporting application-to-application integration between enterprises having disjoint management, infrastructure and trust domains. The focus of this document and the framework it defines is a model for describing, discovering and exchanging information that is independent of application implementations and the platforms on which applications are developed and deployed. We note that other organizations such as the IETF and ebXML are tackling a related set of problems, and we are pleased there are already formal liaisons between the W3C XML Protocol Working Group and its counterparts in both ebXML and IETF. We hope and expect that the realization of this framework will embody the best ideas from all of these sources, and yield a common framework and supporting components. This integration and convergence is essential for realizing the benefits of cross-enterprise, global application integration. Why Have a Framework? A common framework identifies specific functions that need to be addressed in order to achieve decentralized interoperability. It does not determine the particular technologies used to fulfill the functions but rather divides the problem space into sub-problems with specified relationships. This functional decomposition allows differing solutions to sub-problems without overlaps, conflicts or omitted functionality. 
This is not to say that all applications must offer the same facilities, rather that when a feature is offered it should fit into a common framework and preferably have a standard expression."

  • [April 16, 2001] "Web Services Architecture: Direction and Position Paper." By Donald F. Ferguson (IBM Corporation). Paper presented at the W3C Workshop on Web Services (11-12 April 2001, San Jose, CA, USA); see the complete list of 64 position papers. "... what's new about web services? This is an important question because the answer defines the requirements, and thus the solution. We think that the new concepts are: (1) More traditional distributed programming models focus on locating instances of 'services' by name. Some examples are symbolic queue names in message middleware and JNDI lookup in distributed Java. The web service model introduces the concepts of By-what and By-how publication and discovery of instances. We can view this like the 'crawler' and 'search engine' approaches to finding interesting web sites. (2) More traditional approaches to programming rely on pre-defined interfaces. The code that uses the service understands the message/command formats of the target service. Web service models will increasingly rely on brokers that convert from requested interfaces to published interfaces. In fact, initial 'service brokers' are a natural evolution of existing approaches to message brokering. For example, if there are two or three common standards for services and message formats in a specific industry, a broker allows an enterprise to implement and use one vocabulary and still interoperate with companies using other dialects. General solutions to interface and service brokering appear intractable, and are not well understood. This is an area that will undergo much research and pilot development over the next few web years. (3) Since web services are an evolution of many existing models, they must support a broad set of underlying implementation technology. The same conceptual model will cover both simple XML posting of forms and robust, reliable message-oriented middleware. 
(4) Most previous approaches to application integration are systematic. Web services focus more on ad hoc, shorter-term partnerships and collaborations. Moreover, the interfaces between the partners may be newly defined, specifically for the duration of the collaboration. In the remainder of this paper, we discuss the functions and technology that should support web services."
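The 'service broker' idea in the IBM paper above, converting from a requested interface to a published interface so each partner keeps its own vocabulary, can be illustrated with a toy sketch. The vocabularies and field names below are invented for illustration; real brokers also perform structural and semantic conversion, not just renaming.

```python
# Toy sketch of interface brokering between two message vocabularies.
# "Dialect A" and "dialect B" field names are invented for illustration.

# Mapping from dialect-A field names to dialect-B field names.
FIELD_MAP = {
    "PartNo": "item_id",
    "Qty": "quantity",
    "ShipTo": "delivery_address",
}

def broker(message: dict) -> dict:
    """Convert a dialect-A message into dialect B, passing through
    any fields that have no mapping."""
    return {FIELD_MAP.get(k, k): v for k, v in message.items()}

order_a = {"PartNo": "X-100", "Qty": 4, "ShipTo": "Seattle"}
print(broker(order_a))
# {'item_id': 'X-100', 'quantity': 4, 'delivery_address': 'Seattle'}
```

A production broker would sit between transport endpoints and would typically be driven by declarative mapping definitions rather than a hard-coded table, which is what makes general solutions to the problem hard.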

  • [April 16, 2001] "Web Services: A position paper for the W3C." By Gerald W. Edgar (Boeing Commercial Airplanes Group). February 22, 2001. Paper presented at the W3C Workshop on Web Services (11-12 April 2001, San Jose, CA, USA); see the complete list of 64 position papers. "This position paper proposes directions for the development of Web services to provide value for application development and deployment. Web services have the potential of enabling a platform-independent integration architecture if the right tools and infrastructure are in place. To this writer, the key elements of Web services consist of the following: search and discovery mechanisms, methods of secure transport, and verifiable means to prevent data interception and corruption. Each of these elements is currently being worked on by different groups and organizations. For Web services to deliver on their potential, each element must work in concert, and the whole infrastructure must be configured to support development and deployment. Web services development needs to encompass discovering and using information object definitions, using platform- and implementation-specific capabilities to access the information objects along with packaging for using the objects. Using a broadly available infrastructure, Web services enable applications to create requests for service, a transport mechanism to send the request, and then the capability to receive results. These parts all work in concert to enable the creation and deployment of new applications in a stable (always working) but dynamic (always changing) environment. Each element in the infrastructure provides support and a connection to other elements, enabling development. When an application is deployed, the elements also need to support its operation. All these parts need to work together to maximize their value. Each element of this has value, and that value is increased as the parts work together better."

  • [April 16, 2001] "Industry Closer To XML-Based Web Services Standards, IBM Says. W3C Could Tackle Security Issue." By Elizabeth Montalbano. In (April 13, 2001). "In a two-day meeting this week, about 70 representatives from leading industry players moved closer to a consensus on XML-based standards for Web services, an IBM executive says. Such a consensus would make it easier for solution providers to build Web services. The meeting here Wednesday and Thursday included IBM, Sun Microsystems, Microsoft, Oracle, Hewlett-Packard, Cisco Systems, Nokia, WebMethods, BEA Systems and Bowstreet. The meeting was sponsored by IBM and the W3C, the international technology standards consortium. Reaching consensus on Web services standards could mean the W3C will take over some of the work being done by industry leaders and other consortia, says Bob Sutor, director of e-business standards strategy for IBM... Web services are dynamic applications that allow computer systems to communicate and perform functions over the Internet without human interaction. Microsoft, Oracle, Sun, IBM, Bowstreet and BEA have unveiled strategies to give solution providers the tools and frameworks to build Web services for their clients. Industry experts say the biggest problem facing the development of Web services is how to effectively and securely facilitate communication between systems with different architectures. Industry watchers identify standards-based security as widely uncharted territory. Simple Object Access Protocol (SOAP) defines how to send standard XML-based messages. The Universal Description, Discovery and Integration (UDDI) initiative tackles how to register services and discover partners. And Web Services Description Language (WSDL) solves how to describe services. But security remains a big issue..."

  • [April 16, 2001] "B2B Transactions. Will Web Services Do The Trick?" By John Webster. In InternetWeek (April 10, 2001). ['Business partners need a better way to share back-end applications data. Standards-based services promise fast deployment and lower costs.'] "The ultimate goal of Web services, many vendors say, is to let Internet applications interact with each other the same way humans interact with them. Exactly how that will happen remains unclear, but Web services proponents say just as the Internet is navigated by humans using Web browsers, applications will be able to navigate the Web and interact using emerging Web service standards. Web services promise to offer a standard API that will let any two companies conduct e-business transactions, regardless of their IT infrastructure. The technology lets one company's inventory management application interact with a trading partner's product shipping application, even if the applications are running on different platforms. And emerging standards will help companies locate potential business partners based on the services the partners offer, and ensure that their respective apps can work together. Web services use XML to let applications talk to one another. They primarily use the Simple Object Access Protocol (SOAP), an Internet messaging specification that can be "wrapped around" standard business apps so they can work together. Another important Web services standard is the Web Service Description Language (WSDL), which provides businesses with a standardized means for describing how apps can interoperate online. Web services vendors have also joined forces to develop an XML directory for companies doing business on the Web, called the Universal Description, Discovery and Integration (UDDI) specification. 
Yet another standard gathering strength is e-business XML (ebXML), which defines core components, business processes, registry and repository, messaging services, trading partner agreements and security. By using these standards, apps can navigate, discover, identify and bind with other apps on the Internet and establish one-to-one relationships, says Scott Hebner, director of WebSphere marketing at IBM. "Today, programmers have to do that manually," he says... The Web services standards and products are so new that not many companies are deploying them yet. One company that's moving forward, though, is CSE Insurance Group, a general insurance company based in Walnut Creek, Calif. Last year, CSE wrote $110 million in auto, property, commercial and life insurance. To reduce paper and mailing costs and speed up forms processing by eliminating time-consuming fax exchanges, CSE built an insurance premium rating application that uses SOAP to link the rating app to a forms app that can be accessed by CSE agents and underwriters via a Web interface. The rating application's engine is a set of algorithms and tables from which insurance premium rates are set based on state-approved variables, such as coverage, territory and risk. CSE used the Visual Basic tools within Microsoft's VisualStudio.Net to generate a SOAP interface to wrap around the app. The company then used Avinon's NetScenarios development tool to link the rating app -- wrapped in the SOAP envelope -- to the forms application... An important benefit of using Web services is that the SOAP interface will let the company make the rating engine available over the Internet to agents or other carriers as a syndicated service, says systems analyst Jimm Pierson. He says because SOAP lets companies exchange information in a decentralized, distributed environment, the server that hosts the Web app--data-entry forms in this case--can be located anywhere and still plug into CSE's rating engine. 
SOAP would let insurance carriers that have already built the data-entry screens collect information from insurance agents or consumers and exchange the information with CSE's rating engine over the Internet... CSE now retrieves DMV data using batch-oriented electronic data interchange (EDI), which can take three business days to process. A few states, including Arizona, have the technology in place to post DMV information on the Internet. As more states do this, CSE will be able to retrieve DMV information in seconds. Because the application functionality is enclosed in a SOAP envelope, it is platform- and language-independent, Pierson says. COBOL apps running on the company's AS/400, for example, can be delivered to a trading partner's Oracle database on a Sun server via a SOAP interface."
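The SOAP 'wrapping' this article describes can be sketched in a few lines: an application payload is enclosed in a standard envelope that any party can parse regardless of platform. The RateRequest payload and its fields below are invented stand-ins for a rating interface like CSE's; only the SOAP 1.1 envelope namespace is the real one.

```python
# Minimal sketch of wrapping an application request in a SOAP 1.1
# envelope. The RateRequest element and its fields are invented for
# illustration; the envelope namespace is the real SOAP 1.1 one.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def make_rate_request(state: str, coverage: str) -> bytes:
    """Build a SOAP envelope containing a (hypothetical) rating request."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    req = ET.SubElement(body, "RateRequest")
    ET.SubElement(req, "State").text = state
    ET.SubElement(req, "Coverage").text = coverage
    return ET.tostring(env)

def read_rate_request(data: bytes) -> dict:
    """Unwrap the envelope and return the payload fields."""
    env = ET.fromstring(data)
    req = env.find(f"{{{SOAP_NS}}}Body/RateRequest")
    return {child.tag: child.text for child in req}

msg = make_rate_request("AZ", "auto")
print(read_rate_request(msg))  # {'State': 'AZ', 'Coverage': 'auto'}
```

Because both sides agree only on the envelope and payload markup, the sender could be a COBOL system and the receiver a Java one, which is the platform independence the article credits to SOAP.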

  • [April 16, 2001] "The Design of the DocBook XSL Stylesheets." By Norman Walsh (XML Standards Engineer, Sun Microsystems, Technology Development Center). Paper presented at XSLT-UK 01 (08 Apr - 09 Apr 2001, Keble College, Oxford, England). 08-April-2001. Version 1.0. "Building stylesheets for a large, rich XML vocabulary is a challenging exercise. This paper explores some of the design issues confronted by the author in designing XSL stylesheets for DocBook, an XML DTD maintained by the DocBook Technical Committee of OASIS. DocBook is particularly well suited to books and papers about computer hardware and software (though it is by no means limited to these applications). DocBook consists of nearly 400 tags. The HTML and Formatting Object stylesheets each consist of roughly 1000 templates spread over about 30 files. The design for the DocBook XSL Stylesheets attempts to meet the following goals: (1) Full support for all of DocBook. (2) Full support for both HTML and XSL Formatting Object presentations. (3) Utility for a wide range of users, with varying levels of technical skill. (4) Support across diverse hardware and software platforms. (5) Provide a framework on top of which additional stylesheets can be written for schemas derived from DocBook. (6) Support for internationalization. (7) Support for a wide range of projects (books, articles, online- and print-centric presentations, etc.) Although not all of these goals have been completely achieved, progress has been made on all of them. Five techniques stand out as important factors in achieving these goals: modularity, parameterization, self-customizing stylesheets, 'literate' programming, and extensions. The rest of this paper will discuss these techniques in detail... [Conclusion:] XSL Transformations and Formatting Objects are a rich platform on which to build stylesheets for large, sophisticated XML vocabularies. 
Designing stylesheets that will be adaptable and maintainable is an interesting software engineering challenge. In this paper we've examined five factors that contribute to the successful design of XSL stylesheets: modularity, parameterization, stylesheet generation, documentation, and XSLT extensions." References: (1) "DocBook XML DTD"; (2) "SGML/XML and Literate Programming"; (3) "Extensible Stylesheet Language (XSL/XSLT)."

  • [April 16, 2001] "XML Specifications Dependencies Chart." By Ian Graham. April 16, 2001. "As part of a lecture, I prepared a slide showing the dependencies of the various W3C XML specs (plus SAX), and I thought this might be of interest to some on this list. A screen capture of the PPT slide is [available]. The arrows indicate which specs a given spec depends on (dependencies being accumulative). Orange corresponds to a 'recommendation', and blue to a working draft or a 'proposed recommendation'. There are no specification versions .... I hope to eventually turn this into an animated slide ... could have the various specs pop into appearance according to the historical timelines. I suspect that the growth is at least exponential..." [from XML-DEV posting]
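The accumulative dependencies Graham describes can be computed mechanically once the direct dependencies are written down. The edge list below is a simplified assumption for illustration, not a transcription of his chart.

```python
# Sketch of computing accumulated (transitive) spec dependencies.
# The edges below are a simplified, assumed subset of the real chart.

DEPENDS_ON = {
    "XML 1.0": [],
    "Namespaces": ["XML 1.0"],
    "XPath": ["XML 1.0", "Namespaces"],
    "XSLT": ["XPath"],
    "XLink": ["XML 1.0", "Namespaces"],
}

def transitive_deps(spec: str) -> set:
    """All specs a given spec depends on, directly or indirectly."""
    seen = set()
    stack = list(DEPENDS_ON[spec])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPENDS_ON[dep])
    return seen

print(sorted(transitive_deps("XSLT")))  # ['Namespaces', 'XML 1.0', 'XPath']
```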

  • [April 13, 2001] "XML Tools for Publishers Gain Ground." By Mark Walter. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 1 (April 02, 2001), pages 36-38. ['Cross-media needs fuel XML adoption. Vendors offer tools tailored for commercial publishing applications. As XML gains acceptance, new tools are hitting the market to help publishers convert to an XML-based workflow. We highlight some of those tools and the rest of our Editors' Hot Picks.'] "The Extensible Markup Language (XML) continues to gain acceptance among publishers of all types, but commercial publishers have had trouble locating XML-aware tools that took into account their unique needs. Fortunately, that void is starting to be filled; niche suppliers are introducing innovative XML-related tools and services to help publishers cope with the challenges of capturing their authored text into a form that is suitable for cross-media publishing. Though XML's benefits are apparent and it is simpler than SGML, XML has implementation hurdles that publishers must clear. First, of course, is that you have to define your markup -- your tagset and document definitions. There really aren't short cuts that can be taken in that step; even if you want to start with a public definition, you'll probably want to add elements or attributes that are unique to your content or organization. A second hurdle is converting from a word-processing or page-oriented workflow to one based on XML. For print publishers accustomed to treating their film as the final form of their publications, neither Quark nor Adobe has moved as quickly on this front as customers would have liked. Quark has released its Avenue.Quark Xtension, but it has dragged its feet on adding automation and QPS support. This spring, Adobe is introducing a new XML-based metadata architecture for Acrobat, but it has yet to add XML export to InDesign. Help on the way. 
In the meantime, third-party developers are stepping up to offer relief. Several companies at Seybold Seminars Boston this month will unveil products designed to help editorial teams convert their stories to XML... Whether it's to feed syndication, produce Web channels, enhance print workflows or support future product development, XML is taking hold and proving its versatility -- and paving the road to cross-media publishing. The hard part has been changing old print-centric editorial processes to write and edit in a less media-specific form. At long last, vendors are recognizing the opportunity and stepping forward with offerings that are better tailored to editorial needs than many of those we've seen in the past." [Note: "The Seybold Report on Publishing Systems and The Seybold Report on Internet Publishing are converging: The new Seybold Report: Analyzing Publishing Technology covers the full spectrum of technology and business issues facing publishers today." The new report will appear twice a month. The first three issues will be online free. See news item.]

  • [April 13, 2001] "Acrobat 5 Makes the Pitch For Online Sharing." By Mark Walter and John Parsons. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 1 (April 02, 2001), pages 8-14. ['Adobe's latest upgrade to its electronic document viewer adds a number of new features that should keep its strong market position intact. But the integration with Glassbook's e-book technology won't come until later this year, and PDF print workflows received scant attention. Is Acrobat 5 a must-have upgrade or can users wait for further development?'] "Striving to attract corporate customers without alienating its longtime constituents, Adobe has introduced Acrobat 5, a new version of its venerable electronic document viewer. Accompanying the new software release is PDF 1.4, a new version of the underlying Portable Document Format specification. Though the new software sports an updated user interface and useful incremental improvements, it is probably the updates to PDF -- including its metadata architecture, tagged PDF spec, and transparency model -- that are most significant... On the plus side, all Acrobat functions can now be turned into batch functions that run on a group of files, and new functions can be added using JavaScript. A built-in JavaScript editor is supplied, and JavaScript support has been enhanced to work with forms and the Web capture facility, which converts Web pages into PDF. XML support: Under pressure to extend PDF to include XML markup, Adobe has made changes in three areas. First, Acrobat forms can now be set up to capture data as tagged XML, as well as HTML and Adobe's FDF format. Second, Adobe is introducing a new metadata architecture to PDF, one based on an RDF-compliant DTD. Metadata can be attached both at the document and object level, and the DTD can be extended, opening up interesting possibilities for defining and embedding metadata other than the basic set supported in Acrobat 5. 
Third, Adobe has defined a way to embed structure into PDF. Called 'Tagged PDF,' it is a set of conventions for marking structural elements within the file... Adobe is offering two methods of embedding metadata, both relying on XML encoding. One, XML Packets, is specific to PDF and intended for applications that read PDF documents. The second, XAP (Extensible Authoring and Publishing), is an RDF-compatible schema that can be extended by developers or sophisticated end users. Metadata may be associated at the stream level (document objects) as well as at the document level. The introduction of the new metadata handling can be done in a way that is backward compatible with PDF 1.3, and Adobe encourages developers to do this by also writing name-value pairs having an equivalent in PDF 1.3 into the Info dictionary as well as into the metadata stream. Talk with your vendors. A flexible and extensible method of attaching metadata to PDF -- and document components -- is an important addition to the base specification, but its value will not be realized until software developers take advantage of it... In short, though half a dozen engineering teams collaborated on the new release, the basic functions that Acrobat and PDF serve have not changed much, if at all, in Acrobat 5. In particular, this release does not meld Glassbook's e-book interface with that of Acrobat, a future development that is sure to be a dramatic change for the product. Nevertheless, this release does coincide with an update to PDF, which merits investigation because of its worldwide stature as a standard for representing final-form documents..." Article also available in PDF format.
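The flavor of an RDF-based, extensible metadata packet like the one described here can be sketched with generic RDF/XML carrying Dublin Core properties. This is not Adobe's actual XAP schema, just an illustration of document metadata expressed as RDF; the RDF and Dublin Core namespaces are the standard ones.

```python
# Rough sketch of an RDF/XML metadata packet of the kind the article
# describes Adobe attaching to PDF documents. Generic RDF with Dublin
# Core properties, not Adobe's actual XAP schema.
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

def metadata_packet(title: str, creator: str) -> bytes:
    """Serialize a small document-level metadata description."""
    rdf = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description")
    ET.SubElement(desc, f"{{{DC}}}title").text = title
    ET.SubElement(desc, f"{{{DC}}}creator").text = creator
    return ET.tostring(rdf)

packet = metadata_packet("Annual Report", "Jane Doe")
parsed = ET.fromstring(packet)
print(parsed.find(f"{{{RDF}}}Description/{{{DC}}}title").text)  # Annual Report
```

Because the vocabulary is just namespaced XML properties, developers can add their own properties without breaking readers that only understand the base set, which is the extensibility benefit the article highlights.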

  • [April 13, 2001] "E-Textbooks Test Emerging Platforms." By Mike Letts. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 1 (April 02, 2001), pages 25-30. ['Textbooks: E-Books Best Chance for Success? Textbook publishers are beginning to take advantage of high levels of computer literacy, while developing new ideas about multimedia technology and electronic formats. But issues such as conversion, content management and distribution format remain hurdles. In this article, we begin profiling e-book vendors working to define textbook delivery platforms of the future.'] "We researched several electronic textbook vendors, which we will present over the course of several issues, to uncover exactly how applicable their technology is for today's textbook market, and how close those companies are to implementing adequate solutions in the classroom. Issues such as authoring and content conversion, cost (both to the publisher and the consumer), multimedia functionality, format support, and experience in the classroom will be addressed to provide publishers with a better sense of the solutions that are currently out there and how the technologies are evolving. Over the next several issues, we'll profile vendors such as Poliplus, Versaware, Rovia, MetaText, WizeUp Digital, and ByteSizeBooks. What we found was a wide range of answers -- some are further along than others, each providing unique advantages to readers and publishers alike... One of the most interesting companies operating in the electronic textbook space today is GoReader. A relative newcomer to the space, GoReader has developed a dedicated device designed specifically for academic use. The company uses Linux as its operating system, and has built its own Java-based user interface, based on an XML browser, on top of the OS. The file format supported is OEB... 
The GoReader device has a memory capacity of 5GB -- enough, the company argues, to store approximately 350 high-graphics textbooks of 1,000 pages each -- and has a 206 MHz internal processor... At the moment, GoReader does all conversion in-house, a tall order for a staff of 12, with contracts with Addison Wesley and Harcourt in the works. To alleviate some of this backlog, a third-party authoring toolkit is in development. Professors will also be able to use the toolkit, which will allow them to create OEB-based files, add hyperlinks, and author content through the GoReader server; it will be password protected and available only to designated GoReader devices... GoReader's proprietary conversion utility is automated to extract XML or PDF formats from Quark files, as well as convert directly from PDF to XML. The company also handles PowerPoint and Word files, and printed text. Mark Cassin, director of sales at GoReader, said the files will be converted to an XML format at no charge to the publisher." Also available in PDF format. On OEB, see "Open Ebook Initiative."

  • [April 13, 2001] "Associated Newspapers Builds a System for Its Future." By Andrew Tribute. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 1 (April 02, 2001), pages 15-24. ['Associated Newspapers in the U.K. is one of the most technologically advanced media groups in the world. Its latest innovation is a large-scale, browser-based open-architecture system for creating and managing editorial and advertising content for print and Web production. We delve into the system and explain how Associated Newspapers has made it work.'] "The U.K. newspaper chain has integrated a production system combining Lotus Notes for the writers and QPS for page makeup (with an XML XTension in between), along with high-speed networks to support printing at 22 plants. Its Web strategy: Invest in markets, not titles... Associated Newspapers has for more than ten years been regarded as technologically the leading newspaper operation in the world; it routinely uses the latest technology to enhance its publishing processes, to reduce costs, and to open up new market areas... Associated Newspapers has begun to implement a structure that will automatically put all its published content into what it calls 'Smart Media,' which is an XML format that will allow any form of publishing to be carried out easily... To implement such a strategy required all media data (i.e., all content) to be in a format that was reusable. This, in turn, called for a fundamental change in the way content was generated and managed. It meant having a database-centric approach that would allow the different data outlets to be able to access each other's content. Any new system would therefore have to have an open structure. Associated Newspapers had become a major user of Unix-based servers and had established a strong relationship with Sun. It decided to standardize on Sun Solaris servers for all applications. 
It also saw that TCP/IP protocols for communication were going to become the standard format for computers and users to communicate with each other... The move to XML: In today's publishing world, XML is seen as the de facto standard format for storing reusable content. But declaring XML to be the required format is one thing; implementing it is another, particularly where existing systems for complex print pagination are in use. In fact, today neither Quark XPress nor Adobe InDesign, the two preferred editorial pagination programs, supports input or output of XML data as a standard facility. This means that if the content-creation system stores data in XML format, it will need to be translated before going to pagination. It also means that if the final editing of content is done in the pagination system, then the pagination system has to find a way to export XML data back into the content repository... Associated uses a Quark Xtension, called Story Manager, that was developed by PCS, the leading QPS integrator in Europe. Story Manager is more than an XML converter. It assists the editorial staff in planning and laying out articles through templating. Templates are created that define all of the elements of an article and, in addition, specify a shape: number of columns, column dimensions, gutters, etc. Each element has an underlying XML structure, though that is rarely seen by a user... In operation, Story Manager polls the Notes output baskets and brings their contents into QPS. When a story is brought across from Notes, it automatically maps all elements of the copy to QPS elements, based on the style information it picked up. Later, when a publication has been made up and sent off to the production systems, the contents are exported to XML elements. This is done by taking each element within the article (as defined by the template) and creating an XML-tagged file, referred to as 'Smart Media,' for subsequent storage and reuse. 
This approach fits Associated's current methods of working, where pages are made up for print first and only later repurposed for any of the Internet pages handled by Associated. The creation of Smart Media can also be done directly from the Lotus Notes database. XML generation is built into version 5.0 of Notes. We were told that Notes 5.0 is now being implemented throughout the organization." Also available in PDF format.
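The export step described above, taking each named element of a templated article and writing it out as tagged XML for storage and reuse, can be sketched very simply. The element names and tag vocabulary below are invented; the actual 'Smart Media' format is internal to Associated Newspapers.

```python
# Sketch of exporting a templated article's elements as a tagged XML
# file for reuse. Element names are invented for illustration; the
# real Smart Media vocabulary is not public.
import xml.etree.ElementTree as ET

def to_smart_media(article: dict) -> bytes:
    """Emit each named article element as a child of a single root."""
    root = ET.Element("article")
    for name, text in article.items():
        ET.SubElement(root, name).text = text
    return ET.tostring(root)

story = {
    "headline": "Budget approved",
    "standfirst": "Council vote passes",
    "body": "The council voted on Tuesday to approve the budget.",
}
print(to_smart_media(story).decode())
```

Once content is held in this element-tagged form, any downstream output (print pagination or a Web page) can pick out just the elements it needs, which is the reuse the article describes.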

  • [April 13, 2001] "OpenMarket rounds out its product line. Content Server Enterprise Edition offers far-reaching support." By Luke Cavanagh. In Seybold Report: Analyzing Publishing Technology [ISSN: 1533-9211] Volume 1, Number 1 (April 02, 2001), pages 4, 36. "Since its surprising 1999 acquisition of then-budding content-management vendor FutureTense, OpenMarket has been a company in painful transition. On the positive side, it has done well to expand the old FutureTense customer base from about 40 customers at the time of the merger to more than 300 today. It also has shown a strong commitment to the technology, developing FutureTense's Content Server into one of the most highly regarded content-management platforms on the market today... The updated OpenMarket Product Suite, released in March, includes three major feature enhancements: support for IBM's WebSphere application server; a new XML document exchange; and a seventh server product called the Marketing Studio. The newly added support for IBM's WebSphere Application Server means that Open Market's products now are suited to run natively on three different application servers. The other two are BEA WebLogic and iPlanet Application Server (formerly Netscape Internet Application Server)... The XML Document exchange, a new feature contained in the Integration Centre module, provides a way to handle incoming and outgoing documents tagged in XML for use in the main Content Server. The exchange is targeted at business-to-business applications, as it contains a set of standard b-to-b Document Type Definitions (DTDs) as defined by the Open Applications Group. Some development work will be required in order to extend these DTDs beyond their preset configuration. There is not a graphical user interface for doing so. This addition plugs into the system's Power Assets Information Architecture, which stores assets in a relational database with XML metadata for reuse across multiple outputs. 
It supports various, extensible classes of information objects that can be defined by non-technical users after installation. Developers can do this by using the Asset Maker; non-technical users can do it through the Flex Assets interface."

  • [April 13, 2001] "Seybold Conference to Focus On Content, Digital Rights Management." By James Evans. In InfoWorld (April 11, 2001). "Members of the digital printing, publishing, and creative communities will converge on Boston this week [April 8-13, 2001] for Seybold Seminars, a conference that will focus this year on desktop and Web publishing, digital rights and document management, and e-books... A strong emphasis also will be put on XML tools to assist with getting content on the Web. Thirty-eight companies are listed as exhibiting HTML, SGML (Standard Generalized Markup Language), and XML products and tools. E-books are also maturing gradually, and Palm and GoReader will have handheld reader devices on display at the show. Collaboration will also be a focus, with companies assisting in ways to maximize the use of existing content, text, pictures, and other information, Gable said... One of the larger vendors exhibiting this year is Adobe. It will have its new Acrobat 5.0 software on display, which lets users convert files into PDF, and is expected to demonstrate its forthcoming three-dimensional product for the Web, Atmosphere. Several digital rights management companies will be on hand to showcase products that assist with distributing and protecting content, including Entrust Technologies, Authentica, and ContentGuard... Reciprocal will launch its Reciprocal Storefront, which supports the distribution of content that has been packaged using DRM (digital rights management) technologies... North Atlantic Publishing Systems will show its new NAPS Translation System, which assists with conversion from Quark files and Microsoft Word/RTF to XML and from XML to HTML. With the NAPS Translation System, a publication can reformat print media content so that it can be used on a Web site, the company said..." On DRM, see "Digital Property Rights Language (DPRL)" and "Extensible Rights Markup Language (XrML)." Also: "Open Ebook Initiative."

  • [April 13, 2001] "TREX Basics." By J. David Eisenberg. From (April 11, 2001). ['Tutorial article. TREX is an alternative schema language created by James Clark, designed to be simpler and more lightweight than W3C's XML Schema.'] "In this article, we'll explore the TREX markup language for validating XML documents, focusing on validating a subset of XMLNews-Story Markup Language. Although the XMLNews-Story markup language has been superseded by the News Industry Text Format, we use the old version because it's simple, it looks a great deal like HTML, and it lets us easily show some of TREX's features... TREX is a powerful markup language that permits you to specify how other XML documents are to be validated. As with other specification languages, you can (1) specify an element with an ordered sequence of sub-elements; (2) specify an element with a choice of sub-elements; (3) permit mixed content (text outside of tags); and (4) specify attributes for tags. Advanced features of TREX allow you to combine externally-defined grammars in highly sophisticated ways. For more information, consult James Clark's extensive TREX tutorial or the formal specification." See references in "Tree Regular Expressions for XML (TREX)."
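The core idea behind "tree regular expressions" that the tutorial names — a content model is a regular expression over the sequence of an element's child names — can be sketched in miniature. This is a toy illustration only, not TREX syntax; the element names (`story`, `headline`, `byline`, `para`) are invented for the example:

```python
import re
import xml.etree.ElementTree as ET

# Toy content model: <story> must contain a <headline>, an optional
# <byline>, and one or more <para> elements, in that order.
CONTENT_MODEL = re.compile(r"^headline(,byline)?(,para)+$")

def valid_story(xml_text: str) -> bool:
    """Check a document's child sequence against the content model."""
    root = ET.fromstring(xml_text)
    if root.tag != "story":
        return False
    child_names = ",".join(child.tag for child in root)
    return CONTENT_MODEL.match(child_names) is not None

ok = valid_story("<story><headline>Hi</headline><para>Text</para></story>")
bad = valid_story("<story><para>No headline first</para></story>")
```

A real TREX (or RELAX NG) validator applies the same pattern-matching idea recursively down the tree, and adds mixed content, attributes, and grammar combination on top.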

  • [April 13, 2001] "XML Hype Down But Not Out In New York." By Edd Dumbill. From (April 11, 2001). "Signs of reality were setting in this week at XML DevCon 2001 in New York City. As vendors and professionals were feeling the pinch of the economic conditions, the cloud of dust raised by recent overmarketing was starting to settle. Web Services: 'Evolution Not Revolution'. While the vendor evangelists were still busy running around proclaiming web services from the podiums, the top dogs of the industry were more realistic about where this latest trend stood. Norbert Mikula, CTO of DataChannel, reminded the audience at a keynote panel about the lessons of recent history, giving 4GL as an example. Ten years ago we were told that programming was dead, managers could glue together components to create systems, and 4GLs would solve all our problems. That never happened, and programming is still very much with us. Likewise, Mikula warned the audience about overselling web services. Recent vendor keynotes have seemed a far cry from the not-even W3C-recommended SOAP -- audiences have learned of the car that will automatically reschedule their day and lunch appointments because it and they got stuck in traffic, order up a pair of pants from the cleaners, and so on. Instead, web services are an 'evolution, not a revolution', said Bob Sutor of IBM and OASIS, 'It's an attempt to clean things up a bit, get the lower level of our stack uniform, and free ourselves up to develop the high layers.' This cleaning up is a valuable and important move, but developers and managers should recognise it for what it is. There are plenty of higher layers that SOAP, WSDL, and UDDI leave unaddressed. In particular, panelists noted that there's a lot of out-of-band human communication involved with business transactions. If it ever arrives, at best the fully automatic conduct of business using web services is a long way away. In a later talk, Uche Ogbuji of Fourthought Inc. 
observed that, if anything, web services emphasized the need for human oversight of a system due to the vastly expanded (compared to traditional application architectures) range of communication partners and transaction types..." See also IBM DeveloperWorks coverage of the conference, including dedicated newsgroup with daily reports.

  • [April 13, 2001] "Top 10 Interview Questions When Hiring XML Developers." By Brian Buehling. From (April 11, 2001). "As XML becomes more pervasive, hiring managers won't have to look very hard to find candidates claiming to have experience working on projects involving XML. Despite this trend, it is still not an easy task to find a truly skilled XML developer. This fact, combined with the increasing compensation awarded to job candidates, makes hiring the right people one of the most important parts of any IT project. Consequently, the list of questions below is intended to be a guide for managers faced with the task of filling positions within their organizations that require a solid understanding of the foundations of XML-related technologies. Describe the differences between XML and HTML. It's amazing how many developers claim to be proficient programming with XML, yet do not understand the basic differences between XML and HTML. Anyone with a fundamental grasp of XML should be able to describe some of the main differences outlined below..."

  • [April 13, 2001] "DocBook-Based Literate Programming." By Mark Wroth. Version 1.3, April 7, 2001. 35 pages. "The DocBook-based Literate Programming system provides a mechanism to write literate programs using a minor extension of the Standard Generalized Markup Language (SGML) DocBook Document Type Definition (DTD). The system consists of two main parts: (1) A DTD that extends DocBook to add the logic needed for literate programming. These are relatively minor extensions to the basic DTD. The details are discussed in Chapter 3. (2) Document Style Semantics Specification Language (DSSSL) style sheets that, together with a DSSSL engine that implements some of James Clark's extensions, serve as 'weave' and 'tangle' processors. These style sheets are discussed in Chapters 5 and 4, respectively. This document also discusses the design considerations behind the implementation, and provides a short sample document that serves as an example of how the DTD is used (and serves as a simple test case). ... This project creates a set of extensions to the DocBook SGML DTD to allow its use for literate programming markup. The resulting system shall (1) Provide a mechanism to extract program files from the literate programming source in appropriate forms for their use as source code in the intended programming language or languages. (2) Permit the use of existing DocBook-based tools with only minor modifications (ideally none) to produce documentation of software projects. It is the intention of this system to: a) Maintain the ability to update the extensions to new versions of DocBook as they are published. b) Make the extensions as easy as possible to move between the SGML and XML versions. c) Make it as simple as possible to add the literate programming functionality to other DocBook-based DTDs. d) Provide a basis on which other implementations of the tangle and weave functions could be built to support other tool chains. 
This system performs three basic functions: (1) Provides a DTD that allows the markup of literate programs, including a flexible system for describing the purpose and implementation of the computer program (based on DocBook) and markup of the program code itself to allow the literate program to produce the computer instructions. (2) A tangle mechanism that actually produces the computer instructions from the literate programming source code. This implementation, SGMLTangle.dsl, is a DSSSL style sheet using extensions to the DSSSL standard as implemented in James Clark's Jade DSSSL engine. (3) A weave implementation that renders the literate programming source into useful documentation. This style specification, SGMLWeave.dsl, extends Norman Walsh's Modular DocBook Style Sheets. It provides both print and HTML output, via the print and HTML style sheets respectively." See also the posting and "An Experiment in Literate Programming Using SGML and DSSSL", Revision 0.109, December 31, 1999. References: "SGML/XML and Literate Programming." [cache article, and code]
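The "tangle" function described above — extracting program code from the literate source and grouping it by target file — can be sketched in a few lines. The real implementation is a DSSSL style sheet run under Jade; the `<fragment file="...">` markup below is an invented stand-in for the DTD's actual literate-programming elements:

```python
import xml.etree.ElementTree as ET

# A tiny literate document: prose interleaved with code fragments,
# each fragment naming the file it belongs to (markup invented).
LITERATE_DOC = """
<article>
  <para>Compute a greeting.</para>
  <fragment file="hello.py">print("hello")</fragment>
  <para>And a farewell.</para>
  <fragment file="hello.py">print("goodbye")</fragment>
</article>
"""

def tangle(doc_text):
    """Collect code fragments by target file, in document order."""
    root = ET.fromstring(doc_text)
    files = {}
    for frag in root.iter("fragment"):
        files.setdefault(frag.get("file"), []).append(frag.text)
    return {name: "\n".join(parts) for name, parts in files.items()}

sources = tangle(LITERATE_DOC)
```

The "weave" direction is the complement: render the same document as formatted prose, typesetting the fragments as listings rather than extracting them.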

  • [April 13, 2001] "RSA: Microsoft Outlines .Net And XP Privacy Strategy." By Stephen Lee. In InfoWorld (April 11, 2001). "Declaring 'war on hostile code' on Tuesday, Microsoft detailed several new features for securing privacy in both current and future Microsoft products here at the RSA Conference 2001. On the Windows .NET front, XML-based user authentication technology, code-named 'HailStorm', topped the list. HailStorm will allow client-side applications and Web services to exchange user information. The Windows XP operating system, too, will feature beefed-up security. In particular, Thompson noted that PKI (Public Key Infrastructure) improvements are in the pipeline. Smart card support will further bolster XP privacy, according to Thompson. 'All the functions that an administrator needs to do can be done with smart cards, including Terminal Service sessions. You can uniformly require an administrator to use smart cards. That eliminates the risk of using passwords,' he said. XP will also allow for interoperability between smart cards and EFS (Encrypting File System), which will let companies accept certificates issued by other PKIs. Other XP announcements included faster SSL (Secure Sockets Layer) services, version 2 of the company's Security Configuration Wizard, which will let users configure access controls and turn off unneeded services, and an Internet Connection Firewall to allow users to safely connect directly to a network. Thompson also addressed Microsoft's upcoming Internet Explorer 6.0 browser. As the company announced on March 21, IE 6.0 will feature native support for P3P (the Platform for Privacy Preferences). P3P is a privacy standard developed by the World Wide Web Consortium that notifies users about the privacy rules used at visited Web sites..." See: "Microsoft Hailstorm."

  • [April 13, 2001] "RSA: VeriSign Touts Trust Services." By Brian Fonseca. In InfoWorld (April 11, 2001). "Laying the groundwork for its mission to move security complexity from applications to system infrastructure, VeriSign made a host of announcements at RSA Conference 2001 on Tuesday, including the launch of its next-generation Internet trust services. The services, accessible through XML interfaces as part of a utility-based managed services platform, include a managed user provisioning service called, second-generation PKI (public key infrastructure), new entitlements management, and trade settlement services., developed in conjunction with resource provisioning management vendor Access360, is a hosted service that automates connecting of customers, employees, and business partners to designated information. The service will be available this quarter, according to Anil Pereira, senior vice president and group general manager at Mountain View, Calif.-based VeriSign. VeriSign also announced that the World Wide Web Consortium (W3C) has officially recognized its XKMS (XML key management specification) developed with Microsoft and webMethods... Based on the XKMS architecture, VeriSign's revamped second-generation PKI service 'unshackles applications from issuers'..." See (1) discussion "XKMS Trust Services Specification Receives Broad Declaration of Industry Support" and (2) "XML Key Management Specification (XKMS)."

  • [April 13, 2001] "Putting XML in Context with Hierarchical, Relational, and Object-Oriented Models. [XML Matters #8.]" By David Mertz, Ph.D. (Ideationist, Gnosis Software, Inc.). From IBM developerWorks (April 2001). ['On the way to making a point about how XML is best suited to work with databases, David Mertz discusses how XML fits with hierarchical, relational, and object-oriented data modeling paradigms.'] "XML is an extremely versatile data transport format, but despite high hopes for it, XML is mediocre to poor as a data storage and access format. It is not nearly time to throw away your (SQL) relational databases that are tuned to quickly and reliably query complex data. So just what is the relationship between XML and the relational data model? And more specifically, what's a good design approach for projects that utilize both XML and relational databases -- with data transitions between the two? This column discusses how abstract theories of data models, as conceptualized by computer scientists, help us develop specific multirepresentational data flows. Future columns will look at specific code and tools to aid the transitions; this column addresses the design considerations... The problem for many XML-everywhere (and XML-only) aspirations is that at the core of an RDBMS are its relations -- in particular, the set of constraints that exists between tables. Enforcing the constraints is what makes RDBMSs so useful and powerful. While it would surely be possible to represent a constraint set in XML for purposes of communicating it, XML has no inherent mechanism for enforcing constraints of this sort (DTDs and schemas are constraints of a different, more limited sort). Without constraints, you just have data, not a data model (to slightly oversimplify matters). Some XML proponents advocate adding RDBMS-type constraints into XML; others suggest building XML into RDBMSs in some deep way. 
I believe that these are extremely bad ideas that arise mostly out of a "buzzword compliance" style of thinking. Major RDBMS vendors have spent many years of effort in getting relational matters right, and especially right in a way that maximizes performance. You cannot just quickly tack on a set of robust and reliable relational constraints to the representation in XML that, really, is closer to a different modeling paradigm. Moreover, the verbosity and formatting looseness of XML are, at heart, quite opposite to the strategies RDBMSs use to maximize performance (and, to a lesser extent, reliability), such as fixed record lengths and compact storage formats. In other words, go ahead and be excited by XML's promise of a universal data transport mechanism, but keep your backend data on something designed for it, like DB2 or Oracle (or on Postgres or MySQL for smaller-scale systems)." Article also in PDF format. See: "XML and Databases."
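Mertz's division of labor — XML as the transport, the relational schema as the keeper of constraints — can be sketched with the standard library. The table and element names below are invented; the point is that the duplicate row is perfectly well-formed XML but is rejected by the data model:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Rows arrive as XML elements; uniqueness is enforced by the
# relational schema, not by anything in the XML itself.
INCOMING = """
<orders>
  <order id="1" total="10.00"/>
  <order id="2" total="25.50"/>
  <order id="1" total="99.99"/>
</orders>
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")

loaded, rejected = 0, 0
for elem in ET.fromstring(INCOMING).iter("order"):
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?)",
                     (int(elem.get("id")), float(elem.get("total"))))
        loaded += 1
    except sqlite3.IntegrityError:
        # Well-formed XML, but it violates the relational constraint.
        rejected += 1
```

A DTD or schema could constrain each element's shape, but the cross-record rule (no two orders share an id) lives naturally in the database, which is the article's point.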

  • [April 13, 2001] "Microsoft Updates SOAP toolkit, Further Supports Web Services Standards." By Tom Sullivan. In InfoWorld (April 11, 2001). "Microsoft on Wednesday announced a new version of its SOAP (Simple Object Access Protocol) toolkit and said that the forthcoming version of Windows will natively support SOAP. Version 2.0 of the toolkit supports the latest iteration of SOAP, Version 1.1, and, perhaps more important, the emerging standard for describing Web services, WSDL (Web Services Description Language). Microsoft unveiled the toolkit at the Web Services World show here. Developers using Visual Studio 6.0 tools in conjunction with the toolkit have a means to describe Web services as well as the transport mechanism for delivering services to devices that support SOAP and XML. Programmers also can add such functionality to existing COM (Component Object Model) applications or components, according to Redmond, Wash.-based Microsoft. In a surprise to no one because Microsoft is one of the key players driving the SOAP standard, Microsoft also stated that the next-generation Windows operating system, Windows XP, will natively support SOAP, thereby making it easier for developers to build Web services and for users to access them. Microsoft is not the only one backing SOAP. Earlier this week, SilverStream Software and Cape Clear both announced that their J2EE (Java 2 Enterprise Edition) application servers now support the protocol..." See discussion and references.
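For readers new to the protocol: a SOAP 1.1 message is ordinary XML in the envelope namespace. A minimal hand-built example (the `GetPrice` call and its `urn:example` namespace are invented; real toolkits generate this from a WSDL description):

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace, per the SOAP 1.1 specification.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# Build a minimal envelope: an Envelope containing a Body,
# whose payload is an invented method call.
envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
call = ET.SubElement(body, "{urn:example}GetPrice")
ET.SubElement(call, "{urn:example}Item").text = "widget"

message = ET.tostring(envelope, encoding="unicode")
```

The toolkit's job, beyond constructing such envelopes, is mapping them onto COM method calls and HTTP transport; the wire format itself is this simple.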

  • [April 13, 2001] "Industry giants agree on XML security." By Louise Carroll. From News (April 12, 2001). "A raft of internet technology companies have announced their support for an XML security specification that has been developed by VeriSign, Microsoft and webMethods. Baltimore Technologies, Entrust, RSA Security, HP, IBM, IONA and Reuters have all now agreed to back the new specification. The technology they are promoting is the XML key management specification (XKMS), which makes it easier to integrate advanced PKI (public key infrastructure) technologies, such as digital signature handling and encryption, into e-commerce applications. The spec is also intended to ensure interoperability between various PKI solutions..." See discussion.

  • [April 13, 2001] "XML Query Engine. The XQEngine utility lets you perform full-text-searches across multiple files using the XML Query Language (XQL) and Java." By Piroz Mohseni. From DevX XML-Zone (April 2001). "A while ago, I was looking for an XML search utility. My application had to search a relatively large number of XML files (they were small files) on a periodic basis. The primary goal was to find out if there was a match or not, but sometimes we needed to extract the 'found' data as well. I was first driven towards XSLT and its sister XPath thinking the search problem could be mapped into a transformation and solved that way. After some experimenting, I decided I really had a search problem at hand. The comma-separated values (CSV) output I needed was not appropriate for XSLT and full-text searching was not available. I decided the XML Query Language (XQL) seemed to better address my problem. As I was looking for implementations of XQL, I came across a small utility program called XQEngine which seemed like a good fit. In this article, I'll show you how you can use XQEngine for your search needs. XQEngine [available at] is a JavaBean which uses a SAX parser to index one or more XML documents and then allows you to perform multiple searches on them. The search language is a superset of XQL which uses a syntax similar to XPath. Recently the XML Query working group at W3C released several new working documents so the language most definitely will change in the future... Searching XML documents continues to be an evolving challenge. Database vendors are now offering XML support and there are a number of new XML store and search solutions. XQEngine offers an effective search solution when simplicity is of more concern than scalability. It has several useful configurations and can return the search results in various formats." [From 'XML Query Engine (XQEngine for short) is a full-text search engine component for XML. 
It lets you search small to medium-size collections of XML documents for boolean combinations of keywords, much as web-based search engines let you do for HTML. Queries are specified using XQL, a de facto standard for querying XML documents that is nearly identical to the simplified form of XPath. Queries expressed in XQL are much more expressive and powerful than the standard search interfaces available through web-based search engines. XML Query Engine is a compact (roughly 160K), embeddable component written in Java. It has a straightforward programming interface that lets you easily call it from your own Java application. The engine should work well as a personal productivity tool on an individual desktop, as part of a CD-based application, or on a server with low to medium-volume traffic.'] See "XML and Query Languages."
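The kind of keyword search XQEngine performs can be approximated in miniature with the standard library — this uses a simple scan over element text rather than XQL, and the document collection is invented:

```python
import xml.etree.ElementTree as ET

# A tiny in-memory collection of small XML documents,
# standing in for the files XQEngine would index.
DOCS = {
    "a.xml": "<doc><title>XML queries</title><body>search and index</body></doc>",
    "b.xml": "<doc><title>Cooking</title><body>nothing relevant</body></doc>",
}

def search(keyword):
    """Return the names of documents containing the keyword in any element."""
    hits = []
    for name, text in DOCS.items():
        root = ET.fromstring(text)
        if any(keyword in (el.text or "") for el in root.iter()):
            hits.append(name)
    return hits
```

A real engine builds a persistent inverted index (word to document/element positions) so that boolean queries do not rescan every document, which is the scalability gap this linear scan leaves open.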

  • [April 11, 2001] "The Semantic Web. A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities." By Tim Berners-Lee, James Hendler, and Ora Lassila. In Scientific American Volume 284, Number 5 (May, 2001), pages 34-43. Cover story title: 'Get the Idea? Tomorrow's Web Will.' "Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully. Computers can adeptly parse Web pages for layout and routine processing -- here a header, there a link to another page -- but in general, computers have no reliable way to process the semantics: this is the home page of the Hartman and Strauss Physio Clinic, this link goes to Dr. Hartman's curriculum vitae. The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users. Such an agent coming to the clinic's Web page will know not just that the page has keywords such as 'treatment, medicine, physical, therapy' (as might be encoded today) but also that Dr. Hartman works at this clinic on Mondays, Wednesdays and Fridays and that the script takes a date range in yyyy-mm-dd format and returns appointment times. And it will 'know' all this without needing artificial intelligence on the scale of 2001's Hal or Star Wars's C-3PO. Instead these semantics were encoded into the Web page when the clinic's office manager (who never took Comp Sci 101) massaged it into shape using off-the-shelf software for writing Semantic Web pages along with resources listed on the Physical Therapy Association's site. The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The first steps in weaving the Semantic Web into the structure of the existing Web are already under way. 
In the near future, these developments will usher in significant new functionality as machines become much better able to process and 'understand' the data that they merely display at present... Two important technologies for developing the Semantic Web are already in place: eXtensible Markup Language (XML) and the Resource Description Framework (RDF). XML lets everyone create their own tags -- hidden labels that annotate Web pages or sections of text on a page. Scripts, or programs, can make use of these tags in sophisticated ways, but the script writer has to know what the page writer uses each tag for. In short, XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean. Meaning is expressed by RDF, which encodes it in sets of triples, each triple being rather like the subject, verb and object of an elementary sentence. These triples can be written using XML tags. In RDF, a document makes assertions that particular things (people, Web pages or whatever) have properties (such as 'is a sister of,' 'is the author of') with certain values (another person, another Web page). This structure turns out to be a natural way to describe the vast majority of the data processed by machines. Subject and object are each identified by a Universal Resource Identifier (URI), just as used in a link on a Web page. (URLs, Uniform Resource Locators, are the most common type of URI.) The verbs are also identified by URIs, which enables anyone to define a new concept, a new verb, just by defining a URI for it somewhere on the Web... this is not the end of the story, because two databases may use different identifiers for what is in fact the same concept, such as zip code. A program that wants to compare or combine information across the two databases has to know that these two terms are being used to mean the same thing. 
Ideally, the program must have a way to discover such common meanings for whatever databases it encounters. A solution to this problem is provided by the third basic component of the Semantic Web, collections of information called ontologies. In philosophy, an ontology is a theory about the nature of existence, of what types of things exist; ontology as a discipline studies such theories. Artificial-intelligence and Web researchers have co-opted the term for their own jargon, and for them an ontology is a document or file that formally defines the relations among terms. The most typical kind of ontology for the Web has a taxonomy and a set of inference rules... The real power of the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs; the Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data come with semantics." See: (1) W3C Semantic Web Advanced Development, (2) "XML and 'The Semantic Web'" and (3) "Resource Description Framework (RDF)."
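The triple model the authors describe — subject, verb (predicate), object, each named by a URI — is simple enough to sketch with plain tuples (the URIs below are invented examples, not part of the article):

```python
# RDF's data model in miniature: a set of (subject, predicate, object)
# triples, each part identified by a URI (examples invented).
TRIPLES = {
    ("http://example.org/alice", "http://example.org/terms/isSisterOf",
     "http://example.org/bob"),
    ("http://example.org/alice", "http://example.org/terms/isAuthorOf",
     "http://example.org/doc1"),
}

def objects(subject, predicate):
    """All objects asserted for a given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}

sisters = objects("http://example.org/alice",
                  "http://example.org/terms/isSisterOf")
```

Because predicates are URIs rather than local column names, two independent datasets that use the same predicate URI are mergeable by simple set union — and an ontology's inference rules are what bridge the cases where they use different URIs for the same concept.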

  • [April 11, 2001] "The Big Bang. As Covisint attests, building the "smarter enterprise" begins by replacing supply chain friction with harmony." By Kevin Vasconi (Covisint CTO). "Depending on your viewpoint, Covisint represents either the unlikeliest strategic alliance in business management history or the inevitable conclusion to the evolution of the traditional supply chain. Perhaps both. Formed in early 2000 by the merger of Ford Motor Co., General Motors Corp., and DaimlerChrysler's two previously independent efforts to establish an Internet-based B2B exchange for the automotive industry (Nissan/Renault has since signed on as well), Covisint has a goal of staggering ambition: to serve as the common procurement and collaborative manufacturing and design platform for $300 billion worth of business. In essence, the company views itself as an open integration framework that provides a portal interface into the supply chain for OEMs, their suppliers, and even other exchanges. For that reason, Covisint may signify the "big bang" of business process integration: the first glimpse of what the post-Fordist, agile demand network will look like. [Excerpts:] "our first goal at Covisint was to create an open integration framework. Unlike other exchanges, which built their architectures on a single technology provider or application, we determined from the start that our solution has to be big, scalable, secure, and capable of integrating with just about any software release that has ever existed. Since then, we've proven that our architecture is a good integration framework. 
By the middle of this year, we'll have a secure, deterministic XML [Extensible Markup Language] dial tone in place between us and all our founding OEM partners upon which we can integrate multiple applications running on disparate operating systems, provide a single view to the ultimate customer, and offer common services such as security registration, monitoring, and transformation routing and workflow in the back end. That's a compelling value proposition to our customers, who don't care what kind of technology we run at the end of the day. What they're trying to do is solve business problems... Basically, if you're a supplier that wants to get involved, you simply have to ensure you have a secure network connection and train the businesspeople who are going to use the system. We'll take care of the integration problem, make sure the software's up and running, and process your transactions or convert your EDI stream into XML... But here's the problem: Covisint is just one among many exchanges based on an XML messaging bus. As you know, there are at least five competing standards out there. 'XML standard' is definitely an oxymoron... We don't view ourselves as a standards organization; in fact, we are absolutely committed to not creating an XML standard. We do need a standard for the industry, however, and ideally, I'd like to see an XML standard across industries. So we're having what I consider all the appropriate conversations with RosettaNet and the other standards bodies. Now the knock on a lot of standards committees is that they're too slow and often impractical. That's what we're trying to add to the equation: We're trying to be the guys out here saying, 'Hey, we're trying to implement this standard in production for real customers, and these are the things you need to do to make that a standard across our industry. And by the way, guess what? We only want one of them.' In the short term, we've had to pick and choose among the standards that are out there. 
For now, we think ebXML is the way to go... [IE: Will Covisint XML messages emulate paper documents, or will they reflect a true data modeling implementation?] Vasconi: We're not quite sure. Some of the XML standards are definitely taking the paper-based approach, but that's not really very robust. You're probably going to see a combination of both approaches. Ideally, we would love to have XML schema based on a data model..."

  • [April 11, 2001] "XML Will Make It Easier. Data integration and reuse possibilities are wide open but not yet very well realized." By Ralph Kimball. In Intelligent Enterprise Volume 4, Number 6 (April 16, 2001), pages 26-28. "By now, most of us in the data warehouse community have heard of Extensible Markup Language (XML). We have been told that it will substantially extend HTML for much of our Web communications, and it will make the content of these communications far more transparent. That all sounds fine. But what is XML? What will it really do for us? Where will XML make life easier? XML is intended to be the 'dial tone' for computers to exchange information. Although XML is independent from the Web, most of the interesting uses will piggyback on the Web's hypertext transfer protocol (HTTP). Here, in my opinion, are the places where XML will affect data warehousing: legacy data extraction, input transaction capture, direct storage of XML, and agnostic front-end information delivery. (1) Legacy data extraction. If a data source can expose its data in XML format, then parties sharing a common XML schema can transmit and receive the data. Although older legacy applications may need to be retrofitted with XML writers, many more modern systems will be equipped to write data in XML format. Relational databases, such as Oracle, DB2, and Microsoft SQL Server 2000 already support query output and bulk data transfer directly in XML form, with no intermediate application required... (2) Input transaction capture. In many cases, the data warehouse is focused on capturing the original transaction inputs and using them as the source of data, rather than waiting for an extract or a query from the production system. This transaction capture capability has been growing in importance with the increased interest in realtime data warehouses. 
A serendipitous side effect of the growing use of XML is that the transaction flow from a data input terminal to the production system can be siphoned off so that the data warehouse receives the transactions in parallel with the production system... (3) Direct storage of XML. One of the more interesting points of overlap between XML and relational databases is the possibility of storing XML documents directly as relations. Direct storage can take a couple of forms. An XML document can be a kind of replacement for a SQL INSERT or UPDATE statement, where the data in the XML document ends up in relational tables... (4) Agnostic front-end information delivery. The widespread deployment of XML will be the final step in removing query and reporting tools from end users' machines. An XML data transfer plus an associated XSLT formatting specification may be enough to produce any desired user interface presentation on a remote browser.... There are dozens of subcommittees, vendors, and industry groups working on various aspects of XML, DTDs, and myriad other standards. But be aware that many of these standards are part of the plumbing of XML, and are of more interest to software developers in the vendor community than to data warehouse project managers..."
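Kimball's "direct storage" point, where an XML document stands in for a SQL INSERT statement, can be sketched in a few lines of Python. The element names, table name, and flat document shape below are invented for illustration; a production loader would use parameterized queries and real type mapping rather than string pasting.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat order document; tag names are illustrative only.
doc = """
<order>
  <order_id>1001</order_id>
  <customer>Acme Corp</customer>
  <total>250.00</total>
</order>
"""

def xml_to_insert(xml_text, table):
    """Map one flat XML element's children onto a SQL INSERT statement."""
    root = ET.fromstring(xml_text)
    cols = [child.tag for child in root]
    vals = [child.text.strip() for child in root]
    col_list = ", ".join(cols)
    val_list = ", ".join("'%s'" % v for v in vals)
    return "INSERT INTO %s (%s) VALUES (%s);" % (table, col_list, val_list)

print(xml_to_insert(doc, "orders"))
# INSERT INTO orders (order_id, customer, total) VALUES ('1001', 'Acme Corp', '250.00');
```

The same idea generalizes to UPDATE statements, which is how an XML message can carry a transaction to the warehouse in parallel with the production system.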

  • [April 10, 2001] "Networked Knowledge Representation and Exchange using UML and RDF." By Stephen Cranefield (Department of Information Science, University of Otago PO Box 56, Dunedin, New Zealand). In Journal of Digital Information (JoDI) Volume 1, Issue 8 (February 15, 2001) [Themes: Information management, Information discovery.] "This paper proposes the use of the Unified Modeling Language (UML) as a language for modelling ontologies for Web resources and the knowledge contained within them. To provide a mechanism for serialising and processing object diagrams representing knowledge, a pair of XSLT stylesheets have been developed to map from XML Metadata Interchange (XMI) encodings of class diagrams to corresponding RDF schemas and to Java classes representing the concepts in the ontologies. The Java code includes methods for marshalling and unmarshalling object-oriented information between in-memory data structures and RDF serialisations of that information. This provides a convenient mechanism for Java applications to share knowledge on the Web... [Conclusions:] This paper has illustrated the use of UML for representing ontologies and knowledge about particular instances in the domains modelled by those ontologies. As a widely known and supported modelling language, UML has great potential for describing Web resources in a machine accessible way. Although the XMI specification defines a standard way of serialising UML models, this does not provide a convenient way of serialising knowledge in the form of object diagrams. As a solution to this problem, a pair of mappings from XMI encodings of UML class diagrams to Java classes and to RDF schemas have been defined and implemented using XSLT. The Java classes include code to marshal and unmarshal knowledge expressed as in-memory representations of object diagrams to and from RDF documents in the XML encoding. 
There is considerable interest in developing bridges between standards from the domains of object-oriented modelling, distributed object computing and Internet computing (Jagannathan and Fuchs 1999). It is hoped that the specification of mappings such as the ones described here will be addressed by a recognised industry standards body. To aid this process, the work described here will be made publicly available at" See "Resource Description Framework (RDF)."
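The marshalling step Cranefield describes, from an in-memory object to an RDF serialisation, can be miniaturised as follows. This sketch hand-rolls RDF/XML with Python's ElementTree rather than using generated Java classes; the `ex:` ontology namespace, the resource URI, and the property names are all made up for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/ontology#"   # stand-in for a real ontology namespace
ET.register_namespace("rdf", RDF)
ET.register_namespace("ex", EX)

def marshal(resource_uri, properties):
    """Marshal one in-memory object into an rdf:Description element --
    a hand-rolled miniature of what the generated Java classes do."""
    rdf_root = ET.Element("{%s}RDF" % RDF)
    desc = ET.SubElement(rdf_root, "{%s}Description" % RDF,
                         {"{%s}about" % RDF: resource_uri})
    for name, value in properties.items():
        ET.SubElement(desc, "{%s}%s" % (EX, name)).text = value
    return rdf_root

rdf_doc = marshal("http://example.org/paper42",
                  {"title": "Networked Knowledge Representation"})
print(ET.tostring(rdf_doc, encoding="unicode"))
```

Unmarshalling is the inverse walk: read each `rdf:Description`, strip the namespace from each property element, and set the corresponding attribute on a fresh object.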

  • [April 08, 2001] "Schematron: Validating XML Using XSLT." By Leigh Dodds. Paper presented at the XSLT UK Conference (2001) (Keble College, Oxford, England). "Schematron [Schematron] is a structure-based validation language, defined by Rick Jelliffe as an alternative to existing grammar-based approaches. Tree patterns, defined as XPath expressions, are used to make assertions, and provide user-centred reports about XML documents. Expressing validation rules using patterns is often easier than defining the same rule using a content model. Tree patterns are collected together to form a Schematron schema. Schematron is a useful and accessible supplement to other schema languages. The open-source XSLT implementation is based around a core framework which is open for extension and customisation. This paper introduces the Schematron language and the available implementations. An overview of the architecture, with a view to producing customised versions, is also provided... Schematron is unique amongst current schema languages in its divergence from the regular grammar paradigm, and in its user-centric approach. Schematron is not meant as a replacement for other schema languages; it is not expected to be easily mappable onto database schemas or programming language constructs. It is a simple, easy-to-learn language that can perform useful functions in addition to other tools in the XML developer's toolkit. It is also a tool with little overhead, both in terms of its learning curve and its requirements. XSLT engines are regular components in any XML application framework. Schematron's use of XPath and XSLT makes it instantly familiar to XML developers. A significant advantage of Schematron is the ability to quickly produce schemas that can be used to enforce house style rules and, more importantly, accessibility guidelines without alteration to the schema to which a document conforms. 
An XHTML document is still an XHTML document even if it does not meet the Web Accessibility Initiative Guidelines [WAI]. These kinds of constraints describe a policy which is to be enforced on a document, and can thus be layered above other schema languages. Indeed, in many cases it may be impossible for other languages to test these kinds of constraints." See: "Schematron: XML Structure Validation Language Using Patterns in Trees." [cache]
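The flavour of a Schematron rule, a context pattern paired with an assertion and a human-readable message, can be imitated in plain Python. This is only a stdlib sketch of the idea, not Jelliffe's XSLT-based implementation; the two accessibility rules shown (an `img` must carry `alt` text, a link must have text) echo the WAI-style house rules mentioned above, and the messages are invented.

```python
import xml.etree.ElementTree as ET

# Each toy "rule" pairs a context pattern with a test and a report message,
# mimicking Schematron's rule/assert structure with ElementTree's limited
# path syntax rather than full XPath.
rules = [
    (".//img", lambda e: "alt" in e.attrib,
     "img element should carry an alt attribute (accessibility rule)"),
    (".//a", lambda e: e.text is not None and e.text.strip() != "",
     "a element should have link text"),
]

def validate(xml_text, rules):
    """Return the list of failure messages, like a Schematron report."""
    root = ET.fromstring(xml_text)
    report = []
    for path, test, message in rules:
        for elem in root.findall(path):
            if not test(elem):
                report.append(message)
    return report

html = '<body><img src="logo.png"/><a href="/">Home</a></body>'
print(validate(html, rules))
# ['img element should carry an alt attribute (accessibility rule)']
```

The point Dodds makes survives even in this miniature: the document is still valid against its grammar, while the policy layer flags it.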

  • [April 06, 2001] "Introduction to CSS3." Edited by Eric A. Meyer and Bert Bos (W3C). W3C Working Draft, 6-April-2001. Latest version URL: This WD updates the previous version of 2001-01-19. Abstract: "The members of the CSS&FP Working Group have decided to modularize the CSS specification. This modularization will help to clarify the relationships between the different parts of the specification, and reduce the size of the complete document. It will also allow us to build specific tests on a per-module basis and will help implementors in deciding which portions of CSS to support. Furthermore, the modular nature of the specification will make it possible for individual modules to be updated as needed, thus allowing for a more flexible and timely evolution of the specification as a whole. This document lists all the modules to be contained in the future CSS3 specification." Description: "As the popularity of CSS grows, so does interest in making additions to the specification. Rather than attempting to shove dozens of updates into a single monolithic specification, it will be much easier and more efficient to be able to update individual pieces of the specification. Modules will enable CSS to be updated in a more timely and precise fashion, thus allowing for a more flexible and timely evolution of the specification as a whole. For resource-constrained devices, it may be impractical to support all of CSS. For example, an aural browser may be concerned only with aural styles, whereas a visual browser may care nothing for aural styles. In such cases, a user agent may implement a subset of CSS. Subsets of CSS are limited to combining selected CSS modules, and once a module has been chosen, all of its features must be supported." Section 2 of the WD displays the modules: "Module Overview: All modules contain a 'Conformance: Requirements and Recommendations' section. Any module whose table row is backed with green is considered part of the 'CSS Core.' 
The listed deadlines (backed in red) represent the time at which a module should be ready for Working Draft publication. There are also columns which indicate a module's participation in each of three 'profiles': HTML Basic, CSS3, and SVG. A module without any indicated module participation is at risk of being dropped from CSS3 before it reaches Proposed Recommendation status. A module without a listed editor is backed in yellow, and is in serious danger of being dropped..." See the W3C CSS web page and "W3C Cascading Style Sheets."

  • [April 06, 2001] "Generating Synthetic Complex-structured XML Data." By Ashraf Aboulnaga, Jeffrey F. Naughton, and Chun Zhang (Computer Sciences Department, University of Wisconsin - Madison, WI). Email: {ashraf,naughton,czhang } Submitted for publication. "Synthetically generated data has always been important for evaluating and understanding new ideas in database research. In this paper, we describe a data generator for generating synthetic complex-structured XML data that allows for a high level of control over the characteristics of the generated data. This data generator is certainly not the ultimate solution to the problem of generating synthetic XML data, but we have found it very useful in our research on XML data management, and we believe that it can also be useful to other researchers. Furthermore, we hope that this paper starts a discussion in the XML community about characterizing and generating XML data, and that it may serve as a first step towards developing a commonly accepted XML data generator for our community... We are aware of three proposals for generating synthetic XML data. Florescu and Kossman use synthetic XML data to evaluate different strategies for storing XML in relational database systems. The XML data they use consists of elements at one level with no nesting. The elements are randomly connected in a graph structure using IDREF attributes. This graph-structured view of XML data is useful in some contexts, but XML data is by nature tree structured, and it may often be useful to have a tree-structured view of this data. For example, the important notion of 'element containment' only applies to tree-structured XML data. Furthermore, the data generation process of Daniela Florescu and Donald Kossmann ["Storing and querying XML data using an RDBMS"] has very few opportunities for varying the structure and distribution of the generated data. 
In [Timo Böhme and Erhard Rahm, "XMach-1: A benchmark for XML data management"], a benchmark for evaluating the performance of XML data management systems is proposed. The data used by this benchmark is synthetic XML data that models a database of structured text documents and a directory of these documents. The structure of the data is fixed and simple, and there is very little opportunity for varying it. IBM provides a data generator that generates XML data that conforms to an input DTD. Like the previous two approaches, the IBM data generator is restrictive in the control it provides over the data generation process. For example, we can specify a maximum number of levels for the generated XML documents, but we cannot dictate that these documents have exactly this number of levels. Other restrictions include using only uniform frequency distributions with no opportunity for generating skewed data. In contrast to these proposals for generating synthetic XML data, our data generator can generate much more complex data, and it provides much more control over the characteristics of the generated data. Nevertheless, it may be possible to use ideas from these proposals to extend our data generator. For example, IDREF attributes may be used to connect the elements of the generated documents as in [Florescu/Kossmann]... Conclusions: In this paper, we presented a data generator for generating complex-structured synthetic XML data, which can be of great use to researchers in XML data management. The data generator has several input parameters that control the characteristics of the generated data. The parameters all have simple and intuitive meanings, so it is easy to understand the structure of the generated data and to set the parameters that are not important for a specific usage situation to reasonable default values. 
The data generator is publicly available from It can easily be extended and modified to allow for different methods of data generation not covered in this paper. While we think our data generator and the ideas it incorporates are useful, our goal at this point is definitely not to claim that it is a finished product. Rather, our goal in writing this paper is to initiate a discussion in the research community with the eventual goal of developing a shared synthetic XML data generation resource." See: XML Synthetic Data Generator
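A toy version of such a generator, with just three of the kinds of knobs the paper describes (tag pool, number of levels, maximum fanout), might look like the sketch below. The parameter names are invented; the actual Wisconsin generator exposes a much richer set of controls, including frequency distributions, repeated tag names, and skew.

```python
import random
import xml.etree.ElementTree as ET

def generate(tag_pool, levels, fanout, seed=0):
    """Generate a random tree-structured XML document.

    tag_pool, levels, and fanout are illustrative parameters only;
    they stand in for the larger parameter set of the real generator.
    """
    rng = random.Random(seed)   # fixed seed makes the output reproducible

    def build(depth):
        elem = ET.Element(rng.choice(tag_pool))
        if depth < levels:
            # Every internal node gets between 1 and `fanout` children,
            # so the document has exactly `levels` levels.
            for _ in range(rng.randint(1, fanout)):
                elem.append(build(depth + 1))
        return elem

    return build(1)

root = generate(["a", "b", "c"], levels=4, fanout=3, seed=42)
print(ET.tostring(root, encoding="unicode")[:60])
```

Note how this sketch already answers one of the paper's criticisms of the IBM generator: because children are added whenever `depth < levels`, the documents have exactly the requested number of levels, not merely at most that many.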

  • [April 06, 2001] "X-Diff: A Fast Change Detection Algorithm for XML Documents." By Yuan Wang, David J. DeWitt, and Jin-Yi Cai. From the Niagara Project. Submitted for publication. 23 pages. "Over the next several years XML is likely to replace HTML as the standard web publishing language and data transportation format. Since online information changes frequently, being able to quickly detect changes in XML documents is important to Internet query systems, search engines, and continuous query systems. Previous work in change detection on XML or other hierarchically structured documents used an ordered tree model, in which left-to-right order among siblings is important and affects the change result. In this paper, we argue that an unordered model (only ancestor relationships are significant) is more suitable for most database applications. Using an unordered model, change detection is substantially harder than using an ordered model, but the change result that it generates is more accurate. We propose X-Diff, a fast algorithm that integrates key XML structure characteristics with standard tree-to-tree correction techniques. We also analyze the algorithm and study its performance... [Summary and Future Work:] X-Diff is motivated by the problem of efficiently detecting changes to XML documents on the web. Previous work in change detection on XML or other hierarchically structured data used the ordered-tree model. In this paper, we argue that using the unordered-tree model is more suitable for most database and web applications, although it is substantially harder than using the ordered-tree model. We study the XML domain characteristics and introduce several key notions, such as node signature and XHash. Using these techniques in combination with standard tree-to-tree correction techniques, we propose a fast algorithm for computing the difference between two versions of an XML document. 
We present and analyze the algorithm, and also show some preliminary performance results. We have implemented X-Diff as a stand-alone tool and also used it as a component of an Internet query system. More details about X-Diff can be found in [ref]. We are working on improving the performance of our algorithm. We also plan to develop a faster variant of the algorithm capable of producing 'near optimal' results on very large documents. Other interesting future work includes change detection on XML data streams and incremental index update on change of XML documents."
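The unordered model's key trick, comparing subtrees without regard to sibling order, can be illustrated with a hash in the spirit of the paper's XHash: combine a node's signature with the *sorted* hashes of its children, so reordering siblings leaves the hash unchanged. The signature scheme here (tag plus text) is a simplification of what X-Diff actually uses.

```python
import hashlib
import xml.etree.ElementTree as ET

def xhash(elem):
    """Order-insensitive subtree hash, loosely in the spirit of XHash.

    Sorting the child hashes before combining them is what makes two
    subtrees that differ only in sibling order hash to the same value.
    """
    text = (elem.text or "").strip()
    child_hashes = sorted(xhash(c) for c in elem)
    payload = elem.tag + "|" + text + "|" + "|".join(child_hashes)
    return hashlib.sha1(payload.encode()).hexdigest()

a = ET.fromstring("<r><x>1</x><y>2</y></r>")
b = ET.fromstring("<r><y>2</y><x>1</x></r>")   # same content, siblings reordered
print(xhash(a) == xhash(b))   # True under the unordered model
```

In a diff algorithm, equal hashes let whole unchanged subtrees be skipped, which is where much of the speed comes from.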

  • [April 06, 2001] "The Design and Performance Evaluation of Alternative XML Storage Strategies." By Feng Tian, David J. DeWitt, Jianjun Chen, and Chun Zhang (Department of Computer Science, University of Wisconsin, Madison). Submitted for publication. 26 pages. "XML is an emerging Internet standard for data representation and exchange. When used in conjunction with a DTD (Document Type Definition), XML permits the execution of a rich collection of queries using a query language such as XML-QL or Quilt. This paper describes six strategies for storing XML documents, including one that leaves documents in the file system, three that use a relational database system, and two that use an object manager. Each approach was implemented and evaluated using a number of different Quilt queries. A number of interesting insights were gained from these experiments, and a summary of the advantages and disadvantages of each of the six approaches is presented. ... This paper describes six alternative ways of storing XML documents: one that employs text files stored in the file system, three that use a relational database system, and two that use an object manager. These alternatives are evaluated using different queries representing both navigational and associative query workloads. A navigational workload can be generated from requests to an XML server from either a web browser or database query engine. It can also be generated from database queries, since XML data is usually modeled as a labeled graph and queries on XML documents generally involve navigation on the graph. On the other hand, many database queries involve selection predicates that test the contents of an element for a particular value or range of values. An index is indispensable for executing this type of query. With an index, the query can directly access the relevant elements without having to repeatedly traverse the tree from the document's root node. Both types of workloads need to be supported efficiently. [...] 
Our results clearly indicate that DTD information is vital to achieve good performance and a compact data representation. When a DTD is available, the DTD approach has a more compact data representation and excellent performance across different datasets and different queries. We conclude that the DTD approach is the best strategy among the six approaches we studied, and that there is no clear need to build an 'XML-specific' database system. On the other hand, there are applications that need to handle XML files without DTDs, or XML files used as a markup language. When the DTD has cycles, a path expression in Quilt will be translated into recursive SQL queries. Our results showed that object storage manager based approaches can outperform the relational approach on fixed point evaluation. With proper indices, the Text approach can achieve similar performance to the object manager based approaches. However, the cost of maintaining indices makes this approach useful only when update frequency is low."
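One generic relational mapping of the kind evaluated in work like this is the "edge table": each element becomes a row recording its id, parent id, tag, and text. The sketch below is illustrative only; it is not one of the paper's six measured schemas, and the DTD-aware approach the authors favour stores data far more compactly than this schema-oblivious shredding.

```python
import xml.etree.ElementTree as ET

def shred(xml_text):
    """Shred an XML document into 'edge table' rows:
    (node_id, parent_id, tag, text)."""
    root = ET.fromstring(xml_text)
    rows, counter = [], [0]

    def walk(elem, parent_id):
        counter[0] += 1
        node_id = counter[0]
        rows.append((node_id, parent_id, elem.tag, (elem.text or "").strip()))
        for child in elem:
            walk(child, node_id)

    walk(root, None)
    return rows

doc = "<book><title>XML</title><year>2001</year></book>"
for row in shred(doc):
    print(row)
# (1, None, 'book', '')
# (2, 1, 'title', 'XML')
# (3, 1, 'year', '2001')
```

Path expressions then become self-joins on `parent_id`, which is precisely why navigational queries over such tables can be expensive without good indices.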

  • [April 06, 2001] "Estimating the Selectivity of XML Path Expressions for Internet Scale Applications." By Ashraf Aboulnaga, Alaa R. Alameldeen, and Jeffrey F. Naughton. (Computer Sciences Department, University of Wisconsin). Submitted for publication. 24 pages. "Data on the Internet is increasingly presented in XML format. This allows for novel applications that query all this data using some XML query language. All XML query languages use path expressions to navigate through the tree structure of the data. Estimating the selectivity of these path expressions is therefore essential for optimizing queries in these languages. In this paper, we propose two techniques for capturing the structure of complex large-scale XML data as would be handled by Internet-scale applications in a small amount of memory for estimating the selectivity of XML path expressions: summarized path trees and summarized Markov tables. We experimentally demonstrate the accuracy of our proposed techniques, and explore the different situations that would favor one technique over the other. We also demonstrate that our proposed techniques are more accurate than the best previously known technique. [...] Our approach to estimating the selectivity of XML path expressions is to construct a tree representing the structure of the XML data, which we call the path tree. We then summarize this tree to ensure that it fits in the available memory by deleting low-frequency nodes in the tree and replacing them with generic nodes containing more coarse-grained information about the deleted nodes. We also propose an alternate approach in which we store all paths in the data up to a certain length in a table of paths that we call the Markov table. We summarize the Markov table so that it fits in the available memory and use the summarized information for selectivity estimation. The paths of limited length in the Markov table are combined to estimate the selectivity of longer paths. 
The best previously known techniques that can be applied to this problem were developed by Chen et al. in [Chen/Jagadish 'Counting twig matches in a tree']. The authors of this paper actually considered a more general problem, considering both branching path expressions and specific data values found at the ends of the path expressions (rather than navigating based only on the structure of the XML data). We restricted the data structures developed in [Chen/Jagadish] to the simpler problem of estimating the selectivity of path expressions by storing the minimum amount of information needed for selectivity estimation and no information about values or path correlations. We found that our data structures were able to give much more accurate selectivity estimates with significantly less memory for this simpler case. The rest of this paper is organized as follows. Section 2 presents an overview of related work. Section 3 describes path trees and their summarization. Section 4 describes Markov tables. Section 5 presents an experimental evaluation of the proposed techniques... We present experiments on one synthetic data set and one real data set. The synthetic data set has 100,000 XML elements. Its unsummarized path tree has 3197 nodes and 6 levels. The average fanout of the internal nodes of this tree is 4.6. The frequencies of the nodes of the path tree follow a Zipfian distribution with skew parameter z = 1 [Zip49]. The Zipfian frequencies are assigned in ascending order to the path tree nodes in breadth-first order (i.e., the root node has the lowest frequency and the rightmost leaf node has the highest frequency). 50% of the internal nodes of this path tree have repeated tag names. This introduces some 'Markovian memory' in the data. 
For example, if two internal nodes of the path tree have tag name A, and one of the A nodes has a child node B while the other does not, then if we are at a node A, whether or not this node has a child B will depend on which A node this is, which in turn depends on how we got to this node from the root node. The real data set is an XML representation of the DBLP bibliography database. This dataset has 1,399,765 XML elements. Its unsummarized path tree has 5883 nodes and 6 levels. We also experimented with many other real and synthetic data sets with varying complexities and data distributions..." See also "Following the Paths of XML Data: An Algebraic Framework for XML Query Evaluation." By Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier.
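The Markov-table idea, estimating a long path's selectivity by combining the frequencies of its length-two subpaths, reduces to a short recurrence: start with the count of the first pair, then for each subsequent step multiply by the conditional frequency of the next pair given its starting tag. The counts below are invented for illustration, and the paper's summarization step (pruning the table to fit a memory budget) is omitted.

```python
# Frequencies of individual tags and of length-2 paths, as a Markov table
# would store them.  These counts are made-up illustrative numbers.
tag_freq = {"A": 100, "B": 300, "C": 500}
pair_freq = {("A", "B"): 200, ("B", "C"): 150}

def estimate(path):
    """Estimate the selectivity of path t1/t2/.../tn from pair frequencies:
    f(t1,t2) * f(t2,t3)/f(t2) * ... * f(tn-1,tn)/f(tn-1)."""
    est = float(pair_freq[(path[0], path[1])])
    for i in range(1, len(path) - 1):
        est *= pair_freq[(path[i], path[i + 1])] / tag_freq[path[i]]
    return est

print(estimate(["A", "B", "C"]))   # 200 * 150/300 = 100.0
```

The estimate is exact when the data really is Markovian; the "Markovian memory" discussed above is precisely the property that makes it only an approximation on real path trees.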

  • [April 06, 2001] "Speech Recognition Heads to Web via XML." By R. Colin Johnson. In EE Times (March 27, 2001). "The Internet's Extensible Markup Language (XML) has been adapted to voice-command technology by SpeechWorks International Inc. The company's SpeechGenie system is designed to integrate speech recognition into Web-based telephony applications. Based on technology from VoiceGenie Technologies Inc., SpeechGenie demonstrates how voice extensions to XML enable speech-recognition systems to access Web-based information. The product was formed by grafting VoiceGenie's VoiceXML Gateway onto SpeechWorks' speech-recognition engine and its Speechify text-to-speech engine. The advent of VoiceXML, now endorsed by the necessary World Wide Web standards committees, offers the opportunity for creating open systems like SpeechWorks' proprietary SpeechSite. Part of a fully customizable set of standard speech-enabled e-business solutions, SpeechSite is a prepackaged self-service telephone application that greets callers, routes their calls and responds to spoken requests 24 hours a day for Federal Express, America Online and other brand-name clients. By integrating VoiceGenie's VoiceXML Gateway with SpeechSite's resources, SpeechGenie extends the speech-enabled business solution into open-systems territory, the company said... VoiceGenie built its VoiceXML Gateway -- the server that answers requests from applications and speech-recognition engines -- by creating an XML schema. XML schemas express shared vocabularies, thereby allowing computers to carry out instructions crafted by programmers within a structured syntax and a semantics customized for their application. Portability is ensured by explicit platform abstractions within the syntax of VoiceXML, and by separate support of popular audio formats, speech grammar formats and user-interaction schemes -- all of which can be platform-independent. 
VoiceXML incorporates all the control flow mechanisms, such as if-then branches, of a computer language, and allows for the separation of service-side logic from user-interaction behaviors. The computational and database operators are not intended for heavy use, but routinely depend on external resources when heavy computation is needed. SpeechGenie provides an open-systems solution yet integrates the proprietary SpeechWorks automated speech-recognition and Speechify text-to-speech engines. As an integrated platform, SpeechGenie enables standard speech-driven applications to access Web-based information, conduct online transactions and manage personal communications such as e-mail and voice-activated dialing. SpeechGenie also handles operations administration and management such as user access, platform status and caching. Scheduled for availability in the second quarter, SpeechGenie comes with Genie Tools, a set of VoiceXML-based DialogModules in the form of prepackaged application building blocks. SpeechWorks provides self-service e-business speech-recognition software including its flagship SpeechWorks, SpeechSite, Speechify and SpeechSecure authentication products. Applications made from these tools can direct customer phone calls, obtain information and complete transactions by speaking over any phone." See "VoiceXML Forum."
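A minimal VoiceXML dialog of the sort a gateway like VoiceGenie's interprets looks like the fragment below, here simply parsed with Python's ElementTree to check that it is well-formed XML. Only the element names (`vxml`, `form`, `field`, `prompt`, `filled`) come from the VoiceXML specification; the form id and prompt wording are invented.

```python
import xml.etree.ElementTree as ET

# A minimal VoiceXML dialog: the platform speaks the prompt, listens for
# input that fills the field, then executes the <filled> block.
vxml = """
<vxml version="1.0">
  <form id="route_call">
    <field name="department">
      <prompt>Which department would you like?</prompt>
    </field>
    <filled>
      <prompt>Transferring you now.</prompt>
    </filled>
  </form>
</vxml>
"""

root = ET.fromstring(vxml)
print(root.tag, root.find("form").get("id"))   # vxml route_call
```

Because the dialog is just XML, it can be generated by any server-side templating system and fetched over HTTP, which is what makes the gateway architecture "Web-based" in the first place.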

  • [April 06, 2001] "A Web Services Primer." By Venu Vasudevan. From April 04, 2001. ['A review of the emerging XML-based web services platform, examining the core components of SOAP, WSDL and UDDI.'] "... Viewed from an n-tier application architecture perspective, the web service is a veneer for programmatic access to a service which is then implemented by other kinds of middleware. Access consists of service-agnostic request handling (a listener) and a facade that exposes the operations supported by the business logic. The logic itself is implemented by a traditional middleware platform. So what is the web service platform? The basic platform is XML plus HTTP. HTTP is a ubiquitous protocol, running practically everywhere on the Internet. XML provides a metalanguage in which you can write specialized languages to express complex interactions between clients and services or between components of a composite service. Behind the facade of a web server, the XML message gets converted to a middleware request and the results converted back to XML. Wait a minute, you say, access and invocation are only the bare bones, that would be like saying CORBA is only IDL plus remote procedure calls. What about the platform support services -- discovery, transactions, security, authentication and so on -- the usual raft of services that make a platform a platform? That's where you step up to the next level. The Web needs to be augmented with a few other platform services, which maintain the ubiquity and simplicity of the Web, to constitute a more functional platform. The full-function web services platform can be thought of as XML plus HTTP plus SOAP plus WSDL plus UDDI. At higher levels, one might also add technologies such as XAML, XLANG, XKMS, and XFS -- services that are not universally accepted as mandatory. Below is a brief description of the platform elements. 
It should be noted that while vendors try to present the emergent web services platform as coherent, it's really a series of in-development technologies. Often at the higher levels there are, and may remain, multiple approaches to the same problem. (1) SOAP - remote invocation; (2) UDDI - trader, directory service; (3) WSDL - expression of service characteristics; (4) XLANG/XAML - transactional support for complex web transactions involving multiple web services; (5) XKMS [XML Key Management Specification] - ongoing work by Microsoft and Verisign to support authentication and registration..." (1) Web Services Description Language (WSDL); (2) Simple Object Access Protocol (SOAP); (3) Universal Description, Discovery, and Integration (UDDI); (4) [W3C] XML Protocol; (5) WEBDAV (Extensions for Distributed Authoring and Versioning on the World Wide Web; (6) XML Key Management Specification (XKMS).
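The "XML plus HTTP" core of the platform can be made concrete by building a bare SOAP 1.1 request envelope by hand. The envelope namespace below is the real SOAP 1.1 one; the `GetQuote` operation and its `urn:example-quotes` namespace are invented for the sketch, and a real client would POST this body over HTTP with a SOAPAction header.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("SOAP-ENV", SOAP_ENV)

# Build Envelope -> Body -> method call, the minimal SOAP request shape.
envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
call = ET.SubElement(body, "{urn:example-quotes}GetQuote")  # invented operation
ET.SubElement(call, "symbol").text = "IBM"

print(ET.tostring(envelope, encoding="unicode"))
```

Everything above the envelope, such as describing `GetQuote` in WSDL or advertising it in a UDDI registry, layers on top of this same message without changing it.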

  • [April 06, 2001] "ebXML Ropes in SOAP." By Alan Kotok. From April 04, 2001. ['Our report on the latest happenings in ebXML covers their adoption of SOAP, and takes stock as ebXML nears the end of its project.'] "The Electronic Business XML (ebXML) project released three more technical specifications for review on 28-March-2001, including a new draft document on messaging services. This part of ebXML -- formerly known as transport, routing, and packaging -- had made more early progress than the other technical features, but it also came under more pressure to include the work of other initiatives, specifically the Simple Object Access Protocol (SOAP). Enhancements to the original SOAP specification made it easier for ebXML to join forces. But it also marked something of a change in operation for ebXML, now more willing to make accommodations with other related initiatives in order to achieve its goal of a single worldwide e-business standard... SOAP's importance extends beyond its definition of an XML-based message protocol. Several other e-business specifications based on XML -- most notably BizTalk and Universal Description, Discovery and Integration (UDDI) -- use SOAP for its messaging functions. ... The technical architecture specifications approved by the ebXML plenary at its mid-February meeting in Vancouver, Canada, provide a technical map for the other ebXML project teams in developing the details of the technology. The document also provides a look ahead into the end game for the initiative. Part of that strategy includes the critical ebXML registry specifications. Companies will have most of their early encounters with ebXML through the registries, as companies download business processes, list their capabilities to conduct e-business, and search for potential trading partners. 
At the same time as the release of the messaging specifications, ebXML also released for review two draft specifications for registries: one for the registry services and the other for the registry information model. The registry services document spells out the functions and operations of ebXML registries, while the information model details how registries organize and index the data they represent. Also on 28 March ebXML released for review the draft trading partner profile and agreement specifications. The ebXML specifications carve out specific technically-oriented functions for trading partner information stored in registries (profiles) and the rules of the road for conducting e-business (agreements). As a result, ebXML uses the terms collaboration-protocol profiles and collaboration-protocol agreements to distinguish them from the more comprehensive trading partner profiles and agreements that contain much more than technical details. EbXML expects to finish its technical specifications in May 2001 at its last meeting in Vienna, Austria. At that time, the participants plan to take up the business process and core components specifications, the last two pieces of the technology still in development." For bibliographic details and other description, see (1) "ebXML Specifications Completed and Submitted for Quality Review Process", (2) "ebXML Integrates SOAP Into Messaging Services Specification." General references: (1) "Electronic Business XML Initiative (ebXML)" and (2) "Simple Object Access Protocol (SOAP)."

  • [April 06, 2001] "XML-Deviant: XP Meets XML." By Leigh Dodds. April 04, 2001. ['The XML-Deviant has been watching advocates of the latest trend in software development, Extreme Programming, get to grips with XML. At least they have acronyms in common.'] "Extreme Programming (XP) is a software development methodology that has been causing as much of a stir in development communities as has XML. XP is the brainchild of Kent Beck and has been around since 1996, although over the last year or so it has been rapidly gaining acceptance by an increasing number of developers. Indeed the Extreme Programming mailing list supports a daily traffic level which rivals XML-DEV at its most vociferous. The methodology promotes several core principles of which the most well-known and critiqued is 'pair-programming': teaming up two developers at a single machine to improve code quality, while ensuring that no single developer takes isolated ownership of a particular section of code -- thereby enabling a higher degree of knowledge sharing. Other important aspects of XP include continuous integration, constant refactoring of code, and a heavy emphasis on unit testing..."

  • [April 06, 2001] "Transforming XML: Namespaces and XSLT Stylesheets." By Bob DuCharme. April 04, 2001. ['A guide to using XSLT to create documents that use XML Namespaces.'] "In XML a namespace is a collection of names used for elements and attributes. A URI (usually, a URL) is used to identify a particular collection of names. Instead of including the URI with every element to show which namespace it came from, it's more convenient to name a short abbreviation when a namespace is declared and to then use that abbreviation to identify an element or attribute's namespace. Many simple XML applications never need to declare and use namespaces. If they don't, the XML processor treats all elements and attributes as being in the default namespace. This may be the case with some of your source documents, but it's certainly not the case with your XSLT stylesheets: the declaring and referencing of the XSLT namespace is how we tell an XSLT processor which elements and attributes in a stylesheet to treat as XSLT instructions... In this column, we've seen how to control the namespaces that are declared and referenced in your result document. Next month, we'll see how your XSLT stylesheet can check which namespaces are used in your source document and perform tasks based on which namespace each element belongs to." See "Namespaces" and for related XSLT resources, "Extensible Stylesheet Language (XSL/XSLT)."
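DuCharme's column works in XSLT itself; as a language-neutral sketch of the same namespace mechanics (declaring a prefix for a URI and qualifying element names with it), here is a minimal Python standard-library example. The `inv` prefix and `http://example.org/invoice` URI are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Register a prefix for a (hypothetical) namespace so that serialization
# emits xmlns:inv="..." instead of an auto-generated ns0 prefix.
NS = "http://example.org/invoice"          # assumed example URI
ET.register_namespace("inv", NS)

# Element names are written in Clark notation: {uri}localname.
root = ET.Element(f"{{{NS}}}invoice")
ET.SubElement(root, f"{{{NS}}}total").text = "42.00"

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The serialized result carries the namespace declaration on the root element, which is exactly the "declare once, reference by prefix" convenience the column describes.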

  • [April 06, 2001] "XML And Distributed Computing." By Boris Lublinsky. In XML Journal Volume 2, Issue 4 (April, 2001), pages 8-16. ['There are three big challenges when implementing distributed computing systems: data transfer, interface management, and remote invocation. This article examines how XML can help with each of these, and how XML-based semantic messaging can unify disparate distributed architectures.'] "Most popular distributed computing models, such as DCE, DCOM, RMI, and CORBA, attempt to present the developer with the standard function/method invocation paradigm, which is exactly the same as a local invocation... This paradigm is convenient for the developer, because it hides the distribution aspect from him or her, making it nearly transparent. Unfortunately, the penalty for this convenience is tight coupling between communicating applications... For several years semantic messaging has been suggested as a way to solve these problems. Semantic messaging is defined many different ways; even execution and transactional semantics have been considered part of the definition. Throughout this article we define a semantic message as purely data semantics - a message should contain data and a definition of what this data element represents. Thus, applications that deal with semantic messages are not driven by the sequence of data or its type, but rather by naming conventions (data semantics). This paradigm is significantly better suited for implementing a data transfer. The simplest case of semantic messages is name/value pairs, which are used internally by distributed computing models. These types of messages are self-describing in the sense that transferred data is defined not by its position in the data stream, but rather by the name of this particular piece of data. Self-describing data allows two applications to share data semantics (names), instead of agreeing on an internal data representation and a data sequence within the message. 
Instead of parsing incoming information and providing access to every piece (as in the IDL-based approach), a self-describing data approach presents all the input data to the application as a single message. This approach forces an application to extract the required information by parsing the incoming message. XML enables self-describing structured data of any complexity to be implemented in a uniform fashion using XML documents. The availability of standardized XML parsers and the standard representation of XML documents (DOM) simplifies the parsing and extracting of XML data on the fly. Although this approach requires more effort than standard distributed computing models, it allows for significantly looser coupling between applications... XML-based semantic messaging has revolutionized distributed systems development. The main advantages of using semantic messaging for building distributed systems are: (1) Significantly more flexible data transfer: XML-based semantic messaging lets you deal with data semantics rather than data position and type. (2) Simplified interface management: XML-based semantic messaging lets you simplify interface management by expressing parameters in the form of XML documents, thus making them more generic and more resilient to changes in the parameters. (3) Simplified remote invocation: XML-based semantic messaging, coupled with the introduction of the gate object, reduces the number of required proxy/stub pairs to one per process and doesn't require coupling the components' life cycles to improve overall performance. Using XML-based semantic messages and a gate-based approach you can create a generic architecture that provides any kind of access protocol (HTTP, CORBA, DCOM, etc.) to the existing systems..."
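The name/value style of self-describing message that Lublinsky describes can be sketched in a few lines: the consumer pulls fields out by name rather than by position or declared type, so reordered or extra elements don't break it. Element names here are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A self-describing message: each value is identified by its element
# name, not by its position in a fixed record layout.
message = """<transfer>
  <account>12-345</account>
  <amount>250.00</amount>
  <currency>USD</currency>
</transfer>"""

doc = ET.fromstring(message)

def field(doc, name, default=None):
    # Look a field up by name; an absent field degrades gracefully
    # instead of shifting every later value, as a positional format would.
    node = doc.find(name)
    return node.text if node is not None else default

amount = field(doc, "amount")
currency = field(doc, "currency")
memo = field(doc, "memo", "n/a")   # field not present in this message
print(amount, currency, memo)
```

This is the loose coupling the article contrasts with IDL-style marshalling: sender and receiver agree only on names, not on a byte-level layout.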

  • [April 06, 2001] "TS-SQL." By Jinbo Chen. In XML Journal Volume 2, Issue 4 (April, 2001), pages 32-35. ['XML enables the integration and transformation of data from distributed data sources. These data sources may be relational database systems, ERP systems, legacy applications, or a combination. By accessing data from one or more sources in an XML format, applications can focus on the functional logic and let the data layer deal with multiple systems and different formats with different drivers and tools. To enable the easy and flexible creation of XML data from these data sources, I developed a platform- and database-independent Java tool - tree-structured SQL (TS-SQL).'] "Structured Query Language (SQL) is a powerful and flexible tool that allows users to define and manipulate the data in the database. However, it's not intuitive in some cases, such as queries with both self- and outer-joins. Furthermore, SQL query results are always in a flat structure, a denormalized representation of the original table data. The relationship between tables isn't preserved and some redundant information is presented as well, which makes processing difficult. To restore the relations, extra work is needed to transform retrieved flat-structured data... Using TS-SQL as the data layer, the query result is automatically structured in tree-structured XML format, thus preserving all the data relationships and eliminating redundant data. This makes the subsequent processing and use of the data extremely easy and flexible, and significantly reduces development time and overall cost... TS-SQL takes an XML template file in which the user specifies query content and structure, and outputs an XML document. From a system architectural point of view, TS-SQL consists of three key components: (1) Data source: Handles all database query requests from the template processor. (2) Template parser: Parses an XML template file and creates a template object. 
(3) Template processor: Instantiates a template object given runtime inputs. Instantiation of a template object consists of executing SQL statements specified in the template files and formatting query result data in XML format. All retrieval and formatting are transparent to end users; they only need to know how to write a template file... TS-SQL is a powerful and flexible XML tool. It efficiently retrieves complex data using a query template, preserves data relationships, brings the power of XML technologies to your database and e-business systems with ease, and all retrieval and output formatting are transparent to end users. Also it's both database and platform independent with distributed information collection support. The binary is available for download.
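TS-SQL itself is Chen's own tool, but its central idea -- returning query results as tree-structured XML that preserves 1:N relationships instead of a flat, denormalized join -- can be sketched with Python's standard library. The schema and data below are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory sample schema (invented for illustration).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp (id INTEGER PRIMARY KEY, dept_id INTEGER, name TEXT);
INSERT INTO dept VALUES (1, 'Sales'), (2, 'Engineering');
INSERT INTO emp VALUES (10, 1, 'Ada'), (11, 1, 'Grace'), (12, 2, 'Alan');
""")

# A flat join would repeat the department on every employee row.
# Nesting employees under their department keeps the relationship
# explicit and removes the redundancy.
root = ET.Element("depts")
for dept_id, dept_name in db.execute("SELECT id, name FROM dept"):
    d = ET.SubElement(root, "dept", name=dept_name)
    for (emp_name,) in db.execute(
            "SELECT name FROM emp WHERE dept_id = ?", (dept_id,)):
        ET.SubElement(d, "emp").text = emp_name

tree_xml = ET.tostring(root, encoding="unicode")
print(tree_xml)
```

A downstream consumer can now walk the tree directly, with no extra work to reconstruct which employee belongs to which department.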

  • [April 06, 2001] "Hybrid XSL Transformation Engine." By Ravi Akireddy (Comergent Technologies Inc.). In XML Journal Volume 2, Issue 4 (April, 2001), pages 36-40. ['XSLT is an extremely popular technology that supports business-to-business and business-to-customer integration, and device-based Web access. Due to its ability to deliver portable data, XML plays a key role in e-commerce applications. Organizations that are developing XML vocabularies and DTD/schemas are playing a great role in enabling the standardization of intrabusiness and business-to-business communications.'] "Effective and intelligent processing of XML data will save hundreds of hours of development work and can simplify complex business logic in your application. Using XSLT tools to process these XML documents into different formats, such as HTML, XML, and plain text, is an ideal approach. Of course, this is not the only solution, but it's definitely one of the best. Transformation of XML data is not just converting/rendering it into a presentable view, it can also be used to simplify business logic definitions and transform business logic data from one form to another. Multiple transformations on a single XML document can be done in a single automatic transaction using SAX2 filters, which is another powerful implementation to supplement the strength of XSLT. If you have some experience with stylesheet transformation or transformation techniques, you'll definitely have questions about its speed, efficiency, and memory-related issues. These are top priority issues and major concerns in any application, especially e-commerce ones. Rendering XML documents with stylesheets is a killer consumer of CPU processing time. Performance-wise, using transformations in real-time Web-based applications is a big bottleneck and also memory intensive. 
In this article, I've attempted to address these issues in the form of a transformation engine, a tool that encapsulates two different XSL transformers - Sun's Processor, which uses a binary .class form of stylesheets, and James Clark's XT... Sun's XML Technology Center has come out with an efficient and neat way of transforming/rendering XML data using stylesheets. According to their approach, stylesheets are compiled into '.class' files called translets and are used to process XML data. Performance gain achieved by this approach was impressive enough to inspire this article. I observed a performance improvement of over 50% in transforming XML documents and a smaller memory footprint. The intent of this article is to familiarize users with the different transformation techniques and also show the advantages that a particular processor gives, especially Sun's implementation..." Note, from Sun XML resources web site, that the XSLT Compiler (5th Preview Version) uses an important update to the Byte Code Engineering Library (BCEL). The XSLT Compiler is a "Java-based tool for compiling XSL style sheets into a lightweight and portable Java class for transforming XML documents." See "Extensible Stylesheet Language (XSL/XSLT)" and "XSL/XSLT Software Support."
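The translet approach wins by paying the stylesheet's parse/compile cost once and amortizing it over many documents. The standard Python library has no XSLT processor, so the following sketch uses `string.Template` as a stand-in "compiled stylesheet" purely to illustrate that compile-once/apply-many pattern; it is not an XSLT implementation.

```python
from string import Template

# Stand-in for a compiled stylesheet: Template parsing happens once;
# substitution (the per-document "transform") is cheap and repeatable.
_compiled = {}

def get_transform(source):
    # Compile on first use, then reuse -- the same amortization that
    # translets apply to real XSLT stylesheets.
    t = _compiled.get(source)
    if t is None:
        t = _compiled[source] = Template(source)
    return t

page = get_transform("<h1>$title</h1>")
html1 = page.substitute(title="Report A")
html2 = page.substitute(title="Report B")
print(html1, html2)
```

In a real engine the cache key would be the stylesheet URI and the cached object a compiled transformer class, but the control flow is the same.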

  • [April 06, 2001] "WebSphere Studio Leverages XML to Empower Web Developers." By Amy Wu and Sharon Thompson. In XML Journal Volume 2, Issue 4 (April, 2001), pages 56-60. ['A good Web development tool should be easy to use, yet robust enough to create and edit static and dynamic pages, organize and publish files, and help the developer properly maintain the site. IBM's WebSphere Studio is a total project management workbench with several integrated tools that assist developers in all stages of Web development. This article introduces you to Studio's wizards, editors, and publishing functions and exposes some of Studio's weaknesses as well.'] "Studio may be used in conjunction with some of the more common version control software (VCS). But even without an integrated VCS, Studio allows multiple users to access a project and check files in and out. Throughout the development process, Studio assists with link management that maintains links even while users move the source files around within the project. Studio's various editors and wizards are particularly helpful for nonprogrammers. Wizards can enable even novice users to generate server-side logic and add powerful functions to Web sites. They also leverage XML technology to make it easy for users to create Java servlets or JavaServer Pages that access databases and implement JavaBeans. When the development process is complete, or when the team is ready, Studio's powerful publishing feature enables them to easily publish files to a local server for review or to a live production server. Studio assists developers from page creation through editing and finally to publishing. WebSphere Studio includes three powerful wizards: SQL, Database, and JavaBean. These help developers easily create dynamic content for Web sites through a simple step-by-step procedure. 
Studio's SQL wizard generates a SQL file that specifies the query and information needed for the Database wizard so it can produce pages that access the database. The JavaBean wizard creates Web pages that utilize any JavaBeans you may have in your project. The JavaBean and Database wizards have similar steps and options. Both enable the user to select the code-generation style, generate markup languages, create error pages and specify display fields in the input and output pages, as well as choose which JavaBean methods to invoke, and in which order..." Note: IBM WebSphere Studio "provides an easy-to-use tool set that helps reduce time and effort when creating, managing and debugging multiplatform Web applications. A tool for visual layout of dynamic Web pages, Studio supports JSPs, full HTML, JavaScript and DHTML, uses wizards (for generating database-driven pages), and updates and corrects links automatically when content changes. It has built-in support for the creation, management and deployment of Wireless Markup Language (WML), Voice Extensible Markup Language (VXML) and Compact HTML for pervasive devices."

  • [April 05, 2001] "Why URLs are good URIs, and why they are not." By Pierre-Antoine CHAMPIN, Jérôme Euzenat, and Alain Mille. ["There is a recurring debate on both RDF lists about URIs, what they mean, and how some problems with RDF come from problems with them. We think there is indeed a problem with URIs, and especially with URLs used as URIs. Here is an attempt to clarify those problems and give some pieces of solution."] "Uniform Resource Identifiers or URIs were first designed to offer a global and uniform mechanism to identify network accessible resources. More recently, the will to achieve the Semantic Web, and more particularly the Resource Description Framework (RDF), made them a base vocabulary to describe not only network accessible resources, but any resource. As a matter of fact, people are used to handling URIs, but mostly one kind of them: Uniform Resource Locators or URLs. Hence, whenever a resource needs to be identified, a URL is used which corresponds more or less to that resource. It is this 'more or less' part which is concerning: it has become a problem for RDF, and may become, in our opinion, a serious obstacle to the Semantic Web. We will first present our understanding of the notion of resource, which is the ground of the following discussions. Then we will explain why we think that URLs are often misused when employed as URIs, while they nevertheless have some advantages. Finally we discuss straightforward solutions which could be used to keep those advantages, without the drawbacks, based on Uniform Resource Names (URNs)... [Conclusion:] In this note, we discussed the issue of what exactly is identified by URLs, when they are employed as Resource Identifiers (URIs). As a matter of fact, we think that they are often misinterpreted when used as such. The interpretation we proposed, though not the most intuitive, seems to be more robust than more intuitive ones. 
We then discussed a way of building URNs inheriting the good properties of URLs (unicity and retrievability) and allowing any resource (network retrievable or not) to be identified. We believe that such URN schemes are necessary to achieve the goals of the Semantic Web, since they provide cleaner identifiers than URLs. We finally discussed the necessity of implementing L2Ns services so as to encourage the use of URNs." Also in PDF format. See the response of Dan Connolly (W3C).
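One concrete, widely deployed URN scheme that shows the location-independence the authors argue for is `urn:uuid:`, which Python can mint directly. This is offered as an illustration of the URN idea generally, not of the specific scheme the paper proposes.

```python
import uuid

# A urn:uuid name identifies a resource without binding it to any
# network location, unlike a URL that may move or vanish.
resource_id = uuid.uuid4()
urn = resource_id.urn          # 'urn:uuid:<hex-digits-with-dashes>'
print(urn)

# The identifier round-trips independently of where (or whether)
# the resource is retrievable.
parsed = uuid.UUID(urn[len("urn:uuid:"):])
```

Resolving such a name back to a retrievable copy is exactly the job of the name-to-location services the note says would need to be implemented.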

  • [April 05, 2001] Caltrop - an XML/XSLT calendar toy. Work in progress. By Charles McCathieNevile [with help from Dan Connolly, Max Froumentin, Karl Dubost]. "About this URI: this is going to have a schema describing the XML/XSLT calendar system that I am building. But at the moment it is a placeholder for the Namespace -- please use that exact string if you are going to link to it (can't imagine why you would at the moment - the thing is an alpha that doesn't work yet), since the HTML version may disappear by and by or morph into some other form. The goal [is to] produce a calendar system implemented in XML, using XSLT to generate various views of the calendar. The idea is that this will allow multiple content-types to be generated (I was initially looking for XHTML and iCal to be generated from one source), and that it is relatively easy to add data. It's really just a database application - it could be done in SQL/PHP (which is how I was initially going to do what I wanted) but this seems like a neat implementation demo. It struck me that we could also use the source to provide different calendars. What happens and how to make it happen: The source XML data is transformed using an XSLT stylesheet (at the moment there is only one, for generating XHTML). The form below lets you select a stylesheet and a source file... How it works: The XML source document uses a schema that I am in the process of making up. There is an XSLT transform to produce XHTML -- for the WAI Conferences page - the original motivator for this work. They are linked, so anyone with an XSLT browser should get the calendar list. There is also an online version of the service that can be used (but it produces an xml file, so if you can't deal with those you need to save it and pretend it is HTML)... At the moment there is a calendar element, with any number of event elements as children..."
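The "one XML source, many views" idea behind Caltrop can be sketched without XSLT: a calendar element with event children, rendered into one of several possible output formats. The element and attribute names below are invented, not Caltrop's actual (still-in-progress) schema.

```python
import xml.etree.ElementTree as ET

# A toy calendar source in the spirit of Caltrop: a calendar element
# with event children (names invented for this sketch).
source = """<calendar>
  <event date="2001-04-05" title="WAI teleconference"/>
  <event date="2001-05-14" title="ebXML plenary, Vienna"/>
</calendar>"""

cal = ET.fromstring(source)

# One "view" of the data: an XHTML list. Other transforms could emit
# iCal or another format from the same source, as the project intends.
items = "".join(
    f"<li>{ev.get('date')}: {ev.get('title')}</li>"
    for ev in cal.findall("event"))
xhtml = f"<ul>{items}</ul>"
print(xhtml)
```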

  • [April 05, 2001] "Comparison of Technologies Submitted for the DSML 2.0 Specification." Draft document prepared by Jeff Bohren (Access360), submitted by Gavenraj Sodhi to the DSML TC for discussion. April 4, 2001. "Directory Services Markup Language (DSML) is a proposed standard for representing LDAP in XML. The first version (1.0) of DSML defined how to represent LDAP schema and data, but did not address LDAP operations and protocol issues. This paper is a summary of three technologies that have been submitted to the DSML organization for consideration as the next generation (2.0) for the DSML standard... Directory Access Markup Language (DAML), submitted by Access360: Access360 has offered the DAML specification for consideration as part of the DSML 2.0 specification. The DAML specification was defined by Access360 as the protocol for communication to agents. To be as standards-based as possible, the DAML syntax matches the LDAP syntax very closely, but is represented as XML text rather than BER encoded data ... Novell has submitted an XML specification (NDS-DTD) that is currently part of the Dir-XML product for consideration as part of the DSML 2.0 specification. The NDS-DTD is designed mainly to support the concept of an LDAP join engine (system that keeps data from 3rd party data sources synchronized with its representation in an LDAP directory)... iPlanet has submitted the XML schema from the iPlanet XMLDAP Directory Gateway (iXDGW) product for consideration as part of the DSML 2.0 specification. The XMLDAP, unlike DAML and Dir-XML, does not define an XML DTD. Instead, it defines a generic template specification that can be used in conjunction with their product to transform representation of data in XML from one form to another. They then define a default implementation of that transformation language that can transform LDAP like data..." See "Directory Services Markup Language (DSML)." [source]
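For readers unfamiliar with DSML 1.0's "LDAP data as XML" idea, here is a small sketch of an LDAP entry rendered in a DSML-style entry/attr/value structure. Treat the namespace URI and exact element names below as an approximation of DSML 1.0, not an authoritative rendering of the spec.

```python
import xml.etree.ElementTree as ET

DSML = "http://www.dsml.org/DSML"   # DSML 1.0 namespace (assumed here)
ET.register_namespace("dsml", DSML)

def q(local):
    # Qualify a local name in the DSML namespace (Clark notation).
    return f"{{{DSML}}}{local}"

# An LDAP entry as DSML-style XML: the entry carries its DN, and each
# attribute holds one or more values.
root = ET.Element(q("dsml"))
entries = ET.SubElement(root, q("directory-entries"))
entry = ET.SubElement(entries, q("entry"),
                      dn="uid=jsmith,ou=people,dc=example,dc=com")
attr = ET.SubElement(entry, q("attr"), name="sn")
ET.SubElement(attr, q("value")).text = "Smith"

dsml_text = ET.tostring(root, encoding="unicode")
print(dsml_text)
```

What DSML 1.0 left out, and what the three 2.0 submissions add in different ways, is a representation of the LDAP *operations* (bind, search, modify) alongside this data format.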

  • [April 05, 2001] "Peering into the Future. XML plays a critical role as P2P technologies head for the enterprise." By Stuart J. Johnston and Steve Gillmor. In XML Magazine Volume 2, Number 2 (April / May 2001). ['We've created a virtual conversation from discussions that XML Magazine editor-in-chief Steve Gillmor, contributing editor Stuart J. Johnston, and editorial director Sean Gallagher conducted with several of the [P2P] major architects of this burgeoning and vital new movement, letting them speak to you peer to peer.'] "Today, combined with a common language for communication -- XML -- a gaggle of those early visionaries are now either shipping or are near to shipping production P2P applications built to enable just that: collaborative virtual workspaces. Ideally, these packages will enable realtime collaboration on, and sharing of, not only documents and charts but also rich multimedia such as audio and video... And the nascent market for such tools is already becoming crowded. Almost daily, new companies emerge from stealth mode into the public eye, touting their brand, flavor, or style of P2P technology for business users. Two cases in point: Groove Networks, the brainchild of Lotus Notes creator Ray Ozzie, and OpenDesign, a start-up funded by former Microsoft chief technology officer Nathan Myhrvold. Both use XML extensively. Most of the new P2P start-ups that have begun popping up like mushrooms in a spring rain are basing their technologies around XML to be able to interoperate with the rest of the world..." Comments on XML and P2P are provided by: (1) Alex Cohen, chief evangelist for OpenDesign, a Bellevue, Washington-based startup spun out of Intellectual Ventures, a partnership between two former Microsoft executives: chief technology officer Nathan Myhrvold and chief software architect Edward Jung. 
OpenDesign is working on XML-programmable infrastructure software to support distributed computing in a hybrid peer-to-peer and client-server environment, including dynamic load balancing. (2) Charles Fitzgerald, Microsoft's director of business development; (3) Jonathan Hare, CEO of Consilient Inc. - the company's technology is designed to aggregate business processes as XML documents called sitelets; (4) Ken Levy as director of technology for, and David Pool as CEO and founder of, XMLFund, a leading venture capital fund located in Bellevue, Washington that specializes in XML startups; (5) Tim O'Reilly, CEO of O'Reilly & Associates; (6) Ray Ozzie, founder and CEO of Groove Networks. For the Groove/XML connection, see "A New Groove," by Steve Gillmor and Ray Ozzie. Steve Gillmor spoke with Ozzie about Groove and its underlying XML object store and communications technologies.

  • [April 05, 2001] "XML Through the Wall." By Daniel Nehren. In XML Magazine Volume 2, Number 2 (April / May 2001). ['Connecting client-server applications through a firewall is difficult, because usually the only port you've got is the one you use for the Web. Discover how our framework using HTTP tunneling and XML surmounts this obstacle.'] "You've built a servlet application that connects to your corporate database and provides a specific service to your customers. This application is protected by a powerful authentication mechanism, has a sophisticated connection-pooling framework to improve scalability, and is used by thousands of clients worldwide. Some users complain that the Web interface is inefficient when the application is used for highly repetitive tasks. They say that they would like to have a desktop application with more powerful features to make their work easier. Now the question arises: How will you provide the users access to the database from your application if it is outside the corporate firewall? You know that the network administrator won't open a special port so that your application can connect to the database!... The way to solve this problem is to use HTTP tunneling. This method involves wrapping your request in an HTTP POST request that will be handled by a CGI application that lives on the Web server inside the firewall. It's like a Trojan horse: It looks like a normal request, but it hides an unexpected payload. How should we format our requests? By using XML, of course, which is the perfect candidate for carrying the payload to an HTTP request. By no means is this idea original; XML over HTTP is a hot new field. New specifications are being written that could become the standard communication protocol for distributed applications. Simple Object Access Protocol (SOAP) is the most accredited of the pack... 
Let's build a simple framework that has XML over HTTP as the underlying communication strategy and that lets us create a set of services that can be accessed from desktop applications scattered to the four corners of the Internet world. We first need to establish the syntax of our generic requests and responses..."
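The article's framework is servlet/CGI-based; the envelope idea itself -- an XML request wrapped in an ordinary HTTP POST to the one port the firewall allows, unwrapped and dispatched on the other side -- can be sketched in Python. The element names, service name, and gateway URL below are all invented for the sketch; no network request is actually sent.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

# Client side: build a generic request envelope (invented vocabulary)
# that a server-side handler behind the firewall would unwrap.
req = ET.Element("request", service="customerLookup")
ET.SubElement(req, "param", name="id").text = "1042"
payload = ET.tostring(req, encoding="utf-8")

# The tunnel is just an ordinary POST; the XML payload is the real
# request. (Constructed but not sent here -- the URL is hypothetical.)
http_req = Request("http://gateway.example.com/tunnel",
                   data=payload,
                   headers={"Content-Type": "text/xml"})

# Server side (sketch): unwrap the envelope and dispatch on the
# service name rather than on a URL or port.
incoming = ET.fromstring(payload)
service = incoming.get("service")
params = {p.get("name"): p.text for p in incoming.findall("param")}
print(service, params)
```

The response would travel back the same way: an XML document in the body of the HTTP reply, parsed by the desktop client.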

  • [April 05, 2001] "Overcoming the Trials of Data Gathering." By Alan Sproat. ['Gathering data from HTML is not always easy; however, if you transform the HTML documents to XML, it becomes a whole lot easier. Alan discusses a simple way of doing this.'] "Gathering data from HTML pages is easy once you transform the documents to XML. Data is everywhere. You see it every day on Web pages of your clients, suppliers, and partners. It could be simple data, such as contact information, or complex data, such as a quarterly report. Whatever the case, you need an efficient way to gather this data. The trouble with most Web sites is that HTML is great at presentation but terrible at data definition. So, gathering data from HTML is not always easy. However, with reasonably well-formatted HTML and some minor editing, you can turn these Web pages into manageable XML from which you can extract the data you need. Not only that, but the XML also lets you know if the document format has changed so that you can perform the necessary updates... Now that you have your XML document, what will it look like? Most data on HTML pages is contained within tables. This translates to numerous, repetitive combinations... Due to the sometimes complex and convoluted XPath syntax caused by nested tables and colspan and rowspan attributes, you will probably find it helpful to use a tool that visually displays the result set from an XPath syntax. For example, IBM's XSL Editor displays the XPath location of an element in a tree view of an XML document... At this point, there are three possible methods of pulling the data out of the XML document: DOM, XSLT, or SAX. Extracting the data from the XML document using DOM is fairly straightforward. You need to locate each piece of data and place its value in a variable or object property for later storage or further processing... The problem with gathering data from any format is that formats are almost never constant. 
This is especially true of HTML pages -- a site may change its look completely from one week to the next. With HTML, when the look changes, so may the location of the data within the format. Each of these three methods -- DOM, XSLT, and SAX -- may be extended not only to gather the data, but also to verify the format of the page where your data resides... Until XHTML becomes prevalent, or until all your data sources can send you XML documents, 'scraping' data off their Web pages may be the best way to get the information you need. Using XML tools such as the DOM, XSLT, and SAX, you can not only process the data, but you can find out when the format of the HTML page has changed. You don't need to wait for the data source to inform you of changes..." OpEd: This article reminded me of the sense of irony I felt in about 1993, when I investigated the state of research on "SGML+OCR+ICR". Despite the availability of SGML content management and document production tools for (then) over five years, it was obvious to Xerox that lots of money might still be made by building expensive AI-based OCR software that could scan paper-print copy, produced from WYSIWYG document systems, which would continue to be used for another decade or two. Culturally, it's just more sensible to screw up your information with a "presentation oriented" document system, and then pay lots of money trying to get the information back via scanning and OCR-ing legacy documents. Some things never change.
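Sproat works with DOM/XSLT/SAX over tidied XML; as a minimal event-driven sketch of the same "most data lives in tables" observation, here is a scraper built on Python's standard HTML parser. The sample page is invented for the example.

```python
from html.parser import HTMLParser

# Minimal "screen scraping" in the article's spirit: pull cell text
# out of an HTML table. Real pages would first be tidied into
# well-formed XML, but html.parser tolerates typical HTML as-is.
class TableScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

page = ("<table><tr><th>Name</th><th>Phone</th></tr>"
        "<tr><td>Smith</td><td>555-0100</td></tr></table>")
scraper = TableScraper()
scraper.feed(page)
print(scraper.rows)
```

The same event hooks give a natural place to implement the format check the article recommends: if the header row no longer matches what you expect, the page layout has changed.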

  • [April 05, 2001] "Tim Bray: XML from the Inside." By Steve Gillmor. In XML Magazine Volume 2, Number 2 (April / May 2001). ['XML Magazine Editor-in-Chief Steve Gillmor interviews Tim Bray, XML coeditor, about the leading role of XML in the future of the Web.'] "Tim Bray is the co-editor of XML 1.0 and Namespaces in XML, and since 1999, CEO of Antarctica Systems. He sat down with XML Magazine editor in chief Steve Gillmor for a conversation about XML's ubiquitous role at the leading edge of the technology revolution. Tim and Steve also discuss these XML-related topics: Peer-to-peer technologies; Layering cross-Internet services over HTTP; The centricity of the browser to the Internet; Groove and other up and coming P2P products; Lotus Notes; XML's relationship with open source; Visual Net, the company's recent product that provides greater accessibility for network environments. Final A: "[Visual Net] could have been built without XML, but it would have been immensely more difficult and expensive. The strong decoupling of the browser and the server, the internationalization -- our thing is totally international, end to end today -- it works in Japanese, Chinese, Cyrillic, you name it. I've got to say IE, the browser, is superbly well built-out on the internationalization side. So XML made that possible. One of the original dreams of the people who built XML was that it would be a smarter publishing format -- a format that you could ship down into the client and make it faster and more involving and more interactive by also running some code on that client, and making the information-publishing experience better than it is now where you have to do all the work on the server and you ship down flat pages one at a time. 
Visual Net is one of the first real working examples of what can happen when you send the code to the client and do the work on the client, and you can get something that's much faster and more engrossing and interactive than the traditional, totally server-side-based publishing mechanism. Clearly, nobody would reasonably try and do something like that without basing it on XML. That was what XML was built for and it's the one area where XML has perhaps failed to come up to the potential dreams of its creators. XML has succeeded beyond any measure of hope in the areas of B-to-B, systems integration, glue-ware of various kinds, Web-site architecture, and so on -- but, still, we were hoping for a more intelligent publishing format for the future. Web publishing still remains kind of dumb. Treating all these powerful PCs on desktops as dumb terminals, and the server makes the determination as to how the screen should look -- that's just not the right way to do it..."

  • [April 05, 2001] "Working with XML in the .Net Platform." By Dan Wahlin. In XML Magazine Volume 2, Number 2 (April / May 2001). ['Microsoft's .Net platform includes classes that can be used to leverage the power of XML. Columnist Dan Wahlin discusses a few of the classes in the System.Xml assembly that can get you started using XML in .Net applications.'] "XML's growth in popularity and utility over the past few years has resulted in more software platforms providing support for XML. Microsoft's .Net platform is no exception; it provides robust end-to-end XML support. Let's look at several classes found within .Net's System.Xml assembly (Beta 1) and show how they can be used to leverage the power of XML. What is an Assembly? The .Net SDK defines an assembly as: 'a collection of types and resources [which] are built to work together and form a logical unit of functionality, a 'logical' dll.' Basically, an assembly takes several physical files such as interfaces, classes, resource files, and so forth, and creates metadata (referred to as a manifest) about how the files work together. The assembly can also contain information about versioning and security. The .Net platform allows assemblies to be used by applications without relying on regsvr32.exe for registration in the registry. For ASP.Net applications, this means that a custom assembly (one that you may have created) can be used by an ASP.Net page simply by copying the assembly to the bin directory of the application. Fortunately, you won't have to create any custom assemblies to work with XML in the .Net platform (although you can). XML has been integrated directly into the platform and many .Net classes are available to use once you understand how to access them... While the System.Xml assembly has many more classes than I have covered, these samples will get you up to speed quickly on working with XML in .Net applications. 
To see some of these classes in action in an application, take a look at 'Make Web Browsing Easier'. It provides a good basis for learning how to use some of the classes within the System.Xml assembly to construct an XML-based menu application. In future columns, I'll cover additional ways to leverage XML in the .Net platform."
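The column's code samples are in C# against the System.Xml classes; as a rough, language-neutral sketch of the same parse-and-navigate pattern, here is a Python version that reads a hypothetical menu document like the one the 'Make Web Browsing Easier' application builds (element and attribute names are invented, not taken from the article):

```python
import xml.etree.ElementTree as ET

# Hypothetical menu document; tags and attributes are illustrative only.
MENU_XML = """
<menu>
  <item label="Home" url="http://example.com/"/>
  <item label="News" url="http://example.com/news"/>
</menu>
"""

def load_menu(xml_text):
    """Parse the menu document into a list of (label, url) pairs."""
    root = ET.fromstring(xml_text)
    return [(item.get("label"), item.get("url"))
            for item in root.findall("item")]

menu = load_menu(MENU_XML)
```

The same workflow -- load the document, walk a known element structure, pull out attribute values -- is what the System.Xml reader and DOM classes expose, just with streaming and validation options layered on top.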

  • [April 05, 2001] "Clean up Your Wire Protocol with SOAP, Part 1. An Introduction to The Basics of SOAP. [Wire Protocol.]" By Tarak Modi. In JavaWorld (March 2001). ['SOAP is not just another buzzword. It is a powerful new application of vendor-agnostic technologies, such as XML, that can help take the world of distributed programming to new heights. This article, the first in a series of four articles, introduces you to the basics of SOAP.'] "SOAP stands for Simple Object Access Protocol. In a nutshell, SOAP is a wire protocol similar to IIOP for CORBA, ORPC for DCOM, or the Java Remote Method Protocol (JRMP) for Java Remote Method Invocation (RMI). At this point you may be wondering, with so many wire protocols in existence, why do we need another one? In fact, isn't that what caused the problem discussed in the opening paragraph in the first place? Those are valid questions; however, SOAP is somewhat different from the other wire protocols. Let's examine how: While IIOP, ORPC, and JRMP are binary protocols, SOAP is a text-based protocol that uses XML. Using XML for data encoding gives SOAP some unique capabilities. For example, it is much easier to debug applications based on SOAP because it is much easier to read XML than a binary stream. And since all the information in SOAP is in text form, SOAP is much more firewall-friendly than IIOP, ORPC, or JRMP. Because it is based on vendor-agnostic technologies -- namely XML, HTTP, and the Simple Mail Transfer Protocol (SMTP) -- SOAP appeals to all vendors. For example, Microsoft is committed to SOAP, as are a variety of CORBA ORB vendors such as Iona. IBM, which played a major role in the specification of SOAP, has also created an excellent SOAP toolkit for Java programmers. The company has donated that toolkit to the Apache Software Foundation's XML Project, which has created the Apache-SOAP implementation based on the toolkit. The implementation is freely available under the Apache license. 
Returning to the problem stated in the opening paragraph, if DCOM uses SOAP and the ORB vendor uses SOAP, then the problem of COM/CORBA interoperability becomes significantly smaller. SOAP is not just another buzzword; it's a technology that will be deeply embedded in the future of distributed computing. Coupled with other technologies such as Universal Discovery, Description, and Integration (UDDI) and the Web Services Description Language (WSDL), SOAP is set to transform the way business applications communicate over the Web with the notion of Web services. I can't emphasize enough the importance of having the knowledge of SOAP in your developer's toolkit. In Part 1 of this four-part series on SOAP, I will cover the basics, starting with how the idea of SOAP was conceived..." On the early history of SOAP, note also Don Box's article "A Brief History of SOAP." SOAP references: "Simple Object Access Protocol (SOAP)."
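The debuggability and firewall-friendliness claims both rest on SOAP messages being plain XML text. A minimal sketch in Python (not from the article; the method name and namespace below are placeholders) shows how small a SOAP 1.1-style request envelope is:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def make_envelope(method, params, method_ns="urn:example"):
    """Build a minimal SOAP 1.1-style request envelope as text."""
    env = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(env, "{%s}Body" % SOAP_NS)
    call = ET.SubElement(body, "{%s}%s" % (method_ns, method))
    for name, value in params.items():
        param = ET.SubElement(call, name)
        param.text = str(value)
    return ET.tostring(env, encoding="unicode")

msg = make_envelope("GetQuote", {"symbol": "IBM"})
```

The resulting string is ordinary XML: readable in any text editor when debugging, and deliverable over HTTP port 80 like any other text payload, which is precisely why it passes firewalls that block binary protocols such as IIOP.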

  • [April 05, 2001] "Jato: The New Kid on The Open Source Block, Part 1. A New Library for Converting Between Java and XML." By Andy Krumel. In JavaWorld (March 2001). ['The Jato API converts XML documents into Java objects and back again. In January, Andy Krumel publicly released the API in beta form at SourceForge. Because such transformations are mechanical and tedious, Jato lets a simple XML script describe the XML/Java mapping. In this article, the first of three, Andy explains how to use Jato to perform basic Java-to-XML and XML-to-Java transformations. In Part 2, he will focus on performing complex Java-to-XML transformations. Part 3 will explore converting an XML document into Java application objects.'] "Kept separate, XML and Java are environmentally friendly, but sound scientific evidence indicates the effort developers exert to merge them may contribute to global warming. This article, the first of three, introduces the open-source Jato API, a better way to turn XML documents into Java application objects and vice versa. First, we will examine Jato's key features, architecture, and important classes. Then we will develop Jato scripts to perform basic Java-to-XML and XML-to-Java transformations. So get ready to strip out those hundreds of lines of XML parsing code and replace them with a few lines of Jato script! Note: At the time of this writing, Jato is in Beta 2, with a tremendous amount of development work going into it. Occasionally, a change is made that will break backwards compatibility. To ensure the article examples work properly, the distribution will include all the samples from this series. What is Jato? Jato is an open-source Java API and XML language for transforming XML documents into a set of Java objects and back again. Jato scripts describe the operations to perform and leave the algorithms for implementing the operations to an interpreter. 
A Jato script expresses the relationships between XML elements and Java objects, freeing the developer from writing iteration loops, recursive routines, error-checking code, and many other error-prone, verbose, and monotonous XML parsing chores. Jato has many advantages over directly employing traditional Java XML APIs such as JDOM, SAX, or DOM. Those advantages include: Jato encourages XML and Java designs to be optimized for their specified tasks. Well-designed systems have a low degree of coupling, allowing one to make independent changes to portions of a system without breaking it. Indeed, it's a good idea for the XML DTD and the Java object-oriented design to be developed and deployed independently. With Jato, developers simply express the XML elements that map to or from specific Java classes. The Jato interpreter then implements the necessary parsing and generation algorithms to accomplish the desired actions. As such, you avoid the monotonous, monolithic, and difficult-to-maintain XML parsing and generation code. Using XML to describe transformations to or from XML in Java applications just seems natural. I will never forget the first time an application called for parsing an XML document to create a set of Java objects. After about 150 lines of code, most of the remaining 850 lines consisted of cut-and-paste operations followed by altering element, attribute, class, and method names. Whenever a task involves that much cut-and-paste, it just begs to be automated..." Jato is built on JDOM. See the news item.
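The declarative-mapping idea can be sketched in a few lines of Python (a toy analogue of what a Jato script does, not Jato's actual syntax): one mapping table describes which child elements feed which object attributes, and a single generic routine replaces the hand-written per-class parsing loops.

```python
import xml.etree.ElementTree as ET

class Person:
    """Plain application object; knows nothing about XML."""
    def __init__(self):
        self.name = None
        self.email = None

# Declarative mapping: root tag -> (class, {child tag: attribute name}).
# All names here are invented for illustration.
MAPPING = {"person": (Person, {"name": "name", "email": "email"})}

def from_xml(xml_text, mapping):
    """Generic interpreter: build an object from XML using the mapping."""
    root = ET.fromstring(xml_text)
    cls, fields = mapping[root.tag]
    obj = cls()
    for child_tag, attr in fields.items():
        node = root.find(child_tag)
        if node is not None:
            setattr(obj, attr, node.text)
    return obj

p = from_xml(
    "<person><name>Ada</name><email>ada@example.com</email></person>",
    MAPPING,
)
```

Adding a new class or field means adding one entry to the table rather than another cut-and-paste parsing loop, which is exactly the maintenance saving the article attributes to Jato scripts.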

  • [April 04, 2001] Common Warehouse Metamodel - CWM 1.0 Specification. From OMG. "This [.zip] file contains the two volumes of the CWM 1.0 specification and associated files. These reflect corrections of errata found since the spec/files were first published in February 2001." Includes: (1) Main specification with normative portions of CWM 1.0; (2) CWM 1.0 Specification, Volume 2 Extensions [The specification for the non-normative portions of CWM 1.0]; (3) CWM 1.0 Metamodel in XML [The CWM 1.0 metamodel defined using the Metaobject Facility (MOF) 1.3 Model. The XML document's type is defined based on the XML Metadata Interchange (XMI) 1.1 Specification]; (4) CWM 1.0 DTD [The XML document type (DTD) for the normative portions of the CWM 1.0 metamodel. The DTD is generated from the CWM 1.0 Metamodel per rules of the XML Metadata Interchange (XMI) 1.1 Specification]; (5) CWMX 1.0 DTD [The XML document type (DTD) for the non-normative portions of the CWM 1.0 metamodel. The DTD is generated from the CWM 1.0 Metamodel per rules of the XML Metadata Interchange (XMI) 1.1 Specification]; (6) CWM 1.0 CORBA IDL [CORBA IDL modules defining IDL interfaces to a CWM 1.0 CORBA facility. IDL is generated from the CWM 1.0 metamodel per rules of the Metaobject Facility (MOF) 1.3 Specification]; (7) CWM 1.0 Metamodel Diagrams - mdl [The CWM 1.0 metamodel expressed as UML diagrams using Rational Rose (nonnormative)]. With description of the documents in the README. See "OMG Common Warehouse Metadata Interchange (CWMI) Specification." [cache README, cache spec]

  • [April 03, 2001] "Fun with SOAP Extensions." By Keith Ballinger and Rob Howard. From MSDN Online. March 27, 2001. ['As promised, in this month's column we're going to look at one of the more advanced but cooler features of ASP.NET Web Services -- SOAP Extensions. For this month's column Keith Ballinger, a Program Manager for .NET Web Services, has offered to share some of his knowledge of this subject.'] "One of the more interesting things you can do with the .NET Framework's Web Services technology is create SOAP Extensions. These extensions allow you to gain access to the actual network stream before it is deserialized into objects within the framework, and vice versa. SOAP Extensions allow developers to create very interesting applications on top of the core SOAP architecture found within .NET. For instance, you can implement an encryption algorithm on top of the Web Service call. Alternatively, you could implement a compression routine, or even create a SOAP Extension that will accept SOAP Attachments. How does this work? It's easy. First, I'd recommend that you review Rob Howard's earlier article 'Web Services with ASP.NET,' and then come back. You may also want to read 'SOAP in the Microsoft .NET Framework and Visual Studio.NET.' Basically, you need to do two things: (1) Create a class that derives from System.Web.Services.Protocols.SoapExtension (2) Create a class that derives from System.Web.Services.Protocols.SoapExtensionAttribute And you are almost done! Now, all that you have to do is create something interesting from your derived classes. OK, so maybe that isn't as easy as I make it sound... For this column, we will create an extension that records incoming and outgoing SOAP messages to our Web Service. This trace extension is useful for debugging when you really care about getting the SOAP message to look exactly the way you want it to look... 
SOAP Extensions are a very useful feature, and I hope to see a lot of imaginative uses for them over the next few years." See "Simple Object Access Protocol (SOAP)."
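The core mechanism -- intercepting the raw message on the way in, before deserialization, and on the way out, after serialization -- can be sketched language-neutrally in Python. This mimics what the SoapExtension hooks do for the trace example; it is not the .NET API itself, and the handler below is a stand-in for a real Web Service method:

```python
def with_trace(handler, log):
    """Wrap a message handler so raw request and response text
    are recorded before and after processing, like a trace extension."""
    def traced(raw_request):
        log.append(("in", raw_request))       # raw incoming message
        raw_response = handler(raw_request)   # normal processing
        log.append(("out", raw_response))     # raw outgoing message
        return raw_response
    return traced

log = []
# Toy "service": echoes the request wrapped in an <ok> element.
echo = with_trace(lambda msg: "<ok>" + msg + "</ok>", log)
result = echo("<ping/>")
```

Because the wrapper sees the exact bytes on the wire rather than the deserialized objects, the same interception point also serves encryption and compression, the other uses the column mentions.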

  • [April 03, 2001] "Find a Home For Your XML Data. The Sudden Rise of XML Puts a New Twist on The Old Problem of Data Storage." By Mark Leon. In InfoWorld (April 02, 2001). "Customers say they want them, vendors are scrambling to provide them, and opinions vary as to how to set them up correctly. They are XML databases, a way to store, search, and retrieve all that mission-critical business data that is finding expression in XML format. Currently, XML rivals HTTP, HTML, and SQL as one of the big hits on the top 10 chart of information management standards. But XML's strength, its great capability of facilitating the flow of semistructured data among applications and heterogeneous systems, also introduces several new problems. One of the more pressing problems is how to store and manage XML data... Microsoft, Oracle, and IBM have already added XML extensions to their relational databases, but these efforts will not satisfy everyone for a number of reasons... 'The relational database design does not easily support indexing or searching XML,' says Satish Maripuri, president and COO of eXcelon. 'An object database such as ours offers a more natural way to store, search, and retrieve XML data. This is why we took a bet with XML.' Analysts give eXcelon high marks for what it has been able to accomplish both in simplifying its product and in adapting it to XML, but few are willing to predict the bet will pay off. The reasons were the by-now-familiar advantages of XML: It is more flexible than EDI (electronic data interchange) and cheaper because it can, via the Internet, bypass expensive, private VANs (value-added networks)... The big boys of data storage have not been sitting on their hands: Oracle, IBM, and Microsoft have added XML extensions to their relational offerings... Not exactly a household name in the United States, Software AG of Germany owns a substantial share of the global database market with its Adabas product. And now the company, with U.S. 
headquarters in Reston, Va., thinks it has an edge in the XML storage space. 'We released Tamino in September of 1999,' says John Taylor, director of product marketing at Software AG. 'Tamino is not a relational database, nor is it an object database modified for XML. It is, rather, a database built from the ground up specifically for XML.' The interface for Tamino is HTTP, and Taylor says his company is working with the World Wide Web Consortium (W3C) to develop the next XML query language. 'The issue of query and retrieval is key,' Taylor says. 'You can use extensions to SQL for this, but to do that you need to break the XML hierarchy into a set of relational tables. This means queries will necessarily contain a complex set of join statements. With XPath, our query language, we can replace all that with one line.' Taylor says that more than 280 customers are currently using Tamino. One of these is the California Board of Equalization in Sacramento, Calif. The board collects about $37 billion in taxes (primarily sales tax) for California. 'We started looking at XML to facilitate the electronic filing of taxes,' says Larry Hanson, data architect for the board. 'Before long we also realized XML would be the best way to store tax returns, tax schedules, and tax-related messages'..." See: "XML and Databases."
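Taylor's "one line" point is easy to see with any XPath-capable toolkit; Python's ElementTree, which supports a subset of XPath, will do (the tax-return document shape below is invented, not taken from the article):

```python
import xml.etree.ElementTree as ET

# Hypothetical tax-return data kept in its natural XML hierarchy.
doc = ET.fromstring(
    "<returns>"
    "<return id='1'><total>100</total></return>"
    "<return id='2'><total>250</total></return>"
    "</returns>"
)

# One path expression walks the hierarchy that relational shredding
# would spread across joined tables.
totals = [int(t.text) for t in doc.findall("./return/total")]
```

Retrieving the same values after shredding the document into `returns` and `totals` tables would require reassembling the hierarchy with joins; querying the stored tree directly keeps the expression as short as the structure it describes.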

  • [April 03, 2001] "Wall Street Releases Draft XML Standard. Brokerage group invites public comment." By Maria Trombly. In ComputerWorld (April 02, 2001). "The Ltd. standards committee, a consortium of brokerage firms that's committed to creating a standard computer language for presenting investment and financial research, last week released a draft version of Research Information Exchange Markup Language (RIXML) 1.0 for public comment. The need for a standard approach to investment reports is critical on Wall Street because brokerages produce some 2,000 notes and reports daily, said sell-side co-chairman Christopher Betz, vice president for the institutional equity division at New York-based Morgan Stanley Dean Witter & Co... The new voluntary, open standard is designed to let the authors of these reports tag the content with four major types of information, Betz said. This includes source information such as publisher, analyst and research team; content information that describes whether the content is a Web address, an HTML file or an Adobe Acrobat file; legal material such as disclosures, disclaimers, trademarks and copyrights; and context information that describes what the report is about - a country, an industry or a specific sector such as semiconductors..." See (1) the recent news item " Announces Release of RIXML Specification for Public Comment" and (2) "Research Information Exchange Markup Language (RIXML)."

  • [April 03, 2001] "Fun with XML." By Eric Lease Morgan. April 03, 2001. "I have done some experimentation with XML for the past couple of months, and I have written a bit about this process in an essay called 'Fun with XML.' From the introduction and conclusion: This text outlines the process I used to learn a bit about XML, Extensible Markup Language. In a nutshell, XML is an operating system-independent, standards-based method for marking up text and data for the purposes of sharing information between computers and ultimately people. XML and its associated technologies provide a means for libraries to collect, organize, archive, and disseminate information in a manner more in tune with the current digitally networked environment.... I found my explorations into XML to be fun and exciting. Because I was describing, manipulating, and disseminating data and information I found myself doing real library work. Tell me what you thinque..."

  • [April 02, 2001] "Definition of SXML: an instance of XML Infoset as S-expressions, an Abstract Syntax Tree of an XML document." By Oleg Kiselyov. Revision: 2.0. Last updated March 7, 2001 or later. ['SXML is an instance of XML Infoset as S-expressions. SXML is an Abstract Syntax Tree of an XML document.'] "An XML information set (Infoset) is an abstract data set that describes information available in a well-formed XML document. Infoset is made of 'information items', which denote components of the document: elements, attributes, character data, processing instructions, etc. Each information item has a number of associated properties, e.g., name, namespace URI. Some properties -- for example, 'children' and 'attributes' -- are (ordered) collections of other information items. Infoset describes only the information in an XML document that is relevant to applications. Data used merely for parsing or validation -- element declarations from the DTD, the XML version, parameter entities, etc. -- are not included. XML Infoset is described in [W3C XML Infoset]. Although technically Infoset is specified for XML, it largely applies to HTML as well. SXML is a concrete instance of the XML Infoset. Infoset's goal is to present in some form all relevant pieces of data and their abstract, container-slot relationships to each other. SXML gives the nest of containers a concrete implementation as S-expressions, and provides means of accessing items and their properties. SXML is a 'relative' of XPath and DOM, whose data models are two other instances of the XML Infoset. SXML is particularly suitable for Scheme-based XML/HTML authoring, SXPath queries, and tree transformations. In John Hughes' terminology, SXML is a term implementation of evaluation of the XML document..." See the discussion.
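The core idea -- an XML element tree rendered as nested lists -- can be sketched in Python, with Python lists standing in for Scheme S-expressions (the exact SXML grammar in the paper differs in detail; this is only an illustration of the shape):

```python
import xml.etree.ElementTree as ET

def to_sxml(elem):
    """Render an Element as an SXML-style nested list:
    [tag, ["@", [name, value]...], child-or-text...]."""
    node = [elem.tag]
    if elem.attrib:
        # Attributes collected under an "@" marker, as in SXML.
        node.append(["@"] + [[k, v] for k, v in sorted(elem.attrib.items())])
    if elem.text and elem.text.strip():
        node.append(elem.text.strip())
    for child in elem:
        node.append(to_sxml(child))
        if child.tail and child.tail.strip():
            node.append(child.tail.strip())
    return node

tree = to_sxml(ET.fromstring('<p class="note">Hi <b>there</b>!</p>'))
```

The result nests exactly as the document does, so ordinary list operations become tree traversals -- the property that makes S-expression trees convenient for the authoring, querying, and transformation tasks the paper lists.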


Robin Cover, Editor: