The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: May 28, 2002
XML Articles and Papers. July-September 2000.

XML General Articles and Papers: Surveys, Overviews, Presentations, Introductions, Announcements

References to general and technical publications on XML/XSL/XLink are also available in several other collections on this site.

The following list of articles and papers on XML represents a mixed collection of references: articles in professional journals, slide sets from presentations, press releases, articles in trade magazines, Usenet News postings, etc. Some are from experts and some are not; some are refereed and others are not; some are semi-technical and others are popular; some contain errors and others don't. Discretion is strongly advised. The articles are listed approximately in the reverse chronological order of their appearance. Publications covering specific XML applications may be referenced in the dedicated sections rather than in the following listing.

September 2000

  • [September 30, 2000] "Final Messaging Services Specification Due From ebXML. SOAP to play no part in OASIS/UN standard, expected out November 6." By Douglas Finlay. In Software Development Times (October 01, 2000). "The ebXML initiative, a joint venture between the Organization for the Advancement of Structured Information Standards (OASIS) and the United Nations' CEFACT (UN/CEFACT) body, is set to finalize and release the Messaging Services Specification at its next meeting in Tokyo November 6, 2000. It is the first specification from ebXML's Transport, Routing and Packaging (TRP) group and is intended to standardize how messages are wrapped and sent from business to business in an open environment across the Internet. The Simple Object Access Protocol (SOAP) architecture, in which Microsoft has played a leading role and which had been under consideration as a possible transport mechanism for the messages by the TRP group, was rejected as being too closed an architecture for the stated open and collaborative direction of the ebXML initiative. Instead, the ebXML initiative chose MIME-XML technology to wrap and send the message. 'The goal of the ebXML initiative is to facilitate global trade between organizations of any size through a set of XML-based standards defined through an open and collaborative process,' remarked Ed Julson, business development manager for Sun Microsystems Inc.'s XML technologies and a working member of the group. Julson said the ebXML initiative, begun in December 1999 to help companies exchange data over the Internet in a public way at lower cost, would release a number of specifications over an 18-month time frame, and that all of the specifications are entirely platform- and language-independent. Areas in which specifications are currently being worked on include business processes; registry and repository; core components; and trading partners.
Two prototypes of the ebXML specification have already been built, and both were spawned from prior ebXML proof-of-concept meetings in Brussels, Belgium, and San Jose, Calif. While Microsoft threw its hat into the open-source ring by submitting its SOAP specification to the ebXML committee for use as a transport mechanism for delivering the messages, Drummond said the TRP working group found that MIME-XML was best suited for the job. Fujitsu and Sun are solidly behind the ebXML initiative and have plans to implement any ebXML specifications as quickly as they become available. Jim Hughes, Fujitsu's director of industry relations, said the company was actively combining its popular reliable messaging technology for its mainframes into a prototype of the messaging spec to get the implementation into its products." See: "Electronic Business XML Initiative (ebXML)."
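The MIME-XML wrapping the article mentions can be sketched as follows: the message travels as a MIME multipart/related envelope in which one part is an XML header document and the remaining parts carry the business payload. This sketch is loosely modeled on early ebXML TRP working drafts; the element names, namespace URI, and party identifiers below are illustrative, not taken from the final specification.

```xml
<!-- Illustrative XML header part of an ebXML-style message; it would be
     one part of a MIME multipart/related envelope, with the business
     payload (e.g., a purchase order document) in separate parts -->
<eb:MessageHeader xmlns:eb="urn:example:ebxml-trp-header">
  <eb:From><eb:PartyId>urn:duns:123456789</eb:PartyId></eb:From>
  <eb:To><eb:PartyId>urn:duns:987654321</eb:PartyId></eb:To>
  <eb:Service>PurchaseOrderService</eb:Service>
  <eb:Action>SubmitOrder</eb:Action>
  <eb:MessageData>
    <eb:MessageId>20001106-0001@buyer.example.com</eb:MessageId>
    <eb:Timestamp>2000-11-06T10:00:00Z</eb:Timestamp>
  </eb:MessageData>
</eb:MessageHeader>
```

Keeping the routing information in a header part separate from the payload parts is what lets the envelope remain payload-neutral: the payload need not even be XML.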

  • [September 30, 2000] "XSL Formatting Objects." Chapter 15 of the XML Bible, by Elliotte Rusty Harold. Announcement posted September 30, 2000. "I'm happy to announce that I've posted a completely updated version of Chapter 15 of the XML Bible, XSL Formatting Objects, at Cafe con Leche. This is the complete chapter, approximately 70 pages with many full examples of XSL-FO. Everything should be up-to-date with the March 27, 2000 Last Call working draft of the XSL-FO specification and FOP 0.14.0. To the best of my knowledge, this is the only comprehensive tutorial covering the current version of XSL-FO. Doubtless there are some errors since I was breaking new ground here and had to work from an incomplete and sometimes contradictory spec document, as well as using unfinished pre-alpha software. Since this is more-or-less what's going to go into the second edition of the XML Bible, as well as likely being the primary source for many new users learning XSL-FO, I'd very much appreciate it if you can inform me of any mistakes you spot so I can fix them." Intro: "XSL Formatting Objects (XSL-FO) is the second half of the Extensible Stylesheet Language (XSL). XSL-FO is an XML application describing how pages will look when presented to a reader. Generally, a style sheet uses the XSL transformation language to transform an XML document in a semantic vocabulary into a new XML document that uses the XSL-FO presentational vocabulary. While many hope that Web browsers will one day know how to directly display data marked up with XSL formatting objects, for now an additional step is necessary in which the output document is further transformed into some other format such as PDF." For related resources, see "Extensible Stylesheet Language (XSL/XSLT)."

  • [September 30, 2000] Harvesting RDF Statements from XLinks. Reference: W3C Note 29-September-2000, edited by Ron Daniel Jr. (Metacode Technologies Inc.). This Note is not a formal product of the W3C XML Linking Working Group, but "is made available by the W3C XML Linking Working Group for the consideration of the XLink and RDF communities in the hopes that it may prove useful." Abstract: "Both XLink and RDF provide a way of asserting relations between resources. RDF is primarily for describing resources and their relations, while XLink is primarily for specifying and traversing hyperlinks. However, the overlap between the two is sufficient that a mapping from XLink links to statements in an RDF model can be defined. Such a mapping allows XLink elements to be harvested as a source of RDF statements. XLink links (hereafter, 'links') thus provide an alternate syntax for RDF information that may be useful in some situations. This Note specifies such a mapping, so that links can be harvested and RDF statements generated. The purpose of this harvesting is to create RDF models that, in some sense, represent the intent of the XML document. The purpose is not to represent the XLink structure in enough detail that a set of links could be round-tripped through an RDF model." [Principles:] "Simple RDF statements are comprised of a subject, a predicate, and an object. The subject and predicate are identified by URI references, and the object may be a URI reference or a literal string. To map an XLink link into an RDF statement, we need to be able to determine the URI references of the subject and predicate. We must also be able to determine the object, be it a URI reference or a literal. The general principle behind the mapping specified here is that each arc in a link gives rise to one RDF statement. The starting resource of the arc is mapped to the subject of the RDF statement. The ending resource of the arc is mapped to the object of the RDF statement. 
The arc role is mapped to the predicate of the RDF statement. However, a number of corner cases arise, described in [Section] 3, 'Mapping Specification'. RDF statements are typically collected together into 'models.' The details of how models are structured are implementation dependent. This Note assumes that harvested statements are added to 'the current model,' which is the model being constructed when the statement was harvested. But this Note, like RDFSchema, does not specify exactly how models must be structured." See also (1) "XML Linking Language", (2) "Resource Description Framework (RDF)", and (3) "XML and 'The Semantic Web'."
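The mapping the Note specifies can be seen in a small example (the element names other than the `xlink:` attributes are invented for illustration). The arc below gives rise to exactly one RDF statement: the starting resource becomes the subject, the arc role becomes the predicate, and the ending resource becomes the object.

```xml
<citation xmlns:xlink="" xlink:type="extended">
  <work xlink:type="locator" xlink:href=""
        xlink:label="citing"/>
  <work xlink:type="locator" xlink:href=""
        xlink:label="cited"/>
  <link xlink:type="arc" xlink:from="citing" xlink:to="cited"
        xlink:arcrole=""/>
</citation>
<!-- Harvested RDF statement:
     subject   =
     predicate =
     object    =  -->
```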

  • [September 30, 2000] "Opposing groups merge to develop metadata standard. A new, single standard is expected within six to 12 months." By Dan Verton. In ITWorld (September 29, 2000). "Two opposing camps of database and data warehousing software vendors last week ended a five-year rivalry, deciding to combine forces in search of a single metadata standard aimed at creating a plug-and-play environment for users who are building data warehouses. In a joint announcement, the Meta Data Coalition (MDC) in Austin, Texas, and the Object Management Group Inc. (OMG) in Needham, Mass., said the two organizations would merge to work on a combined set of specifications for metadata interoperability among different data warehousing tools. Until now, the two industry groups have supported competing standards for metadata, which functions as a card catalog for warehoused data. The merger signals an end to a political tug-of-war between the Microsoft Corp.-affiliated MDC and the OMG metadata effort, which has been backed by vendors such as Oracle Corp. and IBM Corp. The two groups plan to merge the features of the OMG's recently ratified Common Warehouse Metamodel standard with the MDC's Open Information Model standard, emerging with a single standard within six to 12 months. The resulting specifications should allow companies that run data warehouses to exchange metadata among products developed by different software vendors, improving interoperability." See the announcement of September 25, 2000, "Competing Data Warehousing Standards to Merge in the OMG": "Today, the Meta Data Coalition (MDC) and the Object Management Group (OMG), two industry organizations with competing data warehousing standards, jointly announced that the MDC will merge into the OMG. As a result, the MDC will discontinue independent operations and work will continue in the OMG to integrate the two standards.
Until this week, there were two major standards for metadata and modeling in the areas of data warehousing and component-based development. Data warehousing is a response to the enterprise need to integrate valuable data spread across organizations from multiple sources. Analysis of an enterprise's accumulated data not only allows sales and production to be tuned for maximum profitability, but also allows entirely new and profitable products to be discovered and exploited..." See (1) "OMG Common Warehouse Metadata Interchange (CWMI) Specification" and (2) "MDC Open Information Model (OIM)."

  • [September 30, 2000] "Tools vendors look to simplify XML implementation." By Tom Sullivan. In InfoWorld (September 29, 2000). "EDI (electronic data interchange) has worked in the past, but only for large companies with deep pockets, and its use had been limited primarily to interenterprise information exchanges. EDI is expensive, and the cost has plagued smaller companies. Typically, a large company with an EDI infrastructure will have smaller partners that may use a Web browser to access, for instance, supply-side data. Such a solution may suit that purpose, but it is not data integration. When the large retailer with EDI capabilities in place puts an order in to a third- or fourth-tier supplier that does not have an EDI system, the large retailer has no way of knowing if the supplier has the needed inventory in stock; in other words, there is no guarantee that the order will be filled. If the data is integrated between both companies, however, the retailer could access inventory data and, if the order cannot be filled immediately, request the inventory from a different supplier. 'Companies want to be able to automate that data as deeply as they can, and to enable bidirectional integration into both companies,' said Jon Derome, a senior analyst at the Yankee Group, in Boston. To help customers integrate their EDI and XML data, Denver-based New Era of Networks (NEON) last week announced its PaperFree EDI Adapter, which provides an easy connection between XML and EDI formats. Although EDI certainly isn't going away, the industry seems to have agreed on XML as the glue that bonds together disparate systems. 'One of the big benefits of XML in e-commerce is that it is going to level the playing field and let smaller companies that don't have EDI communicate with larger companies,' said Chris Silva, associate research analyst at IDC. To help developers realize that level playing field, a number of vendors are making XML easier for developers to use.
Compuware last week, for example, announced that its Uniface 8 supports XML, which the company claims extends development and deployment of multitier e-business applications. Uniface can be used to generate XML and valid DTDs (Document Type Definitions) on request. As a result, no XML skills or DTD knowledge are necessary for e-business application development, and organizations benefit from increased developer productivity, ease of maintenance, and reduced costs, according to the company. Uniface has traditionally used its own proprietary way of sending data streams..."
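Compuware does not publish the generated markup in the article, but the idea of emitting an XML instance together with a valid DTD can be sketched as follows. This is a hypothetical order document, not Uniface's actual output format; element names and values are invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (partner, item+)>
  <!ATTLIST order number CDATA #REQUIRED>
  <!ELEMENT partner (#PCDATA)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item qty CDATA #REQUIRED>
]>
<!-- An instance that validates against the internal DTD subset above;
     a partner without an EDI system can parse and validate this with
     any off-the-shelf XML processor -->
<order number="PO-1001">
  <partner>Example Supplier Co.</partner>
  <item qty="40">Widget, 10mm</item>
</order>
```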

  • [September 30, 2000] "DXML - Dynamic XML. A Proposal Defining the Principles Involved in Creating a Dynamic XML System." By Sean B. Palmer. September 28, 2000. "This is a proposal to outline the principles and protocols that are needed to create a truly dynamic form of XML. XML/XHTML documents are excellent means of conveying information, but the static nature of these languages is rapidly becoming out of date. To cope, the W3C has introduced standards such as XSLT which can enable processing of the XML document structure tree; and furthering the DOM specifications. However, this proposal (referred to hereafter as DXML) is not primarily based upon these systems. Rather it is a real form of dynamicism - it involves actually changing the XML tree on the server and/or client side. In brief, DXML is a separate system to the XML DOM, and DHTML - but it can implement these to achieve the dynamic status. To do this, we can (only?) achieve the effects of XSLT, DOM and DHTML by using CC/PP [CCPP], and in particular, the new CC/PP - DTPP proposal. The CC/PP - DTPP proposal defines a number of 'statuses' for documents being shunted through a CC/PP processing system. As a document goes through a CC/PP system, it gets 'customized', that is deliberately changed to suit a specific User Agents needs through Preference Profiles. What DXML aims to do, is use this customization in a slightly different way - to process the XML document for a single UA, using a DXML CC/PP profile. The method for processing DXML profiles is beyond the scope of this document, which simply proposes it. However, the CC/PP group has foreseen the fact that CC/PP may be used for many different applications, and so have written it in an extensible language - RDF. Therefore, CC/PP profiles are not adverse to being modified in a manner that may make DXML become a reality. 
Overall, one possible application of DXML is that it may use CC/PP to perform the tasks of XSLT, DHTML, and scripting functions on the DOM..."

  • [September 30, 2000] "XML and" By Chris Lovett. From MSDN Library (September 29, 2000). ['Extreme XML columnist Chris Lovett interviews Mike Moore and discusses XML implementation on the site.'] "Some new developers joined my team in late 1998, and they gave us a demonstration of an XML dialect they had created for the Knowledge Base using a beta version of MSXML. Seeing MSXML survive processing the entire Knowledge Base gave us great confidence in the stability of the MSXML component. In early 1999, we were working on a new product catalog application and were running into some very real operational limits. We had a back-end system that was generating about 100,000 fully localized HTML pages. Managing this quantity of data across a Web farm of about 30 U.S. machines-- and a total of 20 machines outside the U.S. that had slower connections-- together with very high churn, was next to impossible. We were about two weeks away from going into production with this new system when we decided XML was a much better way to do this. Internationalization was a big factor. We did some performance analysis on MSXML and were very impressed with the fact that it ran rock solid for 20 hours with no memory leaks (using our typical 20-KB XML test file), so we decided to completely scrap the old system and start over with XML. One guy sat down and prototyped something in a couple days using XML for the data and ASP script code that loaded the XML by using MSXML and rendered the HTML page. . . To be honest, I was blown away by the performance of the new XSL-based catalog when it shipped. We could literally see that the pages were a lot snappier. On every ASP request, we are loading from four to six localized XML files. Then we're applying XSL on the fly to dynamically generate a fully localized HTML page, and we're doing about 30 of these transformations per second per machine. It really exceeded our expectations. The whole code/data/presentation separation thing is a huge win. 
A lot of people don't realize how big a win this is. We didn't even realize it going in, but now we are finding new opportunities all over the place for re-using the content in ways we would have never thought of. We've even found that some other groups inside Microsoft are already reusing our XML content. They never even talked to us. They just looked at the XML, understood the schema, and ran with it. We had no idea until we stumbled onto it! [...] We have about 30 machines in our main cluster, each handling about 100,000 users per day. You have to understand that we hammer Windows 2000 really hard. We have about 75,000 different ASP pages on and a churn of up to 150,000 page changes per day, so the operational maintenance side of this site is huge. We are using stock standard Windows 2000 SP1 with MSXML 2.5. We're evaluating MSXML 3.0 right now."
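The per-request pattern Moore describes (load localized XML files, apply a stylesheet, emit a localized HTML page) is a stock XSL transformation. A minimal stylesheet of the kind involved might look like the following; note that MSXML 2.5, the version the site was running, actually expected the older `` namespace rather than the 1999 XSLT namespace shown here, and the `catalog` vocabulary is invented for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="">
  <!-- Transform a product-catalog document into an HTML page -->
  <xsl:template match="/catalog">
    <html>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:for-each select="product">
          <p><xsl:value-of select="name"/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Because the data, the code, and the presentation live in separate files, swapping the localized XML inputs or the stylesheet changes the output page without touching the ASP logic, which is the separation Moore credits for the content reuse by other groups.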

  • [September 30, 2000] "Develop Web Applications with XML and Exchange 2000." By Thomas Rizzo. From MSDN Library (September 29, 2000). ['This article describes the difference between XML and HTML, and shows how you can use XML to get, set, and search data in Microsoft Exchange 2000.'] "Since its introduction, many computer pundits have touted Extensible Markup Language (XML) as a cure-all for electronic data interchange problems. The hype is often justified -- XML lets you easily describe data and share it among applications. Microsoft Exchange 2000 Server supports XML natively, making it a great Web-development platform. In this article, I'll briefly describe the difference between XML and HTML, then show how you can use XML to get, set, and search data in Exchange 2000. You can retrieve XML data from Exchange 2000 in several ways, but you'll probably use the Web Distributed Authoring and Versioning (WebDAV) protocol most often. WebDAV is an extension to the HTTP protocol that specifies how to perform file processing, making it easy to read and write to the Web. Using WebDAV commands, you can lock a resource, and get or change a property. Because it piggybacks on HTTP, WebDAV can also work through firewalls and proxy servers."
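The WebDAV usage Rizzo describes can be illustrated with the XML body of a PROPFIND request. The `DAV:` namespace is standard WebDAV; the `urn:schemas:httpmail:` namespace is the one Exchange 2000 exposed for mailbox item properties, though the specific property names below should be treated as illustrative.

```xml
<?xml version="1.0"?>
<!-- Body of a PROPFIND request (typically sent with a Depth: 1 header
     against a folder URL) asking for two properties of each item -->
<D:propfind xmlns:D="DAV:">
  <D:prop xmlns:m="urn:schemas:httpmail:">
    <m:subject/>
    <m:from/>
  </D:prop>
</D:propfind>
```

The server's multistatus response is itself XML, which is what makes the "get, set, and search" operations in the article scriptable from any HTTP client.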

  • [September 29, 2000] "IBM, Microsoft, Ariba release WSDL Specification." By Roberta Holland. In eWEEK (September 26, 2000). "Less than a month after a coalition of 36 companies announced a wide-ranging initiative to create a directory for Web services, IBM and Microsoft Corp. have released a new language specification to describe those services. IBM, Microsoft, Ariba Inc. and other companies joined together last month on the UDDI (Universal Description, Discovery and Integration) initiative, intended to form a collection of registries and databases describing what businesses do and how to access their services electronically. What IBM and Microsoft released Monday is an XML syntax to describe those services, called the Web Services Description Language. Ariba also helped in the effort, which essentially was a merger of existing technologies from IBM and Microsoft. Officials involved say WSDL will allow for better interoperability among Web services and development tools. The language is based both on IBM's Network Accessible Services Specification Language and Microsoft's SOAP (Simple Object Access Protocol) Contract Language. WSDL was developed outside of the UDDI group and will either formally be submitted to the coalition for a specification or be submitted to a separate standards organization, said Bob Sutor, IBM's program director for e-business standards strategy in Somers, N.Y. Sutor said WSDL will not be the only choice for describing Web services, adding that Microsoft and IBM felt it made sense to combine their efforts." The WSDL specification is available for review on the IBM and Microsoft web sites. See also "Universal Description, Discovery, and Integration (UDDI)" and "Web Services Description Language (WSDL)."

  • [September 29, 2000] "Microsoft, IBM release directory specs." By James Evans. In Network World (September 29, 2000). "IBM and Microsoft have developed a language standard for the new Universal Description, Discovery and Integration business directory, which is designed to fuel business-to-business commerce. The standard, called Web Services Description Language (WSDL), is a mixture of both IBM's Network Accessible Services Specification Language and Microsoft's Simple Object Access Protocol (SOAP) contract language. SOAP is an open standards-based interoperability protocol that uses XML to provide a common messaging format to link together applications and services anywhere on the Internet regardless of operating system, object model or programming language. The companies are evaluating the appropriate path for submitting the specification to the industry as a draft for standardization. A coalition of 36 vendors and consultants are working on the UDDI business directory, which, at its core, will be an XML-based holding tank for what businesses do, the services they offer and how they interface with their computing systems. The registry announced in early September is expected to support a number of APIs for gathering and offering information. There will be three initial versions of the registry as it gradually becomes more elaborate. It initially will provide basic information and later will offer more detailed company information, such as how to deal with a specific business unit. Ariba, along with IBM and Microsoft, launched the UDDI business directory, which will be built on TCP/IP, HTML and XML. Beta testing is expected to begin sometime in October." See also "Universal Description, Discovery, and Integration (UDDI)" and "Web Services Description Language (WSDL)."

  • [September 29, 2000] "The Beginning of the Endgame. A Look at the Changes in the Pre-CR W3C XML Schemas Draft." By Rick Jelliffe. From September 27, 2000. ['The W3C's XML Schemas technology, vital to the use of XML in e-business, is finally nearing completion. This article catalogs the most significant changes from the recent draft specs, and highlights areas where priority feedback is required from implementors and users.'] "This article looks at those changes in the recent Pre-CR draft of W3C XML Schemas that will most affect developers and users. Requirements for data interchange with database systems have been important during W3C XML Schema's development. The recent changes also support markup languages and schema construction better. The Candidate Recommendation (CR) drafts are slated to appear hot on the heels of the current drafts. The XML Schema Working Group was aware that authors, implementers, schema writers, and technical evaluators needed to know the most recent changes, especially since they include some syntax changes that will affect schemas using type derivation." See "XML Schema Definition Language - Seventh Working Draft" for a summary of the 22-September-2000 XML Schema 'Pre-CR' release.

  • [September 29, 2000] "XML Q&A: From DTDs to Documents." By John E. Simpson. From September 27, 2000. "This month our question and answer column covers guidelines for good DTD design and the thorny problem of generating Microsoft Word or Adobe Acrobat documents from XML."

  • [September 29, 2000] "XML-Deviant: Schemas in the Wild." By Leigh Dodds. From September 27, 2000. "As adoption of W3C XML Schema technology increases, the need for documenting best practices is becoming more important, not least where namespaces are concerned. This week the XML-Deviant revisits the topic of schemas. As the W3C XML Schemas Working Group is making final strides toward releasing XML Schemas as a Candidate Recommendation, the XML community is exploring best practice in schema design."

  • [September 29, 2000] "Guidelines for Markup of Electronic Texts." Edited by Peter C. Gorman, UW-Madison TEI Markup Guidelines Working Group; Endorsed by the UW-Madison Libraries Digital Steering Committee September 11, 2000. September 29, 2000. "This document is intended for use by staff using the Text Encoding Initiative (TEI) Guidelines TEIP3 to mark up electronic texts for inclusion in the UW-Madison Libraries' digital collections. It is not relevant to other types of projects using SGML encoding, e.g., page-image projects or digital finding aids. Some of the content has been quoted or adapted from other published guidelines, which are referenced in each case. The purpose of this document is not to teach or otherwise document the TEI itself, but rather to create a profile of the TEI for use in the UW-Madison digital library collections. It is assumed that the user is already familiar with TEI markup. The motivation for creating these guidelines is a desire to create a consistent and scalable infrastructure for text encoding projects, whereby new works can be created and added to the collection with minimal development effort on the part of project leaders, text encoders, and technical staff. At the same time, text encoded according to these guidelines should provide a suitable base for further elaboration or expansion by future encoders with minimal restructuring. At any point in this document, you can click on the magnifying glass icon to see examples of the point being discussed. The examples will open in a new window. [...] The primary motivation for creating this document was a desire to define encoding standards for a 'base' level: the minimal level of markup we would accept for locally-produced collections. The result, a 'Reading Level', falls somewhere between the poles of 'use nothing but <div0>, <p>, and <lb>' and 'TEILite is useless for real documents'. But why define a minimal level at all? 
For us, the answer is that we want to provide basic ('reading') access to as many materials as possible (as appropriate for the curricular and research needs of our campus), but the production of marked-up texts can be expensive." See "Text Encoding Initiative (TEI) - XML for TEI Lite."
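A "reading level" document of the kind the guidelines describe, restricted to minimal structural markup, might look like the sketch below. This is a generic TEI P3 / TEI Lite example using the `<div0>`, `<p>`, and `<lb>` elements the guidelines mention, not UW-Madison's actual profile; the header content is invented.

```xml
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample Text: An Electronic Edition</title></titleStmt>
      <publicationStmt><p>Example digital collection (illustrative).</p></publicationStmt>
      <sourceDesc><p>Transcribed from a print original.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div0 type="chapter">
        <p>First paragraph of the reading-level transcription.<lb/>
        A forced line break is recorded with the lb element.</p>
      </div0>
    </body>
  </text>
</TEI.2>
```

Even this minimal markup is enough to drive basic display and navigation, which is the "reading access" trade-off the guidelines argue for.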

  • [September 29, 2000] "Microsoft to air wireless server. Mobile Information 2001 Server to make its debut." By John Fontana. In Network World News (September 22, 2000). "Microsoft next week will introduce a server designed to give companies a way to wireless-enable applications for access from any number of handheld devices. The Mobile Information 2001 Server is middleware that transforms output from corporate applications into formats that can be displayed on mobile phones and other handheld devices. The server, which was code-named Airstream, also is a platform for building new wireless-enabled applications. The server will be a central point for establishing what devices can connect to the network, managing user access and security across the corporate firewall, and setting content-delivery preferences for devices such as Palm Pilots, Windows CE computers and mobile phones...Mobile Server works by taking in application data from servers and transforming it into formats such as the Wireless Markup Language or compact HTML for presentation on wireless devices. Key to the data transformation is the Extensible Stylesheet Language, which provides information to identify what type of device is requesting information and what kind of network it is running on. Mobile Server also includes support for a number of services for building mobile applications, including XML, Web Distributed Authoring and Versioning (WebDAV) and Wireless Markup Language. The server also supports standards based mobile specific transports and security mechanisms including, IETF DAV, Handheld Devices Markup Language, HTTP/HTML, XML, Secure Sockets Layer, Secure Hypertext Transport Protocol, Wireless Access Protocol, Wireless Markup Language, Active Directory Services Interface, Lightweight Directory Access Protocol and Short Message Service. The server also includes Microsoft Message Queuing to support asynchronous delivery of data to devices without a persistent connection."

  • [September 28, 2000] "Language Identification and IT: Addressing Problems of Linguistic Diversity on a Global Scale." By Peter Constable and Gary Simons. In SIL Electronic Working Papers. Reference: SILEWP 2000-001. September 2000. 22 pages. Keywords: ISO 639, RFC 1766, internationalization, I18N, linguistic diversity, web development, XML, language identification, information technology (IT). [A revised version of a paper that was presented at the 17th International Unicode Conference in San José, California, in September 2000, and which appears in the conference proceedings.] "Many processes used within information technology need to be customized to work for specific languages. For this purpose, systems of tags are needed to identify the language in which information is expressed. Various systems exist and are commonly used, but all of them cover only a minor portion of languages used in the world today, and technologies are being applied to an increasingly diverse range of languages that go well beyond those already covered by these systems. Furthermore, there are several other problems that limit these systems in their ability to cope with these expanding needs. This paper examines five specific problem areas in existing tagging systems for language identification and proposes a particular solution that covers all the world's languages while addressing all five problems." [...] The information technology (IT) industry has been driven in recent years to address problems of multilingualism and internationalization. This has been driven to a significant extent by the growth of the Internet. Rapidly increasing economic development throughout the world, together with the growth of the 'Net, has actually resulted in a significant increase in the number of languages that technologies need to support.
In many parts of the world, speakers of previously 'unknown' languages (that is, unknown to speakers of 'major' languages) are beginning to make their mark on the World Wide Web, and are using their own languages to do so. Even apart from the Internet, communities of speakers of lesser-known languages are using technology to pursue linguistic development of their communities through literacy, literature development and other means. In addition, researchers such as linguists and anthropologists, development and relief organizations, and governments are pursuing interests involving thousands of different linguistic and ethnic communities around the world. In this work, they are seeking to make use of current information technologies, such as Unicode and XML. . .[Problem of scale:] The need for systems to cover thousands of languages is real, not merely hypothetical. For instance, SIL has been involved in projects in some 1,600 different languages, of which about 1,100 are current, and new projects are begun regularly. Thus, just within SIL, we have an immediate need for over 1,600 identifiers that conform to RFC 1766 for use within XML documents. We are aware of several other agencies that have similar, vastly multilingual needs, such as the Linguistics Data Consortium, the Linguist List, the Endangered Language Fund, UNESCO, various departments of the U.S. and other governments, and others. When we add the work of other institutions, individual linguists and the language communities themselves, the existing needs for language identifiers are considerably greater, and are only continuing to grow. As stated earlier, every language in the world represents a real need for a unique language identifier. When confronted with needs for thousands of language identifiers, we find that some existing systems do not scale well. There is the obvious problem of devising several thousand new tags. 
There are other problems with scaling, however, due either to the mechanism that a system uses for tags, or to the procedures for extending the coverage of a system. We will consider each of these in turn..." Also in PDF format. See "Names of Languages - ISO 639." [cache]
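The RFC 1766 identifiers the authors discuss have a simple syntactic shape that can be sketched in a few lines. The checker below is illustrative only, assuming the tag grammar from RFC 1766 itself (each subtag 1-8 ASCII letters; primary tag a 2-letter ISO 639 code, "i" for IANA-registered tags, or "x" for private use); the helper name and the sample tags are invented here, not taken from the paper:

```python
import re

# Each (sub)tag is 1-8 ASCII letters, separated by hyphens.
TAG_RE = re.compile(r"^[A-Za-z]{1,8}(-[A-Za-z]{1,8})*$")

def is_rfc1766_tag(tag: str) -> bool:
    """Check the syntactic shape of an RFC 1766 language tag."""
    if not TAG_RE.match(tag):
        return False
    primary = tag.split("-")[0].lower()
    # Primary tag must be a 2-letter ISO 639 code, "i", or "x".
    return len(primary) == 2 or primary in ("i", "x")

print(is_rfc1766_tag("en-US"))      # True
print(is_rfc1766_tag("x-sil-abc"))  # True: the private-use escape hatch
print(is_rfc1766_tag("english!"))   # False: illegal character
```

The "x-" private-use branch is the only way such a scheme reaches the thousands of languages SIL needs, which is precisely the scaling problem the paper describes.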

  • [September 27, 2000] "Airlines turn to XML to try to fix e-ticket transfer problems." By Michael Meehan. In ComputerWorld (September 25, 2000). "After a summer plagued by record numbers of delayed and cancelled flights, the top U.S. airlines have decided to try to fix the clunky links between their individual electronic-ticketing systems in an effort to make it easier for stranded passengers who don't have paper tickets to rebook flights with a different carrier. Jim Young, managing director for cost measurement and distribution strategy at Continental Airlines Inc. in Houston, said here last week that an XML-based standard for sharing electronic-ticket information is being developed by the OpenTravel Alliance (OTA) travel-industry trade association. Young is the chairman of the OTA, which includes all of the leading international airlines, computerized reservations systems and hotel chains. At the eTravelWorld conference, Young said the OTA is looking to fast-track the XML interoperability standard in hopes of eliminating one of the major impediments blocking a full conversion to electronic tickets. A draft of the standard is expected by year's end, and Young said a finished version could be in place before next summer's travel season starts. Currently, passengers who have electronic tickets have to wait in line to receive a paper ticket from their initial airline if a flight has been canceled and they want to try to switch to another carrier. In addition, airline employees must fill out a handwritten 'flight interruption manifest' for each ticketholder who's looking to rebook elsewhere. But with an industry-standard setup based on XML, Young said, a passenger's electronic ticket could automatically be transferred to another airline's system. The common XML technology would provide an easy-to-process format for all the airlines and could make electronic tickets more valuable than paper ones, he added. 
'We want to create an environment where we're treating our electronic customers better than our paper-ticket customers, which is certainly the opposite of what it is today,' Young said. Al Lenza, vice president of distribution planning at Minneapolis-based Northwest Airlines Inc., said 67% of his company's domestic flyers use electronic tickets -- making it imperative that the transferability problem be solved. At the conference, executives from Chicago-based United Air Lines Inc. and Fort Worth, Texas-based American Airlines Inc. also pledged their commitment to fixing the problem..." See "OpenTravel Alliance (OTA)."

  • [September 25, 2000] "XML from Your Palm." By Norman Walsh (Staff Engineer, Sun Microsystems, XML Technology Center). From Sun Developer Connection. "If you're like me, you rely on your Palm organizer to keep a semblance of order in your life. Without it, I wouldn't get to meetings on time, or remember to participate in telephone conference calls, or know how to reach my colleagues when I'm on the road. Unfortunately, for all its benefits, I still have some troubles with my Palm. Among them, the fact that I can't sync my Palm address book with other information management tools that are important to me (e.g., my BBDB in Emacs) or publish the calendar on the web so that I can share my calendar with my manager and colleagues. Now, I'm sure I could have gone out and found solutions for some of these problems, for example, one of the web calendar syncing tools, but when I hear a problem described that involves open-format information exchange and multiple output formats, one answer springs immediately to mind: XML. So, what I wanted was some way to sync my Palm to my desktop machine using XML so that I could transform the XML into other formats and sync it with other formats. . . I'm by no means done hacking my XML address book and schedule. I still have to write the XML BBDB/Palm merging tool and I may decide to write a stylesheet for putting my address book online. I hope some of you find the SyncXml Conduits useful and feel inspired to introduce XML into your applications. Let me know!"
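Walsh's sync-then-transform idea can be sketched with a hypothetical address-book document. The element names and the `to_csv` helper below are invented for illustration, not taken from his SyncXml Conduits:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical address-book XML of the kind synced from a Palm.
ADDRESSES = """\
<addressbook>
  <entry><name>Ada Lovelace</name><phone>555-0100</phone></entry>
  <entry><name>Alan Turing</name><phone>555-0101</phone></entry>
</addressbook>"""

def to_csv(xml_text):
    """Transform the XML address book into CSV, one row per entry."""
    root = ET.fromstring(xml_text)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["name", "phone"])
    for entry in root.findall("entry"):
        writer.writerow([entry.findtext("name"), entry.findtext("phone")])
    return buf.getvalue()

print(to_csv(ADDRESSES))
```

Once the data is in an open XML format, the same source can feed a CSV export, a BBDB merge, or a stylesheet-driven web calendar.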

  • [September 23, 2000] "The Petri Net Markup Language." 6 pages, with 6 references. By Matthias Jüngel, Ekkart Kindler, and Michael Weber (Humboldt-Universität zu Berlin; email: {juengel|kindler|mweber}). 31-August-2000. To be presented at the 'Workshop Algorithmen und Werkzeuge für Petrinetze', Koblenz University, Germany, 2000. ['A position paper which argues in favour of a generic interchange format and discusses the basic idea of PNML.'] "At the 'Meeting on XML/SGML based Interchange Formats for Petri Nets' held in Aarhus in June 2000, different aspects of interchange formats for Petri nets were discussed, requirements were identified, and several interchange formats were proposed. Here, we present an interchange format for Petri nets that is based on this discussion and on our preliminary proposal. We call it the Petri Net Markup Language (PNML). The proposed format is quite basic, but it is open for future extensions. In this paper, we present the concepts and terminology of the interchange format as well as its syntax, which is based on XML. It should provide a starting point for the development of a standard interchange format for Petri nets. Concepts and terminology: Before introducing the syntax of the interchange format, we briefly discuss its basic concepts and terminology, which is independent of XML. . . A file that meets the requirements of the interchange format is called a Petri net file; it may contain several Petri nets. Each Petri net consists of objects, where the objects, basically, represent the graph structure of the Petri net. Thus, an object is a place, a transition, or an arc. For structuring a Petri net, there are three other kinds of objects, which will be explained later in this section: pages, reference places, and reference transitions. Each object within a Petri net file has a unique identifier, which can be used to refer to this object. 
For convenience, we call places, transitions, reference places, and reference transitions nodes, and we call a reference place and a reference transition a reference node. In order to assign further meaning to an object, each object may have some labels. Typically, a label represents the name of a node, the marking of a place, the guard of a transition, or the inscription of an arc. The legal labels -- and the legal combinations of labels -- of an object are defined by the type of the Petri net, which will be defined later in this section. In addition, the Petri net itself may have some labels. For example, the declaration of functions and variables that are used in the arc-inscriptions could be the labels of a Petri net. We distinguish between two kinds of labels: annotations and attributes. Typically, an annotation is a label with an infinite domain of legal values. For example, names, markings, arc-inscriptions, and transition guards are annotations. An attribute is a label with a finite (and small) domain of legal values. For example, an arc-type could be a label of an arc with domain: normal, read, inhibitor, reset (and maybe some more). Another example is the attributes for classifying the nodes of a net as proposed by Mailund and Mortensen. Besides this pragmatic difference, annotations have graphical information whereas attributes do not have graphical information. [...] The available labels and the legal combinations of labels for a particular object are defined by a Petri net type. Technically, a Petri net type is a document that defines the XML-syntax of labels; i.e., either a DTD-file or an XML-Schema. In this paper, we concentrate on a single Petri net type: a Petri net type for high-level Petri nets. In principle, a Petri net type can be freely defined. In practice, however, a Petri net type chooses the labels from a collection of predefined labels, which are provided in a separate document: the conventions. 
The conventions guarantee that the same label has the same meaning in all Petri net types. This allows us to exchange nets between tools with a different, but similar Petri net type. . . we present some concrete XML syntax in order to exemplify the concepts discussed in Sect. 2. Here, we can only give a flavour of PNML by examples. The examples are pieces of a PNML-coded document representing a Petri net. The examples refer to the PNML version 0.99. In PNML, the net, the Petri net objects, and the labels are represented as XML elements. An XML element is included in a pair of a start tag <element> and an end tag </element>. An XML element may have XML attributes in order to qualify it. XML attributes of XML elements are denoted by an assignment of a value to a key in the start tag of the XML element <element key="value">. XML elements may contain text or further XML elements. An XML element without text or sub-elements is denoted by a single tag <element/>. In our examples, we sometimes omit some XML elements. We denote this by an ellipsis (...). The tags of the following XML elements are named after the concepts given in Sect. 2 except for the labels. Labels are named after their meaning. Thus, an unknown XML element appearing in a Petri net or in an object may be interpreted as a label of the net or the object..." See: "Petri Net Markup Language (PNML)" and "XML and Petri Nets." [cache]
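A toy document in the spirit of the description above can make the structure concrete. The tag and attribute names here are modeled on the paper's prose (net, place, transition, arc, each object with a unique id, labels as nested elements), not copied from the PNML 0.99 DTD itself:

```python
import xml.etree.ElementTree as ET

# A PNML-style document: objects carry unique ids; arcs reference them.
PNML = """\
<pnml>
  <net id="n1" type="hlnet">
    <place id="p1"><name><text>ready</text></name></place>
    <transition id="t1"><name><text>send</text></name></transition>
    <arc id="a1" source="p1" target="t1"/>
  </net>
</pnml>"""

root = ET.fromstring(PNML)
net = root.find("net")
# Recover the graph structure of the net from the object ids.
ids = [obj.get("id") for obj in net]
print(ids)                                   # ['p1', 't1', 'a1']
arc = net.find("arc")
print(arc.get("source"), arc.get("target"))  # p1 t1
```

The id/source/target scheme is what lets one object refer to another, which is all an interchange tool needs to rebuild the underlying graph.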

  • [September 23, 2000] "Textual Interchange Format for High-level Petri Nets." By R.B. Lyngsø and Thomas Mailund [Jensen]. "In this paper a text format for High-level Petri Net (HLPN) diagrams is presented. The text format is designed to serve as a platform-independent file format for the Design/CPN tool. It is consistent with the forthcoming standard for High-level Petri Nets. The text format may also be seen as our contribution to the development of an open, tool-independent interchange format for High-level Petri nets. The text format will make it possible to move Design/CPN diagrams between all supported hardware platforms and versions. It is also designed to be a bridge to other Petri Net tools, e.g., other analysis tools which the user may want to use with Design/CPN diagrams. The proposed text format does not address any standardization for the inscription language used in the diagram. It is, however, possible to extend the format to incorporate such a standardization. The text format is designed for the exchange of Hierarchical Coloured Petri Nets but the structure is general enough to cope with other High level Petri Nets as well. The text format presented here has been implemented as part of Design/CPN version 3.1. [...] Design/CPN is a widely used tool within the Petri Net community and has been developed for more than 10 years. The tool has been used in many projects in a broad range of application areas. Design/CPN supports Hierarchical Coloured Petri Nets (CP-nets or CPNs) with complex data types (colour sets) and complex data manipulations (arc expressions and guards) - both specified in the functional programming language CPN ML. It also supports hierarchical CP-nets, i.e., net diagrams that consist of a set of separate modules (subnets) with well-defined interfaces. . . 
We believe however that the inscription language is a far more integrated part of each tool than the graphical layout, thus translation to and from a common inscription language was considered beyond the scope of this text format. On the other hand it was feasible to implement known standards for both naming conventions and syntax in order to make it easier for humans to read the description of the diagram. To that end we have chosen to use the terminology presented in the current version of the committee draft of the HLPN standard. This means that the entities known in Design/CPN as arc expressions, colour regions and guards are called arc annotations, type region and transition conditions, respectively. Furthermore we chose to use the SGML (Standard Generalized Markup Language) ISO standard, which will be described in Sect. 3. The structure of the text format is chosen to be based on the semantic properties rather than the graphical appearance. It is thus considered more important to know that a certain object is a place than it is to know that the object has the shape of an ellipse. We want the text format to be both general enough to be used by various different tools and specific enough to save the same information about a diagram as the binary format used so far. Different tools need to add different kinds of information to the text format, and later versions of Design/CPN will probably add to the format as well." See also the DTD. On Petri Nets, see "XML and Petri Nets." [cache]

  • [September 23, 2000] "Separation of Style and Content with XML in an Interchange Format for High-level Petri Nets." By Thomas Mailund and Kjeld H. Mortensen (Department of Computer Science, University of Aarhus, Denmark). "Style sheets have been proposed by the World Wide Web Consortium (W3C) as a means for separating presentation and content in World Wide Web documents, and they are now widely used and generally popular. In this paper we use the same idea for an XML interchange format proposal for high-level Petri nets. There are several benefits of using this design principle: It is easier to exchange content-only for non-graphics tools, and alternative styles can conveniently be replaced to make a new graphical appearance. The ideas presented in this paper are illustrated by means of examples. . . Most high-level Petri net tools can agree on the underlying mathematical model of a Petri net: a set of places, a set of transitions, a set of arcs connecting places and transitions, and some annotations describing data types and data. It is quite another matter when it comes to the graphical layout of nets. Different tools have different graphical attributes that can be associated with the net elements. To be able to interchange models between different tools, we would like a minimal format containing only the information found in the standard for high-level nets, and a more elaborate format for describing graphical attributes. The latter should be extendable to include tool-specific graphical information. . . In this paper we suggest how to separate presentation and content for a high-level Petri net interchange format. We used CSS examples to illustrate the main benefits of such a design technique. However, CSS is more suitable for presentation of text than of vector graphics such as Petri nets. Hence as a future activity we propose to design a special purpose style language in the context of the interchange format activity." On Petri Nets, see "XML and Petri Nets." [cache]
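The separation the authors propose can be sketched as a content document plus a replaceable style mapping. Both the element names and the style keys below are invented for this sketch; they stand in for the content format and the proposed style language, respectively:

```python
import xml.etree.ElementTree as ET

# Content only: the semantic net, with no graphical attributes at all.
NET = "<net><place id='p1'/><transition id='t1'/></net>"

# A swappable "style sheet": element type -> graphical shape.  Replacing
# this mapping changes the appearance without touching the content file.
STYLE = {"place": "ellipse", "transition": "rectangle"}

root = ET.fromstring(NET)
for obj in root:
    print(obj.get("id"), "drawn as", STYLE[obj.tag])
```

A non-graphics analysis tool simply ignores the style mapping and consumes the content document alone, which is the first benefit the paper lists.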

  • [September 23, 2000] "An Experimental Approach Towards the XML Representation of Petri Net Models." By Ousmane Sy, Mathieu Buffo, and Didier Buchs (LIHS-FROGIS, Université Toulouse I, Place Anatole France, F-31042 Toulouse CEDEX, France). "XML has attracted many application developers because it eases the interchange of data between heterogeneous applications. XML-based proposals are currently being elaborated to store Petri net models. The fact that some Petri net tools already use XML for storage purposes shows that XML may be suitable for Petri net tools. But it also raises the question of interchange of models between different tools. Indeed, each tool defines its own storage format (DTD - Document Type Definition) using XML, which can make the interchange difficult if the DTD is not standardized. In order to get insights for the definition of standard XML representations, we have set up a research team whose goals are the following: (1) make a survey of the available tools in order to gather information about the various features they support; (2) extract and build a taxonomy of formalisms in order to identify clusters of Petri net dialects that have the same needs; (3) and finally propose an XML representation standard for Petri net tools derived from this taxonomy. This paper presents the preliminary results derived from our survey and will be completed according to incoming information up to the Petri net conference." See: "Petri Net Markup Language (PNML)."

  • [September 23, 2000] "XML Based Schema Definition for Support of Inter-organizational Workflow." By W.M.P. van der Aalst and A. Kumar. Paper presented at the "Meeting on XML/SGML based Interchange Formats for Petri Nets," 21st International Conference on Application and Theory of Petri Nets [ICATPN 2000]. Aarhus, Denmark, June 26-30, 2000. 41 pages (with 40 references). "Commerce on the Internet is still seriously hindered by the lack of a common language for collaborative commercial activities. Although XML (Extensible Markup Language) allows trading partners to exchange semantic information electronically, it does not provide support for document routing. In this paper, we propose the design for an eXchangeable routing language (XRL) using XML syntax. Since XML is becoming a major international standard, it is understood widely. The routing schema in XRL can be used to support flexible routing of documents in the Internet environment. The formal semantics of XRL are expressed in terms of Petri nets and examples are used to demonstrate how it can be used for implementing inter-organizational electronic commerce applications. . . A core feature of XRL is that it provides a mechanism to describe processes at an instance level, i.e., an XRL routing schema describes the partial ordering of tasks for one specific instance. Traditional workflow modeling languages describe processes at a class or type level. Workflow instances, often referred to as cases, typically have a state which is expressed in terms of the class model. From an efficiency point of view, it is beneficial to split state (i.e., instance level) and process model (i.e., class level): If there are many instances of the same class, then duplication of routing information is avoided. However, in the context of inter-organizational workflow such a split is undesirable. It is unrealistic to assume that the different organizations share a common process model. 
Moreover, it should be possible to migrate instances (or parts of instances) from one organization to another without prior agreement on the precise order in which tasks are executed. Since the XRL routing schema describes the partial ordering of tasks for one specific instance instead of a class: (1) the schema can be exchanged more easily, (2) the schema can be changed without causing any problems for other instances, and (3) the expressive power is increased (workflow modeling languages typically have problems handling a variable number of parallel or alternative branches)... A Document Type Definition (DTD) which describes all the constructs is given in Appendix 1 using standard XML notation. The DTD contains markup declarations for the class of XRL documents. The document element, also called the root, is the route element. Any XRL document should be well-formed, i.e., taken as a whole it should match the production labeled document in the XML version 1.0 standard. Moreover, any XRL document should also be valid, i.e., the document should satisfy the constraints expressed by the declarations in the DTD. In Section 4, we introduced the XRL (eXchangeable Routing Language). The syntax of this language was defined in terms of a DTD. XRL is used to describe the dynamics of inter-organizational workflows. Therefore, it is of the utmost importance to have clear semantics for each of the constructs supported by XRL. For this purpose, we map XRL onto Petri nets. On the one hand, Petri nets can be used to represent the business logic in a graphical manner. In fact, the Petri net language is close to many of the diagramming languages used by both commercial workflow management systems and researchers active in this domain. For example, workflow management systems and ERP systems such as COSA (Software Ley), Income (Promatis), BaanERP (Baan), and ARIS/SAP (IDL/SAP) use (variants of) Petri nets. 
On the other hand, Petri nets are a formal language with clear semantics, powerful analysis techniques, and strong theoretical results. By mapping XRL onto Petri nets, we give formal semantics, are able to reason about XRL (e.g., about its expressive power), can use state-of-the-art analysis techniques, and can use existing software..." See: "Exchangeable Routing Language (XRL)" and "XML and Petri Nets." [cache]
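The instance-level routing idea above can be made concrete with a tiny document. The abstract states that the root of any XRL document is the route element; the child elements <sequence> and <task> used below are plausible but assumed for this sketch, not quoted from the XRL DTD:

```python
import xml.etree.ElementTree as ET

# An instance-level routing schema: this document describes the task
# ordering for ONE workflow instance, so it can be shipped between
# organizations along with the instance itself.
ROUTE = """\
<route name="order_17">
  <sequence>
    <task name="approve"/>
    <task name="ship"/>
    <task name="bill"/>
  </sequence>
</route>"""

def tasks_in_order(xml_text):
    """Flatten the routing schema into its ordered list of tasks."""
    root = ET.fromstring(xml_text)
    return [t.get("name") for t in root.iter("task")]

print(tasks_in_order(ROUTE))  # ['approve', 'ship', 'bill']
```

Because the schema travels with the instance, a receiving organization can interpret (and even modify) the routing without sharing a class-level process model with the sender, which is the paper's central argument.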

  • [September 23, 2000] "XML Schemas: Best Practices." By Roger L. Costello et al. Topics: (1) Hiding (Localizing) Namespace Complexities within the Schema; (2) Namespaces: Expose them or Not? Purpose: collectively come up with a set of 'best practices' in designing XML Schemas. The specifics of designing a schema are dependent upon the task at hand. The goal of this effort is to come up with a set of schema design guidelines that hold true irrespective of the specific task. Below are some of the things that must be considered in designing a schema. It is by no means an exhaustive list. For example, it doesn't address when to block a type from derivation, when to create a schema without a namespace, when to make an element or a type abstract, etc. Nonetheless, it is a start to some hopefully useful discussions. First, a quick list of the issues: (1) Element versus Type Reuse; (2) Local versus Global; (3) elementFormDefault - to qualify or not to qualify; (4) Evolvability/versioning; (5) One namespace versus many namespaces (import versus include); (6) Capturing semantics of elements and types ... For schema description and references, see "XML Schemas."
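Issue (3), elementFormDefault, is easy to see from the instance-document side: with qualified local elements every element lives in the target namespace, with unqualified locals only the root does. A minimal sketch (the namespace URI and element names are invented):

```python
import xml.etree.ElementTree as ET

# Instance as it would look with elementFormDefault="qualified":
QUALIFIED = ('<p:book xmlns:p="http://example.org/bk">'
             '<p:title>X</p:title></p:book>')
# ... and with elementFormDefault="unqualified":
UNQUALIFIED = ('<p:book xmlns:p="http://example.org/bk">'
               '<title>X</title></p:book>')

# ElementTree expands prefixes, so the namespace difference is visible
# directly in the tag names seen by consuming code.
q_title = ET.fromstring(QUALIFIED)[0].tag
u_title = ET.fromstring(UNQUALIFIED)[0].tag
print(q_title)  # {http://example.org/bk}title
print(u_title)  # title
```

Code that queries the document must match whichever form the schema chose, which is why the choice ranks among the design issues that hold true irrespective of the task.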

  • [September 23, 2000] "Bidcom Tames the XML Beast." By Tom Sullivan. In CTO FirstMover [InfoWorld] (September 18, 2000), page 39. "XML has emerged from HTML's shadow to become a major enabler for e-business, allowing dissimilar applications to interchange data efficiently. But this same flexibility creates a challenge for e-businesses that now need to extend XML to enable business processes. It's a tall order, but San Francisco-based Bidcom, an e-services provider that enables building professionals to communicate and collaborate, manage business processes, conduct e-commerce, and access industry-related content, met the challenge handily. Bidcom CTO Larry Chen explains that from the outset his company had a variety of disparate digital and paper-based systems, such as accounting, that needed to be tied together. So Bidcom began adopting XML as a query language two years ago and has since been using it as the vehicle that enables Bidcom's systems to exchange data. By using XML as a querying language, Bidcom enabled its customers to send and receive data with each other on the Web. In this particular case, Chen says, Bidcom had to overcome two levels of XML's extensibility. The first was extending the semantics of XML itself, and the second was extending the schema. To make things simple for Bidcom and its customers, Chen opted to stick to the World Wide Web Consortium's (W3C's) standards for extending XML's semantics. The W3C provided clear guidelines and something that all parties could easily agree on. . . . On top of that platform, Bidcom uses XSL (extensible stylesheet language) style sheets to customize forms, documents, and logos for its customers. Chen says that Bidcom did this by separating the presentation from the function via XML." [Note: Mr. Chen has over ten years of experience in the technology industry. Mr. 
Chen, representing Bidcom, is the current chair of the Construction/Project Management working group of the aecXML Project, a building industry consortium chartered to standardize the definition and exchange of architecture, engineering and construction (AEC) data. Mr. Chen also represents Bidcom on the board of the International Alliance for Interoperability (IAI).]

  • [September 23, 2000] "W3C Is Moving Aggressively On SOAP." By Antone Gonsalves. In TechWeb News (September 21, 2000). "The World Wide Web Consortium, which recently formed a working group for standardizing SOAP, is expected to finish within a year its first specification for it, an official close to the group said Thursday. The W3C announced last week it has formed an XML protocol activity working group that would be chaired by David Fallside, who works for IBM's standards division. '[The working group] has a very aggressive schedule,' said Robert Sutor, Fallside's boss and program director of IBM e-Business Standards Strategy. 'If you look at the charter, they believe they can get this done pretty much within a year, which is extremely fast by W3C standards.' 'I don't see any major roadblocks,' Sutor said. 'It's just a question of these companies who have said they would work on it to get their technical resources out there in doing so.' Because SOAP is XML-based, the technology is easier to use than programming-based cross-platform solutions. However, it is best suited for lightweight applications and exchanging information in an environment like the Web. It is not optimized for industrial-strength applications requiring tightly coupled, synchronous, ultra-secure processing among applications, observers said." See: (1) "Simple Object Access Protocol (SOAP)" and (2) the earlier announcement for the W3C working group and related XML Protocol Activity.
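To make concrete what the working group set out to standardize, here is a minimal SOAP 1.1-style message: an XML-encoded envelope wrapping a body. The envelope namespace is the published SOAP 1.1 one; the GetPrice payload and its namespace are made up for illustration:

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
MSG = f"""\
<soap:Envelope xmlns:soap="{ENV}">
  <soap:Body>
    <GetPrice xmlns="http://example.org/quotes">
      <symbol>SUNW</symbol>
    </GetPrice>
  </soap:Body>
</soap:Envelope>"""

# A receiver peels off the envelope and dispatches on the body payload.
root = ET.fromstring(MSG)
body = root.find(f"{{{ENV}}}Body")
symbol = body.find(".//{http://example.org/quotes}symbol")
print(symbol.text)  # SUNW
```

Because the whole exchange is plain XML text, this style of messaging suits the loosely coupled, Web-scale applications the article describes, rather than tightly coupled synchronous processing.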

  • [September 23, 2000] "SXML: Streaming XML." By Boris Rogge, Dimitri Van De Ville, Rik Van de Walle, Wilfried Philips, and Ignace Lemahieu (University of Ghent, B-9000 Ghent, Belgium); Tel: 1 +32 9/264.89.11; Fax: 1 +32 9/264.35.94; E-mail: Pages 389-393 (with 9 references) in ProRISC/IEEE Proceedings of the 10th Annual Workshop on 'Circuits and Systems and Signal Processing'. [ISBN: 90-73461-18-9.] Utrecht, The Netherlands. "When broadband networks are implemented, huge amounts of bandwidth will be available and hence, a number of new applications will emerge. Application developers will need a framework that enables them to utilize the possibilities of these types of new networks. In this article we present a document type that will allow the addition of (meta-)information to data streams and the synchronization of different data streams. It is called SXML (Streaming XML) and is based on the eXtensible Markup Language (XML). The SXML grammar is defined in a document type definition (SXML-DTD). The content of an SXML document can be processed in real time or can be retrieved from disk. XML is being used in a completely new manner and in a totally different environment in order to easily describe the structure of the stream. Finally, a preliminary implementation has been developed and is being tested." See similarly: B. Rogge, R. Van de Walle, I. Lemahieu, and W. Philips, "Introducing Streaming XML (SXML)," in SPIE International Symposium on Voice, Video, and Data Communications, (Boston), 2000. Accepted for publication; in press. [cache]

  • [September 23, 2000] "A New Method for Synchronizing Media Based on XML." By Boris Rogge, Dimitri Van de Ville, Rik Van de Walle, Wilfried Philips, and Ignace Lemahieu (University of Ghent, Ghent, Belgium). Presented at EUROMEDIA'2000. See "Ghent builds XML-driven demonstrator for remote multimedia data access in teleradiology and PACS": "XML can be a big help when it comes to synchronising multimedia data in radiology departments. Huge amounts of data have to be post-processed, so you will need a system which is fast and easy to use and has been implemented using open standards. The suitable characteristics of the Internet do not suffice for real synchronisation. The Quality of Service (QoS) heavily relies on the server. To meet these requirements, the ELIS research team has built an XML-driven demonstrator for radiological images, administrative patient data, and radiologist's comments. The database describes the links between the data." Related publications: (1) R. Van de Walle, B. Rogge, and I. Lemahieu, "Remote multimedia information access by using extensible markup language (XML)," in Proceedings Euromedia, (Antwerp), 2000. Invited presentation. (2) R. Van de Walle, B. Rogge, K. Dreelinck, and I. Lemahieu, "XML-based description and presentation of multimedia radiological data," in SPIE International Symposium on Voice, Video, and Data Communications, (Boston), 2000. Accepted for publication, in press.

  • [September 22, 2000] "Towards a Library of Formal Mathematics." By Andrea Asperti, Luca Padovani, Claudio Sacerdoti Coen, and Irene Schena (Department of Computer Science, University of Bologna). Paper submitted to TPHOLS2000. "The Extensible Markup Language (XML) opens the possibility to start anew, on a solid technological ground, the ambitious goal of developing a suitable technology for the creation and maintenance of a virtual, distributed, hypertextual library of formal mathematical knowledge. In particular, XML provides a central technology for storing, retrieving and processing mathematical documents, comprising sophisticated web-publishing mechanisms (stylesheets) covering notational and stylistic issues. In this paper, we discuss the overall architectural design of the new systems, and our progress in this direction." Note also "Formal Mathematics in MathML," presented at the October MathML Conference at UIUC. Published within the HELM Project [XML and the Hypertextual Electronic Library of Mathematics]. See further references in the W3C Math Home Page and "Mathematical Markup Language (MathML)."

  • [September 22, 2000] "XML, Stylesheets and the re-mathematization of formal content." By Andrea Asperti, Luca Padovani, Claudio Sacerdoti Coen, and Irene Schena (Department of Computer Science, University of Bologna). Submitted to LPAR2000. "An important part of the descriptive power of mathematics derives from its ability to represent formal concepts in a highly evolved, two-dimensional system of symbolic notations. Tools for the mechanization of mathematics and the automation of formal reasoning must eventually face the problem of re-mathematization of the logical, symbolic content of the information, especially in view of their integration with the World Wide Web. In a different work, we already discussed the pivotal role that XML-technology is likely to play in such an integration. In this paper, we focus on the problem of (Web) publishing, advocating the use of XSL-Stylesheets, in conjunction with the Mathematical Markup Language (MathML), as a standard, application independent and modular way for associating notation to formal content." Published within the HELM Project [XML and the Hypertextual Electronic Library of Mathematics]. See further references in the W3C Math Home Page and "Mathematical Markup Language (MathML)."

  • [September 22, 2000] "W3C Pursues SOAP-like Activity." By Carolyn Duffy Marsan. In Network World Fusion (September 19, 2000). "The World Wide Web Consortium has created a new working group to develop an XML-based messaging protocol, but the international standards body says it won't rubberstamp Microsoft's Simple Object Access Protocol - SOAP - as some vendors had hoped. W3C's XML Protocol Working Group will develop a common way for Web applications to communicate with each other in an automated fashion using XML-encoded messages. An anticipated feature of the next-generation Web, XML is a simple, flexible text format designed for large-scale electronic publishing. . . The XML Protocol Working Group will host its first face-to-face meeting in October, with an initial working draft of the protocol due next January. Companies already signed up to participate in the working group include IBM, Epicentric, SAP and Jamcracker. One unusual aspect of the XML Protocol Working Group is that it will conduct all of its business in public, W3C spokeswoman Janet Daly says. The working group's charter, members and meeting minutes will be posted on the W3C's Web site." See (1) "Simple Object Access Protocol (SOAP)" and (2) the earlier announcement for the W3C working group and related XML Protocol Activity.

  • [September 22, 2000] "XML Overview." From Microsoft ISN (Internet Services Network). August 31, 2000. "Extensible Markup Language (XML) is the universal language for data on the Web. XML gives developers the power to deliver structured data from a wide variety of applications to the desktop for local computation and presentation. XML allows the creation of unique data formats for specific applications; it is also an ideal format for server-to-server transfer of structured data. There are many benefits to using XML both on the Web and in the middle tier: (1) Delivers data for local computation. Data delivered to the desktop is available for local computation. The data can be read by the XML parser, then delivered to a local application such as a browser for further viewing or processing. Or the data can be manipulated through script or other programming languages using the XML Object Model. (2) Gives users an appropriate view of structured data. Data delivered to the desktop can be presented in multiple ways. A local data set can be presented in the view that is right for the user, dynamically, based on factors such as user preference and configuration. (3) Enables the integration of structured data from multiple sources into common logical views. Typically, agents will be used to integrate data from server databases and other applications on a middle-tier server, making this data available for delivery to the desktop or to other servers for further aggregation, processing, and distribution. (4) Describes data from a wide variety of applications. Because XML is extensible, it can be used to describe data contained in a wide variety of applications, from describing collections of Web pages to data records. Because the data is self-describing, data can be received and processed without the need for a built-in description of the data. (5) Improves performance through granular updates. XML enables granular updating. 
Developers do not have to send the entire structured data set each time there is a change. With granular updating, only the changed element must be sent from the server to the client. The changed data can be presented without the need to refresh the entire page or table. To date, Microsoft has actively participated in the W3C creation and standardization of XML and has aggressively delivered XML support in its products..."
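The "self-describing" benefit claimed above can be illustrated with a small fragment (an invented example, not taken from the Microsoft overview):

```xml
<!-- A hypothetical customer record: the tags themselves describe the data,
     so a receiving application can process it without a separate format
     specification shipped alongside. -->
<customer id="c042">
  <name>Ada Lovelace</name>
  <order date="2000-09-01">
    <item sku="B-17" qty="2">Widget</item>
  </order>
</customer>
```

Under the granular-update model described in point (5), a change to one order line would resend only the affected <item> element rather than the whole record or page.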

  • [September 22, 2000] "XML, .NET and distributed applications." In Analytic. September 2000. 'Analysis of the market and technical aspects of electronic publishing'. Also in .DOC format.

  • [September 22, 2000] "Getting The Message. [APPLICATION INTEGRATION & MANAGEMENT.]" By Max P. Grasso. In Application Development Trends (August 2000). ['EJB, COM, RPC and CORBA get the headlines, but messaging middleware is still the preferred choice for building large, distributed corporate systems; the addition of XML is expected to quickly boost capabilities.'] "Messaging middleware can offer more than primitives for message delivery. In the past few years, many middleware products have started to provide higher level services that we will cumulatively call message brokering. These services allow sophisticated custom logic to run co-located with the middleware and to examine, route, filter and modify messages according to content or message envelope attributes. In practice, message brokering can be exploited to the point where a substantial part of a business app will be contained in the middleware's nodes and the global communication patterns will be defined by the local behavior at the nodes. In the past, messaging middleware required the use of proprietary message definitions, so that it could look not only in the message envelope but also into the message content formats -- and could possibly transform the messages in transit. Because this required a specific product focus, vendors would either focus on the message delivery or the message-brokering facilities. The acceptance of eXtensible Markup Language (XML) has changed things dramatically. It has allowed any messaging product to easily provide message-brokering functionality for XML-formatted messages. Such facilities can be put together easily with vendor-provided or open source tools. Without a doubt, messaging products will make increasing use of XML and provide message filtering and content-based routing. 
In general, message-brokering facilities provide an excellent platform for integrating legacy and new systems, as long as one can easily describe business flows in terms of event-driven logic (otherwise, we would rather recommend developing a solution using distributed components). This approach to solving business problems using routing, filtering and transformation logic in the messaging middleware is targeted by EAI products."

  • [September 22, 2000] W3C RDF and ISO/XML Topicmap discussion. Dan Connolly posted the announcement: "Following the energy and good will built up at the 'RDF vs. Topic Maps' session at Extreme Markup Languages in Montreal, a few of us got together by phone/IRC this morning (well... morning in my time zone). The notes we managed to take are available at" See RDF - ISO/XML Topicmap Agenda - 20000918: "A continuation of the conversations that started in Montreal at the Extreme Markup Languages Conference to increase knowledge and understanding about the relationships between the W3C RDF and ISO/XML Topicmap activities." See "(XML) Topic Maps" and "Resource Description Framework (RDF)."

  • [September 22, 2000] Zvon XLink Reference. Miloslav Nic announced 2000-09-22: "Jiri Jirat has published an XLink reference and XLink examples. The examples are functional: if you have a browser which supports XLink (Mozilla M17 recommended), you can test the real behaviour. If you do not have an XLink-capable browser, you can go through the tutorial, where the behaviour is simulated (using JavaScript)." Main features: (1) Names of XLink elements and attributes are clickable; (2) Clicking on 'Go to standard' leads to the relevant part of the W3C XLink specification; (3) Where applicable, links to relevant examples are given. See XLink resources referenced in "XML Linking Language."

  • [September 22, 2000] "Out-of-Line XML Linking with X2X: Satisfying the need for better access to information." By Jason Markos (Director of Research and Development, empolis UK). Technical [white] paper. "In the past, [linking] has been enabled through either HREFs or ID/IDREFs. While these two approaches are syntactically different, they are very similar in how they work. They are both based on the conventional idea of a link being about a source file and a target file and embedding the syntax within the source file to get to the target. This approach has a number of limitations. Firstly, neither of them are bi-directional links. However, the biggest failing is that the link has no way of expressing any semantic value as to why the relationship was established. For example, HREFs rely on the text that is highlighted to explain why the link was inserted. ID/IDREFs rely on the element or its content in a similar way. This model works if the authors understand the semantic meaning of the links that should point to or from the file that they were authoring. [Hence:] Out-of-line Linking and XLink. The fundamental principle behind 'out-of-line' linking is that the link information is stored independently of the resources that are being linked together. This concept has been around for a considerable time, but it wasn't until HyTime was produced that the idea was popularised. While XLink is a more recent standard, it is trying to solve the same problems that HyTime tries to address with regards to linking issues. Out-of-line linking has much richer functionality than the more conventional types of linking discussed earlier. This richer functionality can allow us to address some of the linking issues such as multi-directional navigation, management, validation and link semantics. There is a lot more information to be captured in an out-of-line link. A link can have a title and other attributes which allows applications to distinguish different links. 
Each link can be viewed as a typed relationship between a number of resources. Each one of these resources can be a document, a part of a document or a data fragment. These points are represented using anchors. An anchor is a handle to a document or a position within a document. Each anchor can have a role defining the part that the anchor plays within the link. . . In summary, [the XLink tool] X2X enables you to: (1) Link between resources without the need to change them. (2) Build new documents dynamically from a template link document. (3) Have true bi-directional links between resources. (4) Group sets of related resources in strongly typed relationships. (5) Link between structured and unstructured information. (6) Manage large repositories of link information in a centralised efficient manner. (7) Link resources that are stored in a variety of different data repositories. (8) Create links using any application that creates XML XLink documents." Note: X2X from empolis UK is advertised as "the first commercial XLink engine that enables organisations to create links that are more valuable than the information they link..." See XLink resources referenced in "XML Linking Language."
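The out-of-line model described in this entry can be sketched as an XLink extended link stored in its own linkbase document (a hand-written illustration; the element names outside the xlink:* attributes are invented, not X2X syntax):

```xml
<!-- An out-of-line link: kept apart from the two resources it connects,
     so neither resource needs to be modified. -->
<linkbase xmlns:xlink="http://www.w3.org/1999/xlink">
  <related xlink:type="extended" xlink:title="Term definition">
    <!-- anchors: handles to positions within documents -->
    <anchor xlink:type="locator" xlink:href="spec.xml#term12" xlink:label="definition"/>
    <anchor xlink:type="locator" xlink:href="guide.xml#uses"  xlink:label="usage"/>
    <!-- two arcs make the relationship traversable in both directions -->
    <go xlink:type="arc" xlink:from="definition" xlink:to="usage"/>
    <go xlink:type="arc" xlink:from="usage" xlink:to="definition"/>
  </related>
</linkbase>
```

The xlink:label values name the anchors, the arcs type the traversals between them, and the title carries the semantic value that plain HREFs cannot express.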

  • [September 22, 2000] "Out-of-Line Linking with X2X: Satisfying the need for better access to information." By Jason Markos. In Interchange [ISSN: 1463-662X] Volume 6, Number 1 (March 2000), pages 12-15. "Approximately eighty percent of all Web page accesses are from links from four key sites: search engine sites such as Alta Vista. Apart from the fact that it is very interesting that these four sites have such a monopoly, they are all dependent on links. Linking provides the potential to define relationships between information sets that can be tailored, maintained, and given extended functionality based on individual user requirements. This is not about replacing search engine technology, but further enabling it. The fundamental principle behind out-of-line linking is that the link information is stored independently of the resources that are being linked together. Once we have our link information expressed in a rich standards-based format (XLink), an important remaining issue is how to combine that link information with the resources for delivery to end users. This is often referred to as link resolving or resolution. X2X, a product from STEP UK, has been designed to provide an engine for the resolution of out-of-line links based on XLink." See XLink resources referenced in "XML Linking Language." [See the previous entry.]

  • [September 22, 2000] "What is XLink?" By Fabio Arciniegas A. From (September 20, 2000). ['XLink is an XML specification for describing links between resources in XML. Our introduction shows you how to get to grips with using XLinks in your own documents.'] "The very nature of the success of the Web lies in its capability for linking resources. However, the unidirectional, simple linking structures of the Web today are not enough for the growing needs of an XML world. The official W3C solution for linking in XML is called XLink (XML Linking Language). This article explains its structure and use according to the most recent Candidate Recommendation. . . [Conclusion:] XLink is a powerful and compact specification for the use of links in XML documents. Even though XLink has not been implemented in any of the major commercial browsers yet, its impact will be crucial for the XML applications of the near future. Its extensible and easy-to-learn design should prove an advantage as the new generation of XML applications develop." Part 2 of the article is "XLink Reference." See XLink resources referenced in "XML Linking Language."
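For contrast with the extended, out-of-line links discussed in the entries above, the simple link form covered in this introduction looks much like an HTML anchor (a hypothetical document fragment, not from the article):

```xml
<!-- A simple XLink: one outbound, unidirectional link embedded in the
     source document, like an HTML <a href>. -->
<citation xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="simple"
          xlink:href="http://www.w3.org/TR/xlink/"
          xlink:show="new"
          xlink:actuate="onRequest">the XLink specification</citation>
```

Here xlink:show="new" asks for traversal into a new window, and xlink:actuate="onRequest" defers traversal until the user activates the link.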

  • [September 22, 2000] "Getting into i-Mode." By Didier Martin. From (September 20, 2000). ['Following on with his investigations into XML and wireless devices, Didier Martin explains i-Mode, the technology fueling the Japanese explosion in wireless Web access, and contrasts it with WAP.'] "If we look at the technology, the i-Mode service is based on a packet-switched overlay over circuit-switched digital communications. In contrast to most European or North American WAP services, it is based on TCP/IP, is always on, and hence does not require a dial-in connection. The content is encoded in an HTML variant named cHTML (Compact HTML)... The first thing to notice is that cHTML is unfortunately not XML-based. A cHTML document is like an HTML document. In contrast to a WAP document, which contains more than one screen (i.e. cards), a cHTML document contains only one screen. Thus, the cHTML rendering model is identical to the HTML rendering model: one page at a time." Note: See "Compact HTML for Small Information Appliances" submitted as a NOTE to W3C on 1998-02-09.

  • [September 22, 2000] "XML-Deviant: Super Model." By Leigh Dodds. From (September 20, 2000). ['Growing interest in RDF is seeing renewed work to increase understanding of the specification, including a move to separate RDF's simple data model from its oft-maligned syntax.'] "The RDF Interest Group has recently been gathering momentum, and the XML-Deviant takes a look at the progress they're making towards improving understanding of RDF." See references in "Resource Description Framework (RDF)."

  • [September 22, 2000] "XML: Get out your hype-o-meter." By Mary Jo Foley [ZDNet News]. In ZDNet News (September 21, 2000). "It slices. It dices. It cuts through all standards confusion with a single swish. Just when you think the Extensible Markup Language (XML, to you) can't get any hotter or more hyped, it's poised to be invoked yet again over the next few weeks, as various software vendors trot out their scalability announcements. When Microsoft talks up its enterprise servers next week in San Francisco, you can bet that XML will be touted as Redmond's answer to questions about why it still has no cross-platform strategy. XML is part of just about all of the 2000 generation of Microsoft products that it officially will roll out next week: Exchange Server, SQL Server, BizTalk Server, Commerce Server, et al. XML also likely will be a key component of Microsoft's Airstream wireless middleware platform. Sources said Microsoft might use its Enterprise 2000 launch to roll out Beta 1 of Airstream next week..."

  • [September 22, 2000] "IntraNet Solutions Wants to Spread Some XML." By Clint Boulton. In (September 19, 2000). "With an eye toward securing the standards employed in its product line, IntraNet Solutions Inc. Tuesday joined the Organization for the Advancement of Structured Information Standards. OASIS, the world's largest Extensible Markup Language (XML) interoperability consortium, fights for industry standards to make it easier for firms to conduct business online. Essentially, it lobbies for businesses to use XML, a common Web language, in formats that can be shared and accessed across applications. Why XML as opposed to HTML? The latter is a set language while the former allows designers to create their own customized tags, enabling the definition, transmission, validation, and interpretation of data between applications and between organizations..."

  • [September 22, 2000] "XML and the Net: Better Than Sliced Bread - Really." By David Streitfeld [Washington Post Service] In International Herald Tribune (September 21, 2000). "The programming language known as XML may be taking the Internet to a new level, but its many fans have not yet settled on a mundane way to describe to the uninitiated exactly what it does. It is the glue that knits the Net together, says one. A bar code for information, says another. It is the lingua franca of electronic commerce and content, says a third. It is an Esperanto for computers, says a fourth, referring to an attempt to merge the main European languages into a universal form of communication. If anything, such labels understate the fervor with which XML is being touted. Networking guru Craig Burton has said XML will do for this era what calculus did for the Renaissance: Both are dynamic new approaches that render the previous system outmoded, if not irrelevant. David Turner, a product manager at Microsoft Corp. specializing in XML, goes even further. 'The introduction of XML is in many ways like the creation of writing in the evolution of language,' he said. 'People had spoken language for a long period before they got to the point of inventing writing. But as soon as they did, they were able to make huge steps forward.' XML, which stands for extensible markup language, is technically a metalanguage, a language that describes other languages. It is a loose cousin of hypertext markup language, whose development laid the groundwork for the Web. But while HTML is a fine tool for writing Web pages, it is static. HTML presents a fixed snapshot of data. XML merely structures it, allowing for much more fluidity..."

  • [September 16, 2000] "Introducing the Schematron. A fresh approach to XML validation and reporting." By Uche Ogbuji. In SunWorld Magazine (September 16, 2000). ['Judging from the ongoing developments and debates about XML document validation, it's evident the language is in flux. In this article, writer and consultant Uche Ogbuji gets a handle on some of these changes and introduces the Schematron, a new validation and reporting methodology and toolkit.'] The Schematron is a validation and reporting methodology and toolkit developed by Rick Jelliffe, a member of the W3C Schema working group. Without denigrating the efforts of his group, Mr. Jelliffe has pointed out that XML Schemas may be too complex for many users, and so he approaches validation from the same approach as the DTD. Jelliffe developed the Schematron as a simple tool to harness the power of XPath, attacking the schema problem from a new angle. As he writes on his Website: 'The Schematron differs in basic concept from other schema languages in that it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented which are inconvenient and difficult in grammar-based schema languages.' The Schematron is no more than an XML vocabulary that can be used as an instruction set for generating stylesheets. . . There are several things that DTDs provide that Schematron cannot, such as entity and notation definitions, and fixed or default attribute values. RELAX does not provide any of these facilities either, but XML Schemas provide them all -- as they must, because they are positioned as a DTD replacement. RELAX makes no such claim, and indeed the RELAX documentation has a section on using RELAX in concert with DTDs. We have already mentioned that Schematron, far from claiming to be a DTD replacement, is positioned as an entirely fresh approach to validation. 
Nevertheless, attribute-value defaulting can be a useful way to reduce the clutter of XML documents for human readability, so we'll examine one way to provide default attributes in association with Schematron. Remember that you're always free to combine DTDs with Schematron to get the best of both worlds, but if you want to leave DTDs behind, you can still get attribute-defaulting at the cost of one more pass through the document when the values are to be substituted. This can be done by a stylesheet that transforms a source document into a result that is identical except that all default attribute values are given. . . At Fourthought, we've used Schematron in deployed work products both for our clients and for ourselves. Because we already do a lot of work with XSLT, it's a very comfortable system and there's not much training required for XPath. To match the basic features of DTD, not a lot more knowledge is needed than path expressions, predicates, unions, the sibling and attribute axes, and a handful of functions. Performance has not been an issue because we typically have strong control over XML data in our systems and rarely use defaulted attributes. This allows us to validate only when a new XML datum is input, or an existing datum has modified our systems, reducing performance concerns. Schematron is a clean, well-considered approach to validation and simple reporting. XML Schemas are significant, but it is debatable whether such a new and complex system is required for validation. RELAX and the Schematron both present simpler approaches coming from different angles, and might be a better fit for quick integration into XML systems. In any case, Schematron once again demonstrates the extraordinary reach of XSLT and the flexibility of XML as a data-management technology." 
Rick also wrote: "Because Schematron works at a level that is sort-of intermediate between XML Schemas (rather storage-oriented) and RDF (integrating statements into a semantic web), I haven't expected it to get popular until after XML Schemas is rolled out, so this article comes at a great time. I see there is also an article on Schematron in German by Oliver Becker slated for German computer journal iX in a month or two. Schematron works quite well with XML Schemas or DTDs (as well as by itself) because it expresses not only static constraints (a <dog> must contain a <tail>) but also co-occurrence constraints (if <dog sex="f">, then it must contain <name>FIFI</name>) and even constraints based on external vocabularies in any format (if <dog sex="f">, then it must contain a <name> with a value being any item from a list <names> in some external document)." Note that there is now an open source Schematron project on SourceForge.
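Jelliffe's <dog> constraints might be written as the following Schematron rules (a sketch using the Schematron 1.x namespace; the document vocabulary is his invented example):

```xml
<schema xmlns="http://www.ascc.net/xml/schematron">
  <pattern name="Dog checks">
    <rule context="dog">
      <!-- static constraint: assert fires its message when the test is FALSE -->
      <assert test="tail">A dog must have a tail.</assert>
      <!-- co-occurrence constraint: report fires when the test is TRUE -->
      <report test="@sex='f' and not(name)">A female dog is missing its name.</report>
    </rule>
  </pattern>
</schema>
```

Because the tests are ordinary XPath expressions, the whole schema can be compiled into an XSLT stylesheet that walks the instance document and emits the diagnostic messages, which is exactly the "instruction set for generating stylesheets" characterization in the article.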

  • [September 16, 2000] "XMIDDLE: An XML based Middleware for Mobile Computing." By Cecilia Mascolo and Wolfgang Emmerich. University College London, Research Note RN/00/54. Submitted for publication. September 2000. "An increasing number of distributed applications will be written for mobile hosts, such as laptop computers, third generation mobile phones, personal digital assistants, watches and the like. These applications face temporary loss of network connectivity when they move. They need to discover other hosts in an ad-hoc manner, and they are likely to have scarce resources including CPU speed, memory and battery power. Software engineers building mobile applications need to use a suitable middleware that resolves these problems and offers appropriate support for developing mobile applications. In this paper, we describe the XMIDDLE mobile computing middleware that addresses data synchronization issues using replication and reconciliation techniques. XMIDDLE enables the transparent sharing of XML trees across heterogeneous mobile hosts, allowing on-line and off-line access to data. We describe XMIDDLE using a collaborative e-shopping case study on mobile clients. . . The principal contribution of this paper is a presentation of the basic primitives provided by the XMIDDLE middleware and an overview of their implementation in the XMIDDLE architecture. XMIDDLE overcomes the defects of other mobile computing middleware approaches by firstly choosing a more powerful data structure and secondly by supporting replication and reconciliation. XMIDDLE's data structures are trees rather than tuple spaces. More precisely, XMIDDLE uses the eXtensible Markup Language (XML) to represent information and supports XML standards, most notably the Document Object Model (DOM) to support the manipulation of its data. This means that XMIDDLE data can be represented in a hierarchical structure rather than, for instance, in a flat tuple space. 
The structure is typed and the types are defined in an XML Document Type Definition or Schema. XMIDDLE applications use XML Parsers to validate that the tree structures actually conform to these types. The introduction of hierarchies also facilitates the coordination between mobile hosts at different levels of granularity as XMIDDLE supports sharing of subtrees. Furthermore, representing mobile data structures in XML enables seamless integration of XMIDDLE applications with the Micro Browsers, such as WAP browsers in mobile phones, that future mobile hosts will include. XMIDDLE allows mobile and fixed hosts to share each other's information when they are connected; it supports the specification of the information that should be replicated so that replicas continue to be available when connections are lost. Data synchronization is considered the Achilles' heel of mobile computing and the SyncML consortium is building synchronization standards based on XML to promote interoperability and integration of mobile devices and their data. Upon re-connection XMIDDLE executes a reconciliation protocol that re-establishes data consistency. The paper is organized as follows: in Section 2 we introduce XMIDDLE and the main characteristics of the system. In Section 3 we describe a collaborative electronic shopping system that we use as a case study to illustrate XMIDDLE. In Section 4 we depict the architecture of the system, and in Section 5 we describe the details of XMIDDLE primitives. Section 6 contains the details of the implementation of the case study. In Section 7 we discuss and evaluate the XMIDDLE system and in Section 8 we conclude the paper and list some future work." [cache]
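The kind of typed, shared tree the abstract describes might look like the following fragment (a hypothetical sketch for the collaborative e-shopping case study; the element names are invented, not taken from the paper):

```xml
<!-- Each host replicates only the subtree it cares about, e.g. the path
     /shoppingList/items; after a disconnection, the reconciliation
     protocol merges concurrent edits made to the replicated subtrees. -->
<shoppingList owner="hostA">
  <items>
    <item added-by="hostA">milk</item>
    <item added-by="hostB">coffee</item>
  </items>
</shoppingList>
```

Because the shared unit is a subtree rather than a flat tuple, hosts can coordinate at whatever granularity the DTD or Schema defines, which is the "different levels of granularity" point made above.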

  • [September 16, 2000] "XMILE: An Incremental Code Mobility System based on XML Technologies." By Cecilia Mascolo, Wolfgang Emmerich, and Anthony Finkelstein (Dept. of Computer Science, University College London). 2nd International Symposium on Agent Systems and Applications and Mobile Agents (ASA/MA2000), September 2000. "Logical mobility ranges from simple data mobility, where information is transferred, through code mobility that allows the migration of executable code, to mobile agents, in which code and data move together. Several application domains need a more flexible approach to code mobility than the one that can be achieved with Java and with mobile agents in general. This flexibility can either be required as a result of low network bandwidth, scarce resources, and slow or expensive connectivity, like in mobile computing settings, or scalability requirements like in applications on several thousand clients that have to be kept in sync and be updated with new code fragments. We show how to achieve more fine-grained mobility than in the approaches using mobile agents and Java class loading. We demonstrate that the unit of mobility can be decomposed from an agent or class level, if necessary, down to the level of individual statements. We can then support incremental insertion or substitution of, possibly small, code fragments and open new application areas for code mobility such as management of applications on mobile thin clients, for example wireless connected PDAs or mobile phones, or, more generally, distributed code update and management. This work builds on the formal foundation for fine-grained code mobility that was established in [an earlier paper]. That paper develops a theoretical model for fine-grained mobility at the level of single statements or variables and argues that the potential of code mobility is submerged by the capability of the most commonly used language for code mobility, i.e., Java. 
We focus on an implementation of fine-grained mobility using standardized and widely available technology. It has been identified that mobile code is a design concept, independent of technology and can be embodied in various ways in different technologies. The eXtensible Markup Language (XML) can be exploited to achieve code mobility at a very fine-grained level. XML has not been designed for code mobility, however it happens to have some interesting characteristics, mainly related to flexibility, that allow its use for code migration. In particular, we will exploit the tree structure of XML documents and then use XML related technologies, such as XPath and the Document Object Model (DOM) to modify programs dynamically. The availability of this technology considerably simplifies the construction of application-specific languages and their interpreters. XML provides a flexible approach to describe data structures. We now show that XML can also be used to describe code. XML DTDs (i.e., Document Type Definitions) are, in fact, very similar to attribute grammars. Each element of an XML DTD corresponds to a production of a grammar. The contents of the element define the right-hand side of the production. Contents can be declared as enumerations of further elements, element sequences or element alternatives. These give the same expressive power to DTDs as BNFs have for context free grammars. The markup tags, as well as the PCDATA that is included in unrefined DTD elements, define terminal symbols. Elements of XML DTDs can be attributed. These attributes can be used to store the value of identifiers, constants or static semantic information, such as symbol tables and static types. Thus, XML DTDs can be used to define the abstract syntax of programming languages. We refer to documents that are instances of such DTDs as XML programs. XML programs can be interpreted, and interpreters can be constructed using XML technologies. 
When XML programs are sent from one host to another we effectively achieve code mobility." [cache]
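The idea of a DTD as a grammar for abstract syntax can be sketched with a toy language (entirely invented for illustration; the paper's actual DTDs differ):

```xml
<!DOCTYPE program [
  <!-- Each element declaration plays the role of a grammar production;
     attributes hold identifiers, as in an attribute grammar. -->
  <!ELEMENT program (assign | while)*>
  <!ELEMENT assign (expr)>
  <!ATTLIST assign var CDATA #REQUIRED>
  <!ELEMENT while (expr, program)>
  <!ELEMENT expr (#PCDATA)>
]>
<!-- An "XML program": an instance of the DTD, i.e. an abstract syntax tree
     that an interpreter can walk via the DOM, or patch via XPath. -->
<program>
  <assign var="x"><expr>0</expr></assign>
  <while>
    <expr>x &lt; 10</expr>
    <program><assign var="x"><expr>x + 1</expr></assign></program>
  </while>
</program>
```

Incremental mobility then amounts to shipping a small subtree, say a replacement <assign> element, and splicing it into the remote document at a node addressed by an XPath expression, rather than transferring a whole class or agent.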

  • [September 16, 2000] "Consistency Management of Distributed Documents using XML and Related Technologies." By Ernst Ellmer, Wolfgang Emmerich, Anthony Finkelstein, Danila Smolko and Andrea Zisman (Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK). Email contact: {E.Ellmer|W.Emmerich|A.Finkelstein|D.Smolko|A.Zisman} UCL-CS Research Note 99/94 (Submitted for Publication), 1999. "In this paper we describe an approach and associated techniques for managing consistency of distributed documents. We give an account of a toolkit which demonstrates the approach. The approach supports the management of consistency of documents with Internet-scale distribution. It takes advantage of XML (Extensible Markup Language) and related technologies. The paper contains a brief account of the base technologies and an extended discussion of related work. The approach and the toolkit are described in detail in the context of a typical application... We provide means for managing consistency of documents with Internet-scale distribution. Our approach is very simple and light-weight and can be readily deployed in a variety of different settings. This simplicity is achieved by building on existing Internet technologies which form a very powerful and widely used base. It is also achieved by exploiting an emerging standard XML (Extensible Markup Language) and related technologies. The work we have carried out originates in our interest in software engineering and particularly in the problem of managing consistency among software engineering documents. Though the work we describe has clear and immediate application in this area and we will continue to use this as an example, our concerns are wider and the approach applies, we believe, to all classes of structured document. In the paper which follows we set out an example problem, managing software engineering documents produced in the UML (Unified Modelling Language). 
We describe XML and related technologies. We outline our approach and provide an account of our implementation of the architecture. The approach is evaluated and related work reviewed. The paper outlines future directions and suggests ways in which the overall approach could be extended."

  • [September 16, 2000] "BOX: Browsing Objects in XML." By Christian Nentwich, Wolfgang Emmerich, Anthony Finkelstein and Andrea Zisman (Dept. of Computer Science, University College London). In Software Practice and Experience, 30 (2000), pages 1-16. "The latest Internet markup languages support the representation of structured information and vector graphics. In this paper we describe how these languages can be used to publish software engineering diagrams on the Internet. We do so by describing BOX, a portable, distributed and interoperable approach to browsing UML models with off-the-shelf browser technology. Our approach to browsing UML models leverages XML and related specifications, such as the Document Object Model (DOM), the XML Metadata Interchange (XMI) and the Vector Markup Language (VML). BOX translates a UML model that is represented in XMI into VML. VML can be directly displayed in Internet browsers, such as Microsoft's Internet Explorer 5. BOX enables software engineers to access, review and browse UML models without the need to purchase licenses of tools that produced the models. BOX has been successfully evaluated in two industrial case studies. The case studies used BOX to make extensive domain and enterprise object models available to a large number of stakeholders over corporate intranets and the Internet. We discuss why XML and the BOX architecture can be applied to other software engineering notations. We also argue that the approach taken in BOX can be applied to other domains that already started to adopt XML and have a need for graphic representation of XML information. These include browsing gene sequences, chemical molecule structures and conceptual knowledge representations."
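The VML that a translator like BOX emits for a UML class might resemble the following fragment (a hand-written illustration of VML in IE5, not actual BOX output):

```xml
<!-- A class rendered as a named rectangle; IE5 displays v:* elements
     natively once the behavior rule below binds the VML namespace. -->
<html xmlns:v="urn:schemas-microsoft-com:vml">
  <head><style>v\:* { behavior: url(#default#VML); }</style></head>
  <body>
    <v:rect style="width:120pt;height:40pt" fillcolor="#ffffcc" strokecolor="black">
      <v:textbox>Customer</v:textbox>
    </v:rect>
  </body>
</html>
```

The translation itself is a tree-to-tree mapping, XMI class elements in, positioned VML shapes out, which is why a DOM- or stylesheet-based pipeline suffices and no UML tool license is needed on the browsing side.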

  • [September 16, 2000] "WAP Forum moves toward Net standards." By Stephen Lawson. In InfoWorld (September 14, 2000). "The WAP Forum expects to approve Version 2.0 of WAP (Wireless Application Protocol) by mid-2001 and may complete specifications before that for features such as animation, streaming media, and downloading of music files, leaders of the group said at a press event Thursday following a two-day meeting here. The next major version of WAP, a protocol for providing Internet-based data services on mobile phones, will complete a migration to XHTML (Extensible Hypertext Markup Language) and TCP (Transmission Control Protocol) as the foundation of the technology, which will make it easier for developers to write WAP applications, said Michael Short, director of international affairs and strategy at BTCellnet, in Slough, England, and a member of the WAP Forum board of directors. The group, which has more than 580 member companies and hosted about 700 delegates here, is also making progress toward enabling additional services on WAP devices, according to Scott Goldman, chief executive officer of the WAP Forum. In addition to animation, streaming media, and music downloads, WAP will display color graphics, provide location-specific content, and allow users to synchronize information with personal information manager software on a desktop PC in a remote location. Goldman painted WAP, which has been labeled a transitional technology because of its slow performance and rudimentary display, as a vital technology even for the upcoming age of third-generation (3G) wireless communications. The 384Kbps that 3G will deliver to roving users will be shared bandwidth, so each user typically will get only 20Kbps to 30Kbps throughput to a mobile device anyway, Goldman said... While the WAP Forum moves WAP toward XHTML and TCP, another wireless Internet technology, NTT DoCoMo's I-Mode, is moving in the same direction, Goldman said. 
The two probably will converge next year, he said." See: "WAP Wireless Markup Language Specification (WML)."

  • [September 15, 2000] "CFOs shun manual labor. Standard takes sweat out of pulling together data." By Roberta Holland. In eWEEK (September 11, 2000). "For Greg Adams, chief financial officer of Edgar Online Inc., quarterly report time means long hours scrolling through massive Securities and Exchange Commission documents to see how the competition is doing. Currently, Adams has to search each SEC filing individually, pluck out the 20 or so variables he needs, and manually re-enter the data into a new file. But software based on an emerging standard called XBRL (Extensible Business Reporting Language) will enable him to type in competitors' ticker symbols and the variables he needs and watch while the information is automatically pulled into a spreadsheet or chart. The estimated time difference: 8 hours whittled down to 3 minutes. Based on XML (Extensible Markup Language), XBRL allows for the generation, extraction and exchange of financial data from financial statements presented in software applications. Once tagged in XBRL, the financial statements can be rendered in different forms, such as an annual report, HTML for a company Web site, tax returns or an SEC filing. The protocol is being developed by the XBRL Committee, an international coalition whose 63 members include a virtual who's who of the financial services world. The group's first published taxonomy, for financial reporting of commercial and industrial companies under generally accepted U.S. accounting principles, was released July 31. Software companies are now starting to roll out products that incorporate XBRL. While the U.S. taxonomy was the first published, roughly a dozen more are expected, to accommodate accounting terms used in other countries, including New Zealand, Australia and Germany. Once all the taxonomies are in place, the next layer will be tools to make those frameworks useful. 
All of the 63 companies and groups in the coalition have agreed to implement XBRL into their own products and processes. Edgar Online, in fact, is setting up a repository of companies that have their information tagged in XBRL." See: "Extensible Business Reporting Language (XBRL)."
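
The workflow Adams describes, typing in tickers and variables and getting a table back, is easy to sketch once filings carry XML tags. The following Python fragment is a minimal illustration of the idea; the element names, tickers, and figures are invented for the example and are not the real XBRL taxonomy:

```python
# Sketch of the XBRL idea: once figures are tagged, extraction is mechanical.
# Element names and data below are hypothetical, not the published taxonomy.
import xml.etree.ElementTree as ET

filings = {
    "EDGR": """<filing ticker="EDGR">
                 <revenue>120.5</revenue><netIncome>3.2</netIncome>
               </filing>""",
    "CMPX": """<filing ticker="CMPX">
                 <revenue>98.1</revenue><netIncome>-1.4</netIncome>
               </filing>""",
}

def pull(tickers, variables):
    """Return one row per ticker containing just the requested variables."""
    rows = []
    for t in tickers:
        doc = ET.fromstring(filings[t])
        rows.append({"ticker": t,
                     **{v: float(doc.findtext(v)) for v in variables}})
    return rows

rows = pull(["EDGR", "CMPX"], ["revenue"])
```

The same tagged statement could equally be rendered into an annual report or an SEC filing; the point is that extraction no longer depends on a human scrolling through the document.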

  • [September 16, 2000] "LeapIt's CTO puts real-world lessons to work." By Mark Leon. In InfoWorld Volume 22, Issue 37 (September 08, 2000), page 44. "In 1999 Lee and two partners received financial backing from Sterling Capital, in Chicago, to found LeapIt, a distance-learning software developer that builds and hosts software allowing educators to interact with students through the Internet. The company launched last August...At General Physics, Lee got his feet wet in Internet programming, creating distance-learning portal sites for the U.S. Army, U.S. Navy, and Fortune 100 companies such as Ford and Chrysler. Before LeapIt could attract customers, however, it needed to choose what technologies to use to build its e-learning platform. Some of the leading candidates for LeapIt's system included Active Server Pages (ASP) from Microsoft, server-side Java, and XML. Lee says server-side Java and XML turned out to be the best way to protect a client's brand. 'XML in particular facilitates this,' Lee explains. 'It is due to the hierarchical nature of XML. It allows you to put not only your data but also your processes in a hierarchy. What this means is that we can formalize processes in a way that ensures that a client's existing systems can be easily integrated with ours. This preserves whatever look and feel the client currently has.' A couple of other things that sold Lee on XML were the standard's newfound status and natural fit with object-oriented technology. 'XML is here to stay,' he says. 'We made our system very granular so people can get just the information they want. To do this we created learning objects, and XML supports these very well.'"

  • [September 15, 2000] "XML takes on content delivery. Web technology is poised to save time and money by eliminating redundant distribution tasks." By Roberta Holland. In eWEEK (September 11, 2000). "Having proved its mettle as a data delivery format, XML is now working its way into both content delivery and content management systems. The language, which can easily be transformed into any presentation format it needs, such as HTML, Wireless Markup Language, text and other formats, will become a crucial element in content delivery as the spectrum of devices accessing online content expands. Mark Cahill, editor of, an Internet journal of saltwater fly-fishing, said he plans to add Extensible Markup Language content delivery to his site soon, particularly for Wireless Application Protocol phones. 'It's a natural for the type of data we deliver,' said Cahill, in Worcester, Mass. 'It will enable us to give area reports to fishermen and deliver it in a way they can take with them.' Companies such as Arbortext Inc., of Ann Arbor, Mich., and Xyvision Enterprise Solutions Inc., of Reading, Mass., showcased their respective answers to how XML can be used in content delivery and management at the XML World 2000 show here last week. Arbortext unveiled at the show its Epic E-Content Engine, or E3, content aggregation and delivery server, which converts content from a variety of sources into XML and then enables the content to be served into any format a device requests. Xyvision demonstrated its recently released content management software, Content@XML, which allows editing and reuse of content, creation of single-source repositories, and publishing for multiple channels. Both companies' products are available now. The benefits that XML brings to content delivery and management are many. For starters, the language uses a media-neutral representation and can be combined and repurposed for different documents and formats. 
It also allows for personalization, intelligent searches and greater automation and can aggregate content from multiple sources... Users said savings in cost and time are important. Noel Albertson, director of AnswerLab, the Philadelphia-based R&D arm of AnswerThink Inc., said using XML to solve the problem of separate teams creating identical content for different formats may present some tough collaboration issues. But Albertson believes money won't be the main reason that companies gravitate toward XML."
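
The "media-neutral representation" point is the heart of this story: one XML source can feed every output channel. A few lines of Python make the idea concrete; the <report> vocabulary and the WML-style card are invented for illustration, not taken from any vendor's product:

```python
# One media-neutral XML source rendered two ways: HTML for desktop browsers
# and a compact WML-style card for WAP phones. Vocabulary is hypothetical.
import xml.etree.ElementTree as ET

source = ("<report><title>Area Report</title>"
          "<body>Stripers running.</body></report>")

def render(xml_text, fmt):
    doc = ET.fromstring(xml_text)
    title, body = doc.findtext("title"), doc.findtext("body")
    if fmt == "html":
        return f"<html><h1>{title}</h1><p>{body}</p></html>"
    if fmt == "wml":  # small-screen card for WAP devices
        return f"<wml><card title='{title}'><p>{body}</p></card></wml>"
    raise ValueError(f"unknown format: {fmt}")
```

Adding a new delivery channel means adding a rendering branch (or, in a real system, a stylesheet), not re-authoring the content.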

  • [September 15, 2000] "Btrade helps business learn the lingo." By Jeffrey Burt. In eWEEK (September 16, 2000). "One of the biggest e-commerce hurdles facing small and midsize enterprises has been the language they use to conduct business on the Internet. Most, armed with nothing more than a browser, use HTML or XML. Meanwhile, their larger trading partners use the legacy-based languages of EDI, including X12 and Electronic Data Interchange For Administration, Commerce and Transport. Large companies also often prefer to transport e-business documents using value-added networks rather than the Internet. Software vendors such as The EC Co. and Cyclone Commerce Inc. are developing tools to enable smaller businesses to more easily conduct e-business with larger partners. This week BTrade, of Irving, Texas, is launching WebAccess2000, a desktop software application that quickly translates EDI documents from many of the world's larger retailers to Internet protocols, such as Extensible Markup Language, and back. It also can move the EDI documents into smaller businesses' back-end systems. The software, available now, can either be downloaded or accessed through a hosted ASP (application service provider) model for a one-time fee of $750. Those using the ASP service are then charged $90 per month. BTrade officials compare that with costs of $2,500 or more for a translator..."
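
The EDI-to-XML translation these products perform can be sketched in miniature. In X12, segments are terminated by a delimiter (here "~") and data elements separated by another ("*"); a translator turns that positional format into self-describing XML. This toy Python version is only in the spirit of such tools; real X12 handling (envelopes, qualifiers, code lists) is far richer, and the segment data below is made up:

```python
# Toy X12-style EDI to XML translator. Segments end with "~", elements are
# separated by "*"; each element becomes a child named <SEGnn>. Deliberately
# minimal: no envelopes, repeats, or code-list validation.
import xml.etree.ElementTree as ET

def edi_to_xml(edi: str) -> str:
    root = ET.Element("transaction")
    for segment in filter(None, edi.strip().split("~")):
        tag, *elements = segment.split("*")
        seg = ET.SubElement(root, tag)
        for i, value in enumerate(elements, start=1):
            ET.SubElement(seg, f"{tag}{i:02d}").text = value
    return ET.tostring(root, encoding="unicode")

xml_doc = edi_to_xml("BEG*00*SA*4500012**20000915~PO1*1*12*EA*9.95~")
```

Once in XML, the document can travel over ordinary Internet protocols instead of a value-added network, which is the cost argument the article makes.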

  • [September 15, 2000] "Going to Extremes." By Liora Alschuler. From (September 13, 2000). ['Geeks in tweed and metadata maniacs, shapers of the future of structured information representation. The recent Extreme Markup Languages conference had it all. Liora Alschuler was there and reports back on the Topic Maps and RDF head-to-head.'] "XML has to date achieved a degree of syntactic, but not semantic, interoperability. On the Web, you still can't find what you need easily, and when you do, you may not know what it means. Grammars, that is, DTDs and schemas, don't supply meaning any more than the Elements of Style can tell you the size, shape, and location of a certain white whale. (The draft W3C schemas do include a type system, a necessary but not sufficient component of 'meaning.' W3C schemas figured remarkably little in discussion, although a pre-conference tutorial was well attended.) As Jeff Heflin and James Hendler put it, 'To achieve semantic interoperability, systems must be able to exchange data in such a way that the precise meaning of the data is readily accessible and the data itself can be translated by any system into a form that it understands.' The problem is that XML itself has, by design, no semantics. Or, as John Heintz and W. Eliot Kimber said, 'DTDs constrain syntax, not data models. They don't capture abstraction across models, they are simply an implementation view of a higher abstraction.' The conference program was rich in reports of real-world, large-scale implementations actively engaged in the search for meaning, and they were not all focused on Topic Maps or RDF -- although these specs (ISO and W3C respectively) were the most prevalent form of semantic representation addressed. . . Facing the conflict between Topic Maps and RDF head-on, the conference staged a debate between Eric 'RDF' Miller of OCLC and Eric 'Topic Maps' Freese of ISOGEN. 
Freese and Miller provided this comparison between the two specs: Both (1) are hard to read, (2) share a goal: to tie semantics to document structures, (3) provide a systematic way to declare a vocabulary and basic integrity constraints, (4) provide a typing system, (5) provide entity relationships, (6) work well with established ontologies. Differences between the two specifications (1) Topic Maps are not XML-specific and have so far been standardized for SGML only. The XML Topic Map activity under the GCA's IDEAlliance is drafting a proposal for such an implementation. (2) RDF is also not XML-specific, but to date has been implemented only in XML. (3) RDF now has a schema, which provides a standard way to express and link an ontology; such a schema is proposed for Topic Maps. (4) RDF uses XML linking, Topic Maps use HyTime linking. (5) Topic Maps have explicit scoping. (6) Topic Maps start with the abstract layer and (optionally) link to resources; RDF starts at the resource layer and (optionally) creates an abstract layer. [...] But as C. M. Sperberg-McQueen reminded us in his closing keynote, meaning is always in the instance. It would be reassuring to think that the Topic Map and RDF folks will hold this in mind as they convene their joint meetings and deliberate on the future of angle brackets with metadata. Reducing Tosca to a Topic Map, or a set of directed graphs, and calling the libretto 'mere information,' while calling the metadata schema 'knowledge,' misses a very large and important boat. Again, as Sperberg-McQueen put it, we should all 'resist the temptation to be more meta than thou,' and not lose sight of the importance of the instance itself." For other papers on Topic Maps, see "(XML) Topic Maps"; for papers on RDF, see "Resource Description Framework (RDF)."

  • [September 15, 2000] "Classifying the Web: Glimmer of Hope For an Indexed Web. Are you ready for the 'Resource Topic Description Map Framework'?" By Liora Alschuler. In The Seybold Report on Internet Publishing Volume 5, Number 1 (September 2000), pages 23-24. ['RDF has come up short as the standard for library-like searching on the Web. But reconciliation with topic maps (a competing ISO effort) has insiders feeling better about its future.'] "The likelihood of a workable standard for Web metadata rose substantially last month when a scheduled "shootout" between the W3C's Resource Description Framework (RDF) and ISO's Topic Map (TM) standard at the GCA's Extreme Markup conference in Montréal turned instead into a lovefest. A standard for Web metadata would be very helpful in making sense of the tangle the Web has become. It is possible to search for content now, but the results are haphazard. . . In his summary compare/contrast, Eric Miller, senior research scientist at OCLC, compared RDF resources to TM topics; RDF schemas to TM templates; RDF properties to TM facets and association roles and RDF URIs to TM topic identifiers. Both specifications have a typing system, entity relationships and similar goals, but there are differences. Topic Maps are not Web-specific and will not have a Web-specific semantic until the XTM effort is completed. Unlike RDF, TMs link to actual occurrences of concepts within a resource. . . After the conference, the XML Topic Map group met and divided into three subgroups to produce a conceptual model, usage cases and the XML syntax. According to Eric Freese, director of product services at ISOGEN, harmonization with RDF is a firm requirement for all three groups. Ideally, according to Freese, the two specs will merge, but for the time being, they are targeting harmonization, such that an XSLT script could be written to convert between RDF, XML and XTM. . . 
User sentiment at the conference was unequivocal: if having one standard is better than none, having two is worse. The response to possible convergence of these was strong, positive and influential, given the luminaries in the audience. C. Michael Sperberg-McQueen, co-editor of the original XML Recommendation and chair of the W3C Schema working group, spoke from the floor and made the point that users have influence if they make it known that they will not tolerate dueling specifications. Sperberg-McQueen compared the relationship between RDF and topic maps to physics and chemistry, where RDF is concerned with the atomic level of representing reality, and topic maps are concerned with the molecular compositions. In his closing keynote he came back to this theme and, while a W3C partisan by association, he warned that 'there is no "better living through physics" slogan,' meaning that the conceptual purity of RDF is not more important than the utilitarian approach of topic maps. Jon Bosak of Sun Microsystems, who chaired the original XML work within W3C and has since taken up the work of ebXML, pointed out that the two specifications are aimed at different semantic levels, but, if they were combined, it could be a tremendous win for the user community." For other papers on Topic Maps, see "(XML) Topic Maps"; for papers on RDF, see "Resource Description Framework (RDF)."

  • [September 15, 2000] "XDuce: An XML Processing Language." Preliminary Report. By Haruo Hosoya and Benjamin C. Pierce (Department of CIS, University of Pennsylvania). In Proceedings of Third International Workshop on the Web and Databases (WebDB2000). May 18-19, 2000, Adam's Mark Hotel, Dallas, TX. 6 pages, 15 references. "Among the reasons for the popularity of XML is the hope that the static typing provided by DTDs (or more sophisticated mechanisms such as XML-Schema) will improve the safety of data exchange and processing. But, in order to make the best use of such typing mechanisms, we need to go beyond types for documents and exploit type information in static checking of programs for XML processing. In this paper, we present a preliminary design for a statically typed programming language, XDuce (pronounced 'transduce'). XDuce is a tree transformation language, similar in spirit to mainstream functional languages but specialized to the domain of XML processing. Its novel features are regular expression types and a corresponding mechanism for regular expression pattern matching. Regular expression types are a natural generalization of DTDs, describing, as DTDs do, structures in XML documents using regular expression operators (i.e., *, ?, |, etc.). Moreover, regular expression types support a simple but powerful notion of subtyping, yielding a substantial degree of flexibility in programming. Regular expression pattern matching is similar to ML pattern matching except that regular expression types can be embedded in patterns, which allows even more flexible matching. In this preliminary report, we show by example the role of these features in writing robust and flexible programs for XML processing. After discussing the relationship of our work to other work, we briefly sketch some larger applications that we have written in XDuce, and close with remarks on future work. A formal definition of the core language can be found in the full version of this paper. . 
. XDuce's values are XML documents. An XDuce program may read in an XML document as a value and write out a value as an XML document. Even values for intermediate results during the execution of the program have a one-to-one correspondence to XML documents (besides some trivial differences). As concrete syntax, the user has two choices: XML syntax or XDuce's native syntax. [In the paper] we have presented several examples of XDuce programming and shown how we can write flexible and robust programs for processing XML by combining regular expression types and regular expression pattern matching. We consider XDuce suitable for applications involving rather complicated tree transformation. Moreover, for such applications, our static typing mechanism would help in reducing development periods. In this view, we have built a prototype implementation of XDuce and used it to develop some small but non-trivial applications. . . [Related work:] Mainstream XML-specific languages can be divided into query languages such as XML-QL and Lorel and programming languages such as XSLT. In general, when one is interested in rather simple information extraction from XML databases, programs in programming languages are less succinct than the same programs in a suitable query language. On the other hand, programming languages tend to be more suitable for writing complicated transformations like conversion to a display format. XDuce is categorized as a programming language... (a) A recent example of the embedding approach is Wallace and Runciman's proposal to use Haskell as a host language for XML processing. The only thing they add to Haskell is a mapping from DTDs into Haskell datatypes. (b) Another piece of work along similar lines is the functional language XMlambda for XML processing, proposed by Meijer and Shields. 
Their type system is not described in detail in this paper, but seems to be close to Haskell's, except that they incorporate Glushkov automata in type checking, resulting in a more flexible type system. (c) A closer relative to XDuce is the query language YAT, which allows optional use of types similar to DTDs. The notion of subtyping between these types is somewhat weaker than ours (lacking, in particular, the distributivity laws used in the 'database evolution' example in Section 2.1). Types based on tree automata have also been proposed in a more abstract study of typechecking for a general form of 'tree transformers' for XML by Milo, Suciu, and Vianu. The types there are conceptually identical to those of XDuce. The type system of XDuce was originally motivated by the observation by Buneman and Pierce that untagged union types corresponded naturally to forms of variation found in semistructured databases." See the XDuce web site; XDuce examples; download XDuce system 0.1.10. For papers on XML QLs, see "XML and Query Languages." [cache]
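
The core observation behind regular expression types is that a DTD content model is itself a regular expression over child element names, so conformance of a sequence of children can be checked by ordinary pattern matching. The sketch below illustrates this in Python; the person/name/email/tel vocabulary is in the spirit of the XDuce examples but is invented here, and concatenating tag names into a string is only safe for this demonstration (real systems, including XDuce, work over tree automata rather than flat strings):

```python
# A DTD-style content model as a regular expression over child tag names.
# Model: person := name, email*, tel?   (operators *, ?, | as in DTDs)
import re
import xml.etree.ElementTree as ET

person_model = re.compile(r"(name)(email)*(tel)?")

def conforms(person) -> bool:
    """Check the sequence of child elements against the content model."""
    children = "".join(child.tag for child in person)
    return re.fullmatch(person_model, children) is not None

ok  = ET.fromstring("<person><name>Ann</name><email>a@x</email></person>")
bad = ET.fromstring("<person><email>a@x</email></person>")  # name missing
```

Here conforms(ok) holds and conforms(bad) fails, because the model requires a leading name. XDuce's contribution is to make such types first-class in the language and checked statically, not at run time as here.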

  • [September 15, 2000] "Tree Automata and Pattern Matching." By Haruo Hosoya and Benjamin Pierce (Department of Computer and Information Science, University of Pennsylvania). Email: {hahosoya,bcpierce} July 18, 2000. 15 pages, 28 references. "Tree automata have been used both in program analyses for functional languages and in type systems for tree-like data such as SGML and XML. They also form a natural basis for rich forms of pattern matching on tree structures, including conventional ML-style patterns as well as extensions such as 'OR-patterns' and 'recursive patterns.' Incorporating tree patterns and tree types into full-blown programming languages involves substantial generalizations to a number of standard analyses, including (1) exhaustiveness and redundancy checks, (2) propagation of type constraints to pattern variables from the surrounding context, to avoid the need for type annotations on pattern variables, and (3) optimization of pattern matching. We present algorithms for all three. The main technical challenges arise from the fact that recursively defined patterns make termination of the algorithms somewhat delicate, and that complex control arising from alternation patterns and the first-match policy make it non-trivial to infer the best types for pattern variables. To address the first difficulty, we rely on the finiteness of state spaces generated by regular tree automata. For the second, we exploit closure properties of tree automata to maintain the precision of type information as it is propagated through complex control." See also the full paper, "Tree Automata and Pattern Matching." For related work on hedge/forest theory, see: "SGML/XML and Forest/Hedge Automata Theory." [cache]

  • [September 15, 2000] "Regular Expression Types for XML." By Haruo Hosoya, Jérôme Vouillon, and Benjamin Pierce. In Proceedings of the International Conference on Functional Programming (ICFP), 2000. 12 pages, 23 references. "We propose regular expression types as a foundation for XML processing languages. Regular expression types are a natural generalization of Document Type Definitions (DTDs), describing structures in XML documents using regular expression operators (i.e., *, ?, |, etc.) and supporting a simple but powerful notion of subtyping. The decision problem for the subtype relation is EXPTIME-hard, but it can be checked quite efficiently in many cases of practical interest. The subtyping algorithm developed here is a variant of Aiken and Murphy's set-inclusion constraint solver, to which are added several optimizations and two new properties: (1) our algorithm is provably complete, and (2) it allows a useful 'subtagging' relation between nodes with different labels in XML trees. We have used regular expression types in the design of a domain-specific language called XDuce ('transduce') for XML processing. In the present paper, though, our focus is on the structure of the types themselves, their role in describing transformations on XML documents, and the algorithmic problems they pose. Interested readers are invited to visit the XDuce home page for more information on the language as a whole. [Conclusions:] We have proposed regular expression types for XML processing, arguing that set-inclusion-based subtyping and subtagging yield useful expressive power in this domain. We developed an algorithm for subtyping, giving soundness, completeness, and termination proofs. By incorporating several optimization techniques, our algorithm runs at acceptable speeds on several applications involving fairly large types, such as the complete DTD for HTML documents. Our work on type systems for XML processing has just begun. 
In the future, we hope to incorporate other standard features from functional programming, such as higher-order functions and parametric polymorphism. The combination of these features with regular expression types raises some subtle issues. For function types, we have not found a sensible semantics of types that yields a complete algorithm. For polymorphism, inference of type arguments at type applications is not obvious (there is no unique minimal solution in general)." On XDuce, a Typed XML Processing Language: "XDuce ('transduce') is a typed programming language that is specifically designed for processing XML data. One can read an XML document as an XDuce value, extract information from it or convert it to another format, and write out the result value as an XML document. Since XDuce is statically typed, XDuce programs never crash at run time and the resulting XML documents always conform to specified types. XDuce has several notable features. (1) XDuce features regular expression types, which are similar in spirit to Document Type Definitions (DTD). (2) XDuce provides a powerful notion of subtyping. (It allows any subtyping relation that you may expect from your intuition on inclusion relation of regular expressions.) It not only gives substantial flexibility in programming, but also is useful for schema evolution or integration. (3) XDuce supports regular expression pattern matching. In addition to ML-like patterns, we can match values against regular expression types." [cache]
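
The "set-inclusion-based subtyping" in the paper means that T1 is a subtype of T2 exactly when every sequence matching T1 also matches T2, i.e., language inclusion between regular expressions. The general decision problem is EXPTIME-hard, as the abstract notes; the following Python sketch illustrates the relation by brute force over words up to a bounded length, which is nothing like a complete algorithm but makes the definition tangible:

```python
# Subtyping as bounded language inclusion: r1 <: r2 iff no word matches
# r1 but not r2. Only words up to max_len over a tiny alphabet are tried,
# so this is an illustration of the definition, not a decision procedure.
import itertools
import re

def subtype(r1: str, r2: str, alphabet="ab", max_len=6) -> bool:
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            w = "".join(letters)
            if re.fullmatch(r1, w) and not re.fullmatch(r2, w):
                return False  # counterexample: in L(r1) but not in L(r2)
    return True

print(subtype("a", "a|b"))       # True: a is a subtype of a|b
print(subtype("(ab)*", "a*b*"))  # False: "abab" breaks inclusion
```

XDuce's contribution is a complete algorithm for this relation over types (via tree automata), with optimizations that keep it fast on practical inputs such as the HTML DTD.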

  • [September 15, 2000] W3C XML Plenary Decision on Relative URI References In Namespace Declarations. Dan Connolly (W3C) announced the W3C's decision on the matter of relative URI references. The decision and rationale are provided in a W3C document Results of W3C XML Plenary Ballot on relative URI References in namespace declarations [3-17 July 2000]. "By a large margin, the XML Plenary endorses the proposal given below. A number of organizations (both some of those who find the proposal acceptable and some of those voting otherwise) believe that the W3C should proceed to develop a longer-term solution by means of the normal WG process. Proposal: "to deprecate the use of relative URI references in namespace declarations; that is: to say that while they conform to the Namespace Recommendation of January 1999, later specifications such as DOM, XPath, etc. will define no interpretation for them." See the document for examples of deprecated usages of relative URI references in namespace declarations. See references in "Namespaces in XML." [cache]
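
The practical background to the deprecation is that processors treat namespace names as opaque strings for comparison purposes rather than resolving them against a base URI, so a relative reference never acquires a stable meaning. Python's ElementTree shows the behavior directly (the URIs below are illustrative):

```python
# Namespace names are compared as opaque strings: the relative reference
# "ns" stays literally "ns" no matter where the document lives, which is
# why later specs define no interpretation for relative references.
import xml.etree.ElementTree as ET

relative = ET.fromstring('<doc xmlns="ns"/>')
absolute = ET.fromstring('<doc xmlns="http://example.org/ns"/>')

print(relative.tag)   # {ns}doc -- not resolved against any base URI
print(absolute.tag)   # {http://example.org/ns}doc
```

Two documents using the same relative reference from different base locations would thus compare as the same namespace, or different ones, depending entirely on which interpretation a tool chose; deprecation removes the ambiguity.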

  • [September 15, 2000] "Collaborative XML Publishing with Sitegarden. Sitegarden: Inexpensive, Open-Source Content Management Built on Top of Lotus Notes. Australian firm offers collaborative site-building tool that outputs XML." By Brian Dooley. In The Seybold Report on Internet Publishing Volume 5, Number 1 (September 2000), pages 1, 15-17. ['Little-known Australian vendor Touchdown Systems is looking at the Lotus Notes developer crowd as fertile ground for Sitegarden, its XML-based content management system. Despite the company's youth, however, there may be a future for the system based on its price and its surprising functionality. Inside, Brian Dooley reviews Sitegarden's functions and potential.'] "Touchdown Systems Design of Melbourne, Australia recently began selling Sitegarden, an interesting Web content-management system that first appeared on an ASP basis for a few months in 1998. A notable aspect of Sitegarden is that HTML is produced by XML code, which is directed through supplied templates to create Active Server Pages for Microsoft Web servers. Once content has been entered into the system, it is accessible to users as ordinary Web pages. The user's browser page request goes to the Content Management Server, which returns XML code to the Content Rendering System (generally, MS IIS) using a template. The Content Rendering System uses the XML and the template to create a page, which is then served to the browser. Touchdown Systems provides mirroring software to produce true HTML pages from composite pages produced by systems such as ASP. Sitegarden's supplied templates were developed using MS IIS/ASP. The templates read the XML into ASP variables. The ASP system then displays the HTML page, plugging the ASP variables into the correct positions in the HTML layout. This template structure may be modified by an experienced IIS/ASP programmer. Templates may also be converted to any target server environment such as Apache/PHP. 
By creating other templates, administrators can feed XML to other applications, such as a database, different page-delivery system or syndication outlets."
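
The rendering pipeline described above (server emits XML, a template reads it into variables and plugs them into an HTML layout) reduces to a few lines in any language. This Python sketch stands in for the ASP templates; the page vocabulary and template are invented for the example, not taken from Sitegarden:

```python
# The Sitegarden pattern in miniature: XML content in, variables out,
# variables substituted into an HTML layout. Names here are hypothetical.
import xml.etree.ElementTree as ET
from string import Template

content_xml = ("<page><title>Welcome</title>"
               "<body>Hello, visitors.</body></page>")

layout = Template("<html><head><title>$title</title></head>"
                  "<body>$body</body></html>")

def render(xml_text: str) -> str:
    doc = ET.fromstring(xml_text)
    return layout.substitute(title=doc.findtext("title"),
                             body=doc.findtext("body"))

page = render(content_xml)
```

Swapping the layout for a WML template, a database loader, or a syndication feed is the "other templates" point: the XML stays the same and only the substitution target changes.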

  • [September 15, 2000] "Transforming XML: XSLT, Comments and Processing Instructions." By Bob DuCharme. From (September 13, 2000). ['XSLT isn't just for transforming elements and attributes. In this month's Transforming XML column we show how to create and transform processing instructions and comments too.'] "XSLT includes built-in template rules to copy the text content of elements from a source tree to a result tree. Comments and processing instructions, however, get left out of this; unless you specify otherwise, an XSLT processor ignores them. You can specify otherwise, and this ability gives you access to the potentially valuable information they store. You can also add comments and processing instructions to your output, which lets you store information that wouldn't otherwise fit into your document. This ability to both read and write comments and processing instructions lets you copy comments and processing instructions from your source tree to your result tree and even to convert them to elements." For other books and articles on XSL/XSLT, see "Extensible Stylesheet Language (XSL/XSLT)."
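
The same ignore-by-default behavior the column describes for XSLT shows up in most XML tree builders: comments and processing instructions are dropped unless you opt in. Python's ElementTree (3.8 and later) makes the opt-in explicit, which gives a quick way to see the difference:

```python
# Comments and PIs are discarded by default; TreeBuilder's insert_comments
# and insert_pis flags (Python 3.8+) opt in to keeping them, much as an
# explicit XSLT template rule would.
import xml.etree.ElementTree as ET

xml = "<doc><?target data?><!-- a note --><p>text</p></doc>"

default = ET.fromstring(xml)          # comment and PI silently dropped
print(len(default))                   # 1: only <p> survives

builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
kept = ET.fromstring(xml, parser=ET.XMLParser(target=builder))
print(len(kept))                      # 3: PI, comment, and <p>
```

Once the nodes are in the tree, reading, rewriting, or converting them to elements is ordinary tree manipulation, which is the column's point for XSLT.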

  • [September 15, 2000] "XML-Deviant: Gentrifying the Web." By Leigh Dodds. From (September 13, 2000). ['XHTML promises to civilize the unruly mass of HTML on the Web. But is anybody listening? Leigh Dodds examines whether web developers know or care about XHTML.'] "Taking a look at XHTML, the XML-Deviant finds that although the W3C HTML Activity is moving forward, the rest of the web is still lagging behind. Few XML developers haven't heard of XHTML. Ask one of them to describe what it is, and the answer you get will be something like: 'it's just HTML 4 expressed in XML.' This is quite true, but it only captures what is the first phase of a much more ambitious task. Take a look at the W3C HTML Activity page, and you'll discover that XHTML 1.0 is just the first step down a long road which is set to radically alter the face of the Web. Following the XHTML 1.0 specification comes a suite of documents that describe how XHTML can be divided up into a set of modules, and how one goes about defining new modules. This adds a great deal of extensibility to the language, allowing user-defined subsets and extensions to be created within a common framework..."

  • [September 15, 2000] "UDDI: A New Proposed Standard Delivers on Promises of the Internet for Businesses of All Sizes." From Microsoft [UDDI Web site]. 2000-09-06. "With almost unimaginable speed, e-commerce is transforming the business landscape. Groundbreaking Internet technologies are providing companies with the ability to interact with suppliers, partners, and customers online in real time. These electronic business-to-business relationships are creating incredible opportunities as organizations create powerful new ways to streamline supply chains, automate complex business processes, provide new services and reach new customers. Not since the Industrial Revolution has business faced such momentous or far-reaching change. But this is a revolution that is still in its earliest stages. While critical technologies are evolving quickly, major barriers remain. The complexity and cost inherent in sharing data over networks and across applications is a significant issue, but it is one that is being rapidly addressed by the advent of new standards and technologies such as extensible markup language (XML) and simple object access protocol (SOAP). But there is another problem. As the number of companies that offer Web-based services increases exponentially into the millions, how do buyers looking for a specific service find all of the potential sellers who can meet their needs? And once buyer and seller have hooked up, how do they ensure that they can integrate their systems to manage transactions smoothly? A new specification, called Universal Description, Discovery and Integration, or UDDI, appears to hold the answer. The result of a project initiated by Microsoft, IBM and Ariba, and announced today, UDDI will allow companies to publish information about the Web services they offer in a Universal Business Registry that will be accessible by anyone. 
If UDDI achieves widespread acceptance, say industry leaders, it will lead to rapid acceleration in the growth of online business-to-business commerce, helping companies of all sizes benefit from the global opportunities offered by the digital revolution. . . In simplest terms, UDDI will be a comprehensive directory of businesses operating in the online world and the Web-based services they offer. Sellers will participate without cost in the UDDI Universal Business Registry by providing contact information, product and service information. Buyers will then be able to search the registry -- again, without cost -- and locate companies that provide the products or services they need. Today, conducting a thorough search for vendors and suppliers is highly labor intensive, and in today's economy where the potential seller of a service may be located almost anywhere in the world, finding every possible vendor is virtually impossible. Existing online directories and marketplaces are a step forward, but they are usually limited by industry or region. The UDDI registry solves the problem by providing one central registry for businesses in any location and industry. In addition, the Universal Business Registry offers sophisticated search parameters that allow buyers to set parameters based on everything from geographic location to business category, service details, and technical product specifications. But in the world of electronic commerce, knowing that a company offers the services you need is not enough. In many cases, identifying a supplier is only the beginning of the process. Once a company has identified a suitable seller, there is important technical data that must be exchanged before a transaction can be completed. " See: "Universal Description, Discovery, and Integration (UDDI)."
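As a sketch of the kind of lookup the registry is meant to support: in the UDDI inquiry API as published by the project, a buyer's name-based search is a small XML message along these lines (element names follow the UDDI 1.0 drafts; treat the fragment as illustrative rather than normative):

```xml
<find_business generic="1.0" xmlns="urn:uddi-org:api">
  <name>Ford</name>
</find_business>
```

The registry answers with a businessList of matching entries, from which the technical binding data mentioned above can then be retrieved.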

  • [September 14, 2000] The Blocks eXtensible eXchange Protocol Framework. By Marshall T. Rose (Invisible Worlds, Inc.). Network Working Group Internet-Draft. 'draft-ietf-beep-framework-01'. 58 pages. September 11, 2000. "This memo describes a generic application protocol framework for connection-oriented, asynchronous interactions. The framework permits simultaneous and independent exchanges within the context of a single application user-identity, supporting both textual and binary messages... At the core of the BXXP framework is a framing mechanism that permits simultaneous and independent exchanges of messages between peers. Messages are arbitrary MIME content, but are usually textual (structured using XML). Frames are exchanged in the context of a 'channel'. Each channel has an associated 'profile' that defines the syntax and semantics of the messages exchanged. Implicit in the operation of BXXP is the notion of channel management. In addition to defining BXXP's channel management profile, this document defines: (1) the TLS transport security profile; and, (2) the SASL family of profiles. Other profiles, such as those used for data exchange, are defined by an application protocol designer. A registration template is provided for this purpose." See especially: (1) section on "XML-based Profiles", (2) section 6.2 for "BXXP Channel Management DTD", (3) section 6.4 for "TLS Transport Security Profile DTD", and (4) section 6.6 for "SASL Family of Profiles DTD." See extracts. For related documents, see "Blocks eXtensible eXchange Protocol Framework (BEEP)." [cache]
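The framing mechanism at the core of BXXP can be illustrated with a short sketch. Each frame carries a header naming the message type, channel number, message number, a continuation indicator, a sequence number, and the payload size, followed by the payload octets and an END trailer. A minimal Python encoder, illustrative only and not an interoperable implementation of the draft:

```python
def bxxp_frame(msg_type, channel, msgno, seqno, payload, more=False):
    """Build a single BXXP/BEEP-style frame: header line, payload, END trailer.
    The size counts payload octets; '*' marks an incomplete message and
    '.' a complete one, following the framing rules in the draft."""
    data = payload.encode("utf-8")
    header = "%s %d %d %s %d %d\r\n" % (
        msg_type, channel, msgno, "*" if more else ".", seqno, len(data))
    return header.encode("ascii") + data + b"END\r\n"

# An XML-structured payload on channel 1, as the draft says is typical.
frame = bxxp_frame("MSG", 1, 1, 42, "<start number='1'/>")
print(frame.decode("ascii"))
```

Because sizes and sequence numbers are explicit in the header, a peer can interleave frames from independent channels on one connection and reassemble each message separately.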

  • [September 14, 2000] "Standards will touch e-hubs, marketers alike." By Richard Karpinski. In B2B Magazine (September 11, 2000), pages 1, 41. "Based on extensible markup language, UDDI seeks to set a standard way for e-businesses to define themselves on the Internet. Some view specifications like UDDI as a threat to e-marketers, and say branding and marketing strategies will be lost when the next generation of e-commerce permits machines to talk to machines and automatically enable transactions. But UDDI proponents say it does not aim to override traditional marketing efforts but rather provide a basic infrastructure, on top of which future b-to-b marketing initiatives will ride. 'The fact that Ford has registered isn't the end of their marketing efforts. It's the beginning,' said Boris Putanec, Ariba Inc.'s VP-corporate strategy. Putanec and others compare UDDI to the Web's domain name service--the underlying technology that translates familiar domain names into Internet protocol addresses. On top of DNS is an entire infrastructure of Web sites, search engines and other services that actually make up the World Wide Web. . . The UDDI initiative comes amid efforts at defining XML-based standards to grease the b-to-b tracks. Last week, for example, a group of companies working on ebXML, a project to standardize the exchange of electronic business data, said it had created a team to standardize electronic contracts and trading partnerships using XML. The ebXML group -- a joint initiative of the United Nations and OASIS, an XML standards body -- aims to define so-called TPAs, or trading partner profiles and agreements. TPAs move beyond mere supplier look-up and define detailed technical parameters necessary for two companies to conduct transactions over the Internet. Because IBM provided much of the early work on TPAs, the project may intersect with UDDI at some point. A consolidation of XML standards would be welcome.
Despite XML's promise of simplifying electronic conversations between businesses, a number of XML 'standards' are standard in name only. For instance, both Ariba and Commerce One Inc. have used XML to create a method for describing products in their procurement catalogs. But the two companies' methods are still incompatible. In contrast, there seems to be broad industry backing of UDDI. Ariba, along with IBM Corp. and Microsoft Corp., proposed the UDDI standard and will be the first to roll out UDDI-supporting databases. The vendors promise to have their first UDDI products ready within 30 days." See: "Universal Description, Discovery, and Integration (UDDI)."

  • [September 14, 2000] "Beyond Schemas." By Scott Vorthmann (Extensibility, Inc.) and Jonathan Robie (Software AG). Paper presented at the Extreme Markup Languages 2000 Conference (August 13 - 18, 2000, Montréal, Canada). Published as pages 249-255 (with 3 references) in Conference Proceedings: Extreme Markup Languages 2000. 'The Expanding XML/SGML Universe', edited by Steven R. Newcomb, B. Tommie Usdin, Deborah A. Lapeyre, and C. M. Sperberg-McQueen. "The Schema Adjunct Framework is an XML-based language used to associate task-specific metadata with schemas and their instances, effectively extending the power of existing XML schema languages such as DTDs or XML Schema. This is useful because in many environments additional information which is typically not available in the schema itself is needed to process XML documents. Such information includes mappings to relational databases, indexing parameters for native XML databases, business rules for additional validation, internationalization and localization parameters, or parameters used for presentation and input forms. Some of this information is used for domain-specific validation, some to provide information for domain-specific processing. No schema language provides support for all the information that might be provided at this level, nor should it -- instead, we suggest a way to associate such information with a schema without affecting the underlying schema language." See the Schema Adjunct Framework overview in the "Schema Adjunct Framework Developer's Guide" and specification document from Extensibility. For schema description and references, see "XML Schemas."
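The idea can be illustrated with a small adjunct document. The shape below follows the general pattern the paper describes — a container that names a target schema and attaches task-specific properties to particular elements — but the element names and the SQL-mapping properties are illustrative assumptions, not a copy of Extensibility's specification:

```xml
<schema-adjunct target="purchase-order.dtd"
                xmlns:sql="http://example.com/sql-mapping">
  <element which="item/unit-price">
    <!-- task-specific metadata the schema language itself cannot hold -->
    <sql:table>LINE_ITEMS</sql:table>
    <sql:column>UNIT_PRICE</sql:column>
  </element>
</schema-adjunct>
```

The underlying DTD or XML Schema is untouched; tools that understand this adjunct vocabulary read the extra properties, and tools that do not simply ignore the adjunct document.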

  • [September 14, 2000] "Fragmentation May Inhibit XML Adoption." By Darryl K. Taft. In InternetWeek (September 13, 2000). "While XML is emerging as a key language of B-to-B commerce, fragmentation will be an inhibitor for XML adoption in general, a GartnerGroup XML expert said. Despite the fragmentation, GartnerGroup expects the XML uptake to be quite rapid, said Sanjeev Varma, a research director at the company. Varma spoke at GartnerGroup's 'AD Summit 2000: Application Development in the New Economy' conference in New Orleans this week. He predicts that through 2002, no single XML protocol standard will be used in more than 5 percent of all new XML applications. But by 2005, only two or three protocol standards will be used to create the protocols expressed in more than 90 percent of all new XML applications. In addition, Varma said through 2002, at least 75 percent of successfully deployed XML schemas will have been designed by individual enterprises or industry-specific trading groups. The remaining schemas will come from industry standards bodies and vendor-led initiatives, he said."

  • [September 12, 2000] "Extensible Markup Language." In Scaffolding the New Web: Standards and Standards Policy for the Digital Economy. By Martin Libicki, James Schneider, Dave R. Frelinger, and Anna Slomovic. Rand Science and Technology Institute's Report for The White House Office of Science and Technology Policy. Appendix B, 20 pages. "Commerce is buying and selling. E-commerce is commerce with human interaction replaced by digital information insofar as practical. To work, it needs a commonly understood modality of exchange; a reliable description of the product to be purchased; and, in some cases (see Appendix E), a succinct statement of the buyer's expectations. People do all this by talking to each other, using brainpower, social cues, and a shared cultural context to figure out what each is trying to say. Machines, to repeat a familiar refrain, have only symbolic notation to go by. Thus, if E-commerce is to get past its current incarnation as mail-order but with keyboards, rather than phones, such notations must be explicit, mutually understood, and well-formatted -- hence, standardized. In a field that insists on shedding its skin as often as the Web does, it may be premature to say that the search for such a standard has ended. But today's bettors seem increasingly inclined to place their money on a metalanguage, XML. The use of metalanguage is deliberate. It has been remarked that XML solves how we are going to talk to each other, but we still need to agree on what we are going to talk about. XML is the grammar, not the words -- necessary, but by no means sufficient. And therein lies both the hope and the hype of what may be the keystone of tomorrow's E-commerce. In analyzing XML, this case study attempts to do several things: explain the broader advantages of markup, trace the history of XML through its origins in earlier standards, limn its current status, and portray the hurdles it must overcome to fulfill its promise. . . 
XML, if it works, may very well be the heart of tomorrow's Web because documents structured in a standard way can be understood and thereby manipulated by stupid but fast and cheap machines rather than intelligent but slow and expensive humans. But despite the enthusiasm with which XML is being offered to, and accepted by, the world, the hard work lies ahead. Whether the XML standards processes can result in commonly defined terms within (and, perhaps more importantly, across) the disparate communities of commerce is yet to be determined." [cache]

  • [September 12, 2000] "Rand Releases Report on Digital Standards." Announcement for a Rand Report commissioned by The White House Office of Science and Technology Policy. Appendix B covers "Extensible Markup Language." Summary: "The digital economy is growing rapidly. However, the scaffolding for the Internet and World Wide Web relies on information technology standards that are older than many of the products and systems they knit together. Can the digital economy's headlong momentum be sustained absent new standards? Is the process for creating new standards up to the task? The White House Office of Science and Technology Policy recently asked RAND's Science and Technology Institute to assess the adequacy of today's standards. RAND was also asked to analyze where these standards are taking the industry and whether government intervention will be required to address systemic failures in the standards development process. Scaffolding the New Web: Standards and Standards Policy for the Digital Economy, was written by analysts Martin Libicki, James Schneider, David R. Frelinger and Anna Slomovic. The RAND research team conducted case studies that covered existing Web standards, the Extensible Markup Language (XML), digital library standards, issues related to property and privacy, and transactions between buyers and sellers in electronic commerce. The team concludes that the current standards process 'remains basically healthy' but cautions that 'the success of standards in the marketplace depends on the play of larger forces.' HTML and Java succeeded in the recent past because they were straightforward and unique ways of doing things, the analysts point out. Today, Web standards development is caught up in the contests between corporations that are trying to do end-runs around each other's proprietary advantages.
Meanwhile, the standards governing the other case study areas are being buffeted by the varied interests of such affected groups as authors, librarians, rights holders, consumers, banks, merchants, privacy activists and governments. Government may not have a major role to play, according to the report. Washington might consider allowing researchers to use a fraction of their government research and development funding to work on standards, the authors suggest. But 'perhaps the best help the government can offer is to have the National Institute of Standards and Technology intensify its traditional functions: developing metrologies; broadening the technology base; and constructing, on neutral ground, terrain maps of the various electronic-commerce standards and standards contenders'." See previous entry.

  • [September 12, 2000] "XML struggling for enterprise customer acceptance. Technology is still widely lauded, but vendor agendas and slow standards development hurt implementation." By Ellen Messmer and John Fontana. In Network World (September 11, 2000). "Everyone loves XML, the technology that can be used to tag electronic document content for easy searching and sharing among business partners. Microsoft, IBM and a slew of e-commerce hotshots such as Commerce One and Ariba can't stop talking about XML as the foundation for Web-based commerce. Despite the fact that XML 1.0 debuted nearly three years ago as a World Wide Web Consortium standard, few businesses are using applications based on it, even though almost every e-commerce application vendor or network service claims to support XML. Observers say vendors of e-commerce applications are largely to blame. Many vendors at the forefront of the XML revolution are working at cross-purposes in the way they implement XML, thus forcing users to convert purchase orders defined according to Ariba's XML specification, for instance, into purchase orders defined according to rival Commerce One's specifications. Users often end up paying a service provider to do this XML 'dialect' conversion, which adds to the cost. In addition, XML is an ever-evolving set of standards that has led many to believe the technology's not soup yet... One important emerging XML standard, called XML Schema, is set to be approved by year-end. XML Schema, now a candidate recommendation within the World Wide Web Consortium, is the needed 'blueprint' for defining the structure of XML messages, says IBM's XML technology expert Bob Sutor. 'If we are exchanging business information, such as dollar amounts, it has to be in the right place and format.' Supported in software, XML Schema will spare e-commerce providers from having to write software to validate a range of business information, Sutor notes. 
Sutor acknowledges that few businesses are using XML in e-commerce, but he notes that the technology is gaining use in the publishing and storing of data. 'We're seeing XML a lot inside corporations, such as banks or government. The Defense Department is building repositories holding XML Schema'."
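Sutor's "right place and format" point is exactly what XML Schema's datatypes address. A fragment along these lines (element name invented for illustration; it uses the namespace and facet names of the eventual W3C Recommendation, which differ in detail from the late-2000 Candidate Recommendation) constrains a dollar amount to a decimal with two fraction digits:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="amount">
    <xsd:simpleType>
      <xsd:restriction base="xsd:decimal">
        <xsd:fractionDigits value="2"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```

A validating processor rejects a document whose amount element holds "9.955" or "nine dollars", which is the kind of checking e-commerce providers would otherwise hand-code.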

  • [September 12, 2000] "Content Management." By Sandra Haimila. In KMWorld Magazine. "Delivering more news to its customers faster is just one of the benefits Congressional Quarterly has reaped from its new XML-based content management system. And those customers can be influential people -- 95% of the members of Congress are said to be CQ readers. The nonpartisan news and legislative tracking service based in Washington, D.C., provides information on government, politics and public policy through many print and online publications. Implemented by Thomas Technology Solutions, the new content management system has streamlined CQ's editorial and production processes, speeding time to print and to the Web. It contains relational data sets that allow CQ to maintain and publish information relating to Congressional members, committees, schedules, events and floor votes, as well as to track the progress of legislation. Built on an Oracle database, the solution provides the ability to import and convert data automatically, a custom editorial interface, custom searching and reporting modules, workflow and versioning capabilities, and automatic data extraction and release mechanisms. Also, reporters can transmit and edit stories remotely, enabling CQ to post real-time information to its Web sites. 'The new system is not only fast, reliable and easy to use, but will ultimately give us much more flexibility in the presentation of our editorial content,' says Robert W. Merry, president and publisher of Congressional Quarterly. 'We are now able to provide more content and, ultimately, more print and electronic publications to our subscribers, as well as editorial input that will keep them better informed of the issues.' Incoming content is automatically captured, converted to XML and then stored in the database as XML documents. 
The content then can be edited by CQ staff via a custom editorial interface according to built-in workflow and business rules, and released for publication to one of CQ's many Web sites or print products. An XML authoring tool from SoftQuad is integrated into the system to facilitate XML document creation and editing. In addition to delivering news faster, the solution allows CQ to: (1) streamline information management processes by eliminating redundancy, (2) develop new publications and Web sites more rapidly, (3) share content between publications and Web sites, (4) automate the content routing and approval process, and (5) reduce information maintenance and administrative tasks."

  • [September 12, 2000] "XML marks the spot at Microsoft." By John Fontana. In Network World (September 11, 2000). ['XML is the defining technology for interoperability between unlike computing systems, Microsoft Chairman and Chief Software Architect Bill Gates told financial analysts recently. And it's the glue for Microsoft's .Net Internet platform.'] "Gates says every Microsoft product will be touched by XML. Two of the company's most popular servers already bear XML markings. The SQL Server 2000 database allows functions such as XML-based queries, and the soon-to-arrive Exchange 2000 uses XML to describe data housed in its Web Storage System. Microsoft's BizTalk Server 2000, which recently went into beta testing, is the XML workhorse, providing XML translation and tools to coordinate the delivery of XML messages....Microsoft has included SOAP 1.1 in its BizTalk Framework 2.0, an open specification for XML-based data routing and exchange. Other efforts by Microsoft include the Web site, in which XML formats, or schemas, can be submitted for peer review."

  • [September 08, 2000] "GraX: Graph Exchange Format." By Jürgen Ebert, Bernt Kullbach, and Andreas Winter (University of Koblenz-Landau, Institute for Software Technology, Rheinau 1, D-56075 Koblenz, Germany; Tel: +49 262 287-2722; Email: {ebert | kullbach | winter}). In Proceedings of the ICSE 2000 Workshop on Standard Exchange Format (WoSEF), Limerick, June 6, 2000. Edited by S. E. Sim, R. C. Holt, and R. Koschke. 5 pages, 16 references. "This paper introduces the GraX graph exchange format that can be used by software engineering tools. The data to be transferred are separated into a schema and an instance part which are both exchanged in the same way. The application of GraX as a vehicle for tool interoperability will be exemplified in the context of CASE and software reengineering tools. . . To enable interoperability between tools supporting various tasks in software engineering, a suitable mechanism for interchanging data between those tools is required. Several data exchange formats have been developed to exchange models of software systems and information systems on various levels of abstraction (for CASE tools see, e.g., CDIF and XMI, and for CARE tools see, e.g., ASFIX, RSF, and TA). Due to the heterogeneity of the subject domains of different tools, there is evidence that data to be interchanged cannot be mapped to a general metaschema. As a consequence, a common interchange format enabling tool interoperability in software engineering has to support the exchange of instance data and schema data. Here, the GraX (graph exchange) format is proposed as an interchange format which allows exchanging instance and schema data in the same way. GraX is formally based on TGraphs, which define a very general class of graphs. As notation, GraX uses the markup language XML. . . Data representations in CASE and CARE tools are usually based on data structures like relations, syntax trees or graphs. To enable data interchange between tools a common data format has to be chosen.
This format either has to encompass all of these data structures or it has to allow an easy mapping between them. A common kind of data structure which can be matched to all of the above structures is given by TGraphs. TGraphs are directed graphs whose vertices and edges are typed and attributed. Within TGraphs, edges are viewed as first-class entities. While being treated independently from vertices, edges can be traversed along and against their direction. To express sequences of edges or vertices, TGraphs are additionally ordered. Furthermore, TGraphs are scalable with respect to the application context in the sense that not all properties of TGraphs have to be used to their full extent. Since TGraphs are a purely structural means for modeling, their meaning depends on the application context in which they are exchanged. This context determines which vertex and edge types, which attributes and which incidence relations are modeled. Conceptual modeling techniques using extended entity-relationship diagrams or class diagrams are used to define classes of TGraphs representing this application-related knowledge. Here, entity types and relationship types are used to specify vertex types and edge types together with their attribute and incidence structures. Multiple generalization is allowed for vertex and edge types. Further structural information can be modeled by using aggregation. Additional constraints, e.g., degree constraints or restrictions to relational graphs or dags, are specified by graphical annotations or by textual constraints. To describe the schema part of the data to be interchanged, we propose the EER/GRAL approach to graph-based, conceptual modeling, which is suited to TGraphs. So, the instance data structures supported by the GraX interchange format are TGraphs, and the corresponding schematic information is given by EER/GRAL conceptual models.
Thus, in a concrete notation the underlying conceptual model, the vertices and edges including their type and attribute information, the incidence relationships and the ordering of vertices and edges have to be described. Figure 1 shows the XML document type definition (DTD) supplying such a notation. This DTD reflects the formal specification of TGraphs..." See also the PDF presentation slides for WoSEF 2000: "Components of Interchange Formats (Metaschemas and Typed Graphs)"; also in PowerPoint format. Note that work on graph exchange has been continued in the 'Graph Exchange Language (GXL)' project since the Workshop on Standard Exchange Format; see the web site and XML DTD. [cache]
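The paper's Figure 1 is not reproduced here, but a DTD in the spirit it describes — typed, attributed vertices and edges with explicit incidences — might look like the following sketch (element and attribute names are illustrative, not GraX's actual markup):

```
<!ELEMENT graph (vertex | edge)*>
<!ELEMENT vertex (attr*)>
<!ATTLIST vertex id   ID    #REQUIRED
                 type CDATA #REQUIRED>
<!ELEMENT edge (attr*)>
<!ATTLIST edge id   ID    #REQUIRED
               type CDATA #REQUIRED
               from IDREF #REQUIRED
               to   IDREF #REQUIRED>
<!ELEMENT attr (#PCDATA)>
<!ATTLIST attr name CDATA #REQUIRED>
```

The ID/IDREF attributes let a validating XML parser itself check that every edge's endpoints exist, which is one reason XML notation is attractive for graph interchange.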

  • [September 08, 2000] "Java and XML." By Brett McLaughlin. "The JDC is proud to present two chapters excerpted from Java and XML, by Brett McLaughlin, published by O'Reilly and Associates. Chapter 3, "Parsing XML," details the parsing lifecycle and demonstrates the events that can be caught by SAX. Chapter 9, "Web Publishing Frameworks," looks at web publishing frameworks, why they matter to you, and how to choose a good one. . . The first half of the book Java and XML alternates between XML and Java -- one chapter teaches a facet of XML, such as DTDs and XML Schema, or XSLT, and the next chapter discusses how to use that facet from Java, with SAX, DOM, JAXP, and JDOM. The major open source parsers are covered extensively, particularly Apache Xerces and Xalan, allowing you to easily download a parser and be up and running in minutes (or hours, depending on your connection speed!). The second half of the book takes the concepts introduced in the first half to the next level. Specific hot topics in the world of Java and XML are covered through practical examples. XML-RPC, XML for configurations, web publishing (using Apache Cocoon), creating and writing XML, and more are all looked at in detail."
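The SAX "parsing lifecycle" that Chapter 3 walks through — a parser firing callbacks as it streams past markup, rather than building a tree — can be sketched with Python's standard-library SAX binding instead of the book's Java (an illustrative analogue, not the book's own code):

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Record the event stream SAX emits while reading a document."""
    def __init__(self):
        super().__init__()
        self.events = []
    def startElement(self, name, attrs):
        self.events.append(("start", name))
    def characters(self, content):
        if content.strip():  # ignore ignorable whitespace
            self.events.append(("chars", content.strip()))
    def endElement(self, name):
        self.events.append(("end", name))

handler = EventLogger()
xml.sax.parseString(b"<order><item>widget</item></order>", handler)
print(handler.events)
```

Each callback corresponds to an event in the document's serial order, which is what makes SAX cheap in memory compared with DOM's in-memory tree.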

  • [September 08, 2000] "Tutorial on Knowledge Markup Techniques." By Harold Boley, Stefan Decker, and Michael Sintek. The tutorial slides are now available online in HTML, PDF, and PostScript formats. "There is an increasing demand for formalized knowledge on the Web. Several communities (e.g., in bioinformatics and educational media) are getting ready to offer semiformal or formal Web content. XML-based markup languages provide a 'universal' storage and interchange format for such Web-distributed knowledge representation. This tutorial introduces techniques for knowledge markup: we show how to map AI representations (e.g., logics and frames) to XML (incl. RDF and RDF Schema), discuss how to specify XML DTDs and RDF (Schema) descriptions for various representations, survey existing XML extensions for knowledge bases/ontologies, deal with the acquisition and processing of such representations, and detail selected applications. After the tutorial, participants will have absorbed the theoretical foundation and practical use of knowledge markup and will be able to assess XML applications and extensions for AI. Besides bringing to bear existing AI techniques for a Web-based knowledge markup scenario, the tutorial will identify new AI research directions for further developing this scenario." [Harold Boley has reinterpreted markup techniques for knowledge representation, showing the use of functional-logic programming in/for the Web, mapping the knowledge model of Protégé to XML-based systems, and developing the Relational-Functional Markup Language RFML. Stefan Decker has worked in IT support for knowledge management using knowledge markup techniques facilitated by ontology, metadata and knowledge representation based approaches; he is currently working on scalable knowledge composition methods. 
Michael Sintek developed an XML import/export extension of the frame-based knowledge acquisition and modeling tool Protégé-2000 and currently works on XML/RDF-based methods and tools for building organizational memories in the DFKI FRODO project.] [cache, PDF]

  • [September 08, 2000] "Dynamic Agents, Workflow and XML for E-Commerce Automation." By Qiming Chen, Umesh Dayal, Meichun Hsu, and Martin Griss (HP Labs, Hewlett-Packard, 1501 Page Mill Road, MS 1U4, Palo Alto, CA 94303, USA; Email: {qchen,dayal,mhsu,griss}). 10 pages, with 11 references. In Proceedings of the First International Conference on E-Commerce and Web-Technology (EC 2000). "Agent technologies are now being considered for automating tasks in e-commerce applications. However, conventional software agents with predefined functions, but without the ability to modify behavior dynamically, may be too limited for mediating E-Commerce applications properly, since they cannot switch roles or adjust their behavior to participate in dynamically formed partnerships. We have developed a Java-based dynamic agent infrastructure for E-Commerce automation, which supports dynamic behavior modification of agents, a significant difference from other agent platforms. Supported by dynamic agents, mechanisms have been developed for plugging in workflow and multi-agent cooperation, and for dynamic service provisioning, allowing services to be constructed on the fly. We treat an agent as a Web object with an associated URL, which makes it possible to monitor and interact with the agent remotely via any Web browser. XML is chosen as our agent communication message format. Dynamic agents can carry, switch and exchange domain-specific XML interpreters. In this way, the cooperation of dynamic agents supports plug-and-play commerce, mediating businesses that are built on one another's services. A prototype has been developed at HP Labs. . . Since information sources are evolving, it is unlikely that we can use fixed programs for information accessing and processing. Our solution is to let a dynamic agent carry program tools that generate XML-oriented data access and processing programs based on DTDs.
A DTD (like a schema) provides a grammar that tells which data structures can occur, and in what sequence. Such schematic information is used to automatically generate programs for basic data access and processing, i.e., creating classes that recognize and process different data elements according to the specification of those elements. For example, from an XML document including tag UNIT_PRICE, it is easy to generate a Java method 'getUnitPrice', provided that the meanings of tags are understood, and an XML parser is appended to the JDK classpath. The XML parser we use is the one developed by Sun Microsystems that supports SAX (Simple API for XML) and conforms to W3C DOM (Document Object Model). The advantage of automatic program generation from DTDs, is allowing tasks to be created on the fly, in order to handle the possible change of document structures. Thus for example, when a vendor publishes a new DTD for its product data sheet, based on that DTD, an agent can generate the appropriate programs for handling the corresponding XML documents. Agents use different programs to handle data provided by different vendors. Different application domains have different ontology models, with different agent communication languages and language interpreters, although they are in XML format. In a particular application domain, agents communicate using domain specific XML language constructs and interpreters. In our implementation, a dynamic agent can participate in multiple applications. . ." See related references in H-P Labs "Data Mining Research Publications." [cache]
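The program-generation step described above — deriving a getUnitPrice-style accessor from a UNIT_PRICE tag — can be sketched in a few lines of Python (the paper's agents emit Java; the name-mangling rule and class name here are assumptions about their convention, for illustration only):

```python
import xml.etree.ElementTree as ET

def accessor_name(tag):
    """UNIT_PRICE -> getUnitPrice; the mangling rule is an assumption."""
    return "get" + "".join(p.capitalize() for p in tag.lower().split("_"))

def make_accessors(tags):
    """Generate a class whose getters pull the matching tag's text out of
    a parsed document, mimicking DTD-driven code generation."""
    def getter(tag):
        return lambda self: self.doc.findtext(".//%s" % tag)
    methods = {accessor_name(t): getter(t) for t in tags}
    methods["__init__"] = lambda self, xml_text: setattr(
        self, "doc", ET.fromstring(xml_text))
    return type("DataSheet", (), methods)

# Tags that would in practice be read from a vendor's published DTD.
DataSheet = make_accessors(["UNIT_PRICE", "PART_NO"])
sheet = DataSheet("<sheet><PART_NO>A7</PART_NO><UNIT_PRICE>9.95</UNIT_PRICE></sheet>")
print(sheet.getUnitPrice())
```

When a vendor publishes a new DTD, regenerating the class from its element declarations yields accessors for the new document structure without hand-written parsing code, which is the point the paper makes.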

  • [September 08, 2000] "Multi-Agent Cooperation, Dynamic Workflow and XML for E-Commerce Automation." By Qiming Chen, Meichun Hsu, Umeshwar Dayal, and Martin Griss (HP Labs, 1501 Page Mill Road, MS 1U4, Palo Alto, CA 94303, USA; Tel: +1-650-857-3060; Email: {qchen,mhsu,dayal,griss}). Pages 255-256 in Proceedings of the Fourth International Conference on Autonomous Agents. June 3 - 7, 2000, Barcelona, Spain. "Autonomous agents cooperate by sending messages and using concepts from a domain ontology. A standard message format with meaningful structure and semantics, and a mechanism for agents to exchange ontologies and message interpreters, have become key issues. Furthermore, the message format should be accepted not only by the agent research community, but also by all information providers. Dynamic agents send and receive information through XML-encoded messages. We use a KQML/FIPA ACL-like format, encoded in XML. In fact, an XML document is an information container for reusable and customizable components, which can be used by any receiving agent. This is the foundation for document-driven agent cooperation. By making the Web accessible to agents with XML, the need for customer interfaces for each consumer and supplier will be eliminated. Agents may use XML format to explain their BDI, explaining new performatives by existing, mutually understood ones. Based on the commonly agreed tags, agents may use different style DTDs to fit the taste of the business units they mediate. Further, a dynamic agent can carry an XML front-end to a database for data exchange, where both queries and answers are XML encoded. The power of XML, the role of XML in E-Commerce, and even the use of XML for agent communication, have been recognized. Although XML is well structured for encoding semantically meaningful information, it must be based on an ontology. 
As the ontology varies from domain to domain, and is dynamic for dynamically formed domains, the more significant issue is to exchange the semantics of domain models, and to interpret messages differently in different problem domains. Accordingly, we use an individual interpreter for each language. Dynamic agents can exchange those DTDs together with documents, and exchange those interpreters as programming objects, in order to understand each other in communication. These approaches allow us to provide a unified application carrier architecture, a unified agent communication mechanism, and a unified way of data flow, control flow and even program flow, yet flexible application-switching capability, for supporting E-Commerce. . . At HP Labs, we have developed a Java-based dynamic agent infrastructure for E-Commerce which supports dynamic behavior modification of agents, a significant difference from other agent platforms. A dynamic agent does not have a fixed set of predefined functions; instead, it carries application-specific actions, which can be loaded and modified on the fly. A dynamic agent has a fixed part and a changeable part. As its fixed part, a dynamic agent is provided with light-weight, built-in management facilities for distributed communication, object storage and resource management. A dynamic agent is capable of carrying data, knowledge and programs as objects, and executing the programs. The data and programs carried by a dynamic agent form its changeable part. All newly created agents are the same; their application-specific behaviors are gained and modified by dynamically loading Java classes representing data, knowledge and application programs." See related references in H-P Labs "Data Mining Research Publications." [cache]
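A KQML/FIPA-ACL-style message encoded in XML, as the authors describe, can be sketched roughly as follows. The element and attribute names here are invented for illustration and are not the authors' actual wire format; only the general shape (a performative plus sender, receiver, and content) follows the KQML/FIPA ACL tradition.

```python
# Minimal sketch of an ACL-like agent message encoded in XML:
# a performative attribute plus sender/receiver/content elements.
import xml.etree.ElementTree as ET

def make_acl_message(performative, sender, receiver, content):
    msg = ET.Element("acl-message", performative=performative)
    ET.SubElement(msg, "sender").text = sender
    ET.SubElement(msg, "receiver").text = receiver
    ET.SubElement(msg, "content").text = content
    return ET.tostring(msg, encoding="unicode")

def parse_acl_message(xml_text):
    msg = ET.fromstring(xml_text)
    return {
        "performative": msg.get("performative"),
        "sender": msg.findtext("sender"),
        "receiver": msg.findtext("receiver"),
        "content": msg.findtext("content"),
    }

wire = make_acl_message("ask-one", "buyer-agent", "vendor-agent",
                        "price of part 42")
print(parse_acl_message(wire)["performative"])  # -> ask-one
```

Because the message is plain XML, any XML-aware receiver can unpack it, which is the "information container" property the abstract emphasizes.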

  • [September 08, 2000] "XML and the Web." By Tim Berners-Lee (Director, World Wide Web Consortium). Keynote address delivered at XML World 2000, Boston, Massachusetts, September 06, 2000. ['XML was created at W3C as a generic markup language for the World Wide Web. Its role is to be a foundation for future developments, from interlinked data repositories through scalable vector graphics to synchronized multimedia. This talk discusses some architectural aspects of XML and the way it works on the Web, and the properties it must have to be a sound foundation for electronic commerce and the future Semantic Web.'] "XML is a universal language for structured documents and data on the Web... On the Web == has a URI... Namespaces remove ambiguity ...Think about persistence of URIs... Make sure everything of importance has one..."

  • [September 08, 2000] "XLink and Open Hypermedia Systems: A Preliminary Investigation." By Brent Halsey and Kenneth M. Anderson (Department of Computer Science, University of Colorado, Boulder, Boulder, CO 80309-0430, USA; Tel: 1-303-492-6003; E-mail: {halseyb, kena}). Pages 212-213 in Proceedings of the Eleventh ACM Conference on Hypertext and Hypermedia. May 30 - June 3, 2000, San Antonio, TX, USA. "XLink is an emerging Internet standard designed to support the linking of XML documents. We present preliminary work on using XLink as an export format for the links of an open hypermedia system. Our work provides insights into XLink's suitability as a vehicle for extending the benefits of open hypermedia to the rapidly evolving world of XML. This paper presents preliminary work on using XLink as an export format for open hypermedia links. . . [Conclusions:] Our preliminary work shows that XLink can be used to capture the link structures of an existing open hypermedia system, and demonstrates that extending open hypermedia systems to export XLink information is straightforward. We believe this work also demonstrates the potential of using open hypermedia systems as authoring environments of future XML-based hyperwebs delivered over the WWW as additional XML-aware applications become available. In particular, we believe the problems described by [Hall] when using open hypermedia systems to author HTML-based hyperwebs can be avoided or mitigated." See "XML Linking Language."
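Exporting an open hypermedia link as XLink might look roughly like the sketch below. The XLink namespace and the xlink:type/xlink:href/xlink:label attributes come from the W3C XLink drafts; the containing element names ("link", "loc") and the anchor labels are invented here, and this is not the authors' actual export code.

```python
# Sketch: serialize one n-ary open-hypermedia link as an XLink
# extended link with one locator per anchor.
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)

def export_link(anchors):
    """anchors: list of (label, href) pairs for one n-ary link."""
    link = ET.Element("link", {f"{{{XLINK}}}type": "extended"})
    for label, href in anchors:
        ET.SubElement(link, "loc", {
            f"{{{XLINK}}}type": "locator",
            f"{{{XLINK}}}href": href,
            f"{{{XLINK}}}label": label,
        })
    return ET.tostring(link, encoding="unicode")

xml_out = export_link([("src", "notes.xml#intro"),
                       ("dst", "paper.xml#sec2")])
```

An extended link with locators stored outside the linked documents mirrors the open-hypermedia practice of keeping link structure separate from content.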

  • [September 08, 2000] "Who's Open Now?" By Sean Gallagher. In XML Magazine (Fall 2000). "All the world's a stage, and the XML play's the thing, to paraphrase the Bard. Of the cast's usual actors, Microsoft, Sun, and IBM have leading roles, and the rest of the industry forms something like a Greek chorus. . . . Considering that Sun essentially claims to have invented XML, this position is strange on the surface. If you dig deeper, you'll understand where that attitude comes from. While XML can be used to extend Java like every other platform, Sun no doubt realizes that this is a double-edged sword. Being extensible with XML means that Java can interact with software written in other languages on other platforms -- which may help sell Java, but will place it in a much more competitive market in other cases. At some level, Java and XML are competing technologies -- Java is designed for cross-platform code portability, and XML is designed for cross-platform interoperability and information access. For many -- especially software vendors with extensive installed bases -- interoperability is just as good or better than consolidating on a single set or source. IBM, for example, has embraced Java this far in the interest of integrating all its software platforms more effectively with a relatively open technology -- but now XML and SOAP (and C#, for that matter) give IBM other options for building interoperability. Then there's deployment. When it comes to wireless and embedded devices, XML is just a lot easier to deploy than a Java VM and code -- something that Sun realized when it pushed server-side presentation with Java Server Pages, which generate, among other things, XML..."

  • [September 07, 2000] "XMLambda: A functional language for constructing and manipulating XML documents." By Erik Meijer and Mark Shields (Department of Computer Science, Oregon Graduate Institute). Draft version. "XML has been widely adopted as a standard language for describing static documents and data. However, many application domains require XML, and its cousin HTML, to be filtered and generated dynamically, and each such domain has adopted a language for the tasks at hand. These languages are often ill-suited, unsafe, and interact poorly with each other. In this paper we present XMLambda, a small functional language which has XML documents as its basic data types. It is expressly designed for the task of generating and filtering XML fragments. The language is statically typed, which guarantees every document it generates at runtime will conform to its DTD, but also uses type inference to avoid the need for many tedious type annotations. The language is also higher-order and polymorphic, which allows many common programming patterns to be captured in a small highly reusable library. Furthermore, the language uses pattern-matching so that XML fragments may be deconstructed into their components just as easily as they are constructed. We present the language by a series of worked examples. A formal definition and an implementation are in preparation." Available in PostScript format. Note in this connection the abstract for the PhD thesis of Mark Shields (Static Types for Dynamic Documents: "Dynamic documents, such as HTML or XML pages generated by a server in response to client requests, are particularly troublesome to program in conventional languages. This thesis presents a typed functional programming language, XMLambda, which makes this task simpler and less error prone. 
The language contains four novel features, which we first motivate and develop in isolation: (1) dynamically typed staged computation allows server-side and client-side code to be distinguished within the type system; (2) implicit parameters allow attributes (such as font or colour) to be inherited dynamically; (3) type-indexed rows allow XML's choice types, and SGML's unordered tuple types to be represented as native types, independently of the particular syntax of XML or SGML; and, (4) automata-directed type inference allows XML or SGML document fragments to be embedded directly within XMLambda programs. Each feature enjoys type inference, type soundness (subject reduction), and a simple method of compilation. Furthermore, each feature naturally coexists with polymorphism and higher-order functions, and thus can be added to a simple functional programming language without undue stress. We conclude with some motivating examples in which all four features, in addition to polymorphism and higher-order lazy functions, work together particularly harmoniously."

  • [September 07, 2000] "A Conceptual Graph Model for W3C RDF." By Olivier Corby, Rose Dieng, and Cédric Hébert (INRIA Sophia Antipolis, Acacia Project, 2004 route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex, France). In Proceedings of the International Conference on Conceptual Structures (ICCS 2000), 2000. 14 pages, 19 references. "With the aim of building a 'Semantic Web', the content of the documents must be explicitly represented through metadata in order to enable contents-guided search. Our approach is to exploit a standard language (RDF, recommended by W3C) for expressing such metadata and to interpret these metadata in conceptual graphs (CG) in order to exploit querying and inferencing capabilities enabled by the CG formalism. The paper presents our mapping of RDF into CG and its interest in the context of the semantic Web. [...] We are convinced of the interest of AI representation languages that enable not only the representation of metadata but also support inferences on them. Among such AI knowledge representation formalisms, we stress the advantages of the conceptual graph (CG) formalism for expressing metadata. Another approach is to exploit a standard language for expressing metadata and to be able to interpret these metadata in conceptual graphs in order to exploit the querying and inferencing capabilities enabled by the conceptual graph formalism. Our approach delivers a more powerful and relevant search with the CG projection. In particular, the parametrization of the projection enables several levels of search. Our approach takes advantage of the CG formalism, but without requiring the author of the document to know CG. The interest of our approach is that if RDF, recommended by W3C, is widely adopted as a standard by the Web community, then a Web document author can both continue to use RDF annotations and draw benefit from the CG formalism, even without knowing the CG formalism himself. 
The exploitation of RDF schemas by means of Conceptual Graphs seems more relevant in the context of a company or of a given community: this company or community can agree on the conceptual vocabulary used for expressing the metadata about their documents. In the future, we plan to study a query language for RDF statements and the mapping to appropriate CG projections." [cache]
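The core of such a mapping, reduced to its simplest form, is that each RDF statement (a subject-predicate-object triple) becomes two concept nodes joined by a relation node in the conceptual graph. The sketch below is an illustrative simplification, not the authors' actual mapping, and the triple data is invented.

```python
# Toy mapping of RDF triples to conceptual-graph style structure:
# each (subject, predicate, object) triple yields two concept nodes
# and one relation node, written linearly as [Concept]->(Rel)->[Concept].
def rdf_to_cg(triples):
    concepts, relations = set(), []
    for s, p, o in triples:
        concepts.update([s, o])
        relations.append((s, p, o))
    return concepts, relations

def cg_linear_form(relations):
    return ["[{}]->({})->[{}]".format(s, p, o) for s, p, o in relations]

triples = [("doc1", "author", "Dieng"), ("doc1", "topic", "RDF")]
concepts, rels = rdf_to_cg(triples)
print(cg_linear_form(rels)[0])  # -> [doc1]->(author)->[Dieng]
```

Querying then becomes CG projection: matching a query graph against the statement graphs, which is the "more powerful and relevant search" the paper claims over plain string matching on RDF.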

  • [September 07, 2000] "VoiceXML for Web-Based Distributed Conversational Applications." By Bruce Lucas (IBM). In Communications of the ACM (CACM) Volume 43, Number 9 (September 2000), pages 53-57. "VoiceXML replaces the familiar HTML interpreter (Web browser) with a VoiceXML interpreter and the mouse and keyboard with the human voice. Until recently, the Web delivered information and services exclusively through visual interfaces on computers with displays, keyboards, and pointing devices. The Web revolution largely bypassed the huge market for information and services represented by the worldwide installed base of telephones, for which voice input and audio output provide the sole means of interaction. Development of speech services has been hindered by a lack of easy-to-use standard tools for managing the dialogue between user and service. Interactive voice-response systems are characterized by expensive, closed application-development environments. Lack of tools inhibits portability of applications and limits the availability of skilled application developers. Consequently, voice applications are costly to develop and deploy, so voice access is limited to only those services for which the business case is most compelling for voice access. Here, I offer an introduction to VoiceXML, an emerging standard XML-based markup language for distributed Web-based voice services, much as HTML is a language for distributed visual services. VoiceXML brings the power of Web development and content delivery to voice-response applications, freeing developers from low-level programming and resource management. It also enables integration of voice services with data services, using the familiar client/server paradigm and leveraging the skills of Web developers to speed application development for this new medium. . . VoiceXML supports simple 'directed' dialogues; the computer directs the conversation at each step by prompting the user for the next piece of information. 
Dialogues between humans don't operate on this simple model, of course. In a natural dialogue, each participant may take the initiative in leading the conversation. A computer-human dialogue modeled on this idea is referred to as a 'mixed-initiative' dialogue, because either the computer or the human may take the initiative. The field of spoken interfaces is not nearly as mature as the field of visual interfaces, so standardizing an approach to natural dialogue is more difficult than designing a standard language for describing visual interfaces like HTML. Nevertheless, VoiceXML takes some modest steps toward allowing applications to give users some degree of control over the conversation. In the forms described earlier, the user was asked to supply (by speaking) a value for each field of a form in sequence. The set of phrases the user could speak in response to each field prompt was specified by a separate grammar for each field. This approach allowed the user to supply one field value in sequence. Consider a form for airline travel reservations in which the user supplies a date, a city to fly from, and a city to fly to. . . VoiceXML enables such relatively natural dialogues by allowing input grammars to be specified at the form level, not just at the field level. A form-level grammar for these applications defines utterances that allow users to supply values for a number of fields in one utterance. For example, the utterance 'I'd like to fly from New York on February 29th' supplies values for both the 'from city' field and the 'date' field. VoiceXML specifies a form-interpretation algorithm that then causes the browser to prompt the user for the values (one by one) of missing pieces of information (in this example, the 'to city' field). VoiceXML's special ability to accept free-form utterances is only a first step toward natural dialogue. VoiceXML will continue to evolve, incorporating more advanced features in support of natural dialogue..." 
For other references, see: (1) "VoiceXML Forum" and (2) the VoiceXML Consortium web site.
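The form-interpretation behavior described above (a form-level utterance fills several fields at once, and the interpreter then prompts only for what is still missing) can be sketched in a few lines. The field names and the pre-parsed "slots" below are invented for illustration; a real VoiceXML browser would obtain them from grammar matches against the spoken utterance.

```python
# Toy sketch of VoiceXML's form-interpretation idea: given fields
# already filled by a form-level grammar match, compute which fields
# still need a directed prompt.
def interpret_form(fields, utterance_slots):
    """fields: ordered field names; utterance_slots: values supplied
    by a single form-level utterance."""
    filled = dict(utterance_slots)
    prompts = [f for f in fields if f not in filled]
    return filled, prompts

fields = ["from_city", "to_city", "date"]
# "I'd like to fly from New York on February 29th" fills two slots:
filled, prompts = interpret_form(
    fields, {"from_city": "New York", "date": "February 29"})
print(prompts)  # -> ['to_city']
```

This matches the article's airline example: after the free-form utterance, the browser prompts only for the 'to city' field.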

  • [September 07, 2000] "The Promise of a Voice-Enabled Web. [Software Technologies.]" By Peter J. Danielsen. In IEEE Computer Volume 33, Number 8 (August 2000), pages 104-106. "In 1999, AT&T, IBM, Lucent Technologies, and Motorola formed the VoiceXML Forum to establish and promote the Voice eXtensible Markup Language (VoiceXML) as a standard for making Internet content available by voice and phone. Each company had previously developed its own markup language, but customers were reluctant to invest in a proprietary technology that worked on only one vendor's platform. Released in March 2000, version 1.0 of the language specification is based on years of research and development at the founding companies and on comments received from among the more than 150 companies that belong to the Forum. In this column, I review the existing architectures for Web and phone services, describe how VoiceXML enables consolidation of service logic for Web and phone, and summarize the features of the VoiceXML 1.0 specification... The features of VoiceXML can be grouped into four broad areas: dialog, telephony, platform, and performance. (1) Dialog features: Each VoiceXML document consists of one or more dialogs. The dialog features cover the collection of input, generation of audio output, handling of asynchronous events, performance of client-side scripting, and continuation of the dialog. VoiceXML supports the following input forms: audio recording, automatic speech recognition, and touch-tone. Output may be prerecorded audio files, text-to-speech synthesis, or both. The language supports the generation and handling of asynchronous events: both 'built-in' events -- such as a time-out, an unrecognized input, or a request for help -- and user-defined events. Event handlers typically specify some new output to be provided to the caller and whether to continue the existing dialog or switch to another. 
(2) Telephony features: VoiceXML provides basic control of the telephony connection. It allows a document author to specify when to disconnect and when to transfer the call. Transfers follow one of two scenarios: a blind transfer that terminates the VoiceXML session as soon as the call is successfully transferred, and bridging, which suspends the VoiceXML session while the caller is connected to a third party - for example, a customer service representative. The session resumes once the conversation with the third party has completed. The system saves the outcome of the transfer, which may be submitted with other data in a subsequent URL request. (3) Platform features: While VoiceXML provides a standard way to describe dialogs, it also provides mechanisms to accommodate individual platform capabilities. This includes invoking platform-specific functionality and controlling platform-specific properties. One platform, for example, may have an advanced speaker-verification package, while another may have a custom credit-card dialog. Another may permit control of its proprietary speech-recognition parameters. A platform can use caching to avoid a fetch attempt altogether. (4) Performance features: Not only are VoiceXML documents Web-based, but so are the resources they use, with each resource's location specified by a URL. These resources include audio files, input grammars, scripts, and objects. The VoiceXML client must retrieve and install them prior to use. One challenge VoiceXML service providers will face involves minimizing the amount of 'dead air' the caller hears while the system fetches resources. A PC user with a visual browser sees a spinning icon when the system retrieves resources, but a caller in contact with a VoiceXML platform may not be aware that the service is Web-based, and thus likely considers silence a signal that something has gone awry. VoiceXML provides several facilities to either eliminate or hide the delays associated with retrieving Web resources." 
[Peter J. Danielsen is a distinguished member of the technical staff in the Software Production Research Department at Lucent Technologies' Bell Labs in Naperville, Illinois. One of the authors of the VoiceXML 1.0 specification, he has developed a variety of interactive voice-response services and service-creation environments for AT&T and Lucent Technologies.] For other references, see: (1) "VoiceXML Forum" and (2) the VoiceXML Consortium web site.

  • [September 07, 2000] "Hello, Voice World." By Didier Martin. From (September 06, 2000). ['Ever written a "Hello World" program that talks back? Didier Martin has, and now he shares his experiences in order to show us around VoiceXML, a markup language for voice interactions.'] "In our last trip to Didier's Lab, we encountered the aural world of XML made possible by the VoiceXML language. This week I'll explain more about VoiceXML and create the classic 'Hello World' application. But this time instead of seeing the result, you'll listen to it. People intrigued by the last article asked me if and how VoiceXML documents are used to build voice applications. Answering this question presents an opportunity to highlight VoiceXML's features, and the way its basic concepts make it very different from HTML or XHTML. A VoiceXML application is a collection of dialogs. A dialog is the basic interaction unit between the VoiceXML interpreter and an interlocutor. A dialog unit can either be a form or a menu. A form consists of a collection of fields which are filled by the interlocutor. A menu is a choice made by an interlocutor. The figure below shows an example VoiceXML application with the links between the various dialogs shown... The IBM VoiceXML interpreter is freely available from the IBM alphaWorks site." For other references, see: (1) "VoiceXML Forum" and (2) the VoiceXML Consortium web site.

  • [September 07, 2000] "Why WAP May Never Get off the Ground. [Binary Critic.]" By Ted Lewis (DaimlerChrysler Research & Technology). In IEEE Computer Volume 33, Number 8 (August 2000), pages 110-112. "WAP isn't making it as a standard. More hype than reality, it is already nearly dead. . . On the surface, WAP seems perfect. Based on XML, HTTP 1.1, and many other emerging Internet formats, it avoids the trap of trying to force a new infrastructure standard. Instead, it allows any kind of lower-level protocol to deliver its content. For example, providers can choose among SMS, CDMA (3G), and other network layer protocols. To its credit, the WAP protocol stack is very open, as Table 1 shows. Further, you can implement WAP easily. A WAP gateway to a standard HTTP 1.1 server is all you need to make WAP work with existing Web services. Wireless handsets can use the standard URL addressing scheme. Wireless Markup Language (WML), a proper subset of XML, provides markup tags best suited for handheld devices with small screens and null keyboards. WAP departs from TCP/IP mainly in the areas required to support high-latency, low-bandwidth wireless networks. Because the link between server and client is unique, WAP eliminates the reassembly of out-of-order TCP packets. WAP permits only one packet stream order: the one formed as the packets are generated. Thus, WAP simplifies many parts of TCP/IP, and makes systems run more efficiently over low-bandwidth wireless networks. WML is also smaller than HTML, with WAP's simplified protocol allowing shorter messages that save on bandwidth. WAP also defines new functionality. For example, its telephony functions handle dialing from a mobile phone. WAP permits 'push' functions so that the server can send information to a client without the client initiating the request. The server can send a stock quotation or airline reservation change when a change occurs, for example... 
Perhaps the most damaging indictment of WAP is the most obvious: Telephones were made for ears, not eyes. The current mobile telecom marketshare leaders own a lot of ears -- and mobile phones simply aren't intended to be watched. Thus, the market for WAP never developed because it solved a problem nobody cared about: how to turn an audio phone into a visual browser. Instead, server-based interactive voice response is taking off. Once the IVR technology converts information into audio, any old cell phone will do just fine. There will never be much need for WAP as long as mobile-device users prefer audio to video..." See: "WAP Wireless Markup Language Specification (WML)."

  • [September 07, 2000] "Ariba, IBM, Microsoft Outline Plans for B2B Standard." By Jeffrey Burt and Peter Galli. In eWEEK (September 07, 2000). "Thirty-six technology companies, led by Ariba Inc., IBM and Microsoft Corp., unveiled plans Wednesday to create a universal Internet standard designed to accelerate e-commerce. At a joint news conference here, officials with the three companies said developing such a standard was the only way to meet analysts' forecasts of more than $1 trillion worth of business being conducted over the Internet by 2004. The initiative, dubbed the Universal Description, Discovery and Integration Project, is designed to create a platform-neutral standard based on XML (Extensible Markup Language) to fuel automated integration of all e-commerce transactions and Web services. At the heart of the project is a directory in which businesses can register themselves, the services they offer and their Internet capabilities. White, yellow and green pages: The UDDI Business Registry will include a white pages section that lists the names of the companies, a yellow pages section listing a company's standard codes and geographic information, and a green pages section categorizing the services offered by a particular company. The registry will not be limited to particular industries, and companies also can use it to find other businesses... 'The problem up to now has been that many businesses were unable to find and hook up with the right service providers,' said Stewart Allen, vice president of architecture and technology at San Francisco-based webMethods. 'This will make that possible. I believe it is possible to meet the 18-month timeframe to have globally acceptable standards for a common registry'." See: "Universal Description, Discovery, and Integration (UDDI)."

  • [September 07, 2000] "Conference Spotlights XML's Inexorable Rise." By Roberta Holland. In eWEEK (September 07, 2000). "The Extensible Markup Language has evolved beyond being just cool technology to become a practical tool that can save businesses time and money, according to experts gathered at the XML World 2000 conference here this week. Joe Gollner, founder of XIA Information Architects Corp. and the conference chairman, offered anecdotal proof that XML has hit the mainstream: A large advertisement for XML was posted at Boston's Logan Airport by the German company Software AG. 'Things have changed obviously,' Gollner said. 'Seeing a big sign, an expensive sign, in a mainstream venue was staggering. It's kind of like reaching the land of milk and honey.' Keynote speaker Tim Berners-Lee, inventor of the Web and chairman of the World Wide Web Consortium, urged the audience to shed its closed-world assumptions and instead build systems that can connect with one another. He acknowledged some may question why bother using namespaces when building a home page, but he likened the possibilities to the beginnings of hypertext. . . Jaime Ellertson, executive vice president and chief strategy officer of BroadVision Inc., said many large companies have much or most of their back-end systems formatted in XML. About a year ago, Lycos Inc. had 50 percent of all its data feeds in an XML format, while CNN uses virtually all XML. Likewise, XML has become the common format for B2B content, he said."

  • [September 07, 2000] "IBM, Microsoft sign partners for e-commerce standard." By Wylie Wong. In CNET (September 06, 2000). "Ariba, IBM and Microsoft have signed on more than 30 technology companies to support their plan to build a giant online Yellow Pages for companies looking to conduct business online. . .the three companies are proposing a Web standard and a new initiative that will let businesses register in an online database. The database will help companies advertise their services and find each other so they can conduct transactions over the Web. In the announcement today, executives from Ariba, IBM and Microsoft said they signed on 33 other companies that will take part in the effort. Companies include American Express, Commerce One, Compaq Computer, Dell Computer, Loudcloud, Nortel Networks, NTT Communications, SAP, Sun Microsystems and Tibco Software, among others. The proposed standard, called Universal Description Discovery and Integration (UDDI), will allow businesses to describe the type of services they offer and will allow those services to be located by other businesses via the online directory. The directory will be an online marketplace larger in scope than previous attempts to list businesses online. It will not be limited to a specific industry and will list companies participating in any business. Companies would be required to register themselves but could then be found automatically by potential customers. A test version of the directory is expected to go online in about 30 days. The proposed standard is based on Extensible Markup Language (XML), a Web standard for data exchange that is rapidly becoming the preferred language of online business. . ."
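The three-part registry structure described in these UDDI articles (white pages for names, yellow pages for categories and geography, green pages for service descriptions) can be modeled with a small in-memory sketch. This is purely illustrative; the class, method names, and sample data are invented and bear no relation to the actual UDDI API.

```python
# Hypothetical in-memory model of the UDDI-style registry described
# above: register a business once, then look it up by category
# (yellow pages) or retrieve its services (green pages).
class Registry:
    def __init__(self):
        self.entries = []

    def register(self, name, category, region, services):
        self.entries.append({"name": name, "category": category,
                             "region": region, "services": services})

    def find_by_category(self, category):          # yellow pages
        return [e["name"] for e in self.entries
                if e["category"] == category]

    def find_services(self, name):                 # green pages
        for e in self.entries:
            if e["name"] == name:
                return e["services"]
        return []

reg = Registry()
reg.register("Acme Parts", "manufacturing", "US",
             ["catalog-query", "purchase-order"])
print(reg.find_by_category("manufacturing"))  # -> ['Acme Parts']
```

The point of the design, as both articles note, is that registration is done once by the business itself, after which potential trading partners can discover it automatically.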

  • [September 07, 2000] "The role played by XML in the next-generation Web." By Edd Dumbill. From (September 06, 2000). ['In this speech to the XML World 2000 conference in Boston, Editor Edd Dumbill gives an overview of the integrated future of XML and the Web, and the role that SOAP and RDF will play in that vision.'] "It's not cool to be different, at least not where Internet computing is concerned. Despite widespread agreement about certain subsections of Internet technology -- SMTP for email, for instance -- many services and sources of data remain desperately unconnected. The same is equally true of desktop computing. Although office suites provide some degree of integration, exchanging data between applications from different vendors is very frustrating. Add the Web and email into the mix and the problem gets worse. The more I use and rely on computers, the more I realize I ought to stand up for my rights. I'm abused and tormented by a patchwork of programs that hardly work together, that trap my data in places I don't want, and make me adopt unnatural working styles. The busier I get, the more I get buried in information overload, the more I realize this is happening, and the more I want it fixed. XML offers hope for escape from the current situation of fragmentation and disarray. In this talk I will focus on two technologies that look as though they'll have a big impact in this area: SOAP and RDF. I'll also talk about the shift in architecture, from centralized to decentralized, that we'll need to embrace as the world of Internet computing continues to grow... The dream that drives the integrated vision of the future is of a universal homogeneous view of information. No special cases or peculiar formats but a universally accessible 'data bus' over the realm of the Internet, and by extension, all your private data sources. Let's look at the components of such a system. Fundamentally, they are a universal addressing scheme and a universal data format. 
In many ways these represent the essential components of a universal computer. The addressing scheme, universal resource identifiers (URIs), has been in operation over the web for a long time now. The data format, XML, has been around for nearly three years, and it's clearly providing many benefits in reducing the translation overheads of communication within and between organizations. We're now close to the conditions in which computing can be performed over the whole span of the Internet. However, every computer requires instructions and a language in which to program. These are the problems we need to work on now in order to realize the greater promise."

  • [September 07, 2000] "Schema Round-up." By Leigh Dodds. From (September 06, 2000). "Noting an increasing interest in XML Schemas on several mailing lists, this week the XML Deviant takes a look at some of the resources available to the aspiring schema developer." For schema description and extensive references, see "XML Schemas."

  • [September 07, 2000] "Transforming XML with XSLT." Chapter 7, "Beta" from Building Oracle XML Applications, by Steve Muench. "We've used XSLT stylesheets in previous chapters to transform database-driven XML into HTML pages, XML datagrams of a particular vocabulary, SQL scripts, emails, and so on. If you're a developer trying to harness your database information to maximum advantage on the Web, you'll find that XSLT is the Swiss Army knife you want permanently attached to your belt. In a world where the exchange of structured information is core to your success, and where the ability to rapidly evolve and repurpose information is paramount, Oracle XML developers who fully understand how to exploit XSLT are way ahead of the pack. XSLT 1.0 is the W3C standard language for describing transformations between XML documents. It is closely aligned with the companion XPath 1.0 standard and works in concert with it. As we'll see in this chapter, XPath lets you say what to transform, and XSLT provides the complementary language describing how to carry out the transformation. An XSLT stylesheet describes a set of rules for transforming a source XML document into a result XML document. An XSLT processor is the software that carries out the transformation based on these rules. In the simple examples in previous chapters, we have seen three primary ways to use the Oracle XSLT processor. We've used the oraxsl command-line utility, the XSLT processor's programmatic API, and the <?xml-stylesheet?> instruction to associate a stylesheet with an XSQL page. In this chapter, we begin exploring the full power of the XSLT language to understand how best to use it in our applications..." ['Steve said on XML-DEV: "O'Reilly published a beta chapter today on their website of my forthcoming book Building Oracle XML Applications. It's the chapter that explains the fundamentals of XSLT through a number of examples. 
I highly recommend the PDF version (over the HTML version) for printing out and reading. The book focuses on being an example-rich tutorial for Java and PL/SQL developers to learn the basics of XML, XPath, and XSLT, and how to exploit them with the XML capabilities of the Oracle database. I hope the beta chapter is interesting and useful to you." See the full description for details. Book ISBN: 1-56592-691-9, 848 pages (est.). Chapter also in HTML format.
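The division of labor the chapter describes (XPath says what to transform, XSLT says how) can be sketched outside Oracle as well. Python's standard library includes no XSLT engine, so the example below hand-applies a single template rule using ElementTree's limited XPath subset; the rowset structure and book titles are invented for illustration, and a real stylesheet would express the same rule declaratively.

```python
import xml.etree.ElementTree as ET

# Invented source document, standing in for database-driven XML.
source = ET.fromstring("""
<rowset>
  <row><title>XML Bible</title><price>39.95</price></row>
  <row><title>Building Oracle XML Applications</title><price>44.95</price></row>
</rowset>""")

# Equivalent in spirit to an XSLT template rule such as:
#   <xsl:template match="row">
#     <li><xsl:value-of select="title"/></li>
#   </xsl:template>
result = ET.Element("ul")
for row in source.findall(".//row"):   # XPath: *what* to transform
    li = ET.SubElement(result, "li")   # template body: *how* to transform it
    li.text = row.findtext("title")

print(ET.tostring(result, encoding="unicode"))
```

An XSLT processor such as oraxsl automates exactly this pattern: it walks the source tree, matches each node against the stylesheet's template rules, and emits the corresponding result fragments.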

  • [September 06, 2000] "Internet registry alliance forged. IBM, Microsoft, Ariba team up on business-to-business standards push." By Ashlee Vance. In InfoWorld (September 06, 2000). "Ariba, Microsoft, and IBM were joined on Wednesday by several key players in the e-commerce world to design a type of standardized electronic yellow pages that describes and categorizes companies throughout the world. While the trio of Ariba, Microsoft, and IBM currently lead the project's development, approximately 36 other vendors have agreed to act as advisors and developers of the technology. The trio of founding companies set a September target date for the availability of an Internet-based registry of companies using what they have called the Universal Description, Discovery, and Integration (UDDI) standard. The UDDI standard should create a way for companies in the b-to-b marketplace to find out what types of commerce other companies conduct and what those companies use as their protocol for transactions and communications. Companies around the globe will be able to provide data and information for the registry at no charge. The first implementation slated for this month will contain basic categorization and service listings. Other versions of the registry are scheduled to appear in March 2001 and then December 2001, with more complex features added for varying types of b-to-b operations at each step. After 18 months, the project will move into the hands of a yet-unnamed standards body. At the moment, the UDDI system contains three types of information divided into what the companies refer to as white, yellow, and green pages, officials said here Wednesday at a press conference to launch the project. The white pages will contain business names, descriptions of the type of business, and other information regarding what kinds of services a vendor uses, and also what technology they can respond to. 
The yellow pages section adopts current government codes for tagging types of business operations as well as international and technology-based naming protocols. In addition, the yellow pages section arranges companies by geographical location. The green pages should provide more specific information on what types of documents a company can receive, the entry points for transactions, and the technology the company currently interacts with and supports. . . Ariba spearheaded the project and will offer resources, along with IBM and Microsoft, for the initial nodes, or data collection points, that will serve as the backbone for the system. Other vendors including American Express, Compaq Computer, SAP AG, Dell Computer, Nortel Networks, and Andersen Consulting will aid the development of the fledgling project, helping to work through the bugs of the proposed open standard. Over the next 18 months, the partners will try to expand the number of categories and add more complete features to help the complicated b-to-b transaction ladder. Suggestions include customizing the categorization features and accommodating the needs of large corporations with a variety of business units focused on different goals. In addition, a number of vendors expressed interest in building upon the standard as it progresses and developing registries with different features that lie on top of UDDI." See: "Universal Description, Discovery, and Integration (UDDI)."

  • [September 06, 2000] "IBM, Microsoft, Ariba Team Up To Standardize XML. UDDI initiative to create company registry for B2B integration." By Elizabeth Montalbano. In CRN (September 06, 2000). "The race to create XML standards for B2B exchanges just got hotter. IBM, Microsoft and Ariba on Wednesday unveiled the Universal Description, Discovery and Integration (UDDI) Project, an initiative designed to create a standard registry for companies that will accelerate the integration of systems in the B2B marketplace. XML, a standard, tag-based language used for data exchange, is at the core of UDDI and increasingly is being used in B2B technology because it makes it easier to transfer data between disparate systems. IBM, Sun and other partners in the UDDI initiative also are working in other groups, most notably OASIS and its ebXML initiative, that attempt to standardize how XML is used in B2B integration. The aim of UDDI is to standardize how companies can interface with one another using XML, said Paul Maritz, group vice president of the platform group for Microsoft, at a press conference here. UDDI will do this by storing information about companies' B2B capabilities in a shared directory that companies can access via a set of XML standards the three vendors are working to produce in tandem with UDDI partners, said Maritz. 'Businesses then will use these standards and conventions to register their business in a directory that stores their names and the services they can offer,' says Maritz. The registry will consist of three sections, he said. The first is a 'white pages' directory that will allow companies to register their names and the key services they provide, and allows other companies to search the directory by company name, said Maritz. The second is a 'yellow pages' directory that categorizes companies in three ways: by NAICS industry standard codes set by the U.S. government, by United Nations/SPSC codes and by geographical company information. 
The last element of UDDI is a 'green pages' directory, where companies will be able to interface with companies in the registry using XML because they can find out what format the companies support, and then can send documents based on that XML format, said Maritz. [. . .] Now that vendors are agreeing on how to develop a standardized XML-based B2B registry, it remains to be seen whether it will actually work." See: "Universal Description, Discovery, and Integration (UDDI)."
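The white/yellow/green structure described in these articles can be sketched as data. The element and attribute names below are illustrative only, not the actual UDDI schema, which had not been published in final form at the time these articles appeared.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry entry combining the three directory sections the
# articles describe (element names are invented, not the real UDDI schema).
entry = ET.Element("businessEntity", name="Example Corp")

white = ET.SubElement(entry, "whitePages")
ET.SubElement(white, "description").text = "Industrial parts supplier"

yellow = ET.SubElement(entry, "yellowPages")
# Categorization by NAICS industry code and geography, per the articles.
ET.SubElement(yellow, "category", scheme="NAICS", code="332722")
ET.SubElement(yellow, "location").text = "Cleveland, OH"

green = ET.SubElement(entry, "greenPages")
# Technical details: which document formats and entry points a partner
# would need in order to transact with this company.
ET.SubElement(green, "accepts", format="cXML")
ET.SubElement(green, "endpoint").text = "https://example.com/orders"

# A registry search might then filter entries by category code.
def matches(entry, scheme, code):
    return any(c.get("code") == code
               for c in entry.iter("category") if c.get("scheme") == scheme)

print(matches(entry, "NAICS", "332722"))
```

The point of registering such a record is that a prospective partner's software, not a human, reads the green-pages section and configures the transaction accordingly.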

  • [September 06, 2000] "IBM's Web Services architecture debuts. An architecture overview and interview with Rod Smith." By Dave Fisco. From IBM DeveloperWorks (September 2000). ['IBM developerWorks just released an interview with Rod Smith, VP of IBM Emerging Technologies, about Web Services, a model for doing development where they use XML messaging and SOAP to talk to each other...Introducing IBM's Web Services, a distributed software architecture of service components. This brief overview and in-depth interview cover the fundamental concepts of Web Services architecture and what they mean for developers. The interview, with IBM's Rod Smith, Vice President of Emerging Technologies, explores which types of developers Web Services targets, how Web Services reduce development time, what developers could be doing with Web Services now, as well as a glance at the economics of dynamically discoverable services.'] "Web Services describes a distributed software architecture of service components that can be integrated at run time to produce dynamic and flexible applications. The services can be new applications or legacy systems with wrappers to make them network-savvy. Services need not work independently; they can rely on other services to achieve their goals. In the initial stages of engaging a service, the availability of components and the specifics of their functionality and APIs is achieved through XML messaging. Later, the components are free to achieve their objectives using any communications protocol, such as RMI, CORBA, or XML messaging. A network component in a Web Services architecture can play one or more fundamental roles: (1) Service Provider, (2) Service Requester, and (3) Service Broker. Service Providers supply Service Requesters with tasks necessary in the Requester's application. Service Requesters find Service Providers using a Service Broker. 
Service Brokers register available services from Service Providers and make matches with requests from Service Requesters. The three fundamental operations of Web Services are: (1) Publish, (2) Find, and (3) Bind. Service Providers publish their abilities to Service Brokers. This publication is in the form of the XML-based Well-Defined Services (WDS) document. The WDS document provides nonoperational information about the service, including a description of the service, the category under which the service falls, and the company that created the service. Service Requesters ask Service Providers to execute the find function based on the Requester's needs. The Service Requester looks up known services based on the WDS document and the Network Accessible Service Specification Language (NASSL) document. NASSL is an Interface Definition Language (IDL), based on XML, that describes the interfaces necessary to access a service. NASSL does not describe the service itself; that responsibility is left to the WDS. A Service Requester must bind to a Service Provider before performing any API calls. Binding involves establishing all Environmental Prerequisites necessary to successfully complete the services. Examples of Environmental Prerequisites include security, transaction monitoring, and HTTP availability. Businesses make service descriptions and other information available on the network through the Universal Description, Discovery, and Integration (UDDI) registry. This information will include taxonomies such as industry codes, products and services, and geography..."
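The publish/find/bind triangle described in the interview reduces to a small pattern. The sketch below is a toy in-memory broker; in IBM's actual architecture these exchanges were carried by XML documents (WDS and NASSL), and all names here are illustrative.

```python
# Toy Service Broker implementing the three fundamental operations the
# article lists: publish, find, and bind. (Illustrative only; the real
# architecture described WDS and NASSL documents for these exchanges.)
registry = []

def publish(name, category, endpoint):
    """A Service Provider registers its capability with the Broker."""
    registry.append({"name": name, "category": category, "endpoint": endpoint})

def find(category):
    """A Service Requester asks the Broker for matching Providers."""
    return [s for s in registry if s["category"] == category]

def bind(service):
    """The Requester establishes prerequisites, then may invoke the service."""
    return "bound to {} at {}".format(service["name"], service["endpoint"])

publish("TaxCalc", "finance", "http://provider.example/tax")
hits = find("finance")
print(bind(hits[0]))
```

Binding is where the article's "Environmental Prerequisites" (security, transaction monitoring, transport availability) would be negotiated before any API call is made.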

  • [September 06, 2000] "IBM, Microsoft, Ariba Team for Web Business Standard." By Nicole Volpe. In Reuters News (September 05, 2000). "International Business Machines Corp., Microsoft Corp., and Ariba Inc. said on Tuesday they have teamed up to create a directory which aims to become the standard way for businesses to find and connect with partners on the Web. The three companies said they planned to have a framework for the directory available on Thursday, listing the basic contact information for businesses by industry, along the same lines as yellow pages listings. A test registry with more detailed information about each company's electronic-commerce abilities and practices was expected to be up and running within 30 days. The proposed standard is called Universal Description Discovery and Integration and is based on XML (Extensible Markup Language), a Web standard for data exchange which is already widely used for online business... Executives from the three companies said the registry could eliminate current problems caused by companies having to call and e-mail each other to see how to connect, for example, their accounting or their order fulfillment systems. The registry would allow companies to know in advance what software to use to carry out a transaction, eliminating the need for technicians to spend time collaborating, as they often do now. The companies said they expected the registry, which would update all corporate listings as changes occurred within industries, also to bring a new era of automated transactions. James [Utzschneider] said with the use of the new registry he expected such seamless, automatic transactions to take place within a year. Twenty-nine other companies have signed up for the registry so far, executives said." See: "Universal Description, Discovery, and Integration (UDDI)."

  • [September 06, 2000] "Research notebook: On edge-labelled graphs in XML." By Dan Brickley (W3C RDF Interest Group Chair). "There seems to be some consensus around the claim that RDF has a useful data model but a problematic XML syntax. This document is an attempt to gather together the various discussion documents and proposals that relate this topic to the broader context of XML-based graph serialization systems. . .This document serves as an informal survey of XML applications that adopt an edge-labelled graph data model similar to that used in W3C's Resource Description Framework (RDF). It also points to discussion and proposals regarding improvements to the RDF XML syntax. Hopefully these pointers will prove useful even while the document is incomplete..." [Forwarded from www-rdf-interest for hopefully obvious reasons. RDF syntax discussions have a lot in common with some more mainstream XML concerns; I'm hoping I might raid XML-DEV's bookmarks on this topic to build a useful overview for both communities. Basically I'm wondering whether the notion of a 'new, better' RDF syntax makes sense, or whether such a thing is likely to fall out of mainstream XML-for-data-graphs interchange conventions. Apologies for the sketchy state of the draft etc etc.] See "Resource Description Framework (RDF)."
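The "useful data model, problematic syntax" tension noted above is easier to see with the model written down. An edge-labelled graph of the RDF kind is just a set of (subject, predicate, object) triples; the sketch below uses invented node and property names and is not tied to any particular RDF vocabulary or serialization.

```python
# An edge-labelled graph, RDF-style: each edge is a
# (subject, predicate, object) triple. Names are illustrative only.
triples = {
    ("#doc", "dc:creator", "#danbri"),
    ("#doc", "dc:title", "Research notebook"),
    ("#danbri", "foaf:name", "Dan Brickley"),
}

# The same graph admits many XML serializations, which is exactly the
# syntax debate the survey covers. Queries over the model itself stay simple:
def objects(subject, predicate):
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

print(objects("#doc", "dc:creator"))
```

The survey's question is then whether a "new, better" RDF syntax is a distinct design problem, or simply whatever XML convention the wider data-graph interchange community settles on for writing such triple sets down.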

  • [September 06, 2000] "SOAP, RDF and the Semantic Web." By Henrik Frystyk Nielsen. Presented at WWW9, Semantic Web Track (May 2000) Developers Day Slides. "At the WWW9 Conference, I was excited to give a presentation on dev day as part of the Semantic Web track on SOAP serialization. Dan has been so kind as to make the slides available... The purpose of the presentation was to explain the model behind the SOAP serialization as well as how it might be used to serialize RDF graphs. Similarly, SOAP may be used to serialize object graphs etc. For more information on the SOAP specification, see the W3C Note which was submitted May 8 by 11 W3C Member organizations. . ." (1) ZIP format, (2) HTML. See also the "Walkthrough of RDF Examples" [examples in the RDF spec and how they serialize in SOAP]. On RDF: see "Resource Description Framework (RDF)."

  • [September 06, 2000] "Riding the XML Bus to E-business Heaven. Microsoft, IBM Put Aside Rivalry to Concur on Internet Standards." By Chris Preimesberger. From DevX (September 06, 2000). ['Who would have imagined that dyed-in-the-wool Web-services enemies Microsoft, IBM and Sun Microsystems would be joining hands around the XML campfire and singing its praises -- for the good of the connected world? These reluctant partners are seeing eye-to-eye, at least as it pertains to their new XML-based online business registry.'] "Bitter rivals Microsoft Corp., IBM, and Sun Microsystems, as well as 33 other companies, have boarded the same Internet bus, venturing on the road they believe leads to e-business heaven. And there at the wheel, tipping a cap and grinning ear-to-ear, is the driver: XML. These heavy hitters put differences behind them Wednesday to announce they have joined forces in the creation of a new entity called UDDI (for Universal Description, Discovery, and Integration standard). The UDDI is not a technology but a consortium, the main product of which its founders liken to a 'yellow pages' for Internet business. The goal is to create a platform-independent database of businesses. Companies register for inclusion in the database and can then use it for purposes of describing business services, discovering business partners and integrating business services using the Internet. The UDDI database depends heavily on one ingredient: Extensible Markup Language, which will be the foundation on which it is built. If not for that X-factor in the database, these 36 companies probably wouldn't be coming along for the ride. In fact, if the UDDI works as company officials think it will, XML will stand alone as the universal catalyst for business-to-business and business-to-customer commerce. 
'[UDDI] happened because at high levels in each of these companies we came to separate agreement about one thing: We must have universal standards for doing business over the Internet,' said Microsoft vice-president Paul Maritz. 'XML is already a universal standard, and it works very well.' There is an underlying message for developers of all levels: If you're not already implementing XML for back-end data exchange, it would behoove you to get with it if you want to develop apps to be used on the Web. But proficiency in XML won't be a prerequisite for success -- at least at first..." See: "Universal Description, Discovery, and Integration (UDDI)."

  • [September 06, 2000] "Group develops XML-based yellow pages. SOFTWARE DEVELOPMENT." By John Geralds. In (September 06, 2000). "Ariba, IBM and Microsoft have teamed up to provide an online database of companies that want to conduct business on the web. The three software providers plan to make a framework available this week for listing the basic contact information for businesses by industry, similar to a yellow pages listing. The framework is called Universal Description Discovery and Integration (UDDI) and is based on XML (extensible markup language), a web standard for data exchange. Under the proposal, each business would have its own UDDI address that would contain information such as what the business provides and how to connect to it. The initiative will allow businesses to register on a central database by providing corporate information such as company contact details, industry category and services or products. For example, it would allow a company to automate the integration of a business-to-business transaction. John Mann, an analyst at the Patricia Seybold Group, said that UDDI lets a company find out how its computers can talk to another company's computers. Ariba, IBM and Microsoft said they had another 30 or so hi-tech firms that will use trial versions of the protocol starting at the end of the month. Partners are expected to be announced Wednesday, and there are plans to turn the framework over to one of the internet standards bodies in 12 to 18 months." See: "Universal Description, Discovery, and Integration (UDDI)."

  • [September 05, 2000] "SQL Server 2000 scales .Net up, out." By Timothy Dyck. In eWEEK (September 05, 2000). "The linchpin of Microsoft Corp.'s .Net framework, Microsoft SQL Server 2000, provides a stronger foundation for enterprise data storage through its new federated database features as well as easier access to that data through a rich set of XML interfaces. The product, which shipped last month, is the first of Microsoft's updated 2000 servers to hit the market. eWeek Labs doesn't think this release will convince customers to switch from other platforms, but it enables Microsoft shops to scale existing databases higher and access them in new ways. Extensible Markup Language is a key enabler for .Net and is broadly supported in SQL Server 2000. In tests of gold code, we found it easy to query relational data and retrieve results in XML as well as store XML in relational formats. However, SQL Server uses, in some places, an outdated XML data-type proposal called XML-Data Reduced. Customers will need to rework XML templates when the final XML Schema standard is released later this year. . . Microsoft has been slower than competitors to add XML support to its database -- Oracle, IBM and Sybase Inc. each have added similar features over the past 12 months -- but Microsoft has gone a step further by including a Web gateway to its XML interfaces, providing not just the XML formatting engine but a built-in mechanism to get the data in and out of the database as well, although at the cost of more security concerns. When we set up the gateway, which works only with Microsoft's Internet Information Server, we specified which database it should connect to, then submitted a query directly to SQL Server using only a Web browser and received the output as XML."
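The round trip the review tests, querying relational data and receiving the result as XML, can be sketched generically. The code below uses SQLite in place of SQL Server and builds the XML by hand, roughly what a FOR XML-style query or the Web gateway would return; the table, column, and element names are invented.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample table, standing in for a SQL Server database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
db.execute("INSERT INTO orders VALUES (1, 'Acme'), (2, 'Globex')")

# Query relational rows and render each one as an XML element, roughly
# what SQL Server 2000's XML interfaces do on the server side.
root = ET.Element("orders")
for oid, customer in db.execute("SELECT id, customer FROM orders"):
    ET.SubElement(root, "order", id=str(oid), customer=customer)

print(ET.tostring(root, encoding="unicode"))
```

The gateway the review describes adds the missing transport piece: the same XML result is produced server-side and delivered over HTTP, so a plain Web browser can receive it.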

  • [September 05, 2000] "A lot of effort goes into standards organizations, but much of it is wasted." By Michael Vizard. In InfoWorld Volume 22, Issue 35 (August 28, 2000), page 65. "As any vendor will tell you, the beautiful thing about standards is that there are so many of them. This of course means that in some form or another, vendors can always pick some standard to comply with that fits with their business model. On the reverse side of that equation is the customer, who must often feel that all the standards in the world are ill-defined and knows that two products that support the same standard may not work with each other. The problem we face today, however, is that standards have become too much of a good thing. IBM claims to now have more than 150 people working on some 100-plus different standards committees. These efforts typically involve everything from high-level XML schemas to the next generations of fiber-optic networking technology being defined by the Internet Engineering Task Force (IETF). For vendors, this process has become so cumbersome that making sure their products are compatible with all these standards has become a major chore. And making that process even more difficult is the fact that the standards bodies are not adept at communicating with each other. So very few of these groups really understand what the other group is doing and what implications that work will have on their own efforts. For customers, the situation is moving from inane to insane. Instead of just getting incomplete standards that lead to product incompatibilities between vendors in the same category, we're now moving to a situation where standards are potentially incompatible across diverse product categories which absolutely need to work together. The good news is that tolerance for this kind of nonsense in the business community is starting to drop. 
The problem businesspeople are starting to ponder is that many of these technologies are now the basis for a global expansion in e-business that is being hampered by wars over infrastructure standards. Given these issues, it won't be too long before businesspeople begin pushing for all the diverse standards bodies to be rolled up under the auspices of something like the United Nations or the World Trade Organization..."

  • [September 05, 2000] "BroadVision, others team on wireless B2B services." By Erich Luening. In CNET (September 05, 2000). "E-commerce software maker BroadVision today said it has teamed with four other companies to form a new venture that will provide wireless business-to-business software and services. The new company, named B-Mobile, is slated to begin operations by early October. The other companies involved in the venture are California-based H&Q Asia Pacific; Itochu, a Japanese trading company, and its information processing unit, Itochu Techno-Science; and Access, a Japanese browser-software maker for non-PC devices. B-Mobile, to be headquartered in Tokyo, will provide software and services for Internet-based business transactions over wireless networks for mobile phones and personal digital assistants (PDAs), the companies said in a joint statement. The new company is also teaming with NTT Communications, a unit of Nippon Telegraph and Telephone, to use NTT Communications' network. B-Mobile will base its services on BroadVision's One-To-One e-business applications and XML (Extensible Markup Language) content-management technologies, the companies said. XML is a Web standard for data exchange that is rapidly becoming the preferred language of online business."

  • [September 05, 2000] "Army group adds XML to its repertoire." By Patricia Daukantas. In Government Computer News Volume 19, Number 24 (August 21, 2000), pages 1, 74. ['The Army Publishing Agency is turning to the Extensible Markup Language to make Army regulations easier to view and search.'] "The Army Publishing Agency is turning to the Extensible Markup Language to make Army regulations easier to view and search. The current edition of the Army Electronic Library, a quarterly CD-ROM set, contains four prototype documents converted from their original Standard Generalized Markup Language versions. An edition due out next month will have about 100 documents tagged in XML, said Stephen P. Wehrly, chief of electronic publishing for the Alexandria, Va., agency. The agency is making the documents available in XML as well as in two proprietary formats, IBM BookManager and Adobe Portable Document Format, Wehrly said. The four XML prototype documents in the July 2000 edition of the Army Electronic Library include the 400-page Army Regulation 25-30, which governs the Army Publishing and Printing Program. The CD-ROM set also holds 1,204 publications in IBM format, 852 publications in PDF, and 1,933 forms in both PDF and several versions of FormFlow from JetForm Corp. of Ottawa. Last December, however, agency officials changed their minds after they attended an XML conference in Philadelphia. They asked their document support contractor to convert a document from SGML to XML as a test, and it was surprisingly easy, Wehrly said. The XML version turned out to be more searchable and less cumbersome than the BookServer-formatted document. Also, users who couldn't see the PDF version because they were unable or unwilling to install Adobe Acrobat Reader could still read the document in a browser window, Wehrly said..."

  • [September 05, 2000] "Commentary: Microsoft, IBM, Ariba to create major advance in B2B." By [CNET Staff]. In CNet (September 01, 2000). "IBM, Microsoft and Ariba will propose a set of XML-based interface and procedure standards necessary to maintain an online database that will enable companies to divulge and discover the online business standards required to streamline commerce and procedure to bring companies to the Net. Although the directory will contain some human-readable information, it will primarily support the system-to-system (S2S) domain. XML provides a vehicle for creating S2S interfaces for Internet-based communication. As such, it has been called the next generation of electronic data interchange (EDI). However, like EDI it is language-oriented. Companies and industries need to develop the consistent interface formats necessary to conduct actual S2S, market-style transactions. Each industry, product type, market and customer type has unique needs that must be accommodated. The major challenge for trading partners has always been to establish and publish those interfaces. This new proposal, the Universal Description, Discovery and Integration (UDDI) standard, is designed to provide a way for a company's computers to consistently publish and subscribe to information pertinent to participation in business-to-business e-commerce and Net market systems. Commerce systems of actual and potential business partners can search, discover and download the basic UDDI constructs automatically, dramatically reducing programming complexity. This provides those business partners with the information they need to conduct basic transactions, such as submitting bids and sending invoices S2S. One of the most important aspects of this project is that these companies intend their approach to become a published standard. 
While competitors to UDDI may appear, and more than one may ultimately be used, no one will be able to establish a proprietary solution that allows one company to own this important part of e-commerce. The combination of Microsoft and IBM, the two largest companies in the computer industry, with Ariba, which has taken a leadership role in business-to-business commerce on the Web, makes this a powerful move in the industry and guarantees that proprietary solutions will fail. . . The UDDI initiative also promises to help reduce costs for business-to-business e-commerce technology to a reasonable level. EDI and XML have always been very expensive to implement. Companies have had to buy millions of dollars in software and spend large amounts of staff time on the detailed technical problems of creating their interfaces." See: "Universal Description, Discovery, and Integration (UDDI)."

  • [September 05, 2000] "XML-EDI Translations Get Boost." By Ephraim Schwartz. In InfoWorld (September 01, 2000). "XMLSolutions next week will announce its XMLSolutions Business Integration Platform for converting legacy data to Web formats. XMLSolutions, based in McLean, Va., is a company whose potential impact on e-business may be best measured by the fact that its new CEO, Ron Shelby, was lured from General Motors, where he was CIO. 'There's an opportunity, especially on the direct materials [procurement] side, to be a player with a significant piece of software,' Shelby said. The platform is a universal interpreter that one analyst likened to the Star Trek universal language translator. XMLSolutions' platform can translate, in either direction, any version of EDI (electronic data interchange) to any version of XML, giving immediate access to data regardless of format. The cost savings to companies trying to contain escalating re-engineering expenditures as they go online with new procurement systems will be significant, according to analyst Ken Volmer, research director at Giga Information Group, in Cambridge, Mass. 'It makes the whole body of work done in the last 20 years in developing the EDI data dictionaries available in the Internet world,' Volmer said. What the XMLSolutions platform will do is automate the mapping of meta data (the coded name for data fields) from one format to another. For example, in a two-step process, the software describes a particular field, such as 'part price,' by taking that field description and representing it as an XML structure. In the second step, it reformats the XML structure into the particular version of XML being used, such as cXML. The system stores all EDI and XML standards and builds mappings between them as if they were native on either end, according to Shelby..."
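The two-step process the article describes, representing an EDI field as a neutral XML structure and then reformatting that structure into a specific XML dialect, can be sketched as follows. The "part price" field is the article's own example; the mapping table and the cXML-flavored output element are invented, since XMLSolutions' actual mappings were proprietary.

```python
import xml.etree.ElementTree as ET

# Step 1: describe an EDI field as a neutral, format-independent
# XML structure ("part price" is the article's own example field).
def edi_field_to_neutral(name, value):
    field = ET.Element("field", name=name)
    field.text = value
    return field

# Step 2: reformat the neutral structure into the target XML dialect.
# This mapping table is hypothetical, loosely cXML-flavored.
DIALECT_MAP = {"part price": "UnitPrice"}

def neutral_to_dialect(field):
    target = ET.Element(DIALECT_MAP[field.get("name")])
    target.text = field.text
    return target

neutral = edi_field_to_neutral("part price", "19.99")
out = neutral_to_dialect(neutral)
print(ET.tostring(out, encoding="unicode"))
```

Storing such mappings for every EDI and XML standard, and composing them in either direction, is what would let the platform treat each format as if it were native on both ends.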

  • [September 05, 2000] "XML: the lingua franca for B2B 'Net applications." By Frank Dzubeck. In Network World (September 04, 2000). "A few years ago, I had to sell clients on the merits of XML. Today, XML has unequivocally become the universal format for structured documents and data on the Web. Software vendors are slowly embracing XML. IBM was an early adopter and even became an evangelist. Cisco has completely opened its policy management software with XML. Sun intends to include a native XML parser within Java 2 Enterprise Edition and even offers a transformation compiler that converts XML scripts into Java code. XML is synonymous with portable data, as Java is synonymous with portable code: both are essential ingredients to next-generation Web-based applications. Microsoft, with the announcement of .Net, has made a significant commitment to XML...We have reached another milestone in the transformation of the computer and communications industry. Using the Open Systems Interconnection model as a reference, IP established a level playing field for developer and user interoperability at Layer 3. TCP/UDP [User Datagram Protocol] did the same at Layer 4, and now XML addresses interoperability at Layer 7. The significance of this event cannot be understated. The power of the Internet was always presumed to be IP and TCP/UDP, but these protocols are only the "plumbing" standards to let applications interact and communicate."

  • [September 01, 2000] "XML Standard Readied for Businesses." By Ed Scannell and Tom Sullivan. In InfoWorld (August 31, 2000). "IBM, Microsoft, and Ariba next week will propose an XML-based standard that will allow thousands of vendors to register their businesses in a Web-based database that will help them match up with partners to carry out e-commerce transactions, according to industry sources briefed by the companies. The three companies have come up with a standard called the Universal Description Discovery and Integration (UDDI). The three companies intend to ask a dozen or so other key companies in the industry to serve on an advisory board, and together they will evolve the specification over the next 12 to 18 months before turning it over to an open-standards body. The proposed online database would allow a company to register all its basic corporate information, 'much like you might find in a brochure or on a Web site,' according to one source. Perhaps more importantly a company can also register all the technical aspects of its e-business, such as the transport protocols it supports. 'What this [database] will eventually be able to do is allow you to automate the integration of a b-to-b (business-to-business) transaction,' said one source. The way businesses would integrate the technical aspects of their business now is to go through the laborious process of custom-coding large portions of their products to get them to work together seamlessly. But by searching the on-line database for the appropriate partners, companies can save time and money by contacting only those prospective partners with whom they best match."

  • [September 01, 2000] "New XML Variant Targets Chemical Industry." By Renee Boucher Ferguson. In eWEEK (September 01, 2000). "...a group of three major chemical companies is working feverishly to finish development of a new strain of XML to complement forthcoming online exchanges for the chemical industry. The group, called eStandards, comprises Dow Chemical Co., DuPont and BASF Corp. It's developing a dialect of the Extensible Markup Language that, when used in a chemical exchange, will enable companies to engage in many-to-many trading transactions. Later this month, the trio will present a draft of its as-yet-unnamed XML specification to the Chemicals Industry Data Exchange, or CIDX, consortium in hopes of creating an industry standard. XML is used to describe the contents of documents on Web pages. In its first incarnation, the chemical-industry XML variant promises to standardize customer information, catalog data, order placement and security. A second phase of development, expected to be completed by year's end, will add international trade parameters, logistics, invoicing and forecasting. The pressure to get the spec developed comes from the expected fourth-quarter launch of two trading exchanges: Omnexus, which will focus on plastics, and Elemica, a chemical exchange. Founders of the former include BASF and DuPont; the latter is backed by BASF, DuPont, Dow and others. In the meantime, smaller chemical exchanges such as Envera, which was formed a year ago by Ethyl Corp. and is set to launch October 1 [2000], have announced plans to hand over their own XML specifications to the CIDX next week. Envera, which gave input to eStandards, and the other exchanges are competing not only against one another but also against trading practices prevalent in the chemical industry. The focus of current electronic trading is EDI (electronic data interchange). 
The danger, said The Delphi Group analyst Nathaniel Palmer, is potential infighting within the industry, which could 'easily prevent any [standards from taking off] and ensure EDI stays where it is. What they really want is their standard to be developed and everyone to use it on their trading platform,' said Palmer, in Boston." See "Envera provides CIDX with key XML data tags for industry-wide application. Chemical industry establishes e-commerce transaction data standards." For project description, see: "XML-Based 'eStandard' for the Chemical Industry."

  • [September 01, 2000] "Relaxer: Java classes from RELAX modules." By Murata Makoto and ASAMI Tomoharu. Presented at Extreme Markup Languages Conference 2000. "Relaxer is a Java program that generates Java classes from RELAX modules: XML documents valid against a RELAX module can be handled by the Java classes generated from that RELAX module. Relaxer liberates programmers from tedious work: (1) Variables in generated classes have programmer-friendly names, since they are borrowed from RELAX modules; (2) Datatypes specified in RELAX modules are used as datatypes of Java variables; (3) Convenient methods such as reader/writer for XML documents and access functions are generated; and, (4) Functions for design patterns 'factory', 'composite' and 'visitor' are generated. Unlike other Java class generators or XML-Java mapping tools, Relaxer supports all features of RELAX Core including mixed content models, element content models, and standard attributes such as xml:lang, xml:space and xml:base. Relaxer has been extensively used by some early adopters, and has received very positive feedback." See the complete description.

  • [September 01, 2000] "XML Tutorial 3: XSL Transformations." By Bonnie SooHoo. From (September 01, 2000). "XML allows us to build arbitrary tag sets to describe our data, and writing a well-formed XML document gives relational structure to our data. But how do we deliver that data in a presentable manner people can use? While CSS is one option for adding style, there's another style sheet language built from XML that was designed for this very purpose, called XSL, or eXtensible Style Language...XSL originally consisted of two parts: one for transforming XML documents into other types of documents, and the other part for formatting objects used to format a document. These two parts have since evolved into separate specifications with the XSL transformations part being referred to as XSLT -- that's the focus for this tutorial. The XSL formatting language is still under development; as such, it currently has little support. For those of you who are professing XML to be too impractical for real use because of limited browser support, XSLT comes to the rescue. XSLT offers incredible capabilities for transforming raw XML data into another type of document, such as a well-formed HTML document that should be accessible by everyone. This is basically how it works: An XSL processor takes an XML document and applies the XSL style sheet to it to generate a third source as the transformed output, which is the final product that users will actually see. For this tutorial, this output will be HTML. . ." For related resources, see "Extensible Stylesheet Language (XSL/XSLT)."
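The pipeline the tutorial describes (XML in, template rules applied, HTML out as the "third source") can be sketched as below. Python's standard library has no XSLT processor, so this hand-rolls the equivalent transform; the element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Source XML: the data to be presented.
xml_input = "<books><title>XML Basics</title><title>XSLT in Depth</title></books>"

def transform_to_html(xml_text):
    """Apply a single 'template rule': every <title> becomes an HTML <li>."""
    root = ET.fromstring(xml_text)
    items = "".join(f"<li>{t.text}</li>" for t in root.iter("title"))
    return f"<ul>{items}</ul>"

print(transform_to_html(xml_input))
# <ul><li>XML Basics</li><li>XSLT in Depth</li></ul>
```

A real XSLT stylesheet expresses the same rule declaratively (`<xsl:template match="title">`), with the processor driving the traversal instead of the application code.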

August 2000

  • [August 31, 2000] XMLDOM: DOM/Value Mapping. Revised Submission. OMG Document 'orbos/2000-08-10'. 94 pages. August 21, 2000. A specification for DOM-based XML/Value Mapping [XMLDOM], in response to the ORBOS RFP - XML/Value Mapping. Also available in PostScript and .ZIP format. Submitted by: BEA Systems, Cape Clear Software Ltd, Hewlett-Packard Company, International Business Machines Corporation, IONA Technologies PLC, Oracle Corporation, PeerLogic, Inc., Persistence Software, Rogue Wave Software, Unisys Corporation. Supported by: Sun Microsystems. Contact: Dr. Stephen Brodsky, Ph.D (IBM). ['This document is an update of the joint submission to the XML/Value RFP. This document has dependencies on DOM Level 2, which is under development at the W3C as a candidate recommendation. The Finalization Task Force will incorporate the final recommendations from the W3C for the DOM Level 2 specification.'] Background: "XML has become an important and widespread standard for representing hierarchical tagged data. So much so, that it has become a common requirement to pass XML documents in CORBA interface operations. While it is possible to pass XML documents as strings, it is cumbersome to do so, and it requires each recipient of the string to parse its XML content. A better way is to create a data structure representing the XML document that can be traversed and manipulated in memory, and passed to a remote context without further processing by the sender or the receiver. To address this problem, this submission provides a mapping from XML documents to IDL valuetype hierarchies, based on XML DTDs. This submission provides two essential scenarios for using XML to create IDL valuetypes. The first scenario, where dynamic information is present, leverages existing standards to provide access to the full contents of an XML document in terms of IDL valuetypes. 
The second scenario builds upon the first where additional static information is present from XML DTDs and (in the future) XML Schemas. The DTDs/Schemas are metadata used to generate Valuetypes that match the types of information expected to be present in XML documents. The metadata from the DTDs/Schemas and Valuetypes may be imported into CORBA Interface Repositories and the Meta Object Facility, providing wide metadata distribution through OMG standards. The dynamic information scenario is the processing of an XML document when the meaning of the XML elements found in the document is not defined. In this case, only minimal information is known -- what is in the XML document and little else. The DOM is a standard representation for the complete contents of an XML document. The DOM satisfies the requirement of the W3C XML Information Set (Infoset) to provide an access mechanism to the document contents. By expressing the DOM in terms of IDL valuetypes, a CORBA implementation has practical, standardized, and direct access to the full information in the XML document. The RFP requests "a standard way to represent XML values (documents) using OMG IDL non-object (value) types." This response provides an XML to IDL mapping leveraging the Document Object Model (DOM) technical recommendation from the World Wide Web Consortium (W3C). The DOM is an extensively used standard mechanism for defining access to XML content. The DOM includes a set of interfaces defined in IDL with mappings to Java and C++. The purpose here is to enable IDL users to access XML content using IDL valuetypes while maintaining maximum DOM compatibility. To this end, DOM level 1 and level 2 interfaces are re-declared as IDL valuetypes instead of the IDL interfaces in the DOM standard. The RFP does not request a mapping from IDL to XML. Mapping from IDL to XML is already accomplished using the MOF and XMI OMG standards. 
"Metadata: There are two fundamental sources of information in XML: DTDs and XML documents. DTDs provide static information since they define XML elements for a class of XML documents. XML documents provide dynamic information: (1) The document contents may be instances of DTD declarations. (2) The document contents may be instances of new types not declared in a DTD. (3) A document may not have a DTD. The DTD may not exist or be referenced. (4) The DTD is updated while the deployed software remains at a previous level, so information which could be available statically in a future software revision must be treated dynamically in the meantime. Dynamic information is available through XML Parsing into a DOM tree. Knowing static information ahead of time supplements the dynamic information. If all the dynamic information is also available in a previously known DTD, this is the static scenario. If both static and dynamic information is used, this is the mixed scenario. When static information from a DTD is available, the metadata defining XML documents can be extracted into IDL Valuetypes. The metadata in the valuetypes is made widely available through the CORBA Interface Repository (IR) and the Meta Object Facility (MOF). The CORBA Component Model describes the mappings from the valuetype declarations in the IR to the MOF. This provides a pathway from valuetypes to MOF metamodels. Mapping from XML documents, DTDs, and XML Schema to the MOF is covered by the XMI production of XML Schemas RFP." Section 7 "Static Mapping from a DTD" describes a static mapping of XML documents to valuetypes based on XML DTDs. The mapping defines a hierarchy of valuetypes that mirror an XML document's structure. Specific valuetypes are used to represent elements that may themselves contain other elements. Valuetypes representing document elements inherit from generic valuetypes defined in the DOM module. 
Note: The submission "does not contain mappings to XML Schemas since XML Schemas will not be finalized by the W3C for several months." See "Object Management Group XML/Value RFP." [cache]
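The DOM access model that the submission re-declares as IDL valuetypes is the same tree API familiar from other DOM bindings. A minimal sketch using Python's stdlib `minidom` (the element names are invented for illustration):

```python
from xml.dom.minidom import parseString

# Parse a small document into an in-memory DOM tree, then traverse it:
# the tree can be handed around and inspected without re-parsing.
dom = parseString("<order><item qty='2'>widget</item></order>")

root = dom.documentElement
item = root.getElementsByTagName("item")[0]
print(root.tagName, item.getAttribute("qty"), item.firstChild.data)
# order 2 widget
```

The submission's point is precisely that a structure like this, expressed as IDL valuetypes, can cross a CORBA interface by value, so neither sender nor receiver has to serialize and re-parse the document.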

  • [August 31, 2000] "Speaking in Charsets: Building a Multilingual Web Site." By John Yunker. In WebTechniques Volume 5, Issue 9 (September 2000). ['Creating Japanese Web pages presents its own unique set of challenges. John guides you through the quagmire of character sets, encodings, glyphs, and other mysterious elements of written language.'] "There are many character sets from which to choose, including anything from Western to Cyrillic. When working with different languages, you'll need to understand the different character sets and the ways computers manipulate and display them. A character can be a tough concept to grasp if all you've ever worked with is English. Characters are not just letters of the alphabet. You have to be careful not to confuse a character with a glyph, which is the visual representation of a character. For example, the letter Z is a character, but it may be represented by a number of different glyphs. In the Times New Roman font, a Z looks much different from the way it looks in the Bookman font. . . Character sets by themselves don't mean much to computers unless they're attached to encodings that describe how to convert from bits to characters. For example, under ASCII, the number 65 represents the letter A. This mapping from number to letter is referred to as the character encoding. A computer must be told which encoding is being used, then it simply matches the number to the character..." For related resources, see "XML and Unicode."
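The character-versus-encoding distinction the article draws can be seen directly: the same character maps to different byte sequences under different encodings, which is why a page must declare which encoding it uses. A short stdlib sketch:

```python
# Under ASCII, the number 65 represents the letter A.
assert chr(65) == "A"

# The same Japanese character encodes to different bytes
# under UTF-8 and Shift_JIS.
hiragana_a = "\u3042"  # hiragana letter A
utf8_bytes = hiragana_a.encode("utf-8")
sjis_bytes = hiragana_a.encode("shift_jis")
print(utf8_bytes, sjis_bytes)  # b'\xe3\x81\x82' b'\x82\xa0'

# Decoding with the matching charset recovers the character;
# decoding with the wrong one garbles it or fails outright.
assert utf8_bytes.decode("utf-8") == hiragana_a
```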

  • [August 31, 2000] "MSXML Conformance Update." By Chris Lovett. From August 30, 2000. ['In the past, has tested Microsoft's MSXML parser for XML conformance with less than glorious results. In this article, Chris Lovett presents the significant improvements made by Microsoft in MSXML in recent months.'] "This article is an update to previous articles by David Brownell on the conformance of the Microsoft XML Parser (MSXML). The July 2000 MSXML 3.0 Beta Release has made a significant improvement in conformance against the OASIS XML conformance test suite. Besides reporting the OASIS conformance test results, this article also reports on the compliance of the new Visual Basic SAX interface included in MSXML 3.0. To run compliance testing on this component I developed a brand new test harness in Visual Basic, which is also included with this article... The OASIS XML conformance test suite is a published set of tests, collected over time from various sources, which measure the conformance of XML parsers against the W3C XML 1.0 specification. It does not include any tests from Microsoft at this time. For my test, I downloaded the updated test suite that David Brownell published in February. This updated test suite takes into account the W3C errata for the XML 1.0 specification. I made two modifications to this suite... see for my updated version of the test suite. . . I used the same ECMAScript test harness that David Brownell published, except for one minor modification. This modification stemmed from the issue of what to do with tests marked 'valid' that have no DTD (document type definition) at all. David's test harness treated this issue in a manner contrary to the design of MSXML. . . MSXML still has some issues to resolve relating to non-existent or malformed unused entities, attribute-value normalization, end-of-line handling, and reporting validity constraints when running in non-validating mode. 
However, you can see from the following table that MSXML is on a steady march towards 100% compliance." For background and references, see "XML Conformance."

  • [August 31, 2000] "Transforming XML: HTML and XSLT." By Bob DuCharme. From August 30, 2000. ['While HTML isn't an XML application itself, it can be both generated and transformed using XSLT. Bob DuCharme shows us how.'] "HTML Web pages have played a big part in electronic publishing for some time now, and will continue to for several years. If you use XSLT as a system development tool, you may work on an application that needs to read or write HTML. If your application is reading or writing the HTML flavor known as XHTML, a W3C Recommendation that describes itself in its spec as 'a reformulation of HTML 4 as an XML 1.0 application,' then there's nothing special to worry about: XHTML is perfectly good XML, just like anything else that XSLT can read or write. If your application is reading older legacy HTML or outputting HTML for use in older browsers, however, there are a few small problems to keep in mind and some simple techniques for getting around these problems. HTML as Input: XSLT processors expect their input to be well-formed XML, and although HTML documents can be well-formed, most aren't. For example, any Web browser would understand the following HTML document, but a number of things prevent it from being well-formed XML... Between Dave Raggett's Tidy program and the xsl:output element, you should be all set to incorporate old-fashioned HTML into your new XSLT-based systems!" See related resources in "XSL/XSLT: Articles and Papers."
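The well-formedness problem the article describes is easy to demonstrate: an XML parser (like the one inside an XSLT processor) rejects legacy HTML with unclosed tags but accepts the XHTML equivalent. A stdlib sketch:

```python
import xml.etree.ElementTree as ET

legacy_html = "<p>Hello<br>world</p>"  # unclosed <br>: not well-formed XML
xhtml = "<p>Hello<br/>world</p>"       # XHTML: perfectly good XML

try:
    ET.fromstring(legacy_html)
    parsed_legacy = True
except ET.ParseError:
    parsed_legacy = False

print(parsed_legacy)             # False: legacy HTML is rejected
print(ET.fromstring(xhtml).tag)  # p: XHTML parses fine
```

Tools like Tidy close exactly this gap, rewriting legacy HTML into the well-formed flavor before it is fed to an XSLT processor.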

  • [August 31, 2000] "XML-Deviant: Instant RDF?" By Leigh Dodds. From August 30, 2000. ['RDF has some devoted followers, but it is yet to hit the XML mainstream. Many believe this is because of its complicated syntax. XML-Deviant investigates the quest for 'instant RDF'.'] "Complexity has been one criticism which RDF has had difficulty in shaking off. Both the RDF model, and its serialization syntax, have fallen foul of this issue at various points in its development. Efforts to produce a simpler serialization syntax have led to several alternate proposals, including one from Tim Berners-Lee ("The Strawman Proposal"), and one from Sergey Melnick ("Simplified Syntax for RDF"). For non-RDF-aficionados, the serialization syntax is the representation of the RDF data model as XML. (Although XML is only one possible means of representing this information). While technical concerns have been raised about specific details of the RDF syntax, the main aim of simplification is to make it easier to generate RDF from existing (and future) XML documents--documents which were not produced with RDF applications in mind. Given the slow adoption of RDF, this seems a useful approach. While discussion of the finer points of the RDF syntax is no doubt beneficial, for developers seeking to gain some benefit from using RDF this transitional step from XML to RDF is important. An increasingly large amount of XML data coupled with a vast amount of HTML (suitably tidied for well-formedness) provides a rich data source for bootstrapping RDF applications... While it's too early to say whether RSS will be its proving ground, RDF's supporters are keen to see more adoption. Dan Brickley has suggested to developers that effort should be spent on producing interoperability tests for the increasing range of available RDF parsers..." See "Resource Description Framework (RDF)."

  • [August 30, 2000] "Keys for XML." By Peter Buneman, Susan Davidson, Wenfei Fan, Carmem Hara, and Wang-Chiew Tan (PENN Database Research Group). August 9, 2000. 14 pages (with 8 references). ['We discuss the definition of keys for XML documents, paying particular attention to the concept of a relative key, which is commonly used in hierarchically structured documents. Keys are a critical issue for annotations, updates, and the definition of constraints inside documents.'] "Keys are an essential part of database design: they are fundamental to data models and conceptual design; they provide the means by which one tuple in a relational database may refer to another tuple; and they are important in update, for they enable us to guarantee that an update will affect precisely one tuple. More philosophically, if we think of a tuple as representing some real-world entity, the key provides an invariant connection between the tuple and entity. If XML documents are to do double duty as databases, then we shall need keys for them. In fact, a cursory examination of existing DTDs reveals a number of cases in which some element or attribute is specified -- in comments -- as a 'unique identifier'. Moreover a number of scientific databases, which are typically stored in some special-purpose hierarchical data format which is ripe for conversion to XML, have a well-organized hierarchical key structure. Both the XML specification itself and XML-Schema include some form of specification of keys. Through the use of ID attributes in a document type descriptor (DTD) one can specify an identifier for an element that is unique within a document. XML-Schema has a more elaborate proposal which is the starting point for this note. There are a number of technical issues concerning the XML-Schema proposal, but the important point is that neither XML nor XML-Schema properly address the issue of hierarchical keys, which appear to be ubiquitous in hierarchically structured databases. 
This is the main reason for this note. Also, the authors believe that the use of keys for citing parts of a document is sufficiently important that it is appropriate to consider key specification independently of other proposals for constraining the structure of XML documents. How then, are we to describe keys for XML or, more generally, for semistructured data? From the start, how we identify components of XML documents is very different from the way we identify components of relational databases..." Also available in Postscript format; see the publications listing. [cache]
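The absolute-versus-relative distinction at the heart of the paper can be sketched with a small check: an absolute key must be unique across the whole document, while a relative key need only be unique within its parent's scope. The element and attribute names below are illustrative, not taken from the paper.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<db>"
    "<book isbn='111'><chapter num='1'/><chapter num='2'/></book>"
    "<book isbn='222'><chapter num='1'/></book>"
    "</db>"
)

# Absolute key: isbn must be unique across the whole document.
isbns = [b.get("isbn") for b in doc.iter("book")]
assert len(isbns) == len(set(isbns))

# Relative key: chapter num need only be unique within each book,
# so two chapters numbered '1' in different books are fine.
for book in doc.iter("book"):
    nums = [c.get("num") for c in book.findall("chapter")]
    assert len(nums) == len(set(nums))
print("keys hold")
```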

  • [August 30, 2000] "The basics of using XML Schema to define elements. Get started using XML Schema instead of DTDs for defining the structure of XML documents." By Ashvin Radiya and Vibha Dixit (AvantSoft, Inc.). From IBM DeveloperWorks. August 2000. ['The new XML Schema system, now nearing acceptance as a W3C recommendation, aims to provide a rich grammatical structure for XML documents that overcomes the limitations of the DTD. This article demonstrates the flexibility of schemas and shows how to define the most fundamental building block of XML documents -- the element -- in the XML Schema system.'] "We have covered the most fundamental concepts needed to define elements in XML Schema, giving you a flavor of its power through simple examples. Many more powerful mechanisms are available: (1) XML Schema includes extensive support for type inheritance, enabling the reuse of previously defined structures. Using what are called facets, you can derive new types that represent a smaller subset of values of some other types, for example, to define a subset by enumeration, range, or pattern matching. In the example for this article, the ProductCode type was defined using a pattern facet. A subtype can also add more element and attribute declarations to the base type. (2) Several mechanisms can control whether a subtype can be defined at all or whether a subtype can be substituted in a specific document. For example, it is possible to express that InvoiceType (type of Invoice number) cannot be subtyped, that is, no one can define a new version of InvoiceType. You can also express that, in a particular context, no subtype of ProductCode type can be substituted. (3) Besides subtyping, it is possible to define equivalence types such that the value of one type can be replaced by another type. (4) By declaring an element or type to be abstract, XML Schema provides a mechanism to force substitution for it. 
(5) For convenience, groups of attributes and elements can be defined and named. That makes reuse possible by subsequently referring to the groups. (6) XML Schema provides three elements -- appInfo, documentation, and annotation -- for annotating schemas for both human readers (documentation) and applications (appInfo). (7) You can express uniqueness constraints based on certain attributes of child elements. . ." Also available in PDF format. For schema description and references, see "XML Schemas." [cache]
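A pattern facet constrains the lexical form of a simple type, much as a regular expression does. The article does not give its actual ProductCode facet, so the pattern below is hypothetical; the sketch mimics what a validator does when it checks a pattern-restricted value.

```python
import re

# Hypothetical pattern facet for a ProductCode-like type:
# three uppercase letters, a hyphen, four digits.
PRODUCT_CODE = re.compile(r"[A-Z]{3}-\d{4}")

def is_valid_product_code(value):
    """Mimic schema validation of a pattern-restricted simple type."""
    return PRODUCT_CODE.fullmatch(value) is not None

print(is_valid_product_code("ABC-1234"))  # True
print(is_valid_product_code("abc-12"))    # False
```

In a real schema the same constraint would appear as `<xs:pattern value="[A-Z]{3}-\d{4}"/>` inside a restriction of `xs:string`.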

  • [August 30, 2000] "ERX: A Conceptual Model for XML Documents." By Giuseppe Psaila (University of Bergamo, Faculty of Engineering, Viale Marconi 5, 24044 Dalmine [BG], Italy). Pages 898-903 with 11 references [volume 2] in Proceedings of the 2000 ACM Symposium on Applied Computing. Paper presented at conference (March 19 - 21, 2000, Como Italy). "The Extensible Markup Language (XML) is able to represent any kind of structured or semi-structured document, such as papers, web pages, database schemas and instances, style-sheets, etc. However, the tree-structure of XML documents, induced by nested mark-ups, does not provide a sufficiently expressive and general conceptual model of data in the documents, particularly when multiple source documents are processed at the same time. This paper proposes the ERX (Entity Relationship for XML) conceptual model, an evolution of the Entity Relationship model that copes with the peculiar features of XML. ERX is devised to provide an effective support to the development of complex XML processors for advanced applications. By discussing an applicative scenario, the paper shows that suitable CASE tools can provide a practical support during the implementation of XML processors, by the automatic generation of software components based on the ERX model. [...] In this paper, we address the problem of defining a conceptual model for XML documents. We propose the ERX conceptual model, an evolution of the classical Entity Relationship model. ERX provides specific features which are suitable to model large collections of XML documents. ERX is devised to be effective for building advanced XML processors, specifically processors that have to manipulate complex XML documents, or multiple classes of documents at the same time. 
In particular, the database-like view of the modeled documents offered by ERX is a valuable factor of the proposal, both because it determines a better understanding of the data in the documents, and because it opens the way to the development of applications that exploit the well known and reliable technology of relational DBMSs to manage and manipulate large collections of documents. The paper shows by means of the definition of an applicative scenario that suitable CASE tools can be developed to assist the development of complex XML processors based on the ERX conceptual model. Future work: We plan to follow several directions. At first, we are going to test the effectiveness of ERX on complex cases coming from industry and/or bank applications; these tests will be the occasion to validate ERX, specifically as far as its completeness is concerned. From the implementation point of view, we are realizing several versions of the ERX data manager; in particular, we are investigating both a main memory solution and an implementation that exploits a relational DBMS; in fact, this latter solution seems very promising to build complex and distributed information systems based on XML. A third research line will consider the definition of a formalism to specify XML to ERX transformation rules; this formalism is the first necessary step toward the development of CASE tools that generate XML-ERX mappers." See also "ERX: A Data Model for Collections of XML Documents."

  • [August 30, 2000] "Efficient Evaluation of Regular Path Expressions on Streaming XML Data." By Zachary G. Ives, Alon Y. Levy, and Daniel S. Weld (University of Washington Database Group). Technical Report UW-CSE-2000-05-02, University of Washington, 2000. Submitted for publication. 22 pages (with 18 references). "The adoption of XML promises to accelerate construction of systems that integrate distributed, heterogeneous data. Query languages for XML are typically based on regular path expressions that traverse the logical XML graph structure; the efficient evaluation of such path expressions is central to good query processing performance. Most existing XML query processing systems convert XML documents to an internal representation, generally a set of tables or objects; path expressions are evaluated using either index structures or join operations across the tables or objects. Unfortunately, the required index creation or join operations are often costly even with locally stored data, and they are especially expensive in the data integration domain, where the system reads data streamed from remote sources across a network, and seldom reuses results for subsequent queries. This paper presents the x-scan operator which efficiently processes non-materialized XML data as it is being received by the data integration system. X-scan matches regular path expression patterns from the query, returning results in pipelined fashion as the data streams across the network. We experimentally demonstrate the benefits of the x-scan operator versus the approaches used in current systems, and we analyze the algorithm's performance and scalability across a range of XML document types and queries. [Conclusions:] In this paper we have presented the x-scan algorithm, a new primitive for XML query processing, that evaluates regular path expressions to produce bindings. 
X-scan is scalable to larger XML documents than previous approaches and provides important advantages for data integration, with the following contributions: (1) X-scan is pipelined and produces bindings as data is being streamed into the system, rather than requiring an initial stage to store and index the data. (2) X-scan handles graph-structured data, including cyclical data, by resolving and traversing IDREF edges, and it does this following document order and eliminating duplicate bindings. (3) X-scan generates an index of the structure of the XML document, while preserving the original XML structure. (4) X-scan uses a set of dependent finite state machines to efficiently compute variable bindings as edges are traversed. In contrast to semi-structured indexing techniques, x-scan constructs finite automata for the paths in the query, rather than for the paths in the data. (5) X-scan is very efficient, typically imposing only an 8% overhead on top of the time required to parse the XML document. X-scan scales to handle large XML sources and compares favorably to Lore and a commercial XML repository, sometimes even when the cost of loading data into those systems is ignored." Note from ZI home page: "Zack works with Professors Alon Levy and Dan Weld on the Tukwila data integration system. Tukwila uses adaptive query processing techniques to efficiently deal with processing heterogeneous, XML-based data from across the Internet. Current research in Tukwila is in new adaptive query processing techniques for streaming XML data, as well as policies for governing the use of adaptive techniques." For related references, see "XML and Query Languages." [cache]

  • [August 30, 2000] "X-Scan: a Foundation for XML Data Integration." Project overview. From the University of Washington Database Group. 'The x-scan algorithm is a new operator designed to facilitate integration of XML data sources in the context of the Tukwila data integration system.' "The adoption of XML promises to accelerate construction of systems that integrate distributed, heterogeneous data. Query languages for XML are typically based on regular path expressions that traverse the logical XML graph structure; the efficient evaluation of such path expressions is central to good query processing performance. Most existing XML query processing systems convert XML documents to an internal representation, generally a set of tables or objects; path expressions are evaluated using either index structures or join operations across the tables or objects. Unfortunately, the required index creation or join operations are often costly even with locally stored data, and they are especially expensive in the data integration domain, where the system reads data streamed from remote sources across a network, and seldom reuses results for subsequent queries. We propose the x-scan operator, which efficiently processes non-materialized XML data as it is being received by the data integration system. X-scan matches regular path expression patterns from the query, returning results in pipelined fashion as the data streams across the network. We have experimentally demonstrated the benefits of the x-scan operator versus the approaches used in current systems and analyzed the algorithm's performance and scalability across a range of XML document types and queries. [...] X-scan is a new method for evaluating path expressions as data is streaming into the system. The input to x-scan is an XML data stream and a set of regular path expressions occurring in a query; x-scan's output is a stream of bindings for the variables occurring in the expressions. 
A key feature of x-scan is that it produces these bindings incrementally, as the XML data is streaming in; hence, x-scan fits naturally as the source operator to a complex pipeline, and it is highly suited for data integration applications. X-scan is motivated by the observation that IDREF links are limited to the scope of the current document, so in principle, the entire XML query graph for a document could be constructed in a single pass. X-scan achieves this by simultaneously parsing the XML data, indexing nodes by their IDs, resolving IDREFs, and returning the nodes that match the path expressions of the query. In addition to the path expression evaluation routines, x-scan includes the following functionality: (1) Parsing the XML document; (2) Node ID recording and reference resolving; (3) Creating a graph-structured index of the file; (4) Returning tuples of node locations..."
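The ID-indexing and IDREF-resolution step described above rests on the observation that references cannot leave the document, so every target is known by end-of-parse. A rough sketch of that bookkeeping, with invented attribute names and data, assuming the standard-library ElementTree:

```python
# Hypothetical sketch of single-pass ID/IDREF bookkeeping: index nodes by
# their IDs while scanning, queue IDREF edges, and resolve them once the
# whole document has been seen. Data and attribute names are illustrative.
import xml.etree.ElementTree as ET

doc = """<parts>
  <part id="p1" uses="p2"/>
  <part id="p2"/>
</parts>"""

id_index = {}      # ID -> element, built as the document is scanned
pending = []       # (referring element, target ID) awaiting resolution
edges = []         # resolved IDREF edges of the XML query graph

for elem in ET.fromstring(doc).iter():
    if "id" in elem.attrib:
        id_index[elem.attrib["id"]] = elem
    if "uses" in elem.attrib:
        pending.append((elem, elem.attrib["uses"]))

# IDREFs are scoped to the document, so one resolution pass at end-of-parse
# completes the graph; no target can be missing.
for source, target_id in pending:
    if target_id in id_index:
        edges.append((source.attrib["id"], target_id))

print(edges)  # [('p1', 'p2')]
```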

  • [August 30, 2000] "XML Query Languages in Practice: An Evaluation." By Zachary G. Ives and Ying Lu. Paper presented at Web Age Information Management 2000, Shanghai, China. Abstract: "The popularity of XML as a data representation format has led to significant interest in querying XML documents. Although a 'universal' query language is still being designed, two language proposals, XQL and XML-QL, are being implemented and applied. Experience with these early implementations and applications has been instructive in determining the requirements of an XML query language. In this paper, we discuss issues in attempting to query XML, analyze the strengths and weaknesses of current approaches, and propose a number of extensions. We hope that this will be helpful both in forming the upcoming XML Query language standard and in supplementing existing languages. [Conclusion:] In this paper, we have described the two most widely accepted XML query languages, XQL and XML-QL, and examined how they can be applied to three different domains: relational queries, queries over arbitrary XML data, and graph-structured scientific applications. While we believe this to be the first analysis of XML query languages' applicability, issues in designing an XML query language have been frequently discussed in the literature. Recently, Bonifati and Ceri presented a survey of five major XML query languages that compared the features present in each. The goal of this paper is more than to provide a feature comparison: we hope to promote a greater understanding of XML query semantics, and to detail some of the problems encountered in trying to apply these languages. 
While a query language containing the 'union' of the features present in XQL and XML-QL will go a long way toward solving the needs of querying XML, we also propose a number of extensions that we feel are necessary: (1) An XML graph model with defined order between IDREFs and subelements; (2) Regular path expression extensions for subelement, IDREF, or arbitrary edges; (3) Support for 'optional' path expression components and null values; (4) Support for following XPointers; (5) Pruning of query output; (6) Clearer semantics for copying subgraphs to query output." For related references, see "XML and Query Languages." [cache]
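Proposed extension (3), optional path components with null values, behaves much like an outer join: a binding is produced even when the optional step has no match. A minimal sketch of that semantics, with an invented bibliography document, assuming ElementTree:

```python
# Sketch of 'optional' path components with nulls: every book yields a
# binding tuple, and a missing <editor> binds to None rather than
# eliminating the tuple. The data and element names are invented.
import xml.etree.ElementTree as ET

doc = """<bib>
  <book><title>A</title><editor>Smith</editor></book>
  <book><title>B</title></book>
</bib>"""

bindings = []
for book in ET.fromstring(doc).findall("book"):
    title = book.findtext("title")
    editor = book.findtext("editor")   # None plays the role of the null value
    bindings.append((title, editor))

print(bindings)  # [('A', 'Smith'), ('B', None)]
```

Without the null, the second book would silently vanish from the result, which is exactly the behavior the authors argue against.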

  • [August 30, 2000] "XPERANTO: Publishing Object-Relational Data as XML." By Michael Carey, Daniela Florescu, Zachary Ives, Ying Lu, Jayavel Shanmugasundaram, Eugene Shekita, and Subbu Subramanian. Third International Workshop on the Web and Databases (WebDB), May 2000, Dallas, TX. "Since its introduction, XML, the Extensible Markup Language, has quickly emerged as the universal format for publishing and exchanging data in the World Wide Web. As a result, data sources, including object-relational databases, are now faced with a new class of users: clients and customers who would like to deal directly with XML data rather than being forced to deal with the data source's particular (e.g., object-relational) schema and query language. The goal of the XPERANTO project at the IBM Almaden Research Center is to serve as a middleware layer that supports the publishing of XML data to this class of users. XPERANTO provides a uniform, XML-based query interface over an object-relational database that allows users to query and (re)structure the contents of the database as XML data, ignoring the underlying SQL tables and query language. In this paper, we give an overview of the XPERANTO system prototype, explaining how it translates XML-based queries into SQL requests, receives and then structures the tabular query results, and finally returns XML documents to the system's users and applications. [Conclusions:] In this paper, we have described a systematic approach to publishing XML data from existing object-relational databases. As we have explained, our work on XPERANTO is based on a 'pure XML' philosophy -- we are building the system as a middleware layer that makes it possible for XML experts to define XML views of existing databases in XML terms. As a result, XPERANTO makes it possible for its users to create XML documents from object-relational databases without having to deal with their native schemas or SQL query interfaces. 
XPERANTO also provides a means to seamlessly query over object-relational data and meta-data. Our plans for future work include providing support for insertable and updateable XML views. We are also exploring the construction and querying of XML documents having a recursive structure, such as part hierarchies and bill of material documents." See also: "XPERANTO: A Middleware for Publishing Object-Relational Data as XML Documents," by M. Carey, J. Kiernan, J. Shanmugasundaram, E. Shekita, and S. Subramanian. VLDB Conference, September 2000. [cache]
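The middleware pattern the paper describes, push the query to the relational engine, then tag the tabular result as XML for the client, can be sketched in a few lines. This is not XPERANTO's code; the table, query, and tag names are invented, and sqlite3 stands in for the object-relational engine:

```python
# Hedged sketch of the publish-relational-data-as-XML pattern: an XML view
# query is answered by one SQL request, and the flat rows are structured
# into an XML document for the client. All names here are illustrative.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Sales"), (2, "R&D")])

# The XML view "for each dept return <dept><name>...</name></dept>" is
# translated into a single SQL request...
rows = conn.execute("SELECT id, name FROM dept ORDER BY id").fetchall()

# ...and the tabular result is tagged into a hierarchical XML document.
root = ET.Element("depts")
for dept_id, name in rows:
    dept = ET.SubElement(root, "dept", id=str(dept_id))
    ET.SubElement(dept, "name").text = name

xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
```

The point of the middleware layer is that the client never sees the `dept` table or SQL at all, only the XML view.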

  • [August 30, 2000] "Efficiently Publishing Relational Data as XML Documents." By J. Shanmugasundaram, E. Shekita, R. Barr, M. Carey, B. Lindsay, H. Pirahesh, and B. Reinwald. VLDB Conference, September 2000. "XML is rapidly emerging as a standard for exchanging business data on the World Wide Web. For the foreseeable future, however, most business data will continue to be stored in relational database systems. Consequently, if XML is to fulfill its potential, some mechanism is needed to publish relational data as XML documents. Towards that goal, one of the major challenges is finding a way to efficiently structure and tag data from one or more tables as a hierarchical XML document. Different alternatives are possible depending on when this processing takes place and how much of it is done inside the relational engine. In this paper, we characterize and study the performance of these alternatives. Among other things, we explore the use of new scalar and aggregate functions in SQL for constructing complex XML documents directly in the relational engine. We also explore different execution plans for generating the content of an XML document. The results of an experimental study show that constructing XML documents inside the relational engine can have a significant performance benefit. Our results also show the superiority of having the relational engine use what we call an 'outer union plan' to generate the content of an XML document. [...] To summarize, our performance comparison of the alternatives for publishing XML documents points to the following conclusions: (1) Constructing an XML document inside the relational engine is far more efficient than doing so outside the engine, mainly because of the high cost of binding out tuples to host variables. (2) When processing can be done in main memory, a stable approach that is always among the very best (both inside and outside the engine), is the Unsorted Outer Union approach. 
(3) When processing cannot be done in main memory, the Sorted Outer Union approach is the approach of choice (both inside and outside the engine). This is because the relational sort operator scales well." [cache]
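The "sorted outer union" idea can be pictured without a database at all: parent and child rows are unioned into one stream of a single row shape (unused columns padded with NULLs), sorted on the parent key so each parent clusters with its children, and then tagged into nested XML in one pass. A toy sketch with invented customer/order data:

```python
# Illustrative sketch of a sorted outer union plan: one unioned, NULL-padded
# row stream, sorted so each customer row precedes its orders, then tagged
# in a single pass. Table shapes and data are invented.
customers = [(1, "Ann"), (2, "Bob")]        # (cust_id, name)
orders = [(1, 101), (1, 102), (2, 103)]      # (cust_id, order_id)

# Outer union: a common row shape; the row-type column (0 = customer,
# 1 = order) makes the parent sort before its children.
stream = [(cid, 0, name, None) for cid, name in customers] + \
         [(cid, 1, None, oid) for cid, oid in orders]
stream.sort()   # the relational sort operator's job

parts = []
for cid, row_type, name, oid in stream:
    if row_type == 0:                  # customer row opens a new element
        if parts:
            parts.append("</customer>")
        parts.append(f'<customer name="{name}">')
    else:                              # order row nests under the open one
        parts.append(f'<order id="{oid}"/>')
parts.append("</customer>")

result = "".join(parts)
print(result)
```

Because tagging only ever looks at the current row, the approach scales even when the document does not fit in memory, which is why the paper prefers it once main memory is exhausted.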

  • [August 30, 2000] "Relational Databases for Querying XML Documents: Limitations and Opportunities." By J. Shanmugasundaram, K. Tufte, G. He, C. Zhang, D. DeWitt, and J. Naughton. VLDB Conference, September 1999. See also the slides. "With the growing importance of XML documents as a means to represent data in the World Wide Web, there has been a lot of effort on devising new technologies to process queries over XML documents. Our focus in this paper, however, has been to study the virtues and limitations of the traditional relational model for processing queries over XML documents conforming to a schema. The potential advantages of this approach are many -- reusing a mature technology, using an existing high performance system, and seamlessly querying over data represented as XML documents or relations. We have shown that it is possible to handle most queries on XML documents using a relational database, barring certain types of complex recursion. Our experience has shown that relational systems could more effectively handle XML query workloads with the following extensions: (1) Support for Sets: Set-valued attributes would be useful in two important ways. First, storing set sub-elements as set-valued attributes would reduce fragmentation. This is likely to be a big win because most of the fragmentation we observed in real DTDs was due to sets. Second, set-valued attributes, along with support for nesting [13], would allow a relational system to perform more of the processing required for generating complex XML results. (2) Untyped/Variable-Typed References: IDREFs are not typed in XML. Therefore, queries that navigate through IDREFs cannot be handled in current relational systems without a proliferation of joins -- one for each possible reference type. 
(3) Information Retrieval Style Indices: More powerful indices, such as Oracle8i's ConText search engine for XML, that can index over the structure of string attributes would be useful in querying over ANY fields in a DTD. Further, under restricted query requirements, whole fragments of a document can be stored as an indexed text field, thus reducing fragmentation. (4) Flexible Comparison Operators: A DTD schema treats every value as a string. This often creates the need to compare a string attribute with, say, an integer value, after typecasting the string to an integer. The traditional relational model cannot support such comparisons. The problem persists even in the presence of DCDs or XML Schemas because different DTDs may represent 'comparable' values as different types. A related issue is that of flexible indices. Techniques for building such indices have been proposed in the context of semi-structured databases. (5) Multiple-Query Optimization/Execution: As outlined in Section 4, complex path expressions are handled in a relational database by converting them into many simple path expressions, each corresponding to a separate SQL query. Since these SQL queries are derived from a single regular path expression, they are likely to share many relational scans, selections and joins. Rather than treating them all as separate queries, it may be more efficient to optimize and execute them as a group. (6) More Powerful Recursion: As mentioned in Section 4, in order to fully support all recursive path expressions, support for fixed point expressions defined in terms of other fixed point expressions (i.e., nested fixed point expressions) is required. These extensions are not by themselves new and have been proposed in other contexts. However, they gain new importance in light of our evaluation of the requirements for processing XML documents. 
Another important issue to be considered in the context of the World Wide Web is distributed query processing -- taking advantage of queryable XML sources. Further research on these techniques in the context of processing XML documents will, we believe, facilitate the use of sophisticated relational data management techniques in handling the novel requirements of emerging XML-based applications." For related references, see "XML and Query Languages." [cache]
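The paper's premise, storing XML in relations so that path queries become scans and joins, rests on "shredding" a document into tables. A rough sketch of that mapping with invented table shapes, assuming ElementTree; the repeated `author` sub-element becomes a separate table with a foreign key, which is precisely the fragmentation the authors would avoid with set-valued attributes:

```python
# Hypothetical sketch of shredding an XML document into relations: each
# element type becomes rows in a table, linked by generated keys. The
# document, element names, and table layout are all illustrative.
import itertools
import xml.etree.ElementTree as ET

doc = "<bib><book><title>A</title><author>X</author><author>Y</author></book></bib>"

ids = itertools.count(1)
book_table, author_table = [], []

for book in ET.fromstring(doc).findall("book"):
    book_id = next(ids)
    book_table.append((book_id, book.findtext("title")))
    # Set sub-elements fragment into a child table keyed on the parent;
    # a path query like book/author then becomes a join on book_id.
    for author in book.findall("author"):
        author_table.append((book_id, author.text))

print(book_table)    # [(1, 'A')]
print(author_table)  # [(1, 'X'), (1, 'Y')]
```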

  • [August 29, 2000] "Constraints-Preserving Transformation from XML Document Type Definition to Relational Schema." By Dongwon Lee and Wesley W. Chu (Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA; Email: {dongwon,wwc}). UCLA CS-TR 200001 (Technical Report). 22 pages (with 21 references). Also, paper presented at the 19th International Conference on Conceptual Modeling (ER), Salt Lake City, Utah, October, 2000. "As the Extensible Markup Language (XML) is emerging as the data format of the internet era, more needs arise to efficiently store and query XML data. One way toward this goal is to use a relational database by transforming XML data into relational format. In this paper, we argue that existing transformation algorithms are not complete in the sense that they focus only on structural aspects, while ignoring semantic aspects. We show the kinds of semantic knowledge that need to be captured during the transformation in order to ensure a correct relational schema at the end. Further, we show a simple algorithm that can: (1) derive such semantic knowledge from the given XML Document Type Definition (DTD), and (2) preserve the knowledge by representing it in terms of semantic constraints in relational database terms. By combining the existing transformation algorithms and our constraints-preserving algorithm, one can transform an XML DTD to a relational schema where correct semantics and behaviors are guaranteed by the preserved constraints. Our implementation and complete experimental results are available from [XPRESS Home Page]. . . [Conclusion:] Since the schema design in relational databases greatly affects the query processing efficiency, how to transform the XML DTD to its corresponding relational schema is an important problem. 
Further, due to the XML DTD's peculiar characteristics and the incompatibility between the hierarchical XML model and the flat relational model, the transformation process is not a straightforward task. After showing a variety of semantic constraints hidden implicitly or explicitly in DTDs, we presented two algorithms on: 1) how to discover the semantic constraints using one of the existing transformation algorithms, and 2) how to re-write the semantic constraints in relational notation. Then, using a complete example developed throughout the paper, we showed semantic constraints found in both XML and relational terms. The final relational schema transformed by our CPI algorithm not only captures the structure, but also the semantics of the given DTD. A further research direction, using the semantic constraints for query optimization and semantic caching, is also presented." Available in PDF format. [cache]
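The gist of preserving constraints, not just structure, is that cardinality in a DTD content model should surface as relational constraints. A tiny, hypothetical sketch (the mapping rule and all names are invented for illustration, not the paper's CPI algorithm): an optional child (`?`) becomes a nullable column, a required child becomes NOT NULL.

```python
# Invented illustration of the constraints-preserving idea: DTD cardinality
# maps to column constraints, so the relational schema keeps the DTD's
# semantics. E.g. <!ELEMENT book (title, price?)>.
def column_for(child, cardinality):
    # '?' (optional in the DTD) -> nullable; required -> NOT NULL
    null = "" if cardinality == "?" else " NOT NULL"
    return f"{child} TEXT{null}"

content_model = [("title", ""), ("price", "?")]   # from the DTD for book
cols = ", ".join(column_for(c, card) for c, card in content_model)
ddl = f"CREATE TABLE book (id INTEGER PRIMARY KEY, {cols})"
print(ddl)
# CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT NOT NULL, price TEXT)
```

A structure-only transformation would emit both columns as nullable and lose the fact that every book must have a title.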

  • [August 29, 2000] "Comparative Analysis of Six XML Schema Languages." By Dongwon Lee and Wesley W. Chu (Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA; Email: {dongwon,wwc}). UCLA CS-TR 200008 (Technical Report). Also published in ACM SIGMOD Record Volume 29, Number 3 (September, 2000). Abstract: "As XML is emerging as the data format of the internet era, there has been a substantial increase in the amount of data in XML format. To better describe such XML data structures and constraints, several XML schema languages have been proposed. In this paper, we present a comparative analysis of six noteworthy XML schema languages. As of June 2000, about a dozen XML schema languages have been proposed. Among those, in this paper, we choose six schema languages (XML DTD, XML Schema, XDR, SOX, Schematron, DSD) as representatives. Our rationale in choosing the representatives is as follows: (1) they are backed by substantial organizations so that their chances of survival are high (e.g., XML DTD and XML Schema by W3C, XDR by Microsoft, DSD by AT&T), (2) there are publicly known usages or applications (e.g., XML DTD in XML, XDR in BizTalk, SOX in xCBL), (3) the language has a unique approach distinct from XML DTD (e.g., SOX, Schematron, DSD)..." The document is also available in Postscript and HTML formats. For schema description and references, see "XML Schemas." [cache]

  • [August 29, 2000] "DTD-Miner: A Tool for Mining DTD from XML Documents." By Chuang-Hue Moh, Ee-Peng Lim, and Ng Wee Keong. Pages 144-151 in Proceedings of the Second International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems (WECWIS 2000), Milpitas, CA, June 8-9, 2000. "XML documents are semi-structured and the structure of the documents is embedded in the tags. Although XML documents can be accompanied by a document type definition (DTD) that defines the structure of the documents, the presence of a DTD is not mandatory. The difficulty in deriving the DTD for XML documents lies in the fact that DTDs are of a different syntax from XML and that prior knowledge of the structure of the documents is required. In this paper, we introduce DTD-Miner, an automatic structure mining tool for XML documents. Using a Web-based interface, the user is able to submit a set of similarly structured XML documents and the system automatically suggests a DTD. The user is also able to further refine the DTD generated to reduce its complexity by relaxing some of the rules used in the system." Note: The authors have provided an online demo for DTD-Miner. From the Web site ('Automatic Derivation of DTDs for XML Documents'): "The DTD-Miner [Version 1.5] is a prototype system for mining DTDs from XML documents. This system was built at the Centre for Advanced Information Systems, School of Applied Science of the Nanyang Technological University, under the supervision of Asst. Prof. (Dr) Lim Ee Peng. For further details pertaining to this project, please refer to the Project Objective and Project Description pages. Also, do not forget the people that made this project possible." Objective: "Web documents are semistructured and this encumbers the automatic post-processing of the information that they contain. 
Semistructured data, however, do contain some form of non-rigid structure, which is often encapsulated in the documents. XML documents, in particular, are semistructured and the structure of the documents is embedded in the tags. Although XML documents can be accompanied by a DTD that defines the structure of the documents, the presence of a DTD is not mandatory. The difficulty in deriving the DTD for XML documents lies in the fact that DTDs are of a different syntax from XML and that prior knowledge of the structure of the documents is required. The DTD-Miner is an automatic structure mining tool for XML documents. Using a Web-based interface, the user will be able to submit a set of similarly structured XML documents and the system will automatically suggest a structure for the set of documents in the form of a DTD. The system further ensures that the set of documents will be in conformance to the DTD generated. The user is also able to further refine the DTD generated to reduce its complexity by relaxing some of the rules used in the system." For related work, see (1) The OCLC Fred Home Page and (2) "SGML/XML DTD Transduction and Generation."
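The core of DTD inference, collect each element's observed child sequences across a document set, then generalize them into a content model, can be suggested by a deliberately simplified sketch. This is not DTD-Miner's algorithm; the single rule here (promote a repeated child to `+`) stands in for the much more refined heuristics real systems use, and the data is invented:

```python
# Simplified, hypothetical DTD inference: gather child-element sequences
# per element name from sample documents, then emit a naive content model
# in which any child that repeats gets a '+'. Data is invented.
import xml.etree.ElementTree as ET
from collections import defaultdict

docs = ["<book><title>A</title><author>X</author><author>Y</author></book>",
        "<book><title>B</title><author>Z</author></book>"]

observed = defaultdict(list)          # element name -> child-name sequences
for doc in docs:
    for elem in ET.fromstring(doc).iter():
        observed[elem.tag].append([c.tag for c in elem])

def content_model(sequences):
    names = []                        # children in first-seen document order
    for seq in sequences:
        for name in seq:
            if name not in names:
                names.append(name)
    if not names:
        return "(#PCDATA)"
    parts = [n + ("+" if any(s.count(n) > 1 for s in sequences) else "")
             for n in names]
    return "(" + ", ".join(parts) + ")"

model = content_model(observed["book"])
print(f"<!ELEMENT book {model}>")
# <!ELEMENT book (title, author+)>
```

The "relaxation" the authors mention corresponds to loosening such a model further, e.g. widening `author+` when a tighter rule makes the DTD too complex.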

  • [August 29, 2000] "Re-engineering Structures from Web Documents." By Chuang-Hue Moh, Ee-Peng Lim, and Wee-Keong Ng (Center for Advanced Information Systems, School of Applied Science, Nanyang Technological University, Nanyang Avenue, Singapore 639798, SINGAPORE). Pages 67-76 in Proceedings of the Fifth ACM Conference on Digital Libraries. ACM Digital Libraries 2000, June 2-7, 2000, San Antonio, Texas. "To realise a wide range of applications (including digital libraries) on the Web, a more structured way of accessing the Web is required, and such a requirement can be facilitated by the use of the XML standard. In this paper, we propose a general framework for reverse engineering (or re-engineering) the underlying structures, i.e., the DTD, from a collection of similarly structured XML documents when they share some common but unknown DTDs. The essential data structures and algorithms for the DTD generation have been developed and experiments on real Web collections have been conducted to demonstrate their feasibility. In addition, we also propose a method of imposing a constraint on the repetitiveness of the elements in a DTD rule to further simplify the generated DTD without compromising its correctness. . . The key objective of this project is to re-engineer the underlying structures of a given set of Web documents. We propose a general framework for Structure Re-engineering from Web documents and produce a DTD for each subset of similarly structured Web documents as a final result. In the project, we do not attempt to solve all the problems pertaining to structure re-engineering. Instead, we propose a general framework for structure re-engineering of Web documents. We introduce a structural representation for a set of Web documents, in particular XML documents or semantically tagged HTML documents that share a common structure but do not come with a DTD. We then develop the algorithms for discovering the DTD from the structural representation. 
We have also conducted experiments on real-life examples of Web documents to demonstrate the discovery algorithms. In this research, we focus on the textual and tag information within the Web documents. Other objects embedded in the documents such as multimedia data, hyperlinks, entity references and element attributes have not been considered so far but extensions of our algorithms to cater for such objects can be made in the future research. . . The automatic creation of DTD in the OCLC's GB-Engine uses an approach that is fairly similar to ours. In the GB-Engine, an internal tree representation is built and converted into a grammar. The grammatical rules are then combined, generalized and reduced to produce a corresponding DTD. We see that the generation of an internal tree representation is similar to the Document Tree data structure that we propose. In their work, reduction rules like 'identical bases', 'off by one' and 'redundant' were used to reduce the complexity of the DTDs generated. Nevertheless, the complexity of generated DTDs cannot be easily controlled by the users. In our proposal, we employ the Longest Common Subsequence (LCS) concept and also a user defined parameter maximum repetition factor to provide a more general and flexible method to reduce the complexity of the DTD generated. In the Lore project, the OEM was proposed to model the structures of semistructured data. The OEM model addresses the need of a more flexible data model for semistructured data like Web documents, as compared to conventional data models like object-oriented models. The main 'drawback' of the OEM model is the missing ordering information about the elements in the schematic description of OEM model, also known as the DataGuides. XML, on the other hand, does require the elements to conform to the ordering defined in the DTD. [Conclusions:] In this paper, the concept of re-engineering structures from Web documents has been introduced. 
Based on a structure re-engineering framework, we have developed algorithms to construct a Spanning Graph that describes the structures of a set of similarly structured XML documents. We further proposed to generate the DTD for these XML documents using a set of heuristic rules. For demonstration purposes, we have implemented our proposed technique in a prototype system known as DTDMiner. The Web interface for the system is available online. The system allows the user to supply some XML files and generates a DTD for them. It also supports relaxation of the generated DTDs. As part of our future research, we plan to extend the re-engineering techniques in the following directions: (1) Discovering attributes and attribute types: The way that we have handled attributes so far is to simply assume that all the attributes are mandatory and of type CDATA. Attributes, however, can be of various data types and may not always be required in the XML standard. As a result, we need to explore more sophisticated ways of handling attributes to produce more accurate DTDs. Note that attributes can prove to be important to the structures of XML documents, e.g., the XLink standard utilizes attributes to define the hyperlinks between XML documents. (2) Discovering inter-document structures: The framework we have proposed is primarily used to discover the structures within Web documents, i.e., intra-document structures. We see that such structures are not the only category of structures that can exist in Web documents. The hyperlinks that exist in almost all Web documents present an inter-document structure (e.g., Web-site structure). Used in conjunction with the DTD discovered, the inter-document structures can provide a useful road-map to user query formulation."
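The Longest Common Subsequence step the authors contrast with OCLC's reduction rules is the classic dynamic program; when two observed child sequences are merged, their LCS gives the shared backbone around which the differing children are generalized. A minimal, self-contained sketch with invented element sequences (the surrounding merge heuristics and the maximum-repetition-factor parameter are omitted):

```python
# Minimal LCS sketch, the primitive used when merging observed child
# sequences into one generalized DTD rule. Sequences are invented.
def lcs(a, b):
    # classic dynamic program over prefixes of a and b
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            table[i][j] = (table[i-1][j-1] + 1 if x == y
                           else max(table[i-1][j], table[i][j-1]))
    # recover one LCS by walking back through the table
    out, i, j = [], len(a), len(b)
    while i and j:
        if a[i-1] == b[j-1]:
            out.append(a[i-1]); i -= 1; j -= 1
        elif table[i-1][j] >= table[i][j-1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

seq1 = ["title", "author", "author", "year"]
seq2 = ["title", "author", "publisher", "year"]
print(lcs(seq1, seq2))  # ['title', 'author', 'year']
```

Children outside the common backbone (the second `author`, `publisher`) are the candidates that generalization rules then mark repeated or optional.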

  • [August 29, 2000] "Practical XML with Linux, Part 2: A survey of tools. The standard remains uncorrupted by the influx of heavy hitters." [XML on Linux.] By Uche Ogbuji. In Linux World (August 23, 2000). ['The open standard has caught fire in the last year and the community suddenly has a plethora of tools to play with. XML's popularity as a document-exchange format has soared recently. Uche Ogbuji surveys the vast menagerie of sometimes remarkably polished tools available for creating and serving XML documents.'] "This article briefly introduces some XML tools for Linux in a few basic categories: parsers, Web servers, application servers, GUIs, and bare-bones tools. Most users' introduction to XML will be geared toward better Web page management. They may then choose to migrate to complete, all-inclusive application servers or to construct custom systems from the various XML toolkits available and the usual Unix duct tape and wire ties. There is usually some content to manage, and you may see no reason to leave the world of Emacs or vi to churn out documents. However, content managers are often non-technical, so it's helpful that there is a good selection of GUI XML editors... As I mentioned in my last article, the W3C and other standards organizations are working very quickly to complete specifications for technologies complementary to XML. I mentioned namespaces, which are a key facility for managing global names and are now used in almost every XML technology. I also mentioned DOM and XSLT. Since then, XLink, XPointer, XML Schemas, SVG, and other specs have neared completion. I will discuss these later in the series, as well as RDF, Schematron, and other beasts in the XML menagerie. . ."

  • [August 29, 2000] "Seybold shows new face of DTP." By Andreas Pfeiffer [Pfeiffer Report on Emerging Trends and Technologies.] In (August 28, 2000). "Like earlier editions of the publishing conference and trade show before it, Seybold San Francisco 2000, which opens its doors at Moscone Center this week, offers a snapshot of the eccentric habits of digital publishing in the Internet Era. . . XML at the vanguard: If on the other hand, you are looking for an industry that is quietly preparing for its next revolution, you are in the right spot; you just may have to revise your expectations a little. Watch out for XML, for instance, which is quickly moving from the status of a rather hermetic data-encoding standard for high-end applications to the much-coveted spot of "the next big thing" as a flurry of companies both big and small are moving to adapt their wares for the brave new world of cross-media publishing. XML has been the backbone of high-end systems such as Arbortext's EPIC system or Artesia's TEAMS digital-asset-management systems for a long time, but now the market is getting ready for a more horizontal approach. Quark already has avenue.quark, an XTension for XPress 4, which gives XPress a certain extent of XML awareness, but Adobe shouldn't be far behind when it comes to announcing some form of XML strategy as well. In other words, XML is on the way..."

  • [August 29, 2000] "Wireless apps streamlined." By Roberta Holland. In eWEEK (August 27, 2000). "New wireless server solutions are being rolled out that promise to make it easy for corporations to build and launch applications for smart phones and other mobile devices. BEA Systems Inc. and Nokia Mobile Phones Inc. have teamed to develop the BEA WebLogic M-Commerce Solution. The package, which combines BEA's WebLogic Server and WebLogic Commerce Server with Nokia's WAP (Wireless Application Protocol) Server, allows developers to build scalable e-commerce services on a single platform, regardless of whether a user is accessing the information via a PC or a personal digital assistant. . . Pricing for the server starts at $6,500 for a starter development kit. A 50-user deployment kit, which includes a four-CPU clustering license for WebLogic Commerce Server, costs $268,600. Another wireless server, produced by Bluestone Software Inc., of Philadelphia, and New York-based Zyglobe Inc., is slated for release next month. The Zy-MobileServer incorporates Bluestone's J2EE-compliant and XML (Extensible Markup Language)-based platform with Zyglobe's wireless expertise built on top. The server will be able to automatically detect the protocol and device making a request and generate appropriate content back, whether it is intended for Wireless Markup Language or Short Message Service."

  • [August 29, 2000] "Microsoft ships test version of Office update." By Erich Luening. From CNET (August 29, 2000). "Microsoft today will begin shipping the first test version of the latest release of its Office desktop productivity suite. Code-named Office 10, the update includes new speech-recognition technology, additional support for Extensible Markup Language (XML), a Web-based collaboration application to let workers share documents and other data, and content-management tools. Gurry said the speech-recognition technology marks the first time the company has included such a feature in its products. Office 10 furthers the application's support for XML -- a system for marking up documents with industry- or task-specific tags -- within the business applications Excel and Access. In what the company has dubbed a 'team workplace' application, people will be able to add and edit content, such as announcements and documents, on a common Web page using a Web browser without needing to know HTML, Gurry said. The workplace feature meets the needs expressed by many Office users who 'asked for Office to do more things with the Web,' Gurry said. The newest version of Office also includes Smart Tags, which pop up on screen while people access or enter data. The tags will give people choices to automatically correct, number or format the data they are working on..." See the announcement.

  • [August 29, 2000] "Macromedia sparks overhaul with Flash update." By Paul Festa. From CNET (August 29, 2000). "Macromedia began a redesign of its entire product line today with the latest version of its Flash animation authoring software. Flash authors are the first to get Macromedia's new user interface, which the company is in the process of revamping with both new users and professionals used to traditional authoring software in mind. . . Apart from the user interface, Flash 5 comes with a variety of other new features. These include a new scripting language, ActionScript, which is modeled on JavaScript, the ubiquitous scripting language developed by Netscape Communications. Flash 5 also includes support for Extensible Markup Language (XML), a standard for tagging documents with industry- or task-specific markup tags. XML support will mean that Flash documents can exchange XML-formatted data with back-end database and server applications."

  • [August 28, 2000] "A Logical Interpretation of RDF." By Wolfram Conen and Reinhold Klapsing. Discussion Paper -- Version: 1.0, 26-August-2000. ['A document written to capture the precise meaning of RDF: we tried to express the concepts and constraints of the RDF model in first-order logic (FOL). The discussion paper contains the facts and rules we found. A SiLRI-conform datalog formulation of the rules and facts is also contained. Our approach tries to separate clearly the level of representation (triples) from the level of knowledge (instanceOf, Resource, statements, predicates to express constraint violations, etc.), avoids asserting negated facts, and discusses a way to utilize the rules in applications. A summary of motivation and key aspects is included on the title page of the paper. We invite interested readers to have a look at it and to send us their comments.'] Abstract: "The Resource Description Framework (RDF) is intended to be used to capture and express the conceptual structure of information offered in the Web. Interoperability is considered to be an important enabler of future web applications. While XML supports syntactic interoperability, RDF is aimed at semantic interoperability. Interoperability is only given if different users/agents interpret an RDF data model in the same way. Important aspects of the RDF model are, however, expressed in prose, which may lead to misunderstandings. To avoid this, capturing the intended semantics of RDF in first-order logic might be a valuable contribution and may provide RDF with a formalization allowing its full exploitation as a key ingredient of the evolving semantic Web. This paper tries to express the concepts and constraints of the RDF model in first-order logic. [Note]: This is a step towards a logic-based formalization of RDF. It grew out of the necessity to precisely capture the semantics of RDF schemata while developing an RDF-based modeling framework for web applications (XWMF).
The rules and facts described in the following allow one to validate RDF triple sets. This in itself is not very exciting (VRP could be used instead). It becomes relevant, however, if schema constructs are required that restrict, modify, or extend the initial semantics, as, for example, monotone inheritance or typed containers. In such cases, rules that allow a precise interpretation of introduced schema constructs can be used to extend the basic rule set provided here. If the rules are themselves given as XML/RDF, this can be a natural extension of the preciseness of the semantic expressibility of RDF schema definitions, i.e., each RDF schema definition (syntax) may be accompanied by a document defining its semantics 'logically'. Here, this starts to become quite interesting. It would, however, require extending parsers with a logic inference engine (e.g., as has been done with SiLRI, which is a nice tool that is -- in the version that we have available -- unfortunately not free from bugs). Some more remarks on our philosophy: we tried to retain as much of the RDF 'spirit' as possible. We avoided, for example, introducing new meta constructs. We designed the rules in a way such that violations of constraints are explicitly detected (this avoids leaving the knowledge base in an inconsistent state, as would happen if negated facts were asserted). In this way, the violation predicates can be queried and appropriate actions can be determined by the interpreting application. We did not 'resolve any ambiguities', simply because we haven't found ambiguities in the strict sense. Instead, we tried to point out where problems in applying the rules may occur (e.g., subPropertyOf).
The rules make use of negations -- but on the knowledge level (that is: we interpret the triples by making certain relationships explicit, such as the instanceOf predicate), everything is stratified (contrary to the claim that, due to the triple nature of RDF statements, no reasonable stratification can be found -- this is true only on the representation level). This is nice, because a natural model-theoretic interpretation of the rules exists. The rules and facts can easily be fed into a datalog interpreter (e.g., SiLRI). Inference engines based on SLD resolution may have some nasty problems with the subPropertyOf rules (not a problem of the rules but of top-down/depth-first inference mechanisms)..." Also in PostScript and HTML format. See "Resource Description Framework (RDF)."
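The flavor of the paper's rule-based interpretation can be sketched outside datalog as well. Below is a minimal forward-chaining loop in Python that closes rdf:type over rdfs:subClassOf; the predicate and resource names are illustrative, not taken from the paper's rule set:

```python
# Minimal forward-chaining sketch: derive instanceOf facts from RDF-style
# triples, in the spirit of the paper's datalog rules (names illustrative).

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def derive_instance_of(triples):
    """If (s type C) and (C subClassOf D) hold, then (s type D) holds."""
    instance_of = {(s, o) for (s, p, o) in triples if p == RDF_TYPE}
    subclass = {(s, o) for (s, p, o) in triples if p == SUBCLASS}
    changed = True
    while changed:  # iterate to a fixpoint, as a bottom-up datalog engine would
        changed = False
        for (x, c) in list(instance_of):
            for (c2, d) in subclass:
                if c == c2 and (x, d) not in instance_of:
                    instance_of.add((x, d))
                    changed = True
    return instance_of

triples = [
    ("ex:fido", RDF_TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS, "ex:Animal"),
    ("ex:Animal", SUBCLASS, "rdfs:Resource"),
]
facts = derive_instance_of(triples)
```

A real datalog interpreter such as SiLRI evaluates equivalent rules declaratively; the explicit fixpoint loop here only makes the bottom-up evaluation visible.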

  • [August 28, 2000] "The Relationship Between General and Specific DTDs: Criticizing TEI Critical Editions." By David J. Birnbaum (Department of Slavic Languages and Literatures, 1417 Cathedral of Learning, University of Pittsburgh, Pittsburgh, PA 15260 USA). Paper presented at the Extreme Markup Languages 2000 conference (August 13 - 18, 2000, Montréal, Canada). Published as pages 9-27 (with 13 references) in Conference Proceedings: Extreme Markup Languages 2000. 'The Expanding XML/SGML Universe', edited by Steven R. Newcomb, B. Tommie Usdin, Deborah A. Lapeyre, and C. M. Sperberg-McQueen. "The present study discusses the advantages and disadvantages of general vs. specific DTDs at different stages in the life of an SGML document, based on the example of support for textual critical editions in the TEI. These issues are related to the question of when to use elements, attributes, or data content to represent information in SGML and XML documents, and the article identifies several ways in which these decisions control both the degree of structural control and validation during authoring and the generality of the DTDs. It then offers three strategies for reconciling the need for general DTDs for some purposes and specific DTDs for others. All three strategies require no non-SGML structural validation and ultimately produce fully TEI-conformant output. The issues under consideration are relevant not only for the preparation of textual critical editions, but also for other element-vs-attribute decisions and general design issues pertaining to broad and flexible DTDs, such as those employed by the TEI. [...] General Conclusions: Any of the three strategies discussed (processing a modified TEI DTD with respect to TEIform attribute values, transformation of a custom DTD to a TEI structure, and architectural forms) provides a solution to the issues posed by a score-like edition.
Specifically, these strategies all permit much greater structural control than is available in the standard TEI DTDs, rely entirely on SGML for all validation, and produce a final document that is fully TEI-conformant." See also "Text Encoding Initiative (TEI)."

  • [August 28, 2000] "A TEI-Compatible Edition of the Rus' Primary Chronicle." By David J. Birnbaum (Department of Slavic Languages and Literatures, 1417 Cathedral of Learning, University of Pittsburgh, Pittsburgh, PA 15260 USA). To be published in Medieval Slavic Manuscripts and SGML: Problems and Perspectives, edited by Anisava Miltenova and David J. Birnbaum. Sofia: Institute of Literature, Bulgarian Academy of Sciences, Marin Drinov Publishing House, 1999. In press [2000-08-26]. "This report describes the development of a TEI-conformant SGML edition of the Rus' Primary Chronicle (Povest' vremennykh let) on the basis of an electronic transcription of the text that originally had been prepared for paper publication using troff. The present report also discusses strategies for browsing, indexing and querying the resulting SGML edition. Selected electronic files developed for this project are available at a web site maintained by the author. . . The Rus' Primary Chronicle (PVL) tells the history of Rus' from the creation of the world through the beginning of the twelfth century. It was based on both Byzantine chronicles and local sources and underwent a series of redactions before emerging in the early twelfth century in the form that scholars currently identify as the PVL. This text was then adopted as the foundation of later East Slavic chronicle compilations. [. . .] I decided to use the Text Encoding Initiative (TEI) document type description (DTD) for the SGML edition of the PVL for two reasons. First, the TEI DTD is widely used, which means that a TEI-conformant edition of the PVL can be processed using existing tools and can easily be incorporated into existing TEI-oriented digital libraries.
Second, the support for critical editions in the TEI DTD was developed with input from an international committee of experienced philologists from different disciplines, and it was clearly sensible to take advantage of their careful analysis of issues confronting the preparation of critical editions, particularly in an electronic format. In fact, the TEI DTD supports three different encoding strategies for critical editions (the location-referenced method, the double-end-point-attached method, and the parallel segmentation method), and my decision to adopt a TEI approach required me to evaluate and choose among those strategies. . . [Conclusions:] In general, any electronic edition will provide faster searching and retrieval than a paper edition. If one wishes to take the structure of a document into consideration, an SGML document will support more sophisticated structural queries than plain text or text with procedural markup (such as troff). The present report has documented the generation of a TEI-conformant SGML edition of the PVL from troff source using free tools. It has also illustrated the convenience of browsing and searching the text in Panorama, which includes support for queries that refer to the SGML element structure. This report has also described the use of Pat in a web-based environment to retrieve and render only selected portions of the document. Although Pat does not support regular expressions directly, this report has outlined a method for overcoming this limitation." See also the PVL reference document. On TEI: "Text Encoding Initiative (TEI)." [cache]
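Of the three TEI critical-apparatus strategies named above, parallel segmentation is the most compact: variant readings are encoded inline at the point of divergence. A hypothetical apparatus entry (the readings and witness sigla are invented; the attribute syntax follows later TEI practice) might look like:

```xml
<p>The prince
  <app>
    <lem wit="#A">rode</lem>
    <rdg wit="#B">went</rdg>
  </app>
to the city.</p>
```

Here the lemma (base-text reading) and each witness's variant sit side by side, so no external coordinate system is needed to align them.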

  • [August 28, 2000] "XML To Be, VRML To See." By B. Arun and A. D. Ganguly. In ACM Crossroads Issue 6.2 [Markup Languages] (Winter 1999). Guest edited by Jason Bedunah. "We are trying to combine XML and VRML to allow visualization of chemical simulations over the web. The simulation data is marked up in a new XML application which is then converted to VRML for display on a Web browser using a plugin. MoDL (pronounced as Model) allows very simple constructs like atom, molecule, radical, bond, etc. Using these building blocks, chemists can build very complex macromolecules easily. There is a DEFINE construct which allows users to define particular types of atoms, molecules, bonds or radicals and then use them later to instantiate objects of that type. For example, oxygen and hydrogen atoms can be defined like this [...] Typically, bonds form between atoms when the distance between them is less than some particular value. We have used this concept so that authors don't have to explicitly write every bond they want in the molecular structure. The maximum inter-atomic distance between pairs of atoms can be recorded in the head of the document in a bond-table, and bonds will be put automatically between atoms where the distance is less than the specified value. In addition to this, authors can still explicitly put bonds between atoms where they want to. Chemists typically want to see plots and graphs of various variables when the simulation is running. To facilitate this, MoDL has support for 2D and 3D plots. Three types of plot styles are supported - points, histogram and line plots. For every point on the curve, one can provide the time (as a fraction of the period) when it will be shown. In this way we can have animation in the plots as well. Another useful thing is to be able to view the simulation from any point in the 3D space. Users viewing the simulation can navigate around in the 3D scene using the navigation controls present in VRML browsers. 
In addition, users can specify any of the atoms or molecules to be viewpoints in the MoDL file, jump to its position, and view the simulation. Users can also sit on one of the dynamic atoms and go for a ride, seeing the simulation as that particular atom would see it. [...] The conversion from XML to VRML is done by a Perl script that uses the XML::Parser module. XML::Parser is the Perl interface to expat, a C library written by James Clark. Expat is an event-based, non-validating XML parser. Ideally, this conversion should be done using style sheets, but the current state of XSL technology does not provide appropriate tools for this purpose. The combination of XML and VRML can prove to be a very effective way of conveying chemical information over the Internet. Visualization of chemical simulations will get a boost once streaming is introduced in VRML. This will allow us to visualize a simulation when it is actually running in real time. Other disciplines like Architecture and Planning can also benefit from the use of these technologies."
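The automatic bond generation described above (a bond wherever two atoms are closer than the bond-table threshold for their types) is straightforward to sketch. The data layout and threshold below are illustrative assumptions, not MoDL's actual syntax:

```python
import math

# Sketch of bond-table-driven bond generation: add a bond between any pair
# of atoms whose distance is below the threshold for their type pair.
# The tuple layout and the 1.1-angstrom threshold are assumptions.

def auto_bonds(atoms, bond_table):
    """atoms: list of (atom_type, (x, y, z)); bond_table: {(typeA, typeB): max_dist}."""
    bonds = []
    for i, (t1, p1) in enumerate(atoms):
        for j in range(i + 1, len(atoms)):
            t2, p2 = atoms[j]
            # look the pair up in either order
            max_dist = bond_table.get((t1, t2)) or bond_table.get((t2, t1))
            if max_dist is None:
                continue  # no entry: never bond this type pair automatically
            if math.dist(p1, p2) < max_dist:
                bonds.append((i, j))
    return bonds

water = [("O", (0.0, 0.0, 0.0)), ("H", (0.96, 0.0, 0.0)), ("H", (-0.24, 0.93, 0.0))]
bonds = auto_bonds(water, {("O", "H"): 1.1})  # two O-H bonds, no H-H bond
```

Authors can still add explicit bonds on top of the generated ones, as the article notes.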

  • [August 28, 2000] "Molecular Dynamics Visualization with XML and VRML." By B. Arun, V. Chandru, A. D. Ganguly, and Swami Manohar (Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India). Pages 335-341 in Proceedings of the Computer Graphics International (CGI 2000) [Computer Graphics International 2000, Geneva, Switzerland, June 19-24, 2000]. Los Alamitos, CA: IEEE Computer Society, 2000. "A new Extensible Markup Language (XML) application, Molecular Dynamics Language (MoDL) has been developed. MoDL provides a simple, but powerful tool for molecular dynamics visualization and has been developed by combining, for the first time, the strengths of XML and the Virtual Reality Modeling Language. The details of MoDL, its implementation and examples of its use are presented in this paper. . . We have combined XML and VRML for visualizing chemical simulations over the web. The simulation data is marked up in a new XML application which is then converted to VRML for display on a Web browser using a plugin. We present the details of the language, the implementation and the results of this work in this paper. [...] Although MoDL has been designed to be simple and flexible, creating large documents by hand can prove to be a cumbersome task. For chemical systems having a few tens of atoms, it is possible to edit the MoDL representation using any text editor. But when the systems contain a few hundred to thousands of atoms and molecules along with dynamics, it is best to automate the process of generation of MoDL documents. We have created a tool, MoDLEd, the MoDL authoring-cum-editing tool. MoDLEd allows users to specify the format of their data and generates a Perl script that converts data from that format to MoDL. The only constraints on the format are that each line has information for one atom/molecule and various values in a line (like x,y,z) are separated by space(s).
Meta information about the types of atoms and bonds in a molecule can be specified in a separate file. Minimal editing facilities are also made available using the XML::DOM module. MoDLEd has been written in Perl with the GUI built using Tk. [Conclusion:] The combination of XML and VRML can prove to be a very effective way of conveying chemical information over the Internet. Visualization of chemical simulations will get a boost once streaming is introduced in VRML. This will allow us to visualize a simulation when it is actually running in real time. Other disciplines like Architecture and Planning can also benefit from the use of these technologies. MoDL and the accompanying visualization can also be used as an instructional tool. While explaining a physical process, instructors can prepare a MoDL file from a simulation of that process and show the visualization to the students. This will allow the students to easily grasp what is going on. Our current work is focused on improving the functionality of MoDL as well as to address visualization problems in related areas. For instance, recently we have started work on visualizing shear in fluid suspensions as well as problems in crystallography."
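The event-based style of expat that the conversion scripts rely on registers handlers which fire per element, so the whole document never has to be held in memory. A minimal sketch with Python's stdlib expat binding (the <atom> markup is a made-up stand-in for MoDL):

```python
import xml.parsers.expat

# Event-based (streaming) parsing in the XML::Parser/expat style:
# a handler fires for each start tag, accumulating just what we need.

doc = b'<molecule><atom type="O"/><atom type="H"/><atom type="H"/></molecule>'

atom_types = []

def start_element(name, attrs):
    # called once per opening tag; attrs is a dict of the tag's attributes
    if name == "atom":
        atom_types.append(attrs["type"])

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.Parse(doc, True)  # True marks this as the final chunk of input
```

The Perl XML::Parser module used by the authors exposes the same handler model over the same C library.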

  • [August 28, 2000] "Getting Started with XSLT and XPath." By G. Ken Holman. From (August 23, 2000). ['In the second part of his comprehensive introduction to XSLT and XPath, G. Ken Holman examines practical stylesheets and explains the various approaches to writing XSLT. Using example stylesheets, Ken covers the basic syntax of XSLT and the common patterns of stylesheet design.'] "Examining working stylesheets can help us understand how we use XSLT and XPath to perform transformations. This article first dissects some example stylesheets before introducing basic terminology and design principles. Let's first look at some example stylesheets using two implementations of XSLT 1.0 and XPath 1.0: the XT processor from James Clark, and the third web release of MS Internet Explorer 5's MSXML Technology Preview. These two processors were chosen merely as examples of, respectively, standalone and browser-based XSLT/XPath implementations, without prejudice to other conforming implementations. The code samples only use syntax conforming to XSLT 1.0 and XPath 1.0 Recommendations and will work with any conformant XSLT processor..." For related resources, see "XSL/XSLT: Articles and Papers."
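The kind of stylesheet the article dissects can be as small as a single template rule. A minimal, hypothetical XSLT 1.0 example (the catalog/item/title element names are invented, not from the article):

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- for each item element, emit its title wrapped in a paragraph -->
  <xsl:template match="item">
    <p><xsl:value-of select="title"/></p>
  </xsl:template>
</xsl:stylesheet>
```

The match attribute is an XPath pattern selecting source nodes; the template body is the result-tree fragment produced for each match. This conforms to XSLT 1.0 and should behave identically in XT, MSXML, or any other conformant processor.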

  • [August 28, 2000] "XML Q&A: Choosing an XML Parser." By John E. Simpson. From (August 22, 2000). "... it's not really a question to concern yourself with if you're just interested in browsing XML, editing it, or creating style sheets. The developer of the browsing application will almost certainly have made the decision for you, and you can probably override it only with difficulty (and perhaps intellectual pain). For instance, Microsoft's Internet Explorer 5.x browsers use a parser built into the MSXML.DLL file, and the Mozilla browser is built on a parser, written in C, called 'expat.' Nevertheless, in some cases you do need to select a parser. . . One final thing to bear in mind when you embark on a search for the 'best parser,' whatever that means for you: You'll need to limit your search very quickly or go crazy. Back in 1998, within a few months of the XML 1.0 Recommendation's release, one observer reported on XML-DEV that he'd found over 200 parsers (after hitting 200, he gave up counting)..." Note: The W3C work on XML schema and related efforts introduce varying kinds and degrees of "validation" that extend 'parsing' beyond notions of mere "well-formedness" and "DTD-validity" as defined in XML 1.0. Additionally: (1) for references to interactive online validation tools for XML, see the collection of URLs; (2) for a categorized list of "parsers and engines", see the reference collection from Lars Marius Garshol; this 'parser' list covers Architectural forms engines, DOM implementations, DSSSL engines, RDF parsers, SGML/XML parsers, XLink/XPointer engines, XML middleware, XML parsers, XML validators (software for validating XML documents by other means than DTDs), XSL engines, XSLT engines, etc.
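Well-formedness is the baseline every parser checks, as distinct from the DTD- and schema-validity the note above mentions. A quick sketch of a well-formedness check using Python's stdlib DOM wrapper around expat (any non-validating parser would do the same job):

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# A non-validating parse succeeds iff the document is well-formed;
# it says nothing about DTD- or schema-validity.

def is_well_formed(xml_text):
    try:
        parseString(xml_text)
        return True
    except ExpatError:
        return False

ok = is_well_formed("<a><b/></a>")
bad = is_well_formed("<a><b></a>")  # mismatched tags: not well-formed
```

Checking DTD-validity or schema-validity requires a validating processor on top of this.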

  • [August 28, 2000] "Adapting Content for VoiceXML." By Didier Martin. From (August 23, 2000). ['In the second part of his "Write Once, Publish Everywhere" project, Didier Martin takes us through creating content for voice browsers. He deals with the architecture necessary to interface with voice browsers, and includes a simple introduction to VoiceXML.'] "Not all devices are able to accept and transform an XML document into something we can perceive with our senses. Most of the time, the target device supports only a rendering interpreter. Therefore we need to transform the abstract model -- as held in our server-side XML document -- into a format the rendering interpreter understands. Browsers with more resources at their disposal will probably in the near future perform the transformation on the client side. But devices with less available resources require that any transformation from an XML model to a rendering format occurs on the server side. To transform the <xdoc> element and its content, we'll use XSLT (Extensible Stylesheet Language Transformations). Dependent on the device profile, we perform one of three different transformations..." See "VoiceXML Forum (Voice Extensible Markup Language Forum)."
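For readers unfamiliar with the target format of such a transformation, a minimal VoiceXML document is just a dialog of forms; a hypothetical example (the form id and prompt text are invented):

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="greeting">
    <block>
      <prompt>Welcome. Today's headline follows.</prompt>
    </block>
  </form>
</vxml>
```

A voice browser renders the prompt as synthesized speech; server-side XSLT, as described above, would generate a document like this from the abstract <xdoc> model when the device profile indicates a voice client.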

  • [August 28, 2000] "Object Management Group Readies Standards Upgrades." By Tom Sullivan. In InfoWorld (August 24, 2000). "The Object Management Group (OMG) is planning to meet next month to work on the evolution of standards such as CORBA and UML (Unified Modeling Language). Andrew Watson, vice president and technical director of CORBA, said that the lineup of topics includes evaluating submissions for technologies such as a version 2.0 specification of UML. To boost UML, Needham, Mass.-based OMG will focus on language infrastructure, language superstructure, OCL (Object Constraint Language), and UML Diagram Interchange. Another goal of UML 2.0 is to embrace new technologies, such as Java and Jini, Watson said. Watson continued that OMG will start the technical process for enabling CORBA and SOAP (Simple Object Access Protocol) to work together, an effort that will be finished early next year. OMG members also will spend some time focusing on older technologies. Specifically, OMG will review a submission for a language mapping from PL1 to CORBA. . ." For other references to XML-Corba, see "XML and CORBA."

  • [August 26, 2000] "Semantic Interoperability on the Web." By Jeff Heflin and James Hendler (University of Maryland). Paper presented at the Extreme Markup Languages 2000 Conference (August 13 - 18, 2000, Montréal, Canada). Published as pages 111-120 (with 10 references) in Conference Proceedings: Extreme Markup Languages 2000. 'The Expanding XML/SGML Universe', edited by Steven R. Newcomb, B. Tommie Usdin, Deborah A. Lapeyre, and C. M. Sperberg-McQueen. "XML will have a profound impact on the way data is exchanged on the Internet. An important feature of this language is the separation of content from presentation, which makes it easier to select and/or reformat the data. However, due to the likelihood of numerous industry and domain specific DTDs, those who wish to integrate information will still be faced with the problem of semantic interoperability. In this paper we discuss why this is not solved by XML, and then discuss why the Resource Description Framework is only a partial solution. We then present the SHOE language, which we feel has many of the features necessary to enable a semantic web, and describe an existing set of tools that make it easy to use the language. [...] if the growing proliferation of DTDs is indicative, then web developers will still be faced with the problem of semantic interoperability, i.e., the difficulty in integrating resources that were developed using different vocabularies and different perspectives on the data. To achieve semantic interoperability, systems must be able to exchange data in such a way that the precise meaning of the data is readily accessible and the data itself can be translated by any system into a form that it understands. The benefits of semantic interoperability would be numerous. For example, search can often be frustrating because of the limitations of keyword-based matching techniques. Users frequently experience one of two problems: they either get back no results or too many irrelevant results.
The problem is that words can be synonymous (that is, two words have the same meaning) or polysemous (a single word has multiple meanings). However, if the languages used to describe web pages were semantically interoperable, then the user could specify a query in the terminology that was most convenient, and be assured that the correct results were returned, regardless of how the data was expressed in the sources. Intelligent internet agents could also benefit from semantic interoperability. Currently, such agents are very sensitive to changes in the format of a web page. Although these agents would not be affected by presentation changes if the pages were available in XML, they would still break if the XML representation of the data was changed slightly (for example if the element <PRICE> was renamed to <UNITPRICE>). Semantic interoperability would allow agents to be more robust. A useful function for internet agents would be the ability to automatically integrate information from multiple sources. For example, a travel-planning agent might need to access data from airlines, hotels, rental car companies, weather sites, and tourist sites, each of which may have different ways of representing the relevant data. Such an agent would be faced with the problem of translating the different vocabularies and representations for this data into a common format that it could work with. [...] Conclusion: SHOE is not the only AI language for the Web. The Ontobroker project uses a language to describe data that is embedded in HTML, but relies on a centralized broker for ontology definitions. The Ontology Markup Language (OML) and Conceptual Knowledge Markup Language (CKML) are used together for semantic markup that is based on the theories of formal concept analysis and information flow. However, both of the languages are basically web syntaxes for more traditional KR languages, and neither considers the special characteristics of the Web to the degree that SHOE does. 
XML will revolutionize the Web, but semantic interoperability is needed to achieve the Web's true potential. We have discussed the limitations of XML and RDF with respect to semantic interoperability, and presented the SHOE language. In describing the basic elements of SHOE, we have explained how it is better suited for semantics on the Web than either XML DTDs or RDF. In order to demonstrate SHOE, we have built a suite of tools for its use and have described those tools here." Also in PostScript format. See: "Simple HTML Ontology Extensions (SHOE)." [cache]

  • [August 25, 2000] "Representation and Organization of Information in the Web Space: From MARC to XML." By Jian Qin (School of Information Studies, Syracuse University). In Informing Science Volume 3, Number 2 (2000), pages 83-88 (with 19 references). "Representing and organizing information in libraries has a long tradition of using rules and standards. The very first standard encoding format for bibliographic data in libraries, the MARC (MAchine-Readable Cataloging) format has been joined by a large number of new formats since the late 1980s. The new formats, mostly SGML/HTML-based, are actively taking a role in representing and organizing networked information resources. This article briefly describes the historical connection between MARC and the newer formats for representing information, and the current development of XML applications that will benefit information/knowledge management in the new environment. [...] It becomes a reality now that almost all the information flowing within and between organizations can be represented as one of these two kinds of documents (marked up by XML), stored in databases, and communicated through network systems. A recent statistical survey found that up to October 1999, a total of 179 initiatives and applications emerged [Qin 1999: "Discipline- and industry-wide metadata schemas: Semantics and Namespace Control"]. Many of these applications propose specialized data elements and attributes that range from business processes to scientific disciplinary domains [See Figure 2]. Businesses and industry associations are the most active developers in XML initiatives and applications [See Figure 3]. The burgeoning of these specialized XML applications raises a critical issue: how can we be sure that data/documents marked up by these specialized tags can be understood correctly across different systems in different applications?
It is well known that different domains use their own naming conventions for data elements in their operations. For example, the same data element 'Customer ID' may be named as 'Client ID' or 'Patron ID.' Besides the same data being named differently, the same term may also mean different things: 'title' may refer to a book, a journal article, or a person's job position. To further complicate the issue, future XML documents will most likely contain multiple markup vocabularies, which poses problems of name recognition and collision. Solutions to the problems related to XML namespaces lie largely in the hands of the library and information science community who, over the years of research on information/knowledge representation and organization, have developed a whole spectrum of methodologies and systems. An immediate example is that the techniques used in thesaurus construction and control can be applied to standardize the naming of data elements in various XML applications and map out semantics of data element names in namespace repositories. [Conclusion:] When libraries began to use MARC format for their library catalogs back in the late 1960s, they mainly converted their printed records into electronic form for storage and retrieval. The materials represented by these records are physical and static. In the Web space, there is not much physical, nor static -- the material is virtual and the information is dynamic. The library's role today has more emphasis in being a 'pathfinder' than a 'gatekeeper.' All these grant the library and information profession a wonderful opportunity to take a significant part in this information revolution, as well as a great challenge to demonstrate the value of library and information science and its potential contribution to e-organizations and e-enterprises." [cache]
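The thesaurus-style control the article proposes amounts to mapping each application's element name onto one preferred term. A minimal sketch; the synonym ring below is illustrative, not drawn from any real namespace repository:

```python
# Thesaurus-style name control: a synonym ring maps each known variant
# element name to a single preferred term (all names here are invented).

SYNONYM_RING = {
    "CustomerID": "CustomerID",  # preferred term maps to itself
    "ClientID": "CustomerID",
    "PatronID": "CustomerID",
}

def normalize(element_name):
    """Return the preferred term, or the name unchanged if it is unknown."""
    return SYNONYM_RING.get(element_name, element_name)
```

A full solution would also have to disambiguate polysemous names like 'title', which a flat ring cannot do; that is where the richer thesaurus relationships (broader/narrower terms, scope notes) come in.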

  • [August 25, 2000] "Mapping XML to Java, Part 1. Employ the SAX API to map XML documents to Java objects. [SAX API Tutorial.]" By Robert Hustead. In JavaWorld Magazine (August, 2000). ['The SAX API is superior to the DOM API in many aspects of runtime performance. In this article we will explore using SAX to map XML data to Java. Because using SAX is not as intuitive as using DOM, we will also spend some time familiarizing ourselves with coding to SAX.'] "Because XML is a form of self-describing data, it can be used to encode rich data models. It's easy to see XML's utility as a data exchange medium between very dissimilar systems. Data can be easily exposed or published as XML from all kinds of systems: legacy COBOL programs, databases, C++ programs, and so on. However, using XML to build systems poses two challenges. First, while generating XML is a straightforward procedure, the inverse operation, using XML data from within a program, is not. Second, current XML technologies are easy to misapply, which can leave a programmer with a slow, memory-hungry system. Indeed, heavy memory requirements and slow speeds can prove problematic for systems that use XML as their primary data exchange format. Some standard tools currently available for working with XML are better than others. The SAX API in particular has some important runtime features for performance-sensitive code. In this article, we will develop some patterns for applying the SAX API. You will be able to create fast XML-to-Java mapping code with a minimum memory footprint, even for fairly complex XML structures (with the exception of recursive structures). In Part 2 of this series, we will cover applying the SAX API to recursive XML structures in which some of the XML elements represent lists of lists. We will also develop a class library that manages the navigational aspects of the SAX API. This library simplifies writing XML mapping code based on SAX..." [alt URL]
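The mapping pattern the article develops (a ContentHandler that builds application objects as element events arrive) can be sketched with Python's xml.sax rather than the Java SAX API; the <order>/<item> vocabulary and class names below are invented for illustration:

```python
import io
import xml.sax

# SAX-to-object mapping sketch: the handler tracks the element currently
# being built and attaches child data to it as start/end events arrive.

class Order:
    def __init__(self):
        self.items = []  # item SKUs collected from child elements

class OrderHandler(xml.sax.ContentHandler):
    """Builds Order objects from <order><item sku="..."/></order> markup."""
    def __init__(self):
        super().__init__()
        self.orders = []
        self._current = None  # the Order being populated, if any

    def startElement(self, name, attrs):
        if name == "order":
            self._current = Order()
        elif name == "item" and self._current is not None:
            self._current.items.append(attrs["sku"])

    def endElement(self, name):
        if name == "order":
            self.orders.append(self._current)
            self._current = None

handler = OrderHandler()
doc = '<orders><order><item sku="a1"/><item sku="b2"/></order></orders>'
xml.sax.parse(io.StringIO(doc), handler)
```

Unlike a DOM approach, only the objects being built are held in memory, which is the runtime advantage the article emphasizes.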

  • [August 25, 2000] "XML Access Control: A Fine-Grained Access Control System for XML Documents." By Ernesto Damiani, et al. August, 2000. [From the project model description:] ". . .We present an access control model to protect information distributed on the Web that, by exploiting XML's own capabilities, allows the definition and enforcement of access restrictions directly on the structure and content of the documents. The result is a flexible and powerful security system offering a simple integration with current solutions. . . The application to XML data of the latest advancement of public-key cryptography has remedied most of the security problems in communication; commercial products are becoming available providing fine-grained security features such as digital signatures and element-wise encryption to transactions involving XML data as a way to meet authenticity, secrecy and non-repudiation requirements in XML-based transactions. The objective of our work is to complete this picture, exploiting XML's own capabilities to define and implement an authorization model for regulating access to XML documents. The rationale for our approach is defining an XML markup for a set of security elements describing the protection requirements of XML documents. Our security markup can be used to provide both instance level and schema level authorizations at the granularity of XML elements. Taken together with a user's identification and its associated group memberships, as well as with the support for both permissions and denials of access, our security markup allows one to easily express different protection requirements with support of exceptions. The enforcement of the requirements stated by the authorizations produces a view on the documents for each requester; the view includes only the information that the requester is entitled to see. 
A main feature of our model is that it smoothly accommodates the needs of both organization-wide policy managers and single document authors, automatically taking both into account to define who can exercise which access privileges on a particular XML document. Our notion of subject comprises identity and location; identity can include information about group or organization membership. The granularity of objects may be as fine as single elements or even attributes within XML documents. . . Our processor takes as input a valid XML document requested by the user, together with its XML Access Sheet (XAS) listing the associated access authorizations at the instance level. The processor operation also involves the document's DTD and the associated XAS specifying schema level authorizations. The processor output is a valid XML document including only the information the user is allowed to access. To provide a uniform representation of XASs and other XML-based information, the syntax of XASs is given by the XML DTD . . ." See also: "XML and Encryption."
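The core idea, computing a per-requester view that strips the elements the authorizations deny, can be illustrated with a much simplified sketch. This is not the authors' XAS syntax or processor; the deny-list model below is a stand-in for their instance- and schema-level authorizations.

```python
import xml.etree.ElementTree as ET

def authorized_view(xml_text, denied_tags):
    """Return a copy of the document with denied elements removed,
    loosely mimicking the per-requester 'view' the processor computes
    from the access authorizations."""
    root = ET.fromstring(xml_text)

    def prune(elem):
        for child in list(elem):        # copy: we mutate while iterating
            if child.tag in denied_tags:
                elem.remove(child)      # requester never sees this subtree
            else:
                prune(child)

    prune(root)
    return ET.tostring(root, encoding="unicode")
```

The real model is far richer: subjects carry identity, group, and location; authorizations attach at both document and DTD level; and objects can be as fine-grained as single attributes.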

  • [August 24, 2000] "XML Matters: On the 'Pythonic' treatment of XML documents as objects." By David Mertz, Ph.D. (Data Masseur, Gnosis Software, Inc.). From IBM DeveloperWorks. August 21, 2000. ['David Mertz discusses some of the goals, decisions, and limitations of the project to create a more seamless integration between XML and Python; and hopefully provide you with a set of useful modules and techniques that point to easier ways to meet programming goals. All tools created as part of the project will be released to the public domain. David Mertz presents the xml_pickle module. He discusses the design goals and decisions that went into xml_pickle and provides a list of likely uses.'] "There are a number of techniques and tools for dealing with XML documents in Python. However, one thing that most existing XML/Python tools have in common is that they are much more XML-centric than Python-centric. Certain constructs and coding techniques feel 'natural' in a given programming language, and others feel much more like they are imported from other domains. But in an ideal environment all constructs fit intuitively into their domain, and domains merge seamlessly. When they do, programmers can wax poetic rather than merely make it work. I've begun a research project of creating a more seamless and more natural integration between XML and Python. In this article, and subsequent articles in this column, I'll discuss some of the goals, decisions, and limitations of the project; and hopefully provide you with a set of useful modules and techniques that point to easier ways to meet programming goals. All tools created as part of the project will be released to the public domain. Python is a language with a flexible object system and a rich set of built-in types. The richness of Python is both an advantage and a disadvantage for the project. On one hand, having a wide range of native facilities in Python makes it easier to represent a wide range of XML structures. 
On the other hand, the range of native types and structures of Python makes for more cases to worry about in representing native Python objects in XML. As a result of these asymmetries between XML and Python, the project -- at least initially -- contains two separate modules: xml_pickle, for representing arbitrary Python objects in XML, and xml_objectify, for 'native' representation of XML documents as Python objects. We'll address xml_pickle in this article..."
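The xml_pickle idea, representing an arbitrary Python object's attributes in XML and restoring them later, can be sketched with the standard library. The element names and the int/float/str restriction below are my own simplifications, not the actual xml_pickle format.

```python
import xml.etree.ElementTree as ET

def to_xml(obj):
    """Serialize a simple object's instance attributes to XML.
    (Illustrative format only; real xml_pickle handles far more types.)"""
    root = ET.Element("PyObject", {"class": type(obj).__name__})
    for name, value in vars(obj).items():
        attr = ET.SubElement(root, "attr",
                             {"name": name, "type": type(value).__name__})
        attr.text = str(value)
    return ET.tostring(root, encoding="unicode")

def from_xml(xml_text):
    """Rebuild a dict of attributes (types limited to int/float/str here)."""
    casts = {"int": int, "float": float, "str": str}
    root = ET.fromstring(xml_text)
    return {a.get("name"): casts[a.get("type")](a.text)
            for a in root.findall("attr")}
```

The asymmetry the column describes shows up even at this scale: going from Python to XML is mechanical, while going back requires recording type information the XML would otherwise lose.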

  • [August 24, 2000] "NewsML Standard Gets Public Release." By [Seybold Staff]. In The Bulletin: Seybold News & Views on Electronic Publishing (July 19, 2000). ['XML-based standard designed to represent and manage news throughout its life cycle.'] "The technical standards body for the international news industry is moving forward with a new XML-based standard for managing and interchanging multimedia news. At its annual meeting in Geneva, more than 50 members of the International Press Telecommunications Council (IPTC) approved the public release of the first version of the standard, called NewsML, and called for trial implementations. NewsML is a result of the IPTC initiative begun last October to create an XML-based standard to represent and manage news through its life cycle. The IPTC2000 committee proceeded swiftly, and ratification of the NewsML DTD is slated for October. With NewsML, an individual news item can contain files -- text, photos, graphics, video, etc. -- together with metadata describing the status (publishable, embargoed, etc.), revision level, copyright data, and relationships among the components. For example, an item can include the same picture or video in multiple resolutions. An important facet of NewsML is that it will support multiple metadata vocabularies. Though NewsML has defaults for metadata tags and attributes, it permits publishers to use multiple vocabularies, including their own, and provides a mechanism for specifying which vocabularies are being used. For example, text stories can make use of the existing News Industry Text Format (NITF), a DTD already developed by the wire services." See "NewsML and IPTC2000."

  • [August 24, 2000] "Jeni's XSLT Pages." XSLT tutorials from Jeni Tennison. "These pages are all about XSLT, an XML-based language for translating one set of XML into another set of XML, or into HTML. . . the pages are dedicated to helping people understand and make the most of using XSLT. Right now, these pages are just a lot of links to some of my contributions to XSL-List. I've managed to sort them into various groups, and I'm going to use them as the basis of a proper set of tutorial pages. If there are any particular areas that you'd like me to work on, please let me know. Topics include [2000-08-15]: (1) Fundamentals: Calling Stylesheets, General Processing, Handling Namespaces, Using XPaths, Escaping and CDATA Sections. (2) Specific Functions: Variables and Parameters, Using Keys, Using document(). (3) Basic Tasks: Creating Result Elements and Attributes, Copying the Source, Conditional Processing, Sorting, Numbering Output, String Manipulation, Combining Stylesheets. (4) Complex Tasks: Identifying Unique Items, Grouping, Constructing Hierarchies, Flattening Input, Combining Documents, Comparing Documents. (5) Improving Your XSLT: Debugging Stylesheets, Improving Performance, Simplifying your XSLT, Documenting Stylesheets."

  • [August 24, 2000] XML Media Types [draft-murata-xml-07.txt]. Internet-Draft, Network Working Group. August 9, 2000. "This document standardizes five new media types -- text/xml, application/xml, text/xml-external-parsed-entity, application/xml-external-parsed-entity, and application/xml-dtd -- for use in exchanging network entities that are related to the Extensible Markup Language (XML). This document also standardizes a convention (using the suffix '+xml') for naming media types outside of these five types when those media types represent XML entities. XML MIME entities are currently exchanged via the HyperText Transfer Protocol on the World Wide Web, are an integral part of the WebDAV protocol for remote web authoring, and are expected to have utility in many domains. Major differences from RFC 2376 are: (1) the addition of text/xml-external-parsed-entity, application/xml-external-parsed-entity, and application/xml-dtd, (2) the '+xml' suffix convention (which also updates the RFC 2048 registration process), and (3) the discussion of 'utf-16le' and 'utf-16be'. [...] Since XML is an integral part of the WebDAV Distributed Authoring Protocol, and since World Wide Web Consortium Recommendations have conventionally been assigned IETF tree media types, and since similar media types (HTML, SGML) have been assigned IETF tree media types, the XML media types also belong in the IETF media types tree. Similarly, XML will be used as a foundation for other media types, including types in every branch of the IETF media types tree. To facilitate the processing of such types, media types based on XML, but that are not identified using text/xml or application/xml, SHOULD be named using a suffix of '+xml' as described in Section 7. This will allow XML-based tools -- browsers, editors, search engines, and other processors -- to work with all XML-based media types." See "XML Media/MIME Types." [cache]
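The draft's naming convention is easy to apply mechanically: a receiver can treat an entity as XML if its media type is one of the five registered types, or if its subtype carries the '+xml' suffix. A small sketch (the function name is mine):

```python
def is_xml_media_type(media_type):
    """True if a media type denotes XML under draft-murata-xml-07:
    one of the five registered XML types, or any subtype ending in
    '+xml' (e.g., image/svg+xml)."""
    registered = {
        "text/xml", "application/xml",
        "text/xml-external-parsed-entity",
        "application/xml-external-parsed-entity",
        "application/xml-dtd",
    }
    mt = media_type.split(";")[0].strip().lower()   # drop parameters
    return mt in registered or mt.split("/")[-1].endswith("+xml")
```

This is exactly what lets generic XML tools recognize new XML-based types without a registry lookup.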

  • [August 24, 2000] "How to render XML documents in popular Web browsers." By Ray Seddigh (Founder, Intias Corporation -- XML Education Services) and Larry Najafi (Professor, Computer Science Department, University of Wisconsin). From IBM DeveloperWorks. August 21, 2000. ['Here's a practical step-by-step explanation of how to display XML documents in two popular browsers. The authors demonstrate the default display behaviors of Internet Explorer 5.5 and Netscape 6.0 browsers and show how to apply stylesheets for custom display. Details include sample code used to create displays of XML documents. After reading this article, you will be able to set up your XML document for browser display using default and custom display approaches, avoiding common pitfalls in the process.'] "Once you have an XML document with a .xml extension, you often want to view it in a browser. You can be forgiven for assuming that merely pointing to the document with a browser will produce the expected readable view. Alas, when developers new to XML try to display their passionately created XML documents, they are often surprised by the outcome. We've witnessed the ensuing state of disbelief when the displayed document bears little resemblance to what the developer had imagined would appear in the browser window. Confusion mounts when the same "standard" XML content displays differently in different browsers. Though the process of displaying XML in a browser is simple once you understand it, the uninitiated often work under mistaken assumptions or forget important details that lead to unexpected results. Read on for explanations of those unexpected but normal behaviors, followed by specific steps to prepare your XML document for predictable -- and desirable -- rendering by a browser. . .
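The usual way to replace a browser's default tree view with a custom rendering is to attach a stylesheet via the xml-stylesheet processing instruction. A minimal sketch (render.xsl is a hypothetical stylesheet file name):

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="render.xsl"?>
<greeting>Hello, world</greeting>
```

Without the second line, Internet Explorer 5.5 shows its collapsible source view and Netscape 6.0 shows roughly the concatenated text content, which is the surprise the authors describe.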

  • [August 24, 2000] "Flash 5 Coming Soon from Macromedia." By [Seybold Staff]. In The Bulletin: Seybold News & Views on Electronic Publishing (July 28, 2000). "Macromedia has announced the upcoming release of version five of its Flash authoring environment. The release, due in late August or early September, will be accompanied by an updated Flash player. XML support and improved scripting top the list of new features. Support for structured XML and HTML content will enable developers to render XML and HTML-tagged data through the Flash Player. On the scripting side, Macromedia has added a full-bore script editor and tweaked its ActionScript language to be compliant with JavaScript syntax, just as it's done with Fireworks. Flash 5 will also include 'Smart Clips,' which allow the re-use of complex interactive objects across multiple Flash projects and sites; and Flash objects for multiple projects can now be stored in a central repository and updated dynamically across all projects from the central repository based on a piece of code that refers back to the database. These features should help users manage objects that get replicated across projects and sites."

  • [August 23, 2000] "XTRACT: A System for Extracting Document Type Descriptors from XML." By Minos N. Garofalakis, Aristides Gionis, Rajeev Rastogi, S. Seshadri, and Kyuseok Shim. Presented at ACM SIGMOD 2000 - International Conference on Management of Data (Dallas, TX, USA, 16-18 May 2000). Published in SIGMOD Record Volume 29, Number 2 (June 2000), pages 165-176 (with 25 references). "XML is rapidly emerging as the new standard for data representation and exchange on the Web. An XML document can be accompanied by a document type descriptor (DTD) which plays the role of a schema for an XML data collection. DTDs contain valuable information on the structure of documents and thus have a crucial role in the efficient storage of XML data, as well as the effective formulation and optimization of XML queries. In this paper, we propose XTRACT, a novel system for inferring a DTD schema for a database of XML documents. Since the DTD syntax incorporates the full expressive power of regular expressions, naive approaches typically fail to produce concise and intuitive DTDs. Instead, the XTRACT inference algorithms employ a sequence of sophisticated steps that involve: (1) finding patterns in the input sequences and replacing them with regular expressions to generate "general" candidate DTDs, (2) factoring candidate DTDs using adaptations of algorithms from the logic optimization literature, and (3) applying the minimum description length (MDL) principle to find the best DTD among the candidates. The results of our experiments with real-life and synthetic DTDs demonstrate the effectiveness of XTRACT's approach in inferring concise and semantically meaningful DTD schemas for XML databases. . . A naive and straightforward solution to our DTD extraction problem would be to infer as the DTD for an element, a 'concise' expression which describes exactly all the sequences of subelements nested within the element in the entire document collection. 
As we demonstrate in Section 3, however, the DTDs generated by this approach tend to be voluminous and unintuitive (especially for large XML document collections). In fact, we discover that accurate and meaningful DTD schemas that are also intuitive and appealing to humans (i.e., resemble what a human expert is likely to come up with) tend to generalize. That is, 'good' DTDs are typically regular expressions describing subelement sequences that may not actually occur in the input XML documents. (Note that this, in fact, is always the case for DTD regular expressions that correspond to infinite regular languages, e.g., DTDs containing one or more Kleene stars '*'.) In practice, however, there are numerous such candidate DTDs that generalize the subelement sequences in the input, and choosing the DTD that best describes the structure of these sequences is a non-trivial task. In the inference algorithms employed in the XTRACT system, we propose a novel combination of sophisticated techniques to generate DTD schemas that effectively capture the structure of the input sequences... As a first step, the XTRACT system employs novel heuristic algorithms for finding patterns in each input sequence and replacing them with appropriate regular expressions to produce more general candidate DTDs. As a second step, the XTRACT system factors common subexpressions from the generalized candidate DTDs obtained from the generalization step, in order to make them more concise. In the final and most important step, the XTRACT system employs Rissanen's Minimum Description Length (MDL) principle to derive an elegant mechanism for composing a near-optimal DTD schema from the set of candidate DTDs generated by the earlier two steps. (Our MDL-based notion of optimality will be defined formally later in the paper.) 
The MDL principle has its roots in information theory and, essentially, provides a principled, scientific definition of the optimal 'theory/model' that can be inferred from a set of data examples. Abstractly, in our problem setting, MDL ranks each candidate DTD depending on the number of bits required to describe the input collection of sequences in terms of the DTD (DTDs requiring fewer bits are ranked higher). As a consequence, the optimal DTD according to the MDL principle is the one that is general enough to cover a large subset of the input sequences but, at the same time, captures the structure of the input sequences with a fair amount of detail, so that they can be described easily (with few additional bits) using the DTD. Thus, the MDL principle provides a formal notion of 'best DTD' that exactly matches our intuition. Using MDL essentially allows XTRACT to control the amount of generalization introduced in the inferred DTD in a principled, scientific and, at the same time, intuitively appealing fashion. We demonstrate that selecting the optimal DTD based on the MDL principle has a direct and natural mapping to the Facility Location Problem (FLP), which is known to be NP-complete. . . . [Conclusions:] We have presented the architecture of the XTRACT system for inferring a DTD for a database of XML documents. The problem of automated DTD derivation is complicated by the fact that the DTD syntax incorporates the full expressive power of regular expressions. Specifically, as we have shown, naive approaches that do not 'generalize' beyond the input element sequences fail to deduce concise and semantically meaningful DTDs. Instead, XTRACT applies sophisticated algorithms to compute a DTD that is more along the lines of what a human expert would infer. We compared the quality of the DTDs inferred by XTRACT with those returned by the IBM Alphaworks DDbE tool on synthetic and real-life DTDs. 
In our experiments, XTRACT outperformed DDbE by a wide margin; for most of our test cases, XTRACT was able to accurately infer the DTD whereas DDbE completely failed to do so. A number of the DTDs which were correctly identified by XTRACT were fairly complex and contained factors, metacharacters and nested regular expression terms. Thus, our results clearly demonstrate the effectiveness of the XTRACT approach that employs generalization and factorization to derive a range of general and concise candidate DTDs, and then uses the MDL principle as the basis to select amongst them." [cache]
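The MDL trade-off XTRACT exploits can be illustrated with a deliberately tiny toy: score each candidate content model by its own length (model cost) plus a crude proxy for the bits needed to pin down each input sequence within it (data cost). The candidates, the cost accounting, and the sequences below are all invented for illustration and are far simpler than XTRACT's actual algorithms.

```python
import re

# Three candidate content models for the same child sequences, written
# as Python regexes: a generalization, an exact enumeration, and an
# over-generalization.
CANDIDATES = ["ab*", "ab|abb|abbb", "(a|b)*"]

def data_cost(pattern, seq):
    """Rough proxy for the bits needed to pick `seq` out of the
    language of `pattern`: one unit per character left unspecified by a
    generalizing construct; exact enumerations leave nothing to say."""
    if not re.fullmatch(pattern, seq):
        return float("inf")        # a candidate must cover every input
    if pattern == "(a|b)*":
        return len(seq)            # every character is a free choice
    if pattern.endswith("b*"):
        return seq.count("b")      # only the repetition count is free
    return 0                       # exact enumeration

def mdl_cost(pattern, sequences):
    # model cost (pattern length) + data cost summed over the inputs
    return len(pattern) + sum(data_cost(pattern, s) for s in sequences)

def best_dtd(sequences):
    return min(CANDIDATES, key=lambda p: mdl_cost(p, sequences))
```

On the inputs ab, abb, abbb, the enumeration pays a large model cost, the over-general (a|b)* pays a large data cost, and ab* strikes the balance, which is the intuition behind XTRACT's notion of the 'best' DTD.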

  • [August 23, 2000] "Pillar of the Community. XML is becoming the standard platform for the interenterprise processes on which B2B e-commerce depends." By William J. Lewis (Cambridge Technology Partners). In Intelligent Enterprise Volume 3, Number 13 (August 18, 2000), pages 33-38. "The trade press calls them 'the next big thing.' Forrester Research says they're 'all the rage.' They're business-to-business (B2B) exchanges, also known as e-marketplaces or online exchanges, a growing class of Web portals that facilitate buying and selling interactions among a community of organizations. In essence, they're the trading floors of the New Economy. . . B2B data integration is a strict requirement of this process: matching buy and sell orders requires many-to-many mapping and transformation of digital transactions among various formats. In the past, these processes involved tedious, manual development, but now numerous software vendors, taking several differing technical approaches, offer packaged solutions to support these capabilities. Not surprisingly, the common thread among these approaches is the use of extensible markup language (XML) as a universal data source, destination, storage format, and in some cases, transformation language. We can classify these approaches into four overlapping categories: enterprise application integration (EAI) extensions, transformation hubs, XML-database interfaces, and XML transformation-management utilities. From an architectural viewpoint, B2B data integration is closely related to EAI, also known as application-to-application integration (or A2A, the term I'll use here). However, they have at least two important differences. First, in B2B data integration, the applications involved run inside the firewalls of two or more companies; in A2A, they run inside just one. 
Second, B2B technologies increasingly leverage XML and extensible style language (XSL) for data integration, in addition to the process-integration support via message brokering and workflow automation typical of A2A solutions. Until recently, most data integration projects have focused on intraenterprise goals; that is, doing A2A within a single enterprise. But with the rapid proliferation of external-facing applications such as B2B exchanges, the focus of data integration is now rapidly moving toward interenterprise applications. In such applications, transforming and moving data precisely and efficiently is critical to not just a single enterprise, but to groups of enterprises -- all the trading partners involved. And XML will be the standard input and output language of these data transformations. . . The established A2A vendors now recognize this opportunity and are joining with relative newcomers to swarm the B2B data integration space like ants on a picnic. The most visible players include Acta Technology Inc., Mercator Software Inc., Vitria Technology Inc., Extricity Inc., and Computer Associates International. Not surprisingly, most of the A2A software providers are taking a B2B approach that leverages their existing product architectures. The B2B Mapper component of Active Software Inc.'s Business Exchange Server, for example, supports the generation and consumption of XML documents via transformations between XML documents and ActiveWorks events. For example, an XML order document -- carrying customer, product, and amount elements -- could be transformed to an ActiveWorks event for subsequent routing to an inventory application. This architectural approach is similar to the XML interfaces supported by RDBMS and 'XDBMS' products. A more 'lightweight' alternative, however, is to map and transform data among just XML documents -- a strategy implemented by B2B transformation hubs. 
These hubs provide a neutral, many-to-many focal point for the external interfaces of buyer supply chains and provider delivery chains. Standalone, full-function transformation hubs are on the cutting edge, and are either in beta or have just recently been made available. Examples include Microsoft's BizTalk Server, in beta as of this writing; IBM's WebSphere Commerce Suite Marketplace Edition, scheduled for release in Q3 2000; Intershop Communications' Marketplace Toolkit; Tron, under development by Allaire Corp.; and OnDisplay Inc.'s (recently acquired by Vignette Corp.) CenterStage eBizXchange. Because B2B exchange partners will probably use disparate formats even within the same transaction type, a vehicle for rationalizing and transforming these various formats is critically needed. Given the numerous existing and possible standard and nonstandard data interchange formats -- EDI, X12, SWIFT, delimited, positional, multiple XML document type definitions (DTDs) and schemas, and so on -- establishing an 'any-to-any' hub transformation architecture would yield an unmanageable Cartesian product of mappings..."

  • [August 23, 2000] "Talking About BizTalk. BizTalk's functionality includes a messaging component for integration and an Orchestration engine for modeling -- and all the work is done in XLANG, an XML language." By Steve Gillmor and Sean Gallagher. In XML Magazine (Fall 2000). "Integration is necessary, but it's not sufficient to organize a business process that spans lots of applications and lots of organizations. That's where the Orchestration -- the second piece of the functionality -- comes in. It's based on top of Visio, where you can model your business processes graphically, and connect them or bind them to components and applications and trading-partner relationships that you establish in BizTalk Messaging. We take the visual representation of your business process and generate XLANG -- an XML language for describing processes -- that is executed by the Orchestration engine at runtime. The fact that XLANG is an XML-based language is really arbitrary -- it could be anything. We've chosen XML because there's a lot of value in being able to share your business processes with other people -- [business processes] that are actually running on other platforms. XLANG is a full-fledged language that's complete and sound. We say business processes but it's used for describing any process. It's complementary to the Orchestration engine, which is essentially a finite state machine. If you took any electrical engineering classes, you probably had to build a finite state machine that counted to 10 in hardware. And it's essentially a simple version of what we've built in software. The language is just semantics for describing a process. It has support for spawning things concurrently, which is very difficult to do today: true concurrency, where there are things running at the same time and you are coordinating these concurrent processes together within a larger process. 
There's support for long-running transactions -- putting these loosely coupled message-based kinds of interactions within a transactional context. Just like today, if we want a two-phase commit on a component, we'll wrap it in MTS (Microsoft Transaction Server)..."
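The 'finite state machine' characterization is concrete: an orchestration engine holds a current state per process instance and advances it as messages arrive. A minimal sketch (the order-handling states and messages are invented; XLANG itself is far richer, with concurrency and long-running transactions):

```python
class Orchestration:
    """Tiny finite state machine of the kind an orchestration engine
    executes: a current state, plus transitions keyed by
    (state, message). The business process modeled is illustrative."""

    TRANSITIONS = {
        ("awaiting_order", "order_received"): "awaiting_payment",
        ("awaiting_payment", "payment_received"): "shipping",
        ("awaiting_payment", "payment_failed"): "cancelled",
        ("shipping", "shipped"): "done",
    }

    def __init__(self):
        self.state = "awaiting_order"

    def handle(self, message):
        key = (self.state, message)
        if key not in self.TRANSITIONS:
            raise ValueError(f"no transition for {key}")
        self.state = self.TRANSITIONS[key]
        return self.state
```

A graphical tool like the Visio-based designer draws the same transition table as boxes and arrows; the XLANG document is its serialized form.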

  • [August 22, 2000] "XML and the Resource Description Framework: The Great Web Hope." By Norm Medeiros (Technical Services Librarian, New York University School of Medicine). In ONLINE Magazine Volume 24, Number 5 (September 2000). "What incentive do search engine companies have for altering their indexing and rating algorithms? Moreover, what motivation do Web content providers have for implementing an intricate metadata standard if such efforts are futile at increasing retrieval or page ranking? The World Wide Web Consortium's (W3C) Resource Description Framework (RDF) and Extensible Markup Language (XML) offer a potential means to enhanced resource discovery on the Web. But will they work? [...] XML, Extensible Markup Language, is the syntax for RDF. Not surprisingly, XML was also developed by the World Wide Web Consortium and shares similarities with its metadata sibling. Both XML and RDF are concerned with information contexts. RDF focuses on this concern in regard to metadata, and uses the XML namespace facility to parse element sets. XML is poised to dethrone HTML as the language of the Web since it is extensible (not restricted to a limited number of elements), and supportive of automated data exchange (able to create contextual relationships that are machine-processable). In particular, XML's document type definition (DTD) and Extensible Stylesheet Language (XSL) separate the context from the display of information -- a division HTML is incapable of achieving. . . Although RDF, XML, and a number of semantic standards avail themselves to the metadata community, the question remains: Will they work? Clearly, locally-defined metadata projects will benefit from the W3C's commitment to the Semantic Web. OCLC's Cooperative Online Resource Catalog (CORC) and UKOLN's DC-DOT Generator are just two examples of projects utilizing RDF in an attempt to populate the Internet with highly descriptive, machine-readable metadata. 
If digital signature technology can operate within the framework of RDF, perhaps commercial search engines will once again trust content providers to incorporate and associate appropriate metadata within their works. Will searching the Web one day be like searching an OPAC? Perhaps. It can't get any worse than it is...can it?" See "Resource Description Framework (RDF)."
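Because RDF's syntax is XML, a generic XML parser can already pull Dublin Core style descriptions out of RDF metadata using the namespace URIs alone. A minimal sketch with Python's standard library (the sample record is invented; real RDF processing also handles typed nodes, containers, and other constructs):

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.org/report">
    <dc:title>Annual Report</dc:title>
    <dc:creator>J. Smith</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

def dublin_core(rdf_xml):
    """Collect Dublin Core properties from each rdf:Description."""
    root = ET.fromstring(rdf_xml)
    records = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        props = {child.tag.split('}')[1]: child.text   # local name only
                 for child in desc
                 if child.tag.startswith(f"{{{DC_NS}}}")}
        props["about"] = desc.get(f"{{{RDF_NS}}}about")
        records.append(props)
    return records
```

The namespace URIs are what keep a publisher's dc:title from colliding with anyone else's 'title' element, which is exactly the disambiguation role the article assigns to XML namespaces.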

  • [August 22, 2000] "Tools and Relatively Painless How-to for Working with XML and XSL/T on Windows and Mac." By John Robert Gardner, Ph.D. (ATLA-CERTR, Emory University).

  • [August 22, 2000] "What is XSLT? [XSLT Tutorial, Part One.]" By G. Ken Holman. From (August 16, 2000). [Edd Dumbill says: "I'm delighted to publish this comprehensive introduction to the W3C's extensible stylesheet technology from the universally respected XSLT expert, G. Ken Holman. This week and next, Ken will take us through both XSLT's place in the world of XML standards and its practical applications. The first three sections of "What is XSLT?", published this week, introduce XSLT and XPath -- the concept of styling structured information and architectures for applying XSLT stylesheets."] "Now that we are successfully using XML to mark up our information according to our own vocabularies, we are taking control and responsibility for our information, instead of abdicating such control to product vendors. These vendors would rather lock our information into their proprietary schemes to keep us beholden to their solutions and technology. But the flexibility inherent in the power given to each of us to develop our own vocabularies, and for industry associations, e-commerce consortia, and the W3C to develop their own vocabularies, presents the need to be able to transform information marked up in XML from one vocabulary to another. Two W3C Recommendations, XSLT (the Extensible Stylesheet Language Transformations) and XPath (the XML Path Language), meet that need. They provide a powerful implementation of a tree-oriented transformation language for transmuting instances of XML using one vocabulary into either simple text, the legacy HTML vocabulary, or XML instances using any other vocabulary imaginable. We use the XSLT language, which itself uses XPath, to specify how an implementation of an XSLT processor is to create our desired output from our given marked-up input. XSLT enables and empowers interoperability. 
This introduction strives to overview essential aspects of understanding the context in which these languages help us meet our transformation requirements, and to introduce substantive concepts and terminology to bolster the information available in the W3C Recommendation documents themselves." [From Crane Softwrights Ltd. published commercial training material titled Practical Transformation Using XSLT and XPath.] For related resources, see "Extensible Stylesheet Language (XSL/XSLT)."
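The transformation model Holman describes, declarative template rules matched against the source tree via XPath patterns, looks like this in practice. A minimal sketch: the people/person vocabulary is invented, and the stylesheet emits legacy HTML, one of the three output families the tutorial names:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Transform <people><person><name>..</name></person>..</people>
       into an HTML list -->
  <xsl:template match="/people">
    <ul>
      <xsl:apply-templates select="person"/>
    </ul>
  </xsl:template>
  <xsl:template match="person">
    <li><xsl:value-of select="name"/></li>
  </xsl:template>
</xsl:stylesheet>
```

The same source could be fed to a different stylesheet to produce plain text or another XML vocabulary, which is the interoperability point the introduction is making.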

  • [August 22, 2000] "ebXML: Assembling the Rubik's Cube." By Alan Kotok. From (August 16, 2000). ['The fourth meeting of the Electronic Business XML initiative sees the project make good progress. However, much remains to be achieved in order to hit ebXML's self-imposed 18-month deadline.'] "The fourth meeting of the Electronic Business XML (ebXML) initiative, 7-11 August 2000 in San Jose, California, saw the project consolidate some of its early progress, add new functionality, show off more of its messaging capabilities, and attract an important new industry partner to its cause. But despite the progress, participants at this latest meeting were bogged down fitting important pieces of its technology together to complete this comprehensive specification on time. The ebXML initiative, started last November, is creating a global framework for the exchange of business data using XML across industries and among businesses of all sizes. And while at first glance it may look like another business framework using XML, ebXML hopes to combine the experience from 20 years of EDI with XML's capabilities to fix EDI's shortcomings, which has prevented all but the world's largest businesses from enjoying the productivity benefits and process improvements of exchanging data electronically. The ebXML architecture combines message format specifications with business process models, a set of syntax-neutral core components, and distributed repositories with which businesses will interact. In their earlier meetings, ebXML's project teams wrote the requirements and outlined the various parts of the architecture. In this meeting, the development teams continued defining the individual specifications, but also started to reconcile these various parts..." See "Electronic Business XML Initiative (ebXML)."

  • [August 22, 2000] "Write Once, Publish Everywhere." By Didier Martin. From (August 16, 2000). ['In the first of an ongoing series of "Didier's Labs", we embark on the process of building a multi-device XML portal. The end result will be an XML server that can be accessed via desktop web browsers, mobile phone WAP browsers, and by voice using VoiceXML. In this installment, Didier outlines the project and models the login process for the server.'] "Our task is to create interactive applications using XML technologies. These applications should be accessible on three different devices: the telephone (VoiceXML), mobile phone mini browsers (WML), and finally, PC browsers (HTML). To realize this vision, we will create abstract models and encode them in XML. Then, we must recognize the device and transform these models into the appropriate rendering format. The rules of the game are to use, as much as possible, XML technologies which are freely available, and also to restrict our scope to XML technologies documented by a public specification." See "VoiceXML Forum."

  • [August 22, 2000] "Platform-Neutral Interoperability with SOAP. Big player support gives momentum to a new Internet protocol for distributed programming." By Kent Brown. In XML Magazine (Fall 2000). ['Kent Brown shows how SOAP, a new Internet protocol, will bring about a new age of interoperability by strengthening interaction among distributed computing systems.'] "The goal of SOAP is to eliminate (or at least penetrate) the walls separating distributed computing platforms. SOAP attempts to do this by following the same model attributes as the other successful Web protocols: simplicity, flexibility, platform neutrality, and a text-based format. In fact, when you get down to it, SOAP is less a new technology than a codification of the usage of existing Internet technologies to standardize distributed computing communications over the Web. SOAP started out as XML-RPC, a very simple protocol designed by Dave Winer of UserLand for doing RPC over HTTP using XML. SOAP 1.0 was a refinement of the original XML-RPC specification and was released in September 1999. Don Box of DevelopMentor, and several engineers from Microsoft, helped with the 1.0 specification. Initially IBM and Sun, who were getting involved in a similar but more ambitious effort called ebXML, dismissed SOAP as unimportant. This would have been a shame because an interoperability protocol is useless if the big players choose not to interop. Fortunately for SOAP and the entire distributed computing industry, IBM changed its mind and now not only supports SOAP but helps to drive it. David Ehnebuske of IBM and Noah Mendelsohn of the IBM subsidiary Lotus Development Corporation are co-authors of the SOAP 1.1 specification. Proving that its support of SOAP is strong, IBM was quick to supply a reference implementation of SOAP for Java. In fact, IBM beat Microsoft to the punch by several weeks, delivering SOAP4J a few days after SOAP 1.1 was announced. 
By contrast, the long-awaited Web Services Toolkit (renamed the SOAP Toolkit) was not released until more than a month later. Open source fans will be happy to hear that IBM has donated its SOAP4J reference implementation to Apache. . . Although synchronous RPC was the original intent of SOAP, the latest specification adds flexibility that allows SOAP messages to be used for asynchronous and one-way message passing schemes as well. In this type of usage, the format of the SOAP Body does not follow the SOAP encoding rules defined in Section 5 of the SOAP 1.1 specification, and the data doesn't necessarily map to the parameters of a specific RPC. The recipient of the XML document needs to know how to interpret it, which will usually be accomplished by referencing an XML schema that defines the structure of the document. There are some restrictions on the types of XML documents that can be included in SOAP messages. SOAP prohibits DTDs and Processing Instructions and therefore you can't necessarily pass an existing XML document (say a RosettaNet message) in a SOAP call if the XML document wasn't designed to be SOAP friendly. Perhaps the best example of the application of SOAP to messaging scenarios is the draft of the BizTalk Framework version 2.0 released on June 27, 2000. This newest version of the BizTalk Framework has been redefined to be SOAP 1.1 compliant, allowing BizTalk Framework XML documents to be transmitted in the form of SOAP messages. In addition, BizTalk Framework version 2.0 has been extended to provide Multi-Part MIME (Multipurpose Internet Mail Extensions) encoding guidelines to support the inclusion of one or more non-XML attachments within a BizTalk message. This will also allow non-SOAP-friendly XML documents to be passed in SOAP messages. All in all, the latest developments add up to good news for SOAP and the future of cross-platform interoperability. 
With luck, you'll soon find yourself reaping the benefits of using SOAP to provide or consume Web services." See "Simple Object Access Protocol (SOAP)."
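The envelope-and-body structure described above is simple enough to sketch directly. The following Python snippet is a minimal illustration, not a production SOAP client: the GetQuote method and the urn:example:quotes namespace are invented for the example, though the envelope namespace URI is the real SOAP 1.1 one.

```python
import xml.etree.ElementTree as ET

# The standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

def make_rpc_envelope(method, params, method_ns):
    """Build a minimal SOAP 1.1-style RPC request as an XML string."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    # The method call is an element in the caller's own namespace;
    # each parameter becomes a child element.
    call = ET.SubElement(body, "{%s}%s" % (method_ns, method))
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

# Hypothetical RPC: ask a quote service for IBM's stock price.
msg = make_rpc_envelope("GetQuote", {"symbol": "IBM"}, "urn:example:quotes")
```

Note that, per the restriction mentioned above, the payload carries no DTD or processing instructions; it is just namespaced elements inside the Body.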

  • [August 21, 2000] "Bowstreet's Web Factory: Mass-Producing Business Webs." By Brian J. Dooley. In The Seybold Report on Internet Publishing Volume 4, Number 12 (August 2000), pages 1, 3-6. ['As Web publishing continues to blend with e-business, companies face new challenges in pulling content from diverse sources onto their sites. Whether your service connects to consumers, to other businesses, or to both, it's no easy feat to integrate personalized, real-time data from your partners with the content of your site. To help streamline this complicated process, Bowstreet has developed the Business Web Factory, a toolkit sites use to glue each other's services into an integrated whole for their customers. Its innovation -- exposing LDAP directory services as XML files -- has become the basis for an emerging standard (DSML) endorsed by rivals Microsoft, Novell, Sun, Oracle and IBM. It also is the foundation for Bowstreet's Business Web Factory and its new online service for building Web exchanges. . . Using XML to describe directory resources and profiles, Bowstreet's Business Web Factory and its sister service Business Web Exchange help companies build, share and snap on reusable software building blocks to create customized Web sites. Bowstreet's approach is designed specifically to scale when multiple trading partners collaborate on commercial sites for their customers.'] "Bowstreet's Business Web Factory is one of a new generation of tools designed to meld services and content from different vendors into a seamless web-based package, which might be presented to the customer as an integrated account or purchasing experience. Business Web Factory simplifies the complex task of rolling out customized sites in the thousands by applying a building-block approach. This is the same concept touted by Microsoft in its .NET initiative and supported in various ways by all of the vendors who are proponents of XML. 
Integration of third-party services in Web sites is hardly a new idea; it has existed since Web-based delivery tracking could be hot linked to a vendor's ordering system, or since financial and ordering services were made available to vendors via a seamless third-party checkout process. The difference lies in the level of integration, the flexibility in adding whole new kinds of services, and the ease with which new standards such as XML and SOAP permit integration between systems, legacy data, databases, delivery paradigms and the like. Business Web Factory relies upon two standards whose support base is now solidifying: XML and LDAP. XML uses rule files and descriptive tags to unambiguously identify and describe data and provide vendor-neutral hooks for processing it. Directory services based on LDAP keep track of services and profiles needed to create a customized Web site. Business Web Factory ties XML and directory services together with an automatic building process based on reusable models. This approach simplifies development and deployment by making it possible to construct complex Web operations using XML-based building blocks containing attributes that can be switched on or off according to need. If your services have not yet been converted to XML, Business Web Factory provides tools to either convert them or to directly incorporate other components, such as Java classes, COM and CORBA objects and Enterprise Java Beans..." See "Directory Services Markup Language (DSML)."

  • [August 21, 2000] "Battle Lines Forming Over Next E-Publishing Platform." By Mark Walter. In The Seybold Report on Internet Publishing Volume 4, Number 12 (August 2000), pages 18, 24. [Commentary. 'Microsoft's ambitious foray into e-books will heighten the PDF-OEB distinctions, but look for other issues, especially DRM, to be the real points of contention.'] "Microsoft Reader recognizes books formatted in Open E-Book format, encrypted using Microsoft's specific digital rights management (DRM) technology, which at this time is incompatible with Adobe's own WebBuy DRM. Microsoft Reader 1.0 is a first-generation product, with plenty of limitations that leave it unsuitable for many types of documents that Acrobat can handle -- documents with unusual characters, non-Roman languages and high-resolution graphics, to name just a few. But Reader, like Web browsers (Open E-Book is based on XML and HTML), will point out the weakness of Acrobat for on-screen reading, and it will improve -- Microsoft has already said it intends to upgrade the product to make it suitable for nonfiction and textbooks. One thing is for sure: the rise of OEB readers represents a real opportunity for printers and other service firms to offer e-book conversion and manufacturing services. Yes, there will be tools available that publishers can buy, but as the Reader itself matures in capability, so will the complexity of the task of re-expressing books written and designed for print in another medium. . . publishers will not be choosing between PDF and OEB; they'll be making both. Acrobat, because it still has uses beyond on-screen reading (including delivery of final form documents) will not suffer Navigator's fate, at least not that quickly. The real battle then, will not be over the data format, but over DRM." See "Open Ebook Initiative" and "Digital Property Rights Language (DPRL)."

  • [August 21, 2000] "Wireless Wars: Broadvision, Vignette launch platforms for mobile publishing." By Mark Walter. In The Seybold Report on Internet Publishing Volume 4, Number 12 (August 2000), page 13. ['Dueling for supremacy in the e-business application software market, powerhouses Vignette and Broadvision announced upgrades to their products that will automate the publishing of personalized Web content on wireless devices.'] Broadvision plans personalization: For Broadvision, the wireless upgrade to its One-to-One product family is the result of its acquisition of Interleaf last year. Broadvision is drawing on Interleaf's XML expertise -- including XML-to-WML transformations -- to build a production engine that can take personalized content in XML and convert it to various formats and templates on the fly. Today, this conversion runs inside One-to-One Publisher (formerly known as Interleaf BladeRunner), which lacks the rules-based personalization that characterizes Broadvision's One-to-One product. By the end of the year, Broadvision plans to transfer this facility to its One-to-One Enterprise content-delivery server -- in the form of an XSLT engine. Support for XSL style sheets will make it easy for Broadvision and its customers to add support for a wide array of device formats..."

  • [August 21, 2000] XForms Requirements. Reference: W3C Working Draft 21-August-2000, edited by Micah Dubinko (Cardiff), Sebastian Schnitzenbaumer (Mozquito Technologies), Malte Wedel (Mozquito Technologies), and Dave Raggett (W3C/HP). "After careful consideration, the HTML Working Group decided that the goals for the next generation of forms are incompatible with preserving full backwards compatibility with browsers designed for earlier versions of HTML. A forms sub-group was formed within the HTML Working Group, later becoming the XForms Working Group. It is our objective to provide a clean new forms model ('XForms') based on a set of well-defined requirements. The requirements described in this document are based on experience with a broad spectrum of form applications. This document provides a comprehensive set of requirements for the W3C's work on XForms. We envisage this work being conducted in several steps, starting with the development of a core forms module, followed by work on additional modules for specific features. The Modularization of XHTML provides a mechanism for defining modules which can be recombined as appropriate for the capabilities of different platforms." [cache]

  • [August 21, 2000] "Inside XSL-T." By Michael Classen. From (August 2000). ['XML may be all the buzz, but a lot of the heavy lifting is actually done with XSL. Extensible Style Sheet Language Transformation sounds like a mouthful, but XSLT is your key to converting XML documents into display languages like HTML.'] "XSL was initially devised to solve two problems: (1) Transforming an XML document into something else. (2) Formatting an XML document for display on a page-oriented device, such as a printer or a browser. Subsequently it has proven difficult to solve the second problem in a fashion that satisfies all the different requirements from low-resolution screen displays all the way to hi-res printing and copying. Furthermore, screen formatting is currently done with Cascading Style Sheets (CSS), so little interest developed in yet another method. The World Wide Web Consortium (W3C) then decided to split the two tasks into separate sub-standards, XSL Transformations (XSL-T) and XSL formatting objects (XSL-FO). While XSL-T has been an official recommendation since November of last year, XSL-FO is still in the making. A transformation expressed in XSLT describes rules for transforming a source tree into a result tree. The transformation is achieved by associating patterns with templates. Whenever a pattern matches elements in the source tree, a template is used to create part of the result tree. The result tree is separate from the source tree, and their structures can be completely different. In constructing the result tree, elements from the source tree can be filtered and reordered, and new elements can be added. A transformation expressed in XSLT is called a stylesheet in the case where XSLT is transforming into a display language, such as HTML or WML..."
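The match-pattern/template model described in the excerpt can be mimicked in a few lines of Python. This is not real XSLT, just a hand-rolled analogy using the standard library's ElementTree with invented element names, but it shows how per-element templates build a separate result tree from a source tree.

```python
import xml.etree.ElementTree as ET

SOURCE = "<doc><title>XSL-T</title><para>Rules, not scripts.</para></doc>"

# Each "template" turns one matched source element into result-tree nodes,
# mirroring XSLT's match/template pairs (a Python analogy, not real XSLT).
def title_template(el):
    h1 = ET.Element("h1")
    h1.text = el.text
    return h1

def para_template(el):
    p = ET.Element("p")
    p.text = el.text
    return p

TEMPLATES = {"title": title_template, "para": para_template}

def transform(source_root):
    # The result tree is built separately from the source tree;
    # its structure need not resemble the source at all.
    result = ET.Element("html")
    body = ET.SubElement(result, "body")
    for child in source_root:              # walk the source tree
        rule = TEMPLATES.get(child.tag)    # "pattern match" on element name
        if rule:
            body.append(rule(child))       # template contributes result nodes
    return result

html = ET.tostring(transform(ET.fromstring(SOURCE)), encoding="unicode")
```

In genuine XSLT the same transformation would be two `xsl:template` rules with `match="title"` and `match="para"`; the engine, not our loop, drives the matching.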

  • [August 21, 2000] "Site News Application Using ASP and Server-side XML." By Phil Sandler. In ASP Today (August 18, 2000). "As web programmers, it usually falls to us to provide a means to get this information in front of the user in a timely fashion. There are several ways that this can be accomplished. (1) opening a database connection and querying an underlying news table.... (2) generating and publishing flat HTML files in a scheduled or manual batch process. In this article, we are going to examine a third method, one that falls squarely between database driven ASP and flat HTML. We are going to use batch-updated XML to hold our news data and ASP to serve it up on the fly. We are going to use an Intranet departmental news application as an example. Many of the concepts discussed here could easily be adapted to any single- or multiple-category news application. The broader concepts could be applied to any system that requires periodic content updates." See also: "Enhance Input Forms Using XML."
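The approach the article describes, serving a batch-updated XML file on the fly instead of querying a database or publishing flat HTML, can be sketched in Python rather than ASP. The news format and field names here are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A batch process would rewrite this file periodically; the page handler
# only ever reads it (hypothetical format, invented for this sketch).
NEWS_XML = """<news>
  <item date="2000-08-10"><headline>New intranet section</headline></item>
  <item date="2000-08-18"><headline>Server upgraded</headline></item>
</news>"""

def render_news(xml_text, limit=5):
    """Render the XML news feed as an HTML fragment at request time,
    newest first -- no database connection needed."""
    items = ET.fromstring(xml_text).findall("item")
    items.sort(key=lambda i: i.get("date"), reverse=True)
    rows = ["<li>%s: %s</li>" % (i.get("date"), i.findtext("headline"))
            for i in items[:limit]]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"

fragment = render_news(NEWS_XML)
```

The trade-off is the one the article identifies: content freshness is bounded by the batch update interval, but each request avoids the cost of a database round trip.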

  • [August 19, 2000] "Gnome gets major commitments." By Dan Neel and Tom Sullivan. In InfoWorld (August 16, 2000). "During an impressive show of support from a wide range of industry players for the Gnome desktop environment, both Hewlett-Packard and Sun Microsystems declared Tuesday that they will use the Gnome environment as the default desktop interface for all future Unix PC shipments. Both companies will, however, continue to ship Microsoft's Windows desktop environment as the default for non-Unix PCs. Gnome, a free software project featuring more than 500 developers, has built what its proponents say is an easy-to-use, free desktop environment with the potential to unify both Linux and other Unix-like operating systems through a common interface. Up to now, industry observers have criticized Linux's interface as being too complicated for the majority of desktop computer users. The surprise announcements from the two companies came during a Gnome press conference at the fourth LinuxWorld Conference and Expo, in San Jose, California. . . Two primary goals were announced at Tuesday's press conference: the creation of the Gnome Foundation, which will be governed by a board of directors elected by volunteer developers who contribute to Gnome, and a list of five major Gnome initiatives aimed at creating an industry-wide open user environment. The first initiative is the establishment of the Gnome user environment as the unifying desktop for the Linux and Unix communities. The second is the adoption of OpenOffice.org technologies for integration into Gnome. OpenOffice.org is the open-source project through which Sun Microsystems is releasing the technology for the popular StarOffice productivity suite. Sun will also begin specifying XML office formats and publish them on the OpenOffice site, according to Boerries. The third initiative is the integration of the Mozilla browser technology into Gnome. Mozilla is the free, open-source version of the Netscape browser."

  • [August 19, 2000] "E-commerce puts directories in spotlight." By Stephanie Sanborn. In InfoWorld Volume 22, Issue 32 (August 07, 2000), page 36. "The explosion of business-to-business commerce and marketplaces -- environments where partners, customers, and suppliers interact using the same core of information -- is giving directories a chance to shine. To help leverage the store of valuable user data held in directories, the industry is turning toward metadirectory solutions and XML to simplify the task of integrating applications and systems. Serving as a layer between the directory itself and the business applications, metadirectories can help link multiple systems to relevant directory information, becoming a 'glue' to ensure all applications are synchronized around the same user data. . . According to Kim Cameron, an architect for Microsoft Metadirectory Solutions (MMS), XML is a 'directory wonder child' because it can tie together platforms and link identities across unconnected systems. Redmond, Wash.-based Microsoft recently released MMS 2.2, which has enhanced provisioning systems and a tighter interface with Active Directory. The XML wizard management agent for MMS 2.2 will allow MMS to interoperate with any XML data store and should emerge this fall. Officials said the company is planning to use XML in many applications, such as BizTalk, to ease application convergence. 'XML is certainly a piece of technology that's going to make using directories an awful lot easier, because the directory language is now basically in English -- using XML. It's very easy to look at XML-based files and coding and understand the kind of information that's being moved back and forth,' explains Jackson Shaw, a product manager for Windows 2000 server at Microsoft. 'XML is picking up steam, but there is still a long way to go to get a lot of applications, especially legacy apps, moved over to speaking XML.' 
For iPlanet, making sure developers can access directory information and use it in their applications is of utmost importance, because more directory-enabled applications result in more reasons to implement directories for business use, explains Wes Wasson, vice president of product marketing at Mountain View, Calif.-based iPlanet. The iPlanet Directory Application Integration (DAI) architecture sits atop the Unified User Management Services layer, which includes iPlanet directory and customer management solutions. Through the DAI, application developers can incorporate directory information into their applications using a series of tags rather than having to write out code and understand the intricacies of LDAP. According to David McNeely, director of product management for iPlanet directory and security products, a platform such as DAI that is extensible and makes a directory valuable to developers will help directory and metadirectory technology evolve. Novell is also aiming to ramp up directories' value through its DirXML technology, which is finally coming to light after being announced about a year ago."

  • [August 19, 2000] "Eliminate Web publishing bottlenecks. [Documentum 4i Review.]" By Mike Heck. In InfoWorld Volume 22, Issue 32 (August 07, 2000), page 62. "Documentum 4i eBusiness Edition 4.0 provides a first-rate solution for large-scale content management. Global 2000 and Internet companies will find this product moves content quickly from authors, through approvals, and onto live Web sites, allowing e-businesses to respond quickly to changes in the online marketplace and deliver timely, relevant information to customers and business partners. . . Documentum's template-based XML-authoring environment permits documents to be repurposed for different needs. The WebPublisher module moves content through a formal approval process and provides a tailored work view according to the role of the user. Enterprise scalability allows deployment to large server farms; XML templates accelerate site updates. . . creating new pages requires very little experience. Using predefined XML templates presented within a Web form, I selected the appropriate page header and footer and formatted the body of the page using a word-processor-like interface." ['Documentum 4i eBusiness Edition provides sophisticated native support for XML content management from content capture and creation through component management, format transformation and content delivery.']

  • [August 17, 2000] "Test Center Analysis: Apache's open-source XML Project will help speed development and adoption of XML." By Maggie Biggs. In InfoWorld Volume 22, Issue 33 (August 14, 2000), page 48. [Part of special coverage 'Linux In Depth'.] "The folks at The Apache Software Foundation (formerly known as Apache Group and creators of the popular Apache Web server) believe they have the answer: Meld XML implementation with open-source development paradigms. By leveraging the speed of open-source development and Apache's link with the standards bodies that are defining XML, companies can be on the fast track to implementing this leading-edge technology. Apache's XML Project was launched last fall with contributions from the open-source community as well as leading industry participants, such as Sun Microsystems and IBM. The project has three core goals that will translate into a competitive edge for companies that take advantage of the project. According to the project's Web site, the first objective is to create 'commercial-quality, standards-based XML solutions that are developed in an open and cooperative fashion.' The second goal is to provide feedback on XML implementations to standards bodies, such as the World Wide Web Consortium, or W3C. Lastly, the group expects to provide XML resources for other Apache projects. Apache's plans will widen the adoption of XML and speed up the process of implementing the standard, which will benefit businesses that plan to implement XML as a core part of their e-business strategies. In an unusual move, several commercial vendors and open-source XML developers have contributed technologies to the project, hoping to help put XML on the fast track. For example, IBM contributed its XML parser technology for Java and C++; the parser reads and validates XML documents. Also, Sun Microsystems contributed its XML parser and validation technology. 
In addition, Lotus Development offered the code for its XSLT (Extensible Stylesheet Language Transformation). XSLT is useful for reorganizing XML documents from one format to another. DataChannel contributed its Xpages technology, which helps developers build XML applications that integrate data from disparate sources. . ."

  • [August 17, 2000] "Rational unveils one-stop development kit." By Tim Fielden. In InfoWorld Volume 22, Issue 33 (August 14, 2000), pages 58-59. "Rational Software, the company best known for its modeling solutions, has bundled many of its stand-alone tools into an integrated, all-in-one kit intended to serve your staff throughout the entire life cycle of a project. In the box, we found tools grouped into three general categories. There were the developers' tools, including Rose, a visual component modeling program; Robot, an automated testing device; TestFactory, a reliability testing product; Purify, an automatic run-time error checker; and Quantify, a performance-testing application. We also had a group of collaboration tools at our disposal: RequisitePro for managing requirements, ClearQuest for tracking defects and managing change control, and SoDA for automatically generating product documentation. Round-trip engineering: Another bonus [is that] we could reverse-engineer dynamic Web sites consisting of Active Server Pages and HTML into a visual model. A round-trip engineering feature lets you model, generate, and reverse-engineer XML DTDs (Document Type Definitions), dramatically increasing the speed and quality of e-commerce application development. Furthermore, we were happy to find that the Suite provided support for both the IBM Application Framework and Windows Distributed interNet Application. Apart from Rational's tried-and-true roster of applications, some features are entirely new. For example, there's improved business modeling and a Web interface for managing requirements, which lets users add, change, and view requirements regardless of their geographical location. You also get support for round-trip engineering of Enterprise JavaBeans component architectures into UML visual models, meaning that changes to the source code will be reflected in the models and vice versa. 
The benefit: Users can model and understand how components, whether developed or purchased, fit into the overall application architecture."

  • [August 17, 2000] "NewsML Controlled Vocabularies and DUIDS." "I have updated the NewsML Controlled Vocabularies so they all have Duids. There are also some additional vocabularies and an XSLT stylesheet to view them in. The compressed package has also been updated." "Duid is a 'Document-unique Identifier'. It must satisfy the rules for XML ID [type] attributes: it must only contain name characters, and it must start with a name-start character (not a digit). Its value must be unique within any NewsML document. Every NewsML element type has Duid as an optional attribute. Combined with the Identifier element, providing a value for the Duid of any element in a NewsML document makes the element globally identifiable. The Identifier element gives global identification to the document, and the Duid provides local identification for the element within the document." Compare 'Euid'. "Euid is an 'Element-unique Identifier'. Its value must be unique among elements of the same element-type and having the same parent element. Use of the Euid attribute makes it possible to identify any NewsML element within the context of its local branch of the NewsML document tree. This makes it possible to copy, or include by reference, subtrees into new combinations in ways that would break the uniqueness of Duids (thereby forcing new Duids to be allocated), but still being able to retain the identity of each element. If Euids are maintained at every level, it is possible to identify, for example 'The ContentItem whose Euid is abc within the NewsComponent whose Euid is def'. Such identification patterns would be preserved even after 'pruning and grafting' of subtrees..." [From David Allen, 17 Aug 2000.] See "NewsML and IPTC2000."
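The Duid and Euid uniqueness rules quoted above lend themselves to a short validation sketch. This Python fragment is a hypothetical checker, and the sample document is invented and far simpler than real NewsML; it verifies that Duids are unique across the whole document and Euids are unique among same-type siblings.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A toy stand-in for a NewsML document (structure invented for this sketch).
DOC = """<NewsML Duid="root">
  <NewsItem Duid="n1">
    <ContentItem Duid="c1" Euid="abc"/>
    <ContentItem Duid="c2" Euid="def"/>
  </NewsItem>
</NewsML>"""

def duplicate_duids(root):
    """Duid must be unique within the entire document."""
    counts = Counter(el.get("Duid") for el in root.iter() if el.get("Duid"))
    return [d for d, n in counts.items() if n > 1]

def duplicate_euids(root):
    """Euid must be unique among siblings of the same element type."""
    clashes = []
    for parent in root.iter():
        seen = Counter((c.tag, c.get("Euid")) for c in parent if c.get("Euid"))
        clashes += [key for key, n in seen.items() if n > 1]
    return clashes

root = ET.fromstring(DOC)
```

Checking the two rules separately matches the spec's intent: grafting a subtree elsewhere may force new Duids, while the Euid path ('the ContentItem whose Euid is abc...') survives the move.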

  • [August 15, 2000] "XML Markup Language. [Hardcopy.]" By Deborah Lynne Wiley. In EContent Magazine (August 2000). ['This month, HARDCOPY looks at the up-and-coming markup language XML (eXtensible Markup Language), developed by the W3C. The books provide four different perspectives: application, enterprise development, hands-on, and overview. Pick your favorite and dive into this strange new world! (1) Data on the Web: From Relations to Semistructured Data and XML. By Serge Abiteboul, Peter Buneman, and Dan Suciu. "When we think about the Web, we tend to think about text. When we think about databases, we tend to think about data. This book tries to combine the two views and show how the use of XML as a data exchange language can enhance the flexibility, adaptability, and scalability of your information..." (2) Building Corporate Portals with XML. By Clive Finkelstein and Peter Aiken. "Portals are the buzzword of the year. Internet portals guide millions of users to their desired Web site. Corporate portals are designed to guide your internal users through the masses of data within your enterprise, and to gain new knowledge from combining and analyzing previously separate data. This hefty book aims to guide you through the entire process..." (3) XML Web Documents from Scratch. By Jesse Liberty and Mike Kraley. "If you prefer learning by doing, this book is a step-by-step guide to actually creating XML documents. It starts with a Microsoft Word 2000 document, and uses the "save as HTML feature" to start the XML coding process. Apparently in this latest version of Word, Microsoft has built in some XML tagging along with the standard HTML. The authors refer to an accompanying CD-ROM that contains the source code and sample documents..." (4) The ABCs of XML. By Norman Desmarais. "This book provides a general overview of XML and how it might be used in libraries. Written by a librarian, it is more focused on relevant applications than the previous three books. 
However, you will not learn to code XML documents using this book. What you will learn are concepts and issues related to XML..."']

  • [August 15, 2000] "Revolutionizing Content for XML." By Debbie Kenny (Information Mapping). August 2000. From American Society for Training & Development (ASTD), and LearningCircuits. ['Major Web tools are rushing to support XML because it promises to revolutionize how information is created, stored, and accessed. Many organizations are already using XML as the basis for creating sophisticated database-publishing solutions.'] "Organizations that publish large quantities of content over the Internet, intranets, and extranets are discovering that XML offers several capabilities useful for developing industrial-strength applications. Perhaps the most significant benefit of XML is its capability to store metadata -- information that defines a document's logical structure -- that databases can access and use in different ways. For example, a product description could be tagged in XML and accessed for use in a product specification list, a Website, a training manual, and a troubleshooting guide. When the information is revised, its various forms are immediately updated. Furthermore, XML separates content from layout data, allowing greater flexibility for design and redesign. The effectiveness of XML hinges on a consistent and standard structure. Often, when an organization begins to implement an XML-based solution and its supporting technologies, it encounters difficulties converting source information into the new format. Most companies discover that existing information was created in an inconsistent manner and lacks standard style use, clear units of information, and a discernable document structure."

  • [August 15, 2000] "XML: Green Light, Go." By Brian Maschhoff. August 2000. From American Society for Training & Development (ASTD), and LearningCircuits. ['The drive to create interoperability standards among digital learning products has people singing future praises of Extensible Markup Language. Here's why you should include XML in your e-learning strategy now.'] "Eventually, all e-learning products will use XML, making developing, delivering, and managing Web-based training easier and faster. In all likelihood, though, HTML (the Web publishing standard) won't be singing its swan song for years to come. So, what does XML mean for training now? XML: (1) Streamlines content development. What makes XML special is how it tags -- or labels -- learning objects with precise, customized descriptions, making the content 'smarter.' HTML is a markup language that tells Web browsers how to present data on Webpages, but XML markup tags tell applications what the content means and how it's organized. (2) Helps make learning content more interactive. Making your content interactive and engaging is one key to effective online learning. XML can simplify the integration of multimedia elements into courseware, making it easier to develop and maintain. (3) Adds long-term value to learning content. Developing content is expensive, so it's essential to create it in a format that eases maintenance and reuse. One of the key benefits of Web-based training is that content can, at least in principle, be updated instantaneously. If the content is stored in static HTML pages, however, maintenance costs over time can exceed development costs quickly. (4) Enhances content delivery. XML's flexibility and versatility have wide-reaching implications for how online courses will be delivered. Delivering training content in XML will enable e-learners to tailor their experience by choosing the most effective presentation style for their needs. 
Supplemental information will be included in an accessible, yet unobtrusive way..."

  • [August 15, 2000] "XML 101." By Brian Maschhoff. August 2000. From American Society for Training & Development (ASTD), and LearningCircuits. "Both XML and HTML are markup languages. HTML tags indicate how content should be displayed in a browser, and XML tags organize content by meaning. XML data exchange will become practical only when the participants agree on a standard, and proof that XML is gaining momentum can be found in the vigorous efforts by various industries and organizations to collectively define schemas for their areas. The premier collection of these can be found in the repository of the Organization for the Advancement of Structured Information Standards (OASIS). Several standards for computer-based instruction have been developed in the last few years. The most recent of these is the Sharable Courseware Object Reference Model (SCORM) released by the Advanced Distributed Learning Initiative, and a schema in DTD format has also been made available. Though the standards are still evolving, no one should be discouraged from using XML immediately. XML created using one schema can usually be converted to another using XSLT (Extensible Stylesheet Language Transformations), yet another goody in the XML tool bag. What's important is to begin creating structured information now."
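The article's closing point, that XML created under one schema can usually be converted to another, can be illustrated even without an XSLT processor. A minimal sketch using Python's standard library, with two hypothetical tag vocabularies standing in for the source and target schemas:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from one course-metadata vocabulary to another.
TAG_MAP = {"lesson": "module", "heading": "title", "body": "content"}

def convert(elem):
    """Recursively rebuild a tree, renaming tags per TAG_MAP (XSLT-style)."""
    new = ET.Element(TAG_MAP.get(elem.tag, elem.tag), elem.attrib)
    new.text = elem.text
    for child in elem:
        new.append(convert(child))
    return new

src = ET.fromstring(
    "<lesson id='1'><heading>Intro</heading><body>Welcome.</body></lesson>")
out = convert(src)
print(ET.tostring(out, encoding="unicode"))
```

A real XSLT stylesheet can of course do far more (reordering, filtering, computed content), but the structural renaming above is the essence of schema-to-schema conversion.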

  • [August 12, 2000] "XHub: An Online Service for Creating OEB eBooks from XML Documents." By Elli Mylonas (Brown University). Presentation at Extreme Markup 2000 (August 2000). "Brown University's Scholarly Technology Group has developed a web-based environment, based on an underlying XSLT conversion architecture, to support the creation of OEB (Open eBook Publication Structure) ebooks from XML inputs. This service allows users to perform intelligent conversions of documents in formats like XHTML, TEI, DocBook, and others, into XML eBook Publications. This presentation will describe the design of XHub, some of the interesting problems solved in the course of its development, and some broader issues related to managing real-world XML transformations. We will also describe plans to use XHub as a test bed for exploring topics such as annotation exchange." See the earlier announcement for XHub and (generally) "Open eBook Initiative."

  • [August 11, 2000] "Using Regular Tree Automata as XML Schemas." By Boris Chidlovskii (Xerox Research Centre Europe, Grenoble Laboratory, 6 Chemin de Maupertuis, F-38240 Meylan, France). Pages 89-98 (with 28 references) in Proceedings of IEEE Advances in Digital Libraries 2000 [Washington, DC, USA, May 22-24, 2000; Los Alamitos, CA, USA: IEEE Computer Society, 2000, edited by J. Hoppenbrouwers, T. de Souza Lima, M. Papazoglou, and A. Sheth]. PostScript, 147K. Document: P91004. "We address the problem of tight XML schemas and propose regular tree automata to model XML data. We show that the tree automata model is more powerful than the XML DTDs and is closed under main algebraic operations. We introduce the XML query algebra based on the tree automata model, and discuss the query optimization and query pruning techniques. Finally we show the conversion of tree automata schema into XML DTDs. . . Based on SGML (ISO 8879), XML is designed to meet the challenges of large-scale electronic publishing and data exchange on the Web. XML documents are composed of nested elements and the logical structure of elements is defined by Document Type Definitions (DTDs). Defined as grammars, DTDs are highly flexible; the structure they impose in documents is often less restrictive than the rigid structure of relational schemas but more restrictive than allowing any-to-any relationships between object types. The knowledge of DTDs is highly useful for the query formulation through the user graphic interface and for the XML query optimization. However, despite all these features, DTDs appear to be somewhat unsatisfactory as a schema formalism. Some obvious missing things in DTDs, such as elementary data types, structural and validation constraints, have been recently addressed in the W3C XML Schema proposal. There remains however another conceptual difficulty with DTDs, not addressed by the W3C proposal. 
The DTDs appear to be surprisingly inflexible when one attempts to capture modifications imposed by various XML retrieval and manipulation mechanisms. Even for simple XML selection queries, DTDs face the tightness problem, when no DTD can precisely represent the structure of query results. To understand the cause of the tightness problem, we have to look at the way SGML/XML define a document structure. The content models of individual elements in XML documents are regular expressions. However, the language defined by an entire DTD is not regular; it is formalized as an extended context-free language (ECFL). The class of context-free grammars is not closed under certain elementary operations, such as intersection and negation, and extending context-free grammars with regular expressions brings in no additional power. Consequently, if an XML query is more complex than a regular expression, we cannot necessarily make a context-free grammar (the result DTD) for the set of documents fitting both the original DTD and the query formulation. . . In this paper, we address the problem of tight XML schemas and introduce a novel mechanism for modeling XML documents, based on tree automata. This model has a number of beneficial properties similar to those of string regular expressions; in particular, tree automata form a boolean algebra. With the tree automata model, we consider a powerful set of algebraic operations, defined in a way similar to the relational algebra. For any query expressed by these operations, a precise schema represented by a tree automaton can be obtained in polynomial time. The algebraic operations allow using regular path expressions in query formulation, and we show how an XML schema given by tree automata can be used for the pruning of path expressions. The tree automata mechanism is more powerful than DTDs. Thus, any DTD can be correctly captured by a tree automaton. 
On the other hand, we show that translation of tree automata into DTDs may require some generalization. We describe the translation procedure and prove that the generalization performed in this way is minimal. Research on XML data has been preceded by intensive research on semi-structured data. . . Our algorithm 1 generates the minimal element contents for XML tags; however, it does not cope with DTD unambiguity. Some ambiguous DTDs can be converted into equivalent unambiguous ones. Unfortunately, in most cases, the conversion requires generalization of grammars. For DTD unambiguity, we rely on work done by Wood and Brüggemann-Klein, who studied the conditions under which regular expressions used for the definition of XML element contents are unambiguous. They introduced the notion of 1-determinism and proved that only those regular expressions satisfying the 1-deterministic condition can be used in a DTD. Our method also exploits the 1-deterministic property established for element contents of an unambiguous DTD. Orbit properties of 1-deterministic grammars give a sufficient condition that a given finite-state automaton recognizes words that form a 1-deterministic regular language. If the automaton does not have the orbit property, it should be generalized by merging nodes and adding transitions. For a detailed description of the method transforming an ambiguous regular expression into an unambiguous one, we refer to [H. Ahonen, 'Disambiguation of SGML Content Models.']. [Conclusion:] We study the use of tree automata as schemas for XML files. Tree automata are proven to be closed under boolean operators; this allows us to design an XML query language in a way similar to the relational algebra, and to induce a precise schema for any XML query formulated in this language. We show how to translate tree automata into DTDs and discuss DTD unambiguity." For related research, see "SGML/XML and Forest/Hedge Automata Theory."
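The paper's starting observation, that XML content models are regular expressions over child element names, is easy to demonstrate: a DTD model such as (title, author+, chapter*) compiles to an ordinary regular expression over the sequence of child tags. A minimal sketch (the element names are hypothetical):

```python
import re

# DTD-style content model (title, author+, chapter*) expressed as a regex
# over a comma-terminated sequence of child element names.
CONTENT_MODEL = re.compile(r"^title,(author,)+(chapter,)*$")

def matches(children):
    """Check a list of child tag names against the content model."""
    return bool(CONTENT_MODEL.match("".join(c + "," for c in children)))

print(matches(["title", "author", "chapter", "chapter"]))  # valid
print(matches(["title", "chapter"]))                       # missing author
```

What the paper shows is that while each element's content model is regular in this sense, the language of an entire DTD is not, which is where the tightness problem described above comes from.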

  • [August 11, 2000] "On Wrapping Query Languages and Efficient XML Integration." By Vassilis Christophides [email:], Sophie Cluet, and Jérôme Siméon. Presented at ACM SIGMOD 2000 - International Conference on Management of Data (Dallas, TX, USA, 16-18 May 2000). Published in SIGMOD Record Volume 29, Number 2 (June 2000), pages 141-152 (with 49 references). "Modern applications (Web portals, digital libraries, etc.) require integrated access to various information sources (from traditional DBMS to semistructured Web repositories), fast deployment and low maintenance cost in a rapidly evolving environment. Because of its flexibility, there is an increasing interest in using XML as a middleware model for such applications. XML enables fast wrapping and declarative integration. However, query processing in XML-based integration systems is still penalized by the lack of an algebra with adequate optimization properties and by the difficulty of understanding source query capabilities. In this paper, we propose an algebraic approach to support efficient XML query evaluation. We define a general purpose algebra suitable for semistructured or XML query languages. We show how this algebra can be used, with appropriate type information, to also wrap more structured query languages such as OQL or SQL. Finally, we develop new optimization techniques for XML-based integration systems. . . At the time of writing, the new XML version of the system, with its algebraic evaluation engine, is running and stable. The implementation of the optimizer is still ongoing. This first implementation is based on heuristics and a simple linear search strategy consisting of the three rewriting rounds presented previously. [Conclusion:] We have presented an algebraic framework to support efficient query evaluation in XML integration systems. 
It relies on a general purpose algebra that captures the expressive power of semistructured or XML query languages but can also wrap, with appropriate type information, more structured query languages such as OQL or SQL. The proposed XML algebra comes equipped with a number of equivalences offering interesting optimization opportunities. Notably, they make it possible to optimize query compositions, exploit type information, and push query evaluation to the external source." Available also in PostScript format. See "XML and Query Languages." [cache] Also: 'full version', [cache].

  • [August 11, 2000] "Comparative Analysis of Five XML Query Languages." By Angela Bonifati and Stefano Ceri (Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci 32, I-20133 Milano, Italy). [Email: bonifati/]. September 15, 1999, submitted for publication. "XML is becoming the most relevant new standard for data representation and exchange on the WWW. Novel languages for extracting and restructuring the XML content have been proposed, some in the tradition of database query languages (i.e., SQL, OQL), others more closely inspired by XML. No standard for an XML query language has yet been decided, but the discussion is ongoing within the World Wide Web Consortium and within many academic institutions and Internet-related major companies. We present a comparison of five representative query languages for XML, highlighting their common features and differences. [...] Conclusions: A Unified View. The five reviewed languages can be organized in a taxonomy, where: (1) LOREL and XML-QL are the OQL-like and XML-like representatives of Class 2 of expressive query languages for XML, playing the same role as high-level SQL standards and languages (e.g., SQL2) in the relational world. Our study indicates that they need certain additions in order to become equivalent in power, in which case it would be possible to translate between them. Currently, a major portion of the queries that they accept can be translated from any one language to another. (2) XSL and XQL are representative of Class 1 of single-document query languages, playing the same role as core SQL standards and languages (e.g., the SQL supported by ODBC) in the relational world; they do not have joins. Their expressive power is included within the expressive power of Class 2 languages. Their rationale is to extract information from a single document, to be expressed as a single string and passed as one of the URL parameters. 
(3) XML-GL can be considered as a graphical query interface to XML, playing the same role as graphical query interfaces (e.g., QBE) in the relational world. The queries supported by XML-GL are the most relevant queries supported by Class 2 languages. Once the common features (as initially identified in this paper) become fully understood, it is possible to envision a collection of translators between languages of the same class, and/or between languages of different classes, and/or from the graphic language XML-GL to the programmatic languages of Classes 1 and 2. In this way, query languages for XML will constitute a language hierarchy similar to the one existing for relational and object-relational databases." See "XML and Query Languages." [cache]
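The survey's Class 1 languages (XSL patterns, XQL) extract from a single document and lack joins; the limited XPath subset in Python's ElementTree behaves much the same way and can serve as a rough illustration. The bibliography document is hypothetical:

```python
import xml.etree.ElementTree as ET

DOC = """
<bib>
  <book year="1999"><title>Data on the Web</title></book>
  <book year="2000"><title>XML Query Languages</title></book>
</bib>
"""

root = ET.fromstring(DOC)

# A Class-1-style single-document query: titles of books published in 2000.
# The path expression selects within one document; no join is possible.
titles = [b.findtext("title") for b in root.findall("book[@year='2000']")]
print(titles)
```

A Class 2 language like XML-QL could additionally join this document against another one, which is exactly the expressive power the survey says Class 1 lacks.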

  • [August 11, 2000] "Integrating Keyword Search into XML Query Processing." By Daniela Florescu, Ioana Manolescu, and Donald Kossmann (INRIA). Pages 119-35 (with 20 references) in Computer Networks. Proceedings of the Ninth International World Wide Web Conference, Amsterdam, Netherlands, May 15-19, 2000. "Due to the popularity of the XML data format, several query languages for XML have been proposed, specially devised to handle data of which the structure is unknown, loose or absent. While these languages are rich enough to allow for querying the content and structure of an XML document, a varying or unknown structure can make formulating queries a very difficult task. We propose an extension to XML query languages that enables keyword searching at the granularity of XML elements, that helps novice users formulate queries and also yields new optimization opportunities for the query processor. We present an implementation of this extension on top of a commercial relational DBMS; we then discuss implementation choices and performance results." Available also in PostScript, [cache; alt URL HTML]. See also the related publications from Florescu.
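The element-granularity keyword search the paper proposes can be approximated, very roughly, as a scan that returns the elements whose own text contains the keyword; this sketch ignores the paper's relational implementation and optimizations, and the document is hypothetical:

```python
import xml.etree.ElementTree as ET

DOC = """
<articles>
  <article><title>XML storage</title><abstract>Mapping XML to relations.</abstract></article>
  <article><title>Query rewriting</title><abstract>Joins over views.</abstract></article>
</articles>
"""

def keyword_hits(root, word):
    """Return tags of elements whose direct text contains the keyword,
    giving search results at element rather than document granularity."""
    word = word.lower()
    return [el.tag for el in root.iter() if el.text and word in el.text.lower()]

root = ET.fromstring(DOC)
print(keyword_hits(root, "xml"))
```

The point of the paper is that such keyword predicates can be mixed into structured queries (and optimized by the query processor), rather than run as a separate full-text search over whole documents.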

  • [August 11, 2000] "An XML-Based Framework for Dynamic SNMP MIB Extension." By Ajita John, Keith Vanderveen, and Binay Sugla. Paper presented at The 10th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management, DSOM '99 (Zurich, Switzerland, 11-13 October 1999). Pages 107-120 (with 21 references) in Active Technologies for Network and Service Management. Proceedings of the Tenth IFIP/IEEE International Workshop on Distributed Systems: Operations and Management. Lecture Notes in Computer Science, Volume 1700. Germany: Springer-Verlag, 1999. Edited by R. Stadler and B. Stiller. "Current SNMP-based management frameworks make it difficult for a network administrator to dynamically choose the variables comprising the MIB at managed elements. This is because most SNMP implementations represent the MIB implicitly as part of the agent code -- an approach which impedes the runtime transfer of the MIB as a separate entity that can be easily shipped around the network and incorporated into different applications. We propose the use of XML to represent the MIB at managed elements. We describe a network management toolkit which uses XML and the Document Object Model (DOM) to specify a MIB at runtime. This approach allows the MIB structure to be serialized and shipped over the network between managers and agents. This use of the DOM for MIB specification facilitates dynamic MIB modification. The use of XML also allows the MIB to be easily browsed and seamlessly integrated with online documentation for management tasks. XML further allows for the easy interchange of data between network management applications using different information models."
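The core mechanism described here, a MIB specified as an XML document and serialized for shipment between manager and agent, can be sketched with Python's standard DOM implementation; the variable names below are ordinary SNMP object names used purely as an illustration:

```python
import xml.dom.minidom as minidom

# Build a tiny, hypothetical MIB description as a DOM tree.
doc = minidom.Document()
mib = doc.appendChild(doc.createElement("mib"))
for name, syntax in [("ifInOctets", "Counter32"), ("sysUpTime", "TimeTicks")]:
    var = doc.createElement("variable")
    var.setAttribute("name", name)
    var.setAttribute("syntax", syntax)
    mib.appendChild(var)

wire = doc.toxml()  # serialized form, ready to ship between manager and agent
print(wire)

# The receiving side reparses the bytes and walks the same structure,
# which is what makes runtime MIB transfer and extension possible.
again = minidom.parseString(wire)
names = [v.getAttribute("name") for v in again.getElementsByTagName("variable")]
print(names)
```

Because the MIB travels as data rather than living implicitly in agent code, either side can also modify the DOM tree at runtime, which is the dynamic-extension property the paper is after.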

  • [August 11, 2000] "XNAMI - An Extensible XML-Based Paradigm for Network and Application Management Instrumentation." By Ajita John, Keith Vanderveen, and Binay Sugla (Bell Laboratories, Lucent Technologies, 101 Crawfords Corner Road, Holmdel, NJ 07733; Email: {ajita, vandervn, sugla} Paper presented at the IEEE International Conference on Networks. 28 September - 1 October, 1999, Brisbane, Australia. "We introduce a new paradigm for the retrieval and presentation of management data. XNAMI is a Java-based network and application management toolkit which uses XML and the Document Object Model to specify a Management Information Base (MIB) at run-time. This approach allows the MIB to be serialized and shipped over the network between managers and agents. It also facilitates runtime agent extension and allows the MIB to be easily browsed and seamlessly integrated with online documentation. XML further allows for the interchange of data between management applications using different information models. . . [Conclusions and Future Work:] This paper describes the ideas and implementation behind a toolkit for management of networks and applications. The toolkit provides a framework that facilitates runtime MIB extension within the SNMP framework. However, it can be easily extended to use HTTP. It also introduces the concept of an XML-based MIB. The use of XML along with DOM allows the MIB to be shipped over the network, easily browsed, and seamlessly integrated with online documentation for configuration and management tasks. XML also facilitates interoperability by allowing for the easy interchange of data between management applications using different information models. We plan to extend the XML description of MIB objects in XNAMI to include suggestions to the XNAMI manager on how data for that object should be presented visually to the user. 
For example, an object such as the number of dropped packets at an interface might most logically be presented as a line chart and plotted with respect to time, while the up or down status of an interface could be presented using an icon which is either green (up) or red (down). The advantage to storing visual presentation cues such as these with the XNAMI agent is that managers will be able to more appropriately display data from MIB objects with which they have no familiarity. We also plan to explore the use of the XNAMI framework in various application management areas such as distributed program debugging where program variables in memory have to be monitored."

  • [August 11, 2000] "On Extending the XML Engine with Query-Processing Capabilities." By Klemens Boehm (Informationssysteme, ETH Zentrum, CH-8092 Zürich). Pages 127-138 in Proceedings of IEEE Advances in Digital Libraries 2000 [Washington, DC, USA, May 22-24, 2000; Los Alamitos, CA, USA: IEEE Computer Society, 2000, edited by J. Hoppenbrouwers, T. de Souza Lima, M. Papazoglou, and A. Sheth]. "We study how to efficiently evaluate queries over XML documents whose representation is according to the XML specification, i.e., XML files. The software architecture is as follows: the XML engine (i.e., XML parser) makes the structure of the documents explicit. The query processor operates directly on the output of the XML engine. We see two basic alternatives for how such a query processor operates: event-based and tree-based. In the first case, the query processor immediately checks for each event, e.g., begin of an element, if it contributes to a query result or if it invalidates current partial results. In the second case, the query processor generates an explicit transient representation of the document structure and evaluates the query set-at-a-time. This work evaluates these approaches and some optimizations in quantitative terms. Our main results are as follows. The event-based evaluation scheme is approximately 10% faster, even with all the optimizations from this article. The overhead of the query processors is small, compared to the running times of the XML engine. Finally, exploiting DTD information in this particular context does not lead to better performance."
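The two evaluation styles compared in this paper can be mimicked with Python's standard library: iterparse delivers parse events that a query can check immediately (event-based), while fromstring materializes the whole tree for set-at-a-time evaluation (tree-based). The document and query are hypothetical:

```python
import io
import xml.etree.ElementTree as ET

DOC = b"<doc><a><b>1</b></a><a><b>2</b></a><c>3</c></doc>"

# Event-based: inspect each 'end' event as it arrives; no full tree is kept.
event_hits = []
for event, elem in ET.iterparse(io.BytesIO(DOC), events=("end",)):
    if elem.tag == "b":
        event_hits.append(elem.text)
        elem.clear()  # discard processed subtrees to keep memory flat

# Tree-based: materialize the document, then evaluate set-at-a-time.
tree_hits = [e.text for e in ET.fromstring(DOC).iter("b")]

print(event_hits, tree_hits)
```

Both strategies yield the same answer; the paper's contribution is the quantitative comparison (the event-based scheme coming out about 10% faster in its experiments).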

  • [August 11, 2000] "Subsumption for XML Types." By Gabriel M. Kuper and Jérôme Siméon. Draft manuscript, June 2000. 20 pages, 34 references. "XML data is often used (validated, stored, queried, etc) with respect to different types. Understanding the relationship between these types can provide important information for manipulating this data. We propose a notion of subsumption for XML to capture such relationships. Subsumption relies on a syntactic mapping between types, and can be used for facilitating validation and query processing. We study the properties of subsumption, in particular the notion of the greatest lower bound of two schemas, and show how this can be used as a guide for selecting a storage structure. While less powerful than inclusion, subsumption generalizes several other mechanisms for reusing types, notably extension and refinement from XML Schema, and subtyping. [...] XML is a data format for Web applications. As opposed to e.g., relational databases, XML documents do not have to be created and used with respect to a fixed, existing schema. This is particularly useful in Web applications, for simplifying exchange of documents and for dealing with semistructured data. But the lack of typing has many drawbacks, inspiring many proposals of type systems for XML. The main challenge in this context is to design a typing scheme that retains the portability and flexibility of untyped XML. To achieve this goal, the above proposals depart from traditional typing frameworks in a number of ways. First, in order to deal with both structured and semistructured data, they support very powerful primitives, such as regular expressions and predicate languages to describe atomic values . Secondly, documents remain independent from their type, which allows the same document to be typed in multiple ways according to various application needs. 
These features result in additional complexity: the fact that data is often used with respect to different types means that it is difficult to recover the traditional advantages (such as safety and performance enhancements) that one expects from type systems. To get these advantages back, one needs to understand how the types of the same document relate to each other. In this paper, we propose a notion of subsumption to capture the relationship between XML types. Intuitively, subsumption captures not just the fact that one type is contained in another, but also captures some of the structural relationships between the two schemas. We show that subsumption can be used to facilitate commonly used type-related operations on XML data, such as type assignment, or for query processing. We compare subsumption with several other mechanisms aimed at reusing types. Subsumption is less powerful than inclusion, but it captures refinement and extension, recently introduced by XML Schema, subtyping, as in traditional type systems, as well as the instantiation mechanism of [...]. As a consequence, subsumption provides some formal foundations for these notions, and techniques to take advantage of them. We study the lattice-theoretic properties of subsumption. These provide techniques to rewrite inclusion into subsumption. Notably, we show the existence of a greatest lower bound. The greatest lower bound captures the information from several schemas, while preserving the relationship with them, and can be used as the basis for storage design..." [cache]

  • [August 11, 2000] "YATL: a Functional and Declarative Language for XML." By Sophie Cluet (INRIA Rocquencourt) and Jérôme Siméon. Draft manuscript, submitted to ICFP '2000. This paper describes YATL, a language to query, convert and integrate XML data. YATL comes from the database community: it is not Turing complete, but it captures a large class of useful data transformations, it is declarative and subject to optimization. The first version of YATL was based on logic programming and datalog. This paper presents the new version of YATL which benefits from functional programming in two ways. First, YATL is a functional language. The functional design aided in areas that are traditionally problematic for database languages: notably recursion, treatment of references, pattern matching, and processing of alternatives. Second, YATL is implemented in MSL. . . XML is expected to play a central role in the new generation of Web applications, ranging from electronic commerce and corporate portals to digital libraries. In this paper, we present YATL, a functional language that can be used for all these tasks, and supports database-strength optimization techniques. Starting with SQL and OQL, there has been a large body of work in the database community on the design of optimizable languages for the efficient manipulation of large amounts of data. Recently, many languages have been proposed to support the above operations on semistructured data or XML. For instance, Lorel, XML-QL and XQL have been used for querying, datalog and XSLT for conversion, MSL and the first version of YATL for integration. Yet, due to the number of new features required by each of these tasks and their complex interactions, none of the above languages supports all of them at once. 
We believe that the key that enabled us to capture these features in a single framework is the functional design chosen for the new YATL (as opposed to our original design, which was based on logic programming and datalog). To the best of our knowledge, YATL is the first database language that supports at the same time querying, conversion, and integration. . . YATL is a small functional language for an XML data model based on ordered trees with references. It provides basic primitives to create or access trees, supports function definitions and (possibly recursive) function calls. On top of this core language we add two advanced features: (i) functions with a limited form of side-effect, designed to create tree identifiers and build references (the so-called Skolem functions), and (ii) two specialized declarative operations, designed for optimization purposes (the so-called iterators). In Section 2.1, we present the data model, and the basic primitives on trees. In Section 2.2, we introduce the functional core of the language plus Skolem functions. Finally, in Section 2.3, we present the language iterators. YATL comes equipped with a predefined set of basic functions to access, construct or compare trees, perform arithmetic operations, etc. In this section, we introduce our data model and type system, and show examples of simple combinations of these core functions..." [cache]

  • [August 11, 2000] "An Algebra for XML Query." By Mary Fernandez, Jérôme Siméon, and Philip Wadler. Draft manuscript, June 2000. This document proposes an algebra for XML Query. This work builds on long standing traditions in the database community. In particular, we have been inspired by systems such as SQL, OQL, and nested relational algebra (NRA). We have also been inspired by systems such as Quilt, UnQL, XDuce, XML-QL, XPath, XQL, and YaTL. We give citations for all these systems below. In the database world, it is common to translate a query language into an algebra; this happens in SQL, OQL, and NRA, among others. The purpose of the algebra is twofold. First, the algebra is used to give a semantics for the query language, so the operations of the algebra should be well-defined. Second, the algebra is used to support query optimization, so the algebra should possess a rich set of laws. Our algebra is powerful enough to capture the semantics of many XML query languages, and the laws we give include analogues of most of the laws of relational algebra. In the database world, it is common for a query language to exploit schemas or types; this happens in SQL, OQL, and NRA, among others. The purpose of types is twofold. Types can be used to detect certain kinds of errors at compile time and to support query optimization. DTDs and XML Schema can be thought of as providing something like types for XML. Our algebra uses a simple type system that captures the essence of XML Schema. The type system is close to that used in XDuce. Our type system can detect common type errors and support optimization. The best way to learn any language is to use it. To better familiarize readers with the algebra, we have implemented a type checker and an interpreter for the algebra in OCaml. A demonstration version of the system is available... The demo system allows you to type in your own queries to be type checked and evaluated. 
All the examples in this paper can be executed by the demo system. This paper describes the key features of the algebra, but does not address features such as attributes, element identity, namespaces, collation, and key constraints, among others. We believe they can be added within the framework given here. The paper is organized as follows. A tutorial introduction is presented in Section 2. After reading the tutorial, the reader will have seen the entire algebra and should be able to write example queries. A summary of the algebra's operators and type system is given in Section 3. We present some equivalence and optimization laws of the algebra in Section 4. Finally, we give the static typing rules for the algebra in Section 5. Although it contains the most challenging material, we have tried to make the content as accessible as possible..." [cache]
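The general flavor of such an algebra, iteration with selection and projection over sequences of elements, can be imitated with ordinary comprehensions over a simple tuple encoding of trees. This is a loose sketch, not the paper's actual operators or type system:

```python
# Encode XML elements as (tag, children-or-text) pairs; hypothetical data.
bib = ("bib", [
    ("book", [("title", "TCP/IP Illustrated"), ("year", "1994")]),
    ("book", [("title", "Data on the Web"),    ("year", "1999")]),
])

def children(elem, tag):
    """Algebra-style projection: the children of elem with the given tag."""
    return [c for c in elem[1] if c[0] == tag]

def text(elem, tag):
    """Text content of the first child with the given tag."""
    return children(elem, tag)[0][1]

# A comprehension playing the role of the algebra's iterate/select/return:
result = [text(b, "title") for b in children(bib, "book")
          if text(b, "year") > "1995"]
print(result)
```

The value of casting queries in such a form is exactly what the paper argues: each operator has a clear semantics, and algebraic laws (e.g., pushing the selection inside the iteration) license rewrites for optimization.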

  • [August 11, 2000] "A Data Model and Algebra for XML Query." By Mary Fernandez, Jérôme Siméon, Dan Suciu and Philip Wadler. Draft manuscript, November 1999. "This note presents a possible data model and algebra for an XML query language. It should be compared with the alternative proposal. The algebra is derived from the nested relational algebra, which is a widely-used algebra for semi-structured and object-oriented databases. For instance, similar techniques are used in the implementation of OQL. We differ from other presentations of nested relational algebra in that we make heavy use of list comprehensions, a standard notation in the functional programming community. We find list comprehensions slightly easier to manipulate than the more traditional algebraic operators, but it is not hard to translate comprehensions into these operators (or vice versa). One important aspect of XML is not covered by traditional nested relational algebras, namely, the structure imposed by a DTD or Schema. (So far as we can see, the proposal also does not cover this aspect.) We extend the nested relational algebra with operators on regular expressions to capture this additional structure. The operators for regular expressions are also expressed with a comprehension notation, similar to that used for lists. Again, a similar technique is common in the functional programming community. We use the functional programming language Haskell as a notation for presenting the algebra. This allows us to use a notation that is formal and concrete, without having to invent one from scratch. It also means you can download and play with the algebra, for instance, to try implementing your favorite query. We use a slightly modified version of Haskell that supports regular expression comprehensions. Code that implements the algebra and a modified Hugs interpreter for Haskell can be downloaded from the URL at the head of this document. The algebra is at the logical level, not the physical level. 
Hence, we do not have operators that exploit an index to compute joins efficiently. This concern is deferred to another level. The remainder of this paper is organized as follows. Section 2 presents the data model. Section 3 presents the algebra. Section 4 presents some of the laws that apply to list comprehensions and regular expressions." [cache]

  • [August 11, 2000] "XML Schema Languages: Beyond DTD." By Demetrios Ioannides (Michigan State University, East Lansing, MI). In Library Hi Tech Volume 18, Number 1 (2000), pages 9-14 (with 6 tables, 14 references). [ISSN: 0737-8831.] Abstract: "The flexibility and extensibility of XML have largely contributed to its wide acceptance beyond the traditional realm of SGML. Yet, there is still one more obstacle to be overcome before XML is able to become the evangelized universal data/document format. The obstacle is posed by the limitations of the legacy standard for constraining the contents of an XML document. The traditionally used DTD (document type definition) format does not lend itself to be used in the wide variety of applications XML is capable of handling. The World Wide Web Consortium (W3C) has charged the XML schema working group with the task of developing a schema language to replace DTD. This XML schema language is evolving based on early drafts of XML schema languages. Each one of these early efforts adopted a slightly different approach, but all of them were moving in the same direction. . . The new XML schema is not only an attempt to simplify existing schemas. It is an effort to create a language capable of defining the set of constraints of any possible data resource. Table VI, 'XML schema goals,' illustrates these goals. The importance of having a fully fledged and universally accepted schema language is paramount. Without it, no serious migration from legacy data structures to XML will be possible. Databases with unwieldy data structures such as MARC will greatly benefit from such a migration. We will no longer depend on data structures that were predefined years ago to meet different needs. The extensible nature of these schema languages will allow the easy creation of any data structure, thus providing the flexibility mandated by the mutability of today's information needs. 
Bringing a wealth of metadata into an extensible format and allowing it to take full advantage of the dynamic nature of networking is an extremely exciting prospect for information professionals. The first step, premature as it may be, yet very symbolic, is the CORC Project's creation of a DTD for MARC. In the future, intelligent schemas will allow for blending of existing metadata with full-text, multimedia, and much more. The possibilities are endless..." For schema description and references, see "XML Schemas."
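The datatype gap the abstract describes can be made concrete with a short sketch: a DTD can only declare an element such as <year> as character data, whereas a schema language can constrain it to a typed range. The element names and the range below are invented for illustration, and the constraint is emulated in Python's standard library rather than written in any schema syntax:

```python
import xml.etree.ElementTree as ET

# A DTD can say <!ELEMENT year (#PCDATA)> -- any text at all is valid.
# A schema language can declare <year> as an integer within a range; here
# that typed constraint is emulated in code to show what the DTD cannot say.
record = ET.fromstring(
    "<record><title>Some Title</title><year>1999</year></record>")

def validate_year(elem):
    """Check the kind of typed, range-bounded constraint that XML Schema
    expresses declaratively (element name and bounds are illustrative)."""
    value = elem.findtext("year")
    return value is not None and value.isdigit() and 1450 <= int(value) <= 2100

print(validate_year(record))  # a well-formed, in-range year passes
```

The point of the sketch is only that such checks live in application code when a DTD is the sole schema mechanism, which is exactly the migration burden the article attributes to legacy formats like MARC.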

  • [August 11, 2000] "XML: How it will be applied to digital library systems." By Hyun-Hee Kim and Chang-Seok Choi (Myongji University, Seoul, Korea). In The Electronic Library Volume 18, Number 3 (2000), pages 183-189 (with 5 figures, 13 references). [ISSN: 0264-0473.] "The purpose of this paper is to show how XML is applied to digital library systems. For a better understanding of XML, the major features of XML are reviewed and compared with those of HTML. An experimental XML-based metadata retrieval system, which is designed as a subsystem of the Korean Virtual Library and Information System (VINIS), is demonstrated. The metadata retrieval system consists of two modules: a retrieval module and a browsing module. The retrieval module allows the retrieval of metadata stored in Microsoft Access files and the display of search results in an XML file format, while the browsing module permits browsing of metadata in XML/XSL document formats. Finally, some issues for a more efficient application of XML to digital libraries are discussed... VINIS is a digital library that was designed and has been co-managed by the Library and Information Department of Myongji University and KIEP (Korean Institute for International Economic Policy) since 1997. The main feature of VINIS is that the system, with a search engine capable of retrieving databases such as document and expert databases, is able to support reference librarians or replace them. The system supplies electronic reference services, i.e., remote reference services supplied to library clients through networks. An XML-based metadata retrieval system, a subsystem of the Korean digital library system VINIS, is built to efficiently retrieve economics-related electronic resources such as Internet sites, online databases (e.g., EPIC), and general files... KIEP has published an e-journal titled Journal of International Economic Policy Studies in a text file format. 
However, because the text file format is inconvenient for handling and processing data, it was decided to use an XML format for publishing the e-journal, instead of the text file format. By using XML, XLink and XSL, it is also planned to create a hyperthesaurus which makes it possible to introduce experts and newcomers alike to a term definition and its related terms. The hyperthesaurus files will be designed to include automated linking that connects the contents of the hyperthesaurus files to documents. An inline extended link could be used to apply and filter sets of relevant links on demand, if application software supports it. Since the inline extended link allows one to connect two or more targets, it is possible to examine co-citation analysis of Web pages and features of Web resources by using it. It is also expected that out-of-line extended links will be used for browsing and filtering relevant links on request... In this article, following a review of XML features, an experimental XML-based metadata retrieval system has been described to show how XML is applied to digital library systems. There are some areas where improvements are needed to make an application of XML to digital libraries more efficient and effective. First, the XLink/XPointer and XSL specifications, which are still working drafts, need to be developed and finalized in the near future. Second, more sophisticated XML application programs are needed to support XML-based search queries and user-friendly editing interfaces. An open DTD format, which will allow information retrieval systems to retrieve documents and compress them into one single format, is also needed to control Web documents more efficiently."

  • [August 10, 2000] "Modeling data entry and operations in WebML." By Stefano Ceri, Piero Fraternali, Aldo Bongio, and Andrea Maurino. Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza L. Da Vinci, 32 - 20133 Milano, Italy. Paper presented at WebDB 2000, Dallas, 2000. "Web Modeling Language (WebML) is a notation for visually specifying complex Web sites at the conceptual level. All the concepts of WebML are specified both graphically and in XML; in particular, navigation and composition abstractions are based on a restricted number of hypertext components (units) which are assembled into pages and interconnected by links. During implementation, pages and units are automatically translated into server-side scripting templates, which enable the display of data dynamically retrieved from heterogeneous data sources. This paper extends WebML with data entry and operation units, for gathering information from clients and invoking arbitrary operations. Predefined operations are also proposed as built-in primitives for supporting standard updates on the content of the underlying data sources (represented as entities and relationships). This natural extension of WebML permits the visual modeling of Web pages integrating read and write access, an essential aspect of many E-commerce applications (including user profiling and shopping cart management)... This paper has shown two new Web modeling abstractions, which integrate data entry and operation invocation into WebML, an existing modeling notation for specifying read-only Web sites. These extensions can be orthogonally combined with primitives for composing hypertexts and defining their navigation, thus building on user skills matured in the conceptual specification of read-only Web sites. 
WebML is currently applied in the re-engineering of a large e-commerce site, where the write access primitives described in this paper are used to specify and implement a shopping cart connected to a legacy application for order confirmation and delivery. WebML read-only primitives are fully implemented in a Web design tool suite called ToriiSoft. The write extensions described in the paper are under implementation and will be available for beta testing outside the W3I3 Project in early Summer 2000. Following the present implementation of WebML, the novel WebML primitives will be automatically translated into multiple rendition languages (including HTML, WML, XML+XSL, and special-purpose languages for TeleText applications) and server side scripting languages (including Microsoft's Active Server Pages and JavaSoft's Java Server Pages)." [alt URL] See: "Web Modeling Language (WebML)."

  • [August 10, 2000] "XML: Current Developments and Future Challenges for the Database Community." By Stefano Ceri, Piero Fraternali, and Stefano Paraboschi. Presented at the Seventh Conference on Extending Database Technology, March 27-31, 2000, Konstanz, Germany. EDBT 2000: 3-17. "While we can take as a fact that 'the Web changes everything', we argue that 'XML is the means' for such a change to make a significant step forward. We therefore regard XML-related research as the most promising and challenging direction for the community of database researchers. In this paper, we approach XML-related research by taking three progressive perspectives. We first consider XML as a data representation standard (in the small), then as a data interchange standard (in the large), and finally as a basis for building a new repository technology. After a broad and necessarily coarse-grain analysis, we turn our focus to three specific research projects which are currently ongoing at the Politecnico di Milano, concerned with XML query languages, with active document management, and with XML-based specifications of Web sites." See also the presentation slides [54], HTML; slides in PDF format. [cache, paper]

  • [August 10, 2000] "Design and Implementation of an Access Control Processor for XML Documents." By Ernesto Damiani, Sabrina De Capitani di Vimercati, Stefano Paraboschi, and Pierangela Samarati. Paper presented at the Ninth International World Wide Web Conference, Amsterdam, Netherlands, 15-19 May 2000. "More and more information is distributed in XML format, both on corporate Intranets and on the global Net. In this paper an Access Control System for XML is described, allowing for definition and enforcement of access restrictions directly on the structure and content of XML documents, thus providing a simple and effective way for users to protect information at the same granularity level provided by the language itself." See the diagrams in the paper for the conceptual architecture: "A central authority uses a pool of XML DTDs to specify the format of information to be exchanged within the organization. XML document instances of such DTDs are defined and maintained at each site, describing the site-specific information. The schema-instance relationship between XML documents and DTDs naturally supports the distinction between two levels of authorizations, both of them allowing for fine grained specifications. Namely, we distinguish: 1) Low-level authorizations, associated with XML documents, providing full control on authorizations on a document-by-document basis; 2) High-level authorizations, associated with XML DTDs, providing organization-wide and department-wide declarations of access permissions. Centrally specified DTD-level authorizations can be mandatory, stating impositions of the central authority to lower organizational levels where XML documents are created and managed, usually by means of a network of federated Web sites. This technique allows for easy, centralized modification of access permissions on large document sets, and provides a general, abstract way of specifying access authorizations. 
In other words, specifying authorizations at the DTD level cleanly separates access control specified via XML markup from access control policies defined for the individual data sources (e.g., relational databases vs. file systems) which are different from one another both in granularity and abstraction level. Each departmental authority managing a Web site retains the right to define its own authorizations (again, at the granularity of XML tags) on individual documents, or on document sets by means of wild cards. In our model, local authorities can also define authorizations at the DTD level; however such authorizations only apply to the documents of the local domain. . . The approach proposed is focused on enforcing and resolving fine grained authorizations with respect to the data model and semantics. Although presented in association with a specific approach to authorization specification and subject identification, as supported in the current prototype, its operation is independent from such approaches and could then be applied in combination with different administrative policies. For instance, it can be combined with the treatment of roles and of authentication/authorization certificates. We are currently exploring such extensions." See further: "XML and Encryption."
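The interplay between the two authorization levels described above can be sketched in a few lines. Everything here is an assumption made for illustration: the rule keys, the grant/deny vocabulary, the closed default, and the "mandatory" marking stand in for the paper's actual policy language.

```python
# Hypothetical sketch of the two-level model: DTD-level rules give
# organization-wide defaults, document-level rules refine them, and
# centrally imposed (mandatory) DTD-level rules cannot be overridden.
# All rule content below is invented for illustration.
dtd_level = {("clerk", "/order/total"): "deny",
             ("manager", "/order/total"): "grant"}
doc_level = {("clerk", "/order/items"): "grant"}

def resolve(subject, path, mandatory=frozenset()):
    """Document-level rules win unless the DTD-level rule is mandatory;
    anything unmatched falls back to a closed (deny) policy."""
    key = (subject, path)
    if key in dtd_level and key in mandatory:
        return dtd_level[key]          # imposition by the central authority
    if key in doc_level:
        return doc_level[key]          # site-specific refinement
    return dtd_level.get(key, "deny")  # organization-wide default

print(resolve("clerk", "/order/items"))  # grant
print(resolve("clerk", "/order/total"))  # deny
```

The resolution order is the design point: it mirrors the paper's claim that centralized DTD-level rules give cheap organization-wide control while documents retain local autonomy where the center has not spoken.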

  • [August 10, 2000] "XML Encryption Syntax and Processing." From the W3C public XML Encryption list. "This strawman proposal describes how the proposed W3C XML Encryption specification might look and work should the W3C choose to charter an XML Encryption Work Group. Though it is conceivable that XML Encryption could be used to encrypt any type of data, encryption of XML-encoded data is of special interest. This is because XML's ability to capture the structure and semantics of data unleashes new applications for the area of encryption. To this end, an important design consideration is to not encrypt more of an XML instance's structure and semantics than is necessary. For example, suppose there is an XML instance containing a list of customers including their names, addresses, and credit card numbers. A contracted marketing employee may be entrusted to see the names and addresses but must not see the credit card number. If an application knows it should just be decrypting the content of <name> elements, the XML instance needs to maintain its structure identifying what is a 'name' and what isn't. Otherwise the application would have to decrypt the other data just to find out what it was supposed to be decrypting in the first place, which is problematic from both a security and performance point of view. So what level of granularity is needed? XML's document model provides access to a number of node types including elements, attributes, processing instructions, comments, and text nodes. However, because elements and attributes are the nodes intended for defining structure and semantics, the XML Encryption model (illustrated in the following examples) restricts itself to handling those... The centerpiece of XML Encryption is the <EncryptedNode> element. It has an attribute, NodeType, which indicates the type of node that was encrypted: element, element content, attribute, or attribute value. 
The encrypted node appears as a base64-encoded string which forms the content of the <EncryptedNode> element..." Contacts: Ed Simon and Brian LaMacchia. Note on the discussion list: "This list is for discussion about XML encryption and related (potential) IETF or W3C activity. The purpose of this list is to foster the development of a community of interest and a set of design issues and requirements that might prompt a BOF or workshop on the topic. This discussion list is public, it is not moderated, and it is not part of a chartered activity of the IETF or W3C." [cache]
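The element-content case described by the strawman can be sketched as follows, reusing the <EncryptedNode> element and NodeType attribute quoted above. The `cipher` callable is a placeholder: a real implementation would apply an actual encryption algorithm, which this sketch deliberately omits.

```python
import base64
import xml.etree.ElementTree as ET

def encrypt_content(doc, tag, cipher):
    """Replace the content of every <tag> element with an <EncryptedNode>
    child whose text is the base64-encoded ciphertext, following the
    strawman's shape. `cipher` is a stand-in for real encryption."""
    for elem in doc.iter(tag):
        ciphertext = base64.b64encode(cipher(elem.text.encode())).decode()
        elem.text = None
        enc = ET.SubElement(elem, "EncryptedNode",
                            NodeType="element content")
        enc.text = ciphertext

doc = ET.fromstring(
    "<customer><name>Alice</name><card>4111111111111111</card></customer>")
# Toy "cipher" (byte reversal) purely so the sketch runs self-contained.
encrypt_content(doc, "card", lambda b: b[::-1])
print(ET.tostring(doc, encoding="unicode"))
```

Note how the result preserves exactly the property the proposal argues for: the `<name>` content stays readable and the document's structure still identifies which element holds the protected value, so a consumer need not decrypt anything to find it.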

  • [August 10, 2000] "Element-Wise XML Encryption." By Hiroshi Maruyama and Takeshi Imamura (IBM Research, Tokyo Research Laboratory). April 19, 2000. "When transmitting data over the Internet, in most cases the standard encryption protocols such as IPSec and SSL are good enough for achieving confidentiality during the transmission. Secure mails such as Pretty Good Privacy (PGP) and S/MIME can be used for encrypting data even after the message is received and stored in a file system. These methods are to encrypt an XML document as a whole. However, there are situations in which certain parts of an XML document need to be encrypted and the rest should be in clear text. A few motivating examples are shown below. . ."

  • [August 10, 2000] "Overview of IBM XML Security Suite." By Satoshi Hada (IBM Tokyo Research Laboratory). 2000/3/27. See the IBM web site for details. ['The XML Security Suite provides security features such as digital signature, element-wise encryption, and access control to Internet business-to-business transactions. As of 07/25/2000, it supports Xerces-J 1.1 and the latest working draft. The DOMHASH implementation conforms to RFC 2803, and new GUI-based sample programs are provided for the ASN.1/XML Translator and Element-wise Encryption libraries.']

  • [August 10, 2000] "XML Document Security and e-Business Applications." By Michiharu Kudo (Tokyo Research Laboratory, IBM Japan Ltd.) and S. Hada. Paper to be presented at CCS '00, the 7th ACM Conference on Computer and Communication Security, November 1-4, 2000, Athens, Greece. Note in this connection the comments by Michiharu Kudo on the W3C XML Encryption discussion list: "The idea of XML fine-grained access control is very interesting. Our team in Tokyo Research Lab has been interested and involved in several aspects of XML security such as digital signature, element-wise encryption, and access control on XML documents as well. Someone may say that standardization for digital signature and encryption on XML is more essential compared to that of XML access control. Yes, however, it is often the case that an XML document such as an e-contract contains multi-level security information and the access to that document must be controlled; e.g., a sub-portion of the original XML may have a digital signature that must be protected from anonymous read access. Or when the access comes from a specific department, access is allowed but must be logged. For these purposes, it is nice to have a fine-grained access control policy specification language for XML documents, and also reasonable to provide such a language defined in XML. Thus we designed XACL (XML Access Control specification Language) and implemented a prototype system for e-commerce applications. However, there could be various language definitions, while they have many issues that could be shared in common. Thus I think that it is very good to propose this to some standardization unit as a first step." See: "XML access control language (XACL)." - "The XML Access Control Language (XACL) is centered around a subject-privilege-object oriented security model in the context of a particular XML document. 
This means, by writing rules in XACL a policy author is able to define who can exercise what access privileges on a particular XML document. The notion of subject comprises identity and role, with identity possibly including information about group or organization membership. The granularity of object is as fine as single elements within this document. The set of possible privileges currently consists of five types (read, write, create, delete, clone), but is not limited to these. In addition to subject, privilege and object, a condition can be added to the rule. By specifying enforcement conditions, temporal conditions and data-dependent conditions, more flexible rules can be written."
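The subject-privilege-object-condition model just described can be sketched in a few lines. The rule fields, the closed-by-default evaluation, and the condition callables below are assumptions made for illustration, not actual XACL syntax.

```python
# Illustrative rule set in the subject-privilege-object-condition shape
# described for XACL. All names and the evaluation order are invented.
rules = [
    {"subject": {"role": "auditor"}, "privilege": "read",
     "object": "/contract/signature",
     "condition": lambda ctx: ctx.get("logged", False)},  # enforcement condition
    {"subject": {"role": "editor"}, "privilege": "write",
     "object": "/contract/terms",
     "condition": lambda ctx: True},
]

def is_permitted(role, privilege, obj, ctx):
    """Grant only when some rule matches the subject's role, the requested
    privilege, the target element, and the rule's condition; deny otherwise."""
    return any(r["subject"]["role"] == role
               and r["privilege"] == privilege
               and r["object"] == obj
               and r["condition"](ctx)
               for r in rules)

print(is_permitted("auditor", "read", "/contract/signature", {"logged": True}))
```

The condition slot is what distinguishes this model from a plain access matrix: the same rule can express "allowed, but only if the access is logged," matching the e-contract example in the quote above.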

  • [August 10, 2000] "Securing XML Documents." By Ernesto Damiani, Sabrina De Capitani di Vimercati, Stefano Paraboschi, and Pierangela Samarati. Proceedings of EDBT 2000, Konstanz, Germany, March 2000. Published in Lecture Notes in Computer Science, Number 1777. "Web-based applications greatly increase information availability and ease of access, which is optimal for public information. The distribution and sharing by the Web of information that must be accessed in a selective way requires the definition and enforcement of security controls, ensuring that information will be accessible only to authorized entities. Approaches proposed to this end operate independently of the semantics of the data to be protected, and for this reason prove limited. The eXtensible Markup Language (XML), a markup language promoted by the World Wide Web Consortium (W3C), represents an important opportunity to solve this problem. We present an access control model to protect information distributed on the Web that, by exploiting XML's own capabilities, allows the definition and enforcement of access restrictions directly on the structure and content of XML documents. We also present a language for the specification of access restrictions that uses standard notations and concepts and briefly describe a system architecture for access control enforcement based on existing technology. [This work was supported in part by the INTERDATA and DATA-X - MURST projects and by the Fifth (EC) Framework Programme under the FASTER project.]" See also: "Design and Implementation of an Access Control Processor for XML Documents." [cache]

  • [August 10, 2000] "XML Access Control Systems: A Component-Based Approach." By Ernesto Damiani, Sabrina De Capitani di Vimercati, Stefano Paraboschi, Pierangela Samarati. Paper to be presented at the Fourteenth Annual IFIP [International Federation for Information Processing] WG 11.3 Working Conference on Database Security, Amsterdam, The Netherlands, August 21-23, 2000.

  • [August 10, 2000] 'XML serialization of an XML document'. "We have been developing a schema for representing the infoset of a document, with the intention of using it to compare the output of XML Schema implementations. In the referenced directory, infoset-basic-subset.xsd is a schema allowing any subset of the infoset; infoset-psv-subset.xsd is the same for the post-schema-validation infoset." From Richard Tobin, posting to XML-DEV, 2000-08-10. For schema description and references, see "XML Schemas." [cache]

  • [August 10, 2000] "Processing Inclusions with XSLT." By Eric van der Vlist. From (August 11, 2000). ['Processing document inclusions with general XML tools can be problematic. This article proposes a way of preserving inclusion information through SAX-based processing.'] "The consequences of using more than one file to create an XML document, or XML inclusion, is a topic that inflames discussion lists, and for which finding a general solution is like trying to square the circle. In this article, we will show how customized parsers can expose a more complete document model through a SAX interface and help process compound documents through standard XML tools such as XSLT. Most of the XML APIs and standards are focused on providing, in a convenient way, all the information needed to process and display the data embedded in XML documents. Applied to document inclusions, this means that XML processors are required to replace the inclusion instruction by the content of the included resource -- this is the best thing to do for common formatting and information extraction tasks, but results in a loss of information that can be unacceptable when transforming these documents with XML tools. This topic has been discussed a number of times on different mailing lists, and the feeling of many can be summarized by a post from Rick Geimer on the XSL List..." For related resources, see "Extensible Stylesheet Language (XSL/XSLT)."

  • [August 10, 2000] "Putting RDF to Work." By Edd Dumbill. From (August 09, 2000). ['Tool and API support for the Resource Description Framework is slowly coming of age. Edd Dumbill takes a look at RDFDB, one of the most exciting new RDF toolkits.'] "Over recent months, members of the www-rdf-interest mailing list have been working at creating practical applications of RDF technology. Notable among these efforts have been Dan Connolly's work with using XSLT to generate RDF from web pages, and R.V. Guha's lightweight RDF database project. RDF has always had the appeal of a Grand Unification Theory of the Internet, promising to create an information backbone into which many diverse information sources can be connected. With every source representing information in the same way, the prospect is that structured queries over the whole Web become possible. That's the promise, anyway. The reality has been somewhat more frustrating. RDF has been ostracized by many for a complex and confusing syntax, which more often than not obscures the real value of the platform. One also gets the feeling, RDF being the inaugural application of namespaces, that there's a certain contingent who will never forgive it for that! [...] RDFDB offers a great backbone -- storage and query facilities -- for integrating diverse information sources. In its early stages now, it's a project that deserves to get more mindshare. The SQL-like syntax brings a familiarity to querying that other, more Prolog-like, mechanisms don't. Architecturally, I find the implementation of RDFDB as a database server a great advantage. It immediately makes multiple data sources and clients a reality, and makes cross-platform implementation easy (writing a language client to RDFDB is pretty trivial, I managed a workable first cut in 10 lines of Perl)." For related references, see "Resource Description Framework (RDF)."
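The triple-pattern querying behind RDFDB's SQL-like interface can be sketched with a toy in-memory store. The `query` function, its wildcard convention, and the sample statements below are invented for illustration; they are not RDFDB's actual API.

```python
# A toy triple store: every statement is a (subject, predicate, object)
# tuple, and a query is a pattern where None matches anything. This is
# the conceptual core of what an RDF database indexes and serves.
triples = [
    ("http://example.org/page1", "dc:creator", "Dan"),
    ("http://example.org/page1", "dc:title", "RDF Intro"),
    ("http://example.org/page2", "dc:creator", "Edd"),
]

def query(s=None, p=None, o=None):
    """Return all triples matching the (s, p, o) pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

print(query(p="dc:creator"))  # every creator statement in the store
```

Because every source is reduced to the same triple shape, one pattern language queries them all, which is the "information backbone" promise the article describes; a server like RDFDB adds persistence, indexing, and a network protocol on top of this idea.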

  • [August 10, 2000] "XML-Deviant: A Few Bumps." By Edd Dumbill. From (August 11, 2000). ['Some problems are due to success, some are growing pains, and some just refuse to go away. XML has all of these, chronicled as ever by the XML-Deviant.'] "This week I'm reporting on a few bumps in XML's otherwise streamlined route to world domination: one of them the consequence of success, one of progress, and another a bogeyman lurking under XML implementors' beds."

  • [August 08, 2000] "Getting to XML." By Anne Chen. In eWEEK Volume 17, Number 32 (August 07, 2000), pages 55-65. ['The new B2B lingua franca will need to live with EDI. Here's how to mix the two.'] "Rob Fusillo is caught in the middle. The CIO at scientific products e-marketplace Inc. would love to embrace XML as the exclusive technology to be used by buyers and sellers exchanging information on his business-to-business e-commerce site. Doing so -- and eliminating expensive EDI-based transactions and the VANs they require -- would, he estimates, cut transaction costs by 50 percent. But here's the rub: Fusillo can't mandate XML because, among the 1,000 business buyers and sellers that use, 15 percent of the information exchanged comes to or from large corporations that transfer data to the e-marketplace using the electronic data interchange format. And they aren't about to drop EDI, no matter how much Fusillo may want them to. So he has to support both EDI and XML as well as nonautomated data transfer methods, such as fax and e-mail. For some very good reasons, XML has quickly gained a reputation as the lingua franca of B2B e-commerce. Based on Internet protocols such as HTTP, XML is much easier and less expensive to deploy on the Web than EDI, making it a natural for smaller businesses without the IS staffs or money to deploy EDI. In fact, by next year, 70 percent of all B2B transactions executed on the Web will be done using XML, Gartner Group Inc. predicts. The bulk of those transactions will involve small and midsize enterprises that today do not use EDI..." See also in this eWEEK report: (1) "EDI vs. XML" ['Large enterprises with investments in EDI can use translation engines to convert XML documents']; (2) "Snaring the right XML version is tricky." "Although extensible markup language has been described as the lingua franca of B2B e-commerce, the fact is that there are still too many conflicting dialects of the language. 
The best IT managers can do, experts say, is to use a version of XML most relevant to their industries and their data exchange needs. Often, however, the reality is that they have to support multiple versions of XML. While the World Wide Web Consortium has already developed Version 1.0 of the XML specification, the area of concern centers on the formatting of XML and the specific expressions that most companies use to define objects represented in an XML document, such as invoice, company and partner. Any two companies exchanging XML must agree on a common DTD (Document Type Definition) so that each application will know how to interpret the XML data it receives. If companies use even slightly different data descriptors (such as one organization using CO for "company" while another uses COMPANY) without a consensus on how key business terms are defined, there is no guarantee that even companies within the same vertical industry will treat their data in a consistent manner. In many industries, standard XML definitions for even commonly used data elements have not yet been established. The health care, financial services and manufacturing industries have organizations focusing on creating standard XML definitions, but it will take time..."
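At its simplest, the vocabulary mismatch described above (CO vs. COMPANY) is a tag-mapping problem. A hypothetical sketch of such a normalization layer follows; the mapping table stands in for whatever agreement two partners would actually negotiate, and only the CO/COMPANY pair comes from the article.

```python
import xml.etree.ElementTree as ET

# Assumed partner agreement: the receiving side's canonical tag names.
# Only CO -> COMPANY is taken from the article; the rest is illustrative.
TAG_MAP = {"CO": "COMPANY"}

def normalize(xml_text):
    """Rename elements according to the agreed mapping so the receiving
    application sees one consistent vocabulary."""
    doc = ET.fromstring(xml_text)
    for elem in doc.iter():
        elem.tag = TAG_MAP.get(elem.tag, elem.tag)
    return ET.tostring(doc, encoding="unicode")

print(normalize("<invoice><CO>Acme</CO></invoice>"))
# <invoice><COMPANY>Acme</COMPANY></invoice>
```

Real dialect differences go beyond tag names (structure, attributes, code lists), which is why the article's conclusion stands: a mapping layer helps, but industry-wide agreed definitions are what remove the guesswork.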

  • [August 08, 2000] "XML takes on graphics. Vector-based spec promises to deliver flexible, high-resolution images." By Roberta Holland. In eWEEK Volume 17, Number 32 (August 07, 2000), page 18. "Standards groups and vendors continue to push the scope of XML with new specifications and products that take advantage of the technology. Last week, the spotlight turned to how Extensible Markup Language can transform traditionally bulky, pixel-based graphics into lightweight and richer vector-based graphics, as the World Wide Web Consortium moved forward with the Scalable Vector Graphics specification. The W3C issued SVG as a candidate recommendation, which is two steps before a final recommendation in its standards process, and extended a test suite on its site. Already there are 15 implementations of SVG, many more than HTML had at the same stage in the process, Lilley said. The implementations available on the W3C's site include offerings from IBM, Adobe Systems Inc. and Corel Corp. The outgoing chairman of the W3C is high on XML's ability to add functionality to Web applications in a way HTML cannot. 'The breadth [of XML] has grown very much over the last two years,' said Jean-François Abramatic, who works from the W3C's Paris office. 'Now there are lots of issues on the drawing board that need to be addressed. XML has become the foundation on top of which most of the technology and developments of the W3C are based.' Abramatic announced last week that he is joining software component vendor Ilog S.A. as senior vice president of research and development, and that he will phase out his W3C responsibilities. He will remain chairman until a replacement is named." See "W3C Scalable Vector Graphics (SVG)."

  • [August 08, 2000] "XML-to-EDI links forged." By Tom Sullivan. In InfoWorld (August 08, 2000). "Bluestone Software, in Philadelphia, and XML Solutions, in McLean, Va., have aligned to enable the real-time exchange of data between EDI (electronic data interchange) and XML-based systems. 'The solution is a common bridge from XML documents to the old EDI,' said John Capobianco, executive vice president at Bluestone. 'It can expand the scope of communication by using XML as a standard vehicle [for data exchange].' The integration of Bluestone's Total-e-B2B e-business Integration Server with XML Solutions' XEDI Translator and Schema Central will enable companies with EDI systems, which previously communicated only with other EDI systems, to conduct business with partners using XML." See the announcement: "Bluestone Software and XML Solutions Team to Deliver Real-Time Exchange Between EDI and XML-Based Systems. Bluestone's B2B Platform Integrated with XMLSolutions' Translation and Schema Management to Enable Enterprises to Easily Exchange Data Between EDI and XML-based Systems."

  • [August 07, 2000] "Commerce One Integrates Electronic Data Interchange. Alliance with GE global exchange services handles billing and tracking of purchases." By Cheryl Rosen. In InformationWeek (July 31, 2000), page 28. "E-commerce platform provider Commerce One last week sealed an alliance with GE Global Exchange Services that analysts say will offer a new challenge to Ariba Inc., Commerce One's biggest competitor. The deal gives Commerce One access to GE's mammoth network of large global customers and its expertise in transaction processing. GE Global Exchange gains a partner in integrating its electronic data interchange system with Internet technologies, reaching small and midsize companies. 'We can accept XML inputs from Commerce One and convert them into protocols like X.12 and Edifact, which have represented the lingua franca of business-to-business E-commerce for over a decade,' says Harvey Seegers, GE Global Exchange's president and CEO. GE Global Exchange, which has an EDI network that handles $1 trillion in volume a year, will support billing, reporting, and tracking of purchases; pre- and post-transaction collaboration; and the download of data into corporate accounting systems. The vendors will sell each other's products and partner on customer support and implementation."

  • [August 07, 2000] "Directories Learn Sharing Is Good." By Rutrell Yasin. In InternetWeek (August 04, 2000). "As e-businesses use directory technology to give partners access to their systems, authorization -- the assigning of user privileges and rights -- becomes vital. Authorization is impossible without sharing of entitlement information. The problem is simple: There is no standard approach for partners' access management systems to do such sharing. That could change. Netegrity and Securant plan to submit separate specifications based on the Extensible Markup Language (XML) to standards bodies such as the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF). Netegrity will be promoting XML-based middleware software for user authorization while Securant will push its AuthXML specification. 'A set of rules and methods -- or schema -- based on XML would enable an online stock trading firm, for example, to seamlessly share user privilege information with a partnering financial services firm that offers 401K investments, even if the companies use different server and access control systems,' said Eric Olden, Securant's chief technology officer. But not all vendors are endorsing XML as a common platform to maintain consistent security policies across different access management systems, however, and some of the naysayers are big names. Hewlett-Packard is looking to support both XML and Java. Tivoli, an IBM company, is supporting the Open Group's AznAPI authorization API. . . Tivoli also will support XML where it is practical for customers, said Bob Kalka, a product line manager for the Tivoli SecureWay unit. Kalka said that AznAPI supports both Web and legacy systems while products from Netegrity and Securant are Web-only solutions. AznAPI can plug into the SecureWay Policy Director to determine authorization rights for a messaging application such as IBM's MQSeries, without requiring code rewrites, he noted. 
While vendor-specific deployment of XML-based systems will give users some added value, users would prefer suppliers to work together on a standard." See also "Users Seek Unified Directory Answers," InternetWeek July 31, 2000, page 13. Cf. "DIF Directory Interoperability Proposal."
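The privilege-sharing scenario Olden describes amounts to one access-control system serializing a user's entitlements as XML and a partner's system parsing them back, regardless of which server products sit on either side. A minimal sketch, assuming purely illustrative element names (the AuthXML and Netegrity proposals were still unpublished drafts at the time, so nothing below is their actual schema):

```python
import xml.etree.ElementTree as ET

# Hypothetical entitlement document; the tag and attribute names are
# illustrative assumptions, not the real AuthXML vocabulary.
doc = """<Entitlement>
  <Subject id="jdoe"/>
  <Resource>accounts/401k</Resource>
  <Rights><Right>view</Right><Right>trade</Right></Rights>
</Entitlement>"""

# The receiving partner's system only needs an XML parser, not the
# sender's access-control product, to recover the privilege data.
root = ET.fromstring(doc)
subject = root.find("Subject").get("id")
rights = [r.text for r in root.iter("Right")]
print(subject, rights)
```

The point of the shared schema is exactly this decoupling: both firms agree on the document shape, and each maps it onto its own internal access-control model.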

  • [August 07, 2000] "Standards critical for .Net success." By Roberta Holland. In eWEEK (July 24, 2000). page 22. "Standards organizations will play an important role for two of the key technologies involved in Microsoft Corp.'s .Net software as a service vision. But whether the company follows through on the standards processes, and whether that makes a difference for developers, remains to be seen. When the Redmond, Wash., company introduced its new C# programming language last month, it submitted the software to the ECMA, a European standards body. Microsoft officials also said they will continue to push for the standardization of SOAP (Simple Object Access Protocol) by the World Wide Web Consortium. Microsoft, with co-authors including IBM and UserLand Software Inc., submitted SOAP 1.1 to the W3C in May. The W3C has not yet decided whether to create a working group focused on XML (Extensible Markup Language) protocols, which would include SOAP. Microsoft this month announced two extensions to SOAP intended to bolster Microsoft .Net. The first, SCL (SOAP Contract Language), describes the features of a Web service and what messages it is expecting to send and receive. The second, the SOAP Discovery specification, is a set of rules for automatically locating the SCL description of a Web service. . . "The SOAP stuff based on XML and [the standards effort around it] probably has a lot of traction to it," Bickel said. 'There are two platforms, Micro soft and Java, and SOAP goes across both.' In touting its commitment to standards, Microsoft has not shied away from pointing out that Sun Microsystems Inc.'s Java is not an official standard. It was the ECMA's process from which Sun withdrew last year, citing possible fragmentation or a slowdown of innovation. Instead, Sun has used its own Java Community Process to oversee the language, leaving ultimate control under the Palo Alto, Calif., company's purview." See "Simple Object Access Protocol (SOAP)."

  • [August 06, 2000] "A Conversation With Doug Engelbart - Inventor of the Mouse, Hypertext, and More." By Eugene Eric Kim. In Dr. Dobb's Journal Volume 25, Issue 9 #316 (September 2000), pages 21-26. ['Doug Engelbart is a man who thinks big. Fifty years ago, he dedicated his career to designing systems that could help the world solve its most difficult problems. Along the way, he invented the mouse, hypertext systems, two-dimensional display editing, collaborative video teleconferencing, and a host of other technologies that today form the basis of interactive, collaborative computing. Many of these inventions were part of the NLS System, which Engelbart first presented at the 1968 Fall Joint Computer Conference, a demonstration that later became known as 'The Mother of All Demos.' While the World Wide Web and related technologies were directly inspired by Engelbart's work, they only scratch the surface of the capabilities of NLS and its successor, Augment. Today, Engelbart is developing an open-source version of the next generation of his system for collaborative knowledge work, dubbed the Open Hyperdocument System. His initial goal for the OHS is to help programmers collaboratively develop software, a problem that DDJ readers will agree is as complex as it gets. Engelbart is currently director of the Bootstrap Institute, which he founded in 1988 to help organizations learn how to improve their ability to solve complex problems using tools such as interactive computing. Engelbart recently spoke with Eugene Eric Kim about his life's work and ways in which we can augment the collective human intellect.] "For fifteen or more years, it has been clear to me that an OHS -- Open Hyperdocument System -- has to emerge as a common basis upon which to build the world's evolving knowledge base: handling every kind of knowledge domain, with embedded properties and structural conventions whose nature and usage has yet to evolve. 
And maximum-function interoperability has to prevail. The emergence of XML provides a candidate growth base for content form, but that has to coevolve with how people are going to learn to manipulate and view. I've been talking here about coevolution and such as optimal evolutionary environments. That's because the target capabilities are going to involve changing many, many of the ways we couple our basic human sensory, perceptual, cognitive, and motor capabilities into the augmented way of doing our knowledge work. And since I don't believe that any person or group can possibly be smart enough to design the best end system that we should use, I favor an approach [that] seeks the best large-scale evolutionary process. And so, launching an OHS Project along with a proactive user community seems like the only way to get going on creating this large-scale evolutionary process. So, we have crafted a prototype model, with an open-source approach, and a rough-sketch evolutionary path involving a succession of expanding user communities..." ["Well-known technological firsts include the mouse, display editing, windows, cross-file editing, outline processing, hypermedia, and groupware. Integrated prototypes were in full operation under the NLS system, as early as 1968. In the last decade of its continued evolution, thousands of users have benefited from its unique team support capabilities."] [cache]

  • [August 05, 2000] "Scalable Vector Graphics: An Executive Summary." By Vincent Hardy (Senior Staff Engineer, Sun Microsystems XML Technology Center). ['Vincent Hardy provides a high-level view of the features and benefits of the new XML-based Scalable Vector Graphics (SVG) format for the Web and beyond.'] "The Scalable Vector Graphics (SVG) format is a new XML grammar for defining vector-based 2D graphics for the Web and other applications. This article provides a brief introduction to SVG, followed by four examples of SVG graphics, and links to two code samples. After reading this article, be sure to go to the Web site for the Sun Developer Connection[sm] program and get the Graphics2D SVG Generator and SVG Slide Toolkit. Also be sure to check out the useful SVG Resources listed at the end of this article. SVG was created by the World Wide Web Consortium (W3C), the non-profit, industry-wide, open-standards consortium that created HTML and XML, among other important standards and vocabularies. Over twenty organizations, including Sun Microsystems, Adobe, Apple, IBM, and Kodak, have been involved in defining SVG. Sun has been involved with the definition of the SVG specification from the start, and has two active representatives in the SVG working group, which is the group of experts defining the SVG specification. SVG is currently (as of July 21, 2000) in Working Draft status, but it is expected to move soon to Candidate Recommendation and then to Proposed Recommendation before becoming a Final Recommendation. [...] Because SVG is an XML grammar, SVG graphics can easily be generated on web servers "on the fly," using standard XML tools, many of which are written in the Java programming language. For example, a web server can generate a high quality, low bandwidth stock quote graph from stock market data. 
SVG allows graphics to be authored in graphics authoring packages (see the W3C SVG implementation list) or automatically (for example, using JavaServer Pages[tm] software). With SVG, you can easily manipulate your graphics using standard XML tools." See the main news entry.
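The "on the fly" server-side generation Hardy describes is just a matter of emitting an XML document whose vocabulary happens to be SVG. A minimal sketch of his stock-quote example, in Python standing in for the Java-based tools the article mentions (the chart layout here is an illustration, not Sun's generator):

```python
import xml.etree.ElementTree as ET

def quotes_to_svg(quotes, width=300, height=100):
    """Render a series of closing prices as an SVG polyline."""
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                     width=str(width), height=str(height))
    lo, hi = min(quotes), max(quotes)
    step = width / (len(quotes) - 1)
    # Scale each quote into the viewport; SVG's y axis grows downward.
    pts = " ".join(
        f"{i * step:.1f},{height - (q - lo) / (hi - lo) * height:.1f}"
        for i, q in enumerate(quotes))
    ET.SubElement(svg, "polyline", points=pts, fill="none", stroke="black")
    return ET.tostring(svg, encoding="unicode")

print(quotes_to_svg([10.0, 12.5, 11.0, 14.0]))
```

Because the output is plain text markup, the same document can be served over HTTP, searched, or post-processed with any ordinary XML tooling, which is the interoperability point the article is making.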

  • [August 05, 2000] "Sun gets behind new XML-based SVG graphics format." By [SunServer Magazine Staff]. In SunServer Magazine Volume 14, Number 8 (August 2000). "Touting the synergies of Java and XML, Sun is throwing its weight behind Scalable Vector Graphics (SVG), a new XML-based graphics format that is currently being developed by the World Wide Web Consortium (W3C). Sun yesterday posted a beta version of its 2-D graphics SVG generator software, which allows Java applications to export graphics to the SVG format. The new SVG format, which describes two-dimensional vector graphics in XML, was released yesterday by the W3C for Candidate Recommendation. Unlike images from popular graphics formats like GIF and JPEG, SVG images are "scalable" -- users can zoom in on a particular area of a graphic, such as a map, and not see any image degradation. This scalability also allows SVG images to be printed with high quality at any resolution, Sun officials said. 'As our networked world takes shape, developers will increasingly require rich graphics that work well on a range of devices, screen sizes and printer resolutions,' said Bill Smith, engineering manager of Sun's XML Technology Center. 'SVG meets these requirements and finally brings the full benefits of XML, such as interoperability and internationalization, to the graphics space.' Because SVG is a plain text format, SVG files are more readable and generally smaller than comparable graphics images, Sun said. Thanks to XML, text within an SVG image, such as a city name on a map, is both selectable and searchable. In addition, applications written in SVG can be made accessible through means for describing the visual information in textual detail. SVG also supports scripting and animation. . . The SVG slide toolkit is a collection of XML stylesheets and DTDs that can be used to create XML documents that are in turn transformed into SVG-based slide presentations. 
The software allows for the separation of a presentation's content from its look and feel, permitting users to independently modify the content, the presentation style or both, Sun said." See the main news entry.

  • [August 05, 2000] "Tutorial on Java Server Pages technology and SVG." From Sun Microsystems. August, 2000. The article covers "Setting the generated document's MIME type, Declaring Java language variables, Using variables in the generated SVG content, and Extracting request parameters for use in the SVG content."

  • [August 05, 2000] "Writing a custom Graphics2D implementation to generate SVG images." By Vincent Hardy. From Sun Microsystems. August, 2000. Illustrates "how to take advantage of the Java 2D API's extensible architecture to write a new Graphics2D implementation, SVGGraphics2D, which allows all Java programming language applications to export their graphics to the SVG format."

  • [August 05, 2000] "Exploring XML order processing for distributed e-commerce." By Michael Gentry. In SunServer Magazine Volume 14, Number 8 (August 2000) pages 8-9. ['Using the WebObjects application server running on Sun hardware, Blacksmith took on the task of integrating two disparate order entry systems from two different companies.] "Recently, Blacksmith worked on a project where two different companies came together to form a comprehensive partnership, and needed to integrate the order processing of the two companies so that customers were able to easily place orders and determine order status for each company's complimentary products, without having to know the intricacies and details of each company's information systems. Blacksmith accomplished this task using the WebObjects application server running on Sun hardware. The application server provided an order translation function to and from a standard order representation using XML (eXtensible Markup Language) as the common description of order formats. Sun's Java programming language was used to implement the order translation application. [...] XML and application server technology provide a powerful combination for information integration for distributed order entry systems. Using tools such as Java, XML, and WebObjects running on Sun hardware as the foundation for the implementation of these systems provides a standards-based, well-understood and supported, and scalable architecture for integration of business processes within and external to organizations as they become Web-enabled. Because XML is an industry-standard markup language, it will play an increasing role as a data interchange format, allowing businesses to communicate electronically with each other for a wide variety of applications. 
Economical commercial and open-source XML parsing routines/libraries are becoming available enabling developers to build software interchanges and allowing companies to spend more time building software to manage the business instead of building communication infrastructure."
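The integration pattern this entry describes, with each system translating its native order records to and from one shared XML representation, can be sketched in a few lines. The `<order>` vocabulary below is an assumption for illustration (Blacksmith's actual format is not published in the article), and Python stands in for the Java used in the project:

```python
import xml.etree.ElementTree as ET

def order_to_xml(order):
    """Serialize a native order record into the shared XML representation."""
    root = ET.Element("order", number=order["number"])
    for sku, qty in order["lines"]:
        ET.SubElement(root, "line", sku=sku, qty=str(qty))
    return ET.tostring(root, encoding="unicode")

def xml_to_order(xml_text):
    """Parse the shared representation back into a native record."""
    root = ET.fromstring(xml_text)
    return {"number": root.get("number"),
            "lines": [(l.get("sku"), int(l.get("qty"))) for l in root]}

# Each partner writes only these two translation functions; neither needs
# to know the other's internal order-entry system.
original = {"number": "A-100", "lines": [("widget", 3), ("gadget", 1)]}
assert xml_to_order(order_to_xml(original)) == original
```

With N partner systems, the common representation keeps the integration effort at N translators rather than N×N point-to-point mappings, which is why the article treats XML as the pivot format.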

  • [August 05, 2000] "Tax compliance with Java and XML. [What do you mean, tax compliance?]" By Mike Breeze. In SunServer Magazine Volume 14, Number 8 (August 2000). ['Tax compliance as a business requirement is too often an afterthought in the development of internal business systems. This oversight is even more prominent as e-commerce operating models become increasingly common.'] "Many modern Internet-enabled applications are using Java and XML as the development languages of choice. As developers and programmers well know, Java, and more recently, XML fall into the broad category of object-oriented software technology. The advantage of object orientation is the ability to encapsulate both data and functionality into easily implemented, reusable blocks of code. Since objects are independent of one another, enhancements can be made to individual objects with little or no impact to other objects that are in use in a given application. Tax calculation, reporting and remittance is a business function that lends itself well to object modeling. Individual factors that determine a given transaction's taxability, such as customer address, ship-to address, bill-toaddress, tax rate etc. can be created as standalone objects that are accessed as needed to calculate the appropriate tax. In Java, these items can be built as objects that can easily be included in existing or newly written Java apps. Once instantiated, the application has access to all the elements of the tax class or object. At this point, it becomes a matter of identifying the appropriate locations in the application(s) where tax calculations need to be made. XML is generating a lot of interest in development shops. XML allows developers to dynamically define their own data by embedding tags that describe the data. For instance, a document could be created which contains information that calculates sales tax on an invoice. 
When the document is sent back in to the application, it is parsed by a special XML parsing procedure which understands the embedded tags. The parser extracts the data supplied inside the tags and passes it to the application for tax calculation processing. Upon completion, the result is placed in a new XML document and sent back to the user. The beauty of both Java and XML is that they can interface with end-users running standard Web browsers. Applications don't need to know what kind of client workstation the end-user has. Java applications can run on a wide variety of server platforms without being rewritten. That makes it easier and cheaper for application vendors to write and maintain their application code, which translates into faster turnaround on upgrades and functional enhancements."
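The parse-calculate-respond round trip Breeze describes can be sketched briefly. The invoice tag names and the flat 5% rate below are assumptions for illustration; as the article notes, real taxability depends on factors such as the customer, ship-to, and bill-to addresses:

```python
import xml.etree.ElementTree as ET

TAX_RATE = 0.05  # assumed flat rate, purely for illustration

def add_sales_tax(invoice_xml):
    """Parse an incoming invoice document, compute the tax, and return
    a new XML document with the result, as in the flow described above."""
    root = ET.fromstring(invoice_xml)
    amount = float(root.findtext("amount"))
    result = ET.Element("invoice")
    ET.SubElement(result, "amount").text = f"{amount:.2f}"
    ET.SubElement(result, "tax").text = f"{amount * TAX_RATE:.2f}"
    ET.SubElement(result, "total").text = f"{amount * (1 + TAX_RATE):.2f}"
    return ET.tostring(result, encoding="unicode")

print(add_sales_tax("<invoice><amount>200.00</amount></invoice>"))
```

The application logic never sees raw markup: the parser extracts the tagged values, the calculation runs on plain numbers, and a fresh document carries the answer back to the client, which is what makes the scheme browser- and platform-neutral.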

  • [August 04, 2000] "Wf-XML and Interoperability." By Tom Spitzer. In WebTechniques Volume 5, Issue 8 (August 2000), pages 99-101. ['Tom Spitzer tells you why communicating between workflow applications just got easier.'] "Wf-XML provides a message-based architecture for communicating between workflow engines. It's much like a message-oriented middleware system (it's already a part of IBM's message-oriented middleware system). When one workflow engine sends another engine a message encoded in Wf-XML, it's effectively making a remote procedure call on that engine and providing the parameters that the procedure requires. Wf-XML can represent the data required to support chained and nested workflows. When processes are chained, a process instance being enacted by one workflow engine triggers the creation and enactment of a subprocess on a second engine. Once the subprocess initiates, the first engine maintains no interest in the subprocess. When processes are nested, the process instance enacted on the first engine causes the creation and enactment of a subprocess instance on a second engine, then waits for the subprocess to terminate before proceeding. To enable interoperability, workflow engines need to expose an API sufficient to parse a Wf-XML message and act on its contents. Although Wf-XML is defined independently of programming languages and data transport mechanisms, the WfMC expects that HTTP will become the most widely used data transport mechanism. To this end, Wf-XML can be used as an RPC mechanism between "generic services" that may consist of a number of different resources. . . Products that support process management are numerous. Many of the workflow vendors that participate in the WfMC seem poised to add the necessary interfaces to their own products. IBM is developing its MQ Series message-oriented middleware product to support workflow functions. 
The company claims compliance with earlier workflow processing standards (for backward compatibility) and offers an XML interface. SAP offers a Business Workflow module and indicates on its Web site that, as a founding member of WfMC, the company is committed to the interoperability of workflow engines and actively participates in the specification of WfMC guidelines, including Wf-XML. Widespread adoption of this standard will take some time, if it happens at all. I spoke to a product manager at a leading vendor of infrastructure software for corporate online procurement systems and B2B marketplaces. He agreed that it's important to integrate process steps between systems in our scenario: collaborative RFP development, and RFP processing between company and supplier. But he said that currently developers would have to write code at the API level to do it. The WfMC also has some work to do before general purpose workflow vendors adopt XML. In its standardization work since 1993, WfMC has identified five interfaces to a workflow engine. These include interfaces for process definition, access by a workflow client, invocation of external applications, administration and monitoring, and delegating processes to other workflow engines. The initial Wf-XML standard addresses only the last of these interfaces, disappointing those who are looking for a process-definition schema. Over time, the coalition is likely to define schema for additional interfaces. There's an Open Source workflow engine development project under way that's taking a pragmatic approach to schema definition. The project has already released versions of process definition and client access schema for comments. Understanding the structure of schema like these provides insight into the interfaces we need to develop. One day, the Web will support processes that cross departmental and organizational boundaries." See "XML-Based Workflow [Process Management] Standard: Wf-XML."
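A nested-workflow request of the kind Spitzer describes, where one engine asks another to enact a subprocess and report back on completion, might look like the following sketch. The element names only approximate the early Wf-XML drafts and should be read as assumptions rather than the normative schema, and the URIs are made up for illustration:

```python
import xml.etree.ElementTree as ET

def create_instance_request(process_key, observer_uri):
    """Build an RPC-style Wf-XML message asking a remote engine to create
    and enact a process instance (element names are approximations)."""
    msg = ET.Element("WfMessage")
    header = ET.SubElement(msg, "WfMessageHeader")
    # ResponseRequired marks this as a nested (not chained) invocation:
    # the caller wants to hear back when the subprocess terminates.
    ET.SubElement(header, "Request", ResponseRequired="Yes")
    ET.SubElement(header, "Key").text = process_key
    body = ET.SubElement(msg, "WfMessageBody")
    op = ET.SubElement(body, "CreateProcessInstance.Request")
    ET.SubElement(op, "ObserverKey").text = observer_uri
    return ET.tostring(msg, encoding="unicode")

print(create_instance_request("http://b.example/wf/rfp-review",
                              "http://a.example/wf/instance/42"))
```

For a chained workflow the caller would simply omit the observer reference and set no response requirement, since it keeps no interest in the subprocess after launch; this is the chained/nested distinction the article draws.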

  • [August 04, 2000] "Microsoft, IBM Shelve Rivalry to Create XML Standards." By Wylie Wong and Mike Ricciuti. In CNet (August 04, 2000) "Microsoft and IBM, for years bitter enemies, are finding common ground as each attempts to dominate the market for Internet software. The two technology giants, which have in the past clashed in the operating system, database and desktop software markets, are collaborating on potential Web standards aimed at simplifying the delivery of each company's future software fine-tuned for the Web. Microsoft and IBM remain fierce competitors in the race to build Web-based services. Microsoft recently announced its massive Microsoft.Net plan to Web-enable its entire product lineup and move the bulk of its business onto the Web. IBM is hoping to unite its multiple hardware systems through integration software that will make its products more attractive to buyers setting up e-commerce sites and other Web-based services. But each company needs a common infrastructure to make new services a reality, and Extensible Markup Language (XML) appears to be the consensus choice. IBM and Microsoft executives acknowledge that a common Web standard, developed through a cooperative effort, could improve the chances for each company's product to succeed in the market. 'Microsoft and IBM are still competitors, of course,' Paul Maritz, senior vice president of Microsoft's platform group, told CNET 'But our tech people have come to some of the same technology points of view as IBM's people. It's been sort of a meeting of the minds at a very high level.' Bob Sutor, IBM's program director of XML technologies, agrees. 'It's important for us to get together with Microsoft. It's admittedly a big player, and the more we can get good technology agreed upon, the faster the whole area can grow,' Sutor said. 'If we get agreement, Microsoft, Sun, Oracle and the industry will say, 'This is something we can trust. This is something that has legs.' 
That's why it's important we continue to talk to Microsoft.' Analysts see the alliance as a marriage of necessity. . . Microsoft and IBM have recently released or announced several new XML specifications that would work with SOAP. Microsoft's Turner said that Web services can be created today with current technology but that new XML specifications will simplify the process. Any of the XML specifications would have to be submitted and approved by a standards body before they become official standards. Executives at both companies 'are taking a look at each other's specifications and seeing where there's crossover and where things can come together,' said a source who requested anonymity. Both Microsoft and IBM executives declined to be specific about their collaboration, but both sides said it makes sense to have a common set of XML standards."

  • [August 04, 2000] "Topic Maps: Templates, Topology, and Type Hierarchies." By Hans Holger Rath. In Markup Languages: Theory & Practice 2/1 (Winter 2000), pages 45-64 (with 24 references). Author's affiliation: STEP Electronic Publishing Solutions GmbH; Email:; WWW: Abstract: "The new ISO standard ISO/IEC 13250 Topic Maps defines a model and architecture for the semantic structuring of link networks. Dubbed the 'GPS of the information universe,' topic maps will become the solution for organizing and navigating large and continuously growing information pools, and provide a 'bridge' between the domains of knowledge representation and information management. This paper presents several technical issues of which are of great interest when applying topic maps to real world applications. The main focus of the paper is the introduction of 'topic map templates' -- a semi-official term coined by the standards' committee for a concept that the author argues is a necessary but as yet unstandardized addition to the basic model. Furthermore: association taxonomies, class hierarchies, and consistency constraints of topic maps are presented and discussed." [Conclusion:] "The new topic map standard ISO/IEC 13250 defines a model and architecture for the semantic structuring of link networks. It can be seen as a base technology for modeling knowledge structures. The standards working group defined topic maps in such a way that a limited but implementable set of core concepts express the necessary semantics. The STEP Group has investigated how topic maps can be applied to reference works and uncovered some concepts which are not made explicit in the standard: (1) ability to separate the declarative part from the 'real' map, (2) predefined association types and association type properties, (3) class hierarchies for types, and (4) consistency constraints as input to map validation. The paper has explained these concepts and presented meaningful solutions. 
First experiences have shown that the part of a topic map made up by all topics used as themes and types by other 'objects' in the map should be clustered somehow. For this purpose the term topic map template was coined by the ISO working group. Templates can be used as starting points for new maps or can be used by reference in order to provide all the themes and types the map needs. Standardizing topic map templates will offer base topic maps for specific application areas and could form the basis of semantic application profiles. We looked at related academic fields like mathematics, linguistics, and philosophy to get some substantial input about relations. The results are a list of association type properties which give important hints to the topic map software and a list of basic association types which could act as built-in superclasses. The introduction of the superclass-subclass relationship was the logical consequence. Another technical issue covered by the paper is the validation problem. Topic maps might become rather big with millions of topics, occurrences, and associations. Manual consistency checking will be impossible. All the previously defined concepts open the possibility for sophisticated rule-based validation of topic maps. The proposed consistency constraints are those rules which declare the semantics not expressible with DTDs and which control the validation process. A couple of examples proved that standardizing the missing concepts as predefined topic map templates will help both the topic map developer and the topic map user. The improvements were presented at a level at which they can be used as input to the ISO working group for further discussions." See "(XML) Topic Maps."
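The rule-based validation Rath proposes can be illustrated with a toy model: a "template" declares the permitted types together with a superclass-subclass hierarchy, and a checker flags topics whose type is neither declared nor a subclass of a declared type. This is a deliberate simplification for illustration, not the ISO 13250 data model:

```python
# Template part: declared types plus a superclass-subclass hierarchy.
template_types = {"composer", "person", "work"}
superclass = {"composer": "person"}  # composer is-a person

def ancestors(t):
    """Walk the superclass chain upward from type t."""
    while t in superclass:
        t = superclass[t]
        yield t

def validate(topics):
    """Return names of topics whose type is neither declared in the
    template nor a subclass of a declared type -- the kind of rule a
    DTD alone cannot express."""
    bad = []
    for name, ttype in topics:
        if ttype not in template_types and \
           not any(a in template_types for a in ancestors(ttype)):
            bad.append(name)
    return bad

topics = [("Puccini", "composer"), ("Tosca", "work"), ("Milan", "city")]
print(validate(topics))  # only the undeclared 'city' type is flagged
```

Scaled up to maps with millions of topics and associations, this is the automated consistency checking the paper argues for: the template carries the semantics, and validation becomes a mechanical pass over the map.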

  • [August 04, 2000] "Topic Maps Chart Future Course for XML in Content. [Topic Maps Discover Audience at XML Europe. Trip Report.]" By Liora Alschuler. In The Seybold Report on Internet Publishing Volume 4, Number 11 (July 2000), pages 1, 19-25. ['Topic maps, content management and creation, and some of the innovative newcomers were the talk of attendees at XML Europe 2000 in Paris. The new standard for indexing and organizing hyperlinks was the surprise hit of this spring's XML Europe event. Our trip report explains why and reviews three TM products. Also inside: our first look at Software AG's Tamino, IxiaSoft's TEXTML Server, Signicon's server, and EBT's Engenda. In authoring, we critique Arbortext and SoftQuad upgrades and review newcomers Praxis and HyperVision.'] It is a hard sell to convince any crowd that an ISO spec is going to have a presence on the Web, but you wouldn't have known it from the rapt attention at the 14 topic map sessions and tutorials held during the week of the Paris show. Topic maps are indexes of links, grouped by topic and following a consistent syntax. The bottom line: if you are cataloging information for retrieval on the Web, you should keep on eye on the tools and implementations that use ISO 13250. (The official topic map spec is online at Oak Ridge National Labs' Web site. If the whirl of attention at XML Europe was an orchestrated conspiracy to create buzz, it worked on the conference attendees. How much gets translated into running code and pages served remains up to a larger audience, but there were three vendors showing topic map code in some stage of development, and at least one application developer is betting the farm on the maturation of the tools and was willing to talk about it. 
(1) STEP previews Java engine: STEP UK is one of the largest SGML/XML consultancies and technology vendors in Europe and has been covered many times by us, most recently when it introduced X2X, a general linking tool first shown in Philadelphia last year. Graham Moore, CTO of STEP UK, developed X2X and also is developing its topic map technology. The company has a Java engine available in beta with a set of classes and interfaces. The engine supports import, export and merging of topic map instances. Users can choose any model for persistent topic map storage, including relational, object-oriented, or in-memory. The beta version includes Web-ready examples and XLink integration, which treats all topics, associations and facet links as XLinks. (2) Ontopia is a six-employee STEP breakaway headed by Steve Pepper, CTO and acting CEO. The company will soon have two engines available: Atlas, which is written in Java, currently in beta testing and will be available under conventional license; and tmproc, which is an open source Python engine and will be available for download soon. Atlas is self-described as a "publishing and navigational framework" for fast development of custom Web applications. It can be used in conjunction with different persistence mechanisms and different interchange syntax (practically speaking, that appears to mean XML, HTML and SGML). (3) InfoLoom targets content creators: Unlike both Ontopia and STEP, the InfoLoom product, from startup InterLoom, is not a general topic map engine, but rather a program for inserting topic maps into content. TMLoom and the Topic Map Loom XSL subset have been under development almost as long as the specification itself. They were written by Michel Biezunski, co-editor of the spec and leader of the XML Topic Map (XTM) effort sponsored by GCA's IDEAlliance. . ." See further in 'topic maps at XML Europe 2000' and the main entry, "(XML) Topic Maps."

  • [August 04, 2000] "Microsoft Cranks Up Its Wide-Ranging E-Book Program. [Microsoft Discloses Ambitious E-Book Program.]" By Mike Letts and Mark Walter. In The Seybold Report on Internet Publishing Volume 4, Number 11 (July 2000), pages 5-11. ["There's been much speculation, and a few broad announcements, about Microsoft's e-book program, but details of the full scope of its efforts have been hard to obtain. However, last month, on the eve of Book Expo America in Chicago, the company hosted an all-day event to brief publishers on the e-book products that will be delivered in the coming months. At Microsoft's invitation, we attended the briefing to bring you this detailed report. The scope of the full effort is impressive, encompassing a PC-based Reader, several shrink-wrapped e-book server products and an end-to-end digital rights management solution. Perhaps most surprising was the showing of a prototype dedicated e-book reader based on a reference design Microsoft is showing to OEMs. Working with the same level of intensity that has come to define the company, Microsoft is charging hard into the e-book arena, raising the stakes and attempting to set the standards. In the process, we think it is also reshaping the electronic publishing landscape of the future.] ". . . The Digital Asset Server. The overall framework in which the DRM (digital rights management) system operates is Microsoft's upcoming Digital Asset Server platform. This server software is currently offered as a complete package that manages a store of books and processes and fulfills requests. It interfaces to the server containing the source content files, which need not reside at the distributor or bookseller's site; the Content Store Server houses the source files and may also store associated metadata, including the license information, in a SQL Server database. 
Microsoft has left open the option of interfacing its Content Store Server to other content repositories by relying on XML as the primary interface for exchanging metadata on the content. . . Microsoft is building a viewer specifically for extended reading, and isn't trying to bolt reading onto a Web browser. This approach is a distinct plus, as it gives the company the freedom to create a user interface optimized for the intended purpose, without the encumbrance of another application's baggage. The initial effort verifies its decision: the Windows MS Reader looks far better than most other PC-based e-book readers. The Reader also carries the advantage of using the XML-based Open E-Book format -- most notably rendering at the client, so that layout and composition can be optimized to the device, and even the window, on which the book is being read. . . Although Microsoft is utilizing ContentGuard's published XrML language to pass the license to its Reader, its .LIT binary data format is not open to the public, and the Reader itself is not open to third-party extensions." See "Open Ebook Initiative."

  • [August 04, 2000] "Nexpo 2000: Newspapers Tackle Mixing Content and E-Commerce." By Luke Cavanagh. In The Seybold Report on Internet Publishing Volume 4, Number 11 (July 2000), pages 14-18. ['The sight of new media software alongside print-oriented editorial, advertising and production systems has become commonplace at newspaper conferences. But a close look at the evolution of editorial systems shows that perhaps the newspaper industry is lagging behind others in embracing the total functionality of the Web.'] "One of the big questions going into Nexpo 2000 was: What will newspapers do about bolstering revenues lost to the Web? Specifically, how do they compensate for money lost by giving away content in the digital world that is normally paid for in print? Given that last year's show produced the dramatic emergence of single-database, XML-enabled cross-media publishing, there was speculation that vendors, even mid-to-high-level vendors, would begin to build on that infrastructure and add some more 'new-media savvy' features to their systems... From a vendor standpoint, it sometimes seems as though newspapers are perhaps not pushing the cart quite hard enough and remaining grounded in print problems. One head of an international, mid-to-high level editorial system supplier offered the opinion that he thinks part of what's holding newspapers back is a power struggle, a clash between those that have had long-standing control of the print products and those that are now heading up the Web operations. . . [Yet] (1) CCI: Building on the foundation. CCI, the Danish vendor that handles accounts for some of the world's largest papers including the LA Times, USA Today, Washington Post, and the Chicago Tribune, will have Web functionality when version 6.0 of its NewsDesk editorial system hits the market. Still absent from the new CCI Web publishing features are any sort of personalization capabilities, as well as wireless publishing. 
With the system's reliance on XML, the latter shouldn't be too far off... (2) Unisys: XML to the rescue. Unisys, meanwhile, unveiled Hermes Online, for use with its Hermes print system. Hermes Online introduces XML support into the Hermes system and offers a similar feature set to what we saw from CCI. However, Hermes Online uses a separate database and FTPs content automatically from the production database to a database-driven Web server (as opposed to using single or side-by-side replicated databases on opposite sides of a firewall). The system offers dynamic, template-driven publishing and automatically updates the Web server anytime a story is changed in the production database. . . (3) OpenPages demonstrated version 2.5 of ContentWare, slated to be available in Q3 of 2000 (though it is shipping to selected sites now). The updated version features support for both Solaris and NT as well as new JSP support for page generation. Also, it features added support for Xtensible Style Sheets as well as integration with SoftQuad's XMetaL XML editor. Integration with Microsoft Word has been enhanced to allow pages using links and graphics created in Word to retain their original form when uploaded to the system. (4) Digital Technology International showed version 5.0 of its NewSpeed Editorial system, which publishes stories with XML markup directly to the Web from the production database. Particularly attractive in NewSpeed is a simple drag-and-drop interface for adding XML tags to story copy to ready the story for print or Web. The system also features a highly intuitive workflow interface, certainly a differentiating factor in a newsroom environment. The product has been in ongoing development at the Orem Daily Journal for the better part of a year." On the relevant XML news standards, see the recent news update.

  • [August 04, 2000] "How Oxford University Press and HighWire Took the OED Online." By Mike Letts. In The Seybold Report on Internet Publishing Volume 4, Number 09 (May 2000). "Oxford University Press, with the help of HighWire Press, recently launched the first Web edition of the massive Oxford English Dictionary (OED). Behind the scenes, the project team overcame numerous challenges that the voluminous reference work posed to online delivery, and proved the return on investment that careful markup provides. Believing that contemporary dictionaries were not adequately documenting the history and usage of the English language, the Philological Society of London decided in 1857 to begin a complete documentation of the evolution of the language from the 12th century forward. However, it wasn't until 1879 that a formal agreement was reached with Oxford University Press (OUP) to begin work on what would eventually become the eminently staid, but authoritative, Oxford English Dictionary. Considered the definitive guide to the evolution of the English language, the first full edition of the dictionary wasn't completed until 1928 under the name A New English Dictionary on Historical Principles. What was planned as a four-volume undertaking became a four-decade, 12-volume project. By 1984, with supplemental volumes added to the first edition, OUP decided to move its magisterial reference work (now known as The Oxford English Dictionary) into an electronic format using a then innovative SGML tagging scheme. The project took five years and cost $13.5 million, culminating in 1989 with the publication of the 20-volume second edition. Now, using the foundation that was laid in SGML some 15 years ago, OED has established the dictionary on the World Wide Web. OED Online, which went live on March 14, is a complete online copy of the second edition, which features more than 500,000 defined terms and 2.5 million quotations. 
It will also be used as the foundation of the reference work as OUP begins what it estimates will be at least a 10-year project to publish a third edition of the dictionary. . . Oxford's approach to taking the OED online marks yet another major print reference work whose editorial and production processes has been transformed from a print-centric workflow to a Web-centric one, in which print publications are merely snapshots of the live publication that lives online. The speed at which the project was brought online and its relatively small price tag were made possible by Oxford's prescient decision 15 years ago to convert the dictionary from film and typesetting files into SGML. To move a century's worth of work and accumulated material onto the Web in 16 months, at a cost of only about $1.5 million, is a testament to what solid markup, responsible content management and forward thinking can do. It's worth noting that the project could have been done even faster and at less cost had Oxford cut corners in the quality of the presentation or the efficiency of the application. It didn't. OED Online is well-executed, a pleasure to read and to use. Oxford didn't go it alone, however, and credit is certainly due to HighWire Press for its skill in production. Its tuning of Verity's K2 engine to handle the fine-grained searches that OED demanded defies conventional wisdom, which says to avoid such finely grained markup when deploying to thousands of readers. And its experience in SGML-based Web publishing, proven for more than 100 journals, translated well into its first venture into reference works and was essential to the successful launch of this landmark Web publication." For full details on the SGML phase, see: "University of Waterloo Centre for the New OED and Text Research."

  • [August 04, 2000] "Connex Brings XML to Legacy Newspaper Systems. Server Supports NAA Standards." By Luke Cavanagh. In The Seybold Report on Internet Publishing Volume 4, Number 9 (May 2000). "Global Digital Technologies, Inc., in conjunction with integrator and reseller Nova Publishing Products, Inc., has introduced a new line of XML conversion products for newspapers called Nova Connex. The product line is designed for non-XML compliant front-end newspaper systems that want to leverage XML in managing classifieds, news content, and commercial advertising. . . The system is a two-way XML conversion package that employs a central Connex XML Server interfaced to existing editorial, classified and business systems. The server's functionality extends, through various clients, to handle needs in several vertical areas, including ad management, news reporting and classifieds. Nova says the server is equipped to handle any XML DTD and contains support out of the box for the IPTC's NITF standard for digital news content and the NAA's ADEX standard for multimedia classifieds." See also the announcement. See "News Industry Text Format (NITF)."

  • [August 04, 2000] "BOX: Browsing Objects in XML." By Christian Nentwich, Wolfgang Emmerich, Anthony Finkelstein and Andrea Zisman (Department of Computer Science, University College London, Gower Street, London, WC1E 6BT. Email: {c.nentwich|w.emmerich|a.finkelstein|a.zisman}). Abstract: "The latest Internet markup languages support the representation of structured information and vector graphics. In this paper we describe how these languages can be used to publish software engineering diagrams on the Internet. We do so by describing BOX, a portable, distributed and interoperable approach to browsing UML models with off-the-shelf browser technology. Our approach to browsing UML models leverages XML and related specifications, such as the Document Object Model (DOM), the XML Metadata Interchange (XMI) and a Vector Graphic Markup Language (VML). BOX translates a UML model that is represented in XMI into VML. VML can be directly displayed in Internet browsers, such as Microsoft's Internet Explorer 5. BOX enables software engineers to access, review and browse UML models without the need to purchase licenses of tools that produced the models. BOX has been successfully evaluated in two industrial case studies. The case studies used BOX to make extensive domain and enterprise object models available to a large number of stakeholders over corporate intranets and the Internet. We discuss why XML and the BOX architecture can be applied to other software engineering notations. We also argue that the approach taken in BOX can be applied to other domains that have already started to adopt XML and have a need for graphic representation of XML information. These include browsing gene sequences, chemical molecule structures and conceptual knowledge representations." [cache]
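The XMI-to-VML translation at the heart of BOX can be sketched in miniature. The Python fragment below is an illustration only: the XMI and VML vocabularies shown are drastically simplified, invented stand-ins, not the real XMI 1.x or VML schemas the paper targets.

```python
import xml.etree.ElementTree as ET

# Toy, simplified stand-in for an XMI fragment; real XMI is far richer.
XMI = """<XMI><Model>
  <Class name="Order"/>
  <Class name="Customer"/>
</Model></XMI>"""

def xmi_to_vml(xmi_text):
    """Render each UML class as a VML-style labeled rectangle."""
    root = ET.fromstring(xmi_text)
    shapes = []
    for i, cls in enumerate(root.iter("Class")):
        # Hypothetical VML-like markup: one rect per class, laid out in a row.
        shapes.append(
            '<v:rect style="left:%dpt;top:10pt;width:80pt;height:30pt">'
            '<v:textbox>%s</v:textbox></v:rect>' % (10 + i * 100, cls.get("name"))
        )
    return "\n".join(shapes)

print(xmi_to_vml(XMI))
```

The real system walks the full XMI model (associations, attributes, layout hints) rather than just class names, but the pipeline shape is the same: parse one XML vocabulary with the DOM, emit another that the browser can draw.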

  • [August 04, 2000] "Implementing incremental code migration with XML." By Wolfgang Emmerich, Cecilia Mascolo, and Anthony Finkelstein. In Proceedings of the 22nd International Conference on Software Engineering, 2000 [June 4-11, 2000, Limerick Ireland], pages 397-406 (with 31 references). Contact address: Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK. Email: {W.Emmerich | C.Mascolo | A.Finkelstein}. "We demonstrate how XML and related technologies can be used for code mobility at any granularity, thus overcoming the restrictions of existing approaches. By not fixing a particular granularity for mobile code, we enable complete programs as well as individual lines of code to be sent across the network. We define the concept of incremental code mobility as the ability to migrate and add, remove, or replace code fragments (i.e., increments) in a remote program. The combination of fine-grained and incremental migration achieves a previously unavailable degree of flexibility. We examine the application of incremental and fine-grained code migration to a variety of domains, including user interface management, application management on mobile thin clients, for example PDAs, and management of distributed documents. [Summary:] In this paper we presented an incremental approach to code mobility using the XML language. The novelty of the approach is the ability to send code incrementally instead of resending complete updated versions of the code. Java based technologies launched the idea of object and classes mobility, allowing a set of new paradigms for communication to become feasible. Many theoretical languages have been used to specify and analyze code mobility. The movement is specified with different granularities showing that the Java point of view, where a class is the unit of mobility, was not the only possibility to be explored. 
In this paper we have shown a possible embodiment of these ideas, and described a set of potential applications. We will now develop one of these applications in order to have a benchmark that we can use to evaluate the approach. In particular, we want to study performance issues and understand the trade-off between space and speed overhead compared to Java byte code transmission. We are also interested in exploring the security implications of code migration and addressing them with the security services that object-middleware provides. By implementing interpreters as CORBA objects and using the access control interfaces of the CORBA Security service, we can guarantee that only authorized principals are performing changes to code. In [Ciancarini/Vitali/Mascolo] displets are used to render special tags defined in XML using Java specific code for displaying formal notation on the Web. We see possible development of our work with the integration of this technique; DTDs and Java fragments could be sent together in order to update the ability of the interpreter to understand new constructs. We are also interested in providing support for proactive code mobility by adding specific XML tags like go, that are available in other mobile code languages. They are interpreted as movement commands: this extension introduces many issues related to the dynamic modification of code. However, we believe this would extend the potential of the approach described in the paper considerably. We intend to explore the use of this approach in real projects involving industrial partners in some of the domains that we mentioned in Section 4. We are currently collaborating with an industrial partner to develop a flexible user interface management for business analysis applications and intend to take advantage of XwingML in this application. Moreover, we are investigating the use of Symbian mobile phones and PDAs as application platforms with an e-commerce provider. 
In this setting, we will explore the use of incremental code mobility for application management purposes." [cache]
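The paper's central idea, shipping a code increment rather than a whole program and splicing it into a running interpreter, can be sketched as follows. This is a minimal Python analogy under invented conventions: the `<increment>` element, its `target` and `action` attributes, and the use of `exec` are all hypothetical, not the authors' actual XML format or migration machinery.

```python
import xml.etree.ElementTree as ET

# A running "remote" program: one function we will replace in place.
def greet():
    return "hello v1"

program = {"greet": greet}

# A hypothetical XML increment: it replaces one function, not the whole program.
INCREMENT = """<increment target="greet" action="replace">
def greet():
    return "hello v2"
</increment>"""

def apply_increment(xml_text, namespace):
    """Parse an increment and splice the shipped fragment into the live program."""
    node = ET.fromstring(xml_text)
    if node.get("action") == "replace":
        scope = {}
        exec(node.text, scope)                      # compile the shipped fragment
        namespace[node.get("target")] = scope[node.get("target")]

apply_increment(INCREMENT, program)
print(program["greet"]())  # the updated fragment is now live: "hello v2"
```

A production design would of course validate the increment against a DTD and authenticate the sender (the CORBA Security discussion above) before executing anything.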

  • [August 04, 2000] "XML offers a universal information platform for Healthcare Information Systems." By Bryan Caporlette. "The XML/EDI Group is working to establish standards for commercial electronic data interchange based on XML. The emergence of XML for EDI has focused the development of packaged e-commerce systems based on exchanging XML data. As such, XML is enabling Internet-based e-commerce to happen. By tying together back-end ERP systems, corporations are able to support direct application integration over the Internet. This new incarnation of EDI will be cost effective and highly extensible to support the requirements of the healthcare industry. Clinical portals were mentioned earlier; let me better define these emerging XML-based systems. A clinical portal is an information system that enables the acquisition, management and presentation of information through a single access point (or portal). They do not require the information to be stored in a master repository, rather these systems will provide facilities to locate, process, and deliver the information to a requesting agent. These new clinical portals must effectively contend with three issues: data acquisition, data processing, and data presentation. Data acquisition is the act of gathering data from multiple data sources. To successfully acquire data, a system must provide support for a multitude of industry protocols, robust data extraction, and comprehensive data transformation capabilities. Data processing provides the utilities to deal with various types of information once it is accessible via data acquisition. To successfully manage data, a system must provide a persistent data repository. A system must provide search and index capabilities for data stored in the repository, as well as search capabilities to real-time data (data not necessarily stored in the repository). 
Data presentation is the process of delivering data to the end user and representing that data in a way that best suits the needs of the user. By providing the data presentation layer, a system becomes the end-user "portal" to information that is being managed and acquired from other systems. Data presentation includes the ability to display information as individual records, or aggregated as a report. Interface engines are applications that provide connector devices between two systems; they are used to intercept, transform, and queue messages for delivery to the next application. Interface engines have been prevalent in healthcare for a number of years, usually deployed to enable point-to-point communication between two clinical information systems. The interface engine vendors have already begun to look at using XML as the canonical format for exchanging messages between systems. As clinical systems begin to support the new HL7 v3 messaging formats, the interface engines will have to be able to process XML messages. The 'training' currently required to map an input from one system to another should be greatly diminished with XML serving as the universal information platform."
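The interface-engine pattern described above, intercepting a legacy message and recasting it in XML as the canonical format, can be sketched briefly. The pipe-delimited segment and the target XML record below are invented for illustration; a real HL7 message and the HL7 v3 XML formats are considerably more involved.

```python
import xml.etree.ElementTree as ET

# A simplified, HL7-v2-flavored pipe-delimited patient segment.
# The field layout is invented for this sketch, not a real HL7 definition.
legacy = "PID|12345|Doe^Jane|19700101"

def to_canonical_xml(segment):
    """Map a pipe-delimited patient segment onto a canonical XML record."""
    seg_id, patient_id, name, dob = segment.split("|")
    record = ET.Element("patient", id=patient_id)
    last, first = name.split("^")
    ET.SubElement(record, "name", last=last, first=first)
    ET.SubElement(record, "birthDate").text = dob
    return ET.tostring(record, encoding="unicode")

print(to_canonical_xml(legacy))
```

Once every feed is normalized into one canonical XML form, the engine's downstream transform and queue stages only have to understand a single vocabulary instead of one mapping per pair of systems.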

  • [August 04, 2000] "Deep Inside C#: An Interview with Microsoft Chief Architect Anders Hejlsberg." By John Osborn. From O'Reilly News. July 2000. ['In July, O'Reilly editor John Osborn attended the Microsoft Professional Developer's Conference where he conducted the following interview with Anders Hejlsberg, Distinguished Engineer and Chief C# Language Architect about Microsoft's .Net framework and the C# programming language. Anders Hejlsberg is also known for having designed Turbo Pascal, one of the first languages available for PCs. Anders licensed Turbo Pascal to Borland and later led the team that created Delphi, a highly successful visual design tool for building client server applications. Also in attendance at the interview were Tony Goodhew, Microsoft C# product manager, and O'Reilly Windows editor Ron Petrusha.'] "First of all, C# is not a Java clone. In the design of C#, we looked at a lot of languages. We looked at C++, we looked at Java, at Modula 2, C, and we looked at Smalltalk. There are just so many languages that have the same core ideas that we're interested in, such as deep object-orientation, object-simplification, and so on. One of the key differences between C# and these other languages, particularly Java, is that we tried to stay much closer to C++ in our design. C# borrows most of its operators, keywords, and statements directly from C++. We have also kept a number of language features that Java dropped. Why are there no enums in Java, for example? I mean, what's the rationale for cutting those? Enums are clearly a meaningful concept in C++. We've preserved enums in C# and made them type-safe as well. In C#, enums are not just integers. They're actually strongly typed value types that derive from System.Enum in the .NET base-class library. An enum of type 'foo' is not interchangeable with an enum of type 'bar' without a cast. I think that's an important difference. We've also preserved operator overloading and type conversions. 
Our whole structure for name spaces is much closer to C++. But beyond these more traditional language issues, one of our key design goals was to make the C# language component-oriented, to add to the language itself all of the concepts that you need when you write components. Concepts such as properties, methods, events, attributes, and documentation are all first-class language constructs. The work that we've done with attributes -- a feature used to add typed, extensible metadata to any object -- is completely new and innovative. I haven't seen it in any other programming language. And C# is the first language to incorporate XML comment tags that can be used by the compiler to generate readable documentation directly from source code. [. . .] One fairly good example of this is how XML integrates with C#. We have this notion of 'attributes' in C# that allows you to add declarative information to types and members. Just as you can say a member is public or private, you also want to be able to say this one's transacted, or this one's supposed to be a Web service, or this one is supposed to be serializable as XML. So we've added attributes to provide this generic mechanism, but then we utilize it in all of our Web services and XML infrastructure. We also give you the ability to put attributes on classes and on fields in your classes that say: 'When this class goes to XML, it needs to become 'this' tagname in XML and it needs to go into 'this' XML namespace. You want to be able to say a specific field in one place becomes an element, and that another becomes an attribute. You also want to control the schema of the XML that goes out; control it where you're writing your class declaration, so that all of the additional declarative information is available. 
When attributes are properly used in this way to decorate your C# code, the system can simply turn a specific class into XML, send it over the wire, and when it comes back we can reconstitute the object on the other side. It's all done in one place. It's not like additional definition files or assorted infos and naming patterns. It's right there. It gives you statement completion when you build it in the IDE, and we can then provide you with higher-level tools that do the work for you. I know I'm on a tangent here, but some of the infrastructure we provide is truly exciting. Simply because we have these attributes, you can ask our XML serialization infrastructure or our Web services infrastructure to translate any given class into XML. When you do, we'll actually take the schema for the class, the XSD schema, and we'll build a specialized parser that derives from our generic XML parser (which is part of the .NET base classes), and then override methods and add logic into the parser so that it is specialized for that schema. So we've instantiated a parser that at native code speed can rip through XML. If it's not correct, we'll give you a nice error message, which tells you precisely what went wrong. Then we cache it in our code-caching infrastructure, and it sits around until the next time a class with an identical schema comes by and it just goes, 'Bam!' I mean, incredible, incredible throughput."[lbullard]
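Hejlsberg's attribute-driven XML serialization is a C# language feature, but the core idea, per-field declarative metadata deciding whether a value becomes an XML attribute or a child element, can be approximated in Python with dataclass field metadata. This is a rough analogy with invented names, not Microsoft's mechanism.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field, fields

# Python analogy to C# attributes: per-field metadata says whether a value
# serializes as an XML attribute or a child element (names are illustrative).
@dataclass
class Book:
    isbn: str = field(metadata={"xml": "attribute"})
    title: str = field(metadata={"xml": "element"})

def to_xml(obj, tag):
    """Serialize a dataclass instance according to its per-field metadata."""
    node = ET.Element(tag)
    for f in fields(obj):
        value = getattr(obj, f.name)
        if f.metadata.get("xml") == "attribute":
            node.set(f.name, value)
        else:
            ET.SubElement(node, f.name).text = value
    return ET.tostring(node, encoding="unicode")

print(to_xml(Book(isbn="0-596-00181-0", title="Programming C#"), "book"))
```

The point of the analogy is the one Hejlsberg makes: the serialization rules live next to the class declaration, so there are no separate definition files to keep in sync.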

  • [August 04, 2000] "Digital Play Dough. Designing Applications with XUL." By Shelley Powers. In WebTechniques Volume 5, Issue 8 (August 2000). ['The XUL specification promises to make Web application development and design a whole lot easier. Shelley Powers shows you how to get started.'] "The XML-Based User Interface Language (XUL) made its first appearance with the release of Mozilla, the Open Source browser used as the foundation for Netscape 6. Pronounced "zool," the language gives developers and designers an easy way to describe the design and layout of application windows in Web browsers. By modifying a few files, you can change the entire look of your Web browser or of the windows that pop open while a visitor browses your site. Prior to XUL, this was only possible by modifying and re-compiling the browser's underlying source code. And in that case, you would have to distribute the modified browser to all your site's visitors -- an unlikely event. Fortunately, all you need to change the look and feel of a Web browser today is an understanding of the XML and CSS specifications and a little ingenuity. XUL applications consist of XML files created with .xul extensions. The files define the content of the application. Additional application data is located in Resource Description Framework (RDF) files. CSS files provide formatting, style, and some behavior, for the application. JavaScript files provide scripting support. Multimedia files, such as PNG images and other audio/visual files, might also be needed for additional user interface information. All of the file types are specifications recommended by the W3C, and collectively are referred to as the XUL application's "chrome" -- the contents, behavior, and appearance of the application's user interface. The Mozilla browser is itself designed as an XUL application. To manage the chrome for your browser, both Mozilla and Navigator have subdirectories labeled chrome, located off each browser's main directory. 
Within the chrome directory, separate XUL applications are packaged into separate subdirectories. Within each application directory, subdirectories further divide the application into content (containing the XUL, JS, and RDF files), skin (CSS files), and locale (DTD files). To deploy your own XUL application on the Web, you can either place all of the files within the same subdirectory on a Web server, or use the suggested chrome directory structure on the server. Note though that you may lose some functionality, such as localization, when your application is not using the chrome directory structure. Also, all files should be local to the URL of the main XUL file, otherwise the application may not work due to violations of built-in security... You'll find the complete application with all necessary files in the file. With just a few changes of the XML file associated with the XUL application -- adding a meter and using a tree instead of buttons -- I changed application functionality and provided better user feedback and a better design. With a few adjustments to the CSS file for the application, I was able to create a new and different look to go with my application's functionality. XUL promises to be a powerful tool for Web application development and deployment. I look forward to seeing future iterations and refinements of the technology in future browser versions." See "Extensible User Interface Language (XUL)."
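For readers who have not seen one, a .xul file is ordinary XML. The fragment below is a minimal sketch of a XUL window containing one button; the filename and ids are illustrative, while the namespace URI is the one Mozilla-era XUL used.

```xml
<?xml version="1.0"?>
<!-- hello.xul: a minimal XUL window (filename and ids are illustrative) -->
<window id="hello-window" title="Hello XUL"
        xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
  <button id="hello-button" label="Say hello"
          oncommand="alert('Hello from XUL!');"/>
</window>
```

Styling would come from a CSS file in the skin subdirectory and localized strings from a DTD in locale, following the chrome layout described above.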

  • [August 04, 2000] "Debugging XML Applications. [XML@Large.]" By Michael Floyd. In WebTechniques Volume 5, Issue 8 (August 2000), pages 79-81. ['Michael Floyd offers tips for debugging your XML applications.'] "When developing XML applications, it's often necessary to use several technologies to manage data. As a developer, you must first design schema to represent the data, use markup to describe it, DTDs to validate it, XSL to transform and present it, and the DOM to access and modify it programmatically. In a client-server arrangement, you also have to deal with the technologies of the existing infrastructure. Because there are so many factors in a single XML application, it's not always easy to tell where things went wrong when a document doesn't display correctly. This is particularly true when generating XML dynamically from a server, because you never see the XML that's being generated. It's possible to spend hours debugging the program. So what can you do to locate the problem quickly and solve it? Debugging a static XML document is relatively straightforward. All you need do is run the document through a parser, which will show you where there are formatting errors. However, when debugging applications that generate XML on the fly, your chances of finding the errors in the document are slim. Because the document lives (and dies) in memory, you never actually get to see the XML that's being generated. There's no physical XML file to run through a parser. In this case -- actually, in any case -- the first thing you want to do is ensure that your XML document is well formed. If the data being generated isn't well formed, the parser fails to generate any output, and if your application is supposed to produce a resulting HTML document you'll typically end up with a blank screen instead. 
If this happens and you're not sure whether your document is well formed, you can turn the parser's validation option on and redirect any error messages back to your Web browser. And if your XML documents have DTDs associated with them, you can embed a minivalidator into your application. [...] The strategies presented here should serve well in tracking down most problems you'll encounter in XML. Remember to validate and revalidate. Include robust error handling routines in your code and embed validation in those routines. This will help you to narrow down possible problems quickly and ultimately will save you countless hours of head scratching."
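The "embed a minivalidator" advice can be made concrete. The sketch below (Python, assuming the generated markup is available as a string; the function name is invented) runs a document through a parser before it is sent, turning the blank-screen failure mode into a precise error report.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return (True, None) if the document parses, else (False, message).

    Embedding a check like this in a server that generates XML on the fly
    lets you redirect a parse failure back to the browser as a readable
    message instead of a blank page."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        line, column = err.position
        return False, "not well-formed at line %d, column %d: %s" % (line, column, err)

ok, msg = check_well_formed("<order><item>widget</item></order>")
print(ok)                      # True
ok, msg = check_well_formed("<order><item>widget</order>")
print(msg)                     # reports the mismatched tag with line and column
```

Full DTD validation needs a validating parser, but even this well-formedness gate catches the most common class of dynamically generated errors.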

  • [August 04, 2000] "XML plays peacemaker between Java and proprietary languages. A Few Kind Words About Microsoft." By David F. Carr. In InternetWeek (July 15, 2000). "It's rare that anyone from Sun Microsystems gives Microsoft credit for anything. But here's Jon Bosak, a Sun distinguished engineer who led the World Wide Web Consortium committee that created the original XML specifications, talking about the technology's current momentum: 'The biggest surprise here was Microsoft's adoption of this. Because XML is only partly a technical thing. It also has a social agenda, inherited from SGML, not to be held hostage to any particular application platform. So to see Microsoft embrace that -- well, amazed is not too strong a word. But I'd have to say that's what probably kick-started it back in '97.' OK, so you can detect certain undercurrents even in that statement. But Bosak isn't one of Sun's hatchet men. He gives Gates & Co. credit for assigning smart people to work on XML-related efforts. Actually, Microsoft is probably more closely associated with XML than Sun is, despite Bosak's work. Sun engineers have been saying for years that Java and XML were complementary technologies, not competitors. One is focused on data, while the other governs application behavior. But I wonder if there wasn't a lack of enthusiasm further up at Sun to back anything with which Microsoft was involved. Besides, Sun has been trying to sell a Java platform that will liberate applications from operating systems, particularly Microsoft's. XML is another attempt to eliminate platform dependence -- including dependence on Java. Given the choice, Sun would prefer a world of tightly wound Java objects to loosely bound systems based on XML document exchange. Even if that's correct, by now irrational prejudice has given way to a pragmatic recognition of XML's importance. . .Bosak thinks it's essential to define industry-specific XML vocabularies, a goal he has pursued as a founder of OASIS. 
However, he admits that not everyone agrees. With the right transformation technologies (something that has improved very rapidly over the past decade, thanks to enterprise application integration), many technologists figure it shouldn't be a problem to translate between two differently structured XML purchase-order formats, for example. 'I am of the school of thought that either there's a round-trip mapping that's possible between two documents or there's not,' Bosak says. If not, you're losing information with every translation, he points out. 'And if a transformation is possible, why do we have two versions?' Could XML be subverted by proprietary extensions? The issues aren't really the same as with Java, because XML was designed to be extended freely. Occasionally, someone warns that Microsoft will be able to wield undue influence by winning a big market share with its (still unreleased) BizTalk Server, or by tying Microsoft products to XML-inspired technologies such as its SOAP (Simple Object Access Protocol) proposal. But it's premature to worry about that. And both IBM and Sun have recognized the potential of SOAP, agreeing to work on refining the submission to the Internet Engineering Task Force. Bosak does worry about proprietary threats, 'even though XML, by its nature, eliminates maybe half the danger.' Often, the vendors have development resources that the constituencies for these technologies lack, 'so there's a strong incentive to take an off-the-shelf solution if one's available,' he says." [cache]
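Bosak's round-trip criterion is easy to demonstrate. In the sketch below, two invented purchase-order vocabularies are mapped onto each other; because format B has no slot for format A's currency attribute, the round trip silently loses information, which is exactly the failure mode he describes.

```python
import xml.etree.ElementTree as ET

# Two toy purchase-order vocabularies (invented for illustration). Format B
# has no slot for the currency that format A carries.
PO_A = '<po number="42" currency="USD"><qty>3</qty></po>'

def a_to_b(xml_a):
    """Map format A onto format B; B cannot represent the currency."""
    a = ET.fromstring(xml_a)
    b = ET.Element("purchaseOrder", id=a.get("number"))
    ET.SubElement(b, "quantity").text = a.findtext("qty")
    return ET.tostring(b, encoding="unicode")

def b_to_a(xml_b):
    """Map format B back onto format A."""
    b = ET.fromstring(xml_b)
    a = ET.Element("po", number=b.get("id"))
    ET.SubElement(a, "qty").text = b.findtext("quantity")
    return ET.tostring(a, encoding="unicode")

round_tripped = b_to_a(a_to_b(PO_A))
print(round_tripped)           # the currency attribute has been lost
print(round_tripped == PO_A)   # False: no lossless round-trip mapping exists
```

Transformation tools can mechanize such mappings, but they cannot conjure a field the target vocabulary lacks, which is why Bosak argues for agreeing on one vocabulary per industry.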

  • [August 04, 2000] "Generate XML/XSL Results from AS/400 Query." By Julie Miller. In Midrange Systems Volume 13, Number 10 (July 24, 2000), pages 12, 16. "Advanced Systems Concepts (ASC, Schaumburg, Ill.), a longtime provider of productivity and utility software tools for the AS/400, has recently added enhancements to its SEQUEL and SEQUEL Web Interface (SWI) products that help create and display XML/XSL results from AS/400 database queries. (Extensible Markup Language, XML, is the universal format for structured documents and data on the Web. Extensible Stylesheet Language, XSL, is a style sheet language used to display XML data at a browser.) New XML, scripting and FTP functions have been added to the SEQUEL Kernel -- along with new enhancements to the SWI -- that merge multiple XML files, generate XSL templates, and serve XML and XSL files to the browser. SEQUEL automatically generates XML files with an SQL statement in one step. The SEQUEL Web Interface analyzes the content of the XML data files and automatically generates an XSL style sheet. With SEQUEL, a user can submit a run-time prompted query from a Web browser, which will return XML data results from multiple SEQUEL views. These can then be combined with graphics, hyperlinks and formatted text, all within a single XSL style sheet. 'Developers with little or no experience in XML or XSL can use SEQUEL and get up to speed very quickly with this functionality,' says Rob Peterson, ASC's director of marketing. 'With a single command, SEQUEL will extract data from an AS/400 file, create an XML data file from the output, and either save the new XML file in an IFS directory or FTP the file to another server. The SWI also has a new function that automatically generates an XSL corresponding to the data.' The XSL can be used as-is, or modified further to incorporate a user's own design elements or styles. 
Developers with a basic knowledge of HTML can easily modify the XSL template with little or no additional training, adds Peterson. The new Scripting function in the SEQUEL Kernel is especially useful in combining the multiple steps usually required to run and display XML-based interactive queries from a browser. SEQUEL scripts allow users to run multiple SEQUEL commands (and many other system commands) together from a single request. Scripting gives users much of the capability of Command Language (CL) without requiring technical programming knowledge. A SEQUEL script can be run from a command line, job scheduler, icon (using SEQUEL ViewPoint) or Web browser (using the SEQUEL Web Interface)."
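The one-step SQL-to-XML generation described above can be illustrated with a minimal sketch: turning query-result rows into an XML document. This is a generic illustration in Python's standard library, not SEQUEL's actual behavior; the function and tag names are invented for the example.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(rows, root_tag="results", row_tag="row"):
    """Serialize query-result rows (dicts of column -> value) into XML."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for column, value in row.items():
            ET.SubElement(item, column).text = str(value)
    return ET.tostring(root, encoding="unicode")

# One "row" from a hypothetical AS/400 query result:
print(rows_to_xml([{"customer": "Acme", "total": 1200}]))
# <results><row><customer>Acme</customer><total>1200</total></row></results>
```

A real tool would add FTP or IFS output and an XSL style sheet on top of this serialization step.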

  • [August 04, 2000] "OSS Hatches Format-Independent Web Server." By Stephen Swoyer. In ent Magazine Volume 5, Number 12 (July 31, 2000), pages 20, 22. ['Cocoon is a powerful framework for XML web publishing which brings a whole new world of abstraction and ease to consolidated web site creation and management based on the XML paradigm and related technologies.'] "According to its caretakers, Cocoon -- an offshoot of the OSS community's Apache XML Project -- aims to change the way that information on the Web is created, rendered, and served. Cocoon's proponents point to the fact that document content, style, and logic are often engineered separately by different teams or project groups. This process can create interoperability problems or lead to anarchy in the development process, which in turn increases time-to-market or lengthens development schedules. When Cocoon is fully realized, supporters claim it will enable Web applications to be independently designed, created, and managed. Cocoon consists of nothing more than a Java servlet that exploits a new technology -- Extensible Stylesheet Language, or XSL -- to transform Web pages encoded in XML. Although it is enabled by virtue of both Java and XML, Cocoon has adopted several new standards, including XSL, XSL Transformations, and the Document Object Model Level 2. Despite the advantages XML brings to the table, Web content generation today is still based largely on HTML, which doesn't necessarily complement the Web development processes of most IT organizations. For example, HTML does not provide a native provision for separating data from its presentation, nor does it allow for the mixing of formatting tags, descriptive tags, and programmable logic on the client or on the server. Because it can exploit XSL's new transformation capabilities, however, Cocoon can dynamically merge content, programming logic, and style. 
Cocoon's advantage is that it can facilitate individual presentations of data precisely tailored to the specific application that requests it, say proponents. Rather than maintaining separate copies of HTML, PDF, and WML files, for example, Cocoon lets a Web development team maintain one source file that could be rendered on-the-fly into any supported format when requested by a connecting client. Cocoon works by breaking the Web development process down into three stages: XML creation, XML processing, and XSL rendering. In the first stage, an XML file is created by a development team that doesn't necessarily have to be aware of the manner in which the content itself is to be processed further on down the line, but need only know its particular DTD/namespace. In the second stage, the XML file is processed, and whatever programming logic it contains in its separate logic sheet is then applied. In the third stage, the document is rendered by applying an XSL style sheet to it and structuring it in the precise format -- such as HTML, PDF, or WML -- of the application that requests it..."
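The single-source, many-formats pipeline described above can be sketched in a few lines: one XML source document, with a format-specific renderer chosen per request. This is a conceptual sketch in Python, assuming invented renderer functions; Cocoon itself is a Java servlet driven by XSL style sheets.

```python
import xml.etree.ElementTree as ET

# Stage 1: a single XML source, maintained once.
SOURCE = "<article><title>Cocoon</title><body>XML publishing.</body></article>"

def render_html(doc):
    return "<html><h1>%s</h1><p>%s</p></html>" % (
        doc.findtext("title"), doc.findtext("body"))

def render_wml(doc):
    return "<wml><card title='%s'><p>%s</p></card></wml>" % (
        doc.findtext("title"), doc.findtext("body"))

RENDERERS = {"html": render_html, "wml": render_wml}

def serve(requested_format):
    doc = ET.fromstring(SOURCE)              # Stage 2: parse/process the XML
    return RENDERERS[requested_format](doc)  # Stage 3: format-specific rendering

print(serve("html"))
```

In Cocoon the third stage is an XSL style sheet per target format rather than hand-written functions, but the separation of content from rendering is the same.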

  • [August 04, 2000] "Microsoft Releases Public Beta of XML Parser." By Ted Williams. In ent Magazine (July 31, 2000). ['Microsoft has launched the public beta release of the third version of its XML parser, making it possible to apply XML services to applications. The program is available for download from the MSDN XML Development Center.'] "The latest version of the program makes SAX2 - the Simple API for XML programming interface - accessible through Visual Basic without the need to download an entire file. Previously, SAX2 was accessible only through Visual C++. The program achieved a passing rate of more than 98 percent using the OASIS (Organization for the Advancement of Structured Information Standards) Conformance Test Suite, which calls for synergy among developers regarding the XML processing language..." See the announcement.
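SAX, mentioned above, is an event-driven API: the parser streams through the document and calls back into handler methods, so it never needs to load the whole file into memory. A minimal sketch using Python's standard-library SAX implementation (the handler class here is invented for illustration):

```python
import io
import xml.sax

class TitleCounter(xml.sax.ContentHandler):
    """Counts <title> elements as SAX events stream past."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per start tag; no in-memory tree is ever built.
        if name == "title":
            self.count += 1

handler = TitleCounter()
xml.sax.parse(io.StringIO("<books><title>A</title><title>B</title></books>"), handler)
print(handler.count)  # 2
```

This streaming model is why SAX2 access from Visual Basic "without the need to download an entire file" was a selling point.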

  • [August 04, 2000] "The Metadata War: Oracle vs. Microsoft." By Alicia Costanza. In ent Magazine Volume 5, Number 12 (July 19, 2000), page 28. "The Object Management Group (OMG), supported by Oracle Corp. and a battalion of other companies, announced the ratification of a new XML-based metadata standard: the Common Warehouse Metamodel (CWM). Approval of the standard is a victory for Oracle in the ongoing battle with Microsoft Corp. over metadata standardization efforts. CWM was developed because of the industry's need for commonly defined metadata. Right now, companies have many databases, many repositories, and many schemas describing the data, says Andrew Watson, vice president of technology at the OMG. By creating a standard, different data models can be integrated, plus there can be a standard basis for data mining and OLAP across the enterprise. Not only will a metadata standard enable integration throughout a single enterprise, but it will also enable data communication between different companies and their data applications. The purpose of CWM is to do just that: provide metadata with universal definitions so all data applications and stored data can be shared, integrated, and understood. . . The OMG approval of the CWM standard will undoubtedly lead numerous companies to adopt the Oracle-backed standard. Some companies, however, are members of both the Object Management Group and the Metadata Coalition. If two separate standards coexist in the same space, what does this mean for software vendors and IT managers? 'We ultimately expect the OMG and Metadata Coalition standards will converge over time as common partners of both Oracle and Microsoft will push for this and because of the simple fact that each organization -- OMG and MDC -- is a member of the other. Rather than one party surrendering the battle, we fully expect to see a compromise reached, and we eagerly await the Metadata Coalition's reaction,' Schiff says. 
See "OMG Common Warehouse Metadata Interchange (CWMI) Specification."

  • [August 04, 2000] "Transforming XML: Adding New Elements and Attributes." By Bob DuCharme. From XML.com (August 02, 2000). ['This month's installment of our XSLT tutorial covers adding new elements and attributes to the results of your XSLT transformations.'] "In the first "Transforming XML" column, we saw how an XSLT style sheet can instruct an XSLT processing program to copy, delete, and rename elements being copied from the input to the output. Another common task is the addition of new elements and attributes to the output. Whether you're converting an element to a new attribute, an attribute to a new element, or supplying either with a function's return value or a hardcoded string, you can choose between a quick simple way and a more complex and powerful way to add both elements and attributes to your output. An XSLT processor's main job is to look through a style sheet for the various specialized elements from the XSLT namespace and to execute the instructions specified by those elements on the tree where the input document is stored in memory. When the processor finds elements from outside of the XSLT namespace in any of a style sheet's templates, it passes them along to the result tree and eventually to the output document. We call these 'literal result elements.' This makes it easy to add new elements to your output documents: simply add elements from outside of the XSLT namespace inside of the appropriate templates. (If you really want to output elements from the XSLT namespace--for example, to generate style sheets as output--you'll need XSLT's namespace-alias element.) The following XSLT style sheet demonstrates this. . ." For related resources, see "Extensible Stylesheet Language (XSL/XSLT)."
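The effect of a template with literal result elements — copy input content into newly wrapped elements and attach fresh attributes — can be mimicked outside an XSLT processor. Python's standard library has no XSLT engine, so the sketch below reproduces the transformation by hand with ElementTree; the tag names and the `transform` function are invented for the example.

```python
import xml.etree.ElementTree as ET

def transform(input_xml):
    """Wrap each input <employee> in a new <record> element (as a
    literal result element would) and add an attribute to it (as
    xsl:attribute would)."""
    src = ET.fromstring(input_xml)
    out = ET.Element("staff")                      # new output wrapper
    for emp in src.iter("employee"):
        record = ET.SubElement(out, "record")
        record.set("source", "hr")                 # attribute added to output
        ET.SubElement(record, "name").text = emp.text
    return ET.tostring(out, encoding="unicode")

print(transform("<employees><employee>Ada</employee></employees>"))
# <staff><record source="hr"><name>Ada</name></record></staff>
```

In real XSLT the same result falls out declaratively from a template; here the point is only what "adding new elements and attributes to the output" means.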

  • [August 04, 2000] "Even More Extensible." By Alan Kotok. From XML.com (August 02, 2000). ['Since our first survey of XML business vocabularies in February this year, the number of entries in our tables has more than doubled, highlighting the large push forward in vertical and cross-industry standardization activity.'] "An Updated Survey of XML Business Vocabularies. In February 2000, XML.com published a survey that found 124 different XML business applications at various registries and reference sources. By August 2000, the number of XML business vocabularies listed with these sources appears to have doubled to over 250, putting even more pressure on standards bodies to develop a way for these applications to interact. As in the first survey, a majority of the vocabularies (146 of the 251) represent one-industry applications, called verticals. Some 95 other entries express business functions, defined as common operations that apply to a number of industries. The fewest number of vocabularies are frameworks that offer interoperability among the business languages, but frameworks provide the most functionality and have received the most attention. Many of the vocabularies uncovered in the new survey tend to represent much more specialized uses of XML. The new entries include more vocabularies for the exchange of scientific data, including a Physics Markup Language and languages for genetics. The increase in specialization is also expressed in single company vocabularies designed for doing businesses with those companies. For example, the DSL service provider Covad has an XML vocabulary for handling various customer service functions. Despite the proliferation, some consolidation among vocabularies has begun. The Human Resource Management Markup Language now comes under the HR-XML consortium. Likewise the Hospitality Industry Technology Integration Standards and Open Travel Alliance announced plans in June to merge their specifications."

  • [August 04, 2000] "XML-Deviant: Investigating the Infoset." By Leigh Dodds. From XML.com (August 02, 2000). ['XML's syntax was invented before its data model, but the XML Infoset specification is seeking to plug the gap and formalize the data model. The XML-Deviant examines what the Infoset is, and what people think of it so far.'] "What is the XML Infoset specification? What purpose does it serve? These are some of the questions that have been discussed on XML-DEV this week. The XML-Deviant was there to record the answers. The latest draft of the XML Information Set ("Infoset") specification was published this week, providing an update to the previous December 1999 draft. The Infoset is one of those specifications frequently mentioned, but rarely discussed in detail. Paul Abrahams, no doubt voicing the thoughts of many other developers, wondered what the purpose of the Infoset was: What is the purpose of the XML Infoset? Is it mainly intended to enlighten implementors about what the abstract structure of an XML document is, or does it have some other less obvious uses?..." For the W3C specification itself and the design goals of the XML Core Working Group, see the latest version of the XML Information Set working draft, edited by John Cowan. According to its published abstract, the specification "describes an abstract data set which contains the useful information available from an XML document." See further (1) XML Information Set Requirements, (2) W3C mailing list comments, and (3) the XML Activity Statement.
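The Infoset's point — that an XML document is an abstract set of information items, not a particular byte sequence — can be demonstrated concretely: two serializations that differ only in syntactic accidents yield identical information. A small sketch using Python's standard library (the comparison shown covers only element name, attributes, and character data, a fragment of the full Infoset):

```python
import xml.etree.ElementTree as ET

# Two serializations differing only in things the Infoset does not record:
# attribute order, quote style, and whitespace inside the start tag.
doc_a = '<item id="1" lang="en">text</item>'
doc_b = "<item lang='en' id='1' >text</item>"

a, b = ET.fromstring(doc_a), ET.fromstring(doc_b)
same = a.tag == b.tag and a.attrib == b.attrib and a.text == b.text
print(same)  # True
```

This is the sense in which the Infoset "formalizes the data model": it names what survives parsing, so tools can agree on it regardless of surface syntax.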

  • [August 04, 2000] "Style Matters: A Question of Timing." By Didier Martin. From XML.com (August 02, 2000). ['The SMIL family of XML applications enables synchronized display of multimedia elements on the Web. Didier Martin explores SMIL, and the new synchronization features in Microsoft's IE5.5.'] "We are so used to HTML that, most of the time, the whole notion of a web browser is associated with HTML browsing. But outside of Microsoft or Netscape browsers there are alternatives, such as SMIL browsers, the most popular SMIL browser being the Real Audio G2 player. SMIL browsers display movies and animations, and play sound tracks. All these media are time sensitive, and to render multimedia objects is also to synchronize their rendering. Microsoft's recently released IE5.5 browser has some new synchronization capabilities. IE 5.5 supports a first version of the TIME module, a subset of the SMIL specification. But before we look at TIME, what is SMIL? SMIL (pronounced 'smile') stands for Synchronized Multimedia Integration Language. And guess what? SMIL is a rendering language based on XML. It is, in fact, a two-year-old W3C Recommendation! Two main multimedia browsers can interpret SMIL documents: (1) The Apple QuickTime Viewer; (2) The RealAudio G2 Viewer. These two are by far and away the most popular, having millions of copies installed. A SMIL document's structure is similar to the HTML document structure. . . Microsoft's IE 5.5 browser includes a useful new feature inspired by the SMIL specification: the time dimension. The HTML+TIME implementation is a first version of the SMIL 'Boston' profile [Synchronized Multimedia Integration Language (SMIL) Boston Specification, W3C Working Draft 22-June-2000; part of the W3C Synchronized Multimedia Activity]. W3C's SMIL Boston is a SMIL specification in which the SMIL features are defined as modules, and one or more modules can be included in other languages such as XHTML. 
The Microsoft implementation is an early draft and will not necessarily be compliant with the final recommendations. But at least it can give us an idea of the potential, and offer an early preview of the time and synchronization capabilities offered by the SMIL Boston working draft. Any object displayable in the browser's display area can have its lifecycle determined by the document's author. For instance, a paragraph may appear for only 5 seconds, or a floating frame may appear after 10 seconds. The SMIL Boston modules, and more particularly their implementation in mass market browsers, bring real multimedia capabilities to our published documents. For multimedia objects like movies, soundtracks, and animations, synchronization and object life cycle management are essential. Until SMIL Boston is finalized and gets browser support, you can experiment with the RealPlayer browser and SMIL publishing tools (RealPresenter), or play with IE 5.5's early SMIL Boston implementation." [Note also (1) SMIL Animation, published July 31, 2000, and (2) Synchronized Multimedia Integration Language Document Object Model (DOM).] For related references, see "Synchronized Multimedia Integration Language (SMIL)."
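The lifecycle control described above ("a paragraph may appear for only 5 seconds, or a floating frame may appear after 10 seconds") amounts to resolving per-element begin/duration attributes into visibility intervals. A minimal sketch of that arithmetic, with invented data structures rather than real SMIL markup:

```python
def visible_intervals(timings):
    """Resolve SMIL-style begin/dur values (in seconds) into
    (start, end) intervals; dur=None means the element stays
    visible indefinitely once it appears."""
    resolved = {}
    for element, (begin, dur) in timings.items():
        end = None if dur is None else begin + dur
        resolved[element] = (begin, end)
    return resolved

# A paragraph visible for 5s from load; a frame that appears at 10s and stays.
print(visible_intervals({"p1": (0, 5), "frame1": (10, None)}))
# {'p1': (0, 5), 'frame1': (10, None)}
```

A SMIL or HTML+TIME engine does this resolution (plus much more: repeats, sync arcs, event-based begins) for every timed element in the document.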

  • [August 03, 2000] "XML Dataspaces for Mobile Agent Coordination." By Giacomo Cabri, Letizia Leonardi, and Franco Zambonelli (Dipartimento di Scienze dell'Ingegneria, Università di Modena, Italy). In Proceedings of the 2000 ACM Symposium on Applied Computing [March 19-21, 2000, Como, Italy] Volume 1, pages 181-188. [Author contacts: Phone: +39-059-376735; Fax: +39-059-376799; E-mail: {giacomo.cabri, letizia.leonardi, franco.zambonelli}] Abstract: "This paper presents XMARS, a programmable coordination architecture for Internet applications based on mobile agents. In XMARS, agents coordinate -- both with each other and with their current execution environment -- through programmable XML dataspaces, accessed by agents in a Linda-like fashion. This suits very well the characteristics of the Internet environment: on the one hand, it offers all the advantages of XML in terms of interoperability and standard representation of information; on the other hand, it enforces open and uncoupled interactions, as required by the dynamicity of the environment and by the mobility of the application components. In addition, coordination in XMARS is made more flexible and secure by the capability of programming the behaviour of the coordination media in reaction to the agents' accesses. An application example related to the management of on-line academic courses shows the suitability and the effectiveness of the XMARS architecture. [...] The presented XMARS architecture, by exploiting the power and the flexibility of a programmable Linda-like coordination model in the context of XML dataspaces, may provide several advantages in Internet applications based on mobile agents. In fact, while programmable Linda-like coordination suits the mobility of the application components and the openness of the Internet scenario, the emergent XML standard for Internet data representation may guarantee a high-degree of interoperability between heterogeneous components. 
However, apart from the need of updating portions of the current implementation (as discussed in section 3), there are still several issues to be solved to make the XMARS architecture practically usable. These issues mainly relate to the lack of some XML specifications that are instead necessary for any effective management (and consequently for a tuple-based management) of XML documents. First of all, the XML schemas specification will permit to better specify data types in XML documents than the current version; this specification, together with the integration of XML namespaces in the current implementation, will be of great help in translating XML fields into Java objects (and vice versa), and in improving the effectiveness of the pattern matching mechanism. Second, the XML fragments specification will precisely define how fragments of an XML document can be extracted and inserted in a harmless way (i.e., preserving the validity of the XML document itself), thus meeting the need of handling single elements inside a possibly long document and, in the context of XMARS, enabling the system to extract/insert tuples representing a fragment of a large document in a consistent way. Strictly related, the XLink and XPointer specifications, which will rule the connections between different (parts of) XML documents, will possibly lead to richer and more complex tuple types." The document is also available in Postscript format. See the related collection of technical papers from the MOON Project - Mobile Object Oriented Environments. [cache]
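A Linda-like dataspace, as used by XMARS, is a shared store that processes write tuples into and read tuples from by pattern matching rather than by address, which is what decouples mobile agents from each other. A minimal sketch of the idea with XML tuples, in Python; the class and its `out`/`rd` operations follow classic Linda naming, but everything else here is invented for illustration and far simpler than XMARS's programmable reactions.

```python
import xml.etree.ElementTree as ET

class XmlTupleSpace:
    """Minimal Linda-like space: out() writes an XML tuple, rd() matches
    a template in which None-valued fields act as wildcards (formals)."""
    def __init__(self):
        self.tuples = []

    def out(self, xml_tuple):
        self.tuples.append(ET.fromstring(xml_tuple))

    def rd(self, tag, **template):
        for t in self.tuples:
            if t.tag != tag:
                continue
            if all(v is None or t.findtext(k) == v for k, v in template.items()):
                return ET.tostring(t, encoding="unicode")
        return None  # no matching tuple in the space

space = XmlTupleSpace()
space.out("<course><name>Networks</name><slots>30</slots></course>")
# An agent asks for any 'course' tuple named Networks, slots unconstrained:
print(space.rd("course", name="Networks", slots=None))
# <course><name>Networks</name><slots>30</slots></course>
```

The agent that wrote the tuple and the agent that reads it never meet: coordination is entirely through the shared XML dataspace.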

  • [August 03, 2000] Annotations on Markup Languages: Theory and Practice Volume 2, Issue 1. The 'Winter 2000' issue of the academic journal Markup Languages: Theory and Practice (MIT Press) has been published, so I have prepared an annotated Table of Contents document for this publication: Volume 2, Issue 1 (Winter 2000). The document provides extended abstracts/summaries for the feature articles and some additional links. MLTP 2/1 has a mix of excellent articles: XML/EDI best practices, system architectures for structured documents, topic maps, web content authoring and transcoding, caterpillar expressions for tree matching, parameter entity tricks, book review. Details of editorship and publication are available in: (1) the journal publication statement; in (2) the journal description document; and in (3) the overview of the serials document, Markup Languages: Theory & Practice. See also the annotated TOCs for previous issues 1/1, 1/2, 1/3, and 1/4.

  • [August 03, 2000] "W3C advances shrink-to-fit graphics technology." In CNET (August 02, 2000). "More than a year behind schedule, a Web standards body today advanced a graphics technology aimed at making computer images fit into any screen--from cell phone displays to large monitors. The World Wide Web Consortium (W3C) is inviting comment on Scalable Vector Graphics (SVG), advancing the technology to the standards body's penultimate 'candidate recommendation' status. The specification, first proposed in January last year and originally slated for a summer 1999 proposed recommendation, promises to make Web graphics more flexible and lightweight, as well as more easily integrated with Web documents. Vector graphics are images that computers can render from a set of geometric descriptions instead of pixel-by-pixel bitmap copies such as the common JPEG or GIF formats. Because vector graphics are mere abstract descriptions, they can fly through tight bandwidth connections that typically choke on bulky image files. Vector graphics also have the advantage of being easily resized to suit their destinations. On that score, SVG comes at an opportune time for companies tailoring Web pages to fit a variety of different Web-surfing devices, including small appliances such as telephones. Because photographs still require bitmap formats, vector graphics don't spell the end of bitmap images on the Web. To that end, SVG is designed to assimilate and more efficiently resize photographic images, according to the W3C. SVG is written in Extensible Markup Language (XML), a W3C recommendation for creating specialized markup languages for the Web. Capitalizing on XML's capabilities, the W3C has made SVG's textual content, such as logos and labels, searchable and translatable, among other things." See the W3C press release and the main news entry.

  • [August 02, 2000] "DSML [Dialog System Markup Language]: A Proposal for XML Standards for Messaging Between Components of a Natural Language Dialog System." By Dragomir R. Radev, Nanda Kambhatla, Yiming Ye, Catherine Wolf, and Wlodek Zadrozny (IBM TJ Watson Research Center, 30 Saw Mill River Road, Hawthorne, NY 10532. Email: {radev,nanda,yiming,cwolf,wlodz} In Proceedings of the Artificial Intelligence and Simulation of Behaviour (AISB) 1999 Convention; the workshop on Reference Architectures and Data Standards for Natural Language Processing, Edinburgh, UK, April, 1999. Abstract: "In this paper, we propose using standard XML messaging interfaces between components of natural language dialog systems. We describe a stock trading and information access system, where XML is used to encode speech acts, transactions and retrieved information in messages between system components. We use an XML/XSL based unification approach to display personalized, multi-modal system responses. We are proposing the creation of XML standards for all messaging between components of NLP systems. We hope that the use of XML in messaging will promote greater interoperability of both data and code. The working name of the proposed standard messaging language(s) is DSML ('Dialog System Markup Language.') [...] In our natural language dialog (NLD) processing system, a user can express requests in any modality (e.g., speech, text, graphics, etc.). The NLD system iteratively identifies the communicative act(s) of a user, queries for and fills the parameters of the identified action, executes the action, and displays the results to the user using an appropriate modality. Our class of NLD systems can be considered to be a sub-class of more general language engineering systems such as the General Architecture for Text Engineering (GATE) (Gaizauskas et al.) system. . . 
Conclusion and Future Work: We are proposing the use of XML/XSL technologies as a standard messaging format for the components of natural language dialog systems. The XML can be used at several levels. For instance, we can use XML for representing the logical forms output by NLP parsers and also to represent the dialog acts for dialog processing. Similarly, XML can be used to represent contextual information needed for utterance interpretation (e.g., background knowledge, previous discourse, etc.). We envision a phased approach to building XML-based standards for NLP systems: Phase I: There is broad usage of XML for NLP/dialog processing systems, leading to more modular and easily exchangeable code and data. However, each group uses its own custom XML. The groups communicate with each other. Phase II: Standard XML languages are identified for different domains (e.g., brokerage systems, ATIS, directory services, etc.). Thus, we agree both on the syntax and the vocabulary of the representations. We foresee potential development of transducers to transform messages in other languages such as KQML to XML. Like XML, our XML-based standard will facilitate the encoding of speech acts such as 'request', 'assert', 'reply', etc. Phase III: Standard semantics are identified for interpreting the standard XML languages, leading to interoperable data and code. This stage will involve the use of content languages such as KIF or Lisp, as well as standardized domain ontologies." See now "Dialogue Moves Markup Language (DMML)." [cache]

  • [August 02, 2000] "Natural Language Dialogue for Personalized Interaction." By Wlodek Zadrozny, Malgorzata Budzikowska, J. Chai, Nanda Kambhatla, Sylvie Levesque, and Nicolas Nicolov. In Communications of the ACM (CACM) Volume 43, Number 8 (August, 2000), pages 116-120. ['Technologies that successfully recognize and react to spoken or typed words are key to true personalization. Front- and back-end systems must respond in accord, and one solution may be found somewhere in the middle(ware).' The article presents the "Dialogue Moves Markup Language (DMML)" in the context of this CACM special issue on 'Personalization'.] "We address the issue of managing the complexities of dialogue systems (for example, using NL dialogue on devices with differing bandwidth) by describing a piece of middleware called Dialogue Moves Markup Language (DMML). One of the key aspects of engineering is to design the middleware (middle layer)... which typically contains a dialogue manager, an action manager, a language-understanding module, and a layer for conveying messages between them... Universal Interaction uses the same mechanism for different communication devices (such as phones, PDAs, and desktop computers). This means conveying the same content through different channels by suitably modifying the way it is represented. Universal Interaction architecture is tailor-made for personalization; the actual interface for each user can be specifically constructed for him or her based upon geography-specific, user-specific, and style-specific transformations. How do we transform the content to fit into different representations? A potentially good idea is to use an XML/XSL-based architecture for the separation of form, content, style, and interactions. To make the idea more specific, imagine how one can represent a dialogue move in a stock trading scenario. 
DMML -- inspired by the theory of speech acts and XML -- is an attempt to capture the intent of communicative agents in the context of NL dialogue management. The idea is to codify dialogue moves such as greetings, warnings, reminders, thanks, notifications, clarifications, or confirmations in a set of tags connected at runtime with NL understanding modules, which allow us to describe participants' behaviors in terms of dialogue moves, without worrying about how they are expressed in language. For instance, 'any positive identification must be followed by a confirmation.' The tags can also encode other parameters of the dialogue, such as the type of channel and personal characteristics. Thus, the dialogue can reflect the channel characteristics, which connects DMML and Universal Interaction. [...] DMML thus illustrates the concept of communication middleware in dialogue systems, and is very well suited for personalization." See "Dialogue Moves Markup Language (DMML)."
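Codifying a dialogue move as a tag with channel metadata, as described above, is straightforward to sketch. The example below invents its own element and attribute names in the spirit of DMML (the actual DMML vocabulary is not reproduced here), using Python's standard library to build the markup:

```python
import xml.etree.ElementTree as ET

def dialogue_move(move, channel, text):
    """Encode a dialogue move (greeting, warning, confirmation, ...)
    as an XML element carrying channel metadata."""
    elem = ET.Element(move, channel=channel)
    elem.text = text
    return ET.tostring(elem, encoding="unicode")

# The stock-trading scenario's 'positive identification must be
# followed by a confirmation' rule might emit:
print(dialogue_move("confirmation", "phone", "You bought 100 shares of IBM."))
# <confirmation channel="phone">You bought 100 shares of IBM.</confirmation>
```

Because the move type and channel are explicit in the markup, a downstream XSL transformation can render the same move differently for a phone, a PDA, or a desktop browser.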

  • [August 02, 2000] "XBRL Taxonomy: Financial Reporting for Commercial and Industrial Companies, US GAAP." 2000-07-31. Edited by Sergio de la Fe, Jr. (CPA, KPMG LLP), Charles Hoffman (CPA, XBRL Solutions, Inc.), and Elmer Huh (Morgan Stanley Dean Witter). ['Taxonomy for the creation of XML-based instance documents for business and financial reporting of commercial and industrial companies according to US GAAP.'] Abstract: "This documentation explains the XBRL Taxonomy Financial Reporting of Commercial and Industrial Companies, US GAAP, dated 2000-07-31. This taxonomy is created compliant to the XBRL Specification, dated 2000-07-31. It is for the creation of XML-based instance documents that generate business and financial reporting for commercial and industrial companies according to US GAAP. XBRL is a specification for the eXtensible Business Reporting Language. XBRL allows software vendors, programmers, and end users who adopt it as a specification to enhance the creation, exchange, and comparison of financial reporting information. Financial reporting includes, but is not limited to, financial statements, financial information, non-financial information and regulatory filings such as annual and quarterly financial statements." See "Extensible Business Reporting Language (XBRL)."

  • [August 01, 2000] "The Paper Path: XML to Paper Using TeXML." By Brian E. Travis. In TUGboat [The Communications of the TeX Users Group] 20 (4) (December 1999), pages 350-356. A complete tutorial on the use of IBM's TeXML. ["TeXML provides a path from XML into the TeX formatting language. It supports many Unicode characters and includes a document type and formatter for plain text. The path to print begins with your XML document. You write an XSL transform which accepts your document type and outputs a new XML document which conforms to the TeXML document type. The Java program TeXMLatte transforms any document conforming to the TeXML document type into TeX. You may now use your TeX processor to produce typeset output from XML."]
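The last step of the path above — mapping an XML vocabulary onto TeX control sequences — can be sketched in a few lines. The tag names and macros below are invented for illustration; the real pipeline goes through the TeXML document type and the TeXMLatte program rather than a hand-written mapping.

```python
import xml.etree.ElementTree as ET

# Map a tiny document vocabulary onto TeX markup (illustrative only,
# not the actual TeXML DTD or its macro set).
TEX_MACROS = {"title": r"\section{%s}", "para": "%s\n\n"}

def to_tex(xml_doc):
    """Render each child element of the root through its TeX template."""
    root = ET.fromstring(xml_doc)
    return "".join(TEX_MACROS[child.tag] % child.text for child in root)

print(to_tex("<doc><title>XML to Paper</title><para>Via TeX.</para></doc>"))
```

In the TeXML pipeline this mapping is expressed as an XSL transform into the TeXML document type, which keeps the XML-to-TeX correspondence declarative instead of buried in code.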

  • [August 01, 2000] "XML standard for electronic news released." By Christine M. Campbell. In Network World (July 18, 2000). "The International Press Telecommunications Council (IPTC) approved on Monday the public release of a standard based on Extensible Markup Language (XML) to simplify the delivery, creation and retrieval of news in electronic formats. Dubbed NewsML, the standard's 1.0 beta version was launched Monday and is the direct result of an IPTC initiative to deliver a standard to more easily manage electronically formatted news in environments such as wire services and Web sites, according to the IPTC. In addition, the standard allows for better categorization and indexing of the content, making retrieval easier. The content can be text, video, audio or still images. 'NewsML is about creating linked streams of objects so they can be managed through production' to the Web site rendering, said Tony Allday, product development director at Reuters Group. NewsML allows a developer to 'create one document that could be, because it's in XML, rendered using XML style sheets for the Web site and translated for use on some wireless devices,' he said. Using metadata tags to categorize news packages and their elements, NewsML allows for the physical description of the content, information about the construction of the content -- such as the author, publisher or owner -- and information about the content, such as what a particular story is about or who might be interested in it, according to the IPTC." See "NewsML and IPTC2000."

  • [August 01, 2000] "Common XML - Final Review Draft Specification." From the SML-DEV mailing list. 27-July-2000. "Common XML begins with a frequently used and thoroughly reliable subset of the features provided by the XML 1.0 and Namespaces in XML W3C Recommendations. Common XML defines a very small core, but allows developers to move beyond that core if needed. Additional features from XML 1.0 and Namespaces are still available. This specification includes descriptions of the impact of features beyond the core on interoperability." [From Simon St.Laurent, 2000-08-01: "The SML-DEV mailing list has published a 'Final Review Draft' of Common XML...Common XML is the milder (and generally less controversial) version of XML simplification that we've been working on through SML-DEV. Common XML is only a redescription of XML in terms of which parts are most highly interoperable, not a stripping-down of any XML functionality. Comments on the draft are welcome, either to the editor (me) or to the SML-DEV mailing list. We're hoping to close review by September 1 and move toward a final release. This document is the product of a mailing list, not a formal organization, so our publishing process is still in development, of course! More information on SML-DEV work is available at: Background information on SML-DEV and its activities is available at: XMLHack." [cache]

  • [August 01, 2000] "StarOffice goes open source." By James Niccolai and Paul Krill. In InfoWorld Volume 22, Issue 30 (July 24, 2000), page 12. "Sun Microsystems last week announced it will release the source code for its StarOffice productivity suite, a move that will allow software developers to modify the product to suit their needs. The code will be released under the GNU GPL (General Public License) open-source licensing model at, which will serve as the coordination point for the code, the definition of XML file formats, and the definition of language-independent APIs. Sun's plan was described as 'a significant initiative' but also as 'schizophrenic benevolence' in a statement by Stacey Quandt, an analyst at Giga Information Group, in Cambridge, Mass. A Sun representative also confirmed that a Web-based version of the software, StarPortal, is still being tested and will not ship for several weeks. Sun originally said it would ship in the first half of the year." See the main news item.

  • [August 01, 2000] "Sun hints at new strategy with open-source office software." By Stephen Shankland. In CNET News.com (July 19, 2000). "Sun Microsystems finally has found a software project it likes that it's willing to let others control. Sun will announce today, as previously reported, that it will open the source code for the next version of its StarOffice suite of productivity software, said Marco Boerries, head of the software project that Sun acquired for $73.5 million. The OpenOffice group will control the file formats, which will use the XML data description technology. Though StarOffice can read and write Microsoft Office formats, currently StarOffice files must be saved in Microsoft Office formats for Microsoft Office users to read them, he said. Microsoft could supply import filters to change that situation, but the company declined earlier, and Boerries doesn't consider it likely that it would. . . StarOffice, which Sun allows people to download for free, has a word processor, spreadsheet, presentation package and other software for general office use. It competes with Applix's VistaSource software, which like StarOffice runs on Linux and several other operating systems. StarOffice also competes with the dominant Microsoft Office for Windows and Corel's nascent Word Perfect Office for Linux. The next version of the software, 6.0, will allow programs such as the word processor to be run by themselves instead of only as a part of the StarOffice suite, Boerries said. Sun also will release the source code of the StarPortal software, a delayed future version of the software that's designed to run primarily on central servers so that people can tap into it from handheld computers, cell phones, desktop computers or other devices. Sun hasn't decided which license to release that version under, however. . ." See the main news item.

  • [August 01, 2000] "Microsoft to help firms join online exchanges." By Melanie Austria Farmer. In CNET News.com (July 17, 2000). "Less than a month after announcing a wide-ranging Internet plan, Microsoft will team with KPMG Consulting to help businesses connect their existing computer systems with online marketplaces. Microsoft and KPMG, which have partnered in the past, say they plan to offer Dot.Ramp, a new software and services initiative designed to simplify the integration process for buyers and sellers. The offering targets businesses of all sizes and is centered around Microsoft's .Net strategy. KPMG will provide systems integration work and consulting services. The company's .Net plan, heavily based on the Extensible Markup Language (XML) data-sharing standard, is aimed at making Microsoft's existing software available over the Web to traditional PCs and to increasingly popular devices such as cell phones and personal digital assistants (PDAs). KPMG and Microsoft said Dot.Ramp will simplify the integration process by using common Internet business standards such as XML (Extensible Markup Language) to connect businesses with multiple marketplaces. XML is a Web standard for exchanging data. In addition, the companies said that Dot.Ramp supports most business management software products from companies including SAP, J.D. Edwards, Ariba and Commerce One. Microsoft and KPMG said they will focus on five markets with the new offering, including industrial products, aerospace, chemical, energy and automotive."

  • [August 01, 2000] "Developers To Polish New Perl." By Charles Babcock. In Inter@ctive Week Volume 7, Number 30 (July 31, 2000), page 20. "Perl, the scripting language whose flexible connecting features make it known as 'the duct tape of the Web,' will get a massive upgrade over the next 18 months from its original author, Larry Wall. The language is being rewritten 'from the ground up' to bring it into the Web-based world and let it work with other Internet technologies, said Nathan Torkington, one of the developers on the Perl 6 project. The upgrade will better Perl's management of system memory, improve its ability to parse eXtensible Markup Language and search for XML-tagged documents, and make the language more compatible with Java and other software programs." See "XML and Perl."

  • [August 01, 2000] "Vendors look to ease directory synchronization." By Deni Connor. In Network World (July 31, 2000). "Novell and the Sun-Netscape Alliance are trying to make it easier for large customers to synchronize, exchange and manage user data across a variety of systems. Last week at The Burton Group Catalyst Conference, Novell announced that it is shipping its long-delayed metadirectory, DirXML, and several links, known as connectors, to other directories and messaging applications. Connectors let directories replicate changes from one application to another, with Novell's NDS eDirectory at the center. In September, Novell will announce 12 other connectors for e-business and other applications, including PeopleSoft human resources applications; the Open Database Connectivity and Java Database Connectivity and X.500 directory standards; Windows NT domains; X.500; BroadVision's e-business applications; and SAP applications. Novell will also link the IBM/Tivoli SecureWay, Entrust and iPlanet Directory Server with its Lightweight Directory Access Protocol (LDAP) connector. Those connectors will complement Novell's existing DirXML connectors for Microsoft Exchange and Active Directory, other Novell Directory Services directories, LDAP Version 3 and Lotus Notes. DirXML is based on NDS eDirectory and uses XML to link user data and create a metadirectory, which collates data from disparate directories and combines it into a logical whole. Sun-Netscape also rolled out a proprietary scripting language dubbed the Directory Application Interface (DAI), which works with XML, LDAP and Java Server Pages to build connectors. The company claims that DAI simplifies the creation of connectors, requiring developers to only know simple APIs and not the complexity of the directory." See "Directory Services Markup Language (DSML)."

  • [August 01, 2000] "Directories meet e-comm. Integration with XML-enabled apps will be key." By John Fontana. In Network World Volume 17, Number 30 (July 24, 2000), pages 1, 73. "Long thought of as a place to manage end users and organize lists of employees, the enterprise directory is quickly evolving into a platform for e-commerce and a key technology for use with XML-enabled applications. That evolutionary process and its importance for enterprise users will get a thorough examination this week at The Burton Group's Catalyst Conference in San Diego. IT executives will get a peek at new products from several vendors, including Netegrity and Oblix, that are designed to help firms securely expose their directories to outside users. The directory is key for controlling business partners' access to applications and data, which is a pressing issue among IT executives building e-commerce relationships. They also will be looking at the Directory Services Markup Language (DSML), an XML specification introduced at the conference last year to great fanfare. The now emerging DSML 2.0, which will be put on a standards track, raises hopes of XML and directory integration, along with concerns over fragmentation of directory access standards. . . In addition to access, other key issues will find prominence at Catalyst, including DSML 2.0. A year ago at the conference, e-commerce vendor Bowstreet introduced the 1.0 version with backing from Microsoft, Novell and Oracle, among others. Version 1.0 was limited, providing only a description of a directory's content. DSML 2.0 promises to add query and modification capabilities and the ability to manipulate directory data, a critical step allowing developers of XML-enabled applications to add hooks to a directory. Many vendors, including iPlanet, Radiant Logic and Sun, will use Catalyst to demonstrate support for DSML. Radiant Logic plans to introduce Radiant One 1.5, which supports DSML 1.0. 
The software is a 'virtual directory' that has an intelligent cache to accelerate LDAP-based access and modification of back-end database information. But DSML 2.0 is raising some questions as XML and directories continue on a course toward convergence. Observers are concerned about the overlap of DSML and the Lightweight Directory Access Protocol and whether LDAP, XML's Simple Object Access Protocol (SOAP) or both will become the protocol of choice for accessing a directory. LDAP isn't particularly suited to traverse corporate firewalls, while SOAP is designed just for that purpose." See "Directory Services Markup Language (DSML)."

  • [August 01, 2000] "Directories in the limelight." By Stephanie Sanborn. In Network World (July 25, 2000). "Directories took center stage at The Burton Group's annual Catalyst Conference here Monday, with Novell taking the wraps off its long-awaited DirXML product and iPlanet International introducing a DAI (Directory Application Integration) architecture and the iPlanet Directory Server 5. DirXML, based heavily on XML, acts as a link between the directory and business applications so that companies can integrate applications without altering the applications themselves. The software also allows the linking of data sources to access information "no matter where it resides," said Ed Anderson, director of product management for directory services at Provo, Utah-based Novell. . . The DirXML engine will sit on top of eDirectory, between the access front end and the management back-end systems, providing the framework for developers to build applications. With the user information stored in the directory and other corporate data sources, DirXML drivers serve as a link to enable services such as provisioning, business-to-business extranet access, data access based on security policies, and CRM (customer relationship management). . . Novell will create solutions around DirXML and deliver the technology through its channels, starting with consulting partners and services. The company at Catalyst will be demonstrating an example involving a sample employee provisioning system, with PeopleSoft as the HR component and DirXML linking the corporate systems. More solutions will be delivered during the next few months, Anderson said."

  • [August 01, 2000] "An Introduction to XML." By Michel Rodriguez. In Boardwatch (July 2000). "...XML is a wonderful tool, and can be used to vastly enhance the value of information available both on the Web and on an intranet. It is not a magic wand, though, and it will not solve anything until the overall system is properly designed. XML defines a format for storing and exchanging information, and it comes with a host of tools and associated standards that make it much easier to handle than any "home-brewed" format. That's it. Without a sensible scheme (most likely described through DTDs or schemas) for documents and data, harmonization with other organizations, and a clean design of the general architecture of the system and proper attention paid to the details of how all the pieces are actually going to work together, the result will be either gigabytes of useless tag soup or days spent tagging the middle initial of each and every person who ever had a look at the system. On the other hand, using XML may be as simple as just choosing it as the format to store some configuration files, exchange a couple of simple data tables or add a small number of custom elements to HTML files. It will provide a flexible way to store and retrieve data while getting to understand better the format and what can be done with it. There is no need to start with a full-blown revamping of a whole Web site involving converting every single piece of data it contains."

  • [August 01, 2000] "Sun decides to break its Star Office suite to make the best open-source omelet." By Nicholas Petreley. In InfoWorld (July 28, 2000). "... Greater interoperability between KDE and GNOME components is supposedly already in the works. Sun's choice of Bonobo will simply increase the pressure on KDE to make its component architecture work seamlessly with Bonobo. More important to Sun is the fact that splitting up Star Office into components makes the suite more likely to become ubiquitous. The idea is to encourage developers to drop the Star Office word processor component into their applications instead of writing a word processor or editor module. There's no guarantee the open-source community or commercial developers will buy in to this, but I'm betting they will. The Bonobo layer should make the components relatively easy to reuse. There are no licensing fees or restrictions (other than the GPL requirement to release your modifications) to discourage developers from using the Star Office components. Sun is establishing a foundation of XML-based open file formats for its documents, so developers and users don't need to be afraid they'll be locked in to a proprietary data format when they adopt Star Office or applications that use Star Office components..."

  • [August 01, 2000] "More and more, XML diving into the 'stream'." By Roberta Holland. In eWEEK (July 27, 2000). "While the Extensible Markup Language has been hailed mainly for its usefulness in e-commerce, companies are also embracing XML as a quick way to stream massive amounts of content and data. On the content side, Wavo Corp. has XML at the heart of its MediaXpress service, which streams syndicated news, sports and entertainment to Web sites looking to boost their content. Next week Wavo will announce it is adding support for Unix and Linux. MediaXpress currently works with Windows NT. John Lehman, president of Sageware Inc. in Mountain View, Calif., said MediaXpress meets the needs of his startup, which categorizes content for vertical markets and distributes it to portals and Web sites. Lehman, whose company was a beta customer for MediaXpress on Unix, said the inclusion of XML is crucial. . . Like Wavo, TekInsight Inc. also has XML at the core of its technology. But TekInsight is using XML not to deliver media but to provide massive amounts of failure and monitoring data to IT departments. TekInsight's BugSolver Enterprise, announced earlier this week, uses what the company calls "Streaming XML," a more efficient way to transmit, store and retrieve large amounts of XML data, officials say. Streaming XML breaks data into sequenced, self-contained packets that are then token compressed. . ."
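The packet idea, as described, can be read as follows: instead of one monolithic document, each record travels as a self-contained, sequence-numbered XML fragment that a receiver can parse on arrival. This is only a plausible reading of the description; the actual wire format, including the token compression, is proprietary.

```python
def to_packets(records):
    # Wrap each record as a self-contained fragment with a sequence number,
    # so a receiver can parse and process packets as they arrive rather
    # than waiting for one large document to finish downloading.
    return ['<packet seq="%d">%s</packet>' % (i, rec)
            for i, rec in enumerate(records)]

packets = to_packets(["<a/>", "<b/>"])
print(packets[0])
```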

  • [August 01, 2000] ".Net Hurdles: Standards, Security." By Mitch Wagner. In InternetWeek (July 2000). "As Microsoft prepares to fill in developers on its next-generation Internet strategy this week, the company faces the considerable challenge of turning some of its biggest weaknesses into strengths, customers and analysts say. . . The company will provide its cadre of programmers with important technical details about the strategy. In addition, Microsoft will release beta versions of the first tools based on the blueprint, called Visual Studio.Net, the new name for its Visual Studio tool. The new tool will include a beta version of a new programming language, C# (pronounced C-Sharp), that will compete with Java. Like Java, C# is based on C++, and will therefore be attractive to developers with C++ skills. But where Java emphasizes cross-platform portability, C# is designed to use XML and SOAP to provide cross-platform connectivity. . .Microsoft has an imperfect record when it comes to standards support, said Giga's Enderle. The company's XML support is strong, for example, but Microsoft's support for the Lightweight Directory Access Protocol (LDAP) has been weak, he said. Reliability has proved to be a big problem for Microsoft and will be critical if the company is to sell its .Net strategy. Currently, Microsoft Outlook and Internet Explorer are a source of crashes, viruses and security holes, Enderle said. Bruce Schneier, CTO of consulting firm Counterpane Internet Security Inc., said the SOAP protocol on which .Net will be based is inherently insecure. It's designed to use the e-mail protocol SMTP for transport to let DCOM objects communicate with each other and get around firewalls. "This is a protocol for a hacker to tunnel through your firewall and mess with the file systems," Schneier said."

July 2000

  • [July 31, 2000] "Meaning and Interpretation of Markup." By C. M. Sperberg-McQueen (World Wide Web Consortium, USA), Claus Huitfeldt (University of Bergen, Norway), and Allen Renear (Brown University, USA). Paper presented at the ALLC/ACH 2000 Conference (Glasgow, Scotland). "Markup is inserted into textual material not at random, but to convey some meaning. An author may supply markup as part of the act of composing a text; in this case the markup expresses the author's intentions. The author creates certain textual structures simply by tagging them; the markup has performative significance. In other cases, markup is supplied as part of the transcription in electronic form of pre-existing material. In such cases, markup reflects the understanding of the text held by the transcriber; we say that the markup expresses a claim about the text. In the one case, markup is constitutive of the meaning; in the other, it is interpretive. In each case, the reader (for all practical purposes, readers include software which processes marked up documents) may legitimately use the markup to make inferences about the structure and properties of the text. For this reason, we say that markup licenses certain inferences about the text. If markup has meaning, it seems fair to ask how to identify the meaning of the markup used in a document, and how to document the meaning assigned to particular markup constructs by specifications of markup languages (e.g., by DTDs and their documentation). In this paper, we propose an account of how markup licenses inferences, and how to tell, for a given marked up text, what inferences are actually licensed by its markup. As a side effect, we will also provide an account of what is needed in a specification of the meaning of a markup language. 
We begin by proposing a simple method of expressing the meaning of SGML or XML element types and attributes; we then identify a fundamental distinction between distributive and sortal features of texts, which affects the interpretation of markup. We describe a simple model of interpretation for markup, and note various ways in which it must be refined in order to handle standard patterns of usage in existing markup schemes; this allows us to define a simple measure of complexity, which allows direct comparison of the complexity of different ways of expressing the same information (i.e. licensing the same inferences) about a given text, using markup. For simplicity, we formulate our discussion in terms of SGML or XML markup, applied to documents or texts. Similar arguments can be made for other uses of SGML and XML, and may be possible for some other families of markup language. . ." See also the bibliography cited in the paper; further in "What is Text, Really?" (DeRose/Durand/Mylonas/Renear). [cache]

  • [July 31, 2000] "XML Standards: A Problem Of Physics." [STOP BITS.] By Tim Wilson. In InternetWeek (July 24, 2000), page 35. "In two weeks, the ebXML standards group will meet in San Jose. The group, made up of representatives from the United Nations' CEFACT standards committees as well as the vendor-dominated OASIS industry consortium, is working on standards that will make XML data understandable across industries and international boundaries. The group plans to have XML messaging and business process specifications ready by the middle of next year. Such a meeting may sound colossally dull, but as a longtime observer of standards wars, I can't help but watch breathlessly. To me, it's sort of like a mixing of matter and antimatter. . . On one side, there is XML, perhaps the most important technology in e-business today. XML is the basic language that companies will use to link their applications, to share information, to transact business over the Web. Just as a shark must swim to survive, XML must evolve if e-business is to move forward. On the other side is a standards group made up of vendors and U.N. representatives. Groups of this same deadly composition -- such as the old International Standards Organization (ISO) and the Consultative Committee on International Telephony and Telegraphy (CCITT) -- have given us some of the most resounding IT standards failures of the past two decades. X.500 directory services. X.400 messaging. After nearly 20 years of development and countless hours of meetings and technical reviews, the entire Open System Interconnection (OSI) effort yielded virtually nothing beyond physical cabling standards. . . it concerns me that ebXML is now being 'standardized' by an organization that has more than 300 attendees at some working group meetings. Aren't we making the same mistakes that OSI's developers made years ago?"

  • [July 31, 2000] "Specification Lets Apps Span Industries." By Tim Wilson. In InternetWeek (July 17, 2000), pages 1, 16. "E-businesses next month will get their first glimpse of prototype data-sharing and transactional applications that can operate not only across companies, but also across industries. The applications are based on ebXML, a proposed framework that bridges ordering, billing and other information that is formatted differently in various industries. Test applications will be on display at a meeting of the Electronic Business XML consortium on August 7 [2000] in San Jose, Calif. The specs will not only open up cross-industry e-commerce, but they'll also make e-business more accessible to small and midsize companies that can't afford EDI, said Simon Nicholson, a market strategist at Sun Microsystems. Sun is a charter member of a vendor consortium developing ebXML along with the United Nations CEFACT standards group. At the meeting next month, Sun will demonstrate a prototype of its Java-based technology that lets applications transmit XML and non-XML messages -- such as requests for quotes and inventory information, as well as conventional EDI data such as purchase orders and shipping notices -- across Web infrastructures. . . Several companies, including IBM and Microsoft, are defining their own XML frameworks for linking disparate e-business applications. Microsoft's BizTalk already is being implemented by several companies, including Dell and CapitalStream, for cross-industry ordering, billing and other apps. 'We don't see BizTalk and ebXML as being in conflict,' said Dave Turner, Microsoft's XML evangelist. 'We are defining a means for exchanging data [via XML], but not the format of the data itself. We are not trying to define things like business processes, which ebXML is trying to tackle.' 
BizTalk and ebXML are different approaches to XML messaging, but Microsoft plans to address those differences when the ebXML standards are completed sometime in the middle of next year, Turner said. 'We have customers that want to build interoperable XML applications today, and that needs to be built on something that exists today,' he said. Meantime, the ebXML group hopes to complete its specs and implementations by the second half of 2001. The group published draft specs for its technical architecture, core components and business process model earlier this month."

  • [July 31, 2000] "Introducing XSLT. XSL Transformations: XSLT Alleviates XML Schema Incompatibility Headaches." By Don Box, Aaron Skonnard, and John Lam. From MSDN Magazine Online (July 20, 2000). MSDN Technical Article. ['Learn how the XSL Transformations (XSLT) specification defines an XML-based language for expressing transformation rules that map one XML document to another.'] "The XSL Transformations (XSLT) specification defines an XML-based language for expressing transformation rules that map one XML document to another. XSLT has many of the constructs found in traditional programming languages, including variables, functions, iteration, and conditional statements. In this article you'll learn how to use the XSLT instructions and template rules, manage namespaces, control transformation output, use multiple stylesheets, and employ pattern-matching with template rules. A sidebar explains how to access XSLT from MSXML using the IXSLTemplate and IXSLProcessor interfaces. . . The XML Schema definition language is poised to become the dominant way to describe the type and structure of XML documents. XML Schemas provide the basic infrastructure for building interoperable systems based on XML since they give you a common language for describing XML that is based on proven software engineering principles. That stated, the expressiveness of XML Schemas makes it possible (if not likely) that multiple organizations modeling the same set of domain-specific abstractions will come up with different schema documents. Yes, this problem could be solved via industry consortia defining canonical schema for each domain, but until that happens, dealing with multiple schema definitions of the same basic information will be a fact of life. Enter XSL Transformations (XSLT). The XSLT specification defines an XML-based language for expressing transformation rules from one class of XML document to another. 
The XSLT language can be thought of as a programming language, and there are at least two XSLT execution engines currently available that can directly execute an XSLT document as a program. But, XSLT documents are also useful as a general-purpose language for expressing transformations from one schema type to another. In fact, we could imagine using an XSLT document as one form of input to an arbitrary XML translation engine. XSLT excels at mapping one XML-based representation onto another... XSL Transformations solve a major problem caused by the proliferation of multiple XML Schemas describing complementary data. With XSLT, you can use your favorite programming language to map XML documents to one another, creating output in an arbitrary text-based format (including XML). Of course, simply mapping documents doesn't ensure that they can interoperate properly -- human interaction is still needed to interpret the data -- but XSL Transformations provide a valuable first step that makes the task easier." [This article was adapted from the forthcoming book Essential XML (Chapter 5), by Don Box, Aaron Skonnard, and John Lam, (c) 2001 Addison Wesley Longman.]
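The schema-incompatibility problem the authors describe can be illustrated without an XSLT engine: two organizations model the same data with different element names, and a transformation rewrites one vocabulary into the other. The sketch below performs such a mapping imperatively with Python's standard library (an XSLT stylesheet would express the same mapping declaratively as template rules); all element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical schemas for the same data: vendor A uses
# <person><fullName>, vendor B expects <contact><name>.
RENAME = {"person": "contact", "fullName": "name"}

def transform(elem):
    # Copy the tree, renaming tags according to the mapping table;
    # text, tails and attributes pass through unchanged.
    new = ET.Element(RENAME.get(elem.tag, elem.tag), dict(elem.attrib))
    new.text, new.tail = elem.text, elem.tail
    new.extend(transform(child) for child in elem)
    return new

src = ET.fromstring("<person><fullName>Ada Lovelace</fullName></person>")
out = transform(src)
print(ET.tostring(out, encoding="unicode"))
```

The same rename table corresponds to two XSLT template rules, one per element type, which is where the declarative approach starts to pay off as the mapping grows.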

  • "ARANEUS in the Era of XML." By G. Mecca, P. Merialdo, P. Atzeni. In IEEE Data Engineering Bulletin, Special Issue on XML, September 1999.

  • [July 31, 2000] "Style-free XSLT Style Sheets." By Eric van der Vlist. From XML.com (July 2000). ['Building web sites with XSLT sometimes raises architectural issues. This article presents a pattern for maintaining a clear separation between style, logic, and content in XSLT-produced websites.'] "One of the most oft-marketed advantages of XML is the separation between content and the layout achievable through applying external CSS or XSL style sheets to XML documents. However, since work started on XSL, the focus has shifted from presentation to transformation. This has given birth to a transformation-only language, XSLT, which is much more widely used than its formatting counterpart, XSL formatting objects. This shift from presentation to transformation is leading to a massive injection of logic within style sheets. This mixing of presentation and logic is becoming questionable. In this article, I present a simple technique to isolate most of the presentation outside of XSLT, leading to 'style free style sheets.' As more logic is embedded in the XSLT style sheets, they become more similar to programs than data. Keeping style within the style sheets causes the same kinds of problems as keeping data within programs--loss of flexibility, and lack of maintainability by anyone but programmers. The fact that XSLT style sheets are generally pseudo-compiled in servlet environments, and that new techniques are being announced to compile them into Java byte code, enforces this tendency of XSLT style sheets to be more programs than style. The corollary of this trend is to remove as much style as possible, unless you are ready to maintain multiple compiled style sheets. From an XSLT user's perspective, you can't expect web designers to develop XSLT style sheets, which their favorite tools lack support for. Wouldn't it be nice to let designers design the layout of a page and to add special 'tags' to include its content?..."
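The pattern van der Vlist describes (designers own a layout containing placeholder 'tags'; the logic layer only fills them in) can be sketched in a few lines. The <insert> placeholder element and the data below are invented for illustration; the article realizes the same idea with XSLT.

```python
import xml.etree.ElementTree as ET

# Designer-owned layout with placeholder elements; the logic layer fills
# them in without touching the surrounding presentation markup.
layout = '<html><h1><insert name="title"/></h1></html>'
data = {"title": "Style-free style sheets"}

root = ET.fromstring(layout)
for holder in root.iter("insert"):
    holder.tag = "span"                       # keep an inline wrapper element
    holder.text = data[holder.attrib.pop("name")]
print(ET.tostring(root, encoding="unicode"))
```

Because the layout stays a well-formed document that design tools can open, designers and programmers can evolve their halves independently, which is the point of the pattern.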

  • [July 31, 2000] "XML Questions Answered." By John E. Simpson. From XML.com (July 26, 2000). ['In the first of our new monthly XML Q&A columns we tackle the problem of converting HTML to XML, ask "What is markup?", and discover whether XML has any weaknesses.'] On the larger question "What is text/markup?", see the paper of Sperberg-McQueen, Huitfeldt, and Renear, cited above, with references.

  • [July 31, 2000] "XML-Deviant: Last Call Problems." By Leigh Dodds. From XML.com (July 26, 2000). ['This week the XML Deviant dips into the SVG developer lists to find developers frustrated with the specification, which is still at Last Call status.']

  • [July 31, 2000] "Syndicating XML." By Rael Dornfest. From XML.com (July 17, 2000). ['This special issue of XML.com focuses on XML's application in syndication, including XML news formats, ICE, and syndicating web site headlines with RSS.'] "RSS is a portal content language. RSS is a lightweight syndication format. A content syndication system. And a metadata syndication framework. In its brief existence, RSS has undergone only one revision, but that hasn't stopped its adoption as one of the most widely used web site XML applications to date. The RSS format's popularity and utility have led to its use in many more scenarios than originally anticipated by its creators. RSS v0.9, standing at that time for "RDF Site Summary," was introduced in 1999 by Netscape as a channel description framework for their My Netscape Network (MNN) portal. While the 'My' concept itself wasn't anything earth-shattering, Netscape's content-gathering mechanism was rather novel. This simple XML application established a mutually beneficial relationship between Netscape, content providers, and end-users. By providing a simple snapshot-in-a-document, web site producers acquired audience through the presence of their content on My Netscape. End-users got one-stop-reading, a centralized location into which content from their favorite web sites flowed, rather than just the sanitized streams of content syndicated into most portals. And My Netscape, of course, acquired content for free." See "RDF Site Summary (RSS)."
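A channel in the RSS 0.91 vein is small enough to show whole, and consuming one takes only a standard-library parser. The feed content below is invented, but <channel>, <item>, <title> and <link> are the core elements of the format discussed here.

```python
import xml.etree.ElementTree as ET

# A minimal RSS 0.91-style channel, as a site might publish for syndication.
rss = """<rss version="0.91">
  <channel>
    <title>Example Site</title>
    <item><title>First headline</title><link>http://example.com/1</link></item>
    <item><title>Second headline</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

# A portal aggregator only needs to walk the items to render a headline list.
root = ET.fromstring(rss)
headlines = [item.findtext("title") for item in root.iter("item")]
print(headlines)
```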

  • [July 31, 2000] "Visual Basic Special Edition." By Kurt Cagle, Mark and Tracey Wilson, and James Snell. From (July 12, 2000). ['This special edition of is dedicated to exploring how XML can be used with Visual Basic, one of the most widespread programming environments. Find out more about using VB with the DOM, XSLT and SOAP.'] Kurt Cagle outlines how the DOM and XSLT can be used from Visual Basic to create applications. He also shows how XSLT can in fact subsume a lot of tasks that previously required VB programming, and goes on to ask whether XML integration technologies might in the end spell doom for Visual Basic. . . VB6, the most recent version of Visual Basic, which came out as part of Microsoft's Visual Studio package, appeared on the landscape in early 1998. At the time, another language was also in the works--XML. There was almost no acknowledgement by Visual Basic that XML was a force to be reckoned with (although VB6 "web classes" created a simple form of what's now recognized as a SAX based parser, though the creators at the time hadn't been aiming to do that). As such, Visual Basic's native support for XML is non-existent--no component can read or produce it, no persistence mechanisms exist around it, no data engines work with it. However, if that really were the extent of interaction between XML and VB, then the need for an article on the two together would never have arisen! Despite not supporting it natively, VB has proven to be a remarkably effective tool for working with XML. The reason for this is actually pretty simple: the MSXML parser exposes nearly all of the same interfaces in Visual Basic as it does in C++, and the parser that it does expose is arguably one of the best on the market."

  • [July 31, 2000] "XML Updategrams." ['This updategrams preview allows you to express changes to an XML document as database inserts, updates, and deletes with SQL Server 2000 Beta 2.'] SQL Server 2000 XML Updategrams: Beta 2. SQL Server 2000 introduces several features for querying database tables and receiving the results as an XML document. One of the features missing from the SQL Server 2000 Beta 2 release is the ability to express changes to an XML document as database inserts, updates, and deletes. This feature, called updategrams, is available for download. For a more detailed introduction to updategrams, click the documentation link... Updategrams define the structure of an XML document that can be used to describe a change to another XML document. The change is described by presenting what a fragment of the document looked like before the change and what the fragment should look like after the change. The before data is used to locate the portion of the document that should be changed. Enough data must be specified so that only one matching fragment can be found in the document. When the portion of the document matching the before data is found, it is changed to match the data specified in the after-data fragment of the updategram. This mechanism also supports optimistic concurrency control because if another user has changed the same portion of the document, the before data won't match and the update will fail..."
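As a sketch of the before/after mechanism described above (using invented element names, not the actual SQL Server 2000 Beta 2 updategram vocabulary), the following Python fragment locates the portion of a document matching a before image, rewrites it to the after image, and fails when the before image no longer matches, which is the optimistic-concurrency behavior the preview describes:

```python
import xml.etree.ElementTree as ET

# A document and an updategram-shaped change description.
# Element names here are illustrative, not the Beta 2 vocabulary.
doc = ET.fromstring(
    "<Customers><Customer id='1'><City>London</City></Customer></Customers>")

before = ET.fromstring("<Customer id='1'><City>London</City></Customer>")
after = ET.fromstring("<Customer id='1'><City>Leeds</City></Customer>")

def apply_update(root, before, after):
    """Locate the single fragment matching 'before' and rewrite it to
    match 'after'. A failed match mimics the optimistic-concurrency
    failure: someone else already changed that portion of the data."""
    matches = [c for c in root
               if c.get("id") == before.get("id")
               and c.findtext("City") == before.findtext("City")]
    if len(matches) != 1:
        raise ValueError("before image no longer matches: update rejected")
    matches[0].find("City").text = after.findtext("City")

apply_update(doc, before, after)
print(ET.tostring(doc, encoding="unicode"))
```

Running apply_update a second time with the same before image fails, because the document no longer contains the London value the before image asserts.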

  • [July 31, 2000] "Microsoft Readies Web Parts. Keynote launches building blocks for DNA." By Jacqueline Emigh. In SmartPartner. "In a keynote talk at TechEd, Microsoft group VP Bob Muglia launched a new Web Parts technology set to debut in a future edition of Windows 2000, along with Internet Security and Acceleration Server 2000 and seven other server building blocks for Microsoft's Digital Network Architecture (DNA). The new Web Parts technology will let ISVs, corporate developers and systems integrators build digital dashboards from already created HTML and XML pages located virtually anywhere on the Web. During a demo in Orlando, Muglia said that the Web Parts will use XSL and XML to let developers build customizable user interfaces out of XML and HTML content from existing Web sites. Microsoft also has created a set of connectors, he said. The new technology uses the Web Parts XML Schema. . . Web Parts ultimately will appear in the other DNA servers from Microsoft, as well as in W2K. Microsoft this week is releasing Web Part tools and sample parts in the Digital Dashboard Resource Kit (DDRK) 2.0. . . Commerce Server will add a product catalog system, plus new advertising, discount and content selection pipelines, said Rebecca Kunan, another Microsoft exec. Commerce Server also will integrate with Biztalk Server for e-commerce order processing. For its part, Biztalk Server will incorporate a built-in messaging server, in addition to an XSL transformation engine." [Web Parts are reusable components that wrap Web-based content such as XML, HTML, and scripts with a standard property schema that controls how the Web Parts are rendered in a digital dashboard. The Web Part schema offers a variety of ways to supply content to Web Parts. You can embed content in the Web Part itself, add a pointer to a location on the Internet or a local intranet, stream content from an Internet server, or add pointers to XML documents and XSL files. 
Because all Web Parts adhere to a common standard, you can create Web Part libraries from which you can assemble all the digital dashboards in your organization. In addition, system administrators can manage and distribute Web Parts using these libraries. End users generally don't need to learn new skills to create Web Parts. They can create simple Web Parts using tools included in the Digital Dashboard Resource Kit. Digital Dashboard Resource Kit 2.01 contains two XML-based sample digital dashboards.]

  • [July 31, 2000] "SOAP Toolkit - July 2000 Release." (July 26, 2000). ['This update provides improved support for passing XML in parameters and improved interoperability with the May 2000 IBM SOAP release.'] "The SOAP Toolkit is an MSDN architectural sample, which includes full source code. The SOAP Toolkit is not a supported Microsoft product. Please refer to the End User License Agreement, included in the download, for further details. To install the SOAP Toolkit, you must have the Windows Installer on your machine..."

  • [July 30, 2000] "XML Standards for Customer Information Quality Management." By V.R. [Ram] Kumar. In XML Journal, Volume 1, Issue 3 (July 2000), pages 41-45. Sys-Con Publishers. "The three key factors that contribute to the way we do our business today are corporate globalization and internationalization, an increase in the number of company acquisitions and a rapidly changing and increasingly competitive business environment. As a result, organizations recognize the immediate and urgent need to leverage their information assets in new and more efficient ways. At the heart of these information assets is enterprise data, the data collected during the normal course of business. This is regularly aggregated, combined and analyzed to provide the information needed for corporate decision-making. Once viewed as operational or tactical in nature, enterprise data is now used for strategic decision-making at every business level. Managing the strategic information assets and providing timely, accurate and global access to enterprise data in a secure, manageable and cost-efficient environment is becoming critical. Metadata - the information an enterprise stores about its data - has become the critical enabler for managing the integrated information assets of an enterprise. It's also, however, the weakest link in the information management chain. The proliferation of proprietary data management and manipulation tools has resulted in a host of incompatible information technology products, each processing metadata differently. End users suffer from inaccessible and incompatible metadata locked into individual tools. Metadata has thus become the number one integration problem in the area of enterprise information and data warehouse management. For enterprise-wide information management you need global and efficient access to shared metadata by all the heterogeneous products found in today's information technology environment. 
To use tools efficiently, users must be capable of moving metadata between tools and repositories..."

  • [July 28, 2000] "Knowledge Markup Techniques Tutorial." By Harold Boley, Stefan Decker, and Michael Sintek. Paper to be presented at ECAI 2000/PAIS 2000 (14th European Conference on Artificial Intelligence Prestigious Applications of Intelligent Systems, Berlin, Humboldt University, August 20-25, 2000). "There is an increasing demand for formalized knowledge on the Web. Several communities (e.g. in bioinformatics and educational media) are getting ready to offer semiformal or formal Web content. XML-based markup languages provide a 'universal' storage and interchange format for such Web-distributed knowledge representation. This tutorial introduces techniques for knowledge markup: we show how to map AI representations (e.g., logics and frames) to XML (incl. RDF and RDF Schema), discuss how to specify XML DTDs and RDF (Schema) descriptions for various representations, survey existing XML extensions for knowledge bases/ontologies, deal with the acquisition and processing of such representations, and detail selected applications. After the tutorial, participants will have absorbed the theoretical foundation and practical use of knowledge markup and will be able to assess XML applications and extensions for AI. Besides bringing to bear existing AI techniques for a Web-based knowledge markup scenario, the tutorial will identify new AI research directions for further developing this scenario. [Harold Boley has reinterpreted markup techniques for knowledge representation, showing the use of functional-logic programming in/for the Web, mapping the knowledge model of Protégé to XML-based systems, and developing the Relational-Functional Markup Language (RFML). Stefan Decker has worked in IT support for knowledge management using knowledge markup techniques facilitated by ontology, metadata and knowledge representation based approaches; he is currently working on scalable knowledge composition methods. 
Michael Sintek developed an XML import/export extension of the frame-based knowledge acquisition and modeling tool Protégé-2000 and currently works on XML/RDF-based methods and tools for building organizational memories in the DFKI FRODO project.]" See also "Relational-Functional Markup Language (RFML)."

  • [July 28, 2000] "Serving XML with JavaServer Pages. Today's e-commerce sites must interoperate with software components of all types, not just Web browsers." By Duan Yunjian and Willie Wheeler. In JavaPro Magazine (August 2000). "E-commerce systems must be able to work with other systems running on widely varying hardware and software platforms, including desktop machines with big or small monitors, PDAs, cell phones, and so forth. In any highly competitive business sector, no one can afford to turn customers or potential business partners away because their systems are 'incompatible.' Besides the two quality attribute requirements of usability and interoperability, an e-commerce site must be able to deliver dynamic content because business data is inherently dynamic. None of these requirements is trivial, and when combined, they can pose a real challenge. For example, the standard way of presenting dynamic content is to generate HTML on the server dynamically and then deliver it to the client, which is assumed to be a Web browser. This can work against usability, since HTML does not provide an especially rich set of controls with which to develop a user interface. It can also work against interoperability, since communicating with non-browser clients becomes problematic. In this article, we will explore an approach to meeting the usability, interoperability, and dynamic content requirements that face any e-commerce development project. Our approach is to use JavaServer Pages (JSP) to serve XML. We illustrate the concepts by way of a simple Web site for an e-travel business. We will use JSP to generate dynamic content, and XML to help achieve usability and interoperability."

  • [July 28, 2000] "Messaging: The transport part of the XML puzzle. A guide to technologies, protocols, and future directions." By Gordon Van Huizen (Director of Product Management, Progress Software). From IBM developerWorks (July 2000). ['Need help sorting out XML messaging protocols? This technical article explores the more recent focus of how to transfer XML between parties as part of a meaningful, reliable exchange. This article looks at major transport-level options and compares how they accomplish transferring XML between parties reliably. You'll find an overview of the approaches of XML-RPC, SOAP, WDDX, ebXML, and JMS as they apply to XML transport, with simple example code.'] "During the three years that XML has been on the scene, most discussion and debate in the developer community has centered on vocabularies and dialects -- in other words, what is to be said between two parties and how to represent such data. While a few forward thinkers began addressing the need to transport XML data as early as 1998, only in the last seven months has the focus of the XML community begun to shift from data taxonomy to the fundamental question of how to transfer XML between parties as part of a meaningful, reliable exchange. This article explores the leading protocols used for XML transport, the problems that they address, and the relationships between them. . . Despite the relatively recent community interest in XML transport issues, there are no less than 15 XML communication protocols vying for our collective attention. Eric Prud'hommeaux and Ken Macleod of the W3C have surveyed these protocols, producing a wonderful aerial view in the form of a facet-driven table on the W3C site. Of this array, XML-RPC, SOAP, WDDX, and ebXML have emerged as the most likely candidates to directly influence the future of XML transport. . . 
Having taken a quick look at the various options before us, we can begin to draw some conclusions about their roles in providing a transport for XML. SOAP and its younger sibling XML-RPC provide a serialization scheme and transport protocol over HTTP that best supports the synchronous mechanisms traditionally found in distributed-object computing. Although higher-level protocols can be layered on top of SOAP, some weaknesses may be inherent in relying on its underlying RPC mechanism. WDDX provides a serialization scheme that is more document-oriented, without the influence of remote procedure calls. This serialization scheme may find its way into higher-level protocols. ebXML -- a work in progress -- promises to specify the higher-level semantics required in multiparty business exchanges, while remaining independent of the low-level transport. Java language messaging based on JMS offers a platform-neutral messaging implementation that offers many of the required higher-level semantics, and is also independent of the low-level transport. It is already in use today, transporting XML to solve EAI and B2B problems within the enterprise and across the Internet."
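The RPC style that SOAP and XML-RPC share can be illustrated with Python's standard xmlrpc module, which serializes a call into the XML payload an XML-RPC client would POST over HTTP. The method name below is invented for the example:

```python
import xmlrpc.client

# Serialize a call to a hypothetical stock-quote method the way an
# XML-RPC client would send it over HTTP (method name is made up).
payload = xmlrpc.client.dumps(("IBM",), methodname="quotes.getQuote")
print(payload)

# The receiving end recovers the parameters and the method name
# from the same XML.
params, method = xmlrpc.client.loads(payload)
print(params, method)
```

The payload is a <methodCall> document naming the method and its typed parameters; this tight coupling to a procedure-call model is exactly the RPC influence the article contrasts with WDDX's more document-oriented serialization.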

  • [July 28, 2000] "RosettaNet: E-biz Rules Set in Stone. Viacore to unlock standard language for supply chains." By Richard Karpinski. In B2B Magazine (July 17, 2000), pages 3, 36. "Fadi Chehade was b-to-b before b-to-b was cool. Three years ago, Chehade led the groundbreaking computer industry effort, dubbed RosettaNet, to create a standard way for manufacturers, distributors and resellers to connect their supply chains through the Internet. Though the effort was successful in defining a collection of extensible markup language dialogues for standard buyer-seller interactions, individual companies have been slow to adopt the standards. Now, Chehade hopes to help the industry move from planning the standards to implementing them as founder and chairman of Viacore Inc., which this summer began delivering an innovative technology platform to make good on RosettaNet's promise. . ." See "RosettaNet."

  • [July 28, 2000] "Transporting Data with XML: the C++ Problem. Learn how to convert an XML document into C++ data structures with a solid object-oriented design." By Jim Beveridge. In XML Magazine (Summer 2000). "One of the most basic problems in development is how to move data from Point A to Point B. Moving data around is a problem at the hardware level, the software level, and the theoretical level. . . if you want to pass a class around by value, you must define a copy constructor, an assignment operator, and a destructor. Similarly, if you want to pass a class using COM, you might be limited by IDispatch datatypes or be required to create a custom marshaler. All of these situations involve trade-offs in flexibility, complexity of implementation, and speed. Enter XML. XML makes an excellent data transfer format. XML combined with a Document Type Definition (DTD) can increase the reliability of data flowing in and out of a site where the data should be in a standard format, and a data file format based on XML can evolve gracefully over time. Moreover, you can read XML using C++, but doing so can be difficult. [...] Now that you've seen the difference between the centralized glue layer and the decentralized glue layer, I can explain why the SAX event model proves simpler for this situation than DOM. With DOM, you navigate the tree hierarchy with commands such as GetFirstChild(), GetNextChild(), GetFirstSibling(), and GetNextSibling(). The top-level dispatcher uses distributed logic to process tags, so it has no idea whether any particular tag is supposed to have children or siblings. Therefore, it would be necessary for the dispatcher to query DOM continually about the structure of the XML document, such as whether the current node has children or whether it has siblings. In contrast, the SAX event engine is feeding the dispatcher exactly what it needs to know. Every startElement() is a stack push and every endElement() is a stack pop. 
No guesswork or analysis is necessary to determine how to proceed. With this architecture, the SAX parser was much easier to use when the entire XML document was to be converted to C++ structures. The design of this application meets the goals we set out for using XML as a data transport format. The XML documents it reads are forward extensible, while still being backward compatible. The file format is rich enough to model a complex C++ object hierarchy, including nested objects. It allows pointers between objects to be represented in XML with IDs -- information meaningful both to the XML parser and to our application. Finally, the C++ code is designed to be maintainable in a team setting, so that side effects between developers can be minimized."
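The push/pop discipline Beveridge describes translates directly into code. The minimal Python SAX handler below is an illustrative analogue of the article's C++ design, not a port of it: every startElement() is a stack push and every endElement() is a pop, and the document arrives as nested structures with no tree navigation at all:

```python
import xml.sax

class StackBuilder(xml.sax.ContentHandler):
    """Convert an XML document into nested (tag, children) tuples.
    Every startElement() is a stack push and every endElement() a
    pop, so no querying of document structure is ever needed."""
    def __init__(self):
        super().__init__()
        self.stack = [("root", [])]

    def startElement(self, name, attrs):
        self.stack.append((name, []))            # push

    def endElement(self, name):
        tag, children = self.stack.pop()         # pop
        self.stack[-1][1].append((tag, children))

handler = StackBuilder()
xml.sax.parseString(b"<a><b/><c><d/></c></a>", handler)
tree = handler.stack[0][1][0]
print(tree)
```

The same shape works whether the target structures are tuples, dicts, or (as in the article) C++ objects: the parser's event order does the structural bookkeeping for you.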

  • [July 27, 2000] "Shaking Off the Wires." By Hank Simon. In Intelligent Enterprise Volume 3, Number 11 (July 17, 2000), pages 58-62. "WML is the markup language that sits on top of WAP. It is what HTML is to the conventional Web. WML is a flavor of XML and lets Web page developers create information that handheld computers, palmtops, smart cell phones, pagers, and other wireless devices can read. The WAP standard works with cellular digital packet data (CDPD), code division multiple access (CDMA), time division multiple access (TDMA), global systems for mobile communications (GSM), and other wireless standards. Wireless devices communicate through the wireless network to a WAP server. A WAP server converts data or Web pages between WAP and TCP/IP. This conversion lets conventional Web servers send WML pages to wireless devices, which use microbrowsers that let users surf the Web. Tools are emerging that will automate the ability to author content for multiple devices: cell phones, palmtops, and desktops. XML will help this situation by separating information into pure XML content and pure XML style sheet language (XSL)-based presentation. The point is to design an XML document architecture that separates presentation method, which varies by device, from content. In this way, the XML-based content can be translated to HTML for conventional browsers and to WML for microbrowsers by using different XSL scripts." See "WAP Wireless Markup Language Specification."
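The separation of content from presentation that the article advocates can be sketched in a few lines. The two render functions below stand in for the device-specific XSL scripts; no real XSLT engine is used here, and the content document is invented:

```python
import xml.etree.ElementTree as ET

# One XML content document, two presentations. The render functions
# play the role of the per-device XSL scripts described above.
content = ET.fromstring(
    "<headline><title>Markets rally</title>"
    "<body>Stocks rose.</body></headline>")

def to_html(doc):
    """Presentation for a conventional desktop browser."""
    return ("<html><body><h1>%s</h1><p>%s</p></body></html>"
            % (doc.findtext("title"), doc.findtext("body")))

def to_wml(doc):
    """Presentation for a WAP microbrowser: WML wraps content
    in a deck of <card> elements."""
    return ('<wml><card id="c1" title="%s"><p>%s</p></card></wml>'
            % (doc.findtext("title"), doc.findtext("body")))

print(to_html(content))
print(to_wml(content))
```

Because the content document never mentions either output format, adding a third device class means adding one more transform, not re-authoring the content.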

  • [July 27, 2000] "Channel Surfing: Ready to hitch your data delivery wagon to XML? [Product Review.]" By Nelson King. In Intelligent Enterprise Volume 3, Number 11 (July 17, 2000), pages 54-56. Extensible markup language (XML) is a specification. XML is a computer industry buzzword. XML is something a lot of vendors are issuing white papers about. You've probably heard of XML. How many products are actually using XML at an enterprise level? Not many; at least not yet. There are important exceptions, and DataChannel Server is one of them. DataChannel Inc.'s DataChannel Server 4.0 (DCS, formerly called RIO server) is an enterprise software system that uses TCP/IP networks to manage information content and distribute it to people in an organization. If you're an end user, using DCS means opening a Web browser or Windows client, going to a start page, and viewing the information in the corporation that you (or someone) has decided you need to see. . . There are so many ways to deliver information to a company's employees (and perhaps business partners) that the difficulty will be to select the approach that not only fits the situation but also does it the most efficiently. In some situations, perhaps a great deal of collaboration is necessary. In others, perhaps the volume of data is extreme, as it might be for statistical analysis. DataChannel Server is not always the best solution. On the other hand, because of its sophisticated use of XML, DCS may be able to accomplish projects such as EDI or business-to-employee interactions more easily and effectively than any other approach. So, in part, using DCS 4.0 means hitching a company's data delivery wagon to XML. Considering the state of XML development so far, a company could hardly do better. The expertise gained from implementing DCS 4.0 could well be a major advantage for using XML in business-to-business applications as well as providing a remarkably efficient way to provide information for employees. 
DataChannel Server 4.0 is a solid, useful, and flexible product that should be high on a short list of products for those considering an EIP, or for that matter, any of several other kinds of data-driven applications."

  • [July 27, 2000] "XML Gets Down to Business. XML's Promise of Open-Platform Data Exchange is Being Realized." [PC Tech Solutions] By Steven E. Sipe. In PC Magazine Volume 19, Number 14 (August 2000), pages 111-114. XML for aggregation and syndication. A look at "how XML is being used today."

  • [July 27, 2000] "ASP+ ListEditor in C#." By Chris Lovett. In MSDN Online Voices - Extreme XML (July 25, 2000). With online sources. ['XML technologies columnist Chris Lovett revisits his April ListEditor sample, now porting the code to C#.'] "There is a ton of amazing, innovative work in the .NET Framework -- and it has been in development for years. I've personally been programming in C# for about a year now, and I love it! It is definitely the most productive programming environment I've ever used -- and I've used just about every environment that has been on the market. We have a brand-new, 100 percent C# (C-sharp) implementation of the XML family of technologies. This includes stream-level classes, such as XmlReader and XmlWriter; W3C DOM classes such as XmlNode, XmlDocument, XmlElement, and so forth; XPath classes, such as XmlNavigator; and XSLT classes, including XslTransform. You'll see a lot of overview papers and reference material on all this stuff, so I won't drill into it here. Instead, I've decided to live up to a promise I made at the PDC, which is to port the ListEditor application from my April article to C#, running in the new .NET ASP+ Framework. See the ASP+ and C# source code. Now, because I already designed this application to be loosely coupled between client and server using XML as my data-transfer format, the client side of the application remains unchanged... An XmlNavigator class can navigate any XML, not just the in-memory DOM tree. For example, we have a sample XmlNavigator class that navigates the system registry. It is simply not practical to populate an in-memory DOM tree with the entire contents of the registry, because the DOM doesn't scale. The DOM requires 'object identity for nodes' -- which means that every time you ask for a node from the DOM, you are supposed to get back the same object. This makes it hard to virtualize. 
The XmlNavigator class solves this problem -- and we have in fact already exploited this by providing an XmlNavigator subclass called DataDocumentNavigator, which can navigate relational tables, rows, and columns stored in an ADO+ DataSet. The XmlNavigator class's XPath support also allows you to provide input to the XslTransform class. Plug in the DataDocumentNavigator, and you will quickly see that you can now transform relational data with XSLT just as easily as you can transform the contents of an XmlDocument object. The ListEditor is still designed for low volume. The performance bottleneck is actually the file system, because it is loading, updating, and saving the XML lists on each ASP+ request. Instead, it should cache stuff in memory, or use a real database backend. But the ListEditor will really shine if you are doing low volume edits on a million lists."
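The idea of a navigator that virtualizes a non-XML source can be sketched without any Microsoft classes. The DictNavigator below is hypothetical and merely plays the role the registry-navigating XmlNavigator sample plays in the article: each navigation call may hand back a fresh cursor object, so no "object identity for nodes" is required and the source is never materialized as a DOM tree:

```python
# A navigator-style cursor over data that is not a DOM. Each call
# returns a fresh cursor, so nothing forces 'object identity for
# nodes' and the backing store is never copied into a tree.
# (DictNavigator is an invented illustration, not a real API.)

class DictNavigator:
    def __init__(self, data, name="root"):
        self.data, self.name = data, name

    def children(self):
        if isinstance(self.data, dict):
            for key, value in self.data.items():
                yield DictNavigator(value, key)   # fresh cursor each time

    def value(self):
        return None if isinstance(self.data, dict) else self.data

# A registry-like nested store, navigated as if it were XML.
registry = {"Software": {"Vendor": {"Version": "4.0"}}}

nav = DictNavigator(registry)
path = []
while True:
    kids = list(nav.children())
    if not kids:
        break
    nav = kids[0]          # walk the first-child chain
    path.append(nav.name)
print(path, nav.value())
```

The same cursor interface could sit over relational rows or file-system entries, which is exactly why a navigator scales to sources a DOM cannot.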

  • [July 16, 2000] "Searching for the perfect ... SCORM?" By Herb Bethoney. In eWEEK (July 16, 2000). "Heeding Teddy Roosevelt's advice to speak softly and carry a big stick, the Department of Defense is stepping gently into the arena of e-learning while carrying its big contract-spending stick. According to market researcher IDC, over the next few years IT e-learning will become even bigger than the estimated $1.7 billion business it already is, based on sales of content and authoring tools and creation of customized courses and learning management systems. Recognizing the need for a cost-effective distributed learning management system that is consistent across an organization -- and even beyond its walls -- the DOD more than a year ago launched an initiative for an advanced distributed learning specification to improve employee performance and cut training costs. With diplomatic skill worthy of Kissinger, the DOD's Advanced Distributed Learning group gathered e-learning vendors, standards-setting groups, government and military trainers, and academics from around the country and formed the ADL specification group, with outstanding results. In fairly quick order, the group released Version 1.0 of its Shareable Courseware Object Reference Model, or SCORM, to enable learning management systems to reuse content and save the cost of creating the same material over and over again. The SCORM specification was designed to address the DOD's frustration with not being able to share distance learning courses among the different learning management systems deployed throughout the department. Last month, the DOD's ADL team conducted a successful test of the first version of the SCORM specification in Alexandria, VA., at the Institute for Defense Analyses' ADL Co-Laboratory Plugfest. 
Nothing motivates vendors like a lot of DOD cash, and we were pleasantly surprised that more than 90 organizations, including learning software developers and content providers from industry, government and academia, pledged support for SCORM. We saw numerous demonstrations of interoperability, and I can tell you this stuff really works... For the first time, content from different vendors' learning management systems was passed to other vendors' systems without a hitch. In fact, many of the vendors at the Plugfest said that they'll have SCORM-tested products ready in the next few months. The Plugfest also gave the DOD's ADL organization and the AICC, IEEE and IMS standards-setting groups the opportunity to discuss a unified e-learning specification that incorporates the four groups' work. The ADL Co-Lab also outlined its timetable for the availability of conformance test software and its plans for testing e-learning applications over the next few months."

  • [July 11, 2000] "Microsoft woos developers with test version of .Net tools." By Mike Ricciuti. In CNet (July 11, 2000). "Microsoft today attempted to sway software developers--some of the company's toughest critics and its most important customers--to its recently announced plan for linking its software more tightly to the Internet. At a company-sponsored software developer conference here, Microsoft distributed the first tools as part of its .Net plan, first announced last month. . . The company's .Net plan, heavily based on the Extensible Markup Language (XML) data-sharing standard, is aimed at making Microsoft's existing software available over the Web to traditional PCs and to increasingly popular devices such as cell phones and personal digital assistants (PDAs). The company is also focusing on making Web services, such as security and directory services, ubiquitous and easy to use for software developers. Today, Microsoft distributed a test version of the next release of its development tools, called Visual Studio.Net, to more than 6,000 attendees here. The tools package includes updates to Visual Basic, Visual C++ and Visual FoxPro tools, and includes the first version of C#, a new tool announced last month. As first reported by CNET, C# is a Java-like software programming language intended to simplify the building of Web services using Microsoft software. . . In other news, Microsoft said it has published the specifications for two XML-based technologies to its Web site for review. SOAP, or the Simple Object Access Protocol, is based on XML and forms the cornerstone of Microsoft's .Net plan. The technologies, called SOAP Contract Language and SOAP Discovery, are intended to let programmers more easily find and link to Web-based services. Microsoft also detailed plans for adding XML support to its existing server software through new products called .Net Enterprise Servers. 
The company announced a language, called XLANG, for integrating multiple Web programs through Microsoft's BizTalk software. As first reported by CNET, Microsoft plans to add a feature called Orchestration to BizTalk Server. Orchestration is Microsoft's technology for easily defining the business process logic that dictates how an e-commerce Web site functions and the information that needs to be passed among mainframe, Unix, PDA and Windows-based computers to complete a transaction."

  • [July 11, 2000] "XML/EDI." By Alain Dechamps (Workshop Manager in CEN/ISSS). "XML alone is not sufficient. Smart data needs to be structured for exchange and its meaning needs to be shared between processing systems. Many believe that EDI - Electronic Data Interchange - will have much to contribute to enrich the functionality of XML. The EDI standardization work for business-to-business messaging, notably UN/EDIFACT, has already identified a large collection of exchangeable information objects, which provides a good starter set for defining business-related information objects for use within XML applications. CEN/ISSS acts as the European entry point for the UN/EDIFACT process through its EBES (European Board for UN/EDIFACT Standardization) Workshop. XML/EDI builds upon the ground rules of XML and potentially offers the solution for interfacing existing EDI applications with the next generation of XML-aware applications. Moreover, it could lead to a new generation of EDI-based applications, which are simpler and more affordable, and more attractive for implementation by small and medium sized enterprises. As businesses start to use XML for tasks other than business-to-business communication (e.g., for interaction with consumers), integration of data from multiple sources (e.g. such as the creation of company databases) is expected to become much easier, as well as faster. Because XML enables devices to process "on the spot", the requirement for Web server processing is significantly reduced. The wide adoption of XML therefore should result in considerable improvements in network traffic and more efficient communications on a global basis. XML/EDI can potentially offer a number of major benefits to the EDI communities, but its full benefits are unlikely to be realised if different communities devise their own conventions on the usage of tags and the tagged elements, and/or make use of incompatible (interpretations of) semantic sets. 
Agreement on a common set of best practices is crucial for advancing XML/EDI. CEN/ISSS has therefore recently established a Workshop to offer a neutral, open and flexible platform for a common and standardized approach to the development and application of XML/EDI. A range of projects is already under way and the first CEN Workshop Agreement (CWA) is currently being reviewed, with the final version expected before mid-year. Further CWAs are expected at the end of 2000 and by the middle of next year..." See the CEN/ISSS XML/EDI Workshop - Document Register 2000.

  • [July 07, 2000] "Integrity Constraints for XML." By Wenfei Fan (Temple University), and Jérôme Siméon (Bell Labs). Pages 23-34 (with 28 references) in Proceedings of the Nineteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. PODS 2000. Dallas, Texas. May 15 - 17, 2000. Abstract: "Integrity constraints are useful for semantic specification, query optimization and data integration. The ID/IDREF mechanism provided by XML DTDs relies on a simple form of constraint to describe references. Yet, this mechanism is not sufficient to express semantic constraints, such as keys or inverse relationships, or stronger, object-style references. In this paper, we investigate integrity constraints for XML, both for semantic purposes and to improve its current reference mechanism. We extend DTDs with several families of constraints, including key, foreign key, inverse constraints and constraints specifying the semantics of object identities. These constraints are useful both for native XML documents and to preserve the semantics of data originating in relational or object databases. Complexity and axiomatization results are established for the (finite) implication problems associated with these constraints. These results also extend relational dependency theory on the interaction between (primary) keys and foreign keys. In addition, we investigate implication of more general constraints, such as functional, inclusion and inverse constraints defined in terms of navigation paths. [Conclusions:] We have proposed a formalization for XML DTDs that specifies both the syntactic structure and integrity constraints. The semantics of XML documents is captured with simple key, foreign key and inverse constraints. We have introduced several families of constraints useful either for native documents or for preserving the semantics of data originating in structured databases. In addition, these constraints improve the XML reference mechanism with typing and scoping. 
We have investigated the implication and finite implication problems for these basic XML constraints, and established a number of complexity and axiomatizability results. These results are not only useful for XML query optimization, but also extend relational dependency theory, notably, on the interaction between (primary) keys and foreign keys. We have also studied path functional, inclusion and inverse constraints and their implication by basic XML constraints. On the theoretical side, a number of questions are still open. First, it can be shown that (finite) implication of multi-attribute primary keys and foreign keys is in PSPACE. Can this be tested more efficiently? Second, we only investigated implication of path constraints by basic constraints. Implication of path constraints by path constraints has not been settled. On the practical side, the basic constraints provide a good compromise between expressiveness and complexity. An important application of XML is data integration [See: S. Cluet, C. Delobel, J. Simeon, and K. Smaga. "Your mediators need data conversion!" In Proceedings of ACM SIGMOD Conference on Management of Data, pages 177-188, Seattle, Washington, June 1998]. In this context, important questions are how constraints propagate through integration programs, and how they can help in verifying the correctness of the programs." See also by the author: "Finite Satisfiability of Key and Foreign Key Constraints for XML Data" (abstract). [cache] [alternate URL]
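The kinds of constraints the paper formalizes can be sketched concretely. The fragment below checks a key constraint (uniqueness) and a foreign-key constraint (referential integrity) over a small document; the element and attribute names (`book`, `person`, `isbn`, `ref`, `id`) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: checking a key and a foreign-key constraint over an
# XML document, using only the Python standard library. Element and
# attribute names here are hypothetical.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<db>
  <book isbn="0-201-53771-0"><author ref="a1"/></book>
  <book isbn="0-596-00058-8"><author ref="a2"/></book>
  <person id="a1"/>
  <person id="a2"/>
</db>""")

# Key constraint: book/@isbn values must be unique.
isbns = [b.get("isbn") for b in doc.findall("book")]
assert len(isbns) == len(set(isbns)), "key violated"

# Foreign-key constraint: every author/@ref must match some person/@id.
# Note that DTD IDREF gives only untyped, document-wide references;
# a real foreign key, as in the paper, is scoped to a target element type.
ids = {p.get("id") for p in doc.findall("person")}
refs = {a.get("ref") for a in doc.findall(".//author")}
assert refs <= ids, "foreign key violated"
```

The second check also hints at why ID/IDREF is too weak: an IDREF may point at *any* ID in the document, whereas the paper's foreign keys reference a specific element type's key.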

  • [July 07, 2000] "DTD Inference for Views of XML Data." By Yannis Papakonstantinou (UC San Diego), and Victor Vianu (UC San Diego). Pages 35-46 in Proceedings of the Nineteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. PODS 2000. Dallas, Texas. May 15 - 17, 2000. With 45 references. "We study the inference of Data Type Definitions (DTDs) for views of XML data, using an abstraction that focuses on document content structure. The views are defined by a query language that produces a list of documents selected from one or more input sources. The selection conditions involve vertical and horizontal navigation, thus querying explicitly the order present in input documents. We point out several strong limitations in the descriptive ability of current DTDs and the need for extending them with (i) a subtyping mechanism and (ii) a more powerful specification mechanism than regular languages, such as context-free languages. With these extensions, we show that one can always infer tight DTDs, that precisely characterize a selection view on sources satisfying given DTDs. We also show important special cases where one can infer a tight DTD without requiring extension (ii). Finally we consider related problems such as verifying conformance of a view definition with a predefined DTD. Extensions to more powerful views that construct complex documents are also briefly discussed. [Conclusions:] We presented a Data Type Definition inference algorithm that produces tight specialized context-free DTDs for selection views of XML data. We used lotos and ltds as formal abstractions of XML documents and DTDs. The language loto-ql used for view definitions captures the common core of several query languages that have been proposed for XML. As a practically important side effect, the ltds produced by the inference algorithm can be used to test conformance of selection views to predefined ltds." Also: Postscript. Compare: B. Ludaescher, Y. Papakonstantinou, P. 
Velikhov, V. Vianu. "View Definition and DTD Inference for XML", Post-ICDT Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats, 1999. For related publications, see the web site of Yannis Papakonstantinou. [cache]
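The inference problem the paper studies can be illustrated with a deliberately naive sketch: collect, per element name, the child-name sequences observed in sample documents. The union of observed sequences is a crude finite stand-in for an inferred content model; the paper's algorithm instead produces regular or context-free content models that generalize such observations. The sample documents and element names below are illustrative.

```python
# Naive sketch of DTD inference from sample documents: record, for each
# element name, the sequences of child-element names actually observed.
# A real inference algorithm would generalize these observations into a
# regular (or context-free) content model such as "b+ c".
import xml.etree.ElementTree as ET
from collections import defaultdict

samples = [
    "<a><b/><c/></a>",
    "<a><b/><b/><c/></a>",
]

models = defaultdict(set)
for s in samples:
    root = ET.fromstring(s)
    for elem in root.iter():
        models[elem.tag].add(tuple(child.tag for child in elem))

# models["a"] now holds {("b", "c"), ("b", "b", "c")}; a generalization
# step would fold these sequences into the content model b+ c.
```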

  • [July 07, 2000] "Typechecking for XML Transformers." By Tova Milo (Tel Aviv University), Dan Suciu (AT&T Labs), and Victor Vianu (UC San Diego). Pages 11-22 (with 30 references) in Proceedings of the Nineteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. PODS 2000. Dallas, Texas. May 15 - 17, 2000. "We study the typechecking problem for XML transformers: given an XML transformation program and a DTD for the input XML documents, check whether every result of the program conforms to the specified output DTD. We model XML transformers using a novel device called a k-pebble transducer that can express most queries without data-value joins in XML-QL, XSLT, and other XML query languages. Types are modeled by regular tree languages, a robust extension of DTDs. The main result of the paper is that typechecking for k-pebble transducers is decidable. Consequently, typechecking can be performed for a broad range of XML transformation languages, including XML-QL and a fragment of XSLT. [Conclusion:] We have developed a general, robust framework for typechecking XML transformers. The k-pebble transducer provides an abstraction that can express existing XML transformation languages such as XML-QL and a subset of XSLT. The regular tree languages capture current DTDs and several proposed extensions. Thus, our framework for typechecking is likely to remain relevant as the XML standards evolve. Future work has to address the complexity of typechecking and the treatment of data values in the context of restricted classes of XML transformations. Restrictions are often acceptable in practice, and may lead to efficient typechecking algorithms. We believe that k-pebble transducers provide a good framework for such a study, as suggested by our preliminary results in this direction." See the project page description: "The focus of the first year of the project has been on semistructured XML data. 
The database group at UCSD, including Victor Vianu (PI of the present project), Yannis Papakonstantinou, several graduate students and one post-doctoral researcher, initiated the development of a prototype mediator system and query language for XML data, called XMAS. A central component of the mediator system will be a DTD inference mechanism whose purpose is to infer a DTD for an integrated view of XML sources, from the DTDs of the sources and the definition of the view. Preliminary formal results have been obtained on methods and algorithms for DTD inference in views defined by XMAS queries. A closely related line of research concerns typechecking XML transformation programs with respect to pre-specified output DTDs. The results obtained so far show decidability of the problem for a very wide class of XML transformers, modeled by an abstract device called k-pebble transducer. This subsumes the core of current XML languages, such as XML-QL and XSL. This line of research will also be pursued in the near future. Another topic to be investigated is workflow specification and verification and the connection with XML views, with applications to electronic commerce. Finally, another line of research has focused on heterogeneous data, including spatial data and topological queries, with applications to geographic information systems. The first year of the project has proven to be quite fruitful in terms of research advances and prototype development. The research on semistructured data and XML has good momentum and can be expected to yield substantial new results over the next year." [alt URL; cache]
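The typechecking question the paper poses can be made concrete for a single input: does the output of a transformation conform to an output type? Since DTD content models are regular languages over child-element names, conformance for one element can be tested with a regular expression. The transformer, document, and content model below are toy assumptions, not the paper's k-pebble formalism (which decides conformance for *all* inputs satisfying the input DTD, not one at a time).

```python
# Sketch: testing whether a toy transformation's output conforms to an
# output content model. DTD content models are regular expressions over
# child-element names, so conformance of one element reduces to a regex
# match. The transformer and names here are hypothetical.
import re
import xml.etree.ElementTree as ET

# Output content model for <entry>: (title, author+)
OUTPUT_MODEL = re.compile(r"(title)(author)+")

def transform(src):
    # Toy transformer: copy the title and all authors into a new <entry>.
    out = ET.Element("entry")
    out.append(src.find("title"))
    for a in src.findall("author"):
        out.append(a)
    return out

def conforms(elem, model):
    # Concatenate child-element names and match against the content model.
    return model.fullmatch("".join(child.tag for child in elem)) is not None

src = ET.fromstring("<book><title/><author/><author/></book>")
result = conforms(transform(src), OUTPUT_MODEL)
```

Static typechecking, the paper's actual contribution, proves this property once for every document satisfying the input DTD rather than re-checking each output at run time.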

  • [July 07, 2000] "New standards orbit XML." By Tom Yager (InfoWorld Test Center). In InfoWorld Volume 22, Issue 27 (July 03, 2000), pages 37-39. "XML is extensible, which means it can easily incorporate new features related to document formatting and automatic processing. Best of all, XML is freely available, which is part of the reason why few companies undertake information management projects without considering the role of XML. Here to stay: We've seen lots of computing fads come and go, and each one's downfall has brought costly consequences. But XML is here for the long haul, partly because it's a proven, widely adopted technology and partly because its inherent flexibility has led to scores of new technologies and extensions that support and rely on parent XML. But in many ways, these technology offshoots contribute to XML's biggest problem: complexity. There's at least one XML parser for every popular programming language and operating system. To ensure consistency and compliance with the XML specification, the World Wide Web Consortium (W3C) issues standards (or 'recommendations') that govern the structure of XML files and the programming interfaces used to manipulate them. The W3C also evaluates proposed extensions to XML, ensuring that new methods of parsing XML data are consistent before they become standards. But developers and vendors still have plenty of room to improvise, because the W3C -- best known for its standardization of HTML -- is rather liberal as standards bodies go. It does not enforce its standards, and there is no W3C-compatibility certification program. As a result, the W3C has created a cooperative community not unlike that of the Linux set: Some of the most crucial additions to the XML specification have resulted from a programmer's off-standard riffing. This means that if you're planning an XML development project or data management architecture, you'd be well-advised to look beyond the standard XML developer's kit. 
Most of the truly interesting work is happening in the supporting technologies. Over the years, an impressive body of XML-related work has been assembled that can significantly cut your development costs and time to market. Without extensions, you'd have to write custom code for common operations such as searching and linking. As more vendors implement standards-based XML extensions, you're freed to spend more development time working on the applications. In this Test Center Analysis, we examine many key XML technologies. Some are already W3C recommendations, others are on the standards track, and still others are just good ideas that some developers have embraced. It's not a comprehensive list, as there are literally hundreds of early-stage extensions to XML, but this core group is worth your immediate attention. We gave each technology a green, yellow, or red rating on its relevance, acceptance among XML tools vendors, and prospect of becoming a W3C standard. Any technology with mostly green scores is a necessary part of an XML strategy, whereas lots of red indicates a degree of risk that must be weighed against the technology's value. To read more about them, visit the W3C's Web site."


Robin Cover, Editor: