The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: December 16, 2005
XML and Databases

Provisional references to resources on XML and Databases.

"All the relational vendors are trying hard to find ways of storing XML in relational databases. There are basically two approaches: flat storage and shredded storage. Flat storage stores an XML document in a cell of a table, shredded storage normalizes it into millions of rows and columns... In my view neither works well at all; but unfortunately, relational databases are what we've got to work with on the average consumer PC, and more advanced technologies like object databases have made little headway."    — Michael Kay, GenealogyXML, 2004-02-03.
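The two approaches Kay names can be made concrete with a minimal sketch using Python's built-in `sqlite3` and `xml.etree.ElementTree` modules. The sample document, the table names (`docs`, `person`) and the column mapping are hypothetical and chosen only for illustration; real products use far more elaborate mappings. Flat storage keeps the document as text in one cell, while shredded storage maps each `<person>` element to a row.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
DOC = ("<people>"
       "<person id='p1'><name>Ada</name><born>1815</born></person>"
       "<person id='p2'><name>Charles</name><born>1791</born></person>"
       "</people>")

def store_flat(conn, doc_id, xml_text):
    """Flat storage: the whole document is kept as text in one cell."""
    conn.execute("CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO docs VALUES (?, ?)", (doc_id, xml_text))

def store_shredded(conn, xml_text):
    """Shredded storage: each <person> becomes a row, each child a column."""
    conn.execute("CREATE TABLE IF NOT EXISTS person "
                 "(id TEXT PRIMARY KEY, name TEXT, born INTEGER)")
    for p in ET.fromstring(xml_text).iter("person"):
        conn.execute("INSERT INTO person VALUES (?, ?, ?)",
                     (p.get("id"), p.findtext("name"), int(p.findtext("born"))))

conn = sqlite3.connect(":memory:")
store_flat(conn, "doc1", DOC)
store_shredded(conn, DOC)

# Flat: SQL cannot see inside the cell, so the application fetches the
# text and re-parses it to answer "who was born before 1800?".
body = conn.execute("SELECT body FROM docs WHERE id = ?", ("doc1",)).fetchone()[0]
flat_answer = [p.findtext("name") for p in ET.fromstring(body).iter("person")
               if int(p.findtext("born")) < 1800]

# Shredded: the same question is an ordinary SQL query over rows and columns.
shredded_answer = [row[0] for row in
                   conn.execute("SELECT name FROM person WHERE born < 1800")]

print(flat_answer, shredded_answer)  # ['Charles'] ['Charles']
```

Even in this toy case the trade-off shows: the flat table preserves the document exactly but hides its contents from SQL, while the shredded table is directly queryable but depends on a fixed mapping and no longer holds the original markup, so the document has to be rebuilt from rows when it is needed again.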

General Resources

  • "XML and Databases." By Ronald Bourret. "This paper briefly discusses the relationship between XML and databases and lists some of the software available to process XML documents with databases. Although it is not intended to be exhaustive or provide in-depth evaluations of all the available software, I hope that it describes some of the major issues in using XML with databases. It is somewhat biased towards relational databases simply because that is where my experience is..."

  • XML Database Products. By Ronald Bourret. Updated November 08, 2000 or later. "The number of products for using XML with databases is growing with amazing speed -- new products seem to enter the market weekly. In this Web page, I have tried to capture the current state of the market, gathered from Web sites, product reviews, XML webzines, and other XML resource guides. . . Although a complete description of how to use XML with databases is beyond the scope of this page, a brief review will help you choose what product is right for you. XML documents fall into two broad categories: data-centric and document-centric. Data-centric documents are those where XML is used as a data transport. They include sales orders, patient records, and scientific data, and their physical structure -- the order of sibling elements, whether data is stored in attributes or PCDATA-only elements, whether entities are used -- is often unimportant. A special case of data-centric documents is dynamic Web pages, such as online catalogs and address lists, which are constructed from known, regular sets of data. Document-centric documents are those in which XML is used for its SGML-like capabilities, such as in user's manuals, static Web pages, and marketing brochures. They are characterized by irregular structure and mixed content and their physical structure is important. To store and retrieve the data in data-centric documents, you will need a database that is tuned for data storage, such as a relational or object-oriented database, and some sort of data transfer software. This may be built into the database or might be third-party middleware. Depending on your needs, you may need Web-publishing abilities as well..."

  • Document Storage and Management. Software product listing in the "Free XML tools and software" list, by Lars Marius Garshol. This section lists tools for supporting document management, such as document databases and search engines. (1) XML document database systems: systems for persistently storing XML documents and providing access to their structure and individual parts; storing XML documents as blobs does not qualify. (2) XML document management utilities. (3) XML search engines.

  • "XML and Query Languages." See this reference collection for a number of research projects developing XML database management solutions in conjunction with XML-based query engines (e.g., SIM - The Structured Information Manager; Lore).

  • [March 06, 2001] XML Database Discussion List. A posting from Kimbro Staken (Chief Technology Officer, dbXML Group L.L.C.) announced the formation of a new mailing list for general discussions about XML database technologies. The mailing list is hosted by the XML:DB XML Database initiative. The list is designed as a "vendor neutral open forum and discussion of any topic related to XML database technology and standards is acceptable and encouraged." The forum is not intended for marketing, although announcements are acceptable if the list guidelines are followed. The new list had some 60 subscribers as of February 26, 2001, and is publicly archived.
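Bourret's distinction between data-centric and document-centric XML (in the "XML Database Products" entry above) can be illustrated with a small sketch using Python's `xml.etree.ElementTree`. The order and paragraph fragments and the helper `as_record` are hypothetical. For data-centric XML, sibling order carries no meaning, so two orderings flatten to the same record; for document-centric XML with mixed content, discarding the interleaved text or its order loses the content itself.

```python
import xml.etree.ElementTree as ET

# Data-centric: hypothetical sales-order fragments; sibling order carries no meaning.
order_a = ET.fromstring("<order><customer>ACME</customer><qty>3</qty></order>")
order_b = ET.fromstring("<order><qty>3</qty><customer>ACME</customer></order>")

def as_record(elem):
    """Flatten a data-centric element into a dict keyed by child tag (hypothetical helper)."""
    return {child.tag: child.text for child in elem}

# Dict comparison ignores key order, so both orderings yield the same record.
same_record = as_record(order_a) == as_record(order_b)

# Document-centric: mixed content, where character data and elements interleave.
para = ET.fromstring("<para>Press <key>Ctrl</key> then <key>S</key> to save.</para>")

# Treating it like data keeps only the child element values and drops the prose.
as_data = [child.text for child in para]

# Walking the text in document order (element text plus tails) recovers the sentence.
as_text = "".join(para.itertext())

print(same_record, as_data, as_text)
# True ['Ctrl', 'S'] Press Ctrl then S to save.
```

This is why, in Bourret's terms, a relational or object database plus transfer middleware suits the first kind of document, while the second kind needs storage that preserves physical structure and order.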

Articles, Papers, News, Reviews

  • [November 08, 2005] "XML and Semi-Structured Data." By C. M. Sperberg-McQueen (World Wide Web Consortium). From ACM Queue Volume 3, Number 8 (October 2005), pages 34-41. Special Issue on Semi-Structured Data. "XML makes several contributions to solving the problem of semi-structured data, the term database theorists use to denote data that exhibits any of the following characteristics: (1) Numerous repeating fields and structures in a naive hierarchical representation of the data, which lead to large numbers of tables in a second- or third-normal form representation; (2) Wide variation in structure; (3) Sparse tables. XML provides a natural representation for hierarchical structures and repeating fields or structures. Further, XML document type definitions (DTDs) and schemas allow fine-grained control over how much variation to allow in the data: Vocabulary designers can require XML data to be perfectly regular, or they can allow a little variation, or a lot. Because the core semantics of an XML document rely not on particular application software but on declarative semantics that are (or should be) explicitly documented, the use of XML really does help ensure data longevity and reusability. Sometimes a rather thin, syntax-oriented, semantically vacuous layer of commonality is all that is needed to simplify things dramatically..."

  • [November 08, 2005] ACM Queue Special Issue on Semi-Structured Data. Edited by Charlene O'Hanlon. ACM Queue Volume 3, Number 8 (October 2005). ISSN: 1542-7730. "Some believe semi-structured data is nothing more than a fancy term for a data structure left unfinished, and others firmly believe semi-structured data is the best way to describe data that doesn't easily fit into the traditional database structure. Semi-structured data, because of its unstructured-yet-structured nature, presents its own set of problems, such as schema discovery and determining the proper method to perform essential database operations such as extraction, integration, translation, and storage of data. Fortunately, there has been much research and testing to make semi-structured data a better neighbor with its traditional database counterparts..." Contents: [1] "Unstructured, But Not Really: Data that Doesn't Fit the Mold," page 8, by Charlene O'Hanlon; [2] "Managing Semi-Structured Data," pages 18-24, by Daniela Florescu (Oracle); [3] "Learning from The Web," pages 26-32, by Adam Bosworth (Google); [4] "XML and Semi-Structured Data," pages 34-41, by C. M. Sperberg-McQueen (World Wide Web Consortium); [5] "Order from Chaos," pages 42-49, by Natalya Noy (Medical Informatics, Stanford University); [6] "Why Your Data Won't Mix," pages 50-58, by Alon Halevy (University of Washington); [7] "The Cost of Data," pages 62-64 (Curmudgeon column), by Chris Suver (Microsoft).

  • [August 2005] "Firing Up the Hybrid Engine." By Anjul Bhambhri. From IBM DB2 Magazine Online (August 2005). "Because enterprises have, in aggregate, trillions of dollars invested in relational data and relational database management systems (RDBMSs), simply replacing RDBMSs with a pure XML store isn't an option. Adding an XML-only database into the infrastructure adds yet another integration and complexity challenge. IBM is about to introduce true-native support for both XML and relational data. This evolutionary technology, now in beta tests with a small group of IBM customers, provides hybrid relational/XML storage from the ground up. That means DB2 will no longer need the XML Extender (just as it doesn't need an SQL Extender). DB2 will simply handle XML natively. (There are varying definitions of 'native' XML support. To clear up the confusion about what's typically called 'native' today, see the sidebar.) In the hybrid version, XML is handled as a new data type. Nearly every DB2 component, tool, and utility has been enhanced to recognize and handle this new data type. The new storage paradigm retains XML in a parsed, annotated tree form, similar to the XML Document Object Model (DOM), that's separate from the relational data store. On top of both data stores (relational and XML) sits one hybrid database engine. That single engine can process XQuery, XPath, SQL, and SQL/XML. The engine features a bilingual query compiler with parsers for both SQL and XQuery. So developers can access information using either language (or both together) according to what makes the most sense in specific situations. A hybrid DB2 provides the flexibility to shift (between XML and SQL) paradigms as information management needs change. Storing relational and XML data in a database management system that understands and supports both models at every level (from the client, through the engine, down to the disk) provides flexibility and consistently fast performance.
The XML data inherits the same backup and recovery, optimization, scalability, and high availability DB2 offers for relational data. Ultimately, a unified XML/relational database keeps things simple by avoiding the need to integrate XML and relational data from separate stores..."

  • [March 2005] "The IBM Approach to Unified XML/Relational Databases." IBM Technical Report on Unified XML/relational storage. "'Native XML Storage' uses a physical storage model that is representative of the logical model of the XML document. The XML document must be the fundamental basis for logical modeling, logical storage and physical storage to accurately represent and render the XML document. This approach leaves no layer or portion of the data engine exempt from understanding this XML model, from the data model through the engine down to disk and back to the client. The result is a data storage model with the flexibility to handle any XML statement in any column and uniform and exceptional performance across document and collection sizes... Stand-alone XML-only database products are currently available; however, these products are only XML databases and do not include support for relational data or data models other than XML. Because these products do not offer the capability or flexibility of unified offerings from the major relational vendors, they are not covered in detail in this document. The unified support offered by major vendors (such as Oracle, Microsoft, Sybase and IBM) can be loosely grouped into four categories, with each vendor offering support for one or more of the following: (1) Shred, or decompose, the XML into relational or object relational form; (2) Store the XML intact in character form in a character large object (CLOB); (3) Store the XML in encoded binary form in a binary large object (BLOB); (4) Store the XML in a truly native repository... The approach IBM has taken is to support both shredded and true native storage.
Support for shredding is important because XML can be used to feed existing relational schemas. Since documents can grow large and will be updatable in many cases, the advantages of non-BLOB storage for XML documents, which include storing at the node level of granularity instead of at the document level, are significant... IBM provides a truly native unified XML/relational database, supporting the XML data model from the client through the database down to the disk and back again. By deeply implementing XML into a database engine that previously was purely relational, IBM offers superior flexibility and performance relative to other offerings."

  • [March 2005] "Comparing XML and Relational Storage: A Best Practices Guide." IBM Technical Report on Unified XML/relational storage. "While there have been years of research into physical and logical database design in purely relational systems, little definitive work has been done on the influence of XML on the logical and physical database design of unified XML/relational systems. The bulk of the influence of XML on logical and physical database design is based on fundamental properties of XML that make it different from the relational model: (1) XML is self-describing. A given document contains not only the data, but also the necessary metadata. As a result, an XML document can be searched or updated without requiring a static definition of the schema. Relational models, on the other hand, require more static schema definitions. All the rows of a table must have the same schema. (2) XML is hierarchical. A given document represents not only base information, but also information about the relationship of data items to each other in the form of the hierarchy. Relational models require all relationship information to be expressed either by primary key or foreign key relationships or by representing that information in other relations. (3) XML is sequence-oriented: order is important. Relational models are set-oriented: order is unimportant. What is a unified XML/relational database? Case 1a: Data has inherent hierarchical relationships; Case 1b: Data has multiple inherent hierarchical relationships; Case 2: Data has containment relationships; Case 3: Data has sparse attributes or a large number of attributes; Case 4: Schema evolution; Case 5: Highly variable or multiple schema... A true native XML data store is more than merely a data store that exposes XML to its clients; it must represent the XML throughout the entire data engine stack from client to disk and back out again.
While XML storage may seem best for XML data, and relational storage best for relational data, in many cases this does not hold true. At times, relational storage proves best for XML data and XML storage proves best for tabular data..."

  • [December 14, 2004] "Sleepycat Software Releases Berkeley DB XML 2.0. Major Upgrade of Native XML Database Adds New Support for Emerging XML Data Access Standard and Up To 10x Performance Increase." - "Sleepycat Software, makers of Berkeley DB XML, the leading open source, native XML database, today announced the general availability of Berkeley DB XML 2.0. The major new release includes support for XQuery 1.0, the emerging standard for XML data access, as well as significant performance and usability enhancements. 'This is a major upgrade of Berkeley DB XML and is a significant advancement in native XML database development,' said Mike Olson, CEO of Sleepycat Software. 'Our new XQuery support benefits customers that have been waiting for a standard for XML databases similar to the SQL standard for relational databases.' 'Berkeley DB XML combines the extremely reliable database engine of Berkeley DB with the tremendous ease-of-use of native XML storage,' said Steve Bishop, CTO at WildCard Systems, Inc., a leading developer of solutions for electronic payment systems using pre-paid smart cards. 'Sleepycat's new support of XQuery gives us greater confidence in moving critical financial transaction support infrastructure to Berkeley DB XML.' 'Since AllPeers is a consumer application, it was extremely important for us when choosing a database to find a system that is fast, lightweight, embeddable and needs zero administration,' said Matthew Gertner, CTO of AllPeers, developers of a peer-to-peer information sharing platform. 'Berkeley DB XML not only meets all of these requirements but also offers excellent native support for XML. The AllPeers platform supports a massive peer-to-peer network for the sharing of millions of files. By storing the XML metadata for each file in Berkeley DB XML, we have achieved much faster time-to-market and a cleaner, more consistent architecture.' 
New features in Berkeley DB XML 2.0 include: (1) XQuery 1.0 support that allows application portability through complying with the July 2004 draft of the XQuery standard. (2) XPath 2.0 support that allows the selection of a portion of an XML document. (3) PHP API support to easily enable developers using the popular PHP scripting language to work with XML documents. (4) Improved query performance that can be up to 10 times faster for multi-megabyte XML documents. (5) Ability to control storage granularity of documents (whole documents or nodes) to optimize query performance. (6) XML document streaming into database dramatically simplifies how documents are stored. Documents can now be streamed in from a URI, memory, or file..."

  • [September 02, 2004] dbXML 2.0 Production Release Provides Open Source Native XML Database. A communiqué from Tom Bradford reports on the recent production release of dbXML Version 2.0 by the dbXML Group. dbXML is a Native XML Database "capable of storing and indexing collections of XML documents in both native and mapped forms for highly efficient querying, transformation, and retrieval. In addition to these capabilities, the server may also be extended to provide business logic in the form of scripts, classes and triggers." New features in the dbXML Version 2.0 release include journaling transactions, XSLT transformations, full text indexing and full text querying, pluggable security models, a new command line system, new client/server APIs, SSL connection support, JSP Tag Library support, and embedded database APIs. dbXML 2.0 is an open source project governed by the terms of the GNU General Public License. This version of dbXML is basically "a complete rewrite of the dbXML 1.0 code, which forked into the Apache Xindice project. dbXML was developed using the Java 2 Standard Edition version 1.4, and should operate properly on all platforms to which J2SE 1.4 has been ported." The dbXML Group also "provides commercial licenses for situations where utilization under the terms of the GPL is inappropriate. Those using or deploying dbXML in a commercial environment may wish to consider contacting the group to discuss commercial licensing and support."

  • [April 29, 2004] "Databases Flex Their XML: IBM, Microsoft, Oracle, and Sybase Compete in Our Data Management Gymnastics." By Sean McCown. In InfoWorld (April 23, 2004). "If you could do one thing to improve integration and automate processes with customers and business partners, it would be to implement XML, which has become the standard for exchanging information between disparate systems because it is easily transformed into any format. With very little effort, the same file can be sent to several different customers with their own specific needs. XML eases the development effort for the transmitting company and gives recipients a safety net for altering the way they use the data without having to alter how they receive it. Being able to merge, query, and transform transmitted data with relational data is becoming as essential to businesses as data warehouses themselves. The good news is that the four leading relational databases, namely Oracle Database, IBM DB2, Sybase ASE (Adaptive Server Enterprise), and Microsoft SQL Server, not only can store XML data, but they hide much of the complexity of working with XML. Depending on which of these relational databases you use, however, the XML features you will have to work with may be extremely rich or limited in important ways. What does a fashionable XML database provide? Four basic functions: the ability to consume, store, search, and generate XML. The extent to which the database supports these functions and the methods it uses to accomplish them are what make for a successful implementation of XML in a database. I examined these four areas in Oracle Database 10g, IBM DB2 Universal Database V8.1, Sybase ASE 12.5.1, and Microsoft SQL Server 2000. 
I tested how they imported and read XML files, their options for saving the data, their indexing and query capabilities, and their options for creating XML and graded them based on the ease, flexibility, and speed with which they handled the most common XML operations. Of course, these products have many other capabilities beyond handling XML..." See other details in the InfoWorld special report.

  • [February 26, 2004] "Getting Reacquainted with dbXML 2.0." By Tom Bradford. From XML.com (February 25, 2004). "The goal of the dbXML project has been to produce a high quality, small footprint XML database that just works. dbXML is a native XML database written in Java. Native XML databases (NXDs) are databases that store XML using an internalized format for faster overall processing and representational flexibility. NXDs also provide support for indexing XML for improved query performance. Because it utilizes Java's memory mapped I/O and overlapping socket I/O, dbXML requires Java 1.4 or higher... In version 2.0 dbXML supports basic journaling transactions under the hood. At present, all transactions are implicit unless you're accessing dbXML using the database's lowest level APIs. Explicit transaction APIs will be exposed via the client/server APIs in a future release... The database now has a pluggable security model. There are currently three security managers to choose from. (1) NoSecurityManager provides no security whatsoever and is used when authentication is not needed to access the database. (2) SimpleSecurityManager provides simple security, where a single user name and password is used for the entire database. The user name and password are defined in the database's system.xml configuration file. (3) DefaultSecurityManager is so named because it is the default security manager. It provides access control based on users and roles stored in the database's system collections. dbXML 1.0 leveraged CORBA to provide client/server communications. While CORBA made dbXML accessible to many platforms and languages, it also came with its share of headaches. For version 2.0, it was decided that CORBA would no longer be used. dbXML 2.0 utilizes a web services hub called Project Labrador to provide client/server communications. Currently, Labrador only supports REST and the XML-RPC protocol. As a result, dbXML only supports these modes of access. 
A future version of Labrador will support SOAP; when it does, dbXML will automatically inherit this capability. This project has evolved quite a bit since version 1.0 and is very likely to evolve considerably in the coming year. It is already a mature product, with some rather high profile users, and is in a very good position to become the dominant open source XML database, if not one of the more popular XML databases in general..."

  • [December 09, 2003] "Software AG Increases XML Support Within Its Natural Development Environment for Windows, UNIX and Linux Platforms. Natural version 6 Enables Developers to Access XML Documents Stored in the Company's Tamino XML Server Without Learning an XML-Specific Query Language." - "Software AG, Inc., a pioneer in XML solutions, today announced the availability of Natural version 6 for Windows, UNIX and Linux platforms. Software AG's popular 4GL development environment, which is currently installed at approximately 3,000 organizations worldwide, now enables Natural programmers to access XML documents stored in the company's Tamino XML Server without needing to learn an XML-specific query language. Natural version 6 also allows developers working in Windows to access Natural programs running in UNIX or on a mainframe -- a capability the company calls Single-Point-of-Development. Both capabilities are designed to increase the speed and convenience of using Natural in an open systems environment. The announcement was made at the 2003 XML Conference and Exposition. Thanks to an expanded XML tool kit and new language constructs, users of Natural version 6 can process XML documents with greater ease and flexibility. For example, developers can gain access to Software AG's Tamino XML Server using familiar Natural DML (Data Manipulation Language) statements, meaning that XML documents residing in Tamino can be queried from Natural without the developer having to learn XPath, XQuery or a similar XML-specific protocol. In addition, Websites and HTML pages can now be designed more easily and efficiently in Natural version 6 through the incorporation of XSL (Extensible Stylesheet Language) support and the implementation of a revised Web interface...
The Single-Point-of-Development interface allows a Windows PC running Natural version 6 to access Natural programs running on Unix and mainframes -- thus combining the flexible development potential found on a Windows operating system with the stability and performance of mainframe and Unix. Using Single-Point-of-Development, programs created in Windows can be modified directly on the server platform, thereby addressing versioning and synchronizing issues flowing from the need to save code separately on multiple platforms. Single-Point-of-Development is available not only for the core Natural system, but also for four Natural add-ons: Natural Construct, Natural Engineer, Predict and Mainframe Navigator. These additional Natural engineering tools can therefore also be used via the Single-Point-of-Development interface..."

  • [September 30, 2003] "What's Next for SQL Server?" By Lisa Vaas. In eWEEK (September 26, 2003). "Users demanded SQL Server bond tighter with Visual Studio .Net, and Microsoft Corp. has since heeded the call, putting into beta testers' hands a version that opens the database up to .Net-compliant languages. The next version of SQL Server, code-named 'Yukon,' was originally slated for a spring 2004 release. That deadline was pushed out to the second half of next year after customers said they expected Yukon to fit hand-in-glove with the next version of .Net, code-named Whidbey. The Yukon beta was released in July to some 2,000 customers and partners. eWEEK recently talked with Microsoft Group Product Manager Tom Rizzo to find out how the .Net integration that customers demanded, along with upcoming features such as native XML and Web Services support, will benefit enterprises." [Rizzo:] "From the data level, we have things like native XML support. You take data from SQL Server, put it into XML format and ship it to anything that understands XML, such as Oracle, which has some XML support, and [IBM's DB2 database]. XML is ultimate interoperability -- it's an industry-standard format, and it's self-describing. You know both the schema of the data as well as the data itself. You don't lose the context when you pass your data around. We upped the level of XML support in Yukon through a number of things. In 2000 we had XML support but -- it was shredding. (Shredding is the parsing of XML tag components into corresponding relational table columns.) In Yukon the key thing is we have an XML type. Like you have STRING and NUMBERS and all that inside the database, now you can declare with the native data type XML. Although we had XML support in 2000, and many leveraged it and were happy with it, now we have native support... One reason we [moved to a native data type for XML] is to support XQuery.
Also to support XQuery we had to build code so as to combine XML with relational query language. You can take the relational sorts of queries you're used to in the database world, where people select things from tables with filters on that data. You can combine XQuery statements with such relational queries..."

  • [July 25, 2003] "The Future of XML Documents and Relational Databases. As New Species of XML Documents Are Emerging, Vendors Are Unveiling Increased RDBMS Support for XML." By Jon Udell. In InfoWorld (July 25, 2003). "Having absorbed objects, the RDBMS vendors are now working hard to absorb XML documents. Don't expect a simple rerun of the last movie, though. We've always known that most of the information that runs our businesses resides in the documents we create and exchange, and those documents have rarely been kept in our enterprise databases. Now that XML can represent both the documents that we see and touch -- such as purchase orders -- and the messages that exchange those documents on networks of Web services, it's more critical than ever that our databases can store and manage XML documents. A real summer blockbuster is in the making. No one knows exactly how it will turn out, but we can analyze the story so far and make some educated guesses. The first step in the long journey of SQL/XML hybridization was to publish relational data as XML. BEA Chief Architect Adam Bosworth, who worked on the idea's SQL Server implementation, calls it 'the consensual-hallucination approach -- we all agree to pretend there is a document.' XML publishing was the logical place to start because it's easy to represent a SQL result set in XML and because so many dynamic Web pages are fed by SQL queries. The traditional approach required programmatic access to the result set and programmatic construction of the Web page. The new approach materializes that dynamic Web page in a fully declarative way, using a SQL-to-XML query to produce an XML representation of the data and XSLT to massage the XML into the HTML delivered to the browser. Originally these virtual documents were created using proprietary SQL extensions such as SQL Server's 'FOR XML' clause. There's now an emerging ISO/ANSI standard called SQL/XML, which defines a common approach. 
SQL/XML is supported today by Oracle and DB2. It defines XML-oriented operators that work with the native XML data types available in these products. SQL Server does not yet support an XML data type or the SQL/XML extensions, but Tom Rizzo, SQL Server group product manager at Redmond, Wash.-based Microsoft, says that Yukon, due in 2004, will... Most of the information in an enterprise lives in documents kept in file systems, not in relational databases. There have always been reasons to move those documents into databases -- centralized administration, full-text search -- but in the absence of a way to relate the data in the documents to the data in the database, those reasons weren't compelling. XML cinches the argument. As business documents morph from existing formats to XML -- admittedly a long, slow process that has only just begun -- it becomes possible to correlate the two flavors of data..."

  • [July 21, 2003] "XQuery and SQL: Vive la Différence." By Ken North. In DB2 Magazine (Quarter 3, 2003). "Sometimes SQL and XML documents get along fine. Sometimes they don't. A new query language developed by SQL veterans is promising to smooth things over and get everything talking again. It's impossible to discuss the future of the software industry without discussing XML. XML has become so important that SQL is no longer the stock reply to the question, 'What query language is supported by all the major database software companies?' The new kid on the block is XQuery, a language for running queries against XML-tagged documents in files and databases. A specification published by the World Wide Web Consortium (W3C) and developed by veterans of the SQL standards process, XQuery emerged because SQL -- which was designed for querying relational data -- isn't a perfect match for XML documents. Although SQL works quite well for XML data when there's a suitable mapping between SQL tables and XML documents, it isn't a universal solution. Some XML documents don't reside in SQL databases. Some are shredded or decomposed before their content is inserted into an SQL database. Others are stored in native XML format, with no decomposition. And the nature of XML documents themselves poses other challenges for SQL. XML documents are hierarchical or tree-structured data. They're self-describing in that they consist of content and markup (tags that identify the content). In SQL databases, such as DB2, individual rows don't contain column names or types because that information is in the system catalog. The XML model is different. As with SQL, schemas that are external to the content they describe define names and type information. However, it's possible to process XML documents without using schemas. XML documents contain embedded tags that label the content. But unlike SQL, order is important when storing and querying XML documents. 
The nesting and order of elements in a document must be preserved in XML documents. Many queries against documents require positional logic to navigate to the correct node in a document tree. When shredding documents and mapping them to columns, it's necessary to store information about the document structure. Even mapping XML content to SQL columns often requires navigational logic to traverse a document tree. Other requirements for querying XML documents include pattern matching, calculations, expressions, functions, and working with namespaces and schemas... For these and other reasons, the W3C in 1998 convened a workshop to discuss proposals for querying XML and chartered the XML Query Working Group..."
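To make the ordering point concrete, here is a minimal sketch of one shredding scheme. It assumes a single generic node table; the table and column names (`node`, `parent`, `pos`) are hypothetical and do not come from any product North discusses. Each element becomes a row that records its parent and its position among its siblings, so the original order can be restored with `ORDER BY`. Attributes and mixed-content tail text are ignored to keep it short.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(conn, xml_text):
    """Store one row per element, recording parent and sibling position,
    since XML element order is significant but table rows are unordered."""
    # Hypothetical schema for illustration only.
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER,"
                 " pos INTEGER, tag TEXT, text TEXT)")
    counter = 0

    def walk(elem, parent, pos):
        nonlocal counter
        counter += 1
        node_id = counter
        conn.execute("INSERT INTO node VALUES (?, ?, ?, ?, ?)",
                     (node_id, parent, pos, elem.tag, (elem.text or "").strip()))
        for i, child in enumerate(elem):
            walk(child, node_id, i)

    walk(ET.fromstring(xml_text), None, 0)

def rebuild(conn, node_id=1):
    """Reassemble the tree, ordering each element's children by stored position."""
    tag, text = conn.execute("SELECT tag, text FROM node WHERE id = ?",
                             (node_id,)).fetchone()
    elem = ET.Element(tag)
    elem.text = text or None
    children = conn.execute("SELECT id FROM node WHERE parent = ? ORDER BY pos",
                            (node_id,)).fetchall()
    for (child_id,) in children:
        elem.append(rebuild(conn, child_id))
    return elem

conn = sqlite3.connect(":memory:")
doc = "<book><title>XML</title><ch>One</ch><ch>Two</ch></book>"
shred(conn, doc)
restored = ET.tostring(rebuild(conn), encoding="unicode")
```

Without the `pos` column, the two `ch` rows would come back in whatever order the database chose, which is exactly the structural information North says shredding must preserve.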

  • [July 14, 2003] "Sleepycat Boosts Database." By Lisa Vaas. In eWEEK (July 14, 2003). "Sleepycat Software Inc. last week tossed its open-source database into the XML ring with the release of code for Berkeley DB XML, a native XML database that's built on top of its open-source embedded database, Berkeley DB. Berkeley DB XML offers a single data repository for storage and retrieval of native XML and non-XML data, avoiding the XML conversion overhead that occurs with relational databases that have been retrofit with XML adapters, officials said. The database supports XPath 1.0, a World Wide Web Consortium standard language for addressing parts of an XML document. It offers flexible indexing, giving application developers the ability to control query performance and tune data retrieval... Having Berkeley DB as the base engine for the XML offering means that the new product will inherit advanced database features such as concurrent access, transactions, recovery and replication, officials said. It will scale up to 256 terabytes for the database and up to 4GB for individual keys and values... The release of the open source code heralds the end of a 12-month beta program that comprised some 5,000 companies, many of them huge names such as 3M Co., Amazon.com Inc., BEA Systems Inc., Lucent Technologies Inc.'s Bell Labs, The Boeing Co., Cisco Systems Inc., Hewlett-Packard Co., IBM and NEC America Inc. Those big names are testimony to the traction XML is gaining in the enterprise, said Sleepycat officials, in Lincoln, Mass. Sleepycat's software is sold using a typical open-source scheme: free to download and use or fee-based to ship a product whose source code is withheld. The company has 200 paying customers, according to officials..." See details in the news story "Sleepycat Software Releases Berkeley DB XML Native XML Database."

  • [May 27, 2003]   IBM Announces General Availability of DB2 Information Integrator V8.1.    IBM has announced the general availability of DB2 Information Integrator V8.1 which "provides the foundation for a strategic information integration framework that helps customers to access, manipulate, and integrate diverse and distributed information in real time. The new product enables businesses to abstract a common data model across data and content sources and to access and manipulate them as though they were a single source. IBM's DB2 software helps businesses increase efficiencies by enabling them to centrally manage data, text, images, photos, video and audio files stored in a variety of databases. The new IBM product is most appropriate for projects whose primary data sources are relational data augmented by other XML, Web, or content sources." Core components in the DB2 Information Integrator include a Federated Data Server, a Replication Server for Mixed Relational Databases, and a Local Database Server. The federated data server allows administrators to use integrated graphical tools to configure data source access and define integrated views across diverse and distributed data; XML schema can be automatically mapped into relational schema. "DB2 Information Integrator V8.1 supports the predominantly read-access scenarios common to enterprise-wide reporting, knowledge management, business intelligence, portal infrastructures, and customer relationship management."

  • [May 20, 2003] "The Center of the Universe." By Ken North. In Intelligent Enterprise Volume 6, Number 9 (May 31, 2003). ['XML, Web services, analytics, and other hot technologies have the leading relational DBMS providers working overtime to remain the best choice for managing all of your data. Here's a look at what IBM, Microsoft, and Oracle are doing.'] "Whatever form software takes in the next decade, databases will continue as the primary tool for managing data. DBMSs from rivals IBM, Microsoft, and Oracle will provide persistent data management for Web services, embedded applications, Web stores, grid services, and other software. But SQL DBMS products will increasingly be judged on how well they support traditional tasks (such as transaction processing) while evolving to provide new capabilities (such as integrated business analytics). The latest releases of data management software from the big three vendors unite SQL with multidimensional and document-centric (XML) data and grid computing. Whether an organization follows a best-of-breed approach or taps a single vendor to build an IT infrastructure, problems can arise with interoperability, data aggregation, and data and application integration. That's why XML, XML-based messaging, XML-enabled databases, and Web services have become increasingly important. But XML is only one of the fields on which the database software giants are competing. Although there are now fewer SQL database vendors than a decade ago, competition remains fierce among IBM, Oracle, and Microsoft. Each company tries to gain an edge over the others by complementing their database platforms with broad-spectrum software offerings such as vertical market applications and developer tools. The DBMS products from each of these vendors provide parallel processing, extensible servers, online analytic processing (OLAP), tight integration with messaging software, and support for XML and Web services. 
The products diverge when it comes to programming database server plug-ins, querying multidimensional data sets, persisting message queues, orchestrating the flow of Web services, and processing audio, video, and other rich data types. This overview of the different strategies vendors are following sheds light on their plans for developing technologies to extend the SQL DBMS to handle business intelligence (BI), XML, Web services, and grid requirements..." See also Ken North's interviews with: [1] Rob High and Nelson Mattos of IBM; [2] Andrew Mendelsohn of Oracle; [3] Jim Gray and Michael Rys of Microsoft.

  • [May 12, 2003] "DB Updates Ease Web Services." By Lisa Vaas. In eWEEK (May 12, 2003). "Best-of-breed XML database developers Ipedo Inc., Sonic Software Corp. and Sleepycat Software Inc. are enhancing their respective native XML database software to make it easier for enterprises to use and manage XML data in Web services environments... Ipedo, for example, late this month will release Version 3.3 of its XML Information Hub, which boasts three new components. The first, called content conversion, automatically converts PDF, Microsoft Corp. Word and other non-XML documents into XML. The auto-organization component organizes, merges and transforms inbound content according to business rules, said Ipedo officials, in Redwood City, Calif. The third new piece, a universal XML Query engine, provides local and remote content and data source searching and updating using the XQuery standard. Some of the Ipedo upgrade's new features are compelling for user Thor Anderson, who is manager of program development at Collegis Inc. Anderson is working with Texas A&M University to take online digital library resources and put them into a repository with additional, educationally specific metadata. 'The more that [XQuery] engine is improved and sped up and usable, that's important,' said Anderson... Separately, Sonic late this summer will roll out a suite of integration products, called Sonic Business Integration Suite, that includes Sonic XML Server, a renamed and enhanced version of the Excelon XIS (Extensible Information Server) native XML database that the company acquired last fall. Enhancements to the XML database include a Web services-style interface laid over the XML processing and storage engine within XIS, said Sonic officials, in Bedford, Mass... Sleepycat next month will release Version 4.2 of Berkeley DB, its open-source embedded database..." See: "Ipedo Enhancements Boost Award-Winning XML Information Hub. Content Conversion, Auto-Organization, Universal XQuery, Web Services Views Reduce Cost and Complexity of Information Delivery."

  • [May 12, 2003] "Berkeley DB XML: An Embedded XML Database." By Paul Ford. From XML.com (May 07, 2003). ['Paul Ford introduces the embeddable Berkeley DB XML database. For many years the open source Berkeley DB libraries have been a popular choice for embedded database applications. It has been so ubiquitously used that chances are, you rely on some software product that embeds Berkeley DB. It is therefore pretty exciting when SleepyCat, the maintainers of Berkeley DB, announce that they will be releasing an XML-aware version of their database software.'] "Berkeley DB XML is an open source, embedded XML database created by Sleepycat Software. It's built on top of Berkeley DB, a 'key-value' database which provides record storage and transaction management. Unlike relational databases, which store data in relational tables, Berkeley DB XML is designed to store arbitrary trees of XML data. These can then be matched and retrieved, either as complete documents or as fragments, via the XML query language XPath. Berkeley DB XML is written in C++; APIs for Berkeley DB XML exist for C/C++, Java, Perl, Python, and TCL, and more language interfaces are currently under development... An XML database has several advantages over key-value, relational, and object-oriented databases: (1) XML data is dropped straight into the database; it does not need to be manipulated or extracted from a document in order to be stored. (2) When inserted into the database, most (in Berkeley DB XML, all) aspects of an XML document, including white space, are maintained exactly. (3) Queries return XML documents or fragments, which means that the hierarchical structure of XML information is maintained... Berkeley DB XML, even in beta, is a promising solution for XML storage and retrieval. According to [John] Merrells, it is being evaluated by 'several serious commercial enterprises.' 
Based on Berkeley DB, it has a well-proven foundation for data storage, and SleepyCat's prior releases have proven them to be a reliable provider of well-documented open source tools for data storage. SleepyCat allows for commercial licensing of their open source tools, which may make this solution attractive for corporations that are skittish about open source. It is also worth noting that Berkeley DB XML users essentially get Berkeley DB 'for free' with the product. In other words, it's easy to mix and match regular DB data sources with XML data sources. This combination may provide a strong alternative to relational and object-oriented databases... Since any data storage technology requires a significant investment in time and effort, this strong level of community and corporate support is encouraging; Berkeley DB XML, currently in its infancy, seems likely to be around for a long time, and by offering a standard embedded interface it may provide a very useful tool for programmers in need of robust data storage who want to avoid the overhead of a relational database. The tool has some growing to do, but even in its current form many programmers will find it a useful tool with a logical, powerful interface..."

  • [May 01, 2003] "A Normal Form for XML Documents." By Li-Yan Yuan (Professor, Department of Computing Science, University of Alberta, Canada). 40 pages. Reading reference for the course "Modern Database Management Systems" (Winter Term, 2003); "this course covers research topics in advanced database management systems as well as emerging database technologies, with emphasis on XML data and XML support for object-oriented database management systems..." Given a DTD D and a set F of FDs, (D, F) is in XML normal form (XNF) if and only if for every nontrivial FD of the form S → p.@l or S → p.S, it is the case that S → p is implied by F. The presentation references the paper "A Normal Form for XML Documents", by M. Arenas and L. Libkin, published in the Proceedings of ACM PODS02. [cache]
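XNF is defined over a DTD and its FDs, not over data, but the redundancy it rules out is easy to see in a single instance. The sketch below assumes a hypothetical `courses` document in which `student` elements (keyed by `@sno`) repeat under every course a student takes. The FD from `@sno` to the student's name text holds, yet the same name value is stored in several `name` nodes: the data-level symptom of an FD S → p.S holding while S → p does not. The helper names are invented for illustration, and this is not an XNF test.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def fd_holds(root, path, key_attr, value_child):
    """Does path.@key_attr -> path.value_child.S hold in this document?
    (Elements with the same key must carry the same child text.)"""
    seen = {}
    for el in root.iterfind(path):
        key, value = el.get(key_attr), el.findtext(value_child)
        if seen.setdefault(key, value) != value:
            return False
    return True

def repeated_keys(root, path, key_attr):
    """Keys whose elements (and hence child values) are stored more than once."""
    counts = Counter(el.get(key_attr) for el in root.iterfind(path))
    return sorted(key for key, n in counts.items() if n > 1)

# Hypothetical instance: student st1 takes two courses, so "Deere" is stored twice.
doc = ET.fromstring(
    '<courses>'
    '<course cno="csc200"><taken_by>'
    '<student sno="st1"><name>Deere</name></student>'
    '<student sno="st2"><name>Smith</name></student>'
    '</taken_by></course>'
    '<course cno="mat100"><taken_by>'
    '<student sno="st1"><name>Deere</name></student>'
    '</taken_by></course>'
    '</courses>'
)
STUDENT = "course/taken_by/student"
```

Moving each student's name into a single element keyed by `sno`, and leaving only the key under `taken_by`, removes the repetition; that is the kind of restructuring the normal form is meant to guide.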

  • [May 01, 2003] "An Information-Theoretic Approach to Normal Forms for Relational and XML Data." By Marcelo Arenas and Leonid Libkin (University of Toronto). Paper for presentation at the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 2003), San Diego, USA, [June 9-12] 2003. "Normalization as a way of producing good database designs is a well understood topic. However, the same problem of distinguishing well designed databases from poorly designed ones arises in other data models, in particular, XML. While in the relational world the criteria for being well designed are usually very intuitive and clear to state, they become more obscure when one moves to more complex data models. Our goal is to provide a set of tools for testing when a condition on a database design, specified by a normal form, corresponds to a good design. We use techniques of information theory, and define a measure of information content of elements in a database with respect to a set of constraints. We first test this measure in the relational context, providing information theoretic justification for familiar normal forms such as BCNF, 4NF, PJ/NF, 5NFR, DK/NF. We then show that the same measure applies in the XML context, which gives us a characterization of a recently introduced XML normal form called XNF. Finally, we look at information theoretic criteria for justifying normalization algorithms. Several [other] papers attempted a more formal evaluation of normal forms, by relating it to the elimination of update anomalies. Another criterion is the existence of algorithms that produce good designs: for example, we know that every database scheme can be losslessly decomposed into one in BCNF, but some constraints may be lost along the way... Our [research] goal was to find criteria for good data design, based on the intrinsic properties of a data model rather than tools built on top of it, such as query and update languages. 
We were motivated by the justification of normal forms for XML, where usual criteria based on update anomalies or existence of lossless decompositions are not applicable until we have standard and universally acceptable query and update languages. We proposed to use techniques from information theory, and measure the information content of elements in a database with respect to a set of constraints. We tested this approach in the relational case and showed that it works: that is, it characterizes the familiar normal forms such as BCNF and 4NF as precisely those corresponding to good designs, and justifies others, more complicated ones, involving join dependencies. We then showed that the approach straightforwardly extends to the XML setting, and for the case of constraints given by functional dependencies, equates the normal form XNF of ["A Normal Form for XML Documents", by M. Arenas and L. Libkin, published in the Proceedings of ACM PODS02] with good designs. In general, the approach is very robust: although we do not show it here due to space limitations, it can be easily adapted to the nested relational model, where it justifies a normal form NNF..." [cache]

  • [April 29, 2003] "Bluestream Upgrades XML Database." By Lisa Vaas. In eWEEK (April 29, 2003). "Bluestream Database Software Corp. has released an upgrade to its native XML database that features smoother handling of collaborative content management with XML and binary data. XStreamDB 3.0 has a new resource manager that enables content management via integration with Web or print authoring and publishing software. It now supports Corel XMetaL for XML content authoring using that software's word processor-like view of content. The new version also improves support for Altova XMLSpy for editing data-centric XML documents... Other new features include faster full-text search and indexing, built-in WebDAV server, event triggers, automated backup and the new XStreamDB 3.0 Server Console application for easier server administration. XStreamDB 3.0 also now supports binary document types with MIME-type attributes in addition to native XML document support. It has also acquired derivation by extension and attribute groups, adding to its existing W3C schemas support. The database now has full-text search that features LIKE wildcard matching, found word marking, phrase search and proximity search. XStreamDB 3.0 is a cross-platform database server that runs on Windows NT/2000/XP, Solaris, Mac OS X or Linux. It features a choice of Java API, WebDAV or the XStreamDB 3.0 Explorer application for access to documents and data. The database also supports XQuery with update extensions, full text search, shared resource management and XML schemas with automatic validation. XStreamDB supports ACID (Atomicity, Consistency, Isolation, Durability) transactions. ACID compliance indicates that the database supports 'all or nothing' transactions -- those that either work to their conclusion or refrain from changing data. XStreamDB is a pure Java technology-based server and requires Java 1.3 or 1.4 Runtime Environment..." 
See the announcement "Bluestream Releases XStreamDB 3.0 Native XML Database. Major Upgrade Adds Features for Collaborative Content Management."
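The 'all or nothing' property can be shown with any transactional store. This sketch uses Python's built-in `sqlite3` module as a stand-in; it is not XStreamDB's Java API, and the `docs` table and `store_all` helper are hypothetical. A batch containing a duplicate key fails, and the insert that preceded the failure is rolled back with it.

```python
import sqlite3

def store_all(conn, docs):
    """Insert every (name, body) pair in one transaction: all of them or none."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            for name, body in docs:
                conn.execute("INSERT INTO docs (name, body) VALUES (?, ?)", (name, body))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this illustration.
conn.execute("CREATE TABLE docs (name TEXT PRIMARY KEY, body TEXT)")
store_all(conn, [("a.xml", "<a/>")])
# "a.xml" already exists, so the second insert fails and "b.xml" is rolled back too.
ok = store_all(conn, [("b.xml", "<b/>"), ("a.xml", "<dup/>")])
names = [row[0] for row in conn.execute("SELECT name FROM docs ORDER BY name")]
```

Atomicity is only one of the four ACID properties; isolation and durability depend on the engine's locking and logging, which a single-connection example cannot show.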

  • [April 05, 2003] Book announcement: XML Data Management: Native XML and XML-Enabled Database Systems, by Akmal Chaudhri, Awais Rashid, and Roberto Zicari. Addison Wesley, 2003. ISBN: 0201844524. 688 pages. The book is divided into five parts each containing a coherent and closely related set of chapters; these are self-contained and can be read in any order: Introduction; Native XML Databases; XML and Relational Databases; Applications of XML; Performance and Benchmarks. Topics covered include: (1) The power of good grammar and style in modeling information to alleviate the need for redundant domain knowledge; (2) Tamino's XML storage, indexing, querying, and data access features; (3) The features and APIs of open source eXist; (4) Berkeley DB XML's ability to store XML documents natively; (5) IBM's DB2 Universal Database and its support for XML applications; (6) Xperanto's method of addressing information integration requirements; (7) Oracle's XMLType for managing document-centric XML documents; (8) Microsoft SQL Server 2000's support for exporting and importing XML data; (9) A generic architecture for storing XML documents in a relational database; (10) X007, XMach-1, XMark, and other benchmarks for evaluating XML database performance. The Preface and Chapter 1 ("Information Modeling with XML") are available online. See also the online Table of Contents.

  • [March 03, 2003] "ISO-ANSI Working Draft XML-Related Specifications (SQL/XML)." Draft text for: Information technology -- Database languages -- SQL -- Part 14: XML-Related Specifications (SQL/XML). // Technologies de l'information -- Langages de base de donnée -- SQL -- Partie 14: «Specifications à XML» (SQL/XML). Edited by Jim Melton. ISO/ANSI WD Reference: WG3:DRS-020, H2-2002-365. August, 2002. ISO Reference: ISO/IEC JTC 1/SC 32/WG 3. Date: 2002-08-09. ISO/IEC 9075-14:200x(E). Produced by ISO (International Organization for Standardization) and ANSI (American National Standards Institute). ISO/IEC JTC 1/SC 32/WG 3; ANSI TC NCITS H2. 154 pages. "This part of ISO/IEC 9075 defines ways in which Database Language SQL can be used in conjunction with XML. This standard defines mappings from SQL to XML, and from XML to SQL. The mappings from SQL to XML include: (1) Mapping SQL character sets to XML character sets; (2) Mapping SQL <identifier>s to XML Names; (3) Mapping SQL data types (as used in SQL-schemas to define SQL-schema objects such as columns) to XML Schema data types; (4) Mapping SQL data values to XML data values; (5) Mapping an SQL table to an XML document and an XML Schema document; (6) Mapping an SQL schema to an XML document and an XML Schema document; (7) Mapping an SQL catalog to an XML document and an XML Schema document. The mappings from XML to SQL include: [1] Mapping Unicode to SQL character sets; [2] Mapping XML Names to SQL <identifier>s..." For an overview, see "Standards: SQL/XML is Making Good Progress," by Andrew Eisenberg (IBM) and Jim Melton (Oracle Corp), ACM SIGMOD Record Volume 31, Issue 2 (June 2002). The update describes new work as of June 2002: "in three parts. The first part provides a mapping from a single table, all tables in a schema, or all tables in a catalog to an XML document. The second of these parts includes the creation of an XML data type in SQL and adds functions that create values of this new type. 
These functions allow a user to produce XML from existing SQL data. Finally, the 'infrastructure' work that we described in our previous article included the mapping of SQL's predefined data types to XML Schema data types. This mapping has been extended to include the mapping of domains, distinct types, row types, arrays, and multisets..." [cache]
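As a rough illustration of mappings (2) and (5), the sketch below turns each row of a SQLite table into a `row` element with one child per column, and escapes characters that are not legal in XML names as `_xHHHH_`, the style of escaping SQL/XML uses for identifiers. It is a simplified sketch, not the normative mapping: the name-character test is ASCII-only, the `_x` sequence and `xml` prefix rules are not handled, NULLs are simply omitted, and no XML Schema document is produced. The function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def to_xml_name(identifier):
    """Escape characters that cannot appear in an XML name as _xHHHH_.
    Simplified: only ASCII letters, '_' and (after the first position)
    digits, '-' and '.' are kept as-is."""
    out = []
    for i, ch in enumerate(identifier):
        ok = ch.isascii() and (ch.isalpha() or ch == "_"
                               or (i > 0 and (ch.isdigit() or ch in "-.")))
        out.append(ch if ok else f"_x{ord(ch):04X}_")
    return "".join(out)

def table_to_xml(conn, table):
    """Map a table to <TABLE><row><COLUMN>value</COLUMN>...</row>...</TABLE>.
    The table name is assumed to be trusted (it is interpolated into SQL)."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [to_xml_name(d[0]) for d in cur.description]
    root = ET.Element(to_xml_name(table))
    for values in cur:
        row = ET.SubElement(root, "row")
        for name, value in zip(columns, values):
            if value is not None:  # this sketch simply omits NULL columns
                ET.SubElement(row, name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For a table `EMP` with a column named `first name`, the column becomes the element `first_x0020_name`, since a space (U+0020) is not allowed in an XML name.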

  • [March 03, 2003] "Special Characters, Database Mappings." By John E. Simpson. From XML.com (February 26, 2003). ['John E. Simpson discusses XML special characters and SQLX.'] "... Yes, an XSLT processor, just like any other application which expects legitimate XML as input, will choke on ampersands, less-than symbols, and so on instead of their entity-reference forms. If you use a GUI-based XHTML or HTML editor, you may have noticed that you're free to enter any old character into a document, even the 'dangerous' markup-significant ones. What's more, the editor even shows you a literal ampersand, instead of something horrible like &amp;. If you examine the raw source behind the GUI cosmetics, though, you'll find entity references scattered around even though you well know you didn't key them in yourself. The editor is in effect mediating between the markup- and non-markup-based worlds in the same way that your preprocessor would need to do... In a comment posted shortly after [the last month] column's publication, technical writer, editor, and Oracle database guru Jonathan Gennick directed me to an emerging ISO/ANSI standard called SQLX. Billed as the place 'where SQL meets XML,' SQLX is a joint effort by representatives of IBM, Oracle, Sybase, Microsoft, and Northrop-Grumman to establish a standard for the 'ways in which Database Language SQL can be used in conjunction with XML.' As Gennick says, 'no need to reinvent the wheel' by proposing some alternative method of mapping identifiers..." The SQLX Workgroup website references an August 2002 ISO-ANSI Working Draft for XML-Related Specifications (SQL/XML).
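The preprocessing Simpson describes amounts to escaping the markup-significant characters before text is embedded in a document. A minimal sketch using the standard library's `xml.sax.saxutils.escape`, which replaces `&`, `<` and `>` with entity references; the helper names `wrap` and `parses` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def wrap(text, tag="p"):
    """Embed arbitrary text in an element, escaping &, < and > first."""
    return f"<{tag}>{escape(text)}</{tag}>"

def parses(xml_text):
    """True if the string is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

raw = "AT&T <Research>"
```

Pasting `raw` directly between tags yields a string the parser rejects, while `wrap(raw)` produces `&amp;` and `&lt;` references that parse back to the original text. Attribute values additionally need quote characters escaped, which `escape` only does when given an extra entities mapping.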

  • [February 12, 2003] "XML Data Binding." By Eldon Metz and Allen Brookes. In Dr. Dobb's Journal #346 Volume 28, Issue 3 (March 2003), pages 26-36. Special Issue on XML Development, edited by Jonathan Erickson. ['XML data binding utilities dramatically simplify the task of writing XML-enabled applications by automatically creating a data binding for you.'] "An XML binding is programming language code that represents XML data, thereby ensuring that the documents conform to their schema. The generated code enables the transfer of XML data to/from instances of the generated classes. While XML data-binding tools may not always be useful when writing code to process XML, they usually do save time in coding, testing, and maintenance..." See also Ronald Bourret's "XML Data Binding Resources" and the program listings.
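For a sense of what such generated code does, here is a hand-written stand-in for a binding class. It assumes a hypothetical schema for a `customer` element with an `id` attribute and `name` and `email` children. Real binding tools generate this kind of class from the schema and add type conversion and validation; this sketch only maps between the element and a Python dataclass.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass
class Customer:
    """Hand-written stand-in for a class a binding generator might emit
    from a (hypothetical) schema: <customer id="..."><name/><email/></customer>."""
    id: str
    name: str
    email: str

    @classmethod
    def from_xml(cls, text):
        """Unmarshal: XML text -> Customer instance."""
        el = ET.fromstring(text)
        if el.tag != "customer":
            raise ValueError(f"expected <customer>, got <{el.tag}>")
        return cls(id=el.get("id"), name=el.findtext("name"), email=el.findtext("email"))

    def to_xml(self):
        """Marshal: Customer instance -> XML text."""
        el = ET.Element("customer", id=self.id)
        ET.SubElement(el, "name").text = self.name
        ET.SubElement(el, "email").text = self.email
        return ET.tostring(el, encoding="unicode")

c = Customer.from_xml('<customer id="c1"><name>Ann</name><email>ann@example.com</email></customer>')
```

Application code then works with `c.name` and `c.email` rather than walking elements, which is where the savings in coding and testing that the authors mention come from.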

  • [January 17, 2003] "Driving ODBC, JDBC Drivers to XML Web Services." By Vance McCarthy. In Enterprise Developer News (January 13, 2003). "It will get much easier in 2003 for developers using ODBC and JDBC to upgrade to XML-based web services, at least according to DataDirect Technologies, one of the leading providers of database driver technologies to software providers and end users. DataDirect, a long-time provider of OEM and end-user driver tools for both the ODBC and JDBC worlds, is now bearing down on the idea of using XML technologies to bring its driver-based technologies into web services. [Said] Brian Reed, DataDirect's vice president of market intelligence: 'SQL is for data at rest. XML is for data in motion. There is nothing more optimized than a relational database, if you're talking about stored data. But XML makes data easier to share... XML and SOA [Service-Oriented Architecture] are bringing the ability to standardize middleware, and allow the application to more easily move into the infrastructure -- and not just be inside a silo. This creates a dynamic infrastructure that will make it easier to change new things and still keep data interoperable'... Based on this picture, Reed said DataDirect is aggressively working with leading web services providers -- including Microsoft, Oracle, IBM, Sybase and others -- to migrate the ODBC/JDBC world into a new world of XML-driven loosely-coupled connectivity... DataDirect Connect for .NET 1.1 adds support for distributed transactions on Oracle and Sybase databases, using Microsoft's Distributed Transaction Coordinator (MS DTC) as the transaction manager. Using MS DTC enables developers to implement 'serviced components' that require distributed transaction support and use ADO.NET data providers... 
In addition, MS DTC can be used to (1) update multiple databases and files from a single application, (2) update geographically distributed databases, and (3) update databases that have been partitioned for scalability. MS DTC uses a two-phase commit protocol to ensure that all the resource managers commit the transaction or all abort it, preserving data integrity... jXTransformer is DataDirect's XML software component for transforming data between relational and XML formats in Java programs. Rather than require developers and database professionals to learn database-specific tools, jXTransformer uses a language that is very similar to XQuery, the new draft SQL/XML standard, to enable developers to create XML from relational data or update relational data from XML input. The goal, Reed said, is to let developers code once using a simple component, and reuse it across multiple databases without learning complex database-specific XML extensions. jXTransformer provides a Java API and a simple language, and a GUI tool for writing queries that will map and transform data between relational and XML formats. jXTransformer uses an API for data access and does not require any database changes, letting developers use existing stored procedures, reports, and queries without changing anything in the database. The tool reads data from relational databases and transforms data into any desired XML structure, and creates simple or complex hierarchical XML documents and XML document fragments..."
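The two-phase commit protocol mentioned above can be sketched in a few lines. The `Participant` class and its methods are hypothetical and bear no relation to MS DTC's actual interfaces. In phase one the coordinator asks every resource manager to prepare and vote; in phase two it tells all of them to commit only if every vote was yes, and otherwise tells all of them to abort. A real coordinator also writes a durable log, handles timeouts, and recovers in-doubt transactions after a crash, none of which is modeled here.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    """Hypothetical resource manager (a database, a file store, ...)."""
    name: str
    can_commit: bool = True
    state: str = "active"

    def prepare(self):
        # Phase 1: a real participant would make its changes durable, then vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Commit on every participant or on none of them."""
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    if all(votes):
        for p in participants:                    # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                        # phase 2: abort everywhere
        p.abort()
    return False
```

With one participant voting no, for example `[Participant("oracle"), Participant("sybase", can_commit=False)]`, both end in the `aborted` state, which is the "all abort it" outcome described in the article.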

  • [January 17, 2003] "IBM Preparing Xperanto Deliverables." By Paul Krill. In InfoWorld (January 16, 2003). "IBM in the first half of this year is pledging to offer the first products based on its Xperanto technology for integrating multiple data points, as part of IBM's OnDemand initiative for leveraging existing technology assets. Xperanto represents a significant extension of IBM's DB2 database technology, allowing for federated access to data, regardless of whether the data resides in DB2 or in data management systems from vendors such as Oracle, Sybase, and Microsoft, said Nelson Mattos... IBM believes there is 'a major shift happening in the data management industry, which is moving away from the notion of a data management system that is only managing information that is physically stored in the repository toward a data management infrastructure that is managing, integrating, accessing, and analyzing all the information in the enterprise,' he added. IBM differs from Oracle in that Oracle favors a centralized approach to data management, Mattos contended. 'Oracle encourages customers to solve the integration problem by centralizing or moving all the data into the Oracle system, and that does not allow customers to obtain information on demand because if I'm going to centralize, I need to know what information I need to move into the Oracle system,' which is not always doable these days, Mattos said. Oracle officials, however, said IBM with Xperanto is not offering anything new as far as data federation because both IBM and Oracle already have federated data management capabilities..." See IBM Research's Xperanto project and references in "IBM: Xperanto Rollout To Start In Early 2003. Long-Promised Information Integrator on the Horizon."

  • [November 15, 2002] "Normalizing XML, Part 1." By Will Provost. From XML.com. November 13, 2002. ['Will Provost's Schema Clinic series on XML.com has so far taken an object-oriented view of W3C XML Schema design. This month, Will has written the first of a two-part series that examines the relational aspects of schema design. The series examines guidelines that achieve the goal of normalization -- the principles guiding database design -- using the mechanisms provided by W3C XML Schema.'] "The goal is to see what relational concepts we can usefully apply to XML. Can the normal forms that guide database design be applied meaningfully to XML document design? Note that we're not talking about mapping relational data to XML. Instead, we assume that XML is the native language for data expression, and attempt to apply the concepts of normalization to schema design. The discussion is organized loosely around the progression of normal forms, first to fifth. As we'll see, these forms won't apply precisely to XML, but we can adhere to the law's spirit, if not its letter. It's possible to develop guidelines for designing W3C XML Schema (WXS) that achieve the goals of normalization: (1) Eliminate ambiguity in data expression; (2) Minimize redundancy -- some would say, 'eliminate all redundancy'; (3) Facilitate preservation of data consistency; (4) Allow for rational maintenance of data. In this first of two parts, we'll consider the first through third normal forms, and observe that while there are important differences between the XML and relational models, much of the thinking that commonly goes into RDB design can be applied to WXS design as well. ... the key concept of reducing redundancy through key association is alive and well in W3C XML Schema design. While I'd love to finish on this bright note, I must report that there are devils inhabiting the details. 
In part two of this article, I'll point them out and discuss the implications for WXS design, as well as addressing the subtler fourth and fifth normal forms..."
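The "key association" Provost refers to is expressed in W3C XML Schema with `xs:key` and `xs:keyref`, which a validating processor enforces. As a rough illustration of the check a keyref implies, the sketch below assumes a hypothetical normalized document in which each customer is stored once and orders refer to it by id, then reports references that match no key. It is not a schema validator and does not check keys for uniqueness; the element and attribute names are invented.

```python
import xml.etree.ElementTree as ET

def dangling_refs(root, key_path, key_attr, ref_path, ref_attr):
    """Return reference values that match no key value: the condition an
    xs:keyref constraint would report. Key uniqueness is not checked."""
    keys = {el.get(key_attr) for el in root.iterfind(key_path)}
    refs = {el.get(ref_attr) for el in root.iterfind(ref_path)}
    return sorted(refs - keys)

# Hypothetical normalized document: customer details stored once, orders refer by id.
shop = ET.fromstring(
    '<shop>'
    '<customer id="c1"><name>Ann</name></customer>'
    '<order customer="c1" total="10"/>'
    '<order customer="c2" total="5"/>'
    '</shop>'
)
```

Here the order referring to `c2` is reported because no customer carries that id; in a denormalized design that copied customer details into each order, this class of error would instead show up as conflicting copies.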

  • [November 05, 2002] "Look at Storage Issues Before You Leap Into XML." By Kevin Dick (Kevin Dick Associates). In Application Development Trends Volume 9, Number 11 (November 2002), pages 45-49. Adapted from Chapter 5 of XML: A Manager's Guide, Second Edition, by Kevin Dick, Addison-Wesley. ['Organizations can avoid missteps by first selecting the right storage model for a project: a DBMS, a content management system or a native XML store.'] "XML documents are data that can be either at rest or in transit. Therefore, enterprises that want to successfully deploy XML must figure out how to manage XML in both of these states. For XML at rest, developers must first decide on the type of store to use. For XML in transit, they must first decide on the server infrastructure to deploy. It is not uncommon for projects using XML to stall while figuring out how to address the storage issue. The confusion stems from the fact that there are three vastly different choices: a database management system (DBMS), a content management system (CMS), or a native XML store. The appropriate choice depends on the characteristics of your XML data. What if you use XML as a data interchange format? In this case, a source application encodes data from its own native format as XML, and a target application decodes the XML data into its own native format. XML is an intermediate data representation. Both the source and target applications already have persistent storage mechanisms, almost certainly DBMSs of one sort or another. There is really no need to store the XML documents persistently themselves, except perhaps for logging purposes..."

  • [September 25, 2002] "Introduction to Xindice. An Open Source Native XML Database System." By Arun Gaikwad (Independent Software Consultant). From IBM developerWorks, Web architecture, XML zone. September 2002. ['This article is an introduction to an Open Source Native XML Database System, called Xindice (pronounced zeen-dea-chay). It is also an introduction to Native XML Database concepts.'] "Xindice is an Open Source Native XML Database System. In this article, you will learn how to: (1) Install Xindice; (2) Create and delete collections; (3) Insert and delete documents into these collections; (4) Use XQuery to query these documents. You can perform these operations on the command line or embed them in Java programs using the Java API. You will also learn to use the Java API to write JDBC style programs to communicate with Xindice. An XML Database System is something which you may think is unnecessary but once you start using it, you wonder how you would survive without it. I say this from personal experience. When I first heard of Native XML Database Systems about two years ago, I completely ignored them thinking that it was just hype. At that time, I was involved in the development of a project for a large financial brokerage company. We were using XML to send and receive financial feed data. It was necessary to save the feed data in some kind of permanent storage. As a Relational Database programmer, my first choice was to use a Relational Database System to save these XML documents. I decided to use CLOBs (Character Large Objects) with a modern RDBMS to save these documents. Since the RDBMS supported a Java API to insert and retrieve CLOBs, this was a very easy task. As our project evolved, I found that this approach had a major drawback. This was nothing but DIDO (Document In, Document Out). Retrieving partial documents or nodes from a DOM tree was not possible. 
I would have found a tool which saved the XML documents, performed database-like queries on nodes, and retrieved partial or full documents very useful. This is when NXDs came into the picture. If I had to do this project all over again, I would definitely use an NXD. If you need simple DIDO functionality, you might want to use an RDBMS to save your documents, but for extended functionality such as Query and Update you should consider an NXD. Sometimes people try to save XML documents into Normalized Relational Database tables by mapping the document nodes into Relational format. This is not always easy. It is relatively easy to build an XML document from RDBMS tables, but not to store them because XML documents are hierarchical and almost free format..." Also available in PDF format.
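
The DIDO limitation Gaikwad describes is easy to reproduce. In this minimal sketch (Python, with SQLite's `TEXT` column standing in for a CLOB in a commercial RDBMS), the database stores and returns whole documents, so pulling out a single node means fetching and fully parsing the text in the application. The feed document, its values, the `docs` table, and the helper names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical feed document; the TEXT column below stands in for a CLOB.
FEED = """<feed date="2002-09-25">
  <quote symbol="XYZ"><price>80.10</price></quote>
  <quote symbol="ABC"><price>8.95</price></quote>
</feed>"""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    return conn


def store(conn: sqlite3.Connection, doc_id: str, xml_text: str) -> None:
    """Document In: the whole document goes into one column."""
    conn.execute("INSERT INTO docs (id, body) VALUES (?, ?)", (doc_id, xml_text))


def fetch_nodes(conn: sqlite3.Connection, doc_id: str, path: str) -> list:
    """Document Out: the database can only return the whole text, so node-level
    access means fetching and fully parsing it in the application."""
    row = conn.execute("SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
    if row is None:
        raise KeyError(doc_id)
    return ET.fromstring(row[0]).findall(path)
```

Every node-level lookup pays for a full fetch and parse, and nothing inside the document is indexed; those are the gaps that node-level query and update in a native XML database are meant to close.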

  • [September 16, 2002] "Tame the Information Tangle: XML Data Management Systems." By Paul Sholtz. In New Architect Magazine Volume 7, Issue 10 (October 2002), pages 36-40. "Encoding information in XML and exposing it on the Web will help overcome these hurdles and enable fine-tuned, database-like queries on a global scale. Of course, if all the world's data is to be encoded in XML, we'll need more efficient ways to store and manage large volumes of XML data. To address that need, a new breed of document storage and management systems has appeared that's been specially optimized for publishing XML documents on the Web... If creating and maintaining relational data mappings seems like too much work for the scope of your XML application, one attractive alternative is to use a native XML database (NXD). The concept of a native XML database was first introduced by Software AG during the marketing campaign for its Tamino product line. Since then, the term has come into common usage among other companies developing similar products. NXDs are optimized for the storage and management of XML documents. Like other modern data management systems, they provide support for transactions, security, concurrent access, and query languages. Formally, a native XML database can be defined as a data management system that exhibits the following characteristics: (1) XML documents are the fundamental unit of logical storage in the system (similar to the way in which rows in a table are the fundamental unit of logical storage in a relational database system). (2) The system defines a logical model for XML documents, and stores and retrieves documents according to that model. At the very least, the model must include support for elements, attributes, PCDATA, and document order. Some examples of logical models that meet these requirements include the XPath data model and the XML InfoSet. (3) The system is independent of any underlying physical storage model. 
For example, it could be implemented using relational, hierarchical, object-oriented, or proprietary storage formats. NXDs are often a good choice for storing document-centric XML information. For example, NXDs support XML query languages that let you perform highly specialized queries like "find all documents where the second paragraph contains an italicized word." Most NXDs provide other powerful and sophisticated text-searching features, such as thesaurus support, word stubbing (for matching all forms of a word: swim, swam, and swimming, for example), and proximity searches (find all instances where the word "lake" occurs within five words of "swim"). These are extremely useful features when you're working with traditional documents, although they are usually much less important if you are working with data-centric XML information. There are other reasons you might want to consider using an NXD. Many such repositories are able to understand a DTD or an XML Schema, and can therefore provide data validation on the fly, as information is stored or updated. NXDs can also persist information such as document order, processing instructions, comments, CDATA sections, and entity usage, while many systems that attempt to store XML data into relational databases cannot..."
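
The two kinds of query Sholtz mentions, a structural one (the second paragraph contains an italicized word) and a text one (proximity with word variants), can be sketched against a parsed document. The Python below assumes hypothetical `<p>` and `<i>` markup, reads "within five words" as at most five word positions apart, and uses a tiny hand-written variant table in place of the stemming or thesaurus support a real NXD would provide. It scans one document in memory rather than querying an indexed store.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for a real stemmer or lemmatizer: irregular forms must be listed.
FORMS = {"swim": {"swim", "swims", "swam", "swum", "swimming"}}


def words(elem: ET.Element) -> list[str]:
    """Lower-cased words of all text under elem, in document order."""
    return re.findall(r"[a-z]+", " ".join(elem.itertext()).lower())


def second_para_has_italic(doc: ET.Element) -> bool:
    """Structural query: does the second <p> contain an <i> anywhere inside it?"""
    paras = doc.findall(".//p")
    return len(paras) >= 2 and paras[1].find(".//i") is not None


def near(elem: ET.Element, a: str, b: str, max_gap: int = 5) -> bool:
    """Text query: some form of a occurs within max_gap word positions of some form of b."""
    ws = words(elem)
    pos_a = [i for i, w in enumerate(ws) if w in FORMS.get(a, {a})]
    pos_b = [i for i, w in enumerate(ws) if w in FORMS.get(b, {b})]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)
```

For example, in `<doc><p>Intro text.</p><p>We <i>always</i> swam in the cold lake at dawn.</p></doc>` both checks succeed: the second paragraph contains `<i>`, and "swam" is four words from "lake".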

  • [August 06, 2002] "Managing Change." By Adam Bosworth (Vice President, Engineering, BEA Systems Inc). In XML & Web Services Magazine Volume 3, Number 5 (August/September 2002). "How can running instances of applications handle changes in business logic? That's the question I posed a few weeks back in an e-mail to a few key internal architects at BEA discussing some of the problems I think we still need to solve. Then I left on a four-day, five-country trip that left me out of the loop on e-mail. The question was meant to address a challenge faced by our customers with really long-running workflows and "conversations." In such cases, the business logic may change while the instances of the prior version of the application are still far from complete. Previously, I had thought this would not be an issue because people would not want to change the business logic of running instances, but simply deploy new applications with the new logic. However, numerous discussions with customers proved that the real world is a weird and wonderful place; people really do want to change business logic on the fly... [Customers?] If it is metadata, they are storing it in XML. If it is state that is essentially transient, they are increasingly managing it in XML. They are doing this because it is easy to write tools to analyze, migrate, and reshape XML to handle change. Customers have learned the hard way that this isn't true of either Java serialization or relational databases. With databases in particular, one of our customers' biggest problems, considering the highly dynamic world they live in, is the inflexibility of data in continuously running systems. Database administrators spend untold fortunes coping with this. Even after working with XML for six years, I'm still pleasantly surprised at the prevalent use of XML for metadata. 
I believe that we are at a point where the two biggest revolutions in computer science of the last 20 years, object-oriented computing and relational databases, have failed us. Because our systems must be available 24x7 for years on end, the methods we have for accommodating change just don't work. Customers running complex operations such as fabrication systems can never shut them down, but they constantly want to fine-tune the operations. In so doing they need to change the shape of the information they need, but cannot easily do so... So who needs an XML database? Anyone dealing with change..."

  • [August 5, 2002] "Oracle Goes XML." By Timothy Dyck. In eWEEK (August 02, 2002). "Oracle Corp. is the first among the big relational database vendors to make major changes to its database in response to XML, shaking up the generally overpriced and underperforming native XML database market something fierce but having a lesser effect on current Oracle database sites. Oracle9i Database Release 2 continues to provide the largest range of features available in a database... All the major database players are moving to strengthen support for XML data and XML query languages in their products. In the case of IBM's DB2 and Microsoft's SQL Server databases, XML technologies and SQL will be on the same level as data access techniques. However, Oracle has gotten there first with its XML DB engine. XML DB is a combination of three technologies: a large set of SQL functions that allows XML data to be manipulated as relational data (through a view or special SQL functions) as well as to retrieve relational table data in XML format; a native XML data type called XMLType that can store XML data either in an object-relational storage format that maintains the XML DOM (Document Object Model) or as the original text document; and a special hierarchical XML index type to speed access to hierarchies of XML files stored in Oracle9i's XML file repository. XML DB also supports XML Schema, the latest standard for defining the structure of XML documents, although it doesn't support the upcoming XML query language, XQuery. Instead, XML DB uses a combination of XPath and SQL to manipulate XML. The database includes an Extensible Stylesheet Language Transformation engine, made accessible through the built-in copy of Apache, that can retrieve XML data from XML DB and transform it into HTML or other formats... 
Previous versions of Oracle and other relational databases support the option of storing XML as text data or extracting data from XML and storing it in normal relational tables, but the interim option of storing data in a format that maintains DOM fidelity (including comments, namespaces, the distinction between elements and attributes, and element ordering) is valuable and is the distinguishing feature of a native XML database. The DOM format doesn't require XML documents to be re-parsed when accessed, and this, in combination with XML and SQL index types, should provide good performance."

  • [August 02, 2002] "The Next Generation Database - XDB." By Greg Mable. In XML Journal Volume 3, Issue 6 (June 2002). "... With the advent of Web services, applications are now free to communicate in a common format - that of an XML document - anywhere on the Web. Where the Web was once built on static content linked together via hypertext, XML takes it to the next level. Instead of users surfing the Internet via HTML pages linked with hyperlinks, we can now build Web-based applications that can be linked via XML documents. Imagine a user clicking on a link to a Web site. This in turn fires off an exchange of an XML document to another application. Here's the key: the XML document. This will be the primary means of information exchange and message passing. With the need to process XML documents comes the need to be able to store, retrieve, and report on them. Hence the need for a management system to handle the flood of XML documents that an application will process. This is where an XML database, XDB, comes in... So what is an XML database, or an XDB? In this article I define what it is, when and why you will need to use one, and what impact it will have on the business world. By the time you finish reading, you just may realize the importance of an XDB and will want to grab your surfboard to ride the next big wave. There are no requirements for how an XDB is expected to physically store XML documents. Some XDBs are built on an object database, others might use compressed files with an indexing scheme, and still others might be built on top of a relational database. At this time XDBs can be classified into two basic types (with a third type on the horizon): native and XML enabled. Native XML database: A native XML database (NXDB) is simply one that was designed from the ground up to store XML documents. It might make use of a preexisting technology such as object-oriented data storage techniques, but its mission is to store, retrieve, and update XML documents. 
XML-enabled database: In the second type, an XML-enabled database (XEDB), extensions are added to a preexisting database management system to support XML documents. An XEDB can be built on top of an existing object-oriented or relational database management system. An XEDB provides a mapping layer between the XML documents and its database structures as well as support for XML-based tools to retrieve and update XML documents. Convergence of NXDB and XEDB: The third type of XDB is in its formative stages, and like a wave approaching the beach, it is about to crest. It can be considered a convergence of the two other types: an XDB that is designed to handle XML documents but is built on a preexisting database technology, combining them into a unified data model and a single repository. In this article I'll briefly describe an example for each of these types..."
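
The mapping layer that defines an XEDB can be sketched in a few lines: shred the document into rows on the way in, and rebuild XML from those rows on the way out. In the Python sketch below, SQLite stands in for the relational engine; the two-table layout and the `pos` column are assumptions made for illustration, not any product's mapping.

```python
import sqlite3
import xml.etree.ElementTree as ET

ORDER_XML = """<order id="o1">
  <item sku="A-1" qty="2"/>
  <item sku="B-7" qty="1"/>
</order>"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id TEXT PRIMARY KEY);
        CREATE TABLE items (order_id TEXT REFERENCES orders(id),
                            pos INTEGER, sku TEXT, qty INTEGER);
    """)
    return conn


def shred(conn: sqlite3.Connection, xml_text: str) -> None:
    """XML -> rows. The pos column keeps sibling order, which rows lack."""
    order = ET.fromstring(xml_text)
    oid = order.get("id")
    conn.execute("INSERT INTO orders (id) VALUES (?)", (oid,))
    conn.executemany(
        "INSERT INTO items (order_id, pos, sku, qty) VALUES (?, ?, ?, ?)",
        [(oid, pos, item.get("sku"), int(item.get("qty")))
         for pos, item in enumerate(order.findall("item"))],
    )


def publish(conn: sqlite3.Connection, oid: str) -> str:
    """Rows -> XML. Whitespace, comments, and anything unmapped are gone."""
    order = ET.Element("order", id=oid)
    rows = conn.execute(
        "SELECT sku, qty FROM items WHERE order_id = ? ORDER BY pos", (oid,))
    for sku, qty in rows:
        ET.SubElement(order, "item", sku=sku, qty=str(qty))
    return ET.tostring(order, encoding="unicode")
```

Note what the round trip loses: the whitespace in `ORDER_XML`, and any comments or unmapped attributes it might have carried, are not stored. That fidelity gap is one of the practical differences between an XML-enabled database and a native one.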

  • [July 30, 2002] "Adventures in High-Performance XML Persistence, Part 1. A High-Performance TCL-scripted XSLT Engine." By Cameron Laird (Vice president, Phaseit, Inc.). From IBM developerWorks, XML Zone. July 2002. ['XML storage is too sprawling a topic to offer easy answers. There's no one fastest XML database, nor fastest XML processing language. Still, it's helpful to understand the basic concepts of XML persistence so you can apply them to your specific situation. This article begins a new developerWorks series on high-performance XML by offering an explanation of common industry practices in XML persistence -- that is, storage of data beyond the lifetime of a single process.'] "You're responsible for large, mission-critical XML programs. You have dozens, or maybe thousands, of simultaneous users. Your XML pilot programs have gone well, and you've deployed more and more features. Your systems are in constant use, and response time is starting to stall. You start to wonder, 'What does it take to maximize XML performance?' The answer: You don't want to maximize your XML performance. You need to meet engineering requirements. Perhaps you need to manage scalability, or boost the responsiveness of specific applications. Don't hunt for the fastest XML storage. In that direction lie $700 hammers and the other symptoms of counter-productive obsession. Instead, learn to apply the basic concepts of XML so you can engineer the persistence needed for your own situation... The first principle of designing XML persistence is that any solution must make for a comfortable organizational fit. If your company requires use of Java technology, and a particular XML database has a poor Java binding, don't choose it. No matter how high its performance on standard benchmarks, it's likely that your co-workers will not make good use of it. Working with an unfamiliar technology will annoy them, and they're unlikely to achieve favorable results. 
On the other hand, suppose you work in an environment that provides a great deal of support for a database such as DB2. However well or poorly your XML content fits the DB2 persistence model, you should seriously consider DB2 storage. Sufficiently enthusiastic, well-equipped, and motivated expertise is likely to overcome modest mismatches on the technical level, as this article will show you. The principal categories of XML persistence center on these technologies: (1) Native file system, (2) Relational database management systems (RDBMS), (3) Special-purpose XML database managers, (4) Other data managers. The easiest XML storage is native: Keep XML document instances as named files in a file system. This is the most transparent and flexible persistence method, and should be your default starting point for new designs. ... No one XML persistence method is right for all scales of problem. Start with familiar technologies for your needs to store XML data. Make a clear distinction between policy requirements for transacting or storing data formatted as XML, and application-specific design requirements for data security and performance. Choose persistence methods compatible with the technologies your organization uses..."
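
Laird's default starting point, XML document instances kept as named files, needs very little code. The Python class below is a minimal sketch under assumptions of our own: one directory per store, a `.xml` suffix, a well-formedness check before writing, and a write-then-rename step so that, on typical POSIX file systems, readers do not see a half-written file. The class and method names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


class FileStore:
    """Each XML document is a named file <name>.xml under one directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        # Only allow simple names so a document name cannot escape the directory.
        if not name.replace("-", "").replace("_", "").isalnum():
            raise ValueError(f"bad document name: {name!r}")
        return self.root / f"{name}.xml"

    def put(self, name: str, xml_text: str) -> None:
        ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
        target = self._path(name)
        tmp = target.with_suffix(".tmp")
        tmp.write_text(xml_text, encoding="utf-8")
        tmp.replace(target)  # swap in the new version with a single rename

    def get(self, name: str) -> str:
        return self._path(name).read_text(encoding="utf-8")

    def names(self) -> list[str]:
        return sorted(p.stem for p in self.root.glob("*.xml"))
```

The transparency Laird mentions comes for free here: every stored document is an ordinary file that any editor, diff tool, or backup script can read.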

  • [June 18, 2002] "XML Stores Get Richer Queries." By Matt Hicks. In eWEEK (June 17, 2002). "Native XML database developers X-Hive Corp., Excelon Corp., Ipedo Inc. and Software AG are adding more support in upcoming releases for emerging standards for such functions as querying. Much of the focus for the developers with their latest crop of XML databases is on bolstering querying capabilities through the XQuery XML data retrieval standard. X-Hive, for example, last week began shipping Version 3.0 of its X-Hive/DB, which supports XQuery, said officials in Rotterdam, Netherlands. Separately, Excelon, in a point release to its Extensible Information Server, due next month, will add full XQuery support. Also in releases planned over the next year and a half, the Burlington, Mass., company aims to support the XForms standard for handling XML forms, officials said. Ipedo, of Redwood City, Calif., is beefing up its current XQuery support. Version 3.1 of its namesake XML database, due next week, will be able to perform updates in the querying language. That release will also include support for the WebDAV, or Web-based Distributed Authoring and Versioning, protocol so documents from popular client applications can be published into the database server, officials said. For its part, Software AG plans to add full XQuery support in the next major release of its Tamino XML Server, Version 4.11, due by the end of the year. That release will also include validation of XML Schema; enterprise-level backup and restore of the database; and improved tools for Web services features such as Universal Description, Discovery and Integration directories, said company officials, in Darmstadt, Germany. All these companies are looking to extend their technological lead over Oracle Corp., IBM and Microsoft Corp., which offer XML add-ons and have plans to embed XML support deeper within their database engines..."

  • [May 06, 2002] "XML in Java: Data Binding with Castor. A Look at XML Data Binding for Java Using the Open Source Castor Project." By Dennis M. Sosnoski (President, Sosnoski Software Solutions, Inc.). From IBM developerWorks, XML Zone. April 2002. ['XML data binding for Java is a powerful alternative to XML document models for applications concerned mainly with the data content of documents. In this article, enterprise Java expert Dennis Sosnoski introduces data binding and discusses what makes it so appealing. He then shows readers how to handle increasingly complex documents using the open source Castor framework for Java data binding. If your application cares more about XML as data than as documents, you'll want to find out about this easy and efficient way of handling XML in Java.'] "Most approaches to working with XML documents in applications put the emphasis on XML: You work with documents from an XML point of view and program in terms of XML elements, attributes, and character data content. This approach is great if your application is mainly concerned with the XML structure of documents. For many applications that care more about the data contained in documents than the documents themselves, data binding offers a much simpler approach to working with XML... The document models discussed in previous articles of this series are the closest alternatives to data binding. Both document models and data binding build document representations in memory, with two-way conversions between the internal representation and standard text XML. The difference between the two is that document models preserve the XML structure as closely as possible, while data binding is concerned only with the document data as used by your application... Data binding is a great alternative to document models in applications that use XML for data exchange. It simplifies your programming because you no longer need to think in terms of XML. 
Instead, you can work directly with objects that represent the meaning of the data as used by your application. It also offers the potential for better memory and processor performance than document models... Data binding can provide other benefits beyond just programming simplicity. Since it abstracts many of the document details, data binding usually needs less memory than a document model approach. Consider, for instance, the two data structures shown in the earlier figures: The document model approach uses 10 separate objects, as compared to two for data binding. With a lot less to build, it may also be faster to construct the data binding representation for a document. Finally, access to the data within your program can be much faster with the data binding approach than with a document model, since you control how the data is represented and stored. I'll get back to these points later. If data binding is such great stuff, when would you want to use a document model instead? The two cases that require a document model are: (1) Your application is really concerned with the details of the document structure. If you're writing an XML document editor, for instance, you'll want to stick to a document model rather than using data binding. (2) The documents you're processing don't follow fixed structures. For example, data binding wouldn't be a good approach for implementing a general XML document database..."
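
The contrast between a document model and data binding shows up even in a small example. Castor itself is a Java framework; the Python sketch below writes the binding by hand with a dataclass and ElementTree, so it illustrates the idea rather than Castor's API. The `Customer` class and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    city: str
    credit_limit: int


def unmarshal(xml_text: str) -> Customer:
    """XML text -> plain object; element order, comments, etc. are discarded."""
    e = ET.fromstring(xml_text)
    return Customer(
        name=e.findtext("name", default=""),
        city=e.findtext("city", default=""),
        credit_limit=int(e.findtext("credit-limit", default="0")),
    )


def marshal(c: Customer) -> str:
    """Plain object -> XML text in a fixed element order."""
    e = ET.Element("customer")
    ET.SubElement(e, "name").text = c.name
    ET.SubElement(e, "city").text = c.city
    ET.SubElement(e, "credit-limit").text = str(c.credit_limit)
    return ET.tostring(e, encoding="unicode")
```

Application code then works with `Customer` fields rather than elements, and `credit_limit` is already an integer. The trade-off Sosnoski describes is visible too: `unmarshal` keeps only the data it knows about, so comments, unknown elements, and the original element order do not survive a round trip through `marshal`.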

  • [April 24, 2002] "Database Future Debated." By Paul Krill. In InfoWorld (April 24, 2002). "Whether the future of databases is the traditional, relational and SQL model with XML technologies incorporated into it or a new XML-based model is a matter of debate, according to panelists during a session Tuesday [2002-04-23] at the Software Development Conference & Expo. The fate of XML and SQL dominated the discussion, which featured officials from companies such as Oracle, Sun Microsystems, and IBM. 'I think that XML will become the dominant format for data interchange, with its flexibility and ability to provide self-description,' said Don Chamberlin, a database technology researcher at IBM. Relational databases, he said, will be fitted with front ends to support XML and process queries based on the XQuery standard... Sun's Rick Cattell, a distinguished engineer at the company, had a less dominant outlook for XML, saying very few people are going to store XQuery data in an XML format. 'I think the momentum behind relational databases is insurmountable,' Cattell said, adding that he was drawing on his experience with object-oriented databases, which were unable to unseat relational databases in enterprise IT shops. Developers, Cattell said, will need tools to convert relational data to XML and vice versa. Another panelist, Daniela Florescu, chief technology officer at XQrl, said she was 'pretty optimistic [about] the performance of XML databases.' Documents will be stored natively in XML, she said. XQrl offers a version of the XQuery XML query language. Currently, performance on the Web is hindered because of translations between Java and XML data formats, Florescu said. 'I don't think we will have good performance as long as we have people marshalling data from XML to Java and back,' Florescu said... Panelists also touched on topics such as tuple space technology, which is intended to make it easier to store and fetch data by recognizing patterns. 
Tuple space technology is 'interesting, but I wouldn't predict that it's going to take over the world,' since much more research needs to be done and most people are not building production applications based on it, Cattell said. Cattell also said in-memory database technology is a 'no-brainer,' but there is not enough memory available yet to accommodate it... Panelist Jim Melton, consulting member of the technical staff at Oracle, said he is part of a vendor group called SQLX that has been working for a year to define ways to bring SQL and XML closer together. The group in mid-2003 plans to publish a specification called SQL/XML, which will contain publishing functions for the two formats..."

  • [April 11, 2002] "Database Strategies for Unstructured Content." By Stuart J. Johnston. In XML Magazine Volume 3, Number 3 (April/May 2002), pages 18-27. ['Relational and native XML database developers take diverse approaches to managing free-form information data.'] "... Consider using a native XML database as a consolidation point for data that has to be exchanged in an industry-standard format. With its ability to represent and query data from many sources as pure XML, a native XML database can enable levels of data correlation, aggregation, and information mining that would be difficult or impossible to achieve without a central place to standardize data formats and protocols. Although the field is still incipient, native XML databases might help enterprises respond to changes in the business environment more quickly and cheaply than custom approaches... a survey conducted in March 2001 at the Data Administration Management International Symposium by Intellor Group and Wilshire Conferences found that 12 percent of companies had already implemented a native XML database or planned to within the following 12 months. Indeed, advocates say that the use of native XML databases as middle-tier servers between conventional relational databases such as IBM's DB2 and XML-based Web services is catching on. Rather than using translators, the pure XML databases can speed processing time for electronic transactions while off-loading demand from the large-scale, enterprise database. Most native XML databases can communicate with relational systems relatively easily using either ODBC or JDBC drivers, through Extensible Stylesheet Language Transformations (XSLT), or in some cases using XPath. Rather than make the relational database translate SQL data into XML 'on the fly,' argue native XML aficionados, why not off-load most of that work to a native XML database? 
This approach has several benefits, including consolidation of all XML data in a single repository designed specifically to handle XML information and documents. However, the market is still formative, and many of the products themselves are not yet mature. Many of the players so far have come out with only version 1.0 releases, although a few have 2.0 and 3.0 releases now. But that doesn't mean that even the 1.0 releases aren't useful today, or that now wouldn't be a good time to get up to speed, try some pilot projects, and gain a measure of understanding as to what's good and bad about them. Key to the functionality of XML databases is support for several XML standards or proposed standards, although not every product will support all of them. The proposed standards include XSLT, Document Type Definitions (DTDs), and XML Schemas, as well as XPath (an XML language for addressing parts of XML documents), and XQuery. XQuery is an XML-based query language, an emerging World Wide Web Consortium (W3C) proposed standard for querying data in XML, which includes XPath 2.0 as a subset..."

  • [March 15, 2002] "XPERANTO: Bridging Relational Technology and XML." From International Business Machines Corporation, DB2 Developer Domain. By Catalina Fan, John Funderburk, Hou-in Lam, Jerry Kiernan, and Eugene Shekita (IBM Almaden Research Center, San Jose, CA 95120) and Jayvel Shanmugasundaram (Cornell University). [March 2002.] 9 pages. ['The cutting edge of data management research! The XPERANTO research project enables XML-based applications to leverage relational database technology by using XML views of existing relational data.'] "XML has emerged as the standard data-exchange format for Internet-based business applications. These applications introduce a new set of data management requirements involving XML. However, for the foreseeable future, a significant amount of business data will continue to be stored in relational database systems. Thus, a bridge is needed to satisfy the requirements of these new XML-based applications while still leveraging relational database technology. This paper describes the design and implementation of the XPERANTO middleware system, which we believe achieves this goal. In particular, XPERANTO provides a general framework to create and query XML views of existing relational data. One of the features provided by XPERANTO is the ability to create XML views of existing relational data. XPERANTO does this by automatically mapping the data of the underlying relational database system to a low-level default XML view. Users can then create application-specific XML views on top of the default XML view. These application-specific views are created using XQuery, a general-purpose, declarative XML query language currently being standardized by W3C. XPERANTO materializes XML views on demand, and does so efficiently by pushing down most computation to the underlying relational database engine. Another feature provided by XPERANTO is the ability to query XML views of relational data. 
This is important because users often desire only a subset of a view's data. Moreover, users often need to synthesize and extract data from multiple views. In XPERANTO, queries are specified using the same language used to specify XML views, namely XQuery. XPERANTO executes queries efficiently by performing XML view composition so that only the desired relational data items are materialized. In summary, XPERANTO provides a general means to publish and query XML views of existing relational data. Users always use the same declarative XML query language (XQuery) regardless of whether they are creating XML views of relational data or querying those views. ... XPERANTO exposes relational data as an XML view. Users can then query these XML views using a general-purpose, declarative XML query language (XQuery), and they can use the same query language to create other XML views. Thus, users of the system always work with a single query language. In addition to providing users with a powerful system that is simple to use, the declarative nature of user queries allows XPERANTO to perform optimizations such as view composition and pushing computation down to the underlying relational database system." See also "IBM Federated Database Technology," by Laura Haas and Eileen Lin.
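
XPERANTO's first step, mapping relational data to a low-level default XML view, together with its habit of pushing work down to the relational engine, can be imitated at toy scale. The Python sketch below does not implement XQuery or view composition. Its layout (one element named after the table, a `row` element per tuple, one child per non-NULL column) is an assumption about what a default view might look like, not the format the paper specifies, and column names are assumed to be valid XML names.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Optional


def default_view(conn: sqlite3.Connection, table: str,
                 filters: Optional[dict] = None) -> ET.Element:
    """Expose a table as <table><row><col>value</col>...</row>...</table>.
    Filters become a parameterized WHERE clause, so SQLite, not Python,
    discards non-matching rows (a toy form of pushing computation down)."""
    filters = filters or {}
    tables = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError(f"unknown table {table!r}")
    columns = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
    unknown = set(filters) - set(columns)
    if unknown:
        raise ValueError(f"unknown columns {sorted(unknown)}")
    sql = f'SELECT * FROM "{table}"'
    if filters:
        sql += " WHERE " + " AND ".join(f'"{c}" = ?' for c in filters)
    view = ET.Element(table)
    for values in conn.execute(sql, tuple(filters.values())):
        row = ET.SubElement(view, "row")
        for col, val in zip(columns, values):
            if val is not None:  # SQL NULL becomes an absent element
                ET.SubElement(row, col).text = str(val)
    return view
```

Because the filter is turned into a `WHERE` clause, SQLite discards non-matching rows before any XML is built, a small-scale version of materializing only the desired relational data items.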

  • [February 11, 2002] "Combining UML, XML and Relational Database Technologies. The Best of All Worlds For Robust Linguistic Databases." By Larry S. Hayashi and John Hatton (SIL International). Pages 115-124 in Proceedings of the IRCS Workshop on Linguistic Databases (11-13 December 2001, University of Pennsylvania, Philadelphia, USA. Organized by Steven Bird, Peter Buneman and Mark Liberman. Funded by the National Science Foundation). "This paper describes aspects of the data modeling, data storage, and retrieval techniques we are using as we develop the FieldWorks suite of applications for linguistic and anthropological research. Object-oriented analysis is used to create the data models. The models, their classes and attributes are captured using the Unified Modeling Language (UML). The modeling tool that we are using stores this information in an XML document that adheres to a developing standard known as the XML Metadata Interchange format (XMI). Adherence to the standard allows other groups to easily use our modeling work and because the format is XML, we can derive a number of other useful documents using standard XSL transformations. These documents include (1) a DTD for validating data for import, (2) HTML documentation of diagrams and classes, and (3) a database schema. The latter is used to generate SQL statements to create a relational database. From the database schema we can also generate an SQL-to-XML mapping schema. When used with SQL Server 2000 (or MSDE), the database can be queried using XPath rather than SQL and data can be output and input using XML. Thus the Fieldworks development process benefits from both the maturity of its relational database engine and the productivity of XML technologies. With this XML in/out capability, the developer does not need to translate between object-oriented data and relational representation. The result will be, hopefully, reduced development time. 
A further implication is the potential for increased interoperability between tools of different developers. Mapping schemas could be created that allow FieldWorks to easily produce and transfer data according to standard DTDs (for example, for lexicons or standard interlinear text). Data could then be shared among different tools -- in much the same way that XMI allows UML data to be used in different modeling tools..."
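The step from class model to relational schema can be sketched as follows. The paper derives the schema from XMI with XSL transformations; because the Python standard library has no XSLT processor, this sketch instead walks a drastically simplified, hypothetical class-model format (not real XMI) with ElementTree and emits SQL `CREATE TABLE` statements, treating a `ref`-typed attribute as a foreign key. The element names, the `ref` convention, the sample classes, and the type mapping are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified class model (NOT real XMI syntax).
MODEL = """
<model>
  <class name="LexEntry">
    <attribute name="form" type="string"/>
    <attribute name="homograph" type="integer"/>
  </class>
  <class name="Sense">
    <attribute name="gloss" type="string"/>
    <attribute name="entry" type="ref" target="LexEntry"/>
  </class>
</model>
"""

# Assumed mapping from model types to SQL column types.
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER"}


def class_to_ddl(cls):
    """Turn one <class> into a CREATE TABLE statement with a surrogate key."""
    cols = ["id INTEGER PRIMARY KEY"]
    fks = []
    for attr in cls.findall("attribute"):
        name, kind = attr.get("name"), attr.get("type")
        if kind == "ref":
            # An association to another class becomes a foreign-key column.
            cols.append(f"{name}_id INTEGER")
            fks.append(f"FOREIGN KEY ({name}_id) REFERENCES {attr.get('target')}(id)")
        else:
            cols.append(f"{name} {SQL_TYPES[kind]}")
    body = ",\n  ".join(cols + fks)
    return f"CREATE TABLE {cls.get('name')} (\n  {body}\n);"


def model_to_ddl(xml_text):
    """Generate one DDL statement per class, in document order."""
    root = ET.fromstring(xml_text)
    return [class_to_ddl(c) for c in root.findall("class")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for stmt in model_to_ddl(MODEL):
        print(stmt)
        conn.execute(stmt)  # the generated DDL is accepted by SQLite
```

Keeping the model as the single source and generating the schema from it is what lets the DTD, documentation, and database stay in step when a class changes.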

  • [January 11, 2002] "An Introduction to the XML:DB API." By Kimbro Staken. From XML.com. January 09, 2002. ['The growing number of native XML databases all have different programming interfaces. The XML:DB API is an open source project to provide a unified API for native XML databases.'] "In my last article, 'Introduction to dbXML', I provided an example that used the XML:DB API to access the dbXML server. This time around we'll take a more detailed look at the XML:DB API in order to get a better feel for what the API is about and how it can help you build applications for native XML databases (NXD). Currently, there are about 20 different native XML databases on the market. Among them are commercial products such as Tamino, X-Hive and Excelon. And open source NXDs include dbXML (now renamed Apache Xindice), eXist, and Ozone/XML. While this selection is a nice thing to see in an emerging market, it makes developing applications quite a bit more difficult. Each NXD defines its own API, which prevents the development of software that will work with more than one NXD without coding for each specific server. If you've worked with relational databases, then you've likely worked with ODBC or JDBC to abstract away from proprietary relational database APIs. The goal of the XML:DB API is to bring similar functionality to native XML databases. The XML:DB API project was started a little over a year ago by the XML:DB Initiative and is currently still evolving. Most of the core framework is stable, and it has already been implemented by dbXML/Xindice and eXist. There's also a reference implementation in Java available, and there are several other implementations in progress, including some for commercial databases... There is much more to the XML:DB API than what's illustrated in this simple example and short article. But I have given you a better idea of what the API is and how it is used. 
If you want to find out more, you should take a look at the XML:DB API site and the dbXML developers guide. The eXist documentation also contains some information about developing with the API. While there is still a lot of work to do on the XML:DB API, what is available today is already usable and provides a solid framework to build on. In fact, projects like Apache Xindice are using the XML:DB API as the primary Java API for accessing the server. Participating in API development is open to anyone who's interested; feel free to join the project mailing list and contribute to the development of the XML:DB API."
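The idea of a vendor-neutral API in the ODBC/JDBC mould can be sketched in Python. The class and method names below (`DriverManager`, `Collection`, `store`, `query`) and the `xmldb:<driver>:<collection>` URI layout are hypothetical, only loosely inspired by the XML:DB API's Java interfaces, and should not be read as that API. An in-memory backend stands in for a real native XML database, and ElementTree's limited XPath subset stands in for a full XPath engine.

```python
import copy
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class Collection(ABC):
    """Vendor-neutral handle on a collection of XML documents (hypothetical)."""

    @abstractmethod
    def store(self, doc_id: str, xml_text: str) -> None: ...

    @abstractmethod
    def query(self, path: str) -> list[str]: ...


class Driver(ABC):
    """One implementation per native XML database product (hypothetical)."""

    @abstractmethod
    def get_collection(self, name: str) -> Collection: ...


class DriverManager:
    """Maps the driver name in a URI to a registered driver, JDBC-style."""

    _drivers: dict[str, Driver] = {}

    @classmethod
    def register(cls, name: str, driver: Driver) -> None:
        cls._drivers[name] = driver

    @classmethod
    def get_collection(cls, uri: str) -> Collection:
        # Assumed URI layout: "xmldb:<driver>:<collection>", e.g. "xmldb:memory:/addressbook".
        scheme, driver, collection = uri.split(":", 2)
        if scheme != "xmldb":
            raise ValueError(f"not an xmldb URI: {uri}")
        return cls._drivers[driver].get_collection(collection)


class MemoryCollection(Collection):
    """Toy backend: parsed documents kept in a dict."""

    def __init__(self) -> None:
        self._docs: dict[str, ET.Element] = {}

    def store(self, doc_id: str, xml_text: str) -> None:
        self._docs[doc_id] = ET.fromstring(xml_text)

    def query(self, path: str) -> list[str]:
        # ElementTree understands only a subset of XPath 1.0, evaluated relative to each root.
        hits = []
        for root in self._docs.values():
            for elem in root.iterfind(path):
                piece = copy.copy(elem)
                piece.tail = None  # keep trailing whitespace out of the serialization
                hits.append(ET.tostring(piece, encoding="unicode"))
        return hits


class MemoryDriver(Driver):
    def __init__(self) -> None:
        self._collections: dict[str, MemoryCollection] = {}

    def get_collection(self, name: str) -> Collection:
        return self._collections.setdefault(name, MemoryCollection())
```

Application code only ever calls `DriverManager.get_collection(...)` and the `Collection` methods; moving to another product would mean registering a different `Driver`, which is the kind of portability the article describes.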

  • [January 11, 2002] "Working out the Bugs in XML Databases." By John Cox. In Network World Volume 19, Number 1 (January 07, 2002), page 24. The article summarizes the pros and cons of special XML repositories. ['As network executives begin to experiment with Web services, they're likely to find that they need a new kind of data store: the XML database. There's a growing belief that XML-based information needs its own database.'] "XML database software products are designed to efficiently store and manage the growing numbers of XML documents that users are creating, especially in Web interactions with business partners and customers. Advocates cite several advantages of XML databases compared with traditional databases: simplicity, ease of application development, ability to search and query XML documents, and fast document retrieval. There's no formal, standard definition of an XML database, although the XML:DB Initiative describes such a database as one that defines a logical model for an XML document (not for the data in the document), and manages documents based on that model. The key point is the database 'thinks and acts' based on XML - XML goes in, and XML comes out, even though these products can physically store the documents in an object or relational database or a proprietary storage model, such as indexed files. The lack of formal definition is just one issue that raises the hackles of critics. They also point to the immaturity of the products and of XML standards; the absence of a standard, reliable query language to match the SQL used in relational databases; and possible data integrity problems... Analysts expect these benefits to fuel a fast-growing market. IDC estimates enterprise spending for XML databases will grow by 130% annually, reaching $700 million in 2004. 
XML databases will complement relational databases, according to IDC analyst Anthony Picardi - the former being better suited for storing and processing XML documents, the latter for numbers and text. There are plenty of choices for network executives to evaluate, with at least two dozen native XML database products (see XML Database Products). The key vendors include Software AG and eXcelon - which stores documents in its ObjectStore object-oriented database. There are a host of smaller vendors, such as NeoCore, IXIA and XYZFind, working on XML database products. There are also a number of open source projects. One is Xindice, formerly dbXML Core, which now is being handled by The Apache Software Foundation..."
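The point that an XML database "thinks and acts" in XML while physically storing documents in something else can be illustrated with a small Python sketch: documents go in and come out as XML text, while underneath they sit in SQLite tables, together with a path/value index used for lookups. The table layout and the `XmlDocStore` class are assumptions for illustration, not how any of the products named above work.

```python
import sqlite3
import xml.etree.ElementTree as ET


class XmlDocStore:
    """XML in, XML out; physically a relational (SQLite) store. Illustrative only."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE doc (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
        # Path index: one row per text-bearing element, e.g. ('o1', '/order/item', 'bolt').
        self.db.execute("CREATE TABLE path_index (doc_id TEXT, path TEXT, value TEXT)")
        self.db.execute("CREATE INDEX ix_path ON path_index (path, value)")

    def put(self, doc_id, xml_text):
        root = ET.fromstring(xml_text)  # rejects documents that are not well-formed
        with self.db:  # document and index rows are committed together
            self.db.execute("DELETE FROM path_index WHERE doc_id = ?", (doc_id,))
            self.db.execute("INSERT OR REPLACE INTO doc VALUES (?, ?)", (doc_id, xml_text))
            self.db.executemany(
                "INSERT INTO path_index VALUES (?, ?, ?)",
                [(doc_id, p, v) for p, v in self._paths(root, "")],
            )

    def get(self, doc_id):
        """Return the stored XML text unchanged, or None if absent."""
        row = self.db.execute("SELECT body FROM doc WHERE id = ?", (doc_id,)).fetchone()
        return row[0] if row else None

    def find(self, path, value):
        """Ids of documents with an element at `path` whose text equals `value`."""
        rows = self.db.execute(
            "SELECT DISTINCT doc_id FROM path_index "
            "WHERE path = ? AND value = ? ORDER BY doc_id",
            (path, value),
        )
        return [r[0] for r in rows]

    @staticmethod
    def _paths(elem, prefix):
        """Yield (absolute path, stripped text) for every element that has text."""
        path = f"{prefix}/{elem.tag}"
        if elem.text and elem.text.strip():
            yield path, elem.text.strip()
        for child in elem:
            yield from XmlDocStore._paths(child, path)
```

Storing the original text verbatim is what keeps retrieval exact for document-centric content; the index is derived data that can always be rebuilt from it.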

  • [January 10, 2002] "On Database Theory and XML." By Dan Suciu (University of Washington). In SIGMOD Record Volume 30, Number 3 (2001). 7 pages (with 64 references). "Over the years, the connection between database theory and database practice has weakened. We argue here that the new challenges posed by XML and its applications are strengthening this connection today. We illustrate three examples of theoretical problems arising from XML applications... [We describe] three XML research problems, inspired by our own work. XML's semistructured data model represents a paradigm shift for theoretical database research. It is not the first one: for example the object-oriented data model can also be considered a paradigm shift, which generated a vast amount of theoretical and applied research. This time, however, the shift comes from outside the community (XML was imposed on us) and this, at least, settles easily the question of applicability. It offers us a chance both to apply research on old topics (query containment) and to conduct research on new topics (typechecking)... Today the most promising approach to typechecking remains that based on type inference. The XDuce language defines a type inference system for a functional language with recursion; the XQuery algebra defines a type inference system using XML Schema as its type system. Since we know that this approach cannot be as robust as typechecking in general-purpose programming languages, a study of its applicability and limitations is needed. XML Storage: XML data is a labeled tree; a relation is a table. The problem of storing XML data in one or several tables is a challenging one, both for theoreticians and practitioners. Since the tree is meant to describe some irregular structure while tables are by definition regular, we are attempting to store some irregular data into a regular data type. 
In addition to the pure combinatorial aspect, there is a logical aspect to the storage problem: given a storage mapping, one needs to be able to translate queries formulated over the XML data into relational queries formulated over the relational storage. The combination of combinatorics and logic makes the problem particularly appealing. Several approaches have been tried so far. The simplest is to store XML as a graph, in a ternary relation (two columns for the edges, the third for the labels and/or data values). This approach is explored by Florescu and Kossmann. The price one pays for its simplicity is that many self-joins of the edge table are required in order to reconstruct a given XML element: one join for each subelement. Shanmugasundaram et al. ["Relational databases for querying XML documents: limitations and opportunities"] use the DTD (or XML-Schema) to derive a relational schema. One table is created for each element type that can occur in a collection position. This technique works well in practice whenever one has a schema for the XML document. A subtle problem is that the resulting storage is very sensitive to that schema. For example if the content of <person> changes from (name, phone) to (name, phone*) then we need to move all phone numbers to a separate table, although perhaps the XML document has changed very little. The case when the XML document has no schema, or when the schema changes frequently is harder, and has a more dramatic impact on performance... The challenge in any storage schema is that it has to be flexible enough to accommodate any XML data, yet it has to be as efficient as regular data storage when the XML data happens to be regular. Finding the largest regular subset in an irregular data instance is a problem which can be formulated and addressed theoretically..."
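The edge-table approach described above can be sketched with `sqlite3`. In this hypothetical encoding, a single table `edge(src, dst, label)` holds element edges as (parent, child, tag) and text values as (element, NULL, text), so the third column carries either a label or a data value. Rebuilding (name, phone) pairs for each `<person>` then takes one self-join per step down the tree. This is a simplified illustration, not the exact schema used in the cited work.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count


def shred(conn, xml_text):
    """Store an XML tree in a single ternary relation edge(src, dst, label).

    Element edges: (parent_id, child_id, tag); the root hangs off src 0.
    Text values:   (element_id, NULL, text); the third column holds a label or a value.
    """
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER, label TEXT)")
    ids = count(1)  # node ids assigned in document order
    rows = []

    def walk(elem, parent):
        node = next(ids)
        rows.append((parent, node, elem.tag))
        if elem.text and elem.text.strip():
            rows.append((node, None, elem.text.strip()))
        for child in elem:
            walk(child, node)

    walk(ET.fromstring(xml_text), 0)
    conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", rows)


# Rebuild (name, phone) for each <person>: four self-joins of the edge table.
PERSON_QUERY = """
SELECT nv.label AS name, pv.label AS phone
FROM edge p
JOIN edge n  ON n.src  = p.dst  AND n.label  = 'name'  AND n.dst IS NOT NULL
JOIN edge nv ON nv.src = n.dst  AND nv.dst IS NULL
JOIN edge ph ON ph.src = p.dst  AND ph.label = 'phone' AND ph.dst IS NOT NULL
JOIN edge pv ON pv.src = ph.dst AND pv.dst IS NULL
WHERE p.label = 'person' AND p.dst IS NOT NULL
ORDER BY p.dst, ph.dst
"""

DOC = (
    "<people>"
    "<person><name>Ada</name><phone>111</phone></person>"
    "<person><name>Alan</name><phone>222</phone><phone>333</phone></person>"
    "</people>"
)

conn = sqlite3.connect(":memory:")
shred(conn, DOC)
for name, phone in conn.execute(PERSON_QUERY):
    print(name, phone)
```

The sample data gives the second person two phones, i.e. content (name, phone*); the edge table absorbs that without any schema change, which is the flexibility a schema-derived mapping lacks, paid for with the extra joins.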

  • [December 20, 2001] "E-business Middleman. Native XML Databases." By Maggie Biggs. In InfoWorld Issue 51 (December 17, 2001), pages 37-38. ['Native XML databases tap heterogeneous back-end databases to feed Web-based applications and trading partners. A native XML database makes good economic sense for enterprises that must support XML document handling and interaction with multiple back-end data sources. In addition, native XML databases can simplify the management of enterprise data processing performance... An emerging technology, native XML databases are currently best suited to early adopters willing to experiment. When existing shortcomings -- such as query and update handling -- are resolved, these databases promise to make XML handling much more manageable for most IT shops.'] "Without a doubt, XML is fast becoming the lingua franca of b-to-b data exchange. As the use of XML increases, executives and IT managers must begin factoring in the growing number and differing types of XML solutions now coming to market before they can determine the most cost-effective XML strategy to implement. Recently major relational database vendors, such as Oracle and Microsoft, have introduced XML-enabling technologies in their products: Oracle's XDB and Microsoft's SQLXML. Rival IBM has offered an XML Extender for its DB2 database for some time. Another promising, more manageable approach to XML in the enterprise is the emerging NXDB (native XML database). An NXDB does not replace your existing enterprise data sources. Rather it acts as an intermediate cache that sits between back-end data sources and middle-tier application components. Using an NXDB provides two principal benefits. First, it's likely your enterprise has multiple back-end data sources and various types of middle-tier applications. 
Rather than liberally sprinkling XML capabilities across the middle tier and back end, which may significantly increase technology expenditures, you could add the XML support you need by implementing an NXDB. An NXDB supplies the programmatic interfaces and data access methods necessary to support multiple applications and data sources. Second, you might use an NXDB to augment the processing power of your primary enterprise databases. Rather than devote primary database processing cycles to XML translation, storage, and retrieval during peak hours, moving these operations to an NXDB can free primary databases for more important tasks, such as transaction processing. Interaction between the NXDB and your back-end data sources can then be performed at times of the day or night that allow you to optimize processing performance and reduce the load on back-end databases that must also serve other applications and end-users. Many of the XML handling capabilities recently added to RDBMSes provide functionality