Microsoft's Vision for XML


SGML/XML Europe '98, Paris
From the Opening Keynote Address
Adam Bosworth, General Manager, Microsoft Corporation

See also the presentation slides.


Summary

The premise of this memo is that with the advent of the web, it is now possible to design a truly simple, flexible and open architecture that allows all applications on all machines on all platforms to interact. A second key premise is that this will only be possible under the aegis of logical views where, conceptually, the applications interact through agreed-upon logical conventions rather than directly upon either the database tables or the methods of the other. The argument is that XML is the key building block for such an architecture and that several key XML grammars and conventions will be required for this revolution to be realized. The rest of this paper spells out the building blocks Microsoft believes are required.

The Vision

All applications on the web are easy to make open. Goods and services are easy to find. Any customer can:

In short, make it easy to discover and interact with structured data and applications on the web.

Arguments

Whenever one thinks about a new architecture, one should consider what works and what does not. If the architecture is a sweeping one, such as the one required to allow applications to interact across the web, then reasonable places to look for lessons are the success of the web itself and our experience with servers used in applications.

So, what have we learned from the Web?

First and foremost, we have learned that it isn't enough for something to be possible. It must be easy, open and flexible. The Internet predates HTML, of course, but until the advent of HTTP and HTML, it didn't really explode. Why? The answer, succinctly, is empowerment. Once HTML and HTTP arrived, more people could play more easily. The solutions were not necessarily optimal from the point of view of performance or even robustness. They were optimal from the point of view of ease of getting started. In short, they were drop-dead simple. Many people point out the deficiencies of HTML, especially the sloppiness of its grammar.

This is true and even regrettable, but the fact is that the simplicity of the HTML model, in which tags were used not abstractly (despite what every book preached) but concretely to describe the intended look and feel, made HTML immediately approachable. The fact that HTML simply ignored unknown tags made it easy as well, since mistakes were silently ignored. Now, as we all know, the lack of formality led to a mess in which few can implement an HTML engine, since such an engine is expected to maintain perfect visual and scripting fidelity with what is, essentially, an unwritten, arbitrary, and complex standard. We should learn from this and keep the simplicity without the mess. Nevertheless, the key point is that HTML exploded because it crossed some threshold of cognitive simplicity. The lesson is that simplicity and flexibility beat optimization and power in a world where connectivity is key.

There is a second lesson which is key. Applications need to be constructed out of coarse-grained components that can be dynamically loaded, rather than out of single large monolithic blocks. In the HTML world, these components are pages. In the applications world in general, however, the lesson applies just as well. The reason for this is simple. The application starts more quickly, only consumes the resources it really needs, and, most importantly, can be dynamically loaded off the net. Why is this so important? It is important because of deployment. Applications that can be dynamically loaded from a central place don't require some massive, complex and difficult installation process on clients' machines. Note that Java per se doesn't give one this. It is easy, as anyone who has built a large and complex Java application can testify, to build one that requires literally hundreds of classes to run. That is monolithic. HTML had the serendipitous effect of forcing application designs to partition the application. To repeat, the lesson is that applications should be loaded in coarse-grained chunks.

What have we learned from servers?

We use a simple analogy here: the corner grocery store. Imagine that there is such a neighborhood store and everyone buys his or her weekly groceries there. Now imagine that all of a sudden, everyone's buying patterns changed. Instead of buying a few days' supplies at a time, they bought a single item at a time. In short, a customer would come in, buy a quart of milk and leave. Then return, buy a stick of butter and leave. Then return, buy a bag of apples and leave. And so on. In short order, there would be huge lines of people trying to buy their groceries at the store. If, each time, the customers were paying with electronic cards or checks (we know, only in backward USA), it would be even worse. Conversely, suppose that everyone panicked about coming inflation and came in and tried to buy a year's supply of groceries at once to bring home and put into 17 freezers. An entirely new and different set of problems would surface. Happily, customers don't do this.

It turns out that servers are like grocery stores. They can only handle so many customers at once (checkout counters). More customers mean longer queues, whether at a grocery store or on a server. They cannot scale beyond a certain point: just as you cannot build checkout counters on the fly, servers cannot simply run more and more processes concurrently. The processes start competing for the CPU and overall performance actually degrades.

Starting or finishing a conversation with each customer (client) has very real costs, and if the transactions are too fine-grained, these costs can dwarf all the others. Servers aren't designed to serve up really huge amounts of data (a year's supply of groceries) either. They bottleneck, TCP/IP complains, and so on. We also learned that processes shouldn't hang onto state while waiting on clients or other slow processes. As an example, if the clerk doesn't know the price of an item, it would be unfortunate if the clerk simply stopped serving the queue and went off to find the price. The whole queue would block. Instead, the clerk should get someone else to find out the price and even start helping the next customer if the delay will be significant. In server-speak this means that the process shouldn't hang onto state. What we learned from servers was to conduct coarse-grained, interruptible conversations with them.

This is very important. It means that when talking to a medical system or your bank's payments system or a purchase order system, the client should exchange information in coarse-grained chunks: chunks large enough to let the client go away and do useful work for a spell. Indeed, often the process should dump all its state up to the client and grab it on the way back, just to allow the process to quickly take the next task off the queue. In short, it means that good application design for talking to applications on servers involves relatively few methods that can accept and return complex, rich sets of information such as the complete patient record, the complete portfolio description, or the complete purchase order. This is not how we normally design methods, where we explicitly design for encapsulation and thus make every access to every property or element a separate call.
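
To make this concrete, the fragment below is a purely illustrative sketch of the kind of coarse-grained message we have in mind: one exchange carrying a complete purchase order rather than dozens of fine-grained property accesses. All of the element names and values are invented for this example and do not come from any existing schema.

    <PurchaseOrder id="PO-0421">
      <Customer>
        <Name>Jean Dupont</Name>
        <Account>883-221</Account>
      </Customer>
      <Items>
        <Item sku="BK-1131" quantity="2" unitPrice="14.50"/>
        <Item sku="BK-2217" quantity="1" unitPrice="32.00"/>
      </Items>
      <ShipTo>14 rue des Livres, Paris</ShipTo>
    </PurchaseOrder>

A client that receives the whole order in one round trip can work on it locally and send it back, changed, in a second round trip, which is exactly the interruptible, coarse-grained conversation argued for above.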

There is another key lesson to be learned from servers. The code on the server is designed to talk efficiently and swiftly to shared, transacted resources such as databases, payment authorization systems, or airline reservation systems. It is code that fundamentally assumes that it is connected through distributed transactions to these resources. This in turn implies rapid turn-around. Why? Because while any process is holding a transaction on resources, they are locked, and this hurts other clients' ability even to read such resources. This code cannot move to a place that isn't similarly connected to these resources. It would lock up these servers. If a banking transaction, for example, is debiting an account on one system and crediting an account on another, all inside of a distributed transaction, it must not run over slow or unreliable lines because it will start to lock up the resources. This means that the code that deals with this information on the client isn't the same code. It shares the same information (purchase orders, portfolios, personnel records), but it is code dedicated to some different process, such as letting the user view the information. The lesson is, in short, to move the information and possibly some simple, portable validation logic for the information, but not the custom code that handles this information on the server. We tend to refer to this as an object-to-object bridge, which is, of course, an RPC. So, to repeat, the lesson to be learned is to share data, not code.

To summarize, we can learn the following lessons:

  1. Simplicity and flexibility beat optimization and power in a world where connectivity is key.

  2. Applications should be loaded in coarse-grained chunks.

  3. Conduct coarse-grained, interruptible conversations with servers; don't hang onto state.

  4. Share data, not code.

Microsoft's Opinion

XML is tailor-made as a way to move this coarse-grained information around the web, whether the information is purchase orders, personnel records, portfolios, or just parameters. It is open, easy, and flexible. It is self-describing. If the proposals in XML-DATA are adopted, it supports rich, extensible data types and the additional meta-data a tool might need in order to understand how to use or interpret the data. Virtually any logical view's data can be described and transported using XML, although XML does prove to be cumbersome for expressions and script and could use some enhancement, particularly in the latter area.

What is left to do to enable this interaction?

Enhance XML to support extensible and rich typing and meta-data:

The logical views that describe the data that applications will deliver or accept must include a rich and extensible model for data types. As soon as one tries to use XML even for something really simple like transporting the arguments that an RPC implies or transporting a simple set of rows from a relational database, one discovers the need for data types. The recipient needs both to be able to discover the data types and to be able to reliably parse them into the appropriate types for each programming language. Furthermore, the negotiation process will often involve the conveyance of other meta-data important for the recipient. For example, if I am shipping around task descriptions as part of a project description, I may have a default Java class or COM class that should be materialized from this information. If I am a middle-ware search engine, I may want to get enough meta-data about the elements to know which should be shown in a summary view and which elements should be used as links to related information and how. In short, the amount of ancillary information that might be desired for XML data is essentially unlimited and this argues for a model for describing the schema of an XML grammar that is extensible. All of this is, of course, exactly what the XML-DATA proposal is intended to address.
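
As an illustration only, and not the actual XML-DATA syntax, a typed result set might look something like the fragment below. The dt:type attribute and its namespace are invented for this sketch; the point is simply that the recipient can see, per element, which programming-language type to materialize.

    <Rows xmlns:dt="urn:example:datatypes">
      <Row>
        <OrderID   dt:type="int">1021</OrderID>
        <OrderDate dt:type="date">1998-03-14</OrderDate>
        <Total     dt:type="decimal">149.95</Total>
      </Row>
    </Rows>

Given such annotations, a Java or C++ recipient can materialize an integer and a date rather than two strings, and the same extensible mechanism can carry the other meta-data mentioned above: default classes to instantiate, which elements belong in a summary view, which elements act as links, and so on.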

Describe the grammars:

In order to illustrate this section, we ask the reader to imagine a scenario in which a customer is trying to search for providers of used books. The customer has his or her heart dearly set upon some out-of-print tome by Thomas Carlyle and wants to know which sites have this book. An eager entrepreneur has decided to deliver such a service and has managed to convince many used book providers and even some online book retailers to publish their inventory using a particular schema to describe books. It need not even be a particularly good or comprehensive schema for describing books.

Remember that sites can support multiple schemas (logical views) against a single implementation. The enterprising entrepreneur has also convinced Yahoo to point to the site for book searching, which gives the book purveyors a certain urgency to actually publish this schema. Now comes the need for grammars. Which ones?

First, of course, we need a grammar for the books themselves, and most likely a grammar for a site to describe its physical ability to deliver goods and take payments. XML can do this today.

Second, we need a way for a search engine to quickly and efficiently determine, for each site, whether that site has any URL that returns the desired schema. This requires some grammar and some protocol or convention. The grammar is one that enumerates which URLs return which schemas (if any). The protocol or convention is either a magic URL or an HTTP header or verb, used to make it clear to the site that the desired information is the list of resources, together with any filtering restrictions (e.g. which resources support the book schema), passed in through a particular XML grammar. Thus we are likely to need two XML grammars: one to describe a filter, and one to present a logical view of a site in terms of URLs and the schemas those URLs can return. Let's call the latter "Site description" and the former "Filters" for the moment.
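
For instance, the two fragments below sketch what such documents might look like. They are illustrations only: every element and attribute name (Site, Resource, Filter, Eq, and so on) is invented for this discussion and does not correspond to any published grammar.

    <Site name="Example Used Books">
      <Resource url="http://books.example.com/inventory"
                schema="urn:example:book-schema"
                acceptsFilters="yes"
                acceptsSubscriptions="yes"/>
      <Resource url="http://books.example.com/orders"
                schema="urn:example:purchase-order"/>
    </Site>

    <Filter schema="urn:example:book-schema">
      <Eq element="Author" value="Thomas Carlyle"/>
      <Eq element="InPrint" value="no"/>
    </Filter>

The first document answers the search engine's discovery question (which URLs return which schemas, and what else each URL will accept); the second is the sort of restriction the engine could pass along with a request.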

Third, let's assume that the search engine now has a way to discover which URLs return the book schema. The search engine may then do one of two things:

  1. Pull down the inventory and merge it into a giant database in which the search engine tracks which sites have which books. In this case the search engine would like some additional information: it would like to know whether the book-purveying site will let it "subscribe" to this URL. Subscribing would mean that, from time to time, the site tells the search engine that the inventory on hand has changed. The search engine would then either ask for the entire inventory over again or ask for what is essentially a DIFF against the inventory already on file. If it did the latter, we would need yet another XML grammar to describe the changes to make. We'll call this grammar "Updates" (sketched below). Presumably the search engine would discover whether the book-purveying site was willing to accept subscriptions to the URL through the same "Site description" grammar that described the URL in the first place.

  2. Simply remember that the site has books. In this case as well, the search engine would like to find out some additional information: it would like to know whether the book-purveying site is willing to accept "Filter" grammar requests asking for filtered sets of these books. Then, if a request for books came in, the search engine would ship the request directly to the book-purveying site using the "Filter" grammar. This model is slower, of course, but it is a lot easier to implement and less likely to fail.

In either case, as we hope this example shows, standard XML grammars for describing sites, for asking for filtered subsets and for sending changes are likely to be extraordinarily useful.
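
An "Updates" document of the kind imagined in case 1 might, again purely as an illustration with invented names, express a change set like this:

    <Updates for="http://books.example.com/inventory" since="1998-04-01">
      <Insert>
        <Book id="B-1131" title="Sartor Resartus" author="Thomas Carlyle"/>
      </Insert>
      <Delete>
        <Book id="B-0907"/>
      </Delete>
    </Updates>

Applying such a document to the cached copy brings it back in line with the site's inventory without re-shipping the whole thing.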

However, this model may sometimes be too simplistic. Let's imagine that the site isn't willing to support a general query language, no matter how limited. What it is willing to do is expose certain URLs which, when sent the appropriate Title or Author parameters, return a list of books for that title, that author, or both. Notice that this is basically RPC. Now, even in this case, the search engine would want to send the parameters to the book-purveying site in a simple, easy-to-engineer manner. If the "Site description" grammar assumed above describes the desired shape of the "method's" parameters, and there is either a standard grammar for marshalling RPC parameters or the "Site description" grammar describes the grammar of the search request itself, then the search engine knows what XML to send. Ideally, this grammar for RPC would also describe how synchronously the results may be returned and by what mechanism a delayed return is allowed.
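
As a rough sketch only, such a marshalled call might look like the fragment below. Nothing here is drawn from an actual proposal; the method name, the reply mechanism, and the type annotations are all invented, and the envelope details (how replies are correlated, how asynchronous return is signalled) are exactly the questions such a grammar would have to settle.

    <Call method="FindBooks"
          replyTo="http://search.example.com/replies/8812"
          xmlns:dt="urn:example:datatypes">
      <Parameter name="Author" dt:type="string">Thomas Carlyle</Parameter>
      <Parameter name="Title"  dt:type="string">Past and Present</Parameter>
    </Call>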

It is interesting to note that the "Site description" grammar could be built as an extension of the current XML-DATA proposal.

There is another standard grammar that shows great promise: XSL, the Extensible Stylesheet Language. The search engine now has lists of books, and this is good. But another challenge faces the search engine: how does it handle the case where different sites support different book schemas? Suppose that there are several competing schemas. The search engine would like to know them all and be able to convert from any of them into whichever common one it prefers. What would be ideal is an XML grammar that described the logic required to quickly and efficiently convert from one schema to another. XSL has the potential to provide a standard mechanism for such conversions.
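
To show the flavor of such a conversion, here is a single rule written in the template style that the transformation part of XSL later stabilized on (XSLT); treat it as an illustration of the idea rather than as the syntax of the 1998 draft, and note that both book vocabularies are invented for the example.

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Map one invented book vocabulary onto another -->
      <xsl:template match="livre">
        <Book>
          <Title><xsl:value-of select="titre"/></Title>
          <Author><xsl:value-of select="auteur"/></Author>
        </Book>
      </xsl:template>
    </xsl:stylesheet>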

Similarly, the client receiving this book list information has a challenge. How does it dynamically generate the HTML required to display the book list, complete with whatever the vendors might want to include: links back to their sites, little blurbs about their sites, and so on? The client could of course take direct advantage of DHTML and JavaScript or Java or any COM language to solve this problem, writing either script or a component to custom-generate the appropriate DHTML from the data. The client could also use the data binding features of DHTML. But it would be handy to have a standard way to describe how to build the HTML from the XML and a standard component (converter) that knew how to execute such instructions. Again, XSL can provide such a conversion.
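
In the same spirit, a display rule might look like the sketch below, which would sit inside a stylesheet like the one above; the url attribute on Book is assumed purely for the example.

    <xsl:template match="Book">
      <p>
        <a href="{@url}"><xsl:value-of select="Title"/></a>
        by <xsl:value-of select="Author"/>
      </p>
    </xsl:template>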

Lastly, the site vending books has a problem. How does it map its book database, typically stored in a large relational database, into the appropriate XML logical view? Interestingly, the XML-Schema for the view could itself contain suitable meta-data to help one compute the answer to this question, but then a standard for such meta-data decoration of XML-Schemas would be required. Even so, a "plan" would undoubtedly be computed for actually building the appropriate XML logical view, and this plan would require an XML grammar; call it "Database conversion". Also required would be a general converter that, given such a grammar, would actually talk to the database, submit the requisite SQL, and build the appropriate XML. This model would be extremely useful on any server mapping relational data to HTML clients, sometimes even building the HTML directly.
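
Purely as a sketch, one can imagine decorating the schema for the logical view with mapping hints that such a generic converter could turn into SQL. Every name below, including the mapping namespace, is invented for this illustration.

    <Schema xmlns:map="urn:example:relational-mapping">
      <Element name="Book" map:table="INVENTORY">
        <Element name="Title"  map:column="TITLE"/>
        <Element name="Author" map:column="AUTHOR_NAME"/>
        <Element name="Price"  map:column="LIST_PRICE"/>
      </Element>
    </Schema>

A general-purpose converter reading this could emit SELECT TITLE, AUTHOR_NAME, LIST_PRICE FROM INVENTORY and wrap each returned row in the requested elements.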

So, to summarize, the following XML grammars would be necessary:

  1. The grammars proposed in the XML-Data proposal

  2. "Site description" grammar: This should probably be either a specific proposal or standard meta-data extensions to the XML-DATA proposal used to describe the "methods" of a site and the "schemas" it returns. Extensions to this model would also be needed to determine for any of these returned schemas, whether the site is also willing to accept the "Filters" or the "Updates" grammar.

  3. "Updates" grammar: An XML grammar to describe changes to/from cached XML

  4. "Filters" grammar: An XML grammar to describe desired subsets of a particular XML logical view

  5. Either an XML RPC proposal or extensions to the XML-IDL proposal above to describe the grammar of parameters submitted in an RPC and requisite envelope information

The following grammars would be useful:

  1. XSL: It would be helpful to pay some attention to ensuring that the part of XSL that is a tree transformation language is sufficiently easy and powerful

  2. More standard meta-data for XML-DATA to describe mappings of elements to relational stores

  3. "Database conversion" grammar: A grammar to construct logical XML views from relational databases

Ensure that the programming models for XML stay simple and appropriate:

Any model for moving information around the net must allow two types of programmers to access this information: programmers using serious structured languages such as Java or C++, and programmers using scripting languages such as JavaScript. It is important to note that the requirements are not identical. The scripting programmer tends to have neither pointers nor types and to want to treat XML as a single big tree of nodes which can be navigated through a collection syntax, something like root[foos[3][bars[2][sams[1]]]], where this completely fictitious syntax would mean that the script writer wanted the first node with tagname "sams" within the second node with tagname "bars" within the third node with tagname "foos" within the root. And due to the magic of overloading in scripting languages, such a model can be surfaced without requiring each and every XML provider to constantly provide and maintain such indexed collections. What this code would really translate to is code that goes and finds such a node using simple enumerators. Conversely, the C++ or Java programmer will probably happily deal with a lightweight enumeration model and build their own object models on top of it rather than depend upon the implementation vagaries of layers provided by the provider. Indeed, in many cases, the C++ or Java programmer may simply want the nodes pushed into them, as they will be building their own data structures. It is important that we not overburden each implementation with some top-heavy API which is neither fish nor fowl, but rather build the right low-level API on top of which all can build.

What will be required in addition to simple enumeration are ways to get data based upon the types described in the schema, ways to make changes to the data or tree, and ways to validate that the changes conform to the schema of the XML document. Lastly, as the XML grammars described above emerge, services to execute them automatically will come to be expected as a service layer on top of the base API, but with a model that allows implementers to support them directly and natively. In other words, while a service layer might be necessary to find some node using simple predicates, a provider might natively implement support for this in a more efficient manner using internal indexes or hash tables or what have you.

Stores:

Many have asked about XML "stores". It is unlikely that there will be one XML store. Different stores will have different purposes.

The cheapest store, of course, is the file system. Many standard components are springing up to provide DOM (Document Object Model) API access to XML stored in streams or files, including, of course, our own component, which we will make available ubiquitously on all Windows platforms, on Unix, and in Java.

Relational stores are superb at exposing multiple different logical views of the same data and by now have very good scaling and transactional characteristics. Typically, most mission-critical data will live in them. But this doesn't mean that there cannot be "converters" between relational stores and XML logical views. We expect to see this become a rapidly exploding part of the XML industry. Ultimately, we would expect the database stores themselves to be able to make these conversions happen, but in the near term these converters will be part of middle-ware living on middle-tier servers. This is too complicated a subject to describe in depth in this memo.

Finally, there are object-oriented stores. Many object-oriented stores have discovered new and additional purposes acting as "staging" stores for XML data as it is cached on the middle tier or the client. Some will undoubtedly turn out to do an excellent job of providing XML caching. What will be critical here is that we avoid a profusion of APIs for talking to these caches if we want to see interoperability. The DOM helps here by describing how to talk to a particular XML document/object, but it doesn't handle the larger issues that caches are likely to worry about, such as versioning, transactions, searching, and so on. We hope that WebDAV will play a major role in driving convergence in this arena. We also expect the object-oriented companies and innovative companies like Frontier to play key roles here.

Converters architecture:

Clearly, most data will not be stored in XML. Indeed, the main theme of this paper is that XML acts as a convenient mechanism for interacting at the logical view level, not the physical level. Some data will come from relational databases. Some from Teletype feeds. Some from mainframe databases through CICS. Some from SGML. Some from object-oriented databases. Some from mail stores. Much will be synthesized dynamically by objects written by programmers.

But if we want a model for interacting, we will need a standard component model for transforming between any data and XML. We call such components converters.

A normal middle-tier server will have mechanisms for wiring such converters up to queues of XML messages. For example, clients may be sending requests for available classes to a university and expecting a specific logical view (XML schema) in return. The server at the university would pop such a request off a queue, pass the XML request to the appropriate converter, probably hooked up to the class schedule database on the back end, take the resulting XML, and route it back to the requestor. The requestor would then display the information, review it, and then, perhaps, want to enroll. The requestor would send a message with another schema (enroll) to the middle tier. The middle tier would pop this message off some queue and quite possibly hand it off to a completely different converter component talking to a back-end student database.
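
Sketched in invented XML, the first leg of that exchange might look like the two fragments below; the request, filter, and schedule names are made up for the example and simply reuse the "Filters" idea from earlier.

    <Request schema="urn:example:class-schedule">
      <Filter>
        <Eq element="Department" value="History"/>
        <Eq element="Term" value="Fall 1998"/>
      </Filter>
    </Request>

    <ClassSchedule term="Fall 1998">
      <Class id="HIST-301" title="Victorian Britain" seats="12"/>
      <Class id="HIST-442" title="The French Revolution" seats="3"/>
    </ClassSchedule>

The enroll message that follows would carry a different schema and would be routed, off the queue, to a different converter.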

We expect to help produce three sorts of converters:

Objects written in code:

This requires a component model for being an XML converter. A converter will need to act a lot like any other XML provider. It will deliver up XML on request. It will support some standard discovery mechanism such as the "Site description" grammar. It will need to be able to stream results out. We need to make it easy for people to build such components, even using scripting languages.

Database converters:

An extraordinarily common case will be mappings between XML logical views and SQL databases. This is sufficiently common that it should be possible to author such transforms without having to write code. Indeed, it should be possible to simply send a request for a particular schema described using the XML-Schema syntax defined in the XML-DATA proposal, but augmented with information required to map the elements to relational database elements.

XSL (Extensible Style-sheet Language):

XSL has traditionally been viewed as a component that takes XML in and emits nice printed output in whatever print medium was chosen. We have looked at XSL somewhat differently. Our chosen output medium is DHTML, whether for dynamic interactive output or for producing rich Office documents. However, we don't want to require that everyone who is trying to produce a user interface for XML be a programmer. We believe that it should be easy to produce DHTML from XML.

Furthermore, we also don't believe that everyone will always agree about schemas. It is possible, for example, that three or four or five schemas, rather than one, will be used by sites providing information about their book inventory. However, any sensible program accessing or viewing such data will want to pick one. This will require a mechanism for converting from one XML schema to another. Sometimes, if sufficient semantic information were available, it might be possible to do this automatically.

But frequently such information will not be available and a customer will have to describe how to convert. Again, however, we don't want to limit this to programmers. We believe it should be easy to produce XML from XML.

We want a standard, extensible grammar, therefore, for converting XML trees into either other XML trees or DHTML trees. Thus, to Microsoft, the interesting part of XSL is the part that helps define how a given XML tree should be translated into a quite different tree. We see the emergence of standard converters which use XSL messages to decide how to translate from one XML grammar into another or into DHTML.

Implications of all of this for the web:

XML is the building block. XML-DATA is the required next step. Then, as we have seen, we need agreement upon some specific XML grammars, and we must remember to keep these grammars simple, open, and easy. If we do all this, the power we unleash for the normal developer is incalculable. It becomes possible to build systems for any line-of-business application, for collaboration, and for intelligent information retrieval, including retrieval of goods and services.

This revolution will do for applications what SQL partially did for databases. It will open up the floodgates because the number of people who can interact with them will increase enormously. Many hard issues remain to be worked out including the details of these grammars, the security issues, transactions, and so on. But in the spirit of the web, we have submitted our proposals for most of these pieces to allow us to start and learn rather than trying to get it perfect and never shipping.

This will cause a site revolution. Well-behaved sites will support not just data for specific schemas on request, but filtered sets of data, updates to data (notifications), and requests to update their data (data entry), and in all these cases they will support these services using standard XML grammars. It will open up an entirely new business, namely converters, with a myriad of specialized XML grammars that make sense for particular types of transformations.

Predictions:

The requisite grammars will be defined and implemented. The DOM will converge and streamline. Many database storage systems will start to support both the grammars and the DOM. XML will become a widespread solution to interoperable RPC on the net. Microsoft will actively support all of the above.

Tools will emerge to:

Server components will emerge to:

The programming model for building applications will change to:

Summary

We're only at the very beginning of the Net revolution. The most exciting part is still to come. Soon it will become as easy to interact with programs and data all over the net as it currently is to view shared presentation and content.


Copyright (c) Adam Bosworth, Microsoft. April 1998.