Microsoft's Vision for XML


SGML/XML Europe '98, Paris
From the Opening Keynote Address
Adam Bosworth, General Manager, Microsoft Corporation

See also the presentation slides.


Summary

The premise of this memo is that with the advent of the web, it is now possible to design a truly simple, flexible and open architecture that allows all applications on all machines on all platforms to interact. A second key premise is that this will only be possible under the aegis of logical views where, conceptually, the applications interact through agreed-upon logical conventions rather than directly upon either the database tables or the methods of the other. The argument is that XML is the key building block for such an architecture and that several key XML grammars and conventions will be required for this revolution to be realized. The rest of this paper spells out the building blocks Microsoft believes are required.

The Vision

All applications on the web are easy to make open. Goods and services are easy to find. Any customer can:

In short, make it easy to discover and interact with structured data and applications on the web.

Arguments

Whenever one thinks about a new architecture, one should consider what works and what does not. If the architecture is a sweeping one, such as the one required to allow applications to interact across the web, then reasonable places to look for lessons are the success of the web itself and our experience with servers used in applications.

So, what have we learned from the Web?

First and foremost, we have learned that it isn't enough for something to be possible. It must be easy, open and flexible. The Internet predates HTML, of course, but until the advent of HTTP and HTML, it didn't really explode. Why? The answer, succinctly, is empowerment. Once HTML and HTTP arrived, more people could play more easily. The solutions were not necessarily optimal from the point of view of performance or even robustness. They were optimal from the point of view of ease of getting started. In short, they were drop-dead simple. Many people point out the deficiencies of HTML, especially the sloppiness of its grammar.

This is true and even regrettable, but the fact is that the simplicity of the HTML model, in which tags were used not abstractly (despite what every book preached) but concretely to describe the intended look and feel, made HTML immediately approachable. The fact that HTML simply ignored unknown tags made it easy as well, since mistakes were silently ignored. Now, as we all know, the lack of formality led to a mess in which few can implement an HTML engine, since such an engine is expected to maintain perfect visual and scripting fidelity with what is, essentially, an unwritten, arbitrary, and complex standard. We should learn from this and keep the simplicity without the mess. Nevertheless, the key point is that HTML exploded because it crossed some threshold of cognitive simplicity. The lesson is that simplicity and flexibility beat optimization and power in a world where connectivity is key.

There is a second lesson which is key. Applications need to be constructed out of coarse-grained components that can be dynamically loaded, rather than out of single large monolithic blocks. In the HTML world, these components are pages. In the applications world in general, however, the lesson applies just as well. The reason for this is simple. The application starts more quickly, only consumes the resources it really needs, and, most importantly, can be dynamically loaded off the net. Why is this so important? It is important because of deployment. Applications that can be dynamically loaded from a central place don't require some massive, complex and difficult installation process on clients' machines. Note that Java per se doesn't give one this. It is easy, as anyone who has built a large and complex Java application can testify, to build one that requires literally hundreds of classes to run. That is monolithic. HTML had the serendipitous effect of forcing application designs to partition the application. To repeat, the lesson is that applications should be loaded in coarse-grained chunks.

What have we learned from servers?

We use a simple analogy here: the corner grocery store. Imagine that there is such a neighborhood store and everyone buys his or her weekly groceries there. Now imagine that all of a sudden, everyone's buying patterns changed. Instead of buying a few days' supplies at a time, they bought a single item at a time. In short, a customer would come in, buy a quart of milk and leave. Then return, buy a stick of butter and leave. Then return, buy a bag of apples and leave. And so on. In short order, there would be huge lines of people trying to buy their groceries at the store. If, each time, the customers were paying with electronic cards or checks (we know, only in backward USA), it would be even worse. Conversely, suppose that everyone panicked about coming inflation and came in and tried to buy a year's supply of groceries at once to bring home and put into 17 freezers. An entirely new and different set of problems would surface. Happily, customers don't do this.

It turns out that servers are like grocery stores. They can only handle so many customers at once (checkout counters). More customers mean longer queues, whether at a grocery store or on a server. They cannot scale beyond a certain point: just as you cannot build checkout counters on the fly, servers cannot simply run more and more processes concurrently. The processes start competing for the CPU and overall performance actually degrades.

Starting or finishing a conversation with each customer (client) has very real costs, and if the transactions are too fine-grained, these costs can dwarf all the others. Servers aren't designed to serve up really huge amounts of data (a year's supply of groceries) either. They bottleneck, TCP/IP complains, and so on. We also learned that processes shouldn't hang onto state while waiting on clients or other slow processes. As an example, if the clerk doesn't know the price of an item, it would be unfortunate if the clerk simply stopped serving the queue and went off to find the price. The whole queue would block. Instead, the clerk should get someone else to find out the price and even start helping the next customer if the delay will be significant. In server-speak this means that the process shouldn't hang onto state. What we learned from servers was to conduct coarse-grained, interruptible conversations with them.

This is very important. It means that when talking to a medical system or your bank's payments system or a purchase order system, the client should exchange information in coarse-grained chunks: chunks large enough to let the client go away and do useful work for a spell. Indeed, often the process should dump all its state up to the client and grab it on the way back, just to allow the process to quickly take the next task off the queue. In short, it means that good application design for talking to applications on servers involves relatively few methods that can accept and return complex, rich sets of information such as the complete patient record, the complete portfolio description, or the complete purchase order. This is not how we normally design methods, where we explicitly design for encapsulation and thus make every access to every property or element a separate call.
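
To make this concrete, the fragment below is a purely illustrative sketch of the kind of coarse-grained message we have in mind: one exchange carrying a complete purchase order rather than dozens of fine-grained property accesses. All of the element names and values are invented for this example and do not come from any existing schema.

    <PurchaseOrder id="PO-0421">
      <Customer>
        <Name>Jean Dupont</Name>
        <Account>883-221</Account>
      </Customer>
      <Items>
        <Item sku="BK-1131" quantity="2" unitPrice="14.50"/>
        <Item sku="BK-2217" quantity="1" unitPrice="32.00"/>
      </Items>
      <ShipTo>14 rue des Livres, Paris</ShipTo>
    </PurchaseOrder>

A client that receives the whole order in one round trip can work on it locally and send it back, changed, in a second round trip, which is exactly the interruptible, coarse-grained conversation argued for above.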

There is another key lesson to be learned from servers. The code on the server is designed to talk efficiently and swiftly to shared, transacted resources such as databases, payment authorization systems, or airline reservation systems. It is code that fundamentally assumes that it is connected through distributed transactions to these resources. This in turn implies rapid turn-around. Why? Because while any process is holding a transaction on resources, they are locked, and this hurts other clients' ability even to read such resources. This code cannot move to a place that isn't similarly connected to these resources. It would lock up these servers. If a banking transaction, for example, is debiting an account on one system and crediting an account on another, all inside of a distributed transaction, it must not run over slow or unreliable lines because it will start to lock up the resources. This means that the code that deals with this information on the client isn't the same code. It shares the same information (purchase orders, portfolios, personnel records), but it is code dedicated to some different process, such as letting the user view the information. The lesson is, in short, to move the information and possibly some simple, portable validation logic for the information, but not the custom code that handles this information on the server. We tend to refer to this as an object-to-object bridge, which is, of course, an RPC. So, to repeat, the lesson to be learned is to share data, not code.

To summarize, we can learn the following lessons:

  1. Simplicity and flexibility beat optimization and power in a world where connectivity is key.

  2. Applications should be loaded in coarse-grained chunks.

  3. Conduct coarse-grained, interruptible conversations with servers; don't hang onto state.

  4. Share data, not code.

Microsoft's Opinion

XML is tailor-made as a way to move this coarse-grained information around the web, whether the information is purchase orders, personnel records, portfolios, or just parameters. It is open, easy, and flexible. It is self-describing. If the proposals in XML-DATA are adopted, it supports rich, extensible data types and the additional meta-data a tool might need in order to understand how to use or interpret the data. Virtually any logical view's data can be described and transported using XML, although XML does prove to be cumbersome for expressions and script and could use some enhancement, particularly in the latter area.

What is left to do to enable this interaction?

Enhance XML to support extensible and rich typing and meta-data:

The logical views that describe the data that applications will deliver or accept must include a rich and extensible model for data types. As soon as one tries to use XML even for something really simple like transporting the arguments that an RPC implies or transporting a simple set of rows from a relational database, one discovers the need for data types. The recipient needs both to be able to discover the data types and to be able to reliably parse them into the appropriate types for each programming language. Furthermore, the negotiation process will often involve the conveyance of other meta-data important for the recipient. For example, if I am shipping around task descriptions as part of a project description, I may have a default Java class or COM class that should be materialized from this information. If I am a middle-ware search engine, I may want to get enough meta-data about the elements to know which should be shown in a summary view and which elements should be used as links to related information and how. In short, the amount of ancillary information that might be desired for XML data is essentially unlimited and this argues for a model for describing the schema of an XML grammar that is extensible. All of this is, of course, exactly what the XML-DATA proposal is intended to address.
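
As an illustration only, and not the actual XML-DATA syntax, a typed result set might look something like the fragment below. The dt:type attribute and its namespace are invented for this sketch; the point is simply that the recipient can see, per element, which programming-language type to materialize.

    <Rows xmlns:dt="urn:example:datatypes">
      <Row>
        <OrderID   dt:type="int">1021</OrderID>
        <OrderDate dt:type="date">1998-03-14</OrderDate>
        <Total     dt:type="decimal">149.95</Total>
      </Row>
    </Rows>

Given such annotations, a Java or C++ recipient can materialize an integer and a date rather than two strings, and the same extensible mechanism can carry the other meta-data mentioned above: default classes to instantiate, which elements belong in a summary view, which elements act as links, and so on.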

Describe the grammars:

In order to illustrate this section, we ask the reader to imagine a scenario in which a customer is trying to search for providers of used books. The customer has his or her heart dearly set upon some out-of-print tome by Thomas Carlyle and wants to know which sites have this book. An eager entrepreneur has decided to deliver such a service and has managed to convince many used book providers and even some online book retailers to publish their inventory using a particular schema to describe books. It need not even be a particularly good or comprehensive schema for describing books.

Remember that sites can support multiple schemas (logical views) against a single implementation. The enterprising entrepreneur has also convinced Yahoo to point to the site for book searching, which gives the book purveyors a certain urgency to actually publish this schema. Now comes the need for grammars. Which ones?

First, of course, we need a grammar for the books themselves, and most likely a grammar for a site to describe its physical ability to deliver goods and take payments. XML can do this today.

Second, we need a way for a search engine to quickly and efficiently determine, for each site, whether that site has any URL that returns the desired schema. This requires some grammar and some protocol or convention. The grammar is one that enumerates which URLs return which schemas (if any). The protocol or convention is either a magic URL or an HTTP header or verb, used to make it clear to the site that the desired information is the list of resources, together with any filtering restrictions (e.g. which resources support the book schema), passed in through a particular XML grammar. Thus we are likely to need two XML grammars: one to describe a filter, and one to present a logical view of a site in terms of URLs and the schemas those URLs can return. Let's call the latter "Site description" and the former "Filters" for the moment.
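
For instance, the two fragments below sketch what such documents might look like. They are illustrations only: every element and attribute name (Site, Resource, Filter, Eq, and so on) is invented for this discussion and does not correspond to any published grammar.

    <Site name="Example Used Books">
      <Resource url="http://books.example.com/inventory"
                schema="urn:example:book-schema"
                acceptsFilters="yes"
                acceptsSubscriptions="yes"/>
      <Resource url="http://books.example.com/orders"
                schema="urn:example:purchase-order"/>
    </Site>

    <Filter schema="urn:example:book-schema">
      <Eq element="Author" value="Thomas Carlyle"/>
      <Eq element="InPrint" value="no"/>
    </Filter>

The first document answers the search engine's discovery question (which URLs return which schemas, and what else each URL will accept); the second is the sort of restriction the engine could pass along with a request.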

Third, let's assume that the search engine now has a way to discover which URLs return the book schema. The search engine may then do one of two things:

  1. Pull down the inventory and merge it into a giant database in which the search engine tracks which sites have which books. In this case the search engine would like some additional information: it would like to know whether the book-purveying site will let it "subscribe" to this URL. Subscribing would mean that, from time to time, the site tells the search engine that the inventory on hand has changed. The search engine would then either ask for the entire inventory over again or ask for what is essentially a DIFF against the inventory already on file. If it did the latter, we would need yet another XML grammar to describe the changes to make. We'll call this grammar "Updates" (sketched below). Presumably the search engine would discover whether the book-purveying site was willing to accept subscriptions to the URL through the same "Site description" grammar that described the URL in the first place.

  2. Simply remember that the site has books. In this case as well, the search engine would like to find out some additional information: it would like to know whether the book-purveying site is willing to accept "Filter" grammar requests asking for filtered sets of these books. Then, if a request for books came in, the search engine would ship the request directly to the book-purveying site using the "Filter" grammar. This model is slower, of course, but it is a lot easier to implement and less likely to fail.

In either case, as we hope this example shows, standard XML grammars for describing sites, for asking for filtered subsets and for sending changes are likely to be extraordinarily useful.
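
An "Updates" document of the kind imagined in case 1 might, again purely as an illustration with invented names, express a change set like this:

    <Updates for="http://books.example.com/inventory" since="1998-04-01">
      <Insert>
        <Book id="B-1131" title="Sartor Resartus" author="Thomas Carlyle"/>
      </Insert>
      <Delete>
        <Book id="B-0907"/>
      </Delete>
    </Updates>

Applying such a document to the cached copy brings it back in line with the site's inventory without re-shipping the whole thing.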

However, this model may sometimes be too simplistic. Let's imagine that the site isn't willing to support a general query language, no matter how limited. What it is willing to do is expose certain URLs which, when sent the appropriate Title or Author parameters, return a list of books for that title, that author, or both. Notice that this is basically RPC. Now, even in this case, the search engine would want to send the parameters to the book-purveying site in a simple, easy-to-engineer manner. If the "Site description" grammar assumed above describes the desired shape of the "method's" parameters, and there is either a standard grammar for marshalling RPC parameters or the "Site description" grammar describes the grammar of the search request itself, then the search engine knows what XML to send. Ideally, this grammar for RPC would also describe how synchronously the results may be returned and by what mechanism a delayed return is allowed.
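
As a rough sketch only, such a marshalled call might look like the fragment below. Nothing here is drawn from an actual proposal; the method name, the reply mechanism, and the type annotations are all invented, and the envelope details (how replies are correlated, how asynchronous return is signalled) are exactly the questions such a grammar would have to settle.

    <Call method="FindBooks"
          replyTo="http://search.example.com/replies/8812"
          xmlns:dt="urn:example:datatypes">
      <Parameter name="Author" dt:type="string">Thomas Carlyle</Parameter>
      <Parameter name="Title"  dt:type="string">Past and Present</Parameter>
    </Call>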

It is interesting to note that the "Site description" grammar could be built as an extension of the current XML-DATA proposal.

There is another standard grammar that shows great promise: XSL, the Extensible Stylesheet Language. The search engine now has lists of books, and this is good. But another challenge faces the search engine: how does it handle the case where different sites support different book schemas? Suppose that there are several competing schemas. The search engine would like to know them all and be able to convert from any of them into whichever common one it prefers. What would be ideal is an XML grammar that described the logic required to quickly and efficiently convert from one schema to another. XSL has the potential to provide a standard mechanism for such conversions.
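
To show the flavor of such a conversion, here is a single rule written in the template style that the transformation part of XSL later stabilized on (XSLT); treat it as an illustration of the idea rather than as the syntax of the 1998 draft, and note that both book vocabularies are invented for the example.

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Map one invented book vocabulary onto another -->
      <xsl:template match="livre">
        <Book>
          <Title><xsl:value-of select="titre"/></Title>
          <Author><xsl:value-of select="auteur"/></Author>
        </Book>
      </xsl:template>
    </xsl:stylesheet>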

Similarly, the client receiving this book list information has a challenge. How does it dynamically generate the HTML required to display the book list, complete with whatever the vendors might want to include: links back to their sites, little blurbs about their sites, and so on? The client could of course take direct advantage of DHTML and JavaScript or Java or any COM language to solve this problem, writing either script or a component to custom-generate the appropriate DHTML from the data. The client could also use the data binding features of DHTML. But it would be handy to have a standard way to describe how to build the HTML from the XML and a standard component (converter) that knew how to execute such instructions. Again, XSL can provide such a conversion.
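
In the same spirit, a display rule might look like the sketch below, which would sit inside a stylesheet like the one above; the url attribute on Book is assumed purely for the example.

    <xsl:template match="Book">
      <p>
        <a href="{@url}"><xsl:value-of select="Title"/></a>
        by <xsl:value-of select="Author"/>
      </p>
    </xsl:template>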

Lastly, the site vending books has a problem. How does it map its book database, typically stored in a large relational database, into the appropriate XML logical view? Interestingly, the XML-Schema for the view could itself contain suitable meta-data to help one compute the answer to this question, but then a standard for such meta-data decoration of XML-Schemas would be required. Even so, a "plan" would undoubtedly be computed for actually building the appropriate XML logical view, and this plan would require an XML grammar; call it "Database conversion". Also required would be a general converter that, given such a grammar, would actually talk to the database, submit the requisite SQL, and build the appropriate XML. This model would be extremely useful on any server mapping relational data to HTML clients, sometimes even building the HTML directly.
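
Purely as a sketch, one can imagine decorating the schema for the logical view with mapping hints that such a generic converter could turn into SQL. Every name below, including the mapping namespace, is invented for this illustration.

    <Schema xmlns:map="urn:example:relational-mapping">
      <Element name="Book" map:table="INVENTORY">
        <Element name="Title"  map:column="TITLE"/>
        <Element name="Author" map:column="AUTHOR_NAME"/>
        <Element name="Price"  map:column="LIST_PRICE"/>
      </Element>
    </Schema>

A general-purpose converter reading this could emit SELECT TITLE, AUTHOR_NAME, LIST_PRICE FROM INVENTORY and wrap each returned row in the requested elements.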

So, to summarize, the following XML grammars would be necessary:

  1. The grammars proposed in the XML-Data proposal

  2. "Site description" grammar: This should probably be either a specific proposal or standard meta-data extensions to the XML-DATA proposal used to describe the "methods" of a site and the "schemas" it returns. Extensions to this model would also be needed to determine for any of these returned schemas, whether the site is also willing to accept the "Filters" or the "Updates" grammar.

  3. "Updates" grammar: An XML grammar to describe changes to/from cached XML

  4. "Filters" grammar: An XML grammar to describe desired subsets of a particular XML logical view

  5. Either an XML RPC proposal or extensions to the XML-IDL proposal above to describe the grammar of parameters submitted in an RPC and requisite envelope information

The following grammars would be useful:

  1. XSL: It would be helpful to pay some attention to ensuring that the part of XSL that is a tree transformation language is sufficiently easy and powerful

  2. More standard meta-data for XML-DATA to describe mappings of elements to relational stores

  3. "Database conversion" grammar: A grammar to construct logical XML views from relational databases

Ensure that the programming models for XML stay simple and appropriate:

Any model for moving information around the net must allow two types of programmers to access this information: programmers using serious structured languages such as Java or C++, and programmers using scripting languages such as JavaScript. It is important to note that the requirements are not identical. The scripting programmer tends to have neither pointers nor types and to want to treat XML as a single big tree of nodes which can be navigated through a collection syntax, something like root[foos[3][bars[2][sams[1]]]], where this completely fictitious syntax would mean that the script writer wanted the first node with tagname "sams" within the second node with tagname "bars" within the third node with tagname "foos" within the root. And due to the magic of overloading in scripting languages, such a model can be surfaced without requiring each and every XML provider to constantly provide and maintain such indexed collections. What this code would really translate to is code that goes and finds such a node using simple enumerators. Conversely, the C++ or Java programmer will probably happily deal with a lightweight enumeration model and build their own object models on top of it rather than depend upon the implementation vagaries of layers provided by the provider. Indeed, in many cases, the C++ or Java programmer may simply want the nodes pushed into them, as they will be building their own data structures. It is important that we not overburden each implementation with some top-heavy API which is neither fish nor fowl, but rather build the right low-level API on top of which all can build.

What will be required in addition to simple enumeration are ways to get data based upon the types described in the schema, ways to make changes to the data or tree, and ways to validate that the changes conform to the schema of the XML document. Lastly, as the XML grammars described above emerge, services to execute them automatically will come to be expected as a service layer on top of the base API, but with a model that allows implementers to support them directly and natively. In other words, while a service layer might be necessary to find some node using simple predicates, a provider might natively implement support for this in a more efficient manner using internal indexes or hash tables or what have you.

Stores:

Many have asked about XML "stores". It is unlikely that there will be one XML store. Different stores will have different purposes.

The cheapest store, of course, is the file system. Many standard components are springing up to provide DOM (Document Object Model) API access to XML stored in streams or files, including, of course, our own component, which we will make available ubiquitously on all Windows platforms, on Unix, and in Java.

Relational stores are superb at exposing multiple different logical views of the same data and by now have very good scaling and transactional characteristics. Typically, most mission-critical data will live in them. But this doesn't mean that there cannot be "converters" between relational stores and XML logical views. We expect to see this become a rapidly exploding part of the XML industry. Ultimately, we would expect the database stores themselves to be able to make these conversions happen, but in the near term these converters will be part of middle-ware living on middle-tier servers. This is too complicated a subject to describe in depth in this memo.

Finally, there are object-oriented stores. Many object-oriented stores have discovered new and additional purposes acting as "staging" stores for XML data as it is cached on the middle tier or the client. Some will undoubtedly turn out to do an excellent job of providing XML caching. What will be critical here is that we avoid a profusion of APIs for talking to these caches if we want to see interoperability. The DOM helps here by describing how to talk to a particular XML document/object, but it doesn't handle the larger issues that caches are likely to worry about, such as versioning, transactions, searching, and so on. We hope that WebDAV will play a major role in driving convergence in this arena. We also expect the object-oriented companies and innovative companies like Frontier to play key roles here.

Converters architecture:

Clearly, most data will not be stored in XML. Indeed, the main theme of this paper is that XML acts as a convenient mechanism for interacting at the logical view level, not the physical level. Some data will come from relational databases. Some from Teletype feeds. Some from mainframe databases through CICS. Some from SGML. Some from object-oriented databases. Some from mail stores. Much will be synthesized dynamically by objects written by programmers.

But if we want a model for interacting, we will need a standard component model for transforming between any data and XML. We call such components converters.

A normal middle-tier server will have mechanisms for wiring such converters up to queues of XML messages. For example, clients may be sending requests for available classes to a university and expecting a specific logical view (XML schema) in return. The server at the university would pop such a request off a queue, pass the XML request to the appropriate converter, probably hooked up to the class schedule database on the back end, take the resulting XML, and route it back to the requestor. The requestor would then display the information, review it, and then, perhaps, want to enroll. The requestor would send a message with another schema (enroll) to the middle tier. The middle tier would pop this message off some queue and quite possibly hand it off to a completely different converter component talking to a back-end student database.
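
Sketched in invented XML, the first leg of that exchange might look like the two fragments below; the request, filter, and schedule names are made up for the example and simply reuse the "Filters" idea from earlier.

    <Request schema="urn:example:class-schedule">
      <Filter>
        <Eq element="Department" value="History"/>
        <Eq element="Term" value="Fall 1998"/>
      </Filter>
    </Request>

    <ClassSchedule term="Fall 1998">
      <Class id="HIST-301" title="Victorian Britain" seats="12"/>
      <Class id="HIST-442" title="The French Revolution" seats="3"/>
    </ClassSchedule>

The enroll message that follows would carry a different schema and would be routed, off the queue, to a different converter.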

We expect to help produce three sorts of converters:

Objects written in code:

This requires a component model for being an XML converter. A converter will need to act a lot like any other XML provider. It will deliver up XML on request. It will support some standard discovery mechanism such as the "Site description" grammar. It will need to be able to stream results out. We need to make it easy for people to build such components, even using scripting languages.

Database converters:

An extraordinarily common case will be mappings between XML logical views and SQL databases. This is sufficiently common that it should be possible to author such transforms without having to write code. Indeed, it should be possible to simply send a request for a particular schema described using the XML-Schema syntax defined in the XML-DATA proposal, but augmented with information required to map the elements to relational database elements.

XSL (Extensible Style-sheet Language):

XSL has traditionally been viewed as a component that takes XML in and emits nice printed output in whatever print medium was chosen. We have looked at XSL somewhat differently. Our chosen output medium is DHTML, whether for dynamic interactive output or for producing rich Office documents. However, we don't want to require that everyone who is trying to produce a user interface for XML be a programmer. We believe that it should be easy to produce DHTML from XML.

Furthermore, we also don't believe that everyone will always agree about schemas. It is possible, for example, that three or four or five schemas, rather than one, will be used by sites providing information about their book inventory. However, any sensible program accessing or viewing such data will want to pick one. This will require a mechanism for converting from one XML schema to another. Sometimes, if sufficient semantic information were available, it might be possible to do this automatically.

But frequently such information will not be available and a customer will have to describe how to convert. Again, however, we don't want to limit this to programmers. We believe it should be easy to produce XML from XML.

We want a standard, extensible grammar, therefore, for converting XML trees into either other XML trees or DHTML trees. Thus, to Microsoft, the interesting part of XSL is the part that helps define how a given XML tree should be translated into a quite different tree. We see the emergence of standard converters which use XSL messages to decide how to translate from one XML grammar into another or into DHTML.

Implications of all of this for the web:

XML is the building block. XML-DATA is the required next step. Then, as we have seen, we need agreement upon some specific XML grammars, and we must remember to keep these grammars simple, open, and easy. If we do all this, the power we unleash for the normal developer is incalculable. It becomes possible to build systems for any line-of-business application, for collaboration, and for intelligent information retrieval, including retrieval of goods and services.

This revolution will do for applications what SQL partially did for databases. It will open up the floodgates because the number of people who can interact with them will increase enormously. Many hard issues remain to be worked out including the details of these grammars, the security issues, transactions, and so on. But in the spirit of the web, we have submitted our proposals for most of these pieces to allow us to start and learn rather than trying to get it perfect and never shipping.

This will cause a site revolution. Well-behaved sites will support not just data for specific schemas on request, but filtered sets of data, updates to data (notifications), and requests to update their data (data entry), and in all these cases they will support these services using standard XML grammars. It will open up an entirely new business, namely converters, with a myriad of specialized XML grammars that make sense for particular types of transformations.

Predictions:

The requisite grammars will be defined and implemented. The DOM will converge and streamline. Many database storage systems will start to support both the grammars and the DOM. XML will become a widespread solution to interoperable RPC on the net. Microsoft will actively support all of the above.

Tools will emerge to:

Server components will emerge to:

The programming model for building applications will change to:

Summary

We're only at the very beginning of the Net revolution. The most exciting part is still to come. Soon it will become as easy to interact with programs and data all over the net as it currently is to view shared presentation and content.


Copyright (c) Adam Bosworth, Microsoft. April 1998.