MS Office 'Office 11' XML as Seen by Tim Bray
Subject: MS office XML
From: Tim Bray <tbray@textuality.com>
To: xml-dev@lists.xml.org
Date: Thu, 24 Oct 2002 16:03:05 -0700
Justin Lipton wrote:
> Does anyone know or have ideas about what XML enabled Office 11 actually means?
I got an extended (hours-long) demo of Word & Excel & XDocs from JeanPa and a product manager whose name I don't have handy, two or three months ago, so things may have changed but here's what I saw:
Both Word & this new XDocs thing can edit arbitrary XML docs per the constraints of any old XSD schema. No DTD support. There are some of the usual XML editor goodies such as suggesting what elements can go here and picking attributes. They have pretty cool facilities for GUIfied schema customization. Neither of them can help much with mixed content, which has always separated the men from the boys in the *ML editing sweepstakes.
I'm not sure that either of them is really being positioned as a general-purpose XML content creation facility up against Arbortext & Altova & Corel. I'm not sure that market is big enough to interest MS anyhow. XDocs is (strictly my opinion) an attempt to build a desktop application constructor at a level that is a bit more declarative and open than VB, but richer & more interactive than a Web browser. I'm not really convinced yet - I think MS would agree there's still quite a bit of product management to do - but it does seem to be a pretty clever piece of software. I'm pretty sure it's safe to interpret the advent of XDocs as MSFT's declaration that they're not going to do anything with XForms.
What actually turns my crank is that you can save word docs as XML and they have their own "WordML" tag set that gets generated. I took a close look at this and it's pretty interesting. Very verbose - every word on the page gets its own markup. Suppose you have the word "foo" in bold with single-underline, the WordML looks something like:
<r>
  <rps>
    <rp class="bold" />
    <rp class="underline" lines="1" />
  </rps>foo</r>
When you get something like a Word table or floating text box the markup gets really severely dense and ugly, but I didn't see anything that seemed egregiously wrong; it's not pretending to do anything more than capture all the semantics that Word carries around inside, which are correspondingly severely dense and ugly. And HTML tables get pretty hideous too.
Why did I like this? I didn't see anything that I couldn't pick apart straightforwardly with Perl, and if someone asked me to write a script to pull all the paragraphs out of a Word doc that contain the word "foo" in bold, well you could do that. Which seems pretty important to me.
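As a rough sketch of the kind of script described above (written here in Python rather than Perl), the following pulls out paragraphs that contain a given word in bold. It assumes the approximate run markup quoted earlier in this message, which was itself only a "looks something like" recollection. The `<doc>` root, the `<p>` paragraph wrapper, the sample text, and the function names are all hypothetical and are not taken from any real WordML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <doc> and <p> are invented wrappers; the <r>/<rps>/<rp>
# shape follows the approximate markup quoted in the message.
SAMPLE = """\
<doc>
  <p><r><rps><rp class="bold" /><rp class="underline" lines="1" /></rps>foo</r><r> bar</r></p>
  <p><r>foo</r><r> in plain text</r></p>
  <p><r><rps><rp class="bold" /></rps>baz</r></p>
</doc>
"""


def run_text(run):
    """Return all character data inside a run, skipping the property markup."""
    return "".join(run.itertext())


def is_bold(run):
    """A run counts as bold if any of its <rp> properties has class="bold"."""
    return any(rp.get("class") == "bold" for rp in run.findall("./rps/rp"))


def paragraphs_with_bold_word(xml_text, word):
    """Return the plain text of each paragraph containing `word` in a bold run."""
    root = ET.fromstring(xml_text)
    hits = []
    for para in root.iter("p"):
        for run in para.iter("r"):
            if is_bold(run) and word in run_text(run).split():
                hits.append("".join(para.itertext()).strip())
                break  # one hit per paragraph is enough
    return hits


print(paragraphs_with_bold_word(SAMPLE, "foo"))  # prints ['foo bar']
```

Only the first sample paragraph is reported: the second has "foo" without bold, and the third is bold but does not contain "foo". A real document would need namespace-qualified names and whatever the actual element names turn out to be.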
The idea is that you can have a Word document with all that formatting and then you can mix that up pretty freely with your own schema stuff, and have validation, then you can save it as Word (your markup plus Word's) or as pure XML (discards Word's markup, leaving just yours). The old Corel WPerfect SGML editor used to be able to do this too.
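To make the "pure XML" idea concrete, here is a minimal sketch of stripping one vocabulary out of a mixed document by namespace while keeping the text and the custom elements. It is an illustration only, not a description of how Word itself does the save. Both namespace URIs, the invoice vocabulary, and the function names are made up for the example, and the sketch assumes the root element belongs to the custom schema.

```python
import xml.etree.ElementTree as ET

WORD_NS = "urn:example:wordml"    # hypothetical stand-in for Word's namespace
USER_NS = "urn:example:invoice"   # hypothetical custom schema namespace

SAMPLE = (
    '<inv:invoice xmlns:inv="urn:example:invoice" xmlns:w="urn:example:wordml">'
    '<w:p><w:r><w:rps><w:rp class="bold"/></w:rps>Total: </w:r>'
    '<inv:total><w:r>42</w:r></inv:total></w:p>'
    '</inv:invoice>'
)


def is_word(elem):
    """True if the element's qualified tag lives in the Word namespace."""
    return elem.tag.startswith("{" + WORD_NS + "}")


def content_pieces(elem):
    """Yield text strings and rebuilt non-Word elements inside elem, in order."""
    if elem.text:
        yield elem.text
    for child in elem:
        if is_word(child):
            # Unwrap: drop the Word element itself but keep what it contains.
            yield from content_pieces(child)
        else:
            yield rebuild(child)
        if child.tail:
            yield child.tail


def rebuild(elem):
    """Copy a non-Word element, keeping only text and non-Word descendants."""
    new = ET.Element(elem.tag, dict(elem.attrib))  # attributes copied unchanged
    last = None
    for piece in content_pieces(elem):
        if isinstance(piece, str):
            if last is None:
                new.text = (new.text or "") + piece
            else:
                last.tail = (last.tail or "") + piece
        else:
            new.append(piece)
            last = piece
    return new


pure = rebuild(ET.fromstring(SAMPLE))
print(ET.tostring(pure, encoding="unicode"))
```

The result keeps the `invoice` and `total` elements and the text "Total: " and "42", with every element from the Word namespace gone. Attributes are copied as-is, so any Word-namespaced attributes on custom elements would need separate handling.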
WordML and VML (for graphics) and your own schemas all get namespaces and they seem to use them sensibly. JeanPa even talked to me about using real HTTP URIs pointing at schemas.microsoft.com and having RDDL or equivalent there. This gave me an opportunity for sarcastic remarks about "Imagine that, a URL on microsoft.com that stays stable for more than a week..."
Well, whaddaya know:
~/ 513> host schemas.microsoft.com
schemas.microsoft.com has address 207.68.176.124
Anyhow, if they really do something like what they showed me, I'd call it a positive step.
Now, why would they do this? Ask yourself, who is going to be making the decision as to whether or not to buy the next Office upgrade? The CIO, right? Will the CIO care about a better spell-checker or other such wordprocessing fluff? I think not. Will the CIO like having the inventory of Office docs accessible to software for... well, anything? I think so.
-Tim
[Source: http://lists.xml.org/archives/xml-dev/200210/msg01357.html]
Prepared by Robin Cover for The XML Cover Pages archive. See: (1) the Microsoft announcement: "Microsoft Releases First Beta of 'Office 11'. Next Version of Office to Connect People, Information and Business Processes."; (2) "Microsoft 'XDocs' Office Product Supports Custom-Defined XML Schemas."