Paoli on Microsoft's Use of XML in Office
Bringing the XML Vision to the Desktop: How Office will Connect Data and Business Processes Across the Organization
Redmond, WA, USA. November 12, 2002.
Microsoft recently announced the beta release of the next version of Microsoft Office, codenamed "Office 11," which broadly supports XML (eXtensible Markup Language) and enables the exchange of data across diverse systems, platforms, and applications.
Extensive support of XML in Office means that Office applications can now create, view, and edit structured data stored in disparate systems. By creating specific data models, or "schemas," businesses can customize the structure of data, making critical company data easier to search and reuse and improving productivity for end users.
PressPass spoke with Jean Paoli, an XML architect at Microsoft, to find out why bringing XML to the desktop is important and what it really means for businesses. Paoli is one of the co-creators of the XML 1.0 standard with the World Wide Web Consortium (W3C) and has been a significant player in the worldwide XML community since 1985, when the technology was still known as SGML (Standard Generalized Markup Language). He helped accelerate XML activity at Microsoft by creating and managing the team that delivered the software that XML-enabled both Internet Explorer and Windows. He now works on the Office team.
PressPass: Why is Microsoft supporting XML so extensively in Office?
Paoli: With "Office 11," we are addressing a fundamental concern that we have heard over and over again from our customers: Too often, business-critical information ends up locked inside data storage systems or individual documents, forcing companies to adopt inefficient and duplicative business processes. The data might be located in a database that employees either aren't aware of or don't know how to access, in a text document that a co-worker has stored on her PC, or even somewhere on the Internet. To address this issue, Microsoft is evolving the existing document paradigm for Office by broadly supporting XML, an open and widely accepted W3C standard, in its products.
With the massive support for XML in Office 11, we're adding two highly significant pieces of functionality to the existing rich benefits that Office already affords. First, customers now have the option of saving any Microsoft Word document, Excel spreadsheet, Visio diagram, or Access database as XML, which allows those documents to be shared across the organization and via XML Web services. But as significant as this new functionality is, an even greater and more innovative benefit is the fact that companies now can actually create their own XML schemas specific to their business, and define the structure and type of data that each data element in a document can contain. This capability opens up a whole new realm of possibilities, not only for end users, but also for the business itself because now organizations can capture and reuse critical information that in the past has been lost or gone unused.
Being able to define your own schemas is a critical business advantage, because no one knows what kind of information your company needs to have available at the click of a button better than you do. For example, a medical clinic needs to be able to define and organize information that is entirely different from the type of information that a financial institution needs. By making it possible for businesses to capture the kind of information they need in a richer, more semantic and structured way, we're enabling businesses to work with information in whatever way makes sense to their organization.
PressPass: Can you elaborate on what schemas are, and on their connection to XML?
Paoli: Finding a way to describe the actual meaning contained in a document has been a central focus of the XML community for nearly 20 years, going back to when the technology was known as SGML (Standard Generalized Markup Language). Traditionally, a document doesn't include any information about what its content means. All that's captured is the content's styling -- its size, whether the words are bold or italicized, the font, and so on. For example, a resume doesn't "know" it's a resume -- it's just a collection of words that only has meaning when a human being interprets it that way. Those of us in the XML field have long believed that if we could separate the actual content, or meaning, from the presentation of a document, then users would be able to "tag" parts of their document with labels that mean something to them. So in a resume, for instance, a user could tag the name, address, career goals, qualifications, and so on. In this way, documents of any kind could become a source of information as rich as a database.
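To make the resume example concrete, here is a minimal sketch in Python, assuming a hypothetical set of tag names (`name`, `address`, `careerGoal`, `qualification`) that a company might choose; none of them come from Office itself. Because each region is labeled by meaning rather than by formatting, a program can read the fields directly by tag.

```python
import xml.etree.ElementTree as ET

# A hypothetical resume whose regions are tagged by meaning, not by font or size.
RESUME_XML = """
<resume>
  <name>Jane Doe</name>
  <address>1 Main Street, Redmond, WA</address>
  <careerGoal>Lead an XML integration team</careerGoal>
  <qualifications>
    <qualification>XML 1.0</qualification>
    <qualification>XSLT</qualification>
  </qualifications>
</resume>
"""


def read_resume(xml_text: str) -> dict:
    """Read the tagged regions of a resume by tag name (tag names are hypothetical)."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.findtext("name"),
        "address": root.findtext("address"),
        "career_goal": root.findtext("careerGoal"),
        # iter() walks all descendants, so nested qualifications are found too.
        "qualifications": [q.text for q in root.iter("qualification")],
    }


if __name__ == "__main__":
    print(read_resume(RESUME_XML))
```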
Unlike HTML (HyperText Markup Language), which has fixed tags that force users to fit data into general categories that describe only the appearance of text, XML tags are truly extensible. This allows the creation of what we call "semi-structured" documents, or documents that have regions of meaning, in the same way that columns in a database have meaning. As a result, users are able to define the structure and the type of content that each data element in a document can contain. These definitions are called schemas, and they are often referred to as XSDs because they are written using a standard defined by the W3C called the XML Schema Definition Language.
Because each company knows best what type of data it needs to capture, it can define for itself whatever XSDs are most relevant to its own business. These schemas can be totally internal and unique to a company. But in some industries, it makes more sense to create schemas that are used among multiple companies or organizations. One example of such an industry standard is XBRL, which stands for eXtensible Business Reporting Language, an open specification that uses XML schema to describe financial information. Another is HL7, the Health Level 7 specification developed by one of several ANSI-accredited Standards Developing Organizations that describes healthcare information. Having these standard schemas in place allows different organizations to easily share information, even if they're using completely different technologies in their respective systems, and creates some great communication and business efficiencies.
At Microsoft, we've worked very hard to bring XML and all its functionality to the hands of the masses by building the technology into Office, the world's leading suite of productivity software. Because we've built XML into the heart of the Office applications, customers can define their own unique regions of meaning within their documents, such as names and addresses in a resume. A program can then find regions about the same subject in multiple documents, extract them, and, say, build a list, such as all the names and addresses of a set of job applicants. Once extracted, that information can be reused, repurposed, indexed or reassembled for any other medium that supports XML, regardless of the server, application, or platform.
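The extraction step described above might look something like the following sketch, which assumes that every applicant resume uses the same hypothetical `name` and `address` tags. It uses Python's standard `xml.etree.ElementTree` module, not any Office API.

```python
import xml.etree.ElementTree as ET

# Hypothetical applicant resumes that all follow the same company-defined tags.
RESUMES = [
    "<resume><name>Jane Doe</name><address>1 Main St, Redmond</address></resume>",
    "<resume><name>John Roe</name><address>22 Pine Ave, Seattle</address>"
    "<careerGoal>Analyst</careerGoal></resume>",
]


def build_applicant_list(documents):
    """Collect the name and address region from each tagged document."""
    applicants = []
    for doc in documents:
        root = ET.fromstring(doc)
        # Regions the list does not need (such as careerGoal) are simply ignored.
        applicants.append((root.findtext("name"), root.findtext("address")))
    return applicants


if __name__ == "__main__":
    for name, address in build_applicant_list(RESUMES):
        print(f"{name}: {address}")
```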
The potential is huge for customer-defined schemas to transform the way we interact with and use information, whether it's on our desktops, on another local server, or across the world via a Web service. Now that the data is precisely defined in regions of a document, software can begin to actually understand that information and help end users interact with it far more than ever before. And because information can now be stored in a format that is open, standardized, and semantically rich, there's great potential to evolve the Web into a huge worldwide database of knowledge, making it much easier to search, reuse, aggregate, and reassemble information.
PressPass: Will all files that get created with Office 11 applications be XML files?
Paoli: We want to give our customers the choice to decide what file format they want to use, down to the XML schema they employ. Some customers may not want to use XML -- they may prefer working in the existing Office file formats. The binary formats -- such as .doc or .xls -- that have served our customers well will continue to be the default file formats. Others may want to use a specific XML schema, like XBRL. We leave the choice up to them.
PressPass: Can you talk about how XML is supported in the individual Office applications?
Paoli: Now that support for XML in Office is so broad, it can be confusing for people to decide which tool they should be using. Because the applications themselves function as XML editing tools, companies and their employees just need to get in the habit of first analyzing the task at hand and then deciding which tool best suits their needs.
So, let's start with Word. In instances when you need to create a long document with large areas of formatted text, such as a corporate report, resume, customer letter, or marketing plan, Word remains the best choice. With Word in "Office 11," users have several options that determine how extensible the document will be. The document can be saved in the traditional .doc format or in Word's native XML format, which will enable the document to be viewed or repurposed on any device, across platforms, systems, and applications -- a great advantage.
But the more exciting benefit is the fact that an XSD file can be easily associated with a Word document to create a template. An XSD file defines how the information that end users create will fit into your custom-defined schema. Then, when end users work with the template, they'll be creating a document that contains information marked by XML tags that make it easier to capture and reuse the information. At the same time, the user won't even know the XML tags are there -- they don't need to know or understand anything about XML to get the benefits.
Let's use the example of a corporate profile report to get a sense of how this works. If you create a corporate report in Word, you can save it in XML so that it can be read or indexed by any application, on any platform, via any device. But if you created the report using an XSD-enabled template, you have two options. You could either save it using only the tags defined by the XSD, or as a combined file with the tags defined by the XSD interwoven with Word's native XML tags, to preserve its appearance. With the second approach, you could send this profile to a securities regulatory organization, and people there would not only be able to view the document, but they could also build an XML Web service to extract the relevant information from the document and enter it into their own database automatically, without need for user intervention and without having to rely on Word to open the file.
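The following sketch illustrates how a receiving organization might pull only the business data out of such a combined file. It assumes two hypothetical namespaces, one standing in for Word's formatting markup and one for the company's corporate-profile schema; the namespace URIs and element names are placeholders, not Word's actual vocabulary. The point is that the custom tags can be located by namespace alone, without opening the file in Word.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are hypothetical placeholders, not Word's real namespaces.
PRESENTATION_NS = "urn:example:presentation"
PROFILE_NS = "urn:example:corporate-profile"

# Formatting tags (p:) and business-data tags (c:) interwoven in one file.
COMBINED_DOC = f"""<doc xmlns:p="{PRESENTATION_NS}" xmlns:c="{PROFILE_NS}">
  <p:para><p:bold>Company:</p:bold> <c:companyName>Contoso Ltd.</c:companyName></p:para>
  <p:para>Fiscal year <c:fiscalYear>2002</c:fiscalYear>,
    revenue <c:revenue currency="USD">1200000</c:revenue></p:para>
</doc>"""


def extract_profile(xml_text):
    """Return only elements from the corporate-profile namespace, skipping formatting markup."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{uri}localname".
    prefix = "{" + PROFILE_NS + "}"
    data = {}
    for elem in root.iter():
        if elem.tag.startswith(prefix):
            data[elem.tag[len(prefix):]] = elem.text
    return data


if __name__ == "__main__":
    print(extract_profile(COMBINED_DOC))
```

In a real deployment the extracted dictionary would be written to the receiving organization's database; that step is omitted here.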
PressPass: So how does XML improve how users access the kind of data people generally view and edit in Excel, which is already in a tabular format?
Paoli: If you need to import and analyze or extract information that would be best represented in a grid or a tabular format, Excel is still the best desktop tool. Excel 11 now makes it easy to feed XML data into your spreadsheets. This is a huge benefit, because now customers can harness the full power of Excel for analyzing data from any XML file -- old or new, and belonging to any schema.
Just like in Word, Excel 11 gives you an easy, visual tool for relating an XSD schema to a spreadsheet. Once you've created that relationship, you can then import XML files based on that schema into Excel with a single click. The data lands in the appropriate columns and cells, conforming to what you specified with the drag-and-drop tool. You can even "retrofit" an XML relationship into an existing spreadsheet without having to change your formulas or charts.
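As a rough illustration of what such a mapping does, the sketch below flattens repeating XML records into rows and columns and writes them out as CSV. The `COLUMN_MAP` table plays the role of the relationship created with Excel's visual tool; the element names and data are hypothetical, and this is plain Python, not Excel's import mechanism.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sales data; element names are illustrative only.
SALES_XML = """<sales>
  <sale><region>West</region><product>Widgets</product><amount>1500</amount></sale>
  <sale><region>East</region><product>Gadgets</product><amount>900</amount></sale>
</sales>"""

# The "map": which XML child element lands in which column, in column order.
COLUMN_MAP = [("Region", "region"), ("Product", "product"), ("Amount", "amount")]


def xml_to_rows(xml_text, record_tag="sale"):
    """Flatten each repeating record into one row, following COLUMN_MAP."""
    root = ET.fromstring(xml_text)
    header = [column for column, _ in COLUMN_MAP]
    rows = [[rec.findtext(tag) for _, tag in COLUMN_MAP] for rec in root.iter(record_tag)]
    return [header] + rows


def rows_to_csv(rows):
    """Serialize the grid as CSV text, e.g. for opening in a spreadsheet."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(xml_to_rows(SALES_XML)))
```

Keeping the map separate from the data is what lets the same layout be reused each time a new batch of XML arrives.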
An XML-enabled template can then be opened by end users who don't have any knowledge of XML. They can use the template to view and analyze data and re-use the template when new batches of data become available. Furthermore, users can enhance the template by adding new formulas or charts that reference the imported data, and those enhancements will not break the XML relationship. So if the user needs to see a pie chart of sales figures rather than a bar chart, they can modify the chart using normal Excel tools, and from then on when they import the XML data they will see their preferred chart type.
Lastly, Excel 11 has also enabled one-click export of XML data. For instance, if you need to publish your projected sales figures in an XML format on a monthly basis, and you calculate those projections in Excel, you can use the visual tool to relate the sales projections XSD schema to your existing spreadsheet. Now each month when you run the numbers, you can export the results in the proper XML schema with a single click.
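The export direction can be sketched the same way. This example assumes a hypothetical `salesProjections` schema with one `projection` element per region, and a made-up 5% growth rate for the projection itself; it simply turns computed figures back into XML.

```python
import xml.etree.ElementTree as ET


def export_projections(month, projections):
    """Write {region: projected_amount} as XML under a hypothetical schema."""
    root = ET.Element("salesProjections", month=month)
    # Sort regions so the output is stable from month to month.
    for region, amount in sorted(projections.items()):
        item = ET.SubElement(root, "projection", region=region)
        item.text = f"{amount:.2f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    actuals = {"West": 1500.0, "East": 900.0}
    growth = 1.05  # hypothetical 5% growth assumption
    projected = {region: amount * growth for region, amount in actuals.items()}
    print(export_projections("2002-12", projected))
```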
PressPass: And how about other Office applications?
Paoli: If you're integrating information coming from a database into a diagram, then Visio is the appropriate Office tool. For instance, you can save a Visio diagram as XML and also incorporate data coming from any XML source. This can be done by creating a template with a Visio Solution that uses XML Web Services or with third-party XML tools. Visio can persist corporate data in a Visio XML file, using a corporate schema so that the Visio XML document can be mined to retrieve the data from within the diagram.
For extracting data from one or more tables in a database, Microsoft Access is the appropriate Office tool. For instance, say you need to extract the information about one customer's order from a database. With Access, you can simply browse related tables in your database, and then choose how you want to export the customer data by defining the structure of an XSD. That way you can export portions of the data from the database into an XML file following the schema you defined, and then email that file as an attachment to a person or automated process further down the production channel.
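A minimal sketch of this kind of export, using Python's built-in `sqlite3` module as a stand-in for an Access database: two related tables are joined, and one customer's orders are written out as XML. The table layout, element names, and sample rows are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_sample_db():
    """Build a small in-memory database with two related tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT, qty INTEGER);
        INSERT INTO customers VALUES (1, 'Contoso Ltd.'), (2, 'Fabrikam Inc.');
        INSERT INTO orders VALUES (10, 1, 'Widgets', 40), (11, 1, 'Gadgets', 5), (12, 2, 'Widgets', 7);
    """)
    return conn


def export_customer_orders(conn, customer_id):
    """Join customers to orders and emit one customer's orders as XML."""
    rows = conn.execute(
        "SELECT c.name, o.id, o.item, o.qty FROM customers c "
        "JOIN orders o ON o.customer_id = c.id WHERE c.id = ? ORDER BY o.id",
        (customer_id,),
    ).fetchall()
    root = ET.Element("customerOrders", customer=rows[0][0] if rows else "")
    for _, order_id, item, qty in rows:
        order = ET.SubElement(root, "order", id=str(order_id))
        ET.SubElement(order, "item").text = item
        ET.SubElement(order, "quantity").text = str(qty)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(export_customer_orders(make_sample_db(), 1))
```

The resulting string could then be saved to a file and sent on as an attachment, as described above.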
To gather business-critical data typically created in documents such as sales reports, inventory updates, project memos, travel itineraries, and performance reviews, the most appropriate application is XDocs -- a new Office family application. XDocs can be thought of as a hybrid tool, which combines the best of a traditional document editing experience, such as a word processor or email program, with the rigorous data-capture capabilities of forms.
So you can open any XSD file and associate it with what we call an Editing View, which is defined using XSLT, or eXtensible Stylesheet Language Transformations, a standard defined by the W3C. In this view, end users can modify abstract data structures using familiar tools like rich text formatting and table and picture support. When they save the file, they can save it as an XML file belonging to the schema that's been defined.
Finally, XML can make it easier to transition content from internal systems onto the Web using FrontPage. A user might author Master-Detail reports showing live XML data on a Web site coming from an enterprise resource planning system. FrontPage lets you define how XML documents that follow any customer-defined schema will be formatted on a Web page by authoring XSLTs (eXtensible Stylesheet Language Transformations) directly within the FrontPage editor.
FrontPage supports a complete set of "WYSIWYG" tools for creating and editing XSLT Data Views on a variety of data sources including XML files, databases, and XML SOAP services. These Data Views include industry-standard reporting tools for sorting, grouping, filtering, and conditionally formatting data. Users can create high-quality, dynamic Web pages for presenting corporate data using these tools.
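FrontPage builds Data Views with XSLT. Python's standard library has no XSLT processor, so the sketch below performs the same kind of steps (filtering, sorting, and conditional formatting of XML data into an HTML table) directly in Python instead. The element names, thresholds, and data are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical order data, e.g. as exported from a back-end system.
ORDERS_XML = """<orders>
  <order><customer>Contoso</customer><total>1200</total></order>
  <order><customer>Fabrikam</customer><total>300</total></order>
  <order><customer>Northwind</customer><total>50</total></order>
</orders>"""


def render_data_view(xml_text, min_total=100, highlight_over=1000):
    """Filter, sort, and conditionally format order records as an HTML table."""
    root = ET.fromstring(xml_text)
    records = [(o.findtext("customer"), float(o.findtext("total"))) for o in root.iter("order")]
    # Filter out small orders, then sort by total, largest first.
    kept = sorted((r for r in records if r[1] >= min_total), key=lambda r: r[1], reverse=True)
    lines = ["<table>", "<tr><th>Customer</th><th>Total</th></tr>"]
    for customer, total in kept:
        # Conditional formatting: bold rows above the highlight threshold.
        style = ' style="font-weight:bold"' if total > highlight_over else ""
        lines.append(f"<tr{style}><td>{html.escape(customer)}</td><td>{total:.2f}</td></tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_data_view(ORDERS_XML))
```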
PressPass: Can you give us an example of how an "Office 11" solution might work across a business environment?
Paoli: Sure. Let's look at how information entered by a sales team can be reused for various processes, both inside and outside the company, such as the creation of a financial report.
Throughout the year, sales staff may capture customer information in a form created using XDocs. Because that information is captured in a template structured by a schema the company created, a sales manager can compile this information together with information on customer orders from an Access database into an Excel spreadsheet. She can then create graphs and charts that help her gain valuable insights from the analysis of the data.
In Word, a financial officer may then draft the body of the report, attach to it a Visio XML diagram, and send it to the securities regulatory organization. The organization can now go and extract the appropriate information from the profile and integrate it in its own database. Finally, that same financial information may later be displayed on the company's Web site using FrontPage.
At each step, additional information coming from other software products could be incorporated in the process, as long as that information is written in XML, following the customer-defined schema.
PressPass: What do you see as the future of XML and Office?
Paoli: We are very excited about the potential XML in "Office 11" holds for jump-starting new business scenarios for our customers, and we look forward to learning how they will be using Office so that we can continue to evolve the product to better suit their needs.
A few years ago, Microsoft created .NET -- new software for connecting information, people, systems and devices. Since then, we've built our entire strategy around broad support for XML, which will allow users to connect to diverse systems and platforms and access information any time, from any place, and on any device. We really believe that Office can become a great front end for XML Web services, and help generate and consume the customer data that is at the center of our strategy. And the most important element that will make this vision a reality is the broad support for XML and customer-defined schemas that we've built into "Office 11."
I'm thrilled about and very proud of the great feedback we've received from the XML community around "Office 11" so far. The advent of XML on the desktop is a great achievement, and it's something that we in the XML community have been looking forward to for a long time.
[Source: http://www.microsoft.com/presspass/features/2002/nov02/11-12XMLOffice.asp]
Prepared by Robin Cover for The XML Cover Pages archive. See "Microsoft Office 11 and XDocs."