[This local archive copy mirrored from the canonical site: http://www.microsoft.com/xml/xmlfaq.htm, 980213; links may not have complete integrity, so use the canonical document at this URL if possible.]
Last updated: January 8, 1998
What is XML?
Does XML replace HTML?
What are the benefits of adding XML to HTML?
Where will XML be used on the Web?
Does Microsoft® Internet Explorer 4.0 support XML?
What is the difference between SGML and XML?
What is the relationship between HTML, Dynamic HTML, and XML?
Will it be necessary to compress XML for transmission over the Web?
How secure is XML as a data format? Are there plans to add security to XML?
How will XML be generated from existing databases?
What is a DTD? What is it used for?
Do Web developers have to include a DTD when they use XML to describe data?
What are XML schemas? How are they different from DTDs?
What are namespaces? Why are they important?
What is XSL? What does it let Web developers do that they can't do today?
How is XSL different from Cascading Style Sheets? Why is a new stylesheet language needed?
What is the relationship between XML and the World Wide Web Consortium?
What is the status of XML with the W3C?
What is the status of CDF in the W3C?
What is the status of XSL in the W3C?
XML Vocabularies and Data Formats
What are XML vocabularies?
What is CDF?
What is OSD?
What is OFX?
What is RDF?
Does Netscape Navigator support XML?
What tools support XML today?
What companies have promised XML support in their products in the near future?
Where will the tools come from in the future?
Extensible Markup Language (XML) is the universal language for data on the Web. It gives developers the power to deliver structured data from a wide variety of applications to the desktop for local computation and presentation. XML allows the creation of unique data formats for specific applications. It is likewise an ideal format for server-to-server transfer of structured data.
No. XML complements HTML by describing data. HTML, along with Cascading Style Sheets (CSS), will continue to be used to describe and present the physical rendition of pages. Microsoft Corp. expects many authors and developers to use XML and HTML in tandem.
There are many benefits of using XML on the Web:
Since XML describes data in a consistent, self-describing, open format, XML could potentially be used anywhere there is a need for data interchange and delivery. We expect that initially XML will be used to describe information about HTML pages, as is the case today with Channel Definition Format (CDF) for building Active Channel content, and in future applications such as searching and distributed printing.
More important, since XML can describe data itself, it will be useful for delivering any kind of data such as financial transactions, news updates, weather information, patient records, or legal libraries to the desktop. Once on the desktop, applications can compute on the data and dynamically present the data.
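As a rough illustration of computing on delivered data, the following Python sketch parses a small XML document and derives a value from it locally. The element and attribute names (`weather`, `reading`, `city`, `high`, `low`) are invented for this example and do not come from any published vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical weather data; every name below is an assumption for illustration.
WEATHER_XML = """\
<weather>
  <reading city="Seattle" high="48" low="39"/>
  <reading city="Boston" high="35" low="22"/>
  <reading city="Austin" high="64" low="45"/>
</weather>"""


def average_high(xml_text):
    """Parse the XML text and return the mean of the 'high' attributes."""
    root = ET.fromstring(xml_text)
    highs = [int(reading.get("high")) for reading in root.findall("reading")]
    return sum(highs) / len(highs)


if __name__ == "__main__":
    print(average_high(WEATHER_XML))
```

Because the data arrives as structure rather than as formatted text, the receiving application can compute on it without another request to the server.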
Yes, Internet Explorer 4.0 supports XML. It supports the following features:
A generalized XML parser, which reads XML files and hands them off for processing to applications, such as viewers. Microsoft has two parsers: the MSXML Parser, a high-performance, nonvalidating parser written in C++ that ships with Internet Explorer 4.0, and the MSXML Parser in Java, available for download from http://www.microsoft.com/xml/parser/.
The XML Object Model (XML OM) uses the W3C standard Document Object Model (DOM) to allow programmatic access to the structured data, through the XML parsers, giving developers the power to interact and compute on the data. For more information on the DOM, see http://www.w3.org/DOM/ .
The XML Data Source Object (XML DSO) allows developers to connect to XML data and supply it to the HTML page using Dynamic HTML's data binding facility.
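The kind of programmatic access an object model provides can be sketched with Python's `xml.dom.minidom`, a small DOM implementation in the Python standard library. This is not the Microsoft XML OM, and the `ORDER` and `ITEM` element names are assumptions made for the example.

```python
from xml.dom import minidom

# Hypothetical purchase order; element names are assumptions for illustration.
ORDER_XML = "<ORDER><ITEM>Pencils</ITEM><ITEM>Paper</ITEM></ORDER>"


def item_names(doc):
    """Return the text of every ITEM element in document order."""
    return [node.firstChild.data for node in doc.getElementsByTagName("ITEM")]


def add_item(doc, name):
    """Append a new ITEM element containing the given text to the root element."""
    item = doc.createElement("ITEM")
    item.appendChild(doc.createTextNode(name))
    doc.documentElement.appendChild(item)


if __name__ == "__main__":
    doc = minidom.parseString(ORDER_XML)
    add_item(doc, "Ink")
    print(item_names(doc))
```

The same pattern of reading and modifying nodes through an object tree is what lets a page interact with structured data after it has been parsed.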
SGML (ISO 8879), or the Standard Generalized Markup Language, is the international standard for defining descriptions of structure and content in electronic documents. XML is a simplified version of SGML; XML was designed to maintain the most useful parts of SGML. Whereas SGML requires that structured documents reference a Document Type Definition (DTD) to be "valid," XML allows for "well-formed" data and can be delivered without a DTD. XML was designed so that SGML can be delivered, as XML, over the Web.
HTML and CSS are used together today to format and present hyperlinked pages. Dynamic HTML, through the Document Object Model, makes all elements in HTML accessible through language-independent scripting and other programming languages, thus dramatically increasing client-side interactivity without additional requests to the server. The page's object model allows any aspect of its content (including additions, deletions, and movement) to be changed dynamically. By adding XML for structured data, developers have the technologies needed to build the next generation of rich Web applications. With it, they can deliver structured data to the desktop, compute on the data via the object model, apply formatting rules to the data with Extensible Style Language (XSL), and present the data with HTML.
In general, the need to compress XML data will be application-dependent and largely a function of the amount of data being moved between the server and the client. XML compresses extremely well due to the repeated nature of the tags used to describe the structure of the data. Benchmarks will be provided in the future to assist with determining whether compression is necessary. It is worth noting that compression is standard to HTTP 1.1 servers and clients, and XML will automatically benefit from this.
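The effect of repeated tags on compression can be checked informally. The sketch below builds a synthetic document with invented element names and measures how much Python's `zlib` (a DEFLATE implementation) shrinks it. It is not a benchmark, and real documents with more varied content will compress less.

```python
import zlib


def compression_ratio(xml_text):
    """Return compressed size divided by original size for UTF-8 encoded text."""
    raw = xml_text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)


# Synthetic, highly repetitive sample; element names are assumptions.
_records = "".join(
    f"<record><id>{i}</id><status>ok</status></record>" for i in range(1000)
)
SAMPLE = f"<records>{_records}</records>"

if __name__ == "__main__":
    print(f"compressed to {compression_ratio(SAMPLE):.1%} of original size")
```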
XML is as secure as HTML. Just as HTTPS can be used to add encryption to HTTP, protecting HTML, it can be used to protect XML. XML represents structured data in a text-based format, which maximizes simplicity and interoperability. A number of steps can be taken to add security and authentication to the XML format. First, XML can be encrypted on the server before transmission to the client, then decrypted on the client. In addition, XML can be authenticated by digital signatures applied to the data itself.
In general, this will be handled using a three-tier architecture. Agents will be built to run on the middle tier to access multiple existing DBMSs and output XML. These agents will also support the ability to generate XML updategrams bidirectionally, to inform the client of changes made to the data on the middle tier or database server, and vice versa. Consequently, the agents will be able to receive updategrams from the client and send updates to the DBMS.
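A middle-tier agent of this kind might, in its simplest form, read rows from a database and emit them as XML. The sketch below uses an in-memory SQLite database as a stand-in for an existing DBMS. The table, column, and element names are assumptions, and it does not attempt to model updategrams.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, table, row_tag):
    """Serialize every row of a table as XML, one child element per column.

    The table name is interpolated into the SQL, so it must come from
    trusted code, never from user input.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [desc[0] for desc in cur.description]
    root = ET.Element(table)
    for row in cur:
        record = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


def demo():
    """Build a tiny hypothetical 'orders' table and return it as XML text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, "Pencils"), (2, "Paper")]
    )
    return rows_to_xml(conn, "orders", "order")


if __name__ == "__main__":
    print(demo())
```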
The Document Type Definition (DTD) describes the valid syntax of a class of XML documents. That is, it lists a number of element names, defines which elements can appear in combination with which other ones, what attributes are available for each element type, etc. A DTD file uses a different syntax from that used by XML documents.
No. XML can be used to describe data with or without a DTD. The term "valid XML" refers to XML data that references a DTD, while "well-formed" XML refers to XML that does not use a DTD. The addition of well-formed XML is one of the fundamental differences between XML and SGML. Clearly, in both cases, the XML itself must conform to the standards for the language (so, for example, all tags must be closed and tags may not overlap).
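The well-formedness rules mentioned above (closed tags, no overlapping tags) can be tested with any XML parser. Here is a minimal sketch using Python's `xml.etree.ElementTree`, which checks well-formedness only and does not validate against a DTD. The helper name `is_well_formed` is introduced for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed("<a><b>text</b></a>"))   # properly nested
    print(is_well_formed("<a><b>text</a></b>"))   # overlapping tags
    print(is_well_formed("<a><b>text</a>"))       # unclosed tag
```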
Schemas combine concepts from DTDs, relational databases, and object-oriented design. Schemas can describe the structure of XML documents, databases, directed labeled graphs, and other similar organizations of data. Schemas supply additional semantic information to documents, and contain new facilities such as data types, inheritance, and extensibility that are not available in DTDs. Schemas use the same syntax as XML documents. Schema components are reusable through the facility of "XML namespaces."
Namespaces are another advanced feature of XML, outlined in a W3C note as part of the XML 1.0 specification. They allow developers to qualify uniquely the element names and relationships and make these names recognizable, to avoid name collisions on elements that have the same name but are defined in different vocabularies. They allow tags from multiple namespaces to be mixed, which is essential if data is coming from multiple sources.
For example, a bookstore may define the <TITLE> tag to mean the title of a book, contained only within the <BOOK> element. A directory of people, however, might define <TITLE> to indicate a person's position, for instance: <TITLE>President</TITLE>. Namespaces help define this distinction clearly.
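The bookstore and directory example can be sketched in Python. The namespace URIs and the `catalog` wrapper element are hypothetical, and the `xmlns:` attribute syntax shown is the one Python's parser understands, which may differ from the draft syntax that was current when this document was written.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, chosen only for illustration.
BOOKS = "http://example.com/books"
PEOPLE = "http://example.com/people"

DOC = f"""\
<catalog xmlns:b="{BOOKS}" xmlns:p="{PEOPLE}">
  <b:BOOK><b:TITLE>Moby-Dick</b:TITLE></b:BOOK>
  <p:PERSON><p:TITLE>President</p:TITLE></p:PERSON>
</catalog>"""


def titles(xml_text, namespace_uri):
    """Return the text of TITLE elements belonging to one namespace only."""
    root = ET.fromstring(xml_text)
    # ElementTree names qualified elements as "{namespace-uri}localname".
    return [el.text for el in root.iter(f"{{{namespace_uri}}}TITLE")]


if __name__ == "__main__":
    print(titles(DOC, BOOKS))
    print(titles(DOC, PEOPLE))
```

Both elements are called TITLE, but a program can ask for one meaning or the other without ambiguity.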
XSL is a stylesheet language that defines the rules for mapping structured XML data to, for example, HTML for presentation. A group of these rules defines a stylesheet. With XSL, developers can generate a presentation structure that may be quite different from the original data structure. XSL allows an element to be formatted and displayed in multiple places on a page, rearranged or removed from display. For example, an <ITEM> element described in an XML-based purchase order could be presented in HTML in a list <UL> or in a table <TD>. Many stylesheets can exist for one set of data, describing various delivery platforms or output devices.
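The idea of mapping one data structure to several presentations can be approximated in ordinary code. The following Python sketch is not XSL and does not use XSL syntax. It only imitates the outcome described above by rendering the same hypothetical purchase order as a list and as a table.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order; only ITEM is taken from the example above.
PO_XML = "<ORDER><ITEM>Pencils</ITEM><ITEM>Paper</ITEM></ORDER>"


def items_as_list(xml_text):
    """Render each ITEM as an LI inside a UL."""
    root = ET.fromstring(xml_text)
    ul = ET.Element("UL")
    for item in root.iter("ITEM"):
        ET.SubElement(ul, "LI").text = item.text
    return ET.tostring(ul, encoding="unicode")


def items_as_table(xml_text):
    """Render each ITEM as a single-cell row inside a TABLE."""
    root = ET.fromstring(xml_text)
    table = ET.Element("TABLE")
    for item in root.iter("ITEM"):
        row = ET.SubElement(table, "TR")
        ET.SubElement(row, "TD").text = item.text
    return ET.tostring(table, encoding="unicode")


if __name__ == "__main__":
    print(items_as_list(PO_XML))
    print(items_as_table(PO_XML))
```

A stylesheet language expresses such mappings declaratively as rules, so the same data can be paired with different stylesheets instead of different programs.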
XSL is compatible with CSS and is designed to handle the new capabilities of XML that CSS can't handle. XSL is derived from Document Style Semantics and Specification Language (DSSSL), a complex stylesheet language with roots in the SGML community. The syntax of XSL is quite different from CSS, which could be used to display simple XML data but isn't general enough to handle all the possibilities generated by XML. XSL adds the capability to handle these possibilities. For instance, CSS cannot add new items or generated text (for instance, to assign a purchase order number) or add a footer (such as an order confirmation). XSL allows for these capabilities. (For more information, see http://www.w3.org/ .)
The W3C has an active XML Working Group. Microsoft was one of the co-founders of this group in June 1996, and since then numerous industry players have joined, including Netscape Communications Corp. For more information on the XML standards process, see http://www.w3.org/ .
XML version 1.0 has just moved from the working draft phase to the proposed recommendation phase, the last step in the approval process before becoming a W3C recommendation. For more information on the current XML specification, and on the submission and review process within the W3C, please refer to http://www.w3.org/ .
Channel Definition Format (CDF), an application based on XML, was recently resubmitted to the W3C. This resubmission of CDF takes advantage of some of the recent advances in the XML world at the W3C, namely XML/RDF (Resource Description Framework). For example, it includes an RDF diagram of the CDF vocabulary showing the relationships between various elements within CDF. For more information, see http://www.w3.org/metadata/rdf/overview.html .
The XSL specification was jointly submitted to the W3C by Microsoft, ArborText Inc., and Inso Corp. in September 1997 and is now under review. For more information, see http://www.microsoft.com/standards/xsl/.
XML vocabularies are the elements used in particular applications or data formats -- the definitions of the meanings of those formats. For example, in CDF, element names such as <Schedule>, <Channel>, and <Item> make up the vocabulary for describing collections of pages, when these pages should be downloaded, etc. Vocabularies, along with the structural relationships between the elements, are defined in XML DTDs or XML schemas.
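To make the notion of a vocabulary concrete, the sketch below lists the distinct element names used in a document. The fragment reuses the element names mentioned above but is invented for illustration. It is not a real CDF file and omits the attributes and content an actual channel definition would carry.

```python
import xml.etree.ElementTree as ET

# Invented fragment using the element names from the text; not actual CDF.
FRAGMENT = "<Channel><Schedule/><Item/><Item/></Channel>"


def vocabulary(xml_text):
    """Return the sorted set of element names that appear in the document."""
    root = ET.fromstring(xml_text)
    return sorted({element.tag for element in root.iter()})


if __name__ == "__main__":
    print(vocabulary(FRAGMENT))
```

A DTD or schema goes further than this list by also stating how the elements may be combined.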
Channel Definition Format is an XML-based data format used in the Microsoft Internet Explorer 4.0 browser, for describing Active Channel content and the Desktop Components. It is used by thousands of content developers and millions of end users to describe collections of pages and data about pages, such as channel bar display, download behavior, Web page usage, and page-hit logging. For more information on CDF, see http://www.microsoft.com/standards/cdf-f.htm.
Open Software Description (OSD) is an XML-based data format fully supported in Microsoft Internet Explorer 4.01, for advertising and installing software components over the Internet. When new versions of software become available, OSD provides a mechanism to notify the user (referred to as publishing). In addition, OSD provides the functionality to describe in great detail how to install ActiveX Controls and Java packages and class files, adding functionality to the use of .INFs for setup. Microsoft and Marimba Inc. submitted this specification to the W3C in August 1997. For more information, see http://www.microsoft.com/standards/osd/.
Open Financial Exchange (OFX) is a data format that Microsoft Money and Intuit Quicken personal finance applications use to communicate with financial institutions over the Web. Although it is currently described using SGML, OFX will soon be based on XML.
RDF, or Resource Description Framework, is a future XML-based application being developed in the W3C. It brings together ideas from Meta Content Format, or MCF (technology acquired by Netscape from Apple Computer Inc.), and XML-Data (defined in a position paper written by Microsoft, Inso, ArborText, and other experts).
RDF allows for generalized searching of information without application-specific rules, such as those defined in DTDs. RDF allows a complementary view of data through graphs and nodes, rather than through a structured tree, which the current XML technology enables. RDF, together with XML schemas, will provide a standard way for developers to write these relationships down for broad classes of XML elements.
The crucial technologies that will deliver value this year and next are XML for structured data, XML namespaces to make names unique and recognizable, and new XML tags that add meaning to data so smarter search engines can perform better searches.
No. Netscape has recently talked about support for XML, and the company recently joined the XML Working Group in the W3C, but it has referred to XML as a "futures" technology, for release in 1998. Microsoft supports XML today in Internet Explorer 4.0.
Many of the top SGML vendors have made generalized XML versions of their products available for authoring, editing, and database publishing, including ArborText Adept7 (http://www.arbortext.com/ ), Inso Dynabase (http://www.inso.com/ ), Chrystal Software Astoria (http://www.chrystal.com/ ), and POET Object Server (http://www.poet.com/ ). Other vendors, such as DataChannel Inc. (http://www.datachannel.com/ ), have products based on XML for data management.
Allaire Corp., ExperTelligence Inc., InterMax Solutions Inc., Pictorius Inc., Powersoft, and SoftQuad Inc. recently committed to providing XML support in their products by March 1998.
Microsoft expects a wide variety of applications to be developed in the coming months that convert information currently stored in documents and databases into XML for delivery to the desktop. In addition, Microsoft expects XML-centric databases, rich authoring and application developer tools, as well as data format-specific tools such as wizards to be developed as new vocabularies are defined.