Frequently Asked Questions About XML

Microsoft Corporation
Updated: June 7, 2000

Contents

General Questions

What is XML?
What is MSXML?
What does the Microsoft XML Parser do?
What's the difference between MSXML, MSXML2, and MSXML3?
Does XML replace HTML?
What are the benefits of adding XML to HTML?
Is XML just for hard-core developers?
What do I need to get started with XML?
What are some real examples of how XML can be used?
Can I ignore XML?
Does Microsoft Internet Explorer 4.0 support XML?
What is the level of XML support in Internet Explorer 5.0?
How are HTML, Dynamic HTML, and XML related?
Why is XML so important?
What XML products does Microsoft provide?
Will it be necessary to compress XML for transmission over the Web?
How secure is XML as a data format? Are there plans to add security to XML?

Validation

What is a DTD and what is it used for?
Do Web developers have to include a DTD when they use XML to describe data?
What are XML schemas? How are they different from DTDs?
What are namespaces and why are they important?

XSLT and XPath

What is XSLT?
What's the difference between XSL, XQL, XSL Patterns, and XSLT?
What is XPath?
Why is XSLT so important to XML?
What's the difference between XSLT and CSS? Aren't they both style sheets?

Standards

How compliant is Microsoft with the XML standards?
What is the relationship between XML and the World Wide Web Consortium (W3C)?
What is the status of XML with the W3C?
What is the status of DOM in the W3C?

Tools Support

Do SQL Server and ADO support XML?
Are there any Microsoft tools at the moment that can help me leverage XML quickly?
What is SOAP?
How does XML fit into the Microsoft Windows® Distributed InterNet Applications (Windows DNA) strategy for building three-tier, Web-enabled applications?

Issues and Solutions

Why is my document object still empty after I call the Load() method?
How do I load a document with foreign and special characters?
How do I use MSXML COM components in Visual C++ 6.0?
How do I use HTML Entities in my XML?
How is white space handled in element content?
How is white space handled for attributes?
How is white space handled in the XML object model?
What does the XML declaration do?
How do I print my XML document in a readable format?
How do I use namespaces in DTDs?
How do I use XMLDSO in Visual Basic?
How do I use the XML DOM with Java?

General Questions

What is XML?

Extensible Markup Language (XML) is the universal language for data on the Web. It gives developers the power to deliver structured data from a wide variety of applications to the desktop for local computation and presentation. XML allows the creation of unique data formats for specific applications. It is also an ideal format for server-to-server transfer of structured data.

What is MSXML?

MSXML is the Microsoft software component that provides core XML services.

What does the Microsoft XML Parser do?

The latest version of Microsoft's core XML services provides the following four distinct features.

All four features are contained in the same MSXML library package, which is available at no cost from the MSDN XML Developer Center.

What's the difference between MSXML, MSXML2, and MSXML3?

XML has undergone a number of iterations in the past three years, so it is perhaps not surprising that several versions of the Microsoft XML parser exist. Internet Explorer 4.0 contained an early version of the XML parser, one that predates XSL, XML-Data, and most other XML technologies (and that has a completely different object model). This early version of the parser is contained in the MSXML.dll library. You can upgrade your parser to a more current one from the MSDN XML Developer Center.

Upgrading to the new parser is highly recommended because it is far superior. Internet Explorer 5.0 includes the MSXML 2.0 parser, which contains preliminary versions of XSL and XML Schema. MSXML2 is the version of the parser that ships with SQL Server 2000. MSXML2 contains many performance-enhancing features and, in general, has improved performance and scalability. MSXML3 is the version that is currently shipping as a Technology Preview. MSXML3 includes XSLT and XPath support as well as SAX interfaces.

Does XML replace HTML?

XML offers more flexibility than HTML, but it's not likely that it will replace HTML any time soon. In fact, XML and HTML work quite nicely together. Microsoft expects many authors and developers to use XML and HTML in tandem, for example, by using XSLT to generate HTML.

What are the benefits of adding XML to HTML?

Here are some benefits of using XML on the Web:

Is XML just for hard-core developers?

No. Just like HTML documents, XML documents can be created by just about anybody—even someone without any programming experience. XML is just a standard way of describing information. Moreover, it is a language that can be written without the use of any specialized software. You can author an XML document in a text editor and drop it directly into your Web site without ever needing to write a line of code in the traditional sense.

What do I need to get started with XML?

To use XML, you need an XML parser that reads an XML document and makes its contents available for processing. Microsoft provides a parser that you can download from the MSDN XML Developer Center.

To create XML documents, you can use a text editor, such as Notepad, or any other editor you use for creating HTML pages. To create full-fledged XML applications, use a programming environment such as Microsoft® Visual Studio®.

What are some real examples of how XML can be used?

XML is being used in a surprisingly wide range of applications, from Web site creation and documentation to database integration and distributed programming. Just a few areas where XML has found its niche include:

Can I ignore XML?

Not if you want to compete in an Internet-based world. XML is a language that is causing a paradigm shift in the way that we think about programming itself. The traditional dedicated client/server application is giving way to "access anytime, anywhere" Internet services, and XML is a logical medium to handle everything from data access to form processing to presentation in this new environment.

Does Microsoft Internet Explorer 4.0 support XML?

Yes, Internet Explorer 4.0 supports XML with the following features:

What is the level of XML support in Internet Explorer 5.0?

Internet Explorer 5 provides the following XML support:

How are HTML, Dynamic HTML, and XML related?

HTML is used in conjunction with CSS to format and present hyperlinked pages. Dynamic HTML, through the DOM, makes all elements in HTML accessible through language-independent scripting and other programming languages, thus dramatically increasing client-side interactivity without additional requests to the server. The page's object model allows any aspect of its content (including additions, deletions, and movement) to be changed dynamically.

By adding XML for structured data, developers have the technologies they need to build the next generation of rich, flexible Web applications. With XML, they can deliver structured data to the desktop and compute on the data with the XML Object Model. Today developers can display XML-based data in a browser, such as Microsoft Internet Explorer 4.0 and Microsoft Internet Explorer 5, or in other applications through scripting. In addition, they can also apply formatting rules to the data without complex scripting using XSLT stylesheets, which essentially transform the XML-based data into display. These two methods of displaying XML-based data make it possible to generate multiple views of complex data.

Why is XML so important?

XML is poised to become the future of computing. As a technology, its effects will permeate every aspect of programming, from embedded systems to graphical interfaces, to distributed systems and database management. It has become the de facto standard for data communication across the software industry, and is rapidly replacing EDI systems as the primary medium for business interchange in almost every industry on the planet. It will likely become the language in which most documents are created and stored, both on and off the Internet, and could well become the foundation for Internet application servers that some believe will replace many of the shrink-wrapped products currently produced.

What XML products does Microsoft provide?

Microsoft has worked hard to develop XML resources in a number of areas:

Will it be necessary to compress XML for transmission over the Web?

In general, the need to compress XML data is application dependent and largely a function of the amount of data being moved between the server and the client. XML compresses extremely well due to the repetitive nature of the tags used to describe the structure of the data. It is worth noting that compression is standard for HTTP 1.1 servers and clients, and XML automatically benefits from this.
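
As a rough illustration of why tag-heavy XML compresses well, the following Python sketch builds a repetitive document and compresses it with the standard gzip module. The element names and the record count are invented for the example, and the ratio you see in practice depends entirely on your data.

```python
import gzip

# Build a small, repetitive XML document (element names are hypothetical).
rows = "".join(
    f"<order><id>{i}</id><status>shipped</status></order>" for i in range(500)
)
xml_bytes = f"<orders>{rows}</orders>".encode("utf-8")

# gzip uses DEFLATE, the same family of coding commonly applied to HTTP responses.
compressed = gzip.compress(xml_bytes)
ratio = len(compressed) / len(xml_bytes)

print(len(xml_bytes), len(compressed), round(ratio, 3))
```

The repeated start and end tags are exactly the kind of redundancy that this style of compression removes.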

How secure is XML as a data format? Are there plans to add security to XML?

XML is as secure as HTML. Just as secure HTTP (HTTPS) can be used to add encryption to HTTP, thereby protecting HTML, it can also be used to protect XML. XML is a text-based format for representing structured data. This maximizes simplicity and interoperability with the data. A number of steps can be taken to add security and authentication to the XML format. First, XML can be encrypted on the server before transmission to the client, and then decrypted on the client. Second, digital signatures applied to the data itself can authenticate XML.

Validation

What is a DTD and what is it used for?

The document type definition (DTD) defines the valid syntax of a class of XML documents. That is, it lists a number of element names, which elements can appear in combination with which other ones, what attributes are available for each element type, etc. A DTD uses a different syntax from that used by XML documents.

Do Web developers have to include a DTD when they use XML to describe data?

No. XML can be used to describe data with or without a DTD. "Valid" XML is XML that references a DTD and conforms to it, while "well-formed" XML need only follow the syntax rules of the language and does not require a DTD. The concept of "well-formed" XML is one of the fundamental differences between XML and Standard Generalized Markup Language (SGML). In both cases, the XML itself must conform to the syntax rules of the language (for example, all tags must be closed and tags cannot overlap).
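
The syntax rules mentioned above (closed tags, no overlapping tags) can be checked with any XML parser. The following is a minimal Python sketch using the standard xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are made up for illustration. Note that this parser checks syntax only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

ok = is_well_formed("<book><title>XML Basics</title></book>")
unclosed = is_well_formed("<book><title>XML Basics</book>")   # <title> never closed
overlapping = is_well_formed("<b><i>text</b></i>")            # tags overlap

print(ok, unclosed, overlapping)
```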

What are XML schemas? How are they different from DTDs?

While XML 1.0 supplies a mechanism—the DTD—for defining the content model of an XML document, it is evident that a more comprehensive and rigorous method of defining a content model is needed. An XML schema is the definition (both in terms of its organization and its data types) of a specific XML structure. An XML schema uses the XML Schema language to specify how each type of element in the schema is defined and what data type that element has associated with it. Perhaps one of the most compelling features of schemas compared to DTDs is that a schema is itself an XML document. This means that it can be read by the same tools that read the XML it describes.

Microsoft's XML services currently support XML-Data schemas, representing a snapshot of the W3C Schema activity at the time Internet Explorer 5 shipped in March 1999. XML-Data schemas allow developers to add data types to their XML documents and define open content models. Such extensions to the functionality of DTDs are critical to programming with XML.

The W3C, however, is preparing XML Schema Definitions (XSD), which will be the XML Schemas standard. Microsoft plans on making support for XML Schema Definitions (XSD) part of its core XML services as soon as the specification becomes a recommendation.

What are namespaces and why are they important?

Namespaces are another advanced feature of XML. They are defined in a separate W3C Recommendation, Namespaces in XML, that builds on the XML 1.0 specification. They allow developers to qualify element names and relationships. Namespaces make element names uniquely recognizable to avoid name collisions for elements that have the same name but are defined in different vocabularies. They allow tags from multiple namespaces to be mixed, which is essential if data is coming from multiple sources.

For example, a bookstore may define the <TITLE> tag to mean the title of a book, contained only within the <BOOK> element. A directory of people, however, might define <TITLE> to indicate a person's position, for instance:

  <TITLE>President</TITLE>

Namespaces help define this distinction clearly.
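
As a sketch of the bookstore and directory case, the following Python example puts both kinds of TITLE element in one document and selects each by namespace. The namespace URIs, the prefixes, and the sample content are assumptions made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The namespace URIs below are invented for this example.
doc = """
<catalog xmlns:bk="urn:example:bookstore" xmlns:hr="urn:example:people">
  <bk:BOOK><bk:TITLE>Moby Dick</bk:TITLE></bk:BOOK>
  <hr:PERSON><hr:TITLE>President</hr:TITLE></hr:PERSON>
</catalog>
"""
root = ET.fromstring(doc)
ns = {"bk": "urn:example:bookstore", "hr": "urn:example:people"}

# The same local name, TITLE, is resolved separately in each vocabulary.
book_titles = [e.text for e in root.findall(".//bk:TITLE", ns)]
job_titles = [e.text for e in root.findall(".//hr:TITLE", ns)]

print(book_titles, job_titles)
```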

XSLT and XPath

What is XSLT?

XSLT, or Extensible Stylesheet Language Transformations, is a formal W3C recommendation that was approved on November 16, 1999. It is a language in both the markup and the programming sense of the word, in that it provides a mechanism to transform an XML structure into another XML structure, HTML, or any number of other text-based formats (such as SQL). While it can be used to create the display output of a Web page, the real power of XSLT is its ability to change the underlying structures rather than simply the media representations of those structures, as is the case with cascading style sheets (CSS).

What's the difference between XSL, XQL, XSL Patterns, and XSLT?

XSLT originated from the inability of CSS to make structural alterations to an XML document, at a time when the motivation for creating XML was oriented more toward replacing HTML than toward providing a universal data description language. The Extensible Stylesheet Language (XSL) thus became an effort to build a new way of formatting XML.

However, it soon became evident to both the participants in the W3C Style Working group and to early XML adopters that a language that could transform XML from one format to another would radically simplify much of the code that was being generated. Microsoft released a proposal to the W3C, initially termed the XML Query Language (or XQL), which in turn was adopted by the W3C as the XSL Pattern language. Most of the features of this language in turn found their way into the final XSLT specification.

The resulting standard incorporates parameters for adapting a transformation to varying initial conditions, named templates for creating functional blocks of code, and a number of enhanced functions for numeric and string manipulation. XSLT also has a built-in extension mechanism for adding functionality, something that Microsoft takes advantage of in its implementation to add a number of highly useful features including access to COM objects and scripting.

What is XPath?

XPath is a query language defined for XML that provides a simple syntax for selecting a subset of the nodes in a document. With XPath, you can retrieve collections of elements by specifying a directory-like path (hence the name) as well as conditions placed on the path. XPath is critical both for XSLT and the XML DOM, and also has ties to the XPointer specification (which lets you select fragments of documents based on combinations of Uniform Resource Locators [URLs] and XPath expressions).
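
To give a feel for path-style selection, here is a small Python sketch using xml.etree.ElementTree, which supports only a limited subset of XPath. The document and its element names are invented for the example, and this is not the MSXML API.

```python
import xml.etree.ElementTree as ET

doc = """
<library>
  <book year="1998"><title>XML Basics</title></book>
  <book year="2000"><title>XSLT in Practice</title></book>
  <magazine><title>Web Monthly</title></magazine>
</library>
"""
root = ET.fromstring(doc)

# Directory-like path: every title that sits directly under a book element.
book_titles = [t.text for t in root.findall("./book/title")]

# Path plus a condition: only books whose year attribute equals 2000.
recent = [t.text for t in root.findall("./book[@year='2000']/title")]

print(book_titles, recent)
```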

Why is XSLT so important to XML?

XSLT is a language that transforms one XML document to another. That means it provides a mechanism for single-sourcing XML data, for creating rich views in Web pages that can be dynamically changed by the user, and for filtering data for targeted communications. XSLT is robust enough to encode business rules. It can generate graphics (not just Web pages) from data. It can even handle communicating with other servers—especially in conjunction with scripting modules that can be integrated into the XSLT—and it can generate the appropriate messages within the body of XSLT itself. While it is not likely to replace most of the interactions within desktop systems (for reasons pertaining to both performance and ease of use), XSLT will likely end up becoming one of the primary "programming" languages for communicating between systems within the next few years.

What's the difference between XSLT and CSS? Aren't they both style sheets?

Cascading style sheets (CSS) work by assigning a set of display properties to an HTML element. CSS determines the visual appearance of a page, but doesn't alter the structure of the source document.

XSLT, on the other hand, is a template-based language that lets you map a certain pattern in the source document to output written in XML, HTML, or plain text. With XSLT, you can transform the structure of an XML document into a different XML document. For example, you can change the order of an XML document, add or delete elements, perform conditional tests, or iterate through collections of elements.

XSLT and CSS are not incompatible standards. One useful technique in creating Web pages in XML is to use XSLT to transform XML into structures such as lists or tables, then apply CSS to the result to control how these structures appear in the appropriate medium. You can even create CSS from XSLT.

Standards

How compliant is Microsoft with the XML standards?

Microsoft has been at the forefront of XML practically since the inception of the language, and significantly, most of the XML recommendations and working drafts that the W3C has produced in the last several years have included input and participation from at least one, and in some cases, several Microsoft employees. Microsoft has been very much committed to working with the W3C standards bodies to ensure that XML develops in a way that benefits all users, and has had critical input in the development of a number of different areas, including the XML Specification, DOM, XSLT, and Schema Definition Languages. Microsoft is committed to staying compliant with emerging specifications and standards.

What is the relationship between XML and the World Wide Web Consortium (W3C)?

The W3C has an active XML Working Group. Microsoft was one of the cofounders of this group in June 1996, and since then numerous industry players have joined, including Netscape Communications Corp., IBM, and Oracle. For more information on the XML standards process, go to the W3C Web site.

What is the status of XML with the W3C?

XML version 1.0 became a W3C Recommendation in February 1998 and is now a stable standard. For more information on the current XML specification, and on the submission and review process within the W3C, please refer to the W3C Web site.

What is the status of DOM in the W3C?

The W3C document for DOM Level 1 has a status of Recommendation. This means that the W3C is now promoting it as a standard on the World Wide Web. For more information on DOM and on the submission and review process within the W3C, see the DOM specification.

Tools Support

Do SQL Server and ADO support XML?

Microsoft ActiveX Data Objects (ADO) technology provides a number of ways to convert database record sets (collections of data records) into an XML format, as well as tools to take XML in a given structure and revert it back into any database that ADO supports (including SQL Server and Oracle databases). In addition, through the XML Data Source Object in MSXML2 and MSXML3, you can load arbitrary XML directly into ADO to generate recordsets.
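
The general idea of turning a record set into XML can be sketched without ADO. The following Python example is an analogy only: it uses the standard sqlite3 module in place of ADO, and the table, the column names, and the recordset/row element names are all invented, so the output is not the format that ADO itself produces.

```python
import sqlite3
import xml.etree.ElementTree as ET

# An in-memory database stands in for a real data source (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Raj")])

cursor = conn.execute("SELECT id, name FROM customers ORDER BY id")
columns = [d[0] for d in cursor.description]

# One <row> element per record, one child element per column.
root = ET.Element("recordset")
for row in cursor:
    rec = ET.SubElement(root, "row")
    for col, value in zip(columns, row):
        ET.SubElement(rec, col).text = str(value)

xml_text = ET.tostring(root, encoding="unicode")
conn.close()
print(xml_text)
```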

SQL Server 2000 also lets you set and retrieve XML directly through a URL in much the same way that you would call up a Web page. This is a powerful mechanism for working with data, because it essentially means that you can integrate your SQL Server data into XSL filters, into Web pages, basically anywhere that an XML document can go. Furthermore, you can set up custom templates for controlling how the XML gets produced from the SQL Server data, making the database a powerful tool for generating XHTML pages.

Finally, applications such as BizTalk Server let you map between any number of different data sources (from XML documents to databases to Excel and Word documents), create complex data pipelines for your Web architecture, and build effective schemas for your XML database needs.

Are there any Microsoft tools at the moment that can help me leverage XML quickly?

Microsoft BizTalk Server 2000, an XML-based server for data interchange, provides the infrastructure and tools to enable e-commerce business communities. The foundation for BizTalk Server is its rules-based business document routing, transformation, and tracking infrastructure. This infrastructure enables companies to integrate, manage, and automate business processes by exchanging business documents (for example, purchase orders and invoices) among applications within or across organizational boundaries. For more information, see Microsoft BizTalk Server 2000.

What is SOAP?

SOAP is the Simple Object Access Protocol, a way to create widely-distributed, complex computing environments that run over the Internet using existing Internet infrastructure. SOAP is about applications communicating directly with each other over the Internet in a very rich way. For more information about SOAP, see the SOAP specification.

How does XML fit into the Microsoft Windows® Distributed InterNet Applications (Windows DNA) strategy for building three-tier, Web-enabled applications?

XML is quickly becoming the vehicle for delivering structured data from the middle tier to the desktop. XML-based data can be integrated from multiple back-end (database) sources, using agents on the middle tier. Schemas (see the XML-Data section) can improve this process, as developers can describe and exchange data more precisely.

Issues and Solutions

Why is my document object still empty after I call the Load() method?

By default, documents are loaded asynchronously. This means that if you provide an HTTP URL, the load() method returns immediately and your document object is still empty because the data hasn't come back from the server yet. To fix this, add the following line to your code before calling load():

xmldoc.async = false;

Also, if you are loading XML documents over HTTP from a standalone C++ application, you will have to process the message queue in order for the download to continue.

How do I load a document with foreign and special characters?

A document may contain foreign characters such as the following:

<test>foreign characters (úóíá) </test>

Characters such as úóíá are outside the ASCII range, so the parser must know which encoding the document uses. Either save the document as UTF-8, or declare the encoding that was actually used in the XML declaration as follows:

<?xml version="1.0" encoding="iso-8859-1"?>
<test>foreign characters (úóíá) </test>

Now your XML will load correctly.
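
The effect of the encoding declaration can be reproduced with any conforming parser. The following Python sketch uses xml.etree.ElementTree rather than MSXML, so the exact error differs, but the behavior is analogous: the same ISO-8859-1 bytes fail without a declaration and load with one.

```python
import xml.etree.ElementTree as ET

text = "<test>foreign characters (úóíá) </test>"
latin1_bytes = text.encode("iso-8859-1")

# Without a declaration the parser assumes UTF-8, and these bytes are not valid UTF-8.
try:
    ET.fromstring(latin1_bytes)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

# With the encoding declared, the same bytes parse correctly.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>\n' + latin1_bytes
content = ET.fromstring(declared).text

print(undeclared_ok, content)
```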

Other characters are reserved in XML and also need to be handled differently. The following XML:

<foo>This & that</foo>
generates this error:
Whitespace is not allowed at this location.
Line 0000001: <foo>This & that</foo>
Pos  0000012: ----------^

The ampersand is part of the syntactic structure of XML and will not be interpreted as an ampersand if simply placed within an XML data source. You need to substitute a special character sequence called an "entity".

<foo>This &amp; that</foo>

The following characters require the corresponding entities:

<	&lt;
&	&amp;
>	&gt;
"	&quot;
'	&apos;
Quote characters are used as delimiters for attribute values inside a tag, and therefore cannot always be used inside the value of an attribute. For example, the following will return an error:
<foo description='John's Stuff'>

The single quote is used both as an attribute delimiter and in the attribute value itself. To fix this, you can either switch to use a double quote for the attribute delimiter as follows:

<foo description="John's Stuff">

Or you can escape the single quote with the entity &apos; as follows:

<foo description='John&apos;s Stuff'>

Both of the above will return the attribute value John's Stuff via the getAttribute method in the XML object model. Similarly, for the double quote you can use the entity &quot;.

You can also handle special characters in element content by putting your text inside a CDATA section. The following is valid:

<xml>
  <![CDATA[ This & that <stuff> is just "text" content. ]]>
</xml>

In this example, the XML Object Model will show a CDATA node as a child of the xml node which will return the string

This & that <stuff> is just "text" content.

as the nodeValue.
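
If you generate XML from code, a library can apply these entities for you. The following Python sketch uses the standard xml.sax.saxutils helpers and xml.dom.minidom as stand-ins for the MSXML object model; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr

# escape() replaces &, < and > with the corresponding entities.
escaped = escape("This & that")
parsed_text = ET.fromstring(f"<foo>{escaped}</foo>").text

# quoteattr() chooses a delimiter (and escapes if needed) for attribute values.
quoted = quoteattr("John's Stuff")
attr_value = ET.fromstring(f"<foo description={quoted}/>").get("description")

# A CDATA section passes markup characters through untouched.
dom = minidom.parseString(
    '<xml><![CDATA[ This & that <stuff> is just "text" content. ]]></xml>'
)
cdata_node = dom.documentElement.firstChild

print(escaped, parsed_text, attr_value, cdata_node.nodeValue)
```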

How do I use MSXML COM components in Visual C++ 6.0?

The easiest way to use MSXML COM components in Visual C++ 6.0 is to use the #import directive:

#import "msxml.dll" named_guids no_namespace

This defines all the IXML* interfaces and interface IDs so you can use them in your application. You can also get the MSXML type libraries and header files, and the uuid.lib that contains the class IIDs from the INETSDK.

How do I use HTML Entities in my XML?

The following XML contains an HTML entity:

<copyright>Copyright &copy; 1999, Microsoft Inc, All rights reserved.</copyright>

It generates the following error:

Reference to undefined entity 'copy'. 
Line: 1, Position: 23, ErrorCode: 0xC00CE002
<copyright>Copyright &copy; 1999, ...
----------------------^

This is because XML has only five built-in entities. See How do I load a document with foreign and special characters? for more information about built-in entities.

To use HTML entities, you need to define them with a DTD. To find out more about DTDs, see the W3C XML Recommendation. To use a DTD that defines the HTML entities, reference it in a DOCTYPE declaration as follows:

<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
<copyright>Copyright &copy; 1999, Microsoft Inc, All rights reserved.</copyright>

For this to load, you need to set the validateOnParse property of the IXMLDOMDocument interface to false. Try pasting this into the Validator Test Page, turning off DTD validation, and clicking Validate. Notice that the document loads and the copyright character is available in the DOM tree shown at the end of the validator page.

If you are already doing DTD validation, then you must include the HTML entities as a parameter entity in your existing DTD as follows:
<!ENTITY % HTMLENT SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
%HTMLENT;
This will define all the HTML entities so you can use them in your XML document.
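
The same principle can be tried with a parser other than MSXML. The Python sketch below uses xml.etree.ElementTree, which does not fetch external DTDs by default, so instead of the external htmlentities.dtd it declares a single entity in an internal DTD subset. That substitution is an assumption of this example, not the approach described above.

```python
import xml.etree.ElementTree as ET

# &copy; is not one of XML's five built-in entities, so this fails to parse.
try:
    ET.fromstring("<copyright>Copyright &copy; 1999</copyright>")
    undefined_ok = True
except ET.ParseError:
    undefined_ok = False

# Declaring the entity in an internal DTD subset makes the reference legal.
declared = """<!DOCTYPE copyright [
  <!ENTITY copy "&#169;">
]>
<copyright>Copyright &copy; 1999</copyright>"""
text = ET.fromstring(declared).text

print(undefined_ok, text)
```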

How is white space handled in element content?

The XML DOM has three properties for accessing the text content of elements:

Property	Behavior
nodeValue	Returns the original text content (including white space) on TEXT, CDATA, COMMENT, and PI nodes as specified in the original XML source. Returns null on ELEMENT nodes and on the DOCUMENT itself.
data	Same as nodeValue.
text	Recursively concatenates multiple TEXT and CDATA nodes in a specified subtree and returns the combined result.

Note: White space consists of newline, tab, and space characters.

The nodeValue property always returns what is in the original document, independent of how the document is loaded and of the current xml:space scope.

The text property concatenates all text in the specified subtree and expands entities. The result depends on how the document is loaded, the current state of the preserveWhiteSpace switch, and the current xml:space scope, as follows:

preserveWhiteSpace = true when the document is loaded

Current preserveWhiteSpace	xml:space scope	text property returns
true	preserve	preserved
true	default	preserved
false	preserve	preserved
false	default	preserved and trimmed

preserveWhiteSpace = false when the document is loaded

Current preserveWhiteSpace	xml:space scope	text property returns
true	preserve	half preserved
true	default	half preserved and trimmed
false	preserve	half preserved
false	default	half preserved and trimmed

Where preserved means the exact original text content as found in the original XML document, trimmed means the leading and trailing spaces have been removed, and half preserved means that "significant white space" is preserved and "insignificant white space" is normalized. Significant white space is white space inside of text content. Insignificant white space is white space between tags as follows:

<name>\n
\t<first>    Jane</first>\n
\t<last>Smith    </last>\n
</name>

In this example, the newlines and tabs between the tags are insignificant white space and can be ignored, while the spaces before "Jane" and after "Smith" are significant white space: they are part of the text content, so they carry meaning and cannot be ignored. So in this example, the text property returns the following results:

State	Returned value
preserved	"\n\t    Jane\n\tSmith    \n"
preserved and trimmed	"Jane\n\tSmith"
half preserved	"    Jane Smith    "
half preserved and trimmed	"Jane Smith"

Notice that "half preserved" normalizes insignificant white space: the newlines and tab characters between tags are collapsed into a single space character. If you change the xml:space attributes or the preserveWhiteSpace switch, the text property returns a different value accordingly.
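
The "preserved" and "preserved and trimmed" rows can be reproduced with a short Python sketch. It uses xml.dom.minidom, which always keeps white space and has no preserveWhiteSpace switch, so the all_text helper below is a hypothetical stand-in for the MSXML text property and the half-preserved states are not modeled.

```python
from xml.dom import minidom

source = "<name>\n\t<first>    Jane</first>\n\t<last>Smith    </last>\n</name>"
dom = minidom.parseString(source)

def all_text(node):
    """Recursively concatenate every TEXT and CDATA node in a subtree."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(all_text(child) for child in node.childNodes)

preserved = all_text(dom.documentElement)   # every character kept as written
trimmed = preserved.strip()                 # leading and trailing white space removed

print(repr(preserved), repr(trimmed))
```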

CDATA and xml:space="preserve" subtree boundaries

In the following example, the contents of the CDATA node or the "preserved" node are concatenated as they are and do not participate in the insignificant white space normalization. For example:

<name>\n
\t<first> Jane </first>\n
\t<last><![CDATA[     Smith     ]]></last>\n
</name>

In this case, the white space inside the CDATA node is never "merged" with "insignificant" white space and is never trimmed. Therefore, the "half preserved and trimmed" case will return the following:

"Jane      Smith     "

Here, the insignificant white space between the </first> and <last> tags is included regardless of the contents of the CDATA node. The same result is returned if the CDATA is replaced with the following:

<last xml:space="preserve">     Smith     </last>

Entities are special

Entities are loaded and parsed as part of the DTD and appear under the DOCTYPE node. They do not necessarily have any xml:space scope. For example:

<!DOCTYPE foo [
<!ENTITY Jane "<employee>\n
\t<name> Jane </name>\n
\t<title>Software Design Engineer</title>\n
</employee>">
]>
<foo xml:space="preserve">&Jane;</foo>

Assuming that preserveWhiteSpace=false (in the scope of the DOCTYPE tag), the insignificant white space is lost when the entity is parsed. The entity will not have white space nodes. The tree will look like this:

DOCTYPE foo
    ENTITY: Jane
        ELEMENT: employee
            ELEMENT: name
                TEXT: Jane
            ELEMENT: title
                TEXT: Software Design Engineer
ELEMENT: foo
    ATTRIBUTE: xml:space="preserve"
    ENTITYREF: Jane

Notice that the DOM tree exposed under the ENTITY node inside the DOCTYPE does not contain any WHITESPACE nodes. This means that the children of the ENTITYREF node will also have no WHITESPACE nodes even though the entity reference is in the scope of xml:space="preserve".

Every instance of an ENTITY referenced in a given document always has the identical tree.

If an entity absolutely must preserve white space, then it must specify its own xml:space attribute inside itself or the document preserveWhiteSpace switch must be set to true.

How is white space handled for attributes?

There are several ways of accessing an attribute value. The IXMLDOMAttribute interface has a value property, which is equivalent to nodeValue, and a text property, which is a Microsoft extension. These properties return the following:

Property	Text returned
attrNode.nodeValue, attrNode.value, getAttribute("name")	Returns exact content (with entities expanded) as found in the original document.
attrNode.nodeTypedValue	Null
attrNode.text	Same as nodeValue except the leading and trailing white space is trimmed.

The XML Language specification defines the following behavior for XML Applications:
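
Attribute type                                    Text returned
CDATA                                             half normalized
ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION,    fully normalized
enumeration

Here, half normalized means that newline and tab characters are converted to spaces, but multiple spaces are not collapsed into one space. Fully normalized means that, in addition, runs of spaces are collapsed into a single space and leading and trailing spaces are removed.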
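
The two normalization levels can be observed with any parser that reads the internal DTD subset. The following is a minimal sketch in Python, using the standard xml.dom.minidom module rather than MSXML. The element name e and the attribute names c and t are made up for illustration, and the MSXML-specific text property has no counterpart here.

```python
from xml.dom.minidom import parseString

# Hypothetical document: attribute "c" is declared CDATA, "t" is declared NMTOKENS.
# Both carry the same raw value, containing tabs, a newline, and padding spaces.
SOURCE = (
    '<!DOCTYPE e [\n'
    '  <!ATTLIST e c CDATA    #IMPLIED\n'
    '              t NMTOKENS #IMPLIED>\n'
    ']>\n'
    '<e c=" a\t\tb\n c " t=" a\t\tb\n c "/>'
)

element = parseString(SOURCE).documentElement

# CDATA: each tab or newline becomes one space; nothing is collapsed or trimmed.
cdata_value = element.getAttribute("c")

# Tokenized type: runs of spaces collapse to one space and the ends are trimmed.
tokenized_value = element.getAttribute("t")

print(repr(cdata_value))
print(repr(tokenized_value))
```

The CDATA value keeps its length because every white space character is replaced one for one, while the tokenized value shrinks to single-space-separated tokens.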

How is white space handled in the XML object model?

Sometimes the XML Object Model will show TEXT nodes that contain only white space characters. This can be confusing, because most of the time white space is stripped. For example, the following XML:

<?xml version="1.0" ?>
<!DOCTYPE person [
  <!ELEMENT person (#PCDATA|lastname|firstname)*>
  <!ELEMENT lastname (#PCDATA)>
  <!ELEMENT firstname (#PCDATA)>
]>
<person>
  <lastname>Smith</lastname>
  <firstname>John</firstname>
</person>

Generates the following tree:

Processing Instruction: xml
DocType: person
ELEMENT: person
	TEXT: 
	ELEMENT: lastname
	TEXT: 
	ELEMENT: firstname
	TEXT: 

The first name and last name are surrounded by TEXT nodes containing only white space because the content model for the "person" element is MIXED; it contains the #PCDATA keyword. A MIXED content model indicates that the elements can have text interspersed between them. Therefore, the following is also valid:

<person>
My last name is <lastname>Smith</lastname> and my first name is
<firstname>John</firstname>
</person>

And this results in the following similar looking tree:

ELEMENT: person
	TEXT: My last name is
	ELEMENT: lastname
	TEXT: and my first name is
	ELEMENT: firstname
	TEXT: 

Without the white space after the word "is" and before <lastname>, and the white space after the </lastname> and before the word "and", the sentence would be unintelligible. So, for MIXED content models, the combination of text, white space, and elements is relevant. For non-MIXED content models this is not the case.

To make the white-space-only TEXT nodes go away, remove the #PCDATA keyword from the "person" element declaration:

<!ELEMENT person (lastname,firstname)>

which results in the following clean tree:

Processing Instruction: xml
DocType: person
ELEMENT: person
	ELEMENT: lastname
	ELEMENT: firstname
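
The shape of these trees can be inspected outside MSXML as well. The following is a rough sketch in Python with the standard xml.dom.minidom module. Note the assumption: minidom does not consult the DTD content model, so it always keeps the white-space-only TEXT nodes. The helper strip_whitespace_nodes is hypothetical and only emulates the "clean" tree by removing those nodes by hand.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

XML = """<person>
  <lastname>Smith</lastname>
  <firstname>John</firstname>
</person>"""


def child_summary(element):
    """Describe each direct child of element, flagging white-space-only text."""
    summary = []
    for child in element.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            summary.append("TEXT" if child.data.strip() else "WHITESPACE")
        elif child.nodeType == Node.ELEMENT_NODE:
            summary.append("ELEMENT: " + child.tagName)
    return summary


def strip_whitespace_nodes(element):
    """Hypothetical helper: remove white-space-only TEXT nodes recursively."""
    for child in list(element.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            element.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_nodes(child)


person = parseString(XML).documentElement
before = child_summary(person)   # white space nodes surround both elements
strip_whitespace_nodes(person)
after = child_summary(person)    # only the two element children remain

print(before)
print(after)
```

Stripping by hand like this is only safe for element content. In a MIXED content model the same nodes may carry meaningful spacing.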

What does the XML declaration do?

The XML declaration must be listed at the top of the XML document:

<?xml version="1.0" encoding="utf-8"?>

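It specifies the XML version that the document conforms to (here, 1.0) and, optionally, the character encoding of the document (here, UTF-8).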

Note: The XML declaration must be the first line in an XML document, so the following XML file:

<!--HEADLINE="Dow closes as techs get hammered"-->
<?xml version="1.0"?> 

generates the following parse error:

Invalid xml declaration.
Line 0000002:     <?xml version="1.0"?>
Pos  0000007: ------^

Note: The XML declaration is optional. If you need to place a comment or processing instruction at the top of the document, omit the XML declaration entirely. In that case, the encoding defaults to UTF-8.
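
The placement rule is not specific to MSXML. As a small sketch, the following Python code feeds both variants to the standard xml.dom.minidom parser (which is built on Expat, not MSXML, so the error text differs from the message shown above). The root element news and the helper parses are made up for illustration.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# A comment placed before the XML declaration: not well formed.
BAD = (
    '<!--HEADLINE="Dow closes as techs get hammered"-->\n'
    '<?xml version="1.0"?>\n'
    '<news/>'
)

# The same comment with the XML declaration omitted: well formed.
GOOD = (
    '<!--HEADLINE="Dow closes as techs get hammered"-->\n'
    '<news/>'
)


def parses(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        parseString(text)
        return True
    except ExpatError:
        return False


print(parses(BAD))
print(parses(GOOD))
```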

How do I print my XML document in a readable format?

When you generate an XML file by building a document from scratch using the DOM, everything is on a single line with no white space in between. This is the default behavior.

The default XSL style sheet built into Internet Explorer 5 displays and prints XML documents in a readable format. For example, if you have IE5 installed, try viewing the nospace.xml file. You should see the following tree display in your browser:

- <ORDER>
 - <ITEM NAME="123">
    <NAME>XYZ</NAME> 
    <PRICE>12.56</PRICE> 
   </ITEM> 
  </ORDER>

No white space is inserted into the XML.

Printing readable XML is quite tricky, especially when you have a DTD that defines different kinds of content models. For example in the mixed content model (#PCDATA), you may not want to insert spaces because this may change the meaning of the content. For example, consider the following XML:

<B>E</B><I>lephant</I>

This should not be output as:

<B>E</B>
<I>lephant</I>

because then the word boundaries are no longer correct.

All this makes automatic printing problematic. If you do need to print readable XML, you can use the DOM to insert white space as text nodes in the appropriate places.
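
Both effects can be sketched with Python's standard xml.dom.minidom module, used here as a stand-in for MSXML. The document is built through the DOM and serialized without any added white space, and then a generic pretty-printer is shown breaking the mixed-content example. The wrapper element P is an assumption, added only because a document needs a single root.

```python
from xml.dom.minidom import Document, parseString

# Build the ORDER document from scratch through the DOM.
doc = Document()
order = doc.appendChild(doc.createElement("ORDER"))
item = order.appendChild(doc.createElement("ITEM"))
item.setAttribute("NAME", "123")
for tag, text in (("NAME", "XYZ"), ("PRICE", "12.56")):
    child = item.appendChild(doc.createElement(tag))
    child.appendChild(doc.createTextNode(text))

# Plain serialization: one line, no white space between elements.
compact = order.toxml()

# Pretty-printing inserts new white space text between elements.
pretty = order.toprettyxml(indent="  ")

# Mixed content: the pretty-printer separates <B> and <I>, which splits the word.
mixed = parseString("<P><B>E</B><I>lephant</I></P>").documentElement
mixed_pretty = mixed.toprettyxml(indent="  ")

print(compact)
print(pretty)
print(mixed_pretty)
```

For data-oriented documents such as ORDER the inserted white space is harmless, but for the mixed-content case the output no longer represents the same text.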

How do I use namespaces in DTDs?

To use a namespace in a DTD, declare it in the ATTLIST declaration of the element that uses it, as follows:
<!ELEMENT x:customer ANY >
<!ATTLIST x:customer xmlns:x CDATA #FIXED "urn:...">
The namespace has to be of type #FIXED. Namespaces on attributes work the same way:
<!ELEMENT customer ANY >
<!ATTLIST customer
          x:value CDATA #IMPLIED
          xmlns:x CDATA #FIXED "urn:...">
Namespaces and XML Schemas

DTDs and XML Schemas cannot be mixed. For example, the following declaration

xmlns:x CDATA #FIXED "x-schema:myschema.xml"
will not result in the use of schema definitions defined in myschema.xml. The use of DTDs and XML Schemas is mutually exclusive.

How do I use XMLDSO in Visual Basic?

Using the following XML as an example:

<contacts>
 <person>
  <name>Mark Hanson</name> 
  <telephone>206 765 4583</telephone> 
 </person>
 <person>
  <name>Jane Smith</name> 
  <telephone>425 808 1111</telephone> 
 </person>
</contacts>
You can bind to an ADO Recordset as follows:

  1. Create a new VB 6.0 project.
  2. Add references to Microsoft ActiveX Data Objects 2.1 or later, the Microsoft Data Adapter Library, and Microsoft XML, version 2.0.
  3. Load the XML data into an XML DSO control using the following code:
    Dim dso As New XMLDSOControl
    Dim doc As IXMLDOMDocument
    Set doc = dso.XMLDocument
    doc.Load ("d:\test.xml")
    
  4. Map the DSO into a new Recordset object using a DataAdapter with the following code:
    Dim da As New DataAdapter
    Set da.Object = dso
    Dim rs As New ADODB.Recordset
    Set rs.DataSource = da
  5. Access the data:
    MsgBox rs.Fields("name").Value
    
    This displays the string "Mark Hanson".
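
To make the mapping concrete, the following sketch shows the same idea in Python: each repeated child element becomes one record, and each of its child elements becomes a field. This is not XMLDSO or ADO. It uses the standard xml.dom.minidom module, and the function to_rows is a hypothetical helper that only approximates the row-and-column view the DSO exposes for this simple, two-level document.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

CONTACTS = """<contacts>
 <person>
  <name>Mark Hanson</name>
  <telephone>206 765 4583</telephone>
 </person>
 <person>
  <name>Jane Smith</name>
  <telephone>425 808 1111</telephone>
 </person>
</contacts>"""


def to_rows(xml_text):
    """Hypothetical helper: flatten root/record/field XML into a list of dicts."""
    root = parseString(xml_text).documentElement
    rows = []
    for record in root.childNodes:
        if record.nodeType != Node.ELEMENT_NODE:
            continue  # skip white space between records
        row = {}
        for field in record.childNodes:
            if field.nodeType == Node.ELEMENT_NODE:
                # The field value is the concatenated text of the element.
                row[field.tagName] = "".join(
                    t.data for t in field.childNodes
                    if t.nodeType == Node.TEXT_NODE
                )
        rows.append(row)
    return rows


rows = to_rows(CONTACTS)
print(rows[0]["name"])
```

As with the Recordset in step 5, the first record's name field holds the first person in the file.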

How do I use the XML DOM with Java?

The IE5 version of MSXML.DLL must have already been installed. In Visual J++ 6.0, from the Project menu, select Add COM Wrapper, and choose "Microsoft XML 1.0" from the list of COM objects. This builds the required Java wrappers into a new package called "msxml". These pre-built Java wrappers are also available for download. The classes can be used as follows:

import com.ms.com.*;
import msxml.*;
public class Class1
{
  public static void main (String[] args)
  {
    DOMDocument doc = new DOMDocument();
    doc.load(new Variant("file://d:/samples/ot.xml"));
    System.out.println("Loaded " + doc.getDocumentElement().getNodeName());
  }
}

The code sample loads a 3.8 MB test file "ot.xml" from the sun religion example. The Variant class is used for wrapping the Win32 VARIANT primitive type.

You cannot use pointer comparisons on the nodes since each time you retrieve a node you actually get a new wrapper. So, rather than using the following code,

IXMLDOMNode root1 = doc.getDocumentElement();
IXMLDOMNode root2 = doc.getDocumentElement();
if (root1 == root2)...

use the following instead:

if (ComLib.isEqualUnknown(root1, root2)) ....
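
The reason is general to any wrapper layer, not just the J++ COM wrappers. The following Python sketch illustrates it with entirely hypothetical classes (NodeWrapper, FakeDocument, and is_same_node are invented here and are not part of MSXML or of any real library): two wrappers are distinct objects even when they wrap the same underlying node, so identity has to be tested on the wrapped object.

```python
class NodeWrapper:
    """Hypothetical stand-in for a generated wrapper around a native node."""

    def __init__(self, underlying):
        self.underlying = underlying


class FakeDocument:
    """Hypothetical document that hands out a new wrapper on every call."""

    def __init__(self):
        self._root = object()  # the single underlying root node

    def get_document_element(self):
        return NodeWrapper(self._root)


def is_same_node(a, b):
    """Compare the wrapped objects, in the spirit of ComLib.isEqualUnknown."""
    return a.underlying is b.underlying


doc = FakeDocument()
root1 = doc.get_document_element()
root2 = doc.get_document_element()

print(root1 is root2)              # wrapper identity: two different wrappers
print(is_same_node(root1, root2))  # underlying identity: the same node
```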

The total size of the .class wrappers is about 160 KB. However, to be fully compliant with the W3C specification, you should use only the IXMLDOM* wrappers. The following classes are old IE 4.0 XML interfaces and can be deleted from the msxml folder:

This brings the size down to 147 KB. You may also want to delete the following additional items:

This brings the size down to 116 KB. To get it even smaller, consider the fact that the DOM itself comes in two layers: a core layer consisting of:

and DTD information that you probably want to keep:

All nodes in an XML document are of type IXMLDOMNode, which provides complete functionality, but higher level wrappers exist for each node type. Therefore, all the following interfaces can also be deleted if you modify the DOMDocument wrapper and change these specific types to use IXMLDOMNode instead:

Deleting these brings the size down to 61 KB. However, with IXMLDOMElement, the getAttribute and setAttribute methods are useful. Otherwise, you will need to use:

IXMLDOMNode.getAttributes().setNamedItem(...)