[This local archive copy is from the official and canonical URL, http://research.microsoft.com/dtas/bnformat/default.htm; please refer to the canonical source document if possible.]
Over the last several years, there has been ongoing discussion of the potential value of creating a Bayesian Network Interchange Format (BNIF) to promote collaboration among investigators in the Uncertainty in Artificial Intelligence (UAI) community. The goal of the BNIF discussions has been to make it easy for researchers using different Bayesian network modeling and inference tools to share models. A proposed BNIF, evolving from ongoing interactions and a meeting at the 1996 Conference on Uncertainty in Artificial Intelligence, has been online for several years. The proposed BNIF has stimulated discussion and a set of related offshoots on shared representations of Bayesian networks.
During the 1998 Conference on Uncertainty in Artificial Intelligence in Madison, Wisconsin, there was a panel discussion concerning the future of the Bayesian Network Interchange Format. The discussion at UAI '98 converged on the value of leveraging XML (the Extensible Markup Language), a representation introduced by the World Wide Web Consortium, to revitalize the BNIF efforts. As the discussion ended, various participants, including the Decision Theory & Adaptive Systems Group at Microsoft Research, agreed to post first drafts of XML formats for belief networks on their web sites, along with the necessary samples and documentation.
At this stage, the XML-BNIF effort is not directed towards producing an interchange format. Instead, the goal is to allow direct participants and others to compare and contrast format proposals. In addition, the ease of parsing and porting XML allows the creation of simple converters (written in Perl or Java) for migration between alternate XML-based formats. Although reading XML directly may not be as simple as reading the older BNIF format, the advantage of being able to use generic XML viewers on these proposed formats should be apparent.
It remains to be seen whether a new interchange format is a likely or even a possible outcome of this endeavor. In the meantime, however, having a small number of well-documented XML formats and their conversion utilities holds the promise of providing a large portion of the actual gain that might be achieved from a true standard.
We in the Decision Theory and Adaptive Systems group at Microsoft Research have begun the process of converting all of our more important data file formats to XML. The first result of this effort is the XBN format, which will replace our older DSC format. Parsing of the XBN format is guided by a Document Type Definition (DTD) file called XBN.DTD.
Our development efforts have been facilitated by the extensive XML support in Microsoft's Internet Explorer version 5.0 Beta 2 software. We are currently using its MSXML.DLL component to parse XBN files. This component, of course, may not function identically to other parsers available elsewhere, but the ongoing XML efforts should eliminate serious incompatibilities over time.
The revised tool set for Bayesian network authoring being developed by the DTAS group at Microsoft Research is being built on the XBN (for "Bayesian network in XML") format. These tools support standard XML 1.0 files and their Document Type Definition components. Currently, we use the file XBN.DTD as the document description for XBN files. Unfortunately, using DTDs with XML imposes some limitations which may ultimately relegate DTDs to documentary status.
A single file in the XBN format is a container for one or more belief networks. This container, called an ANALYSISNOTEBOOK, will be extended over time to encompass other types of objects; for example, collections of data, database connection and query information, and so on. The primary reason for the existence of the ANALYSISNOTEBOOK container is that XML currently does not allow nesting or inheritance of DTDs. Each DTD must completely describe the document being parsed, and each file must contain exactly one top-level document. Since we have near-term plans to provide support for hierarchical modeling, it was necessary to have the capability to store multiple belief networks in a single XML document from the outset.
Here is a simplified version of the element hierarchy of an XBN file.
ANALYSISNOTEBOOK: a collection of belief network models
    BNMODEL: a single belief network
        STATICPROPERTIES: common properties for XBN networks
        DYNAMICPROPERTIES: user-defined properties for the network and its variables
        VARIABLES: the random variables (nodes) in the belief network
        STRUCTURE: the dependency information (edges) in the belief network
        DISTRIBUTIONS: the conditional distributions for variables in the belief network
XML provides a default namespace for each document; elements can use ID and IDREF attributes to guarantee uniqueness of names within the document. XBN uses this facility only for the names of BNMODEL elements. The top-level ANALYSISNOTEBOOK element has a ROOT attribute which must give the name of the top-level model in the ensemble of models contained in it.
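The two naming rules above (unique BNMODEL names, and a ROOT attribute that names one of the models) can be checked with a few lines of code. The following is only a minimal sketch in Python, assuming the standard-library `xml.etree.ElementTree` parser, which does not validate against XBN.DTD; the function name `check_notebook` and the cut-down sample document are hypothetical and are not part of the XBN tools.

```python
import xml.etree.ElementTree as ET

# Cut-down notebook for illustration; a real BNMODEL would carry the full network.
NOTEBOOK_TEXT = """<ANALYSISNOTEBOOK NAME="Notebook.Cancer Example From Neapolitan" ROOT="Cancer">
  <BNMODEL NAME="Cancer"/>
</ANALYSISNOTEBOOK>"""


def check_notebook(xbn_text):
    """Return a list of problems with the BNMODEL names and the ROOT attribute."""
    notebook = ET.fromstring(xbn_text)
    if notebook.tag != "ANALYSISNOTEBOOK":
        return ["top-level element is not ANALYSISNOTEBOOK"]

    problems = []
    names = [model.get("NAME") for model in notebook.findall("BNMODEL")]

    # BNMODEL names must be unique within the document.
    seen = set()
    for name in names:
        if name in seen:
            problems.append(f"duplicate BNMODEL name: {name}")
        seen.add(name)

    # ROOT must name one of the models contained in the notebook.
    root = notebook.get("ROOT")
    if root is None or root not in names:
        problems.append(f"ROOT does not name a BNMODEL: {root}")
    return problems


print(check_notebook(NOTEBOOK_TEXT))  # prints []
```

A validating XML parser driven by the DTD would enforce ID uniqueness itself; the explicit check here stands in for that because ElementTree only checks well-formedness.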
This example displays the Serum Calcium model developed by Greg Cooper (Cooper, 1986) in XBN format, without the DTD. The Bayesian network first appeared in Cooper's PhD dissertation, and is also published in Neapolitan (1990).
<ANALYSISNOTEBOOK NAME="Notebook.Cancer Example From Neapolitan" ROOT="Cancer">
  <BNMODEL NAME="Cancer">
    <STATICPROPERTIES>
      <FORMAT VALUE="MSR DTAS XML"/>
      <VERSION VALUE="0.2"/>
      <CREATOR VALUE="Microsoft Research DTAS"/>
    </STATICPROPERTIES>
    <VARIABLES>
      <VAR NAME="a" TYPE="discrete" XPOS="13495" YPOS="10465">
        <DESCRIPTION>(a) Metastatic Cancer</DESCRIPTION>
        <STATENAME>Present</STATENAME>
        <STATENAME>Absent</STATENAME>
      </VAR>
      <VAR NAME="b" TYPE="discrete" XPOS="11290" YPOS="11965">
        <DESCRIPTION>(b) Serum Calcium Increase</DESCRIPTION>
        <STATENAME>Present</STATENAME>
        <STATENAME>Absent</STATENAME>
      </VAR>
      <VAR NAME="c" TYPE="discrete" XPOS="15250" YPOS="11935">
        <DESCRIPTION>(c) Brain Tumor</DESCRIPTION>
        <STATENAME>Present</STATENAME>
        <STATENAME>Absent</STATENAME>
      </VAR>
      <VAR NAME="d" TYPE="discrete" XPOS="13960" YPOS="12985">
        <DESCRIPTION>(d) Coma</DESCRIPTION>
        <STATENAME>Present</STATENAME>
        <STATENAME>Absent</STATENAME>
      </VAR>
      <VAR NAME="e" TYPE="discrete" XPOS="17305" YPOS="13240">
        <DESCRIPTION>(e) Papilledema</DESCRIPTION>
        <STATENAME>Present</STATENAME>
        <STATENAME>Absent</STATENAME>
      </VAR>
    </VARIABLES>
    <STRUCTURE>
      <ARC PARENT="a" CHILD="b"/>
      <ARC PARENT="a" CHILD="c"/>
      <ARC PARENT="b" CHILD="d"/>
      <ARC PARENT="c" CHILD="d"/>
      <ARC PARENT="c" CHILD="e"/>
    </STRUCTURE>
    <DISTRIBUTIONS>
      <DIST TYPE="discrete">
        <PRIVATE NAME="a"/>
        <DPIS>
          <DPI> 0.2 0.8</DPI>
        </DPIS>
      </DIST>
      <DIST TYPE="discrete">
        <CONDSET>
          <CONDELEM NAME="a"/>
        </CONDSET>
        <PRIVATE NAME="b"/>
        <DPIS>
          <DPI INDEXES=" 0 "> 0.8 0.2</DPI>
          <DPI INDEXES=" 1 "> 0.2 0.8</DPI>
        </DPIS>
      </DIST>
      <DIST TYPE="discrete">
        <CONDSET>
          <CONDELEM NAME="a"/>
        </CONDSET>
        <PRIVATE NAME="c"/>
        <DPIS>
          <DPI INDEXES=" 0 "> 0.2 0.8</DPI>
          <DPI INDEXES=" 1 "> 0.05 0.95</DPI>
        </DPIS>
      </DIST>
      <DIST TYPE="discrete">
        <CONDSET>
          <CONDELEM NAME="b"/>
          <CONDELEM NAME="c"/>
        </CONDSET>
        <PRIVATE NAME="d"/>
        <DPIS>
          <DPI INDEXES=" 0 0 "> 0.8 0.2</DPI>
          <DPI INDEXES=" 0 1 "> 0.9 0.1</DPI>
          <DPI INDEXES=" 1 0 "> 0.7 0.3</DPI>
          <DPI INDEXES=" 1 1 "> 0.05 0.95</DPI>
        </DPIS>
      </DIST>
      <DIST TYPE="discrete">
        <CONDSET>
          <CONDELEM NAME="c"/>
        </CONDSET>
        <PRIVATE NAME="e"/>
        <DPIS>
          <DPI INDEXES=" 0 "> 0.8 0.2</DPI>
          <DPI INDEXES=" 1 "> 0.6 0.4</DPI>
        </DPIS>
      </DIST>
    </DISTRIBUTIONS>
  </BNMODEL>
</ANALYSISNOTEBOOK>
This section is intended to document only the primary aspects of an XBN file.
The BNMODEL element describes an entire belief network. The attributes of a BNMODEL give its name, which must be unique within the document.
Each of these primary network elements may appear zero or more times.
The STATICPROPERTIES section is designed to preserve authoring and file version compatibility information. All of its elements are optional.
The FORMAT attribute currently must contain the string "0.2"; this will change over time and will hopefully allow newer parsers to read older documents.
The CREATOR attribute describes the creator of the model, and the VERSION attribute defines the version of the model. Both of these values are arbitrary and optional.
"Dynamic properties" are typed, user-defined, non-XML attributes which may be applied to networks and their variables (or nodes). (We also plan to extend dynamic properties to arcs (or edges) and distributions in the future.) Dynamic properties must be declared before they can be used. All such declarations appear in the DYNAMICPROPERTIES section, along with property definitions for the network itself. Property definitions for variables appear inside the VAR elements of the VARIABLES element.
There are five possible types of dynamic properties:
These elements occur only within DYNAMICPROPERTIES elements. They are the means by which the author of the belief network declares the names and data types of model-specific properties.
A property element binds a particular property instance value to either a network or a variable. Its NAME attribute must refer to a previously declared dynamic property type. Its parsed character data (PCDATA) is the value of the property itself.
Property character data is validated by the upper-level XBN parser; the DTD-driven XML parser only checks the syntax of the declarations.
The PROPXML element acts like a PROPERTY element, but any declarations are allowed within it. XBN parsers which do not use the XBN.DTD file could accept virtually any well-formed XML statements within a PROPXML element.
The VARIABLES section defines the random variables in the model. Each variable is declared in its own VAR element; each variable must have a NAME attribute whose value is a string unique within the BNMODEL. Names must follow the same guidelines as variable names in standard programming languages. (This provision allows for use of variable names in infix expressions in future versions.)
Each variable may have a DESCRIPTION (or long name) element. If the TYPE attribute of the variable has the value discrete, the STATENAME element is used to define the names of its discrete states; states are ordered in the sequence in which they appear. The XPOS and YPOS attributes define the variable's position for graphical editors.
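As an illustration of how a reader might consume VAR elements, here is a small sketch in Python using the standard-library `xml.etree.ElementTree` module. The helper name `read_variables` is hypothetical, and `str.isidentifier()` is used only as an assumed stand-in for the "variable names in standard programming languages" guideline, which the text does not define precisely (it is more permissive, accepting Unicode letters and language keywords).

```python
import xml.etree.ElementTree as ET

# Two variables copied from the example document above.
VARIABLES_TEXT = """<VARIABLES>
  <VAR NAME="a" TYPE="discrete" XPOS="13495" YPOS="10465">
    <DESCRIPTION>(a) Metastatic Cancer</DESCRIPTION>
    <STATENAME>Present</STATENAME>
    <STATENAME>Absent</STATENAME>
  </VAR>
  <VAR NAME="b" TYPE="discrete" XPOS="11290" YPOS="11965">
    <DESCRIPTION>(b) Serum Calcium Increase</DESCRIPTION>
    <STATENAME>Present</STATENAME>
    <STATENAME>Absent</STATENAME>
  </VAR>
</VARIABLES>"""


def read_variables(variables_text):
    """Map each variable name to its description, ordered states and position."""
    variables = {}
    for var in ET.fromstring(variables_text).findall("VAR"):
        name = var.get("NAME")
        # Assumption: treat "programming-language style" as a Python identifier.
        if name is None or not name.isidentifier():
            raise ValueError(f"invalid variable name: {name!r}")
        if name in variables:
            raise ValueError(f"duplicate variable name: {name!r}")
        variables[name] = {
            "description": var.findtext("DESCRIPTION", default=""),
            # State order is significant: index 0 is the first STATENAME.
            "states": [state.text for state in var.findall("STATENAME")],
            "position": (int(var.get("XPOS", "0")), int(var.get("YPOS", "0"))),
        }
    return variables


variables = read_variables(VARIABLES_TEXT)
print(variables["a"]["states"])  # prints ['Present', 'Absent']
```

Keeping the states in a list preserves their declaration order, which matters because other parts of the file refer to states by position.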
The STRUCTURE section lists the edges or dependency arcs present in the network. Each ARC element has PARENT and CHILD attributes. There is also a MEMBER attribute for declaring parentless and childless nodes which are to be considered as variables in the network.
In other words, any variable whose name does not appear in the STRUCTURE section at least once is logically omitted from the resulting network.
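The rule that unmentioned variables drop out of the network can be sketched as follows. This is an illustrative Python fragment, not the XBN tools' own logic: the function name `read_structure` and the extra variable named `unused` are hypothetical, and MEMBER declarations are deliberately not handled because their exact syntax is not shown in this document.

```python
import xml.etree.ElementTree as ET

# The example network's arcs, plus one hypothetical variable that no arc mentions.
MODEL_TEXT = """<BNMODEL NAME="Cancer">
  <VARIABLES>
    <VAR NAME="a" TYPE="discrete"/>
    <VAR NAME="b" TYPE="discrete"/>
    <VAR NAME="c" TYPE="discrete"/>
    <VAR NAME="d" TYPE="discrete"/>
    <VAR NAME="e" TYPE="discrete"/>
    <VAR NAME="unused" TYPE="discrete"/>
  </VARIABLES>
  <STRUCTURE>
    <ARC PARENT="a" CHILD="b"/>
    <ARC PARENT="a" CHILD="c"/>
    <ARC PARENT="b" CHILD="d"/>
    <ARC PARENT="c" CHILD="d"/>
    <ARC PARENT="c" CHILD="e"/>
  </STRUCTURE>
</BNMODEL>"""


def read_structure(model_text):
    """Return (parents, omitted): parent lists per node, and unmentioned variables."""
    model = ET.fromstring(model_text)
    declared = [var.get("NAME") for var in model.findall("VARIABLES/VAR")]

    parents = {}
    for arc in model.findall("STRUCTURE/ARC"):
        parent, child = arc.get("PARENT"), arc.get("CHILD")
        parents.setdefault(parent, [])               # a parent is part of the network
        parents.setdefault(child, []).append(parent)

    # Declared variables that never appear in STRUCTURE are logically omitted.
    # (MEMBER declarations are not handled in this sketch.)
    omitted = [name for name in declared if name not in parents]
    return parents, omitted


parents, omitted = read_structure(MODEL_TEXT)
print(parents["d"], omitted)  # prints ['b', 'c'] ['unused']
```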
The discrete conditional distributions for all variables are declared in the DISTRIBUTIONS section.
A distribution is represented by a DIST element. It may have a TYPE attribute which may be discrete or continuous. (We do not currently support continuous variables, so a complete exposition of continuous distributions has not been devised.) The target variable is given by the PRIVATE element's NAME attribute. (The usage PRIVATE alludes to our plans for "public" or sharable distributions, but that is not discussed here.)
Each distribution may have a CONDSET element which lists the target variable's parent set. The DPIS element represents the set of discrete parent instantiations; each group of probabilities in the set is represented by a DPI element. The INDEXES attribute of the DPI element gives the state index of each parent, starting from zero, in the same order as the parents are listed in the CONDSET element. The PCDATA for each DPI is the set of probabilities for the target variable's states.
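To make the CONDSET / DPIS / DPI layout concrete, the sketch below reads the distribution for variable "d" from the example into a table keyed by parent state indexes. It is written in Python with the standard-library `xml.etree.ElementTree` module; the function name `read_dist` is hypothetical, and the check that each row sums to one is a sanity check added here, not something the format description states.

```python
import math
import xml.etree.ElementTree as ET

# The distribution for "d" (Coma) given parents "b" and "c", from the example.
DIST_TEXT = """<DIST TYPE="discrete">
  <CONDSET>
    <CONDELEM NAME="b"/>
    <CONDELEM NAME="c"/>
  </CONDSET>
  <PRIVATE NAME="d"/>
  <DPIS>
    <DPI INDEXES=" 0 0 "> 0.8 0.2</DPI>
    <DPI INDEXES=" 0 1 "> 0.9 0.1</DPI>
    <DPI INDEXES=" 1 0 "> 0.7 0.3</DPI>
    <DPI INDEXES=" 1 1 "> 0.05 0.95</DPI>
  </DPIS>
</DIST>"""


def read_dist(dist_text):
    """Return (target, parents, table); table maps parent-state tuples to rows."""
    dist = ET.fromstring(dist_text)
    target = dist.find("PRIVATE").get("NAME")
    # CONDSET is optional; a root node simply has no parents.
    parents = [elem.get("NAME") for elem in dist.findall("CONDSET/CONDELEM")]

    table = {}
    for dpi in dist.findall("DPIS/DPI"):
        # INDEXES holds one zero-based state index per parent, in CONDSET order.
        key = tuple(int(i) for i in dpi.get("INDEXES", "").split())
        probs = [float(p) for p in (dpi.text or "").split()]
        if len(key) != len(parents):
            raise ValueError(f"INDEXES {key} does not match parents {parents}")
        # Added sanity check: each row should be a probability distribution.
        if not math.isclose(sum(probs), 1.0):
            raise ValueError(f"probabilities for {key} do not sum to 1")
        table[key] = probs
    return target, parents, table


target, parents, table = read_dist(DIST_TEXT)
# P(d=Present | b=Present, c=Absent): states are indexed in STATENAME order.
print(target, parents, table[(0, 1)][0])  # prints d ['b', 'c'] 0.9
```

For a parentless variable such as "a", the DPI element has no INDEXES attribute, so the single row is stored under the empty tuple `()`.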
We hope that this flexibility in declaration will allow simple extensions into continuous variables, as well as continuous variables with discrete parents. Again, the fact that the node names are unique and well-behaved will allow us to extend the PCDATA in DPI elements to contain infix expressions composed of references to parent variables.
Some XBN examples are available: