[This local archive copy is from the official and canonical URL, http://www.dmg.org/public/software/ncdm/pmml/PmmlDoc.html; please refer to the canonical source document if possible.]


PMML 0.9

DISCLAIMER: PMML 0.9 is intended as a "proof of concept" specification and implementation and is not intended to be a formal PMML standard document.

What is PMML 0.9?

Predictive Model Markup Language (PMML) describes the structure and intent of data mining models. PMML is a simple markup language that uses XML as its meta-language, in much the same way that Hypertext Markup Language (HTML) uses SGML as its meta-language. PMML yields semantically expressive descriptions of data mining models, from which different predictive models can be built.

PMML 0.9 is a "proof of concept" specification of the PMML language. This specification defines the intended interpretation of PMML 0.9 elements and places further constraints on the permitted syntax that cannot be expressed in the DTD. A PMML 0.9 document is stored in a file with a .pmml extension. The XML DTD for PMML 0.9 is likewise specified using XML and is stored in a file with a .dtd extension.
 
Why PMML?
 
In recent years a variety of predictive models have been developed within the data mining community, and there is significant interest in comparing and evaluating them. PMML addresses the problem of interchanging predictive models and of performing ensemble and distributed learning. Because it is a text-based markup language, it enables easy analysis and comparison of predictive models and eliminates the need for binary compatibility between the platforms on which the models were built. A new model type can be added to PMML simply by accommodating it in the DTD. PMML provides a flexible mechanism for defining schemas for predictive models and supports model selection and model averaging when multiple predictive models are involved. In addition, it facilitates moving models across applications and systems.
 

The PMML 0.9 Specification (see the PMML 0.9 DTD)
 
A PMML 0.9 document consists of several parts:

1) Header
2) Data Schema
3) Data Mining Schema
4) Predictive Model Schema
5) Definitions for Predictive Models
6) Definitions for Ensembles of Models
7) Rules for Selecting and Combining Models and Ensembles of Models
8) Rules for Exception Handling.

Among the above components, the definitions for predictive models (component 5) are mandatory. In addition, a schema for the predictive model must be defined, using one or more of the schema components (components 2, 3 and 4). All other components are optional.

See the sample PMML document based on this DTD.



 

HEADER Element

<!ELEMENT HEADER - O (DATA-SCHEMA & CREATION-INFORMATION?) >

This element contains the document header. The end tag for HEADER may always be omitted. The
content of the document header is an unordered collection of the following elements:
 

DATA-SCHEMA Element

<!ELEMENT DATA-SCHEMA - -

(ATTRIBUTE-DESCRIPTOR, ATTRIBUTE-DESCRIPTOR+ ) >

Every PMML 0.9 document must have exactly one DATA-SCHEMA element in the document's
HEADER. It provides the data-schema modeled by the given PMML file. It must contain at least
two attribute descriptors, one being the predicted attribute.

ATTRIBUTE-DESCRIPTOR Element

<!ELEMENT ATTRIBUTE-DESCRIPTOR - O ( MAPPING-FUNCTION? ) >

<!ATTLIST ATTRIBUTE-DESCRIPTOR
NAME CDATA #REQUIRED
USE-AS ( exclude | continuous | category | binary-category ) #REQUIRED
DATA-TYPE ( real | integer | boolean | string ) #REQUIRED >

This element describes a single attribute of the data schema. It can contain at most one mapping
function. NAME specifies the name of the attribute, USE-AS specifies how the attribute is used in the
data mining process, and DATA-TYPE specifies how the attribute is stored in the database.
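
The constraints stated above for DATA-SCHEMA and ATTRIBUTE-DESCRIPTOR can be checked mechanically. The following is a minimal sketch only: it assumes the descriptors have already been read into plain Python dictionaries keyed by attribute name, an in-memory form that this specification does not define, and the function name check_data_schema is hypothetical.

```python
# Allowed values copied from the ATTLIST of ATTRIBUTE-DESCRIPTOR.
USE_AS = {"exclude", "continuous", "category", "binary-category"}
DATA_TYPE = {"real", "integer", "boolean", "string"}


def check_data_schema(descriptors):
    """Return a list of problems found in a list of attribute descriptors.

    Each descriptor is a dict with the keys NAME, USE-AS and DATA-TYPE.
    This dict layout is an assumption made for illustration.
    """
    problems = []
    # DATA-SCHEMA requires at least two ATTRIBUTE-DESCRIPTOR children.
    if len(descriptors) < 2:
        problems.append("DATA-SCHEMA needs at least two attribute descriptors")
    for index, desc in enumerate(descriptors):
        # All three attributes are declared #REQUIRED.
        for required in ("NAME", "USE-AS", "DATA-TYPE"):
            if required not in desc:
                problems.append(f"descriptor {index}: missing {required}")
        if "USE-AS" in desc and desc["USE-AS"] not in USE_AS:
            problems.append(f"descriptor {index}: bad USE-AS {desc['USE-AS']!r}")
        if "DATA-TYPE" in desc and desc["DATA-TYPE"] not in DATA_TYPE:
            problems.append(f"descriptor {index}: bad DATA-TYPE {desc['DATA-TYPE']!r}")
    return problems


example_schema = [
    {"NAME": "age", "USE-AS": "continuous", "DATA-TYPE": "integer"},
    {"NAME": "risk", "USE-AS": "binary-category", "DATA-TYPE": "boolean"},
]
print(check_data_schema(example_schema))
```

The check does not verify which descriptor is the predicted attribute, because the ATTLIST above carries no marker for it.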
 

    MAPPING-FUNCTION

<!ELEMENT MAPPING-FUNCTION - - CDATA >

<!ATTLIST MAPPING-FUNCTION TYPE CDATA #REQUIRED>

The mapping function describes the transformation to be performed on the attribute. The TYPE
attribute indicates the language in which the mapping function is written.

    CREATION-INFORMATION

<!ELEMENT CREATION-INFORMATION - -

( COPYRIGHT? & APPLICATION? & INDIVIDUAL? & TIMESTAMP? ) >

This is the information about how, when and by whom the model was created. It is optional and the
tags in this sub tree are self-explanatory.



 

The MODEL element

<!ELEMENT MODEL - O

( CREATION-INFORMATION?, (CART-MODEL | REGRESSION-MODEL | ID3-MODEL ) ) >

<!ATTLIST MODEL

NAME CDATA #REQUIRED

TYPE (CART | C4.5 | OC-1) #REQUIRED

TRAINING-DATA-NAME CDATA #IMPLIED

TRAINING-DATA-SIZE NUMBER #IMPLIED >

This element contains the model-specific part of the document. The end tag for MODEL may be omitted.
The key attributes are the model name and the model type.

    CREATION-INFORMATION

This field is the same as what was described for the HEADER block.



 

 The Model Specific Part

This block describes the details of a particular type of predictive model. The PMML for each
supported model is presented below.

  1. C4.5 Model

    C45-MODEL

<!ELEMENT C45-MODEL - - ( (C45-NODE | C45-LEAF-NODE)+ ) >

<!ATTLIST C45-MODEL

...

>

    C45-NODE

<!ELEMENT C45-NODE - O EMPTY >

<!ATTLIST C45-NODE

... >

    C45-LEAF-NODE

<!ELEMENT C45-LEAF-NODE - O EMPTY>

<!ATTLIST C45-LEAF-NODE ... >



 

  2. CART Model

    CART-MODEL

<!ELEMENT CART-MODEL - - ( (CART-NODE | CART-LEAF-NODE)+ ) >

<!ATTLIST CART-MODEL

TYPE ( binary-classification | classification | regression ) #REQUIRED

ATTRIBUTE-PREDICTED CDATA #REQUIRED

NUMBER-NODES NUMBER #REQUIRED

DEPTH NUMBER #REQUIRED >

This tag marks the beginning of a CART model. Its attributes are the TYPE of the CART model, the
attribute that the model predicts, the number of nodes in the tree, and the depth of the tree.

    CART-NODE

<!ELEMENT CART-NODE - O EMPTY >

<!ATTLIST CART-NODE

NODE-NUMBER NUMBER #REQUIRED

ATTRIBUTE-NAME CDATA #REQUIRED

LEFT-CHILD NUMBER #REQUIRED

RIGHT-CHILD NUMBER #REQUIRED

CUT-VALUE CDATA #REQUIRED >

This denotes a non-leaf node in the tree. Its attributes are the node number, the attribute name
associated with the node, the node numbers of its left and right children and the cut value.

    CART-LEAF-NODE

<!ELEMENT CART-LEAF-NODE - O EMPTY>

<!ATTLIST CART-LEAF-NODE

NODE-NUMBER NUMBER #REQUIRED

SCORE CDATA #REQUIRED >

This denotes a leaf node in the tree. Its attributes are the node number and the class value
associated with it.
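
The node descriptions above are enough to score a record by walking the tree. The sketch below is illustrative only and rests on assumptions that this specification does not state: the root is taken to be node number 1, CUT-VALUE is treated as a number, and a record goes to the left child when its value is less than or equal to the cut value. The dictionaries and the name cart_predict are hypothetical.

```python
def cart_predict(nodes, leaves, record, root=1):
    """Walk from the root to a leaf and return that leaf's SCORE.

    nodes  maps NODE-NUMBER to the attributes of a CART-NODE.
    leaves maps NODE-NUMBER to the attributes of a CART-LEAF-NODE.
    Assumptions (not in the specification): root is node 1 and
    "value <= CUT-VALUE" selects the left child.
    """
    current = root
    # A valid path visits each node at most once; the bound guards
    # against malformed input whose child links form a cycle.
    for _ in range(len(nodes) + len(leaves) + 1):
        if current in leaves:
            return leaves[current]["SCORE"]
        node = nodes[current]
        value = float(record[node["ATTRIBUTE-NAME"]])
        cut = float(node["CUT-VALUE"])
        current = node["LEFT-CHILD"] if value <= cut else node["RIGHT-CHILD"]
    raise ValueError("tree contains a cycle")


# A three-node tree: one split on "age" and two leaves.
example_nodes = {
    1: {"ATTRIBUTE-NAME": "age", "LEFT-CHILD": 2, "RIGHT-CHILD": 3, "CUT-VALUE": "30"},
}
example_leaves = {2: {"SCORE": "low"}, 3: {"SCORE": "high"}}

print(cart_predict(example_nodes, example_leaves, {"age": 25}))
print(cart_predict(example_nodes, example_leaves, {"age": 42}))
```

Keeping non-leaf and leaf nodes in separate dictionaries mirrors the two element types, so a node number found in neither dictionary surfaces as a KeyError instead of a silent wrong answer.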



 

  3. ID3 Model

    ID3-MODEL

<!ELEMENT ID3-MODEL - - ( (ID3-NODE | ID3-LEAF-NODE)+ ) >

<!ATTLIST ID3-MODEL

ATTRIBUTE-PREDICTED CDATA #REQUIRED

NUMBER-NODES NUMBER #REQUIRED

DEPTH NUMBER #REQUIRED >

This tag marks the beginning of an ID3 model. Its attributes are the attribute that the model
predicts, the number of nodes in the tree, and the depth of the tree.

    ID3-NODE

<!ELEMENT ID3-NODE - O EMPTY >

<!ATTLIST ID3-NODE

NODE-NUMBER NUMBER #REQUIRED

ATTRIBUTE-NAME CDATA #REQUIRED

CUT-VALUE CDATA #REQUIRED

LEFT-CHILD NUMBER #REQUIRED

RIGHT-SIBLING NUMBER #REQUIRED >

This denotes a non-leaf node in the tree. Its attributes are the node number, the attribute name
associated with the node, the node numbers of its left child and right sibling and the cut value.

    ID3-LEAF-NODE

<!ELEMENT ID3-LEAF-NODE - O EMPTY>

<!ATTLIST ID3-LEAF-NODE

NODE-NUMBER NUMBER #REQUIRED

CUT-VALUE CDATA #REQUIRED

SCORE CDATA #REQUIRED

RIGHT-SIBLING NUMBER #REQUIRED >

This denotes a leaf node in the tree. Its attributes are the node number, the cut value, the class value
associated with the leaf, and the node number of its right sibling.
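
Unlike CART nodes, ID3 nodes use a left-child, right-sibling encoding, so a node's children are found by following LEFT-CHILD once and then RIGHT-SIBLING repeatedly. The sketch below shows one way to read such a tree. It assumes several things that this specification leaves open: that -1 marks "no sibling", that the root is node number 1, and that each node's CUT-VALUE is the value of the parent's attribute that selects that node. All names in the code are hypothetical.

```python
NO_NODE = -1  # hypothetical sentinel: the specification names no "none" value


def children(tree, number):
    """Return the child node numbers of a non-leaf node, left to right."""
    found = []
    child = tree[number]["LEFT-CHILD"]
    while child != NO_NODE:
        if child in found:
            raise ValueError("sibling chain loops back on itself")
        found.append(child)
        child = tree[child]["RIGHT-SIBLING"]
    return found


def id3_predict(tree, record, root=1):
    """Descend from the root, matching each child's CUT-VALUE, to a SCORE."""
    current = root
    visited = set()
    while "SCORE" not in tree[current]:  # only leaf nodes carry SCORE
        if current in visited:
            raise ValueError("tree contains a cycle")
        visited.add(current)
        wanted = str(record[tree[current]["ATTRIBUTE-NAME"]])
        for child in children(tree, current):
            if tree[child]["CUT-VALUE"] == wanted:
                current = child
                break
        else:
            raise KeyError(f"no branch for value {wanted!r}")
    return tree[current]["SCORE"]


# Nodes 1 and 4 are ID3-NODEs; the others are ID3-LEAF-NODEs.
example_tree = {
    1: {"ATTRIBUTE-NAME": "outlook", "CUT-VALUE": "", "LEFT-CHILD": 2, "RIGHT-SIBLING": NO_NODE},
    2: {"CUT-VALUE": "sunny", "SCORE": "no", "RIGHT-SIBLING": 3},
    3: {"CUT-VALUE": "overcast", "SCORE": "yes", "RIGHT-SIBLING": 4},
    4: {"ATTRIBUTE-NAME": "windy", "CUT-VALUE": "rain", "LEFT-CHILD": 5, "RIGHT-SIBLING": NO_NODE},
    5: {"CUT-VALUE": "true", "SCORE": "no", "RIGHT-SIBLING": 6},
    6: {"CUT-VALUE": "false", "SCORE": "yes", "RIGHT-SIBLING": NO_NODE},
}

print(children(example_tree, 1))
print(id3_predict(example_tree, {"outlook": "rain", "windy": "false"}))
```

The sibling encoding lets a node have any number of children while every element still carries a fixed set of attributes, which is presumably why it was chosen over LEFT-CHILD and RIGHT-CHILD here.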
 



 

  4. Linear Regression Model

    LINEAR-REGRESSION-MODEL

<!ELEMENT LINEAR-REGRESSION-MODEL - O ( LINEAR-REGRESSION-COEFFICIENT )+ >

<!ATTLIST LINEAR-REGRESSION-MODEL

DIMENSION CDATA #REQUIRED >

It marks the beginning of a linear regression model. The attribute of this tag is the dimension of the
model.

    LINEAR-REGRESSION-COEFFICIENT

<!ELEMENT LINEAR-REGRESSION-COEFFICIENT - O EMPTY>

<!ATTLIST LINEAR-REGRESSION-COEFFICIENT

COEFFICIENT-POSITION CDATA #REQUIRED

COEFFICIENT-VALUE CDATA #REQUIRED >

This tag gives the position and the coefficient value for that position in the linear regression model.
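
A model described this way can be evaluated as a weighted sum. The following sketch is illustrative and makes assumptions beyond the text above: COEFFICIENT-POSITION is read as a zero-based index into an input vector of length DIMENSION, positions with no coefficient contribute nothing, and no separate intercept term is modelled. The name linear_predict is hypothetical.

```python
def linear_predict(dimension, coefficients, inputs):
    """Return the weighted sum of inputs using position-indexed coefficients.

    coefficients is a list of dicts holding the COEFFICIENT-POSITION and
    COEFFICIENT-VALUE attributes as strings, since both are declared CDATA.
    Zero-based positions are an assumption, not part of the specification.
    """
    if len(inputs) != dimension:
        raise ValueError("input length must equal DIMENSION")
    total = 0.0
    for coef in coefficients:
        position = int(coef["COEFFICIENT-POSITION"])
        if not 0 <= position < dimension:
            raise ValueError(f"position {position} is outside the model")
        total += float(coef["COEFFICIENT-VALUE"]) * inputs[position]
    return total


example_coefficients = [
    {"COEFFICIENT-POSITION": "0", "COEFFICIENT-VALUE": "2.0"},
    {"COEFFICIENT-POSITION": "1", "COEFFICIENT-VALUE": "-0.5"},
]

# 2.0 * 3.0 + (-0.5) * 4.0
print(linear_predict(2, example_coefficients, [3.0, 4.0]))
```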


A Sample PMML Document
