DMG.org -- The Data Mining Group

[This local archive copy is from the official and canonical URL, http://www.dmg.org/public/techreports/pmml-1_0-overview.html; please refer to the canonical source document if possible.]

PMML 1.0 -- Overview

What Is PMML?

A PMML document provides a non-procedural definition of fully trained or parameterized analytic models, with enough information for an application to deploy them. By parsing the PMML with any standard XML parser, the application can determine three things: the types of data input to and output from the models, the detailed forms of the models, and how to interpret their results in terms of standard data mining terminology.
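As a rough illustration of what "any standard XML parser" makes possible, the sketch below uses Python's built-in xml.etree.ElementTree to read a small PMML-style document and list each model's input and output fields with their types. The element and attribute names (DataDictionary, DataField, optype, MiningSchema, MiningField, usageType) follow conventions from later PMML versions. They are assumptions for illustration and have not been checked against the 1.0 DTDs. The document content is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML-style document. Element/attribute names are modeled on
# later PMML versions and are assumptions, not quoted from the 1.0 DTDs.
PMML_TEXT = """
<PMML version="1.0">
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous"/>
    <DataField name="region" optype="categorical"/>
    <DataField name="churn" optype="categorical"/>
  </DataDictionary>
  <TreeModel modelName="churn_tree">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="region"/>
      <MiningField name="churn" usageType="predicted"/>
    </MiningSchema>
  </TreeModel>
</PMML>
"""

def describe_io(pmml_text):
    """Return (inputs, outputs) as lists of (field name, optype) pairs."""
    root = ET.fromstring(pmml_text)
    # Map every dictionary field to its declared operational type.
    optypes = {f.get("name"): f.get("optype") for f in root.iter("DataField")}
    inputs, outputs = [], []
    for mf in root.iter("MiningField"):
        name = mf.get("name")
        # In this sketch, fields marked "predicted" are outputs; all others are inputs.
        target = outputs if mf.get("usageType") == "predicted" else inputs
        target.append((name, optypes.get(name)))
    return inputs, outputs

inputs, outputs = describe_io(PMML_TEXT)
print("inputs:", inputs)    # [('age', 'continuous'), ('region', 'categorical')]
print("outputs:", outputs)  # [('churn', 'categorical')]
```

Because the dictionary and the model schema are plain XML elements, the application needs no vendor-specific code to discover what a model expects and what it produces.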

The detailed form of a model varies with its type, but every form is a complete textual definition. Once parsed, a definition provides enough information for some other entity either to generate a program from it or to execute the model interpretively, driven by the parse tree.
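To make "parse-tree driven interpretive execution" concrete, here is a minimal sketch that scores a record by walking a parsed decision tree directly, without generating any program. The tree markup (Node, SimplePredicate, True, False, the score attribute, and the operator names) is hypothetical and modeled loosely on later PMML versions. The sketch handles only numeric comparisons.

```python
import xml.etree.ElementTree as ET

# Hypothetical decision tree in a PMML-like form (names are assumptions).
TREE_XML = """
<TreeModel>
  <Node score="no">
    <True/>
    <Node score="yes">
      <SimplePredicate field="age" operator="lessThan" value="30"/>
    </Node>
    <Node score="no">
      <True/>
    </Node>
  </Node>
</TreeModel>
"""

# Numeric comparison operators supported by this sketch.
OPS = {
    "equal": lambda a, b: a == b,
    "lessThan": lambda a, b: a < b,
    "lessOrEqual": lambda a, b: a <= b,
    "greaterThan": lambda a, b: a > b,
    "greaterOrEqual": lambda a, b: a >= b,
}

def predicate_holds(node, record):
    """Evaluate the first predicate child of a Node against a record."""
    for child in node:
        if child.tag == "True":
            return True
        if child.tag == "False":
            return False
        if child.tag == "SimplePredicate":
            value = record[child.get("field")]
            return OPS[child.get("operator")](value, float(child.get("value")))
    raise ValueError("node has no recognised predicate")

def score(node, record):
    """Descend into the first child Node whose predicate holds.
    Return the score of the deepest node reached."""
    for child in node.findall("Node"):
        if predicate_holds(child, record):
            return score(child, record)
    return node.get("score")

root_node = ET.fromstring(TREE_XML).find("Node")
print(score(root_node, {"age": 25}))  # yes
print(score(root_node, {"age": 42}))  # no
```

The "first matching child wins" rule is a simplifying choice made for this sketch. A real standard would need to define the evaluation order and the handling of missing values explicitly.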

Version 1.0 of the standard provides a small set of DTDs that specify the entities and attributes for documenting decision tree and multinomial logistic regression models. This is by no means a comprehensive set, and we expect the standard to evolve rapidly to cover a robust collection of model types. We are publishing this limited set to demonstrate the fundamentals of PMML with a realistic and useful "initial value" for what will emerge as a comprehensive and rich collection of modeling capabilities.

Version 1.0 DTDs follow a common pattern: a data dictionary is combined with one or more model definitions to which that dictionary directly applies. As you will see, our dictionary elements are very primitive. We anticipate and look forward to subsequent versions of this standard introducing optimizations, such as bit vector expansions of categorical fields or log transforms of continuous fields. Before such optimizations can be included, however, we believe it is necessary to agree on a minimally sufficient infrastructure. We also expect to provide definitions based on XML Schema once it becomes a formal W3C recommendation.
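One practical consequence of pairing a single data dictionary with several model definitions is that a consumer can check that every field a model refers to is actually declared in the dictionary. The sketch below performs that check. The document, the model names, and the element names (including RegressionModel and MiningSchema) are hypothetical and modeled on later PMML versions. They are not taken from the 1.0 DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one data dictionary shared by two models.
DOC = """
<PMML version="1.0">
  <DataDictionary>
    <DataField name="age" optype="continuous"/>
    <DataField name="income" optype="continuous"/>
    <DataField name="segment" optype="categorical"/>
  </DataDictionary>
  <TreeModel modelName="segment_tree">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="segment" usageType="predicted"/>
    </MiningSchema>
  </TreeModel>
  <RegressionModel modelName="segment_logit">
    <MiningSchema>
      <MiningField name="income"/>
      <MiningField name="tenure"/>
      <MiningField name="segment" usageType="predicted"/>
    </MiningSchema>
  </RegressionModel>
</PMML>
"""

def undeclared_fields(doc_text):
    """Map each model name to the fields it uses that the dictionary lacks."""
    root = ET.fromstring(doc_text)
    declared = {f.get("name") for f in root.iter("DataField")}
    problems = {}
    for model in root:
        schema = model.find("MiningSchema")
        if schema is None:
            continue  # not a model element, e.g. the DataDictionary itself
        used = [mf.get("name") for mf in schema.findall("MiningField")]
        missing = [name for name in used if name not in declared]
        if missing:
            problems[model.get("modelName")] = missing
    return problems

print(undeclared_fields(DOC))  # {'segment_logit': ['tenure']}
```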

Why PMML?

One major goal of PMML is to allow applications and on-line analytic processing tools to use models obtained from multiple sources without having to deal with the individual differences between those sources. Another goal is to enable combined, collaborative use of a potentially very large number of individual models. A related goal is proactive administration of collections of models based on business needs as well as mathematical principles. We believe these capabilities are fundamental to effective deployment of analytic models in commercial application domains. PMML, or something very like it, is urgently needed to satisfy dramatically increased requirements for statistical and data mining tools and technologies in business systems.

PMML Strategy

PMML Version 1.0 has been developed by a loose affiliation of Angoss, Magnify, NCR, SPSS, and The University of Illinois at Chicago. Our strategy is to turn this activity into a W3C working group and have PMML become a W3C recommendation. As part of the W3C affiliation, we expect to expand group membership to include other major players in the data mining tools and applications space.

DMG.org is hosted by the National Center for Data Mining at The University of Illinois at Chicago (UIC).
