The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: May 26, 2005
Predictive Model Markup Language (PMML)

[October 2004] PMML Version 3.0 description: "The Predictive Model Markup Language (PMML) is an XML-based language which provides a way for applications to define statistical and data mining models and to share models between PMML compliant applications." It is formally defined in an XML Schema. PMML "provides applications a vendor-independent method of defining models so that proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications. It allows users to develop models within one vendor's application, and use other vendors' applications to visualize, analyze, evaluate or otherwise use the models. Previously, this was very difficult, but with PMML, the exchange of models between compliant applications is now straightforward... PMML is complementary to many other data mining standards. Its XML interchange format is supported by several other standards, such as XML for Analysis, JSR 73, and SQL/MM Part 6: Data Mining."

[August 01, 2002] "Predictive Model Markup Language (PMML) is an XML-based language which provides a quick and easy way for companies to define predictive models and share models between compliant vendors' applications. A PMML document provides a non-procedural definition of fully trained or parameterized analytic models with sufficient information for an application to deploy them. By parsing the PMML using any standard XML parser the application can determine the types of data input to and output from the models, the detailed forms of the models, and how, in terms of standard data mining terminology, to interpret their results. Version 1.0 of the standard provides a small set of DTDs that specify the entities and attributes for documenting decision tree and multinomial logistic regression models. This is by no means a comprehensive set, and our expectation is that this standard will evolve very rapidly to cover a robust collection of model types. The purpose of publishing this limited set is to demonstrate the fundamentals of PMML with a realistic and useful "initial value" of what will emerge as a comprehensive and rich collection of modeling capabilities. Version 1.0 DTDs follow a common pattern of combining a data dictionary with one or more model definitions to which that dictionary immediately applies. As you will see, our dictionary elements are very primitive. We anticipate and look forward to subsequent versions of this standard introducing optimizations, such as bit vector expansions of categorical fields or log transforms of continuous fields, but we believe that before such optimizations can be included it is necessary to agree on minimally sufficient infrastructure. We also expect to provide definitions based on XML Schema definitions, once those become formal W3C recommendations."
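The parsing step described above can be sketched in a few lines of Python. This is only an illustration: the sample document is hand-written, has not been validated against any PMML DTD or XML Schema, and its element and attribute names (`DataDictionary`, `DataField`, `MiningSchema`, `MiningField`, `optype`, `usageType`) are assumptions that borrow the data dictionary and mining schema vocabulary used in PMML descriptions. The helper name `describe_fields` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment for illustration only; it has not been
# validated against any PMML DTD or XML Schema.
SAMPLE = """
<PMML version="illustrative">
  <DataDictionary>
    <DataField name="age" optype="continuous"/>
    <DataField name="plan" optype="categorical"/>
    <DataField name="churn" optype="categorical"/>
  </DataDictionary>
  <TreeModel modelName="churn_tree">
    <MiningSchema>
      <MiningField name="age" usageType="active"/>
      <MiningField name="plan" usageType="active"/>
      <MiningField name="churn" usageType="predicted"/>
    </MiningSchema>
  </TreeModel>
</PMML>
"""


def describe_fields(pmml_text):
    """Return (inputs, outputs) as lists of (field name, optype) pairs."""
    root = ET.fromstring(pmml_text.strip())
    # The data dictionary gives the type of every field by name.
    optypes = {f.get("name"): f.get("optype") for f in root.iter("DataField")}
    inputs, outputs = [], []
    # The mining schema says how the model uses each field.
    for field in root.iter("MiningField"):
        name = field.get("name")
        usage = field.get("usageType", "active")  # default is an assumption
        pair = (name, optypes.get(name))
        if usage == "predicted":
            outputs.append(pair)
        elif usage == "active":
            inputs.append(pair)
    return inputs, outputs


inputs, outputs = describe_fields(SAMPLE)
print("inputs:", inputs)
print("outputs:", outputs)
```

A consuming application would read a real PMML file in place of `SAMPLE`; the point is only that a generic XML parser is enough to recover which fields a model reads and which it predicts.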

"One major goal of PMML is to allow applications and on-line analytic processing tools to models obtained from multiple sources without having to deal with individual differences between those sources. Another goal is to enable combined, collaborative use of a potentially very large number of individual models and proactive administration of collections of models based on business needs as well as mathematical principles. We believe these capabilities are fundamental to effective deployment of analytic models in commercial application domains. PMML, or something very like it, is urgently needed to satisfy dramatically increased requirements for statistical and data mining tools and technologies in business systems."

"PMML Version 1.0 has been developed by a loose affiliation of Angoss, Magnify, NCR, SPSS, and The University of Illinois, Chicago. Our strategy is to turn this activity into a W3C working group and have PMML become a W3C recommendation. As part of W3C affiliation we expect to increase group membership to include other major players in the data mining tools and applications space." [As of July 1999, the PMML Consortium Committee had three institutional members: Magnify, Inc. & Magnify Research, the National Center for Data Mining, and Imperial College, London.]

A version 0.9 'Predictive Model Markup Language - Document Type Definition' (by Philip L. Hallstrom) is available for review. Magnify is providing an open source architecture for the PMML.

Details on the specification are provided in a PMML paper presented at the Armed Forces Communications and Electronics Association (AFCEA) '99 conference: "The Management and Mining of Multiple Predictive Models Using the Predictive Modeling Markup Language (PMML)," by Robert Grossman, Stuart Bailey, Ashok Ramu, Balinder Malhi, Michael Cornelison, Philip Hallstrom, and Xiao Qin. The authors "introduce a markup language based upon XML for working with the predictive models produced by data mining systems. The language is called the Predictive Model Markup Language (PMML) and can be used to define predictive models and ensembles of predictive models. It provides a flexible mechanism for defining schema for predictive models and supports model selection and model averaging involving multiple predictive models. It has proved useful for applications requiring ensemble learning, partitioned learning, and distributed learning. In addition, it facilitates moving predictive models across applications and systems. . . In particular, we feel that PMML is well suited for partition learning, meta-learning, distributed learning, and related areas. Models described using PMML consist of several parts: 1) a header, 2) a data schema, 3) a data mining schema, 4) a predictive model schema, 5) definitions for predictive models, 6) definitions for ensembles of models, 7) rules for selecting and combining models and ensembles of models, 8) rules for exception handling. Component 5) is required. In addition a schema for the predictive model must be defined. This can be done using one or more of the schemas - components 3, 4, and 5. The other components are optional."
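The ideas of model averaging and model selection mentioned in the paper can be illustrated with a short Python sketch. It is not PMML syntax and does not reproduce the paper's rules for combining models; the two scoring functions, the weights, and the helper names `average_models` and `select_model` are all hypothetical and chosen only to show the semantics.

```python
# Hypothetical stand-ins for trained models; each maps a record to a score.
def model_north(record):
    return 0.10 + 0.02 * record["visits"]


def model_south(record):
    return 0.30 + 0.01 * record["visits"]


def average_models(weighted_models, record):
    """Weighted average of model scores; weights need not sum to 1."""
    total = sum(weight for weight, _ in weighted_models)
    return sum(weight * model(record) for weight, model in weighted_models) / total


def select_model(rules, default, record):
    """Return the first model whose predicate accepts the record."""
    for predicate, model in rules:
        if predicate(record):
            return model
    return default


# Assumed weights, e.g. proportional to the size of each training partition.
ensemble = [(3.0, model_north), (1.0, model_south)]
rules = [(lambda r: r["region"] == "north", model_north)]
record = {"region": "south", "visits": 10}

print(average_models(ensemble, record))
print(select_model(rules, model_south, record)(record))
```

Averaging suits partitioned or distributed learning, where each model has seen only part of the data; selection suits cases where one model is known to be more reliable for a given segment of records.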

Principal References

General: News, Articles, Papers, Early Drafts

  • [May 27, 2005] New Release of Predictive Model Markup Language (PMML) from SourceForge. SourceForge developers have issued two recent updates to Version 3 of the Predictive Model Markup Language (PMML). Considered to be the most widely deployed data mining standard, PMML is an XML markup language used to describe statistical and data mining models. Formally defined in a W3C XML Schema language, PMML "describes the inputs to data mining models, the transformations used prior to prepare data for data mining, and the parameters which define the models themselves." PMML is used for a wide variety of applications, including applications in finance, e-business, direct marketing, manufacturing, and defense. PMML is complementary to several other data mining standards: its XML interchange format is supported by XML for Analysis (XMLA), JSR 73, and 'SQL/MM Part 6: Data Mining'. As of PMML Version 3.0.2, the specification is said to represent a mature standard such that deployment through the creation of PMML scoring engines is now straightforward. For PMML version 3.1 and later, the development team will continue to add new statistical and data-mining models, reducing the need to use approved extension mechanisms. They also plan to enhance support for data preparation, which is still a labor-intensive task for some applications. PMML specification development has been advanced for several years by the independent, vendor-led Data Mining Group (DMG), though end user companies are now showing heightened interest. According to a published "Overview of PMML Version 3.0" by Stefan Raspl (IBM), "PMML is an application and system independent interchange format for statistical and data mining models. More precisely, the goal of PMML is to encapsulate a model in an application and system independent fashion so that two different applications (the PMML Producer and Consumer) can use it. PMML Version 3.0 adds the ability to compose certain data mining operations. For example, the outputs of regression models can be used as the inputs to other models (model sequencing) and a decision tree or regression model can be used to combine the outputs of several embedded models (model selection)." Three new models in PMML Version 3 include rule sets, support vector machines, and text models.

  • [May 26, 2005] "PMML: Data Mining for the Masses? PMML Recasts the Data Warehouse as a Turnkey Platform for Real-Time Data Mining." By Stephen Swoyer. From Enterprise Systems (May 25, 2005) "It's been eight years in the making, but the predictive modeling mark-up language (PMML) is on the verge of going mainstream. PMML is an XML markup language that's used to describe statistical and data-mining models. Its principal selling point is that it gives PMML-compliant applications an easy way to share data models with other PMML-aware tools. It's stewarded by the Data Mining Group, an industry consortium that comprises a Who's Who of data mining and relational database vendors, including lots of familiar faces — such as IBM Corp., Microsoft Corp., Oracle Corp., SAP AG, SAS Institute Inc., SPSS Inc., and NCR Corp. subsidiary Teradata. One benefit, proponents say, is that PMML effectively brings data mining down from the mountain — i.e., the rarefied realm of the SAS or SPSS guru — and democratizes it. 'Data mining can be looked at as two major processes. The development of the model still has to be done by the SAS or SPSS experts, but PMML helps with the deployment, taking that data model and executing it,' says Arlene Zaima, advanced analytic program manager with Teradata. As a result, Zaima says, users who aren't familiar with the intricacies of SAS or SPSS can work effectively with pre-built PMML data models. 'The deployment [by users] is done much more frequently — daily or even up to the minute — and that's what PMML addresses: the execution for the model.' Like another long-gestating standard, the XML Query language (XQuery), PMML has been in the works for some time. But unlike XQuery, PMML has steadily evolved over time, starting with the PMML 1.1 release five years ago. Today, PMML is in version 3.0 and companies such as Teradata, IBM, Oracle, and Microsoft offer varying degrees of support for the technology..."

  • [May 01, 2005] " Enhancing the User Experience: MicroStrategy 8 builds on the BI Suite's Technical Merits While Becoming More Accessible to End Users." By Cindi Howson. From Intelligent Enterprise (May 1, 2005) "Version 8 of MicroStrategy's BI platform builds on an already solid architecture with more of the bells and whistles that business users love. The latest version includes a number of innovations that address past limitations, making the software easier to use, giving it access to a broader range of data and enhancing it with predictive analytics. MicroStrategy uses a number of techniques such as multipass SQL and analytic functions built into its Intelligent Server to answer complex business questions. To extend this capability further, MicroStrategy 8 introduces Data Mining Services with four types of statistical models: regression, neural networks, cluster models and decision trees. Across the industry, there have been many failed attempts to commoditize data mining. What MicroStrategy seems to have recognized is that while many people want to leverage statistical models, few people can build them. Thus with Data Mining Services, MicroStrategy leaves the model creation in the hands of experts. Statisticians use statistical packages (such as SPSS, SAS Enterprise Miner or IBM Intelligent Miner) to create a data mining model and then export it as PMML (Predictive Modeling Markup Language, a relatively new standard defined by the Data Mining Group). MicroStrategy then imports the PMML so that new metrics can use the model. So, for example, 'Churn Predictor' metrics can assign scores to customers to indicate how likely they are to switch to new service providers. Churn Predictors can then be added to any MicroStrategy report..."

  • [February 09, 2005] "MicroStrategy Version 8 Gets Thumbs Up from Analysts, Users. BI Suite Boasts Improved Support for Heterogeneous Data Sources, New Data Mining Capabilities, and Integration With SAP BW." By Stephen Swoyer. From Enterprise Systems (February 09, 2005). "If it seems like years since MicroStrategy Inc. last delivered a major new release of its business intelligence suite, you're right. It was nearly three years ago. Last month the McLean, Virginia-based company announced MicroStrategy 8, its first major business intelligence platform release. MicroStrategy bills version 8 as a significantly revamped offering that includes integrated reporting, analysis, and data-mining capabilities in addition to the company's bread-and-butter ROLAP technology. All told, version 8 includes more than 2,000 enhancements... MicroStrategy 8 shouldn't be used as a replacement for an existing data-mining tool. Instead, it uses the Predictive Model Markup Language (PMML) to import data-mining models from other tools. 'MicroStrategy 8 doesn't discover the predictive model. That's still the job of the dedicated data-mining product from SAS or SPSS or IBM or Teradata; instead, we take the models these other tools discover and import them as a standard MicroStrategy metric'..."

  • [August 22, 2004] Online Proceedings of the Second Annual Workshop on Data Mining Standards, Services and Platforms. KDD-2004 Workshop on Data Mining Standards, Services and Platforms (DM-SSP 04). August 22, 2004, Seattle, WA, USA. From the Preface: "This year marks the fourth year that there has been a KDD workshop on the Predictive Model Markup Language (PMML) and related areas and the second year of a broader conference with the theme of Data Mining Standards, Services and Platforms. It's perhaps useful to think of the role played by the relational database model and the standard infrastructure provided by relational databases in the theory and practice of databases. The field of data mining is in some sense very far from either a theory or a standard infrastructure for data mining. On the other hand, from another perspective one of the goals of PMML was to create a standard interface between producers of models, such as statistical or data mining systems, and consumers of models, such as scoring engines, applications containing embedded models, and other operational systems. There are now quite a few vendors shipping scoring engines, which is an important measure of success in this area. For the past several years, the developers of PMML have been working to create a similar mechanism so that the transformations and compositions required in the data processing, which are so essential to data mining, can be similarly encapsulated. This is one of the themes of this year's workshop. As a standard architecture for scoring and a standard architecture for data preparation emerges, we are one step closer to a standard infrastructure for data mining..."

  • [August 22, 2004] "A Simple Strategy for Composing Data Mining Operations." By Robert L. Grossman (University of Illinois at Chicago and Open Data Partners), David Hanley (University of Illinois at Chicago), and Gregor Meyer (IBM). Presented at KDD-2004 Workshop on Data Mining Standards, Services and Platforms (DM-SSP 04), Sunday, August 22, 2004. "An important element in data preparation is composing data mining operations. In this note, we discuss some of the issues involved when composing data mining operations. We also describe the support in PMML Version 3.0 for two of the most common type of compositions: using the output of one model as the input to another model (model sequencing) and using one model to select one or more other models (model selection or averaging)... It is now standard in data mining to view the data mining process as consisting of several steps, some of the most important of which are: data preparation, data modeling, and scoring or deployment. Today, there are well defined architectures and standards for scoring. In particular, the Predictive Model Markup Language or PMML provides a clean interface between producers of models, such as a statistical or data mining system, and consumers of models, such as a scoring system or application that employs embedded analytics..."

  • [August 22, 2004] "An Overview of PMML Version 3.0." By Stefan Raspl (IBM). Presented at KDD-2004 Workshop on Data Mining Standards, Services and Platforms (DM-SSP 04), Sunday, August 22, 2004. "This paper gives an overview of some of the changes in Version 3.0 of the Predictive Model Markup Language (PMML), which is expected to be released in 2004. PMML Version 3.0 adds several new models, including models for rule sets and text mining. It also adds the ability to compose certain data mining operations. For example, in PMML Version 3.0 the outputs of regression models can be used as the inputs to other models (model sequencing) and a decision tree or regression model can be used to combine the outputs of several embedded models (model selection)... PMML is an application and system independent interchange format for statistical and data mining models. More precisely, the goal of PMML is to encapsulate a model in an application and system independent fashion so that two different applications (the PMML Producer and Consumer) can use it... Perhaps the most significant changes to PMML 3.0 is the support for model composition through model sequencing and model selection. Together with the improved support for built-in functions and user-defined functions, Version 3.0 of PMML now provides a much more powerful platform for data preparation. PMML 3.0 also adds several new model types: support vector machines, text models, and rule sets..."

  • [April 28, 2004] "SAS and IBM Demonstrate Industry Leadership in Data Mining Interoperability." In Online News published by DMReview.com (April 28, 2004). "SAS, a leader in business intelligence, announces the immediate availability of enhanced interoperability between SAS' and IBM's data mining offerings. SAS and IBM are the first companies to deliver production software that implements an extended version of the Predictive Model Markup Language (PMML) 2.1 standard, providing customers with the capability to rapidly integrate new predictive and descriptive models in operational systems without manual interference. SAS Enterprise Miner for SAS9 generates PMML for more modeling algorithms including linear and logistic regression, decision trees, neural networks, clustering and associations than does any other solution on the market. SAS Enterprise Miner PMML has been enhanced beyond PMML 2.1 to support models including SAS formats, datatypes, and functions, as well as additional modeling extensions. IBM has tested the deployment of more than 160 different SAS Enterprise Miner PMML models with the IBM DB2 Intelligent Miner Scoring engine. PMML model exchange is available in the current release of SAS Enterprise Miner data mining workbench in combination with IBM's Intelligent Miner Scoring, an optional feature of IBM DB2 Universal Database (UDB) V8.1. The cooperative relationship between SAS and IBM enables both companies to utilize each other's strengths in the data mining arena..."

  • [August 01, 2002] "Data Mining Standards Initiatives." By Robert L. Grossman (Laboratory of Advanced Computing and the National Center for Data Mining, University of Illinois at Chicago), Mark F. Hornick (Data Mining Technologies, Oracle Corp), and Gregor Meyer (Business Intelligence Unit, IBM Corp., San Jose, CA). In Communications of the ACM (CACM) Volume 45, Issue 8 (August 2002), pages 59-61. Special Issue: Evolving Data Mining Into Solutions For Insights. ['Lacking standards for statistical and data mining models, applications cannot leverage the benefits of data mining.'] Early draft version. "... The Predictive Model Markup Language (PMML) is an XML standard being developed by the Data Mining Group, a vendor-led consortium established in 1998 to develop data mining standards. PMML represents and describes data mining and statistical models, as well as some of the operations required for cleaning and transforming data prior to modeling. PMML aims to provide enough infrastructure for an application to be able to produce a model (the PMML producer) and another application to consume it (the PMML consumer) simply by reading the PMML XML data file... PMML consists of the following components: (1) Data dictionary. Defines the input attributes to models and specifies each one's type and value range. (2) Mining schema. Precisely one in each model, listing the schema's attributes and their role in the model; these attributes are a subset of the attributes in the data dictionary. The schema contains information specific to a certain model, while the data dictionary contains data definitions that do not vary by model. It also specifies an attribute's usage type, which can be active (an input of the model), predicted (an output of the model), or supplementary (holding descriptive information and ignored by the model). (3) Transformation dictionary. Can contain any of the following transformations: normalization (mapping continuous or discrete values to numbers); discretization (mapping continuous values to discrete values); value mapping (mapping discrete values to discrete values); and aggregation (summarizing or collecting groups of values, such as by computing averages). (4) Model statistics. Univariate statistics about the attributes in the model. (5) Models. Model parameters specified by tags. PMML v.2.0 includes regression models, cluster models, trees, neural networks, Bayesian models, association rules, and sequence models... In PMML v.2.0, inputs to PMML models can be DataFields defined in a data dictionary or DerivedFields defined in the transformation dictionary. The consensus among Data Mining Group members is that the transformation dictionary is powerful enough for capturing the process of preparing data for statistical and data mining models... The main reason so many different data representation and data communication standards exist today is that data mining is used in so many different ways and in combination with so many different systems and services, many requiring their own separate often-incompatible standards. Although some vendor-led efforts have sought to homogenize terminology and concepts among standards, more work is indeed required. Relatively narrow XML standards, such as PMML, serve as common ground for several emerging standards. For example, SQL/MM Part 6: Data Mining, JSR-73, CWM, and Microsoft's Analysis Services all use PMML in their specifications, providing a base level of compatibility among them all. Meanwhile, two major challenges top the data mining standards agenda: agreeing on a common standard for cleaning, transforming, and preparing data for data mining (PMML v.2.0 represents a first step in this direction); and agreeing on a common set of Web services for working with remote and distributed data (an effort only just beginning)..."

  • PMML 1.0 Ratified

  • PMML 1.0 Overview, [local archive copy]

  • PMML 1.0 DTD, [local archive copy]

  • PMML 1.0 Spec and Documentation, [local archive copy]

  • Software Repository

  • PMML Page at Magnify.Com

  • PMML 0.9 DTD in .ZIP format

  • PMML 0.9 DTD in plain text

  • [January 05, 2001] "D-Miner and PMML." Contact: Dietrich Wettschereck. "PMML is an XML format describing data mining models. D-Miner is a data mining software system to carry out data mining projects. One feature of D-Miner is its plug in concept in order to easily integrate new software modules into the system... D-Miner is a data mining tool to investigate data and carry out data mining tasks. It includes process management in order to repeat stored mining tasks. It also consists several data management functions. And with its plug in concept, it is easy to integrate new algorithms into the system... D-Miner supports the PMML standard in two ways. First, a plug in module can be used to import or export PMML files. Second a PMML API, as part of the plug in API, provides immediate access to PMML models. The latter solution allows the direct exchange of PMML models between the plug in modules and the D-Miner system..." [Wettschereck has developed a number of machine learning algorithms and contributed to the development of the Data Mining system Kepler. Recently, he has done extensive work on extending PMML for first order rule models and assisted a number of researchers in enhancing their systems to output PMML. He also developed a system for the visualization and evaluation of PMML-based models.]

  • "Data Mining Consortium Develops Predictive Modeling Open Standard To Enable Model Sharing Between Different Vendors' Applications." - "The Data Mining Group (DMG), a consortium of technology organizations, today announces the first version of an XML-based open standard for defining predictive models. Marking an industry first, the Predictive Modeling Markup Language (PMML) provides a quick and easy way for companies to define predictive models and share models between compliant vendors' applications. The founding consortium companies include: Magnify, Chicago; SPSS Inc., Chicago; Angoss Software Corp., Toronto; NCR Corp., Dayton, Ohio; and the National Center for Data Mining (NCDM) at the University of Illinois at Chicago (UIC). DMG founders invite other vendors and interested parties to participate in the proposed W3C standards initiative."

  • [September 21, 2000] "Oracle Backs New XML Standard to 'Mindmeld'." - "Soon, data mining software from different companies will be able to "mindmeld" by exchanging information and the context of that information with each other. Today, information in one data mining application is locked away from others because there is no standard way of exchanging it. In order to eliminate the roadblock, Oracle is backing a new, vendor-independent XML-based standard called PMML (Predictive Model Markup Language). PMML joins several other important standards Oracle is backing to drive more cohesive, real-time e-business intelligence - so that companies can live long and prosper..."

  • [March 07, 2000] "Microsoft and Leading Data Mining Vendors Line Up In Support of Data Mining Specification. OLE DB for Data Mining Beta Now Broadly Available." - "Microsoft Corporation today announced the beta release of the OLE DB for Data Mining specification, a protocol based on the SQL language, that provides software vendors and application developers with an open interface to more efficiently integrate data mining tools and capabilities into line-of-business and e-commerce applications. A dozen leading data mining and business intelligence vendors announced their support for the new protocol, which will enable diverse data mining products to more easily exchange data and results and allow developers to more easily incorporate data mining technology into existing data warehousing solutions. OLE DB for Data Mining has been under vendor review and modification since its introduction last May at Tech Ed '99 and now incorporates the Predictive Model Markup Language (PMML) standards from the Data Mining Group, an industry consortium that facilitates the creation of useful standards for the data mining community. PMML is an XML-based language that provides a quick and easy way for organizations to define and share data mining models between compliant vendors' applications. 'By incorporating the PMML standard, Microsoft has further strengthened an open specification for bringing data mining into analytical applications,' said Jack Noonan, president and CEO of SPSS. 'This means that a much broader group will now have a simple and accessible way to incorporate data mining models into the applications they build, increasing analytical power without increasing complexity'... The beta specification for OLE DB for Data Mining is currently available at the Microsoft Web site... Data mining model (DMM): The DMM is like a relational table, except that it contains special columns that can be used for data training and prediction-making -- the DMM is the workhorse that both creates your prediction model and generates your predictions. Unlike a standard relational table, which stores raw data, the DMM stores the patterns discovered by your data mining algorithm. And for Web developers working on Web-based data mining projects, all of the structure and content of a DMM can be expressed as an XML string."

  • [November 08, 1999] "Data Mining Group and IBM Extend Open Standard for Sharing Predictive Models." - "The Data Mining Group (DMG), a consortium of technology organizations, today announced IBM has joined the DMG in its endeavors to extend the Predictive Modeling Markup Language (PMML), the XML-based open standard for defining predictive models. PMML is the first language to provide a quick and easy way for companies to define predictive models and share models between compliant vendors' applications. IBM joins founding consortium companies Magnify, Chicago, SPSS Inc., Chicago, Angoss Software Corp., Toronto, NCR Corp., Dayton, Ohio, and the National Center for Data Mining (NCDM) at the University of Illinois at Chicago (UIC). 'PMML provides a simple, open framework for working with and exchanging predictive models so that clicks and mortar companies can more quickly exploit the information they discover in their online and traditional data,' said Robert Grossman, chairman of the DMG and president of Magnify, Inc. 'The momentum provided by IBM's support of this standard should make it easier for all companies to build business intelligence into their IT infrastructure.' Today, customers like Mellon Bank, the University of Pennsylvania, Loyalty Consulting and the Bank of Montreal are using IBM data mining solutions to solve real-world business problems. 'PMML allows users to develop models within one vendor's application, and use other vendors' applications to visualize, analyze, evaluate or otherwise use the models. Previously, this was virtually impossible, but with PMML, the exchange of models between compliant applications now will be seamless. The DMG currently is working on PMML Version 1.1, an XML-based language providing applications a vendor-independent method of defining models so that proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications. Predictive models express the patterns of information discovered in data mining, which companies then may use to develop specific strategies from which they can realize increased profitability."

  • PMML Presentation at the AFCEA 99 Conference - "The Management and Mining of Multiple Predictive Models Using the Predictive Modeling Markup Language (PMML)," by Robert Grossman, Stuart Bailey, Ashok Ramu, Balinder Malhi, Michael Cornelison, Philip Hallstrom, and Xiao Qin. See also the PMML PowerPoint presentation slides. [In 1998, Magnify co-founded the Data Mining Group, a vendor led consortium which develops open standards for data mining and business intelligence applications, such as the Predictive Model Markup Language (PMML).]

  • libPMML 0.9 Reference Implementation from the National Center for Data Mining (NCDM). 'libPMML 0.9 is a reference implementation of C++ and Java based parsing utilities for the PMML 0.9 specification.'

  • "Providing Support for Resource Management Tools in a Wide Area High Performance Distributed Data Mining System." By Balinder Singh Malhi. EECS 598: Master's Thesis in Computer Science. Laboratory for Advanced Computing, University of Illinois at Chicago. Thesis also in Postscript. [local archive copy, text only]

  • [August 13, 1998] "Supporting the Data Mining Process with Next Generation Data Mining Systems." By Robert Grossman. From Enterprise Systems (August 13, 1998). "Data mining is an emerging technology for the automatic extraction of patterns, associations, changes, anomalies and significant structures from data. Data mining is emerging as one of the key technologies enabling businesses to select, filter, screen, correlate and fuse data automatically. Most of the value of data mining comes from using data mining technology to improve predictive modeling. For example, data mining can be used to generate predictive models automatically, which predict how much profit prospects and customers will provide and how much risk they entail from fraud, bankruptcy, charge-off and related problems... Broadly speaking, there are two cultures in data mining: the knowledge discovery (KD) culture and predictive modeling (PM) culture. In the KD culture, the output are rules. In the PM culture, the output are predictive models. In both cultures, the input are learning sets. The goal of both cultures is to automate as much of the process of data mining as possible. In practice, the data mining process is not completely automatic, but rather a semi-automated process... Fourth-generation data mining systems are characterized by being able to mine data generated by embedded, mobile and ubiquitous computing devices. For example, a sales rep using a mobile computing device can enter information at a client's office. A fourth generation data mining system could then provide an appropriate cross-selling suggestion... There has been some work recently on understanding the appropriate interface between data mining systems and predictive modeling systems. An XML markup language called the Predictive Model Markup Language (PMML) has been proposed as a suitable interface..."

  • "The data miner's arcade: Pluggable data mining." By Graham J Williams. Technical report, CSIRO Mathematical and Information Sciences, 1998. "The Data Miner's Arcade is a Java-based environment for data mining. It implements an Object-Oriented model for the Data Mining process, with standard interfaces for accessing data and for delivering results. By developing standards, new tools can plug into the environment with a minimum of effort, providing 'Plug-n-Play' opportunities with new tools as they become available. Data can be accessed from Database systems through ODBC and JDBC, or from other sources and managed internally within the Arcade. The Extensible Markup Language (XML) is used as the target 'language' for all Data Mining tools within the environment. The Predictive Modelling Markup Language (PMML) developed by UIC is an example of the XML markup that the system handles. Data Mining tools produce as their output documents that conform to PMML. These can then be visualised, run, or combined with other models as appropriate, all within The Data Miner's Arcade environment."

  • PMML 0.9 Description: "PMML 0.9 is a 'proof of concept' specification of the PMML language. This specification defines the intended interpretation of PMML 0.9 elements, and places further constraints on the permitted syntax, which are otherwise inexpressible in the DTD. A PMML 0.9 document is stored in a file with a .pmml extension. Also the XML DTD for PMML 0.9 is specified using XML and stored in a file with a .dtd extension." [local archive copy]

  • Sample PMML Document [local archive copy]

  • The Data Mining Group - 'A consortium of industry and academics formed to facilitate the creation of useful standards for the data mining community.'

  • XML Interest Group (XIG) "The XML Interest Group fosters NPACI-wide interactions, collaborations, and activities on the use of XML, and development of XML-based technologies, in NPACI."

  • Cf. MIX: Mediation of Information Using XML

  • Magnify contact: pmml@magnify.com


Hosted By
OASIS - Organization for the Advancement of Structured Information Standards

Document URI: http://xml.coverpages.org/pmml.html
Robin Cover, Editor: robin@oasis-open.org