The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: May 27, 2005.

New Release of Predictive Model Markup Language (PMML) from SourceForge.

SourceForge developers have issued two recent updates to Version 3 of the Predictive Model Markup Language (PMML). Considered to be the most widely deployed data mining standard, PMML is an XML markup language used to describe statistical and data mining models.

PMML is formally defined in the W3C XML Schema language. It "describes the inputs to data mining models, the transformations used prior to prepare data for data mining, and the parameters which define the models themselves. PMML is used for a wide variety of applications, including applications in finance, e-business, direct marketing, manufacturing, and defense. PMML is complementary to several other data mining standards: its XML interchange format is supported by XML for Analysis (XMLA), JSR 73, and 'SQL/MM Part 6: Data Mining'."
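To make the description above concrete, the following Python sketch reads a small, hand-written PMML-style fragment and scores one record with it. The fragment is a simplified illustration only: it omits the XML namespace and header, it has not been validated against the PMML schema, and the field names, coefficients, and the helper functions `load_model` and `score` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Simplified PMML-style fragment (illustrative; not schema-validated).
# Field names and numbers are invented for this example.
PMML_TEXT = """
<PMML version="3.0">
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def load_model(text):
    """Extract the input field names and regression parameters."""
    root = ET.fromstring(text)
    # Inputs: every mining field that is not marked as the predicted target.
    inputs = [
        field.get("name")
        for field in root.iter("MiningField")
        if field.get("usageType") != "predicted"
    ]
    table = root.find("./RegressionModel/RegressionTable")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("NumericPredictor")
    }
    return inputs, intercept, coefficients


def score(record, intercept, coefficients):
    """Apply the linear model: intercept + sum(coefficient * value)."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


inputs, intercept, coefficients = load_model(PMML_TEXT)
prediction = score({"age": 40.0, "income": 50000.0}, intercept, coefficients)
print(inputs, prediction)
```

The point of the sketch is the separation of concerns: the XML carries the inputs and model parameters, and any application that can read them can reproduce the score.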

As of PMML Version 3.0.2, the specification is said to represent a mature standard, such that deployment through the creation of PMML scoring engines is now straightforward. For PMML Version 3.1 and later, the development team will continue to add new statistical and data mining models, reducing the need to use approved extension mechanisms. They also plan to enhance support for data preparation, which is still a labor-intensive task for some applications.

PMML specification development has been advanced for several years by the independent, vendor-led Data Mining Group (DMG), though end-user companies are now showing heightened interest. DMG full members as of 2005-04 included IBM Corp; KXEN; Magnify Inc; Microsoft; MicroStrategy Inc.; National Center for Data Mining, University of Illinois at Chicago; Oracle Corporation; Prudential Systems Software; Salford Systems; SAS Inc; SPSS Inc; StatSoft, Inc. Associate members include Angoss Software Corp; Insightful Corp; NCR Corp; Quadstone; Urban Science; SAP. Support for PMML in software products is provided by several of these members, and by others who want an XML interchange format for statistical and data mining models.

According to a published "Overview of PMML Version 3.0" by Stefan Raspl (IBM), "PMML is an application and system independent interchange format for statistical and data mining models. More precisely, the goal of PMML is to encapsulate a model in an application and system independent fashion so that two different applications (the PMML Producer and Consumer) can use it. PMML Version 3.0 adds the ability to compose certain data mining operations. For example, the outputs of regression models can be used as the inputs to other models (model sequencing) and a decision tree or regression model can be used to combine the outputs of several embedded models (model selection)."
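The two composition patterns named in the overview can be sketched in plain Python. This is only an illustration of the ideas, not PMML syntax and not the way any particular scoring engine works; every model, field name, coefficient, and threshold below is hypothetical.

```python
def engagement_regression(record):
    """Hypothetical first-stage regression model."""
    return 0.2 * record["visits"] + 1.5 * record["purchases"]


def segment_model(record):
    """Hypothetical second-stage model that consumes the regression output."""
    return "high" if record["engagement"] >= 5.0 else "low"


def score_sequenced(record):
    """Model sequencing: the first model's output becomes an input field."""
    enriched = dict(record, engagement=engagement_regression(record))
    return segment_model(enriched)


def north_model(record):
    """Hypothetical embedded model for one region."""
    return 100.0 + 2.0 * record["visits"]


def default_model(record):
    """Hypothetical embedded model for all other regions."""
    return 50.0 + 1.0 * record["visits"]


def score_selected(record):
    """Model selection: a small decision rule picks which embedded model's
    output is used as the overall result."""
    if record["region"] == "north":
        return north_model(record)
    return default_model(record)


seq_high = score_sequenced({"visits": 20, "purchases": 2})
seq_low = score_sequenced({"visits": 5, "purchases": 1})
sel_north = score_selected({"region": "north", "visits": 10})
sel_other = score_selected({"region": "south", "visits": 10})
print(seq_high, seq_low, sel_north, sel_other)
```

In both cases the composition is itself part of the model description, so a consumer receives the whole pipeline rather than a single isolated model.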

Three new models in PMML Version 3 include rule sets, support vector machines, and text models. "Ruleset models can be thought of as flattened decision tree models, but cover areas where decision trees are not handy or are too limited. Rulesets can be applied to new instances to derive predictions and associated confidences (scoring). Support vector machines define hyperplanes, which try to separate the values of a given target field. The hyperplanes are defined using kernel functions. The most popular kernel types are supported: linear, polynomial, radial basis and sigmoid; they can be used for both classification and regression."
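The four kernel types listed above can be written down in their common textbook forms, as in the Python sketch below. The parameter names (`gamma`, `coef0`, `degree`), the default values, and the `decision_value` helper are assumptions for illustration; the exact parameterization in the PMML schema may differ.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=2):
    return (gamma * dot(x, y) + coef0) ** degree


def radial_basis_kernel(x, y, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)


def decision_value(x, support_vectors, coefficients, bias, kernel):
    """Signed distance-like score: sum of coefficient * K(sv, x), plus bias.
    The sign indicates which side of the separating hyperplane x falls on."""
    total = sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefficients))
    return total + bias


# Made-up support vectors and coefficients for a two-class example.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
coefficients = [1.0, -1.0]
value = decision_value([2.0, 0.5], support_vectors, coefficients, 0.0, linear_kernel)
print(value)
```

Swapping the `kernel` argument changes the shape of the decision boundary without changing the scoring code, which is why a model description only needs to record the kernel type and its parameters.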

The PMML Version 3 text model consists of the following components: (1) a text dictionary that contains the terms in the model; (2) a corpus of text documents, which identifies the actual texts covered by the model; (3) a document-term matrix that specifies which terms are used in which document; (4) a text model normalization element that defines one of several possible normalizations of the document-term matrix; (5) a text model similarity element that defines the similarity measure used to compare two vectors representing documents.
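The five components can be mirrored in a few lines of Python. The sketch below is a toy illustration of how they fit together, not the PMML representation itself: the terms, document identifiers, and counts are invented, and unit-length normalization with cosine similarity is just one plausible choice among the normalizations and similarity measures a text model might declare.

```python
import math

# (1) Text dictionary: the terms in the model (invented for this example).
dictionary = ["data", "mining", "model", "xml"]

# (2) Corpus: identifiers of the documents covered by the model.
corpus = ["doc1", "doc2", "doc3"]

# (3) Document-term matrix: one row per document, one column per term,
# holding raw term counts.
doc_term = [
    [2, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 3],
]


def normalize(row):
    """(4) One possible normalization: scale the row to unit length."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row] if norm else list(row)


def cosine_similarity(u, v):
    """(5) One possible similarity: cosine of the angle between two rows."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def most_similar(index):
    """Return the id of the document most similar to the one at `index`."""
    scores = [
        (cosine_similarity(doc_term[index], row), corpus[i])
        for i, row in enumerate(doc_term)
        if i != index
    ]
    return max(scores)[1]


sim_12 = cosine_similarity(doc_term[0], doc_term[1])
sim_13 = cosine_similarity(doc_term[0], doc_term[2])
print(sim_12, sim_13, most_similar(0))
```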

The PMML specification has undergone successive refinement since (at least) 1997; a version 0.7 developed by the National Center for Data Mining (NCDM) at the University of Illinois at Chicago was released in July 1997. A variety of PMML version 0.9 applications were demonstrated at Supercomputing 1998. Version 1.0 was developed by Angoss, Magnify, NCR, SPSS, and the National Center for Data Mining. IBM joined the effort in 1999; Microsoft and Oracle joined in 2000. PMML developers began to use SourceForge for PMML Version 2.1 schemas, documentation, and associated utilities in June 2002.

The KDD-2004 Online Proceedings volume notes that the August 2004 DM-SSP Workshop "marks the fourth year that there has been a KDD workshop on the Predictive Model Markup Language (PMML) and related areas and the second year of a broader conference with the theme of Data Mining Standards, Services and Platforms. One of the goals of PMML was to create a standard interface between producers of models, such as statistical or data mining systems, and consumers of models, such as scoring engines, applications containing embedded models, and other operational systems. There are now quite a few vendors shipping scoring engines, which is an important measure of success in this area. For the past several years, the developers of PMML have been working to create a similar mechanism so that the transformations and compositions required in the data processing, which are so essential to data mining, can be similarly encapsulated."

Document URI: http://xml.coverpages.org/ni2005-05-27-a.html
Robin Cover, Editor: robin@oasis-open.org