Model Development
R/Rattle allows for reliable data manipulation and model development

The Making of a Predictive Solution
Development
Execution

Model Deployment
PMML allows for easy expression and deployment of data transformations and data mining models

Execution
Instant execution of solutions via Web Console, Web Services and Excel

Phases of the CRISP-DM Process Model

Rattle
Developed by Togaware - Australia
Offers a simple graphical interface for data analytics and model development
Great way to learn R, as it shows the R code it produces
It is open source
Exports models in PMML

Predictive Model Markup Language
Standard used to represent data mining models
Avoid proprietary issues and incompatibilities
Share models between compliant applications
Eliminates need for custom model deployment

PMML Structure
PMML defines a standard not only to represent data mining models, but also data transformations
A Data Dictionary defines all the input data fields
Several data transformation strategies allow for intelligent extraction of feature detectors
A comprehensive list of data mining models offers power and flexibility
Model explanation allows for performance evaluation

Industry Support
Mature Standard - current version 4.0
Data Mining Group (dmg.org)
Active group and constant enhancements
Vendor independent consortium

One Standard, One Process

A (new) first book on PMML

Model Execution via iPhone

Zementis Contributions
ADAPA: Decisioning Engine available for on-site and cloud deployments
Excel Add-in: scores from within Excel
Member of the DMG: helping to shape PMML
Code contributor for the R PMML Package
PMML Book: available on Amazon
PMML Articles: R Journal and others

Thank you!

PMML / R: The yin yang of model deployment

Rules + Predictive Analytics = Enhanced Decisioning

Rules = Expert Knowledge
Logic used by experts to solve problems can be represented as business rules: When body temperature is more than X AND blood pressure is more than Y, then ...

Predictive Analytics = Data-Driven Knowledge
Based on the ability to automatically recognize patterns in data that are not obvious to the expert eye. Learn from past behavior present in historical data to predict the future.

Ideal World: Seamless Integration of Predictive Analytics and Business Rules

Development

Knock, Knock.
"FBI. You're under arrest."
"But I haven't done anything"
"You will if we don't arrest you," replied Agent Smith of the Precrime Squad.
Minority Report - 20th Century Fox 2002

Rules and Predictive Analytics

Predictive Solutions

Fish Processing Plant
Goal: Automate the process of sorting incoming fish according to species (salmon or sea bass)
From [Duda, Hart and Stork, 2001]

Association Rules
Naive Bayes Classifiers
Support Vector Machines
Time-Series

Outline
Popular Techniques
Thank you!
Introduction
Deployment
Predictive Solution in R

1) Raw Data: Obtain images of salmon and sea bass (captured the same way as in the production setup).
2) Preprocessing: Image Processing Algorithms (e.g. segmentation to separate fish from background).
3) Feature Extraction: Through data analysis we find out that 1) on average, sea bass is larger than salmon; and 2) salmon has a higher scale intensity.
4) Model Training: Select a predictive technique and train the model to classify incoming fish based on extracted features.
5) Rules: Create business strategies around model output probabilities.

Length of fish
Intensity of the scales
Model Results

IF probability of salmon > 95 AND probability of sea bass < 5
THEN assign fish to [Premier Salmon Conveyor Belt]
IF probability of salmon > 80 AND probability of sea bass < 10
THEN assign fish to [Special Salmon Conveyor Belt]
IF probability of salmon > 60 AND probability of sea bass < 20
THEN assign fish to [Ordinary Salmon Conveyor Belt]
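
A minimal sketch of how rules like the three above could wrap a model's output, written in Python for illustration. The function name `assign_belt`, the `SALMON_RULES` table, and the first-match-wins ordering are assumptions, not part of the original solution; the thresholds are the percentages shown in the rules.

```python
from typing import Optional

# Thresholds copied from the three rules in the transcript, in percent.
# Each entry: (minimum salmon %, maximum sea bass %, conveyor belt).
SALMON_RULES = [
    (95.0, 5.0, "Premier Salmon Conveyor Belt"),
    (80.0, 10.0, "Special Salmon Conveyor Belt"),
    (60.0, 20.0, "Ordinary Salmon Conveyor Belt"),
]


def assign_belt(p_salmon: float, p_sea_bass: float) -> Optional[str]:
    """Return the first belt whose rule fires, or None if no listed rule matches.

    Probabilities are percentages (0-100). Rules are checked from strictest
    to loosest and the first match wins; that ordering is an assumption.
    """
    for min_salmon, max_bass, belt in SALMON_RULES:
        if p_salmon > min_salmon and p_sea_bass < max_bass:
            return belt
    return None  # the remaining rules are elided ("...") in the transcript


if __name__ == "__main__":
    # Hypothetical model outputs, used only to exercise the rules.
    for scores in [(97.0, 3.0), (88.0, 8.0), (85.0, 15.0), (40.0, 60.0)]:
        print(scores, "->", assign_belt(*scores))
```

When the two probabilities sum to 100, a fish scored 85/15 fails the sea bass condition of the second rule and falls through to the Ordinary belt, so the second condition in each rule does real work.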
...

Business Rules in Our Proposed Solution

Enhanced Decisioning 1
Enhanced Decisioning 2
Enhanced Decisioning 3

Model Execution
Development
Execution

Our Solution at Work

Closing Remarks

Accessible from Anywhere

Fish Plant PMML Code
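
The PMML file for the fish plant is not included in this transcript. As a rough illustration of the PMML structure described earlier (a data dictionary plus a model), the Python sketch below parses a small, hand-written PMML 4.0 document with the standard library. The field names `length` and `scale_intensity`, the coefficients, and the `summarize` helper are all hypothetical, and the document has not been validated against the PMML schema. Data transformations are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML 4.0 document for the fish example.
# Field names and coefficients are placeholders, not values from the talk.
PMML_TEXT = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header copyright="illustrative example" description="Toy fish classifier"/>
  <DataDictionary numberOfFields="3">
    <DataField name="length" optype="continuous" dataType="double"/>
    <DataField name="scale_intensity" optype="continuous" dataType="double"/>
    <DataField name="species" optype="categorical" dataType="string">
      <Value value="salmon"/>
      <Value value="sea bass"/>
    </DataField>
  </DataDictionary>
  <RegressionModel modelName="fish_model" functionName="classification"
                   normalizationMethod="softmax">
    <MiningSchema>
      <MiningField name="length"/>
      <MiningField name="scale_intensity"/>
      <MiningField name="species" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.0" targetCategory="salmon">
      <NumericPredictor name="length" coefficient="-1.0"/>
      <NumericPredictor name="scale_intensity" coefficient="1.0"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="sea bass"/>
  </RegressionModel>
</PMML>
"""

# Namespace used by PMML 4.0 documents.
NS = {"p": "http://www.dmg.org/PMML-4_0"}


def summarize(pmml_text: str) -> dict:
    """Collect the PMML version, the data dictionary fields, and model element names."""
    root = ET.fromstring(pmml_text)
    fields = [
        (f.get("name"), f.get("optype"), f.get("dataType"))
        for f in root.findall("p:DataDictionary/p:DataField", NS)
    ]
    # Simplification: only direct children whose tag ends in "Model" are
    # reported. Some PMML model elements (e.g. NeuralNetwork) are named
    # differently and would be missed by this check.
    models = [child.tag.split("}")[1] for child in root if child.tag.endswith("Model")]
    return {"version": root.get("version"), "fields": fields, "models": models}


if __name__ == "__main__":
    summary = summarize(PMML_TEXT)
    print("PMML version:", summary["version"])
    for name, optype, data_type in summary["fields"]:
        print(f"  field {name}: {optype}, {data_type}")
    print("Models:", summary["models"])
```

The data dictionary declares the two extracted features and the target, and the mining schema states which of them the model consumes and which it predicts. That separation is what lets a compliant scoring engine run the model without any R code.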