# Comparing alternative classifiers for database marketing: Th

by

## duygu güney

on 28 April 2014


#### Transcript of Comparing alternative classifiers for database marketing: Th

Introduction
Literature Review
Algorithms for binary classification
Logistic Regression, Neural Networks and CHAID
Binary classification
Application in direct marketing
Comparing alternative classifiers for database marketing: The case of imbalanced datasets
Başak Yüksel
Duygu Güney
Irmak Göksu
Sercan Taş

Outline
Introduction

Literature Review

Algorithms for binary classification

Performance criteria for binary classification

Application in direct marketing

Conclusion

Conclusion
Logistic regression
Neural networks
CHAID
Preferred in direct marketing classification models
Good for both balanced and imbalanced data sets
Adjusts the input weights so that the output comes as close as possible to the target value
Highly efficient statistical technique
Most preferable
Hit Rate, Capture Rate, Lift

Having more negative-class observations in the training set

CHAID is successful

Comparing the performance of models on unseen data from a later period in time

Propensities of bank customers to buy an unpopular product
A sample data set: 2826 positive, 14130 negative
Training set (70%) and Test set (30%)
6 different imbalance figures (finding optimum composition)
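
The data preparation described above can be sketched as follows. This is a minimal illustration, not the presenters' procedure: it assumes random undersampling of the negative class as the re-sampling method, and the names `split_train_test`, `undersample_negatives`, and the ratio `negatives_per_positive=1` are hypothetical. The slides do not list the six imbalance figures that were tried.

```python
import random

def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into a training and a test part."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def undersample_negatives(rows, negatives_per_positive, seed=0):
    """Keep every positive row and a random subset of the negative rows.

    Assumption: random undersampling; the slides only say "re-sampling".
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r[1] == 1]
    negatives = [r for r in rows if r[1] == 0]
    keep = min(len(negatives), int(len(positives) * negatives_per_positive))
    return positives + rng.sample(negatives, keep)

# Toy rows (customer_id, label) with the class counts from the slide.
data = [(i, 1) for i in range(2826)] + [(i, 0) for i in range(2826, 16956)]

train, test = split_train_test(data)  # 70% / 30%
# Hypothetical composition: one negative per positive in the training set.
balanced_train = undersample_negatives(train, negatives_per_positive=1)
```

Only the training set is re-balanced here; the test set keeps its original class mix so that measured performance reflects the real distribution.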

3 Data Mining Algorithms
Logistic Regression (logit), Neural Network (NN), CHAID

Results for whole set, 1, 5 and 10 percentiles
Cutoff value for positive class: 50 and above

Performance criteria results on the data set
Performances on future data
(sales ratio: 0.2 %)
Bank’s budget limit: 4120 customers
(Top 2.4%)
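
The budget constraint can be illustrated with a short sketch. The figures 4120, 2.4% and 0.2% come from the slide; the function `select_within_budget` and the derived quantities are assumptions added for illustration, and the implied population is only approximate because 2.4% is a rounded value.

```python
def select_within_budget(scored_customers, budget):
    """Return the ids of the `budget` customers with the highest scores."""
    ranked = sorted(scored_customers, key=lambda c: c[1], reverse=True)
    return [customer_id for customer_id, _ in ranked[:budget]]

BUDGET = 4120        # customers the bank can contact (from the slide)
TOP_SHARE = 0.024    # "Top 2.4%" (from the slide)
SALES_RATIO = 0.002  # "sales ratio: 0.2 %" (from the slide)

# Derived, approximate values (assumptions, not stated in the slides):
implied_population = BUDGET / TOP_SHARE        # roughly 171,667 customers
random_expected_buyers = BUDGET * SALES_RATIO  # about 8 buyers if chosen at random

# Hypothetical usage with three scored customers and a budget of two.
chosen = select_within_budget([("a", 0.1), ("b", 0.9), ("c", 0.5)], 2)
```

A model is useful under this budget only if the customers it ranks highest contain noticeably more buyers than the random baseline.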

The Confusion Matrix
Possible outcomes of a Binary Classifier
True Positive and True Negative
False Positives = Observations that are actually negative (-) but that the classifier labels as positive (+)
False Negatives = Observations that are actually positive (+) but that the classifier labels as negative (-)

Confusion Matrix
TP, FN, FP, TN

Hit rate = TP / (TP + FP)

Accuracy = (TP + TN) / (TP + FN + FP + TN)

Capture rate = TP/ (TP+FN)
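
A minimal sketch of these three measures, computed from the four confusion matrix cells. Hit rate is taken here as TP / (TP + FP) (precision) and capture rate as TP / (TP + FN) (recall); the function names and the toy labels are illustrative assumptions.

```python
def confusion_counts(actual, predicted):
    """Count TP, FN, FP, TN for binary labels (1 = positive, 0 = negative)."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 1 and p == 0:
            fn += 1
        elif a == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def hit_rate(tp, fn, fp, tn):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def accuracy(tp, fn, fp, tn):
    """Share of all observations that are classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)

def capture_rate(tp, fn, fp, tn):
    """Share of actual positives that the classifier finds."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy example: 3 actual positives among 10 observations.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
counts = confusion_counts(actual, predicted)  # (2, 1, 1, 6)
```

On imbalanced data, accuracy can look high even when few positives are found, which is why hit rate and capture rate are reported alongside it.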

Lift Rate
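
The slide gives no formula for lift. A common definition, assumed here, is the positive rate among the top-scored fraction of customers divided by the positive rate in the whole set. The function name `lift_at` and the toy scores are hypothetical.

```python
def lift_at(scores, actual, fraction):
    """Lift in the top `fraction` of observations ranked by score.

    Assumed definition: positive rate in the top group / overall positive rate.
    """
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(actual) / len(actual)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    return top_rate / base_rate

# Toy example: 10 customers, 3 of them positive.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
top_20_lift = lift_at(scores, actual, 0.2)  # top 2 customers are both positive
```

A lift of 1 means the model does no better than random selection in that percentile.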

AUC (The ROC Curve)
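
AUC can be read as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The sketch below computes it directly from that pairwise definition, which is simple but quadratic in the number of observations; the function name `auc` and the toy data are illustrative assumptions.

```python
def auc(scores, actual):
    """Area under the ROC curve via pairwise comparison of scores.

    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise.
    """
    positives = [s for s, a in zip(scores, actual) if a == 1]
    negatives = [s for s, a in zip(scores, actual) if a == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: 3 positives and 7 negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
area = auc(scores, actual)
```

Unlike accuracy, this measure does not depend on a particular cutoff value, which makes it convenient for comparing models trained on differently balanced sets.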
Evaluate the performance of a classification model
Confusion Matrix
Cross Selling

Up Selling

Choosing the right performance measures, which depends on determining the right balance of the training set.

An imbalanced data set contains far fewer minority-class observations than majority-class ones.

Performs better when determining the balance.
Object
Act of selling an additional product to an existing customer
Tries to identify the customers who can increase the volume of a particular product

Imbalance problem
Re-sampling
1. What should the balance of the training set be for better classification?
2. How should the performance of the alternative models be measured and compared?
3. Where should the model performances be measured?
Q & A

Questions