Comparing alternative classifiers for database marketing: The case of imbalanced datasets

by Duygu Güney

on 28 April 2014


Transcript of Comparing alternative classifiers for database marketing: The case of imbalanced datasets

Comparing alternative classifiers for database marketing: The case of imbalanced datasets
Başak Yüksel
Duygu Güney
Irmak Göksu
Sercan Taş

Introduction
Literature Review
Algorithms for binary classification
Logistic Regression, Neural Networks and CHAID
Binary classification
Application in direct marketing

Outline
Introduction

Literature Review

Algorithms for binary classification

Performance criteria for binary classification

Application in direct marketing

Conclusion


Conclusion
Logistic regression: a highly efficient statistical technique, preferred in direct marketing classification models.
Neural networks: adjust the weights of the inputs to give the closest value of the output.
CHAID: the most preferable here; good for both balanced and imbalanced data sets.
Performance criteria: Hit Rate, Capture Rate, Lift

When the training set contains more negative-class observations, CHAID is successful.

Model performance is compared on unseen data from a later period in time.

Propensities of bank customers to buy an unpopular product
A sample data set: 2826 positive, 14130 negative
Training set (70%) and Test set (30%)
Six different imbalance compositions (to find the optimum mix)
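The slides describe a 70% training / 30% test split and six different imbalance compositions, but not the exact ratios used. A minimal sketch of how such compositions could be built by down-sampling the negative class (the 1:1 through 6:1 ratios below are hypothetical):

```python
import random

random.seed(0)

# Sample sizes from the slides: 2826 positive, 14130 negative observations.
positives = [("pos", i) for i in range(2826)]
negatives = [("neg", i) for i in range(14130)]

def make_training_set(pos, neg, neg_per_pos, train_frac=0.70):
    """Down-sample the negative class to a chosen negative:positive
    ratio; 70% of the positives go to the training set."""
    pos_train = random.sample(pos, int(len(pos) * train_frac))
    n_neg = min(len(neg), len(pos_train) * neg_per_pos)
    neg_train = random.sample(neg, n_neg)
    return pos_train + neg_train

# Six hypothetical imbalance compositions, from balanced (1:1) to 6:1.
for ratio in range(1, 7):
    train = make_training_set(positives, negatives, ratio)
    print(ratio, len(train))
```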

3 Data Mining Algorithms
Logistic Regression (logit), Neural Network (NN), CHAID
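Of the three algorithms, logistic regression is the simplest to sketch. The one-feature toy data, learning rate, and gradient-descent fit below are invented for illustration and are not the study's actual model:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, lr=0.1, epochs=2000):
    """Fit w, b in P(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # prediction error for this point
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Toy data: larger x makes a purchase (y = 1) more likely.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logit(xs, ys)
print(sigmoid(w * 0.5 + b) < 0.5, sigmoid(w * 3.0 + b) > 0.5)  # True True
```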







Results for the whole set and for the top 1, 5 and 10 percentiles
Cutoff value for the positive class: a score of 50 and above

Performance criteria results on the data set
Performances on future data
412 product buyers, 169777 non-buyers
(sales ratio: 0.2 %)
Bank’s budget limit: 4120 customers
(Top 2.4%)
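Under such a budget, targeting reduces to ranking customers by predicted propensity and contacting the top ones. A small sketch (the scores are invented; the bank would take the top 4120):

```python
def select_targets(scores, budget):
    """Indices of the `budget` highest-scoring customers, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]

scores = [0.12, 0.80, 0.05, 0.55, 0.33]  # hypothetical propensity scores
print(select_targets(scores, 2))  # [1, 3]
```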

Buyers and non-buyers of a particular investment product of a bank

The Confusion Matrix
Possible outcomes of a Binary Classifier
True Positive and True Negative
False positives: observations that are actually negative (−) but that the classifier labels as positive (+)
False negatives: positive observations (+) that the classifier labels as negative (−)

Confusion Matrix
TP, FN, FP, TN
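These four counts can be tallied directly from paired actual and predicted labels; a minimal sketch with invented labels (1 = positive, 0 = negative):

```python
def confusion_matrix(actual, predicted):
    """Return (TP, FN, FP, TN) for binary labels: 1 = positive, 0 = negative."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fn, fp, tn

actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0]
print(confusion_matrix(actual, predicted))  # (2, 1, 1, 2)
```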

Hit rate = TP / (TP + FP)

Accuracy = (TP + TN) / (TP + FN + FP + TN)

Capture rate = TP / (TP + FN)

Lift Rate
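All of these rates follow directly from the four confusion-matrix counts. The slides do not spell out the lift rate, so the sketch below uses the common definition (hit rate divided by the base rate of positives); the counts are invented:

```python
def hit_rate(tp, fn, fp, tn):
    """Share of predicted buyers who actually buy (precision)."""
    return tp / (tp + fp)

def capture_rate(tp, fn, fp, tn):
    """Share of actual buyers the model finds (recall)."""
    return tp / (tp + fn)

def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)

def lift(tp, fn, fp, tn):
    """Hit rate relative to the base rate of positives (common definition)."""
    base_rate = (tp + fn) / (tp + fn + fp + tn)
    return hit_rate(tp, fn, fp, tn) / base_rate

tp, fn, fp, tn = 80, 20, 120, 780  # invented counts
print(hit_rate(tp, fn, fp, tn))        # 0.4
print(capture_rate(tp, fn, fp, tn))    # 0.8
print(round(lift(tp, fn, fp, tn), 6))  # 4.0
```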

AUC (area under the ROC curve)
Evaluates the performance of a classification model across all cutoff values
Cross selling: the act of selling an additional product to an existing customer.

Up selling: trying to identify the customers who can increase their volume of a particular product.

Objective: choosing the right performance measures, which depends on determining the right balance of the training set.

In an imbalanced data set, the minority class is heavily under-represented.

The model performs better when the training-set balance is determined well.

Imbalance problem
Re-sampling

1. What should be the balance of the training set for better classification?
2. How should the performance of the alternative models be measured and compared?
3. Where should the model performances be measured?
Q & A

Questions