Session 4

Logistic Regression

Anuj Thakur

on 23 August 2013

Transcript of Session 4

Logistic Regression
Search for Signal
This step is for the modeler's own understanding.
Prepare your Data
Multicollinearity Check
Making Sure Your IVs (Independent Variables) Are Truly Independent

Risk Analysis: Default Prediction
Industry Application
Business Understanding
When the business objective or target involves categorizing observations into groups.

Building a model to predict the response to a campaign, i.e. to predict whether a customer will respond (Category 1) or not (Category 2).

Where to Apply Logistic Model

Campaign Targeting
HR and IT: Attrition Prediction

As a
Logistic Regression
Filtering Exercise: Finding the Relationship of the IVs with the Target
Target Variable
Nature: Categorical
Independent Variable
Nature: Categorical
Independent Variable
Nature: Numeric
Output of this Step
A short list of important variables that can predict the target variable
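The slides do not say which test is used for this filtering step. As a minimal sketch, assuming a chi-square test of independence for a categorical IV against a binary target, the relationship could be checked as below. The 2x2 counts are hypothetical, and the p-value shortcut is valid only for 1 degree of freedom.

```python
import math

def chi_square_2x2(table):
    """Chi-square statistic and p-value for a 2x2 contingency table (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows are the two levels of a categorical IV,
# columns are (responded, did not respond).
counts = [[30, 70], [10, 90]]
stat, p_value = chi_square_2x2(counts)
print(f"chi-square = {stat:.2f}, p-value = {p_value:.5f}")
keep_variable = p_value < 0.05  # keep the IV as a candidate predictor
```

A numeric IV would need a different check, for example comparing its mean across the two target groups.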
Search for the Signal
Step 1: Division of the Dataset into Training and Testing Datasets
Original Dataset
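A minimal sketch of Step 1 using only the Python standard library. The 70/30 split ratio, the fixed seed, and the toy dataset are assumptions for illustration; the slides do not state the proportions used.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset: 10 customer records identified by an integer id.
dataset = list(range(10))
train, test = train_test_split(dataset, test_fraction=0.3)
print(len(train), len(test))
```

The model is fitted on the training part only, and the testing part is held back to check stability later.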
Step 2: Creation of Dummy Variables (if required)
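A minimal sketch of Step 2, assuming a categorical IV stored as a list of strings. The helper name `make_dummies` and the `region` variable are hypothetical. One level is dropped as the baseline, because keeping a 0/1 column for every level makes the columns perfectly collinear with the intercept.

```python
def make_dummies(values, prefix):
    """Return one 0/1 column per level, dropping the first (sorted) level as the baseline."""
    levels = sorted(set(values))
    return {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels[1:]
    }

region = ["north", "south", "east", "south"]  # hypothetical categorical IV
dummies = make_dummies(region, "region")
for name, column in dummies.items():
    print(name, column)
```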
Use the VIF (variance inflation factor) to decide which variables exhibit multicollinearity. If the VIF for a variable is greater than 10, it indicates a strong degree of correlation with at least one other variable in the model.
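The VIF rule can be sketched without any statistics package. This illustration relies on the identity that the VIF of each variable equals the corresponding diagonal element of the inverse correlation matrix. The variable names `x1`, `x2`, `x3` and their values are hypothetical, with `x2` built to be nearly twice `x1`.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF per variable: the diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2],  # nearly 2 * x1
    "x3": [5, 1, 4, 2, 6, 3, 8, 7],
}
v = vif(data)
flagged = [name for name, value in v.items() if value > 10]
print(flagged)
```

A common follow-up is to drop or combine one variable from each flagged group and recompute the VIFs.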
Performing a demo run to understand the health of the model.
Model Thumb Rules
Thumb Rule 1: Model convergence status
Thumb Rule 2: Model p-value < .05
Thumb Rule 3: P-value < .05
Thumb Rule 4: Somers' D >= .5
Thumb Rule 5: Hosmer and Lemeshow (H&L) p-value > .05
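To make the Somers' D rule concrete, here is a minimal sketch that computes it from actual outcomes and predicted probabilities by counting concordant and discordant event/non-event pairs. The six observations are hypothetical. Somers' D is related to the area under the ROC curve by D = 2 * AUC - 1.

```python
def somers_d(actual, predicted):
    """Somers' D = (concordant - discordant) / (number of event/non-event pairs)."""
    events = [p for a, p in zip(actual, predicted) if a == 1]
    non_events = [p for a, p in zip(actual, predicted) if a == 0]
    # A pair is concordant when the event got the higher predicted probability.
    concordant = sum(1 for e in events for n in non_events if e > n)
    discordant = sum(1 for e in events for n in non_events if e < n)
    return (concordant - discordant) / (len(events) * len(non_events))

actual = [1, 1, 1, 0, 0, 0]                 # hypothetical observed classes
predicted = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]  # hypothetical model probabilities
d = somers_d(actual, predicted)
print(round(d, 3), d >= 0.5)
```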
For F and FirstPruch, VIF > 10, which means multicollinearity is present.
Model Selection
We will use the stepwise regression algorithm to help identify the optimal combination of variables for the model.
According to stepwise selection, these 5 variables are the best possible combination to predict the target.
!!!Last Lap!!!
Deciding the cutoff probability value
Testing the stability of the model on validation data

Right now the predicted output is a probability value,
but the prediction should be in binary format,
e.g. whether people are buying Florence (Class 1) or not (Class 0).

To do this conversion, a cutoff probability value is required.
Rule: choose the cutoff value where the sensitivity and specificity values are close to each other.
At a probability value of .060, sensitivity and specificity are closest to each other.
The ROC curve can also be used to decide the cutoff.
Sensitivity and specificity can both be 70%.
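The cutoff rule can be sketched as a search over candidate cutoffs for the one where sensitivity and specificity are closest. The outcomes, probabilities, and candidate grid below are hypothetical and chosen only to illustrate the mechanics, so the resulting rates are not the 70% reported on the slides.

```python
def sensitivity_specificity(actual, probs, cutoff):
    """Classify as 1 when the probability is at or above the cutoff, then score."""
    tp = sum(1 for a, p in zip(actual, probs) if a == 1 and p >= cutoff)
    fn = sum(1 for a, p in zip(actual, probs) if a == 1 and p < cutoff)
    tn = sum(1 for a, p in zip(actual, probs) if a == 0 and p < cutoff)
    fp = sum(1 for a, p in zip(actual, probs) if a == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(actual, probs, candidates):
    """Return the candidate cutoff with the smallest |sensitivity - specificity|."""
    def gap(cutoff):
        sens, spec = sensitivity_specificity(actual, probs, cutoff)
        return abs(sens - spec)
    return min(candidates, key=gap)

actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical observed classes
probs = [0.30, 0.12, 0.07, 0.04, 0.10, 0.09, 0.05, 0.03, 0.02, 0.01]
candidates = [0.02, 0.04, 0.06, 0.08, 0.10]

best = best_cutoff(actual, probs, candidates)
predictions = [1 if p >= best else 0 for p in probs]  # probability -> binary class
print(best, predictions)
```

Applying the same cutoff to the other dataset and comparing the two pairs of rates is one way to check that the model is stable.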

On the training dataset we also get 70% specificity and sensitivity.
Good news!!
The model is stable.
All p-values are < .05
Somers' D > .5
H&L p-value > .05