# Session 4

Logistic Regression
by

## Anuj Thakur

on 23 August 2013


#### Transcript of Session 4

Logistic Regression
Search for Signal
This step is for the modeler's understanding.
Multicollinearity Check
Making sure your IVs (independent variables) are truly independent

Where to Apply a Logistic Model
When the business objective or target involves categorizing observations into groups.

Example
Building a model to predict the response to a campaign, i.e. to predict whether a customer will respond (Category 1) or not (Category 2).

Industry Applications
Risk analysis: default prediction
Campaign targeting
HR and IT: attrition prediction

Logistic Regression as a Process
Filtering exercise: finding the relationship of the IVs with the target
Target variable. Nature: categoric
Independent variable. Nature: categoric
Independent variable. Nature: numeric
Output of this step: a short list of important variables that can predict the target variable
Target variable (categoric) vs. independent variable (categoric): Chi-square test
Target variable (categoric) vs. independent variable (numeric): ANOVA
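
The two screening tests can be sketched in plain Python as follows. This is only an illustrative sketch: the function names and the toy data are assumptions, not part of the session, and it computes the test statistics only. In practice a statistics package would also report the p-values.

```python
from collections import Counter


def chi_square_statistic(iv, target):
    """Pearson chi-square statistic and degrees of freedom
    for a categoric IV against a categoric target."""
    n = len(iv)
    observed = Counter(zip(iv, target))
    row_totals = Counter(iv)
    col_totals = Counter(target)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def anova_f_statistic(values, target):
    """One-way ANOVA F statistic for a numeric IV,
    grouped by the classes of the target."""
    groups = {}
    for v, t in zip(values, target):
        groups.setdefault(t, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {t: sum(g) / len(g) for t, g in groups.items()}
    ss_between = sum(len(g) * (means[t] - grand_mean) ** 2
                     for t, g in groups.items())
    ss_within = sum((v - means[t]) ** 2
                    for t, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data (hypothetical): a categoric IV, a numeric IV, and a binary target.
segment = ["a", "a", "a", "a", "b", "b", "b", "b"]
spend = [10.0, 12.0, 11.0, 30.0, 31.0, 29.0, 33.0, 9.0]
responded = [1, 1, 1, 0, 0, 0, 0, 1]

print(chi_square_statistic(segment, responded))
print(anova_f_statistic(spend, responded))
```

A larger statistic suggests a stronger relationship with the target, so variables with weak statistics are candidates to drop before modeling.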
Search for the Signal
Step 1: Divide the original dataset into training and testing datasets
Original dataset: Training + Testing
Step 2: Create dummy variables (if required)
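
Steps 1 and 2 might look like the sketch below. The 70/30 split ratio, the seed, and the helper names are assumptions made for illustration; the session does not state which split was used.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows reproducibly and split them into (training, testing)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def make_dummies(values, prefix):
    """Encode a categoric variable as k-1 dummy (0/1) columns.
    The first level in sorted order is the reference and gets no column."""
    levels = sorted(set(values))
    return {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels[1:]
    }


# Hypothetical usage
training, testing = train_test_split(list(range(10)))
print(len(training), len(testing))
print(make_dummies(["red", "blue", "green", "blue"], "color"))
```

Dropping one level as the reference keeps the dummy columns from summing to a constant, which would otherwise introduce perfect multicollinearity with the intercept.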
Use the VIF (variance inflation factor) to decide which variables are exhibiting multicollinearity. If the VIF for a variable is greater than 10, it indicates a strong degree of correlation with at least one other variable in the model.
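
As a rough illustration of the check, the VIF of a variable can be computed by regressing it on the remaining variables and taking 1 / (1 - R²). The sketch below assumes NumPy is available; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X.
    Column j is regressed on all other columns (plus an intercept);
    VIF_j = 1 / (1 - R^2_j) = SS_total / SS_residual."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        residual = y - others @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        factors.append(float("inf") if ss_res == 0 else ss_tot / ss_res)
    return factors


# Synthetic example: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
for name, value in zip(["x1", "x2", "x3"], vif(np.column_stack([x1, x2, x3]))):
    flag = "multicollinear" if value > 10 else "ok"
    print(name, round(value, 2), flag)
```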
Perform a demo run to understand the health of the model.
Model Thumb Rules
Thumb Rule 1: Model convergence status is "Satisfied"
Thumb Rule 2: Model p value < 0.05
Thumb Rule 3: Variables' p value < 0.05
Thumb Rule 4: Somers' D >= 0.5
Thumb Rule 5: H&L (Hosmer and Lemeshow) p value > 0.05
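
For Thumb Rule 4, Somers' D can be computed from the predicted probabilities by comparing every event with every non-event. The sketch below uses the usual definition for a binary target, (concordant pairs - discordant pairs) / all pairs; the function name and the sample values are hypothetical.

```python
def somers_d(y_true, p_hat):
    """Somers' D for a binary target.
    A pair (event, non-event) is concordant when the event has the higher
    predicted probability, discordant when it has the lower one; ties count
    toward the total number of pairs only."""
    events = [p for y, p in zip(y_true, p_hat) if y == 1]
    non_events = [p for y, p in zip(y_true, p_hat) if y == 0]
    concordant = discordant = 0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1
            elif pe < pn:
                discordant += 1
    return (concordant - discordant) / (len(events) * len(non_events))


# Hypothetical predictions
actual = [1, 1, 0, 0]
predicted = [0.9, 0.4, 0.5, 0.1]
d = somers_d(actual, predicted)
print(d, "passes Thumb Rule 4" if d >= 0.5 else "fails Thumb Rule 4")
```

With ties handled this way, Somers' D equals 2 × AUC - 1, so a value of 0.5 corresponds to an area under the ROC curve of 0.75.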
For F and FirstPruch, VIF > 10, which means multicollinearity is present.
Model Selection
We will use the stepwise regression algorithm to help identify the optimal combination of variables.
According to the stepwise procedure, these 5 variables are the best possible combination to predict the target.
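
To give a feel for how an automated selection procedure works, here is a simplified sketch. It is not the stepwise algorithm used in the session: it only adds variables (forward selection, no removal step) and it uses AIC rather than entry and stay p-value thresholds. The function names, the Newton-based fitting routine, and the synthetic data are all assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression with an intercept by Newton's method.
    Returns (coefficients, log-likelihood)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible.
        hessian = A.T @ (A * w[:, None]) + 1e-8 * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(hessian, A.T @ (y - p))
    p = np.clip(1.0 / (1.0 + np.exp(-(A @ beta))), 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik


def forward_select(X, y):
    """Greedy forward selection: add the variable that lowers AIC the most,
    stop when no remaining variable improves AIC."""
    n, n_features = X.shape
    selected = []
    _, loglik = fit_logistic(np.empty((n, 0)), y)  # intercept-only model
    best_aic = 2 * 1 - 2 * loglik
    while len(selected) < n_features:
        candidates = []
        for j in range(n_features):
            if j in selected:
                continue
            cols = selected + [j]
            _, loglik = fit_logistic(X[:, cols], y)
            candidates.append((2 * (len(cols) + 1) - 2 * loglik, j))
        aic, j = min(candidates)
        if aic >= best_aic:
            break
        selected.append(j)
        best_aic = aic
    return selected


# Synthetic data: only column 0 drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))).astype(float)
print(forward_select(X, y))
```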
!!!Last Lap!!!
Deciding the cutoff probability value
Testing the stability of the model on validation data

Right now the predicted output is a probability value, but the prediction should be in binary format.
E.g.: whether people are buying Florence (Class 1) or not (Class 0).

To do this conversion, a cutoff probability value is required.
Rule: choose the cutoff value where sensitivity and specificity are close to each other.
At a probability value of 0.060, sensitivity and specificity are closest to each other.
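
The cutoff rule can be sketched as a simple scan over candidate cutoffs. The function names, the candidate grid, and the sample predictions below are hypothetical and only illustrate the rule stated above.

```python
def sensitivity_specificity(y_true, p_hat, cutoff):
    """Sensitivity (true positive rate) and specificity (true negative rate)
    when probabilities >= cutoff are classified as Class 1."""
    tp = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p < cutoff)
    tn = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p < cutoff)
    fp = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_cutoff(y_true, p_hat, cutoffs):
    """Pick the candidate cutoff where sensitivity and specificity are closest."""
    def gap(cutoff):
        sens, spec = sensitivity_specificity(y_true, p_hat, cutoff)
        return abs(sens - spec)
    return min(cutoffs, key=gap)


# Hypothetical predictions on the training data
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

cutoff = balanced_cutoff(actual, predicted, [0.25, 0.5, 0.75])
classes = [1 if p >= cutoff else 0 for p in predicted]
print(cutoff, sensitivity_specificity(actual, predicted, cutoff), classes)
```

Applying the same cutoff to the validation data and comparing the resulting sensitivity and specificity with the training values is one way to check the stability of the model.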
The ROC curve can also be used to decide the cutoff.
Sensitivity and specificity can both be 70%.

On the training dataset as well, we are getting 70% specificity and sensitivity.
Good news! The model is stable:
All p values are < 0.05
Somers' D > 0.5
H&L p value > 0.05