Transcript of Session 4
Search for Signal
This step is for the modeler's understanding.
Prepare your Data
Making Sure Your IVs Are Truly Independent
Risk Analysis: Default Prediction
When the business objective/target involves categorizing observations into groups.
Building a model to predict the response to a campaign, i.e. to predict whether a customer will respond (Category 1) or not (Category 2)
Where to Apply Logistic Model
HR and IT: Attrition Prediction
Filtering Exercise: Finding the relationship of the IVs with the target
Nature: Categorical
Nature: Categorical
Nature: Numeric
Output of this Step
List of a few important variables that can predict the target variable
Search for the Signal
Step 1: Division of the dataset into training and testing datasets
Step 2: Creation of dummy variables (if required)
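Steps 1 and 2 could be sketched in plain Python roughly as follows. The toy rows, the column names (region, spend, responded), the 70/30 split, and the helper names train_test_split and add_dummies are all assumptions made for illustration; none of them come from the session's dataset.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split it into (training, testing) lists."""
    shuffled = rows[:]                       # copy so the input list is untouched
    random.Random(seed).shuffle(shuffled)    # seeded for a reproducible split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def add_dummies(rows, column, levels):
    """Replace a categorical column with 0/1 dummies; the first level is the reference."""
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels[1:]:             # k levels -> k-1 dummy columns
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        out.append(new_row)
    return out

# Hypothetical toy data (illustrative only).
regions = ["north", "south", "east"]
data = [{"region": regions[i % 3], "spend": 100 + 10 * i, "responded": i % 2}
        for i in range(10)]

# Step 1: split into training and testing datasets.
train, test = train_test_split(data, test_fraction=0.3)

# Step 2: create dummy variables, using one shared list of levels so that
# the training and testing datasets end up with identical columns.
levels = sorted({row["region"] for row in data})
train = add_dummies(train, "region", levels)
test = add_dummies(test, "region", levels)
```

Taking the levels from the full dataset is a design choice in this sketch: it avoids a category that appears only in the testing rows producing a column the training rows lack.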
Use the VIF (variance inflation factor) to decide which variables exhibit multicollinearity. If the VIF for a variable is greater than 10, it indicates a strong degree of correlation with at least one other variable in the model.
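For reference, the usual definition behind the VIF rule is shown below. The symbols are assumptions for illustration: X_j is the j-th independent variable and R_j^2 is the R-squared obtained by regressing X_j on all the other independent variables.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j > 10 \iff R_j^2 > 0.9
```

In other words, a VIF above 10 means that more than 90% of the variance of that variable is explained by the other variables in the model.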
Performing a demo run to understand the health of the model.
Model Thumb Rules
Thumb Rule 1: Model convergence status
Thumb Rule 2: Model p value < .05
Thumb Rule 3: P value < .05
Thumb Rule 4: Somers' D >= .5
Thumb Rule 5: H&L p value > .05
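As an illustration of the Somers' D thumb rule, the statistic for a binary target can be computed from concordant and discordant pairs. This is a minimal sketch, assuming the common pair-counting definition (concordant minus discordant, divided by all event/non-event pairs, with ties counting for neither side); the function name somers_d and the toy values are hypothetical.

```python
def somers_d(y_true, y_prob):
    """Somers' D for a binary target: (concordant - discordant) / total pairs.

    A pair is one event (y = 1) and one non-event (y = 0). It is concordant
    when the event has the higher predicted probability, discordant when it
    has the lower one, and tied otherwise.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = discordant = 0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1
            elif p_event < p_non_event:
                discordant += 1
    return (concordant - discordant) / (len(events) * len(non_events))

# Hypothetical toy values (illustrative only).
y_true = [1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.5, 0.2]
d = somers_d(y_true, y_prob)     # 8 concordant, 1 discordant, 9 pairs
passes_rule = d >= 0.5           # the Somers' D thumb rule
```

With this definition Somers' D equals 2 * AUC - 1, so the .5 threshold corresponds to an area under the ROC curve of .75.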
For F and FirstPruch, VIF > 10, which means multicollinearity is present.
We will use the stepwise regression algorithm to help identify the optimal model combination
According to stepwise selection, these 5 variables are the best possible combination to predict the target
Deciding the cutoff probability value
Testing the stability of the model on validation data
Right now the predicted output is a probability value, but the prediction should be in binary format:
whether people are buying Florence (Class 1) or not (Class 0).
To do this conversion, a cutoff probability value is required.
Rule: Choose the cutoff value where the sensitivity and specificity values are closest to each other.
At a probability value of .060, sensitivity and specificity are closest to each other
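The cutoff rule can be sketched as a small search over candidate cutoffs. This is a minimal illustration, not the procedure used in the session: the function names, the candidate grid, and the toy labels and probabilities are all hypothetical, and predictions are taken as Class 1 when the probability is at or above the cutoff.

```python
def sensitivity_specificity(y_true, y_prob, cutoff):
    """Classify as 1 when probability >= cutoff; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

def closest_cutoff(y_true, y_prob, candidates):
    """Return the candidate cutoff where sensitivity and specificity are closest."""
    def gap(cutoff):
        sens, spec = sensitivity_specificity(y_true, y_prob, cutoff)
        return abs(sens - spec)
    return min(candidates, key=gap)

# Hypothetical toy data (not the session's dataset): 3 buyers, 7 non-buyers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.30, 0.08, 0.04, 0.20, 0.07, 0.05, 0.03, 0.02, 0.02, 0.01]
candidates = [0.02, 0.04, 0.06, 0.08, 0.10]

cutoff = closest_cutoff(y_true, y_prob, candidates)
sens, spec = sensitivity_specificity(y_true, y_prob, cutoff)
predictions = [1 if p >= cutoff else 0 for p in y_prob]   # binary output
```

Low cutoffs such as these are typical when the event is rare, because most predicted probabilities then sit well below .5.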
The ROC curve can also be used to decide the cutoff
Sensitivity and specificity can both be 70%
On the training data set we are also getting 70% specificity and sensitivity
Good news!
The model is stable
All p values are < .05
Somers' D > .5
H&L p value > .05