ankita pal

on 16 November 2016

The Database
The data for this project were obtained from the UCI Machine Learning Repository. The dataset is named "Default of credit card clients Data Set". It has 30,000 instances and 24 attributes. The response variable is a binary variable, default payment (Yes = 1, No = 0).
Tools Used
RSTUDIO: VERSION 0.98.1091: RStudio is a free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and graphics. It provides a powerful and productive user interface and works on Windows, Mac, and Linux.

Underlying Concept used in the Algorithm
A credit card gives its owner the convenience of spending on credit. A lack of cash at the time of purchase is therefore not much of a concern for a card holder, who can buy on credit and pay at a later date. Before extending credit to a borrower, a bank decides whether the borrower is bad (defaulter) or good (non-defaulter). Predicting borrower status, i.e. whether a borrower will default in the future, is a challenging task for the bank. Loan defaulter prediction is a binary classification problem.
R LANGUAGE: R VERSION 3.3.1: R is a programming language and software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. R and its libraries implement a wide variety of statistical and graphical techniques, including linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, and others.
A Genetic Algorithm (GA) is a method for solving both constrained and unconstrained optimization problems based on a natural selection process that mimics biological evolution.
The algorithm repeatedly modifies a population of individual solutions.
At each step, the genetic algorithm randomly selects individuals from the current population and uses them as parents to produce the children for the next generation.
Over successive generations, the population "evolves" toward an optimal solution.
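The loop described above (select parents, produce children, repeat over generations) can be sketched as follows. This is a minimal illustration in Python rather than the project's R code. The toy objective (count the 1-bits), tournament selection, one-point crossover, elitism, and every parameter value are assumptions chosen for the example, not details taken from the project.

```python
import random

def fitness(bits):
    # Toy objective (an assumption): the number of 1-bits in the individual.
    return sum(bits)

def pick_parent(population, rng):
    # Tournament selection: draw two individuals at random, keep the fitter one.
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def evolve(n_bits=20, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism (an assumption): carry the current best individual over unchanged.
        children = [list(max(population, key=fitness))]
        while len(children) < pop_size:
            mother = pick_parent(population, rng)
            father = pick_parent(population, rng)
            # One-point crossover: head of one parent, tail of the other.
            cut = rng.randrange(1, n_bits)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

Because the best individual is always kept, the best fitness in the population never decreases from one generation to the next.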
Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems.
It is mostly used for classification.
In this algorithm, each data item is plotted as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate.
Classification is performed by finding the hyperplane that best separates the two classes.
The resulting hyperplane (a line in two dimensions) is the frontier that best segregates the two classes.
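To make "best segregates" concrete: one common criterion is the margin, the distance from the hyperplane to the closest point of either class. The Python sketch below does not train an SVM. It only compares two hand-picked candidate lines on made-up 2-D points and keeps the one with the larger minimum margin. The points, labels, and candidate hyperplanes are all assumptions for illustration.

```python
import math

def signed_distance(w, b, x):
    # Signed distance from point x to the hyperplane w.x + b = 0.
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def min_margin(w, b, points, labels):
    # Smallest label-signed distance; negative if any point is misclassified.
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))

# Made-up, linearly separable data: label -1 and label +1.
points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]

# Two hypothetical separating lines, given as (w, b).
candidates = {
    "A": ((1.0, 1.0), -5.5),  # x1 + x2 = 5.5
    "B": ((1.0, 0.0), -3.0),  # x1 = 3
}

margins = {name: min_margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, best)
```

Both lines separate the classes, but line A leaves more room on either side, which is the property an SVM optimises for.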
Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
The number of principal components is less than or equal to the number of original variables.
The goal is to reduce the dimensions of a d-dimensional dataset by projecting it onto a k-dimensional subspace (where k < d) in order to increase computational efficiency while retaining most of the information.
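As a small illustration of the projection step, the Python sketch below reduces made-up 2-D data (d = 2) to one dimension (k = 1). It finds the first principal component by power iteration on the sample covariance matrix, which is one simple way to obtain the dominant eigenvector and is an assumption here, not necessarily how the project computed PCA. The data values and the all-ones start vector are also assumptions.

```python
import math

def first_principal_component(rows, iterations=100):
    n, d = len(rows), len(rows[0])
    # Centre each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalise.
    v = [1.0] * d
    for _ in range(iterations):
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Project every centred row onto the component: d dimensions become 1.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Made-up, strongly correlated 2-D data.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
pc, scores = first_principal_component(data)
print(pc)
print(scores)
```

Each original two-value row is replaced by a single score along the direction of greatest variance.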
An artificial neuron is a mathematical function conceived as a model of biological neurons.
Artificial neurons are the constitutive units in an artificial neural network.
The artificial neuron receives one or more inputs (representing dendrites), usually weights each of them, and sums them. The sum is passed through a transfer function to produce an output (representing a neuron's axon).
Transfer functions usually have a sigmoid shape, but they may also take the form of other non-linear functions, piecewise linear functions, or step functions.
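A single neuron of this kind can be written in a few lines. The Python sketch below assumes the usual formulation with one weight per input plus a bias term. The input, weight, and bias values are made up for illustration.

```python
import math

def sigmoid(z):
    # Smooth S-shaped transfer function with outputs between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

def step(z):
    # Step transfer function: fires (1.0) once the sum reaches zero.
    return 1.0 if z >= 0 else 0.0

def neuron(inputs, weights, bias, transfer=sigmoid):
    # Weighted sum of the inputs plus a bias, passed through the transfer function.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(z)

inputs = [1.0, 2.0]
weights = [0.5, -0.25]
print(neuron(inputs, weights, bias=0.0))                 # sigmoid transfer
print(neuron(inputs, weights, bias=0.0, transfer=step))  # step transfer
```

Swapping the `transfer` argument is all it takes to move between the sigmoid and step behaviours described above.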

Packages used in the project:

The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models. The package contains tools for:
  • Data splitting
  • Pre-processing
  • Feature selection
  • Model tuning using resampling
  • Variable importance estimation
  • Other functionalities

The doParallel package is a “parallel backend” for the foreach package. It provides a mechanism needed to execute foreach loops in parallel. The foreach package must be used in conjunction with a package such as doParallel in order to execute code in parallel.
dplyr is a package for data manipulation, written and maintained by Hadley Wickham. It provides some great, easy-to-use functions that are very handy when performing exploratory data analysis and manipulation.
A set of tools that solves a common set of problems: breaking a big problem down into manageable pieces, operating on each piece, and then putting all the pieces back together.
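The break-down, operate, and put-back-together pattern can be sketched in plain Python as below. This is only an analogy to what the R tooling does, not its API. The helper name `split_apply_combine`, the record fields, and the values are all hypothetical.

```python
from collections import defaultdict

def split_apply_combine(records, key, value, func):
    # Split: collect the values of each group under its key.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[value])
    # Apply func to each group, then combine the results into one dict.
    return {k: func(vals) for k, vals in groups.items()}

def mean(values):
    return sum(values) / len(values)

# Made-up records: education level and a 0/1 default flag.
records = [
    {"education": "graduate", "default": 1},
    {"education": "graduate", "default": 0},
    {"education": "university", "default": 0},
    {"education": "university", "default": 0},
    {"education": "high school", "default": 1},
]

rates = split_apply_combine(records, "education", "default", mean)
print(rates)
```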
The package allows flexible settings through custom choice of error and activation function. Furthermore, the calculation of generalized weights
The screenshots above show that the SVM classifier does not classify defaulting customers correctly: it identifies only 3 of the 690 actual defaulters. The neural network predictor performs better, correctly identifying 254 of them.
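Expressed as recall on the defaulter class (correctly identified defaulters divided by actual defaulters), the two counts quoted above work out as in this short Python calculation. Only the figures 3, 254, and 690 come from the text; the function and variable names are illustrative.

```python
def recall(true_positives, actual_positives):
    # Share of actual defaulters that the model identified.
    return true_positives / actual_positives

ACTUAL_DEFAULTERS = 690

svm_recall = recall(3, ACTUAL_DEFAULTERS)
nn_recall = recall(254, ACTUAL_DEFAULTERS)

print(f"SVM recall: {svm_recall:.1%}")
print(f"Neural network recall: {nn_recall:.1%}")
```

The SVM finds roughly 0.4% of the defaulters, while the neural network finds roughly 36.8%.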