
# Thesis Defense Presentation

Enhancing Performance of a Decision Tree by Reducing Training Instances
by

## Kazi Mohammed Ehsan

on 13 May 2015


#### Transcript of Thesis Defense Presentation

Enhancing Performance of a Decision Tree by Reducing Training Instances
Abstract
Methodology
Our Proposal
Background
Need data for learning
Data sets are getting larger day by day
Space shortage
Time consuming
Related Work
Feature Extraction
Tree Pruning
Attribute Reduction

So, now what?
How can we solve this?
Our Idea!

Decision tree
A supporting tool
A decision tree is a decision-support tool that uses a tree-like graph or model of decisions.
Useful classifier
The decision tree approach is most useful in classification problems.
Popular decision tree algorithms are:
ID3
C4.5
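The heart of ID3 and C4.5 is the entropy / information-gain criterion used to pick the splitting attribute at each node. A minimal sketch, using a toy (hypothetical) single-attribute data set rather than the thesis's Letter data:

```python
# Minimal sketch of the entropy / information-gain computation behind ID3
# and C4.5. The toy data set below is hypothetical.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# Toy training set: (attribute value, class label)
data = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"),
        ("rain", "yes"), ("rain", "yes"), ("rain", "no")]

labels = [c for _, c in data]
base = entropy(labels)  # entropy before splitting

# Information gain of splitting on the attribute: ID3 picks the
# attribute with the highest gain at each node.
values = set(v for v, _ in data)
remainder = sum(
    (len(subset) / len(data)) * entropy(subset)
    for subset in ([c for v2, c in data if v2 == v] for v in values)
)
gain = base - remainder
print(round(gain, 3))  # → 0.541
```

C4.5 refines this criterion to gain ratio and adds handling for numeric attributes and pruning.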
Probability
What is probability?
Probability is a measure or estimation of how likely it is that something will happen or that a statement is true.
Probabilities are given a value between 0 and 1. The higher the degree of probability, the more likely the event is to happen.
Proposed Algorithm
United International University
Thesis Supervisor:
Dr. Chowdhury Mofizur Rahman
Pro-Vice Chancellor
United International University

Hasnaeen Ferdous Bin Hashem

Conditional Probability
A conditional probability is the probability that an event will occur given that another event is known to occur or to have occurred.

Given two events A and B with P(B) > 0, the conditional probability of A given B is defined as the quotient of the joint probability of A and B and the probability of B:

P(A | B) = P(A ∩ B) / P(B)
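The definition can be checked on a small worked example. Assuming a hypothetical two-dice experiment with A = "the sum is 8" and B = "the first die shows 5":

```python
# Worked example of P(A | B) = P(A and B) / P(B) on a two-dice
# experiment: A = "sum is 8", B = "first die shows 5".
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes
p_b = Fraction(sum(1 for d1, d2 in outcomes if d1 == 5), 36)
p_ab = Fraction(sum(1 for d1, d2 in outcomes if d1 == 5 and d1 + d2 == 8), 36)
p_a_given_b = p_ab / p_b
print(p_a_given_b)  # → 1/6
```

Here P(B) = 6/36 and P(A ∩ B) = 1/36 (only the outcome 5 + 3), so the conditional probability is 1/6.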

The naïve Bayesian classifier is a simple probabilistic classifier based on probability models, which can be trained very efficiently in a supervised learning setting.
It is based on Bayes' theorem from probability theory.

Naive Bayesian Classifier
Gaussian Distributions

The Gaussian distribution is the classic “bell-shaped curve” distribution. The mathematical function for computing the probability density of the Gaussian distribution at a particular point x is:

f(x) = (1 / (σ √(2π))) · e^(−(x − μ)² / (2σ²))
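The density function above translates directly into a few lines of Python using only the standard library:

```python
# Sketch of the Gaussian probability density used for each numeric
# attribute in the proposed algorithm; standard library only.
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at point x."""
    return (1.0 / (std * sqrt(2 * pi))) * exp(-((x - mean) ** 2) / (2 * std ** 2))

# The density peaks at the mean: for N(0, 1) at x = 0 it is
# 1 / sqrt(2*pi) ≈ 0.3989.
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))  # → 0.3989
```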

Input: total training data set, % of data to delete

Output: reduced training data set
Randomize the total data
Separate the data section from the .arff file
attributes <- extract attributes from the data
classes <- extract classes from the data
classProbability <- calculate the prior probability of each class
classMean <- calculate, for each class, the mean of every attribute over all data
classStd <- calculate, for each class, the standard deviation of every attribute over all data
for all data (i = 0 to length of data) do
    initialize probability variable x <- 1
    for all classes (j = 0 to number of classes) do
        if classes[j] = class of data[i] then
            for all attributes (k = 0 to number of attributes) do
                x <- x * GaussianPDF(attributes[k], classMean[j][k], classStd[j][k])
                (Gaussian probability calculation for each attribute of the data, with the attribute value, class mean, and class std)
            end for
            weight <- x * classProbability[j]
            write the weight according to the data index
        end if
    end for
end for
Sort the total data in ascending order of weight
Delete data according to the % given as input
Create a new .arff file with the reduced data
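The steps above can be sketched as a single Python function. This is a hedged reconstruction, not the thesis's Lib.py/Data.py code: it assumes purely numeric attributes, instances given as (attribute-tuple, class-label) pairs, and that "delete as % input" drops the lowest-weighted instances:

```python
# Sketch of the proposed instance-reduction algorithm (assumptions:
# numeric attributes only; the lowest-weighted instances are the ones
# deleted; helper names are hypothetical, not from the thesis code).
from collections import defaultdict
from math import exp, pi, sqrt

def reduce_instances(data, delete_pct):
    """data: list of (attribute_tuple, class_label); returns reduced data."""
    n = len(data)

    # Class priors, per-class attribute means and standard deviations.
    by_class = defaultdict(list)
    for attrs, label in data:
        by_class[label].append(attrs)
    prior = {c: len(rows) / n for c, rows in by_class.items()}
    stats = {}
    for c, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        stds = [max(sqrt(sum((v - m) ** 2 for v in col) / len(col)), 1e-9)
                for col, m in zip(cols, means)]
        stats[c] = (means, stds)

    def pdf(x, m, s):  # Gaussian density of attribute value x
        return exp(-((x - m) ** 2) / (2 * s * s)) / (s * sqrt(2 * pi))

    # Weight each instance: product of per-attribute Gaussian likelihoods
    # under its own class, times the class prior.
    weighted = []
    for attrs, label in data:
        means, stds = stats[label]
        w = prior[label]
        for x, m, s in zip(attrs, means, stds):
            w *= pdf(x, m, s)
        weighted.append((w, attrs, label))

    # Sort ascending by weight and delete the given percentage.
    weighted.sort(key=lambda t: t[0])
    keep_from = int(n * delete_pct / 100)
    return [(a, c) for _, a, c in weighted[keep_from:]]
```

On a toy data set, the instance farthest from its class mean receives the smallest weight and is the first to be removed.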

Flow Chart (Main)
Flow Chart (Mean)
Flow Chart (Weight Calculation)
Analysis in Python
Lib.py (Personalized Library)
All the functions
Randomizing
Calculating class Mean & class Std
Calculating Gaussian Conditional Probability
Calculating Weight
Data.py (Main Script)
Calling all the methods of Lib.py
Calculating all the posterior probability of each data
Making a WEKA-readable ‘.arff’ file for analysis.
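The ARFF output step can be sketched as follows. The function and the attribute names are hypothetical (not taken from Data.py); the sketch assumes numeric attributes and a nominal class, matching the standard ARFF header syntax:

```python
# Hypothetical sketch of writing the reduced data back out as a
# WEKA-readable .arff file (attribute names/types are assumptions).
def write_arff(path, relation, attributes, class_values, rows):
    """rows: list of (attribute_values, class_label) tuples."""
    with open(path, "w") as f:
        f.write(f"@relation {relation}\n\n")
        for name in attributes:
            f.write(f"@attribute {name} numeric\n")
        f.write(f"@attribute class {{{','.join(class_values)}}}\n\n")
        f.write("@data\n")
        for values, label in rows:
            f.write(",".join(str(v) for v in values) + f",{label}\n")

write_arff("reduced.arff", "letter_reduced",
           ["a1", "a2"], ["A", "B"],
           [((1, 2), "A"), ((3, 4), "B")])
```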

System Configuration
Machine 1
CPU
Intel Core i3, 3.30 GHz (2 cores, 4 threads)
RAM
4 GB

Machine 2
CPU
AMD Phenom II X6, 3.40 GHz
RAM
8 GB

Analysis
Datasets for Analysis
Letter Dataset
Training Dataset (80% data)
Test Dataset (20% data)
Prior probability for each class:
P(Class = A) = …
…
P(Class = J) = 677 / 16000 = 0.0423
…
P(Class = Z) = …
Calculation
To calculate the weight, we pick up the first instance (D1):

P(J | D1) = P(A1 | J) * P(A2 | J) * … * P(A16 | J) * P(Class = J)

P(J | D1) = P(3 | J) * P(11 | J) * … * P(6 | J) * 0.0423

Calculation
P(J | D1) = P(3 | J) * P(11 | J) * … * P(6 | J) * 0.0423
          = 0.092366 * 0.02252 * … * 0.0423

Weight = 1.60189274743e-17
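Multiplying 16 small likelihoods drives the weight toward floating-point underflow, which is why such products are often accumulated as sums of log-probabilities instead. A small sketch with placeholder likelihood values (only the first two and the prior come from the slide above):

```python
# Products of many small likelihoods shrink toward underflow; summing
# log-probabilities is the numerically safer equivalent. The values
# marked as placeholders are hypothetical, not from the thesis data.
from math import exp, log

likelihoods = [0.092366, 0.02252] + [0.05] * 14  # placeholder attribute likelihoods
prior = 0.0423

direct = prior
for p in likelihoods:
    direct *= p  # naive running product, as in the worked example

log_weight = log(prior) + sum(log(p) for p in likelihoods)
print(direct, exp(log_weight))  # the two agree up to rounding
```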

Calculated Weight
Sorted Data according to Weight
Final Dataset (1% Deleted)
WEKA is a powerful tool for building and analyzing decision trees.
There are several decision tree building algorithms:
J48
ID3
RAT, etc.
WEKA:
Developed at the University of Waikato, New Zealand
It is Java-based software.

WEKA (Continued)
Work in WEKA
We will consider:
Data Deleted Percentage
Number of Instances
Number of Leaves
Size of Tree
Incorrectly Classified Percentage

Performance Analysis
References
Conclusion
• Deletes unimportant data from the data set.
• Performance is related to the number of leaves of the tree.
• The number of leaves and the size of the tree are reduced.
• Performance improves more for large data sets than for small ones.
• Training time is reduced, as is prediction time.
• Storage is saved.
Flow Chart (STD)

• "Assigning Weights to Training Instances Increases Classification Accuracy", Dr. Dewan Md. Farid and Prof. Dr. Chowdhury Mofizur Rahman
• "Artificial Intelligence: A Modern Approach", Stuart J. Russell and Peter Norvig
• "Data Mining: Concepts and Techniques", The Morgan Kaufmann Series in Data Management Systems