**Enhancing Performance of a Decision Tree by Reducing Training Instances**

**Abstract**

Methodology

Our Proposal

Background

Machine learning is spreading worldwide

Need Data for Learning

Data sets are getting larger day by day

Space shortage

Time Consuming

Related Work

Feature Extraction

Tree Pruning

Attribute Reduction

So, now what?

How can we solve this?

Our Idea!

Decision Tree

A supporting tool

A decision tree is a supporting tool that uses a tree-like graph or model of decisions.

Useful Classifier

The decision tree approach is most useful in classification problems.

Popular Decision Tree algorithms are:

ID3

C4.5
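Both ID3 and C4.5 grow the tree by repeatedly choosing the attribute with the highest information gain. As a minimal illustration of that splitting criterion (a sketch, not the thesis code):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Gain from splitting `labels` on a parallel list of attribute values."""
    n = len(labels)
    groups = {}
    for v, y in zip(attribute_values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy remaining after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# A perfect split: the attribute fully determines the class, so gain = 1 bit.
print(information_gain(["yes", "yes", "no", "no"], ["a", "a", "b", "b"]))
```

ID3 would evaluate this gain for every candidate attribute at a node and split on the maximizer; C4.5 refines it with the gain ratio.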

Probability

What is probability?

Probability is a measure or estimation of how likely it is that something will happen or that a statement is true.

Probabilities are given a value between 0 and 1. The higher the degree of probability, the more likely the event is to happen.

Proposed Algorithm

**United International University**

**Thesis Supervisor:**

Dr. Chowdhury Mofizur Rahman

Pro-Vice Chancellor

United International University


**Aditi Biswas**

Kazi Mohammad Ehsan

Hasnaeen Ferdous Bin Hashem


Conditional Probability

A conditional probability is the probability that an event will occur, given that another event is known to occur or to have occurred.

Given two events A and B with P(B) > 0, the conditional probability of A given B is defined as the quotient of the joint probability of A and B and the probability of B:

P(A | B) = P(A ∩ B) / P(B)
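This definition can be checked numerically by counting outcomes; a minimal sketch using a hypothetical two-dice example (not part of the thesis):

```python
from itertools import product

# Sample space: all ordered rolls of two fair dice.
omega = list(product(range(1, 7), repeat=2))

B = [w for w in omega if w[0] + w[1] >= 10]   # event B: sum is at least 10
A_and_B = [w for w in B if w[0] == 6]         # event A ∩ B: first die is 6

p_B = len(B) / len(omega)
p_A_and_B = len(A_and_B) / len(omega)

# P(A | B) = P(A ∩ B) / P(B)
print(p_A_and_B / p_B)  # 0.5
```

Of the six rolls summing to at least 10, three start with a 6, so the conditional probability is 3/6 = 0.5.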

Naïve Bayesian Classifier

A Naïve Bayesian classifier is a simple probabilistic classifier based on a probability model, which can be trained very efficiently in a supervised learning setting.

It is based on statistical probability theory.

Gaussian Distributions

The Gaussian distribution is the classic “bell-shaped curve” distribution. The mathematical function for computing the probability density of the Gaussian distribution at a particular point x is:

f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
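A direct Python translation of this density function (a minimal sketch; the thesis implementation lives in Lib.py, and this name is an assumption):

```python
import math

def gaussian_pdf(x, mean, std):
    """Probability density of N(mean, std^2) at point x."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    exponent = -((x - mean) ** 2) / (2.0 * std ** 2)
    return coeff * math.exp(exponent)

# Peak of the standard normal: 1 / sqrt(2*pi) ≈ 0.3989
print(gaussian_pdf(0.0, 0.0, 1.0))
```

Note this is a density, not a probability: for narrow distributions (small σ) it can exceed 1, which is fine because the weights computed later are only compared against each other.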

Input: total training dataset, percentage of data to delete

Output: reduced training dataset

Randomize the total data
Separate the data part from the .arff file
Attribute <- extract attributes from data
Class <- extract classes from data
classProbability <- calculate the probability of each class
classMean <- calculate the mean of each attribute per class, over all data
classStd <- calculate the standard deviation of each attribute per class, over all data
for all data (i = 0 to length of data) do
    initialize probability variable x <- 1
    for all classes (j = 0 to number of classes) do
        if class[j] = class of data[i] then
            for all attributes (k = 0 to number of attributes) do
                x <- x * GaussianPDF(attribute[i][k], classMean[j][k], classStd[j][k])
                // Gaussian probability of each attribute value, using the class mean and class standard deviation
            end for
            weight <- x * classProbability[j]
            write weight according to data index
        end if
    end for
end for
Sort the total data in ascending order of weight
Delete data according to the input percentage
Create a new .arff file with the reduced data
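The steps above can be sketched end to end in Python. This is an illustrative reading of the pseudocode, not the actual Lib.py/Data.py code; the function and variable names and the in-memory data layout are assumptions:

```python
import math
from collections import defaultdict

def gaussian_pdf(x, mean, std):
    """Class-conditional density of one attribute value."""
    return math.exp(-((x - mean) ** 2) / (2.0 * std ** 2)) / (std * math.sqrt(2.0 * math.pi))

def reduce_training_data(data, labels, delete_fraction):
    """Drop the `delete_fraction` of instances with the lowest naive Bayes weight."""
    n = len(data)
    by_class = defaultdict(list)
    for row, c in zip(data, labels):
        by_class[c].append(row)

    # Per-class prior, attribute means, and standard deviations.
    prior, mean, std = {}, {}, {}
    for c, rows in by_class.items():
        prior[c] = len(rows) / n
        cols = list(zip(*rows))
        mean[c] = [sum(col) / len(col) for col in cols]
        std[c] = [max(math.sqrt(sum((v - m) ** 2 for v in col) / len(col)), 1e-9)
                  for col, m in zip(cols, mean[c])]

    # Weight of each instance: product of its class-conditional densities times the prior.
    weights = []
    for i, (row, c) in enumerate(zip(data, labels)):
        w = prior[c]
        for k, v in enumerate(row):
            w *= gaussian_pdf(v, mean[c][k], std[c][k])
        weights.append((w, i))

    # Sort ascending by weight and delete the lowest-weight (least typical) instances.
    weights.sort()
    drop = {i for _, i in weights[:int(n * delete_fraction)]}
    return [(row, c) for i, (row, c) in enumerate(zip(data, labels)) if i not in drop]
```

Sorting ascending and deleting from the front removes the instances that are least probable under their own class model, i.e. the ones the algorithm treats as unimportant for training.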

Flow Chart (Main)

Flow Chart (Mean)

Flow Chart (Weight Calculation)

Analysis in Python

Lib.py (Personalized Library)

All the functions

Randomizing

Calculating class Mean & class Std

Calculating Gaussian Conditional Probability

Calculating Weight

Data.py (Main Script)

Calling all the methods of Lib.py

Calculating the posterior probability of each data instance

Making a WEKA-readable ‘.arff’ file for analysis.
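A minimal sketch of how such a WEKA-readable ‘.arff’ file could be written (the function name and attribute layout here are assumptions, not the actual Data.py code):

```python
def write_arff(path, relation, attributes, rows):
    """Write a minimal WEKA-readable .arff file.

    `attributes` is a list of (name, type) pairs, e.g. ("A1", "numeric")
    or ("class", "{A,B}") for a nominal class attribute.
    """
    with open(path, "w") as f:
        f.write("@relation " + relation + "\n\n")
        for name, typ in attributes:
            f.write("@attribute " + name + " " + typ + "\n")
        f.write("\n@data\n")
        for row in rows:
            f.write(",".join(str(v) for v in row) + "\n")

# Hypothetical reduced dataset with two numeric attributes and a nominal class.
write_arff("reduced.arff", "letter_reduced",
           [("A1", "numeric"), ("A2", "numeric"), ("class", "{A,B}")],
           [(3, 11, "A"), (6, 2, "B")])
```

The resulting file loads directly in the WEKA Explorer, so the reduced dataset can be fed to J48 without further conversion.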

System Configuration

Machine 1

CPU

Intel i3, 3.30 GHz (2 cores, 4 threads)

RAM

4 GB

Machine 2

CPU

AMD Phenom II X6, 3.40 GHz

(4 cores, 4 threads)

RAM

8 GB

**Analysis**

Datasets for Analysis

Letter Dataset

Training Dataset (80% data)

Test Dataset (20% data)

Prior probability for each class:

P(Class = A) =
⋮
P(Class = J) = 677 / 16000 = 0.0423
⋮
P(Class = Z) =
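The prior for each class is just its relative frequency in the training set; a minimal sketch:

```python
from collections import Counter

def class_priors(labels):
    """P(Class = c) for each class c, estimated from label frequencies."""
    n = len(labels)
    return {c: count / n for c, count in Counter(labels).items()}

# The slide's figure for class J: 677 occurrences out of 16000 training instances.
print(677 / 16000)  # 0.0423125
```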

Calculation

To calculate the weight, we pick the first data instance (D1):

P(J | D1) = P(A1 | J) * P(A2 | J) * … * P(A16 | J) * P(Class = J)

P(J | D1) = P(3 | J) * P(11 | J) * … * P(6 | J) * 0.0423

= 0.092366 * 0.02252 * … * 0.0423

Weight = 1.60189274743e-17

Calculated Weight

Sorted Data according to Weight

Final Dataset (1% Deleted)

WEKA is a strong tool for building and analyzing decision trees.

There are several decision-tree-building algorithms:

J48

ID3

RAT etc

WEKA:

Developed at the University of Waikato, New Zealand

It is Java-based software.

WEKA (Continued)

Work in WEKA


We will consider:

Data Deleted Percentage

Number of Instances

Number of Leaves

Size of Tree

Incorrectly Classified Percentage

Performance Analysis


References

Conclusion

# Deleting unimportant data from the dataset.

# Performance is related to the number of leaves of the tree.

# The number of leaves and the tree size will be reduced.

# Performance is better for large datasets compared to smaller ones.

# Training time will be reduced, as well as prediction time.

# Saving storage.

Flow Chart (STD)

# “Assigning Weights to Training Instances Increases Classification Accuracy” by Dr. Dewan Md. Farid and Prof. Dr. Chowdhury Mofizur Rahman

# “Artificial Intelligence: A Modern Approach” by Stuart J. Russell and Peter Norvig

# “Data Mining: Concepts and Techniques”, The Morgan Kaufmann Series in Data Management Systems

# Wikipedia
