Random Forests

by yumeng du on 9 May 2014

Transcript of Random Forests

In addition to bagging, every tree in the forest selects from a random set of features to split on at every node.
Each tree can be expressed in terms of a random vector Θ_k, independent of past vectors but drawn from the same distribution.
The tree trained from Θ_k can be written as h(x, Θ_k), where x is an input vector.
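As an illustration, here is a minimal Python sketch of this mechanism, assuming integer-coded class labels and using scikit-learn's DecisionTreeClassifier, whose max_features argument plays the role of the per-node random feature subset; the bootstrap indices plus the per-node feature draws together stand in for Θ_k. This is a sketch, not Breiman's reference implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_forest(X, y, n_trees=100, max_features="sqrt", seed=0):
    # Each tree h(x, Theta_k) is grown on a bootstrap sample (bagging) and,
    # via max_features, splits on a random subset of features at every node.
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample
        tree = DecisionTreeClassifier(max_features=max_features,
                                      random_state=int(rng.integers(1 << 31)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_majority(trees, X):
    # Every tree votes; the forest predicts the majority class per sample.
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])

In practice, scikit-learn's RandomForestClassifier packages this same combination of bagging and per-node random feature selection.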
Random Forests
Aims
Method & result
Introduction
Random Forests™ are an application of bagging (bootstrap aggregation) to decision trees.
Random Forests™ is a class of ensemble methods published in 2001 by Leo Breiman and Adele Cutler at UC Berkeley.
Random Forests Using
Random Input Selection

Introduction cont.
Group Number: 3
Group Members:
Hector Chu (Leader)
Theerayut Sueverachai
Yuchen Zou
Yumeng Du

Grow many trees that differ slightly in their training data and feature splits, and let them vote for the majority class.
By using an ensemble model, the accuracy is usually better than that of any individual model instance.
It remains one of the most popular ensemble methods within the field of classification.
No need to fine-tune many parameters or refine features.
Still gives very good accuracy, comparable to best-in-class methods.
Objection: the built model is hard to interpret. The paper suggests a method for estimating variable importance.
Easy to implement and fast to build.
Overfitting is avoided.
A Black Box
1996 – Freund and Schapire – AdaBoost (adaptive boosting). Error rates are comparable to Random Forests™, but Random Forests™ are more robust with respect to noise. Breiman conjectures that in the final stages of training AdaBoost is emulating a random forest.
Comparison to other methods
h_k(x) is the k-th classifier in an ensemble of classifiers.
The margin function is the extent to which the average number of votes for the right class exceeds the average vote for any other class.
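Written out (a sketch of the standard definition from Breiman's 2001 paper, where h_1, ..., h_K are the ensemble members, I is the indicator function, and av_k denotes the average over k):

mg(X, Y) = \mathrm{av}_k\, I\bigl(h_k(X) = Y\bigr) \;-\; \max_{j \neq Y} \mathrm{av}_k\, I\bigl(h_k(X) = j\bigr)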
1996 – Breiman devises bagging as a method of generating ensemble classifiers.
1997 – Amit and Geman – Shape quantization and recognition with randomized trees
1998 – Random split selection by Dietterich.
1998 – Ho – The random subspace method for constructing decision forests.
1999 – Breiman – Using adaptive bagging to debias regressions.
Earlier seminal work
Ensemble Classification Model
Consideration:
Building a random forest is faster than building an AdaBoost ensemble.
E.g. growing 100 trees in a random forest is quicker than growing 50 trees with AdaBoost.

4. Random Forests Using
Random Input Selection

Results:
• Similar error rates for AdaBoost and Random Forests
• A single random input variable does better in some cases.

Experiment Steps
For each of the M input variables:
Randomly permute the values of the m-th variable and run the out-of-bag data down the trees.
Take the plurality of out-of-bag class votes for each sample x_n with the m-th variable permuted.
Compare with the true class label of x_n to give a misclassification rate.
The output is the percent increase in misclassification rate compared to the out-of-bag rate with all variables intact (sketched below).
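A rough Python sketch of these steps, assuming the forest was built so that each tree's out-of-bag row indices were recorded (the trees/oob_indices bookkeeping and integer-coded labels are assumptions of this sketch):

import numpy as np

def oob_error(trees, oob_indices, X, y):
    # Misclassification rate where each sample is voted on only by the trees
    # for which it was out-of-bag.
    votes = np.zeros((len(X), int(y.max()) + 1))
    for tree, oob in zip(trees, oob_indices):
        if len(oob):
            votes[oob, tree.predict(X[oob]).astype(int)] += 1
    seen = votes.sum(axis=1) > 0
    return np.mean(votes[seen].argmax(axis=1) != y[seen])

def permutation_importance(trees, oob_indices, X, y, seed=0):
    rng = np.random.default_rng(seed)
    baseline = oob_error(trees, oob_indices, X, y)        # OOB rate, all variables intact
    increases = np.empty(X.shape[1])
    for m in range(X.shape[1]):                           # for each of the M variables
        X_perm = X.copy()
        X_perm[:, m] = rng.permutation(X_perm[:, m])      # randomly permute variable m
        # percent increase over the intact OOB rate (assumes baseline > 0)
        increases[m] = 100 * (oob_error(trees, oob_indices, X_perm, y) - baseline) / baseline
    return increases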
Random Forest Mechanism
5.1 Categorical Variables
Problem:
Input variables may be categorical.
Thus, we need an approach that combines categorical and numerical variables.
Approach (sketched after this list):
• Choose a random subset of the categorical values
• Define a substitute variable that is
• 1 when the categorical value is in the subset
• 0 otherwise
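A minimal Python sketch of the substitute-variable idea (the function name and the assumption of at least two distinct categories are mine):

import numpy as np

def substitute_variable(cat_column, seed=0):
    # Pick a random non-empty proper subset of the categories and replace the
    # column with a 0/1 indicator of membership in that subset.
    rng = np.random.default_rng(seed)
    categories = np.unique(cat_column)                    # assumes >= 2 categories
    size = rng.integers(1, len(categories))
    subset = set(rng.choice(categories, size=size, replace=False))
    indicator = np.array([1 if c in subset else 0 for c in cat_column])
    return indicator, subset

A node can then split on this 0/1 indicator exactly as it would on a numerical variable.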

5.1 Categorical Variables
Problem with this approach:
With a small group size, the trees have low correlation but also low strength.
Thus, the group size must be increased to 2-3 times log2(M+1).
For example, with M = 25 inputs, log2(M+1) ≈ 4.7, so a group size of roughly 10-14 would be used.

6. Empirical Results on Strength and Correlation
Look at the effect of strength and correlation on the generalisation error.
• The out-of-bag estimates are used
• Experiment (a sketch of the loop follows this list):
  • Run Forest-RI on the sonar data set with 1-50 random inputs
  • For each iteration:
    10% of the data is split off as a test set
    the number of random inputs is varied from 1 to 50
    record the test set error, strength, and correlation
  • Repeat 80 times and average all results
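A Python sketch of that loop, using scikit-learn's RandomForestClassifier as a stand-in for Forest-RI; the strength and correlation estimates are omitted here, and the tree count is an arbitrary choice:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def input_group_experiment(X, y, max_inputs=50, repeats=80, n_trees=100):
    # Average test-set error for each number of random inputs F = 1..max_inputs.
    errors = np.zeros(max_inputs)
    for rep in range(repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=rep)
        for f in range(1, max_inputs + 1):                # max_inputs must not exceed n_features
            forest = RandomForestClassifier(n_estimators=n_trees, max_features=f,
                                            oob_score=True, random_state=rep)
            forest.fit(X_tr, y_tr)
            # forest.oob_score_ holds the out-of-bag estimate mentioned above
            errors[f - 1] += np.mean(forest.predict(X_te) != y_te)
    return errors / repeats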

6. Empirical Results on Strength and Correlation

Figure 1: strength and correlation vs. the size of the input group.


6. Empirical Results on Strength and Correlation

Figure 2: test set errors and out-of-bag estimates vs. the size of the input group, using a single variable.
Random Forest Mechanism
6. Empirical Results on Strength and Correlation

Figure 3 (a larger data set: the satellite data set): strength and correlation vs. the size of the input group.
8. The effect of noise
Method:
For each data set:
10% of the data is split off at random as a test set
2 runs are made on the remaining training set:
First run: the original training set
Second run: a noisy version of the training set
For the noisy version, the class labels of 5% of the training samples are altered at random (sketched below).
Compare the two error rates.
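A small Python sketch of the noise-injection step (the helper name is an assumption; each flipped label is reassigned to a different class chosen uniformly at random):

import numpy as np

def add_label_noise(y, fraction=0.05, seed=0):
    # Flip the class of a random 5% (by default) of the training labels.
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()
    classes = np.unique(y)
    flip = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    for i in flip:
        y_noisy[i] = rng.choice(classes[classes != y[i]])
    return y_noisy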
8. The effect of noise
Result:
The accuracy of AdaBoost tends to degenerate,
while the random forest maintains its robustness to noise.
3.1 Random features
Bagging is used in tandem with random feature selection.
There are two reasons for using bagging:
1). Bagging seems to enhance accuracy when random features are used.
2). Bagging can be used to give ongoing estimates of the generalization error of the combined ensemble of trees (a sketch of the bookkeeping follows).
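A sketch of the bookkeeping behind the second point: each bootstrap draw leaves out roughly a third of the training rows, and those out-of-bag rows can later be scored to estimate the ensemble's error (the function name is mine):

import numpy as np

def bootstrap_with_oob(n_samples, seed=0):
    rng = np.random.default_rng(seed)
    in_bag = rng.integers(0, n_samples, size=n_samples)    # sampled with replacement
    oob = np.setdiff1d(np.arange(n_samples), in_bag)       # ~36.8% of rows on average
    return in_bag, oob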
Regression
Theory of Convergence
Discussion
Generalisation error is the probability that the margin function is negative.

The Strong Law of Large Numbers shows that as the number of trees increases, the generalisation error converges to a limiting value. This explains why overfitting doesn't happen.
The generalisation error is bounded in terms of the strength of the individual trees and the correlation between them, where strength is defined as the expectation of the margin function.
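In symbols (a sketch of the statement as given in Breiman's 2001 paper, with s the strength and \bar{\rho} the mean correlation between the trees):

PE^{*} = P_{X,Y}\bigl(mg(X, Y) < 0\bigr), \qquad PE^{*} \le \frac{\bar{\rho}\,(1 - s^{2})}{s^{2}}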
Theory of Convergence
The tree predictor takes on numerical values instead of class labels.
The mean-squared generalization error for any numerical predictor h(x) is E_{X,Y} (Y - h(X))^2.
Requirements for accurate regression forests: low correlation between the trees' residuals and low error of the individual trees.
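In symbols (a sketch following the regression result in Breiman's paper, where PE^{*}(tree) is the average generalization error of the individual trees and \bar{\rho} is the correlation between their residuals):

PE^{*}(\text{forest}) \le \bar{\rho}\; PE^{*}(\text{tree})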
In these experiments, the random forest consistently does better than bagging.
Mean-Squared Test Set Error
The test set mean-squared error for bagging, adaptive bagging and the random forest.
(Figure panels: variable 2 and variable 6; variable 2 and variable 8.)
3. Random features
To improve accuracy, the randomness injected has to minimize the correlation while maintaining strength.
The forests studied here consist of using randomly selected inputs or combinations of inputs at each node to grow each tree.
3. Random features
There are some desirable characteristics of the resulting forest compared with AdaBoost:
1.) Its accuracy is as good as Adaboost and sometimes better.
2.) It’s relatively robust to outliers and noise.
3.) It’s faster than bagging or boosting.
4.) It gives useful internal estimates of error, strength, correlation and variable importance.
5.) It’s simple and easily parallelized.
3.1 Random features
The study of error estimates for bagged classifiers in Breiman [1996b] gives empirical evidence that the out-of-bag estimate is as accurate as using a test set of the same size as the training set.
Therefore, using the out-of-bag error estimate removes the need for a set-aside test set.
3.1 Random features
The out-of-bag estimates are based on combining only about one-third as many classifiers as in the ongoing main combination.
Since the error rate decreases as the number of combinations increases, the out-of-bag estimates will tend to overestimate the current error rate.
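A usage sketch: scikit-learn exposes the out-of-bag estimate directly through oob_score_, so no separate test set is needed for a rough error estimate (the synthetic data set here is just for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)
print("OOB error estimate:", 1 - forest.oob_score_)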