Each tree can be expressed in terms of a random vector Θ_k, independent of the past vectors Θ_1, …, Θ_{k−1} but drawn from the same distribution.

The tree trained from Θ_k can be written as h(x, Θ_k), where x is an input vector.

**Random Forests**

**Aims**

**Method & result**

Introduction

Random Forests™ are an application of bagging (bootstrap aggregation) to decision trees.

Random Forests™ is a class of ensemble methods published in 2001 by Leo Breiman and Adele Cutler at UC Berkeley.

Random Forests Using Random Input Selection

Introduction cont.

**Group Number: 3**

Group Members:

Hector Chu (Leader)

Theerayut Sueverachai

Yuchen Zou

Yumeng Du


Grow many trees that differ slightly in their training data and feature splits, then let them vote for the majority class.

By using an ensemble model, the accuracy is usually better than that of any individual model instance.
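The voting scheme above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn is available, and the dataset and tree count are made up for the example.

```python
# Minimal sketch of bagged decision trees voting for the majority class.
# Dataset, tree count, and parameters are illustrative, not from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

trees = []
for _ in range(25):
    # Each tree sees a bootstrap sample of the training data
    # and considers a random subset of features at each split.
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[idx], y[idx]))

# Majority vote across all trees for every input.
votes = np.stack([t.predict(X) for t in trees])
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
acc = (majority == y).mean()
print("ensemble training accuracy:", acc)
```

Because each tree errs on different cases, the majority vote is usually more accurate than any single tree.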

It remains one of the most popular ensemble methods within the field of classification.

Easy to implement and fast to build.

No need to fine-tune many parameters or engineer features.

Still gives very good accuracy, comparable to best-in-class methods.

Overfitting is avoided.

Objection: the built model is a black box and hard to interpret. The paper suggests a method for estimating variable importance to mitigate this.

1996 – Freund and Schapire – AdaBoost (Adaptive Boosting). Error rates are comparable to Random Forests™, but Random Forests™ are more robust with respect to noise. Breiman conjectures that in the final stages of training AdaBoost is emulating a random forest.

Comparison to other methods

h_k(x) is the k-th classifier in an ensemble of classifiers h_1(x), …, h_K(x).

The margin function is the extent to which the average number of votes for the right class exceeds the average vote for any other class.
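In the paper's notation, for an ensemble h_1, …, h_K and an input–output pair (X, Y), this margin function can be written as:

```latex
mg(\mathbf{X}, Y) \;=\; \mathrm{av}_k\, I\big(h_k(\mathbf{X}) = Y\big)
\;-\; \max_{j \neq Y} \mathrm{av}_k\, I\big(h_k(\mathbf{X}) = j\big)
```

where I(·) is the indicator function and av_k denotes the average over the K classifiers. A larger margin means more confidence in the correct classification.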

1996 – Breiman devises bagging as a method of generating ensemble classifiers.

1997 – Amit and Geman – Shape quantization and recognition with randomized trees

1998 – Random split selection by Dietterich.

1998 – Ho – The random subspace method for constructing decision forests.

1999 – Breiman – Using adaptive bagging to debias regressions.

Earlier seminal work

Ensemble Classification Model

Consideration:

Building random forests is faster than building AdaBoost:

e.g. growing 100 trees in a random forest is quicker than growing 50 trees with AdaBoost.

4. Random Forests Using Random Input Selection

Results:

• Similar error rates for AdaBoost and Random Forests.

• A single random input variable does better in some cases.

Experiment Steps

Given M input variables:

Randomly permute the values of the mth variable in the out-of-bag data and run that data down the trees.

Take the plurality of out-of-bag class votes for each case x_n with the mth variable permuted.

Compare with the true class label of x_n to give a misclassification rate.

The output is the percent increase in misclassification rate compared with the out-of-bag rate with all variables intact.
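The steps above can be sketched as follows. This is a simplified illustration: it scores the permuted data on the whole training set via `forest.score` rather than strictly per-tree on out-of-bag cases as in the paper, and the dataset and names are our own.

```python
# Simplified sketch of permutation-based variable importance.
# Simplification: permuted data is scored on the full training set,
# not strictly per-tree on out-of-bag cases as in the original method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=8, n_informative=3,
                           random_state=1)
forest = RandomForestClassifier(n_estimators=50, oob_score=True,
                                random_state=1)
forest.fit(X, y)
baseline_error = 1.0 - forest.oob_score_  # error with all variables intact

rng = np.random.default_rng(1)
importance = []
for m in range(X.shape[1]):
    X_perm = X.copy()
    # Permuting column m destroys its information content.
    X_perm[:, m] = rng.permutation(X_perm[:, m])
    permuted_error = 1.0 - forest.score(X_perm, y)
    # Increase in misclassification relative to the baseline rate.
    importance.append(permuted_error - baseline_error)

print(importance)
```

Variables whose permutation raises the error most are the most important.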

Random Forest Mechanism

5.1 Categorical Variables

Problem:

Input variables may be categorical.

Thus, we need an approach that combines categorical and numerical variables.

Approach:

• Choose a random subset of the categories.

• Define a substitute variable:

• 1 when the categorical variable's value is in the subset;

• 0 otherwise.
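The substitute-variable trick can be illustrated in a few lines. The category names and sizes here are hypothetical, chosen only to show the encoding:

```python
# Hypothetical illustration of the substitute variable for a categorical
# input: pick a random subset of categories, encode membership as 0/1.
import numpy as np

rng = np.random.default_rng(0)
categories = np.array(["red", "green", "blue", "yellow"])
x_cat = rng.choice(categories, size=10)          # a categorical input column

subset = rng.choice(categories, size=2, replace=False)
# 1 when the value is in the random subset, 0 otherwise.
substitute = np.isin(x_cat, subset).astype(int)
print(list(subset), substitute)
```

The resulting 0/1 column can then be treated like any numerical variable when searching for a split.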

5.1 Categorical Variables

Problem for this approach:

For a small group size, both correlation and strength are low.

Thus, the group size must be increased to 2–3 times log2(M + 1).

6. Empirical Results on Strength and Correlation

Look at the effect of strength and correlation on the generalisation error.

• The out-of-bag estimates are used.

• Experiment: run Forest-RI on the sonar data set with 1–50 random inputs.

• For each iteration:

10% of the data is split off as a test set;

the number of random inputs is varied from 1 to 50;

test-set error, strength, and correlation are recorded.

• Repeat 80 times and average all results.
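The experiment loop can be sketched as below. This is a rough stand-in, not the paper's code: synthetic data replaces the sonar set, only a few input-group sizes are tried, repetitions are reduced, and strength/correlation bookkeeping is omitted.

```python
# Rough sketch of the experiment: vary the number of random inputs F,
# hold out 10% as a test set, record test-set and out-of-bag error.
# Synthetic data stands in for the sonar set; settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=208, n_features=60, random_state=0)

results = {}
for F in (1, 5, 10, 25, 50):            # size of the random input group
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1,
                                              random_state=F)
    forest = RandomForestClassifier(n_estimators=100, max_features=F,
                                    oob_score=True, random_state=0)
    forest.fit(X_tr, y_tr)
    results[F] = (1 - forest.score(X_te, y_te),   # test-set error
                  1 - forest.oob_score_)          # out-of-bag estimate
print(results)
```

In the paper this loop is repeated 80 times per setting and the results averaged.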

6. Empirical Results on Strength and Correlation

Figure 1:

Plot the values of strength and correlation vs the size of input group

6. Empirical Results on Strength and Correlation

Figure 2:

Plot the test-set errors and the out-of-bag estimates vs. the size of the input group, using a single variable.

Random Forest Mechanism

6. Empirical Results on Strength and Correlation

Figure 3: Try on larger data set (satellite data set)

Plot the values of strength and correlation vs the size of input group

8. The effect of noise

Method:

For each data set:

10% of the data is split off at random as a test set.

Two runs are made on the remaining training set:

first run: the original training set;

second run: a noisy version of the training set, in which 5% of the class labels are altered at random.

Compare the two error rates.

8. The effect of noise

Result:

The accuracy of AdaBoost tends to degrade,

while the random forest remains robust to noise.

3.1 Random features

Bagging is used in tandem with random feature selection.

There are two reasons for using bagging:

1) Bagging seems to enhance accuracy when random features are used.

2) Bagging can be used to give ongoing estimates of the generalization error of the combined ensemble of trees.

Regression

Theory of Convergence

**Discussion**

Generalisation error is the probability that the margin function is negative.

The Strong Law of Large Numbers shows that as the number of trees increases, the generalization error converges to a limiting value. This explains why overfitting doesn't happen.

Generalisation error is bounded in terms of the strength of the individual trees and the correlation between them, where strength is defined as the expectation of the margin function.
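In the paper's notation, with strength s defined as the expectation of the margin function and ρ̄ the mean correlation between trees, the bound takes the form:

```latex
PE^{*} \;\le\; \frac{\bar{\rho}\,\left(1 - s^{2}\right)}{s^{2}}
```

So the generalisation error falls as the trees get stronger (larger s) and as they become less correlated (smaller ρ̄), which is why the randomness injected should minimise correlation while maintaining strength.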

Theory of Convergence

The tree predictor takes on numerical values instead of class labels.

The mean-squared generalization error for any numerical predictor h(x) is E_{X,Y}(Y − h(X))².

Requirements for accurate regression forests: low correlation and low error.

In these experiments, the random forest is consistently better than bagging.

Mean-Squared Test Set Error

The test set mean-squared error for bagging, adaptive bagging and the random forest.

(Scatter plots: variable 2 against variable 6, and variable 2 against variable 8.)

3. Random features

To improve accuracy, the randomness injected has to minimize the correlation while maintaining strength.

The forests studied here consist of using randomly selected inputs or combinations of inputs at each node to grow each tree.

3. Random features

The resulting forest has several desirable characteristics compared with AdaBoost:

1) Its accuracy is as good as AdaBoost's and sometimes better.

2) It is relatively robust to outliers and noise.

3) It is faster than bagging or boosting.

4) It gives useful internal estimates of error, strength, correlation and variable importance.

5) It is simple and easily parallelized.

3.1 Random features

The study of error estimates for bagged classifiers in Breiman [1996b], gives empirical evidence to show that the out-of-bag estimate is as accurate as using a test set of the same size as the training set.

Therefore, using the out-of-bag error estimate removes the need for a set-aside test set.
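A minimal sketch of using the out-of-bag estimate instead of a held-out test set, assuming scikit-learn's `RandomForestClassifier` (the dataset is synthetic and illustrative):

```python
# Minimal sketch: the out-of-bag estimate replaces a set-aside test set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                random_state=0)
# Bootstrap sampling leaves roughly one-third of the rows out of each
# tree's training sample; those rows form that tree's "test set".
forest.fit(X, y)
oob_acc = forest.oob_score_
print("out-of-bag accuracy estimate:", oob_acc)
```

Each case is predicted only by the trees that did not see it during training, so the estimate is computed on the full training data without any extra holdout.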

3.1 Random features

The out-of-bag estimates are based on combining only about one-third as many classifiers as in the ongoing main combination.

Since the error rate decreases as the number of combinations increases, the out-of-bag estimates will tend to overestimate the current error rate.