
Random Forest

2015-09-17 22:38





4. After training, it can report which features are more important.

5. While the random forest is being built, it uses an unbiased estimate of the generalization error.





Features of Random Forests

1. It is unexcelled in accuracy among current algorithms.

2. It runs efficiently on large databases.

3. It can handle thousands of input variables without variable deletion.

4. It gives estimates of which variables are important in the classification.

5. It generates an internal unbiased estimate of the generalization error as the forest building progresses.

6. It has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.

7. It has methods for balancing error in data sets with unbalanced class populations.

8. Generated forests can be saved for future use on other data.

9. Prototypes are computed that give information about the relation between the variables and the classification.

10. It computes proximities between pairs of cases that can be used in clustering, locating outliers, or (by scaling) giving interesting views of the data.

11. The capabilities of the above can be extended to unlabeled data, leading to unsupervised clustering, data views and outlier detection.

12. It offers an experimental method for detecting variable interactions.
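The internal error estimate in the list above is usually computed from out-of-bag cases: the rows that a tree's bootstrap sample happens to leave out, which can then serve as a held-out set for that tree. The sketch below only illustrates how many rows end up out-of-bag for one bootstrap sample. The helper name `bootstrap_with_oob` and the sample size are assumptions made for this illustration, not part of any library.

```python
import random

def bootstrap_with_oob(n_samples, rng):
    """Draw one bootstrap sample of row indices and list the out-of-bag rows."""
    # Sample n_samples indices with replacement (the bootstrap sample).
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    # Rows never drawn are "out-of-bag" for this tree.
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob

rng = random.Random(0)   # fixed seed, chosen arbitrarily for repeatability
n = 10_000               # hypothetical number of training rows
in_bag, oob = bootstrap_with_oob(n, rng)
oob_fraction = len(oob) / n

# For large n the fraction approaches 1/e, i.e. roughly a third of the rows.
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

Each tree can be scored on its own out-of-bag rows, so an error estimate becomes available while the forest is still being built, without setting aside a separate test set.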

In random forests, each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set. In addition, when a node is split during the construction of a tree, the chosen split is no longer the best split among all features; it is the best split among a random subset of the features. As a result of this randomness, the bias of the forest usually increases slightly (relative to the bias of a single non-random tree), but averaging also decreases its variance, usually by more than enough to compensate for the increase in bias, yielding a better model overall.
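The two sources of randomness described above can be sketched in plain Python. This is a simplified illustration rather than a full implementation: every tree is a depth-1 stump, splits are scored with Gini impurity, and the toy data, the function names (`fit_stump`, `best_split`, `predict`) and the parameter `max_features` are all assumptions made for the example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def best_split(X, y, feature_ids):
    """Lowest weighted-Gini (score, feature, threshold) among feature_ids only."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def fit_stump(X, y, max_features, rng):
    """Fit a depth-1 tree on a bootstrap sample using a random feature subset."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    feature_ids = rng.sample(range(len(X[0])), max_features)  # random subset
    split = best_split(Xb, yb, feature_ids)
    if split is None:  # no usable split: predict one constant label
        m = majority(yb)
        return (0, float("inf"), m, m)
    _, f, t = split
    left = majority([lab for row, lab in zip(Xb, yb) if row[f] <= t])
    right = majority([lab for row, lab in zip(Xb, yb) if row[f] > t])
    return (f, t, left, right)

def predict(forest, row):
    """Aggregate the trees by majority vote."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)

# Toy data (assumed): 4 features, only the first two determine the label.
rng = random.Random(42)
X = [[rng.random() for _ in range(4)] for _ in range(200)]
y = [int(row[0] + row[1] > 1.0) for row in X]

forest = [fit_stump(X, y, max_features=2, rng=rng) for _ in range(25)]
accuracy = sum(predict(forest, r) == lab for r, lab in zip(X, y)) / len(X)
print(f"training accuracy of the toy forest: {accuracy:.2f}")
```

Each stump sees a different bootstrap sample and a different pair of candidate features, so individual stumps are weaker than a stump fitted on all data and all features, while the vote over 25 of them smooths out their individual errors.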