Random Forest
2015-09-17 22:38
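Advantages of RF:
Random forest is an algorithm that has become popular recently, and it has many advantages:
1. It performs well across a wide range of data sets.
2. On many current data sets it has a clear advantage over other algorithms.
3. It can handle very high-dimensional data (many features) without requiring feature selection.
4. After training, it can report which features are most important.
5. While the forest is being built, it produces an unbiased estimate of the generalization error.
6. Training is fast.
7. During training, it can detect interactions between features.
8. It is easy to parallelize.
9. It is relatively simple to implement.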
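The generalization-error estimate mentioned in the list above is usually obtained from out-of-bag rows: each tree is trained on a bootstrap sample, and the rows that sample happens to miss can act as a held-out set for that tree. The sketch below only illustrates how many rows are left out on average. The helper names `bootstrap_indices` and `out_of_bag` and the sizes used are assumptions for illustration, not part of any library.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n, in_bag):
    """Return the row indices that never appeared in the bootstrap sample."""
    seen = set(in_bag)
    return [i for i in range(n) if i not in seen]


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 1000                 # hypothetical number of training rows
fractions = []
for _ in range(200):     # 200 hypothetical trees
    bag = bootstrap_indices(n, rng)
    fractions.append(len(out_of_bag(n, bag)) / n)

mean_oob_fraction = sum(fractions) / len(fractions)
print(round(mean_oob_fraction, 3))
```

For large n the left-out fraction approaches 1/e, roughly 0.37, so each tree has about a third of the data available for its own error estimate.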
Features of Random Forests
1. It is unexcelled in accuracy among current algorithms.
2. It runs efficiently on large databases.
3. It can handle thousands of input variables without variable deletion.
4. It gives estimates of what variables are important in the classification.
5. It generates an internal unbiased estimate of the generalization error as the forest building progresses.
6. It has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.
7. It has methods for balancing error in class population unbalanced data sets.
8. Generated forests can be saved for future use on other data.
9. Prototypes are computed that give information about the relation between the variables and the classification.
10. It computes proximities between pairs of cases that can be used in clustering, locating outliers, or (by scaling) to give interesting views of the data.
11. The capabilities of the above can be extended to unlabeled data, leading to unsupervised clustering, data views and outlier detection.
12. It offers an experimental method for detecting variable interactions.
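The proximities mentioned in the list above are commonly defined as the fraction of trees in which two cases end up in the same leaf. A minimal sketch of that counting step follows. It assumes the leaf assignments are already available as a nested list `leaf_ids` (a hypothetical input; a real forest would produce it by running every case down every tree), and `proximity_matrix` is an illustrative name.

```python
def proximity_matrix(leaf_ids):
    """Compute pairwise proximities from per-tree leaf assignments.

    leaf_ids[t][i] is the leaf that tree t assigns to case i.
    Returns prox where prox[i][j] is the fraction of trees in which
    cases i and j fall into the same leaf.
    """
    n_trees = len(leaf_ids)
    n_cases = len(leaf_ids[0])
    counts = [[0] * n_cases for _ in range(n_cases)]
    for tree_leaves in leaf_ids:
        for i in range(n_cases):
            for j in range(n_cases):
                if tree_leaves[i] == tree_leaves[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


# Hypothetical leaf assignments for 4 cases in 3 trees.
leaf_ids = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 2, 3],
]
prox = proximity_matrix(leaf_ids)
for row in prox:
    print([round(p, 2) for p in row])
```

One minus the proximity can then be treated as a distance for clustering or outlier detection, which is how the unlabeled-data uses in the list are typically approached.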
In random forests, each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set. In addition, when splitting a node during the construction of the tree, the split that is chosen is no longer the best split among all features. Instead, the split that is picked is the best split among a random subset of the features. As a result of this randomness, the bias of the forest usually increases slightly (with respect to the bias of a single non-random tree) but, due to averaging, its variance decreases, usually more than compensating for the increase in bias and hence yielding an overall better model.
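The two sources of randomness described above (bootstrap rows, random feature subset per split) can be sketched in plain Python. To keep it short, each "tree" here is a one-split stump, so the feature subset is drawn once per tree, which coincides with once per node. All names (`fit_forest`, `best_stump`, `max_features`) and the toy data are assumptions for illustration, not a reference implementation.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(X, y, features):
    """Best single-threshold split, searching only the given features."""
    best = None  # (impurity, feature, threshold, left_label, right_label)
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring values
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Train stumps on bootstrap samples with random feature subsets."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(d), max_features)   # random feature subset
        stump = best_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        if stump is not None:                        # skip unsplittable samples
            forest.append(stump[1:])
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right
                    for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Toy data: class 0 has small values, class 1 has large values.
X = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 8], [9, 8], [8, 9], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, [1.5, 1.5]), predict(forest, [8.5, 8.5]))
```

Restricting each stump to `max_features` columns makes individual stumps weaker but less alike, and the majority vote in `predict` is the averaging step that the paragraph credits with reducing variance.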