sklearn Discrete AdaBoost vs Real AdaBoost
2016-08-08 21:43
Ensemble learners generally expose a learning_rate parameter (the learning rate).
It is a positive value, typically in (0, 1], that shrinks the contribution of each successive weak learner; some articles describe it as controlling how far each boosting iteration moves.
Too large a value can lead to overfitting, meaning the fitted function oscillates and becomes unstable, which is intuitively understandable.
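As a minimal sketch of this trade-off (dataset and values invented here for illustration, not from the experiment below), the effect of learning_rate can be probed by fitting the same ensemble with different rates:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# learning_rate shrinks each weak learner's contribution; smaller
# values usually need more estimators but can generalize better.
for lr in (0.1, 0.5, 1.0):
    clf = AdaBoostClassifier(n_estimators=50, learning_rate=lr,
                             random_state=0).fit(X_tr, y_tr)
    print(lr, clf.score(X_te, y_te))
```

There is a trade-off between learning_rate and n_estimators: halving the rate often requires roughly doubling the number of estimators to reach the same training error.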
Calling staged_predict on a fitted AdaBoost ensemble yields the predictions after each boosting iteration.
sklearn.metrics.zero_one_loss directly measures the distance between the predictions and the true labels, i.e. the fraction of misclassified samples.
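A minimal sketch of how these two pieces fit together (toy data invented here for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import zero_one_loss

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
clf = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X, y)

# staged_predict yields one prediction array per fitted boosting stage
# (at most n_estimators), so we get the training error as the
# ensemble grows one weak learner at a time.
errs = [zero_one_loss(y, y_pred) for y_pred in clf.staged_predict(X)]
print(errs)
```

Each entry of errs is the fraction of misclassified training samples at that stage; plotting such curves against the iteration index is exactly what the script below does on the test and training sets.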
Below, Discrete AdaBoost and Real AdaBoost are compared on the training set and the test set.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import zero_one_loss
from sklearn.ensemble import AdaBoostClassifier

n_estimators = 400
learning_rate = 1

X, y = datasets.make_hastie_10_2(n_samples=12000, random_state=1)
X_test, y_test = X[2000:], y[2000:]
X_train, y_train = X[:2000], y[:2000]

# Baselines: a single decision stump and a depth-9 decision tree
dt_stump = DecisionTreeClassifier(max_depth=1, min_samples_leaf=1)
dt_stump.fit(X_train, y_train)
dt_stump_err = 1.0 - dt_stump.score(X_test, y_test)

dt = DecisionTreeClassifier(max_depth=9, min_samples_leaf=1)
dt.fit(X_train, y_train)
dt_err = 1.0 - dt.score(X_test, y_test)

# Discrete AdaBoost (SAMME) vs Real AdaBoost (SAMME.R), both on stumps
ada_discrete = AdaBoostClassifier(base_estimator=dt_stump,
                                  learning_rate=learning_rate,
                                  n_estimators=n_estimators,
                                  algorithm="SAMME")
ada_discrete.fit(X_train, y_train)

ada_real = AdaBoostClassifier(base_estimator=dt_stump,
                              learning_rate=learning_rate,
                              n_estimators=n_estimators,
                              algorithm="SAMME.R")
ada_real.fit(X_train, y_train)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot([1, n_estimators], [dt_stump_err] * 2, "k-", label="Decision Stump Error")
ax.plot([1, n_estimators], [dt_err] * 2, "k--", label="Decision Tree Error")

# Per-iteration zero-one error for each ensemble, on test and train
ada_discrete_err = np.zeros((n_estimators,))
for i, y_pred in enumerate(ada_discrete.staged_predict(X_test)):
    ada_discrete_err[i] = zero_one_loss(y_pred, y_test)

ada_discrete_err_train = np.zeros((n_estimators,))
for i, y_pred in enumerate(ada_discrete.staged_predict(X_train)):
    ada_discrete_err_train[i] = zero_one_loss(y_pred, y_train)

ada_real_err = np.zeros((n_estimators,))
for i, y_pred in enumerate(ada_real.staged_predict(X_test)):
    ada_real_err[i] = zero_one_loss(y_pred, y_test)

ada_real_err_train = np.zeros((n_estimators,))
for i, y_pred in enumerate(ada_real.staged_predict(X_train)):
    ada_real_err_train[i] = zero_one_loss(y_pred, y_train)

ax.plot(np.arange(n_estimators) + 1, ada_discrete_err,
        label="Discrete AdaBoost Test Error", color="red")
ax.plot(np.arange(n_estimators) + 1, ada_discrete_err_train,
        label="Discrete AdaBoost Train Error", color="blue")
ax.plot(np.arange(n_estimators) + 1, ada_real_err,
        label="Real AdaBoost Test Error", color="orange")
ax.plot(np.arange(n_estimators) + 1, ada_real_err_train,
        label="Real AdaBoost Train Error", color="green")

ax.set_ylim((0.0, 0.5))
ax.set_xlabel("n_estimators")
ax.set_ylabel("err rate")
leg = ax.legend(loc="upper right", fancybox=True)
leg.get_frame().set_alpha(0.7)
plt.show()