Using Common Classification Algorithms in scikit-learn
2017-06-10 20:05
scikit-learn's classification algorithms include logistic regression, naive Bayes, KNN (k-nearest neighbors), support vector machines, decision trees, and random forests, among others. These estimators share essentially the same calling convention: train with the fit method and predict with the predict method. A trained model can be saved with joblib.dump and loaded back later with joblib.load.
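As a quick illustration of this shared interface, below is a minimal sketch (not part of the test program that follows) that trains an SVM on a few toy samples, saves it with joblib.dump, and reloads it with joblib.load. The toy data and the file name 'model.pkl' are placeholders chosen only for illustration; the sklearn.externals import matches the scikit-learn versions of this article's era, while scikit-learn 0.23 and later expect "import joblib" directly.

# Minimal sketch of the shared fit / predict / save / load workflow.
# The toy data and file name 'model.pkl' are placeholders, not from the article.
import numpy as np
from sklearn import svm
from sklearn.externals import joblib  # scikit-learn >= 0.23: use "import joblib" instead

X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])  # toy features
y = np.array([0, 1, 0, 1])                                      # toy labels

clf = svm.SVC()
clf.fit(X, y)                       # train
print(clf.predict([[0.9, 0.1]]))    # predict on a new sample

joblib.dump(clf, 'model.pkl')       # save the trained model to disk
clf2 = joblib.load('model.pkl')     # load it back later
print(clf2.predict([[0.9, 0.1]]))   # the reloaded model predicts the same way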
Test program. The test data is the wheat seeds dataset (seeds).
# -*- coding: utf-8 -*-
import numpy as np
# Note: sklearn.cross_validation was removed in scikit-learn 0.20;
# newer versions provide KFold in sklearn.model_selection instead.
from sklearn.cross_validation import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

feature_names = [
    'area',
    'perimeter',
    'compactness',
    'length of kernel',
    'width of kernel',
    'asymmetry coefficient',
    'length of kernel groove',
]

COLOUR_FIGURE = False


def load_csv_data(filename):
    """Load the tab-separated seeds dataset: features in all columns but the last, label in the last."""
    data = []
    labels = []
    datafile = open(filename)
    for line in datafile:
        fields = line.strip().split('\t')
        data.append([float(field) for field in fields[:-1]])
        labels.append(fields[-1])
    data = np.array(data)
    labels = np.array(labels)
    return data, labels


def accuracy(test_labels, pred_labels):
    """Fraction of predictions that match the true labels."""
    correct = np.sum(test_labels == pred_labels)
    n = len(test_labels)
    return float(correct) / n


#------------------------------------------------------------------------------
# Logistic regression
#------------------------------------------------------------------------------
def testLR(features, labels):
    kf = KFold(len(features), n_folds=3, shuffle=True)
    clf = LogisticRegression()
    result_set = [(clf.fit(features[train], labels[train]).predict(features[test]), test)
                  for train, test in kf]
    score = [accuracy(labels[result[1]], result[0]) for result in result_set]
    print(score)


#------------------------------------------------------------------------------
# Naive Bayes
#------------------------------------------------------------------------------
def testNaiveBayes(features, labels):
    kf = KFold(len(features), n_folds=3, shuffle=True)
    clf = GaussianNB()
    result_set = [(clf.fit(features[train], labels[train]).predict(features[test]), test)
                  for train, test in kf]
    score = [accuracy(labels[result[1]], result[0]) for result in result_set]
    print(score)


#------------------------------------------------------------------------------
# K-nearest neighbors
#------------------------------------------------------------------------------
def testKNN(features, labels):
    kf = KFold(len(features), n_folds=3, shuffle=True)
    clf = KNeighborsClassifier(n_neighbors=5)
    result_set = [(clf.fit(features[train], labels[train]).predict(features[test]), test)
                  for train, test in kf]
    score = [accuracy(labels[result[1]], result[0]) for result in result_set]
    print(score)


#------------------------------------------------------------------------------
# Support vector machine
#------------------------------------------------------------------------------
def testSVM(features, labels):
    kf = KFold(len(features), n_folds=3, shuffle=True)
    clf = svm.SVC()
    result_set = [(clf.fit(features[train], labels[train]).predict(features[test]), test)
                  for train, test in kf]
    score = [accuracy(labels[result[1]], result[0]) for result in result_set]
    print(score)


#------------------------------------------------------------------------------
# Decision tree
#------------------------------------------------------------------------------
def testDecisionTree(features, labels):
    kf = KFold(len(features), n_folds=3, shuffle=True)
    clf = DecisionTreeClassifier()
    result_set = [(clf.fit(features[train], labels[train]).predict(features[test]), test)
                  for train, test in kf]
    score = [accuracy(labels[result[1]], result[0]) for result in result_set]
    print(score)


#------------------------------------------------------------------------------
# Random forest
#------------------------------------------------------------------------------
def testRandomForest(features, labels):
    kf = KFold(len(features), n_folds=3, shuffle=True)
    clf = RandomForestClassifier()
    result_set = [(clf.fit(features[train], labels[train]).predict(features[test]), test)
                  for train, test in kf]
    score = [accuracy(labels[result[1]], result[0]) for result in result_set]
    print(score)


if __name__ == '__main__':
    features, labels = load_csv_data('data/seeds_dataset.txt')
    print(features)
    print('LogisticRegression: \r')
    testLR(features, labels)
    print('GaussianNB: \r')
    testNaiveBayes(features, labels)
    print('KNN: \r')
    testKNN(features, labels)
    print('SVM: \r')
    testSVM(features, labels)
    print('Decision Tree: \r')
    testDecisionTree(features, labels)
    print('Random Forest: \r')
    testRandomForest(features, labels)
Debug and run the script in Spyder; the output is shown below.
runfile('E:/MyProject/_python/ScikitLearn/demo_clf.py', wdir='E:/MyProject/_python/ScikitLearn')
[[ 15.26  14.84   0.871  ...,  3.312  2.221  5.22 ]
 [ 14.88  14.57   0.8811 ...,  3.333  1.018  4.956]
 [ 14.29  14.09   0.905  ...,  3.337  2.699  4.825]
 ...,
 [ 13.2   13.66   0.8883 ...,  3.232  8.315  5.056]
 [ 11.84  13.21   0.8521 ...,  2.836  3.598  5.044]
 [ 12.3   13.34   0.8684 ...,  2.974  5.637  5.063]]
LogisticRegression:
[0.9142857142857143, 0.9714285714285714, 0.8857142857142857]
GaussianNB:
[0.9428571428571428, 0.8714285714285714, 0.9]
KNN:
[0.9285714285714286, 0.8571428571428571, 0.8857142857142857]
SVM:
[0.9, 0.9285714285714286, 0.8571428571428571]
Decision Tree:
[0.8714285714285714, 0.9714285714285714, 0.9142857142857143]
Random Forest:
[0.8857142857142857, 0.9142857142857143, 0.8428571428571429]