Multi-class Classification with the Linear Support Vector Classifier LinearSVC (Review 2)
2018-01-13 16:00
This post is a set of personal study notes on using a linear support vector classifier (SVC, Support Vector Classifier) to perform multi-class classification on the handwritten digits dataset built into sklearn.
A Support Vector Classifier searches, based on the distribution of the training samples, for the best classifier among all possible linear classifiers. The samples that determine the position of the decision boundary are not the whole training set, but only the data points from different classes that lie closest to each other across the margin between the two class regions, the so-called "support vectors". This lets the model pick out, even from massive or high-dimensional data, the few training samples that matter most for the prediction task. (By contrast, a LogisticRegression model takes every training sample into account when fitting its parameters.)
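To make the idea of support vectors concrete, here is a minimal sketch of my own (not from the original notes) that fits sklearn's SVC with a linear kernel on a toy two-class problem and inspects which training points end up as support vectors; LinearSVC itself does not expose them, so SVC(kernel='linear') is used purely for illustration.

import numpy as np
from sklearn.svm import SVC

# Toy two-class data: two slightly overlapping point clouds.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) - 2, rng.randn(20, 2) + 2])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)

# Only these points determine the decision boundary; the remaining samples
# could be removed without changing the fitted hyperplane.
print('number of support vectors per class:', clf.n_support_)
print('support vectors:\n', clf.support_vectors_)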
Precision, recall, and the F1 score were originally defined for binary classification. For a multi-class task, the usual strategy is to evaluate precision, recall, and F1 for one class at a time, treating all other classes as negative samples; for the handwritten digit problem this turns evaluation into 10 separate binary classification tasks, as the short sketch below illustrates.
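As a small illustration of my own (not part of the original notes), the per-class figures reported later by classification_report can be reproduced by binarizing the labels one class at a time and computing ordinary binary precision, recall, and F1; the labels below are hypothetical toy data.

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical true labels and predictions for a 3-class toy problem.
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 2, 2, 2, 1, 0, 1, 1])

for cls in np.unique(y_true):
    # Treat the current class as positive and every other class as negative.
    t = (y_true == cls).astype(int)
    p = (y_pred == cls).astype(int)
    print(cls,
          'precision=%.2f' % precision_score(t, p),
          'recall=%.2f' % recall_score(t, p),
          'f1=%.2f' % f1_score(t, p))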
from sklearn.datasets import load_digits

digits = load_digits()
# The handwritten digit image data shipped with sklearn.datasets contains 1797 samples;
# each image is represented by an 8*8 = 64 pixel matrix.
digits.data.shape
# Output: (1797, 64)
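As a quick sanity check (my own addition, not in the original notes), each row of digits.data can be reshaped back to its 8x8 image; digits.images already holds that form, so one sample can be displayed with matplotlib:

import matplotlib.pyplot as plt

# Show the first sample as an 8x8 grayscale image, with its label as the title.
plt.imshow(digits.images[0], cmap='gray_r')
plt.title('label: %d' % digits.target[0])
plt.show()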
digits
# Output (truncated): a Bunch object with the keys 'DESCR', 'data', 'images', 'target' and 'target_names'.
# 'DESCR' describes the UCI "Optical Recognition of Handwritten Digits" data set: 8x8 images whose
#   pixels are integers in the range 0..16, 10 classes (one per digit), obtained by dividing the
#   original 32x32 NIST bitmaps into non-overlapping 4x4 blocks and counting the "on" pixels in each.
# 'data' is the flattened (1797, 64) feature matrix, 'images' holds the corresponding 8x8 pixel
#   arrays, 'target' the digit labels, and 'target_names' is array([0, 1, ..., 9]).
from distutils.version import LooseVersion as Version
from sklearn import __version__ as sklearn_version

# train_test_split moved from sklearn.cross_validation to sklearn.model_selection in 0.18.
if Version(sklearn_version) < '0.18':
    from sklearn.cross_validation import train_test_split
else:
    from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=33)

y_train.shape
# Output: (1347,)
y_test.shape
# Output: (450,)
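The split above is random rather than stratified, so as a hedged addition of my own it can be worth checking that all ten digits are reasonably represented on both sides of the split:

import numpy as np

# Count how many samples of each of the 10 digit classes ended up in each subset.
print('train class counts:', np.bincount(y_train))
print('test class counts: ', np.bincount(y_test))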
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC   # support vector classifier based on a linear hypothesis

# Standardize the features: fit the scaler on the training set only,
# then apply the same transformation to the test set.
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

lsvc = LinearSVC()
lsvc.fit(X_train, y_train)
y_predict = lsvc.predict(X_test)

print('The Accuracy of Linear SVC is', lsvc.score(X_test, y_test))
# Output: The Accuracy of Linear SVC is 0.953333333333
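By default LinearSVC handles the 10-class problem in a one-vs-rest fashion, so (as a check of my own, not from the original post) the fitted model should contain one weight vector and one intercept per digit class:

# One row of coefficients (64 weights, one per pixel) and one intercept per class.
print(lsvc.coef_.shape)       # expected: (10, 64)
print(lsvc.intercept_.shape)  # expected: (10,)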
from sklearn.metrics import classification_report

print(classification_report(y_test, y_predict, target_names=digits.target_names.astype(str)))
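Beyond the per-class report, a confusion matrix (my own addition, using the same sklearn.metrics module) makes it easy to see which digits get confused with which:

from sklearn.metrics import confusion_matrix

# Rows are true digits, columns are predicted digits; off-diagonal entries are mistakes.
print(confusion_matrix(y_test, y_predict))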