Code exercises from the book 《python机器学习及实践》 (Python Machine Learning and Practice)
2017-02-18 21:57
Predicting benign/malignant breast cancer tumors with linear models

import numpy as np
import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import classification_report

# Column names, kept short for convenience: the first column is the sample ID, the last is the class label
column_names = ['sample code number', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'class']

# Download the data with pandas
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names=column_names)

# Drop incomplete records (missing values are marked with '?')
data = data.replace(to_replace='?', value=np.nan)
data = data.dropna(how='any')
print(data.shape)

# Hold out 25% of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(data[column_names[1:10]], data[column_names[10]], test_size=0.25, random_state=33)
print(y_train.value_counts())

# Standardize the features: fit on the training set only, then apply the same scaling to the test set
ss = StandardScaler()
x_train = ss.fit_transform(x_train)
x_test = ss.transform(x_test)

lr = LogisticRegression()
sgdc = SGDClassifier()
lr.fit(x_train, y_train)
lr_y_predict = lr.predict(x_test)
sgdc.fit(x_train, y_train)
sgdc_y_predict = sgdc.predict(x_test)

print('Accuracy of LR Classifier:', lr.score(x_test, y_test))
print(classification_report(y_test, lr_y_predict, target_names=['Benign', 'Malignant']))
print('finish')
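
Standardization is normally fit on the training split only, with the learned mean and standard deviation then reused for the test split, so that no test-set statistics leak into preprocessing. The following is a minimal pure-Python sketch of that idea for a single feature column. The helper names `fit_scaler` and `transform` and the sample values are made up for illustration; this is not scikit-learn's implementation, although like `StandardScaler` it uses the population standard deviation.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn the mean and population std of one feature column (training data only)."""
    mu = mean(column)
    sigma = pstdev(column)
    # Guard against constant columns so the division below is always defined.
    return mu, (sigma if sigma > 0 else 1.0)


def transform(column, mu, sigma):
    """Scale a column with previously learned statistics."""
    return [(value - mu) / sigma for value in column]


# Hypothetical values for one feature, split into train and test.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [3.0, 6.0]

mu, sigma = fit_scaler(train)               # statistics come from train only
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)    # test reuses the training statistics

print(train_scaled)
print(test_scaled)
```

Fitting a second scaler on the test split would instead center the test data on its own mean, so the same raw value would map to different scaled values in the two splits.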
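Classifying handwritten digits with an SVM

from sklearn.datasets import load_digits
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

# Load the 8x8 handwritten digit images bundled with scikit-learn
digits = load_digits()
print(digits.data.shape)

# Hold out 25% of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=33)
print(y_train.shape)

# Standardize the features: fit on the training set only, then apply the same scaling to the test set
ss = StandardScaler()
x_train = ss.fit_transform(x_train)
x_test = ss.transform(x_test)

# Train a linear support vector classifier and predict the test labels
lsvc = LinearSVC()
lsvc.fit(x_train, y_train)
y_predict = lsvc.predict(x_test)

print('The Accuracy of Linear SVC is', lsvc.score(x_test, y_test))
print(classification_report(y_test, y_predict, target_names=digits.target_names.astype(str)))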
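
The report printed by `classification_report` lists precision, recall and F1 for each class. As a rough illustration of what those columns mean, here is a small pure-Python sketch that computes them for one class at a time. The function name `per_class_scores` and the label lists are hypothetical, chosen only for this example; scikit-learn's own implementation is more general.

```python
def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1) for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels for a three-class problem.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

for label in sorted(set(y_true)):
    p, r, f = per_class_scores(y_true, y_pred, label)
    print(label, round(p, 2), round(r, 2), round(f, 2))
```

Precision answers "of the samples predicted as this class, how many were right", while recall answers "of the samples that truly belong to this class, how many were found"; F1 is their harmonic mean.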
Predicting news text categories with naive Bayes

from sklearn.datasets import fetch_20newsgroups
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

# Download the full 20 Newsgroups corpus (fetched over the network on first use)
news = fetch_20newsgroups(subset='all')

# Hold out 25% of the documents for testing
x_train, x_test, y_train, y_test = train_test_split(news.data, news.target, test_size=0.25, random_state=33)

# Convert the text to word-count vectors: the vocabulary is learned from the training set only
vec = CountVectorizer()
x_train = vec.fit_transform(x_train)
x_test = vec.transform(x_test)

mnb = MultinomialNB()
mnb.fit(x_train, y_train)
print(x_test.shape)
print(x_train.shape)
y_predict = mnb.predict(x_test)

print('The Accuracy of Naive Bayes Classifier is', mnb.score(x_test, y_test))
print(classification_report(y_test, y_predict, target_names=news.target_names))
print('done')
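
To make the two steps in this listing more concrete (word counting, then a multinomial naive Bayes classifier), here is a minimal pure-Python sketch on a toy corpus. Everything in it is an assumption made for illustration: the helper names `tokenize`, `train_nb` and `predict_nb`, the whitespace tokenizer (much simpler than `CountVectorizer`'s), and the four made-up documents. It uses add-one (Laplace) smoothing and is not how scikit-learn implements `MultinomialNB` internally.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very simplified tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Count tokens per class; alpha is the additive smoothing constant."""
    vocab = set()
    word_counts = defaultdict(Counter)   # class label -> token counts
    class_counts = Counter(labels)       # class label -> number of documents
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        vocab.update(tokens)
        word_counts[label].update(tokens)
    return vocab, word_counts, class_counts, alpha


def predict_nb(model, doc):
    """Return the class with the highest log posterior for one document."""
    vocab, word_counts, class_counts, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)                  # log prior
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for token in tokenize(doc):
            if token in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy corpus with two categories.
docs = ["the team won the game", "great goal in the match",
        "new graphics card released", "the cpu and graphics chip"]
labels = ["sport", "sport", "tech", "tech"]

model = train_nb(docs, labels)
print(predict_nb(model, "graphics card"))
print(predict_nb(model, "the match"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that never appeared in a class from driving that class's probability to zero.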