您的位置:首页 > Web前端

利用贝叶斯分类器对fetch_20newsgroups数据集进行分类

2017-10-26 08:54 651 查看
from sklearn.datasets import fetch_20newsgroups

from sklearn.cross_validation import train_test_split

from  sklearn.feature_extraction.text import CountVectorizer

from sklearn.naive_bayes import MultinomialNB 

from sklearn.metrics import classification_report

news = fetch_20newsgroups(subset='all')#本次使用的数据需要到互联网上下载

#查看数据

print(len(news.data))

print(news.data[0])

#对数据训练集和测试件进行划分

X_train,X_test,y_train,y_test = train_test_split(news.data,news.target,test_size=0.25,random_state=33)

vec = CountVectorizer()

X_train = vec.fit_transform(X_train)

X_test = vec.transform(X_test)

#利用贝叶斯分类器对数据进行分类

mnb = MultinomialNB()

mnb.fit(X_train,y_train)

y_predict = mnb.predict(X_test)

print('The accuracy of Naive Bays Classifier is',mnb.score(X_test,y_test))

print(classification_report(y_test,y_predict,target_names=news.target_names))
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签:  机器学习 python
相关文章推荐