
Machine Learning (1): The Decision Tree Algorithm and Building a Decision Tree in Python

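2016-08-28 19:33
This post uses Python to learn machine learning. The Anaconda distribution is a convenient way to get the scientific Python packages, and the code relies on scikit-learn, a machine learning library.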

The first algorithm in this series is the decision tree.

1. Install and configure Python and Anaconda; see my other blog post: http://blog.csdn.net/qq_32166627/article/details/52301641

2. Install the scikit-learn library.
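One quick way to confirm the library is available before running the code in this post is to ask Python whether the module can be found. The sketch below uses only the standard library; the helper name is_installed is made up for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# scikit-learn is imported under the name "sklearn"
for name in ("sklearn", "numpy"):
    status = "found" if is_installed(name) else "missing"
    print(name + ": " + status)
```

If either line reports "missing", the package needs to be installed before the later code will run.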

3. For the theory behind decision trees, see the many references available online.
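Since the code later in this post builds the tree with criterion="entropy", it may help to see the quantity involved. The following is a minimal sketch of Shannon entropy and information gain in plain Python. The labels and splits are made up for illustration and are not the data used in this post, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(labels, groups):
    """Entropy of `labels` minus the size-weighted entropy of `groups`.

    `groups` is a partition of `labels` produced by splitting on one feature.
    """
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder


# Made-up labels for illustration only
labels = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]

print(entropy(labels))                          # 1.0
print(information_gain(labels, perfect_split))  # 1.0
print(information_gain(labels, useless_split))  # 0.0
```

A split that separates the classes completely recovers all of the entropy, while a split that leaves each group as mixed as before gains nothing. Roughly speaking, an entropy-based tree prefers the feature with the larger gain at each node.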

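4. Prepare the data. The data table is used to predict whether a person will buy a computer. The features are age, income, student, and credit_rating; the label is whether the person buys a computer.

Save the table to a CSV file, data.csv; it can be edited in Excel.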
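The table itself is not reproduced here, so the sketch below uses two made-up rows to show the layout the code in this post appears to expect: a header row, a first column that is skipped, the feature columns, and the label in the last column. The column names RID and buys_computer and all row values are assumptions for illustration.

```python
import csv
import io

# Made-up rows for illustration; the real data.csv is not reproduced here.
# "RID" and "buys_computer" are hypothetical column names.
sample_csv = """RID,age,income,student,credit_rating,buys_computer
1,youth,high,no,fair,no
2,senior,low,yes,fair,yes
"""

reader = csv.reader(io.StringIO(sample_csv))
headers = next(reader)  # first line holds the column names

features = []
labels = []
for row in reader:
    # Columns 1 .. n-2 are features; column 0 is skipped, the last is the label
    features.append(dict(zip(headers[1:-1], row[1:-1])))
    labels.append(row[-1])

print(headers)
print(features)
print(labels)
```

Each row becomes one dictionary of feature name to category, which is the shape DictVectorizer takes as input.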



5. Create a PyDev project in Eclipse with the following code:

# -*- coding: utf-8 -*-
import csv

from sklearn import preprocessing
from sklearn import tree
from sklearn.feature_extraction import DictVectorizer

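# Adjust this path to wherever the CSV file is saved.
# A raw string keeps the backslashes literal; the "rU" mode no longer exists in recent Python 3.
allData = open(r"E:\eclipse_file\Deeplearning\data\decisionTree.csv", "r", newline="")
reader = csv.reader(allData)
headers = next(reader)  # Python 3 syntax; on Python 2.7 use headers = reader.next()
print(headers)  # print the first line of the file (the column names)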

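featureList = []  # one {feature name: value} dict per row
lableList = []    # class label (last column) of each row
for row in reader:
    lableList.append(row[-1])
    rowDic = {}
    # Column 0 is skipped and the last column is the label
    for i in range(1, len(row) - 1):
        rowDic[headers[i]] = row[i]
    featureList.append(rowDic)
allData.close()
print("featureList:" + str(featureList))
print("lableList:" + str(lableList))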

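# One-hot encode the categorical features
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
print("dummyX:" + str(dummyX))
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print("get_feature_names_out():" + str(vec.get_feature_names_out()))
# Convert the class labels to 0/1
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(lableList)
print("dummyY:" + str(dummyY))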

# Use entropy (information gain) as the split criterion
clf = tree.DecisionTreeClassifier(criterion="entropy")
clf = clf.fit(dummyX, dummyY)
print("clf:" + str(clf))
# Export the fitted tree in Graphviz .dot format
with open("DecisionTree_BuyCompute.dot", "w") as f:
    tree.export_graphviz(clf, feature_names=list(vec.get_feature_names_out()), out_file=f)

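# Copy the first sample and change two of its encoded columns to make a new sample
oneRow = dummyX[0].copy()
print("oneRow:" + str(oneRow))
oneRow[0] = 1
oneRow[2] = 0
# predict() expects a 2-D array of shape (n_samples, n_features)
predictY = clf.predict(oneRow.reshape(1, -1))
print("predictY:" + str(predictY))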
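To make the DictVectorizer step less opaque, here is a pure-Python sketch of the same idea: each distinct feature=value pair becomes one 0/1 column. This is not scikit-learn's implementation, and one_hot is a hypothetical helper. The "feature=value" column naming and the sorted column order are meant to mirror DictVectorizer's default behavior, as far as I understand it. The rows are made up for illustration.

```python
def one_hot(rows):
    """Encode a list of {feature: category} dicts as 0/1 rows.

    Returns (column_names, matrix), with one column per distinct
    "feature=value" pair, in sorted order.
    """
    columns = sorted({k + "=" + v for row in rows for k, v in row.items()})
    index = {name: i for i, name in enumerate(columns)}
    matrix = []
    for row in rows:
        encoded = [0] * len(columns)
        for k, v in row.items():
            encoded[index[k + "=" + v]] = 1
        matrix.append(encoded)
    return columns, matrix


# Made-up rows for illustration only
rows = [
    {"age": "youth", "student": "no"},
    {"age": "senior", "student": "yes"},
]
columns, matrix = one_hot(rows)
print(columns)  # ['age=senior', 'age=youth', 'student=no', 'student=yes']
print(matrix)   # [[0, 1, 1, 0], [1, 0, 0, 1]]
```

Because every column stands for one category of one feature, editing an encoded row by index, as the prediction code does with oneRow, changes which categories are switched on. It is worth checking the indices against the printed column names before doing so.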