[Machine Learning Algorithms] Implementing Naive Bayes
2016-03-05 11:22
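To deepen my understanding of machine learning algorithms, and to get more familiar with Python, pandas, and scikit-learn, I am implementing the main machine learning algorithms myself. The code is recorded below: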
Naive Bayes implementation:
def loadDataSet():
    # Toy corpus: six tokenized posts and their labels.
    postingList = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                   ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                   ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                   ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                   ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                   ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    classVec = [0, 1, 0, 1, 0, 1]  # 1 is abusive, 0 not
    return postingList, classVec


def gen_label_prob(label):
    # Prior P(y): fraction of training samples that carry each label.
    sample_len = len(label)
    label_dic = {}
    for label_val in label:
        label_dic[label_val] = label_dic.get(label_val, 0) + 1
    for key in label_dic:
        label_dic[key] = float(label_dic[key]) / sample_len
    return label_dic


def gen_condi_prob(train_data, label, label_dic):
    # Conditional P(x|y): occurrences of word x in samples labeled y,
    # divided by the number of samples labeled y.
    data_len = len(train_data)
    label_set = set(label)
    res_dic = {}
    # Count each word only under the label of the sample it appears in.
    for data_list, label_val in zip(train_data, label):
        for curr_x in data_list:
            key = (curr_x, label_val)
            res_dic[key] = res_dic.get(key, 0) + 1
    for key in res_dic:
        # data_len * label_dic[y] is the number of samples labeled y.
        res_dic[key] = float(res_dic[key]) / (data_len * label_dic[key[1]])
    return res_dic, label_set


def predict(test, res_dic, label_set, label_dic):
    # Score each label as P(y) * product of P(x|y) and return the best one.
    prob = {}
    for label in label_set:
        for curr_x in test:
            key = (curr_x, label)
            # A (word, label) pair never seen in training contributes 0 (no smoothing).
            prob[label] = prob.get(label, 1) * res_dic.get(key, 0)
    max_prob = 0
    max_label = 0  # returned when every label scores 0
    for key in prob:
        prob[key] = prob[key] * label_dic[key]
        if prob[key] > max_prob:
            max_label = key
            max_prob = prob[key]
    return max_label


def model_test():
    train_data, train_label = loadDataSet()
    label_dic = gen_label_prob(train_label)
    res_dic, label_set = gen_condi_prob(train_data, train_label, label_dic)
    # x = ['quit', 'buying', 'worthless', 'food', 'stupid']
    x = ['stop']
    res_label = predict(x, res_dic, label_set, label_dic)
    print(res_label)


if __name__ == '__main__':
    model_test()
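The predict function above multiplies raw probabilities and uses 0 for any word that never occurred with a label, so a single unseen word drives that label's score to zero. A common remedy is Laplace (add-one) smoothing combined with summing log-probabilities. The following is only a minimal sketch of that idea, not part of the original program: the names train_smoothed, predict_smoothed and the parameter alpha are hypothetical, and it uses multinomial word counts (word occurrences divided by the total number of words in the class), which differs slightly from the per-sample estimate used above.

```python
import math
from collections import Counter, defaultdict

# Same toy corpus as in the program above (1 is abusive, 0 not).
docs = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
        ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
        ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
        ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
        ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
        ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
labels = [0, 1, 0, 1, 0, 1]


def train_smoothed(docs, labels):
    """Hypothetical helper: collect log priors, per-class word counts and the vocabulary."""
    n = len(labels)
    log_prior = {y: math.log(c / n) for y, c in Counter(labels).items()}
    word_counts = defaultdict(Counter)
    for words, y in zip(docs, labels):
        word_counts[y].update(words)  # count words only under their own label
    vocab = {w for words in docs for w in words}
    return log_prior, word_counts, vocab


def predict_smoothed(words, log_prior, word_counts, vocab, alpha=1.0):
    """Hypothetical helper: return the label with the highest smoothed log score."""
    best_label, best_score = None, -math.inf
    for y, prior in log_prior.items():
        total = sum(word_counts[y].values())  # total word occurrences in class y
        score = prior
        for w in words:
            # Laplace smoothing: unseen words get a small non-zero probability.
            p = (word_counts[y][w] + alpha) / (total + alpha * len(vocab))
            score += math.log(p)  # sum of logs instead of product of probabilities
        if score > best_score:
            best_label, best_score = y, score
    return best_label


log_prior, word_counts, vocab = train_smoothed(docs, labels)
print(predict_smoothed(['stupid', 'garbage'], log_prior, word_counts, vocab))
print(predict_smoothed(['love', 'my', 'dalmation'], log_prior, word_counts, vocab))
```

Working in log space also avoids floating-point underflow when a test sample contains many words, which the plain product of probabilities is prone to.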