The k-Nearest Neighbors Algorithm (KNN), with Python 3 Example Code
2016-12-04 17:59
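I just finished reading the KNN chapter of "Machine Learning in Action".
The k-nearest neighbors algorithm (kNN, k-Nearest Neighbors) computes the distance from the query point to every training sample, keeps the k closest samples, and assigns the class that most of those k samples belong to. Its main characteristics:
1. Simple, with no training step.
2. When class sizes are imbalanced, the "nearest neighbors, majority vote" rule clearly favors the classes with more samples.
3. Every prediction computes the distance to all samples, so the computational cost is high.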
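To make the second point concrete, here is a minimal sketch on one-dimensional toy data. The helper `knn_vote` and the sample values are hypothetical and only for illustration; they are not from the book. Two 'B' samples lie right next to the query, but once k grows past the size of the small class, the larger 'A' class wins the vote.

```python
from collections import Counter


def knn_vote(query, samples, k):
    """Majority vote among the k samples closest to `query` (1-D data)."""
    # Sort (value, label) pairs by absolute distance to the query, keep k.
    nearest = sorted(samples, key=lambda s: abs(s[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy data: 2 'B' points near the query, 10 'A' points further away.
samples = [(0.1, 'B'), (0.2, 'B')] + [(1.0 + 0.1 * i, 'A') for i in range(10)]

print(knn_vote(0.0, samples, k=1))  # B
print(knn_vote(0.0, samples, k=7))  # A
```

With k=7 the vote is five 'A' against two 'B', even though both 'B' samples are far closer to the query, so the choice of k matters a lot on imbalanced data.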
The first example from the book is shown below. The original is written for Python 2; porting it to Python 3 required changing only one line of code:
'''
First case of a KNN classifier
'''
from numpy import array, tile
import operator


def createDataSet():
    # Four 2-D training samples and their class labels
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return (group, labels)


def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Repeat inX once per sample, then compute Euclidean distances
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat**2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances**0.5
    # Indices of the samples, ordered from nearest to farthest
    sortedDistIndicies = distances.argsort()
    classCount = {}
    # Count the labels of the k nearest samples
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Python 3 change: dict.iteritems() became dict.items()
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]


if __name__ == '__main__':
    print('dataset - labels')
    print(createDataSet())
    group, labels = createDataSet()
    label = classify0([1, 1.3], group, labels, 3)
    print(label)
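As a side note, the same steps can be written more compactly. The sketch below is not the book's code: `classify_knn` is a hypothetical name, and it swaps `tile` for NumPy broadcasting and the dictionary-plus-`sorted` vote for `collections.Counter`. It assumes the same inputs as `classify0`.

```python
from collections import Counter

import numpy as np


def classify_knn(in_x, data_set, labels, k):
    """Hypothetical compact variant of classify0 (not from the book)."""
    # Broadcasting subtracts in_x from every row, so tile() is not needed.
    diffs = np.asarray(data_set, dtype=float) - np.asarray(in_x, dtype=float)
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    # Indices of the k nearest samples.
    nearest = distances.argsort()[:k]
    # Majority vote over the labels of those samples.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify_knn([1, 1.3], group, labels, 3))  # A
```

For the query [1, 1.3] the three nearest samples are the two 'A' points and [0, 0.1], so the vote is 2 to 1 for 'A', the same result the book's version gives.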