
Kaggle Getting Started Competition: Digit Recognizer

2017-03-22 12:29
Task: recognize handwritten digits. Each digit is represented as a 28×28 pixel matrix, i.e. 784 pixels, and each pixel value lies between 0 and 255.
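To make the 784-versus-28×28 relationship concrete, here is a small sketch. It assumes the pixels of each CSV row are stored row by row (row-major order), and `flat_row` is a made-up pattern for illustration, not real Kaggle data.

```python
import numpy as np

# Hypothetical example row: 784 values in 0..255, in the flat order of one CSV row.
flat_row = np.arange(784) % 256

# 28 * 28 = 784, so a row-major reshape recovers the 2-D image:
# flat index i * 28 + j becomes image row i, column j.
image = flat_row.reshape(28, 28)

print(image.shape)                     # (28, 28)
print(image[1, 0])                     # 28
print(flat_row.min(), flat_row.max())  # 0 255
```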

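Approach: kNN performs quite well on digit recognition. With this many feature dimensions, a kd_tree search becomes slow, so I used kNN backed by a ball_tree instead. Every pixel value is also normalized, or more precisely binarized: any nonzero value becomes 1.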
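The two ideas in this approach, binarizing pixels and running scikit-learn's `KNeighborsClassifier` with `algorithm='ball_tree'`, can be sketched on a toy data set. The four 4-pixel "images" and the helper name `binarize` are hypothetical; `n_neighbors=3` is used only because the default of 5 would exceed the four training samples.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def binarize(pixels):
    """Map every nonzero pixel to 1 and keep zeros as 0 (vectorized)."""
    return (np.asarray(pixels) != 0).astype(np.uint8)


# Toy training set (made-up values): two "1"s and two "7"s, 4 pixels each.
X_train = np.array([[0, 200, 0, 0],
                    [0, 17, 0, 0],
                    [255, 0, 0, 90],
                    [34, 0, 0, 128]])
y_train = np.array([1, 1, 7, 7])

# Ball-tree-backed kNN, as in the post (the post keeps the default n_neighbors=5).
clf = KNeighborsClassifier(n_neighbors=3, algorithm='ball_tree')
clf.fit(binarize(X_train), y_train)

X_test = np.array([[0, 5, 0, 0],
                   [99, 0, 0, 1]])
print(clf.predict(binarize(X_test)))  # [1 7]
```

After binarization, intensity differences such as 200 versus 17 disappear, so both "1" rows collapse to the same binary pattern; that is the effect the normalization step relies on.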

Tools: Python 2.7, scikit-learn, PyCharm

# -*- coding: utf-8 -*-
# Python 2.7: the csv module expects files opened in binary mode ('rb' / 'wb').
import csv
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Convert a list of numeric strings to a list of ints
def to_int(values):
    return [int(v) for v in values]

# Normalize (binarize): every nonzero pixel becomes 1, zeros stay 0
def normalize(array):
    return (array != 0).astype(int)

# Read the training set: column 0 is the label, the remaining 784 columns are pixels
def load_train_data():
    train_data = []
    train_label = []
    with open('E:\\data\\kaggle\\digit recognizer\\train.csv', 'rb') as f:
        lines = csv.reader(f)
        next(lines)  # skip the header row
        for line in lines:
            train_label.append(int(line[0]))
            train_data.append(to_int(line[1:]))
    return normalize(np.array(train_data)), np.array(train_label)

# Read the test set: 784 pixel columns per row, no label
def load_test_data():
    test_data = []
    with open('E:\\data\\kaggle\\digit recognizer\\test.csv', 'rb') as f:
        lines = csv.reader(f)
        next(lines)  # skip the header row
        for line in lines:
            test_data.append(to_int(line))
    return normalize(np.array(test_data))

def classify():
    train_data, train_label = load_train_data()
    test_data = load_test_data()
    # kNN with a ball-tree index (default n_neighbors=5)
    neigh = KNeighborsClassifier(algorithm='ball_tree')
    neigh.fit(train_data, train_label)
    result = [('ImageId', 'Label')]
    # Predict one test image at a time; ImageId starts at 1
    for i, item in enumerate(test_data, 1):
        label = neigh.predict(item.reshape((1, -1)))
        result.append((i, label[0]))
    with open('E:\\data\\kaggle\\digit recognizer\\result.csv', 'wb') as f:
        writer = csv.writer(f)
        writer.writerows(result)

if __name__ == '__main__':
    classify()
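For readers on Python 3, here is a minimal sketch of the same pipeline. It assumes the same CSV layout (one header row, label in the first column of train.csv). It differs from the code above in a few ways: `np.loadtxt` replaces the manual csv parsing, binarization is vectorized, and `predict()` is called once on the whole test matrix rather than once per row, which usually reduces Python-level overhead. The function names and the `n_neighbors` parameter exposure are my own choices, not the original author's.

```python
import csv

import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def load_csv(path):
    """Read a CSV with one header row into a 2-D int array."""
    return np.loadtxt(path, delimiter=',', skiprows=1, dtype=int, ndmin=2)


def binarize(pixels):
    # Nonzero pixels -> 1, zero pixels stay 0 (same rule as normalize()).
    return (pixels != 0).astype(np.uint8)


def classify(train_path, test_path, out_path, n_neighbors=5):
    train = load_csv(train_path)
    labels, train_pixels = train[:, 0], train[:, 1:]
    test_pixels = load_csv(test_path)

    clf = KNeighborsClassifier(n_neighbors=n_neighbors, algorithm='ball_tree')
    clf.fit(binarize(train_pixels), labels)
    # One predict() call on the whole test matrix instead of one call per row.
    predictions = clf.predict(binarize(test_pixels))

    # In Python 3, files for the csv module are opened in text mode with newline=''.
    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['ImageId', 'Label'])
        for image_id, label in enumerate(predictions, start=1):
            writer.writerow([image_id, label])
    return predictions
```

Calling `classify('train.csv', 'test.csv', 'result.csv')` with your own paths produces a file with the same `ImageId,Label` header and 1-based ids as the original script.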


The whole run took about half an hour to finish. Here is my score:

[Figure: Kaggle leaderboard score screenshot]
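Because a full run takes about half an hour, it can help to check accuracy on a held-out split before submitting. This is a hedged illustration, not part of the original workflow: it uses scikit-learn's bundled `load_digits` data (8×8 images, pixel values 0 to 16) as a stand-in for the Kaggle CSVs, so the number it prints is not comparable to the leaderboard score. The 80/20 split and `random_state=0` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: scikit-learn's small 8x8 digits set, NOT the Kaggle 28x28 data.
X, y = load_digits(return_X_y=True)
X = (X != 0).astype(np.uint8)  # same nonzero -> 1 rule as normalize()

# Hold out 20% of the labeled data as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

clf = KNeighborsClassifier(algorithm='ball_tree')
clf.fit(X_train, y_train)
val_accuracy = clf.score(X_val, y_val)  # mean accuracy on the held-out 20%
print('validation accuracy: %.3f' % val_accuracy)
```

On the real data, the same split would be applied to the arrays returned by `load_train_data()`, and a smaller subsample can be used first to gauge runtime.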
