
TensorFlow + CNN: simple MNIST handwritten digit recognition

2017-11-12 16:26

The MNIST dataset

The dataset can be downloaded from the official MNIST site.

In TensorFlow it can be loaded directly with the following code:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

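After this code runs, a folder named MNIST_data appears in the working directory, containing the four files of the MNIST dataset. (If the call takes a long time, it is downloading; you can also copy the four MNIST files into that folder yourself.)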

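The downloaded dataset has two parts: 60,000 training examples and 10,000 test examples (mnist.test). By default, read_data_sets holds out 5,000 of the training examples as mnist.validation, so mnist.train contains the remaining 55,000. Each MNIST example consists of two parts: an image of a handwritten digit and the corresponding label. We call the images "xs" and the labels "ys". Both the training set and the test set contain xs and ys; for example, the training images are mnist.train.images and the training labels are mnist.train.labels.

With that default split, mnist.train.images is a tensor of shape [55000, 784]. The first dimension indexes the images and the second indexes the pixels within each image. Every element of the tensor is the intensity of one pixel in one image, a value between 0 and 1.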
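To make the [N, 784] layout concrete, here is a small dependency-free sketch of how a single 28*28 grayscale image with byte values 0..255 could be flattened row by row and scaled into [0, 1]. The helper name flatten_and_scale is hypothetical, and the row-major order and the division by 255 are assumptions for illustration, not code taken from the loader.

```python
def flatten_and_scale(image_rows):
    """Flatten a 2-D list of 0..255 pixel values row by row and scale to [0, 1]."""
    return [pixel / 255.0 for row in image_rows for pixel in row]


# A synthetic 28x28 "image": all black except one white pixel at row 3, column 5.
image = [[0] * 28 for _ in range(28)]
image[3][5] = 255

flat = flatten_and_scale(image)
print(len(flat))         # 784 values per image
print(flat[3 * 28 + 5])  # the white pixel sits at flat index row * 28 + col
```

Under this layout, pixel (row, col) of an image lands at index row * 28 + col of its 784-element row.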

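CNN architecture

The network graph was drawn with TensorBoard, the visualization tool that ships with TensorFlow.

Read from bottom to top, the graph is input -> conv1 -> pool1 -> conv2 -> pool2 -> fc1 -> fc2 -> output, that is: input, convolutional layer 1, pooling layer 1, convolutional layer 2, pooling layer 2, fully connected layer 1, fully connected layer 2, and the output labels.

The raw input has shape [n_samples, 784]. After a reshape it becomes x_image with shape [n_samples, 28, 28, 1], where n_samples is the number of samples, 28*28 is the image size, and 1 is the number of color channels, because these are grayscale images. For a color RGB image the channel count would be 3.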
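The reshape described above can be sketched without TensorFlow using nested lists. The function name to_nhwc is hypothetical, and the example assumes the flat vector is stored row by row.

```python
def to_nhwc(flat_batch, height=28, width=28):
    """Reshape [n, height*width] nested lists into [n, height, width, 1]."""
    batch = []
    for flat in flat_batch:
        if len(flat) != height * width:
            raise ValueError("expected %d values, got %d" % (height * width, len(flat)))
        # Each pixel becomes a one-element list: the single grayscale channel.
        image = [[[flat[r * width + c]] for c in range(width)] for r in range(height)]
        batch.append(image)
    return batch


# Two fake samples whose pixel values are simply their flat index.
flat_batch = [[float(k) for k in range(784)] for _ in range(2)]
x_image = to_nhwc(flat_batch)
shape = (len(x_image), len(x_image[0]), len(x_image[0][0]), len(x_image[0][0][0]))
print(shape)
```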

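In conv1, the filter (patch) size is 5*5 and there are 32 filters. With a stride of 1 and 'SAME' padding the spatial size is unchanged, so the output of conv1 has shape [n_samples, 28, 28, 32].

In pool1, the window size is 2*2 and the stride is 2 along both the x and y axes, so the output is [n_samples, 14, 14, 32].

In conv2, the filter size is 5*5 and there are 64 filters, each of which convolves across the full depth (all 32 channels) of its input, so the output has shape [n_samples, 14, 14, 64].

pool2 uses the same window as pool1, and its output is [n_samples, 7, 7, 64].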
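The shapes quoted for the four layers follow from the 'SAME' padding rule, where the output length along a spatial axis is ceil(input / stride). The sketch below traces them; the helper names same_out and trace_shapes are hypothetical, and the layer list simply restates the strides and filter counts given above.

```python
import math


def same_out(size, stride):
    """Output length along one spatial axis for padding='SAME'."""
    return math.ceil(size / stride)


def trace_shapes(height=28, width=28):
    """Return (layer name, (height, width, channels)) for each stage of the network."""
    shapes = [("input", (height, width, 1))]
    # (name, stride, number of filters or None for pooling layers)
    layers = [("conv1", 1, 32), ("pool1", 2, None), ("conv2", 1, 64), ("pool2", 2, None)]
    channels = 1
    for name, stride, out_channels in layers:
        height, width = same_out(height, stride), same_out(width, stride)
        if out_channels is not None:
            channels = out_channels  # a conv layer sets the depth; pooling keeps it
        shapes.append((name, (height, width, channels)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

The last entry, 7*7*64 = 3136, is the length of the vector that is flattened and fed to fc1.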

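fc1, the first fully connected layer, flattens the pool2 output to [n_samples, 7*7*64] and maps it to [n_samples, 1024]. Dropout is applied to its output to reduce overfitting: during training each activation is kept with probability keep_prob and set to zero otherwise, so every step trains a randomly thinned network. Dropout is a general regularization technique and is not specific to CNNs.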
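As a rough illustration of what dropout does to the fc1 activations, here is a dependency-free sketch of the "inverted dropout" scheme, in which surviving values are scaled by 1/keep_prob so the expected sum stays the same. The function name dropout and the use of Python's random module are assumptions for illustration; this is not the TensorFlow implementation.

```python
import random


def dropout(activations, keep_prob, rng):
    """Keep each value with probability keep_prob and scale survivors by 1/keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)             # fixed seed so the sketch is repeatable
h_fc1 = [1.0] * 1024               # stand-in for one sample's fc1 activations
dropped = dropout(h_fc1, 0.5, rng)

print(dropped.count(0.0), "of", len(dropped), "activations were zeroed")
print(dropout([1.0, 2.0], 1.0, rng))  # keep_prob=1 leaves the input unchanged
```

This is why keep_prob is fed as 0.5 during training and as 1 when evaluating accuracy.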

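fc2 takes an input of shape [n_samples, 1024] and outputs a 10-dimensional vector per sample: a softmax probability distribution over the digits 0 to 9. The labels use one-hot (0/1) encoding: the position of the true digit is set to 1 and all other positions to 0. For example, the label 2 is encoded as [0 0 1 0 0 0 0 0 0 0].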
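The relationship between the softmax output and the one-hot labels can be sketched in a few lines of plain Python. The helper names one_hot, softmax and argmax are hypothetical, and the logits are made-up numbers chosen only so that class 2 scores highest.

```python
import math


def one_hot(label, num_classes=10):
    """Encode an integer label as a 0/1 vector with a single 1."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec


def softmax(logits):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value, i.e. the predicted digit."""
    return max(range(len(values)), key=lambda i: values[i])


label_vec = one_hot(2)
probs = softmax([0.1, 0.3, 2.5, 0.0, -1.0, 0.2, 0.0, 0.4, 0.1, -0.5])
print(label_vec)
print(argmax(probs), argmax(label_vec))
```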

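The objective function that the whole network minimizes is the cross-entropy:

$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$

Here y is the predicted probability distribution and y' is the true distribution. In practice each batch contains 100 samples, i.e. 100 samples are used per training step, so the cross-entropy is averaged over the batch. In code: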

cross_entropy=tf.reduce_mean(-tf.reduce_sum(ys*tf.log(prediction), reduction_indices=[1]))
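To see what that expression computes, the sketch below evaluates the same quantity on a tiny hand-made batch: the sum over classes for each sample, then the mean over the batch. The function name cross_entropy and the eps clipping are assumptions added for illustration; the TensorFlow line above has no such clipping, so a predicted probability of exactly 0 would produce an infinite or NaN loss there.

```python
import math


def cross_entropy(labels, predictions, eps=1e-12):
    """Mean over the batch of -sum_i y'_i * log(y_i)."""
    per_sample = []
    for y_true, y_pred in zip(labels, predictions):
        # Clip the prediction so log() never sees 0 (an assumption of this sketch).
        per_sample.append(-sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred)))
    return sum(per_sample) / len(per_sample)


# Two samples, three classes, one-hot labels.
labels = [[0, 0, 1], [1, 0, 0]]
predictions = [[0.1, 0.2, 0.7], [0.5, 0.25, 0.25]]
loss = cross_entropy(labels, predictions)
print(loss)
```

Because the labels are one-hot, only the predicted probability of the true class contributes for each sample.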

The Adam optimizer, with a learning rate of 1e-4, is then used to minimize this objective:

train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
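For readers unfamiliar with Adam, here is a minimal sketch of the textbook update rule applied to a toy one-parameter objective f(w) = (w - 3)^2. The function name adam_minimize and the toy objective are hypothetical, and the beta1, beta2 and eps values are assumed defaults. TensorFlow's own implementation arranges the bias correction slightly differently, so treat this as an illustration of the idea rather than of tf.train.AdamOptimizer itself.

```python
import math


def adam_minimize(grad_fn, w, steps, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run `steps` Adam updates on a single scalar parameter w."""
    m = 0.0  # running mean of the gradient
    v = 0.0  # running mean of the squared gradient
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


def grad(w):
    """Gradient of f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_one_step = adam_minimize(grad, w=0.0, steps=1)
w_final = adam_minimize(grad, w=0.0, steps=1000)
print(w_one_step, w_final)
```

Each step moves the parameter by roughly the learning rate regardless of the gradient's magnitude, which is one reason a small value such as 1e-4 gives slow but steady progress.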

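The figure plots the cross-entropy against the number of training steps:

The above draws on the Chinese edition of the official TensorFlow manual published by Jikexueyuan (极客学院).

Source code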

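# -*- coding: utf-8 -*-
# Written against the TensorFlow 1.x API used throughout this post.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Download (if needed) and load MNIST with one-hot labels.
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)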

def compute_accuracy(v_xs, v_ys):
    # Run the network with dropout disabled (keep_prob=1) and compare the
    # predicted class with the true class for every sample.
    # Note: each call adds a few new ops to the graph.
    global prediction
    y_pre = sess.run(prediction, feed_dict={xs: v_xs, keep_prob: 1})
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys, keep_prob: 1})
    return result

def weight_variable(shape, layer_name):
    # Weights are drawn from a truncated normal distribution (stddev 0.1).
    with tf.name_scope('weights'):
        initial = tf.truncated_normal(shape, stddev=0.1)
        weights = tf.Variable(initial, name='W')
        tf.summary.histogram(layer_name+'/weights', weights)
    return weights

def bias_variable(shape, layer_name):
    # Biases start at a small positive constant (0.1).
    with tf.name_scope('biases'):
        initial = tf.constant(0.1, shape=shape)
        biases = tf.Variable(initial, name='b')
        tf.summary.histogram(layer_name+'/biases', biases)
    return biases

def conv2d(x, W, layer_name):  # convolution layer
    # W: [filter_height, filter_width, in_channels, out_channels]
    # strides format: [1, stride_height, stride_width, 1]
    # must have strides[0]=strides[3]=1
    outputs = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
    tf.summary.histogram(layer_name+'/outputs', outputs)
    return outputs

def max_pool_2x2(x, layer_name):  # pooling layer
    # 2x2 window moved with stride 2: halves the height and width.
    outputs = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    tf.summary.histogram(layer_name+'/outputs', outputs)
    return outputs

with tf.name_scope('inputs'):
    # define placeholders for the inputs to the network
    xs = tf.placeholder(tf.float32, [None, 784], name='x_input')  # 28*28
    ys = tf.placeholder(tf.float32, [None, 10], name='y_input')
    # keep_prob is the probability of keeping each activation in dropout.
    # It is a placeholder fed at run time; keep_prob=1 keeps everything,
    # i.e. dropout has no effect.
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    x_image = tf.reshape(xs, [-1, 28, 28, 1], name='x_image')  # 1 channel (grayscale)
    # print(x_image.shape)  # [n_samples, 28, 28, 1]

##########################################################################
###  Build the whole convolutional neural network
##########################################################################

# conv1 layer #
with tf.name_scope('conv1'):
    W_conv1 = weight_variable([5, 5, 1, 32], 'conv1')  # patch 5*5, in_size=1, out_size=32
    b_conv1 = bias_variable([32], 'conv1')
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1, 'conv1')+b_conv1)  # output_size = 28*28*32

with tf.name_scope('pool1'):
    h_pool1 = max_pool_2x2(h_conv1, 'pool1')  # output_size = 14*14*32

# conv2 layer #
with tf.name_scope('conv2'):
    W_conv2 = weight_variable([5, 5, 32, 64], 'conv2')  # patch 5*5, in_size=32, out_size=64
    b_conv2 = bias_variable([64], 'conv2')
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2, 'conv2')+b_conv2)  # output_size = 14*14*64

with tf.name_scope('pool2'):
    h_pool2 = max_pool_2x2(h_conv2, 'pool2')  # output_size = 7*7*64

# fc1 layer #
with tf.name_scope('fc1'):
    W_fc1 = weight_variable([7*7*64, 1024], 'fc1')
    b_fc1 = bias_variable([1024], 'fc1')

    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])  # [n_samples, 7, 7, 64]-->[n_samples, 7*7*64]
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1)+b_fc1)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)  # dropout to reduce overfitting

# fc2 layer #
with tf.name_scope('fc2'):
    W_fc2 = weight_variable([1024, 10], 'fc2')
    b_fc2 = bias_variable([10], 'fc2')
    prediction = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2)+b_fc2)

with tf.name_scope('cross_entropy'):
    # Sum over the 10 classes for each sample, then average over the batch.
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys*tf.log(prediction), reduction_indices=[1]))
    tf.summary.scalar('cross_entropy', cross_entropy)

with tf.name_scope('train'):
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

sess = tf.Session()
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter("logs/", sess.graph)

sess.run(tf.global_variables_initializer())

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    # print(batch_xs.shape, batch_ys.shape)
    # prints (100, 784) (100, 10)
    sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
    if i % 50 == 0:
        # Every 50 steps: log summaries for TensorBoard and print test accuracy.
        rs = sess.run(merged, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
        writer.add_summary(rs, i)
        print(compute_accuracy(mnist.test.images[: 1000], mnist.test.labels[: 1000]))

The code above is based on morvan's TensorFlow tutorial.
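As a quick sanity check on the size of this network, the weight and bias shapes passed to weight_variable and bias_variable in the listing can be multiplied out. The helper count_parameters is hypothetical; the shapes are copied from the code above.

```python
def count_parameters(shapes):
    """Total number of scalar parameters given a dict of variable shapes."""
    total = 0
    for shape in shapes.values():
        size = 1
        for dim in shape:
            size *= dim
        total += size
    return total


shapes = {
    "W_conv1": (5, 5, 1, 32),
    "b_conv1": (32,),
    "W_conv2": (5, 5, 32, 64),
    "b_conv2": (64,),
    "W_fc1": (7 * 7 * 64, 1024),
    "b_fc1": (1024,),
    "W_fc2": (1024, 10),
    "b_fc2": (10,),
}

total = count_parameters(shapes)
print(total)  # 3274634
```

Almost all of the parameters sit in W_fc1 (3136 * 1024), which is one reason dropout is applied at that layer.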

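Results

Handwritten digit recognition accuracy reached 96.3%, measured on the first 1,000 test images as in the code above.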
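The accuracy figure is the fraction of samples whose most probable predicted class matches the class marked in the one-hot label, which is what compute_accuracy evaluates with tf.argmax and tf.equal. A dependency-free sketch of the same calculation, with a hypothetical accuracy helper and made-up predictions, looks like this:

```python
def accuracy(predictions, labels):
    """Fraction of samples where argmax(prediction) equals argmax(label)."""
    def argmax(values):
        return max(range(len(values)), key=lambda i: values[i])

    correct = sum(1 for p, y in zip(predictions, labels) if argmax(p) == argmax(y))
    return correct / len(labels)


# Three samples, three classes: the second prediction is wrong.
predictions = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
labels = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]
acc = accuracy(predictions, labels)
print(acc)
```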

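Last but not least

This is my first blog post on deep learning. I believe I can keep writing, and I hope to exchange knowledge about deep learning, computer vision, and related fields with more friends. Thanks also to the person by my side who keeps pushing me forward, haha. Finally, here is a photo of Ng and his wife.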
