Implementing a Convolutional Neural Network with TensorFlow's Slim API
2017-10-24 17:03
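Lately I have been taking the Neural Networks course taught by 戎雪健 (Rong Xuejian) at 小象学院. He teaches well, but I never seem to find the time to run the code he hands out. He recommends using TF-Slim wherever possible to implement complex architectures.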
Below, the well-known MNIST dataset is used to walk through an implementation of a neural network.
import os
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow.contrib.slim as slim
import time
from tensorflow.examples.tutorials.mnist import input_data
# %matplotlib inline

# Load the MNIST dataset; first copy its four files into the data
# subdirectory of the directory that holds this program
mnist = input_data.read_data_sets(r'data/', one_hot=True)
trainimg = mnist.train.images
trainlabel = mnist.train.labels
valimg = mnist.validation.images
vallabel = mnist.validation.labels
testimg = mnist.test.images
testlabel = mnist.test.labels
print ("MNIST ready")

Jupyter notebook output:
Extracting Z:\CarlWu\temp\machinelearning_course\Hadoop_cn\deeplearning\DeepLearningCourseCodes-master\04_CNN_advances\data/train-images-idx3-ubyte.gz
Extracting Z:\CarlWu\temp\machinelearning_course\Hadoop_cn\deeplearning\DeepLearningCourseCodes-master\04_CNN_advances\data/train-labels-idx1-ubyte.gz
Extracting Z:\CarlWu\temp\machinelearning_course\Hadoop_cn\deeplearning\DeepLearningCourseCodes-master\04_CNN_advances\data/t10k-images-idx3-ubyte.gz
Extracting Z:\CarlWu\temp\machinelearning_course\Hadoop_cn\deeplearning\DeepLearningCourseCodes-master\04_CNN_advances\data/t10k-labels-idx1-ubyte.gz
MNIST ready
Define the neural network model
n_input = 784
n_classes = 10
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
is_training = tf.placeholder(tf.bool)

def lrelu(x, leak=0.2, name='lrelu'):
    # Leaky ReLU written as f1*x + f2*|x|
    with tf.variable_scope(name):
        f1 = 0.5 * (1 + leak)
        f2 = 0.5 * (1 - leak)
        return f1 * x + f2 * abs(x)

def CNN(inputs, is_training=True):
    x = tf.reshape(inputs, [-1, 28, 28, 1])
    batch_norm_params = {'is_training': is_training, 'decay': 0.9,
                         'updates_collections': None}
    init_func = tf.truncated_normal_initializer(stddev=0.01)
    net = slim.conv2d(x, 32, [5, 5], padding='SAME',
                      activation_fn=lrelu,
                      weights_initializer=init_func,
                      normalizer_fn=slim.batch_norm,
                      normalizer_params=batch_norm_params,
                      scope='conv1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    # NOTE: this layer takes x, not net, as its input, so conv1/pool1 do not
    # feed into it. The shapes printed further down (conv2/weights
    # (5, 5, 1, 64), fc4/weights (12544, 1024)) reflect that. Pass net
    # instead of x to chain the two convolution layers.
    net = slim.conv2d(x, 64, [5, 5], padding='SAME',
                      activation_fn=lrelu,
                      weights_initializer=init_func,
                      normalizer_fn=slim.batch_norm,
                      normalizer_params=batch_norm_params,
                      scope='conv2')
    net = slim.max_pool2d(net, [2, 2], scope='pool2')
    net = slim.flatten(net, scope='flatten3')
    net = slim.fully_connected(net, 1024,
                               activation_fn=lrelu,
                               weights_initializer=init_func,
                               normalizer_fn=slim.batch_norm,
                               normalizer_params=batch_norm_params,
                               scope='fc4')
    net = slim.dropout(net, keep_prob=0.7, is_training=is_training, scope='dr')
    out = slim.fully_connected(net, n_classes,
                               activation_fn=None, normalizer_fn=None, scope='fco')
    return out

print ("Neural network ready")
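The lrelu helper above relies on a small identity: with f1 = 0.5 * (1 + leak) and f2 = 0.5 * (1 - leak), the expression f1 * x + f2 * |x| equals x for non-negative x and leak * x for negative x. A minimal plain-Python sketch to check this on a few values; the function names are illustrative and are not part of the original notebook:

```python
def lrelu_abs_form(x, leak=0.2):
    """Leaky ReLU written as f1 * x + f2 * |x|, mirroring the notebook's formula."""
    f1 = 0.5 * (1 + leak)
    f2 = 0.5 * (1 - leak)
    return f1 * x + f2 * abs(x)


def lrelu_reference(x, leak=0.2):
    """Textbook leaky ReLU: x for non-negative inputs, leak * x otherwise."""
    return x if x >= 0 else leak * x


samples = [-3.0, -0.5, 0.0, 0.5, 3.0]
# Largest absolute difference between the two forms over the sample points.
max_gap = max(abs(lrelu_abs_form(v) - lrelu_reference(v)) for v in samples)
print(max_gap)
```

The absolute-value form avoids a branch, which is convenient when the same expression has to be applied elementwise to a tensor.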
Define the graph
# PREDICTION
pred = CNN(x, is_training)

# LOSS AND OPTIMIZER
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    labels=y, logits=pred))
optm = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
corr = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accr = tf.reduce_mean(tf.cast(corr, "float"))

# INITIALIZER
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
print ("FUNCTIONS READY")

# Inspect the variables
print ("=================== TRAINABLE VARIABLES ===================")
t_weights = tf.trainable_variables()
var_names_list = [v.name for v in tf.trainable_variables()]
for i in range(len(t_weights)):
    wval = sess.run(t_weights[i])
    print ("[%d/%d] [%s] / SAHPE IS %s"
           % (i, len(t_weights), var_names_list[i], wval.shape,))
Jupyter notebook output:
=================== TRAINABLE VARIABLES ===================
[0/8] [conv1/weights:0] / SAHPE IS (5, 5, 1, 32)
[1/8] [conv1/BatchNorm/beta:0] / SAHPE IS (32,)
[2/8] [conv2/weights:0] / SAHPE IS (5, 5, 1, 64)
[3/8] [conv2/BatchNorm/beta:0] / SAHPE IS (64,)
[4/8] [fc4/weights:0] / SAHPE IS (12544, 1024)
[5/8] [fc4/BatchNorm/beta:0] / SAHPE IS (1024,)
[6/8] [fco/weights:0] / SAHPE IS (1024, 10)
[7/8] [fco/biases:0] / SAHPE IS (10,)
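The shapes printed above can be checked by hand. Because conv2 in the listing takes x as its input and uses 'SAME' padding, its output is 28x28x64; a single 2x2 max-pool then halves each spatial side, giving 14x14x64 = 12544, the input size of fc4. A sketch of that arithmetic in plain Python, assuming slim.max_pool2d's default stride of 2 with 'VALID' padding; the helper names are illustrative:

```python
def same_conv_output(height, width, out_channels):
    """With 'SAME' padding and stride 1, a convolution keeps height and width."""
    return height, width, out_channels


def max_pool_output(height, width, channels, kernel=2, stride=2):
    """'VALID' pooling: each side becomes floor((size - kernel) / stride) + 1."""
    pooled_h = (height - kernel) // stride + 1
    pooled_w = (width - kernel) // stride + 1
    return pooled_h, pooled_w, channels


h, w, c = same_conv_output(28, 28, 64)   # conv2 applied to the 28x28x1 input
h, w, c = max_pool_output(h, w, c)       # pool2
flattened = h * w * c                    # input size of fc4
fc4_weight_count = flattened * 1024      # number of entries in fc4/weights
print(flattened, fc4_weight_count)
```

If conv2 were fed from pool1 instead, the same arithmetic with two pooling steps would give 7x7x64 = 3136 inputs to fc4.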
# Store the model in a directory under the nets subdirectory
savedir = "nets/cnn_mnist_modern/"
saver = tf.train.Saver(max_to_keep=100)
save_step = 4
if not os.path.exists(savedir):
    os.makedirs(savedir)
print ("SAVER READY")

# Augment the image data used to train the model
def augment_img(xs):
    out = np.copy(xs)
    xs_r = np.reshape(xs, [-1, 28, 28])
    for i in range(xs_r.shape[0]):
        xs_img = xs_r[i, :, :]
        bg_value = 0
        # ROTATE
        angle = np.random.randint(-15, 15, 1).astype(float)
        xs_img = ndimage.rotate(xs_img, angle, reshape=False, cval=bg_value)
        # ZOOM
        rg = 0.1
        zoom_factor = np.random.uniform(1., 1.+rg)
        h, w = xs_img.shape[:2]
        zh = int(np.round(zoom_factor * h))
        zw = int(np.round(zoom_factor * w))
        top = (zh - h) // 2
        left = (zw - w) // 2
        zoom_tuple = (zoom_factor,) * 2 + (1,) * (xs_img.ndim - 2)
        temp = ndimage.zoom(xs_img[top:top+zh, left:left+zw], zoom_tuple)
        trim_top = ((temp.shape[0] - h) // 2)
        trim_left = ((temp.shape[1] - w) // 2)
        xs_img = temp[trim_top:trim_top+h, trim_left:trim_left+w]
        # SHIFT
        shift = np.random.randint(-3, 3, 2)
        xs_img = ndimage.shift(xs_img, shift, cval=bg_value)
        # RESHAPE
        xs_v = np.reshape(xs_img, [1, -1])
        out[i, :] = xs_v
    return out
The model is run in Jupyter notebook with the following code:
# PARAMETERS
training_epochs = 50
batch_size = 50
display_step = 3
val_acc = 0
val_acc_max = 0

# OPTIMIZE
currentTime = time.time()
for epoch in range(training_epochs):
    avg_cost = 0.
    total_batch = int(mnist.train.num_examples/batch_size)
    # ITERATION
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # AUGMENT DATA
        batch_xs = augment_img(batch_xs)
        feeds = {x: batch_xs, y: batch_ys, is_training: True}
        sess.run(optm, feed_dict=feeds)
        avg_cost += sess.run(cost, feed_dict=feeds)
    avg_cost = avg_cost / total_batch
    # DISPLAY
    if (epoch+1) % display_step == 0:
        print('time spent is ', (time.time()-currentTime))
        currentTime = time.time()
        print ("Epoch: %03d/%03d cost: %.9f" % (epoch+1, training_epochs, avg_cost))
        randidx = np.random.permutation(trainimg.shape[0])[:500]
        feeds = {x: trainimg[randidx], y: trainlabel[randidx], is_training: False}
        train_acc = sess.run(accr, feed_dict=feeds)
        print (" TRAIN ACCURACY: %.5f" % (train_acc))
        # The block below computes accuracy on the validation set;
        # the original code (commented out) did not work
        #feeds = {x: valimg, y: vallabel, is_training: False}
        #val_acc = sess.run(accr, feed_dict=feeds)
        total_batch_val = int(valimg.shape[0]/batch_size)
        print("Computing validation accuracy in %d batches" % total_batch_val)
        val_acc_sum = 0.0
        for j in range(total_batch_val):
            # Slice up to shape[0], not shape[0]-1, so the last sample is kept
            lo = j*batch_size
            hi = min((j+1)*batch_size, valimg.shape[0])
            feeds = {x: valimg[lo:hi],
                     y: vallabel[lo:hi],
                     is_training: False}
            val_acc = sess.run(accr, feed_dict=feeds)
            val_acc_sum = val_acc_sum + val_acc
        val_acc = val_acc_sum/total_batch_val
        # end of modified code
        print (" VALIDATION ACCURACY: %.5f" % (val_acc))
    # SAVE
    if (epoch+1) % save_step == 0:
        savename = savedir + "net-" + str(epoch) + ".ckpt"
        saver.save(sess=sess, save_path=savename)
        print (" [%s] SAVED." % (savename))
    # MAXIMUM VALIDATION ACCURACY
    if val_acc > val_acc_max:
        val_acc_max = val_acc
        best_epoch = epoch
        print ("\x1b[31m BEST EPOCH UPDATED!! [%d] \x1b[0m" % (best_epoch))
print ("OPTIMIZATION FINISHED")
After running for several hours on my GPU, the results were as follows:
time spent is 595.5124831199646
Epoch: 003/050 cost: 0.056146707
 TRAIN ACCURACY: 0.99200
total batch val: total_batch_val 100
 VALIDATION ACCURACY: 0.99160
 BEST EPOCH UPDATED!! [2]
 [nets/cnn_mnist_modern/net-3.ckpt] SAVED.
time spent is 644.9777743816376
Epoch: 006/050 cost: 0.052948017
 TRAIN ACCURACY: 0.99400
total batch val: total_batch_val 100
 VALIDATION ACCURACY: 0.99020
 [nets/cnn_mnist_modern/net-7.ckpt] SAVED.
time spent is 689.395813703537
Epoch: 009/050 cost: 0.052893652
 TRAIN ACCURACY: 0.99200
total batch val: total_batch_val 100
 VALIDATION ACCURACY: 0.99180
 BEST EPOCH UPDATED!! [8]
time spent is 598.4757721424103
... ...
Epoch: 042/050 cost: 0.037603188
 TRAIN ACCURACY: 0.99200
total batch val: total_batch_val 100
 VALIDATION ACCURACY: 0.99500
 [nets/cnn_mnist_modern/net-43.ckpt] SAVED.
time spent is 689.3062949180603
Epoch: 045/050 cost: 0.034730853
 TRAIN ACCURACY: 0.99400
total batch val: total_batch_val 100
 VALIDATION ACCURACY: 0.99520
 BEST EPOCH UPDATED!! [44]
time spent is 616.6805007457733
Epoch: 048/050 cost: 0.035798393
 TRAIN ACCURACY: 0.99800
total batch val: total_batch_val 100
 VALIDATION ACCURACY: 0.99340
 [nets/cnn_mnist_modern/net-47.ckpt] SAVED.
OPTIMIZATION FINISHED
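A note on reading this log together with the training loop: a checkpoint is written whenever (epoch+1) % save_step == 0 and is named after the zero-based epoch index, which gives net-3, net-7 and so on up to net-47. The last BEST EPOCH UPDATED line shown reports epoch 44, which has no checkpoint of its own; that is presumably why the next cell loads net-47. A small sketch of the arithmetic, which is my reading of the log and not something the original notebook contains:

```python
training_epochs = 50
save_step = 4

# Zero-based epoch indices at which the training loop writes a checkpoint.
saved_epochs = [epoch for epoch in range(training_epochs)
                if (epoch + 1) % save_step == 0]

best_epoch_in_log = 44  # last "BEST EPOCH UPDATED" value shown in the log

# First checkpoint written at or after the best epoch.
nearest_saved = min(e for e in saved_epochs if e >= best_epoch_in_log)
print(saved_epochs)
print(best_epoch_in_log in saved_epochs, nearest_saved)
```

Saving a checkpoint inside the "best epoch" branch as well would make the best model directly restorable, at the cost of more files on disk.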
best_epoch = 47
restorename = savedir + "net-" + str(best_epoch) + ".ckpt"
print ("LOADING [%s]" % (restorename))
saver.restore(sess, restorename)
feeds = {x: testimg, y: testlabel, is_training: False}
test_acc = sess.run(accr, feed_dict=feeds)
print ("TEST ACCURACY: %.5f" % (test_acc))
Finally, a run on the test set gives decent results too:
LOADING [nets/cnn_mnist_modern/net-47.ckpt]
TEST ACCURACY: 0.99120
A summary of the problems I ran into and how I solved them:
Because my GPU's compute capability is only 3.5, I kept running into OOM and ResourceExhaustedError errors. The failure is a GPU out-of-memory condition, triggered here when the whole validation set is fed in a single call:

---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1326     try:
-> 1327       return fn(*args)
   1328     except errors.OpError as e:

C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1305                                    feed_dict, fetch_list, target_list,
-> 1306                                    status, run_metadata)
   1307

C:\Users\CC-Laptop\Anaconda3\lib\contextlib.py in __exit__(self, type, value, traceback)
     65             try:
---> 66                 next(self.gen)
     67             except StopIteration:

C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py in raise_exception_on_not_ok_status()
    465           compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466           pywrap_tensorflow.TF_GetCode(status))
    467   finally:

ResourceExhaustedError: OOM when allocating tensor with shape[5000,28,28,64]
  [[Node: conv2/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Reshape, conv2/weights/read)]]
  [[Node: Mean_1/_117 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_255_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-19-6519d8ed8769> in <module>()
     31         # The block below computes accuracy on the validation set; the original code did not work
     32         feeds = {x: valimg, y: vallabel, is_training: False}
---> 33         val_acc = sess.run(accr, feed_dict=feeds)
     34
     35         #total_batch_val=int(valimg.shape[0]/batch_size)

C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    893     try:
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:
    897         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1122     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1123       results = self._do_run(handle, final_targets, final_fetches,
-> 1124                              feed_dict_tensor, options, run_metadata)
   1125     else:
   1126       results = []

C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1319     if handle is None:
   1320       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1321                            options, run_metadata)
   1322     else:
   1323       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1338         except KeyError:
   1339           pass
-> 1340       raise type(e)(node_def, op, message)
   1341
   1342   def _extend_graph(self):

ResourceExhaustedError: OOM when allocating tensor with shape[5000,28,28,64]
  [[Node: conv2/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Reshape, conv2/weights/read)]]
  [[Node: Mean_1/_117 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_255_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'conv2/convolution', defined at:
  File "C:\Users\CC-Laptop\Anaconda3\lib\runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\CC-Laptop\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\ipykernel\__main__.py", line 3, in <module>
    app.launch_new_instance()
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\traitlets\config\application.py", line 653, in launch_instance
    app.start()
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 474, in start
    ioloop.IOLoop.instance().start()
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\zmq\eventloop\ioloop.py", line 162, in start
    super(ZMQIOLoop, self).start()
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tornado\ioloop.py", line 887, in start
    handler_func(fd_obj, events)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 276, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 228, in dispatch_shell
    handler(stream, idents, msg)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 390, in execute_request
    user_expressions, allow_stdin)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 501, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2717, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2821, in run_ast_nodes
    if self.run_code(code, result):
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-0133824eed48>", line 2, in <module>
    pred = CNN(x, is_training)
  File "<ipython-input-3-d15e2c190a64>", line 30, in CNN
    , scope='conv2')
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 1027, in convolution
    outputs = layer.apply(inputs)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 503, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 450, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\layers\convolutional.py", line 158, in call
    data_format=utils.convert_data_format(self.data_format, self.rank + 2))
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 672, in convolution
    op=op)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 338, in with_space_to_batch
    return op(input, num_spatial_dims, padding)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 664, in op
    name=name)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 131, in _non_atrous_convolution
    name=name)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 397, in conv2d
    data_format=data_format, name=name)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "C:\Users\CC-Laptop\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[5000,28,28,64]
  [[Node: conv2/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Reshape, conv2/weights/read)]]
  [[Node: Mean_1/_117 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_255_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
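The shape in the error message tells how much memory the failing allocation needs. Feeding all 5,000 validation images at once makes conv2 produce a [5000, 28, 28, 64] tensor of DT_FLOAT, at 4 bytes per element. A back-of-the-envelope calculation, ignoring every other tensor in the graph (the helper name is illustrative):

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor with the given shape (float32 by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


full_validation = tensor_bytes([5000, 28, 28, 64])  # all validation images at once
one_batch = tensor_bytes([50, 28, 28, 64])          # batch_size = 50, as in training
print(full_validation / 1e9, "GB for the whole validation set")
print(one_batch / 1e6, "MB for one batch of 50")
```

That is roughly 1 GB for this single activation when the full set is fed, against roughly 10 MB per batch of 50.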
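The usual fix for this kind of problem is to split the dataset into smaller batches before training or testing. For the problem above, replace these two statements: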
feeds = {x: valimg, y: vallabel, is_training: False}
val_acc = sess.run(accr, feed_dict=feeds)
with the following block:
total_batch_val = int(valimg.shape[0]/batch_size)
print("Computing validation accuracy in %d batches" % total_batch_val)
val_acc_sum = 0.0
for j in range(total_batch_val):
    # Slice up to shape[0], not shape[0]-1, so the last sample is kept
    lo = j*batch_size
    hi = min((j+1)*batch_size, valimg.shape[0])
    feeds = {x: valimg[lo:hi],
             y: vallabel[lo:hi],
             is_training: False}
    val_acc = sess.run(accr, feed_dict=feeds)
    val_acc_sum = val_acc_sum + val_acc
val_acc = val_acc_sum/total_batch_val
# end of modified code
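The same idea can be wrapped in a helper that also copes with a dataset size that is not a multiple of batch_size, by weighting each batch's accuracy by the number of samples in it. This is only a sketch: accuracy_fn is a hypothetical callback standing in for sess.run(accr, feed_dict=...), and neither the helper nor the toy stand-in model appears in the original notebook.

```python
def batched_accuracy(images, labels, batch_size, accuracy_fn):
    """Average accuracy over a dataset, evaluated one slice at a time."""
    total = len(images)
    weighted_sum = 0.0
    for start in range(0, total, batch_size):
        end = min(start + batch_size, total)
        batch_acc = accuracy_fn(images[start:end], labels[start:end])
        # Weight by batch length so a short final batch counts proportionally.
        weighted_sum += batch_acc * (end - start)
    return weighted_sum / total


def toy_accuracy(images, labels):
    """Toy stand-in for the model: a sample is 'correct' when image == label."""
    hits = sum(1 for img, lab in zip(images, labels) if img == lab)
    return hits / len(images)


toy_images = list(range(10))
toy_labels = [0, 1, 2, 3, 4, 5, 6, 0, 0, 0]  # 7 of the 10 entries match
result = batched_accuracy(toy_images, toy_labels, batch_size=4,
                          accuracy_fn=toy_accuracy)
print(result)
```

With the notebook's names, the callback could be something like lambda xs, ys: sess.run(accr, feed_dict={x: xs, y: ys, is_training: False}), and the same helper would serve for the test set.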