基于caffe的多标签端到端车牌识别
2017-12-20 11:49
435 查看
最近利用深度学习框架caffe做了一个端到端车牌识别的模型,所用的网络为AlexNet稍加了一些修改,由于数据集太少、网络结构不完善,效果并没有很好,但是一些基本的思路摸清楚了。
caffe自带的lmdb数据库没有存储多标签数据的功能,所以用hdf5对数据进行存储。车牌共有7个字符,第一个字符为汉字,第二个字符为字母,后面五个字符为数字和字母,所以将标签分为三类:31个汉字标签,24个字母标签,以及34个字母数字标签。
opencv读取的图片为H*W*C的形式,根据blob对图片reshape为C*H*W维度。
将图片存入h5文件的时候,需要加上图片的均值,由于caffe计算均值的文件是针对lmdb格式的,所以均值就自己写了一个:
最后要将h5文件读入到一个txt文件里,这个txt文件存储的就是h5文件的路径。
数据层用的是hdf5数据;
多标签分类,添加了slice层;
最后的输出根据标签数量的不同,输出的特征数有变化。
有两个问题需要总结下:
AlexNet在conv5之后又一个pool5层,但是到conv5之后,图片的边界已经不足以再做池化,所以去掉了pool5层;
第一次训练的时候忘记修改最后fc层输出的num_output参数,导致特征分类出现问题。七个fc层的num_output分别是31、24、34、34、34、34、34,分别对应各自标签的类别。
solver修改为如下所示,再修改相应的deploy网络结构,就可以开始训练了。
最后的测试结果并不好,还需要对网络结构进行修改和构建,数据集也需要重新扩充,后期会进行更新。
一、数据处理
所用的数据为EasyPR提供的数据集,共有1541张车牌图片,按照8:1:1的比例分为train、val和test三个数据集。(数据集太少,后期还会用GAN和真实车牌生成数据)caffe自带的lmdb数据库没有存储多标签数据的功能,所以用hdf5对数据进行存储。车牌共有7个字符,第一个字符为汉字,第二个字符为字母,后面五个字符为数字和字母,所以将标签分为三类:31个汉字标签,24个字母标签,以及34个字母数字标签。
opencv读取的图片为H*W*C的形式,根据blob对图片reshape为C*H*W维度。
# 读取labelmap labeldict_Chi = dict() labelmap_Chi = open("labelmap_Chinese.txt", "r") lines = labelmap_Chi.readlines() for line in lines: line = line.strip("\n") temp = line.split(" ") labeldict_Chi[temp[1]] = temp[0] labelmap_Chi.close() labeldict_Letter = dict() labelmap_Letter = open("labelmap_Letter.txt", "r") lines = labelmap_Letter.readlines() for line in lines: line = line.strip("\n") temp = line.split(" ") labeldict_Letter[temp[1]] = temp[0] labelmap_Letter.close() labeldict_NL = dict() labelmap_NL = open("labelmap_Num&Letter.txt", "r") lines = labelmap_NL.readlines() for line in lines: line = line.strip("\n") temp = line.split(" ") labeldict_NL[temp[1]] = temp[0] labelmap_NL.close() for i in range(2): imgs_file = os.listdir(IMG_FILE[i]) for img_file in imgs_file: filename = os.path.join(IMG_FILE[i], img_file) image = cv2.imread(filename) image = image.reshape(3, 36, 136) label = [] char_Chi = img_file[:3] label.append(int(labeldict_Chi[char_Chi])) char_Spe = img_file[3:4] label.append(int(labeldict_Letter[char_Spe])) for char in img_file[4:-4]: label.append(int(labeldict_NL[char])) IMAGE[i].append(image) LABEL[i].append(np.array(label)) IMAGE[i] = np.array(IMAGE[i], dtype=int) LABEL[i] = np.array(LABEL[i], dtype=int) meanValue = mean(IMAGE[0]) print "Creating hdf5..." 4000 with h5py.File(HDF5_FILE[i], "w") as h5: data = IMAGE[i].astype(np.float32) - meanValue label = LABEL[i].astype(np.float32) h5["data"] = data h5["label"] = label h5.close() print "Creating txt..." with open(TXT_FILE[i], "w") as txt: txt.write(os.path.join(os.getcwd(), HDF5_FILE[i])) txt.close() np.save("mean_e2e.npy", meanValue)
将图片存入h5文件的时候,需要加上图片的均值,由于caffe计算均值的文件是针对lmdb格式的,所以均值就自己写了一个:
# 计算均值 def mean(images): images = images.astype(np.float32) meanValue = sum(images) / len(images) return meanValue
最后要将h5文件读入到一个txt文件里,这个txt文件存储的就是h5文件的路径。
二、生成网络结构
网络结构使用的是AlexNet结构,对网络结构进行了一些修改:数据层用的是hdf5数据;
多标签分类,添加了slice层;
最后的输出根据标签数量的不同,输出的特征数有变化。
name: "Plate_E2E" layer { name: "data" type: "HDF5Data" top: "data" top: "label" include { phase: TRAIN } hdf5_data_param { source: "examples/plate_e2e/train_e2e.txt" batch_size: 10 } } layer { name: "data" type: "HDF5Data" top: "data" top: "label" include { phase: TEST } hdf5_data_param { source: "examples/plate_e2e/val_e2e.txt" batch_size: 10 } } layer { name: "slicers" type: "Slice" bottom: "label" top: "label_1" top: "label_2" top: "label_3" top: "label_4" top: "label_5" top: "label_6" top: "label_7" slice_param { axis: 1 slice_point: 1 slice_point: 2 slice_point: 3 slice_point: 4 slice_point: 5 slice_point: 6 } } layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 96 kernel_size: 11 stride: 4 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" } layer { name: "norm1" type: "LRN" bottom: "conv1" top: "norm1" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } } layer { name: "pool1" type: "Pooling" bottom: "norm1" top: "pool1" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 256 pad: 2 kernel_size: 5 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" } layer { name: "norm2" type: "LRN" bottom: "conv2" top: "norm2" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } } layer { name: "pool2" type: "Pooling" bottom: "norm2" top: "pool2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "conv3" type: "Convolution" bottom: "pool2" top: "conv3" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 384 pad: 1 kernel_size: 3 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "relu3" type: "ReLU" bottom: "conv3" top: "conv3" } layer { name: "conv4" type: "Convolution" bottom: "conv3" top: "conv4" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 384 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu4" type: "ReLU" bottom: "conv4" top: "conv4" } layer { name: "conv5" type: "Convolution" bottom: "conv4" top: "conv5" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu5" type: "ReLU" bottom: "conv5" top: "conv5" } layer { name: "fc6"< 1271d /span> type: "InnerProduct" bottom: "conv5" top: "fc6" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" } bias_filler { type: "constant" value: 0 } } } layer { name: "relu6" type: "ReLU" bottom: "fc6" top: "fc6" } layer { name: "drop6" type: "Dropout" bottom: "fc6" top: "fc6" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc7_1" type: "InnerProduct" bottom: "fc6" top: "fc7_1" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu7_1" type: "ReLU" bottom: "fc7_1" top: "fc7_1" } layer { name: "drop7_1" type: "Dropout" bottom: "fc7_1" top: "fc7_1" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc7_2" type: "InnerProduct" bottom: "fc6" top: "fc7_2" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu7_2" type: "ReLU" bottom: "fc7_2" top: "fc7_2" } layer { name: "drop7_2" type: "Dropout" bottom: "fc7_2" top: "fc7_2" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc7_3" type: "InnerProduct" bottom: "fc6" top: "fc7_3" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu7_3" type: "ReLU" bottom: "fc7_3" top: "fc7_3" } layer { name: "drop7_3" type: "Dropout" bottom: "fc7_3" top: "fc7_3" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc7_4" type: "InnerProduct" bottom: "fc6" top: "fc7_4" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu7_4" type: "ReLU" bottom: "fc7_4" top: "fc7_4" } layer { name: "drop7_4" type: "Dropout" bottom: "fc7_4" top: "fc7_4" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc7_5" type: "InnerProduct" bottom: "fc6" top: "fc7_5" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu7_5" type: "ReLU" bottom: "fc7_5" top: "fc7_5" } layer { name: "drop7_5" type: "Dropout" bottom: "fc7_5" top: "fc7_5" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc7_6" type: "InnerProduct" bottom: "fc6" top: "fc7_6" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu7_6" type: "ReLU" bottom: "fc7_6" top: "fc7_6" } layer { name: "drop7_6" type: "Dropout" bottom: "fc7_6" top: "fc7_6" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc7_7" type: "InnerProduct" bottom: "fc6" top: "fc7_7" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu7_7" type: "ReLU" bottom: "fc7_7" top: "fc7_7" } layer { name: "drop7_7" type: "Dropout" bottom: "fc7_7" top: "fc7_7" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc8_1" type: "InnerProduct" bottom: "fc7_1" top: "fc8_1" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 31 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "fc8_2" type: "InnerProduct" bottom: "fc7_2" top: "fc8_2" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 24 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "fc8_3" type: "InnerProduct" bottom: "fc7_3" top: "fc8_3" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 34 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "fc8_4" type: "InnerProduct" bottom: "fc7_4" top: "fc8_4" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 34 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "fc8_5" type: "InnerProduct" bottom: "fc7_5" top: "fc8_5" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 34 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "fc8_6" type: "InnerProduct" bottom: "fc7_6" top: "fc8_6" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 34 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "fc8_7" type: "InnerProduct" bottom: "fc7_7" top: "fc8_7" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 34 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "accuracy_1" type: "Accuracy" bottom: "fc8_1" bottom: "label_1" top: "accuracy_1" include { phase: TEST } } layer { name: "accuracy_2" type: "Accuracy" bottom: "fc8_2" bottom: "label_2" top: "accuracy_2" include { phase: TEST } } layer { name: "accuracy_3" type: "Accuracy" bottom: "fc8_3" bottom: "label_3" top: "accuracy_3" include { phase: TEST } } layer { name: "accuracy_4" type: "Accuracy" bottom: "fc8_4" bottom: "label_4" top: "accuracy_4" include { phase: TEST } } layer { name: "accuracy_5" type: "Accuracy" bottom: "fc8_5" bottom: "label_5" top: "accuracy_5" include { phase: TEST } } layer { name: "accuracy_6" type: "Accuracy" bottom: "fc8_6" bottom: "label_6" top: "accuracy_6" include { phase: TEST } } layer { name: "accuracy_7" type: "Accuracy" bottom: "fc8_7" bottom: "label_7" top: "accuracy_7" include { phase: TEST } } layer { name: "loss_1" type: "SoftmaxWithLoss" bottom: "fc8_1" bottom: "label_1" top: "loss_1" loss_weight: 0.2 } layer { name: "loss_2" type: "SoftmaxWithLoss" bottom: "fc8_2" bottom: "label_2" top: "loss_2" loss_weight: 0.2 } layer { name: "loss_3" type: "SoftmaxWithLoss" bottom: "fc8_3" bottom: "label_3" top: "loss_3" loss_weight: 0.2 } layer { name: "loss_4" type: "SoftmaxWithLoss" bottom: "fc8_4" bottom: "label_4" top: "loss_4" loss_weight: 0.2 } layer { name: "loss_5" type: "SoftmaxWithLoss" bottom: "fc8_5" bottom: "label_5" top: "loss_5" loss_weight: 0.2 } layer { name: "loss_6" type: "SoftmaxWithLoss" bottom: "fc8_6" bottom: "label_6" top: "loss_6" loss_weight: 0.2 } layer { name: "loss_7" type: "SoftmaxWithLoss" bottom: "fc8_7" bottom: "label_7" top: "loss_7" loss_weight: 0.2 }
有两个问题需要总结下:
AlexNet在conv5之后又一个pool5层,但是到conv5之后,图片的边界已经不足以再做池化,所以去掉了pool5层;
第一次训练的时候忘记修改最后fc层输出的num_output参数,导致特征分类出现问题。七个fc层的num_output分别是31、24、34、34、34、34、34,分别对应各自标签的类别。
solver修改为如下所示,再修改相应的deploy网络结构,就可以开始训练了。
net: "examples/plate_e2e/train_val_e2e.prototxt" test_iter: 130 test_interval: 100 base_lr: 0.001 lr_policy: "step" gamma: 0.1 stepsize: 3000 display: 10 max_iter: 15000 momentum: 0.9 weight_decay: 0.0005 snapshot: 3000 snapshot_prefix: "examples/plate_e2e/plate_e2e" solver_mode: GPU
三、训练
#!/usr/bin/env sh TOOLS=./build/tools $TOOLS/caffe train --solver=examples/plate_e2e/solver_e2e.prototxt -gpu all
四、测试
def Test(img): # 加载模型 net = caffe.Net(deploy, caffe_model, caffe.TEST) # 注意可以调节预处理批次的大小 # 由于是处理一张图片,所以把原来的10张的批次改为1 net.blobs['data'].reshape(1, 3, 36, 136) # 加载图片到数据层 im = cv2.imread(img) images = np.array([im.reshape(3, 36, 136)], dtype=np.float32) meanValue = mean() net.blobs['data'].data[...] = images - meanValue # 前向计算 out = net.forward() # 预测分类 result = [] result.append(out['prob_1'].argmax()) result.append(out['prob_2'].argmax()) result.append(out['prob_3'].argmax()) result.append(out['prob_4'].argmax()) result.append(out['prob_5'].argmax()) result.append(out['prob_6'].argmax()) result.append(out['prob_7'].argmax()) return result
最后的测试结果并不好,还需要对网络结构进行修改和构建,数据集也需要重新扩充,后期会进行更新。
相关文章推荐
- 数字图像处理:基于MATLAB的车牌识别项目 标签: 图像处理matlab算法 2017-06-24 09:17 98人阅读 评论(0)
- 基于opencv的车牌识别(一)开章及任务详述
- 车牌识别中一种新的直接基于彩色图像的二值化简化方法
- 私有云车牌识别,基于服务器平台的车牌OCR识别服务程序
- 【OpenCV学习笔记】【教程翻译】一(基于SVM和神经网络的车牌识别概述)
- 基于模板匹配的车牌识别系统实例
- 基于百度AI实现 车牌识别
- 基于opencv的车牌识别系统
- Qt+Caffe+OpenCV——【一个基于VGG网络的人脸识别考勤系统】
- 基于动物标签识别的基础知识以及FDX-B协议与结构介绍。
- 基于模板匹配的车牌识别系统实例
- 【Caffe实践】基于Caffe的人脸识别实现
- 基于matlab的蓝色车牌定位与识别---分割
- 基于模板匹配的车牌识别系统实例
- HyperLPR - 一个基于深度学习的支持多种车牌的中文开源车牌识别框架
- 基于Caffe的人脸识别实现
- 基于OpenCV的车牌识别系统之三 ——字符分割与识别(川字分割)
- 深度学习Caffe实战笔记(2)用LeNet跑车牌识别数据
- 【深度学习】基于caffe的表情识别(五):使用python接口测试
- 基于模板匹配的车牌识别系统实例