Deep Learning: The Faster R-CNN Network
2017-12-21 11:10
Structure
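Front part
The front part of Faster R-CNN performs feature extraction on the input image. Two backbone networks are available: ZFNet with its trailing fully connected layers dropped, or VGG with its trailing fully connected layers dropped. The paper illustrates the first option (the reused part is the one inside the green box):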
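Middle part
The middle part of Faster R-CNN processes the feature map (the features extracted by the front part) along two branches:
Green box: [RPN (generate anchors -> preliminary classification and preliminary bounding-box regression -> filter the anchors down to proposals)] ->
Blue box: [proposal -> RoI].
Red box: [the feature map, passed through unchanged].
Both branches are finally handed to
Yellow box: [RoIPooling], which outputs RoIs of identical size.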
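To make the "RoIs of identical size" step concrete, here is a minimal NumPy sketch of RoI max pooling. It assumes a single (C, H, W) feature map and one RoI given in image pixels; the function name `roi_max_pool` and the rounding details are illustrative choices, not the Caffe `ROIPooling` implementation. The defaults (6 x 6 output, scale 1/16) are taken from the `roi_pool_conv5` layer in the prototxt further down.

```python
import numpy as np

def roi_max_pool(feature_map, roi, pooled_h=6, pooled_w=6, spatial_scale=1.0 / 16):
    """Max-pool one RoI of a (C, H, W) feature map into (C, pooled_h, pooled_w).

    roi is (x1, y1, x2, y2) in input-image pixels (hypothetical convention).
    """
    channels, height, width = feature_map.shape
    # Project the RoI from image coordinates onto the feature map.
    x1, y1, x2, y2 = [int(np.floor(c * spatial_scale + 0.5)) for c in roi]
    roi_w = max(x2 - x1 + 1, 1)
    roi_h = max(y2 - y1 + 1, 1)
    bin_h = roi_h / pooled_h
    bin_w = roi_w / pooled_w

    out = np.zeros((channels, pooled_h, pooled_w), dtype=feature_map.dtype)
    for ph in range(pooled_h):
        for pw in range(pooled_w):
            # Bin boundaries on the feature map, clipped to its extent.
            hs = min(max(int(np.floor(ph * bin_h)) + y1, 0), height)
            he = min(max(int(np.ceil((ph + 1) * bin_h)) + y1, 0), height)
            ws = min(max(int(np.floor(pw * bin_w)) + x1, 0), width)
            we = min(max(int(np.ceil((pw + 1) * bin_w)) + x1, 0), width)
            if he > hs and we > ws:  # empty bins stay at zero
                out[:, ph, pw] = feature_map[:, hs:he, ws:we].max(axis=(1, 2))
    return out

# Two RoIs of different sizes produce outputs of the same shape.
fmap = np.random.rand(256, 38, 50).astype(np.float32)
small = roi_max_pool(fmap, (0, 0, 159, 159))
large = roi_max_pool(fmap, (64, 32, 607, 400))
print(small.shape, large.shape)
```

Whatever the RoI size, the output grid is fixed, which is what lets the fully connected layers after it accept every proposal.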
Back part
A second round of classification and bounding-box regression further improves detection accuracy and produces the final detections:
Loss Computation
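Multi-task loss (from the notes "Faster R-CNN论文笔记——FR"):
The Fast R-CNN network has two sibling output layers, cls_score and bbox_pred. Both are fully connected layers, and training them jointly is what the paper calls multi-task.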
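① cls_score layer: used for classification. It outputs a (K+1)-dimensional array p giving the probabilities of the K object classes plus background. For each RoI (Region of Interest) it outputs a discrete probability distribution
$$p = (p_0, p_1, \ldots, p_K)$$
As usual, p is computed by a softmax over the K+1 outputs of a fully connected layer.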
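As a small illustration of how p is obtained from the K+1 raw scores, the sketch below applies a numerically stable softmax. The value K = 20 matches the 21-way `cls_score` layer in the prototxt further down; the random scores are placeholders, not real network outputs.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

K = 20                                   # object classes; index 0 is background
rng = np.random.default_rng(0)
scores = rng.normal(size=K + 1)          # placeholder fully connected outputs for one RoI
p = softmax(scores)
print(p.shape, float(p.sum()))
```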
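② bbox_pred layer: used to adjust the position of each proposal. It outputs the bounding-box regression offsets as a 4*K-dimensional array t, the translation and scaling parameters to apply for each of the K classes:
$$t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$$
Here k is the class index,
$t^k_x, t^k_y$ are the scale-invariant translation relative to the object proposal, and
$t^k_w, t^k_h$ are the width and height relative to the object proposal, in log space.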
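The following sketch shows one common way to realise this parameterization, the one from the R-CNN family of papers: translations are divided by the proposal's width and height, and sizes are compared in log space. Boxes are written as (center x, center y, width, height) to keep the arithmetic short; the helper names `encode_box` and `decode_box` are made up for this example.

```python
import numpy as np

def encode_box(proposal, target):
    """Offsets t = (t_x, t_y, t_w, t_h) that map `proposal` onto `target`."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return np.array([
        (gx - px) / pw,      # scale-invariant x translation
        (gy - py) / ph,      # scale-invariant y translation
        np.log(gw / pw),     # log-space width ratio
        np.log(gh / ph),     # log-space height ratio
    ])

def decode_box(proposal, t):
    """Apply offsets t to `proposal`, giving the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = t
    return np.array([px + tx * pw, py + ty * ph, pw * np.exp(tw), ph * np.exp(th)])

proposal = np.array([100.0, 80.0, 50.0, 40.0])
ground_truth = np.array([110.0, 76.0, 60.0, 30.0])
t = encode_box(proposal, ground_truth)
print(t)
print(decode_box(proposal, t))
```

Because the translation is divided by the proposal size and the scaling is a ratio, the same offsets describe the same relative correction for a small or a large proposal.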
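The loss_cls layer evaluates the classification loss, which is determined by the probability assigned to the true class u:
$$L_{cls}(p, u) = -\log p_u$$
The loss_bbox layer evaluates the localization loss of the detection box. It compares the predicted translation and scaling parameters for the true class,
$$t^u = (t^u_x, t^u_y, t^u_w, t^u_h)$$
with the true translation and scaling parameters
$$v = (v_x, v_y, v_w, v_h)$$
as follows:
$$L_{loc}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t^u_i - v_i)$$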
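where the smooth L1 loss function is:
$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$
The smooth L1 curve is shown in Figure 9 below. The authors chose it to make the loss more robust: compared with the L2 loss it is much less sensitive to outliers, and it bounds the magnitude of the gradient so that training is less likely to diverge.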
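A short numerical check of the robustness claim: the sketch below implements the plain smooth L1 form given in the text (not the `sigma` variant that appears in the prototxt) next to a 0.5·x² L2 loss, together with their gradients. The function names are chosen for this example only.

```python
import numpy as np

def smooth_l1(x):
    """0.5 * x^2 for |x| < 1, |x| - 0.5 otherwise (elementwise)."""
    x = np.asarray(x, dtype=np.float64)
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0, 0.5 * x ** 2, abs_x - 0.5)

def smooth_l1_grad(x):
    """Derivative of smooth L1: x inside (-1, 1), sign(x) outside."""
    return np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)

def l2(x):
    """Reference L2 loss 0.5 * x^2, whose gradient is x itself."""
    return 0.5 * np.asarray(x, dtype=np.float64) ** 2

errors = np.array([0.1, 0.5, 1.0, 3.0, 10.0])
for e in errors:
    print(f"x={e:5.1f}  smooth_l1={float(smooth_l1(e)):6.3f}  l2={float(l2(e)):7.3f}  "
          f"grad_smooth={float(smooth_l1_grad(e)):4.1f}  grad_l2={e:5.1f}")
```

For a residual of 10 the L2 gradient is 10 while the smooth L1 gradient stays at 1, which is the bounded-gradient behaviour described above.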
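The total loss is the weighted sum of the two (the localization loss is ignored when the class is background):
$$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda\,[u \geq 1]\,L_{loc}(t^u, v)$$
With u = 0 defined as the background class (the negative label), the Iverson bracket indicator [u ≥ 1] means that background proposals, i.e. negative samples, do not contribute to the regression loss, so no box regression is performed for them. λ controls the balance between the classification loss and the regression loss. In the Fast R-CNN paper, all experiments use λ = 1.
The Iverson bracket indicator function is:
$$[u \geq 1] = \begin{cases} 1 & \text{if } u \geq 1 \\ 0 & \text{otherwise} \end{cases}$$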
In the source code, bbox_loss_weights marks whether each bbox belongs to a given class.
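Putting the pieces together for a single RoI, the multi-task loss can be sketched as below. This is a per-sample illustration with made-up numbers, assuming p is already a softmax output and that `t_u` holds only the four offsets predicted for the true class; it does not reproduce the batching or weighting of the actual Caffe layers.

```python
import numpy as np

def smooth_l1(x):
    """0.5 * x^2 for |x| < 1, |x| - 0.5 otherwise (elementwise)."""
    x = np.asarray(x, dtype=np.float64)
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0, 0.5 * x ** 2, abs_x - 0.5)

def multitask_loss(p, u, t_u, v, lam=1.0):
    """L = L_cls(p, u) + lam * [u >= 1] * L_loc(t_u, v) for one RoI."""
    loss_cls = -np.log(p[u])                                   # log loss for the true class
    loss_loc = smooth_l1(np.asarray(t_u) - np.asarray(v)).sum()
    iverson = 1.0 if u >= 1 else 0.0                           # background RoIs skip regression
    return float(loss_cls + lam * iverson * loss_loc)

p = np.array([0.7, 0.2, 0.1])            # background + 2 classes (illustrative values)
t_u = np.array([0.5, 0.0, 0.0, 2.0])     # predicted offsets for the true class
v = np.zeros(4)                          # regression targets

print(multitask_loss(p, 0, t_u, v))      # background: classification term only
print(multitask_loss(p, 1, t_u, v))      # foreground: classification + localization
```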
Code
The authors' source, rbgirshick/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_end2end/train.prototxt:

name: "ZF"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 21"
  }
}

#========= conv1-conv5 ============

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 96
    kernel_size: 7
    pad: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 3
    alpha: 0.00005
    beta: 0.75
    norm_region: WITHIN_CHANNEL
    engine: CAFFE
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    kernel_size: 3
    stride: 2
    pad: 1
    pool: MAX
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 256
    kernel_size: 5
    pad: 2
    stride: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 3
    alpha: 0.00005
    beta: 0.75
    norm_region: WITHIN_CHANNEL
    engine: CAFFE
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    kernel_size: 3
    stride: 2
    pad: 1
    pool: MAX
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 384
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 384
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 256
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}

#========= RPN ============

layer {
  name: "rpn_conv/3x3"
  type: "Convolution"
  bottom: "conv5"
  top: "rpn/output"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 256
    kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "rpn_relu/3x3"
  type: "ReLU"
  bottom: "rpn/output"
  top: "rpn/output"
}

#layer {
#  name: "rpn_conv/3x3"
#  type: "Convolution"
#  bottom: "conv5"
#  top: "rpn_conv/3x3"
#  param { lr_mult: 1.0 }
#  param { lr_mult: 2.0 }
#  convolution_param {
#    num_output: 192
#    kernel_size: 3 pad: 1 stride: 1
#    weight_filler { type: "gaussian" std: 0.01 }
#    bias_filler { type: "constant" value: 0 }
#  }
#}
#layer {
#  name: "rpn_conv/5x5"
#  type: "Convolution"
#  bottom: "conv5"
#  top: "rpn_conv/5x5"
#  param { lr_mult: 1.0 }
#  param { lr_mult: 2.0 }
#  convolution_param {
#    num_output: 64
#    kernel_size: 5 pad: 2 stride: 1
#    weight_filler { type: "gaussian" std: 0.0036 }
#    bias_filler { type: "constant" value: 0 }
#  }
#}
#layer {
#  name: "rpn/output"
#  type: "Concat"
#  bottom: "rpn_conv/3x3"
#  bottom: "rpn_conv/5x5"
#  top: "rpn/output"
#}
#layer {
#  name: "rpn_relu/output"
#  type: "ReLU"
#  bottom: "rpn/output"
#  top: "rpn/output"
#}

layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 18   # 2(bg/fg) * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "rpn_bbox_pred"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 36   # 4 * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  bottom: "rpn_cls_score"
  top: "rpn_cls_score_reshape"
  name: "rpn_cls_score_reshape"
  type: "Reshape"
  reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } }
}
layer {
  name: 'rpn-data'
  type: 'Python'
  bottom: 'rpn_cls_score'
  bottom: 'gt_boxes'
  bottom: 'im_info'
  bottom: 'data'
  top: 'rpn_labels'
  top: 'rpn_bbox_targets'
  top: 'rpn_bbox_inside_weights'
  top: 'rpn_bbox_outside_weights'
  python_param {
    module: 'rpn.anchor_target_layer'
    layer: 'AnchorTargetLayer'
    param_str: "'feat_stride': 16"
  }
}
layer {
  name: "rpn_loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "rpn_cls_score_reshape"
  bottom: "rpn_labels"
  propagate_down: 1
  propagate_down: 0
  top: "rpn_cls_loss"
  loss_weight: 1
  loss_param {
    ignore_label: -1
    normalize: true
  }
}
layer {
  name: "rpn_loss_bbox"
  type: "SmoothL1Loss"
  bottom: "rpn_bbox_pred"
  bottom: "rpn_bbox_targets"
  bottom: 'rpn_bbox_inside_weights'
  bottom: 'rpn_bbox_outside_weights'
  top: "rpn_loss_bbox"
  loss_weight: 1
  smooth_l1_loss_param { sigma: 3.0 }
}

#========= RoI Proposal ============

layer {
  name: "rpn_cls_prob"
  type: "Softmax"
  bottom: "rpn_cls_score_reshape"
  top: "rpn_cls_prob"
}
layer {
  name: 'rpn_cls_prob_reshape'
  type: 'Reshape'
  bottom: 'rpn_cls_prob'
  top: 'rpn_cls_prob_reshape'
  reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } }
}
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rpn_rois'
#  top: 'rpn_scores'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}

#layer {
#  name: 'debug-data'
#  type: 'Python'
#  bottom: 'data'
#  bottom: 'rpn_rois'
#  bottom: 'rpn_scores'
#  python_param {
#    module: 'rpn.debug_layer'
#    layer: 'RPNDebugLayer'
#  }
#}

layer {
  name: 'roi-data'
  type: 'Python'
  bottom: 'rpn_rois'
  bottom: 'gt_boxes'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'rpn.proposal_target_layer'
    layer: 'ProposalTargetLayer'
    param_str: "'num_classes': 21"
  }
}

#========= RCNN ============

layer {
  name: "roi_pool_conv5"
  type: "ROIPooling"
  bottom: "conv5"
  bottom: "rois"
  top: "roi_pool_conv5"
  roi_pooling_param {
    pooled_w: 6
    pooled_h: 6
    spatial_scale: 0.0625 # 1/16
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "roi_pool_conv5"
  top: "fc6"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
    scale_train: false
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
    scale_train: false
  }
}
layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 21
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 84
    weight_filler { type: "gaussian" std: 0.001 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "cls_score"
  bottom: "labels"
  propagate_down: 1
  propagate_down: 0
  top: "cls_loss"
  loss_weight: 1
  loss_param {
    ignore_label: -1
    normalize: true
  }
}
layer {
  name: "loss_bbox"
  type: "SmoothL1Loss"
  bottom: "bbox_pred"
  bottom: "bbox_targets"
  bottom: 'bbox_inside_weights'
  bottom: 'bbox_outside_weights'
  top: "bbox_loss"
  loss_weight: 1
}
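Several constants in the prototxt above are tied to each other. As a quick sanity check, the sketch below recomputes them from the layer strides, the anchor count and the class count that appear in the file; the dictionary and function names are invented for this example and are not part of py-faster-rcnn.

```python
# Strides of the layers between `data` and `conv5`, copied from the prototxt.
LAYER_STRIDES = {
    "conv1": 2, "pool1": 2,
    "conv2": 2, "pool2": 2,
    "conv3": 1, "conv4": 1, "conv5": 1,
}
NUM_ANCHORS = 9    # from the "9(anchors)" comments in the RPN layers
NUM_CLASSES = 21   # from param_str "'num_classes': 21"

def derived_constants():
    """Recompute the values that the prototxt hard-codes."""
    feat_stride = 1
    for stride in LAYER_STRIDES.values():
        feat_stride *= stride                    # cumulative downsampling factor
    return {
        "feat_stride": feat_stride,              # param_str "'feat_stride': 16"
        "spatial_scale": 1.0 / feat_stride,      # roi_pooling_param spatial_scale
        "rpn_cls_score": 2 * NUM_ANCHORS,        # bg/fg score per anchor
        "rpn_bbox_pred": 4 * NUM_ANCHORS,        # 4 offsets per anchor
        "cls_score": NUM_CLASSES,                # one score per class incl. background
        "bbox_pred": 4 * NUM_CLASSES,            # 4 offsets per class
    }

for name, value in derived_constants().items():
    print(f"{name}: {value}")
```

Note that bbox_pred has 84 = 4 × 21 outputs in this file, so four regression values are allocated for the background class as well, whereas the description earlier counts 4*K.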