Paper Notes: Faster RCNN
2016-06-17 12:41
PART I. RCNN
Paper link: http://arxiv.org/pdf/1311.2524v5.pdf
Github link: https://github.com/rbgirshick/rcnn
Detailed notes: http://zhangliliang.com/2014/07/23/paper-note-rcnn/
(Covers CNN training; positive/negative sample definitions; performance of using different layers as feature maps; bounding-box regression, etc.)
Notes:
1. Architecture
2. Advantages & disadvantages:
a. Uses learned CNN features instead of traditional hand-crafted features.
b. Each of the ~2000 proposals is run through the CNN separately ==> computationally costly ==> motivates SPP net.
PART II. SPP net (spatial pyramid pooling)
Paper link: http://arxiv.org/pdf/1406.4729v4.pdf
Github link: https://github.com/ShaoqingRen/SPP_net
Detailed notes: http://zhangliliang.com/2014/09/13/paper-note-sppnet/
Notes:
1. Motivation: a CNN requires a fixed input size ⇒ cropping/warping loses information ⇒ in fact, only the FC layers need a fixed-size input ⇒ construct an SPP layer
that transforms conv outputs of varying size into a fixed-size FC input ⇒ application to RCNN: the conv layers are shared across all proposals, reducing cost.
2. Architecture:
Input the whole image ⇒ conv layers produce feature maps (e.g., 256 channels) ⇒ project the proposal regions onto the feature map (how? discussed in the detailed notes) ⇒ SPP layer:
for each proposal, apply pooling kernels of different sizes to get 4x4, 2x2, and 1x1 outputs (3 pyramid levels) and concatenate them into one vector (16 + 4 + 1 = 21 bins per channel) (how to calculate window size and stride? Paper Section
2.3) ⇒ FC + SVM + regression
3. Advantages: faster (about 24x); the multiple pyramid levels extract information from the image at several scales, giving higher accuracy.
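The window-size/stride rule from Section 2.3 of the paper can be sketched as follows (a minimal sketch; the 13x13 feature-map size is illustrative, not from these notes):

```python
import math

def spp_params(a, n):
    """Window size and stride for one n x n pyramid level on an a x a
    feature-map region (SPP-net paper, Sec. 2.3):
    window = ceil(a / n), stride = floor(a / n)."""
    return math.ceil(a / n), math.floor(a / n)

# Example: a 13x13 conv5 feature map (illustrative size).
for n in (4, 2, 1):
    win, stride = spp_params(13, n)
    print(f"{n}x{n} level: window={win}, stride={stride}")

# Per channel the three levels give 16 + 4 + 1 = 21 pooled values, so with
# 256 feature maps the fixed-length vector fed to the FC layers is 21 * 256.
```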
PART III. FAST RCNN
Paper link: http://arxiv.org/pdf/1504.08083v2.pdf
Github link: https://github.com/rbgirshick/fast-rcnn
Detailed notes: http://zhangliliang.com/2015/05/17/paper-note-fast-rcnn/
Notes:
Motivation: bring SPP into RCNN (as the RoI pooling layer); fold classification (softmax in place of the per-class SVMs) and bounding-box regression into one jointly trained network.
Architecture:
RoI pooling layer: a single-level SPP layer;
Multi-task loss layer: L(p, u, t^u, v) = L_cls(p, u) + lambda * [u >= 1] * L_loc(t^u, v),
where u is the true class (u = 0 for background), v is the ground-truth regression target, p is the predicted probability vector, and t^u = (t_x, t_y, t_w, t_h) are the predicted box offsets for class u.
Classification loss (L_cls): softmax log loss over N+1 classes (one extra class for background).
Regression loss (L_loc): smooth-L1 loss on 4N regression outputs (for each class: t_x, t_y, t_w, t_h); the indicator [u >= 1] switches it off for background RoIs.
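A minimal numpy sketch of this multi-task loss for a single RoI (function names and the lambda default are my own; smooth L1 is the paper's robust regression loss):

```python
import numpy as np

def smooth_l1(x):
    # Fast RCNN's robust regression loss: 0.5*x^2 if |x| < 1, else |x| - 0.5
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x * x, ax - 0.5)

def multitask_loss(p, u, t_u, v, lam=1.0):
    """L = -log p[u] + lam * [u >= 1] * sum_i smooth_l1(t_u[i] - v[i]).
    p: (N+1,) class probabilities; u: true class index (0 = background);
    t_u: (4,) predicted offsets for class u; v: (4,) ground-truth targets."""
    l_cls = -np.log(p[u])
    if u >= 1:  # regression loss only for foreground RoIs
        diff = np.asarray(t_u, dtype=float) - np.asarray(v, dtype=float)
        l_loc = smooth_l1(diff).sum()
    else:
        l_loc = 0.0
    return l_cls + lam * l_loc
```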
PART IV. FASTER RCNN
Paper link: http://arxiv.org/pdf/1506.01497v3.pdf
Github link: https://github.com/ShaoqingRen/faster_rcnn
Notes:
1. Motivation: fast RCNN still uses a separate pipeline (e.g., selective search) to generate proposals ⇒ proposal generation can itself be done by a CNN (the Region
Proposal Network, RPN), sharing conv features with the detector.
2. Architecture of RPN
The whole image is fed into a CNN (any backbone) ==> last conv feature map ==> a 3x3 sliding window looks at each position's neighborhood, and each sliding position has 9
(3 aspect ratios * 3 scales) anchors (9 proposals) ==> W*H*9 anchors altogether ⇒ 1x1 convs (mixing channel information) produce a feature vector per position ==> cls and reg loss layers
Multi-task loss layer: L({p_i}, {t_i}) = (1/N_cls) * sum_i L_cls(p_i, p_i*) + lambda * (1/N_reg) * sum_i p_i* * L_reg(t_i, t_i*), where p_i* = 1 for positive anchors and 0 otherwise, so the regression term only fires for positive anchors.
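The 9 anchors per sliding position can be sketched as below; the base size of 16 and the rounding scheme follow the released py-faster-rcnn code, so treat the exact coordinates as an assumption rather than part of these notes:

```python
import numpy as np

def make_anchors(base=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """The 9 anchors (3 aspect ratios x 3 scales) for one sliding position,
    as (x1, y1, x2, y2) boxes centered on a base x base cell."""
    anchors = []
    cx = cy = (base - 1) / 2.0
    for r in ratios:
        # keep the base area roughly constant while changing aspect ratio
        w = round((base * base / r) ** 0.5)
        h = round(w * r)
        for s in scales:
            ws, hs = w * s, h * s
            anchors.append([cx - (ws - 1) / 2.0, cy - (hs - 1) / 2.0,
                            cx + (ws - 1) / 2.0, cy + (hs - 1) / 2.0])
    return np.array(anchors)

A = make_anchors()
# Shifting these 9 boxes by the network stride at every sliding position of
# a W x H feature map yields the W*H*9 anchors noted above.
```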
3. Training process of RPN and fast RCNN
Phase 1: train the RPN ⇒ generate proposals ⇒ feed them to fast RCNN and train it.
Phase 2: copy the detector's conv weights into the RPN's conv layers, freeze the now shared conv layers (learning rate = 0), train only the RPN's FC and loss layers ⇒ feed the new proposals
to fast RCNN and fine-tune only its FC layers.
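The two phases above can be sketched with toy stand-ins (make_net/train are illustrative, not the paper's code); the point is only which layers receive updates in each phase:

```python
# Toy sketch of the two-phase alternating training; train() just reports
# which layers would receive gradient updates.

def make_net(layer_names):
    return {n: {"weights": 0, "trainable": True} for n in layer_names}

def train(net):
    return [n for n, layer in net.items() if layer["trainable"]]

rpn = make_net(["conv", "rpn_head", "rpn_loss"])
detector = make_net(["conv", "fc", "loss"])

# Phase 1: train the RPN from scratch, then train fast RCNN on its proposals.
updated_phase1 = train(rpn) + train(detector)

# Phase 2: copy the detector's conv weights into the RPN, freeze the now
# shared conv layers (learning rate = 0), and fine-tune only the RPN head,
# then only the detector's FC layers, on the regenerated proposals.
rpn["conv"]["weights"] = detector["conv"]["weights"]
rpn["conv"]["trainable"] = detector["conv"]["trainable"] = False
updated_phase2 = train(rpn) + train(detector)
# "conv" no longer appears in updated_phase2: both networks now share it frozen.
```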