
Paper Notes: Faster RCNN

2016-06-17 12:41 · 387 views
Part I. RCNN

Paper link: http://arxiv.org/pdf/1311.2524v5.pdf

Github link: https://github.com/rbgirshick/rcnn

Detailed notes: http://zhangliliang.com/2014/07/23/paper-note-rcnn/

(Covering CNN training; the definition of positive/negative samples; the performance of using different layers as feature maps; bounding-box regression; etc.)

Notes:

1. Architecture



2. Advantages & disadvantages:

a. Uses CNN feature extraction in place of traditional hand-crafted features.

b. Around 2000 proposals must each be fed through the CNN ⇒ computationally costly ⇒ motivates SPP-net.

Part II. SPP-net (Spatial Pyramid Pooling)

Paper link: http://arxiv.org/pdf/1406.4729v4.pdf

Github link: https://github.com/ShaoqingRen/SPP_net

Detailed notes: http://zhangliliang.com/2014/09/13/paper-note-sppnet/

Notes:

1. Motivation: CNNs require a fixed input size ⇒ cropping/warping the image loses information ⇒ in fact, only the FC layers need a uniform input size ⇒ so insert an SPP layer to transform conv outputs of any size into a fixed-size FC input ⇒ applied to RCNN: the conv layers are shared across all proposals, greatly reducing cost.

2. Architecture:



Input the whole image ⇒ conv layers produce feature maps (256 channels) ⇒ project each proposal region onto the feature map (how? discussed in the detailed notes) ⇒ SPP layer: for each proposal, apply pooling kernels of different sizes to get 4x4, 2x2, and 1x1 outputs (a 3-level pyramid) and concatenate them into one vector (16 + 4 + 1 = 21 bins) (how to compute window size and stride? Paper Sec. 2.3) ⇒ FC + SVM + regression
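The window/stride rule from Sec. 2.3 can be sketched in NumPy as follows (the function name and shapes are my own illustration, not the paper's Caffe implementation): for a pyramid level with n×n bins over a feature map of side a, use window = ceil(a/n) and stride = floor(a/n), so exactly n×n bins come out no matter what a is.

```python
import numpy as np
from math import ceil, floor

def spp_pool(feature_map, levels=(4, 2, 1)):
    """Spatial pyramid pooling over one proposal's feature map.

    feature_map: array of shape (C, H, W). For each pyramid level n,
    max-pool with window ceil(dim/n) and stride floor(dim/n) so that
    exactly n*n bins come out regardless of H and W (paper Sec. 2.3).
    Returns a fixed-length vector of size C * sum(n*n for n in levels).
    """
    C, H, W = feature_map.shape
    outputs = []
    for n in levels:
        wh, ww = ceil(H / n), ceil(W / n)    # window size per bin
        sh, sw = floor(H / n), floor(W / n)  # stride between bins
        for i in range(n):
            for j in range(n):
                patch = feature_map[:, i * sh:i * sh + wh,
                                       j * sw:j * sw + ww]
                outputs.append(patch.max(axis=(1, 2)))  # max-pool one bin
    return np.concatenate(outputs)
```

With 256 channels and levels (4, 2, 1) this always yields a 256 × 21 = 5376-dim vector, whether the proposal's feature map is 13×13 or 10×7.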

3. Advantages: faster (24x); the multiple pyramid levels extract information at different scales from the image, giving higher accuracy.

Part III. Fast RCNN

Paper link: http://arxiv.org/pdf/1504.08083v2.pdf

Github link: https://github.com/rbgirshick/fast-rcnn

Detailed notes: http://zhangliliang.com/2015/05/17/paper-note-fast-rcnn/

Notes:

Motivation: bring SPP into RCNN (the RoI pooling layer); fold the SVM classifier and bbox regression into the network as a jointly trained multi-task loss.

Architecture:



RoI pooling layer: a single-level SPP layer;
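As a sketch, RoI pooling is the SPP idea with one pyramid level: the RoI is split into a fixed grid of bins and each bin is max-pooled (the names and the 7×7 default below are illustrative; the actual Fast RCNN layer also handles the image-to-feature-map scale and backprop).

```python
import numpy as np
from math import ceil, floor

def roi_pool(feature_map, roi, out_size=(7, 7)):
    """Single-level SPP: split the RoI into a fixed grid of bins and
    max-pool each bin, so RoIs of any size give the same output shape.

    feature_map: (C, H, W) array; roi: (x0, y0, x1, y1) in feature-map
    coordinates (x1, y1 exclusive)."""
    c = feature_map.shape[0]
    x0, y0, x1, y1 = roi
    hb, wb = out_size
    rh, rw = y1 - y0, x1 - x0
    pooled = np.empty((c, hb, wb))
    for i in range(hb):
        for j in range(wb):
            ya = y0 + floor(i * rh / hb)
            yb = y0 + ceil((i + 1) * rh / hb)  # bins are never empty
            xa = x0 + floor(j * rw / wb)
            xb = x0 + ceil((j + 1) * rw / wb)
            pooled[:, i, j] = feature_map[:, ya:yb, xa:xb].max(axis=(1, 2))
    return pooled
```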

Multi-task loss layer:



where u is the true class, v is the true regression target, p is the predicted probability vector, and t = (deltax, deltay, deltaw, deltah) is the predicted box transform

Classification loss (Lcls): softmax loss over N+1 classes (1 extra for background)

Regression loss (Lloc): 4*N regression outputs (for each class: deltax, deltay, deltaw, deltah)
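The two terms combine as L(p, u, t, v) = Lcls(p, u) + λ·[u ≥ 1]·Lloc(t, v), with the paper's smooth-L1 used for the regression term; a minimal NumPy version (function names are mine) looks like:

```python
import numpy as np

def smooth_l1(x):
    """Fast RCNN's robust regression loss: 0.5*x^2 if |x| < 1, else |x| - 0.5."""
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

def multitask_loss(p, u, t_u, v, lam=1.0):
    """L = Lcls + lam * [u >= 1] * Lloc.

    p: softmax probabilities over N+1 classes (index 0 = background),
    u: true class index, t_u: predicted 4-vector (dx, dy, dw, dh) for
    class u, v: ground-truth regression targets."""
    l_cls = -np.log(p[u])                                  # softmax log loss
    l_loc = smooth_l1(np.asarray(t_u) - np.asarray(v)).sum()
    return l_cls + (lam * l_loc if u >= 1 else 0.0)        # background: no reg term
```

Note the indicator [u ≥ 1]: background RoIs (u = 0) contribute only the classification term, since there is no ground-truth box to regress to.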



Part IV. Faster RCNN

Paper link: http://arxiv.org/pdf/1506.01497v3.pdf

Github link: https://github.com/ShaoqingRen/faster_rcnn

Notes:

1. Motivation: Fast RCNN uses a separate pipeline for generating proposals from the one computing feature maps ⇒ proposal generation can itself be done by a CNN (a Region Proposal Network, RPN)

2. Architecture of RPN



The whole image is fed into a CNN (any benchmark backbone) ⇒ last conv layer ⇒ a 3x3 sliding window looks at each position's neighbourhood, and each sliding position gets 9 (3 ratios * 3 sizes) anchors (9 proposals) ⇒ W*H*9 anchors altogether ⇒ a 1x1 conv (mixing the channel information) produces a feature vector ⇒ cls and reg loss layers
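Anchor generation at one sliding position can be sketched as below. The scales 128/256/512 and ratios 1:2, 1:1, 2:1 are the paper's defaults; the area-preserving convention (keep w·h = scale² while setting h/w = ratio) is one common way to realise the ratios, and the exact centring here is my own choice.

```python
import numpy as np

def make_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate the 9 anchors (3 scales x 3 ratios) for one sliding
    position, centred at the origin. Each ratio r sets h/w = r while
    keeping the anchor's area at scale**2."""
    boxes = []
    for r in ratios:
        for s in scales:
            w = s / np.sqrt(r)          # w * h == s**2, h / w == r
            h = s * np.sqrt(r)
            boxes.append((-w / 2, -h / 2, w / 2, h / 2))
    return np.array(boxes)              # shape (9, 4): x0, y0, x1, y1
```

Shifting this one 9×4 template to every sliding position of a W×H feature map gives the W*H*9 anchors mentioned above.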

Multitask loss layer:



3. Training process of RPN and fast RCNN

Phase 1: train the RPN ⇒ generate proposals ⇒ feed them to Fast RCNN and train it.

Phase 2: copy Fast RCNN's conv weights into the RPN, freeze the shared conv layers (learning rate = 0) in both networks, and train only the RPN's FC and loss layers ⇒ feed the new proposals to Fast RCNN and train its FC layers.
Tags: cnn, rcnn