READING NOTE: Object Detection by Labeling Superpixels
2015-12-05 17:00
TITLE: Object Detection by Labelling Superpixels
AUTHOR: Yan, Junjie and Yu, Yinan and Zhu, Xiangyu and Lei, Zhen and Li, Stan Z.
FROM: CVPR2015
CONTRIBUTIONS
Converts the object detection problem into a super-pixel labelling problem, which avoids false negatives caused by inappropriate proposals and takes advantage of global context.
Constructs an energy function that considers appearance, spatial context and the number of labels.
METHOD
![](http://joshua881228.webfactional.com/media/uploads/ReadingNote/CVPR2015_ODLS/ODLS.png)
The image is partitioned into a set of super-pixels, denoted as $P=\{p_1,p_2,\dots,p_N\}$.
An energy function $E(L)$ measures the cost of a label configuration $L=\{l_1,l_2,\dots,l_N\}$, which assigns one label $l_i$ to each super-pixel $p_i$.
Detection is thereby transformed into selecting the $L$ that minimises $E(L)$.
SOME DETAILS
The energy function is constructed as
$$E(L)=\sum_{p_i\in P}D(l_i,p_i)+\sum_{(p_i,p_j)\in N}V(l_i,l_j,p_i,p_j)+C(L)$$
where $D(l_i,p_i)$ is the data cost, which captures the appearance of $p_i$ and measures the cost of assigning it label $l_i$; $V(l_i,l_j,p_i,p_j)$ is the pairwise smooth cost over the local neighbourhood $N$; and $C(L)$ is the label cost, which encourages compact detections by penalising the number of labels.
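To make the three-term structure of $E(L)$ concrete, here is a minimal Python sketch that evaluates the energy for a given labelling and minimises it by exhaustive search on a toy problem. All function names, the toy cost tables and the brute-force search are assumptions made for illustration; in particular the toy pairwise penalty is a placeholder, not the $V$ defined in this note, and exhaustive search is only feasible for a handful of super-pixels.

```python
from itertools import product


def total_energy(labels, data_cost, smooth_cost, label_cost, neighbours):
    """E(L) = sum_i D(l_i, p_i) + sum_{(i,j) in N} V(l_i, l_j, p_i, p_j) + C(L).

    Super-pixels are referred to by index; `neighbours` lists index pairs (i, j).
    """
    unary = sum(data_cost(l, i) for i, l in enumerate(labels))
    pairwise = sum(smooth_cost(labels[i], labels[j], i, j) for i, j in neighbours)
    return unary + pairwise + label_cost(labels)


def brute_force_minimise(n, label_set, data_cost, smooth_cost, label_cost, neighbours):
    """Try every labelling of n super-pixels (toy sizes only) and keep the cheapest."""
    best = min(
        product(label_set, repeat=n),
        key=lambda L: total_energy(L, data_cost, smooth_cost, label_cost, neighbours),
    )
    return list(best), total_energy(best, data_cost, smooth_cost, label_cost, neighbours)


# Hypothetical toy problem: 3 super-pixels in a chain, labels 0 (background) and 1.
TOY_DATA = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]]  # TOY_DATA[i][l], made-up values
TOY_NEIGHBOURS = [(0, 1), (1, 2)]


def toy_data_cost(label, i):
    return TOY_DATA[i][label]


def toy_smooth_cost(li, lj, i, j):
    # Placeholder pairwise penalty chosen for the toy, not the note's V.
    return 0.25 if li != lj else 0.0


def toy_label_cost(labels):
    # Flat cost of 0.1 per distinct non-background label in use.
    return 0.1 * len(set(labels) - {0})


if __name__ == "__main__":
    print(brute_force_minimise(3, (0, 1), toy_data_cost, toy_smooth_cost,
                               toy_label_cost, TOY_NEIGHBOURS))
```

Passing the three costs as callables keeps the evaluation independent of how each term is defined, so any of them can be swapped without touching `total_energy`.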
Data Cost
Super-pixels usually do not carry enough semantic information on their own, so the regions that contain them are classified and the resulting costs are propagated to the super-pixels. In this work, RCNN is used to generate and classify semantic regions. The region set of $T$ elements is denoted as $R=\{r_1,\dots,r_T\}$ and the classifier score of region $r_t$ is $s_t$. The scores are mapped into $(0,1)$ by
$$D(l_t,r_t)=\begin{cases}\dfrac{1}{1+\exp(-\alpha\cdot s_t)} & \text{if } l_t>0\\[2ex]\dfrac{\exp(-\alpha\cdot s_t)}{1+\exp(-\alpha\cdot s_t)} & \text{if } l_t=0\end{cases}$$
where $\alpha$ is set to 1.5 empirically. For each super-pixel, the data cost is the weighted sum of the $T$ smallest costs,
$$D(l_i,p_i)=\sum_{t=1}^{T}\omega^{d}_{t}\cdot D(l_t,R(p_i)_t)$$
where $R(p_i)_t$ is the region containing $p_i$ that has the $t$-th smallest cost.
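The two steps of the data cost (mapping a region score into $(0,1)$, then combining the smallest region costs per super-pixel) can be sketched as follows. The function names and the example weights are hypothetical, and the two-case mapping is transcribed exactly as it appears in this note, so it is worth checking against the paper before reuse.

```python
import math

ALPHA = 1.5  # value reported in the note


def region_cost(score, label, alpha=ALPHA):
    """Map a classifier score into (0, 1) using the two-case formula as written in this note."""
    e = math.exp(-alpha * score)  # may overflow for very negative scores; fine for a sketch
    if label > 0:
        return 1.0 / (1.0 + e)
    return e / (1.0 + e)


def superpixel_data_cost(region_costs, weights):
    """Weighted sum of the len(weights) smallest costs among the regions covering a super-pixel.

    `weights` plays the role of the omega^d_t terms; its values are assumptions here.
    If fewer regions than weights are available, zip() silently uses only those present.
    """
    smallest = sorted(region_costs)[:len(weights)]
    return sum(w * c for w, c in zip(weights, smallest))


if __name__ == "__main__":
    costs = [region_cost(s, label=1) for s in (-1.0, 0.0, 2.0)]
    print(superpixel_data_cost(costs, weights=[0.6, 0.4]))
```

Note that the two cases always sum to 1 for a given score, so the mapping behaves like a two-class split of a sigmoid.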
Smooth Cost
The smooth cost is introduced because 1) adjacent super-pixels often have the same label and 2) super-pixels with the same label should have similar appearance. It is measured by
$$V(l_i,l_j,p_i,p_j)=\omega^{s}_{l}V_l(l_i,l_j)+V_a(l_i,l_j,p_i,p_j)$$
where $V_l$ is a boolean variable that is set to 1 when $l_i=l_j$ and $(p_i,p_j)\in N$. $V_a$ is defined as
$$V_a(l_i,l_j,p_i,p_j)=\omega^{s}_{c}\Big(1-\sum_{q}\min(c^{q}_{i},c^{q}_{j})\Big)+\omega^{s}_{t}\Big(1-\sum_{q}\min(t^{q}_{i},t^{q}_{j})\Big)$$
where $c^{q}_{i}$ and $t^{q}_{i}$ are the values in the $q$-th bin of the color and texture histograms of super-pixel $p_i$. In this work, a color histogram and a SIFT histogram are computed to describe color and texture information.
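The appearance term $V_a$ is one minus a histogram intersection, once for color and once for texture. A minimal sketch, assuming each histogram is L1-normalised (bins sum to 1) so that the intersection lies in $[0,1]$; the function names and default weights are hypothetical:

```python
def histogram_intersection(h1, h2):
    """Sum over bins of min(h1[q], h2[q]); equals 1.0 for identical normalised histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def appearance_cost(color_i, color_j, texture_i, texture_j, w_color=1.0, w_texture=1.0):
    """V_a = w_color * (1 - color intersection) + w_texture * (1 - texture intersection).

    w_color and w_texture stand in for omega^s_c and omega^s_t; their values are assumptions.
    """
    return (w_color * (1.0 - histogram_intersection(color_i, color_j))
            + w_texture * (1.0 - histogram_intersection(texture_i, texture_j)))


if __name__ == "__main__":
    # Same color distribution, completely disjoint texture distribution.
    print(appearance_cost([0.5, 0.5], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]))
```

With normalised histograms the cost is 0 for identical appearance and grows towards `w_color + w_texture` as the two super-pixels share fewer bins.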
Label Cost
The label cost encourages a small number of labels and is defined as
$$C(L)=\sum_{i=1}^{K}\omega^{l}_{i}\cdot\delta(i,L)$$
where $\delta(\cdot)$ is defined as
$$\delta(i,L)=\begin{cases}1 & \text{if } i\in L\\0 & \text{otherwise}\end{cases}$$
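Because $\delta(i,L)$ only checks whether label $i$ is used at all, the label cost depends on the set of labels in use and not on how many super-pixels carry each one. A small sketch, with a hypothetical function name and made-up weights:

```python
def label_cost(labels, weights):
    """C(L) = sum_{i=1..K} w_i * delta(i, L).

    weights[i - 1] is the cost of using label i at least once (i = 1..K).
    Label 0 has no weight, matching the sum starting at i = 1.
    """
    used = set(labels)
    return sum(w for i, w in enumerate(weights, start=1) if i in used)


if __name__ == "__main__":
    # Labels 1 and 2 are in use, label 3 is not, so only the first two weights are paid.
    print(label_cost([0, 0, 2, 2, 1], weights=[0.3, 0.5, 0.7]))
```

Assigning more super-pixels to a label that is already in use leaves this term unchanged, which is what pushes the minimiser towards fewer distinct labels.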
ADVANTAGES
Super-pixels are compact and perceptually meaningful atomic regions of an image.
Avoids false negatives caused by inappropriate proposals generated by algorithms such as Selective Search and BING.
The super-pixel based method is a trade-off between pixel based and proposal based algorithms, leading to accurate and fast results.
DISADVANTAGES
The CNN used in RCNN and the parameters of the energy function are learned separately.
The generated regions might not cover all the super-pixels.
Time consumption is high. The speed is 1 fps per 128 proposals on an NVIDIA Tesla K40 GPU. However, 128 proposals might not be enough.