Fine-Grained Recognition with Automatic and Efficient Part Attention
2017-03-20 08:13
Source: CVPR 2016
Author affiliation: Baidu Research
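The challenge of fine-grained classification lies in small inter-class differences versus large intra-class differences. The key to solving the problem is therefore to localize discriminative parts and extract pose-invariant features. This paper proposes Fully Convolutional Attention Networks (FCANs), which use a reinforcement learning framework to adaptively select discriminative local regions for different fine-grained domains. The main advantages of the approach are the following four points: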
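1) It combines three components (feature extraction, visual attention, and fine-grained classification) and trains them jointly, so the model is end-to-end;
2) It uses weakly supervised reinforcement learning and needs no additional part annotations;
3) The fully convolutional architecture speeds up both training and testing;
4) A greedy reward strategy accelerates convergence.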
The proposed FCANs consist of three components: the feature component, the attention component, and the classification component.
Feature map extraction:
Fully convolutional Part Attention:
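This component localizes different parts by generating a number of part score maps from the basis convolutional feature maps. Each score map is produced by two convolutional layers followed by a spatial softmax layer. The first convolutional layer uses 64 3x3 kernels; the second uses a single 3x3 kernel and outputs a single-channel confidence map. The spatial softmax layer converts the confidence map into probabilities. At test time, the model takes the attention region with the highest probability as the part location.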
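The score-map pipeline described above can be sketched in plain Python to make the data flow concrete. This is a minimal illustration, not the authors' implementation: the helper names (`conv3x3`, `spatial_softmax`, `part_score_map`, `argmax_location`), the zero padding, the ReLU between the two convolutions, and the random weights are all assumptions that the summary does not specify.

```python
import math
import random

def conv3x3(maps, kernels):
    """Zero-padded 3x3 convolution: sums over input channels, returns one H x W map."""
    h, w = len(maps[0]), len(maps[0][0])
    out = [[0.0] * w for _ in range(h)]
    for m, k in zip(maps, kernels):
        for y in range(h):
            for x in range(w):
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            out[y][x] += m[yy][xx] * k[dy + 1][dx + 1]
    return out

def spatial_softmax(conf):
    """Normalise a confidence map so that all H x W entries sum to 1."""
    peak = max(max(row) for row in conf)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in conf]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]

def part_score_map(features, w1, w2):
    """features: C maps; w1: 64 filters of C kernels each; w2: one filter of 64 kernels."""
    hidden = [conv3x3(features, f) for f in w1]
    # ReLU between the two layers is an assumption, not stated in the summary.
    hidden = [[[max(0.0, v) for v in row] for row in m] for m in hidden]
    return spatial_softmax(conv3x3(hidden, w2))

def argmax_location(prob):
    """Test-time rule: pick the (row, col) with the highest probability."""
    cells = [(y, x) for y in range(len(prob)) for x in range(len(prob[0]))]
    return max(cells, key=lambda p: prob[p[0]][p[1]])

def rand_kernel(rng):
    """Random 3x3 kernel standing in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]

rng = random.Random(0)
C, H, W = 2, 5, 5  # toy sizes chosen for illustration only
features = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rand_kernel(rng) for _ in range(C)] for _ in range(64)]
w2 = [rand_kernel(rng) for _ in range(64)]

prob = part_score_map(features, w1, w2)
location = argmax_location(prob)
print(location)
```

In a real model one such two-layer head would exist per part, all sharing the same basis feature maps.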
Fine-Grained Classification:
The classification component contains a convolutional network for each part as well as the whole image. The classifier for each part is a fully convolutional layer followed by a softmax layer. The final prediction is the mean of the scores of all individual classifiers.
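The score averaging can be illustrated with a short sketch. The logits below are made-up numbers for a three-class toy problem with one whole-image classifier and two part classifiers; `softmax` and `average_scores` are hypothetical helper names, not taken from the paper.

```python
import math

def softmax(logits):
    """Convert one classifier's logits into class probabilities."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def average_scores(per_classifier_logits):
    """Average the softmax outputs of the whole-image and per-part classifiers."""
    probs = [softmax(row) for row in per_classifier_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]

# Hypothetical logits for 3 classes.
logits = [
    [2.0, 1.0, 0.1],  # whole-image classifier
    [0.5, 2.5, 0.3],  # part 1 classifier
    [0.2, 2.0, 0.1],  # part 2 classifier
]
final = average_scores(logits)
predicted_class = max(range(len(final)), key=final.__getitem__)
print(predicted_class)
```

Here the whole-image classifier prefers class 0, but both part classifiers favour class 1 strongly enough that the averaged score selects class 1.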
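The whole attention problem can be viewed as a Markov Decision Process (MDP). During each time step of the MDP, the FCANs work as an agent that performs an action based on the observation and receives a reward. In this paper, the action corresponds to the location of the attention region, the observation corresponds to the input image together with the crops of the attention regions, and the reward corresponds to the classification score obtained using the attention region.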
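One step of this MDP can be sketched as follows. The summary does not say which policy-gradient estimator is used, so this example assumes a REINFORCE-style update, in which the gradient of `reward * log prob[action]` with respect to the pre-softmax confidence map is `reward * (one_hot(action) - prob)`. The `classification_reward` function is a hypothetical stand-in for the classifier score on the cropped region, and the 2x2 probability map is made up.

```python
import random

def sample_location(prob, rng):
    """Training-time action: draw a (row, col) from the attention probability map."""
    cells = [(y, x) for y in range(len(prob)) for x in range(len(prob[0]))]
    weights = [prob[y][x] for y, x in cells]
    return rng.choices(cells, weights=weights, k=1)[0]

def reinforce_grad(prob, action, reward):
    """Gradient of reward * log prob[action] w.r.t. the pre-softmax confidence map.

    For a softmax policy this equals reward * (one_hot(action) - prob).
    """
    return [[reward * ((1.0 if (y, x) == action else 0.0) - p)
             for x, p in enumerate(row)]
            for y, row in enumerate(prob)]

def classification_reward(location):
    """Hypothetical reward: pretend only the crop at (1, 1) is classified correctly."""
    return 1.0 if location == (1, 1) else 0.0

rng = random.Random(0)
prob = [[0.1, 0.2], [0.3, 0.4]]           # observation-dependent attention map (made up)

action = sample_location(prob, rng)       # action: location of the attention region
reward = classification_reward(action)    # reward: score obtained with that region
grad = reinforce_grad(prob, action, reward)
```

Ascending this gradient raises the confidence at sampled locations that earned a reward and lowers it elsewhere, which is how the attention component can learn without part annotations.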