Paper Reading: Learning Visual Question Answering by Bootstrapping Hard Attention
Google DeepMind ECCV-2018
2018-08-05 19:24:44
Paper: https://arxiv.org/abs/1808.00300
Introduction:
This paper attempts to use hard attention alone to pick out the most useful features for learning the VQA task.
Soft Attention:
Existing attention models are predominantly based on soft attention, in which all information is adaptively re-weighted before being aggregated. This can improve accuracy by isolating important information and avoiding interference from unimportant information.
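The re-weighting described above can be sketched in a few lines. This is a minimal illustration, not the paper's model: the function name `soft_attention`, the toy `features`, and the relevance `scores` are all assumptions made for the example. Every feature vector contributes to the output, scaled by its softmax weight.

```python
import math

def soft_attention(features, scores):
    """Aggregate ALL feature vectors, weighted by softmax(scores).

    features: list of equal-length feature vectors (one per image region).
    scores:   one hypothetical relevance score per region.
    """
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum over regions, computed per feature dimension.
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights

# Toy input (assumed): three regions with 2-D features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 2.0]
pooled, weights = soft_attention(features, scores)
```

Note that even the low-scoring regions still receive a nonzero weight, so all of them have to be processed.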
Hard Attention:
It has the potential to improve accuracy and learning efficiency by focusing computation on the important parts of an image. But beyond this, it offers better computational efficiency because it only fully processes the information deemed most relevant.
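For contrast with soft attention, a hard selection keeps only a subset of the features and discards the rest. The sketch below is one possible form of it, assuming a top-k rule over hypothetical per-region `scores`; it is not the paper's selection mechanism, which these notes have not covered yet. The name `hard_attention` and the toy inputs are likewise made up for illustration.

```python
def hard_attention(features, scores, k):
    """Keep only the k highest-scoring feature vectors.

    Returns the kept features (in their original order) and their indices.
    Anything not selected is dropped and never processed downstream.
    """
    # Indices of the k largest scores.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Restore the original spatial order of the survivors.
    kept = sorted(top)
    return [features[i] for i in kept], kept

# Toy input (assumed): four regions, keep the best two.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
scores = [0.1, 0.9, 0.3, 0.7]
selected, kept = hard_attention(features, scores, k=2)
# kept is [1, 3]
```

Only `k` vectors reach the later layers, which is where the computational saving comes from.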
However, hard attention has a serious drawback: because the choice of which information to process is discrete and thus non-differentiable, gradients cannot be backpropagated into the selection mechanism to support gradient-based optimization. Policy-gradient methods can work around the non-differentiability by sampling, but this remains a very active area of research.
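To make the sampling-based workaround concrete, here is a small sketch of the score-function (REINFORCE) estimator for a single discrete choice. It is a generic illustration of the policy-gradient idea, not the method this paper uses; the helper names, the toy `logits`, and the `rewards` are all assumptions. The estimator averages reward times the gradient of the log-probability of the sampled choice, and is compared against the exact gradient, which is cheap to compute in this tiny case.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_gradient(logits, rewards, num_samples, rng):
    """Monte Carlo estimate of d E[reward] / d logits via the score function."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(num_samples):
        # Discrete, non-differentiable step: sample which region to attend to.
        i = rng.choices(range(len(logits)), weights=probs)[0]
        for j in range(len(logits)):
            # d log p_i / d logit_j = 1[i == j] - p_j
            grad[j] += rewards[i] * ((1.0 if i == j else 0.0) - probs[j])
    return [g / num_samples for g in grad]

def exact_gradient(logits, rewards):
    """Closed form for comparison: p_j * (r_j - E[r])."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]

# Toy setup (assumed): three regions, only region 0 yields a reward.
rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
rewards = [1.0, 0.0, 0.0]
estimate = reinforce_gradient(logits, rewards, 20000, rng)
exact = exact_gradient(logits, rewards)
```

The estimate is unbiased but noisy, so it needs many samples to get close to the exact value. That variance is one reason training with sampled hard attention is considered difficult.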
Approach Details:
To be updated.