Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning
2017-06-11 10:18
Pingbo Pan, Zhongwen Xu, Yi Yang, Fei Wu, Yueting Zhuang
(Submitted on 11 Nov 2015)
Recently, deep learning approaches, especially deep Convolutional Neural Networks (ConvNets), have achieved overwhelming accuracy with fast processing speed for image classification. Incorporating temporal structure into deep ConvNets for video representation has become a fundamental problem for video content analysis. In this paper, we propose a new approach, namely the Hierarchical Recurrent Neural Encoder (HRNE), to exploit temporal information of videos. Compared to recent video representation inference approaches, this paper makes the following three contributions. First, our HRNE is able to efficiently exploit video temporal structure over a longer range by reducing the length of the input information flow and compositing multiple consecutive inputs at a higher level. Second, computation operations are significantly reduced while attaining more non-linearity. Third, HRNE is able to uncover temporal transitions between frame chunks with different granularities, i.e., it can model the temporal transitions between frames as well as the transitions between segments. We apply the new method to video captioning, where temporal information plays a crucial role. Experiments demonstrate that our method outperforms the state of the art on video captioning benchmarks. Notably, even using a single network with only the RGB stream as input, HRNE beats all the recent systems that combine multiple inputs, such as RGB ConvNet plus 3D ConvNet.
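The first contribution (shortening the input information flow by compositing consecutive inputs at a higher level) can be illustrated with a toy two-level recurrent encoder. This is only a sketch under assumptions that the abstract does not state: the chunks are non-overlapping, the recurrent unit is a plain element-wise tanh cell with fixed scalar weights, and the names `rnn_step`, `encode`, `hierarchical_encode`, `chunk_len`, `w_h`, and `w_x` are hypothetical. It is not the authors' implementation.

```python
import math
import random


def rnn_step(h, x, w_h, w_x):
    """One step of a toy element-wise tanh recurrent unit (assumed cell type)."""
    return [math.tanh(w_h * hi + w_x * xi) for hi, xi in zip(h, x)]


def encode(seq, dim, w_h, w_x):
    """Run the recurrent unit over seq and return the final hidden state."""
    h = [0.0] * dim
    for x in seq:
        h = rnn_step(h, x, w_h, w_x)
    return h


def hierarchical_encode(frames, chunk_len, dim, w_h=0.5, w_x=0.5):
    """Two-level encoding: frames -> chunk summaries -> one video vector."""
    # Lower level: encode each chunk of consecutive frames independently.
    chunks = [frames[i:i + chunk_len] for i in range(0, len(frames), chunk_len)]
    chunk_states = [encode(c, dim, w_h, w_x) for c in chunks]
    # Higher level: a second recurrent pass over the chunk summaries.
    video_vec = encode(chunk_states, dim, w_h, w_x)
    # Longest chain of recurrent steps between any input frame and the output.
    longest_path = min(chunk_len, len(frames)) + len(chunks)
    return video_vec, longest_path


# Usage: 12 random 3-dimensional "frame features", grouped into chunks of 4.
random.seed(0)
frames = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(12)]
video_vec, longest_path = hierarchical_encode(frames, chunk_len=4, dim=3)
# 4 lower-level steps + 3 higher-level steps = 7, versus 12 for a flat pass.
print(len(video_vec), longest_path)
```

In this toy setting the lower level captures transitions between frames and the higher level captures transitions between chunks, while the longest recurrent chain shrinks from 12 steps to 7.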
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:1511.03476 [cs.CV]
(or arXiv:1511.03476v1 [cs.CV] for this version)
Submission history
From: Zhongwen Xu
[v1] Wed, 11 Nov 2015 12:38:14 GMT (1777kb,D)