4.7 Article

Dual-Stream Recurrent Neural Network for Video Captioning

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2018.2867286

Keywords

Video captioning; hidden state fusion; dual stream; recurrent neural network; attention module

Funding

  1. National Natural Science Foundation of China [61772359, 61472275, 61525206, 61502337]
  2. National Key Research and Development Program of China [2017YFC0820600]
  3. National Defense Science and Technology Fund for Distinguished Young Scholars [2017-JCJQ-ZQ-022]
  4. National Research Foundation, Prime Minister's Office, Singapore, under its International Research Centre in Singapore Funding Initiative

Abstract

Recent progress in using recurrent neural networks (RNNs) for video description has attracted increasing interest, owing to their capability to encode a sequence of frames for caption generation. While existing methods have studied various features (e.g., CNN, 3D CNN, and semantic attributes) for visual encoding, the representation and fusion of heterogeneous information from multi-modal spaces have not been fully explored. Because different modalities are often asynchronous, frame-level multi-modal fusion (e.g., concatenation and linear fusion) can negatively affect each modality. In this paper, we propose a dual-stream RNN (DS-RNN) framework to jointly discover and integrate the hidden states of both visual and semantic streams for video caption generation. First, an encoding RNN is used for each stream to flexibly exploit the hidden states of the respective modality. Specifically, we propose an attentive multi-grained encoder module to enhance local feature learning with global semantic features. Then, a dual-stream decoder is deployed to integrate the asynchronous yet complementary sequential hidden states from both streams for caption generation. Extensive experiments on three benchmark datasets, namely MSVD, MSR-VTT, and MPII-MD, show that DS-RNN achieves competitive performance against the state of the art. Additional ablation studies were conducted on several variants of the proposed DS-RNN.
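
The contrast the abstract draws between frame-level fusion and hidden-state fusion can be illustrated with a small sketch. This is not the paper's DS-RNN: the plain tanh RNN, the dot-product attention, the random weights, and every name and dimension below (`rnn_encode`, `attend`, `d_h`, the stream lengths) are assumptions chosen only to show the idea. Each stream is encoded by its own RNN, so the two sequences may have different lengths, and fusion happens on attended hidden states rather than on per-frame features.

```python
import numpy as np

def rnn_encode(inputs, W_in, W_h):
    """Run a plain tanh RNN over inputs (T, d_in); return all hidden states (T, d_h)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(x @ W_in + h @ W_h)
        states.append(h)
    return np.stack(states)

def attend(query, states):
    """Dot-product attention: softmax-weighted sum of hidden states given a query."""
    scores = states @ query
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ states, weights

rng = np.random.default_rng(0)
d_h = 8  # hidden size shared by both streams (assumed)

# Two asynchronous streams: different lengths and feature sizes (made-up values).
visual = rng.normal(size=(12, 16))   # e.g. 12 frame-level visual features
semantic = rng.normal(size=(5, 10))  # e.g. 5 semantic attribute vectors

# One encoder per stream, each with its own randomly initialised weights.
h_vis = rnn_encode(visual, 0.1 * rng.normal(size=(16, d_h)), 0.1 * rng.normal(size=(d_h, d_h)))
h_sem = rnn_encode(semantic, 0.1 * rng.normal(size=(10, d_h)), 0.1 * rng.normal(size=(d_h, d_h)))

# A decoder step attends to each stream separately, then fuses the two contexts.
decoder_state = rng.normal(size=d_h)
ctx_vis, w_vis = attend(decoder_state, h_vis)
ctx_sem, w_sem = attend(decoder_state, h_sem)
fused = np.concatenate([ctx_vis, ctx_sem])  # input to the next decoding step

print(h_vis.shape, h_sem.shape, fused.shape)
```

Because attention collapses each stream to a fixed-size context vector, the streams never need to be aligned frame by frame, which is the property that frame-level concatenation lacks.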
