Article

Unified Spatio-Temporal Attention Networks for Action Recognition in Videos

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 21, Issue 2, Pages 416-428

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2018.2862341

Keywords

Action recognition; spatio-temporal attention; deep convolutional networks

Funding

  1. PKU-NTU Joint Research Institute (JRI)

Recognizing actions in videos is not a trivial task because video is an information-intensive medium that includes multiple modalities. Moreover, within each modality, an action may appear only in some spatial regions, or only part of the temporal video segments may contain the action. A natural question is therefore how to locate the attended spatial areas and the relevant video segments for action recognition. In this paper, we devise a general attention neural cell, called AttCell, that estimates the attention probability not only at each spatial location but also for each video segment in a temporal sequence. With AttCell, we propose unified Spatio-Temporal Attention Networks (STAN) in the context of multiple modalities. Specifically, on each modality STAN extracts the feature map of one convolutional layer as local descriptors and pools these descriptors, weighted by the spatial attention measured by AttCell, into a representation of each segment. Then, we concatenate the segment representations of all modalities to seek a consensus on the temporal attention, which is applied a priori to holistically fuse the combined representations of the video segments into a video representation for recognition. Unlike conventional attention-based deep networks, our temporal attention provides principled, global guidance across different modalities and video segments. Extensive experiments on four public datasets, UCF101, CCV, THUMOS14, and Sports-1M, show that STAN consistently achieves superior results over several state-of-the-art techniques. More remarkably, we validate and demonstrate the effectiveness of our approach when using different numbers of modalities.
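The abstract describes two attention stages: spatial attention pooling of the local descriptors from one convolutional feature map into a per-segment representation on each modality, followed by temporal attention over the concatenated multi-modal segment representations. The Python sketch below illustrates that data flow under stated assumptions; it is not the authors' implementation. The abstract does not specify how AttCell computes its attention probabilities, so a hypothetical linear scoring vector followed by a softmax stands in for AttCell in both stages, and the function names (`spatial_attention_pool`, `segment_representation`, `temporal_attention_fuse`), the scoring vectors `w_spatial` and `u_temporal`, and the toy inputs are all illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: turns raw scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(vectors: List[Vector], weights: List[float]) -> Vector:
    """Attention-weighted sum of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def spatial_attention_pool(descriptors: List[Vector], w_spatial: Vector) -> Vector:
    """Pool one modality's local descriptors (one per spatial location of a
    conv feature map) into a segment-level vector.
    HYPOTHETICAL scoring: softmax(w_spatial . x) stands in for AttCell."""
    alpha = softmax([dot(w_spatial, x) for x in descriptors])
    return weighted_sum(descriptors, alpha)


def segment_representation(modalities: List[List[Vector]],
                           w_spatial: List[Vector]) -> Vector:
    """Spatially pool every modality, then concatenate the results."""
    rep: Vector = []
    for descriptors, w in zip(modalities, w_spatial):
        rep.extend(spatial_attention_pool(descriptors, w))
    return rep


def temporal_attention_fuse(segments: List[Vector],
                            u_temporal: Vector) -> Tuple[Vector, List[float]]:
    """Fuse concatenated segment vectors into one video vector.
    HYPOTHETICAL scoring: softmax(u_temporal . h) stands in for AttCell.
    Returns (video_representation, temporal_attention_weights)."""
    beta = softmax([dot(u_temporal, h) for h in segments])
    return weighted_sum(segments, beta), beta


if __name__ == "__main__":
    # Toy video: 3 segments, 2 modalities (e.g. RGB and optical flow),
    # 4 spatial locations per feature map, 2-d local descriptors.
    rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    flow = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    video = [[rgb, flow], [flow, rgb], [rgb, rgb]]
    w_spatial = [[1.0, -1.0], [0.5, 0.5]]  # one scoring vector per modality
    u_temporal = [0.2, 0.1, -0.3, 0.4]     # scores the 4-d concatenated vector
    segments = [segment_representation(seg, w_spatial) for seg in video]
    video_rep, beta = temporal_attention_fuse(segments, u_temporal)
    print("temporal attention:", [round(b, 3) for b in beta])
    print("video representation:", [round(v, 3) for v in video_rep])
```

Because each softmax sums to one, both stages produce convex combinations of their inputs. Setting a scoring vector to zero reduces that stage to plain average pooling, which is a convenient sanity check when swapping in a learned scoring function.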