Article

ST-CNN: Spatial-Temporal Convolutional Neural Network for crowd counting in videos

Journal

PATTERN RECOGNITION LETTERS
Volume 125, Pages 113-118

Publisher

ELSEVIER
DOI: 10.1016/j.patrec.2019.04.012

Keywords

Crowd counting; Spatio-temporal feature; Crowd analysis

Funding

  1. Natural Science Foundation of China [61672079, 61473086]
  2. Shenzhen Science and Technology Program
  3. Shenzhen Peacock Plan [KQTD2016112515134654]

Crowd counting and density map estimation from videos are challenging due to severe occlusions, scene perspective distortions and diverse crowd distributions. Conventional deep-learning crowd counting methods process each video frame independently, ignoring the intrinsic temporal correlation among neighboring frames, so their performance falls short of what real-world applications require. To overcome this shortcoming, a new end-to-end deep architecture named Spatial-Temporal Convolutional Neural Network (ST-CNN) is proposed, which unifies a 2D convolutional neural network (C2D) and a 3D convolutional neural network (C3D) to learn spatial-temporal features in the same framework. On top of that, a merging scheme is applied to the resulting density maps, taking advantage of spatial and temporal information simultaneously for the crowd counting task. Experimental results on two benchmark datasets, the Mall dataset and the WorldExpo'10 dataset, show that ST-CNN outperforms state-of-the-art models in terms of mean absolute error (MAE) and mean squared error (MSE). © 2019 Published by Elsevier B.V.
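
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common convention that a predicted count is the sum of the values in a density map, and it uses the plain mean of squared errors for MSE; the abstract does not say which variant the paper uses, and some crowd-counting papers report the square root under the same name. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def count_from_density(density_map):
    """Estimate a crowd count as the sum of all density map values (assumed convention)."""
    return float(sum(sum(row) for row in density_map))


def mean_absolute_error(pred_counts, true_counts):
    """MAE: average absolute difference between predicted and true counts."""
    if len(pred_counts) != len(true_counts) or not pred_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(pred_counts)


def mean_squared_error(pred_counts, true_counts):
    """MSE: average squared difference (no square root; an assumption, see lead-in)."""
    if len(pred_counts) != len(true_counts) or not pred_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / len(pred_counts)


if __name__ == "__main__":
    # Toy per-frame counts, invented purely for illustration.
    predicted = [10.0, 20.0, 30.0]
    ground_truth = [12.0, 18.0, 33.0]
    print(mean_absolute_error(predicted, ground_truth))
    print(mean_squared_error(predicted, ground_truth))
```

MAE reflects the typical size of the counting error per frame, while MSE penalizes large misses more heavily, which is why the two are usually reported together.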
