Semi-CNN Architecture for Effective Spatio-Temporal Learning in Action Recognition

APPLIED SCIENCES-BASEL (2020)

Journal

APPLIED SCIENCES-BASEL

Volume 10, Issue 2, Pages -

Publisher

MDPI

DOI: 10.3390/app10020557

Keywords

action recognition; spatio-temporal features; convolution network; transfer learning

Categories

Chemistry, Multidisciplinary; Engineering, Multidisciplinary; Materials Science, Multidisciplinary; Physics, Applied

Funding

UiT The Arctic University of Norway

Abstract

This paper introduces a fusion convolutional architecture for efficient learning of spatio-temporal features in video action recognition. Unlike 2D convolutional neural networks (CNNs), 3D CNNs can be applied directly to consecutive frames to extract spatio-temporal features. The aim of this work is to fuse the convolution layers of 2D and 3D CNNs to allow temporal encoding with fewer parameters than 3D CNNs. We adopt transfer learning from pre-trained 2D CNNs for spatial extraction, followed by temporal encoding, before connecting to 3D convolution layers at the top of the architecture. We construct our fusion architecture, semi-CNN, based on three popular models (VGG-16, ResNets and DenseNets) and compare its performance with that of the corresponding 3D models. Our empirical results on the action recognition dataset UCF-101 demonstrate that our fusion of 1D, 2D and 3D convolutions outperforms the 3D model of the same depth while using fewer parameters and reducing overfitting. Our semi-CNN architecture achieved an average boost of 16-30% in top-1 accuracy when evaluated on an input video of 16 frames.
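
The parameter saving mentioned in the abstract can be illustrated with a small counting exercise. The sketch below compares a full 3D convolution with a 2D spatial convolution followed by a 1D temporal convolution. The channel count (64) and kernel size (3) are illustrative assumptions, not values from the paper, and the factorization is a generic one rather than the authors' exact semi-CNN design.

```python
def conv_params(c_in, c_out, kernel, bias=True):
    """Count learnable parameters of a convolution layer.

    kernel is a tuple of kernel sizes, one entry per convolved dimension,
    e.g. (3, 3) for 2D or (3, 3, 3) for 3D.
    """
    n = c_in * c_out
    for k in kernel:
        n *= k
    return n + (c_out if bias else 0)


# Illustrative layer sizes (assumed for this example, not from the paper).
c_in, c_out = 64, 64

# One full 3D convolution over (time, height, width).
full_3d = conv_params(c_in, c_out, (3, 3, 3))

# A 2D spatial convolution followed by a 1D temporal convolution.
spatial_2d = conv_params(c_in, c_out, (3, 3))
temporal_1d = conv_params(c_out, c_out, (3,))
factorized = spatial_2d + temporal_1d

print("3D convolution parameters:      ", full_3d)
print("2D + 1D convolution parameters: ", factorized)
print("ratio: %.3f" % (factorized / full_3d))
```

Under these assumed sizes the factorized pair needs fewer than half the parameters of the single 3D layer. The accuracy figures reported in the abstract come from the authors' experiments and cannot be inferred from a count like this.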

Recommended

Article Computer Science, Artificial Intelligence

A fast human action recognition network based on spatio-temporal features

Jie Xu, Rui Song, Haoliang Wei, Jinhong Guo, Yifei Zhou, Xiwei Huang

Summary: This paper proposes a fast network model that improves the accuracy of human action recognition by exploring efficient optical flow feature extraction and a fusion method for spatio-temporal features, achieving highly competitive accuracy on various datasets.

NEUROCOMPUTING (2021)

Article Mathematics

STSM: Spatio-Temporal Shift Module for Efficient Action Recognition

Zhaoqilin Yang, Gaoyun An, Ruichen Zhang

Summary: The paper focuses on the modeling, computational complexity, and accuracy of spatio-temporal models in video action recognition. A plug-and-play Spatio-Temporal Shift Module (STSM) is proposed, which effectively enhances the network's ability to learn spatio-temporal features without increasing parameters and computational complexity. By integrating with 2D CNNs, the new network can learn spatio-temporal features and outperform networks based on 3D convolutions.

MATHEMATICS (2022)

Article Computer Science, Artificial Intelligence

Spatio-Temporal Collaborative Module for Efficient Action Recognition

Yanbin Hao, Shuo Wang, Yi Tan, Xiangnan He, Zhenguang Liu, Meng Wang

Summary: Efficient action recognition is achieved through a novel spatio-temporal collaborative (STC) module, which integrates channel splitting and filter decoupling for efficient architecture design and feature refinement. Experimental results demonstrate that the proposed STC networks strike a competitive balance between model efficiency and effectiveness in video action recognition tasks.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2022)

Article Chemistry, Multidisciplinary

Skeleton Motion Recognition Based on Multi-Scale Deep Spatio-Temporal Features

Kai Hu, Yiwu Ding, Junlan Jin, Liguo Weng, Min Xia

Summary: This paper proposes a novel multi-scale time sampling module and a deep spatio-temporal feature extraction module to enhance the accuracy of a human motion recognition network. Comparative experiments show that the proposed method improves performance on two datasets.

APPLIED SCIENCES-BASEL (2022)

Article Computer Science, Artificial Intelligence

Global spatio-temporal synergistic topology learning for skeleton-based action recognition

Meng Dai, Zhonghua Sun, Tianyi Wang, Jinchao Feng, Kebin Jia

Summary: Compared to RGB video-based action recognition, skeleton-based action recognition algorithms have gained more attention for their lightweight and robust nature. However, existing feature extraction methods are limited in how effectively they capture temporal feature connections and global temporal features. This work proposes a global spatio-temporal synergistic feature learning module (GSTL) and a global spatio-temporal synergistic topology learning network (GSTLN) that achieves competitive performance on challenging datasets with fewer parameters.

PATTERN RECOGNITION (2023)

Article Engineering, Electrical & Electronic

Spatio-Temporal Adaptive Network With Bidirectional Temporal Difference for Action Recognition

Zhilei Li, Jun Li, Yuqing Ma, Rui Wang, Zhiping Shi, Yifu Ding, Xianglong Liu

Summary: This paper proposes a novel Spatio-Temporal Adaptive Network (STANet) with bidirectional temporal difference and two adaptive modules to sufficiently extract motion information and model spatial appearance information. The experiments on widely-used action recognition benchmarks prove the effectiveness of the proposed methods compared to other state-of-the-art approaches.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2023)

Article Computer Science, Information Systems

3DCANN: A Spatio-Temporal Convolution Attention Neural Network for EEG Emotion Recognition

Shuaiqi Liu, Xu Wang, Ling Zhao, Bing Li, Weiming Hu, Jie Yu, Yu-Dong Zhang

Summary: This paper proposes a deep learning model called 3DCANN for EEG emotion recognition. The model is able to extract spatio-temporal features from EEG signals and achieves superior performance over existing models in emotion classification.

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS (2022)

Article Computer Science, Artificial Intelligence

Spatial-temporal pooling for action recognition in videos

Jiaming Wang, Zhenfeng Shao, Xiao Huang, Tao Lu, Ruiqian Zhang, Xianwei Lv

Summary: The study introduces a novel parameter-free spatial-temporal pooling block (STP) for action recognition in videos, which efficiently discards non-informative frames, learns spatial and temporal weights, and uses a new loss function to enforce the model to learn information from sparse and discriminative frames, ultimately outperforming several state-of-the-art methods in action classification.

NEUROCOMPUTING (2021)

Article Mathematics

DSTnet: Deformable Spatio-Temporal Convolutional Residual Network for Video Super-Resolution

Anusha Khan, Allah Bux Sargano, Zulfiqar Habib

Summary: This study proposes a method to improve video super-resolution by effectively utilizing spatio-temporal information through a deformable spatio-temporal convolutional residual network (DSTNet). The method overcomes the challenges of separate motion estimation and compensation in existing methods, and has fewer learning parameters, making it an efficient framework for VSR.

MATHEMATICS (2021)

Article Computer Science, Information Systems

Learning representative temporal features for action recognition

Ali Javidani, Ahmad Mahmoudi-Aznaveh

Summary: A novel video classification method is presented in this paper, which breaks down the processing of 3-dimensional video input to 1D in temporal dimension and 2D in spatial. The focus is on training a multi-channel 1D CNN to learn temporal features effectively, resulting in state-of-the-art results on public datasets.

MULTIMEDIA TOOLS AND APPLICATIONS (2022)

Article Computer Science, Information Systems

Multimodal human action recognition based on spatio-temporal action representation recognition model

Qianhan Wu, Qian Huang, Xing Li

Summary: In this paper, a new model called the Spatio-temporal Action Representation Recognition Model is proposed for human action recognition. The model utilizes multimodal data and fusion algorithms to improve recognition accuracy. Experimental results demonstrate the effectiveness of this approach.

MULTIMEDIA TOOLS AND APPLICATIONS (2023)

Article Computer Science, Artificial Intelligence

Spatio-temporal stacking model for skeleton-based action recognition

Yufeng Zhong, Qiuyan Yan

Summary: In this paper, a novel method based on shallow learning architecture is proposed to effectively identify complex actions in skeleton data. By combining Temporal Hierarchy Pyramid (THP) and Symmetric Positive Definite (SPD) features, the method captures both the temporal relationship of inter-frame and the spatial relationship of intra-frame. Extensive verification on widely used 3D action recognition datasets demonstrates that the method achieves state-of-the-art performance.

APPLIED INTELLIGENCE (2022)

Article Computer Science, Software Engineering

STAM: a spatio-temporal adaptive module for improving static convolutions in action recognition

Wei Li, Weijun Gong, Yurong Qian, Haichen Tian

Summary: Temporal adaptive convolution has been proven to outperform static convolution techniques in video understanding. In this study, we propose a spatio-temporal hybrid adaptive convolution (STHAC) method to further enhance convolution's modeling capabilities by learning a set of spatio-temporal calibration filters. Compared to other dynamic convolution methods, STHAC requires fewer parameters and has lower computational complexity. Experimental results demonstrate its competitiveness with state-of-the-art convolutional neural network architectures on action recognition benchmarks.

VISUAL COMPUTER (2023)

Article Clinical Neurology

Depression and Severity Detection Based on Body Kinematic Features: Using Kinect Recorded Skeleton Data of Simple Action

Yanhong Yu, Wentao Li, Yue Zhao, Jiayu Ye, Yunshao Zheng, Xinxin Liu, Qingxiang Wang

Summary: In this study, we used the Kinect V2 to collect skeletal data and proposed a novel spatial attention dilated TCN network for depression recognition. Our experiments and methods based on Kinect V2 not only identified and classified depression patients accurately but also observed the recovery level of depression patients during the recovery process.

FRONTIERS IN NEUROLOGY (2022)

Article Computer Science, Information Systems

Skeleton-based action recognition with temporal action graph and temporal adaptive graph convolution structure

Yi Cao, Chen Liu, Zilong Huang, Yongjian Sheng, Yongjian Ju

Summary: This paper proposes a novel skeleton-based action recognition model, ST-AGCN, which combines T-AGCN with spatial graph convolution to effectively explore global temporal information and improve action recognition accuracy.

MULTIMEDIA TOOLS AND APPLICATIONS (2021)

Article Computer Science, Artificial Intelligence

RS-HeRR: a rough set-based Hebbian rule reduction neuro-fuzzy system

Feng Liu, Arif Ahmed Sekh, Chai Quek, Geok See Ng, Dilip K. Prasad

Summary: This paper introduces a hybrid fuzzy-rough set approach called RS-HeRR for generating effective, interpretable, and compact rule sets. It combines a powerful rule generation and reduction fuzzy system and improves system performance by reducing partial dependencies in rules.

NEURAL COMPUTING & APPLICATIONS (2021)

Article Biochemical Research Methods

Artefact removal in ground truth deficient fluctuations-based nanoscopy images using deep learning

Suyog Jadhav, Sebastian Acuna, Ida S. Opstad, Balpreet Singh Ahluwalia, Krishna Agarwal, Dilip K. Prasad

Summary: Deep learning for image denoising or artefact removal faces challenges in nanoscopy images due to the lack of supervised training datasets and noise models. This study proposes a simulation-supervised training approach and investigates its application in sub-cellular structures within biological samples for nanoscopy images.

BIOMEDICAL OPTICS EXPRESS (2021)

Article Optics

Label-free non-invasive classification of rice seeds using optical coherence tomography assisted with deep neural network

Deepa Joshi, Ankit Butola, Sheetal Raosaheb Kanade, Dilip K. Prasad, S. V. Amitha Mithra, N. K. Singh, Deepak Singh Bisht, Dalip Singh Mehta

Summary: A new technique using deep learning assisted optical coherence tomography (OCT) is proposed for identifying seed varieties, achieving classification accuracy of 89.6% for one dataset and 82.5% for another dataset. This method can accurately classify seed varieties despite morphological similarities, assisting in removing varietal duplication and assessing seed purity.

OPTICS AND LASER TECHNOLOGY (2021)

Article Computer Science, Theory & Methods

Topic-based Video Analysis: A Survey

Ratnabali Pal, Arif Ahmed Sekh, Debi Prosad Dogra, Samarjit Kar, Partha Pratim Roy, Dilip K. Prasad

Summary: Handling a large volume of video data captured through closed-circuit television manually is challenging due to the time-consuming nature of manual analysis and the dynamic conditions of surveillance videos. Therefore, computer vision-based automatic surveillance scene analysis is performed in unsupervised ways, with topic modelling emerging as a key method for this purpose.

ACM COMPUTING SURVEYS (2021)

Article Computer Science, Artificial Intelligence

Motivation detection using EEG signal analysis by residual-in-residual convolutional neural network

Soham Chattopadhyay, Laila Zary, Chai Quek, Dilip K. Prasad

Summary: A novel approach for motivation detection using EEG signals is proposed in this paper, which effectively addresses the issues of overfitting and vanishing gradient in small datasets through residual-in-residual architecture of convolutional neural network. The motivation state during learning can be accurately detected using alpha and beta wave signals, achieving 89% and 88% accuracy respectively.

EXPERT SYSTEMS WITH APPLICATIONS (2021)

Article Automation & Control Systems

Object Pose Estimation via Pruned Hough Forest With Combined Split Schemes for Robotic Grasp

Huixu Dong, Dilip K. Prasad, I-Ming Chen

Summary: The article introduces a novel approach for estimating the poses of textureless and textured objects, which is superior to recent works under various conditions. Extensive experiments demonstrate the applicability of the proposed method in practical scenarios.

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING (2021)

Article Computer Science, Artificial Intelligence

Emotionally charged text classification with deep learning and sentiment semantic

Jeow Li Huan, Arif Ahmed Sekh, Chai Quek, Dilip K. Prasad

Summary: This paper investigates text classification methods by using deep models and recurrent neural networks to extract features and represent documents as semantic vector sequences for classification. The addition of sentiment information improves accuracy, outperforming classical techniques in experiments.

NEURAL COMPUTING & APPLICATIONS (2022)

Article Chemistry, Multidisciplinary

Biosignal-Based Driving Skill Classification Using Machine Learning: A Case Study of Maritime Navigation

Hui Xue, Bjorn-Morten Batalden, Puneet Sharma, Jarle Andre Johansen, Dilip K. Prasad

Summary: This study presents a novel approach to detecting stress differences between experts and novices in Situation Awareness tasks during maritime navigation using wearable sensors. The analysis of biosignal data with a machine learning algorithm revealed that experts and novices show differences in biosignal data under a given workload state, which can contribute to the development of a self-training system in maritime navigation.

APPLIED SCIENCES-BASEL (2021)

Article Engineering, Electrical & Electronic

Pixel-Wise Ship Identification From Maritime Images via a Semantic Segmentation Model

Xinqiang Chen, Xingyu Wu, Dilip K. Prasad, Bing Wu, Octavian Postolache, Yongsheng Yang

Summary: This paper proposes a novel approach for pixel-wise ship segmentation and identification task using the EU-Net deep learning architecture. Experimental results show that the proposed model accurately identifies ships and can be applied in ship sensing systems for maritime traffic situation awareness and intelligent visual navigation in the smart ship era.

IEEE SENSORS JOURNAL (2022)

Article Biochemical Research Methods

Virtual labeling of mitochondria in living cells using correlative imaging and physics-guided deep learning

Ayush Somani, Arif Ahmed Sekh, Ida S. Opstad, Asa Birna Birgisdottir, Truls Myrmel, Balpreet Singh Ahluwalia, Alexander Horsch, Krishna Agarwal, Dilip K. Prasad

Summary: This paper presents a novel method to visualize mitochondria in living cells without fluorescent markers. The authors proposed a physics-guided deep learning approach to obtain virtually labeled micrographs of mitochondria from bright-field images. The results showed that the virtual labeling approach significantly outperformed state-of-the-art techniques in segmenting and tracking individual mitochondria.

BIOMEDICAL OPTICS EXPRESS (2022)

Article Optics

Single-shot multispectral quantitative phase imaging of biological samples using deep learning

Sunil Bhatt, Ankit Butola, Anand Kumar, Pramila Thapa, Akshay Joshi, Suyog Jadhav, Neetu Singh, Dilip K. Prasad, Krishna Agarwal, Dalip Singh Mehta

Summary: Multispectral quantitative phase imaging (MS-QPI) is achieved by using a highly spatially sensitive digital holographic microscope assisted by a deep neural network to extract spectrally dependent quantitative information in a single shot. Three different wavelengths (532, 633, and 808 nm) are used, and interferometric data are acquired for each wavelength. A generative adversarial network is trained to generate multispectral (MS) quantitative phase maps from a single input interferogram. The approach is validated by comparing the predicted MS phase maps with numerically reconstructed phase maps using different image quality assessment metrics.

APPLIED OPTICS (2023)

Article Nanoscience & Nanotechnology

Image inpainting in acoustic microscopy

Pragyan Banerjee, Sibasish Mishra, Nitin Yadav, Krishna Agarwal, Frank Melandso, Dilip K. Prasad, Anowarul Habib

Summary: Scanning Acoustic Microscopy (SAM) is a non-ionizing and label-free imaging modality that uses high-frequency acoustic waves to create images of the surface and internal structures of industrial objects and biological specimens. This paper proposes a deep learning-based method for image inpainting in acoustic microscopy, using various generative adversarial networks (GANs) to fill in holes in the original image and generate a 4x image. The performance of the trained model is evaluated using peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM), with the Hypergraphs image inpainting model achieving an average SSIM of 0.93 for 2x and up to 0.93 for the final 4x, as well as a PSNR of 32.33 for 2x and up to 32.20 for the final 4x.

AIP ADVANCES (2023)

Article Computer Science, Artificial Intelligence

Client Selection in Federated Learning under Imperfections in Environment

Sumit Rai, Arti Kumari, Dilip K. Prasad

Summary: This paper proposes a novel sampling method called "irrelevance sampling technique" for selecting the best clients in each round of learning. The method defines an irrelevance score to classify clients into three pools for sampling. It is computationally inexpensive, intuitive, and privacy preserving, achieving faster convergence even in skewed and imbalanced data scenarios.

AI (2022)

Article Computer Science, Artificial Intelligence

Physics-based machine learning for subcellular segmentation in living cells

Arif Ahmed Sekh, Ida S. Opstad, Gustav Godtliebsen, Asa Birna Birgisdottir, Balpreet Singh Ahluwalia, Krishna Agarwal, Dilip K. Prasad

Summary: To solve the problem of segmenting very small subcellular structures, the study uses a physics-based simulation approach to train neural networks and introduces a simulation-supervision method supported by physics-based GT. This approach addresses the issue of lacking ground truth data and improves the accuracy and speed of subcellular segmentation.

NATURE MACHINE INTELLIGENCE (2021)

Article Thermodynamics

Inverse and efficiency of heat transfer convex fin with multiple nonlinearities

Pranab Kanti Roy, Hiranmoy Mondal, Ashis Mallick, Dilip K. Prasad

Summary: This article introduces a novel semi-analytical technique - the modified Adomian decomposition method (MADM) - to solve the nonlinear heat transfer equation of convex profile with singularity. Through inverse heat transfer analysis, unknown parameters such as thermal conductivity and surface emissivity were successfully predicted, with consideration of the effects of measurement error and the number of measurement points.

HEAT TRANSFER (2021)
