Article

Semi-CNN Architecture for Effective Spatio-Temporal Learning in Action Recognition

Journal

APPLIED SCIENCES-BASEL
Volume 10, Issue 2, Pages -

Publisher

MDPI
DOI: 10.3390/app10020557

Keywords

action recognition; spatio-temporal features; convolution network; transfer learning

Funding

  1. UiT The Arctic University of Norway

This paper introduces a fusion convolutional architecture for efficient learning of spatio-temporal features in video action recognition. Unlike 2D convolutional neural networks (CNNs), 3D CNNs can be applied directly to consecutive frames to extract spatio-temporal features. The aim of this work is to fuse the convolution layers from 2D and 3D CNNs to allow temporal encoding with fewer parameters than 3D CNNs. We adopt transfer learning from pre-trained 2D CNNs for spatial extraction, followed by temporal encoding, before connecting to 3D convolution layers at the top of the architecture. We construct our fusion architecture, semi-CNN, based on three popular models (VGG-16, ResNets and DenseNets) and compare its performance with that of the corresponding 3D models. Our empirical results on the action recognition dataset UCF-101 demonstrate that our fusion of 1D, 2D and 3D convolutions outperforms the corresponding 3D model of the same depth while using fewer parameters and reducing overfitting. Our semi-CNN architecture achieved an average boost of 16-30% in top-1 accuracy when evaluated on an input video of 16 frames.
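The parameter argument in the abstract can be illustrated with a small counting sketch. It compares one 3D convolution against a 2D spatial convolution followed by a 1D temporal convolution. The channel widths and 3-wide kernels are assumptions chosen for illustration, and the helper `conv_params` is hypothetical; none of this is the layer configuration used in the paper.

```python
# Illustrative only: channel counts and kernel sizes are assumed,
# not taken from the semi-CNN paper.

def conv_params(in_ch, out_ch, kernel, bias=True):
    """Count learnable parameters of a convolution layer.

    `kernel` is a tuple of kernel extents, e.g. (3, 3) for 2D
    or (3, 3, 3) for 3D.
    """
    kernel_volume = 1
    for extent in kernel:
        kernel_volume *= extent
    return out_ch * (in_ch * kernel_volume + (1 if bias else 0))

in_ch, out_ch = 64, 64  # assumed channel widths

params_2d = conv_params(in_ch, out_ch, (3, 3))     # spatial convolution
params_1d = conv_params(out_ch, out_ch, (3,))      # temporal convolution
params_3d = conv_params(in_ch, out_ch, (3, 3, 3))  # joint spatio-temporal

print("2D spatial + 1D temporal:", params_2d + params_1d)
print("single 3D convolution:   ", params_3d)
```

Under these assumed sizes, the factorized 2D plus 1D pair needs fewer than half the parameters of the single 3D convolution. This is consistent with the abstract's claim of fewer parameters, although the actual savings depend on the real layer widths and depths.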

Recommended

Article Computer Science, Artificial Intelligence

RS-HeRR: a rough set-based Hebbian rule reduction neuro-fuzzy system

Feng Liu, Arif Ahmed Sekh, Chai Quek, Geok See Ng, Dilip K. Prasad

Summary: This paper introduces a hybrid fuzzy-rough set approach called RS-HeRR for generating effective, interpretable, and compact rule sets. It combines a powerful rule generation and reduction fuzzy system and improves system performance by reducing partial dependencies in rules.

NEURAL COMPUTING & APPLICATIONS (2021)

Article Biochemical Research Methods

Artefact removal in ground truth deficient fluctuations-based nanoscopy images using deep learning

Suyog Jadhav, Sebastian Acuna, Ida S. Opstad, Balpreet Singh Ahluwalia, Krishna Agarwal, Dilip K. Prasad

Summary: Deep learning for image denoising or artefact removal faces challenges in nanoscopy images due to the lack of supervised training datasets and noise models. This study proposes a simulation-supervised training approach and investigates its application in sub-cellular structures within biological samples for nanoscopy images.

BIOMEDICAL OPTICS EXPRESS (2021)

Article Optics

Label-free non-invasive classification of rice seeds using optical coherence tomography assisted with deep neural network

Deepa Joshi, Ankit Butola, Sheetal Raosaheb Kanade, Dilip K. Prasad, S. V. Amitha Mithra, N. K. Singh, Deepak Singh Bisht, Dalip Singh Mehta

Summary: A new technique using deep learning assisted optical coherence tomography (OCT) is proposed for identifying seed varieties, achieving classification accuracy of 89.6% for one dataset and 82.5% for another dataset. This method can accurately classify seed varieties despite morphological similarities, assisting in removing varietal duplication and assessing seed purity.

OPTICS AND LASER TECHNOLOGY (2021)

Article Computer Science, Theory & Methods

Topic-based Video Analysis: A Survey

Ratnabali Pal, Arif Ahmed Sekh, Debi Prosad Dogra, Samarjit Kar, Partha Pratim Roy, Dilip K. Prasad

Summary: Handling a large volume of video data captured through closed-circuit television manually is challenging due to the time-consuming nature of manual analysis and the dynamic conditions of surveillance videos. Therefore, computer vision-based automatic surveillance scene analysis is performed in unsupervised ways, with topic modelling emerging as a key method for this purpose.

ACM COMPUTING SURVEYS (2021)

Article Computer Science, Artificial Intelligence

Motivation detection using EEG signal analysis by residual-in-residual convolutional neural network

Soham Chattopadhyay, Laila Zary, Chai Quek, Dilip K. Prasad

Summary: This paper proposes a novel approach for motivation detection using EEG signals. A residual-in-residual convolutional neural network architecture addresses the overfitting and vanishing gradient problems that arise with small datasets. The motivation state during learning can be detected from alpha and beta wave signals, with 89% and 88% accuracy respectively.

EXPERT SYSTEMS WITH APPLICATIONS (2021)

Article Automation & Control Systems

Object Pose Estimation via Pruned Hough Forest With Combined Split Schemes for Robotic Grasp

Huixu Dong, Dilip K. Prasad, I-Ming Chen

Summary: The article introduces a novel approach for estimating the poses of textureless and textured objects, which is superior to recent works under various conditions. Extensive experiments demonstrate the applicability of the proposed method in practical scenarios.

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING (2021)

Article Computer Science, Artificial Intelligence

Emotionally charged text classification with deep learning and sentiment semantic

Jeow Li Huan, Arif Ahmed Sekh, Chai Quek, Dilip K. Prasad

Summary: This paper investigates text classification methods by using deep models and recurrent neural networks to extract features and represent documents as semantic vector sequences for classification. The addition of sentiment information improves accuracy, outperforming classical techniques in experiments.

NEURAL COMPUTING & APPLICATIONS (2022)

Article Chemistry, Multidisciplinary

Biosignal-Based Driving Skill Classification Using Machine Learning: A Case Study of Maritime Navigation

Hui Xue, Bjorn-Morten Batalden, Puneet Sharma, Jarle Andre Johansen, Dilip K. Prasad

Summary: This study presents a novel approach to detecting stress differences between experts and novices in Situation Awareness tasks during maritime navigation using wearable sensors. The analysis of biosignal data with a machine learning algorithm revealed that experts and novices show differences in biosignal data under a given workload state, which can contribute to the development of a self-training system in maritime navigation.

APPLIED SCIENCES-BASEL (2021)

Article Engineering, Electrical & Electronic

Pixel-Wise Ship Identification From Maritime Images via a Semantic Segmentation Model

Xinqiang Chen, Xingyu Wu, Dilip K. Prasad, Bing Wu, Octavian Postolache, Yongsheng Yang

Summary: This paper proposes a novel approach to the pixel-wise ship segmentation and identification task using the EU-Net deep learning architecture. Experimental results show that the proposed model accurately identifies ships and can be applied in ship sensing systems for maritime traffic situation awareness and intelligent visual navigation in the smart ship era.

IEEE SENSORS JOURNAL (2022)

Article Biochemical Research Methods

Virtual labeling of mitochondria in living cells using correlative imaging and physics-guided deep learning

Ayush Somani, Arif Ahmed Sekh, Ida S. Opstad, Asa Birna Birgisdottir, Truls Myrmel, Balpreet Singh Ahluwalia, Alexander Horsch, Krishna Agarwal, Dilip K. Prasad

Summary: This paper presents a novel method to visualize mitochondria in living cells without fluorescent markers. The authors proposed a physics-guided deep learning approach to obtain virtually labeled micrographs of mitochondria from bright-field images. The results showed that the virtual labeling approach significantly outperformed state-of-the-art techniques in segmenting and tracking individual mitochondria.

BIOMEDICAL OPTICS EXPRESS (2022)

Article Optics

Single-shot multispectral quantitative phase imaging of biological samples using deep learning

Sunil Bhatt, Ankit Butola, Anand Kumar, Pramila Thapa, Akshay Joshi, Suyog Jadhav, Neetu Singh, Dilip K. Prasad, Krishna Agarwal, Dalip Singh Mehta

Summary: Multispectral quantitative phase imaging (MS-QPI) is achieved by using a highly spatially sensitive digital holographic microscope assisted by a deep neural network to extract spectral dependent quantitative information in single-shot. Three different wavelengths (532, 633, and 808 nm) are used, and interferometric data is acquired for each wavelength. A generative adversarial network is trained to generate multispectral (MS) quantitative phase maps from a single input interferogram. The validation of the approach is done by comparing the predicted MS phase maps with numerically reconstructed phase maps using different image quality assessment metrics.

APPLIED OPTICS (2023)

Article Nanoscience & Nanotechnology

Image inpainting in acoustic microscopy

Pragyan Banerjee, Sibasish Mishra, Nitin Yadav, Krishna Agarwal, Frank Melandso, Dilip K. Prasad, Anowarul Habib

Summary: Scanning Acoustic Microscopy (SAM) is a non-ionizing and label-free imaging modality that uses high-frequency acoustic waves to create images of the surface and internal structures of industrial objects and biological specimens. This paper proposes a deep learning-based method for image inpainting in acoustic microscopy, using various generative adversarial networks (GANs) to fill in holes in the original image and generate a 4x image. The performance of the trained model is evaluated using peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM), with the Hypergraphs image inpainting model achieving an average SSIM of 0.93 for 2x and up to 0.93 for the final 4x, as well as a PSNR of 32.33 for 2x and up to 32.20 for the final 4x.

AIP ADVANCES (2023)

Article Computer Science, Artificial Intelligence

Client Selection in Federated Learning under Imperfections in Environment

Sumit Rai, Arti Kumari, Dilip K. Prasad

Summary: This paper proposes a novel sampling method called "irrelevance sampling technique" for selecting the best clients in each round of learning. The method defines an irrelevance score to classify clients into three pools for sampling. It is computationally inexpensive, intuitive, and privacy preserving, achieving faster convergence even in skewed and imbalanced data scenarios.

Article Computer Science, Artificial Intelligence

Physics-based machine learning for subcellular segmentation in living cells

Arif Ahmed Sekh, Ida S. Opstad, Gustav Godtliebsen, Asa Birna Birgisdottir, Balpreet Singh Ahluwalia, Krishna Agarwal, Dilip K. Prasad

Summary: To solve the problem of segmenting very small subcellular structures, the study uses a physics-based simulation approach to train neural networks and introduces a simulation-supervision method supported by physics-based GT. This approach addresses the issue of lacking ground truth data and improves the accuracy and speed of subcellular segmentation.

NATURE MACHINE INTELLIGENCE (2021)

Article Thermodynamics

Inverse and efficiency of heat transfer convex fin with multiple nonlinearities

Pranab Kanti Roy, Hiranmoy Mondal, Ashis Mallick, Dilip K. Prasad

Summary: This article introduces a novel semi-analytical technique - the modified Adomian decomposition method (MADM) - to solve the nonlinear heat transfer equation of convex profile with singularity. Through inverse heat transfer analysis, unknown parameters such as thermal conductivity and surface emissivity were successfully predicted, with consideration of the effects of measurement error and the number of measurement points.

HEAT TRANSFER (2021)
