Article
Computer Science, Artificial Intelligence
Di Jiang, Conghui Tan, Jinhua Peng, Chaotao Chen, Xueyang Wu, Weiwei Zhao, Yuanfeng Song, Yongxin Tong, Chang Liu, Qian Xu, Qiang Yang, Li Deng
Summary: Automatic Speech Recognition (ASR) is crucial in real-world applications, but commercial solutions often suffer performance degradation and face data-regulation constraints. By integrating three machine learning paradigms, the authors build a win-win ecosystem that resolves these problems for both clients and vendors.
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY
(2021)
Article
Engineering, Electrical & Electronic
Cao Hong Nga, Duc-Quang Vu, Huong Hoang Luong, Chien-Lin Huang, Jia-Ching Wang
Summary: This study proposes a cyclic transfer learning method (CTL) that improves performance on the target task by using code-switching and monolingual speech resources as pretext tasks. The model is trained alternately on these tasks, which preserves code-switching features for knowledge transfer. Experimental results on the SEAME Mandarin-English code-switching corpus show that the CTL approach outperforms the compared methods, with a significant relative mixed error rate (MER) reduction on the test sets.
IEEE SIGNAL PROCESSING LETTERS
(2023)
Article
Computer Science, Artificial Intelligence
Mousumi Malakar, Ravindra B. Keskar, Ajit Zadgaonkar
Summary: A phoneme is the smallest distinct sound unit that differentiates words in a language. This paper proposes a hierarchical classification approach using machine learning techniques for phoneme recognition, which improves performance over direct (flat) classification.
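The two-stage idea can be illustrated generically: a first classifier predicts a broad phonetic class, and a per-class model then picks a phoneme within it. A minimal Python sketch with toy stand-ins for the trained models (the class names, groupings, and feature dictionary are illustrative assumptions, not the authors' actual setup):

```python
# Toy sketch of hierarchical (two-stage) phoneme classification.
# The groupings and "models" below are illustrative placeholders.

BROAD_CLASSES = {
    "vowel": {"aa", "iy", "uw"},
    "stop": {"p", "t", "k"},
    "fricative": {"s", "f", "sh"},
}

def broad_classifier(features):
    # Stage 1: predict the broad phonetic class from the features.
    return features["class_hint"]  # stand-in for a trained model

def fine_classifier(broad, features):
    # Stage 2: a per-class model chooses a phoneme within that class.
    candidates = sorted(BROAD_CLASSES[broad])
    return candidates[features["index"] % len(candidates)]

def predict(features):
    # Hierarchical prediction = stage 1 followed by stage 2.
    return fine_classifier(broad_classifier(features), features)

print(predict({"class_hint": "vowel", "index": 1}))  # → 'iy'
```

The benefit of the hierarchy is that each stage-2 model only discriminates among a handful of acoustically similar phonemes, a much easier task than one flat classifier over the full phoneme inventory.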
Article
Computer Science, Information Systems
Venkateswarlu Poluboina, Aparna Pulikala, Arivudai Nambi Pitchai Muthu
Summary: A cochlear implant is the most suitable option for individuals with severe-to-profound hearing loss: it restores audibility and offers good speech understanding in quiet conditions. However, speech perception in noise with cochlear implants is suboptimal because current coding strategies lack sophisticated pre-processing. This study proposes a novel pre-processing method to improve speech intelligibility in noise and evaluates it with objective and subjective tests.
Article
Chemistry, Analytical
Ilkhomjon Pulatov, Rashid Oteniyazov, Fazliddin Makhmudov, Young-Im Cho
Summary: Understanding and identifying emotional cues in human speech is crucial for human-computer communication. This study proposes an innovative framework for speech emotion recognition that utilizes spectrograms and semantic feature transcribers. The framework combines convolutional neural network models with a Mel-frequency cepstral coefficient (MFCC) feature-extraction approach for a richer representation. Evaluation shows superior performance compared to existing models, with 94.8% accuracy on the RAVDESS dataset and 94.0% on the EMO-DB dataset.
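MFCC extraction, the feature pipeline named above, follows a standard recipe: power spectrum, triangular mel filterbank, log compression, then a DCT-II. A compact NumPy sketch of that recipe for a single frame (parameter values such as the FFT size and filter count are conventional defaults, not the paper's exact configuration):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    # Triangular filters with centers evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fbank

def mfcc(frame, sr=16000, n_ceps=13):
    # Power spectrum -> mel energies -> log -> DCT-II = MFCCs.
    spec = np.abs(np.fft.rfft(frame, 512)) ** 2
    log_e = np.log(mel_filterbank(sr=sr) @ spec + 1e-10)
    n = len(log_e)
    # DCT-II via its explicit cosine basis (avoids a scipy dependency).
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n) + 0.5) / n)
    return basis @ log_e

coeffs = mfcc(np.sin(2 * np.pi * 440 * np.arange(400) / 16000))
print(coeffs.shape)  # (13,)
```

The DCT at the end decorrelates the log filterbank energies, which is why only the first dozen or so coefficients are typically kept as features.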
Article
Physics, Multidisciplinary
Yuan Zong, Hailun Lian, Hongli Chang, Cheng Lu, Chuangao Tang
Summary: This paper focuses on the challenging task of cross-corpus speech emotion recognition (SER). To tackle the feature distribution mismatch between labeled source and target speech samples from different emotion corpora, the authors propose a transfer subspace learning method called MDAR. By learning a projection matrix and incorporating a novel regularization term called MDA, the MDAR method achieves better performance than other state-of-the-art transfer learning methods in cross-corpus SER tasks.
Article
Acoustics
Shuiyang Mao, P. C. Ching, Tan Lee
Summary: This paper presents a deep neural network approach for speech emotion recognition using a limited amount of labeled data. Unlike traditional methods, this approach trains backbone networks on shorter segments, thereby increasing the number of training examples. However, due to the lack of segment-level labels in most emotional corpora, an iterative self-learning framework is proposed to correct the labels and improve recognition performance.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
(2022)
Article
Computer Science, Information Systems
Mousa Alhajlah
Summary: In this paper, a novel facial expression recognition (FER) framework is proposed for patient monitoring. After preprocessing and data balancing, two lightweight, efficient CNN models, MobileNetV2 and NasNetMobile, are trained and their feature vectors extracted. The whale optimization algorithm (WOA) removes irrelevant features from these vectors, and the optimized features are passed to the classifier. Experimental results show that the proposed model achieves 82.5% accuracy, outperforming state-of-the-art techniques while using 2.8 times fewer features.
CMC-COMPUTERS MATERIALS & CONTINUA
(2023)
Article
Engineering, Electrical & Electronic
Zhihang Deng, Xu Zhang, Xi Chen, Xiang Chen, Xun Chen, Erwei Yin
Summary: The study aims to develop a nonacoustic silent speech recognition (SSR) modality that transfers knowledge learned from a high-density electrode array to a system using only a few channels, combining high portability with high performance. A convolutional neural network (CNN) was trained on data recorded from face and neck muscles, then calibrated through transfer learning to adapt to a new target domain with data recorded by separate electrodes. The proposed method outperformed other classification approaches and retained its performance gains even under electrode shift and cross-user variability.
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT
(2023)
Article
Psychiatry
Lasse Hansen, Yan-Ping Zhang, Detlef Wolf, Konstantinos Sechidis, Nicolai Ladegaard, Riccardo Fusaroli
Summary: A generalizable speech emotion recognition model trained using transfer learning on non-clinical datasets can effectively predict changes in depressive states before and after remission in patients with major depressive disorder (MDD). Data collection and cleaning play crucial roles in ensuring the accuracy of automated voice analysis for clinical purposes.
ACTA PSYCHIATRICA SCANDINAVICA
(2022)
Article
Acoustics
Shuhua Liu, Mengyu Zhang, Ming Fang, Jianwei Zhao, Kun Hou, Chih-Cheng Hung
Summary: Speech plays a crucial role in human-computer emotional interaction, and this study utilizes the FaceNet model to improve speech emotion recognition. By pretraining on the CASIA dataset, whose clean signals aid learning, and fine-tuning on the IEMOCAP dataset, the proposed approach achieves high accuracy. Experimental results demonstrate that the method outperforms state-of-the-art single-modal approaches on the IEMOCAP dataset.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA
(2021)
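The pretrain-then-fine-tune recipe used in the entry above can be sketched generically: keep the pretrained feature extractor frozen and update only the task-specific head on target-domain data. A minimal NumPy illustration with a toy linear model (the shapes, data, and single-step update are illustrative assumptions, not the authors' FaceNet pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights come from pretraining on a source corpus.
W_feat = rng.normal(size=(8, 4))   # feature extractor: kept frozen
W_head = rng.normal(size=(4, 3))   # classifier head: fine-tuned

def fine_tune_step(x, y_onehot, lr=0.1):
    """One softmax cross-entropy gradient step on the head only."""
    global W_head
    h = np.maximum(x @ W_feat, 0)            # frozen ReLU features
    logits = h @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    # Gradient of mean cross-entropy w.r.t. W_head; W_feat is untouched.
    W_head -= lr * (h.T @ (p - y_onehot)) / len(x)

x = rng.normal(size=(16, 8))                 # a small target-domain batch
y = np.eye(3)[rng.integers(0, 3, size=16)]   # one-hot emotion labels
head_before = W_head.copy()
fine_tune_step(x, y)
print(np.array_equal(head_before, W_head))   # False: only the head moved
```

Freezing the extractor is what lets a small target corpus such as IEMOCAP benefit from representations learned on a larger, cleaner source corpus without overfitting.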
Article
Computer Science, Artificial Intelligence
Guihua Wen, Huiqiang Liao, Huihui Li, Pengchen Wen, Tong Zhang, Sande Gao, Bao Wang
Summary: This paper proposes a self-labeling learning method for speech emotion recognition, which automatically segments each speech sample and labels the segments with emotional tags. A time-frequency deep neural network is designed and trained, and a feature-transfer model is applied to further enhance performance.
KNOWLEDGE-BASED SYSTEMS
(2022)
Article
Computer Science, Artificial Intelligence
Junde Chen, Defu Zhang, Md Suzauddola, Adnan Zeb
Summary: Crop diseases are a major issue globally, leading to decreased crop production. Image-based automatic identification methods have gained attention for addressing this problem. This study introduces a Location-wise Soft Attention mechanism in MobileNet-V2, showing promising results in crop disease recognition through experimental analyses.
APPLIED SOFT COMPUTING
(2021)
Article
Computer Science, Artificial Intelligence
Zhen-Tao Liu, Bao-Han Wu, Meng-Ting Han, Wei-Hua Cao, Min Wu
Summary: In this study, a few-shot learning method based on meta-transfer learning with domain adaptation is proposed for speech emotion recognition (SER). It effectively reduces overfitting and solves the target-domain adaptability problem.
APPLIED SOFT COMPUTING
(2023)
Article
Computer Science, Artificial Intelligence
Navid Naderi, Babak Nasersharif
Summary: This paper proposes a method for adapting a speech emotion recognition system to different conditions. It uses attention-based feature fusion and transfer learning in both feature extraction and classification. Experimental results demonstrate the effectiveness of the proposed method on various target corpora.
KNOWLEDGE-BASED SYSTEMS
(2023)