Article
Computer Science, Artificial Intelligence
Di Jiang, Conghui Tan, Jinhua Peng, Chaotao Chen, Xueyang Wu, Weiwei Zhao, Yuanfeng Song, Yongxin Tong, Chang Liu, Qian Xu, Qiang Yang, Li Deng
Summary: Automatic Speech Recognition (ASR) is crucial in real-world applications, but commercial solutions often suffer performance degradation and face data-regulation constraints. By integrating three machine learning paradigms, the authors build a win-win ecosystem that resolves these problems for both clients and vendors.
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY
(2021)
Article
Engineering, Electrical & Electronic
Cao Hong Nga, Duc-Quang Vu, Huong Hoang Luong, Chien-Lin Huang, Jia-Ching Wang
Summary: This study proposes a cyclic transfer learning method (CTL) that improves performance on the target task by using code-switching and monolingual speech resources as pretext tasks. The model is trained alternately on these tasks, which preserves code-switching features for knowledge transfer. Experimental results on the SEAME Mandarin-English code-switching corpus show that the CTL approach outperforms the compared methods, with a significant relative mixed error rate (MER) reduction on the test sets.
IEEE SIGNAL PROCESSING LETTERS
(2023)
Article
Computer Science, Artificial Intelligence
Mousumi Malakar, Ravindra B. Keskar, Ajit Zadgaonkar
Summary: A phoneme is the smallest distinct sound unit that differentiates words in a language. This paper proposes a hierarchical classification approach using machine learning techniques for phoneme recognition, which improves performance over direct (flat) classification.
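The two-stage idea can be illustrated generically: a first classifier predicts a broad phonetic class, and a per-class model then picks a phoneme within it. A minimal Python sketch with toy stand-ins for the trained models (the class names, groupings, and feature dictionary are illustrative assumptions, not the authors' actual setup):

```python
# Toy sketch of hierarchical (two-stage) phoneme classification.
# The groupings and "models" below are illustrative placeholders.

BROAD_CLASSES = {
    "vowel": {"aa", "iy", "uw"},
    "stop": {"p", "t", "k"},
    "fricative": {"s", "f", "sh"},
}

def broad_classifier(features):
    # Stage 1: predict the broad phonetic class from the features.
    return features["class_hint"]  # stand-in for a trained model

def fine_classifier(broad, features):
    # Stage 2: a per-class model chooses a phoneme within that class.
    candidates = sorted(BROAD_CLASSES[broad])
    return candidates[features["index"] % len(candidates)]

def predict(features):
    # Hierarchical prediction = stage 1 followed by stage 2.
    return fine_classifier(broad_classifier(features), features)

print(predict({"class_hint": "vowel", "index": 1}))  # → 'iy'
```

The benefit of the hierarchy is that each stage-2 model only discriminates among a handful of acoustically similar phonemes, a much easier task than one flat classifier over the full phoneme inventory.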
Article
Computer Science, Information Systems
Venkateswarlu Poluboina, Aparna Pulikala, Arivudai Nambi Pitchai Muthu
Summary: A cochlear implant is the most suitable option for individuals with severe-to-profound hearing loss: it restores audibility and offers good speech understanding in quiet conditions. However, speech perception in noise with cochlear implants is suboptimal because current coding strategies lack sophisticated pre-processing. This study proposes a novel pre-processing method to improve speech intelligibility in noise and evaluates it with objective and subjective tests.
Article
Chemistry, Analytical
Ilkhomjon Pulatov, Rashid Oteniyazov, Fazliddin Makhmudov, Young-Im Cho
Summary: Understanding and identifying emotional cues in human speech is crucial for human-computer communication. This study proposes an innovative framework for speech emotion recognition that utilizes spectrograms and semantic feature transcribers. The framework combines convolutional neural network models with a Mel-frequency cepstral coefficient (MFCC) feature-extraction approach for a richer representation. Evaluation shows superior performance compared to existing models, with 94.8% accuracy on the RAVDESS dataset and 94.0% on the EMO-DB dataset.
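MFCC extraction, the feature pipeline named above, follows a standard recipe: power spectrum, triangular mel filterbank, log compression, then a DCT-II. A compact NumPy sketch of that recipe for a single frame (parameter values such as the FFT size and filter count are conventional defaults, not the paper's exact configuration):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    # Triangular filters with centers evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fbank

def mfcc(frame, sr=16000, n_ceps=13):
    # Power spectrum -> mel energies -> log -> DCT-II = MFCCs.
    spec = np.abs(np.fft.rfft(frame, 512)) ** 2
    log_e = np.log(mel_filterbank(sr=sr) @ spec + 1e-10)
    n = len(log_e)
    # DCT-II via its explicit cosine basis (avoids a scipy dependency).
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n) + 0.5) / n)
    return basis @ log_e

coeffs = mfcc(np.sin(2 * np.pi * 440 * np.arange(400) / 16000))
print(coeffs.shape)  # (13,)
```

The DCT at the end decorrelates the log filterbank energies, which is why only the first dozen or so coefficients are typically kept as features.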
Article
Physics, Multidisciplinary
Yuan Zong, Hailun Lian, Hongli Chang, Cheng Lu, Chuangao Tang
Summary: This paper focuses on the challenging task of cross-corpus speech emotion recognition (SER). To tackle the feature distribution mismatch between labeled source and target speech samples from different emotion corpora, the authors propose a transfer subspace learning method called MDAR. By learning a projection matrix and incorporating a novel regularization term called MDA, the MDAR method achieves better performance than other state-of-the-art transfer learning methods in cross-corpus SER tasks.
Article
Acoustics
Shuiyang Mao, P. C. Ching, Tan Lee
Summary: This paper presents a deep neural network approach for speech emotion recognition using a limited amount of labeled data. Unlike traditional methods, this approach trains backbone networks on shorter segments, thereby increasing the number of training examples. However, due to the lack of segment-level labels in most emotional corpora, an iterative self-learning framework is proposed to correct the labels and improve recognition performance.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
(2022)
Article
Computer Science, Information Systems
Mousa Alhajlah
Summary: In this paper, a novel facial expression recognition (FER) framework is proposed for patient monitoring. After preprocessing and data balancing, two lightweight, efficient CNN models, MobileNetV2 and NasNetMobile, are trained and their feature vectors extracted. The whale optimization algorithm (WOA) removes irrelevant features from these vectors, and the optimized features are passed to the classifier. Experimental results show that the proposed model achieves 82.5% accuracy, outperforming state-of-the-art techniques while using 2.8 times fewer features.
CMC-COMPUTERS MATERIALS & CONTINUA
(2023)
Article
Engineering, Electrical & Electronic
Zhihang Deng, Xu Zhang, Xi Chen, Xiang Chen, Xun Chen, Erwei Yin
Summary: The study aims to develop a nonacoustic silent speech recognition (SSR) modality that transfers knowledge learned from a high-density electrode array to a system using only a few channels, combining high portability with high performance. A convolutional neural network (CNN) was trained on data recorded from face and neck muscles, then calibrated through transfer learning to adapt to a new target domain with data recorded by separate electrodes. The proposed method outperformed other classification approaches and retained its performance gains even under electrode shift and cross-user variability.
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT
(2023)
Article
Psychiatry
Lasse Hansen, Yan-Ping Zhang, Detlef Wolf, Konstantinos Sechidis, Nicolai Ladegaard, Riccardo Fusaroli
Summary: A generalizable speech emotion recognition model trained using transfer learning on non-clinical datasets can effectively predict changes in depressive states before and after remission in patients with major depressive disorder (MDD). Data collection and cleaning play crucial roles in ensuring the accuracy of automated voice analysis for clinical purposes.
ACTA PSYCHIATRICA SCANDINAVICA
(2022)
Article
Acoustics
Shuhua Liu, Mengyu Zhang, Ming Fang, Jianwei Zhao, Kun Hou, Chih-Cheng Hung
Summary: Speech plays a crucial role in human-computer emotional interaction, and this study utilizes the FaceNet model to improve speech emotion recognition. By pretraining on the CASIA dataset, whose clean signals aid learning, and fine-tuning on the IEMOCAP dataset, the proposed approach achieves high accuracy. Experimental results demonstrate that the method outperforms state-of-the-art single-modal approaches on the IEMOCAP dataset.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA
(2021)
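The pretrain-then-fine-tune recipe used in the entry above can be sketched generically: keep the pretrained feature extractor frozen and update only the task-specific head on target-domain data. A minimal NumPy illustration with a toy linear model (the shapes, data, and single-step update are illustrative assumptions, not the authors' FaceNet pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights come from pretraining on a source corpus.
W_feat = rng.normal(size=(8, 4))   # feature extractor: kept frozen
W_head = rng.normal(size=(4, 3))   # classifier head: fine-tuned

def fine_tune_step(x, y_onehot, lr=0.1):
    """One softmax cross-entropy gradient step on the head only."""
    global W_head
    h = np.maximum(x @ W_feat, 0)            # frozen ReLU features
    logits = h @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    # Gradient of mean cross-entropy w.r.t. W_head; W_feat is untouched.
    W_head -= lr * (h.T @ (p - y_onehot)) / len(x)

x = rng.normal(size=(16, 8))                 # a small target-domain batch
y = np.eye(3)[rng.integers(0, 3, size=16)]   # one-hot emotion labels
head_before = W_head.copy()
fine_tune_step(x, y)
print(np.array_equal(head_before, W_head))   # False: only the head moved
```

Freezing the extractor is what lets a small target corpus such as IEMOCAP benefit from representations learned on a larger, cleaner source corpus without overfitting.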
Article
Computer Science, Artificial Intelligence
Guihua Wen, Huiqiang Liao, Huihui Li, Pengchen Wen, Tong Zhang, Sande Gao, Bao Wang
Summary: This paper proposes a self-labeling learning method for speech emotion recognition, which automatically segments each speech sample and labels the segments with emotional tags. A time-frequency deep neural network is designed and trained, and a feature-transfer model is applied to further enhance performance.
KNOWLEDGE-BASED SYSTEMS
(2022)
Article
Computer Science, Artificial Intelligence
Junde Chen, Defu Zhang, Md Suzauddola, Adnan Zeb
Summary: Crop diseases are a major issue globally, leading to decreased crop production. Image-based automatic identification methods have gained attention for addressing this problem. This study introduces a Location-wise Soft Attention mechanism in MobileNet-V2, showing promising results in crop disease recognition through experimental analyses.
APPLIED SOFT COMPUTING
(2021)
Article
Computer Science, Artificial Intelligence
Zhen-Tao Liu, Bao-Han Wu, Meng-Ting Han, Wei-Hua Cao, Min Wu
Summary: In this study, a few-shot learning method based on meta-transfer learning with domain adaptation is proposed for speech emotion recognition (SER). It effectively reduces overfitting and solves the target-domain adaptability problem.
APPLIED SOFT COMPUTING
(2023)
Article
Computer Science, Artificial Intelligence
Navid Naderi, Babak Nasersharif
Summary: This paper proposes a method for adapting a speech emotion recognition system to different conditions. It uses attention-based feature fusion and transfer learning in both feature extraction and classification. Experimental results demonstrate the effectiveness of the proposed method on various target corpora.
KNOWLEDGE-BASED SYSTEMS
(2023)