4.6 Article

Speech Emotion Classification Using Attention-Based LSTM

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TASLP.2019.2925934

Keywords

Speech emotion; frame-level features; LSTM; attention mechanism

Funding

  1. National Natural Science Foundation of China [61871213, 61673108, 61571106]
  2. Six Talent Peaks Project in Jiangsu Province [2016-DZXX-023]
  3. Natural Science Foundation of Jiangsu Province [BK20161517]

Abstract

Automatic speech emotion recognition has been a research hotspot in the field of human-computer interaction over the past decade. However, because the inherent temporal relationships within the speech waveform have received little attention, current recognition accuracy still needs improvement. To make full use of the differences in emotional saturation between time frames, a novel method is proposed for speech emotion recognition that combines frame-level speech features with attention-based long short-term memory (LSTM) recurrent neural networks. Frame-level speech features are extracted from the waveform to replace traditional statistical features, preserving the temporal relations of the original speech through the sequence of frames. To distinguish emotional saturation across frames, two improvements to LSTM based on the attention mechanism are proposed: first, the forget gate of the traditional LSTM is modified to reduce computational complexity without sacrificing performance; second, at the final output of the LSTM, attention is applied to both the time and the feature dimension to extract task-relevant information, instead of using only the output of the last iteration as in the traditional algorithm. Extensive experiments on the CASIA, eNTERFACE, and GEMEP emotion corpora demonstrate that the proposed approach outperforms the state-of-the-art algorithms reported to date.
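A minimal sketch of the idea the abstract describes — frame-level features fed to an LSTM whose frame-wise outputs are pooled by attention over both the time and the feature dimension, rather than taking only the last time step — is given below. This is not the authors' implementation: the layer sizes, the single-layer LSTM, the particular softmax attention form, and the six-class output are illustrative assumptions, and the modified forget gate is not reproduced here (PyTorch).

```python
# Sketch only: attention pooling over LSTM outputs along time and feature
# dimensions, as described in the abstract. All shapes and layer sizes are
# illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class AttentionLSTMClassifier(nn.Module):
    def __init__(self, feat_dim=120, hidden_dim=128, num_classes=6):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Temporal attention: one relevance score per frame.
        self.time_attn = nn.Linear(hidden_dim, 1)
        # Feature attention: one weight per hidden unit.
        self.feat_attn = nn.Linear(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                                   # x: (batch, frames, feat_dim)
        h, _ = self.lstm(x)                                 # h: (batch, frames, hidden_dim)
        # Attention over time: weight emotionally salient frames instead of
        # keeping only the last LSTM output.
        time_w = torch.softmax(self.time_attn(h), dim=1)    # (batch, frames, 1)
        pooled = (time_w * h).sum(dim=1)                    # (batch, hidden_dim)
        # Attention over the feature dimension of the pooled representation.
        feat_w = torch.softmax(self.feat_attn(pooled), dim=-1)
        z = feat_w * pooled
        return self.classifier(z)                           # emotion logits

# Usage with dummy frame-level features (e.g. 300 frames of 120-d features):
model = AttentionLSTMClassifier()
logits = model(torch.randn(8, 300, 120))                   # shape: (8, 6)
```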


Recommended

Article Multidisciplinary Sciences

Perception and classification of emotions in nonsense speech: Humans versus machines

Emilia Parada-Cabaleiro, Anton Batliner, Maximilian Schmitt, Markus Schedl, Giovanni Costantini, Bjoern Schuller

Summary: This article addresses four fallacies in traditional affective computing and proposes a more adequate modelling of emotions encoded in speech. The fallacies are a limited focus on only a few emotions, a lack of comparison between clean and noisy data, insufficient assessment of machine learning approaches, and the absence of a strict comparison between human perception and machine classification. The article demonstrates that machine learning based on state-of-the-art feature representations can reflect the main emotional categories even under degraded acoustic conditions.

PLOS ONE (2023)

Article Health Care Sciences & Services

Assessing the Feasibility of a Text-Based Conversational Agent for Asthma Support: Protocol for a Mixed Methods Observational Study

Rafael A. Calvo, Dorian Peters, Laura Moradbakhti, Darren Cook, Georgios Rizos, Bjoern Schuller, Constantinos Kallis, Ernie Wong, Jennifer Quint

Summary: This study aims to determine the feasibility and usability of a text-based conversational agent to assess asthma risk and provide information for improving asthma control. The study will recruit 300 adult participants through various channels and assess their asthma outcomes. The study is expected to be completed in 2023, and will inform future pilot studies and randomized controlled trials.

JMIR RESEARCH PROTOCOLS (2023)

Article Engineering, Biomedical

Exploring interpretable representations for heart sound abnormality detection

Zhihua Wang, Kun Qian, Houguang Liu, Bin Hu, Bjorn W. Schuller, Yoshiharu Yamamoto

Summary: Non-invasive, real-time, and convenient computer audition-based heart sound abnormality detection has attracted increasing attention from the cardiovascular disease community. Motivated by the urgent need for robust detection algorithms in real environments, a comprehensive investigation of time-frequency methods for analyzing heart sounds is proposed. Experimental results show that the Stockwell transform outperforms the other methods with the highest overall score of 65.2%, and the interpretable results demonstrate that it provides more information and better noise robustness for heart sounds.

BIOMEDICAL SIGNAL PROCESSING AND CONTROL (2023)

Article Computer Science, Artificial Intelligence

Classification of stuttering-The ComParE challenge and beyond

Sebastian P. Bayerl, Maurice Gerczuk, Anton Batliner, Christian Bergler, Shahin Amiriparian, Bjoern Schuller, Elmar Noeth, Korbinian Riedhammer

Summary: The ACM Multimedia 2022 Computational Paralinguistics Challenge (ComParE) focused on the classification of stuttering, aiming to raise awareness and engage a wider research community. Stuttering is a complex speech disorder characterized by blocks, prolongations, and repetitions in speech. Accurate classification of stuttering symptoms is important for the development of self-help tools and specialized automatic speech recognition systems. This paper reviews the challenge contributions, presents improved state-of-the-art classification results, and explores cross-language training using the KSF-C dataset.

COMPUTER SPEECH AND LANGUAGE (2023)

Article Computer Science, Artificial Intelligence

Will Affective Computing Emerge From Foundation Models and General Artificial Intelligence? A First Evaluation of ChatGPT

Mostafa Amin, Erik W. Cambria, Bjorn Schuller

Summary: ChatGPT demonstrates the potential of general artificial intelligence capabilities and performs well across various natural language processing tasks. This study evaluates ChatGPT's text classification abilities for affective computing problems including personality prediction, sentiment analysis, and suicide tendency detection. Results show that task-specific RoBERTa models generally outperform other baselines, while ChatGPT performs decently and is comparable to Word2Vec and BoW baselines. ChatGPT exhibits robustness against noisy data, outperforming Word2Vec in such scenarios. The study concludes that ChatGPT is a good generalist model but not as specialized as task-specific models for optimal performance.

IEEE INTELLIGENT SYSTEMS (2023)

Article Acoustics

Robust Audio Watermarking Based on Empirical Mode Decomposition and Group Differential Relations

Wen-Hsing Lai, Tsung-Yuan Chou, Meng-Chen Chou, Bjoern W. Schuller

Summary: This paper proposes an audio watermarking technique using Complementary Ensemble Empirical Mode Decomposition and group differential relations. The technique achieves near-imperceptibility and robustness under various attacks, and the experimental results validate its effectiveness.

JOURNAL OF THE AUDIO ENGINEERING SOCIETY (2023)

Article Biology

Automated acoustic detection of Geoffroy's spider monkey highlights tipping points of human disturbance

Jenna Lawson, George Rizos, Dui Jasinghe, Andrew Whitworth, Bjoern Schuller, Cristina Banks-leite

Summary: With increasing human activity putting threatened species at risk of extinction, it is important to understand how to conserve them across human-modified landscapes. Passive acoustic monitoring (PAM) is an efficient method for collecting data on vocal species, but automated species detectors for analyzing large volumes of acoustic data are lacking. In this study, we used PAM and a newly developed automated detector to successfully detect the endangered Geoffroy's spider monkey, finding that the species was absent below a certain forest-cover threshold and near primary paved roads, and occurred equally in old-growth and secondary forests.

PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES (2023)

Article Computer Science, Artificial Intelligence

Automated composition of Galician Xota-tuning RNN-based composers for specific musical styles using deep Q-learning

Rodrigo Mira, Eduardo Coutinho, Emilia Parada-Cabaleiro, Bjoern W. Schuller

Summary: Music composition is challenging to automate due to the subjective nature of what is considered aesthetically pleasing. Past neural network-based methods have lacked consistency and failed to produce impressive results. In this project, we built upon Magenta's RL Tuner model and extended it to emulate the Galician Xota genre. By implementing a new rule-set and training a Deep Q Network using reward functions, we effectively enforced the desired style and structure on the generated compositions. Our research methodology provides a solid foundation for future studies using this architecture, and we propose further applications and improvements for this model in future work.

PEERJ COMPUTER SCIENCE (2023)

Article Computer Science, Artificial Intelligence

Can ChatGPT's Responses Boost Traditional Natural Language Processing?

Mostafa M. Amin, Erik Cambria, Bjoern W. Schuller

Summary: The use of foundation models is expanding, and ChatGPT has the potential to enhance existing NLP techniques with its novel knowledge.

IEEE INTELLIGENT SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

Dawn of the Transformer Era in Speech Emotion Recognition: Closing the Valence Gap

Johannes Wagner, Andreas Triantafyllopoulos, Hagen Wierstorf, Maximilian Schmitt, Felix Burkhardt, Florian Eyben, Bjoern W. W. Schuller

Summary: Recent advances in transformer-based architectures have shown promise in several machine learning tasks, specifically speech emotion recognition (SER) in the audio domain. However, existing works have not thoroughly evaluated the influence of model size and pre-training data on downstream performance, and have shown limited attention to generalisation, robustness, fairness, and efficiency. This study conducts a thorough analysis on pre-trained variants of wav2vec 2.0 and HuBERT, demonstrating their top performance for valence prediction without explicit linguistic information, and releasing the best performing model to the community for reproducibility.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Clinical Neurology

Multilingual markers of depression in remotely collected speech samples: A preliminary analysis

Nicholas Cummins, Judith Dineley, Pauline Conde, Faith Matcham, Sara Siddi, Femke Lamers, Ewan Carr, Grace Lavelle, Daniel Leightley, Katie M. White, Carolin Oetzmann, Edward L. Campbell, Sara Simblett, Stuart Bruce, Josep Maria Haro, Brenda W. J. H. Penninx, Yatharth Ranjan, Zulqarnain Rashid, Callum Stewart, Amos A. Folarin, Raquel Bailon, Bjoern W. Schuller, Til Wykes, Srinivasan Vairavan, Richard J. B. Dobson, Vaibhav A. Narayan, RADAR-CNS Consortium

Summary: Speech rate, articulation rate, and intensity of speech are associated with depressive symptoms, suggesting that these speech features may serve as biomarkers for major depressive disorder (MDD). This study collected real-world data, providing significant insights into the onset and progress of MDD.

JOURNAL OF AFFECTIVE DISORDERS (2023)

Article Computer Science, Artificial Intelligence

Audio-Visual Gated-Sequenced Neural Networks for Affect Recognition

Decky Aspandi, Federico Sukno, Bjorn W. Schuller, Xavier Binefa

Summary: There is growing interest in automatic emotion recognition and affective computing. The availability of large video-based affect datasets has facilitated the development of deep learning-based models for automatic affect analysis. However, current approaches to processing these multimodal inputs are oversimplified and fail to fully exploit their potential. This work proposes a multi-modal, sequence-based neural network with gating mechanisms for affect recognition, achieving state-of-the-art accuracy on two affect datasets.

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2023)

Editorial Material Computer Science, Artificial Intelligence

Guest Editorial Neurosymbolic AI for Sentiment Analysis

Frank Xing, Bjoern Schuller, Iti Chaturvedi, Erik Cambria, Amir Hussain

Summary: Neural network-based methods, such as word2vec and GPT-based models, have achieved significant progress in AI research, especially in handling large datasets. However, these methods lack in-depth understanding of the internal features and representations of the data, leading to various problems and concerns.

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2023)

Article Computer Science, Artificial Intelligence

Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition

Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Bjorn Schuller

Summary: Despite recent advancements in speech emotion recognition (SER) within a single corpus, the performance of these systems degrades significantly for cross-corpus and cross-language scenarios. This is due to the lack of generalization in SER systems towards unseen conditions. Adversarial methods have been used to address this issue, but many only focus on cross-corpus SER and ignore the cross-language performance degradation. This study proposes an adversarial dual discriminator (ADDi) network and a self-supervised ADDi (sADDi) network to improve cross-corpus and cross-language SER without requiring target data labels. Experimental results demonstrate improved performance compared to state-of-the-art methods.

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2023)

Article Computer Science, Artificial Intelligence

FENP: A Database of Neonatal Facial Expression for Pain Analysis

Jingjie Yan, Guanming Lu, Xiaonan Li, Wenming Zheng, Chengwei Huang, Zhen Cui, Yuan Zong, Mengying Chen, Qiang Hao, Yi Liu, Jindu Zhu, Haibo Li

Summary: In this article, a new neonatal facial expression database for pain analysis is introduced. The database, called facial expression of neonatal pain (FENP), consists of 11,000 neonatal facial expression images associated with 106 Chinese neonates. The experimental results show that the proposed database is suitable for studying neonatal pain and facial expression recognition.

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2023)
