Article
Psychology, Multidisciplinary
Xiang Chen, Rubing Huang, Xin Li, Lei Xiao, Ming Zhou, Linghao Zhang
Summary: Emotional design is a crucial trend in interaction design, playing a key role in enhancing user experience and evoking emotional resonance. By focusing on users' emotional experiences, designers are increasingly emphasizing the importance of emotional design in products to enhance their design thinking.
FRONTIERS IN PSYCHOLOGY
(2021)
Article
Computer Science, Artificial Intelligence
Zengzhao Chen, Jiawen Li, Hai Liu, Xuyang Wang, Hu Wang, Qiuyu Zheng
Summary: This study proposes a parallel network for multi-scale speech emotion recognition that fuses frame-level manual features with utterance-level deep features using a connection attention mechanism. The experiments demonstrate the effectiveness and performance superiority of the proposed method.
EXPERT SYSTEMS WITH APPLICATIONS
(2023)
Article
Engineering, Electrical & Electronic
Ke Liu, Dekui Wang, Dongya Wu, Yutao Liu, Jun Feng
Summary: The aim of this research is to improve the performance of human speech emotion recognition. The proposed multi-level attention network (MLAnet) extracts low-level emotion features from the widely used mel-frequency cepstral coefficients (MFCCs) and weights these features using a multi-unit attention module. Experimental results show that this method outperforms other state-of-the-art approaches.
IEEE SIGNAL PROCESSING LETTERS
(2022)
Article
Acoustics
Orhan Atila, Abdulkadir Sengur
Summary: This paper proposes a novel approach to speech-based emotion recognition built on an attention-guided 3D convolutional neural network (CNN) and long short-term memory (LSTM) model. The method is evaluated on three datasets and outperforms the compared methods in classification accuracy, sensitivity, specificity, and F1-score.
Article
Chemistry, Analytical
Dilnoza Mamieva, Akmalbek Bobomirzaevich Abdusalomov, Alpamis Kutlimuratov, Bahodir Muminov, Taeg Keun Whangbo
Summary: Methods that use multiple modalities to detect emotions are more accurate and resilient than those that rely on a single modality. Sentiments can be conveyed in various ways, and combining data from multiple modalities leads to a more comprehensive understanding of a person's emotional state.
Article
Biochemical Research Methods
Xiaoyang Xiang, Jiaxuan Gao, Yanrui Ding
Summary: This study proposes DeepPPThermo, a deep learning-based classifier that combines classical sequence features with deep learning representation features to classify thermophilic and mesophilic proteins. The model uses deep neural networks and bidirectional long short-term memory networks to extract hidden features, and applies local and global attention mechanisms to assign different importance to multiview features. Experimental results show that the model outperforms other machine learning and deep learning algorithms. The robustness of the model and the importance of each feature are also demonstrated.
JOURNAL OF COMPUTATIONAL BIOLOGY
(2023)
Review
Chemistry, Analytical
Babak Joze Abbaschian, Daniel Sierra-Sosa, Adel Elmaghraby
Summary: This study reviews deep learning and conventional machine learning techniques for speech emotion recognition in order to compare different approaches and achieve feasible solutions. The goal is to provide a survey of discrete speech emotion recognition in the field.
Article
Computer Science, Information Systems
Jennifer Santoso, Takeshi Yamada, Kenkichi Ishizuka, Taiichi Hashimoto, Shoji Makino
Summary: This research aims to improve speech emotion recognition (SER) performance with a BLSTM and self-attention. It uses the SAWC method to adjust the importance weights of segments and words with high ASR error probability, and achieves higher accuracy in experiments.
Article
Computer Science, Artificial Intelligence
Xin-Cheng Wen, Kun-Hong Liu, Yan Luo, Jiaxin Ye, Liyan Chen
Summary: This study proposes a Capsule Network with Two-Way Attention Mechanism (TWACapsNet) for the Speech Emotion Recognition (SER) problem. Experimental results demonstrate that the proposed method outperforms other neural network models on multiple SER datasets, and the combination of the two ways contributes to the higher and more stable performance of TWACapsNet.
Article
Computer Science, Information Systems
Sera Kim, Seok-Pil Lee
Summary: The significance of emotion recognition technology continues to grow, and this study proposes a new model architecture that combines BiLSTM-Transformer and 2D CNN to enhance the efficacy of emotion recognition from speech. The results show high accuracy rates in two major emotion recognition databases.
Article
Computer Science, Information Systems
Yanying Mao, Yu Zhang, Liudan Jiao, Heshan Zhang
Summary: This paper proposes a new neural network model, AttBiLSTM-2DCNN, for sentiment analysis on long texts and addresses the challenge of differentiating the importance of document features. The experimental results demonstrate that the model can capture sentimental relations and outperform certain state-of-the-art models.
Article
Computer Science, Artificial Intelligence
Ascension Gallardo-Antolin, Juan M. Montero
Summary: This study presents an automatic prediction system for speech intelligibility level using LSTM networks and attention mechanism. Two main contributions are proposed: using per-frame modulation spectrograms as input features, and exploring two different strategies for combining per-frame log-mel and modulation spectrograms in the LSTM framework. Results show that attentional LSTM networks can effectively model modulation spectrograms and the combination strategies outperform single-feature systems.
Article
Computer Science, Information Systems
Changxuan Yang, Feng Mei, Tuo Zang, Jianfeng Tu, Nan Jiang, Lingfeng Liu
Summary: In this paper, a key-frame-based approach to human action recognition is proposed. A key-frame attention-based LSTM network (KF-LSTM) is designed using the attention mechanism to effectively recognize human action sequences by assigning different weights to key frames. A new key-frame extraction method is also designed to avoid confusion in the temporal sequence of key frames and ensure smooth human action recognition.
Article
Chemistry, Analytical
Shouyan Chen, Mingyan Zhang, Xiaofen Yang, Zhijia Zhao, Tao Zou, Xinqi Sun
Summary: This paper discusses the rules governing when Global-Attention and Self-Attention are applicable in constructing SER classifiers, and proposes a new classifier model that reaches an accuracy of 85.427% on the EMO-DB dataset.
Article
Computer Science, Artificial Intelligence
Huakang Li, Yidan Qiu, Huimin Zhao, Jin Zhan, Rongjun Chen, Tuanjie Wei, Zhihui Huang
Summary: In this paper, a novel model, GaitSlice, is proposed to analyze human gait based on spatio-temporal slice features. The model combines parallel RFAMs with inter-related slice features to focus on the features' spatio-temporal information, achieving high accuracy in gait recognition under cross-view and various walking conditions.
PATTERN RECOGNITION
(2022)
Article
Multidisciplinary Sciences
Emilia Parada-Cabaleiro, Anton Batliner, Maximilian Schmitt, Markus Schedl, Giovanni Costantini, Bjoern Schuller
Summary: This article addresses four fallacies in traditional affective computing and proposes a more adequate modelling of emotions encoded in speech. The fallacies include limited focus on few emotions, lack of comparison between clean and noisy data, insufficient assessment of machine learning approaches, and the absence of strict comparison between human perception and machine classification. The article demonstrates the effectiveness of machine learning based on state-of-the-art feature representations in reflecting the main emotional categories even in degraded acoustic conditions.
Article
Health Care Sciences & Services
Rafael A. Calvo, Dorian Peters, Laura Moradbakhti, Darren Cook, Georgios Rizos, Bjoern Schuller, Constantinos Kallis, Ernie Wong, Jennifer Quint
Summary: This study aims to determine the feasibility and usability of a text-based conversational agent to assess asthma risk and provide information for improving asthma control. The study will recruit 300 adult participants through various channels and assess their asthma outcomes. The study is expected to be completed in 2023, and will inform future pilot studies and randomized controlled trials.
JMIR RESEARCH PROTOCOLS
(2023)
Article
Engineering, Biomedical
Zhihua Wang, Kun Qian, Houguang Liu, Bin Hu, Bjorn W. Schuller, Yoshiharu Yamamoto
Summary: The advantages of non-invasive, real-time and convenient computer audition-based heart sound abnormality detection methods have attracted increasing attention from the cardiovascular diseases community. A comprehensive investigation on time-frequency methods for analyzing heart sounds is proposed, considering the urgent need for robust detection algorithms in real environments. Experimental results show that Stockwell transformation outperforms other methods with the highest overall score of 65.2%, and the interpretable results demonstrate its ability to provide more information and noise robustness for heart sounds.
BIOMEDICAL SIGNAL PROCESSING AND CONTROL
(2023)
Article
Computer Science, Artificial Intelligence
Sebastian P. Bayerl, Maurice Gerczuk, Anton Batliner, Christian Bergler, Shahin Amiriparian, Bjoern Schuller, Elmar Noeth, Korbinian Riedhammer
Summary: The ACM Multimedia 2022 Computational Paralinguistics Challenge (ComParE) focused on the classification of stuttering, aiming to raise awareness and engage a wider research community. Stuttering is a complex speech disorder characterized by blocks, prolongations, and repetitions in speech. Accurate classification of stuttering symptoms is important for the development of self-help tools and specialized automatic speech recognition systems. This paper reviews the challenge contributions, presents improved state-of-the-art classification results, and explores cross-language training using the KSF-C dataset.
COMPUTER SPEECH AND LANGUAGE
(2023)
Article
Computer Science, Artificial Intelligence
Mostafa Amin, Erik W. Cambria, Bjorn Schuller
Summary: ChatGPT demonstrates the potential of general artificial intelligence capabilities and performs well across various natural language processing tasks. This study evaluates ChatGPT's text classification abilities for affective computing problems including personality prediction, sentiment analysis, and suicide tendency detection. Results show that task-specific RoBERTa models generally outperform other baselines, while ChatGPT performs decently and is comparable to Word2Vec and BoW baselines. ChatGPT exhibits robustness against noisy data, outperforming Word2Vec in such scenarios. The study concludes that ChatGPT is a good generalist model but not as specialized as task-specific models for optimal performance.
IEEE INTELLIGENT SYSTEMS
(2023)
Article
Acoustics
Wen-Hsing Lai, Tsung-Yuan Chou, Meng-Chen Chou, Bjoern W. Schuller
Summary: This paper proposes an audio watermarking technique using Complementary Ensemble Empirical Mode Decomposition and group differential relations. The technique achieves near-imperceptibility and robustness under various attacks, and the experimental results validate its effectiveness.
JOURNAL OF THE AUDIO ENGINEERING SOCIETY
(2023)
Article
Biology
Jenna Lawson, George Rizos, Dui Jasinghe, Andrew Whitworth, Bjoern Schuller, Cristina Banks-Leite
Summary: With human activity increasing and threatened species at risk of extinction, it is important to understand how to conserve these species across human-modified landscapes. Passive acoustic monitoring (PAM) is an efficient method for collecting data on vocal species, but there is a lack of automated species detectors to analyze large amounts of acoustic data. In this study, we used PAM and a newly developed automated detector to successfully detect the endangered Geoffroy's spider monkey. We found that the monkeys were absent below a certain forest cover threshold and near primary paved roads, and occurred equally in old growth and secondary forests.
PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES
(2023)
Article
Computer Science, Artificial Intelligence
Rodrigo Mira, Eduardo Coutinho, Emilia Parada-Cabaleiro, Bjoern W. Schuller
Summary: Music composition is challenging to automate due to the subjective nature of what is considered aesthetically pleasing. Past neural network-based methods have lacked consistency and failed to produce impressive results. In this project, we built upon Magenta's RL Tuner model and extended it to emulate the Galician Xota genre. By implementing a new rule-set and training a Deep Q Network using reward functions, we effectively enforced the desired style and structure on the generated compositions. Our research methodology provides a solid foundation for future studies using this architecture, and we propose further applications and improvements for this model in future work.
PEERJ COMPUTER SCIENCE
(2023)
Article
Computer Science, Artificial Intelligence
Mostafa M. Amin, Erik Cambria, Bjoern W. Schuller
Summary: The employment of foundation models is expanding and ChatGPT has the potential to enhance existing NLP techniques with its novel knowledge.
IEEE INTELLIGENT SYSTEMS
(2023)
Article
Computer Science, Artificial Intelligence
Johannes Wagner, Andreas Triantafyllopoulos, Hagen Wierstorf, Maximilian Schmitt, Felix Burkhardt, Florian Eyben, Bjoern W. Schuller
Summary: Recent advances in transformer-based architectures have shown promise in several machine learning tasks, specifically speech emotion recognition (SER) in the audio domain. However, existing works have not thoroughly evaluated the influence of model size and pre-training data on downstream performance, and have shown limited attention to generalisation, robustness, fairness, and efficiency. This study conducts a thorough analysis on pre-trained variants of wav2vec 2.0 and HuBERT, demonstrating their top performance for valence prediction without explicit linguistic information, and releasing the best performing model to the community for reproducibility.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Clinical Neurology
Nicholas Cummins, Judith Dineley, Pauline Conde, Faith Matcham, Sara Siddi, Femke Lamers, Ewan Carr, Grace Lavelle, Daniel Leightley, Katie M. White, Carolin Oetzmann, Edward L. Campbell, Sara Simblett, Stuart Bruce, Josep Maria Haro, Brenda W. J. H. Penninx, Yatharth Ranjan, Zulqarnain Rashid, Callum Stewart, Amos A. Folarin, Raquel Bailon, Bjoern W. Schuller, Til Wykes, Srinivasan Vairavan, Richard J. B. Dobson, Vaibhav A. Narayan, RADAR-CNS Consortium
Summary: Speech rate, articulation rate, and intensity of speech are associated with depressive symptoms, suggesting that these speech features may serve as biomarkers for major depressive disorder (MDD). This study collected real-world data, providing significant insights into the onset and progress of MDD.
JOURNAL OF AFFECTIVE DISORDERS
(2023)
Article
Computer Science, Artificial Intelligence
Decky Aspandi, Federico Sukno, Bjorn W. Schuller, Xavier Binefa
Summary: There is growing interest in automatic emotion recognition and affective computing. The use of large video-based affect datasets has facilitated the development of deep learning-based models for automatic affect analysis. However, current approaches to processing these multimodal inputs are oversimplified and fail to fully exploit their potential. This work proposes a multimodal, sequence-based neural network with gating mechanisms for affect recognition, achieving state-of-the-art accuracy on two affect datasets.
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
(2023)
Editorial Material
Computer Science, Artificial Intelligence
Frank Xing, Bjoern Schuller, Iti Chaturvedi, Erik Cambria, Amir Hussain
Summary: Neural network-based methods, such as word2vec and GPT-based models, have achieved significant progress in AI research, especially in handling large datasets. However, these methods lack in-depth understanding of the internal features and representations of the data, leading to various problems and concerns.
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
(2023)
Article
Computer Science, Artificial Intelligence
Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Bjorn Schuller
Summary: Despite recent advancements in speech emotion recognition (SER) within a single corpus, the performance of these systems degrades significantly for cross-corpus and cross-language scenarios. This is due to the lack of generalization in SER systems towards unseen conditions. Adversarial methods have been used to address this issue, but many only focus on cross-corpus SER and ignore the cross-language performance degradation. This study proposes an adversarial dual discriminator (ADDi) network and a self-supervised ADDi (sADDi) network to improve cross-corpus and cross-language SER without requiring target data labels. Experimental results demonstrate improved performance compared to state-of-the-art methods.
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
(2023)
Article
Computer Science, Artificial Intelligence
Jingjie Yan, Guanming Lu, Xiaonan Li, Wenming Zheng, Chengwei Huang, Zhen Cui, Yuan Zong, Mengying Chen, Qiang Hao, Yi Liu, Jindu Zhu, Haibo Li
Summary: In this article, a new neonatal facial expression database for pain analysis is introduced. The database, called facial expression of neonatal pain (FENP), consists of 11,000 neonatal facial expression images associated with 106 Chinese neonates. The experimental results show that the proposed database is suitable for studying neonatal pain and facial expression recognition.
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
(2023)