The Emotion Probe: On the Universality of Cross-Linguistic and Cross-Gender Speech Emotion Recognition via Machine Learning

SENSORS (2022)

Journal

SENSORS

Volume 22, Issue 7

Publisher

MDPI

DOI: 10.3390/s22072461

Keywords

speech; emotion recognition; artificial intelligence; English; cross-linguistic; cross-gender; SVM; machine learning; SER

Automated Summary

This study investigates the feasibility and characteristics of cross-linguistic and cross-gender speech emotion recognition (SER). The MLP classifier proves the most effective, with accuracies above 90% for single-language approaches and above 80% for cross-language classification. Cross-gender tasks are more difficult than cross-language ones, suggesting that emotional expression differs more between male and female subjects than between languages. RASTA, F0, MFCC, and spectral energy are identified as the most effective feature domains.

Abstract

Machine learning (ML) algorithms within a human-computer framework are the leading force in speech emotion recognition (SER). However, few studies explore cross-corpora aspects of SER; this work explores the feasibility and characteristics of cross-linguistic, cross-gender SER. Three ML classifiers (SVM, Naive Bayes and MLP) are applied to acoustic features obtained through a procedure based on Kononenko's discretization and correlation-based feature selection. The system covers five emotions (disgust, fear, happiness, anger and sadness) and uses the Emofilm database, composed of short clips of English movies and the respective Italian and Spanish dubbed versions, for a total of 1115 annotated utterances. The results identify MLP as the most effective classifier, with accuracies higher than 90% for single-language approaches, while the cross-language classifier still yields accuracies higher than 80%. Cross-gender tasks prove more difficult than those involving two languages, suggesting greater differences between emotions expressed by male versus female subjects than between different languages. Four feature domains, namely RASTA, F0, MFCC and spectral energy, are algorithmically assessed as the most effective, refining existing literature and approaches based on standard sets. To our knowledge, this is one of the first studies encompassing cross-gender and cross-linguistic assessments on SER.
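The cross-language evaluation described in the abstract (train on utterances from one language, test on another) can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses no Emofilm data, no Kononenko discretization and no correlation-based feature selection. The features are synthetic Gaussian blobs, the `shift` parameter is a hypothetical stand-in for a language-dependent offset, and a small hand-written Gaussian Naive Bayes stands in for the Naive Bayes family of classifiers named above. All function names are illustrative.

```python
import math
import random
from collections import defaultdict

# The five emotion labels come from the abstract; everything else is assumed.
EMOTIONS = ["disgust", "fear", "happiness", "anger", "sadness"]


def make_corpus(rng, shift, n_per_class=40, n_features=4):
    """Synthetic stand-in for one language: each emotion is a Gaussian blob."""
    data = []
    for k, emotion in enumerate(EMOTIONS):
        for _ in range(n_per_class):
            # Class mean depends on the emotion index; `shift` mimics a
            # (hypothetical) language-dependent offset in the features.
            x = [rng.gauss(3.0 * k + shift, 1.0) for _ in range(n_features)]
            data.append((x, emotion))
    return data


def fit_gnb(data):
    """Fit per-class log-priors, feature means and feature variances."""
    by_class = defaultdict(list)
    for x, y in data:
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[y] = (math.log(n / len(data)), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest Gaussian log-posterior."""
    def log_posterior(y):
        log_prior, means, variances = model[y]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)


rng = random.Random(0)
train_a = make_corpus(rng, shift=0.0)  # training data, "language A"
test_a = make_corpus(rng, shift=0.0)   # held-out data, same language
test_b = make_corpus(rng, shift=0.5)   # data from a shifted "language B"

model = fit_gnb(train_a)
within_acc = accuracy(model, test_a)
cross_acc = accuracy(model, test_b)
print(f"within-language accuracy: {within_acc:.2f}")
print(f"cross-language accuracy:  {cross_acc:.2f}")
```

The same protocol would apply to a cross-gender split by partitioning the corpus on speaker gender instead of language. The numbers this sketch prints reflect only the synthetic data and say nothing about the accuracies reported in the paper.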
