4.2 Article

Analysis of Emotionally Salient Aspects of Fundamental Frequency for Emotion Detection

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TASL.2008.2009578

Keywords

Emotional speech analysis; emotional speech recognition; expressive speech; intonation; pitch contour analysis

Funding

  1. National Science Foundation (NSF) through the Integrated Media Systems Center [EEC-9529152]
  2. Department of the Army
  3. Office of Naval Research

Abstract

During expressive speech, the voice is enriched to convey not only the intended semantic message but also the emotional state of the speaker. The pitch contour is one of the important properties of speech that is affected by this emotional modulation. Although pitch features have been commonly used to recognize emotions, it is not clear what aspects of the pitch contour are the most emotionally salient. This paper presents an analysis of the statistics derived from the pitch contour. First, pitch features derived from emotional speech samples are compared with the ones derived from neutral speech, by using symmetric Kullback-Leibler distance. Then, the emotionally discriminative power of the pitch features is quantified by comparing nested logistic regression models. The results indicate that gross pitch contour statistics such as mean, maximum, minimum, and range are more emotionally prominent than features describing the pitch shape. Also, analyzing the pitch statistics at the utterance level is found to be more accurate and robust than analyzing the pitch statistics for shorter speech regions (e.g., voiced segments). Finally, the best features are selected to build a binary emotion detection system for distinguishing between emotional versus neutral speech. A new two-step approach is proposed. In the first step, reference models for the pitch features are trained with neutral speech, and the input features are contrasted with the neutral model. In the second step, a fitness measure is used to assess whether the input speech is similar to, in the case of neutral speech, or different from, in the case of emotional speech, the reference models. The proposed approach is tested with four acted emotional databases spanning different emotional categories, recording settings, speakers, and languages. The results show that the recognition accuracy of the system is over 77% just with the pitch features (baseline 50%). When compared to conventional classification schemes, the proposed approach performs better in terms of both accuracy and robustness.
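The pipeline the abstract describes can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's implementation: the gross utterance-level statistics and the symmetric Kullback-Leibler distance follow the abstract, while the diagonal-Gaussian reference model, the log-likelihood "fitness" score, and the threshold are assumptions made here for concreteness.

```python
import numpy as np

def pitch_statistics(f0):
    """Gross utterance-level pitch statistics (the features the paper
    found most emotionally salient). f0: per-frame F0 in Hz, 0 = unvoiced."""
    voiced = f0[f0 > 0]
    return np.array([voiced.mean(), voiced.max(), voiced.min(),
                     voiced.max() - voiced.min()])

def symmetric_kl_gaussian(mu_p, var_p, mu_q, var_q):
    """Symmetric KL divergence J(p, q) = KL(p||q) + KL(q||p) between two
    univariate Gaussians, usable to compare the distribution of a pitch
    feature under emotional speech against the neutral one."""
    kl_pq = 0.5 * (var_p / var_q + (mu_q - mu_p) ** 2 / var_q
                   - 1.0 + np.log(var_q / var_p))
    kl_qp = 0.5 * (var_q / var_p + (mu_p - mu_q) ** 2 / var_p
                   - 1.0 + np.log(var_p / var_q))
    return kl_pq + kl_qp

class NeutralReferenceDetector:
    """Two-step detection, sketched: (1) fit a reference model to neutral
    features only; (2) score inputs with a fitness measure against that
    model. A diagonal Gaussian and a mean log-likelihood threshold are
    hypothetical choices, not the paper's exact models."""

    def fit(self, neutral_features, threshold):
        # neutral_features: (n_utterances, n_features) from neutral speech.
        self.mu = neutral_features.mean(axis=0)
        self.var = neutral_features.var(axis=0)
        self.threshold = threshold
        return self

    def is_emotional(self, features):
        # Mean per-dimension Gaussian log-likelihood as the fitness score;
        # low fitness = far from the neutral reference = emotional.
        ll = -0.5 * (np.log(2.0 * np.pi * self.var)
                     + (features - self.mu) ** 2 / self.var)
        return bool(ll.mean() < self.threshold)
```

Under this sketch, a neutral utterance scores close to the reference model's likelihood and is accepted, while an utterance whose gross pitch statistics deviate strongly (e.g., a raised mean and widened range) falls below the threshold and is flagged as emotional.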



Recommended

Article Computer Science, Artificial Intelligence

Building Naturalistic Emotionally Balanced Speech Corpus by Retrieving Emotional Speech from Existing Podcast Recordings

Reza Lotfian, Carlos Busso

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2019)

Article Acoustics

Curriculum Learning for Speech Emotion Recognition From Crowdsourced Labels

Reza Lotfian, Carlos Busso

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2019)

Article Acoustics

Speech-driven animation with meaningful behaviors

Najmeh Sadoughi, Carlos Busso

SPEECH COMMUNICATION (2019)

Article Psychiatry

Smartphone sensing of social interactions in people with and without schizophrenia

Daniel Fulford, Jasmine Mote, Rachel Gonzalez, Samuel Abplanalp, Yuting Zhang, Jarrod Luckenbaugh, Jukka-Pekka Onnela, Carlos Busso, David E. Gard

Summary: Social impairment is prevalent in schizophrenia spectrum disorders, with individuals often having difficulty accurately reporting their social behaviors. Smartphone sensors may offer more objective indicators of social activity, showing promise for understanding individuals with schizophrenia.

JOURNAL OF PSYCHIATRIC RESEARCH (2021)

Article Oncology

Machine-Learning Assisted Discrimination of Precancerous and Cancerous from Healthy Oral Tissue Based on Multispectral Autofluorescence Lifetime Imaging Endoscopy

Elvis Duran-Sierra, Shuna Cheng, Rodrigo Cuenca, Beena Ahmed, Jim Ji, Vladislav V. Yakovlev, Mathias Martinez, Moustafa Al-Khalil, Hussain Al-Enazi, Yi-Shing Lisa Cheng, John Wright, Carlos Busso, Javier A. Jo

Summary: The combination of multispectral autofluorescence lifetime imaging (maFLIM) and machine learning allows for automated discrimination of dysplastic and cancerous oral tissue from healthy tissue, potentially improving outcomes for oral cancer patients by facilitating maximal tumor resection.

CANCERS (2021)

Article Computer Science, Artificial Intelligence

The Ordinal Nature of Emotions: An Emerging Approach

Georgios N. Yannakakis, Roddy Cowie, Carlos Busso

Summary: This paper discusses the theoretical reasons for using ordinal labels to represent and annotate emotions, emphasizing the appropriateness of preference learning methods in treating ordinal labels, and demonstrates the advantages of ordinal annotation in affective computing through case studies.

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2021)

Article Computer Science, Information Systems

End-to-End Audiovisual Speech Recognition System With Multitask Learning

Fei Tao, Carlos Busso

Summary: The study introduces a novel multitask learning audiovisual automatic speech recognition system that generalizes across conditions, improves performance, and solves two key speech tasks.

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Proceedings Paper Engineering, Electrical & Electronic

Use of Triplet-Loss Function to Improve Driving Anomaly Detection Using Conditional Generative Adversarial Network

Yuning Qiu, Teruhisa Misu, Carlos Busso

2020 IEEE 23RD INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC) (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Dynamic versus Static Facial Expressions in the Presence of Speech

Ali N. Salman, Carlos Busso

2020 15TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2020) (2020)

Proceedings Paper Imaging Science & Photographic Technology

STYLE EXTRACTOR FOR FACIAL EXPRESSION RECOGNITION IN THE PRESENCE OF SPEECH

Ali N. Salman, Carlos Busso

2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) (2020)

Proceedings Paper Acoustics

MODELING UNCERTAINTY IN PREDICTING EMOTIONAL ATTRIBUTES FROM SPONTANEOUS SPEECH

Kusha Sridhar, Carlos Busso

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (2020)

Article Acoustics

Semi-Supervised Speech Emotion Recognition With Ladder Networks

Srinivas Parthasarathy, Carlos Busso

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2020)

Article Computer Science, Information Systems

Discriminative Features for Texture Retrieval Using Wavelet Packets

Andrea Vidal, Jorge F. Silva, Carlos Busso

IEEE ACCESS (2019)

Proceedings Paper Acoustics

ESTIMATION OF GAZE REGION USING TWO DIMENSIONAL PROBABILISTIC MAPS CONSTRUCTED USING CONVOLUTIONAL NEURAL NETWORKS

Sumit Jha, Carlos Busso

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2019)
