4.2 Article

Analysis of Emotionally Salient Aspects of Fundamental Frequency for Emotion Detection

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TASL.2008.2009578

Keywords

Emotional speech analysis; emotional speech recognition; expressive speech; intonation; pitch contour analysis

Funding

  1. National Science Foundation (NSF) through the Integrated Media Systems Center [EEC-9529152]
  2. Department of the Army
  3. Office of Naval Research

Abstract

During expressive speech, the voice is enriched to convey not only the intended semantic message but also the emotional state of the speaker. The pitch contour is one of the important properties of speech that is affected by this emotional modulation. Although pitch features have been commonly used to recognize emotions, it is not clear what aspects of the pitch contour are the most emotionally salient. This paper presents an analysis of the statistics derived from the pitch contour. First, pitch features derived from emotional speech samples are compared with the ones derived from neutral speech, by using symmetric Kullback-Leibler distance. Then, the emotionally discriminative power of the pitch features is quantified by comparing nested logistic regression models. The results indicate that gross pitch contour statistics such as mean, maximum, minimum, and range are more emotionally prominent than features describing the pitch shape. Also, analyzing the pitch statistics at the utterance level is found to be more accurate and robust than analyzing the pitch statistics for shorter speech regions (e.g., voiced segments). Finally, the best features are selected to build a binary emotion detection system for distinguishing between emotional versus neutral speech. A new two-step approach is proposed. In the first step, reference models for the pitch features are trained with neutral speech, and the input features are contrasted with the neutral model. In the second step, a fitness measure is used to assess whether the input speech is similar to, in the case of neutral speech, or different from, in the case of emotional speech, the reference models. The proposed approach is tested with four acted emotional databases spanning different emotional categories, recording settings, speakers, and languages. The results show that the recognition accuracy of the system is over 77% just with the pitch features (baseline 50%). When compared to conventional classification schemes, the proposed approach performs better in terms of both accuracy and robustness.
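The pipeline the abstract describes can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's implementation: the gross utterance-level statistics and the symmetric Kullback-Leibler distance follow the abstract, while the diagonal-Gaussian reference model, the log-likelihood "fitness" score, and the threshold are assumptions made here for concreteness.

```python
import numpy as np

def pitch_statistics(f0):
    """Gross utterance-level pitch statistics (the features the paper
    found most emotionally salient). f0: per-frame F0 in Hz, 0 = unvoiced."""
    voiced = f0[f0 > 0]
    return np.array([voiced.mean(), voiced.max(), voiced.min(),
                     voiced.max() - voiced.min()])

def symmetric_kl_gaussian(mu_p, var_p, mu_q, var_q):
    """Symmetric KL divergence J(p, q) = KL(p||q) + KL(q||p) between two
    univariate Gaussians, usable to compare the distribution of a pitch
    feature under emotional speech against the neutral one."""
    kl_pq = 0.5 * (var_p / var_q + (mu_q - mu_p) ** 2 / var_q
                   - 1.0 + np.log(var_q / var_p))
    kl_qp = 0.5 * (var_q / var_p + (mu_p - mu_q) ** 2 / var_p
                   - 1.0 + np.log(var_p / var_q))
    return kl_pq + kl_qp

class NeutralReferenceDetector:
    """Two-step detection, sketched: (1) fit a reference model to neutral
    features only; (2) score inputs with a fitness measure against that
    model. A diagonal Gaussian and a mean log-likelihood threshold are
    hypothetical choices, not the paper's exact models."""

    def fit(self, neutral_features, threshold):
        # neutral_features: (n_utterances, n_features) from neutral speech.
        self.mu = neutral_features.mean(axis=0)
        self.var = neutral_features.var(axis=0)
        self.threshold = threshold
        return self

    def is_emotional(self, features):
        # Mean per-dimension Gaussian log-likelihood as the fitness score;
        # low fitness = far from the neutral reference = emotional.
        ll = -0.5 * (np.log(2.0 * np.pi * self.var)
                     + (features - self.mu) ** 2 / self.var)
        return bool(ll.mean() < self.threshold)
```

Under this sketch, a neutral utterance scores close to the reference model's likelihood and is accepted, while an utterance whose gross pitch statistics deviate strongly (e.g., a raised mean and widened range) falls below the threshold and is flagged as emotional.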



Recommended

Article Computer Science, Artificial Intelligence

Building Naturalistic Emotionally Balanced Speech Corpus by Retrieving Emotional Speech from Existing Podcast Recordings

Reza Lotfian, Carlos Busso

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2019)

Article Acoustics

Curriculum Learning for Speech Emotion Recognition From Crowdsourced Labels

Reza Lotfian, Carlos Busso

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2019)

Article Acoustics

Speech-driven animation with meaningful behaviors

Najmeh Sadoughi, Carlos Busso

SPEECH COMMUNICATION (2019)

Article Psychiatry

Smartphone sensing of social interactions in people with and without schizophrenia

Daniel Fulford, Jasmine Mote, Rachel Gonzalez, Samuel Abplanalp, Yuting Zhang, Jarrod Luckenbaugh, Jukka-Pekka Onnela, Carlos Busso, David E. Gard

Summary: Social impairment is prevalent in schizophrenia spectrum disorders, with individuals often having difficulty accurately reporting their social behaviors. Smartphone sensors may offer more objective indicators of social activity, showing promise for understanding individuals with schizophrenia.

JOURNAL OF PSYCHIATRIC RESEARCH (2021)

Article Oncology

Machine-Learning Assisted Discrimination of Precancerous and Cancerous from Healthy Oral Tissue Based on Multispectral Autofluorescence Lifetime Imaging Endoscopy

Elvis Duran-Sierra, Shuna Cheng, Rodrigo Cuenca, Beena Ahmed, Jim Ji, Vladislav V. Yakovlev, Mathias Martinez, Moustafa Al-Khalil, Hussain Al-Enazi, Yi-Shing Lisa Cheng, John Wright, Carlos Busso, Javier A. Jo

Summary: The combination of multispectral autofluorescence lifetime imaging (maFLIM) and machine learning allows for automated discrimination of dysplastic and cancerous oral tissue from healthy tissue, potentially improving outcomes for oral cancer patients by facilitating maximal tumor resection.

CANCERS (2021)

Article Computer Science, Artificial Intelligence

The Ordinal Nature of Emotions: An Emerging Approach

Georgios N. Yannakakis, Roddy Cowie, Carlos Busso

Summary: This paper discusses the theoretical reasons for using ordinal labels to represent and annotate emotions, emphasizing the appropriateness of preference learning methods in treating ordinal labels, and demonstrates the advantages of ordinal annotation in affective computing through case studies.

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2021)

Article Computer Science, Information Systems

End-to-End Audiovisual Speech Recognition System With Multitask Learning

Fei Tao, Carlos Busso

Summary: The study introduces a novel multitask learning audiovisual automatic speech recognition system that generalizes across conditions, improves performance, and solves two key speech tasks.

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Proceedings Paper Engineering, Electrical & Electronic

Use of Triplet-Loss Function to Improve Driving Anomaly Detection Using Conditional Generative Adversarial Network

Yuning Qiu, Teruhisa Misu, Carlos Busso

2020 IEEE 23RD INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC) (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Dynamic versus Static Facial Expressions in the Presence of Speech

Ali N. Salman, Carlos Busso

2020 15TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2020) (2020)

Proceedings Paper Imaging Science & Photographic Technology

STYLE EXTRACTOR FOR FACIAL EXPRESSION RECOGNITION IN THE PRESENCE OF SPEECH

Ali N. Salman, Carlos Busso

2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) (2020)

Proceedings Paper Acoustics

MODELING UNCERTAINTY IN PREDICTING EMOTIONAL ATTRIBUTES FROM SPONTANEOUS SPEECH

Kusha Sridhar, Carlos Busso

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (2020)

Article Acoustics

Semi-Supervised Speech Emotion Recognition With Ladder Networks

Srinivas Parthasarathy, Carlos Busso

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2020)

Article Computer Science, Information Systems

Discriminative Features for Texture Retrieval Using Wavelet Packets

Andrea Vidal, Jorge F. Silva, Carlos Busso

IEEE ACCESS (2019)

Proceedings Paper Acoustics

ESTIMATION OF GAZE REGION USING TWO DIMENSIONAL PROBABILISTIC MAPS CONSTRUCTED USING CONVOLUTIONAL NEURAL NETWORKS

Sumit Jha, Carlos Busso

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2019)
