4.5 Article

Speech emotion recognition model based on Bi-GRU and Focal Loss

Journal

PATTERN RECOGNITION LETTERS
Volume 140, Issue -, Pages 358-365

Publisher

ELSEVIER
DOI: 10.1016/j.patrec.2020.11.009

Keywords

Bi-GRU; Focal loss; Speech emotion recognition; Deep learning; CRNN

Funding

  1. Special Projects in Key Areas (New Generation of Information Technology) of Colleges and Universities in Guangdong Province [2020ZDZX3046]
  2. Characteristics Innovation Project of Colleges and Universities of Guangdong Province [2019KTSCX235, 2019KTSCX234]
  3. Higher Education of the Ministry of Education of the People's Republic of China [201901070016]

To address inconsistent sample duration and class imbalance in speech emotion corpora, this paper proposes a speech emotion recognition model based on Bi-GRU (Bidirectional Gated Recurrent Unit) and Focal Loss. The model builds on the deep-learning CRNN (Convolutional Recurrent Neural Network). Within the CRNN, Bi-GRU is used to effectively lengthen short-duration speech samples, and the Focal Loss function is used to handle the classification difficulty caused by the imbalance of emotion categories among the samples. The proposed model is compared experimentally with other methods, using weighted average recall (WAR), unweighted average recall (UAR), and the confusion matrix (CM) as evaluation metrics. The experimental results show that the proposed model improves recognition accuracy and mitigates the effect of sample imbalance on the IEMOCAP database, and indicate that the improvement in speech emotion recognition performance is not due to adjustment of model parameters or changes in model topology. (c) 2020 Elsevier B.V. All rights reserved.
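
The abstract names Focal Loss and the WAR/UAR metrics without giving their definitions. The sketch below uses the commonly cited multi-class form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) and the usual definitions of WAR (overall accuracy) and UAR (mean of per-class recalls). It is an illustration under those assumptions, not the authors' implementation: the function names, the default gamma = 2.0, and the optional per-class alpha weights are all hypothetical choices that the abstract does not specify.

```python
import math
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample (gamma and alpha are assumed values).

    logits: raw scores, one per emotion class
    target: index of the true class
    gamma:  focusing parameter; gamma = 0 reduces to cross-entropy
    alpha:  optional list of per-class weights
    """
    p_t = softmax(logits)[target]
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t)^gamma down-weights samples the model already gets right.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def war_uar(y_true, y_pred):
    """Return (WAR, UAR) for lists of true and predicted class indices."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    war = sum(correct.values()) / len(y_true)                      # overall accuracy
    uar = sum(correct[c] / total[c] for c in total) / len(total)   # mean class recall
    return war, uar


if __name__ == "__main__":
    easy = focal_loss([4.0, 0.0, 0.0, 0.0], target=0)  # confident, correct
    hard = focal_loss([0.0, 4.0, 0.0, 0.0], target=0)  # confident, wrong
    print("easy sample loss:", easy)
    print("hard sample loss:", hard)
    print("WAR, UAR:", war_uar([0, 0, 0, 1], [0, 0, 0, 0]))
```

With an imbalanced label set such as three samples of one class and one of another, a classifier that always predicts the majority class keeps a high WAR while its UAR drops, which is why both metrics are typically reported for corpora like IEMOCAP.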
