Article; Proceedings Paper

Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data

PATTERN RECOGNITION LETTERS (2015)

Journal

PATTERN RECOGNITION LETTERS

Volume 66, Issue -, Pages 22-30

Publisher

ELSEVIER

DOI: 10.1016/j.patrec.2014.11.007

Keywords

Context-learning long short-term memory; recurrent neural networks; Audiovisual and physiological data; Continuous affect analysis; Multi-task learning; Multitime resolution features extraction; Multimodal fusion

Category

Computer Science, Artificial Intelligence

Funding

EC (ERC starting grant iHEARu) [338164]
Swiss National Science Foundation through the National Centre for Competence in Research (NCCR) on Interactive Multimodal Information Management [IM2]

Abstract

Automatic emotion recognition systems based on supervised machine learning require reliable annotation of affective behaviours to build useful models. Whereas the dimensional approach is becoming increasingly popular for rating affective behaviours in continuous time domains, e.g., arousal and valence, methodologies that take into account the reaction lags of the human raters are still rare. We therefore investigate the relevance of machine learning algorithms that can integrate contextual information in the modelling, as long short-term memory (LSTM) recurrent neural networks do, to automatically predict emotion from several (asynchronous) raters in continuous time domains, i.e., arousal and valence. Evaluations are performed on the recently proposed RECOLA multimodal database (27 subjects, 5 min of data and six raters for each), which includes audio, video, and physiological (ECG, EDA) data. Studies uniting audiovisual and physiological information are still very rare. Features are extracted with various window sizes for each modality, and performance for the automatic emotion prediction is compared across both different neural network architectures and fusion approaches (feature-level/decision-level). The results show that: (i) LSTM networks can deal with the (asynchronous) dependencies found between continuous ratings of emotion with video data, (ii) the prediction of emotional valence requires a longer analysis window than that of arousal, and (iii) decision-level fusion leads to better performance than feature-level fusion. The best performance (concordance correlation coefficient) for the multimodal emotion prediction is 0.804 for arousal and 0.528 for valence. (C) 2014 Elsevier B.V. All rights reserved.
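
The abstract reports performance as a concordance correlation coefficient (CCC). As a minimal sketch, assuming Lin's standard definition with population (1/n) variances, the metric can be computed as below. The function name `concordance_cc` and the toy values are hypothetical, and the abstract does not state which variance normalisation or evaluation protocol the authors used.

```python
from math import nan
from statistics import fmean


def concordance_cc(pred, gold):
    """Concordance correlation coefficient (Lin's definition).

    Assumption: population (1/n) variance and covariance are used;
    the abstract does not specify the normalisation.
    """
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")

    n = len(pred)
    mean_p, mean_g = fmean(pred), fmean(gold)

    # Population variances and covariance of the two sequences.
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    # The squared mean difference penalises a constant offset
    # between prediction and gold standard.
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        return nan  # both sequences constant and equal: undefined
    return 2 * cov / denom


# Toy values, not data from the paper.
same = concordance_cc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = concordance_cc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Unlike the Pearson correlation, which would be 1 for both toy pairs, the CCC drops for the shifted pair because the predictions carry a constant bias. This makes it a stricter measure for continuous arousal and valence traces.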
