Article

The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements

Journal

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
Volume 14, Issue 2, Pages 1334-1350

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TAFFC.2021.3097002

Keywords

Sentiment analysis; Annotations; Task analysis; Databases; Affective computing; Social networking (online); Computational modeling; affective computing; database; multimedia retrieval; trustworthiness

Abstract

Truly real-life data presents a strong but exciting challenge for sentiment and emotion research. The high variety of possible 'in-the-wild' properties makes large datasets such as these indispensable for building robust machine learning models. No dataset of sufficient size, covering the challenges of each modality deeply enough to force exploratory analysis of the interplay of all modalities, has yet been made available in this context. In this contribution, we present MuSe-CaR, a first-of-its-kind multimodal dataset. The data is publicly available, having recently served as the testing bed for the 1st Multimodal Sentiment Analysis Challenge (MuSe 2020), which focused on the tasks of emotion, emotion-target engagement, and trustworthiness recognition by comprehensively integrating the audio-visual and language modalities. Furthermore, we give a thorough overview of the dataset in terms of collection and annotation, including annotation tiers not used in MuSe 2020. In addition, for one of the sub-challenges - predicting the level of trustworthiness - no participant outperformed the baseline model; we therefore propose a simple but highly efficient Multi-Head-Attention network that, using multimodal fusion, exceeds the baseline by around 0.2 CCC (almost a 50 percent improvement).
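
The improvement reported in the abstract is measured in Concordance Correlation Coefficient (CCC), the standard metric for continuous affect and trustworthiness prediction in the MuSe challenge series. As a minimal illustrative sketch (the function name and NumPy-based implementation below are our own, not taken from the paper), CCC can be computed as:

import numpy as np

def concordance_cc(preds, labels):
    """Concordance Correlation Coefficient (CCC) between two 1-D sequences.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2),
    ranging from -1 to 1, with 1 indicating perfect agreement.
    """
    preds = np.asarray(preds, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    mean_p, mean_l = preds.mean(), labels.mean()
    # Population variance and covariance (ddof=0) so all terms are consistent
    var_p, var_l = preds.var(), labels.var()
    cov = ((preds - mean_p) * (labels - mean_l)).mean()
    return 2.0 * cov / (var_p + var_l + (mean_p - mean_l) ** 2)

Unlike Pearson correlation, CCC also penalises predictions that are correlated with the gold annotations but shifted or scaled, which is why a gain of roughly 0.2 CCC over the baseline is a substantial improvement on this scale.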

