Article

TWilBert: Pre-trained deep bidirectional transformers for Spanish Twitter

Journal

NEUROCOMPUTING
Volume 426, Pages 58-69

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2020.09.078

Keywords

Contextualized Embeddings; Spanish; Twitter; TWilBERT

Funding

  1. Spanish Ministerio de Ciencia, Innovacion y Universidades
  2. FEDER [TIN2017-85854-C4-2-R]
  3. Generalitat Valenciana under GiSPRO [PROMETEU/2018/176]
  4. Generalitat Valenciana under GUAITA [INNVA1/2020/61]
  5. Universitat Politecnica de Valencia [PAID-01-17]

Abstract

The article introduces TWilBERT, a specialization of BERT for the Spanish language and the Twitter domain. A new Reply Order Prediction signal improves performance on tasks that require reasoning over sequences of tweets in Twitter conversations. Across 14 text classification tasks, TWilBERT outperforms existing state-of-the-art systems and Multilingual BERT.
In recent years, the Natural Language Processing community has been moving from uncontextualized word embeddings towards contextualized word embeddings. Among these contextualized architectures, BERT stands out due to its capacity to compute bidirectional contextualized word representations. However, the competitive performance it achieves on English downstream tasks is not matched by its multilingual version when applied to other languages and domains. This is especially true for the Spanish language as used on Twitter. In this work, we propose TWilBERT, a specialization of the BERT architecture for both the Spanish language and the Twitter domain. Furthermore, we propose a Reply Order Prediction signal to learn inter-sentence coherence in Twitter conversations, which improves the performance of TWilBERT on text classification tasks that require reasoning over sequences of tweets. We perform an extensive evaluation of TWilBERT models on 14 different text classification tasks, such as irony detection, sentiment analysis, and emotion detection. The results obtained by TWilBERT outperform the state-of-the-art systems and Multilingual BERT. In addition, we carry out a thorough analysis of the TWilBERT models to study the reasons for their competitive behavior. We release the pre-trained TWilBERT models used in this paper, along with a framework for training, evaluating, and fine-tuning TWilBERT models. (C) 2020 Elsevier B.V. All rights reserved.
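The Reply Order Prediction signal described in the abstract can be illustrated with a minimal sketch. The idea, as stated there, is to learn inter-sentence coherence in Twitter conversations; a natural way to realize this as a self-supervised task is to present the model with a (tweet, reply) pair and train it to predict whether the pair is in its original order or has been swapped. The pair-construction step might look as follows (function name, 50/50 swap rate, and data format are illustrative assumptions, not the authors' implementation):

```python
import random

def build_rop_pairs(conversation, seed=0):
    """Build Reply Order Prediction training pairs from a list of tweets
    ordered as they appeared in a conversation thread.

    Returns (tweet_a, tweet_b, label) triples, where label 1 means the
    pair is in its original reply order and 0 means it was swapped.
    Illustrative sketch only; not the TWilBERT implementation.
    """
    rng = random.Random(seed)
    pairs = []
    # Each adjacent (earlier tweet, reply) pair in the thread yields one example.
    for earlier, reply in zip(conversation, conversation[1:]):
        if rng.random() < 0.5:
            pairs.append((earlier, reply, 1))   # keep original order
        else:
            pairs.append((reply, earlier, 0))   # swap the order
    return pairs

thread = ["Gran partido hoy!", "Totalmente de acuerdo", "El mejor del ano"]
for a, b, label in build_rop_pairs(thread):
    print(label, "|", a, "->", b)
```

Each triple would then be encoded as a sentence pair (analogous to BERT's Next Sentence Prediction input) and the model fine-tuned to predict the binary order label.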

