4.8 Article

A multilingual offensive language detection method based on transfer learning from transformer fine-tuning model

Publisher

ELSEVIER
DOI: 10.1016/j.jksuci.2021.07.013

Keywords

Offensive language detection; Social media; Multilingual; Transfer learning; Text classification; Natural language processing


This study tackles offensive communication on social media using computational techniques and transfer learning models. The proposed approach combines BERT with a translation-based strategy for multilingual offensive language detection; with AraBERT it achieves over 93% F1-score and over 91% accuracy on a bilingual dataset extracted from SOLID.
Offensive communications have invaded social media content. One of the most effective ways to cope with this problem is to use computational techniques to identify offensive content. Moreover, social media users come from linguistically diverse communities. This study tackles the Multilingual Offensive Language Detection (MOLD) task using transfer learning models and fine-tuning. We propose an effective approach based on Bidirectional Encoder Representations from Transformers (BERT), which has shown great potential in capturing the semantics and contextual information within texts. The proposed system consists of three stages: (1) preprocessing, (2) text representation using BERT models, and (3) classification into two categories, offensive and non-offensive. To handle multilingualism, we explore two techniques: joint-multilingual and translation-based. The first develops a single classification system for different languages; the second adds a translation phase that converts all texts into one universal language before classifying them. We conduct several experiments on a bilingual dataset extracted from the Semi-supervised Offensive Language Identification Dataset (SOLID). The experimental findings show that the translation-based method in conjunction with Arabic BERT (AraBERT) achieves over 93% and 91% in terms of F1-score and accuracy, respectively.

© 2021 The Authors. Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
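The stages and the two multilingual strategies described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cleanup rules in `preprocess`, the function names, and the toy translator and keyword classifier are all hypothetical stand-ins. In the paper, the classifier would be a fine-tuned BERT model and the translator a real translation system. The metric here is the binary F1-score for the offensive class, which may differ from the averaging the paper uses.

```python
import re
from typing import Callable, List, Sequence

# Hypothetical interfaces (assumptions, not taken from the paper):
# a translator maps a text into the pivot language,
# a classifier maps a text to 1 (offensive) or 0 (non-offensive).
Translator = Callable[[str], str]
Classifier = Callable[[str], int]


def preprocess(text: str) -> str:
    """Illustrative cleanup only; the paper's exact preprocessing is not stated here."""
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", "@USER", text)      # mask user mentions
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def joint_multilingual(texts: Sequence[str], classifier: Classifier) -> List[int]:
    """One classifier receives texts from every language directly."""
    return [classifier(preprocess(t)) for t in texts]


def translation_based(
    texts: Sequence[str], translate: Translator, classifier: Classifier
) -> List[int]:
    """Translate every text into one pivot language, then classify."""
    return [classifier(translate(preprocess(t))) for t in texts]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the positive (offensive) class: 2TP / (2TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy stand-ins so the sketch runs without any model or network access.
toy_lexicon = {"stupid"}
toy_dictionary = {"estúpido": "stupid"}


def toy_classifier(text: str) -> int:
    return int(any(word in toy_lexicon for word in text.lower().split()))


def toy_translate(text: str) -> str:
    return " ".join(toy_dictionary.get(word, word) for word in text.split())
```

With these stand-ins, `translation_based(["eres estúpido"], toy_translate, toy_classifier)` flags the text, whereas `joint_multilingual` with the same single-language keyword classifier does not. That is the gap a jointly trained multilingual model is meant to close in the first strategy.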
