A multilingual offensive language detection method based on transfer learning from transformer fine-tuning model

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES (2022)

Journal

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES

Volume 34, Issue 8, Pages 6048-6056

Publisher

ELSEVIER

DOI: 10.1016/j.jksuci.2021.07.013

Keywords

Offensive language detection; Social media; Multilingual; Transfer learning; Text classification; Natural language processing

Category

Computer Science, Information Systems

Summary

This study aims to tackle the problem of offensive communications on social media by using computational techniques and transfer learning models. The proposed approach, based on BERT and translation-based techniques, achieves high performance in terms of F1-score and accuracy for multilingual offensive language detection.

Abstract

Offensive communications have invaded social media content. One of the most effective ways to cope with this problem is to use computational techniques to identify offensive content. Moreover, social media users come from linguistically diverse communities. This study tackles the Multilingual Offensive Language Detection (MOLD) task using transfer learning models and fine-tuning. We propose an effective approach based on Bidirectional Encoder Representations from Transformers (BERT), which has shown great potential in capturing the semantics and contextual information within texts. The proposed system consists of three stages: (1) preprocessing, (2) text representation using BERT models, and (3) classification into two categories: offensive and non-offensive. To handle multilingualism, we explore different techniques, including the joint-multilingual and translation-based ones. The first develops a single classification system for different languages; the second adds a translation phase that converts all texts into one universal language and then classifies them. We conduct several experiments on a bilingual dataset extracted from the Semi-supervised Offensive Language Identification Dataset (SOLID). The experimental findings show that the translation-based method in conjunction with Arabic BERT (AraBERT) achieves over 93% and 91% in terms of F1-score and accuracy, respectively.

© 2021 The Authors. Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
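The difference between the two multilingual strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the preprocessing steps, the function names, the toy lexicon translator, and the keyword classifiers are all hypothetical stand-ins for a real machine-translation system and a fine-tuned BERT model, and the demo languages are arbitrary. The metric helper assumes F1 is computed for the offensive class, which the abstract does not specify.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in types: a real system would plug in a translation
# model and a fine-tuned BERT classifier here.
Translator = Callable[[str], str]
Classifier = Callable[[str], int]  # 1 = offensive, 0 = non-offensive


def preprocess(text: str) -> str:
    """Stage 1: light normalisation (assumed steps, not the paper's list)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", "@user", text)      # mask user mentions
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def joint_multilingual(texts: List[str], classifier: Classifier) -> List[int]:
    """One classifier receives texts in every language directly."""
    return [classifier(preprocess(t)) for t in texts]


def translation_based(texts: List[str], langs: List[str], target_lang: str,
                      translators: Dict[str, Translator],
                      classifier: Classifier) -> List[int]:
    """Translate every text into target_lang, then use one monolingual classifier."""
    labels = []
    for text, lang in zip(texts, langs):
        clean = preprocess(text)
        if lang != target_lang:
            clean = translators[lang](clean)
        labels.append(classifier(clean))
    return labels


def accuracy_and_f1(gold: List[int], pred: List[int]) -> Tuple[float, float]:
    """Accuracy and F1 for the positive (offensive) class."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    denom = 2 * tp + fp + fn
    return accuracy, (2 * tp / denom if denom else 0.0)


# --- Toy components, for illustration only ---------------------------------
TOY_LEXICON = {"eres": "you are", "un": "an", "idiota": "idiot"}


def toy_translate_es(text: str) -> str:
    """Word-by-word lookup standing in for a machine-translation model."""
    return " ".join(TOY_LEXICON.get(w, w) for w in text.split())


def toy_monolingual_classifier(text: str) -> int:
    """Keyword match standing in for a fine-tuned monolingual BERT."""
    return int("idiot" in text.split())


def toy_multilingual_classifier(text: str) -> int:
    """Keyword match standing in for a fine-tuned multilingual BERT."""
    return int(bool({"idiot", "idiota"} & set(text.split())))


if __name__ == "__main__":
    texts = ["You are an idiot @bob http://x.co", "Have a nice day", "eres un idiota"]
    langs = ["en", "en", "es"]
    gold = [1, 0, 1]

    joint = joint_multilingual(texts, toy_multilingual_classifier)
    translated = translation_based(texts, langs, "en",
                                   {"es": toy_translate_es},
                                   toy_monolingual_classifier)
    print("joint-multilingual:", joint, accuracy_and_f1(gold, joint))
    print("translation-based: ", translated, accuracy_and_f1(gold, translated))
```

The design point the sketch tries to capture is where the language handling lives: the joint-multilingual route pushes it into a single model that must cope with all languages, while the translation-based route keeps the classifier monolingual and moves the burden onto the translation step.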
