Article

Arabic senti-lexicon: Constructing publicly available language resources for Arabic sentiment analysis

JOURNAL OF INFORMATION SCIENCE (2018)

Journal

JOURNAL OF INFORMATION SCIENCE

Volume 44, Issue 3, Pages 345-362

Publisher

SAGE PUBLICATIONS LTD

DOI: 10.1177/0165551516683908

Keywords

Arabic Sentiment Corpus; Arabic sentiment lexicon; feature set; senti-lexicon; sentiment analysis

Categories

Computer Science, Information Systems; Information Science & Library Science

Funding

Ministry of Higher Education, Malaysia [FRGS/1/2016/ICT02/UKM/02/11, FRGS/1/2015/ICT02/UKM/01/2]
Universiti Kebangsaan Malaysia [DIP-2016-024]

Abstract

Sentiment analysis is one of the most active recent research fields in Natural Language Processing, driven by the rapidly growing volume of opinion data on the Web. Most approaches in this field focus on English because sentiment resources are lacking for other languages, such as Arabic and its many dialects. Good sentiment resources play a critical role in most sentiment analysis applications. This article therefore introduces several publicly available sentiment analysis resources for Arabic. The first is the Arabic senti-lexicon, a list of 3880 positive and negative synsets annotated with their part of speech, polarity scores, dialect synsets and inflected forms. The second is a Multi-domain Arabic Sentiment Corpus (MASC) of 8860 positive and negative reviews from different domains. The article also reports an in-depth study of five types of feature sets, carried out to identify effective features and investigate their effect on the performance of Arabic sentiment analysis. The aim is to assess the quality of the developed language resources and to integrate different feature sets and classification algorithms into a more accurate sentiment analysis method. The Arabic senti-lexicon is used for generating feature vectors. Five well-known machine learning algorithms (naive Bayes, k-nearest neighbours, support vector machines (SVMs), logistic linear regression and neural networks) are employed as base classifiers for each of the feature sets. A wide range of comparative experiments were conducted on standard Arabic data sets; the results are discussed and conclusions are drawn. The experimental results show that the Arabic senti-lexicon is a very useful resource for Arabic sentiment analysis. Moreover, classifiers trained on feature vectors derived from the corpus using the Arabic sentiment lexicon are more accurate than classifiers trained on the raw corpus.
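
As a rough illustration of what generating feature vectors from a sentiment lexicon can look like, the following Python sketch maps a tokenised review to three numbers: the count of positive lexicon hits, the count of negative hits, and the summed polarity. The toy lexicon entries, their scores, and the function name `lexicon_features` are hypothetical and are not taken from the Arabic senti-lexicon or the article; the abstract does not specify the five feature sets, so this should be read as one plausible style of lexicon-based feature rather than the authors' method.

```python
# Hypothetical toy lexicon: word -> polarity score in [-1, 1].
# Entries and scores are illustrative only, not from the Arabic senti-lexicon.
TOY_LEXICON = {
    "جيد": 0.8,     # "good"
    "ممتاز": 1.0,   # "excellent"
    "سيء": -0.8,    # "bad"
    "رديء": -0.9,   # "poor"
}


def lexicon_features(tokens, lexicon):
    """Return [positive hits, negative hits, summed polarity] for a token list."""
    # Keep only the polarity scores of tokens that appear in the lexicon.
    scores = [lexicon[t] for t in tokens if t in lexicon]
    positive_hits = sum(1 for s in scores if s > 0)
    negative_hits = sum(1 for s in scores if s < 0)
    return [positive_hits, negative_hits, sum(scores)]


# Whitespace tokenisation is an assumption made to keep the sketch short.
review = "الفندق جيد لكن الطعام سيء".split()
features = lexicon_features(review, TOY_LEXICON)
```

A vector like `features` could then be passed to any of the classifier families named in the abstract in place of raw bag-of-words counts.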
