4.4 Article

A Parallel Approach for Sentiment Analysis on Social Networks Using Spark

Journal

INTELLIGENT AUTOMATION AND SOFT COMPUTING
Volume 35, Issue 2, Pages 1831-1842

Publisher

TECH SCIENCE PRESS
DOI: 10.32604/iasc.2023.029036

Keywords

Social networks; sentiment analysis; big data; spark; tweets; classification

Social media has become a vital platform for public opinion, and efficient sentiment analysis methods are needed to handle large datasets. This research proposes a scalable system using Apache Spark and a Naive Bayes training technique for sentiment analysis on Twitter, achieving significant improvements in processing speed and cost-effectiveness.
The public increasingly uses social media platforms such as Twitter and Facebook to express views on a variety of topics. As a result, social media has emerged as the largest and most effective open source of public opinion. Single-node computational methods are inefficient for sentiment analysis on such large datasets. Supercomputers and parallel or distributed processing are the two options for handling data at this scale. Most parallel programming frameworks, such as MPI (Message Passing Interface), are difficult to use and scale in environments where supercomputers are expensive. Using the Apache Spark parallel model, this work presents a scalable system for sentiment analysis on Twitter. A Spark-based Naive Bayes training technique is proposed for this purpose; unlike prior research, this algorithm does not need any disk access. Millions of tweets have been classified using the trained model. Experiments with clusters of various sizes show that the proposed strategy is highly scalable and cost-effective for larger datasets. It is nearly 12 times faster than the MapReduce-based model and nearly 21 times faster than the Naive Bayes classifier in Apache Mahout. To evaluate the framework's scalability, we gathered a large training corpus from Twitter. The accuracy of the classifier trained with this new dataset was more than 80%.
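The abstract does not give the training algorithm itself, so the following is only a minimal single-process sketch of the general idea: multinomial Naive Bayes training expressed as a per-partition counting step followed by an in-memory merge, which is the shape a Spark job would take. It is not the authors' implementation and does not use Spark; the function names (`count_partition`, `merge_counts`, `train`, `classify`), the whitespace tokenizer, the Laplace smoothing parameter `alpha`, and the toy tweets are all assumptions made for illustration.

```python
import math
from collections import Counter
from functools import reduce


def tokenize(text):
    # Assumption: naive lowercase/whitespace tokenizer, for illustration only.
    return text.lower().split()


def count_partition(partition):
    """Map step: count documents and words per label within one partition."""
    label_counts = Counter()
    word_counts = {}
    for label, text in partition:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return label_counts, word_counts


def merge_counts(a, b):
    """Reduce step: merge two partial count results entirely in memory."""
    labels = a[0] + b[0]
    words = {
        lab: a[1].get(lab, Counter()) + b[1].get(lab, Counter())
        for lab in set(a[1]) | set(b[1])
    }
    return labels, words


def train(partitions, alpha=1.0):
    """Aggregate counts across partitions, then derive smoothed log-probabilities."""
    label_counts, word_counts = reduce(merge_counts, map(count_partition, partitions))
    vocab = set().union(*word_counts.values())
    total_docs = sum(label_counts.values())
    log_prior, log_likelihood = {}, {}
    for label, n_docs in label_counts.items():
        log_prior[label] = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        log_likelihood[label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood


def classify(model, text):
    """Pick the label with the highest posterior score; unseen words are skipped."""
    log_prior, log_likelihood = model
    scores = {
        label: log_prior[label]
        + sum(log_likelihood[label][w] for w in tokenize(text) if w in log_likelihood[label])
        for label in log_prior
    }
    return max(scores, key=scores.get)


# Toy labeled tweets split into two "partitions" (hypothetical data).
partitions = [
    [("pos", "love this phone"), ("neg", "hate the battery")],
    [("pos", "great camera love it"), ("neg", "terrible screen hate it")],
]
model = train(partitions)
print(classify(model, "love the camera"))
```

Because the merge step is associative, the partial counts can be combined in any order across workers without writing intermediate results to disk, which is the property the abstract attributes to the Spark-based approach. In an actual Spark job the same two functions would plausibly be passed to an aggregation over an RDD, though that wiring is not shown here.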
