Article

IMPROVING MULTI-LABEL TEXT CLASSIFICATION USING WEIGHTED INFORMATION GAIN AND CO-TRAINED MULTINOMIAL NAIVE BAYES CLASSIFIER

Journal

MALAYSIAN JOURNAL OF COMPUTER SCIENCE
Volume 35, Issue 1, Pages 21-36

Publisher

UNIV MALAYA, FAC COMPUTER SCIENCE & INFORMATION TECH
DOI: 10.22452/mjcs.vol35no1.2

Keywords

Text classification; Multi-label; Feature selection; Weighted Information Gain; Multinomial Naive Bayes

Funding

  1. University of Malaya [UMRG RP059C 17SBS]

This paper examines the weighted information gain method for improving text classification accuracy. The proposed algorithm is trained and tested using a corpus from Facebook pages, and incorporates the weighted information gain feature selection technique with a co-trained Naive Bayes classification algorithm. The results show an improvement in classification to 61%.
In recent years, electronic text processing systems have generated vast amounts of structured and unstructured data, leaving users to sift through irrelevant information. Studies therefore continue to seek improvements to the classification process that yield more accurate results for users. This paper examines the weighted information gain method, which re-assigns new weights to wrongly classified features to improve classification. The method focuses on the weights of the frequency bins, assuming that each time a given word frequency bin is iterated, it provides information on the target word feature. Therefore, the more iterations and weight re-assignments occur within a bin, the more important the bin becomes, eventually providing better classification. The proposed algorithm was trained and tested using a corpus extracted from dedicated Facebook pages related to diabetes. The features selected by the weighted information gain technique are then fed into a co-trained Multinomial Naive Bayes classification algorithm that captures the labels' dependencies. The algorithm incorporates class value dependencies, since the dataset is multi-label, before converting the data to string vectors; this allows the sparse distribution between features to be minimised, thus producing more accurate results. The results of this study show an improvement in classification to 61%.
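
As a rough illustration of the feature selection step, the sketch below computes standard (unweighted) information gain of a term for each label and ranks terms per label. It is not the paper's weighted, frequency-bin variant, whose details the abstract does not give. It assumes binary term presence and treats each label as a separate yes/no problem; the toy corpus, the label names, and the function names are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """IG = H(label) - H(label | term present or absent in the document)."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def top_terms_per_label(docs, label_sets, k):
    """Rank terms by IG separately for each label (one yes/no task per label)."""
    vocab = sorted({t for d in docs for t in d})
    all_labels = sorted({label for s in label_sets for label in s})
    ranking = {}
    for label in all_labels:
        y = [label in s for s in label_sets]
        # Highest IG first; ties are broken alphabetically for determinism.
        scored = sorted(vocab, key=lambda t: (-information_gain(docs, y, t), t))
        ranking[label] = scored[:k]
    return ranking


# Hypothetical toy corpus: each document is a set of tokens with a set of labels.
docs = [
    {"insulin", "dose", "daily"},
    {"insulin", "diet"},
    {"diet", "sugar", "daily"},
    {"exercise", "daily"},
]
label_sets = [
    {"medication"},
    {"medication", "nutrition"},
    {"nutrition"},
    {"lifestyle"},
]

ranking = top_terms_per_label(docs, label_sets, k=2)
for label, terms in ranking.items():
    print(label, terms)
```

In this toy corpus, "insulin" separates the "medication" documents perfectly and therefore ranks first for that label. A weighted scheme such as the one the abstract describes would adjust these scores further; how it does so is not specified here.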

Recommended

Article Computer Science, Information Systems

A machine learning approach in analysing the effect of hyperboles using negative sentiment tweets for sarcasm detection

Vithyatheri Govindan, Vimala Balakrishnan

Summary: This paper investigates negative sentiment tweets with hyperboles for sarcasm detection. The proposed model achieved high accuracy and F-score in detecting sarcasm in tweets that contain hyperbolic words.

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES (2022)

Article Information Science & Library Science

Unravelling social media racial discriminations through a semi-supervised approach

Vimala Balakrishnan, Kee S. Ng, Hamid R. Arabnia

Summary: This study investigated cyber-racism on social media during the recent Coronavirus pandemic using machine learning models. The results showed that the models had consistent performance in detecting cyber-racism patterns based on textual communications. Topic modelling revealed three distinct topics for racist tweets, namely, Eating habit, Political hatred, and Xenophobia.

TELEMATICS AND INFORMATICS (2022)

Article Psychology, Multidisciplinary

COVID-19 mental health prevalence and its risk factors in South East Asia

Vimala Balakrishnan, Kee Seong Ng, Wandeep Kaur, Zhen Lek Lee

Summary: This study aims to synthesize existing literature on the psychological outcomes of people in Southeast Asia during the COVID-19 pandemic and identify risk factors. The study found that there was an elevated prevalence of adverse mental effects, with Malaysia and the Philippines reporting higher rates. Mental decline was more common among the general population compared to healthcare workers and students. The dominant risk factors identified were younger age, female sex, higher education, low coping skills and social/family support, and poor reliability of COVID-19 related information.

CURRENT PSYCHOLOGY (2023)

Article Computer Science, Information Systems

Benchmarking full version of GureKDDCup, UNSW-NB15, and CIDDS-001 NIDS datasets using rolling-origin resampling

Yee Jian Chew, Nicholas Lee, Shih Yin Ooi, Kok-Seng Wong, Ying Han Pang

Summary: Several recent NIDS datasets have been published; however, the lack of baseline experimental results on the full versions of these datasets has made benchmarking difficult for researchers. It is challenging to compare performance across machine classifiers without bias, and the literature has noted that the cross-validation resampling scheme is considered inappropriate in the NIDS domain.

INFORMATION SECURITY JOURNAL (2022)

Article Computer Science, Cybernetics

Personality and emotion based cyberbullying detection on YouTube using ensemble classifiers

Vimala Balakrishnan, See Kiat Ng

Summary: This study investigates the impact of users' personality traits and emotions expressed through textual communications on YouTube to detect cyberbullying. The results show that both personality traits and emotions significantly improve the identification of cyberbullying presence, with accuracy and F-score values of more than 95%. Further analysis reveals that anger and openness have a more profound effect compared to other emotions and personalities, and neurotic individuals tend to engage in cyberbullying due to joy, disgust and fear.

BEHAVIOUR & INFORMATION TECHNOLOGY (2023)

Article Computer Science, Artificial Intelligence

Tamil Offensive Language Detection: Supervised versus Unsupervised Learning Approaches

Vimala Balakrishnan, Vithyatheri Govindan, Kumanan N. Govaichelvan

Summary: This study uses a corpus of Tamil comments collected from YouTube to detect offensive language patterns. The research compares supervised and unsupervised machine learning approaches, and finds that unsupervised clustering is more effective in detecting offensive language in under-resourced languages.

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING (2023)

Review Biochemistry & Molecular Biology

Machine learning approaches in diagnosing tuberculosis through biomarkers-A systematic review

Vimala Balakrishnan, Yousra Kehrabi, Ghayathri Ramanathan, Scott Arjay Paul, Chiong Kian Tiong

Summary: Biomarker-based tests can improve tuberculosis diagnosis, treatment initiation, and outcomes. Machine learning approaches show promising results in detecting TB using biomarkers.

PROGRESS IN BIOPHYSICS & MOLECULAR BIOLOGY (2023)

Review Health Care Sciences & Services

A Scoping Review of Knowledge, Awareness, Perceptions, Attitudes, and Risky Behaviors of Sexually Transmitted Infections in Southeast Asia

Vimala Balakrishnan, Kok Khuen Yong, Chiong Kian Tiong, Nicholas Jian Shen Ng, Zhao Ni

Summary: This scoping review examines the extent of research on knowledge, awareness, perceptions, attitudes, and risky behaviors related to sexually transmitted infections (STIs) in Southeast Asia (SEA), indicating low levels across various cohorts. The review highlights the impact of cultural, societal, economic, and gender inequality on people's behaviors. It calls for increased investment in educating vulnerable populations, especially in less-developed countries/regions of SEA, to prevent STIs.

HEALTHCARE (2023)

Review Computer Science, Artificial Intelligence

Cyberbullying detection and machine learning: a systematic literature review

Vimala Balakrishnan, Mohammed Kaity

Summary: This paper conducts a systematic literature review on scholarly publications from 2011 to 2022 that focus on using machine learning to detect cyberbullying incidents. The findings highlight the dire consequences of cyberbullying across different demographics and provide insights on machine learning algorithms, features, and performance measures in cyberbullying detection. The paper also discusses research challenges and future directions for further exploration.

ARTIFICIAL INTELLIGENCE REVIEW (2023)

Article Computer Science, Information Systems

Optimized support vector regression predicting treatment duration among tuberculosis patients in Malaysia

Vimala Balakrishnan, Ghayathri Ramanathan, Siyi Zhou, Chee Kuan Wong

Summary: This study developed and optimized a machine learning model to predict the treatment duration for Tuberculosis patients in Malaysia using a real-life patient dataset. The Support Vector Regression model performed the best in predicting treatment duration with the lowest error rates. Comparison with data from other countries confirmed the consistent performance of the optimized model.

MULTIMEDIA TOOLS AND APPLICATIONS (2023)

Review Health Care Sciences & Services

Understanding Women's Knowledge, Awareness, and Perceptions of STIs/STDs in Asia: A Scoping Review

Wandeep Kaur, Vimala Balakrishnan, Ian Ng Zhi Wei, Annabel Yeo Yung Chen, Zhao Ni

Summary: This scoping review collected current literature on the knowledge, awareness, and perception of STIs/STDs among women in Asia. The results showed consistently low levels of knowledge and awareness across Asia, particularly among vulnerable groups such as sex workers, transgender women, pregnant women, and rural housewives. The study emphasizes the need for educational initiatives to target these groups and prevent STIs/STDs.

HEALTHCARE (2023)

Article Education & Educational Research

Cyberbullying attitude, intention and behaviour among Malaysian tertiary students - A two stage SEM-ANN approach

Farhan Bashir Shaikh, Ramesh Kumar Ayyasamy, Vimala Balakrishnan, Mobashar Rehman, Shadab Kalhoro

Summary: This study examines the factors influencing cyberbullying behavior among Malaysian tertiary students. A model combining Social Cognitive Theory and the Theory of Planned Behavior is used, and the data from a survey of 428 students is analyzed using a two-step Structural Equation Modeling (SEM)-Artificial Neural Networks (ANN) approach. The results show that the intention to engage in cyberbullying is the most influential factor, influenced by variables such as image, moral disengagement, perceived behavioral control, university climate, subjective norms, peer relationships, and attitude towards cyberbullying. The ANN results further reveal that image is the strongest predictor of cyberbullying intention, followed by moral disengagement, cyberbullying attitude, perceived behavioral control, and university climate. These findings offer valuable insights into the underlying factors of cyberbullying among Malaysian tertiary students and provide guidance for addressing this issue in the country.

EDUCATION AND INFORMATION TECHNOLOGIES (2023)

Article Computer Science, Artificial Intelligence

COVID-19 INFODEMIC - UNDERSTANDING CONTENT FEATURES IN DETECTING FAKE NEWS USING A MACHINE LEARNING APPROACH

Vimala Balakrishnan, Hii Lee Zing, Eric Laporte

Summary: The use of content features, especially textual and linguistic, in detecting fake news has not been sufficiently studied, despite evidence suggesting their potential contribution. This study explores various content features, such as word bigrams and part of speech distribution, for improved fake news detection. The experiments conducted on a new dataset collected during the COVID-19 pandemic using different machine learning algorithms show that Random Forest performs the best, followed closely by Support Vector Machine. Overall, both textual and linguistic features enhance fake news detection when used separately, but combining them into a single model does not significantly improve the results. Differences in performance are also observed between word bigrams and part of speech tags. This study demonstrates the successful utilization of textual and linguistic features in detecting fake news with traditional machine learning approaches compared to deep learning.

MALAYSIAN JOURNAL OF COMPUTER SCIENCE (2023)

Proceedings Paper Computer Science, Information Systems

Emerging Privacy and Trust Issues for Autonomous Vehicle Systems

Thai-Hung Nguyen, Truong Giang Vu, Huong-Lan Tran, Kok-Seng Wong

Summary: The rise of autonomous vehicles has raised concerns about privacy and data protection. Implementing privacy and security protections becomes difficult, especially when different suppliers are involved. Individual concerns mainly focus on data collection and usage, particularly how location information combined with personal data can reveal sensitive information. Some data needs to be shared or published in real-time for analysis or research purposes due to mutual benefits or regulations.

36TH INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING (ICOIN 2022) (2022)
