Article
Mathematics
Shengfeng Gan, Shiqi Shao, Long Chen, Liangjun Yu, Liangxiao Jiang
Summary: The paper introduces a single model, hidden multinomial naive Bayes (HMNB), which adapts the hidden naive Bayes (HNB) approach by creating a hidden parent for each feature that synthesizes the influences of all other qualified features. A simple but effective learning algorithm is proposed, and experiments on text classification datasets validate the effectiveness of HMNB.
Article
Computer Science, Artificial Intelligence
Mohsen Miri, Mohammad Bagher Dowlatshahi, Amin Hashemi, Marjan Kuchaki Rafsanjani, Brij B. Gupta, W. Alhalabi
Summary: The value and importance of multi-label text classification have increased with the rapid growth of data. Preprocessing and intelligent feature selection are crucial steps in classification. This article proposes an ensemble feature selection method based on order statistics to improve the accuracy of multi-label text classification.
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
(2022)
Article
Computer Science, Artificial Intelligence
Bekir Parlak
Summary: Text classification is an important problem in the modern era because of the large amount of textual data. Feature selection, which has a big impact on classification accuracy, is one of the most crucial processes in text classification studies. The literature proposes various feature selection techniques, each with its own feature ordering and selection criteria. This study combines these techniques in different orders to observe the success and failure of the resulting combinations. The results show that combining feature selection approaches can outperform any single feature selection method alone, although some combinations perform worse than the individual methods.
NEURAL COMPUTING & APPLICATIONS
(2023)
Article
Chemistry, Analytical
Majed Alwateer, Abdulqader M. Almars, Kareem N. Areed, Mostafa A. Elhosseini, Amira Y. Haikal, Mahmoud Badawy
Summary: This paper introduces a novel approach for processing healthcare data that extracts useful predictive information at minimum computational cost, aiming to improve accuracy and reduce processing time. The proposed method uses the Whale Optimization Algorithm and a Naive Bayes classifier for data processing and feature selection, resulting in improved accuracy and processing speed.
Article
Genetics & Heredity
Yuxin Guo, Liping Hou, Wen Zhu, Peng Wang
Summary: The study focuses on the characteristics and identification methods of hormone binding proteins (HBPs). It establishes a prediction model, HBP_NB, which uses a high-quality dataset and a feature selection algorithm to accurately identify HBPs.
FRONTIERS IN GENETICS
(2021)
Article
Engineering, Multidisciplinary
M. Shaheen, N. Naheed, A. Ahsan
Summary: Big data analytics uncovers hidden patterns in big datasets through classification, prediction and reinforcement. Relevant, important and informative features are selected using different filtering techniques. The paper proposes a new feature selection technique, the Relevance-Diversity algorithm, and a new supervised classification algorithm based on Naive Bayes classification. The performance of these techniques is evaluated on various datasets, and the results show improvements in feature selection, accuracy, and time complexity.
ALEXANDRIA ENGINEERING JOURNAL
(2023)
Article
Biochemical Research Methods
Fengsheng Wang, Leyi Wei
Summary: In this study, we propose a novel multi-scale end-to-end deep learning model, MSTLoc, for identifying protein subcellular locations in an imbalanced multi-label immunohistochemistry (IHC) image dataset. We demonstrate that MSTLoc outperforms current state-of-the-art models in multi-label subcellular location prediction. Feature visualization and interpretation analysis show that the multi-scale deep features learned by our model are better at capturing discriminative patterns underlying protein subcellular locations, and that features from different scales complement one another to improve performance. Case study results indicate that MSTLoc can successfully identify biomarker proteins closely involved in cancer development.
Article
Computer Science, Artificial Intelligence
Azam Asilian Bidgoli, Hossein Ebrahimpour-komleh, Shahryar Rahnamayan
Summary: This study proposes a multi-objective optimization method to select efficient features for multi-label classification, aiming to improve classification performance by maximizing feature-label correlation and minimizing computational complexity. Experimental results show significant improvements for the proposed method on multi-label datasets compared to other algorithms.
INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS
(2021)
Article
Computer Science, Information Systems
Lee-Kien Foo, Sook-Ling Chua, Neveen Ibrahim
Summary: The naive Bayes classifier is a simple yet effective method for data mining classification. However, its assumption of attribute independence may not hold in real-world applications. To address this, the authors propose a method that incorporates attribute weights into naive Bayes, which improves classification performance in terms of accuracy and F1 score.
CMC-COMPUTERS MATERIALS & CONTINUA
(2022)
Article
Computer Science, Artificial Intelligence
Shufen Ruan, Baozhou Chen, Kunfang Song, Hongwei Li
Summary: This paper presents an attribute weighting method for naive Bayes text classifiers that uses an improved distance correlation coefficient to measure the importance of each attribute to each category. Experimental results indicate that the method achieves an effective balance between classification accuracy and execution time.
NEURAL COMPUTING & APPLICATIONS
(2022)
Article
Computer Science, Artificial Intelligence
Hongpo Zhang, Ning Cheng, Yang Zhang, Zhanbo Li
Summary: A label flipping attack is a poisoning attack that degrades a model's classification performance by flipping the labels of training samples. The naive Bayes algorithm shows good robustness in tasks such as document classification and spam filtering. The proposed label flipping attacks effectively reduce the accuracy of various classification models.
APPLIED INTELLIGENCE
(2021)
Article
Computer Science, Information Systems
Ankita Dhar, Himadri Mukherjee, Kaushik Roy, K. C. Santosh, Niladri Sekhar Dash
Summary: This article introduces a hybrid approach that combines text-based and graph-based features and demonstrates its effectiveness in an automatic text categorization system. The approach was applied to 14,373 Bangla articles collected from various online news corpora covering nine categories. The experiments also apply the features to two popular English datasets to test the system's robustness and language independence.
JOURNAL OF INFORMATION SCIENCE
(2023)
Article
Computer Science, Artificial Intelligence
Yonghao Li, Liang Hu, Wanfu Gao
Summary: Multi-label feature selection is an efficient technique for dealing with high-dimensional multi-label data, but existing methods suffer from low feature discrimination and redundancy. This paper proposes a new regularization norm and optimization framework to address these issues, and empirical studies demonstrate the effectiveness and efficiency of the proposed method.
PATTERN RECOGNITION
(2023)
Article
Computer Science, Information Systems
Bekir Parlak
Summary: Text classification is an important topic today, but feature selection often overlooks information carried by the features themselves. This study proposes a new globalization technique, FCWS, which considers both feature and class information to improve classification performance. Experimental results on multiple datasets demonstrate the effectiveness of the proposed method.
MULTIMEDIA TOOLS AND APPLICATIONS
(2023)
Review
Computer Science, Information Systems
Hong Ming, Wang Heyong
Summary: This paper provides a comprehensive systematic review of existing filter feature selection methods for text classification. It discusses the mathematical design, effectiveness, and complexity of different methodologies (supervised, unsupervised, and hybrid methods), as well as benchmark datasets for evaluating performance, and concludes with future research directions.
MULTIMEDIA TOOLS AND APPLICATIONS
(2023)
Article
Computer Science, Information Systems
Vithyatheri Govindan, Vimala Balakrishnan
Summary: This paper investigates negative sentiment tweets with hyperboles for sarcasm detection. The proposed model achieved high accuracy and F-score in detecting sarcasm in tweets that contain hyperbolic words.
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
(2022)
Article
Information Science & Library Science
Vimala Balakrishnan, Kee S. Ng, Hamid R. Arabnia
Summary: This study investigated cyber-racism on social media during the COVID-19 pandemic using machine learning models. The results showed that the models performed consistently in detecting cyber-racism patterns in textual communications. Topic modelling revealed three distinct topics in racist tweets: Eating habits, Political hatred, and Xenophobia.
TELEMATICS AND INFORMATICS
(2022)
Article
Psychology, Multidisciplinary
Vimala Balakrishnan, Kee Seong Ng, Wandeep Kaur, Zhen Lek Lee
Summary: This study synthesizes existing literature on the psychological outcomes of people in Southeast Asia during the COVID-19 pandemic and identifies risk factors. It found an elevated prevalence of adverse mental health effects, with Malaysia and the Philippines reporting higher rates. Mental decline was more common in the general population than among healthcare workers and students. The dominant risk factors were younger age, female sex, higher education, low coping skills, low social/family support, and poor reliability of COVID-19-related information.
CURRENT PSYCHOLOGY
(2023)
Article
Computer Science, Information Systems
Yee Jian Chew, Nicholas Lee, Shih Yin Ooi, Kok-Seng Wong, Ying Han Pang
Summary: Several recent network intrusion detection system (NIDS) datasets have been published; however, the lack of baseline experimental results on the full versions of these datasets has made benchmarking difficult. It is challenging for researchers to compare the performance of different machine learning classifiers without bias, and the literature has noted that the cross-validation resampling scheme is considered inappropriate in the NIDS domain.
INFORMATION SECURITY JOURNAL
(2022)
Article
Computer Science, Cybernetics
Vimala Balakrishnan, See Kiat Ng
Summary: This study investigates how users' personality traits and the emotions they express in textual communications on YouTube can be used to detect cyberbullying. The results show that both personality traits and emotions significantly improve the identification of cyberbullying, with accuracy and F-score values above 95%. Further analysis reveals that anger and openness have a stronger effect than other emotions and personality traits, and that neurotic individuals tend to engage in cyberbullying out of joy, disgust and fear.
BEHAVIOUR & INFORMATION TECHNOLOGY
(2023)
Article
Computer Science, Artificial Intelligence
Vimala Balakrishnan, Vithyatheri Govindan, Kumanan N. Govaichelvan
Summary: This study uses a corpus of Tamil comments collected from YouTube to detect offensive language patterns. The research compares supervised and unsupervised machine learning approaches, and finds that unsupervised clustering is more effective in detecting offensive language in under-resourced languages.
ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING
(2023)
Review
Biochemistry & Molecular Biology
Vimala Balakrishnan, Yousra Kehrabi, Ghayathri Ramanathan, Scott Arjay Paul, Chiong Kian Tiong
Summary: Biomarker-based tests can improve tuberculosis diagnosis, treatment initiation, and outcomes. Machine learning approaches show promising results in detecting TB using biomarkers.
PROGRESS IN BIOPHYSICS & MOLECULAR BIOLOGY
(2023)
Review
Health Care Sciences & Services
Vimala Balakrishnan, Kok Khuen Yong, Chiong Kian Tiong, Nicholas Jian Shen Ng, Zhao Ni
Summary: This scoping review examines the extent of research on knowledge, awareness, perceptions, attitudes, and risky behaviors related to sexually transmitted infections (STIs) in Southeast Asia (SEA), indicating low levels across various cohorts. The review highlights the impact of cultural, societal, economic, and gender inequality on people's behaviors. It calls for increased investment in educating vulnerable populations, especially in less-developed countries/regions of SEA, to prevent STIs.
Review
Computer Science, Artificial Intelligence
Vimala Balakrishnan, Mohammed Kaity
Summary: This paper presents a systematic literature review of scholarly publications from 2011 to 2022 on using machine learning to detect cyberbullying incidents. The findings highlight the dire consequences of cyberbullying across different demographics and provide insights into the machine learning algorithms, features, and performance measures used in cyberbullying detection. The paper also discusses research challenges and future directions.
ARTIFICIAL INTELLIGENCE REVIEW
(2023)
Article
Computer Science, Information Systems
Vimala Balakrishnan, Ghayathri Ramanathan, Siyi Zhou, Chee Kuan Wong
Summary: This study developed and optimized a machine learning model to predict the treatment duration for Tuberculosis patients in Malaysia using a real-life patient dataset. The Support Vector Regression model performed the best in predicting treatment duration with the lowest error rates. Comparison with data from other countries confirmed the consistent performance of the optimized model.
MULTIMEDIA TOOLS AND APPLICATIONS
(2023)
Review
Health Care Sciences & Services
Wandeep Kaur, Vimala Balakrishnan, Ian Ng Zhi Wei, Annabel Yeo Yung Chen, Zhao Ni
Summary: This scoping review collected current literature on the knowledge, awareness, and perception of STIs/STDs among women in Asia. The results showed consistently low levels of knowledge and awareness across Asia, particularly among vulnerable groups such as sex workers, transgender women, pregnant women, and rural housewives. The study emphasizes the need for educational initiatives to target these groups and prevent STIs/STDs.
Article
Education & Educational Research
Farhan Bashir Shaikh, Ramesh Kumar Ayyasamy, Vimala Balakrishnan, Mobashar Rehman, Shadab Kalhoro
Summary: This study examines the factors influencing cyberbullying behavior among Malaysian tertiary students. It uses a model combining Social Cognitive Theory and the Theory of Planned Behavior and analyzes survey data from 428 students with a two-step Structural Equation Modeling (SEM)-Artificial Neural Network (ANN) approach. The results show that the intention to engage in cyberbullying is the most influential factor. This intention is in turn influenced by image, moral disengagement, perceived behavioral control, university climate, subjective norms, peer relationships, and attitude towards cyberbullying. The ANN results further reveal that image is the strongest predictor of cyberbullying intention, followed by moral disengagement, cyberbullying attitude, perceived behavioral control, and university climate. These findings offer insights into the underlying factors of cyberbullying among Malaysian tertiary students and provide guidance for addressing this issue in the country.
EDUCATION AND INFORMATION TECHNOLOGIES
(2023)
Article
Computer Science, Artificial Intelligence
Vimala Balakrishnan, Hii Lee Zing, Eric Laporte
Summary: The use of content features, especially textual and linguistic ones, in detecting fake news has not been sufficiently studied, despite evidence suggesting their potential contribution. This study explores various content features, such as word bigrams and part-of-speech distributions, for improved fake news detection. Experiments with different machine learning algorithms on a new dataset collected during the COVID-19 pandemic show that Random Forest performs best, followed closely by Support Vector Machine. Textual and linguistic features each enhance fake news detection when used separately, but combining them into a single model does not significantly improve the results. Performance also differs between word bigrams and part-of-speech tags. The study shows that textual and linguistic features can be used successfully to detect fake news with traditional machine learning approaches rather than deep learning.
MALAYSIAN JOURNAL OF COMPUTER SCIENCE
(2023)
Proceedings Paper
Computer Science, Information Systems
Thai-Hung Nguyen, Truong Giang Vu, Huong-Lan Tran, Kok-Seng Wong
Summary: The rise of autonomous vehicles has raised concerns about privacy and data protection. Implementing privacy and security protections is difficult, especially when different suppliers are involved. Individual concerns focus mainly on data collection and usage, particularly on how location information combined with personal data can reveal sensitive information. At the same time, some data must be shared or published in real time for analysis or research purposes, owing to mutual benefits or regulations.
36TH INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING (ICOIN 2022)
(2022)