Article
Computer Science, Information Systems
Ruby Rani, D. K. Lobiyal
Summary: This paper constructs corpus-specific stopword lists for Hindi text documents using statistical and knowledge-based methods, and proposes an evaluation method that examines their behavior using text mining models.
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
(2022)
Article
Genetics & Heredity
Daniel Voskergian, Burcu Bakir-Gungor, Malik Yousef
Summary: This study examines how the approach of a previous study performs on short texts and proposes an advanced short-text classification framework that uses lexical features and topic distributions to address the data sparseness problem. The proposed approach is evaluated on two datasets from the biomedical and computer science fields, and the results demonstrate that leveraging semantic information improves ML classifier performance.
FRONTIERS IN GENETICS
(2023)
Article
Computer Science, Artificial Intelligence
Jay Kumar, Junming Shao, Rajesh Kumar, Salah Ud Din, Cobbinah B. Mawuli, Qinli Yang
Summary: Online clustering of short text streams is important for news and social media platforms. This paper proposes a non-parametric Dirichlet model with episodic inference (EINDM) to cluster evolving short text streams. EINDM introduces a window-based, low-dimensional semantic term representation to capture contextual relationships between words, and it reduces cluster sparsity using episodic inference. Evaluation results show that EINDM outperforms recent clustering models in terms of NMI, homogeneity, and cluster purity.
EXPERT SYSTEMS WITH APPLICATIONS
(2023)
Article
Computer Science, Information Systems
Mingyang Li, Xinhua Bi, Limin Wang, Xuming Han, Lin Wang, Wei Zhou
Summary: This study proposes a semantic similarity calculation method that integrates the word2vec model with TF-IDF and applies it successfully to text data from online medical communities. Experimental results demonstrate the method's superiority in text similarity measurement, providing a reference for identifying user demands from medical text data in a big data environment.
JOURNAL OF ORGANIZATIONAL AND END USER COMPUTING
(2022)
Article
Health Care Sciences & Services
Minhao Xiang, Dongdong Zhong, Minghua Han, Kun Lv
Summary: With economic and social development, people's health awareness has increased and the demand for health information has grown. This study introduces an advanced BERT-LDA model for topic-sentiment analysis in online health communities. By analyzing the distribution of positive and negative sentiments across topics, it investigates the correlation between different health information demands and emotional expressions. The research deepens our understanding of users' emotional reactions and provides insights for delivering personalized health information in online communities.
Article
Computer Science, Artificial Intelligence
Junaid Rashid, Jungeun Kim, Amir Hussain, Usman Naseem
Summary: This paper proposes a novel word embedding-based topic model (WETM) for short text documents to discover the structural information of topics and words and eliminate the sparsity problem. WETM extracts semantically coherent topics from short texts and finds relationships between words using a modified collapsed Gibbs sampling algorithm. Extensive experimental results show that WETM achieves better topic quality, coherence, classification, and clustering results compared to traditional topic models, while also requiring less execution time.
PATTERN RECOGNITION LETTERS
(2023)
Article
Health Care Sciences & Services
Jingfang Liu, Yu Zeng
Summary: Physician online communities provide a platform for doctors to communicate and help each other. This study analyzed posts from a physician online community in China and found that using certain types of words in posts, such as time words, visual words, auditory words, and physiological process words, had a positive effect on the number of responses.
Article
Multidisciplinary Sciences
Oliver C. Stringham, Stephanie Moncayo, Katherine G. W. Hill, Adam Toomes, Lewis Mitchell, Joshua Ross, Phillip Cassey
Summary: Automated monitoring of wildlife trade websites is increasingly important for conservation efforts. Text classifiers have shown promise in accurately identifying relevant advertisements, with a minimum sample size of 33% required for accurate predictions. Further integration of machine learning tools, such as image classification, may improve predictive performance and help streamline data processing for online wildlife trade monitoring.
Article
Automation & Control Systems
Feifei Wang, Junni L. Zhang, Yichao Li, Ke Deng, Jun S. Liu
Summary: This paper proposes the CSTM model for text classification and text summarization tasks. An analysis of the 20 Newsgroups dataset shows that CSTM outperforms a two-stage approach based on LDA as well as other existing extensions.
JOURNAL OF MACHINE LEARNING RESEARCH
(2021)
Article
Automation & Control Systems
Jay Kumar, Junming Shao, Rajesh Kumar, Salah Ud Din, Cobbinah B. Mawuli, Qinli Yang
Summary: This article presents an online semi-supervised classification algorithm (OSMTS) for multilabel text streams. It dynamically maintains a subspace of terms for each label with evolving micro-clusters and uses a non-parametric Dirichlet model with k nearest micro-clusters for multilabel classification. It handles gradual concept drift with a triangular time function, and abrupt concept drift by deleting outdated micro-clusters and creating new ones based on the Chinese restaurant process and the Dirichlet process.
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS
(2023)
Article
Computer Science, Information Systems
Jiaqi Zhou, Qingpeng Zhang, Sijia Zhou, Xin Li, Xiaoquan Michael Zhang
Summary: Online health communities (OHCs) are valuable platforms for patients to exchange information and receive social support. This research explores how OHC content can affect patients' emotions and finds that emotional support intended for one person may have negative effects on others. The study emphasizes the need for careful management of OHC-based interventions for patients with depression to avoid unintended consequences. A deep learning model is proposed to differentiate emotional support from auxiliary content, and options for adjusting intervention variables are discussed to mitigate these negative effects.
Article
Computer Science, Information Systems
Zhizhen Yao, Bin Zhang, Zhenni Ni, Feicheng Ma
Summary: This study investigates users' health information seeking and sharing behaviors in an online diabetes community and identifies four different ways in which users engage in seeking and sharing. The results show that threads with self-disclosure tend to receive more replies and attract more user contributions. Additionally, information seeking and sharing overlap substantially for symptoms, while the overlap is smaller in the self-management and medication categories.
ASLIB JOURNAL OF INFORMATION MANAGEMENT
(2022)
Article
Psychology, Multidisciplinary
H. -J. Choi, Seohyun Kim, Allan S. Cohen, Jonathan Templin, Yasemin Copur-Gencturk
Summary: In this study, a statistical topic model and a diagnostic classification model were used to analyze a mixed-item-format English Language Arts test. Students' mastery of reading skills was found to influence their writing patterns in their answers to constructed-response items.
FRONTIERS IN PSYCHOLOGY
(2021)
Article
Business
Xuan Liu, Shan Lin, Shan Jiang, Ming Chen, Jia Li
Summary: The authors empirically examined the impact of social capital factors on patients' acquisition of social support and found that structural connections have a lasting impact on both informational and emotional support. They also found that the quantity of connections matters more than their quality for acquiring informational support. The findings provide guidance for patients seeking social support online.
Article
Psychology, Multidisciplinary
Xiaoling Wei, Yuan-Teng Hsu
Summary: This study analyzed 38,457 physician profiles in a popular online healthcare community in China to identify factors associated with physician ratings and page views. Mentioning research ability and foreign experience had a positive impact on physician ratings, while mentioning more clinical experience had a negative impact. For page views, descriptions of foreign experience and committee positions had a positive impact, whereas mentioning research ability had a negative impact.
FRONTIERS IN PSYCHOLOGY
(2022)