Article

Knowledge discovery through directed probabilistic topic models: a survey

Journal

FRONTIERS OF COMPUTER SCIENCE IN CHINA
Volume 4, Issue 2, Pages 280-301

Publisher

HIGHER EDUCATION PRESS
DOI: 10.1007/s11704-009-0062-y

Keywords

text corpora; Directed Probabilistic Topic Models (DPTMs); soft clustering; unsupervised learning; knowledge discovery

Funding

  1. National Natural Science Foundation of China [90604025, 60703059]
  2. Chinese National Key Foundation Research and Development Plan [2007CB310803]
  3. Higher Education Commission (HEC), Pakistan

Abstract

Graphical models have become the basic framework for topic-based probabilistic modeling. Models with latent variables, in particular, have proved effective at capturing hidden structure in data. In this paper, we survey an important subclass, Directed Probabilistic Topic Models (DPTMs), which have soft clustering abilities, and their applications to knowledge discovery in text corpora. From an unsupervised learning perspective, topics are semantically related probabilistic clusters of words in text corpora, and the process of finding these topics is called topic modeling. In topic modeling, a document is a mixture of hidden topics, and the topic probabilities provide an explicit representation of the document that smooths the data at the semantic level. Topic modeling has been an active area of research during the last decade. Many models have been proposed to handle text corpora with different characteristics, for applications such as document classification, hidden association finding, expert finding, community discovery, and temporal trend analysis. We present the basic concepts, advantages, and disadvantages of these models in chronological order; a classification of existing models into categories; their parameter estimation and inference algorithms; and measures for evaluating model performance. We also discuss applications, open challenges, and future directions in this dynamic area of research.
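
As a rough illustration of the idea that a document is a mixture of hidden topics, the following Python sketch samples a toy document from an LDA-style generative process, one common kind of directed topic model. The vocabulary, the two topic-word distributions, the function names, and the hyperparameter `alpha` are all invented for illustration and are not taken from the survey.

```python
import random

# Hypothetical toy topics: each topic is a probability distribution over words.
TOPICS = {
    "sports": {"game": 0.5, "team": 0.4, "data": 0.1},
    "computing": {"data": 0.5, "model": 0.4, "game": 0.1},
}


def sample_dirichlet(alpha, k, rng):
    """Draw a symmetric k-dimensional Dirichlet sample via normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha=0.5, seed=0):
    """Sample topic proportions for one document, then a topic and a word per position."""
    rng = random.Random(seed)
    names = list(TOPICS)
    # Per-document topic mixture: this is the "soft clustering" of the document.
    theta = sample_dirichlet(alpha, len(names), rng)
    words = []
    for _ in range(n_words):
        # Latent topic assignment for this word position.
        topic = rng.choices(names, weights=theta)[0]
        dist = TOPICS[topic]
        # Observed word drawn from the chosen topic's word distribution.
        word = rng.choices(list(dist), weights=list(dist.values()))[0]
        words.append(word)
    return dict(zip(names, theta)), words


theta, words = generate_document(10)
print(theta)  # topic proportions for the sampled document
print(words)  # the sampled words
```

Inference runs in the opposite direction: given only the observed words, an algorithm estimates the hidden topic proportions and topic-word distributions.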

Recommended

Article Information Science & Library Science

Investigating the citation advantage of author-pays charges model in computer science research: a case study of Elsevier and Springer

Tehmina Amjad, Mehwish Sabir, Azra Shamim, Masooma Amjad, Ali Daud

Summary: This study compared the citations received by open access and toll access articles in four subfields of computer science, finding that open access articles have a citation advantage and that the size of this advantage varies among subfields. The results support the movement towards open access publishing in computer science.

LIBRARY HI TECH (2022)

Article Multidisciplinary Sciences

Measuring the impact of COVID-19 surveillance variables over the international oil market

Abdulrahman A. Alshdadi, Malik Khizar Hayat, Ali Daud, Ameen Banjar, Hussain Dawood

Summary: The COVID-19 pandemic has had a significant impact on the international oil market, causing fluctuations in crude oil prices and triggering a global economic crisis. This study aims to investigate the short-term and long-term effects of COVID-19 on the international oil market by analyzing the correlation between surveillance variables and international crude oil prices. The findings will provide important guidance for policymakers in the oil market.

INTERNATIONAL JOURNAL OF ADVANCED AND APPLIED SCIENCES (2022)

Article Computer Science, Information Systems

Reduction of random-valued impulse noise by using multi-structured textons

Hussain Dawood, Ali Daud, Hassan Dawood, Marium Azhar

Summary: This paper presents an iterative two-stage image denoising technique based on multi-structured textons for the denoising of random-valued impulse noise. The proposed method identifies noisy pixels using multiple textons and restores noise-free pixels using spatially linked directional similarity. Experimental results demonstrate the superiority of the proposed method in denoising performance.

MULTIMEDIA TOOLS AND APPLICATIONS (2022)

Article Computer Science, Interdisciplinary Applications

Citation burst prediction in a bibliometric network

Tehmina Amjad, Nafeesa Shahid, Ali Daud, Asma Khatoon

Summary: This study aims to investigate the impact of several features on the number of citations for articles published in journals or conferences, as well as to predict future citations. The findings show that for journal publications, author first-year citations and author total citation are the most important features, while author total citation is more effective for conference publications.

SCIENTOMETRICS (2022)

Article Computer Science, Interdisciplinary Applications

Indexing important drugs from medical literature

Riad Alharbey, Jong In Kim, Ali Daud, Min Song, Abdulrahman A. Alshdadi, Malik Khizar Hayat

Summary: Health maintenance is crucial for society, and progress in the biomedical field has produced a wealth of medical information. Extracting meaningful insights, especially about gene-drug relationships, is important for modern medicine. This study proposes a new measure, Drug-Index, to detect gene-drug relations, which is useful for drug discovery, diagnosis, and personalized treatment.

SCIENTOMETRICS (2022)

Article Automation & Control Systems

DBP-DeepCNN: Prediction of DNA-binding proteins using wavelet-based denoising and deep learning

Farman Ali, Harish Kumar, Shruti Patil, Aftab Ahmed, Ameen Banjar, Ali Daud

Summary: In this study, a deep learning-based predictor (DBP-DeepCNN) is proposed to improve the prediction of DNA-binding proteins (DBPs). By using a novel feature extraction method and training with various models, the predictor achieved higher accuracies on both training and independent datasets, indicating its potential for large scale DBP prediction and promising therapeutic strategies for chronic diseases.

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2022)

Article Automation & Control Systems

iDBP-PBMD: A machine learning model for detection of DNA-binding proteins by extending compression techniques into evolutionary profile

Ameen Banjar, Farman Ali, Omar Alghushairy, Ali Daud

Summary: DNA-binding proteins (DBPs) play crucial roles in DNA transcription, recombination, and replication, and are associated with diseases like AIDS/HIV, cancer, and asthma. This research encoded DBPs using different feature descriptors and eliminated noisy and redundant features using compression techniques. The resulting features were used to train models with XGBoost and ERT classifiers. The study demonstrated the superiority of this approach over previous methods.

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2022)

Article Automation & Control Systems

Comparative analysis of the existing methods for prediction of antifreeze proteins

Adnan Khan, Jamal Uddin, Farman Ali, Ameen Banjar, Ali Daud

Summary: Antifreeze proteins (AFPs) are found in various organisms and play a crucial role in preventing the formation of ice crystals. The development of accurate predictors for identifying AFPs is essential. This review article provides a comprehensive summary of existing AFP predictors, including their applied datasets, feature descriptors, model training classifiers, performance assessment parameters, and web servers. The drawbacks of current predictors are highlighted, and suggestions for future improvements, such as more effective feature descriptors and efficient classifiers, are discussed.

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2023)

Article Computer Science, Cybernetics

Identifying Rising Stars via Supervised Machine Learning

Ali Daud, Naveed ul Islam, Xin Li, Imran Razzak, Malik Khizar Hayat

Summary: Identifying rising stars is important for the growth of any organization. This article explores the classification of rising business managers (RBMs) by examining the features of co-business managers (Co-BMs), using machine learning techniques. Experimental results show that generative models, particularly Bayesian networks, produce better predictions for the dataset based on average revenue.

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS (2023)

Article Information Science & Library Science

OpenRank - a novel approach to rank universities using objective and publicly verifiable data sources

Muhammad Sajid Qureshi, Ali Daud, Malik Khizar Hayat, Muhammad Tanvir Afzal

Summary: This research aims to enhance the credibility of academic rankings by using objective indicators based on publicly verifiable data sources. The proposed ranking methodology, OpenRank, uses objective indicators from two well-known data repositories, ArnetMiner and DBpedia. The resulting academic ranking reflects common tendencies of international rankings. Evaluation of the methodology shows its effectiveness and reproducibility with low data collection cost.

LIBRARY HI TECH (2023)

Article Computer Science, Cybernetics

Citation Count Is Not Enough: Citation's Context-Based Scientific Impact Evaluation

Ali Daud, Sehrish Ghaffar, Tehmina Amjad

Summary: Qualitative analysis of citations received by a scientific manuscript is challenging. Most existing approaches for scientific impact evaluation only use quantitative parameters, such as the number of citations, and ignore the qualitative feature of citation context. In this study, a context-based article impact factor (CBAIF) is proposed to evaluate articles based on the context of citations, considering positive, negative, or neutral contexts and the conflict-of-interest relationship between citing and cited authors. Experimental results show that CBAIF provides more accurate rankings compared to the article impact factor (AIF) without considering the context of citations.

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS (2022)

Article Computer Science, Information Systems

Prediction of Movie Quality via Adaptive Voting Classifier

Muhammad Shahzad Faisal, Atif Rizwan, Khalid Iqbal, Heba Fasihuddin, Ameen Banjar, Ali Daud

Summary: This paper discusses the challenges of information retrieval from social web data and proposes a method to predict high-quality/popular movies using various features. Additionally, an enhanced optimization-based voting classifier is introduced to improve the performance of the proposed features.

IEEE ACCESS (2022)

Article Information Science & Library Science

Measuring the impact of co-author count on citation count of research publications

Ali Daud, Malik Khizar Hayat, Abdulrahman A. Alshdadi, Ameen Banjar, Wael Mansour Alharbi

Summary: Co-authored research work has higher visibility and impact compared to individual published work. This study analyzes the correlation between the number of co-authors in a published paper and the number of times the paper is cited. The analysis is divided into three categories and the results show that most research fields have increasing citability with a greater number of co-authors.

COLLNET JOURNAL OF SCIENTOMETRICS AND INFORMATION MANAGEMENT (2022)

Article Computer Science, Information Systems

Ontological Modeling and Semantic Search in Quran

Ali Daud, Muhammad Hafeez Ullah, Ameen Reda Banjar, Abdulrahman A. Alshdadi

Summary: This paper introduces an ontology development method considering Quran, Hadith, and Tafsir, and performs semantic search on Zakat as a use case. The results show that the proposed method meets the expectations.

INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY (2022)

Article Computer Science, Cybernetics

Advanced Learning Analytics: Aspect Based Course Feedback Analysis of MOOC Forums to Facilitate Instructors

Tehmina Amjad, Zainab Shaheen, Ali Daud

Summary: The use of Massive Open Online Courses (MOOCs) has increased significantly in recent times, particularly after the COVID-19 pandemic. To address the lack of face-to-face interaction, MOOC platforms provide discussion forums where students can share their thoughts and problems. Instructors must closely monitor student performance and analyze discussion threads to identify specific problem areas. This study proposes a method that categorizes threads using topic modeling and performs sentiment analysis on comments to improve teaching methodology and enhance student understanding.

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS (2022)
