Article

A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities

Journal

JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN
Volume 32, Issue 2, Pages 375-384

Publisher

SPRINGER
DOI: 10.1007/s10822-017-0094-6

Keywords

Semi-supervised; Feature selection; Fisher criterion; Graph Laplacian; QSAR models

Funding

  1. Hematology and Oncology Research Center of Shahid Sadoughi University of Medical Sciences [5666]

Quantitative structure-activity relationship (QSAR) modeling is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR-based drug design because it identifies the most relevant descriptors. One of the most popular feature selection methods for classification problems is the Fisher score, which aims to minimize the within-class distance and maximize the between-class distance. In this study, the properties of the Fisher criterion were extended to QSAR models by defining new distance metrics based on the continuous activity values of compounds with known activities. A semi-supervised feature selection method was then proposed that combines the Fisher and Laplacian criteria and exploits compounds with both known and unknown activities to select the relevant descriptors. To evaluate the proposed method, we applied it and other feature selection methods to three QSAR data sets: serine/threonine-protein kinase PLK3 inhibitors, ROCK inhibitors, and phenol compounds. QSAR models built on the descriptors selected by the proposed semi-supervised method performed better than the other models, which indicates that the method selects relevant descriptors effectively by using compounds with known and unknown activities. The results also showed that compounds with known and unknown activities can help improve the performance of combined Fisher and Laplacian based feature selection methods.
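
To make the two criteria named in the abstract concrete, here is a minimal Python sketch. It assumes the classic textbook definitions: a Fisher score computed from discrete class labels, and a Laplacian score computed on a k-nearest-neighbor graph with heat-kernel weights. It does not reproduce the paper's extension to continuous activity values, which the abstract does not spell out. The function names, the `unlabeled=-1` convention, and the `alpha`-weighted blend in `combined_score` are illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def fisher_score(X, y):
    """Classic Fisher score per feature: between-class over within-class scatter.

    Higher is better. Uses discrete class labels (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)


def laplacian_score(X, k=5, t=1.0):
    """Classic Laplacian score per feature on a kNN heat-kernel graph.

    Lower is better: the feature varies little between neighboring samples.
    No labels are needed, so unlabeled samples can take part.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances; the diagonal is masked out
    # so that a sample is never its own neighbor.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq, np.inf)
    nbrs = np.argsort(sq, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-sq[rows, cols] / t)
    S = np.maximum(S, S.T)          # symmetrize the affinity matrix
    d = S.sum(axis=1)               # node degrees
    L = np.diag(d) - S              # unnormalized graph Laplacian
    F = X - (d @ X) / d.sum()       # remove the degree-weighted mean
    num = (F * (L @ F)).sum(axis=0)           # f^T L f for each feature
    den = (d[:, None] * F ** 2).sum(axis=0)   # f^T D f for each feature
    return num / (den + 1e-12)


def _minmax(v):
    """Rescale a score vector to [0, 1]."""
    return (v - v.min()) / (v.max() - v.min() + 1e-12)


def combined_score(X, y, alpha=0.5, unlabeled=-1, k=5, t=1.0):
    """Hypothetical blend of both criteria (NOT the paper's formula).

    Fisher uses only labeled rows; Laplacian uses every row.
    Higher combined score means a more relevant feature.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labeled = y != unlabeled
    fs = _minmax(fisher_score(X[labeled], y[labeled]))
    ls = _minmax(laplacian_score(X, k=k, t=t))
    return alpha * fs + (1.0 - alpha) * (1.0 - ls)
```

Calling `combined_score(X, y)` with `y` set to `-1` for compounds of unknown activity returns one score per descriptor; sorting the descriptors by that score in descending order gives a ranking. A linear blend is only one way to combine a supervised and an unsupervised criterion, and the weight `alpha` would need tuning on real data.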

Recommended

Article Computer Science, Information Systems

Semi-supervised sparse feature selection via graph Laplacian based scatter matrix for regression problems

Razieh Sheikhpour, Mehdi Agha Sarram, Elnaz Sheikhpour

INFORMATION SCIENCES (2018)

Article Telecommunications

A Novel Network Coding Algorithm to Improve TCP in Wireless Networks

Azam Jannesari, Mehdi Agha Sarram, Razieh Sheikhpour

WIRELESS PERSONAL COMMUNICATIONS (2020)

Article Automation & Control Systems

Sparse feature selection in multi-target modeling of carbonic anhydrase isoforms by exploiting shared information among multiple targets

Razieh Sheikhpour, Sajjad Gharaghani, Elmira Nazarshodeh

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2020)

Article Computer Science, Information Systems

A robust graph-based semi-supervised sparse feature selection method

Razieh Sheikhpour, Mehdi Agha Sarram, Sajjad Gharaghani, Mohammad Ali Zare Chahooki

INFORMATION SCIENCES (2020)

Article Automation & Control Systems

BRNS plus SSFSM-DTI: A hybrid method for drug-target interaction prediction based on balanced reliable negative samples and semi-supervised feature selection

Mohammad Morovvati Sharifabad, Razieh Sheikhpour, Sajjad Gharaghani

Summary: De novo drug discovery is a costly and time-consuming process. Repositioning existing drugs for new applications can reduce the time and cost of finding new drugs. Predicting drug-target interactions (DTIs) can facilitate drug repositioning, but there are challenges due to the diversity of drug descriptors and protein features, as well as the lack of experimentally-confirmed non-interacting drug-target pairs as negative samples. This study presents a modified algorithm for extracting balanced negative samples and a semi-supervised feature selection method, which outperform other methods on benchmark DTI datasets.

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2022)

Article Biochemistry & Molecular Biology

Proteochemometrics modeling for prediction of the interactions between caspase isoforms and their inhibitors

Zahra Bastami, Razieh Sheikhpour, Parvin Razzaghi, Ali Ramazani, Sajjad Gharaghani

Summary: Caspases are important enzymes involved in inflammation and cell death processes. This study used Proteochemometrics Modeling to summarize and predict the interactions between caspases and ligands. The ensemble model showed superior performance compared to other models.

MOLECULAR DIVERSITY (2023)

Article Pharmacology & Pharmacy

Drug-target interaction prediction using reliable negative samples and effective feature selection methods

Mohammad Morovvati Sharifabad, Razieh Sheikhpour, Sajjad Gharaghani

Summary: This study proposes a reliable algorithm for selecting negative samples in drug-target interaction prediction. The algorithm demonstrates superior performance and shows that correctly selecting negative samples significantly improves learning performance.

JOURNAL OF PHARMACOLOGICAL AND TOXICOLOGICAL METHODS (2022)

Article Soil Science

Semi-supervised learning for the spatial extrapolation of soil information

Ruhollah Taghizadeh-Mehrjardi, Razieh Sheikhpour, Mojtaba Zeraatpisheh, Alireza Amirian-Chakan, Norair Toomanian, Ruth Kerry, Thomas Scholten

Summary: Digital soil mapping can be used to predict soils at unvisited sites, but problems arise when predictions are needed in areas without any soil observations. A new semi-supervised learning approach was found to outperform supervised learning in extrapolating soil classes in target areas, resulting in higher accuracy and lower uncertainty.

GEODERMA (2022)

Article Computer Science, Software Engineering

A robust method for coherent and non-coherent source number detection using a special Hankel-based covariance matrix

Roohallah Fazli, Hadi Owlia, Razieh Sheikhpour

Summary: A robust algorithm for source number estimation based on the formation of the Hankel covariance matrix is presented. The proposed algorithm can handle both non-coherent and fully coherent sources, and it outperforms competing methods in numerical simulations.

INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING (2023)

Article Computer Science, Artificial Intelligence

A local spline regression-based framework for semi-supervised sparse feature selection

Razieh Sheikhpour

Summary: Feature selection is widely used in machine learning applications to select relevant features from data sets. Recently, there has been considerable research interest in semi-supervised sparse feature selection based on graph Laplacian, which uses the correlation between features. This paper proposes a spline regression-based framework for semi-supervised sparse feature selection, which uses mixed convex and non-convex l2,p-norm regularization to select relevant features and considers feature correlation. The framework retains the geometry structure of labeled and unlabeled data using local spline regression and encodes the data distribution. A unified iterative algorithm is presented to solve the framework, and its convergence is theoretically and experimentally proved. Experiments on several data sets demonstrate the effectiveness of the framework in selecting the most relevant and discriminative features.

KNOWLEDGE-BASED SYSTEMS (2023)

Article Pediatrics

Prediction of blood cancer using leukemia gene expression data and sparsity-based gene selection methods

Sanaz Mehrabani, Morteza Zangeneh Soroush, Negin Kheiri, Razieh Sheikhpour, Mahshid Bahrami

Summary: This study aimed to predict blood cancer using leukemia gene expression data and a robust l2,p-norm sparsity-based gene selection method. The results showed that this method can correctly classify all samples of acute myeloid leukemia (AML) and lymphoblastic leukemia (ALL), and identified seven important genes, with PRTN3 gene being the most important. This method can be useful for predicting leukemia and examining the expression levels of related genes.

IRANIAN JOURNAL OF PEDIATRIC HEMATOLOGY AND ONCOLOGY (2023)

Article Computer Science, Artificial Intelligence

Hessian-based semi-supervised feature selection using generalized uncorrelated constraint

Razieh Sheikhpour, Kamal Berahmand, Saman Forouzandeh

Summary: Feature selection aims to eliminate redundant features and choose informative ones. Semi-supervised feature selection becomes important as it utilizes labeled and unlabeled data. We propose two frameworks, one based on Hessian matrix and the other on Hessian-Laplacian combination, for semi-supervised feature selection. Our frameworks utilize regularization and constraint techniques to select informative features and maintain the topological structure of data. Experimental results demonstrate the effectiveness of our frameworks in selecting informative features.

KNOWLEDGE-BASED SYSTEMS (2023)

Article Pediatrics

Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method

Razieh Sheikhpour, Roohallah Fazli, Sanaz Mehrabani

Summary: This study identified important genes for the diagnosis of acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) using microarray data and a sparse feature selection method. The results showed that AML and ALL can be diagnosed with high accuracy using machine learning methods. Investigating the genes selected in this study may be helpful for the diagnosis of ALL and AML.

IRANIAN JOURNAL OF PEDIATRIC HEMATOLOGY AND ONCOLOGY (2021)

Article Engineering, Multidisciplinary

A noise robust convolutional neural network for image classification

Mohammad Momeny, Ali Mohammad Latif, Mehdi Agha Sarram, Razieh Sheikhpour, Yu Dong Zhang

Summary: In this paper, a Noise-Robust Convolutional Neural Network (NR-CNN) is proposed to classify noisy images without preprocessing, by adding a noise map layer and an adaptive resize layer, and considering noise in different components of the network. The proposed NR-CNN improves the classification performance of noisy images and network training speed.

RESULTS IN ENGINEERING (2021)
