Article

Feature Selection for Polymer Informatics: Evaluating Scalability and Robustness of the FS4RVDD Algorithm Using Synthetic Polydisperse Data Sets

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2020)

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING

Volume 60, Issue 2, Pages 592-603

Publisher

AMER CHEMICAL SOC

DOI: 10.1021/acs.jcim.9b00867

Categories

Chemistry, Medicinal; Chemistry, Multidisciplinary; Computer Science, Information Systems; Computer Science, Interdisciplinary Applications

Funding

Argentinean National Council of Scientific and Technological Research (CONICET) [PIP 112-2017-0100829]
Universidad Nacional del Sur (UNS), Bahia Blanca, Argentina [PGI 24/N042, PGI 24/ZM17]

Abstract

Feature selection (FS) is a key step in quantitative structure-property relationship (QSPR) modeling of physicochemical properties in cheminformatics. Inferring QSPR models for polymeric material properties is particularly complex because of the uncertainty introduced by the polydispersity of these materials. The main challenge is how to capture the polydispersity information in the molecular weight distribution (MWD) curve to achieve a more effective computational representation of polymeric materials. To date, most existing QSPR techniques represent each material by a single molecule and therefore ignore polydispersity. Consequently, the QSPR models obtained by these approaches are oversimplified. For this reason, in previous work we introduced a new FS algorithm, Feature Selection for Random Variables with Discrete Distribution (FS4RV(DD)), which can handle polydisperse data. In the present paper, we evaluate both the scalability and the robustness of the FS4RV(DD) algorithm. To this end, we generated synthetic data by varying and combining different parameters: the size of the database, the cardinality of the selected feature subsets, the presence of noise in the data, and the type of correlation (linear or nonlinear). Moreover, the performance of FS4RV(DD) was compared with that of traditional FS techniques applied to different simplified representations of polymeric materials. The results show that FS4RV(DD) outperformed the traditional FS methods in all proposed scenarios, which suggests the need for an algorithm such as FS4RV(DD) to deal with the uncertainty that polydispersity introduces in human-made polymers.
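As a rough illustration of why a single-molecule representation can lose information, the following Python sketch builds toy polydisperse samples and compares two candidate features. It is not the FS4RV(DD) algorithm and does not reproduce the paper's synthetic data sets. The chain-level descriptor (`chain_descriptor`), the way the discrete distributions are drawn, the linear target, and all function names are assumptions made only for this example.

```python
import math
import random


def make_material(rng, n_bins=5):
    """Return (chain_lengths, weight_fractions) for one toy polymer sample."""
    lengths = rng.sample(range(10, 500), n_bins)
    raw = [rng.random() for _ in range(n_bins)]
    total = sum(raw)
    return lengths, [r / total for r in raw]


def chain_descriptor(length):
    """Hypothetical descriptor computed for a single chain (assumption)."""
    return math.log(length)


def distribution_feature(lengths, weights):
    """Descriptor averaged over the whole discrete distribution."""
    return sum(w * chain_descriptor(n) for n, w in zip(lengths, weights))


def single_molecule_feature(lengths, weights):
    """Descriptor of one representative chain (the weight-averaged length)."""
    mean_length = sum(w * n for n, w in zip(lengths, weights))
    return chain_descriptor(mean_length)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def run_experiment(n_materials=200, noise_sd=0.0, seed=0):
    """Correlate each feature with a target that depends on the full distribution."""
    rng = random.Random(seed)
    materials = [make_material(rng) for _ in range(n_materials)]
    dist = [distribution_feature(l, w) for l, w in materials]
    single = [single_molecule_feature(l, w) for l, w in materials]
    # Toy target: linear in the distribution-averaged descriptor plus Gaussian noise.
    target = [2.0 * d + rng.gauss(0.0, noise_sd) for d in dist]
    return pearson(dist, target), pearson(single, target)


corr_dist, corr_single = run_experiment()
print(f"distribution-aware: {corr_dist:.3f}, single-molecule: {corr_single:.3f}")
```

Raising `noise_sd` or `n_materials` in `run_experiment` gives a crude analogue of the noise and database-size parameters mentioned in the abstract. A nonlinear target could be substituted in the same place to mimic the nonlinear-correlation scenario.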

Robust feature selection using label enhancement and β-precision fuzzy rough sets for multilabel fuzzy decision system

Tengyu Yin, Hongmei Chen, Tianrui Li, Zhong Yuan, Chuan Luo

Summary: This paper investigates the use of soft labels for label enhancement in multilabel feature selection. Constructing a robust fuzzy neighborhood and applying a label enhancement strategy improves the accuracy of feature selection on multilabel data. The results show that the method achieves good classification performance and robustness to noise.

FUZZY SETS AND SYSTEMS (2023)