Article
Computer Science, Theory & Methods
Tengyu Yin, Hongmei Chen, Tianrui Li, Zhong Yuan, Chuan Luo
Summary: This paper investigates the use of soft labels for label enhancement in multilabel feature selection. By constructing a robust fuzzy neighborhood and utilizing a label enhancement strategy, the accuracy of feature selection in multilabel data can be improved. The results demonstrate that the method achieves good classification performance and anti-noise ability.
FUZZY SETS AND SYSTEMS
(2023)
Article
Computer Science, Artificial Intelligence
Shuang An, Qinghua Hu, Changzhong Wang
Summary: The PGDFRS model reduces the impact of noise on statistical minimum and maximum values by introducing the concept of probability granular distance, and it also creates a feature selection algorithm limited to two-dimensional space, avoiding difficulties in parameter setting in high-dimensional space.
APPLIED SOFT COMPUTING
(2021)
Article
Computer Science, Artificial Intelligence
Pei Huang, Zhaoming Kong, Mengying Xie, Xiaowei Yang
Summary: This paper proposes a robust unsupervised feature selection method that can effectively deal with the influence of many outliers on model performance. The method learns a robust subspace that preserves local structure and addresses the shortcomings of traditional methods through outlier removal and Euclidean distance threshold setting. Experiments demonstrate the superiority of the proposed method.
PATTERN RECOGNITION
(2023)
Article
Nanoscience & Nanotechnology
Zihao Wang, Arvin Bilegsaikhan, Ronald T. Jerozal, Tristan A. Pitt, Phillip J. Milner
Summary: Metal-organic frameworks (MOFs) are being increasingly used in synthetic chemistry as sustainable reagents and catalysts. This study systematically characterized the robustness of different MOFs towards various conditions representing synthetic organic chemistry. It found that azolate MOFs generally possess excellent chemical stabilities, while carboxylate and salicylate frameworks have complementary stabilities toward different reagents. These findings can guide the rational design of robust frameworks for synthetic chemistry applications and the development of new strategies for MOF modification.
ACS APPLIED MATERIALS & INTERFACES
(2021)
Article
Environmental Sciences
Zhen Shen, Jing Miao, Junjie Wang, Demei Zhao, Aowei Tang, Jianing Zhen
Summary: Mangrove forests are highly productive ecosystems with important ecological and economic value. Accurate mapping of mangrove forests is crucial for their management and restoration. This study utilizes multi-source remote sensing data to compare different feature selection methods and machine learning algorithms for accurate mangrove mapping. The results show that optical data performs better than SAR data, and the combination of optical and SAR data can further improve mapping accuracy. The XGBoost classification model achieves the highest overall accuracy. This research provides important insights and a reliable database for the restoration and protection of mangrove forests.
Article
Chemistry, Medicinal
Sherwin S. S. Ng, Yunpeng Lu
Summary: Oral bioavailability is an important pharmacokinetic property in drug discovery. Computational models using molecular descriptors, fingerprints, and machine learning have been developed, but determining the right molecular descriptors requires domain expert knowledge and time for feature selection. Graph neural networks (GNNs) can automatically extract important features, and in this study, we used the automatic feature selection of GNNs to predict oral bioavailability. By utilizing transfer learning and pre-training a model to predict solubility, we achieved improved prediction performance with an average accuracy of 0.797, F1 score of 0.840, and AUC-ROC of 0.867, outperforming previous studies on predicting oral bioavailability with the same test data set.
JOURNAL OF CHEMICAL INFORMATION AND MODELING
(2023)
Article
Computer Science, Information Systems
Lin Sun, Mengmeng Li, Weiping Ding, En Zhang, Xiaoxia Mu, Jiucheng Xu
Summary: This paper proposes a novel adaptive fuzzy neighborhood-based feature selection method for imbalanced data with adaptive synthetic over-sampling. It addresses the limitations of manually setting fuzzy neighborhood radius and potential ignorance of boundary regions, and achieves effective classification results.
INFORMATION SCIENCES
(2022)
Article
Computer Science, Artificial Intelligence
Binbin Sang, Hongmei Chen, Lei Yang, Tianrui Li, Weihua Xu
Summary: This study investigates incremental feature selection approaches for dynamic ordered data, proposing a new conditional entropy with robustness as an evaluation metric for features and designing two incremental feature selection algorithms. Experimental results demonstrate the robustness of the proposed metric and the effectiveness and efficiency of the incremental algorithms in updating reducts for dynamic ordered data.
IEEE TRANSACTIONS ON FUZZY SYSTEMS
(2022)
Article
Automation & Control Systems
Puneet Mishra, Kristian Hovde Liland
Summary: A new method using iterative re-weighted partial least squares and covariates selection is presented for feature selective modelling in the presence of outliers. The method iteratively down-weights the outlying samples to minimize their influence on the squared covariance estimation for selecting robust features. It is shown that models based on such features outperform those using equal sample weights in terms of prediction accuracy. The method is tested in different scenarios and its performance is demonstrated on a real spectral data set.
JOURNAL OF CHEMOMETRICS
(2023)
Review
Computer Science, Information Systems
Utkarsh Mahadeo Khaire, R. Dhanalakshmi
Summary: Feature selection is a tool for understanding problems by analyzing relevant features, which can improve classifier performance and reduce computational load. However, the high correlation between features often leads to instability in traditional feature selection algorithms, resulting in reduced confidence in the selected features. Therefore, achieving high stability in feature selection algorithms is crucial.
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
(2022)
Article
Computer Science, Information Systems
Tian Yang, Jie Liang, Yan Pang, Pengyu Xie, Yuhua Qian, Ruili Wang
Summary: The curse of dimensionality is a bottleneck in big data and artificial intelligence. To address this issue, a more efficient approach to graph construction based on a description vector is proposed. The graph-based description vector (GDV) algorithm is developed for fast search and has lower time and space complexities than four existing algorithms, while maintaining the same level of classification accuracy.
INFORMATION SCIENCES
(2023)
Article
Computer Science, Artificial Intelligence
Carlos Eiras-Franco, Bertha Guijarro-Berdinas, Amparo Alonso-Betanzos, Antonio Bahamonde
Summary: The ReliefF-LSH algorithm simplifies the costliest step of the ReliefF algorithm by approximating the nearest neighbor graph using locality-sensitive hashing. It can process large data sets, obtains better results, and is more generally applicable than the original ReliefF.
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
(2021)
Article
Biochemical Research Methods
Fengcheng Li, Ying Zhou, Ying Zhang, Jiayi Yin, Yunqing Qiu, Jianqing Gao, Feng Zhu
Summary: Mass spectrometry-based proteomics is essential in studying biological processes. However, current statistical frameworks neglect the reproducibility among identified features. Thus, developing a tool to identify reproducible and generalizable proteomic signatures is crucial.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Computer Science, Artificial Intelligence
Pei Liang, Dingfei Lei, KwaiSang Chin, Junhua Hu
Summary: The current research on fuzzy rough sets for feature selection faces two major problems: the difficulty in evaluating the importance of feature subsets accurately in high-dimensional data space due to the use of multiple intersection operations of fuzzy relations, and the sensitivity to noisy information in the classical fuzzy rough sets model. To address these issues, this study proposes a radial basis function kernel-based similarity measure and introduces a relative classification uncertainty measure to improve the robustness of the fuzzy rough sets model.
KNOWLEDGE-BASED SYSTEMS
(2022)
Article
Computer Science, Artificial Intelligence
Zhehuang Huang, Jinjin Li
Summary: Feature selection is an important preprocessing method that reduces redundant information to improve classification performance. This study proposes a novel rough set model by integrating covering-based rough fuzzy sets with multi-granulation rough sets. Experimental results show that the proposed model outperforms other algorithms in terms of reduction rate and classification performance.
EXPERT SYSTEMS WITH APPLICATIONS
(2024)
Review
Biochemical Research Methods
Maria Virginia Sabando, Ignacio Ponzoni, Evangelos E. Milios, Axel J. Soto
Summary: With the consolidation of deep learning in drug discovery, several novel algorithms for learning molecular representations have been proposed. However, comparing different molecular embeddings and traditional representations is not straightforward, hindering the process of choosing suitable representations for QSAR modeling. The study conducted experiments comparing different embedding techniques and found that the predictive performance using molecular embeddings did not significantly surpass that of traditional representations.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Materials Science, Multidisciplinary
Santiago A. Schustik, Fiorella Cravero, Ignacio Ponzoni, Monica F. Diaz
Summary: Refractive index is a crucial property for the design of new materials, and machine learning algorithms have been successfully applied in modeling, with the expert-in-the-loop approach showing promise in improving interpretability and generalizability of the models.
COMPUTATIONAL MATERIALS SCIENCE
(2021)
Article
Polymer Science
Santiago A. Schustik, Fiorella Cravero, M. Jimena Martinez, Ignacio Ponzoni, Monica F. Diaz
Summary: The PolyMaS software utilizes SMILES codes to generate linear macromolecules without limiting their length and molar mass, and can adjust the length of the polymer as needed.
Article
Mathematics, Applied
Jessica A. Carballido, Ignacio Ponzoni, Rocio L. Cecchini
Summary: This article presents an evolutionary method called PreCLAS for handling matrices that cannot be analyzed using conventional clustering, regression or classification methods in big data research. The method significantly reduces the number of rows in the matrix and intelligently performs unsupervised row selection, improving the effectiveness of classification and clustering methods.
LOGIC JOURNAL OF THE IGPL
(2023)
Article
Environmental Sciences
Yamila S. Grassi, Nelida B. Brignole, Monica F. Diaz
Summary: The paper provides a comprehensive analysis of vehicular fleet and mobile source emissions in Bahia Blanca, Argentina in 2018. Motorcycles were identified as the main source of CO, NMVOC, CO2 and CH4, while light commercial vehicles emitted the most NOx. Despite the growth of the vehicular fleet, emissions in 2018 were lower than in 2013, attributed to the incorporation of more efficient emission control technology. However, this improvement resulted in increased GHG emissions, presenting a continued challenge in the area.
SCIENCE OF THE TOTAL ENVIRONMENT
(2021)
Article
Chemistry, Physical
Fiorella Cravero, Monica F. Diaz, Ignacio Ponzoni
Summary: This paper introduces an artificial intelligence-based method for predicting mechanical properties obtained from the tensile test. By using machine learning tools, visual analytics methods, and expert-in-the-loop strategies, a QSPR model composed of five molecular descriptors is proposed, achieving a high correlation coefficient.
JOURNAL OF CHEMICAL PHYSICS
(2022)
Review
Food Science & Technology
Virginia Cardoso Schwindt, Mauricio M. Coletto, Monica F. Diaz, Ignacio Ponzoni
Summary: Food informatics is playing a significant role in improving the quality and efficiency of the food industry, particularly in the sensory analysis of wines. Machine learning models have been developed to predict wine-related characteristics, but accurate and sufficient data is still needed for reliable predictions. The use of quantitative structure-odour relationship (QSOR) models shows promise in quantitatively predicting wine sensory analysis.
FOOD AND BIOPROCESS TECHNOLOGY
(2023)
Article
Chemistry, Medicinal
Maria Jimena Martinez, Maria Virginia Sabando, Axel J. Soto, Carlos Roca, Carlos Requena-Triguero, Nuria E. Campillo, Juan A. Paez, Ignacio Ponzoni
Summary: The Ames mutagenicity test is widely used to estimate the mutagenic potential of drug candidates. However, most existing in silico models for predicting mutagenicity do not consider the test results of individual experiments conducted for each strain. In this study, we propose a novel neural-based QSAR model that leverages experimental results from different strains involved in the Ames test using multitask learning. Our model outperforms single-task modeling strategies and ensemble models built from individual strains.
JOURNAL OF CHEMICAL INFORMATION AND MODELING
(2022)
Article
Polymer Science
Maria Jimena Martinez, Roi Naveiro, Axel J. Soto, Pablo Talavante, Shin-Ho Kim Lee, Ramon Gomez Arrayas, Mario Franco, Pablo Mauleon, Hector Lozano Ordonez, Guillermo Revilla Lopez, Marco Bernabei, Nuria E. Campillo, Ignacio Ponzoni
Summary: Artificial intelligence (AI) is revolutionizing the discovery of new materials, particularly in the field of virtual screening of chemical libraries. This study developed computational models that can predict the dispersancy efficiency of oil and lubricant additives, a critical property in their design. The proposed models combined machine learning techniques with visual analytics strategies in an interactive tool, aiding domain experts in decision-making. The best-performing model achieved a mean absolute error of 5.50±0.34 and a root mean square error of 7.56±0.47, demonstrating its effectiveness in predicting dispersancy efficiency.
Article
Genetics & Heredity
Ivan Petrini, Rocio L. Cecchini, Marilina Mascaro, Ignacio Ponzoni, Jessica A. Carballido
Summary: The likelihood of being diagnosed with thyroid cancer has increased in recent years. The aim of this study is to identify potential genes relevant to Papillary Thyroid Carcinoma (PTC) through bioinformatic analysis. Four genes, PTGFR, ZMAT3, GABRB2, and DPP6, were found to be highly relevant and worthy of further investigation.
Article
Chemistry, Multidisciplinary
Ignacio Ponzoni, Juan Antonio Paez Prosper, Nuria E. Campillo
Summary: Artificial intelligence (AI) is increasingly impacting drug discovery. However, in order to be accepted by the medicinal chemistry community, it is important for AI models to be able to explain their predictions in a trustworthy manner. Therefore, research and development of explainable artificial intelligence (XAI) methods have become crucial. This article provides a comprehensive literature review on explanation methodologies for AI models in the field of drug discovery, including a new taxonomy of XAI methods, and introduces visualization strategies for XAI in the chemical domain.
WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE
(2023)
Proceedings Paper
Computer Science, Interdisciplinary Applications
Ivan Petrini, Rocio L. Cecchini, Marilina Mascaro, Ignacio Ponzoni, Jessica A. Carballido
Summary: This article presents a comprehensive and comparative analysis of thyroid cancer datasets, including stages for feature selection, hypothesis testing, and classification. The results suggest that some genes, especially the HINT3 gene, are worth further investigation.
BIOINFORMATICS AND BIOMEDICAL ENGINEERING, PT II
(2022)