4.7 Article Proceedings Paper

Comparison of vocabularies, representations and ranking algorithms for gene prioritization by text mining

Motivation: Computational gene prioritization methods help identify susceptibility genes potentially involved in genetic disease. Recently, text mining techniques have been applied to extract prior knowledge from text-based genomic information sources, and this knowledge can be used to improve the prioritization process. However, the effect of different vocabularies, representations and ranking algorithms on text mining for gene prioritization has not yet been studied systematically and comparatively. This article therefore presents a benchmark study of the vocabularies, representations and ranking algorithms used in gene prioritization by text mining.

Results: We investigated 5 different domain vocabularies, 2 text representation schemes and 27 linear ranking algorithms for disease gene prioritization by text mining. We indexed 288 177 MEDLINE titles and abstracts with the TXTGate text profiling system and adapted the benchmark dataset of the Endeavour gene prioritization system, which consists of 618 disease-causing genes. Textual gene profiles were created, and their prioritization performance was evaluated and compared. The results show that the inverse document frequency (IDF) representation of gene term vectors performs better than the term frequency-inverse document frequency (TF-IDF) representation. The eVOC and MeSH domain vocabularies perform better than Gene Ontology, Online Mendelian Inheritance in Man and the London Dysmorphology Database. The ranking algorithms based on 1-SVM, Standard Correlation and the Ward linkage method provide the best performance.
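The difference between the two representation schemes compared in the abstract can be illustrated with a minimal sketch. It is not the TXTGate or Endeavour implementation. The toy corpus, the gene names, the four-term vocabulary and the choice to average document vectors into a gene profile are all assumptions made for illustration. Candidates are ranked by Pearson correlation with the mean profile of the training genes, a simplified stand-in for a correlation-based ranking.

```python
import math
from collections import Counter

# Hypothetical toy data: a controlled vocabulary and tokenized abstracts per gene.
VOCAB = ["muscle", "heart", "retina", "kidney"]
CORPUS = {
    "GENE_A": [["muscle", "muscle", "heart"], ["muscle", "protein"]],
    "GENE_B": [["retina", "retina", "eye"], ["retina", "kidney"]],
    "GENE_C": [["muscle", "heart", "heart"]],
    "GENE_D": [["kidney", "retina"]],
}


def idf_weights(documents, vocabulary):
    """Inverse document frequency for each vocabulary term that occurs."""
    n_docs = len(documents)
    doc_freq = Counter(t for doc in documents for t in set(doc))
    return {t: math.log(n_docs / doc_freq[t]) for t in vocabulary if doc_freq[t]}


def gene_profile(gene_docs, idf, vocabulary, use_tf):
    """Average the term vectors of a gene's documents into one profile."""
    profile = [0.0] * len(vocabulary)
    for doc in gene_docs:
        counts = Counter(doc)
        for i, term in enumerate(vocabulary):
            if term in counts and term in idf:
                tf = counts[term] if use_tf else 1  # the IDF scheme ignores counts
                profile[i] += tf * idf[term] / len(gene_docs)
    return profile


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if undefined)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    norm = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return cov / norm if norm else 0.0


def rank_candidates(training_profiles, candidate_profiles):
    """Rank candidates by correlation with the mean training profile."""
    dim = len(training_profiles[0])
    centroid = [sum(p[i] for p in training_profiles) / len(training_profiles)
                for i in range(dim)]
    scores = {g: pearson(p, centroid) for g, p in candidate_profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)


all_docs = [doc for docs in CORPUS.values() for doc in docs]
idf = idf_weights(all_docs, VOCAB)
rankings = {}
for name, use_tf in (("IDF", False), ("TF-IDF", True)):
    profiles = {g: gene_profile(d, idf, VOCAB, use_tf) for g, d in CORPUS.items()}
    candidates = {g: p for g, p in profiles.items() if g != "GENE_A"}
    rankings[name] = rank_candidates([profiles["GENE_A"]], candidates)
    print(name, rankings[name])
```

Here GENE_A plays the role of a known disease gene and the other three genes are candidates. On such a tiny corpus both schemes produce the same top candidate, so the sketch shows only how the profiles are built, not the performance difference reported in the study.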

Recommended

Correction Computer Science, Interdisciplinary Applications

Longitudinal machine learning modeling of MS patient trajectories improves predictions of disability progression (vol 208, 106180, 2021)

Edward De Brouwer, Thijs Becker, Yves Moreau, Eva Kubala Havrdova, Maria Trojano, Sara Eichau, Serkan Ozakbas, Marco Onofrj, Pierre Grammond, Jens Kuhle, Ludwig Kappos, Patrizia Sola, Elisabetta Cartechini, Jeannette Lechner-Scott, Raed Alroughani, Oliver Gerlach, Tomas Kalincik, Franco Granella, Francois Grand'Maison, Roberto Bergamaschi, Maria Jose Sa, Bart Van Wijmeersch, Aysun Soysal, Jose Luis Sanchez-Menoyo, Claudio Solaro, Cavit Boz, Gerardo Iuliano, Katherine Buzzard, Eduardo Aguera-Morales, Murat Terzi, Tamara Castillo Triviño, Daniele Spitaleri, Vincent Van Pesch, Vahid Shaygannejad, Fraser Moore, Celia Oreja-Guevara, Davide Maimone, Riadh Gouider, Tunde Csepany, Cristina Ramo-Tello, Liesbet Peeters

COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE (2022)

Editorial Material Biochemistry & Molecular Biology

The use of polygenic risk scores in pre-implantation genetic testing: an unproven, unethical practice

Francesca Forzano, Olga Antonova, Angus Clarke, Guido de Wert, Sabine Hentze, Yalda Jamshidi, Yves Moreau, Markus Perola, Inga Prokopenko, Andrew Read, Alexandre Reymond, Vigdis Stefansdottir, Carla van El, Maurizio Genuardi

Summary: Although polygenic risk score analyses on embryos (PGT-P) are being marketed to parents using in vitro fertilisation as a tool for selecting embryos with lower disease risk, the utility of PRS in this context is limited and lacks clinical research support. Patients need to be informed about the limitations of using PRSs in this way, and a societal debate about the selection of individual traits should take place before further implementation of this technique in this population.

EUROPEAN JOURNAL OF HUMAN GENETICS (2022)

Article Biochemistry & Molecular Biology

From genotype to phenotype in Arabidopsis thaliana: in-silico genome interpretation predicts 288 phenotypes from sequencing data

Daniele Raimondi, Massimiliano Corso, Piero Fariselli, Yves Moreau

Summary: This paper proposes Galiana, a novel genome interpretation paradigm that directly models the genotype-to-phenotype relationship. The model is trained on whole-genome sequencing data to predict Arabidopsis thaliana phenotypes, particularly those related to flowering traits. Galiana achieves better performance and larger phenotype coverage than other models, and it is fully interpretable using gradient-based saliency maps. Additionally, 36 novel genes associated with flowering traits are identified.

NUCLEIC ACIDS RESEARCH (2022)

Article Multidisciplinary Sciences

PyUUL provides an interface between biological structures and deep learning algorithms

Gabriele Orlando, Daniele Raimondi, Ramon Duran-Romana, Yves Moreau, Joost Schymkowitz, Frederic Rousseau

Summary: Structural bioinformatics lacks interfaces connecting with machine learning methods, hindering the application of modern neural network architectures. PyUUL is introduced as a library that translates biological structures into 3D tensors, enabling the application of state-of-the-art deep learning algorithms. The library also supports GPU and sparse calculation, and can be used to address typical bioinformatics problems.

NATURE COMMUNICATIONS (2022)

Article Biochemistry & Molecular Biology

Single-cell genome-wide concurrent haplotyping and copy-number profiling through genotyping-by-sequencing

Heleen Masset, Jia Ding, Eftychia Dimitriadou, Sophie Debrock, Olga Tsuiko, Katrien Smits, Karen Peeraer, Thierry Voet, Masoud Zamani Esteki, Joris R. Vermeesch

Summary: Single-cell whole-genome haplotyping allows simultaneous detection of haplotypes associated with monogenic diseases and chromosome copy-number profiling, and it revealed mosaicism in embryos and embryonic stem cells. This sequencing-based method could replace traditional genetic testing methods and has the potential to become more accessible and cost-effective.

NUCLEIC ACIDS RESEARCH (2022)

Article Clinical Neurology

Updated Results of the COVID-19 in MS Global Data Sharing Initiative: Anti-CD20 and Other Risk Factors Associated With COVID-19 Severity

Steve Simpson-Yap, Ashkan Pirmani, Tomas Kalincik, Edward De Brouwer, Lotte Geys, Tina Parciak, Anne Helme, Nick Rijke, Jan A. Hillert, Yves Moreau, Gilles Edan, Sifat Sharmin, Tim Spelman, Robert McBurney, Hollie Schmidt, Arnfin B. Bergmann, Stefan Braune, Alexander Stahmann, Rod M. Middleton, Amber Salter, Bruce Bebo, Anneke van der Walt, Helmut Butzkueven, Serkan Ozakbas, Cavit Boz, Rana Karabudak, Raed Alroughani, Juan Rojas, Ingrid A. van der Mei, Guilherme Sciascia do Olival, Melinda Magyari, Ricardo N. Alonso, Richard S. Nicholas, Anibal S. Chertcoff, Ana Zabalza de Torres, Georgina Arrambide, Nupur Nag, Annabel Descamps, Lars Costers, Ruth Dobson, Aleisha Miller, Paulo Rodrigues, Vesna Prckovska, Giancarlo Comi, Liesbet M. Peeters

Summary: This study found that male sex, older age, progressive MS, and higher disability are associated with more severe COVID-19. The use of anti-CD20 medications is also linked to increased severity of COVID-19.

NEUROLOGY-NEUROIMMUNOLOGY & NEUROINFLAMMATION (2022)

Letter Biochemistry & Molecular Biology

Reply to Letter by Tellier et al., 'Scientific refutation of ESHG statement on embryo selection'

Francesca Forzano, Olga Antonova, Angus Clarke, Guido de Wert, Sabine Hentze, Yalda Jamshidi, Yves Moreau, Markus Perola, Inga Prokopenko, Andrew Read, Alexandre Reymond, Vigdis Stefansdottir, Carla van El, Maurizio Genuardi, European Soc Human Genetics

EUROPEAN JOURNAL OF HUMAN GENETICS (2023)

Article Pharmacology & Pharmacy

Natural Lipid Extracts as an Artificial Membrane for Drug Permeability Assay: In Vitro and In Silico Characterization

Anna Vincze, Gergely Dekany, Richard Bicsak, Andras Formanek, Yves Moreau, Gabor Koplanyi, Gergely Takacs, Gabor Katona, Diana Balogh-Weiser, Adam Arany, Gyorgy T. Balogh

Summary: The permeability of total and polar fractions of bovine heart and liver lipid extracts in the PAMPA model and their relationship with physicochemical descriptors of drug molecules were investigated. The results showed that there were subtle differences between total and polar lipids, while liver lipids had significantly different permeability compared to heart or brain lipid-based models. The study also found correlations between in silico descriptors of drug molecules and permeability values, providing insights into tissue-specific permeability.

PHARMACEUTICS (2023)

Article Biochemical Research Methods

Nonlinear data fusion over Entity-Relation graphs for Drug-Target Interaction prediction

Eugenio Mazzone, Yves Moreau, Piero Fariselli, Daniele Raimondi

Summary: In this study, we propose a new approach based on data fusion for reliable Drug-Target Interactions (DTIs) prediction. By extending the Matrix Factorization paradigm to the nonlinear inference over Entity-Relation graphs using the NXTfusion library, our models outperform most existing methods and have the flexibility to predict both binary DTIs and regression of real-valued drug-target affinity. Our findings suggest that DTI methods should be validated in settings that mimic real-life situations where predictions for previously unseen drugs, proteins, and drug-protein pairs are needed. Integration of heterogeneous information with our Entity-Relation data fusion approach is most beneficial in such contexts.

BIOINFORMATICS (2023)

Article Medical Informatics

The Journey of Data Within a Global Data Sharing Initiative: A Federated 3-Layer Data Analysis Pipeline to Scale Up Multiple Sclerosis Research

Ashkan Pirmani, Edward De Brouwer, Lotte Geys, Tina Parciak, Yves Moreau, Liesbet M. Peeters

Summary: This study presents a comprehensive data analysis pipeline driven by multiple stakeholders, which accommodates three prevalent data-sharing streams and has been successfully implemented in the global data sharing initiatives for multiple sclerosis and COVID-19. The pipeline facilitates data gathering from various sources and integrates them into a unified dataset for subsequent statistical analysis and secure data examination.

JMIR MEDICAL INFORMATICS (2023)

Article Mathematics, Applied

Smoothing unadjusted Langevin algorithms for nonsmooth composite potential functions

Susan Ghaderi, Masoud Ahookhosh, Adam Arany, Alexander Skupin, Panagiotis Patrinos, Yves Moreau

Summary: This paper proposes a gradient-based Markov Chain Monte Carlo (MCMC) method for sampling from the posterior distribution of problems with nonsmooth potential functions. By using smoothing techniques, the original potential function is approximated by a smooth function with the same critical points, leading to a smoothing ULA method called SULA. Non-asymptotic convergence results of SULA are established under mild assumptions on the original potential function. Numerical results demonstrate the promising performance of SULA on both synthetic and real chemoinformatics data.

APPLIED MATHEMATICS AND COMPUTATION (2024)
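The smoothing idea in the SULA summary above can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes a one-dimensional nonsmooth potential U(x) = |x|, whose target is the Laplace distribution, and it smooths U with a Moreau envelope (the Huber function), which keeps the same minimizer. The step size, the smoothing parameter and the function names are illustrative choices.

```python
import math
import random


def huber_grad(x, mu):
    """Gradient of the Moreau envelope of |x| with smoothing parameter mu."""
    return max(-1.0, min(1.0, x / mu))


def smoothed_ula(n_steps, step, mu, seed=0):
    """Unadjusted Langevin iteration on the smoothed potential.

    Update rule: x <- x - step * grad(x) + sqrt(2 * step) * standard normal noise.
    """
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x - step * huber_grad(x, mu) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples


samples = smoothed_ula(n_steps=200_000, step=0.05, mu=0.1)
kept = samples[1000:]  # discard a short burn-in
mean_abs = sum(abs(s) for s in kept) / len(kept)
print(f"estimated E|x| = {mean_abs:.3f} (the exact Laplace value is 1)")
```

Because the iteration is unadjusted and the potential is smoothed, the samples follow the Laplace target only approximately. Smaller values of step and mu reduce the bias at the cost of slower mixing.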

Meeting Abstract Clinical Neurology

Associations of DMT therapies with COVID-19 severity in multiple sclerosis: an international cohort study

S. Simpson-Yap, E. De Brouwer, T. Kalincik, N. Rijke, J. Hillert, C. Walton, G. Edan, Y. Moreau, T. Spelman, L. Peeters

MULTIPLE SCLEROSIS JOURNAL (2022)

Article Biochemistry & Molecular Biology

HPMPdb: A machine learning-ready database of protein molecular phenotypes associated to human missense variants

Daniele Raimondi, Francesco Codice, Gabriele Orlando, Joost Schymkowitz, Frederic Rousseau, Yves Moreau

Summary: This article presents HPMPdb, a database containing detailed descriptions of human Single Amino acid Variants (SAVs) and their effects on protein molecular phenotypes. The database allows researchers to go beyond the existing Deleterious/Neutral prediction paradigm and build molecular phenotype predictors. Necessary means for training and testing models on the database are provided.

CURRENT RESEARCH IN STRUCTURAL BIOLOGY (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Expressive Graph Informer Networks

Jaak Simm, Adam Arany, Edward De Brouwer, Yves Moreau

Summary: This paper introduces a route-based multi-attention mechanism that incorporates features from routes between node pairs, aiming at addressing the information bottleneck issue in deep learning from molecular graphs. The proposed method, called Graph Informer, is able to attend to nodes several steps away, and it outperforms existing approaches in two prediction tasks. Furthermore, a variant method called injective Graph Informer is developed and proven to be more powerful than the Weisfeiler-Lehman test for graph isomorphism.

MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE (LOD 2021), PT II (2022)
