Article; Proceedings Paper

Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations

Journal

BIOINFORMATICS
Volume 34, Issue 13, Pages 52-60

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bty259

Funding

  1. King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) [FCC/1/1976-04, FCC/1/1976-06, URF/1/3450-01, URF/1/3454-01]

Motivation: Biological knowledge is widely represented in the form of ontology-based annotations: ontologies describe the phenomena assumed to exist within a domain, and the annotations associate a (kind of) biological entity with a set of phenomena within the domain. The structure and information contained in ontologies and their annotations make them valuable for developing machine learning, data analysis and knowledge extraction algorithms. Notably, semantic similarity is widely used to identify relations between biological entities, and ontology-based annotations are frequently used as features in machine learning applications.

Results: We propose the Onto2Vec method, an approach to learn feature vectors for biological entities based on their annotations to biomedical ontologies. Our method can be applied to a wide range of bioinformatics research problems such as similarity-based prediction of interactions between proteins, classification of interaction types using supervised learning, or clustering. To evaluate Onto2Vec, we use the Gene Ontology (GO) and jointly produce dense vector representations of proteins, the GO classes to which they are annotated, and the axioms in GO that constrain these classes. First, we demonstrate that Onto2Vec-generated feature vectors can significantly improve prediction of protein-protein interactions in human and yeast. We then illustrate how Onto2Vec representations provide the means for constructing data-driven, trainable semantic similarity measures that can be used to identify particular relations between proteins. Finally, we use an unsupervised clustering approach to identify protein families based on their Enzyme Commission numbers. Our results demonstrate that Onto2Vec can generate high-quality feature vectors from biological entities and ontologies. Onto2Vec has the potential to significantly outperform the state of the art in several predictive applications in which ontologies are involved.
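
To make the idea of similarity-based interaction prediction concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each protein already has a dense feature vector and that cosine similarity is used to compare vectors; the abstract does not name the similarity measure, so that choice is an assumption. The protein names, the vectors, and the helper `rank_partners` are hypothetical and serve only as illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_partners(query, embeddings):
    """Rank every other entity by similarity to `query`, most similar first.

    Hypothetical helper: a high score is treated as evidence for a candidate
    interaction; any threshold would have to be chosen on real data.
    """
    q = embeddings[query]
    scores = [
        (name, cosine_similarity(q, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy vectors; real feature vectors would be learned from
# ontology axioms and annotations and would have many more dimensions.
toy_embeddings = {
    "protein_A": [1.0, 0.0, 0.0],
    "protein_B": [0.9, 0.1, 0.0],
    "protein_C": [0.0, 1.0, 0.0],
}

ranking = rank_partners("protein_A", toy_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setting `protein_B` ranks above `protein_C` as a candidate partner of `protein_A`, because its vector points in nearly the same direction. A trainable similarity measure, as mentioned in the abstract, would replace the fixed cosine function with a model fitted to known interactions.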

Recommended

Article Oncology

Associating transcriptomics data with inflammatory markers to understand tumour microenvironment in hepatocellular carcinoma

Basak Bahcivanci, Roshan Shafiha, Georgios V. Gkoutos, Animesh Acharjee

Summary: Liver cancer is the fourth leading cause of cancer-related death globally, with hepatocellular carcinoma (HCC) accounting for the majority of cases. However, current immunotherapy approaches are only partially effective due to the immunosuppressive nature of the tumor microenvironment (TME). This study aims to understand the TME in HCC and discover new immune markers for overcoming immunotherapy resistance.

CANCER MEDICINE (2023)

Article Radiology, Nuclear Medicine & Medical Imaging

Automated Multimodal Machine Learning for Esophageal Variceal Bleeding Prediction Based on Endoscopy and Structured Data

Yu Wang, Yu Hong, Yue Wang, Xin Zhou, Xin Gao, Chenyan Yu, Jiaxi Lin, Lu Liu, Jingwen Gao, Minyue Yin, Guoting Xu, Xiaolin Liu, Jinzhou Zhu

Summary: This study evaluated the feasibility of automated multimodal machine learning in predicting esophageal variceal (EV) bleeding. By integrating endoscopic images and clinical variables, the study developed deep learning models and multimodal machine learning models, and compared them with existing clinical indices. The results showed that the multimodal machine learning models achieved higher accuracy and sensitivity, making them a useful tool for predicting EV bleeding.

JOURNAL OF DIGITAL IMAGING (2023)

Review Surgery

Single-centre review of the management of intra-thoracic oesophageal perforation in a tertiary oesophageal unit: paradigm shift, short- and long-term outcomes over 15 years

Vasileios Charalampakis, Victor Roth Cardoso, Alistair Sharples, Maha Khalid, Luke Dickerson, Tom Wiggins, Georgios Gkoutos, Olga Tucker, Paul Super, Martin Richardson, Rajwinder Nijjar, Rishi Singhal

Summary: Oesophageal perforation is a rare and serious condition, and early diagnosis and treatment are crucial for patient survival. In recent years, there has been a significant shift in the management of iatrogenic perforations, with a more liberal use of CT for early diagnosis and a higher rate of oesophageal stenting as the primary treatment option.

SURGICAL ENDOSCOPY AND OTHER INTERVENTIONAL TECHNIQUES (2023)

Article Biochemical Research Methods

scBKAP: A Clustering Model for Single-Cell RNA-Seq Data Based on Bisecting K-Means

Xiaolin Wang, Hongli Gao, Ren Qi, Ruiqing Zheng, Xin Gao, Bin Yu

Summary: This study proposes a novel clustering method called scBKAP, which addresses the issues of high dropout rate and curse of dimensionality in scRNA-seq data by utilizing an autoencoder network and a dimensionality reduction model MPDR. Comprehensive experiments on 21 public scRNA-seq datasets and simulated datasets demonstrate the superior performance of scBKAP over nine state-of-the-art single-cell clustering methods.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2023)

Article Multidisciplinary Sciences

A comprehensive benchmarking with practical guidelines for cellular deconvolution of spatial transcriptomics

Haoyang Li, Juexiao Zhou, Zhongxiao Li, Siyuan Chen, Xingyu Liao, Bin Zhang, Ruochi Zhang, Yu Wang, Shiwei Sun, Xin Gao

Summary: Spatial transcriptomics technologies are utilized to analyze transcriptomes while preserving spatial information, providing high-resolution characterization of transcriptional patterns and tissue architecture reconstruction. Cellular heterogeneity plays a crucial role in deciphering spatial patterns of cell types, and various related methods have been proposed. In this study, we benchmarked 18 existing methods for cellular deconvolution using 50 real-world and simulated datasets, evaluating their accuracy, robustness, and usability. CARD, Cell2location, and Tangram showed the best performance for the cellular deconvolution task. Additionally, we provide decision-tree-style guidelines and recommendations for method selection and additional features, helping users choose the optimal method for their specific needs.

NATURE COMMUNICATIONS (2023)

Article Multidisciplinary Sciences

Exploiting machine learning models to identify novel Alzheimer's disease biomarkers and potential targets

Hind Alamro, Maha A. Thafar, Somayah Albaradei, Takashi Gojobori, Magbubah Essack, Xin Gao

Summary: Despite being the most common cause of dementia and impaired cognitive function, an effective treatment for Alzheimer's disease (AD) remains elusive. This study developed a computational method that combines multiple hub gene ranking methods, feature selection methods, and machine learning to identify biomarkers and targets for AD. The results showed that feature selection methods outperformed hub gene sets in prediction performance, and a small number of genes were able to accurately distinguish AD samples from healthy controls.

SCIENTIFIC REPORTS (2023)

Article Computer Science, Artificial Intelligence

TriNet: A tri-fusion neural network for the prediction of anticancer and antimicrobial peptides

Wanyun Zhou, Yufei Liu, Yingxin Li, Siqi Kong, Weilin Wang, Boyun Ding, Jiyun Han, Chaozhou Mou, Xin Gao, Juntao Liu

Summary: The paper introduces TriNet, a tri-fusion neural network for accurate prediction of anticancer peptides and antimicrobial peptides. TriNet utilizes three types of features and training modules to improve predictions. Experimental results demonstrate the superiority of TriNet compared to other methods.

PATTERNS (2023)

Article Biology

Identification of cell subpopulations associated with disease phenotypes from scRNA-seq data using PACSI

Chonghui Liu, Yan Zhang, Xin Gao, Guohua Wang

Summary: PACSI is an efficient method for identifying cell subpopulations associated with disease phenotypes.

BMC BIOLOGY (2023)

Article Biochemistry & Molecular Biology

Influence of autozygosity on common disease risk across the phenotypic spectrum

Daniel S. Malawsky, Eva van Walree, Benjamin M. Jacobs, Teng Hiang Heng, Qin Qin Huang, Ataf H. Sabir, Saadia Rahman, Saghira Malik Sharif, Ahsan Khan, Masa Umicevic Mirkov, Hiroyuki Kuwahara, Xin Gao, Fowzan S. Alkuraya, Danielle Posthuma, William G. Newman, Christopher J. Griffiths, Rohini Mathur, David A. van Heel, Sarah Finer, Jared O'Connell, Hilary C. Martin

Summary: This study investigated the association between autozygosity and common diseases, and discovered an effective method to reduce confounding. The results suggest that autozygosity has a significant impact on common diseases, especially type 2 diabetes among British Pakistanis.

Article Computer Science, Artificial Intelligence

Spectrum-irrelevant fine-grained representation for visible-infrared person re-identification

Jiahao Gong, Sanyuan Zhao, Kin-Man Lam, Xin Gao, Jianbing Shen

Summary: Visible-infrared person re-identification (VI-ReID) is a challenging task for full-time intelligent surveillance systems due to the large cross-modal discrepancy. Existing methods suffer from heterogeneous structures and different spectra. To address this issue, we propose the Spectrum-Insensitive Data Augmentation (SIDA) strategy, which alleviates disturbance in visible and infrared spectra and forces the network to learn spectrum-irrelevant features. Our method achieves state-of-the-art performance on two visible-infrared cross-modal Re-ID datasets.

COMPUTER VISION AND IMAGE UNDERSTANDING (2023)

Article Multidisciplinary Sciences

A unified method to revoke the private data of patients in intelligent healthcare with audit to forget

Juexiao Zhou, Haoyang Li, Xingyu Liao, Bin Zhang, Wenjia He, Zhongxiao Li, Longxi Zhou, Xin Gao

Summary: Revoking personal private data is a basic human right, and the authors propose a solution called AFS to audit and revoke patients' private data from pre-trained deep learning models, enhancing privacy protection and data revocation rights in real-world intelligent healthcare.

NATURE COMMUNICATIONS (2023)

Article Multidisciplinary Sciences

Clinical utility of polygenic scores for cardiometabolic disease in Arabs

Injeong Shim, Hiroyuki Kuwahara, NingNing Chen, Mais O. Hashem, Lama AlAbdi, Mohamed Abouelhoda, Hong-Hee Won, Pradeep Natarajan, Patrick T. Ellinor, Amit V. Khera, Xin Gao, Fowzan S. Alkuraya, Akl C. Fahed

Summary: Polygenic risk prediction can effectively predict the risk of cardiometabolic diseases in the Arab population, with performance comparable to that in European-ancestry individuals. The polygenic scores are associated with disease severity and are independent of conventional risk factors.

NATURE COMMUNICATIONS (2023)

Review Medicine, General & Internal

Tafamidis treatment in patients with transthyretin amyloid cardiomyopathy: a systematic review and meta-analysis

Jie Wang, Hongyu Chen, Zihuan Tang, Jinquan Zhang, Yuanwei Xu, Ke Wan, Kifah Hussain, Georgios Gkoutos, Yuchi Han, Yucheng Chen

Summary: This study systematically assessed the association of tafamidis treatment with outcomes in patients with transthyretin amyloid cardiomyopathy (ATTR-CM). The results showed that tafamidis treatment had a positive impact on the outcomes of patients with ATTR-CM, reducing the risk of adverse cardiovascular events and all-cause death.

ECLINICALMEDICINE (2023)

Article Multidisciplinary Sciences

An angiopoietin 2, FGF23, and BMP10 biomarker signature differentiates atrial fibrillation from other concomitant cardiovascular conditions

Winnie Chua, Victor R. Cardoso, Eduard Guasch, Moritz F. Sinner, Christoph Al-Taie, Paul Brady, Barbara Casadei, Harry J. G. M. Crijns, Elton A. M. P. Dudink, Stephane N. Hatem, Stefan Kaeaeb, Peter Kastner, Lluis Mont, Frantisek Nehaj, Yanish Purmah, Jasmeet S. Reyat, Ulrich Schotten, Laura C. Sommerfeld, Stef Zeemering, Andre Ziegler, Georgios V. Gkoutos, Paulus Kirchhof, Larissa Fabritz

Summary: Early detection of atrial fibrillation through the measurement of circulating biomarkers can reduce the risk of stroke, cardiovascular death, and heart failure.

SCIENTIFIC REPORTS (2023)

Article Physics, Multidisciplinary

Harmonising knowledge for safer materials via the NanoCommons Knowledge Base

Dieter Maier, Thomas E. Exner, Anastasios G. Papadiamantis, Ammar Ammar, Andreas Tsoumanis, Philip Doganis, Ian Rouse, Luke T. Slater, Georgios V. Gkoutos, Nina Jeliazkova, Hilmar Ilgenfritz, Martin Ziegler, Beatrix Gerhard, Sebastian Kopetsky, Deven Joshi, Lee Walker, Claus Svendsen, Haralambos Sarimveis, Vladimir Lobaskin, Martin Himly, Jeaphianne van Rijn, Laurent Winckers, Javier Millan Acosta, Egon Willighagen, Georgia Melagraki, Antreas Afantitis, Iseult Lynch

Summary: This paper introduces the importance and objectives of the NanoCommons project, and summarizes its infrastructure - the NanoCommons Knowledge Base, describing its features and functions. By connecting nanosafety data sources and tools, this knowledge base provides users with a user-friendly interface and API to access state-of-the-art tools for nanomaterial safety prediction, design, and risk assessment. The article also presents the relationship between the knowledge base and other initiatives and projects, as well as its application in the FAIRification of experimental workflows.

FRONTIERS IN PHYSICS (2023)
