4.7 Article

Imputing single-cell RNA-seq data by combining graph convolution and autoencoder neural networks

Journal

ISCIENCE
Volume 24, Issue 5, Pages -

Publisher

CELL PRESS
DOI: 10.1016/j.isci.2021.102393

Funding

  1. National Key R&D Program of China [2020YFB0204803]
  2. National Natural Science Foundation of China [61772566]
  3. Guangdong Key Field RD Plan [2019B020228001, 2018B010109006]
  4. Introducing Innovative and Entrepreneurial Teams [2016ZT06D211]
  5. Guangzhou ST Research Plan [202007030010]

Single-cell RNA sequencing profiles single-cell transcriptomes with unprecedented throughput and resolution, but it suffers from the dropout problem. GraphSCI, an imputation method based on graph convolution networks, outperforms other state-of-the-art imputation methods. It also accurately infers gene-to-gene relationships, and these inferred relationships in turn assist imputation during training.
Single-cell RNA sequencing (scRNA-seq) technology enables the profiling of single-cell transcriptomes at unprecedented throughput and resolution. However, in scRNA-seq studies, the low amount of mRNA sequenced in each cell means that a portion of the mRNA molecules goes undetected. This dropout problem hinders various downstream analyses, so robust and effective imputation methods are needed for the growing volume of scRNA-seq data. In this study, we developed an imputation method, GraphSCI, that imputes dropout events in scRNA-seq data using graph convolution networks. Extensive experiments demonstrated that GraphSCI outperforms other state-of-the-art imputation methods on both simulated and real scRNA-seq data. GraphSCI also accurately infers gene-to-gene relationships, and the inferred relationships in turn assist imputation dynamically during training, which is a key advantage of GraphSCI over other imputation algorithms.
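To make the idea of graph convolution over a gene-to-gene graph concrete, the following is a minimal NumPy sketch of a single propagation step. It is not the GraphSCI implementation: the toy expression matrix, the hand-written gene adjacency, the identity weight matrix, and the function names `normalize_adjacency` and `graph_conv` are all assumptions chosen for illustration. It only shows how a zero entry (a possible dropout) can receive signal from a connected gene.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops,
    as commonly used in graph convolution layers."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(features, adj_norm, weight):
    """One graph convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

# Hypothetical toy data: 3 genes (rows) x 4 cells (columns).
# Zeros may be true zeros or dropout events.
expr = np.array([[5.0, 0.0, 4.0, 6.0],
                 [4.0, 3.0, 0.0, 5.0],
                 [0.0, 0.0, 1.0, 0.0]])

# Hypothetical gene-to-gene graph: genes 0 and 1 are linked, gene 2 is isolated.
gene_adj = np.array([[0.0, 1.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0]])

adj_norm = normalize_adjacency(gene_adj)
weight = np.eye(expr.shape[1])  # identity weights stand in for learned parameters
smoothed = graph_conv(expr, adj_norm, weight)
print(smoothed)
```

In this sketch the zero for gene 0 in the second cell is replaced by an average with its neighbor, gene 1, while the isolated gene 2 is left unchanged. A trained model would learn the weights, and according to the abstract GraphSCI also infers the gene-to-gene relationships during training rather than fixing them in advance.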

Recommended

Article Biochemical Research Methods

AlphaFold2-aware protein-DNA binding site prediction using graph transformer

Qianmu Yuan, Sheng Chen, Jiahua Rao, Shuangjia Zheng, Huiying Zhao, Yuedong Yang

Summary: In this study, a precise predictor called GraphSite based on AlphaFold2 is proposed for identifying DNA-binding residues from protein structural models. By employing a graph transformer and leveraging predicted protein structures, GraphSite significantly improves the accuracy of DNA binding site prediction.

BRIEFINGS IN BIOINFORMATICS (2022)

Article Virology

Mendelian randomization suggests a potential causal effect of eosinophil count on influenza vaccination responsiveness

Hongwei Chen, Haoyang Zhang, Simin Wen, Xuehao Xiu, Danming You, Huiying Zhao, Dayan Wang, Yuedong Yang, Yuelong Shu

Summary: Currently, there is a lack of systematic exploration on the clinical factors influencing immune responses to influenza vaccines. The mechanism of low responsiveness to influenza vaccination (LRIV) is complex and not well understood. In this study, we combined our in-house genome-wide association studies (GWAS) analysis of LRIV with the GWAS summary of 10 blood-based biomarkers to investigate the genetics shared between LRIV and blood-based biomarkers using Mendelian randomization (MR). The results suggest a potential causal relationship between genetically instrumented LRIV and decreased eosinophil count.

JOURNAL OF MEDICAL VIROLOGY (2023)

Editorial Material Computer Science, Artificial Intelligence

Integrating supercomputing and artificial intelligence for life science

Jiahua Rao, Shuangjia Zheng, Yuedong Yang

PATTERNS (2022)

Article Biochemical Research Methods

Identifying spatial domain by adapting transcriptomics with histology through contrastive learning

Yuansong Zeng, Rui Yin, Mai Luo, Jianing Chen, Zixiang Pan, Yutong Lu, Weijiang Yu, Yuedong Yang

Summary: Recent advances in spatial transcriptomics have allowed for gene expression measurement at cell/spot resolution, while retaining spatial information and histology images of the tissues. Accurately identifying the spatial domains of spots is crucial for downstream tasks in spatial transcriptomics analysis. In this study, a novel method called ConGI is proposed, which utilizes contrastive learning to accurately exploit spatial domains by combining gene expression with histopathological images. The method outperforms existing methods and the learned representations are useful for various downstream tasks.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biochemical Research Methods

Fast and accurate protein intrinsic disorder prediction by using a pretrained language model

Yidong Song, Qianmu Yuan, Sheng Chen, Ken Chen, Yaoqi Zhou, Yuedong Yang

Summary: Determining intrinsically disordered regions of proteins is crucial for understanding protein biological functions and associated diseases. This study proposes a fast and accurate protein disorder predictor, LMDisorder, which utilizes embedding generated by unsupervised pretrained language models as features. LMDisorder outperforms other single-sequence-based methods and compares favorably to another language-model-based technique in independent test sets. Additionally, LMDisorder shows equivalent or better performance than the state-of-the-art profile-based technique SPOT-Disorder2. The high computation efficiency of LMDisorder allows for proteome-scale analysis, revealing associations between proteins with high predicted disorder content and specific biological functions. The datasets, source codes, and trained model are available at https://github.com/biomed-AI/LMDisorder.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biochemical Research Methods

Fast and accurate protein function prediction from sequence through pretrained language model and homology-based label diffusion

Qianmu Yuan, Junjie Xie, Jiancong Xie, Huiying Zhao, Yuedong Yang

Summary: Protein function prediction is crucial in bioinformatics and has implications for disease mechanism elucidation and drug target discovery. However, accurately predicting protein functions solely from sequences remains challenging. This study introduces SPROF-GO, a sequence-based alignment-free predictor that utilizes a pretrained language model to extract informative sequence embeddings and implements self-attention pooling to focus on important residues. SPROF-GO outperforms state-of-the-art approaches in precision-recall curves and demonstrates generalization capabilities.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biochemical Research Methods

A Drug Combination Prediction Framework Based on Graph Convolutional Network and Heterogeneous Information

Hegang Chen, Yuyin Lu, Yuedong Yang, Yanghui Rao

Summary: Combination therapy plays an important role in treating complex diseases, but the large number of possible combinations limits our ability to identify effective ones. This study introduces a new computational pipeline, DCMGCN, which integrates diverse drug-related information to predict novel drug combinations. The tests show that DCMGCN outperforms existing methods and may help clarify drug mechanisms.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2023)

Article Engineering, Biomedical

ShockSurv: A machine learning model to accurately predict 28-day mortality for septic shock patients in the intensive care unit

Fudan Zheng, Luhao Wang, Yuxian Pang, Zhiguang Chen, Yutong Lu, Yuedong Yang, Jianfeng Wu

Summary: Septic shock has become the leading cause of morbidity and mortality in the ICU. However, currently there is no model to predict the mortality of septic shock patients. We aim to develop such a model.

BIOMEDICAL SIGNAL PROCESSING AND CONTROL (2023)

Article Neurosciences

Inferring the genetic relationship between brain imaging-derived phenotypes and risk of complex diseases by Mendelian randomization and genome-wide colocalization

Siying Lin, Haoyang Zhang, Mengling Qi, David N. Cooper, Yuedong Yang, Yuanhao Yang, Huiying Zhao

Summary: Observational studies consistently show that brain imaging-derived phenotypes (IDPs) are critical markers for the early diagnosis of brain disorders and cardiovascular diseases. However, the shared genetic landscape between brain IDPs and the risk of these diseases remains unclear, limiting the application of potential diagnostic techniques using brain IDPs.

NEUROIMAGE (2023)

Article Biochemistry & Molecular Biology

Characterizing RNA-binding ligands on structures, chemical information, binding affinity and drug-likeness

Cong Fan, Xin Wang, Tianze Ling, Yuedong Yang, Huiying Zhao

Summary: Recent studies suggest that RNAs have potential as drug targets, but progress in detecting RNA-ligand interactions is limited. To guide the discovery of RNA-binding ligands, it is necessary to comprehensively characterize them in terms of binding specificity, binding affinity, and drug-like properties. We established the RNALID database, which contains 358 validated RNA-ligand interactions. Comparisons with other databases show that the majority of ligands in RNALID are novel, and the analysis of ligand structure, binding affinity, and cheminformatic parameters reveals insights into the characteristics of different ligand types. Additionally, comparing RNALID ligands to FDA-approved drugs and ligands without bioactivity sheds light on their differences in chemical properties and drug-likeness.

RNA BIOLOGY (2023)

Article Biochemical Research Methods

EVlncRNA-Dpred: improved prediction of experimentally validated lncRNAs by deep learning

Bailing Zhou, Maolin Ding, Jing Feng, Baohua Ji, Pingping Huang, Junye Zhang, Xue Yu, Zanxia Cao, Yuedong Yang, Yaoqi Zhou, Jihua Wang

Summary: Long non-coding RNAs (lncRNAs) are important in biological processes and disease. Algorithms have been developed to distinguish lncRNAs from mRNAs, resulting in the discovery of over 600,000 lncRNAs. However, only a small fraction of these have been validated through low-throughput experiments. To prioritize potentially functional lncRNAs and overcome the challenge of small datasets, deep learning algorithms were employed in this study.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Computer Science, Artificial Intelligence

Subgraph-Aware Few-Shot Inductive Link Prediction Via Meta-Learning

Shuangjia Zheng, Sijie Mai, Ya Sun, Haifeng Hu, Yuedong Yang

Summary: Link prediction for knowledge graphs aims to predict missing connections between entities. Prevailing methods are limited to a transductive setting and struggle to handle unseen entities. Recently proposed subgraph-based models offer an alternative by predicting links from the subgraph structure surrounding a candidate triplet.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Article Biochemical Research Methods

Identifying B-cell epitopes using AlphaFold2 predicted structures and pretrained language model

Yuansong Zeng, Zhuoyi Wei, Qianmu Yuan, Sheng Chen, Weijiang Yu, Yutong Lu, Jianzhao Gao, Yuedong Yang

Summary: Drawing on the breakthrough of AlphaFold2 in protein structure prediction, we propose a novel graph-based model, GraphBepi, for accurate B-cell epitope prediction. By utilizing the predicted structure from AlphaFold2, GraphBepi constructs the protein graph and captures both sequence and spatial information through edge-enhanced deep graph neural networks (EGNN) and bidirectional long short-term memory neural networks (BiLSTM). The combined representations are input into a multilayer perceptron to predict B-cell epitopes. Comprehensive tests demonstrate that GraphBepi outperforms state-of-the-art methods in terms of AUC and AUPR.

BIOINFORMATICS (2023)

Article Biochemistry & Molecular Biology

Biological informed graph neural network for tumor mutation burden prediction and immunotherapy-related pathway analysis in gastric cancer

Chuwei Liu, Arabella H. Wan, Heng Liang, Lei Sun, Jiarui Li, Ranran Yang, Qinghai Li, Ruibo Wu, Kunhua Hu, Yuedong Yang, Shirong Cai, Guohui Wan, Weiling He

Summary: Tumor mutation burden (TMB) is an important biomarker for assessing the efficacy of cancer immunotherapy, but its correlation with immune checkpoint inhibitors (ICIs) responsiveness varies among different cancer types. This study explores the relationship between TMB and multi-omics data in various cancer types and develops the PGLCN model to improve the interpretability and prediction accuracy of TMB. By integrating multi-omics data, the PGLCN model outperforms traditional machine learning methods in predicting TMB status and identifies potential combined biomarkers for TMB in gastric cancer.

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL (2023)

Article Radiology, Nuclear Medicine & Medical Imaging

Machine learning on MRI radiomic features: identification of molecular subtype alteration in breast cancer after neoadjuvant therapy

Hai-Qing Liu, Si-Ying Lin, Yi-Dong Song, Si-Yao Mai, Yue-Dong Yang, Kai Chen, Zhuo Wu, Hui-Ying Zhao

Summary: This study developed a machine learning model based on MRI to predict molecular subtype alterations in breast cancer after neoadjuvant therapy. The model showed favorable predictive efficacy in identifying molecular subtype alteration and could be a useful tool in clinical practice.

EUROPEAN RADIOLOGY (2023)
