Article
Biochemistry & Molecular Biology
Nan Miles Xi, Jingyi Jessica Li
Summary: In this study, we empirically examined the impact of neural network architecture, activation function, and regularization strategy on imputation accuracy in scRNA-seq data. Our results show that deeper and narrower autoencoders perform better, sigmoid and tanh activation functions outperform ReLU, and regularization improves imputation accuracy and downstream analyses.
COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL
(2023)
Article
Biochemical Research Methods
Yanglan Gan, Xingyu Huang, Guobing Zou, Shuigeng Zhou, Jihong Guan
Summary: Single-cell RNA sequencing is a critical technique for studying cell heterogeneity and diversity. However, clustering analysis of scRNA-seq data is challenging due to noise, high dimensionality, and dropout events. In this study, a new deep structural clustering method called scDSC is proposed, which incorporates structural information to improve clustering accuracy and scalability.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Multidisciplinary Sciences
Juexin Wang, Anjun Ma, Yuzhou Chang, Jianting Gong, Yuexu Jiang, Ren Qi, Cankun Wang, Hongjun Fu, Qin Ma, Dong Xu
Summary: Single-cell RNA-seq analysis faces challenges such as sequencing sparsity and complex gene expression patterns. This study introduces a graph neural network within a hypothesis-free deep learning framework that provides an effective representation of gene expression and cell-cell relationships.
NATURE COMMUNICATIONS
(2021)
Article
Genetics & Heredity
Xiang Feng, Fang Fang, Haixia Long, Rao Zeng, Yuhua Yao
Summary: The importance of gene imputation and cell clustering analysis of single-cell RNA sequencing (scRNA-seq) data has increased with the development of high-throughput sequencing technology. The scGAEGAT model, based on graph neural networks, demonstrated promising performance in gene imputation and cell clustering prediction on four scRNA-seq data sets.
FRONTIERS IN GENETICS
(2022)
Article
Biochemical Research Methods
Siqi Chen, Xuhua Yan, Ruiqing Zheng, Min Li
Summary: Single-cell RNA sequencing (scRNA-seq) has the drawback of high sparsity, which leads to dropout events and affects downstream analyses. To address this, we propose Bubble, which identifies and imputes dropout events using expression rate and coefficient of variation, and leverages bulk RNA-seq data as a constraint. Bubble improves the recovery of missing values and correlations and reduces false positive signals. It enhances differential expression analysis, clustering, and visualization, and aids cellular trajectory inference. Moreover, Bubble provides fast and scalable imputation with minimal memory usage.
BRIEFINGS IN BIOINFORMATICS
(2023)
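The Bubble summary identifies dropout events from two per-gene statistics: expression rate and coefficient of variation. The sketch below shows how those two quantities can be computed from a genes-by-cells count matrix. It is not Bubble's implementation. The flagging rule in `flag_candidate_dropouts` and its thresholds (`min_rate`, `max_cv`) are hypothetical assumptions for illustration: zeros in genes that are widely detected and have low variability are treated as candidate dropouts.

```python
from statistics import mean, pstdev

def gene_stats(counts):
    """Return (expression_rate, cv) for each gene.

    counts: list of genes, each a list of per-cell counts (genes x cells).
    """
    stats = []
    for row in counts:
        # Expression rate: fraction of cells in which the gene is detected.
        rate = sum(1 for x in row if x > 0) / len(row)
        mu = mean(row)
        # Coefficient of variation: population std / mean (NaN for all-zero genes).
        cv = pstdev(row) / mu if mu > 0 else float("nan")
        stats.append((rate, cv))
    return stats

def flag_candidate_dropouts(counts, min_rate=0.5, max_cv=1.0):
    """HYPOTHETICAL rule (not from the paper): zeros in widely expressed,
    low-variability genes are flagged as candidate dropouts.
    Returns a list of (gene_index, cell_index) pairs."""
    flagged = []
    for g, (row, (rate, cv)) in enumerate(zip(counts, gene_stats(counts))):
        if rate >= min_rate and cv <= max_cv:
            flagged.extend((g, c) for c, x in enumerate(row) if x == 0)
    return flagged
```

For example, for the matrix `[[5, 4, 0, 6], [0, 0, 0, 1], [3, 3, 3, 3]]` only the zero in gene 0 is flagged. Gene 1 is too rarely detected to pass the rate threshold, and gene 2 has no zeros.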
Article
Biochemical Research Methods
Siqi Chen, Xuhua Yan, Ruiqing Zheng, Min Li
Summary: Bubble is a method for identifying and imputing 'dropout events' in scRNA-seq data. It uses gene expression rate and coefficient of variation to identify dropout zeros and then applies an autoencoder for imputation. Bubble enhances the recovery of missing values, reduces the introduction of false positive signals, and improves the identification of differentially expressed genes as well as cell clustering and visualization.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Computer Science, Artificial Intelligence
Anuraj Mohan, K. Pramod
Summary: The Temporal Graph Attention Network (TempGAN) aims to learn representations from continuous-time temporal networks by preserving the temporal proximity between nodes. TempGAN generates a Positive Pointwise Mutual Information (PPMI) matrix through temporal walks on the network and uses both adjacency and PPMI information to generate node embeddings. Link prediction experiments with the TempGAN autoencoder evaluate the quality of the generated embeddings and compare them with other state-of-the-art methods.
COMPLEX & INTELLIGENT SYSTEMS
(2022)
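The TempGAN summary builds a PPMI matrix from walks on the network. The sketch below shows the generic PPMI computation from node co-occurrences within a sliding window over walks:

PPMI(u, v) = max(0, log(#(u, v) · D / (#(u) · #(v))))

Here D is the total number of co-occurrence pairs. It assumes the walks are already given as node sequences and does not reproduce the time-respecting walk sampling TempGAN uses. The window size and function names are illustrative assumptions.

```python
import math
from collections import Counter

def cooccurrence(walks, window=2):
    """Count (center, context) node pairs within `window` steps in each walk."""
    counts = Counter()
    for walk in walks:
        for i, u in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(u, walk[j])] += 1
    return counts

def ppmi(counts):
    """Sparse PPMI: keep only pairs whose pointwise mutual information is positive."""
    total = sum(counts.values())
    row, col = Counter(), Counter()
    for (u, v), c in counts.items():
        row[u] += c  # marginal count of u as center
        col[v] += c  # marginal count of v as context
    out = {}
    for (u, v), c in counts.items():
        val = math.log(c * total / (row[u] * col[v]))
        if val > 0:
            out[(u, v)] = val
    return out
```

Storing only positive entries in a dict keeps the matrix sparse, which matters because most node pairs never co-occur on a walk.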
Article
Computer Science, Artificial Intelligence
Zonghan Wu, Da Zheng, Shirui Pan, Quan Gan, Guodong Long, George Karypis
Summary: This article introduces a novel spatial-temporal graph neural network called TraverseNet for capturing the spatial-temporal dependencies in traffic data. Compared to other spatial-temporal neural networks, TraverseNet views space and time as an inseparable whole and utilizes message traverse mechanisms to explore the dependencies in the spatial-temporal graph.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2022)
Article
Biochemical Research Methods
Shengfeng Gan, Huan Deng, Yang Qiu, Mohammed Alshahrani, Shichao Liu
Summary: This research proposes an accurate deep learning method called DSAE-Impute to impute the missing values in scRNA-seq data. The method employs stacked autoencoders and discriminative cell similarity to capture global expression features and achieve accurate imputation. Experimental results demonstrate its superiority in downstream analysis.
CURRENT BIOINFORMATICS
(2022)
Article
Biochemical Research Methods
Xiang Feng, Hongqi Zhang, Hao Lin, Haixia Long
Summary: In this study, a directed graph neural network called scDGAE was developed for scRNA-seq analysis using graph autoencoders and a graph attention network. The experimental results showed that scDGAE achieved promising performance in gene imputation and cell clustering prediction and can be applied to general scRNA-seq analyses.
Article
Biochemical Research Methods
Jian Liu, Yichen Pan, Zhihan Ruan, Jun Guo
Summary: In this paper, we propose a novel two-stage diffusion-denoising method called SCDD for large-scale single-cell RNA-seq imputation. The method effectively suppresses the over-smooth problem and remarkably improves the downstream analysis of single-cell RNA-seq, including clustering and trajectory analysis.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Computer Science, Artificial Intelligence
Dengdi Sun, Dashuang Li, Zhuanlian Ding, Xingyi Zhang, Jin Tang
Summary: This paper proposes a novel all-to-all graph autoencoder model, named A2AE, for multi-view graph representation learning. It utilizes the rich relational information in multiple views and recognizes the importance of different views.
APPLIED SOFT COMPUTING
(2022)
Article
Biochemical Research Methods
Hang Hu, Zhong Li, Xiangjie Li, Minzhe Yu, Xiutao Pan
Summary: This study proposes a novel deep embedding clustering method for single-cell RNA-seq data that uses a convolutional autoencoder for feature representation and a regularized soft K-means algorithm for clustering. Experimental results demonstrate that the method outperforms other approaches on various datasets and exhibits good compatibility and robustness.
BRIEFINGS IN BIOINFORMATICS
(2022)
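The summary above clusters autoencoder embeddings with a regularized soft K-means algorithm. The sketch below shows only plain soft K-means on given low-dimensional points. The paper's regularization term and the autoencoder are not reproduced. The stiffness parameter `beta` and the iteration count are illustrative assumptions.

```python
import math

def soft_kmeans(points, centers, beta=1.0, n_iter=20):
    """Plain (unregularized) soft K-means.

    points, centers: lists of equal-length float lists.
    beta: stiffness; larger values make assignments approach hard K-means.
    Returns (centers, responsibilities).
    """
    centers = [list(c) for c in centers]
    dim = len(points[0])
    for _ in range(n_iter):
        # E-step: softmax over negative squared distances to each center.
        resp = []
        for x in points:
            logits = [-beta * sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
            m = max(logits)  # subtract the max for numerical stability
            w = [math.exp(z - m) for z in logits]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: each center becomes the responsibility-weighted mean of the points.
        for k in range(len(centers)):
            total = sum(r[k] for r in resp)
            if total > 0:
                centers[k] = [sum(r[k] * x[d] for r, x in zip(resp, points)) / total
                              for d in range(dim)]
    return centers, resp
```

Each point gets a probability over clusters rather than a single label. This is why soft assignments suit end-to-end training, where a clustering loss must be differentiable with respect to the embeddings.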
Article
Computer Science, Artificial Intelligence
Dengdi Sun, Dashuang Li, Zhuanlian Ding, Xingyi Zhang, Jin Tang
Summary: The study introduces a dual-decoder graph autoencoder model that effectively embeds the topological structure and node attributes of a graph into a compact representation, showcasing superior performance in experiments.
KNOWLEDGE-BASED SYSTEMS
(2021)
Article
Computer Science, Artificial Intelligence
Dengdi Sun, Liang Liu, Bin Luo, Zhuanlian Ding
Summary: This paper proposes a novel graph Laplacian autoencoder with subspace clustering regularization for graph clustering (GLASS). The method overcomes the entanglement between convolutional filters and weight matrices in GCN encoders by using Laplacian smoothing filters and MLPs. The GLASS approach improves the feature propagation capability and clustering performance through residual connections and subspace clustering regularization. Experimental results demonstrate the effectiveness of GLASS and its advantages over GCN encoders in graph clustering and image clustering.
COGNITIVE COMPUTATION
(2023)
Article
Biochemical Research Methods
Qianmu Yuan, Sheng Chen, Jiahua Rao, Shuangjia Zheng, Huiying Zhao, Yuedong Yang
Summary: In this study, a precise predictor called GraphSite based on AlphaFold2 is proposed for identifying DNA-binding residues from protein structural models. By employing a graph transformer and leveraging predicted protein structures, GraphSite significantly improves the accuracy of DNA binding site prediction.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Virology
Hongwei Chen, Haoyang Zhang, Simin Wen, Xuehao Xiu, Danming You, Huiying Zhao, Dayan Wang, Yuedong Yang, Yuelong Shu
Summary: The clinical factors influencing immune responses to influenza vaccines have not yet been systematically explored. The mechanism of low responsiveness to influenza vaccination (LRIV) is complex and not well understood. In this study, we combined our in-house genome-wide association study (GWAS) analysis of LRIV with GWAS summary statistics for 10 blood-based biomarkers to investigate the genetics shared between LRIV and blood-based biomarkers using Mendelian randomization (MR). The results suggest a potential causal relationship between genetically instrumented LRIV and decreased eosinophil count.
JOURNAL OF MEDICAL VIROLOGY
(2023)
Editorial Material
Computer Science, Artificial Intelligence
Jiahua Rao, Shuangjia Zheng, Yuedong Yang
Article
Biochemical Research Methods
Yuansong Zeng, Rui Yin, Mai Luo, Jianing Chen, Zixiang Pan, Yutong Lu, Weijiang Yu, Yuedong Yang
Summary: Recent advances in spatial transcriptomics have allowed for gene expression measurement at cell/spot resolution, while retaining spatial information and histology images of the tissues. Accurately identifying the spatial domains of spots is crucial for downstream tasks in spatial transcriptomics analysis. In this study, a novel method called ConGI is proposed, which utilizes contrastive learning to accurately exploit spatial domains by combining gene expression with histopathological images. The method outperforms existing methods and the learned representations are useful for various downstream tasks.
BRIEFINGS IN BIOINFORMATICS
(2023)
Article
Biochemical Research Methods
Yidong Song, Qianmu Yuan, Sheng Chen, Ken Chen, Yaoqi Zhou, Yuedong Yang
Summary: Determining intrinsically disordered regions of proteins is crucial for understanding protein biological functions and associated diseases. This study proposes a fast and accurate protein disorder predictor, LMDisorder, which uses embeddings generated by unsupervised pretrained language models as features. LMDisorder outperforms other single-sequence-based methods and compares favorably with another language-model-based technique on independent test sets. Additionally, LMDisorder shows equivalent or better performance than the state-of-the-art profile-based technique SPOT-Disorder2. The high computational efficiency of LMDisorder allows for proteome-scale analysis, revealing associations between proteins with high predicted disorder content and specific biological functions. The datasets, source code, and trained model are available at https://github.com/biomed-AI/LMDisorder.
BRIEFINGS IN BIOINFORMATICS
(2023)
Article
Biochemical Research Methods
Qianmu Yuan, Junjie Xie, Jiancong Xie, Huiying Zhao, Yuedong Yang
Summary: Protein function prediction is crucial in bioinformatics and has implications for disease mechanism elucidation and drug target discovery. However, accurately predicting protein functions solely from sequences remains challenging. This study introduces SPROF-GO, a sequence-based alignment-free predictor that utilizes a pretrained language model to extract informative sequence embeddings and implements self-attention pooling to focus on important residues. SPROF-GO outperforms state-of-the-art approaches in precision-recall curves and demonstrates generalization capabilities.
BRIEFINGS IN BIOINFORMATICS
(2023)
Article
Biochemical Research Methods
Hegang Chen, Yuyin Lu, Yuedong Yang, Yanghui Rao
Summary: Combination therapy plays an important role in treating complex diseases, but the large number of possible combinations limits our ability to identify effective ones. This study introduces a new computational pipeline, DCMGCN, which integrates diverse drug-related information to predict novel drug combinations. The tests show that DCMGCN outperforms existing methods and may improve understanding of drug mechanisms.
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
(2023)
Article
Engineering, Biomedical
Fudan Zheng, Luhao Wang, Yuxian Pang, Zhiguang Chen, Yutong Lu, Yuedong Yang, Jianfeng Wu
Summary: Septic shock has become the leading cause of morbidity and mortality in the ICU. However, currently there is no model to predict the mortality of septic shock patients. We aim to develop such a model.
BIOMEDICAL SIGNAL PROCESSING AND CONTROL
(2023)
Article
Neurosciences
Siying Lin, Haoyang Zhang, Mengling Qi, David N. Cooper, Yuedong Yang, Yuanhao Yang, Huiying Zhao
Summary: Observational studies consistently show that brain imaging-derived phenotypes (IDPs) are critical markers for the early diagnosis of brain disorders and cardiovascular diseases. However, the shared genetic landscape between brain IDPs and the risk of these diseases remains unclear, limiting the application of potential diagnostic techniques using brain IDPs.
Article
Biochemistry & Molecular Biology
Cong Fan, Xin Wang, Tianze Ling, Yuedong Yang, Huiying Zhao
Summary: Recent studies suggest that RNAs have potential as drug targets, but progress in detecting RNA-ligand interactions is limited. To guide the discovery of RNA-binding ligands, it is necessary to comprehensively characterize them in terms of binding specificity, binding affinity, and drug-like properties. We established the RNALID database, which contains 358 validated RNA-ligand interactions. Comparisons with other databases show that the majority of ligands in RNALID are novel, and the analysis of ligand structure, binding affinity, and cheminformatic parameters reveals insights into the characteristics of different ligand types. Additionally, comparing RNALID ligands to FDA-approved drugs and ligands without bioactivity sheds light on their differences in chemical properties and drug-likeness.
Article
Biochemical Research Methods
Bailing Zhou, Maolin Ding, Jing Feng, Baohua Ji, Pingping Huang, Junye Zhang, Xue Yu, Zanxia Cao, Yuedong Yang, Yaoqi Zhou, Jihua Wang
Summary: Long non-coding RNAs (lncRNAs) are important in biological processes and disease. Algorithms have been developed to distinguish lncRNAs from mRNAs, resulting in the discovery of over 600,000 lncRNAs. However, only a small fraction of these have been validated through low-throughput experiments. To prioritize potentially functional lncRNAs and overcome the challenge of small datasets, deep learning algorithms were employed in this study.
BRIEFINGS IN BIOINFORMATICS
(2023)
Article
Computer Science, Artificial Intelligence
Shuangjia Zheng, Sijie Mai, Ya Sun, Haifeng Hu, Yuedong Yang
Summary: Link prediction for knowledge graphs aims to predict missing connections between entities. Prevailing methods are limited to a transductive setting and struggle to handle unseen entities. Recently proposed subgraph-based models offer an alternative by predicting links from the subgraph structure surrounding a candidate triplet.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2023)
Article
Biochemical Research Methods
Yuansong Zeng, Zhuoyi Wei, Qianmu Yuan, Sheng Chen, Weijiang Yu, Yutong Lu, Jianzhao Gao, Yuedong Yang
Summary: Drawing on the breakthrough of AlphaFold2 in protein structure prediction, we propose a novel graph-based model, GraphBepi, for accurate B-cell epitope prediction. By utilizing the predicted structure from AlphaFold2, GraphBepi constructs the protein graph and captures both sequence and spatial information through edge-enhanced deep graph neural networks (EGNN) and bidirectional long short-term memory neural networks (BiLSTM). The combined representations are input into a multilayer perceptron to predict B-cell epitopes. Comprehensive tests demonstrate that GraphBepi outperforms state-of-the-art methods in terms of AUC and AUPR.
Article
Biochemistry & Molecular Biology
Chuwei Liu, Arabella H. Wan, Heng Liang, Lei Sun, Jiarui Li, Ranran Yang, Qinghai Li, Ruibo Wu, Kunhua Hu, Yuedong Yang, Shirong Cai, Guohui Wan, Weiling He
Summary: Tumor mutation burden (TMB) is an important biomarker for assessing the efficacy of cancer immunotherapy, but its correlation with immune checkpoint inhibitors (ICIs) responsiveness varies among different cancer types. This study explores the relationship between TMB and multi-omics data in various cancer types and develops the PGLCN model to improve the interpretability and prediction accuracy of TMB. By integrating multi-omics data, the PGLCN model outperforms traditional machine learning methods in predicting TMB status and identifies potential combined biomarkers for TMB in gastric cancer.
COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL
(2023)
Article
Radiology, Nuclear Medicine & Medical Imaging
Hai-Qing Liu, Si-Ying Lin, Yi-Dong Song, Si-Yao Mai, Yue-Dong Yang, Kai Chen, Zhuo Wu, Hui-Ying Zhao
Summary: This study developed a machine learning model based on MRI to predict molecular subtype alterations in breast cancer after neoadjuvant therapy. The model showed favorable predictive efficacy in identifying molecular subtype alteration and could be a useful tool in clinical practice.
EUROPEAN RADIOLOGY
(2023)