Article

Single-cell RNA-seq interpretations using evolutionary multiobjective ensemble pruning

Journal

BIOINFORMATICS
Volume 35, Issue 16, Pages 2809-2817

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bty1056

Funding

  1. Research Grants Council of the Hong Kong Special Administrative Region [CityU 21200816, CityU 11203217, CityU 11200218]
  2. National Natural Science Foundation of China [61603087]
  3. Natural Science Foundation of Jilin Province [20190103006JH]
  4. Fundamental Research Funds for the Central Universities [2412017FZ026]

Motivation: In recent years, single-cell RNA sequencing has enabled us to discover cell types and even subtypes. Its increasing availability provides opportunities to identify cell populations from single-cell RNA-seq data. Computational methods have been employed to reveal the gene expression variations among multiple cell populations. Unfortunately, existing methods can suffer from practical limitations such as experimental noise, numerical instability, high dimensionality and limited computational scalability.

Results: We propose an evolutionary multiobjective ensemble pruning algorithm (EMEP) that addresses these limitations. EMEP first applies unsupervised dimensionality reduction to project the data from the original high-dimensional space to low-dimensional subspaces; basic clustering algorithms are then applied in those subspaces to generate different clustering results, which together form cluster ensembles. However, most of these cluster ensembles are unnecessarily bulky, at the expense of extra computation time and memory. To overcome this problem, EMEP dynamically selects suitable clustering results from the ensembles. To guide the multiobjective ensemble evolution, three cluster validity indices, namely the overall cluster deviation, the within-cluster compactness and the number of basic partition clusters, are formulated as objective functions so that evolutionary multiobjective optimization can drive cell type discovery. We applied EMEP to 55 simulated datasets and seven real single-cell RNA-seq datasets, comprising six single-cell RNA-seq datasets and one large-scale dataset with 3005 cells and 4412 genes. Two case studies were also conducted to reveal mechanistic insights into the biological relevance of EMEP. We found that EMEP achieved superior performance over the other clustering algorithms, demonstrating that it can clearly identify cell populations.
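The abstract frames ensemble pruning as a three-objective problem but does not give the formulas for the indices. As a hedged sketch only, the Python below assumes each candidate pruned ensemble (a subset of base clusterings, encoded as a tuple of member indices) has already been scored on the three indices, treats all three as "smaller is better", and reads the third index as the total number of clusters in the kept base partitions. It shows just the Pareto-dominance filter that evolutionary multiobjective selection builds on. The scores, the `Ensemble` encoding and the function names are hypothetical and not taken from EMEP.

```python
from typing import Dict, List, Sequence, Tuple

# A candidate pruned ensemble: sorted indices of the base clusterings it keeps.
Ensemble = Tuple[int, ...]
# Hypothetical objective vector (assumption, not EMEP's exact definitions):
# (overall deviation, within-cluster compactness, total clusters in kept partitions),
# all treated as "smaller is better" in this sketch.
Objectives = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scored: Dict[Ensemble, Objectives]) -> List[Ensemble]:
    """Return the candidate ensembles that no other candidate dominates."""
    return sorted(
        ens
        for ens, f in scored.items()
        if not any(dominates(g, f) for other, g in scored.items() if other != ens)
    )


# Hypothetical scores for pruned subsets of a 4-member ensemble
# (each base partition is assumed to have 3 clusters).
candidates: Dict[Ensemble, Objectives] = {
    (0, 1, 2, 3): (10.0, 4.0, 12),
    (0, 1, 2): (10.5, 4.0, 9),
    (0, 1): (12.5, 4.5, 6),
    (0, 2): (12.0, 3.5, 6),
    (1, 3): (12.5, 5.0, 6),
    (2,): (15.0, 6.0, 3),
}

if __name__ == "__main__":
    print(pareto_front(candidates))
```

Running it prints `[(0, 1, 2), (0, 1, 2, 3), (0, 2), (2,)]`. The subsets (0, 1) and (1, 3) are dropped because (0, 2) is at least as good on all three objectives and strictly better on at least one. An evolutionary optimizer would repeatedly mutate and recombine such subsets and keep this non-dominated set as its population or archive; exhaustively scoring every subset, as this toy does, does not scale to large ensembles.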

Recommended

Article Mathematical & Computational Biology

An Artificial Intelligence Approach for Gene Editing Off-Target Quantification: Convolutional Self-attention Neural Network Designs and Considerations

Jiecong Lin, Xingjian Chen, Ka-Chun Wong

Summary: The issue of off-target cleavage in the CRISPR gene-editing system has been a concern. This study introduces a computational method using convolutional neural network and attention module to predict off-target activity in CRISPR. Validation experiments demonstrate that the proposed model outperforms existing deep-learning-based off-target prediction models in terms of predictive performance.

STATISTICS IN BIOSCIENCES (2023)

Editorial Material Computer Science, Artificial Intelligence

2019 India International Congress on Computational Intelligence

Suash Deb, Ka-Chun Wong, Thomas Hanne

NEURAL COMPUTING & APPLICATIONS (2023)

Article Computer Science, Artificial Intelligence

Improved local search for the minimum weight dominating set problem in massive graphs by using a deep optimization mechanism

Jiejiang Chen, Shaowei Cai, Yiyuan Wang, Wenhao Xu, Jia Ji, Minghao Yin

Summary: The minimum weight dominating set (MWDS) problem is an important generalization of the minimum dominating set problem. In this study, we propose an efficient local search scheme and three novel ideas to improve performance, resulting in the DeepOpt-MWDS algorithm. Extensive experiments show that DeepOpt-MWDS performs better than state-of-the-art algorithms and achieves the best solutions on large-scale graphs.

ARTIFICIAL INTELLIGENCE (2023)

Article Biochemical Research Methods

Multi-view contrastive heterogeneous graph attention network for lncRNA-disease association prediction

Xiaosa Zhao, Jun Wu, Xiaowei Zhao, Minghao Yin

Summary: This study proposes a new multi-view contrastive heterogeneous graph attention network (GAT) method for predicting lncRNA-disease associations. The method constructs two view graphs using rich biological data sources and designs a cross-contrastive learning task to guide graph embeddings. Experimental results show the effectiveness of this method.

BRIEFINGS IN BIOINFORMATICS (2023)

Review Biochemistry & Molecular Biology

Review of single-cell RNA-seq data clustering for cell-type identification and characterization

Shixiong Zhang, Xiangtao Li, Jiecong Lin, Qiuzhen Lin, Ka-Chun Wong

Summary: The advances in single-cell RNA-seq techniques have allowed for large-scale transcriptomic profiling at single-cell resolution. Unsupervised learning, such as data clustering, is a key component in identifying and characterizing novel cell types and gene expression patterns. This study reviews existing single-cell RNA-seq data clustering methods, including their advantages and limitations, as well as upstream data processing techniques like quality control, normalization, and dimension reduction. Performance comparison experiments evaluate popular single-cell RNA-seq clustering approaches on simulated and multiple single-cell transcriptomic datasets.

Article Computer Science, Information Systems

An improved master-apprentice evolutionary algorithm for minimum independent dominating set problem

Shiwei Pan, Yiming Ma, Yiyuan Wang, Zhiguo Zhou, Jinchao Ji, Minghao Yin, Shuli Hu

Summary: This work presents an improved master-apprentice evolutionary algorithm, MAE-PB, for solving the minimum independent dominating set (MIDS) problem. The algorithm uses a construction function to generate initial solutions and to restart candidate solutions. It combines a multiple neighborhood-based local search algorithm, a recombination strategy based on master and apprentice solutions, and a perturbation strategy to improve solution quality. Computational results on benchmarks and real-world applications demonstrate the high performance of the MAE-PB algorithm.

FRONTIERS OF COMPUTER SCIENCE (2023)

Article Biology

Learning the protein language of proteome-wide protein-protein binding sites via explainable ensemble deep learning

Zilong Hou, Yuning Yang, Zhiqiang Ma, Ka-chun Wong, Xiangtao Li

Summary: Protein-protein interactions (PPIs) play a crucial role in cellular pathways and processes, but accurate identification of PPI binding sites is challenging. The proposed method, EDLMPPI, addresses these challenges with an ensemble deep learning model (EDLM). Evaluation results demonstrate that EDLMPPI outperforms state-of-the-art techniques in terms of average precision on widely used benchmark datasets. Additionally, the method provides new insights into protein binding site identification and characterization mechanisms.

COMMUNICATIONS BIOLOGY (2023)

Article Computer Science, Artificial Intelligence

Multi-objective evolving long-short term memory networks with attention for network intrusion detection

Wenhong Wei, Yi Chen, Qiuzhen Lin, Junkai Ji, Ka-Chun Wong, Jianqiang Li

Summary: As the use of Internet applications increases and concerns about the security of personal data on the Internet grow, cyber security has become increasingly important. Intrusion Detection Systems (IDSs) are crucial tools for detecting and responding to intrusions. Deep Learning (DL) techniques have gained popularity in IDS design due to their promising performance, but designing DL models requires expert knowledge, and design choices can significantly affect model performance. This paper proposes a multi-objective evolutionary DL model (EvoBMF) that incorporates bidirectional Long Short-Term Memory (BiLSTM), Multi-Head Attention (MHA), and a Fully Connected Layer (FCL) to detect network intrusion behaviors.

APPLIED SOFT COMPUTING (2023)

Review Green & Sustainable Science & Technology

Comprehensive review and future research directions on using various lanthanum-based adsorbents for selective phosphate removal

Kendric Aaron Tee, Saeed Ahmed, Mohammad A. H. Badsha, Ka Chun James Wong, Irene M. C. Lo

Summary: Due to the strong affinity of lanthanum (La) for phosphate, La compounds such as lanthanum oxide (LO), lanthanum hydroxide (LH), and lanthanum carbonate (LC) have been used in various La-based adsorbents. This study evaluates the differences between LO, LH, and LC in terms of their phosphate removal performance, stability, and reusability. LC has shown superior adsorption capacity, wider pH range, and lower La leaching, making it a potential alternative to LO and LH for phosphate removal. Further studies are needed to compare La compounds in more complex matrices and assess the role of crystal structure in phosphate removal.

CLEAN TECHNOLOGIES AND ENVIRONMENTAL POLICY (2023)

Article Computer Science, Interdisciplinary Applications

An efficient local search algorithm for minimum positive influence dominating set problem

Rui Sun, Jieyu Wu, Chenghou Jin, Yiyuan Wang, Wenbo Zhou, Minghao Yin

Summary: This paper proposes an efficient local search algorithm based on three main ideas to solve MPIDS problems with different scale instances. The experimental results show that the proposed algorithm performs much better than several state-of-the-art MPIDS algorithms in terms of solution quality.

COMPUTERS & OPERATIONS RESEARCH (2023)

Article Computer Science, Interdisciplinary Applications

On solving simplified diversified top-k s-plex problem

Jun Wu, Chu-Min Li, Luzhi Wang, Shuli Hu, Peng Zhao, Minghao Yin

Summary: This paper focuses on the problem of finding cohesive groups in a graph, which is important for various real-world applications. The existing methods of searching for cliques are not effective in finding cohesive groups due to their strictness. To address this issue, the Simplified Diversified Top-k s-Plex (S-DTKSP) problem is proposed in this paper. An integer linear programming and an iterated local search algorithm with a tabu strategy are proposed to solve the S-DTKSP problem effectively. Experimental results demonstrate the superiority of the proposed approaches over baseline algorithms.

COMPUTERS & OPERATIONS RESEARCH (2023)

Article Computer Science, Artificial Intelligence

A greedy randomized adaptive search procedure (GRASP) for minimum weakly connected dominating set problem

Dangdang Niu, Xiaolin Nie, Lilin Zhang, Hongming Zhang, Minghao Yin

Summary: This paper introduces a 0-1 integer linear programming (ILP) model and a framework of greedy randomized adaptive search procedure (GRASP) to solve the minimum weakly connected dominating set problem (MWCDSP). By introducing two novel local search procedures and incorporating greedy functions and a tabu strategy, an improved GRASP algorithm is proposed. Experimental results demonstrate the superior performance of this algorithm over other competitors.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Chemistry, Multidisciplinary

Efficient Generation of Paired Single-Cell Multiomics Profiles by Deep Learning

Meng Lan, Shixiong Zhang, Lin Gao

Summary: Recent advances in single-cell sequencing technology have led to the development of a deep learning-based framework called scMOG, which can generate single-cell assay for transposase-accessible chromatin (ATAC) data in silico. This framework accurately performs cross-omics generation between RNA and ATAC, and generates paired multiomics data with biological meanings. The generated ATAC data exhibits equivalent or superior performance to that of experimentally measured counterparts. scMOG also proves to be more effective in identifying tumor samples in human lymphoma data than the experimentally measured ATAC data. Moreover, scMOG shows robust performance in generating surface protein data in other omics such as proteomics.

ADVANCED SCIENCE (2023)

Article Biochemical Research Methods

Deep transfer learning for clinical decision-making based on high-throughput data: comprehensive survey with benchmark results

Muhammad Toseef, Olutomilayo Olayemi Petinrin, Fuzhou Wang, Saifur Rahaman, Zhe Liu, Xiangtao Li, Ka-Chun Wong

Summary: The use of transfer learning in health informatics and clinical decision-making, particularly in utilizing high-throughput molecular data, has shown great potential in bridging the gap between data domains and overcoming the lack of sufficient training data in clinical research.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Computer Science, Artificial Intelligence

Evolutionary Multitasking for Large-Scale Multiobjective Optimization

Songbai Liu, Qiuzhen Lin, Liang Feng, Ka-Chun Wong, Kay Chen Tan

Summary: Evolutionary transfer optimization (ETO) is an active topic in evolutionary computation that seeks to improve optimization efficiency by transferring knowledge across related tasks. This article proposes a multitasking ETO algorithm that uses transfer learning to solve large-scale multiobjective optimization problems (LMOPs). The algorithm builds a discriminative reconstruction network (DRN) for each LMOP to transfer solutions, evaluate correlation, and learn a reduced Pareto-optimal subspace of the target LMOP. The effectiveness of the algorithm is validated on real-world and synthetic problem suites.

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION (2023)
