4.7 Article

Expectation pooling: an effective and interpretable pooling method for predicting DNA-protein binding

Journal

BIOINFORMATICS
Volume 36, Issue 5, Pages 1405-1412

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btz768

Funding

  1. National Key Research and Development Program of China [2016YFA0502303]
  2. National Key Basic Research Project of China [2015CB910303]
  3. National Natural Science Foundation of China [31871342]
  4. National Key R&D Program of China [2016YFC0901603]
  5. China 863 Program [2015AA020108]
  6. Beijing Advanced Innovation Center for Genomics (ICG)
  7. State Key Laboratory of Protein and Plant Gene Research, Peking University

Motivation: Convolutional neural networks (CNNs) have outperformed conventional methods in modeling the sequence specificity of DNA-protein binding. While previous studies have built a connection between CNNs and probabilistic models, simple CNN models cannot achieve sufficient accuracy on this problem. Recently, some neural network methods have improved performance by using complex architectures whose results cannot be directly interpreted. However, it is difficult to combine probabilistic models and CNNs effectively to improve DNA-protein binding predictions.

Results: In this article, we present a novel global pooling method, expectation pooling, for predicting DNA-protein binding. Our pooling method stems naturally from the expectation maximization algorithm, and its benefits can be interpreted both statistically and via deep learning theory. Through experiments, we demonstrate that our pooling method improves the prediction performance for DNA-protein binding. Our interpretable pooling method combines probabilistic ideas with global pooling by taking the expectations of inputs without increasing the number of parameters. We also analyze the hyperparameters in our method and propose optional structures to help fit different datasets. We explore how to effectively utilize these novel pooling methods and show that combining statistical methods with deep learning is highly beneficial, which is promising for future studies in this field.
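The abstract describes a global pooling step that takes an expectation over its inputs and adds no parameters. The sketch below is one plausible reading of that idea, not the authors' implementation: it assumes the expectation is a softmax-weighted average of the per-position scores in a single convolutional channel. The function name `expectation_pool` and the fixed sharpness value `m` are hypothetical choices made for illustration.

```python
import math


def expectation_pool(scores, m=1.0):
    """Softmax-weighted average of per-position scores (illustrative only).

    Assumption: each position i gets weight exp(m * scores[i]), normalized
    over the sequence, and the pooled value is the weighted mean. `m` is a
    fixed sharpness value here, not a learned parameter.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    peak = max(scores)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(m * (s - peak)) for s in scores]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total


# Hypothetical activations of one convolutional filter along a sequence.
scores = [0.1, 2.0, 0.3, 1.5]

avg_like = expectation_pool(scores, m=0.0)   # uniform weights: plain average
mid = expectation_pool(scores, m=1.0)        # between average and max
max_like = expectation_pool(scores, m=50.0)  # weights concentrate on the peak
```

Under this assumption, `m = 0` reduces to global average pooling and a large `m` approaches global max pooling, so the pooled value moves between the two familiar global pooling schemes as the weights sharpen.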

Recommended

Article Biochemical Research Methods

scMRA: a robust deep learning method to annotate scRNA-seq data with multiple reference datasets

Musu Yuan, Liang Chen, Minghua Deng

Summary: The research introduces a robust deep learning-based single-cell Multiple Reference Annotator that effectively transfers knowledge from multiple insufficient reference datasets to unlabeled target data, while also removing batch effects.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

Quantitative model suggests both intrinsic and contextual features contribute to the transcript coding ability determination in cells

Yu-Jian Kang, Jing-Yi Li, Lan Ke, Shuai Jiang, De-Chang Yang, Mei Hou, Ge Gao

Summary: This study introduces the ribosome calculator, which quantitatively models the coding ability of RNAs in the human genome, and successfully predicts transcripts with different coding abilities in various cell types. This suggests that the coding ability of transcripts should be modeled as a continuous spectrum with context-dependent nature.

BRIEFINGS IN BIOINFORMATICS (2022)

Article Engineering, Electrical & Electronic

Improve Deep Unsupervised Hashing via Structural and Intrinsic Similarity Learning

Xiao Luo, Zeyu Ma, Wei Cheng, Minghua Deng

Summary: This paper proposes an effective unsupervised hashing method called Hashing via Structural and Intrinsic siMilarity learning (HashSIM). It tackles the drawbacks of existing methods by utilizing structural similarity learning and intrinsic similarity learning. Experimental results demonstrate that HashSIM outperforms state-of-the-art baselines on multiple benchmark datasets.

IEEE SIGNAL PROCESSING LETTERS (2022)

Article Computer Science, Information Systems

A Survey on Deep Hashing Methods

Xiao Luo, Haixin Wang, Daqing Wu, Chong Chen, Minghua Deng, Jianqiang Huang, Xian-Sheng Hua

Summary: Nearest neighbor search is a basic task in fields like computer vision and data mining, and hashing is a widely used method for its efficiency. Deep hashing methods, with the development of deep learning, show more advantages than traditional methods. In this survey, deep supervised hashing and deep unsupervised hashing algorithms are investigated in detail. Additionally, important topics such as semi-supervised deep hashing, domain adaption deep hashing, and multi-modal deep hashing are introduced, along with commonly used datasets and performance evaluation schemes.

ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (2023)

Article Computer Science, Artificial Intelligence

GHNN: Graph Harmonic Neural Networks for semi-supervised graph-level classification

Wei Ju, Xiao Luo, Zeyu Ma, Junwei Yang, Minghua Deng, Ming Zhang

Summary: This paper proposes a Graph Harmonic Neural Network (GHNN) that combines the advantages of graph convolutional networks and graph kernels to fully utilize unlabeled data, overcoming the scarcity of labeled data in semi-supervised scenarios.

NEURAL NETWORKS (2022)

Article Computer Science, Artificial Intelligence

Unsupervised graph-level representation learning with hierarchical contrasts

Wei Ju, Yiyang Gu, Xiao Luo, Yifan Wang, Haochen Yuan, Huasong Zhong, Ming Zhang

Summary: This paper proposes an unsupervised graph-level representation learning framework called Hierarchical Graph Contrastive Learning (HGCL), which addresses the issues of limited exploration of semantic information for graph representation and memory problems during optimization in graph domains. HGCL investigates the hierarchical structural semantics of a graph at both node and graph levels through contrastive learning. Experimental results demonstrate that HGCL outperforms a broad range of state-of-the-art baselines in graph classification and transfer learning tasks.

NEURAL NETWORKS (2023)

Article Genetics & Heredity

scEMAIL: Universal and Source-free Annotation Method for scRNA-seq Data with Novel Cell-type Perception

Hui Wan, Liang Chen, Minghua Deng

Summary: Current cell-type annotation tools for scRNA-seq data rely on well-annotated source data to identify cell types in target data. However, the need for raw source data may not always be fulfilled due to privacy concerns. These methods also struggle to detect novel cell types and often require subjective thresholds. The proposed scEMAIL framework addresses these limitations by automatically detecting novel cell types without accessing source data and utilizing a novel cell-type perception module.

GENOMICS PROTEOMICS & BIOINFORMATICS (2022)

Article Biochemical Research Methods

scGAD: a new task and end-to-end framework for generalized cell type annotation and discovery

Yuyao Zhai, Liang Chen, Minghua Deng

Summary: The rapid development of single-cell RNA sequencing technology enables us to study gene expression heterogeneity at the cellular level. In this paper, a new and practical task called generalized cell type annotation and discovery is proposed for scRNA-seq data, aiming to label target cells with either known cell types or cluster labels instead of a unified 'unassigned' label.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biochemistry & Molecular Biology

Genome-Wide Identification of Gene Loss Events Suggests Loss Relics as a Potential Source of Functional lncRNAs in Humans

Zheng-Yang Wen, Yu-Jian Kang, Lan Ke, De-Chang Yang, Ge Gao

Summary: Gene loss is a common source of genetic variation in genome evolution. We developed a new pipeline that integrates orthologous inference and genome alignment to effectively identify loss events. We discovered 33 gene loss events that give rise to novel lncRNAs with distinct expression features and potential functions related to growth, development, immunity, and reproduction in humans. Our data also revealed variable rates of protein gene loss and functional biases among different lineages.

MOLECULAR BIOLOGY AND EVOLUTION (2023)

Article Computer Science, Artificial Intelligence

Few-shot Molecular Property Prediction via Hierarchically Structured Learning on Relation Graphs

Wei Ju, Zequn Liu, Yifang Qin, Bin Feng, Chen Wang, Zhihui Guo, Xiao Luo, Ming Zhang

Summary: This paper addresses the problem of few-shot molecular property prediction in cheminformatics and drug discovery. It proposes a novel framework called HSL-RG which explores the structural semantics of molecules at global-level and local-level granularities. The framework leverages graph kernels to build relation graphs for global communication of structural knowledge and utilizes self-supervised learning signals for local transformation-invariant representations. Experimental results on benchmark datasets demonstrate the superiority of HSL-RG over existing state-of-the-art approaches.

NEURAL NETWORKS (2023)

Article Biochemistry & Molecular Biology

ESICCC as a systematic computational framework for evaluation, selection, and integration of cell-cell communication inference methods

Jiaxin Luo, Minghua Deng, Xuegong Zhang, Xiaoqiang Sun

Summary: Cell-cell communication is crucial for determining cell fates and functions in multicellular organisms. This study evaluated and compared the performances of different inference methods for cell-cell communication using various data sets. The results identified the best-performing methods for ligand-receptor inference and ligand/receptor-target regulation prediction, and provided a guideline and an ensemble pipeline for practical applications.

GENOME RESEARCH (2023)

Article Computer Science, Artificial Intelligence

Toward Effective Domain Adaptive Retrieval

Haixin Wang, Jinan Sun, Xiao Luo, Wei Xiang, Shikun Zhang, Chong Chen, Xian-Sheng Hua

Summary: This paper proposes a principled framework called PEACE for unsupervised domain adaptive hashing. PEACE holistically explores semantic information in both source and target data and incorporates it for effective domain alignment. It leverages label embeddings to guide the optimization of hash codes for source data and proposes a novel method to measure the uncertainty of pseudo-labels for unlabeled target data and minimize them through alternative optimization. PEACE also removes domain discrepancy in the Hamming space through composite adversarial learning and aligns cluster semantic centroids across domains.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2023)

Article Computer Science, Artificial Intelligence

Redundancy-Free Self-Supervised Relational Learning for Graph Clustering

Siyu Yi, Wei Ju, Yifang Qin, Xiao Luo, Luchen Liu, Yongdao Zhou, Ming Zhang

Summary: The article proposes a novel self-supervised deep graph clustering method called relational redundancy-free graph clustering (R(2)FGC). It enhances graph clustering performance by extracting relational information from both global and local views and mitigates the oversmoothing issue through a simple yet valid strategy.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Biochemical Research Methods

Clustering single-cell multi-omics data with MoClust

Musu Yuan, Liang Chen, Minghua Deng

Summary: This study developed a novel joint clustering framework called MoClust for analyzing single-cell multi-omics data. The framework improves data quality through automatic doublet detection and omics-specific autoencoders, and enhances clustering accuracy and separability through contrastive learning-based distribution alignment.

BIOINFORMATICS (2023)
