Article
Biochemical Research Methods
Ruheng Wang, Junru Jin, Quan Zou, Kenta Nakai, Leyi Wei
Summary: In this study, we propose a BERT-based contrastive learning framework called PepBCL for predicting protein-peptide binding residues. This method eliminates the need for complex feature engineering by utilizing a well-pretrained protein language model to automatically extract and learn feature representations. Additionally, a contrastive learning module is used to optimize the feature representations of binding residues within the imbalanced dataset, resulting in improved performance. Experimental results demonstrate that our method outperforms existing techniques, and the integration of traditional features and learned features further enhances performance.
Article
Chemistry, Multidisciplinary
Xueru Zhao, Furong Chang, Hehe Lv, Guobing Zou, Bofeng Zhang
Summary: This paper proposes HPNet, a method that automatically identifies RNA-binding sites and binding preferences. HPNet learns features from both the RNA sequence and the RNA secondary structure. By combining a convolutional neural network (CNN) with a differentiable pooling graph neural network (GNN), HPNet improves the accuracy of RNA-binding site prediction. Experimental results show that HPNet achieves a mean area under the curve (AUC) of 94.5% on the benchmark dataset, outperforming state-of-the-art methods. These results also demonstrate the importance of hierarchical features of RNA secondary structure in selecting RNA-binding sites.
APPLIED SCIENCES-BASEL
(2023)
Article
Mechanics
Dawen Shen, Zhaohua Sheng, Yunzhen Zhang, Guangyao Rong, Kevin Wu, Jianping Wang
Summary: As the rotating detonation engine (RDE) matures toward engineering implementation, it is crucial to develop real-time diagnostics capable of monitoring and predicting combustion states to prevent combustion instability. In this study, a novel Transformer-based neural network, RDE-Transformer, is proposed for monitoring and predicting combustion states in advance, showing high performance and interpretability.
Article
Biochemical Research Methods
Lijun Cai, Li Wang, Xiangzheng Fu, Chenxing Xia, Xiangxiang Zeng, Quan Zou
Summary: The development of an Interpretable Therapeutic Peptide Prediction (ITP-Pred) model based on efficient feature fusion showed higher prediction performance in cross-validation and independent verification experiments, providing guidance for designing better models.
BRIEFINGS IN BIOINFORMATICS
(2021)
Article
Computer Science, Artificial Intelligence
Yifei Wang, Xue Wang, Cheng Chen, Hongli Gao, Adil Salhi, Xin Gao, Bin Yu
Summary: RNA-protein interactions (RPIs) play a crucial regulatory role in cellular physiological processes. This study proposes an interpretable method for RPI prediction, RPI-CapsuleGAN, which combines a generative adversarial capsule network with a convolutional block attention module. The method extracts and fuses multiple features to characterize RNA and protein sequences, and effectively addresses the loss of spatial structure hierarchy in the model. Extensive experiments show that RPI-CapsuleGAN provides an efficient, accurate, and stable method for RPI prediction, outperforming other mainstream deep learning algorithms.
PATTERN RECOGNITION
(2023)
Article
Computer Science, Artificial Intelligence
Haoteng Tang, Guixiang Ma, Lifang He, Heng Huang, Liang Zhan
Summary: In this study, a new interpretable graph pooling framework, CommPOOL, is proposed to capture and preserve the hierarchical community structure of graphs during graph representation learning. It shows superior performance on five public benchmark datasets and one synthetic dataset.
Article
Biochemical Research Methods
Wenjuan Nie, Lei Deng
Summary: This study proposes a new method called TSNAPred to accurately identify the interaction between protein and nucleic acid. The method utilizes the features derived from protein sequence and employs the sliding window technique and weighted ensemble strategy to improve prediction performance. Experimental results demonstrate that TSNAPred can effectively identify different types of nucleic acid binding residues and outperforms other methods in distinguishing DNA-binding and RNA-binding residues.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Computer Science, Information Systems
Sergio Penafiel, Nelson Baloian, Horacio Sanson, Jose A. Pino
Summary: Predicting an individual's risk of stroke is a popular research topic, with many governments using medical data and AI methods. While black-box methods are accurate, the medical field values explanations for gaining insight.
Article
Automation & Control Systems
Chandra Mohan Dasari, Santhosh Amilpur, Raju Bhukya
Summary: The proposed interpretable deep learning technique, PBVPP, predicts binding sites from experimental data, extracting vital features from large-scale genomic sequences and achieving accurate prediction of TFBS and RBP sites. The model shows how to mine vital features and extract variable-length patterns for improved binding-site prediction, validates the obtained motifs against known target motifs in a database, and performs better than existing methods.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
(2021)
Article
Multidisciplinary Sciences
Shuyu Wang, Yinbo Liu, Yufeng Liu, Yong Zhang, Xiaolei Zhu
Summary: In this study, a new computational approach called BERT5mC was proposed to achieve fast and accurate prediction of DNA 5-methylcytosine (5mC) sites. The approach utilized a pre-trained and fine-tuned BERT model, resulting in high prediction accuracy. Additionally, a webserver was built for easy access to the model.
Article
Chemistry, Multidisciplinary
Surendra Kumar, Mi-hyun Kim
Summary: In the field of drug discovery, predicting protein-ligand binding affinities accurately and quickly is crucial for optimizing lead compounds. Various machine learning or deep learning methods have been proposed to address the limitations of traditional scoring functions. While these new approaches are highly accurate, they often require complex featurization processes and additional analysis to interpret the embedded features.
JOURNAL OF CHEMINFORMATICS
(2021)
Article
Biochemical Research Methods
Chun He, Xinhai Ye, Yi Yang, Liya Hu, Yuxuan Si, Xianxin Zhao, Longfei Chen, Qi Fang, Ying Wei, Fei Wu, Gongyin Ye
Summary: Allergies have become a global public health issue, and prevention through allergen identification and avoidance is crucial. Current computational methods for allergen identification have limitations in detecting low-homology allergens, and deep learning-based methods are rare. This study proposes DeepAlgPro, a deep neural network-based model, which demonstrates high accuracy and applicability for large-scale forecasts compared to other available tools. Ablation experiments highlight the importance of the convolutional module, and epitope features contribute to model decision-making, improving interpretability. DeepAlgPro is capable of detecting potential new allergens, making it a powerful tool for allergen identification.
BRIEFINGS IN BIOINFORMATICS
(2023)
Article
Biochemistry & Molecular Biology
Ajay Arya, Dana Mary Varghese, Ajay Kumar Verma, Shandar Ahmad
Summary: Prediction of DNA-binding residues in proteins using sequence-based methods has been widely studied. The current primary feature set, Position Specific Substitution Matrix (PSSM), is powerful for identifying conserved binding sites but falls short for residues undergoing binding-to-non-binding transitions.
JOURNAL OF MOLECULAR BIOLOGY
(2022)
Article
Biochemical Research Methods
Elham Khalili, Shahin Ramazi, Faezeh Ghanati, Samaneh Kouchaki
Summary: Phosphorylation of proteins is a significant post-translational modification that plays a crucial role in plant functionality. Accurate prediction of plant phosphorylation sites is vital, and this study develops machine learning-based techniques to improve the prediction of protein phosphorylation sites in soybean. The proposed technique achieves high accuracy and specificity, and can be used to automatically analyze data and predict potential protein phosphorylation sites in plants.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Chemistry, Multidisciplinary
Cheng-Jie Ma, Lin Li, Wen-Xuan Shao, Jiang-Hui Ding, Xiao-Li Cai, Zhao-Rong Lun, Bi-Feng Yuan, Yu-Qi Feng
Summary: 5-Hydroxymethyluracil (5hmU) is a thymine modification found in various organism genomes. A novel enzyme-mediated bioorthogonal labeling method has been developed to selectively enrich 5hmU in genomes, allowing for a better understanding of its distribution and functional roles.
Article
Biochemical Research Methods
Musu Yuan, Liang Chen, Minghua Deng
Summary: The research introduces a robust deep learning-based single-cell Multiple Reference Annotator that effectively transfers knowledge from multiple insufficient reference datasets to unlabeled target data, while also removing batch effects.
Article
Biochemical Research Methods
Yu-Jian Kang, Jing-Yi Li, Lan Ke, Shuai Jiang, De-Chang Yang, Mei Hou, Ge Gao
Summary: This study introduces the ribosome calculator, which quantitatively models the coding ability of RNAs in the human genome, and successfully predicts transcripts with different coding abilities in various cell types. This suggests that the coding ability of transcripts should be modeled as a continuous, context-dependent spectrum.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Engineering, Electrical & Electronic
Xiao Luo, Zeyu Ma, Wei Cheng, Minghua Deng
Summary: This paper proposes an effective unsupervised hashing method called Hashing via Structural and Intrinsic siMilarity learning (HashSIM). It tackles the drawbacks of existing methods by utilizing structural similarity learning and intrinsic similarity learning. Experimental results demonstrate that HashSIM outperforms state-of-the-art baselines on multiple benchmark datasets.
IEEE SIGNAL PROCESSING LETTERS
(2022)
Article
Computer Science, Information Systems
Xiao Luo, Haixin Wang, Daqing Wu, Chong Chen, Minghua Deng, Jianqiang Huang, Xian-Sheng Hua
Summary: Nearest neighbor search is a basic task in fields like computer vision and data mining, and hashing is widely used for its efficiency. With the development of deep learning, deep hashing methods show advantages over traditional methods. In this survey, deep supervised hashing and deep unsupervised hashing algorithms are investigated in detail. Additionally, important topics such as semi-supervised deep hashing, domain adaptation deep hashing, and multi-modal deep hashing are introduced, along with commonly used datasets and performance evaluation schemes.
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA
(2023)
Article
Computer Science, Artificial Intelligence
Wei Ju, Xiao Luo, Zeyu Ma, Junwei Yang, Minghua Deng, Ming Zhang
Summary: This paper proposes a Graph Harmonic Neural Network (GHNN) that combines the advantages of graph convolutional networks and graph kernels to fully utilize unlabeled data, overcoming the scarcity of labeled data in semi-supervised scenarios.
Article
Computer Science, Artificial Intelligence
Wei Ju, Yiyang Gu, Xiao Luo, Yifan Wang, Haochen Yuan, Huasong Zhong, Ming Zhang
Summary: This paper proposes an unsupervised graph-level representation learning framework called Hierarchical Graph Contrastive Learning (HGCL), which addresses the issues of limited exploration of semantic information for graph representation and memory problems during optimization in graph domains. HGCL investigates the hierarchical structural semantics of a graph at both node and graph levels through contrastive learning. Experimental results demonstrate that HGCL outperforms a broad range of state-of-the-art baselines in graph classification and transfer learning tasks.
Article
Genetics & Heredity
Hui Wan, Liang Chen, Minghua Deng
Summary: Current cell-type annotation tools for scRNA-seq data rely on well-annotated source data to identify cell types in target data. However, the need for raw source data may not always be fulfilled due to privacy concerns. These methods also struggle to detect novel cell types and often require subjective thresholds. The proposed scEMAIL framework addresses these limitations by automatically detecting novel cell types without accessing source data and utilizing a novel cell-type perception module.
GENOMICS PROTEOMICS & BIOINFORMATICS
(2022)
Article
Biochemical Research Methods
Yuyao Zhai, Liang Chen, Minghua Deng
Summary: The rapid development of single-cell RNA sequencing technology enables us to study gene expression heterogeneity at the cellular level. In this paper, a new and practical task called generalized cell type annotation and discovery is proposed for scRNA-seq data, aiming to label target cells with either known cell types or cluster labels instead of a unified 'unassigned' label.
BRIEFINGS IN BIOINFORMATICS
(2023)
Article
Biochemistry & Molecular Biology
Zheng-Yang Wen, Yu-Jian Kang, Lan Ke, De-Chang Yang, Ge Gao
Summary: Gene loss is a common source of genetic variation in genome evolution. We developed a new pipeline that integrates orthologous inference and genome alignment to effectively identify loss events. We discovered 33 gene loss events that give rise to novel lncRNAs with distinct expression features and potential functions related to growth, development, immunity, and reproduction in humans. Our data also revealed variable rates of protein gene loss and functional biases among different lineages.
MOLECULAR BIOLOGY AND EVOLUTION
(2023)
Article
Computer Science, Artificial Intelligence
Wei Ju, Zequn Liu, Yifang Qin, Bin Feng, Chen Wang, Zhihui Guo, Xiao Luo, Ming Zhang
Summary: This paper addresses the problem of few-shot molecular property prediction in cheminformatics and drug discovery. It proposes a novel framework called HSL-RG which explores the structural semantics of molecules at global-level and local-level granularities. The framework leverages graph kernels to build relation graphs for global communication of structural knowledge and utilizes self-supervised learning signals for local transformation-invariant representations. Experimental results on benchmark datasets demonstrate the superiority of HSL-RG over existing state-of-the-art approaches.
Article
Biochemistry & Molecular Biology
Jiaxin Luo, Minghua Deng, Xuegong Zhang, Xiaoqiang Sun
Summary: Cell-cell communication is crucial for determining cell fates and functions in multicellular organisms. This study evaluated and compared the performances of different inference methods for cell-cell communication using various data sets. The results identified the best-performing methods for ligand-receptor inference and ligand/receptor-target regulation prediction, and provided a guideline and an ensemble pipeline for practical applications.
Article
Computer Science, Artificial Intelligence
Haixin Wang, Jinan Sun, Xiao Luo, Wei Xiang, Shikun Zhang, Chong Chen, Xian-Sheng Hua
Summary: This paper proposes a principled framework called PEACE for unsupervised domain adaptive hashing. PEACE holistically explores semantic information in both source and target data and incorporates it for effective domain alignment. It leverages label embeddings to guide the optimization of hash codes for source data, and proposes a novel method to measure the uncertainty of pseudo-labels for unlabeled target data and minimize it through alternative optimization. PEACE also removes domain discrepancy in the Hamming space through composite adversarial learning and aligns cluster semantic centroids across domains.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2023)
Article
Computer Science, Artificial Intelligence
Siyu Yi, Wei Ju, Yifang Qin, Xiao Luo, Luchen Liu, Yongdao Zhou, Ming Zhang
Summary: The article proposes a novel self-supervised deep graph clustering method called relational redundancy-free graph clustering (R²FGC). It enhances graph clustering performance by extracting relational information from both global and local views and mitigates the oversmoothing issue through a simple yet valid strategy.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2023)
Article
Biochemical Research Methods
Musu Yuan, Liang Chen, Minghua Deng
Summary: This study developed a novel joint clustering framework called MoClust for analyzing single-cell multi-omics data. The framework improves data quality through automatic doublet detection and omics-specific autoencoders, and enhances clustering accuracy and separability through contrastive learning-based distribution alignment.