Article
Biology
Xiao-Yao Qiu, Hao Wu, Jiangyi Shao
Summary: This study integrates sequence embeddings, contact map embeddings, and GO label embeddings on top of the TALE architecture to improve the accuracy of protein function prediction. The resulting model outperforms existing methods.
COMPUTERS IN BIOLOGY AND MEDICINE
(2022)
Article
Multidisciplinary Sciences
Shou Feng, Huiying Li, Jiaqing Qiao
Summary: This paper proposes a multi-label classification method for lncRNA function annotation that predicts labels efficiently and accurately. The method uses a mathematical model based on Bayesian decision theory and is implemented with a long short-term memory (LSTM) network and a hierarchical constraint method.
SCIENTIFIC REPORTS
(2022)
Article
Multidisciplinary Sciences
Vladimir Gligorijevic, P. Douglas Renfrew, Tomasz Kosciolek, Julia Koehler Leman, Daniel Berenberg, Tommi Vatanen, Chris Chandler, Bryn C. Taylor, Ian M. Fisk, Hera Vlamakis, Ramnik J. Xavier, Rob Knight, Kyunghyun Cho, Richard Bonneau
Summary: DeepFRI is a graph convolutional network for predicting protein functions by leveraging sequence features from a protein language model and protein structures. It outperforms other methods and supports large-scale sequence repositories.
NATURE COMMUNICATIONS
(2021)
Article
Biochemistry & Molecular Biology
Anowarul Kabir, Amarda Shehu
Summary: This paper presents a novel protein function prediction method that uses a transformer architecture to encode protein sequences and GO terms for multi-label classification. It also investigates in detail how training and testing datasets should be constructed, resulting in a new benchmark dataset.
Article
Biochemical Research Methods
Zhourun Wu, Mingyue Guo, Xiaopeng Jin, Junjie Chen, Bin Liu
Summary: CFAGO is a novel method for protein function prediction that integrates single-species PPI networks and protein biological attributes using a multi-head attention mechanism. In benchmark experiments on human and mouse datasets, CFAGO outperforms state-of-the-art single-species network-based methods in terms of m-AUPR, M-AUPR, and Fmax, indicating that cross-fusion via multi-head attention significantly improves protein function prediction.
Article
Biotechnology & Applied Microbiology
Lei Deng, Shengli Ren, Jingpu Zhang
Summary: This paper proposes DNGRGO, a new computational method based on global heterogeneous networks, to predict the functions of lncRNAs. DNGRGO calculates the similarities among proteins, miRNAs, and lncRNAs, and annotates lncRNA functions based on similar protein-coding genes labeled with Gene Ontology (GO) terms. Experimental results show that DNGRGO can annotate lncRNAs by capturing low-dimensional features of the heterogeneous network, and that integrating miRNA data improves its predictive performance.
Article
Biochemistry & Molecular Biology
Petri Toronen, Liisa Holm
Summary: The advent of next-generation sequencing technology has resulted in a massive increase in gene catalogs for new genomes, transcriptomes, and metagenomes that require computational inference for functional annotation. PANNZER is a high-throughput functional annotation web server that supports annotation of up to 100,000 protein sequences and provides Gene Ontology annotations and free text description predictions. Two case studies highlight issues related to data quality and method evaluation, arguing that commonly used evaluation metrics and datasets may bias the development of automated function prediction methods.
Article
Biochemistry & Molecular Biology
Kishan Thambu, Victoria Glomb, Rolando Hernandez Trapero, Julio C. Facelli
Summary: Microproteins are a novel and expanding group of small proteins encoded by fewer than 100-150 codons and translated from small open reading frames (smORFs). The results show that, compared to regular proteins, microproteins have different amino acid compositions, similar structural characteristics, and fewer small-molecule ligand binding sites.
JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS
(2022)
Article
Biochemical Research Methods
Eman Ismail, Walaa Gad, Mohamed Hashem
Summary: This paper proposes a hybrid stacking ensemble model with the Synthetic Minority Oversampling Technique (Stack-SMOTE) to predict genes associated with autism spectrum disorder (ASD). The model uses a hybrid gene similarity function (HGS) to measure the similarities between genes, addresses the class imbalance of the ASD dataset with SMOTE, and improves the prediction of ASD genes by introducing a gradient boosting-based random forest classifier (GBBRF).
BMC BIOINFORMATICS
(2023)
Article
Biochemical Research Methods
Junhua Ye, Shunfang Wang, Xin Yang, Xianjun Tang
Summary: This study uses deep learning to predict associations between genes and diseases. It reduces the dimensionality of large biological networks such as PPI and GO with the Mashup algorithm, then combines the different sources of biological information in a modular deep neural network (DNN) to predict genes related to aging diseases. The results show that the algorithm outperforms a standard neural network, a gradient-boosted tree classifier, and a logistic regression classifier in predicting gene-disease associations.
BMC BIOINFORMATICS
(2021)
Article
Biology
Leila Fattel, Dennis Psaroudakis, Colleen F. Yanarella, Kevin O. Chiteri, Haley A. Dostalik, Parnal Joshi, Dollye C. Starr, Ha Vu, Kokulapalan Wimalanathan, Carolyn J. Lawrence-Dill
Summary: Genome-wide functional annotations are crucial for studying gene function and traits in plants. By comparing functional annotation data from different species, similarities and differences can be identified, leading to the generation of novel hypotheses about gene function and evolutionary relationships.
Article
Multidisciplinary Sciences
Flavio Pazos Obregon, Diego Silvera, Pablo Soto, Patricio Yankilevich, Gustavo Guerberoff, Rafael Cantera
Summary: The function of most genes is unknown. In this study, automated function prediction was carried out with machine learning models trained exclusively on features derived from the location of genes in their genomes. The results show that gene location alone can be more effective than sequence in predicting gene function for certain biological processes and cellular components.
SCIENTIFIC REPORTS
(2022)
Article
Biology
Miguel Romero, Felipe Kenji Nakano, Jorge Finke, Camilo Rocha, Celine Vens
Summary: With the development of new sequencing technologies, genomic data availability has increased rapidly. Previous studies have used this data to associate genes with biological functions, but often ignored the sparsity and noise in the datasets. This research proposes a method for detecting missing annotations in a hierarchical multi-label classification context, using function relations represented as a hierarchy. Experimental results on rice datasets demonstrate the accuracy and superiority of this method compared to current state-of-the-art approaches.
COMPUTERS IN BIOLOGY AND MEDICINE
(2023)
Article
Biochemical Research Methods
Victoria Bourgeais, Farida Zehraoui, Blaise Hanczar
Summary: This article proposes GraphGONet, a knowledge-based deep learning model that incorporates the Gene Ontology into its hidden layers to obtain a self-explaining neural network. The experiments confirm the accuracy and interpretability of the model, making it promising for use in the medical field.
Article
Genetics & Heredity
Yuanyuan Zhang, Ziqi Wang, Shudong Wang, Junliang Shang
Summary: The study systematically analyzes how well the GO graph and the GOA graph support protein similarity calculation under different graph embedding methods. It shows that graph embedding methods, particularly those based on random walks, have advantages over traditional IC-based methods in calculating protein similarity. A comparison of link prediction results from the GO(DTW) and GOA(cosine) methods shows that GO(DTW) features provide highly effective information for analyzing protein similarity.
FRONTIERS IN GENETICS
(2021)