Article

iRO-PsekGCC: Identify DNA Replication Origins Based on Pseudo k-Tuple GC Composition

FRONTIERS IN GENETICS (2019)

Journal

FRONTIERS IN GENETICS

Volume 10

Publisher

FRONTIERS MEDIA SA

DOI: 10.3389/fgene.2019.00842

Keywords

replication origin identification; pseudo k-tuple GC composition; random forest; web-server; DNA sequence analysis

Categories

Genetics & Heredity

Funding

National Natural Science Foundation of China [61672184, 61732012, 61822306]
Fok Ying-Tung Education Foundation for Young Teachers in the Higher Education Institutions of China [161063]
Shenzhen Overseas High Level Talents Innovation Foundation [KQJSCX20170327161949608]
Guangdong Natural Science Funds for Distinguished Young Scholars [2016A030306008]
Scientific Research Foundation in Shenzhen [JCYJ20180306172207178]

Abstract

Identification of replication origins plays a key role in understanding the mechanism of DNA replication and is of great significance in DNA sequence analysis. Because of its importance, several computational approaches have been introduced. Among these predictors, iRO-3wPseKNC is the first discriminative method able to correctly identify entire replication origins. To further improve predictive performance, this study proposes the Pseudo k-tuple GC Composition (PsekGCC) approach, which captures the GC asymmetry bias of yeast species by considering both the GC skew and the sequence-order effects of k-tuple GC Composition (k-GCC). Based on PsekGCC, we propose a new predictor called iRO-PsekGCC to identify DNA replication origins. A rigorous jackknife test on benchmark datasets for two yeast species (Saccharomyces cerevisiae and Pichia pastoris) indicated that iRO-PsekGCC outperformed iRO-3wPseKNC. iRO-PsekGCC is therefore expected to be a useful tool for DNA replication origin identification. Availability and implementation: A web server for the iRO-PsekGCC predictor has been established and can be accessed at http://bliulab.net/iRO-PsekGCC/.
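
The abstract names two ingredients, GC skew and the sequence-order effects of k-tuple GC composition, without giving formulas. The following is only a rough sketch of what such features might look like, not the authors' PsekGCC definition. GC skew uses the standard (G - C) / (G + C) form; the per-k-tuple GC fraction and the lag-based order term are illustrative assumptions, and all function names are hypothetical.

```python
from typing import List


def gc_skew(seq: str) -> float:
    """Standard GC skew (G - C) / (G + C); 0.0 if the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def ktuple_gc_content(seq: str, k: int) -> List[float]:
    """GC fraction of every overlapping k-tuple (assumed stand-in for k-GCC)."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + k]) / k
        for i in range(len(seq) - k + 1)
    ]


def lag_correlation(values: List[float], lag: int) -> float:
    """Mean squared difference between values `lag` positions apart.

    Hypothetical sequence-order term; the paper's actual formula is not
    given in the abstract.
    """
    pairs = list(zip(values, values[lag:]))
    if not pairs:
        return 0.0
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def toy_features(seq: str, k: int = 3, max_lag: int = 2) -> List[float]:
    """Feature vector: [GC skew, mean k-tuple GC fraction, lag terms 1..max_lag]."""
    gc = ktuple_gc_content(seq, k)
    mean_gc = sum(gc) / len(gc) if gc else 0.0
    order_terms = [lag_correlation(gc, lag) for lag in range(1, max_lag + 1)]
    return [gc_skew(seq), mean_gc] + order_terms
```

A vector like the one returned by `toy_features` could then be passed to a classifier such as a random forest, which the keywords list as the learning method; the choice of `k` and `max_lag` here is arbitrary.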

Recommended

Article Biochemical Research Methods

KNIndex: a comprehensive database of physicochemical properties for k-tuple nucleotides

Wen-Ya Zhang, Junhai Xu, Jun Wang, Yuan-Ke Zhou, Wei Chen, Pu-Feng Du

Summary: With the advancement of high-throughput sequencing technology, genomic sequences have exponentially increased, leading to the introduction of machine learning methods for genome annotation and analysis. To facilitate the study of genomic sequences, the KNIndex database was developed to deposit and visualize physicochemical properties of k-tuple nucleotides, providing a user-friendly interface for browsing, querying, visualizing, and downloading these properties.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Engineering, Electrical & Electronic

FeduLPM: Federated Unsupervised Learning-Based Predictive Model for Speed Control in Customizable Automotive Variants

S. Samsudeen, G. Senthil Kumar

Summary: In this study, a novel federated unsupervised learning-based predictive model (FeduLPM) technique is proposed to control the speed of customizable automotive variants by analyzing accident data from different locations. The method distributes accident data to local models and aggregates trained parameters to generate a global model, providing accurate speed limit suggestions. Experimental results show that the proposed FeduLPM achieved 96.7% accuracy in processing data from various locations in Bengaluru, making it a better solution to prevent accidents for bike and car drivers.

IEEE SENSORS JOURNAL (2023)

Article Biology

FAM111A regulates replication origin activation and cell fitness

Diana Rios-Szwed, Vanesa Alvarez, Luis Sanchez-Pulido, Elisa Garcia-Wilson, Hao Jiang, Susanne Bandau, Angus Lamond, Constance Alabert

Summary: FAM111A is a replisome-associated protein that plays an important role in the activation of DNA replication origins. Dominant mutations within its trypsin-like peptidase domain are linked to severe human developmental syndromes. Research has shown that under normal conditions, FAM111A promotes DNA replication, but in a disease context, its unrestrained expression can lead to DNA damage and cell death.

LIFE SCIENCE ALLIANCE (2023)

Article Biochemistry & Molecular Biology

Anti-Ebola: an initiative to predict Ebola virus inhibitors through machine learning

Akanksha Rajput, Manoj Kumar

Summary: The Ebola virus, a deadly pathogen since 1976, has led researchers to develop computational models for drug discovery. Using molecular descriptors, a predictive model for anti-Ebola compounds was developed and integrated into a web server for scientific use.

MOLECULAR DIVERSITY (2022)

Article Environmental Sciences

Rice Mapping in Training Sample Shortage Regions Using a Deep Semantic Segmentation Model Trained on Pseudo-Labels

Pengliang Wei, Ran Huang, Tao Lin, Jingfeng Huang

Summary: This study introduces a workflow that utilizes a deep semantic segmentation model to extract rice distribution information in regions with limited training samples. By training the model on pseudo-labels, the time-consuming annotation process for ground truth data can be reduced. Experimental results show that the proposed method outperforms existing approaches in terms of accurately extracting rice area and spatial distribution information.

REMOTE SENSING (2022)

Article Food Science & Technology

Elucidating the flavour of cooked white asparagus by combining metabolomics and taste panel analysis

Eirini Pegiou, Roland Mumm, Robert D. Hall

Summary: A study found that the chemical composition of asparagus changes during cooking, with some substances increasing and others decreasing. Profiles of asparagus metabolites were analyzed using GC-MS and LC-MS, and the flavor attributes were evaluated by a taste panel. The study revealed the key biochemical pathways and chemical transformations relevant to asparagus flavor.

LWT-FOOD SCIENCE AND TECHNOLOGY (2023)

Article Multidisciplinary Sciences

R-loop proximity proteomics identifies a role of DDX41 in transcription-associated genomic instability

Thorsten Mosler, Francesca Conte, Gabriel M. C. Longo, Ivan Mikicic, Nastasja Kreim, Martin M. Moeckel, Giuseppe Petrosino, Johanna Flach, Joan Barau, Brian Luke, Vassilis Roukos, Petra Beli

Summary: The study reveals that DDX41 plays a crucial role in regulating the accumulation of R-loop and double strand DNA breaks in gene promoters through mapping the R-loop proximal proteome in human cells.

NATURE COMMUNICATIONS (2021)

Review Multidisciplinary Sciences

Biological Sequence Classification: A Review on Data and General Methods

Chunyan Ao, Shihu Jiao, Yansu Wang, Liang Yu, Quan Zou

Summary: The rapid growth of biological sequences has driven the application of machine learning in this field, focusing on function and modification classification. The review establishes a support website that provides information and datasets for classification methods, and discusses current challenges and future prospects.

RESEARCH (2022)

Article Multidisciplinary Sciences

Pan-cancer analysis of non-oncogene addiction to DNA repair

Luis Bermudez-Guzman

Summary: This study demonstrates the existence and importance of non-oncogenic addiction to DNA repair in cancer, which may assist in identifying prognostic biomarkers and therapeutic opportunities.

SCIENTIFIC REPORTS (2021)

Article Biochemistry & Molecular Biology

Genomic patterns of transcription-replication interactions in mouse primary B cells

Commodore P. St Germain, Hongchang Zhao, Vrishti Sinha, Lionel A. Sanz, Frederic Chedin, Jacqueline H. Barlow

Summary: Conflicts between transcription and replication machinery can lead to replication stress and genome instability. The newly developed TRIPn-Seq method allows the identification of genomic loci prone to transcription-replication interactions. Using TRIPn-Seq, the authors mapped 1009 unique transcription-replication interactions in mouse primary B cells, which were enriched at transcription start sites and early replicating regions.

NUCLEIC ACIDS RESEARCH (2022)

Article Microbiology

Computational identification of promoters in Klebsiella aerogenes by using support vector machine

Yan Lin, Meili Sun, Junjie Zhang, Mingyan Li, Keli Yang, Chengyan Wu, Hasan Zulfiqar, Hongyan Lai

Summary: This study aims to develop a machine learning-based model for predicting promoters in Klebsiella aerogenes. The model utilizes a unique encoding and optimization method to accurately identify promoter sequences in K. aerogenes.

FRONTIERS IN MICROBIOLOGY (2023)

Article Chemistry, Medicinal

Improving DNA-Binding Protein Prediction Using Three-Part Sequence-Order Feature Extraction and a Deep Neural Network Algorithm

Jun Hu, Wen-Wu Zeng, Ning-Xin Jia, Muhammad Arif, Dong-Jun Yu, Gui-Jun Zhang

Summary: A new three-part sequence order feature extraction strategy (TPSO) is developed to predict DNA-binding protein (DBP) by extracting more discriminative information from protein sequences. A deep learning-based method called TPSO-DBP is proposed, which achieves an accuracy of 87.01% and a significantly higher Matthews correlation coefficient value (0.741) compared to existing DBP prediction methods.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2023)

Article Chemistry, Applied

Untargeted lipidomic approach in studying pinot noir wine lipids and predicting wine origin

Quynh Phan, Elizabeth Tomasino

Summary: This study utilized an advanced lipidomic profiling approach to analyze commercial Pinot noir wines, revealing that wine lipids have a strong potential for classifying wines by origin, with the top 58 lipids playing a significant role in discrimination.

FOOD CHEMISTRY (2021)

Article Environmental Sciences

A Multifactor-Based Random Forest Regression Model to Reconstruct a Continuous Deformation Map in Xi'an, China

Xinxin Guo, Chaoying Zhao, Guangrong Li, Mimi Peng, Qin Zhang, Pierluigi Confuorto, Federico Raspini, Matteo Del Soldato, Chiara Cappadonia, Simon Plank, Mariano Di Napoli

Summary: Synthetic Aperture Radar Interferometry (InSAR) is an effective technique for monitoring large-scale ground deformation with high spatial resolution. However, it is challenging to obtain a spatially continuous deformation map due to SAR decorrelation or distortion. In this study, we propose a multifactor-based machine learning model called the K-RFR model, which combines K-means clustering and random forest regression algorithms to reconstruct a continuous deformation map. The model takes into account various influence factors on ground deformation, such as land use, geological engineering, and groundwater extraction. The study conducted in Xi'an, China, using the SBAS-InSAR technique, demonstrates the effectiveness of the proposed model in predicting ground deformation. The new model outperforms traditional interpolation methods, achieving a higher correlation coefficient with the InSAR measurements.

REMOTE SENSING (2023)

Review Biochemical Research Methods

A comprehensive review and evaluation of computational methods for identifying protein complexes from protein-protein interaction networks

Zhourun Wu, Qing Liao, Bin Liu

BRIEFINGS IN BIOINFORMATICS (2020)

Article Biochemical Research Methods

MotifCNN-fold: protein fold recognition based on fold-specific features extracted by motif-based convolutional neural networks

Chen-Chen Li, Bin Liu

BRIEFINGS IN BIOINFORMATICS (2020)

Article Biochemical Research Methods

DeepSVM-fold: protein fold recognition by combining support vector machines and pairwise sequence similarity scores generated by deep learning networks

Bin Liu, Chen-Chen Li, Ke Yan

BRIEFINGS IN BIOINFORMATICS (2020)

Article Biochemical Research Methods

Fold-LTR-TCP: protein fold recognition based on triadic closure principle

Bin Liu, Yulin Zhu, Ke Yan

BRIEFINGS IN BIOINFORMATICS (2020)

Article Medicine, Research & Experimental

sgRNA-PSM: Predict sgRNAs On-Target Activity Based on Position-Specific Mismatch

Bin Liu, Zhihua Luo, Juan He

MOLECULAR THERAPY-NUCLEIC ACIDS (2020)

Article Biochemical Research Methods

IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning

Yi-Jun Tang, Yi-He Pang, Bin Liu

BIOINFORMATICS (2020)

Article Biochemical Research Methods

FoldRec-C2C: protein fold recognition by combining cluster-to-cluster model and protein similarity network

Jiangyi Shao, Ke Yan, Bin Liu

Summary: The FoldRec-C2C predictor globally incorporates protein interactions for protein fold recognition, treating it as an information retrieval task in natural language processing. Tested on the LINDAHL dataset, FoldRec-C2C outperforms 34 state-of-the-art methods in the field.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biochemical Research Methods

ProtFold-DFG: protein fold recognition by combining Directed Fusion Graph and PageRank algorithm

Jiangyi Shao, Bin Liu

Summary: This study introduces a network-based predictor ProtFold-DFG for protein fold recognition, utilizing Directed Fusion Graph (DFG), KL divergence, and PageRank algorithm to enhance recognition accuracy. Experimental results demonstrate that ProtFold-DFG outperforms 35 other methods on the LINDAHL dataset, making it a promising approach for protein fold recognition.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biology

iPiDA-sHN: Identification of Piwi-interacting RNA-disease associations by selecting high quality negative samples

Hang Wei, Yuxin Ding, Bin Liu

COMPUTATIONAL BIOLOGY AND CHEMISTRY (2020)

Article Biochemical Research Methods

Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores

Ke Yan, Jie Wen, Jin-Xing Liu, Yong Xu, Bin Liu

Summary: The study proposed two novel algorithms, TSVM-fold and ESVM-fold, utilizing sequence similarity scores generated by multiple template-based methods for protein fold recognition prediction. Experimental results showed that these algorithms outperform some state-of-the-art methods in rigorous benchmark datasets.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2021)

Article Biochemical Research Methods

iLncRNAdis-FB: Identify lncRNA-Disease Associations by Fusing Biological Feature Blocks Through Deep Neural Network

Hang Wei, Qing Liao, Bin Liu

Summary: Identifying lncRNA-disease associations is crucial for exploring disease mechanisms and molecular drug discovery. However, current fusion strategies fail to remove noisy and irrelevant information, leading to low predictive performance. iLncRNAdis-FB proposes a new computational predictor based on CNN to integrate feature blocks from different data sources, achieving better prediction accuracy.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2021)

Article Biochemical Research Methods

RFPR-IDP: reduce the false positive rates for intrinsically disordered protein and region prediction by incorporating both fully ordered proteins and disordered proteins

Yumeng Liu, Xiaolong Wang, Bin Liu

Summary: Intrinsically disordered proteins/regions (IDPs/IDRs) are important for biological functions, and accurate prediction is crucial for protein structure and function predictions. However, most existing methods tend to predict fully ordered proteins as disordered, ignoring the fact that most newly sequenced proteins are fully ordered. The proposed RFPR-IDP method, trained on both ordered and disordered proteins, outperforms existing predictors in predicting both ordered and disordered proteins.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biochemical Research Methods

idenPC-MIIP: identify protein complexes from weighted PPI networks using mutual important interacting partner relation

Zhourun Wu, Qing Liao, Bin Liu

Summary: Protein complexes are key units for studying a cell system, and high-throughput approaches have enabled the determination of PPI data. The proposed mutual important interacting partner relation and the new algorithm idenPC-MIIP show improved performance in identifying protein complexes compared to existing methods.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Computer Science, Artificial Intelligence

MLDH-Fold: Protein fold recognition based on multi-view low-rank modeling

Ke Yan, Jie Wen, Yong Xu, Bin Liu

Summary: Protein fold recognition is crucial for understanding protein functions and drug design. New methods (MVLR and MLDH-Fold) were proposed to improve predictive performance by combining different views of protein sequences. Experimental results show that these computational methods outperform other predictors, indicating their usefulness for protein fold recognition.

NEUROCOMPUTING (2021)

Article Biochemical Research Methods

HITS-PR-HHblits: protein remote homology detection by combining PageRank and Hyperlink-Induced Topic Search

Bin Liu, Shuangyan Jiang, Quan Zou

BRIEFINGS IN BIOINFORMATICS (2020)
