Article

Human Pol II promoter recognition based on primary sequences and free energy of dinucleotides

Journal

BMC BIOINFORMATICS
Volume 9

Publisher

BMC
DOI: 10.1186/1471-2105-9-113

Background: The promoter region plays an important role in determining where transcription of a particular gene is initiated. Computational prediction of eukaryotic Pol II promoter sequences is one of the most significant problems in sequence analysis, and existing promoter prediction methods are still far from satisfactory.

Results: We attempt to distinguish human Pol II promoter sequences from non-promoter sequences made up of exon and intron sequences. Four methods are used: two kinds of multifractal analysis performed on the numeric sequences obtained from the dinucleotide free energy, Z curve analysis, and a global descriptor of the promoter/non-promoter primary sequences. A total of 141 parameters are extracted from these methods and categorized into seven groups (methods). These parameters are used to define parameter spaces, and each promoter/non-promoter sequence is represented by a point in the corresponding space. All 120 possible combinations of the seven methods are tested. Using Fisher's linear discriminant algorithm with a relatively small number of parameters (96 and 117), we obtain satisfactory discriminant accuracies. In particular, with 117 parameters, the accuracies for the training and test sets reach 90.43% and 89.79%, respectively. A comparison with five other existing methods indicates that our methods perform better. Using the global descriptor method (36 parameters), 17 of the 18 experimentally verified promoter sequences of human chromosome 22 are correctly identified.

Conclusion: The high accuracies achieved suggest that the methods of this paper are useful for understanding the difficult problem of promoter prediction.
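As an illustration of the Z curve analysis mentioned in the abstract, the following is a minimal sketch of the standard Z curve transform, which maps a DNA sequence to a path in three dimensions: x tracks purines (A, G) versus pyrimidines (C, T), y tracks amino (A, C) versus keto (G, T) bases, and z tracks weak (A, T) versus strong (G, C) hydrogen bonding. The function name `z_curve` and the handling of ambiguous bases are assumptions made for this example; the abstract does not say which parameters the authors derive from the curve, so this should not be read as their implementation.

```python
def z_curve(seq):
    """Return the cumulative Z curve points (x_n, y_n, z_n) for a DNA sequence.

    Standard definition, for the first n bases:
      x_n = (A_n + G_n) - (C_n + T_n)   purine vs. pyrimidine
      y_n = (A_n + C_n) - (G_n + T_n)   amino vs. keto
      z_n = (A_n + T_n) - (G_n + C_n)   weak vs. strong H-bond
    """
    # Per-base increments that follow directly from the definition above.
    steps = {
        "A": (1, 1, 1),
        "C": (-1, 1, -1),
        "G": (1, -1, -1),
        "T": (-1, -1, 1),
    }
    x = y = z = 0
    points = []
    for base in seq.upper():
        # Assumption for this sketch: ambiguous bases (e.g. N) add a zero step.
        dx, dy, dz = steps.get(base, (0, 0, 0))
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    for n, point in enumerate(z_curve("ACGT"), start=1):
        print(n, point)
```

Numerical features taken from such a curve (for example, the coordinates of its end point) could then serve as coordinates of the point that represents a sequence before a linear discriminant is applied; which features to extract is a design choice left open here.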

Recommended

Article Biochemical Research Methods

Recognition of small molecule-RNA binding sites using RNA sequence and structure

Hong Su, Zhenling Peng, Jianyi Yang

Summary: A novel method called RNAsite was developed to predict small molecule-RNA binding sites, showing competitive and superior performance compared to other methods. The study investigated the influence of RNA structure's flexibility and conformational changes caused by ligand binding on RNAsite, as well as explored the possibility of improving RNAsite through geometry-based binding pocket detection.

BIOINFORMATICS (2021)

Article Biochemical Research Methods

Improved estimation of model quality using predicted inter-residue distance

Lisha Ye, Peikun Wu, Zhenling Peng, Jianzhao Gao, Jian Liu, Jianyi Yang

Summary: Protein model quality assessment is crucial in structure prediction, and QDistance is a new approach using inter-residue distances to estimate both global and local qualities. By comparing with reference models and incorporating predicted distances, QDistance proves to be competitive and robust in predicting protein structure quality.

BIOINFORMATICS (2021)

Article Biochemical Research Methods

Human host status inference from temporal microbiome changes via recurrent neural networks

Xingjian Chen, Lingjing Liu, Weitong Zhang, Jianyi Yang, Ka-Chun Wong

Summary: This study proposes a deep learning-based framework to infer human host status using longitudinal microbiome data. Results show that the method achieves robust performance compared to other classifiers and significantly reduces prediction time.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biochemistry & Molecular Biology

Accurate Sequence-Based Prediction of Deleterious nsSNPs with Multiple Sequence Profiles and Putative Binding Residues

Ruiyang Song, Baixin Cao, Zhenling Peng, Christopher J. Oldfield, Lukasz Kurgan, Ka-Chun Wong, Jianyi Yang

Summary: The article introduces a new sequence-based predictor, DMBS, which improves the accurate prediction of deleterious nsSNPs by optimizing conservation estimates and utilizing functional/binding residue annotations. Empirical results show that DMBS outperforms current methods in various benchmarks, demonstrating its effectiveness in guiding wet-lab experiments.

BIOMOLECULES (2021)

Article Biochemical Research Methods

Toward the assessment of predicted inter-residue distance

Zongyang Du, Zhenling Peng, Jianyi Yang

Summary: The study proposed 19 metrics to measure the accuracy of protein distance prediction, with some metrics showing high correlation with model accuracy. Experimental results showed that the metrics largely coincided with the official version when ranking distance prediction groups in CASP14. These findings suggest that the proposed metrics are effective in measuring distance prediction.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

On Monomeric and Multimeric Structures-Based Protein-Ligand Interactions

Yajun Dai, Yang Li, Liping Wang, Zhenling Peng, Jianyi Yang

Summary: In this study, the protein-ligand interactions in monomeric and quaternary structures were compared using molecular docking experiments and binding free energy estimations. The results showed that ligands in quaternary structures can simultaneously interact with multiple protein chains, and quaternary structures have lower binding free energy and more accurate ligand conformations compared to monomeric structures. Therefore, it is recommended to use quaternary structures in future studies on protein-ligand interactions.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2022)

Article Biochemical Research Methods

The trRosetta server for fast and accurate protein structure prediction

Zongyang Du, Hong Su, Wenkai Wang, Lisha Ye, Hong Wei, Zhenling Peng, Ivan Anishchenko, David Baker, Jianyi Yang

Summary: The trRosetta server is a web-based platform for fast and accurate protein structure prediction using deep learning and Rosetta, which distinguishes itself from other similar servers in terms of rapid and accurate prediction of novel structures.

NATURE PROTOCOLS (2021)

Article Biochemistry & Molecular Biology

RNALigands: a database and web server for RNA-ligand interactions

Saisai Sun, Jianyi Yang, Zhaolei Zhang

Summary: RNA molecules can fold into complex three-dimensional structures and interact with small molecule ligands. Researchers have established a database of RNA secondary structural motifs and bound small molecule ligands. They also developed a computational pipeline to predict RNA secondary structures and search for similar motifs and interacting small molecules. The server was successfully used to identify potential matches for a specific RNA sequence.

Article Chemistry, Multidisciplinary

Improved Protein Structure Prediction Using a New Multi-Scale Network and Homologous Templates

Hong Su, Wenkai Wang, Zongyang Du, Zhenling Peng, Shang-Hua Gao, Ming-Ming Cheng, Jianyi Yang

Summary: The accuracy of de novo protein structure prediction has been significantly improved in recent years with the introduction of deep learning techniques. The improved version trRosettaX utilizes a new multi-scale network and attention-based module to enhance prediction accuracy. trRosettaX shows significant improvements in contact precision and TM-score compared to trRosetta in CASP and CAMEO tests.

ADVANCED SCIENCE (2021)

Article Multidisciplinary Sciences

SARS-CoV-2 nucleocapsid protein binds host mRNAs and attenuates stress granules to impair host stress response

Syed Nabeel-Shah, Hyunmin Lee, Nujhat Ahmed, Giovanni L. Burke, Shaghayegh Farhangmehr, Kanwal Ashraf, Shuye Pu, Ulrich Braunschweig, Guoqing Zhong, Hong Wei, Hua Tang, Jianyi Yang, Edyta Marcon, Benjamin J. Blencowe, Zhaolei Zhang, Jack F. Greenblatt

Summary: The imbalance in immune response observed in SARS-CoV-2 infected patients may be due to the altered interaction between the nucleocapsid protein (N protein) and stress granule resident proteins, which in turn affects the stress response of host cells.

ISCIENCE (2022)

Article Multidisciplinary Sciences

Human disease prediction from microbiome data by multiple feature fusion and deep learning

Xingjian Chen, Zifan Zhu, Weitong Zhang, Yuchen Wang, Fuzhou Wang, Jianyi Yang, Ka-Chun Wong

Summary: Predicting human diseases from microbiome data is important in medical applications. Existing methods often overlook the abundance profiles of known and unknown microbial organisms, as well as the taxonomic relationships among them, resulting in information loss. To address these issues, we developed a comprehensive machine learning framework called MetaDR that combines deep learning and various information sources to predict human diseases.

ISCIENCE (2022)

Review Biochemistry & Molecular Biology

Protein structure prediction in the deep learning era

Zhenling Peng, Wenkai Wang, Renmin Han, Fa Zhang, Jianyi Yang

Summary: This article reviews the progress of deep learning-based protein structure prediction methods in the past two years, analyzes the advantages and disadvantages of the two-step and end-to-end approaches, emphasizes the value of developing both approaches, and points out the challenges in function-oriented protein structure prediction.

CURRENT OPINION IN STRUCTURAL BIOLOGY (2022)

Article Biochemistry & Molecular Biology

Improved protein structure prediction with trRosettaX2, AlphaFold2, and optimized MSAs in CASP15

Zhenling Peng, Wenkai Wang, Hong Wei, Xiaoge Li, Jianyi Yang

Summary: We present the results of our monomer and multimer structure prediction methods in CASP15. By utilizing complementary sequence databases and advanced database searching algorithms, we generated high-quality multiple sequence alignments (MSAs) and selected top MSAs for structure prediction. Our methods, named Yang-Server and Yang-Multimer, ranked first and fourth for monomer and multimer structure prediction, respectively. The predicted structure models by Yang-Server showed an average TM-score of 0.876 for 94 monomers, compared to 0.798 by the default AlphaFold2, while the predicted structure models by Yang-Multimer showed an average DockQ score of 0.464 for 42 multimers, compared to 0.389 by the default AlphaFold-Multimer. Factors such as improved MSAs, iterated modeling, and interplay between monomer and multimer structure prediction contributed to the improvement.

PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS (2023)

Article Computer Science, Interdisciplinary Applications

Single-sequence protein structure prediction using supervised transformer protein language models

Wenkai Wang, Zhenling Peng, Jianyi Yang

Summary: trRosettaX-Single is an automated algorithm for single-sequence protein structure prediction that predicts two-dimensional geometry and reconstructs three-dimensional structures using a multi-scale network and knowledge distillation. It outperforms AlphaFold2 and RoseTTAFold on orphan proteins and works well on human-designed proteins.

NATURE COMPUTATIONAL SCIENCE (2022)

Article Biochemical Research Methods

RNA Flexibility Prediction With Sequence Profile and Predicted Solvent Accessibility

Hong Wei, Boling Wang, Jianyi Yang, Jianzhao Gao

Summary: In this study, a new method called RNAbval was proposed to predict RNA B-factors using random forest and a comprehensive set of features. The method achieved a significant improvement of 9.2-20.5 percent over the state-of-the-art method on two benchmark test datasets. The proposed method is available for access online.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2021)
