Article

Amino Acid Changes in Disease-Associated Variants Differ Radically from Variants Observed in the 1000 Genomes Project Dataset

PLOS COMPUTATIONAL BIOLOGY (2013)

Journal

PLOS COMPUTATIONAL BIOLOGY

Volume 9, Issue 12, Pages -

Publisher

PUBLIC LIBRARY OF SCIENCE

DOI: 10.1371/journal.pcbi.1003382

Keywords

-

Categories

Biochemical Research Methods; Mathematical & Computational Biology

Funding

National Institutes of Health [GM094585]
U.S. Department of Energy, Office of Biological and Environmental Research [DE-AC02-06CH11357]
EMBL-EBI

Abstract

The 1000 Genomes Project data provides a natural background dataset for amino acid germline mutations in humans. Since the direction of mutation is known, the amino acid exchange matrix generated from the observed nucleotide variants is asymmetric and the mutabilities of the different amino acids are very different. These differences predominantly reflect preferences for nucleotide mutations in the DNA (especially the high mutation rate of the CpG dinucleotide, which makes arginine mutability very much higher than other amino acids) rather than selection imposed by protein structure constraints, although there is evidence for the latter as well. The variants occur predominantly on the surface of proteins (82%), with a slight preference for sites which are more exposed and less well conserved than random. Mutations to functional residues occur about half as often as expected by chance. The disease-associated amino acid variant distributions in OMIM are radically different from those expected on the basis of the 1000 Genomes dataset. The disease-associated variants preferentially occur in more conserved sites, compared to 1000 Genomes mutations. Many of the amino acid exchange profiles appear to exhibit an anti-correlation, with common exchanges in one dataset being rare in the other. Disease-associated variants exhibit more extreme differences in amino acid size and hydrophobicity. More modelling of the mutational processes at the nucleotide level is needed, but these observations should contribute to an improved prediction of the effects of specific variants in humans.
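The asymmetry and mutability described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy variant list, and the background residue counts below are all hypothetical, chosen only to show how a directed (reference to variant) exchange matrix differs from a symmetric one and how a per-residue mutability might be normalised.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def exchange_matrix(variants):
    """Count directed exchanges: matrix[ref][alt] = number of ref -> alt variants."""
    matrix = {a: {b: 0 for b in AMINO_ACIDS} for a in AMINO_ACIDS}
    for ref, alt in variants:
        if ref != alt:  # ignore synonymous entries
            matrix[ref][alt] += 1
    return matrix


def mutability(matrix, background_counts):
    """Mutations away from each residue, per occurrence of that residue."""
    return {
        a: sum(matrix[a].values()) / background_counts[a]
        for a in AMINO_ACIDS
        if background_counts.get(a, 0) > 0
    }


def asymmetry(matrix, a, b):
    """Difference between a -> b and b -> a counts; zero for a symmetric pair."""
    return matrix[a][b] - matrix[b][a]


# Hypothetical toy data (assumption): arginine is deliberately over-represented
# as the reference residue to mimic the CpG effect mentioned in the abstract.
toy_variants = [
    ("R", "H"), ("R", "H"), ("R", "Q"), ("R", "C"),
    ("H", "R"), ("A", "V"), ("V", "A"), ("A", "T"),
]
# Hypothetical counts of each residue in the background proteome (assumption).
toy_background = Counter({"R": 10, "H": 10, "A": 20, "V": 20})

m = exchange_matrix(toy_variants)
mut = mutability(m, toy_background)

print("R->H vs H->R asymmetry:", asymmetry(m, "R", "H"))
print("mutability:", mut)
```

Because the direction of each change is kept, `matrix["R"]["H"]` and `matrix["H"]["R"]` are counted separately; a real analysis would also need to account for codon usage and nucleotide context, which this sketch omits.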

Recommended

Editorial Material Biochemistry & Molecular Biology

1000 Genomes Project phase 4: The gift that keeps on giving

Neil A. Hanchard, Ananyo Choudhury

Summary: This article discusses the release of expanded high-depth sequencing data for the fourth phase of the 1000 Genomes Project. It explores how this dataset can serve as a more comprehensive and accurate resource for global genomics through extensive comparisons and benchmarks.

CELL (2022)

Article Biochemistry & Molecular Biology

IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes

Abhinav Jain, Rahul C. Bhoyar, Kavita Pandhare, Anushree Mishra, Disha Sharma, Mohamed Imran, Vigneshwar Senthivel, Mohit Kumar Divakar, Mercy Rophina, Bani Jolly, Arushi Batra, Sumit Sharma, Sanjay Siwach, Arun G. Jadhao, Nikhil Palande, Ganga Nath Jha, Nishat Ashrafi, Prashant Kumar Mishra, A. K. Vidhya, Suman Jain, Debasis Dash, Nachimuthu Senthil Kumar, Andrew Vanlallawma, Ranjan Jyoti Sarma, Lalchhandama Chhakchhuak, Shantaraman Kalyanaraman, Radha Mahadevan, Sunitha Kandasamy, B. M. Pabitha, Raskin Erusan Rajagopal, Ezhil J. Ramya, Nirmala P. Devi, Anjali Bajaj, Vishu Gupta, Samatha Mathew, Sangam Goswami, Mohit Mangla, Savinitha Prakash, Kandarp Joshi, S. Sreedevi, Devarshi Gajjar, Ronibala Soraisham, Rohit Yadav, Yumnam Silla Devi, Aayush Gupta, Mitali Mukerji, Sivaprakash Ramalingam, B. K. Binukumar, Vinod Scaria, Sridhar Sivasubbu

Summary: With the advent of next-generation sequencing, a pilot phase of the 'IndiGen' program performed whole genome sequencing of 1029 healthy Indian individuals to create the IndiGenomes database, which is now freely accessible. This comprehensive genetic variant resource for the Indian population has been extensively accessed by the worldwide community.

NUCLEIC ACIDS RESEARCH (2021)

Article Biology

Deciphering the Molecular Mechanism Underlying African Animal Trypanosomiasis by Means of the 1000 Bull Genomes Project Genomic Dataset

Abirami Rajavel, Selina Klees, Yuehan Hui, Armin Otto Schmitt, Mehmet Gultas

Summary: Climate change increases the risk of spreading vector-borne diseases like African Animal Trypanosomiasis (AAT). This study investigates the genetic mechanisms involved in AAT susceptibility and tolerance in cattle breeds. The findings provide insights into the regulatory SNPs, gene expression profiles, and downstream effectors associated with AAT, contributing to a better understanding of resistance and susceptibility in cattle breeds.

BIOLOGY-BASEL (2022)

Article Genetics & Heredity

Major sex differences in allele frequencies for X chromosomal variants in both the 1000 Genomes Project and gnomAD

Zhong Wang, Lei Sun, Andrew Paterson

Summary: This study found significant sex differences in genetic variations on the X chromosome, which has important implications for disease and trait associations. Variations in pseudoautosomal regions and X-transposed regions showed larger differences compared to non-pseudoautosomal regions. These findings need to be taken into account in X chromosome analyses.

PLOS GENETICS (2022)

Article Biochemistry & Molecular Biology

High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios

Marta Byrska-Bishop, Uday S. Evani, Xuefang Zhao, Anna O. Basile, Haley J. Abel, Allison A. Regier, Andre Corvelo, Wayne E. Clarke, Rajeeva Musunuri, Kshithija Nagulapalli, Susan Fairley, Alexi Runnels, Lara Winterkorn, Ernesto Lowy, Paul Flicek, Soren Germer, Harrison Brand, Ira M. Hall, Michael E. Talkowski, Giuseppe Narzisi, Michael C. Zody

Summary: The 1000 Genomes Project is the largest open resource of whole-genome sequencing data, and a new high-coverage WGS 1kGP resource has been released, improving the sensitivity and accuracy of variant calls and making it more valuable for association studies.

CELL (2022)

Article Clinical Neurology

Expanded CAG Repeats in ATXN1, ATXN2, ATXN3, and HTT in the 1000 Genomes Project

Fulya Akcimen, Jay P. Ross, Calwing Liao, Dan Spiegelman, Patrick A. Dion, Guy A. Rouleau

Summary: The study identified HTT-positive, ATXN2-positive, ATXN3-positive, and possibly ATXN1-positive samples in 0.5% of these populations, indicating the presence of asymptomatic small expanded repeats. There was no correlation between repeat sizes of different genes and the distribution of CAG alleles varied by ethnicity.

MOVEMENT DISORDERS (2021)

Article Biochemical Research Methods

Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database

Kshitij Srivastava, Anne-Sophie Fratzscher, Bo Lan, Willy Albert Flegel

Summary: This study identified long reference sequences by analyzing homozygous genomic regions in the 1000 Genomes Project database. A total of 902 ACKR1 haplotypes of varying lengths were confirmed, with the longest being 80,584 nucleotides and the shortest being 1,901 nucleotides. The approach of using tracts of homozygosity for definitive reference sequences is scalable and can be applied to any gene.

BMC BIOINFORMATICS (2021)

Article Genetics & Heredity

de novo variant calling identifies cancer mutation signatures in the 1000 Genomes Project

Jeffrey K. Ng, Pankaj Vats, Elyn Fritz-Waters, Stephanie Sarkar, Eleanor I. Sams, Evin M. Padhi, Zachary L. Payne, Shawn Leonard, Marc A. West, Chandler Prince, Lee Trani, Marshall Jansen, George Vacek, Mehrzad Samadi, Timothy T. Harkins, Craig Pohl, Tychele N. Turner

Summary: In this study, a graphics processing units-based workflow was developed to accelerate the detection of de novo variants (DNVs). The workflow was applied to whole-genome sequencing data from different sources, revealing unexpected results in the DNV callsets and potential cell line artifacts. Additionally, mutation signature analysis identified associations with B-cell lymphoma and variants in DNA repair genes. These findings have important implications for reference building and disease-related projects.

HUMAN MUTATION (2022)

Article Biochemistry & Molecular Biology

The Changes in Canine Parvovirus Variants over the Years

Xiangqi Hao, Yanchao Li, Xiangyu Xiao, Bo Chen, Pei Zhou, Shoujun Li

Summary: Canine parvovirus (CPV-2) is a significant pathogen in dogs. Despite the development of vaccines, CPV-2 continues to circulate in the dog population. CPV-2c is replacing CPV-2a as the dominant variant in Asia, South America, North America, and Africa, with evidence of spillover.

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (2022)

Article Genetics & Heredity

A systematic analysis of splicing variants identifies new diagnoses in the 100,000 Genomes Project

Alexander J. M. Blakes, Htoo A. Wai, Ian Davies, Hassan E. Moledina, April Ruiz, Tessy Thomas, David Bunyan, N. Simon Thomas, Christine P. Burren, Lynn Greenhalgh, Melissa Lees, Amanda Pichini, Sarah F. Smithson, Ana Lisa Taylor Tavares, Peter O'Donovan, Andrew G. L. Douglas, Nicola Whiffin, Diana Baralle, Jenny Lord

Summary: This study examines the role of non-canonical splicing variants in rare genetic diseases using whole-genome sequencing data. The research shows that positions near splice sites and splicing branchpoints are constrained by purifying selection and harbor potentially damaging non-coding variants. The study also identifies new likely diagnoses for individuals with unsolved rare diseases.

GENOME MEDICINE (2022)

Article Medicine, Legal

An overview of SNP-SNP microhaplotypes in the 26 populations of the 1000 Genomes Project

Jiaming Xue, Shengqiu Qu, Mengyu Tan, Yuanyuan Xiao, Ranran Zhang, Dezhi Chen, Meili Lv, Yiming Zhang, Lin Zhang, Weibo Liang

Summary: Microhaplotypes (MHs) are a promising new type of forensic markers that have advantages such as low mutation rates, lack of stutter artifacts, and short amplicons. They have improved human identification, kinship analysis, ancestry prediction, and mixture deconvolution capabilities. While there are some public databases available for MHs, there is a need for a more comprehensive database that integrates information from other databases. This study successfully established a dual-SNP MH database (D-SNPsDB) for 26 populations, providing basic data such as physical positions, allele frequencies, and variant information.

INTERNATIONAL JOURNAL OF LEGAL MEDICINE (2022)

Article Multidisciplinary Sciences

Novel candidates of pathogenic variants of the BRCA1 and BRCA2 genes from a dataset of 3,552 Japanese whole genomes (3.5KJPNv2)

Hideki Tokunaga, Keita Iida, Atsushi Hozawa, Soichi Ogishima, Yoh Watanabe, Shogo Shigeta, Muneaki Shimada, Yumi Yamaguchi-Kabata, Shu Tadaka, Fumiki Katsuoka, Shin Ito, Kazuki Kumada, Yohei Hamanaka, Nobuo Fuse, Kengo Kinoshita, Masayuki Yamamoto, Nobuo Yaegashi, Jun Yasuda

Summary: Identifying population frequencies of pathogenic variants in BRCA1/2 genes is essential for estimating HBOC patient numbers, while detecting moderately penetrant HBOC gene variants in the population is critical for personalized health care. A prospective cohort study can provide valuable information, with computational scoring and MAF filtration being useful for identifying potentially pathogenic variants.

PLOS ONE (2021)

Article Genetics & Heredity

Open-Access Worldwide Population STR Database Constructed Using High-Coverage Massively Parallel Sequencing Data Obtained from the 1000 Genomes Project

Tamara Soledad Frontanilla, Guilherme Valle-Silva, Jesus Ayala, Celso Teixeira Mendes-Junior

Summary: By conducting a comprehensive genotyping analysis of STRs obtained from the 1000 Genome populations, we established a reliable open-access STR database and identified limitations of HipSTR in detecting longer alleles.

GENES (2022)

Article Genetics & Heredity

Analysis of the Batch Effect Due to Sequencing Center in Population Statistics Quantifying Rare Events in the 1000 Genomes Project

Iago Maceda, Oscar Lao

Summary: The 1000 Genomes Project (1000G) is a valuable dataset for genomics research. Recent studies have found ghost mutation signals in 1000G, which can affect follow-up studies. This study demonstrates the association between sequencing center and loss of function alleles, singletons, and patterns of archaic introgression in 1000G.

GENES (2022)

Article Genetics & Heredity

Targeting de novo loss-of-function variants in constrained disease genes improves diagnostic rates in the 100,000 Genomes Project

Eleanor G. Seaby, N. Simon Thomas, Amy Webb, Helen Brittain, Ana Lisa Taylor Tavares, Genomics England Consortium, Diana Baralle, Heidi L. Rehm, Anne O'Donnell-Luria, Sarah Ennis

Summary: This study proposed a screening method based on the LOEUF score to rapidly identify pathogenic variants and new diagnoses. The results showed that this method has high specificity and accuracy, and can identify diagnoses missed by 100KGP analysis.

HUMAN GENETICS (2023)

Article Biochemistry & Molecular Biology

The Enzyme Portal: an integrative tool for enzyme information and analysis

Rossana Zaru, Joseph Onwubiko, Antonio J. M. Ribeiro, Keeva Cochrane, Jonathan D. Tyzack, Venkatesh Muthukrishnan, Lukas Pravda, Janet M. Thornton, Claire O'Donovan, Sameer Velankar, Sandra Orchard, Andrew Leach, Maria J. Martin

Summary: Enzyme Portal serves as a free hub for researchers to easily access and explore enzyme-related information from various resources, addressing the challenge of time-consuming retrieval of scattered enzyme data.

FEBS JOURNAL (2022)

Letter Biochemical Research Methods

Srinivasan (1962-2021) in Bioinformatics and beyond

M. Michael Gromiha, Christine A. Orengo, Ramanathan Sowdhamini, Janet M. Thornton

BIOINFORMATICS (2022)

Article Biochemical Research Methods

phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets

Nicola R. De Maio, William Boulton, Lukas Weilguny, Conor Walker, Yatish Turakhia, Russell O. Corbett-Detig, Nick Goldman, Ville Mustonen, Joel Wertheim

Summary: This article introduces a new algorithm and software for efficiently simulating a large number of closely related genomes. The algorithm is based on the Gillespie approach and utilizes an efficient multi-layered search tree structure to achieve high computational efficiency, allowing integration with various evolutionary models.

PLOS COMPUTATIONAL BIOLOGY (2022)

Correction Multidisciplinary Sciences

Genomic reconstruction of the SARS-CoV-2 epidemic in England (vol 600, pg 506, 2021)

Harald S. Vohringer, Theo Sanderson, Matthew Sinnott, Nicola De Maio, Thuy Nguyen, Richard Goater, Frank Schwach, Ian Harrison, Joel Hellewell, Cristina V. Ariani, Sonia Goncalves, David K. Jackson, Ian Johnston, Alexander W. Jung, Callum Saint, John Sillitoe, Maria Suciu, Nick Goldman

NATURE (2022)

Article Biochemistry & Molecular Biology

Ensembl 2023

Fergal J. Martin, M. Ridwan Amode, Alisha Aneja, Olanrewaju Austine-Orimoloye, Andrey G. Azov, If Barnes, Arne Becker, Ruth Bennett, Andrew Berry, Jyothish Bhai, Simarpreet Kaur Bhurji, Alexandra Bignell, Sanjay Boddu, Paulo R. Branco Lins, Lucy Brooks, Shashank Budhanuru Ramaraju, Mehrnaz Charkhchi, Alexander Cockburn, Luca Da Rin Fiorretto, Claire Davidson, Kamalkumar Dodiya, Sarah Donaldson, Bilal El Houdaigui, Tamara El Naboulsi, Reham Fatima, Carlos Garcia Giron, Thiago Genez, Gurpreet S. Ghattaoraya, Jose Gonzalez Martinez, Cristi Guijarro, Matthew Hardy, Zoe Hollis, Thibaut Hourlier, Toby Hunt, Mike Kay, Vinay Kaykala, Tuan Le, Diana Lemos, Diego Marques-Coelho, Jose Carlos Marugan, Gabriela Alejandra Merino, Louisse Paola Mirabueno, Aleena Mushtaq, Syed Nakib Hossain, Denye N. Ogeh, Manoj Pandian Sakthivel, Anne Parker, Malcolm Perry, Ivana Pilizota, Irina Prosovetskaia, Jose G. Perez-Silva, Ahamed Imran Abdul Salam, Nuno Saraiva-Agostinho, Helen Schuilenburg, Dan Sheppard, Swati Sinha, Botond Sipos, William Stark, Emily Steed, Ranjit Sukumaran, Dulika Sumathipala, Marie-Marthe Suner, Likhitha Surapaneni, Kyosti Sutinen, Michal Szpak, Francesca Floriana Tricomi, David Urbina-Gomez, Andres Veidenberg, Thomas A. Walsh, Brandon Walts, Elizabeth Wass, Natalie Willhoft, Jamie Allen, Jorge Alvarez-Jarreta, Marc Chakiachvili, Bethany Flint, Stefano Giorgetti, Leanne Haggerty, Garth R. Ilsley, Jane E. Loveland, Benjamin Moore, Jonathan M. Mudge, John Tate, David Thybert, Stephen J. Trevanion, Andrea Winterbottom, Adam Frankish, Sarah E. Hunt, Magali Ruffier, Fiona Cunningham, Sarah Dyer, Robert D. Finn, Kevin L. Howe, Peter W. Harrison, Andrew D. Yates, Paul Flicek

Summary: Ensembl has been providing high-quality genomic resources for vertebrates and model organisms for over 20 years. With the increase in high-quality reference genomes and the development of pangenome representations, Ensembl aims to support downstream research by creating high-quality annotations, tools, and services for species across the tree of life. This report highlights Ensembl's resources for popular reference genomes, the growing annotations, updates to the Variant Effect Predictor, protein structure predictions, and the beta release of their new website.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

AlphaFold2 protein structure prediction: Implications for drug discovery

Neera Borkakoti, Janet M. Thornton

Summary: The drug discovery process involves designing compounds to selectively interact with their protein targets. Recent advancements in artificial intelligence have greatly improved the accuracy of protein structure prediction, making protein targets more accessible in the drug design process. In this perspective article, we highlight the importance of accurate protein structure prediction in various stages of small molecule drug discovery, discussing current capabilities and the potential impact of further evolution of predictive procedures.

CURRENT OPINION IN STRUCTURAL BIOLOGY (2023)

Article Biochemistry & Molecular Biology

Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins

Marcia A. Hasenahuer, Alba Sanchis-Juan, Roman A. Laskowski, James A. Baker, James D. Stephenson, Christine A. Orengo, F. Lucy Raymond, Janet M. Thornton

Summary: In this study, constrained coding regions (CCRs) in the human genome were identified using DNA sequencing data from healthy control populations. These regions lack protein-changing variants and have been under constraint during human evolution. The distribution of CCRs was explored and their co-occurrence with different protein functional features was analyzed. Functional amino acids involved in DNA/RNA interactions, protein-protein contacts, and catalytic sites were found to be highly constrained. Surprisingly, linear motifs, linear interacting peptides, disorder-order transitions, and liquid-liquid phase separating regions also showed strong association with constraint for variability.

JOURNAL OF MOLECULAR BIOLOGY (2023)

Article Biotechnology & Applied Microbiology

Dynamic, adaptive sampling during nanopore sequencing using Bayesian experimental design

Lukas Weilguny, Nicola De Maio, Rory Munro, Charlotte Manser, Ewan Birney, Matthew Loose, Nick Goldman

Summary: BOSS-RUNS is an algorithmic framework and software that dynamically updates decision strategies based on real-time updates of uncertainty at each genome position. It optimizes information gain by deciding whether to fully sequence each DNA fragment, leading to improved variant calling in microbial communities.

NATURE BIOTECHNOLOGY (2023)

Article Biochemistry & Molecular Biology

A structural biology community assessment of AlphaFold2 applications

Mehmet Akdel, Douglas E. Pires, Eduard Porta Pardo, Jurgen Janes, Arthur O. Zalevsky, Balint Meszaros, Patrick Bryant, Lydia L. Good, Roman A. Laskowski, Gabriele Pozzati, Aditi Shenoy, Wensi Zhu, Petras Kundrotas, Victoria Ruiz Serra, Carlos H. M. Rodrigues, Alistair S. Dunham, David Burke, Neera Borkakoti, Sameer Velankar, Adam Frost, Jerome Basquin, Kresten Lindorff-Larsen, Alex Bateman, Andrey Kajava, Alfonso Valencia, Sergey Ovchinnikov, Janani Durairaj, David B. Ascher, Janet M. Thornton, Norman E. Davey, Amelie Stein, Arne Elofsson, Tristan Croll, Pedro Beltrao

Summary: This study evaluates the performance of AlphaFold2 in structural biology applications and finds that it performs well and can partially replace experimentally determined structures, which is of great significance for life science research.

NATURE STRUCTURAL & MOLECULAR BIOLOGY (2022)

Editorial Material Biochemistry & Molecular Biology

The mission to ensure continued funding for excellent basic research

Angus I. Lamond, Ivan Dikic, Andre Nussenzweig, Christoph W. Mueller, Janet M. Thornton, Michael B. Yaffe

EMBO REPORTS (2023)

Article Genetics & Heredity

Maximum likelihood pandemic-scale phylogenetics

Nicola De Maio, Prabhav Kalaghatgi, Yatish Turakhia, Russell Corbett-Detig, Bui Quang Minh, Nick Goldman

Summary: Phylogenetics plays a crucial role in genomic epidemiology, and the COVID-19 pandemic has generated an unprecedented amount of genome sequence data for analysis. However, most phylogenetic approaches are unable to handle the scale of these datasets. This study presents a new method called 'MAximum Parsimonious Likelihood Estimation' (MAPLE) for likelihood-based phylogenetic analysis of large genomic datasets. MAPLE is faster, more accurate, and requires significantly less memory compared to existing maximum likelihood methods, enabling the analysis of millions of genomes.

NATURE GENETICS (2023)

Correction Multidisciplinary Sciences

Temporal changes in the gene expression heterogeneity during brain development and aging (vol 10, 4080, 2020)

Ulas Isildak, Mehmet Somel, Janet M. Thornton, Handan Melike Donertas

SCIENTIFIC REPORTS (2023)

Article Biochemistry & Molecular Biology

The 3D Modules of Enzyme Catalysis: Deconstructing Active Sites into Distinct Functional Entities

Ioannis G. Riziotis, Antonio J. M. Ribeiro, Neera Borkakoti, Janet M. Thornton

Summary: Enzyme catalysis is controlled by a limited set of residues and co-factors. By utilizing three-dimensional templates, recurring catalytic modules that are involved in metal ion, co-factor, and substrate binding can be identified. Some of these convergent modules perform specific catalytic functions, while enzymes that have diverged during evolution retain specific regions of their active site.

JOURNAL OF MOLECULAR BIOLOGY (2023)

Article Biochemical Research Methods

EzMechanism: an automated tool to propose catalytic mechanisms of enzyme reactions

Antonio J. M. Ribeiro, Ioannis G. Riziotis, Jonathan D. Tyzack, Neera Borkakoti, Janet M. Thornton

Summary: The rich literature on enzyme reaction mechanisms can serve as the foundation for new knowledge-based approaches to investigate enzyme mechanisms. In this study, a tool called EzMechanism is presented, which can automatically infer mechanistic paths for a given three-dimensional active site and enzyme reaction based on catalytic rules compiled from a database of enzyme mechanisms. EzMechanism facilitates and improves the generation of hypotheses by considering relevant information derived from literature on both related and unrelated enzymes.

NATURE METHODS (2023)

Article Evolutionary Biology

DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies

Paschalia Kapli, Ioanna Kotari, Maximilian J. Telford, Nick Goldman, Ziheng Yang

Summary: Inference of deep phylogenies has primarily used protein sequences, but our analysis shows that DNA sequences may be just as useful and should not be excluded. We conducted a simulation study and analyzed empirical data, which suggest that DNA sequences can recover the correct tree as often as protein sequences. Using DNA data has computational advantages and allows for advanced models that account for heterogeneity in the nucleotide-substitution process.

SYSTEMATIC BIOLOGY (2023)
