4.7 Article

Accurate, scalable cohort variant calls using DeepVariant and GLnexus

Journal

BIOINFORMATICS
Volume 36, Issue 24, Pages 5582-5589

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btaa1081

Funding

  1. NIH [U01HG007417]
  2. NCI [U01 HG007301]
  3. NHGRI [3UM1HG008901-03S1]

The study presents an open-source cohort-calling method that uses DeepVariant and GLnexus to produce analysis-ready cohort-level variant calls, and reports superior callset quality compared to a GATK Best Practices pipeline on the 1000 Genomes Project samples.
Motivation: Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging.

Results: We introduce an open-source cohort-calling method that uses the highly accurate caller DeepVariant and scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimize the method across a range of cohort sizes, sequencing methods and sequencing depths. The resulting callsets show consistent quality improvements over those generated using existing best practices with reduced cost. We further evaluate our pipeline in the deeply sequenced 1000 Genomes Project (1KGP) samples and show superior callset quality metrics and imputation reference panel performance compared to an independently generated GATK Best Practices pipeline.

Availability and implementation: We publicly release the 1KGP individual-level variant calls and cohort callset (https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP) to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation. Both DeepVariant (https://github.com/google/deepvariant) and GLnexus (https://github.com/dnanexus-rnd/GLnexus) are open-source, and the optimized GLnexus setup discovered in this study is also integrated into GLnexus public releases v1.2.2 and later.

Contact: cym@google.com

Supplementary information: Supplementary data are available at Bioinformatics online.
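The callset quality metrics named in the abstract (precision and recall against benchmark samples, and Mendelian consistency in trios) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the representation of a diploid genotype as a tuple of allele indices, and the toy inputs are all assumptions made for illustration, and the sketch ignores missing calls, sex chromosomes and genuine de novo mutations.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """Return True if the child's diploid genotype can be formed by taking
    one allele from the father and one allele from the mother.

    Genotypes are tuples of allele indices, e.g. (0, 1) for a heterozygote
    (hypothetical representation chosen for this sketch).
    """
    child_sorted = sorted(child)
    return any(
        sorted((f, m)) == child_sorted for f, m in product(father, mother)
    )


def mendelian_violation_rate(trio_genotypes):
    """Fraction of sites where the child's genotype is inconsistent with
    the parents' genotypes. Each item is (child, father, mother)."""
    sites = list(trio_genotypes)
    if not sites:
        return 0.0
    violations = sum(
        not is_mendelian_consistent(c, f, m) for c, f, m in sites
    )
    return violations / len(sites)


def precision_recall(true_positives, false_positives, false_negatives):
    """Precision and recall of a callset against a benchmark truth set."""
    called = true_positives + false_positives
    in_truth = true_positives + false_negatives
    precision = true_positives / called if called else 0.0
    recall = true_positives / in_truth if in_truth else 0.0
    return precision, recall
```

For example, a heterozygous child `(0, 1)` with parents `(0, 0)` and `(1, 1)` is consistent, whereas a homozygous-alternate child `(1, 1)` with a homozygous-reference father `(0, 0)` is counted as a violation.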

Recommended

Article Mathematics, Applied

RAINBOW GRAPHS AND SWITCHING CLASSES

Suho Oh, Hwanchul Yoo, Taedong Yun

SIAM JOURNAL ON DISCRETE MATHEMATICS (2013)

Article Biochemical Research Methods

A crowdsourced set of curated structural variants for the human genome

Lesley M. Chapman, Noah Spies, Patrick Pai, Chun Shen Lim, Andrew Carroll, Giuseppe Narzisi, Christopher M. Watson, Christos Proukakis, Wayne E. Clarke, Naoki Nariai, Eric Dawson, Garan Jones, Daniel Blankenberg, Christian Brueffer, Chunlin Xiao, Rohit Raj Kolora, Noah Alexander, Paul Wolujewicz, Azza E. Ahmed, Graeme Smith, Saadlee Shehreen, Aaron M. Wenger, Marc Salit, Justin M. Zook

PLOS COMPUTATIONAL BIOLOGY (2020)

Article Multidisciplinary Sciences

Pangenomics enables genotyping of known structural variants in 5202 diverse genomes

Jouni Siren, Jean Monlong, Xian Chang, Adam M. Novak, Jordan M. Eizenga, Charles Markello, Jonas A. Sibbesen, Glenn Hickey, Pi-Chuan Chang, Andrew Carroll, Namrata Gupta, Stacey Gabriel, Thomas W. Blackwell, Aakrosh Ratan, Kent D. Taylor, Stephen S. Rich, Jerome Rotter, David Haussler, Erik Garrison, Benedict Paten

Summary: Giraffe is a pangenome short-read mapper that efficiently maps to a collection of haplotypes threaded through a sequence graph. It speeds up mapping to thousands of human genomes and enables improved accuracy in genome-wide genotyping, ultimately enhancing genomic analyses. This tool facilitates a more comprehensive characterization of variation and has the potential to benefit various genomic studies.

SCIENCE (2021)

Article Multidisciplinary Sciences

DeepNull models non-linear covariate effects to improve phenotypic prediction and association power

Zachary R. McCaw, Thomas Colthurst, Taedong Yun, Nicholas A. Furlotte, Andrew Carroll, Babak Alipanahi, Cory Y. McLean, Farhad Hormozdiari

Summary: DeepNull is a method that uses deep learning to identify and adjust for non-linear relationships, improving statistical power and phenotypic prediction in genome-wide association studies (GWAS).

NATURE COMMUNICATIONS (2022)

Article Biology

A population-specific reference panel for improved genotype imputation in African Americans

Jared O'Connell, Taedong Yun, Meghan Moreno, Helen Li, Nadia Litterman, Alexey Kolesnikov, Elizabeth Noblin, Pi-Chuan Chang, Anjali Shastri, Elizabeth H. Dorfman, Suyash Shringarpure, Adam Auton, Andrew Carroll, Cory Y. McLean, Stella Aslibekyan, Elizabeth Babalola, Robert K. Bell, Jessica Bielenberg, Katarzyna Bryc, Emily Bullis, Daniella Coker, Gabriel Cuellar Partida, Devika Dhamija, Sayantan Das, Sarah L. Elson, Teresa Filshtein, Kipper Fletez-Brant, Pierre Fontanillas, Will Freyman, Pooja M. Gandhi, Karl Heilbron, Alejandro Hernandez, Barry Hicks, David A. Hinds, Ethan M. Jewett, Yunxuan Jiang, Katelyn Kukar, Keng-Han Lin, Maya Lowe, Jey McCreight, Matthew H. McIntyre, Steven J. Micheletti, Joanna L. Mountain, Priyanka Nandakumar, Aaron A. Petrakovitz, G. David Poznik, Morgan Schumacher, Janie F. Shelton, Jingchunzi Shi, Christophe Toukam Tchakoute, Vinh Tran, Joyce Y. Tung, Xin Wang, Wei Wang, Catherine H. Weldon, Peter Wilton, Corinna Wong

Summary: A new genome-wide imputation reference panel comprising 2,269 individuals of Sub-Saharan African ancestries was constructed by O'Connell et al. Using DeepVariant, they created best practices for reference panel development and generated a high-quality resource that will empower high-resolution genome-wide imputation efforts for individuals with African ancestries. The raw sequencing data, variant calls, and imputation panel for this cohort are freely available via dbGaP, serving as an invaluable resource for further study of admixed African genetics.

COMMUNICATIONS BIOLOGY (2021)

Article Biochemistry & Molecular Biology

A complete pedigree-based graph workflow for rare candidate variant analysis

Charles Markello, Charles Huang, Alex Rodriguez, Andrew Carroll, Pi-Chuan Chang, Jordan Eizenga, Thomas Markello, David Haussler, Benedict Paten

Summary: This study introduces a pedigree-aware workflow based on pangenome graphs to improve the accuracy of genome mapping and variant calling. The workflow shows significant improvements in single-nucleotide variants and insertion/deletion variants compared to linear-reference mapping and pangenome graph mapping. Additionally, the study adapts and upgrades deleterious-variant detecting methods for streamlined application in undiagnosed diseases.

GENOME RESEARCH (2022)

Article Biotechnology & Applied Microbiology

DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer

Gunjan Baid, Daniel E. Cook, Kishwar Shafin, Taedong Yun, Felipe Llinares-Lopez, Quentin Berthet, Anastasiya Belyaeva, Armin Topfer, Aaron M. Wenger, William J. Rowell, Howard Yang, Alexey Kolesnikov, Waleed Ammar, Jean-Philippe Vert, Ashish Vaswani, Cory Y. McLean, Maria Nattestad, Pi-Chuan Chang, Andrew Carroll

Summary: DeepConsensus improves the accuracy and quality of PacBio HiFi reads by significantly reducing read errors through sequence correction.

NATURE BIOTECHNOLOGY (2023)

Article Biochemical Research Methods

Improving variant calling using population data and deep learning

Nae-Chyun Chen, Alexey Kolesnikov, Sidharth Goel, Taedong Yun, Pi-Chuan Chang, Andrew Carroll

Summary: In this study, population-aware DeepVariant models were developed to improve the accuracy and recall of variant calling in single samples. By using allele frequencies from the 1000 Genomes Project, these models reduced variant calling errors and improved the precision of rare homozygous and pathogenic ClinVar calls. The study also found that diverse reference panels were more accurate than population-specific panels, even when the sample ancestry matched the population.

BMC BIOINFORMATICS (2023)

Article Genetics & Heredity

Inference of chronic obstructive pulmonary disease with deep learning on raw spirograms identifies new genetic loci and improves risk models

Justin Cosentino, Babak Behsaz, Babak Alipanahi, Zachary R. McCaw, Davin Hill, Tae-Hwi Schwantes-An, Dongbing Lai, Andrew Carroll, Brian D. Hobbs, Michael H. Cho, Cory Y. McLean, Farhad Hormozdiari

Summary: A deep convolutional neural network is utilized to predict COPD case-control status using raw spirograms and noisy medical-record-based labels. The machine-learning-based liability score accurately distinguishes COPD cases and controls, predicts COPD-related hospitalization, and is associated with overall survival and exacerbation events. The genome-wide association study on the liability score replicates known COPD and lung function loci and identifies new loci.

NATURE GENETICS (2023)
