Article

Constructing germline research cohorts from the discarded reads of clinical tumor sequences

GENOME MEDICINE (2021)

Journal

GENOME MEDICINE

Volume 13, Issue 1

Publisher

BMC

DOI: 10.1186/s13073-021-00999-4

Category

Genetics & Heredity

Funding

NIH [K25HL121295, U01HG009080, R01HG006399, R01CA227237, R01ES029929, R01HG011345]
DoD [W81XWH-16-2-0018]
Chan Zuckerberg Science Initiative
Doris Duke Charitable Foundation
Louis B. Mayer Foundation
Claudia Adams Barr Foundation
[R01CA244569]

Summary

We developed a framework for inferring germline variants from tumor panel sequencing, with high to moderate accuracy for imputed common variants, genetic ancestry, polygenic risk scores, and individual HLA alleles. The approach shows the feasibility and utility of using targeted tumor sequencing to build rich germline research cohorts, enabling study of the relationships between genetic ancestry, polygenic risk, and tumor characteristics that could not be studied with conventional on-target tumor data. The analysis pipeline is publicly available to facilitate this effort.

Abstract

Background: Hundreds of thousands of cancer patients have undergone targeted (panel) tumor sequencing to identify clinically meaningful mutations. In addition to improving patient outcomes, this activity has led to significant discoveries in basic and translational research. However, the targeted nature of clinical tumor sequencing limits its scope, especially for germline genetics. In this work, we assess the utility of discarded, off-target reads from tumor-only panel sequencing for recovering genome-wide germline genotypes through imputation.

Methods: We developed a framework for inferring germline variants from tumor panel sequencing, covering imputation, quality control, and inference of genetic ancestry, germline polygenic risk scores, and HLA alleles. We benchmarked the framework on 833 individuals with tumor sequencing and matched germline SNP array data, and then applied it to a prospectively collected panel sequencing cohort of 25,889 tumors.

Results: Each inferred feature showed high to moderate accuracy relative to direct germline SNP array genotyping: individual common variants were imputed with a mean accuracy (correlation) of 0.86, genetic ancestry was inferred with a correlation of > 0.98, polygenic risk scores with a correlation of > 0.90, and individual HLA alleles with a correlation of > 0.80. Somatic copy number alterations and other tumor features had minimal influence on this accuracy. We showcase the feasibility and utility of the framework by analyzing 25,889 tumors and identifying relationships between genetic ancestry, polygenic risk, and tumor characteristics that could not be studied with conventional on-target tumor data.

Conclusions: Targeted tumor sequencing can be leveraged to build rich germline research cohorts from existing data. We make our analysis pipeline publicly available to facilitate this effort.
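The accuracy figures in the Results are correlations between values inferred from off-target reads and values from direct SNP array genotyping. The following is a minimal Python sketch of how such metrics might be computed, assuming genotypes are coded as alternate-allele dosages (0 to 2) with one list per variant, and assuming a simple additive polygenic risk score (a weighted sum of dosages). The function names and toy data are hypothetical illustrations, not the authors' published pipeline.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def mean_variant_accuracy(imputed: List[List[float]], array: List[List[float]]) -> float:
    """Mean over variants of the per-variant correlation between imputed
    dosages and array genotypes (hypothetical layout: [variant][individual])."""
    rs = [pearson(d, g) for d, g in zip(imputed, array)]
    return sum(rs) / len(rs)


def polygenic_score(dosages: List[List[float]], weights: Sequence[float]) -> List[float]:
    """Additive score per individual: PRS_i = sum over v of w_v * dosage[v][i]
    (assumed model; real scores may also include QC and normalization steps)."""
    n_ind = len(dosages[0])
    return [sum(w * dosages[v][i] for v, w in enumerate(weights)) for i in range(n_ind)]


# Hypothetical toy data: 2 variants x 4 individuals.
array_geno = [[0, 1, 2, 1], [2, 1, 0, 0]]
imputed_dos = [[0.1, 0.9, 1.8, 1.2], [1.9, 1.1, 0.2, 0.1]]
weights = [0.5, -0.2]

print("mean per-variant r:", mean_variant_accuracy(imputed_dos, array_geno))
print("PRS r:", pearson(polygenic_score(imputed_dos, weights),
                        polygenic_score(array_geno, weights)))
```

Computing accuracy per variant and then averaging mirrors the "mean accuracy (correlation)" wording in the Results; the score-level correlation is a separate comparison, since errors at individual variants can partly average out in a weighted sum.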