
Measurements of Intrahost Viral Diversity Are Extremely Sensitive to Systematic Errors in Variant Calling

JOURNAL OF VIROLOGY (2016)

Journal

JOURNAL OF VIROLOGY

Volume 90, Issue 15, Pages 6884-6895

Publisher

AMER SOC MICROBIOLOGY

DOI: 10.1128/JVI.00667-16

Category

Virology

Funding

HHS \ National Institutes of Health (NIH) [T32GM007544, AI118886]
Doris Duke Charitable Foundation (DDCF) [CSDA 2013105]


Abstract

With next-generation sequencing technologies, it is now feasible to efficiently sequence patient-derived virus populations at a depth of coverage sufficient to detect rare variants. However, each sequencing platform has characteristic error profiles, and sample collection, target amplification, and library preparation are additional processes whereby errors are introduced and propagated. Many studies account for these errors by using ad hoc quality thresholds and/or previously published statistical algorithms. Despite common usage, the majority of these approaches have not been validated under conditions that characterize many studies of intrahost diversity. Here, we use defined populations of influenza virus to mimic the diversity and titer typically found in patient-derived samples. We identified single-nucleotide variants using two commonly employed variant callers, DeepSNV and LoFreq. We found that the accuracy of these variant callers was lower than expected and exquisitely sensitive to the input titer. Small reductions in specificity had a significant impact on the number of minority variants identified and subsequent measures of diversity. We were able to increase the specificity of DeepSNV to >99.95% by applying an empirically validated set of quality thresholds. When applied to a set of influenza virus samples from a household-based cohort study, these changes resulted in a 10-fold reduction in measurements of viral diversity. We have made our sequence data and analysis code available so that others may improve on our work and use our data set to benchmark their own bioinformatics pipelines. Our work demonstrates that inadequate quality control and validation can lead to significant overestimation of intrahost diversity.

IMPORTANCE Advances in sequencing technology have made it feasible to sequence patient-derived viral samples at a level sufficient for detection of rare mutations. These high-throughput, cost-effective methods are revolutionizing the study of within-host viral diversity. However, the techniques are error prone, and the methods commonly used to control for these errors have not been validated under the conditions that characterize patient-derived samples. Here, we show that these conditions affect measurements of viral diversity. We found that the accuracy of previously benchmarked analysis pipelines was greatly reduced under patient-derived conditions. By carefully validating our sequencing analysis using known control samples, we were able to identify biases in our method and to improve our accuracy to acceptable levels. Application of our modified pipeline to a set of influenza virus samples from a cohort study provided a realistic picture of intrahost diversity and suggested the need for rigorous quality control in such studies.
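
The abstract's point that small reductions in specificity strongly affect the number of minority variants called can be illustrated with rough arithmetic. The sketch below is not the authors' analysis code. It assumes a genome of about 13,500 nucleotides (an approximate figure for influenza A, not stated in the abstract), three possible alternative bases at each position, and that nearly all candidate variants are true negatives. The function name `expected_false_positives` is hypothetical.

```python
def expected_false_positives(specificity: float,
                             genome_length: int,
                             alt_alleles_per_site: int = 3) -> float:
    """Expected number of false-positive variant calls in one sample.

    Treats every (position, alternative base) pair as a true negative,
    which is a simplifying assumption for a low-diversity population.
    """
    if not 0.0 <= specificity <= 1.0:
        raise ValueError("specificity must lie between 0 and 1")
    candidate_variants = genome_length * alt_alleles_per_site
    return (1.0 - specificity) * candidate_variants


# Assumption: approximate influenza A genome length in nucleotides.
GENOME_LENGTH = 13_500

for spec in (0.9995, 0.999, 0.99):
    fp = expected_false_positives(spec, GENOME_LENGTH)
    print(f"specificity {spec:.2%}: about {fp:.0f} expected false positives")
```

Under these assumptions, even 99.95% specificity leaves roughly 20 expected false positives per sample. If a sample holds only a handful of true minority variants, errors at that rate can dominate any diversity estimate.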
