Article

Unbiased population heterozygosity estimates from genome-wide sequence data

Journal

METHODS IN ECOLOGY AND EVOLUTION
Volume 12, Issue 10, Pages 1888-1898

Publisher

WILEY
DOI: 10.1111/2041-210X.13659

Keywords

conservation; DArTseq; filtering; genetic mixing; heterozygosity; population structure; RADseq; single nucleotide polymorphisms

Funding

  1. Australian Research Council [DP190100990]
  2. National Health and Medical Research Council [1118640, 1132412]
  3. Wellcome Trust [108508]

Heterozygosity estimation from genome-wide sequence data can be influenced by sample size, rare allele filtering, missing data thresholds, and population structure. Recommendations are made to report autosomal heterozygosity, exclude sites with missing data, and analyze populations independently for accurate comparisons within and across studies.
Heterozygosity is a metric of genetic variability frequently used to inform the management of threatened taxa. Estimating observed and expected heterozygosities from genome-wide sequence data has become increasingly common, and these estimates are often derived directly from genotypes at single nucleotide polymorphism (SNP) markers. While many SNP markers can provide precise estimates of genetic processes, the results of 'downstream' analysis with these markers may depend heavily on 'upstream' filtering decisions. Here we explore the downstream consequences of sample size, rare allele filtering, missing data thresholds and known population structure on estimates of observed and expected heterozygosity using two reduced-representation sequencing datasets, one from the mosquito Aedes aegypti (ddRADseq) and the other from a threatened grasshopper, Keyacris scurra (DArTseq). We show that estimates based on polymorphic markers only (i.e. SNP heterozygosity) are always biased by global sample size (N), with smaller N producing larger estimates. By contrast, results are unbiased by sample size when calculations consider monomorphic as well as polymorphic sequence information (i.e. genome-wide or autosomal heterozygosity). SNP heterozygosity is also biased when differentiated populations are analysed together while autosomal heterozygosity remains unbiased. We also show that when nucleotide sites with missing genotypes are included, observed and expected heterozygosity estimates diverge in proportion to the amount of missing data permitted at each site. We make three recommendations for estimating genome-wide heterozygosity: (a) autosomal heterozygosity should be reported instead of (or in addition to) SNP heterozygosity; (b) sites with any missing data should be omitted and (c) populations should be analysed in independent runs. This should facilitate comparisons within and across studies and between observed and expected measures of heterozygosity.
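
The distinction between SNP heterozygosity and autosomal heterozygosity can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `heterozygosity`, the genotype encoding (alternate-allele counts 0, 1, 2, with `None` for missing) and the use of the small-sample correction 2n/(2n-1) for expected heterozygosity are all assumptions made for illustration. The sketch drops any site with missing genotypes, following recommendation (b), and then averages either over polymorphic sites only or over all retained sites.

```python
from typing import List, Optional, Tuple

# Assumed encoding: number of alternate alleles per diploid individual
# (0, 1 or 2), or None when the genotype is missing.
Genotype = Optional[int]


def heterozygosity(
    sites: List[List[Genotype]], polymorphic_only: bool
) -> Tuple[float, float]:
    """Return (observed, expected) heterozygosity averaged over sites.

    polymorphic_only=True  -> "SNP heterozygosity" (variable sites only)
    polymorphic_only=False -> "autosomal heterozygosity" (monomorphic
                              sites also count in the denominator)
    """
    ho_sum = 0.0
    he_sum = 0.0
    n_sites = 0
    for genotypes in sites:
        # Skip empty sites and any site with missing data.
        if not genotypes or any(g is None for g in genotypes):
            continue
        n = len(genotypes)          # individuals genotyped at this site
        alt = sum(genotypes)        # alternate-allele count
        if polymorphic_only and not (0 < alt < 2 * n):
            continue
        p = alt / (2 * n)           # alternate-allele frequency
        # Observed: fraction of heterozygous individuals.
        ho_sum += sum(1 for g in genotypes if g == 1) / n
        # Expected: 2pq with the assumed small-sample correction.
        he_sum += 2 * p * (1 - p) * (2 * n) / (2 * n - 1)
        n_sites += 1
    if n_sites == 0:
        return 0.0, 0.0
    return ho_sum / n_sites, he_sum / n_sites


# Toy data: one variable site, two monomorphic sites, one site with
# a missing genotype (which is excluded from both calculations).
toy_sites = [
    [0, 1, 2, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, None, 0, 0],
]
snp_ho, snp_he = heterozygosity(toy_sites, polymorphic_only=True)
auto_ho, auto_he = heterozygosity(toy_sites, polymorphic_only=False)
```

In this toy example the SNP estimates are averaged over a single variable site, while the autosomal estimates divide the same totals by three retained sites. Which sites count as polymorphic depends on which individuals are included in the run, whereas the autosomal denominator does not, which is in line with the sample-size bias the abstract reports for SNP heterozygosity.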
