Journal article
The effects of sample size on population genomic analyses - implications for the tests of neutrality
BMC Genomics, Vol.17(1), 123
2016
Abstract
Background: One of the fundamental measures of molecular genetic variation is Watterson's estimator (Θ), which is based on the number of segregating sites. The estimation of Θ is unbiased only under neutrality and constant population size, and it is well known that the estimate is biased when these assumptions are violated. However, the effect of sample size in modulating this bias has not been well appreciated.

Results: We examined this issue in detail using large-scale exome data and robust simulations. Our investigation revealed that sample size appreciably influences Θ estimation and that this effect is much greater for constrained genomic regions than for neutral regions. For instance, Θ estimated for synonymous sites using 512 human exomes was 1.9 times higher than that obtained using 16 exomes, whereas for the nonsynonymous sites of the same data it was 2.5 times higher. We observed a positive correlation between the rate of increase in Θ estimates (with respect to sample size) and the magnitude of selection pressure. For example, Θ estimated for the nonsynonymous sites of highly constrained genes (dN/dS < 0.1) using 512 exomes was 3.6 times higher than that estimated using 16 exomes. In contrast, it was only 2 times higher for the less constrained genes (dN/dS > 0.9).

Conclusions: The results of this study reveal the extent of underestimation owing to small sample sizes and thus emphasize the importance of sample size in estimating a number of population genomic parameters. Our results have serious implications for neutrality tests such as Tajima's D, Fu and Li's D, and those based on the McDonald and Kreitman test: the Neutrality Index and the fraction of adaptive substitutions. For instance, the use of 16 exomes produced a 2.4 times higher proportion of adaptive substitutions than that obtained using 512 exomes (24 % vs 10 %). © 2016 Subramanian.
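The two quantities the abstract relies on can be sketched in a few lines of Python. This is an illustrative sketch only, not the author's pipeline: it assumes the standard textbook forms Θ = S / a_n with a_n = 1 + 1/2 + ... + 1/(n-1) for a sample of n sequences, and α = 1 - (Ds × Pn) / (Dn × Ps) for the McDonald-Kreitman-based fraction of adaptive substitutions. The function names and all counts below are hypothetical and are not taken from the article.

```python
def harmonic_number(n_sequences: int) -> float:
    """Return a_n = sum of 1/i for i = 1 .. n-1, for a sample of n sequences."""
    if n_sequences < 2:
        raise ValueError("at least two sequences are required")
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta(segregating_sites: int, n_sequences: int) -> float:
    """Watterson's estimator: number of segregating sites divided by a_n."""
    return segregating_sites / harmonic_number(n_sequences)


def adaptive_fraction(dn: float, ds: float, pn: float, ps: float) -> float:
    """Fraction of adaptive substitutions: alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: nonsynonymous and synonymous divergence counts.
    pn, ps: nonsynonymous and synonymous polymorphism counts.
    """
    return 1.0 - (ds * pn) / (dn * ps)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the calls.
    print(watterson_theta(segregating_sites=120, n_sequences=32))
    print(adaptive_fraction(dn=100, ds=200, pn=30, ps=80))
```

Under the null expectation, a_n should fully absorb the effect of sample size, so a Θ that keeps rising with n indicates an excess of rare variants. If that excess is larger at nonsynonymous sites, as the abstract reports, a small sample undercounts Pn relative to Ps, which is the direction of bias that raises α.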
Details
- Title
- The effects of sample size on population genomic analyses - implications for the tests of neutrality
- Authors
- Sankar Subramanian (Author) - Griffith University
- Publication details
- BMC Genomics, Vol.17(1), 123; 13
- Publisher
- BioMed Central Ltd.
- Date published
- 2016
- DOI
- 10.1186/s12864-016-2441-8
- ISSN
- 1471-2164
- Copyright note
- Copyright © 2016 Subramanian. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Organisation Unit
- School of Science and Engineering - Legacy; University of the Sunshine Coast, Queensland; School of Science, Technology and Engineering; Centre for Bioinnovation
- Language
- English
- Record Identifier
- 99450321902621
- Output Type
- Journal article
InCites Highlights
These are selected metrics from InCites Benchmarking & Analytics tool, related to this output
- Web Of Science research areas
- Biotechnology & Applied Microbiology
- Genetics & Heredity