A novel nonlinear dimension reduction approach to infer population structure for low-coverage sequencing data
Affiliation
Interdisciplinary Program in Statistics and Data Science, University of Arizona
Department of Mathematics, University of Arizona
Department of Epidemiology and Biostatistics, University of Arizona
Issue Date
2021
Publisher
BMC
Citation
Zhang, M., Liu, Y., Zhou, H., Watkins, J., & Zhou, J. (2021). A novel nonlinear dimension reduction approach to infer population structure for low-coverage sequencing data. BMC Bioinformatics, 22(1), 348.
Journal
BMC Bioinformatics
Rights
Copyright © The Author(s), 2021. This article is licensed under a Creative Commons Attribution 4.0 International License.
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
BACKGROUND: Low-depth sequencing allows researchers to increase sample size at the expense of lower accuracy. To incorporate uncertainties while maintaining statistical power, we introduce MCPCA_PopGen to analyze population structure of low-depth sequencing data. RESULTS: The method optimizes the choice of nonlinear transformations of dosages to maximize the Ky Fan norm of the covariance matrix. The transformation incorporates the uncertainty in calling between heterozygotes and the common homozygotes for loci having a rare allele and is more linear when both variants are common. CONCLUSIONS: We apply MCPCA_PopGen to samples from two indigenous Siberian populations and reveal hidden population structure accurately using only a single chromosome. The MCPCA_PopGen package is available on https://github.com/yiwenstat/MCPCA_PopGen.
Note
Open access journal
ISSN
1471-2105
PubMed ID
34174829
Version
Final published version
DOI
10.1186/s12859-021-04265-7
Except where otherwise noted, this item's license is described as Copyright © The Author(s), 2021. This article is licensed under a Creative Commons Attribution 4.0 International License.
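The abstract above states that the method maximizes the Ky Fan norm of the covariance matrix of transformed dosages. As a rough illustration of that quantity only, the following Python sketch computes the Ky Fan k-norm (the sum of the k largest singular values) for a toy dosage matrix. It is not the MCPCA_PopGen implementation: the function names, the toy data, the square-root transformation, and the choice to center and scale each locus to unit variance are all assumptions made for this example.

```python
import numpy as np


def ky_fan_norm(matrix: np.ndarray, k: int) -> float:
    """Ky Fan k-norm: the sum of the k largest singular values."""
    # np.linalg.svd returns singular values sorted in descending order.
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(singular_values[:k].sum())


def standardized_covariance(features: np.ndarray) -> np.ndarray:
    """Covariance of the columns after centering and scaling to unit variance.

    Assumes no column is constant (otherwise the scaling divides by zero).
    """
    centered = features - features.mean(axis=0)
    scaled = centered / centered.std(axis=0)
    return scaled.T @ scaled / features.shape[0]


# Hypothetical toy data: 6 samples x 3 loci, expected dosages in [0, 2].
dosages = np.array([
    [0.1, 1.9, 0.0],
    [0.2, 1.8, 0.1],
    [0.9, 1.1, 1.0],
    [1.1, 0.9, 0.9],
    [1.8, 0.2, 2.0],
    [1.9, 0.1, 1.9],
])

# Objective value for the untransformed dosages.
cov_identity = standardized_covariance(dosages)

# Objective value after an arbitrary nonlinear transformation (illustrative
# only; the published method optimizes the transformation instead).
cov_sqrt = standardized_covariance(np.sqrt(dosages))

print("Ky Fan 2-norm, raw dosages: ", ky_fan_norm(cov_identity, k=2))
print("Ky Fan 2-norm, sqrt dosages:", ky_fan_norm(cov_sqrt, k=2))
```

Because each locus is scaled to unit variance in this sketch, the full Ky Fan norm (k equal to the number of loci) always equals the number of loci. A larger Ky Fan k-norm for small k therefore means that more of the total variance is concentrated in the leading k components.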
Related articles
- Low-depth genotyping-by-sequencing (GBS) in a bovine population: strategies to maximize the selection of high quality genotypes and the accuracy of imputation.
  - Authors: Brouard JS, Boyle B, Ibeagha-Awemu EM, Bissonnette N
  - Issue date: 2017 Apr 5
- polyRAD: Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids.
  - Authors: Clark LV, Lipka AE, Sacks EJ
  - Issue date: 2019 Mar 7
- Genotype-Frequency Estimation from High-Throughput Sequencing Data.
  - Authors: Maruki T, Lynch M
  - Issue date: 2015 Oct
- ngsTools: methods for population genetics analyses from next-generation sequencing data.
  - Authors: Fumagalli M, Vieira FG, Linderoth T, Nielsen R
  - Issue date: 2014 May 15
- Very low-depth whole-genome sequencing in complex trait association studies.
  - Authors: Gilly A, Southam L, Suveges D, Kuchenbaecker K, Moore R, Melloni GEM, Hatzikotoulas K, Farmaki AE, Ritchie G, Schwartzentruber J, Danecek P, Kilian B, Pollard MO, Ge X, Tsafantakis E, Dedoussis G, Zeggini E
  - Issue date: 2019 Aug 1

