A novel nonlinear dimension reduction approach to infer population structure for low-coverage sequencing data
Affiliation: Interdisciplinary Program in Statistics and Data Science, University of Arizona
Department of Mathematics, University of Arizona
Department of Epidemiology and Biostatistics, University of Arizona
Citation: Zhang, M., Liu, Y., Zhou, H., Watkins, J., & Zhou, J. (2021). A novel nonlinear dimension reduction approach to infer population structure for low-coverage sequencing data. BMC Bioinformatics, 22(1), 348.
Rights: Copyright © The Author(s), 2021. This article is licensed under a Creative Commons Attribution 4.0 International License.
Collection Information: This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at email@example.com.
Abstract:
BACKGROUND: Low-depth sequencing allows researchers to increase sample size at the expense of lower accuracy. To incorporate uncertainties while maintaining statistical power, we introduce MCPCA_PopGen to analyze population structure of low-depth sequencing data.
RESULTS: The method optimizes the choice of nonlinear transformations of dosages to maximize the Ky Fan norm of the covariance matrix. The transformation incorporates the uncertainty in calling between heterozygotes and the common homozygotes for loci having a rare allele and is more linear when both variants are common.
CONCLUSIONS: We apply MCPCA_PopGen to samples from two indigenous Siberian populations and reveal hidden population structure accurately using only a single chromosome. The MCPCA_PopGen package is available on https://github.com/yiwenstat/MCPCA_PopGen.
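The objective named in the abstract, the Ky Fan norm of a covariance matrix, can be illustrated with a minimal sketch. This is not the MCPCA_PopGen implementation: it only evaluates the Ky Fan k-norm (the sum of the k largest singular values) for untransformed dosages, and it does not perform the optimization over nonlinear transformations. The function names `ky_fan_norm` and `dosage_covariance`, the toy data, and the choice k = 2 are assumptions made for illustration.

```python
import numpy as np


def ky_fan_norm(matrix: np.ndarray, k: int) -> float:
    """Ky Fan k-norm: the sum of the k largest singular values of `matrix`."""
    # np.linalg.svd returns singular values sorted in descending order.
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(singular_values[:k].sum())


def dosage_covariance(dosages: np.ndarray) -> np.ndarray:
    """Sample covariance between loci; rows are samples, columns are loci."""
    return np.cov(dosages, rowvar=False)


# Hypothetical toy data: two groups of 20 samples whose allele
# frequencies differ slightly at each of 50 loci.
rng = np.random.default_rng(0)
n_loci = 50
freq_a = rng.uniform(0.05, 0.5, size=n_loci)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, size=n_loci), 0.01, 0.99)
dosages = np.vstack([
    rng.binomial(2, freq_a, size=(20, n_loci)),
    rng.binomial(2, freq_b, size=(20, n_loci)),
]).astype(float)

cov = dosage_covariance(dosages)
top2 = ky_fan_norm(cov, k=2)               # variance captured by the top 2 components
total = ky_fan_norm(cov, k=cov.shape[0])   # equals the trace for a covariance matrix

print(f"share of variance in top 2 components: {top2 / total:.3f}")
```

Because a covariance matrix is symmetric positive semidefinite, its singular values coincide with its eigenvalues, so the Ky Fan k-norm here is the variance captured by the first k principal components. A method that maximizes this quantity over transformations of the dosages is therefore seeking the transformation under which a few components explain as much variance as possible.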
Note: Open access journal
Version: Final published version
- Genotype-Frequency Estimation from High-Throughput Sequencing Data.
- Authors: Maruki T, Lynch M
- Issue date: 2015 Oct
- polyRAD: Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids.
- Authors: Clark LV, Lipka AE, Sacks EJ
- Issue date: 2019 Mar 7
- Low-depth genotyping-by-sequencing (GBS) in a bovine population: strategies to maximize the selection of high quality genotypes and the accuracy of imputation.
- Authors: Brouard JS, Boyle B, Ibeagha-Awemu EM, Bissonnette N
- Issue date: 2017 Apr 5
- Using genotype array data to compare multi- and single-sample variant calls and improve variant call sets from deep coverage whole-genome sequencing data.
- Authors: Shringarpure SS, Mathias RA, Hernandez RD, O'Connor TD, Szpiech ZA, Torres R, De La Vega FM, Bustamante CD, Barnes KC, Taub MA, CAAPA Consortium.
- Issue date: 2017 Apr 15
- Genotype Calling from Population-Genomic Sequencing Data.
- Authors: Maruki T, Lynch M
- Issue date: 2017 May 5