Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity
Author
Chu, Benjamin B.
Keys, Kevin L.
German, Christopher A.
Zhou, Hua
Zhou, Jin J.
Sobel, Eric M.
Sinsheimer, Janet S.
Lange, Kenneth
Affiliation
Univ Arizona, Div Epidemiol & Biostat
Issue Date
2020-06
Publisher
OXFORD UNIV PRESS
Citation
Chu, B. B., Keys, K. L., German, C. A., Zhou, H., Zhou, J. J., Sobel, E. M., ... & Lange, K. (2020). Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity. GigaScience, 9(6), giaa044.
Journal
GIGASCIENCE
Rights
© The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/).
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
Background: Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression.

Results: We extend and efficiently implement iterative hard thresholding (IHT) for multiple regression, treating all SNPs simultaneously. Our extensions accommodate generalized linear models, prior information on genetic variants, and grouping of variants. In our simulations, IHT recovers up to 30% more true predictors than SNP-by-SNP association testing and exhibits a 2-3 orders of magnitude decrease in false-positive rates compared with lasso regression. We also test IHT on the UK Biobank hypertension phenotypes and the Northern Finland Birth Cohort of 1966 cardiovascular phenotypes. We find that IHT scales to the large datasets of contemporary human genetics and recovers the plausible genetic variants identified by previous studies.

Conclusions: Our real data analysis and simulation studies suggest that IHT can (i) recover highly correlated predictors, (ii) avoid over-fitting, (iii) deliver better true-positive and false-positive rates than either marginal testing or lasso regression, (iv) recover unbiased regression coefficients, (v) exploit prior information and group-sparsity, and (vi) be used with biobank-sized datasets. Although these advances are studied for genome-wide association studies inference, our extensions are pertinent to other regression problems with large numbers of predictors.
Note
Open access journal
ISSN
2047-217X
PubMed ID
32491161
Version
Final published version
DOI
10.1093/gigascience/giaa044
Related articles
- Iterative hard thresholding for model selection in genome-wide association studies.
  - Authors: Keys KL, Chen GK, Lange K
  - Issue date: 2017 Dec
- Multivariate genome-wide association analysis by iterative hard thresholding.
  - Authors: Chu BB, Ko S, Zhou JJ, Jensen A, Zhou H, Sinsheimer JS, Lange K
  - Issue date: 2023 Apr 3
- Smooth-Threshold Multivariate Genetic Prediction with Unbiased Model Selection.
  - Authors: Ueki M, Tamiya G, Alzheimer's Disease Neuroimaging Initiative
  - Issue date: 2016 Apr
- Efficient Implementation of Penalized Regression for Genetic Risk Prediction.
  - Authors: Privé F, Aschard H, Blum MGB
  - Issue date: 2019 May
- A fast algorithm for Bayesian multi-locus model in genome-wide association studies.
  - Authors: Duan W, Zhao Y, Wei Y, Yang S, Bai J, Shen S, Du M, Huang L, Hu Z, Chen F
  - Issue date: 2017 Aug