Penalized Regression Methods in the Study of Serum Biomarkers for Overweight and Obesity
AuthorVasquez, Monica M.
MetadataShow full item record
PublisherThe University of Arizona.
RightsCopyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
AbstractThe study of circulating biomarkers and their association with disease outcomes has become progressively complex due to advances in the measurement of these biomarkers through multiplex technologies. Although the availability of numerous serum biomarkers is highly promising, multiplex assays present statistical challenges due to the high dimensionality of these data. In this dissertation, three studies are presented that address these challenges using L1 penalized regression methods. In the first part of the dissertation, an extensive simulation study is performed for the logistic regression model that compares the Least Absolute Shrinkage and Selection Operator (LASSO) method with five LASSO-type methods given scenarios that are present in serum biomarker research, such as high correlation between biomarkers, weak associations with the outcome, and sparse number of true signals. Results show that choice of optimal LASSO-type method is dependent on data structure and should be guided by the research objective. Methods are then applied to the Tucson Epidemiological Study of Airway Obstructive Disease (TESAOD) study for the identification of serum biomarkers of overweight and obesity. Measurement of serum biomarkers using multiplex technologies may be more variable as compared to traditional single biomarker methods. Measurement error may induce bias in parameter estimation and complicate the variable selection process. In the second part of the dissertation, an existing measurement error correction method for penalized linear regression with L1 penalty has been adapted to accommodate validation data on a randomly selected subset of the study sample. A simulation study and analysis of TESAOD data demonstrate that the proposed approach improves variable selection and reduces bias in parameter estimation for validation data as small as 10 percent of the study sample. In the third part of the dissertation, a measurement error correction method that utilizes validation data is proposed for the penalized logistic regression model with the L1 penalty. A simulation study and analysis of TESAOD data are used to evaluate the proposed method. Results show an improvement in variable selection.
Degree ProgramGraduate College