
dc.contributor.author: Vasquez, Monica M.
dc.contributor.author: Hu, Chengcheng
dc.contributor.author: Roe, Denise J.
dc.contributor.author: Chen, Zhao
dc.contributor.author: Halonen, Marilyn
dc.contributor.author: Guerra, Stefano
dc.date.accessioned: 2017-03-14T00:12:45Z
dc.date.available: 2017-03-14T00:12:45Z
dc.date.issued: 2016-11-14
dc.identifier.citation: Least absolute shrinkage and selection operator type methods for the identification of serum biomarkers of overweight and obesity: simulation and application. 2016, 16 (1) BMC Medical Research Methodology (en)
dc.identifier.issn: 1471-2288
dc.identifier.pmid: 27842498
dc.identifier.doi: 10.1186/s12874-016-0254-8
dc.identifier.uri: http://hdl.handle.net/10150/622824
dc.description.abstract: Background: The study of circulating biomarkers and their association with disease outcomes has become progressively complex due to advances in the measurement of these biomarkers through multiplex technologies. The Least Absolute Shrinkage and Selection Operator (LASSO) is a data analysis method that may be utilized for biomarker selection in these high-dimensional data. However, it is unclear which LASSO-type method is preferable when considering data scenarios that may be present in serum biomarker research, such as high correlation between biomarkers, weak associations with the outcome, and a sparse number of true signals. The goal of this study was to compare the LASSO to five LASSO-type methods given these scenarios. Methods: A simulation study was performed to compare the LASSO, Adaptive LASSO, Elastic Net, Iterated LASSO, Bootstrap-Enhanced LASSO, and Weighted Fusion for the binary logistic regression model. The simulation study was designed to reflect the data structure of the population-based Tucson Epidemiological Study of Airway Obstructive Disease (TESAOD), specifically the sample size (N = 1000 for total population, 500 for sub-analyses), correlation of biomarkers (0.20, 0.50, 0.80), prevalence of overweight (40%) and obese (12%) outcomes, and the association of outcomes with standardized serum biomarker concentrations (log-odds ratio = 0.05-1.75). Each LASSO-type method was then applied to the TESAOD data of 306 overweight, 66 obese, and 463 normal-weight subjects with a panel of 86 serum biomarkers. Results: Based on the simulation study, no method had an overall superior performance. The Weighted Fusion correctly identified more true signals, but incorrectly included more noise variables. The LASSO and Elastic Net correctly identified many true signals and excluded more noise variables. In the application study, biomarkers of overweight and obesity selected by all methods were Adiponectin, Apolipoprotein H, Calcitonin, CD14, Complement 3, C-reactive protein, Ferritin, Growth Hormone, Immunoglobulin M, Interleukin-18, Leptin, Monocyte Chemotactic Protein-1, Myoglobin, Sex Hormone Binding Globulin, Surfactant Protein D, and YKL-40. Conclusions: For the data scenarios examined, choice of optimal LASSO-type method was data-structure dependent and should be guided by the research objective. The LASSO-type methods identified biomarkers that have known associations with obesity and obesity-related conditions.
dc.description.sponsorship: CADET award [HL107188]; R01 award from the National Heart, Lung, and Blood Institute [HL095021] (en)
dc.language.iso: en (en)
dc.publisher: BIOMED CENTRAL LTD (en)
dc.relation.url: http://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-016-0254-8 (en)
dc.rights: © The Author(s) 2016. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). (en)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: LASSO (en)
dc.subject: Biomarkers (en)
dc.subject: High-Dimensional (en)
dc.subject: Obesity (en)
dc.subject: Overweight (en)
dc.title: Least absolute shrinkage and selection operator type methods for the identification of serum biomarkers of overweight and obesity: simulation and application (en)
dc.type: Article (en)
dc.contributor.department: Univ Arizona, Mel & Enid Zuckerman Coll Publ Hlth (en)
dc.contributor.department: Univ Arizona, Asthma & Airway Dis Res Ctr (en)
dc.identifier.journal: BMC Medical Research Methodology (en)
dc.description.collectioninformation: This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu. (en)
dc.eprint.version: Final published version (en)
refterms.dateFOA: 2018-09-11T17:57:19Z


Files in this item

Name: art_3A10.1186_2Fs12874-016-025 ...
Size: 486.2Kb
Format: PDF
Description: Final Published Version

