
dc.contributor.advisor: An, Lingling [en]
dc.contributor.author: Abdul Wahab, Ahmad Hakeem
dc.creator: Abdul Wahab, Ahmad Hakeem [en]
dc.date.accessioned: 2015-08-13T17:03:04Z [en]
dc.date.available: 2015-08-13T17:03:04Z [en]
dc.date.issued: 2015 [en]
dc.identifier.uri: http://hdl.handle.net/10150/566996 [en]
dc.description.abstract: Metagenomics holds great potential for uncovering as-yet-undiscovered relationships within microbial communities, particularly because the field circumvents the need to isolate and culture microbes from their natural environmental settings. A common research objective is to detect biomarkers, that is, microbes that are associated with changes in a status. For instance, finding such microbes across conditions such as healthy and diseased groups allows researchers to identify pathogens and probiotics. This is often achieved via analysis of the differential abundance of microbes. The problem is that differential abundance analysis looks at each microbe individually, without considering the possible associations the microbes may have with each other. This is not favorable, since microbes rarely act individually but rather within intricate communities involving other microbes. An alternative would be variable selection techniques such as Lasso or Elastic Net, which consider all the microbes simultaneously and conduct selection. However, Lasso often selects only one representative feature from a cluster of correlated features, and the Elastic Net may select unimportant features too frequently and erratically because of the high levels of sparsity and variation in the data.

In this research paper, the proposed method, AdaLassop, is an augmented variable selection technique that overcomes the shortcomings of Lasso and Elastic Net. It provides researchers with a holistic model that takes into account the effects of selected biomarkers in the presence of other important biomarkers. For AdaLassop, variable selection on sparse ultra-high dimensional data is implemented using the Adaptive Lasso with p-values extracted from Zero Inflated Negative Binomial Regressions as augmented weights. Comprehensive simulations involving varying correlation structures indicate that AdaLassop has optimal performance in the presence of multicollinearity. This is especially apparent as sample size grows. Application of AdaLassop to a metagenome-wide study of diabetic patients reveals both pathogens and probiotics that have been researched in the medical field.
dc.language.iso: en_US [en]
dc.publisher: The University of Arizona. [en]
dc.rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. [en]
dc.subject: Adaptive Lasso [en]
dc.subject: Biomarker [en]
dc.subject: Metagenomics [en]
dc.subject: Variable Selection [en]
dc.subject: Statistics [en]
dc.subject: Adaptive Elastic Net [en]
dc.title: Statistical Discovery of Biomarkers in Metagenomics [en_US]
dc.type: text [en]
dc.type: Electronic Thesis [en]
thesis.degree.grantor: University of Arizona [en]
thesis.degree.level: masters [en]
dc.contributor.committeemember: Hao, Ning [en]
dc.contributor.committeemember: Hurwitz, Bonnie [en]
dc.description.release: Release after 29-Jan-2016 [en]
thesis.degree.discipline: Graduate College [en]
thesis.degree.discipline: Statistics [en]
thesis.degree.name: M.S. [en]
refterms.dateFOA: 2016-01-29T00:00:00Z
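
Illustrative note on the abstract: the weighting idea it describes (an Adaptive Lasso whose per-feature penalties come from p-values) can be sketched as a lasso with feature-specific penalty weights. The sketch below is not the thesis implementation. It assumes, purely for illustration, that the weight of a feature equals its p-value, so features with small p-values are penalised less; the thesis may define the weights differently. The p-values are hypothetical inputs rather than the output of a Zero Inflated Negative Binomial fit, and the names soft_threshold, weighted_lasso, X, y, p_values and lam are all invented for this example.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, weights, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam * sum_j weights[j] * |b_j|
    by cyclic coordinate descent. X is a list of rows; no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature can never be selected
            # correlation of feature j with the partial residual
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_b = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


# Toy orthogonal design: feature 0 has a strong effect, feature 1 a weak
# one, feature 2 none at all.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.0 * row[0] + 0.3 * row[1] for row in X]

# Hypothetical per-feature p-values (assumed inputs, not computed here).
p_values = [0.001, 0.8, 0.9]

# Assumption: penalty weight = p-value, so low p-values are penalised less.
beta = weighted_lasso(X, y, weights=p_values, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(selected, beta)
```

In this toy run only feature 0 survives: its small p-value leaves it almost unpenalised, while the weak and null features face thresholds of 0.4 and 0.45 and are shrunk to zero.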


Files in this item

Name: azu_etd_14042_sip1_m.pdf
Size: 1.457Mb
Format: PDF

