
dc.contributor.advisor: Hu, Chengcheng
dc.contributor.author: White, Lisa Michelle
dc.creator: White, Lisa Michelle
dc.date.accessioned: 2020-03-13T20:53:48Z
dc.date.available: 2020-03-13T20:53:48Z
dc.date.issued: 2019
dc.identifier.uri: http://hdl.handle.net/10150/637719
dc.description.abstract: Random forest and gradient boosting models are common in publications that use prediction models, and in data competitions they are referenced almost interchangeably as easy methods for analyzing big data. This thesis compared the prediction accuracy, sensitivity, and specificity of the two methods using simulated data covering a variety of data characteristics. When the binary outcome had equal numbers of observations in each class, gradient boosting and random forest had similar accuracy; however, gradient boosting greatly outperformed random forest as the sample size and the number of variables increased. With balanced outcomes, gradient boosting also had markedly higher sensitivity and specificity regardless of the other data characteristics. When the binary outcomes were not equally represented, both methods had low values in all three measures, although gradient boosting still had better prediction sensitivity and specificity than random forest. We illustrated the methods using real data from a study in which human experts identified musk-like aromatic molecules. The data contain chemical properties that could potentially be used to predict whether a molecule could be classified as musk without expert identification. Consistent with the simulation studies, the two methods had similar accuracy on these data, but random forest had slightly higher sensitivity and higher mean prediction specificity.
dc.language.iso: en
dc.publisher: The University of Arizona.
dc.rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
dc.subject: Ensemble algorithms
dc.subject: Gradient boosting
dc.subject: Machine learning
dc.subject: Random forest
dc.title: Comparison of Ensemble Methods
dc.type: text
dc.type: Electronic Thesis
thesis.degree.grantor: University of Arizona
thesis.degree.level: masters
dc.contributor.committeemember: Billheimer, Dean
dc.contributor.committeemember: Bedrick, Edward
dc.description.release: Release after 02/06/2022
thesis.degree.discipline: Graduate College
thesis.degree.discipline: Biostatistics
thesis.degree.name: M.S.
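The abstract compares the methods on accuracy, sensitivity, and specificity but does not state how these are computed. The sketch below assumes the standard confusion-matrix definitions for a binary outcome, with 1 as the positive class (for example, "musk"). The function name `binary_metrics` is hypothetical and is not taken from the thesis; it only illustrates the three measures.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: accuracy, sensitivity, and specificity for 0/1 labels.

    Assumes 1 is the positive class. Sensitivity is the true positive rate,
    TP / (TP + FN); specificity is the true negative rate, TN / (TN + FP).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn

    # A rate is undefined (NaN) when its denominator is zero,
    # e.g. sensitivity when no positive cases are present.
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


if __name__ == "__main__":
    # Unbalanced outcome: 90 negatives, 10 positives, and a classifier
    # that always predicts the majority class 0.
    y_true = [0] * 90 + [1] * 10
    y_pred = [0] * 100
    print(binary_metrics(y_true, y_pred))
    # {'accuracy': 0.9, 'sensitivity': 0.0, 'specificity': 1.0}
```

The usage example shows why the three measures are reported together: with unequal class sizes, a classifier can reach high accuracy while detecting none of the minority class, which only sensitivity reveals.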


Files in this item

Name: azu_etd_17758_sip1_m.pdf
Embargo: 2022-02-06
Size: 690.4Kb
Format: PDF

