dc.contributor.advisor: Wedel, Andrew B.
dc.contributor.author: Bell, Dane Edward
dc.creator: Bell, Dane Edward
dc.date.accessioned: 2018-06-28T00:39:19Z
dc.date.available: 2018-06-28T00:39:19Z
dc.date.issued: 2018
dc.identifier.uri: http://hdl.handle.net/10150/628180
dc.description.abstract: The incidence of type 2 diabetes mellitus is rising in the United States and worldwide. Diabetes is a common, debilitating, and deadly disease that begins with a long asymptomatic period, during which its severity can be limited through intervention. Intervention improves with earlier detection, but most prediabetic individuals are unaware of their risk. This study uses machine learning for the linguistic analysis of social media text to detect diabetes risk. To this end, it seeks to answer the questions "What linguistic features most indicate diabetes risk," "What algorithms best detect diabetes risk from these features," and "How can the data to train such algorithms best be collected?" To address these questions, I describe findings from an experiment in eliciting participation in data collection through an initial risk classifier based on public sources. I continue by comparing various linguistic feature sets and machine learning algorithms in detecting body mass index (kg/m^2, a risk factor for diabetes) as well as a more complete diabetes risk measure. Results show that participant engagement with the results of research is robust, but few of these individuals are willing to participate in the research when any personally identifiable data is collected. From these results, it is also evident that limiting feature sets to lexicons of domain-relevant words such as food and exercise terms can be effective, and that modeling a writer's gender and a text's recency can improve detection, along with distinguishing quoted text from original text. This work is a first step toward detecting diabetes risk, with the ultimate goal of designing effective, automated, and individualized interventions through social media. It has shown that language is a valuable predictor of important health variables, and proposes a novel method for accounting for a writer's gender when analyzing their text. Future work will benefit from pursuing larger datasets, potentially through methods described in this work, and from multimodal algorithms capitalizing on the interplay between text and images. [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: The University of Arizona. [en_US]
dc.rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. [en_US]
dc.subject: machine learning [en_US]
dc.subject: neural networks [en_US]
dc.subject: social media [en_US]
dc.subject: type 2 diabetes mellitus [en_US]
dc.title: Detecting Preventable Disease Risk on Social Media [en_US]
dc.type: text [en_US]
dc.type: Electronic Dissertation [en_US]
thesis.degree.grantor: University of Arizona [en_US]
thesis.degree.level: doctoral [en_US]
dc.contributor.committeemember: Bever, Thomas G.
dc.contributor.committeemember: Surdeanu, Mihai
dc.description.release: Release after 18-May-2019 [en_US]
thesis.degree.discipline: Graduate College [en_US]
thesis.degree.discipline: Linguistics [en_US]
thesis.degree.name: Ph.D. [en_US]


Files in this item

Name: azu_etd_16288_sip1_m.pdf
Size: 8.695Mb
Format: PDF
