Author
BERTSCH, AMANDA LYNN
Issue Date
2021
Advisor
Bethard, Steven
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
On Wikipedia, an online crowdsourced encyclopedia, volunteers enforce the encyclopedia’s editorial policies. Wikipedia’s detailed policy on maintaining a neutral point of view has made the project a popular target for NLP researchers working on bias detection or sentiment analysis. Often, this work focuses on a particular category of bias that Wikipedia identifies; while “weasel words” and “hedges” have both received significant attention, little work has been done on identifying “peacock phrases,” phrases that are overly positive without a verifiable source. In this work, we present a model for identifying peacock phrases that achieves an F1 score of 0.963. We also discuss the general issues inherent in building a dataset from Wikipedia, with this project as a case study. Finally, we demonstrate a way to use Wikipedia’s public infrastructure to host a tool that uses the trained model to give back to the Wikipedia editor community.
Type
Electronic thesis
text
Degree Name
B.S.
Degree Level
bachelors
Degree Program
Computer Science
Honors College