Show simple item record

dc.contributor.advisor: Surdeanu, Mihai
dc.contributor.author: Van, Hoang Nguyen Hung
dc.creator: Van, Hoang Nguyen Hung
dc.date.accessioned: 2023-01-20T19:14:33Z
dc.date.available: 2023-01-20T19:14:33Z
dc.date.issued: 2023
dc.identifier.citation: Van, Hoang Nguyen Hung. (2023). Mitigating Data Scarcity for Neural Language Models (Doctoral dissertation, University of Arizona, Tucson, USA).
dc.identifier.uri: http://hdl.handle.net/10150/667710
dc.description.abstract: In recent years, pretrained neural language models (PNLMs) have taken the field of natural language processing by storm, achieving new benchmarks and state-of-the-art performances. These models often rely heavily on annotated data, which may not always be available. Data scarcity is commonly found in specialized domains, such as the medical domain, or in low-resource languages that are underexplored by AI research. In this dissertation, we focus on mitigating data scarcity using data augmentation and neural ensemble learning techniques for neural language models. In both research directions, we implement neural network algorithms and evaluate their impact on assisting neural language models in downstream NLP tasks. Specifically, for data augmentation, we explore two techniques: 1) creating positive training data by moving an answer span around its original context and 2) using text simplification techniques to introduce a variety of writing styles to the original training data. Our results indicate that these simple and effective solutions improve the performance of neural language models considerably in low-resource NLP domains and tasks. For neural ensemble learning, we use a multi-label neural classifier to select the best prediction outcome from a variety of individual pretrained neural language models trained for a low-resource medical text simplification task.
dc.language.iso: en
dc.publisher: The University of Arizona.
dc.rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.title: Mitigating Data Scarcity for Neural Language Models
dc.type: text
dc.type: Electronic Dissertation
thesis.degree.grantor: University of Arizona
thesis.degree.level: doctoral
dc.contributor.committeemember: Surdeanu, Mihai
dc.contributor.committeemember: Bethard, Steven
dc.contributor.committeemember: Hahn-Powell, Gus
dc.contributor.committeemember: Levine, Joshua
thesis.degree.discipline: Graduate College
thesis.degree.discipline: Computer Science
thesis.degree.name: Ph.D.
refterms.dateFOA: 2023-01-20T19:14:33Z


Files in this item

Name: azu_etd_20234_sip1_m.pdf
Size: 3.574Mb
Format: PDF

