Cheap and Good? Simple and Effective Data Augmentation for Low Resource Machine Reading
Publisher
ACM
Citation
Van, H., Yadav, V., & Surdeanu, M. (2021). Cheap and Good? Simple and Effective Data Augmentation for Low Resource Machine Reading. SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2116–2120.
Rights
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Abstract
We propose a simple and effective data augmentation strategy for low-resource machine reading comprehension (MRC). Our approach first pretrains the answer extraction components of an MRC system on augmented data that contains the approximate context of the correct answers, before training them on the exact answer spans. The approximate context helps the QA components narrow down the location of the answers. We demonstrate that this simple strategy substantially improves both document retrieval and answer extraction performance by providing larger answer contexts and additional training data. In particular, our method significantly improves the performance of a BERT-based retriever (by 15.12%) and answer extractor (by 4.33% F1) on TechQA, a complex, low-resource MRC task. Further, our data augmentation strategy yields significant improvements of up to 3.9% exact match (EM) and 2.7% F1 for answer extraction on PolicyQA, another practical but moderately sized QA dataset that also contains long answer spans.
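As a rough illustration of the strategy described in the abstract, the sketch below builds a first-stage training set whose target spans are widened windows of context around the gold answers, followed by the original exact-span data for a second stage. The field names, character-level expansion rule, and window size are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch, assuming SQuAD-style examples with character-offset answers.
# Window size, field names, and the expansion rule are assumptions for
# demonstration; the paper's actual augmentation may differ.

def make_approximate_example(example, window=100):
    """Build an augmented example whose target span is a widened window of
    context around the exact answer, rather than the exact span itself."""
    context = example["context"]
    start = example["answer_start"]
    end = start + len(example["answer_text"])

    # Expand the gold span by `window` characters on each side, clipped to the context.
    approx_start = max(0, start - window)
    approx_end = min(len(context), end + window)

    return {
        "question": example["question"],
        "context": context,
        "answer_start": approx_start,
        "answer_text": context[approx_start:approx_end],
    }


def build_two_stage_data(examples, window=100):
    """Stage 1: pretrain the answer extractor on approximate-context spans.
    Stage 2: fine-tune on the original exact answer spans."""
    stage1 = [make_approximate_example(ex, window) for ex in examples]
    stage2 = list(examples)
    return stage1, stage2
```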
Note
Immediate access
Version
Final accepted manuscript
Sponsors
DARPA
DOI
10.1145/3404835.3463099