Cheap and Good? Simple and Effective Data Augmentation for Low Resource Machine Reading
| dc.contributor.author | Van, Hoang | |
| dc.contributor.author | Yadav, Vikas | |
| dc.contributor.author | Surdeanu, Mihai | |
| dc.date.accessioned | 2021-08-19T17:48:12Z | |
| dc.date.available | 2021-08-19T17:48:12Z | |
| dc.date.issued | 2021-07-11 | |
| dc.identifier.citation | Van, H., Yadav, V., & Surdeanu, M. (2021). Cheap and Good? Simple and Effective Data Augmentation for Low Resource Machine Reading. SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2116–2120. | en_US |
| dc.identifier.doi | 10.1145/3404835.3463099 | |
| dc.identifier.uri | http://hdl.handle.net/10150/661305 | |
| dc.description.abstract | We propose a simple and effective strategy for data augmentation for low-resource machine reading comprehension (MRC). Our approach first pretrains the answer extraction components of an MRC system on augmented data that contains the approximate context of the correct answers, before training it on the exact answer spans. The approximate context helps the QA method components narrow down the location of the answers. We demonstrate that our simple strategy substantially improves both document retrieval and answer extraction performance by providing larger context of the answers and additional training data. In particular, our method significantly improves the performance of a BERT-based retriever (15.12%) and answer extractor (4.33% F1) on TechQA, a complex, low-resource MRC task. Further, our data augmentation strategy yields significant improvements of up to 3.9% exact match (EM) and 2.7% F1 for answer extraction on PolicyQA, another practical but moderate-sized QA dataset that also contains long answer spans. | en_US |
| dc.description.sponsorship | DARPA | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | ACM | en_US |
| dc.rights | © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM. | en_US |
| dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en_US |
| dc.source | Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval | |
| dc.subject | data augmentation | en_US |
| dc.subject | document retrieval | en_US |
| dc.subject | question answering | en_US |
| dc.title | Cheap and Good? Simple and Effective Data Augmentation for Low Resource Machine Reading | en_US |
| dc.type | Article | en_US |
| dc.contributor.department | University of Arizona | en_US |
| dc.identifier.journal | SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval | en_US |
| dc.description.note | Immediate access | en_US |
| dc.description.collectioninformation | This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu. | en_US |
| dc.eprint.version | Final accepted manuscript | en_US |
| dc.identifier.pii | 10.1145/3404835.3463099 | |
| dc.identifier.pii | 10.1145/3404835 | |
| refterms.dateFOA | 2021-08-19T17:48:14Z |
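
The abstract describes a two-stage schedule: pretraining on augmented data that contains the approximate context of each answer, followed by training on the exact answer spans. The Python sketch below illustrates one way such approximate spans might be generated, by widening a gold token span by a few tokens on each side. The window sizes, the function names `expand_span` and `augmented_spans`, and the sample sentence are all assumptions made for illustration; the record does not specify how the authors build their approximate contexts.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def expand_span(start: int, end: int, n_tokens: int, window: int) -> Span:
    """Widen an inclusive token span by `window` tokens per side, clipped to the passage."""
    if not (0 <= start <= end < n_tokens):
        raise ValueError("span must lie inside the passage")
    return max(0, start - window), min(n_tokens - 1, end + window)


def augmented_spans(start: int, end: int, n_tokens: int,
                    windows: Sequence[int] = (1, 2, 3)) -> List[Span]:
    """Return distinct approximate spans around the gold span (hypothetical scheme)."""
    spans: List[Span] = []
    for w in windows:
        span = expand_span(start, end, n_tokens, w)
        # Skip spans that collapse onto the gold span or repeat after clipping.
        if span != (start, end) and span not in spans:
            spans.append(span)
    return spans


tokens = "Restart the server after you install the fix pack".split()
gold: Span = (5, 8)  # "install the fix pack"

# Stage 1 (pretraining) would use the approximate spans,
# stage 2 (fine-tuning) would use only the exact gold span.
stage1 = augmented_spans(gold[0], gold[1], len(tokens))
stage2 = [gold]

for s, e in stage1:
    print(" ".join(tokens[s:e + 1]))
```

In this sketch the approximate spans always contain the gold answer, so a model pretrained on them would learn roughly where the answer lies before being trained on its exact boundaries.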
