Towards the Advancement of Open-Domain Textual Question Answering Methods
Author
Luo, Fan
Issue Date
2022
Keywords
Active Learning
Deep Learning Ranking
Multi-Hop Question Answering
Natural Language Processing
Open-Domain Textual Question Answering
Question-Answering System
Advisor
Surdeanu, Mihai
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
Open-domain Textual Question Answering (ODQA) aims to answer natural-language questions based on large-scale unstructured documents. While recent neural reading comprehension models have advanced performance to new heights, several open research questions remain in this area, including answering complex questions, providing human-interpretable explanations, and minimizing labeling costs. In particular, we investigate Multi-Hop QA (MHQA), a well-known complex QA task that requires combining multiple pieces of supporting context scattered across documents to infer the correct answer. In this dissertation, we examine the architecture of a typical ODQA system as well as the techniques adopted in each of its components, and present solutions that help such systems address the aforementioned challenges. First, we present an unsupervised graph-based method called STEP for identifying bridge phrases that connect the supporting evidence, which remains one of the challenging problems in MHQA. The identified bridge phrases are then used to expand the query, increasing the retrieval relevance of evidence that has little lexical overlap or semantic relation with the question. In our second work, we present a hybrid solution that improves the effectiveness of evidence re-ranking for MHQA, which scores and ranks the retrieved contexts to push the most useful information to the top before it is fed to the answer prediction module. We apply off-the-shelf statistical and neural models to capture different dimensions of relevance, including exact matching, semantic textual similarity, and textual entailment, and effectively combine them to jointly rank candidate evidence (see the illustrative sketch following this record). Lastly, to advance MHQA, and ODQA in general, the underlying reader model needs to be improved. Because state-of-the-art reader models are data-hungry and annotation is costly in practice, we apply a deep active learning technique and present a perturbation-based acquisition function that selects the unlabeled training instances that are most informative to annotate, reducing the annotation effort. All the approaches presented in this dissertation fit easily into a typical modern ODQA architecture.
Type
text
Electronic Dissertation
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Computer Science
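
Illustrative sketch. The abstract describes a hybrid evidence re-ranking step that combines exact-match, semantic-textual-similarity, and textual-entailment signals into a single ranking. The Python below is a minimal, self-contained sketch of that score-combination idea only; it is not the dissertation's implementation. The scorer callables, the min-max normalization, and the weights are placeholder assumptions; in the dissertation the individual signals come from off-the-shelf statistical and neural models.

# Minimal illustrative sketch (not the dissertation's implementation) of the
# hybrid re-ranking idea: score each candidate passage along several relevance
# dimensions, normalize each dimension, and rank by a weighted combination.
# The scorers and weights below are placeholder assumptions.

from typing import Callable, Dict, List


def exact_match_score(question: str, passage: str) -> float:
    """Toy lexical-overlap score standing in for a BM25/TF-IDF style matcher."""
    q_terms = set(question.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / max(len(q_terms), 1)


def normalize(scores: List[float]) -> List[float]:
    """Min-max normalize one score dimension so dimensions are comparable."""
    lo, hi = min(scores), max(scores)
    return [0.0] * len(scores) if hi == lo else [(s - lo) / (hi - lo) for s in scores]


def hybrid_rerank(
    question: str,
    passages: List[str],
    scorers: Dict[str, Callable[[str, str], float]],
    weights: Dict[str, float],
) -> List[str]:
    """Rank passages by a weighted sum of normalized relevance scores."""
    per_dim = {
        name: normalize([scorer(question, p) for p in passages])
        for name, scorer in scorers.items()
    }
    combined = [
        sum(weights[name] * per_dim[name][i] for name in scorers)
        for i in range(len(passages))
    ]
    order = sorted(range(len(passages)), key=lambda i: combined[i], reverse=True)
    return [passages[i] for i in order]


if __name__ == "__main__":
    # Placeholders for the neural semantic-similarity and entailment models
    # mentioned in the abstract; real models would replace these lambdas.
    scorers = {
        "exact_match": exact_match_score,
        "semantic_similarity": lambda q, p: min(len(set(q) & set(p)) / 50.0, 1.0),
        "entailment": lambda q, p: 0.5,
    }
    weights = {"exact_match": 0.4, "semantic_similarity": 0.4, "entailment": 0.2}
    question = "Who directed the film that won Best Picture at the 66th Academy Awards?"
    passages = [
        "The 66th Academy Awards ceremony was hosted by Whoopi Goldberg.",
        "Schindler's List won Best Picture at the 66th Academy Awards.",
        "Steven Spielberg directed Schindler's List.",
    ]
    print(hybrid_rerank(question, passages, scorers, weights))

The point of the sketch is only the joint combination of normalized per-dimension scores into one ranking; the choice of scorers and weights would come from the models and tuning described in the dissertation itself.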