OD-TQA: On-Demand Visual Augmentation for Textual Question Answering Task
Multimedia Information Retrieval
Multimodal Deep Learning
Natural Language Processing
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: Textual Question Answering is a difficult task that has been studied for over a decade. With the rise of transformer networks, external knowledge in the form of pre-trained models has been used increasingly for this task. However, these methodologies are missing a critical component: external visual comprehension. When asked a question, we as humans use imagination, in the form of vision and audio, to better understand the concepts of the question. That is the goal of this study: providing machines with the visualization they need to comprehend a given question and generate more pertinent answers. This is accomplished using Google's image search, which gives us access to worldwide knowledge. A novel methodology for determining the best answer using on-demand visual grounding is presented, and various multimedia model designs are introduced and compared. Lastly, we demonstrate that the proposed solution outperforms the previous system without any pre-training, proving the benefit of on-demand image retrieval for the textual question answering task.
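The pipeline described in the abstract (retrieve images for the question on demand, fuse their visual features with the text representation, then score candidate answers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: `retrieve_image_features` is a hypothetical stand-in for Google image search plus a visual encoder, and `TOY_EMBED` replaces a real text encoder with hand-picked toy vectors.

```python
import math

# Toy text embeddings standing in for a real encoder (hypothetical values).
TOY_EMBED = {
    "what color is the sky": [0.2, 0.8, 0.1],
    "blue": [0.5, 0.6, 0.1],
    "loud": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_image_features(question):
    # Hypothetical stand-in for on-demand image retrieval followed by a
    # visual encoder; returns fixed toy feature vectors here.
    return [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]

def answer_question(question, candidates):
    """Fuse the question embedding with retrieved visual features,
    then pick the candidate answer closest to the fused representation."""
    q = TOY_EMBED[question]
    imgs = retrieve_image_features(question)
    # Mean-pool the retrieved image features into one visual vector.
    visual = [sum(f[i] for f in imgs) / len(imgs) for i in range(len(q))]
    # Simple fusion: average the textual and visual representations.
    fused = [(a + b) / 2 for a, b in zip(q, visual)]
    return max(candidates, key=lambda c: cosine(fused, TOY_EMBED[c]))

print(answer_question("what color is the sky", ["blue", "loud"]))
```

The fusion step here is a plain average; the thesis compares several multimedia model designs for this combination, which this sketch does not reproduce.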
Degree Program: Graduate College