OD-TQA: On-Demand Visual Augmentation for Textual Question Answering Task
dc.contributor.advisor | Surdeanu, Mihai | |
dc.contributor.author | Ehsani, Sina | |
dc.creator | Ehsani, Sina | |
dc.date.accessioned | 2022-12-17T00:11:07Z | |
dc.date.available | 2022-12-17T00:11:07Z | |
dc.date.issued | 2022 | |
dc.identifier.citation | Ehsani, Sina. (2022). OD-TQA: On-Demand Visual Augmentation for Textual Question Answering Task (Master's thesis, University of Arizona, Tucson, USA). | |
dc.identifier.uri | http://hdl.handle.net/10150/667289 | |
dc.description.abstract | Textual Question Answering is a difficult task that has been studied for over a decade. With the rise of transformer networks, there has been an increase in the utilization of external knowledge (pre-trained models) on this task. However, these methodologies are missing a critical component: external visual comprehension. When asked a question, we as humans use imagination, in the form of vision and audio, to better understand the concepts of the question. This study applies the same idea to machines, providing them with the visualization necessary to comprehend a given question and generate more pertinent answers. This is accomplished using Google's image search, which provides access to worldwide knowledge. A novel methodology for determining the best answer using on-demand visual grounding is presented, and various multimedia model designs are introduced and compared. Lastly, we demonstrate that the proposed solution outperforms the previous system without any pre-training, showing the benefits of the on-demand image retrieval concept for the textual question answering task. | |
dc.language.iso | en | |
dc.publisher | The University of Arizona. | |
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | |
dc.subject | Data Augmentation | |
dc.subject | Multimedia Information Retrieval | |
dc.subject | Multimodal Deep Learning | |
dc.subject | Natural Language Processing | |
dc.subject | Question Answering | |
dc.subject | Visual Grounding | |
dc.title | OD-TQA: On-Demand Visual Augmentation for Textual Question Answering Task | |
dc.type | text | |
dc.type | Electronic Thesis | |
thesis.degree.grantor | University of Arizona | |
thesis.degree.level | masters | |
dc.contributor.committeemember | Bethard, Steven | |
dc.contributor.committeemember | Barnard, Kobus | |
thesis.degree.discipline | Graduate College | |
thesis.degree.discipline | Computer Science | |
thesis.degree.name | M.S. | |
refterms.dateFOA | 2022-12-17T00:11:07Z |