Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
Pre-trained transformers are a class of neural networks behind many recent natural language processing systems. Their success is often attributed to linguistic knowledge injected during the pre-training process. In this work, we make multiple attempts to surgically remove language-specific knowledge from BERT. Surprisingly, these interventions often do little damage to BERT's performance on GLUE tasks. By contrasting against non-pre-trained transformers with oracle initialization, we argue that when it comes to explaining how BERT works, there is a sizable void below linguistic probing and above model initialization.
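The abstract hinges on contrasting a pre-trained BERT with an otherwise identical transformer that has no pre-training. The sketch below is illustrative only (not the thesis code): it sets up that contrast with the HuggingFace transformers library, which is an assumption, as are the model name and the example sentence pair; the "oracle initialization" interventions described in the thesis are not reproduced here.

```python
# Illustrative sketch, not the thesis implementation: build a pre-trained
# BERT and a randomly initialized BERT of the same architecture, the two
# endpoints the abstract contrasts before fine-tuning on a GLUE-style task.
# Assumes HuggingFace `transformers`; "bert-base-uncased" and the sentence
# pair are placeholders.
import torch
from transformers import BertConfig, BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Pre-trained weights: carries whatever knowledge pre-training injected.
pretrained = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Same architecture, random initialization: no pre-training knowledge at all.
config = BertConfig.from_pretrained("bert-base-uncased", num_labels=2)
scratch = BertForSequenceClassification(config)

# Both models would then be fine-tuned on a GLUE task (e.g., MRPC) and their
# scores compared; the thesis additionally studies "oracle" initializations
# that sit between these two extremes.
batch = tokenizer(
    ["The cat sat on the mat."],
    ["A cat is sitting on a mat."],
    return_tensors="pt",
    padding=True,
)
with torch.no_grad():
    print(pretrained(**batch).logits)  # classifier head is untrained in both cases
    print(scratch(**batch).logits)
```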
Type
text
Electronic Thesis
Degree Name
M.S.
Degree Level
masters
Degree Program
Graduate College
Computer Science