Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
This work concerns artificial neural networks. First, we analyze the mathematics of multi-class logistic regression; second, we study the utility and limitations of a purely attention-based approach to optical character recognition (OCR). For multi-class logistic regression, we prove that the loss function minimized by gradient descent attains its minimum provided the label matrix is fully smoothed. We also derive bounds on the smallest and largest eigenvalues of the Hessian and compute its condition number; by standard results in numerical analysis, the condition number determines the best contraction rate achievable with a fixed learning-rate parameter. For attention and OCR, we run experiments on isolated word recognition using a cursive font. These experiments show that attention relies heavily on memorized letter correlations, a significant limitation: it struggles to recognize text when the test samples differ substantially from the training samples, including when the training data set consists of bigrams, trigrams, or random words.
Type
text
Electronic Dissertation
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Applied Mathematics
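The abstract's claim that the Hessian's condition number sets the best contraction rate achievable with a fixed learning rate can be illustrated with a minimal sketch. This is not the dissertation's code; it uses a generic quadratic model with a symmetric positive definite Hessian, and all names are illustrative:

```python
import numpy as np

# For gradient descent on f(w) = 0.5 * w^T H w with SPD Hessian H, the
# optimal fixed step size 2/(mu + L) gives per-step contraction factor
# (kappa - 1)/(kappa + 1), where kappa = L/mu is the condition number.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = A @ A.T + np.eye(5)            # symmetric positive definite "Hessian"
eigs = np.linalg.eigvalsh(H)
mu, L = eigs[0], eigs[-1]          # smallest / largest eigenvalues
kappa = L / mu                     # condition number
step = 2.0 / (mu + L)              # optimal fixed learning rate
rate = (kappa - 1) / (kappa + 1)   # predicted contraction per step

w = rng.standard_normal(5)
w0_norm = np.linalg.norm(w)
n_steps = 50
for _ in range(n_steps):
    w = w - step * (H @ w)         # gradient of f is H @ w
# Observed geometric-mean contraction per step; it cannot exceed `rate`,
# since the iteration matrix I - step*H has spectral norm exactly `rate`.
observed = (np.linalg.norm(w) / w0_norm) ** (1.0 / n_steps)
print(f"kappa={kappa:.3f}  predicted rate={rate:.4f}  observed={observed:.4f}")
```

The spectral-norm bound guarantees `observed <= rate`; a larger condition number pushes the rate toward 1 and slows convergence, which is the numerical-analysis fact the abstract invokes.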