Improving Neural Net Machine Translation Systems with Linguistic Information
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: Interlinear Glossed Text (IGT) is widely used in linguistic studies. In one common form of IGT, the first line is a sentence in the language of interest, the second line is a word-by-word translation annotated with relevant grammatical information (the gloss line), and the third line is an English translation. The innovation of the current work is to incorporate the gloss information in IGT data into neural net machine translation systems. Critically, when the Gaelic data and the gloss data are combined in a specific way in the training data, a method named the Parallel-Partial treatment, system performance improves significantly: the systems with the Parallel-Partial treatment outperform the baseline systems by 93% and outperform Google Translate by 40%. The Parallel-Partial treatment lets the machine learn four sets of mappings: 1) from source sentences to target sentences, 2) from gloss lines to target sentences, 3) from gloss lines to source sentences, and 4) from source-language words to gloss items. Moreover, the boosting effect of the Parallel-Partial treatment is consistent across different languages and across neural net machine translation systems with different hyper-parameter settings. How theoretical linguistics may work hand in hand with natural language processing, and how neural net machine learning may exploit linguistics, are important questions (Pater 2017). Beyond practically building better machine translation systems, the current work exemplifies how theoretical linguistics and natural language processing can work together successfully.
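The four mappings listed in the abstract can be illustrated with a minimal sketch that turns IGT records into (input, output) training pairs. This is not the thesis's actual procedure: the abstract does not say exactly how the data are combined, so the names `IGTRecord` and `parallel_partial_pairs`, the choice to pool all pairs into one training list, and the assumption that source words and gloss items align one-to-one by whitespace are all hypothetical. The sample record is a toy, hand-made Scottish Gaelic example, not data from the thesis.

```python
from dataclasses import dataclass


@dataclass
class IGTRecord:
    """One interlinear glossed text entry (hypothetical structure)."""
    source: str       # sentence in the language of interest
    gloss: str        # word-by-word gloss line
    translation: str  # English translation


def parallel_partial_pairs(records):
    """Hypothetical sketch: build (input, output) pairs covering the
    four mappings described in the abstract, pooled into one list."""
    pairs = []
    for r in records:
        pairs.append((r.source, r.translation))  # 1) source sentence -> target sentence
        pairs.append((r.gloss, r.translation))   # 2) gloss line -> target sentence
        pairs.append((r.gloss, r.source))        # 3) gloss line -> source sentence
        # 4) source-language words -> gloss items.
        # Assumption: one gloss item per source word, separated by whitespace;
        # records whose token counts differ are skipped for this mapping.
        src_words = r.source.split()
        gloss_items = r.gloss.split()
        if len(src_words) == len(gloss_items):
            pairs.extend(zip(src_words, gloss_items))
    return pairs


if __name__ == "__main__":
    # Toy record for illustration only.
    records = [IGTRecord("tha mi sgìth", "be.PRES 1SG tired", "I am tired")]
    for src, tgt in parallel_partial_pairs(records):
        print(src, "->", tgt)
```

Pooling every pair into a single training set lets one sequence-to-sequence model see all four mappings at once; a real system might instead tag or weight the pair types differently.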
Degree Program: Graduate College