Citation
Kabiri, R., Karimi, S., & Surdeanu, M. (2022). Informal Persian universal dependency treebank. arXiv preprint arXiv:2201.03679.
Rights
© 2022. The Author(s). This work uses a Creative Commons CC BY license: https://creativecommons.org/licenses/by/4.0/.
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
This paper presents the phonological, morphological, and syntactic distinctions between formal and informal Persian, showing that these two variants have fundamental differences that cannot be attributed solely to pronunciation discrepancies. Given that informal Persian exhibits particular characteristics, any computational model trained on formal Persian is unlikely to transfer well to informal Persian, necessitating the creation of dedicated treebanks for this variety. We thus detail the development of the open-source Informal Persian Universal Dependency Treebank, a new treebank annotated within the Universal Dependencies scheme. We then investigate the parsing of informal Persian by training two dependency parsers on existing formal treebanks and evaluating them on out-of-domain data, i.e., the development set of our informal treebank. Our results show that parsers experience a substantial performance drop when we move across the two domains, as they face more unknown tokens and structures and fail to generalize well. Furthermore, the dependency relations whose performance deteriorates the most represent the unique properties of the informal variant. The broader goal of this study is to provide a stepping stone toward revealing the significance of informal variants of languages, which have been widely overlooked in natural language processing tools across languages. © European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0.
Note
Open access journal
ISBN
9791095546726
Version
Final published version
DOI
10.48550/arXiv.2201.03679